Wednesday, 20 June 2018

Artificial Intelligence: What It Is, How It Works, and What Publishers Can Do with It


AI was one of the hot topics at last year's ALPSP conference. In this guest blog, Hong Zhou, Senior Product Manager for Information Discovery and AI at Atypon, gives us the 101 on this transformational development.

Artificial intelligence, or AI, is much more than the latest technology buzzword. According to Gartner, by 2020, AI will positively change the behavior of billions of workers and users. And Tata estimates that the vast majority of those workers will work outside of IT.

But what exactly is AI?

AI is a broad set of technologies that use the computational capabilities of machines to “think” like humans. There are many different types of AI, each of which can be used to solve different problems.

So how can AI be employed by scholarly publishers? Ultimately, any publishing technology should make the research experience more productive, increase content usage, and add value to the publisher’s content. To do that, R&D at Atypon explores ways to help readers discover useful and relevant information more quickly by improving search mechanisms and refining content recommendations.

Making content relevant: Recommender systems

Recommender systems will be familiar to anyone who has received suggestions about what other products to buy before or after making an online purchase. Publishers can use them to target relevant products to individual customers by understanding their online site behavior and interests.
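
To make the idea concrete, here is a minimal sketch of one common approach, co-occurrence-based collaborative filtering. It is an illustration only, not a description of any particular vendor's system; the `recommend` function, the reader names, and the article IDs are all hypothetical.

```python
from collections import Counter


def recommend(histories, user, top_n=3):
    """Suggest unread items that appear in histories overlapping the user's.

    histories maps a reader ID to the list of items that reader has viewed.
    An item's score is the summed overlap between the user's history and
    each other history that contains the item.
    """
    seen = set(histories[user])
    scores = Counter()
    for other, items in histories.items():
        if other == user:
            continue
        overlap = len(seen & set(items))
        if overlap == 0:
            continue  # no shared interests, so this reader tells us nothing
        for item in set(items) - seen:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical reading histories.
histories = {
    "reader_a": ["article_1", "article_2"],
    "reader_b": ["article_1", "article_2", "article_3"],
    "reader_c": ["article_2", "article_4"],
    "reader_d": ["article_5"],
}
suggestions = recommend(histories, "reader_a")
```

Here `article_3` ranks ahead of `article_4` because it comes from a reader who shares two articles with `reader_a` rather than one. Production systems typically add signals such as recency, content similarity, and purchase data.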

Anticipating what readers want: Personalized search

AI-driven recommendation technology can be extended to personalize search as well: reading histories can be used to adjust search rankings specifically to each user—and even suggest new queries that may be relevant—with the goal of understanding a user’s intentions even before they search.
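
One simple way to picture this is as a re-ranking step: build a topic profile from what a user has read, then blend it with the search engine's base relevance score. The sketch below assumes each article carries topic labels and each result arrives with a base score; the function names, the `weight` parameter, and the sample data are hypothetical.

```python
from collections import Counter


def topic_profile(reading_history):
    """Turn a list of per-article topic lists into topic -> share of reads."""
    counts = Counter(topic for topics in reading_history for topic in topics)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()} if total else {}


def personalize(results, profile, weight=0.5):
    """Re-rank (title, base_score, topics) results for one user.

    The final score is the base relevance plus a weighted affinity, where
    affinity is the sum of the user's profile shares for the result's topics.
    """
    def score(result):
        _, base, topics = result
        affinity = sum(profile.get(topic, 0.0) for topic in topics)
        return base + weight * affinity

    return [title for title, _, _ in sorted(results, key=score, reverse=True)]


# Hypothetical data: a reader who mostly reads genomics.
profile = topic_profile([["genomics"], ["genomics", "statistics"]])
results = [
    ("Bridge design", 0.9, ["engineering"]),
    ("Gene editing", 0.8, ["genomics"]),
    ("Bayesian methods", 0.7, ["statistics"]),
]
ranking = personalize(results, profile)
```

With these numbers, "Gene editing" moves from second place to first for this reader, while a user with an empty profile would see the original order.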

Faster, easier content classification: Semantic auto-tagging

Content tagging underlies many important website capabilities, such as automating the creation of topic-specific pages and content bundles, and powering search results and content recommendations. But tagging documents and maintaining tag sets can be a daunting undertaking. Auto-taggers powered by intelligent machine learning algorithms tag articles accurately and even identify which tags may not be assigned correctly. They save curators time by letting them concentrate their efforts only on content that’s assigned low “confidence scores” by the auto-tagger, thus making it easier for publishers to implement and manage taxonomies.
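
The confidence-score workflow can be sketched in a few lines. The example assumes an auto-tagger has already produced (document, tag, confidence) triples; the `triage` function, the 0.8 threshold, and the sample predictions are hypothetical and would be tuned per taxonomy.

```python
def triage(predictions, threshold=0.8):
    """Split auto-tagger output into accepted tags and a curator review queue.

    predictions is a list of (doc_id, tag, confidence) tuples. Tags at or
    above the threshold are accepted automatically; the rest are queued for
    review, least confident first.
    """
    accepted = [(doc, tag) for doc, tag, conf in predictions if conf >= threshold]
    low = [(doc, tag, conf) for doc, tag, conf in predictions if conf < threshold]
    low.sort(key=lambda item: item[2])
    review = [(doc, tag) for doc, tag, _ in low]
    return accepted, review


# Hypothetical tagger output.
predictions = [
    ("doc_1", "oncology", 0.95),
    ("doc_2", "cardiology", 0.55),
    ("doc_3", "genomics", 0.80),
    ("doc_4", "ecology", 0.30),
]
accepted, review = triage(predictions)
```

In this example, curators would look at only two of the four documents, starting with the tag the machine is least sure about.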

Content enrichment: Natural language processing

Keywords are traditionally extracted or selected manually, and automating the task usually requires a large amount of training data to identify relationships among topics and key phrases. Machines that understand the meaning of content, rather than just the individual words, can extract more valuable information from it. Natural language processing (NLP) automates key phrase extraction and obviates the need to “teach” the engine about the content first. By extracting key phrases from different sections of the content and ranking them based on their importance, NLP ultimately improves content categorization and, by extension, content discovery.
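
As a rough illustration of extraction without training data, the sketch below treats runs of non-stopwords as candidate phrases and weights each occurrence by the section it appears in. It is a deliberately simple stand-in for real NLP, not the method any product uses; the stopword list, the section weights, and the function names are assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "in", "is",
             "of", "on", "that", "the", "this", "to", "we", "with"}

# Assumed weights: a phrase in the title counts for more than one in the body.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def candidate_phrases(text):
    """Return runs of consecutive non-stopwords, split at punctuation."""
    phrases, current = [], []
    for token in re.findall(r"[a-z]+|[^a-z\s]", text.lower()):
        if token.isalpha() and token not in STOPWORDS:
            current.append(token)
        else:
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


def rank_key_phrases(sections, top_n=5):
    """Score phrases by section-weighted frequency, highest first."""
    scores = Counter()
    for name, text in sections.items():
        weight = SECTION_WEIGHTS.get(name, 1.0)
        for phrase in candidate_phrases(text):
            scores[phrase] += weight
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical article.
sections = {
    "title": "Knowledge graphs for publishers",
    "abstract": "Knowledge graphs and semantic tagging are studied.",
    "body": "Semantic tagging is useful.",
}
top_phrases = rank_key_phrases(sections, top_n=3)
```

Here "knowledge graphs" ranks first because it appears in both the title and the abstract.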

Beyond tagging and metadata: Knowledge graphs

A knowledge graph charts all of the possible connections among publication-related information like authors, topics, journals, articles, and even external knowledge databases. Using these connections, algorithms identify and recommend to researchers the most influential entities, trending topics, and even co-authors and reviewers matched to their areas of specialization and the subjects about which they’re writing.
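
A toy version shows how such recommendations can fall out of the connections themselves. The sketch stores (subject, relation, object) facts and suggests reviewers for an article by finding people who have written other articles on the same topics. The `KnowledgeGraph` class, the relation names, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


class KnowledgeGraph:
    """Minimal store of (subject, relation, object) facts, indexed both ways."""

    def __init__(self):
        self._forward = defaultdict(set)   # (subject, relation) -> objects
        self._backward = defaultdict(set)  # (relation, object) -> subjects

    def add(self, subject, relation, obj):
        self._forward[(subject, relation)].add(obj)
        self._backward[(relation, obj)].add(subject)

    def objects(self, subject, relation):
        return set(self._forward.get((subject, relation), ()))

    def subjects(self, relation, obj):
        return set(self._backward.get((relation, obj), ()))


def suggest_reviewers(graph, article):
    """Rank non-authors by how many topic links they share with the article."""
    authors = graph.objects(article, "written_by")
    counts = Counter()
    for topic in graph.objects(article, "about"):
        for other in graph.subjects("about", topic) - {article}:
            for person in graph.objects(other, "written_by") - authors:
                counts[person] += 1
    return [p for p, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


# Hypothetical publication data.
graph = KnowledgeGraph()
facts = [
    ("art_1", "written_by", "Li"), ("art_1", "about", "nlp"), ("art_1", "about", "graphs"),
    ("art_2", "written_by", "Kim"), ("art_2", "about", "nlp"),
    ("art_3", "written_by", "Li"), ("art_3", "written_by", "Osei"),
    ("art_3", "about", "nlp"), ("art_3", "about", "graphs"),
    ("art_4", "written_by", "Osei"), ("art_4", "about", "graphs"),
]
for fact in facts:
    graph.add(*fact)
reviewers = suggest_reviewers(graph, "art_1")
```

Osei is suggested ahead of Kim because Osei's articles overlap with both of the submission's topics. A real system would also need to screen for conflicts of interest, such as recent co-authorship.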

Granular discoverability for text and images: Semantic enrichment

Suppose a researcher wants to interpret many figures associated with a single experiment. Today, editors have to segment compound figures manually using specialized software, which becomes impractical when there are many figures to process. Machine learning can be used to extract sub-figures and captions from compound figures and even separate labels from their associated images, enabling each item to be searched and retrieved individually. Such automation not only reduces the cost of segmentation but also extracts and organizes more valuable information so researchers can search for, compare, and recommend images more precisely and easily.
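
To give a feel for what segmentation involves, the sketch below splits a compound figure wherever a fully blank column separates two panels. This is a simple rule-based stand-in, not the machine learning approach described above, and it assumes the figure has already been reduced to a grid of 0s (background) and 1s (ink). The function name and sample grid are hypothetical.

```python
def split_on_blank_columns(image):
    """Return (start, end) column ranges of panels separated by blank columns.

    image is a list of equal-length rows of 0 (background) and 1 (ink).
    Each range is half-open: it covers columns start up to, but not
    including, end.
    """
    if not image:
        return []
    width = len(image[0])
    has_ink = [any(row[col] for row in image) for col in range(width)]
    panels, start = [], None
    for col, ink in enumerate(has_ink):
        if ink and start is None:
            start = col                    # a panel begins
        elif not ink and start is not None:
            panels.append((start, col))    # a blank column ends the panel
            start = None
    if start is not None:
        panels.append((start, width))      # panel runs to the right edge
    return panels


# Hypothetical two-panel figure, seven columns wide.
figure = [
    [1, 1, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 0, 1, 0],
]
panels = split_on_blank_columns(figure)
```

A heuristic like this fails as soon as panels touch or share a border, which is one reason learned models are attractive for the task.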

Search the science, not the text

AI is no longer an aspirational conversation about the future: many of the technologies discussed above are available today and in use by publishers. By using AI to provide better search results for researchers, and to target content more effectively, publishers can deepen researchers’ engagement with their websites, increase the value of their content, and further the pursuit of scientific knowledge by surfacing the information researchers need more quickly and accurately.


Hong Zhou works on Atypon’s next-generation information discovery technologies. Previously, he was the CTO of Digital Fineprint, a startup that leveraged machine learning algorithms for the insurance industry. He also spent a year designing race car games at Eutechnyx. He holds a PhD in 3D modeling with artificial intelligence algorithms from Aberystwyth University and has published widely on computer science.


Atypon is the proud sponsor of our Awards Dinner at the ALPSP Annual Conference, which will take place on 12-14 September this year.