Thursday, 28 August 2025

Spotlight on: Taylor & Francis Group - DataSeer SnapShot

This year, the judges have selected four finalists for the ALPSP Award for Innovation in Publishing.

The finalists will be showcased in a lightning presentation session at the ALPSP Conference on 10 September, with the winners announced at the ALPSP Conference Awards Dinner.

In this series, we learn more about each of the finalists and their entries.

Tell us about your organization. 

Taylor & Francis supports diverse communities of experts, researchers and knowledge makers around the world to accelerate and maximize the impact of their work. We are a leader in our field, publish across all disciplines and have one of the largest Humanities and Social Sciences portfolios. Our expertise, built on an academic publishing heritage of over 200 years, advances trusted knowledge that fosters human progress. Under the Taylor & Francis, Routledge and F1000 imprints, we publish more than 2,500 journals and 8,000 new books each year, and partner with more than 700 scholarly societies. For this ALPSP Innovation Awards entry we worked jointly with DataSeer, an AI Open Science solution provider, to test and enhance its SnapShot tool at scale, with the aim of reviewing and improving the transparency and replicability of published research.

What is the project/product that you submitted for the Awards? 

In this collaborative project, we have worked closely with DataSeer to pilot and configure SnapShot, an AI tool designed to rapidly assess whether submitted manuscripts meet our data sharing requirements and editorial policy expectations. Rather than performing generic presence/absence checks for evidence of data sharing, SnapShot is trained to align with the specific requirements of our Taylor & Francis data sharing policies (see image), so that it also assesses the author’s data sharing approach and identifies where improvements could be made. The intention is to help us enforce our editorial data sharing policies consistently and at scale, while removing the burden of these complex checks from our administrative staff and supporting our authors in their publishing journey.

 

Tell us a little about how it works and the team behind it 

SnapShot is developed by DataSeer, a company specialising in AI-driven tools for open science compliance. SnapShot combines a natural language processing (NLP) pipeline to identify datasets and extract relevant text with a large language model (LLM) that evaluates policy compliance, checks repository links, and generates recommended next steps. 
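
To make the screening step more concrete, here is a minimal Python sketch of the kind of first-pass triage described above: pulling links out of a data availability statement and assigning a coarse sharing-method label. It is an illustration only, not DataSeer's implementation. The keyword rules, labels and function names are all hypothetical, and the real tool relies on an NLP pipeline and an LLM rather than simple string matching.

```python
import re

# Hypothetical pattern and keyword rules, chosen only for illustration.
URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")
RULES = [
    ("exemption", ("cannot be shared", "ethical restrictions", "confidential")),
    ("on_request", ("on request", "upon request", "on reasonable request")),
    ("supplementary", ("supplementary", "supplemental")),
]


def extract_links(statement: str) -> list[str]:
    """Return URLs found in the statement, with trailing punctuation removed."""
    return [url.rstrip(".,;") for url in URL_PATTERN.findall(statement)]


def classify_sharing_method(statement: str) -> str:
    """Assign a coarse sharing-method label to a data availability statement."""
    # Any link is treated as a pointer to a deposited dataset (a simplification).
    if extract_links(statement):
        return "linked_dataset"
    lowered = statement.lower()
    for label, phrases in RULES:
        if any(phrase in lowered for phrase in phrases):
            return label
    return "unclear"


if __name__ == "__main__":
    examples = [
        "The data are openly available at https://example.org/dataset/123.",
        "Data are available from the corresponding author upon reasonable request.",
        "Due to ethical restrictions, the data cannot be shared.",
    ]
    for text in examples:
        print(classify_sharing_method(text), extract_links(text))
```

A rule-based pass like this cannot judge whether a repository is suitable or whether an exemption is legitimate, which is where the policy-aware evaluation described above comes in.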

In our pilot, SnapShot was configured to align with multiple levels of Taylor & Francis’s data sharing policies, from “Share on Request” to “Open Data”. The tool can screen submissions in seconds and generate editor- and author-facing feedback tailored to our policy requirements. 

Our internal teams – including editorial operations, open science, and implementation specialists – worked closely with DataSeer’s AI and editorial experts to evaluate the tool’s performance, accuracy, and usability, ensuring it supports rather than replaces human judgment in our editorial workflows. 

In what ways do you think it demonstrates innovation? 

Currently there are no comparable research integrity tools which can support this level of manuscript checking for data sharing. The data availability statements which authors use to describe their datasets are not templated or consistent, and datasets can be shared in various ways including on request, via supplementary files, or in data repositories. Some authors need to use policy exemptions to avoid sharing sensitive data openly. Without significant training, journal administrators can struggle to identify appropriate data sharing, whereas SnapShot can assess a manuscript and return feedback in moments. DataSeer’s process includes an assessment of the data sharing methods used by the author, identification of the data repositories used, checks of URLs for live data, and analysis of exemption requests to confirm legitimacy. We aimed for an accuracy benchmark of 75%, but with our current iteration the tool is already scoring 97%, which is very impressive given the complexity of the checks. 

This tool creates a huge opportunity for publishers to begin enforcing more stringent data sharing checks across their journals and portfolios. We know that the implementation of editorial policies for data sharing has slowed in recent years, and the STM Association’s Research Data Program estimated approximately 52% uptake as of 2020. The TIER2 project (via a group of 20 representatives of academic publishers) has identified major barriers to the implementation of data sharing policies, including costs, resources, training gaps and a lack of scalable technical solutions. The continued gap in FAIR (Findable, Accessible, Interoperable, Reusable) research data policy implementation and enforcement means journals lose out on key benefits including enhanced transparency and trust, increased citations, alignment with global funder policy requirements, and even deterrence of bad actors or paper mills. The SnapShot tool allows publishers to support better research transparency and data sharing in a consistent and scalable way. 

What are your plans for the future? 

We are continuing to evaluate and refine the SnapShot tool through a staged development roadmap, with the aim of exploring scalability across our portfolio. 

In the immediate term, we are working with DataSeer to expand SnapShot’s capabilities in line with both the Taylor & Francis and F1000 Open Data policies. This includes developing more advanced checks for: 

  • Data licensing, ensuring that shared datasets meet requirements for reusability. 
  • Repository suitability, confirming that datasets are deposited in appropriate and trusted repositories. 
  • Formal data citations, supporting improved credit and discoverability. 

We are also preparing to launch a live pilot within our editorial submission workflows, which will allow us to gather performance metrics and qualitative feedback from journal administrators on areas such as triage speed, accuracy, and editorial usability. 

Based on the success of this pilot, our roadmap includes several further developments: 

  • Templated author communications: We are working with DataSeer to generate bespoke, policy-aligned email templates for administrative and editorial teams to use when requesting changes to data availability statements. 
  • Iterative refinements: Feedback from the live pilot, including from journal administrators via KGL (KnowledgeWorks Global) and other implementation partners, will directly inform future improvements to the tool’s logic, outputs, and usability. 
  • Conversational AI interface: Looking further ahead, we are exploring the development of a “chat with the AI” feature that would allow editors and authors to interact with SnapShot in a more dynamic way – asking questions, receiving explanations, and tailoring feedback in real time. 

Through these enhancements, we hope to not only support better compliance with data sharing policies, but also to build scalable, AI-assisted workflows that make the publication process more efficient, transparent, and researcher-friendly. 

About the author

This blog was co-authored with the support of Tim Vines and Adrian Stanley from the DataSeer team. 

Dr. Rebecca Taylor-Grant is Director of Open Science Strategy & Innovation at Taylor & Francis, where she leads the development of policies, practices and pilots to support the publication of open, transparent and reproducible research. She has a background in data management for the humanities and social sciences and is co-chair of the STM Association’s Research Data Program Humanities Data Subgroup, as well as the Research Data Alliance’s Research Data Policy Interest Group.  

Spotlight on: Thoth Open Metadata

Tell us about your organization

Thoth Open Metadata is a UK-registered non-profit community interest company (CIC) dedicated to advancing open access (OA) publishing and supporting the scholarly community. We provide innovative, open metadata management and distribution solutions, specifically tailored to tackle the common difficulties of getting OA books and chapters distributed into the wider book supply chain. 


In doing so, Thoth helps small, scholar-led, university and library presses with implementing good metadata practice, to ensure their valuable outputs are discoverable and accessible in a wide array of book dissemination channels and archive repositories. 

What is the project/product that you submitted for the Awards?

At the core of our services is Thoth’s platform – a free one-stop solution to efficiently manage and expose open metadata via open APIs under a CC0 dedication, in various industry standards including ONIX 2.1, 3.0 and 3.1, MARC, KBART, JSON, and Crossref XML DOI deposits.


We offer publishers a two-tiered service: 


Thoth Free: Publishers have free and unlimited self-service access to the Thoth platform, through which they can create, manage and export metadata following diverse industry standards as well as platform-specific configurations.


Thoth Plus: Publishers are provided with a set of managed services covering aspects of automated DOI registration, sending metadata and book files to key stakeholders in the larger book supply chain, as well as the archiving of long-form scholarship on behalf of a publisher. 


Thoth also offers additional services such as bespoke metadata creation, data ingest of back catalogues into the Thoth database, and hosting of book files via Thoth File Hosting. As an extension, Thoth Website Hosting offers a customisable website template tailored to the needs of OA publishers and consortia / organisations managing OA book catalogues. Thoth’s re-usable white-label approach to hosting is already being used in Thoth’s own website and central metadata catalogue, while also powering an increasing number of publisher and consortia websites such as those of Open Book Publishers, and the consortium of Netherlands University Presses (currently under development).

Tell us a little about how it works and the team behind it
Poster Source: https://doi.org/10.5281/zenodo.15672341 
Thoth was born out of the infrastructural and research work begun under the remit of the Community-led Open Publication Infrastructures for Monographs (COPIM) project (2019-23), which is now being continued through Copim Open Book Futures (2023-26). Both projects receive funding via Arcadia and Research England/UKRI. 

With bibliodiversity and equitable sustainability as two of Thoth’s key tenets, engagement with publishers from a variety of regional contexts was sought early on. Thoth was conceived from the start as a fully open dissemination system.


This poster showcases the metadata workflow of a collection of small scholar-led as well as institutional publishers using the Thoth platform. With Thoth, metadata can be ingested, managed, and exported to a variety of major content platforms and ebook aggregators in the global book supply chain, including EBSCOhost, ProQuest Ebook Central, Google Books, JSTOR, Project MUSE, and OAPEN, as well as metadata indexes including the Directory of Open Access Books (DOAB), OCLC WorldCat, Web of Science, etc. Benefitting from Thoth’s role as an official Crossref sponsor, publishers can automatically register DOIs for books and chapters, and Thoth supports good metadata practice through the use of open and persistent identifiers. OA titles are archived via the Thoth Open Archiving Network, a community-led, transparent and open approach to preserving the scholarly record.


We believe it is also crucial to highlight that Thoth has been developed by publishers, for publishers. Thoth’s community-led governance model is exemplified by its Board of Directors, which comprises representatives from award-winning independent Diamond OA publishers Open Book Publishers, punctum books, and Mattering Press. 

In what ways do you think it demonstrates innovation?

Metadata is a crucial part of the publishing process for both open access and non-OA works, and publishers have highlighted a central need for an open system to create and manage metadata. Therefore, Thoth’s open platform and services are understood as an important intervention in the OA book ecosystem, leveraging the power of open data to connect to other key infrastructures and services. 


In our exchanges with 70+ individual publishers, associations and networks from across the globe, one key element raised as a major difficulty faced particularly by small and medium-sized publishers is the implementation of metadata workflows on commercial platforms, with high complexity and costs usually associated with these processes. This presents a steep barrier for publishers seeking to participate in the larger OA book dissemination ecosystem. Thoth’s platform and services provide an important, equitable and accessible resource that is making publishers’ lives easier on a daily basis.


Everything we do at Thoth is open. Our software is open source, tailor-made for OA book metadata, and our metadata is released under a CC0 dedication permitting sharing and reuse. Our open APIs offer seamless integration with other platforms.


Moreover, at Thoth, we firmly believe in providing publishers with the freedom to choose their own path. Our commitment to openness ensures that publishers are not locked in to any specific platform or service, and all data created within Thoth can be exported in multiple formats, making it easy for publishers to move to another service should they want to stop using Thoth. Thoth's open architecture and APIs empower publishers to integrate seamlessly with other platforms and workflows, for example through the Thoth-Open Monograph Press plugin, allowing for greater flexibility and adaptability. Thoth enables publishers to retain full control over their data and operations. With Thoth, you're not just accessing a platform – you're joining a community-driven ecosystem built on the principles of openness, collaboration, and innovation.


This is also reflected in our work to establish close connections with key infrastructures active in the field of Open Access publishing. As a result, Thoth has now been established as a sponsor with Crossref, become a full member of European research infrastructure network OPERAS as well as OASPA, and signed collaboration agreements with the OAPEN Foundation and the Directory of Open Access Books (DOAB) to formalise already-existing close working relationships with each of those infrastructures.


Having established the foundational technical capabilities for open book metadata, we are now developing collaborations with other open infrastructures to enhance interoperability and create robust open infrastructures. We are working closely with like-minded stakeholders active in the field of OA book publishing e.g. with the Public Knowledge Project, OAPEN, DOAB, and Crossref, and have recently founded a Working Group within the OPERAS network dedicated to Open Infrastructures for OA Books, which is jointly being coordinated by Thoth and the Open Book Collective. Moreover, Thoth is listed in Invest in Open Infrastructure’s InfraFinder, the OPERAS Pathfinder, and the European Diamond Capacity Hub Registry, is a Signatory and active contributor to the Barcelona Declaration on Open Research Information, and contributes to the Collaborative Metadata (COMET) community of practice.

What are your plans for the future?

Later this year we will be unveiling our new Thoth Usage Statistics dashboard, offering publishers robust, privacy-focused insights into the usage of their open-access books and chapters. Leveraging the OPERAS Metrics platform to provide aggregated and standardised open data across multiple platforms, Thoth’s service is tailored to help publishers understand and optimise the impact of their publications.


In the coming years Thoth will be working towards sustainability and resilience, and will continue to connect, develop and adapt, as we focus on: 

  • Automating distribution to platforms in the OA ecosystem and global book supply chain 

  • Assisting publishers in improving metadata quality, while also implementing good metadata practice for books and chapters alike

  • Implementing our multilingual interface and extended metadata schema

  • Exposing Thoth’s database via OAI-PMH and investigating provision of OPDS, while also looking to integrate additional export formats such as BIBFRAME

  • Supporting publishers with legal deposit of books with national libraries in a global context, as well as with their local ISBN registration workflows

  • Extending the uptake of open infrastructure by libraries


About the authors

Hannah Hillen is Metadata & Publisher Outreach Specialist at Thoth Open Metadata. She works closely with publishers, supporting them in their use of the Thoth platform and services. She also works on forming partnerships and implementing dissemination channels between Thoth and platforms throughout the global book supply chain. Hannah has an MA in Librarianship and a 15-year background in cataloguing, archiving and preservation of print and digital material in academic libraries and special collections.


Her ORCID: https://orcid.org/0009-0004-9521-0445 



Toby Steiner is COO and Product Manager at Thoth Open Metadata. He also works on collaborative outreach across open infrastructures for the Copim Open Book Futures project, and co-coordinates the OPERAS Open Infrastructures for OA Books Working Group. He is a co-convener of the Radical Open Access Collective and sits on the Editorial Advisory Board for the OAPEN Open Access Books Toolkit.