Thursday, 28 August 2025

Spotlight on: Taylor & Francis Group - DataSeer SnapShot

This year, the judges have selected four finalists for the ALPSP Award for Innovation in Publishing.

The finalists will be showcased in a lightning presentation session at the ALPSP Conference on 10 September, with the winners announced at the ALPSP Conference Awards Dinner.

In this series, we learn more about each of the finalists and their entries.

Tell us about your organization. 

Taylor & Francis supports diverse communities of experts, researchers and knowledge makers around the world to accelerate and maximize the impact of their work. We are a leader in our field, publish across all disciplines and have one of the largest Humanities and Social Sciences portfolios. Our expertise, built on an academic publishing heritage of over 200 years, advances trusted knowledge that fosters human progress. Under the Taylor & Francis, Routledge and F1000 imprints, we publish more than 2,500 journals and 8,000 new books each year, and partner with more than 700 scholarly societies. For this ALPSP Innovation Awards entry, we worked jointly with DataSeer, an AI open science solution provider, to test and enhance its SnapShot tool at scale, with the aim of reviewing and improving the transparency and replicability of published research.

What is the project/product that you submitted for the Awards? 

In this collaborative project, we have worked closely with DataSeer to pilot and configure SnapShot, an AI tool designed to rapidly assess whether submitted manuscripts meet our data sharing requirements and editorial policy expectations. Rather than performing generic presence/absence checks, SnapShot is trained to align with the specific requirements of the Taylor & Francis data sharing policies (see image): it not only checks for evidence of data sharing, but also assesses the author’s data sharing approach and identifies where improvements could be made. The intention is to help us consistently enforce our editorial data sharing policies at scale, while removing the burden of these complex checks from our administrative staff - and supporting our authors in their publishing journey. 


Tell us a little about how it works and the team behind it 

SnapShot is developed by DataSeer, a company specialising in AI-driven tools for open science compliance. SnapShot combines a natural language processing (NLP) pipeline to identify datasets and extract relevant text with a large language model (LLM) that evaluates policy compliance, checks repository links, and generates recommended next steps. 

In our pilot, SnapShot was configured to align with multiple levels of Taylor & Francis’s data sharing policies, from “Share on Request” to “Open Data”. The tool can screen submissions in seconds and generate editor- and author-facing feedback tailored to our policy requirements. 
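SnapShot itself is proprietary, and its NLP-plus-LLM pipeline is not published, but the two-stage shape described above (extract the data availability statement, then classify it against policy tiers) can be illustrated with a minimal sketch. The keyword heuristics, function names, and sample text below are entirely hypothetical stand-ins for what would, in the real tool, be model-driven analysis:

```python
import re

# Policy tiers named in the post, from least to most open.
# The matching rules below are illustrative only; SnapShot uses an
# NLP pipeline plus an LLM, not keyword matching.
def extract_das(manuscript_text: str) -> str:
    """Pull out the Data Availability Statement section, if present."""
    match = re.search(
        r"Data Availability Statement[:\s]*(.+?)(?:\n\n|$)",
        manuscript_text,
        flags=re.IGNORECASE | re.DOTALL,
    )
    return match.group(1).strip() if match else ""

def classify_sharing(das: str) -> dict:
    """Crude tier classification based on common DAS phrasings."""
    das_lower = das.lower()
    if "available at" in das_lower or "doi.org" in das_lower:
        tier = "Open Data"
    elif "on request" in das_lower or "upon request" in das_lower:
        tier = "Share on Request"
    else:
        tier = "Unclear"
    return {"statement": das, "tier": tier}

manuscript = (
    "Methods...\n\n"
    "Data Availability Statement: The data supporting this study "
    "are available at https://doi.org/10.5555/example.\n\n"
    "References..."
)
print(classify_sharing(extract_das(manuscript))["tier"])  # Open Data
```

The point of the sketch is the separation of concerns: locating the statement is a distinct step from judging it against a specific policy tier, which is what allows the same extraction to serve multiple policy configurations.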

Our internal teams – including editorial operations, open science, and implementation specialists – worked closely with DataSeer’s AI and editorial experts to evaluate the tool’s performance, accuracy, and usability, ensuring it supports rather than replaces human judgment in our editorial workflows. 

In what ways do you think it demonstrates innovation? 

Currently there are no comparable research integrity tools which can support this level of manuscript checking for data sharing. The data availability statements which authors use to describe their datasets are not templated or consistent, and datasets can be shared in various ways including on request, via supplementary files, or in data repositories. Some authors need to use policy exemptions to avoid sharing sensitive data openly. Without significant training, journal administrators can struggle to identify appropriate data sharing - DataSeer’s SnapShot can assess a manuscript and return feedback in moments. DataSeer’s process includes an assessment of data sharing methods used by the author, identification of data repositories used, checks of URLs for live data, and analysis of exemption requests to confirm legitimacy. We aimed for an accuracy benchmark of 75%, but with our current iteration the tool is already scoring 97%, which is very impressive given the complexity of the checks. 
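One of the checks listed above, confirming that URLs in a data availability statement point at live data, is simple to picture in isolation. The sketch below is not DataSeer's implementation, just a minimal stdlib version of the idea under the assumption that a 2xx/3xx response counts as "live":

```python
from urllib.request import Request, urlopen
from urllib.error import URLError

def link_is_live(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL responds with a 2xx/3xx status.

    A HEAD request keeps the check lightweight; some repositories
    reject HEAD, so a production checker would fall back to GET,
    and would likely also resolve DOIs and inspect landing pages.
    """
    try:
        request = Request(url, method="HEAD")
        with urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (URLError, ValueError):
        return False
```

Malformed or dead links simply come back `False` (for example, `link_is_live("not-a-url")` returns `False`), so the result can feed directly into author-facing feedback rather than raising an error mid-check.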

This tool creates a huge opportunity for publishers to begin enforcing more stringent data sharing checks across their journals and portfolios. We know that the implementation of editorial policies for data sharing has slowed in recent years, and the STM Association’s Research Data Program estimated approximately 52% uptake as of 2020. The TIER2 project (via a group of 20 representatives of academic publishers) has identified major barriers to the implementation of data sharing policies, including costs, resources, training gaps and a lack of scalable technical solutions. The continued gap in FAIR (Findable, Accessible, Interoperable, Reusable) research data policy implementation and enforcement means journals lose out on key benefits including enhanced transparency and trust, increased citations, alignment with global funder policy requirements, and even deterrence of bad actors or paper mills. The SnapShot tool allows publishers to support better research transparency and data sharing in a consistent and scalable way. 

What are your plans for the future? 

We are continuing to evaluate and refine the SnapShot tool through a staged development roadmap, with the aim of exploring scalability across our portfolio. 

In the immediate term, we are working with DataSeer to expand SnapShot’s capabilities in line with both the Taylor & Francis and F1000 Open Data policies. This includes developing more advanced checks for: 

  • Data licensing, ensuring that shared datasets meet requirements for reusability. 
  • Repository suitability, confirming that datasets are deposited in appropriate and trusted repositories. 
  • Formal data citations, supporting improved credit and discoverability. 

We are also preparing to launch a live pilot within our editorial submission workflows, which will allow us to gather performance metrics and qualitative feedback from journal administrators on areas such as triage speed, accuracy, and editorial usability. 

Based on the success of this pilot, our roadmap includes several further developments: 

  • Templated author communications: We are working with DataSeer to generate bespoke, policy-aligned email templates for administrative and editorial teams to use when requesting changes to data availability statements. 
  • Iterative refinements: Feedback from the live pilot, including from journal administrators via KGL (KnowledgeWorks Global) and other implementation partners, will directly inform future improvements to the tool’s logic, outputs, and usability. 
  • Conversational AI interface: Looking further ahead, we are exploring the development of a “chat with the AI” feature that would allow editors and authors to interact with SnapShot in a more dynamic way – asking questions, receiving explanations, and tailoring feedback in real time. 

Through these enhancements, we hope not only to support better compliance with data sharing policies, but also to build scalable, AI-assisted workflows that make the publication process more efficient, transparent, and researcher-friendly. 

About the author

This blog was co-authored with the support of Tim Vines and Adrian Stanley from the DataSeer team. 

Dr. Rebecca Taylor-Grant is Director of Open Science Strategy & Innovation at Taylor & Francis, where she leads the development of policies, practices and pilots to support the publication of open, transparent and reproducible research. She has a background in data management for the humanities and social sciences and is co-chair of the STM Association’s Research Data Program Humanities Data Subgroup, as well as the Research Data Alliance’s Research Data Policy Interest Group.  
