Monday, 1 September 2025

What Recent Court Decisions Mean for AI, Copyright, and Scholarly Publishing

By Roy S. Kaufman, CCC

Bronze sponsor of the ALPSP Annual Conference and Awards 2025. 

AI is not one thing. It’s a collection of technologies, models, and use cases: everything from large language models that generate text, to tools that fine-tune outputs or access content as needed for specialized tasks. That diversity matters when we talk about copyright, licensing, and litigation, because different AI uses raise very different legal and business issues. Moreover, how materials are (or were) gathered for AI, how they are stored, and even which judge reviews the case, may well change the outcome. 

Recent U.S. court decisions underscore this point. They don’t settle the AI copyright debate, but they do shape the way we should think about licensing and litigation going forward. They accomplish this, paradoxically, by increasing the uncertainty around unlicensed use of content. 


The Warhol Decision: Transformative Use is Not a Free Pass 

Fair use cases rarely reach the Supreme Court, so those that do tend to receive a lot of scrutiny. The Warhol case, which was not about AI, narrowed the way “transformative use” should be applied in fair use analysis. The justices emphasized that transformative use isn’t binary. It’s a matter of degree, and it must be weighed against factors like commercial purpose and market harm. 

In plain terms: it’s not enough to say, “we changed it, so it’s fair use.” Courts will look closely at whether the secondary use serves the same purpose as the original, and if it does, the case for fair use gets much weaker. 

For scholarly publishers, that reasoning is significant. AI developers often argue that ingesting and reusing your works is “transformative” because the technology outputs something new. But if the AI performs a similar function to your original, such as supplying factual or analytical content that competes in the same market, it may be seen as substitution, not transformation. 


The AI Cases 

There are more than 40 pending AI training cases in the U.S. (plus others in Japan, Europe, China, the UK, and elsewhere), and we now have preliminary decisions in three. If I were to sum them up in a word, it would be “inconsistent.” To vastly oversimplify the results: training AI is non-transformative infringement (Thomson Reuters v. Ross); training is transformative and mostly fair use, with the major caveat that pirated materials cannot be used (Bartz); or training is mostly not fair use, but was fair use in the case before the court because the lawyers did not plead correctly, and it was irrelevant that pirated content was used (Kadrey). 

I can criticize and disagree with major elements of the last two cases, but what is important is that courts looking at similar use cases (training) are today arriving at vastly different results, and these cases will take years to be resolved. And, because litigation tends to solve yesterday’s problem tomorrow, none of the decided cases mentions fine-tuning, retrieval-augmented generation (RAG), or the Model Context Protocol (MCP), which are newer AI applications and have been the subject of reported deals between AI companies and scholarly publishers. 


Why This Matters for Scholarly Publishing 

Fair use isn’t a blanket defense for mass, unlicensed ingestion, storage, and copying of copyrighted content. On the other hand, publishers simply should not rest on their rights given some clearly adverse language in the Bartz and Kadrey cases. Moreover, publishers cannot assume their materials have not been used simply because they refuse to license. In fact, the opposite is true. 

From a litigation perspective, whether the AI company uses content for training, fine-tuning, RAG, or MCP, we need to consider: 

  • Does the AI use substitute for the original, in whole or in part? 
  • Could there be an actual or potential licensing market for this use? 

But litigation is not an end; it is a means. Publishing is not about preventing others from using content. It is about creating high-quality works and ensuring they are used appropriately, with compensation flowing back to sustain future content creation. 


Practical Steps for Publishers 

While the legal landscape is still evolving, the only wrong thing to do is wait on the sidelines. The necessary steps are: 

  • Define your licensing terms now. Specify how your works can be used in AI training, RAG, MCP, or fine-tuning. When content is used without your consent, you have no control and do not get paid. Imperfect licenses are better than no licenses. 
  • License directly or join collective licensing arrangements. Working with CCC and others, publishers can pool rights to create scalable, attractive licensing opportunities for AI developers and users. 
  • Speak up. Engage in industry associations, submit comments to policymakers, and ensure your perspective as an academic publisher is heard in AI governance debates. 
  • Experiment strategically. Some AI developers are already licensing content. Early deals will help you learn the market’s value and refine your terms. 


The Road Ahead 

Litigation will continue for years, and different courts will reach different conclusions. Waiting for “final clarity” on training could mean missing years of licensing revenue and letting market practices solidify in ways that disadvantage you. 

Ultimately, all courts will need to take market harm seriously, and when they do, they will recognize the legitimacy of licensing as a solution. For scholarly publishers, that’s both a shield and an opportunity. 

The future of AI will be shaped not just in courtrooms, but in direct licensing negotiations, collective rights initiatives, and the choices we make today about how our works can (and cannot) be used. 

About the author


Roy Kaufman is Managing Director of both Business Development and Government Relations for CCC. Prior to CCC, Kaufman served as Legal Director, John Wiley and Sons, Inc. He is a member of, among other things, the Bar of the State of New York, the Authors Guild, and the editorial board of UKSG Insights. Kaufman also advises the US Government on international trade matters through membership in International Trade Advisory Committee (ITAC) 13 – Intellectual Property and the Library of Congress’s Copyright Public Modernization Committee. He serves on the Executive Committee of the United States Intellectual Property Alliance (USIPA) Board. He was the founding corporate Secretary of CrossRef and formerly chaired its legal working group. He is a Chef in the Scholarly Kitchen and has written and lectured extensively on the subjects of copyright, licensing, open access, artificial intelligence, metadata, text/data mining, new media, artists’ rights, and art law. Kaufman is Editor-in-Chief of Art Law Handbook: From Antiquities to the Internet and author of two books on publishing contract law. He is a graduate of Brandeis University and Columbia Law School. 

Thursday, 28 August 2025

Spotlight on: Taylor & Francis Group - DataSeer SnapShot

This year, the judges have selected four finalists for the ALPSP Award for Innovation in Publishing.

The finalists will be showcased in a lightning presentation session at the ALPSP Conference on 10 September, with the winners announced at the ALPSP Conference Awards Dinner.

In this series, we learn more about each of the finalists and their entries.

Tell us about your organization. 

Taylor & Francis supports diverse communities of experts, researchers and knowledge makers around the world to accelerate and maximize the impact of their work. We are a leader in our field, publish across all disciplines and have one of the largest Humanities and Social Sciences portfolios. Our expertise, built on an academic publishing heritage of over 200 years, advances trusted knowledge that fosters human progress. Under the Taylor & Francis, Routledge and F1000 imprints, we publish more than 2,500 journals and 8,000 new books each year, and partner with more than 700 scholarly societies. For this ALPSP Innovation Awards entry, we worked jointly with DataSeer, an AI Open Science solution provider, to test and enhance its SnapShot tool at scale, with the aim of reviewing and improving the transparency and replicability of published research. 

What is the project/product that you submitted for the Awards? 

In this collaborative project, we have worked closely with DataSeer to pilot and configure SnapShot, an AI tool designed to rapidly assess whether submitted manuscripts meet our data sharing requirements and editorial policy expectations. Rather than performing generic presence/absence checks for evidence of data sharing, SnapShot is trained to align with the specific requirements of our Taylor & Francis data sharing policies (see image): not just to check for evidence of data sharing, but also to assess the author’s data sharing approach and where improvements could be made. The intention is to help us consistently enforce our editorial data sharing policies at scale, while removing the burden of these complex checks from our administrative staff and supporting our authors in their publishing journey. 

 

Tell us a little about how it works and the team behind it 

SnapShot is developed by DataSeer, a company specialising in AI-driven tools for open science compliance. SnapShot combines a natural language processing (NLP) pipeline to identify datasets and extract relevant text with a large language model (LLM) that evaluates policy compliance, checks repository links, and generates recommended next steps. 

In our pilot, SnapShot was configured to align with multiple levels of Taylor & Francis’s data sharing policies, from “Share on Request” to “Open Data”. The tool can screen submissions in seconds and generate editor- and author-facing feedback tailored to our policy requirements. 
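The two-stage design described above (an NLP pass that finds the relevant text, followed by a model that judges it against a policy level) can be sketched in miniature. The sketch below is purely illustrative: the function names, policy levels, and the rule-based second stage are hypothetical stand-ins for DataSeer's actual NLP/LLM pipeline, which is not public.

```python
import re

# Illustrative sketch only: SnapShot's real pipeline uses an LLM for
# stage 2; here a simple rule stands in so the example is runnable.

def extract_data_statement(manuscript_text):
    """Stage 1 (NLP stand-in): locate the data availability statement."""
    match = re.search(
        r"Data Availability Statement[:\s]+(.+?)(?:\n\n|$)",
        manuscript_text, re.IGNORECASE | re.DOTALL,
    )
    return match.group(1).strip() if match else None

def assess_compliance(statement, policy_level):
    """Stage 2 (policy-check stand-in): compare statement to policy."""
    if statement is None:
        return {"compliant": False,
                "next_step": "Request a data availability statement."}
    has_repository_link = bool(re.search(r"https?://\S+", statement))
    if policy_level == "Open Data" and not has_repository_link:
        return {"compliant": False,
                "next_step": "Ask the author to deposit data in a repository."}
    return {"compliant": True, "next_step": "No action required."}

manuscript = """Introduction...

Data Availability Statement: The data are openly available at
https://doi.org/10.5072/example.

References
"""
statement = extract_data_statement(manuscript)
print(assess_compliance(statement, "Open Data"))
```

The point of the split is the same one the pilot relies on: cheap text extraction narrows the input, so the expensive policy judgment (the LLM in SnapShot's case) only sees the passage that matters.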

Our internal teams – including editorial operations, open science, and implementation specialists – worked closely with DataSeer’s AI and editorial experts to evaluate the tool’s performance, accuracy, and usability, ensuring it supports rather than replaces human judgment in our editorial workflows. 

In what ways do you think it demonstrates innovation? 

Currently there are no comparable research integrity tools which can support this level of manuscript checking for data sharing. The data availability statements which authors use to describe their datasets are not templated or consistent, and datasets can be shared in various ways, including on request, via supplementary files, or in data repositories. Some authors need to use policy exemptions to avoid sharing sensitive data openly. Without significant training, journal administrators can struggle to identify appropriate data sharing; the DataSeer SnapShot can assess a manuscript and return feedback in moments. DataSeer’s process includes an assessment of data sharing methods used by the author, identification of data repositories used, checks of URLs for live data, and analysis of exemption requests to confirm legitimacy. We aimed for an accuracy benchmark of 75%, but with our current iteration the tool is already scoring 97%, which is very impressive given the complexity of the checks. 
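One of the simpler checks in the list above, verifying that a URL in a data availability statement actually resolves to live data, can be illustrated with standard-library Python. This is a hedged sketch, not DataSeer's implementation: the function names are invented, and a production check would also handle redirects, rate limits, and repository-specific APIs.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen

# Hypothetical sketch of a "check URLs for live data" step.

def looks_like_repository_url(url):
    """Cheap structural check before making any network request."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

def check_url_live(url, timeout=10):
    """Issue a lightweight HEAD request to confirm the link resolves."""
    if not looks_like_repository_url(url):
        return False
    try:
        request = Request(url, method="HEAD")
        with urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        return False

print(looks_like_repository_url("https://doi.org/10.5072/example"))  # True
print(looks_like_repository_url("on request"))  # False
```

Separating the structural check from the network check matters at scale: most malformed "links" (free text like "on request") can be rejected without issuing a request at all.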

This tool creates a huge opportunity for publishers to begin enforcing more stringent data sharing checks across their journals and portfolios. We know that the implementation of editorial policies for data sharing has slowed in recent years, and the STM Association’s Research Data Program estimated approximately 52% uptake as of 2020. The TIER2 project (via a group of 20 representatives of academic publishers) has identified major barriers to the implementation of data sharing policies, including costs, resources, training gaps and a lack of scalable technical solutions. The continued gap in FAIR (Findable, Accessible, Interoperable, Reusable) research data policy implementation and enforcement means journals lose out on key benefits including enhanced transparency and trust, increased citations, alignment with global funder policy requirements, and even deterrence of bad actors or papermills. The SnapShot tool allows publishers to support better research transparency and data sharing in a consistent and scalable way. 

What are your plans for the future? 

We are continuing to evaluate and refine the SnapShot tool through a staged development roadmap, with the aim of exploring scalability across our portfolio. 

In the immediate term, we are working with DataSeer to expand SnapShot’s capabilities in line with both the Taylor & Francis and F1000 Open Data policies. This includes developing more advanced checks for: 

  • Data licensing, ensuring that shared datasets meet requirements for reusability. 
  • Repository suitability, confirming that datasets are deposited in appropriate and trusted repositories. 
  • Formal data citations, supporting improved credit and discoverability. 

We are also preparing to launch a live pilot within our editorial submission workflows, which will allow us to gather performance metrics and qualitative feedback from journal administrators on areas such as triage speed, accuracy, and editorial usability. 

Based on the success of this pilot, our roadmap includes several further developments: 

  • Templated author communications: We are working with DataSeer to generate bespoke, policy-aligned email templates for administrative and editorial teams to use when requesting changes to data availability statements. 
  • Iterative refinements: Feedback from the live pilot, including from journal administrators via KGL (KnowledgeWorks Global) and other implementation partners, will directly inform future improvements to the tool’s logic, outputs, and usability. 
  • Conversational AI interface: Looking further ahead, we are exploring the development of a “chat with the AI” feature that would allow editors and authors to interact with SnapShot in a more dynamic way – asking questions, receiving explanations, and tailoring feedback in real time. 

Through these enhancements, we hope to not only support better compliance with data sharing policies, but also to build scalable, AI-assisted workflows that make the publication process more efficient, transparent, and researcher-friendly. 

About the author

This blog was co-authored with the support of Tim Vines and Adrian Stanley from the DataSeer team. 

Dr. Rebecca Taylor-Grant is Director of Open Science Strategy & Innovation at Taylor & Francis, where she leads the development of policies, practices and pilots to support the publication of open, transparent and reproducible research. She has a background in data management for the humanities and social sciences and is co-chair of the STM Association’s Research Data Program Humanities Data Subgroup, as well as the Research Data Alliance’s Research Data Policy Interest Group.