Monday 3 February 2014

Managing the open access data deluge without going grey

Cameron Neylon: the OA data deluge
The final two sessions at ALPSP's Data, the universe and everything seminar reflected on the changing nature of data within an open access context and what needs to be taken into account when trying to cope with data.

Cameron Neylon, Advocacy Director at PLOS, counselled delegates on 'Managing a (Different) Data Deluge'. Publishing is now a different business. Customers may look the same, but they act differently, and you have to think differently. Data is core to the value you give.

There's no sign of the growth trajectory of open access publishers slowing down. PLOS One on its own accounts for 11% of the research papers funded by the Wellcome Trust and 5% of the biomedical literature. It publishes on average 100 papers per day. All the metadata PLOS holds comes from the authors, and the publisher doesn't necessarily have accurate data on who those authors are or where they are based, so it gets complicated. This is happening on a large scale across scholarly communication services.

Neylon believes that the business of open access publishing is fundamentally different to subscription publishing. With a traditional subscription business you have a pool of researchers and institutions. Advertising and reprints come from third parties. This is a distribution model and not so much about where the research has come from.

With an APC-funded open access business it is a service or push model. The customer is the author at some level. Increasingly (in the UK, for example) the money is coming through the funder. This means that suddenly all these players have an interest which they didn't have before. A third model is the funders directly funding infrastructure (e.g. eLife, PDB, GenBank).

The customer is now the institution, the author and the funder. They have questions: How much? How many articles have you published? What's the quality of service? Are there compliance guarantees? (This is relatively simple in the UK, but tricky in North America or the EU.) They want repository deposit. And all this has to happen at scale. You need to track who funded the research. This means that the market is being commoditized. It also means that the market is smaller, with less room to make a profit.

Neylon feels that if we do not do this collectively, the whole system will collapse and we'll be left with one or two big players. Using identifiers, capturing data up front and making it easy for the author to include the correct data up front are key to tackling the data deluge we face. If we don't, we will have lost the opportunity. It's about shared data identifiers and putting them at the core of your systems.

He reflected on the particular challenge that smaller publishers face if they are to survive. They need to share infrastructure across multiple organisations. ALPSP is well placed to support them and to advise suppliers that smaller publishers need ORCID, FundRef and similar identifiers up front.
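To make "capturing identifiers up front" a little more concrete, here is a minimal sketch of how a submission form might sanity-check an ORCID iD before accepting it. It relies on the published ORCID format (16 characters in four hyphen-separated groups, the last being an ISO 7064 MOD 11-2 check digit). The function names are hypothetical and this is not taken from any publisher's system; it only catches mistyped iDs and says nothing about whether the iD belongs to the author.

```python
import re

# Pattern for the display form of an ORCID iD, e.g. 0000-0002-1825-0097.
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the string is well formed and its check digit matches."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]
```

A form could call `is_valid_orcid` as the author types and flag a mistyped iD immediately, rather than leaving the publisher to clean up the metadata later.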

Ann Lawson, Senior Director of Publisher Relations and EBSCOAdvantage Europe, focused on the various challenges for managing open access data without going grey. EBSCO see the impact of data from their own perspective (with 27 million articles in the EBSCO database products) and also from the perspectives of their client publishers and institutions. They have their own ID systems, but also input any partner or publisher IDs, which results in 485 data elements per subscription record.

Ann Lawson: trying not to go grey
In a recent research report drawn from their own data, they've noted that large publishers are getting larger: in 1994 the top 10 publishers were responsible for 19% by value. In 2009, the top 10 publishers represented 50% by value. And in 2013, the top 10 publishers accounted for 68% by value.

In the immediate future, EBSCO see a mixed market of Gold, Green and Subscriptions within scholarly communications. However, there will be an impact on transactions from individual journals, to big deals, to small gold open access APCs. The impact on subscription agents is challenging as they have to keep on doing what they do, plus play in the open access area. There is a challenge of scale and transparency for everyone.

What will these market trends mean for data? There is a new cycle for open access which impacts on the need for data. This includes measures of value for money, speed to publication, reach and impact, reporting, funding sources, and the approval process.

There are data issues for the institution: Who are the active authors? What funding sources are available? Which funders demand what compliance? Which journals are compliant? What happens at school or research group level? How much does the APC cost? Who paid what, with what effect? What reporting is needed for whom? And then there is compliance and deposit in repositories.

The institution workflow is at the heart of the data flow:

  • Policies
  • Advocacy 
  • OA request form 
  • Acceptance email 
  • Funding pot 
  • Copy of invoice 
  • Article and DOI 
  • CC licence 
  • VAT 
  • Approvals 
  • Records 
  • Reporting and analysis.
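As a rough illustration of how the later steps in that workflow (funding pot, invoice, DOI, licence, VAT, approvals, reporting) might hang together as data, here is a minimal sketch of one record per paid article plus a "who paid what" report. Every name here (`ApcRecord`, `spend_by_funder`, the field names) is an assumption made for the example, not something described in the talk or used by any particular institution.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ApcRecord:
    """One hypothetical APC payment record kept by an institution."""
    doi: str              # article identifier
    funder: str           # who funded the research
    funding_pot: str      # internal budget the APC was charged to
    licence: str          # e.g. "CC BY"
    net_amount: Decimal   # invoice amount before VAT
    vat_rate: Decimal     # e.g. Decimal("0.20") for 20%
    approved: bool        # whether the payment passed approval

    @property
    def gross_amount(self) -> Decimal:
        """Invoice amount including VAT."""
        return self.net_amount * (Decimal("1") + self.vat_rate)


def spend_by_funder(records):
    """Total gross spend per funder, counting approved records only."""
    totals = defaultdict(Decimal)
    for record in records:
        if record.approved:
            totals[record.funder] += record.gross_amount
    return dict(totals)
```

Keeping the funder and DOI on every record from the start is what makes the final reporting step a simple aggregation rather than a reconciliation exercise.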

The reality is that many publishers do not have the ability to adapt their systems. Current points of tension include money management, complex workflows and author involvement. Discovery is key, but can be tricky with hybrid journals, so discovery at article level is essential. NISO is helping, but there is more work to be done in this and many other areas of data.
