ALPSP blog: at the heart of scholarly publishing: Publishers Association


Thursday, 20 October 2016

The Brexit Debate

Audrey McCulloch introduces the panel

One of the key topics of conversation this year has, inevitably, been Brexit. Since the results of the UK Referendum on membership of the EU were announced in June, speculation, knee-jerk reaction, panic, and uncertainty have been rife. What does it really mean for scholarly communications? Are the worst-case scenarios likely to come to pass? What are the worst-case scenarios and what are the opportunities?

ALPSP, the Publishers Association and The London Book Fair arranged a debate with leading industry figures to find out more. The conversation will continue at the Research & Scholarly Publishing Forum at LBF17 in March.

The debate, introduced by ALPSP CEO Audrey McCulloch and chaired by RELX Group's Richard Mollet, featured academic consultant Richard Fisher and Andy Robinson, SVP and Managing Director for Society Services at Wiley.

Currency, taxation and the economy

On a positive note, there are short-term currency gains. There is an outside possibility of eliminating VAT on ebooks and digital products, and potential to get government departments to support emerging markets and UK research to boost the technology market, for example by rebuilding the trials sector. However, there is also a risk of the UK becoming a regulatory island. In the short term, the audience were advised to keep an eye on any bump in book sales to check there aren't any inventory blips on a regional basis while the currency is in flux.

Research and development

Scientists for EU has collected examples of academics who refused positions in the UK. There are over 40 examples of researchers being taken off grant applications or made a contributor instead. In institutions, there are 31,000 researchers based in the UK who come from the EU. There is a long term question about how that will impact on the quality and impact of EU research. UK research is a strong contributor to the economy and if damaged could have a long term impact on other sectors. And the European Medicines Agency moving from the UK will have a converse impact on the pharma sector's appetite to launch drugs in the UK. Currently 2% of the world's clinical trials happen in the UK. There is an opportunity to rebuild that, which in turn will bring in pharma investments (GSK has invested £250m in UK since Brexit).

People, influence and perception

There are a number of implications for the publishing industry and related sectors in relation to people. Ten per cent of the UK publishing workforce comes from the EU (compared to 6% in the wider population) and companies will have to consider how to manage that impact.

It was noted that the current Prime Minister, Theresa May, was previously Home Secretary. During her tenure in that role she was very strict on the number of overseas students. It is unlikely this line will change, and institutions are watching with concern. The knock-on financial effect could be significant, with 25,000 students studying in the UK - and 46% of postgraduates coming from the EU - generating income for institutions and local businesses.

Richard Mollet, Richard Fisher, Andy Robinson

How does industry make the case to government without looking like it is moaning about the result? There was a strong reaction post-Brexit from researchers and publishers. We need to respond to the challenges robustly, but on new terms, by putting forward a strong economic case for investment. With three government departments (Education; Culture, Media and Sport; and Business) influencing the world of publishers and researchers, it might be tricky to navigate.

There is no doubt that the UK now sits outside debates in the EU, for example on open access and open science. Where Britain leads, the world does not necessarily follow, so the possibility of isolation is very real. Wiley and ALPSP ran a short poll of their society publishers. In it, 80% of respondents foresaw a loss of influence in these policy debates, particularly around open science.

Copyright and data protection

In the long term there is the possibility of a move to the older world of a US copyright regime, a UK regime and an EU regime, leading to potential fragmentation. Google will no doubt be watching with interest, particularly around Fair Use issues. The UK is unique in having access to millions of patient records, and if data protection issues can be navigated carefully, there is real research potential in monitoring patient outcomes. We have the potential to become world leaders in fields such as stem cell research.

The future?

What will we be talking about at Frankfurt 2021 in five years? Will it still be a big issue or just one of the things we’re dealing with? It is hard to predict. On the one hand, the British political situation is in such flux that it is impossible to say, although we should have a much clearer path by 2021. On the other, it will be one of a number of issues that the industry will face.

The industry should reflect on why they don’t employ, publish for or sell to the 52% of voters. June 23 is without doubt one of the most important post-war dates in British history. While there is a worrying flux at government level that will play out over the next two to three years, politicians need to wake up to the importance of research. The industry needs to encourage the UK government to take research seriously and place it at the heart of the negotiation. A national research strategy should be brought together as quickly as possible to ring fence funding up to 2025.

The Brexit debate at the Frankfurt Book Fair was organised by the Publishers Association and ALPSP with the support of the London Book Fair. The discussions will continue at the Research & Scholarly Publishing Forum to be held at LBF in March 2017.

Elsevier has launched a Brexit resource centre to provide benchmark objective data, useful links and other resources.

Tuesday, 4 February 2014

Access to Research pilot launched

Minister for Universities & Science,
David Willetts, addresses the audience

Last night saw the launch of the Access to Research pilot at The Library at Deptford Lounge in Lewisham, South London. The pilot, a two year project in the UK to provide free access to research via computers in public libraries, was launched by the Publishers Licensing Society with guest speaker, the Rt Hon David Willetts, Minister of State for Universities and Science.

The two year pilot has over 1.5 million articles from 8,400 scholarly and academic journals available in 79 local authority libraries. The initiative is supported by trade bodies the Publishers Association and the Association for Learned and Professional Society Publishers, as well as the Society of Chief Librarians and technical partner ProQuest.

Janene Cox, President of the Society
of Chief Librarians

The project will allow users to search and read scholarly research articles while in the library. It is anticipated it will be of particular relevance to small business, students and special interests, where the person doesn't have access to an institutional library.

Libraries and publishers are being encouraged to sign up to boost the number of articles that are included and to increase the number of locations where the content can be accessed.

'The government believes in open access, but understands there is a cost to publication.' David Willetts

David Willetts was joined by PLS Chief Executive Sarah Faulder; Janene Cox, President of the Society of Chief Librarians; Richard Mollet, Chief Executive of the Publishers Association; and Phill Hall, project contact at technical partner ProQuest.

Sarah Faulder, PLS Chief Executive

'This is an important initiative and working across organisations in a partnership effort has involved compromise and risks to make this pilot launch.' Janene Cox

ALPSP is delighted to support the project through promoting participation to our members as well as access to our journal Learned Publishing. Further information about the initiative is available on the Access to Research microsite.

News coverage to date includes articles on the BBC, The Bookseller, PR Newswire,

Sunday, 26 May 2013

Text and Data Mining: rights holder licensing tools

Text and Data Mining: international perspectives, licensing and legal aspects was a Journals Publisher Forum seminar organised in conjunction with ALPSP, the Publishers Association and STM held last week in London. This is the last in a series of posts summarising discussions.

Sarah Faulder, Chief Executive of the Publishers Licensing Society announced they are developing PLS Clear – the PLS clearing house – a central window to handle license requests that will be a rights holder search and discovery service.

Text and data mining involves access to, and usage of, articles in bulk. Researchers need to track and contact potentially hundreds of publishers for permission to mine their text. The PLS service will connect researchers to rights owners for search and discovery.

Publishers already entrust licensing their secondary rights to PLS on a non-exclusive basis. As a result PLS has built arguably the most comprehensive database in the UK of publishers and their content (by ISBN/ISSN and, in due course, by DOI). This is a natural role for PLS and the network of Reproduction Rights Organizations all over the world.

They are testing a single discovery portal through which researchers can both find the appropriate publisher(s) and route their permissions requests to the relevant person in the publishing house. The plans are for a generic clearing house. The first application is text and data mining, but it will have wider usage over time.

Text and data mining presents a technical infrastructure problem first and foremost. Licensing is a necessary means of managing access to content where the scale of access increases risk of leakage and therefore piracy, and puts an unacceptable strain on publisher platforms not designed for systematic crawling and scraping.

Carlo Scollo Lavizzari is legal advisor to STM on copyright law, policy and legal affairs. Lavizzari outlined how structuring a license is easy. Leave rhetoric aside and look to business opportunities: it is about defining the terms. What are the sources and the input of content? What does the user do with that content? Where is it stored, what is done with it, can it compete or not? Consider the mechanical clause on delivery mechanisms. A license should also deal with the end of the project – always have an exit strategy! That is the skeleton of a legal license.

There are calls for cooperation between those who hold content in the public domain; those who hold open access content; those who hold content that is subscribed to or purchased; those who already hold a lot of purchased content; and researchers who might want to access or mine it. The question they haven’t managed to resolve with any community is how to combine an open access environment with a copyright-protected license. It is an area where he believes that licensing can provide a solution, but one they are still trying to tackle.

John Billington works in corporate products and services at Copyright Clearance Center, who’ve been working on their Text and Data Mining Pilot Service. They have developed a pilot service that provides licensed, compliant access and retrieval of full text XML and metadata from multiple scientific publishers for the purposes of text mining.

CCC’s role is to provide an authorized means to access and retrieve published content in a standard format. The initial pilot is focused on corporate users with an access, retrieval and licensing layer. Future markets may include corporate marketing users, or academic uses.

He reflected on how challenging it has been to extract full text from different publishers and convert it to a normalized format that is usable in text mining technologies. There is a lack of federation and existing tools are still difficult to use. They are trying to provide a one-stop shop for users and publishers that incorporates standardization, a license and business model, and an access method that works for both sides.

He noted that a researcher wouldn’t want to be limited to what the library is subscribed to. So the tool will show them the metadata for what they aren’t subscribed to, and will filter results to help them understand what they have subscribed to or not. They intend to include a purchase mechanism for unsubscribed full-text articles. Users will be able to download results in normalized XML format. It currently has a web interface, but they are working on an API so they can systematize it.

Ed Pentz from CrossRef closed the day by outlining their latest beta application Prospect. They work on the assumption that researchers aren’t doing search or discovery – researchers will know or will have used another tool. Their service relies on DOI Content Negotiation. They are now collecting ORCID IDs for researchers. In text and data mining it is important to have a unique ID for a researcher so you can see who is doing it. They are also including funding information.

DOI content negotiation can serve as a cross-publisher API for accessing full text for TDM purposes. To make use of this, publishers merely need to register URIs to full text. NISO is working on a fuller specification for some metadata. They are focusing on an interim solution to at least record URIs to well-known licenses. They think it will also be possible to extend this to handle embargoes.
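As a rough illustration of what DOI content negotiation looks like from the researcher's side, the sketch below builds (but does not send) a request to the public DOI resolver with an Accept header naming the representation wanted. The DOI shown is a placeholder and the function name is made up for this example; the media type is a citation metadata format, used here only because the post does not say which full-text types publishers would register.

```python
from urllib.parse import quote
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver

def build_negotiated_request(doi: str, media_type: str) -> Request:
    """Build (without sending) a GET request asking the resolver for one representation.

    Hypothetical helper for illustration; the Accept header carries the negotiation.
    """
    url = DOI_RESOLVER + quote(doi, safe="/")
    return Request(url, headers={"Accept": media_type})

# Placeholder DOI and a metadata media type, both assumptions for this sketch.
req = build_negotiated_request(
    "10.5555/12345678",
    "application/vnd.citationstyles.csl+json",
)
print(req.full_url, req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the lookup; what comes back depends on what the registration agency and publisher have registered for that DOI.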

He observed that there’s potential to coordinate across initiatives, but only once each organization has figured things out individually during its own trial period. CrossRef are testing the system over the summer and will then assess if it is workable as a production system.

Wednesday, 22 May 2013

Text and Data Mining: practical aspects of licensing and legal aspects

Alistair Tebbit

Text and Data Mining: international perspectives, licensing and legal aspects was a Journals Publisher Forum seminar organised in conjunction with ALPSP, the Publishers Association and STM held earlier this week in London. This is the second in a series of posts summarising discussions.

Alistair Tebbit, Government Affairs Manager at Reed Elsevier, outlined his company's view on evolving publishers’ solutions in the STM sector. Elsevier have text mining licenses with higher education and research institutions, corporates and app developers. They give access to text miners by transferring content to the user’s system through one of two delivery mechanisms under a separate licence: via API or using ConSyn. The delivery mechanisms have been set up and there are running costs. Their policy to date has been to charge commercial organisations, but not academic users, for the value-added services.

Why is content delivery managed this way? Platform stability is a critical reason. Data miners want content at scale – they generally don’t do TDM on a couple of articles - but delivering scale via their main platform ScienceDirect.com is sub-optimal. APIs or ConSyn are the solution as they leave ScienceDirect untouched. Effectively they are separating machine-to-machine traffic from traffic created by real users going to ScienceDirect.com. Content security is another key issue. Free-for-all access to miners on ScienceDirect would not allow bona fide users to be checked. XML versions are less susceptible to piracy than PDFs. It is also more efficient for genuine text miners. Most miners prefer to work off XML, not from article versions on ScienceDirect. Their delivery mechanisms put the content into data miners' hands fast.

With text and data mining outputs, they use a CC BY-NC licence when redistributing results of text mining in a research support or other non-commercial tool. They require that the DOI link back to the mined article whenever feasible when displaying extracted content. They grant permission to use snippets around extracted entities to allow context when presenting results, up to a maximum of 200 characters or one complete sentence.
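The snippet allowance described above could be applied in code along the following lines. This is only a sketch of one possible reading (either the single sentence containing the entity, or a window of at most 200 characters around it); the function names and the naive sentence splitter are assumptions for illustration, not anything Elsevier has published.

```python
import re
from typing import Optional

MAX_CHARS = 200  # character limit quoted in the post

def sentence_snippet(text: str, entity: str) -> Optional[str]:
    """Return the one sentence containing the first match of entity, or None."""
    # Naive splitter (assumption): a sentence ends at . ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if entity in sentence:
            return sentence
    return None

def window_snippet(text: str, entity: str, max_chars: int = MAX_CHARS) -> Optional[str]:
    """Return at most max_chars characters roughly centred on the entity, or None."""
    pos = text.find(entity)
    if pos == -1:
        return None
    spare = max(max_chars - len(entity), 0)  # characters left over for context
    start = max(pos - spare // 2, 0)
    return text[start:start + max_chars]

sample = "Aspirin is common. BRCA1 mutations raise cancer risk. More follows."
print(sentence_snippet(sample, "BRCA1"))
```

A real implementation would need a proper sentence tokenizer, since abbreviations and decimal numbers defeat the simple pattern used here.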

Licensing is working well at Elsevier and will improve further. The demand to mine is being met and there are no extra charges in the vast majority of cases. Additional services to support mining will likely be offered as they improve. However, it’s early days. Mining demand is embryonic with low numbers at the moment. Copyright exceptions are a big cause for concern and there is a major risk of spike in unauthorized redistribution. Platform stability may be threatened and there is a risk of a chilling effect on future service innovation.

Duncan Campbell

Duncan Campbell, Associate Director for Journal Digital Licensing at Wiley-Blackwell, provided an overview of emerging solutions in text and data mining with a publisher perspective on intermediary solutions. Text and data mining is important to publishers as it enriches published content, adds value for customers and aids development of new products. For researchers it helps identify new hypotheses, discover new patterns, facts and knowledge. For corporate research and development, the benefits are as above and in addition it accelerates drug discovery and development and maximises the value of information spend.

There are a number of barriers to text and data mining:

Access: how can users get hold of content for text mining
Content formats: there is no standard cross-publisher format
Evaluation: understanding user needs and use cases
Uncertainty: what is allowed by law, what is the use of text and data mining output
Business models: lack of business pricing models e.g. access to unsubscribed content
Scale: define and manage demand, bilateral licensing unlikely to be scalable.

There is a potential role for an intermediary to help with the publisher/end user relationship. This could include acting as a single point of access and delivery, and providing standard licensing terms as well as speed and ease of access. The intermediary may make mining extensible and scalable, and can cover the long tail of publishers and end-users. It also enables confidential access, especially in pharma.

Andrew Hughes

Andrew Hughes, Commercial Director at the Newspaper Licensing Agency (NLA), provided a different perspective on text and data mining. Text mining requires copying of all data to establish data patterns and connections, and computers need to index data. Every word on every page has to be copied. Once the copy exists, it needs to be managed. Copying requires access to data, so indexing can only happen in one of two places: on the publisher's database, where there is expense and a risk of damage and disruption unless it is managed; or on a copy provided to the text miner's database, where there are costs and control risks for publishers. He believes that you also need to bear in mind that third party licence partners aren’t always as careful with your data as you are.

In the newspaper sector, press packs are produced by text mining. The NLA eClips is a service where the proprietary way of mining content is withheld and a PDF is supplied of the relevant articles. There are substantial risks for publishers in text mining including the potential for technical errors by miners, challenges around data integrity and commercial malpractice. There are also cost implications including the technical loads on systems, management of copies and uses and opportunity costs.

Hughes cited the Meltwater case where the industry had to tackle the unauthorised use of text and data mining for commercial use. It took a lot of time and litigation, but they are now thriving within the NLA rules. They are licensed by the NLA and their users are licensed. It means they are operating on fair and equal terms with competitors and is an example of how licenses can work to the benefit of all parties.

Monday, 20 May 2013

Text and Data Mining: International Perspectives, Licensing and Legal Aspects

Graham Taylor welcomes delegates

Licensing for text and data mining is a minefield for publishers. How do you use the technology? What are the implications of policy development in Europe and internationally? How do you ensure that your licenses are fair and practical?

Text and Data Mining: international perspectives, licensing and legal aspects, a Journals Publishers Forum seminar organised in conjunction with ALPSP, the Publishers Association and STM, gathered together a range of speakers to try and answer these questions. This is the first in a series of posts from the afternoon that provides a summary of the discussion.

Graham Taylor, founder of The Long Game consultancy and director of both the CLA and PLS, kicked off proceedings with a summary of text and data mining. It has been something of a political hot potato, but has recently settled down. He introduced Jonathan Clark from Jonathan Clark & Partners B.V., author of Text Mining and Scholarly Publishing report from PRC, who provided an overview of the 'What, Why and How?' of text and data mining.

Text mining is about mining or extracting the meaning from one or many articles. Sounds a lot like reading, right? Yes, but it's about a machine doing it. If you imagine that you were teaching a machine to read, how would you do it? You can provide formalised rules of grammar and language and teach the machine to read. The other way is a statistical, rule based approach. This is where you take as much text as is possible and tell it to read. Amazingly, machines do make up the rules and start to make sense of it. The best example is Google Translate. It achieves this by sitting on a vast amount of translated content that it searches to match particular phrases. Why is this important? Think of the way that scholarly communication is done and how it is structured. It is essentially to share facts and to shape opinions from them. Data mining is pretty much the domain of machines where you look for patterns and trends.
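To make the statistical idea a little more concrete, here is a toy sketch in which a program is given no grammar at all and simply counts which word tends to follow which in the text it is fed. The three-sentence corpus and the function names are invented for this illustration; real systems work on vastly more text and far richer models.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for every word, how often each other word follows it."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def most_likely_next(follows, word):
    """Return the word seen most often after `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Invented toy corpus (assumption for illustration only).
corpus = [
    "the gene regulates the protein",
    "the gene encodes the protein",
    "the gene regulates growth",
]
model = train_bigrams(corpus)
print(most_likely_next(model, "gene"))
```

Nothing in the code says that "gene" is a noun or that "regulates" is a verb; the association comes entirely from the counts, which is the sense in which the machine "makes up the rules" from the text.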

Why do text mining?

Getting the facts out of the article, making them sensible and enhancing the text.
Systematic literature review: machines read faster, and read more, than humans ever could, and probably more accurately as well.
Discovery: he referenced brainmap.org, which was compiled manually and has since become a very important resource for researchers on brain scanning.
Computational linguistics research: the new rules about making research available

Eefke Smit is Director of Standards and Technology for STM and co-authored a study on Journal Article Mining on behalf of the Publishing Research Consortium. Historically there has been a mix of optimists and pessimists in text and data mining (TDM).

The sceptics claim:

TDM has always over-promised
It is only in specialised fields
The tools are still complicated
It needs manual curation
There are high investments
It is domain dependent
There is no common dictionary
Subject to over ambition in the promise of knowledge discovery.

However, the optimists counter that:

There is a vast digital corpus available and growing
It has more and more application areas (business, legal, social, etc)
The tools are improving fast
Manual work is reduced
It can be public domain or domain precision
Processing power is less of a problem, analytical tools are better, visualisation adds to analysis.

There are some interesting insights into exactly how publishers approach text and data mining in the report as well as insight into what drives the requests. The third part of the report focused on cross-sector solutions to facilitate content mining better. Suggestions made by experts during the interviews included:

standardization of content formats
one content mining platform
commonly agreed access terms
one window for mining permissions
collaboration with national libraries

It was interesting to note that most of the interviewed experts did not see open access as a related issue; access issues relate to datafile delivery or mining on the platform itself.

Richard Mollet on the latest policy

Richard Mollet, Chief Executive of the Publishers Association, provided what he described as an 'aide-memoire' of how the policy is tracking in the UK and the EU. Since the Hargreaves report in 2011, and the UK Government's subsequent acceptance of all the recommendations, the Intellectual Property Office has been tasked with taking this forward. There is a proposed 'three step test' for text and data mining which will allow copying for the purpose of analytic techniques. The caveats are:

the person already has a right to access under an existing agreement, NOT the ability to access;
for sole purpose of non-commercial research;
the license may impose conditions of access to licenses or third party system (this allows the publisher to impose some restrictions to avoid degrading the whole system, for maintenance of some form of control).

There is a tension when something that is legal under copyright law is then prevented by a contract. This has made the translation from policy document to parliamentary language even more difficult, and it has hence been delayed. Because of this, it is likely that this won't be UK legislation until October 2014.

In parallel, the European Commission has come to that view itself. There are stakeholder dialogue working groups that are trying to identify short term wins. One of these is on data and text mining: trying to ascertain whether anyone wants to do it and, if so, how they want to do it. However, there are real tensions within the Commission, with different positions between rights-holder communities, who feel they can fix this, with significant work already underway, and the research community, which believes the system is broken and needs a complete overhaul. The risk here is that it will move from dialogue to monologue, as researchers have indicated that any licensing solution - as opposed to the reopening of the Copyright Directive - will be insufficient for their purposes.