Friday 12 September 2014

Welcoming the robots

Mark Bide, Chairman of the Publishers Licensing Society, chaired the penultimate panel of the ALPSP International Conference, on text and data mining (TDM).

Gemma Hersh, Policy Director at Elsevier, talked through the Elsevier TDM policy. It has been controversial, with calls to change it. Central to the policy is the use of the ScienceDirect API, which is designed to help preserve the performance of the website for everyone else.

One controversy is that a license is Elsevier's way of exerting control. However, they have a global license (which complies with the UK copyright exception and balances with copyright frameworks). Another complaint concerns the click-through agreement: critics believe it controls what researchers are doing and takes control away from libraries in order to place liability on researchers. However, it is an automatic process with no additional liability; it is aligned with institutional e-amendment, provides guidelines on reuse and can offer one-to-one support.

Another complaint is that they didn't allow text mining of images. The reason was that they did not hold copyright in all the images, so they would only do it on request. However, they now do it automatically and include terms of use flagging when the copyright owner needs to be contacted because copyright doesn't lie with Elsevier.

There were criticisms that they were trying to claim copyright over TDM output. This was inadvertent and they have adjusted the policy to be a little more flexible and take this into account. A final misconception was that the policy was rigid.

In Europe, they have signed a commitment to facilitate TDM for researchers, but their policy is global. They are also a signatory of CrossRef and think the new service is good.

[Photo: Mark Bide introduces the panel]

Lars Juhl Jensen, based at the Novo Nordisk Foundation Center for Protein Research at the University of Copenhagen, provided an academic perspective on TDM. He considers himself a pragmatic text and data miner. The volume of biomedical research that he has to read is huge, so making sense of structured and unstructured data is key. All he wants to do is data mine. It enables him to do things such as associate diseases and identify conditions. Once you've got the data from text mining, you can then bring it together with experimental data and data from other sources.

As a researcher doing text mining, he needs the text. He doesn't want much else. The format doesn't matter too much. If he can get it in a convenient format, great. The licence has to be reasonable.

Andrew Clark, Associate Director, Global Information and Competitive Intelligence Services at UCB, articulated what TDM means and the part it plays in the scientific industry. He recounted the work of the Pharma Documentation Ring (P-D-R). Their aims are to:

  • Promote exchange of experience/networking among members
  • Encourage commercial development of new information services and systems
  • Jointly assess new and existing products and services
  • Provide a forum for the information industry

[Photo: Gemma Hersh, Lars Juhl Jensen and Andrew Clark]

Literature and patent analysis, sentiment analysis and drug safety are just a few of the benefits of TDM. One of the challenges is the unstructured format that the data comes in. They need several aggregators to make the data mineable. It's not always easy to get the datasets, whether from small publishers or large ones. It's quite expensive and labour intensive.

There are high costs for setting up your data mining. There is also a lack of technical skills in the organisation.

The benefits of TDM include a managed and, in some cases, auditable process for protecting IP. It provides added value and potential new revenue streams. Clark closed with a call for industry collaborations and asked everyone to watch this space.
