Friday, 26 April 2013

What next for data analysis? Notes from the London Book Fair 2013

The panel line up for questions
What next for data analysis? A scholarly publisher's guide was a seminar organised by ALPSP at this year's London Book Fair. The panel discussed the importance of researchers sharing data, how sharing benefits the public as well as advancing disciplines, and why a reward system is needed around publishing and sharing data. Encouragingly, it's clear that publishers have an important role to play.

The problem with not sharing

Lee-Ann Coleman, Head of Scientific, Technical and Medical Information at the British Library, chaired the session. She has particular insight into the use of data by researchers, having worked on the DRYAD project and currently on DataCite. There are a number of challenges in sharing data amongst researchers. Coleman acknowledged that publishers have helped by requiring data sharing, but this is not yet standard practice. The lack of sharing can be a real problem, particularly in public health or multidisciplinary areas. The current system does not realise the maximum return on sharing data, despite a focus on open data from policy makers and organisations such as the Royal Society.

Lee-Ann Coleman kicks off the session
The lack of a system to store, cite or link research data is why the DataCite project was established in 2009. DataCite comprises full and associate member organisations and enables them to assign Digital Object Identifiers (DOIs) to submitted data sets to support finding, accessing and reusing the data.

Read more about DataCite here.



What practical challenges do publishers face in making data open?

Phil Hurst is Publisher at The Royal Society, which published the research report Science as an open enterprise in 2012. It highlighted the need to deal with the deluge of data, to exploit it for the benefit of the development of science, and to preserve the principle of openness. Hurst asserted that before you can analyse data, you need to open it up. Why bother? A recent outbreak of E. coli was a classic case study of how open, shared data helped to quickly control an outbreak of a deadly bacterium.

The report highlights the power of opening up data for science and provides a vision of all scientific literature online. The Royal Society makes sharing data a condition of publication. The data should go into a repository where the article can link to it. In practical terms, it is still early days for this. Hurst observed that you need to identify suitable repositories, establish appropriate criteria and share a list to guide authors. One repository they are working with is DRYAD.
Phil Hurst and a nasty strain of E. coli


The Society has amended licences to allow text and data mining and works with partners to facilitate it. Challenges to take into account include how to manage access control for text and data mining purposes. There are differences between subjects and varying degrees of willingness to share across the spectrum of science. Sharing data allows analysts to conduct meta-analyses, modelling, and data and text mining, and ultimately enables scientists to get new scientific value from content.


Developing taxonomies to track and map data

Richard Kidd, Business Development Manager for the Strategic Innovation Group at the Royal Society of Chemistry, outlined how they had approached data analysis at the RSC by using topic modelling to determine a set of true topics. They identified/invented 12 broad subjects which then generated 100+ categories. These were narrowed down and then mapped to existing categories.

Richard Kidd from the RSC in action
The 12 general categories and 120 or so sub-categories enable them to map new content. As a result, as their publishing output shifts, they can continue to track and map its evolution. This taxonomy provides a navigation aid for journals. It also works across other content such as books, magazines and educational material. This provides sales opportunities among customers with a subject-specific focus.


They are now looking at data in their publications and patterns in data for sub-domains and hope that this approach will allow them to look at their back list and bring back the original data points.

Chemists, with their laboratory group culture, don't have a community norm about sharing data. There is a lack of available standards, and there are concerns about releasing data when patents could be developed. This leads to a more protective culture in relation to research data that can be at odds with open data principles. However, the RSC will be operating the EPSRC National Chemical Database, a domain repository for chemical sciences. Use and reuse are a priority, especially through data availability feeds.

The rise of the 'meta journal'

Brian Hole of open access publisher Ubiquity Press outlined how researchers' needs drive their publishing efforts. The model they use encourages researchers to share data. Hole is a strong proponent of what he calls the social contract of science and considers publication of not only research but also research data to be an essential part of it. As a result, an author's conclusions can be validated and their work more efficiently built upon by the research community. Conversely, he considers withholding data from the community to be effectively scientific malpractice. He argues that this principle applies to publishers, librarians and repositories as well as researchers.

Brian Hole from Ubiquity Press
The benefits of sharing data cut across different interest groups. Researchers want recognition in the form of citations and the potential for career advancement, and those who share data tend to receive more citations. Sharing in turn makes data easier to find and use in future studies, which is more efficient. Shared data can be used in teaching to improve the learning experience. For the public, making data easier to find can help build trust in science. There are also potential economic benefits for the private sector, where shared data can drive innovation and product development. He believes that many disciplines have yet to benefit, especially in the humanities.

Ubiquity Press are developing 'metajournals' to aid in discovery of research outputs scattered throughout the world in different repository silos, and also to provide incentives for researchers to openly share their data according to best practices. The metajournals provide researchers with citable publications for their data or software, which are then referenced by other researchers in articles and books. The citations are then tracked along with the public impact of papers (using altmetrics). The platform so far includes metajournals in public health, psychology, archaeology and research software, with more to come including economics and history. Read more about Ubiquity Press' metajournals here.

If you are interested in data, join us at the ALPSP Conference this September to hear Fiona Murphy from Wiley and a panel of industry specialists discuss Data: Not the why, but the how (and then what?). Book online by 14 June to secure the early bird rate.
