
Monday, 9 October 2017

The Emotion of Data – Your Child Is Always Beautiful


In this week’s guest blog we hear from Kent Anderson, the CEO of RedLink and RedLink Network, on the emotional pull of well-presented data.


An Excel spreadsheet or data table isn’t usually enough to rouse the emotions. Rigid rows and columns crammed with numbers are difficult to bond with and even harder to get worked up over. Trends are concealed in there somewhere, meaning lurks, yet our senses are stymied by how raw data are assembled.

Over the past 18 months, guiding RedLink, a data company with the slogan “See What You’re Missing,” has opened my eyes to the wonderful emotional pull of well-presented data – what we might call the ultrasound of data, when a real emotional connection begins to occur. I’ve attended dozens of sessions in which we reveal to new customers their data in our products, and every time there is a strong emotional response – the “ooh!” and “wow!” – because they are seeing something of great interest clearly for the first time.

Visualization isn’t the only way to create emotional connections for users. There are other techniques, such as gamification, personalization, and connection.


Visualization – Seeing Is Believing

Turn a set of columns and rows into a set of interactive curves or lines or bars, and suddenly meaning leaps out. Making these trends clear is powerful for sales people, business leaders, managers, and purchasers. RedLink can also import data for libraries and publishers, saving them days or weeks of effort and freeing time to look at the data and think about its implications.

Gamification – It Makes Data Engaging

Games are great ways to make complex subjects approachable and more understandable. We’ve adopted some aspects of gamification in our products, adding Unlocks and clever names and treasure maps to business-specific products that otherwise would be officious and off-putting. These conceptual candies help to sweeten the experience, adding memorable and pleasant dimensions to the user experience while boosting utility.


Personalization – It’s Your Data

Increasingly, personal data are viewed not as commodities but as elements you have a right to manage. The EU has been more proactive on this front than the US, for example, with initiatives like “the right to be forgotten” and data portability. This places new constraints on data companies. Yet constraints drive design and innovation, so new services like Remarq – which allows users to put a lot of data about their usage of the scientific and scholarly literature in one place – are on the leading edge of the data personalization trend.


Connection – Relevance Matters to Meaning

Data matter most when you can immediately do something with them. We focus a lot on making this happen: allowing users to see data only for the customers they manage, to see trends across disciplines instead of just around products, and to view both the macro (consortia, bundles, titles from multiple sources) and the micro (individual institutions, individual titles, individual sources). Giving quick paths to relevant views is crucial to making data matter. These views connect the user with the data so that decisions can happen quickly and confidently.

Conclusion

As an independent data company, RedLink helps libraries, consortia, publishers, and end-users “see what they’re missing.” By using visualization, gamification, personalization, and connection, data can become powerful, efficient, and even enjoyable sources of information to help publishers, librarians, administrators, researchers, editors, and authors make better decisions.



RedLink is a proud Silver Sponsor of the ALPSP 2017 Annual Conference.

Friday, 8 September 2017

Spotlight on SourceData - shortlisted for the 2017 ALPSP Awards for Innovation in Publishing


Last but not least in our series of blogs on our 2017 Awards Finalists is EMBO – the creators of SourceData. We speak to Project Leader Thomas Lemberger to find out more:

Tell us a bit about your organisation


EMBO is an international organization that promotes scientific excellence in the life sciences. It has over 1700 members elected from the leading researchers of Europe and beyond. The organization is funded by 29 member states to support scientists through events, networking opportunities, funding and fellowships for young researchers, and by shaping science policy. EMBO also publishes four journals reporting important discoveries from the global bioscience community: EMBO Journal, EMBO Reports, Molecular Systems Biology and EMBO Molecular Medicine.

What is the project that you submitted for the Awards?


SourceData is a technology platform made up of several tools that extract information about published figures and make scientific data more discoverable. Through EMBO’s work at the intersection of research and publishing, we realized there is a disconnect between the way research data is published in scientific papers and the way researchers typically want to interact with it. Most scientific papers report the results of carefully designed experiments producing well-structured data. Unfortunately, during the publishing process this data is typically summarised in text and graphs and “flattened down”, losing a lot of valuable information along the way. As a result, it can be very difficult for researchers to find answers to relatively simple questions because the data is inaccessible.

For example, it is currently very cumbersome for a scientist to find the specific experiments where a certain small-molecule drug has been tested on a specific cancer cell line, or to look at the results of a published experiment and find out whether similar data have been published elsewhere. These are the kinds of scenarios where SourceData can help. SourceData goes to the heart of the scientific paper – the data – and extracts its description in a usable format that researchers can access and interrogate. It then links this data to results from other scientific papers that have been through the same process.

 

SourceData - Making Scientific Data Discoverable from SourceData on Vimeo.
 

Tell us more about how it works and the team behind it


With SourceData, EMBO has developed a way to represent the structure of experiments. The principle of SourceData is rather simple: we identify the biological objects that are involved in the experiment and then we specify which objects were measured to produce the data and which, if any, were experimentally manipulated by the researchers. Despite its apparent simplicity, this method allows us to build a scientific knowledge graph that turns out to be a very powerful tool for searching and linking papers and their data.
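As a toy illustration of that principle (the papers, objects and field names below are invented for this sketch, not the actual SourceData schema), each experiment can be recorded as the set of objects that were measured and the set that were manipulated, and then queried across papers by those roles:

```python
# Toy sketch of representing experiments by the roles biological
# objects play in them, then querying across papers by role.
from dataclasses import dataclass, field

@dataclass
class Experiment:
    paper: str                                     # source paper ID
    measured: set = field(default_factory=set)     # objects assayed
    manipulated: set = field(default_factory=set)  # objects perturbed

experiments = [
    Experiment("paper-1", measured={"apoptosis"}, manipulated={"drug-X", "HeLa cells"}),
    Experiment("paper-2", measured={"p53"},       manipulated={"drug-X"}),
    Experiment("paper-3", measured={"apoptosis"}, manipulated={"HeLa cells"}),
]

def find(measured=frozenset(), manipulated=frozenset()):
    """Papers containing an experiment that matches the requested roles."""
    return [e.paper for e in experiments
            if measured <= e.measured and manipulated <= e.manipulated]

# "Where has drug-X been tested on HeLa cells?"
print(find(manipulated={"drug-X", "HeLa cells"}))  # ['paper-1']
```

Linking shared objects across papers in this way is what lets a knowledge graph surface related experiments that a plain keyword search would miss.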

The development of SourceData has been a collaborative process involving the Swiss Institute of Bioinformatics, which provided its expertise in developing software platforms for the life sciences and in data curation. We then worked with Wiley to implement SourceData within a publishing environment. Nature also contributed content to the initiative.

Why do you think it demonstrates publishing innovation?


SourceData transforms the way that researchers can interact with scientific papers by getting to the heart of the paper – the data – and putting it into a highly searchable form. It then takes this a step further by linking this data with relevant results from other scientific papers so that researchers can explore these connections. SourceData can give readers a new level of confidence in finding more of the research that is relevant to their questions. It can give scientists more opportunities to have their publications found and cited, and can allow publishers to expose more of their content to interested readers by making it even easier to search and explore.

What are your plans for the future?


Our work to date has involved a lot of manual curation, so we are now working to automate the process. We are developing artificial-intelligence algorithms that use deep learning to extract the structure of experiments from their descriptions in natural language. Our vision is to provide access to our technology to as many publishers as possible and encourage the widespread adoption of SourceData. In doing so we hope to open up the data behind more and more journals over time and ultimately accelerate science in the process.


Thomas Lemberger is leading the SourceData project and is passionate about the importance of scientific data and structured knowledge in publishing. Trained as a molecular biologist, Thomas is Deputy Head of Scientific Publications at EMBO and Chief Editor of the open access journal Molecular Systems Biology.

Twitter: https://twitter.com/embocomm
Facebook: https://www.facebook.com/EMBO.excellence.in.life.sciences

See the ALPSP Awards for Innovation in Publishing Finalists lightning sessions at our Annual Conference on 13-15 September, where the winners will be announced. 

The ALPSP Awards for Innovation in Publishing 2017 are sponsored by MPS Ltd.
 


Friday, 26 April 2013

What next for data analysis? Notes from the London Book Fair 2013

The panel line up for questions
What next for data analysis? A scholarly publisher's guide was a seminar organised by ALPSP at this year's London Book Fair. The panel discussed the importance of researchers sharing data, how it benefits the public as well as advancing disciplines, and how a reward system is needed around publishing and sharing data. Encouragingly, it's clear that publishers have an important role to play.

The problem with not sharing

Lee-Ann Coleman, Head of Scientific, Technical and Medical Information at the British Library, chaired the session. She has particular insight into the use of data by researchers, having worked on the DRYAD project and, currently, DataCite. There are a number of challenges to sharing data amongst researchers. Coleman acknowledged that publishers have been helpful by requiring data sharing, but this is not yet standard practice. The lack of sharing can be a real problem, particularly in public health or multidisciplinary areas. The current system does not realise the maximum return on sharing data, despite a focus on open data from policy makers and organisations such as the Royal Society.

Lee-Ann Coleman kicks off the session
The lack of a system to store, cite or link research data is why the DataCite project was established in 2009. DataCite comprises full and associate member organisations, enabling them to assign Digital Object Identifiers (DOIs) to submitted data sets to support finding, accessing and reusing the data.
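As a rough illustration of what assigning a DOI to a data set involves (the field names below follow the DataCite metadata schema's mandatory properties as best recalled here, and the DOI and titles are invented; check the current schema documentation before relying on them):

```python
# Minimal sketch of the metadata a DataCite member supplies when
# registering a DOI for a data set. Values are hypothetical.
dataset_metadata = {
    "identifier": "10.5555/example-dataset",      # hypothetical DOI
    "creators": [{"name": "Example, Researcher"}],
    "titles": [{"title": "Survey responses, 2013 field study"}],
    "publisher": "Example Data Repository",
    "publicationYear": 2013,
    "resourceTypeGeneral": "Dataset",
}

# Once registered, the DOI resolves to a landing page describing the
# data set, which is what makes the data findable and citable.
doi_url = "https://doi.org/" + dataset_metadata["identifier"]
print(doi_url)
```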

Read more about DataCite here.



What practical challenges do publishers face in making data open?

Phil Hurst is Publisher at the Royal Society, which published the report Science as an open enterprise in 2012. It highlighted the need to deal with the deluge of data, to exploit it for the benefit of science, and to preserve the principle of openness. Hurst asserted that before you can analyse data, you need to open it up. Why bother? A recent outbreak of E. coli was a classic case study of how open, shared data helped to quickly bring a deadly bacterial outbreak under control.

The report highlights the power of opening up data for science and provides a vision of all scientific literature online. The Royal Society makes sharing data a condition of publication: the data should go into a repository that the article can link to. Being practical, it is still early days for this. Hurst observed that you need to identify suitable repositories, establish appropriate criteria and share a list to guide authors. One repository they are working with is DRYAD.
Phil Hurst and a nasty strain of E. coli


The Society has amended its licences to allow text and data mining and is working with partners to facilitate it. Challenges include how to manage access control for text- and data-mining purposes. There are differences between subjects and varying degrees of willingness to share across the spectrum of science. Sharing data allows analysts to conduct meta-analyses, modelling, and data and text mining; ultimately, it enables scientists to get new scientific value from content.


Developing taxonomies to track and map data

Richard Kidd, Business Development Manager for the Strategic Innovation Group at the Royal Society of Chemistry, outlined how the RSC had approached data analysis by using topic modelling to determine a set of underlying topics. They identified 12 broad subjects, which then generated 100+ categories. These were narrowed down and then mapped to existing categories.

Richard Kidd from the RSC in action
The 12 general categories and 120 or so sub-categories enable them to map new content, so that as their publishing output shifts they can continue to track and map its evolution. The taxonomy provides a navigation aid for journals, and it also works across books, magazines and educational content, creating sales opportunities with subject-focused customers.
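The mapping step can be pictured with a toy sketch (the categories and keywords below are invented for illustration, not the RSC's actual taxonomy, which is derived by topic modelling over a much larger corpus): new content is scored against each category's characteristic terms and assigned to the best match.

```python
# Toy sketch of mapping new content onto a fixed taxonomy by
# scoring keyword overlap. Categories and keywords are invented.
TAXONOMY = {
    "analytical chemistry": {"spectroscopy", "chromatography", "assay"},
    "materials": {"polymer", "nanoparticle", "crystal"},
    "energy": {"battery", "photovoltaic", "electrolyte"},
}

def categorise(text):
    """Assign a document to the category whose keywords it matches most."""
    words = set(text.lower().split())
    scores = {cat: len(words & kws) for cat, kws in TAXONOMY.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(categorise("a novel polymer nanoparticle crystal coating"))  # materials
```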


They are now looking at data in their publications and patterns in data for sub-domains and hope that this approach will allow them to look at their back list and bring back the original data points.

Chemists don't have a strong community norm around sharing, with a culture organised around the laboratory group. There is a lack of available standards, and there are issues about releasing data when patents could be developed. This leads to a more protective culture around research data that can be at odds with open data principles. However, the RSC will be operating the EPSRC National Chemical Database, a domain repository for the chemical sciences. Use and reuse is a priority, especially through data availability feeds.

The rise of the 'meta journal'

Brian Hole of open access publisher Ubiquity Press outlined how researchers’ needs drive their publishing efforts. The model they use encourages researchers to share data. Hole is a strong proponent of what he calls the social contract of science, and considers not only the publication of research but also of research data to be an essential part of it. As a result, an author’s conclusions can be validated and their work more efficiently built upon by the research community; conversely, it is effectively scientific malpractice to withhold data from the community. He argues that this principle applies to publishers, librarians and repositories as well as researchers.

Brian Hole from Ubiquity Press
Benefits of sharing data cut across different interest groups. Researchers want recognition in the form of citations, and those who share data tend to receive more citations and greater potential for career advancement. Sharing also makes data easier to find and reuse in future studies, which is more efficient. Shared data can be used in teaching to improve the learning experience. For the public, if data is easier to find, it can help build trust in science. There are also potential economic benefits, with the private sector drawing on open data to drive innovation and product development. He believes that many disciplines are yet to benefit, especially in the humanities.

Ubiquity Press is developing 'metajournals' to aid discovery of research outputs scattered across repository silos around the world, and to provide incentives for researchers to openly share their data according to best practice. The metajournals provide researchers with citable publications for their data or software, which other researchers can then reference in articles and books. These citations are then tracked, along with the public impact of papers (using altmetrics). The platform so far includes metajournals in public health, psychology, archaeology and research software, with more to come, including economics and history. Read more about Ubiquity Press' metajournals here.

If you are interested in data, join us at the ALPSP Conference this September to hear Fiona Murphy from Wiley and a panel of industry specialists discuss Data: Not the why, but the how (and then what?). Book online by 14 June to secure the early bird rate.

Friday, 12 April 2013

Countdown to the London Book Fair: What next for data analysis? A scholarly publisher’s guide









Tuesday 16 April, 13:00 – 14:00, Thames Room, Earls Court 1

ALPSP is delighted to be hosting a panel discussion at the London Book Fair. It will bring together experts from the scholarly publishing community to demystify key terms and emerging trends in data analysis.

What next for data analysis? A scholarly publisher’s guide will help you understand key terms and the fundamentals of data analysis. The session will provide an overview of the latest trends in data analysis for the scholarly and academic publishing community.

The seminar will be chaired by Lee-Ann Coleman, Head of Science, Technology and Medicine at the British Library. Lee-Ann will provide further information on the DataCite project that publishers are involved with. Panellists include:


  • Brian Hole, Founder and CEO of Ubiquity Press, who will discuss linking data with humanities and social science books.
  • Richard Kidd, Business Development Manager at the Royal Society of Chemistry, who will discuss RSC projects such as their semantic enrichment programme and building a domain repository for chemistry research data.
  • Phil Hurst, Publisher at the Royal Society, who will discuss how the Royal Society’s data policy approaches the challenge of helping authors share the data associated with their journal articles.

Entrance to the seminar is free, but you will need a ticket to the Fair. If you are unable to make the session, we will be live tweeting using #alpspdata and will post a blog and photographs after the event.