Tuesday, 30 April 2013

Safe Harbour offers clear rules for use of out-of-commerce works: trade associations agree safe harbour provisions

In September 2011, a Memorandum of Understanding was signed in the presence of the EU Commissioner for the Internal Market and Services (Michel Barnier), which set out the principles on the digitisation and making available of out-of-commerce (OOC) works, by publicly accessible libraries and similar institutions in the EU.

The principles recommend that potential users of such out-of-commerce works essentially adhere to an Extended Collective Licensing Scheme, which involves rights holder Collective Management Organisations granting specific licences for the use of such works, following appropriate diligent search for in-commerce formats, and consultation and agreement with the relevant rights holder groups.

The International Association of STM Publishers (STM), the Professional and Scholarly Publishing Division (PSP) of the Association of American Publishers (AAP) and The Association of Learned and Professional Scholarly Publishers (ALPSP) have released a safe harbour provisions statement regarding the use of OOC works under the MoU.

The safe harbour provision provides additional clarity and certainty for users who participate in digitisation and reuse of works under such schemes in EU member states.

The full MoU and Key Principles are available to download. Read the Safe Harbour provisions statement from STM, AAP/PSP and ALPSP here.

A number of publishers have already signed up to the safe harbour provisions. If you wish to do so, please contact Kim Beadle at STM (beadle@stm-assoc.org). Please note that this is not solely for those who publish in STM fields; it is open to all scholarly publishers.

Monday, 29 April 2013

A guide to accessible publishing - what, why and how?

It might not be obvious, but digital workflows are a great gift to disabled people and, likewise, disabled people are a great gift for digital publishing. But it doesn't always seem that way, so let's explore the issues.

Alistair McNaught (JiscTechDis) and Sarah Hilderley (EDItEUR) have written a new advice note for ALPSP Members: A guide to Accessible Publishing.

This topic will be explored further at the ALPSP International Conference in September, when Huw Alexander (SAGE) will chair the session Accessibility: are you missing a strong market for your content? Drawing together some of the key players within the sector, the speakers will seek to unpack the issues experienced by publishers, users and institutions and map out how publishers can develop their online products to be accessible to as wide an audience as possible. The session will examine the issues from various viewpoints and provide practical advice and opportunities for discussion. See www.alpspconference.org for further information about the conference programme and registration.

Friday, 26 April 2013

What next for data analysis? Notes from the London Book Fair 2013

The panel line up for questions
What next for data analysis? A scholarly publisher's guide was a seminar organised by ALPSP at this year's London Book Fair. The panel discussed the importance of researchers sharing data, how it benefits the public as well as advancing disciplines, and how a reward system is needed around publishing and sharing data. Encouragingly, it's clear that publishers have an important role to play.

The problem with not sharing

Lee-Ann Coleman, Head of Scientific, Technical and Medical Information at the British Library, chaired the session. She has particular insight into the use of data by researchers, having worked on the DRYAD project and, currently, DataCite. There are a number of challenges to sharing data amongst researchers. Coleman acknowledged that publishers have been helpful by requiring data sharing, but this is not yet standard practice. The lack of sharing can be a real problem, particularly in public health or multidisciplinary areas. The current system does not realise the maximum return on sharing data, despite a focus on open data from policy makers and organisations such as the Royal Society.

Lee-Ann Coleman kicks off the session
The lack of a system to store, cite or link research data is the reason why the DataCite project was established in 2009. DataCite comprises full and associate member organisations, enabling them to assign Digital Object Identifiers (DOIs) to submitted data sets to support finding, accessing and reusing the data.

Read more about DataCite here.

What practical challenges do publishers face in making data open?

Phil Hurst is Publisher at the Royal Society, which published the research report Science as an open enterprise in 2012. It highlighted the need to deal with the deluge of data, to exploit it for the benefit of the development of science, and to preserve the principle of openness. Hurst asserted that before you can analyse data, you need to open it up. Why bother? A recent outbreak of E. coli was a classic case study of how open, shared data helped to quickly control the spread of a deadly bacterium.

The report highlights the power of opening up data for science and provides a vision of all scientific literature online. The Royal Society makes sharing data a condition of publication. The data should go into a repository where the article can link to it. Being practical, it is still early days for this. Hurst observed that you need to identify suitable repositories, establish appropriate criteria and share a list to guide authors. One repository they are working with is DRYAD.
Phil Hurst and a nasty strain of E. coli

The Society has amended its licences to allow text and data mining and works with partners to facilitate it. Challenges to take into account include how to manage access control for text and data mining purposes. There are differences between subjects and varying degrees of willingness to share across the spectrum of science. Sharing data allows analysts to conduct meta-analyses, modelling, and data and text mining; ultimately, it enables scientists to get new scientific value from content.

Developing taxonomies to track and map data

Richard Kidd, Business Development Manager for the Strategic Innovation Group at the Royal Society of Chemistry, outlined how they had approached data analysis at the RSC by using topic modelling to determine a set of true topics. They identified/invented 12 broad subjects which then generated 100+ categories. These were narrowed down and then mapped to existing categories.

Richard Kidd from the RSC in action
The 12 general categories and 120 or so sub-categories enable them to map new content, so as their publishing output shifts they can continue to track its evolution. This taxonomy provides a navigation aid for journals, and it also works across books, magazines and educational content. This creates sales opportunities with subject-focused customers.

They are now looking at data in their publications and patterns in data for sub-domains and hope that this approach will allow them to look at their back list and bring back the original data points.

Chemists don't have a community norm about sharing; the culture is organised around the laboratory group. There is a lack of available standards, and there are issues about releasing data when patents could be developed. This leads to a more protective culture in relation to research data that can be at odds with open data principles. However, the RSC will be operating the EPSRC National Chemical Database, a domain repository for the chemical sciences, where use and reuse are a priority, especially via data availability feeds.

The rise of the 'meta journal'

Brian Hole of open access publisher Ubiquity Press outlined how researchers’ needs drive their publishing efforts. The model they use encourages researchers to share data. Hole is a strong proponent of what he calls the social contract of science and considers not only publication of research but also research data to be an essential part of it. As a result an author’s conclusions can be validated and their work more efficiently built upon by the research community. On the other hand it is effectively scientific malpractice to withhold data from the community. He argues that this principle applies to publishers, librarians and repositories as well as researchers.

Brian Hole from Ubiquity Press
Benefits of sharing data cut across different interest groups. Researchers want recognition in the form of citations, and those who share data tend to receive more citations, with potential for career advancement. Sharing also makes data easier to find and use in future studies, which is more efficient. Shared data can be used in teaching to improve the learning experience. For the public, easier-to-find data can help build public trust in science. There are also potential economic benefits for the private sector, driving innovation and product development. He believes that there are many disciplines yet to benefit, especially in the humanities.

Ubiquity Press are developing 'metajournals' to aid in discovery of research outputs scattered throughout the world in different repository silos, and also to provide incentives for researchers to openly share their data according to best practices. The metajournals provide researchers with citable publications for their data or software, which are then referenced by other researchers in articles and books. The citations are then tracked along with the public impact of papers (using altmetrics). The platform so far includes metajournals in public health, psychology, archaeology and research software, with more to come including economics and history. Read more about Ubiquity Press' metajournals here.

If you are interested in data, join us at the ALPSP Conference this September to hear Fiona Murphy from Wiley and a panel of industry specialists discuss Data: Not the why, but the how (and then what?). Book online by 14 June to secure the early bird rate.

Friday, 12 April 2013

Countdown to the London Book Fair: Academic Publishers: still open for business?

Monday 15 April, 11:30 – 12:30, Cromwell Room, Earls Court 1

Join ALPSP Chief Executive, Audrey McCulloch, as she chairs what should be a lively debate on open access with David Tempest from Elsevier, Mandy Hill from Oxford University Press and Richard Fisher from Cambridge University Press.

The panel will discuss reactions to the Finch Report on Open Access. Each publisher will share their views on the Report and how they plan to address the Finch recommendations in the short to medium term.

This is sure to be a popular event so make sure you arrive early to guarantee a seat.

Further information is available on the London Book Fair website.

Countdown to the London Book Fair 2013: ALPSP stand at N450 and member exhibitors

Monday 15 – Wednesday 17 April, Earls Court 1, Stand N450

This year promises to be an especially busy one at the London Book Fair. We hope that many of you will drop by the stand to say hello to the team or visit one of our member exhibitors.

The ALPSP stand and collective exhibition is bigger than ever. Drop by stand N450 to meet the team or if you’d like to chat about a particular issue. Exhibiting alongside ALPSP this year are:

We look forward to seeing you next week!
The ALPSP Team

Adam Marshall on Beyond the PDF2 in Amsterdam

Adam Marshall, currently Group Head of Marketing and Customer Services at the Portland Press, writes about the recent Beyond the PDF2 conference in Amsterdam.

"Held in wintry Amsterdam in the middle of March, the Beyond the PDF 2 conference (BTPDF2) was not really about PDF vs. HTML for online publication, and it was pleasing to see that it had also moved beyond a debate about open access vs. subscription publishing. For me the conference was quite unlike other publishing industry conferences I attend, and more of a brainstorming academic conference.

There were over 200 attendees, more than half from academia (including 33 students who received funding to attend), with fewer publishers than is normal at information-related meetings (though a sizable group from Elsevier, one of the sponsors) and representation from a number of OA publishers (PLoS, PeerJ, Ubiquity Press, BMC, eLife).

Steve Pettifer (Utopia/University of Manchester), one of the attendees at the first BTPDF conference in San Diego in 2011 and a member of this year's organizing committee, summed it up:

"BTPDF2 was somewhere between a conference and an unconference -- and for my money managed to blend the best of both worlds. I think I was most struck by how engaged everyone seemed to be with pretty-much everyone else; these are turbulent times for scholarly communication, but it was clear this was a group of people who really wanted to get on with getting stuff going."

The two days were anchored by great keynotes: Kathleen Fitzpatrick (Modern Language Association), coming from a non-science discipline, on Planned Obsolescence and her experience with MLA Commons, and Carol Tenopir (University of Tennessee) on Shaping the Future of Scholarly Communication. After that the conference was turned over to a mix of short and longer “poster style” presentations from users and producers of data. I expect that we will see many of these appearing at publishing industry conferences in the year to come. As a publisher from a marketing background I found the conference very challenging, and as a result I have tried to mention things new to me that others might want to follow up later.

Laura Czerniewicz and Michelle Willmers (University of Cape Town) provided some thoughtful southern hemisphere insight into publishing models in relation to different geographical and demographic contexts, injecting an element of reality that recurred throughout the meeting.

The new models of content creation sessions drew together a group of presenters who are building their own tools for content creation and aggregation. Nathan Jenkins, a physicist from New York University, presented Authorea, an online platform for writing up and sharing research within your browser. Merce Crosas (Harvard) was one of a number of attendees interested in data citation principles, highlighting the Dataverse Network project. One of the subsequent outcomes of BTPDF2 was the initial draft of the Amsterdam Manifesto on Data Citation Principles. Amalia Levi (University of Maryland) spoke about data in the context of studying communities where the data may not be online, and Joost Kircz (University of Amsterdam) discussed the different practices in scientific reading.

Coming from a life science publisher, I found the presentation by Lisa Girard (Harvard) on the 60-chapter open access StemBook interesting, showing the type of resource that could replace the traditional reference work; it is one of several projects built using the DOMEO annotation toolkit. This does bring up the thorny question of how such resources are funded, in this case by the Harvard Stem Cell Institute. Lisa’s talk sequenced nicely into the presentation by Pauli Ciccarese (Harvard) on Open Annotation and the need for a uniform and pervasive method for describing digital resources, also based on the DOMEO toolkit.

With the changing needs of writers and readers, we have been experiencing a plethora of new dissemination models from existing and new hosts. These were covered in the review of new models of content dissemination, with speakers representing both the deconstructing-the-journal and reinventing-the-journal camps.

For the decoupled journal there were Jason Priem with ImpactStory, which aggregates altmetrics for researchers; Keith Collier with www.rubriq.com/, providing peer review independent of publisher or journal; Peter Brantley with www.hypothes.is, an open-source platform for the collaborative evaluation of information by community peer review; and Kaveh Bazargan, voted the most entertaining speaker on #btpdf2, putting the case for typesetting in the future model. Arguing the case for the reinvention of the journal were three representatives of open access publishing: Theodora Bloom for PLoS, Brian Hole from Ubiquity Press on the case for open scholarship and the metajournal, and Alf Eaton bringing us up to date with PeerJ. The connecting factor in many of the products mentioned was the use of open source software such as Git, IPython, Drupal and DOMEO.

Following lunch on the first day came what was, for me, one of the least satisfying sessions. It tried to get down to the real issues of money and how to make a credible business case for new systems and tools, and for changes to existing ones. There were three business model representatives, from Springer and Figshare, plus Daniel O’Donnell (new to me) talking about the Journal Incubator project, which seeks funding to train graduate students to act as managing editors and production supervisors. The stakeholders were represented by David Prosser (RLUK), Dave De Roure (Oxford University) and another southern Africa speaker, Eve Gray (University of Cape Town).

While this session, coming at the end of the day, involved the most discussion between presenters and delegates, it did not draw out how the emerging models would be funded in the longer term while serving the needs of both current and future users regardless of discipline or location.

Steve Pettifer summed up the first day:
“For me the biggest lesson was realising how much can be learned in this space from the social sciences and humanities; it's all too easy for the future of scholarly communications to be dominated by science and technology. And the message that struck me most was the importance of aligning developments with the needs of the developing world: paraphrasing Eve Gray, 'Impact Factors kill in a literal sense'.”

The sessions on the second day took a more practical approach to the discussion. BTPDF2 was organized by Force11 with the aim of accelerating the pace of innovation in scholarly communications. This requires action, and this session featured three framing talks around software, usability and data, followed by selected flash talks about technologies, tools, frameworks, standards and containers that need support or acceptance from scholars in some way to move them to the next level. The call to action was for attendees to discuss these and come together in three or four breakout groups over lunch to try to turn some of the ideas into action plans, such as the manifesto on Data Citation Principles mentioned earlier.

Points that stuck out for me were Asuncion Gomez-Perez from Madrid on the SEALS Platform for evaluation of semantic technologies;  Scott Edmunds on GigaScience open access journal from BMC; Melissa Haendel (OHSU) on the reproducibility of science; and Alex Garnett on the Public Knowledge Project to provide an automated XML publication pipeline.

Throughout the presentations, the themes of reproducibility, usability, preservation, scalability, open data, sustainability, verifiability and data management recurred over and over.

Richard Padley of Semantico said:
“It was good that the humanities folks got some air time. Often STM can dominate the debate in a way that leaves out some of the very interesting humanities projects. Also it was refreshing that the usual debates about green/gold OA were fairly conspicuous by their absence. There was more talk about research reproducibility and data which I think will be forthcoming themes in the more conventional conferences on the circuit.”

In the penultimate session Carole Goble (University of Manchester) presented an interesting case for revising how we evaluate research and researchers, and for the importance of reproducibility and peer-evaluating as well as peer-reviewing, before introducing a slightly tongue-in-cheek role-playing panel to discuss new models for evaluation of research and researchers, with a researcher, a publisher, a start-up, a librarian, a funder and a university administrator (in the form of Phil Bourne, UCSD, who organized the first BTPDF meeting). If the ensuing discussion in any way represents real life, we have a long way to go.

The meeting concluded with visions for the future with participants giving three minute presentations on how they would change scholarly communications and e-Science. These presentations had a serious side with a panel of experts judging which would be put forward for awards of $1000 to work on their ideas.

If you are interested in any of the projects or products mentioned in this review, I urge you to look at the BTPDF2 website, where participants continue to tweet on #BTPDF2, and if you want to participate further, sign up to Force11.

To conclude Mark Patterson from eLife summed the meeting up:
“This was one of the most rich and energising meetings I've attended.  It brought together a really diverse group of people and covered a similarly diverse range of topics.  The connective tissue that holds it together is the shared interest in making something good happen, and the proof of its success I think is that many people (me included) seemed to go away with a list of things they wanted to explore and act on.”

Adam Marshall can be followed on Twitter via @adamjmarshall

Countdown to the London Book Fair: What next for data analysis? A scholarly publisher’s guide

Tuesday 16 April, 13:00 – 14:00, Thames Room, Earls Court 1

ALPSP is delighted to be hosting a panel discussion at the London Book Fair. It will bring together experts from the scholarly publishing community to demystify key terms and emerging trends in data analysis.

What next for data analysis? A scholarly publisher’s guide will help you understand key terms and the fundamentals of data analysis. This session will provide you with an overview of the latest trends in data analysis for the scholarly and academic publishing community.

The seminar will be chaired by Lee-Ann Coleman, Head of Science, Technology and Medicine at the British Library. Lee-Ann will provide further information on the DataCite project that publishers are involved with. Panellists include:

  • Brian Hole, Founder and CEO of Ubiquity Press, who will discuss linking data with humanities and social science books.
  • Richard Kidd, Business Development Manager at the Royal Society of Chemistry, who will discuss RSC projects such as their semantic enrichment programme and building a domain repository for chemistry research data.
  • Phil Hurst, Publisher at the Royal Society, who will discuss how the Royal Society's data policy addresses the challenge of enabling authors to share data associated with their journal articles.

Entrance to the seminar is free, but you will need a ticket to the Fair. If you are unable to make the session, we will be live tweeting using #alpspdata and will post a blog and photographs after the event.

James Hardcastle on Citation Analysis: Some common issues and problems.

James Hardcastle is a Senior Research Executive at Taylor & Francis where he has worked for the past five years specialising in citation analysis and bibliometrics.

He has responsibility for leading the citation analysis within the company and training staff worldwide in this area. Here, he reflects on common issues and problems with citation analysis in a guest post.

Powerful tools for publishers

Citation analysis and broader bibliometric study are powerful tools for academic publishers to use in developing and supporting their journals. This analysis allows us to understand both who is writing in our journals and who is citing them: the strength of bibliometrics from a publisher's perspective is that it allows us to look beyond our own journals and data to find information regarding broader trends in the market and subject area.

Further, citation-based metrics such as the Impact Factor are still the dominant measures of journal value. Using tools such as Thomson Reuters Web of Science allows us to analyse these numbers in more detail. However, there are some major issues around citation analysis and metrics that have to be taken into consideration.
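To make the Impact Factor concrete: the standard two-year version is simply a ratio of citations to citable items. A minimal sketch in Python, using invented figures for a hypothetical journal (not any real journal's numbers):

```python
def two_year_impact_factor(citations_to_prev_two_years, citable_items_prev_two_years):
    """Impact Factor for year Y: citations received in Y to items
    published in Y-1 and Y-2, divided by the number of citable items
    published in Y-1 and Y-2."""
    return citations_to_prev_two_years / citable_items_prev_two_years

# Hypothetical journal: 450 citations received in 2012 to content
# published in 2010-11, which comprised 300 citable items.
print(two_year_impact_factor(450, 300))  # → 1.5
```

Note that the numerator counts citations to all content, while the denominator counts only "citable items", which is one of the definitional subtleties that complicates comparisons.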

Not all subjects are the same

Subjects behave very differently in terms of citation and authorship patterns, even across related subject areas.  Therefore articles and journals from different subject areas should not be compared against each other. 

Coverage counts

The main databases have very different coverage; Web of Science covers around 11,000 journals, compared to more than 18,000 journals in Scopus and an unknown corpus in Google Scholar. This means that content will receive different citation counts in different databases, which should be considered when using generic metrics such as the h-index that can yield different results between databases.
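The database-dependence of the h-index is easy to demonstrate: feed the same author's papers, with the citation counts reported by two different databases, into the same function and you get different values. A minimal sketch with invented citation counts:

```python
def h_index(citations):
    """The h-index: the largest h such that the author has at least
    h papers with at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank  # still have `rank` papers with >= `rank` citations
        else:
            break
    return h

# The same five papers, as counted by two hypothetical databases:
# a broader corpus (more citing documents) yields higher counts.
narrow_db = [10, 8, 5, 4, 3]
broad_db = [25, 12, 8, 6, 5]
print(h_index(narrow_db))  # → 4
print(h_index(broad_db))   # → 5
```

The same researcher, the same papers, yet a different h-index, which is why the source database should always be stated alongside the metric.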

ALPSP training course delegates debate...

Not all numbers are the same

Different types of published content behave in different ways, for example review articles tend to receive more citations than research articles and short communications tend to receive citations more quickly. Therefore Impact Factors and citation counts for different types of content should not be directly compared.

Distribution, time and gaming

Other important issues that should be borne in mind include the skewed distribution of citations; the time it takes for citations to accrue (citation metrics are not instant); the fact that citation metrics can be gamed; and citations to editorials, book reviews and meeting abstracts as distinct from articles. Ignoring the effect these have on the data and the metrics can lead to mistakes when using citation analysis tools.
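The skewed-distribution point can be shown with a toy example (the citation counts are invented): because a few highly cited papers dominate the total, the mean citation count says little about a typical article, and the median tells a very different story.

```python
from statistics import mean, median

# Hypothetical citation counts for ten articles in one journal:
# two highly cited papers dominate the total.
citations = [120, 45, 9, 6, 4, 3, 2, 1, 0, 0]

print("mean:", mean(citations))      # pulled upwards by the two outliers
print("median:", median(citations))  # closer to a typical article
```

This is the same effect that makes a journal-level Impact Factor a poor predictor of the citations any individual article will receive.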

If you want to find out more, join me and Iain Craig from Wiley-Blackwell on the ALPSP Citation Analysis for Publishers course next month in Oxford.

James is co-tutor on Citation Analysis for Publishers alongside Iain Craig from Wiley-Blackwell. Book now for Citation Analysis for Publishers, 2 May 2013, Oxford. Follow James on Twitter @JamesHTweets

Tuesday, 9 April 2013