Thursday 26 November 2015

Standards: chaos minimization, credibility and the human factor

Standard, standards, standards. One is born, one conforms to standards, one dies. Or so Edmund Blackadder might have said.

And yet, as David Sommer and his panel of experts demonstrated earlier this month, standards underpin our scholarly publishing infrastructure. Without them, we could not appoint editorial teams, enable the review process, tag or typeset articles, publish in print or online, catalogue, discover, or even assess the quality of what we published – assuming, that is, we had been allowed through the office door by our standards-compliant HR departments. We couldn’t determine the citation rates of our publications, sell said publications to libraries (all of them naturally sceptical of our unstandardized claims for high usage) or even contact our high-profile UCL author (is this the UCL in London, Belgium, Ecuador, Denmark, Brazil or the USA?). Resolution, disambiguation, standardization is the order of the day.

‘We are’, as Tim Devenport of EDItEUR said, ‘in the chaos minimization business’.

Speakers at the seminar offered overviews of the roles played by CrossRef, Ringgold, ORCID, COUNTER, Thomson Reuters, EDItEUR, libraries (in the guise of the University of Birmingham) and JISC, considering content, institutional and individual identifiers, plus usage, citation, metadata and library standards.

Audio of all talks is available via the ALPSP site, but here are some broader conclusions from issues discussed on the day.

Humans make standards

But we’re remarkably good at breaking them too. The most foolproof systems are those that don’t allow much human intervention at all (ever tried to accurately type a sixteen-digit alphanumeric code on fewer than eight cups of coffee?). Vendors should build systems that not only pre-populate identifier fields, but actively discourage users from guessing, ignoring or simply making up numbers.
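
One concrete safeguard a vendor can build in: an ORCID iD carries an ISO 7064 (mod 11-2) check digit, so a form can reject most typos before they ever reach a database. A minimal validation sketch in Python (illustrative, not any vendor's actual implementation):

```python
def orcid_checksum_valid(orcid: str) -> bool:
    """Validate the ISO 7064 mod 11-2 check digit of an ORCID iD."""
    digits = orcid.replace("-", "")
    if len(digits) != 16:
        return False
    total = 0
    for ch in digits[:-1]:          # first 15 characters must be digits
        if not ch.isdigit():
            return False
        total = (total + int(ch)) * 2
    check = (12 - total % 11) % 11  # a result of 10 is written as 'X'
    expected = "X" if check == 10 else str(check)
    return digits[-1] == expected

# ORCID's own documentation example iD validates:
print(orcid_checksum_valid("0000-0002-1825-0097"))  # True
print(orcid_checksum_valid("0000-0002-1825-0096"))  # False: one typo fails
```

A single transposed or mistyped digit almost always breaks the checksum, which is exactly the kind of quiet, automatic chaos minimization the seminar called for.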

Be the difference

Publishers, funders and institutions need to actively assert their need for standards at every stage of their workflows. Break one part of the article supply chain and something, somewhere, is bound to be lost. (And the worst part? We don’t know where.) That means that the entire supply chain must inform and develop standards, not just 'free ride' on existing ones.

Standards help authors find their voice

If an article can be found by DOI, funding source, award number or ORCID iD – in other words, if one or more of the key standards is applied to a particular publication – then research gets heard above the online ‘noise’. Authors can help themselves by claiming their own iDs, but it’s up to publishers and institutions to show them why it matters.

Identifiers enforce uniqueness

They not only help with functionality (disambiguating data and eradicating duplication); they also ensure correct access rights, help you understand your customer base and build stronger client relationships. All of this adds immense value to your data.

Standards build credibility everywhere

We tend to think of publishing standards as being the building blocks of the standard workflows – and they are. But the latest development from ORCID encourages involvement in peer review, with journals and funders now collecting reviewers’ iDs to track review activities. That’s a startling contribution to tenure decisions and research assessments. And what about the prospect of using iDs in job applications to verify your publications?

The Impact Factor is a number, not a standard

OK, so we knew that. And we probably had an opinion on it. But coming on a day when Thomson Reuters announced they were ‘exploring strategic options’ for their Intellectual Property & Science business, it was good to hear from the horse’s mouth.

Even the ‘standard’ standards need, well, standardizing

Given the significance of COUNTER usage statistics for library negotiations, the possibility for inaccuracy seems startlingly high. Over 90% of users still require some form of manual intervention, and that means greater likelihood of error. There is a role for standardizing and checking IP information to improve the accuracy of COUNTER data – but for now, no one seems to be claiming that ground.

Slow is good

If a publisher/funder/institution is a late standards adopter, that’s OK. Better to start slow and get it right than to implement poorly and leave a (data) trail of tears. But start. Organizations such as ORCID make available plenty of information about integrating identifiers into publisher and repository workflows.

Standards are not anti-innovation

On the contrary, they facilitate innovation. And they provide the information architecture for innovation to flourish in more than one place.

Share it

Since we can't predict when/where (meta)data will be used, let’s make sure everyone knows as much as possible. Make it open source, or at the very least, make it trustworthy.

And finally…

The mobile charging area at the British Dental Association front desk is a perfect example of the need for rational standards. How many wires?

Martyn Lawrence (@martynlawrence) is Publisher at Emerald Group Publishing and attended the recent ALPSP Setting the Standard seminar in London. He can be contacted at

Monday 9 November 2015

Why Publishers Need to Know the Difference between Search and Text Mining

Haralambos “Babis” Marmanis, CTO and VP, Engineering & Product Development at the Copyright Clearance Center, looks at the concepts behind search and text mining and highlights why publishers need to understand the differences in order to make the best use of each.

As the author of works on search and the lead architect of a product which enables text mining of scientific journal articles, I am often asked about the difference between Search and Text Mining, and have observed that the two are sometimes conflated. Unless you work with technology every day, this confusion is certainly understandable. Knowing the differences, however, can open new business opportunities for publishers. Both functions deal with the application of algorithms to natural language text, and both need to cope with the fact that, as compared with “pure data,” text is messy. Text is unstructured, amorphous, and difficult to deal with algorithmically.

While the challenges associated with text are common to both search and text mining, the details with respect to inputs, analytical techniques, outputs, and use cases differ greatly. For years, publishers have been engaged in search engine optimization, designed to make their works more discoverable to users. As publishers are increasingly asked to enable text mining of their content, they enter new territory – territory different from that of public search engines. Thus, it is more important than ever to understand the difference between these two distinct mechanisms of processing content, so that optimal business and licensing strategies are chosen for each.

To begin with, let me describe the key concepts for each area. "Search" means the retrieval of documents based on certain search terms. Think, for example, of your usual web search on well-known search engines such as Google, Yahoo or Bing. In search, the typical actions performed by a software system are index-based and designed for the retrieval of documents. The indexing process therefore aims to build a look-up table that organizes the documents based on the words they contain. The output is typically a hyper-link to text/information residing elsewhere, along with a small amount of text which describes what is to be found at the other end of the link. In these systems, no “net new” information is derived from the documents through the processes that are employed to create the search index. The purpose is to find the existing work so that its content can be used.
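
The look-up table described here is an inverted index. As an illustration (a toy sketch with invented documents, not any particular engine's implementation), a few lines of Python capture the idea:

```python
from collections import defaultdict

# Toy corpus: document id -> text (invented titles, not real articles)
docs = {
    "doc1": "Standards underpin scholarly publishing",
    "doc2": "Text mining of scholarly articles",
    "doc3": "Barcodes in retail supply chains",
}

# Build the inverted index: word -> set of documents containing it
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        index[word].add(doc_id)

def search(term: str) -> list[str]:
    """Return the ids of documents containing the term."""
    return sorted(index.get(term.lower(), set()))

print(search("scholarly"))  # ['doc1', 'doc2']
```

Note that indexing derives nothing “net new” from the documents: the output of a query is simply pointers back to the existing works, which is exactly the distinction drawn above.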

On the other hand, "text mining" is a less widely understood but well-developed field that deals with analyzing (not finding) text. That is, while text mining can sometimes look at meta-textual issues – for example, tracking the history of science by counting the instances of a specific phrase (e.g., “avian flu”) in articles – more often the goal is to extract expressed information that is useful for particular purposes, not just to find, link to, and retrieve documents that contain specific facts.

Text mining tools accomplish this by allowing computers to rapidly process thousands of articles and integrate a wealth of information. Some tools rely on parsing the text contained in the documents and apply simple algorithms that effectively count the words of interest. Other tools dig deeper and extract basic language structure and meaning (such as identifying noun phrases or genes) or even analyze the complete grammatical structure of millions of sentences in order to gain insights from the textual expression of the authors. By extracting facts along with authors’ interpretations and opinions over a broad corpus of text, this more sophisticated approach can deliver precise and comprehensive information, and in the commercial setting, provides more value than simple word counts.
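
The simplest kind of tool described here can be sketched directly: counting a phrase of interest across a corpus, as in the “avian flu” example above. The records below are invented for illustration:

```python
from collections import Counter

# Invented corpus records: publication year and full text
articles = [
    {"year": 2004, "text": "First outbreak of avian flu reported in the region."},
    {"year": 2005, "text": "Avian flu vaccine trials begin as avian flu spreads."},
    {"year": 2005, "text": "An unrelated genomics study of plant pathogens."},
]

# Count occurrences of the phrase per year to track its history
phrase = "avian flu"
counts = Counter()
for article in articles:
    counts[article["year"]] += article["text"].lower().count(phrase)

print(dict(counts))  # {2004: 1, 2005: 2}
```

Unlike a search result, the output is new information (a trend over time) that appears in no single document; the more sophisticated tools mentioned above replace the simple `count` with linguistic parsing and entity extraction.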

Unlike with search, the output of text mining will vary depending on the use to which the researcher wishes to apply the results. In some contexts, the output is structured data designed for machines to process. In other cases, such as using text mining to drive marketing of products and services, the ultimate output will be human-readable text. In other words, even when text mining is performed, sometimes the user needs and receives the full article.

Although both search and text mining involve the parsing and lexical analysis of documents, there are important differences that should drive a publisher’s decisions about investments in text mining and search.

  1. In text mining, the processing and analysis is often done on a project by project basis. Unlike the search functionality provided by search engines, the “how, why, and what” are infinitely variable, and it is difficult to accurately anticipate the inputs, processes, and outputs required. For example, depending on a text miner’s use case, the output may be facts, data, links, or full expression, as opposed to the simple links that are the output of search.
  2. Search is about finding a set of relevant documents, each of which is considered independently by the algorithm; if applied to a single document the process will yield the same result for that document. On the other hand, text mining is mostly about discovering and using information that lives in the fabric of a corpus of documents. Change one document and the fabric of the corpus changes. Mining is usually (but not always) consumptive of the content. So, the “search” process is document-by-document specific, while the “mining” process involves sets of documents and how these documents relate to each other.
  3. Lastly, the mining process aims at extracting “higher-order” information that involves first-, second-, and higher-order correlations that may occur among any combination of the terms, data, or expressions appearing in the corpus of documents that is processed.
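
The third point can be illustrated with a toy sketch: co-occurrence counting, one of the simplest “higher-order” corpus signals, looks at which terms appear together across documents rather than at any single document in isolation. The term sets below are invented:

```python
from collections import Counter
from itertools import combinations

# Invented per-document term sets (e.g. entities extracted from each article)
doc_terms = [
    {"brca1", "breast", "cancer"},
    {"brca1", "ovarian", "cancer"},
    {"influenza", "vaccine"},
]

# Count how often each unordered pair of terms shares a document
cooc = Counter()
for terms in doc_terms:
    for pair in combinations(sorted(terms), 2):
        cooc[pair] += 1

print(cooc[("brca1", "cancer")])  # 2: a correlation no single document states
```

Change one document and every pair count it contributes to changes with it, which is what it means for the information to live “in the fabric of a corpus”.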

In summary, search and text mining should be considered as two quite distinct processing mechanisms, with often different inputs and outputs. While publishers need to engage with both, by conflating them, one loses the unique opportunities and strengths that each provides. With search, it’s all about helping users find the specific content that they are looking for. Text mining goes well beyond search, to find multiple meanings in a publisher’s content in order to derive new value therefrom. Hence, one would expect that, just as the processes themselves differ, publishers’ licenses for the search and text mining processes will differ too.

Tuesday 13 October 2015

Standard Identifiers, Metrics and Processes in Journal Publishing: Mark Hester asks 'Aren't they a bit...dull?'

Why should we use standards? Identifiers, transaction processes, schemas, metrics and many other things in scholarly publishing have standards, or are developing them. Isn’t this a rather arduous and bureaucratic way of handling things? Are these things really there to make life easier or just another way of overcomplicating an already complex market, taking time away from the efforts of actually producing high quality content?

Here Mark Hester of Aries Systems delves into why we should care.

Aren’t standards a bit... dull?

Standards? Just a bunch of numbers, right? With tedious documentation on how and where to use them? Why would I bother with those?

It’s not hard to see why you might think that, but also easy to see how this is misguided. Jumping straight into a document to read about standards is a little bit like reading the telephone directory when you have no intention of calling someone, or leafing through a Haynes manual when you’re not repairing a car.

An example of a standard from outside publishing might help – EAN-13. What is EAN-13 you might ask? You see examples of it daily – it is the standard for the barcodes we see on everything we buy in the supermarket. Retail staff don’t need to know how EAN-13 works, it is unlikely that they’ve read documentation on it, but they are all grateful that it does work when checking stocks, pricing items and working on the till and, in turn, so are their customers.
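
For the curious, the part of EAN-13 that makes checkout scanners so dependable is a check digit: the first twelve digits determine the thirteenth, so a misread almost always fails validation. A sketch of the published calculation in Python:

```python
def ean13_check_digit(first12: str) -> int:
    """Compute the EAN-13 check digit from the first 12 digits:
    weight digits alternately 1 and 3, then round the sum up to
    the next multiple of 10."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10

# The widely cited example barcode 978-0-306-40615-7 (an ISBN-13):
print(ean13_check_digit("978030640615"))  # 7
```

Retail staff never see this arithmetic, which is rather the point: a good standard does its work invisibly.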

So I ignore standards: what’s the worst that can happen?

When I was a student in the early nineties, the departmental librarian had been using his own classification system for many years. Back then, it didn’t matter much – students got used to its quirks, visitors from other departments were rare, from other universities much rarer still. The people using the service understood it, and that was enough.

Imagine taking this approach in the online world – it would mean that your content would be less discoverable and also less usable. Online library catalogues wouldn’t work if everyone took the approach of my alma mater’s librarian! Not using DOIs means frustration for researchers who can’t click on references and go straight to the articles, and a simple change to a URL means a broken link. If your content isn’t seen, it affects your reputation and, in the case of a commercial publisher, your profits.

The benefit of standards will only increase as the ‘digital natives’ used to touch screen technology enter academia and the workplace – having to click more than once or search for more than a minute will lead them to go elsewhere.

How can standards enhance my working life and be good for my organization?

Rapid changes in scholarly publishing mean that new applications are found for standards once they are in place. Adopting standards can ‘future proof’ your content and processes against changes yet to come.

A great example of this is the relentless adoption of gold open access. The publishing standards which enable Copyright Clearance Center’s RightsLink for OA to display different article processing charge policies to different users on the fly developed separately from one another – Ringgold for institutions, ORCID for identifying authors, and FundRef for funder identification. Brought together, however, their machine readability allows flexible APC pricing models and automated billing and payment processing, making life easier and saving time and money for both publishers and institutions.
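
As a hedged illustration of what that machine readability enables (the field names, Ringgold value and deal table below are invented for the sketch, not RightsLink's actual schema), a submission record carrying all three identifiers lets an APC pricing rule run with no human lookup:

```python
# Invented submission record: the identifier schemes are real,
# but the field names and the Ringgold/deal values are illustrative.
submission = {
    "orcid": "0000-0002-1825-0097",     # author iD (ORCID's documentation example)
    "ringgold": 12345,                  # institution identifier (made up)
    "funder": "10.13039/501100000266",  # funder identifier, FundRef/Open Funder Registry style
}

# Hypothetical institutional APC agreements: Ringgold id -> discounted charge
institutional_deals = {12345: 1500}

def apc_quote(record: dict, deals: dict, list_price: int = 2000) -> int:
    """Pick the APC for a submission from its institutional identifier,
    falling back to the list price when no agreement exists."""
    return deals.get(record["ringgold"], list_price)

print(apc_quote(submission, institutional_deals))  # 1500
```

Because each identifier is unambiguous, the same record can drive billing, funder reporting and author disambiguation without anyone re-keying names of people or institutions.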

The advantages can be psychological as well as practical – if authors, researchers and librarians see the ORCID or CrossRef logos displayed on your website, they will know that your organization is a serious player, one which will help them, one they can trust.

So what's next?

By now, I hope I’ve convinced you of the importance of standards. But if the prospect of researching the topic still fills you with a sense of dread, there's an upcoming seminar from ALPSP I'm helping to coordinate called Setting the Standard. It's being held in London on Wednesday 11 November and includes speakers from CrossRef, Ringgold, ORCID, COUNTER, Thomson Reuters, EDItEUR, Jisc and an institution. Everything you ever wanted to know about standards, but were too scared to ask.

I hope to see you there.

Tuesday 22 September 2015

Reflections on #alpsp15 - Digital Science's Phill Jones explores the key issues

Phill Jones, Head of Publisher Outreach at Digital Science, reflects on the duelling keynote talks from Anurag Acharya, co-founder of Google Scholar and Kuansan Wang, Director, Internet Service Research Center at Microsoft Research in this blog post reflecting on the 2015 ALPSP Conference. He noted their very different views on academic discovery on the open web.

"Citing the difference between "general" search for say a local business, and the geographically global nature of "academic" search, Acharya suggested that personalizing Google scholar wouldn’t yield much additional value. Conversely, Wang described a very different philosophy of highly monitored, highly personalized search through Bing and Cortana that would adapt to individual users needs."

He reflects on the shift in customer base from library to researcher and the resulting revelations as publishers try to better understand their needs:

"Google is so firmly embedded in young researcher’s routines that they don’t even think about the fact that they use it. You wouldn’t expect somebody to tell you that they opened an internet browser, would you?" 

The panel on peer review provoked the following thought:

"One reoccurring theme that emerged from the discussions: the fact too much is currently being asked of the peer-review process. With the mantra of "publish or perish" being truer now than it’s ever been, it can be argued that publishers find themselves unwittingly in the position of administering the process that decides whose career advances and whose doesn’t."

With that position comes great responsibility, something that will no doubt be considered in more detail during Peer Review Week to be held 28 September to 2 October 2015, a collaboration between ORCID, OpenScience and Wiley announced by Alice Meadows during the conference.

Read Phill's full post here on the Digital Science blog.

Monday 14 September 2015

The Academic Book of the Future?

Much discussion of scholarly communication is dominated by scientific and (especially) serials concerns. This session aimed to redress the balance. Richard Fisher chaired a distinguished panel of academics to discuss the recent trends and data on monographs and the current AHRC project on the Academic Book of the Future. These are natural starting points for an extended discussion of what still remains the major currency of both communication and esteem in many academic subjects in the humanities and social sciences.

Simon Tanner from King’s College London and the AHRC Academic Book of the Future project provided an overview of the work completed to date and some highlights from the research data. The first stage of the research project has focused on finding out what roles and purposes academic books serve for scholarship and wider learning, for all the groups involved in this area, and then sense-checking that back to those groups.

The REF2014 submissions provided a rich data set for learning more about the academic books created and deemed worthy of submission in the last REF cycle (2009-2014). They focused on Main Panel D for Arts and Humanities. Within this Panel the data can be investigated by Unit of Assessment subject area and by research output type. They hope to look at various areas including author gender, book format and length, books per submitting institution, and open access books. Tanner shared some initial analysis that threw up surprising findings, including that chapters still feature in REF submissions and that few publishers submit more than 10 titles.

Michael Jubb has been working with the Academic Book of the Future and initial findings from the research suggest books remain a critical part of the scholarly infrastructure in analogue form, but we haven't yet articulated how to present the broad range of scholarly resources in the humanities in an effective and user-friendly way. More will be discussed during Academic Book Week in the UK 9-16 November 2015.

There seem to be powerful incentives to write and to publish books, even as sales of individual titles fall. Are we publishing too many books?

Professor Peter Mandler from the University of Cambridge and President of the Royal Historical Society observed that technology is making more of an impact on publishing and it's right to reflect on its effects on monograph publishing, for good or ill. The high cost is a barrier, but it's not practical to remove price entirely, as a good deal of work goes into a book – including remunerated peer review. However, he believes that the sooner we can reduce cost through the use of ebooks, the better. New initiatives, including monographs of 30-60,000 words, are to be welcomed.

It's interesting to note that funders don't discuss the fact that the average score for a monograph was much higher than for a chapter or article. There are implications for the future of the academic book driven by changes to productivity in output, measurement and metrics around research, and generational change (the younger generation often prefers chapters or articles to long-form research).

Professor John Holmwood, University of Nottingham and Past President of the British Sociological Association, noted there are some social sciences that hardly submit any books. He reflected on a decline in cultural scholarship in some social science disciplines. He believes the move to a linear, cumulative form of journal output perhaps lacks reflection and transformative impact. The commercialisation of higher education and the development of publishing business models suggest a link with government actions. On the one hand there is a radical ambition to create a democratic open online library, but how does that fit with the commercialisation (and underlying privatisation) of universities?

Holmwood observed that both the curriculum and the book are being disrupted as a result. Publishers are disrupting both as well, with innovation in education delivery and digital provision of learning materials. He feels monographs and journals are moving at different speeds, and this is now becoming a problem. Article-based disciplines have citation patterns that show the short life of an article, compared to disciplines that tend towards long-form research, which can be cited for decades.

There is no doubt the debate will continue. Follow #AcBookWeek and @AcBookFuture for more details.

Friday 11 September 2015

Digital developments and new revenue streams

Timo Hannay introduced the final panel at the 2015 ALPSP Conference that focused on digital developments and new revenue streams.

Mary Ging has worked as a consultant and as MD for international at Infotrieve. In traditional access models, the annual license model still dominates. Publishers have been experimenting with value-add options. In the medium to long term the consensus is that the model will change. Given millennials' short attention spans and time crunch, is the traditional 12-15 page article the best way to disseminate research information? Is there a better alternative?

Pay Per View document delivery is bigger than you think. Publishers including CUP, Wiley, Nature and others offer rental on their websites as well as third parties such as DeepDyve and ReadCube. With article enhancement, sharing and collaboration tools there are also publisher or third party options as well.

There has been rapid growth in OA publications. In September 2015, DOAJ listed 10,555 journals – a 30% increase in three years. Hybrid journals are increasing and there's a significant improvement in quality. Most articles are covered by a Creative Commons license, increasingly CC-BY. Text and data mining is a new area where there's a lot of interest, but not a lot of revenue. Another issue is the lack of expertise. The biggest challenge is collecting the corpus in a consistent way. There is an opportunity to provide a corpus-creation solution for those who wish to do text and data mining. Is there a market for an eBay for datasets? That could work as an incentive to engage with this area.

Other opportunities include the importance of the patent literature and helping academics stay in touch with what's happening in this area. There could be tools to meet regulatory requirements (e.g. Quosa for pharmacovigilance). What about the cloud? Are there opportunities to offer one solution, with standard data structures, for publishers to buy into? The best opportunities will be found by publishers who define their remit more broadly than just the paper.

Mat Pfleger, Managing Director at the Copyright Licensing Agency, shared the challenges that CLA are considering as they develop their services. These relate to policy changes in education and HE as well as cuts in funding. Automatic renewals and inflationary pricing are symptoms of complacency. The challenge is to think beyond the short-term deal. The current focus on cost masks the broader challenges we face today. We need to focus on those.

Another challenge is a range of new, disruptive services that deliver content as part of a service, each providing data that can be used in many different ways. Each creates value across multiple touch points across an institution. Examples include The Mobile Learner's Library from Pearson; Kortext, which goes beyond content to offer collaboration and analytics; and the Article Galaxy Widget.

How do we as a community engage with multiple open resources? Some interesting examples: OpenStax, funded by a number of foundations, provides student access to peer-reviewed textbooks; this year alone it serviced 200,000 students and claims $25 million in savings. Lumen's mission is to provide open education resources to eliminate textbook costs. 4.9 million resources are downloaded each week from Tes, which recently hired a former eBay executive and has created a marketplace for teachers – a significant platform for open educational resources. Combined with the pressure on budgets, this is a significant game-changer.

What does the streaming and collective licensing of content mean for collective licensing organizations? Netflix, Spotify and EPIC! are all subscription services with the potential to disrupt. They all have a growing catalogue of content which is presented at a granular level, with royalties linked at this level through micropayment systems. Every content industry engaging with these services has had to seriously rethink its business model.

Chris Graf, Business Development Director for the Society Services Team at Wiley, pondered what societies really want from publishers. Primarily it is financial, and particularly new revenue. This new revenue can come from new markets, adjacent markets and new products. Surprisingly, the biggest growth in content is in Latin America, so that is an area of focus. In adjacent markets, transactional income such as rental and advertising can be considered. You can think of the user as potential adjacent revenue, but a user-pays model can be risky. With new breakout products you need technical insight, lower costs and usability. They consider all of this when looking at developing author services.

Graf closed by reflecting on revenue streams. What we have right now is a complex ecosystem that publishers and societies benefit from, but it has taken hundreds of years to develop. It's worth bearing that in mind.

Tanya Field, Director of Mobile Value Partners and self-proclaimed outsider, had a simple message. All the other content industries are having to learn how to overcome the hurdles they face in order to remain profitable. Whatever you deliver to your consumers, the actual delivery needs to be simple and incredibly easy to use. That means presentation layers for every single format – a technical challenge, as there are so many formats. You really need very clear signposting and intuitive flows for users to get to the content. It's not just about delivering flat information: younger users want to engage with content.

Your distribution strategy needs to be at the top of the access point. Last, but not least, the most important thing you need to consider is that the whole world is driven by data. Context and relevance are key to success. Know who your customer is, what they like, when they like it and deliver it to them. Your customer data strategy is key. If data isn't at the heart of your strategy it will be a problem in the future.

Peer review: evolution, experiment and debate

John Sack, Founding Director of HighWire Press introduced the morning panel on peer review at the 2015 ALPSP Conference.

Dr Aileen Fyfe, PI of the ‘Publishing the Philosophical Transactions’ project at the University of St Andrews reflected on how Henry Oldenburg used an editorial driven model for Philosophical Transactions. The Royal Society approved all issues for publication. But it wasn't at the article level, it was to confirm there were no threats to the country: it was ratification of a sort.

In the 1760s the Society considered papers by looking at the abstracts and taking a vote. This was to protect the reputation of the Society and did not involve checking facts or reasoning. Meanwhile in France, academicians at the Académie royale des sciences might be asked to report jointly on the truth claims being made in submitted papers. However, they did not judge each other, only outsiders. This high-level scrutiny was abandoned in the 1780s as it was deemed too difficult. In the Philosophical Transactions in the 1860s, referees who were members of the Society would make recommendations for publication. They provided literary comments about the article. The Fellows were writing about each other's work as well as outsiders', and judgments now included originality and significance.

The Philosophical Magazine in the 1920s:
Nature in the 1950s-70s would publish papers if they weren't actually wrong, with erratic refereeing; it relied on papers coming from good institutions and/or known labs. In summary, the history of peer review is not as simple as you might imagine. Much better to understand this before we revise and update peer review going forward.

Dr John R Inglis, Executive Director and Publisher at Cold Spring Harbor Laboratory Press, wryly noted there are many critics of modern peer review, but the prevailing view tends to be that it may not be perfect, but it's better than nothing. What do scientists think about peer review? Most are satisfied with it, think it helps scientific communication and think it has improved their papers. However, many think it can be improved, think it holds back science, and believe it is now unsustainable given the growth in journals and in science itself.

Most scientists think peer review should improve the quality of a paper, determine its originality and assess the importance of its findings. It helps to ensure previous work is acknowledged, to select the best papers for the journal and to detect falsehoods. Peer review is changing in many ways, including double blinding, transparency, publishing reviews alongside papers, checking figures for manipulation, using specialised data checks, validating authors and reviewers, and forbidding author-offered reviewers.

There are also changes to where peer review is done: often it is outsourced to peer review platforms like Rubriq, Peerage of Science, Editage and Publons. And there are changes to when peer review is done. After publication there is a range of options for commenting on papers: journal-specific commenting functions, PubMed Commons, PubPeer, ResearchGate and ScienceOpen.

Cold Spring Harbor Laboratory Press launched bioRxiv, their preprint server for biology. It's a not-for-profit, free service that distributes draft papers for open comment. Posting is quick. Papers get a date stamp and a DOI. There is a commenting function, and papers link to the history of their evolution. The result is rapid transmission of results for community consideration. More than 2,000 manuscripts have been posted from over 40 countries and more than 800 institutions, and rates of submission and usage are rising. Every subject category is represented and most manuscripts eventually appear in journals.

They know that 30% of papers have been revised and that 33% of all papers have been published, across more than 190 journals. They have had extensive feedback via social media, including 25,000 tweets. There are plans to make submission easier for authors. The use of preprints is changing: the behaviour of biologists is changing and journal policies are changing. Inglis closed by quoting the warnings contained in the Research Information Network report on peer review.

Dr Simon Kerridge is Director of Research Services at the University of Kent and Chair of the Board of Directors at the Association of Research Managers and Administrators. Peer review generally covers journal articles, monographs and other long-form research outputs, research data and other forms of scholarly output. From his point of view, it also includes research project proposals, research environment and strategy, and research impact. There are many purely academic reasons for doing peer review, but recognition is a part of it too. Promotion, esteem, time and money are all factors. Many universities have 'citizenship' as a criterion for promotion, where peer review is a factor. There may be internal or external mentoring or structured support. Becoming a journal editor or reviewer looks good on a CV.

By raising your profile you gain more recognition, but how is this recorded and advertised? Very few journals list reviewers. Some funders list a 'peer review college' and most conferences list reviewers. Most academics list their own reviewing, and universities try to keep full lists. There are internal work-allocation models that provide recognition for peer review. Some journals reward reviewers with reductions in charges, 'peer review miles' to offset future fees, and other waivers. Some conferences offer reductions, and some funders do pay reviewers or their institutions. Some universities pay bonuses for peer review (e.g. for REF peer review panel membership). With internal peer review it is unlikely you will get paid.

Dr Kirsty Edgar, Leverhulme Early Career Fellow at the University of Bristol, provided an early career researcher (ECR) point of view on peer review. She reflected that it always seems to be the third review that's bad! ECRs want to improve research, get the academic seal of approval and improve the dissemination of their research. But most importantly, they want to get through the process, publish in the highest-impact journal possible, improve their CV and get a job.

There are several issues. There is little in the way of support or training, although this is improving. Are you getting a fair deal as a reviewee? Will you get that promotion or fellowship? Will people read your work, and will you be able to afford to publish it?

There are some solutions: improve training, or change the system in small ways such as peer choice, cascading reviews and open peer review. You can also fundamentally change the system by getting rid of journals or, on a slightly less radical agenda, introduce pre-publication, data and post-publication review. Edgar cited the eLife model and the support it provides to early career researchers. She closed with some recommended reading: Sense About Science, the Voice of Young Science blog by James Steele and the BioMed Central blog by Sarah Hayes.

Thursday 10 September 2015

What does content and behavioural data mean for publishing? Microsoft's Kuansan Wang considers.

The availability of large amounts of content and behavioural data has also instigated new interdisciplinary research activities in the areas of information retrieval, natural language processing, machine learning, behavioural studies, social computing and data mining.

Kuansan Wang, Director of the Internet Service Research Centre at Microsoft Research considered the impact for the publishing and consumption of content, drawing on observations derived from a web scale data set, newly released to the public.

If you think about the web as a gigantic library of the future, then you should think about the semantic web as the librarian. It involves trust, proof, logic, ontology vocabulary, RDF Schema, XML Schema, Unicode and URIs.

A central theme for the semantic web is helping a machine read and make sense of content: human-readable versus machine-readable content. The semantic web requires humans to define a standard for data formats and models. It has an explicit and precise specification of knowledge representation that everyone has to agree upon.
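To make the contrast concrete, here is a minimal sketch of what "explicit and precise knowledge representation" looks like in the semantic-web style: facts stored as subject-predicate-object triples, as in RDF, that a machine can query directly. (The article, author and predicate names below are invented for illustration; they are not from Wang's talk.)

```python
# Facts as subject-predicate-object triples, RDF-style.
# All identifiers here are invented for illustration.
triples = [
    ("article:123", "hasTitle", "A Study of Peer Review"),
    ("article:123", "hasAuthor", "author:jane-doe"),
    ("author:jane-doe", "affiliatedWith", "org:example-university"),
]

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the given (possibly partial) pattern."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]

# Because every statement follows one agreed schema, a machine can answer
# "who wrote article:123?" without parsing free text.
authors = query(subject="article:123", predicate="hasAuthor")
print(authors)  # [('article:123', 'hasAuthor', 'author:jane-doe')]
```

The knowledge web, by contrast, tries to learn such relationships from human-readable text, without requiring everyone to agree on a schema first.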

The knowledge web is where a machine reads human readable contents. With the knowledge web, the machine learns to conflate different formats of the same thing. It involves latent and fuzzy representation of knowledge learned by mining big data.

There has been a paradigm shift in discovery. Traditional web search indexes keywords in documents, matches keywords in queries, and its notion of relevance is "10 blue links". Knowledge web search digests the world's knowledge, matches user intent and offers a dialogue experience.

The dialogue acts in Bing and Cortana are:
  1. answer 
  2. confirmation 
  3. disambiguation 
  4. suggestion 
  5. progress: refinement.

In Bing, you get answers; there is an element of confirmation/correction, refinement dialogue and digressive suggestion. The interface is designed for naturally spoken language, with context, confirmation and answers. You don't have to go to the search page: disambiguation starts as you type. They train the system to try to summarise what it has learned.

Some of the issues that bug the academic community are:
  • How to recommend completions for seldom observed or never foreseen queries?
  • How to rank these suggestions?
  • How to avoid making suggestions leading to no or bad results?
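The baseline against which those open questions arise can be sketched very simply (a toy illustration only, not Bing's implementation): rank previously observed queries sharing the typed prefix by frequency. The hard cases in the list above fall straight out of it, e.g. a prefix never seen in the log yields nothing. The query log below is invented.

```python
from collections import Counter

# Toy query-completion sketch over an invented query log.
query_log = [
    "peer review", "peer review history", "peer review history",
    "semantic web", "peer to peer networks",
]

counts = Counter(query_log)

def suggest(prefix, k=3):
    """Return up to k observed queries starting with prefix, most frequent first."""
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    matches.sort(key=lambda qn: (-qn[1], qn[0]))  # frequency desc, then alphabetical
    return [q for q, _ in matches[:k]]

print(suggest("peer"))     # frequent completions first
print(suggest("quantum"))  # unseen prefix: no suggestions -- the hard case
```

Handling never-seen prefixes well requires something beyond log lookup, which is exactly what makes the problem research-worthy.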
For finding researchers and potential collaborators, they train a machine to go through and aggregate all the information.

Cortana provides proactive suggestions on Windows, Android and iOS. The concept is based on the successful personal assistants to the stars, who write down the interests and activities of the people they serve to gain better insight. They have built in a lot of switches you can turn on and off for personalisation and privacy concerns, and they have now trained Cortana to do this for academics. One of the pain points you hit as a researcher is the paywall. Cortana tries to help by showing not only the academic article, but also related news stories.

The latest Microsoft vision is about empowering every person and every business to achieve more. They intend to do this through re-imagined productivity, more personal computing and the most intelligent cloud. This translates to academic search, Cortana Academic and Project Oxford.

Wednesday 9 September 2015

ALPSP Awards for Innovation in Publishing - the finalists for 2015

The announcement of the winner of the ALPSP Awards for Innovation in Publishing is nearly upon us. Much anticipated and sometimes controversial, there's no denying the quality, breadth and range of the finalists.

In an intense lightning session the night before the Awards dinner, each of the shortlisted organizations presented for four minutes to profile their submission.

The finalists were interviewed in the run up to the conference. Read each post at your leisure then debate who you think should have won on Thursday evening after the announcement!

Bookmetrix from Altmetric and Springer SBM
CHORUS - advancing public access to research
eLife Lens open-source reading tool from eLife
Impact Vizor from HighWire Press
JSTOR Daily online magazine
Kudos toolkit for researchers and their publishers
Overleaf authorship tool
RightFind XML for Mining from the Copyright Clearance Center
The Xvolution board game from NSTDA

The ALPSP Awards for Innovation in Publishing are sponsored by Publishing Technology. Not at the Awards dinner? Check back on the ALPSP website for the results!

Researching Researchers: Developing Evidence-Based Strategy for Improved Discovery and Access

How do you improve discovery and access to serve researchers, academics and students better? Roger C Schonfeld, Director of the Library and Scholarly Communications Program at Ithaka S+R, chaired a panel of a publisher, a librarian and a library supplier at the 2015 ALPSP Conference.

Lettie Conrad, Executive Manager for Online Products at SAGE, talked about their research on discoverability and delivery, and about learning from users to support their work. It's not about the user experience; it's about understanding the researcher experience. SAGE organises their product delivery around personas based on researchers and use-case studies.

Conrad observed that, whether we like it or not, the majority of searches start with the mainstream web. As researchers advance in study skills and move along their academic careers, they start to shift to speciality databases. Library discovery is for known items.

They undertook research into the researcher experience throughout their workflow. Findings on queries included higher reported use of open web search, validating authenticity, and browser trends. Findings on retrieval included 100% manually managed citations, low use of hyperlinked references and few 'version of record' checks. Many use citation metrics, but only if they are above the fold and nearby.

They went on to ask what the uptake was for apps and tools, and were surprised to hear that these didn't help with citation; it was a pain point. Easy import of citations was important, as was being able to personalise their digital library. What did this all mean for SAGE's strategy? They use the research findings to help shape strategy and ensure content is discoverable. They ensure they have good usage statistics. Their discovery strategy is based on their channels (library, open web, social media, academic, SAGE universe). Metadata is a key part of their strategy in three ways: stewardship, optimization and distribution. In the future, they are focusing on what's beyond search. What about the serendipitous process?

Deirdre Costello, Senior UX Researcher at EBSCO talked about how user expectations are formed on the open web, what users look for to make decisions about library resources, and why we need to think about our search results as one of the most important user experiences we can craft.

They conducted a video-diary research programme to gather honest and open feedback from college and university students aged 14-18. The great thing about this approach is that they saw the whole ecosystem, as well as the wider range of tools students use to organise their lives. The expectations formed by these wider tools get ported onto those for college use.

Students have competing demands on their time, from learning to do laundry for the first time to making friends and keeping in touch with family. In addition, minds are adapting to skim and scan content, which impacts how students search and interact with research.

Students have used Google for years and trust it, focusing on the top five results as it must be them that screwed up with the wrong search term, right? It's only when a tutor takes time out to explain how to question sources that students start to understand you can't trust everything you find on the web.

Lisa Janicke Hinchliffe is Professor/Coordinator for Strategic Planning/Coordinator for Information Literacy Services and Instruction at the University of Illinois' Library. They have articulated a user-centric framework of principles for library service development.

If you add a default search to your Easy Search query, there's a massive jump in usage. It is a very important piece of real estate for discovery. They use an evidence based and user centric framework in all their work and repeatedly go back to the data.

Their users value seamless digital delivery. They want coherent discovery pathways. They want things as simple as possible, but NOT simplistic. When they say they want 'everything', it's from THEIR perspective. They have tried and tested a number of search options: transparency, predictability/explainability and customisability are important.

Changing user behaviours include: the length of queries is growing, known-item searches are increasing and there is increasing use of copy-and-paste searching.

The user tasks that they aim to support are:

  • locate known item
  • locate known research tool
  • explore topic
  • identify/access library tools/databases for topic
  • identify/access research data and tools
  • identify assistance.
This has led to a range of discovery principles. They require personalisation and customisation, with full library discovery for content, services and spaces. They want the fewest steps from discovery to delivery. Everything owned, licensed or provided by the library should be discoverable. They aim to fully develop and deploy fewer tools. They are aiming for wide-scale implementation of adaptive contextual assistance, using consistent language and labelling. Crucially for a state-funded institution, they require the greatest discovery delivered at the lowest cost.

Anurag Acharya, co-creator of Google Scholar, asks: what happens when your library is worldwide and all articles are easy to find?

There was a real sense of anticipation in the room as co-creator of Google Scholar Anurag Acharya stepped up to make the first keynote of the 2015 ALPSP Conference.

Acharya harked back to his time at grad school in 1990. Print was the dominant format. Research had to be physically handled. Every library was limited or bound in different ways. There was wide distribution for core collections: each field had its own small set of journals, with wide visibility for published articles. But there was narrow distribution for other journals, found in far fewer libraries, leading to limited visibility for their published articles.

Browsing was the common way to find research: tables of contents for newly arrived issues, bibliography sections of papers you read, the shelves of the libraries you could walk to. Some libraries had search services, often based on titles, authors and keywords, sometimes including abstracts. There was no full-text indexing and no relevance ranking; the most recent came first. But if you couldn't find it, you couldn't learn from it! In every way you were limited: by shelves, by your institution's budget, by that which you didn't know about.

Fast forward to 2015. Almost all journals worldwide are online. A large fraction of archives are online. Anyone anywhere can browse it all - let your fingers do the walking. Your library is worldwide - online shelves have no ends. Relevance ranking allows all articles to rise - all articles are equally easy to find, new or old, well-known journal or obscure. Full text indexing allows all sections to rise including conclusions and methods.

Anyone anywhere can find it all: all areas, all languages, all time. Your own area or your colleague's, latest research or well-read classics, free to all users. If you can get online you can join the entire global research community. There is so much more that you can actually read from big deal licenses, free archives, preprints to open access journals and articles.

The transformation is fantastic: he could not have dreamt of this 25 years ago as a grad student. And publishers, societies, libraries and search services have together made this possible.

So how has researcher behaviour changed? What do they look for? What do they read? What do they cite? There is a tremendous growth in queries with many many more users and queries per user in all research and geographical areas.

Queries have evolved: the most growth has been in keyword/concept queries, as against author-name and known-item queries. The average query length has increased to 4-5 words. Multiple concepts or entities occur often, and most queries are unique. Queries are no longer limited to users' own areas. Relevance ranking makes exploration easy, and broad queries return classic/seminal work. There's a mix of expert and non-expert queries, with sustained growth in related-area queries. The researcher is no longer limited to narrow areas.

What do they read? There has been steady and sustained growth in reading per user, as well as in the diversity of areas read per user, matching the growth in related-area queries. Users read much more, as shown by growth in both abstract and full-text views.

There is more full text available than ever before. Iterative scanning is a common mode: do a query, scan, repeat. Abstracts that have full-text links in the search interface are selected more frequently, even if users don't actually read the full text. PDF remains extremely popular for full text, as it allows what is important to the researcher to remain accessible to them later.

They have undertaken research into what researchers cite and the evolution of citation patterns. The full report is published on the Google Scholar blog.

Anurag concluded by observing that if it is useful, researchers will find, read and cite it. The spread of attention is widening across the spectrum to non-elite journals (more specific, less known), older articles, regional journals and dissertations. Good ideas can come from anywhere, and insights are not limited to the well-funded or to the web-published. The top 10 journals still publish many top papers, though their share has fallen from 85% in 1995 to 75% in 2013. The elite are still elite, but less so.

Research is inherently a process of filtering and abstracts are a crucial part of the filtering process. Forcing full text on early-stage users is not useful and limiting COUNTER stats to full text misses much of an article's utility to researchers.

He reflected that we are lucky to live in an era of information plenty. Better a glut than a famine.

Thursday 27 August 2015

ALPSP Awards Spotlight on... JSTOR Daily

As the ALPSP Conference approaches (it's just under two weeks away and booking closes today) we are delighted to present the final post in the series that shines a spotlight on the finalists for the ALPSP Awards for Innovation in Publishing. Catherine Halley, Editor of JSTOR Daily, tells us what it's all about.

Tell us a bit about your company.

JSTOR is part of ITHAKA. ITHAKA is a not-for-profit organization that helps the academic community use digital technologies to preserve the scholarly record and to advance research and teaching in sustainable ways. We provide innovative services that benefit higher education, including Ithaka S+R, JSTOR, and Portico. JSTOR is a digital library of more than 2,000 academic journals, dating back to the first volume ever published, along with thousands of monographs and other material.

What is the project that you submitted for the Awards?

JSTOR Daily is an online magazine published by JSTOR that offers a fresh way for people to understand and contextualize their world. We make historical peer-reviewed scholarly research and other library content relevant and accessible to a general audience by connecting it to the news and offering open access to the original research and other content housed in the JSTOR library. Our cheeky tagline—“where news meets its scholarly match”—encapsulates our belief that deep, substantive journalism doesn’t have to be boring. In addition to weekly feature articles, the magazine publishes daily blog posts that provide the backstory to complex issues of the day in a variety of subject areas, interviews with and profiles of scholars and their work, and much more.

Our idea of a good story is one that:

  • tells thought-provoking stories that appeal to a general reader
  • draws on scholarly research to provide fresh insight into the news media and current affairs
  • deepens our understanding of our world
  • highlights the amazing content found on JSTOR
  • exposes the work of scholars who are using JSTOR to conduct their research.

Tell us more about how it works and the team behind it.

We’re an extremely small team, a micro team, really. The site has two full-time editors and two part-time contributing editors. The stories are written by freelancer writers who are paid for the content they produce. We sit in the marketing and communications department at JSTOR, and work with a part-time in-house marketing person as well as a designer.

Why do you think it demonstrates publishing innovation?

JSTOR Daily publishes high-quality, carefully researched content that provides an alternative to the listicles and clickbait that seem to dominate mainstream media.

JSTOR is home to some of the most fascinating and well-respected peer-reviewed scholarship, as well as thousands of historical documents. In fact, the sheer volume of information available in the library can be so overwhelming that much of the content remains hidden in the archive. By weaving stories around the research, primary source material, and other content in JSTOR, and relating it to current conversations in the public sphere, JSTOR Daily aspires to expose the breadth of this truly great library to a wider audience, and encourage a general reader, regardless of institutional affiliation, to discover and dive into it. The magazine appeals to the knowledge seeker and lifelong learner in each of us.

While the website publishes editorial content, rather than peer-reviewed scholarship, we hope the magazine stories will transform JSTOR from a passive repository of knowledge to an active participant—in partnership with our readers and authors—in the generation and dissemination of collective human wisdom.

What are your plans for the future?

JSTOR Daily was launched in October 2014 on a shoestring budget with just one editor. In keeping with JSTOR's commitment to testing what software developers call "minimum viable products," the site was built using an out-of-the-box WordPress theme called SimpleMag. While the live site is more technically efficient than innovative, we have long-term plans to add interactive content, data visualizations, and a podcast series in the coming year.

The winner of the ALPSP Awards for Innovation in Publishing, sponsored by Publishing Technology, will be announced at the ALPSP Conference. Book now to secure your place.

Tuesday 25 August 2015

ALPSP Awards Spotlight on... Impact Vizor from HighWire Press

In this, the penultimate post from the ALPSP Awards for Innovation in Publishing finalists, John Sack, Founding Director of HighWire Press, talks about how Impact Vizor came about and their plans for the future.

"Can you answer these questions?

Do you reject quality content in one journal that you could use to start a new journal?
Are the articles you published this year higher impact than last year’s?
Are you anxious about the impact of the new journal you just launched?
Has your tighter acceptance rate helped your competitors get better articles?
Are you rejecting some very high impact articles?
What topics that you publish are “trending”?  Is this a shift for your discipline?
Who publishes your rejected articles?
Are your review articles this year as good as they were the last two years?
What can you tell the new EIC about the impact of editorial changes?

(What questions are you and your editors asking along these lines?  Let us know in the comments.)

The above editorial and publishing questions have been around forever, but the answers have been based on instinct or feeling or what the EIC’s friends think.  Now we can get better answers.

Where do the better answers come from?

Almost from the start of HighWire, I have been using the tag phrase "evidence-based publishing". With its roots at Stanford University -- and in the Stanford University Library in particular -- HighWire was formed around the idea that data -- evidence -- could inform best practices and the best decisions.

Data hasn't been in short supply lately.  The challenges have been to "mill" the data into information that works for decision-makers: this often means integrating data across disparate systems, making it easy to understand, and timely to access.  We think we are on to how to do this with our "Vizor" suite of analytics products.

What are “Vizors”?

HighWire's Vizors are visualized analytics.  The first Vizor that has begun rolling out is Impact Vizor, which helps editors and publishers see the research impact of the articles they publish, and the articles they reject.

Impact Vizor includes the "Rejected-Article Tracker" (the “RAT”) which visualizes where articles an editor or publisher rejects get published, and how much they get cited.  The "RAT" has been rolled out to sixteen publishers.  Soon we will be rolling out the Section Performance Analyzer (the "SPA") and the Hot Object Tracker (the "HOT").   Following those will be the Advanced Correlator of Citation and Usage (the "ACCU") and the Cohort Comparator ("CC").

Editors get very very excited by the RAT; previously this kind of information has been hard to develop, and hard to visualize.  Impact Vizor does both.  And this information tells you important things about the potential for starting new sections, new journals, or other new products.

And in other big news:  a journal does not have to be hosted by HighWire to use Impact Vizor.  In fact, our first deployment was for a major publisher who is not hosted by HighWire.

How does it work?

Impact Vizor combines data across different systems to put together (literally) a picture of how content is being used and cited.   The key is to get the earliest possible indicators of research merit -- primarily citations -- and see what patterns there are in the data.  In addition to all the visuals, there are also full data tables for those who want to further process and investigate the data, by easily moving it into Excel for example.
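The integration step described above might look, in spirit, like this minimal sketch: join per-article citation counts and usage counts from separate systems on a shared identifier such as the DOI, so that patterns (e.g. high usage but few citations) become visible in one table. This is a hypothetical illustration with invented records, not HighWire's actual code.

```python
# Hypothetical per-article counts from two separate systems, keyed by DOI.
# All DOIs and numbers are invented for illustration.
citations = {"10.1000/a1": 12, "10.1000/a2": 3}
usage = {"10.1000/a1": 450, "10.1000/a2": 980, "10.1000/a3": 120}

def combine(citations, usage):
    """Merge citation and usage counts into one table keyed by DOI.

    An article missing from one system gets a count of 0 there, so
    nothing drops out of the combined picture.
    """
    dois = sorted(set(citations) | set(usage))
    return [
        {"doi": d, "citations": citations.get(d, 0), "usage": usage.get(d, 0)}
        for d in dois
    ]

for row in combine(citations, usage):
    print(row)
# An article with high usage but few citations (like 10.1000/a2 here) is the
# kind of pattern a correlator of citation and usage would surface.
```

In practice the join is across live publisher and citation-index systems rather than in-memory dictionaries, but the principle, a shared key plus an outer join, is the same.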

How did it get developed?

The Vizor development process incorporated our "early adopter” group of sixteen publishers right from the start: this group prioritized the components to be developed, and three or four publishers are involved in creating the prototypes of each component.

We have a highly-focused team inside HighWire working on the software, and we use a number of open source and "magic quadrant" components.  Key to the software of course is the data, and here we stand on industry APIs and standards, as well as a lot of analytics data and experience at HighWire over the years.

Why should you care?

Impact Vizor solves a significant problem facing publishers and editors: they “fly blind” when making changes in policy, scope or personnel, because the feedback from current citation monitoring tools is terribly slow and painful to interpret.

We integrate varied information sources, and do it rapidly.  We visualize the information, rather than just show large data tables.   Vizors are easy for the non-data-scientist to use; but they can deliver data for the data scientist.

We built a user community to direct the development towards the highest-value business and editorial questions, and we are delivering periodic new function so that value is increasing.

What’s next for Vizors? 

We plan to roll out a half dozen components of Impact Vizor after the RAT, according to a customer-guided roadmap.   Additional data sources and visualizations will come in 2016 as we further expand the concept of impact beyond citation metrics.

HighWire also will be developing other Vizors beyond Impact Vizor, to visualize usage, metadata, use demographics and manuscript workflows."

Friday 21 August 2015

ALPSP Awards Spotlight on… The Xvolution Game from NSTDA

The Xvolution is the first Thai commercial board game created from Thai fossil-specimen data, through an alliance of public and private sectors. It has been developed by NSTDA (the National Science and Technology Development Agency, Thailand), partly supported by Plan Toys (Plan Creations Co., Ltd.) and G Softbiz Co., Ltd. In this latest post from the finalists for the ALPSP Awards for Innovation in Publishing, the Xvolution team explain what it is and how it works.

Tell us a bit about your company

The National Science and Technology Development Agency (NSTDA) is an agency of the government under the Thai Ministry of Science and Technology. NSTDA is an umbrella organization that plans and executes the four mandated missions of research and development, technology transfer, human resources development and infrastructure development. NSTDA comprises four national research centers: BIOTEC (National Center for Genetic Engineering and Biotechnology), NECTEC (National Electronics and Computer Technology Center), MTEC (National Metal and Materials Technology Center), and NANOTEC (National Nanotechnology Center). NSTDA works closely with its partners from other government agencies and the private sector, both domestically and internationally, through different mechanisms to achieve these goals.

What is the project that you submitted for the Awards?

The Xvolution is an edutainment board game, partly based on Thai palaeontological specimen data. All illustrations were supervised by experts in the field. Augmented Reality (AR) technology and a matching mobile application accompany the game. Three stakeholders were involved in its development: the prototype was initiated by researchers, while the AR and the improvements for the commercial version were supported by private companies. The Xvolution is the first Thai commercial board game, providing a new business model for Thai game creators. The game contents were also adapted for use in a subway exhibition and at the Thailand Science and Technology Fair.

Tell us more about how it works and the team behind it.

The board game is both a recreational and an educational tool. Recently, board game playing has gained popularity among Thai teens and adults, especially in urban areas, but up to now all the commercial games have been imported from abroad. It is hoped that making a prototype game might be a good way to stimulate gamers to start creating new games themselves. The Xvolution was initiated by a Ph.D. student, Mr. Peechanit Ketsuwan, and two NSTDA officers, Ms. Sasithorn Teth-uthapak and Dr. Namchai Chewawiwat.

The game design is based on the idea that most people, of any age, may enjoy playing the game, and that the activity itself is a very effective way of learning. We focused on basic evolutionary content, supplemented with unique data from Thai fossilized specimens. All drawings were newly created for this project. Dr. Varavudh Suteethorn and Dr. Suravech Suteethorn supervised the accuracy of all the illustrations in the game. Both are researchers at the Palaeontological Research and Education Centre, Mahasarakham University; Dr. Varavudh is now the Director of the Centre.

The rules for Xvolution are modified from the standard Millionaire Game. But instead of using a big fixed-square board, we designed small hexagon-shaped steps that the players can combine and use as the route.

Some special command signs are added for Battle, Mutation and Level Up. All of these either add biological content or enhance the excitement of the game.

When a player steps on the 'Battle' sign, (s)he needs to battle with another player, using specific cards obtained at the start and during play. Each 'Battle' card shows a special environmental condition (e.g. endemic, hot, drought) and the player needs to choose cards with which to fight the opponent. So, for example, if an ice age is the condition, then an animal with fur will survive better than animals with thick or thin skin, respectively. About a dozen environmental change conditions are included, to demonstrate the interaction between animals and their physical/biological surroundings.

The 'Mutation' space teaches the concept of mutation. A player who lands on it draws a 'Mutation' card, which commands an action. This might be to exchange genetic material: swap your 'Skin' card with the central pool. The most severe mutation condition is extinction!

Every time a player passes the 'Start' point, (s)he gains some eggs, which can be exchanged to move up to the next level of evolution. The winner is the first player to pass from the oldest to the most recent geological era, or the one who gains the most cards and eggs within a period of time agreed by the players.

The game prototype was tested several times with different groups of players, from elementary, secondary and college students to adults, ranging from lay people to researchers. Most players were satisfied and had fun with it.

AR (augmented reality) technology is now being incorporated into many products, including books. G. Softbiz Co., Ltd. and several other Thai publishers have successfully used AR to promote fairy tale books. Taking photographs with dinosaur models is a very attractive idea, with the potential to make young children pay more attention to the game and its palaeontological content. G. Softbiz Co., Ltd. developed the dinosaur models for AR and the mobile app, which is available for free download.

Why do you think it demonstrates publishing innovation? 

A specific business model was used for Xvolution. NSTDA absorbed all the costs of prototype development and part of the final production. Plan Toys, a local company with expertise in children's toys and books, helped transform the prototype into the commercial version: some materials and resins were replaced with wood and paper, and the box set was made lighter and more compact. Plan Toys also shared some of the production costs. G. Softbiz Co., Ltd., another local company, expert in AR design and book publishing, developed the AR models and mobile application and invested in that part of the project. Revenue is shared among these three stakeholders.

What are your plans for the future? 

After more than five years of development, the Xvolution board game launched in January 2015 at the Isetan Department Store, CentralWorld, in downtown Bangkok. The opportunity to export Xvolution abroad is now being considered. The game's social media presence can be found at

From the start, monetary profit has not been the major reason for, or target of, our innovation. Our concerns are science communication to the public, including the distribution of scientific information, and the application of new ways of building public understanding of science.

We have therefore used our information and products in many other ways too. The game was simplified and used as part of the Thailand Science and Technology Fair 2012, which normally runs for two weeks and attracts more than a million visitors, mostly students. We also ran an Xvolution mini-exhibition at Chatuchak station on the Bangkok underground from July to December 2013; several tens of thousands of people pass through this station daily.

We also tried to adapt the Xvolution content into a TV game show on the Thai PBS station. The programme concept was discussed several times and air-time was granted, but unfortunately we could not find sponsors. In the future, we aim to distribute a simple blueprint of the game as a free download for students and teachers, to use as a study tool in schools.

The Xvolution team:
Ms. Sasithorn Teth-uthapak,
 Dr. Namchai Chewawiwat, email:
Mr. Peechanit Ketsuwan, email:

The winner of the ALPSP Awards for Innovation in Publishing, sponsored by Publishing Technology, will be announced at the ALPSP Conference. Book now to secure your place.

Wednesday 19 August 2015

ALPSP Awards Spotlight on… CHORUS: Collaboration and Innovation for the Public Benefit

In this latest post from the finalists for the ALPSP Awards for Innovation in Publishing, Susan Spilka, Marketing and Communications Director for CHORUS, tells us more about the project and why they felt it should be entered for the Awards.

"It’s an honor for CHORUS to be included among the impressive group of finalists for the ALPSP Awards for Innovation in Publishing 2015. We greatly appreciate the opportunity to showcase the work we are doing.

CHORUS is the outcome of successful collaboration since 2013 among publishers, funding agencies, scholarly societies, and other stakeholders. Our services and best practices provide a sustainable, scalable, cost-effective, interoperable, and transparent solution to deliver public access to published articles reporting on funded research. As a not-for-profit 501(c)(3) membership organization, CHORUS is policy-neutral and focuses on building consensus and services to advance public access in a way that recognizes and sustains the value that publishers bring to science and scholarship, for the benefit of all. The result is increased discoverability for authors’ research, enhanced accountability for funders, and accessibility to articles reporting on research, for everyone.

CHORUS is being adopted by funders to enhance their efforts to increase public access to articles based on funded research and to complement their compliance procedures. CHORUS launched with a commitment from the US Department of Energy (followed by a Participation Agreement in April 2015). This week, the Smithsonian Institution released its Public Access plan, naming CHORUS as part of its solution. We expect to announce a pilot project with another federal agency very soon, and anticipate more agency agreements by early fall. CHORUS has also started discussions with a number of funders outside the US, as well as several global NGOs that fund research.

CHORUS is growing rapidly, gaining momentum as we add publisher members’ content and enter into partnerships with funders. As of early August 2015, we are monitoring over 120,000 articles associated with 24 funders for public access, with more than 30,000 already publicly accessible. Our membership roster includes publishers of all stripes and sizes (including not-for-profit societies, university presses, and commercial companies with pure OA and hybrid programs) as well as publishing service providers and organizations. Members pay modest fees to support CHORUS’ operations, while funders, researchers, university research officers, librarians, publishers and the public all benefit from CHORUS’ services at no cost. Our small staff includes myself and Howard Ratner (Executive Director) – two publishing veterans bringing multiple decades of experience – along with a Digital Analyst, Program Manager, and outsourced development, legal, and finance staff. We rely heavily on the tireless efforts of a volunteer Board of Directors and working groups.

CHORUS knits together new services and open APIs with the existing scholarly communications infrastructure to minimize effort and expense and maximize the identification, discovery, access, and preservation steps of compliance to funder public-access requirements. CHORUS utilizes CrossRef’s FundRef service to identify and keep track of papers reporting on research coming from funder grants and contracts. We promote the use of ORCID identifiers and other industry best practices. The resulting metadata (including the DOIs of the published articles) helps make articles more easily discoverable (via CHORUS’ Search service as well as agency portals and general search engines) and transparently describes their public accessibility.

Part of what makes CHORUS unique is that our distributed access approach points users directly to the best available version of articles (either the accepted author manuscript or Version of Record) on publishers’ sites, either immediately or after the prescribed embargo period, in perpetuity. CHORUS also provides freely available, downloadable dashboards with detailed reporting on public accessibility and availability of reuse license terms and preservation arrangements. Our auditing system – a hybrid system of automation and manual checking for public access – follows the DOI links for every identified article in our database. CHORUS publisher members are required to archive their content; CHORUS has special arrangements with CLOCKSS or Portico that enable perpetual public accessibility.

CHORUS is committed to evolving with the needs of the scholarly community. Recently we started to provide customized publisher dashboards as a member benefit. We’ll soon be making our open APIs widely available. Further down the pipeline, we are considering projects designed to make publicly accessible content and data more discoverable. Our recent agreement with ORCID formalizes the coordination of our efforts to promote the adoption of identifiers and standards to manage access to, and reporting of, research outputs. We also see tremendous potential in providing access to data; as a result, we are participating in such initiatives as CrossRef/DataCite, the RDA/WDS Publishing Data Services Working Group, and RMap.

Stay tuned for new developments … it’s going to be a busy fall! Follow CHORUS on Twitter @CHORUSaccess and LinkedIn."

The winner of the ALPSP Awards for Innovation in Publishing, sponsored by Publishing Technology, will be announced at the ALPSP Conference. Book now to secure your place.