Tuesday 22 September 2015

Reflections on #alpsp15 - Digital Science's Phill Jones explores the key issues

Phill Jones, Head of Publisher Outreach at Digital Science, reflects on the 2015 ALPSP Conference in this blog post, including the duelling keynote talks from Anurag Acharya, co-founder of Google Scholar, and Kuansan Wang, Director of the Internet Service Research Center at Microsoft Research. He notes their very different views on academic discovery on the open web.

"Citing the difference between "general" search for say a local business, and the geographically global nature of "academic" search, Acharya suggested that personalizing Google scholar wouldn’t yield much additional value. Conversely, Wang described a very different philosophy of highly monitored, highly personalized search through Bing and Cortana that would adapt to individual users needs."

He reflects on the shift in customer base from library to researcher and the resulting revelations as publishers try to better understand their needs:

"Google is so firmly embedded in young researcher’s routines that they don’t even think about the fact that they use it. You wouldn’t expect somebody to tell you that they opened an internet browser, would you?" 

The panel on peer review provoked the following thought:

"One reoccurring theme that emerged from the discussions: the fact too much is currently being asked of the peer-review process. With the mantra of "publish or perish" being truer now than it’s ever been, it can be argued that publishers find themselves unwittingly in the position of administering the process that decides whose career advances and whose doesn’t."

With that position comes great responsibility, something that will no doubt be considered in more detail during Peer Review Week, to be held 28 September to 2 October 2015. The week is a collaboration between ORCID, ScienceOpen and Wiley, announced by Alice Meadows during the conference.

Read Phill's full post here on the Digital Science blog.

Monday 14 September 2015

The Academic Book of the Future?

Much discussion of scholarly communication is dominated by scientific and (especially) serials concerns. This session aimed to redress the balance. Richard Fisher chaired a distinguished panel of academics to discuss the recent trends and data on monographs and the current AHRC project on the Academic Book of the Future. These are natural starting points for an extended discussion of what still remains the major currency of both communication and esteem in many academic subjects in the humanities and social sciences.
Simon Tanner from King’s College London and the AHRC Academic Book of the Future project provided an overview of the work completed to date and some highlights from the research data. The first stage of the research project has focused on finding out what roles and purposes academic books serve in scholarship and wider learning for all groups involved in this area, and then sense-checking the findings with those groups.

The REF2014 submissions provided a rich data set for learning more about the academic books created and deemed worthy of submission in the last REF cycle (2009-2014). The project focused on Main Panel D for Arts and Humanities. Within this Panel the data can be investigated by Unit of Assessment Subject Area and by Research Output Type. They hope to look at various areas including author gender, book format or length, books per submitting institution and open access books. Tanner shared some initial analysis that threw up surprising findings, including how chapters still feature in REF submissions and how few publishers submit more than 10 titles.

Michael Jubb has been working with the Academic Book of the Future project. Initial findings from the research suggest books remain a critical part of the scholarly infrastructure in analogue form, but we haven't yet articulated how to present the broad range of scholarly resources in the humanities in an effective and user-friendly way. More will be discussed during Academic Book Week in the UK, 9-16 November 2015.

There seem to be powerful incentives to write and to publish books, even as volumes of sales of individual titles fall. Are we publishing too many books?
Professor Peter Mandler from the University of Cambridge, President of the Royal Historical Society, observed that technology is making more of an impact on publishing and it's right to reflect on its effects on monograph publishing, for good or evil. The high cost is a barrier, but it's not practical to remove the price entirely as a good deal of work goes into a monograph, including remunerated peer review. However, he believes that the sooner we can reduce cost through use of ebooks, the better. New initiatives including monographs of 30,000-60,000 words are to be welcomed.

It's interesting to note that the funders don't discuss the fact that the average score for a monograph was much higher than for a chapter or article. There are implications for the future of the academic book driven by changes to productivity in output, measurement and metrics around research, and generational changes (the younger generation often prefers chapters or articles to long form research).

Professor John Holmwood, University of Nottingham and Past President of the British Sociological Association, noted there are some social sciences that hardly submit any books. He reflected on a decline in cultural scholarship in some social science disciplines. He believes the move across to a linear, cumulative form of journal output perhaps lacks reflection and transformative impact. The commercialisation of higher education and the development of publishing business models suggest a link in government actions. On the one hand there is a radical ambition to create a democratic open online library, but how does that fit with the commercialisation (and underlying privatisation) of universities?
Holmwood observed there is disruption of the curriculum and the book as a result. Publishers are disrupting both as well, through innovation in education delivery and digital provision of learning materials. He feels monographs and journals are moving at different speeds and this is now becoming a problem. Article based disciplines have citation patterns that show a short life for an article, compared to disciplines that tend towards long form research, which can be cited for decades.

There is no doubt the debate will continue. Follow #AcBookWeek and @AcBookFuture for more details.

Friday 11 September 2015

Digital developments and new revenue streams

Timo Hannay introduced the final panel at the 2015 ALPSP Conference that focused on digital developments and new revenue streams.

Mary Ging has worked as a consultant and as MD for international at Infotrieve. In traditional access models, the annual license model still dominates. Publishers have been experimenting with value-add options. In the medium to long term the consensus is that the model will change. Given millennials' short attention spans and time crunch, is the traditional 12-15 page article the best way to disseminate research information? Is there a better alternative?

Pay Per View document delivery is bigger than you think. Publishers including CUP, Wiley, Nature and others offer rental on their websites as well as third parties such as DeepDyve and ReadCube. With article enhancement, sharing and collaboration tools there are also publisher or third party options as well.

There has been rapid growth in OA publications. In September 2015, DOAJ listed 10,555 journals - a 30% increase in three years. Hybrid journals are increasing and there's a significant improvement in quality. Most articles are covered by a Creative Commons license, increasingly CC-BY. Text and data mining is a new area where there's a lot of interest, but not a lot of revenue. Another issue is the lack of expertise. The biggest challenge is collecting the corpus in a consistent way. There is an opportunity to provide a corpus creation solution for those who wish to do text and data mining. Is there a market for an eBay for datasets? That could work as an incentive to work with it.

Other opportunities include the importance of the patent literature and helping academics stay in touch with what's happening in this area. There could be tools to meet regulatory requirements (e.g. Quosa for pharmacovigilance). What about the cloud? Are there opportunities to use it for one solution that publishers can buy into, with standard data structures? The best opportunities will be found by publishers who define their remit more broadly than just the paper.

Mat Pfleger, Managing Director at the Copyright Licensing Agency, shared the challenges that CLA are considering as they develop their services. These relate to policy changes to education and HE as well as cuts in funding. Automatic renewals and inflationary pricing are symptoms of complacency. The challenge is to think beyond the short term deal. The current focus on cost masks the broader challenges we face today. We need to focus on those.

Another challenge is a range of new, disruptive services that deliver content as part of a service, each providing data that can be used in many different ways. Each creates value across multiple touch points across an institution. Some examples include:

  • The Mobile Learner's Library from Pearson
  • Kortext - beyond content, it's a collaboration and analytics tool
  • Article Galaxy Widget

How do we as a community engage with multiple open sources? One interesting example is OpenStax, which is funded by a number of foundations and provides student access to peer reviewed textbooks. This year alone they served 200,000 students and claim $25 million in savings. Lumen's mission is to provide open education resources to eliminate textbook costs. 4.9 million resources are downloaded each week from Tes. They recently hired a former eBay contact and have created a marketplace for teachers. It is a significant platform for open educational resources. When you combine this with the challenge to the budget, it is a significant game-changer.

What does the collective licensing and streaming of content mean for collective licensing organizations? Netflix, Spotify and EPIC! are all subscription services that have potential to disrupt. They all have a growing catalogue of content which is presented at a granular level. Royalties are linked at this level with a micropayment system. Every content industry engaging with these services has had to have a serious rethink about its business model.

Chris Graf, Business Development Director for the Society Services Team at Wiley, pondered what societies really want from publishers. Primarily it is financial, particularly around new revenue. This new revenue can come from new markets, new adjacent markets and new products. Surprisingly, the biggest growth in content is in Latin America, so that is an area they focus on. With adjacent markets, transactional income such as rental and advertising can be considered. You can think about the user as potential adjacent revenue, but a user pays model can be risky. With new breakout products you need technical insight, lower costs and usability. They consider this when looking at developing author services.

Graf closed by reflecting on revenue streams. What we have right now is a complex eco-system that publishers and societies benefit from, but it has taken hundreds of years to develop. It's worth bearing that in mind.

Tanya Field, Director of Mobile Value Partners and self-proclaimed outsider, had a simple message. All the other industries are having to learn that working as individuals will enable them to overcome the hurdles they face to remain profitable. Whatever you deliver to your consumers, the actual delivery needs to be simple and incredibly easy to use. That means presentation levels for every single format. That's a technical challenge as there are so many formats. You really need very clear signposting and intuitive flows for the users to get to the content. It's not just about delivering flat information. Younger users want to engage with content.

Your distribution strategy needs to be at the top of the access point. Last, but not least, the most important thing you need to consider is that the whole world is driven by data. Context and relevance are key to success. Know who your customer is, what they like, when they like it and deliver it to them. Your customer data strategy is key. If data isn't at the heart of your strategy it will be a problem in the future.

Peer review: evolution, experiment and debate

John Sack, Founding Director of HighWire Press introduced the morning panel on peer review at the 2015 ALPSP Conference.

Dr Aileen Fyfe, PI of the ‘Publishing the Philosophical Transactions’ project at the University of St Andrews reflected on how Henry Oldenburg used an editorial driven model for Philosophical Transactions. The Royal Society approved all issues for publication. But it wasn't at the article level, it was to confirm there were no threats to the country: it was ratification of a sort.

In the 1760s the Society considered papers by looking at the abstracts and would take a vote. This was to protect the reputation of the Society and did not check facts or reasoning. Meanwhile in France, at the Académie royale des sciences, academicians might be asked to report jointly on the truth claims being made in submitted papers. However, they did not judge each other, only outsiders. This high-level scrutiny was abandoned in the 1780s as it was deemed too difficult. In the Philosophical Transactions in the 1860s, referees who were members of the Society would make recommendations for publication. They provided literary comments about the article. The Fellows were writing about each other's work as well as that of outsiders. This now included a judgment on originality and significance.

The Philosophical Magazine in the 1920s:
Nature in the 1950s-70s would publish papers if they weren't actually wrong, and refereeing was erratic. They relied on papers that came from good institutions and/or known labs. In summary, the history of peer review is not as simple as you might imagine. It is much better to understand this before we revise and update peer review.

Dr John R Inglis, Executive Director and Publisher at Cold Spring Harbor Laboratory Press, wryly noted there are many critics of modern peer review, but the prevailing view tends to be that it may not be perfect, but it's better than nothing. What do scientists think about peer review? Most are satisfied with it, think it helps scientific communication and think it has improved their papers. However, many think it can be improved, think it holds back science and believe it is now unsustainable given the growing number of journals and volume of science.

Most scientists think peer review should improve the quality of a paper, and determine its originality and the importance of its findings. It should also help to ensure previous work is acknowledged, select the best papers for the journal and detect falsehoods. There are many ways that peer review is changing, including double blinding, transparency, publishing reviews alongside papers, checking figures for manipulation, using specialised data, validating authors and reviewers and forbidding author-offered reviewers.

There are also changes to where peer review is done. Often it is outsourced to peer review platforms like Rubriq, Peerage of Science, Editage and Publons. There are also changes to when peer review is done. After publication there is a range of options to comment on papers: journal-specific commenting functions, PubMed Commons, PubPeer, ResearchGate and Academia.edu, ScienceOpen and SelectedPapers.net.

Cold Spring Harbor Press launched their pre-print server for biology, bioRxiv. It's a not-for-profit free service that distributes draft papers for open comment. Posting is quick. Papers get a date stamp and a DOI. There is a commenting function and they link to the history of the evolution of the paper. It results in rapid transmission of results for community consideration. They have more than 2000 manuscripts posted from over 40 countries and more than 800 institutions. There are rising rates of submission and usage. Every subject category is represented and most manuscripts eventually appear in journals.

They know that 30% of papers have been revised and 33% of all papers have been published in more than 190 journals. They have extensive feedback via social media including 25,000 tweets. There are plans to make submission easier for authors. The use of pre-prints is changing. The behaviour of biologists is changing and journals' policies are changing. Inglis closed by quoting the warnings contained in the Research Information Network report on Peer Review.

Dr Simon Kerridge is Director of Research Services, University of Kent and Chair of the Board of Directors at the Association of Research Managers and Administrators. Peer review is generally for journal articles, monographs and other long form research outputs, research data and other forms of scholarly output. From his point of view, it also includes research project proposals, research environment and strategy, and research impact. There are many purely academic reasons for doing peer review, but recognition is a part of it too. Promotion, esteem, time and money are all factors. Many universities have 'citizenship' as a criterion for promotion, where peer review is a factor. There may be internal or external mentoring or structured support. Becoming a journal editor or reviewer looks good on a CV.

By raising your profile you gain more recognition, but how is this recorded/advertised? Very few journals list reviewers. Some funders list 'peer review college' and most conferences list reviewers. Most academics list their own reviewing and universities try to keep full lists. There are internal work allocation models that provide recognition for peer review. Some journals reward reviewers with reduction in charges, 'peer review miles' to offset future fees and other waivers. Some conferences have reductions and some funders do pay reviewers or their institution. There are some universities that pay bonuses for peer review (e.g. REF peer review panel). With internal peer review it is unlikely you will get paid.
Dr Kirsty Edgar, Leverhulme Early Career Fellow at the University of Bristol, provided an early career researcher (ECR) point of view on peer review. She reflected that it always seems to be the third review that's bad! ECRs want to improve research, get an academic seal of approval and improve the dissemination of their research. But most importantly, they want to get through the process, publish in the highest impact journal possible, improve their CV and get a job.

There are several issues. There is little in the way of support or training, although this is improving. Are you getting a fair deal as a reviewee? Will you get promotion or a fellowship? Will people read your work and will you be able to afford to publish your work?

There are some solutions: improve training or change the system in a small way, such as peer choice, cascading reviews and open peer reviews. You can also fundamentally change the system by getting rid of journals or, on a slightly less radical agenda, introduce pre-publication, data and post-publication review. Edgar cited the eLife model and the support it provides to early career researchers. Edgar closed with some recommended reading: Sense About Science, the Voice of Young Science blog by James Steele and the BioMed Central Blog by Sarah Hayes.

Thursday 10 September 2015

What does content and behavioural data mean for publishing? Microsoft's Kuansan Wang considers.

The availability of large amounts of content and behavioural data has also instigated new interdisciplinary research activities in the areas of information retrieval, natural language processing, machine learning, behavioural studies, social computing and data mining.

Kuansan Wang, Director of the Internet Service Research Center at Microsoft Research, considered the impact on the publishing and consumption of content, drawing on observations derived from a web scale data set, newly released to the public.

If you think about the web as a gigantic library of the future, then you should think about the semantic web as the librarian. It involves trust, proof, logic, ontology vocabulary, RDF Schema, XML Schema, Unicode and URI.

A central theme for the semantic web is trying to help a machine read and make sense of content: human readable versus machine readable contents. The semantic web requires humans to define a standard for data formats and models. It has an explicit and precise specification of knowledge representation that everyone has to agree upon.

The knowledge web is where a machine reads human readable contents. With the knowledge web, the machine learns to conflate different formats of the same thing. It involves latent and fuzzy representation of knowledge learned by mining big data.

There has been a paradigm shift in discovery. Traditional web search indexes keywords in documents, matches keywords in queries and presents relevance as "10 blue links". Knowledge web search digests the world's knowledge, matches user intent and offers a dialogue experience.

The dialogue acts in Bing and Cortana are:
  1. answer 
  2. confirmation 
  3. disambiguation 
  4. suggestion 
  5. progress: refinement.

In Bing, you get answers, there is an element of confirmation/correction, refinement dialogue and digressive suggestion. The interface is designed for naturally spoken language with context, confirmation and answer. You don't have to go to the search page, the disambiguation starts as you type. They train the system to try to summarise what it has to learn.

Some of the issues that bug the academic community are:
  • How to recommend completions for seldom observed or never foreseen queries?
  • How to rank these suggestions?
  • How to avoid making suggestions leading to no or bad results?
For finding researchers and potential collaborators they train a machine to go through and aggregate all the information.
Cortana provides proactive suggestions on Windows, Android and iOS. The concept is based on the successful personal assistants to the stars, who write down the interests and activities of the people they serve to gain better insight. They have built in a lot of switches you can turn on or off for personalisation or if you have privacy concerns, and have now trained Cortana to do this for academics. One of the pain points for a researcher is hitting a paywall. Cortana tries to help by showing not only the academic article, but also related news stories.

The latest Microsoft vision is about empowering every person and every business to achieve more. They intend to do this through re-imagined productivity, more personal computing and the most intelligent cloud. This translates to academic search, Cortana Academic and Project Oxford.

Wednesday 9 September 2015

ALPSP Awards for Innovation in Publishing - the finalists for 2015

The announcement of the winner of the ALPSP Awards for Innovation in Publishing is nearly upon us. Much anticipated and sometimes controversial, there's no denying the quality, breadth and range of the finalists.

In an intense lightning session the night before the Awards dinner, each of the shortlisted organizations presented for four minutes to profile their submission.

The finalists were interviewed in the run up to the conference. Read each post at your leisure then debate who you think should have won on Thursday evening after the announcement!

Bookmetrix from Altmetric and Springer SBM
CHORUS - advancing public access to research
eLife Lens open-source reading tool from eLife
Impact Vizor from HighWire Press
JSTOR Daily online magazine
Kudos toolkit for researchers and their publishers
Overleaf authorship tool
RightFind XML for Mining from the Copyright Clearance Center
The Xvolution board game from NSTDA

The ALPSP Awards for Innovation in Publishing are sponsored by Publishing Technology. Not at the Awards dinner? Check back on the ALPSP website for the results!

Researching Researchers: Developing Evidence-Based Strategy for Improved Discovery and Access

How do you improve discovery and access to serve researchers, academics and students better? Roger C Schonfeld, Director of the Library and Scholarly Communications Program at Ithaka S+R, chaired a panel including a publisher, a librarian and a library supplier at the 2015 ALPSP Conference.

Lettie Conrad, Executive Manager for Online Products at SAGE talked about their research on discoverability and delivery and learning from users to support their work. It's not about the user experience, it's about understanding the researcher experience. SAGE organises their product delivery on personas based on researchers and use case studies.

Conrad observed that whether we like it or not, the majority of search starts with the mainstream web. As researchers advance in study skills and move along their academic careers, they start to shift to speciality databases. Library discovery is for known items.

They undertook research into the researcher experience throughout the workflow. Findings on queries included higher reported use of open web search, validation of authenticity and browser trends. Findings on retrieval included 100% manually managed citations, low use of hyperlinked references and few 'version of record' checks. Many use citation metrics, but only if they were above the fold and nearby.

They went on to ask what the uptake was for apps and tools, and were surprised to hear that these didn't help with citation. It was a pain point. Easy import of citations was important, as was being able to personalise their digital library. What did this all mean for SAGE's strategy? They use the research findings to help shape strategy and ensure content is discoverable. They ensure they have good usage statistics. Their discovery strategy is based on their channels (library, open web, social media, academic, SAGE universe). Metadata is a key part of their strategy in three ways: stewardship, optimization and distribution. In the future, they are focusing on what's beyond search. What about the serendipitous process?

Deirdre Costello, Senior UX Researcher at EBSCO talked about how user expectations are formed on the open web, what users look for to make decisions about library resources, and why we need to think about our search results as one of the most important user experiences we can craft.

They conducted a video diary research programme to gather honest and open feedback from college and university students aged 14-18 years old. The great thing about this approach is that they saw the whole ecosystem as well as the wider range of tools they use to organise their lives. The expectations from these wider tools get ported on to those for college use.

Students have competing demands on their time, from learning to do laundry for the first time to making friends and keeping in touch with family. In addition, the changing neurology of minds that skim and scan content has an impact on how students search and interact with research.

Students have used Google for years and trust it. They focus on the top five results; if those aren't useful, it must be them that screwed up with the wrong search term, right? It's only when a tutor takes time out to explain how to question sources that students start to understand you can't trust everything you find on the web.

Lisa Janicke Hinchliffe is Professor/Coordinator for Strategic Planning/Coordinator for Information Literacy Services and Instruction at the University of Illinois' Library. They have articulated a user-centric framework of principles for library service development.

If you add a default search to your Easy Search query, there's a massive jump in usage. It is a very important piece of real estate for discovery. They use an evidence based and user centric framework in all their work and repeatedly go back to the data.

Their users value seamless, digital delivery. They want coherent discovery pathways. They want things as simple as possible, but NOT simplistic. When they say they want 'everything', it's from THEIR perspective. They have tried and tested a number of search options: transparency, predictability/explainability and customisability are important.

Changing user behaviours include: query length is growing, known item searches are increasing and there is an increasing use of copy and paste searching.

The user tasks that they aim to support are:

  • locate known item
  • locate known research tool
  • explore topic
  • identify/access library tools/databases for topic
  • identify/access research data and tools
  • identify assistance.
This has led to a range of discovery principles. They require personalisation and customisation with full library discovery for content, services and spaces. They want the fewest steps from discovery to delivery. Everything owned, licensed or provided by the library should be discoverable. They aim to fully develop and deploy fewer tools. They are aiming for a wide scale implementation of adaptive contextual assistance and use consistent language and labelling. Crucially for a state funded institution, they require the greatest discovery delivered at the lowest cost.

Anurag Acharya, co-creator of Google Scholar asks: What happens when your library is worldwide and all articles are easy to find?

There was a real sense of anticipation in the room as co-creator of Google Scholar Anurag Acharya stepped up to make the first keynote of the 2015 ALPSP Conference.
Acharya harked back to his time at grad school in 1990. Print was the dominant format. Research had to be physically handled. Every library was limited or bound in different ways. There was wide distribution for core collections, each field would have its own small sets of journals, with wide visibility for published articles. But there was narrow distribution for other journals that were found in far fewer libraries leading to limited visibility for published articles.

Browse was the common way to find research: tables of contents for newly arrived issues, bibliography sections of papers you read, shelves of the libraries you could walk to. Some libraries had search services that were often based on titles, authors and keywords; some included abstracts. There was no full text indexing and no relevance ranking. The most recent came first. But if you can't find it, you can't learn from it! You were limited in every way: by shelves, by your institution's budget, by that which you didn't know about.

Fast forward to 2015. Almost all journals worldwide are online. A large fraction of archives are online. Anyone anywhere can browse it all - let your fingers do the walking. Your library is worldwide - online shelves have no ends. Relevance ranking allows all articles to rise - all articles are equally easy to find, new or old, well-known journal or obscure. Full text indexing allows all sections to rise including conclusions and methods.

Anyone anywhere can find it all: all areas, all languages, all time. Your own area or your colleague's, latest research or well-read classics, free to all users. If you can get online you can join the entire global research community. There is so much more that you can actually read, from big deal licenses, free archives and preprints to open access journals and articles.
The transformation is fantastic - he could not have dreamt of this 25 years ago as a grad student. And publishers, societies, libraries and search services have together made this possible.

So how has researcher behaviour changed? What do they look for? What do they read? What do they cite? There is tremendous growth in queries, with many more users and more queries per user in all research and geographical areas.

Queries evolve: there has been the most growth in keyword/concept queries e.g. author name queries, known item queries. The average query length has increased to 4-5 words. Multiple concepts or entities occur often and most queries are unique. Queries are no longer limited just to the user's own area. Relevance ranking makes exploration easy and broad queries return classics/seminal work. There's a mix of expert and non-expert queries from users with sustained growth in related area queries. The researcher is no longer limited to narrow areas.

What do they read? There has been steady and sustained growth per user as well as in diversity of areas per user, comparable to the growth in related area queries. Users read much more, as shown by the growth in both abstracts and full texts read.

There is more full text available than ever before. Iterative scanning is a common mode: do a query, scan. Abstracts that have full text links in the search interface are selected more frequently, even if users don't actually read the full text. PDF remains extremely popular for full text, as it allows what is important to the researcher to be accessible to them later.

They have undertaken research into what researchers cite and the evolution of citation patterns. The full report is published on the Google Scholar blog.

Acharya concluded by observing that if it is useful, researchers will find, read and cite it. The spread of attention is widening across the spectrum to non-elite journals (more specific, less known), older articles, regional journals and dissertations. Good ideas can come from anywhere and insights are not limited to the well-funded or to the web-published. The top 10 journals still publish many top papers: 85% in 1995, falling to 75% in 2013. The elite are as yet still elite, but less so.

Research is inherently a process of filtering and abstracts are a crucial part of the filtering process. Forcing full text on early-stage users is not useful and limiting COUNTER stats to full text misses much of an article's utility to researchers.

He reflected that we are lucky to live in an era of information plenty. Better a glut than a famine.