Thursday, 11 September 2014

Who's afraid of big data?

Fiona Murphy from Wiley chaired the final panel on day two of the 2014 ALPSP International Conference. She posed the question: how do we skill up on data, take advantage of opportunities and avoid the pitfalls?

Eric T. Meyer, Senior Research Fellow and Associate Professor at the University of Oxford, was first up to try to answer. He observed that a few years ago you would have struggled to gain an audience for a big data seminar. Today, it's usually standing room only.

Big data has been around for years. People were quite surprised when Edward Snowden leaked the NSA documents to journalists, but this kind of collection had been going on for a long time. Big data in scholarly research has also been around for a long time in certain disciplines, such as physics and astronomy. There was always money to be made in big data, but there's even more now, and everyone is starting to realise it. So much so that you need a big data strategy.

Meyer defines big data as data that is unprecedented in scale and scope in relation to a given phenomenon. It is about looking at the whole datastore rather than a single dataset. Big data for understanding society is often transactional. We're talking really big: if you can process it on your laptop, it isn't big data.

Meyer drew on some entertaining examples of how big data can be used. If you key the same sentence into different country versions of Google, you'll see the responses vary. There are limits to big data approaches: they can produce misleading results. What happens when bots are involved? Do they skew the results? The challenge will be making the data meaningful and more useful.

David Kavanagh from Scrazzl reflected on the challenge researchers face when deciding how to structure and plan their experiments. If you wanted to leverage collective scientific knowledge and identify which products to use for your work, there was no structured way of doing this. Kavanagh urged publishers to throw computational power at data and content as a way to solve problems, improve how they work and help make sense of unstructured content.

That is the problem Scrazzl tackles, as a practical application of the structured and unstructured data that Eric Meyer mentioned. First, you need a product database. Then you have to cut out as much human intervention as you can; automation is key. Where they couldn't find a unique identifier or a catalogue match, they had to make it as fast as possible for a human to make the association. Speed is key.

Finally, they built a centralised cloud system in which vendors can update their own records. It's a crowdsourced system, maintained by those who have a vested interest in keeping it up to date. The opportunity going forward will be releasing this structured information through APIs to drive new insights. It also allows semantic enablement of content and offers the opportunity to think about categorisation in new ways.

Publishers running an ad-supported model can use the collection of products identified through content search to work out which adverts are best suited to their readers.

Paul F. Uhlir from the Board on Research Data and Information at the National Academies observed that, even after 20 years, we have yet to deconstruct the print paradigm and reconstruct it well on the Net. In the 1980s a gigabyte was considered a lot of data; in the 1990s, a terabyte. In this decade, the exabyte era is not far ahead of us, with still larger scales to come.

There are huge databases in business, mining marketing information and other data, as well as the Internet of Things and the semantic web. Everything can now be captured, represented and manipulated in a database. It's an issue of quantity, but also of quality. There needs to be a social response, because big data brings a series of threats.



Disintermediation

The rise of big data promises a lot more disruption. Think about 3D printing. The consequence could be millions of product designers specifying items. Manufacturing will be affected. Jobs will be lost. What will happen to workers in repair and body shops when cars are driverless? What will happen to the insurance industry? Workers will be disintermediated. What is certain is that there will be massive labour shifts and disruption.

Playing God

Custom-made organs. The ability to insert genes into another organism. All these applications are data intensive and will become even more so. They raise profound social and ethical issues and have the potential to do great harm.

End of Privacy

Meyer touched on the NSA files. What about spy satellites? The ubiquity of CCTV? These images are kept in huge databases for future use. Private companies hold product information and use it to identify preferences. There is no such thing as privacy any more.

Inequality

Big data is an increasingly powerful means of tightening a hold on global power.

Complexity

The more we learn, the less we know. Any scientist will tell you that greater understanding leads to more questions than answers.

Luddite reactions

Some people react to the encroachment of the strange and frightening techniques of the technology age with passive resistance, trying to lead a simple life.

There are also a number of weaknesses that centre around:

  • Improving the policies of the research community
  • New or better incentive mechanisms versus mandates
  • Explicit links of big data to innovation and tech transfer
  • Changing legal landscape: lag in law, bad law, IP law
  • Data for policy: communicating with decision makers




Industry updates: Publons

Andrew Preston from Publons outlined their focus on peer review. As a crucial part of the publishing process, peer review is a leading indicator; it's what the experts thought. It is also valuable content.

Publons is about recognition for good reviewing. It gives reviewers and editors a measurable research output, and it gives journals proof of a quality review process.

They believe openness breeds quality. They provide tools for editors that measure impact, help editors engage with reviewers, and assist with finding, vetting and connecting with reviewers. Finally, they build communities to help generate engagement, combining pre- and post-publication reviews with searchable, indexed content.


The journal adds a review, and Publons emails a unique token to the reviewer. The reviewer signs in and selects privacy settings. Publons combines and respects the views of both the journal editor and the reviewer on how much content is shown on Publons. There are two versions of the review: public and private. They have found that regular reviewers review up to 50 articles a year, so review activity can be a leading indicator of expertise.

The service highlights the quality of the review process by showing reviews and reviewers, and it helps build a community. It helps with altmetrics because reviews always link to your content: every review on Publons is eligible to appear in Altmetric, so you will never see a zero again. This helps to build out the long tail of articles that don't get picked up in the press! The article is not Publons' focus, so it generates clicks back to the publisher's website. A suite of editor tools helps editors find reviewers. For publishers, it helps attract more submissions as well as better and faster reviews. It boosts article-level metrics and generates post-publication discussion.

Further information is available on the Publons website. If you are a publisher and would like to see some metrics about reviews on Publons, complete the online form on the site.

Metrics and more

Publication metrics are part of a much bigger picture. Where resources are restricted and there is a lot of competition, metrics become more essential to help prioritise and direct funding. The 'Metrics and more' 2014 conference session was chaired by Melinda Kenneway, Director at TBI Communications and Co-founder of Kudos. The panel comprised Mike Taylor, Research Specialist at Elsevier Labs; Euan Adie, Founder of Altmetric; and Graham Woodward, Associate Marketing Director at Wiley. Kenneway opened by observing that, as an industry, we need to consider new types of metrics.

Publication performance metrics include:
  • Anti-impact factor: DORA
  • Rise of article level metrics
  • Introduction of altmetrics
  • New units of publishing: data/images/blogs
  • Pre-publication scoring (Peerage of Science etc)
  • Post-publication scoring (assessment, ranking etc)
  • Tools for institutional assessment

Researcher performance metrics include:

  • Publication output
  • Publication impact
  • Reputation/influence scoring systems
  • Funding
  • Other income (e.g. patents)
  • Affiliations (institutional reputation)
  • Esteem factors
  • Membership of societies/editorial boards etc
  • Conference activity
  • Awards and prizes

Institutional performance metrics include:

  • University ranking systems
  • Publication impact metrics
  • STAR/Snowball metrics
  • Research leaders and career progression
  • Patents, technologies, products, devices
  • Uptake of research

Graham Woodward, Associate Marketing Director at Wiley, provided an overview of an altmetrics trial on a selection of six titles. For one article, after a few days of having altmetrics on the site, they saw the following results: c. 10,000 click-throughs; an average time on page of over three minutes; over 3,500 tweets; an estimated 5,000 news stories; 200 blog posts; and 32 F1000 recommendations.

They asked for user feedback on the trial and the 50 responses provided a small but select snapshot that enabled them to assess the effectiveness of the trial.

Were the article metrics supplied on the paper useful? 91% said yes. What were the top three most useful metrics? Traditional news outlets, number of readers and blog posts. 77% of respondents felt the experience enhanced the journal.

Half of respondents said they were more likely to submit a paper to the journal. 87% used the metrics to gauge the overall popularity of the article, 77% to discover and network with researchers interested in the same area as their own work, and 66% to understand the significance of the paper within its discipline.

What happened next? The completion of the six-journal trial was followed by an extension to all OA journals. Wiley has now rolled out metrics across its entire journal portfolio.

Euan Adie from Altmetric reflected on the pressures and motivations on researchers. While there is a lot of pressure within labs on young researchers, funders and institutions are increasingly looking for, or considering, other types of impact, research output and contribution. There is an evaluation gap between funder requirements and how impact is measured. That's where altmetrics come in. They take a broader view of impact to help give credit where it is due. HEFCE are currently conducting a review of metrics within institutions.

Adie shared seven things Altmetric has learnt in the past year or so:

  1. Altmetrics means so many things to so many people, but the name doesn't necessarily work. It is complementary rather than alternative, and it is about the data, not just the measure.
  2. It has worked really well for finding where a paper is being talked about in places they wouldn't otherwise have known about, and for revealing the demographics behind that attention.
  3. Altmetrics is only a weak indicator of citations, but the whole point is to look beyond. Different types of sources correlate to different extents.
  4. Don't take all altmetrics indicators as one lump, there are many different flavours of research impact.
  5. When you have an indicator and you tie it to incentives, it immediately corrupts the indicator. While he doesn't believe there is massive gaming of altmetrics, there is an element of this with some people. It's human nature.
  6. The top 5% of altmetric scores are not what you expect. The most popular paper is a psychological analysis of the characters in Winnie the Pooh.
  7. Peer review is a scary place. Scientists and researchers can be pretty nasty! Comments can be used in a different (more negative) way than expected. But that is not necessarily a bad thing.

Mike Taylor believes we are approaching a revolution rather than an evolution. What we have at the moment is a collision of different worlds, because interest in metrics, and their value, is increasing. What makes for great metrics, and how do we talk about them? Do we want a one-size-fits-all approach? We have data and we have metrics, and between those two things there is theory, formulae, statistics and analysis. Within that gap there are a lot of political issues.

Taylor reflected on the economies of attention (or not) and how you assess whether people are engaged. With an audience, when hands go up you know they are paying attention, but no hands doesn't mean they aren't. Metrics so far are specialist and complex, based on 50 years of research, mostly bibliometric or citation based, and largely proprietary. The implications of the changing nature of metrics are that, as metrics are taken more seriously by institutions, their value will increase. As their value increases, we as a scholarly community need to raise awareness of them. Awareness implies critical engagement, mathematics, language, relevance, openness, agreement, gold standards and community leadership.

We are experiencing a collision of worlds. Terms like 'h-index' are hard to understand, but are well defined. Terms like 'social impact' sound as if they're well defined, but aren't. There are particular problems because the 'community' is rather diverse. There are multiple stakeholders (funders, academics, publishers, start-ups, governments, quangos), international perspectives and varying cultures (from fifty years of research to a start-up).

Taylor suggested an example metric: 'internationalism'. Measures could include how well an academic's work is used internationally and how well that academic works internationally, drawing on readership data; citation analysis (cited and citing); co-authorship; funding data (e.g. FundRef); conference invitations (e.g. via ORCID); guest professorships; and text analysis of content.

Taylor doesn't think metrics is an area where publishers will have the kind of impact they might have had 30 years ago. He said to expect more mixed metrics, combining qualitative and quantitative work. Taylor concluded that metrics are being taken more seriously (they are being used in funding decisions) and that many stakeholders and communities are converging.

Big data + cloud computing + APIs + openness = explosive growth in metrics. 

It is a burgeoning research field in its early days. Publishers need to be part of the conversation. We need to enable community leadership and facilitate decision making.

Wednesday, 10 September 2014

ALPSP Awards for Innovation in Publishing - the finalists

The final lightning session from the first day of the ALPSP International Conference showcased the finalists for this year's Awards for Innovation in Publishing sponsored by Publishing Technology.

bioRxiv from Cold Spring Harbor Laboratory Press

The preprint server for biology, operated by Cold Spring Harbor Laboratory, a research and educational institution.

Edifix from Inera Inc.

Edifix is a web service that copyedits, corrects, and links bibliographic references in a number of formats.

Frontiers open science platform

Frontiers is a community-rooted open-access publisher and research network that empowers researchers to take charge of publishing and builds online tools to review, evaluate and disseminate science.

IOP ebooks™ from IOP Publishing

A brand new book programme that brings together innovative digital publishing with leading voices from across physics to create the essential collection of physics books for a digital world.

JournalGuide from Research Square

JournalGuide brings all sources of data together in one place to give authors a simple way to choose the best journal for their research.

ReadCube Connect from Labtiva Inc

ReadCube Connect is an HTML5-powered interactive PDF viewer that seamlessly integrates into your article pages, keeping readers onsite and connected.


RightsLink for Open Access from the Copyright Clearance Center

RightsLink® for Open Access streamlines the entire author fee transaction by seamlessly integrating with editorial and production workflows.

The Awards will be announced tomorrow night at the conference dinner. Follow #alpspawards on Twitter for the results!

Customers as competitors

The first plenary at the ALPSP International Conference 2014 focused on increased competition from the least likely sources - our customers - with the advent of digital publishing lowering barriers to entry.

Rick Anderson, Associate Dean for Scholarly Resources & Collections at the Marriott Library, University of Utah (also known as a Scholarly Kitchen Chef), chaired a panel comprising Jill Taylor-Roe, Deputy Librarian at Newcastle University Library; Tony Horova, Associate University Librarian at the University of Ottawa; and Graham Stone, Information Resources Manager at the University of Huddersfield.

The University of Ottawa is the world's largest bilingual university, and the Press and library have a close relationship at management and editorial level. The Press generates $300,000 in sales each year with a simultaneous print and digital programme. Tony Horova shared the interim results of a research project tracking the outcomes of a Gold OA partnership. The OA partnership between the University of Ottawa Press and the library was based on shared goals to improve the sustainable dissemination of scholarly research.

The Open Access Funding Partnership is a three-year agreement to support gold OA, with a CC licence, for new monographs. The library subsidises a maximum of three titles per annum with a $10,000 subvention per title, up to a maximum of $30,000 per year. They have targeted titles of core contemporary social relevance. The goal of the project is to assess sales and dissemination so they can understand what OA will mean for their future programme. Horova shared the results to date, interestingly including actual sales.



Where do they go from here? They are one of four university presses in Canada to have embraced OA and intend to remain on the cutting edge. They are assessing the project and consultation process and determining how to further incorporate OA into the business model and strategic direction of the Press, while discussing the financial implications with the library.

Jill Taylor-Roe reflected on the ups and downs of relationships between librarians and publishers. How do we respond to change? We are in the midst of the most disruptive period in scholarly communications. The only real certainty for all of us is that more change will come. To survive and thrive you need to change and adapt.

One major change that publishers have to engage with is the involvement of institutional managers in research publishing decisions. When you come up against financial directors as agents of change, they become a significant influence. This is a different world: they were never involved before. They will ask lots of questions about why there are payments for fees, pages, illustrations and so on. He who pays the piper calls the tune.

It is time to change and recalibrate scholarly communication models. We need to combine our skill sets to face this new world. In some instances there will be competition from university presses. It is good that the dialogue has been opened, but she is keen not to polarise the discussion.

Graham Stone spoke about the potential impact of open access repositories and library scholarly publishing on 'traditional' publishing models. He asked whether we are missing the point a little. Ultimately, what is more important: the usage on your journal platform, or the actual impact of the research? He would argue that the latter sells content. Repositories that multiply access points help increase readership and impact. Repositories are not all that bad. They may well be helping.

Stealing your lunch? No. Gold OA in the repository? We paid for it. Green? We have an agreement to access it. If it is a hybrid version, we're not only giving you lunch, but also letting you have your cake and eat it, as we link to the publisher platform.

Don't waste your money on fancy sites that don't work on mobile. Researchers just require stuff. Evidence from the PIRUS project a few years ago showed that repositories drove usage. A more recent initiative is the IRUS-UK and Repositories project. They are adding value by promoting where citations are and building awareness internationally.

There is a lot going on in North America, which is ahead in library-based scholarly publishing. Amherst, the University of Huddersfield and Ubiquity Press are just some examples of activity in this area. They have an eye for growing the author pool, particularly with young researchers who may struggle to get published elsewhere. Academic publishing is a professional industry and it has to adapt to changes in scholarly practice.

Follow the ALPSP International Conference conversation on Twitter via #alpsp14.

Innovation and its place in the changing scholarly publishing landscape

Amy Brand opened the ALPSP International Conference 2014 with a keynote reflecting on innovation, how its main function is to advance research, and how vital it is for publishers to participate in the linked information landscape.

There are crucial changes in academic publishing directed towards the challenge of creating new efficiencies in a data publishing environment. It is simply no longer good enough to just read the text. Researchers need to get behind the curtain or under the hood. Who owns what: the institution, the funder or the publisher? The shifts in the landscape are leading to land grabs. Institutions are building their own management resources alongside publishers' ones. But the good news is that there are many new opportunities for publishers to develop new services.

The act of creating something entirely new is an act of innovation. Brand was inspired by her experience at MIT Press. In the late 90s they experimented with open access monographs and were very successful at it. Another MIT Press project, CogNet, was one of the first online tools for researchers. It was very exciting to work on, and it proved to be one of the projects that drew her away from books and on to CrossRef.

One of the principal aims of Digital Science is to work smart in order to discover more. They work with researchers, librarians, academic administrators, funders and publishers, who all want to enhance their own platforms. So they spend a lot of time looking at workflows between all these stakeholders. They have built a portfolio of nine different companies and provide supporting tools for every part of the research life cycle.

Innovation sticks when it addresses a specific need. Pain, not necessity, is the mother of invention. Brand's checklist for publishers comprises six 'pain points'.

Pain point 1: 'I want a smarter way to manage my own record of scholarship.'

Researchers want a one-stop interface that manages everything from activity reports, personal website, CV and grant applications to the institutional repository and lab website. Persistent identifiers are one way to help, and this is what ORCID is trying to achieve (although it has not moved as quickly as they would've liked). ORCID identifiers are not all that sexy, but if everyone used them it would dramatically improve the academic world.

Pain point 2: 'I need better ways to manage and share my research data and other outputs'

Figshare allows publishers to host large amounts of data and articles with no impact on their infrastructure. Brand believes working in partnership with figshare is a no-brainer for publishers.

Pain point 3: 'I'm finding it impossible to keep up with the relevant literature in my field.'

Getting information from the internet is like trying to drink from a fire hose. Sophisticated filtering and recommendation services should be more closely integrated into publishers' platforms. This is what ReadCube tries to do: personalised recommendations based on researchers' libraries.

Pain point 4: 'I want to become more efficient at finding collaborators and funding opportunities.'

There are various services and systems available that can help publishers identify candidate peer reviewers. One example, Uber Research, brings in a wider pool of reviewers from a database. They can be filtered on expertise and conflict of interest. This is a real example of linked data that presents an opportunity for publishers to improve how they engage with the peer review process and minimise reviewer fatigue.

Pain point 5: 'Pay walls keep me from accessing needed resources and from disseminating my work as widely as possible.'

How can publishers be new partners for innovative access models? What about differentiated access? As the conference is at Heathrow, think about how airlines break down costs for carry-on baggage, early check-in and so on. With ReadCube, instead of facing paywalls, the institution has patron-driven but paid access to articles in the library. Brand urged publishers to sell more granular bits of information on the book side as well.

Pain point 6: 'Academic incentives and evaluation norms exert too much control over my research and publication choices.'

The traditional paradigms of where you publish (high Impact Factor journals, prestige monograph publishers) can constrain the direction of your research. The Journal of Statistical Software is a great example: it provides reproducible code and software tools for readers. Altmetric helps researchers read around the subject, and authors can check engagement downstream. It can be a tremendous force for change as readers start their own little revolutions on what to read.

A more radical approach is turning the idea of authorship on its head. The relationship between authorship, invention and credit is broken. The Harvard-Wellcome draft taxonomy is helping to drive credit for discovery, which in turn can have a huge impact on a person's life and career. (Brand flagged the open invitation to join the CASRAI-NISO contributor role taxonomy review circle.)

Brand finished with a series of questions she feels publishers should be asking themselves to innovate effectively.
  1. Are you using and contributing linked data?
  2. Have you implemented ORCID IDs in your workflow?
  3. How are you increasing engagement with your content?
  4. Do your journals support data sharing?
  5. Are you displaying article-level metrics?
  6. Do you offer differentiated access options for your users?
  7. How do you currently capture contribution, and how can we collectively improve the tracking of research credit?

Amy Brand is Vice President of Academic & Research Relations and North America for Digital Science.  

Monday, 8 September 2014

Amy Brand talks innovation: not just introducing something new, but also a well-articulated sense of purpose

We are delighted that Amy Brand, Vice President of Academic & Research Relations and North America for Digital Science, is our keynote speaker at the ALPSP International Conference on Wednesday. Amy took some time out of her schedule to tell us a bit more about her work and what she thinks innovation is really about.

Tell us about yourself and Digital Science.

"I feel extremely fortunate to be in the midst of a long and varied career immersed in many different facets of scientific and scholarly communications: as an MIT-trained researcher in linguistics and cognitive science, executive editor at The MIT Press, Director of Business and Product Development at CrossRef, manager of the Office for Scholarly Communication and then Assistant Provost for Faculty Appointments and Information at Harvard, founding member of ORCID’s board of directors, and now as VP at Digital Science, where I manage U.S. operations and cultivate institutional partnerships.

For those of you who don’t know us yet, Digital Science invests in and incubates academic start-ups that provide research information software – software that accelerates scientific and scholarly research, both by facilitating aspects of the research cycle directly and by facilitating the management of the research process.

My ALPSP keynote next week will focus on where the innovation and change in scholarly communications is coming from, where I see it going, and how smaller scholarly and professional publishers can participate and benefit."

What does innovation mean to you?

"Innovation simply means making or introducing something new. In our world that tends to translate as creating new technologies to stay competitive. But innovation has also come to mean a way of working - towards a well-articulated sense of purpose, within a work environment that embraces experimentation and risk taking. I believe that innovation in publishing can make research better, and more productive researchers means more new knowledge with which to address the big problems in our world. We are inventing the future of scholarly communication to meet the evolving needs of scholars - in particular, to make research more efficient with tools that facilitate discovery, accessibility, attribution and reproducibility."

What do you think are the key drivers of innovation in scholarly communications at the moment?

"At a high level, I see our community innovating to create new efficiencies within today's complex linked information environment. But when you drill down, you can identify a number of specific researcher pain points that are driving the invention of new tools and models to address frustrating inefficiencies."

How is that impacting on the traditional industry?

"What it means to be a scholarly publisher and stay competitive has forever changed. Content may still rule, but what we mean by content and the scholarly conversation has expanded significantly. As a consumer of scholarly information, it is no longer enough to simply read the text. I expect to be able to look behind the curtain at data, code, other media, and - downstream - how other people are reacting to the work in real-time. There are tremendous opportunities for publishers that can grow accordingly, and extend their own services into other aspects of the scholarly communication ecosystem."

How does Digital Science ‘do’ innovation?

"We have a clear vision and a well-defined approach to innovation. We aim to provide innovative tools that support every stage of the research life cycle, and we do so by investing in best-in-class solutions. Most of the start-up companies in our portfolio were conceived and founded by academics innovating to address a major challenge in their own workflows, whether during the funding process, in the lab, managing data, or in the writing and publication process itself."

And finally, what do you hope the delegates will get out of your talk at the conference?

"I hope the audience goes away with a renewed sense of understanding that when we innovate in publishing, we do so to advance research itself, and that the way to stay on the cutting edge today is to participate fully in the linked information landscape. Ultimately, whether you're a publisher, a librarian, a researcher or a funder, we’re all in the scholarly communication enterprise together, working towards the creation of new knowledge."

The ALPSP International Conference runs from Wednesday 10 to Friday 12 September at the Park Inn Heathrow, London. Follow the conversation on Twitter via #alpsp14 or read highlights from the sessions here on the ALPSP blog.