ALPSP blog: at the heart of scholarly publishing: Guest Blog Post

Research outputs beyond the PDF: Why they matter, and how to get started

Going beyond the default

While the PDF has become the de facto global standard for publishing articles online, the publishing tools of today offer a whole range of ways to publish content that is more flexible, more engaging, and more user-friendly, and that better meets current publishing standards. It’s time to broaden our scope beyond the PDF!

Compared to PDF, both HTML and EPUB3 are better formats for accessibility—not just in the technical sense of image descriptions and ARIA roles, but also in the broader sense of allowing the user to resize and reflow text or zoom in on images.

But there are also many other ways to publish both the outcomes of research and the materials that lead to those outcomes, and many compelling reasons to use them. The three we’ll focus on here are making research data and protocols available to other researchers around the world; engaging your users with more varied and interesting offerings; and translating research for non-scientists and non-specialists.

Publishing data sets: Why, what, and how

The why of making researchers’ data available, in keeping with FAIR data principles, is well recognized: when the data behind the research is findable, accessible, interoperable, and reusable, replication studies can be more easily carried out, and testing the replicability of published results is key to advancing science and improving research integrity.

The how may be more challenging. “Publish researchers’ data alongside the articles or books based on that data” seems perfectly straightforward, until you start thinking about all the things “data set” might mean. Depending on the research and the discipline in question, the data behind a published article might be anything from a vast database of testing data or geolocation coordinates to computer code, from a linguistic corpus to a collection of audio-recorded interviews, archival photographs, or tweets. (We won’t get into collections of cells, core samples, or water samples.) While all are data sets and deserve FAIR data treatment, each brings a different set of technical challenges to the publication process.

Using a feature that supports multiple publication formats, like Digital Objects on the Literatum publishing platform, allows publishers to invite, or even require, researchers to make their data available on the same platform and at the same level of discoverability as the publications based on it. A wide variety of data types (essentially, anything that exists in a digital format) can be hosted alongside an article or book, assigned a DataCite or Crossref DOI, and linked bidirectionally with the publication and any other data sets or research products. Depositing a DOI makes data sets easier to find and to cite, benefiting the researchers on both ends of the data-reuse transaction.

Protocols, notebooks, and more

Just as important for replicability as data sets are the protocols used in collecting and analyzing the data—from survey instruments to lab procedures to focus-group guidelines. Publishing research protocols alongside data and findings further encourages replication studies.

A related use case is that of computational notebooks, which are widely used by researchers in many scientific disciplines to carry out, manage, and share their workflows and data analyses but which most publishers can’t yet accommodate as part of the research output. Wiley and Atypon are part of the Sloan Foundation–funded project Notebooks Now!, led by the American Geophysical Union, aimed at developing a standard model for publishing computational notebooks.

Why is this important? As Shelley Stall et al. write,

Providing notebooks as available and curated research outputs would greatly enhance the transparency and reproducibility of research, integrating into computational workflows. The notebooks allow deeper investigations into studies and display of results because they link data and software together dynamically with what are often final figures and plots. [Read the full Notebooks Now! proposal at https://doi.org/10.5281/zenodo.6981363.]

Publishing computational notebooks is just one way that making researchers’ data findable, accessible, interoperable, and reusable helps elevate both integrity and equity across research and publishing.

Access, accessibility, and knowledge translation

Data availability is critical. But publishing “beyond the PDF” isn’t just for data sets!

When we talk about Open Access, or about public access to publicly funded research, we need to consider much more than whether or not a publication is paywalled. It’s important to ask, “Can a member of the public download and read this article?” But we also need to ask, “Will a non-expert reader understand the key findings of this article?” Researchers generally write for other researchers in their field, and the typical editorial process does a good job of facilitating that expert-to-expert conversation—which simply isn’t designed for readers without that specific expertise.

This is where knowledge translation comes into the picture. What alternatives to expert-to-expert academic writing can we provide, in order to make key research findings—for example, from high-quality and up-to-date studies in public health, epidemiology, and occupational safety—both freely available and genuinely useful for non-experts who would benefit from understanding them?

Plain-language summaries, “explainer” blog posts, and static infographics are a great place to start. Translating these into other widely spoken languages takes us further. But why stop there? Publishing platforms like Literatum allow publishers to host audio, video, podcast, and interactive visual content. So consider “explainer” video or podcast interviews where researchers highlight key findings; consider how an interactive graph can help a non-expert understand demographic changes, the spread of a disease, or how languages change over time; consider how an animated map can illustrate economic, weather, or population data across space and time.

Finally, we need to consider accessibility. Step one, of course, is to make sure that our websites are WCAG compliant. Step two is ensuring that all text content—whether articles, books, blog posts, news updates, or data files—is machine readable, so it can be interpreted by screen readers, and available in HTML or EPUB format (either natively or via a PDF rendering tool such as Atypon’s eReader), so that it’s friendly to those who need large print, reflowable text, and zoomable images. Step three is to work on making non-text content as accessible as possible: well-written descriptions for all non-decorative images, closed captions for all videos, transcripts for audio content …

All of these elements, too, can usefully have DOIs deposited, to make citing them easier and help direct readers to the version of record.
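
As one narrow illustration of the image-description step above, here is a minimal Python sketch that lists images in an HTML page with no alt attribute at all. It is an assumption-laden example, not a WCAG audit: the class and function names are hypothetical, and an empty alt="" is treated as acceptable because that is the usual way to mark a decorative image.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> tag that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # alt="" is allowed (decorative image); only a missing attribute is flagged.
        if "alt" not in attr_map:
            self.missing.append(attr_map.get("src", "(no src)"))


def images_missing_alt(html):
    """Return the src values of images that lack an alt attribute."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


page = '<p><img src="a.png" alt="Chart"><img src="b.png"><img src="c.png" alt=""/></p>'
print(images_missing_alt(page))
```

A check like this only tells you that a description is absent. Whether an existing description is well written still needs a human reviewer.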

Make time for metadata!

Whatever you’re publishing, in whatever format, metadata remains key. The challenge comes in determining what metadata are necessary and appropriate for new types of content. To maximize discovery of non-PDF content, it’s important to accurately identify in the metadata what it is (data set? interview? photo archive?), what the format is (.csv? .mp4? .jpeg? .zip?), what it’s about (few things are more frustrating than thinking you’ve discovered a good study of chess and then finding it’s a study of cheese), and all the ways it’s connected to other pieces of content. Metadata should also include information about how and where to access the content, who created it, and what users can and can’t do with it. An additional part of the metadata equation is deciding which elements are supplementary to the article, and which are effectively on the same level.
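
To make that checklist concrete, here is a minimal Python sketch of a metadata record for a non-PDF research output, plus a helper that reports which elements are still empty. The field names are hypothetical and chosen only to mirror the questions above; they are not taken from DataCite, Crossref, or any platform's actual schema.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class ResearchObjectMetadata:
    """Hypothetical metadata record; field names are illustrative only."""
    content_type: str = ""      # what it is: "dataset", "interview", "photo archive"
    file_format: str = ""       # what the format is: ".csv", ".mp4", ".jpeg", ".zip"
    title: str = ""
    subjects: List[str] = field(default_factory=list)      # what it's about
    related_ids: List[str] = field(default_factory=list)   # links to other content
    access_url: str = ""        # how and where to access it
    creators: List[str] = field(default_factory=list)      # who created it
    license: str = ""           # what users can and can't do with it


def missing_fields(record):
    """Return the names of fields that are still empty, in declaration order."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


record = ResearchObjectMetadata(
    content_type="dataset",
    file_format=".csv",
    title="Survey responses",
    creators=["A. Researcher"],
)
print(missing_fields(record))
```

For this example record, the helper reports that subjects, related identifiers, an access URL, and a license still need to be supplied before the item is ready to publish.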

Finally, depositing a DOI (or other appropriate persistent ID) is important for everything you publish. On a practical level, using and maintaining DOI links makes citation easier, directs readers to the version of record, and ensures that whatever element of their work is cited, authors’ citation stats benefit from the use of their work. On a more symbolic level, depositing DOIs for non-article content signals a commitment to treat these content types as they deserve: as part of the published scholarly record.
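
Maintaining DOI links is easier when every DOI is stored and displayed in one consistent form. The Python sketch below is one possible way to do that, assuming DOIs may arrive as a bare identifier, with a "doi:" prefix, or as a full resolver link. The function name is hypothetical, and the validity check is deliberately loose: it only confirms the "10." prefix and a slash, not that the DOI is registered.

```python
def doi_to_url(doi: str) -> str:
    """Normalize a DOI written in a few common forms to an https://doi.org/ link."""
    cleaned = doi.strip()
    lowered = cleaned.lower()
    known_prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in known_prefixes:
        if lowered.startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
            break
    # Loose sanity check only: DOIs start with "10." and contain a slash.
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"Does not look like a DOI: {doi!r}")
    return "https://doi.org/" + cleaned


# The Notebooks Now! proposal cited earlier, written three different ways.
for raw in ("10.5281/zenodo.6981363",
            "doi:10.5281/zenodo.6981363",
            "http://dx.doi.org/10.5281/zenodo.6981363"):
    print(doi_to_url(raw))
```

All three inputs produce the same link, so citations of the same object stay consistent across a site regardless of how the DOI was originally typed.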

So now what?

Your publishing platform provider can tell you what non-PDF content types can be hosted on your site and how to get them there, and we can also help you resolve metadata questions and deposit your DOIs.

The more challenging—and more exciting—part is up to you and your contributors: Deciding what content types best suit your contributors, their research, and your audience, and then making it happen. We’re here to help you all the way!

Author Bio

Sylvia Izzo Hunter joined the marketing team at Atypon as Community Manager in 2021 and has been Marketing Manager at Inera since 2018, responsible for content marketing and social media. Prior to shifting to marketing, she worked in editorial, production, and digital publishing at University of Toronto Press. A past SSP Board member (2015-2018), Sylvia has also served on SSP's Communications, Education, and DEIA committees and is a member of the NISO CREC Working Group.

Friday, 30 September 2022

Guest Blog Post - Wiley