Wednesday 4 December 2013

Sayeed Choudhury reflects on the research data revolution

Sayeed Choudhury, Associate Dean for Research Data Management, Johns Hopkins University, kicked off the STM Innovations seminar reflecting on 'The Research Data Revolution'.

There is a new economy of sources of data. The challenge as publishers is to develop services.

Data Conservancy is a community that develops solutions for data preservation and sharing to promote cross-disciplinary re-use. It is about preservation - collect and take care of research data; sharing - reveal data's potential and possibilities; and discovery - promote re-use and new combinations.

Is data different?
Data is the new oil (stated in Qatar, by the European Commission, etc.). McKinsey claimed that data is the fourth factor of production and estimated a potential $3 trillion of economic value across seven sectors within the US alone. Todd Park estimates that location-sensitive apps generate $90 billion of value annually. Policy movements reflect its importance: the White House Office of Science & Technology Policy executive memorandum and the White House Open Government Initiative are two key initiatives.

Data are a new form of collections though they are fundamentally different in nature. They are created or converted to digital format for processing by machines. Entirely new methods are required to deal with them. They are, in effect, a new form of special collections.

What is 'Big Data'?
There are definitions based on the V's of Big Data (e.g. volume, velocity, variety). What is clear is that it's different from 'spreadsheet science' (or long-tail science). For Choudhury, if a community's ability to deal with data is overwhelmed, it is 'Big Data' - and it's more about 'M's' (methods, or lack thereof) than 'V's'.

There's a core of services that spans data from different disciplines and contexts. Archiving is a good example. However, if data collections are basically open, libraries may need to differentiate themselves by the services they offer. They should provide a combination of machine-mediated and human-mediated services. There will be a set of services that only 'experts' will be able to offer.

Data management layers: curation, preservation, archiving, storage

Understanding infrastructure
Data will require fundamentally new systems and infrastructure. Institutional repositories can be useful gateways, but are not long-term solutions (particularly for 'Big Data'). Libraries will need to operate at scale through an integrated, ecosystem approach to infrastructure. Customised 'human-mediated' services are most effective as an interpretative layer on top of machine-based services.

What about publishers?
No one can claim a specific role or act with a sense of entitlement when it comes to data (whether publishers or librarians). The future of data curation is a competition between information graphs. 'Publishing is about content, not format.' - Wendy Queen, Associate Director of Project Muse, Johns Hopkins University Press
