Monday 16 December 2013

Colin Meddings: Why data quality matters.

Colin Meddings is the Client Director at DataSalon. Colin will be one of the speakers at the forthcoming ALPSP seminar Data, the universe and everything, taking place in January.

Here, in a guest post, he reflects on why good quality customer and internal data is important for scholarly publishers.

'Only four types of organisations need to worry about data quality: Those that care about their customers; Those that care about profit and loss; Those that care about their employees; and Those that care about their futures.' – Thomas C. Redman (2006)

Over recent years publishers have had to overcome many hurdles in the digital world, such as making content available online, managing complex consortia deals, creating new packages of content and tracking usage statistics. The result of all this digital activity is vast amounts of data. However, the pace of change can often distract from the careful governance of this data, leading to gaps, inconsistencies and inaccuracies.

But why does the quality of all this data matter so much? Good data is your most valuable asset, and bad data can seriously harm your business and credibility…

What have you missed? 
At a management level, poor data quality equates directly to poor visibility of key trends in the growth or decline of certain products or markets. At the contact level, you may miss out on valuable sales opportunities if email address fields aren’t filled out correctly or customer names are wrong. Having good data will help deliver better customer service and enhance your reputation, and it means you can make better selections for targeted prospecting, cross-selling and up-selling.

When things go wrong.
Bad data can lead to ‘accidents’ and wrong decisions or actions which can affect customer confidence. You’ve spent time building up a valuable customer list – so it’s important not to waste this by sending campaigns to the wrong people, or with messages which don’t match their interests, or to out-of-date or deceased contacts. Data quality issues can also cost you money directly – for example if invoices or renewal notices are sent to the wrong recipient, or at the wrong time.

Making confident decisions. 
Data quality matters most of all because it enables your staff and management team to really trust the accuracy of the reports and analysis they’re given. Without that confidence, apparent trends or new opportunities will always leave you wondering whether they really present a true picture. But with a complete and accurate view of your customers and prospects comes the confidence to make well-informed business decisions and commit fully to your strategic planning.

So, data quality is a very important foundation for a publisher’s entire business planning process and customer contact strategy. Good data quality will allow your business and its reputation to grow and flourish.

Data quality is just one of the topics in the forthcoming ALPSP seminar Data, the universe and everything. Other areas covered will include the use of institutional and personal identifiers in the scholarly publishing supply chain, publisher metadata, data relating to open access publishing and some case studies from publishers who have tackled data issues.

This post originally appeared on DataSalon’s own blog From the Armchair.

Wednesday 4 December 2013

Frank Stein on Watson and the Journey to Cognitive Computing

Frank Stein from IBM outlined their project Watson and the Journey to Cognitive Computing at the STM Innovations seminar. The volume of data is exploding, driven largely by unstructured data (in descending order: video, image, audio, text, structured data). How do we build a system that can take all this information and build something useful for researchers, doctors, etc.?

The Watson and Jeopardy! example shows how they have developed a programme that can match deeper evidence and use temporal reasoning, statistical paraphrasing and geospatial reasoning. The evidence is still not 100% certain, but it is about likelihood and confidence.

What they learned in Jeopardy
The DeepQA approach can accurately answer single sentence queries with confidence and speed. It is highly dependent on content, content quality, and content formats. They need a combination of technologies to get satisfactory performance (semantic technology, machine learning, information retrieval/search technology, databases and high performance computing techniques). Both structured and unstructured content need to be combined for best results. They now need to extend Watson to handle richer interactions and continuous training/learning.

Here's the IBM video about Watson and the game show Jeopardy!

Watson Decision Advisor in medicine
A data-rich, societally important field helping Watson change how medicine is:

IBM used to produce typewriters
When Stein started, IBM produced typewriters. Now they have 10,000+ products. Their sales agents need help. IBM is building out a portfolio of Watson Solutions including Watson Engagement Advisor for use in situations in which you need stronger ties with constituents and better automated or agent-facilitated conversations. Examples include: bank outreach to customers for cross-sell, cable operator services and support, tax agency advice, etc. 

What's next - Cognitive Computing
Watson is ushering in a new era of computing. We have transitioned from the tabulating systems era to the programmable systems era. Now we are moving into what is called the cognitive systems era. This is a key technology for a new era of computing that takes into account:
  • Content and learning
  • Visual analytics and interaction
  • Data centric systems
  • Cognitive architecture
  • Atomic and nano-scale.

Sayeed Choudhury reflects on the research data revolution

Sayeed Choudhury, Associate Dean for Research Data Management, Johns Hopkins University, kicked off the STM Innovations seminar reflecting on 'The Research Data Revolution'.

There is a new economy of sources of data. The challenge as publishers is to develop services.

Data Conservancy is a community that develops solutions for data preservation and sharing to promote cross-disciplinary re-use. It is about preservation - collect and take care of research data; sharing - reveal data's potential and possibilities; and discovery - promote re-use and new combinations.

Is data different?
Data is the new oil (stated in Qatar, European Commission, etc). McKinsey claimed that data is the fourth factor of production and estimated a potential $3 trillion of economic value across seven sectors within the US alone. Todd Park estimates location-sensitive apps generate $90 billion of value annually. Policy movements reflect its importance: the White House Office of Science & Technology Policy Executive memorandum and White House Open Government Initiative are two key initiatives.

Data are a new form of collections though they are fundamentally different in nature. They are created or converted to digital format for processing by machines. Entirely new methods are required to deal with them. They are, in effect, a new form of special collections.

What is 'Big Data'?
There are definitions based on the V's of Big Data (e.g. volume, velocity, variety). What is clear is that it's different from 'spreadsheet science' (or long-tail science). For Choudhury, if a community's ability to deal with data is overwhelmed, it is 'Big Data' - and it's more about 'M's' (methods or lack thereof) than 'V's'.

There's a core of services that span across data from different disciplines and contexts. Archiving is a good example. However, if data collections are basically open, libraries may need to differentiate themselves by the services they offer. They should provide a combination of machine and human mediated services. There will be a set of services that only 'experts' will be able to offer.

Data management layers: curation, preservation, archiving, storage

Understanding infrastructure
Data will require fundamentally new systems and infrastructure. Institutional repositories can be useful gateways, but are not long-term solutions (particularly for 'Big Data'). Libraries will need to operate at scale through an integrated, ecosystem approach to infrastructure. Customised 'human mediated' services are most effective as an interpretative layer on machine based services.

What about publishers?
No one can claim a specific role or act with a sense of entitlement when it comes to data (whether publishers or librarians). The future of data curation is a competition between information graphs. 'Publishing is about content, not format.' - Wendy Queen, Associate Director of Project Muse, Johns Hopkins University Press

Monday 2 December 2013

International Publishers Association Call for Nominations: 2014 IPA Freedom to Publish Prize

The closing date for nominations for the 2014 IPA Freedom to Publish Prize is 6 January 2014.

The Prize will be awarded on 27 March 2014, during the IPA Congress in Bangkok, and the recipient will receive CHF20,000, thanks to the generous sponsorship of the following publishers: Albert Bonniers Förlag, Elsevier, HarperCollins, Kodansha, Macmillan, OUP, Penguin Random House, and Simon & Schuster.

Nominees can either be publishers who have recently published controversial works in the face of pressure, threats, intimidation or harassment from government or other authorities; or publishers with a long and distinguished history of upholding the values of freedom to publish and freedom of expression.

IPA member organisations, members of the IPA Freedom to Publish Committee, individual publishers, and international professional and non-government organisations working in the field of freedom of expression can nominate candidates for the IPA Freedom to Publish Prize.

Those nominating must explain the reasons behind their choice of candidate in writing (in English, French or Spanish) using the attached form as a template. Nominations should be submitted to the IPA’s Policy Director, José Borghino, no later than close of business (Geneva time) on 6 January 2014.

More about the 30th IPA Congress and the IPA Freedom to Publish Prize Ceremony: 

The 30th IPA Congress will be held in Bangkok, Thailand, on 25-27 March 2014, and will be hosted by the Publishers and Booksellers Association of Thailand (PUBAT) under the auspices of HRH Princess Maha Chakri Sirindhorn. To see the Program, go to the Congress website.

On the eve of the Bangkok Book Fair (28 March to 8 April), hundreds of publishers from all over the world will participate in the Congress, together with authors, copyright specialists, librarians and officials from around 50 countries and international organisations.

The 2014 IPA Freedom to Publish Prize will be awarded during the Congress on 27 March 2014. Aung San Suu Kyi has been invited to give the keynote speech and award the Prize.

Earlybird online registration for the Congress is available from the Congress website.