Thursday 10 September 2015

What do content and behavioural data mean for publishing? Microsoft's Kuansan Wang considers.

The availability of large amounts of content and behavioural data has instigated new interdisciplinary research in information retrieval, natural language processing, machine learning, behavioural studies, social computing and data mining.

Kuansan Wang, Director of the Internet Service Research Centre at Microsoft Research, considered the impact on the publishing and consumption of content, drawing on observations derived from a web-scale data set newly released to the public.

If you think about the web as a gigantic library of the future, then you should think about the semantic web as the librarian. The semantic web involves trust, proof, logic, ontology vocabulary, RDF Schema, XML Schema, Unicode and URIs.

A central theme of the semantic web is helping a machine read and make sense of content: human-readable versus machine-readable content. The semantic web requires humans to define standards for data formats and models. It relies on an explicit and precise specification of knowledge representation that everyone has to agree upon.

The knowledge web is where a machine reads human-readable content. With the knowledge web, the machine learns to conflate different formats of the same thing. It involves a latent and fuzzy representation of knowledge learned by mining big data.

There has been a paradigm shift in discovery. Traditional web search indexes keywords in documents, matches keywords in queries and presents relevance as "10 blue links". Knowledge web search digests the world's knowledge, matches user intent and offers a dialogue experience.
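To make the "traditional" side of that contrast concrete, here is a minimal sketch of keyword indexing and matching. It is an illustration only, not a description of how Bing works; the function names and the sample documents are assumptions made up for this example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase keyword to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def keyword_search(index, query):
    """Return ids of documents that contain every keyword in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample documents, invented for illustration.
docs = {
    1: "semantic web standards",
    2: "knowledge web search",
    3: "web search with keywords",
}
index = build_index(docs)
matches = keyword_search(index, "web search")
```

A search of this kind only finds documents that literally contain the query words, which is the limitation the knowledge web approach described in the talk aims to move beyond.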

The dialogue acts in Bing and Cortana are:
  1. answer 
  2. confirmation 
  3. disambiguation 
  4. suggestion 
  5. progress: refinement.

In Bing, you get answers along with an element of confirmation or correction, refinement dialogue and digressive suggestion. The interface is designed for naturally spoken language with context, confirmation and answer. You don't have to go to the search page: disambiguation starts as you type. They train the system to try to summarise what it has to learn.

Some of the issues that bug the academic community are:
  • How to recommend completions for seldom observed or never foreseen queries?
  • How to rank these suggestions?
  • How to avoid making suggestions leading to no or bad results?
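As a rough illustration of the second and third questions, the sketch below ranks completions by how often they appear in a query log and drops any that would lead to no results. It does not address the first question (queries never seen before), and it is not the method described in the talk; the function name, the log and the result counts are all hypothetical.

```python
from collections import Counter


def suggest(prefix, query_log, result_counts, limit=3):
    """Suggest past queries starting with prefix, most frequent first.

    Queries with no known results are filtered out so that a suggestion
    never leads to an empty results page.
    """
    counts = Counter(q for q in query_log if q.startswith(prefix))
    candidates = [q for q in counts if result_counts.get(q, 0) > 0]
    # Sort by descending frequency, then alphabetically to break ties.
    candidates.sort(key=lambda q: (-counts[q], q))
    return candidates[:limit]


# Hypothetical query log and result counts, invented for illustration.
query_log = [
    "semantic web",
    "semantic web",
    "semantic search",
    "semantic search",
    "semantic search",
    "semantic zebra",
]
result_counts = {"semantic web": 120, "semantic search": 80, "semantic zebra": 0}

suggestions = suggest("sem", query_log, result_counts)
```

Frequency is only one possible ranking signal; a production system would presumably combine it with context and personalisation.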

For finding researchers and potential collaborators, they train a machine to go through and aggregate all the information.

Cortana provides proactive suggestions on Windows, Android and iOS. The concept is based on successful personal assistants to the stars, who write down the interests and activities of the people they serve to gain better insight. They have built in a lot of switches you can turn on or off for personalisation or if you have privacy concerns, and they have now trained Cortana to do this for academics. One pain point for researchers is hitting a paywall. Cortana tries to help by showing not only the academic article, but also related news stories.

The latest Microsoft vision is about empowering every person and every business to achieve more. They intend to do this through reimagined productivity, more personal computing and the most intelligent cloud. This translates to academic search, Cortana Academic and Project Oxford.
