"“Unsupervised machine learning is hard. There are many examples of supervised machine learning, but these are driven by subject-matter experts who guide the machine toward specific discoveries. It will take much more than ten years to master the extraction of actual knowledge from big data sets.” —Christian Huitema, distinguished engineer, Microsoft Corporation; active leader in the IETF; based in Redmond, Washington"
"A key question over the next six years is how far Google’s current techniques can take it. The strategy for the last six years has been constant: MORE DATA. But even Peter Norvig, head of Google Research, admits that there are diminishing returns to the more-data game. Certainly, it doesn’t appear that just adding more data is going to yield Gary Snyder’s translations of Chinese poetry. Eventually, it seems to me, Google (or any other translation software) will have to start understanding, in some way, the semantic content of the words it is arranging. And that’s a much harder AI problem than the one that’s brought you the wonders of Google Translate."
"Harvard is making public the information on more than 12 million books, videos, audio recordings, images, manuscripts, maps, and other items inside its 73 libraries. Harvard can’t put the actual content of much of this material online, owing to intellectual property laws, but this so-called metadata (titles, publication or recording dates, book sizes, descriptions of what is in videos) is also considered highly valuable. Descriptors of things like audio recordings are frequently more valuable to search engines than the material itself, particularly when the content cannot easily be scanned and understood. Harvard is hoping other libraries will allow access to the metadata on their volumes, which could be the start of a large and unique repository of intellectual information. “This is Big Data for books,” said David Weinberger, co-director of Harvard’s Library Lab. “There might be 100 different attributes for a single object.” At a one-day test run in which 15 hackers worked with information on 600,000 items, he said, people created things like visual timelines of when ideas became broadly published, maps showing the locations of different items, and a “virtual stack” of related volumes garnered from various locations."
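The "visual timeline" the hackathon participants built can be sketched in a few lines: given a set of metadata records with many attributes per object, group items by publication or recording year. The records and field names below are hypothetical placeholders for illustration, not Harvard's actual schema.

```python
from collections import Counter

# Hypothetical metadata records; real records reportedly carry
# up to 100 attributes per object, only a few are shown here.
records = [
    {"title": "Walden", "type": "book", "year": 1854, "location": "Widener"},
    {"title": "Leaves of Grass", "type": "book", "year": 1855, "location": "Houghton"},
    {"title": "Field Recording A", "type": "audio", "year": 1855, "location": "Loeb"},
]

# The timeline idea: count catalogued items per year, using only
# metadata, never the (rights-restricted) content itself.
timeline = Counter(r["year"] for r in records)

for year in sorted(timeline):
    print(year, timeline[year])
```

The same grouping trick, applied to a location field instead of a year, yields the maps the hackers produced; applied to subject attributes, it yields a "virtual stack" of related volumes.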
"We’re used to personalization on the consumer Web, from book recommendations on Amazon to the news feed on Facebook. But what will it mean for learning as colleges, too, increasingly mine data to shape the student experience? What does educational personalization look like? How finely should technologists try to parse it—down to individual learning styles? How will personalization conflict with existing regulations? And what are the risks?"