The Marginalian

Culturomics: What We Can Learn from 5 Million Books

We’ve already established that we could learn a remarkable amount about language from these 5 essential books, but imagine what we could learn from 5 million books. In this excellent talk from TEDxBoston, Harvard scientists Jean-Baptiste Michel and Erez Lieberman Aiden reveal fascinating insights from the computational tool that inspired Google Labs’ addictive NGram Viewer. It pulls from a database of 500 billion words culled from 5 million books spanning many centuries, roughly 4% of the books that have ever been published.

They call their approach Culturomics — “the application of massive scale data collection and analysis to the study of human culture.” From advising you on the best career choices for early success to figuring out when an artist is being censored to proving that we’re forgetting the past exponentially more quickly than ever before, the data speaks volumes when queried with intelligence and curiosity.

“[The database pulls from] a collection of 5 million books. 500 billion words. A string of characters a thousand times longer than the human genome. A text which, when written out, would stretch from here to the moon and back ten times over. A veritable shard of our cultural genome.”


Published September 21, 2011

https://www.themarginalian.org/2011/09/21/culturomics-tedxboston/
