Column: Culturomics


In 1936, science-fiction writer H.G. Wells expressed the idea of a “permanent world encyclopedia,” a repository of all human knowledge available to every person on the planet, no matter their social standing. This new encyclopedia would unite in us a common understanding of our past, and consequently, our present.

Mr. Wells optimistically prescribed this encyclopedia, this “World Brain,” as a remedy to humanity’s problems — enlightenment for all. Ignorance, inequality and war would fade into darkness.

The World Wide Web exemplifies his utopian dream, albeit in a gleefully messy rejection of conventional encyclopedias.

Mr. Wells might object to certain portions of online life, replete as it is with rage comics and lolcats. “I Can Has Cheezburger?” is surely not one of the fundamental questions of knowledge that he sought to answer.

Happily, there is a related development that carries his lofty aspirations. Google is digitizing books, having passed the 20 million mark last year. The company believes it can finish the rest of an estimated 130 million unique books by the end of this decade.

Researchers recognized the significance of texts that a computer can search word by word. In effect, all books ever written could be read simultaneously.

One group of researchers, called the Cultural Observatory, has dubbed their new field “culturomics.”

They are led by the Harvard duo Jean-Baptiste Michel and Erez Lieberman Aiden, who have taken on the formidable task of organizing a deluge of information. Along with Google, they constructed an enormous dataset capable of measuring how often a word, phrase, name or number came up in books dating back to 1800.

In statistical terms, the items being counted are called Ngrams.
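
To make the term concrete: an Ngram is simply a run of n consecutive words, so a single word is a 1-gram and a two-word phrase is a 2-gram. The Python sketch below is an illustration only, not the Cultural Observatory's actual pipeline. The function names `ngrams` and `count_ngrams` and the naive whitespace tokenizer are assumptions made for this example.

```python
from collections import Counter

def ngrams(text, n):
    """Return every run of n consecutive words in text (hypothetical helper)."""
    words = text.lower().split()  # naive tokenizer, assumed for illustration
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def count_ngrams(texts, n):
    """Count how often each n-gram appears across a collection of texts."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(text, n))
    return counts

# Two toy "books", invented for this example.
sample = ["the world brain", "the world wide web"]
print(count_ngrams(sample, 2)["the world"])  # prints 2
```

A real system would need far more careful handling of punctuation, scanning errors and publication dates than this toy version attempts.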

The product of this effort is a powerful query machine, the Google Ngram viewer.

With this tool, the researchers quantified culture — observing generational trends in a few seconds.

One particularly notable trend is that we’re forgetting the past with increasing speed.

By searching for a particular year, the researchers reasoned they could measure the relative importance of events associated with that year. They arbitrarily chose 1950 to begin.

The graphical results show that prior to 1950, few people wrote about 1950. Then, unsurprisingly, mentions of 1950 spiked dramatically during the year 1950.

And then something peculiar happened. People continued to write steadily about 1950, until the year 1954, when talk of 1950 descended almost as hurriedly as it rose. The bubble burst, as the researchers put it.

Further investigation showed that each year’s bubble tends to burst more quickly than the last.

The bubbles are getting bigger too. Each year is written about with increasing amplitude, as more and more books are published.
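
One way to put numbers on these two observations is to record each bubble's peak and its half-life, meaning the number of years after the peak before mentions fall to half the peak level. The Python sketch below assumes that definition, and also assumes raw counts are divided by the total words printed each year so that growth in publishing does not distort comparisons between years. The function names and the tiny series are invented for illustration. They are not the researchers' code or data.

```python
def relative_frequency(mentions, totals):
    """Divide each year's mention count by the total words printed that year."""
    return {year: mentions[year] / totals[year] for year in mentions}

def half_life(freq):
    """Years from the peak until frequency first drops to half the peak or less.

    Returns None if the series never falls that far.
    """
    peak_year = max(freq, key=freq.get)
    half = freq[peak_year] / 2
    for year in sorted(y for y in freq if y > peak_year):
        if freq[year] <= half:
            return year - peak_year
    return None

# Synthetic numbers, made up purely to exercise the functions.
mentions = {1950: 80, 1951: 90, 1952: 70, 1953: 50, 1954: 40}
totals = {year: 1000 for year in mentions}
print(half_life(relative_frequency(mentions, totals)))  # prints 3
```

Comparing half-lives across many years would show whether later bubbles really do burst faster, while comparing peaks would show whether they are getting bigger.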

Even if we dwell in the past, it is an increasingly recent past. The present is becoming louder.

Other trends studied with the Ngram viewer included censorship of certain authors in Nazi Germany, the chronology of flu epidemics, linguistic evolution, the rise and fall of famous names, gender inequality, prevalence of the word God and adoption of new technology.

Though the implications of decoding human history on such a grand scale are impressive, there are of course problems. Many of the books could not go into this nascent version of the Ngram viewer. Some were hard to read, lacked an author, or had no definite place or time of origin. As a result, the Cultural Observatory could only search 5 million books for their first paper. This fraction of the corpus still comprises over 500 billion words.

Moreover, these are only books. The Ngram viewer does not contain periodicals, scholarly articles, tweets, pictures, paintings, videos, status updates — to name a few. These are all valid cultural expressions, and the Cultural Observatory says it is hopeful it can include more.

This is not a replacement of traditional close reading, either. Culturomics is complementary. Just as we cannot read all the books ever written, a computer cannot understand why they are important to read.

If the World Brain and culturomics interest you, I suggest a search for the Cultural Observatory’s TED conference presentation on YouTube, entitled “What we learned from 5 million books.” After all, I have little more than 700 words, while they have over 500 billion.

According to the Ngram viewer (books.google.com/ngrams), “hipsters” have been on the rise since the ’80s. SEAN LENEHAN can be reached at splenehan@ucdavis.edu.

