By the numbers | KCET
By the numbers
Google, making use of its 15 million scanned books (books it can't put online until copyright issues are resolved), has made their words available instead. Millions of words, from books published between 1500 and about 2008. And in these words lie patterns of cultural change.
Sieve the words for frequencies of appearance over the centuries (or over mere years) and you can watch the flowering, duration, demise, and occasional resurrection of words and concepts (including phrases up to five words long).
The tool that sieves the words is the Google Books Ngram Viewer. (Don't be put off by the sci-fi-sounding ngram. It's a statistical term used in the study of, among other things, language.) The viewer shows frequencies of use, based on the appearance of words or concepts in a minimum of about 40 books.
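To make "frequency of use" concrete, here's a rough sketch in Python of the kind of counting involved. Everything in it is hypothetical: the tiny `books` list, the function name `yearly_frequency`, and the `min_books` cutoff, which defaults to 1 here so the toy data produces output (the viewer's own cutoff is about 40 books). It's an illustration under those assumptions, not how Google computes its figures.

```python
from collections import defaultdict


def yearly_frequency(books, phrase, min_books=1):
    """Share of all n-word runs in each year that match phrase.

    books: list of (year, text) pairs -- hypothetical toy data.
    Returns {} if the phrase appears in fewer than min_books books.
    """
    target = phrase.lower().split()
    n = len(target)
    hits = defaultdict(int)    # matching runs per year
    totals = defaultdict(int)  # all n-word runs per year
    books_with_phrase = 0

    for year, text in books:
        words = text.lower().split()  # naive tokenization (an assumption)
        runs = [words[i:i + n] for i in range(len(words) - n + 1)]
        found = sum(1 for run in runs if run == target)
        hits[year] += found
        totals[year] += len(runs)
        if found:
            books_with_phrase += 1

    if books_with_phrase < min_books:
        return {}
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Made-up sample "books" for illustration only.
books = [
    (1887, "land boom in los angeles draws settlers"),
    (1887, "rail fares fall and los angeles grows"),
    (1925, "hollywood studios rise near los angeles"),
    (1925, "oil and citrus"),
]
freq = yearly_frequency(books, "Los Angeles")
print(freq)
```

Dividing by the total number of word runs in each year matters: more books are published in some years than others, so raw counts alone would mostly track the size of the library.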
To use the viewer, you plug in words. A single word is a unigram; a two-word phrase is a bigram. Trigrams are three-word combinations. For cultural critics (and literature scholars), the Ngram Viewer provides statistical support for changes in what writers write about and the words they use. Presumably, changes in word/concept frequencies reflect changing cultural attitudes.
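For readers who want to see what an n-gram looks like in practice, here is a minimal Python sketch. The function name `ngrams` and the sample phrase are made up for illustration; this is not Google's code, and real tokenization (punctuation, capitalization) is more involved than splitting on spaces.

```python
def ngrams(text, n):
    """Return every run of n consecutive words in text, as tuples."""
    words = text.split()  # naive whitespace tokenization (an assumption)
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


sample = "the city of Los Angeles"  # made-up sample phrase
unigrams = ngrams(sample, 1)  # single words
bigrams = ngrams(sample, 2)   # two-word runs
trigrams = ngrams(sample, 3)  # three-word runs
print(bigrams)
```

Here "Los Angeles" turns up as one of the bigrams, which is why a two-word place name can be charted as a single unit.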
That's the grimly bookish part. The cool part, of course, is that you can do the analysis yourself - DIY scholarship. But beware of the limitations. For example, books in the Google corpus are library books - the ones someone thought to save from all those that have been printed.
So, how have some LA-centric concepts evolved over time?
- Los Angeles had a long upward climb in frequency as soon as the city appeared in accounts of the Mexican War (after 1849). Los Angeles peaks again in the boom time of the late 1880s and drops off in the bust that followed. A thirty-year run-up from 1900 to 1930 tracks the age of civic boosterism. It's not until 1970 - when Los Angeles was recast as the capital of lifestyle innovation - that the frequency of use returns to the heights of Jazz Age LA. Since then, references to Los Angeles have remained high but fluctuating. Perhaps the Los Angeles brand is in some distress?
- Comparing Los Angeles and New York yields predictable results. But comparing Broadway with Hollywood shows the two "entertainment capitals" changing places from 1940 through 1970, with Hollywood leaping ahead in the years following.
- And who are we? The terms Angelino and Angeleno contended for supremacy as the collective noun for residents of Los Angeles from 1900 onward. (We'll ignore, for the moment, why both of these loconyms are incorrect.) After 1930, Angelino dominates. What's interesting - and unexplained by the data - is why both terms were converging at the end of the century. By 2008, Angeleno had edged slightly ahead.
The image on this page is a screen capture from Google's Ngram Viewer.