Humanizing Big Data with the Hedonometer
Recently I was listening to a podcast while falling asleep, as I do every night. One of my favorites for this purpose is Reply All, which never fails to fascinate while finding a consistent balance between humor and poignancy. I highly recommend it to anyone interested in learning about the most bizarre niches of the already weird phenomenon that is the Internet.
On this fateful evening my sleep was disrupted by my intense interest in the subject of the episode (“Happiness Calculator vs. Alex Goldman”). I stayed up to hear the hosts’ discussion of that happiness calculator: the Hedonometer. Since 2008, UVM-based data scientists Peter Dodds and Chris Danforth have been tracking the collective mood of English speakers on Twitter, charting how it shifts over time in almost real time.
From corpora comprising Twitter messages, music lyrics, the NY Times, and Google Books, Dodds and Danforth drew a set of ~10,000 of the most frequently used words and outsourced the labor of rating each word on a scale from 1 to 9 (sad to happy) to Amazon Mechanical Turk. You can investigate the resulting ranking of words from saddest to happiest on this page; even on its own it’s pretty compelling.
It’s also worth mentioning that Dodds and Danforth are working on a method to place these isolated words in context by taking the emotional temperature of relevant phrases.
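To make the word-scoring idea a little more concrete, here is a minimal Python sketch of how per-word ratings could be rolled up into a single mood number for a batch of tweets. Everything in it is an assumption for illustration: the scores in WORD_SCORES are invented rather than taken from the real word list, the function name average_happiness is hypothetical, and the frequency-weighted average is just one simple way to combine ratings, not necessarily what the Hedonometer does.

```python
import re
from collections import Counter

# Hypothetical word scores on the 1 (sad) to 9 (happy) scale.
# These values are invented for illustration, not taken from the real list.
WORD_SCORES = {
    "love": 8.0,
    "happy": 8.0,
    "rain": 5.0,
    "sad": 2.0,
    "war": 2.0,
}


def average_happiness(texts, scores=WORD_SCORES):
    """Return the frequency-weighted mean score of all scored words.

    Words missing from the score table are ignored. Returns None when
    no scored word appears in any of the texts.
    """
    counts = Counter()
    for text in texts:
        # Lowercase the text and split it into simple word tokens.
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in scores:
                counts[word] += 1

    total = sum(counts.values())
    if total == 0:
        return None
    return sum(scores[word] * n for word, n in counts.items()) / total


tweets = ["I love love this", "so sad about the rain"]
print(average_happiness(tweets))  # (8 + 8 + 2 + 5) / 4 = 5.75
```

Note that each word is scored in isolation here, which is exactly the limitation that the phrase-level work is meant to address: "not happy" would count the same as "happy."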
What amazes me about this project is the enormous breadth of its data set. It analyzes a 10 percent random sample of all tweets posted daily, which means roughly 15 million English tweets per day. The team is beginning to work on doing the same for other languages, too. Since the project’s beginning over a decade ago, it has grown rapidly in tandem with Twitter.
So what have they learned? The Hedonometer shows that English-speaking Twitter users have gotten considerably sadder over the last half-decade. As Alex Goldman puts it in that episode of Reply All, “Between 2016 and now, we’ve lost a Christmas day of happiness.” (The Hedonometer shoots up every Christmas.) Prior to 2020, the Hedonometer reached its lowest point after the 2017 Las Vegas shooting, but this year has triggered nadir after staggering nadir in our collective happiness.
The sentiment that the hosts of that podcast expressed, that it is depressing but also “kind of comforting to quantify at least a little bit how bad 2020 has been,” resonated with me. Of course I know that I haven’t been alone in this, but seeing it reflected on that massive scale makes the fact more tangible. This is especially important during an experience this isolating: for many of us, there’s no way to physically come together for mutual support.
Twitter isn’t a perfectly representative sample of the English-speaking world; even the creators of the Hedonometer acknowledge in their FAQ that “Tweets represent a non-uniform subsampling of all utterances made by a non-representative subpopulation of all people.” But 1 in 5 U.S. adults uses Twitter—that’s a hugely valuable sample. And either way, this ambitious effort raises an interesting counterpoint to conventional wisdom. I think we generally relegate quantitative data to the realm of the “cold” or “emotionless,” but this project demonstrates just the opposite. If data can act as a window casting even a small shaft of light onto the bigger, deeply human picture of our collective emotional experiences, what else can it do?