2021 Runner-up, Non-fiction.  Everybody Lies: big data, new data, and what the internet can tell us about who we really are by Seth Stephens-Davidowitz

Everybody Lies: big data, new data, and what the internet can tell us about who we really are by Seth Stephens-Davidowitz.  Obviously, to be a great novelist or mystery writer, you have to be an accomplished storyteller, but the same holds true for writers of non-fiction.  No one is going to read three hundred pages about cognitive psychology, astrophysics, or data analysis unless the author keeps them interested with good stories.  That is what Seth Stephens-Davidowitz does in his 2017 book about Big Data and how individuals, companies, and governments can effectively harness its powers.  He uses entertaining examples from a variety of fields, including sports (what’s the single best data point to predict future horse racing success?), government (can search terms provide more timely information on flu outbreaks, crime patterns, or unemployment than official sources?), sex (are people really having as much sex as they say they are?), and business (why do casinos offer you those free buffet dinners?).  His examples cover so much ground that I asked a college student I am mentoring, who is interested in a career in data analysis, to note every type of job the book mentions.

The stories keep readers’ attention while also introducing and demonstrating the four specific, and in Seth’s mind “unique,” powers of Big Data.  First, the ability to gather search terms and other data points from publicly available sources creates new types of data and “gives us windows into areas” that sociologists and surveys could not previously reach.  Second, this type of data is more honest than what people say in surveys because of the anonymity of the web.  Third, there is so much data that researchers can zoom in on small subsets of people and still base their conclusions on a meaningful number of responses.  Finally, the low cost and speed of collecting data allow us to run exponentially more causal experiments.

The discussion of these powers is the core of the book, but he also covers the digitization of medical records, books and media, and photos as new sources of data; the Doppelganger effect (think of the TV show “Bull”); and A/B testing as a way to speed up and improve decision-making.  He concludes with questions about the limits of Big Data and concerns about its ethics and dangers.  Given the rapid rise in legislation protecting individuals’ data online, the one weakness I see in the book is that, despite being only four years old, it feels much older.

For all my previous “books of the year” lists, see my dedicated page for these titles.
