EVERY DAY, we see a graph that appears to definitively measure the number of coronavirus cases or the rate of infection in a given geographic region. The counts might be accurate, or they might be seven, or 13, or who-knows-how-many times too low. Still, I check in on them, and hope they’ll help me make some decisions about my life, because what other information do I have to go on?
In his new book Technologies of Speculation: The Limits of Knowledge in a Data-Driven Society, out this summer from New York University Press, Sun-ha Hong probes how we’ve come to put so much faith in graphs and numbers. Data is often gathered in messy ways and held to arbitrary standards; it often privileges quantifiable aspects over qualities that matter more. Still, we are taught to trust information gathered by a machine, which promises to be free from human error: an attitude that Hong says has led to both increased surveillance and datafication. “There’s this widespread fantasy that more data equals more information equals better judgment,” Hong told me, a fantasy whose flaws are becoming painfully obvious in real time. Technologies of Speculation is a book about how recent technologies both reflect, and work to perpetuate, the supremacy of quantitative knowledge, which is always less neat and less complete than we’d like to think.
EMILY WATLINGTON: I’ve been looking forward to your book ever since I read the article version of the fourth chapter, “Data’s Intimacy,” where you argue that we go for these self-tracking devices because we want some device to know us better than we know ourselves: the Fitbit, as one example.
SUN-HA HONG: When I say “data’s intimacy,” I’m describing this fantasy that, the more we surround ourselves with smart machines, and the more we hand over the private aspects of ourselves to corporations, the more likely we are to get these machines that really do know exactly what we want. That’s a fantasy of convenience, and of accuracy and certainty: some people want to believe that if technology can access who you are very perfectly, it can help you get everything you want. But intimacy is always accompanied by vulnerability.
The major problem that I focus on in the book is how this intimacy gets exploited. A data pipeline is ostensibly created for one purpose, such as to track your exercise and allow you to know yourself better. Then, that data is appropriated for new functions. With Fitbit, there was this idea that bringing data to the people was going to empower them. And in some ways, you could say that it did. But what happened next? Insurance companies are partnering with Fitbit and giving out the devices to customers for free. In return, insurers want the data. At the moment, in the United States, insurance companies are legally forbidden from using that data to directly inform premium costs, but that’s the horizon of potential use. It’s only a matter of time.
In the book, you critique the impulse to collect as much information as possible, and also the widespread belief that transparency and information are inherently beneficial. As we are seeing with the coronavirus … two people can look at the same data, and come to wildly different conclusions. You write that, “The trouble is not that datafication is imperfect but that it opens up a gap between the futuristic promise of better knowledge and the practice of fabrication,” and you describe fabrication as, basically, fudging it. What are your observations regarding coronavirus data?
While working on the book, I came across this tongue-in-cheek comment from anthropologist Mary Douglas about climate change. She said: Listen, you’re trying to educate people and give them all this information about climate change … all the models, all the uncertainty. But don’t you realize that the more information you give people, the more you open up possibilities for misunderstanding and misinterpretation? The answer is obviously not to withhold information, or to just tell people what to think. The point is that just dumping massive amounts of information often actually harms rational debate.
I warn that we are too fixated on transparency, and the idea that information is going to yield truth and reason. We need information, but we can’t just leave the information hanging: especially when the data we have is very, very messy. With the virus, we’re currently writing an encyclopedia of ways in which things can go wrong. Messy data becomes a kind of toxic gas that suffocates the public sphere with bad takes.
In the United States, given the chronic lack of testing, we will never know how many Americans were infected with the virus. And all of these gaps and margins of error become opportunities for speculation.
And it’s also not just that “the people” are not smart enough, or not data-literate enough: you talk about many cases of collecting too much information, to the point that it can’t possibly be useful. You quote the 2005–’14 NSA director Keith Alexander, who argues that the agency’s job is not just to look for the needle in the haystack but to “collect the whole haystack.” Later, you point out that the Snowden affair “brought to light” so many documents that few people — probably not even Snowden himself — read them all.
In some ways, what happened with the Snowden affair is quite similar to what’s happening with coronavirus information. Edward Snowden released all these documents hoping to help people have a rational debate about NSA surveillance. That happened to a degree, but it also fueled conspiracy theories, speculation, doubt, and disagreement.
This is indicative of a broader relationship problem between people and information. Take police body cams: these devices were supposed to solve the problem of racist police brutality through transparency. But that cop who killed George Floyd … he knew he was being filmed, and he didn’t care. When people see something like a clear, daytime video of an old man in Buffalo being smashed to the ground by the cops and bleeding out of his head … some people will look at that and conclude, oh, this man was a looter or a rioter or an anarchist or what have you. Cops know from experience that data isn’t objective, or universal, or unbiased.
Trump said regarding the Buffalo video that the man fell harder than he was pushed … an example of someone just seeing what they want to believe.
Then everyone is looking at the video trying to figure out what happened before, what happened afterward.
Right. It’s something collected by a machine, so there’s this idea that it’s objective, but we all know how editing and framing impact what we see. Besides that, you also push back against the idea that, even if we had recorded an hour before and an hour after this event … transparency doesn’t guarantee accountability.
You can’t achieve certainty or knowledge just by piling up enough facts, enough data, enough statistics. But that’s the model that we work with when we use data to try and figure out the world: we want to believe that we can just accumulate more and more knowledge until we’ve got it all figured out. What really gets you from information to understanding, or what gets you from numbers to judgment and consensus, is all the qualitative stuff around the data. To return to the Fitbit: the standard of 10,000 steps per day that it holds users to comes from a recommendation invented to market a midcentury Japanese pedometer called “Manpo-Kei,” which literally means 10,000 steps machine. One’s health is compared against this arbitrary number, and the device tells you whether you succeed or fail, and by how much.
That gets us to another essential part of your book: questioning our faith in quantification. You write that numbers are “something we look for and seek assurance from.” Yet you also write that “evidence does not extinguish uncertainty, but also refocuses it,” and also that “there are many mundane gaps between the promise of revelation and the messiness of information.”
There’s this fantasy that, when you see a number, it’s going to have this gravitational pull of objective truth, a force that compels everyone else to meet you there. In practice, numbers are incredibly useful mostly because we’ve invested so much credibility in them. That means numbers are opportunities for people to push partisan agendas, or to reinforce their existing worldviews. In the book, I talk about the imaginary dimension of numbers, the affective dimension of numbers, and the speculative dimension of numbers.
We brush away a lot of the real-world obstacles and say, on paper, we could have predicted that, or measured that. In the book, I talk about a hunger for data that resulted in the massive growth of state surveillance after 9/11. There was this idea that we could have prevented 9/11 if only we had had more data, if we had only connected the dots. And there was the sentiment after the Boston Marathon bombings: this sense that, you never know who might become a terrorist, so you might as well surveil everyone. Because terrorist attacks are so rare and so singular, it’s very difficult to create statistical models around them.
The Snowden files showed that the NSA was struggling to figure out what kind of data is useful. They included internal memos written by someone called the SIGINT Philosopher — short for signals intelligence philosopher — who talked about analysis paralysis, and focused on the gap between expectations and reality. We go out there and gather data and expect it to deliver predictions. We need it to deliver certainty. We need it to deliver actionable information. So when the data isn’t up to snuff, we get what I refer to as fabrication: this process of filling in the gaps with human guesswork to try and make the data do what we need it to do. That’s where a lot of prejudices and fallacies re-enter the data through the backdoor.
So the idea is we have to watch everybody, and we’re going to let the data lead us to the terrorist. But in practice, it’s the same brown body that is being exhaustively datafied. In Assia Boundaoui’s documentary The Feeling of Being Watched (2018), we see the Muslim and Arab Chicago neighborhood that Boundaoui grew up in. Everyone in the neighborhood grew up with this feeling that they were being surveilled by the FBI. It turns out that the FBI did indeed conduct years of surveillance on this community: it was called Operation Vulgar Betrayal, and it resulted in zero actual terror charges. Surveillance systems tend to be dominated by false positives, inconclusive cases, and large error margins. And they tend to look for the same brown terrorist, often excluding the real dangers facing the country today. For example, the FBI had been using a survey to predict how likely a given suspect is to commit a terroristic or violent act. One of the questions is: Is the suspect a recent religious convert? That question might catch a jihadist, but it’s not going to catch a white supremacist. And in the years since 9/11, white nationalist terrorism has killed more people in the United States than jihadist terrorism.
What would you say is the strangest self-tracking device that you came across while doing your research?
It has to be Spreadsheets, a sex tracking app. You put your phone on the bed, and it listens to you having sex. It delivers data such as thrusts per minute and decibels. But in what universe do those metrics equal a good sex life? How is that meaningful data? They’re just collecting whatever data they can get from the devices typically found in a smartphone. This is one example of how we end up privileging the kind of information we can quantify, and are encouraged to forget the more subjective qualities.
That sounds like an example not just of the impulse to quantify everything, but to gamify it, too.
With Spreadsheets, the effects are laughable. But the same principle plays out in situations where parents, teachers, doctors, employers, cops, or immigration officers have sensitive data about you. They’re the ones deciding what it means and whether it matters. That’s when you see a kind of epistemological tyranny that says, whatever looks and sounds like data, whatever is easy to measure … we’ll measure it. To derive the truth about you, we’re going to have to ignore everything that the sensors cannot catch, everything that doesn’t fit into the dropdown menus.
Even grades attempt to quantify someone’s intelligence and effort, and these numbers can have a huge impact on a child’s entire life.
One result of the valorization of data and predictive analytics is that we ourselves are being asked to be more predictable. There’s this idea that a hard-working, responsible, successful person should show it in a way that a machine can read. For instance, nowadays, it’s increasingly likely that, if you submit your CV to a job opening, some automated system is going to filter it before a human ever reviews your file. One of the tricks that people came up with is writing “Stanford” or “Harvard” somewhere in there, but making the font white. That way, it’s invisible to the human eye, but many AI tools will pick it up. It’s a great trick, and I recommend that everybody does it.