Data in the Hands of Profiteers: A Conversation with Mary F. E. Ebeling
By Julien Crockett, July 14, 2022
— Jorge Luis Borges, “On Exactitude in Science,” 1946 (“quoting” Suárez Miranda, Viajes de Varones Prudentes, Libro IV, Cap. XLV, Lérida, 1658).
MAPS AND MODELS are among our best tools for learning and creating actionable knowledge. And yet, maps are not accurate representations of the territories they purport to capture and models are not reality. At best, they are imperfect approximations that reveal some truths and conceal others, invariably reducing complexity through their creators’ choices. In her book Afterlives of Data, sociologist Mary F. E. Ebeling explores the many ways in which our data economy pushes us to forget this distinction — and the consequences thereof.
JULIEN CROCKETT: I’d like to start with the one-paragraph short story by Jorge Luis Borges, “On Exactitude in Science,” that you quote in Afterlives of Data. What is the short story about, and why is it relevant to your book?
MARY F. E. EBELING: Borges’s story came to mind during an interview I had with a data scientist at a health informatics institute. She was talking about her work building statistical models based on data and mentioned that a data model is considered scientifically “true” or factual if it has an error rate of two percent or less. But as these models take in more data — and you can never possibly collect enough data to replicate what the model is trying to measure or predict — and try to account for additional variables, they become unstable. This made me think of Borges’s story and the idea that a map can become a one-to-one model for the real world and how we can come to rely on the map as more important than the real world.
This idea is what I explore in my book: what are the repercussions of data, especially data about our lives, being collected and used to build “replicants” of us (if we want to invoke Philip K. Dick or Ridley Scott)? Companies within the data economy are building models of our lives with the promise of measuring us one-to-one, but they are in fact creating abstractions. I discuss how marketers use our health data to construct “data creatures” that “walk around” and “inhabit” our Earth. These creatures are somehow related to us, but they are not us. And these data creatures can actually do a lot of harm, as I lay out in Afterlives of Data and in my first book, Healthcare and Big Data.
The promise and the failure of approximation is at the crux of so many issues in society. But isn’t there something different about the models we create today? By having more data, more processing power, can’t we create better models that better approximate reality?
One of the really important things about Borges’s short story is that it’s about power. It’s about empire. It’s about having control over territory, and today the territory is us. To me, that makes matters worse. It’s worse because everything is digitized, and moves as quickly as these companies can scrape and collect our data, and as quickly as they can construct models disassociated from our lives.
In my first book, I show how commercial data brokers who sell marketing data access petabytes of data on virtually everyone living in the United States, from both public and private sources, in order to build predictive models whose aim is to market to us, but also to shape our behavior. While a data broker may hold up to 10,000 data points on an individual, it doesn’t mean that those data are an accurate approximation of someone’s life, especially when they are under the control of powerful data companies, like Google or LexisNexis, that seek to profit from our data. Scholars like Kate Crawford and Wendy Hui Kyong Chun have argued that predictive models, or machine learning AI, rely on past data to predict the future. My own experience with a miscarriage shows how grossly fallible these models can be: data brokers “accurately” predicted from my consumer data that I was pregnant, but then couldn’t predict my miscarriage. In a way, in their attempts to master chaos, these data models end up obfuscating, and reinforcing, ignorance.
What made you want to write about data and how data collection practices affect society?
In 2011, I was part of a clinical trial at a fertility clinic to test a fertility drug that was being used in the IVF process. I became pregnant and was being monitored closely. At six weeks, I had an ultrasound and was shown the vibrating pixels on the screen that indicated a heartbeat. That day I came home and my mail carrier had delivered a box of free samples of Similac. I was under the baby spell, so I thought it was kind of weird and funny. It also occurred to me, however, that it was awfully weird that they knew I was pregnant. At 10 weeks, I miscarried. After I came home from the doctor’s where I had seen that the pixels had stopped vibrating, I found that the postal carrier had delivered a free subscription to American Baby. The direct marketing did not stop for the next five years.
Among the direct marketing that I received was a letter from a research lab at a local university that does research on infant language acquisition. The letter was personally addressed to me and to my baby, inviting us to come and participate. Because I’m a sociologist, I knew that they had to undergo IRB (Institutional Review Board) clearance in order to contact me. If I called them, they would have to tell me how they got my information. I called, and the woman on the phone told me that they had bought a database from Experian, a credit reporting agency. The database contained information on about 1,200 families from x-number of ZIP codes in the region of the research lab, all with births in the household over an x-number of months. And when she looked up my data in the database, she had my name, my mailing address, and the date of birth of my “child”: March 2011. It was the same date that I miscarried.
That is when I decided to dedicate my research to investigating data: how data brokers expropriate our data, using our data as the raw material that is made to conform to a marketing logic of late capital, and how we become what we are in this data economy — physically, emotionally, psychologically. I wanted to investigate what data make of us and what we are materially in this political economy of data.
How do you define data?
Over the past 10 years, I have come to understand data as being the objects that are created about us by those in power. They take the immeasurable aspects of our lives — our emotions, our bodies, our relationships, our experiences — and make them quantifiable by capturing or making inferences based on digital clues.
As you mentioned, different types of data can be collected about us (e.g., health data, behavioral data, financial data). Are these various types of data treated differently legally and by organizations? And how are data analytics companies able to piece them together to create models of us?
Early in my research, around 2014–2015, I went to a direct marketing convention to interview data brokers, and I spoke with a brilliant data broker who told me: all data are health data. In my research, I’ve come to learn that this is exactly right. In the United States, we have a highly fragmented system when it comes to regulation of data, specifically data security and privacy. Depending on where the data are produced, they are regulated differently. So in the health-care sector, data are highly regulated and have to be protected and secured differently compared with how consumer data are handled in the retail sector. But because the regulatory system is fragmentary, there are many ways that data “slip” through the cracks that allow certain kinds of data to be collected and reassembled in order to make very rich and detailed profiles of each and every one of us.
Taking my experience as an example, I produced data through the fertility clinic. I also produced data every time I used my credit card to purchase ovulation kits from CVS or fertility drugs from a specialty pharmacy and every time I searched online about what to expect in early pregnancy. You would assume that a lot of this health-related information would be protected by HIPAA. But HIPAA, the Health Insurance Portability and Accountability Act, does not protect data privacy in the way that many people may assume. In fact, HIPAA enables data to be circulated within certain parameters — it makes data safe for circulation, among covered entities and their business associates, which include, for example, hospitals, health-care clinics, doctors, as well as health insurers like Medicaid and Medicare. When I used my credit card at the health clinic, a covered entity under the HIPAA regulations, certain data from each transaction would be visible to entities not covered under HIPAA, including banks and the credit reporting agencies. If all data are health data, all data are also financial data.
So all of this data were reassembled by a place like Experian to build what I call a data “revenant” in my first book (although I should maybe start thinking about “replicants”). They took all of this data and were able to animate it, give it a new life, and construct a “baby” for me. I understood this as “lively data” or data given a new life, alienated from my life. And this baby conformed to a late-capitalist, middle-class American marketing idea of what a baby should be, and what a baby should want. A baby should want a car seat, a baby should want high-end organic cotton baby clothes. And this baby kept on knocking at my door, like literally knocking on my door for five years every time my mail carrier delivered to me more formula samples or direct mailers for child life insurance. And I was able to follow this baby as it grew from being a newborn to an infant to a toddler. And the last time I heard from my baby, it was already in preschool and about to go to kindergarten.
How did the data brokers miss the fact that you had a miscarriage? Isn’t this an example of privacy rules working? Or do you think it’s more an example of your story not fitting the narrative they built about you?
This is not privacy working, because we have no privacy under capitalist data surveillance. And with the recent overturning of Roe v. Wade, this lack of privacy becomes even more chilling. All of us in the US have two bodies, a physical one and one made of data. For some of us, our physical bodies have some rights to privacy and autonomy. But, of course, this depends on what kind of body one has. If one has a disability, or is gendered or raced or classed in a certain way, one will have more rights or fewer rights — but our data bodies have never had a right to privacy or autonomy, because these bodies are built and controlled by data companies, or by for-profit health-care providers, or financial data brokers, all of whom seek to profit from our data bodies through their commodification. And not too long ago, SCOTUS ruled that any online data about us is protected from government seizure, but it is perfectly legal for the government to buy it from data brokers. As I mentioned with HIPAA, the regulations enable data to circulate as long as patient data are deidentified and “secured.” And once these data are completely deidentified and made “safe” to leave a covered entity’s control, they are made into data assets that can be sold. The entire data economy was built to ensure that data circulate “frictionlessly,” that is, without barriers, such as privacy or consent, that clog up the system. Of course, consumers are given things like a Terms of Service contract when they download an app, which makes it “feel” like we have some control over our data privacy. But have you ever read those terms and then decided to not download an app?
My data baby did not fit the trajectory that a healthy, white middle-class baby in the United States is supposed to fit. I was supposed to give birth to a live, healthy baby. And it was supposed to grow as it should, with no developmental complications. In reality, the United States has one of the highest rates of maternal and infant death among countries with advanced economies, OECD countries. And for every white woman who dies in childbirth, four Black women die from pregnancy and birth complications. But these are the realities that are inconvenient to data marketers and their quest for “frictionless” data.
We should think about what each of us having a data revenant means for people in general. People who might lose their home, be in a horrific car crash, be up for parole, or might need a kidney transplant but could be denied it by an algorithm. This is why the Borges short story is so brilliant. Our models can approximate reality but can never capture the nuance of individual lives.
We regularly hear that our data will empower us — and we are pushed to understand ourselves as “dividuals” (sums of data), as you write in your book. How have narratives of sociotechnical imperatives led to this understanding of ourselves?
The stories of being empowered by our data are simply marketing pitches. We are not empowered by our data. The problem with the empowerment language, especially in the hands of database marketers, is that it shifts the onus — the responsibility for the entire data economy system, a system that was actually built against us — back onto us. The credit score is a prime example of what I’m talking about. The algorithms, the formulas of how a credit score is built, are proprietary. In other words, we can’t know what data are put into the model, nor can we know how the algorithm works — how it operates, or how it is used in contexts outside of, say, applying for a mortgage. But then we are told by FICO that we can be empowered by our data: empowered by actually sharing more of our data with the company! We — the patients, the consumers, the indebted — get the short end of the stick. This system was not built for us. It was not built to benefit our health or our lives. The system was built to make companies money and to extract value from our experiences, including our tragedies.
But collecting data in itself is not the issue, right? As humans, we have a desire to understand the world and among the best ways we can do that is by collecting data and analyzing it.
Of course. There are brilliant philosophers, anthropologists, and sociologists whom I reference in my work and admire deeply. Someone like Sabina Leonelli primarily studies how scientists in the biological sciences use data. She’s looking at how data are produced and used in a completely different way from me, and one of the brilliant things that she talks about is how data are used to anchor facts and evidence to the real world — for example, how data might provide us with a more nuanced picture of climate change. So, yes, you’re right that we use data to tell stories about reality, to help us anchor what the truth is to what reality is, and also to give us a sense of what we might expect in the near and the not-too-distant future. I’m not against data. I am against data in the hands of profiteers, and those who have the power to use our data to harm us.
What does a better system look like?
I’m a sociologist, and I tend to be pessimistic. We’re curmudgeons. I will say that we cannot make change as individuals. We are completely powerless. But I have a lot of hope in collective action. One example is the organization Mijente, a Latinx immigrant rights organization, taking on LexisNexis, which among other things provides data brokerage services. Mijente is exposing LexisNexis for sharing information with ICE about undocumented immigrants. Mijente is, in other words, going directly after the data brokers who are helping the surveillance state identify undocumented people. There are a lot of collectives like this now being organized, insisting that data about us be used for our benefit, and be built on trust and care. And they are going after Big Data corporations that continue to profit from data about our traumas. These collective efforts represent the only way forward, the only way that we can change any of this.
Julien Crockett is the Science and Law Editor at the Los Angeles Review of Books.