DURING THE 2010 and 2012 elections, Facebook piloted a scheme to increase voter turnout. Called the “voter megaphone,” the experiment allowed users to click an “I Voted” button and to see if their friends had voted. According to the company, the megaphone brought over 300,000 people to the polls who would otherwise have stayed home. It was a triumphant display of social media’s power to engineer social responsibility. News sites and podcasts abound with stories like this. Tech wizards and data scientists always seem to be tweaking algorithms that promise to make us more civically engaged, environmentally friendly, and personally efficient in a continuous progress toward better, more meaningful lives.

Yet there is a danger to this invisible nudging. Imagine what would happen if that “I Voted” button appeared only to Facebook users who lean conservative or only to those in particular districts. An innocent get-out-the-vote campaign could quickly morph into a potent strategy for manipulating elections. And what if such problems don’t derive from programmer bias but are built into our evolving metrics culture?

According to Cathy O’Neil, tools like the voter megaphone — along with Facebook’s newsfeed algorithms — have the potential to become what she calls, in the title of her 2016 book, WMDs: Weapons of Math Destruction. Recently reissued in paperback, Weapons offers an accessible introduction to the high-stakes politics of social quantification. O’Neil pulls together analyses of well-known measures like standardized tests, university rankings, and credit scores, and offers insights into the obscure calculations behind work schedules, prison sentences, and insurance rates. O’Neil defines all of these as WMDs: systems of social measurement marked by large scale, opacity, and the potential to cause harm.

The danger in the spread of WMDs comes from the ability of such quantitative instruments to “define their own reality and use it to justify their results.” Common sense says that numerical structures reflect the underpinnings of social or natural worlds in a neutral way. O’Neil shows that common sense is wrong. Metrics derive from quantified assumptions that may or may not represent the world in ways that the users or targets of numbers would accept — if they knew what they were. In O’Neil’s words, metrics may rely on “poisonous assumptions [that] are camouflaged by math and go largely untested and unquestioned.”

A former professor at Barnard College and creator of the Math Babe blog, O’Neil well understands the power of mathematics to enable scientific discoveries, medical advances, and economic growth. But, as she explains in the story that opens her book, the 2008 financial crisis showed her the dark side of math. Despite a reputation for objectivity and disinterestedness, social metrics are ultimately the product of human choices, and O’Neil shows how “many of these models encoded human prejudice, misunderstanding, and bias into the software systems that increasingly manage our lives.” From Department of Education metrics that target “failing” schools to the inclusion of neighborhood crime statistics in recidivism predictions, WMDs feed cycles of inequality and reinforce racial divisions. Yet, until recently, most have operated behind the scenes, and when they do get noticed, news coverage tends to emphasize their effectiveness as vehicles for increased accountability, transparency, and fairness rather than their potential for harm.

Weapons of Math Destruction is one of several critiques of the social and cultural role of quantification to appear in 2016. Sociologists, anthropologists, and intellectual historians have been attuned to the significance of indicators, metrics, rankings, and algorithms since the 1990s, when key studies like Theodore Porter’s Trust in Numbers (1995), Michael Power’s The Audit Society (1997), and Marilyn Strathern’s collection Audit Cultures (2000) were published. The advent of neoliberalism ushered in imperatives to reform government and public services through a focus on accountability. Education has been a favorite target for accountability reform since the 1983 publication of A Nation at Risk, which raised the alarm that the United States was falling behind global competitors in achievement tests. Twenty-three years later, Margaret Spellings’s Commission on the Future of Higher Education renewed the pressure on colleges and universities when it called for “a robust culture of accountability and transparency throughout higher education.”

Whereas American reformers have been mainly concerned with undergraduate learning, faculty research productivity has come under the microscope in the United Kingdom, Australia, and New Zealand. British academics have faced quantitative evaluations since the 1986 initiation of the Research Assessment Exercise (now the Research Excellence Framework), and university departments in New Zealand have been subjected to Quality Evaluations by the Performance Based Research Fund since 2003. Beginning this year, UK scholars are also audited as teachers, and scores from the Teaching Excellence Framework are now published alongside universities’ REF results and World University Rankings in Times Higher Education.

Much of the scholarship on quantification and audit cultures focuses on higher education, and critics like O’Neil, Strathern, Cris Shore, and Susan Wright have assembled a powerful description of the accountability movement’s effects on research, teaching, and academic labor conditions. Less common are studies of the mechanisms by means of which quantitative evaluation reshapes university life. Wendy Nelson Espeland and Michael Sauder’s Engines of Anxiety: Academic Rankings, Reputation, and Accountability fills this gap admirably. Citing evidence from interviews with deans, admissions officers, career services personnel, and prospective students, Espeland and Sauder demonstrate that metrics like the U.S. News law school rankings can shape every aspect of organizational behavior toward the metrics’ standards and expectations, a phenomenon social scientists call reactivity. By norming educational practices toward those the rankings define as good, rankings become “constitutive rather than simply reflective of what they are attempting to measure.”

Law schools, students, and employers rely on U.S. News rankings as a primary means to understand the field of legal education. A move up or down the ranks, even only a few places, can redefine a school’s future. The rankings influence admissions and financial aid offers, the distribution of resources within programs, career counseling, and deans’ reputations. They structure the day-to-day work of staff and administrators, who are pressured to evaluate every decision’s potential impact on “the numbers.”

In Espeland and Sauder’s view, critics need to go beyond simply identifying the effects of quantitative instruments to analyze how they provoke reactivity. We should work to “[depict] the mechanisms that generate consequences,” they write, because “[a]nalyses of mechanisms produce deeper causal knowledge of social relationships and their dynamism.” To this end, Espeland and Sauder enumerate four ways that rankings and other indicators cause people to “draw new cognitive maps” of their organizations and professional worlds. “Commensuration” is the process by which differences in quality are translated into differences of degree or quantity, which discourages diversity of mission or goals. “Self-fulfilling prophecies” mean that top schools “win” when judged by standards that are themselves based on the traits of the top schools, as determined by reputation surveys. “Reverse engineering” is the process by which institutions identify and manipulate isolated ranking components, which become the basis for resource allocation, as opposed to the quality of the underlying activities the components approximate. Finally, institutional narratives become oriented around assigning praise or blame for ranking changes, which shifts — or weakens — the institution’s identity and autonomy.

Defenders of university rankings argue that such metrics create common standards for measuring performance, which then force schools to improve or risk a decline in rank. It’s true that many universities have been viewed through the lens of reputation and prestige, and that rankings make a genuine contribution by putting some data points on the board. As Espeland and Sauder explain, empirically based accountability measures generally have a populist or democratic dimension in that they help non-experts see into selective — not to say forbidding and exclusionary — institutions that have done a poor job of explaining why college-level learning costs more than people expect, or who pays for research, or even why students shouldn’t be admitted solely on the basis of standardized test scores. Rankings express metrics’ genuine strength, which is to simplify complicated, missing, or withheld detail into a comparative picture that regular people can understand — and understand without having to learn much about what is being rated. In democracies, where people should have a say about systems they may not fathom, quantitative metrics seem like a good solution, or the least bad solution we currently have.

Yet Espeland and Sauder join O’Neil in critiquing metrics for systematically (rather than occasionally) distorting institutions. In their view, while rankings claim to encourage improved service quality, they may have either no effect or a negative one. “Because measures are never an exact representation of the qualities they are designed to assess,” they point out, “it is almost always easier to improve the number than the underlying qualities.” When law schools fell under intense public scrutiny a few years ago for publishing misleading employment statistics, the response of many was not to hire more career counselors or provide more individualized attention to third-year students, but to temporarily employ their own graduates, count graduates employed in jobs that do not require legal training, and assume that alumni they could not locate must be employed. Though their numbers improved, employment outcomes did not.

These changes can’t simply be dismissed as gaming the system — they are logical tweaks that could be construed as cleaning the numbers up while schools create bridge employment for students still looking for legal jobs. If everyone is doing it, then it doesn’t give any one school an unfair advantage. And if everyone is doing it, then your law school had better do it too. If you don’t, you’ll fall behind. In short, the indicator satisfies the demand for uniform standards by which schools can be compared. But the result of a set of rational technical adjustments is a number that increasingly diverges from the thing it seeks to measure.

Looking at examples like the voter megaphone, No Child Left Behind, and U.S. News rankings, we can pose a number of dramatic questions about the dominance of quantitative thinking in contemporary life. Are numbers overwhelming society by superseding individual judgment? Are they silently remaking ethical norms? Which groups get to have a voice in building metrics, and which are rendered silent? Do technologies of quantification threaten to replace democratic debate in policy decisions? Does quantification weaken democracy rather than strengthen it?

In her book The Seductions of Quantification, Sally Engle Merry focuses on such questions in her analysis of the construction of international indicators in three areas: violence against women, human trafficking, and violations of human rights. She asks whether the indicators moved policy away from the lived experience and social causes of these problems or whether they created an international framework that can help victims across the world hold perpetrators accountable.

Interestingly, Merry’s answer is both. Her deepest analysis is of the first topic — violence against women. Merry describes the results of six years spent performing ethnographic observations of processes that produced indicators designed to measure forms of violence against women so that policymakers could better respond. How could indicators do this? The stakes are clear in experiences of spousal battery like one that Merry transcribes:

There was so much constant abuse it seemed like it would never end. Many times I thought that when I died it would be because my husband killed me. I was afraid to have him arrested because I knew he wouldn’t stay in that long, and I thought that he would kill me when he got out. […] I just wanted the hell that my life had become to end.

For public policy to address this kind of terror systematically and preventatively, it needs to see such stories as reflecting common incidents rather than isolated ones that can simply be corrected by imprisoning individual offenders. Policymakers need a sense of a large-scale and meaningful pattern — one with a range of structural causes — and they look for this not in individual testimony but in aggregated statistics. Indicators make violence visible in ways that governments can understand. They also make visible the progress — or lack thereof — toward the reduction of systemic violence.

Merry offers an unparalleled account of the decades-long process whereby a range of experts and activists from various countries turned a heroic series of papers, reports, conferences, debates, statements, findings, and declarations into multiple indicators, including those now tracked by the UN Statistics Division and linked to the UN’s Sustainable Development Goal Target 5.2: “Eliminate all forms of violence against all women and girls in the public and private spheres, including trafficking and sexual and other types of exploitation.” Through her account, we are made privy to the way indicators emerge from tireless advocacy, intellectual rigor, personal determination, and collective patience during repeated meetings and interminable negotiations. The resulting indicators quantify, among other things, the proportion of women subjected to physical or sexual violence in a given country in the previous 12-month period (three percent in Germany, six percent in Norway, 7.3 percent in Gambia, 20 percent in Ghana, and so on). In general, collecting and analyzing the data and then making them public is a very big deal. It even suggests that humanity can in fact govern itself across international lines. In Merry’s ethnography, professional experts and non-governmental bodies show a collective intelligence that is often lacking in national political debate.

So perhaps metrics do strengthen democracy after all? It’s tempting to conclude by saying, “Yes, sometimes: it depends.” A simple solution might be to open up the process to outsiders and educate citizens and consumers about how the numbers work (this project is greatly aided by each of these books). Numbers are here to stay in mass societies, so critique should be the handmaiden of their widening use. But Merry joins O’Neil and Espeland and Sauder in rejecting this kind of reformism. She finds that inequalities of power and resources have too much influence over the shape and effects of the metrics. With regard to violence against women, important conceptual frameworks in play in the creation of the indicators — gender equality and human rights — were eclipsed by two better-funded and institutionally more powerful frameworks, those of law enforcement and of statistics themselves. Though the final indicators include a range of factors and causes, they downplay the deeper sources of violence against women — the maintenance of male dominance in societies across the world.

In Merry’s words, the UN “approved a set of indicators that located the problem of violence against women squarely within partner and nonpartner relationships rather than in any larger set of social inequalities or structures of gender.” The final UN indicators also narrow the forms of gendered violence: they “leav[e] out state violence by police and the military, sexual harassment, stalking, female feticide, violence against men, injury to pets and property, early and forced marriage, and many other forms of violence.” In addition, they downplay the psychological impact, including the effect of threats of future violence. Australia “did a survey that included questions about whether a woman changed her daily pattern of activities because of fear. Anxiety or fear was measured by its effects on activities such as work, social or leisure activities, childcare, and home security systems.” But such questions did not make it into the broader UN protocol, in part because they are considered too subjective or controversial.

This leads to a particularly subtle and difficult limit of the numerical: its aversion to the interpretive processes through which the complexities of everyday experiences are assessed. Physical and mental states, injuries and attitudes toward them, and people in variable social positions always appear together, and their qualities need to be sorted out. In a point too easily missed, Merry writes, “violence against women is itself an interpretive category. Since the gender equality approach takes a contextual view of violence against women and sees the behavior in the context of relationships and wider attitudes of social tolerance, counting acts of violence alone provides inadequate information.” If the job is to understand the full ensemble of causes and effects, and thus to intervene most effectively, indicators are intrinsically inadequate.

All of these scholars are well aware of the value of numbers. Numbers allow for abstract picturing of groups, societies, and cities. They regularize anomalies and exceptions, and allow us access to invisible worlds, social and physical alike. Numbers support distributed cognition and collective intelligence. Both are desperately needed in a world damaged by human stupidity. But quantification in its many forms now operates within a complex metrics culture — a contradictory and contested battleground, as these three books explain. Together, they offer an understory that we could call metrics noir.

In the first place, numerical measurement can too readily take on an unquestioned objectivity. It’s an easy mistake to make, because scientists and other experts have a longstanding reputation for unbiased handling of facts, ensured by methodological procedures not accessible to the layperson. This objectivity bias is hardened by the production of indicators via expert negotiations hidden from public view, which means that metrics aren’t seen as emerging from the intellectual compromises and culturally conditioned choices that go into their making. The public can remain blissfully ignorant of their baked-in assumptions — say, the idea that the poor are more likely than the middle class to commit crimes. Criticism is easily dismissed as resting on shaky subjective grounds.

Second, metrics culture reinforces the perceived inadequacy of qualitative expertise, of the “liberal professions” that rely on interpretive skills grounded in social, philosophical, and historical learning. If a dean can make promotion or funding decisions by looking at a dashboard of indicators that compare her faculty members to those across the campus and the country (grant dollar totals, prizes, publication rates, citation counts), then she need not weigh complex quasi-imponderables and judge the strange mixture of ingredients that make up careers and disciplines. Twenty years ago, Michael Power noted a subtle but determinate feature of the “audit society”: audit slowly weakens judgment, and management becomes a matter of applying formulae whose opacity supplies a false objectivity.

With indicators ascendant over judgment itself, and tied to complicated, obscure, or proprietary procedures, metrics can pacify the interpretive powers of the public and professionals alike. The subjects of assessment rarely interact with quantitative procedures and never demand their abolition. This is a third tendency of metrics culture. Merry discusses “data inertia,” and all these authors note the near-impossibility of putting a finished indicator back in the oven. Policymakers have no stomach for revising indicators beyond the routine tweaking of weightings one sees in U.S. News and similar rankings. Very few scholars analyze the politics of such interventions or detail the losses they create for institutions, scholars, or students. Understanding the history of indicator formation is a minority knowledge project whose negative implications can be brushed aside even when their validity is acknowledged. Although reformers demand that metrics be used only in context, in conjunction with other information, and in collaboration with those being evaluated, metrics weaken the validity of exactly the forms of knowledge that are meant to check them. We thus encounter a Foucauldian nightmare, in which critiques of the ranking system only serve to make it stronger.

Fourth, indicators help create the inequality they measure, while assuring their consumers that the inequality is a natural, preexisting fact. They do this by ignoring distinctive qualities that cannot be quantified and compared. For example, not only is a legal clinic that focuses on the problems faced by recovering opioid addicts not likely to be esteemed or even seen in standard rankings, but the training for such work will be devalued if it is not already a regular component of the top law programs — its very uniqueness will make it incomparable across programs. To put this two-stage process somewhat formally: the set of relevant qualities is narrowed to a common denominator associated with the top schools, and the quantified hierarchy that results then overwhelms the underlying particularities of each school. The gap between the indicators and the actual qualities of a given school is ignored in favor of the gaps among the various institutions. The dominant quality of each school becomes its place in the hierarchy.

The wider effect of all this is particularly damaging in education: ranking renders a large share of any sector — community colleges, chemistry doctoral programs, business schools — inferior to the top programs, and therefore implicitly defective. The deficiencies that rankings always create then justify unequal respect and, more importantly, unequal funding. Rankings undermine the general provision across institutions that created the famous quality of the US public university system, encouraging instead more investment at the top. The general effect is that the rich get richer, which is precisely what has happened in American higher education in the three decades since the U.S. News rankings first appeared. The rise of rankings didn’t cause the breakdown in public funding, but it has naturalized the inequality that results.

The good news, as these books show, is that numbers don’t need to be used as we use them now. But for real change to take place, the wider society has to become involved in the conversation. These books do an excellent job of helping make that happen.

¤

Christopher Newfield is a professor of English at the University of California, Santa Barbara. He is the author of The Great Mistake: How We Wrecked Public Universities and How We Can Fix Them (2016), among other books.

Heather Steffen is a postdoctoral scholar at the University of California, Santa Barbara. She is also part of a team studying undergraduate workers on American campuses, All Worked Up: A Project about Student Labor.