Does One Size Fit All?

In the 10th essay in the Legacies of Eugenics series, Jay S. Kaufman shows how the science of human body size is suffused with cultural assumptions.


This is the 10th installment in the Legacies of Eugenics series, which features essays by leading thinkers devoted to exploring the history of eugenics and the ways it shapes our present. The series is organized by Osagie K. Obasogie in collaboration with the Los Angeles Review of Books, and supported by the Center for Genetics and Society, the Othering & Belonging Institute, and Berkeley Public Health.


¤


HOW LARGE SHOULD members of our species be? Even before birth, many of us are assessed through ultrasounds to determine if we are the “right” size. The women who gestate us are weighed throughout pregnancy to determine if they are gaining the “right” amount of weight. Once we are born, growth charts track our progress toward full stature. Thereafter, our girth may be scrutinized relentlessly, with the balance scale being the clinical device we encounter—and likely resent—more than any other.


The eugenics movement was premised on the notion of amplifying favorable traits in the population, but this requires knowing which value of a trait is optimal. To this end, the early eugenicists became obsessed with measures of body size and proportion, as exemplified by the elaborate identification system developed by Alphonse Bertillon in the late 19th century to find the physical dimensions associated with criminality and other social pathologies. Countless metrics have been devised since then to track people’s proportions and compare them to supposedly normative ideals, including those still widely used today, such as the “body mass index” and the “waist to hip ratio.” Some of us may even be subjected to the indignity of the skinfold caliper test. But to what end? What are the bases for these devices and measurements? Do they really reveal who has the right to be called “normal” as opposed to “abnormal”? Can they reliably predict who will thrive or not? And the more fundamental question: Do these measures assume that humans are meant by evolutionary design to be of more or less similar size? Or conversely: Could a person’s racial identity dictate where and when “difference” becomes unhealthy “deviance”?


All organisms exhibit a range of healthy variation, but it can indeed be bad to be too big or too small. Early malnutrition, for instance, will reliably truncate life- and health spans. Sadly, almost a quarter of children worldwide fall short of their potential for growth and intellect, crippling their futures. At the other extreme, global obesity has tripled since 1975, generating millions of excess deaths and epidemic rates of diabetes. Every biological parameter, it turns out, comes with pathologies associated with too-muchness and too-littleness. The Goldilocks zone is best. Consider, say, an essential element like iron. It’s bad to have too little (anemia), but it’s also bad to have too much (hemochromatosis). And so it goes for all the essential components that make up living organisms.


¤


Early 19th-century mathematicians such as Carl Friedrich Gauss and Pierre-Simon Laplace documented the occurrence of specific bell-shaped distributions that arose from the interplay of many seemingly random processes. These were given a precise mathematical formulation familiar to us now as the “normal” distribution.


Figure 1. Adult heights (meters)


Human heights come very close to following a “normal” distribution in the mathematical sense, as expected from the myriad interactions of genes and environments—nearly 10,000 genetic variants influence height, and the number of environmental determinants is likely even larger. From the center and spread of this statistical distribution, we know not only the height of the average person but also the relative proportion of people found at every other value.


But this notion of averageness quickly starts to get tricky. In a malnourished population, the distribution is shifted considerably to the left. Figure 1 represents a human population under some kind of ideal environment, but travel to Guatemala or Bolivia and most people are considerably shorter. Might there be some natural variation between groups of people, even under ideal conditions? Perhaps yes, given that, even in rich countries with abundant nutrition, two types of people—men and women—have distinct, albeit largely overlapping, curves.


Figure 2. Adult heights in men and women (meters)


Women are slightly shorter than men, on average, although this is a statement about the groups that does not apply to individuals within those groups—any particular woman is still taller than many men. If we want to know how tall someone is relative to their population, we might compare women to women and men to men, because those subpopulations are centered in slightly different places. Does it follow that there are other “natural” differences like this?


Birthweights are remarkably “normal.” The center of the distribution in rich countries is around 3.6 kilograms (kg), and healthy births at term almost all fall between 2.2 kg and 5.0 kg. Birthweight is the best predictor of infant survival, with lowest risk around 3.6 kg and increasing as birthweights get too big or too small. The range below 2.5 kg is the most perilous, with infants collectively having a roughly 25 times higher risk of death compared to those who weigh more. The two main ways to get into this danger zone are growing too slowly in the womb and exiting it too early.


Intrauterine growth curves were flawed when first published in 1963 because they were based on births at each gestational week, and premature births are not representative of healthy pregnancies at the same point in gestation. With the advent of fetal ultrasounds, recommendations could finally be made about normative expectations for growth in utero. Sex-specific charts were later proposed because newborn girls have slightly more favorable health outcomes than boys despite being 100 grams (g) smaller on average. Nonetheless, in many validation studies, unisex charts turn out to be just as good as sex-specific charts for identifying pregnancies in need of intervention.


Of course, we know that, in the real world, there are profound group differences in birthweight, because there are profound differences in the environments in which women gestate their babies. Factors like poverty, nutrition, and behaviors (such as smoking) matter. Fetal growth charts are not meant to be descriptive, however, but prescriptive: they don’t aim to represent the weights of developing babies as they really occur, but rather as they “should” occur under “ideal” conditions. The prescription amounts to this: avoid being in the tail of the distribution where the bad outcomes are concentrated.


This implies that a child’s specific birthweight matters less than how far they are from the center of their distribution. But here’s the rub: if the average birthweight in India is 2.7 kg, then being below 2.5 kg might not be the red flag it would be in Norway or Canada. With this conundrum in mind, obstetrician Jason Gardosi proposed in 1995 that fetal growth recommendations should be tailored to national origin and other maternal characteristics. His 2009 model for optimal weight-by-gestational-age tracking in the United States included the mother’s height and weight, number of previous pregnancies, sex of the baby, and race of the mother. Yet it turned out that these variables collectively explained only about one-quarter of the total variation in birthweight.
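

To make "explained only about one-quarter of the total variation" concrete, here is a schematic sketch (not Gardosi's model or data; every number below is invented for illustration) of regressing birthweight on a few maternal characteristics and reading off the share of variation they account for:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5_000
height = rng.normal(1.63, 0.07, n)   # maternal height, meters (invented)
weight = rng.normal(65, 12, n)       # maternal weight, kg (invented)
parity = rng.integers(0, 4, n)       # number of previous pregnancies
male = rng.integers(0, 2, n)         # sex of the baby (1 = male)

# Simulated birthweights: a few modest signals plus a large unexplained part.
birthweight = (900 * height + 8 * weight + 100 * parity + 120 * male
               + rng.normal(1400, 300, n))  # grams

X = np.column_stack([np.ones(n), height, weight, parity, male])
beta, *_ = np.linalg.lstsq(X, birthweight, rcond=None)
residuals = birthweight - X @ beta
r_squared = 1 - residuals.var() / birthweight.var()
print(round(r_squared, 2))  # roughly 0.25 with these invented settings
```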


Why was race included? Gardosi deployed a statistical model from a previously collected dataset, and its variables happened to include race and ethnicity, as is indeed typical in American datasets. He retained a subset of variables that were found to be most predictive, including African American race and Hispanic ethnicity. In multiethnic societies like the United States, racial and ethnic groups do indeed have different birthweight distributions, as well as differences in how low birthweight impacts infant mortality. African American infants, for example, have almost double the risk of low birthweight, and more than double the risk of infant mortality, compared to white infants. With a birthweight distribution shifted to the left by close to 200 g, should African American infants be assessed based on their own race-specific curve, or by a curve for all American or even global births?


This decision hinges on a nature-versus-nurture argument: would a truly “healthy” environment result in all babies being on the same curve, or are populations different in some intrinsic way? We can’t ever know the answer to this question, because no American infant has ever been born in an environment divested of social distinctions based on skin color. There are clues, however. A 1997 paper shows that the birthweights of infants born to African immigrants to the United States followed the white distribution, not the African American distribution. This immigrant advantage dissipates in the next generation with assimilation to the distinctive American social environment.


It is no coincidence that adult heights and birthweights are both “normally” distributed, taking on the exact same mathematically formulated bell shape. Indeed, any process that sums or averages many small, independent influences will tend toward this same distribution, a pillar of probability theory known as the “central limit theorem.” The eugenicist polymath Sir Francis Galton described this “wonderful form of cosmic order” in mystical and reverent terms: “Whenever a large sample of chaotic elements are taken in hand and marshalled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along.” In the case of birthweight, those chaotic elements are a cacophony of small influences—some nudging the weight up and others down—which suggests that stratifying according to a few chosen variables might be futile. Of the thousands of influences on birthweight, we clearly know only a few. Do we gain much by trying to stratify? At least in favorable environments, the answer seems to be no. Epidemiologists have demonstrated that, in practice, the customized predictions are no more useful for clinical diagnosis of growth retardation than a simple global average.
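

A toy simulation makes Galton's observation tangible. The sketch below is illustrative only; every number is invented rather than drawn from any dataset. Each simulated birthweight is a baseline plus the sum of many small, independent nudges, and the familiar bell shape emerges on its own.

```python
import random

random.seed(1)

def simulated_weight(n_factors: int = 500) -> float:
    # Each factor nudges the weight up or down by at most 25 grams;
    # their sum is what the central limit theorem acts on.
    nudges = sum(random.uniform(-25, 25) for _ in range(n_factors))
    return 3600 + nudges  # grams, centered on the essay's 3.6 kg average

weights = [simulated_weight() for _ in range(10_000)]
mean = sum(weights) / len(weights)
sd = (sum((w - mean) ** 2 for w in weights) / len(weights)) ** 0.5
print(round(mean), round(sd))  # a histogram of `weights` is bell-shaped
```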


This is something of a paradox. How could factors that were significantly predictive in Gardosi’s statistical model be uninformative in clinical practice? The answer is subtle, rooted in the observation made above about female-versus-male heights: the “significant” difference is a characteristic of the group, not of the individual. But clinicians need to make decisions about the singular patient in front of them, not about the average of a large group of patients.


The average heights for men and women shown above were 1.75 meters (m) and 1.65 m, but very few men are exactly 1.75 m and very few women exactly 1.65 m. The properties of the “normal” distribution allow us to infer that 95 percent of all men fall roughly between 1.55 m and 1.95 m, and likewise 95 percent of women between 1.45 m and 1.85 m. A woman of exactly average height is therefore taller than a great many men, and so if you select a random person of each sex, you have very little basis for guessing which one is taller. Whenever the spread of values around the average is large with respect to the difference between the averages, the group indicator will be a poor predictor for any individual. Race appears to be a significant predictor in statistical models, but its use is almost always invalidated by this simple relationship: the distributions are so overlapping that race is largely uninformative for individual prediction. Why then does the medical use of race persist?
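

The overlap can be made explicit in a few lines of Python. This is a sketch only, built on the means quoted in the paragraph above and an assumed standard deviation of 0.10 meters, which is what the "95 percent within 0.20 meters of the average" ranges imply if the distributions are normal.

```python
from statistics import NormalDist

# Means from the paragraph above; the 0.10 m standard deviation is an
# assumption chosen to reproduce the stated 95 percent ranges.
men = NormalDist(mu=1.75, sigma=0.10)
women = NormalDist(mu=1.65, sigma=0.10)

# Share of men shorter than a woman of exactly average height (1.65 m).
print(round(men.cdf(1.65), 2))  # ~0.16

# Probability that a randomly chosen woman is taller than a randomly
# chosen man: the difference of two independent normals is itself normal.
diff = NormalDist(mu=women.mean - men.mean,
                  sigma=(women.stdev ** 2 + men.stdev ** 2) ** 0.5)
print(round(1 - diff.cdf(0), 2))  # ~0.24
```

Under these assumptions, roughly one random pairing in four has the woman taller than the man, which is why the group average is such a weak guide to any individual comparison.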


Americans are accustomed to regarding race as a primary axis of identity. Consider a typical description of a medical patient or criminal suspect as, for example, “a 29-year-old Black male,” as though age, race, and sex are the three factors that tell you all you need to know about a person. This reflexive reliance on race, baked into American society from the Three-fifths Compromise to contemporary racial gerrymandering of congressional districts, is not, however, ubiquitous. Canada, for instance, does not routinely use it in either its medical or its economic statistics, and in France, the government is strictly forbidden from asking about race in any context.


It is perhaps not surprising, then, that international researchers took a decidedly different approach to the question of customized fetal growth charts. A group of collaborators based at Oxford University undertook a project in 2009 that they named INTERGROWTH-21st. Its aim was to generate globally prescriptive growth standards. To this end, they recruited only the healthiest pregnancies from a wide range of ancestries, on the expectation that babies of all races have equal capacity for optimal health and size.


To confirm that all fetal and newborn measurements could reasonably be pooled across sites, they compared each site’s average with the average of all sites combined, verifying empirically that these differences were smaller than a prespecified fraction of the width of the normal distribution. The math worked, which then enabled them in 2014 to declare a single global reference. Babies, their analysis showed, are meant to grow at similar rates and are born with weights that follow the same bell-shaped curve. Across all the various measurements, the difference attributable to the study site was never more than about three percent of the total variability, demonstrating that almost all natural variability in the size of babies exists within every population. This result implies that, although there are smaller and larger babies, no races of people are systematically smaller or larger at birth, once social and environmental disparities are equalized. Of course, in the real world, we know very well that populations with systematically lower birthweights exist, but the aim of INTERGROWTH-21st was to demonstrate that their existence is a consequence of social inequality and not of biology.
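

For readers who want to see the shape of that pooling check, here is a hypothetical sketch; the site names and numbers are invented and are not the INTERGROWTH-21st data. Each site's mean is compared with the pooled mean and expressed as a fraction of the pooled spread.

```python
import numpy as np

rng = np.random.default_rng(0)
# Four hypothetical study sites drawing from the same underlying
# distribution of birthweights (kg); the numbers are invented.
site_weights = {site: rng.normal(loc=3.3, scale=0.45, size=2_000)
                for site in ["Site A", "Site B", "Site C", "Site D"]}

pooled = np.concatenate(list(site_weights.values()))
pooled_mean, pooled_sd = pooled.mean(), pooled.std()

for site, weights in site_weights.items():
    # Each site's gap from the pooled mean, as a fraction of the pooled
    # standard deviation; pooling is judged acceptable when every gap
    # stays below a prespecified bound (a few percent).
    gap = abs(weights.mean() - pooled_mean) / pooled_sd
    print(site, round(gap, 3))
```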


While the international investigators of INTERGROWTH-21st were having their universalist epiphany, however, the Americans were still seeing things quite differently. The National Institute of Child Health and Human Development (NICHD), a branch of the National Institutes of Health (NIH), published new standards for fetal growth in 2015, with separate standards for white, Black, Hispanic, and Asian or Pacific Islander pregnancies. It recruited from apparently healthy pregnancies but without the socioeconomic standardization used by the INTERGROWTH-21st study, and it reported significant differences in the growth measures across the four racial/ethnic groups. The NIH authors buttressed their racial stratification by noting that if the white standard were applied to other groups, the proportion of growth-restricted babies would be overestimated. For example, after 18 weeks of gestation, over 10 percent of Black fetuses would be determined to be in the lowest five percent if undifferentiated standards were applied.


The disagreement between the United States’ NIH model and the INTERGROWTH-21st model does not rest on whether the non-white US babies are smaller than white ones but on whether they were supposed to be smaller.


Recall that the chart is meant to be prescriptive, not descriptive, and that the American researchers did not recruit only from the healthiest environments but included those compromised by lower rates of education and income, and by higher rates of stress, violence, or discrimination. Indeed, the racial and ethnic groups in the NIH study had dramatically different socioeconomic profiles. Low income, for example, characterized 10 percent of the white mothers but 68 percent of the Black mothers. Under such disparate social environments, is it reasonable to define non-white infants as inherently smaller and less worthy of clinical intervention?


Tolstoy’s Anna Karenina principle—all happy families are alike, but each unhappy one is unhappy in its own way—applies remarkably well to fetal growth and birthweight: an uncountable number of steps must happen perfectly to enable successful gestation and growth, and the process can be derailed in uncountable ways. The subtle interplay of myriad tiny variations allows for the normal curve that so dazzled Francis Galton—and for the fact that a broad indicator of social advantage like maternal race can be somewhat informative about a population-wide fetal growth trajectory, serving as a simple proxy for all the factors that correlate with race in the United States, from neighborhood and diet to occupation and education, all of these tracking across multiple generations. One could try to measure and account for each factor, and indeed some scholars have defended the NIH race stratification by noting that adjustment for various socioeconomic indicators does not completely erase the observed differences. But these adjustments are necessarily crude and incomplete.


The decision to stratify the NIH fetal growth recommendations by race and ethnicity is arbitrary in the sense that other variables can account for similar observed differences—like, say, maternal smoking, or location. For example, in 2023, 12.5 percent of newborns in Mississippi weighed less than 2.5 kg, compared to 6.8 percent of births in New Hampshire. But race, of course, has a special place in the hearts of Americans, and thus an enduring presence in American social and scientific policies.


The downsides of establishing race-specific standards should now be obvious. First, they reify socially defined categories, falsely implying that they are “natural.” This encourages a steady stream of junk science purporting to find genetic explanations for the observed inequalities. Racialized growth standards also present a problem for groups that are not represented by the four categories, such as Native Americans. Indeed, obstetricians from New Mexico reported in 2018 that they tried to apply the new NICHD standards to Native American mothers and found the predictions inferior to those from the nonracialized standards established in 1991.


Racialized guidelines also create a dilemma when the government decides to change its racial classification scheme, as it did, for example, in 2024 when Middle Eastern and North African people were split off from whites to form their own unique race. This left eight million Americans without a fetal growth chart despite having had one previously. And once you commit to race-specific fetal growth standards, where do you stop? It was recently proposed that Chinese, Asian Indian, Japanese, Korean, Filipino, and Vietnamese Americans should all get their own growth charts too, rather than being lumped together into the impossibly broad Asian or Pacific Islander category.


Racialized growth recommendations present further challenges for women who might not identify with only a single group. The US Census Bureau changed its rules in 2000 to allow respondents to check more than one race box on the decennial census, but the NICHD fetal growth guidelines had no such solution for people with complex identities. In response to these and other criticisms, the NICHD finally, in 2022, added an alternative nonracialized growth chart, which they referred to as a “unified standard.”


¤


Just as the science of birthweight is suffused with cultural detritus, so too is the science of adult weight. Too much weight, defined as obesity, is considered to be a public health problem in need of prescriptive standards. Expensive and complicated techniques—such as weighing people underwater (yes, seriously!)—have been developed to tell us how much of a given human body is made up of fat. It would be impractical to make these precise measures in routine clinical practice, however, and so rough proxies are used.


The most common proxy is an index devised two centuries ago by the Belgian mathematician Adolphe Quetelet when he sought to describe the ideal specimen—the “average man”—a concept that would later serve as the ideological kindling for the eugenics movement. To derive a consistent weight-for-height index, Quetelet knew that the volume of a body would increase more than linearly with an increase in height, and so the height term in the denominator would have to be raised to a higher power. Quetelet had originally proposed 2.5 for this exponent on height, but the American epidemiologist Ancel Keys, reintroducing this index in the 1970s, thought that this would be too complicated for doctors to calculate, so he rounded the exponent to the integer 2. We now refer to this measure as the “Body Mass Index” (BMI), measured as kilograms divided by meters squared (㎡). For example, I am 177 cm and 78 kg, so my BMI is about 25 kg/㎡.
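

As a quick illustration of the formula, here is a sketch using the author's own numbers from the paragraph above; the optional exponent argument is included only to show what the steeper 2.5 scaling would have done, not as a clinical formula.

```python
def quetelet_index(weight_kg: float, height_m: float,
                   exponent: float = 2.0) -> float:
    # exponent=2 gives the familiar BMI; 2.5 shows the steeper scaling
    # mentioned above (for comparison only).
    return weight_kg / height_m ** exponent

# The author's numbers from the paragraph above: 177 cm, 78 kg.
print(round(quetelet_index(78, 1.77), 1))       # ~24.9 kg/m^2
print(round(quetelet_index(78, 1.77, 2.5), 1))  # ~18.7 with the 2.5 exponent
```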


It’s easy to measure heights and weights, calculate BMIs, and describe these values. It’s much harder to be prescriptive and tell people what BMI they should have. People tend to lose weight when they get sick, which makes higher mortality at lower BMI values very hard to interpret—is someone sick because they are too thin or are they too thin because they are sick? Moreover, smoking suppresses appetite and therefore lowers weight, but it also makes people die sooner from cancer and heart disease. Despite thousands of papers on the topic, we still have very little scientific agreement on the risk associated with being at one BMI value versus another.


In the late 1990s, the US government settled on a simple prescriptive schema, with BMIs from 18.5 to less than 25 kg/㎡ decreed as “normal,” from 25 to less than 30 kg/㎡ defined as “overweight,” and 30 kg/㎡ or higher as “obese.” Why these cutoffs of 25 and 30? Seemingly for no other reason than that they are round numbers. According to this scheme, I am overweight, but so are Matt Damon, Tom Hanks, Eddie Murphy, and just about every other guy I know. In fact, research from the US Centers for Disease Control and Prevention confirms that “overweight” people have the lowest mortality rate, making it hard to understand why we are not “normal.”
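

Stated as code, the entire schema fits in a few lines. This is a minimal sketch of the cut points just described; the "underweight" label below 18.5 kg/㎡ is implied by the lower bound of the "normal" range.

```python
def us_bmi_category(bmi: float) -> str:
    # The late-1990s US schema described above.
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"

print(us_bmi_category(24.9))  # "normal"
print(us_bmi_category(25.0))  # "overweight" -- a tenth of a point away
```

A tenth of a point of BMI is enough to move someone across the line, which is the essay's point about round-number thresholds.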


Just as ideal infant weights have been interpreted through a racial lens, so have adult weights. The United States and many international agencies such as the World Health Organization recommend the same 25 and 30 kg/㎡ cutoff points for defining men and women as overweight and obese. Plenty of researchers and agencies, however, think these prescriptive cut points should be tailored to racial and ethnic variations in the rates of metabolic diseases such as diabetes. Based on an analysis of over one million participants in the British National Health Service, for example, physician Rishi Caleyachetty reported that, in order to flag the same level of diabetes risk faced by white Britons with BMIs of 30 kg/㎡, the obesity cut point for British South Asians should be set to 23.9 kg/㎡, and to 28.1 kg/㎡ for Black Britons.


If you’ve ever seen someone with a BMI of 23.9 kg/㎡, you’d agree that it’s an abuse of the English language to call that person “obese,” regardless of their diabetes risk. For example, the American actor Leonardo DiCaprio is about 183 cm tall and weighs 82 kg, giving him a BMI of about 24 kg/㎡. That would be enough to qualify him as “obese” if he were South Asian but doesn’t get him across the “overweight” threshold as a white person.


Screening high-risk populations at different cut points is, however, not unreasonable. The American Cancer Society recommends that most men begin prostate cancer screening at age 50, but recommends age 45 for Black men. This is a straightforward reflection of their higher burden of disease. Using BMI as a screen for diabetes could in theory likewise motivate a lower cut point for South Asians in the United Kingdom, who do indeed have higher average risk. But the observed variation in this outcome is not primarily racial; Caribbean-born Black individuals have a much higher rate than those born in Africa, and those born in the UK have almost exactly the same risk as whites. Similarly wide variation occurs in South Asians; Bangladeshis have higher risk than Pakistanis, who have higher risk than Indians. Variation among genetically similar populations points strongly to social etiologies, for which ethnicity is serving as a proxy, and which therefore will not be stable over time or generalizable across social contexts.


Addressing the strangeness of apparently thin people being designated as “obese,” the Endocrine Society of India defined a new pathology in South Asians, which they called “normal weight obesity.” Other authors refer to this as “thin-fat obesity.” But South Asians with a BMI of 24 kg/㎡ are only “obese” in the sense that they belong to a population with a high prevalence of a disease for which one of the risk factors is obesity but which is also affected by diet, physical activity, pollution, and countless other social, behavioral, and environmental factors, many of which are not yet known.


Think of it this way: cigarette smoking is the most important cause of lung cancer, but many cases occur in nonsmokers due to exposure to radon. Consider a population with low levels of smoking but high risk of lung cancer due to radon exposure. Should I declare that these people suffer from “smokeless smoking”? That is the logic of the “normal weight obesity” term applied to South Asians.


These race-specific obesity guidelines have caught on, albeit inconsistently. Britain’s National Institute for Health and Care Excellence (NICE) defines policy for its National Health Service. It promulgates the 25 and 30 kg/㎡ overweight and obesity cut points for the general population, while adding that people with Asian, Middle Eastern, Black African, or African Caribbean family background are declared overweight at 23 kg/㎡ and obese at 27.5 kg/㎡. Many Asian nations have likewise lowered their obesity definitions. Japan, for example, considers anyone above 25 kg/㎡ to be obese, as do South Korea and India. So it may come as an interesting surprise that one place where racialized obesity definitions have been resisted is the United States.
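

To see how the same body lands in different categories, here is a minimal sketch of the NICE cut points described above; the function name and boolean flag are this sketch's simplification, not NICE's terminology.

```python
def uk_weight_category(bmi: float, higher_risk_background: bool) -> str:
    # NICE's general-population cut points are 25 and 30 kg/m^2; for the
    # listed family backgrounds they drop to 23 and 27.5 kg/m^2.
    overweight, obese = (23.0, 27.5) if higher_risk_background else (25.0, 30.0)
    if bmi >= obese:
        return "obese"
    if bmi >= overweight:
        return "overweight"
    return "not overweight"

print(uk_weight_category(24.0, higher_risk_background=True))   # "overweight"
print(uk_weight_category(24.0, higher_risk_background=False))  # "not overweight"
```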


An explanation may be that the literature on the subject has thus far focused on South Asians in the UK and is informed by the British colonial experience. South Asians in the United States are, by and large, not the socioeconomically disadvantaged minority that they are in the United Kingdom. Indeed, in the US, a much higher risk of diabetes occurs in Asian subgroups with the lowest economic profiles, most notably Filipinos, but risk is still far greater among rural African Americans and Native Americans.


¤


It’s ironic that the United Kingdom so adamantly racializes its obesity guidelines, while at the same time summarily rejecting any racialization of fetal growth curves, and the United States does exactly the opposite. Both countries cite evidence to support their contradictory views on when to lump and when to split. They apparently view the evidence—and the social context—differently. It’s notable that the racialized fetal growth curves established by the US government were designed to reduce the labeling of ethnic minority babies as pathological. The 2015 NIH paper warned that if race-specific curves were not adopted, more than 10 percent of all Asian, Hispanic, and Black babies would, by the 25th week of pregnancy, fall in the lowest five percent of weights, which is considered a trigger for intervention. The authors therefore argued that racial minority pregnancies need their own (lower) standards to avoid triggering this clinical red flag. The racialized obesity cut points in the UK have the exact opposite implication, increasing the proportion of cases defined as pathological in high-burden populations, and therefore motivating more, rather than less, clinical intervention.


What both examples have in common, however, is their reflexive decision to divide the population by race and ethnicity rather than by other potential risk stratifications—such as, for instance, the socioeconomic conditions that so obviously drive both fetal growth retardation and excess adiposity in adults. Disparities by educational attainment are at least as large as they are for race and ethnicity, but no professional body has ever promoted guidelines for healthy weight stratified by years of completed schooling. Locating pathology in excess deviation from the normal distribution is eminently reasonable. Infant mortality does indeed peak in the tails of the weight distribution, as does the mortality rate for adults. But the act of parsing by race and ethnicity, and thus reorienting to conditional deviations, complicates medicine with questionable social ideologies.


Both examples apply racial stratification to the question of optimizing health, and both arguably do an injustice to racial and ethnic minority populations but in opposite ways. In the fetal growth curve example, the fact that it’s harmful to be too small is accepted as biologically factual, but the race-specific curves are in effect allowing certain populations to be too small without it being declared abnormal. Their disadvantage is instead being represented as intrinsic, so that their historical marginalization is redefined as “normal.” Race-specific fetal growth curves are, then, a kind of statistical adjustment, removing from view the baseline inequality baked in at the group level. Each pregnancy is evaluated around its race-specific expectation. The normative expectation for minority children is lower by decree. This amounts to declaring that the apple doesn’t fall far from the tree, with these babies coming from their own distinct trees, planted in more rocky soil, so that it is “normal” for the fruit of these trees to be diminished.


The race-specific obesity cut points convey a completely different message. More stringent criteria are applied to social groups with higher average disease risk, which are then used to justify more onerous restrictions on individuals from those groups. Whatever sociohistorical process leads to their group having higher diabetes risk, it is left to the individual to compensate for that legacy with greater discipline and asceticism. The average BMI for a UK man is around 28 kg/㎡, so in effect Asian Britons are asked to push themselves further into the left tail of the distribution in order to become more abnormal. The center of the distribution becomes a kind of privileged (white) space that some groups are denied because of racial frailty, which they must overcome through discipline and vigilance. We know perfectly well that diabetes risk is multifactorial, involving diet quality, air pollution, and physical activity, factors that are, for systemic reasons, less salubrious in minority neighborhoods. As usual, immigrant groups must work harder to overcome their structural disadvantages, and the lower BMI targets for these groups represent that individualized risk compensation.


Despite these opposing implications, racialized fetal growth curves and obesity cut points have something else in common—namely, little to no evidence suggesting that either one is more effective than nonracialized alternatives. Head-to-head comparisons of INTERGROWTH-21st with NICHD fetal growth charts show that both are equally poor predictors of underlying pathology, which renders the extra race adjustment meaningless at the individual level. For obesity cut points, no evidence thus far suggests that any categorization has a population impact, whether racialized or not.


Where they fail as medical technologies, however, they may succeed as social messaging. The word “normal” for the bell-shaped distribution is no accident, and people naturally wonder whether they are “normal” or outliers. We have evolved to conform to normative expectations, but the two examples of public health guidance that I’ve reviewed here promote the powerful ideological message that being normal or abnormal ultimately depends on one’s race. That premise should be regarded as the truly malignant pathology at play here.


¤


Featured image: Mold of the Torso of a Male with a Distended Stomach, ca. 500–400 BC. The J. Paul Getty Museum (76.AD.112). CC0, getty.edu. Accessed September 24, 2025.

LARB Contributor

Jay S. Kaufman is a professor of epidemiology, biostatistics, and occupational health at McGill University. His forthcoming book, The Race Variable: How Statistical Practices Reinforce Inequality, will be published by Columbia University Press in December 2025.
