Keeping Humans in the Loop: On Hilke Schellmann’s “The Algorithm”

By Evan Selinger
May 31, 2024

The Algorithm: How AI Decides Who Gets Hired, Monitored, Promoted, and Fired and Why We Need to Fight Back Now by Hilke Schellmann

EARLIER THIS YEAR, during the winter and spring of 2024, I served on an academic hiring committee for the department of philosophy at my university. We reviewed a daunting stack of approximately 200 job applications. Each one contained a massive amount of material—from carefully crafted statements about research, teaching, and diversity to writing samples, course materials, and letters of recommendation. We knew we were making and breaking lives and careers and that the stakes were extraordinarily high for these candidates.

Our university administration had instituted a dizzying array of rules designed—in theory—to promote fairness through consistency. One rule required committee members to ask every applicant the same preapproved interview questions. For example, during the first round of interviews, we were to follow up a question about the candidate’s research with one about teaching. On paper, this made sense. But conversations, like life itself, don’t always go as predicted. Some candidates naturally segued from research into teaching—they didn’t need prompting. When this happened, we still had to ask the teaching question in its canned form, and we did so sheepishly. Repeatedly, I had to suppress the urge to ask a spontaneous question—something that could liberate us from the formulaic back-and-forth and offer a glimpse into aspects of the candidate that weren’t shielded by the polished facade that develops when you practice well-rehearsed answers to obvious questions.

Perhaps I’d ask a candidate who hasn’t spent much time teaching science and engineering students, “How are you going to get STEM majors who are working part-time, crushed with homework, and trying to develop difficult, cutting-edge technical skills to care enough about your philosophy class to do the assigned reading?” Maybe I’d ask someone who came across as a bit highbrow, “How are you going to connect with students who don’t have any philosophical background and take your upper-level course because it meets a program requirement?” There are so many relevant yet personalized possibilities. But they were all foreclosed, and I couldn’t help but wonder what is lost by rigidly embracing standardization. 

While I was required to merely mimic a robot, job applicants are increasingly required to talk to actual ones. Headlines like “The Interviewer Sounded Like Siri” and “Your Next Job Interview Could Be with a Bot” aren’t metaphorical. A Resume Builder survey found that over 40 percent of companies will adopt AI interviews by the end of this year. The market research firm Gartner estimates that “[b]etween 20 percent and 50 percent of organizations globally are using AI in some part of the hiring process.” Even job-search platforms, like LinkedIn, Monster, Indeed, and ZipRecruiter, “use language-processing AI tools to filter applicants.”

In AI interviews, bots manage questions (posed in writing or through prerecorded video) or ask them, and humans respond verbally. The software records what candidates say, converts their speech into text, and does something remarkable that would have seemed like science fiction not long ago: it assesses the responses, possibly determining whether to eliminate an applicant. The Resume Builder survey found that 15 percent of companies expect to use AI to “make decisions on candidates without any human input.”

Hilke Schellmann, an Emmy-winning journalism professor at New York University, emphatically argues that companies are giving AI too much control. In her new book The Algorithm: How AI Decides Who Gets Hired, Monitored, Promoted, and Fired and Why We Need to Fight Back Now, she walks the reader through the devastating consequences of current trends and explains why there’s no easy path forward. I’ll delve here into Schellmann’s chapters that explore AI’s impact on hiring practices and related workplace issues, setting aside the book’s familiar coverage of workplace surveillance and health data.

A 2023 Pew Research Center survey shows that most Americans (71 percent) strongly oppose the idea of AI making final hiring decisions. Some (41 percent) don’t even want AI reviewing job applications. To gauge your own reaction, watch this short video that quickly walks you through a new AI interview program. Disturbingly, the first thing candidates are forced to do is to choose who, or rather what, will conduct their interview—selecting among avatars designed to represent different genders and races. Many candidates will likely see this choice as a bias test rather than a friendly opportunity for customization. Given some unfair assessments AI systems have made during interviews, this would not be a paranoid assumption. Furthermore, even though the AI doesn’t grade candidates on a pass-or-fail basis, the very fact that companies are making people interview with machines is enough to trigger passionate claims of feeling dehumanized.

Scary as AI-driven changes may be, the fact is that the hiring process does need to evolve. We know that traditional approaches are inadequate; for example, they don’t address bias because they are too unstructured and place too much value on impressionistic feelings. When someone relies on rules of thumb to review hundreds or thousands of résumés by a tight deadline, applicants will almost by definition be judged unfairly. The reviewer may do somewhat better by sorting them according to explicit and numerically rankable criteria that are targeted to match a well-worded job ad. In principle, an AI might be able to follow this protocol better than a human being: it won’t speed-read a résumé because it’s hungry, tired, or distracted, and it may, depending on its programming, exhibit less social prejudice than most humans would. An AI can handle volume too—which, again, is a good thing in principle, since layoffs across prominent sectors have led to ever more applicants. Remote work also expands searches, and so do platforms like LinkedIn. And so does generative AI, because it can be used to rapidly write résumés and cover letters, which means that applicants have little to lose with long shots. A company like Google receives approximately three million applications annually. No team of humans can sort through that many applications, but an AI can do so easily.

And so there is indeed something to be said for using AI. By getting into the nitty-gritty of how prominent AI platforms work, however, Schellmann uncovers its darker side. One example, told with gonzo-journalistic flair, involves her testing the myInterview platform—a system that has processed over 3.4 million interviews. At the time of her writing, job applicants accessed myInterview on a computer, phone, or tablet and responded to prerecorded interview questions; then, an AI analyzed their verbal answers, scrutinizing “the intonation of an applicant’s voice and the words they say.” Needing only “thirty seconds of a candidate speaking,” the AI generates a “match score” that indicates “how good a fit the candidate is for the role” and provides information like a “five-factor personality score” that predicts how “conscientious” and “innovative” the person will be.

Schellmann began her experiment by configuring the technology to screen applications for a hypothetical “office manager/researcher” position. Then, to test the system, she logged on, pretending to be a candidate. The AI judged her answers to be an 83 percent match with the ideal hire. Trying the software again, Schellmann switched approaches and answered the questions for the same job as if she were applying for a position in journalism. Despite emphasizing things like her journalism degree from Columbia University, she didn’t tank; the new match was 82 percent. For a third attempt, Schellmann embraced absurdity, repeating the mantra “I love teamwork.” Shockingly, the score only marginally decreased to a 71 percent match.

Schellmann pushed things even further during yet another attempt, answering the questions by reading something irrelevant (the “Wikipedia entry for psychometrics”) in a foreign language (German). The software transcribed the responses into “gibberish” yet awarded Schellmann a 73 percent match. Baffled by the outcome, she asked a graduate student to read the Wikipedia passage in Chinese. Astonishingly, the score rose to an 80 percent match. For her final experiment, Schellmann used an AI voice generator. The system failed to detect cheating, and the fakery paid off: her score went up to a 79 percent match.

When I visited myInterview’s website after reading Schellmann’s book, I found no indication of whether the company’s AI still analyzes voice pitch and tone. Like many other AI hiring platforms, it only vaguely describes its services. Apparently, myInterview provides just enough titillating information to motivate you to sign up for a product demonstration in which, presumably, they’ll share more information about what their technology actually does. Undeterred, I inquired, using the customer chat feature. Faced with my persistent questioning, a sales representative eventually said that the company doesn’t use voice analysis to infer characteristics like enthusiasm and extroversion. I wouldn’t be surprised if the fallout from Schellmann’s writing prompted this about-face.

MyInterview’s head-scratching issues aren’t unique. Similar concerns have plagued other prominent AI hiring platforms. Take HireVue, a leading AI and human resource management company that has provided services to corporations like Hilton, Delta Air Lines, and Unilever. The company had used AI to analyze aspects of speech such as “variation in tone or pauses.” But after receiving substantial criticism, it eventually stopped this practice in 2021. Indeed, the pressure became so intense that HireVue begrudgingly acknowledged that its “internal research” had determined that voice analysis didn’t offer “much additional predictive value.” Compounding the problem, HireVue only stopped conducting speech input analysis after receiving bad publicity for analyzing faces with computer vision. The PR fiasco was fueled by Schellmann’s skeptical coverage with Jason Bellini in The Wall Street Journal and the Electronic Privacy Information Center’s complaint to the Federal Trade Commission that alleged HireVue provided unfair and deceptive services.

When HireVue’s AI analyzed faces, the company assumed that algorithmically identified facial movements and expressions—such as whether someone is smiling or frowning—could be interpreted as credible evidence of a candidate’s aptitude, emotional intelligence, and fit for a position. For example, the software might assess a candidate’s facial movements to infer “how excited someone seems about a certain work task or how they would behave around angry customers.” But like speech-input analysis, facial characterization lacks scientific credibility.

Lisa Feldman Barrett, an expert on emotion and facial expressions, has been a particularly vocal critic of the endeavor. In a compelling report, legal analysts Luke Stark and Jevan Hutson draw on Barrett’s research to conclude that HireVue’s fundamental mistake derived from promoting “physiognomic AI.” Schellmann agrees.

Physiognomic AI is pseudoscience. It extends a disturbing historical legacy of inferring character and mental qualities from physical features and creating hierarchies based on the composition of an individual’s body. While various ancient cultures promoted the folk belief that character is inscribed on your face, infamous modern figures have tried to give the view a scientific veneer. For example, Stark and Hutson claim that HireVue’s methodology resembles 19th-century German physiologist Franz Josef Gall’s spurious claims about “cranioscopy.” Gall’s theory, later called phrenology, held that cranial bumps denote brain size, mental abilities, and character traits. Popular with eugenicists, such beliefs disproportionately harmed vulnerable and marginalized groups.

Another issue is that algorithmic facial analysis assumes a simplistic and universal correspondence between physical appearance and emotional states. In reality, however, the meaning of facial gestures is often context-dependent. For example, looking at how friendly someone appears in a brief interview might tell you less about their general demeanor or daily work persona than about the game face they can put on for a quick video recording. Furthermore, individual and group variations that the AI might not factor in can significantly impact people’s behavior. Consider eye contact. Introverted, extroverted, neurodivergent, and neurotypical people can all respond to such contact differently, and yet each of those ways might be “professional.” And this says nothing of cultural differences. Finally, no credible scientific theories offer a compelling explanation of how something like personality, which is relatively stable yet still dispositional, manifests physically in consistent, universal facial expressions or movements.

Some insiders are aware of the various issues exposed by Schellmann. A Harvard Business School study reported “that 88 [percent] of executives know their AI tools screen out qualified candidates but continue to use them anyway because they’re cost-effective.” It would seem that far too many companies are willing to sacrifice fairness, and possibly lose out on top talent, to minimize expenses. So, what should be done? Several options can be eliminated—for instance, AI analysis of speech inputs (e.g., vocal tone and pitch) and faces. There are no viable ways to use technology or policy to fix such tools, so they ought to be prohibited outright. But since US governance makes it exceptionally difficult to create national technology bans, we’ve thus far had to settle for mediocre state initiatives, such as those in Illinois and Maryland, that require candidates to consent to have algorithms evaluate them during video interviews. These don’t go nearly far enough. As privacy theorists have long pointed out, people are wary of withholding consent when they lack bargaining power and worry about missing out on meaningful opportunities.

Less egregiously shocking applications of AI in hiring will require some form of intervention too, even as no one solution will be able to address all problems. The major issues include the need for more transparency, the difficulty candidates and employees have legally proving they’ve been mistreated by a company’s AI, the use of scientifically questionable methods (such as personality tests), and the absence of uniform approaches to using AI systems. Again, some companies keep humans in the loop while others don’t. Even when humans review AI outputs, they can still be influenced by automation bias and may be unprepared to critically assess the black-boxed processes that generate scores. The situation becomes even more complicated when considering vendors’ eagerness to show off research that allegedly validates their products. They often provide evidence that looks compelling, such as favorable case studies; however, as Schellmann notes, such cherry-picked research may be more self-serving than credible.

Systematic auditing is one tool that ought to be deployed to move the needle towards fairness, reliability, and legal compliance. As Brenda Leong, Albert Cahn, and I note in “AI Audits: Who, When, How … Or Even If?,” although there are many ways to operationalize audits, we should understand AI audits as a narrowly technical operation. This quantitative assessment tests whether a system “complies or fails to comply with relevant ‘rules or laws’” concerning “fairness or equity for identified categories or demographics.”

Obviously, an AI system should flunk if the audit shows it is making biased decisions, whether that bias occurs in screening résumés or in determining who will be targeted by its job ads. Many AI tools that use games to claim insight into an applicant’s abilities and personality are poorly designed and should likewise be flagged by an audit. The same standards should also apply to AI systems that identify employees for upskilling, claim to detect burnout, predict turnover risk, or influence promotions and terminations. An AI model audit should—ideally—be refined enough to capture unacceptable bias in lower-risk operations, such as analyzing the format and language of job postings, streamlining the acceptance and onboarding process, and analyzing crucial aspects of the hiring process to ensure compliance. (Unfortunately, if we’re being realistic, this goal is exceptionally difficult to mandate, given the time and resources required to achieve it. That’s why most AI governance frameworks and laws identify tiers of risk and don’t place audit requirements around lower-risk situations.)

But even in cases where the law recognizes the need for AI audits, providing adequate and comprehensive ones is incredibly difficult. For example, some audits may not scrutinize training data, while others that companies dub “independent” may be compromised by corporate influence. Additionally, even when audits test for essential biases, such as discrimination against women, they can fail to be sufficiently intersectional and omit possibilities like discrimination against disabled women or Black disabled women. Finally, no matter how well audits are structured, they’re toothless without clear enforcement and penalties.
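To see how a single-axis audit can miss intersectional harm, consider a minimal sketch in Python. Everything in it is an assumption made for illustration: the applicant counts are invented, the function names are hypothetical, and the 0.8 cutoff borrows the “four-fifths” rule of thumb from US employee-selection guidance rather than any threshold that Schellmann or a particular statute prescribes. The sketch computes an impact ratio, meaning each group’s selection rate divided by the highest group’s rate, first by gender, then by disability status, and then by the two combined.

```python
from collections import defaultdict

# Invented screening outcomes, for illustration only:
# (gender, disability status) -> (applicants, candidates advanced)
OUTCOMES = {
    ("man", "nondisabled"): (100, 45),
    ("man", "disabled"): (100, 50),
    ("woman", "nondisabled"): (100, 50),
    ("woman", "disabled"): (100, 30),
}


def impact_ratios(outcomes, key):
    """Return each group's selection rate divided by the highest group's rate.

    `key` maps a (gender, disability) pair to the grouping being audited.
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [applicants, advanced]
    for group, (applicants, advanced) in outcomes.items():
        totals[key(group)][0] += applicants
        totals[key(group)][1] += advanced
    rates = {g: adv / app for g, (app, adv) in totals.items()}
    top = max(rates.values())
    return {g: rate / top for g, rate in rates.items()}


def flagged(ratios, threshold=0.8):
    """List groups whose impact ratio falls below the assumed threshold."""
    return sorted(g for g, r in ratios.items() if r < threshold)


by_gender = impact_ratios(OUTCOMES, key=lambda g: g[0])
by_disability = impact_ratios(OUTCOMES, key=lambda g: g[1])
by_intersection = impact_ratios(OUTCOMES, key=lambda g: g)

print("gender:", flagged(by_gender))
print("disability:", flagged(by_disability))
print("intersection:", flagged(by_intersection))
```

With these made-up numbers, neither the gender check nor the disability check flags anything, yet disabled women advance at only 60 percent of the top group’s rate. A real audit would also have to contend with small sample sizes and statistical significance, which this sketch ignores.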

Schellmann cites law professor Frank Pasquale’s proposal to legally empower the Equal Employment Opportunity Commission (EEOC) to conduct “double or triple-blind tests” that audit all AI hiring tools before vendors make them available to companies. This preemptive regulation, akin to the FDA approval process for new medications, would involve licensing products with the aim of ensuring fairness and reliability.

Pasquale’s suggestion is bold but impractical. It doesn’t fully account for the fact that AI models constantly evolve. Because these models are continually updated and refined based on new data and algorithmic improvements, their performance and biases can change over time. This means that any approval granted by the EEOC would have a limited shelf life. Based on emerging industry best practices, vendors would need to resubmit their tools for evaluation at least every year, and possibly more often when major upgrades to models or data occur. Furthermore, implementing such a comprehensive testing and approval process would require the government to provide a significant and costly workforce, which is highly unlikely given the current state of politics.

So, what’s the best path forward? Unfortunately, there are no easy answers. Policy often lags behind disruptive technology, and unsurprisingly, AI audits, like the one recently mandated in New York City, are still in their early stages. It will be a while before they’re as useful as well-established measures such as, say, financial audits.

Another complication is that moving ahead requires committed collaboration among diverse stakeholders. To create clear, fair, and effective guidelines and standards, policymakers, regulators, vendors, employers, subject-matter experts, civil society organizations, and impacted communities will need to have difficult and ongoing conversations about issues such as what should be audited, what metrics and benchmarks should be used, who should count as an independent auditor, how to meaningfully create and disseminate transparency reports, and what timetables ought to determine the administration of audits.

Schellmann’s The Algorithm is a wake-up call. The stakes could not be greater. When grappling with the question of whether rigid standardization through AI comes at too high a cost, we must carefully weigh the trade-offs. Indeed, as my own experience on the hiring committee illustrates, even non-AI attempts to standardize hiring can become so rigid that they preclude more holistic assessments and humanizing interactions—at least during the early stages of the process. Navigating the future of hiring in an AI-driven world requires figuring out how to balance practical limitations, a drive for fairness, and a commitment to valuing each candidate’s unique qualities and potential. It’s a daunting challenge, but one we must confront to create a deeply human future of work.

LARB Contributor

Evan Selinger (@evanselinger) is a professor of philosophy at Rochester Institute of Technology.
