Data. I know there’s a lot of it around. But do I really need it? Especially if I’m a literary scholar of the old-fashioned, ruminative type? Readers like me cling to artful, poetic texts as a refuge: an antidote to information overload, technological distraction, and the hegemony of instrumental reason. Can a database help me understand a book?
Daniel Shore’s new book says it can. On his (largely persuasive) account, even a traditional humanist critic can use new search tools and datasets to become better at interpreting literary forms and their cultural history. When used prosthetically and heuristically — as aids to discovery — these tools help deepen our appreciation of verbal artifacts. So data can serve our purposes even when statistics, bar charts, and scatterplots leave us cold, and when the spirit of our readings remains defiantly antique and analog.
Part of the appeal of Cyberformalism: Histories of Linguistic Forms in the Digital Archive lies in Shore’s care for the “digital humanities” as a human enterprise, not data-driven but merely data-assisted. This approach to big data is not that of the “quantitative formalists” and others who, over the past decade, have disrupted conventional literary studies by importing statistical procedures and models of inquiry from data science proper. Instead, Shore shows humanists how to extend the reach of a technique they already use as a matter of course: rummaging in literary databases for key words and phrases that intrigue them in the books they read.
Pretty much every humanities scholar today relies on the simple or Boolean text search as an aid to research, mining digital archives for loaded terms and expressions and seeking relevant examples of their use. A practical aim of Cyberformalism is to show how, with a few hours’ tinkering, a scholar can learn to use more recent and sophisticated interfaces — designed and built by linguists, taggers, and coders — to search databases for a larger variety of linguistic features. Ranging from the simple to the complex and from the generic to the specific, these features are lumped together by Shore under the name linguistic forms. A linguistic form in this broad sense is any specified sequence of textual elements. Some elements are abstract — parts of speech, grammatical or syntactical constructs; others are concrete — word parts, single words, phrases, complete clauses. A linguistic form can be partly both abstract and concrete.
Consider this lyric by The Who: “Meet the new boss, same as the old boss” — which is not one of Shore’s instances but might have been one. Among the linguistic forms the line illustrates, one is the abstract syntactical form of the predicate: imperative verb–direct object–adjectival phrase modifying direct object. The generality of this form means that it occurs in phrases with widely divergent verbal contents, most of which would not call The Who to mind. (A made-up example: “Drink Lite Lager, now less filling than ever before!”)
More concretely specified — and far more likely to remind us of The Who, wherever it appears — is another linguistic form: Meet the new [X], same as the old [X]. This form is more concrete because its contents are not syntactically but lexically specified. It comprises a series of mostly invariant words, with just one variable term [X]. Adding to this concreteness is the rhythmic repetition [X] … [X]: the simplest kind of rhyme, formally emphasized by the parallel placement of antithetical modifiers (new … old).
Just as Meet the new [X], same as the old [X] is a linguistic form, so is imperative verb–direct object–adjectival phrase modifying direct object. But the form that is remembered and reused is the more concrete one: Meet the new [X], same as the old [X]. In the decades since The Who’s lyric appeared on the 1971 album Who’s Next (where it ends the song “Won’t Get Fooled Again”), this form has become ubiquitous in headlines. It conveniently sums up many an anticlimactic story, as in the eternal lament of a Cleveland sportswriter: “Meet the new Browns, same as the old Browns.” (A decade ago, a blogger lamented the overuse of this form by newspaper writers and editors, listing dozens of examples and pleading, “Make it stop!”)
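For readers curious how a lexically specified form with one variable slot might be put to a machine, here is a minimal sketch in Python. It is my own illustration, not anything taken from Shore's book: the pattern name `FORM` and the helper `find_form` are hypothetical, and the pattern assumes that [X] is one or more plain words repeated exactly.

```python
import re

# Hypothetical pattern for the form "Meet the new [X], same as the old [X]".
# The named group "x" captures the variable term; (?P=x) is a backreference
# that requires the very same term to appear again after "the old".
FORM = re.compile(
    r"\bmeet the new (?P<x>\w+(?: \w+)*?),? same as the old (?P=x)\b",
    re.IGNORECASE,
)


def find_form(text):
    """Return the filler of [X] for every instance of the form in text."""
    return [match.group("x") for match in FORM.finditer(text)]


if __name__ == "__main__":
    print(find_form("Meet the new boss, same as the old boss"))
    print(find_form("Meet the new Browns, same as the old Browns."))
    print(find_form("Meet the new boss, same as the old clerk"))
```

The backreference does the work that the essay attributes to the repetition [X] … [X]: a line whose second slot differs from its first is not an instance of the form, and the search passes over it.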
As Shore points out, many linguistic forms — concrete and abstract — have long been singled out by scholars as markers of literary style. First there were the countless verbal “schemes” that ancient Greek rhetoricians distinguished by name (such as epistrophe or antistrophe, the trick of ending successive lines with a repeated word like “boss”). But beyond those classical “figures of language,” stock phrases can be associated with traditions or entire genres.
For example, Shore considers the phrase “Was it for this?” — a common lament in 18th-century English poetry, and one that played a seminal part in an early draft of William Wordsworth’s The Prelude. For decades, scholars of Romanticism poked through texts of the era, hoping to learn precisely what Wordsworth was echoing or alluding to with this phrase. Using those erratic efforts as a foil, Shore makes Was it for [X]? demonstrate the superiority of database searching as a tool to clarify — and complicate — such questions of allusion and influence. Tracing the form’s origins to English translations of Virgil’s Aeneid, Shore shows that Wordsworth knew of this Virgilian context, yet he shows also that the form’s wider dispersal makes it tendentious to read the phrase as an allusion to the Aeneid. This tension between unitary and pluralistic accounts of how Was it for [X]? entered The Prelude serves, finally, to warn against hasty readings of verbal echoes as clues to single-minded poetic intention. But the exercise also dares literary Luddites to deny that database searching is now essential to philologically informed criticism. Eschewing digital archives, today, is like stargazing without a telescope: good for an aspiring Wordsworth, perhaps, but useless to scholarship.
In the later chapters of Cyberformalism, Shore looks beyond such concrete, lexically specific linguistic forms as Was it for [X]? to consider more abstract ones. This is where the familiar keyword search falls short. With fewer fixed words in the object being searched for, and more grammatically specified elements, one needs a database and an interface like the ones created by linguists for scientific study: one with tags for parts of speech, and for the different modifying functions and relations that an element can have in different contexts. Adding this metadata requires human labor — especially for premodern archives, which pose challenges beyond the capacity of today’s artificial intelligence. Still, considerable resources are available today, and more can be expected to come online in the near future.
Among the grammatically and syntactically defined forms that Shore presents as case studies, two draw their interest from resonant contemporary slogans: “What would Jesus do?” and “Act as if.” Both yield up intellectual-historical riches when Shore looks past the words of each phrase to search for the linguistic forms that attend them. Thus, “What would Jesus do?” implicitly calls for an answer in a form such as: If Jesus [predicate in the subjunctive mood], he would [predicate]. As it turns out, English writers didn’t produce this form with Jesus as the grammatical subject until the end of the 1600s (in the databases available to Shore). Nor, for that matter, did writers in several other European languages. Why? What changed? Shore offers a reading of the emergence of the counterfactual If Jesus … he would … form that is historically sensitive, though admittedly tentative because of the partiality of the evidence. It involves post-Renaissance biblical exegesis, newly attuned to historical anachronism, as well as post-Reformation religious culture, with its increasingly individual relationship to the Bible.
Similarly, the phrase “Act as if” sets Shore on a search for forms similar to: [imperative verb] as if [independent clause with subjunctive predicate]. He finds these forms in moral precepts of Greek and Roman antiquity, but goes on to show how modern writers put them to radically different uses. In a chapter I consider Shore’s tour de force, he follows the form Act as if [X] along a spiraling arc from the Pensées of Pascal, to Kant’s categorical imperative, to the doctrine of self-realization taught by William James and quickly embraced by the United States’s late-capitalist society (with its entrepreneurial need for self-invention grafted onto a 19th-century confidence-man culture).
While the phrase “Act as if” can be found in a simple search, locating its more abstract forms requires Shore to do a deeper dive into tagged databases. The payoff for this extra effort is clear. And yet, as Shore readily admits, his proposed explanations for these intriguing search results are often debatable. In two further chapters — on abstract linguistic forms in Milton and Shakespeare — Shore attempts even more complex searches, thereby opening his explanatory readings more widely to empirical objections. For example, in discussing Milton’s characteristic “depictives,” or adjectives modifying subjects in a preceding subject-predicate construction, Shore astutely relates this “stylistic marker” to Homer. But he doesn’t think to look in George Chapman’s Iliad, where the first random page I checked provided a depictive similar to Milton’s, and similarly reflecting the English poet’s sense of Homer’s Greek. (The phrase is “Austere and terrible”, rendering δεινὸς at Iliad XVII.211 — perhaps a philologically imperfect Homeric reading, but, in Chapman’s English, plainly meant to describe the preceding subject “War-god.”)
Here the technical challenge, as Shore explains, is that algorithms can’t recognize such a complex abstract form reliably enough to produce search results that are sufficiently accurate: they return too many false positives. So I can’t click on a search box and instantly find out whether my hunch about Chapman leads to a story about Milton’s depictives strong enough to compete with Shore’s. And, obviously, no one can comb manually through all the adjectives in English literature through Milton’s time, seeking only those that modify subjects in preceding subject-predicate constructions. Until we have better tagged datasets (or better artificial intelligence), a discussion of Milton’s depictives will remain largely informed by scholarly intuition, grounded in much reading and guided by humanistic judgment.
While reading Cyberformalism, I found myself starting down trails such as this one — prompted by one or another of Shore’s educated guesses to look for alternative examples, finding different explanations. Does Kant’s fondness for [verb] as if really find its deepest rationale in a story about theological belief, as Shore suggests by juxtaposing it with Pascal’s religious application of Act as if? Or does the Kantian as if more profoundly express his Copernican epistemological shift (a natural-science paradigm that is prominent in Kant’s “Preface” to the second edition of his Critique of Pure Reason)? And in the chapter on Shakespeare, is Shore devising the best heuristic to explain “unhouseled, disappointed, unaneled” (Hamlet 1.5) when he searches for “three past participles in succession”? Suggestively, Shore’s search points toward “legal and liturgical contexts” for this form in texts from Shakespeare’s time (“hanged, drawn, and quartered”; “concluded, accorded, and agreed”; “predestined, called, justified”). And yet, not one of Shore’s top 36 results contains the negative prefix un-. Isn’t Hamlet’s threefold un- … dis- … un- a defining formal feature of this line?
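The difference between the two heuristics can be made concrete with a small sketch in Python. Everything in it is an assumption of mine rather than Shore's actual procedure: the tokens are tagged by hand, using the Penn Treebank tag `VBN` for a past participle, and the function names and the crude prefix test are purely illustrative.

```python
# Hand-tagged (word, tag) pairs. A real search would draw the tags from a
# tagged corpus; these are typed in by hand for illustration only.
HAMLET = [("unhouseled", "VBN"), (",", ","), ("disappointed", "VBN"),
          (",", ","), ("unaneled", "VBN")]
LEGAL = [("hanged", "VBN"), (",", ","), ("drawn", "VBN"), (",", ","),
         ("and", "CC"), ("quartered", "VBN")]

SEPARATORS = {",", "CC"}            # tags allowed between the participles
NEGATIVE_PREFIXES = ("un", "dis")   # crude: would also accept "under..."


def participle_triples(tagged):
    """Return every run of three past participles in succession."""
    triples, run = [], []
    # The sentinel token forces the final run to be flushed.
    for word, tag in tagged + [("", "END")]:
        if tag == "VBN":
            run.append(word)
        elif tag not in SEPARATORS:
            # Any other tag ends the run; keep each window of three.
            triples.extend(tuple(run[i:i + 3]) for i in range(len(run) - 2))
            run = []
    return triples


def all_negated(triple):
    """True if every word in the triple starts with a negative prefix."""
    return all(w.lower().startswith(NEGATIVE_PREFIXES) for w in triple)


if __name__ == "__main__":
    for passage in (HAMLET, LEGAL):
        for triple in participle_triples(passage):
            print(triple, all_negated(triple))
```

On these two hand-made samples, the part-of-speech search alone treats Hamlet's line and the legal formula as the same form; only the added prefix test tells them apart, which is the distinction the question about un- … dis- … un- turns on.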
Again, as with Milton, philological intuition suggests another avenue. The second act of Seneca’s Latin revenge tragedy Thyestes starts with a soliloquy by Atreus, beginning with the phrase “Ignave, iners, enervis” (“Idle, inert, impotent,” in John Fitch’s Loeb version). This triple self-accusation, with its triple negation (the ig-, in-, e- are negative prefixes), is followed by a parallel negated passive participle, “inulte” (“unavenged”). If Hamlet’s line advertises its formal affinities in its participles, why not also in its negative prefixes? Without refuting Shore’s account, Thyestes intimates that other stories can also be told about the line’s linguistic form. (Even if Shore had searched his English-language corpora for three successive negatively prefixed words, he’d have missed the Elizabethan translation of the Senecan line: it consists of three nouns without prefixes, “dastard, coward … wretch,” leading to just one negated past participle, “unrevenged!”)
A basic strength of Cyberformalism is that Shore openly welcomes such critical contestation of his claims about particular texts. I think that his general approach can also be questioned in ways that might spark fruitful discussion. What would happen if we made linguistic forms a principal vector in our accounts of literary and intellectual history? Would we risk neglecting continuities of thought that are not expressed in verbal or grammatical similarity? Should we accept that philological studies using linguistic forms will be heavily skewed toward languages and literatures with large corpora and comprehensive metadata? In exploring the literary uses of linguistic forms, must we look first to vernacular discourses of the time, as Shore does for the Hamlet phrase — an approach tending to privilege contemporary social constructs (such as legal and ecclesiastical formulae)? Or are we equally obliged to look across centuries, and even across languages, for formal kinships that may ultimately appear indifferent to content (as in so many of classical rhetoric’s “figures of language”)? And yet if forms aren’t always only or best understood as an index of meaning, what are we really getting a fix on when we find them — are they just containers, distinct from the ideas they contain? But in that case, why not search for the contents themselves? Because computers still can’t do that?
Objections such as these open theoretical cans of worms — yet they also validate Shore’s project, simply by reflecting his deep engagement with ultimate humanistic concerns. Cyberformalism acknowledges that the subjective contemplation and discussion of meanings and values is not just one disposable strand in humanities research; instead, it is paramount to the philologist’s vocation. We should all be debating how data fits into that vocation now, while search-driven literary and cultural study is still in its infancy. Not many years hence, our corpora will be vastly larger and better tagged, and our searches will be far more powerful. The new philology will not be the same as the old philology, and will be pressured on all sides to leave humanistic thinking behind. Many of us will then thank Daniel Shore for having set forth a vision of the digital humanities that puts the human first, the data science second, rather than the other way around.