Cornell Bowers College of Computing and Information Science
Busts of Homer and Apollonius of Rhodes


AI Analysis of Ancient Greek Texts Reveals Tradition of Literary Mimicry

Developing writers commonly imitate the styles of their literary heroes. For Ancient Greek scribes writing two millennia ago, style imitation was not only customary, it was an object of mastery. And, as Cornell researchers have found, the Greeks were masterful at mimicking the styles of their predecessors.

In a computational analysis of Ancient Greek texts, Cornell Computing and Information Science [CIS] researchers have found that Greek prose writers and poets across the centuries were so superb at imitating the styles of the likes of Homer, Plato, Apollonius and Oppian that modern text-mining tools are unable to differentiate Classic authors’ writings from their imitators’. Further, this penchant for imitating the voices of Classic Athens resulted in a more conservative evolution of Greek authorship and style, the analysis shows. 

“People had so much respect for Ancient Greek authors and their works,” said David Mimno, an associate professor of information science and senior author. “Greek scholars were putting in the time to get so familiar with the language of these specific works that they were able to mimic their predecessors’ almost unconscious styles, to the point that we couldn’t measure a difference.”

Grant Storey (CS ’19), lead author, said he assumed the Greek analysis would reveal a few outliers – a handful of Ancient Greek authors whose works were similar to particular influencers, like Homer or Plato. 

“But there weren’t outliers,” said Storey, now a software developer for Affinity. “The entire collection is an outlier – it reveals a cultural tradition across centuries where Greek authors were writing similarly to previous authors.”

The findings are outlined in the pair’s paper, “Like Two Pis in a Pod: Author Similarity Across Time in the Ancient Greek Corpus,” which was published in a recent edition of the Journal of Cultural Analytics. The paper is adapted from Storey’s thesis, which he completed as a graduate student in the Computer Science master’s program.

Both Cornell researchers work in the emerging field of the digital humanities, where computational text-mining tools and the ongoing digitization of classic works open up fresh opportunities for text and language analysis, wherein books become data and digital libraries become huge datasets. 


Today, with the help of computers, digital humanities scholars are mining entire collections, or corpora – everything from Ancient Greek texts and Danish folk tales to Russian novels and British literature – to learn more about the cultures that produced them.

We’re learning more about the authors, too.

Stylometry, the study of writing style, has also become faster and more quantifiable with the help of text-mining tools. Scholars can mine every written work from a particular author and form an “author signature” based on that author’s most commonly used words.

Storey and Mimno broadened the scope of this method of computer-assisted stylometry to better understand how the Ancient Greek written language changed over the centuries.  

First, they mined the Perseus Digital Library’s Greek Collection – more than 450 works of literature and poetry penned by 92 authors between roughly the 8th century BC and the 6th century AD.

Next, Storey and Mimno established author signatures by analyzing each Greek author’s 250 most commonly used words. Then, they began comparing authors and their works across the entire library, pairing together those who shared the most frequently used words.
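
A minimal Python sketch of this signature-and-compare step follows. It is not the researchers’ code: the function names, the toy corpus, and the use of cosine similarity over relative word frequencies are assumptions made for illustration. The article says only that signatures were built from each author’s 250 most commonly used words and that authors were paired by the frequent words they shared.

```python
import itertools
import math
import re
import unicodedata
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its runs of letters as word tokens."""
    # NFC normalization keeps accented Greek letters as single code points,
    # so the letter-matching pattern below does not split words at accents.
    text = unicodedata.normalize("NFC", text).lower()
    return re.findall(r"[^\W\d_]+", text)


def signature(texts, top_n=250):
    """Map an author's top_n most frequent words to their relative frequencies."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.most_common(top_n)}


def cosine_similarity(sig_a, sig_b):
    """Cosine similarity of two signatures (an assumed metric, see lead-in)."""
    shared = set(sig_a) & set(sig_b)
    dot = sum(sig_a[w] * sig_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in sig_a.values()))
    norm_b = math.sqrt(sum(v * v for v in sig_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_pairs(corpus, top_n=250):
    """Rank every pair of authors by signature similarity, highest first.

    corpus maps an author name to a list of that author's texts.
    """
    sigs = {author: signature(texts, top_n) for author, texts in corpus.items()}
    pairs = [
        (cosine_similarity(sigs[a], sigs[b]), a, b)
        for a, b in itertools.combinations(sorted(sigs), 2)
    ]
    return sorted(pairs, reverse=True)


if __name__ == "__main__":
    # Hypothetical toy corpus; real input would be the full works of each author.
    toy_corpus = {
        "Author A": ["the cat and the dog and the bird"],
        "Author B": ["the dog and the cat and the fish"],
        "Author C": ["zebra zebra quagga"],
    }
    for score, a, b in most_similar_pairs(toy_corpus):
        print(f"{a} / {b}: {score:.3f}")
```

Restricting each signature to the most frequent words puts the weight on common function words, which writers tend to use without much conscious thought. That is the “almost unconscious” layer of style Mimno describes.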

To measure how much the Ancient Greek written language evolved over time compared to other languages, the pair mined two other digital libraries: a massive English collection of about 2,800 texts, including Modern English texts from the Gutenberg Dataset and the plays of Shakespeare from the Shakespeare Corpus, and a combination of Icelandic collections with 213 texts. All told, across the three digitized collections, the total number of words involved in this study eclipsed 220 million.

What they found was striking: Ancient Greek authors writing at least four centuries apart were still showing high levels of style similarity, far more than writers in the English and Icelandic libraries, according to their analysis. Apollonius and Homer; Homer and Quintus Smyrnaeus; Plato and Aelius Aristides – these are some of the Greek author pairs that showed the most similarities in style despite the centuries between them.

“Writers tend to have different high-frequency word identifiers,” Mimno said. “With Greek authors, they’re surprisingly close.”

For example, little is definitively known about Apollonius of Rhodes, except that he lived in the 3rd century BC, penned the epic poem Argonautica, and – as Storey and Mimno found – could copy the style of Homeric poetry nearly as well as Homer himself. 

“If you take a passage of Apollonius and analyze it computationally, it’s as similar in style to Homer as any other Homer passage,” Mimno said. “And Apollonius is writing 500 years after Homer.”

Like Homer and Hesiod, another of his Classic muses, Apollonius would go on to inspire several future Greek poets, the findings suggest: Oppian, Oppian of Apamea (both from 2nd century AD), and Quintus Smyrnaeus (4th century AD) all wrote with distinct similarity to Apollonius, according to the analysis. This single lineage of writing-style emulation – from Homer in the 8th century BC up to Quintus Smyrnaeus in the 4th century AD – is just one example among many found throughout the entire Ancient Greek collection, the researchers found.

Not only were Classic scribes writing prose and poetry in a similar style that paid homage to the great writers and orators of Athens, Mimno said, they were doing so with uncanny ability, generation after generation.

“It’s easy to make yourself sound old-fashioned,” he said. “To get it right is another level.”

The research is supported by the National Science Foundation and the Alfred P. Sloan Foundation.

Louis DiPietro is the communications coordinator for Information Science and Statistics and Data Science.