By Munyao Kilolo, Ph.D. student in comparative literature
James H. Brusuelas, Associate Professor of Classics at the University of Kentucky, completed his Ph.D. in classics in 2008, a period he considers the worst time possible for the job market. As the housing bubble burst, jobs, including those in the academy, disappeared. A California native, Brusuelas had hoped to pursue local employment opportunities upon finishing his degree at UCI.
In the midst of hiring freezes, he accepted a research job working for the online Thesaurus Linguae Graecae® (TLG®) project. Since its founding at UCI in 1972, the project has collected and digitized most literary texts written in Greek from Homer to the fall of Byzantium in AD 1453. It represents the first effort in the humanities to produce a large digital corpus of literary texts. Brusuelas explains, “TLG is one of the very first digital humanities projects within classics, and has been essential for classics scholars because it eases data mining essential for research.” Today the online TLG contains over 110 million words from over 10,000 works. The work at TLG had an immense impact on Brusuelas’ career path.
While immersed in the work at TLG, Brusuelas continued to apply for academic teaching jobs. “Lo and behold! The only academic job offer I got was from Oxford University in the UK. It seemed to have come out of nowhere. The job was posted in April of 2010, with a start date in September. I needed to decide almost immediately. And so I moved to Oxford!”
Digital humanities at work on the ancient world
The job turned out to be everything Brusuelas hoped for. Oxford needed someone with a Ph.D. in Greek who also had a working knowledge of papyrology in order to work with Greek papyrus texts. “I was not only trained through my Ph.D. program on papyrology, but the project at Oxford also needed someone with some experience with digital humanities. And this is where my two years at TLG paid off. I could now comfortably participate in a crowdsourcing project for transcribing a massive papyrus collection housed at Oxford. This basically gave instant access to anybody who wanted to volunteer to transcribe these texts.”
Brusuelas and his team created a large database consisting primarily of Greek character classifications and x/y location coordinates associated with hundreds of thousands of images of Greek papyrus fragments, which Oxford obtained at the very end of the nineteenth century and in the early twentieth century. From around 400,000 images, the team gathered 10 million unique annotations.
Following this success, the team grappled with how to elicit transcriptions of the texts, and turned to computationally identifying the texts to expedite the transcription process. “Between 2011 and 2014, the team and volunteers did millions of annotations. Even though a large number of individuals were making enormous contributions, there were many overlaps or identifications of the same source texts (especially around Homer’s works). It became imperative that we automate this process.”
DNA sequencing, but for language
Brusuelas acknowledges the unique challenges in reading ancient Greek papyri, including the lack of word division and the need to break apart strings of Greek characters that are difficult to decipher, even for trained papyrologists. Even with the use of AI, it was challenging to apply algorithmic processes to this transcription data.
The solution came in the form of Alex Williams, a computer science master’s degree student at Middle Tennessee State University who had been one of Brusuelas’ research assistants at Oxford. As part of a master’s thesis, Williams adapted BLAST, an algorithm used in biology to compare DNA sequences, in order to apply it to Greek text. Williams insisted that there wasn’t a need for a 100% match when making automated identifications, and so “we adopted the concept of DNA sequencing matches, simply looking at similarity at the level of DNA. We basically swapped out the nucleotide sequences and plugged in Greek characters.”
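The idea of matching Greek characters the way sequence tools match nucleotides can be illustrated with a minimal sketch. This is not Williams's adaptation and not BLAST itself, which adds scoring matrices, gapped extension and match statistics; it only shows the general seed-and-compare pattern under simplifying assumptions. The function names, the parameters `k` and `min_identity`, and the sample strings (the opening of the Iliad written without word breaks or accents, plus a fragment with one misread letter) are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-character substring of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def find_similar(query: str, reference: str, k: int = 3,
                 min_identity: float = 0.8) -> list[tuple[int, float]]:
    """Return (start, identity) pairs where the query approximately matches.

    Step 1 (seed): find short exact k-character matches shared by both strings.
    Step 2 (compare): for each seeded offset, line the whole query up against
    the reference and count matching characters, tolerating mismatches.
    """
    index = build_kmer_index(reference, k)

    # Each seed implies an offset at which the query would sit in the reference.
    offsets = set()
    for q in range(len(query) - k + 1):
        for r in index.get(query[q:q + k], ()):
            offsets.add(r - q)

    hits = []
    for start in sorted(offsets):
        # Skip offsets where the query would hang off either end.
        if start < 0 or start + len(query) > len(reference):
            continue
        window = reference[start:start + len(query)]
        same = sum(a == b for a, b in zip(query, window))
        identity = same / len(query)
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


# Hypothetical sample data: continuous script, no accents or word division.
reference = "μηνιναειδεθεαπηληιαδεωαχιληος"
query = "αειδεθεαπυληιαδεω"  # one letter differs from the reference

hits = find_similar(query, reference)
for start, identity in hits:
    print(f"candidate match at position {start}, identity {identity:.0%}")
```

An exact substring search would reject this fragment because of the single differing letter, while the similarity threshold still identifies the passage. That tolerance is the point of not requiring a 100% match.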
While the team had some fascinating outcomes, they were still limited in terms of outputs. The big AI breakthroughs were yet to come. Fast forward to 2015, when computer scientists were training convolutional neural networks. Witnessing the emergence of powerful new tools like this, Brusuelas and his team returned to their original Ancient Lives data, with major success.
After almost a decade on the project, Brusuelas, Williams and another colleague, John Wallen from Middle Tennessee State University, started training neural networks that would eventually become AI-assisted papyrology. There have been major triumphs in their work, including a model that reaches 94% classification accuracy and the first AI model that looks at an image of a papyrus and automatically generates a complete transcription. However, there are also some serious concerns around deepfakes, which the team works to address through metadata documentation.
For Brusuelas, the biggest breakthrough came last year through his work with Brent Seales at the University of Kentucky on the digital restoration of the Herculaneum papyri. With the Vesuvius Challenge, a machine learning and computer vision competition that awarded over $1,000,000 in prizes, the riddle of the carbonized Herculaneum papyri was solved. Using virtual unrolling and AI to render the Greek text visible, they have revealed 16 columns of Greek from inside a papyrus scroll that cannot be physically opened.
Looking back on his post-Ph.D. journey, Brusuelas acknowledges that the path to his current faculty position at the University of Kentucky was one that he couldn’t have predicted while in graduate school. Grateful to have navigated the economic crisis of the time, a kind of uncertainty he recognizes today’s graduate students face as well, Brusuelas encourages graduate students, especially those in the humanities and social sciences, to stay informed about what’s going on with AI. “We are chasing this thing. We need to get ahead of it and imagine ways of teaching that integrate AI standards with pedagogy in the classroom. Students need to learn to distinguish between ethical approaches to using AI as a collaborative tool for research and using it to do the job for them.”