Israeli students’ AI model rescues lost words of antiquity

April 10, 2024 by Pesach Benson

In a groundbreaking development for historical researchers, students at Ben-Gurion University of the Negev have harnessed the power of artificial intelligence to restore illegible letters and words in ancient Hebrew and Aramaic inscriptions.

The Dead Sea Scrolls are on display in the Israel Museum in Jerusalem on March 30, 2021. Photo by Eitan Elhadez-Barak/TPS

Every year, archaeologists unearth a wealth of ancient texts written in Hebrew and Aramaic across the Near East. These inscriptions are invaluable for understanding the region’s rich cultural and historical heritage. However, many of these texts have suffered damage over time, making it difficult for scholars to decipher them. Natural disasters, political conflicts, and the ravages of time have all taken their toll on these ancient artifacts.

But BGU’s innovative approach may revolutionise the field of epigraphy, the science of identifying, classifying, and interpreting inscriptions found on ancient artifacts such as coins, monuments, statues and buildings, as well as writing on papyrus, parchment or scrolls.

“This breakthrough has the potential to revolutionise the field of epigraphy,” said Professor Mark Last, who supervised the students’ project.

“Not only can we assist historians in reconstructing ancient texts more accurately, but I also believe that this model can be adapted to other morphologically rich ancient languages.”

Traditionally, epigraphists have relied on manual methods to reconstruct missing parts of damaged inscriptions. Those methods are time-consuming and prone to error.

Students from the university’s Department of Software and Information Systems Engineering who took on the project approached the challenge as an “extended masked language modelling task.” In masked language modelling, a technique commonly used to pre-train large language models, parts of a text are hidden and the model learns to predict them from the surrounding context. The task is extended here because damaged content can comprise single characters, character n-grams (partial words), single complete words, and multi-word n-grams.
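To make the four kinds of damage concrete, here is a minimal Python sketch of how a clean sentence could be masked at each level to create examples for such a task. It is an illustration only, not the students' code: the `?` mask symbol, the helper names `mask_chars` and `mask_words`, and the English sample sentence are all assumptions made for readability.

```python
MASK = "?"  # assumed placeholder for one illegible character


def mask_chars(text: str, start: int, length: int) -> str:
    """Hide `length` characters beginning at index `start`; spaces are kept."""
    chars = list(text)
    for i in range(start, min(start + length, len(chars))):
        if chars[i] != " ":
            chars[i] = MASK
    return "".join(chars)


def mask_words(text: str, first: int, count: int) -> str:
    """Hide `count` whole words beginning at word index `first`."""
    words = text.split(" ")
    for i in range(first, min(first + count, len(words))):
        words[i] = MASK * len(words[i])
    return " ".join(words)


# English stand-in for a Hebrew or Aramaic verse (assumption for readability).
sentence = "in the beginning god created"

examples = {
    "single character": mask_chars(sentence, 7, 1),
    "character n-gram": mask_chars(sentence, 9, 3),
    "single word": mask_words(sentence, 3, 1),
    "multi-word n-gram": mask_words(sentence, 1, 2),
}

for kind, damaged in examples.items():
    print(f"{kind:>18}: {damaged}")
```

A restoration model would be given the damaged string and asked to predict what sits behind each mask symbol, with the original sentence serving as the answer key.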

Led by Last, undergraduate students Niv Fono, Harel Moshayof, Eldar Karol, and Itay Asraf applied a masked language modelling approach to corrupted inscriptions in Hebrew and Aramaic. This involved training the system on a dataset comprising 22,144 sentences from the Old Testament and testing it on an additional 536 sentences, achieving notable success.

By employing an ensemble of word and character prediction models, they were able to achieve high accuracy in restoring damaged text.

Their model, dubbed “Embible,” was presented to the European Chapter of the Association for Computational Linguistics at its meeting in March.

“We can help historians who have devoted their lives to recreating these ancient texts as accurately as possible,” said Last. “Furthermore, I believe the model can be extended to cover other morphologically rich ancient languages.”
