In search of inspiration for improving computer-based text translators, researchers at Dartmouth College turned to the Bible for guidance. The result is an algorithm trained on various versions of the sacred texts that can convert written works into different styles for different audiences.
Online tools that translate text between languages such as English and Spanish are widely available. Style translators, tools that keep text in the same language but transform its style, have been much slower to emerge. The Dartmouth-led team saw in the Bible "a large, previously untapped dataset of aligned parallel text." Beyond providing infinite inspiration, each version of the Bible contains more than 31,000 verses, which the researchers used to produce over 1.5 million unique pairings of source and target verses for machine-learning training sets.
“The English-language Bible comes in many different written styles, making it the perfect source text to work with for style translation,” said Keith Carlson, a PhD student at Dartmouth and lead author of the research paper about the study.
As an added benefit for the research team, the Bible is already thoroughly indexed by the consistent use of book, chapter and verse numbers. The predictable organization of the text across versions eliminates the risk of alignment errors that could be caused by automatic methods of matching different versions of the same text.
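To illustrate why shared verse numbering removes the need for automatic text matching, here is a minimal Python sketch. It is not the researchers' actual pipeline. It assumes each version is loaded as a dictionary keyed by a (book, chapter, verse) tuple, and the helper name `align_pairs`, the version labels and the verse texts are all hypothetical toy data. Pairing then reduces to an exact key lookup for every ordered pair of versions. Skipping verses worded identically in both versions is also an assumption about what a "unique" pairing means.

```python
from itertools import permutations

# A verse reference: (book, chapter, verse). Hypothetical representation.
Ref = tuple[str, int, int]


def align_pairs(versions: dict[str, dict[Ref, str]]) -> list[tuple[str, str, Ref, str, str]]:
    """Build (source, target, ref, source_text, target_text) training pairs.

    Hypothetical helper: verses are aligned by exact reference lookup,
    so no fuzzy text-matching step is needed.
    """
    pairs = []
    # Every ordered pair of distinct versions, so A->B and B->A are both kept.
    for src, tgt in permutations(sorted(versions), 2):
        # Only references present in both versions can be paired.
        shared = versions[src].keys() & versions[tgt].keys()
        for ref in sorted(shared):
            src_text, tgt_text = versions[src][ref], versions[tgt][ref]
            # Assumption: identical wording shows no style change, so skip it.
            if src_text != tgt_text:
                pairs.append((src, tgt, ref, src_text, tgt_text))
    return pairs


# Illustrative toy data keyed by verse reference.
versions = {
    "version_a": {
        ("Genesis", 1, 1): "In the beginning God created the heaven and the earth.",
        ("Genesis", 1, 2): "And the earth was without form, and void.",
    },
    "version_b": {
        ("Genesis", 1, 1): "At the start God made the sky and the earth.",
        ("Genesis", 1, 2): "And the earth was without form, and void.",
    },
    "version_c": {
        ("Genesis", 1, 1): "When God began to create the heavens and the earth.",
    },
}

pairs = align_pairs(versions)
print(len(pairs))  # 6
```

With this toy data the sketch yields six pairs, all for Genesis 1:1 (one per ordered version pair). Genesis 1:2 produces none: versions A and B word it identically, and version C lacks it.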
“The Bible is a ‘divine’ data set to work with to study this task,” said Daniel Rockmore, a professor of computer science at Dartmouth and contributing author on the study. “Humans have been performing the task of organizing Bible texts for centuries, so we didn’t have to put our faith into less reliable alignment algorithms.” (1)
In the beginning there was Logos.
And we tried to express God with words.
We were bad at it in the beginning.
But gradually we learned.
To use words better.
To express ourselves.
To make art with lifeless markings on white paper.
And people read and wept.
And people believed and followed.
And people forgot.
And people became indifferent.
At the end, the markings on the paper were dead.
Being nothing more than sad reminders.
That we once upon a time were alive.
That we used to be part of God.
In the beginning there was Logos.
And we tried to express God with words.
We were so good at it in the beginning…
PS. Dartmouth College has a long history of innovation in computer science. The term "artificial intelligence" was coined for a 1956 conference at Dartmouth that founded AI as a research discipline. Other advancements include the design of BASIC, an early general-purpose programming language built to be accessible to beginners, and the Dartmouth Time-Sharing System, which influenced the development of modern operating systems.