… experts describe the project as the most ambitious undertaking in the history of human language technology.
If it is developed as planned, the first-of-its-kind machine will be able to recognize speech in multiple languages, translate it into English, and then mine the resulting transcripts to sift so-called intelligence from dross, said sources close to the project.
The ultimate goal of the endeavor, dubbed GALE for Global Autonomous Language Exploitation, is to turn the staggeringly large volumes of recorded foreign language broadcasts, phone conversations, and Internet traffic into something national security analysts, spooks, and soldiers can actually use.
It is said that the National Security Agency gathers enough information every hour to fill the Library of Congress. Most of it is never translated, and never reaches the desk of an analyst.
The dearth of solid human language technology in the hands of the government is a "huge problem," said Gilman Louie, the president and CEO of In-Q-Tel, the Central Intelligence Agency's venture capital arm…
DARPA's ambitions for GALE represent the "Holy Grail" of human language technology, said John Makhoul, a scientist at BBN working on the GALE project. The best speech recognition software operating in a controlled environment like a television broadcast can usually get nine out of 10 words correct.
DARPA wants 95 percent accuracy rates for both speech recognition and translation. And it demands that the engines be able to process radio, TV, talk shows, newswires, newsgroups, blogs, and phone conversations in English, Arabic, and Chinese.
The Pentagon is seeking the same 95 percent accuracy rate for the translation part of the system. Experts say that translation software now performs with around 80 percent accuracy.