Which machine translator can you rely on? A worldwide testing marathon that starts on March 1 at Charles University in Prague should answer this question. For example, Google's translator, the best known of them, will be compared with Euromatrix, the new EU translator. Euromatrix might save the European Union hundreds of millions of euros in the future.
Professional translators translated 12,500 sentences, selected from articles on the best-known news servers and in daily papers - BBC (English), Le Monde (French), Der Spiegel (German), El Mundo (Spanish), as well as iDnes or Lidové noviny (Czech) - into five languages. The same texts now undergo machine translation. David Matuška from CEET explains: “Comparing the human and machine translations then shows how reliable the machine translators are and identifies the most suitable one for each language combination.”
The Czech translation agency CEET was appointed by the European Union to assess this project. Matuška explains why CEET decided to invest in both the European Union project and the development of a new translator: “Computer-supported translation followed by human proofreading represents the future of the translation field. It is therefore essential to participate in the development and research of such technology.”
Surprisingly, machines encounter more serious problems with European languages than with translations such as those between English and Chinese. Ondřej Bojar from the Faculty of Mathematics and Physics, Charles University, says: “The characters may be completely different, yet these languages are closer in both morphology and word order. Translations from and to Czech are significantly complicated, especially due to declension and conjugation.” Czech is thus one of the languages used for translator testing.
From March 1 to 5, computers will be translating test articles from European newspapers; assessment will follow. Matuška concludes: “We are curious to see the results for Euromatrix. Unlike Google and other translators, which only work with word frequency, Euromatrix adds grammar.”
Schedule
March 1–5, 2010
Registered participants, organisations, and enthusiasts have their software translate a selected text. In Prague, Euromatrix’s development is headed by the Institute of Formal and Applied Linguistics (Faculty of Mathematics and Physics) at Charles University. Aside from Google, other registered participants include, for example, Systran and Moses.
March 2010
Completed machine translations are processed at the University of Edinburgh, where a special computer programme evaluates and compares the accuracy of the individual translations based on their similarity to the human translation.
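To give a rough idea of what "similarity to the human translation" can mean in practice, here is a minimal sketch of one common approach: counting how many word sequences (n-grams) of a machine translation also appear in the human reference. This is an illustration only. The article does not say which metric the Edinburgh programme uses, and the function names and sample sentences below are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(candidate, reference, n=1):
    """Share of the candidate's n-grams that also occur in the reference.

    Each reference n-gram can be matched only as often as it occurs
    (clipped counts), so repeating a correct word does not inflate the score.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Hypothetical example data, not taken from the test set.
reference = "the cat sat on the mat"   # human translation
system_a = "the cat sat on the mat"    # output of one machine translator
system_b = "a cat is on the mat"       # output of another

for name, output in [("system_a", system_a), ("system_b", system_b)]:
    print(name, ngram_precision(output, reference, 1),
          ngram_precision(output, reference, 2))
```

A higher score means more overlap with the human translation. Real evaluation metrics combine several n-gram lengths and add further corrections, so this sketch should be read as a simplification.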
March – July 2010
Human assessment (Johns Hopkins University, USA). Working on an identical text, experts will compare the human translation with several machine translations and subjectively pick the best and most accurate one. Objectivity is ensured by the large number of such judgments: several dozen people carry out the evaluation in parallel.
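One way many subjective judgments could be combined into a single ranking is sketched below, assuming each evaluator ranks the competing outputs for a sentence from best (1) to worst. The score for each system is the share of head-to-head comparisons it wins. The data format, system names, and scoring rule are assumptions made for illustration; the article does not describe the actual procedure.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_win_rates(judgments):
    """Return, for each system, the fraction of pairwise comparisons it won.

    `judgments` is a list of dicts mapping system name -> rank (1 = best),
    one dict per evaluator and sentence. Tied pairs are skipped.
    """
    wins = defaultdict(int)
    comparisons = defaultdict(int)
    for ranking in judgments:
        for a, b in combinations(ranking, 2):
            if ranking[a] == ranking[b]:
                continue  # a tie counts for neither system
            winner = a if ranking[a] < ranking[b] else b
            wins[winner] += 1
            comparisons[a] += 1
            comparisons[b] += 1
    return {system: wins[system] / comparisons[system] for system in comparisons}


# Hypothetical rankings from three evaluators for the same sentence.
judgments = [
    {"system_a": 1, "system_b": 2, "system_c": 3},
    {"system_a": 2, "system_b": 1, "system_c": 3},
    {"system_a": 1, "system_b": 3, "system_c": 2},
]

rates = pairwise_win_rates(judgments)
for system, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{system}: {rate:.2f}")
```

The more evaluators contribute, the less any single person's preference affects the final order, which is the sense in which the number of judgments secures objectivity.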
July 2010
CEET processes and evaluates the complete test results and announces them at the Association for Computational Linguistics Conference (Uppsala, Sweden).
What Is Euromatrix, and How Does It Work?
Most machine translators, including Google's, are based strictly on a statistical model that is fed thousands of words every day. The model assesses how frequently words occur and translates accordingly. Unfortunately, only 2 to 3% of such translations are of flawless quality. Euromatrix, however, is a hybrid translator model that combines statistics with linguistics, which raises its quality by several levels.
Development Cannot Exist without Money
The development of Euromatrix is financially supported by the European Union. The EU expects its use to bring significant savings in the translation of various regulations, contracts, and official documents. Matuška explains: “The total cost of the Euromatrix Plus project is approximately 5 million euros, 3.8 million of which is paid by the European Union. Approximately 1.1 million comes from the budgets of research institutes or national grant programmes. CEET invested about 60,000 euros from its own sources. If the European Commission decides to use machine translations, it may start to save tens of millions of euros a year in approximately five years' time.” The investment in the Euromatrix Plus project should thus pay for itself after the first year of active use.




