Hang & Chao, Machine Translation Evaluation. A Survey (Paper 2016)
All content following this page was uploaded by Lifeng Han on 04 March 2018.
[Arnold2003] D. Arnold. 2003. Computers and Translation: A translator’s guide, Chap. 8: Why translation is difficult for computers. Benjamins Translation Library.
[Avramidis et al.2011] Eleftherios Avramidis, Maja Popovic, David Vilar, and Aljoscha Burchardt. 2011. Evaluate with confidence estimation: Machine ranking of translation outputs using grammatical features. In Proceedings of WMT.
[Bahdanau et al.2014] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473.
[Banerjee and Lavie2005] Satanjeev Banerjee and Alon Lavie. 2005. Meteor: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL.
[Bangalore et al.2000] Srinivas Bangalore, Owen Rambow, and Steven Whittaker. 2000. Evaluation metrics for generation. In Proceedings of INLG.
[Buck2012] Christian Buck. 2012. Black box features for the WMT 2012 quality estimation shared task. In Proceedings of WMT.
[Callison-Burch et al.2006] Chris Callison-Burch, Philipp Koehn, and Miles Osborne. 2006. Improved statistical machine translation using paraphrases. In Proceedings of HLT-NAACL.
[Callison-Burch et al.2007] Chris Callison-Burch, Cameron Fordyce, Philipp Koehn, Christof Monz, and Josh Schroeder. 2007. (Meta-) evaluation of machine translation. In Proceedings of WMT.
[Callison-Burch et al.2008] Chris Callison-Burch, Cameron Fordyce, Philipp Koehn, Christof Monz, and Josh Schroeder. 2008. Further meta-evaluation of machine translation. In Proceedings of WMT.
[Callison-Burch et al.2009] Chris Callison-Burch, Philipp Koehn, Christof Monz, and Josh Schroeder. 2009. Findings of the 2009 workshop on statistical machine translation. In Proceedings of the 4th WMT.
[Dagan et al.2006] Ido Dagan, Oren Glickman, and Bernardo Magnini. 2006. The PASCAL recognising textual entailment challenge. Machine Learning Challenges: LNCS, 3944:177–190.
[Dahlmeier et al.2011] Daniel Dahlmeier, Chang Liu, and Hwee Tou Ng. 2011. TESLA at WMT2011: Translation evaluation and tunable metric. In Proceedings of WMT.
[Doddington2002] George Doddington. 2002. Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In HLT Proceedings.
[Dorr et al.2009] Bonnie Dorr, Matt Snover, Nitin Madnani, et al. 2009. Part 5: Machine translation evaluation. In Bonnie Dorr, editor, DARPA GALE program report.
[Doyon et al.1999] Jennifer B. Doyon, John S. White, and Kathryn B. Taylor. 1999. Task-based evaluation for machine translation. In Proceedings of MT Summit 7.
[Echizen-ya and Araki2010] H. Echizen-ya and K. Araki. 2010. Automatic evaluation method for machine translation using noun-phrase chunking. In Proceedings of the ACL.
[Eck and Hori2005] Matthias Eck and Chiori Hori. 2005. Overview of the IWSLT 2005 evaluation campaign. In Proceedings of the International Workshop on Spoken Language Translation (IWSLT).
[EuroMatrix2007] Project EuroMatrix. 2007. 1.3: Survey of machine translation evaluation. In EuroMatrix Project Report, Statistical and Hybrid MT between All European Languages, co-ordinator: Prof. Hans Uszkoreit.
[Giménez and Márquez2007] Jesús Giménez and Lluís Márquez. 2007. Linguistic features for automatic evaluation of heterogeneous MT systems. In Proceedings of WMT.
[Group2010] NIST Multimodal Information Group. 2010. NIST 2005 open machine translation (OpenMT) evaluation. Philadelphia: Linguistic Data Consortium. Report.
[Guo et al.2009] Jiafeng Guo, Gu Xu, Xueqi Cheng, and Hang Li. 2009. Named entity recognition in query. In Proceedings of SIGIR.
[Gupta et al.2015] Rohit Gupta, Constantin Orasan, and Josef van Genabith. 2015. Machine translation evaluation using recurrent neural networks. In Proceedings of the Tenth Workshop on Statistical Machine Translation, pages 380–384, Lisbon, Portugal, September. Association for Computational Linguistics.
[Hald1998] Anders Hald. 1998. A History of Mathematical Statistics from 1750 to 1930. ISBN-10: 0471179124. Wiley-Interscience, 1st edition.
[Han et al.2012] Aaron Li Feng Han, Derek Fai Wong, and Lidia Sam Chao. 2012. A robust evaluation metric for machine translation with augmented factors. In Proceedings of COLING.
[Han et al.2013a] Aaron Li Feng Han, Derek Fai Wong, Lidia Sam Chao, Liangye He, Shuo Li, and Ling Zhu. 2013a. Phrase tagset mapping for French and English treebanks and its application in machine translation evaluation. In International Conference of the German Society for Computational Linguistics and Language Technology, LNAI Vol. 8105, pages 119–131.
[Han et al.2013b] Aaron Li Feng Han, Derek Fai Wong, Lidia Sam Chao, Yi Lu, Liangye He, Yiming Wang, and Jiaji Zhou. 2013b. A description of tunable machine translation evaluation systems in WMT13 metrics task. In Proceedings of WMT, pages 414–421.
[Han et al.2014] Aaron Li Feng Han, Derek Fai Wong, Lidia Sam Chao, Liangye He, and Yi Lu. 2014. Unsupervised quality estimation model for English to German translation and its application in extensive supervised evaluation. In The Scientific World Journal. Issue: Recent Advances in Information Technology, pages 1–12.
[Hwang et al.2007] Young Sook Hwang, Andrew Finch, and Yutaka Sasaki. 2007. Improving statistical machine translation using shallow linguistic knowledge. Computer Speech and Language, 21(2):350–372.
[Kendall and Gibbons1990] Maurice G. Kendall and Jean Dickinson Gibbons. 1990. Rank Correlation Methods. Oxford University Press, New York.
[Kendall1938] Maurice G. Kendall. 1938. A new measure of rank correlation. Biometrika, 30:81–93.
[Khalilov and Fonollosa2011] Maxim Khalilov and José A. R. Fonollosa. 2011. Syntax-based reordering for statistical machine translation. Computer Speech and Language, 25(4):761–788.
[King et al.2003] Margaret King, Andrei Popescu-Belis, and Eduard Hovy. 2003. FEMTI: Creating and using a framework for MT evaluation. In Proceedings of the Machine Translation Summit IX.
[Koehn and Monz2005] Philipp Koehn and Christof Monz. 2005. Shared task: Statistical machine translation between European languages. In Proceedings of the ACL Workshop on Building and Using Parallel Texts.
[Koehn and Monz2006] Philipp Koehn and Christof Monz. 2006. Manual and automatic evaluation of machine translation between European languages. In Proceedings on the Workshop on Statistical Machine Translation, pages 102–121, New York City, June. Association for Computational Linguistics.
[Koehn et al.2003] Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1, pages 48–54. Association for Computational Linguistics.
[Koehn et al.2007a] Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007a. Moses: Open source toolkit for statistical machine translation. In Proceedings of ACL.
[Koehn et al.2007b] Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, et al. 2007b. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions, pages 177–180. Association for Computational Linguistics.
[Koehn2004] Philipp Koehn. 2004. Statistical significance tests for machine translation evaluation. In Proceedings of EMNLP.
[Koehn2010] Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press.
[Landis and Koch1977] J. Richard Landis and Gary G. Koch. 1977. The measurement of observer agreement for categorical data. Biometrics, 33(1):159–174.
[Laoudi et al.2006] Jamal Laoudi, Ra R. Tate, and Clare R. Voss. 2006. Task-based MT evaluation: From who/when/where extraction to event understanding. In Proceedings of LREC06, pages 2048–2053.
[Lapata2003] Mirella Lapata. 2003. Probabilistic text structuring: Experiments with sentence ordering. In Proceedings of ACL.
[Lebanon and Lafferty2002] Guy Lebanon and John Lafferty. 2002. Combining rankings using conditional probability models on permutations. In Proceedings of the ICML.
[Leusch and Ney2009] Gregor Leusch and Hermann Ney. 2009. Edit distances with block movements and error rate confidence estimates. Machine Translation, 23(2-3).
[Li et al.2012] Liang You Li, Zheng Xian Gong, and Guo Dong Zhou. 2012. Phrase-based evaluation for machine translation. In Proceedings of COLING, pages 663–672.
[Lin and Hovy2003] Chin-Yew Lin and E. H. Hovy. 2003. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of NAACL.
[Lin and Och2004] Chin-Yew Lin and Franz Josef Och. 2004. Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics. In Proceedings of ACL.
[Liu and Gildea2005] Ding Liu and Daniel Gildea. 2005. Syntactic features for evaluation of machine translation. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization.
[Liu et al.2011] Chang Liu, Daniel Dahlmeier, and Hwee Tou Ng. 2011. Better evaluation metrics lead to better machine translation. In Proceedings of EMNLP.
[Lo and Wu2011a] Chi Kiu Lo and Dekai Wu. 2011a. MEANT: An inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility based on semantic roles. In Proceedings of ACL.
[Lo and Wu2011b] Chi Kiu Lo and Dekai Wu. 2011b. Structured vs. flat semantic role representations for machine translation evaluation. In Proceedings of the 5th Workshop on Syntax and Structure in Statistical Translation (SSST-5).
[Lo et al.2012] Chi Kiu Lo, Anand Karthik Turmuluru, and Dekai Wu. 2012. Fully automatic semantic MT evaluation. In Proceedings of WMT.
[Mariño et al.2006] José B. Mariño, Rafael E. Banchs, Josep M. Crego, Adrià de Gispert, Patrik Lambert, José A. R. Fonollosa, and Marta R. Costa-jussà. 2006. N-gram based machine translation. Computational Linguistics, 32(4):527–549.
[Marsh and Perzanowski1998] Elaine Marsh and Dennis Perzanowski. 1998. MUC-7 evaluation of IE technology: Overview of results. In Proceedings of Message Understanding Conference (MUC-7).
[McKeown1979] Kathleen R. McKeown. 1979. Paraphrasing using given and new information in a question-answer system. In Proceedings of ACL.
[Menezes et al.2006] Arul Menezes, Kristina Toutanova, and Chris Quirk. 2006. Microsoft Research treelet translation system: NAACL 2006 Europarl evaluation. In Proceedings of WMT.
[Meteer and Shaked1988] Marie Meteer and Varda Shaked. 1988. Strategies for effective paraphrasing. In Proceedings of COLING.
[Miller et al.1990] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. J. Miller. 1990. WordNet: an on-line lexical database. International Journal of Lexicography, 3(4):235–244.
[Montgomery and Runger2003] Douglas C. Montgomery and George C. Runger. 2003. Applied statistics and probability for engineers. John Wiley and Sons, New York, third edition.
[Moran and Lewis2012] John Moran and David Lewis. 2012. Unobtrusive methods for low-cost manual assessment of machine translation. Tralogy I [Online], Session 5.
[Márquez2013] L. Márquez. 2013. Automatic evaluation of machine translation quality. Dialogue 2013 invited talk, extended.
[Och2003] Franz Josef Och. 2003. Minimum error rate training for statistical machine translation. In Proceedings of ACL.
[Papineni et al.2002] Kishore Papineni, Salim Roukos, Todd Ward, and Wei Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of ACL.
[Parton et al.2011] Kristen Parton, Joel Tetreault, Nitin Madnani, and Martin Chodorow. 2011. E-rating machine translation. In Proceedings of WMT.
[Paul et al.2010] Michael Paul, Marcello Federico, and Sebastian Stüker. 2010. Overview of the IWSLT 2010 evaluation campaign. In Proceedings of IWSLT.
[Paul2009] M. Paul. 2009. Overview of the IWSLT 2009 evaluation campaign. In Proceedings of IWSLT.
[Pearson1900] Karl Pearson. 1900. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50(5):157–175.
[Popović et al.2011] Maja Popović, David Vilar, Eleftherios Avramidis, and Aljoscha Burchardt. 2011. Evaluation without references: IBM1 scores as evaluation metrics. In Proceedings of WMT.
[Popovic and Ney2007] M. Popovic and Hermann Ney. 2007. Word error rates: Decomposition over POS classes and applications for error analysis. In Proceedings of WMT.
[Povlsen et al.1998] Claus Povlsen, Nancy Underwood, Bradley Music, and Anne Neville. 1998. Evaluating text-type suitability for machine translation: a case study on an English-Danish system. In Proceedings of LREC.
[Raybaud et al.2011] Sylvain Raybaud, David Langlois, and Kamel Smaïli. 2011. "This sentence is wrong." Detecting errors in machine-translated sentences. Machine Translation, 25(1):1–34.
[Sánchez-Martínez and Forcada2009] F. Sánchez-Martínez and M. L. Forcada. 2009. Inferring shallow-transfer machine translation rules from small parallel corpora. Journal of Artificial Intelligence Research, 34:605–635.
[Snover et al.2006] Matthew Snover, Bonnie J. Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. A study of translation edit rate with targeted human annotation. In Proceedings of AMTA.
[Song and Cohn2011] Xingyi Song and Trevor Cohn. 2011. Regression and ranking based optimisation for sentence level MT evaluation. In Proceedings of WMT.
[Specia and Giménez2010] L. Specia and J. Giménez. 2010. Combining confidence estimation and reference-based metrics for segment-level MT evaluation. In The Ninth Conference of the Association for Machine Translation in the Americas (AMTA).
[Specia et al.2011] Lucia Specia, Naheh Hajlaoui, Catalina Hallett, and Wilker Aziz. 2011. Predicting machine translation adequacy. In Machine Translation Summit XIII.
[Stanojević and Sima’an2014a] Miloš Stanojević and Khalil Sima’an. 2014a. BEER: Better evaluation as ranking. In Proceedings of the Ninth Workshop on Statistical Machine Translation.
[Stanojević and Sima’an2014b] Miloš Stanojević and Khalil Sima’an. 2014b. Evaluating word order recursively over permutation-forests. In Proceedings of the Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation.
[Wong and yu Kit2009] Billy Wong and Chun yu Kit. 2009. ATEC: automatic evaluation of machine translation via word choice and word order. Machine Translation, 23(2-3):141–155.
[Zhang and Zong2015] Jiajun Zhang and Chengqing Zong. 2015. Deep neural networks in machine translation: An overview. IEEE Intelligent Systems, (5):16–25.