
Annotated Bibliography

Shade, Benjamin, and Eduardo G. Altmann. "Quantifying the Dissimilarity of Texts." Information, vol. 14, no. 5, 2023, p. 271. ProQuest Central Student; Publicly Available Content Database, https://doi.org/10.3390/info14050271.

Published on May 2, 2023, "Quantifying the Dissimilarity of Texts" by Benjamin Shade and Eduardo Altmann evaluates how well different measures quantify the dissimilarity between texts. The article primarily compares the Jaccard distance and the Jensen-Shannon divergence against sentence embeddings created with the all-MiniLM-L6-v2 model, testing all three on three tasks: clustering texts by author, by subject, and by time period. The Jensen-Shannon divergence performed strongly across all three tasks, the vector embeddings also performed well, and the Jaccard distance was the least effective. The article discusses language models at a fairly advanced level, giving it a higher barrier to entry than the other articles I have reviewed. Its techniques for quantifying dissimilarity are very useful for comparing a translated text against a reference to score the translation's accuracy, which in turn can be used to fine-tune the models involved. However, while the information is very useful, I wouldn't annotate this article yet, as it is too advanced for the moment.
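To make the two baseline measures concrete, here is a minimal Python sketch of how the Jaccard distance and the Jensen-Shannon divergence between two texts could be computed; the whitespace tokenization and the example sentences are my own simplifications for illustration, not details taken from the article.

from collections import Counter
import math

def jaccard_distance(text_a, text_b):
    """Jaccard distance: 1 minus the overlap between the two texts' word sets."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return 1.0 - len(a & b) / len(a | b)

def jensen_shannon_divergence(text_a, text_b):
    """JSD between the word-frequency distributions of the two texts (in bits)."""
    ca, cb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    vocab = sorted(set(ca) | set(cb))
    total_a, total_b = sum(ca.values()), sum(cb.values())
    p = [ca[w] / total_a for w in vocab]
    q = [cb[w] / total_b for w in vocab]
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; zero-probability terms contribute nothing.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return (kl(p, m) + kl(q, m)) / 2

source = "the cat sat on the mat"
candidate = "the cat is sitting on the mat"
print(jaccard_distance(source, candidate))           # ~0.43
print(jensen_shannon_divergence(source, candidate))  # 0 (identical) to 1 (disjoint)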

Son, Jungha, and Boyoung Kim. "Translation Performance from the User's Perspective of Large Language Models and Neural Machine Translation Systems." Information, vol. 14, no. 10, 2023, p. 574. ProQuest Central Student; Publicly Available Content Database, https://doi.org/10.3390/info14100574.

Published on October 19, 2023, "Translation Performance from the User's Perspective of Large Language Models and Neural Machine Translation Systems" by Jungha Son and Boyoung Kim compares the translation abilities of a large language model, ChatGPT, with those of the neural machine translation systems Google Translate and Microsoft Translator, using corpora from the Workshop on Machine Translation (WMT) as benchmarks. From this article, I could annotate the parts on the metrics used to compare the systems, namely their BLEU, chrF, and TER scores, as well as their performance on specific language pairs, to understand how language models are used in translation and how their capabilities are graded and scored. Skimming through the article, I could also note down the translation techniques it details, along with the background knowledge I need to further build my foundation. Finally, I will definitely read and annotate this article, as it is incredibly useful for building up the foundations of my Senior Project before I dive into higher-level knowledge, such as the specific techniques it also covers.
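As a concrete illustration of the three metrics Son and Kim report, here is a minimal Python sketch using the sacrebleu library; the hypothesis and reference sentences are invented placeholders, not data from the article.

# Scoring system translations with BLEU, chrF, and TER via sacrebleu
# (pip install sacrebleu). Example sentences are invented placeholders.
from sacrebleu.metrics import BLEU, CHRF, TER

hypotheses = ["The cat is sitting on the mat.", "He reads a book every night."]
references = [["The cat sat on the mat.", "He reads a book each night."]]

bleu, chrf, ter = BLEU(), CHRF(), TER()
print(bleu.corpus_score(hypotheses, references))  # n-gram overlap; higher is better
print(chrf.corpus_score(hypotheses, references))  # character n-gram F-score; higher is better
print(ter.corpus_score(hypotheses, references))   # translation edit rate; lower is better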

Zhu, Wenhao, et al. "Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis." Publicly Available Content Database, 2023, www.proquest.com/working-papers/multilingual-machine-translation-with-large/docview/2799277250/se-2?accountid=41498.

Published on April 10, 2023, with the current version revised as of October 29, 2023, "Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis" by Wenhao Zhu and others addresses two primary questions: "1) How LLMs perform MMT over massive languages?" and "2) Which factors affect the performance of LLMs?" The article first compares the performance of different LLMs, including ChatGPT and LLaMA2-7B, at translating from English into other languages, measuring them against Google Translate and concluding that general-purpose LLMs still have a long way to go compared to a dedicated translation system like Google Translate. The article then turns to the second question by identifying scenarios where LLMs excel or struggle. This article is incredibly useful to me because it shows the capabilities of different LLMs across different areas, along with their strengths and weaknesses in translation. I will annotate this source, as it will prove incredibly useful for the foundations of my research while also pointing out different directions I should explore and study further.
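To make concrete how this kind of zero-shot LLM translation is typically invoked, here is a minimal Python sketch; the openai client usage, model name, and prompt wording are my own assumptions for illustration, not the paper's exact setup.

# Minimal sketch of zero-shot LLM translation, roughly the kind of query
# such comparisons run. Model name and prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def translate(sentence: str, src: str = "English", tgt: str = "German") -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model, not the paper's exact configuration
        messages=[{
            "role": "user",
            "content": f"Translate the following sentence from {src} to {tgt}:\n{sentence}",
        }],
    )
    return response.choices[0].message.content

print(translate("The weather is nice today."))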
