
Multilingual Ancient Text and Translation

Team Members
Pooja Upadhyay 1709113067
Utkarsh Pal 1709113112
Shashi Prakash 1709113095
Priyanka Patel 1709113075
Under the guidance of
Mrs. Shikha Verma
CONTENTS

1. INTRODUCTION
2. MOTIVATION
3. OBJECTIVE
4. SCOPE
5. RELATED WORK
6. TECHNICAL FEASIBILITY
7. REFERENCES
8. CONCLUSION
INTRODUCTION

● Some long-lost ancient languages that have never been deciphered could
be the next ones to get the machine translation treatment.

● This project is about converting ancient scripts into some of the
modern-day languages that we use.

● We will use the power of modern-day technology such as machine learning
to solve the problem of script conversion.
MOTIVATION
● Ancient languages contain valuable information about our past.

● They contain information about ancient civilizations, cultures and sometimes
technologies too.

● So far, archaeologists have had to spend a lot of time reading thick books to find
the meaning of symbols or alphabets.

● This motivated us to build a system that reduces the time needed for script recognition.

● Hence, it will save both time and money. The system will also help common
people who don't have much knowledge of ancient scripts.
OBJECTIVE
● To collect and analyze textual characters from different ancient literature and agencies,
using ICT-based techniques.

● To recognize multiple languages using various machine learning techniques.

● To translate multiple languages into English using machine learning approaches.

● To develop an integrated text recognition and translation system.

● To verify and validate the developed integrated text recognition and translation system.

● To perform a comparative analysis of the developed integrated text recognition and
translation system.
Literature Survey
| S.No | Author Name | Year of Publication | Technique Used | Input Parameters | Dataset Used | Output | Output Parameters |
|---|---|---|---|---|---|---|---|
| 1 | David Ha | 2018 | PreActResNet-18 + Manifold Mixup | Kuzushiji kanji sample | Kuzushiji-MNIST | Modern kanji version of Kuzushiji kanji | Accuracy |
| 2 | Yoshua Bengio | 2016 | LSTM + ensemble | Chinese character | ICDAR-2013 | Drawn Chinese character | Accuracy (character accuracy 41%) |
| 3 | Quoc V. Le | NA | SMT system with ensemble | English language text | WMT'14 English-to-French | French translation | BLEU score 40.3 |
| 4 | Assoc. for Computational Linguistics | 2020 | Fine-tuning a multilingual model on IITB HI-EN data and all ILCI data on EN-XX | Indian language | IITB (Kunchukuttam et al., 2017), ILCI data | English translation | BLEU score 33.78 |
| 5 | Sonika Narang, M. K. Jindal, Munish Kumar | 2019 | Simple majority voting on MLP, NN, CNN, decision tree, random forest | Devanagari ancient text | Handwritten Devanagari ancient manuscripts | Recognised Devanagari text | Accuracy 96.0% |
| 6 | Yannis Assael, Thea Sommerschield, Jonathan Prag | 2019 | Bidirectional LSTM with both characters and words as inputs | Ancient Greek inscription | AG textual corpora | Restored full texts | Character error rate 30.1%, top-20 accuracy 73.5% |
| 7 | Shailesh Acharya, Ashok Kumar Pant, Prashnna Kumar Gyawali | 2015 | Deep CNNs with an added dropout layer and a dataset increment technique | Devanagari ancient text | Devanagari Handwritten Dataset | Recognised Devanagari text | Accuracy 98.47% |
| 8 | Swapnil Ashok Jadhav | 2020 | Transformer-based architecture (vaswani-wmt-en-de-big) | Marathi language text | Marathi-to-English parallel corpus | English translation | BLEU score 72.37 |
| 9 | Vikrant Goyal, Dipti Misra Sharma | 2019 | Bidirectional RNN model with BPE | English or Gujarati language text | English–Gujarati and Gujarati–English | Gujarati or English language text | BLEU score 18.64 |
| 10 | Gayatri H. Khobaragad, Deepak Kapgate | 2014 | TCH, SVM | Devanagari language images | Devanagari signs and panels images dataset | English language text | Accuracy |
| 11 | E. K. Vellingiriraj | 2016 | Zoning | Ancient Tamil | Vattezhuthu database | English language | Accuracy 89.7% |
| 12 | Chanhee Lee, Heuiseok Lim, Yeongwook Yang | 2020 | NMT (neural machine translation) | Korean | Ancient DB provided by the Institute of Translation of Korean Classics | English language | BLEU score 5.289 |
Process Architecture
SCOPE
1) IMAGE SELECTION

Image selection is the process of selecting and loading the input images from the dataset repository. The
dataset provides the input images from which the text is recognized. Each image is read with the imread()
function. The dataset contains images of Arabic characters.
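The selection step can be sketched in Python, the language listed under the software requirements. This is a minimal sketch, assuming the dataset repository is a local folder of image files; the function names `select_images` and `load_grayscale` and the extension list are hypothetical. Loading uses OpenCV's `cv2.imread`, one common `imread()` implementation, which returns `None` instead of raising when a file cannot be read.

```python
from pathlib import Path

# Hypothetical set of extensions treated as images; adjust to the real dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def select_images(dataset_dir, extensions=IMAGE_EXTENSIONS):
    """Return every image file under dataset_dir (recursively), sorted for reproducibility."""
    root = Path(dataset_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )


def load_grayscale(path):
    """Read one image as a single-channel array with OpenCV's imread()."""
    import cv2  # imported lazily so select_images() works without OpenCV installed

    image = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE)
    if image is None:  # imread signals failure by returning None
        raise OSError(f"could not read image: {path}")
    return image
```

Sorting the paths means every run processes the images in the same order, which keeps experiments repeatable.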

2) IMAGE PREPROCESSING

In this pre-processing step, the input images are resized. Image resizing is necessary when you need to increase
or decrease the total number of pixels. Zooming (upscaling) increases the number of pixels by remapping the
original pixels onto a larger grid, so that when you zoom into an image its content is displayed at a larger,
easier-to-read size.
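Resizing by remapping can be illustrated with nearest-neighbour interpolation. The sketch below is an assumption about how the resize could be done (the slides do not name an interpolation method); `resize_nearest` is a hypothetical helper written with NumPy, and each output pixel simply copies the closest corresponding input pixel.

```python
import numpy as np


def resize_nearest(image: np.ndarray, new_h: int, new_w: int) -> np.ndarray:
    """Resize a 2-D (or H x W x C) image with nearest-neighbour remapping.

    Output pixel (r, c) copies input pixel (r * h // new_h, c * w // new_w),
    so upscaling repeats pixels and downscaling drops them.
    """
    if new_h <= 0 or new_w <= 0:
        raise ValueError("target size must be positive")
    h, w = image.shape[:2]
    rows = np.arange(new_h) * h // new_h  # source row for every output row
    cols = np.arange(new_w) * w // new_w  # source column for every output column
    return image[rows[:, None], cols]
```

With OpenCV the comparable call is `cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_NEAREST)`; note that OpenCV takes the target size as (width, height). Nearest-neighbour keeps character edges sharp but adds no detail; smoother options such as `cv2.INTER_LINEAR` blur edges slightly.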

3) SEGMENTATION
Segmentation attempts to partition the pixels of an image into groups that strongly correlate with the objects in
the image. For segmentation, we use a binary (thresholding) operation, so that each character is segmented in
binary form.
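A minimal sketch of binary segmentation, assuming dark ink on a light background: pixels darker than a threshold become 1 (ink) and the rest 0, and characters are then separated wherever a whole column contains no ink (a vertical projection). The helpers `binarize` and `split_columns`, the mean-intensity threshold, and the projection step are all illustrative assumptions, not the project's exact method.

```python
import numpy as np


def binarize(gray, threshold=None):
    """Return a uint8 mask: 1 for ink (dark) pixels, 0 for background."""
    if threshold is None:
        threshold = gray.mean()  # simple global threshold (assumption)
    return (gray < threshold).astype(np.uint8)


def split_columns(binary):
    """Return (start, end) column ranges of ink separated by ink-free columns."""
    ink = binary.any(axis=0)  # True where a column contains at least one ink pixel
    segments, start = [], None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x  # a new character begins
        elif not has_ink and start is not None:
            segments.append((start, x))  # the character ended at the previous column
            start = None
    if start is not None:
        segments.append((start, len(ink)))
    return segments
```

Otsu's method (`cv2.threshold` with `cv2.THRESH_OTSU`) is a common, more robust choice than the mean. Column projection only separates characters that do not touch; joined cursive scripts such as Arabic usually need additional segmentation logic.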
4) TEXT EXTRACTION

In this step, the text is extracted using MSER (Maximally Stable Extremal Regions).
MSER is a method for blob detection in images. The MSER algorithm extracts from an image a
number of co-variant regions, called MSERs: an MSER is a stable connected component of some
gray-level sets of the image.
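The stability idea behind MSER can be shown with a deliberately simple, brute-force sketch. For each threshold t, the extremal regions are the connected components of `gray <= t`; a region is kept when its area barely changes as the threshold moves up and down by `delta` levels. The function names, default thresholds and `max_variation` value are hypothetical, and the sketch omits the local-minimum check and the efficient union-find of the real algorithm, so it is only suitable for tiny images. In practice OpenCV provides `cv2.MSER_create()`, whose `detectRegions(gray)` returns the regions and their bounding boxes.

```python
import numpy as np


def connected_components(mask):
    """Label 4-connected True regions; return a list of frozensets of (row, col)."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or seen[y, x]:
                continue
            stack, pixels = [(y, x)], []
            seen[y, x] = True
            while stack:  # depth-first flood fill
                cy, cx = stack.pop()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        stack.append((ny, nx))
            regions.append(frozenset(pixels))
    return regions


def stable_dark_regions(gray, thresholds=range(0, 256, 5), delta=2, max_variation=0.25):
    """Keep regions R(t) with (|R(t+delta)| - |R(t-delta)|) / |R(t)| <= max_variation."""
    levels = sorted(thresholds)
    owners = []  # owners[i][pixel] -> component containing that pixel at levels[i]
    for t in levels:
        owner = {}
        for region in connected_components(gray <= t):
            for pixel in region:
                owner[pixel] = region
        owners.append(owner)

    stable = set()
    for i in range(delta, len(levels) - delta):
        for region in set(owners[i].values()):
            seed = min(region, key=lambda p: gray[p])  # darkest pixel of the region
            grown = owners[i + delta][seed]  # exists because thresholds increase
            shrunk = owners[i - delta].get(seed, frozenset())  # may be empty
            if (len(grown) - len(shrunk)) / len(region) <= max_variation:
                stable.add(region)
    return stable
```

A dark square on a white page stays the same size over a wide range of thresholds, so it is reported as stable; the page itself only appears at the very last threshold and is never reported.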

5) FEATURE EXTRACTION
In the feature extraction stage, each character is represented as a feature vector, which becomes its
identity. Feature values are extracted from the image; in feature extraction, we compute the mean,
median and variance.
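A minimal sketch of the feature vector described above, assuming the mean, median and variance are taken over the pixel intensities of one segmented character image; the helper name `intensity_features` is hypothetical.

```python
import numpy as np


def intensity_features(char_image):
    """Return [mean, median, variance] of the pixel intensities of one character image."""
    pixels = np.asarray(char_image, dtype=float).ravel()
    # pixels.var() is the population variance (ddof=0).
    return np.array([pixels.mean(), np.median(pixels), pixels.var()])
```

Three global statistics are cheap to compute but say little about a character's shape, so two different characters can easily share similar values. Per-zone statistics or learned (e.g. CNN) features are common ways to make the vector more discriminative.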
6) CLASSIFICATION
In the classification step, the extracted features are classified to recognize the text.

Image classification is performed on the images to identify which of them contain text.

A classifier is used to identify the images that contain text.
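The slides do not name the classifier, so the sketch below swaps in a nearest-centroid classifier purely as an illustration: each class is summarised by the mean of its training feature vectors, and a new vector gets the label of the closest mean. The class name `NearestCentroidClassifier` and the example labels are hypothetical; scikit-learn offers a ready-made equivalent as `sklearn.neighbors.NearestCentroid`.

```python
import numpy as np


class NearestCentroidClassifier:
    """Assign each feature vector the label of the nearest class centroid (Euclidean)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.labels_ = np.unique(y)  # sorted distinct class labels
        self.centroids_ = np.stack([X[y == label].mean(axis=0) for label in self.labels_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Distance from every sample (rows) to every centroid (columns).
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.labels_[dists.argmin(axis=1)]
```

For example, training on the feature vectors `[[0, 0], [0, 1], [10, 10], [10, 11]]` with labels `["text", "text", "no_text", "no_text"]` classifies `[1, 0]` as `"text"` and `[9, 9]` as `"no_text"`.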

7) TRANSLATION
In this step, the extracted text is translated into another language.

In our process, the Arabic text is translated into English.
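To show where translation plugs into the pipeline, here is a toy word-by-word glossary lookup. It is not machine translation: the glossary is hypothetical, and the sketch ignores word order, grammar and Arabic morphology (for example the attached article "ال"). A real system would replace `translate_words` with a trained model such as the NMT and transformer approaches in the literature survey.

```python
# Hypothetical toy glossary (Arabic -> English); a real system would use a trained MT model.
GLOSSARY = {"كتاب": "book", "بيت": "house", "شمس": "sun"}


def translate_words(text, glossary=GLOSSARY):
    """Translate whitespace-separated words one by one; unknown words are kept in brackets."""
    return " ".join(glossary.get(word, f"[{word}]") for word in text.split())
```

Bracketing unknown words makes gaps in the vocabulary visible in the output instead of silently dropping them.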
8) RESULT GENERATION
The final result is generated from the overall classification and prediction. The performance of the
proposed approach is evaluated using measures such as the following, where TP, TN, FP and FN are the numbers
of true positives, true negatives, false positives and false negatives:

Accuracy: The accuracy of a classifier is its ability to predict class labels correctly; the accuracy of a
predictor refers to how well it can guess the value of the predicted attribute for new data.

AC = (TP + TN) / (TP + TN + FP + FN)

Precision: Precision is the number of true positives divided by the number of true positives plus the
number of false positives.

Precision = TP / (TP + FP)
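Both measures can be computed directly from a list of true labels and a list of predicted labels. This is a minimal sketch for the binary case (for example "contains text" = 1); the function names are hypothetical, and returning 0.0 precision when nothing is predicted positive is a convention chosen here, not something the slides specify.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary labelling."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """AC = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Precision = TP / (TP + FP); 0.0 when there are no positive predictions."""
    return tp / (tp + fp) if tp + fp else 0.0
```

Worked example: for true labels `[1, 1, 0, 0, 1]` and predictions `[1, 0, 0, 1, 1]`, the counts are TP = 2, TN = 1, FP = 1, FN = 1, so AC = 3 / 5 = 0.6 and Precision = 2 / 3 ≈ 0.667.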
TASK COMPLETED

1. Image Selection
2. Image Preprocessing
3. Segmentation
4. Text Extraction
5. Feature Extraction
6. Classification
7. Translation

TASK TO BE PERFORMED

1. Achieve higher accuracy and precision.

2. Extend the system beyond the 5000 characters it currently works for.
TECHNICAL FEASIBILITY
SOFTWARE REQUIREMENTS

1. OS: Windows 7 or above
2. Language: Python
3. IDE: Spyder (via Anaconda Navigator)

HARDWARE REQUIREMENTS

1. CPU: Pentium IV 2.4 GHz or above
2. Hard disk: 200 GB
3. Keyboard: 110-key enhanced
4. RAM: 4 GB
RELATED WORK
● In 1954, IBM held the first-ever public demonstration of machine translation. The system had
a small vocabulary of only 250 words, and it could translate only 49 hand-picked Russian
sentences into English. The numbers seem minuscule now, but the system is widely regarded as
an important milestone in the progress of machine translation.
● The world's first web translation tool, Babel Fish, was launched by the AltaVista search
engine in 1997.
● Google Translate, launched in 2006, has since changed the way we work (and even learn) with
different languages.
CONCLUSION
➔ Some long-lost ancient languages that have never been deciphered could be the next ones to
get the machine translation treatment.
➔ Translations of ancient texts can serve as content for various digital platforms and could
bring revolutionary changes in the fields of science, medicine and the study of natural
phenomena.
➔ Language translation has always faced ambiguity: the same words are given different meanings
when translated by different sources. ML could help reduce this ambiguity and bring more
uniformity to language translation.
➔ The history of human evolution, its heritage and its culture could be understood in more
authentic ways by relying on translations of these original ancient manuscripts.
