lecture 6
Computer in Linguistics
Farig Sadeque
Assistant Professor
Computer Science and Engineering
BRAC University
Development of Bangla language technology: scope and necessity
Talking points
- Summary of the current status
- Components
- Spell and grammar checker
- Translation
- OCR
- Sentiment analysis
- Speech to text and text to speech
- Plagiarism checker
- Question answering
- Digital assistant
- Sign language to text converter
Summary
Existing Bangla NLP market analysis
- Market opportunity
- Existing tech
- Entry barrier and challenges
Spell and grammar checker
- Market opportunity
- Target segments: journalists, students, translators, government employees
Existing tech
- One spell checker from EBLICT: https://spell.bangla.gov.bd/
- Some other online spell checkers
- No grammar checking or correction tool
- Some state-of-the-art (SOTA) research came out of this year's ভাষাভ্রম (Bhashabhrom) competition
Translation
- English to Bangla and Bangla to English
- Potential Market
- 75% of consumers are more likely to buy products from websites in their native language
- 65% of non-native English speakers prefer content in their native tongue
- The market was valued at USD 650 million in 2020 and is expected to reach USD 3 billion by 2027
- Interested communities:
- Technology & manufacturing: Translate manuals for machinery and other products
- Global businesspeople: Translate to better understand culturally specific statements
- Finance and legal: Translate documents without contextual mistakes
- Marketing (copy & content writers): Translate between Bangla and English to advertise products
- E-commerce: Translate to communicate product information
- Healthcare: Translate important healthcare information
- Freelance writers
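The 2020 and 2027 market figures above imply a compound annual growth rate (CAGR), computed as (end / start)^(1 / years) - 1. The sketch below works through that arithmetic in Python. It assumes both figures describe the same market and that growth compounds once a year over the seven years from 2020 to 2027. The function name `cagr` is only illustrative.

```python
# Implied compound annual growth rate (CAGR) from the slide's market figures.
# Assumption: USD 650 million (2020) and USD 3 billion (2027) describe the
# same market, and growth compounds once per year.

def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.1 == 10%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


start_usd = 650e6      # 2020 valuation quoted on the slide
end_usd = 3e9          # 2027 projection quoted on the slide
years = 2027 - 2020    # 7 yearly compounding periods

rate = cagr(start_usd, end_usd, years)
print(f"Implied CAGR: {rate:.1%}")
```

Running it prints `Implied CAGR: 24.4%`. In other words, the projection assumes the market grows by roughly a quarter every year.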
Existing tech
- Multiple government initiatives
- Amar Vasha was supposed to use artificial intelligence to translate Supreme Court orders and decisions from English to Bangla
- BUET CSE published a Bangla-English translation corpus of 2.75 million sentence pairs
- Google's proprietary machine translation technology, dubbed Google Neural Machine Translation (GNMT), employs recurrent neural networks
- Over 4,000 volunteers from 81 locations throughout the nation entered at least 400,000 words into the translation software on a single day to celebrate Independence Day
Entry barriers
- Bangla language structure
- The collected corpus was never used to build proper software
- Why?
OCR
- Potential market: globally valued at USD 10.65 billion
Segment applications (segment: use case)
- People with certain disabilities: Drafting documents using voice alone