For my capstone project, I decided to address the problem of the inaccuracies of machine translation and the reasons for its errors. To better understand this, I realized I would first need in-depth knowledge of how languages work. This means that I did not study English grammar; I studied the universal rules that apply to all language structures. These rules, such as which letters and changes cause a verb to be in a different tense or an adjective to change to match the subject it is describing, vary from language to language, but each has its own function.
Background research: I have done no small amount of background research on different languages. From November until the beginning of May, I travelled to Fort Worth every Thursday to intern at Teneo Linguistics Company. Teneo is a company that translates documents and modifies them so that they do not have any offensive or unintended meanings in English. They pay special attention to semantics to make sure that a client's advertisements will attract English-speaking customers. Teneo put me in contact with translators of five different languages: Spanish, French, Haitian Creole, German, and Mandarin Chinese. I learned a lot about the quirks of different languages from them. Did you know that Haitian Creole has no passive voice? If you want to speak passively in Creole, it takes a lot of extra effort.
Specify Requirements: The requirements of my capstone project are that I must effectively categorize five languages (Spanish, French, Haitian Creole, German, and Mandarin Chinese) so that I can identify the similarities and differences in their linguistic structures. I will also work to develop theories of how parts of speech are associated so that I can understand language evolution. After understanding these basic concepts, I can then work to improve the algorithms that control machine translation. Someday, I hope to become a computational linguist and work on artificial intelligence (AI), specializing in how the AI system will communicate.
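One way a categorization like this could be recorded is as a small table of structural features, one row per language, that can then be compared pair by pair. The sketch below is only an illustration and is not the model used in this project: the class `LanguageProfile`, the three features chosen, and the helper functions are all hypothetical, and the yes/no values are deliberately simplified textbook generalizations.

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical feature set chosen for illustration; a real model would need many more.
FEATURES = ("grammatical_gender", "lexical_tone", "uses_articles")


@dataclass(frozen=True)
class LanguageProfile:
    """A simplified yes/no description of one language (illustrative only)."""
    name: str
    grammatical_gender: bool  # nouns are sorted into gender classes
    lexical_tone: bool        # pitch distinguishes otherwise identical words
    uses_articles: bool       # the language has words like "the" or "a"


# Simplified values for the five languages studied in this project.
PROFILES = {
    p.name: p
    for p in (
        LanguageProfile("Spanish", True, False, True),
        LanguageProfile("French", True, False, True),
        LanguageProfile("Haitian Creole", False, False, True),
        LanguageProfile("German", True, False, True),
        LanguageProfile("Mandarin Chinese", False, True, False),
    )
}


def shared_features(a: LanguageProfile, b: LanguageProfile) -> list[str]:
    """Return the features on which the two languages have the same value."""
    return [f for f in FEATURES if getattr(a, f) == getattr(b, f)]


def differing_features(a: LanguageProfile, b: LanguageProfile) -> list[str]:
    """Return the features on which the two languages have different values."""
    return [f for f in FEATURES if getattr(a, f) != getattr(b, f)]


if __name__ == "__main__":
    # Compare every pair of languages and list where they differ.
    for a, b in combinations(PROFILES.values(), 2):
        diffs = differing_features(a, b) or ["(none of the listed features)"]
        print(f"{a.name} vs {b.name}: differ on {', '.join(diffs)}")
```

Keeping each language as one record makes it easy to add a new feature later (for example, whether the language has a passive voice) without changing the comparison functions.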
Develop model: Ah, data modeling: this is what's difficult about the project that I've chosen. I have been working on modeling the basic structure of language. As of now, it's a work in progress. I became sidetracked with studying semantics and the troubles of translating homophones, homonyms, and slang. In fact, there is a short booklet about mistranslations here that you can look through. The model is more than sentence structure, though; it's about the different ways words can be connected and the different meanings they can have based on tone, placement, arbitrariness, etc.
Test and refine: I have refined my research, models, and theories countless times as I discovered new information. Linguistics truly is a social science. Almost constant revision is needed because it is such a complex subject, with branches reaching into neuroscience, biology, computer science, history, psychology, and many other fields. I am currently in the process of refining my model of basic universal language structure, so that I can someday transfer those theories to machine translation and artificial intelligence.