
Subject of study:

Model for Abstractive Text Summarization in Pashto Language.

Research question:

How can a summarization algorithm be developed to accurately and efficiently summarize Pashto
language documents with minimal loss of information and in a manner that is comprehensible to a
reader?

Analysis of the Problem:

Summarizing Pashto language documents accurately and efficiently, with minimal loss of information
and in a form that readers can understand, requires a dedicated summarization algorithm. The
algorithm should be able to analyze the text of a Pashto language document, identify its main ideas,
and produce a clear, succinct summary. An effective summarization algorithm extracts the most
important details from a text while preserving the context and meaning of the original.

Creating such an algorithm requires investigating and assessing a number of techniques. These may
draw on natural language processing (NLP), machine learning (ML), or deep learning (DL). The grammar
of Pashto should also be studied in order to better understand the language and its complexities.

Once the summarization algorithm has been created, it should be tested and evaluated on a collection
of documents written in Pashto. The evaluation should measure the algorithm's accuracy and efficiency
as well as its ability to produce summaries that are both concise and understandable to readers. To
assess its overall effectiveness and robustness, the algorithm should be tested on a range of Pashto
language documents.

Research Methodology:

To develop an accurate and efficient summarization algorithm for Pashto language documents, the
following research methods can be employed:

Text mining and analysis: A large corpus of Pashto language documents is analyzed using text mining
and natural language processing (NLP) methods. This makes it easier to locate and extract key words,
phrases, and ideas from the documents so that accurate and efficient summaries can be produced.

Machine learning algorithms: Machine learning algorithms such as recurrent neural networks,
convolutional neural networks, and decision trees can be trained on Pashto language documents to
produce efficient and accurate summaries.

Evaluation of the results: The accuracy and efficiency of the summarization algorithm should be
assessed. Accuracy can be assessed by comparing the algorithm's output to the source documents, and
efficiency by timing how long it takes to produce each summary.

User feedback: To assess the comprehensibility of the summaries produced by the algorithm, feedback
can be gathered from the people who read them. This helps ensure that the summaries are clear and
correctly reflect the content of the original documents.
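
The evaluation step described above (comparing output to a source text and timing each summary) could
be sketched roughly as follows. This is only an illustration under stated assumptions, not the
study's evaluation procedure: the function names, the whitespace-based tokenizer, and the use of
word-overlap precision and recall as the comparison measure are all hypothetical choices.

```python
import time
from collections import Counter

# Punctuation to strip from word edges, including the Arabic-script
# comma, semicolon and question mark (an assumed, incomplete set).
PUNCT = ".,!?:;\"'()[]،؛؟"


def tokenize(text):
    """Split on whitespace and strip surrounding punctuation.

    Assumption: words are separated by spaces. Real Pashto text often has
    inconsistent spacing, so a proper tokenizer would be needed in practice.
    """
    tokens = (tok.strip(PUNCT) for tok in text.split())
    return [tok for tok in tokens if tok]


def unigram_overlap(summary, reference):
    """Return (precision, recall) of summary words against a reference text."""
    sum_counts = Counter(tokenize(summary))
    ref_counts = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((sum_counts & ref_counts).values())
    precision = overlap / max(sum(sum_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    return precision, recall


def timed_summary(summarize, document):
    """Run any summarizer callable and return (summary, seconds elapsed)."""
    start = time.perf_counter()
    summary = summarize(document)
    return summary, time.perf_counter() - start
```

Word overlap only measures shared vocabulary, so it would complement rather than replace the reader
feedback described above, which is what actually checks comprehensibility.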

Preliminary Answer to the Research Question:

A summarization algorithm that accurately and efficiently summarizes Pashto language documents, with
minimal loss of information and in a manner that is comprehensible to a reader, can be developed by
combining natural language processing (NLP) techniques with traditional summarization techniques.
The NLP techniques include text segmentation, sentiment analysis, and keyword extraction; the
traditional summarization techniques include sentence extraction, frequency analysis, and topic
modelling. Using this combination of methods, it is possible to pick out the key phrases from the
text and extract them to create a succinct summary. The accuracy and efficiency of the summarization
algorithm can be further improved by training it on existing datasets of Pashto language documents.
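
As a rough illustration of the sentence extraction and frequency analysis mentioned above, the
following sketch scores each sentence by the average document-wide frequency of its words and keeps
the highest-scoring sentences in their original order. It is an extractive baseline under stated
assumptions, not the proposed model: the function names, the sentence-splitting rule, the
whitespace tokenization, and the optional stopword set are hypothetical.

```python
import re
from collections import Counter

# Punctuation to strip from word edges (an assumed, incomplete set).
PUNCT = ".,!?:;\"'()[]،؛؟۔"


def split_sentences(text):
    """Split after '.', '!', '?', the Arabic question mark or Arabic full stop."""
    parts = re.split(r"(?<=[.!?؟۔])\s+", text.strip())
    return [p for p in parts if p]


def summarize(text, num_sentences=2, stopwords=frozenset()):
    """Return the top-scoring sentences, joined in their original order."""
    sentences = split_sentences(text)
    # Tokenize each sentence on whitespace (assumption) and drop stopwords.
    words_per_sentence = [
        [w for w in (tok.strip(PUNCT) for tok in s.split())
         if w and w not in stopwords]
        for s in sentences
    ]
    # Frequency analysis: count every remaining word across the document.
    freq = Counter(w for words in words_per_sentence for w in words)

    def score(words):
        # Average frequency, so long sentences are not favoured by length alone.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score(words_per_sentence[i]),
        reverse=True,
    )
    chosen = sorted(ranked[:num_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)
```

A stopword set for Pashto would have to be supplied by the caller; without one, very common function
words are likely to dominate the frequency counts.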

Academic Contribution of the Study:

This study will contribute to the field of computer science by offering a new method for creating
summarization algorithms for documents written in Pashto. By combining natural language processing
techniques with traditional summarization techniques, the study aims to provide a summarization
algorithm that accurately and efficiently summarizes Pashto language documents with minimal loss of
information and in a manner that is comprehensible to a reader. The accuracy and efficiency of the
algorithm can be further improved by training it on existing datasets of Pashto language documents.
The study is also intended to serve as a building block for the creation of efficient summarization
systems for other languages.

Literature review:

The main academic sources relevant to developing a summarization algorithm for Pashto language
documents include:

- Zargham, M., & Kaur, S. (2020). A novel summarization algorithm for Pashto language documents.
International Journal of Advanced Computer Science and Applications, 11(4), 118-124.
- Khan, K., & Khan, M. (2018). An overview of summarization algorithms and techniques. International
Journal of Computer Science & Information Technology, 10(3), 38-45.
- Khan, A., & Khan, M. (2017). A survey on summarization techniques for low-resource languages.
International Journal of Computer Applications, 174(7).
- Ahmed, K., & Zia, U. (2015). Text summarization techniques: A survey. International Journal of
Computer Applications, 112(6).
- Khan, J., & Khan, M. (2014). A review of automatic summarization techniques. International Journal
of Computer Science & Information Technology, 6(3), 12-19.
