Text Mining

1) Introduction
2) Areas of text mining
3) Text Mining Process
4) Benefits and opportunities
5) References
Text Data Mining
• Text mining is the process of transforming unstructured
text into structured data to identify meaningful patterns and
new insights.
• Text mining uses natural language processing (NLP),
which allows machines to understand human language
and process it automatically.
Areas of text mining in data mining
Information Extraction:
Information extraction is the automatic extraction of structured data, such as entities,
relationships between entities, and attributes describing entities, from an unstructured source.
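As a rough illustration of turning unstructured text into a structured record, the sketch below pulls two kinds of entities out of a sentence with regular expressions. The patterns, the `extract_entities` helper, and the sample sentence are assumptions made for this example; practical systems typically rely on richer rules or trained models.

```python
import re

# Hypothetical, deliberately simple patterns used only for illustration.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")  # ISO-style dates such as 2022-02-15


def extract_entities(text):
    """Return a structured record of the entities found in free text."""
    return {
        "emails": EMAIL.findall(text),
        "dates": DATE.findall(text),
    }


sample = "Contact jane.doe@example.com before 2022-02-15 about the survey."
record = extract_entities(sample)
print(record)
```

The output is a dictionary, which is the kind of structured form that later steps can query or store in a table.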
Natural Language Processing:
NLP stands for natural language processing.
It enables computer software to understand human
language as it is spoken. NLP is
primarily a component of artificial intelligence.
Data Mining:
Data mining is the process of sorting
through large data sets to identify
patterns and relationships that can help
solve business problems through data
analysis. Data mining techniques and
tools help enterprises to predict future
trends and make more informed business
decisions.
Information Retrieval:
Information retrieval (IR) is a process that
facilitates the effective and efficient retrieval
of relevant information from large collections
of unstructured or semi-structured data.
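One common way to make retrieval efficient is an inverted index, which maps each term to the documents that contain it. The sketch below is a minimal version under simplifying assumptions: whitespace tokenization, raw term counts as the relevance score, and a made-up three-document collection. The function names `build_index` and `search` are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to {doc_id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, query):
    """Rank documents by the summed counts of the query terms they contain."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "d1": "text mining finds patterns in text",
    "d2": "data mining finds patterns in databases",
    "d3": "information retrieval ranks documents",
}
index = build_index(docs)
results = search(index, "text mining")
print(results)
```

Only documents sharing at least one term with the query are scored, so the whole collection never has to be scanned at query time.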
Text Mining Process:
1) Text Pre-processing
Removing All segmentation

Removing stop words
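A minimal sketch of pre-processing, covering tokenization and stop-word removal. The lower-casing, the regular expression used to split tokens, and the tiny `STOP_WORDS` set are assumptions chosen for brevity; real pipelines usually use a much longer, language-specific stop list.

```python
import re

# Hypothetical, very small stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in", "it"}


def preprocess(text):
    """Lower-case the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


tokens = preprocess("Text mining is the process of transforming unstructured text.")
print(tokens)
```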
2) Text transformation (Attribute Generation)
A text document is represented by the words it contains and their occurrences.
A popular approach to document representation is:
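One representation that matches "the words it contains and their occurrences" is a bag-of-words count vector. The sketch below assumes that choice (the slide does not name the approach), and the helper names `bag_of_words` and `to_vector` are hypothetical.

```python
from collections import Counter


def bag_of_words(tokens):
    """Count how often each word occurs in one tokenized document."""
    return Counter(tokens)


def to_vector(counts, vocabulary):
    """Lay the counts out in a fixed vocabulary order, using 0 for absent words."""
    return [counts.get(term, 0) for term in vocabulary]


docs = [["text", "mining", "text"], ["data", "mining"]]
bags = [bag_of_words(d) for d in docs]

# A shared, sorted vocabulary gives every document a vector of the same length.
vocabulary = sorted(set().union(*docs))
vectors = [to_vector(b, vocabulary) for b in bags]
print(vocabulary, vectors)
```

Each position in a vector corresponds to one vocabulary word, so documents become rows of a structured table.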
3) Feature selection (Attribute Selection)
Feature selection, also known as variable selection, attribute selection or variable subset
selection, is the process of selecting a subset of relevant features (variables, predictors)
for use in model construction.
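As one simple instance of feature selection, the sketch below keeps only the terms whose document frequency falls inside a band: terms that appear in too few documents are treated as noise, and terms that appear in nearly all documents are treated as uninformative. The thresholds `min_df` and `max_df_ratio` and the toy documents are assumptions; other criteria, such as chi-square or information gain, are also widely used.

```python
def select_by_document_frequency(docs, min_df=2, max_df_ratio=0.9):
    """Keep terms found in at least min_df documents and at most max_df_ratio of them."""
    n_docs = len(docs)
    df = {}
    for tokens in docs:
        for term in set(tokens):  # count each term once per document
            df[term] = df.get(term, 0) + 1
    return sorted(
        term
        for term, count in df.items()
        if count >= min_df and count / n_docs <= max_df_ratio
    )


docs = [
    ["text", "mining", "patterns"],
    ["text", "mining", "spam"],
    ["text", "retrieval", "patterns"],
    ["text", "mining", "index"],
]
selected = select_by_document_frequency(docs)
print(selected)
```

Here "text" is dropped because it occurs in every document, and the words that occur only once are dropped as too rare.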
4) Data Mining
At this point, the text mining process merges with the traditional data mining process. Classic
data mining techniques are applied to the structured database that resulted from the previous
stages.
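To illustrate a classic technique applied to the structured output of the earlier stages, the sketch below classifies a document vector by its nearest labeled neighbor under cosine similarity. The choice of technique, the labels "sports" and "finance", and the vectors are assumptions for illustration; clustering or association rules would be equally valid examples.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_label(vector, labeled_vectors):
    """Return the label of the most similar training vector."""
    best = max(labeled_vectors, key=lambda item: cosine(vector, item[0]))
    return best[1]


# Hypothetical training data: (count vector, label) pairs.
training = [([2, 1, 0], "sports"), ([0, 1, 3], "finance")]
label = nearest_label([1, 0, 0], training)
print(label)
```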
5) Evaluate (Analyzing Results)
• Evaluate the outcome of the complete process.
• If the results are acceptable, they can be used as input for the next set of sequences.
Otherwise, the results can be discarded, and the aim is to understand what failed in the process and why.
• Visualization: prepare visuals from the data and build a prototype.
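When the mining step is a classifier, one way to judge whether the outcome is acceptable is to compare predictions with known labels using precision, recall, and F1. The slide does not prescribe these metrics, so their use here is an assumption, and the labels "spam" and "ham" are made-up sample data.

```python
def precision_recall_f1(predicted, actual, positive):
    """Compute precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


predicted = ["spam", "spam", "ham", "ham"]
actual = ["spam", "ham", "spam", "ham"]
scores = precision_recall_f1(predicted, actual, positive="spam")
print(scores)
```

Low scores are a signal to revisit an earlier stage, for example the pre-processing rules or the selected features.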
Applications and benefits of text mining
Companies can use text mining in many ways, and its applications
extend across many industries.

• Customer service: sorting requests by identifying their topic, intent, complexity, and
language so that they can be organized and prioritized, and analyzing customer feedback
and opinions about the brand and its products.
• Healthcare: for example, information clustering makes it possible to extract information
from medical books in an automated way.
• Cybersecurity: for example, spam can be detected and filtered automatically in email.
Benefits and opportunities

• Unlocking 'hidden' information and developing new knowledge
• Improved research and evidence base
• Exploring new horizons
• Improving research process and quality
• Broader benefits
References
1) A. Gasparetto, M. Marcuzzo, A. Zangari, and A. Albarelli, "A Survey on Text Classification Algorithms:
From Text to Predictions," Information, vol. 13, no. 2, p. 83, Feb. 2022.
2) C. Audrin and B. Audrin, "Key factors in digital literacy in learning and education: a systematic literature
review using text mining," Education and Information Technologies, vol. 27, pp. 7395–7419, Feb. 2022.
3) V. Dogra, S. Verma, Kavita, P. Chatterjee, J. Shafi, J. Choi, and M. F. Ijaz, "A Complete
Process of Text Classification System Using State-of-the-Art NLP Models," Computational
Intelligence and Neuroscience, vol. 2022, no. 1883698, pp. 1-26, Jun. 2022.