
Text Analytics and Text Mining Overview

•According to a study by Merrill Lynch and Gartner, 85 percent of all
corporate data is captured and stored in some sort of unstructured form
(McKnight, 2005)
•A vast majority of business data are stored in text documents that are
virtually unstructured
•Because knowledge is power in today’s business world, and knowledge is
derived from data and information, businesses that effectively and
efficiently tap into their text data sources will have the necessary
knowledge to make better decisions, leading to a competitive advantage
over those businesses that lag behind
• Even though the overarching goal for both text analytics and text
mining is to turn unstructured textual data into actionable
information through the application of natural language processing
(NLP) and analytics, their definitions are somewhat different, at
least to some experts in the field
• According to them, text analytics is a broader concept that
includes information retrieval (e.g., searching and identifying
relevant documents for a given set of key terms) as well as
information extraction, data mining, and Web mining, whereas
text mining is primarily focused on discovering new and useful
knowledge from the textual data sources
• The following figure illustrates the relationships between text
analytics and text mining along with other related application
areas
• Text Analytics = Information Retrieval + Information Extraction +
Data Mining + Web Mining
or simply

• Text Analytics = Information Retrieval + Text Mining

• While the term text analytics is more commonly used in a business
application context, text mining is frequently used in academic
research circles. Even though they may be defined somewhat
differently at times, text analytics and text mining are usually used
synonymously
• Text mining is the same as data mining in that it has the same
purpose and uses the same processes, but with text mining the
input to the process is a collection of unstructured (or less
structured) data files such as Word documents, PDF files, text
excerpts, XML files, and so on

• In essence, text mining can be thought of as a process (with two
main steps) that starts with imposing structure on the text-based
data sources followed by extracting relevant information and
knowledge from this structured text-based data using data mining
techniques and tools
• The benefits of text mining are obvious in the areas where very
large amounts of textual data are being generated, such as law
(court orders), academic research (research articles), finance
(quarterly reports), medicine (discharge summaries), biology
(molecular interactions), technology (patent files), and marketing
(customer comments)

• Analysing customer complaints and feedback
• Focus group data
• Email
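
The two-step process described above (impose structure on the text, then apply a data mining technique) can be sketched in a few lines of Python. This is only a minimal illustration: the bag-of-words term-document matrix and the cosine-similarity comparison are assumed choices, the function names are hypothetical, and the customer comments are made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_term_document_matrix(documents):
    """Step 1: impose structure. Each row holds one document's term counts,
    with columns in sorted vocabulary order."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_pair(matrix):
    """Step 2: a simple mining step. Return the indices of the two most
    similar documents together with their similarity score."""
    best_pair, best_score = None, -1.0
    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            score = cosine_similarity(matrix[i], matrix[j])
            if score > best_score:
                best_pair, best_score = (i, j), score
    return best_pair, best_score


# Hypothetical customer comments, invented for illustration only.
comments = [
    "The delivery was late and the package was damaged",
    "Late delivery again, and the box arrived damaged",
    "Great price and friendly support staff",
]

vocabulary, matrix = build_term_document_matrix(comments)
pair, score = most_similar_pair(matrix)
print("Most similar comments:", pair, "similarity:", round(score, 2))
```

Here the two delivery complaints are paired because they share the most terms. Real projects typically add stop-word removal, stemming, and term weighting before the mining step.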
Popular application areas of text mining

•Information extraction

•Topic tracking

•Summarization

•Categorization

•Clustering

•Concept linking

•Question answering
• Information extraction: Identification of key
phrases and relationships within text by
looking for predefined objects and sequences
in text by way of pattern matching.

• Topic tracking: Based on a user profile and
documents that a user views, text mining can
predict other documents of interest to the
user.
• Summarization: Summarizing a document to
save time on the part of the reader.

• Categorization: Identifying the main themes
of a document and then placing the document
into a predefined set of categories based on
those themes.

• Clustering: Grouping similar documents
without having a predefined set of categories.
• Concept linking: Connects related documents
by identifying their shared concepts and, by
doing so, helps users find information that
they perhaps would not have found using
traditional search methods.

• Question answering: Finding the best answer
to a given question through knowledge-driven
pattern matching.
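
As a rough illustration of information extraction by pattern matching, as defined above, the sketch below looks for a few predefined objects in a piece of text using regular expressions. The patterns, the `extract_entities` helper, and the call-center note are all hypothetical and deliberately simplistic; production systems use far richer patterns or trained models.

```python
import re

# Hypothetical predefined patterns; simplified for illustration only.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),   # ISO-style dates only
    "amount": re.compile(r"\$\d+(?:\.\d{2})?"),     # dollar amounts only
}


def extract_entities(text):
    """Return every match of each predefined pattern, keyed by its label."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}


# Made-up call-center note.
note = "Customer jane.doe@example.com called on 2024-03-15 about a $49.99 refund."

for label, matches in extract_entities(note).items():
    print(label, matches)
```

The result is a small structured record (label to matched strings) that downstream data mining steps can work with.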
• Machine learning is the branch of computer science that tries to give
computers the ability to learn without being explicitly programmed. One of
the areas of study in ML is Natural Language Processing

• Natural Language Processing broadly refers to the study and development of
computer systems that can interpret speech and text as humans naturally speak
and type it

• Natural language cannot be analyzed without a clear understanding of how
language forms are built and without deep knowledge of language meaning and
context. This is where linguistics helps researchers do their work. The study
of language plays a significant role in human language analysis; linguistics
observes the interplay between sound and meaning.

• Linguistics is the scientific study of language, including its grammar, semantics,
and phonetics.
Other Text Mining Applications:

Marketing Applications
•Increase cross-selling and up-selling by analyzing the unstructured
data generated by call centers. Text generated by call center notes as
well as transcriptions of voice conversations with customers can be
analyzed by text mining algorithms to extract novel, actionable
information about customers’ perceptions toward a company’s
products and services.

•Additionally, blogs, user reviews of products at independent Web
sites, and discussion board postings are a gold mine of customer
sentiments. This rich collection of information, once properly
analyzed, can be used to increase satisfaction and the overall lifetime
value of the customer
Security Applications

•One of the largest and most prominent text mining applications in the
security domain is probably the highly classified ECHELON surveillance
system. As rumour has it, ECHELON is assumed to be capable of
identifying the content of telephone calls, faxes, e-mails, and other
types of data, intercepting information sent via satellites, public-
switched telephone networks, and microwave links
•In 2007, EUROPOL developed an integrated system capable of
accessing, storing, and analyzing vast amounts of structured and
unstructured data sources in order to track transnational organized
crime. Called the Overall Analysis System for Intelligence Support
(OASIS), this system aims to integrate the most advanced data and text
mining technologies available in today’s market. The system has enabled
EUROPOL to make significant progress in supporting its law enforcement
objectives at the international level (EUROPOL, 2007)
• Another security-related application of text mining is in the area of
deception detection. Applying text mining to a large set of real-world
criminal (person-of-interest) statements, Fuller et al. (2008)
developed prediction models to differentiate deceptive statements
from truthful ones

• Using a rich set of cues extracted from the textual statements, the
model predicted the holdout samples with 70 percent accuracy,
which is believed to be a significant success considering that the
cues are extracted only from textual statements (no verbal or
visual cues are present)
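
To make the term "holdout accuracy" concrete: it is the share of statements set aside from training that the model labels correctly. The sketch below computes it on toy labels. The `holdout_accuracy` function is a hypothetical helper, and the labels are invented purely for illustration; they are not data from Fuller et al. (2008).

```python
def holdout_accuracy(true_labels, predicted_labels):
    """Fraction of holdout samples whose predicted label matches the true one."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one holdout sample is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Toy labels invented for illustration; NOT data from the cited study.
# D = deceptive, T = truthful.
true_labels = ["D", "T", "T", "D", "T", "D", "T", "T", "D", "D"]
predicted_labels = ["D", "T", "D", "D", "T", "T", "T", "T", "D", "T"]

accuracy = holdout_accuracy(true_labels, predicted_labels)
print(f"Holdout accuracy: {accuracy:.0%}")
```

Because the holdout statements were never shown to the model during training, this figure estimates how well the model would do on new statements.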
Biomedical Applications

• The identification of chemical compounds: identifying their structures and the
relations between them; and identifying drugs in which the particular
compound is used, along with their respective side effects and toxicity

• Disease research such as cancer: several applications have been developed to provide
easy access to the most recent developments in cancer research

• Genetics: gathering the most recent information about complex processes
involving genes and proteins

• Indexing Medline documents

• Finding risk factors of a disease


Academic Applications

• Analysing student applications


• Evaluation of written assignments
• Course evaluations
• Classifying open ended survey responses
• Course recommendations to students
