Text as Data: Computational Methods of Understanding Written Expression
Using SAS (Wiley and SAS Business Series) 1st Edition Deville
Text as Data
Wiley and SAS
Business Series
The Wiley and SAS Business Series presents books that help senior-level
managers with their critical management decisions.
Titles in the Wiley and SAS Business Series include:

The Analytic Hospitality Executive: Implementing Data Analytics in Hotels and Casinos by Kelly A. McGuire
Analytics: The Agile Way by Phil Simon
The Analytics Lifecycle Toolkit: A Practical Guide for an Effective Analytics Capability by Gregory S. Nelson
Anti-Money Laundering Transaction Monitoring Systems Implementation: Finding Anomalies by Derek Chau and Maarten van Dijck Nemcsik
Artificial Intelligence for Marketing: Practical Applications by Jim Sterne
Business Analytics for Managers: Taking Business Intelligence Beyond Reporting (Second Edition) by Gert H. N. Laursen and Jesper Thorlund
Business Forecasting: The Emerging Role of Artificial Intelligence and Machine Learning by Michael Gilliland, Len Tashman, and Udo Sglavo
The Cloud-Based Demand-Driven Supply Chain by Vinit Sharma
Consumption-Based Forecasting and Planning: Predicting Changing Demand Patterns in the New Digital Economy by Charles W. Chase
Credit Risk Analytics: Measurement Techniques, Applications, and Examples in SAS by Bart Baesens, Daniel Roesch, and Harald Scheule
Demand-Driven Inventory Optimization and Replenishment: Creating a More Efficient Supply Chain (Second Edition) by Robert A. Davis
Economic Modeling in the Post Great Recession Era: Incomplete Data, Imperfect Markets by John Silvia, Azhar Iqbal, and Sarah Watt House
Enhance Oil & Gas Exploration with Data-Driven Geophysical and Petrophysical Models by Keith Holdaway and Duncan Irving
Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection by Bart Baesens, Veronique Van Vlasselaer, and Wouter Verbeke
Intelligent Credit Scoring: Building and Implementing Better Credit Risk Scorecards (Second Edition) by Naeem Siddiqi
JMP Connections: The Art of Utilizing Connections in Your Data by John Wubbel
Leaders and Innovators: How Data-Driven Organizations Are Winning with Analytics by Tho H. Nguyen
On-Camera Coach: Tools and Techniques for Business Professionals in a Video-Driven World by Karin Reed
Next Generation Demand Management: People, Process, Analytics, and Technology by Charles W. Chase
A Practical Guide to Analytics for Governments: Using Big Data for Good by Marie Lowman
Profit from Your Forecasting Software: A Best Practice Guide for Sales Forecasters by Paul Goodwin
Project Finance for Business Development by John E. Triantis
Smart Cities, Smart Future: Showcasing Tomorrow by Mike Barlow and Cornelia Levy-Bencheton
Statistical Thinking: Improving Business Performance (Third Edition) by Roger W. Hoerl and Ronald D. Snee
Strategies in Biomedical Data Science: Driving Force for Innovation by Jay Etchings
Style and Statistics: The Art of Retail Analytics by Brittany Bullard
Text as Data: Computational Methods of Understanding Written Expression Using SAS by Barry deVille and Gurpreet Singh Bawa
Transforming Healthcare Analytics: The Quest for Healthy Intelligence by Michael N. Lewis and Tho H. Nguyen
Visual Six Sigma: Making Data Analysis Lean (Second Edition) by Ian Cox, Marie A. Gaudard, and Mia L. Stephens
Warranty Fraud Management: Reducing Fraud and Other Excess Costs in Warranty and Service Operations by Matti Kurvinen, Ilkka Töyrylä, and D. N. Prabhakar Murthy

For more information on any of the above titles, please visit www.wiley.com.
Text as Data
Computational Methods
of Understanding Written Expression
Using SAS

By
Barry deVille and
Gurpreet Singh Bawa
Copyright © 2022 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.

Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means, electronic, mechanical, photocopying,
recording, scanning, or otherwise, except as permitted under Section 107 or 108 of
the 1976 United States Copyright Act, without either the prior written permission
of the Publisher, or authorization through payment of the appropriate per-copy fee to
the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978)
750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the
Publisher for permission should be addressed to the Permissions Department, John
Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201)
748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have
used their best efforts in preparing this book, they make no representations or
warranties with respect to the accuracy or completeness of the contents of this book
and specifically disclaim any implied warranties of merchantability or fitness for a
particular purpose. No warranty may be created or extended by sales representatives
or written sales materials. The advice and strategies contained herein may not be
suitable for your situation. You should consult with a professional where appropriate.
Neither the publisher nor author shall be liable for any loss of profit or any other
commercial damages, including but not limited to special, incidental, consequential, or
other damages.

For general information on our other products and services or for technical support,
please contact our Customer Care Department within the United States at (800) 762-2974,
outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that
appears in print may not be available in electronic formats. For more information
about Wiley products, visit our website at www.wiley.com.

Library of Congress Cataloging-in-Publication Data is Available:

9781119487128 (hardback)
9781119487173 (ePDF)
9781119487159 (ePub)

Cover Design: Wiley


To all those who unconditionally love and support authors
and their writing processes – especially our life partners, Maya
McNeilly and Dilpreet Kaur, who go above and beyond.
Contents

Preface
Acknowledgments
About the Authors

Introduction
Chapter 1 Text Mining and Text Analytics
Chapter 2 Text Analytics Process Overview
Chapter 3 Text Data Source Capture
Chapter 4 Document Content and Characterization
Chapter 5 Textual Abstraction: Latent Structure, Dimension Reduction
Chapter 6 Classification and Prediction
Chapter 7 Boolean Methods of Classification and Prediction
Chapter 8 Speech to Text

Appendix A Mood State Identification in Text
Appendix B A Design Approach to Characterizing Users Based on Audio Interactions on a Conversational AI Platform
Appendix C SAS Patents in Text Analytics
Glossary
Index
Preface

This book provides an end-to-end description of the text analytics
process with examples drawn from a range of case studies using
various capabilities of SAS text analytics and the associated SAS
computing environment. Qualitative and quantitative approaches within
the SAS environment are covered across the entire text analytics life
cycle, from document capture through document characterization and
document understanding to operational deployment.
We cover procedure-based, engineering approaches to text
analytics, as well as more discovery-based quantitative approaches.
Since much of the text analytics process depends on the text capture
and text preprocessing environment, these aspects of text analytics are
covered as well.
Acknowledgments

This work was initiated and promoted by Julie Palmieri, serving
as editor-in-chief of SAS Press. James Allen Cox has consistently
offered advice and review throughout and gave a detailed review
of early versions of the draft. Tom Sabo gave advice and review and
made significant contributions to the chapter on Boolean rules. Our
colleagues Saratendu Sethi, Terry Woodfield, and Sanford Gayle
have provided decades of advice on text analytics in general. Elisha
Benjamin of John Wiley & Sons was a great source of advice and
assistance throughout the project. Wiley executive editor Sheck Cho
is the consummate professional and both a rock and a beacon for us
aspiring authors.
As authors, we acknowledge their invaluable advice, assistance,
encouragement, and also humbly acknowledge that any remaining
faults are ours alone.
About the Authors

Barry deVille is a practitioner, developer, and author in the fields
of statistics, data science, and text analytics. During a decades-long
career at SAS, he collaborated extensively with the text analytics R&D
team, deploying text mining solutions to a variety of
global clients in various industrial, financial, health, and social media
applications. This work resulted in the award of numerous US patents
on decision tree induction algorithms and multidimensional text
analytics. Prior to joining SAS, he worked with the National Research
Council and other government and commercial entities in Canada in
the development and commercialization of statistical and machine
learning algorithms.

Gurpreet Singh Bawa has practiced internationally in the areas
of statistics with an emphasis on artificial intelligence (AI) and
machine learning (ML). He was awarded a PhD at Panjab University,
Chandigarh, India, in the fields of AI and ML. He has authored
numerous publications in national and international journals.
His research in the areas of unstructured data analysis has led to
numerous patent applications and awards (including one with
co-author deVille on social community identification and automatic
document classification). He also works in breakeven analysis and
portfolio optimization. He is currently authoring a book on advanced
mathematics.