FEATURE ENGINEERING FOR
MACHINE LEARNING AND
DATA ANALYTICS
Chapman & Hall/CRC
Data Mining and Knowledge Series
Series Editor: Vipin Kumar
RapidMiner
Data Mining Use Cases and Business Analytics Applications
Markus Hofmann and Ralf Klinkenberg
Computational Business Analytics
Subrata Das
Data Classification
Algorithms and Applications
Charu C. Aggarwal
Healthcare Data Analytics
Chandan K. Reddy and Charu C. Aggarwal
Accelerating Discovery
Mining Unstructured Information for Hypothesis Generation
Scott Spangler
Event Mining
Algorithms and Applications
Tao Li
Text Mining and Visualization
Case Studies Using Open-Source Tools
Markus Hofmann and Andrew Chisholm
Graph-Based Social Media Analysis
Ioannis Pitas
Data Mining
A Tutorial-Based Primer, Second Edition
Richard J. Roiger
Data Mining with R
Learning with Case Studies, Second Edition
Luís Torgo
Social Networks with Rich Edge Semantics
Quan Zheng and David Skillicorn
Large-Scale Machine Learning in the Earth Sciences
Ashok N. Srivastava, Ramakrishna Nemani, and Karsten Steinhaeuser
Data Science and Analytics with Python
Jesus Rogel-Salazar
Feature Engineering for Machine Learning and Data Analytics
Guozhu Dong and Huan Liu
Edited by
Guozhu Dong and Huan Liu
MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks
does not warrant the accuracy of the text or exercises in this book. This book’s use or discussion
of MATLAB® software or related products does not constitute endorsement or sponsorship by The
MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc.
(CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization
that provides licenses and registration for a variety of users. For organizations that have been granted
a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
To my family, especially baby Hazel [G. D.]
Contents
Preface
Contributors
Index
Preface
Feature engineering plays a vital role in big data analytics. Machine learning
and data mining algorithms cannot work without data. Little can be achieved
if there are few features to represent the underlying data objects, and the
quality of the results of those algorithms largely depends on the quality of the
available features. Data can exist in various forms, such as images, text, graphs,
sequences, and time series. A common way to represent data for data analytics
is to use feature vectors. Feature engineering addresses the need to generate
and select useful features, as well as several other issues.
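As a rough illustration of what representing data as a feature vector can look like, the sketch below turns a short text and a short time series into fixed-length lists of numbers. The function names, the vocabulary, and the choice of word counts and summary statistics are assumptions made for this example only; they are not methods taken from the chapters of this book.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    # One vector position per vocabulary word, in vocabulary order.
    return [counts[word] for word in vocabulary]


def series_summary(values):
    """Summarize a numeric series as [mean, minimum, maximum]."""
    return [sum(values) / len(values), min(values), max(values)]


# Hypothetical inputs, chosen only to keep the example small.
vocabulary = ["data", "feature", "model"]
text_vector = bag_of_words(
    "Feature engineering turns raw data into feature vectors", vocabulary
)
series_vector = series_summary([2.0, 4.0, 6.0])

print(text_vector)    # [1, 2, 0]
print(series_vector)  # [4.0, 2.0, 6.0]
```

Both results are plain numeric vectors of fixed length, which is the form most learning algorithms expect, even though the two inputs are of very different data types.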
This book is devoted to feature engineering. It covers various aspects
of feature engineering, including feature generation, feature extraction,
feature transformation, feature selection, and feature analysis and evaluation.
It presents concepts, methods, and examples, as well as applications.
Feature engineering is often data type specific and application dependent.
This calls for multiple chapters on different data types that require specialized
feature engineering techniques to meet various data analytic needs. Hence, this
book contains chapters on feature engineering for major data types such as
texts, images, sequences, time series, graphs, streaming data, software
engineering data, Twitter data, and social media data. It also contains generic
feature generation approaches, as well as methods for generating
tried-and-tested, hand-crafted, domain-specific features.
This book contains many useful feature engineering concepts and techniques,
which are an important part of machine learning and data analytics.
They can help readers to meet their needs in multiple scenarios: (a) generate
features to represent the data when there are no features, (b) generate
effective features when (one may be concerned that) existing features are
not good/competitive enough, (c) select features when there are too many
features, (d) generate and select effective features for specific types of
applications, and (e) understand the challenges associated with, and the needed
approaches to handle, various data types. This list is certainly not exhaustive.
The first chapter is an introduction, which defines the concepts of features
and feature engineering, offers an overview of the book, and provides
pointers to topics not covered in this book. The next six chapters are devoted
to feature engineering, including feature generation, for specific data types,
namely texts, images, sequences, time series, graphs, and streaming data. The
subsequent four chapters cover generic approaches for feature engineering,
namely feature selection, feature transformation-based feature engineering,