
Feature Engineering

(In Machine Learning)


What is a feature?

An attribute (coordinate) of an observation (point) that is important from a learning or prediction point of view.

Not all attributes are features.

Examples of Features …
• An attribute (a column in a table)
• A line in an image
• A phrase
• A word count

What is Feature Engineering?

A set of steps taken to present the original and/or transformed data to a machine learning algorithm, such that important structures inherent in the data are exposed for the purpose of model creation.

Feature Engineering: when required, and not …
• Feature engineering is required when …
– Limited data is available
• “Curse of dimensionality” if too many features are considered in model building
• Over-fitting when there are many features but little data
– Limited computation power

• Feature engineering may not be required when …
– Copious data is available (e.g. images, server logs)
– Computation power is not an issue (e.g. cloud computing)
– Most important: availability of universal function approximators
• Artificial Neural Networks, Deep Learning Networks

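The over-fitting point above can be demonstrated numerically. This is an illustrative sketch (not from the slides): with more features than observations, ordinary least squares fits even pure noise perfectly on the training data.

```python
import numpy as np

# Illustrative sketch: with fewer observations than features, ordinary
# least squares can fit the training data perfectly even when the
# features are pure noise -- a symptom of over-fitting.
rng = np.random.default_rng(0)
n_samples, n_features = 10, 20          # fewer observations than features
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)          # target unrelated to X

# Minimum-norm least-squares solution
w, *_ = np.linalg.lstsq(X, y, rcond=None)
train_error = np.max(np.abs(X @ w - y))
print(train_error)                      # essentially zero: a perfect "fit" to noise
```

Such a model has learned nothing that generalises, which is why feature engineering matters when data is limited.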
The Feature Engineering Process
Execution of the following steps:

1. Create / identify a set of relevant features
2. Fit a model and run validation tests
3. Re-design or re-select features based on the validation results
4. Repeat step 2

Repeat the process until ‘satisfactory’ results are obtained or there is no further improvement.

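The steps above can be sketched as a small loop; the synthetic data, the train/validation split, and the candidate feature sets below are purely illustrative assumptions:

```python
import numpy as np

# A minimal sketch of the iterative process: fit on a training split,
# validate, then re-select features and repeat (all names illustrative).
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=n)  # only cols 0 and 2 matter

X_train, X_val = X[:150], X[150:]
y_train, y_val = y[:150], y[150:]

def validation_mse(cols):
    """Step 2: fit least squares on the chosen columns, score on validation."""
    w, *_ = np.linalg.lstsq(X_train[:, cols], y_train, rcond=None)
    resid = X_val[:, cols] @ w - y_val
    return float(np.mean(resid ** 2))

# Steps 1, 3, 4: try successive candidate feature sets, keep the best
candidates = [[0], [0, 1], [0, 2], [0, 1, 2, 3, 4]]
scores = {tuple(c): validation_mse(c) for c in candidates}
best = min(scores, key=scores.get)
print(best, scores[best])
```

In practice the re-design in step 3 is guided by domain knowledge and error analysis, not a fixed candidate list.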
Components of Feature Engineering
• Feature Extraction
• Feature Creation
• Feature Selection
• Dimensionality Reduction
– PCA, SVD

Feature Extraction
• Goal
– To increase the level of abstraction
– To reduce the total data sent into learning algorithms
• Example
– Edge detection in images
– Curvature detection in 3D models
– Number of concavities / convexities in 3D models
– Identifying regions with the same “colours”
• Satellite imaging
• Temperature based tool condition monitoring
• MRI / X-Ray processing
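The edge-detection example above can be sketched with a small Sobel-style filter; the kernel and the synthetic image are illustrative assumptions, not from the slides:

```python
import numpy as np

# Illustrative sketch: a tiny Sobel-style horizontal-edge detector, the
# classic feature-extraction step for images mentioned above.
def edge_strength(img):
    """Convolve with a vertical-gradient kernel (valid region only)."""
    kernel = np.array([[-1, -2, -1],
                       [ 0,  0,  0],
                       [ 1,  2,  1]], dtype=float)   # Sobel Gy kernel
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(kernel * img[i:i + 3, j:j + 3])
    return np.abs(out)

# A synthetic image: dark top half, bright bottom half -> one horizontal edge
img = np.zeros((8, 8))
img[4:, :] = 1.0
edges = edge_strength(img)
print(edges)   # large responses only along the boundary rows
```

The filter responses (rather than the raw pixels) are then passed to the learning algorithm, raising the level of abstraction and shrinking the data volume.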
Feature Creation
• Goal: To create a set of attributes
– based on domain knowledge or pre-processing /
visualization
– that are known to better describe the structure of the data
to be processed
• Example:
– In linear regression, addition of new terms like ‘log’, ‘tanh’, ‘exp’, ‘sin’, square, cube, x1 * x2 (feature combinations), etc.
– One-hot encoding: creation of dummy variables
– Discretizing continuous attributes
– Combining multiple attributes into one feature
– Addition of new terms resulting from ‘feature extraction’
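The linear-regression example above can be sketched in a few lines; the variable names and data are illustrative:

```python
import numpy as np

# Minimal sketch of feature creation: augment raw attributes x1, x2 with
# log, square, and interaction terms, as listed above.
rng = np.random.default_rng(2)
x1 = rng.uniform(1.0, 5.0, size=100)    # kept positive so log is defined
x2 = rng.uniform(1.0, 5.0, size=100)

# Original attributes plus engineered features, one column each
X = np.column_stack([
    x1,                # raw attribute
    x2,                # raw attribute
    np.log(x1),        # 'log' term
    x1 ** 2,           # square term
    x1 * x2,           # feature combination (interaction)
])
print(X.shape)         # two raw attributes become five features
```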
One-hot encoding
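A minimal sketch of one-hot encoding in plain NumPy (the ‘colours’ column is an invented example): each category value becomes its own 0/1 dummy variable.

```python
import numpy as np

# One-hot encoding: one 0/1 dummy column per distinct category
colours = np.array(["red", "green", "blue", "green", "red"])
categories = np.unique(colours)                   # sorted distinct values
one_hot = (colours[:, None] == categories[None, :]).astype(int)
print(categories)
print(one_hot)   # exactly one 1 per row
```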
Feature Selection
• Goal: To reduce the total number of ‘features’ sent
into the machine learning algorithm
– To reduce model complexity and model computation time
• Methods
– Forward selection:
• Start with minimal set and gradually add features
– Backward selection:
• Start with a maximal set and gradually reduce features
– Filter methods
• Based on statistics such as the Pearson correlation coefficient
– Embedded methods
• LASSO (Least Absolute Shrinkage and Selection Operator: L1
penalty), Ridge (L2), ElasticNet (L1+L2) Regression

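Greedy forward selection can be sketched as follows; the synthetic data, split, and stopping rule are illustrative assumptions, not from the slides:

```python
import numpy as np

# Forward selection sketch: start from an empty set and repeatedly add
# the feature that most lowers validation MSE; stop when nothing helps.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 6))
y = 2.0 * X[:, 1] + 1.5 * X[:, 4] + 0.1 * rng.normal(size=300)  # cols 1, 4 matter

X_tr, X_va, y_tr, y_va = X[:200], X[200:], y[:200], y[200:]

def mse(cols):
    """Least-squares fit on the chosen columns, scored on validation."""
    w, *_ = np.linalg.lstsq(X_tr[:, cols], y_tr, rcond=None)
    return float(np.mean((X_va[:, cols] @ w - y_va) ** 2))

selected, remaining = [], list(range(6))
best_score = float("inf")
while remaining:
    cand = min(remaining, key=lambda j: mse(selected + [j]))
    score = mse(selected + [cand])
    if score >= best_score:        # no candidate improves validation MSE
        break
    selected.append(cand)
    remaining.remove(cand)
    best_score = score
print(selected)
```

Backward selection runs the same loop in reverse, starting from all six columns and dropping the least useful feature at each step.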
Dimensionality Reduction
• Goal:
– To reduce the number of features by identifying informative feature combinations
• Example
– Principal Component Analysis

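Principal Component Analysis can be sketched via the singular value decomposition; the synthetic 3-D data below is an illustrative assumption, constructed to lie close to a 2-D plane:

```python
import numpy as np

# PCA sketch via the SVD: project centred data onto the top-k right
# singular vectors (the principal components).
rng = np.random.default_rng(4)
latent = rng.normal(size=(200, 2))                     # true 2-D structure
mixing = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, -0.3]])
X = latent @ mixing.T + 0.01 * rng.normal(size=(200, 3))  # 3-D observations

Xc = X - X.mean(axis=0)                    # centre each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S ** 2 / np.sum(S ** 2)        # variance ratio per component
X_reduced = Xc @ Vt[:2].T                  # keep the first two components
print(explained)
print(X_reduced.shape)
```

The two derived columns are linear combinations of all three original features, so almost no information is lost while a whole dimension is removed.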