
Feature Selection Algorithm
Functions
def function_name():
    print("test")

function_name()
# Output: test

def add(x, y):
    return x + y  # you can either return a value and print it at the call site, or print it directly

print(add(2, 3))
# Output: 5
ML Algorithms

 Logistic regression: used if the value to predict is Boolean (true or false, 1 or 0).

 Linear regression: used if the value to predict is continuous or numerical and the correlation between values is linear and strong.

 Polynomial regression: like linear regression, but used when the linear correlation value is low.
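The linear vs. polynomial distinction above can be illustrated with a small sketch; the data is made up and numpy's polyfit stands in for whatever fitting routine the project actually uses:

```python
import numpy as np

x = np.linspace(0, 5, 50)
y_linear = 2 * x + 1   # strong linear correlation
y_curved = x ** 2      # nonlinear relationship

# A degree-1 (straight-line) fit recovers the linear data exactly...
slope, intercept = np.polyfit(x, y_linear, 1)
print(round(slope, 2), round(intercept, 2))   # 2.0 1.0

# ...but the curved data needs a higher-degree (polynomial) fit to recover x^2
coeffs = np.polyfit(x, y_curved, 2)
print(coeffs[0])   # leading coefficient close to 1.0
```

A degree-1 fit on the curved data would leave a low correlation (R^2), which is exactly the signal for switching to polynomial regression.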
Imputing Missing Values:
Imputing is a common data pre-processing technique used in machine learning and data analysis to handle missing values in a dataset. When a dataset has missing data points, you can either remove the affected rows or fill in the gaps with appropriate values. Imputing replaces missing values with estimated or calculated values based on the available data.
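A minimal sketch of mean imputation using plain numpy (the toy matrix is illustrative; libraries such as scikit-learn also offer a ready-made SimpleImputer for this):

```python
import numpy as np

# Toy feature matrix with missing entries encoded as np.nan
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])

# Mean-impute: replace each NaN with the mean of its column
col_means = np.nanmean(X, axis=0)      # column means, ignoring NaNs
rows, cols = np.where(np.isnan(X))     # positions of the missing entries
X[rows, cols] = col_means[cols]

print(X)   # the NaNs become 4.0 (mean of 1 and 7) and 2.5 (mean of 2 and 3)
```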
Feature Selection Using
Combinations (nCr)
1. Feature Names Array: You start with a list or array of feature names. These are the variables or attributes that you want to consider for your predictive model.

2. Generating Combinations: You use the itertools library in Python to generate combinations of the feature names array. Combinations are all possible subsets of a given size, chosen without regard to order, i.e.: [[f1,f2,f3],[f6,f2,f4],[f5,f3,f1]…..]
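Step 2 can be sketched with itertools directly (the feature names here are illustrative):

```python
import itertools

features = ["f1", "f2", "f3", "f4"]   # illustrative feature names

# All subsets of size 3, order-independent: C(4, 3) = 4 combinations
combos = list(itertools.combinations(features, 3))
print(len(combos))   # 4
print(combos[0])     # ('f1', 'f2', 'f3')
```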

3. Selecting a Subset of Features: Each generated combination is a subset of 'n' feature names, where 'n' is the number of features you want to consider in your model. For example, if 'n' is 3, each combination contains three feature names.

4. Model Fitting and R-squared: You fit a regression model using the selected subset of features. Note that this model-fitting step is repeated for each combination of features. For each model, you calculate the R-squared (R^2) value, which measures how well the model fits the data.

5. Storing R-squared Values: You maintain an array or list 'R' to store the R-squared values obtained from each model.

6. Selecting the Best Model: After fitting models with all combinations of features, you identify the combination that produced the highest R-squared value. This
combination of features is considered the best for your predictive model.

7. Final Model and Predictions: You create a final regression model using the selected best features and use it to predict outcomes. This model is expected to perform
well, given that it was chosen based on the highest R-squared value. You can also consider creating models with the 2nd and 3rd best features from the 'R' array.

8. Model Evaluation: After building your final models, you can evaluate their performance using appropriate metrics (e.g., Mean Squared Error or Root Mean Squared Error; in scikit-learn, model.score reports R^2 for regressors) and assess how well they generalize to new data.
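Steps 1–6 above can be sketched end-to-end as follows; the toy dataset, the feature names, and the r_squared helper are illustrative stand-ins, not the original project's code:

```python
import itertools
import numpy as np

def r_squared(X, y):
    """Fit ordinary least squares (with intercept) and return R^2."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    return 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

# Toy dataset: y truly depends on f1 and f3 only
rng = np.random.default_rng(0)
data = {name: rng.normal(size=50) for name in ["f1", "f2", "f3", "f4"]}
y = 2 * data["f1"] - data["f3"] + rng.normal(scale=0.1, size=50)

n = 2   # number of features per subset
R = []  # step 5: store (R^2, combination) pairs
for combo in itertools.combinations(data, n):      # step 2: nCr combinations
    X = np.column_stack([data[f] for f in combo])  # step 3: feature subset
    R.append((r_squared(X, y), combo))             # step 4: fit and score

best_r2, best_features = max(R)                    # step 6: highest R^2 wins
print(best_features)   # ('f1', 'f3')
```

Sorting R in descending order also gives the 2nd- and 3rd-best feature sets mentioned in step 7. Note that exhaustively fitting every combination grows quickly with the number of features, which is fine for small feature lists like this one.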
Flowchart for Feature Selection and
Linear Regression
Code
Project modules

 Cleaning, sorting, and imputing data.

 Integrate different regression types, e.g.: if r^2 < 0.45 (45%) then we use the polynomial
regression function, and if type(criteria) == bool then we use the logistic regression
function

 Altogether we will have 5 modules:

 Feature selection and combinations
 Linear regression function()
 Polynomial regression function()
 Logistic regression function()
 A decision function which will decide which model we should use based on the
correlation value between features
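A minimal sketch of the decision module described above, reusing the 0.45 R^2 threshold from the earlier bullet; the function name and signature are placeholders, not the project's actual interface:

```python
def choose_model(criteria_value, r_squared):
    """Pick a regression type from the target's type and preliminary fit quality.

    criteria_value: a sample value of the variable to predict.
    r_squared: R^2 of a preliminary linear fit on the chosen features.
    """
    if isinstance(criteria_value, bool):   # Boolean target -> logistic regression
        return "logistic"
    if r_squared < 0.45:                   # weak linear correlation -> polynomial
        return "polynomial"
    return "linear"

print(choose_model(True, 0.90))   # logistic
print(choose_model(3.7, 0.30))    # polynomial
print(choose_model(3.7, 0.82))    # linear
```

Checking the Boolean condition first mirrors the bullet above: a true/false target always routes to logistic regression regardless of the correlation value.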
