MACHINE LEARNING WITH PYTHON
SEMESTER 5
UNIT - 1
HI COLLEGE
INTRODUCTION TO MACHINE LEARNING
Machine learning is a rapidly growing field in artificial intelligence (AI) that
focuses on enabling computers to learn and improve from experience without
being explicitly programmed. In other words, machine learning algorithms can
automatically learn and make predictions or decisions based on data.
Machine learning algorithms are trained on large datasets, from which they learn patterns and relationships in the data. The algorithms then use this knowledge to make predictions or decisions about new, unseen data.
APPLICATIONS OF MACHINE LEARNING
Machine learning is applied across many industries:
1. Finance: Machine learning algorithms are used in finance for tasks such as
fraud detection, credit scoring, stock price prediction, and portfolio
optimization.
2. Healthcare: Machine learning algorithms are used in healthcare for tasks such
as medical image analysis, disease diagnosis, drug discovery, and personalized
medicine.
3. Retail: Machine learning algorithms are used in retail for tasks such as
demand forecasting, inventory optimization, personalized recommendations,
and pricing optimization.
4. Education: Machine learning algorithms are used in education for tasks such
as student performance prediction, personalized learning, and intelligent
tutoring systems.
5. Energy: Machine learning algorithms are used in energy for tasks such as
wind turbine performance prediction, energy demand forecasting, and smart
grid optimization.
MULTICLASS AND MULTILABEL CLASSIFICATION
The techniques used for training and evaluating binary classifiers can be
adapted for multiclass and multilabel classification problems as well. However,
multiclass and multilabel classification problems can be more challenging than
binary classification due to the increased number of classes and labels that
need to be considered. Techniques such as one-vs-rest (OVR) and one-vs-one
(OVO) are commonly used for multiclass classification, while techniques such as
binary relevance (BR) and label powerset (LP) are commonly used for multilabel
classification.
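As a sketch of how these strategies look in practice, the one-vs-rest and one-vs-one wrappers from scikit-learn can be applied around any binary classifier. The library, the synthetic dataset, and the choice of logistic regression as the base estimator are all assumptions for illustration, not part of the original notes:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Hypothetical 3-class dataset
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# OvR trains one binary classifier per class; OvO trains one per pair of classes
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)

print("OvR accuracy:", ovr.score(X_te, y_te))
print("OvO accuracy:", ovo.score(X_te, y_te))
```

For a problem with k classes, OvR fits k binary models while OvO fits k(k-1)/2, which is one reason OvR is the more common default.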
EVALUATING CLASSIFIERS
1. Accuracy: Accuracy is the fraction of correctly classified data points in the test set. It is calculated as (TP + TN) / (TP + TN + FP + FN), where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives. Accuracy is a simple and intuitive metric, but it can be misleading for imbalanced datasets or when the cost of misclassification differs between classes.
2. Precision & Recall: Precision and recall measure how well the classifier performs for each class separately. Precision is the fraction of true positives among all positive predictions, while recall is the fraction of true positives among all actual positives. They are calculated as precision = TP / (TP + FP) and recall = TP / (TP + FN), respectively. Precision indicates how trustworthy the classifier's positive predictions are, while recall indicates how many of the actual positives it manages to find.
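The formulas above can be checked with a few lines of Python; the confusion-matrix counts here are hypothetical:

```python
# Hypothetical counts for a binary classifier evaluated on 100 test points
TP, TN, FP, FN = 40, 45, 5, 10

accuracy = (TP + TN) / (TP + TN + FP + FN)  # (40 + 45) / 100 = 0.85
precision = TP / (TP + FP)                  # 40 / 45 ≈ 0.889
recall = TP / (TP + FN)                     # 40 / 50 = 0.80

print(accuracy, precision, recall)
```

Note how the classifier can look good on accuracy while precision and recall tell a more detailed story about its positive predictions.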
LINEAR REGRESSION WITH ONE VARIABLE
Linear regression with one variable models the relationship between an independent variable x and a dependent variable y as a straight line:

y = mx + b

where m is the slope of the line, b is the y-intercept, and x is the independent variable. The coefficient of determination (R²) is used to measure the goodness of fit of the model.
In Python, we can use the NumPy, SciPy, and Matplotlib libraries to perform linear regression with one variable. Here's an example:
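A minimal sketch of such a script; the data values are assumptions for illustration, since the original example's numbers are not preserved here:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
from scipy.stats import linregress

# Hypothetical sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

# Fit the line; linregress returns slope, intercept, and r (among others)
result = linregress(x, y)
print("slope =", result.slope)
print("intercept =", result.intercept)
print("R^2 =", result.rvalue ** 2)  # squaring r gives R²

# Plot the data and the fitted regression line
plt.scatter(x, y, label="data")
plt.plot(x, result.intercept + result.slope * x, "r-", label="fitted line")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.savefig("regression.png")
```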
In this example, we first import the necessary libraries, NumPy and Matplotlib. We then create a sample dataset of x and y values. Next, we fit the model with SciPy's linregress function, which returns the slope, the intercept, and the correlation coefficient r (squaring r gives R²), along with some other statistics. Finally, we plot the data and the regression line with Matplotlib.
LINEAR REGRESSION WITH MULTIPLE VARIABLES
Linear regression with multiple variables (also called multiple linear regression) models the dependent variable as a linear combination of several independent variables.
y = b0 + b1x1 + b2x2 + ... + bnxn

where b0 is the y-intercept and bi (i = 1, 2, ..., n) is the coefficient of the independent variable xi.
In Python, we can use NumPy, Scipy, and Pandas libraries to perform linear
regression with multiple variables. Here's an example:
In this example, we first import the necessary libraries, NumPy, Pandas, and Matplotlib. We then create a sample dataset in a pandas DataFrame with x1, x2, and y columns. Because SciPy's linregress handles only a single independent variable, we estimate the coefficients of the multiple-variable model with NumPy's least-squares solver (np.linalg.lstsq), adding a column of ones to the design matrix so that the intercept b0 is estimated alongside the other coefficients. Finally, we print the fitted coefficients and compute R² from the residuals of the fitted model.
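A sketch of this approach, with a small assumed dataset (the original example's values are not preserved here):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two predictors and one response
df = pd.DataFrame({
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 1, 4, 3, 5],
    "y":  [3.1, 3.9, 7.2, 7.8, 10.5],
})

# Design matrix with a leading column of ones for the intercept b0
X = np.column_stack([np.ones(len(df)), df["x1"], df["x2"]])
y = df["y"].to_numpy()

# Solve the least-squares problem y ≈ X @ [b0, b1, b2]
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coeffs

# R² from the residuals of the fitted model
y_hat = X @ coeffs
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"intercept = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}, R^2 = {r2:.3f}")
```

Fitting each predictor separately with simple regressions would ignore the correlation between x1 and x2; solving the joint least-squares problem estimates all coefficients at once.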
LOGISTIC REGRESSION
Logistic regression is used when the outcome to be predicted is binary. Let's say we want to predict whether a person will buy a product or not based on their age and income. We have a dataset with these variables and the binary outcome (whether the person bought the product or not).
First, we calculate the odds of buying the product for each age and income level using the formula:

odds = p / (1 - p)

For example, if the probability of buying for people aged 25 with an income of $25,000 is 0.6, then the odds would be:

odds = 0.6 / (1 - 0.6) = 1.5
Next, we take the natural logarithm (ln) of these odds to get the logit values. The logit is the logarithm of the odds and serves as the linear predictor in logistic regression:

logit = ln(odds)

For example, if the odds for people aged 25 with an income of $25,000 are 1.5, then the logit would be:

logit = ln(1.5) ≈ 0.405
Now, we can use linear regression to find the relationship between these logit values and the age and income levels:

logit = b0 + b1*age + b2*income
Here, b0 is the intercept (the logit value when age and income are both zero), b1 is the coefficient for age, and b2 is the coefficient for income. By exponentiating both sides of this equation, we can convert it back to odds:

odds = e^(b0 + b1*age + b2*income)
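The probability-to-odds-to-logit round trip described above can be verified in a few lines, using the worked example's probability of 0.6:

```python
import math

# Probability of buying, from the worked example
p = 0.6
odds = p / (1 - p)       # 0.6 / 0.4 = 1.5
logit = math.log(odds)   # ln(1.5) ≈ 0.405

# Exponentiating the logit recovers the odds,
# and the odds can be converted back to a probability
odds_back = math.exp(logit)
p_back = odds_back / (1 + odds_back)

print(odds, round(logit, 3), p_back)
```

This invertibility is what lets logistic regression fit a linear model on the logit scale and still report predictions as probabilities between 0 and 1.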