Applications of Linear Algebra in Data Science


Assignment of:

Elementary Linear Algebra

Topic Name:

Application of Linear Algebra in Data Science

Submitted to:

Mr. Tariq Hussain

Submitted by:

Kashif Iftikhar

Roll No:

Bsf1903713 (BS IT, 3rd)

(University of Education, Lahore, Multan Campus)


Applications of Linear Algebra in Data Science

When people think of the field of data science in general, or of specific areas of it, such as
natural language processing, machine learning, or computer vision, they rarely consider
linear algebra. The reason linear algebra is often overlooked is that the tools used today to
implement data science algorithms do an excellent job of hiding the underlying math
that makes everything work. Most of the time, people avoid getting into linear
algebra because it’s “difficult” or “hard to understand.” Although that is partly true, being
familiar with linear algebra is an essential skill for data scientists and computer
engineers.

You might doubt my argument, as you can implement many algorithms in machine
learning and data science without needing to go deep into the math. Nevertheless,
knowing the math behind an algorithm gives you a new perspective on that algorithm,
and hence opens up new doors and applications for you to explore.

Linear algebra lies at the core of many well-known data science algorithms. In this article, I will
discuss three applications of linear algebra in three data science fields. From machine
learning, we will talk about loss functions; from natural language processing, we will talk
about word embeddings; and finally, from computer vision, we will cover image
convolution.

Machine Learning

Machine learning is, without a doubt, the best-known application of artificial
intelligence (AI). The main idea behind machine learning is giving systems the power to
learn and improve automatically from experience without being explicitly programmed
to do so. Machine learning works by building programs that have access to data
(constant or updated) which they analyze, find patterns in, and learn from. Once a program
discovers relationships in the data, it applies this knowledge to new sets of data. You can
read more about how algorithms learn in this article.
Linear algebra has several applications in machine learning, such as loss functions,
regularization, support vector classification, and much more. In this article, however, we
will only cover linear algebra in loss functions.

Loss Function

Machine learning algorithms work by collecting data, analyzing it, and then building
a model using one of many approaches (linear regression, logistic regression, decision
tree, random forest, etc.). Then, based on this model, they can make predictions for new
data.

But…

How can you measure the accuracy of your prediction model?

Using linear algebra, in particular loss functions. A loss function is a method of
evaluating how accurate your prediction model is and how well it will perform with new
datasets. If your model is totally off, the loss function will output a higher number,
whereas if it is a good one, the loss function will output a lower number.

Regression models a relationship between a dependent variable, Y, and several
independent variables, Xi. After this relationship is plotted, we try to fit a line through
these points in space, and then use this line to predict Y for future values of the Xi's. There are many types
of loss functions, some of which are more complicated than others; however, the two most
commonly used are the Mean Squared Error and the Mean Absolute Error.
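
To make the idea of fitting a line concrete, here is a minimal sketch (not part of the original article; the data values are made up for illustration) that fits a straight line to a few points with NumPy's polyfit and uses it to predict Y:

import numpy as np

# Hypothetical data: fit a line Y = a*X + b to five made-up points
x = np.array([1, 2, 3, 4, 5])   # independent variable
y = np.array([1, 1, 2, 2, 4])   # dependent variable
a, b = np.polyfit(x, y, deg=1)  # least-squares fit of a degree-1 polynomial
y_pred = a * x + b              # predicted values on the fitted line
print("Slope:", a, "Intercept:", b)
print("Predicted values:", y_pred)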

• Mean Squared Error

Mean Squared Error (MSE) is probably the most widely used loss function; it is easy to
understand and implement, and it generally works quite well in most regression problems.
Most Python libraries used in data science, such as NumPy, scikit-learn, and TensorFlow, have their
own built-in implementation of the MSE. Despite that, they all work based
on the same equation:

MSE = (1/N) × Σ (yᵢ − ŷᵢ)²

where N is the number of data points in both the observed and predicted values, yᵢ is an
observed value, and ŷᵢ is the corresponding predicted value.

Steps of calculating the MSE:

1. Calculate the difference between each pair of observed and predicted values.

2. Take the square of each difference.

3. Add the squared differences together to find the cumulative value.

4. Divide the cumulative sum by N to get the average error.

Here is the Python code to calculate and plot the MSE.

import matplotlib.pyplot as plt

# Set data
x = list(range(1, 6))                 # data points
y = [1, 1, 2, 2, 4]                   # observed values
y_bar = [0.6, 1.29, 1.99, 2.69, 3.4]  # predicted values

# Calculate the MSE
summation = 0
n = len(y)
for i in range(n):
    difference = y[i] - y_bar[i]        # difference between observed and predicted value
    squared_difference = difference**2  # take the square of the difference
    summation = summation + squared_difference  # sum of all the squared differences
MSE = summation / n                     # average of the squared differences
print("The Mean Square Error is:", MSE)

# Plot the relationship
plt.scatter(x, y, color='#06AED5')
plt.plot(x, y_bar, color='#1D3557', linewidth=2)
plt.xlabel('Data Points', fontsize=12)
plt.ylabel('Output', fontsize=12)
plt.title("MSE")
plt.show()

Most data scientists don’t like to report the MSE on its own because, since the errors are
squared, it is not a direct representation of the error. However, it is usually used as an
intermediate step to the Root Mean Squared Error (RMSE), which can be easily obtained by
taking the square root of the MSE.
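
As a quick illustration (a minimal sketch, not from the original article), the RMSE for the same data is just the square root of the MSE computed above:

import math

y = [1, 1, 2, 2, 4]                   # observed values
y_bar = [0.6, 1.29, 1.99, 2.69, 3.4]  # predicted values
mse = sum((obs - pred) ** 2 for obs, pred in zip(y, y_bar)) / len(y)
rmse = math.sqrt(mse)                 # RMSE is the square root of the MSE
print("The Root Mean Square Error is:", rmse)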

• Mean Absolute Error

The Mean Absolute Error (MAE) is quite similar to the MSE; the difference is that we
take the absolute value of the difference between the observed data and the predicted
data instead of squaring it. The MAE is more robust to outliers than the MSE. However, a
disadvantage of the MAE is that the absolute value (modulus) operator is not easy to handle
in mathematical equations, for example it is not differentiable at zero. Yet, the MAE is the
most intuitive of all the loss functions.
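
For comparison, here is a minimal sketch (not part of the original article) that computes the MAE on the same observed and predicted values used in the MSE example:

y = [1, 1, 2, 2, 4]                   # observed values
y_bar = [0.6, 1.29, 1.99, 2.69, 3.4]  # predicted values
summation = 0
n = len(y)
for i in range(n):
    absolute_difference = abs(y[i] - y_bar[i])  # absolute difference instead of the square
    summation = summation + absolute_difference
MAE = summation / n
print("The Mean Absolute Error is:", MAE)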
