Applications of Linear Algebra in Data Science
Topic Name: Applications of Linear Algebra in Data Science
Submitted to: Mr. Tariq Hussain
Submitted by: Kashif Iftikhar
Roll No: BSF1903713 (BS IT, 3rd)
When people think of the field of data science in general, or of specific areas of it such as natural language processing, machine learning, or computer vision, they rarely consider linear algebra. The reason linear algebra is often overlooked is that the tools used today to implement data science algorithms do an excellent job of hiding the underlying math that makes everything work. Most of the time, people avoid getting into linear algebra because it is "difficult" or "hard to understand." Although that is partly true, familiarity with linear algebra is an essential skill for data scientists and computer engineers.
Linear algebra is at the core of many well-known data science algorithms. In this article, I will discuss three applications of linear algebra in three data science fields. From machine learning, we will talk about loss functions; from natural language processing, we will talk about word embeddings; and finally, from computer vision, we will cover image convolution.
Machine Learning
Machine learning is, without a doubt, the best-known application of artificial intelligence (AI). The main idea behind machine learning is to give systems the power to learn and improve from experience automatically, without being explicitly programmed to do so. Machine learning works by building programs that have access to data (static or continually updated) to analyze, find patterns in, and learn from. Once a program discovers relationships in the data, it applies this knowledge to new sets of data.
Linear algebra has several applications in machine learning, such as loss functions,
regularization, support vector classification, and much more. In this article, however, we
will only cover linear algebra in loss functions.
Loss Function
Machine learning algorithms work by collecting data, analyzing it, and then building a model using one of many approaches (linear regression, logistic regression, decision trees, random forests, etc.). Then, based on the results, they can predict future data queries.
But how do we know how accurate those predictions are?
This is where linear algebra comes in, in particular through loss functions. A loss function is a method of evaluating how accurate your prediction models are: will they perform well with new datasets? If your model is totally off, your loss function will output a higher number, whereas if the model is a good one, the loss function will output a lower number.
Mean Squared Error (MSE) is probably the most widely used loss function; it is easy to understand and implement, and it generally works quite well for most regression problems. Most Python libraries used in data science (NumPy, scikit-learn, TensorFlow) have their own built-in implementation of MSE. Despite that, they all work based on the same equation:

MSE = (1/N) * Σᵢ (yᵢ − ŷᵢ)²

where N is the number of data points, yᵢ is an observed value, and ŷᵢ is the corresponding predicted value. To compute it:
1. Calculate the difference between each pair of observed and predicted values.
2. Take the square of each difference.
3. Sum the squared differences and divide by N to get the average.
import matplotlib.pyplot as plt

# Set data
x = list(range(1, 6))                  # data points
y = [1, 1, 2, 2, 4]                    # observed values
y_bar = [0.6, 1.29, 1.99, 2.69, 3.4]   # predicted values

summation = 0
n = len(y)
for i in range(n):
    # find the difference between the observed and predicted value
    difference = y[i] - y_bar[i]
    # take the square of the difference
    squared_difference = difference ** 2
    # take the sum of all the squared differences
    summation = summation + squared_difference

MSE = summation / n  # get the average of all squared differences
print("The Mean Square Error is: ", MSE)

# Plot the relationship between observed and predicted values
plt.scatter(x, y, color='#06AED5')
plt.plot(x, y_bar, color='#1D3557', linewidth=2)
plt.xlabel('Data Points', fontsize=12)
plt.ylabel('Output', fontsize=12)
plt.title("MSE")
plt.show()
Many data scientists don't like to use the MSE on its own because, being a sum of squares, it is not expressed in the same units as the data and so may not be a direct representation of the error. However, it is often used as an intermediate step toward the Root Mean Squared Error (RMSE), which is easily obtained by taking the square root of the MSE.
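As a minimal sketch, using the same observed and predicted values as in the code above, the RMSE is just one extra step on top of the MSE:

```python
import math

y = [1, 1, 2, 2, 4]                    # observed values
y_bar = [0.6, 1.29, 1.99, 2.69, 3.4]   # predicted values

# MSE: mean of the squared differences
mse = sum((yi - ybi) ** 2 for yi, ybi in zip(y, y_bar)) / len(y)

# RMSE: the square root of the MSE, which brings the
# error back to the same units as the observed values
rmse = math.sqrt(mse)
print("MSE: ", mse)
print("RMSE:", rmse)
```

Because the square root undoes the squaring in scale, an RMSE of about 0.46 here can be read directly as "the predictions are off by roughly 0.46 units on average."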
The Mean Absolute Error (MAE) is quite similar to the MSE; the difference is that we calculate the absolute difference between the observed data and the predicted data.
The MAE is more robust to outliers than the MSE, because errors are not squared. A disadvantage of MAE, however, is that the absolute value (modulus) operator is not easy to handle in mathematical equations. Yet, MAE is the most intuitive of all the loss function calculation methods.
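A minimal sketch comparing the two, reusing the same observed and predicted values as above and then appending one hypothetical outlier pair, shows why MAE is considered more robust: a single large error inflates the MSE far more than the MAE.

```python
def mae(y, y_pred):
    # mean of the absolute differences
    return sum(abs(a - b) for a, b in zip(y, y_pred)) / len(y)

def mse(y, y_pred):
    # mean of the squared differences
    return sum((a - b) ** 2 for a, b in zip(y, y_pred)) / len(y)

y      = [1, 1, 2, 2, 4]
y_pred = [0.6, 1.29, 1.99, 2.69, 3.4]

print("MAE:", mae(y, y_pred))
print("MSE:", mse(y, y_pred))

# Append one hypothetical outlier (observed 10, predicted 2):
# squaring the error of 8 makes the MSE jump far more than the MAE
y_out      = y + [10]
y_pred_out = y_pred + [2]
print("MAE with outlier:", mae(y_out, y_pred_out))
print("MSE with outlier:", mse(y_out, y_pred_out))
```

The outlier values here are made up purely for illustration; the point is the relative growth of the two losses, not the specific numbers.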