ML Final
PRACTICAL FILE
Submitted in partial fulfillment for the award of
BACHELOR OF TECHNOLOGY
(CSE), 3rd year
Submitted by
RUPESH VARSHNEY
Roll No:- 2001090100045
Submitted To
Mr. Dewang Chaudhary
(INTERNSHIP ASSESSMENT: KCS-554 )
In MACHINE LEARNING
[Session: 2022-2023]
CERTIFICATE
OF INTERNSHIP
IN
MACHINE LEARNING
This is to certify that Mr.
RUPESH VARSHNEY
From
Skillvoid
Has Successfully Completed Internship Program
for the period from 09 October, 2022 to 09 November, 2022.
HEAD-HR
ALIGARH COLLEGE OF ENGINEERING & TECHNOLOGY
3KM FROM SASNI GATE, MATHURA ROAD ALIGARH-202001
INTERNSHIP CERTIFICATE
6. Conclusion
1. INTRODUCTION TO ORGANIZATION
Details of the last annual general meeting of Skillvoid Elearning Private Limited are not
available. The company is yet to submit its first full-year financial statements to the
registrar.
CIN AAY-6426
Partners 0
Designated Partners 2
UNDERTAKEN
● Pandas
● Matplotlib
2. Spyder
Scientific Python Development Environment (Spyder) is
a free and open-source Python IDE. It is lightweight and
well suited to data science and ML, and many data
analysts use it for interactive code analysis.
Spyder's interactive execution model lets you run a
single line, a section of the code, or the whole file in
one go.
Its static code analysis feature flags redundant
variables, errors, and syntax issues without running the
code. Spyder also integrates with many DS packages such
as NumPy, SciPy, Pandas, and IPython to help with data
analytics.
You can control the execution flow of your source code
from the Spyder GUI (Graphical User Interface) via the
Spyder debugger. The History pane of Spyder records the
commands entered in the console for later reference. The
Help pane shows documentation for any built-in function,
method, or class. It is an excellent tool for data
science enthusiasts.
3. Thonny
Thonny is a Python IDE that runs on Windows, Linux, and
Mac. Its debugger steps through code line by line, which
helps beginners who are learning to code. Thonny's
simple GUI also makes installing third-party packages
much easier.
Thonny offers code completion and highlights mismatched
brackets and other syntax errors, which is a useful
feature for beginners. It is completely free to
download. When you step into a function call, Thonny
opens a separate window that shows the function's local
variables and the call stack. Thonny's package manager
helps you download third-party packages that extend the
functionality of Python.
4. JupyterLab
JupyterLab is a web-based Python IDE for Machine
Learning & DS professionals. Its interactive output
system lets you test your code as you write it. The
interface provides a simultaneous view of the terminal,
text editor, console, and file directory.
Features such as code completion, auto-formatting, and
autosave make it one of the best free Python IDEs for ML
and DS professionals. A zen mode lets users hide
unneeded panels and minimise distractions so they can
focus on the project in progress. Files created in
JupyterLab can be downloaded in various formats such as
.py and PDF.
5. PyCharm
PyCharm is a Python IDE with features such as code
completion and automatic indentation. Its debugger and
code inspections analyse the code and highlight errors.
DS & ML professionals who also do web development
prefer PyCharm for its easy navigation: you can search
for any symbol in a long code base, and interlinking
multiple scripts is also easier.
PyCharm's refactoring feature makes it easy to
restructure code: you can change a method signature,
rename a file, or extract a method.
ML professionals use the integrated unit testing to test
their ML pipelines and to check the behaviour of a
particular ML model. PyCharm shows the test results in a
graphical layout. It also integrates with version
control systems, which helps in keeping track of the
changes made to any particular file or application.
6. Visual Studio Code
Visual Studio Code (VS Code) is one of the Python
editors most used by ML & DS professionals. It works on
Windows, Mac, and Linux operating systems. VS Code
supports many languages besides Python, such as C, C#,
JavaScript, HTML, and CSS. It is a lightweight editor
built on an open-source code base, and it is free to
use, including for businesses and enterprises.
It is also a good platform for beginners, as VS Code
shows hints whenever you create functions or classes.
Code completion also saves time while coding. VS Code
integrates with Pylint, which checks the source code for
errors. You can run unit tests on your ML or DS code
easily from VS Code.
The REPL (read-evaluate-print loop) shows quick results
for any small piece of Python code in a separate window.
It helps a lot when one is experimenting with a new API
or function.
VS Code makes working with SQL, Unity, .NET, Node.js,
and many other tools easier. You can rename a file,
extract methods, add imports, and more via the VS Code
refactoring commands. VS Code is an excellent
environment for ML & DS work that makes it easy to
optimise and debug code.
7. Atom
Atom is an editor for ML & DS professionals that
supports many languages besides Python, such as C, C++,
HTML, and JavaScript. You can use it on Windows, Linux,
and Mac. Through packages, Atom can connect to MySQL,
PostgreSQL, and Microsoft SQL Server, which helps you
write and execute SQL queries and commands.
Atom has many useful packages, such as atom-beautify,
which reformats your code and makes it more readable.
The outline view feature shows a tree-based view of your
code so that you can cross-check your classes,
functions, and so on easily. Atom also offers many
themes and templates from GitHub to choose from.
ML & DS professionals also prefer Atom because of its
cross-platform editing. It is one of the best free,
open-source editors currently available.
3. Internship training work undertaken
Machine learning is a multidisciplinary field that uses
scientific inference and mathematical algorithms to
extract meaningful knowledge and insights from large
amounts of structured and unstructured data. These
algorithms are implemented as computer programs, which
are usually run on powerful hardware because they
require a significant amount of processing.
In 1959, Arthur Samuel defined machine learning as a "field of study that gives
computers the ability to learn without being explicitly programmed".
"The goal of machine learning is to build computer systems that can adapt and learn
from their experience." (Tom Dietterich)
Tom M. Mitchell provided a widely quoted, more formal definition: "A computer
program is said to learn from experience E with respect to some class of tasks T
and performance measure P, if its performance at tasks in T, as measured by P,
improves with experience E."
● Machine learning
Machine learning is a branch of artificial intelligence concerned with the design
and development of algorithms that allow computers to evolve behaviours based on
empirical data. As intelligence requires knowledge, it is necessary for computers
to acquire knowledge. Machine learning refers to a system capable of the autonomous
acquisition and integration of knowledge.
● Why is ML needed?
● Types of Algorithm in ML
1. Supervised learning.
Prediction
Classification (discrete labels), Regression (real values)
2. Unsupervised learning.
Clustering
Probability distribution estimation Finding association (in features)
Dimension reduction
3. Semi-supervised learning.
4. Reinforcement learning.
Decision making (robot, chess machine)
● Regression and Classification
Regression and classification are both supervised learning tasks. Both are used for
prediction in machine learning and both work with labelled datasets. The difference
lies in the kind of target they predict: classification predicts discrete labels,
while regression predicts real values.
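The difference can be illustrated with a minimal sketch. It assumes scikit-learn is installed; the toy data (hours studied, exam score, pass/fail) is hypothetical and is not part of the internship datasets.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical labelled data: one feature (hours studied) for six students.
hours = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])

# Regression target: a real value (exam score = 10 * hours + 30).
scores = np.array([40.0, 50.0, 60.0, 70.0, 80.0, 90.0])

# Classification target: a discrete label (0 = fail, 1 = pass).
passed = np.array([0, 0, 0, 1, 1, 1])

# Same features, two different supervised problems.
regressor = LinearRegression().fit(hours, scores)
classifier = LogisticRegression().fit(hours, passed)

# Predict for a student who studied 7 hours.
predicted_score = float(regressor.predict([[7.0]])[0])   # a real number
predicted_label = int(classifier.predict([[7.0]])[0])    # one of the known labels

print(predicted_score, predicted_label)
```

The regressor returns a continuous score, whereas the classifier can only return one of the labels it saw during training.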
● Application
1. Face detection.
2. Object detection and recognition.
3. Image segmentation.
4. Multimedia event detection.
5. Economic and commercial use.
4. PROJECT WORK
Dataset: fake_or_real_news.csv
https://www.kaggle.com/hassanamin/textdb3
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Load the dataset and look at the first rows
data = pd.read_csv("news.csv")
print(data.head())

# Use the news title as the feature and the label column as the target
x = np.array(data["title"])
y = np.array(data["label"])

# Turn each title into a vector of word counts
cv = CountVectorizer()
x = cv.fit_transform(x)
Now let’s separate the dataset into training and testing sets, and then I’ll use the Multinomial Naive
Bayes algorithm to train the fake news detection model:
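The split and training step described above is not shown in the file. A minimal sketch of it might look like the following. It assumes scikit-learn is installed, and it uses a handful of made-up headlines in place of the "title" and "label" columns of the CSV so that it runs on its own. The label values "REAL" and "FAKE", the test size, and the random seed are assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical stand-in for data["title"] and data["label"].
titles = np.array([
    "Government announces new budget for public schools",
    "Central bank keeps interest rates unchanged",
    "City council approves plan for new metro line",
    "Scientists publish study on regional rainfall",
    "Aliens secretly control the world cricket board",
    "Miracle fruit cures every disease overnight",
    "Celebrity reveals moon is made of cheese",
    "Secret potion makes people invisible, sources say",
])
labels = np.array(["REAL", "REAL", "REAL", "REAL",
                   "FAKE", "FAKE", "FAKE", "FAKE"])

# Bag-of-words features, as in the code above.
cv = CountVectorizer()
x = cv.fit_transform(titles)

# Hold out 25% of the rows for testing; stratify keeps both classes in each split.
xtrain, xtest, ytrain, ytest = train_test_split(
    x, labels, test_size=0.25, random_state=42, stratify=labels
)

# Multinomial Naive Bayes works directly on word-count features.
model = MultinomialNB()
model.fit(xtrain, ytrain)

# Fraction of held-out titles classified correctly.
accuracy = model.score(xtest, ytest)
print(accuracy)
```

With the real dataset, `titles` and `labels` would be replaced by the `x` and `y` arrays built from the CSV; the accuracy on eight toy rows says nothing about the real model.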
Now let’s test this model. To test our trained model, I’ll first write down the title of a news
item found on Google News to see whether our model predicts that the news is real or not:
news_headline = "CA Exams 2021: Supreme Court asks ICAI to extend opt-out option for July exams, final order tomorrow"
# Vectorise the headline with the fitted CountVectorizer, then predict its label
data = cv.transform([news_headline]).toarray()
print(model.predict(data))
Now I’m going to write a random fake news headline to see if the model predicts the news is
fake or not:
● Summary
Car Dataset
Here, a dataset of different cars is given with their specifications. This data is
available in a CSV file. We are going to
analyse this data using a pandas DataFrame.
2. Question (based on the value_counts function): What are the different makes
in our dataset, and how many cars of each make does the data contain?
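One way to answer this question is sketched below. It assumes pandas is installed. The column name "Identification.Make" is taken from the dataset's column listing; the six rows are hypothetical stand-ins for the CSV.

```python
import pandas as pd

# Hypothetical rows standing in for the car CSV; only the column name is from the dataset.
car = pd.DataFrame({
    "Identification.Make": ["Audi", "Audi", "Dodge", "Audi", "Dodge", "BMW"],
})

# value_counts() lists each distinct make with the number of rows it appears in,
# sorted from most to least frequent.
make_counts = car["Identification.Make"].value_counts()
print(make_counts)

# nunique() gives the number of distinct makes.
number_of_makes = car["Identification.Make"].nunique()
print(number_of_makes)
```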
Car price prediction is a popular machine learning task that sits in the finance
and marketing domain. It is a challenging problem because the price of a car
depends on many factors. Some of the factors that contribute most to the price
of a car are:
● Brand
● Model
● Horsepower
● Mileage
● Safety Features
● GPS and many more
If one ignores the brand of the car, a car manufacturer primarily sets the price
of a car based on the features it can offer a customer. Later, the brand may
raise the price depending on its goodwill, but the most important factor is
how much value the car's features add to your life. So, in the section below,
I will walk you through the task of training a car price prediction model with
machine learning using the Python programming language.
1. data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 26 columns):
# Column Non-Null Count Dtype
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
data = pd.read_csv("CarPrice.csv")
data.head()
2. print(data.describe())
The price column in this dataset is supposed to be the column whose values
we need to predict. So let’s see the distribution of the values of the price
column:
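The plotting code for this step is not shown in the file. A minimal sketch with matplotlib, which the file imports, is given below. The prices are synthetic values generated only so the example runs on its own; with the real data, the "price" column of CarPrice.csv would be used instead. The bin count and figure size are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the "price" column (205 rows, as in the dataset summary).
rng = np.random.default_rng(0)
data = pd.DataFrame({"price": rng.lognormal(mean=9.4, sigma=0.5, size=205)})

# Histogram of the price values: each bar counts the cars in one price range.
fig, ax = plt.subplots(figsize=(8, 5))
counts, bin_edges, _ = ax.hist(data["price"], bins=20)
ax.set_xlabel("price")
ax.set_ylabel("number of cars")
ax.set_title("Distribution of car prices")
fig.tight_layout()
# In an interactive session, plt.show() would display the figure.
```

seaborn, which the file also imports, offers `sns.histplot` for the same purpose.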
3. print(data.corr(numeric_only=True))
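# Name of the target column
predict = "price"
# Keep only the numeric columns used as features, plus the target
data = data[["symboling", "wheelbase", "carlength",
             "carwidth", "carheight", "curbweight",
             "enginesize", "boreratio", "stroke",
             "compressionratio", "horsepower", "peakrpm",
             "citympg", "highwaympg", "price"]]
# Features: every selected column except the target
x = np.array(data.drop(columns=[predict]))
# Target: the price column
y = np.array(data[predict])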
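The file imports train_test_split and DecisionTreeRegressor, but the training step itself is not shown. A minimal sketch of how it might continue is given below. It assumes scikit-learn is installed and uses synthetic features and prices so that it runs without CarPrice.csv. The 80/20 split, the random seed, and the mean absolute error metric are assumptions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the x and y arrays built from the CSV (205 rows).
rng = np.random.default_rng(0)
horsepower = rng.uniform(50, 250, size=205)
enginesize = rng.uniform(60, 330, size=205)
curbweight = rng.uniform(1500, 4000, size=205)
x = np.column_stack([horsepower, enginesize, curbweight])
# Made-up price formula with noise, used only to have something to learn.
y = 50 * horsepower + 20 * enginesize + 2 * curbweight + rng.normal(0, 500, size=205)

# Hold out 20% of the rows to evaluate the model on unseen cars.
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.2, random_state=42
)

# Fit a decision tree regressor on the training rows.
model = DecisionTreeRegressor(random_state=42)
model.fit(xtrain, ytrain)

# Predict prices for the held-out rows and measure the average absolute error.
predictions = model.predict(xtest)
mae = mean_absolute_error(ytest, predictions)
print(mae)
```

An unpruned decision tree fits the training rows almost perfectly, so the error on the held-out rows is the number worth reporting.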
● Summary
It is a major research topic in machine learning because the price of a car
depends on many factors.
Out[5]:
Dimensions.Height                                  0
Dimensions.Length                                  0
Dimensions.Width                                   0
Engine Information.Driveline                       0
Engine Information.Engine Type                     0
Engine Information.Hybrid                          0
Engine Information.Number of Forward Gears         0
Engine Information.Transmission                    0
Fuel Information.City mpg                          0
Fuel Information.Fuel Type                         0
Fuel Information.Highway mpg                       0
Identification.Classification                      0
Identification.ID                                  0
Identification.Make                                0
Identification.Model Year                          0
Identification.Year                                0
Engine Information.Engine Statistics.Horsepower    0
Engine Information.Engine Statistics.Torque        0
dtype: int64
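The input cell that produced this output is not shown. A per-column count with this shape is what a missing-value check such as `isnull().sum()` returns, so the following sketch assumes that was the call. The small DataFrame is hypothetical and deliberately contains two missing values so that the counts are not all zero.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the column names follow the dataset's naming.
car = pd.DataFrame({
    "Dimensions.Height": [140, 140, np.nan],
    "Identification.Make": ["Audi", None, "Dodge"],
    "Identification.Year": [2009, 2009, 2011],
})

# isnull() marks each missing cell as True; sum() counts them per column.
missing_per_column = car.isnull().sum()
print(missing_per_column)
```

A column of zeros, as in the output above, means that no column of the car dataset has missing values.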
In[6]: car.head(2)
Out[6]:
      Dim.    Dim.    Dim.   Engine Info.       Iden.  Iden.  Engine Stats.  Fuel Info.  Engine Info.
      Height  Length  Width  Driveline          Make   Year   Horsepower     Fuel Type   Hybrid
0     140     143     202    All-wheel drive    Audi   2009   250            Gasoline    TRUE
1     140     143     202    Front-wheel drive  Audi   2009   200            Gasoline    TRUE
In[9]: car.head()
Out[9]:
      Dim.    Dim.    Dim.   Engine Info.       Iden.  Iden.  Engine Stats.  Fuel Info.  Engine Info.
      Height  Length  Width  Driveline          Make   Year   Horsepower     Fuel Type   Hybrid
1     140     143     202    Front-wheel drive  Audi   2009   200            Gasoline    TRUE
2     140     143     202    Front-wheel drive  Audi   2009   200            Gasoline    TRUE
3     140     143     202    All-wheel drive    Audi   2009   200            Gasoline    TRUE
4     140     143     202    All-wheel drive    Audi   2009   200            Gasoline    TRUE
Out[10]:
      Dim.    Dim.    Dim.   Engine Info.       Iden.  Iden.  Engine Stats.  Fuel Info.  Engine Info.
      Height  Length  Width  Driveline          Make   Year   Horsepower     Fuel Type   Hybrid
0     140     143     202    All-wheel drive    Audi   2009   250            Gasoline    TRUE
1     140     143     202    Front-wheel drive  Audi   2009   200            Gasoline    TRUE
2     140     143     202    Front-wheel drive  Audi   2009   200            Gasoline    TRUE
3     140     143     202    All-wheel drive    Audi   2009   200            Gasoline    TRUE
...   ...     ...     ...    ...                ...    ...    ...            ...         ...
4374  97      235     224    Rear-wheel drive   Dodge  2011   390            Gasoline    TRUE
4375  97      235     224    Four-wheel drive   Dodge  2011   310            Gasoline    TRUE
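      Dim.    Dim.    Dim.   Engine Info.       Iden.  Iden.  Engine Stats.  Fuel Info.  Engine Info.
      Height  Length  Width  Driveline          Make   Year   Horsepower     Fuel Type   Hybrid
1     140     143     202    Front-wheel drive  Audi   2009   200            Gasoline    TRUE

      Dim.    Dim.    Dim.   Engine Info.       Iden.  Iden.  Engine Stats.  Fuel Info.  Engine Info.
      Height  Length  Width  Driveline          Make   Year   Horsepower     Fuel Type   Hybrid
0     140     143     205    All-wheel drive    Audi   2009   250            Gasoline    TRUE
1     140     143     205    Front-wheel drive  Audi   2009   200            Gasoline    TRUE
2     140     143     205    Front-wheel drive  Audi   2009   200            Gasoline    TRUE
3     140     143     205    All-wheel drive    Audi   2009   200            Gasoline    TRUE
4     140     143     205    All-wheel drive    Audi   2009   200            Gasoline    TRUE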
5. RESULTS AND DISCUSSION