
ML Internship Report

A Practical Training Report

submitted

in partial fulfillment

for the award of the Degree

Bachelor of Technology

in Department of Computer Science and Engineering

Guide:
Dr Sanjay Jain
Professor

Submitted by:
Name: Ayush Paul
Enrollment No.: A20405220124

Department of Computer Science and Engineering


Amity School of Engineering & Technology
Amity University Rajasthan, Jaipur
November 2022
CANDIDATE’S DECLARATION

I hereby declare that the work presented in this summer training report, entitled
“Machine Learning Internship”, is submitted in partial fulfillment for the award of the Degree
of “Bachelor of Technology” in the Department of Computer Science and Engineering, Amity
School of Engineering & Technology, Amity University Rajasthan. This is a record of my own
training, carried out under the guidance of Dr Sanjay Jain.

Ayush Paul
Computer Science and Engineering
Enrolment No.: A20405220124
Amity University Rajasthan, Jaipur

Counter Signed by

Dr Sanjay Jain
Offer Letter
Course Certificate
TABLE OF CONTENTS
Abstract
Introduction
    About the Company
    About Internship
    Rationale and Goals of Project
Chapter 1: Introduction to Projects
    Project 1
        Approach to the System
        Sections
        Technology Used in the Project
        Supported Operating System
Chapter 2: Literature Review
Chapter 3: Coding
Conclusion
Bibliography
Abstract
I worked as an intern trainee at Teachnook. The training took place during the summer
vacation. It started on 1st August 2022 and went on till 30th September, making it an
internship of 8 weeks. The internship focused primarily on Python with Machine Learning.
Its main objective was to gain experience of the Machine Learning domain as a whole and to
learn how to build a career around it.

The internship gave me a great deal of hands-on experience with Python and ML. Its main
achievement was learning a disciplined approach to work; the second was gaining a good
knowledge of the Python language. I also saw a real-time (live) project, which gave me an idea
of how to apply theoretical knowledge to something useful, and these are the things that
really matter.

During my internship I worked on two projects. My work was to create a working model of a
clock and to apply Machine Learning to a particular dataset.

In general, working in a collaborative environment on a product has been a delightful
challenge, and I gained some valuable experience.
Introduction
About the Company:
Teachnook is turning out to be a head turner in the field of skill development for students. True to
its name, it is the corner with a stairwell to your dreams. It is a team of motivated individuals driven
by a sole mission: to help students become industry ready. It believes that every student has a
different set of skills depending on their interests, natural abilities and experiences. Possessing the
right set of skills not only facilitates professional growth and provides a competitive edge but also
results in higher productivity.

About Internship:
Internship Description
My internship was primarily focused on training and implementation on real-life projects.
My course mates and I were first trained in the basics of the Python language and were then
introduced to the ins and outs of Machine Learning.

Profile Requirements
• Should have a basic knowledge of a programming language
• Should have good mathematical and analytical skills


Rationale and Goals of Project
The project on which I was working can be subdivided into the following goals:
• Choosing and cleaning the dataset provided
• Applying Machine Learning algorithms to the cleaned dataset
Chapter 1: Introduction to Projects
Project 1: Building a Machine Learning Model To Predict Housing Price in
Bengaluru

My project was to build a machine learning model that predicts the price of a house in
Bengaluru with at least 75% accuracy.

Aim:

The aims of this project were as follows:

• To analyze the data
• To apply measures that clean the data and make it uniform
• To check the accuracy of the model

Approach

In this project, we use the pandas and NumPy libraries. The steps below were followed to build the
price prediction model:
Step 1: Choose the dataset which best suits our needs.
Step 2: Import the necessary libraries (in our case, pandas and numpy).
Step 3: Store the data from the dataset in a dataframe.
Step 4: Display the dataset and determine the required aspects/columns.
Step 5: Clean the data by removing entries with null values, combining certain columns, or
converting some data so that the dataset is uniform and universally acceptable.
Step 6: Divide the cleaned dataset in an 80-20 ratio, where 80% of the data is used to train the
model and the rest is used to test the model's output.
Step 7: Create a model to predict the prices.
Step 8: Test the model to check its accuracy.
Step 9: Test other algorithms to see whether the model created provides the best results.
Step 10: Deployment.
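
Steps 3 to 8 can be sketched roughly as follows. This is only an illustrative sketch, not the code used in the internship: the small in-memory table, its column names (total_sqft, bhk, price) and its values are made up, the model is a plain least-squares linear regression fitted with NumPy, and the R^2 score is assumed here as the measure of "accuracy".

```python
import numpy as np
import pandas as pd

# Step 3: a tiny made-up table standing in for the real dataset (hypothetical values).
raw = pd.DataFrame({
    "total_sqft": [1000, 1200, None, 1500, 1800, 2000, 2400, 2600, 3000, 3200],
    "bhk":        [2, 2, 3, 3, 3, 4, 4, 4, 5, 5],
    "price":      [50, 62, 70, 78, 95, 104, 126, 133, 158, 165],
})

# Step 5: remove entries with null values.
clean = raw.dropna().reset_index(drop=True)

# Step 6: shuffle the rows, then split them 80-20 into training and test sets.
shuffled = clean.sample(frac=1.0, random_state=0).reset_index(drop=True)
n_train = int(0.8 * len(shuffled))
train, test = shuffled.iloc[:n_train], shuffled.iloc[n_train:]

def design_matrix(frame):
    """Feature columns plus a column of ones for the intercept."""
    features = frame[["total_sqft", "bhk"]].to_numpy(dtype=float)
    return np.column_stack([features, np.ones(len(frame))])

# Step 7: fit a linear model by ordinary least squares.
coef, *_ = np.linalg.lstsq(
    design_matrix(train), train["price"].to_numpy(dtype=float), rcond=None
)

# Step 8: score the model on the held-out rows with R^2.
pred = design_matrix(test) @ coef
actual = test["price"].to_numpy(dtype=float)
ss_res = float(((actual - pred) ** 2).sum())
ss_tot = float(((actual - actual.mean()) ** 2).sum())
r2 = 1.0 - ss_res / ss_tot
print(f"train rows: {len(train)}, test rows: {len(test)}, R^2 on test: {r2:.3f}")
```

With a real dataset the same outline applies, but the cleaning in Step 5 is usually far more involved than a single dropna() call.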
Functionality:

• Primarily used to analyze data
• Can be implemented on any dataset of choice
• Can also be applied to a particular section of any dataset

Technology Used in the project

This project was developed using the following technology:

Python Version (Recommended): 3.0.1

Programming Language Used: Python

IDE Tool (Recommended): Anaconda, Jupyter Notebook

Project Type: Data Analysis

Supported Operating System

This project can be configured on the following operating systems:

• Windows: This project can easily be configured on the Windows operating system. To run
it on a Windows system, you will have to install Python 2., PIP, Django.
• Linux: This project can also be run on all versions of the Linux operating system.
• Mac: This project can also easily be configured on the Mac operating system.











CHAPTER 2: LITERATURE REVIEW
2.1 Python
Python is an interpreted, high-level programming language for general-purpose programming.
Created by Guido van Rossum and first released in 1991, Python has a design philosophy that
emphasizes code readability, notably using significant whitespace. It provides constructs that
enable clear programming on both small and large scales. In July 2018, Van Rossum stepped
down as the leader of the language community after 30 years.

Python features a dynamic type system and automatic memory management. It supports multiple
programming paradigms, including object-oriented, imperative, functional and procedural, and
has a large and comprehensive standard library.

Python interpreters are available for many operating systems. CPython, the reference
implementation of Python, is open-source software and has a community-based development
model, as do nearly all of Python's other implementations. Python and CPython are managed by
the non-profit Python Software Foundation.

Python has a simple, easy-to-learn syntax that emphasizes readability and therefore reduces the
cost of program maintenance. Python also supports modules and packages, which encourages
program modularity and code reuse.


2.1.1 Advantages of Using Python

The diverse application of the Python language is a result of the combination of features which
give this language an edge over others. Some of the benefits of programming in Python include:

1. Presence of Third-Party Modules:

The Python Package Index (PyPI) contains numerous third-party modules that make Python
capable of interacting with most other languages and platforms.

2. Extensive Support Libraries:

Python provides a large standard library which covers areas like internet protocols, string
operations, web services tools and operating system interfaces. Many frequently used
programming tasks have already been scripted into the standard library, which significantly
reduces the length of code to be written.

3. Open Source and Community Development:

The Python language is developed under an OSI-approved open source license, which makes it
free to use and distribute, including for commercial purposes. Further, its development is
driven by the community, which collaborates on its code through conferences and mailing
lists, and provides its numerous modules.

4. Learning Ease and Support Available:

Python offers excellent readability and an uncluttered, simple-to-learn syntax, which helps
beginners to use this programming language. The code style guidelines, PEP 8, provide a
set of rules to facilitate the formatting of code. Additionally, the wide base of users and active
developers has resulted in a rich internet resource bank that encourages development and the
continued adoption of the language.

5. User-Friendly Data Structures:

Python has built-in list and dictionary data structures which can be used to construct fast
runtime data structures. Further, Python also provides the option of dynamic high-level data
typing, which reduces the length of support code that is needed.

6. Productivity and Speed:

Python has a clean object-oriented design, provides enhanced process control capabilities, and
possesses strong integration and text processing capabilities and its own unit testing
framework, all of which contribute to the increase in its speed and productivity. Python is
considered a viable option for building complex multi-protocol network applications.

2.2 Machine Learning

Machine learning (ML) is a field of study focused on understanding and developing
"learning" methods, that is, methods that use data to improve performance on a certain set of
tasks. It is considered to be a component of artificial intelligence. Machine learning algorithms
build a model from sample data, also referred to as training data, in order to make predictions
or decisions without being explicitly programmed to do so. Machine learning algorithms are
used in a wide range of applications, including email filtering, computer vision, and
medicine, where it is challenging or impractical to create traditional algorithms that can carry
out the required functions. Computational statistics, which focuses on making predictions with
computers, is closely related to a subset of machine learning, but not all machine learning is
statistical learning. The field of machine learning benefits from the tools, theory, and
application domains that come from the study of mathematical optimization.

Machine learning approaches are traditionally divided into three broad categories, which
correspond to learning paradigms, depending on the nature of the "signal" or "feedback"
available to the learning system:

• Supervised learning
• Unsupervised learning
• Reinforcement learning

Supervised Learning:
Supervised learning algorithms build a mathematical model of a set of data that contains both
the inputs and the desired outputs. The data is known as training data and consists of a set of
training examples. Each training example has one or more inputs and the desired output, also
known as a supervisory signal.
In the mathematical model, each training example is represented by an array or vector,
sometimes called a feature vector, and the training data is represented by a matrix. Through
iterative optimization of an objective function, supervised learning algorithms learn a function
that can be used to predict the output associated with new inputs. An optimal function will
allow the algorithm to correctly determine the output for inputs that were not a part of the
training data. An algorithm that improves the accuracy of its outputs or predictions over time
is said to have learned to perform that task.
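
As a rough illustration of "iterative optimization of an objective function", the sketch below fits a linear model by gradient descent on the mean squared error. The four training examples, the learning rate and the number of steps are all assumptions chosen for the example; the targets follow y = 2x + 1 exactly, so the learned function can be checked against a known answer.

```python
# Training data: each example is (feature vector, desired output).
# The targets follow y = 2*x + 1 exactly, so the best fit is known in advance.
examples = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0), ([3.0], 7.0)]

def predict(weights, bias, features):
    """Linear model: weighted sum of the features plus a bias term."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def mean_squared_error(weights, bias, data):
    """Objective function that training tries to minimise."""
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in data) / len(data)

def fit(data, learning_rate=0.05, steps=5000):
    """Gradient descent on the mean squared error."""
    n_features = len(data[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in data:
            error = predict(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += 2 * error * x[j] / len(data)
            grad_b += 2 * error / len(data)
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias

weights, bias = fit(examples)
print(weights, bias)
print(predict(weights, bias, [10.0]))  # an input that was not in the training data
```

The last line shows the point made above: once the function has been learned, it can be applied to an input that never appeared in the training data.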

Unsupervised Learning:

Unsupervised learning algorithms take a set of data that contains only inputs, and find
structure in the data, like grouping or clustering of data points. The algorithms, therefore,
learn from data that has not been labeled, classified, or categorized. Instead of responding
to feedback, unsupervised learning algorithms identify commonalities in the data and react
based on the presence or absence of such commonalities in each new piece of data. A central
application of unsupervised learning is in the field of density estimation in statistics, such as
finding the probability density function.
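
Clustering, mentioned above, can be sketched with a very small k-means routine. This is a simplified illustration on one-dimensional data: the points and the starting centroids are invented for the example, and k_means_1d is a hypothetical helper name rather than a library function.

```python
def k_means_1d(points, centroids, iterations=10):
    """Plain k-means on numbers: assign each point to its nearest centroid,
    then move every centroid to the average of its assigned points."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ended up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Unlabeled inputs only: no desired outputs are supplied.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

No labels are given to the routine; the two groups emerge only from how close the values are to each other.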

Reinforcement Learning:

Reinforcement learning is an area of machine learning concerned with how software agents
ought to take actions in an environment so as to maximize some notion of cumulative reward.
Due to its generality, the field is studied in many other disciplines, such as game theory,
control theory, operations research, information theory, simulation-based optimization, multi-
agent systems, swarm intelligence, statistics, and genetic algorithms. In machine learning, the
environment is typically represented as a Markov decision process (MDP). Many
reinforcement learning algorithms use dynamic programming techniques. Reinforcement
learning algorithms do not assume knowledge of an exact mathematical model of the MDP,
and are used when exact models are infeasible. Reinforcement learning algorithms are used in
autonomous vehicles or in learning to play a game against a human opponent.
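
The dynamic programming idea can be sketched with value iteration on a tiny MDP. Note the assumption: here the model of the environment is fully known, which is the classical dynamic programming setting, whereas reinforcement learning algorithms typically have to work without it. The four-state corridor, the reward of 1 for reaching the goal and the discount factor of 0.9 are all invented for this example.

```python
GOAL = 3             # states are 0, 1, 2, 3; state 3 is terminal
GAMMA = 0.9          # discount factor (assumed)
ACTIONS = (-1, +1)   # move left or move right

def step(state, action):
    """Deterministic transition: returns (next_state, reward)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def action_value(state, action, values):
    """Immediate reward plus the discounted value of the next state."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]

values = [0.0] * (GOAL + 1)
for _ in range(50):                # repeated sweeps until the values settle
    for state in range(GOAL):      # the terminal state keeps value 0
        values[state] = max(action_value(state, a, values) for a in ACTIONS)

# Greedy policy: in every non-terminal state pick the action with the best value.
policy = [max(ACTIONS, key=lambda a: action_value(s, a, values)) for s in range(GOAL)]
print(values)
print(policy)
```

The resulting policy moves right in every state, which maximizes the cumulative discounted reward in this corridor.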
Models:
Performing machine learning involves creating a model, which is trained on some training
data and then can process additional data to make predictions. Various types of models have
been used and researched for machine learning systems:

• Artificial neural networks
• Decision trees
• Support-vector machines
• Regression analysis
• Bayesian networks
• Gaussian processes
• Genetic algorithms

Prepare and clean the dataset


Machine learning models generally need large amounts of high-quality training data to ensure an
accurate model. Generally, the model will learn the relationships between input and output data
from this training dataset. The makeup of these datasets will differ depending on the type of
machine learning training being performed. Supervised machine learning models are trained on
labelled datasets, which contain both input variables and labelled output variables.

The process of preparing and labelling the data is usually completed by a data scientist and is
often labour intensive. Unsupervised machine learning models, on the other hand, do not need
labelled data, so the training dataset will just contain input variables or features. In both types of
machine learning, the quality of data has a major effect on the overall effectiveness of the model.
The model learns from the data, so poor-quality training data may mean the model is
ineffective once deployed. The data should be checked and cleaned so that it is standardised, any
missing data is identified, and any outliers are detected.
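
The three checks named above (missing data, outliers, standardisation) can be sketched with the Python standard library alone. The readings are made-up numbers, and the 1.5 x IQR rule used to flag outliers is just one common convention assumed for this example, not necessarily the rule used in the project.

```python
import statistics

# Hypothetical readings; None marks a missing entry.
readings = [5200.0, 4800.0, None, 5100.0, 4950.0, 5050.0, 25000.0, 4900.0]

# Identify missing data by position, then work with the values that are present.
missing_positions = [i for i, v in enumerate(readings) if v is None]
present = [v for v in readings if v is not None]

# Detect outliers: flag values more than 1.5 * IQR outside the quartiles.
q1, _, q3 = statistics.quantiles(present, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in present if v < low or v > high]
kept = [v for v in present if low <= v <= high]

# Standardise the kept values to zero mean and unit standard deviation.
mean, stdev = statistics.mean(kept), statistics.stdev(kept)
standardised = [(v - mean) / stdev for v in kept]

print("missing at positions:", missing_positions)
print("outliers:", outliers)
print("standardised:", [round(v, 2) for v in standardised])
```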
Split the prepared dataset and perform cross validation

The real-world effectiveness of a machine learning model depends on its ability to generalise, to
apply the logic learned from training data to new and unseen data. Models are often at risk of
being overfitted to the training data, which means the algorithm is too closely aligned to the
original training data. The result will be a drop in accuracy or even a loss in function when
encountering new data in a live environment.

To counter this, the prepared data is usually split into training and testing data. The majority of the
dataset is reserved as training data (for example around 80% of the overall dataset), and a subset
of testing data is also created. The model can then be trained and built off the training data, before
being measured on the testing data. The testing data acts as new and unseen data, allowing the
model to be assessed for accuracy and levels of generalisation.
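
A minimal sketch of such a split, assuming an 80% training fraction and a fixed random seed so the result is repeatable; train_test_split here is a small hand-written helper for illustration, not a library call.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))  # stand-in for 100 prepared data rows
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))
```

Shuffling before cutting matters when the rows are stored in some order, for example sorted by price, because otherwise the testing data would not resemble the training data.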

This process is called cross validation in machine learning, as it validates the effectiveness of the
model against unseen data. There is a range of cross validation techniques, categorised as either
exhaustive or non-exhaustive approaches. Exhaustive cross validation techniques will test all
combinations and iterations of a training and testing dataset. Non-exhaustive cross validation
techniques will create a randomised partition of training and testing subsets. The exhaustive
approach will provide more in-depth insight into the dataset but will take much more time and
resources in contrast to a non-exhaustive approach.
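
One widely used non-exhaustive technique is k-fold cross validation, sketched below. The helper name k_fold_indices, the choice of k = 5 and the seed are assumptions made for the example. Setting k equal to the number of rows turns the same routine into leave-one-out cross validation, which is an exhaustive technique.

```python
import random

def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)       # randomised partition
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold

splits = list(k_fold_indices(10, k=5))
for train_idx, test_idx in splits:
    print(sorted(test_idx), "held out;", len(train_idx), "rows used for training")
```

Every row is used for testing exactly once, so the model is trained and scored k times and the scores can be averaged.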
CHAPTER 3: CODING
Project 1: Building a machine learning model

Data Used For ML Model

Importing Required Libraries

Dataset After Data Cleaning and Amending


# Drop the 'size' and 'price_per_sqft' columns, then preview the first three rows
df10 = df9.drop(['size', 'price_per_sqft'], axis='columns')
df10.head(3)
Predictions

Averaging of Values in Ranges


Conclusion
This internship has introduced me to Machine Learning. I now know that Machine Learning
is a technique of training machines to perform activities a human brain can do, often
faster and better than an average human being. We have seen that machines can beat human
champions in games such as Chess and Go, which are considered very complex. I have seen
that machines can be trained to perform human activities in several areas and can aid humans
in living better lives.

Machine Learning can be Supervised or Unsupervised. If I have a smaller amount of clearly
labeled data for training, I should opt for Supervised Learning. Unsupervised Learning would
generally give better performance and results for large data sets. If I have a huge data set
easily available, it is better to go for deep learning techniques. I have also learned about
Reinforcement Learning and Deep Reinforcement Learning, and I now know what Neural
Networks are, along with their applications and limitations.

Finally, when it comes to developing machine learning models of my own, I looked at the
choices of various development languages, IDEs and platforms. The next thing I need to do
is start learning and practicing each machine learning technique. The subject is vast in
breadth, but in terms of depth each topic can be learned in a few hours. The topics are
independent of one another. I need to take up one topic at a time and implement its
algorithm(s) in a language of my choice. This is the best way to start studying Machine
Learning. By practicing one topic at a time, I would very soon acquire the breadth that is
eventually required of a Machine Learning expert.

In conclusion, the summer training was time well spent. I learned a lot, in both technical and
non-technical aspects. In the company environment I also came to know that my strengths lie
in teamwork, analytical thinking and technical skills.
Machine Learning in the Future

As we head towards an even more tech-driven future, Machine Learning is one of the best
career choices of the 21st century. It offers plenty of job opportunities with high salaries.
Also, the future scope of Machine Learning is on its way to making a drastic change in the
world of automation. Further, there is a wide scope for Machine Learning in India. Thus, one
can make a lucrative career in the field of Machine Learning and contribute to this growing
digital world.

Bibliography
• https://www.expert.ai/
• https://www.geeksforgeeks.org/python-programming-language/
• https://www.python.org/
• Open source platforms (GitHub)
