INTERNSHIP REPORT
ON
BACHELOR OF ENGINEERING IN
ELECTRONICS AND
COMMUNICATION ENGINEERING
Submitted by:
POOJA N
4UB19EC048
Conducted at
Varcons Technologies Pvt Ltd
2022-23
CERTIFICATE
This is to certify that the Internship titled “Machine Learning Internship with Project:
Predictive Model for Forecasting Demand and Supply of TOP Crops” has been carried out by Ms.
Pooja N, a bonafide student of UBDT College of Engineering, in partial fulfillment
for the award of the degree of Bachelor of Engineering in Electronics and Communication
Engineering under Visvesvaraya Technological University, Belagavi, during the year 2022-23. It is
certified that all corrections/suggestions indicated have been incorporated in the report.
The report has been approved as it satisfies the academic requirements in respect
of the Internship prescribed for the course Internship / Professional Practice (18ECI85).
External Viva:
1)
2)
OFFER LETTER
CERTIFICATE
DECLARATION
Date : 1/10/2022
Place : Davangere
USN : 4UB19EC048
NAME : POOJA N
ACKNOWLEDGEMENT
This Internship is a result of accumulated guidance, direction and support of several important persons.
We take this opportunity to express our gratitude to all who have helped us to complete the Internship.
We express our sincere thanks to our Principal for providing us adequate facilities to undertake this
Internship.
We would like to thank our Head of Dept , for providing us an opportunity to carry out Internship and
for his valuable guidance and support.
We would like to thank our (Lab assistant name) Software Services for guiding us during the period of
internship.
We express our deep and profound gratitude to our guide, Dr Sreedharmurthy S K, Professor, for his
keen interest and encouragement at every step in completing the Internship.
We would like to thank all the faculty members of our department for the support extended during the
course of Internship.
We would like to thank the non-teaching members of our dept, for helping us during the Internship.
Last but not the least, we would like to thank our parents and friends, without whose constant help
the completion of the Internship would not have been possible.
NAME POOJA N
USN 4UB19EC048
ABSTRACT
External factors, such as social media and financial news, can have widespread effects on stock price
movement. For this reason, social media is considered a useful resource for precise market predictions.
In this paper, we show the effectiveness of using Twitter posts to predict stock prices. We start by
training various models on the Sentiment140 Twitter dataset. We found that Support Vector Machines
(SVM) performed best (0.83 accuracy) in the sentiment analysis, so we used them to predict the average
sentiment of tweets for each day that the market was open. Next, we use the sentiment analysis of one
year's data of tweets containing the keywords “stock market”, “stocktwits” and “AAPL”, with the goal of
predicting the corresponding stock prices of Apple Inc. (AAPL) and the US Dow Jones Industrial
Average (DJIA) index. Two models, Boosted Regression Trees and Multilayer Perceptron
Neural Networks, were used to predict the closing price difference of AAPL and DJIA prices. We show
that neural networks perform substantially better than traditional models for stock price prediction.
Table of Contents
Sl no Description
1 Company Profile
2 About the Company
3 Objectives of Internship
4 Tasks Performed
5 Introduction to Python
6 Introduction to Machine Learning
7 Algorithms of Machine Learning
8 Project Implementation
9 Snapshots
10 Conclusion
COMPANY PROFILE
Sarvamoola Software Services is a technology organization providing solutions for web design
and development, MySQL, Python programming, HTML, CSS, ASP.NET and LINQ. Meeting
the ever increasing automation requirements, Sarvamoola Software Services specializes in ERP,
connectivity, SEO services, conference management, effective web promotion and tailor-made
software products, designing solutions best suiting clients' requirements.
Varcons Technologies strives to be the front runner in creativity and innovation in software
development through its well-researched expertise, and to establish itself as an out-of-the-box software
development company in Bangalore, India. As a software development company, they translate this
software development expertise into value for their customers through their professional solutions.
They understand that the best desired output can be achieved only by understanding the client's
demands better. Varcons Technologies works with their clients and helps them to define their exact
solution requirement. Sometimes clients even find that they have completely redefined their
solution or new application requirement during the brainstorming sessions, and here they position
themselves as an IT solutions consulting group comprising high-caliber consultants.
They believe that technology, when used properly, can help any business to scale and achieve new
heights of success. It helps improve its efficiency, profitability and reliability; to put it in one sentence,
“Technology helps you to delight your customers”, and that is what they want to achieve.
ABOUT THE COMPANY
Varcons Technologies is a technology organization providing solutions for web design and
development, MySQL, Python programming, HTML, CSS, ASP.NET and LINQ. Meeting the
ever increasing automation requirements, Varcons Technologies specializes in ERP, connectivity,
SEO services, conference management, effective web promotion and tailor-made software
products, designing solutions best suiting clients' requirements. The organization has the
right mix of professionals as stakeholders to help serve its clients to the best of its
capability and at par with industry standards. They have young, enthusiastic, passionate and
creative professionals to develop technological innovations in the field of mobile technologies, web
applications as well as business and enterprise solutions. The motto of the organization is to
“Collaborate with our clients to provide them with the best technological solution, hence creating a good
present and a better future for our clients, which will bring a cascading positive effect on their
business shape as well”. “Providing a complete suite of technical solutions” is not just their tag line;
it is their vision for their clients and for themselves, and they strive hard to achieve it.
Android Development
It is the process by which new applications are created for devices running the Android operating
system. Applications are usually developed in Java (and/or Kotlin, or other such options)
using the Android software development kit (SDK), but other development
environments are also available; some, such as Kotlin, support the exact same Android APIs (and
bytecode), while others, such as Go, have restricted API access.
The Android software development kit includes a comprehensive set of development tools. These
include a debugger, libraries, a handset emulator based on QEMU, documentation, sample code,
and tutorials. Currently supported development platforms include computers running Linux (any
modern desktop Linux distribution), Mac OS X 10.5.8 or later, and Windows 7 or later. As of
March 2015, the SDK is not available on Android itself, but software development is possible by
using specialized Android applications.
Web Application
It is a client–server computer program in which the client (including the user interface and client-side
logic) runs in a web browser. Common web applications include web mail, online retail sales, online
auctions, wikis, instant messaging services and many other functions. Web applications use web
documents written in a standard format such as HTML and JavaScript, which are supported by a
variety of web browsers. Web applications can be considered a specific variant of client–server
software where the client software is downloaded to the client machine when visiting the relevant web
page, using standard procedures such as HTTP. Client web software updates may happen each time
the web page is visited. During the session, the web browser interprets and displays the pages, and
acts as the universal client for any web application. The use of web application frameworks can often
reduce the number of errors in a program, both by making the code simpler and by allowing one team
to concentrate on the framework while another focuses on a specified use case. In applications which
are exposed to constant hacking attempts on the Internet, security-related problems can be caused by
errors in the program.
Frameworks can also promote the use of best practices such as GET after POST. Some
view a web application as a two-tier architecture. This can be a “smart” client that performs all
the work and queries a “dumb” server, or a “dumb” client that relies on a “smart” server. The client
would handle the presentation tier, the server would have the database (storage tier), and the
business logic (application tier) would be on one of them or on both. While this increases the
scalability of the applications and separates the display and the database, it still doesn't allow for
true specialization of layers, so most applications will outgrow this model. An emerging strategy
for application software companies is to provide web access to software previously distributed as
local applications. Depending on the type of application, it may require the development of an
entirely different browser-based interface, or merely adapting an existing application to use
different presentation technology. These programs allow the user to pay a monthly or yearly fee
for use of a software application without having to install it on a local hard drive. A company
which follows this strategy is known as an application service provider (ASP), and ASPs are
currently receiving much attention in the software industry.
Security breaches in these kinds of applications are a major concern because they can involve both
enterprise information and private customer data. Protecting these assets is an important part of any
web application, and there are some key operational areas that must be included in the development
process. These include processes for authentication, authorization, asset handling, input, and logging
and auditing. Building security into applications from the beginning can be more effective and less
disruptive in the long run.
Web design
Web design encompasses many different skills and disciplines in the production and maintenance of websites.
The different areas of web design include web graphic design; interface design; authoring, including
standardized code and proprietary software; user experience design; and
search engine optimization. The term web design is normally used to describe the design process
relating to the front-end (client-side) design of a website, including writing markup. Web design
partially overlaps web engineering in the broader scope of web development. Web designers are
expected to have an awareness of usability, and if their role involves creating markup then they are also
expected to be up to date with web accessibility guidelines.
Varcons Technologies plays an essential role as an institute; the level of education and the development of
students' skills depend on their trainers. If you do not have a good mentor then you may lag behind
others in many things, and that is why Varcons Technologies gives you the facility of skilled
employees, so that you do not feel insecure about the academics. Personality development and
academic status are some of those things which lie in a mentor's hands. If you are trained well then you
can do well in your future, and knowing this importance, Varcons Technologies always tries to give
you the best.
They have a great team of skilled mentors who are always ready to direct their trainees in the best
possible way, and to ensure the skills of the mentors they hold many skill development programs as
well, so that each and every mentor can develop their own skills in line with the demands of the
companies and prepare a complete, well-rounded trainee.
• Python
• Selenium Testing
OBJECTIVES OF INTERNSHIP
1. Gain knowledge and experience in the field of machine learning through hands-on projects.
2. Ability to understand data, design a model and understand its intricacies.
TASKS PERFORMED
Introduction to Python
Python
Python is a popular programming language. It was created by Guido van Rossum, and released in
1991.
It is used for:
web development (server-side),
software development,
mathematics,
system scripting.
Application of Python
Python can be used on a server to create web applications.
Python can be used alongside software to create workflows.
Python can connect to database systems. It can also read and modify files.
Python can be used to handle big data and perform complex mathematics.
Python can be used for rapid prototyping, or for production-ready software
development.
Features of Python
Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc.).
Python has a simple syntax similar to the English language.
Python has syntax that allows developers to write programs with fewer lines than some
other programming languages.
Python runs on an interpreter system, meaning that code can be executed as soon as it is
written. This means that prototyping can be very quick.
Python can be treated in a procedural way, an object-oriented way or a functional
way.
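The last point can be seen side by side in a short sketch. The task and names here are our own illustration, not from the internship work:

```python
# The same task -- summing the squares of a list -- in three Python styles.

numbers = [1, 2, 3, 4]

# Procedural style: an explicit loop with an accumulator
total = 0
for n in numbers:
    total += n * n

# Object-oriented style: wrap the behaviour in a class
class SquareSummer:
    def __init__(self, values):
        self.values = values

    def sum_squares(self):
        return sum(v * v for v in self.values)

# Functional style: map/sum with no mutable state
functional_total = sum(map(lambda n: n * n, numbers))

print(total, SquareSummer(numbers).sum_squares(), functional_total)  # all 30
```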
Introduction to ML
Machine Learning (ML) is the field of computer science with the help of which computer
systems can make sense of data in much the same way as human beings do. In simple words,
ML is a type of artificial intelligence that extracts patterns out of raw data by using an algorithm
or method. The main focus of ML is to allow computer systems to learn from experience without
being explicitly programmed or requiring human intervention.
Human beings, at this moment, are the most intelligent and advanced species on earth because
they can think, evaluate and solve complex problems. On the other side, AI is still in its initial
stage and hasn't surpassed human intelligence in many aspects. The question, then, is: what is
the need to make machines learn? The most suitable reason for doing this is “to make decisions,
based on data, with efficiency and scale”. Lately, organizations have been investing heavily in newer
technologies like Artificial Intelligence, Machine Learning and Deep Learning to get the key
information from data to perform several real-world tasks and solve problems. We can call these
data-driven decisions taken by machines, particularly to automate processes. Such data-driven
decisions can be used, instead of programming logic, in problems that cannot be
programmed inherently. The fact is that we can't do without human intelligence, but the other
aspect is that we all need to solve real-world problems with efficiency at a huge scale. That is
why the need for machine learning arises.
1. Numpy:
NumPy is a Python library used for working with arrays. It also has functions for working in
the domains of linear algebra, Fourier transforms, and matrices. NumPy was created in 2005 by
Travis Oliphant. It is an open source project and you can use it freely. NumPy stands for
Numerical Python.
In Python we have lists that serve the purpose of arrays, but they are slow to process. NumPy
aims to provide an array object that is up to 50x faster than traditional Python lists. The array
object in NumPy is called ndarray, and it provides a lot of supporting functions that make working
with ndarray very easy.
Arrays are very frequently used in data science, where speed and resources are very important.
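A minimal sketch of the ndarray object and a few of its supporting functions (the values are illustrative):

```python
import numpy as np

# Create an ndarray from a Python list
arr = np.array([1, 2, 3, 4, 5])

# Vectorized arithmetic operates on the whole array at once,
# which is where NumPy's speed advantage over plain lists comes from
doubled = arr * 2

# A few of the supporting functions: shape, mean, dot product
print(arr.shape)         # (5,)
print(arr.mean())        # 3.0
print(np.dot(arr, arr))  # 55 = 1 + 4 + 9 + 16 + 25
```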
2. Pandas:
pandas is a Python package providing fast, flexible, and expressive data structures designed to
make working with “relational” or “labeled” data both easy and intuitive. It aims to be the
fundamental high-level building block for doing practical, real-world data analysis in Python.
Additionally, it has the broader goal of becoming the most powerful and flexible open source
data analysis/manipulation tool available in any language. It is already well on its way toward this
goal.
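As a small illustration of the labeled data structures described above (the crop names and prices here are made up):

```python
import pandas as pd

# A small labeled dataset of the kind pandas is designed for
df = pd.DataFrame({
    'crop': ['tomato', 'onion', 'potato'],
    'price': [24.0, 18.5, 12.0],
})

# Label-based selection and a simple aggregation
cheap = df[df['price'] < 20]
print(cheap['crop'].tolist())  # ['onion', 'potato']
print(df['price'].mean())
```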
3. Matplotlib:
Matplotlib is a cross-platform data visualization and graphical plotting library for Python and
its numerical extension NumPy. As such, it offers a viable open source alternative to MATLAB.
Developers can also use matplotlib's APIs (Application Programming Interfaces) to embed plots
in GUI applications.
A Python matplotlib script is structured so that a few lines of code are all that is required in
most instances to generate a visual data plot. The matplotlib scripting layer overlays two APIs:
The pyplot API, a hierarchy of Python code objects topped by matplotlib.pyplot.
An OO (Object-Oriented) API, a collection of objects that can be assembled with greater
flexibility than pyplot. This API provides direct access to Matplotlib's backend layers.
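A few lines of the pyplot API are indeed enough for a complete labeled plot; a minimal headless sketch (the data and output file name are our own):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [v * v for v in x]

# The pyplot API: a few lines for a complete labeled plot
plt.plot(x, y, marker='o')
plt.xlabel('x')
plt.ylabel('x squared')
plt.title('A minimal matplotlib plot')
plt.savefig('plot.png')
plt.close()
```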
4. Scikit-learn:
Scikit-learn is an open source data analysis library, and the gold standard for Machine
Learning (ML) in the Python ecosystem. Key concepts and features include
algorithmic decision-making methods, such as:
Regression: predicting or projecting data values based on the average mean of existing and
planned data.
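A short sketch of a scikit-learn regression of the kind described above (the toy data is ours, and it is noise-free, so the fit is exact):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data following y = 2x + 1 exactly
X = np.array([[1], [2], [3], [4]])
y = np.array([3, 5, 7, 9])

model = LinearRegression()
model.fit(X, y)

print(model.coef_[0])        # ~2.0, the recovered slope
print(model.intercept_)      # ~1.0, the recovered intercept
print(model.predict([[5]]))  # ~[11.]
```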
Machine Learning Algorithms
Machine Learning algorithms are programs that can learn hidden patterns from data,
predict the output, and improve their performance from experience on their own. Different
algorithms can be used in machine learning for different tasks; for example, simple linear regression
can be used for prediction problems like stock market prediction, and the KNN algorithm can
be used for classification problems.
In this topic, we will see an overview of some popular and most commonly used machine
learning algorithms along with their use cases and categories.
1) Supervised Learning Algorithm
Supervised learning is a type of machine learning in which the machine needs external
supervision to learn. Supervised learning models are trained using a labeled dataset. Once
the training and processing are done, the model is tested with sample test data to check
whether it predicts the correct output.
The goal of supervised learning is to map input data to the output data. Supervised learning is
based on supervision, just as a student learns things under a teacher's supervision. An example
of supervised learning is spam filtering. It can be divided into two types:
o Classification
o Regression
Examples of some popular supervised learning algorithms are Simple Linear Regression, Decision
Tree, Logistic Regression, the KNN algorithm, etc.
2) Unsupervised Learning Algorithm
It is a type of machine learning in which the machine does not need any external supervision to
learn from the data, hence called unsupervised learning. Unsupervised models are trained
using an unlabelled dataset that is neither classified
nor categorized, and the algorithm needs to act on that data without any supervision. In
unsupervised learning, the model doesn't have a predefined output, and it tries to find useful
insights from huge amounts of data. These algorithms are used to solve association and clustering
problems. Hence it can be classified further into two types:
o Clustering
o Association
Examples of some unsupervised learning algorithms are K-means Clustering, the Apriori Algorithm,
Eclat, etc.
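As an illustration of clustering without labels, a minimal K-means sketch using scikit-learn (the points are made up so that two groups are obvious):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious groups of 2-D points; no labels are provided
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# Ask K-means to discover k=2 clusters on its own
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

labels = kmeans.labels_
print(labels)  # the first three points share one label, the last three the other
```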
3) Reinforcement Learning
In reinforcement learning, an agent interacts with its environment by producing actions, and
learns with the help of feedback. The feedback is given to the agent in the form of rewards:
for each good action, it gets a positive reward, and for each bad action, it gets a negative reward.
There is no supervision provided to the agent. The Q-Learning algorithm is used in reinforcement
learning.
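The reward-driven update at the heart of Q-Learning can be sketched in a few lines (the states, actions and reward values are illustrative, not from any real environment):

```python
# A single Q-learning update on a tiny 2-state, 2-action problem.
# Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))

alpha = 0.5   # learning rate
gamma = 0.9   # discount factor

# Q-table: Q[state][action], initialised to zero
Q = [[0.0, 0.0], [0.0, 0.0]]

# The agent took `action` in `state`, received `reward`, landed in `next_state`
state, action, reward, next_state = 0, 1, 1.0, 1

best_next = max(Q[next_state])
Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

print(Q[0][1])  # 0.5 after this single update
```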
ALGORITHMS:
1. Linear Regression
Linear regression is one of the most popular and simple machine learning algorithms; it is used
for predictive analysis. Here, predictive analysis defines the prediction of something, and linear
regression makes predictions for continuous numbers such as salary, age, etc.
It shows the linear relationship between the dependent and independent variables, and shows how
the dependent variable (y) changes according to the independent variable (x).
It tries to best fit a line between the dependent and independent variables, and this best-fit line is
known as the regression line.
y = a0 + a1*x
y = dependent variable
x = independent variable
a0 = intercept of the line
a1 = slope of the line
A typical diagram of linear regression shows, for example, the prediction of weight according to height.
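Fitting the regression line y = a0 + a1*x can be sketched with NumPy's least-squares fit. The height/weight values below are invented and exactly linear, so the recovered slope and intercept are exact:

```python
import numpy as np

# Made-up (height_cm, weight_kg) pairs, exactly linear: weight = 0.6*height - 40
height = np.array([150, 160, 170, 180, 190])
weight = np.array([50, 56, 62, 68, 74])

# np.polyfit returns [slope a1, intercept a0] for a degree-1 fit
a1, a0 = np.polyfit(height, weight, 1)

print(a1, a0)  # ~0.6 and ~-40

# Predict weight for a new height using the regression line y = a0 + a1*x
predicted = a0 + a1 * 175
print(predicted)  # ~65
```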
2. Logistic Regression
Logistic regression is a supervised learning algorithm which is used to predict categorical
variables or discrete values. It can be used for classification problems in machine learning, and
the output of the logistic regression algorithm can be either Yes or No, 0 or 1, Red or Blue, etc.
Logistic regression is similar to linear regression except in how it is used: linear
regression is used to solve regression problems and predict continuous values, whereas
logistic regression is used to solve classification problems and predict discrete
values.
Instead of fitting a best-fit line, it forms an S-shaped curve that lies between 0 and 1.
The S-shaped curve is also known as the logistic function, which uses the concept of a threshold.
Any value above the threshold will tend to 1, and any value below the threshold will tend to 0.
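The S-shaped logistic function and its threshold rule can be sketched directly (a minimal illustration, not the full logistic-regression training procedure):

```python
import math

def sigmoid(z):
    """The S-shaped logistic function: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    """Values above the threshold tend to class 1, below to class 0."""
    return 1 if sigmoid(z) >= threshold else 0

print(sigmoid(0))    # 0.5, exactly at the threshold
print(classify(3))   # 1
print(classify(-3))  # 0
```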
3. Decision Tree Algorithm
A decision tree is a supervised learning algorithm that is mainly used to solve classification
problems but can also be used for solving regression problems. It can work with both
categorical variables and continuous variables. It shows a tree-like structure that includes nodes
and branches, starting with the root node that expands into further branches down to the leaf nodes. The
internal nodes represent the features of the dataset, branches show the decision rules, and
leaf nodes represent the outcome of the problem.
Some real-world applications of decision tree algorithms are identification between cancerous
and non-cancerous cells, suggestions to customers about buying a car, etc.
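A minimal decision-tree sketch using scikit-learn, in the spirit of the car-buying example above (the age/income data is invented):

```python
from sklearn.tree import DecisionTreeClassifier

# Toy data: [age, income_level] -> buys_car (0/1); values are made up,
# and income_level alone perfectly separates the two classes here
X = [[25, 1], [30, 3], [45, 2], [50, 4], [22, 1], [60, 4]]
y = [0, 1, 0, 1, 0, 1]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# An unpruned tree fits its training data exactly
print(list(tree.predict(X)))
print(tree.predict([[28, 3]]))  # classified by the learned income split
```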
4. Support Vector Machine (SVM)
A Support Vector Machine is a supervised learning algorithm that finds the hyperplane which best
separates the classes in the data. The data points that help to define the hyperplane are known as
support vectors, and hence it is named the support vector machine algorithm.
Some real-life applications of SVM are face detection, image classification, drug discovery, etc.
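A short SVM sketch with scikit-learn on two linearly separable groups (the points are made up):

```python
from sklearn.svm import SVC

# Two linearly separable groups of 2-D points
X = [[0, 0], [1, 1], [1, 0], [8, 8], [9, 9], [8, 9]]
y = [0, 0, 0, 1, 1, 1]

# A linear-kernel SVM finds the separating hyperplane;
# the points closest to it are the support vectors
clf = SVC(kernel='linear').fit(X, y)

print(clf.predict([[0.5, 0.5], [8.5, 8.5]]))  # one point from each side
print(len(clf.support_vectors_))  # the handful of points defining the hyperplane
```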
5. K-Nearest Neighbour (KNN)
K-Nearest Neighbour is a supervised learning algorithm that can be used for both classification
and regression problems. The algorithm works by assessing the similarities between the new data
point and the available data points. Based on these similarities, the new data point is put into the
most similar category. It is also known as the lazy learner algorithm, as it stores all the available
data and classifies each new case with the help of its K neighbours. The new case is assigned to the
nearest class with the most similarities, and a distance function measures the distance between the
data points. The distance function can be the Euclidean, Minkowski, Manhattan, or Hamming
distance, based on the requirement.
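A minimal KNN sketch with scikit-learn (the one-dimensional points are invented so that the three nearest neighbours of each query are obvious):

```python
from sklearn.neighbors import KNeighborsClassifier

# Two groups of 1-D points with labels
X = [[1], [2], [3], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

# k=3: a new point gets the majority class of its 3 nearest neighbours
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

print(knn.predict([[2.5], [10.5]]))  # one query near each group
```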
Project Implementation
Proposed Model
Objectives
This analysis helps farmers in evaluating future demands of the crops. This report
will help farmers determine the variety and the time of planting of the crops. The main objective of
this work is to help farmers by providing historical crop yield data with cost
forecasting for risk management. Also, the data collected would help the administration in
making crop protection arrangements and strategies for supply chain activity. This
will likewise enable the government to have an equilibrium costing over the TOP crops, so
they can sell the produce in the market at a reasonable price.
DESIGN & ANALYSIS
The work was carried out in four stages: data collection, data pre-processing,
prediction and data visualization. General parameters for predicting the crop price are
climate change, government policy and demand. The acquired data was not present in the
required format; thus, it was structured into the required format. Prediction is done by utilizing
past years' data of crop prices. The output provides one year of data, i.e., a yearly price prediction.
Two input parameters are considered for yearly price prediction: the price of the crop that is
output in a certain period, and the price of the input in the previous period. Machine
learning algorithms are used to predict and forecast demand, supply and prices. Prediction
is accomplished using regression analysis, a type of supervised learning in the domain
of machine learning that results in a predicted relationship between labels and data points.
Data acquisition and ingestion:
This is the process of transporting data from one or more sources to a target site for
further processing and analysis. Agricultural data of previous years is collected and used
by the system. This dataset includes crop areas, types of crops cultivated, nature of the soil,
yields and overall crop consumption. Data is gathered from authentic websites like the
Ministry of Agriculture & Farmers Welfare, the Food and Agriculture Organization, APEDA,
NITI Aayog, the Agriculture Marketing Department of Karnataka, Indiastat.com and the
Competition Commission. Additionally, some unpublished data has also been procured
from APMCs and district agriculture and horticulture departments.
The weather condition data is collected from authentic sources like the IMD.
Figure: Dataset
Data pre-processing:
Crop prices are affected by several factors such as climate, supply and demand. The
obtained data contained a huge number of outliers, null values and many discontinuous
values. An outlier is a data point that is noticeably different from the rest. Outliers represent
errors in measurement, bad data collection, or simply variables not considered when
collecting the data. Learning algorithms are sensitive to outliers. Using Python libraries and
Excel, we managed to reduce the outliers and error values.
Simulated demand data has been used for analysis. We calculated this data by making use of the
demand curve formula which fits the curve. Yearly data is collected for forecasting because it has less
noise. As the data for demand was simulated, the required accuracy is only partially met; when
actual data is available, the accuracy of the model can be increased.
The equation that depicts the relationship between the price of a certain commodity and the
quantity of that commodity that is demanded at that price can be given as
Qd = a - m*P
Qd = Quantity demanded (linear demand curve)
a = Production (Kg)
m = Slope
P = Price (Rs)
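The demand relation Qd = a - m*P can be computed directly; a short sketch with illustrative values for a and m (not the project's actual parameters):

```python
def quantity_demanded(price, production, slope):
    """Linear demand curve: Qd = a - m * P."""
    return production - slope * price

# Illustrative values: 1000 kg produced, demand falls by 20 kg per Rs of price
a, m = 1000.0, 20.0

print(quantity_demanded(10.0, a, m))  # 800.0 kg demanded at Rs 10
print(quantity_demanded(25.0, a, m))  # 500.0 kg demanded at Rs 25
```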
The algorithms and tools thus selected are familiarized with by carrying out some test runs
and finding the optimal algorithm to satisfy the needs. Algorithms like linear regression,
logistic regression and Random Forest are used for prediction and classification. Linear
regression is initially carried out to predict the value of a variable based on the value of
another independent variable. The chosen algorithms are then implemented in sequence to
design a predictive model.
Model Validation:
Model validation is carried out in two phases. In the initial phase, real-time data is given as
input to the designed predictive model to obtain the forecast. These results are
compared, verified and validated against authentic data to check for accuracy. In the second
phase, Orange3, a Python-based data visualization, machine learning and data mining toolkit, has
been used for explorative rapid qualitative data analysis to validate our prediction model.
Code Snippet:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load the historical crop price data
data = pd.read_csv('crop_data.csv')
data['date'] = pd.to_datetime(data['date'])
data.set_index('date', inplace=True)

# Use each crop's previous-period price as the input feature
data['prev_price'] = data.groupby('crop')['price'].shift(1)
data.dropna(inplace=True)

X = data[['prev_price']]
y = data[['price']]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)
prediction = model.predict(X_test)

plt.plot(data.index, data['price'])
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()
SNAPSHOTS
Fig: Price wrt Demand
CONCLUSION
The package was designed in such a way that future modifications can be done easily. The
following conclusions can be deduced from the development of the project:
It provides a friendly graphical user interface which proves to be better when compared
to the existing system.
System security, data security and reliability are the striking features.