VISVESVARAYA TECHNOLOGICAL UNIVERSITY

BELAGAVI – 590018, Karnataka

INTERNSHIP REPORT
ON

“A Predictive Model for Forecasting


Demand and Supply of TOP Crops”
Submitted in partial fulfilment for the award of degree

BACHELOR OF ENGINEERING IN
ELECTRONICS AND
COMMUNICATION ENGINEERING

Submitted by:
POOJA N
4UB19EC048

Conducted at
Varcons Technologies Pvt Ltd

UBDT COLLEGE OF ENGINEERING


Department of Electronics and Communication Engineering
Davangere

UBDT COLLEGE OF ENGINEERING
Department of Electronics and Communication Engineering
Davangere

2022-23

CERTIFICATE

This is to certify that the Internship titled “Machine Learning Internship with Project:
A Predictive Model for Forecasting Demand and Supply of TOP Crops” has been carried out by
Ms. Pooja N, a bonafide student of UBDT COLLEGE OF ENGINEERING, in partial fulfilment
for the award of the degree of Bachelor of Engineering in Electronics and Communication
Engineering under Visvesvaraya Technological University, Belagavi, during the year 2022-23.
It is certified that all corrections/suggestions indicated have been incorporated in the report.

The report has been approved as it satisfies the academic requirements in respect
of the Internship prescribed for the course Internship / Professional Practice (18ECI85).

Signature of Guide Signature of HOD


Dr. Shreedharmurthy S K Dr. Ravendra P Rajput

External Viva:

Name of the Examiner Signature with Date

1)

2)

OFFER LETTER

CERTIFICATE

DECLARATION

I, Pooja N, final year student of Electronics and Communication Engineering, UBDT
College of Engineering - 577004, declare that the Internship has been successfully
completed at VARCONS TECHNOLOGIES. This report is submitted in partial fulfillment
of the requirements for the award of the degree of Bachelor of Engineering in
Electronics and Communication Engineering, during the academic year 2022-2023.

Date : 1/10/2022
Place : Davangere

USN : 4UB19EC048
NAME : POOJA N

ACKNOWLEDGEMENT

This Internship is a result of the accumulated guidance, direction and support of several important persons.
We take this opportunity to express our gratitude to all who have helped us complete the Internship.

We express our sincere thanks to our Principal, for providing us adequate facilities to undertake this
Internship.

We would like to thank our Head of Department, for providing us an opportunity to carry out the Internship
and for his valuable guidance and support.

We would like to thank (Lab assistant name), Software Services, for guiding us during the period of the
internship.

We express our deep and profound gratitude to our guide, Dr. Shreedharmurthy S K, Professor, for his
keen interest and encouragement at every step in completing the Internship.

We would like to thank all the faculty members of our department for the support extended during the
course of the Internship.

We would like to thank the non-teaching members of our department for helping us during the Internship.

Last but not the least, we would like to thank our parents and friends, without whose constant help the
completion of the Internship would not have been possible.

NAME POOJA N
USN 4UB19EC048

ABSTRACT

External factors, such as social media and financial news, can have widespread effects on stock price
movement. For this reason, social media is considered a useful resource for precise market predictions.
In this report, we show the effectiveness of using Twitter posts to predict stock prices. We start by
training various models on the Sentiment140 Twitter dataset. We found that Support Vector Machines
(SVM) performed best (0.83 accuracy) in the sentiment analysis, so we used it to predict the average
sentiment of tweets for each day that the market was open. Next, we use the sentiment analysis of one
year's data of tweets containing the keywords "stock market", "stocktwits" and "AAPL", with the goal of
predicting the corresponding stock prices of Apple Inc. (AAPL) and the US Dow Jones Industrial
Average (DJIA) index. Two models, Boosted Regression Trees and Multilayer Perceptron
Neural Networks, were used to predict the closing price difference of AAPL and DJIA prices. We show
that neural networks perform substantially better than traditional models for stock price prediction.

Table of Contents

1. Company Profile
2. About the Company
3. Objectives of Internship
4. Tasks Performed
5. Introduction to Python
6. Introduction to Machine Learning
7. Algorithms of Machine Learning
8. Project Implementation
9. Snapshots
10. Conclusion

COMPANY PROFILE

A Brief History of Varcons Technologies


Varcons Technologies was incorporated with a goal “To provide high quality and optimal
Technological Solutions to business requirements of our clients”. Every business is different, with a
unique business model, and so are its technological requirements. They understand this, and hence the
solutions provided to these requirements are different as well. They focus on clients' requirements and
provide them with tailor-made technological solutions. They also understand that the reach of a product
to its targeted market, or the automation of an existing process into an efficient and simple process,
are the key features that clients desire from the technological solution they are looking for, and these
are the features they focus on while designing solutions for their clients.

Varcons Technologies is a Technology Organization providing solutions for all web design
and development, MYSQL, PYTHON Programming, HTML, CSS, ASP.NET and LINQ. Meeting
the ever increasing automation requirements, Varcons Technologies specializes in ERP,
Connectivity, SEO Services, Conference Management, effective web promotion and tailor-made
software products, designing solutions best suiting clients' requirements.

Varcons Technologies strives to be the front runner in creativity and innovation in software
development through their well-researched expertise, and to establish itself as an out-of-the-box
software development company in Bangalore, India. As a software development company, they translate
this software development expertise into value for their customers through their professional solutions.

They understand that the best desired output can be achieved only by understanding the client's
demand better. Varcons Technologies works with their clients and helps them to define their exact
solution requirement. Sometimes clients even find that they have completely redefined their
solution or new application requirement during the brainstorming session, and here they position
themselves as an IT solutions consulting group comprising high-calibre consultants.

They believe that Technology, when used properly, can help any business to scale and achieve new
heights of success. It helps improve efficiency, profitability and reliability; to put it in one sentence,
“Technology helps you to Delight your Customers”, and that is what they want to achieve.

ABOUT THE COMPANY

Varcons Technologies is a Technology Organization providing solutions for all web design and
development, MYSQL, PYTHON Programming, HTML, CSS, ASP.NET and LINQ. Meeting the
ever increasing automation requirements, Varcons Technologies specializes in ERP, Connectivity,
SEO Services, Conference Management, effective web promotion and tailor-made software
products, designing solutions best suiting clients' requirements. The organization has the right mix
of professionals as stakeholders to help serve its clients to the best of its capability and at par with
industry standards. They have young, enthusiastic, passionate and creative Professionals to develop
technological innovations in the field of Mobile technologies and Web applications, as well as
Business and Enterprise solutions. The motto of the organization is to “Collaborate with our clients
to provide them with the best Technological solution, hence creating a Good Present and Better
Future for our client, which will bring a cascading positive effect in their business shape as well”.
Providing a complete suite of technical solutions is not just their tag line; it is their Vision for their
Clients and for themselves, and they strive hard to achieve it.

Products of Varcons Technologies


Android Apps

Android app development is the process by which new applications are created for devices running
the Android operating system. Applications are usually developed in the Java (and/or Kotlin, or other
such option) programming language using the Android software development kit (SDK), but other
development environments are also available; some, such as Kotlin, support the exact same Android
APIs (and bytecode), while others, such as Go, have restricted API access.

The Android software development kit includes a comprehensive set of development tools. These
include a debugger, libraries, a handset emulator based on QEMU, documentation, sample code,
and tutorials. Currently supported development platforms include computers running Linux (any
modern desktop Linux distribution), Mac OS X 10.5.8 or later, and Windows 7 or later. As of
March 2015, the SDK is not available on Android itself, but software development is possible by
using specialized Android applications.

Web Application

A web application is a client–server computer program in which the client (including the user interface
and client-side logic) runs in a web browser. Common web applications include web mail, online retail
sales, online auctions, wikis, instant messaging services and many other functions. Web applications use
web documents written in a standard format such as HTML and JavaScript, which are supported by a
variety of web browsers. Web applications can be considered a specific variant of client–server
software where the client software is downloaded to the client machine when visiting the relevant web
page, using standard procedures such as HTTP. Client web software updates may happen each time
the web page is visited. During the session, the web browser interprets and displays the pages, and
acts as the universal client for any web application. The use of web application frameworks can often
reduce the number of errors in a program, both by making the code simpler and by allowing one team
to concentrate on the framework while another focuses on a specified use case. In applications which
are exposed to constant hacking attempts on the Internet, security-related problems can be caused by
errors in the program.
Frameworks can also promote the use of best practices such as GET after POST. There are some
who view a web application as a two-tier architecture. This can be a “smart” client that performs all
the work and queries a “dumb” server, or a “dumb” client that relies on a “smart” server. The client
would handle the presentation tier, the server would have the database (storage tier), and the
business logic (application tier) would be on one of them or on both. While this increases the
scalability of the applications and separates the display and the database, it still doesn't allow for
true specialization of layers, so most applications will outgrow this model. An emerging strategy
for application software companies is to provide web access to software previously distributed as
local applications. Depending on the type of application, it may require the development of an
entirely different browser-based interface, or merely adapting an existing application to use
different presentation technology. These programs allow the user to pay a monthly or yearly fee
for use of a software application without having to install it on a local hard drive. A company
which follows this strategy is known as an application service provider (ASP), and ASPs are
currently receiving much attention in the software industry.
Security breaches on these kinds of applications are a major concern because they can involve both
enterprise information and private customer data. Protecting these assets is an important part of any
web application, and there are some key operational areas that must be included in the development
process. This includes processes for authentication, authorization, asset handling, input validation, and
logging and auditing. Building security into applications from the beginning can be more effective and
less disruptive in the long run.

Web design
Web design encompasses many different skills and disciplines in the production and maintenance of
websites. The different areas of web design include web graphic design; interface design; authoring,
including standardized code and proprietary software; user experience design; and
search engine optimization. The term web design is normally used to describe the design process
relating to the front-end (client side) design of a website, including writing markup. Web design
partially overlaps web engineering in the broader scope of web development. Web designers are
expected to have an awareness of usability and, if their role involves creating markup, are also
expected to be up to date with web accessibility guidelines.

Departments and services offered

As an institute, Varcons Technologies plays an essential role: the level of education and the development
of students' skills depend on their trainers. If you do not have a good mentor, you may lag behind others
in many things, and that is why Varcons Technologies gives you the facility of skilled employees, so that
you do not feel insecure about the academics. Personality development and academic status are some of
those things which lie in a mentor's hands. If you are trained well, then you can do well in your future,
and knowing this importance, Varcons Technologies always tries to give you the best.
They have a great team of skilled mentors who are always ready to direct their trainees in the best
possible way, and to ensure the skills of mentors they hold many skill development programs as well,
so that each and every mentor can develop their own skills in line with the demands of the companies
and thus prepare a completely packaged trainee.

Services provided by Varcons Technologies


• Core Java and Advanced Java

• Web services and development

• Dot Net Framework

• Python

• Selenium Testing

• Conference / Event Management Service

• Academic Project Guidance

OBJECTIVES OF INTERNSHIP

1. Gain knowledge and experience in the field of machine learning through hands-on projects.

2. Work collaboratively with teams of experts.

3. Assist with data collection, pre-processing, and analysis.

4. Understand data, design a model, and understand its intricacies.

5. Understand and apply machine learning algorithms to solve problems.

6. Develop software tools and applications to support machine learning projects.

7. Identify potential areas of improvement and develop strategies to address them.

8. Prepare reports and presentations to communicate and interpret project results.

TASKS PERFORMED

Sl. No.   Name of the Topic                   Date
01        Introduction to Python              23/08/2022 to 30/08/2022
02        Introduction to Machine Learning    01/09/2022 to 09/09/2022
03        Machine Learning Algorithms         12/09/2022 to 20/09/2022
04        Project Work                        21/09/2022 to 25/09/2022
05        Certification and Others            26/09/2022 to 27/09/2022

Introduction to Python

Python
Python is a popular programming language. It was created by Guido van Rossum and released in 1991.

It is used for:
 web development (server-side),
 software development,
 mathematics,
 system scripting.

Application of Python
 Python can be used on a server to create web applications.
 Python can be used alongside software to create workflows.
 Python can connect to database systems. It can also read and modify files.
 Python can be used to handle big data and perform complex mathematics.
 Python can be used for rapid prototyping, or for production-ready software
development.

Features of Python
 Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc.).
 Python has a simple syntax similar to the English language.
 Python has syntax that allows developers to write programs with fewer lines than some
other programming languages.
 Python runs on an interpreter system, meaning that code can be executed as soon as it is
written. This means that prototyping can be very quick.
 Python can be treated in a procedural way, an object-oriented way or a functional
way.

Python Syntax compared to other programming languages


 Python was designed for readability, and has some similarities to the English language
with influence from mathematics.
 Python uses new lines to complete a command, as opposed to other
programming languages, which often use semicolons or parentheses.
 Python relies on indentation, using whitespace, to define scope, such as the scope of
loops, functions and classes. Other programming languages often use curly brackets for
this purpose.
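
As a brief illustration of the last point, here is a minimal sketch (the function and values are invented for this example) showing how indentation alone defines the scope of a function, a loop and a condition:

def greet(names):
    # everything indented under 'def' belongs to the function
    for name in names:
        # everything indented under 'for' belongs to the loop
        if name != "":
            print("Hello, " + name + "!")

greet(["Pooja", "Guido"])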

Introduction to ML

Machine Learning (ML) is the field of computer science with the help of which computer
systems can make sense of data in much the same way as human beings do. In simple words,
ML is a type of artificial intelligence that extracts patterns out of raw data by using an algorithm
or method. The main focus of ML is to allow computer systems to learn from experience without
being explicitly programmed or requiring human intervention.

Human beings, at this moment, are the most intelligent and advanced species on earth because
they can think, evaluate and solve complex problems. On the other side, AI is still in its initial
stage and hasn't surpassed human intelligence in many aspects. The question, then, is: what is
the need to make machines learn? The most suitable reason for doing this is “to make decisions,
based on data, with efficiency and scale”. Lately, organizations have been investing heavily in newer
technologies like Artificial Intelligence, Machine Learning and Deep Learning to extract the key
information from data, perform several real-world tasks and solve problems. We can call these
data-driven decisions taken by machines, particularly to automate the process. These data-driven
decisions can be used, instead of programmed logic, in problems that cannot be programmed
inherently. The fact is that we can't do without human intelligence, but the other aspect is that we
all need to solve real-world problems with efficiency at a huge scale. That is why the need for
machine learning arises.

Libraries used in Machine Learning:

1. NumPy:
NumPy is a Python library used for working with arrays. It also has functions for working in the
domain of linear algebra, Fourier transforms, and matrices. NumPy was created in 2005 by
Travis Oliphant. It is an open source project and you can use it freely. NumPy stands for
Numerical Python.

In Python we have lists that serve the purpose of arrays, but they are slow to process. NumPy
aims to provide an array object that is up to 50x faster than traditional Python lists. The array
object in NumPy is called ndarray, and it provides a lot of supporting functions that make working
with ndarray very easy.

Arrays are very frequently used in data science, where speed and resources are very important.
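
A minimal sketch of the ndarray in action (the values are made-up sample prices):

import numpy as np

# create an ndarray and apply fast, element-wise (vectorized) operations
prices = np.array([1200, 1350, 1100, 1500])
print(prices.mean())   # average of the array: 1287.5
print(prices * 1.1)    # element-wise scaling, with no Python loop needed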
2. Pandas:
pandas is a Python package providing fast, flexible, and expressive data structures designed to
make working with “relational” or “labeled” data both easy and intuitive. It aims to be the
fundamental high-level building block for doing practical, real-world data analysis in Python.
Additionally, it has the broader goal of becoming the most powerful and flexible open source
data analysis/manipulation tool available in any language. It is already well on its way toward this
goal.

pandas is well suited for many different kinds of data:

 Tabular data with heterogeneously-typed columns, as in an SQL table or Excel
spreadsheet.
 Ordered and unordered (not necessarily fixed-frequency) time series data.
 Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column
labels.
 Any other form of observational / statistical data sets. The data need not be labeled at
all to be placed into a pandas data structure.
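
For example, a small labeled table can be built and filtered in a few lines (the column names and values are illustrative):

import pandas as pd

# a DataFrame is pandas' core labeled, tabular data structure
df = pd.DataFrame({
    "crop":  ["Tomato", "Onion", "Potato"],
    "price": [1200, 950, 800],
})
print(df[df["price"] > 900])   # label-based filtering of rows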

3. Matplotlib:
Matplotlib is a cross-platform data visualization and graphical plotting library for Python and
its numerical extension NumPy. As such, it offers a viable open source alternative to MATLAB.
Developers can also use Matplotlib's APIs (Application Programming Interfaces) to embed plots
in GUI applications.
A Python Matplotlib script is structured so that a few lines of code are all that is required in
most instances to generate a visual data plot. The Matplotlib scripting layer overlays two APIs:
 The pyplot API is a hierarchy of Python code objects topped by matplotlib.pyplot.
 An OO (Object-Oriented) API: a collection of objects that can be assembled with greater
flexibility than pyplot. This API provides direct access to Matplotlib's backend layers.
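
A minimal pyplot sketch (the data points are invented for illustration):

import matplotlib.pyplot as plt

# a few lines of pyplot are enough to produce a labelled line plot
years = [2019, 2020, 2021, 2022]
price = [900, 1100, 1050, 1250]
plt.plot(years, price)
plt.xlabel("Year")
plt.ylabel("Price (Rs./quintal)")
plt.title("Illustrative price trend")
plt.show()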

4. Scikit-learn:

Scikit-learn is an open source data analysis library, and the gold standard for Machine
Learning (ML) in the Python ecosystem. Key concepts and features include algorithmic
decision-making methods, such as:

Classification: identifying and categorizing data based on patterns.

Regression: predicting or projecting data values based on averages of existing and
planned data.

Clustering: automatic grouping of similar data into datasets.
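
As one small sketch of the clustering idea, scikit-learn's KMeans can group a handful of made-up points automatically:

from sklearn.cluster import KMeans
import numpy as np

# four 2-D points: two near x=1, two near x=10
X = np.array([[1, 2], [1, 4], [10, 2], [10, 4]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)   # similar points receive the same cluster label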

Machine Learning Algorithms
Machine Learning algorithms are programs that can learn hidden patterns from data,
predict the output, and improve their performance from experience on their own. Different
algorithms can be used in machine learning for different tasks: for example, simple linear
regression can be used for prediction problems such as stock market prediction, and the KNN
algorithm can be used for classification problems.

In this topic, we will see an overview of some popular and most commonly used machine
learning algorithms, along with their use cases and categories.

Types of Machine Learning Algorithms


Machine Learning algorithms can be broadly classified into three types:

1. Supervised Learning Algorithms
2. Unsupervised Learning Algorithms
3. Reinforcement Learning Algorithms

1) Supervised Learning Algorithm

Supervised learning is a type of machine learning in which the machine needs external
supervision to learn. Supervised learning models are trained using a labeled dataset. Once
the training and processing are done, the model is tested by providing sample test data to check
whether it predicts the correct output.

The goal of supervised learning is to map input data to output data. Supervised learning is
based on supervision, much as a student learns things under a teacher's supervision. An example
of supervised learning is spam filtering.

Supervised learning can be divided further into two categories of problem:

o Classification
o Regression

Examples of some popular supervised learning algorithms are Simple Linear Regression, Decision
Tree, Logistic Regression, the KNN algorithm, etc.

2) Unsupervised Learning Algorithm

It is a type of machine learning in which the machine does not need any external supervision to
learn from the data, hence it is called unsupervised learning. Unsupervised models can be trained
using an unlabelled dataset that is neither classified nor categorized, and the algorithm needs to
act on that data without any supervision. In unsupervised learning, the model doesn't have a
predefined output, and it tries to find useful insights from a huge amount of data. These algorithms
are used to solve Association and Clustering problems. Hence, it can be classified further into
two types:

o Clustering
o Association

Examples of some unsupervised learning algorithms are K-means Clustering, the Apriori Algorithm,
Eclat, etc.

3) Reinforcement Learning

In Reinforcement learning, an agent interacts with its environment by producing actions, and
learns with the help of feedback. The feedback is given to the agent in the form of rewards: for
each good action, it gets a positive reward, and for each bad action, it gets a negative reward.
There is no supervision provided to the agent. The Q-Learning algorithm is used in reinforcement
learning.

ALGORITHMS:
1. Linear Regression

Linear regression is one of the most popular and simple machine learning algorithms, used
for predictive analysis. Here, predictive analysis defines the prediction of something, and linear
regression makes predictions for continuous numbers such as salary, age, etc.

It shows the linear relationship between the dependent and independent variables, and shows how
the dependent variable (y) changes according to the independent variable (x).

It tries to fit the best line between the dependent and independent variables, and this best fit line
is known as the regression line.

The equation for the regression line is:

y = a0 + a1*x

Here, y = dependent variable

x = independent variable
a0 = intercept of the line
a1 = slope of the line (regression coefficient)

Linear regression is further divided into two types:

o Simple Linear Regression: In simple linear regression, a single independent variable is
used to predict the value of the dependent variable.
o Multiple Linear Regression: In multiple linear regression, more than one independent
variable is used to predict the value of the dependent variable.

A typical illustration is a linear regression line fitted to predict weight according to height.
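
A minimal sketch of fitting such a line with scikit-learn (the height/weight values are invented):

import numpy as np
from sklearn.linear_model import LinearRegression

heights = np.array([[150], [160], [170], [180]])  # independent variable x (cm)
weights = np.array([50, 56, 63, 70])              # dependent variable y (kg)
reg = LinearRegression().fit(heights, weights)
print(reg.intercept_, reg.coef_[0])   # a0 (intercept) and a1 (slope) of y = a0 + a1*x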

2. Logistic Regression
Logistic regression is a supervised learning algorithm used to predict categorical variables or
discrete values. It can be used for classification problems in machine learning, and the output of
the logistic regression algorithm can be either Yes or No, 0 or 1, Red or Blue, etc.

Logistic regression is similar to linear regression except in how it is used: linear regression is
used to solve regression problems and predict continuous values, whereas logistic regression is
used to solve classification problems and predict discrete values.

Instead of fitting a best fit line, it forms an S-shaped curve that lies between 0 and 1. The
S-shaped curve is also known as a logistic function, which uses the concept of a threshold. Any
value above the threshold will tend to 1, and any value below the threshold will tend to 0.
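
A minimal scikit-learn sketch of this behaviour on a built-in dataset (illustrative, not from the project data), showing the thresholded 0/1 outputs and the underlying S-curve probabilities:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

# binary classification: outputs are discrete 0/1 labels
X, y = load_breast_cancer(return_X_y=True)
clf = LogisticRegression(max_iter=5000).fit(X, y)
print(clf.predict(X[:3]))        # class labels after applying the threshold
print(clf.predict_proba(X[:3]))  # logistic-function probabilities in (0, 1)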

3. Decision Tree Algorithm

A decision tree is a supervised learning algorithm that is mainly used to solve classification
problems but can also be used for solving regression problems. It can work with both
categorical variables and continuous variables. It shows a tree-like structure that includes nodes
and branches, and starts with the root node, which expands into further branches down to the leaf
nodes. Internal nodes represent the features of the dataset, branches show the decision rules, and
leaf nodes represent the outcome of the problem.

Some real-world applications of decision tree algorithms are identification between cancerous
and non-cancerous cells, suggestions to customers about buying a car, etc.
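
A minimal sketch with scikit-learn's decision tree on a built-in dataset (the depth limit is an illustrative choice):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# limit the depth so the structure of nodes and branches stays small
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(tree.predict(X[:5]))   # class decided at the leaf node for each sample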

4. Support Vector Machine Algorithm

A support vector machine, or SVM, is a supervised learning algorithm that can be used for both
classification and regression problems. However, it is primarily used for classification problems.
The goal of SVM is to create a hyperplane or decision boundary that can segregate datasets into
different classes.

The data points that help to define the hyperplane are known as support vectors, and hence the
algorithm is named the support vector machine algorithm.

Some real-life applications of SVM are face detection, image classification, drug discovery, etc.
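
A brief, hedged sketch using scikit-learn's SVC on a built-in dataset; the support vectors that define the hyperplane can be inspected directly:

from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
svm = SVC(kernel="linear").fit(X, y)   # learns separating hyperplanes
print(svm.support_vectors_.shape)      # the data points that define them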

5. K-Nearest Neighbour (KNN)

K-Nearest Neighbour is a supervised learning algorithm that can be used for both classification
and regression problems. This algorithm works by assessing the similarities between the new data
point and the available data points. Based on these similarities, the new data point is put in the
most similar category. It is also known as the lazy learner algorithm, as it stores all the available
data and classifies each new case with the help of its K nearest neighbours. The new case is assigned
to the class with the most similar neighbours, and a distance function measures the distance between
the data points. The distance function can be Euclidean, Minkowski, Manhattan, or Hamming
distance, based on the requirement.
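
A minimal sketch; the number of neighbours K and the distance metric are explicit parameters:

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean").fit(X, y)
print(knn.predict(X[:5]))   # each case is classified by its 3 nearest neighbours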

Project Implementation

A predictive model for forecasting demand and supply information of TOP crops. We built a
Python application that analyses the top crops at any given time, depending on the season or
demand, using datasets available on the Internet.

India is majorly an agriculture-based economy. Around 42% of the people depend on
agriculture for their livelihood. The economic upliftment of farmers happens when there is a
seamless transfer of agricultural produce from producers to consumers. It is evident that
there is a huge gap between the demand and supply of various crops, due to which both farmers
and consumers are facing problems. At present, in India there is no system in place to
efficiently manage this demand and supply issue. The potential of present-day technologies
like data analytics and machine learning can be exploited to overcome these issues. The available
data about the demand, supply and price variation of the crops, and other factors affecting the
supply chain of agricultural produce, can be used to analyse and come up with a model to
predict and forecast market variations of agricultural crops.

Proposed Model

For farmers, the availability of a financial incentive is an exceptionally significant
factor in the decision to plant or reject the plantation of a crop. Besides, data on the demand
and supply of TOP crops can provide useful information to farmers, governments, and the common
man. It can inform issues about the price, demand and production of the crop. The agriculture
sector employs 64% of the rural workforce. According to the latest Survey of Agricultural
Households conducted by the National Sample Survey Office, nearly half of farmers'
income comes from crop production. Critical farming decisions are made by
analyzing price data. The price charts for TOP crops are prepared by gathering,
reviewing and analyzing data collected from different authentic sources.
Every piece of information is inspected and assessed using diagnostic and coherent
reasoning. Demand-supply analysis can track the price of the crops, which will influence the
farmers' decisions on crop production.

Objectives

This analysis helps the farmers in evaluating future demand for the crops. The report
will help farmers determine the variety and the time of planting of the crops. The main objective of
this work is to help the farmers by providing historical crop yield data with cost
forecasting for risk management. Also, the data collected would help the administration in
making crop protection arrangements and strategies for supply chain activity. This
will likewise enable the government to maintain an equilibrium costing over the TOP crops, so
the produce can be sold in the market at a reasonable expense.

 Automation of the entire system improves efficiency.

 It provides a friendly graphical user interface, which proves to be better when
compared to the existing system.

 It gives appropriate access to authorized users depending on their permissions.

 It effectively overcomes delays in communication.

 Updating of information becomes much easier.

 System security, data security and reliability are the striking features.

 The system has adequate scope for modification in the future if necessary.

DESIGN & ANALYSIS

The work was carried out in four stages: data collection, data pre-processing,
prediction and data visualization. General parameters for predicting the crop price are
climate change, government policy and demand. The acquired data was not present in the
required format; thus, it was structured into the required format. Prediction is done by utilizing
past years' data of crop prices. The output provides one year of data, i.e., a yearly price prediction.
Two input parameters are considered for yearly price prediction: the crop price output for a
given period is predicted from the price input of the previous period. Machine
Learning algorithms are used to predict and forecast demand, supply and prices. Prediction
is accomplished using regression analysis, a type of supervised learning in the domain
of Machine Learning that results in a predicted relationship between labels and data points.

Figure: Workflow Model

Data acquisition and ingestion:

This is the process of transporting data from one or more sources to a target site for
further processing and analysis. Agricultural data of previous years is collected and used
by the system. This dataset includes crop areas, types of crops cultivated, nature of the soil,
yields and overall crop consumption. Data is gathered from authentic sources like the
Ministry of Agriculture & Farmers Welfare, the Food and Agriculture Organization, APEDA,
NITI Aayog, the Agriculture Marketing Department of Karnataka, Indiastat.com and the
Competition Commission. Additionally, some unpublished data has also been procured
from APMCs and district agriculture and horticulture departments.
The weather condition data is collected from authentic sources like the IMD.

Figure: Dataset

Data pre-processing:
Crop prices are affected by several factors such as climate, supply and demand. The
obtained data contained a huge number of outliers, null values and many discontinuous
values. An outlier is a data point that is noticeably different from the rest. Outliers represent
errors in measurement, bad data collection, or simply variables not considered when
collecting the data. Learning algorithms are sensitive to outliers. Using Python libraries and
Excel, we managed to reduce the outliers and error values.

Since demand data was not available on any authentic websites, simulated data has been used
for analysis. We calculated this data by making use of the demand curve formula, which fits
the curve. Yearly data is collected for forecasting because it has less noise. As the data for
demand was simulated, the required accuracy is only partially met; when actual data is
available, the accuracy of the model can be increased.

y = Max Price (Rs./quintal)

x = Production (kg)

The equation that depicts the relationship between the price of a certain commodity and the
quantity of that commodity demanded at that price can be given as:

Qd = a - m*P

where
Qd = quantity demanded (linear demand curve)
a = Production (kg)
m = Slope
P = Price (Rs.)
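
A minimal sketch of simulating demand from this curve (the values chosen for a and m below are illustrative assumptions, not figures from the collected data):

# linear demand curve: Qd = a - m*P
def demand(price, a=50000.0, m=20.0):
    # a (production) and m (slope) are assumed, illustrative parameters
    return a - m * price

print(demand(1200))   # 26000.0 units demanded at a price of Rs. 1200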

Designing the predictive model:

The algorithms and tools selected are familiarized by carrying out some test runs
and finding the most optimal algorithm to satisfy the needs. Algorithms like Linear Regression,
Logistic Regression and Random Forest are used for prediction and classification. Linear
regression is initially carried out to predict the value of a variable based on the value of
another independent variable. The chosen algorithms are then implemented in sequence to
design a predictive model.

Model Validation:

Model validation is carried out in two phases. In the initial phase, real-time data is given as
an input to the designed predictive model to obtain the forecasting information. These results are
compared, verified and validated against authentic data to check for accuracy. In the second
phase, Orange3, a Python-based data visualization, machine learning and data mining toolkit, has
been used for explorative rapid qualitative data analysis to validate our prediction model.

Code Snippet:

# import the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# load the dataset
data = pd.read_csv('crop_data.csv')

# preprocess the data: convert the date column to a datetime object
data['date'] = pd.to_datetime(data['date'])

# set the date column as the index of the dataframe
data.set_index('date', inplace=True)

# create a new column holding the previous period's price for each crop
data['prev_price'] = data.groupby('crop')['price'].shift(1)

# remove rows with missing values introduced by the shift
data.dropna(inplace=True)

# select the input and output variables
X = data[['prev_price', 'climate_change', 'government_policy', 'demand']]
y = data['price']

# split the dataset into training and testing data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# create the linear regression model and train it on the training data
model = LinearRegression()
model.fit(X_train, y_train)

# evaluate the model on the testing data (R^2 score)
score = model.score(X_test, y_test)

# make predictions on new data; all four training features must be supplied
new_data = pd.DataFrame({'prev_price': [100], 'climate_change': [1],
                         'government_policy': [1], 'demand': [500]})
prediction = model.predict(new_data)

# visualize the results
plt.plot(data.index, data['price'])
plt.xlabel('Date')
plt.ylabel('Price')
plt.title('Crop Prices Over Time')
plt.show()

SNAPSHOTS

Fig: Demand with respect to year

Fig: Price with respect to demand
CONCLUSION
The package was designed in such a way that future modifications can be done easily. The
following conclusions can be deduced from the development of the project:

 Automation of the entire system improves efficiency.

 It provides a friendly graphical user interface, which proves to be better when compared
to the existing system.

 It gives appropriate access to authorized users depending on their permissions.

 It effectively overcomes delays in communication.

 Updating of information becomes much easier.

 System security, data security and reliability are the striking features.

 The system has adequate scope for modification in the future if necessary.
