DAL Lab File


INDORE INSTITUTE OF SCIENCE & TECHNOLOGY, INDORE

Department of Computer Science & Engineering

LAB MANUAL

Subject: - Data Analytics Lab


(CS605)

BE: Third Year

Name of Student: - Pratyoosh Mishra


Enrollment no: 0818CS191132
Section: CS-2

Session 2021-22

INDORE INSTITUTE OF SCIENCE & TECHNOLOGY, INDORE


Department of Computer Science & Engineering
CERTIFICATE

This is to certify that the experimental work entered in this journal, as per the B.Tech III year syllabus prescribed by the RGPV, was done by Piyush Mahajan, B.Tech VI semester, in the Data Analytics Laboratory of this institute during the academic year 2021-22.

Signature of Head                                        Signature of the Faculty

The Vision of the Institute is:


To be a nationally recognized institution of excellence in technical education and produce
competent professionals capable of making valuable contribution to the society.
The Mission of the Institute is:
• To promote academic growth by offering state-of-the-art undergraduate and
postgraduate programmes.
• To undertake collaborative projects which offer opportunities for interaction with
academia and industry.
• To develop intellectually capable individuals who are creative, ethical and
gifted leaders.
VISION & MISSION OF DEPARTMENT
Vision of the Department:
To be a center of academic excellence in the field of computer science and engineering
education.
Mission of the Computer Science and Engineering Department:
• Strive for academic excellence in computer science and engineering through a well-designed
course curriculum, effective classroom pedagogy and in-depth laboratory work.
• Transform undergraduate engineering students into technically competent, socially
responsible and ethical computer science and engineering professionals.
• Create computing centers of excellence in leading areas of computer science and
engineering to provide exposure to the students on latest software tools and computing
technologies.
• Incubate, apply and spread innovative ideas by collaborating with relevant industries and
R&D labs through focused research groups.
• Attain these through continuous teamwork by a group of committed faculty members,
transforming the computer science and engineering department into a leader in imparting
computer science and engineering education and research.
PROGRAMME EDUCATIONAL OBJECTIVES (PEOs)

PEO 1: To provide students with a solid foundation in mathematics, computer science and
engineering, basic science fundamentals required to solve the computing problems.
PEO 2: To expose students to latest computing technologies and software tools, so that they
can comprehend, analyze, design and create innovative computing products and solutions
for real life problems.
PEO 3: To inculcate in students a multi-disciplinary approach, professional attitude and
ethics, communication and teamwork skills, and the ability to relate computer engineering
issues with social awareness.
PEO 4: To develop professional skills in students that prepare them for immediate
employment and for life-long learning in advanced areas of computer science and related
fields which enable them to be successful entrepreneurs.
PROGRAM SPECIFIC OUTCOMES (PSO's)
A graduate of the Computer Science and Engineering Program will demonstrate:
PSO 1: Computer Science Specific Skills: The ability to identify, analyze and design
solutions for complex engineering problems in multidisciplinary areas by understanding the
core principles and concepts of computer science and thereby engage in national grand
challenges.
PSO 2: Programming and Software Development Skills: The ability to acquire programming
efficiency by designing algorithms and applying standard practices in software project
development to deliver quality software products meeting the demands of the industry.
PSO 3: Professional Skills: The ability to apply the fundamentals of computer science in
competitive research and to develop innovative products to meet the societal needs thereby
evolving as an eminent researcher and entrepreneur.

Department: Computer Science and Engineering
Subject Code: CS-605    Subject Name: Data Analytics Lab (Professional Code)

Lecture   Tutorial   Lab   Total Hours
0         0          6     6

PROGRAMME OUTCOMES (POs)


PO 1 Apply the knowledge of mathematics, science and engineering fundamentals for the
solution of computer science and engineering problems.
PO 2 Ability to identify, formulate and analyze the complex engineering problems
PO 3 Ability to design and develop the computer-based systems to meet desired needs
within realistic constraints such as public health and safety, environmental, agriculture,
economic and societal considerations
PO 4 Ability to demonstrate excellent programming, analytical, logical and
problem-solving skills
PO 5 Ability to use emerging technologies, skills, and modern software tools to design,
develop, test and debug programs or software
PO 6 Ability to identify and solve social, cultural and ethical issues with computer science
and engineering solutions
PO 7 Ability to design and develop web-based solutions with effective graphical user
interface for the need of sustainable development
PO 8 Apply ethical principles and commit to professional ethics and responsibilities and
norms of the computer science and engineering practices.
PO 9 Ability to work individually and as a member or leader in diverse teams to accomplish
a common goal.
PO 10 Ability to communicate effectively in both verbal and written forms with engineering
community and society
PO 11 Knowledge and understanding of the engineering and management principles and
apply these to one's own work, as a member and leader in a team to manage the software
and IT based projects in multidisciplinary environments.
PO 12 Appreciation of technological change and the need for independent life-long learning
S.No.   List of Experiments
1       Write a program to demonstrate various basic operations of Python’s add-on library NumPy.
2       Write a program to demonstrate various basic operations of Python’s add-on library pandas.
3       Write a program to demonstrate plotting and visualization.
4       Implementation of R language data structures.
5       Data manipulation using R language.
6       Data visualization using R language.

EXPERIMENT NO. 1

Aim / Title: Implementation of basic operations of python’s add-on library NumPy.


Problem Statement: Write a program to demonstrate various basic operations of python’s
add-on library NumPy.

Objectives: To write and execute Python programs demonstrating the basic operations of
Python’s add-on library NumPy.

Outcomes: To gain a grip on Python’s built-in functions and data structures for data
manipulation and analysis using the add-on library NumPy.

Prerequisite: You must be comfortable with variables, linear equations, graphs of functions,
histograms, and statistical means.
You should be a good programmer. Ideally, you should have some experience programming
in Python, because the programming exercises are in Python. However, experienced
programmers without Python experience can usually complete the programming exercises
anyway.

Hardware requirements: Memory and disk space required per user: 1 GB RAM + 1 GB of
disk + 0.5 CPU core. Server overhead: 2-4 GB or 10% system overhead (whichever is larger),
0.5 CPU cores, 10 GB disk space. Port requirements: port 8000 plus 5 unique, random ports
per notebook.

Software requirements: Jupyter Notebook, the Anaconda platform, or any online platform to
run the programs.

Theory: What is NumPy?


NumPy is a Python library used for working with arrays.

It also has functions for working in the domain of linear algebra, Fourier transforms, and
matrices.

NumPy was created in 2005 by Travis Oliphant. It is an open source project and you can use
it freely.

NumPy stands for Numerical Python.

Why Use NumPy?


In Python we have lists that serve the purpose of arrays, but they are slow to process.

NumPy aims to provide an array object that is up to 50x faster than traditional Python lists.
The array object in NumPy is called ndarray; it provides a lot of supporting functions that
make working with ndarray very easy.

Arrays are very frequently used in data science, where speed and resources are very
important.

Data science is a branch of computer science in which we study how to store, use and
analyze data to derive information from it.

Why is NumPy Faster Than Lists?


NumPy arrays are stored at one continuous place in memory unlike lists, so processes can
access and manipulate them very efficiently.
This behavior is called locality of reference in computer science.
This is the main reason why NumPy is faster than lists. It is also optimized to work with the
latest CPU architectures.
Which Language is NumPy written in?
NumPy is a Python library and is written partially in Python, but most of the parts that
require fast computation are written in C or C++.
Where is the NumPy Codebase?
The source code for NumPy is located at the GitHub
repository https://github.com/numpy/numpy

Program and Output :
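Since the output section is left for the student's own work, a minimal sketch of the kind of basic NumPy operations this experiment calls for (array creation, element-wise arithmetic, slicing, reshaping, aggregation) might look like this; the array values are illustrative only:

```python
import numpy as np

# Create arrays from a list and with constructors
a = np.array([1, 2, 3, 4, 5, 6])
z = np.zeros((2, 3))            # 2x3 array of zeros
r = np.arange(0, 10, 2)         # [0 2 4 6 8]

# Element-wise arithmetic and universal functions
print(a + 10)                   # [11 12 13 14 15 16]
print(np.sqrt(a))

# Indexing, slicing and reshaping
print(a[1:4])                   # [2 3 4]
m = a.reshape(2, 3)             # 2x3 matrix
print(m.T)                      # transpose, shape (3, 2)

# Aggregations
print(a.sum(), a.mean(), a.max())
```

Note how the arithmetic and the aggregations apply to the whole ndarray at once, with no explicit Python loop; this vectorized style is where NumPy's speed advantage over lists comes from.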


Conclusion: Now we are able to use Python’s built-in functions and data structures for data
manipulation and analysis using the add-on library NumPy.

Sample Viva Questions and Answers:


Q.1 What is the difference between NumPy and pandas?

Ans : The pandas module mainly works with tabular data, whereas the NumPy module
works with numerical data. The NumPy library provides objects for multi-dimensional
arrays, whereas pandas offers an in-memory 2D table object called a
DataFrame. NumPy consumes less memory as compared to pandas.

Q.2 What are the features of NumPy?

Ans : NumPy features include:
• A high-performance N-dimensional array object.
• Tools for integrating code from C/C++ and Fortran.
• A multidimensional container for generic data.
• Linear algebra, Fourier transform, and random number capabilities.
• Broadcasting functions.

Q.3 Why is NumPy used in machine learning?


Ans : NumPy is a library for the Python programming language, adding support for large,
multi-dimensional arrays and matrices, along with a large collection of high-level
mathematical functions to operate on these arrays. Moreover NumPy forms the foundation
of the Machine Learning stack.

Roll No.        Name of Student     Date of Performance    Date of Evaluation    Grade    Sign of Student    Sign of Faculty
0818CS191132    Pratyoosh Mishra
EXPERIMENT NO. 2

Aim / Title: Implementation of basic operations of python’s add-on library Pandas.

Problem Statement: Write a program to demonstrate various basic operations of python’s


add-on library pandas.

Objectives: To write and execute Python programs demonstrating the basic operations of
Python’s add-on library pandas.

Outcomes: To gain a grip on Python’s built-in functions and data structures for data
manipulation and analysis using the add-on library pandas.

Prerequisite: You must be comfortable with variables, linear equations, graphs of functions,
histograms, and statistical means.
You should be a good programmer. Ideally, you should have some experience programming
in Python, because the programming exercises are in Python. However, experienced
programmers without Python experience can usually complete the programming exercises
anyway.

Hardware requirements: Memory and disk space required per user: 1 GB RAM + 1 GB of
disk + 0.5 CPU core. Server overhead: 2-4 GB or 10% system overhead (whichever is larger),
0.5 CPU cores, 10 GB disk space. Port requirements: port 8000 plus 5 unique, random ports
per notebook.

Software requirements: Jupyter Notebook, the Anaconda platform, or any online platform to
run the programs.

Theory: What is Pandas?


Pandas is a Python library used for working with data sets.
It has functions for analyzing, cleaning, exploring, and manipulating data.
The name "Pandas" refers to both "Panel Data" and "Python Data Analysis"; the library
was created by Wes McKinney in 2008.

Why Use Pandas?


Pandas allows us to analyze big data and make conclusions based on statistical theories.
Pandas can clean messy data sets, and make them readable and relevant.
Relevant data is very important in data science.

Data science is a branch of computer science in which we study how to store, use and
analyze data to derive information from it.
What Can Pandas Do?
Pandas gives you answers about the data, such as:

• Is there a correlation between two or more columns?
• What is the average value?
• What are the max and min values?

Pandas is also able to delete rows that are not relevant or contain wrong values, like empty
or NULL values. This is called cleaning the data.

Where is the Pandas Codebase?


The source code for pandas is located at the GitHub repository
https://github.com/pandas-dev/pandas
Program and Output :
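The output section is again left for the student's work; a minimal sketch of the basic pandas operations the experiment calls for (building a DataFrame, exploring it, filtering, deriving a column, aggregating) might look like this. The column names and values here are made up for illustration:

```python
import pandas as pd

# Build a small DataFrame from a dictionary of columns
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena"],
    "marks": [85, 62, 91],
})

# A single labelled column is a Series
print(df["marks"])

# Basic exploration
print(df.head())
print(df.describe())

# Filtering rows with a boolean condition
passed = df[df["marks"] >= 70]
print(passed)

# Adding a derived column
df["grade"] = df["marks"].apply(lambda m: "A" if m >= 90 else "B")

# Aggregation over a column
print(df["marks"].mean())
```

Filtering with `df[condition]` and deriving columns with `apply()` are the everyday "cleaning the data" steps described in the theory above.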
Conclusion: Now we are able to use Python’s built-in functions and data structures for data
manipulation and analysis using the add-on library pandas.

Sample Viva Questions and Answers:

Q.1 Why is pandas used in Python?

Ans : Pandas is the most popular Python library used for data analysis. It provides
highly optimized performance, with back-end source code written purely in C or
Python. We can analyze data in pandas using Series and DataFrames.
Q.2 What are the significant features of the pandas Library?
Ans : The key features of the pandas library are as follows:

• Memory Efficient

• Data Alignment

• Reshaping

• Merge and join

• Time Series


Q.3 Explain reindexing in pandas.

Ans : Reindexing is used to conform a DataFrame to a new index, with optional filling logic.
It places NA/NaN in the locations where values are not present in the previous index. It
returns a new object unless the new index is equivalent to the current one and copy=False.
It is used to change the index of the rows and columns of the DataFrame.
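The reindexing behaviour described above can be sketched on a small Series (the values and labels here are illustrative):

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Conform the Series to a new index; labels missing from the
# original index ("d") get NaN
s2 = s.reindex(["a", "c", "d"])
print(s2)
# a    10.0
# c    30.0
# d     NaN
# dtype: float64

# Optional filling logic for newly introduced gaps
s3 = s.reindex(["a", "c", "d"], fill_value=0)
```

The same `reindex()` method works on a DataFrame, where the `index` and `columns` arguments conform rows and columns respectively.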

Roll No.        Name of Student     Date of Performance    Date of Evaluation    Grade    Sign of Student    Sign of Faculty
0818CS191132    Pratyoosh Mishra
EXPERIMENT NO. 3

Aim / Title: Implementation of python’s library for data visualisation and plotting.

Problem Statement: Write a program to demonstrate Plotting and Visualization.

Objectives: To write and execute Python programs demonstrating Python’s data
manipulation capabilities, using built-in data structures and functions for data
visualisation and plotting.

Outcomes: To gain a grip on Python’s libraries for data visualisation and plotting.

Prerequisite: You must be comfortable with variables, linear equations, graphs of functions,
histograms, and statistical means.
You should be a good programmer. Ideally, you should have some experience programming
in Python, because the programming exercises are in Python. However, experienced
programmers without Python experience can usually complete the programming exercises
anyway.

Hardware requirements: Memory and disk space required per user: 1 GB RAM + 1 GB of
disk + 0.5 CPU core. Server overhead: 2-4 GB or 10% system overhead (whichever is larger),
0.5 CPU cores, 10 GB disk space. Port requirements: port 8000 plus 5 unique, random ports
per notebook.

Software requirements: Jupyter Notebook, the Anaconda platform, or any online platform to
run the programs.

Theory:

The common chart types, the visual dimensions they use, and typical example usages are
summarised below.

Bar chart (e.g. a bar chart of tips by day of week)
Visual dimensions: length/count, category, color.
• Presents categorical data with rectangular bars with heights or lengths proportional to
the values that they represent. The bars can be plotted vertically or horizontally.
• A bar graph shows comparisons among discrete categories. One axis of the chart shows
the specific categories being compared, and the other axis represents a measured value.
• Some bar graphs present bars clustered in groups of more than one, showing the values
of more than one measured variable. These clustered groups can be differentiated using
color.
• For example: comparison of values, such as sales performance for several persons or
businesses in a single time period.

Histogram (e.g. a histogram of housing prices)
Visual dimensions: bin limits, count/length, color.
• An approximate representation of the distribution of numerical data. The entire range of
values is divided into a series of intervals, and the number of values falling into each
interval is counted; this is called binning. The bins are usually specified as consecutive,
non-overlapping intervals of a variable. The bins (intervals) must be adjacent, and are
often (but not required to be) of equal size.
• For example: determining the frequency of annual stock market percentage returns
within particular ranges (bins) such as 0-10%, 11-20%, etc. The height of the bar
represents the number of observations (years) with a return % in the range represented
by the respective bin.

Scatter plot
Visual dimensions: x position, y position, symbol/glyph, color, size.
• Uses Cartesian coordinates to display values for typically two variables for a set of data.
• Points can be coded via color, shape and/or size to display additional variables.
• Each point on the plot has an associated x and y value that determines its location on
the Cartesian plane.
• Scatter plots are often used to highlight the correlation between variables (x and y).

Scatter plot (3D)
Visual dimensions: x position, y position, z position, color, symbol, size.
• Similar to the 2-dimensional scatter plot above, the 3-dimensional scatter plot visualizes
the relationship between typically three variables from a set of data. Again, points can
be coded via color, shape and/or size to display additional variables.

Network diagram
Visual dimensions: node size, node color, tie thickness, tie color, spatialization.
• Finding clusters in the network (e.g. grouping Facebook friends into different clusters).
• Discovering bridges (information brokers or boundary spanners) between clusters in the
network.
• Determining the most influential nodes in the network (e.g. a company wants to target a
small group of people on Twitter for a marketing campaign).
• Finding outlier actors who do not fit into any cluster or are in the periphery of a
network.

Pie chart
Visual dimensions: color.
• Represents one categorical variable which is divided into slices to illustrate numerical
proportion. In a pie chart, the arc length of each slice (and consequently its central
angle and area) is proportional to the quantity it represents.
• For example: the proportion of English native speakers worldwide.

Line chart
Visual dimensions: x position, y position, symbol/glyph, color, size.
• Represents information as a series of data points called 'markers' connected by straight
line segments.
• Similar to a scatter plot except that the measurement points are ordered (typically by
their x-axis value) and joined with straight line segments.
• Often used to visualize a trend in data over intervals of time – a time series – thus the
line is often drawn chronologically.

Program and output:


1. Bar Chart
2. Histogram

3. Scatter plot



4. Scatter plot (3D)
5. Pie chart
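The plots listed above are left for the student's own work; a minimal sketch using matplotlib (a common choice bundled with the Anaconda platform; the data values are made up) could be:

```python
import matplotlib
matplotlib.use("Agg")           # non-interactive backend: render to a file
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# 1. Bar chart: a measured value per discrete category
axes[0, 0].bar(["A", "B", "C"], [5, 9, 3])
axes[0, 0].set_title("Bar chart")

# 2. Histogram: distribution of numeric values grouped into bins
data = np.random.default_rng(0).normal(size=500)
axes[0, 1].hist(data, bins=20)
axes[0, 1].set_title("Histogram")

# 3. Scatter plot: two related variables on Cartesian coordinates
x = np.random.default_rng(1).random(50)
y = x + 0.1 * np.random.default_rng(2).random(50)
axes[1, 0].scatter(x, y)
axes[1, 0].set_title("Scatter plot")

# 4. Pie chart: numerical proportions of one categorical variable
axes[1, 1].pie([40, 30, 20, 10], labels=["w", "x", "y", "z"])
axes[1, 1].set_title("Pie chart")

fig.tight_layout()
fig.savefig("plots.png")
```

In recent matplotlib versions a 3D scatter plot can be added similarly with `fig.add_subplot(projection="3d")` and its `scatter()` method.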

Conclusion: Now we are able to use Python’s libraries for data visualization and
plotting.

Sample Viva Questions and Answers:


Q.1 What is Data Visualization in Python?
Ans : Data visualization is the discipline of trying to understand data by placing it in a
visual context so that patterns, trends and correlations that might not otherwise be detected
can be exposed. Python offers multiple great graphing libraries that come packed with lots
of different features.
Q.2 Why is data visualization important?
Ans : Data visualization gives us a clear idea of what the information means by giving it
visual context through maps or graphs. This makes the data more natural for the human
mind to comprehend and therefore makes it easier to identify trends, patterns, and outliers
within large data sets.
Q.3 What are data visualization techniques?
Ans : Data visualization is a graphical representation of information and data. By using
visual elements like charts, graphs, and maps, data visualization tools provide an accessible
way to see and understand trends, outliers, and patterns in data.

Roll No.        Name of Student     Date of Performance    Date of Evaluation    Grade    Sign of Student    Sign of Faculty
0818CS191132    Pratyoosh Mishra

EXPERIMENT NO. 4

Aim / Title: Implementation of R language data structure.

Problem Statement: Write programs demonstrating the various data structures of R.

Objectives: To write and execute programs in R demonstrating R’s data structure

capabilities, using built-in data structures and functions.

Outcomes: To get a grip on R language data structures.

Prerequisite: You must be comfortable with variables, linear equations, graphs of functions,
histograms, and statistical means.
You should be a good programmer. Ideally, you should have some experience programming
in R, because the programming exercises are in R. However, experienced programmers
without R experience can usually complete the programming exercises anyway.
Hardware requirements: Memory and disk space required per user: 1 GB RAM + 1 GB of
disk + 0.5 CPU core. Server overhead: 2-4 GB or 10% system overhead (whichever is larger),
0.5 CPU cores, 10 GB disk space. Port requirements: port 8000 plus 5 unique, random ports
per notebook.

Software requirements: Jupyter Notebook, the Anaconda platform, or any online platform to
run the programs.

Theory:
R Data Structure:
To make the best of the R language, you’ll need a strong understanding of the basic data
types and data structures and how to operate on them.
Data structures are very important to understand because these are the objects you will
manipulate on a day-to-day basis in R. Dealing with object conversions is one of the most
common sources of frustration for beginners.
Vectors:
A vector is a collection of elements that are most commonly of
mode character, logical, integer or numeric.
You can create an empty vector with vector(). (By default the mode is logical. You can be
more explicit as shown in the examples below.) It is more common to use direct
constructors such as character(), numeric(), etc.

List:
In R, lists act as containers. Unlike atomic vectors, the contents of a list are not restricted to
a single mode and can encompass any mixture of data types. Lists are sometimes called
generic vectors, because the elements of a list can be of any type of R object, even lists
containing further lists. This property makes them fundamentally different from atomic
vectors. A list is a special type of vector in which each element can be a different type.
Create lists using list() or coerce other objects using as.list(). An empty list of the required
length can be created using vector("list", length).

Matrices

In these R objects, the elements are organised in a 2-dimensional layout. Matrices hold
elements of similar atomic types. These are beneficial when the elements belong to a
single class. Matrices having numeric elements are created for mathematical calculations.
You can create matrices using the matrix() function. The basic syntax to create a matrix
is given below:
matrix(data, nrow, ncol, byrow, dimnames)

Factors
These R objects are used for categorizing data and storing them as levels. They are good for
statistical modelling and data analysis. Both integers and strings can be stored in factors.
You can use the factor() function for creating a factor by providing a vector as an input to
the method.

Data Frame:
A data frame is a very important data type in R. It’s pretty much the de facto data structure
for most tabular data and what we use for statistics.
A data frame is a special type of list where every element of the list has the same length (i.e.
a data frame is a “rectangular” list).
Data frames can have additional attributes such as rownames(), which can be useful for
annotating data, like subject_id or sample_id. But most of the time they are not used.
Some additional information on data frames:
• Usually created by read.csv() and read.table(), i.e. when importing the data into R.
• Assuming all columns in a data frame are of the same type, a data frame can be converted
to a matrix with data.matrix() (preferred) or as.matrix(). Otherwise type coercion will
be enforced and the results may not always be what you expect.
• Can also create a new data frame with the data.frame() function.
• Find the number of rows and columns with nrow(dat) and ncol(dat), respectively.
• Rownames are often automatically generated and look like 1, 2, …, n. Consistency in
numbering of rownames may not be honored when rows are reshuffled or subset.

Program and Output:


VECTOR:-

LIST:-
MATRICES:-
DATA FRAMES:-
Conclusion:

Q.1 How is a vector different from the list data structure?

ANS:-

Vector                                          List
It has contiguous memory.                       It has non-contiguous memory.
It is synchronized.                             It is not synchronized.
A vector may have a default size.               A list does not have a default size.
In a vector, each element requires space        In a list, each element requires extra space
for itself only.                                for the node which holds the element,
                                                including pointers to the next and previous
                                                elements in the list.
Insertion at the end requires constant time,    Insertion is cheap no matter where in the
but insertion elsewhere is costly.              list it occurs.
A vector is thread safe.                        A list is not thread safe.
Random access of elements is possible.          Random access of elements is not possible.

Q.2 What is Data Frame in R?


ANS:- Data frames in the R language are generic data objects of R which are used
to store tabular data. Data frames can also be interpreted as matrices where
each column of the matrix can be of a different data type. A DataFrame is made up
of three principal components: the data, rows, and columns.

Roll No.        Name of Student     Date of Performance    Date of Evaluation    Grade    Sign of Student    Sign of Faculty
0818CS191132    Pratyoosh Mishra

EXPERIMENT NO. 5

Aim / Title: Data Manipulation using R Language.

Problem Statement: Write data manipulation code using R language.

Objectives: To write and execute programs in R to demonstrate R’s data manipulation


capabilities by using built-in data structures and functions.

Outcomes: Understand R language Data Manipulation.

Prerequisite: You must be comfortable with variables, linear equations, graphs of functions,
histograms, and statistical means.
You should be a good programmer. Ideally, you should have some experience programming
in R, because the programming exercises are in R. However, experienced programmers
without R experience can usually complete the programming exercises anyway.

Hardware requirements: Memory and disk space required per user: 1 GB RAM + 1 GB of
disk + 0.5 CPU core. Server overhead: 2-4 GB or 10% system overhead (whichever is larger),
0.5 CPU cores, 10 GB disk space. Port requirements: port 8000 plus 5 unique, random ports
per notebook.

Software requirements: Jupyter Notebook, the Anaconda platform, or any online platform to
run the programs.

Theory:

Data manipulation in R using the dplyr package


R provides a simple and easy-to-use package called dplyr for data manipulation. The
package has in-built methods for data manipulation, exploration and transformation.
Let us check out some of the most important functions of this package:
select()
The select() method is one of the basic functions for data manipulation in R. This method is
used for selecting columns; using it, you can select a column by its name or position, and
columns can be selected based on certain conditions. Suppose we want to select the 3rd
and 4th columns of a data frame called myData; the code will be:
select(myData, 3:4)
filter()
This method is used for filtering rows of a dataset that match specific criteria. It works
like select(): you pass the data frame first and then a condition, separated by a comma.
For example, if you want to filter out the rows of a data set whose cars are red in colour,
you would write:
filter(cars, colour == "Red")
As a result, the matching rows will be displayed.

Pipe operator
The pipe operator is available in packages such as magrittr and dplyr for simplifying your
overall code. The operator lets you chain multiple functions together. Denoted by the
%>% symbol, it can be used with popular methods such as summarise(), filter(), select()
and group_by() during data manipulation in R.

Program and Output:


SELECT:-
FILTER:-
Conclusion:

Roll No.        Name of Student     Date of Performance    Date of Evaluation    Grade    Sign of Student    Sign of Faculty
0818CS191124    Piyush Mahajan

EXPERIMENT NO. 6

Aim / Title: Data Visualization using R Language.


Problem Statement: Write data visualization code using R language.

Objectives: To write and execute programs in R to demonstrate R’s data visualization


capabilities by using built-in data structures and functions.

Outcomes: Understand R language Data Visualization.

Prerequisite: You must be comfortable with variables, linear equations, graphs of functions,
histograms, and statistical means.
You should be a good programmer. Ideally, you should have some experience programming
in R because the programming exercises are in R. However, experienced programmers
without R experience can usually complete the programming exercises anyway.

Hardware requirements: Memory and disk space required per user: 1 GB RAM + 1 GB of
disk + 0.5 CPU core. Server overhead: 2-4 GB or 10% system overhead (whichever is larger),
0.5 CPU cores, 10 GB disk space. Port requirements: port 8000 plus 5 unique, random ports
per notebook.

Software requirements: Jupyter Notebook, the Anaconda platform, or any online platform to
run the programs.

Theory:

Data Visualization in R
R is a language that is designed for statistical computing, graphical data analysis, and
scientific research. It is usually preferred for data visualization as it offers flexibility and
minimum required coding through its packages.

Plotting with ggplot2


ggplot2 is a plotting package that provides helpful commands to create complex plots from
data in a data frame. It provides a more programmatic interface for specifying what
variables to plot, how they are displayed, and general visual properties. Therefore, we only
need minimal changes if the underlying data change or if we decide to change from a bar
plot to a scatterplot. This helps in creating publication quality plots with minimal amounts
of adjustments and tweaking.
ggplot2 refers to the name of the package itself. When using the package, we use the
function ggplot() to generate the plots, so references to the function will be written
as ggplot() and references to the package as a whole as ggplot2.
Bar Plot
There are two types of bar plots, horizontal and vertical, which represent data points as
horizontal or vertical bars of certain lengths proportional to the value of the data item.
They are generally used for continuous and categorical variable plotting. By setting the
horiz parameter to TRUE or FALSE, we can get horizontal and vertical bar plots
respectively.
Histogram
A histogram is like a bar chart as it uses bars of varying height to represent data
distribution. However, in a histogram values are grouped into consecutive intervals called
bins. In a Histogram, continuous values are grouped and displayed in these bins whose size
can be varied.
Box Plot
The statistical summary of the given data is presented graphically using a boxplot. A
boxplot depicts information like the minimum and maximum data point, the median value,
first and third quartile, and interquartile range.
Scatter Plot
A scatter plot is composed of many points on a Cartesian plane. Each point denotes the
value taken by two parameters and helps us easily identify the relationship between them.

Program and Output

BAR PLOT :-
HISTOGRAM :-
BOX PLOT:-
SCATTER PLOT : -
Conclusion:

Q.1 Which library is used for data visualization in R?


ANS:- ggplot2

Q.2 How do you plot a bar plot in R?

ANS:- ggplot(data = diamonds, aes(x = cut)) + geom_bar()

Roll No.        Name of Student     Date of Performance    Date of Evaluation    Grade    Sign of Student    Sign of Faculty
0818CS191132    Pratyoosh Mishra
