
Data Analytics and R

ASSIGNMENT 2

Submitted to: Submitted by:


Prabin Kumar Rout Pranati Choudhary
(BFT/19/424)
Q1. Working with Data in R
ANS: R is ideal for analysing large data sets that would take too long to process
manually. R is an open-source programming language that is
widely used as statistical software and as a data analysis tool. Data frames in R are
generic data objects used to store tabular data. A data frame can be
thought of as a matrix in which each column may hold a different data type.
A data frame is made up of three principal components: the data, the rows, and the columns.
R can calculate a number of values that are useful when doing research on a
data set. For example, we can find the mean, median, minimum, and maximum values in a
data set. The summary() function reports the minimum and maximum data points, as well as the mean,
median and first and third quartile values. We can also view the head or tail of a data set, perform
arithmetic on large values easily, and plot graphs and scatter diagrams.
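The calculations mentioned above can be sketched in a few lines of base R. The built-in mtcars data set and its mpg column are used here only as an assumed example; they are not part of the assignment.

```r
data(mtcars)                 # built-in data set, used here only as an example

mean(mtcars$mpg)             # mean
median(mtcars$mpg)           # median
min(mtcars$mpg)              # minimum
max(mtcars$mpg)              # maximum

# min, quartiles, median, mean and max in one call
mpg_summary <- summary(mtcars$mpg)
mpg_summary

first_rows <- head(mtcars, 3)   # first three rows
last_rows  <- tail(mtcars, 3)   # last three rows

# plot(mtcars$wt, mtcars$mpg)   # scatter diagram of weight against mpg
```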
How Can R analytics Be Implemented?
While R programming was originally designed for statisticians, it can be implemented for a
variety of uses including predictive analytics, data modeling, and data mining. Businesses
can implement R to create custom models for data collection, clustering, and analytics. R
analytics can provide a valuable way to quickly develop models targeted at understanding
specific areas of the business and delivering tailored insights on day-to-day needs.
R analytics can be used for the following purposes:

• Statistical testing
• Prescriptive analytics
• Predictive analytics
• Time-series analysis
• What-if analysis
• Regression models
• Data exploration
• Forecasting
• Text mining
• Data mining
• Visual analytics
• Web analytics
• Social media analytics
• Sentiment analysis
R can be used to solve real-world business problems by turbocharging an organization's
analytics program. It can be integrated into a business’s analytics platform to help users get
the most out of their data. With an extensive library of R functions and advanced statistical
techniques, R can be used to apply statistical models to your analysis and better understand
trends in the data. It can help predict potential business outcomes, identify opportunities
and risks and create interactive dashboards to gain a comprehensive view of the data. This
can lead to better business decisions and increased revenue.
Example
1. To create a data frame in R, use the data.frame() function and pass each of the
vectors you have created as arguments to the function.
2. A data frame in R can be expanded by adding new columns and rows to the
existing data frame.
3. Extracting data from a data frame means accessing its rows or columns. One can
extract a specific column from a data frame using its column name.
4. One can get the structure of a data frame using the str() function in R. It can display even
the internal structure of large nested lists. It provides one-line output for
basic R objects, letting the user know about the object and its constituents.

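The steps listed in this example can be sketched as follows. The vector names and values (name, age, marks) are made up purely for illustration.

```r
# Illustrative vectors (made-up values, not from the assignment)
name  <- c("Asha", "Ravi", "Meena")
age   <- c(21, 23, 22)
marks <- c(82.5, 74.0, 91.0)

# Create the data frame by passing the vectors to data.frame()
students <- data.frame(name, age, marks, stringsAsFactors = FALSE)

# Expand it: add a new column, then add a new row
students$passed <- students$marks >= 75
new_row <- data.frame(name = "Kiran", age = 24, marks = 68.0, passed = FALSE,
                      stringsAsFactors = FALSE)
students <- rbind(students, new_row)

# Extract a column by its name, and a row by its position
students$marks
students[["age"]]
students[2, ]

# Inspect the structure of the data frame
str(students)
```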
Q2. Built-in data viewer
The RStudio IDE includes a data viewer that allows you to look inside data frames and other
rectangular data structures. The viewer also includes some simple exploratory data
analysis (EDA) features that can help you understand the data as you manipulate it with R.

Starting the viewer

You can invoke the viewer from the console by calling the View() function on the data frame you
want to look at. For instance, to view the built-in iris dataset, run data(iris) followed by View(iris).
You can also start the viewer by clicking on the table data icon on the right, in the
environment pane:

Sorting

As you might expect, you can sort by any column just by clicking on the column. Click on
a column that’s already sorted to reverse the sort direction.

To remove sorting and show the data in the order R sees it, click the empty cell in the upper
left.

Filtering

To apply filters, click the Filter icon in the toolbar. Any field that can be filtered will have a
white box labelled All. Click this box to change which field values you want to see. For
instance, you can filter out irises with a sepal width greater than 3.6.
The text at the bottom of the viewer indicates how many records the dataset contained
before and after filtering; in this case we’ve filtered 150 records down to 135.

Not all kinds of fields can be filtered. At the moment, only the following types are
supported:

• Numeric
• Character
• Factor (treated as character if > 256 levels)
• Boolean (logical)

Filters are additive (i.e., joined with AND); that is, if you apply two column filters, you will
see only records that match both of them.

Clear individual filters by clicking the (x) next to the filter; to clear all the filters at once, click
the Filter icon in the toolbar.

Searching

You can search for text across all the columns of your frame by typing in the global filter
box.
The search feature matches the literal text you type in with the displayed values, so in
addition to searching for text in character fields, you can search for e.g., TRUE or 4.6 and see
results in logical and numeric field types.

Searching and filtering are additive; when both are applied, you will see only records that
match your filters and contain your search text.
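The viewer's sorting, filtering and searching are point-and-click, but roughly equivalent results can be reproduced at the console. The sketch below uses plain base R on the iris dataset; it is an approximation of the behaviour described above, not what the viewer runs internally, and the variable names are chosen only for illustration.

```r
data(iris)

# Sorting: order the rows by one column (add decreasing = TRUE to reverse)
sorted <- iris[order(iris$Sepal.Width), ]

# Filtering: drop irises with a sepal width greater than 3.6
filtered <- subset(iris, Sepal.Width <= 3.6)
nrow(iris)       # records before filtering
nrow(filtered)   # records after filtering

# Filters are joined with AND: both conditions must hold
both <- subset(iris, Sepal.Width <= 3.6 & Species == "setosa")

# Searching: literal text match against the values of every column
matches <- lapply(iris, function(col) grepl("4.6", as.character(col), fixed = TRUE))
found <- iris[Reduce(`|`, matches), ]
```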

Q3. Data summaries in R
For an R data frame, a statistical summary of the data can be obtained by
applying the summary() function. It is a generic function used to produce summaries of
the results of various model-fitting functions. The function invokes particular methods
which depend on the class of the first argument.
Descriptive statistics of a data frame in R can be calculated by several different methods. Each
of the options below computes summary statistics for the columns of a data frame;
summary() is the base R function used to get the summary statistics of a
column.

• Descriptive statistics with the summary() function in R
• Summary statistics using the stat.desc() function from the “pastecs” package
• Descriptive statistics with the describe() function from the “Hmisc” package
• The summarise() function of the dplyr package
Summary statistics are computed using the summary() function in R. On a data frame, summary() is
automatically applied to each column. The format of the result depends on the data type of
the column.
• If the column is a numeric variable, the mean, median, min, max and quartiles are
returned.
• If the column is a factor variable, the number of observations in each group is
returned.
• Descriptive statistics with the simple summary function include the minimum and
maximum value of each column.
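As a minimal sketch of how the result depends on the column type, the example below calls summary() on the built-in iris dataset, which has four numeric columns and one factor column. The dataset choice is an assumption made for illustration.

```r
data(iris)

# On a data frame, summary() is applied column by column
summary(iris)

# Numeric column: min, quartiles, median, mean and max
num_summary <- summary(iris$Sepal.Length)
num_summary

# Factor column: number of observations in each group
fac_summary <- summary(iris$Species)
fac_summary
```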

Q4. Neural networking in R programming


A neural network is loosely modelled on the human nervous system, which is made up
of interconnected neurons; in other words, a neural network is made up of interconnected
information processing units. A neural network is a network or circuit of
neurons or, in the modern sense, an artificial neural network composed of artificial neurons or
nodes. Today, neural networks are used for solving many business problems such as sales
forecasting, customer research, data validation, and risk management. The first layer of the
neural network receives the raw input, processes it and passes the processed information to
the hidden layers. The hidden layers pass the information to the last layer, which produces
the output. The advantage of a neural network is that it is adaptive in nature. It learns from
the information provided, i.e. it trains itself on data that has a known outcome and
optimizes its weights for better prediction in situations with an unknown outcome.
A neural network consists of three layers:
1. Input Layer: Layers that take inputs based on existing data.
2. Hidden Layer: Layers that use backpropagation to optimise the weights of the input
variables in order to improve the predictive power of the model.
3. Output Layer: Output of predictions based on the data from the input and hidden layers.

The input data is introduced to the neural network through the input layer, which has one
neuron for each component of the input data, and is passed on to the hidden
layers (one or more) in the network. They are called ‘hidden’ only because they do not
constitute the input or output layer. In the hidden layers, all the processing actually
happens through a system of connections characterized by weights and biases.
Once the input is received, a neuron calculates a weighted sum, adds the
bias and, according to the result and an activation function (the most common one is
sigmoid), decides whether it should be ‘fired’ or ‘activated’. Then, the neuron transmits
the information downstream to other connected neurons in a process called the ‘forward pass’.
At the end of this process, the last hidden layer is linked to the output layer, which has one
neuron for each possible desired output.
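The forward pass described above can be sketched in base R. The layer sizes (3 inputs, 2 hidden neurons, 1 output neuron) and all weight and bias values below are hypothetical, chosen only to make the arithmetic concrete.

```r
# Sigmoid activation: squashes any real number into the range (0, 1)
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical network: 3 inputs, 2 hidden neurons, 1 output neuron
x  <- c(0.5, 0.1, 0.9)                          # one input record
W1 <- matrix(c(0.2, -0.4,  0.7,
               0.5,  0.3, -0.6),
             nrow = 2, byrow = TRUE)            # hidden-layer weights
b1 <- c(0.1, -0.2)                              # hidden-layer biases
W2 <- matrix(c(0.8, -0.5), nrow = 1)            # output-layer weights
b2 <- 0.05                                      # output-layer bias

# Forward pass: weighted sum plus bias, then activation, layer by layer
hidden <- sigmoid(as.vector(W1 %*% x) + b1)
output <- sigmoid(as.vector(W2 %*% hidden) + b2)
output
```

Training would then adjust W1, b1, W2 and b2 to reduce the prediction error; that step is not shown here.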
Implementing a Neural Network in R Programming
It is much easier to implement a neural network in R because of the
excellent libraries available for it. Before implementing a neural network in R, let’s understand the
structure of the data first.
Understanding the structure of the data
Here we use the binary dataset (binary.csv). The objective is to predict whether a candidate will get
admitted to a university using variables such as gre, gpa, and rank. The R script is provided
side by side and is commented for better understanding. The data is in .csv
format. We get the working directory with the getwd() function and place our dataset
binary.csv inside it to proceed further.
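A minimal sketch of loading and inspecting such a file is given below. Several things here are assumptions: the outcome column name admit, the handful of illustrative rows written to a temporary file so the sketch runs without the real binary.csv, the min-max scaling step, and the optional use of the nnet package (one of several R packages that can fit a neural network).

```r
# Stand-in for binary.csv: a few illustrative rows written to a temporary file.
# With the real file, use csv_path <- file.path(getwd(), "binary.csv") instead.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("admit,gre,gpa,rank",
             "0,380,3.61,3",
             "1,660,3.67,3",
             "1,800,4.00,1",
             "1,640,3.19,4",
             "0,520,2.93,4",
             "1,760,3.00,2"), csv_path)

admissions <- read.csv(csv_path)
str(admissions)      # column types and first few values
head(admissions)

# Min-max scaling puts gre, gpa and rank on a common 0-1 range before training
min_max <- function(v) (v - min(v)) / (max(v) - min(v))
predictors <- c("gre", "gpa", "rank")
scaled <- admissions
scaled[predictors] <- lapply(admissions[predictors], min_max)

# Optional: fit a small network, only if the nnet package is installed
if (requireNamespace("nnet", quietly = TRUE)) {
  set.seed(1)
  fit <- nnet::nnet(admit ~ gre + gpa + rank, data = scaled,
                    size = 2, trace = FALSE)
  print(predict(fit, scaled))
}
```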
Q5. K-means clustering
Clustering is a broad family of techniques for finding groups of observations (known as
clusters) in a dataset that share similar features. Clusters are arranged
so that observations in the same group have similar characteristics and
observations in separate groups have dissimilar characteristics. As an
unsupervised learning method, clustering attempts to identify relationships among n
observations (data points) without being trained on a response variable. K-means
clustering in R programming is an unsupervised non-linear algorithm that clusters data
based on similarity. It seeks to partition the observations into a
prespecified number of clusters. The data is segmented so that each training
example is assigned to a segment called a cluster. Unsupervised algorithms rely heavily on raw
data, and reviewing the results for relevance can require considerable manual effort. K-means is
used in a variety of fields such as banking, healthcare, retail, and media.
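As a rough illustration of partitioning observations into a prespecified number of clusters, the sketch below runs base R's kmeans() on the numeric columns of the built-in iris dataset. The choice of dataset, of three clusters, and of standardising the columns first are assumptions made for the example.

```r
data(iris)
features <- scale(iris[, 1:4])    # numeric columns only, standardised

set.seed(42)                      # k-means starts from random centres
fit <- kmeans(features, centers = 3, nstart = 25)   # 3 = prespecified number of clusters

fit$size                          # number of observations in each cluster
fit$centers                       # cluster centres (in scaled units)

# The species labels are not used for fitting; they only help inspect the result
table(cluster = fit$cluster, species = iris$Species)
```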

Applications of K-means Clustering


Real-world data is often complicated, poorly managed, and noisy, so real
conditions rarely present the clean picture to which these types of
algorithms can be applied directly. Even so, k-means clustering is applied in a number of
areas:

1. K-means clustering is applied in Call Detail Record (CDR) analysis. It gives an
in-depth view of customer requirements and satisfaction on the basis of call traffic
during the time of day and the demographics of a particular location.
2. It is used in the clustering of documents to group similar documents
together.
3. It is deployed to classify sounds on the basis of their similar patterns and
to separate out malformations in them.
4. It serves as the basis of a lossy image compression technique: k-means
clusters the pixels of an image in order to decrease its total
size.
5. It is helpful in the business sector for segmenting the purchases made by
customers, and for clustering activity on apps and websites.
6. In insurance and fraud detection, prior data makes it possible
to flag potentially fraudulent customers based on their proximity to the clusters that
known fraud patterns indicate.
Key Features of K-means Clustering

Some key features of k-means clustering are listed below:

1. It is easy to interpret and to implement.
2. For a large number of variables in the dataset, k-means runs faster
than hierarchical clustering.
3. When the cluster centres are recomputed, an instance can move to a different cluster.
4. K-means forms compact clusters.
5. It can work on unlabeled numerical data.
6. Moreover, it is fast, robust and simple to understand, and it yields the best
results when the groups in the dataset are distinct (well separated) from each
other.
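The point that an instance can change cluster when the centres are recomputed can be seen in one hand-written round of k-means. The one-dimensional data and the deliberately poor starting centres below are made up for illustration; in practice the built-in kmeans() function repeats these two steps until the assignments stop changing.

```r
# Tiny made-up 1-D data set and deliberately poor starting centres
x <- c(1, 2, 3, 10, 11, 12)
centres <- c(1, 2)

# Assignment step: index of the nearest centre for each point
assign_clusters <- function(x, centres) {
  sapply(x, function(p) which.min(abs(p - centres)))
}

before <- assign_clusters(x, centres)

# Update step: each centre becomes the mean of the points assigned to it
centres <- as.numeric(tapply(x, before, mean))

# Reassign with the new centres; points that changed cluster show up as TRUE
after <- assign_clusters(x, centres)
moved <- before != after
data.frame(x, before, after, moved)
```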
