Assigment-2 Pranati Choudhary
ASSIGNMENT 2
3. Extracting data from a data frame means accessing its rows or columns. One can
extract a specific column from a data frame using its column name.
4. One can get the structure of a data frame using the str() function in R. It can display
the internal structure even of large nested lists. It gives one-line output for each basic
R object, letting the user know about the object and its constituents.
5. To create a data frame in R, use the data.frame() function and pass each of the
vectors you have created as arguments to the function.
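The three points above can be sketched together in a few lines of base R. The vector names and values below are made up purely for illustration.

```r
# Build a data frame from vectors (hypothetical example values).
name  <- c("Asha", "Ben", "Chen")
age   <- c(23, 31, 27)
score <- c(88.5, 92.0, 79.5)
students <- data.frame(name, age, score)

# Inspect the structure: one line per column with its type and first values.
str(students)

# Extract a single column by name, in three equivalent ways.
ages_dollar  <- students$age
ages_bracket <- students[["age"]]
ages_index   <- students[, "age"]

# Extract rows: the first row, and the rows that meet a condition.
first_row <- students[1, ]
over_25   <- students[students$age > 25, ]
```

The `$` form is the most common for interactive work; `[[ ]]` is handy when the column name is held in a variable.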
2. Built-in data viewer
The RStudio IDE includes a data viewer that allows you to look inside data frames and other
rectangular data structures. The viewer also includes some simple exploratory data
analysis (EDA) features that can help you understand the data as you manipulate it with R.
You can invoke the viewer in a console by calling the View function on the data frame you
want to look at. For instance, to view the built-in iris dataset, run these commands:
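A minimal sketch of such commands, assuming an interactive RStudio session. The interactive() guard is an addition here so that the script does not fail when run in batch mode.

```r
# Load the built-in iris dataset into the workspace.
data(iris)

# View() opens the data viewer. It needs an interactive session
# such as RStudio, so the call is guarded.
if (interactive()) {
  View(iris)
}
```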
You can also start the viewer by clicking on the table data icon on the right, in the
environment pane:
Sorting
As you might expect, you can sort by any column just by clicking on the column. Click on
a column that’s already sorted to reverse the sort direction.
To remove sorting and show the data in the order R sees it, click the empty cell in the upper
left.
Filtering
To apply filters, click the Filter icon in the toolbar. Any field that can be filtered will have a
white box labelled All. Click this box to change which field values you want to see. For
instance, to filter out irises with a sepal width greater than 3.6:
Note the text on the bottom, which indicates how many records the dataset contained
before and after filtering; in this case we’ve filtered 150 records down to 135.
Not all kinds of fields can be filtered. At the moment, only the following types are
supported:
Numeric
Character
Factor (treated as character if > 256 levels)
Boolean (logical)
Filters are additive (i.e., joined with AND); that is, if you apply two column filters, you will
see only records that match both of them.
Clear individual filters by clicking the (x) next to the filter; to clear all the filters at once, click
the Filter icon in the toolbar.
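The viewer filters only change what is displayed. A rough code equivalent of the sepal width filter, and of two filters joined with AND, might look like the sketch below. The variable names are illustrative, and the second condition (species equal to setosa) is an assumed example.

```r
data(iris)

# Keep rows with a sepal width of at most 3.6, i.e. filter out the wider ones.
narrow <- iris[iris$Sepal.Width <= 3.6, ]
cat("Showing", nrow(narrow), "of", nrow(iris), "records\n")

# Two column filters combine with AND: both conditions must hold.
narrow_setosa <- iris[iris$Sepal.Width <= 3.6 & iris$Species == "setosa", ]
cat("Showing", nrow(narrow_setosa), "of", nrow(iris), "records\n")
```

The first count printed should correspond to the reduction from 150 records that the viewer reports for the same filter.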
Searching
You can search for text across all the columns of your frame by typing in the global filter
box:
The search feature matches the literal text you type in with the displayed values, so in
addition to searching for text in character fields, you can search for e.g., TRUE or 4.6 and see
results in logical and numeric field types.
Searching and filtering are additive; when both are applied, you will see only records that
match your filters and contain your search text.
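The idea of a literal text search over displayed values can be imitated in code. The helper search_rows() below is hypothetical and only approximates the behaviour described above; the viewer's exact matching rules may differ.

```r
data(iris)

# Hypothetical helper: keep rows where the displayed value of any column
# contains the literal search text.
search_rows <- function(df, text) {
  per_column <- lapply(df, function(col) {
    grepl(text, as.character(col), fixed = TRUE)
  })
  hits <- Reduce(`|`, per_column)
  df[hits, , drop = FALSE]
}

# Search a numeric value as text.
found <- search_rows(iris, "4.6")

# Filter and search together: only rows that pass both remain.
narrow <- iris[iris$Sepal.Width <= 3.6, ]
both   <- search_rows(narrow, "virg")
```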
3. Data summaries in R
For an R data frame, a statistical summary of the data can be obtained by applying the
summary() function. It is a generic function used to produce summaries of various objects,
including the results of model fitting functions. The function invokes particular methods
which depend on the class of the first argument.
Descriptive statistics of a data frame in R can be calculated by 3 different methods. One of
them is the summary() function, which returns the summary statistics of each column of
the data frame.
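As a short illustration, summary() can be applied to a whole data frame or to a single column. The built-in iris dataset is used here as an assumed example.

```r
data(iris)

# Summary of every column: quartiles and mean for numeric columns,
# level counts for factor columns.
print(summary(iris))

# Summary of a single numeric column; the result is a named vector.
col_summary <- summary(iris$Sepal.Length)
print(col_summary)

# Individual statistics can be picked out by name.
col_mean <- col_summary[["Mean"]]

# For a factor, the method dispatched returns the count per level.
species_counts <- summary(iris$Species)
```

The different outputs for numeric and factor columns show the method dispatch on the class of the first argument.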
4. The input data is introduced to the neural network through the input layer, which has one
neuron for each component of the input data, and is passed on to the hidden layers (one or
more) of the network. These layers are called ‘hidden’ only because they are neither the
input nor the output layer. All the processing actually happens in the hidden layers,
through a system of connections characterized by weights and biases (as discussed
earlier). Once a neuron receives its input, it calculates a weighted sum, adds the bias, and
passes the result through an activation function (a common choice is the sigmoid) to
decide whether it should be ‘fired’ or ‘activated’. The neuron then transmits the
information downstream to other connected neurons in a process called the ‘forward pass’.
At the end of this process, the last hidden layer is linked to the output layer, which has one
neuron for each possible desired output.
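The forward pass described above can be sketched in base R. The layer sizes, weights and biases below are arbitrary illustrative values, not taken from any trained network.

```r
# Sigmoid activation: squashes any real number into (0, 1).
sigmoid <- function(z) 1 / (1 + exp(-z))

# Forward pass through one layer: weighted sum plus bias, then activation.
# Each row of `weights` holds the incoming weights of one neuron.
layer_forward <- function(inputs, weights, biases) {
  sigmoid(as.vector(weights %*% inputs) + biases)
}

# Hypothetical network: 3 inputs -> 2 hidden neurons -> 1 output neuron.
x  <- c(0.5, -1.2, 0.3)
W1 <- matrix(c(0.2, -0.4,  0.1,
               0.7,  0.3, -0.5), nrow = 2, byrow = TRUE)
b1 <- c(0.1, -0.2)
W2 <- matrix(c(0.6, -0.8), nrow = 1)
b2 <- 0.05

hidden <- layer_forward(x, W1, b1)       # activations of the hidden layer
output <- layer_forward(hidden, W2, b2)  # activation of the output neuron
print(output)
```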
5. Implementing Neural Network in R Programming
It is easy to implement a neural network in R because of the libraries available for it.
Before implementing a neural network in R, let’s understand the structure of the data
first.
6. Understanding the structure of the data
Here we use the binary dataset. The objective is to predict whether a candidate will get
admitted to a university from variables such as gre, gpa, and rank. The R script is provided
side by side and is commented for better understanding. The data is in .csv format. We get
the working directory with the getwd() function and place our dataset, binary.csv, inside it
to proceed further.
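A minimal sketch of that loading step follows. The helper load_admissions() is hypothetical, and the file is assumed to have a header row; the helper returns NULL rather than failing if the file is not found.

```r
# Show the current working directory, where binary.csv is expected to sit.
print(getwd())

# Hypothetical helper: read the file if it is present and print its structure.
load_admissions <- function(path = "binary.csv") {
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(NULL)
  }
  data <- read.csv(path, header = TRUE)
  str(data)  # column names, types and first few values
  data
}

admissions <- load_admissions()
```

If the file lives elsewhere, pass its full path to the helper instead of changing the working directory.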
5. K-means clustering
Clustering is a broad family of techniques for finding groups of observations (known as
clusters) in a dataset that share similar features. Observations in the same cluster have
similar characteristics, while observations in different clusters are dissimilar. As an
unsupervised learning method, clustering attempts to identify relationships between n
observations (data points) without being trained on a response variable. K-means
clustering in R is an unsupervised, non-linear algorithm that groups data based on
similarity. It seeks to partition the observations into a prespecified number of clusters,
assigning each training example to a segment called a cluster. Unsupervised algorithms
rely heavily on the raw data, and reviewing the relevance of the results can require
considerable manual effort. K-means is used in a variety of fields such as banking,
healthcare, retail and media.
1. K-means clustering is applied in Call Detail Record (CDR) analysis. It gives an
in-depth view of customer requirements and satisfaction based on call traffic by time
of day and the demographics of a particular location.
2. It is used to cluster documents, so that similar documents are grouped together.
3. It is used to classify sounds on the basis of their similar patterns and to isolate
anomalies in them.
4. It serves as a lossy image compression technique: K-means clusters the pixels of an
image in order to decrease its total size.
5. It is helpful in business for identifying segments of customer purchases, and for
clustering activity on apps and websites.
6. In insurance and fraud detection, prior data can be used to identify potentially
fraudulent customers based on their proximity to the clusters that the patterns
indicate.
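The image compression use in item 4 can be illustrated with a small sketch. It assumes a synthetic image of random RGB pixels and an arbitrary palette size of 8 colours; a real image would first have to be read in with an image package.

```r
set.seed(42)

# Synthetic image: 400 pixels, one row per pixel, columns R, G, B in [0, 1].
pixels <- matrix(runif(400 * 3), ncol = 3,
                 dimnames = list(NULL, c("r", "g", "b")))

# Cluster the pixel colours into k groups (k is an assumed palette size).
k <- 8
fit <- kmeans(pixels, centers = k, nstart = 5, iter.max = 50)

# Replace every pixel with the centre of its cluster. The image now uses
# at most k distinct colours, so it can be stored as a small palette
# plus one cluster index per pixel.
compressed <- fit$centers[fit$cluster, ]
```

The compression is lossy because the original colour of each pixel cannot be recovered from its cluster centre.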
Key Features of K-means Clustering
2. For a large number of variables in the dataset, K-means runs faster than
hierarchical clustering.
3. When the cluster centres are recomputed, an instance can change the cluster it
belongs to.
6. Moreover, it is fast, robust and easy to understand, and gives the best results
when the clusters in the dataset are well separated from each other.
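A minimal sketch of K-means in R uses the kmeans() function from the stats package. The iris measurements, the choice of 3 clusters and the scaling step are assumptions made here for illustration.

```r
data(iris)
set.seed(123)  # K-means starts from random centres, so fix the seed

# Use only the four numeric measurements and put them on a common scale.
features <- scale(iris[, 1:4])

# Partition the observations into a prespecified number of clusters.
fit <- kmeans(features, centers = 3, nstart = 25)

# Cluster sizes, and how the clusters line up with the known species.
print(fit$size)
print(table(cluster = fit$cluster, species = iris$Species))
```

The species column plays no part in the fit; it is used only afterwards to inspect the clusters, in keeping with the unsupervised setting.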