
CHAPTER-1:-INTRODUCTION TO R LANGUAGE

1.1 HISTORY AND OVERVIEW

R is a programming language and software environment for statistical computing and data
analysis. It provides facilities for presenting statistical data as charts, graphs and reports.
R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand,
and is presently developed by the R Development Core Team. R is freely available under the GNU
General Public License, and pre-compiled binary versions are available for several operating
systems such as Linux, Windows and Mac. R is an implementation of the S programming language
combined with lexical scoping semantics inspired by Scheme. S was created by John Chambers
while at Bell Labs.

The name R comes from the first letter of the first names of its two developers (Robert
Gentleman and Ross Ihaka), and is partly a play on the name of the Bell Labs language S. R is
an effective programming language that includes the capabilities expected of any good
programming language. R is free, open-source software. It is one of the world's most widely
used statistical data analysis tools, a popular choice among data scientists, and is supported
by a vibrant and talented community of contributors.

1.2 FEATURES OF R LANGUAGE

i. R is a well-developed, straightforward and efficient programming language. It includes
loops, conditionals, recursive functions, and input and output facilities.
ii. R has effective data storage and data handling facilities.
iii. R provides a set of operators for calculations on vectors, arrays and matrices.
iv. R provides a large collection of tools for data analysis.
v. R is accessed through a command-line interpreter and supports matrix-based arithmetic.
The data structures of R include vectors, arrays, matrices, lists and data frames. Its
extensible object system includes objects for regression models, time series and geo-spatial
coordinates. R has no separate scalar data structure; instead, a scalar is represented as a
vector of length one.
vi. R supports procedural programming with functions and, for some functions, object-oriented
programming with generic functions. It is mainly used by statisticians and mathematicians
who need an environment for statistical data analysis and software development. R is also
used as a general toolbox for matrix operations, with performance comparable to MATLAB or
GNU Octave.
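
The features listed above can be illustrated with a short sketch. The object names (x, v, m, rec, df, fact, describe) are chosen here only for illustration and do not come from the original work.

```r
# A "scalar" in R is simply a vector of length one.
x <- 5
is.vector(x)   # TRUE
length(x)      # 1

# Vectorised arithmetic: the operator is applied element by element.
v <- c(1, 2, 3)
v * 2          # 2 4 6

# Matrices and matrix-based arithmetic.
m <- matrix(1:4, nrow = 2)   # filled column by column
m %*% m                      # matrix product

# Lists and data frames can hold values of different types.
rec <- list(name = "a", score = 90)
df <- data.frame(id = 1:3, value = c(2.5, 3.5, 4.5))

# A recursive function built on a conditional.
fact <- function(n) if (n <= 1) 1 else n * fact(n - 1)
fact(5)        # 120

# Object orientation through a generic function (S3 dispatch).
describe <- function(obj, ...) UseMethod("describe")
describe.default <- function(obj, ...) "generic object"
describe.data.frame <- function(obj, ...) paste("data frame with", nrow(obj), "rows")
describe(df)   # "data frame with 3 rows"
```

The generic describe() picks a method according to the class of its argument, which is the style of object-oriented programming referred to in item vi.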

1.3 PACKAGE

The capabilities of R can be extended through user-created packages, which may include code
developed in C, C++, Java, etc. Packages provide specific statistical methods, graphical plots
(ggplot2), import/export capabilities, reporting tools (knitr, Sweave), etc. A core set of
packages is included with the R installation, and more than 7,801 additional packages are
available from repositories such as the Comprehensive R Archive Network (CRAN), Bioconductor,
Omegahat and GitHub.

The "Task Views" page on the CRAN website lists a wide range of tasks (such as Finance,
Genetics, High Performance Computing, Machine Learning, Medical Imaging, Social Sciences and
Spatial Statistics) to which R has been applied and for which packages are available. The Food
and Drug Administration (FDA) has also identified R as suitable for analysing data from
medical research. Other R package resources include Crantastic, a community site for rating
and reviewing all CRAN packages, and R-Forge, a central platform for the collaborative
development of R packages, R-related software, and projects. R-Forge also hosts many
unpublished beta packages and development versions of CRAN packages.

For the analysis of genomic data, the Bioconductor project provides many R packages, including
object-oriented data-handling and analysis tools for Affymetrix and cDNA microarray data, and
has begun to offer tools for analysing data from next-generation high-throughput sequencing
methods.

1.4 RSTUDIO

RStudio is an integrated development environment (IDE) for R. It offers workspace management
and includes a syntax-highlighting editor, a console and debugging tools. RStudio is
open-source software, although commercial editions with additional features are also
available. It runs on desktop computers under Windows, Mac and Linux, as well as in a web
browser connected to RStudio Server.
Two versions are available:

a. RStudio Desktop: The software runs locally as a regular desktop application.
b. RStudio Server: RStudio runs on a server and is accessed through a web browser.

The proposed work was carried out using RStudio Desktop. Features utilized were:

1) An IDE created specifically for the R language

• Syntax highlighting, code completion and smart indentation
• R code can be executed directly from the source editor
• Quickly jump to function definitions

2) Workflow brought together

• Integrated R help and documentation
• Multiple working directories can be easily managed using projects
• Data viewer and workspace browser

3) Powerful authoring and debugging

• Quickly detect and fix errors
• Extensive package development tools
• Authoring with Sweave and R Markdown

There are four sections in RStudio, which are shown in Fig 1.1:

1) Console: This region shows the output of the code you run. Code can also be written
directly in the console, but code entered there cannot be traced later. This is where the
R script comes into use.
2) R Script: As the name suggests, this is the space for writing programs. To run code,
select the line(s) and press Ctrl + Enter, or click the small ‘Run’ button at the top
right corner of the R Script pane.
3) R Environment: This region shows the objects that have been created or loaded, including
data sets, variables, vectors, functions, etc. To check whether data has been loaded
correctly in R, always look at this area.
4) Graphical Output: This area shows the graphs created during exploratory data analysis.
Besides graphs, it can be used to select packages and to get help from R’s embedded
official documentation.

Fig 1.1 RStudio

1.5 R PACKAGES

R packages are collections of R functions, compiled code and sample data. They are stored in
a directory called "library" in the R environment. By default, R installs a set of packages
during installation. Other packages are added later, when they are needed for a specific
purpose. When we start the R console, only the default packages are available. Other packages
that are already installed have to be loaded explicitly before an R program can use them.
There are two ways to get new R packages: installing directly from the Comprehensive R
Archive Network (CRAN), or downloading the package to your computer and installing it
manually.
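
The library locations and the difference between installed and loaded packages can be inspected from the console. The following is a minimal sketch. The choice of the tools package is an assumption made for illustration: it ships with R but is not attached at start-up.

```r
# Directories ("libraries") in which R looks for installed packages.
.libPaths()

# Names of all packages currently installed in those libraries.
installed <- rownames(installed.packages())

# Packages attached to the current session (the default set at start-up).
search()

# An installed package must be loaded before its functions can be used.
# "tools" ships with every R installation but is not attached by default.
library(tools)
"package:tools" %in% search()   # TRUE once the package is attached
```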
The install.packages() command gets a package directly from CRAN and installs it in the R
environment. You may be prompted to choose a mirror; select the one nearest to your location.
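
A minimal sketch of installing from CRAN is given here. The helper install_if_missing() is a hypothetical wrapper introduced for illustration, not part of R. It calls install.packages() only when the package is not already present.

```r
# Install a package from CRAN only if it is not already installed.
# `pkg` is a package name given as a character string.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # downloads from the selected CRAN mirror
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call does not trigger a download.
install_if_missing("stats")
```

For a package that is not yet installed, for example install_if_missing("knitr"), the same call would download it from CRAN, so an Internet connection is needed.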

Another way is to download the required package from the R Packages link. Save the package
as a .zip file in a suitable location on the local system.

Now we can run install.packages() on the saved file to install this package.
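
A manual installation from a saved file could look like the sketch below. The function install_local() and the file path are hypothetical and must be adapted to the actual download location. A .zip archive is a binary package for Windows, and other platforms typically use different archive formats.

```r
# Install a package from a file already saved on the local system.
# `path` is a hypothetical location; replace it with the real file path.
install_local <- function(path) {
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(invisible(FALSE))
  }
  # repos = NULL tells R to install from the file rather than a repository.
  install.packages(path, repos = NULL)
  invisible(TRUE)
}

install_local("path/to/package.zip")   # hypothetical path
```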


CHAPTER-2:-DATA ANALYSIS WITH R

2.1 DATA MINING

Data mining is defined as extracting information from large sets of data. In other words,
data mining is the process of mining knowledge from data.

A large amount of data is available in the information industry. This data is of no use until
it is transformed into useful information, so it is essential to analyze it and extract that
information. Extraction is not the only step we need to perform; the process also includes
Data Cleaning, Data Integration, Data Transformation, Data Mining, Pattern Evaluation and
Data Presentation. Once all these steps are complete, the information can be used in
applications such as Fraud Detection, Market Analysis, Production Control, Science
Exploration, etc. Data mining is a component of the Knowledge Discovery process; knowledge
discovery in databases is generally abbreviated as KDD. The terms data mining and KDD are
frequently used interchangeably because data mining is the key component of the KDD process.
The KDD process involves steps such as data selection, data cleaning, data transformation,
pattern searching (i.e. data mining), and the presentation, interpretation and evaluation of
findings. A typical KDD process is shown in Fig 2.1.

Fig 2.1 A typical Knowledge Discovery process
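
The KDD steps can be sketched in R as follows. This is only an illustration under stated assumptions: the built-in iris data set and k-means clustering are chosen here as stand-ins and are not the data or the mining technique of the proposed work.

```r
# 1. Data selection: keep the four numeric measurements of the iris data.
selected <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]

# 2. Data cleaning: drop rows with missing values (iris has none).
cleaned <- na.omit(selected)

# 3. Data transformation: centre and scale every column.
transformed <- scale(cleaned)

# 4. Data mining: search for patterns, here three clusters via k-means.
set.seed(1)                      # make the random start reproducible
fit <- kmeans(transformed, centers = 3, nstart = 10)

# 5. Pattern evaluation: compare clusters with the known species labels.
evaluation <- table(cluster = fit$cluster, species = iris$Species)

# 6. Presentation: print the cross-tabulation.
print(evaluation)
```

Each numbered comment corresponds to one stage of the process, so a different data source or mining method can be substituted at the matching step.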


2.2 APPLICATION OF DATA MINING IN TELECOMMUNICATION INDUSTRY

Currently the telecommunication industry is among the fastest-growing industries, providing
numerous services such as fax, pager, cell phone, Internet services, etc. The industry is
expanding rapidly because of the development of advanced hardware, software and communication
technology. That is why data mining has become very significant in supporting and expanding
the business. Data mining in the telecommunication industry helps to detect telecommunication
patterns, catch fraudulent activities, make the best use of available resources, and enhance
quality of service. Some examples of areas in which data mining improves telecommunication
services are listed below:

i. Multidimensional analysis of telecommunication data.
ii. Fraudulent pattern analysis.
iii. Identification of unusual patterns.
iv. Multidimensional association and sequential pattern analysis.
v. Mobile telecommunication services.
vi. Use of visualization tools in telecommunication data analysis.
