Chapter 1: Introduction to R Language
1.1 History and Overview
R is a programming language and software environment developed for statistical computing and
data analysis. It provides an environment for presenting statistical data in the form of charts,
graphs and reports. R was created by Ross Ihaka and Robert Gentleman at the University of
Auckland, New Zealand, and is presently developed by the R Development Core Team. R is
freely available under the GNU General Public License, and pre-compiled binary versions are
available for several operating systems, including Linux, Windows and Mac. R is an
implementation of the S programming language combined with lexical scoping semantics inspired
by Scheme. S was created by John Chambers while at Bell Labs.
The language takes its name, R, partly from the first letter of the first names of its two
developers (Robert Gentleman and Ross Ihaka), and partly as a play on the name of the Bell Labs
language S. R is an effective programming language that offers the facilities expected of any
good general-purpose language. R is free, open-source software. It is among the most widely
used tools for statistical data analysis, is a popular choice among data scientists, and is
supported by a vibrant and talented community of contributors.
1.3 PACKAGE
The "Task Views" page on the CRAN website lists a wide range of tasks (such as Finance,
Genetics, High-Performance Computing, Machine Learning, Medical Imaging, Social Sciences and
Spatial Statistics) to which R has been applied and for which packages are available. R has
also been identified by the Food and Drug Administration (FDA) as suitable for analysing data
from medical research. Other R package resources include Crantastic, a community site for
rating and reviewing all CRAN packages, and R-Forge, a central platform for the collaborative
development of R packages, R-related software, and projects. R-Forge also hosts many
unpublished beta packages and development versions of CRAN packages.
For the analysis of genomic data, the Bioconductor project provides many R packages, such as
object-oriented data-handling tools for Affymetrix and cDNA microarrays, and has begun to offer
tools for analysing data from next-generation high-throughput sequencing techniques.
1.4 RSTUDIO
The proposed work was carried out using RStudio Desktop. The features used were:
1) Console: This region shows the output of the programs that you run. Code can also be
written directly in the console, but code entered directly in the R console cannot be
traced later. This is where the R script comes into use.
2) R Script: As the name suggests, this is the space in which to write programs. To run the
code, select the line(s) of code and press Ctrl + Enter. Alternatively, click the small
‘Run’ button at the top right corner of the R Script pane.
3) R environment: This region shows the collection of objects that have been added. These
include data sets, variables, vectors, functions, etc. To check whether data has been
loaded correctly in R, always look at this area.
4) Graphical Output: This area shows the graphs created during exploratory data analysis.
Besides graphs, you can select packages and get help from R's embedded official
documentation.
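The way these panes work together can be illustrated with a short script. This is only a sketch: the object names (scores, marks, average_score) and the values are hypothetical and are not taken from the proposed work.

```r
# Objects created by a script appear in the R environment pane.
scores <- c(72, 85, 91)                                  # a numeric vector
marks <- data.frame(student = c("A", "B", "C"),
                    score = scores)                      # a small data set
average_score <- function(x) mean(x)                     # a user-defined function

# ls() lists the objects currently defined, much as the environment pane does.
print(ls())

# str() is a quick check that the data set has the expected columns and types.
str(marks)

# The result of this call is printed in the Console.
print(average_score(marks$score))
```

Selecting these lines in the R Script pane and pressing Ctrl + Enter would run them and show the output in the Console.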
1.5 R PACKAGES
R packages are collections of R functions, compiled code and sample data. They are stored in
a directory called "library" in the R environment. By default, R installs a set of packages
during installation. Other packages are added later, when they are required for some specific
purpose. When the R console is started, only the default packages are available. Other
packages that are already installed have to be loaded explicitly before they can be used by an
R program. There are two ways to get new R packages. The first is to install directly from the
Comprehensive R Archive Network (CRAN) repository; the second is to download the package to
your computer and install it manually.
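The library mechanism described above can be inspected from the console. The sketch below uses only base R functions; the choice of the tools package is an assumption made for illustration, since it ships with R but is not normally attached at start-up.

```r
# Directories in which R looks for installed packages (the "library").
lib_dirs <- .libPaths()
print(lib_dirs)

# Names of all packages that are installed, whether loaded or not.
installed <- rownames(installed.packages())

# Packages attached to the current session before loading anything extra.
attached_before <- search()

# Load an installed package explicitly so that its functions can be used.
library(tools)

# The newly attached package now appears in the search path.
attached_after <- search()
print(setdiff(attached_after, attached_before))
```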
The install.packages() command gets a package directly from the CRAN web page and installs it
in the R environment. You may be prompted to choose the nearest mirror. Select the one
appropriate to your location.
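A minimal sketch of installing from CRAN is given here. The helper name install_if_missing and the mirror URL passed as repos are assumptions made for illustration; supplying repos avoids the interactive mirror prompt.

```r
# Hypothetical helper: install a package from CRAN only when it is not
# already available, then report whether it can be loaded.
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call does not need to download anything.
install_if_missing("stats")
```

For a package that is not yet installed, the same call would download it from the chosen mirror before returning.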
Another way is to go to the R Packages link and download the required package. Save the package
as a .zip file in a suitable location on the local system.
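Once the file has been saved, it can be installed from the local path. The sketch below is illustrative only: the helper name install_local_zip and the file path are hypothetical, and .zip binary packages of this kind are the form used on Windows.

```r
# Hypothetical helper: install a package from a locally saved file.
install_local_zip <- function(path) {
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(FALSE)
  }
  # repos = NULL tells install.packages() to install from the given file
  # rather than from a repository such as CRAN.
  install.packages(path, repos = NULL)
  TRUE
}

# Hypothetical location of the downloaded package file.
install_local_zip("C:/downloads/somepackage.zip")
```

After a successful installation the package still has to be loaded with library() before its functions can be used.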
Data mining is defined as extracting information from large sets of data. In other words, data
mining is the technique of mining knowledge from data.
There is a large amount of data available in the information industry. This data is of no use
until it is transformed into useful information. It is therefore essential to analyze this
large amount of data and extract useful information from it. Extraction of information is not
the only step required; data mining also involves other steps such as Data Cleaning, Data
Integration, Data Transformation, Data Mining, Pattern Evaluation and Data Presentation. Once
all these steps are complete, the information can be used in applications such as Fraud
Detection, Market Analysis, Production Control, Science Exploration, etc. Data mining is a
component of the knowledge discovery process. Knowledge discovery in databases is generally
abbreviated as KDD. Data mining and KDD are frequently used interchangeably because data mining
is the key component of the KDD process. The KDD process involves steps such as data selection,
data cleaning, data transformation, pattern searching (i.e. data mining), and the presentation,
interpretation and evaluation of findings. A typical KDD process is shown in the figure.
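The steps listed above can also be sketched in a few lines of R. This is a toy illustration under stated assumptions: the call records are invented, k-means clustering is chosen here only as one example of a mining technique, and the data integration step is omitted because there is a single source.

```r
# Toy call records (hypothetical data, for illustration only).
calls <- data.frame(
  customer = c("A", "B", "C", "D", "E", "F"),
  minutes  = c(12, 15, NA, 310, 295, 14),
  n_calls  = c(3, 4, 2, 60, 55, 3)
)

# Data cleaning: drop records that contain missing values.
clean <- na.omit(calls)

# Data transformation: put the numeric columns on a common scale.
features <- scale(clean[, c("minutes", "n_calls")])

# Data mining: group customers into two usage clusters with k-means.
set.seed(1)
fit <- kmeans(features, centers = 2, nstart = 10)

# Pattern evaluation: compare the average usage in each cluster.
summary_by_cluster <- aggregate(clean[, c("minutes", "n_calls")],
                                by = list(cluster = fit$cluster),
                                FUN = mean)

# Data presentation: report the result.
print(summary_by_cluster)
```

In this sketch the heavy users and the light users end up in separate clusters, which is the kind of pattern that would then be interpreted and evaluated.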
Currently the telecommunication industry is among the fastest-growing industries, providing
numerous services such as fax, pager, cell phone, Internet services, etc. The telecommunication
industry is expanding rapidly because of the development of advanced hardware, software and
communication technology. That is why data mining has become very important for supporting and
expanding the business. Data mining in the telecommunication industry helps in identifying
telecommunication patterns, catching fraudulent activities, making better use of available
resources, and improving quality of service. Some examples of how data mining improves
telecommunication services are listed below: