R Studio Project

HISTORY OF R

ABSTRACT
R is a programming language and free software environment for statistical computing and
graphics supported by the R Foundation for Statistical Computing. The R language is widely
used among statisticians and data miners for developing statistical software and data
analysis. Polls, data mining surveys, and studies of scholarly literature databases show
substantial increases in popularity; as of February 2020, R ranks 13th in the TIOBE index, a
measure of popularity of programming languages.

A GNU package, the source code for the R software environment is written primarily in C, Fortran, and R itself, and is freely available under the GNU General Public License. Pre-compiled binary versions are provided for various operating systems. Although R has a command-line interface, there are several third-party graphical user interfaces, such as RStudio, an integrated development environment, and Jupyter, a notebook interface.

INTRODUCTION

HISTORY

R is an implementation of the S programming language combined with lexical scoping semantics, inspired by Scheme. S was created by John Chambers in 1976 while at Bell Labs. There are some important differences, but much of the code written for S runs unaltered in R.
R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New
Zealand, and is currently developed by the R Development Core Team (of which Chambers is
a member). R is named partly after the first names of the first two R authors and partly as a
play on the name of S. The project was conceived in 1992, with an initial version released in
1995 and a stable beta version in 2000.

STATISTICAL FEATURES

R and its libraries implement a wide variety of statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and others. R is easily extensible through functions and extensions, and the R community is noted for its active contribution of packages. Many of R's standard functions are written in R itself, which makes it easy for users to follow the algorithmic choices made. For computationally intensive tasks, C, C++, and Fortran code can be linked and called at run time. Advanced users can write C, C++, Java, .NET, or Python code to manipulate R objects directly. R is highly extensible through user-submitted packages for specific functions or specific areas of study. Due to its S heritage, R has stronger object-oriented programming facilities than most statistical computing languages. Extending R is also eased by its lexical scoping rules.

Another strength of R is static graphics, which can produce publication-quality graphs, including mathematical symbols. Dynamic and interactive graphics are available through additional packages.
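As a brief illustrative sketch (not code from the original text), base R's static graphics can render mathematical symbols in titles and axis labels via plotmath expressions:

```r
# Base R static graphics: a publication-style plot of sin(x)
# with mathematical annotation rendered via plotmath expressions.
x <- seq(-2 * pi, 2 * pi, length.out = 200)
plot(x, sin(x), type = "l",
     main = expression(paste("Graph of ", sin(x))),
     xlab = expression(x), ylab = expression(sin(x)))
abline(h = 0, lty = 2)  # dashed reference line at y = 0
```

Interactive equivalents of such plots are provided by add-on packages rather than base R.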

R has Rd, its own LaTeX-like documentation format, which is used to supply comprehensive
documentation, both online in a number of formats and in hard copy.

PROGRAMMING FEATURES

Like similar languages such as APL and MATLAB, R supports matrix arithmetic. R's data structures include vectors, matrices, arrays, data frames (similar to tables in a relational database), and lists. Arrays are stored in column-major order. R's extensible object system includes objects for (among others) regression models, time series, and geospatial coordinates. R has no scalar data type; instead, a scalar is represented as a vector of length one.
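The data structures described above can be seen directly at the R console; note in particular the length-one vector standing in for a scalar and the column-major fill of matrices:

```r
# Core R data structures: everything is a vector; a "scalar" is
# simply a vector of length one.
x <- 5
length(x)                    # 1 -- a single number is a length-one vector

v <- c(1, 2, 3)              # numeric vector
m <- matrix(1:6, nrow = 2)   # 2 x 3 matrix, filled column-major
df <- data.frame(id = 1:3, name = c("a", "b", "c"))  # table-like data frame
lst <- list(vec = v, mat = m)                        # heterogeneous list

m[2, 3]                      # 6 -- column-major fill puts 6 in row 2, column 3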

Many features of R derive from Scheme. R uses S-expressions to represent both data and code. Functions are first-class: they can be manipulated in the same way as data objects, which facilitates metaprogramming, and they allow multiple dispatch. Variables in R are lexically scoped and dynamically typed. Function arguments are passed by value and are lazy; that is, they are evaluated only when they are used, not when the function is called.
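Both properties are easy to demonstrate in a few lines (the function names here are illustrative, not from the text):

```r
# First-class functions: a function can be passed around like data.
apply_twice <- function(f, x) f(f(x))
apply_twice(sqrt, 16)                   # sqrt(sqrt(16)) = 2

# Lazy evaluation: an argument is only evaluated when it is used.
lazy_demo <- function(a, b) a           # b is never touched
lazy_demo(42, stop("never evaluated"))  # returns 42; the stop() never runs
```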

R supports procedural programming with functions and, for some functions, object-oriented programming with generic functions. A generic function acts differently depending on the classes of the arguments passed to it: the generic function dispatches to the function (method) specific to that class of object. For example, R has a generic print function that can print almost every class of object in R with a simple print(object) call.
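The dispatch mechanism can be sketched with R's S3 system; the class name and fields below are invented for illustration:

```r
# S3 generic dispatch: print() looks for a method named print.<class>.
account <- structure(list(owner = "Ada", balance = 100),
                     class = "account")

# Defining print.account makes print() behave differently for this class.
print.account <- function(x, ...) {
  cat("Account of", x$owner, "- balance:", x$balance, "\n")
}

print(account)   # dispatches to print.account, not the default method
```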

Although used mainly by statisticians and other practitioners requiring an environment for
statistical computation and software development, R can also operate as a general matrix
calculation toolbox – with performance benchmarks comparable to GNU
Octave or MATLAB.
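The matrix-toolbox role mentioned above rests on base R operators; a small sketch:

```r
# Basic matrix algebra in base R.
A <- matrix(c(2, 0, 1, 3), nrow = 2)  # column-major fill: rows (2, 1) and (0, 3)
b <- c(5, 9)

A %*% b       # matrix-vector product: (19, 27)
solve(A, b)   # solve the linear system A x = b: (1, 3)
t(A) %*% A    # cross-product via the transpose
```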

PACKAGES

The capabilities of R are extended through user-created packages, which provide specialised statistical techniques, graphical devices, import/export capabilities, reporting tools (R Markdown, knitr, Sweave), and more. These packages are developed primarily in R, and sometimes in Java, C, C++, and Fortran. The R packaging system is also used by researchers to create compendia that organise research data, code, and report files in a systematic way for sharing and public archiving.

A core set of packages is included with the installation of R, with more than 15,000
additional packages (as of September 2018) available at the Comprehensive R Archive
Network (CRAN), Bioconductor, Omegahat, GitHub, and other repositories.
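Installing a package from a repository such as CRAN takes one function call; ggplot2 below is just one example of a popular CRAN package:

```r
# Install a CRAN package (once per machine), then attach it per session.
install.packages("ggplot2")   # fetches and installs from a CRAN mirror
library(ggplot2)              # makes the package's functions available

# Inspect what is installed, and where R searches for packages.
head(rownames(installed.packages()))
.libPaths()
```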

The "Task Views" page (subject list) on the CRAN website lists a wide range of tasks (in
fields such as Finance, Genetics, High Performance Computing, Machine Learning, Medical
Imaging, Social Sciences and Spatial Statistics) to which R has been applied and for which
packages are available. R has also been identified by the FDA as suitable for interpreting data
from clinical research.

Other R package resources include Crantastic, a community site for rating and reviewing all
CRAN packages, and R-Forge, a central platform for the collaborative development of R
packages, R-related software, and projects. R-Forge also hosts many unpublished beta
packages and development versions of CRAN packages. Microsoft maintains a daily snapshot of CRAN that dates back to September 17, 2014.

The Bioconductor project provides R packages for the analysis of genomic data. This
includes object-oriented data-handling and analysis tools for data
from Affymetrix, cDNA microarray, and next-generation high-throughput
sequencing methods.

INTERFACES

The most specialized integrated development environment (IDE) for R is RStudio. A similar development interface is R Tools for Visual Studio. Some generic IDEs, such as Eclipse, also offer features for working with R.

Graphical user interfaces with more of a point-and-click approach include Rattle GUI, R Commander, and RKWard.

Some of the more common editors with varying levels of support for R
include Emacs (Emacs Speaks Statistics), Vim (Nvim-R plugin), Neovim (Nvim-R
plugin), Kate, LyX, Notepad++, Visual Studio Code, WinEdt, and Tinn-R.

R functionality is accessible from several scripting languages such as Python, Perl, Ruby, F#, and Julia. Interfaces to other high-level programming languages, such as Java and C# (.NET), are available as well.

IMPLEMENTATIONS

The main R implementation is written in R, C, and Fortran, and there are several other
implementations aimed at improving speed or increasing extensibility. A closely related
implementation is pqR (pretty quick R) by Radford M. Neal with improved memory
management and support for automatic multithreading. Renjin and FastR
are Java implementations of R for use in a Java Virtual Machine. CXXR, rho, and Riposte are
implementations of R in C++. Renjin, Riposte, and pqR attempt to improve performance by
using multiple processor cores and some form of deferred evaluation. Most of these
alternative implementations are experimental and incomplete, with relatively few users,
compared to the main implementation maintained by the R Development Core Team.

TIBCO built a runtime engine called TERR, which is part of Spotfire. Microsoft R Open is a fully compatible R distribution with modifications for multi-threaded computations.

COMMUNITIES

R has local communities worldwide for users to network, share ideas, and learn.

There is a growing number of R events bringing its users together, such as conferences (e.g.
useR!, WhyR?, conectaR, SatRdays), meetups, as well as R-Ladies groups that promote
gender diversity.

LITERATURE REVIEW
1. TEXT MINING SCIENTIFIC ARTICLES USING R STUDIO
The aim of this study is to develop a solution for text mining scientific articles using the R language in the "Knowledge Extraction and Machine Learning" course. Automatic text summarization of papers is a challenging problem whose solution would allow researchers to browse large article collections, quickly view highlights, and drill down for details. The proposed solution is based on social network analysis, topic models, and bipartite-graph approaches. The method defines a bipartite graph between documents and topics built using the Latent Dirichlet Allocation topic model. Topics that occur in the same document are then connected to generate a network of topics. The approach proves to be a promising technique for providing insight into summarizing scientific article collections.
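The paper's own code is not reproduced in this review, but the document-to-topic step it describes can be sketched in R, assuming the CRAN packages tm (text handling) and topicmodels (LDA) are installed; the three toy "documents" below stand in for real article text:

```r
# Minimal sketch of an LDA topic-model pipeline, as described above.
# Assumes CRAN packages 'tm' and 'topicmodels' are installed.
library(tm)
library(topicmodels)

docs <- c("statistics regression models data",
          "graphs network topics documents",
          "machine learning text mining data")

corpus <- VCorpus(VectorSource(docs))
dtm <- DocumentTermMatrix(corpus)                 # document-term matrix

fit <- LDA(dtm, k = 2, control = list(seed = 1))  # fit a 2-topic model
topics(fit)    # most likely topic for each document
terms(fit, 3)  # top 3 terms per topic
```

The document-topic assignments from such a fit are what the study links into a bipartite graph for summarization.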

2. AN ANALYTICAL REVIEW STUDY ON BIG DATA ANALYSIS USING R STUDIO

International Journal of Engineering Technologies and Management Research, 2019
A larger amount of data gives better output, but working with it can become a challenge due to processing limitations. Nowadays, companies are starting to realize the importance of using more data to support decisions about their strategies. It has been said, and demonstrated through case studies, that "more data usually beats better algorithms." With this insight, companies realized they could choose to invest in processing larger sets of data rather than in expensive algorithms. During the last decade, large-scale data analysis has seen exponential growth and will certainly continue to see remarkable developments, driven by the emergence of new interactive multimedia applications and highly integrated systems, which are in turn driven by the rapid growth in data services and microelectronic devices. Up to now, most current mobile systems have been targeted mainly at voice communications with low transmission rates.

Doi: 10.5281/zenodo.3266146

Publication Date: 2019

Publication Name: International Journal of Engineering Technologies and Management Research

RESEARCH IDEA AND RESULT

The R language is well established as the language for doing statistics, data analysis, data-mining algorithm development, stock trading, credit-risk scoring, market-basket analysis, and all manner of predictive analytics. However, given the deluge of data that must be processed and analyzed today, many organizations have been reticent about deploying R beyond research into production applications. A large number of fields and sectors, ranging from economic and business activities to public administration, and from national security to scientific research in many areas, involve Big Data problems. On the one hand, Big Data is extremely valuable for raising productivity in business and enabling evolutionary breakthroughs in scientific disciplines, giving us many opportunities to make great progress in many fields. There is no doubt that future competition in business productivity and technology will converge on Big Data exploration. On the other hand, Big Data also brings many challenges, such as difficulties in data capture, data storage, data analysis, and data visualization. The main objective of this paper is to emphasize the significance and relevance of Big Data in our business systems, public administration, and scientific research. The authors propose potential techniques to address the problem, including cloud computing, quantum computing, and biological computing. To capture the value of Big Data, we need to develop new techniques and technologies for analyzing it. Until now, scientists have developed a wide variety of techniques and technologies to capture, curate, analyze, and visualize Big Data. We need tools (platforms) to make sense of Big Data. Current tools concentrate on three classes: batch processing tools, stream processing tools, and interactive analysis tools. Most batch processing tools are based on the Apache Hadoop infrastructure, such as MapReduce, R programming, and Dryad. Interactive analysis processes the data in an interactive environment, allowing users to undertake their own analysis of the information.

CONCLUSION
There are a number of reasons why RStudio is preferred; some of the most important are:

1. R and RStudio are free.

One of the biggest perks of working with R and RStudio is that both are available free of
charge. Whereas other, proprietary statistics packages are often stuck in the dark ages of
development (the 1990s, for example), and can be incredibly expensive to purchase, R is a
free alternative that allows users of all experience levels to contribute to its development.

2. Analyses done using R are reproducible.

As many scientific fields embrace the idea of reproducible analyses, proprietary point-and-
click systems actually serve as a hindrance to this process. If you need to re-run your
analysis using one of these systems, you’ll need to carefully copy-and-paste your results
into your text editor, potentially from beginning to end. As anyone who has done this sort
of copy-and-pasting knows, this approach is both prone to errors and incredibly tedious.

If, on the other hand, you use the scripted workflows described here, your analyses will be
reproducible, thus eliminating the copy-and-paste dance. And, as you can probably guess,
it is much better to be able to update your code and data inputs and then re-run all of your
analysis with the push of a button than to have to worry about manually moving your
results from one program to another. Reproducibility also helps you as a programmer,
since your most frequent collaborator is likely to be yourself a few months or years down
the road. Instead of having to carefully write down all the steps you took to find the correct
drop-down menu option, your entire code is stored, and immediately reusable.

3. Using R makes collaboration easier.

This approach also helps with collaboration since, as you will see later, you can share a
single R Markdown file containing all of your analysis, documentation, comments, and
code with others. This reduces the time needed to work with others and reduces the
likelihood of errors being made in following along with point-and-click analyses. The
mantra here is to Say No to Copy-And-Paste! both for your sanity and for the sake of
science.

4. Struggling through programming helps you learn.

We all know that learning isn’t easy. Do you have trouble remembering how to follow a
list of more than 10 steps or so? Do you find yourself going back over and over again
because you can’t remember what step comes next in the process? This is extremely
common, especially if you haven't done the procedure in a while. Learning by following a procedure is easy in the short term but can be extremely frustrating to remember in the long term. Done well, programming promotes long-term thinking over short-term fixes.
