Monday, June 6, 2016

Introduction to the R Language


R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.
R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.
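As a small illustration of the linear modelling and classical tests mentioned above, here is a minimal sketch using the `cars` data set that ships with R. The choice of data set and of variable names (`fit`, `ct`) is only for illustration.

```r
# Fit a simple linear regression on the built-in `cars` data set:
# stopping distance as a function of speed.
fit <- lm(dist ~ speed, data = cars)

# Estimated intercept and slope.
coef(fit)

# A classical test: is the correlation between speed and distance non-zero?
ct <- cor.test(cars$speed, cars$dist)
ct$p.value
```

Calling `summary(fit)` would print the usual regression table with standard errors and R-squared.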
One of R’s strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.
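To give a feel for plots with mathematical annotation, the sketch below draws the standard normal density and typesets its formula in the title using R's `expression()` syntax. Writing to a temporary PDF file is an assumption made here so the example also runs without a display; in an interactive session you could drop the `pdf()` and `dev.off()` calls.

```r
# Draw a curve and label it with a typeset formula.
x <- seq(-3, 3, length.out = 200)
y <- dnorm(x)

# Write to a temporary PDF so the sketch also runs non-interactively.
out <- tempfile(fileext = ".pdf")
pdf(out)
plot(x, y, type = "l",
     xlab = expression(x),
     ylab = expression(phi(x)),
     main = expression(phi(x) == frac(1, sqrt(2 * pi)) * e^{-x^2 / 2}))
dev.off()
```

Everything else about the figure (margins, tick marks, fonts) is left at its default value here, and each default can be overridden through further arguments to `plot()` or through `par()`.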
R is available as Free Software under the terms of the Free Software Foundation’s GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and Mac OS.

The R environment

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes

  • an effective data handling and storage facility,
  • a suite of operators for calculations on arrays, in particular matrices,
  • a large, coherent, integrated collection of intermediate tools for data analysis,
  • graphical facilities for data analysis and display either on-screen or on hard copy, and
  • a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
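
The items in this list can be sketched in a few lines. The example below is only illustrative: the matrix `A`, the vector `b` and the function name `fact` are made up for the occasion.

```r
# Matrix operators: element-wise arithmetic, matrix product, linear solve.
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(1, 2)

A * 2              # element-wise scaling
A %*% t(A)         # matrix product with the transpose
x <- solve(A, b)   # solve the linear system A x = b

# A user-defined recursive function with a conditional.
fact <- function(n) {
  if (n <= 1) 1 else n * fact(n - 1)
}

# A loop that collects results into a vector.
results <- numeric(5)
for (i in 1:5) {
  results[i] <- fact(i)
}
results
```

In everyday R code the loop would often be replaced by a vectorised call such as `sapply(1:5, fact)`; it is written out here to show the loop construct itself.
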
The term “environment” is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software.
R, like S, is designed around a true computer language, and it allows users to add additional functionality by defining new functions. Much of the system is itself written in the R dialect of S, which makes it easy for users to follow the algorithmic choices made. For computationally-intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.
Many users think of R as a statistics system. We prefer to think of it as an environment within which statistical techniques are implemented. R can be extended (easily) via packages. There are about eight packages supplied with the R distribution and many more are available through the CRAN family of Internet sites covering a very wide range of modern statistics.
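As a rough sketch of how packages are used in practice: a package that ships with R is attached with `library()`, while a contributed package is first fetched from CRAN with `install.packages()`. The CRAN step needs network access, so it is left commented out below, and `ggplot2` is just an arbitrary example of a contributed package.

```r
# Packages attached in the current session (base, stats, graphics, ...).
search()

# Attach a package that ships with R but is not attached by default.
library(splines)

# Use a function from it: a B-spline basis with 4 degrees of freedom.
basis <- bs(1:10, df = 4)
dim(basis)

# Installing a contributed package from CRAN (requires network access):
# install.packages("ggplot2")
# library(ggplot2)
```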
R has its own LaTeX-like documentation format, which is used to supply comprehensive documentation, both on-line in a number of formats and in hard copy.

Every data analysis technique at your fingertips: R includes virtually every data manipulation, statistical model, and chart that the modern data scientist could ever need. You can easily find, download and use cutting-edge community-reviewed methods in statistics and predictive modeling from leading researchers in data science, free of charge.

Create beautiful and unique data visualizations: Representing complex data with charts and graphs is an essential part of the data analysis process, and R goes far beyond the traditional bar chart and line plot. Heavily influenced by thought leaders in data visualization like Bill Cleveland and Edward Tufte, R makes it easy to draw meaning from multidimensional data with multi-panel charts, 3-D surfaces and more. The custom charting capabilities of R are featured in many of the stunning infographics seen in the New York Times, The Economist, and the Flowing Data blog.

Get better results faster: R does not rely on point-and-click menus or inflexible "black-box" procedures; it is a programming language designed expressly for data analysis. Intermediate-level R programmers create data analyses faster than users of legacy statistical software, with the flexibility to mix and match models for the best results. R scripts are also easily automated, promoting both reproducible research and production deployments.

Draw on the talents of data scientists worldwide: As a thriving open-source project, R is supported by a community of more than 2 million users and thousands of developers worldwide. Whether you're using R to optimize portfolios, analyze genomic sequences, or predict component failure times, experts in every domain have made resources, applications and code available for free, online.