R (programming language)

R is a programming language for statistical computing and data visualization. It has been adopted in the fields of data mining, bioinformatics and data analysis.[9]

Quick facts
Paradigms: Multi-paradigm: procedural, object-oriented, functional, reflective, imperative, array[1]
Designed by: Ross Ihaka and Robert Gentleman
Developer: R Core Team
First appeared: August 1993
Stable release: 4.4.2[2] / 31 October 2024
Typing discipline: Dynamic
Platform: arm64 and x86-64
License: GPL-2.0-or-later[3]
Filename extensions:
  • .r[4]
  • .rdata
  • .rhistory
  • .rds
  • .rda[5]
Website: r-project.org
Influenced: Julia,[7] pandas[8]

The core R language is augmented by a large number of extension packages, containing reusable code, documentation, and sample data.

R is free and open-source software. It is part of the GNU Project and is available under the GNU General Public License.[3] It is written primarily in C, Fortran, and R itself. Precompiled executables are provided for various operating systems.

As an interpreted language, R has a native command line interface. Multiple third-party graphical user interfaces are also available, such as RStudio (an integrated development environment) and Jupyter (a notebook interface).

History

Figure: Ross Ihaka, co-originator of R
Figure: Robert Gentleman, co-originator of R

R was started by professors Ross Ihaka and Robert Gentleman as a programming language to teach introductory statistics at the University of Auckland.[10] The language was inspired by the S programming language, with most S programs able to run unaltered in R.[6] The language was also inspired by Scheme's lexical scoping, allowing for local variables.[1]

The name of the language, R, reflects both its status as a successor to the S language and the shared first letter of the authors' names, Ross and Robert.[11] In August 1993, Ihaka and Gentleman posted a binary of R on StatLib, a data archive website.[12] At the same time, they announced the posting on the s-news mailing list.[13] On December 5, 1997, R became a GNU project when version 0.60 was released.[14] On February 29, 2000, version 1.0 was released.[15]

Packages

Figure: Violin plot created with the R visualization package ggplot2

R packages are collections of functions, documentation, and data that extend R.[16] For example, packages add reporting features such as RMarkdown, Quarto,[17] knitr and Sweave. Packages also implement various statistical techniques such as linear, generalized linear and nonlinear modeling, classical statistical tests, spatial analysis, time-series analysis, and clustering. Easy package installation and use have contributed to the language's adoption in data science.[18]

Base packages are immediately available when starting R and provide the necessary syntax and commands for programming, computing, graphics production, basic arithmetic, and statistical functionality.[19]
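As a rough illustration, the following sketch asks a running session which packages ship with R and which are already attached. The variable names are illustrative only, and the exact lists depend on the R version and startup options.

```r
# Packages with priority "base" ship with every R installation.
base_pkgs <- rownames(installed.packages(priority = "base"))

# Packages currently attached to the search path of this session.
attached <- sub("^package:", "", grep("^package:", search(), value = TRUE))

print(base_pkgs)
print(attached)
```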

The Comprehensive R Archive Network (CRAN) was founded in 1997 by Kurt Hornik and Friedrich Leisch to host R's source code, executable files, documentation, and user-created packages.[20] Its name and scope mimic the Comprehensive TeX Archive Network and the Comprehensive Perl Archive Network.[20] CRAN originally had three mirrors and 12 contributed packages.[21] As of 16 October 2024, it has 99 mirrors[22] and 21,513 contributed packages.[23] Packages are also available on repositories R-Forge, Omegahat, and GitHub.[24][25][26]

The Task Views on the CRAN web site list packages in fields such as causal inference, finance, genetics, high-performance computing, machine learning, medical imaging, meta-analysis, social sciences, and spatial statistics.

The Bioconductor project provides packages for genomic data analysis, complementary DNA, microarray, and high-throughput sequencing methods.

The tidyverse package bundles several subsidiary packages that provide a common interface for tasks related to accessing and processing "tidy data",[27] data contained in a two-dimensional table with a single row for each observation and a single column for each variable.[28]
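The idea of "tidy data" can be sketched without any add-on package. The example below is a minimal illustration in base R, with made-up values and hypothetical column names. It rearranges a "wide" table, where one variable is spread over several columns, into a table with one row per observation and one column per variable.

```r
# Wide layout: the yearly values are spread over two columns.
wide <- data.frame(
  country = c("A", "B"),
  y2020   = c(10, 20),
  y2021   = c(11, 22)
)

# Tidy layout: one row per (country, year) observation,
# one column per variable (country, year, value).
tidy <- data.frame(
  country = rep(wide$country, times = 2),
  year    = rep(c(2020, 2021), each = nrow(wide)),
  value   = c(wide$y2020, wide$y2021)
)

print(tidy)
```

Packages in the tidyverse provide dedicated functions for this kind of reshaping. The manual construction here is only meant to show the target layout.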

A package needs to be installed only once. For example, to install the tidyverse package:[28]

> install.packages("tidyverse")

To load the functions, data, and documentation of a package, one executes the library() function. To load tidyverse:[a]

> # The package name can be enclosed in quotes
> library("tidyverse")

> # The package name can also be given without quotes
> library(tidyverse)
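Because installation and loading are separate steps, scripts sometimes check whether a package is installed before attaching it. The following is a minimal sketch of that pattern. The helper name load_if_installed is hypothetical, while requireNamespace() and library() are standard R functions.

```r
# Hypothetical helper: attach a package only if it is already installed.
load_if_installed <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    library(pkg, character.only = TRUE)  # pkg holds the name as a string
    TRUE
  } else {
    message("Package '", pkg, "' is not installed; ",
            "install it once with install.packages().")
    FALSE
  }
}

load_if_installed("stats")                     # 'stats' ships with R
load_if_installed("aPackageThatDoesNotExist")  # made-up name, for illustration
```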

Interfaces

R comes installed with a command line console. Various integrated development environments (IDEs) can also be installed. IDEs for R include R.app[29] (OSX/macOS only), Rattle GUI, R Commander, RKWard, RStudio, and Tinn-R.[30]

General purpose IDEs that support R include Eclipse via the StatET plugin and Visual Studio via R Tools for Visual Studio.

Editors that support R include Emacs, Vim via the Nvim-R plugin, Kate, LyX via Sweave, WinEdt, and Jupyter.

Scripting languages that support R include Python, Perl, Ruby, F#, and Julia.

General purpose programming languages that support R include Java via the Rserve socket server, and .NET C#.

Statistical frameworks which use R in the background include Jamovi and JASP.

Community

The R Core Team was founded in 1997 to maintain the R source code. The R Foundation for Statistical Computing was founded in April 2003 to provide financial support. The R Consortium is a Linux Foundation project to develop R infrastructure.

The R Journal is an open-access academic journal that features short to medium-length articles on the use and development of R. It includes articles on packages, programming tips, CRAN news, and foundation news.

The R community hosts many conferences and in-person meetups; a community-maintained list is kept on GitHub. These groups include:

  • UseR!: an annual international R user conference
  • Directions in Statistical Computing (DSC)
  • R-Ladies: an organization to promote gender diversity in the R community
  • SatRdays: R-focused conferences held on Saturdays
  • R Conference
  • posit::conf (formerly known as rstudio::conf)

Implementations

The main R implementation is written primarily in C, Fortran, and R itself. Other implementations have included Microsoft R Open (MRO). As of 30 June 2021, Microsoft started to phase out MRO in favor of the CRAN distribution.[33]

Commercial support

Although R is an open-source project, some companies provide commercial support:

  • Oracle provides commercial support for the Big Data Appliance, which integrates R into its other products.
  • IBM provides commercial support for in-Hadoop execution of R.

Examples

Hello, World!

"Hello, World!" program:

> print("Hello, World!")
[1] "Hello, World!"

Basic syntax

The following examples illustrate the basic syntax of the language and use of the command-line interface. (An expanded list of standard language features can be found in the R manual, "An Introduction to R".[34])

In R, the generally preferred assignment operator is an arrow made from two characters <-, although = can be used in some cases.[35]

> x <- 1:6 # Create a numeric vector in the current environment
> y <- x^2 # Create vector based on the values in x.
> print(y) # Print the vector’s contents.
[1]  1  4  9 16 25 36

> z <- x + y # Create a new vector that is the sum of x and y
> z # Typing an object's name at the prompt prints its contents.
[1]  2  6 12 20 30 42

> z_matrix <- matrix(z, nrow = 3) # Create a new matrix that turns the vector z into a 3x2 matrix object
> z_matrix 
     [,1] [,2]
[1,]    2   20
[2,]    6   30
[3,]   12   42

> 2 * t(z_matrix) - 2 # Transpose the matrix, multiply every element by 2, subtract 2 from each element in the matrix, and return the results to the terminal.
     [,1] [,2] [,3]
[1,]    2   10   22
[2,]   38   58   82

> new_df <- data.frame(t(z_matrix), row.names = c("A", "B")) # Create a new data.frame object that contains the data from a transposed z_matrix, with row names 'A' and 'B'
> names(new_df) <- c("X", "Y", "Z") # Set the column names of new_df as X, Y, and Z.
> print(new_df)  # Print the current results.
   X  Y  Z
A  2  6 12
B 20 30 42

> new_df$Z # Output the Z column
[1] 12 42

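> all(new_df$Z == new_df['Z'], new_df[3] == new_df$Z) # The data.frame column Z can be accessed using $Z, ['Z'], or [3] syntax and the values are the same. all() is used because && raises an error for operands longer than one element in recent R versions.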
[1] TRUE

> attributes(new_df) # Print attributes information about the new_df object
$names
[1] "X" "Y" "Z"

$row.names
[1] "A" "B"

$class
[1] "data.frame"

> attributes(new_df)$row.names <- c("one", "two") # Access and then change the row.names attribute; can also be done using rownames()
> new_df
     X  Y  Z
one  2  6 12
two 20 30 42
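The difference between <- and = mentioned above shows up inside function calls. The following sketch uses a hypothetical helper, double_it, and a hypothetical name, input_val, to illustrate it. Within a call, = only matches a value to a parameter, whereas <- also creates a variable in the calling environment.

```r
double_it <- function(input_val) 2 * input_val  # hypothetical helper

# Inside a call, `=` matches a value to a parameter; no variable is created.
r1 <- double_it(input_val = 5)
created_by_equals <- exists("input_val")

# `<-` inside a call assigns in the calling environment, then passes the value.
r2 <- double_it(input_val <- 5)
created_by_arrow <- exists("input_val")

print(c(created_by_equals, created_by_arrow))
```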

Structure of a function

One of R's strengths is the ease of creating new functions.[36] Objects in the function body remain local to the function, and any data type may be returned. In R, almost all functions and all user-defined functions are closures.[37]

Create a function:

# The input parameters are x and y.
# The function returns a linear combination of x and y.
f <- function(x, y) {
  z <- 3 * x + 4 * y

  # an explicit return() statement is optional, could be replaced with simply `z`
  return(z)
}

Usage output:

> f(1, 2)
[1] 11

> f(c(1, 2, 3), c(5, 3, 4))
[1] 23 18 25

> f(1:3, 4)
[1] 19 22 25
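The statement that user-defined functions are closures can be illustrated with a small sketch. The names make_counter, counter_a, and counter_b are hypothetical. Each function returned by make_counter() keeps its own private count variable, which is not visible from outside.

```r
# A function that returns another function; the inner function
# remembers the environment in which it was created.
make_counter <- function() {
  count <- 0                # local to this call of make_counter()
  function() {
    count <<- count + 1     # update the captured variable
    count
  }
}

counter_a <- make_counter()
counter_b <- make_counter()

first  <- counter_a()  # 1
second <- counter_a()  # 2
other  <- counter_b()  # 1, because counter_b has independent state
```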

It is possible to define functions to be used as infix operators with the special syntax `%name%` where "name" is the function variable name:

> `%sumx2y2%` <- function(e1, e2) {e1 ^ 2 + e2 ^ 2}
> 1:3 %sumx2y2% -(1:3)
[1]  2  8 18

Since version 4.1.0, anonymous functions can be written in a short notation, which is useful for passing them to higher-order functions:[38]

> sapply(1:5, \(i) i^2)    # here \(i) is the same as function(i) 
[1]  1  4  9 16 25
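The same shorthand works with other higher-order functions in base R. The following sketch assumes R 4.1.0 or later and uses illustrative variable names. It applies Filter(), Reduce(), and Map() with anonymous functions.

```r
# Keep only the even numbers.
evens <- Filter(\(i) i %% 2 == 0, 1:10)

# Fold a vector into a single value by repeated addition.
total <- Reduce(\(acc, i) acc + i, 1:5)

# Apply a two-argument function element-wise over two vectors.
pair_sums <- Map(\(a, b) a + b, 1:3, 4:6)

print(evens)              # 2 4 6 8 10
print(total)              # 15
print(unlist(pair_sums))  # 5 7 9
```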

Native pipe operator

In R version 4.1.0, a native pipe operator, |>, was introduced.[39] This operator lets users chain function calls one after another instead of nesting them.

> nrow(subset(mtcars, cyl == 4)) # Nested without the pipe character
[1] 11

> mtcars |> subset(cyl == 4) |> nrow() # Using the pipe character
[1] 11

Another alternative to nested functions, in contrast to using the pipe character, is using intermediate objects:

> mtcars_subset_rows <- subset(mtcars, cyl == 4)
> num_mtcars_subset <- nrow(mtcars_subset_rows)
> print(num_mtcars_subset)
[1] 11

While the pipe operator can produce code that is easier to read, it has been advised to pipe together at most 10 to 15 lines and chunk code into sub-tasks which are saved into objects with meaningful names.[40] Here is an example with fewer than 10 lines that some readers may still struggle to grasp without intermediate named steps:

(\(x, n = 42, key = c(letters, LETTERS, " ", ":", ")"))
    strsplit(x, "")[[1]] |>
    (Vectorize(\(chr) which(chr == key) - 1))() |>
    (`+`)(n) |>
    (`%%`)(length(key)) |>
    (\(i) key[i + 1])() |>
    paste(collapse = "")
)("duvFkvFksnvEyLkHAErnqnoyr")
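For comparison, the same computation can be sketched with intermediate objects that have meaningful names, as the advice above suggests. The function name shift_decode and the variable names are hypothetical, and match() is used here in place of the vectorized which() call. The logic is otherwise the same shift of each character by n positions within key.

```r
shift_decode <- function(x, n = 42, key = c(letters, LETTERS, " ", ":", ")")) {
  # Split the input string into single characters.
  chars <- strsplit(x, "")[[1]]

  # Zero-based position of each character within the key.
  positions <- match(chars, key) - 1

  # Shift every position by n, wrapping around at the end of the key.
  shifted <- (positions + n) %% length(key)

  # Look up the shifted characters and join them into one string.
  paste(key[shifted + 1], collapse = "")
}

decoded <- shift_decode("duvFkvFksnvEyLkHAErnqnoyr")
print(decoded)
```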

Object-oriented programming

The R language has native support for object-oriented programming. There are two native frameworks, the so-called S3 and S4 systems. S3 is the more informal of the two: it supports single dispatch on the first argument, and an object is assigned to a class simply by setting its "class" attribute. S4 is a Common Lisp Object System (CLOS)-like system of formal classes (also derived from S) and generic methods that supports multiple dispatch and multiple inheritance.[41]

In the example, summary is a generic function that dispatches to different methods depending on whether its argument is a character vector or a "factor":

> data <- c("a", "b", "c", "a", NA)
> summary(data)
   Length     Class      Mode 
        5 character character 
> summary(as.factor(data))
   a    b    c NA's 
   2    1    1    1
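The two systems described above can be sketched side by side. All class, generic, and variable names below (describe, circle, Metres, Feet, add_lengths) are hypothetical and chosen only for illustration. The S3 part dispatches on the class attribute of the first argument, while the S4 part selects a method from the classes of both arguments.

```r
library(methods)  # S4 support; usually attached by default

# S3: a class is just an attribute, and dispatch uses the first argument.
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "something else"
describe.circle  <- function(x, ...) paste("circle with radius", x$radius)

shape <- structure(list(radius = 2), class = "circle")
s3_result   <- describe(shape)  # uses describe.circle
s3_fallback <- describe(42)     # uses describe.default

# S4: formal class definitions and dispatch on several arguments.
setClass("Metres", representation(value = "numeric"))
setClass("Feet",   representation(value = "numeric"))

setGeneric("add_lengths", function(a, b) standardGeneric("add_lengths"))

setMethod("add_lengths", signature("Metres", "Metres"),
          function(a, b) new("Metres", value = a@value + b@value))

# One foot is 0.3048 metres.
setMethod("add_lengths", signature("Metres", "Feet"),
          function(a, b) new("Metres", value = a@value + 0.3048 * b@value))

s4_result <- add_lengths(new("Metres", value = 1), new("Feet", value = 10))
print(s4_result@value)
```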

Modeling and plotting

Figure: Diagnostic plots produced by plotting "model" (see the plot.lm() function). Note the mathematical notation allowed in labels (lower left plot).

The R language has built-in support for data modeling and graphics. The following example shows how R can generate and plot a linear model with residuals.

# Create x and y values
x <- 1:6
y <- x^2

# Linear regression model y = A + B * x
model <- lm(y ~ x)

# Display an in-depth summary of the model
summary(model)

# Create a 2 by 2 layout for figures
par(mfrow = c(2, 2))

# Output diagnostic plots of the model
plot(model)

Output:

Residuals:
      1       2       3       4       5       6
 3.3333 -0.6667 -2.6667 -2.6667 -0.6667  3.3333

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -9.3333     2.8441  -3.282 0.030453 * 
x             7.0000     0.7303   9.585 0.000662 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.055 on 4 degrees of freedom
Multiple R-squared:  0.9583, Adjusted R-squared:  0.9478
F-statistic: 91.88 on 1 and 4 DF,  p-value: 0.000662
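Individual parts of the fitted model can also be extracted programmatically rather than read from the printed summary. The following sketch refits the same model and uses the standard accessors coef(), residuals(), and predict(). The variable names are illustrative.

```r
x <- 1:6
y <- x^2
model <- lm(y ~ x)

coefs <- coef(model)                # intercept and slope
res   <- residuals(model)           # one residual per observation
r2    <- summary(model)$r.squared   # multiple R-squared

# Predict y for a new value of x using the fitted line.
pred <- predict(model, newdata = data.frame(x = 7))

print(coefs)
print(round(res, 4))
print(pred)
```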

Mandelbrot set

Figure: "Mandelbrot.gif" graphic created in R. (Note: Colors differ from actual output.)

This Mandelbrot set example highlights the use of complex numbers. It models the first 20 iterations of the equation z = z^2 + c, where c represents different complex constants.

Install the package that provides the write.gif() function beforehand:

install.packages("caTools")

R Source code:

library(caTools)

jet.colors <-
    colorRampPalette(
        c("green", "pink", "#007FFF", "cyan", "#7FFF7F",
          "white", "#FF7F00", "red", "#7F0000"))

dx <- 1500 # define width
dy <- 1400 # define height

C  <-
    complex(
            real = rep(seq(-2.2, 1.0, length.out = dx), each = dy),
            imag = rep(seq(-1.2, 1.2, length.out = dy), times = dx)
            )

# reshape as matrix of complex numbers
C <- matrix(C, dy, dx)

# initialize output 3D array
X <- array(0, c(dy, dx, 20))

Z <- 0

# loop with 20 iterations
for (k in 1:20) {

  # the central difference equation
  Z <- Z^2 + C

  # capture the results
  X[, , k] <- exp(-abs(Z))
}

write.gif(
    X,
    "Mandelbrot.gif",
    col = jet.colors,
    delay = 100)
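The iteration itself can be explored on a few individual constants without any graphics package. The sketch below is a simplified illustration, not part of the program above. The function name escape_iteration is hypothetical, and it uses the common criterion that a point has escaped once the modulus of z exceeds 2.

```r
# Return the first iteration at which |z| exceeds 2, or NA if the
# orbit stays bounded for max_iter iterations.
escape_iteration <- function(c0, max_iter = 20) {
  z <- 0 + 0i
  for (k in seq_len(max_iter)) {
    z <- z^2 + c0
    if (Mod(z) > 2) return(k)
  }
  NA_integer_
}

constants <- c(0 + 0i, -1 + 0i, 0.5 + 0i)
escapes <- sapply(constants, escape_iteration)

# c = 0 and c = -1 stay bounded (NA); c = 0.5 escapes.
print(escapes)
```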

Version names

Figure: CD of R version 1.0.0, autographed by the R core team, photographed in Quebec City in 2019

All R version releases from 2.14.0 onward have codenames that make reference to Peanuts comics and films.[42][43][44]

In 2018, core R developer Peter Dalgaard presented a history of R releases since 1997.[45] Some notable early releases before the named releases include:

  • Version 1.0.0, released on February 29, 2000, a leap day
  • Version 2.0.0, released on October 4, 2004, "which at least had a nice ring to it"[45]

The idea of naming R version releases was inspired by the Debian and Ubuntu version naming system. Dalgaard also noted that another reason for using Peanuts references for R codenames is that "everyone in statistics is a P-nut".[45]

R release codenames
Version | Release date | Name | Peanuts reference | Reference
4.4.2 | 2024-10-31 | Pile of Leaves | [46] | [47]
4.4.1 | 2024-06-14 | Race for Your Life | [48] | [49]
4.4.0 | 2024-04-24 | Puppy Cup | [50] | [51]
4.3.3 | 2024-02-29 | Angel Food Cake | [52] | [53]
4.3.2 | 2023-10-31 | Eye Holes | [54] | [55]
4.3.1 | 2023-06-16 | Beagle Scouts | [56] | [57]
4.3.0 | 2023-04-21 | Already Tomorrow | [58][59][60] | [61]
4.2.3 | 2023-03-15 | Shortstop Beagle | [62] | [63]
4.2.2 | 2022-10-31 | Innocent and Trusting | [64] | [65]
4.2.1 | 2022-06-23 | Funny-Looking Kid | [66][67][68][69][70][71] | [72]
4.2.0 | 2022-04-22 | Vigorous Calisthenics | [73] | [74]
4.1.3 | 2022-03-10 | One Push-Up | [73] | [75]
4.1.2 | 2021-11-01 | Bird Hippie | [76][77] | [75]
4.1.1 | 2021-08-10 | Kick Things | [78] | [79]
4.1.0 | 2021-05-18 | Camp Pontanezen | [80] | [81]
4.0.5 | 2021-03-31 | Shake and Throw | [82] | [83]
4.0.4 | 2021-02-15 | Lost Library Book | [84][85][86] | [87]
4.0.3 | 2020-10-10 | Bunny-Wunnies Freak Out | [88] | [89]
4.0.2 | 2020-06-22 | Taking Off Again | [90] | [91]
4.0.1 | 2020-06-06 | See Things Now | [92] | [93]
4.0.0 | 2020-04-24 | Arbor Day | [94] | [95]
3.6.3 | 2020-02-29 | Holding the Windsock | [96] | [97]
3.6.2 | 2019-12-12 | Dark and Stormy Night | See It was a dark and stormy night § Literature[98] | [99]
3.6.1 | 2019-07-05 | Action of the Toes | [100] | [101]
3.6.0 | 2019-04-26 | Planting of a Tree | [102] | [103]
3.5.3 | 2019-03-11 | Great Truth | [104] | [105]
3.5.2 | 2018-12-20 | Eggshell Igloos | [106] | [107]
3.5.1 | 2018-07-02 | Feather Spray | [108] | [109]
3.5.0 | 2018-04-23 | Joy in Playing | [110] | [111]
3.4.4 | 2018-03-15 | Someone to Lean On | [112][better source needed] | [113]
3.4.3 | 2017-11-30 | Kite-Eating Tree | See Kite-Eating Tree[114] | [115]
3.4.2 | 2017-09-28 | Short Summer | See It Was a Short Summer, Charlie Brown | [116]
3.4.1 | 2017-06-30 | Single Candle | [117] | [118]
3.4.0 | 2017-04-21 | You Stupid Darkness | [117] | [119]
3.3.3 | 2017-03-06 | Another Canoe | [120] | [121]
3.3.2 | 2016-10-31 | Sincere Pumpkin Patch | [122] | [123]
3.3.1 | 2016-06-21 | Bug in Your Hair | [124] | [125]
3.3.0 | 2016-05-03 | Supposedly Educational | [126] | [127]
3.2.5 | 2016-04-11 | Very, Very Secure Dishes | [128] | [129][130][131]
3.2.4 | 2016-03-11 | Very Secure Dishes | [128] | [132]
3.2.3 | 2015-12-10 | Wooden Christmas-Tree | See A Charlie Brown Christmas[133] | [134]
3.2.2 | 2015-08-14 | Fire Safety | [135][136] | [137]
3.2.1 | 2015-06-18 | World-Famous Astronaut | [138] | [139]
3.2.0 | 2015-04-16 | Full of Ingredients | [140] | [141]
3.1.3 | 2015-03-09 | Smooth Sidewalk | [142][page needed] | [143]
3.1.2 | 2014-10-31 | Pumpkin Helmet | See You're a Good Sport, Charlie Brown | [144]
3.1.1 | 2014-07-10 | Sock it to Me | [145][146][147][148] | [149]
3.1.0 | 2014-04-10 | Spring Dance | [100] | [150]
3.0.3 | 2014-03-06 | Warm Puppy | [151] | [152]
3.0.2 | 2013-09-25 | Frisbee Sailing | [153] | [154]
3.0.1 | 2013-05-16 | Good Sport | [155] | [156]
3.0.0 | 2013-04-03 | Masked Marvel | [157] | [158]
2.15.3 | 2013-03-01 | Security Blanket | [159] | [160]
2.15.2 | 2012-10-26 | Trick or Treat | [161] | [162]
2.15.1 | 2012-06-22 | Roasted Marshmallows | [163] | [164]
2.15.0 | 2012-03-30 | Easter Beagle | [165] | [166]
2.14.2 | 2012-02-29 | Gift-Getting Season | See It's the Easter Beagle, Charlie Brown[167] | [168]
2.14.1 | 2011-12-22 | December Snowflakes | [169] | [170]
2.14.0 | 2011-10-31 | Great Pumpkin | See It's the Great Pumpkin, Charlie Brown[171] | [172]
r-devel | N/A | Unsuffered Consequences | [173] | [45]

Notes

  a. This displays to standard error a listing of all the packages that tidyverse depends upon. It may also display warnings showing namespace conflicts, which can typically be ignored.

References
