Introduction to R - 1st lesson (introduction)

Nathalie Villa-Vialaneix - http://www.nathalievialaneix.eu
September 14-16th, 2015

Master TIDE, Université Paris 1

Basic presentation of R

  • what is R?
  • how to install R and additional packages?
  • how to start and use R and its packages?
  • find help with R

What is R?

  • R is free statistical software
  • it can be installed on Linux, Windows and Mac OS X
  • it has many additional packages providing a wide range of functions for statistics and scientific programming

Install R

R can be downloaded from http://cran.univ-paris1.fr, which gives installation instructions for each OS.

For Ubuntu users, R can be installed from the command line with

sudo apt-get install r-base-core r-base-dev

(see also http://tuxette.nathalievialaneix.eu/?p=1380&lang=en for a more up-to-date installation)

Start R

  • On Windows and Mac OS X, R is started by double-clicking its icon. A very basic GUI then opens.

  • On Linux/Unix, R is started by typing the command

    R

    in a terminal; the program starts with no GUI.

RStudio

RStudio is a set of integrated tools designed to help you use R. It includes a GUI.

Good practices with your statistical analyses:

  • systematically create and save an R script that contains all the command lines that you have used (plain text file named XXX.R)
  • comment your scripts using the character #

    # this is a comment in an R script
    data(iris)

  • you can save your results as well (R data files are named XXX.rda)
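
As a minimal sketch of the last point, results can be written to an .rda file with save() and read back with load(). The object name res and the file name results.rda are hypothetical, and the file is placed in R's temporary directory so that nothing is left in your working directory.

```r
# hypothetical result that we want to keep
res <- c(1.5, 2.5, 4)

# hypothetical file name, located in R's temporary directory
rda.file <- file.path(tempdir(), "results.rda")

save(res, file = rda.file)   # write the object to disk
rm(res)                      # remove it from the workspace
load(rda.file)               # restore it under its original name
res
```

In a real analysis, the file would rather be saved next to the script that produced it.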

Install packages

Additional packages can be installed with

  • the menu “Install packages” on Windows and Mac OS X
  • the command line install.packages("RColorBrewer") where RColorBrewer is the name of the package that you want to install
  • the menu “Tools/Install packages” in RStudio

In all cases, a menu pops up asking you to choose your CRAN repository: select Paris 2, which is the repository hosted by University Paris 1!

Load packages

Once a package is installed, you can load it using the command line

library(RColorBrewer)

and all its functionalities are available until you close your R session.
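
A script that relies on a package can check that the package is installed before using it. The sketch below takes RColorBrewer as the example package; requireNamespace() and the package::function notation are part of base R, while brewer.pal() and the palette name "Set1" come from RColorBrewer. The variable names has.pkg and my.colors are only illustrative.

```r
# TRUE if the package is installed, FALSE otherwise (the package is not attached)
has.pkg <- requireNamespace("RColorBrewer", quietly = TRUE)

if (has.pkg) {
  # call a single function without attaching the package with library()
  my.colors <- RColorBrewer::brewer.pal(3, "Set1")
  print(my.colors)
} else {
  message("RColorBrewer is not installed: run install.packages(\"RColorBrewer\")")
}
```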

Help pages and demos

  • Help on a given function: help(mean)
  • Help on a given topic: help.search("mean") returns a list of functions that correspond to this topic
  • Demonstrations: the command line demo() lists available demonstrations. Try demo(graphics) for a demonstration of the graphical capabilities of R
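
A few other base R commands can help when you only remember part of a function name or want to see a function in action. This short sketch adds to the list above and uses mean only as an example; the variable name found is illustrative.

```r
found <- apropos("^mean")   # names of available objects that start with "mean"
found

args(mean)                  # arguments expected by a function
example(mean)               # runs the examples given at the end of the help page
```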

Find help on the web

Variables, mode, operations

  • variables, assignment

  • different modes for a variable, vectors, factors

  • basic functions and operations for different modes

Basic operations

R can be used for basic operations:

log((1 + sqrt(3)/2)^4)
[1] 2.495243

Available operators include +, -, *, /, etc.

Available functions include log, exp, cos, sin, tan, etc.

Variables

Results can be stored in a variable with the assignment operator <- (the operator = also works):

a <- cos(pi/4)
a
[1] 0.7071068
b = sin(pi/3+1)
b
[1] 0.888651

Mode of a variable: numeric

The mode of a variable indicates the basic type of the values it holds:

mode(a)
[1] "numeric"
is.numeric(a)
[1] TRUE

Mode of a variable: integer

Numeric variables can also be stored specifically as integers (note the L suffix):

b <- 1L; mode(b)
[1] "numeric"
is.integer(b); is.integer(a); is.numeric(b)
[1] TRUE
[1] FALSE
[1] TRUE
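
The function typeof() makes the difference between the two kinds of numeric values visible, which mode() does not. A short sketch (the variable names x.int and x.dbl are only illustrative):

```r
x.int <- 1L   # integer, because of the L suffix
x.dbl <- 1    # double precision number (the default)

mode(x.int); mode(x.dbl)       # both "numeric"
typeof(x.int); typeof(x.dbl)   # "integer" and "double"

x.int == x.dbl                 # TRUE: the values are equal
identical(x.int, x.dbl)        # FALSE: the storage types differ

typeof(5L / 2L)                # "double": / always returns a double
typeof(5L %/% 2L)              # "integer": integer division
```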

Mode of a variable: character

b <- a; b; mode(b)
[1] 0.7071068
[1] "numeric"
b <- "a"; b; mode(b)
[1] "a"
[1] "character"

Mode of a variable: character

b <- as.character(a); b; mode(b)
[1] "0.707106781186548"
[1] "character"
b <- as.numeric(b); b; mode(b)
[1] 0.7071068
[1] "numeric"

Mode of a variable: character

b <- as.numeric("a"); b;
Warning: NAs introduced by coercion
[1] NA

In R, missing values are coded as NA:

d <- NA; d; is.na(d)
[1] NA
[1] TRUE
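
Missing values propagate through most computations, which is why is.na() is needed to detect them. A minimal sketch with a hypothetical vector x:

```r
x <- c(1, NA, 3)

x + 1                  # 2 NA 4: the missing value stays missing
x == NA                # NA NA NA: comparing with NA never gives TRUE or FALSE
is.na(x)               # FALSE TRUE FALSE: the right way to test for NA
mean(x)                # NA
mean(x, na.rm = TRUE)  # 2: NA is removed before computing the mean
```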

Functions for characters

b <- "ABCABD"; nchar(b)
[1] 6
substr(b, 1, 3)
[1] "ABC"
b <- tolower(b); b; toupper(b)
[1] "abcabd"
[1] "ABCABD"

Operations with characters

a <- "a"; b <- "abcabd"; paste(a,b,sep="-")
[1] "a-abcabd"
strsplit(b,a)
[[1]]
[1] ""   "bc" "bd"
grep(a, b); gsub(a,"",b)
[1] 1
[1] "bcbd"
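
Some variations on the functions above, as a short sketch (the variable s is hypothetical): sub() replaces only the first match, paste0() pastes without a separator, the collapse argument of paste() merges a vector into a single string, and grepl() is the logical counterpart of grep().

```r
s <- "abcabd"

sub("a", "", s)                          # "bcabd": only the first "a" is removed
gsub("a", "", s)                         # "bcbd": every "a" is removed
paste0("x", 1:3)                         # "x1" "x2" "x3"
paste(c("a", "b", "c"), collapse = "-")  # "a-b-c"
grepl("abd", s)                          # TRUE: the pattern is found in s
```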

Mode of a variable: logical

d <- is.numeric(b); d; mode(d)
[1] FALSE
[1] "logical"
as.numeric(d); as.character(d); as.logical(1)
[1] 0
[1] "FALSE"
[1] TRUE

Functions/operations with booleans

e <- !d; e
[1] TRUE
d|e; e&d
[1] TRUE
[1] FALSE

...be careful with booleans

e
[1] TRUE
e+e;
[1] 2
as.logical("0")
[1] NA
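
Because TRUE counts as 1 and FALSE as 0 in arithmetic, sum() and mean() applied to a logical vector give the number and the proportion of TRUE values. The sketch below uses a hypothetical vector x and also shows which strings as.logical() recognizes.

```r
x <- c(3, 8, 1, 9)

x > 2            # TRUE TRUE FALSE TRUE
sum(x > 2)       # 3: number of values larger than 2
mean(x > 2)      # 0.75: proportion of values larger than 2

as.logical("TRUE"); as.logical("T")   # both TRUE
as.logical("yes")                     # NA: not a recognized spelling
as.logical(0)                         # FALSE: the number 0, unlike the string "0"
```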

Concatenating

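The function c() concatenates several values into a vector. All elements of a vector share the same mode, so values of different modes are converted to a common one (here, character):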

a.vector <- c(a, b, d, e); a.vector
[1] "a"      "abcabd" "FALSE"  "TRUE"  
mode(a.vector); length(a.vector)
[1] "character"
[1] 4
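
When c() receives values of different modes, it converts them all to the most general one (logical, then numeric, then character). A short sketch:

```r
c(1, TRUE)        # 1 1: the logical is converted to numeric
c(1, "a")         # "1" "a": the number is converted to character
c(TRUE, "a")      # "TRUE" "a": the logical is converted to character

mode(c(1, TRUE)); mode(c(1, "a", TRUE))   # "numeric" and "character"
```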

Pairwise operations with vectors

nchar(a.vector)
[1] 1 6 5 4
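a <- c(1:5,NA); b <- seq(1,100,length.out=5); a; b
[1]  1  2  3  4  5 NA
[1]   1.00  25.75  50.50  75.25 100.00
a*b # !! this is not a dot product; b (length 5) is recycled to match a (length 6)
Warning in a * b: longer object length is not a multiple of shorter object length
[1]   1.0  51.5 151.5 301.0 500.0    NA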
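
Arithmetic operators work element by element, and the shorter vector is recycled when the lengths differ. For an actual dot product, sum the element-wise products or use the matrix product operator %*%. A minimal sketch with hypothetical vectors x and y:

```r
x <- 1:3; y <- 4:6

x * y            # 4 10 18: element-wise products
sum(x * y)       # 32: dot product
drop(x %*% y)    # 32: same result with the matrix product

1:6 * 1:2        # 1 4 3 8 5 12: 1:2 is recycled three times
```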

Basic functions and operators with vectors

a; sum(a); sum(a, na.rm=TRUE)
[1]  1  2  3  4  5 NA
[1] NA
[1] 15
sum(a > 2, na.rm=TRUE)
[1] 3

Basic functions and operators with vectors

a; b <- rep(1:6, each=2); b
[1]  1  2  3  4  5 NA
 [1] 1 1 2 2 3 3 4 4 5 5 6 6
unique(b); intersect(a,b); union(a,b)
[1] 1 2 3 4 5 6
[1] 1 2 3 4 5
[1]  1  2  3  4  5 NA  6

Subsetting vectors

b <- rep(1:3, 2); b; 
[1] 1 2 3 1 2 3
b>2; which(b>2); b[c(3,6)]; b[b>2]
[1] FALSE FALSE  TRUE FALSE FALSE  TRUE
[1] 3 6
[1] 3 3
[1] 3 3

Subsetting vectors

b <- rep(1:3, each=2); b; 
[1] 1 1 2 2 3 3
which(b>5); length(which(b>5))
integer(0)
[1] 0
b==2; 
[1] FALSE FALSE  TRUE  TRUE FALSE FALSE
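
Two other common ways of subsetting, shown as a short sketch with a hypothetical vector x: negative indices drop elements, and logical conditions can be combined with & and |.

```r
x <- c(10, 20, 30, 40)

x[-1]                 # 20 30 40: everything except the first element
x[-c(1, 4)]           # 20 30: first and last elements dropped
x[x > 15 & x < 35]    # 20 30: both conditions must hold
x[x < 15 | x > 35]    # 10 40: at least one condition holds
x[length(x)]          # 40: last element
```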

Mode of a variable: factor

A factor is a vector whose values are taken from a pre-defined set of possible values, called its levels

a.factor <- as.factor(a.vector); a.factor
[1] a      abcabd FALSE  TRUE  
Levels: a abcabd FALSE TRUE
class(a.factor); levels(a.factor); nlevels(a.factor)
[1] "factor"
[1] "a"      "abcabd" "FALSE"  "TRUE"  
[1] 4

Levels of factors

a.factor[3] <- "2"; a.factor
[1] a      abcabd <NA>   TRUE  
Levels: a abcabd FALSE TRUE
a.factor[3] <- "a"; a.factor
[1] a      abcabd a      TRUE  
Levels: a abcabd FALSE TRUE
levels(a.factor) <- 5:8; a.factor
[1] 5 6 5 8
Levels: 5 6 7 8
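
To store a value that is not yet among the levels, the level has to be added first. A minimal sketch with a hypothetical factor f:

```r
f <- factor(c("low", "high", "low"))
levels(f)                          # "high" "low": alphabetical order by default

levels(f) <- c(levels(f), "mid")   # add a new level
f[2] <- "mid"                      # now allowed, no NA is generated
f
table(f)                           # counts per level: high 0, low 2, mid 1
```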

Operations with factors

as.character(a.factor)
[1] "5" "6" "5" "8"
as.numeric(a.factor)
[1] 1 2 1 4
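
as.numeric() applied to a factor returns the internal level codes, not the values that are displayed. To recover the numbers written in the labels, convert to character first. A short sketch with a hypothetical factor f:

```r
f <- factor(c("10", "20", "10"))

as.numeric(f)                  # 1 2 1: level codes
as.numeric(as.character(f))    # 10 20 10: values of the labels
as.numeric(levels(f))[f]       # 10 20 10: same result, converting each level only once
```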

Listing and removing variables

ls()
[1] "a"        "a.factor" "a.vector" "b"        "d"        "e"       
[7] "iris"    
rm(iris); ls()
[1] "a"        "a.factor" "a.vector" "b"        "d"        "e"       
rm(list=ls())

This information is also available (in a more detailed manner) in the panel “Environment” of RStudio.

Exercise 1

a <- c(1, 5, 3, 4); b <- letters[5:7]
a; b
[1] 1 5 3 4
[1] "e" "f" "g"

How can you obtain the following vector d from a and b?

d
[1] "1" "5" "e" "f" "g" "3" "4"

Exercise 2

a <- c(1, 2, NA, 3:7); a; b <- 1:7; all.equal(a,b)
[1]  1  2 NA  3  4  5  6  7
[1] "Numeric: lengths (8, 7) differ"

How can you remove the missing values from a so that

all.equal(a, b)
[1] TRUE

Exercise 3

set.seed(1119)
as <- sample(1:5, 100, replace=TRUE)
head(as); tail(as)
[1] 4 3 4 1 4 1
[1] 5 2 4 5 4 5

What do the functions set.seed, sample, head and tail do? With a single command line, count how many times the value 5 occurs in this vector. The result must be equal to:

[1] 17