Nathalie Villa-Vialaneix - http://www.nathalievialaneix.eu
September 14-16th, 2015
Master TIDE, Université Paris 1
R can be downloaded at http://cran.univ-paris1.fr with different installation instructions depending on your OS
For Ubuntu users, R can be installed with the command line
sudo apt-get install r-base-core r-base-dev
(see also
http://tuxette.nathalievialaneix.eu/?p=1380&lang=en
for a more up-to-date installation)
On Windows and Mac OS X, R is started by double clicking on its icon. A very basic GUI then opens up
On Linux/Unix, R is started using the command
R
in a terminal and the program starts with no GUI
RStudio is a set of integrated tools designed to help you use R. It includes a GUI.
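Good practices with your statistical analyses:
save your command lines in a script file (XXX.R)
comment your scripts with #
# this is a comment in an R script
data(iris)
save your data and results in an R data file (XXX.rda)
Additional packages can be installed with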
install.packages("RColorBrewer")
where RColorBrewer is the name of the package that you want to install.
In all cases, a menu pops up asking you to choose your CRAN repository: select Paris 2, which is the repository hosted by University Paris 1!
Once a package is installed, you can load it using the command line
library(RColorBrewer)
and all its functionalities are available until you close your R session.
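A script that relies on a package may be run on a machine where that package is not installed yet. The following is a minimal sketch of one way to handle this, not a standard R function: the helper name load_or_install is hypothetical, and the mirror URL is an assumption that replaces the interactive repository menu.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
# The default mirror URL is an assumption; any CRAN mirror can be used instead.
load_or_install <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # character.only = TRUE is needed because pkg holds the name as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this call never needs to download anything
load_or_install("stats")
```

Calling load_or_install("RColorBrewer") would install the package on first use and simply load it afterwards.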
help(mean)
help.search("mean")
returns a list of functions that correspond to this topic
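When only part of a function name is known, apropos() can complement help.search(). The short sketch below assumes a standard R session; the variable name hits is chosen for illustration only.

```r
# apropos() searches the names of the objects that are currently available,
# using a regular expression ("^mean" means: names starting with "mean")
hits <- apropos("^mean")
hits

# In an interactive session, ?mean is a shortcut for help(mean)
# and ??mean is a shortcut for help.search("mean")
```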
demo()
lists available demonstrations. Try
demo(graphics)
to have a demonstration of the graphical capabilities of R
variables, assignment
different modes for a variable, vectors, factors
basic functions and operations for different modes
R can be used for basic operations:
log((1 + sqrt(3)/2)^4)
[1] 2.495243
See operators: +, -, *, /, etc.
See functions: log, exp, cos, sin, tan, etc.
Results can be stored in a variable
a <- cos(pi/4)
a
[1] 0.7071068
b = sin(pi/3+1)
b
[1] 0.888651
The mode of a variable indicates how it is stored in memory:
mode(a)
[1] "numeric"
is.numeric(a)
[1] TRUE
Numeric variables can also be specifically integers:
b <- 1L; mode(b)
[1] "numeric"
is.integer(b); is.integer(a); is.numeric(b)
[1] TRUE
[1] FALSE
[1] TRUE
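Since mode() returns "numeric" for both integers and real numbers, typeof() can be used to see the actual storage type. A small sketch, with variable names x, y and z chosen for illustration:

```r
x <- 1     # a plain number is stored as a double
y <- 1L    # the L suffix requests an integer
typeof(x); typeof(y)    # "double" and "integer"
mode(x) == mode(y)      # TRUE: both modes are "numeric"

z <- y + 0.5            # mixing an integer and a double gives a double
typeof(z)

identical(x, y)         # FALSE: same value but different storage types
x == y                  # TRUE: == compares the values
```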
b <- a; b; mode(b)
[1] 0.7071068
[1] "numeric"
b <- "a"; b; mode(b)
[1] "a"
[1] "character"
b <- as.character(a); b; mode(b)
[1] "0.707106781186548"
[1] "character"
b <- as.numeric(b); b; mode(b)
[1] 0.7071068
[1] "numeric"
b <- as.numeric("a"); b;
Warning: NAs introduced by coercion
[1] NA
In R, missing values are coded with NA:
d <- NA; d; is.na(d)
[1] NA
[1] TRUE
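Missing values propagate through most computations, which is why is.na() is used to detect them rather than ==. The sketch below illustrates this on a small made-up vector (the names x, m and n.missing are illustrative):

```r
x <- c(2, NA, 4)

is.na(x)        # FALSE TRUE FALSE
x == NA         # NA NA NA: a comparison with NA is itself unknown

mean(x)                        # NA, because one value is unknown
m <- mean(x, na.rm = TRUE)     # 3, computed on the non-missing values only
n.missing <- sum(is.na(x))     # 1 missing value
```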
b <- "ABCABD"; nchar(b)
[1] 6
substr(b, 1, 3)
[1] "ABC"
b <- tolower(b); b; toupper(b)
[1] "abcabd"
[1] "ABCABD"
a <- "a"; b <- "abcabd"; paste(a,b,sep="-")
[1] "a-abcabd"
strsplit(b,a)
[[1]]
[1] "" "bc" "bd"
grep(a, b); gsub(a,"",b)
[1] 1
[1] "bcbd"
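The functions strsplit, grep and gsub interpret their pattern as a regular expression by default. The following sketch, on made-up strings, shows two related idioms: splitting a string into single characters, and matching a pattern literally with fixed = TRUE.

```r
s <- "ABCABD"

parts <- strsplit(s, "")[[1]]    # an empty pattern splits into single characters
parts

rev.s <- paste(rev(parts), collapse = "")   # reverse the string
rev.s

n.a <- sum(parts == "A")         # count the occurrences of "A"

# "." matches any character in a regular expression;
# fixed = TRUE treats it as a literal dot instead
dashed <- gsub(".", "-", "a.b.c", fixed = TRUE)
dashed
```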
d <- is.numeric(b); d; mode(d)
[1] FALSE
[1] "logical"
as.numeric(d); as.character(d); as.logical(1)
[1] 0
[1] "FALSE"
[1] TRUE
e <- !d; e
[1] TRUE
d|e; e&d
[1] TRUE
[1] FALSE
e
[1] TRUE
e+e;
[1] 2
as.logical("0")
[1] NA
Vectors can be used to concatenate several variables of the same mode
a.vector <- c(a, b, d, e); a.vector
[1] "a" "abcabd" "FALSE" "TRUE"
mode(a.vector); length(a.vector)
[1] "character"
[1] 4
nchar(a.vector)
[1] 1 6 5 4
a <- c(1:5,NA); b <- seq(1,100,length=5); a; b
[1] 1 2 3 4 5 NA
[1] 1.00 25.75 50.50 75.25 100.00
a*b # !! this is not a dot product
[1] 1.0 51.5 151.5 301.0 500.0 NA
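The product above is computed element by element, and because a has 6 elements while b has 5, R reuses b from its start (recycling) and issues a warning about the lengths. A minimal sketch of the difference between the element-wise product and the dot product, using made-up vectors u and v:

```r
u <- c(1, 2, 3); v <- c(4, 5, 6)

u * v                  # element-wise product: 4 10 18
dot <- sum(u * v)      # dot product: 32
dot.mat <- u %*% v     # same value, returned as a 1 x 1 matrix

# Recycling: the shorter vector is reused from its start
w <- 1:6 * c(1, 10)    # 1 20 3 40 5 60
```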
a; sum(a); sum(a, na.rm=TRUE)
[1] 1 2 3 4 5 NA
[1] NA
[1] 15
sum(a > 2, na.rm=TRUE)
[1] 3
a; b <- rep(1:6, each=2); b
[1] 1 2 3 4 5 NA
[1] 1 1 2 2 3 3 4 4 5 5 6 6
unique(b); intersect(a,b); union(a,b)
[1] 1 2 3 4 5 6
[1] 1 2 3 4 5
[1] 1 2 3 4 5 NA 6
b <- rep(1:3, 2); b;
[1] 1 2 3 1 2 3
b>2; which(b>2); b[c(3,6)]; b[b>2]
[1] FALSE FALSE TRUE FALSE FALSE TRUE
[1] 3 6
[1] 3 3
[1] 3 3
b <- rep(1:3, each=2); b;
[1] 1 1 2 2 3 3
which(b>5); length(which(b>5))
integer(0)
[1] 0
b==2;
[1] FALSE FALSE TRUE TRUE FALSE FALSE
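Logical vectors such as b>2 can be stored, combined and counted. The sketch below gathers these idioms on a made-up vector x (all names are illustrative):

```r
x <- c(12, 7, 25, 3, 18)

big <- x > 10          # logical vector: TRUE FALSE TRUE FALSE TRUE
x[big]                 # values selected by the condition: 12 25 18
which(big)             # their positions: 1 3 5
n.big <- sum(big)      # TRUE counts as 1, so this is the number of matches: 3

x[x > 5 & x < 20]      # conditions combined with &: 12 7 18
any(x > 100)           # FALSE: no value exceeds 100
all(x > 0)             # TRUE: every value is positive
```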
A factor is a vector with a pre-defined list of values
a.factor <- as.factor(a.vector); a.factor
[1] a abcabd FALSE TRUE
Levels: a abcabd FALSE TRUE
class(a.factor); levels(a.factor); nlevels(a.factor)
[1] "factor"
[1] "a" "abcabd" "FALSE" "TRUE"
[1] 4
a.factor[3] <- "2"; a.factor
[1] a abcabd <NA> TRUE
Levels: a abcabd FALSE TRUE
a.factor[3] <- "a"; a.factor
[1] a abcabd a TRUE
Levels: a abcabd FALSE TRUE
levels(a.factor) <- 5:8; a.factor
[1] 5 6 5 8
Levels: 5 6 7 8
as.character(a.factor)
[1] "5" "6" "5" "8"
as.numeric(a.factor)
[1] 1 2 1 4
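As the last two commands show, as.numeric applied to a factor returns the internal level codes, not the values displayed. A short sketch of how to recover the displayed numbers, on a made-up factor f:

```r
f <- factor(c("10", "20", "10", "40"))

as.numeric(f)                          # level codes: 1 2 1 3
vals <- as.numeric(as.character(f))    # displayed values: 10 20 10 40
vals2 <- as.numeric(levels(f))[f]      # same result, converting each level only once
```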
ls()
[1] "a" "a.factor" "a.vector" "b" "d" "e"
[7] "iris"
rm(iris); ls()
[1] "a" "a.factor" "a.vector" "b" "d" "e"
rm(list=ls())
This information is also available (in a more detailed manner) in the panel “Environment” of RStudio.
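Since rm permanently removes objects from the workspace, it can be useful to write them to a file first. The sketch below assumes a temporary file path and an object name (my.vector) chosen purely for illustration; in practice the file would be a named .rda file in the project directory.

```r
my.vector <- c(1, 5, 3, 4)

rda.file <- tempfile(fileext = ".rda")   # temporary path, for illustration only
save(my.vector, file = rda.file)         # write the object to disk

rm(my.vector)
exists("my.vector")                      # FALSE: the object is gone

loaded <- load(rda.file)                 # restore it; load() returns the object names
loaded
my.vector
```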
a <- c(1, 5, 3, 4); b <- letters[5:7]
a; b
[1] 1 5 3 4
[1] "e" "f" "g"
How can the following vector be obtained from a and b?
d
[1] "1" "5" "e" "f" "g" "3" "4"
a <- c(1, 2, NA, 3:7); a; b <- 1:7; all.equal(a,b)
[1] 1 2 NA 3 4 5 6 7
[1] "Numeric: lengths (8, 7) differ"
How can the missing values be removed from a such that
all.equal(a, b)
[1] TRUE
set.seed(1119)
as <- sample(1:5, 100, replace=TRUE)
head(as); tail(as)
[1] 4 3 4 1 4 1
[1] 5 2 4 5 4 5
What do the functions set.seed, sample, head and tail do? Find in one command line the number of “5” in this vector. The result must be equal to:
[1] 17