Nathalie Vialaneix and Sébastien Déjean
12 October 2018
Edward Tufte on the 1986 Challenger space shuttle disaster
Ref: Visual Explanations: Images and Quantities, Evidence and Narrative. Chap. 5 deals with the Challenger disaster.
plot (works for many different objects)
ggplot2 is part of the tidyverse (www.tidyverse.org), a collection of R packages designed for data science
ggplot2 works on a data.frame
ggplot2 learning curve is steep but worth the effort
people <- data.frame(weight = c(80, 49, 62, 57),
height = c(1.82, 1.58, 1.71, 1.63),
gender = c("M", "F", "F", "F"))
people
weight height gender
1 80 1.82 M
2 49 1.58 F
3 62 1.71 F
4 57 1.63 F
We want to make a scatterplot of weight vs. height:
\( \Rightarrow \) a plot can be thought of as a mapping of data to geometric objects (point, line, bar…) and their aesthetic attributes (shape, color, size…) \[ \mbox{height} \rightarrow x \qquad \mbox{weight} \rightarrow y \qquad \mbox{gender} \rightarrow \mbox{color} \]
Scales and a coordinate system are also needed to convert data units to physical drawing units
geometric objects + scales and coordinate system \( \rightarrow \) plot
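As a minimal sketch, the mapping above can be written directly with ggplot2. The people data frame is redefined here so that the block is self-contained, and the object name p_people is only an illustrative choice.

```r
library(ggplot2)

people <- data.frame(weight = c(80, 49, 62, 57),
                     height = c(1.82, 1.58, 1.71, 1.63),
                     gender = c("M", "F", "F", "F"))

# Map height to x, weight to y and gender to colour, then draw one point per row
p_people <- ggplot(people, aes(x = height, y = weight, color = gender)) +
  geom_point()
p_people
```

Scales and the coordinate system are not specified here, so ggplot2 falls back on its defaults.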
Data: a data.frame and nothing else!
Aesthetic mapping: describes how data are mapped to things we can see on the plot, through the function aes().
Geometric object: performs the actual rendering of the plot and controls the type of plot to create (points, line, histogram, boxplot…).
Statistical transformations: transform the data, for instance by summarising it in some manner (sum, density, smooth…).
Position adjustments: apply minor changes to the position of elements (jitter, fill, stack, dodge, identity).
Basic idea: specify different parts of the plot, and add them together using the + operator.
library(ggplot2)
ggplot(data = <DATA>,
aes(x = <X AXIS VARIABLE>,
y = <Y AXIS VARIABLE>, ... ), ...) +
geom_<TYPE>(aes(size = <SIZE VARIABLE>, ...),
data = <DATA>,
stat = <FUNCTION>,
position = <POSITION>,
color = <"COLOR">, ...) +
scale_<AESTHETIC>_<TYPE>(name = <NAME>,
breaks = <WHERE>,
labels = <LABELS>, ... ) +
theme(...) +
facet_<TYPE>(<FORMULA>)
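One possible way to fill in the template above, as a sketch only: the variables, colour, breaks and labels below are arbitrary choices made for illustration, and p_template is a hypothetical object name.

```r
library(ggplot2)

p_template <- ggplot(data = diamonds,
                     aes(x = carat, y = price)) +            # data and main mapping
  geom_point(aes(size = depth),                              # layer-specific mapping
             stat = "identity",                              # no statistical transformation
             position = "identity",                          # no position adjustment
             color = "steelblue", alpha = 0.2) +             # fixed (non-mapped) settings
  scale_x_continuous(name = "Carat",
                     breaks = 0:5,
                     labels = paste(0:5, "ct")) +            # scale for the x aesthetic
  theme(legend.position = "bottom") +                        # non-data elements
  facet_wrap(~ cut)                                          # one panel per cut
p_template
```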
~54,000 round diamonds from http://www.diamondse.info with carat, colour, clarity, cut, total depth, table, depth, width, height, price
data(diamonds)
dim(diamonds)
[1] 53940 10
head(diamonds)
# A tibble: 6 x 10
carat cut color clarity depth table price x y z
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.230 Ideal E SI2 61.5 55. 326 3.95 3.98 2.43
2 0.210 Premium E SI1 59.8 61. 326 3.89 3.84 2.31
3 0.230 Good E VS1 56.9 65. 327 4.05 4.07 2.31
4 0.290 Premium I VS2 62.4 58. 334 4.20 4.23 2.63
5 0.310 Good J SI2 63.3 58. 335 4.34 4.35 2.75
6 0.240 Very Good J VVS2 62.8 57. 336 3.94 3.96 2.48
Use a subsample of 1,000 diamonds in the following graphics
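A subsample of this kind could be drawn as follows. This is a sketch: the seed value and the name diamonds_sub are arbitrary assumptions, not taken from the slides.

```r
library(ggplot2)

set.seed(2018)                          # arbitrary seed, only for reproducibility
idx <- sample(nrow(diamonds), 1000)     # 1,000 row indices, drawn without replacement
diamonds_sub <- diamonds[idx, ]
dim(diamonds_sub)                       # 1000 rows, 10 columns
```

Any of the plots below can then be drawn on diamonds_sub instead of diamonds to speed up rendering.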
Base graphics histogram example:
hist(diamonds$price, main = "", xlab = "Price", breaks = 50)
ggplot2 graphics histogram example:
ggplot(diamonds, aes(x = price)) + geom_histogram(bins = 50)
ggplot(diamonds, aes(x = price, fill = color)) +
geom_histogram(binwidth = 1000) + facet_wrap( ~ cut) + theme_bw()
Bars are filled by color and faceted by cut (one panel for each type of cut)
## Create ggplot object, populate it with data
ggplot(diamonds, aes(x = carat, y = price, colour = cut)) +
## Add layer(s)
geom_point(alpha = 0.3) +
geom_smooth() +
## Scales for dimensions, color palettes
scale_y_log10() +
## Condition on variables
facet_grid(~ cut) +
## More options
ggtitle("First example") + theme_bw()
## Create ggplot object
MyPlot <- ggplot(diamonds)
class(MyPlot)
summary(MyPlot); MyPlot
## Add aesthetics
MyPlot <- MyPlot + aes(x = carat, y = price, colour = cut)
summary(MyPlot)
MyPlot
## Add layer(s)
MyPlot <- MyPlot + geom_point(alpha=0.3)
summary(MyPlot) ; MyPlot
MyPlot <- MyPlot + geom_smooth()
summary(MyPlot) ; MyPlot
## Scales for dimensions
MyPlot + scale_y_log10()
## Condition on variables
MyPlot + facet_grid(~ cut)
## More options
MyPlot + ggtitle("First example") + theme_bw()
Create a plot object (classes gg and ggplot)
Plot objects can be stored as variables, which makes it easy to share a plot.
Without at least one layer, the plot object displays only an empty panel (older versions of ggplot2 raised an error instead)
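The following sketch shows that a plot object is an ordinary R value to which layers are added one at a time. The names p_empty and p_points are illustrative, and inspecting the layers element is just one convenient way to count layers.

```r
library(ggplot2)

p_empty <- ggplot(diamonds, aes(x = carat, y = price))   # data and mapping, no layer yet
inherits(p_empty, "ggplot")                              # TRUE
length(p_empty$layers)                                   # 0

p_points <- p_empty + geom_point(alpha = 0.3)            # add a first layer
length(p_points$layers)                                  # 1
```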
ggplot2 aesthetic = “something you can see”, set with the aes() function
Examples:
All geom_XXX require some aesthetics (at least one)
?geom_point
...
Aesthetics
The following aesthetics can be used with geom_point. Aesthetics are mapped
to variables in the data with the aes function: geom_point(aes(x = var))
x: x position (required)
y: y position (required)
shape: shape of point
colour: border colour
size: size
fill: internal colour
alpha: transparency
All of the following are correct (and equivalent):
ggplot(diamonds, aes(x = carat, y = price, color = cut)) + geom_point()
ggplot(diamonds) + geom_point(aes(x = carat, y = price, color = cut))
ggplot(diamonds, aes(x = carat, y = price)) + geom_point(aes(color = cut))
Aesthetic mapping: aes()
ggplot(diamonds, aes(x = carat, y = price, color = clarity)) + geom_point()
Setting: geom_<TYPE>
ggplot(diamonds, aes(x = carat, y = price)) + geom_point(color = "red")
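To make the difference between mapping and setting concrete, the sketch below stores both plots and looks at where the colour information ends up. The names p_map and p_set are illustrative, and reading the aes_params field of a layer is an inspection trick, not something needed in everyday use.

```r
library(ggplot2)

# Mapping: colour depends on a variable of the data, inside aes()
p_map <- ggplot(diamonds, aes(x = carat, y = price, color = clarity)) + geom_point()
# Setting: colour is a constant, given outside aes()
p_set <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point(color = "red")

names(p_map$mapping)                  # the mapping contains a colour entry
names(p_set$mapping)                  # only x and y are mapped
p_set$layers[[1]]$aes_params$colour   # the constant "red" is stored in the layer
```

A mapped aesthetic gets a scale and a legend; a constant setting does not.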
With just geom_histogram and geom_point, improved by adding transparency to the points
Tip: the bin width is 0.2
ggplot2 geometric object = the actual marks put on the plot; a plot must have at least one geom
Examples:
geom_point() for scatterplots, dot plots, etc.
geom_line() for time series, geom_smooth() for trend lines (by default, loess for fewer than 1,000 observations and a GAM spline otherwise), etc.
geom_boxplot() and geom_histogram()
geom_bar()
help.search("geom_", package = "ggplot2")
geom_abline geom_jitter
geom_area geom_line
geom_bar geom_linerange
geom_bin2d geom_path
geom_blank geom_point
geom_boxplot geom_pointrange
geom_contour geom_polygon
geom_crossbar geom_quantile
geom_density geom_rect
geom_density2d geom_ribbon
geom_errorbar geom_rug
geom_errorbarh geom_segment
geom_freqpoly geom_smooth
geom_hex geom_step
geom_histogram geom_text
geom_hline geom_tile
geom_vline ...
p <- ggplot(diamonds)
## Overall histogram
p + geom_histogram(aes(x = price))
## Composition of each bin
p + geom_histogram(aes(x = price, fill = cut))
## Relative proportions
p + geom_histogram(aes(x = price, fill = cut), position = "fill")
p + geom_density(aes(x = price, fill = cut), alpha = 0.5)
p + geom_boxplot(aes(x = cut, y = price), notch = TRUE)
About notches
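According to the geom_boxplot documentation, notches extend 1.58 * IQR / sqrt(n) on each side of the median, which gives a rough 95% interval for comparing medians. The sketch below recomputes these bounds by hand for each cut; notch_bounds is a hypothetical helper written for this illustration.

```r
library(ggplot2)

# Notch bounds for one numeric vector: median -/+ 1.58 * IQR / sqrt(n)
notch_bounds <- function(v) {
  half_width <- 1.58 * IQR(v) / sqrt(length(v))
  c(lower = median(v) - half_width,
    median = median(v),
    upper = median(v) + half_width)
}

# One column per cut, rows = lower / median / upper
notches <- sapply(split(diamonds$price, diamonds$cut), notch_bounds)
round(notches)
```

If the notches of two boxes do not overlap, this suggests that the two medians differ.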
p + geom_boxplot(aes(x = cut, y = price, fill = color))
geom_rect requires xmin, xmax, ymin, ymax
geom_tile requires x, y and optionally handles width and height
p + geom_tile(aes(x = as.numeric(cut), y = as.numeric(color), fill = depth))
/!\ Only the last value of depth in each (cut, color) cell is visible, because the tiles are drawn on top of each other…
geom_raster is a special case of geom_tile with all tiles having the same size
p + geom_raster(aes(x = as.numeric(cut), y = as.numeric(color), fill = depth))
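Because many diamonds share each (cut, color) combination, one way to get a single, meaningful value per tile is to aggregate the data first. The sketch below uses the mean depth; the choice of the mean and the names mean_depth and p_tile are assumptions made for illustration.

```r
library(ggplot2)

# One row per (cut, color) combination, holding the mean depth
mean_depth <- aggregate(depth ~ cut + color, data = diamonds, FUN = mean)

# Each tile now represents exactly one value
p_tile <- ggplot(mean_depth, aes(x = cut, y = color, fill = depth)) +
  geom_tile()
p_tile
```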
ggplot(diamonds, aes(x = color, y = price, fill = color)) +
geom_boxplot(outlier.size = 0) +
geom_point(aes(fill = color), alpha = 0.1, shape = 21)
ggplot(diamonds, aes(x = color, y = price, fill = color)) +
geom_boxplot(outlier.size = 0) +
geom_point(aes(fill = color), alpha = 0.1, shape = 21,
position = position_jitter(width = .3))
ggplot(diamonds, aes(x = reorder(color, price), y = price, fill = color)) +
geom_boxplot(outlier.size = 0) +
geom_point(aes(fill = color), alpha = 0.1, shape = 21,
position = position_jitter(width = .4))
reorder is not a ggplot2 function. It treats its first argument as a categorical variable (color) and reorders its levels based on the values of a second variable (price). The third argument (FUN, default is mean) is the function applied to price for each level of color.
ggplot(diamonds, aes(x = reorder(color, price), y = price, fill = color)) +
geom_violin() +
geom_point(aes(fill = color), alpha = 0.5, shape = 21,
position = position_jitter(width = .4))
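The behaviour of reorder is easier to see on a toy example. In the sketch below, grp and val are made-up vectors, not part of the diamonds data.

```r
grp <- factor(c("a", "b", "c", "a", "b", "c"))
val <- c(1, 2, 3, 9, 2, 3)

levels(grp)                                  # "a" "b" "c" (alphabetical)

# Default FUN = mean: group means are a = 5, b = 2, c = 3
by_mean <- levels(reorder(grp, val))         # "b" "c" "a"

# FUN = min: group minima are a = 1, b = 2, c = 3
by_min <- levels(reorder(grp, val, FUN = min))   # "a" "b" "c"
```

Only the order of the levels changes; the values of the factor stay the same.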
ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
geom_point(shape = 21) + geom_smooth() + geom_rug()
Base graphics
Tip: check the position option of geom_bar (see ?geom_bar)
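As an illustration of what the position option of geom_bar changes (a sketch with arbitrarily chosen variables, not the figure from the original slide; p_bar is an illustrative name):

```r
library(ggplot2)

p_bar <- ggplot(diamonds, aes(x = cut, fill = clarity))

p_stack <- p_bar + geom_bar()                     # default: bars stacked on top of each other
p_dodge <- p_bar + geom_bar(position = "dodge")   # bars placed side by side
p_fill  <- p_bar + geom_bar(position = "fill")    # stacked and rescaled to proportions

p_dodge
```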
Themes handle non-data plot elements like axis labels, plot background, legend appearance, …: theme_gray() (default), theme_bw(), theme_classic(), theme_linedraw(), …
ggplot(diamonds, aes(x = carat, y = price)) + geom_point() +
geom_smooth(aes(colour = cut)) + theme_bw()
ggplot(diamonds, aes(x = carat, y = price)) + geom_point() +
geom_smooth(aes(colour = cut)) + theme_bw() + ylab("price (in USD)") +
ggtitle("My beautiful plot")
ggplot(diamonds, aes(x = cut, y = price, fill = clarity)) +
geom_boxplot() + scale_fill_discrete(name = "Clarity of diamond",
labels = paste("C", 1:8))
library(RColorBrewer)
ggplot(diamonds, aes(x = cut, y = price, fill = clarity)) +
geom_boxplot() + scale_fill_manual(name = "Clarity of\n diamond",
labels = paste("C", 1:8),
values = brewer.pal(8, "Set2"))
ggplot(diamonds, aes(x = z, y = carat, colour = price)) +
geom_point() + scale_y_log10() +
scale_colour_gradient(low = "grey", high = "pink")
ggplot(diamonds, aes(x = z, y = carat, colour = price)) +
geom_point() + scale_y_log10() + xlim(0, 10) +
scale_colour_gradient(low = "grey", high = "pink")
ggplot(diamonds, aes(x = price)) + geom_histogram() + facet_wrap(~ cut)
ggplot(diamonds, aes(x = price)) + geom_histogram(fill = "red") +
facet_grid(clarity ~ cut) + theme_dark()
ggplot(diamonds, aes(x = cut, y = price, fill = clarity)) +
geom_boxplot() +
theme(legend.text = element_text(size = 5, colour = "red"),
legend.position = "top",
axis.ticks = element_blank(),
axis.text.x = element_text(size = 10, angle = 45, face = "bold"))
To plot several ggplot graphics together, use gridExtra:
library(gridExtra)
p1 <- ggplot(diamonds) + geom_point(aes(x = carat, y = price, color = cut))
p2 <- ggplot(diamonds) + geom_density(aes(x = price, fill = cut), alpha = 0.5)
grid.arrange(p1, p2, ncol = 2)
Use ggsave to save a ggplot (uses file name extension to determine file type: .ps, .eps, .tex, .pdf, .jpg, .tiff, .png, .bmp, .svg, .wmf)
p <- ggplot(...) + ...
ggsave("...", plot = p, width = 4, height = 4)
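A concrete version of the call above might look like the following sketch. The plot, the object names and the file name diamonds_scatter.pdf are hypothetical choices, and the file is written to a temporary directory so that nothing is left behind.

```r
library(ggplot2)

p_save <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3)

# Hypothetical output path; the .pdf extension selects the PDF device
out_file <- file.path(tempdir(), "diamonds_scatter.pdf")
ggsave(out_file, plot = p_save, width = 4, height = 4)   # width and height in inches by default

file.exists(out_file)
```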
library(ggnetwork); library(network)
data(emon)
ggplot(ggnetwork(emon[[1]], layout = "kamadakawai", arrow.gap = 0.025),
aes(x, y, xend = xend, yend = yend)) +
geom_edges(aes(color = Frequency), curvature = 0.1,
arrow = arrow(length = unit(10, "pt"), type = "open")) +
geom_nodes(aes(size = Formalization)) +
scale_color_gradient(low = "grey50", high = "tomato") +
scale_size_area(breaks = 1:3) + theme_blank()
ggbio = Bioconductor package for ggplots of genomics data http://bioconductor.org/packages/release/bioc/html/ggbio.html