Data visualization with ggplot2

1 / 90

ggplot

ggplot2 is a tidyverse package for data visualization. It follows the "grammar of graphics."

Even with just a few of its functions, it can make beautiful, publication-ready plots.

Its more advanced features add even more versatility.

Companion packages let you do even more. For example, ggpubr adds statistics to your plot -- you'll use this in gradstats.

5 / 90

Palmer penguins

We're going to use the Palmer Penguins dataset as an example throughout our discussion of ggplot.

This data comes from the palmerpenguins package, which you can install from CRAN using install.packages("palmerpenguins").

And remember, to load the package, use library(palmerpenguins), which will give you access to the built-in dataset, called penguins.
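If you're not sure whether the package is already installed, a guarded install is one option. This is only a sketch: it assumes you have internet access and a CRAN mirror configured, and the requireNamespace() check is an addition that does not appear in the original slides.

```r
# Install palmerpenguins only if it is not already available, then load it.
if (!requireNamespace("palmerpenguins", quietly = TRUE)) {
  install.packages("palmerpenguins")
}
library(palmerpenguins)

# The package provides a data frame named `penguins`.
dim(penguins)
```

Running install.packages() unconditionally also works; the check just avoids re-downloading the package every time the script runs.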

9 / 90

Palmer penguins

library(tidyverse)      # provides glimpse(), %>%, and ggplot2
library(palmerpenguins) # provides the penguins dataset
glimpse(penguins)
## Rows: 344
## Columns: 8
## $ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adeli…
## $ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, …
## $ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, 37.8, 37.8, 41.1, 3…
## $ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, 20.2, 17.1, 17.3, 17.6, 2…
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186, 180, 182, 191, 198, 185…
## $ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, 4250, 3300, 3700, 3200, 3…
## $ sex <fct> male, female, female, NA, female, male, female, male, NA, NA, NA, NA, female,…
## $ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007,…
10 / 90

ggplot2

When creating a plot in ggplot2, the first thing you have to do is call the ggplot() function.

It takes data as its first argument. Let's start there...

ggplot(data = penguins)

13 / 90

ggplot2

All we got was a blank square!

To understand why, let's back up to the philosophy of ggplot...

15 / 90

ggplot2

What is a statistical graphic?

According to Hadley Wickham, we can think of statistical graphics as:

  • Mappings from data to aesthetic attributes (color, shape, size)
  • Consisting of geometric objects (points, lines, bars)
  • On a particular coordinate system (or scale)
  • Sometimes derived from a statistical transformation

In ggplot, that combination of things forms a layer. The ggplot terminology is:

  • aes - aesthetic mappings
  • geom - geometric objects
  • scale - scales
  • stat - statistical transformations
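As a rough illustration of those four terms, the sketch below spells out each one in a single plot. It assumes ggplot2 and palmerpenguins are installed. The choice of geom_point(), the axis title, and the "Dark2" palette are illustrative assumptions, and stat = "identity" is simply geom_point()'s default written out explicitly.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(data = penguins) +
  geom_point(                                   # geom: points
    mapping = aes(x = flipper_length_mm,        # aes: data -> x position
                  y = bill_length_mm,           #      data -> y position
                  color = species),             #      data -> color
    stat = "identity",                          # stat: use the values as they are
    na.rm = TRUE                                # silently drop rows with missing values
  ) +
  scale_x_continuous(name = "Flipper length (mm)") +  # scale for the x aesthetic
  scale_color_brewer(palette = "Dark2")               # scale for the color aesthetic

p
```

You rarely need to write the stat and scales out like this, because every geom comes with sensible defaults.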
20 / 90

ggplot

Putting all that together, the flow of information in ggplot is...

  1. data is attached to the ggplot() call...

  2. mapped by aes()...

  3. transformed by stat before being passed to a geom()...

  4. which is placed, sized, and colored according to its relevant scale...

  5. then ta-da! rendered plot.


We already did step 1 above, by providing our penguins dataset to ggplot().

Now let's move onto aesthetic mapping.

28 / 90

But first... pipes!

%>%

29 / 90

Pipe operator

Pipes let you write code that is easier to read and understand. They allow you to string a series of functions together into a pipeline.


%>%

Strictly speaking, a pipe passes an object on the left-hand side as the first argument of whatever function is on the right-hand side.

In general form, this means that...

x %>% f(y) is the same as f(x, y)


The first argument of many tidyverse functions is data. This is the case for ggplot(), as we just saw.

tidyverse functions are designed this way to make it easier to use pipes!
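To make the equivalence concrete, here is a small sketch using round() and sum(). The vector x is made up for illustration, and magrittr is loaded directly here only to keep the example self-contained (library(tidyverse) also provides %>%).

```r
library(magrittr)  # provides %>%

x <- c(3.14159, 2.71828, 1.41421)  # made-up values for illustration

# x %>% f(y) is the same as f(x, y)
piped  <- x %>% round(2)
nested <- round(x, 2)
identical(piped, nested)

# Pipes read left to right; the equivalent nested call reads inside out.
chained <- x %>% sum() %>% round(1)
identical(chained, round(sum(x), 1))
```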

35 / 90

Pipe operator

Let's take another look at the code we just wrote:

ggplot(data = penguins)

We can rewrite this code by "piping" the first argument, data, into the function ggplot():

penguins %>%
  ggplot()

You can think of the pipe as representing the phrase "and then..."

Verbally, we might describe the code we just wrote as "Start with the penguins data, and then create the foundation of a plot"


⌨️ You can enter a pipe with the shortcut CTRL+Shift+M for PC or CMD+Shift+M for Mac.

40 / 90

Your Turn 1

02:00

The code below calculates the mean of the variable flipper_length_mm in the penguins dataset.

mean(penguins$flipper_length_mm, na.rm = TRUE)
  1. Run the code chunk provided to calculate the mean of flipper_length_mm.

  2. Re-write the code above using the pipe operator, %>%. Check that the answer you get matches the original code you ran above.

41 / 90

Solution

mean(penguins$flipper_length_mm, na.rm = TRUE)
## [1] 200.9152
penguins$flipper_length_mm %>%
  mean(na.rm = TRUE)
## [1] 200.9152
42 / 90

Now back to ggplot...

43 / 90

Aesthetic mapping

The basic idea with aesthetic mapping is that we want to take our data and map it onto physical space using geometric objects, or geoms.

For ggplot to place a geom, it needs to know how to map aspects of the data onto the aesthetics (or visual features) of the graph.

Each geom_*() function in ggplot2 takes a mapping argument.

Let’s start with a basic geom, geom_histogram()...

47 / 90

Mapping with geoms

We're going to call ggplot() again, and this time add the geom_histogram() layer, telling it to map the variable flipper_length_mm to the x-axis aesthetic

penguins %>%
  ggplot() +
  geom_histogram(mapping = aes(x = flipper_length_mm))

49 / 90

Mapping with geoms

You can add additional arguments to the geom(). For example, we could change the number of bins

penguins %>%
  ggplot() +
  geom_histogram(mapping = aes(x = flipper_length_mm),
                 bins = 10)
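A histogram is also a handy place to see the "transformed by stat" step in action: the raw flipper lengths are counted into bins before anything is drawn. The sketch below uses ggplot2's layer_data() to look at that binned data. The object names p and binned are illustrative, and na.rm = TRUE is added here only to silence the missing-value warning.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins) +
  geom_histogram(mapping = aes(x = flipper_length_mm),
                 bins = 10,
                 na.rm = TRUE)

# layer_data() returns the data a layer is drawn from, after its stat has run.
# For a histogram, that is one row per bin rather than one row per penguin.
binned <- layer_data(p, 1)
binned[, c("xmin", "xmax", "count")]
```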

51 / 90

Mapping with geoms

Or make them a different color. We can change the color of 2-d objects with the fill aesthetic

penguins %>%
  ggplot() +
  geom_histogram(mapping = aes(x = flipper_length_mm),
                 fill = "red")

53 / 90

Mapping with geoms

Now we'll use a different geom -- we'll add a layer of points to our plot using geom_point()

penguins %>%
  ggplot() +
  geom_point(mapping = aes(x = flipper_length_mm))
## Error: geom_point requires the following missing aesthetics: y

54 / 90

Mapping with geoms

We get an error, telling us that geom_point() requires the y-aesthetic.

This makes sense -- we need an x and y axis to define where points belong on a scatter plot. Let's add bill_length_mm as the y-axis

penguins %>%
  ggplot() +
  geom_point(mapping = aes(x = flipper_length_mm,
                           y = bill_length_mm))

56 / 90

Mapping with geoms

Let's find out if the relationship between flipper_length_mm and bill_length_mm relates to the species of penguin.

We'll map species to the color aesthetic (similar to fill, but for points and lines).

penguins %>%
  ggplot() +
  geom_point(
    mapping = aes(x = flipper_length_mm,
                  y = bill_length_mm,
                  color = species))

58 / 90

Mapping with geoms

Notice that we included color inside our aesthetic mapping call (mapping = aes()) here, but not when we filled our histogram with the color red earlier

This is the difference between mapping an aesthetic to data and just setting an aesthetic to some value (e.g., "red").

Confusing the two is a fairly common mistake, so let's take a look at an example.

61 / 90

Mapping with geoms

What happens if we tell ggplot to make our points red but accidentally include that inside the aes() call?

penguins %>%
  ggplot() +
  geom_point(mapping = aes(x = flipper_length_mm,
                           y = bill_length_mm,
                           color = "red"))

This is not what we want! ggplot is treating the value "red" as if it were part of our data, which it isn't
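For comparison, here is a sketch of the two versions side by side, with color set outside aes() in one and mapped inside aes() in the other. The object names p_set and p_map are illustrative, and the layer_data() calls are just one way to peek at the colors ggplot2 ends up using.

```r
library(ggplot2)
library(palmerpenguins)

# Setting: color sits outside aes(), so every point is drawn in red.
p_set <- ggplot(penguins) +
  geom_point(mapping = aes(x = flipper_length_mm,
                           y = bill_length_mm),
             color = "red",
             na.rm = TRUE)

# Mapping (the mistake): "red" is treated as a variable with a single value,
# so the color scale picks its own first color and adds a legend.
p_map <- ggplot(penguins) +
  geom_point(mapping = aes(x = flipper_length_mm,
                           y = bill_length_mm,
                           color = "red"),
             na.rm = TRUE)

unique(layer_data(p_set)$colour)  # the color we asked for
unique(layer_data(p_map)$colour)  # a color chosen by the scale instead
```

A quick rule of thumb: if the value is a column in your data, it goes inside aes(); if it is a fixed value you picked yourself, it goes outside.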

63 / 90

Your Turn 2

03:30
  1. Create a scatter plot to visualize the relationship between flipper_length_mm and bill_length_mm.

  2. Build on your plot above by adding an aesthetic to visualize the effect of species. Choose any aesthetic you’d like or play around with a few. What do they do? How might you use more than one aesthetic?

Note: Options for aesthetics include color, shape, size, and alpha (transparency, 0-1 scale).

64 / 90

Solution

Q1

penguins %>%
  ggplot() +
  geom_point(aes(x = flipper_length_mm,
                 y = bill_length_mm))

65 / 90

Solution

Q2 (answers will vary...)

penguins %>%
  ggplot() +
  geom_point(aes(x = flipper_length_mm,
                 y = bill_length_mm,
                 color = species,
                 shape = species),
             alpha = 0.5,
             size = 3)

66 / 90

Mapping with geoms

We could also make separate graphs for each species using facet_wrap. We do this by passing a one-sided formula to facet_wrap()

penguins %>%
  ggplot() +
  geom_point(aes(x = flipper_length_mm,
                 y = bill_length_mm)) +
  facet_wrap(~species)
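facet_wrap() also takes arguments that control the layout of the panels. As a small sketch, the version below stacks the three species panels in a single column with ncol = 1; that layout is just an illustrative choice, and p_facet is a made-up object name.

```r
library(ggplot2)
library(palmerpenguins)

p_facet <- ggplot(penguins) +
  geom_point(aes(x = flipper_length_mm,
                 y = bill_length_mm),
             na.rm = TRUE) +
  facet_wrap(~species, ncol = 1)  # one panel per species, stacked vertically

p_facet
```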

68 / 90

Mapping with geoms

Another thing we often want to do is to add a line over our scatterplot to describe the linear relationship between variables. We can do this by adding a geom_smooth() layer to our plot.

penguins %>%
  ggplot() +
  geom_point(aes(x = flipper_length_mm,
                 y = bill_length_mm,
                 color = species)) +
  geom_smooth(aes(x = flipper_length_mm,
                  y = bill_length_mm),
              color = "black")

Note that for datasets with fewer than 1,000 observations, like this one, "loess" is the default smoothing method for geom_smooth().

Learn more on that here.

You can change that by setting the method argument in geom_smooth(). Let's change it to our old friend linear regression or "lm"

penguins %>%
  ggplot() +
  geom_point(aes(x = flipper_length_mm,
                 y = bill_length_mm,
                 color = species)) +
  geom_smooth(aes(x = flipper_length_mm,
                  y = bill_length_mm),
              color = "black",
              method = "lm")
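If you want the numbers behind that line, you can fit the same regression yourself. This sketch assumes geom_smooth()'s default formula of y ~ x, so the model below should describe the same straight line; the object name fit is illustrative.

```r
library(palmerpenguins)

# lm() drops rows with missing values by default, much as the plot does.
fit <- lm(bill_length_mm ~ flipper_length_mm, data = penguins)

# Intercept and slope of the fitted line.
coef(fit)
```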

70 / 90

Global aesthetic mapping

Our code so far has been getting rather repetitive: we're specifying the x and y aesthetics in every geom_*() call.


Instead, we can use global aesthetic mappings, which are specified in the ggplot() call.

Global mappings are inherited by each layer unless they're overwritten.

73 / 90

Global aesthetic mapping

Let's re-make our previous plot using global aesthetic mapping

penguins %>%
  ggplot(aes(x = flipper_length_mm,
             y = bill_length_mm)) +
  geom_point(aes(color = species)) +
  geom_smooth(color = "black",
              method = "lm")

So...what do we put in global aesthetic mapping and what do we put in the aesthetic mapping of specific geoms?

You want to put anything in the global mapping that you want every layer to inherit (or at least the majority of them).

In the code above, I defined the x and y aesthetics globally, because I want those the same in every geom.

However, I don't define the color aesthetic globally, because color is geom-specific in this case.
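One way to see the inheritance at work is to save the ggplot() call with its global mapping as an object and add layers to it one at a time. This is only a sketch: the object names base_plot, scatter, and with_line are made up, and formula = y ~ x and na.rm = TRUE are written out just to keep the console quiet.

```r
library(ggplot2)
library(palmerpenguins)

# Global mapping: every layer added to base_plot inherits x and y.
base_plot <- ggplot(penguins, aes(x = flipper_length_mm,
                                  y = bill_length_mm))

# Layer-specific mapping: only the points are colored by species.
scatter <- base_plot +
  geom_point(aes(color = species), na.rm = TRUE)

# This layer never mentions x or y; it picks them up from the global mapping.
with_line <- scatter +
  geom_smooth(method = "lm", formula = y ~ x,
              color = "black", na.rm = TRUE)

with_line
```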

75 / 90

Global aesthetic mapping

Let's take a look at the previous example again, but this time with color in the global aesthetic...

penguins %>%
  ggplot(aes(x = flipper_length_mm,
             y = bill_length_mm,
             color = species)) +
  geom_point() +                # inherit global
  geom_smooth(method = "lm")    # inherit global

As you can see, global aesthetic mappings are inherited by every layer. We can override this by mapping or setting that aesthetic in an individual geom_*() call...

penguins %>%
  ggplot(aes(x = flipper_length_mm,
             y = bill_length_mm,
             color = species)) +
  geom_point() +                  # inherit global
  geom_smooth(method = "lm",
              color = "black")    # override global `color`

77 / 90

Your Turn 3

04:00
  1. Use rio::import to read in the dataset pragmatic_scales_data.csv, which is saved inside the data folder. Save it to an object called ps_data.

  2. Take a glimpse at the data using glimpse() or str() to get a sense of the variables. You can also View() the data.

  3. Fill in the blanks in the code to re-create the plot below. Note: This plot uses a new geom called geom_col(), which I've filled in for you.

78 / 90

Solution

ps_data <- rio::import(here::here("data/pragmatic_scales_data.csv"))
glimpse(ps_data)
## Rows: 588
## Columns: 5
## $ subid <chr> "M22", "M22", "M22", "M22", "T22", "T22", "T22", "T22", "T17", "T17", "T17", "T17", "…
## $ item <chr> "faces", "houses", "pasta", "beds", "beds", "faces", "houses", "pasta", "pasta", "fac…
## $ correct <int> 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1…
## $ age <dbl> 2.00, 2.00, 2.00, 2.00, 2.13, 2.13, 2.13, 2.13, 2.32, 2.32, 2.32, 2.32, 2.38, 2.38, 2…
## $ condition <chr> "Label", "Label", "Label", "Label", "Label", "Label", "Label", "Label", "Label", "Lab…
ps_data %>%
  ggplot(aes(x = item, y = correct, fill = item)) +
  geom_col() +
  facet_wrap(~condition)

79 / 90

Labels and themes

You can do a TON more customization of your plots than what we've covered so far. The possibilities with ggplot really are endless!

For example, you can change your axis labels, tweak the theme of the plot, etc...


Good resources:

https://ggplot2.tidyverse.org/reference/theme.html

http://r-statistics.co/Complete-Ggplot2-Tutorial-Part2-Customizing-Theme-With-R-Code.html
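As a small taste, here is a sketch that relabels the axes and swaps in a different theme. The title, label text, theme_minimal(), and legend position are all illustrative choices rather than anything prescribed by these slides.

```r
library(ggplot2)
library(palmerpenguins)

p_labeled <- ggplot(penguins, aes(x = flipper_length_mm,
                                  y = bill_length_mm,
                                  color = species)) +
  geom_point(na.rm = TRUE) +
  labs(title = "Flipper length and bill length",  # plot title
       x = "Flipper length (mm)",                 # x-axis label
       y = "Bill length (mm)",                    # y-axis label
       color = "Species") +                       # legend title
  theme_minimal() +                               # a built-in complete theme
  theme(legend.position = "bottom")               # tweak one theme element

p_labeled
```

labs() handles the text, a complete theme like theme_minimal() changes the overall look, and theme() fine-tunes individual elements.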

82 / 90

Example 🤯

Image from Eric Ekholm

83 / 90

Example 🤯

Image from Georgios Karamanis

85 / 90

Let's recap with a flipbook!

87 / 90

Flipbook

Click on the "full screen" icon in the bottom right and use the navigation arrows to flip through the creation of a scatterplot in real time!

88 / 90

Q & A

05:00
89 / 90

Next up...

Data wrangling with dplyr

90 / 90
