
Introduction to the Tidyverse


Tidyverse

The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.


These packages are designed to support the natural workflow of a data analysis project, as depicted below.

Image from R for Data Science


Core tidyverse packages

Image from Sylvia Canelón

Image from Mine Çetinkaya-Rundel


Slide adapted from Sylvia Canelón

readr

Importing data is the very first step!
You can use readr to import rectangular data.

Functions for different file types

  • comma-separated (CSV) files with read_csv()
  • tab-separated files with read_tsv()
  • general delimited files with read_delim()
  • fixed-width files with read_fwf()
  • tabular files where columns are separated by whitespace with read_table()
  • web log files with read_log()
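
As a rough sketch of two of the functions above: the file contents, column names, and temporary paths here are made up for illustration, and the files are written first so the example runs on its own.

```r
library(readr)

# Write a small hypothetical CSV to a temporary file (illustrative data only).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Ada,91.5", "Grace,88"), csv_path)

# read_csv() parses comma-separated files; col_types states the expected types.
scores <- read_csv(
  csv_path,
  col_types = cols(name = col_character(), score = col_double())
)

# read_delim() handles other delimiters, here a semicolon.
delim_path <- tempfile(fileext = ".txt")
writeLines(c("name;score", "Ada;91.5", "Grace;88"), delim_path)
scores_delim <- read_delim(
  delim_path,
  delim = ";",
  col_types = cols(name = col_character(), score = col_double())
)
```

Both calls return a tibble with a character column and a numeric column. Leaving out col_types lets readr guess the types instead.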

Download PDF

7 / 17

Slide adapted from Sylvia Canelón

tibble

A tibble is much like the data frame in base R, but optimized for use in the tidyverse.

Among other features, it has nicer printing methods.
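
A minimal sketch of creating and printing a tibble. The names and scores are invented for illustration; mtcars is a dataset that ships with base R.

```r
library(tibble)

# Build a tibble column by column; later columns can refer to earlier ones.
scores <- tibble(
  name = c("Ada", "Grace"),
  score = c(91.5, 88),
  passed = score >= 90
)

# Convert an existing data frame; rownames = "model" keeps the row names as a column.
cars_tbl <- as_tibble(mtcars, rownames = "model")

# Printing shows only the first rows, plus each column's type.
print(cars_tbl)
```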


Slide adapted from Sylvia Canelón

ggplot2

ggplot2 uses the "Grammar of Graphics" to create a plot. Check out some examples of what's possible from the ggplot2 extensions gallery.
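
One way to read the "grammar" is as layers added together: data, aesthetic mappings, then geoms and labels. The sketch below uses the mpg dataset that ships with ggplot2; the choice of variables and labels is only for illustration.

```r
library(ggplot2)

# Data and aesthetic mappings first, then a geom layer, then labels.
p <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point() +
  labs(
    x = "Engine displacement (litres)",
    y = "Highway miles per gallon",
    colour = "Vehicle class"
  )

# Call print(p), or type p at the console, to draw the plot.
```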


Slide adapted from Sylvia Canelón

dplyr

dplyr includes a host of functions that each perform one specific, incremental transformation step, helping you wrangle your data into exactly the form you need.

  • Pick observations by their values with filter().

  • Reorder the rows with arrange().

  • Pick variables by their names with select().

  • Create new variables from functions of existing variables with mutate().

  • Collapse many values down to a single summary with summarize().
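
The five verbs above can be chained with the pipe. This is a small sketch on a made-up table (the animals data and its column names are hypothetical), not a prescribed workflow.

```r
library(dplyr)

# Hypothetical measurements, for illustration only.
animals <- tibble(
  species = c("A", "A", "B", "B", "B"),
  mass_g = c(3700, 3800, 5000, 5200, 4800),
  tag = c("t1", "t2", "t3", "t4", "t5")
)

heavy <- animals %>%
  filter(mass_g > 3700) %>%          # keep rows by value
  arrange(desc(mass_g)) %>%          # reorder rows, heaviest first
  select(species, mass_g) %>%        # keep columns by name
  mutate(mass_kg = mass_g / 1000)    # add a column computed from another

# Collapse the remaining rows into a single summary value.
mass_summary <- heavy %>%
  summarize(mean_kg = mean(mass_kg))
```

Each step takes a data frame and returns a new one, so steps can be added or removed one at a time.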


Slide adapted from Sylvia Canelón

forcats

forcats is great for working with categorical variables or factors.

A few key functions:

  • fct_reorder(): Reordering a factor by another variable

  • fct_infreq(): Reordering a factor by the frequency of values

  • fct_relevel(): Changing the order of a factor by hand

  • fct_lump(): Collapsing the least/most frequent values of a factor into “other”
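
A quick sketch of the four functions on a toy factor. The size and weight vectors are invented for illustration.

```r
library(forcats)

# Toy data: a factor and a numeric variable of the same length.
size <- factor(c("small", "large", "medium", "small", "small", "large"))
weight <- c(1, 30, 20, 2, 3, 40)

# By default, factor levels are ordered alphabetically.
levels(size)

# Reorder levels by the median of another variable.
by_weight <- fct_reorder(size, weight, .fun = median)

# Reorder levels by how often each value occurs, most frequent first.
by_freq <- fct_infreq(size)

# Set the level order by hand.
by_hand <- fct_relevel(size, "small", "medium", "large")

# Keep the single most frequent level and lump the rest together.
lumped <- fct_lump(size, n = 1)
```

Calling levels() on each result shows the new ordering; only the level order or labels change, never the underlying values.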


Slide adapted from Sylvia Canelón

stringr

stringr helps us manipulate strings! The package includes many functions to help us with regular expressions, which are a concise language for describing patterns in strings.

stringr allows you to

  • detect matches
  • subset strings
  • manage string lengths
  • mutate strings
  • join and split strings
  • order strings
  • ...and more!
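
A short sketch with one function for each task in the list above. The snacks vector is made up for illustration.

```r
library(stringr)

snacks <- c("apple", "banana", "pear")

# Detect matches: which strings contain "an"?
has_an <- str_detect(snacks, "an")

# Subset strings: keep those starting with "p" (a regular expression).
p_snacks <- str_subset(snacks, "^p")

# Manage string lengths: count characters, or pad to a fixed width.
n_chars <- str_length(snacks)
padded <- str_pad("7", width = 3, pad = "0")

# Mutate strings: replace the first "a" in each string.
replaced <- str_replace(snacks, "a", "A")

# Join and split strings.
joined <- str_c(snacks, collapse = ", ")
pieces <- str_split("a,b,c", ",")[[1]]

# Order strings.
sorted <- str_sort(snacks, decreasing = TRUE)
```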


Slide adapted from Sylvia Canelón

tidyr

tidyr allows you to reshape raw "untidy" data into a "tidy" format.

Here is an example of what a tidy dataframe looks like:

There are three interrelated rules which make a dataset tidy:

  • Each variable must have its own column.
  • Each observation must have its own row.
  • Each value must have its own cell.
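
To make the rules concrete, here is a sketch that reshapes a small made-up table (hypothetical countries and counts). In the untidy version, the year variable is spread across column names; pivot_longer() moves it into its own column.

```r
library(tidyr)

# Untidy: the years 1999 and 2000 are column names rather than values.
untidy <- tibble::tibble(
  country = c("A", "B"),
  `1999` = c(10, 20),
  `2000` = c(15, 25)
)

# Tidy: one row per country-year observation, one column per variable.
tidy <- pivot_longer(
  untidy,
  cols = c(`1999`, `2000`),
  names_to = "year",
  values_to = "cases"
)
```

The result has three columns (country, year, cases) and one row per observation. pivot_wider() does the reverse.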


Slide adapted from Sylvia Canelón

purrr

purrr provides tools for working with functions and vectors, in particular for repeating the same operation many times concisely and efficiently.

The map() family of functions helps us replace for loops,
making our code easier to read and more succinct.
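
As a small illustration, here is the same computation written as a for loop and with map_dbl(). The values list is made up for the example.

```r
library(purrr)

values <- list(a = 1:3, b = 4:6, c = 7:9)

# With a for loop: pre-allocate the output, then fill it element by element.
means_loop <- numeric(length(values))
for (i in seq_along(values)) {
  means_loop[i] <- mean(values[[i]])
}

# With purrr: apply mean() to each element and return a numeric vector.
means_map <- map_dbl(values, mean)
```

map() always returns a list, while variants such as map_dbl() and map_chr() return a vector of the named type and raise an error if a result does not fit.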


Q & A


Next up...

Importing data & Project-oriented workflows


Break!
