The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
These packages are designed to support the natural workflow of any data analysis project, as depicted below.
Image from R for Data Science
Slide adapted from Sylvia Canelón
readr
Importing data is the very first step!
You can use readr to import rectangular data.
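Functions for different file types:
read_csv(): comma-separated values
read_tsv(): tab-separated values
read_delim(): values separated by any delimiter you specify
read_fwf(): fixed-width files
read_table(): columns separated by whitespace
read_log(): web server log files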
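A minimal sketch of reading delimited data with readr, assuming readr 2.0 or later (needed for the `show_col_types` argument). The small fruit/count files are hypothetical and are written to temporary files so the example runs on its own.

```r
library(readr)

# Write a tiny comma-separated file to a temporary location (hypothetical data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("fruit,count", "apple,3", "banana,5"), csv_path)

# read_csv() guesses each column's type; show_col_types = FALSE hides the report
fruit_csv <- read_csv(csv_path, show_col_types = FALSE)
fruit_csv

# read_delim() handles any single-character delimiter, here a semicolon
txt_path <- tempfile(fileext = ".txt")
writeLines(c("fruit;count", "apple;3", "banana;5"), txt_path)
fruit_semi <- read_delim(txt_path, delim = ";", show_col_types = FALSE)
fruit_semi
```

Both calls return a tibble with the same contents; only the delimiter differs.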
tibble
A tibble is much like the data frame in base R, but optimized for use in the tidyverse.
Among other features, it has nicer printing methods.
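A short sketch of creating tibbles, using the built-in `iris` data frame. It shows converting an existing data frame with `as_tibble()` and building one directly with `tibble()`, where a later column can refer to an earlier one.

```r
library(tibble)

# Convert an existing data frame; printing shows the dimensions and column types
tb <- as_tibble(head(iris))
tb

# Build a tibble directly; column y is computed from column x
squares <- tibble(x = 1:5, y = x^2)
squares
```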
ggplot2
ggplot2 uses the "Grammar of Graphics" to create a plot. Check out some examples of what's possible from the ggplot2 extensions gallery.
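A minimal sketch of the Grammar of Graphics in ggplot2, using the `mpg` dataset that ships with the package: a plot is built from data, aesthetic mappings (`aes()`), and one or more geometric layers added with `+`.

```r
library(ggplot2)

# Data (mpg) + aesthetic mappings (x, y) + a geometric layer (points)
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +                      # one point per car
  labs(
    x = "Engine displacement (litres)",
    y = "Highway miles per gallon"
  )
p
```

Swapping `geom_point()` for another geom, or adding more layers, changes the plot without touching the data or mappings.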
dplyr
dplyr includes a host of functions that each perform a specific, incremental data transformation step, helping you wrangle your data into exactly the form you need.
Pick observations by their values with filter().
Reorder the rows with arrange().
Pick variables by their names with select().
Create new variables from functions of existing variables with mutate().
Collapse many values down to a single summary with summarize().
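The five verbs above can be chained on the built-in `mtcars` data. This is a minimal sketch; the derived column `mpg_per_wt` is a hypothetical example, and `group_by()` is added so that `summarize()` produces one row per cylinder count.

```r
library(dplyr)

# filter(): keep rows whose values meet a condition (4- and 6-cylinder cars)
small <- filter(mtcars, cyl %in% c(4, 6))

# arrange(): reorder rows, here by descending fuel economy
small <- arrange(small, desc(mpg))

# select(): keep columns by name
small <- select(small, cyl, mpg, wt)

# mutate(): add a column computed from existing columns (hypothetical ratio)
small <- mutate(small, mpg_per_wt = mpg / wt)

# summarize(): collapse each cylinder group down to a single row
by_cyl <- small %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg), n = n())
by_cyl
```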
forcats
forcats is great for working with categorical variables or factors.
A few key functions:
fct_reorder(): Reordering a factor by another variable
fct_infreq(): Reordering a factor by the frequency of values
fct_relevel(): Changing the order of a factor by hand
fct_lump(): Collapsing the least/most frequent values of a factor into “other”
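A small sketch of the four functions on a hypothetical fruit factor and a hypothetical price per item. `fct_lump()` still works, though recent forcats versions also offer the more specific `fct_lump_n()`; its default lumped level is named "Other".

```r
library(forcats)

fruit <- factor(c("cherry", "banana", "cherry", "apple", "cherry", "banana"))
price <- c(4, 0.5, 4, 1, 4, 0.5)  # hypothetical price per item

levels(fruit)                         # default: alphabetical

# fct_reorder(): order levels by the median price of each fruit
levels(fct_reorder(fruit, price))     # banana, apple, cherry

# fct_infreq(): order levels from most to least frequent
levels(fct_infreq(fruit))             # cherry, banana, apple

# fct_relevel(): move "cherry" to the front by hand
levels(fct_relevel(fruit, "cherry"))  # cherry, apple, banana

# fct_lump(): keep the single most common level, lump the rest together
lumped <- fct_lump(fruit, n = 1)
table(lumped)
```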
stringr
stringr helps us manipulate strings! The package includes many functions for working with regular expressions, which are a concise language for describing patterns in strings.
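A few common stringr calls on a hypothetical character vector. Several of them take a regular expression as the `pattern` argument, such as `"^[a-z]+"` (one or more lowercase letters at the start of the string).

```r
library(stringr)

x <- c("apple pie", "banana split", "cherry tart")

str_length(x)                # number of characters: 9, 12, 11
str_detect(x, "an")          # does the pattern occur? FALSE, TRUE, FALSE
str_replace(x, " ", "_")     # replace the first space with an underscore
str_extract(x, "^[a-z]+")    # leading word: "apple", "banana", "cherry"
str_to_upper(x)              # convert to upper case
```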
stringr allows you to:
tidyr
tidyr allows you to reshape raw "untidy" data into a "tidy" format.
Here is an example of what a tidy data frame looks like:
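There are three interrelated rules which make a dataset tidy:
Each variable must have its own column.
Each observation must have its own row.
Each value must have its own cell.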
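A minimal sketch of tidying with `pivot_longer()`, assuming tidyr 1.0 or later. It uses `table4a`, a small example dataset bundled with tidyr, in which the years 1999 and 2000 appear as column names rather than as values of a year variable.

```r
library(tidyr)

# Untidy: "year" is spread across the column names `1999` and `2000`
table4a

# pivot_longer() moves those column names into a year column and their
# values into a cases column, so each row is one country-year observation
tidy_cases <- pivot_longer(
  table4a,
  cols = c(`1999`, `2000`),
  names_to = "year",
  values_to = "cases"
)
tidy_cases
```

After pivoting, each variable (country, year, cases) has its own column and each observation has its own row.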
purrr
purrr provides tools for working with functions and vectors, in particular for repeating the same operation many times concisely and efficiently.
The map() family of functions helps us replace for loops, making our code easier to read and more succinct.
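A sketch comparing a for loop with `map_dbl()` for the same task, the mean of every column of the built-in `mtcars` data. `map()` itself always returns a list, while typed variants such as `map_dbl()` return an atomic vector.

```r
library(purrr)

# For loop: preallocate, iterate over columns, then add names by hand
means_loop <- numeric(ncol(mtcars))
for (i in seq_along(mtcars)) {
  means_loop[i] <- mean(mtcars[[i]])
}
names(means_loop) <- names(mtcars)

# map_dbl(): applies mean() to each column and returns a named double vector
means_map <- map_dbl(mtcars, mean)

# map(): same idea, but the result is a list
squares_list <- map(1:3, function(x) x^2)
```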