class: title-slide, center, middle

# Importing Data & Project-oriented Workflows

---

# Importing data

### Importing data in R usually involves two challenges...

1. You need to call a function that works with a particular data format (`.csv`, `.txt`, `.sav`, etc.)

2. You need to tell R where to look for the data

---
# Importing data

### Challenge #1: File types

.pull-left[

.center[
### `readr`

`read_csv()`, `read_tsv()`, `read_delim()`, `read_fwf()`, etc...
]
]

.pull-right[

.center[
### `rio`

`import()`
]
]

---

# Importing data

### Challenge #1: File types

.pull-left[

.center[
### `readr`

`read_csv()`, `read_tsv()`, `read_delim()`, `read_fwf()`, etc...
]
]

.pull-right[

.center[
### `rio`

`import()`

✅
]
]

---
# Importing data

### `rio::import()`

We just call `import()` and under the hood it calls the right read function given the file's extension (`.csv`, `.txt`, `.sav`, `.xlsx`, etc.)
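
To make the "under the hood" idea concrete, here is a simplified base R sketch of choosing a reader from the file extension. It only illustrates the idea and is not `rio`'s actual implementation; `import_sketch()` is a hypothetical helper that handles just two formats.

```r
# Hypothetical helper: pick a base R reader based on the file extension.
# This mimics the idea behind rio::import(); it is not rio's real code.
import_sketch <- function(file) {
  ext <- tolower(tools::file_ext(file))
  switch(ext,
    csv = read.csv(file),
    tsv = read.delim(file),
    stop("No reader defined in this sketch for extension: ", ext)
  )
}

# Write a tiny CSV to a temporary file, then read it back
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(10, 20, 30)), tmp, row.names = FALSE)
toy <- import_sketch(tmp)
toy
```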

We'll get some practice with this in a few minutes

---
# Project-oriented workflows

### Challenge #2: File paths

When R looks for a file, it has a starting point. This is called the **working directory**.

The working directory that you're currently in is displayed in the console window and the files tab. Let's take a look in RStudio...

--
***

If you ever get lost, you can print your working directory with `getwd()`

If you are working in an `.Rmd` document, by default its code runs with the working directory set to the folder where that `.Rmd` file lives

```r
getwd()
```

```
## [1] "/Users/bcullen/Desktop/summeR-bootcamp-2020/static/slides"
```

For example, I created these slides in a `.Rmd` document that lives in this folder on my computer ☝️
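
As a small base R sketch of what "starting point" means: a relative path is resolved against the working directory. The file name `my_data.csv` is a made-up placeholder, and the printed path will differ on your computer.

```r
# Where R will start looking for files
wd <- getwd()

# A relative path such as "my_data.csv" (hypothetical file name)
# is interpreted as living inside the working directory
full_path <- file.path(wd, "my_data.csv")
full_path

# file.exists() is a quick way to check a path before importing;
# it is FALSE unless that file really exists in the working directory
file.exists("my_data.csv")
```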

---
class: split-three
# Project-oriented workflows

### RStudio Projects

The best way to simplify issues with working directories is to use **RStudio Projects**.

--
***

.column[.content[.center[

### Step 1
<img src="images/create_project1.png" width="90%" />
]]]

.column[.content[.center[

### Step 2
<img src="images/create_project2.png" width="90%" />

]]]

.column[.content[.center[

### Step 3
<img src="images/create_project3.png" width="90%" />

]]]

---
# Project-oriented workflows

### RStudio Projects

When you create a Project in RStudio, it is associated with a folder somewhere on your computer.

It will automatically create a `.Rproj` file in that folder, which will keep track of the "top level" of your project.

--
***

For example, we've been using RStudio Projects for "Your Turn" exercises

---
# Project-oriented workflows

### `here::here()`

.pull-left[
In combination with RStudio Projects, use the `here` package

`here::here()` will build a file path to the top level of your project directory.

This makes it easy to tell R where files live relative to the top-level folder of your project

]

.pull-right[
<img src="images/here.png" width="3696" />
]
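
A minimal sketch of building a path with `here()`, assuming the `here` package is installed. The `data/` folder and file name are the ones used in the exercises; the first part of the path depends on where your project lives.

```r
library(here)

# Path to the top level of the project
here()

# Path to a file inside the data/ folder, built from the project root.
# Folder names can be passed as separate arguments or as one string.
here("data", "pragmatic_scales_data.csv")
here("data/pragmatic_scales_data.csv")
```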

---
class: inverse 
# Your turn 1

#### 1. Load the `rio` and `here` packages.

#### 2. Run the following code to import the data called `pragmatic_scales_data.csv`. Why do you get an error? Where is this file saved? *Hint*: Look through the folder(s) in the Files pane

```r
ps_data <- import("pragmatic_scales_data.csv")
```

#### 3. Fix the error in the code above to import the data. *Hint*: use the `here()` function

#### 4. Remember that `rio` is flexible with file types -- `rio::import()` will call the right function under the hood to read in the file based on the file extension. Use `rio` to import `pragmatic_scales_data.sav` (an SPSS file type) and save it to a new object named `ps_data_2`.

---
class: solution
# Solution

.panelset[
.panel[.panel-name[Q1]

```r
# Q1.
library(rio)
library(here)
```
]

.panel[.panel-name[Q2]

```r
# Q2. 
ps_data <- import("pragmatic_scales_data.csv")
```

```
## Error in import("pragmatic_scales_data.csv"): No such file
```

***

The file **pragmatic_scales_data.csv** is saved in the **data** folder, so we need to tell R to look in that folder.
]

.panel[.panel-name[Q3]

```r
# Q3.
ps_data <- import(here("data/pragmatic_scales_data.csv"))
```
]

.panel[.panel-name[Q4]

```r
# Q4.
ps_data_2 <- import(here("data/pragmatic_scales_data.sav"))
```
]
]

---
# Exporting data

You can also use `rio` to export your data using `export()`, saving it in any of the formats that it works with.

--
***

Here are the arguments you will need to use for `export()`

```r
export(x, file)
```

`x` is the `data.frame` object in your RStudio Environment you want to export

`file` is the path/filename for the resulting file

--
***

For example, let's say I want to export `ps_data` as an `.xlsx` file and put it into the `data/` subdirectory.

```r
export(ps_data, here::here("data/ps_data.xlsx"))
```
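
One way to check that an export worked is to read the file straight back in. The sketch below assumes `rio` is installed and uses a small made-up data frame (`toy`) and a temporary file rather than the course data.

```r
library(rio)

# A small made-up data frame, just for illustration
toy <- data.frame(subid = c("A", "B", "C"), correct = c(1, 0, 1))

# Export to a temporary .csv file, then import it again
tmp <- tempfile(fileext = ".csv")
export(toy, tmp)
toy_again <- import(tmp)

# Should have the same dimensions as the original
dim(toy_again)
```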

---
class: inverse 
# Your turn 2

1. Look through the Files pane and find the file `another_data_set.csv`. Make note of what subdirectory it is saved in. Import the data and save to an object called `another_df`.

1. Now export the data you just imported and save it into the `data/` directory. Make sure the name of the resulting file is `another_data_set`, and export it as a `.xlsx` file.

1. One of your colleagues insists you send them a `.sav` file so that they can run the analyses in SPSS. Make another copy of `another_data_set` in the `data/` subdirectory that is in the `.sav` format.

1. Finally, let's read one of these datasets to make sure everything worked as expected. Import `another_data_set.sav`, which you just created, and save it to a new object named `another_df_2`.

---
class: solution
# Solution

.panelset[
.panel[.panel-name[Q1]

```r
# Q1.
another_df <- import(here("data/more_data/another_data_set.csv"))
```

]

.panel[.panel-name[Q2]

```r
# Q2.
export(another_df, here("data/another_data_set.xlsx"))
```
]

.panel[.panel-name[Q3]

```r
# Q3.
export(another_df, here("data/another_data_set.sav"))
```

]

.panel[.panel-name[Q4]

```r
# Q4.
another_df_2 <- import(here("data/another_data_set.sav"))
```
]
]

---
# Viewing data

Now that your data is loaded in R, you'll want to take a look at it. There are a few different ways to do that, each of which offers different information.

--
***

### `View()`

One way is to click on the `View` button in the environment pane...

You should see `ps_data` in the environment pane with a little data table icon at the far right. Click on that icon.

You'll notice that this ran `View(ps_data)` in the console. We could have instead just typed this directly ourselves -- notice the capital `V` in `View()` 👀

---
# Viewing data

### `head()` and `tail()`

.panelset[
.panel[.panel-name[`head()`]

You can also see just the first few rows of a data frame with `head()`, which is especially helpful for large data sets

```r
head(ps_data)
```

```
##   subid   item correct  age condition
## 1   M22  faces       1 2.00     Label
## 2   M22 houses       1 2.00     Label
## 3   M22  pasta       0 2.00     Label
## 4   M22   beds       0 2.00     Label
## 5   T22   beds       0 2.13     Label
## 6   T22  faces       0 2.13     Label
```
]

.panel[.panel-name[`tail()`]

`tail()` is the complement to `head()`, displaying just the final rows of a data frame

```r
tail(ps_data)
```

```
##      subid   item correct  age condition
## 583 MSCH84  pasta       1 2.83  No Label
## 584 MSCH84   beds       0 2.83  No Label
## 585 MSCH85  faces       0 2.69  No Label
## 586 MSCH85 houses       0 2.69  No Label
## 587 MSCH85  pasta       0 2.69  No Label
## 588 MSCH85   beds       0 2.69  No Label
```
]
]
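
Both functions show six rows by default and take an `n` argument to change that. A quick sketch with a made-up data frame (`toy`, not the course data):

```r
# Made-up data frame with 10 rows
toy <- data.frame(id = 1:10, score = seq(10, 100, by = 10))

head(toy, n = 3)  # first 3 rows
tail(toy, n = 2)  # last 2 rows
```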

---
# Viewing data

### `str()` and `glimpse()`

.panelset[
.panel[.panel-name[`str()`]

We saw `str()` when we first introduced data frames. It's worth mentioning it again because it can be so useful when you import data to see how your variables were read in (i.e. their types)

```r
str(ps_data)
```

```
## 'data.frame':	588 obs. of  5 variables:
##  $ subid    : chr  "M22" "M22" "M22" "M22" ...
##  $ item     : chr  "faces" "houses" "pasta" "beds" ...
##  $ correct  : int  1 1 0 0 0 0 1 1 0 0 ...
##  $ age      : num  2 2 2 2 2.13 2.13 2.13 2.13 2.32 2.32 ...
##  $ condition: chr  "Label" "Label" "Label" "Label" ...
```

]

.panel[.panel-name[`glimpse()`]

`glimpse()` is very similar to `str()` but is a tidyverse function, and it shows you a little more raw data

```r
glimpse(ps_data)
```

```
## Rows: 588
## Columns: 5
## $ subid     <chr> "M22", "M22", "M22", "M22", "T22", "T22", "T22", "T22", "T1…
## $ item      <chr> "faces", "houses", "pasta", "beds", "beds", "faces", "house…
## $ correct   <int> 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1,…
## $ age       <dbl> 2.00, 2.00, 2.00, 2.00, 2.13, 2.13, 2.13, 2.13, 2.32, 2.32,…
## $ condition <chr> "Label", "Label", "Label", "Label", "Label", "Label", "Labe…
```

]
]

---
class: inverse

# Your turn 3

1. Take a look at `another_df`, which should be saved in your Global Environment. Click the "View" button in the Environment pane, and also use `View()` in your Console.

1. Now look at some summary information about `another_df` using `str()` and `glimpse()`. *Hint*: You will need to load the `tidyverse` package first in order to use `glimpse()`.

1. Lastly, find the number of rows and columns in `another_df` using `nrow()` and `ncol()`, respectively. Make sure your answers match the summary information given to you above.

---
class: solution

# Solution

.panelset[
.panel[.panel-name[Q1]

```r
View(another_df)
```
]

.panel[.panel-name[Q2]

```r
library(tidyverse)

str(another_df)
```

```
## 'data.frame':	32 obs. of  4 variables:
##  $ subid  : chr  "A001" "A001" "A001" "A001" ...
##  $ stimuli: chr  "A" "B" "C" "D" ...
##  $ correct: int  0 0 1 0 1 1 1 1 0 0 ...
##  $ age    : num  2.5 2.5 2.5 2.5 2.75 2.75 2.75 2.75 3.6 3.6 ...
```

```r
glimpse(another_df)
```

```
## Rows: 32
## Columns: 4
## $ subid   <chr> "A001", "A001", "A001", "A001", "B002", "B002", "B002", "B002…
## $ stimuli <chr> "A", "B", "C", "D", "A", "B", "C", "D", "A", "B", "C", "D", "…
## $ correct <int> 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1…
## $ age     <dbl> 2.50, 2.50, 2.50, 2.50, 2.75, 2.75, 2.75, 2.75, 3.60, 3.60, 3…
```
]

.panel[.panel-name[Q3]

```r
nrow(another_df)
```

```
## [1] 32
```

```r
ncol(another_df)
```

```
## [1] 4
```

]
]
 
---
class: inverse, center, middle
# Q & A

---
class: inverse, center, middle
# Next up...

## Data visualization with `ggplot2`

---
class: inverse, center, middle
# Break!

Notes for current slide

Notes for next slide

Importing Data & Project-oriented Workflows1 / 44

Importing dataImporting data in R generally has 2 commonly challenging aspects...2 / 44

Importing dataImporting data in R generally has 2 commonly challenging aspects...You need to call a function that works with a particular data format (.csv, .txt, .sav, etc.)
3 / 44

Importing data

Importing data in R generally has 2 commonly challenging aspects...

You need to call a function that works with a particular data format (.csv, .txt, .sav, etc.)
You need to tell R where to look for the data

4 / 44

Importing dataChallenge # 1: File types5 / 44

Importing data

Challenge # 1: File types

`readr`

read_csv(), read_tsv(), read_delim(), read_fwf(), etc...

6 / 44

Importing data

Challenge # 1: File types

`readr`

read_csv(), read_tsv(), read_delim(), read_fwf(), etc...

`rio`

import()

7 / 44

Importing data

Challenge # 1: File types

`readr`

read_csv(), read_tsv(), read_delim(), read_fwf(), etc...

`rio`

import()

✅

8 / 44

Importing data

`rio::import()`

We just call import() and under the hood it calls the right read function given the file's extension (.csv, .txt, .sav, .xlsx, etc.)

9 / 44

Importing data

`rio::import()`

We just call import() and under the hood it calls the right read function given the file's extension (.csv, .txt, .sav, .xlsx, etc.)

We'll get some practice with this in a few minutes

10 / 44

Project-oriented workflowsChallenge # 2 : File paths11 / 44

Project-oriented workflows

Challenge # 2 : File paths

When R looks for a file, it has a starting point. This is called the working directory.

12 / 44

Project-oriented workflows

Challenge # 2 : File paths

When R looks for a file, it has a starting point. This is called the working directory.

The working directory that you're currently in is displayed in the console window and the files tab. Let's take a look in RStudio...

13 / 44

Project-oriented workflows

Challenge # 2 : File paths

When R looks for a file, it has a starting point. This is called the working directory.

The working directory that you're currently in is displayed in the console window and the files tab. Let's take a look in RStudio...

If you ever get lost, you can print your working directory with getwd()

14 / 44

Project-oriented workflows

Challenge # 2 : File paths

When R looks for a file, it has a starting point. This is called the working directory.

The working directory that you're currently in is displayed in the console window and the files tab. Let's take a look in RStudio...

If you ever get lost, you can print your working directory with getwd()

If you are working in a .Rmd document, R by default will set whatever folder on your computer where that .Rmd file lives as your working directory

15 / 44

Project-oriented workflows

Challenge # 2 : File paths

When R looks for a file, it has a starting point. This is called the working directory.

The working directory that you're currently in is displayed in the console window and the files tab. Let's take a look in RStudio...

If you ever get lost, you can print your working directory with getwd()

If you are working in a .Rmd document, R by default will set whatever folder on your computer where that .Rmd file lives as your working directory

getwd()

## [1] "/Users/bcullen/Desktop/summeR-bootcamp-2020/static/slides"

For example, I created these slides in a .Rmd document that lives in this folder on my computer ☝️

16 / 44

Project-oriented workflows

RStudio Projects

The best way to simplify issues with working directories is to use RStudio Projects.

17 / 44

Project-oriented workflows

RStudio Projects

The best way to simplify issues with working directories is to use RStudio Projects.

Step 1

Step 2

Step 3

18 / 44

Project-oriented workflows

RStudio Projects

When you create a Project in RStudio, it is associated with a folder somewhere on your computer.

19 / 44

Project-oriented workflows

RStudio Projects

When you create a Project in RStudio, it is associated with a folder somewhere on your computer.

It will automatically create a .Rproj file in that folder, which will keep track of the "top level" of your project.

20 / 44

Project-oriented workflows

RStudio Projects

When you create a Project in RStudio, it is associated with a folder somewhere on your computer.

It will automatically create a .Rproj file in that folder, which will keep track of the "top level" of your project.

For example, we've been using RStudio Projects for "Your Turn" exercises

21 / 44

Project-oriented workflowshere::here()22 / 44

Project-oriented workflows

`here::here()`

In combination with RStudio Projects, use the here package

here::here() will build a file path to the top level of your project directory.

This makes it easy to tell R where files live relative to the top-level folder of your project

23 / 44

Your turn 1

04:00

1. Load the `rio` and `here` packages.

2. Run the following code to import the data called `pragmatic_scales_data.csv`. Why do you get an error? Where is this file saved? Hint: Look through the folder(s) in the Files pane

ps_data <- import("pragmatic_scales_data.csv")

3. Fix the error in the code above to import the data. Hint: use the `here()` function

4. Remember that `rio` is flexible with file types -- `rio::import()` will call the right function under the hood to read in the file based on the file extension. Use `rio` to import `pragmatic_scales_data.sav` (an SPSS file type) and save it to a new object named `ps_data_2`.

24 / 44

Solution

# Q1.
library(rio)
library(here)

# Q2. 
ps_data <- import("pragmatic_scales_data.csv")

## Error in import("pragmatic_scales_data.csv"): No such file

The file pragmatic_scales_data.csv is saved in the data folder, so we need to tell R to look in that folder.

# Q3.
ps_data <- import(here("data/pragmatic_scales_data.csv"))

# Q4.
ps_data_2 <- import(here("data/pragmatic_scales_data.sav"))

25 / 44

Exporting data

You can also use rio to export your data using export(), saving it in any of the formats that it works with.

26 / 44

Exporting data

You can also use rio to export your data using export(), saving it in any of the formats that it works with.

Here are the arguments you will need to use for export()

export(x, file)

27 / 44

Exporting data

You can also use rio to export your data using export(), saving it in any of the formats that it works with.

Here are the arguments you will need to use for export()

export(x, file)

x is the data.frame object in your RStudio Environment you want to export

28 / 44

Exporting data

You can also use rio to export your data using export(), saving it in any of the formats that it works with.

Here are the arguments you will need to use for export()

export(x, file)

x is the data.frame object in your RStudio Environment you want to export

file is the path/filename for the resulting file

29 / 44

Exporting data

You can also use rio to export your data using export(), saving it in any of the formats that it works with.

Here are the arguments you will need to use for export()

export(x, file)

x is the data.frame object in your RStudio Environment you want to export

file is the path/filename for the resulting file

For example, let's say I want to export ps_data as an .xlsx file and put it into the data/ subdirectory.

export(ps_data, here::here("data/ps_data.xlsx"))

30 / 44

Your turn 2

04:00

Look through the Files pane and find the file another_data_set.csv. Make note of what subdirectory it is saved in. Import the data and save to an object called another_df.
Now export the data you just imported and save it into the data/ directory. Make sure the name of the resulting file is another_data_set, and export it as a .xlsx file.
One of your colleagues insists you send them a .sav file so that they can run the analyses in SPSS. Make another copy of another_data_set in the data/ subdirectory that is in the .sav format.
Finally, let's read one of these datasets to make sure everything worked as expected. Import another_data_set.sav , which you just created, and import it, saving it to a new object named another_df_2.

31 / 44

SolutionQ1
Q2
Q3
Q4
# Q1.
another_df <- import(here("data/more_data/another_data_set.csv"))
# Q2.
export(another_df, here("data/another_data_set.xlsx"))
# Q3.
export(another_df, here("data/another_data_set.sav"))
# Q4.
another_df_2 <- import(here("data/another_data_set.sav"))
32 / 44

Viewing data

Now that your data is loaded in R, you'll want to take a look at it. There are a few different ways to do that, which each offer different information.

33 / 44

Viewing data

Now that your data is loaded in R, you'll want to take a look at it. There are a few different ways to do that, which each offer different information.

`View()`

One way is to click on the View button in the environment pane...

34 / 44

Viewing data

Now that your data is loaded in R, you'll want to take a look at it. There are a few different ways to do that, which each offer different information.

`View()`

One way is to click on the View button in the environment pane...

You should see ps_data in the environment pane with a little data table icon at the far right. Click on that icon.

35 / 44

Viewing data

Now that your data is loaded in R, you'll want to take a look at it. There are a few different ways to do that, which each offer different information.

`View()`

One way is to click on the View button in the environment pane...

You should see ps_data in the environment pane with a little data table icon at the far right. Click on that icon.

You'll notice that this ran View(ps_data) in the console. We could have instead just typed this directly ourselves -- notice the capital V in View() 👀

36 / 44

Viewing datahead() and tail()37 / 44

Viewing data

`head()` and `tail()`

You can also see just the first few rows of a dataframe with head(), which is especially helpful for large data sets

head(ps_data)

##   subid   item correct  age condition
## 1   M22  faces       1 2.00     Label
## 2   M22 houses       1 2.00     Label
## 3   M22  pasta       0 2.00     Label
## 4   M22   beds       0 2.00     Label
## 5   T22   beds       0 2.13     Label
## 6   T22  faces       0 2.13     Label

tail() is the complement to head(), displaying just the final rows from a dataframe

tail(ps_data)

##      subid   item correct  age condition
## 583 MSCH84  pasta       1 2.83  No Label
## 584 MSCH84   beds       0 2.83  No Label
## 585 MSCH85  faces       0 2.69  No Label
## 586 MSCH85 houses       0 2.69  No Label
## 587 MSCH85  pasta       0 2.69  No Label
## 588 MSCH85   beds       0 2.69  No Label

38 / 44

Viewing data

`str()` and `glimpse()`

We saw str() when we first introduced data frames. It's worth mentioning it again because it can be so useful when you import data to see how your variables were read in (i.e. their types)

str(ps_data)

## 'data.frame':    588 obs. of  5 variables:
##  $ subid    : chr  "M22" "M22" "M22" "M22" ...
##  $ item     : chr  "faces" "houses" "pasta" "beds" ...
##  $ correct  : int  1 1 0 0 0 0 1 1 0 0 ...
##  $ age      : num  2 2 2 2 2.13 2.13 2.13 2.13 2.32 2.32 ...
##  $ condition: chr  "Label" "Label" "Label" "Label" ...

glimpse() is very similar to str() but is a tidyverse function, and it shows you a little more raw data

glimpse(ps_data)

## Rows: 588
## Columns: 5
## $ subid     <chr> "M22", "M22", "M22", "M22", "T22", "T22", "T22", "T22", "T1…
## $ item      <chr> "faces", "houses", "pasta", "beds", "beds", "faces", "house…
## $ correct   <int> 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1,…
## $ age       <dbl> 2.00, 2.00, 2.00, 2.00, 2.13, 2.13, 2.13, 2.13, 2.32, 2.32,…
## $ condition <chr> "Label", "Label", "Label", "Label", "Label", "Label", "Labe…

39 / 44

Your turn 3

02:00

Take a look at another_df, which should be saved in your Global Environment. Click the "View" button in the Environment pane, and also use View() in your Console.
Now look at some summary information about another_df using str() and glimpse(). Hint. You will need to load the tidyverse package first in order to use glimpse().
Lastly find the number of rows and columns in another_df using nrow() and ncol(), respectively. Make sure your answers match the summary information given to you above.

40 / 44

Solution

View(another_df)

library(tidyverse)
str(another_df)

## 'data.frame':    32 obs. of  4 variables:
##  $ subid  : chr  "A001" "A001" "A001" "A001" ...
##  $ stimuli: chr  "A" "B" "C" "D" ...
##  $ correct: int  0 0 1 0 1 1 1 1 0 0 ...
##  $ age    : num  2.5 2.5 2.5 2.5 2.75 2.75 2.75 2.75 3.6 3.6 ...

glimpse(another_df)

## Rows: 32
## Columns: 4
## $ subid   <chr> "A001", "A001", "A001", "A001", "B002", "B002", "B002", "B002…
## $ stimuli <chr> "A", "B", "C", "D", "A", "B", "C", "D", "A", "B", "C", "D", "…
## $ correct <int> 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1…
## $ age     <dbl> 2.50, 2.50, 2.50, 2.50, 2.75, 2.75, 2.75, 2.75, 3.60, 3.60, 3…

nrow(another_df)

## [1] 32

ncol(another_df)

## [1] 4

41 / 44

Q & A
05:00
42 / 44

Next up...Data visualization with ggplot243 / 44

Break!
10:00
44 / 44

Importing dataImporting data in R generally has 2 commonly challenging aspects...2 / 44

Paused

Help

Keyboard shortcuts

↑, ←, Pg Up, k	Go to previous slide
↓, →, Pg Dn, Space, j	Go to next slide
Home	Go to first slide
End	Go to last slide
Number + Return	Go to specific slide
b / m / f	Toggle blackout / mirrored / fullscreen mode
c	Clone slideshow
p	Toggle presenter mode
t	Restart the presentation timer
?, h	Toggle this help
o	Tile View: Overview of Slides

Esc	Back to slideshow

Importing Data & Project-oriented Workflows1 / 44

Importing dataImporting data in R generally has 2 commonly challenging aspects...2 / 44

Importing dataImporting data in R generally has 2 commonly challenging aspects...You need to call a function that works with a particular data format (.csv, .txt, .sav, etc.)
3 / 44

Importing data

Importing data in R generally has 2 commonly challenging aspects...

You need to call a function that works with a particular data format (.csv, .txt, .sav, etc.)
You need to tell R where to look for the data

4 / 44

Importing dataChallenge # 1: File types5 / 44

Importing data

Challenge # 1: File types

`readr`

read_csv(), read_tsv(), read_delim(), read_fwf(), etc...

6 / 44

Importing data

Challenge # 1: File types

`readr`

read_csv(), read_tsv(), read_delim(), read_fwf(), etc...

`rio`

import()

7 / 44

Importing data

Challenge # 1: File types

`readr`

read_csv(), read_tsv(), read_delim(), read_fwf(), etc...

`rio`

import()

✅

8 / 44

Importing data

`rio::import()`

We just call import() and under the hood it calls the right read function given the file's extension (.csv, .txt, .sav, .xlsx, etc.)

9 / 44

Importing data

`rio::import()`

We just call import() and under the hood it calls the right read function given the file's extension (.csv, .txt, .sav, .xlsx, etc.)

We'll get some practice with this in a few minutes

10 / 44

Project-oriented workflowsChallenge # 2 : File paths11 / 44

Project-oriented workflows

Challenge # 2 : File paths

When R looks for a file, it has a starting point. This is called the working directory.

12 / 44

Project-oriented workflows

Challenge # 2 : File paths

When R looks for a file, it has a starting point. This is called the working directory.

The working directory that you're currently in is displayed in the console window and the files tab. Let's take a look in RStudio...

13 / 44

Project-oriented workflows

Challenge # 2 : File paths

When R looks for a file, it has a starting point. This is called the working directory.

The working directory that you're currently in is displayed in the console window and the files tab. Let's take a look in RStudio...

If you ever get lost, you can print your working directory with getwd()

14 / 44

Project-oriented workflows

Challenge # 2 : File paths

When R looks for a file, it has a starting point. This is called the working directory.

The working directory that you're currently in is displayed in the console window and the files tab. Let's take a look in RStudio...

If you ever get lost, you can print your working directory with getwd()

If you are working in a .Rmd document, R by default will set whatever folder on your computer where that .Rmd file lives as your working directory

15 / 44

Project-oriented workflows

Challenge # 2 : File paths

When R looks for a file, it has a starting point. This is called the working directory.

The working directory that you're currently in is displayed in the console window and the files tab. Let's take a look in RStudio...

If you ever get lost, you can print your working directory with getwd()

If you are working in a .Rmd document, R by default will set whatever folder on your computer where that .Rmd file lives as your working directory

getwd()

## [1] "/Users/bcullen/Desktop/summeR-bootcamp-2020/static/slides"

For example, I created these slides in a .Rmd document that lives in this folder on my computer ☝️

16 / 44

Project-oriented workflows

RStudio Projects

The best way to simplify issues with working directories is to use RStudio Projects.

17 / 44

Project-oriented workflows

RStudio Projects

The best way to simplify issues with working directories is to use RStudio Projects.

Step 1

Step 2

Step 3

18 / 44

Project-oriented workflows

RStudio Projects

When you create a Project in RStudio, it is associated with a folder somewhere on your computer.

19 / 44

Project-oriented workflows

RStudio Projects

When you create a Project in RStudio, it is associated with a folder somewhere on your computer.

It will automatically create a .Rproj file in that folder, which will keep track of the "top level" of your project.

20 / 44

Project-oriented workflows

RStudio Projects

When you create a Project in RStudio, it is associated with a folder somewhere on your computer.

It will automatically create a .Rproj file in that folder, which will keep track of the "top level" of your project.

For example, we've been using RStudio Projects for "Your Turn" exercises

21 / 44

Project-oriented workflowshere::here()22 / 44

Project-oriented workflows

`here::here()`

In combination with RStudio Projects, use the here package

here::here() will build a file path to the top level of your project directory.

This makes it easy to tell R where files live relative to the top-level folder of your project

23 / 44

Your turn 1

04:00

1. Load the `rio` and `here` packages.

2. Run the following code to import the data called `pragmatic_scales_data.csv`. Why do you get an error? Where is this file saved? Hint: Look through the folder(s) in the Files pane

ps_data <- import("pragmatic_scales_data.csv")

3. Fix the error in the code above to import the data. Hint: use the `here()` function

4. Remember that `rio` is flexible with file types -- `rio::import()` will call the right function under the hood to read in the file based on the file extension. Use `rio` to import `pragmatic_scales_data.sav` (an SPSS file type) and save it to a new object named `ps_data_2`.

24 / 44

Solution

# Q1.
library(rio)
library(here)

# Q2. 
ps_data <- import("pragmatic_scales_data.csv")

## Error in import("pragmatic_scales_data.csv"): No such file

The file pragmatic_scales_data.csv is saved in the data folder, so we need to tell R to look in that folder.

# Q3.
ps_data <- import(here("data/pragmatic_scales_data.csv"))

# Q4.
ps_data_2 <- import(here("data/pragmatic_scales_data.sav"))

25 / 44

Exporting data

You can also use rio to export your data using export(), saving it in any of the formats that it works with.

26 / 44

Exporting data

You can also use rio to export your data using export(), saving it in any of the formats that it works with.

Here are the arguments you will need to use for export()

export(x, file)

27 / 44

Exporting data

You can also use rio to export your data using export(), saving it in any of the formats that it works with.

Here are the arguments you will need to use for export()

export(x, file)

x is the data.frame object in your RStudio Environment you want to export

28 / 44

Exporting data

You can also use rio to export your data using export(), saving it in any of the formats that it works with.

Here are the arguments you will need to use for export()

export(x, file)

x is the data.frame object in your RStudio Environment you want to export

file is the path/filename for the resulting file

29 / 44

Exporting data

You can also use rio to export your data using export(), saving it in any of the formats that it works with.

Here are the arguments you will need to use for export()

export(x, file)

x is the data.frame object in your RStudio Environment you want to export

file is the path/filename for the resulting file

For example, let's say I want to export ps_data as an .xlsx file and put it into the data/ subdirectory.

export(ps_data, here::here("data/ps_data.xlsx"))

30 / 44

Your turn 2

04:00

1. Look through the Files pane and find the file another_data_set.csv. Make note of which subdirectory it is saved in. Import the data and save it to an object called another_df.

2. Now export the data you just imported and save it into the data/ directory. Make sure the resulting file is named another_data_set and is exported as a .xlsx file.

3. One of your colleagues insists you send them a .sav file so that they can run the analyses in SPSS. Make another copy of another_data_set in the data/ subdirectory, this time in the .sav format.

4. Finally, let's read one of these datasets back in to make sure everything worked as expected. Import another_data_set.sav, which you just created, and save it to a new object named another_df_2.


Solution

# Q1.
another_df <- import(here("data/more_data/another_data_set.csv"))
# Q2.
export(another_df, here("data/another_data_set.xlsx"))
# Q3.
export(another_df, here("data/another_data_set.sav"))
# Q4.
another_df_2 <- import(here("data/another_data_set.sav"))

Viewing data

Now that your data is loaded in R, you'll want to take a look at it. There are a few different ways to do that, each of which offers different information.

`View()`

One way is to click on the View button in the environment pane...

You should see ps_data in the environment pane with a little data table icon at the far right. Click on that icon.

You'll notice that this ran View(ps_data) in the console. We could have instead just typed this directly ourselves -- notice the capital V in View() 👀


Viewing data

`head()` and `tail()`

You can also see just the first few rows of a data frame (six by default) with head(), which is especially helpful for large data sets

head(ps_data)

##   subid   item correct  age condition
## 1   M22  faces       1 2.00     Label
## 2   M22 houses       1 2.00     Label
## 3   M22  pasta       0 2.00     Label
## 4   M22   beds       0 2.00     Label
## 5   T22   beds       0 2.13     Label
## 6   T22  faces       0 2.13     Label

tail() is the complement to head(), displaying just the final rows of a data frame

tail(ps_data)

##      subid   item correct  age condition
## 583 MSCH84  pasta       1 2.83  No Label
## 584 MSCH84   beds       0 2.83  No Label
## 585 MSCH85  faces       0 2.69  No Label
## 586 MSCH85 houses       0 2.69  No Label
## 587 MSCH85  pasta       0 2.69  No Label
## 588 MSCH85   beds       0 2.69  No Label
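
Both functions accept an `n` argument if you want something other than the default six rows. A small sketch with base R only; `toy_df` and its columns are invented for illustration and are not part of the workshop data.

```r
# Toy data frame invented for illustration (not the workshop data)
toy_df <- data.frame(id = 1:10, score = seq(10, 100, by = 10))

# Ask for a specific number of rows instead of the default six
first_three <- head(toy_df, n = 3)  # rows 1 to 3
last_two <- tail(toy_df, n = 2)     # rows 9 and 10

print(first_three)
print(last_two)
```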


Viewing data

`str()` and `glimpse()`

We saw str() when we first introduced data frames. It's worth mentioning it again because it can be so useful when you import data to see how your variables were read in (i.e. their types)

str(ps_data)

## 'data.frame':    588 obs. of  5 variables:
##  $ subid    : chr  "M22" "M22" "M22" "M22" ...
##  $ item     : chr  "faces" "houses" "pasta" "beds" ...
##  $ correct  : int  1 1 0 0 0 0 1 1 0 0 ...
##  $ age      : num  2 2 2 2 2.13 2.13 2.13 2.13 2.32 2.32 ...
##  $ condition: chr  "Label" "Label" "Label" "Label" ...

glimpse() is very similar to str() but is a tidyverse function, and it shows you a little more raw data

glimpse(ps_data)

## Rows: 588
## Columns: 5
## $ subid     <chr> "M22", "M22", "M22", "M22", "T22", "T22", "T22", "T22", "T1…
## $ item      <chr> "faces", "houses", "pasta", "beds", "beds", "faces", "house…
## $ correct   <int> 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1,…
## $ age       <dbl> 2.00, 2.00, 2.00, 2.00, 2.13, 2.13, 2.13, 2.13, 2.32, 2.32,…
## $ condition <chr> "Label", "Label", "Label", "Label", "Label", "Label", "Labe…
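
If you want to check column types in code rather than reading them off the screen, the same information can be pulled into a named vector. This is a minimal base R sketch, not something the slides themselves cover; `toy_df` and `col_types` are hypothetical names, and the data frame only imitates the layout of the imported data.

```r
# Made-up data frame that imitates the layout of the imported data
toy_df <- data.frame(
  subid = c("A1", "A2", "A3"),
  correct = c(1L, 0L, 1L),
  age = c(2.5, 2.75, 3.6),
  stringsAsFactors = FALSE
)

# One class per column, returned as a named character vector
col_types <- sapply(toy_df, class)
print(col_types)
```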


Your turn 3

02:00

1. Take a look at another_df, which should be saved in your Global Environment. Click the "View" button in the Environment pane, and also use View() in your Console.

2. Now look at some summary information about another_df using str() and glimpse(). Hint: you will need to load the tidyverse package first in order to use glimpse().

3. Lastly, find the number of rows and columns in another_df using nrow() and ncol(), respectively. Make sure your answers match the summary information from step 2.


Solution

View(another_df)

library(tidyverse)
str(another_df)

## 'data.frame':    32 obs. of  4 variables:
##  $ subid  : chr  "A001" "A001" "A001" "A001" ...
##  $ stimuli: chr  "A" "B" "C" "D" ...
##  $ correct: int  0 0 1 0 1 1 1 1 0 0 ...
##  $ age    : num  2.5 2.5 2.5 2.5 2.75 2.75 2.75 2.75 3.6 3.6 ...

glimpse(another_df)

## Rows: 32
## Columns: 4
## $ subid   <chr> "A001", "A001", "A001", "A001", "B002", "B002", "B002", "B002…
## $ stimuli <chr> "A", "B", "C", "D", "A", "B", "C", "D", "A", "B", "C", "D", "…
## $ correct <int> 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1…
## $ age     <dbl> 2.50, 2.50, 2.50, 2.50, 2.75, 2.75, 2.75, 2.75, 3.60, 3.60, 3…

nrow(another_df)

## [1] 32

ncol(another_df)

## [1] 4


Q & A
05:00

Next up...

Data visualization with ggplot2

Break!
10:00
