<- c(31, 29, 27, 25, 23, 21)
ages_int <- c(
names_char "Gabriel", "Britni", "Michael", "Christiana", "Olivia", "Hannah"
)names(ages_int) <- names_char
Lesson 7: Lists and Tibbles
Gabriel Odom and Data Carpentry Collaborators
Review
What did we learn last class?
- Atomic Vectors
- Basic Subsetting
- Type Coercion
- Attributes
- Named Subsetting
- Factors
- Missing Values
Outline
First of all, I’d like to acknowledge once again the Data Carpentry instructors for the “R for Ecology” lesson. In this lesson, we will cover the following:
- Lists
- Tibbles and Data Frames
- Positional Subsetting
- Relational Subsetting
- Named Subsetting
Reading, Videos, and Assignments
- Watch: https://www.youtube.com/playlist?list=PLHKz7LkI0oR2B3z24nkzEYqwTtO2F9vu0
- Read:
- HOPR section 5.7-5.8, 5.11 (https://rstudio-education.github.io/hopr/r-objects.html)
- Sections 3.5-3.7 of the “Vectors” chapter of Advanced R (https://adv-r.hadley.nz/vectors-chap.html)
- Read HOPR chapter 6 (https://rstudio-education.github.io/hopr/r-notation.html)
- Sections 4.1-4.4 of the “Subsetting” chapter of Advanced R (https://adv-r.hadley.nz/subsetting.html)
- Read the “Tibbles” chapter (10) of the First Edition of R4DS (https://r4ds.had.co.nz/tibbles.html)
- Do: Complete the exercises from the readings; for HOPR, replace any call to the
data.frame()
function withtibble()
)
Overview
Review of Objects
Recall that everything in R is an object. The objects in R are broken up primarily like this:
Also, because R is a vectorised language, most non-function objects we encounter are vectors. Vectors in R are broken into two main types: atomic and non-atomic. Atomic vectors mean that all of the elements in the vector have the same type or class. Non-atomic vectors are everything else. Last lesson, we worked on creating and subsetting atomic vectors.
Object Dimensions
The rules and techniques for subsetting data depend on two things:
- Is the object atomic or non-atomic? E.g., an integer vector vs a list.
- Is the object 1-dimensional or higher-dimensional? E.g., a list vs a tibble.
For these considerations, we use the class of the object to decide the proper subsetting rules (rather than the type).
Here is a table of example vector types, their dimensions, and their resulting classes (this is not exhaustive):
Atomic | Non-Atomic | |
---|---|---|
1-D | logical |
list |
2-D | matrix |
data.frame |
Lists: 1-D Non-Atomic Vectors
Example Data
Names and Ages
Last lesson, we had created a basic vector for my sibling’s ages, named with their first names.
Electronic ID Object
Also, recall the ID vector we created in the last lesson:
<- c(
DrGabriel Surname = "Odom",
FirstName = "Gabriel",
HighestDegrees = "PhD, ThD",
Age = 31,
City = "Pembroke Pines",
State = "FL",
MovedFrom = "TX",
timeEmployed = 2.1
)
We discovered last lesson that the values for Age
and timeEmployed
were coerced from numeric to character information due to the restrictions of atomic vectors. So far, we have created ID vectors for ourselves, but these do not allow us to mix different classes of data. We need something more flexible than atomic vectors. Non-atomic vectors are vectors such that the type / class of each element in the vector is not forced to be the same.
Creating Lists
The most common example of a non-atomic vector is a list. Lists are vectors that can have any class of object as its elements—even other lists! They are the opposite of atomic. We have seen one such example so far: the attributes of an atomic vector are a list:
attributes(DrGabriel)
$names
[1] "Surname" "FirstName" "HighestDegrees" "Age"
[5] "City" "State" "MovedFrom" "timeEmployed"
class(attributes(DrGabriel))
[1] "list"
Lists give us flexibility to create an ID vector that has different classes of information, including other vectors! We create a new list with the function list()
:
# Create a List
<- list(
DrGabriel_ls Surname = "Odom",
Forename = "Gabriel",
Male = TRUE,
City = "Pembroke Pines",
State = factor("FL", levels = state.abb),
CurrZIP = 33025,
MovedFrom = "TX",
Age = 31,
MaxDegree = c("PhD", "ThD"),
TimeEmpl = 2.1
)
# Inspect
DrGabriel_ls
$Surname
[1] "Odom"
$Forename
[1] "Gabriel"
$Male
[1] TRUE
$City
[1] "Pembroke Pines"
$State
[1] FL
50 Levels: AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD ... WY
$CurrZIP
[1] 33025
$MovedFrom
[1] "TX"
$Age
[1] 31
$MaxDegree
[1] "PhD" "ThD"
$TimeEmpl
[1] 2.1
If we want to inspect the internal structure of this list, we can see that the classes of our original atomic vectors have been preserved. There is no type coercion in a list.
str(DrGabriel_ls)
List of 10
$ Surname : chr "Odom"
$ Forename : chr "Gabriel"
$ Male : logi TRUE
$ City : chr "Pembroke Pines"
$ State : Factor w/ 50 levels "AL","AK","AZ",..: 9
$ CurrZIP : num 33025
$ MovedFrom: chr "TX"
$ Age : num 31
$ MaxDegree: chr [1:2] "PhD" "ThD"
$ TimeEmpl : num 2.1
As with the atomic vectors, we can also find what R thinks about lists and how R stores this information:
class(DrGabriel_ls)
[1] "list"
typeof(DrGabriel_ls)
[1] "list"
Tibbles: 2-D Non-Atomic Vectors
A tibble is the representation of data in the format of a table where the columns are vectors that all have the same length. Because columns are vectors, each column must contain a single type of data (e.g., characters, integers, logicals, etc). For example, here is a figure depicting a tibble comprised of numeric and character atomic vectors.
Creating Tibbles
In order to use tibbles, we need functions from the tidyverse
. A tibble can be created by hand, but most commonly we generate them by importing tabular / spreasheet data (with the functions read_csv()
or read_delim()
; more on these in a week or so).
library(tidyverse)
We can create the same ID card as we did with the list()
function, but in a rectangular / tabular form. We create a tibble with the tibble()
function. Similar to the c()
function, we can supply as many inputs as we want, as long as they are all vectors with the same length (notice that I changed the “highest degrees” back to a character vector with length 1 because of this restriction).
# Create a Tibble
<- tibble(
DrGabriel_df Surname = "Odom",
FirstName = "Gabriel",
HighestDegrees = "PhD, ThD",
Age = 31,
City = "Pembroke Pines",
State = "FL",
MovedFrom = "TX",
TimeEmployed = 2.1
)
# Inspect
DrGabriel_df
# A tibble: 1 × 8
Surname FirstName HighestDegrees Age City State MovedFrom TimeEmployed
<chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
1 Odom Gabriel PhD, ThD 31 Pembroke … FL TX 2.1
(Technically I could have left the degrees entry as an atomic vector with 2 entries, but the tibble()
function would have to try to figure out what I meant. Sometimes it guesses correctly, sometimes not. We will discuss the tibble()
function more later.)
As with the atomic vectors, we can also find what R thinks about lists and how R stores this information:
class(DrGabriel_df)
[1] "tbl_df" "tbl" "data.frame"
typeof(DrGabriel_df)
[1] "list"
Notice that R still stores a tibble as a list (we see this from the typeof()
function). In a tibble, all the individual columns
- must have the same class,
- must have the same length.
While these two restrictions apply within the columns, the entire data set often has different classes of data in each column. Here’s an example of superhero data:
<- tibble(
heroes_df subject_ID = factor(c("008", "016", "115", "027", "001")),
name = c(
"Wonder Woman", "Green Lantern", "Spider-Man", "Batman", "Superman"
),alias = c(
"Diana Prince", "Alan Scott", "Peter Parker", "Bruce Wayne",
"Clark Kent / Kal-El"
),city = c(
"Gateway City", "Capitol City", "New York City", "Gotham", "Metropolis"
),male = c(FALSE, TRUE, TRUE, TRUE, TRUE),
heightCM = c(183.5, 182.9, 177.8, 188.0, 190.5),
weightKg = c(74.8, 91.2, 75.7, 95.3, 106.6),
firstRun = c(1941L, 1940L, 1962L, 1939L, 1938L)
)
Notice that when I create this tibble, each of the entries (columns) are the same length (5 elements each), within each column the classes do not change (column heightCM
is all numeric, column city
is all character, etc.).
We check what this looks like:
heroes_df
# A tibble: 5 × 8
subject_ID name alias city male heightCM weightKg firstRun
<fct> <chr> <chr> <chr> <lgl> <dbl> <dbl> <int>
1 008 Wonder Woman Diana Prince Gate… FALSE 184. 74.8 1941
2 016 Green Lantern Alan Scott Capi… TRUE 183. 91.2 1940
3 115 Spider-Man Peter Parker New … TRUE 178. 75.7 1962
4 027 Batman Bruce Wayne Goth… TRUE 188 95.3 1939
5 001 Superman Clark Kent / … Metr… TRUE 190. 107. 1938
We see that the classes of the columns are displayed under the column names.
(OPTIONAL) Data Frames
Data frames are and have been the de facto data structure in R for most tabular data for over two decades. They are what people commonly used for statistics and plotting. However, tibbles are simply a modern hybrid of a table and a data frame, which includes many nice attributes, properties, and features. The main difference between tibbles and data frames is that data frames can only have columns of atomic vectors; the other restrictions are the same: only one class per column and all columns have the same length.
Moving forward, we will make use of tibbles, but you will most likely see older code that uses data frames as you learn more of R. We create the “antique” tibble with the data.frame()
function.
# Create a Data Frame
<- data.frame(
heroes_OldDF subjectID = factor(c("008", "016", "115", "027", "001")),
name = c("Wonder Woman", "Green Lantern", "Spider-Man", "Batman", "Superman"),
alias = c(
"Diana Prince", "Alan Scott", "Peter Parker", "Bruce Wayne",
"Clark Kent / Kal-El"
),city = c(
"Gateway City", "Capitol City", "New York City", "Gotham", "Metropolis"
),male = c(FALSE, TRUE, TRUE, TRUE, TRUE),
heightCM = c(183.5, 182.9, 177.8, 188.0, 190.5),
weightKg = c(74.8, 91.2, 75.7, 95.3, 106.6),
firstRun = c(1941L, 1940L, 1962L, 1939L, 1938L)
)
# Inspect
heroes_OldDF
subjectID name alias city male heightCM
1 008 Wonder Woman Diana Prince Gateway City FALSE 183.5
2 016 Green Lantern Alan Scott Capitol City TRUE 182.9
3 115 Spider-Man Peter Parker New York City TRUE 177.8
4 027 Batman Bruce Wayne Gotham TRUE 188.0
5 001 Superman Clark Kent / Kal-El Metropolis TRUE 190.5
weightKg firstRun
1 74.8 1941
2 91.2 1940
3 75.7 1962
4 95.3 1939
5 106.6 1938
If you want some more details on this data frame, use the str()
function:
str(heroes_OldDF)
'data.frame': 5 obs. of 8 variables:
$ subjectID: Factor w/ 5 levels "001","008","016",..: 2 3 5 4 1
$ name : chr "Wonder Woman" "Green Lantern" "Spider-Man" "Batman" ...
$ alias : chr "Diana Prince" "Alan Scott" "Peter Parker" "Bruce Wayne" ...
$ city : chr "Gateway City" "Capitol City" "New York City" "Gotham" ...
$ male : logi FALSE TRUE TRUE TRUE TRUE
$ heightCM : num 184 183 178 188 190
$ weightKg : num 74.8 91.2 75.7 95.3 106.6
$ firstRun : int 1941 1940 1962 1939 1938
Inspecting Tibbles
We already saw how the printing the tibble and caling the str()
function can be useful to check the content and the structure of data. Because we want to see something more interesting than our ID card, let’s take a look at the mpg
data set from last week.
mpg
# A tibble: 234 × 11
manufacturer model displ year cyl trans drv cty hwy fl class
<chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
1 audi a4 1.8 1999 4 auto… f 18 29 p comp…
2 audi a4 1.8 1999 4 manu… f 21 29 p comp…
3 audi a4 2 2008 4 manu… f 20 31 p comp…
4 audi a4 2 2008 4 auto… f 21 30 p comp…
5 audi a4 2.8 1999 6 auto… f 16 26 p comp…
6 audi a4 2.8 1999 6 manu… f 18 26 p comp…
7 audi a4 3.1 2008 6 auto… f 18 27 p comp…
8 audi a4 quattro 1.8 1999 4 manu… 4 18 26 p comp…
9 audi a4 quattro 1.8 1999 4 auto… 4 16 25 p comp…
10 audi a4 quattro 2 2008 4 manu… 4 20 28 p comp…
# ℹ 224 more rows
Here is a non-exhaustive list of functions to get a sense of the content/structure of your data. Note that—while these functions are very useful, we get most of this information for free if we are using tibbles (one of the main reasons we use tibbles instead of the older data frame structure).
- Structure:
str(mpg)
: a list of the vectors used to create thempg
data set, their classes, and their lengths (this includes excerpts from most of the functions below)
- Size:
dim(mpg)
: a vector with the number of rows in the first element, and the number of columns as the second element (these are the dimensions of the object)nrow(mpg)
: the number of rowsncol(mpg)
: the number of columns
- Content:
head(mpg)
: the first 6 rowstail(mpg)
: the last 6 rowssummary(mpg)
: summary statistics for each numeric column
- Names:
names(mpg)
orcolnames(mpg)
: the column namesrownames(mpg)
: the row names (only for old-schooldata.frame
objects)
Note: the head
, tail
, summary
, and names
functions are “generic”; that is, they can be used on most other types of objects besides tibbles, including data frames and even atomic vectors. The dim
, nrow
, and ncol
functions are also generic, but they are only applicable to rectangular, layered, or tabular data.
Positional Subsetting
Positive Positional Subsetting
Lists
Let’s extract the eighth element (my age) of a non-atomic vector:
8] DrGabriel_ls[
$Age
[1] 31
Remember that [
takes in the position of the element of the vector and returns the contents at that position. One of the special things about non-atomic vectors (lists) is that their sub-contents are themselves lists.
class(DrGabriel_ls[8])
[1] "list"
In its list form, this value doesn’t help us very much. For instance, we can’t find my age in months with this:
8] * 12 DrGabriel_ls[
Error in DrGabriel_ls[8] * 12: non-numeric argument to binary operator
In order to extract the contents of the inner list (the contents of the sixth position of DrGabriel_ls
), we need to use a second-level extractor:
8]] DrGabriel_ls[[
[1] 31
8]] * 12 DrGabriel_ls[[
[1] 372
This is the primary difference between atomic and non-atomic positional subsetting. Consider this figure:
The original object x
is a list (the pepper container), and the first element of x
is also a list (x[1]
; still the pepper container, but now with only one packet of pepper). The contents of the first element’s inner list is an atomic vector (x[[1]]
; the packet of pepper). To extract the contents of the x[[1]]
atomic vector (the pepper), we subset one last time. Because x[[1]]
itself is atomic, we can subset it by either [
or [[
; common standards have use use [
for atomic vectors, however.
Tibbles
To extract the element in the second row and third column from a tibble, we can type (using row, column notation):
2, 3] heroes_df[
# A tibble: 1 × 1
alias
<chr>
1 Alan Scott
To extract the third and fourth row within the first and third column:
3:4, c(1, 3)] heroes_df[
# A tibble: 2 × 2
subject_ID alias
<fct> <chr>
1 115 Peter Parker
2 027 Bruce Wayne
To extract an entire column:
2] heroes_df[,
# A tibble: 5 × 1
name
<chr>
1 Wonder Woman
2 Green Lantern
3 Spider-Man
4 Batman
5 Superman
Negative Positional Subsetting
Lists
Using the same negative subsetting framework, I’m going to “de-identify” my electronic ID card.
-(1:2)] DrGabriel_ls[
$Male
[1] TRUE
$City
[1] "Pembroke Pines"
$State
[1] FL
50 Levels: AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD ... WY
$CurrZIP
[1] 33025
$MovedFrom
[1] "TX"
$Age
[1] 31
$MaxDegree
[1] "PhD" "ThD"
$TimeEmpl
[1] 2.1
Tibbles
For tibbles, negative subsetting uses the same row, column notation as it does for positive subsetting.
-1, ] heroes_df[
# A tibble: 4 × 8
subject_ID name alias city male heightCM weightKg firstRun
<fct> <chr> <chr> <chr> <lgl> <dbl> <dbl> <int>
1 016 Green Lantern Alan Scott Capi… TRUE 183. 91.2 1940
2 115 Spider-Man Peter Parker New … TRUE 178. 75.7 1962
3 027 Batman Bruce Wayne Goth… TRUE 188 95.3 1939
4 001 Superman Clark Kent / … Metr… TRUE 190. 107. 1938
-c(1, 3), -c(1, 3, 4)] heroes_df[
# A tibble: 3 × 5
name male heightCM weightKg firstRun
<chr> <lgl> <dbl> <dbl> <int>
1 Green Lantern TRUE 183. 91.2 1940
2 Batman TRUE 188 95.3 1939
3 Superman TRUE 190. 107. 1938
Relational Subsetting
The main idea of relational subsetting is to keep values from tabular data based on their comparison to another set of values. This are the steps:
- Extract a subset of a tibble using your preferred subsetting rules
- Using the logical comparison operators, reduce the subset you extracted in step 1 to a logical atomic vector.
- Subset the full data using this logical atomic vector.
Boolean Subsetting
Because relational subsetting is Boolean (logical) subsetting with an additional step, we first discuss how logical subsetting rules work for non-atomic vectors. For lists, these rules work the same for lists in one dimension as they do for atomic vectors in one dimension.
Tibbles, however, are different. Recall that tibbles have positional subsetting properties through the row, column
syntax and also the positional subsetting properties of lists. These properties hold true for logical subsetting.
Row and Column Subsetting
We can create logical atomic vectors of the same lengths as the number of rows and columns of the tibble (as returned by the dim()
function).
# Dimension of the tibble:
dim(heroes_df)
[1] 5 8
# Logical vector to match the rows (keep the last two):
c(rep(FALSE, 3), rep(TRUE, 2))
[1] FALSE FALSE FALSE TRUE TRUE
# Logical vector to match the columns (keep the last four):
rep(c(FALSE, TRUE), each = 4)
[1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
# Subset the tibble:
heroes_df[# rows
c(rep(FALSE, 3), rep(TRUE, 2)),
# columns
rep(c(FALSE, TRUE), each = 4)
]
# A tibble: 2 × 4
male heightCM weightKg firstRun
<lgl> <dbl> <dbl> <int>
1 TRUE 188 95.3 1939
2 TRUE 190. 107. 1938
Did you notice that I broke the last expression up over multiple lines? This is to make the code easier for humans to read. Another option I could have done is this (this is usually what I do):
# Logical vector to match the rows (keep the last two):
<- c(rep(FALSE, 3), rep(TRUE, 2))
keepRows_lgl # Logical vector to match the columns (keep the last four):
<- rep(c(FALSE, TRUE), each = 4)
keepCols_lgl
# Subset the tibble:
heroes_df[keepRows_lgl, keepCols_lgl]
# A tibble: 2 × 4
male heightCM weightKg firstRun
<lgl> <dbl> <dbl> <int>
1 TRUE 188 95.3 1939
2 TRUE 190. 107. 1938
List Subsetting
Because tibbles are also lists, we can subset the columns of the tibble using a logical atomic vector with length equal to the number of columns. This code will let us keep every other column starting with the second column:
rep(c(FALSE, TRUE), times = 4)] heroes_df[
# A tibble: 5 × 4
name city heightCM firstRun
<chr> <chr> <dbl> <int>
1 Wonder Woman Gateway City 184. 1941
2 Green Lantern Capitol City 183. 1940
3 Spider-Man New York City 178. 1962
4 Batman Gotham 188 1939
5 Superman Metropolis 190. 1938
Character Set Membership
For numeric and logical atomic vectors, we can make easy Boolean comparisons. “Is this value TRUE
?” “Are these numbers less than 0?” Unfortunately, comparisons for character vectors are not as easy. We need new functions to measure if one character string belongs to a character vector. However, character set membership operators share this with other comparison operators: they must return logical values.
Exact Matching
If we want to find if one string is an element of a set of strings exactly, we can use the “in” operator: %in%
. This is helpful if you have many possible spellings or abbreviations of the same word or idea:
"Florida" %in% c("FL", "Fl", "fl", "Florida", "florida")
[1] TRUE
"Georgia" %in% c("FL", "Fl", "fl", "Florida", "florida")
[1] FALSE
This matching is exact: the character string on the left of %in%
must match exactly to one of the elements of the character vector on the right. This means the following checks will fail:
"Florida State" %in% c("FL", "Fl", "fl", "Florida", "florida")
[1] FALSE
"Florida" %in% c("FL", "Fl", "fl", "State of Florida", "florida")
[1] FALSE
Partial Matching
If we want to find if a string is part of another string, we use the string detection function: str_detect()
. This function is from the stringr
package, which is part of the tidyverse
(more on this later this semester).
str_detect(
string = c("FL", "Fl", "fl", "State of Florida", "florida", "Florida"),
pattern = "Florida"
)
[1] FALSE FALSE FALSE TRUE FALSE TRUE
We know that the labels "State of Florida"
and "Florida"
match our search, but "florida"
does not.
Relational Subsetting for Tibbles
The tibble uses both the row, column and list subsetting rules, which gives it more flexibility than lists alone. We can use relational subsetting to get at some powerful results.
Numerical Relationships
If we would like to see which superheroes were in print prior to the US involvement in WWII, we will use a numeric comparison of the first print run year vs 1941.
# Step 1: subset the first print year of the superheroes:
"firstRun"]] heroes_df[[
[1] 1941 1940 1962 1939 1938
# Step 2: compare these print years to the start of WWII for the US:
"firstRun"]] <= 1941 heroes_df[[
[1] TRUE TRUE FALSE TRUE TRUE
# Step 3: subset the tibble to return only the pre-war heroes
"firstRun"]] <= 1941, ] heroes_df[heroes_df[[
# A tibble: 4 × 8
subject_ID name alias city male heightCM weightKg firstRun
<fct> <chr> <chr> <chr> <lgl> <dbl> <dbl> <int>
1 008 Wonder Woman Diana Prince Gate… FALSE 184. 74.8 1941
2 016 Green Lantern Alan Scott Capi… TRUE 183. 91.2 1940
3 027 Batman Bruce Wayne Goth… TRUE 188 95.3 1939
4 001 Superman Clark Kent / … Metr… TRUE 190. 107. 1938
Character Relationships
How many heroes include the name “man” somewhere in their name? (We will learn more about the str_detect()
function in the stringr::
lesson; for now, note that it returns TRUE
or FALSE
values.)
# Step 1: subset the names of the superheroes:
"name"]] heroes_df[[
[1] "Wonder Woman" "Green Lantern" "Spider-Man" "Batman"
[5] "Superman"
# Step 2: compare these names to the character strings "man":
str_detect(string = heroes_df[["name"]], pattern = "man")
[1] TRUE FALSE FALSE TRUE TRUE
# Step 3: subset the tibble to return only the heroes with "man" in their names
str_detect(string = heroes_df[["name"]], pattern = "man"), ] heroes_df[
# A tibble: 3 × 8
subject_ID name alias city male heightCM weightKg firstRun
<fct> <chr> <chr> <chr> <lgl> <dbl> <dbl> <int>
1 008 Wonder Woman Diana Prince Gate… FALSE 184. 74.8 1941
2 027 Batman Bruce Wayne Goth… TRUE 188 95.3 1939
3 001 Superman Clark Kent / K… Metr… TRUE 190. 107. 1938
Lists
The rules for 1-dimensional non-atomic vectors are very different. The main difference is that most relational functions, such as str_detect()
, %in%
, <
, and others don’t always work on lists.
Named Subsetting
Lists
We know that the electronic ID list we made has names, but we can check to be sure:
names(DrGabriel_ls)
[1] "Surname" "Forename" "Male" "City" "State" "CurrZIP"
[7] "MovedFrom" "Age" "MaxDegree" "TimeEmpl"
Therefore, we can extract the contents of this list by position name:
"Age"] DrGabriel_ls[
$Age
[1] 31
Tibbles
All of the same name-based extraction / subsetting rules that work on the names of list elements work on tibbles, because a tibble is a type of list (check this with the typeof()
function):
"alias"] heroes_df[
# A tibble: 5 × 1
alias
<chr>
1 Diana Prince
2 Alan Scott
3 Peter Parker
4 Bruce Wayne
5 Clark Kent / Kal-El
"alias"]] heroes_df[[
[1] "Diana Prince" "Alan Scott" "Peter Parker"
[4] "Bruce Wayne" "Clark Kent / Kal-El"
However, we also get the column-based subsetting options too!
"alias"] heroes_df[,
# A tibble: 5 × 1
alias
<chr>
1 Diana Prince
2 Alan Scott
3 Peter Parker
4 Bruce Wayne
5 Clark Kent / Kal-El
The $
Operator
There is one other subsetting operator that only works for name-based subsetting of non-atomic vectors, the $
function. We can subset any object of type list
with the $
function in the following manner:
# From a list
$MaxDegree DrGabriel_ls
[1] "PhD" "ThD"
# From a tibble
$name heroes_df
[1] "Wonder Woman" "Green Lantern" "Spider-Man" "Batman"
[5] "Superman"
What is the class of the object returned by $
?
class(heroes_df$city)
[1] "character"
Symbol Objects
This function is very different. Instead of taking in the name of the element as a character string, it takes in the name as a symbol. If you remember our flowchart on objects from last class, recall that there was a small box connecting “Functions” to “Vectors” labelled “Languages”; symbols are language objects: they help R connect functions to vectors. You have technically used symbols since the second day of class. Consider this example:
<- 1:10
myIntegers mean(x = myIntegers)
Notice that I did not type myIntegers
in quotes (as "myIntegers"
). When R sees a reference to something other than all letters, crazy symbols, or TRUE
/ FALSE
, R interprets what you typed as a symbol to be evaluated (think back to our rules for naming objects in lesson 5). R will look in your Global Environment for the object named what you typed. In the example above, R isn’t taking the mean of the letters myIntegers
, but rather replacing that symbol with the atomic vector 1:10
.
Symbol Subsetting
We have seen this behaviour before, when we created plots with ggplot
. The aes()
function took in displ
and hwy
as symbols, and looked for objects with those names in the supplied mpg
data set instead of in the Global environment. Because this is a very complex but powerful tool, we will return to it in a few lessons.