PHC 6701: R for Data Science // Advanced R

Author

Gabriel J. Odom, PhD, ThD

About

These are the written lecture materials for the class PHC 6701 (commonly known as “Advanced R”) at Florida International University’s Stempel College of Public Health. The source code and data sets for this book are available here: https://github.com/gabrielodom/PHC6701_r4ds.

The videos for the course material are all in playlists on YouTube. The playlists are numbered to correspond with the chapters of this book. The channel is here: https://www.youtube.com/@OdomScience/playlists

Getting Started

Complete all of the steps below before the first day of class (I’ll be available via Zoom if you get stuck on any of these steps):

  1. Set up your computer:
  2. Install R/RStudio:
  3. Configure RStudio: https://derailment.netlify.app/2019-12-22-configuring-rstudio/
  4. (OPTIONAL) Join the community of R programmers on Slack: https://rfordatasci.com/.
  5. Sign up for GitHub using your student email address: https://docs.github.com/en/get-started/quickstart/creating-an-account-on-github
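
Once steps 1 through 3 are done, a quick check in the R console can confirm that R runs and show which packages are still missing. This is only a minimal sketch: the package list below is an assumption based on the functions mentioned in the assessment questions, not an official course requirement, and the object names are hypothetical.

```r
# Report the installed R version (base R only; no packages required).
r_version <- R.version.string
print(r_version)

# ASSUMPTION: packages inferred from the assessment questions,
# not an official list from the course.
pkgs_char <- c("readr", "dplyr", "tidyr", "stringr", "purrr", "ggplot2")

# requireNamespace() returns TRUE or FALSE without attaching the package.
installed_lgl <- vapply(
  pkgs_char, requireNamespace, logical(1), quietly = TRUE
)
print(installed_lgl)

# Names of any packages that still need to be installed.
missing_char <- pkgs_char[!installed_lgl]
print(missing_char)
```

If `missing_char` is not empty, those packages can be installed with `install.packages(missing_char)`.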

Pre-Course Learning Assessment

Here are some questions about R basics. I don’t expect you to know the answers to these questions now (but it’s nice if you do). By the end of the semester, however, you will be able to answer all of them correctly.

Questions for the R component of the final exam:

  1. Everything in R is an __________.

  2. Almost everything R does uses a __________.

  3. You want to create an atomic vector of stoplight colours named by their meaning. Which option should you use?

    1. list(stop = "red", caution = "yellow", go = "green")
    2. list(go = "red", drive_faster = "yellow", stop_and_check_Twitter = "green")
    3. tibble(stop = "red", caution = "yellow", go = "green")
    4. c(stop = "red", caution = "yellow", go = "green")
  4. You want to save your stoplight colours as an object named colours_char. What is the appropriate function (based on our style guide) to use in order to assign a value to this object?

  5. Which expressions can be used to extract the “red” component of the colours_char atomic vector? Select all that apply (each correct answer is worth 1/8 of a point).

    1. colours_char["stop"]
    2. colours_char[["stop"]]
    3. colours_char$"stop"
    4. colours_char$stop
    5. colours_char[1]
    6. colours_char[[1]]
    7. colours_char@"stop"
    8. colours_char@stop
  6. What are the four most common types of atomic vectors in R, ordered from least to most complex?

  7. We now create a non-atomic vector named patients_ls to store patient demographics. Which expressions can be used to extract the “age” information (in the 5th position of the vector)? Select all that apply (each correct answer is worth 1/8 of a point).

    1. patients_ls["age"]
    2. patients_ls[["age"]]
    3. patients_ls$"age"
    4. patients_ls$age
    5. patients_ls[5]
    6. patients_ls[[5]]
    7. patients_ls@"age"
    8. patients_ls@age
  8. Which helper functions tell us what kind of object x is?

    1. str(x); whatIs(x); typeof(x)
    2. str(x); class(x); typeof(x)
    3. str(x); class(x); whatIs(x)
    4. whatIs(x); class(x); typeof(x)
  9. True or False: a list is a vector.

  10. If you see the error “could not find function "read_csv"” or “could not find function "%>%"”, what code will probably fix it?

  11. True or False: a tibble is a list.

  12. True or False: a tibble is an atomic vector.

  13. In R, what is the difference between a matrix and a data frame?

  14. Choose the Tidyverse functions that we have used to operate on columns of a tibble. Select all that apply.

    1. filter()
    2. select() / rename()
    3. mutate()
    4. group_by()
    5. summarise()
    6. pull()
    7. arrange()
    8. pivot_wider() / pivot_longer()
    9. left_join() / full_join()
  15. Choose the Tidyverse functions that we have used to operate on rows of a tibble. Select all that apply.

    1. filter()
    2. select() / rename()
    3. mutate()
    4. group_by()
    5. summarise()
    6. pull()
    7. arrange()
    8. pivot_wider() / pivot_longer()
    9. left_join() / full_join()
  16. Write the R code to create a function that normalizes a numeric vector. Include complete documentation and a basic example for this function.

  17. Some of your data has a free-text field containing ZIP codes, stored in an object called cityStateZIP_char. What will str_extract(cityStateZIP_char, pattern = "\\d{5}") do?

  18. You have 100 .csv files in a directory called data_raw/ that contains no other files. Write a basic batch import command using the purrr package.

  19. Label the three main groups of layers in the following basic ggplot graph.

ggplot(data = mtcars) +                   # A
  aes(x = disp, y = mpg, color = cyl) +   # B
  geom_point(alpha = 0.75)                # C

Line “A” is the ______________________________ layer.
Line “B” is the ______________________________ layer.
Line “C” is the ______________________________ layer.