R for Data Science Pre-Assessment

This pre-assessment covers some R basics. I don’t expect you to know the answers to any of these questions now (but it’s nice if you do). By the end of the semester, however, you will be able to answer all of them correctly. I have partitioned this course into seven blocks, each roughly two weeks long, and the questions below test your knowledge for each block.

Introduction and Basic Scripting

  1. Data Science is the intersection of which three domains of expertise?
  2. Everything in R is an __________.
  3. What is the appropriate syntax (based on the Tidyverse Style Guide) for the assignment operator in R?

Quarto and ggplot2

  1. In a Quarto document, in which place(s) will you find code written in YAML?
  2. Quarto documents combine computing languages, such as R, Python, and SQL, with the meta-language YAML. What is the other major language used in a Quarto document, and where will you find it used?
  3. Label the three main groups of layers in the following basic ggplot2 graph.
library(ggplot2)

ggplot(data = mtcars) +                   # A
  aes(x = disp, y = mpg, color = cyl) +   # B
  geom_point(alpha = 0.75)                # C

Line “A” is the ______________________________ layer.
Line “B” is the ______________________________ layer.
Line “C” is the ______________________________ layer.

Data Structures, Part 1

  1. What are the four most common types of atomic vectors in R, ordered from least to most complex?

  2. You want to create an atomic vector of stoplight colours named by their meaning. Which option should you use?

    1. list(stop = "red", caution = "yellow", go = "green")
    2. list(go = "red", drive_faster = "yellow", stop_and_check_Twitter = "green")
    3. tibble(stop = "red", caution = "yellow", go = "green")
    4. c(stop = "red", caution = "yellow", go = "green")
  3. Suppose we assign the vector above to the name colours_char. Which expressions can be used to extract the “red” component of the colours_char atomic vector? Select all that apply.

    1. colours_char["stop"]
    2. colours_char[["stop"]]
    3. colours_char$"stop"
    4. colours_char$stop
    5. colours_char[1]
    6. colours_char[[1]]
    7. colours_char@"stop"
    8. colours_char@stop

Data Structures, Part 2

  1. Which set of helper functions tells us what kind of object an object named x is?

    1. str(x); whatIs(x); typeof(x)
    2. str(x); class(x); typeof(x)
    3. str(x); class(x); whatIs(x)
    4. whatIs(x); class(x); typeof(x)
  2. True or False:

    1. a list is a vector.
    2. a tibble is a list.
    3. a tibble is an atomic vector.
  3. In R, what is the difference between a matrix and a data.frame?

  4. We now create a non-atomic vector named patients_ls to store patient demographics. Which expressions can be used to extract the “age” information (in the 5th position of the vector)? Select all that apply.

    1. patients_ls["age"]
    2. patients_ls[["age"]]
    3. patients_ls$"age"
    4. patients_ls$age
    5. patients_ls[5]
    6. patients_ls[[5]]
    7. patients_ls@"age"
    8. patients_ls@age
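
If you would like to try the options above at the console, a stand-in for patients_ls can be sketched as below. This is only an illustration: the question fixes nothing beyond an “age” element in the 5th position, so every other field name and all of the values here are made up.

```r
# Hypothetical patient record: only `age` in position 5 comes from the
# question; every other field name and all values are invented.
patients_ls <- list(
  id         = "P001",
  first_name = "Alex",
  last_name  = "Doe",
  sex        = "F",
  age        = 36L,
  smoker     = FALSE
)

# Inspect the structure before trying the extraction options.
str(patients_ls)
```

Run each option one at a time and compare what comes back with what you predicted.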

dplyr

  1. If you see the error 'could not find function "read_csv"' or 'could not find function "%>%"', what code will probably fix it?

  2. Choose the Tidyverse functions that we have used to operate on columns of a tibble. Select all that apply.

    1. filter()
    2. select() / rename()
    3. mutate()
    4. group_by()
    5. summarise()
    6. pull()
    7. arrange()
    8. pivot_wider() / pivot_longer()
    9. left_join() / full_join()
  3. Choose the Tidyverse functions that we have used to operate on rows of a tibble. Select all that apply.

    1. filter()
    2. select() / rename()
    3. mutate()
    4. group_by()
    5. summarise()
    6. pull()
    7. arrange()
    8. pivot_wider() / pivot_longer()
    9. left_join() / full_join()

Functions and Iteration

  1. Almost everything R does uses a __________.
  2. What is the default syntax to return a value from a function in R?
  3. Write the R code to create a function that normalizes a numeric vector. Include complete documentation and a basic example for this function.
  4. You have 100 .csv files in a directory called data_raw/ that contains no other files. Write a basic batch import command using the purrr package.
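
To practise the batch import in question 4 without real data, a throwaway directory can be set up along these lines. This is a sketch under assumptions: the file names, column names, and the use of the session's temporary folder are all hypothetical, and only base R is used so that the purrr part is left for you to write.

```r
# Build a throwaway data_raw/ directory inside the session's temp folder.
data_dir <- file.path(tempdir(), "data_raw")
dir.create(data_dir, showWarnings = FALSE)

# Write 100 small CSV files: file_001.csv, ..., file_100.csv
# (hypothetical names and columns).
for (i in seq_len(100)) {
  toy_df <- data.frame(file_id = i, x = seq_len(3), y = seq_len(3) * i)
  write.csv(
    toy_df,
    file = file.path(data_dir, sprintf("file_%03d.csv", i)),
    row.names = FALSE
  )
}

# Confirm how many CSV files the directory holds.
length(list.files(data_dir, pattern = "\\.csv$"))
```

Point your import command at data_dir instead of data_raw/ when testing it.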

Strings, Git+GitHub, and Tidy Workflows

  1. Some of your data has a free-text field with ZIP codes, stored in an object called cityStateZIP_char. What will str_extract(cityStateZIP_char, pattern = "\\d{5}") do?
  2. What is Git?
  3. What should be your first step when writing a new data pipeline?
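
To check your prediction for question 1 of this block, the call can be run on a small made-up vector. The sketch below assumes the stringr package is installed. The three entries are invented for illustration, and the last one deliberately has no ZIP code.

```r
library(stringr)

# Made-up free-text entries; none of these come from real data.
cityStateZIP_char <- c(
  "Miami, FL 33199",
  "Springfield, IL 62701",
  "Nowhere, ZZ"
)

# The call from the question; inspect `zips` to check your prediction.
zips <- str_extract(cityStateZIP_char, pattern = "\\d{5}")
zips
```

Try adding an entry with a ZIP+4 code or with two ZIP codes in one string to see how the pattern behaves.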