R for Data Science Pre-Assessment
Here are some questions about R basics. I don’t expect you to know the answers to any of these questions now (but it’s nice if you do). By the end of the semester, however, you will be able to answer all of them correctly. I have partitioned this course into seven blocks, each roughly two weeks long. Here are some questions to test your knowledge for each block.
Introduction and Basic Scripting
- Data Science is the intersection of which three domains of expertise?
- Everything in R is an __________.
- What is the appropriate syntax (based on the Tidyverse Style Guide) for the assignment operator in R?
Quarto and ggplot2
- In a Quarto document, in which place(s) will you find YAML code?
- Quarto documents combine computing languages, such as R, Python, and SQL, with the meta-language YAML. What is the other major programming language used in a Quarto document, and where will you find it used?
- Label the three main groups of layers in the following basic ggplot2 graph.

    ggplot(data = mtcars) + # A
      aes(x = disp, y = mpg, color = cyl) + # B
      geom_point(alpha = 0.75) # C

  Line “A” is the ______________________________ layer.
  Line “B” is the ______________________________ layer.
  Line “C” is the ______________________________ layer.
Data Structures, Part 1
- What are the four most common types of atomic vectors in R, ordered from least to most complex?
- You want to create an atomic vector of stoplight colours named by their meaning. Which option should you use?
  - list(stop = "red", caution = "yellow", go = "green")
  - list(go = "red", drive_faster = "yellow", stop_and_check_Twitter = "green")
  - tibble(stop = "red", caution = "yellow", go = "green")
  - c(stop = "red", caution = "yellow", go = "green")
- We assign the vector above to the name colours_char. Which expressions can be used to extract the “red” component of the colours_char atomic vector? Select all that apply.
  - colours_char["stop"]
  - colours_char[["stop"]]
  - colours_char$"stop"
  - colours_char$stop
  - colours_char[1]
  - colours_char[[1]]
  - colours_char@"stop"
  - colours_char@stop
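
If you would like to check your answer to the extraction question by experiment rather than from memory, the sketch below is one way to do it. It uses base R only. It assumes colours_char holds the named character vector from the previous question; the helper names candidates and extracts_red are my own and are not part of the course materials. Each candidate expression is evaluated inside tryCatch(), so an expression that throws an error is recorded as FALSE instead of stopping the script.

```r
# Assumption: colours_char is the named character vector from the question above
colours_char <- c(stop = "red", caution = "yellow", go = "green")

# The candidate expressions from the question, stored as text so that an
# invalid one cannot halt the script before the others are tried
candidates <- c(
  'colours_char["stop"]',
  'colours_char[["stop"]]',
  'colours_char$"stop"',
  'colours_char$stop',
  'colours_char[1]',
  'colours_char[[1]]',
  'colours_char@"stop"',
  'colours_char@stop'
)

# TRUE if the expression runs without error and its value (ignoring names)
# is exactly "red"; FALSE if it errors or returns something else
extracts_red <- vapply(
  candidates,
  function(code) {
    tryCatch(
      identical(unname(eval(parse(text = code))), "red"),
      error = function(e) FALSE
    )
  },
  logical(1)
)

print(extracts_red)
```

Running this prints a named logical vector with one entry per candidate. Changing the first line to build a list instead of an atomic vector is a quick way to see how the same operators behave on a different data structure.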
Data Structures, Part 2
- What helper functions tell us what kind of object an object named x is?
  - str(x); whatIs(x); typeof(x)
  - str(x); class(x); typeof(x)
  - str(x); class(x); whatIs(x)
  - whatIs(x); class(x); typeof(x)
- True or False:
  - a list is a vector.
  - a tibble is a list.
  - a tibble is an atomic vector.
- In R, what is the difference between a matrix and a data.frame?
- We now create a non-atomic vector named patients_ls to store patient demographics. What expressions can be used to extract the “age” information (in the 5th position of the vector)? Select all that apply.
  - patients_ls["age"]
  - patients_ls[["age"]]
  - patients_ls$"age"
  - patients_ls$age
  - patients_ls[5]
  - patients_ls[[5]]
  - patients_ls@"age"
  - patients_ls@age
dplyr
- If you see the error "could not find function 'read_csv'" or "could not find function '%>%'", what code will probably fix it?
- Choose the Tidyverse functions that we have used to operate on columns of a tibble. Select all that apply.
  - filter()
  - select() / rename()
  - mutate()
  - group_by()
  - summarise()
  - pull()
  - arrange()
  - pivot_wider() / pivot_longer()
  - left_join() / full_join()
- Choose the Tidyverse functions that we have used to operate on rows of a tibble. Select all that apply.
  - filter()
  - select() / rename()
  - mutate()
  - group_by()
  - summarise()
  - pull()
  - arrange()
  - pivot_wider() / pivot_longer()
  - left_join() / full_join()
Functions and Iteration
- Almost everything R does uses a __________.
- What is the default syntax to return a value from a function in R?
- Write the R code to create a function that normalizes a numeric vector. Include complete documentation and a basic example for this function.
- You have 100 .csv files in a directory called data_raw/ that contains no other files. Write a basic batch import command using the purrr package.
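
To give a sense of what the normalization question above is asking for, here is a rough sketch of one possible answer, not the only acceptable one. It reads "normalize" as z-score standardization (subtract the mean, divide by the standard deviation); that reading is an assumption, and min-max rescaling would be another reasonable interpretation. The function name normalise_vec and the na.rm argument are hypothetical choices of mine. The documentation is written as roxygen2-style comments.

```r
#' Normalize a numeric vector (z-score)
#'
#' @description Centre a numeric vector at its mean and scale it by its
#'   standard deviation. This is one reading of "normalize" (an assumption).
#'
#' @param x A numeric vector.
#' @param na.rm Should missing values be dropped when computing the mean and
#'   standard deviation? Defaults to \code{TRUE}.
#'
#' @return A numeric vector the same length as \code{x}.
#'
#' @examples
#' normalise_vec(c(2, 4, 6))
normalise_vec <- function(x, na.rm = TRUE) {
  # Fail early on non-numeric input
  stopifnot(is.numeric(x))

  x_mean <- mean(x, na.rm = na.rm)
  x_sd <- sd(x, na.rm = na.rm)

  # The value of the last evaluated expression is what the function returns
  (x - x_mean) / x_sd
}

# Basic example
normalise_vec(c(2, 4, 6))
```

Note that a constant vector has a standard deviation of zero, so this sketch would return NaN values for it; a more careful version might check for that case.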
Strings, Git+GitHub, and Tidy Workflows
- Some of your data has a free text field with ZIP codes, stored in an object called cityStateZIP_char. What will str_extract(cityStateZIP_char, pattern = "\\d{5}") do?
- What is Git?
- What should be your first step when writing a new data pipeline?
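
If you want to experiment with the str_extract() question from this block, the sketch below sets up a small test case. The real contents of cityStateZIP_char are not shown on this page, so the three values here are made up for illustration, and zip_char is a hypothetical name for the result. The sketch assumes the stringr package is installed.

```r
library(stringr)

# Hypothetical free-text values; the real cityStateZIP_char is not shown here
cityStateZIP_char <- c(
  "Miami, FL 33199",
  "Boston, MA 02115-5005",
  "Unknown"
)

# Apply the call from the question to each element of the vector
zip_char <- str_extract(cityStateZIP_char, pattern = "\\d{5}")

print(zip_char)
```

Try adding a value that contains a street number of five or more digits, and compare what comes back with what you expected.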