> 867 + 5309
Lesson 2: Introduction to R
Review
- Data science is the intersection of statistics, code, and domain knowledge.
Data science has a few major components:
- Data Preparation and Exploration
- Data Representation and Transformation
- Computing with Data
- Modelling Data
- Visualizing and Presenting Data
- Science / Philosophy about Data Science
Reproducible research is pre-producible: research results that have been created with code can be re-created at will.
"Hello, world!"
Outline
We will cover the following:
- The RStudio Interface
- Basic Operations
- R is an “Object-Oriented” Language
- R is a “Functional” Language
- Getting Help
Reading, Videos, and Assignments
- Watch: https://www.youtube.com/playlist?list=PLHKz7LkI0oR29avKyw6faFjTI4QpKkjZG
- Read: “The Very Basics” chapter of Hands-On Programming in R (HOPR), (https://rstudio-education.github.io/hopr/basics.html)
- Do:
- the pre-course learning assessment and email your answers to me in a Word document.
- Install the Tidyverse package suite for R (paste this code into your Console:
install.packages("tidyverse")
, then press theEnter
(Windows) /return
(Mac) key; if RStudio prompts you to restart, do it).
The RStudio User Interface
Open RStudio. We should see something like this:
This window is broken up into “panes”. Each pane is useful for different parts of our workflow. At this point in the class, the only thing we need to care about is the “Console” pane on the left (but we will make use of the other panes in short order).
Basic Operations
Now that R has told us good morning, we can learn some of the other basic things we can ask R to do.
Characters
Your first code in R last class was to instruct R to tell the world hello. R uses "
(double quotes) and '
(single quotes) to mark textual information.
Scalar Arithmetic
R can do lots of powerful things, but it can also act as a very simple calculator. To add two numbers, after the >
prompt, type
If you would like to chain multiple operations together, you can use multiple lines. The catch: you have to end all but your last line with an operator.
> 24 *
+ 7
Why “scalar” arithmetic? Did you notice the “[1]” next to your answers? In mathematics, a scalar is a numeric quantity represented in one dimension, or—as most people commonly refer to them—numbers. We will return to this concept, and why it is important in R, shortly.
Logic
We can use R’s logic system to compare values. A simple comparison is, “is 5 less than 3”? In R, this expression is
5 < 3
[1] FALSE
This comparison returns a logical indicator, FALSE
. Notice that this value is not surrounded by quotes, and it’s also not a “number”. TRUE
and FALSE
are special symbols in R.
The following symbols are useful in logical comparisons:
Symbol | Meaning | Example |
---|---|---|
> |
greater than | 5 > 3 |
< |
less than | 5 < 3 |
>= |
greater than or equal to | 5 >= 3 |
<= |
less than or equal to | 5 <= 3 |
== |
equal to | 5 == 3 |
!= |
not equal to | 5 != 3 |
The Code Comment
Many times, we want to include notes with our code, to remind ourselves why we did what we did. In these cases, we don’t want R to try to evaluate these notes, so we use a special character to mark our notes as “code comments”.
2 + 4 + 6 + 8
[1] 20
# Who do you appreciate?
Objects in R
What is an Object?
An object is like a basket or container to hold stuff. Let’s make our very first object. We are going to create an object named x
, and store the number 2 in it:
<- 2 x
Now, we can recall this value any time we evaluate x
.
x
[1] 2
Now look at the “Environment” pane on the top right. You should see this:
What happened? Consider this analogy: you have a small container, you put the number 2 in it, and you label the container as x
.
An example in the real world is a container of food in the office refrigerator: it should have a label to identify what it is and who owns it, and it should contain some food or drink. So far, you have used “containers” without labelling them. When you executed 24 * 7
, R created a container and put the number 168 in it. However, R is like a vigilant office administrator: anything unlabelled in the refrigerator will be tossed out! Thus, if you wanted access to this number again, you must calculate 24 * 7
from scratch.
We can perform calculations with objects just like we could with anything contained in that object. For instance, if you wanted 10 to the power of 2, we see that the following are equivalent:
10 ^ 2
[1] 100
10 ^ x
[1] 100
Just like any real lunch container, we can always empty out old food and put new food in it. To do this, we simply assign some new value to x
:
<- 3 x
Vectors
Recall that our examples had scalar values? R, however, is designed to work with ordered sets of values, known as a vector. There are a few simple ways to make vectors in R, depending on what you want these vectors to contain. Let’s first make a vector of the digits on a keypad.
<- 0:9 keypad
We see that the syntax a:b
creates a sequence of integers from a
to b
. We can do everything with a vector of numbers that we could do with a scalar value:
+ 3 keypad
[1] 3 4 5 6 7 8 9 10 11 12
* 2 keypad
[1] 0 2 4 6 8 10 12 14 16 18
10 - keypad
[1] 10 9 8 7 6 5 4 3 2 1
2 ^ keypad
[1] 1 2 4 8 16 32 64 128 256 512
< 5 keypad
[1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
Calling Functions in R
You are probably thinking about now, “but what if I want to make a vector with something other than just sequential integers?” That’s a great question. To create some of the more complicated vectors, we need to explicitly call some functions.
The “Concatenate” Function
One of the most important functions you will learn is the function to create simple vectors:
c(1, 3, 6, 12)
[1] 1 3 6 12
The c()
function will put almost anything you want into a single vector, provided that they are all similar things. For instance, I can make a vector of my full name and terminal degrees:
<- c("Prof.", "Gabriel", "J.", "Odom", "PhD", "ThD")
me me
[1] "Prof." "Gabriel" "J." "Odom" "PhD" "ThD"
Function Components
Functions have three main components: its name, its arguments’ names, and its arguments’ values. For instance, the function to take an arithmetic mean in R is mean()
. When you type mean
in the Console pane, RStudio displays a hint on how to use the function. Let’s use this hint to find the expected roll on a six-sided die:
mean(x = 1:6)
[1] 3.5
We unpack this as follows:
- The function is named
mean
. - The mean function has an argument named
x
. - We supply the vector of integers from 1 to 6 as the value to the argument
x
.
Notice that the name of the argument to the mean()
function is “x”. Most functions have named arguments, but the c()
function does not. It allows you to choose your own names.
We can even put functions inside of other functions:
log(c(10, 100, 1000))
[1] 2.302585 4.605170 6.907755
Getting Help
Sometimes, you need to get some help with R code. To get help on any function you know the name of, type ?
then the name (with no spaces). For example, to find the help file for the round()
function, type
?round
Help files in R are all structured the same way. The super important parts are
- Name, Alias, and {Source}: what the function is called, {the package it comes from}, and if it is part of a group of functions
- Description: simple statement of what the function is supposed to do
- Arguments: names and descriptions of the function arguments
- Details: more information you might need to know about the function. This is usually where I look if the function is doing something I didn’t expect.
- Value: (OPTIONAL) this describes what the function returns (but not all functions return things you can see, or what they return is very basic, so this section may be missing)
- Examples: lines of code that you can run to test what the function should be doing.
The “Examples” section is where I go first, and then I go to “Arguments”, “Value” (if applicable), and “Details” after that.
We can also use Google. Websites like https://www.stackoverflow.com can be very helpful. Later on, when we use packages of R code, finding the package webiste or user guide can also be very helpful.