Data Types
In the last lesson, we learned about two data types: vectors and data frames. We also
learned about two different classes of vectors: numeric and factor. There are many
other data types in R. Each has a special use, and to be productive in R, you need to be
familiar with the major types and the operations on these types.
Primitive Types
Each R object has an underlying “type”, which determines the set of possible values
for that object. You can find the type of an object using the typeof function.
The main types include the following:
- logical: a logical value.
TRUE
[1] TRUE
FALSE
[1] FALSE
TRUE | FALSE # logical 'or'
[1] TRUE
TRUE & FALSE # logical 'and'
[1] FALSE
!TRUE # logical 'not'
[1] FALSE
- integer: an integer (positive or negative). Many R programmers do not use this type, since every integer value can be represented as a double.
1L # suffix integers with an L to distinguish them from doubles
[1] 1
-7L
[1] -7
1L:10L # range of values
[1] 1 2 3 4 5 6 7 8 9 10
1:10 # (L suffix is optional)
[1] 1 2 3 4 5 6 7 8 9 10
7L %% 2L # modulo (remainder)
[1] 1
7L %/% 2L # integer division
[1] 3
- double: a real number stored in “double-precision floating point format.”
1
[1] 1
3.14
[1] 3.14
-(3 + 8/2) * 7 # arithmetic operations
[1] -49
2^10 # exponentiation
[1] 1024
A double can store the special values Inf, -Inf, and NaN, which represent “positive infinity,” “negative infinity,” and “not a number”:
1/0
[1] Inf
-1/0
[1] -Inf
0/0
[1] NaN
- complex: a complex number.
1i # suffix with i to denote 'imaginary'
[1] 0+1i
(2i)^2
[1] -4+0i
sqrt(-1+0i)
[1] 0+1i
- character: a sequence of characters, called a “string” in other programming languages.
"Hello, World!" # denote a string with double quotes...
[1] "Hello, World!"
'abracadabra' # ...or with single quotes (both forms are equivalent).
[1] "abracadabra"
- list: a list of named values (discussed in detail in the Lists section below).
list(a = 10, b = 11, z = "hello")
$a
[1] 10

$b
[1] 11

$z
[1] "hello"
- builtin, closure, special: a function or operator (for most purposes, the distinctions between these are not important).
typeof(sqrt)
[1] "builtin"
typeof(read.csv)
[1] "closure"
typeof(`<-`)
[1] "special"
- NULL: a special type with only one possible value, known as NULL.
typeof(NULL)
[1] "NULL"
This is not an exhaustive list, but the other types are exotic and you will rarely encounter them.
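As a quick check, typeof can be applied to one value of each type in the list above. The following is a minimal sketch; the names examples and types are introduced here for illustration and are not part of the lesson.

```r
# One example value per type (names are illustrative only).
examples <- list(
  logical   = TRUE,
  integer   = 1L,
  double    = 3.14,
  complex   = 1i,
  character = "hello",
  list      = list(a = 10),
  closure   = function(x) x,
  builtin   = sqrt
)

# Apply typeof to every element; the result is a named character vector.
types <- vapply(examples, typeof, character(1))
print(types)

# NULL is handled separately, since it is its own type.
null_type <- typeof(NULL)
print(null_type)
```

Each entry of types should match the name it was stored under, which makes the correspondence between values and types easy to verify by eye.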
Missing Values
One distinctive feature of R is its support for “Not Available” or “Missing” values.
The logical, integer, double, complex, and character types can all
represent missing values, using the special constant NA.
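To make this concrete, here is a small sketch of how NA behaves in practice. The vector x and the other variable names are made up for illustration.

```r
x <- c(1.5, NA, 3.0)                 # a double vector with one missing value

na_flags <- is.na(x)                 # FALSE TRUE FALSE
missing_sum <- sum(x)                # NA: missing values propagate through arithmetic
clean_sum <- sum(x, na.rm = TRUE)    # 4.5: missing values are dropped before summing

# A bare NA is logical, but each type has its own typed constant.
typeof(NA)                           # "logical"
typeof(NA_integer_)                  # "integer"
typeof(NA_character_)                # "character"
```

Use is.na to test for missing values; a comparison such as x == NA returns NA rather than TRUE or FALSE.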
Conversions
Often, you don’t need to worry too much about the types, because R will implicitly convert between types for you. For example, consider the following sequence of commands:
x <- 1:10
x[[2]] <- 3.14
When the first line gets executed, x gets created as an integer vector. In the
second line, R converts x to a double vector so that it can store the value 3.14.
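The conversion can be observed directly by calling typeof before and after the assignment. The sketch below repeats the two commands and adds a few other common conversions; the variable names type_before, type_after, and mixed are illustrative.

```r
x <- 1:10
type_before <- typeof(x)    # "integer"
x[[2]] <- 3.14
type_after <- typeof(x)     # "double"

# Combining values of different types converts them all to the most general type.
mixed <- c(1L, 2.5, "three")
typeof(mixed)               # "character"

# Logical values behave as 0 and 1 in arithmetic.
TRUE + TRUE                 # 2

# Explicit conversions use the as.* family of functions.
as.integer("42")            # 42
as.character(3.14)          # "3.14"
```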
Lists
A “list” is a primitive type that stores a sequence of values, along with optional names for these values. The power of the list type is that it allows you to represent complicated objects.
We construct lists using the list function:
abe <- list(first.name = "Abraham", last.name = "Lincoln", weight.lb = 180,
height.in = 76.8)
In this example, abe is a list with four elements, with names
first.name, last.name, weight.lb, and height.in.
We access the elements of a list using double square brackets. We can either specify the index of the element
abe[[1]]
[1] "Abraham"
abe[[2]]
[1] "Lincoln"
or we can specify the name
abe[["first.name"]]
[1] "Abraham"
abe[["last.name"]]
[1] "Lincoln"
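Another way to access an element by name is to use the $ operator:
abe$height.in
[1] 76.8
abe$weight.lb
[1] 180
Both forms (abe[["first.name"]] and abe$first.name) return the same element, but the $ form is more common. Note that $ also accepts an unambiguous prefix of a name (abe$height works here), whereas [[ ]] requires the exact name by default.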
As with vectors, we get the number of elements with the length function:
length(abe)
[1] 4
We slice lists with single square brackets:
abe[1:2]
$first.name
[1] "Abraham"

$last.name
[1] "Lincoln"

abe[1]
$first.name
[1] "Abraham"
For a vector, the slice [1] is logically equivalent to the element [[1]], but for a list, these entities are distinct: abe[1] is a list of length one, while abe[[1]] is the value stored in that list.
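The difference between the two kinds of brackets shows up clearly in the types of the results. The following sketch rebuilds the abe list from above so that it runs on its own; slice, element, and v are illustrative names.

```r
abe <- list(first.name = "Abraham", last.name = "Lincoln",
            weight.lb = 180, height.in = 76.8)

slice <- abe[1]        # a list of length one, still carrying the name
element <- abe[[1]]    # the stored value itself

typeof(slice)          # "list"
typeof(element)        # "character"

# For an unnamed atomic vector, the two forms give the same value.
v <- c(10, 20, 30)
identical(v[1], v[[1]])   # TRUE
```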
We can delete a particular element of a list by assigning it the value NULL:
abe[["last.name"]] <- NULL
This removes the element, and shifts the indexes of subsequent elements:
abe[[2]]
[1] 180
abe[[3]]
[1] 76.8
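Checking names and length before and after the deletion makes the shift visible. This sketch rebuilds abe so that it is self-contained, and the variable names are illustrative. The last step, which keeps an element in place while storing NULL in it, goes slightly beyond the lesson and is included only as a related idiom.

```r
abe <- list(first.name = "Abraham", last.name = "Lincoln",
            weight.lb = 180, height.in = 76.8)

names_before <- names(abe)   # "first.name" "last.name" "weight.lb" "height.in"
abe[["last.name"]] <- NULL   # delete the element
names_after <- names(abe)    # "first.name" "weight.lb" "height.in"
length(abe)                  # 3

# To keep the element but set its value to NULL, assign a one-element list
# through single brackets instead.
abe["first.name"] <- list(NULL)
length(abe)                  # still 3
is.null(abe[["first.name"]]) # TRUE
```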
Classes
Two types we saw in the previous lesson are not primitive: data frames and factors. In fact, a data frame is a special type of list, and a factor is a special type of integer vector. These special types are known as “classes”.
Every R object is a member of one or more classes. To find these classes, use the
class function:
class(TRUE)
[1] "logical"
class(1L)
[1] "integer"
class(3.14)
[1] "numeric"
(Confusingly, the class for double objects is not called double; it is called
numeric.)
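Comparing typeof and class on the same object shows how the two notions differ. The sketch below also shows an object with more than one class, using the value returned by Sys.time as a convenient built-in example; the variable name now is illustrative.

```r
typeof(3.14)             # "double"
class(3.14)              # "numeric"

# Some objects belong to more than one class.
now <- Sys.time()
class(now)               # "POSIXct" "POSIXt"
typeof(now)              # "double"

# inherits() tests membership in a particular class.
inherits(now, "POSIXt")  # TRUE
```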
A data frame is a list whose elements are vectors, each with the same length.
A factor is an integer vector taking values in the range 1..m, with each integer
corresponding to a certain level. R distinguishes between these types and their
underlying representations by assigning them to different classes.
bikedata <- read.csv("bikedata.csv", stringsAsFactors = TRUE) # since R 4.0, strings are read as factors only on request
typeof(bikedata)
[1] "list"
class(bikedata)
[1] "data.frame"
typeof(bikedata$colour)
[1] "integer"
class(bikedata$colour)
[1] "factor"
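The same structure can be explored without the CSV file. The sketch below builds a small data frame by hand as a hypothetical stand-in for bikedata; the column values and the name bikes are invented for illustration.

```r
# Hypothetical stand-in for bikedata.csv (values are made up).
bikes <- data.frame(
  colour = factor(c("Red", "Blue", "Red", "Grey")),
  speed  = c(21.5, 18.0, 24.2, 19.9)
)

typeof(bikes)             # "list"
class(bikes)              # "data.frame"
length(bikes)             # 2: one list element per column

typeof(bikes$colour)      # "integer"
class(bikes$colour)       # "factor"
levels(bikes$colour)      # "Blue" "Grey" "Red"
as.integer(bikes$colour)  # 3 1 3 2: each value is an index into the levels
```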
The power of classes is that they allow you to change how certain functions behave. Compare the following two outputs:
summary(bikedata$colour)
Black  Blue Green  Grey Other   Red White  NA's
  262   636   149   531    52   378   333    14
summary(unclass(bikedata$colour))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
   1.00    2.00    4.00    3.83    6.00    7.00      14
Here, unclass is a function that converts to the underlying primitive type. When
we summarize an object with class factor, R reports counts for the levels; when
we summarize an object with class integer, R reports quartiles and other statistics.
Advanced R programmers create new kinds of classes, along with specialized functions to act on these classes.
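As a rough illustration of what this looks like, the sketch below defines a new class on top of a list and gives it its own format and print behavior. The class name person and the functions new_person, format.person, and print.person are hypothetical, and this is only one simple way to set up such a class.

```r
# Constructor for a hypothetical "person" class built on a list.
new_person <- function(first.name, last.name, height.in) {
  obj <- list(first.name = first.name, last.name = last.name,
              height.in = height.in)
  class(obj) <- "person"
  obj
}

# format() dispatches to this function for objects of class "person".
format.person <- function(x, ...) {
  paste0(x$first.name, " ", x$last.name, " (", x$height.in, " in)")
}

# print() dispatches here; it reuses the format method above.
print.person <- function(x, ...) {
  cat(format(x), "\n")
  invisible(x)
}

abe <- new_person("Abraham", "Lincoln", 76.8)
class(abe)     # "person"
typeof(abe)    # "list"
print(abe)     # Abraham Lincoln (76.8 in)
```

The underlying type is still list, so the element access shown earlier (abe$first.name, abe[["height.in"]]) continues to work; only the behavior of functions that dispatch on the class changes.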