Data Types

In the last lesson, we learned about two data types: vectors and data frames. We also learned about two different classes of vectors: numeric and factor. There are many other data types in R. Each has a special use, and to be productive in R, you need to be familiar with the major types and the operations on these types.

Primitive Types

Each R object has an underlying “type”, which determines the set of possible values for that object. You can find the type of an object using the typeof function.

The main types include the following:

  • logical: a logical value.

    TRUE
    [1] TRUE
    
    FALSE
    [1] FALSE
    
    TRUE | FALSE  # logical 'or'
    [1] TRUE
    
    TRUE & FALSE  # logical 'and'
    [1] FALSE
    
    !TRUE  # logical 'not'
    [1] FALSE
    
  • integer: an integer (positive or negative). Many R programmers do not use this type explicitly, since every integer value can also be represented as a double.

    1L  # suffix integers with an L to distinguish them from doubles
    [1] 1
    
    -7L
    [1] -7
    
    1L:10L  # range of values
     [1]  1  2  3  4  5  6  7  8  9 10
    
    1:10  # the : operator produces integers even without the L suffix
     [1]  1  2  3  4  5  6  7  8  9 10
    
    7%%2  # modulo (remainder)
    [1] 1
    
    7%/%2  # integer division
    [1] 3
    
  • double: a real number stored in “double-precision floating-point format.”

    1
    [1] 1
    
    3.14
    [1] 3.14
    
    -(3 + 8/2) * 7  # arithmetic operations
    [1] -49
    
    2^10  # exponentiation
    [1] 1024
    

    A double type can store the special values Inf, -Inf, and NaN, which represent “positive infinity,” “negative infinity,” and “not a number”:

    1/0
    [1] Inf
    
    -1/0
    [1] -Inf
    
    0/0
    [1] NaN
    
  • complex: a complex number

    1i  # suffix with i to denote 'imaginary'
    [1] 0+1i
    
    (2i)^2
    [1] -4+0i
    
    sqrt(-1+0i)
    [1] 0+1i
    
  • character: a sequence of characters, called a “string” in other programming languages

    "Hello, World!"  # denote a string with double quotes...
    [1] "Hello, World!"
    
    'abracadabra'    # ...or with single quotes (both forms are equivalent).
    [1] "abracadabra"
    
  • list: a sequence of values, optionally named (discussed in detail in the Lists section below)

    list(a = 10, b = 11, z = "hello")
    $a
    [1] 10
        
    $b
    [1] 11
        
    $z
    [1] "hello"
    
  • builtin, closure, special: a function or operator (for most purposes, the distinctions between these are not important)

    typeof(sqrt)
    [1] "builtin"
    
    typeof(read.csv)
    [1] "closure"
    
    typeof(`<-`)
    [1] "special"
    
  • NULL: a special type with only one possible value, known as NULL

    typeof(NULL)
    [1] "NULL"
    

This is not an exhaustive list, but the other types are exotic and you will rarely encounter them.
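As a quick check of the types listed above, the following sketch applies typeof to one value of each kind. The variable names values and types are only illustrative.

```r
# One value of each type discussed above, stored together in a list.
values <- list(TRUE, 1L, 3.14, 2i, "hello", list(a = 1), sqrt, NULL)

# Ask R for the underlying type of each value.
types <- sapply(values, typeof)
print(types)

# A number typed without the L suffix is a double, not an integer.
is.integer(1)   # FALSE
is.integer(1L)  # TRUE
is.double(1)    # TRUE
```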

Missing Values

One distinctive feature of R is its built-in support for missing values, written NA (short for “Not Available”). The logical, integer, double, complex, and character types can all represent missing values using this special constant.
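A small sketch of how NA behaves in practice, using a made-up vector x. It shows that NA takes on the type of the vector it sits in, and that base R functions such as is.na and mean let you detect or skip missing values.

```r
# A double vector with one missing value (illustrative data).
x <- c(1.5, NA, 3)

missing <- is.na(x)          # FALSE TRUE FALSE
typeof(x)                    # "double": the NA is stored as a missing double

# Most computations involving NA return NA...
mean(x)                      # NA
# ...unless missing values are explicitly removed.
m <- mean(x, na.rm = TRUE)   # 2.25

# Each of these types has its own missing-value constant.
typeof(NA)                   # "logical"
typeof(NA_integer_)          # "integer"
typeof(NA_character_)        # "character"
```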

Conversions

Often, you don’t need to worry too much about types, because R will implicitly convert between them for you. For example, consider the following sequence of commands:

x <- 1:10
x[[2]] <- 3.14

When the first line gets executed, x gets created as an integer vector. In the second line, R converts x to a double vector so that it can store the value 3.14.
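You can watch this conversion happen by calling typeof before and after the assignment. The sketch below also shows the same rule at work when values of different types are combined with c; the names type_before, type_after, mixed, and labels are only illustrative.

```r
x <- 1:10
type_before <- typeof(x)    # "integer"

x[[2]] <- 3.14              # storing a double forces a conversion
type_after <- typeof(x)     # "double"

# Combining values of different types converts them to a common type.
mixed <- c(1L, 2.5, TRUE)   # all become doubles: 1.0 2.5 1.0
labels <- c(1, "a")         # all become strings: "1" "a"
```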

Lists

A “list” is a primitive type that stores a sequence of values, along with optional names for these values. The power of the list type is that it allows you to represent complicated objects.

We construct lists using the list function:

abe <- list(first.name = "Abraham", last.name = "Lincoln", weight.lb = 180, 
    height.in = 76.8)

In this example, abe is a list with four elements, with names first.name, last.name, weight.lb, and height.in.

We access the elements of a list using double square brackets. We can either specify the index of the element:

abe[[1]]
[1] "Abraham"
abe[[2]]
[1] "Lincoln"

or we can specify the name

abe[["first.name"]]
[1] "Abraham"
abe[["last.name"]]
[1] "Lincoln"

Another way to access an element by name is to use the $ operator:

abe$height.in
[1] 76.8
abe$weight.lb
[1] 180

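The two forms abe[["first.name"]] and abe$first.name return the same value, and the $ form is more common. The one difference is that $ accepts an unambiguous prefix of a name, while double square brackets require the full name by default.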
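The sketch below compares name lookup with $ and with double square brackets, using the abe list from above. The result variables (by_dollar and so on) are only illustrative.

```r
abe <- list(first.name = "Abraham", last.name = "Lincoln", weight.lb = 180,
    height.in = 76.8)

# $ matches a unique prefix of the name.
by_dollar <- abe$height                         # 76.8 (matches height.in)

# [[ ]] needs the exact name, and returns NULL when nothing matches.
by_brackets <- abe[["height"]]                  # NULL

# Prefix matching can be requested explicitly.
by_partial <- abe[["height", exact = FALSE]]    # 76.8
```

Writing out the full name avoids surprises if another element with a similar name is added later.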

As with vectors, we get the number of elements with the length function:

length(abe)
[1] 4

We slice lists with single square brackets:

abe[1:2]
$first.name
[1] "Abraham"

$last.name
[1] "Lincoln"

abe[1]
$first.name
[1] "Abraham"

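For a vector, the slice [1] is logically equivalent to the element [[1]], but for a list, these entities are distinct: abe[1] is a list of length one that contains the first element, while abe[[1]] is the element itself.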
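The distinction is easy to check with typeof and identical. This sketch rebuilds the abe list and uses a made-up numeric vector v for comparison.

```r
abe <- list(first.name = "Abraham", last.name = "Lincoln", weight.lb = 180,
    height.in = 76.8)

slice <- abe[1]       # a list of length one
element <- abe[[1]]   # the string "Abraham"

typeof(slice)         # "list"
typeof(element)       # "character"

# For a plain vector, the two forms give the same result.
v <- c(10, 20, 30)
identical(v[1], v[[1]])   # TRUE
```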

We can delete a particular element of a list by assigning it the value NULL:

abe[["last.name"]] <- NULL

This removes the element and shifts the indexes of subsequent elements:

abe[[2]]
[1] 180
abe[[3]]
[1] 76.8
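To confirm what the deletion did, you can inspect the names and length of the list afterwards. The sketch below also shows one way to store an actual NULL as an element, in case that is ever needed; the middle.name element is hypothetical and not part of the original example.

```r
abe <- list(first.name = "Abraham", last.name = "Lincoln", weight.lb = 180,
    height.in = 76.8)

abe[["last.name"]] <- NULL
remaining <- names(abe)    # "first.name" "weight.lb" "height.in"
n_after <- length(abe)     # 3

# Assigning list(NULL) through single brackets keeps the element
# and sets its value to NULL instead of deleting it.
abe["middle.name"] <- list(NULL)
n_with_null <- length(abe)    # 4
```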

Classes

Two types we saw in the previous lesson are not primitive: data frames and factors. In fact, a data frame is a special type of list, and a factor is a special type of integer vector. These special types are known as “classes”.

Every R object is a member of one or more classes. To find these classes, use the class function:

class(TRUE)
[1] "logical"
class(1L)
[1] "integer"
class(3.14)
[1] "numeric"

(Confusingly, the class for double objects is not called double; it is called numeric.)

A data frame is a list whose elements are vectors, each with the same length. A factor is an integer vector taking values in the range 1..m, with each integer corresponding to a certain level. R distinguishes between these types and their underlying representations by assigning them to different classes.

bikedata <- read.csv("bikedata.csv", stringsAsFactors = TRUE)  # since R 4.0.0, strings become factors only on request
typeof(bikedata)
[1] "list"
class(bikedata)
[1] "data.frame"
typeof(bikedata$colour)
[1] "integer"
class(bikedata$colour)
[1] "factor"
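If you do not have bikedata.csv at hand, the same relationships can be seen on a small data frame built by hand. The bikes data below is made up for illustration and is not taken from the real data set.

```r
# A tiny stand-in for the bike data (hypothetical values).
bikes <- data.frame(
    colour = factor(c("Red", "Blue", "Red", "Black")),
    speed = c(21.5, 18.0, 25.2, 19.9)
)

typeof(bikes)    # "list"
class(bikes)     # "data.frame"
length(bikes)    # 2: one list element per column

colour <- bikes$colour
typeof(colour)       # "integer"
class(colour)        # "factor"
levels(colour)       # "Black" "Blue" "Red" (sorted alphabetically)
as.integer(colour)   # 3 2 3 1: each value is stored as the index of its level
```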

The power of classes is that they allow you to change how certain functions behave. Compare the following two outputs:

summary(bikedata$colour)
Black  Blue Green  Grey Other   Red White  NA's 
  262   636   149   531    52   378   333    14 
summary(unclass(bikedata$colour))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   1.00    2.00    4.00    3.83    6.00    7.00      14 

Here, unclass is a function that strips the class attribute, leaving the underlying primitive type. When summary is given an object with class factor, it reports counts for the levels; when it is given a plain integer vector, it reports quartiles and other statistics.

Advanced R programmers create new kinds of classes, along with specialized functions to act on these classes.
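As a rough illustration of what this can look like, here is a minimal sketch using R's S3 system, which is one of several class systems in R. The class name person and the helper new_person are hypothetical, chosen only for this example.

```r
# Constructor: a plain list with a class attribute attached.
new_person <- function(first, last) {
    structure(list(first.name = first, last.name = last), class = "person")
}

# A print method for the class; R calls it whenever print receives
# an object whose class is "person".
print.person <- function(x, ...) {
    cat("Person: ", x$first.name, " ", x$last.name, "\n", sep = "")
    invisible(x)
}

abe <- new_person("Abraham", "Lincoln")
print(abe)       # Person: Abraham Lincoln

class(abe)       # "person"
typeof(abe)      # "list": the underlying primitive type is unchanged
```

This is the same mechanism that makes summary behave differently for factors and integer vectors: the function looks at the class of its argument and picks the matching method.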