##########################################################################
##########################################################################

# a brief introduction to some of the most important functionality
# in R

# david draper, 13 jan 2019

# the comment symbol in R is #

# this text file can be copied and pasted into an R session, bit by bit,
# as part of learning about creating new R objects, R's data types,
# operations on vectors, how to write functions, and simple plotting

##########################################################################

# see

# cran.r-project.org/doc/manuals/r-release/R-intro.html

# and

# cran.r-project.org/doc/manuals/r-release/R-lang.html

# for (voluminous) detail on the R language

# if you're *completely* new to R, i recommend working through
# Appendix A ('A sample session') in the first URL above; by
# 'working through' i mean copy (one by one) all of the R commands
# listed there into your own R session and pay attention to
# what comes out

# you might work all the way through this file first, and then
# go to Appendix A; you should then find that you understand most or all
# of what's going on in that Appendix

##########################################################################

# i work with R in the windows 10 operating system (OS) in an
# old-school fashion: i start an R session and i also open a .txt file
# with notepad

# i formulate my R code in the .txt file, editing it until
# it looks right; i then copy and paste it into my R session and run it;
# i copy whatever R says in return and paste R's reply into
# the .txt file

# when fully active, my screen has (1) a large R session window
# with command and plotting areas and (2) a .txt file recording
# both sides of my conversation with R

# you may prefer instead to learn how to use 'RStudio'; it's a freeware
# environment that maintains those same 3 ingredients (command
# window, plotting window, conversation window) for you automatically
##########################################################################

# preliminaries in the windows 10 OS (with analogous operations
# under Mac OS X or Linux):

# (1) before starting R, create a directory in which you want
# the results of your R session to be stored (this is also where
# you want to create and save your .txt file if you use my
# old-school method)

# for example, your main directory for this course might be called
# 'AMS-206', and you might make a sub-directory for this introductory
# session called (e.g.) 'R-Introduction'

# (2) start up R; you'll get a welcome banner something like this:

# --> welcome banner begins below this line

# R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
# Copyright (C) 2018 The R Foundation for Statistical Computing
# Platform: x86_64-w64-mingw32/x64 (64-bit)

# R is free software and comes with ABSOLUTELY NO WARRANTY.
# You are welcome to redistribute it under certain conditions.
# Type 'license()' or 'licence()' for distribution details.

# Natural language support but running in an English locale

# R is a collaborative project with many contributors.
# Type 'contributors()' for more information and
# 'citation()' on how to cite R or R packages in publications.

# Type 'demo()' for some demos, 'help()' for on-line help, or
# 'help.start()' for an HTML browser interface to help.
# Type 'q()' to quit R.
# [Previously saved workspace restored]

# --> welcome banner ends above this line

# --> sidebar on 'help( )' begins below this line

# important: 'help( )' is an extremely useful built-in function;
# if there's a function named 'foo', the command 'help( foo )'
# will open a window in your default browser (if you're online)
# and display the official R help page about 'foo' (these pages
# assume some familiarity with R, and may be hard to understand
# at first, but you'll get better at interpreting them over time)

# --> sidebar on 'help( )' ends above this line

# (3) click on the Misc menu and uncheck the Buffered output option
# (this will force R to converse with you continuously rather than
# storing up a lot of its replies in a buffer and only replying
# to you when the buffer is full)

# click on the File menu and choose 'Change dir...'

# click through your file system tree until you come to
# the directory you created earlier, where you want to store
# the results from this introductory session; click OK when you
# arrive at this directory

# at any point, when R has finished doing what you've asked it to do,
# it will prompt you to issue another command with the
# character string '> ':

# >

# commands in R (things you type in at the '>' prompt) involve
# two types of things: (a) expressions and (b) functions

# expressions are things that R evaluates for you, and when you work
# with functions you'll either be calling them or creating them

# an example of an expression evaluation is

1 + 1

# to which R replies

# [1] 2

# (i'll explain the '[1]' later)

# R has a rich set of built-in functions -- earlier i called 'help( foo )'
# a command, but it's really a call to the built-in function 'help'

# --> sidebar on functions begins below this line

# what is a function in R?
# as with all other programming languages,
# it's a structure that accepts inputs and produces outputs

# the generic structure of an R function that's prepared to accept
# k inputs is

# function.name( input.1, input.2, ..., input.k ) {
#   (here the function does things for you)
# }

# as an example of function inputs, if you type 'help( help )'
# you'll get a web page that includes the following description:

# Description

# help is the primary interface to the help systems.

# Usage

# help(topic, package = NULL, lib.loc = NULL,
#      verbose = getOption("verbose"),
#      try.all.packages = getOption("help.try.all.packages"),
#      help_type = getOption("help_type"))

# so 'help' is prepared to accept 6 inputs: (topic, package, ...,
# help_type)

# some inputs to built-in R functions have *default* values:
# if you don't mention those inputs in your function call,
# R will use the built-in default value for that input,
# which is listed on the help page for that function using
# the syntax 'input = default.input' (for example, if you just
# type the command 'help( )', the input 'package' will default
# to 'NULL' (a special keyword in R meaning essentially 'an object
# that exists but has no value'), and so on for the other inputs)

# --> sidebar on functions ends above this line

# (3) to see what the current directory is at any time in R,
# you can use the built-in function call

# getwd( )

# which stands for 'get working directory'

# in my case, on my home desktop, this is

# getwd( )

# [1] "C:/DD/Teaching/AMS-206/Winter-2019"

# --> sidebar on '[1]' begins below this line

# what's the deal with '[1]'?
# if the output of a function
# is a list of things, R lists them and 'helpfully' tells you which
# thing is first on the list; if the list is too long to fit on
# one line, R continues it on a second line and begins the second line
# with something like '[18] ...', which tells you that the first line
# had 17 elements of the list in it and the 18th element is at the
# beginning of the second line; and so on

# --> sidebar on '[1]' ends above this line

# if you know the exact path to the directory you want,
# instead of the menu-based 'Change dir...' approach
# you can use the *set working directory* function 'setwd':

# setwd( 'C:/DD/Teaching/AMS-206/Winter-2019' )

# does the same thing as the menu-based method, but to use 'setwd'
# you have to know the exact absolute path

# (4) you might begin your conversation with R with the
# built-in function call

ls( )

# this lists (almost) all of the objects that currently
# exist in your current working directory; here, because nothing
# is there, R replies

# character(0)

# which is its way of saying that this directory is currently
# (almost) empty; you can instead try

ls( all.names = T )

# to get (almost almost) all defined objects

# --> sidebar on 'T' and 'F' begins below this line

# note: 'T' and 'F' are permissible abbreviations for 'TRUE'
# and 'FALSE', respectively, in R

# --> sidebar on 'T' and 'F' ends above this line

# to my second ls( ... ) command R replies

# [1] ".Random.seed"

# well, what the hell is that? whenever the attributes of an object
# in an R session are unknown to you, i recommend using the 'str'
# (structure) function:

str( .Random.seed )

# to which R replies

# int [1:626] 403 90 1363676567 1985947481 -1428087515 ...
# explanation: all of R's pseudo-random number generators need
# 'random seeds' for initialization; these are integers from
# - big to + big, where big seems to be on the order of about 10^9

# there are other invisible objects in a brand-new R directory
# (an example is .Machine , which provides useful information
# about things like the largest base-10 real number that R
# can handle (1.797693e+308 in 64-bit mode)), but this is getting
# too far down in the weeds for beginning R coders

##########################################################################

# (5) the assignment operators

# your R working directory is currently (essentially) empty;
# how do you fill it with stuff that will help you get your work done?

# there are two main ways: you can read stuff in from outside R
# (i'll cover that in another tutorial file), or you can *assign*
# R objects values on the command line

# the two most commonly used assignment operators in R are '<-' and '='

temp.0 <- 1

# '<-' is meant to look like a left arrow, so you could read
# the command above as '1 goes to temp.0' or 'temp.0 gets 1'
# or (more directly) 'temp.0 is assigned the value 1'

# if the R object temp.0 doesn't already exist in your current
# directory, R creates it and assigns the numeric value 1 to it;
# if temp.0 already exists, R over-writes its previous value with 1

temp.0 = 1

# this has exactly the same effect as 'temp.0 <- 1'

# i strongly recommend that you stick with '<-', because it's easy
# to confuse '=' with '==', which has a completely different meaning
# (explained below)

ls( )

# [1] "temp.0"

# now the current working directory has one object in it, 'temp.0'

# --> additional comment on '=' begins below this line

# note that '=' plays a different role inside a function call:
# in the command 'ls( all.names = T )' above,
# all '=' is doing is providing the function 'ls' with a non-default
# value of its input 'all.names'

# --> additional comment on '=' ends above this line
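
# --> sidebar illustrating '<-', '=' and '==' begins below this line

# the following is a minimal sketch, not part of the original session,
# of the three roles described above; the object names 'demo.x' and
# 'demo.y' are hypothetical, and they're removed at the end so that
# later 'ls( )' listings in this file are unaffected

demo.x <- 5

demo.y = 5

# both objects now hold the same value, so the comparison operator
# '==' replies TRUE

demo.x == demo.y

# [1] TRUE

# inside a function call, '=' creates nothing in your workspace;
# it just matches values to named inputs, here the inputs 'from'
# and 'to' of the built-in function 'seq'

seq( from = 1, to = 3 )

# [1] 1 2 3

rm( demo.x, demo.y )

# --> sidebar illustrating '<-', '=' and '==' ends above this line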
##########################################################################

# (6) some comments on data types and objects in R:

# R supports a wide variety of data structures, including vector,
# list, matrix, data frame, factor and table

# in R, what we usually think of mathematically as a scalar
# is just a vector with only one element

# you can find out about the various kinds of R data types
# with the built-in function 'typeof( )' -- to paraphrase its
# R help page:

# typeof( x ) determines the (R internal) type or storage mode
# of any R object x; typeof( ) returns a character string

# the most important current possible values are the vector types
# 'logical', 'integer', 'double', 'complex', 'character'
# ('numeric', which you'll also meet below, is a mode and a class
# rather than a type, so 'typeof( )' never returns it)

# (i won't talk about 'complex', because we won't use any complex numbers
# in this class)

# all of these data types apply to scalars as well as vectors; another
# important vector type is 'list' (more about that in a later
# tutorial file)

# more esoteric data types in R, for hardcore coders:
# "raw", "list", "NULL", "closure" (function), "special" and
# "builtin" (basic functions and operators), "environment", "S4"
# (some S4 objects) and others that are unlikely to be seen at user
# level ("symbol", "pairlist", "promise", "language", "char", "...",
# "any", "expression", "externalptr", "bytecode" and "weakref")

##### (A) logical

temp.1 <- ( 2 == 2 )

# '==' is a comparison operator (there will be more about this topic
# below); '==' asks the question: are the things to the left and right
# of '==' identical?
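
# --> sidebar with a few more '==' comparisons begins below this line

# as a small sketch (not part of the original session), here are
# a few more questions you can ask with '=='; the object name
# 'demo.eq' is hypothetical, and it's removed at the end so that
# later 'ls( )' listings in this file are unaffected

demo.eq <- ( 2 == 3 )

demo.eq

# [1] FALSE

'foo' == 'foo'

# [1] TRUE

( 1 + 1 ) == 2

# [1] TRUE

rm( demo.eq )

# --> sidebar with a few more '==' comparisons ends above this line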
# there are three ways to assess the kind of object you're working with
# in R:

mode( temp.1 )

# [1] "logical"

typeof( temp.1 )

# [1] "logical"

class( temp.1 )

# [1] "logical"

# fortunately they all agree when the object is logical, but
# they don't always agree on other kinds of objects (see below)

# 'mode( )' and 'typeof( )' both identify the internal storage type
# of an object (i.e., how R actually stores the object in memory),
# except that 'typeof( )' sometimes gives more detail than 'mode( )'
# (see *numeric* below)

# contemporary computers all use base 2 (binary, 0 and 1) for
# internal representations of what we would think of as
# real-world objects

# a single binary digit is called a *bit*; 8 bits together form
# a *byte*; in byte storage the integer 1 would be 00000001,
# and you can see that a single byte can store any integer
# between 0 and ( 2^8 - 1 ) = 255

# non-integer real numbers are represented in contemporary computers
# in either *single-* or *double-precision floating-point* format;
# single precision uses 32 adjacent bits of storage, and
# double precision uses 64 bits; in double precision (the default
# for numeric data), R can work with real numbers between
# +/- 1.797693 * 10^308, and it can work with numbers as close to 0
# as +/- 2.225074 * 10^( -308 )

# single- and double-precision numeric objects contain about 7 and 15
# significant decimal digits, respectively

# basic characters (unaccented letters, digits, punctuation) are
# stored in R using what's called ASCII coding,
# a method first introduced in 1963 to represent letters
# and other useful characters in 7-bit binary format; characters
# outside that set are stored with richer encodings such as UTF-8

# 'class( )' is in R to support object-oriented programming (OOP);
# 'class( )' summarizes the *class attributes* an object possesses

# in OOP, objects are instances of classes (e.g., 'logical' above);
# they may contain both *fields* (otherwise known as *attributes*)
# and code (also known as *procedures* or *methods*)

# concerning 'temp.1', for example: it knows that it's of class
# 'logical' because it inherited
# that from the way it was created, i.e.,
# the logical comparison '( 2 == 2 )'; because R knows that 'temp.1'
# is of class 'logical', R uses that class to decide how to treat it
# (for example, how to print it, and how to coerce it when you combine
# it with objects of other types; see the note on 'c( )' below)

# returning to discussion of logical objects:

# R has all the usual logical operators, but not using words;
# in R, 'and' is '&', 'or' is '|', and 'not' is '!'

# '&' and '|' require a bit more explaining, because R also
# has the operators '&&' and '||'

# '&' is called element-wise logical 'and', whereas '&&' is just
# logical 'and' (and similarly for '|' versus '||'); what this means
# is as follows:

# first i need to introduce the 'combine' function 'c( )';
# this just takes a list of objects and returns a vector
# whose elements correspond to the things on the list:

c( 1, 2 )

# [1] 1 2

# now '&' versus '&&':

c( T, T ) & c( T, F )

# [1] TRUE FALSE

# but

c( T, T ) && c( T, F )

# [1] TRUE

# in other words, '&' performs 'and' on an entire vector, but '&&'
# only performs 'and' on the first component ('|' and '||' work
# in an analogous manner)

# (that's the behavior in the version of R used here, 3.5.2;
# from R 4.3.0 onward, '&&' and '||' instead stop with an error
# if either side has length greater than 1)

# following the basic rules of boolean logic, you can do arithmetic
# on logical objects:

1 + T

# [1] 2

# T (TRUE) is treated as 1 and F (FALSE) as 0

# an important note about 'c( )': you can combine objects of different
# types with 'c( )', but if you do, R will *coerce* (a nice word for
# *force*) all of the resulting objects to the highest type of any
# single object in the list, in the following hierarchy:

# NULL < raw < logical < integer < double < complex < character <
# list < expression

# so 'c( 3, 'foo' )' will force the 'double' value 3 to the
# character string '3':

c( 3, 'foo' )

# [1] "3" "foo"

##### (B) numeric, double and integer

temp.2 <- 1

mode( temp.2 )

# [1] "numeric"

typeof( temp.2 )

# [1] "double"

class( temp.2 )

# [1] "numeric"

# these three data types (numeric, double, integer) are used
# by R to represent what we usually think of
# as real numbers

# the default mode for a real number in R is 'numeric', even if
# it looks like an integer to you (notice that 'temp.2 <- 1' looks like
# temp.2 is being assigned the integer value 1, but R thinks of it
# as a double-precision real number; see 'typeof( )' above)

# the default for 'numeric' objects in R is double precision

# there are a bunch of built-in functions that allow you to
# ask R about the data type of an object; here are three examples
# on the number '1':

is.numeric( 1 )

# [1] TRUE

is.double( 1 )

# [1] TRUE

is.integer( 1 )

# [1] FALSE

# so, as i said above, the number '1' defaults to double-precision mode,
# which makes it 'numeric' but not 'integer'

# there are also a bunch of built-in functions that allow you to
# force an R object to have a particular mode/type; here's an example:

temp.3 <- as.integer( 1 )

mode( temp.3 )

# [1] "numeric"

typeof( temp.3 )

# [1] "integer"

class( temp.3 )

# [1] "integer"

# other functions like 'as.integer' include 'as.numeric', 'as.double'
# and 'as.character'

# --> in-the-weeds sidebar on mode, type and class begins below this line

# you may at this point be confused about the differences in R
# between mode, type and class; the people who wrote R make the
# following confession/'explanation' on the help page for 'numeric':

# It is a historical anomaly that R has two names for its
# floating-point vectors, 'double' and 'numeric' (and formerly
# had 'real'). 'double' is the name of the 'type'. 'numeric' is
# the name of the 'mode' and also of the implicit 'class'.
# as far as i can tell, what's going on here is that when R was
# originally written, it was not an object-oriented (OO) system;
# the developers then created an OO system within R called S3,
# which purists crapped on because it didn't adhere strictly enough
# to their idea of how OO systems should work; so the R developers
# then created another OO system within R called S4, which is
# much stricter but which then created confusion between the usages
# of mode, type and class in S3 and S4

# see the R guru hadley wickham's web page

# adv-r.had.co.nz/S4.html

# for more on these angels-on-the-head-of-a-pin topics (as far as
# day-in-day-out beginning and intermediate R coding is concerned)

# --> in-the-weeds sidebar on mode, type and class ends above this line

##### (B.1) arithmetic in R

# numeric scalars behave in the expected ways when
# manipulated by the usual arithmetic operations: + (addition),
# - (subtraction), * (multiplication), / (division), ^ (exponentiation)
# (e.g., 2^3 returns 2 to the third power)

# the usual precedence rules apply: in expressions such as 1 + 2 * 3
# and 1 / 2 + 3, multiplication and division take precedence over
# addition and subtraction; in 1 + 2^4 * 3 , exponentiation takes
# precedence over multiplication; see, e.g.,

# www.datamentor.io/r-programming/precedence-associativity/

# for the full chart summarizing R's preference order

# i strongly recommend that you use parentheses ( ) to
# clarify to yourself and R exactly what you want to do;
# for example, 1 + ( 2 * 3 ) versus ( 1 + 2 ) * 3 and
# 1 + ( ( 2^4 ) * 3 ) versus ( ( 1 + 2 )^4 ) * 3 versus
# 1 + 2^( 4 * 3 )

# in effect, parentheses have higher precedence than all
# arithmetic operations

##### (B.2) special scalars

# there are 4 special scalars in R:

# NULL

# is an object that R returns when something you've asked R to do
# results in an undefined value

# NA

# (not available) is R's symbol for missing data; the treatment
# of missing data is an important topic in data science,
# to which we'll return later

# NaN

# (not a number) is a numeric (double-precision) scalar and represents
# the result of you trying to do something bizarre in the real number
# system; for example,

0 / 0

# returns

# [1] NaN

# Inf and -Inf

# stand for infinity and negative infinity, which are real values
# bigger in absolute value than the largest real number R knows
# how to manipulate; for example,

1 / 0

# returns

# [1] Inf

# and so does

0 + 10^400

# [1] Inf

# in addition to + - * / ^, R has some other built-in
# arithmetic operators as well: %/% is integer division
# (e.g., 11 %/% 4 is 2; R takes the answer from real-number division,
# 2.75, and tosses away the .75); and %% is modulus, which returns
# the integer remainder after integer division (e.g., 9 %% 3 is 0,
# and 10 %% 3 is 1)

# integers are stored in a different, more compact, way than
# non-integer real numbers; an object of mode/class/type 'integer'
# can only have values between +/- 2147483647, which is ( 2^31 - 1 )

# earlier i defined temp.3 to be the integer '1'

temp.3 + 2147483647

# [1] 2147483648

typeof( temp.3 + 2147483647 )

# [1] "double"

temp.3 + as.integer( 2147483647 )

# [1] NA
# Warning message:
# In temp.3 + as.integer(2147483647) : NAs produced by integer overflow

# what just happened was this: when i asked R to add 2147483647
# to temp.3, it interpreted 2147483647 as a double-precision real
# number (the default for numeric data), and it then noticed a type
# mismatch between temp.3 (integer) and 2147483647 (double-precision)
# when i tried to add them; using its standard rules for resolving
# type mismatches, it *coerced* temp.3 to double precision and added
# it using double-precision arithmetic to 2147483647; but when i forced
# R to regard both temp.3 and 2147483647 as integers, the result of
# integer addition on the two objects was an integer overflow

# as mentioned previously but not in detail, *coerce* is R jargon
# for forcing an object into a different mode than it currently has,
# typically to resolve
# type mismatches

# you may think this stuff is too esoteric to matter, but it can have
# real consequences; here's an example

##### (B.3) 'for' loops

# now is as good a time as any to introduce the R structure called
# the 'for-loop' (every language, going back to FORTRAN in the
# 1950s, has this sort of capability): computers are really good at doing
# the same thing over again many times on different data, which
# is what 'for' loops do:

for ( i in 1:3 ) {
  print( i )
}

# [1] 1
# [1] 2
# [1] 3

# the highly useful syntax '1:3' is shorthand for the vector
# ( 1, 2, 3 ) in R; you can also do things like '( - 3 ):5'
# and '6:2' (try them and see what they do); both arguments
# in the syntax 'm:n' will typically be treated as integers by R,
# but 'x:y' with 'x' and 'y' numeric also produces a result
# with no error message (it may just not be what you expected):

(1.2):4

# [1] 1.2 2.2 3.2

# the left and right curly brackets '{' and '}' in the 'for' loop
# are delimiters that enclose the stuff you want R to do repeatedly,
# in the case of the loop above as 'i' runs from 1 to 3; in this case
# i could have instead used the command

for ( i in 1:3 ) print( i )

# and this would have done the same thing as the 'for' loop above
# with curly brackets, but i like the curly-bracket method
# because it separates the thing that's being done repetitively
# from the loop that defines the repetitions; in any case, if you
# want to do two or more things in your 'for' loop, the
# curly-bracket approach is the natural way to go

# as an alternative to '{' and '}', R will allow you to issue
# more than one command on a single line with the syntax

# first-command ; second-command ; ...
# but i like the curly-bracket approach better because it makes
# my code more readable

# in the 'for' loop above, 'i' is called a *dummy variable of iteration*
# or a *loop index*; since no object called 'i' exists in the
# current R session, R creates it, and because '1:3' implies
# the integer sequence ( 1, 2, 3 ), 'i' is of mode/type 'integer':

for ( i in 1:3 ) {
  print( is.integer( i ) )
}

# [1] TRUE
# [1] TRUE
# [1] TRUE

# this also introduces the highly useful practice in R of
# calling a function with inputs that include the result of
# calling another function: in the command line

# print( is.integer( i ) )

# R evaluates the inner function call 'is.integer( i )' and passes
# the result as an input to the built-in function 'print',
# which sends a reply to the screen in your current command window
# consisting of the current value of the thing being printed

for ( i in 1:3 ) {
  i
}

# this code prints nothing; in other words, if you want to know
# what's going on inside a 'for' loop, you have to use 'print'

for ( i in 1:3 ) {
  print( i )
}

# [1] 1
# [1] 2
# [1] 3

for ( i in 1:2147483648 ) {
}

# Error in for (i in 1:2147483648) { :
#   long vectors not supported yet: eval.c:6387

# ordinary vectors in R are indexed by integers, so they can have
# at most 2147483647 = ( 2^31 - 1 ) elements; '1:2147483648' is
# one element longer than that (a so-called *long vector*), and
# in this version of R a 'for' loop can't iterate over it

# this is an example of why it's good to know the data types
# of all of your R objects: suppose you were running a
# simulation with a really large number of simulation replications;
# if you tried to run the code

m <- 3000000000

# at the moment, m is double precision

for ( i in 1:m ) {
  # 1:m would need 3 billion elements, more than ( 2^31 - 1 )
}

# Error in for (i in 1:m) { : long vectors not supported yet: eval.c:6387

# without understanding data types in R, you would have no idea
# what that error message meant

##### (C) character

temp.4 <- 'foo'

mode( temp.4 )

# [1] "character"

typeof( temp.4 )

# [1] "character"

class( temp.4 )

# [1] "character"

# as their name implies, 'character' objects are
# strings of text

# R will happily work with data sets in which some variables
# are 'numeric' and others are 'character'

# you can also mix 'numeric' and 'character' objects in printing
# with the 'paste' function:

paste( 'pi to 15 significant decimal digits is', pi )

# [1] "pi to 15 significant decimal digits is 3.14159265358979"

# paste coerces all of its arguments to 'character' and
# concatenates the result into one big character string; this is useful
# for printing out the results of a calculation in a
# human-friendly manner

##########################################################################

# (7) built-in operators in R for comparing objects

# we already saw an example of this in creating the logical object

# temp.1 <- ( 2 == 2 )

# here's a table of R's comparison operators:

# R operator    description

# <             less than
# <=            less than or equal to
# >             greater than
# >=            greater than or equal to
# ==            exactly equal to
# !=            not equal to

# '==' on two double-precision objects compares their stored binary
# values exactly; because doubles carry only about 15 to 16 significant
# decimal digits, numbers that differ by a tiny enough amount are
# stored as the same value and so are declared equal (for example,
# '( 1 + 10^( - 16 ) ) == 1' is TRUE; for numbers near 1 the relevant
# scale is about 1.110223 * 10^( - 16 )); the same limitation applies
# to '>', '<' and so on; if you want an approximate comparison with
# an explicit tolerance, see the built-in function 'all.equal'

##########################################################################

# (8) operations on vectors

# all of the standard arithmetic operations on vectors in R are
# performed in an element-wise fashion; for example

temp.5 <- c( 1, 2, 3, 4 )

temp.6 <- c( 2, 0, 1, 3 )

temp.5 + temp.6

# [1] 3 2 4 7

temp.5 - temp.6

# [1] -1 2 2 1

temp.5 * temp.6

# [1] 2 0 3 12

temp.5 / temp.6

# [1] 0.500000 Inf 3.000000 1.333333

# this makes R's handling of vectors the same as the usual mathematical
# operations of '+' and '-' on vectors in vector spaces, but not
# '*' (which would be a dot product) and '/' (undefined in vector spaces)

# the logical operators also work in an element-wise way:

temp.5 < temp.6

# [1] TRUE FALSE FALSE FALSE

temp.5 == temp.6

# [1] FALSE FALSE FALSE FALSE

# quick quiz: what would the result be of the
# command 'temp.5 = temp.6'?

# R also allows you to perform arithmetic and logical operations
# on vectors of unequal length, in some cases without issuing a
# warning message; this is either a bug or a feature, depending on
# your point of view

temp.5

# [1] 1 2 3 4

temp.7 <- c( 2, 3, 4, 5, 6 )

temp.5 + temp.7

# [1] 3 5 7 9 7
# Warning message:
# In temp.5 + temp.7 :
#   longer object length is not a multiple of shorter object length

# what R just did was as follows: it lined the two vectors up
# and correctly performed addition on the first k elements, where
# k is the length of the shorter vector; it then *recycled* the first
# element of the shorter vector, concatenating it to the end of the
# shorter vector, and performed addition on (in this case) element
# ( k + 1 )

print( temp.8 <- 2:9 )

# [1] 2 3 4 5 6 7 8 9

temp.5

# [1] 1 2 3 4

temp.5 + temp.8

# [1] 3 5 7 9 7 9 11 13

# this is even weirder; because the 'longer object length [temp.8,
# length 8] *is* a multiple of the shorter object length [temp.5,
# length 4]', R recycles the shorter object to bring its length
# up to that of the longer object -- i.e., '1 2 3 4' becomes
# '1 2 3 4 1 2 3 4' -- and it then adds the two vectors of length 8,
# all without giving you a warning message

# my advice: when you're doing arithmetic or logical operations
# on vectors, always try to ensure that they're of the same length,
# with one exception:

temp.5

# [1] 1 2 3 4

3.7 * temp.5

# [1] 3.7 7.4 11.1 14.8

# this is multiplication of a vector by a scalar; R treats the scalar
# 3.7 as a vector of length 1, recycles it up to the vector
# ( 3.7, 3.7, 3.7, 3.7 ), and does its usual vector multiplication
# element-wise, which in this case gives an answer that agrees with
# the usual mathematical multiplication of a vector by a scalar
# in vector spaces (adding a scalar to a vector works correctly
# in the same way)

##########################################################################

# (9) accessing subsets of vectors

# this is
# done in R with left and right square brackets:

temp.8

# [1] 2 3 4 5 6 7 8 9

temp.8[ 3 ]

# [1] 4

# in other words, 'temp.8[ 3 ]' asks R to extract the third element
# of temp.8

# if i try 'temp.8[ 3 7 ]' R replies

# Error: unexpected numeric constant in "temp.8[ 3 7"

# the object inside the square brackets has to be a valid vector:

temp.8[ c( 3, 7 ) ]

# this works fine:

# [1] 4 8

temp.8[ - 1 ]

# [1] 3 4 5 6 7 8 9

# this is an interesting and useful feature in R: negative indices
# ask R to omit the relevant elements and return the rest

# if i try 'temp.8[ c( 3, - 1 ) ]' R replies

# Error in temp.8[c(3, -1)] :
#   only 0's may be mixed with negative subscripts

# you can't mix positive and negative indices

temp.8[ c( 1.47, 3.02 ) ]

# [1] 2 4

# you *can* use double-precision numeric vectors as indices; R just
# truncates the numeric values toward zero to get integers
# (so an index of 1.99 would also pick out element 1)

( temp.8 > 3.5 )

# [1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE

# if you try to extract an element using an index that's out of bounds
# (e.g., 'temp.8' has 8 elements, so temp.8[ 11 ] is undefined),
# R will respond with an 'NA':

temp.8[ 11 ]

# [1] NA

# here's a useful trick: suppose you want to extract from temp.8
# all of its elements that are bigger than 3.5:

temp.8

# [1] 2 3 4 5 6 7 8 9

temp.8[ temp.8 > 3.5 ]

# [1] 4 5 6 7 8 9

# what R just did was to compute the logical vector 'temp.8 > 3.5',
# which (abbreviated) is 'F F T T T T T T', and it then uses this
# to select elements of temp.8 with the rule *omit if F, keep if T*

# you can also use the square bracket notation to modify only some
# elements of a vector and leave the rest alone:

temp.8

# [1] 2 3 4 5 6 7 8 9

temp.8[ 5 ] <- 0

temp.8

# [1] 2 3 4 5 0 7 8 9

temp.8[ c( 1, 3, 6 ) ] <- c( 0.1, 0.3, 0.2 )

temp.8

# [1] 0.1 3.0 0.3 5.0 0.0 0.2 8.0 9.0

##########################################################################

# (10) naming elements of vectors in R

# R allows you to name the elements of a vector:

temp.9 <- c( "alice", "bob", "chloe", "daniel" )

names( temp.9 ) <- c( "sophomore", "freshman", "senior", "sophomore" )

temp.9

# sophomore  freshman    senior sophomore
#   "alice"     "bob"   "chloe"  "daniel"

# you can use the element names to pick out subsets:

temp.9[ names( temp.9 ) == "sophomore" ]

# sophomore sophomore
#   "alice"  "daniel"

##########################################################################

# let's clean up the current working directory
# before finishing the R session:

ls( )

# [1] "i" "m" "temp.0" "temp.1" "temp.2" "temp.3" "temp.4"
# "temp.5" "temp.6" "temp.7" "temp.8" "temp.9"

# you can use the built-in function 'rm( )' (for remove) to get rid
# of objects you no longer want:

rm( 'i' )

ls( )

# [1] "m" "temp.0" "temp.1" "temp.2" "temp.3" "temp.4" "temp.5"
# "temp.6" "temp.7" "temp.8" "temp.9"

# you don't have to surround the name of the object with quotes:

rm( temp.0 )

ls( )

# [1] "m" "temp.1" "temp.2" "temp.3" "temp.4" "temp.5" "temp.6"
# "temp.7" "temp.8" "temp.9"

# in a strange departure from usual syntax, you can remove two or more
# objects without making a vector of them with 'c( )':

rm( m, temp.1 )

ls( )

# [1] "temp.2" "temp.3" "temp.4" "temp.5" "temp.6" "temp.7" "temp.8"
# "temp.9"

# you can remove *ALL* of the objects in the current working directory
# with the command 'rm( list = ls( ) )'

# the help page on 'rm( )' contains the following warning:

# to remove (almost) everything in the working environment.
# You will get no warning, so don't do this unless you are really sure.
# rm(list=ls())

# note that i inserted some spaces in my version of this command:
# 'rm( list = ls( ) )' -- i do this to improve the readability of my code

# with almost no exceptions, you're free to insert spaces in this way,
# except (of course) inside character strings: 'foo', ' foo', 'f oo',
# 'fo o' and 'foo ' are all different strings in R, as they should be

# unfortunately there is no 'unrm( )' (unremove) function in R;
# once something is gone from the workspace, it's gone

# but if you save all of your commands in a .txt file like this one,
# you can always re-create something you accidentally removed,
# by copying and pasting the relevant code into your current session

# suppose that you wanted to keep only temp.6 and temp.7

# you could laboriously write out the command

# rm( temp.2, temp.3, temp.4, temp.5, temp.8, temp.9 )

# but you could instead be creatively lazy -- the little code block below
# uses the built-in function 'setdiff', which returns the elements of
# its first argument that are not in its second (in effect, this code
# block lets you specify which objects to keep instead of
# which to remove):

print( keep <- c( 'temp.6', 'temp.7' ) )

# [1] "temp.6" "temp.7"

print( remove <- setdiff( ls( ), keep ) )

# [1] "keep" "temp.2" "temp.3" "temp.4" "temp.5" "temp.8" "temp.9"

# 'remove' itself is not in this list, because 'ls( )' was evaluated
# before 'remove' was created; that's why it's added by hand below

rm( list = c( remove, 'remove' ) )

ls( )

# [1] "temp.6" "temp.7"

# you can also treat two vectors as sets and do union and intersection
# operations on them, as well as keeping track of which elements are
# the same and which are different and asking questions about whether
# an element is in a vector or not; the relevant built-in functions are
# called 'union', 'intersect', 'setdiff' (partially illustrated above),
# 'setequal', and 'is.element':

temp.6

# [1] 2 0 1 3

temp.7

# [1] 2 3 4 5 6

union( temp.6, temp.7 )

# [1] 2 0 1 3 4 5 6

intersect( temp.6, temp.7 )

# [1] 2 3

setdiff( temp.6, temp.7 )

# [1] 0 1

setdiff( temp.7, temp.6 )

# [1] 4 5 6

# so 'setdiff'
# is asymmetric in its inputs: 'setdiff( x, y )' returns
# a vector of the elements of 'x' that are not in 'y'

setequal( temp.6, temp.7 )

# [1] FALSE

is.element( 2, temp.6 )

# [1] TRUE

##########################################################################

# now get out of R, in such a way that you can get back into R
# and pick up exactly where you left off:

# if you want instead to exit R without saving anything, you can just
# call the built-in function 'q' (for *quit*):

q( )

# under the windows 10 OS, the graphical user interface (GUI) will
# pop up a new window that asks if you want to 'Save workspace image?'

# if you click 'Yes', a file called '.RData' will be saved in your
# current working directory; if you click 'No', R will exit
# without saving anything

# here's a different way to save your work before exiting R, one that
# allows you to name your '.RData' file (it would be natural to name
# the file for this session something like 'intro.RData'):

save.image( file = 'intro.RData' )

# note: if there's already a file with that name in your current
# working directory, this 'save.image' command will cheerfully
# over-write your previous file, without warning

# now you can say 'q( )' as above and this time *exit without
# saving anything* (click 'No'); in the directory that you either
# chose from the pull-down menu or mentioned in your 'setwd( )' command
# at the beginning of this tutorial, you'll now have a file called
# 'intro.RData'; if you double-left-click on that, R will start up again
# and end its welcome banner with

# [Previously saved workspace restored]

# if you now say

ls( )

# R will reply

# [1] "temp.6" "temp.7"

# and you can continue working in R just as if you never left

# you can of course run the 'save.image( )' command anytime during
# an active R session, without then leaving R; this is useful if
# you're going away from your computer for an extended period of time
# and you need to protect against the possibility (windows, anyone?)
# that your OS will reboot while you're gone and destroy your R session

##########################################################################

# R is both deep and wide -- it can do a *lot* more than i've shown
# here -- but this is a good place to stop for now

##########################################################################
##########################################################################