##########################################################################
##########################################################################

# a brief introduction to some of the most important functionality
# in R

# david draper, 13 jan 2019

# the comment symbol in R is #

# this text file can be copied and pasted into an R session, bit by bit,
# as part of learning about creating new R objects, R's data types, 
# operations on vectors, how to write functions, and simple plotting

##########################################################################

# see 

#   cran.r-project.org/doc/manuals/r-release/R-intro.html

# and

#   cran.r-project.org/doc/manuals/r-release/R-lang.html

# for (voluminous) detail on the R language

# if you're *completely* new to R, i recommend working through
# Appendix A ('A sample session') in the first URL above; by
# 'working through' i mean copy (one by one) all of the R commands 
# listed there into your own R session and pay attention to
# what comes out

# you might work all the way through this file first, and then
# go to Appendix A; you should then find that you understand most or all
# of what's going on in that Appendix

##########################################################################

# i work with R in the windows 10 operating system (OS) in an 
# old-school fashion: i start an R session and i also open a .txt file 
# with notepad

# i formulate my R code in the .txt file, editing it until
# it looks right; i then copy and paste it into my R session and run it;
# i copy whatever R says in return and paste R's reply into
# the .txt file

# when fully active, my screen has (1) a large R session window
# with command and plotting areas and (2) a .txt file recording
# both sides of my conversation with R

# you may prefer instead to learn how to use 'RStudio'; it's a freeware
# environment that maintains those same 3 ingredients (command
# window, plotting window, conversation window) for you automatically

##########################################################################

# preliminaries in the windows 10 OS (with analogous operations 
# under Mac OS X or Linux):

# (1) before starting R, create a directory in which you want
# the results of your R session to be stored (this is also where
# you want to create and save your .txt file if you use my
# old-school method)

# for example, your main directory for this course might be called
# 'AMS-206', and you might make a sub-directory for this introductory
# session called (e.g.) 'R-Introduction'

# (2) start up R; you'll get a welcome banner something like this: 

# --> welcome banner begins below this line

#   R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
#   Copyright (C) 2018 The R Foundation for Statistical Computing
#   Platform: x86_64-w64-mingw32/x64 (64-bit)

#   R is free software and comes with ABSOLUTELY NO WARRANTY.
#   You are welcome to redistribute it under certain conditions.
#   Type 'license()' or 'licence()' for distribution details.

#     Natural language support but running in an English locale

#   R is a collaborative project with many contributors.
#   Type 'contributors()' for more information and
#   'citation()' on how to cite R or R packages in publications.

#   Type 'demo()' for some demos, 'help()' for on-line help, or
#   'help.start()' for an HTML browser interface to help.
#   Type 'q()' to quit R.

#     [Previously saved workspace restored]

# --> welcome banner ends above this line

# --> sidebar on 'help( )' begins below this line

# important: 'help( )' is an extremely useful built-in function;
# if there's a function named 'foo', the command 'help( foo )'
# will display the official R help page about 'foo', by default
# (in windows) in your web browser; the pages are installed with R,
# so you don't need to be online (these pages assume some familiarity
# with R, and may be hard to understand at first, but you'll get
# better at interpreting them over time)

# --> sidebar on 'help( )' ends above this line

# (3) click on the Misc menu and uncheck the Buffered output option 
# (this will force R to converse with you continuously rather than 
# storing up a lot of its replies in a buffer and only replying 
# to you when the buffer is full)

# click on the File menu and choose 'Change dir...'

# click through your file system tree until you come to
# the directory you created earlier, where you want to store
# the results from this introductory session; click OK when you
# arrive at this directory

# at any point, when R has finished doing what you've asked it to do,
# it will prompt you to issue another command with the
# character string '> ':

#   > 

# commands in R (things you type in at the '>' prompt) involve
# two types of things: (a) expressions and (b) functions 

# expressions are things that R evaluates for you, and when you work
# with functions you'll either be calling them or creating them

# an example of an expression evaluation is

1 + 1

# to which R replies

# [1] 2

# (i'll explain the '[1]' later)

# R has a rich set of built-in functions -- earlier i called 'help( foo )'
# a command, but it's really a call to the built-in function 'help'

# --> sidebar on functions begins below this line

# what is a function in R? as in most other programming languages,
# it's a structure that accepts inputs and produces outputs

# the generic structure of the definition of an R function that's
# prepared to accept k inputs is

#   function.name <- function( input.1, input.2, ..., input.k ) {

#     (here the function does things for you)

#   }

# and the function is then called with the syntax
# 'function.name( input.1, input.2, ..., input.k )'

# as an example of function inputs, if you type 'help( help )' 
# you'll get a web page that includes the following description:

#   Description

#   help is the primary interface to the help systems.

#   Usage

#   help(topic, package = NULL, lib.loc = NULL,
#        verbose = getOption("verbose"),
#        try.all.packages = getOption("help.try.all.packages"),
#        help_type = getOption("help_type"))

# so 'help' is prepared to accept 6 inputs: (topic, package, ...,
# help_type)

# some inputs to built-in R functions have *default* values:
# if you don't mention those inputs in your function call,
# R will use the built-in default value for that input,
# which is listed on the help page for that function using
# the syntax 'input = default.input'; for example, if you just
# type the command 'help( )', the input 'package' will default
# to 'NULL' (a special keyword in R meaning essentially 'an object
# that exists but has no value'), and so on for the other inputs

# --> sidebar on functions ends above this line
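
# a minimal sketch of defining and calling a function with a default
# input (the function name 'add.scaled' and its inputs 'x', 'y' and 'k'
# are hypothetical, chosen only for illustration; the assignment
# operator '<-' is explained in the section on assignment below):

add.scaled <- function( x, y, k = 2 ) {

  x + k * y

}

add.scaled( 1, 3 )            # 'k' is not mentioned, so the default 2 is used

# [1] 7

add.scaled( 1, 3, k = 10 )    # here '=' supplies a non-default value for 'k'

# [1] 31

rm( add.scaled )              # remove the example function again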

# (3) to see what the current directory is at any time in R,
# you can use the built-in function call

#   getwd( )

# which stands for 'get working directory'

# in my case, on my home desktop, this is

#   getwd( )

#   [1] "C:/DD/Teaching/AMS-206/Winter-2019"

# --> sidebar on '[1]' begins below this line

# what's the deal with '[1]'? if the output of a function
# is a vector of things, R prints them and 'helpfully' tells you which
# thing comes first; if the vector is too long to fit on
# one line, R continues it on a second line and begins the second line
# with something like '[18] ...', which tells you that the first line
# had 17 elements in it and the 18th element is at the
# beginning of the second line; and so on

# --> sidebar on '[1]' ends above this line
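
# a quick way to see the bracketed counters in action (a small sketch;
# '101:140' is shorthand for the integers from 101 to 140, and exactly
# where the lines break depends on the width of your R window):

101:140

# each line of R's reply begins with the position, in square brackets,
# of the first element printed on that line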

# if you know the exact path to the directory you want,
# instead of the menu-based 'Change dir...' approach
# you can use the *set working directory* function 'setwd':

#   setwd( 'C:/DD/Teaching/AMS-206/Winter-2019' )

# this does the same thing as the menu-based method, but to use 'setwd'
# you have to know the path to the directory

# (4) you might begin your conversation with R with the 
# built-in function call

ls( )

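# this lists (almost) all of the objects that currently
# exist in your R workspace (the collection of objects R is holding
# in memory for you; when you quit, R offers to save it in a file
# in your current working directory, which is how it can be restored
# at the start of your next session); here, because nothing
# is there, R replies

#   character(0)

# which is its way of saying that the workspace is currently
# (almost) empty; you can instead try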

ls( all.names = T )

# to get (almost almost) all defined objects

# --> sidebar on 'T' and 'F' begins below this line

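# note: 'T' and 'F' are permissible abbreviations in R for 'TRUE'
# and 'FALSE', respectively; unlike 'TRUE' and 'FALSE', though, 'T' and
# 'F' are ordinary objects that can be over-written (e.g., 'T <- 0'
# is legal), so the abbreviations are only safe if nobody has
# re-defined them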

# --> sidebar on 'T' and 'F' ends above this line

# to my second ls( ... ) command R replies

#   [1] ".Random.seed"

# well, what the hell is that? whenever the attributes of an object 
# in an R session are unknown to you, i recommend using the 'str' 
# (structure) function:

str( .Random.seed )

# to which R replies

#   int [1:626] 403 90 1363676567 1985947481 -1428087515 ...

# explanation: all of R's pseudo-random number generators need 
# 'random seeds' for initialization; '.Random.seed' stores the current
# state of the generator as a vector of integers, each between
# - 2147483647 and + 2147483647 (about 2.1 * 10^9 in absolute value)

# there are other objects that 'ls( )' doesn't show you, because
# they're built into R rather than stored in your workspace
# (an example is .Machine , which provides useful information
# about things like the largest double-precision real number that R
# can handle (1.797693e+308)), but this is getting
# too far down in the weeds for beginning R coders

##########################################################################

# (5) the assignment operators

# your R workspace is currently (essentially) empty;
# how do you fill it with stuff that will help you get your work done?

# there are two main ways: you can read stuff in from outside R
# (i'll cover that in another tutorial file), or you can *assign* 
# R objects values on the command line

# the two most commonly used assignment operators in R are '<-' and '='

temp.0 <- 1

# '<-' is meant to look like a left arrow, so you could read
# the command above as '1 goes to temp.0' or 'temp.0 gets 1'
# or (more directly) 'temp.0 is assigned the value 1'

# if the R object temp.0 doesn't already exist in your current
# workspace, R creates it and assigns the numeric value 1 to it;
# if temp.0 already exists, R over-writes its previous value with 1

temp.0 = 1

# this has exactly the same effect as 'temp.0 <- 1'

# i strongly recommend that you stick with '<-', because it's easy
# to confuse '=' with '==', which has a completely different meaning
# (explained below)

ls( )

# [1] "temp.0"

# now the current workspace has one object in it, 'temp.0'

# --> additional comment on '=' begins below this line

# note that '=' plays a different role
# inside a function call: in the command 'ls( all.names = T )' above,
# all '=' is doing is providing the function 'ls' with a non-default
# value of its input 'all.names'; no object called 'all.names'
# is created in your workspace

# --> additional comment on '=' ends above this line

##########################################################################

# (6) some comments on data types and objects in R:

# R supports a wide variety of data structures, including vector,
# list, matrix, data frame, factor and table

# in R, what we usually think of mathematically as a scalar 
# is just a vector with only one element

# you can find out about the various kinds of R data types
# with the built-in function 'typeof( )' -- to paraphrase its 
# R help page:

#   typeof( x ) determines the (R internal) type or storage mode 
#   of any R object x; typeof( ) returns a character string

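#   the most important current possible values are the vector types 

#   'logical', 'integer', 'double', 'complex', 'character'

# ('numeric' is not a value that 'typeof( )' returns; it's the *mode*
# of both 'integer' and 'double' objects, as you'll see below)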

# (i won't talk about 'complex', because we won't use any complex numbers
# in this class)

#   all of these data types apply to scalars as well as vectors; another
#   important vector type is 'list' (more about that in a later
#   tutorial file)

#   more esoteric data types in R, for hardcore coders: 

#   "raw", "NULL", "closure" (function), "special" and 
#   "builtin" (basic functions and operators), "environment", "S4" 
#   (some S4 objects) and others that are unlikely to be seen at user 
#   level ("symbol", "pairlist", "promise", "language", "char", "...", 
#   "any", "expression", "externalptr", "bytecode" and "weakref")

##### (A) logical

temp.1 <- ( 2 == 2 )

# '==' is a comparison operator (there will be more about this topic
# below); '==' asks the question: are the things to the left and right 
# of '==' identical?

# there are three ways to assess the kind of object you're working with
# in R:

mode( temp.1 )

# [1] "logical"

typeof( temp.1 )

# [1] "logical"

class( temp.1 )

# [1] "logical"

# fortunately they all agree when the object is logical, but
# they don't always agree on other kinds of objects (see below)

# 'mode( )' and 'typeof( )' both identify the internal storage type 
# of an object (i.e., how R actually stores the object in memory),
# except that 'typeof( )' sometimes gives more detail than 'mode( )'
# (see *numeric* below)

# contemporary computers all use base 2 (binary, 0 and 1) for
# internal representations of what we would think of as 
# real-world objects

# a single binary digit is called a *bit*; 8 bits together form 
# a *byte*; in byte storage the integer 1 would be 00000001, 
# and you can see that a single byte can store any integer 
# between 0 and ( 2^8 - 1 ) = 255

# non-integer real numbers are represented in contemporary computers 
# in either *single-* or *double-precision floating-point* format;
# single precision uses 32 adjacent bits of storage, and 
# double precision uses 64 bits; in double precision (the default
# for numeric data), R can work with real numbers between 
# +/- 1.797693 * 10^308, and it can work with numbers as close to 0
# as +/- 2.225074 * 10^( -308 )

# single- and double-precision numeric objects carry about 7 and
# about 15 to 16 significant decimal digits, respectively
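
# the limits quoted above can be read directly from the built-in
# object '.Machine' (a short sketch; the values shown in the comments
# are those for standard 64-bit double precision and 32-bit integers):

.Machine$double.xmax          # largest finite double-precision number

# [1] 1.797693e+308

.Machine$double.xmin          # smallest positive normalized double

# [1] 2.225074e-308

.Machine$integer.max          # largest value of type 'integer'

# [1] 2147483647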

# characters are stored in binary too: the classic scheme is ASCII coding,
# a method first introduced in 1963 to represent letters
# and other useful characters in 7-bit binary format (R can also
# work with larger character sets, such as UTF-8, that extend ASCII)

# 'class( )' is in R to support object-oriented programming (OOP);
# 'class( )' summarizes the *class attributes* an object possesses

# in OOP, objects are instances of classes (e.g., 'logical' above);
# they may contain both *fields* (otherwise known as *attributes*)
# and code (also known as *procedures* or *methods*)

# concerning 'temp.1', for example: it knows that it's of class
# 'logical' because it inherited that from the way it was created, i.e.,
# the logical comparison '( 2 == 2 )'; because R knows that 'temp.1'
# is of class 'logical', functions such as 'print' and 'summary' can
# treat it in a way that's appropriate for logical objects (note,
# though, that R is quite permissive about mixing types: as you'll
# see below, it will often silently convert a logical object
# to a number or a character string rather than complain)

# returning to discussion of logical objects:

# R has all the usual logical operators, but not using words;
# in R, 'and' is '&', 'or' is '|', and 'not' is '!'

# '&' and '|' require a bit more explaining, because R also
# has the operators '&&' and '||'

# '&' is called element-wise logical 'and', whereas '&&' is just
# logical 'and' (and similarly for '|' versus '||'); what this means
# is as follows:

# first i need to introduce the 'combine' function 'c( )';
# this just takes a list of objects and returns a vector
# whose elements correspond to the things on the list:

c( 1, 2 )

# [1] 1 2

# now '&' versus '&&':

c( T, T ) & c( T, F )

# [1]  TRUE FALSE

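# but

c( T, T ) && c( T, F )

# [1] TRUE

# in other words, '&' performs 'and' on an entire vector, but '&&'
# only looks at the first component of each input ('|' and '||' work
# in an analogous manner); this is the behavior in R 3.5.2, the version
# used here; in R 4.3.0 and later, giving '&&' or '||' an input with
# more than one element is an error, so use '&&' and '||' only with
# single logical values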
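
# two related points, as a short sketch: (1) '&&' and '||' are meant
# for single TRUE/FALSE values, and they evaluate their right-hand side
# only when it's needed; (2) to reduce a whole logical vector to
# a single value, the built-in functions 'all( )' and 'any( )' are
# the usual tools

TRUE || stop( 'this is never evaluated' )

# [1] TRUE

all( c( T, T ) & c( T, F ) )

# [1] FALSE

any( c( T, T ) & c( T, F ) )

# [1] TRUE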

# you can also do arithmetic on logical objects, because R converts
# them to numbers first:

1 + T

# [1] 2

# T (TRUE) is treated as 1 and F (FALSE) as 0
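
# because TRUE counts as 1 and FALSE as 0, logical vectors are handy
# for counting (a small sketch; the vector of values is made up):
# 'sum( )' of a logical vector counts the TRUEs, and 'mean( )' gives
# the proportion of TRUEs

sum( c( 2, 5, 7, 1 ) > 3 )

# [1] 2

mean( c( 2, 5, 7, 1 ) > 3 )

# [1] 0.5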

# an important note about 'c( )': you can combine objects of different
# types with 'c( )', but if you do, R will *coerce* (a nice word for
# *force*) all of the elements of the result to the highest type of any
# single object in the list, in the following hierarchy:

#   NULL < raw < logical < integer < double < complex < character < 
#   list < expression

# so 'c( 3, 'foo' )' will force the 'double' value 3 to the
# character string '3':

c( 3, 'foo' )

# [1] "3"   "foo"
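
# a few more steps up the same hierarchy, as a sketch ('2L' is R's
# notation for the integer 2):

typeof( c( TRUE, 2L ) )       # logical and integer

# [1] "integer"

typeof( c( 2L, 2.5 ) )        # integer and double

# [1] "double"

typeof( c( TRUE, 2.5, 'foo' ) )    # everything becomes character

# [1] "character"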

##### (B) numeric, double and integer

temp.2 <- 1

mode( temp.2 )

# [1] "numeric"

typeof( temp.2 )

# [1] "double"

class( temp.2 )

# [1] "numeric"

# these three data types (numeric, double, integer) are used
# by R to represent what we usually think of as real numbers

# the default mode for a real number in R is 'numeric', even if
# it looks like an integer to you (notice that 'temp.2 <- 1' looks like
# temp.2 is being assigned the integer value 1, but R thinks of it
# as a double-precision real number; see 'typeof( )' above)

# the default for 'numeric' objects in R is double precision

# there are a bunch of built-in functions that allow you to
# ask R about the data type of an object; here are three examples
# on the number '1':

is.numeric( 1 )

# [1] TRUE

is.double( 1 )

# [1] TRUE

is.integer( 1 )

# [1] FALSE

# so, as i said above, the number '1' defaults to double-precision mode,
# which makes it 'numeric' but not 'integer'

# there are also a bunch of built-in functions that allow you to
# force an R object to have a particular mode/type; here's an example:

temp.3 <- as.integer( 1 )

mode( temp.3 )

# [1] "numeric"

typeof( temp.3 )

# [1] "integer"

class( temp.3 )

# [1] "integer"

# other functions like 'as.integer' include 'as.numeric', 'as.double'
# and 'as.character'
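
# a short sketch of what some of these conversions do:

as.integer( 3.9 )             # the fractional part is dropped, not rounded

# [1] 3

as.numeric( '3.14' )          # a character string that looks like a number

# [1] 3.14

as.character( 42 )

# [1] "42"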

# --> in-the-weeds sidebar on mode, type and class begins below this line

# you may at this point be confused about the differences in R
# between mode, type and class; the people who wrote R make the
# following confession/'explanation' on the help page for 'numeric':

#   It is a historical anomaly that R has two names for its 
#   floating-point vectors, 'double' and 'numeric' (and formerly 
#   had 'real'). 'double' is the name of the 'type'. 'numeric' is 
#   the name of the 'mode' and also of the implicit 'class'.

# as far as i can tell, what's going on here is that when R was
# originally written, it was not an object-oriented (OO) system;
# the developers then created an OO system within R called S3,
# which purists crapped on because it didn't adhere strictly enough
# to their idea of how OO systems should work; so the R developers
# then created another OO system within R called S4, which is
# much stricter but which then created confusion between the usages
# of mode, type and class in S3 and S4

# see the R guru hadley wickham's web page

#  adv-r.had.co.nz/S4.html

# for more on these angels-on-the-head-of-a-pin topics (as far as
# day-in-day-out beginning and intermediate R coding is concerned)

# --> in-the-weeds sidebar on mode, type and class ends above this line

##### (B.1) arithmetic in R

# numeric scalars behave in the expected ways when
# manipulated by the usual arithmetic operations: + (addition),
# - (subtraction), * (multiplication), / (division) and
# ^ (exponentiation; e.g., 2^3 returns 2 to the third power)

# the usual precedence rules apply: in expressions such as 1 + 2 * 3
# and 1 / 2 + 3, multiplication and division take precedence over 
# addition and subtraction; in 1 + 2^4 * 3 , exponentiation takes 
# precedence over multiplication; see, e.g.,

#   www.datamentor.io/r-programming/precedence-associativity/

# for the full chart summarizing R's precedence order

# i strongly recommend that you use parentheses ( ) to
# clarify to yourself and R exactly what you want to do;
# for example, 1 + ( 2 * 3 ) versus ( 1 + 2 ) * 3 and
# 1 + ( ( 2^4 ) * 3 ) versus ( ( 1 + 2 )^4 ) * 3 versus
# 1 + 2^( 4 * 3 )

# in effect, parentheses have higher precedence than all
# arithmetic operations
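
# a sketch of cases where the precedence rules matter (note in
# particular that exponentiation is performed before a leading
# minus sign is applied):

1 + 2 * 3

# [1] 7

( 1 + 2 ) * 3

# [1] 9

- 2^2

# [1] -4

( - 2 )^2

# [1] 4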

##### (B.2) special scalars

# there are 4 special values in R:

#   NULL

# is R's empty object (it has length 0); R often returns it when
# something you've asked R to do has no defined value

#   NA

# (not available) is R's symbol for missing data; the treatment
# of missing data is an important topic in data science, to which
# we'll return later

#   NaN

# (not a number) is a numeric (double-precision) scalar and represents
# the result of you trying to do something bizarre in the real number
# system; for example, 

0 / 0

# returns

# [1] NaN

#   Inf and -Inf

# stand for infinity and negative infinity, which are real values
# bigger in absolute value than the largest real number R knows
# how to manipulate; for example,

1 / 0

# returns

# [1] Inf

# and so does

0 + 10^400

# [1] Inf
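
# the special values have their own test functions (a brief sketch);
# in particular, comparing with '==' doesn't work for missing values,
# because anything compared with 'NA' is itself 'NA'

NA == NA

# [1] NA

is.na( NA )

# [1] TRUE

is.nan( 0 / 0 )

# [1] TRUE

is.finite( 1 / 0 )

# [1] FALSE

length( NULL )                # 'NULL' has no elements at all

# [1] 0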

# in addition to + - * / ^, R has some other built-in 
# arithmetic operators as well: %/% is integer division 
# (e.g., 11 %/% 4 is 2; R takes the answer from real-number division, 
# 2.75, and rounds it down to the nearest integer); and %% is modulus,
# which returns the remainder after integer division (e.g., 9 %% 3
# is 0, and 10 %% 3 is 1)
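
# a short sketch of both operators, including a negative number,
# where '%/%' goes down to the next lower integer (not toward 0)
# and '%%' returns a remainder with the same sign as the divisor:

11 %/% 4

# [1] 2

11 %% 4

# [1] 3

( - 11 ) %/% 4

# [1] -3

( - 11 ) %% 4

# [1] 1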

# integers are stored in a different, more compact, way than 
# non-integer real numbers; an object of mode/class/type 'integer'
# can only have values between +/- 2147483647, which is ( 2^{ 31 } - 1 )

# earlier i defined temp.3 to be the integer '1'

temp.3 + 2147483647

# [1] 2147483648

typeof( temp.3 + 2147483647 )

# [1] "double"

temp.3 + as.integer( 2147483647 )

# [1] NA
# Warning message:
# In temp.3 + as.integer(2147483647) : NAs produced by integer overflow

# what just happened was this: when i asked R to add 2147483647
# to temp.3, it interpreted 2147483647 as a double-precision real
# number (the default for numeric data), and it then noticed a type
# mismatch between temp.3 (integer) and 2147483647 (double-precision)
# when i tried to add them; using its standard rules for resolving
# type mismatches, it *coerced* temp.3 to double precision and added
# it using double-precision arithmetic to 2147483647; but when i forced
# R to regard both temp.3 and 2147483647 as integers, the result of
# integer addition on the two objects was an integer overflow

# as mentioned previously but not in detail, *coerce* is R jargon 
# for forcing an object into a different mode than it currently has, 
# typically to resolve type mismatches
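
# a short sketch: to create an integer directly, without calling
# 'as.integer( )', append the letter 'L' to the number

typeof( 1 )

# [1] "double"

typeof( 1L )

# [1] "integer"

typeof( 1L + 1L )             # integer arithmetic stays integer

# [1] "integer"

typeof( 1L + 1 )              # mixing with a double coerces to double

# [1] "double"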

# you may think this stuff is too esoteric to matter, but it can have
# real consequences; here's an example

##### (B.3) 'for' loops

# now is as good a time as any to introduce the R structure called
# the 'for-loop' (nearly every language, going back to FORTRAN in the
# 1950s, has this sort of capability): computers are really good at doing
# the same thing over again many times on different data, which
# is what 'for' loops do:

for ( i in 1:3 ) {

  print( i )

}

# [1] 1
# [1] 2
# [1] 3

# the highly useful syntax '1:3' is shorthand for the vector 
# ( 1, 2, 3 ) in R; you can also do things like '( - 3 ):5'
# and '6:2' (try them and see what they do); both arguments
# in the syntax 'm:n' will typically be treated as integers by R,
# but 'x:y' with 'x' and 'y' numeric also produces a result
# with no error message (it may just not be what you expected):

(1.2):4

# [1] 1.2 2.2 3.2
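
# two related sketches: the built-in function 'seq( )' gives you
# control over the step size, and 'seq_len( n )' is a safer way than
# '1:n' to count from 1 to n when n might be 0, because '1:0' counts
# *down* from 1 to 0 instead of producing an empty sequence

seq( 1, 4, by = 0.5 )

# [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0

1:0

# [1] 1 0

length( seq_len( 0 ) )

# [1] 0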

# the left and right curly brackets '{' and '}' in the 'for' loop
# are delimiters that enclose the stuff you want R to do repeatedly, 
# in the case of the loop above as 'i' runs from 1 to 3; in this case 
# i could have instead used the command

for ( i in 1:3 ) print( i )

# and this would have done the same thing as the 'for' loop above
# with curly brackets, but i like the curly-bracket method
# because it separates the thing that's being done repetitively
# from the loop that defines the repetitions; in any case, if you
# want to do two or more things in your 'for' loop, the
# curly-bracket approach is the natural way to go

# as an alternative to '{' and '}', R will allow you to issue 
# more than one command on a single line with the syntax

#   first-command ; second-command ; ...

# but i like the curly-bracket approach better because it makes
# my code more readable

# in the 'for' loop above, 'i' is called a *dummy variable of iteration*
# or a *loop index*; since no object called 'i' exists in the 
# current R session, R creates it, and because '1:3' implies 
# the integer sequence ( 1, 2, 3 ), 'i' is of mode/type 'integer':

for ( i in 1:3 ) {

  print( is.integer( i ) )

}

# [1] TRUE
# [1] TRUE
# [1] TRUE

# this also introduces the highly useful practice in R of
# calling a function with inputs that include the result of
# calling another function: in the command line 

#    print( is.integer( i ) )

# R evaluates the inner function call 'is.integer( i )' and passes 
# the result as an input to the built-in function 'print', 
# which sends a reply to the screen in your current command window 
# consisting of the current value of the thing being printed

for ( i in 1:3 ) {

  i

}

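# this code prints nothing, because R only echoes the value of an
# expression automatically at the top level of the command line;
# in other words, if you want to know what's going on inside
# a 'for' loop, you have to use 'print' 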

for ( i in 1:3 ) {

  print( i )

}

# [1] 1
# [1] 2
# [1] 3

for ( i in 1:2147483648 ) { 

}

# Error in for (i in 1:2147483648) { : 
#   long vectors not supported yet: eval.c:6387

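# the sequence '1:2147483648' has 2^31 elements, one more than the
# largest integer ( 2^31 - 1 ) = 2147483647; R calls a vector
# with more than ( 2^31 - 1 ) elements a *long vector*, and
# (in this version of R) 'for' loops can't iterate over long vectors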

# this is an example of why it's good to know the data types
# of all of your R objects: suppose you were running a
# simulation with a really large number of simulation replications;
# if you tried to run the code

m <- 3000000000        # at the moment, m is double precision
     
for ( i in 1:m ) {     # 1:m has 3 billion elements: a long vector

}

# Error in for (i in 1:m) { : long vectors not supported yet: eval.c:6387

# without understanding data types in R, you would have no idea
# what that error message meant
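
# a sketch of the data types involved: the ':' operator returns
# integers when it can, and silently switches to double precision
# when the sequence passes the largest integer

typeof( 1:3 )

# [1] "integer"

typeof( 2147483647:2147483648 )

# [1] "double"

# one possible work-around for very long loops (an illustration only,
# shown here with a small count so that it finishes instantly; the
# names 'n.reps' and 'count' are made up) is a 'while' loop with
# a double-precision counter, which never builds the long vector:

n.reps <- 5
count <- 0

while ( count < n.reps ) {

  count <- count + 1

}

count

# [1] 5

rm( n.reps, count )           # remove the example objects again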

##### (C) character

temp.4 <- 'foo'

mode( temp.4 )

# [1] "character"

typeof( temp.4 )

# [1] "character"

class( temp.4 )

# [1] "character"

# as their name implies, 'character' objects are strings of text

# R will happily work with data sets in which some variables
# are 'numeric' and others are 'character'

# you can also mix 'numeric' and 'character' objects in printing
# with the 'paste' function:

paste( 'pi to 15 significant decimal digits is', pi )

# [1] "pi to 15 significant decimal digits is 3.14159265358979"

# paste coerces all of its arguments to 'character' and
# concatenates the result into one big character string; this is useful
# for printing out the results of a calculation in a 
# human-friendly manner
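
# a few variations, as a sketch: the input 'sep' changes the separator
# that 'paste' puts between the pieces, 'paste0' uses no separator
# at all, and 'round' keeps a pasted number to a sensible length

paste( 'a', 'b', sep = '-' )

# [1] "a-b"

paste0( 'x', 1:3 )

# [1] "x1" "x2" "x3"

paste( 'the mean is', round( 10 / 3, 2 ) )

# [1] "the mean is 3.33"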

##########################################################################

# (7) built-in operators in R for comparing objects

# we already saw an example of this in creating the logical object

#   temp.1 <- ( 2 == 2 )

# here's a table of R's comparison operators:

# R operator         description

#     <         less than
#     <=        less than or equal to
#     >         greater than
#     >=        greater than or equal to
#     ==        exactly equal to
#     !=        not equal to

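# R compares double-precision objects exactly, with no built-in
# tolerance; but because most decimal fractions can't be stored
# exactly in binary, and double precision carries only about 16
# significant digits (near 1, differences smaller than about
# 1.110223 * 10^( - 16 ) are lost to rounding), two quantities that
# are equal mathematically may not be '==' in R; the built-in function
# 'all.equal( )' compares numbers up to a small tolerance instead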
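
# a sketch of how '==' behaves with double-precision numbers in
# practice: rounding error in binary arithmetic can make two
# mathematically equal quantities compare as unequal, and the built-in
# function 'all.equal( )' (wrapped in 'isTRUE( )') is a common way
# to compare up to a small tolerance instead

0.1 + 0.2 == 0.3

# [1] FALSE

isTRUE( all.equal( 0.1 + 0.2, 0.3 ) )

# [1] TRUE

1e-20 == 2e-20                # tiny, but different, so not equal

# [1] FALSE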

##########################################################################

# (8) operations on vectors

# all of the standard arithmetic operations on vectors in R are
# performed in an element-wise fashion; for example

temp.5 <- c( 1, 2, 3, 4 )

temp.6 <- c( 2, 0, 1, 3 )

temp.5 + temp.6

# [1] 3 2 4 7

temp.5 - temp.6

# [1] -1  2  2  1

temp.5 * temp.6

# [1]  2  0  3 12

temp.5 / temp.6

# [1] 0.500000      Inf 3.000000 1.333333

# this makes R's handling of vectors the same as the usual mathematical
# operations of '+' and '-' on vectors in vector spaces, but not 
# '*' (which would be a dot product) and '/' (undefined in vector spaces)

# the comparison operators also work in an element-wise way:

temp.5 < temp.6

# [1]  TRUE FALSE FALSE FALSE

temp.5 == temp.6

# [1] FALSE FALSE FALSE FALSE

# quick quiz: what would the result be of the command 'temp.5 = temp.6'?

# R also allows you to perform arithmetic and logical operations
# on vectors of unequal length, in some cases without issuing a 
# warning message; this is either a bug or a feature, depending on 
# your point of view

temp.5

# [1] 1 2 3 4

temp.7 <- c( 2, 3, 4, 5, 6 )

temp.5 + temp.7

# [1] 3 5 7 9 7

# Warning message:
# In temp.5 + temp.7 :
#   longer object length is not a multiple of shorter object length

# what R just did was as follows: it lined the two vectors up
# and correctly performed addition on the first k elements, where
# k is the length of the shorter vector; it then *recycled* the first
# element of the shorter vector, concatenating it to the end of the
# shorter vector, and performed addition on (in this case) element
# ( k + 1 )

print( temp.8 <- 2:9 )

# [1] 2 3 4 5 6 7 8 9

temp.5

# [1] 1 2 3 4

temp.5 + temp.8

# [1]  3  5  7  9  7  9 11 13

# this is even weirder; because the 'longer object length [temp.8, 
# length 8] *is* a multiple of the shorter object length [temp.5,
# length 4]', R recycles the shorter object to bring its length
# up to that of the longer object -- i.e., '1 2 3 4' becomes
# '1 2 3 4 1 2 3 4' -- and it then adds the two vectors of length 8,
# all without giving you a warning message

# my advice: when you're doing arithmetic or logical operations 
# on vectors, always try to ensure that they're of the same length,
# with one exception:

temp.5

# [1] 1 2 3 4

3.7 * temp.5

# [1]  3.7  7.4 11.1 14.8

# this is multiplication of a vector by a scalar; R treats the scalar
# 3.7 as a vector of length 1, recycles it up to the vector
# ( 3.7, 3.7, 3.7, 3.7 ), and does its usual vector multiplication
# element-wise, which in this case gives an answer that agrees with
# the usual mathematical multiplication of a vector by a scalar
# in vector spaces (adding a scalar to a vector works correctly
# in the same way)
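
# one way to guard against accidental recycling in your own code
# (a sketch; the function name 'add.same.length' is made up for
# this example) is to check the lengths explicitly with the built-in
# function 'stopifnot', which stops with an error on a mismatch:

add.same.length <- function( x, y ) {

  stopifnot( length( x ) == length( y ) )

  x + y

}

add.same.length( c( 1, 2, 3, 4 ), c( 2, 0, 1, 3 ) )

# [1] 3 2 4 7

# by contrast, 'add.same.length( 1:4, 2:9 )' would stop with an error
# instead of silently recycling the shorter vector

rm( add.same.length )         # remove the example function again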

##########################################################################

# (9) accessing subsets of vectors

# this is done in R with left and right square brackets:

temp.8

# [1] 2 3 4 5 6 7 8 9

temp.8[ 3 ]

# [1] 4

# in other words, 'temp.8[ 3 ]' asks R to extract the third element
# of temp.8

# if i try 'temp.8[ 3 7 ]' R replies

# Error: unexpected numeric constant in "temp.8[ 3 7"

# the object inside the square brackets has to be a valid vector:

temp.8[ c( 3, 7 ) ]          # this works fine

# [1] 4 8

temp.8[ - 1 ]

# [1] 3 4 5 6 7 8 9

# this is an interesting and useful feature in R: negative indices
# ask R to omit the relevant elements and return the rest

# if i try 'temp.8[ c( 3, - 1 ) ]' R replies

# Error in temp.8[c(3, -1)] : 
#   only 0's may be mixed with negative subscripts

# you can't mix positive and negative indices

temp.8[ c( 1.47, 3.02 ) ]

# [1] 2 4

# you *can* use double-precision numeric vectors as indices; R just
# truncates the numeric values to integers by dropping the fractional
# part (so an index of 2.9 would select element 2, not element 3)

( temp.8 > 3.5 )

# [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

# if you try to extract an element using an index that's out of bounds
# (e.g., 'temp.8' has 8 elements, so temp.8[ 11 ] is undefined),
# R will respond with an 'NA':

temp.8[ 11 ]

# [1] NA
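
# a short sketch of the same behavior with several indices at once
# (illustrative literal vector, not from the session above): in-bounds
# positions come back as usual and out-of-bounds ones come back as NA;
# the built-in function 'is.na' flags which results are missing

c( 2, 3, 4 )[ c( 1, 5 ) ]

# [1]  2 NA

is.na( c( 2, 3, 4 )[ c( 1, 5 ) ] )

# [1] FALSE  TRUE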

# here's a useful trick: suppose you want to extract from temp.8
# all of its elements that are bigger than 3.5:

temp.8

# [1] 2 3 4 5 6 7 8 9

temp.8[ temp.8 > 3.5 ]

# [1] 4 5 6 7 8 9

# what R just did was to compute the logical vector 'temp.8 > 3.5',
# which (abbreviated) is 'F F T T T T T T', and it then uses this
# to select elements of temp.8 with the rule *omit if F, keep if T*
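
# two related built-in functions, sketched here on a literal copy of
# the values in temp.8: 'which' reports the *positions* of the T values
# in a logical vector, and 'sum' counts them (T is treated as 1 and
# F as 0)

which( c( 2, 3, 4, 5, 6, 7, 8, 9 ) > 3.5 )

# [1] 3 4 5 6 7 8

sum( c( 2, 3, 4, 5, 6, 7, 8, 9 ) > 3.5 )

# [1] 6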

# you can also use the square bracket notation to modify only some
# elements of a vector and leave the rest alone:

temp.8

# [1] 2 3 4 5 6 7 8 9

temp.8[ 5 ] <- 0

temp.8

# [1] 2 3 4 5 0 7 8 9

temp.8[ c( 1, 3, 6 ) ] <- c( 0.1, 0.3, 0.2 )

temp.8

# [1] 0.1 3.0 0.3 5.0 0.0 0.2 8.0 9.0
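
# if you'd rather not change the original vector, one option is
# the built-in function 'replace', which returns a modified *copy*;
# a minimal sketch with a literal vector (the arguments are the vector,
# the positions to change, and the new values):

replace( c( 2, 3, 4, 5 ), c( 1, 3 ), c( 0.1, 0.3 ) )

# [1] 0.1 3.0 0.3 5.0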

##########################################################################

# (10) naming elements of vectors in R

# R allows you to name the elements of a vector:

temp.9 <- c( "alice", "bob", "chloe", "daniel" )

names( temp.9 ) <- c( "sophomore", "freshman", "senior", "sophomore" )

temp.9

# sophomore  freshman    senior sophomore 
#   "alice"     "bob"   "chloe"  "daniel"

# you can use the element names to pick out subsets:

temp.9[ names( temp.9 ) == "sophomore" ]

# sophomore sophomore 
#   "alice"  "daniel"
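
# you can also put a name directly inside the square brackets, but
# beware of duplicate names; here's a sketch that rebuilds the same
# named vector in one step (the 'name = value' form inside 'c( )' is
# an alternative to 'names( )', used here only to keep this example
# self-contained): indexing by name returns just the *first* match

c( sophomore = "alice", freshman = "bob", senior = "chloe",
  sophomore = "daniel" )[ "sophomore" ]

# sophomore 
#   "alice"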

##########################################################################

# let's clean up the workspace (the set of objects R is currently
# holding for you) before finishing the R session:

ls( )

# [1] "i"      "m"      "temp.0" "temp.1" "temp.2" "temp.3" "temp.4" 
# "temp.5" "temp.6" "temp.7" "temp.8" "temp.9"

# you can use the built-in function 'rm( )' (for remove) to get rid
# of objects you no longer want:

rm( 'i' )

ls( )

# [1] "m"      "temp.0" "temp.1" "temp.2" "temp.3" "temp.4" "temp.5" 
# "temp.6" "temp.7" "temp.8" "temp.9"

# you don't have to surround the name of the object with quotes:

rm( temp.0 )

ls( )

# [1] "m"      "temp.1" "temp.2" "temp.3" "temp.4" "temp.5" "temp.6" 
# "temp.7" "temp.8" "temp.9"

# in a strange departure from usual syntax, you can remove two or more
# objects without making a vector of them with 'c( )':

rm( m, temp.1 )

ls( )

# [1] "temp.2" "temp.3" "temp.4" "temp.5" "temp.6" "temp.7" "temp.8" 
# "temp.9"

# you can remove *ALL* of the objects in the workspace
# with the command 'rm( list = ls( ) )'

# the help page on 'rm( )' contains the following warning:

#   to remove (almost) everything in the working environment.
#   You will get no warning, so don't do this unless you are really sure.

#   rm(list=ls())

# note that i inserted some spaces in my version of this command:
# 'rm( list = ls( ) )' -- i do this to improve the readability of my code

# with almost no exceptions, you're free to insert spaces in this way,
# except (of course) inside character strings: 'foo', ' foo', 'f oo',
# 'fo o' and 'foo ' are all different objects in R, as they should be

# unfortunately there is no 'unrm( )' (unremove) function in R;
# once something is gone from the workspace, it's gone

# but if you save all of your commands in a .txt file like this one,
# you can always re-create something you accidentally removed,
# by copying and pasting the relevant code into your current session

# suppose that you wanted to keep only temp.6 and temp.7

# you could laboriously write out the command

#   rm( temp.2, temp.3, temp.4, temp.5, temp.8, temp.9 )

# but you could instead be creatively lazy -- the little code block below
# uses the built-in function 'setdiff', which returns the elements of
# its first argument that are not in its second (in effect, this code
# block lets you specify which objects to keep instead of
# which to remove):

print( keep <- c( 'temp.6', 'temp.7' ) )

# [1] "temp.6" "temp.7"

print( remove <- setdiff( ls( ), keep ) )

# [1] "keep"   "temp.2" "temp.3" "temp.4" "temp.5" "temp.8" "temp.9"

rm( list = c( remove, 'remove' ) )

ls( )

# [1] "temp.6" "temp.7"
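
# the same idea can be packaged as an actual function; what follows is
# a sketch, not part of the original session: the name 'keep.only',
# its 'envir' argument and the environment called 'scratch' are all
# hypothetical choices made for illustration, and the demonstration
# runs on a scratch environment so that your real objects are untouched

keep.only <- function( keep, envir = globalenv( ) ) {

  # names of everything in 'envir' that is not on the keep list

  to.remove <- setdiff( ls( envir = envir ), keep )

  rm( list = to.remove, envir = envir )

  # return the removed names invisibly, in case the caller wants them

  invisible( to.remove )

}

scratch <- new.env( )

assign( 'a', 1, envir = scratch )
assign( 'b', 2, envir = scratch )
assign( 'c', 3, envir = scratch )

keep.only( c( 'a', 'c' ), envir = scratch )

ls( envir = scratch )

# [1] "a" "c"

# tidy up, so that the workspace is left exactly as it was

rm( keep.only, scratch )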

# you can also treat two vectors as sets and do union and intersection
# operations on them, as well as keeping track of which elements are
# the same and which are different and asking questions about whether
# an element is in a vector or not; the relevant built-in functions are
# called 'union', 'intersect', 'setdiff' (partially illustrated above),
# 'setequal', and 'is.element':

temp.6

# [1] 2 0 1 3

temp.7

# [1] 2 3 4 5 6

union( temp.6, temp.7 )

# [1] 2 0 1 3 4 5 6

intersect( temp.6, temp.7 )

# [1] 2 3

setdiff( temp.6, temp.7 )

# [1] 0 1

setdiff( temp.7, temp.6 )

# [1] 4 5 6

# so 'setdiff' is asymmetric in its inputs: 'setdiff( x, y )' returns
# a vector of the elements of 'x' that are not in 'y'

setequal( temp.6, temp.7 )

# [1] FALSE

is.element( 2, temp.6 )

# [1] TRUE
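
# a few further checks, sketched with literal vectors: the operator
# '%in%' does the same job as 'is.element' in operator form (one T or F
# per element of its left-hand side), and the set functions ignore
# both order and duplicates

c( 2, 0, 1, 3 ) %in% c( 2, 3, 4, 5, 6 )

# [1]  TRUE FALSE FALSE  TRUE

union( c( 1, 1, 2 ), c( 2, 3 ) )

# [1] 1 2 3

setequal( c( 1, 2, 2 ), c( 2, 1 ) )

# [1] TRUE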

##########################################################################

# now get out of R, ideally in such a way that you can get back into R
# and pick up exactly where you left off

# the simplest way to exit is to call the built-in function 'q'
# (for *quit*), which asks whether you want to save anything:

q( )

# under the windows 10 OS, the graphical user interface (GUI) will 
# pop up a new window that asks if you want to 'Save workspace image?' 
# if you click 'Yes', a file called '.RData' will be saved in your 
# current working directory; if you click 'No', R will exit 
# without saving anything

# here's a different approach that allows you to name your 
# '.RData' file before exiting (it would be natural for you to name
# this session something like 'intro.RData'):

save.image( file = 'intro.RData' )

# note: if there's already a file with that name in your current
# working directory, this 'save.image' command will cheerfully 
# over-write your previous file, without warning 
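
# one way to guard against that, as a sketch: the built-in function
# 'file.exists' returns T if the named file is already present, so
# this line saves the workspace only when doing so would not
# over-write anything (the file name is the one used above)

if ( ! file.exists( 'intro.RData' ) ) save.image( file = 'intro.RData' )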

# now you can say 'q( )' as above and this time *exit without 
# saving anything*; in the directory that you either chose from 
# the pull-down menu or mentioned in your 'setwd( )' command at 
# the beginning of this tutorial, you'll now have a file called 
# 'intro.RData'; if you double-left-click on that, R will start up again 
# and end its welcome banner with

#   [Previously saved workspace restored]

# if you now say

ls( )

# R will reply

# [1] "temp.6" "temp.7"

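# and you can continue working in R with all of your saved objects
# restored (any packages you had loaded with 'library( )' are not part
# of the saved workspace and will need to be loaded again)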

# you can of course run the 'save.image( )' command anytime during
# an active R session, without then leaving R; this is useful if
# you're going away from your computer for an extended period of time
# and you need to protect against the possibility (windows, anyone?)
# that your OS will reboot while you're gone and destroy your R session
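
# 'save.image( )' stores the whole workspace; for completeness, here's
# a sketch of the related built-in functions 'save' and 'load', which
# store and restore individual objects ('demo.x' and 'demo.file' are
# throwaway names invented for this illustration, and 'tempfile'
# supplies a scratch file name so that nothing in your own directory
# is touched):

demo.file <- tempfile( fileext = '.RData' )

demo.x <- c( 2, 0, 1, 3 )

save( demo.x, file = demo.file )

rm( demo.x )

load( demo.file )

demo.x

# [1] 2 0 1 3

# delete the scratch file and tidy up the workspace

unlink( demo.file )

rm( demo.x, demo.file )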

##########################################################################

# R is both deep and wide -- it can do a *lot* more than i've shown
# here -- but this is a good place to stop for now

##########################################################################
##########################################################################