
I put together a short presentation that introduces R to newcomers. The slides were built with R Markdown and ioslides.

Presentation here.

Here is the raw markdown if you are interested:

[code lang="r"]
---
title: "R Introduction"
author: "Rory Winston"
date: "2 August 2014"
output:
  ioslides_presentation: default
  beamer_presentation:
    fig_height: 6
    fig_width: 8
    keep_tex: yes
    logo: r_logo.png
    self_contained: no
fontsize: 10pt
---

## What is R?

- A domain-specific language (DSL) for statistics and data analysis
- Based on the S programming language
- An environment for exploratory data analysis (EDA)
- A quasi-functional language with IDE and REPL
- A vectorized language with BLAS support
- A collection of more than 7,000 libraries
- A large and active community across industry and academia
- Around 20 years old (lineage dates from 1975, almost 40 years ago)

```{r,echo=FALSE,message=FALSE}
options("digits"=5)
options("digits.secs"=3)
```

## Types

- Primitives (numeric, integer, character, logical, factor)
- Data Frames
- Lists
- Tables
- Arrays
- Environments
- Others (functions, closures, promises, ...)
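
As a rough sketch of the types listed above, `class()` reports what kind of object each constructor returns. The variable names here are illustrative and not part of the original slides.

```{r,collapse=TRUE}
# Illustrative objects, one per type in the list above
df  <- data.frame(id = 1:2, label = c("a", "b"))
lst <- list(1, "two", TRUE)           # elements may differ in type
tab <- table(c("x", "y", "x"))        # counts of each distinct value
arr <- array(1:24, dim = c(2, 3, 4))  # three-dimensional array
env <- new.env()                      # mutable store of named objects

class(df)
class(lst)
class(tab)
class(arr)
class(env)
class(sum)                            # functions are objects too
```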

## Simple Types
```{r,collapse=TRUE}
x <- 1
class(x)        # numbers are double-precision by default

y <- "Hello World"
class(y)

z <- TRUE
class(z)

as.integer(z)   # logicals convert to 0 or 1
```
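
One detail that may be worth a quick sketch: a bare number such as `1` is stored as a double, so its class is `numeric`, while the `L` suffix asks for an integer. The names `a` and `b` below are made up for the example.

```{r,collapse=TRUE}
a <- 1     # stored as a double
b <- 1L    # the L suffix creates an integer
class(a)
class(b)
is.integer(a)
is.integer(b)
as.numeric("3.14") + 1   # character strings can be converted explicitly
```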

## Simple Types - Vectors

The basic data unit in R is a vector; even a single value is a vector of length one.

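```{r, collapse=TRUE}
x <- c(1,2,3)
x
x <- 1:3
x[1]    # indexing starts at 1
x[0]    # index 0 selects nothing and returns an empty vector
x[-1]   # a negative index drops that element
```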
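
A small sketch of what this means in practice: a single value has length one, and `c()` coerces mixed inputs to one common type. The name `v` is illustrative only.

```{r,collapse=TRUE}
length(42)             # a "scalar" is a vector of length 1
is.vector("Hello")
v <- c(1, "a", TRUE)   # mixed inputs are coerced to character
class(v)
v
```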

## Generating Vectors

R provides lots of convenience functions for data generation:

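```{r,collapse=TRUE}
rep(0, 5)                 # five zeros
seq(1,10)                 # 1 to 10 in steps of 1
seq(1,2,.1)               # 1 to 2 in steps of 0.1
seq(1,2,length.out=6)     # six evenly spaced values from 1 to 2
```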
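
A few more variations on the same helpers, as a sketch; all of these are base R functions, and the result names are made up for the example.

```{r,collapse=TRUE}
whole <- rep(c("a", "b"), times = 2)  # repeat the whole vector
each  <- rep(c("a", "b"), each = 2)   # repeat each element in turn
down  <- seq(10, 1, by = -3)          # descending sequence
whole
each
down
seq_len(5)                            # 1 to n, also safe when n is 0
rev(1:5)                              # reverse a vector
```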

## Indexing

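```{r,collapse=TRUE}
x <- c(1, 3, 4, 10, 15, 20, 50, 1, 6)
x > 10              # logical vector, one entry per element
which(x > 10)       # positions of the TRUE entries
x[x > 10]           # elements greater than 10
x[!(x > 10)]        # elements not greater than 10
x[x <= 10]          # same selection, written directly
x[x > 10 & x < 30]  # combine conditions with &
```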
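
The same index expressions also work on the left-hand side of an assignment. A minimal sketch, using a copy named `v` (an illustrative name) so the original `x` is left alone:

```{r,collapse=TRUE}
v <- c(1, 3, 4, 10, 15, 20, 50, 1, 6)
v[v > 10] <- 10        # cap every value above 10
v
v[c(1, length(v))]     # first and last elements
v[v %in% c(1, 6)]      # keep elements that appear in a given set
```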

## Functions {.smaller}

```{r, collapse=TRUE}
square <- function(x) x^2
square(2)

pow <- function(x, p=2) x^p   # p defaults to 2
pow(10)
pow(10,3)
pow(p=3,10)                   # named arguments can appear in any order
```

Functions can be passed as data:

```{r,collapse=TRUE}
g <- function(x, f) f(x)
g(10, square)

# ... forwards any extra arguments on to f
h <- function(x,f,...) f(x,...)
h(10, pow, 3)
```
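
Along the same lines, functions can be created inline or returned from other functions. This is only a sketch; `make_power` and `cube` are hypothetical names, not part of the original slides.

```{r,collapse=TRUE}
# Apply an anonymous function to each element
squares <- sapply(1:5, function(n) n^2)
squares

# A function that returns a function (a closure over p)
make_power <- function(p) function(x) x^p
cube <- make_power(3)
cube(2)

# do.call invokes a function with its arguments held in a list
joined <- do.call(paste, list("a", "b", sep = "-"))
joined
```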

## R is Vectorized

Example: multiplying two vectors element by element with an explicit loop:

```{r}
mult <- function(x,y) {
  z <- numeric(length(x))
  # seq_along(x) is safe even when x is empty, unlike 1:length(x)
  for (i in seq_along(x)) {
    z[i] <- x[i] * y[i]
  }
  z
}

mult(1:10,1:10)
```

## R is Vectorized

Multiplying two vectors ‘the R way’:

```{r}
1:10 * 1:10
```

NOTE: R recycles vectors of unequal length, repeating the shorter vector to match the longer one:

```{r}
1:10 * 1:2
```
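
Recycling is silent only when the longer length is a multiple of the shorter one. The sketch below captures the warning that R raises otherwise; `res` is an illustrative name, and `tryCatch` is used here only to show the message as a value.

```{r,collapse=TRUE}
# 10 is a multiple of 2, so recycling is silent
ok <- 1:10 * 1:2
length(ok)

# 10 is not a multiple of 3: R raises a warning, captured here as text
res <- tryCatch(1:10 * 1:3, warning = function(w) conditionMessage(w))
res
```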

## Random Number Generation

R ships with built-in random number generators for many probability distributions.

```{r}
# Normal variates, mean=0, sd=1
rnorm(10)
# Normal variates, mean=100, sd=5
rnorm(10, mean=100, sd=5)
```

Many other distributions are available through the `r*` family of functions (for example `runif`, `rbinom` and `rpois`).
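
A short sketch of two related points: `set.seed()` makes the draws reproducible, and each distribution also has matching `p*` and `q*` functions for the CDF and quantiles. The names `a` and `b` are illustrative.

```{r,collapse=TRUE}
set.seed(42)            # fix the seed so the draws can be reproduced
a <- rnorm(3)
set.seed(42)
b <- rnorm(3)
identical(a, b)         # same seed, same draws

runif(3, min = 0, max = 1)         # uniform draws
rbinom(3, size = 10, prob = 0.5)   # binomial draws
pnorm(0)                           # normal CDF at 0
qnorm(0.975)                       # normal quantile
```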

## Data Frames

- Data frames are the fundamental structure used in data analysis
- Similar to a database table in spirit (named columns, distinct types)

```{r}
# y and z are recycled to match the six rows of x
d <- data.frame(x=1:6, y="AUDUSD", z=c("one","two"))
d
```
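
To check what was built, `dim()` and `str()` give a compact view of the shape and column types. This is a sketch that rebuilds `d` so the chunk runs on its own. Note that from R 4.0.0 string columns stay as character by default, whereas earlier versions converted them to factors.

```{r,collapse=TRUE}
d <- data.frame(x = 1:6, y = "AUDUSD", z = c("one", "two"))
dim(d)    # number of rows, number of columns
str(d)    # column names and types at a glance
```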

## Data Frames

Data frames can be indexed like a vector or matrix:

```{r,collapse=TRUE}
# First row
d[1,]

# First column
d[,1]

# First and third cols, first two rows
d[1:2,c(1,3)]
```
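
Columns can also be selected by name and rows by a logical condition. A minimal sketch, again rebuilding `d` so the chunk is self-contained; `big` and `both` are illustrative names.

```{r,collapse=TRUE}
d <- data.frame(x = 1:6, y = "AUDUSD", z = c("one", "two"))
d$x                            # a column by name, as a vector
d[["z"]]                       # same idea, with the name as a string
big <- d[d$x > 3, ]            # rows where x exceeds 3
big
d[d$z == "one", c("x", "z")]   # filter rows, pick columns by name
both <- subset(d, x > 3 & z == "two")
both
```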

## Data Frames {.smaller}

Let’s generate some dummy data:
```{r}
# Simulated quotes: one row per second, random bid/ask around fixed levels
generateData <- function(N) data.frame(time=Sys.time()+1:N,
  sym="AUDUSD",
  bid=rep(1.2345,N)+runif(N, min=-.0010, max=.0010),
  ask=rep(1.2356,N)+runif(N, min=-.0010, max=.0010),
  exch=sample(c("EBS","RTM","CNX"),N, replace=TRUE))

prices <- generateData(50)
head(prices, 5)
```

## Data Frames

We can add/remove columns on the fly:
```{r}
prices$spread <- prices$ask-prices$bid
prices$mid <- (prices$bid + prices$ask) * 0.5
head(prices)
```
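
Removing a column works the same way, by assigning `NULL` to it. The sketch below uses a small stand-in data frame named `p` (a hypothetical name with made-up values) rather than the simulated `prices`.

```{r,collapse=TRUE}
# A small stand-in with made-up values
p <- data.frame(bid = c(1.2340, 1.2350), ask = c(1.2352, 1.2360))
p$spread <- p$ask - p$bid   # add a column
with_spread <- names(p)
with_spread
p$spread <- NULL            # remove it again
names(p)
```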

## Data Frames

Some basic operations on data frames:
```{r,collapse=TRUE}
names(prices)          # column names

table(prices$exch)     # number of rows per exchange

summary(prices$mid)    # min, quartiles, mean and max
```

## Data Frames {.smaller}

Operations can be applied across different dimensions of a data frame:

```{r,collapse=TRUE}
sapply(prices,class)
```
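
The other dimensions can be sketched the same way: by column with `sapply`, by row with `apply`, and by group with `tapply`. The data frame `px` below is a hypothetical stand-in with made-up values and the same column names as `prices`.

```{r,collapse=TRUE}
# A small stand-in with made-up values
px <- data.frame(exch = c("EBS", "RTM", "EBS", "CNX"),
                 bid  = c(1.2340, 1.2350, 1.2344, 1.2348),
                 ask  = c(1.2352, 1.2360, 1.2354, 1.2359))

# Column-wise: mean of each numeric column
col_means <- sapply(px[, c("bid", "ask")], mean)
col_means

# Row-wise: ask minus bid for each row (diff of the two columns)
spreads <- apply(px[, c("bid", "ask")], 1, diff)
spreads

# Group-wise: mean bid per exchange
by_exch <- tapply(px$bid, px$exch, mean)
by_exch
```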