Functional selects/updates are a relatively tricky topic in kdb+ – mainly because the syntax takes a lot of getting used to. They are normally required when there are dynamic elements in a query, e.g. in column selection or grouping criteria.

They are covered pretty well in Q For Mortals, but I wanted to add a couple of examples – for instance, combining a functional select and update:

To start, load the sample tables in sp.q. We will use the table called ‘p’:

\l sp.q
q)p
p | name  color weight city   w
--| ----------------------------
p1| nut   red   12     london 91
p2| bolt  green 17     paris  91
p3| screw blue  17     rome   91
p4| screw red   14     london 91
p5| cam   blue  12     paris  91
p6| cog   red   19     london 91


### Functional Selects

A simple select from p with some criteria:

q)select from p where city=`london
p | name  color weight city   w
--| ----------------------------
p1| nut   red   12     london 91
p4| screw red   14     london 91
p6| cog   red   19     london 91


Now lets look at the parse tree for this query:

q)parse "select from p where city=`london"
?
`p
,,(=;`city;,`london)
0b
()


Now in order to convert this to a functional select, we need to turn this parse tree into an executable statement using the ? operator.

The basic form of the ? operator is

?[tablename;(select criteria);(grouping criteria);(columns)]


The parse tree above gives us each of the four elements in the right order – we just need to convert them to a valid functional syntax. For the example above this translates to:

q)?[p;enlist (=;`city;enlist `london);0b;()]
p | name  color weight city   w
--| ----------------------------
p1| nut   red   12     london 91
p4| screw red   14     london 91
p6| cog   red   19     london 91
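To see why the parse-tree form is convenient, here is a rough sketch in Python (not q) of the same idea: a query expressed as data – an (operator; column; constant) triple – evaluated against a list-of-dicts "table". The names `functional_select` and the toy table are mine, purely for illustration.

```python
import operator

# Map operator symbols to functions, mirroring how q's ? consumes
# the (=;`city;,`london) triple from the parse tree.
OPS = {"=": operator.eq, "in": lambda x, y: x in y}

def functional_select(table, where):
    op, col, const = where  # e.g. ("=", "city", "london")
    return [row for row in table if OPS[op](row[col], const)]

p = [
    {"name": "nut",   "city": "london"},
    {"name": "bolt",  "city": "paris"},
    {"name": "screw", "city": "rome"},
    {"name": "cog",   "city": "london"},
]
print(functional_select(p, ("=", "city", "london")))
```

Because the filter is plain data, it can be built dynamically at runtime – which is exactly the motivation for functional selects in q.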


So if we want to add e.g. column selection:

q)select name,color,weight from p where city=`london
name  color weight
------------------
nut   red   12
screw red   14
cog   red   19


The parse tree looks like:


q)parse"select name,color,weight from p where city=`london"
?
`p
,,(=;`city;,`london)
0b
`name`color`weight!`name`color`weight


In this case we have added a dictionary mapping selected columns to their output names:

q)?[p;enlist (=;`city;enlist `london);0b;(`name`color`weight)!(`name`color`weight)]
name  color weight
------------------
nut   red   12
screw red   14
cog   red   19


Similarly, we can change the select criteria:

q)select name,color,weight from p where city in `london`paris
name  color weight
------------------
nut   red   12
bolt  green 17
screw red   14
cam   blue  12
cog   red   19


Which produces the following parse tree:

q)parse"select name,color,weight from p where city in `london`paris"
?
`p
,,(in;`city;,`london`paris)
0b
`name`color`weight!`name`color`weight


This is only a small modification to the original functional select:

q)?[p;enlist (in;`city;enlist `london`paris);0b;(`name`color`weight)!(`name`color`weight)]
name  color weight
------------------
nut   red   12
bolt  green 17
screw red   14
cam   blue  12
cog   red   19


Functional updates follow an almost identical form, but use the ! operator, e.g.

q)update w:sum weight by city from p
p | name  color weight city   w
--| ----------------------------
p1| nut   red   12     london 45
p2| bolt  green 17     paris  29
p3| screw blue  17     rome   17
p4| screw red   14     london 45
p5| cam   blue  12     paris  29
p6| cog   red   19     london 45

q)parse "update w:sum weight by city from p"
!
`p
()
(,`city)!,`city
(,`w)!,(sum;`weight)


This parse tree maps to:

q)![p;();(enlist `city)!enlist `city;(enlist `w)!enlist (sum;`weight)]
p | name  color weight city   w
--| ----------------------------
p1| nut   red   12     london 45
p2| bolt  green 17     paris  29
p3| screw blue  17     rome   17
p4| screw red   14     london 45
p5| cam   blue  12     paris  29
p6| cog   red   19     london 45


Now if we want to update the output from a select, e.g. a simple grouped update:

q)update w:sum weight by color from select name,color,weight from p where city in `london`paris
name  color weight w
---------------------
nut   red   12     45
bolt  green 17     17
screw red   14     45
cam   blue  12     12
cog   red   19     45


The parse tree for this looks like:

q)parse"update w:sum weight by color from select name,color,weight from p where city in `london`paris"
!
(?;`p;,,(in;`city;,`london`paris);0b;`name`color`weight!`name`color`weight)
()
(,`color)!,`color
(,`w)!,(sum;`weight)


This parse tree looks complex, but the main complexity comes from the nested functional select within.

We could explicitly write the entire function (nested select and update):

q)![?[p;enlist (in;`city;enlist `london`paris);0b;`name`color`weight!`name`color`weight];();(enlist `color)!enlist `color;(enlist `w)!enlist (sum;`weight)]
name  color weight w
---------------------
nut   red   12     45
bolt  green 17     17
screw red   14     45
cam   blue  12     12
cog   red   19     45


However it may be easier to read if we store the select in its own variable:

q)sel::?[p;enlist (in;`city;enlist `london`paris);0b;`name`color`weight!`name`color`weight]
q)sel
name  color weight
------------------
nut   red   12
bolt  green 17
screw red   14
cam   blue  12
cog   red   19


And then refer to the select thus:

q)![sel;();(enlist `color)!enlist `color;(enlist `w)!enlist (sum;`weight)]
name  color weight w
---------------------
nut   red   12     45
bolt  green 17     17
screw red   14     45
cam   blue  12     12
cog   red   19     45


# History of k/kdb+ Presentation

Last year at the inaugural Singapore kx meetup, my colleague Ajay and I gave a short presentation on the history and background behind the K language and kdb+. I got some feedback from people like Arthur Whitney and Rob Hodgkinson on the presentation, and even got some great old pics (courtesy of Rob) of Rob/Arthur/Ken Iverson from way back! I have attached the PDF – it's an interesting story, and k/q is an incredibly powerful language if you have not already tried it.

Presentation here: Introduction to kdb+

# Rmathlib and kdb+ part 3: Utility Functions

In the first two parts of this series, I looked at the basics of the interface I created between rmathlib and kdb+. In this post, I’ll go through some of the convenience functions I wrote to emulate some basic R functionality.

NOTE: I wrote these functions as a learning exercise to familiarise myself with q/kdb+ a bit more – they are very simplistic and I am sure there are much better ways to do them (I know the very nice qml project does a lot of this stuff, plus interfaces to external BLAS libraries).

q)\l rmath.q

### seq – sequence function

The first convenience function is seq, which is like R’s seq() or Numpy’s arange() in that it takes a start and end point, and generates a range of numbers.

q)seq[1;100]
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26...

The function is just a curried wrapper around seqn, which also takes a step size:

q)seqn[1;10;.5]
1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 6 6.5 7 7.5 8 8.5 9 9.5 10
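For comparison, here is a rough Python sketch of the same seq/seqn helpers (the function names mirror the q wrappers above; this is not the post's code):

```python
def seqn(start, end, step):
    """Inclusive range with an arbitrary step, like R's seq()."""
    n = int(round((end - start) / step))
    return [start + i * step for i in range(n + 1)]

def seq(start, end):
    """Unit-step version, curried over seqn like the q wrapper."""
    return seqn(start, end, 1)

print(seq(1, 10))        # 1 through 10
print(seqn(1, 10, 0.5))  # 1, 1.5, ..., 10
```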


This is handy when working with some of the probability generation functions shown last time. For instance the commands below calculate the normal distribution (cdf) values for evenly spaced points from -3 to 3:

q)each[{pnorm[x;0;1]}] seqn[-3;3;.5]
0.001349898 0.006209665 0.02275013 0.0668072...


### table – Symbol Tabulation

R provides a handy function (table()) to show summary count tables of factor levels. The q analogue can be used as follows:

q)t:([] sym:1000?`A`B`C`D)
q)table[sym;t]
B| 241
A| 244
C| 256
D| 259

This tabulates the column sym from the table t. The summary is ordered by increasing frequency count.

NOTE: this has really no advantage over the standard

q)`x xasc select count i by sym from t
sym| x
---| ---
B  | 241
A  | 244
C  | 256
D  | 259

except some slight brevity.

### range – Min/Max

The range function simply returns the boundaries of a set of values:

q)x:rnorm[10000]
q)range[x]
-3.685814 4.211363
q)abs (-) . range[x]   / absolute range
7.897177

### summary – Min/Max/Mean/Median/IQR

The summary function provides summary stats, a la R’s summary() function:

q)x:norm[10000;3;2]
q)summary x
min   | -4.755305
1q    | 1.59379
median| 2.972523
mean  | 2.966736
3q    | 4.336589
max   | 10.00284
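As a cross-language sketch, the same summary stats can be produced with Python's statistics module. Note that quartile conventions differ between implementations, so the 1q/3q values here will not necessarily match R's summary() exactly:

```python
import statistics

def summary(x):
    # statistics.quantiles with n=4 returns the three quartile cut points
    q1, med, q3 = statistics.quantiles(x, n=4)
    return {
        "min": min(x), "1q": q1, "median": med,
        "mean": statistics.fmean(x), "3q": q3, "max": max(x),
    }

print(summary([1.0, 2.0, 3.0, 4.0, 5.0]))
```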

### quantile – Quantile Calculations

A very simple quantile calc:

q)x:norm[10000;3;2]
q)quantile[x;.5]
2.973137

### hist – Bin Count

Very crude bin count – specify the data and the number of bins:

q)hist[x;10]
-4.919383| 14
-3.279589| 101
-1.639794| 601
0        | 1856
1.639794 | 3043
3.279589 | 2696
4.919383 | 1329
6.559177 | 319
8.198972 | 40
9.838766 | 1
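A minimal bin-count sketch along the same lines, in Python (equal-width bins over the data's range, returning a lower-edge-to-count mapping; the exact bin-edge convention of the q hist above may differ):

```python
def hist(x, nbins):
    lo, hi = min(x), max(x)
    width = (hi - lo) / nbins
    counts = {lo + i * width: 0 for i in range(nbins)}
    for v in x:
        # clamp the maximum value into the last bin
        i = min(int((v - lo) / width), nbins - 1)
        counts[lo + i * width] += 1
    return counts

print(hist([0.0, 0.1, 0.4, 0.5, 0.9, 1.0], 2))  # {0.0: 3, 0.5: 3}
```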

### diag – Identity Matrix Generation

q)diag 10
1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 1

### Scale functions

Sometimes it's very useful to scale an input data set – e.g. when feeding multiple inputs into a statistical model, large differences between the relative scales of the inputs combined with finite-precision computer arithmetic can result in some inputs being dwarfed by others. The scale function just adjusts the input as follows: $$X_{s}=(X-\mu)/\sigma$$.

The example below scales two inputs with different ranges:

q)x:norm[10;0;1]; y:norm[10;5;3]
q)x
1.920868 -1.594028 -0.02312519 1.079606 -0.5310111 0.2762119 0.1218428 0.9584264 -0.4244091 -0.7981221
q)y
10.69666 2.357529 8.93505 3.65696 5.218461 3.246216 5.971919 7.557135 1.412827 1.246241
q)range x
-1.594028 1.920868
q)range y
1.246241 10.69666
q)range scale x
-1.74507 1.878671
q)range scale y
-1.232262 1.845551

There are other useful scaling measures, including the min/max scale: $$\frac{x-min(x)}{max(x)-min(x)}$$. This is implemented using the minmax function:

q)minmax x
1 0 0.4469273 0.760658 0.302432 0.5320897 0.4881712 0.726182 0.3327606 0.226438
q)range minmax x
0 1f
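The two scaling formulas above are easy to sketch in Python for comparison (function names mirror the q wrappers; this is not the post's code):

```python
import statistics

def scale(x):
    """(x - mean) / stdev: result has mean 0 and (sample) sd 1."""
    mu, sigma = statistics.fmean(x), statistics.stdev(x)
    return [(v - mu) / sigma for v in x]

def minmax(x):
    """(x - min) / (max - min): maps the data onto [0, 1]."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]

print(minmax([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```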


There are other functions which are useful for scaling, e.g. the RMS (root mean square): $$\sqrt{\frac{\sum{x_i^2}}{N}}$$:

q)x:rnorm 1000
q)rms x
1.021065

### nchoosek – Combinations

The nchoosek function calculates the number of combinations of N items chosen k at a time (i.e. $$\binom{N}{k}$$):

q)nchoosek[100;5]
7.528752e+07
q)each[{nchoosek[10;x]}] seq[1;10]
10 45 120 210 252 210 120 45 10 1f
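Python's standard library has the same calculation built in (math.comb works in exact integer arithmetic, whereas the q wrapper above returns a float):

```python
import math

# C(100, 5) and the full row C(10, 1) .. C(10, 10)
print(math.comb(100, 5))  # 75287520
print([math.comb(10, k) for k in range(1, 11)])
```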

The source file is available here: https://github.com/rwinston/kdb-rmathlib/blob/master/rmath_aux.q.

# Rmathlib and kdb+, part 2 – Probability Distribution Functions

Following on from the last post on integrating some rmathlib functionality with kdb+, here is a sample walkthrough of how some of the functionality can be used, including some of the R-style wrappers I wrote to emulate some of the most commonly-used R commands in q.

Firstly, load the rmathlib library interface:

q)\l rmath.q

## Random Number Generation

R provides random number generation facilities for a number of distributions. This is provided using a single underlying uniform generator (R provides many different RNG implementations, but in the case of Rmathlib it uses a Marsaglia-multicarry type generator) and then uses different techniques to generate numbers distributed according to the selected distribution. The standard technique is inversion, where a uniformly distributed number in [0,1] is mapped using the inverse of the probability distribution function to a different distribution. This is explained very nicely in the book “Non-Uniform Random Variate Generation”, which is available in full here: http://luc.devroye.org/rnbookindex.html.
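As a toy illustration of the inversion technique (not rmathlib's code): the exponential distribution has the closed-form inverse CDF $$-\ln(1-u)/\lambda$$, so mapping uniform draws through it yields exponential variates. The function name rexp here is mine, chosen to echo R's naming:

```python
import math
import random

def rexp(n, lam, seed=0):
    """n exponential variates via inversion of uniform [0,1) draws."""
    rng = random.Random(seed)
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]

xs = rexp(100_000, 2.0)
print(sum(xs) / len(xs))  # should be close to 1/lambda = 0.5
```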

In order to make random variate generation consistent and reproducible across R and kdb+, we need to be able to seed the RNG. The default RNG in rmathlib takes two integer seeds. We can set this in an R session as follows:

> .Random.seed[2:3]<-as.integer(c(123,456))


and the corresponding q command is:

q)sseed[123;456]

Conversely, getting the current seed value can be done using:

q)gseed[]
123 456i

The underlying uniform generator can be accessed using runif:

q)runif[100;3;4]
3.102089 3.854157 3.369014 3.164677 3.998812 3.092924 3.381564 3.991363 3.369..

produces 100 random variates uniformly distributed between [3,4].

Then for example, normal variates can be generated:

q)rnorm 10
-0.2934974 -0.334377 -0.4118473 -0.3461507 -0.9520977 0.9882516 1.633248 -0.5957762 -1.199814 0.04405314

This produces identical results in R:

> rnorm(10)
[1] -0.2934974 -0.3343770 -0.4118473 -0.3461507 -0.9520977  0.9882516  1.6332482 -0.5957762 -1.1998144
[10]  0.0440531


Normally-distributed variables with a distribution of $$N(\mu,\sigma)$$ can also be generated:

q)dev norm[10000;3;1.5]
1.519263
q)avg norm[10000;3;1.5]
2.975766

Or we can alternatively scale a standard normal $$X \sim N(0,1)$$ using $$Y = \sigma X + \mu$$:

q)x:rnorm[1000]
q)`int$(avg x;dev x)
0 1i
q)y:(x*3)+5
q)`int$(avg y;dev y)
5 3i

## Probability Distribution Functions

As well as random variate generation, rmathlib also provides other functions, e.g. the normal density function:

q)dnorm[0;0;1]
0.3989423

computes the normal density at 0 for a standard normal distribution. The second and third parameters are the mean and standard deviation of the distribution.

The normal distribution function is also provided:

q)pnorm[0;0;1]
0.5

computes the distribution value at 0 for a standard normal (with mean and standard deviation parameters).

Finally, there is the quantile function, the inverse of the distribution function – see the graph below, where the quantile value for .99 is mapped onto the distribution function value at that point, 2.32:

q)qnorm[.99;0;1]
2.326348

We can do a round-trip via pnorm() and qnorm():

q)`int$qnorm[pnorm[3;0;1]-pnorm[-3;0;1];0;1]
3i

That's it for the distribution functions for now – rmathlib provides lots of different distributions (I have just linked in the normal and uniform functions so far). There are some other functions that I have created that I will cover in a future post. All code is on github: https://github.com/rwinston/kdb-rmathlib

[Check out part 3 of this series]

# Integrating Rmathlib and kdb+

The R engine is usable in a variety of ways – one of the lesser-known features is that it provides a standalone math library that can be linked to from an external application. This library provides some nice functionality such as:

* Probability distribution functions (density/distribution/quantile functions);
* Random number generation for a large number of probability distributions

In order to make use of this functionality from q, I built a simple Rmathlib wrapper library. The C wrapper can be found here and is simply a set of functions that wrap the appropriate calls in Rmathlib. For example, a function to generate N randomly-generated Gaussian values using the underlying rnorm() function is:

K rnn(K n, K mu, K sigma) {
    int i, count = n->i;
    K ret = ktn(KF, count);
    for (i = 0; i < count; ++i)
        kF(ret)[i] = rnorm(mu->f, sigma->f);
    return ret;
}

These have to be imported and linked from a kdb+ session, which is done using special directives (the 2: verb). I decided to automate the process of generating these directives – the shell script below parses a set of function declarations in a delimited section of a C header file and produces the appropriate load statements:

INFILE=rmath.h
DLL=\`:rmath
echo "dll:$DLL"

DECLARATIONS=$(awk '/\/\/ BEGIN DECL/ {f=1;next} /\/\/ END DECL/ {f=0} f {sub(/K /,"",$0);print $0}' $INFILE)

for decl in $DECLARATIONS; do
    FNAME=${decl%%(*}
    ARGS=${decl##$FNAME}
    IFS=, read -r -a CMDARGS <<< "$ARGS"
    echo "${FNAME}:dll 2:(\`$FNAME;${#CMDARGS[*]})"
done

echo "\\l rmath_aux.q"


This generates a set of link commands such as the following:

dll:`:rmath
rn:dll 2:(`rn;2)
rnn:dll 2:(`rnn;3)
dn:dll 2:(`dn;3)
pn:dll 2:(`pn;3)
qn:dll 2:(`qn;3)
sseed:dll 2:(`sseed;2)
gseed:dll 2:(`gseed;1)
nchoosek:dll 2:(`nchoosek;2)


It also generates a call to load a second q script, rmath_aux.q, which contains a bunch of q wrappers and helper functions (I will write a separate post about that later).

A makefile is included which generates the shared lib (once the appropriate paths to the R source files are set) and q scripts. A sample q session looks like the following:

q)\l rmath.q
q)x:rnorm 1000 / generate 1000 normal variates
q)dnorm[0;0;1] / normal density at 0 for a mean 0 sd 1 distribution

The project is available on github: https://github.com/rwinston/kdb-rmathlib.

Note that loading rmath.q loads the rmath dll, which in turn loads the rmathlib dll, so the rmathlib dll should be available on the dynamic library load path.

[Check out Part 2 of this series]

# Exporting Data From R to KDB

Here is the beginnings of a simple routine to convert R data frames to Q format (in this case a dictionary). It uses the S3 dispatch mechanism to handle the conversion of different data types. Extremely basic (I haven't even bothered to put in proper file output – just capturing the output of cat) but very quick to knock up.

The code is mainly a top-level function called to_dict:

to_dict <- function(x) {
  cat(substitute(x), ":", sep="")
  nms <- names(x)
  for (n in nms) { cat("`", n, sep="") }
  cat("!(", sep="")
  r <- rep(c(";", ")"), times=c(length(nms)-1, 1))
  for (i in 1:length(nms)) { cat(qformat(x[[nms[i]]]), r[i], sep="") }
}


Where qformat is a generic S3 function:

qformat <- function(x) {
UseMethod("qformat")
}

qformat.default <- function(x) {
cat("",format(x))
}

qformat.logical <- function(x) {
cat(ifelse(x==TRUE,"1b","0b"))
}

qformat.factor <- function(x) {
  cat("", gsub("\\s", "", format(x)), sep="`")
}


It can be used as follows (using the famous Anscombe quartet data):

> write(capture.output(to_dict(anscombe)),
file="/tmp/anscombe.q")


Then within a Q session:

q)\l /tmp/anscombe.q
q)anscombe
x1| 10 8 13 9 11 14 6 4 12 7 5
x2| 10 8 13 9 11 14 6 4 12 7 5
x3| 10 8 13 9 11 14 6 4 12 7 5
x4| 8 8 8 8 8 8 8 19 8 8 8
y1| 8.04 6.95 7.58 8.81 8.33 9.96 7.24 4.26 10.84 4.82 5.68
y2| 9.14 8.14 8.74 8.77 9.26 8.1 6.13 3.1 9.13 7.26 4.74
y3| 7.46 6.77 12.74 7.11 7.81 8.84 6.08 5.39 8.15 6.42 5.73
y4| 6.58 5.76 7.71 8.84 8.47 7.04 5.25 12.5 5.56 7.91 6.89

# Binomial Pricing Trees in R

Binomial Tree Simulation

The binomial model is a discrete grid generation method from $$t=0$$ to $$T$$. At each point in time ($$t+\Delta t$$) we can move up with probability $$p$$ and down with probability $$(1-p)$$. As the probabilities of the up and down movements remain constant throughout the generation process, we end up with a recombining binary tree, or binomial lattice. Whereas a balanced binary tree with height $$h$$ has $$2^{h+1}-1$$ nodes, a binomial lattice of height $$h$$ has $$\sum_{i=1}^{h+1}i$$ nodes.

The algorithm to generate a binomial lattice of $$M$$ steps (i.e. of height $$M$$) given a starting value $$S_0$$, an up movement $$u$$, and down movement $$d$$, is:

FOR i=1 to M
    FOR j=0 to i
        STATE S(j,i) = S(0) * u^j * d^(i-j)
    ENDFOR
ENDFOR

We can write this function in R and generate a graph of the lattice. A simple lattice generation function is below:

# Generate a binomial lattice
# for a given up, down, start value and number of steps
genlattice <- function(X0=100, u=1.1, d=.75, N=5) {
X <- c()
X[1] <- X0
count <- 2

for (i in 1:N) {
for (j in 0:i) {
X[count] <- X0 * u^j * d^(i-j)
count <- count + 1
}
}
return(X)
}


We can generate a sample lattice of 5 steps using symmetric up-and-down values:

> genlattice(N=5, u=1.1, d=.9)
[1] 100.000  90.000 110.000  81.000  99.000 121.000  72.900  89.100 108.900 133.100  65.610
[12]  80.190  98.010 119.790 146.410  59.049  72.171  88.209 107.811 131.769 161.051


In this case, the output is a flat vector of the lattice node values, generated level by level (each level holding that step's down-to-up states).
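The genlattice loop translates directly to Python (same parameters and traversal order as the R version above; this is a comparison sketch, not the post's code):

```python
def genlattice(X0=100.0, u=1.1, d=0.75, N=5):
    X = [X0]
    for i in range(1, N + 1):          # each time step...
        for j in range(0, i + 1):      # ...has i+1 possible states
            X.append(X0 * u**j * d**(i - j))
    return X

vals = genlattice(N=5, u=1.1, d=0.9)
print(len(vals))   # 21 nodes for a 5-step lattice
print(vals[:3])    # 100, 90, 110 (up to float rounding)
```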

We can nicely graph a binomial lattice given a tool like graphviz, and we can easily create an R function to generate a graph specification that we can feed into graphviz:

dotlattice <- function(S, labels=FALSE) {
shape <- ifelse(labels == TRUE, "plaintext", "point")

cat("digraph G {", "\n", sep="")
cat("rankdir=LR;","\n")

# Create a dot node for each element in the lattice
for (i in 1:length(S)) {
cat("node", i, " [shape=", shape, ", label=\"", S[i], "\"];", "\n", sep="")
}

# The number of levels in a binomial lattice of length N
# is $\frac{\sqrt{8N+1}-1}{2}$
L <- ((sqrt(8*length(S)+1)-1)/2 - 1)

k<-1
for (i in 1:L) {
tabs <- rep("\t",i-1)
j <- i
while(j>0) {
cat("node",k,"->","node",(k+i),";\n",sep="")
cat("node",k,"->","node",(k+i+1),";\n",sep="")
k <- k + 1
j <- j - 1
}
}

cat("}", sep="")
}


This will simply output a dot script to the screen. We can capture this script and save it to a file by invoking:

> x<-capture.output(dotlattice(genlattice(N=8, u=1.1, d=0.9)))
> cat(x, file="/tmp/lattice1.dot")


We can then invoke dot from the command-line on the generated file:

$ dot -Tpng -o lattice.png -v lattice1.dot

The resulting graph looks like the following:

Binomial Lattice (no labels)

If we want to add labels to the lattice vertices, we can add the labels attribute:

> x <- capture.output(dotlattice(genlattice(N=8, u=1.1, d=0.9), labels=TRUE))
> cat(x, file="/tmp/lattice1.dot")

Binomial Lattice (labels)

# Quasi-Random Number Generation in R

Random number generation is a core topic in numerical computer science. There are many efficient algorithms for generating random (strictly speaking, pseudo-random) variates from different probability distributions. The figure below shows a sampling of 1000 two-dimensional random variates from the standard Gaussian and Cauchy distributions, respectively. The size of the extreme deviations of the Cauchy distribution is apparent from the graph.

However, sometimes we need to produce numbers that are more evenly distributed (quasi-random numbers). For example, in a Monte Carlo integration exercise, we can get faster convergence with a lower error bound using so-called low-discrepancy sequences. In the figure below, we show two-dimensional normal and Sobol (a low-discrepancy generator) variates.

Normal, Cauchy, and Sobol 2-d variates

To generate the graph, I used the GSL library for R, as shown below:

library(gsl)
q <- qrng_alloc(type="sobol", 2)
rs <- qrng_get(q,1000)
par(mfrow=c(3,1))
plot(rnorm(1000), rnorm(1000), pch=20, main="~N(0,1)", ylab="", xlab="")
plot(rs, pch=20, main="Sobol", ylab="", xlab="")
plot(rcauchy(1000), rcauchy(1000), pch=20, main="~C(0,1)", ylab="", xlab="")

The property of low-discrepancy generators is even more apparent if we view the random variates in a higher dimension; for example, the figure below shows the variates as a 3-dimensional cube. Note how the clustering around the centre of the cube is much more pronounced for the Gaussian cube.
3D Random Variates

To plot the figure above, I used the GSL and Lattice libraries:

library(gsl)
library(lattice)
q <- qrng_alloc(type="sobol", 3)
npoints <- 200
rs <- qrng_get(q,npoints)
ltheme <- canonical.theme(color = FALSE)
ltheme$strip.background$col <- "transparent"
lattice.options(default.theme = ltheme)
trellis.par.set(layout.heights =

# Plot the normal variates in a 3-dim cube
p1 <- cloud(rnorm(npoints) ~ rnorm(npoints) + rnorm(npoints), xlab="x", ylab="y",
zlab="z", pch=20, main="~N(0,1)")
p2 <- cloud(rs[,1] ~ rs[,2] + rs[,3], xlab="x", ylab="y",
zlab="z", pch=20, main="Sobol")
print(p1, split=c(1,1,2,1), more=TRUE)
print(p2, split=c(2,1,2,1))


# The Euler Method In R

The Euler Method is a very simple method used for numerical solution of initial-value problems. Although there are much better methods in practice, it is a nice intuitive mechanism.

The objective is to find a solution to the equation

$$\frac{dy}{dt} = f(t,y)$$

over a grid of points (equally spaced, in our case). Euler’s method uses the relation (basically a truncated form of Taylor’s approximation theorem with the error terms chopped off):

$$y(t_{i+1}) = y(t_i) + h f(t_i, y(t_i))$$

In R, we can express this iterative solution as:

euler <- function(dy.dx=function(x,y){}, h=1E-7, y0=1, start=0, end=1) {
nsteps <- (end-start)/h
ys <- numeric(nsteps+1)
ys[1] <- y0
for (i in 1:nsteps) {
x <- start + (i-1)*h
ys[i+1] <- ys[i] + h*dy.dx(x,ys[i])
}
ys
}


Note that given the start and end points, and the size of each step, we figure out the number of steps. Inside the loop, we calculate each successive approximation.

An example using the differential equation

$$\frac{dy}{dx} = f(x,y) = 3x - y + 8$$

is:

dy.dx <- function(x,y) { 3*x - y + 8 }
euler(dy.dx, start=0, end=0.5, h=0.1, y0=3)
[1] 3.00000 3.50000 3.98000 4.44200 4.88780 5.31902
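As a cross-check, the same Euler iteration in Python, with the parameters used in the R example (a comparison sketch, not the post's code):

```python
def euler(dy_dx, h=0.1, y0=1.0, start=0.0, end=1.0):
    nsteps = int(round((end - start) / h))
    ys = [y0]
    for i in range(nsteps):
        x = start + i * h
        # each step advances by h times the slope at the current point
        ys.append(ys[-1] + h * dy_dx(x, ys[-1]))
    return ys

ys = euler(lambda x, y: 3 * x - y + 8, h=0.1, y0=3.0, start=0.0, end=0.5)
print([round(y, 5) for y in ys])  # matches the R output above
```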


# Newton’s Method In R

Here is a toy example of implementing Newton’s method in R. I found some old code that I had written a few years ago when illustrating the difference between convergence properties of various root-finding algorithms, and this example shows a couple of nice features of R.

Newton's method is an iterative root-finding algorithm where we start with an initial guess $$x_0$$ and each successive guess is refined using the iterative process:

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$

Here is a simple implementation of this in R:

newton <- function(f, tol=1E-12,x0=1,N=20) {
h <- 0.001
i <- 1; x1 <- x0
p <- numeric(N)
while (i<=N) {
df.dx <- (f(x0+h)-f(x0))/h
x1 <- (x0 - (f(x0)/df.dx))
p[i] <- x1
i <- i + 1
if (abs(x1-x0) < tol) break
x0 <- x1
}
return(p[1:(i-1)])
}


Note a couple of things:

* The step-size for numerical differentiation is hardcoded to be 0.001. This is arbitrary and should probably be a parameter.
* The algorithm will run until either the number of steps N has been reached, or the error tolerance $$\left|x_{n+1}-x_n\right| < \epsilon$$ is met, where $$\epsilon$$ is defined by the tolerance parameter tol.
* The function returns a vector of the iterated x positions, which will be of length <= N.

Here is a simple example: to find the zeros of the function $$f(x) = x^3 + 4x^2 - 10$$, which in R we can represent as:

f <- function(x) { x^3 + 4*x^2 - 10 }

Let's plot the function over the range [1,2]:

x <- seq(1, 2, 0.1)
plot(x, f(x), type='l', lwd=1.5, main=expression(x^3 + 4*x^2 - 10))
abline(h=0)


It seems obvious from the plot that the zero of f(x) in the range [1,2] is somewhere between x=1.3 and x=1.4.

This is made even more clear if we tabulate the x,y values over this range:

> xy <- cbind(x,f(x))
> xy
x
[1,] 1.0 -5.000
[2,] 1.1 -3.829
[3,] 1.2 -2.512
[4,] 1.3 -1.043
[5,] 1.4  0.584
[6,] 1.5  2.375
[7,] 1.6  4.336
[8,] 1.7  6.473
[9,] 1.8  8.792
[10,] 1.9 11.299
[11,] 2.0 14.000


Using the function defined earlier, we can run the root-finding algorithm:

> p <- newton(f, x0=1, N=10)
> p
[1] 1.454256 1.368917 1.365238 1.365230 1.365230 1.365230 1.365230


This returns 1.365230 as the root of f(x). Plotting the last value in the iteration:

> abline(v=p[length(p)])
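The same iteration is easy to sketch in Python, with the same forward-difference derivative and hardcoded step size as the R version (a comparison sketch, not the post's code):

```python
def newton(f, tol=1e-12, x0=1.0, N=20):
    h = 0.001  # hardcoded step for the numerical derivative, as in the R code
    xs = []
    for _ in range(N):
        df_dx = (f(x0 + h) - f(x0)) / h
        x1 = x0 - f(x0) / df_dx
        xs.append(x1)
        if abs(x1 - x0) < tol:
            break
        x0 = x1
    return xs

roots = newton(lambda x: x**3 + 4 * x**2 - 10, x0=1.0, N=10)
print(round(roots[-1], 6))  # ~1.36523, as in the R run
```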


In the iteration loop, we use a schoolbook-style numerical derivative:

$$\frac{dy}{dx} = \frac{f(x+\epsilon)-f(x)}{\epsilon}$$

It's also worth noting that R has some support for symbolic derivatives, so we could use a symbolic function expression. A simple example of symbolic differentiation in R (note that R works with expression instances when using symbolic differentiation):

> e <- expression(sin(x/2)*cos(x/4))
> dydx <- D(e, "x")
> dydx
cos(x/2) * (1/2) * cos(x/4) - sin(x/2) * (sin(x/4) * (1/4))


In order for this to be useful, obviously we need to be able to evaluate the calculated derivative. We can do this using eval:

> z <- seq(-1,1,.1)
> eval(dydx, list(x=z))
[1] 0.3954974 0.4146144 0.4320092 0.4475873 0.4612640 0.4729651 0.4826267
[8] 0.4901964 0.4956329 0.4989067 0.5000000 0.4989067 0.4956329 0.4901964
[15] 0.4826267 0.4729651 0.4612640 0.4475873 0.4320092 0.4146144 0.3954974


Note that we can bind the x parameter in the expression when calling eval().