Plotting Tick Data with ggplot2

Here are some examples of using ggplot2 and kdb+ together to produce some simple graphs of data stored in kdb+. I am using the qserver extension for R (http://code.kx.com/wsvn/code/cookbook_code/r/) to connect to a running kdb+ instance from within R.

First, let's create a dummy data set: a set of evenly-spaced timestamps and a random walk price series:

ONE_SEC:`long$1e9
tab:([]time:.z.P+ONE_SEC * (til 1000);price:sums?[1000?1.<0.5;-1;1])

Then import the data into R:

> tab <- execute(h,'select from tab')

Then plot a simple line graph – remember ggplot2 works natively with data frames:

> library(ggplot2)
> ggplot(tab, aes(x=time, y=price)) + geom_line() + ggtitle("Stock Price Evolution")

This will produce a line graph similar to the one below:

Next, we can do a simple bin count / histogram on the price series:

> ggplot(tab, aes(x=(price))) + geom_histogram()

Which will produce a graph like the following:

We can adjust the bin width to get a more granular graph using the binwidth parameter:

> ggplot(tab, aes(x=(price))) + geom_histogram(position="identity", binwidth=1)

We can also make use of some aesthetic attributes, e.g. fill color – we can shade the histogram by the number of observations in each bin:

> ggplot(tab, aes(x=(price), fill=..count..)) + geom_histogram(position="identity", binwidth=1)

Which results in:

Some other graphs:

Say I have a data frame with a bunch of currency tick data (bid/offer/mid prices). The currencies are interspersed. Here is a sample:

> head(ccys)
     sym               timestamp    bid    ask     mid
1 AUDJPY 2013-01-15 11:00:16.127 94.485 94.496 94.4905
2 AUDJPY 2013-01-15 11:00:22.592 94.486 94.496 94.4910
3 AUDJPY 2013-01-15 11:00:30.117 94.498 94.505 94.5015
4 AUDJPY 2013-01-15 11:00:30.325 94.498 94.506 94.5020
5 AUDJPY 2013-01-15 11:00:37.118 94.499 94.507 94.5030
6 AUDJPY 2013-01-15 11:00:47.348 94.526 94.536 94.5310

I want to add a column containing the log-returns calculated separately for each currency:

log.ret <- function(x) do.call("rbind",
  lapply(seq_along(x), function(i)
    cbind(x[[i]], lr=c(0, diff(log(x[[i]]$mid))))))
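The log.ret function above expects a list of data frames, one per currency, but the post does not show how that list is built. The sketch below assumes it comes from split() on the sym column; that call, the made-up ccys sample and the ave() alternative are illustrations rather than the original code.

```r
# Made-up ticks for two interleaved currency pairs (illustration only).
ccys <- data.frame(
  sym = c("AUDJPY", "EURUSD", "AUDJPY", "EURUSD", "AUDJPY"),
  mid = c(94.4905, 1.3350, 94.4910, 1.3352, 94.5015),
  stringsAsFactors = FALSE
)

# log.ret as defined in the post: x is a list of data frames.
log.ret <- function(x) do.call("rbind",
  lapply(seq_along(x), function(i)
    cbind(x[[i]], lr = c(0, diff(log(x[[i]]$mid))))))

# Assumption: the list is built with split(), one data frame per currency.
# The result is grouped by currency rather than in the original row order.
by.sym <- log.ret(split(ccys, ccys$sym))
print(by.sym)

# Alternative: ave() applies the function within each sym group but
# returns the values in the original (interleaved) row order.
ccys$lr <- ave(log(ccys$mid), ccys$sym, FUN = function(v) c(0, diff(v)))
print(ccys)
```

The first observation of each currency gets a log-return of 0 in both versions. The ave() form may be more convenient when the rows need to stay in time order for plotting.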
0 1i
q)y:(x*3)+5
q)`int$(avg y; dev y)
5 3i

Probability Distribution Functions

As well as random variate generation, rmathlib also provides other functions, e.g. the normal density function:

q)dnorm[0;0;1]
0.3989423

This computes the normal density at 0 for a standard normal distribution. The second and third parameters are the mean and standard deviation of the distribution.

The normal distribution function is also provided:

q)pnorm[0;0;1]
0.5

This computes the distribution value at 0 for a standard normal (again with mean and standard deviation parameters).

Finally, the quantile function is the inverse of the distribution function (see the graph below): the quantile for .99 is the point at which the distribution function reaches .99, which is about 2.33:

q)qnorm[.99;0;1]
2.326348

We can do a round-trip via pnorm() and qnorm(). Here the probability mass within three standard deviations (about 0.9973) maps back to a quantile of roughly 2.78, which the cast to int rounds to 3:

q)`int$ qnorm[ pnorm[3;0;1]-pnorm[-3;0;1]; 0; 1]
3i
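Since rmathlib is R's own math library, the same values can be cross-checked from an R session. This is a minimal sketch using R's built-in dnorm(), pnorm() and qnorm(); it assumes only a stock R installation and is not part of the kdb-rmathlib code.

```r
# R's built-in normal functions, which the q wrappers mirror.
d   <- dnorm(0, mean = 0, sd = 1)     # density at 0
p   <- pnorm(0, mean = 0, sd = 1)     # distribution function at 0
q99 <- qnorm(0.99, mean = 0, sd = 1)  # 99% quantile

# Probability mass within three standard deviations of the mean.
mass <- pnorm(3) - pnorm(-3)

# Quantile of that mass: close to, but not exactly, 3.
q.mass <- qnorm(mass)

# An exact round trip applies qnorm() directly to pnorm().
back <- qnorm(pnorm(3))

print(c(d, p, q99, mass, q.mass, back))
```

The last value shows that qnorm() undoes pnorm() up to floating-point error, whereas the two-sided mass only rounds to 3.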

That's it for the distribution functions for now – rmathlib provides lots of different distributions (I have just linked in the normal and uniform functions for now). There are some other functions that I have created that I will cover in a future post.

All code is on github: https://github.com/rwinston/kdb-rmathlib

[Check out part 3 of this series]

Integrating Rmathlib and kdb+

The R engine is usable in a variety of ways – one of the lesser-known features is that it provides a standalone math library that can be linked to from an external application. This library provides some nice functionality such as:

* Probability distribution functions (density/distribution/quantile functions);
* Random number generation for a large number of probability distributions

In order to make use of this functionality from q, I built a simple Rmathlib wrapper library. The C wrapper can be found here and is simply a set of functions that wrap the appropriate calls in Rmathlib. For example, a function to generate N randomly-generated Gaussian values using the underlying rnorm() function is:

K rnn(K n, K mu, K sigma) {
    int i, count = n->i;
    K ret = ktn(KF, count);
    for (i = 0; i < count; ++i)
        kF(ret)[i] = rnorm(mu->f, sigma->f);
    return ret;
}

These have to be imported and linked from a kdb+ session, which is done using special directives (the 2: verb). I decided to automate the process of generating these directives – the shell script below parses a set of function declarations in a delimited section of a C header file and produces the appropriate load statements:

INFILE=rmath.h
DLL=\`:rmath

echo "dll:$DLL"
DECLARATIONS=$(awk '/\/\/ BEGIN DECL/ {f=1;next} /\/\/ END DECL/ {f=0} f {sub(/K /,"",$0);print $0}' $INFILE)
for decl in $DECLARATIONS; do
    FNAME=${decl%%(*}
    ARGS=${decl##$FNAME}
    IFS=, read -r -a CMDARGS <<< "$ARGS"
    echo "${FNAME}:dll 2:(\`$FNAME;${#CMDARGS[*]})"
done
echo "\\l rmath_aux.q"

This generates a set of link commands such as the following:

dll:`:rmath
rn:dll 2:(`rn;2)
rnn:dll 2:(`rnn;3)
dn:dll 2:(`dn;3)
pn:dll 2:(`pn;3)
qn:dll 2:(`qn;3)
sseed:dll 2:(`sseed;2)
gseed:dll 2:(`gseed;1)
nchoosek:dll 2:(`nchoosek;2)

It also generates a call to load a second q script, rmath_aux.q, which contains a bunch of q wrappers and helper functions (I will write a separate post about that later). A makefile is included which generates the shared lib (once the appropriate paths to the R source files are set) and q scripts. A sample q session looks like the following:

q) \l rmath.q
q) x:rnorm 1000 / generate 1000 normal variates
q) dnorm[0;0;1] / normal density at 0 for a mean 0 sd 1 distribution

The project is available on github: https://github.com/rwinston/kdb-rmathlib.

Note that loading rmath.q loads the rmath dll, which in turn loads the rmathlib dll, so the rmathlib dll should be available on the dynamic library load path.

[Check out Part 2 of this series]

Exporting Data From R to KDB

Here are the beginnings of a simple routine to convert R data frames to Q format (in this case a dictionary). It uses the S3 dispatch mechanism to handle the conversion of different data types. Extremely basic (I haven't even bothered to put in proper file output – just capturing the output of cat) but very quick to knock up.

The code is mainly a top-level function called to_dict:

to_dict <- function(x) {
  cat(substitute(x),":", sep="")
  nms <- names(x)
  for (n in nms) { cat("`",n,sep="") }
  cat("!(",sep="")
  r <- rep(c(";",")"),times=c(length(nms)-1,1))
  for (i in 1:length(nms)) { cat(qformat(x[[nms[i]]]),r[i],sep="") }
}

Where qformat is a generic S3 function:

qformat <- function(x) { UseMethod("qformat") }
qformat.default <- function(x) { cat("",format(x)) }
qformat.logical <- function(x) { cat(ifelse(x==TRUE,"1b","0b")) }
qformat.factor <- function(x) { cat("",gsub("\\s","",format(x)), sep="`") }

It can be used as follows (using the famous Anscombe quartet data):

> write(capture.output(to_dict(anscombe)), file="/tmp/anscombe.q")

Then within a Q session:

q)\l /tmp/anscombe.q
q)anscombe
x1| 10 8 13 9 11 14 6 4 12 7 5
x2| 10 8 13 9 11 14 6 4 12 7 5
x3| 10 8 13 9 11 14 6 4 12 7 5
x4| 8 8 8 8 8 8 8 19 8 8 8
y1| 8.04 6.95 7.58 8.81 8.33 9.96 7.24 4.26 10.84 4.82 5.68
y2| 9.14 8.14 8.74 8.77 9.26 8.1 6.13 3.1 9.13 7.26 4.74
y3| 7.46 6.77 12.74 7.11 7.81 8.84 6.08 5.39 8.15 6.42 5.73
y4| 6.58 5.76 7.71 8.84 8.47 7.04 5.25 12.5 5.56 7.91 6.89

Compiling The kdb/R interface on Win32

I have been playing with the kdb/R interface from kx.com, and had some problems installing with Cygwin gcc. It may be possible to get this to work with Cygwin gcc + a Win32 threads library, but in the meantime I installed MinGW, and it works perfectly. Here are the steps (basically as per the kx docs):

1. Download c.o from here: http://kx.com/q/w32/
2. gcc -c base.c -I. -I "${R_HOME}/include/"
3. gcc -Wl,--export-all-symbols -shared -o qserver.dll c.o base.o "${R_HOME}/bin/R.dll" -lws2_32

The resulting qserver.dll can be loaded via dyn.load(), and then used from within R (just using the qserver.R supplied by kx):

source("qserver.R")
conn <- open_connection("server", 12345)
result <- execute(conn, "select avg bid by sym from fx_quote")
x <- as.data.frame(mapply(FUN=c, result))