Rmathlib and kdb+, part 2 – Probability Distribution Functions

Following on from the last post on integrating rmathlib functionality with kdb+, here is a sample walkthrough of how it can be used, including the R-style wrappers I wrote to emulate some of the most commonly used R commands in q.

Loading the rmath library

Firstly, load the rmathlib library interface:

q)\l rmath.q

Random Number Generation

R provides random number generation facilities for a number of distributions. These are built on a single underlying uniform generator (R provides many different RNG implementations, but Rmathlib uses a Marsaglia-multicarry type generator), with different techniques then used to generate numbers distributed according to the selected distribution. The standard technique is inversion, where a uniformly distributed number in [0,1] is mapped through the inverse of the cumulative distribution function to a variate from the target distribution. This is explained very nicely in the book “Non-Uniform Random Variate Generation”, which is available in full here: http://luc.devroye.org/rnbookindex.html.
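To make the inversion idea concrete, here is a minimal Python sketch. It is not the rmathlib implementation: the function name `normal_by_inversion` is hypothetical, and it leans on the standard library's `NormalDist.inv_cdf` for the inverse distribution function.

```python
import random
from statistics import NormalDist, mean, stdev

def normal_by_inversion(n, mu=0.0, sigma=1.0, rng=None):
    """Draw n normal variates by pushing uniforms through the inverse CDF."""
    rng = rng or random.Random()
    dist = NormalDist(mu, sigma)
    out = []
    while len(out) < n:
        u = rng.random()      # uniform on [0, 1)
        if u == 0.0:          # inv_cdf is only defined on the open interval (0, 1)
            continue
        out.append(dist.inv_cdf(u))
    return out

sample = normal_by_inversion(10_000, rng=random.Random(42))
# The sample mean and standard deviation should be close to 0 and 1.
print(round(mean(sample), 2), round(stdev(sample), 2))
```

The same recipe works for any distribution whose inverse distribution function can be evaluated; only the `inv_cdf` call changes.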

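In order to make random variate generation consistent and reproducible across R and kdb+, we need to be able to seed the RNG. The default RNG in rmathlib takes two integer seeds. We can set these in an R session as follows (this assumes the R session is itself using the Marsaglia-Multicarry generator, e.g. via RNGkind("Marsaglia-Multicarry"), since R's own default generator is Mersenne-Twister):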

> .Random.seed[2:3]<-as.integer(c(123,456))

and the corresponding q command is:

q)sseed[123;456]

Conversely, getting the current seed value can be done using:

q)gseed[]
123 456i
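To illustrate why a two-integer seed is enough to make a stream reproducible, here is a toy multiply-with-carry generator in Python. The class name `TwoSeedMWC` and its methods are hypothetical, and the constants are the ones commonly quoted for Marsaglia's scheme. It is a sketch of the idea only, and makes no claim to reproduce rmathlib's output.

```python
class TwoSeedMWC:
    """Toy multiply-with-carry generator with two 32-bit seeds (illustrative only)."""

    def __init__(self, seed1, seed2):
        self.set_seed(seed1, seed2)

    def set_seed(self, seed1, seed2):
        # Keep both state words within 32 bits.
        self.i1 = seed1 & 0xFFFFFFFF
        self.i2 = seed2 & 0xFFFFFFFF

    def get_seed(self):
        return (self.i1, self.i2)

    def uniform(self):
        # Each word advances by multiplying its low 16 bits and adding the carry.
        self.i1 = (36969 * (self.i1 & 0xFFFF) + (self.i1 >> 16)) & 0xFFFFFFFF
        self.i2 = (18000 * (self.i2 & 0xFFFF) + (self.i2 >> 16)) & 0xFFFFFFFF
        combined = ((self.i1 << 16) ^ (self.i2 & 0xFFFF)) & 0xFFFFFFFF
        return combined / 0xFFFFFFFF  # scale into [0, 1]

a = TwoSeedMWC(123, 456)
b = TwoSeedMWC(123, 456)
xs = [a.uniform() for _ in range(5)]
ys = [b.uniform() for _ in range(5)]
print(xs == ys)  # same seeds, same stream
```

The whole generator state is the pair of integers, so setting the seed pair restarts the stream from a known point.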

The underlying uniform generator can be accessed using runif:

q)runif[100;3;4]
3.102089 3.854157 3.369014 3.164677 3.998812 3.092924 3.381564 3.991363 3.369..

This produces 100 random variates uniformly distributed on [3,4].

Then, for example, normal variates can be generated:

q)rnorm 10
-0.2934974 -0.334377 -0.4118473 -0.3461507 -0.9520977 0.9882516 1.633248 -0.5957762 -1.199814 0.04405314

This produces identical results in R:

> rnorm(10)
 [1] -0.2934974 -0.3343770 -0.4118473 -0.3461507 -0.9520977  0.9882516  1.6332482 -0.5957762 -1.1998144
[10]  0.0440531

Normal variates with mean \( \mu \) and standard deviation \( \sigma \) can also be generated (here \( \mu = 3 \) and \( \sigma = 1.5 \)):

q)dev norm[10000;3;1.5]
1.519263
q)avg norm[10000;3;1.5]
2.975766

Alternatively, we can scale a standard normal \( X \sim N(0,1) \) using \( Y = \sigma X + \mu \) (here with \( \sigma = 3 \) and \( \mu = 5 \)):

q)x:rnorm[1000]
q) `int$ (avg x; dev x)
0 1i
q)y:(x*3)+5
q) `int$ (avg y; dev y)
5 3i
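The same location-scale transformation can be checked outside q. This Python sketch uses the standard library's `random.gauss` as a stand-in for the standard normal generator (an assumption, not the rmathlib generator) and `pstdev`, the population standard deviation, which is what q's `dev` computes.

```python
import random
from statistics import mean, pstdev

rng = random.Random(1)
mu, sigma = 5.0, 3.0

x = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # X ~ N(0, 1)
y = [sigma * xi + mu for xi in x]                  # Y = sigma * X + mu

# Mean and deviation of x should be near 0 and 1; of y, near mu and sigma.
print(round(mean(x), 2), round(pstdev(x), 2))
print(round(mean(y), 2), round(pstdev(y), 2))
```

Because the transformation is linear, the mean of y is exactly sigma times the mean of x plus mu, and the deviation of y is exactly sigma times the deviation of x, up to floating-point rounding.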

Probability Distribution Functions

As well as random variate generation, rmathlib also provides other functions, e.g. the normal density function:

q)dnorm[0;0;1]
0.3989423

This computes the normal density at 0 for a standard normal distribution. The second and third parameters are the mean and standard deviation of the distribution.
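For reference, the density can be written out directly from its closed form. The Python function below (the name `normal_pdf` is hypothetical) is a sketch of the textbook formula, not of how rmathlib evaluates it.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: exp(-z^2 / 2) / (sigma * sqrt(2 * pi)), with z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

print(normal_pdf(0.0))  # about 0.3989423, i.e. 1 / sqrt(2 * pi)
```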

The normal cumulative distribution function is also provided:

q)pnorm[0;0;1]
0.5

This computes the distribution function value at 0, i.e. \( P(X \le 0) \), for a standard normal; again the second and third parameters are the mean and standard deviation.
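The distribution function has no elementary closed form, but it can be expressed through the error function. A minimal Python sketch follows; the name `normal_cdf` is hypothetical, and this is not necessarily how rmathlib computes it.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

print(normal_cdf(0.0))   # 0.5: half the mass lies below the mean
print(normal_cdf(1.96))  # about 0.975
```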

Finally, there is the quantile function, the inverse of the distribution function. As the graph below shows, the probability .99 is mapped back to the point at which the distribution function reaches that value, approximately 2.33:

[Figure: cdf]

q)qnorm[.99;0;1]
2.326348

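We can also combine pnorm() and qnorm(). The probability mass within three standard deviations of the mean is about 0.9973, and the quantile for that probability is about 2.78, which rounds to 3: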

q)`int $ qnorm[ pnorm[3;0;1]-pnorm[-3;0;1]; 0; 1]
3i
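For comparison, a strict round trip applies the quantile function to a single distribution function value. This Python sketch uses the standard library's `NormalDist` as a stand-in for pnorm and qnorm, and also evaluates the central-mass expression from the q session.

```python
from statistics import NormalDist

std = NormalDist(0.0, 1.0)

# Strict round trip: the quantile function undoes the distribution function.
p = std.cdf(3.0)
x_back = std.inv_cdf(p)

# Central mass within three standard deviations, and its quantile.
central = std.cdf(3.0) - std.cdf(-3.0)
q_central = std.inv_cdf(central)

print(x_back)              # 3, up to floating-point error
print(central, q_central)  # about 0.9973 and about 2.78
```

The second quantile is not exactly 3 because the central mass excludes both tails, whereas the quantile function measures cumulative probability from the left tail only.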

That's it for the distribution functions for now. rmathlib provides lots of different distributions (I have just linked in the normal and uniform functions for now). There are some other functions that I have created that I will cover in a future post.

All code is on github: https://github.com/rwinston/kdb-rmathlib

[Check out part 3 of this series]