Categories
Coding kdb

Functional Selects/Updates in kdb+

Functional selects/updates are a relatively trick topic in kdb+ – mainly as the syntax takes a lot of getting used to. They are normally required when there are some dynamic elements in e.g. column selection or grouping criteria.

They are pretty well covered in Q For Mortals, but I wanted to add a couple of examples…combining functional select and update for example:

To start, load the sample tables in sp.q..we will use the table called ‘p’:

\l sp.q
q)p
p | name color weight city w
--| ----------------------------
p1| nut red 12 london 91
p2| bolt green 17 paris 91
p3| screw blue 17 rome 91
p4| screw red 14 london 91
p5| cam blue 12 paris 91
p6| cog red 19 london 91

Functional Selects

A simple select from p with some criteria:


q)select from p where city=`london
p | name color weight city w
--| ----------------------------
p1| nut red 12 london 91
p4| screw red 14 london 91
p6| cog red 19 london 91

Now lets look at the parse tree for this query:

q)parse "select from p where city=`london"
?
`p
,,(=;`city;,`london)
0b
()

Now in order to convert this to a functional select, we need to turn this parse tree into an executable statement using the ? operator.

The basic form of the ? operator is


?[tablename;(select criteria);(grouping criteria);(columns)]

The parse tree above gives us each of the four elements in the right order – we just need to convert them to a valid functional syntax. For the example above this translates to:


q)?[p;enlist (=;`city;enlist `london);0b;()]
p | name color weight city w
--| ----------------------------
p1| nut red 12 london 91
p4| screw red 14 london 91
p6| cog red 19 london 91

So if we want to add e.g. column selection:

q)select name,color,weight from p where city=`london
name color weight
------------------
nut red 12
screw red 14
cog red 19

The parse tree looks like:


q)parse"select name,color,weight from p where city=`london"
?
`p
,,(=;`city;,`london)
0b
`name`color`weight!`name`color`weight

In this case we have added a dictionary mapping selected columns to their output names:


q)?[p;enlist (=;`city;enlist `london);0b;(`name`color`weight)!(`name`color`weight)]
name color weight
------------------
nut red 12
screw red 14
cog red 19

Similarly, we can change the select criteria:


q)select name,color,weight from p where city in `london`paris
name color weight
------------------
nut red 12
bolt green 17
screw red 14
cam blue 12
cog red 19

Which produces the following parse tree:


q)parse"select name,color,weight from p where city in `london`paris"
?
`p
,,(in;`city;,`london`paris)
0b
`name`color`weight!`name`colour`weight

This is only a small modification to the original functional select:


q)?[p;enlist (in;`city;enlist `london`paris);0b;(`name`color`weight)!(`name`color`weight)]
name color weight
------------------
nut red 12
bolt green 17
screw red 14
cam blue 12
cog red 19

Functional Updates

Functional updates follow an almost identical form, but use the ! operator, e..g


q)update w:sum weight by city from p
p | name color weight city w
--| ----------------------------
p1| nut red 12 london 45
p2| bolt green 17 paris 29
p3| screw blue 17 rome 17
p4| screw red 14 london 45
p5| cam blue 12 paris 29
p6| cog red 19 london 45

q)parse "update w:sum weight by city from p"
!
`p
()
(,`city)!,`city
(,`w)!,(sum;`weight)

This parse tree maps to:


q)![p;();(enlist `city)!enlist `city;(enlist `w)!enlist (sum;`weight)]
p | name color weight city w
--| ----------------------------
p1| nut red 12 london 45
p2| bolt green 17 paris 29
p3| screw blue 17 rome 17
p4| screw red 14 london 45
p5| cam blue 12 paris 29
p6| cog red 19 london 45

Now if we want to update the output from a select, e.g. a simple grouped update:


q)update w:sum weight by color from select name,color,weight from p where city in `london`paris
name color weight w
---------------------
nut red 12 45
bolt green 17 17
screw red 14 45
cam blue 12 12
cog red 19 45

The parse tree for this looks like:


q)parse"update w:sum weight by color from select name,color,weight from p where city in `london`paris"
!
(?;`p;,,(in;`city;,`london`paris);0b;`name`color`weight!`name`color`weight)
()
(,`color)!,`color
(,`w)!,(sum;`weight)

This parse tree looks complex, but the main complexity comes from the nested functional select within.

We could explicitly write the entire function (nested select and update):


q)![?[p;enlist (in;`city;enlist `london`paris);0b;`name`color`weight!`name`color`weight];();(enlist `color)!enlist `color;(enlist `w)!enlist (sum;`weight)]
name color weight w
---------------------
nut red 12 45
bolt green 17 17
screw red 14 45
cam blue 12 12
cog red 19 45

However it may be easier to read if we store the select in its own variable:


q)sel::?[p;enlist (in;`city;enlist `london`paris);0b;`name`color`weight!`name`color`weight]
q)sel
name color weight
------------------
nut red 12
bolt green 17
screw red 14
cam blue 12
cog red 19

And then refer to the select thus:


q)![sel;();(enlist `color)!enlist `color;(enlist `w)!enlist (sum;`weight)]
name color weight w
---------------------
nut red 12 45
bolt green 17 17
screw red 14 45
cam blue 12 12
cog red 19 45

Categories
Coding kdb

History of k/kdb+ Presentation

Last year at the inaugural Singapore kx meetup, my colleague Ajay and I gave a short presentation on the history and background behind the K language and kdb+. I did get some feedback from people like Arthur Whitney and Rob Hodgkinson on the presentation, and even got some great old pics (courtesy of Rob) of Rob/Arthur/Ken Iverson from way back! I have attached the PDF – its an interesting story and an incredibly powerful language if you have not already tried it.

Presentation here: Introduction to kdb+

Categories
kdb R Statistics

Plotting Tick Data with ggplot2

Here are some examples of using ggplot2 and kdb+ together to produce some simple graphs of data stored in kdb+. I am using the qserver extension for R (http://code.kx.com/wsvn/code/cookbook_code/r/) to connect to a running kdb+ instance from within R.

First, lets create a dummy data set: a set of evenly-spaced timestamps and a random walk price series:

ONE_SEC:`long$1e9
tab:([]time:.z.P+ONE_SEC * (til 1000);price:sums?[1000?1.<0.5;-1;1])

Then import the data into R:

[source lang="r"]>tab <- execute(h,'select from tab')[/source]

Then plot a simple line graph - remember ggplot2 works natively with data frames:

[source lang="r"]>library(ggplot2)
>ggplot(tab, aes(x=time, y=price)) + geom_line()
+ ggtitle("Stock Price Evolution")[/source]

This will produce a line graph similar to the one below:

Simple line chart

Next, we can do a simple bin count / histogram on the price series:

[source lang="r"]ggplot(tab, aes(x=(price))) + geom_histogram()[/source]

Which will produce a graph like the following:

ggplot_hist_1

We can adjust the bin width to get a more granular graph using the binwidth parameter:

[source lang="r"]> ggplot(tab, aes(x=(price)))
+ geom_histogram(position="identity", binwidth=1)[/source]

ggplot_hist_2

We can also make use of some aesthetic attributes, e.g. fill color - we can shade the histogram by the number of observations in each bin:

[source lang="r"]ggplot(tab, aes(x=(price), fill=..count..))
+ geom_histogram(position="identity", binwidth=1)[/source]

Which results in:

ggplot_hist_3

Some other graphs: Say I have a data frame with a bunch of currency tick data (bid/offer/mid prices). The currencies are interspersed. Here is a sample:

[source lang="r"]
> head(ccys)
sym timestamp bid ask mid
1 AUDJPY 2013-01-15 11:00:16.127 94.485 94.496 94.4905
2 AUDJPY 2013-01-15 11:00:22.592 94.486 94.496 94.4910
3 AUDJPY 2013-01-15 11:00:30.117 94.498 94.505 94.5015
4 AUDJPY 2013-01-15 11:00:30.325 94.498 94.506 94.5020
5 AUDJPY 2013-01-15 11:00:37.118 94.499 94.507 94.5030
6 AUDJPY 2013-01-15 11:00:47.348 94.526 94.536 94.5310
[/source]

I want to add a column containing the log-returns calculated separately for each currency:

[source lang="r"]
log.ret <- function(x)
do.call("rbind",
lapply(seq_along(x),
function(i)
cbind(x[[i]],lr=c(0, diff(log(x[[i]]$mid))))))
ccys <- log.ret(split(ccys, f=ccys$sym))
[/source]

Then I can plot a simple line chart of the log returns using the lovely facets feature in ggplot to split out a separate panel per symbol:

[source lang="r"]
ggplot(ccys, aes(x=timestamp, y=lr))
+ geom_line()
+ facet_grid(sym ~ .)
[/source]

Which produces the following:

line_chart_1

Another nice graph - display a visual summary of the tick frequency by time. This one uses a dummy column that represents a tick arrival. Note in the following graph I have reduced the line width and set the alpha value to a tiny value (creating a large transparency effect) as otherwise the density of tick lines is too great. The overall effect is visually pleasing:

[source lang="r"]
ccys <- cbind(ccys, dummy=rep(1,nrow(ccys)))
ggplot(ccys, aes(x=timestamp,y=dummy, ymin=0, ymax=1))
+ geom_linerange(alpha=1/2,size=.01,width=.01)
+ facet_grid(sym ~ .)
+ theme(axis.text.y=element_blank())
+ xlab("ticks") + ylab("time")
+ ggtitle("Tick Density")
[/source]

Which results in:

ticks

Next time (time permitting) - covariance matrices and bivariate density plots..