Project Euler Problem #15

February 22nd, 2009

Problem 15 on Project Euler asks us to find the number of distinct routes between the top left and bottom right corners in a 20×20 grid, with no backtracking allowed.

I originally saw this type of problem tackled in the book Notes On Introductory Combinatorics, by George Polya amongst others. This book is hard to find now, but it is a really clear intro to combinatoric math.

The solution can be paraphrased as follows: if the grid is of size 20×20, and it takes 2 movements to navigate a single square in the grid, then we must make a total of 40 movements to get from the top right to the bottom left. Exactly half of these movements will be left-to-right, and the other half will be up-down. The total number of distinct routes is the number of ways that we can choose 20 of each type of move from the 40 total moves required. So we need the combinatoric construct n-choose-k, or how many ways k items can be selected from n total items. This is represented as tex:{n\choose k}.

In R, calculating tex:{40\choose 20} is just:

choose(40, 20)

Learning Lilypond : Luciana’s Theme

February 22nd, 2009

There is a whole bunch of obscure and wonderful music out there that I have been meaning to transcribe and arrange for the piano, and so I have been looking around for decent music transcription software, and I recently came across lilypond. This is to music engraving what TeX is to mathematical typesetting, and the results are really very nice. The downside of lilypond (as with TeX) is the slightly steep initial learning curve. To put it to the test, I have transcribed a piece of music that I came across a while ago – it’s called “Tema Par Luciana”, and is from an old Italian movie called C’eravamo tanto amati. I haven’t seen the movie, but I came across the theme in a mix created by a French DJ called elektrosonik. The piece is simple enough that it was pretty straightforward to transcribe, but also has enough variety that it is a good starting point for learning lilypond techniques.

Here is the PDF output (Lilypond will also produce a MIDI file if asked):

Tema Par Luciana (PDF)

And here is the lilypond source: luciana.ly

Project Euler Problem #13

February 22nd, 2009

Problem 13 on Project Euler asks us to sum 100 50-digit numbers and give the first 10 digits of the result. This is pretty easy. Note we are using R’s integer division operator %/% to discard the remainder of the large summed integer and just gives us the first 10 digits of the result.

## Problem 13
problem13 <- function() {
    nums <- scan("problem13.dat")
    s <- sum(nums)
    s %/% 10^(floor(log10(s))-9)
}

Project Euler Problem #14

February 22nd, 2009

Problem 14 on the Project Euler site asks us to find the longest chain under 1 million created using the Collatz mapping. This is fairly straightforward, although performance again is not great:

## Problem 14
# Collatz conjecture
problem14 <- function(N) {
        maxChain <- 0
        chains <- rep(0,N)
        x <- 1
        for (i in 1:N) {
                n <- i
                chain <- 0
                while(n > 1) {
                        n <- ifelse(n %% 2 == 0, n/2, 3*n+1)
                        chain <- chain + 1
                        if (n < N && chains[n] > 0) {
                            chain <- chain + chains[n]
                                break
                        }
                        
                }
                chains[i] <- chain
                if (chain > maxChain) {
                        maxChain <- chain
                        x <- i
                }
        }
        x
}

Project Euler Problem #12

February 22nd, 2009

Problem 12 on the Project Euler site asks:

What is the value of the first triangle number to have over five hundred divisors?

A triangular number T(n) is defined as tex:T(n) = \frac{n(n+1)}{2}. The R code below consists of a solution, which involves the fact that the number of proper divisors of an integer n can be calculated by first computing a prime factorisation of the number n, e.g. if tex:n = p^aq^b, where p,q are prime, then the number of proper divisors of n can be calculated as tex:d(n) = (a+1)(b+1). This solution is extremely slow (mainly due to the naive prime sieving algorithm), and could be speeded up dramatically with a little effort.

# Sieve of Eratosthenes
prime.sieve <- function(n) {
 a <- seq.int(1,n)
 p <- 1
 M <- as.integer(sqrt(n))
 while ((p <- p + 1) <= M) {
  if (a[p] != 0)
    a[seq.int(p*p, n, p)] <- 0
 }
 a[a>1 & a>0]
}
 

# Trial Division
# Returns the exponents of the prime
# factors of n
# e.g. if n = p^a*q^b
# tdiv(n) will return (a,b)
tdiv <- function(n) {
 primes <- prime.sieve(n)
 factors <- c()
 i <- 1
 curr <- 0
 
 for (p in primes) {
  while (n %% p == 0) {
   curr <- curr + 1
   n <- n %/% p
  }
  factors[i] <- curr
  i <- i + 1
  curr <- 0
 }
 
 factors[factors > 0]
}

# Compute nth triangular number
T <- function(n) {
        (n*(n+1))/2
}
 
## Problem 12
# This is a slooow solution
problem12 <- function(N) {
 n <- 0 # current triangular number Tn
 i <- 5 # \sum_{i=1}^n{i}
 
 while (TRUE) { 
  n <- T(i)
  factors <- tdiv(n) 
  if (prod(factors+1) >= N) {
   return(n)
  }
  i <- i + 1
 }
}

Project Euler Problem #19

February 20th, 2009

Problem 19 on the Project Euler website asks the user, given some initial information:

How many Sundays fell on the first of the month during the twentieth century (1 Jan 1901 to 31 Dec 2000)?

The obvious (but longer) way is to calculate the sum of the days between 1901 and 2000, given the number of days in each month, and a helper function to determine whether a year is a leap year or not:

is.leap <- function(year) {
        return (year %% 4 == 0 || (year %% 100 == 0 && year && 4 == 0))
}

# Problem 19
problem19 <- function() {
        daycount <- 1
        daylist <- list()
        i <- 1
        for (year in 1900:2000) {
                months <- c(0,31,28,31,30,31,30,31,31,30,31,30,31)
                if (is.leap(year)) {
                        months[3] <- 29
                        }
                        days <- daycount + cumsum(months)
                        daycount <- days[length(days)]
                        daylist[[i]] <- (days[-(length(days))])
                        i <- i + 1
                }
                sum(unlist(lapply(daylist[-1], function(x) {sum(x %% 7==1)} )))
}

However, with the aid of R’s chron library, there is a much easier way:

# Problem 19, method 2
library(chron)
sum(weekdays(seq.dates(01/01/1901, “12/31/2000, by=”months))==”Sun)

Webank Event at NESTA

January 22nd, 2009

Just attended the webank event at NESTA tonight, which was well attended with a capacity crowd of finance industry entrepeneurs, lawyers, VCs and bloggers. Three interesting companies presented at the event, each with their own angle on p2p finance:

Zopa presented a brief introduction to their model and gave some interesting statistics about their typical user base. Zopa (UK at least) seems to be profiting handsomely from the current credit market difficulties, although it seems that they may not be quite yet in the black. Their US operation seems to be less of a p2p operation. Average returns apparently around 9%, with a default rate of around 0.2%.

Kubera Money presented on their vision of a p2p lending platform based on the chit fund, or ROSCA model. Not too much detail regarding the actual implementation of how the schemes will work in the distributed world, but sounds like they have the FSA regulatory hurdles cleared, so hopefully will progress fast.

Midpoint Transfer was the last startup to present, and their business model revolves around disintermediating FX dealers by offering FX rates at the midpoint (no spread) with all trades offset via a linked network of midpoint-controlled intermediary accounts. Trades are guaranteed to be matched same day and midpoint takes on the credit risk. There is a flat $30 fee per transaction. Having worked in FX for a couple of years, I am sceptical about a couple of things, mainly around their guaranteed matching (which necessitates them having to hedge any unmatched exposure themselves, which will be a common occurrence unless they have thousands of users), and the fact that the midpoint is not set until the trade is actually matched, which could entail a very unfavourable price for a customer if a trade is not matched until end-of-day during a period of high volatility. If this is the case, there is a real risk that customers in this case would have been much better off by just dealing through a broker.

Afterwards, there was an entertaining verbal sparring match between the Zopa CEO and James Gardner, the head of innovation at a large retail bank, who both argued their views eloquently. Met some interesting people at the event – there are a lot of innovative ideas coming down the line in p2p finance.

Debugging DLL Loading Issues in Windows

December 12th, 2008

Attempting to debug dynlib loading issues in Windows can be tricky – on Linux you can use strace to monitor dynamic linking and see what libraries are being loaded as the application runs. Having just had some issues with dynlib loading issues in an R extension I wrote, I was keen to find a good way to debugging the issues as they occurred. Thankfully, there are some good utilities out there for debugging this kind of thing:

Dependency Walker is an old SDK tool that is still invaluable. Drag a .DLL or .EXE into the application window and it will show you the dependencies, what type of loading mechanism they use (e.g. eager or delay load), and also what exported functions from the dependent DLLs are actually used by the .DLL or .EXE you have selected.

Here is a screenshot of depends.exe operating on my R extension DLL. Note that MSCVR80.DLL and R.DLL are both highlighted as missing – this is extremely useful for figuring out dependent DLL load issues.

Dependency Walker

Dependency Walker

The SysInternals Process Monitor is an incredibly useful low-level Swiss Army Knife utility that can be used, among other things, to monitor dynamic library loading activity as it occurs, using the file activity view. Here is a view of Process Monitor monitoring R as it loads my extension DLL. The subsequent dependent DLL loading activity is highlighted.

Process Explorer

Process Explorer

R/Reuters Real-Time Data Extension

December 11th, 2008

Last week, I posted an R extension DLL that downloads historical data from Reuters using the SFC API. This time around, I am posting an extension DLL that can subscribe to real-time updates using the RFA API. This extension allows you to specify a list of items and data fields, and subscribe for a specified amount of time to updates on those items.

Here is an example:

# Load the DLL
dyn.load("RfaClient")

# Low-level subscription function wrapper
rsub < - function(time, items, func, sessionName, config, debug) {
.Call("subscribe", as.integer(time), items, func , sessionName, config, debug)
}

# Define items and fields to subscribe to
items <- list()
fields <- c("BID","ASK","TIMCOR")
items[[1]] <- c("IDN_SELECTFEED", "GBP=", fields)
items[[2]] <- c("IDN_SELECTFEED", "JPY=", fields)

# Callback function (invoked when data items update)
callback <- function(df) {
print(paste("Received an update for",df$ITEM, ", update time=", df$TIMCOR))
}

# Subscribe for 5 seconds using the supplied config parameters
rsub(5000, items, callback, "clientSession", "rfa.cfg", FALSE)

A short introductory guide is supplied:

Introduction (PDF)

Introduction (PDF)

The functionality is pretty basic right now, and there may be issues with the code (e.g. memory leaks, performance issues). Any feedback is gratefully received.

The package, including source, can be downloaded here:

rfaclient.zip

It was built with R 2.8.0 and RFA C++ 6.0.2, using Visual C++ 2005.

Working With Unicode Symbols in R and Vim

December 10th, 2008

I came across a Vim feature today that I had forgotten about for a while – digraphs. Digraphs are basically Vim abbreviations for encoded character shortcuts. Typing <ctrl-K> followed by the digraph will insert the mapped character into the file. For instance, to insert the Euro symbol, type

<CTRL-K>Eu.

A list of existing digraphs can be shown using :digraph, and a new digraph can be added using the same command.

For instance, to add new digraphs for the Unicode male (♂) and female (♀) gender symbols, which are Unicode code points 9794/0×2642 and 9792/0×2640 respectively, you can type:

:dig Ma 9794
:dig Fe 9792

Similarly, R can insert Unicode code points into text and labels, and even data point symbols. One little-known convention in R is that if a data point symbol is negative, it is taken to be a Unicode code point. The following example generates some random data, and then displays the data with either a male or female symbols depending on whether the data point is divisible by 2 or not. Also, the symbols are inserted into the plot title using the \u character prefix.

> gender < - rbinom(n=100, size=100, prob=0.5)
> plot(gender, cex=2.5,
pch=ifelse(gender %% 2 == 0, -0x2642L, -0x2640L),
col=ifelse(gender %% 2 == 0, 2, 3), main="\u2640 and \u2642 Trials")

Plot Example

Plot Example