Using Awk To Prefix Lines With Millisecond Timestamp

July 2nd, 2009

Here is a handy way to get awk to preprocess a line and add a timestamp (Put here as I will probably forget how to do this straight away again!)


echo "foo,bar" | awk '{x="'"`date +%Y%M%d%S%N`"'"; printf "%s,%s\n",x,$0 }'

London UseR Group Talk - Slides

April 2nd, 2009

The inaugural London UseR event was a great success, with a lot of interesting people and a very constructive networking atmosphere!

I gave a (slightly disjointed) talk on concurrency and the bigmemory package in R (more on that later this year at UseR! 2009 in France).

The slides are here.

R User Group Meeting, London

March 5th, 2009

On Tuesday March 31st, Mango Solutions are sponsoring the inaugural London R User Group Meeting. It will be a great opportunity to meet other R users and find out how people are using it. As the first one of its kind in London, I would expect a high level of interest. There will be a number of speakers presenting on various topics, using the UseR! panel format (short talks at around 15 minutes or so). I will be giving a short presentation, most likely following on from the real-time data integration work I presented on at UseR! 2008.

See this page for details on the event:

http://www.mango-solutions.com/events/UseRLondon.html

Project Euler Problem #28

March 3rd, 2009

Problem 28 on the Project Euler website asks what is the sum of both diagonals in a 1001×1001 clockwise spiral. This was an interesting one: the relationship between the numbers on the diagonals is easy to deduce, but expressing it succinctly in R took a little bit of tweaking. I’m sure it could be compressed even further.

# Problem 28
spiral.size <- function(n) {
        stopifnot(n %% 2 ==1)
        
        if (n==1) {
                return(1)
        }
        sum(cumsum(rep(2*seq(1,floor(n/2)), rep(4,floor(n/2))))+1)+1
}

spiral.size(1001)

Project Euler Problem #22

March 1st, 2009

Problem 22 on Project Euler proves a text file containing a large number of comma-delimited names and asks us to calculate the numeric sum of the alphabetical score for each name multiplied by the name’s position in the original list. This is made slightly easier by the presence of the predefined LETTERS variable in R.

problem22 <- function() {
        namelist <- scan(file="c:/temp/names.txt", sep=",", what="", na.strings="")
        sum(unlist(
                lapply(namelist, 
                        function(Z) which(namelist==Z) * sum(match(unlist(strsplit(Z,"")), LETTERS)))))
}

Project Euler Problem #15

February 22nd, 2009

Problem 15 on Project Euler asks us to find the number of distinct routes between the top left and bottom right corners in a 20×20 grid, with no backtracking allowed.

I originally saw this type of problem tackled in the book Notes On Introductory Combinatorics, by George Polya amongst others. This book is hard to find now, but it is a really clear intro to combinatoric math.

The solution can be paraphrased as follows: if the grid is of size 20×20, and it takes 2 movements to navigate a single square in the grid, then we must make a total of 40 movements to get from the top right to the bottom left. Exactly half of these movements will be left-to-right, and the other half will be up-down. The total number of distinct routes is the number of ways that we can choose 20 of each type of move from the 40 total moves required. So we need the combinatoric construct n-choose-k, or how many ways k items can be selected from n total items. This is represented as tex:{n\choose k}.

In R, calculating tex:{40\choose 20} is just:

choose(40, 20)

Learning Lilypond : Luciana’s Theme

February 22nd, 2009

There is a whole bunch of obscure and wonderful music out there that I have been meaning to transcribe and arrange for the piano, and so I have been looking around for decent music transcription software, and I recently came across lilypond. This is to music engraving what TeX is to mathematical typesetting, and the results are really very nice. The downside of lilypond (as with TeX) is the slightly steep initial learning curve. To put it to the test, I have transcribed a piece of music that I came across a while ago - it’s called “Tema Par Luciana”, and is from an old Italian movie called C’eravamo tanto amati. I haven’t seen the movie, but I came across the theme in a mix created by a French DJ called elektrosonik. The piece is simple enough that it was pretty straightforward to transcribe, but also has enough variety that it is a good starting point for learning lilypond techniques.

Here is the PDF output (Lilypond will also produce a MIDI file if asked):

Tema Par Luciana (PDF)

And here is the lilypond source: luciana.ly

Project Euler Problem #13

February 22nd, 2009

Problem 13 on Project Euler asks us to sum 100 50-digit numbers and give the first 10 digits of the result. This is pretty easy. Note we are using R’s integer division operator %/% to discard the remainder of the large summed integer and just gives us the first 10 digits of the result.

## Problem 13
problem13 <- function() {
    nums <- scan("problem13.dat")
    s <- sum(nums)
    s %/% 10^(floor(log10(s))-9)
}

Project Euler Problem #14

February 22nd, 2009

Problem 14 on the Project Euler site asks us to find the longest chain under 1 million created using the Collatz mapping. This is fairly straightforward, although performance again is not great:

## Problem 14
# Collatz conjecture
problem14 <- function(N) {
        maxChain <- 0
        chains <- rep(0,N)
        x <- 1
        for (i in 1:N) {
                n <- i
                chain <- 0
                while(n > 1) {
                        n <- ifelse(n %% 2 == 0, n/2, 3*n+1)
                        chain <- chain + 1
                        if (n < N && chains[n] > 0) {
                            chain <- chain + chains[n]
                                break
                        }
                        
                }
                chains[i] <- chain
                if (chain > maxChain) {
                        maxChain <- chain
                        x <- i
                }
        }
        x
}

Project Euler Problem #12

February 22nd, 2009

Problem 12 on the Project Euler site asks:

What is the value of the first triangle number to have over five hundred divisors?

A triangular number T(n) is defined as tex:T(n) = \frac{n(n+1)}{2}. The R code below consists of a solution, which involves the fact that the number of proper divisors of an integer n can be calculated by first computing a prime factorisation of the number n, e.g. if tex:n = p^aq^b, where p,q are prime, then the number of proper divisors of n can be calculated as tex:d(n) = (a+1)(b+1). This solution is extremely slow (mainly due to the naive prime sieving algorithm), and could be speeded up dramatically with a little effort.

# Sieve of Eratosthenes
prime.sieve <- function(n) {
 a <- seq.int(1,n)
 p <- 1
 M <- as.integer(sqrt(n))
 while ((p <- p + 1) <= M) {
  if (a[p] != 0)
    a[seq.int(p*p, n, p)] <- 0
 }
 a[a>1 & a>0]
}
 

# Trial Division
# Returns the exponents of the prime
# factors of n
# e.g. if n = p^a*q^b
# tdiv(n) will return (a,b)
tdiv <- function(n) {
 primes <- prime.sieve(n)
 factors <- c()
 i <- 1
 curr <- 0
 
 for (p in primes) {
  while (n %% p == 0) {
   curr <- curr + 1
   n <- n %/% p
  }
  factors[i] <- curr
  i <- i + 1
  curr <- 0
 }
 
 factors[factors > 0]
}

# Compute nth triangular number
T <- function(n) {
        (n*(n+1))/2
}
 
## Problem 12
# This is a slooow solution
problem12 <- function(N) {
 n <- 0 # current triangular number Tn
 i <- 5 # \sum_{i=1}^n{i}
 
 while (TRUE) { 
  n <- T(i)
  factors <- tdiv(n) 
  if (prod(factors+1) >= N) {
   return(n)
  }
  i <- i + 1
 }
}