The Problem With #defines

The C/C++ #define mechanism is a very powerful, but very blunt templating approach. It’s a simple textual search-and-replace mechanism, which can be very useful, but not so subtle.

This was illustrated to me today when I ran into a problem with some code for an R extension that I am writing, that interfaces to Reuters market data feeds. The offending code is shown below:


RTRXFileDb* configDb = new RTRXFileDb(configPath);
if (configDb->error()) {
      Rf_error("Error initializing SFC: %s", configDb->errorText());
            return ScalarReal(-1);
}

This code creates an object instance, and checks the boolean member property error() property for problems. If there is a problem, it calls the R function error() to print the error to the R console.

When I tried to compile this code, however, I got the following error:

error C2039: ‘Rf_error’ : is not a member of ‘RTRXFileDb’

This initially confused me, but then I looked at the R header file error.h and found the definition of error():

#define R_ERROR_H_

#ifdef  __cplusplus
extern "C" {
#endif

void    Rf_error(const char *, ...);
void    Rf_warning(const char *, ...);
void    WrongArgCount(const char *);
void    UNIMPLEMENTED(const char *);
void    R_ShowMessage(const char *s);
    

#ifdef  __cplusplus
}
#endif

#ifndef R_NO_REMAP
#define error Rf_error
#define warning Rf_warning
#endif

#endif /* R_ERROR_H_ */

So error is mapped to Rf_error via a #define directive, meaning that whenever the preprocessor encountered the “error” token, it went ahead and replaced it with the R-mapped definition, regardless of its syntactic context. This is normally not a problem, apart from when we get name collisions, as we have here. There are a few potential ways to fix this, none of them ideal. Some of the options are:

  • #define R_NO_REMAP in the preprocessor, which will stop error being mapped to Rf_error, and just use Rf_error explicitly. The problem with this approach is that R_NO_REMAP will undefine a whole bunch of other R functions, making it tedious to replace every one.
  • Change the R header file: either change the #define block to map Rf_error to _error, or some alternative name, and just change the usages of that, or alternatively remove the error mapping completely.
  • Use #undef error and just use Rf_error explicitly (Pointed out on Reddit)

I’m going to go for one of the latter options, as they are the path of least resistance. This was actually a very small issue. But name collisions of this type can potentially be nasty, or at the very least tedious to fix.

Project Euler Problem #9 (R)

Problem 9 involves finding the Pythagorean triple \(a^2+b^2=c^2\) where a+b+c=1000.

Pythagorean triples have lots of useful relationships that we can exploit. The ones we shall use to solve this problem are:

\[a = k(m^2-n^2)\]

\[b = k(2mn)\]

\[ c = k(m^2+n^2) \]
In this case, we take \(k=1\), and generate the triples as follows:

# Problem 9
# Pythagorean Triple
problem9 <- function() {

        sum <- 0
        a <- 0; b <- 0; c <- 0;
        m <- 0

        for (m in 1:1000) {
         for (n in m:1000) {
         a <- m^2-n^2
         b <- 2*m*n
         c <- n^2 + m^2

         if (all(a^2 + b^2 == c^2, sum(a,b,c)==1000)) {
          return(prod(a,b,c))
         }
        }
        }
}

Project Euler Problem #8 (R)

Problem #8 on the Project Euler problem list requires finding the greatest product of five consecutive digits in a large string of integers.

For this problem, I copied and pasted the large number into a file, and read it in using scan(). As there were line breaks in the copied and pasted numbers, I used paste() with the collapse argument to restore it to a single line. From there, it was a matter of taking 5-character substrings, converting them to numbers, and using prod() to calculate the product of each group of 5 numbers.

gprod <- function() {
 # Read number as string
 snum <- scan(file="number.dat", what=character(0),
  strip.white=TRUE, quiet=TRUE)
 # Concatenate lines into a single line
 snum <- paste(snum, collapse="", sep="")
 mp <- 0
 for (i in 1:(nchar(snum)-4)) {
  s <- substr(snum, i, i+4)
  m <- prod(as.numeric(strsplit(s,"")[[1]]))
  if (m > mp) {
   mp <- m
}
 }
 mp
}

Project Euler Problem #7 (R)

Problem 7 on the Project Euler site is the following challenge:

By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13.

What is the 10001st prime number?

The routine below returns a list of primes, each successive prime being calculated by testing none of the currently known primes divide the current integer counter value evenly. If not, the current integer value is added to the list of known primes.

nprime <- function(n) {
        if (n == 1) {
                return (2)
        }
        psofar <- c()
        psofar[1] <- 2
        j <- 2
        p <- 1
        i <- 2
        while (p < n)  {
                if ( all((i %% psofar) > 0) ) {
                        psofar[j] <- i
                        j <- j + 1
                        p <- p + 1
                }
                i <- i + 1
        }
        psofar
}

To return the 10001st prime, we invoke it like so:

>tail(psofar(10001),1)

Project Euler Problem #6 (R)

Problem 6 in the Project Euler list asks us to:

Find the difference between the sum of the squares of the first one hundred natural numbers and the square of the sum.

The literal translation of this is:

sumdiff <- function(n) {
    sum(c(1:n))^2-sum(c(1:n)^2)
}

However, this is another problem (like Problem #1) that has a closed-form solution, and so can run in constant time.

The closed form expression for the sum of the squares is

\[ sum_{i=1}^{n}i^2 = \frac{1}{6}n(2n+1)(n+1) \]

The closed form expression for the square of the sum is

\[ \left(\sum_{i=1}^{n}i\right)^2 = \left(\frac{n(n+1)}{2}\right)^2 \]

So if we subtract and simplify, we end up with the solution

\[ \frac{1}{12}n(3n^3+2n^2-3n-2) \]

The solution is shown below:

sumdiff2 <- function(n) {
    #(1/4)*(n^2)*(1+n)^2 - (1/6)*n*(1+n)*(1+2*n)
    (1/12)*(n)*(-2-3*n+2*n^2+3*n^3)
}

We can calculate the answer as shown:

> sumdiff2(100)
[1] 25164150

Needless to say, closed form solutions are preferable where possible.

Project Euler Problem #5 (R)

Problem 5 on the Project Euler list is the following:

2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder.

What is the smallest number that is evenly divisible by all of the numbers from 1 to 20?

This is asking us to compute the lowest common multiple (lcm) from 1 to 20. The lcm of two integers a and b is defined as

\[ \frac{ab}{\mbox{gcd}(a,b)} \]

In order to compute the gcd, we can use a simple algorithm, like a Euclidian reduction:

# Iterative Euclidean algorithm
gcd <- function(a,b) {
    t <- 0
    while (b != 0) {
        t <- b
        b <- a %% b
        a <- t
    }
    a
}

Defining the lcm then becomes:

lcm <- function(a,b) {
    abs(a * (b/(gcd(a,b))))
}

Now we have defined pretty much everything we need to solve the problem. We can write an iterative lcm routine that calculates the least common denominator from 1 to x:

lcm_ <- function(xs) {
    l <- 1
    for (i in 1:(length(xs))) {
        l <- lcm(l,xs[i])
    }
    l
}

And we can calculate the answer:

> lcm_(c(1:20))
[1] 232792560

However, I also experimented with writing a reduce() function (similar to Haskell’s foldr operation). We can write reduce() as follows:

reduce <- function(op,x,xs) {
    y <- x
    for (i in 1:(length(xs)-1)) {
        y <- do.call(op, list(y, xs[i]))
    } 
    y
}

And then invoke:

>reduce(lcm, 1, c(1:20))
[1] 232792560

Note that R actually has inbuilt Map/Reduce operations. Using the native Reduce, this is:

> Reduce(f=lcm, x=c(1:20), right=1)
[1] 232792560

Project Euler Problem #4 (R)

Problem 4 in on Project Euler’s list is the following:

Find the largest palindrome made from the product of two 3-digit numbers.

We can attack this as follows: firstly, we know that for an n-digit number, the minimum n-digit number is \(10^{n-1}\) and the maximum is \(9\sum_{i=0}^{n-1}10^{i}\). So we need the cartesian product of all numbers in this range, which we can then test for the palindrome property, and finally filter for the maximum element.

Testing for the palindrome property requires that we check each digit in the number for equality on either side. We do this first by calculating the number of digits in the product (using \(\lfloor\log_{10}(x)\rfloor\) and then just comparing the digits by using the modulus and shifting. Note a couple of points in the R code below:

  • %/% is integer division;
  • Note the vectorization at work in statements like (p %/% 10^x): this will divide p by successive powers of 10, where the powers are defined in the x vector;
  • Note the use of the logical all() function to test whether all palindromic contenders are equal.
  • pal < - function(N) {
     palindromes <- c()
     min_no <- 1*10^(N-1)
     max_no <- sum(9*10^(seq(0,N-1)))
     n <- 1
     for (i in seq(max_no,min_no,-1)) {
      for (j in seq(i,min_no,-1)) {
       p <- i * j
       digits <- floor(log10(p))
       x <- seq(0, digits %/% 2)
        if (all( ((p %/% 10^x) %% 10)==((p %/% 10^(digits-x)) %% 10) )) {
         palindromes[n] <- p
         n <- n + 1
        }
      }
     }
     max(palindromes)
    }

Project Euler Problem #3 (R)

The third problem in Project Euler is an interesting one:

The prime factors of 13195 are 5, 7, 13 and 29.

What is the largest prime factor of the number 600851475143 ?

In order to compute prime factors, we can use a sieving method. The simplest and oldest sieving algorithm is the sieve of Erastothenes, which we can implement in R as follows:

sieve <- function(n) {
    if (n <= 1) {
        return(numeric(0))
    }
    if (n == 2) {
        return(c(2))
    }

    a <- seq(2,n)
    b <- c()
    b[1] <- 2

    i <- 1
    while (length(a) > 0) {
        b[i] <- a[1]
        i <- i + 1
        a <- a[!(a %% a[1] == 0)]
    }
    b
}

So now we need to compute the largest prime factor of 600851475143. An upper bound for the factorisation is \(p \leq \lfloor\sqrt{x}\rfloor\), so we can halt our search at this point (alternatively, we may be able to begin our search here and work downwards). Let’s do it the obvious (and inefficient) way, starting at 1 and working up to \(\lfloor\sqrt{x}\rfloor\), building up a list of prime factors along the way:

pfactor <- function(x) {
    factors <- c()
    j <- 1
    primes <- sieve( floor(sqrt(x)) )
    for (prime in primes) {
        if (x %% prime == 0) {
            factors[j] <- prime
            j <- j + 1
    }
    }
    factors
}

Now to compute the maximum factor, we just take the maximum of this list:

max( pfactor(600851475143) )

Sweave, LaTeX, and Font Encodings

I just found out how to save a couple of frustrating hours trying to track down why font encodings don’t seem to work in LaTeX when using Sweave. Somewhere along the line, it looks like ae.sty is getting directly or indirectly invoked by Sweave, which means that any font encoding directives placed before usepackage Sweave will be ignored. The solution is to declare the Sweave package import like so:

\usepackage[noae]{Sweave}

And fonts should work as specified.