Coding R

The Mandelbrot Set

The Mandelbrot Set is one of the most visually appealing objects in mathematics. The set is defined as the set of complex numbers $c$ for which the function $f_c(z) = z^2 + c$ remains bounded when iterated from $z = 0$ – i.e. the sequence $f_c(0), f_c(f_c(0)), f_c(f_c(f_c(0))), \ldots$ remains below some fixed absolute threshold.

The basic method for generating the Mandelbrot set is straightforward:

  • Choose the values of $c$ to put in the iteration $z\rightarrow z^2+c$;
  • Iteratively apply the mapping for each value of $c$ and test whether the magnitude of the updated value now exceeds some fixed absolute threshold $\epsilon$;
  • Update our output grid to reflect the intensity at point $c$ – this can be a simple 1/0 flag to indicate that the iteration has become unbounded (or ‘escaped’), or a value recording how many iterations it took before the value was determined to escape (or the iteration limit if it never did)

R has a native representation for complex numbers, but for this example we will just use a 2-element vector as a complex number representation, the first element being the real component and the second the imaginary component.

The iterative step at each point to update the current value of $z \rightarrow z^2+c$ is computed as follows – if $z$ is assumed to be a complex number of the form $x+yi$:

$$(x+yi)(x+yi) = x^2-y^2+2xyi$$

So $Re(z)\rightarrow x^2-y^2+Re(c) $ and $Im(z) \rightarrow 2xy+Im(c)$
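As a quick sanity check, we can compare this manual update against R's native complex arithmetic (the test value here is arbitrary):

```r
# Verify (x + yi)^2 = (x^2 - y^2) + 2xy*i using R's native complex type
z <- complex(real = 0.3, imaginary = -0.7)
manual <- complex(real = Re(z)^2 - Im(z)^2, imaginary = 2 * Re(z) * Im(z))
all.equal(z^2, manual)  # TRUE
```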

Here is some simple R code to illustrate:

# Subset of x-y plane
x <- c(-2,1)
y <- c(-1.5, 1.5)

# Number of points in each dimension
nx <- 500
ny <- 500

xgrid <- seq(x[1],x[2], length.out = nx)
ygrid <- seq(y[1],y[2], length.out = ny)

mand <- function(xgrid, ygrid) {
  TOL <- 4  # Escape threshold (on the squared magnitude)
  N <- 50   # Number of iterations to test for escape
  # The output fractal trace
  frac <- matrix(0, nrow = length(xgrid), ncol = length(ygrid))
  for (i in seq_along(xgrid)) {
    for (j in seq_along(ygrid)) {
      c <- c(xgrid[i], ygrid[j])
      z <- c(0, 0)
      for (k in 1:N) {
        it <- k
        zprev <- z
        # If zprev is a complex number x + yi
        # then we update z as follows:
        # Re(z) <- (x^2 - y^2) + Re(c)
        z[1] <- zprev[1]*zprev[1] - zprev[2]*zprev[2] + c[1]
        # Im(z) <- 2*x*y + Im(c)
        z[2] <- 2*zprev[1]*zprev[2] + c[2]
        if ((z[1]*z[1] + z[2]*z[2]) > TOL) { break }
      }
      frac[i, j] <- it
    }
  }
  frac
}

To call the function and plot the output:

m <- mand(xgrid, ygrid)
image(m, col=gray.colors(50)) 

Which should produce an image like the following:

Changing the $x$ and $y$ grid coordinates enables you to zoom in to various areas of the fractal, e.g. changing the x/y grid to:

x <- c(-2,-0.75)
y <- c(-0.5, 0.5)

Results in the image below

The main drawback of iterative plain R code like this is that it is slow – however it could be sped up using a native interface, e.g. Rcpp, computing the iterative process in C/C++ code where some of the copying overhead can be avoided.
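Alternatively, a large speed-up is possible in pure R by vectorising the loop over grid points using R's native complex type. A sketch (the function name mand_vec and its defaults are my own, matching the TOL and N values above):

```r
# Vectorised Mandelbrot: update every still-active grid point at once,
# avoiding the per-point interpreted inner loops
mand_vec <- function(xgrid, ygrid, N = 50, TOL = 4) {
  # Grid of c values as complex numbers
  c <- outer(xgrid, ygrid, function(x, y) complex(real = x, imaginary = y))
  z <- matrix(0 + 0i, nrow = length(xgrid), ncol = length(ygrid))
  # Iteration counts; points that never escape keep the value N
  it <- matrix(N, nrow = length(xgrid), ncol = length(ygrid))
  active <- matrix(TRUE, nrow = length(xgrid), ncol = length(ygrid))
  for (k in 1:N) {
    z[active] <- z[active]^2 + c[active]
    escaped <- active & (Re(z)^2 + Im(z)^2 > TOL)
    it[escaped] <- k
    active <- active & !escaped
  }
  it
}
```

The result can be plotted with image() exactly as before; on each pass only the points that have not yet escaped are updated.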

Coding Finance R Statistics

Fooled By Randomness

I recently read the great book Fooled By Randomness by Nassim Nicholas Taleb. There is a nice illustrative example in there on the scaling property of distributions across different time scales. The example concerns a hypothetical dentist who has set up a home trading environment and started to invest in some assets (say stocks). The dentist has access to real-time market data and thus can check in on their current investment portfolio valuation at any point in time.

The investment portfolio has an expected annual return of 15% and annual volatility of 10%. The assertion in the book is that the more frequently the dentist checks the performance of the portfolio, the worse it will seem to perform (see the table below). The question is why, as at first glance this appears counter-intuitive.

The answer is as follows – if we assume a normal distribution for the portfolio returns, the parameters above define a distribution with a mean of 15% and a standard deviation of 10%, i.e. $\mathcal{N}(0.15, 0.10)$

We can represent this in R as follows:

library(ggplot2)

mu <- .15
sigma <- .1
x <- seq(-2, 2, .01)
y <- dnorm(x, mean = mu, sd = sigma)
qplot(x, y) + geom_line() + geom_vline(xintercept = mu, color = 'red') + geom_vline(xintercept = c(mu - sigma, mu + sigma), linetype = 2)

The graph shows the implied distribution plus the mean (red line) and standard deviation (dashed lines)

Under this implied distribution, a 2-standard-deviation range (covering roughly 95% of outcomes) spans returns from -5% to +35%:

mu + 2*sigma
[1] 0.35
mu - 2 *sigma
[1] -0.05

Looking again at the table from the book above – if we define success as a return > 0, then we can calculate the probability of success for the distribution above. We can calculate this using the normal cumulative distribution function – which is basically the area under the normal curve above – and find out what this value is at 0.

In R, pnorm() gives us the area under the curve to the left of a point, i.e. $P(X \leq 0)$, so in order to calculate the probability of success $P(X > 0)$ we need to take the complement:

1-pnorm(0,mean = mu, sd=sigma)
[1] 0.9331928

So the probability of success over one year is 93%, which matches the table above.

Now what happens if the dentist checks their portfolio more frequently – i.e. what does the return look like over 1 quarter as opposed to 1 year? In order to calculate this, we need to adjust the parameters to scale it to a quarterly distribution. So $\mu_Q = \frac{\mu_A}{4}$ and $\sigma_Q = \frac{\sigma_A}{\sqrt{4}}$, i.e. the mean will be divided by 4 to convert to quarters, and the standard deviation will be divided by the square root of 4, as volatility scales with the square root of time.
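We can sanity-check this scaling rule by simulation – a sketch assuming i.i.d. normal quarterly returns that sum to the annual return:

```r
# Four i.i.d. quarterly returns with mean mu/4 and sd sigma/sqrt(4)
# should sum to an annual return with mean mu and sd sigma
set.seed(42)
mu <- 0.15; sigma <- 0.10
quarters <- matrix(rnorm(4 * 1e5, mean = mu/4, sd = sigma/sqrt(4)), ncol = 4)
annual <- rowSums(quarters)
c(mean(annual), sd(annual))  # approximately 0.15 and 0.10
```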

1-pnorm(0,mean = mu/4, sd=sigma/sqrt(4))
[1] 0.7733726

So we can see that moving from an annual to a quarterly distribution has lowered the probability of success from 93% to 77%

Similarly, if we now move from quarterly to monthly, the distribution will be a $\mathcal{N}\left(\frac{\mu_A}{12}, \frac{\sigma_A}{\sqrt{12}}\right)$ distribution:

1-pnorm(0,mu/12, sigma/sqrt(12))
[1] 0.6674972

Which is roughly 67%. Further, we can tabulate the various scaling factors all the way from 1 (annual) to 31536000 (1-second horizon) and calculate the probability of success for each time horizon. We can see the success probability diminish from 93% down to 50% at a one-second resolution.

# Probability of a positive return at a given scaling factor
success <- function(mu, sigma, f) 1 - pnorm(0, mean = mu/f, sd = sigma/sqrt(f))

factors <- c(1,4,12,52,365,8760,31536000)
sapply(factors, function(f) success(mu, sigma, f))
[1] 0.9331928 0.7733726 0.6674972 0.5823904 0.5312902 0.5063934 0.5001066

To make it clearer we can calculate the density of each distribution and plot it:

library(ggplot2)
library(reshape2)

mu <- 0.15
sigma <- 0.1
x <- seq(-2,2,.01)
factors <- c(1,4,12,52,365,8760,31536000)
names <- c('x','annual','quarterly','monthly','weekly','daily','hourly','second')

# Calculate the density for the distribution at each timescale 
tab <- data.frame(x=x)
for (f in seq_along(factors)) {
  tab <- cbind(tab, dnorm(x, mu/factors[f], sd=sigma/sqrt(factors[f])))
}

# Plot each density in a separate panel
colnames(tab) <- names
ggplot(melt(tab,id.vars = 'x'), aes(x=x)) + geom_line(aes(y=value)) + facet_grid(variable~.,scales = 'free') + xlab('') + ylab('') + theme_void() + theme(strip.text.y = element_text(angle = 0))

Coding Finance R Statistics

Market Making and The Win/Loss Ratio

The article is a neat little illustration of a simple asymptotic toy distribution given an initial per-trade probability of a win or loss. It is used as an example to illustrate the basic methodology behind a working market-making business – develop a small edge and scale it up as cheaply as possible to maximise the probability of overall profit.

If we take $p=0.51$ as the probability of a win per trade, then after $n$ transactions we will have a number of ‘wins’ $k$ that varies from 0 to $n$. We model each trade as the outcome of a binomial 0-1 trial.

In order to come out at breakeven or better, the number of wins $k$ needs to be at least $\frac{n}{2}$. Using the binomial distribution this can be modelled as:

$$P\left(k \geq \frac{n}{2}\right) = \sum_{k=\lceil n/2 \rceil}^{n} \frac{n!}{k!(n-k)!}\,p^k(1-p)^{n-k}$$

As the binomial distribution converges to a normal $\mathcal{N}(np, np(1-p))$ as n gets large, we can use the distribution below to model the win/loss probability over n:

$$\int_{\frac{n}{2}}^\infty \mathcal{N}\left(np,\, np(1-p)\right)\, dx$$

Which is

$$\int_{\frac{n}{2}}^\infty \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\, dx$$

Where $\mu=np$ and $\sigma^2=np(1-p)$

This can be modelled in R:

> p <- 0.51
> n <- 100
> 1-pnorm(q=n/2, mean=n*p,sd=sqrt(n*p*(1-p)))
[1] 0.5792754
> n <- 1000
> 1-pnorm(q=n/2, mean=n*p,sd=sqrt(n*p*(1-p)))
[1] 0.7364967

Showing that with a win probability of 51%, 100 trades give us a 57% probability of breakeven or better, and 1000 trades give us a 73% chance of breakeven or better.
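As a check on the normal approximation, we can compare it with the exact binomial tail via pbinom (no continuity correction is applied here):

```r
# Exact binomial tail P(k > n/2) vs the normal approximation
p <- 0.51
n <- 1000
exact <- 1 - pbinom(n/2, size = n, prob = p)
approx <- 1 - pnorm(n/2, mean = n*p, sd = sqrt(n*p*(1-p)))
c(exact, approx)  # the two values agree to within a couple of percent
```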

We can plot the probability of breakeven or better, holding p constant and varying n from 1 to 1000:

> n <- 1:1000
> y <- 1-pnorm(q=n/2, mean=n*p,sd=sqrt(n*p*(1-p)))
> library(ggplot2)
> library(scales)
> qplot(n,y)+scale_y_continuous(label=percent)

Which produces the following graph

Which shows the probability of profit converging towards 100% as n gets large.

To make it more interesting we can generate different paths for n from 1 to 10000 but also vary the win probability from 45% to 55%, and look at the paths as we vary n and p:

n <- seq(1, 10000)
p <- 0.5
y <- 1 - pnorm(q = n/2, mean = n*p, sd = sqrt(n*p*(1-p)))
plot(n, y, type = 'l', ylim = c(0, 1))

probs <- seq(0.45, 0.55, length.out = 100)
for (pr in probs) {
  y <- 1 - pnorm(q = n/2, mean = n*pr, sd = sqrt(n*pr*(1-pr)))
  lines(n, y, col = 'grey')
}
# Redraw the p = 0.5 path in black on top
lines(n, 1 - pnorm(q = n/2, mean = n*p, sd = sqrt(n*p*(1-p))), col = 'black')

Which shows the probabilities of breakeven or better given a number of different starting win/loss probabilities and a varying number of trades. The path with $p=0.5$ is shown in black.