Category: Statistics

Fooled By Randomness

Post author By Rory Winston
Post date June 17, 2020

I recently read the great book Fooled By Randomness by Nicholas Nassim Taleb. There is a nice illustrative example in there on the scaling property of distributions across different time scales. The example is formulated as the hypothetical example of a dentist who has set up a home trading environment and started to invest in some assets (say stocks). The dentist has access to real-time market data and thus can check in on their current investment portfolio valuation at any point in time.

The investment portfolio has an expected annual return of 15% and volatility of 15%. The assertion in the book is that the more frequently the dentist appears to check the performance of the portfolio, the worse it will seem to perform(see the table below). The question is why, as at first glance it appears counter-intuitive.

The answer is as follows – if we assume a normal distribution for the portfolio returns, the parameters above define a distribution with a mean of 15% and a standard deviation of 10%, .i.e. $\mathcal{N}(0.15, 0.10)$

We can represent this in R as follows:

mu <- .15
sigma <- .1
x <- seq(-2,2,.01)
y <- dnorm(x,mean = mu, sd = sigma)
qplot(x,y) + geom_line() + geom_vline(aes(xintercept=mu),color='red') + geom_vline(aes(xintercept=c(mu-sigma,mu+sigma)),linetype=2)

The graph shows the implied distribution plus the mean (red line) and standard deviation (dashed lines)

Under this implied distribution, 2 standard deviations (or 95%) results in a return spanning -5% to +35%:

mu + 2*sigma
[1] 0.35
mu - 2 *sigma
[1] -0.05

Looking again at the table from the book above – if we calculate success as a return > 0, then we can calculate the probability of success for the distribution above. We can calculate this using the the normal cumulative distribution – which is basically the the sum of the area under the normal curve above – and find out what this value is for 0.

In R, pnorm() gives us the area under the curve. Actually it gives us $P(X<=0)$ so in order to calculate success $(P(X>0))$ we need to calculate the inverse probability:

1-pnorm(0,mean = mu, sd=sigma)
[1] 0.9331928

So the probability of success over one year is 93%, which matches the table above.

Now what happens if the dentist checks their portfolio more frequently – i.e. what does the return look like over 1 quarter as opposed to 1 year? In order to calculate this, we need to adjust the parameters to scale it to a quarterly distribution. So $\mu_Q = \frac{\mu_A}{4}$ and $\sigma_Q = \frac{\sigma_A}{\sqrt{4}}$, i.e. the mean will be divided by 4 to convert to quarters, and the standard deviation will be divided by the square root of 4, as volatility scales with the square root of time.

1-pnorm(0,mean = mu/4, sd=sigma/sqrt(4))
[1] 0.7733726

So we can see that moving from an annual to a quarterly distribution has lowered the probability of success from 93% to 77%

Similarly if we now move from quarterly to monthly, the distribution will be a $ \mathcal{N}{\frac{\mu_A}{12}, \frac{\sigma_A}{\sqrt{12}}}$ distribution:

1-pnorm(0,mu/12, sigma/sqrt(12))
[1] 0.6674972

Which is roughly 67%. Further, we can tabulate the various scaling factors all the way from 1 (annual) to 31536000 (1-second horizon) and calculate the probability of success for each time horizon. We can see the success probability diminish from 93% down to 50% at a one-second resolution.

factors <- c(1,4,12,52,365,8760,31536000)
sapply(factors, function(x) success(mu, sigma, x))
[1] 0.9331928 0.7733726 0.6674972 0.5823904 0.5312902 0.5063934 0.5001066

To make it clearer we can calculate the density of each distribution and plot it:

mu <- 0.15
sigma <- 0.1
x <- seq(-2,2,.01)
factors <- c(1,4,12,52,365,8760,31536000)
names <- c('x','annual','quarterly','monthly','weekly','daily','hourly','second')

# Calculate the density for the distribution at each timescale 
tab <- data.frame(x=x)
for (f in seq_along(factors)) {
  tab <- cbind(tab, dnorm(x, mu/factors[f], sd=sigma/sqrt(factors[f])))
}

# Plot each density in a separate panel
colnames(tab) <- names
ggplot(melt(tab,id.vars = 'x'), aes(x=x)) + geom_line(aes(y=value)) + facet_grid(variable~.,scales = 'free') + xlab('') + ylab('') + theme_void() + theme(strip.text.y = element_text(angle = 0))

Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets by Nassim Nicholas Taleb
My rating: 5 of 5 stars

Coding Finance R Statistics

Market Making and The Win/Loss Ratio

Post author By Rory Winston
Post date June 16, 2020

The article https://online.wsj.com/public/resources/documents/VirtuOverview.pdf is a neat little illustration of a simple asymptotic toy distribution given an initial probability of a win or loss per-trade. It is used as an example to illustrate the basic methodology behind the working market-maker business – develop a small edge and scale this up as cheaply as possible to maximise the probability of overall profit.

If we take $p=0.51$ as the probability of a win per-trade and then after $n$ transactions we will have a number of ‘wins’ k that will vary from 0 to n. We model each trade as the outcome of a binomial 0-1 trial.

In order to come out at breakeven or better, the number of wins k needs to be at least $\frac{n}{2}$. Using the binomial distribution this can be modelled as:

$P\left(n>\frac{k}{2}\right) = \sum_{\frac{k}{2}}^\infty \frac{n!}{k!(n-k)!}p^k(1-p)^{n-k}$

As the binomial distribution converges to a normal $\mathcal{N}(np, np(1-p))$ as n gets large, we can use the distribution below to model the win/loss probability over n:

$ \int_{\frac{k}{2}}^\infty \mathcal{N}\left(np, np(1-p) \right) dx $

Which is

$ \int_{\frac{k}{2}}^\infty \frac{1}{\sigma\sqrt{2}\pi}e^{-\frac{1}{2}\frac{x-\mu}{\sigma}^2} dx$

Where $\mu=np$ and $\sigma^2=np(1-p)$

This can be modelled in R

> p <- 0.51
> n <- 100
> 1-pnorm(q=n/2, mean=n*p,sd=sqrt(n*p*(1-p)))
[1] 0.5792754
> n <- 1000
> 1-pnorm(q=n/2, mean=n*p,sd=sqrt(n*p*(1-p)))
[1] 0.7364967

Showing that with a win probability of 51% 100 trades gives us a 57% probability of breakeven or better and 1000 trades gives us a 73% chance of breakeven or better.

We can plot the probability of breakeven holding p constant and changing n from 1 to 1000:

 n<-seq(1,1000)
> y <- 1-pnorm(q=n/2, mean=n*p,sd=sqrt(n*p*(1-p)))
> library(ggplot2)
> library(scales)
> qplot(n,y)+scale_y_continuous(label=percent)

Which produces the following graph

Which shows the convergence to a sure 100% probability of profit as n gets large.

To make it more interesting we can generate different paths for n from 1 to 10000 but also vary the win probability from say 45% to 51% and look at the paths as we vary n and p:

n <- seq(1,10000)
p<- 0.5
y <- 1-pnorm(q=n/2, mean=n*p,sd=sqrt(n*p*(1-p)))
plot(n, y, type='l', ylim=c(0,1))

probs <- seq(0.45, 0.55, length.out = 100)
for (pr in seq_along(probs)){ 
 p<-probs[pr]
 y<-1-pnorm(q=n/2, mean=n*p,sd=sqrt(n*p*(1-p)))
 lines(x=n,y=y,col=ifelse(y<0.5,rgb(1,0,0,.5),rgb(0,1,0,.5)))
}

Which shows the probabilities of breakeven or better given a number of different starting win/loss probabilities and a varying number of trades. The path with $p=0.5$ is shown in black.

Statistics

The Geometry Of Least Squares

Post author By Rory Winston
Post date December 29, 2013

Here’s a simple and intuitive way of looking at the geometry of a least squares regression:

Take the bottom left point in the triangle below as the origin O. For the linear model:

$$ Y=X\beta + \epsilon $$

Both $Y$ and $X\beta$ are vectors, and the residual vector $\epsilon$ is the difference. The standard least squares error technique uses $\epsilon^2$ or $(Y-X\beta)^T(Y-X\beta)$ as the error measure to be minimised, and this leads to the calculation of the $\beta$ coefficient vector.

Geometrically, the beta coefficients calculated by the least squares regression minimise the squared length of the error vector. This turns out to be the projection of $Y$ on to $X\beta$ – i.e. the perpendicular vector that turns (O, $Y$, $X\beta$) into a right-angled triangle.

The projection of $Y$ onto $X\beta$ is done using the projection matrix P, which is defined as

\[ P = X\left(X^{T}X\right)^{-1}X^{T} \]

So $ X\beta = \hat{Y} = PY $.

Using the Pythagorean theorem:

$ Y^TY = \hat{Y}^T\hat{Y} + (Y-X\beta)^T(Y-X\beta) $

In other words, the total sum of squares = sum of squares due to regression + residual sum of squares. This is a fundamental part of analysis of variance techniques.