# Statistical Arbitrage II: Simple FX Arbitrage Models

In the context of the foreign exchange markets, there are several simple no-arbitrage conditions which, if violated beyond the boundaries imposed by transaction costs, should provide the arbitrageur with a theoretical profit when market conditions converge to normality.

Detection of arbitrage conditions in the FX markets requires access to high-frequency tick data, as arbitrage opportunities are usually short-lived. Various market inefficiency conditions exist in the FX markets; apart from the basic strategies outlined in the following sections, other transient opportunities may exist if the trader or trading system can detect and act on them quickly enough.

Round-Trip Arbitrage
Possibly the best-known no-arbitrage boundary condition in foreign exchange is the covered interest parity condition, which is expressed as:

$(1+r_d) = \frac{1}{S_t}(1+r_f)F_t$

which specifies that it should not be possible to earn a positive return by borrowing domestic assets at $$r_d$$ for lending abroad at $$r_f$$ whilst covering the exchange rate risk through a forward contract $$F_t$$ of equal maturity. Accounting for transaction costs, we have the following no-arbitrage relationships:

$(1+r_d^a) \geq \frac{1}{S^a}(1+r_f^b)F^b$

$(1+r_f^a) \geq S^b(1+r_d^b)\frac{1}{F^a}$

For example, the first condition states that the cost of borrowing domestic currency at the ask rate ($$1+r_d^a$$) should be at least equal to the proceeds of converting that currency into foreign currency at the prevailing spot rate $$S^a$$ (assuming that the spot quote $$S^a$$ represents the cost of a unit of domestic currency in terms of foreign currency), investing it at $$1+r_f^b$$, and finally converting it back into domestic currency via a forward trade at the bid rate ($$F^b$$). If this condition is violated, then we can perform round-trip arbitrage by converting, investing, and re-converting at the end of the investment term. Persistent violations of this condition are the basis for the carry trade, in particular between low-yielding currencies such as the Japanese Yen and higher-yielding currencies such as the New Zealand dollar and the Euro.

Triangular Arbitrage

A reduced form of FX market efficiency is triangular arbitrage, which rests on the geometric relationship between three currency pairs: the synthetic bid (respectively ask) cross rate should equal the product of the bid (ask) rates of the two legs,

$$\left(\frac{C_1}{C_2}\right)_{bid} \left(\frac{C_2}{C_3}\right)_{bid} = \left(\frac{C_1}{C_3}\right)_{bid} \\ \left(\frac{C_1}{C_2}\right)_{ask} \left(\frac{C_2}{C_3}\right)_{ask} = \left(\frac{C_1}{C_3}\right)_{ask}$$

Triangular arbitrage is defined in two forms, forward arbitrage and reverse arbitrage, corresponding to executable deviations on either side of these identities. With two-way high-frequency prices, we can simultaneously test for the presence of forward and reverse arbitrage.
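The cross-rate identities are easy to sketch in R. The quotes below are hypothetical and frictionless; each pair is given as a c(bid, ask) vector.

```r
# Sketch: building a synthetic two-way cross rate from two quoted pairs.
# Hypothetical quotes; each is a c(bid, ask) pair.
usd_jpy <- c(90, 110)    # JPY per USD
gbp_usd <- c(1.5, 1.8)   # USD per GBP

synthetic_cross <- function(p1, p2) {
  # Bid: sell through both legs at their bids; ask: buy at both asks.
  c(bid = p1[[1]] * p2[[1]], ask = p1[[2]] * p2[[2]])
}

synthetic_cross(usd_jpy, gbp_usd)   # JPY per GBP: bid 135, ask 198
```

A live system would compare this synthetic quote against the directly traded cross and flag any cost-adjusted crossing of the two markets.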
A contrived example follows. Suppose we have the following theoretical two-way tradeable prices: $$\left(\frac{USD}{JPY}\right) = 90/110$$, $$\left(\frac{GBP}{USD}\right) = 1.5/1.8$$, and a market cross quote of $$\left(\frac{JPY}{GBP}\right) = 100/120$$. By the principle of triangular arbitrage, the theoretical two-way cross rate for JPY/GBP should be $$135/198$$ (bid $$90 \times 1.5$$, ask $$110 \times 1.8$$). The market will sell us GBP for only 120 JPY against a theoretical value of at least 135 JPY, so JPY is overvalued relative to GBP. We can take advantage of this mispricing as follows, assuming our theoretical equity is 1 USD:

• Pay 1 USD and receive 90 JPY;
• Pay 90 JPY and receive $$\left(\frac{90}{120}\right) = 0.75$$ GBP;
• Pay 0.75 GBP and receive $$0.75 \times 1.5$$ USD = 1.125 USD.

We can see that reverse triangular arbitrage can detect a selling opportunity (i.e. the bid currency is overvalued), whilst forward triangular arbitrage can detect a buying opportunity (the ask currency is undervalued). The list of candidate currencies could be extended, and the arbitrage condition could be elegantly represented by a data structure called a directed graph. This would involve creating an adjacency matrix $$R$$, in which an element $$R_{i,j}$$ contains a measure representing the cost of transferring between currency $$i$$ and currency $$j$$.

Estimating Position Risk

When executing an arbitrage trade, there are some elements of risk. An arbitrage trade will normally involve multiple legs, all of which must be executed simultaneously and at the specified prices for the trade to be successful. As most arbitrage trades capitalize on small mispricings in asset values, and rely on large trading volumes to achieve significant profit, even a minor movement in the execution price can be disastrous. Hence, a trading algorithm should allow for both trading costs and slippage, normally by adding a margin to the profitability ratio. The main risk in holding an FX position is price slippage, and hence the variance of the currency that we are holding.
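The directed-graph idea can be sketched in R with a small adjacency matrix. The quotes below are hypothetical and frictionless; note that taking -log() of each rate turns the search for a profitable cycle into a negative-cycle search, the form consumed by shortest-path algorithms such as Bellman-Ford.

```r
# Sketch: cross rates as a weighted directed graph. rates[i, j] holds the
# (hypothetical, frictionless) number of units of currency j received per
# unit of currency i sold. A closed cycle whose rate product exceeds 1 is
# an arbitrage opportunity.
currencies <- c("USD", "JPY", "GBP")
rates <- matrix(c(    1,    90, 1/1.8,
                  1/110,     1, 1/120,
                    1.5,   100,     1),
                nrow = 3, byrow = TRUE,
                dimnames = list(currencies, currencies))

cycle_product <- function(rates, cycle) {
  # Multiply the conversion rates along the closed cycle of currencies.
  nxt <- c(cycle[-1], cycle[1])
  prod(mapply(function(i, j) rates[i, j], cycle, nxt))
}

cycle_product(rates, c("USD", "JPY", "GBP"))   # 90 * (1/120) * 1.5 = 1.125
```

With these made-up quotes, the USD to JPY to GBP to USD cycle turns 1 USD into 1.125 USD before costs; in practice the profitability threshold would be widened to absorb transaction costs and slippage.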
# Quasi-Random Number Generation in R

Random number generation is a core topic in numerical computer science. There are many efficient algorithms for generating random (strictly speaking, pseudo-random) variates from different probability distributions. The figure below shows a sampling of 1000 two-dimensional random variates from the standard Gaussian and Cauchy distributions, respectively. The size of the extreme deviations of the Cauchy distribution is apparent from the graph.

However, sometimes we need to produce numbers that are more evenly distributed: quasi-random numbers. For example, in a Monte Carlo integration exercise, we can get faster convergence with a lower error bound using so-called low-discrepancy sequences. In the figure below, we show two-dimensional normal, Cauchy, and Sobol (a low-discrepancy generator) variates.

Normal, Cauchy, and Sobol 2-d variates

To generate the graph, I used the gsl library for R, as shown below:

library(gsl)
q <- qrng_alloc(type="sobol", 2)
rs <- qrng_get(q, 1000)
par(mfrow=c(3,1))
plot(rnorm(1000), rnorm(1000), pch=20, main="~N(0,1)", ylab="", xlab="")
plot(rs, pch=20, main="Sobol", ylab="", xlab="")
plot(rcauchy(1000), rcauchy(1000), pch=20, main="~C(0,1)", ylab="", xlab="")

The property of low-discrepancy generators is even more apparent if we view the random variates in a higher dimension; for example, the figure below shows the variates in a 3-dimensional cube. Note how the clustering around the centre of the cube is much more pronounced for the Gaussian cube.
3D Random Variates

To plot the figure above, I used the gsl and lattice libraries:

library(gsl)
library(lattice)
q <- qrng_alloc(type="sobol", 3)
npoints <- 200
rs <- qrng_get(q, npoints)
ltheme <- canonical.theme(color = FALSE)
ltheme$strip.background$col <- "transparent"
lattice.options(default.theme = ltheme)
trellis.par.set(layout.heights = list(top.padding = -20, main.key.padding = 1, key.axis.padding = 0, axis.xlab.padding = 0, xlab.key.padding = 0, key.sub.padding = 0, bottom.padding = -20))
# Plot the normal variates in a 3-dim cube
p1 <- cloud(rnorm(npoints) ~ rnorm(npoints) + rnorm(npoints), xlab="x", ylab="y", zlab="z", pch=20, main="~N(0,1)")
p2 <- cloud(rs[,1] ~ rs[,2] + rs[,3], xlab="x", ylab="y", zlab="z", pch=20, main="Sobol")
print(p1, split=c(1,1,2,1), more=TRUE)
print(p2, split=c(2,1,2,1))

# High-Frequency Statistical Arbitrage

Computational statistical arbitrage systems are now de rigueur, especially for high-frequency, liquid markets (such as FX). Statistical arbitrage can be defined as an extension of riskless arbitrage, and is quantified more precisely as an attempt to exploit small and consistent regularities in asset price dynamics through a suitable framework for statistical modelling. Statistical arbitrage has been defined formally (e.g. by Jarrow) as a zero initial cost, self-financing strategy with cumulative discounted value $$v(t)$$ such that:

• $$v(0) = 0$$;
• $$\lim_{t\to\infty} E^P[v(t)] > 0$$;
• $$\lim_{t\to\infty} P(v(t) < 0) = 0$$;
• $$\lim_{t\to\infty} \frac{Var^P[v(t)]}{t}=0 \mbox{ if } P(v(t)<0) > 0 \mbox{, } \forall{t} < \infty$$.

These conditions can be described as follows: (1) the position has a zero initial cost (it is a self-financing trading strategy), (2) the expected discounted profit is positive in the limit, (3) the probability of a loss converges to zero, and (4) a time-averaged variance measure converges to zero if the probability of a loss does not become zero in finite time.
The fourth condition separates a standard arbitrage from a statistical arbitrage opportunity. We can represent a statistical arbitrage condition as

$$\left| \phi(X_t - SA(X_t))\right| < \mbox{TransactionCost}$$

where $$\phi()$$ is the payoff (profit) function, $$X$$ is an arbitrary asset (or weighted basket of assets), and $$SA(X)$$ is a synthetic asset constructed to replicate the payoff of $$X$$. Some popular statistical arbitrage techniques are described below.

Index Arbitrage

Index arbitrage is a strategy undertaken when the traded value of an index (for example, the index futures price) moves sufficiently far away from the weighted value of the index components (see Hull for details). For an equity index, the no-arbitrage condition could be expressed as:

$\left| F_t - \sum_{i} w_i S_t^i e^{(r-q_i)(T-t)}\right| < \mbox{Cost}$

where $$q_i$$ is the dividend rate for stock i, and $$F_t$$ is the index futures price at time t. The deviation between the futures price and the weighted index basket is called the basis. Index arbitrage was one of the earliest applications of program trading. An alternative form of index arbitrage was a system in which sufficiently large deviations between the forecasted variance of the relationship between index pairs (estimated by regression) and the implied volatilities on the indices (estimated from index option prices) were classed as an arbitrage opportunity. There are many variations on this theme in operation today, based on the VIX market.

Statistical Pairs Trading

Statistical pairs trading is based on the notion of relative pricing: securities with similar characteristics should be priced roughly equally. Typically, a long-short position in two assets is created such that the portfolio is uncorrelated with market returns (i.e. it has a negligible beta). The basis in this case is the spread between the two assets. Depending on whether the trader expects the spread to contract or expand, the trade action is called shorting the spread or buying the spread.
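A minimal sketch of the spread logic follows, with simulated prices and a hand-picked hedge ratio; a real system would estimate the ratio by regression or a cointegration test, and would trade only beyond cost-adjusted bands.

```r
# Sketch of a pairs-trading spread signal on simulated data. The hedge
# ratio beta is assumed known here; in practice it would be estimated.
set.seed(42)
x <- cumsum(rnorm(500)) + 100        # price path of asset one
y <- 0.8 * x + rnorm(500, sd = 0.5)  # asset two, tied to asset one
beta <- 0.8                          # assumed hedge ratio
spread <- y - beta * x               # the basis of the pair
z <- (spread - mean(spread)) / sd(spread)

# Short the spread when it is rich, buy it when it is cheap:
signal <- ifelse(z > 2, -1, ifelse(z < -2, 1, 0))
```

The signal vector takes values -1 (short the spread), +1 (buy the spread), or 0 (stand aside); the 2-sigma entry bands are an arbitrary choice for illustration.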
Such trades are also called convergence trades. A popular and powerful statistical technique used in pairs trading is cointegration: the identification of a linear combination of multiple non-stationary data series that forms a stationary (and hence predictable) series.

Trading Algorithms

In recent years, computer algorithms have become the decision-making machines behind many trading strategies. The ability to deal with large numbers of inputs, utilise long variable histories, and quickly evaluate quantitative conditions to produce a trading signal has made algorithmic trading systems the natural evolutionary step for high-frequency financial applications. Originally, the main focus of algorithmic trading systems was neutral-impact market strategies (e.g. Volume Weighted Average Price and Time Weighted Average Price trading); however, their scope has widened considerably, and much of the work previously performed by manual systematic traders can now be done by "black box" algorithms.

Trading algorithms are no different from human traders in that they need an unambiguous measure of performance, i.e. risk versus return. The ubiquitous Sharpe ratio ($$\frac{\mu_r - \mu_f}{\sigma}$$) is a popular measure, although other measures are also used. A commonly used measure of trading performance is total return, defined as $R_T \equiv \sum_{j=1}^{n}r_j$ over a number of transactions n, with a return per transaction of $$r_j$$. The annualized total return is defined as $$R_A = R_T \frac{d_A}{d_T}$$, where $$d_A$$ is the number of trading days in a year, and $$d_T$$ is the number of days in the trading period covered by $$R_T$$. The maximum drawdown over a certain time period is defined as $$D_T \equiv \max(R_{t_a}-R_{t_b} \mid t_0 \leq t_a \leq t_b \leq t_E)$$, where $$T = t_E - t_0$$, and $$R_{t_a}$$ and $$R_{t_b}$$ are the total returns of the periods from $$t_0$$ to $$t_a$$ and $$t_b$$ respectively.
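These return measures can be sketched directly in R; the per-transaction returns below are made up, and the 252 trading days per year is an assumed convention.

```r
# Total return, annualized total return, and maximum drawdown over a
# vector r of per-transaction returns, following the definitions above.
total_return <- function(r) sum(r)
annualized_return <- function(r, d_T, d_A = 252) {   # d_A = 252 assumed
  total_return(r) * d_A / d_T
}
max_drawdown <- function(r) {
  R <- cumsum(r)          # running total return
  max(cummax(R) - R)      # largest drop from a running peak
}

r <- c(0.02, -0.01, 0.03, -0.04, 0.01)
total_return(r)    # 0.01
max_drawdown(r)    # 0.04: from the peak after trade 3 to the trough after trade 4
```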
A resulting indicator is the Stirling Ratio, which is defined as

$SR = \frac{R_T}{D_T}$

High-frequency tick data possesses certain characteristics which are not as apparent in aggregated data. Some of these characteristics include:

• Non-normal probability distributions: high-frequency returns may have large kurtosis (heavy tails), and be asymmetrically skewed;
• Diurnal seasonality: an intraday seasonal pattern influenced by the restrictions on trading times in markets. For instance, trading activity may be busiest at the start and end of the trading day. This may not apply so much to foreign exchange, as the FX market is a decentralized 24-hour operation; however, we may see patterns in tick interarrival times around business end-of-day times in particular locations;
• Data errors: real-time high-frequency data may contain errors, missing or duplicated tick values, or other anomalies. Whilst historical data feeds will normally contain corrections to data anomalies, real-time data collection processes must be aware that adjustments may need to be made to the incoming feeds.

# The Euler Method In R

The Euler method is a very simple method for the numerical solution of initial-value problems. Although there are much better methods in practice, it is a nice, intuitive mechanism. The objective is to find a solution to the equation $$\frac{dy}{dt} = f(t,y)$$ over a grid of points (equally spaced, in our case).
Euler’s method uses the relation (essentially Taylor’s approximation theorem with the error terms truncated):

$$y(t_{i+1}) = y(t_i) + h f(t_i, y(t_i))$$

In R, we can express this iterative solution as:

euler <- function(dy.dx=function(x,y){}, h=1E-7, y0=1, start=0, end=1) {
  nsteps <- (end-start)/h
  ys <- numeric(nsteps+1)
  ys[1] <- y0
  for (i in 1:nsteps) {
    x <- start + (i-1)*h
    ys[i+1] <- ys[i] + h*dy.dx(x, ys[i])
  }
  ys
}

Note that, given the start and end points and the size of each step, we work out the number of steps; inside the loop, we calculate each successive approximation. An example using the differential equation $$\frac{dy}{dx} = 3x - y + 8$$ is:

dy.dx <- function(x,y) { 3*x - y + 8 }
euler(dy.dx, start=0, end=0.5, h=0.1, y0=3)
[1] 3.00000 3.50000 3.98000 4.44200 4.88780 5.31902

# Newton’s Method In R

Here is a toy example of implementing Newton’s method in R. I found some old code that I had written a few years ago when illustrating the difference between the convergence properties of various root-finding algorithms, and this example shows a couple of nice features of R. Newton’s method is an iterative root-finding algorithm in which we start with an initial guess $$x_0$$ and refine each successive guess using the iteration: $$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$

Here is a simple implementation of this in R:

newton <- function(f, tol=1E-12, x0=1, N=20) {
  h <- 0.001
  i <- 1; x1 <- x0
  p <- numeric(N)
  while (i <= N) {
    df.dx <- (f(x0+h)-f(x0))/h
    x1 <- (x0 - (f(x0)/df.dx))
    p[i] <- x1
    i <- i + 1
    if (abs(x1-x0) < tol) break
    x0 <- x1
  }
  return(p[1:(i-1)])
}

Note a couple of things:

* The step-size for numerical differentiation is hardcoded to 0.001. This is arbitrary, and should probably be a parameter.
* The algorithm runs until either the number of steps N has been reached, or the error tolerance $$\left|x_{n+1}-x_n\right| < \epsilon$$ is met, where $$\epsilon$$ is the tolerance parameter tol.
* The function returns a vector of the iterated x positions, whose length will be at most N.

Here is a simple example: to find the zeros of the function $$f(x) = x^3 + 4x^2 - 10$$, which in R we can represent as:

f <- function(x) { x^3 + 4*x^2 - 10 }

Let’s plot the function over the range [1,2]:

x <- seq(1, 2, 0.1)
plot(x, f(x), type='l', lwd=1.5, main=expression(x^3 + 4*x^2 - 10))
abline(h=0)

It seems obvious from the plot that the zero of f(x) in the range [1,2] is somewhere between x=1.3 and x=1.4. This is made even clearer if we tabulate the x,y values over this range:

> xy <- cbind(x, f(x))
> xy
        x
 [1,] 1.0  -5.000
 [2,] 1.1  -3.829
 [3,] 1.2  -2.512
 [4,] 1.3  -1.043
 [5,] 1.4   0.584
 [6,] 1.5   2.375
 [7,] 1.6   4.336
 [8,] 1.7   6.473
 [9,] 1.8   8.792
[10,] 1.9  11.299
[11,] 2.0  14.000

Using the function defined earlier, we can run the root-finding algorithm:

> p <- newton(f, x0=1, N=10)
> p
[1] 1.454256 1.368917 1.365238 1.365230 1.365230 1.365230 1.365230

This returns 1.365230 as the root of f(x). Plotting the last value in the iteration:

> abline(v=p[length(p)])

In the iteration loop, we use a schoolbook-style (forward difference) numerical derivative: $$\frac{dy}{dx} \approx \frac{f(x+\epsilon)-f(x)}{\epsilon}$$

It’s also worth noting that R has some support for symbolic derivatives, so we could use a symbolic function expression instead. A simple example of symbolic differentiation in R (note that R works with expression instances when using symbolic differentiation):

> e <- expression(sin(x/2)*cos(x/4))
> dydx <- D(e, "x")
> dydx
cos(x/2) * (1/2) * cos(x/4) - sin(x/2) * (sin(x/4) * (1/4))

In order for this to be useful, we obviously need to be able to evaluate the calculated derivative.
We can do this using eval:

> z <- seq(-1, 1, .1)
> eval(dydx, list(x=z))
 [1] 0.3954974 0.4146144 0.4320092 0.4475873 0.4612640 0.4729651 0.4826267
 [8] 0.4901964 0.4956329 0.4989067 0.5000000 0.4989067 0.4956329 0.4901964
[15] 0.4826267 0.4729651 0.4612640 0.4475873 0.4320092 0.4146144 0.3954974

Note that we can bind the x parameter in the expression when calling eval().

# Building QuickFIX on Solaris

I recently had to build the QuickFIX C++ client library on Solaris, and here are a few notes that might be useful when doing the same:

* Make sure you have /usr/ccs/bin in your PATH (so that ar can be found);
* If you are using gmake, you may need to export MAKE=gmake before you run configure;
* To build shared (.so) libraries as well as static libraries, run configure --enable-shared=yes.

Once the libraries are built, you can compile and link using:

$ g++ -I /usr/local/include/quickfix/include/ main.cpp libquickfix.a -o main -lxml2

This will statically compile against quickfix and dynamically link against libxml2.

# The Bias/Variance Tradeoff

Probably one of the nicest explanations of the bias/variance tradeoff is the one I found in the book Introduction to Information Retrieval (full book available online). The tradeoff can be explained mathematically, and also more intuitively. The mathematical explanation is as follows:

if we have a learning method whose prediction we call $x$ (trained on a given set of input data) and a "real" underlying value that we are trying to approximate (call it $\alpha$), then the expected (squared) error is:

\begin{aligned} E[x-\alpha]^2 &= Ex^2 - 2Ex\alpha + \alpha^2 \\ &= (Ex)^2 - 2Ex\alpha + \alpha^2 + Ex^2 - 2(Ex)^2 + (Ex)^2 \\ &= [Ex-\alpha]^2 + Ex^2 - E[2x(Ex)] + E[(Ex)^2] \\ &= [Ex-\alpha]^2 + E[x-Ex]^2 \end{aligned}

Taking advantage of the linearity of expectation and adding a few extra cancelling terms, we end up with the representation:

$$\mbox{Error} = \underbrace{[Ex-\alpha]^2}_{\mbox{bias}} + \underbrace{E[x-Ex]^2}_{\mbox{variance}}$$
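The algebra is easy to verify numerically; here x is a simulated set of predictions scattered around 2.5, while the true value is 2 (all numbers made up for illustration).

```r
# Numerical check of E[(x - a)^2] = (Ex - a)^2 + E[(x - Ex)^2].
# The identity holds exactly for sample moments, up to floating point.
set.seed(1)
a <- 2                                    # the "true" value
x <- rnorm(100000, mean = 2.5, sd = 0.3)  # predictions across training sets

mse      <- mean((x - a)^2)
bias_sq  <- (mean(x) - a)^2               # squared bias
variance <- mean((x - mean(x))^2)         # variance of the predictions

mse - (bias_sq + variance)                # essentially zero
```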

That’s the mathematical equivalence. However, a more descriptive approach is as follows:

Bias is the squared difference between the true underlying distribution and the prediction of the learning process, averaged over our input datasets. Consistently wrong predictions mean large bias; bias is small when the predictions are consistently right, i.e. the average error across different training sets is roughly zero. Bias can represent the domain knowledge that we have built into the learning process: a linear assumption is generally unsuitable for a nonlinear problem, and will thus result in high bias.

Variance is the variation in prediction (its consistency): it is large if different training sets result in different learned models. Linear models will generally have lower variance. High variance generally results in overfitting: in effect, the learning model is learning from noise, and will not generalize well.

It’s a useful analogy to think of most learning models as a box with two dials, bias and variance, where the setting of one affects the other. We can only try to find the "right" setting for the situation we are working with: hence the bias-variance tradeoff.
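The two dials can be illustrated with a small simulation: repeatedly draw a noisy training set from a known function, then compare a nearly constant model with a very flexible one. The target function, noise level, and polynomial degrees below are arbitrary choices for illustration.

```r
# High bias / low variance vs low bias / high variance, on simulated data.
set.seed(7)
truth <- function(x) sin(2 * pi * x)
xs <- seq(0, 1, length.out = 20)
x0 <- data.frame(xs = 0.5)            # the point at which we compare predictions

predict_at <- function(degree) {
  # Draw one noisy training set and predict at x0 with a polynomial fit.
  y <- truth(xs) + rnorm(length(xs), sd = 0.3)
  predict(lm(y ~ poly(xs, degree)), newdata = x0)
}

low  <- replicate(200, predict_at(1))   # stiff model: consistent predictions
high <- replicate(200, predict_at(15))  # flexible model: follows the noise
var(low) < var(high)                    # flexibility buys variance
```

Across the 200 simulated training sets, the degree-1 fit barely moves while the degree-15 fit swings with every new noise draw, which is exactly the dial-turning the analogy describes.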

# Financial Amnesia

The FT has a nice article on financial amnesia, which discusses CFA UK’s desire to incorporate some financial history into its curriculum, ostensibly to implant in budding financial managers enough of a sense of déjà vu that, when they encounter potential disaster scenarios, they may avoid repeating them.

I think this is a great idea; the only problem is that it probably won’t have any meaningful impact beyond providing some fascinating course material for CFA students. The problem with an industry that makes its living from markets essentially operating on two gears, fear and greed, is that despite all of the prior knowledge in the world, we will willingly impose a state of amnesia upon ourselves if we can convince ourselves that what is happening now is different, or just different enough, from whatever disasters happened before. The Internet bubble, tulip mania, and the various property booms and busts that we have seen over the last decade share a common set of core characteristics, but each was different (or "new") enough that we were able to live in a state of denial. The "fear and greed" mentality also means that even if you know you are operating in a bubble that will eventually burst, you carry on regardless, as you plan to be out of the game before it all goes bad (the Greater Fool theory).

Incidentally, if you want to read a beautifully written and entertaining account of financial bubbles by one of the greatest writers on this topic, you should read "A Short History of Financial Euphoria" by J.K. Galbraith. It packs a huge amount of wit and insight into its relatively small page count.

NOTE: Galbraith also wrote the definitive account of the Great Crash of 1929, which rapidly became required reading around three years ago…it’s also excellent. Galbraith has a beautiful turn of phrase.

# The Elements of Modern Javascript Style

In the spirit of the last post, I wanted to share my newfound interest in another language that I previously hated. I’ve always treated JavaScript like an unloved cousin: I hated dealing with it, and my brief encounters with it were really unenjoyable. However, I’ve had to deal with it a bit more recently, and thankfully those earlier encounters are distant memories; the more recent experience has been much better. There are two reasons for this: one is the excellent jQuery framework, and the other is the disparate collection of elegant patterns and best practices that have made JavaScript a reasonably pleasant language to work with.

The single best reference I have found so far that covers all of these is the excellent "JavaScript Web Applications" from O’Reilly, which expands on many facets of modern JavaScript usage. I would definitely recommend checking this book out.

# The Elements of Modern C++ Style

Herb Sutter has a fantastic short article over at his site on the new C++11 standard. The new extensions to the language are way overdue. One particular example that I really liked was the application of library algorithms and lambdas to create “extensions” to the language. For instance, the example below adds a “lock” construct (similar to the one in C#):

// C#
lock( mut_x ) {
... use x ...
}

// C++11 without lambdas: already nice, and more flexible (e.g., can use timeouts, other options)
{
lock_guard<mutex> hold { mut_x };
... use x ...
}

// C++11 with lambdas, and a helper algorithm: C# syntax in C++
// Algorithm: template<typename T, typename F> void lock( T& t, F f ) { lock_guard<T> hold(t); f(); }
lock( mut_x, [&]{
... use x ...
});


Of course, all of this is dependent on compiler support. A couple of nice resources on the current state of C++11 support are:

http://www.aristeia.com/C++11/C++11FeatureAvailability.htm

and

https://wiki.apache.org/stdcxx/C%2B%2B0xCompilerSupport