Density Estimation of High-Frequency Financial Data

Frequently we will want to estimate the empirical probability density function of real-world data and compare it to the theoretical density from one or more probability distributions. The following example shows the empirical and theoretical normal density for EUR/USD high-frequency tick data \(X\) (which has been transformed into log-returns and standardized via \(\frac{X_i-\mu_X}{\sigma_X}\)). The theoretical normal density is plotted over the range \(\left(\lfloor\mathrm{min}(X)\rfloor,\lceil\mathrm{max}(X)\rceil\right)\). The results are in the figure below. The discontinuities and asymmetry of the discrete tick data, as well as the high kurtosis and heavy tails (the standardized returns span roughly \(\left[-8,+7\right]\) standard deviations around the mean), are apparent from the plot.

Empirical and Theoretical Tick Density

We also show the theoretical and empirical density for the EUR/USD exchange rate log returns over different timescales. We can see from these plots that the distribution of the log returns appears to converge towards normality as the sampling interval lengthens. This is a typical empirical property of financial data.

Density Estimate Across Varying Timescales

The following R source generates empirical and theoretical density plots across different timescales. The data is loaded from files that are sampled at different intervals. I can't supply the data unfortunately, but you should get the idea.

[source lang="R"]
# Function that reads Reuters CSV tick data and converts Reuters dates
# Assumes format is Date,Tick
readRTD <- function(filename) {
  tickData <- read.csv(file=filename, header=TRUE, col.names=c("Date","Tick"))
  tickData$Date <- as.POSIXct(strptime(tickData$Date, format="%d/%m/%Y %H:%M:%S"))
  tickData
}

# Boilerplate function for Reuters FX tick data transformation and density plot
plot.reutersFXDensity <- function() {
  # One file per sampling interval, in the same order as the labels below
  filenames <- c("data/eur_usd_tick_26_10_2007.csv"
                 # ... (the remaining file names are elided)
                 )
  labels <- c("Tick", "1 Minute", "5 Minutes", "Hourly", "Daily")

  par(mfrow=c(length(filenames), 2), mar=c(0,0,2,0), cex.main=2)
  tickData <- list()
  i <- 1
  for (filename in filenames) {
    tickData[[i]] <- readRTD(filename)
    # Transform: `$Y = \nabla\log(X_i)$`
    logtick <- diff(log(tickData[[i]]$Tick))
    # Normalize: `$\frac{(Y-\mu_Y)}{\sigma_Y}$`
    logtick <- (logtick-mean(logtick))/sd(logtick)
    # Theoretical density range: `$\left[\lfloor\mathrm{min}(Y)\rfloor,\lceil\mathrm{max}(Y)\rceil\right]$`
    x <- seq(floor(min(logtick)), ceiling(max(logtick)), .01)
    # Left panel: empirical density (solid) against the normal density (dashed)
    plot(density(logtick), xlab="", ylab="", axes=FALSE, main=labels[i])
    lines(x, dnorm(x), lty=2)
    #legend("topleft", legend=c("Empirical","Theoretical"), lty=c(1,2))
    # Right panel: the same comparison on a log scale, which exposes the tails
    plot(density(logtick), log="y", xlab="", ylab="", axes=FALSE, main="Log Scale")
    lines(x, dnorm(x), lty=2)
    i <- i + 1
  }
}
[/source]
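
Since the tick files are not available, the convergence towards normality can be illustrated with simulated data. The following is only a sketch under stated assumptions: the "tick" returns are drawn from a Student-t distribution with 5 degrees of freedom (a stand-in, not the EUR/USD data), and the helper names `aggregateReturns`, `standardize` and `excessKurtosis` are made up for this illustration. Summing log-returns over longer blocks mimics sampling at coarser intervals, and the excess kurtosis should shrink towards zero as the block size grows.

```r
set.seed(42)

# Simulated "tick" log-returns: Student-t with 5 degrees of freedom
# (an assumption standing in for the unavailable EUR/USD data)
tickReturns <- rt(200000, df = 5)

# Aggregate returns over blocks of k ticks by summing log-returns
aggregateReturns <- function(r, k) {
  n <- floor(length(r) / k) * k
  colSums(matrix(r[1:n], nrow = k))
}

# Standardize to zero mean and unit variance
standardize <- function(y) (y - mean(y)) / sd(y)

# Sample excess kurtosis (close to 0 for normally distributed data)
excessKurtosis <- function(y) mean(standardize(y)^4) - 3

scales <- c(1, 5, 50)
kurt <- sapply(scales, function(k) excessKurtosis(aggregateReturns(tickReturns, k)))
names(kurt) <- paste0("k=", scales)
print(round(kurt, 2))

# Overlay empirical and standard normal densities (interactive sessions only)
if (interactive()) {
  par(mfrow = c(1, length(scales)))
  for (k in scales) {
    y <- standardize(aggregateReturns(tickReturns, k))
    x <- seq(floor(min(y)), ceiling(max(y)), .01)
    plot(density(y), main = paste("Block size", k), xlab = "", ylab = "")
    lines(x, dnorm(x), lty = 2)
  }
}
```

The printed kurtosis values are expected to fall as the block size increases, which is the same pattern the density plots show visually.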

Binomial Pricing Trees in R

Binomial Tree Simulation

The binomial model is a discrete grid generation method from \(t=0\) to \(T\). At each step in time (\(t \to t+\Delta t\)) we can move up with probability \(p\) and down with probability \((1-p)\). As the up and down factors remain constant throughout the generation process, an up move followed by a down move reaches the same value as a down move followed by an up move, so we end up with a recombining binary tree, or binomial lattice. Whereas a balanced binary tree with height \(h\) has \(2^{h+1}-1\) nodes, a binomial lattice of height \(h\) has only \(\sum_{i=1}^{h+1}i = \frac{(h+1)(h+2)}{2}\) nodes.

The algorithm to generate a binomial lattice of \(M\) steps (i.e. of height \(M\)) given a starting value \(S_0\), an up movement \(u\), and down movement \(d\), is:

FOR i=1 to M
  FOR j=0 to i
    S(j,i) = S(0)*u^j*d^(i-j)

We can write this function in R and generate a graph of the lattice. A simple lattice generation function is below:

[source lang="R"]
# Generate a binomial lattice
# for a given up, down, start value and number of steps
genlattice <- function(X0=100, u=1.1, d=.75, N=5) {
  X <- c()
  X[1] <- X0
  count <- 2

  # Step i has i+1 nodes; node j has seen j up moves and i-j down moves
  for (i in 1:N) {
    for (j in 0:i) {
      X[count] <- X0 * u^j * d^(i-j)
      count <- count + 1
    }
  }
  return(X)
}
[/source]

We can generate a sample lattice of 5 steps using symmetric up-and-down values:

[source lang="R"]
> genlattice(N=5, u=1.1, d=.9)
[1] 100.000 90.000 110.000 81.000 99.000 121.000 72.900 89.100 108.900 133.100 65.610
[12] 80.190 98.010 119.790 146.410 59.049 72.171 88.209 107.811 131.769 161.051
[/source]

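In this case, the output is a vector that stores the lattice level by level, starting from the root. Within each level the values run from the lowest state (all down moves) to the highest state (all up moves).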
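
Because the lattice is stored as a flat vector, it can help to have a way of locating a given node. The sketch below uses two hypothetical helpers (`latticeIndex` and `latticeValue` are not part of the original code) and assumes the level-by-level ordering produced by the loop above, where node \((i, j)\) is the state after \(i\) steps with \(j\) up moves.

```r
# Position of node (i, j) in the level-by-level vector:
# steps 0..(i-1) occupy i*(i+1)/2 slots, and j further values precede the node
latticeIndex <- function(i, j) {
  stopifnot(j >= 0, j <= i)
  i * (i + 1) / 2 + j + 1
}

# Direct formula for the value at node (i, j)
latticeValue <- function(i, j, X0 = 100, u = 1.1, d = 0.9) {
  X0 * u^j * d^(i - j)
}

latticeIndex(2, 1)   # position 5 in the vector
latticeValue(2, 1)   # 100 * 1.1 * 0.9 = 99
latticeIndex(5, 5)   # position 21, the last node of a 5-step lattice
```

Position 5 of the sample output above is indeed 99.000, and the index of the last node equals the total node count \(\frac{(N+1)(N+2)}{2}\).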

We can nicely graph a binomial lattice given a tool like graphviz, and we can easily create an R function to generate a graph specification that we can feed into graphviz:

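[source lang="R"]
dotlattice <- function(S, labels=FALSE) {
  shape <- ifelse(labels == TRUE, "plaintext", "point")

  cat("digraph G {", "\n", sep="")
  cat("node[shape=",shape,", samehead, sametail];","\n", sep="")

  # Create a dot node for each element in the lattice
  for (i in 1:length(S)) {
    cat("node", i, "[label=\"", S[i], "\"];", "\n", sep="")
  }

  # The number of levels in a binomial lattice of length N
  # is `$\frac{\sqrt{8N+1}-1}{2}$`
  # (the final level has no outgoing edges, hence the -1)
  L <- ((sqrt(8*length(S)+1)-1)/2 - 1)

  k <- 1
  for (i in 1:L) {
    tabs <- rep("\t",i-1)
    j <- i
    while(j>0) {
      # Link node k to its two successors on the next level
      # (down move first, then up move)
      cat("node", k, "->", "node", (k+i), ";\n", sep="")
      cat("node", k, "->", "node", (k+i+1), ";\n", sep="")
      k <- k + 1
      j <- j - 1
    }
  }

  cat("}", sep="")
}
[/source]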

This will simply output a dot script to the screen. We can capture this script and save it to a file by invoking:

[source lang="R"]
> x<-capture.output(dotlattice(genlattice(N=8, u=1.1, d=0.9)))
> cat(x, file="/tmp/")
[/source]

We can then invoke dot from the command-line on the generated file:

[source lang="bash"]
$ dot -Tpng -o lattice.png -v
[/source]

The resulting graph looks like the following:

Binomial Lattice (no labels)

If we want to add labels to the lattice vertices, we can pass labels=TRUE:

[source lang="R"]
> x<-capture.output(dotlattice(genlattice(N=8, u=1.1, d=0.9), labels=TRUE))
> cat(x, file="/tmp/")
[/source]

Binomial Lattice (labels)

Statistical Arbitrage II : Simple FX Arbitrage Models

In the context of the foreign exchange markets, there are several simple no-arbitrage conditions, which, if violated outside of the boundary conditions imposed by transaction costs, should provide the arbitrageur with a theoretical profit when market conditions converge to theoretical normality.

Detection of arbitrage conditions in the FX markets requires access to high-frequency tick data, as arbitrage opportunities are usually short-lived. Various market inefficiency conditions exist in the FX markets. Apart from the basic strategies outlined in the following sections, other transient opportunities may exist, if the trader or trading system can detect and act on them quickly enough.

Round-Trip Arbitrage
Possibly the most well-known no-arbitrage boundary condition in foreign exchange is the covered interest parity condition. The covered interest parity condition is expressed as:

\[ (1+r_d) = \frac{1}{S_t}(1+r_f)F_t \]

which specifies that it should not be possible to earn a positive return by borrowing domestic assets at \(r_d\) for lending abroad at \(r_f\) whilst covering the exchange rate risk through a forward contract \(F_t\) of equal maturity.

Accounting for transaction costs, we have the following no-arbitrage relationships:

\[ (1+r_d^a) \geq \frac{1}{S^a}(1+r_f^b)F^b \]
\[ (1+r_f^a) \geq S^b(1+r_d^b)\frac{1}{F^a} \]

For example, the first condition states that the cost of borrowing domestic currency at the ask rate (\(1+r_d^a\)) should be at least equal to the proceeds of converting said currency into foreign currency (\(\frac{1}{S^a}\)) at the prevailing spot ask rate \(S^a\) (assuming that the spot quote \(S^a\) represents the cost of a unit of foreign currency in terms of domestic currency), investing it at \(1+r_f^b\), and finally converting back into domestic currency via a forward trade at the bid rate (\(F^b\)). If this condition is violated, then we can perform round-trip arbitrage by converting, investing, and re-converting at the end of the investment term. Persistent deviations from the related uncovered interest parity condition, in which the exchange rate risk is left unhedged, are the basis for the roaring carry trade, in particular between low-yielding currencies such as the Japanese Yen and higher-yielding currencies such as the New Zealand dollar and the Euro.
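
The two bounds can be checked mechanically. The following is a minimal sketch rather than a production check: `roundTripProfit` is a hypothetical helper, the quotes are invented for illustration, spot and forward rates are assumed to be quoted as domestic currency per unit of foreign currency, and it is assumed that borrowing is always charged at the ask rate while lending earns the bid rate.

```r
# Round-trip profit per unit of borrowed currency; a positive value signals
# a violation of the corresponding no-arbitrage bound.
# S and F are quoted as domestic currency per unit of foreign currency.
roundTripProfit <- function(Sb, Sa, Fb, Fa, rdb, rda, rfb, rfa) {
  c(
    # Borrow domestic, buy foreign spot, lend foreign, sell foreign forward
    borrowDomestic = (1 / Sa) * (1 + rfb) * Fb - (1 + rda),
    # Borrow foreign, sell foreign spot, lend domestic, buy foreign forward
    borrowForeign  = Sb * (1 + rdb) / Fa - (1 + rfa)
  )
}

# Hypothetical one-period quotes (not market data)
profit <- roundTripProfit(Sb = 1.2500, Sa = 1.2502,
                          Fb = 1.2620, Fa = 1.2623,
                          rdb = 0.030, rda = 0.031,
                          rfb = 0.020, rfa = 0.021)
print(profit)
print(profit > 0)   # TRUE would indicate a round-trip opportunity
```

With these quotes both entries are negative, so neither round trip covers its funding cost. Raising the forward bid `Fb` far enough would turn the first entry positive.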

Triangular Arbitrage
A reduced form of FX market efficiency is that of triangular arbitrage, which is the geometric relationship between three currency pairs. Triangular arbitrage is defined in two forms, forward arbitrage and reverse arbitrage. These relationships are defined below.

$$ \begin{aligned}
\left(\frac{C_1}{C_2}\right)_{ask} \left(\frac{C_2}{C_3}\right)_{ask} &= \left(\frac{C_1}{C_3}\right)_{bid} \\
\left(\frac{C_1}{C_2}\right)_{bid} \left(\frac{C_2}{C_3}\right)_{bid} &= \left(\frac{C_1}{C_3}\right)_{ask}
\end{aligned} $$

With two-way high-frequency prices, we can simultaneously calculate the presence of forward and reverse arbitrage.

A contrived example follows. Suppose we have the following theoretical two-way tradeable prices: \(\left(\frac{USD}{JPY}\right) = 90/110\) (JPY per USD), \(\left(\frac{GBP}{USD}\right) = 1.5/1.8\) (USD per GBP), and \(\left(\frac{JPY}{GBP}\right) = 100/102\) (JPY per GBP). By the principle of triangular arbitrage, the first two quotes imply a two-way spot rate for JPY/GBP of \(135/198\) (that is, \(90 \times 1.5\) and \(110 \times 1.8\)). The quoted ask of 102 lies below the implied bid of 135, so JPY is overvalued relative to GBP in the direct market. We can take advantage of this inequality as follows, assuming our theoretical equity is 1 USD:

  • Pay 1 USD and receive 90 JPY;
  • Sell 90 JPY at the quoted ask and receive \(\left(\frac{90}{102}\right)\) GBP;
  • Pay \(\left(\frac{90}{102}\right)\) GBP, and receive \(\left(\frac{90}{102}\right) \times \) 1.5 USD \(\approx\) 1.32 USD.

We can see that reverse triangular arbitrage can detect a selling opportunity (i.e. the bid currency is overvalued), whilst forward triangular arbitrage can detect a buying opportunity (the ask currency is undervalued).

The list of candidate currencies could be extended, and the arbitrage condition could be elegantly represented by a data structure called a directed graph. This would involve creating an adjacency matrix \(R\), in which an element \(R_{i,j}\) contains a measure representing the cost of transferring between currency \(i\) and currency \(j\).
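
As a rough sketch of the matrix representation, the code below stores hypothetical conversion rates in a matrix and searches every three-currency cycle for a round trip that returns more than it started with. The rates are invented for illustration, `findTriangles` is a made-up helper name, and each entry is assumed to already be the tradeable rate for that direction (the bid when selling the base currency, the reciprocal of the ask when buying it).

```r
# Hypothetical tradeable rates: R[i, j] is the amount of currency j
# received for one unit of currency i
ccy <- c("USD", "JPY", "GBP")
R <- matrix(c(1,       90,  1 / 1.8,
              1 / 110, 1,   1 / 102,
              1.5,     100, 1),
            nrow = 3, byrow = TRUE, dimnames = list(ccy, ccy))

# Brute-force search over all ordered triples of distinct currencies
findTriangles <- function(R) {
  n <- nrow(R)
  out <- list()
  for (i in 1:n) for (j in 1:n) for (k in 1:n) {
    if (length(unique(c(i, j, k))) == 3) {
      gain <- R[i, j] * R[j, k] * R[k, i]   # value of 1 unit after the round trip
      if (gain > 1) {
        out[[length(out) + 1]] <- data.frame(
          path = paste(rownames(R)[c(i, j, k, i)], collapse = "->"),
          gain = gain)
      }
    }
  }
  if (length(out) == 0) {
    return(data.frame(path = character(0), gain = numeric(0)))
  }
  do.call(rbind, out)
}

triangles <- findTriangles(R)
print(triangles)
```

The same profitable cycle is reported once per starting currency. For a larger set of currencies, a common alternative is to weight each edge by the negative logarithm of the rate, so that an arbitrage corresponds to a negative-weight cycle in the graph.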

Triangular Graph

Estimating Position Risk
When executing an arbitrage trade, there are some elements of risk. An arbitrage trade will normally involve multiple legs which must be executed simultaneously and at the specified price in order for the trade to be successful. As most arbitrage trades capitalize on small mispricings in asset values, and rely on large trading volumes to achieve significant profit, even a minor movement in the execution price can be disastrous. Hence, a trading algorithm should allow for both trading costs and slippage, normally by adding a margin to the profitability ratio. The main risk in holding an FX position is related to price slippage, and hence, the variance of the currency that we are holding.

Quasi-Random Number Generation in R

Random number generation is a core topic in numerical computer science. There are many efficient
algorithms for generating random (strictly speaking, pseudo-random) variates
from different probability distributions. The figure below shows a
sampling of 1000 two-dimensional random variates from the standard Gaussian and
Cauchy distributions, respectively. The size of the extreme deviations of the
Cauchy distribution is apparent from the graph.

Gaussian and Cauchy Random Variates

However, sometimes we need to produce numbers that are more evenly distributed (quasi-random numbers). For example, in a Monte Carlo integration exercise, we can get faster convergence with a lower error bound using so-called low-discrepancy sequences, such as those provided by the GSL library. In the figure below, we show two-dimensional normal, Sobol (a low-discrepancy generator), and Cauchy variates.

Normal, Cauchy, and Sobol 2-d variates

To generate the figure above, I used the GSL library for R, as shown below:

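[source lang="R"]
library(gsl)

# Allocate a two-dimensional Sobol generator and draw 1000 points
q <- qrng_alloc(type="sobol", 2)
rs <- qrng_get(q,1000)

# Three panels side by side
par(mfrow=c(1,3))
plot(rnorm(1000), rnorm(1000), pch=20, main="~N(0,1)",
     ylab="", xlab="")
plot(rs, pch=20, main="Sobol",
     ylab="", xlab="")
plot(rcauchy(1000), rcauchy(1000),pch=20,
     main="~C(0,1)", ylab="",xlab="")
[/source]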
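
To make the convergence claim concrete without depending on GSL, here is a small base-R sketch. Note that it swaps in a Halton sequence (built from the van der Corput radical inverse) instead of Sobol, purely because it is short enough to write by hand. The helper names are hypothetical, and the integrand, the area of a quarter circle (\(\pi/4\)), is just a convenient test case.

```r
# Radical-inverse (van der Corput) sequence in a given base
vanDerCorput <- function(n, base = 2) {
  sapply(seq_len(n), function(i) {
    x <- 0
    f <- 1 / base
    while (i > 0) {
      x <- x + f * (i %% base)
      i <- i %/% base
      f <- f / base
    }
    x
  })
}

# Two-dimensional Halton points: one coordinate per coprime base
halton2d <- function(n) cbind(vanDerCorput(n, 2), vanDerCorput(n, 3))

# Estimate pi/4 as the fraction of points inside the unit quarter-circle
quarterCircle <- function(p) mean(p[, 1]^2 + p[, 2]^2 <= 1)

set.seed(1)
n <- 4096
errPseudo <- abs(quarterCircle(cbind(runif(n), runif(n))) - pi / 4)
errQuasi  <- abs(quarterCircle(halton2d(n)) - pi / 4)
print(c(pseudo = errPseudo, quasi = errQuasi))
```

For a fixed number of points the quasi-random estimate is typically closer to \(\pi/4\), although a single pseudo-random run can occasionally get lucky.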

The property of low-discrepancy generators is even more apparent if we view the random variates in a higher dimension. For example, the figure below shows the variates as a 3-dimensional cube. Note how the clustering around the centre of the cube is much more pronounced for the Gaussian cube.

3D Random Variates

To plot the figure above, I used the GSL and Lattice libraries:

[source lang="R"]
library(gsl)
library(lattice)

# Draw 200 points from a three-dimensional Sobol generator
q <- qrng_alloc(type="sobol", 3)
npoints <- 200
rs <- qrng_get(q,npoints)

# Black-and-white theme with minimal padding around the panels
ltheme <- canonical.theme(color = FALSE)
ltheme$strip.background$col <- "transparent"
lattice.options(default.theme = ltheme)
trellis.par.set(layout.heights =
                list(top.padding = -20,
                     main.key.padding = 1,
                     key.axis.padding = 0,
                     axis.xlab.padding = 0,
                     xlab.key.padding = 0,
                     key.sub.padding = 0,
                     bottom.padding = -20))

# Plot the normal variates in a 3-dim cube
p1 <- cloud(rnorm(npoints) ~ rnorm(npoints) + rnorm(npoints), xlab="x", ylab="y",
            zlab="z", pch=20, main="~N(0,1)")
# Plot the Sobol variates in a 3-dim cube
p2 <- cloud(rs[,1] ~ rs[,2] + rs[,3], xlab="x", ylab="y",
            zlab="z", pch=20, main="Sobol")
print(p1, split=c(1,1,2,1), more=TRUE)
print(p2, split=c(2,1,2,1))
[/source]

High-Frequency Statistical Arbitrage

Computational statistical arbitrage systems are now de rigueur, especially for high-frequency, liquid markets (such as FX).
Statistical arbitrage can be defined as an extension of riskless arbitrage,
and is quantified more precisely as an attempt to exploit small and consistent regularities in
asset price dynamics through use of a suitable framework for statistical modelling.

Statistical arbitrage has been defined formally (e.g. by Jarrow) as a zero initial cost, self-financing strategy with cumulative discounted value \(v(t)\) such that:

  • \( v(0) = 0 \),
  • \( \lim_{t\to\infty} E^P[v(t)] > 0 \),
  • \( \lim_{t\to\infty} P(v(t) < 0) = 0 \),
  • \( \lim_{t\to\infty} \frac{Var^P[v(t)]}{t}=0 \mbox{ if } P(v(t)<0) > 0 \mbox{ , } \forall{t} < \infty \)

These conditions can be described as follows: (1) the position has a zero initial cost (it is a self-financing trading strategy), (2) the expected discounted profit is positive in the limit, (3) the probability of a loss converges to zero, and (4) a time-averaged variance measure converges to zero if the probability of a loss does not become zero in finite time. The fourth condition separates a standard arbitrage from a statistical arbitrage opportunity.

We can represent the corresponding no-arbitrage condition as

$$ \left| \phi(X_t - SA(X_t))\right| < \mbox{TransactionCost} $$

where \(\phi()\) is the payoff (profit) function, \(X\) is an arbitrary asset (or weighted basket of assets) and \(SA(X)\) is a synthetic asset constructed to replicate the payoff of \(X\). A statistical arbitrage opportunity arises when the payoff exceeds the transaction cost. Some popular statistical arbitrage techniques are described below.

Index Arbitrage

Index arbitrage is a strategy undertaken when the traded value of an index (for example, the index futures price) moves sufficiently far away from the weighted components of the index (see Hull for details). For example, for an equity index, the no-arbitrage condition could be expressed as:

\[ \left| F_t - \sum_{i} w_i S_t^i e^{(r-q_i)(T-t)}\right| < \mbox{Cost}\]

where \(w_i\) is the index weight and \(S_t^i\) the spot price of stock \(i\), \(r\) is the risk-free rate, \(q_i\) is the dividend rate for stock \(i\), \(T\) is the futures expiry, and \(F_t\) is the index futures price at time \(t\). The deviation between the futures price and the weighted index basket is called the basis. Index arbitrage was one of the earliest applications of program trading.

An alternative form of index arbitrage was a system where sufficient deviations in the forecasted variance of the relationship (estimated by regression) between index pairs and the implied volatilities (estimated from index option prices) on the indices were classed as an arbitrage opportunity. There are many variations on this theme in operation based on the VIX market today.

Statistical pairs trading is based on the notion of relative pricing: securities with similar characteristics should be priced roughly equally. Typically, a long-short position in two assets is created such that the portfolio is uncorrelated to market returns (i.e. it has a negligible beta). The basis in this case is the spread between the two assets. Depending on whether the trader expects the spread to contract or expand, the trade action is called shorting the spread or buying the spread. Such trades are also called convergence trades.

A popular and powerful statistical technique used in pairs trading is cointegration, which is the identification of a linear combination of multiple non-stationary data series to form a stationary (and hence predictable) series.
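
A minimal sketch of the idea follows, using simulated prices rather than market data. It assumes a simple two-step approach (regress one price on the other, then inspect the residual spread), and the series, the hedge ratio of 2 and the AR(1) noise are all invented for the illustration. The lag-1 autocorrelation is only a crude mean-reversion check. A formal test would use something like an augmented Dickey-Fuller statistic.

```r
set.seed(123)
n <- 2000

# Common stochastic trend (random walk) shared by both assets
trend <- cumsum(rnorm(n))

# Two non-stationary price series: y tracks 2 * x plus stationary AR(1) noise
x <- 50 + trend + rnorm(n, sd = 0.5)
y <- 10 + 2 * x + as.numeric(arima.sim(list(ar = 0.5), n = n))

# Step 1: estimate the hedge ratio by ordinary least squares
fit <- lm(y ~ x)
beta <- unname(coef(fit)["x"])

# Step 2: the regression residual is the candidate stationary spread
spread <- residuals(fit)

# Step 3: crude mean-reversion check via the lag-1 autocorrelation
rho <- acf(spread, lag.max = 1, plot = FALSE)$acf[2]

# Standardized spread, usable as a simple entry/exit signal
z <- (spread - mean(spread)) / sd(spread)
print(c(beta = beta, rho = rho))
```

A lag-1 autocorrelation well below one suggests that the spread reverts to its mean, whereas each price series on its own behaves like a random walk.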

Trading Algorithms

In recent years, computer algorithms have become the decision-making machines behind many trading strategies. The ability to deal with large numbers of inputs, utilise long variable histories, and quickly evaluate quantitative conditions to produce a trading signal has made algorithmic trading systems the natural evolutionary step in high-frequency financial applications. Originally the main focus of algorithmic trading systems was on market-impact-neutral execution strategies (e.g. Volume Weighted Average Price and Time Weighted Average Price trading). However, their scope has widened considerably, and much of the work previously performed by manual systematic traders can now be done by “black box” algorithms.

Trading algorithms are no different from human traders in that they need an unambiguous measure of performance – i.e. risk versus return. The ubiquitous Sharpe ratio (\(\frac{\mu_r - \mu_f}{\sigma}\)) is a popular measure, although other measures are also used. A measure of trading performance that is commonly used is that of total return, which is defined as

\[ R_T \equiv \sum_{j=1}^{n}r_j \]

over a number of transactions \(n\), and a return per transaction \(r_j\). The annualized total return is defined as \(R_A = R_T \frac{d_A}{d_T}\), where \(d_A\) is the number of trading days in a year, and \(d_T\) is the number of days in the trading period specified by \(R_T\). The maximum drawdown over a certain time period is defined as \(D_T \equiv \max(R_{t_a}-R_{t_b}|t_0 \leq t_a \leq t_b \leq t_E)\), where \(T = t_E - t_0\), and \(R_{t_a}\) and \(R_{t_b}\) are the total returns of the periods from \(t_0\) to \(t_a\) and \(t_b\) respectively. A resulting indicator is the Stirling Ratio, which is defined as

\[ SR = \frac{R_T}{D_T} \]
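
These measures are straightforward to compute from a vector of per-transaction returns. The sketch below uses a made-up return series, an assumed 252 trading days per year, an assumed 21-day trading period, and a zero risk-free return for the Sharpe ratio. None of these values come from real trading results.

```r
# Hypothetical per-transaction returns
r <- c(0.02, -0.01, 0.03, -0.04, 0.01, 0.02, -0.02, 0.05)

# Total return R_T: sum of the per-transaction returns
totalReturn <- sum(r)

# Annualized total return: R_A = R_T * d_A / d_T
dA <- 252   # assumed trading days per year
dT <- 21    # assumed length of the trading period in days
annualized <- totalReturn * dA / dT

# Maximum drawdown D_T: largest fall from a running peak of cumulative return
cumReturn <- c(0, cumsum(r))
maxDrawdown <- max(cummax(cumReturn) - cumReturn)

# Stirling ratio: SR = R_T / D_T
stirling <- totalReturn / maxDrawdown

# Sharpe ratio per transaction, assuming a zero risk-free return
sharpe <- (mean(r) - 0) / sd(r)

print(c(totalReturn = totalReturn, annualized = annualized,
        maxDrawdown = maxDrawdown, stirling = stirling, sharpe = sharpe))
```

For this series the total return is 0.06 and the maximum drawdown is 0.04 (the fall from a cumulative 0.04 back to 0), giving a Stirling ratio of 1.5.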

High-frequency tick data possesses certain characteristics which are not as apparent in aggregated data. Some of these characteristics include:

  • Non-normal characteristic probability distributions. High-frequency data may have large kurtosis (heavy tails), and be asymmetrically skewed;
  • Diurnal seasonality – an intraday seasonal pattern influenced by the restrictions on trading times in markets. For instance, trading activity may be busiest at the start and end of the trading day. This may not apply so much to foreign exchange, as the FX market is a decentralized 24-hour operation, however, we may see trend patterns in tick interarrival times around business end-of-day times in particular locations;
  • Real-time high frequency data may contain errors, missing or duplicated tick values, or other anomalies. Whilst historical data feeds will normally contain corrections to data anomalies, real-time data collection processes must be aware of the fact that adjustments may need to be made to the incoming data feeds.
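
As an illustration of the last point, the sketch below applies a very simple cleaning pass to a small hypothetical feed. The `cleanTicks` helper, the 1% threshold and the sample ticks are all assumptions made for this example. A real-time filter would need a rolling reference price rather than the sample median used here.

```r
# Hypothetical raw feed: one duplicated tick, one missing value, one bad print
ticks <- data.frame(
  Date = as.POSIXct("2007-10-26 09:00:00", tz = "UTC") + c(0, 1, 1, 2, 3, 4, 5),
  Tick = c(1.4380, 1.4381, 1.4381, NA, 1.4382, 14.383, 1.4383)
)

cleanTicks <- function(x, maxJump = 0.01) {
  x <- x[!is.na(x$Tick), ]     # drop missing values
  x <- x[!duplicated(x), ]     # drop exact duplicates (same time and price)
  # Drop ticks deviating from the sample median by more than maxJump
  # (relative); a deliberately crude outlier rule
  med <- median(x$Tick)
  x[abs(x$Tick / med - 1) <= maxJump, ]
}

cleaned <- cleanTicks(ticks)
print(cleaned)
```

Four of the seven ticks survive: the missing value, the repeated tick and the misplaced decimal point (14.383) are removed.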