Archive for July, 2008

Classic Bit Hackery In Java

Thursday, July 10th, 2008

In the next release of Commons::Net, there is going to be a utility class for calculating various metrics around Classless Interdomain Routing (CIDR) rules and classless IP addresses – the sort of thing you can find in many online Perl-based address calculators. Anyways, the interesting thing about the address rules is that they are all basically forms of bit manipulation. A 32-bit IPv4 address can be decomposed into a network portion (n bits) and a host portion (32-n bits).

/* Create a binary netmask from the number of bits specification /x */
int cidrPart = rangeCheck(Integer.parseInt(matcher.group(5)), 0, NBITS-1);
for (int j = 0; j < cidrPart; ++j) {
netmask |= (1 << 31-j);
}

In the example code above, a 32-bit integer is initialized with a number of 1′s corresponding to the value of the cidrPart variable. for instance, if cidrPart = 8, the netmask integer would look like 11111111000000000000000000000000, with the most significant eight bits set. This allows us to take a 32-bit IP address and easily figure out various attributes. For instance, the line

network = (address & netmask);

strips out the host portion and just leaves us with the network portion of the address.

Similarly, if we know that a broadcast address is in the form nnnnnn11…..1, i.e. n host bits and 32-n 1s, then we can create the broadcast address using the netmask with the line

broadcast = network | ~(netmask);

i.e. invert the netmask and OR with the supplied address.

We can also iterate over valid address ranges in CIDR ranges easily. For instance, we can calculate the lowest and highest addresses easily:

private int low() { return network() + 1; }
private int high() { return broadcast() - 1; }

And thus iterate over each address in the range:

public String[] getAllAddresses() {
String[] addresses = new String[getAddressCount()];
for (int add = low(), j=0; add <= high(); ++add, ++j) {
addresses[j] = format(toArray(add));
}
return addresses;
}
}

But the clever part is the bit counting technique which is used to count the number of 1 bits in the netmask integer (the population). The obvious way is to count up or down from 0 to tex:2^{31}, shifting the target integer at each stage, and counting the set bits using AND. However, in the book Hacker’s Delight, there are tons of clever recipes for situations like this. The one that I ended up using is the following (almost identical to the C version, apart from the use of >>> for unsigned shifts):

/*
* Count the number of 1-bits in a 32-bit integer using a divide-and-conquer strategy
* see Hacker's Delight section 5.1
*/
int pop(int x) {
x = x - ((x >>> 1) & 0x55555555);
x = (x & 0x33333333) + ((x >>> 2) & 0x33333333);
x = (x + (x >>> 4)) & 0x0F0F0F0F;
x = x + (x >>> 8);
x = x + (x >>> 16);
return x & 0x0000003F;
}

This extremely elegant piece of code is a divide-and-conquer procedure for counting 1 bits in a 32-bit integer. At each step, it counts the number of 1s in each adjacent tex:2^n-bit group, where n is increased from 0 to tex:\log_2{(n-1)}, where n is the number of bits in the integer (32 in this case). At each stage, the number of bits in each n-bit group are summed, until finally, we are left with the count of the number of 1 bits, which can be represented in 6 bits (32 = 0xb100000).

It’s easier to visualize what’s going on if you can see how the magic hex constants relate to the structure of a 32-bit integer. For instance, using bc:

/cygdrive/c/sandbox>bc
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.

ibase=16
obase=2
55555555
1010101010101010101010101010101
33333333
110011001100110011001100110011
0F0F0F0F
1111000011110000111100001111

If you want to see the java code involved, it is here: SubnetUtils.java. There is also a very nice explanation of the bit counting routine (in assembler) here.

JDBC Thin Clients and Oracle RAC

Tuesday, July 8th, 2008

If you are wondering why your jdbc thin client connections to an Oracle Real Application Cluster instance are failing to connect, one possible reason is that you may not be using the correct connection syntax. An Oracle RAC instance can be identified by running tnsping and looking at the host list and the FAILOVER setting:

/cygdrive/h>tnsping MYDB

TNS Ping Utility for 32-bit Windows: Version 10.2.0.1.0 - Production on 08-JUL-2008 17:02:15

Copyright (c) 1997, 2005, Oracle. All rights reserved.

Used parameter files:
c:\lib\oracle-10.2\network\admin\sqlnet.ora

Used LDAP adapter to resolve the alias
Attempting to contact (DESCRIPTION = (SDU = 8192) (TDU = 8192) (ADDRESS_LIST = (ADDRESS = PROTOCOL = TCP)(HOST = host1)(PORT = 1621))(ADDRESS = (PROTOCOL = TCP)(HOST = host2)(PORT = 1621)) (LOAD_BALANCE = on) (FAILOVER = on ) ) (CONNECT_DATA = (SERVICE_NAME = MYDB) (FAILOVER_MODE = (TYPE = session) (METHOD = basic) (RETRIES = 20) (DELAY = 5) ) ) ) OK (0 msec)

The standard JDBC thin URL syntax doesnt seem to work correctly here:

jdbc:oracle:thin:@host1:1621:MYDB

Seems to fail (at least for me), but the alternative unwieldy syntax:

jdbc:oracle:thin:@(DESCRIPTION=(LOAD_BALANCE=on)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=host1)(PORT=1621))(ADDRESS=(PROTOCOL=TCP)(HOST=host2)(PORT=1621)))(CONNECT_DATA=(SERVICE_NAME=MYDB)))

Works fine.

The NTLM and Java Bible

Friday, July 4th, 2008

Not so much a bible really, but a series of posts showing how to perform NTLM authentication using various techniques in Java (and C, via the Neon library):

http://www.theresearchkitchen.com/blog/archives/25
http://www.theresearchkitchen.com/blog/archives/24
http://www.theresearchkitchen.com/blog/archives/23
http://www.theresearchkitchen.com/blog/archives/39
http://www.theresearchkitchen.com/blog/archives/38
http://www.theresearchkitchen.com/blog/archives/37

.vimrc

Wednesday, July 2nd, 2008

Putting my .vimrc here, to save me having to recreate my key shortcuts again:

set nocompatible
source $VIMRUNTIME/vimrc_example.vim
source $VIMRUNTIME/mswin.vim
behave mswin

set diffexpr=MyDiff()
function MyDiff()
  let opt = '-a --binary '
  if &diffopt =~ 'icase' | let opt = opt . '-i ' | endif
  if &diffopt =~ 'iwhite' | let opt = opt . '-b ' | endif
  let arg1 = v:fname_in
  if arg1 =~ ' ' | let arg1 = '"' . arg1 . '"' | endif
  let arg2 = v:fname_new
  if arg2 =~ ' ' | let arg2 = '"' . arg2 . '"' | endif
  let arg3 = v:fname_out
  if arg3 =~ ' ' | let arg3 = '"' . arg3 . '"' | endif
  let eq = ''
  if $VIMRUNTIME =~ ' '
    if &sh =~ '\<cmd'
      let cmd = '""' . $VIMRUNTIME . '\diff"'
      let eq = '"'
    else
      let cmd = substitute($VIMRUNTIME, ' ', '" ', '') . '\diff"'
    endif
  else
    let cmd = $VIMRUNTIME . '\diff'
  endif
  silent execute '!' . cmd . ' ' . opt . arg1 . ' ' . arg2 . ' > ' . arg3 . eq
endfunction

au GUIEnter * simalt ~x
set guifont=Lucida\ Console:h11
set backupdir=c:/lib/vim/backup
set directory=c:/lib/vim/backup
set expandtab
set tabstop=4
set shiftwidth=4
set smarttab

:nmap <C-N><C-N> :set invnumber<CR>
:nmap <C-T> :tabnew<CR>
:nmap <C-H> :set invhlsearch<CR>

colorscheme desert

Matrix Exponentiation in R

Tuesday, July 1st, 2008

A short while ago, I needed to do some matrix exponentiation in R (raising a matrix to a power). By default, the exponentiation operator ^ in R, when applied to a matrix, will just raise each element of the matrix to a power, rather than the matrix itself:

> A < - matrix(c(1:4), nrow=2, byrow=T)
> A
[,1] [,2]
[1,] 1 2
[2,] 3 4
> A ^ 2
[,1] [,2]
[1,] 1 4
[2,] 9 16
>
>

Whereas what we really want in this case is the equivalent of:

> A %*% A
[,1] [,2]
[1,] 7 10
[2,] 15 22
>

There are a few different ways of creating a matrix exponentiation operator in R: we could create an R function and create an exponentiation operator for matrices, similar to the %*% matrix multiplication operator that exists already, or we could write the function in C and link to it. It seems logical to create an %^% operator to match the current set of matrix operators. The choice of writing the exponentiation routine in R or C is actually less important than the exponentiation algorithm used. A fast exponentiation algorithm is the “square and multiply” method, an explanation of which can be found here.

I chose to write a basic exponentiation routine in C, by creating a new operator definition, and following these basic rules:

  • tex:A^{-1} returns the matrix inverse.
  • tex:A^0 returns the identity matrix.
  • tex:A^1 returns the original matrix.
  • tex:A^n returns A to the nth power.
  • Complex as well as real-valued matrices should be supported.

The code for this is shown below. Note that to calculate the matrix inverse, I don’t call directly into the underlying BLAS libraries, but actually call back into R’s solve() function to calculate the inverse. The changes were made to array.c in the R source tree under /src/main.

SEXP do_matexp(SEXP call, SEXP op, SEXP args, SEXP rho)
{
int nrows, ncols;
SEXP matrix, tmp, dims, dims2;
SEXP x, y, x_, x__;
int i,j,e,mode;

// necessary?
mode = isComplex(CAR(args)) ? CPLXSXP : REALSXP;

SETCAR(args, coerceVector(CAR(args), mode));
x = CAR(args);
y = CADR(args);

dims = getAttrib(x, R_DimSymbol);
nrows = INTEGER(dims)[0];
ncols = INTEGER(dims)[1];

if (nrows != ncols)
error(_("can only raise square matrix to power"));

if (!isNumeric(y))
error(_("exponent must be a scalar integer"));

e = asInteger(y);

if (e < -1)
error(_("exponent must be >= -1"));
else if (e == 1)
return x;
else if (e == -1) { /* return matrix inverse via solve() */
SEXP p1, p2, inv;
PROTECT(p1 = p2 = allocList(2));
SET_TYPEOF(p1, LANGSXP);
CAR(p2) = install("solve.default");
p2 = CDR(p2);
CAR(p2) = x;
inv = eval(p1, rho);

UNPROTECT(1);
return inv;
}

PROTECT(matrix = allocVector(mode, nrows * ncols));
PROTECT(tmp = allocVector(mode, nrows * ncols));
PROTECT(x_ = allocVector(mode, nrows * ncols));
PROTECT(x__ = allocVector(mode, nrows * ncols));

if (mode == REALSXP)
Memcpy(REAL(x_), REAL(x), (size_t)nrows*ncols);
else
Memcpy(COMPLEX(x_), COMPLEX(x), (size_t)nrows*ncols);

// Initialize matrix to identity matrix
// Set x[i * ncols + i] = 1
if (mode == REALSXP)
for (i = 0; i < ncols*nrows; i++)
REAL(matrix)[i] = ((i % (ncols+1) == 0) ? 1 : 0);
else
for (i = 0; i < ncols*nrows;i++) {
COMPLEX(matrix)[i].i = 0.0;
COMPLEX(matrix)[i].r = ((i % (ncols+1) == 0) ? 1.0 : 0.0);
}

if (e == 0) {
; // return identity matrix
}
else
while (e > 0) {
if (e & 1) {
if (mode == REALSXP)
matprod(REAL(matrix), nrows, ncols,
REAL(x_), nrows, ncols, REAL(tmp));
else
cmatprod(COMPLEX(matrix), nrows, ncols,
COMPLEX(x_), nrows, ncols, COMPLEX(tmp));

//copyMatrixData(tmp, matrix, nrows, ncols, mode);
if (mode == REALSXP)
Memcpy(REAL(matrix), REAL(tmp), (size_t)nrows*ncols);
else
Memcpy(COMPLEX(matrix), COMPLEX(tmp), (size_t)nrows*ncols);
e--;
}

if (mode == REALSXP)
matprod(REAL(x_), nrows, ncols,
REAL(x_), nrows, ncols, REAL(x__));
else
cmatprod(COMPLEX(x_), nrows, ncols,
COMPLEX(x_), nrows, ncols, COMPLEX(x__));

//copyMatrixData(x__, x_, nrows, ncols, mode);
if (mode == REALSXP)
Memcpy(REAL(x_), REAL(x__), (size_t)nrows*ncols);
else
Memcpy(COMPLEX(x_), COMPLEX(x__), (size_t)nrows*ncols);

e >>= 1;
}

PROTECT(dims2 = allocVector(INTSXP, 2));
INTEGER(dims2)[0] = nrows;
INTEGER(dims2)[1] = ncols;
setAttrib(matrix, R_DimSymbol, dims2);

UNPROTECT(5);
return matrix;
}

To actually hook this routine up to the %^% operator, I needed a further piece of glue in /src/main/names.c to associate the operator:

{"%^%", do_matexp, 3, 1, 2, {PP_BINARY, PREC_POWER, 0}}

Then to try it out (after recompiling R of course!):

> A < - matrix(c(1:4), nrow=2, byrow=TRUE)
> A %^% 2
[,1] [,2]
[1,] 7 10
[2,] 15 22
>