Saturday, 20 November 2010

BSDNT - v0.24 nn_bitset/clear/test and nn_test_random

In today's update we make a long overdue change to bsdnt, again to improve our testing
of the library.

We are going to add a function for generating random bignums with sparse binary
representation. We'll also add some other random functions based on this primitive.

Using test integers with sparse binary representation in our test code will push our
functions harder because it will test lots of corner cases such as words that are all
zero, in the middle of routines, and so on. As it is currently, we'd be extremely
lucky for the random word generator we've been using to generate an all zero word, or
a word with all bits set to one for that matter.

The first step to generating such test randoms is to write a function for setting a
given bit in an integer. This will be an nn_linear function despite it not actually
taking linear time. In fact, it will take essentially constant time. However, it is an
nn type function, so it belongs in an nn module.

The routine is very straightforward. Given a bit index b, starting from 0 for the least
significant bit of a bignum, it will simply use a logical OR to set bit b of the bignum.

Rather than construct a bignum 2^b and OR that with our number, we simply determine
which word of the bignum needs altering and create an OR-mask for that word.

Computing which word to adjust is trivial, but depends on the number of bits in a word.
On a 64 bit machine we shift b to the right by 6 bits (as 2^6 = 64), and on a 32 bit
machine we shift b to the right by 5 bits (2^5 = 32). This has the effect of dividing
b by 64 or 32 respectively (discarding the remainder). This gives us the index of the
word we need to adjust.

Now we need to determine which bit of the word needs setting. This is given by the
remainder after dividing b by 64 or 32 respectively, and this can be determined by
logical AND'ing b with 2^6-1 or 2^5-1 respectively. This yields a value c between 0 and
63 (or 31) inclusive, which is a bit index. To turn that into our OR-mask, we simply
compute 2^c (by shifting 1 to the left by c bits).

Now that we have our OR-mask and the index of the word to OR it with, we can update the
required bit. We call this function nn_bit_set.

While we are at it we create two other functions, nn_bit_clear and nn_bit_test.

It's now straightforward to write test functions which randomly set, clear and test
bits in a random bignum.

Next we create a random bignum generator which sets random bits of a bignum. In order
to do this, we simply choose a random number of bits to set, from 0 to the number of words
in the bignum, then we set that many bits at random.

We call this function nn_test_random1.

We now add a second random bignum generator. It works by creating two bignums using the
function nn_test_random1 and subtracting one from the other. This results in test randoms
with long strings of 1's and 0's in its representation.

We call this function nn_test_random2.

Finally, we create a function nn_test_random which randomly switches between these two
algorithms and our original nn_random algorithm to generate random bignums. We switch all
our test code to use nn_test_random by changing the function randoms_of_len to use it.

At this point we can have greater confidence that our functions are all working as they
are supposed to be, as our test code has been suitably fortified at last! (Well, they are
working now, after I spent a day hunting down the bugs that these new randoms found - no,
I am not kidding. That's how good at finding bugs this trick is!)

Today's code is found here: v0.24

Previous article: v0.23 - sha1 and PRNG tests
Next article: BSDNT - interlude

Friday, 12 November 2010

BSDNT - v0.23 sha1 and PRNG tests

In a recent update we added three PRNGs (pseudo random number
generators). What we are going to do today is add test code for

But how do you test a pseudo random generator? It's producing
basically random values after all. So what does it matter if the
output is screwed up!?

Well, it does matter, as shown by the problems on 32 bit machines
which I wrote about in the PRNG blog post. It would also matter if
the PRNGs were broken on some platform such that they always output
0 every time!

There's a few ways of testing PRNGs. One is simply to run them for a
given number of iterations, write down the last value it produces and
check that it always does this.

The method we are going to use is slightly more sophisticated. We are
going to hash a long series of outputs from the PRNGs, using a hash
function, and check that the hash of the output is always the same.

Basically, our test code will take a long string of words from the
PRNGs, write them to an array of bytes, then compute the sha1 hash of
that array of bytes. It'll then compare the computed hash with a hash
we've computed previously to ensure it has the same value as always.

Moreover, we'll set it up so that the hash is the same regardless of
whether the machine is big or little endian.

The hash function we are going to use is called sha1. Specifically,
we'll be using an implementation of the same written by Brian Gladman
(he also supplied the new test code for the PRNGs for today's update).

The first step is to identify whether the machine is big or little
endian. This refers to the order of bytes within a word in physical
memory. On little endian machines (such as x86 machines) the least
significant byte of a word comes first. On big endian machines the
order is the other way around. Thus the number 0x0A0B0C0D would have
the byte 0x0D stored first on a little endian machine, but 0x0A stored
first on a big endian machine.

Endianness can be identified by architecture, or it can be identified
with a short program. We choose to use the latter method as it should
be hard to fool. At configure time a short C program will run that will
place bytes into a four byte array, then read that array as a single
32 bit number. We then compare the value to a 32 bit value that would
be stored in the given way on a little endian machine. If it compares
equal, then the machine must be little endian. If not we compare with
a number that would be stored in the given way on a big endian machine.

If the machine doesn't match either order, then it must be a very rare
machine with mixed endianness, which we don't support in bsdnt.

The configure script will write some defines out to config.h which
then allow bsdnt modules to identify whether the machine is little or
big endian at compile time, i.e. at zero runtime cost.

Now to discuss the sha1 hashing scheme.

A hashing scheme simply takes a piece of data and computes from it a
series of bits which serve to "identify" that piece of data. If
someone else has access to the same hashing algorithm and a piece of
data which purports to be an exact copy of the original, then they
can verify its identity by computing its hash and comparing.

Of course a hash is only valuable in this sense if it is much shorter
than the piece of data itself (otherwise you'd just compare the
actual data itself).

A very simple hashing scheme might simply add all the words in the
input to compute a hash consisting of a single word.

A secure hashing scheme has at least two other properties.

i) It shouldn't be possible to determine the original data from its
hash. (The data may be secret and one may wish to provide for the
independent verification of its authenticity by having the recipient
compare the hash of the secret data with a publicly published value.
Or, as is sometimes the case, the hash of secret data, such as a
password, might be transmitted publicly, to compare it with a
pre-recorded hash of the data.)

ii) It must be computationally infeasible to substitute fake data
for the original such that the computed hash of the fake data is the
same as that of the original data.

Of course, if the hash is short compared to the data being hashed,
then by definition many other pieces of data will have the same hash.
The only requirement is that it should be computationally infeasible
to find or construct such a piece of data.

The first step in the SHA1 algorithm is some bookkeeping. The
algorithm, as originally described, works with an input message which
is a multiple of 16 words in length. Moreover, the last 64 bits are
reserved for a value which gives the length of the original message in

In order to facilitate this, the original message is padded to a
multiple of 16 words in length, with at least enough padding to allow
the final 64 bits to be part of the padding, and to not overlap part
of the message.

The padding is done by first appending a single binary 1 bit, then
binary zeroes are appended until the message is the right length.
Then of course the length in bits of the original message is placed
in the final 64 bits of the message.

The hashing algorithm itself performs a whole load of prescribed
twists and massages of the padded message.

For this purpose some functions and constants are used.

Given 32 bit words B, C and D there are four functions:

1) f0(B, C, D) = (B AND C) OR ((NOT B) AND D)

2) f1(B, C, D) = B XOR C XOR D

3) f2(B, C, D) = (B AND C) OR (B AND D) OR (C AND D)

4) f3(B, C, D) = B XOR C XOR D

and four corresponding 32 bit constants:

1) C0 = 0x5A827999

2) C1 = 0x6ED9EBA1

3) C2 = 0x8F1BBCDC

4) C3 = 0xCA62C1D6

To begin the algorithm, we break the padded message up into 16 word
blocks M1, M2, M3, i.e. each Mi is 16 words of the padded message.

Each block is processed via a set of steps, and an "accumulated" hash
of 160 bits, consisting of five 32 bit words (the final hash we are
after) is computed: H0, H1, H2, H3, H4.

The algorithm starts by initialising the five "hash words" to the
following values:

H0 = 0x67452301, H1 = 0xEFCDAB89, H2 = 0x98BADCFE, H3 = 0x10325476
and H4 = 0xC3D2E1F0.

Each block of 16 words, Mi, of the padded message is then used in
turn to successively transform these five words, according to the
following steps:

a) Break Mi into 16 words W0, W1, ..., W15.

b) For j = 16 to 79, let Wj be the word given by

Wj = S^(-1)(W{j-3}) XOR W{j-8} XOR W{j-14} XOR W{j-16}),

where S^n(X) means to rotate the word X to the left through n bits
(a negative n means right rotation).

c) Make a copy of the hashing words:

A = H0, B = H1, C = H2, D = H3, E = H4

d) For j = 0 to 79 repeat the following set of transformations in
the order given:

TEMP = S^5(A) + f{j/20}(B, C, D) + E + Wj + C{j/20},

E = D, D = C, C = S^30(B), B = A, A = TEMP,

where j/20 signifies "floor division" by 20, and where f and C are
the above-defined functions and constants.

e) Update the hashing words according to the following:

H0 = H0 + A, H1 = H1 + B, H2 = H2 + C, H3 = H3 + D, H4 = H4 + E.

Note that steps a-e are repeated for each block of 16 words, Mi in
the padded message, further manipulating the five words with each run.
The resulting five words H0, H1, H2, H3, H4 after all the words of the
padded message have been consumed, constitutes the sha1 hash of the
original message.

The function to compute the sha1 hash of a message is given in the
files sha1.c and sha1.h in the top level directory of the source

A new test file t-rand.c is added in the test directory. It contains
the sha1 hash of a large number of words as output by our three
PRNGs. If a user of bsdnt has the same hash for the PRNGs when run
on their machine, then they can have a very high level of confidence
that they are working as expected.

Note that the sha1 algorithm is known as a secure hashing algorithm,
which means that in theory it can be used to hash very important
data so that the recipient can independently confirm the data hasn't
been tampered with (by computing the hash of the value and making
sure it matches some published value).

We don't explain how sha1 actually works. The mysterious constants
are not so mysterious. C0 is the square root of 2 in hexadecimal, C1 is
the square root of 3, C2 the square root of 5, C3 the square root of 10.

I don't know the meaning of the functions f0-f3.

What is worth noting is that in recent years, people have figured out
how to produce sha1 hash collisions (two messages with the same hash).
I don't pretend to be an expert in such things.

All we care about here is that a broken PRNG really can't pretend to
be working, and for that, sha1 works a treat.

Disclaimer: use the information in this post at your OWN RISK!! We
make no representations as to its correctness. The same goes for
bsdnt itself. Read the license agreement for details.

Warning: cryptography is restricted by law in many countries including
many of those where the citizens believe it couldn't possibly be so.
Please check your local laws before making assumptions about what you
may do with crypto.

The code for today's article is here: v0.23

Previous article: v0.22 - Windows support

Thursday, 11 November 2010

BSDNT - v0.22 Windows support

Today's update is a massive one, and comes courtesy of Brian Gladman. At last we add
support for MSVC 2010 on Windows.

In order to support different architectures we add architecture specific files in the arch
directory. There are three different ways that architectures might be supported:

* Inline assembly code

* Standalone assembly code (using an external assembler, e.g. nasm)

* Architecture specific C code

Windows requires both of the last two of these. On Windows 64 bit, MSVC does not support
inline assembly code, and thus it is necessary to supply standalone assembly code for this
architecture. This new assembly code now lives in the arch/asm directory.

On both Windows 32 and 64 bit there is also a need to override some of the C code from the base
bsdnt library with Windows specific code. This lives in the arch directory.

Finally, the inline assembly used by the base bsdnt library on most *nix platforms is now in the
arch/inline directory.

In each case, the os/abi combination is specified in the filenames of the relevant files. For
example on Windows 32, the files overriding code in nn_linear.c/h are in arch/nn_linear_win32.c/h.
(Note win32 and x64 are standard Windows names for 32 and 64 bit x86 architectures, respectively.)

If the code contains architecture specific code (e.g. assembly code) then the name of the file
contains an architecture specifier too, e.g. arch/inline/nn_linear_x86_64_k8.h for code specific
to the AMD k8 and above.

It's incomprehensible that Microsoft doesn't support inline assembly in their 64 bit compiler
making standalone assembly code necessary. It would be possible to use the Intel C compiler on
Windows 64, as this does support inline assembly. But this is very expensive for our volunteer
developers, so we are not currently supporting this. Thus, on Windows 64, the standalone
assembly is provided in the arch/asm directory as just mentioned.

Brian has also provided MSVC build solution files for Windows. These are in the top level source
directory as one might expect.

There are lots of differences on Windows that requires functions in our standard nn_linear.c,
nn_quadratic.c and helper.c files to be overridden on Windows.

The first difference is that on 64 bit Windows, the 64 bit type is a long long, not a long. This
is handled by #including a types_arch.h file in helper.h. On most platforms this file is empty.
However, on Windows it links to an architecture specific types.h file which contains the
requisite type definitions. So a word_t is a long long on Windows.

Also, when dealing with integer constants, we'd use constants like 123456789L when the word type
is a long, but it has to become 123456789LL when it is a long long, as on Windows 64. To get
around this, an architecture specific version of the macro WORD(x) can be defined. Thus, when
using a constant in the code, one merely writes WORD(123456789) and the macro adds the correct
ending to the number depending on what a word_t actually is.

Some other things are different on Windows too. The intrinsic for counting leading zeroes is
different to that used by gcc on other platforms. The same goes for the function for counting
trailing zeroes. We've made these into macros and given them the names high_zero_bits and
low_zero_bits respectively. The default definitions are overridden on Windows in the architecture
specific versions of helper.h in the arch directory.

Finally, on Windows 64, there is no suitable native type for a dword_t. The maximum sized
native type is 64 bits. Much of the nn_linear, and some of the nn_quadratic C code needs to
be overridden to get around this on Windows. We'll only be using dword_t in basecase algorithms
in bsdnt, so this won't propagate throughout the entire library. But it is necessary to
override functions which use dword_t on Windows.

Actually, if C++ is used, one can define a class called dword_t and much of the code can
remain unchanged. Brian has a C++ branch of bsdnt which does this. But for now we have C code
only on Windows (otherwise handling of name mangling in interfacing C++ and assembly code
becomes complex to deal with).

Brian has worked around this problem by defining special mul_64_by_64 and div_128_by_64
functions on 64 bit Windows. These are again defined in the architecture specific version of
helper.h for Windows 64.

Obviously some of the precomputed inverse macros need to be overridden to accomodate these
changes, and so these too have architecture specific versions in the Windows 64 specific version
of the helper.h file.

In today's release we also have a brand new configure file for *nix. This is modified to handle
all the changes we've made to make Windows support easy. But Antony Vennard has also done
some really extensive work on this in preparation for handling standalone assembly on arches
that won't handle our inline assembly (and for people who prefer to write standalone assembly
instead of inline assembly).

The new configure file has an interactive mode which searches for available C compilers (e.g.
gcc, clang, icc, nvcc) and assemblers (nasm, yasm) and allows the user to specify which to use.
This interactive feature is off by default and is only a skeleton at present (it doesn't actually
do anything). It will be the subject of a blog post later on when the configure file is finished.

The code for today is found at: v0.22

Previous article: v0.21 - PRNGs