## Thursday, 14 October 2010

### BSDNT - v0.16 mullow_classical, mulhigh_classical

The most logical routine to implement next would be Hensel division. Its
main application is in doing exact division.

For that reason, we might want to focus on Hensel division without
remainder first.

This would take an m word integer and divide it by an n word integer,
division proceeding from right to left. Essentially it gives us division
modulo B^m.

However, before we implement this, it makes sense to think about what its
"inverse" operation might be.

If q is the Hensel quotient of a by d mod B^m, then the inverse operation
can be thought of as multiplication of q by d mod B^m.

Hensel division gives an m word quotient q, so this would imply that its
inverse should be a multiplication of {d, n} by {q, m} mod B^m. We might call
this inverse operation mullow, as it returns the low m words of the product
d*q.

However, we need to be careful with this kind of multiplication. We'd also
like to have a mulhigh which returns the high part of the multiplication,
and we'd like the sum of mullow and mulhigh to be the same as a full mul.

However, there is a problem if mullow merely returns the product mod B^m. Any
carries out of mullow will have been lost. Also, all the word by word

multiplications that contribute to the high word of the product mod B^m
will be thrown away.

To rectify the problem we accumulate an "overflow" out of the mullow
corresponding to the sum of all these high words and carries. As this
overflow is essentially the sum of arbitrary words it may take up two words.

Thus, instead of mullow yielding m words it will yield m + 2 words. We'd
like to pass the extra two words as an "overflow-in" to mulhigh, thus the
logical thing is to return these two words from mullow separately from the
product mod B^m itself.

Hensel division will also return two overflow words. After all, what it
essentially does to the dividend is subtract a mullow of the quotient by the
divisor. So, the overflow from Hensel division will be defined as precisely
the overflow from mullow(q*d).

We manage the overflow by accumulating it in an dword_t. However, as we don't
wish the user to have to deal with dword_t's (these are used in our internal
implementations only), we split this dword_t into two separate words at the
end and return them as an array of two words representing the "overflow".

Today we shall only implement mullow and mulhigh. The first of these is a lot
like a full multiplication except that the addmul1's become shorter as the
algorithm proceeds and the carry-out's have to be accumulated in two words, as
explained.

At the same time we implement mulhigh. This takes two "overflow-in" words and
computes the rest of the product, again in a fashion similar to a full
multiplication.

Our test code simply stitches a mullow and mulhigh together to see that the
chain is the same as a full multiplication.

we have to be careful in that if one does an n by n mullow, the mulhigh that
we wish to chain with this must start with an n-1 by 1 multiplication,
not an n by 1, otherwise the sum of the mullow and mulhigh would contain the
cross product of certain terms twice.

We also have to be careful in the case where the full multiplication is only
a single word by a single word. Here the overflow out of the mullow part is
only a single word and there is no mulhigh to speak of. It merely passes
the overflow from the mullow straight through.

Both mullow and mulhigh must accept all nonzero lengths, as per full
multiplication. This causes a few cases to deal with in mulhigh. This doesn't
seem particularly efficient or elegant, but there seems to be little we

An interesting question is what the inverse of mulhigh is. Essentially,
this is our euclidean divapprox.

There's something slightly unsatisfying here though. Recall that the divapprox
algorithm proceeds by subtracting values q1*d' where q1 is the current quotient
word and d' is what is currently left of the divisor. We throw away one word of
the divisor at each iteration until finally we are left with just two words.

It would be wholly more satisfying if we didn't require this extra word of the
divisor throughout. We'd then be working right down to a single word in the
divisor so that we will really have subtracted a mulhigh by the time the
algorithm completes.

Any modification I can think of making to the euclidean division to make this
seem more natural also makes it much less efficient.

Perhaps some further thought will lead to a more satisfying way of thinking about
these things, which isn't also less efficient in practice.

The code for today is here: v0.16

Previous article: v0.15 - divapprox_classical
Next article: v0.17 - div_hensel