Wednesday 13 October 2010

BSDNT - v0.15 divapprox_classical

During the past few weeks, Brian Gladman has been doing
some tremendous updates, including some random number generators and
making bsdnt work on Windows (MSVC).

We discuss all matters related to bsdnt on our development list:


Anyhow, now for today's update.

We'd now like to implement a variant of our divrem_classical algorithm. This
time we'd like to just return a quotient, with no remainder. The question is,
can this be done in less time than a full divrem?

At first sight, the answer seems to be no. As we saw in the post on the divrem
algorithm, every word of both the dividend and divisor counts, and we need
to keep the dividend completely updated as the algorithm proceeds, otherwise
we will get the wrong quotient.

So, with the exception of perhaps the final update (which is only needed to
determine the remainder), there doesn't seem to be much we can save.

But what if we allowed the quotient to be approximate, say within 1 of the actual

quotient? In fact, let's demand that if q' is our approximate quotient, that |a -

d*q'| < d. In other words, we allow the quotient to be too large by 1, but not

too small by 1.

Ideally, what we would like is to be doing about half the updating work.
Specifically, we'd like to be truncating both the dividend and divisor as we
go.

Ordinarily we start with something like

AAAAAAAAAAAAAAA /
DDDDDDDDD

To get the first quotient word we shift the divisor and pad with zeroes, thus

AAAAAAAAAAAAAAA /
DDDDDDDD000000

After one iteration, a word of the dividend has been removed, and we then
shift the divisor again.

AAAAAAAAAAAAAA /
DDDDDDDD00000

We continue until we have

AAAAAAAAA /
DDDDDDDD

Now there is only one quotient word left. But we notice that we don't use
most of the remaining dividend or the divisor to determine this quotient
word. In fact, we could almost truncate to

AA /
D

What if we had truncated at this same point all along?

In fact, if we truncate so that the final division is a two word by one
word division (here we have to be careful, in that we are talking about the
number of words *after* normalisation), then clearly our quotient could be
out by as much as two on that final division, by what we have said in an
earlier post. That is of course ignoring any accumulated error along the way.

As we don't wish to multiply the entire thing out to see what we have to do
to correct it, it is clear that this amount of truncation is too great.

So let's truncate one further word to the right in both the dividend and
divisor, so that the final division (to get the final quotient word) is a
three word by two word division.

In fact, in the example above, as there will be five quotient words, there
will be five iterations of the algorithm, after which we want two words of
the divisor remaining. So, we will start with

AAAAAAA.A /
DDDDDD.D

(The decimal points I have inserted are arbitrary, and only for notational
purposes in what follows.)

After five iterations, throwing away one more word of the divisor each time,
we'll be left with

AA.A /
D.D

The first thing to notice is that our previous divrem algorithms, with the
adjustments they made as they went, gave the precise quotient given the
data they started with.

The second thing to notice is that truncating both the dividend and the
divisor at the same point, as above, will not yield a quotient that is too
small. In fact, the quotient we end up with will be the same as what we would
have obtained if we had not truncated the dividend at all, and only truncated
the divisor. Additional places in the dividend can't affect the algorithm.

Truncating the divisor, on the other hand, may result in a different quotient
than we would have obtained without truncation. In fact, as we end up
subtracting less at each update than we would if all those words were still
there, we may end up with a quotient which is too large. The divisor may also
divide more times, because of the truncation, than it would have if it had not
been truncated.

However, it is not enough to merely consider how the quotient changes with
truncation in order to see how far we can be out. We'll likely end up with a
very pessimistic estimate if we do this, because we may suppose that the
quotient can be one too large at each iteration, which is not true.

Instead, the quantity to keep track of is the original dividend minus the
product of the full divisor d and the computed quotient q. At the end of the
algorithm, this is the actual remainder we'll end up with, and we'd like to
keep track of how much our *computed* remainder (what's left of the dividend
after the algorithm completes) differs from this actual remainder.

Essentially, we accumulate an error in the computed remainder due to the
truncation.

Clearly, at each iteration, the word after the decimal point in what remains
of the dividend may be completely incorrect. And we may miss a borrow out of
this place into the place before the decimal point. So after n iterations of
the algorithm, the dividend may become too large by n. Of course n.0 is much
smaller than our original (normalised) divisor d (also considered as a decimal
D.DD...).

At the final step of the algorithm, we will have a dividend which is too large
by at most this amount, and we'll be using a divisor which is truncated to
just two words. However, the latter affects the computed remainder by an
amount much less than the original d (if Q is the final quotient word, it is
as though we added q*0.0DDDDDD to our divisor, so that the full divisor would
go Q times where it otherwise would only go Q-1 times).

So these two sources of error only increase the computed value q'*d + r (where
q' is the computed quotient and r is the computed remainder) by an amount less
than d. Thus, the computed quotient q' can be at most one larger than the
actual quotient q.

This is equivalent to the required |a - d*q'| < d.

So it seems that if we truncate out dividend at the start of the algorithm,
and our divisor after each iteration, we can get an approximate quotient q'
within the required bounds.

We'll leave it to another time to describe the usefulness of an algorithm
which computes a quotient which may be out by one. What we will note is that
we've done computations with much smaller integers. It therefore costs us
significantly less time than a full divrem.

In this week's branch we implement this algorithm. In the test code, we check
the required quotient is the same as the one returned by divrem, or at most
one too large.

The trickiest part is ensuring we truncate at the right point. We want to
finish on the last iteration with two words *after* normalisation of the
divisor.

Actually, if we are really smart, we realise that if d does not need
to be shifted by much to normalise it, we can get away with finishing with
just two *unnormalised* words in the divisor. The error will still be much
less than d.

To be safe, if the number of quotient words is to be n, I check if the
leading word of the unnormalised divisor is more than 2*n. If not, too much
normalisation may be required, and I set up the algorithm to finish with
three unnormalised words instead of two. Otherwise it is safe to finish
with two words in the unnormalised divisor.

The algorithm begins by computing the number of words s the divisor needs to
be to start. This is two more than the number of iterations required to
get all but the final quotient word, since we should have two words in the
divisor at this point. If normalisation is going to be a problem, we add one
to this so that we compute the final quotient word with three unnormalised
words in the divisor.

Now the number of words required to start depends only on the size of the
quotient, and thus it may be more than the number of words d actually has.
Thus we begin with the ordinary divrem algorithm until the number of words
required is less than the number of words d actually has.

Now we truncate d to the required number of words and the dividend to one
more than that. The remaining part of the algorithm proceeds in the same
manner, throwing away a divisor word every iteration.

Only one thing can go wrong, and that is the following: because we are
truncating the divisor at each point, we may end up subtracting too little from
the dividend. In fact, what can happen is that the top word of the dividend
does not become zero after we subtract q*d' (where d' is the truncated divisor).

When this happens, the top word of the dividend may be 1 after the subtraction.

We know that the least significant word of the dividend could be completely
wrong, and the next word may be too large by about as many iterations as
we've completed so far.

Thus, in order to fix our overflow, we subtract from the dividend as much as we
need to in order for the overflow to disappear. We don't mind our dividend
being too large, as we adjust for that in the algorithm. But we cannot allow it to
become too small. Thus we must only subtract from the dividend precisely the
amount required to make the overflow vanish.

We can safely subtract away whatever is in the bottom two words of the
dividend, as this is not even enough to remove the overflow. And then we can
subtract 1 from the whole dividend. This must remove the overflow and is
clearly the least we can get away with subtracting to do so.

The code for today's post is here: v0.15

Previous article: v0.14 - divrem_classical

No comments:

Post a Comment