It's time for our first division function. This will again be a linear function
in that it divides a multiprecision integer by a single word and returns the
(single word) remainder.
This is the first time we can do substantially better than a simple for loop
(other than when we coded nn_cmp and other functions, which allowed early exit).
To get us going, we'll write a really dumb division function first. It'll just
use C's "/" and "%" operators in a for loop.
C is supposed to optimise simultaneous occurrences of / and % together in the
same piece of code to use a single processor instruction where available.
However, even with that optimisation, our function will be exceedingly slow.
We'll see why that is when we implement a not-stupid divrem1, next time around.
To emphasise its stupidity, we'll call our exceedingly dumb division function
nn_divrem1_simple. Perhaps we can use it in our test code to compare against
when we write the more sophisticated version.
Hopefully our naming conventions won't be defeated by this function, which
proceeds left to right (or most significant word first).
One doesn't usually need to write the remainder down immediately following
the quotient (though one could imagine a function operating that way).
However, it is convenient to be able to accept a "remainder-in", essentially
thought of as an extra limb of the dividend, so that divrem can be chained.
In this way, divrem1 is most similar to right shift in semantics (the latter
is just division by a power of two, after all).
Now, if we blindly try to implement this function, we quickly discover a
problem. Even if we work with double words, a division of a two limb quantity
by a single limb can be problematic.
We can illustrate this using decimal digits, instead of machine words. Suppose
we divided 94 by 7. The resulting quotient is 13 with remainder 3 (I hope!).
The point is, the quotient takes up two "words" 1 and 3, not just a single
We can get around it by reducing the top word first. We have 9 divide 7 is 1
remainder 2. Thus the first word of our quotient is 1. Now we are left with
24 divide 7. This time the quotient is 3 with remainder 3 and everything is
fine. Note the remainder, 3, is less than 7.
The critical thing to note is that once we get things started, everything
works fine from then on. The top word in our double words are always reduced
modulo the divisor after the first iteration.
So the almost trivial algorithm almost works, except for the lead-in, where
we have to get things right. After that, we can divide our dividend word on
word until we reach the bottom, where we'll have a single limb remainder.
Now as mentioned, it is also convenient to supply a "carry-in". This acts like
an additional limb of the dividend. Assuming this carry-in was reduced mod
the divisor, we find that we end up with as many words in our quotient as we
had in our original dividend.
Clearly however, if the carry-in is not reduced, we will end up with even one
more quotient limb! However, there is an easy way to avoid this problem.
We simply decide to restrict the carry-in to be reduced modulo the divisor!
This restriction is not a problematic restriction because we will still be
able to chain our division functions together with this restriction (so long
as we use the same divisor throughout).
Anyhow, with this restriction, the quotient will have precisely the same number
of limbs as the dividend, and since we'll start with the carry-in, not the top
limb of the dividend, and that carry-in will already be reduced, there is no
bootstrapping iteration required, i.e. the problem of a non-reduced first limb
as explained above, simply doesn't exist.
In this way we avoid any complex feed-in and wind-down code, and as for all
the functions so far, the length m can be zero. The result is satisfyingly
symmetric, even if it is terribly slow.
In the test code, if q = a / c with remainder r, we can check that a = q*c + r.
We can use the functions we have already implemented, including our mul1_c code
to verify this.
We also check that passing in a reduced carry-in results in a correct result,
by chaining two divisions together.
This will do for today, as a more serious implementation of divrem1 will
require much more serious code. We'll take a look at that next time.
The github branch here v0.7 is for this post.
Previous article: bsdnt v0.6 - addmul1, submul1
Next article: bsdnt assembly language