I was working on a project that required simple arithmetic for very large integers, a set of algorithms called “Arbitrary Precision Math”.
Thinking back to elementary school, simple algorithms exist for addition, subtraction, and multiplication of two numbers with any number of digits.
To my surprise, every algorithm for division either relies on logarithms, which are difficult to implement in arbitrary precision, or the first instruction was “guess the first number, then guess the second number” etc…
Update: 10/2015: I’ve put together a YouTube video for this post. Check it out, here:
Read on, for a simple, reliable, repeatable algorithm for dividing integers of any length.
Table of Contents
Background
What is Arbitrary Precision Math?
Computers store numbers based on a variable’s data type, and the data type determines the largest number that you can store:
Bits | Bytes | Data Type | Largest Signed Integer | Largest Unsigned Integer |
8 | 1 | Byte | 127 | 255 |
16 | 2 | Int (Word) | 32,767 | 65,535 |
32 | 4 | Long (D-Word) | 2,147,483,647 | 4,294,967,295 |
64 | 8 | Int64 (Q-Word) | 9,223,372,036,854,780,000 | 18,446,744,073,709,600,000 |
Notice the trailing zeros for the Int64 data type – the program I used to perform the calculation doesn’t have enough precision to calculate the actual digits, resulting in rounding errors!
Being able to represent any 18-digit number is fairly impressive, if you have a compiler and operating system that support the Int64 data type.
What if you need 20 digits? Or 100 digits?
Although you could implement an arbitrary-precision library using strings, this is increasingly inefficient as the number of digits increases, because each digit requires a whole byte to encode.
A better solution is to create a multi-segment string, whose segments are each one of the base integer data types. For example, a 16-bit Int can represent any base-10 4-digit number: 0000 through 9999. Let’s say you have a data type and custom library that implements a string of Ints, up to 30 positions. This custom data type can represent an unsigned number up to 30 * 4, or 120 digits in length!
Just as you can represent extremely long integers using arbitrary-precision math, you can represent long strings of decimal digits as well, because you can choose where to put the decimal point.
What is Division?
When we say, “divide A by B”, “A” is the Numerator (Dividend), “B” is the Denominator (Divisor).
The result is called the Quotient. In integer division, the fractional portion is expressed as a Remainder, which represents the ratio of “what’s left over” to the denominator / divisor.
- Expressed as a formula:
N/D = Q + R/D - N – Numerator, or Dividend, is the number being divided
- D – Denominator, or Divisor, is the size of the bucket used to slice up N.
- Q – Quotient (result), or the number of slices created if we carve up N in to D-sized buckets
- R – Remainder, or the quantity of N that’s left over, and won’t quite fill a whole D-sized bucket. So the remainder is actually a fraction, a portion of 1, expressed as R/D where R/D < 1
- Multiply the quotient / remainder by the denominator, to reconstruct N. Remember that the remainder is a fraction, R/D:
N/D = Q + R/D
Multiply by D:
N = Q * D + R
Finding the First Digit
Let M be the magnitude in digits (minus 1) of the divisor, D.
So if D = 6789, M would be 3 (4 digits, minus 1)
A is the “quick divisor”, or the first digit of the divisor, with all other digits equal to zero. Expressed as a formula:
A = D – ( D MOD 10^M )
In our example above:
A = 6789 – ( 6789 MOD 10^3 )
A = 6789 – ( 6789 MOD 1000 )
A = 6789 – 789
A = 6000
We’re trying to set all digits to zero, except the first digit. What’s the first thing you do in long division? Drop all the zeros in the denominator. So if we were trying to divide 567,892,345 by 6,789 Reducing 6,789 to 6,000 allows us to reduce the precision of the numerator by the same number of digits. We now have a single-digit math problem!
567,892,345 / 6,789 (Difficult)
567,892,345 / 6,000 = 567,892 / 6 (Easy)
The “quick divisor”, A, will allow us to treat the entire division problem as single-digit division, easily computable by solving one digit at a time.
Process Overview
The process uses error-correcting iterative steps to converge on the quotient and remainder.
Each iterative quotient candidate is multiplied by the divisor, and the difference between the result and the numerator is halved.
This results in a new quotient candidate. To more quickly converge, the previous two quotient candidates can be averaged.
Once there is no further oscillation, and the absolute value of the remainder is less than the denominator (ratio of R/D <1) then the proper quotient has been determined.
Algorithm and Detailed Walkthrough
Algorithm
Recap of variables:
- N – Numerator
- D – Denominator
- Q – Quotient
- R – Remainder
- M – Magnitude
- A – Quick Divisor
Here is the algorithm (Assume all calculations are integer calculations):
- M = length in digits of D, minus 1. So if D is 1234, there are 4 digits, M=3.
- A = D – (D MOD 10^M). This results in the first digit of D, followed by all zeros
- Q = N / A
- R = D+1
- While ABS(R)>=D
- R = N – (Q * D)
- Qn = Q + R / A
- Q = (Q + Qn) / 2
- R = N – (Q * D) Calculate a final value for R
- If R<0 then
- Q=Q-1
- R=R+D
Once the loop exits, Q is the quotient, R is the remainder.
Walk Through – How does this work?
When you have to “guess” the quotient digit based on a multi-digit divisor, how do you do that? Even the guess is an iterative process.
Conversely, it’s easy to do a single-digit division. Grab a pair of digits, divide, take the remainder times 10, grab the next digit, etc…
So we start by creating a “quick divisor”, A, that can be calculated using single-digit division.
The result is way off, but this is an iterative process.
We start with an initial value for Q by using single-digit division to calculate N / A.
Start of Loop
The remainder, R, is the difference between our numerator N, and Q * D (our initial quotient candidate, times the denominator)
By dividing R by A (the quick divisor), we can do another simple, single-digit division step, to get a value that can be added to Q, in order to make Q more accurate.
Taking this approach results in oscillation. To dampen the oscillation, we average Q and Qn:
- The original quotient candidate is Q.
- New Q, or Qn = Q (the original quotient candidate) + R / A (the error or offset)
- Q = (Q + Qn) / 2 (average the two, to dampen oscillation, and reduce iterations by half)
Once R settles in to a an absolute value <=D, you have the right answer, Q
A final computation step must be added, to calculate the final value for R based on the final value for Q, otherwise, you’d have the R corresponding to the old Q value.
In some cases, R can stay negative, but converge to a an absolute value <=D. In this case, a final adjustment must be made:
- Subtract 1 from Q – forces Q * D to a value <= N
- Add D to the remainder, R, representing the “1” we subtracted from Q, and forcing R positive
Q is now the proper quotient, and R is the remainder.
Examples
Example 1
Divide 999,999 by 7,777
N (Numerator) = 999,999
D (Denominator) = 7,777
M = Len(D)-1, so M=3
A = D – (D MOD 10^M)
A = 7,777 – (7,777 MOD 1000)
A = 7,777 – 777
A = 7,000
Q = N / A
A = 999,999 / 7,000
Q = 142 (remember, this is all integer math)
R = D+1 = 7,778 (initial value, false condition for while loop)
Loop | Old Q | Old Q x D | R | R / A | Qn | Q | Guess |
1 | 142 | 1,104,334 | -104,335 | -14 | 128 | 135 | 945,560 |
2 | 135 | 1,049,895 | -49,896 | -7 | 128 | 131 | 968,891 |
3 | 131 | 1,018,787 | -18,788 | -2 | 129 | 130 | 992,222 |
4 | 130 | 1,011,010 | -11,011 | -1 | 129 | 129 | 992,222 |
5 | 129 | 1,003,233 | -3,234 | 0 | 129 | 129 | 999,999 |
Compute the final value for R:
R = N – ( Q * D )
R = 999,999 – (129 * 7,777)
R = -3,234
Since R is negative, we have a final adjustment to Q and R
Q = Q -1
Q = 129-1
Q = 128
R = R + D
R = -3,234 + 7,777
R = 4,543
Checking:
N = Q * D + R
999,999 = 128 * 7,777 + 4,543
999,999 = 995,456 + 4,543
999,999 = 999,999
Check!
Example 2
Divide 999,999,999 by 999,999
N (Numerator) = 999,999,999
D (Denominator) = 999,999
M = Len(D)-1, so M=5
A = D – (D MOD 10^M) = 900,000
Q = N / A = 1,111
R = D+1 = 1,000,000 (initial value, false condition for while loop)
Loop | Old Q | Q x D | R | R / A | Qn | Q | Guess |
1 | 1,111 | 1,110,998,889 | -110,998,890 | -123 | 988 | 1,049 | 938,000,061 |
2 | 1,049 | 1,048,998,951 | -48,998,952 | -54 | 995 | 1,022 | 973,000,026 |
3 | 1,022 | 1,021,998,978 | -21,998,979 | -24 | 998 | 1,010 | 988,000,011 |
4 | 1,010 | 1,009,998,990 | -9,998,991 | -11 | 999 | 1,004 | 994,000,005 |
5 | 1,004 | 1,003,998,996 | -3,998,997 | -4 | 1,000 | 1,002 | 998,000,001 |
6 | 1,002 | 1,001,998,998 | -1,998,999 | -2 | 1,000 | 1,001 | 999,000,000 |
7 | 1,001 | 1,000,998,999 | -999,000 | -1 | 1,000 | 1,000 | 999,000,000 |
Compute the final value for R:
R = N – ( Q * D )
R = 999,999,999 – (1,000 * 999,999)
R = 999
Since R positive, we don’t need to adjust Q nor R
Checking:
N = Q * D + R
999,999,999 = 1,000 * 999,999 + 999
999,999,999 = 999,999,000 + 999
999,999,999 = 999,999,999
Check!
Corollary in Base X
The nice thing about this algorithm is that it works in any base!
The examples above are in base 10.
Recalling our multiple-segment arbitrary-precision data type, we used a string of Int16’s, of which each Int16 is used to store a 4-digit segment of some very large number.
If we consider each segment a base-10,000 digit, then we can perform the above algorithm one whole segment at a time instead of one digit at a time!
In the above example, N = 999,999,999 and D = 999,999, these numbers would be represented by a string of Int16’s like this:
- N = 9:9999:9999 (3 segments)
- D = 99:9999 (2 segments)
- M = 1
- A = 99:9999 – ( 99:9999 MOD 10,000^M ) = 99:0000
The first step would be to find the initial value of Q, by dividing N by A.
This can be done one “digit” at a time:
9:9999:9999 / 99:0000
Fix up the precision based on A:
9:9999 / 99
Compare one digit at a time:
9 / 99 = 0, R 9
The next segment multiplies the remainder times the base, adding the next digit:
9 * 10,000 + 9999 = 99,999
99,999 / 9 is easily computable using 32-bit math. This results in an initial Q = 1,010
Notice that we had to double our precision to accommodate carry / borrow. This means that the segment data type has to be able to accommodate n * b + n, where n is the largest number per segment, and b is the base.
For example, since Int16 can accommodate any 4-digit number, it’s more expedient, if limited to 16-bit math, to limit the segment size to 2 digits. This allows 99 * 100 + 99, or 9999 to be stored in a single segment, to allow for carry / borrow.
On the other hand, if we continue to utilize 16-bit segments, we need a 32-bit data type to perform the segment-by-segment math.
Summary
The algorithm outlined above can be used to perform arbitrary-precision division using character strings, or multi-segment strings of integers, and the concept works in any base.
Dear Mister Parr,
If you replace the “simple division” by a real one, the number of necessary repeats will be much lower.
I worked it out in Excel for strings with a length up to 128 figures, using VBA. If you are interested, I can
send you the worksheet.
With kind regards,
Gerard van Berkel,
Ridderkerk, The Netherlands.
Email: g.v.berkel@hetnet.nl
Hi: The point is that in arbitrary-precision, there is no “go divide this huge number” operation, unless you use a third-party library. In this case, I had strings of numbers, hundreds of digits long.
Found an interesting minor improvement for a speedup by packing as much information in the divisor as possible. My experiments with a 128bit number goes from 8 cycles, to a mere 2.
Pingback: HOME | Justin A. Parr - Technologist