Big Int Encoding
I'm coming at this from the perspective of cryptography, because that's the topic this week that lead me here, but I believe you'll find value from it in many contexts.
In short its like this:
- Non-big ints have fixed sizes
- Big Ints have arbitrary sizes
- A small big int can be smaller in size than small normal integer
- Big Ints can be negative
- if it is ambiguous if the Big Int is negative, a 0x00 pad is used
- if a specific size is desired, Big Ints pad with 0x00 (just like normal ints)
- The rules are subject to change without notice
Why Big Ints
For everyday maths we can use relatively small numbers to describe all the things that we'd want to describe:
- The number of eggs in a carton (12)
- How many siblings are had by an average person in a developed nation (0)
- The total consumer and fedaral debt of the United States ($21 trillion)
Howover, once you start getting into scientific models of the number of atoms in a 1-kilogram silicon sphere or the cryptograhpy involved in the TLS handshake between your browser and the web server of this page, you need something more (if only for a moment).
You need BIG Ints - otherwise known as Arbitrary Precision Integers.
Bit Ints vs... not-big ints?
A normal integer is 32 or 64 bits.
A positive 32-bit integer goes to 4 about billion (or much higher if you use lossy floating point integers that merely approximate a number, but we'll save those for another time). A 64-bit integer goes to... much bigger (but still pretty small, all things considered).
By contrast an EC key is 32 or 48 bytes (and an RSA key is 256 bytes).
Also, normal integers (only 4 or 8 bytes) are aligned to a byte size. If you store the number 1 in a 32 byte integer (technically just 1 bit) it takes up just as much space as the number 100 (7 bits), or 1 million (20 bits).
Big Ints, however, cannot be "aligned" because they don't "fit" into a size that a normal CPU can process (though GPUs may be another story).
If you're used to dynamic languages this might hit home right away, but in static languages it's kinda important to know the size of the thing that has to go into the computer's memory. Big Ints break the rules (or at least convention).
The Size of Big Ints
Big Ints can store very large numbers because they aren't constrained to a particular byte size.
They can can also store very small (or negative) numbers.
We'll use very small Big Ints just for the sake of easy-to-read examples.
256 as 32-bit integers:
# 1 00000000000000000000000000000001 # 256 00000000000000000000000100000000
256 as "Big Ints":
# 1 00000001 # 256 0000000100000000
Positive and Negative "small" ints
Whether a number is positive or negative is denoted by the "high-order bit" (the first bit).
An integer can be signed (allows negative values) or unsigned (positive only). When an integer allows negative values. Without getting into the full mechanics of how that works, suffice it to say that the very first bit is used to mark a number as negative.
This is a 32-bit positive 3:
Change the first bit to 1 and you get a 32-bit negative 4:
(negative numbers are 1 less than their positive equivalent because there's no such thing as -0... at least not in the math we care about right now)
Positive and Negative Big Ints
Big Ints can be either positive or negative just like regular ints, and just like regular ints they use the first bit to denote that.
Unlike regular ints, however, they don't have a set bit size to identify which bit should be the first.
Let's review how some very small numbers might be stored as "Big Ints":
127 (only uses 1 byte, also):
But... what if we want to store 128 as a big int?
Now we have a problem: That's indistinguishable from -1!!
So if a number is positive, we have to give it a byte of padding:
Hence a if you're using Big Ints to store data and you have a random 32 byte (256 bit) integer, it may be one of the smaller numbers in the random space and may actually only use 30 bytes.
If it's one of the larger integers in the space it may actually use 33 bytes (the extra byte for padding), but would still be considered 32 bytes.
Arbitrary Big Ints Bingo
Now that we've acknowledged the rules, it's time to acknowledge why we sometimes have to break them.
Let's say that you're storing the number
If you're doing normal maths operations it won't matter how many bytes you use. If you square it or multiply it or add to it or subtract you'll get all the same answers either way.
However, here's the rub:
It'll look wildly different in base64 if you store it as one byte (
vs if you store it as two bytes (
For this reason, it becomes important that you store and transport it in the correct way.
In the PEM (a.k.a. ASN.1 / x.509 / DER) formats that you use commonly use for storing
cryptographic keys (which have several parts of various lengths encoded inside),
the Integers are generally stored exactly as described above. However, in JWK (another
common format which also uses base64), they're stored as short as possible (no extra
padding for negative or slightly smaller numbers).
Within a PEM itself numbers are sometimes embedded within other numbers.
This is the case for EC, which embeds 65-bytes consisting of a single byte
which is part of EC number formats to mark that the number is actually two numbers:
y point pair, and then exactly 32-bytes for each point, even if they have
to be padded up from smaller numbers. Negative numbers are not considered,
so there's no padding for that as there might otherwise be.
It appears that RSA holds to the ASN.1 rules but, at least in the case of
RSA public modules values are always at least 2041 bits wide and their first byte is
0x90, meaning that the negative number padding is always needed and the
byte-width padding is never needed.
By AJ ONeal
Did I make your day?
Buy me a coffee