# Big Int Encoding

**Published**2018-11-30

I'm coming at this from the perspective of cryptography, because that's the topic this week that lead me here, but I believe you'll find value from it in many contexts.

In short its like this:

- Non-big ints have fixed sizes
- Big Ints have arbitrary sizes
- A small big int can be smaller in size than small normal integer
- Big Ints can be negative
- if it is ambiguous if the Big Int is negative, a 0x00 pad is used
- if a specific size is desired, Big Ints pad with 0x00 (just like normal ints)

- The rules are subject to change without notice

## Why Big Ints

For everyday maths we can use relatively small numbers to describe all the things that we'd want to describe:

- The number of eggs in a carton (12)
- How many siblings are had by an average person in a developed nation (0)
- The total consumer and fedaral debt of the United States ($21 trillion)

Howover, once you start getting into scientific models of the number of atoms in a 1-kilogram silicon sphere or the cryptograhpy involved in the TLS handshake between your browser and the web server of this page, you need something more (if only for a moment).

You need BIG Ints - otherwise known as *Arbitrary Precision* Integers.

## Bit Ints vs... not-big ints?

A normal integer is 32 or 64 *bits*.

A positive 32-bit integer goes to 4 about billion (or much higher if you use lossy
*floating point* integers that merely *approximate* a number, but we'll
save those for another time). A 64-bit integer goes to... much bigger
(but still pretty small, all things considered).

By contrast an EC key is 32 or 48 *bytes* (and an RSA key is 256 *bytes*).

Also, normal integers (only 4 or 8 bytes) are *aligned* to a byte size.
If you store the number 1 in a 32 byte integer (technically just 1 bit)
it takes up just as much space as the number 100 (7 bits), or 1 million (20 bits).

Big Ints, however, cannot be "aligned" because they don't "fit" into a size that a normal CPU can process (though GPUs may be another story).

If you're used to *dynamic* languages this might hit home right away,
but in *static* languages it's kinda important to know the size of the thing
that has to go into the computer's memory. Big Ints break the rules (or at least convention).

## The Size of Big Ints

Big Ints can store very large numbers because they aren't constrained to a particular byte size.

They can can *also* store very small (or negative) numbers.

We'll use very small Big Ints just for the sake of easy-to-read examples.

Example: `1`

and `256`

as 32-bit integers:

```
# 1
00000000000000000000000000000001
# 256
00000000000000000000000100000000
```

Example: `1`

and `256`

as "Big Ints":

```
# 1
00000001
# 256
0000000100000000
```

## Positive and Negative "small" ints

Whether a number is positive or negative is denoted by the "high-order bit" (the first bit).

An integer can be *signed* (allows negative values) or *unsigned* (positive only).
When an integer allows negative values. Without getting into the full mechanics
of how that works, suffice it to say that the very first bit is used to mark a
number as negative.

This is a 32-bit positive 3:

```
00000000000000000000000000000011
```

Change the first bit to 1 and you get a 32-bit negative 4:

```
10000000000000000000000000000011
```

(negative numbers are 1 less than their positive equivalent because there's no such thing as -0... at least not in the math we care about right now)

## Positive and Negative Big Ints

Big Ints can be either positive or negative just like regular ints, and just like regular ints they use the first bit to denote that.

Unlike regular ints, however, they don't have a set bit size to identify which bit should be the first.

Let's review how some very small numbers might be stored as "Big Ints":

127 (only uses 1 byte, also):

```
01111111
```

But... what if we want to store 128 as a big int?

```
10000000
```

Now we have a problem: That's indistinguishable from -1!!

So if a number is positive, we have to give it a byte of **padding**:

```
0000000010000000
```

Hence a if you're using Big Ints to store data and you have a random 32 *byte*
(256 bit) integer, it may be one of the smaller numbers in the random space and
may actually only use 30 bytes.

If it's one of the larger integers in the space it may actually use 33 bytes (the extra byte for padding), but would still be considered 32 bytes.

## Arbitrary Big Ints Bingo

Now that we've acknowledged the rules, it's time to acknowledge why we sometimes have to break them.

Let's say that you're storing the number `64`

(`01000000`

).

If you're doing normal maths operations it won't matter how many bytes you use. If you square it or multiply it or add to it or subtract you'll get all the same answers either way.

However, here's the rub:

It'll look wildly different in base64 if you store it as one byte (`QA==`

)
vs if you store it as two bytes (`AEA=`

).

For this reason, it becomes important that you store and transport it in the correct way.

### Examples:

**JWK**

In the PEM (a.k.a. ASN.1 / x.509 / DER) formats that you use commonly use for storing
cryptographic keys (which have several parts of various lengths encoded inside),
the Integers are generally stored exactly as described above. However, in JWK (another
common format which also uses base64), they're stored as short as possible (no extra `0x00`

padding for negative or slightly smaller numbers).

**EC**

Within a PEM itself numbers are sometimes embedded within other numbers.
This is the case for EC, which embeds 65-bytes consisting of a single byte `0x04`

which is part of EC number formats to mark that the number is actually two numbers:
an `x`

+`y`

point pair, and then exactly 32-bytes for each point, even if they have
to be padded up from smaller numbers. Negative numbers are *not* considered,
so there's no padding for that as there might otherwise be.

**RSA**

It appears that RSA holds to the ASN.1 rules but, at least in the case of `ssh-keygen`

,
RSA public modules values are *always* at least 2041 bits wide and their first byte is
*always* `0x90`

, meaning that the negative number padding is *always* needed and the
byte-width padding is *never* needed.

By AJ ONeal

**Thanks!**It's really motivating to know that people like you are benefiting from what I'm doing and want more of it. :)

Did I make your day?

Buy me a coffee

(you can learn about the bigger picture I'm working towards on my patreon page )