
One caveat with implicitly converting a large integer into a float is that you lose precision. In the example below you can clearly see that something is amiss.
The preceding program should output the following.
But why does this happen? In this post we’re going to jump down the rabbit hole and see if we can learn more. First, copy and compile this program with the -g flag so that debugging information will be generated.
Now we’re going to inspect the example binary with GDB. We need to read out the raw bytes of each variable. Since an int and a float are both 32 bits (4 bytes) on this machine, we can use the x/4b command to read out the 4 bytes of each variable.
I’m on a little-endian machine, so I’m going to reverse the bytes to make things easier to read and understand. I’m also going to convert our bytes from hexadecimal to binary.
x and z are ints, so their layout in memory is very straightforward. The bytes literally represent our decimal number in binary.
y is a float, and is laid out in a different manner. Now, remember that I reversed the bytes earlier. This will come in handy now, and will make deciphering a binary float much easier.
An IEEE 754 binary32 single precision float will consist of 1 sign bit, 8 exponent bits, and 23 mantissa bits.
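What’s a mantissa? It’s the significand of a floating-point number, i.e. the part that holds its significant digits. In binary32 the leading 1 of a normalized value is implicit, so only the 23 bits that follow it are stored.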
Below you’ll see that the mantissa portion of y lines up perfectly with x and z starting at the 6th bit. But how exactly does the compiler truncate x into the 23 bits that make up the mantissa portion of a float?
After a bit of Googling I found a simple algorithm to convert a binary int into a binary float. The algorithm is as follows.
- set sign to 1 if negative else 0
- shift bits left until the leading 1 has been shifted out, and count # of shifts
- mantissa = bits[# of shifts : 23 + # of shifts]
- exponent = 127 + (32 - # of shifts)
So if we apply this algorithm to our input of int x, it results in 6 shifts. The first 23 bits from that shift point give us our mantissa. These results match up perfectly with the raw bytes that we read from GDB.
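This shows how magnitude is preserved and precision is lost. The algorithm keeps the leading 1 (which is implicit in the float) plus the next 23 bits, so the 3 least significant bits of x are dropped. If the integer were larger, the loss would be greater. Note that this simple algorithm truncates the dropped bits; with the default IEEE 754 rounding mode the real conversion rounds to nearest, so the last mantissa bit can differ from what plain truncation gives.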











