Converting floats to fixed point

Converting a floating point number to fixed point requires some maths, and it involves working with floating point numbers in your code, which would appear to make the entire exercise pointless. What I use is a C macro, so the compiler should convert literals at compile time rather than at runtime. Another option, if your constants are float values, is to convert them beforehand and type the hex representation into your source.

The maths is reasonably simple:

#define FIXED_VAL 256

#define FLOAT_TO_FIXED(x) ((int)((x) * FIXED_VAL))
#define FIXED_TO_FLOAT(x) ((float)(x) / FIXED_VAL)

What this does is take the floating point number, multiply it by 256 (the number of distinct values 8 bits can represent, so the fractional part fills exactly one byte), and then slam it down into an integer to remove whatever fraction is left over. The more mathematically inclined of you will now be twitching and realizing this involves a loss of precision.
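As a minimal sketch of the compile-time use, the macros can initialise constants directly. The constant names and the `to_fixed` helper here are made up for illustration, and the extra parentheses around `x` are a precaution so that arguments like `a + b` expand safely. I am assuming a typical compiler that folds constant initialisers; check your own compiler's output if it matters.

```c
#include <stdint.h>

#define FIXED_VAL 256

/* Extra parentheses so arguments such as (a + b) expand safely. */
#define FLOAT_TO_FIXED(x) ((int)((x) * FIXED_VAL))
#define FIXED_TO_FLOAT(x) ((float)(x) / FIXED_VAL)

/* Hypothetical constants. The initialisers are constant expressions,
   so a typical compiler reduces them to plain integers at build time. */
static const int16_t GRAVITY   = FLOAT_TO_FIXED(10.1); /* same bits as 0x0A19 */
static const int16_t HALF_STEP = FLOAT_TO_FIXED(0.5);  /* same bits as 0x0080 */

/* Converting a runtime value still works, but it drags floating
   point code into the program, which is what we want to avoid. */
int16_t to_fixed(float f)
{
    return (int16_t)FLOAT_TO_FIXED(f);
}
```

The hex values in the comments are what you would type in by hand if you went with the "convert them beforehand" option instead.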

10.1 * 256 = 2585.6
2585.6 converted to an integer is 2585
2585 in binary is 101000011001 (0000101000011001 when padded to 16 bits)

And if we put that back into our table from before we get

128	64	32	16	8	4	2	1	.	1/2	1/4	1/8	1/16	1/32	1/64	1/128	1/256
0	0	0	0	1	0	1	0	.	0	0	0	1	1	0	0	1

And if you now do the maths to turn this back into a fractional number, you get

8 + 2 = 10
1/16 + 1/32 + 1/256 = 0.09765625

Final value = 10.09765625
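If you want to see how much is lost for a given value, a small helper along these lines does the round trip for you. The function name `round_trip_error` is my own invention for this sketch, and it assumes the value fits in the 8.8 range.

```c
#include <stdint.h>

#define FIXED_VAL 256
#define FLOAT_TO_FIXED(x) ((int)((x) * FIXED_VAL))
#define FIXED_TO_FLOAT(x) ((float)(x) / FIXED_VAL)

/* Error introduced by converting a float to 8.8 and back again.
   Because the conversion truncates, the size of the error is
   always smaller than one fractional step, 1/256. */
float round_trip_error(float value)
{
    int16_t fixed = (int16_t)FLOAT_TO_FIXED(value);
    return value - FIXED_TO_FLOAT(fixed);
}
```

For 10.1 the error is roughly 0.0023, while a value such as 10.5 that fits exactly into 8 fractional bits comes back with no error at all.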

Which is not 10.1, but it’s “close enough”. If this “close enough” feels wrong, remember that floating point numbers behave the same way. In fact, so does everyday decimal arithmetic. You know that if you split an item into thirds, you have 1/3 + 1/3 + 1/3 = 1. But do that on a calculator, which can only hold so many digits, and suddenly the maths falls apart…

1/3 = 0.333333…
0.333… + 0.333… + 0.333… = 0.999…
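The same thing happens in 8.8 fixed point. This sketch (the function name `three_thirds` is hypothetical) adds three fixed point thirds together and does not quite get back to one:

```c
#include <stdint.h>

#define FIXED_VAL 256
#define FLOAT_TO_FIXED(x) ((int)((x) * FIXED_VAL))

/* Adds three 8.8 "thirds" together and returns the raw 8.8 result. */
int16_t three_thirds(void)
{
    /* 1/3 * 256 = 85.33..., which truncates to a raw value of 85. */
    int16_t third = (int16_t)FLOAT_TO_FIXED(1.0 / 3.0);

    /* 85 * 3 = 255, one step short of 256 (which is 1.0 in 8.8). */
    return (int16_t)(third + third + third);
}
```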

Any programmer who has used floats knows the following is unlikely to work:

float area;
    
...
    
if (area == 10.0) { /* code */ }

and is better written as:

float area;
    
...
    
if (area >= 9.99f && area <= 10.01f) { /* code */ }
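A common way to package this up is a helper that checks whether two floats are within some tolerance of each other. This is only a sketch: the name `nearly_equal` is mine, and the right tolerance depends entirely on your numbers.

```c
#include <stdbool.h>

/* True when a and b differ by no more than tolerance.
   The absolute value is worked out by hand so the sketch
   does not depend on the maths library being available. */
bool nearly_equal(float a, float b, float tolerance)
{
    float difference = a - b;
    if (difference < 0.0f) {
        difference = -difference;
    }
    return difference <= tolerance;
}
```

With that in place the test becomes `if (nearly_equal(area, 10.0f, 0.01f))`, which accepts values slightly below 10 as well as slightly above.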

This is why it’s up to us to choose where the fixed point goes to ensure our numbers are accurate enough.

A benefit of using what we will now call 8.8 fixed point numbers is that each half occupies a whole byte, and working on whole bytes is easy for 8-bit CPUs. The Z80, for example, pairs its 8-bit registers into 16-bit register pairs, so converting a fixed point value to an integer involves just taking the high byte of the pair.
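In C the same idea looks something like the sketch below. The function names are hypothetical, and I am assuming unsigned values to keep things simple; with negative numbers the high byte rounds down rather than towards zero.

```c
#include <stdint.h>

/* Whole-number part of an unsigned 8.8 value: simply the high byte. */
uint8_t fixed_whole(uint16_t value)
{
    return (uint8_t)(value >> 8);
}

/* Fractional part, counted in 1/256ths: simply the low byte. */
uint8_t fixed_fraction(uint16_t value)
{
    return (uint8_t)(value & 0xFF);
}
```

Feeding in 2585 from the earlier example gives a whole part of 10 and a fraction of 25, and 25/256 is the 0.09765625 worked out above. On an 8-bit target a decent compiler should turn each of these into a single register move.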

To avoid confusion, it’s best to create your own type to represent fixed point, rather than trying to remember which variables are fixed point or using some awful prefix notation on all your variables.

#include <stdint.h>  /* for int16_t */

typedef int16_t fixed8;  /* 8.8 fixed point */
    
fixed8 xVel, yVel;
int player_lives;
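To show why the type is worth having, here is a rough sketch of a per-frame movement update. The `step` and `to_pixel` functions are hypothetical names for illustration, and `to_pixel` assumes the position is never negative.

```c
#include <stdint.h>

typedef int16_t fixed8;

#define FIXED_VAL 256
#define FLOAT_TO_FIXED(x) ((fixed8)((x) * FIXED_VAL))

/* Position and velocity are both 8.8, so they add like
   ordinary integers with no conversion needed. */
fixed8 step(fixed8 position, fixed8 velocity)
{
    return (fixed8)(position + velocity);
}

/* Screen pixel for an 8.8 position that is known to be
   non-negative: just the high byte. */
uint8_t to_pixel(fixed8 position)
{
    return (uint8_t)((uint16_t)position >> 8);
}
```

Starting from 0 with a velocity of `FLOAT_TO_FIXED(1.5)`, four calls to `step` leave the position at pixel 6, and no floating point code runs at any point along the way.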