I just spent SIX HOURS debugging a NULL POINTER...

17 August 2018 8:25 pm

I Quit, No More...

That's it. I give up. I'm going to take up something nice and simple like watching TV or trying to drink my bodyweight in beer. Programming has the ability to make you feel insanely clever one moment, then make you feel incredibly stupid the next moment, with a gradual slide into insanity in the middle.

The Clever Bit

So I'm trying to make some sort of game engine thing for my ODroid-GO, which as you should know by now (I've mentioned it a few times) is based on an ESP32 microcontroller. This thing has barely any RAM, and likes to crash if you fill it too much.

I'm writing a game, and it needs images and sprites, which take up room. Until I started running out of a certain kind of RAM, my method for getting images into the code was via C header files. I even wrote a nice little utility in Python that does it for me. Feed it a PNG file and it squirts out a C header that you include in your source, and there's all the image data as a giant array.
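For illustration, a generated header might look roughly like the sketch below. The file name, macro names, array name and pixel values are all made up, and storing the pixels as RGB565 is an assumption; the real tool's output isn't shown here.

```c
/* logo.h: hypothetical output of a PNG-to-header tool (all names are made up) */
#include <assert.h>
#include <stdint.h>

#define LOGO_WIDTH  4
#define LOGO_HEIGHT 2

/* Pixel data, row by row, one 16-bit value per pixel (RGB565 assumed). */
static const uint16_t logo_pixels[] = {
    0xF800, 0xF800, 0x07E0, 0x07E0,   /* red, red, green, green   */
    0x001F, 0x001F, 0xFFFF, 0x0000,   /* blue, blue, white, black */
};

/* Compile-time check that the array really holds width * height pixels. */
static_assert(sizeof logo_pixels == LOGO_WIDTH * LOGO_HEIGHT * sizeof(uint16_t),
              "logo_pixels does not match LOGO_WIDTH x LOGO_HEIGHT");
```

Including a header like this bakes the whole image into the program, which is why big images make the build fall over.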

This doesn't work very well for large images: it makes the compiler give up with cryptic error messages that I think mean "you've filled my heap(?) and I have no room to run your code!" or something. The inner workings of an assembled binary are a rabbit hole I'm trying not to fall into.

The ODroid-GO has an SD card slot attached to the SPI bus, in addition to the screen. "Real" games use the SD slot for loading things, so maybe mine should too. I didn't want to do this originally because it seemed like a lot of effort (hint, it is!) but not being able to load images was evidently all the motivation I needed to tackle this particularly deep rabbit hole.

The SD Slot

It's a slot. You stick an SD card in it and then use the C fopen() and fread() functions to read data out of it into memory, memory that you allocate off the heap using malloc(). So far nothing too fancy is going on. It's just standard C.

Except no it's not. Not quite. You see the ODroid-GO runs its own little OS, so first you need some code to "mount" the SD card in a virtual filesystem, which isn't documented anywhere apart from deep in a forum and the bowels of some source code. If I didn't have access to Google, I'd still be trying to figure that out. Working out how to read data off the SD card was difficult, but not too difficult.

The difficult part is that trying to access the SD card after the screen has been initialised makes things crash. There's some voodoo going on in the hardware, and I'm not going anywhere near that. I did discover that if I read data off the SD card before initialising the screen, nothing crashes, but then after initialising the screen I can't use the SD card again. A minor "feature" that I'm sure will annoy me in the future.

So the data then...

Right, what am I reading off the SD card then? Well I'm not loading a PNG image, they seem like they need libraries to decompress and unpack the juicy pixels contained within them. Screw that, I'll do what any person would do - invent my own image file format!

This bit is actually pretty cool. I rewrote my Python tool to create a binary file that has a 16-byte header containing image size and pixel density, followed by the pixels themselves in RGB565 format. Later, when sprite sheets are implemented, the header will also contain how many sprites are present and their sizes. It's great: I create an image in GIMP, run it through my tool and copy the file onto the SD card. The code loads the file into RAM and ...
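As a rough sketch of what reading such a header could look like in C: the field order, the "IMG0" magic bytes, and treating "pixel density" as bits per pixel are all assumptions made for illustration, as are the names img_header_t, img_parse_header and rgb565. Only the 16-byte size and the RGB565 pixel format come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMG_HEADER_SIZE 16

/* Hypothetical layout (the real field order is not given):
 *   bytes 0-3   magic "IMG0"
 *   bytes 4-5   width in pixels, little-endian
 *   bytes 6-7   height in pixels, little-endian
 *   byte  8     bits per pixel (16 for RGB565)
 *   bytes 9-15  reserved, e.g. for sprite counts later
 */
typedef struct {
    uint16_t width;
    uint16_t height;
    uint8_t  bits_per_pixel;
} img_header_t;

/* Read a little-endian 16-bit value one byte at a time, so neither
 * memory alignment nor the host's byte order matters. */
static uint16_t read_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the magic bytes don't match. */
int img_parse_header(const uint8_t raw[IMG_HEADER_SIZE], img_header_t *out)
{
    assert(raw != NULL && out != NULL);
    if (memcmp(raw, "IMG0", 4) != 0) {
        return -1;
    }
    out->width          = read_u16_le(raw + 4);
    out->height         = read_u16_le(raw + 6);
    out->bits_per_pixel = raw[8];
    return 0;
}

/* Pack 8-bit-per-channel RGB into RGB565: 5 bits red, 6 green, 5 blue. */
uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}
```

Pulling the fields out byte by byte, rather than casting the buffer to a struct pointer, is one way to sidestep alignment and padding surprises.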

and...

Everything goes wrong

Like, there's garbage on the screen, the device reboots itself, sometimes the SD card stops working. I'll say one thing though, try as I might I can't brick the device - the USB programmer always kicks it enough that new code can be uploaded. However, seeing this is a bit depressing, really.

I'm a C programmer, I'm not rooting around in assembly code trying to figure out what's going on. No. I'm going to blunder about for SIX HOURS trying to fix things. I mean, I had plenty of things to poke at. Let's list some of the possible issues:

  1. I am reading data off an SD card
    1. So maybe the file isn't there [it is]
    2. Maybe my Python tool to create the data is not working [it is, I hand-decoded some data to check]
    3. Maybe my SD card is corrupt [nope, try again]
  2. I'm trying to allocate large chunks of RAM
    1. Maybe there isn't enough [nope, keep going...]
    2. I have an array of pointers, maybe my pointer addressing is wrong [no, and don't start putting random * and & in front of variables again]
    3. Maybe the allocations aren't actually working [mmm nope, there's some assert() checks, they don't trigger]
  3. I'm trying to copy data from one part of RAM to another
    1. Perhaps I'm doing the old off-by-one bug [no]
    2. Maybe... [no, just give up and go make a cup of tea]

What was actually wrong?

I'd love to say this was an obscure and deeply exciting bug that taught me something new and fundamental. After all, I did mention NULL POINTER in the title. Sounds exotic, doesn't it? Things don't usually generate NULL pointers unless things have properly broken.

You know what else generates NULL pointers?

Idiots trying to shut compilers up.

I have this function:

It accepts a filename and then goes through the following routine

  1. Allocate RAM for the image
  2. Load the image header off the SD card
  3. Allocate RAM for the image data, using the header (this bit worked 100% perfectly, second time I tried)
  4. Read the image data into RAM
  5. Return a pointer to the new image structure so that I can use it later on
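The five steps above can be sketched in standard C along these lines. This is a hypothetical reconstruction, not the original function: the names image_t, image_load and image_free, the "IMG0" magic and the header field positions are all assumptions. It also assumes the file's pixel byte order matches the machine's (little-endian, as on the ESP32).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define IMG_HEADER_SIZE 16

typedef struct {
    uint16_t  width;
    uint16_t  height;
    uint16_t *pixels;   /* width * height RGB565 values */
} image_t;

void image_free(image_t *img)
{
    if (img != NULL) {
        free(img->pixels);
        free(img);
    }
}

/* Returns a heap-allocated image, or NULL on any failure. */
image_t *image_load(const char *filename)
{
    assert(filename != NULL);

    uint8_t raw[IMG_HEADER_SIZE];
    FILE *f = fopen(filename, "rb");
    if (f == NULL) {
        return NULL;
    }

    /* 1. Allocate RAM for the image structure. */
    image_t *img = calloc(1, sizeof *img);
    if (img == NULL) goto fail;

    /* 2. Load the header. Assumed layout: "IMG0", then width and
     *    height as little-endian 16-bit values; the rest is ignored. */
    if (fread(raw, 1, sizeof raw, f) != sizeof raw) goto fail;
    if (memcmp(raw, "IMG0", 4) != 0) goto fail;
    img->width  = (uint16_t)(raw[4] | (raw[5] << 8));
    img->height = (uint16_t)(raw[6] | (raw[7] << 8));

    /* 3. Allocate RAM for the pixel data, sized from the header. */
    size_t count = (size_t)img->width * img->height;
    if (count == 0) goto fail;
    img->pixels = malloc(count * sizeof *img->pixels);
    if (img->pixels == NULL) goto fail;

    /* 4. Read the pixel data into RAM. */
    if (fread(img->pixels, sizeof *img->pixels, count, f) != count) goto fail;

    fclose(f);

    /* 5. Return the image itself, not a placeholder value. */
    return img;

fail:
    fclose(f);
    image_free(img);
    return NULL;
}
```

Every failure path funnels through one cleanup label, so a NULL coming back always means "the load failed" and the caller can check for it.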

Here's how I use it

Notice the printf() statement? That was me finally losing sanity and thinking "OK, so the variable that is allocated might be getting mangled, let's print out its memory address".

This is what it printed...

And that's when it hit me, like a low doorway I've not noticed. Earlier this afternoon, sometime just after dinner I was debugging the "read the file header and work out what size image we have" part of the code. It's complex, it reads bytes out of a file and expects a certain pattern of them. I even had to remember about memory alignment to make it work properly.

Naturally, this being the really difficult part, I took extra care with the code and it only broke once because of a typo.

At that time I didn't care about the rest of the code, but it wouldn't compile because it wanted a pointer to a structure. To shut the compiler up I did this:

I should have done this...

Because if I had, I wouldn't have just spent six hours trying to debug a device that acted exactly like its RAM was faulty. Really, it went nuts. I had a stack overflow at one point, another time I copied nonsense garbage onto the screen which crashed it, another time I tried writing into the invalid memory location and it really didn't like that... sometimes...

C is great, when you shoot your feet off you get to keep the remains.

So what did I learn? Probably nothing, I'll make this mistake again repeatedly, programming is like that. Sure, I know that you're supposed to return things from functions, the compiler even makes me do it. What I can't prevent is me deciding "Who cares what it returns, I'm not up to that bit yet, shut up... here, have a zero to be quiet".

You know, I probably created that bug about 30 seconds after creating the function's declaration. I bet I made an empty function like all good programmers do, and made it return NULL so that trying to use it would cause a crash. I guess I never then made the leap of imagination to slap an assert() check after using the function. I mean, it's not like I did it everywhere else...
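A minimal sketch of the kind of check that would have caught it, using made-up names (image_load_stub, image_load_fixed and image_load_checked are illustrative, not the engine's real functions):

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int width;
    int height;
} image_t;

/* Placeholder written only to get the build going: it compiles,
 * but it never hands back an image. */
image_t *image_load_stub(const char *filename)
{
    (void)filename;
    return NULL;
}

/* Stand-in for a finished loader, returning a fixed image so the
 * sketch stays self-contained. */
image_t *image_load_fixed(const char *filename)
{
    static image_t img = { 320, 240 };
    (void)filename;
    return &img;
}

/* Call any loader and fail loudly, right at the call site, if it
 * returns NULL. Note that assert() is compiled out when NDEBUG is set. */
image_t *image_load_checked(image_t *(*loader)(const char *), const char *filename)
{
    image_t *img = loader(filename);
    assert(img != NULL);
    return img;
}
```

With something like this in place, calling image_load_checked(image_load_stub, "title.img") stops at the assert in a debug build, instead of letting a NULL wander off into the drawing code and impersonate faulty RAM.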

To get meta for a moment I suppose the thing I have reinforced is that no matter what the bug is, or how obscure it is, I will eventually find it and fix it. Even if it takes SIX HOURS.