A SQLite database isn’t just one file
The title kind of spoils the story, but I ran into a confusing bug recently and it seemed interesting enough to write about.
I had just updated the favicon generator to autodiscover new sites that it needs to fetch an icon for. I pushed the change to production, waited for it to finish running, and then loaded a result page to admire my newly retrieved images—but they weren’t there! The more I investigated, the more confused I got:
- Everything had worked fine in my development environment, which is nearly identical to production (it’s made from a copy of the same data on the same server owned by the same user; the main difference is that I run deploy commands manually instead of letting a script do it).
- The production deploy job reported that it had successfully downloaded and processed the new favicons. Despite that, the production webserver was acting like it was still using an older copy of the database that was missing the new icons.
- I restarted the entire server in case it was reading from an old transaction or something, but the icons were still missing.
- OK, so maybe my deploy job output is wrong and there was a silent error that caused the database not to be updated after all? But I looked in the database with the
sqlite3
command-line tool and the new icons were there. - I downloaded the sqlite file from the production library onto my laptop and... the icons weren't in that file, even though using
sqlite3
on the same file on the server showed that they had been there?!
Eventually I happened to notice the output of ls favicons/
:
favicons.sqlite3
favicons.sqlite3-shm
favicons.sqlite3-wal
The shm
(shared memory) file isn’t particularly relevant here, but the wal
file was exactly the crux of the problem: it contains the write-ahead log: when SQLite is sharing a file with both a writer and readers, it uses this file to track changes, rather than writing them to the main file directly (and potentially overwriting data that the readers still need). Eventually it will block database access briefly to perform a checkpoint* and consolidate the WAL changes into the main file, but by default that only happens when the WAL grows to ~4MB and my new favicons were only tens of kilobytes.
OK, so that explained why the icons disappeared again when I downloaded only the favicons.sqlite3
file to my laptop: since the database hadn’t been checkpointed, the icons were still in the WAL which I had forgotten to download. But the production webserver is sharing the same file and didn’t have to download anything, so why wasn’t it seeing the changes?
The answer turned out to be in my docker-compose.yml
file:
services:
app:
...
volumes:
- ./favicon/favicons.sqlite3:/app/favicon/favicons.sqlite3:ro
As you can see, I’m mounting only the main database file into the running container, so it has no way of knowing that the -wal
file even exists. This is mostly the result of a bad design decision that I made when first building favicon support: the favicon.sqlite3
file lives (.gitignore
d) in the same directory as the source code for generating the icons. This was fine for prototyping, but it means that when building the app image I have to exclude the database when I copy the source code into the app image, and then mount the database back in to the container when it runs; and when I set that up I didn’t think of WAL files since there wasn’t one lying around at the time.
The correct solution for this is fairly simple, since this database is mostly readonly so I don’t need to handle migrating live writers:
- Update the code to give the favicon database a dedicated directory, and share it into the Docker container correctly.
- Use the vacuum command to copy the db into that directory in production.
- Deploy the code change so prod is using the new directory.
- Clean up the old copy now that it's no longer needed.
I’m writing this post as a cautionary tale, both for handling SQLite files and more generally for not taking shortcuts that will come back to bite later: I spent at least an hour debugging a problem that wouldn’t have ever happened if I’d taken an extra two minutes to put these files in a slightly more sensible place before rolling this feature out to begin with.