Second thoughts about redo
Despite my previous enthusiasm about `redo`, I’ve just finished tearing the last of my `.do` files out and replacing them with shell scripts again. I don’t think the problems I had were `redo`’s fault, really; I was just asking it to do things it wasn’t designed for.
The biggest problems I had were with trying to set up hard links to share data between my development and production environments (to save disk space—but that’s a post for another time). The two main showstoppers with apenwarr `redo` were:
- The file’s permissions are included as part of its “stamp.” This means that if I `chmod a-w` an output file (to enforce that it’s replaced atomically, and to avoid accidentally overwriting production data), `redo` thinks that the file has been manually edited and (silently!) stops updating it. (See the sketch below.) I could patch this check out, but the other problem was that:
- `redo` has to keep a build database, and apenwarr `redo` stores file information by absolute path. This means that even if I copy the redo database after hardlinking all of the files, it doesn’t recognize that they’re the same files relative to the source code.
These two issues are with the way apenwarr `redo` in particular handles file changes. (And, again, I don’t think this is a problem with it—this design makes a lot of sense for a general software build tool, where you want to be as careful as possible about making sure build artifacts aren’t out of sync with the source code. It only falls down in my situation because I need to avoid running scripts unnecessarily, and I’m willing to sacrifice some correctness guarantees to achieve that.)
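Concretely, the write-then-lock pattern that trips the stamp check looks something like this. (A minimal sketch: the file names and the `generate-report` step are made up for illustration, not from my actual pipeline.)

```sh
# Build into a temporary file first, so readers never see a half-written output.
generate-report < input.dat > output.dat.tmp

# Lock the result read-only so nothing can accidentally overwrite it in place...
chmod a-w output.dat.tmp

# ...then move it into its final location. rename() is atomic, so the output
# is always either the old version or the new one, never a mix of the two.
mv output.dat.tmp output.dat
```

Because apenwarr `redo` includes the permission bits in its stamp, after the `chmod` it considers the output manually edited and quietly stops rebuilding it.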
However, there are also some more general problems with the design of `redo` that mean switching to another implementation probably isn’t worth it for me:
- Per the spec, targets are always rebuilt when their `.do` file changes. This makes sense, since if the commands used to build a file have changed, the output probably has too (this is a well-known problem with `make`); however, in my case it means I’m afraid to even touch the build files—even just to fix a typo—for fear of accidentally kicking off a several-hour rebuild. (In particular, `git rebase` across a commit that happened to modify a `.do` file is liable to change its `mtime`, even if the result of the rebase doesn't change the final file.)
- DJB didn’t specify anything about the `.redo` database other than its filename; since I have unusual and specific requirements about sym/hardlinking build outputs, I’d have to carefully evaluate any implementation I was thinking of using to make sure it’d work for my situation.
- The structure of the `.do` files was confusing me: `redo`, being a build system, naturally has files “pull” their dependencies, but for this project I tend to think about the process as a pipeline of outputs getting pushed downstream (see the sketch after this list). This means that my mental model didn’t match the actual structure, and I frequently had to start at the output script and work backwards to find the file I was actually trying to edit.
- Because `redo` has one `.do` file per output file, it encourages a lot of small scripts, which made it a bit hard for me to visualize the entire pipeline at once.
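To illustrate that pull-versus-push mismatch: each output’s `.do` file names its own inputs via `redo-ifchange`, so you read the pipeline backwards from the outputs. A hypothetical stage (the file names are invented) looks like this:

```sh
# summary.csv.do: redo runs this script to (re)build summary.csv.
# The output "pulls" its input: redo-ifchange both declares cleaned.csv
# as a dependency and rebuilds it first if it's out of date.
redo-ifchange cleaned.csv

# $3 is the temporary output file redo provides; it replaces summary.csv
# only if this script succeeds.
cut -d, -f2,5 cleaned.csv > "$3"
```

To find out where `cleaned.csv` comes from, you have to go hunt for `cleaned.csv.do`, and so on up the chain; a push-style pipeline would instead list the stages top to bottom in a single script.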
In light of all this, I decided it was time for yet another rewrite. (I think this is the fifth? I keep thinking “this time I’ve got the right abstraction!”, but this problem is proving bizarrely difficult to tackle for some reason. At least it’s small enough that it only takes a few hours to write a new system from scratch with the strategy du jour.)
I briefly considered writing my own `redo`(-ish) implementation, but concluded that keeping things as simple as possible was the best policy for the moment.
I’ve therefore gone back to shell scripts, though the new ones are more carefully written to automatically keep track of changes that will affect downstream data. That should cut down on the confusion about stale data that I ran into the last time I was using shell scripts for this.
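The gist of the change tracking is a checksum stamp: each script hashes its inputs, and its own text, and skips the work when nothing relevant has changed. A minimal sketch, with made-up file names and a made-up `.stamps/` layout:

```sh
#!/bin/sh
# build-summary.sh: rebuild summary.csv only when something relevant changed.
set -eu

# Hash everything that should trigger a rebuild: the input data, plus this
# script itself, so that editing the commands also counts as a change.
stamp=$(cat cleaned.csv "$0" | sha256sum | cut -d' ' -f1)

# If the stored stamp matches, the output is already up to date; do nothing.
if [ -f .stamps/summary ] && [ "$(cat .stamps/summary)" = "$stamp" ]; then
    exit 0
fi

# Build into a temporary file and move it into place, so the output is
# replaced atomically rather than overwritten in place.
cut -d, -f2,5 cleaned.csv > summary.csv.tmp
mv summary.csv.tmp summary.csv

# Record the new stamp only after the build succeeds.
mkdir -p .stamps
echo "$stamp" > .stamps/summary
```

Hashing the script’s own text brings back a milder version of the `.do`-file problem above, but the trigger here is the file’s contents rather than its `mtime`, so a `git rebase` that leaves the script byte-for-byte identical doesn’t force a rebuild.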
Given the history of this particular problem, I don’t expect this to be the last time I write about this, but hopefully I’ve at least got something that will tide me over for a while. There are also some interesting-looking patterns for dependency management that are revealing themselves, so if these scripts hold up in the long run I may revisit them later and write another blog post about how they’re structured.