
# The Constant DataBase file format

For a new feature I’m building (“quick results”, coming Soon™), I needed a simple key-value file format, ideally one that was fast and easy to work with. The “CDB” format seemed to fit the bill, but there weren’t any JavaScript libraries for it that I liked, so I found myself digging into the internals to understand how it worked; this blog post is a summary of my investigation into the details of the file format.

(Why not SQLite? Well, I was using that originally, but bulk-updating the database seems to have caused a lot of churn in the B-Tree index it uses; I want to store this data on a spinning HDD, and the resulting seeks caused updates to become very slow. CDB is optimized for linear writes, which is perfect for a HDD, and most lookups take two disk seeks or less, which is also about the optimum. The tradeoff is that I need to rewrite the entire database to update it, but that’s fine for this use case since updates are rare and rebuilding the .cdb isn’t the slow part anyway.)

Most of the information in this post was gleaned from the cdb format specification and cdb(5) from tinycdb. Yusuke Shinyama also has a diagram of the cdb file structure; and R. J. Howe’s implementation of cdb has some notes on the format which I only discovered after writing this post.

## What is CDB?

“CDB” (short for “Constant DataBase”) is a file format for a write-once key-value store. It was created by D. J. Bernstein for use in djbdns, qmail, and others; as it’s a useful general-purpose format, various other libraries and utilities have since been created for it. (In particular, I’m using TinyCDB to build my database files, though I’ve written my own utility to read them.)

Some quick facts:

• Write-once (adding entries to the database requires rewriting the file, but this is quite fast; and this approach allows atomic updates using POSIX filesystem semantics, by writing the new database to a tempfile and then renaming it over top of the old file.)
• Based on hash tables
• Keys and values can be arbitrary byte strings
• Keys can have multiple values associated
• Total file size is $2048 + \Sigma\big(\text{len}(key) + \text{len}(value) \;+ \sim\! 24\big)$ bytes
• Key lookup usually takes only two (for successful lookups) or one (for failed lookups) disk accesses
• All internal numbers (i.e. everything except the keys and values) are unsigned 32-bit integers, stored as little endian.
• Supports file sizes up to 4GB (there are some implementations that can use uint64 for internal numbers instead, removing this limitation)

## File layout

A CDB file is arranged in three sections, like so:

┌──────────────────────────────────────┐
│           TOC (2048 bytes)           │
├──────────────────────────────────────┤
│                                      │
│             Data section             │
│                                      │
├──────────────────────────────────────┤
│             Hash tables              │
│        (aka “Index section”)         │
└──────────────────────────────────────┘


(There is no alignment or padding; everything is squished together as much as it will go.)

I believe the sections are ordered like this so that, when writing the database, the writer can:

1. Reserve 2048 bytes at the start of the file
2. Dump all the input into the data section and forget about it, retaining only the hash and offset
3. Write the hash tables, and go back to fill in the TOC.

This means cdbmake only needs to keep 16 bytes per record (the hash and offset) in memory while preparing the database; the keys and data can total more than available memory without causing any problems.
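To make that write path concrete, here’s a minimal in-memory sketch in JavaScript. The function name `cdbMake` is my own, and for simplicity it buffers everything in memory; a real implementation like tinycdb’s cdbmake streams records to disk, keeping only the 16 bytes of hash and offset per record:

```javascript
// CDB hash (see the next section): h = 5381; h = ((h << 5) + h) ^ byte.
function cdbHash(key) {
  let hv = 5381;
  for (const c of key) hv = (((hv << 5) + hv) ^ c) >>> 0;
  return hv;
}

// Build a .cdb image from [key, value] pairs of Uint8Arrays.
function cdbMake(entries) {
  // Pass 1: lay out the data section, remembering only {hash, offset}.
  let pos = 2048; // data section starts right after the TOC
  const pending = [];
  const chunks = [];
  for (const [key, value] of entries) {
    const rec = new Uint8Array(8 + key.length + value.length);
    const rv = new DataView(rec.buffer);
    rv.setUint32(0, key.length, true); // true = little-endian
    rv.setUint32(4, value.length, true);
    rec.set(key, 8);
    rec.set(value, 8 + key.length);
    chunks.push(rec);
    pending.push({ hash: cdbHash(key), offset: pos });
    pos += rec.length;
  }
  // Pass 2: build the 256 hash tables (each sized 2x its entry count,
  // as tinycdb does) and fill in the TOC.
  const toc = new Uint8Array(2048);
  const tocView = new DataView(toc.buffer);
  for (let t = 0; t < 256; t++) {
    const bucket = pending.filter((p) => p.hash % 256 === t);
    const slotCount = bucket.length * 2;
    tocView.setUint32(t * 8, pos, true);
    tocView.setUint32(t * 8 + 4, slotCount, true);
    const table = new Uint8Array(slotCount * 8);
    const tv = new DataView(table.buffer);
    for (const p of bucket) {
      // Linear probing from the slot the reader will try first.
      let s = (p.hash >>> 8) % slotCount;
      while (tv.getUint32(s * 8 + 4, true) !== 0) s = (s + 1) % slotCount;
      tv.setUint32(s * 8, p.hash, true);
      tv.setUint32(s * 8 + 4, p.offset, true);
    }
    chunks.push(table);
    pos += table.length;
  }
  // Assemble: TOC, then data section, then hash tables.
  const out = new Uint8Array(pos);
  out.set(toc, 0);
  let o = 2048;
  for (const c of chunks) { out.set(c, o); o += c.length; }
  return out;
}
```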

## Hash function

Since CDB files are based on hash tables, we need a hash function. CDB uses (seemingly arbitrarily) the following:

\begin{align*} &{\small\text{uint32 }}\texttt{hash}({\small\pmb{\normalsize[\,]}\text{byte }}key) \Rightarrow key\text{.reduce}(\\ &\hspace{2em} 5381,\\ &\hspace{2em} ({\small\text{uint32 }}hv, {\small\text{byte }} c) \Rightarrow \big(\; ((hv \mathbin{<\!\!<} 5) + hv) \mathbin{^\wedge} c \;\big)\\ &) \end{align*}

(Note that all hash calculations are done with ${\small\text{uint32}}$ values; I had some trouble because JavaScript’s << operator uses signed values. The fix for JS seems to be to add ‘>>> 0’ onto the end of the expression, to recast from ${\small\text{sint32}}$ to ${\small\text{uint32}}$.)
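For reference, here’s the whole hash function as it comes out in JavaScript, with that `>>> 0` fix applied (the function name is my own):

```javascript
// CDB hash: start at 5381; for each byte, h = ((h << 5) + h) ^ byte,
// keeping everything in uint32. The trailing ">>> 0" recasts JS's
// signed 32-bit shift/xor result back to an unsigned value.
function cdbHash(key) { // key: Uint8Array
  let hv = 5381;
  for (const c of key) {
    hv = (((hv << 5) + hv) ^ c) >>> 0;
  }
  return hv;
}
```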

## Looking up records

To look up a key in the database, first calculate $HV = \texttt{hash(}key\texttt{)}$, and then read the file as explained below.

The TOC (also known as the “initial pointers”) takes up the first 2048 bytes of the file $(256 \text{ entries} \times \frac{2 {\text{ uint32}}}{\text{entry}} \times \frac{4 \text{ bytes}}{\text{uint32}})$, and has the type:

$\texttt{toc} : \pmb{\big[} 256 \pmb{\big]} \{{\small\text{uint32le }}\texttt{fileOffset}, {\small\text{uint32le }}\texttt{slotCount}\}$
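Reading the TOC is straightforward with a `DataView`; a sketch (the function name is mine, and `buf` is assumed to be an ArrayBuffer holding at least the first 2048 bytes of the file):

```javascript
// Parse the 256 TOC entries: {fileOffset, slotCount} pairs of uint32le.
function readToc(buf) {
  const view = new DataView(buf);
  const toc = [];
  for (let i = 0; i < 256; i++) {
    toc.push({
      fileOffset: view.getUint32(i * 8, true), // true = little-endian
      slotCount: view.getUint32(i * 8 + 4, true),
    });
  }
  return toc;
}
```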

Each key is bucketed into a hash table $T$ by $T = (HV \bmod 256)$. If $\texttt{toc}[T].\texttt{slotCount}$ is zero, that entire hash table doesn’t exist; if a hash value goes into that bucket we can immediately conclude that the key isn’t in the database.

Hash table $T$ is found at $\texttt{toc}[T].\texttt{fileOffset}$ bytes into the file, and has the type:

$\texttt{table} : \pmb{\big[} \texttt{toc}[T].\texttt{slotCount} \pmb{\big]} \{{\small\text{uint32le }} \texttt{hash}, {\small\text{uint32le }}\texttt{fileOffset}\}$

To find a hash $HV$ in the hash table, start at slot $S = \big((HV \mathbin{>\!\!>} 8) \bmod \texttt{toc}[T].\texttt{slotCount}\big)$. ($>\!\!>8$ is equivalent to $\div\:256$, rounding down. For JavaScript, again, I had to use >>> 8 instead to get unsigned semantics.) Then proceed through the table; for each slot:

• If $\texttt{table}[S].\texttt{fileOffset}$ is zero (an “empty slot”), stop; there are no more slots relevant to this hash value.
• If $\texttt{table}[S].\texttt{hash}$ equals $HV$, check the record at $\texttt{table}[S].\texttt{fileOffset}$ (see below).

If you get to the end of the table before you stop, wrap around to the beginning (at $\texttt{toc}[T].\texttt{fileOffset}$) and continue until you get to an empty slot or come back to where you started.

The data section contains all of the “records” (key+value pairs) in the order they were added to the database; each one has the type:

$\{ {\small\text{uint32le }}\texttt{keyLength}, {\small\text{uint32le }}\texttt{valueLength}, {\small \pmb{\normalsize[} \texttt{keyLength} \pmb{\normalsize]} \text{byte }} \texttt{key}, {\small \pmb{\normalsize[} \texttt{valueLength} \pmb{\normalsize]} \text{byte }} \texttt{value} \}$

Normally, you’d find a record by getting a $\texttt{fileOffset}$ from the hash table, but it’s also possible to read all records in a file sequentially by starting at byte 2048 (just after the TOC) and scanning through records until byte $\min(...\texttt{toc[}{*}\texttt{].fileOffset})$ (just before the first hash table).
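A sketch of that sequential scan in JavaScript (the generator name is mine; `buf` is assumed to be an ArrayBuffer holding the whole file):

```javascript
// Walk the data section record by record, from byte 2048 up to the
// start of the first hash table (the minimum TOC fileOffset).
function* cdbRecords(buf) {
  const view = new DataView(buf);
  const bytes = new Uint8Array(buf);
  let end = Infinity;
  for (let t = 0; t < 256; t++) {
    end = Math.min(end, view.getUint32(t * 8, true));
  }
  let pos = 2048;
  while (pos < end) {
    const keyLength = view.getUint32(pos, true);
    const valueLength = view.getUint32(pos + 4, true);
    yield {
      key: bytes.slice(pos + 8, pos + 8 + keyLength),
      value: bytes.slice(pos + 8 + keyLength, pos + 8 + keyLength + valueLength),
    };
    pos += 8 + keyLength + valueLength;
  }
}
```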

When a slot with a matching hash is found in the hash table:

1. Look up the record starting at $R = \texttt{table}[S].\texttt{fileOffset}$ bytes into the file.
2. If $\texttt{file}[R].\texttt{keyLength}$ does not equal $\text{len}(key)$: return to reading the hash table; don’t bother looking at $\texttt{file}[R].\texttt{key}$ since if the length is different it can’t possibly be the same key. (This happens when there’s a hash collision with a key of a different length.)
3. If $\texttt{file}[R].\texttt{key}$ does not equal $key$: return. (This happens when there’s a hash collision with a key of the same length.)
4. Yield $\texttt{file}[R].\texttt{value}$, then return to reading the hash table for additional values.
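Putting the whole lookup together, here’s a minimal sketch in JavaScript, assuming the whole file fits in memory as an ArrayBuffer (a real reader would seek instead; the names are mine):

```javascript
// CDB hash, as described earlier.
function cdbHash(key) {
  let hv = 5381;
  for (const c of key) hv = (((hv << 5) + hv) ^ c) >>> 0;
  return hv;
}

// Return every value stored for `key` (a Uint8Array), as Uint8Arrays.
function cdbGet(buf, key) {
  const view = new DataView(buf);
  const bytes = new Uint8Array(buf);
  const hv = cdbHash(key);
  const t = hv % 256; // which of the 256 hash tables
  const tableOffset = view.getUint32(t * 8, true);
  const slotCount = view.getUint32(t * 8 + 4, true);
  if (slotCount === 0) return []; // table doesn't exist: key can't be present

  const values = [];
  const start = (hv >>> 8) % slotCount; // initial slot
  for (let i = 0; i < slotCount; i++) {
    const slot = tableOffset + ((start + i) % slotCount) * 8;
    const slotHash = view.getUint32(slot, true);
    const recOffset = view.getUint32(slot + 4, true);
    if (recOffset === 0) break; // empty slot: no more candidates
    if (slotHash !== hv) continue;
    const keyLength = view.getUint32(recOffset, true);
    const valueLength = view.getUint32(recOffset + 4, true);
    if (keyLength !== key.length) continue; // collision, different length
    const recKey = bytes.subarray(recOffset + 8, recOffset + 8 + keyLength);
    if (!recKey.every((b, j) => b === key[j])) continue; // collision, same length
    values.push(bytes.slice(recOffset + 8 + keyLength,
                            recOffset + 8 + keyLength + valueLength));
  }
  return values;
}
```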

## Conclusion

And with that successful lookup, so ends my little tour of the CDB file format. This was the first time in a while that I’ve done any work involving binary formats, so it was an interesting diversion. I hope this description gives you an idea of how cdb works, and why I’ve decided to use it as the basis for this new feature.