JavaScript’s bitwise operators, demystified with diagrams
As I mentioned briefly in my post on the CDB file format, I had some trouble understanding how the JavaScript bitwise operators worked. I couldn’t find a good resource that clearly explained this, so now that I’ve spent a bunch of time figuring this out, here are some diagrams to help clarify the situation.
(Apologies if you’re reading on a phone; the pictures in this one will end up pretty tiny.)
Sorry, wrong Number
A JavaScript Number
is an IEE 754 float64
, with a sign bit, 11 bits of exponent, and 52 bits of fraction:
However, the bitwise operators do not operate on this format! Confusingly, they all work as though they’re operating on a 32-bit integer even though that data type doesn’t appear anywhere else in JS. This means that the bits that the “bitwise” operators operate on have nothing to do with the bits of the Number
; instead, the operators effectively:
- Convert the
Number
into asint32be
(I’ll explain what that looks like below) - Perform the specified bitwise operation
- Convert the resulting value back into a
float64
That is, the bitwise operators take the value of the number (truncated if it’s too big/has a fractional part), but work as though it’s a completely different binary format:
In practice, I suspect this isn’t quite as bad as it sounds: I assume JS engines do some optimization here. But it does mean that you have to ignore the fact that a Number is usually floating point, which is confusing.
Two’s complement (and 32’s a crowd)
So if it’s not a float64
, what is the format that the bitwise operators operate on? It’s:
- 32 bits
- Big endian
- Signed, two’s complement (exception:
>>> 0
interprets the bits as an unsigned int32; see below)
Whew! Let’s break that down. For the rest of this article, I’m going to be representing these numbers with boxes like this:
As you can see, there’s 32 of them; which covers the “32 bits” part. And the highest bits (the “big end”) of the number comes first, so that’s what “big endian” means. (For the bitwise operators, it doesn’t make a difference how the computer actually stores them; but since there are operators called “left shift” and “right shift”, it’s important to know which direction they go conceptually.)
Finally, “two’s complement”, which is a little more complicated to explain. I’ll do it with 8-bit numbers, since those are shorter. First, positive numbers (and zero) look the way you’d expect from basic binary:
0000 0000 = 0
0000 0001 = 1
0000 0010 = 2
0000 0011 = 3
...
0111 1110 = 126
0111 1111 = 127
As you can see, all the positive numbers have their first bit set to 0. Once that bit flips, we change the sign and start counting down.
1000 0000 = -128
1000 0001 = -127
1000 0010 = -126
...
1111 1110 = -2
1111 1111 = -1
I always have to sit down and work out how this happens mathematically, so I’m not going to try to explain it. Suffice it to say, this is why when the first bit is set, the numbers go negative and act a bit weirder. See the discussion of >>> 0
below if you’d like unsigned values.
Hello, Operator?
Logical operators
The bitwise logical operators simply apply an operation to each of the 32 bits in the value, and produce a corresponding bit in the output value. They don’t treat the sign bit specially, which as mentioned above can result in magnitude changes in the resulting Number, but otherwise are straightforward; I summarize them here for completeness:
A & B - sets output bit if bit is set in both A and B
A | B - " " " " " " " " either A or B
A ^ B - " " " " " " " " A but not B, or vice versa
~A - inversion of bits set in A
Shift operators
The shift operators have the format value <<|>> N
, and shift the 32 bits of value
in the specified direction by N
positions. (If N
is more than 32 it wraps around, but you probably shouldn’t do that to avoid confusion.)
<< - Left shift
Left shift, as the name implies, shifts the bits left; the leftmost N
bits of the value
are discarded and the rightmost N
bits are set to zero.
Note that this will result in the sign bit being set to whatever the Nth
bit happened to be, so the value may shift in magnitude unexpectedly.
>> - Sign-propagating right shift
Also known as “sign-preserving”, “sign-filling”, “arithmetic” right shift, or just “right shift”, this operator similarly shifts the bits right, discarding the rightmost N
bits. However, the leftmost N
bits are set to the same as the original sign bit.
(For values less than 231, where the sign bit is zero, this is the same as >>>
.)
>>> - Unsigned right shift
Also known as “zero-fill right shift”, this shifts the bits to the right in the same way as >>
, but fills in the leftmost N
bits with zeros. (This of course results in a positive number, regardless of the sign of the original value.)
This is specified as treating the value as an unsigned integer, but since it sticks zeros in the sign anyway that makes no difference except in the special case of >>> 0
explained below.
>>> 0 - Convert as unsigned
Unsigned right shift by zero is an unusual special case. It looks like it shouldn’t cause anything to happen:
However, because >>>
is specified as interpreting its operand as an unsigned value, when the bit pattern is converted back into a Number
, it also comes out unsigned! This means that, since the values are treated as “just bit patterns” by the rest of the operators, you can wrap a complicated bit manipulation in (...) >>> 0
and it will come out as though the entire calculation had been done with unsigned integers, even if some of the intermediate values look negative when examined.
This also allows for convenient way of examining the bit patterns when experimenting with operators:
((expression)>>>0).toString(2).padStart(32, '0') // Print bit pattern resulting from expression
<< 0 - Convert as signed
Similarly, though probably less necessarily, << 0
or >> 0
(or |0
, or &0xFFFF_FFFF
, or various other incantations) can be used to convert a Number which represents a bit pattern that was treated as unsigned back into one where the sign bit is interpreted. (That is, val >>> 0
and val << 0
both represent the same bit pattern, but when the sign bit is set they come out as two very different-looking Numbers.)
Conclusion
I hope this post cleared up some of the fog around bitwise operators. I’ve always found them to be one of the more confusing parts of the language, especially for how simple they look. I suspect there’s some early history that led to the situation we have today, but armed with these diagrams I’ve been able to work out what I need to get my CDB reader implementation working as it ought.