@Chris K: yes, tagging every value with a key does give protocol buffers nice forwards and backwards compatibility. It’s also true that if you use a repeated field as an array, you pay at least 1 byte (the tag) per element. That is as bad as 100% overhead if each element is a single byte, or as low as 12.5% if each element is an 8-byte integer (such as a fixed64).
@simon: I want to stay 100% C so that the binaries stay as small as possible (currently 20 KB compiled) and impose as few requirements on the host platform as humanly possible. You can call C from anything (even Objective-C or LLVM assembly language). To me, staying 100% C is the surest way to be useful in as many weird situations as possible.
@Dave: Cool, though my intention is to write it so that no code generation is required, which lets bindings for higher-level languages like Perl, Python, and Ruby use it directly. So my intended design differs a little from the existing C implementation.
@Eric: thanks, man. I had fun at Amazon and definitely remember coming across your name in association with good things!