A lot of people commented on my last entry with the “Unicode is not an encoding” retort. This is apparently a pedantic point that people love to make to prove that they know more about internationalization of text than you do.
The first such comment:
Unicode is not an encoding - It’s a charset. It’s also the only sane way to represent characters in memory.
I’ll see your pedantry and raise you one. Unicode is not only a character set; it’s a character set plus three associated encodings. I will refer you to the Unicode Standard 5.0, section 2.5:
The Unicode Standard provides three distinct encoding forms for Unicode characters, using 8-bit, 16-bit, and 32-bit units. These are named UTF-8, UTF-16, and UTF-32 respectively.
These three encodings are part of the Unicode Standard, and thus are part of Unicode.
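To make the split concrete: the character set assigns each character a code point, and each of the three encoding forms turns those same code points into a different sequence of 8-, 16-, or 32-bit code units. Here is a minimal Python sketch. The sample string and the helper `code_units` are my own illustrative choices, not anything from the standard. I use the explicit big-endian codecs so that Python doesn't prepend a byte order mark and skew the byte counts.

```python
# Hypothetical sample: 'A', 'é', '€', and an emoji outside the BMP.
text = "A\u00e9\u20ac\U0001F600"

# Character set side: each character is just a code point number.
code_points = [hex(ord(ch)) for ch in text]

# Encoding side: the three encoding forms, with their code unit size in bytes.
# Big-endian codec names avoid a BOM, so bytes map cleanly to code units.
forms = {
    "UTF-8": ("utf-8", 1),
    "UTF-16": ("utf-16-be", 2),
    "UTF-32": ("utf-32-be", 4),
}

def code_units(s, codec, unit_size):
    """Hypothetical helper: number of code units s occupies in the given form."""
    return len(s.encode(codec)) // unit_size

print("code points:", code_points)
for name, (codec, unit_size) in forms.items():
    n_bytes = len(text.encode(codec))
    print(f"{name}: {n_bytes} bytes, {code_units(text, codec, unit_size)} code units")
```

The same four code points come out as 10 UTF-8 code units, 5 UTF-16 code units (the emoji needs a surrogate pair), and 4 UTF-32 code units. One character set, three encodings, all defined by the one standard.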
I will furthermore refer you to the first sentence of the standard:
The Unicode Standard is the universal character encoding standard for written characters and text.
I will furthermore refer you to the passage two paragraphs later, which speaks of “The Unicode character encoding.”
I will furthermore point out that the word “encoding” or “encode” appears 10 times in the opening section of the standard, while the word “set” (or “character set”) appears exactly once, and even then it refers to ASCII (“the ASCII character set”), not Unicode.
If the Unicode Standard itself is allowed to refer to Unicode as a whole as an “encoding,” then so am I. Though I hate to deprive the pedants of a point they so love to correct other people on.