@Carl: Your noun vs. adjective stuff draws a distinction that the Unicode standard itself does not make. The standard speaks of “The Unicode character encoding” in reference to the character set, not the byte-encoding:
“The Unicode character encoding treats alphabetic characters, ideographic characters, and symbols equivalently, which means they can be used in any mixture and with equal facility.”
also:
“The Unicode Standard specifies a numeric value (code point) and a name for each of its characters. In this respect, it is similar to other character encoding standards from ASCII onward.”
According to you, the above statements are nonsense, because they concern the character set, not the byte-encoding. And yet this is straight out of page 1 of the standard. You’re going to have to admit that there’s some latitude in terminology here, unless this list of people who “don’t know the difference between an encoding and a character set” includes the Unicode Standard 5.0, page 1.
I’m quite aware of the difference between a character set and a character encoding, when this specific distinction is drawn. Python is only capable of representing Unicode characters internally, which means that round-trips through encodings whose mapping into Unicode is not injective (two distinct byte sequences decoding to the same characters) will be lossy.
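To make the lossy round-trip concrete, here is a minimal Python 3 sketch. It assumes the standard `utf-7` and `latin-1` codecs; the helper name `round_trips` is made up for illustration. UTF-7 is just a convenient stand-in for any non-injective decoding: the bytes `+AEE-` and the bytes `A` both decode to the same one-character string, so once the text is held as Unicode there is no way to tell which byte form it came from.

```python
def round_trips(raw: bytes, codec: str) -> bool:
    """Return True if decoding and re-encoding gives back the same bytes.

    Hypothetical helper, for illustration only.
    """
    return raw.decode(codec).encode(codec) == raw


# Two different UTF-7 byte sequences for the same character U+0041.
shifted = b"+AEE-"   # 'A' written as a base64 shift sequence
direct = b"A"        # 'A' written directly

# Both decode to the same Unicode string, so the decoding is not injective.
print(shifted.decode("utf-7") == direct.decode("utf-7"))

# The original byte form is lost on the way back out.
print(round_trips(shifted, "utf-7"))

# Contrast: latin-1 maps each byte to a distinct code point, so it survives.
print(round_trips(bytes(range(256)), "latin-1"))
```

The same effect should show up with any legacy encoding that has duplicate mappings or redundant escape sequences; UTF-7 is used here only because the duplicate is easy to see.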