Thursday, November 09, 2006

Myth #4 - Are we having fun yet?

And so the myths go on, and internationalization folks keep chuckling with a look of irony on their faces. Here is Myth #4:

"My product supports Unicode and therefore it's internationalized."

Interesting. Of course, being a myth, it isn't true. Why isn't it true? Doesn't Unicode cover all the world's languages? Would that it were that easy. In brief, Unicode is a coded character set (see RFC 2130, "The Report of the IAB Character Set Workshop"), that is, a set of characters in which each character is assigned an integer, its code point. It also defines character encoding schemes (and encoding forms, but we won't get into that) that translate those integers into the sequences of bytes that computers store and transmit. OK, so what does all this have to do with the myth? The point is that only characters, or parts of characters, are encoded. There is no language information, no locale information, no font information, and very little rendering information. What's more, the requirements for claiming Unicode support are very lenient: a conforming product is not obliged to interpret every character. In other words, your product can "support Unicode" and yet recognize only a single character. So if your product relies on Unicode support alone for its internationalization, not only are you missing information vital to internationalization, but you might not even handle the languages that Unicode covers.
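To make the distinction concrete, here is a small sketch in Python. The language choice is mine, and the post does not prescribe it. The sketch uses only the built-in `str` type and the standard `unicodedata` module. It shows characters mapped to code points, the same code points encoded as bytes in two different ways, a single visible "é" stored as two code points (a base letter plus a combining accent), and a Turkish casing case. That last case is an added illustration of how Unicode's default mappings carry no language information.

```python
import unicodedata

def code_points(text: str) -> list[str]:
    """Return each character's code point in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

# Coded character set: each character is assigned an integer (code point).
word = "caf\u00e9"  # "café" with a precomposed é (U+00E9)
print(code_points(word))  # ['U+0063', 'U+0061', 'U+0066', 'U+00E9']

# Encoding schemes turn the same code points into different byte sequences.
utf8_bytes = word.encode("utf-8")        # b'caf\xc3\xa9'
utf16_bytes = word.encode("utf-16-be")   # b'\x00c\x00a\x00f\x00\xe9'
print(utf8_bytes, utf16_bytes)

# Only characters, or parts of characters, are encoded: the same visible
# "é" can also be "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFD", word)
print(len(word), len(decomposed))                         # 4 5
print(word == decomposed)                                 # False
print(unicodedata.normalize("NFC", decomposed) == word)   # True

# No language information: the default case mapping is the same for every
# language, so Turkish dotted/dotless i needs locale-aware handling elsewhere.
print("i".upper())  # prints I, whereas Turkish expects U+0130 (İ)
```

None of this tells the program which language "café" is in, how to sort it, or which font to draw it with. That information has to come from somewhere other than the character set.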

And therein lies the myth. By all means, support Unicode. Support all the currently assigned characters. Unicode is a handy tool for supporting languages around the world. But it is not an internationalization silver bullet panacea elixir cure-all.