Tuesday, January 11, 2011

PL3: May we dance with your dates?

This is the third entry in my series on the 9 coding areas that pseudo-localization can test.

Amongst ourselves in internationalization, we refer to certain types of data as locale-sensitive. This is data whose representation, typically its format, changes from region to region (locale to locale). The classic example is a date. And the classic example of this example is the short format: month/day/year in the US becomes day/month/year in most European countries. Today is January 11, 2011, written 1/11/11 in the US, but 11/1/11 in the UK. (Often this is accompanied by some US baiting about how illogical that format is, but we won't go there and mention that in human terms, the day becomes less important beyond a week or so, and the month comes into prominence, lasting an entire year, when finally the year is needed, so it's stuck on the end. And it's written month day, year in longer formats, as above. But I won't mention any of that here.)
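To make the ambiguity concrete, here is a minimal Python sketch that reads the same string, 1/11/11, under both conventions. It uses only the standard library's datetime.strptime; the variable names are mine, chosen for illustration.

```python
from datetime import datetime

text = "1/11/11"

# US short format: month/day/year
as_us = datetime.strptime(text, "%m/%d/%y").date()

# UK short format: day/month/year
as_uk = datetime.strptime(text, "%d/%m/%y").date()

print(as_us.isoformat())  # 2011-01-11
print(as_uk.isoformat())  # 2011-11-01
```

Same characters, two dates nearly ten months apart, and nothing in the string itself says which reading is intended.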

There are many more locale-sensitive pieces of data: numbers, time, prices, measurements, weights, telephone numbers, addresses, sizes, etc. Some of these data formats have been standardized, so that programs can select them using locale identifiers and apply them to the data. The Common Locale Data Repository (CLDR) is a public database with locale formats for many locales throughout the world (though admittedly there are no sizes, nor telephone numbers, nor postal addresses). Internationalization library functions and methods use it for formatting and parsing locale-sensitive data.

As these formats can be accessed programmatically for most of the locales of the world, they should be. That is, rather than externalize a date format for a localizer to alter to suit another locale, dates should be programmatically formatted. Why? The long answer is another blog entry; the short answer is that there are many more formats than there are localizations. For example, English is usually a single localization, but formats for English-speaking locales vary (see the short date format example above). The same is true for French, Spanish, Traditional Chinese, and so on.
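A rough sketch of that split, in Python: translated strings are keyed by language, while date formats are keyed by the full locale. Everything here is hypothetical and for illustration only. The MESSAGES strings, the SHORT_DATE table, and the function names are my own, and the patterns are simplified stand-ins so the sketch runs without third-party packages. Real code would ask an internationalization library backed by the CLDR rather than keep its own table.

```python
from datetime import date

# One set of translated strings per language (hypothetical catalog).
MESSAGES = {
    "en": "Last updated: {when}",
    "fr": "Mise à jour : {when}",
}

# Illustrative short date patterns per locale. This is a simplified
# stand-in for library data, not a copy of the CLDR.
SHORT_DATE = {
    "en_US": "%m/%d/%y",  # month/day/year
    "en_GB": "%d/%m/%y",  # day/month/year
    "fr_FR": "%d/%m/%y",  # day/month/year
    "fr_CA": "%Y-%m-%d",  # year-month-day
}


def format_short_date(d, locale_id):
    """Format a date using the pattern for the full locale."""
    return d.strftime(SHORT_DATE[locale_id])


def render(d, locale_id):
    """Pick the string by language, but the date format by locale."""
    language = locale_id.split("_")[0]
    return MESSAGES[language].format(when=format_short_date(d, locale_id))


today = date(2011, 1, 11)
for loc in ("en_US", "en_GB", "fr_FR", "fr_CA"):
    print(loc, "->", render(today, loc))
```

Two localizations, four formats. Had the date pattern lived inside the translated string, en_GB and fr_CA would have needed their own localizations just to get the date right.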

But what does all this have to do with pseudo-localization testing?

You may remember my post about using pseudo-l10n to test whether localized files are picked up when they should be. Or perhaps you don't, in which case you might want to review it. Anyway, once you've set up your system so that the pseudo-localized files mimic an actual locale, you can exploit that setup to check whether locale-sensitive formats are programmatically determined.

Run the system with the locale set to the pseudo-localized files' locale, and pay special attention to the formats of dates and numbers. Now change the locale setting to one that has different formats and run again. Check the dates and numbers - have they changed format? You might want to verify the formats against those listed in the CLDR; if you're looking online, check the summary charts. Start with the language base link, then drill down to get to the specific locale you're looking for.
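If your system can be driven from a script, the check described above can be roughed out in a few lines of Python. This is only a sketch under assumptions: hard_coded and locale_aware are hypothetical stand-ins for whatever your system renders, and the PATTERNS table is illustrative rather than taken from the CLDR. It catches the case where nothing changes at all; whether each format is actually right for its locale still needs a look at the charts.

```python
from datetime import date

# Illustrative patterns only; a real system would get these from an
# internationalization library.
PATTERNS = {
    "en_US": "%m/%d/%y",
    "en_GB": "%d/%m/%y",
    "de_DE": "%d.%m.%y",
}


def hard_coded(d, locale_id):
    """Stand-in for a buggy system: ignores the locale, always US order."""
    return d.strftime("%m/%d/%y")


def locale_aware(d, locale_id):
    """Stand-in for a system that formats according to the locale."""
    return d.strftime(PATTERNS[locale_id])


def formats_vary(render, locale_ids, sample=date(2011, 1, 11)):
    """Render one sample date under each locale and report whether
    at least two of the outputs differ."""
    outputs = {loc: render(sample, loc) for loc in locale_ids}
    for loc, text in outputs.items():
        print(f"{loc}: {text}")
    return len(set(outputs.values())) > 1


locales = ["en_US", "en_GB", "de_DE"]
print("hard_coded varies:", formats_vary(hard_coded, locales))
print("locale_aware varies:", formats_vary(locale_aware, locales))
```

One practical note: pick a sample date whose day and month differ and whose day is 12 or less. January 1 looks the same in either order, and January 13 gives the game away.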

Change the locale to a third value and check again. The formats should be changing appropriately each time. If not, then to your customers, today might just look like November 1, 2011.
