Monday, January 21, 2013

PL7: I-O, I-O, so off to work I go

I've said this before, but it bears repeating: People buy software to do something to data. For a more extensive discussion of this, see the Internationalization Myth #1 blog post. Go on and read it; it's very short, and I can wait.

There, that didn't take long. Now that my meaning is clear, you can see how important it is to test software to find out whether it can handle data in a particular character encoding. But where can you get the encoded data to run through the system in a hurry? Easy: just take your test data files, of which you most certainly have an extensive collection, and pseudo-localize them. You might need to tweak your pseudo-localization tools a bit, but it's well worth it.
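To make that concrete, here is a minimal sketch of such a tool in Python. The character mapping is purely illustrative; a real pseudo-localizer would cover the whole ASCII range and draw from a much broader cross-section of Unicode.

```python
# A minimal pseudo-localization sketch. The mapping below is
# illustrative only; a real tool would map the full ASCII range.

PSEUDO_MAP = {
    "a": "ą", "e": "ϵ", "i": "ἷ", "o": "ᴑ", "u": "ט",
    "A": "Å", "E": "Ɛ", "I": "Ɨ", "O": "Ø", "U": "Ṳ",
    "s": "ѕ", "S": "§", "m": "ʍ", "t": "ṫ", "n": "ñ",
}

def pseudo_localize(text: str) -> str:
    """Replace ASCII letters with visually similar Unicode characters."""
    return "".join(PSEUDO_MAP.get(ch, ch) for ch in text)

def pseudo_localize_file(src_path: str, dst_path: str) -> None:
    """Read a UTF-8 test data file and write a pseudo-localized copy."""
    with open(src_path, encoding="utf-8") as src:
        data = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(pseudo_localize(data))

print(pseudo_localize("Some pseudo localized text"))
```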

If the data is pseudo-localized with characters from a broad cross-section of Unicode, you'll be able to see whether Unicode is processed and output correctly, without mangling anything. Using other character sets, such as EUC-JP, will show whether your software can handle those as well. It doesn't matter whether it's supposed to convert, reject, or process these character sets directly, so long as it handles them in a manner appropriate for the markets you want to sell to.
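As a rough illustration, the sketch below round-trips EUC-JP bytes through a stand-in for the system under test. Note that run_pipeline() is a hypothetical placeholder, not a real API; the actual check would depend on whether your software is supposed to convert, reject, or pass the encoding through.

```python
# A minimal sketch: generate EUC-JP test data and verify that a
# round trip through your pipeline leaves the bytes intact.

def run_pipeline(data: bytes) -> bytes:
    # Hypothetical placeholder: in real testing this would invoke
    # the software under test (store, convert, transmit, etc.).
    return data

original = "日本語のテストデータ".encode("euc_jp")
result = run_pipeline(original)

assert result == original, "EUC-JP data was corrupted in transit"
# Decoding both sides also catches corruption that happens to
# produce byte sequences that are still valid EUC-JP.
assert result.decode("euc_jp") == original.decode("euc_jp")
print("EUC-JP handled without mangling")
```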

Check the output and make sure the data has not been corrupted.  For example, if the phrase "Some pseudo localized text" were pseudo-localized with a cross-section of Unicode characters, it might look like this:
§õʍϵ Рѕәטᴆᴑ ᴌᴼᶜᶏḼἷ₹ﭺ口 下王丈৳
Just in case it doesn't get to you unmangled via this blog, or you don't have the fonts loaded, here is an image:
[Image: the same pseudo-localized sample text, rendered as a graphic.]
If the software you're testing is supposed to store the text in a database and then retrieve it intact, make sure that the output exactly matches the input. If it's supposed to break the text into individual characters, check that the characters are the same. You get the idea. With pseudo-localization, it should be easy to automate the testing (provided the testing software is properly internationalized) as well as to test it visually.
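One way to automate such a store-and-retrieve check, sketched here with Python's built-in sqlite3 module standing in for whatever database your product actually uses:

```python
# A self-contained round-trip check: store pseudo-localized text,
# read it back, and verify nothing was corrupted along the way.

import sqlite3

pseudo_text = "§õʍϵ Рѕәטᴆᴑ ᴌᴼᶜᶏḼἷ₹ﭺ口 下王丈৳"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val TEXT)")
conn.execute("INSERT INTO t VALUES (?)", (pseudo_text,))

(retrieved,) = conn.execute("SELECT val FROM t").fetchone()

# The output must exactly match the input.
assert retrieved == pseudo_text, "pseudo-localized text was corrupted"
# If the software breaks text into individual characters, compare
# those too, character for character.
assert list(retrieved) == list(pseudo_text)
print("round trip preserved all", len(pseudo_text), "characters")
```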

Please don't consider this a substitute for proper multilingual testing. Each language has its own formatting and processing rules that cannot be tested via a simple pseudo-localization. The classic example is Thai, which has no spaces or syntactic characters to divide words. In order to parse Thai words correctly, a dictionary must be used, along with some grammatical processing. Incorrect processing opens up big cans of worms. For the sake of your users, create some actual input test data from the languages used in your markets.
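For illustration, here is a sketch of dictionary-based Thai word breaking using ICU via the PyICU package (assumed to be installed); ICU's word BreakIterator segments Thai using its built-in dictionary, which is exactly the kind of processing a simple pseudo-localization won't exercise.

```python
# Thai has no spaces between words, so word breaking needs a
# dictionary. ICU's word BreakIterator does this for Thai.

import icu

# "Thai has no spaces between words"
text = "ภาษาไทยไม่มีช่องว่างระหว่างคำ"

breaker = icu.BreakIterator.createWordInstance(icu.Locale("th_TH"))
breaker.setText(text)

words, start = [], breaker.first()
for end in breaker:  # yields successive word boundary offsets
    word = text[start:end]
    if word.strip():
        words.append(word)
    start = end
print(words)
```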

And remember that this does not test locale-specific data handling.  That data will also have to be created to represent locales in your target markets.

Test the I-O, your 亹stomers wiѭ th⌘k yoú for ☃.  Or at least they'll be able to.