Sunday, August 11, 2013

PL8: On the edge

Yet another of my nine coding areas that pseudo-localization can test is expansion and edge-case handling of I/O data.

In PL5: Expanding your universe, I discussed how pseudo-localization of the user interface can test the effect of translation expansion on the look and layout. Here I'm focusing on I/O. If input/output data is pseudo-localized such that strings are significantly expanded, it can verify the handling of expanded strings. This is not just a display issue; it is also, and more significantly, a processing issue. Once again, I repeat my mantra: people buy software to do something to their data. If that something is not done correctly, the software is not useful to them, and they will look for software that is.

What tends to be difficult to determine is what an input field's maximum is (and should be). There is a count question: should the limit be in bytes or characters? This is less important than determining how many characters should logically be allowed in a particular field. Even fields that seem fixed in one country can vary considerably in another. For example, postal codes can vary from a length of zero (no code) to as many as 12 characters (for some US territories). This brings up another question: should the length restrictions vary with the locale? (Or with the localization?) The same questions hold for a minimum size.
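To make the bytes-versus-characters question concrete, here is a minimal sketch (the field value and limits are made up for illustration) showing how the same string can pass a character limit yet fail a byte limit once encoded:

```python
def fits_char_limit(text: str, max_chars: int) -> bool:
    """Limit measured in characters (Unicode code points)."""
    return len(text) <= max_chars

def fits_byte_limit(text: str, max_bytes: int, encoding: str = "utf-8") -> bool:
    """Limit measured in encoded bytes."""
    return len(text.encode(encoding)) <= max_bytes

field = "日本語テスト"                 # 6 characters
print(fits_char_limit(field, 12))      # True:  6 characters <= 12
print(fits_byte_limit(field, 12))      # False: 18 UTF-8 bytes > 12
```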

Determining what should happen when the data exceeds the maximum is important. Should the data be truncated or rejected? Should a warning message pop up if it's a user input field (as opposed to programmatic input), or should the user simply not be allowed to continue typing? Again, similar questions need to be answered for data below the minimum.
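As a design sketch (accept_field and its policy names are hypothetical, not from any particular product), the over-maximum behavior can be made an explicit, testable decision rather than an accident of whatever the storage layer happens to do:

```python
def accept_field(value: str, max_chars: int, policy: str = "reject") -> str:
    """Apply a defined policy when input exceeds the field maximum."""
    if len(value) <= max_chars:
        return value
    if policy == "truncate":
        return value[:max_chars]      # silently shorten to the limit
    if policy == "reject":
        raise ValueError(f"input exceeds {max_chars} characters")
    raise ValueError(f"unknown policy: {policy!r}")
```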

All of the above is in the realm of design rather than testing, but the point is that the desired behavior should be defined. Pseudo-localization testing will show what actually happens in a given circumstance, but if the behavior is undefined, it is difficult to decide what sort of pseudo-localization needs to be run and how many locales should be tested.

Similar to PL5, you're looking at the layout and aesthetics of the UI with the expanded text: is it wrapping awkwardly? Is it overlapping other elements? Is it skewing the overall layout? More important, though, is the integrity of the text: has it been truncated? Is the last character intact (in case the pseudo-localized string ends in a multibyte character)? Has some of the middle text been lost because over-maximum characters overlaid earlier characters in the string?
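Here is a quick sketch of the "is the last character intact?" check; the seven-byte limit is arbitrary, chosen so the cut lands mid-character:

```python
text = "résumé"                          # each 'é' is 2 bytes in UTF-8
raw = text.encode("utf-8")[:7]           # naive byte truncation at 7 bytes
                                         # slices the final 'é' in half
print(raw.decode("utf-8", errors="replace"))  # 'résum�' - mangled last character
print(raw.decode("utf-8", errors="ignore"))   # 'résum'  - dropped cleanly instead
```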

To the extent that I/O data will be displayed, pseudo-l10n can help test paging on smaller devices as well as on PCs and tablets. The expansion from the pseudo data will likely move things around, and it's important to determine what effect that has, and whether it causes a problem for users working in other languages.

At the risk of boring you with repetition, this is not a substitute for testing with multilingual data, simply a quick way to get data for early testing of these important cases.
Because you want to do unto data as your users expect you to do.

Wednesday, May 08, 2013

Linking all the Internationalization Myths

Since the Internationalization Myth series spans quite a stretch of time, I thought I'd put links to all the posts in one blog entry. So without further ado, here they are!
  1. Republishing the myths - background information on the series. 

  2. Myth #1: "Internationalization means externalizing the user interface so the software can be translated." 

  3. Myth #2: "Translators choose the best phrase in the target language." 

  4. Myth #3: "The code is in Java and therefore it's internationalized." 

  5. Myth #4: "My product supports Unicode and therefore it's internationalized." 

  6. Myth #5: "My product uses open source and so internationalization requirements don't apply." 

  7. Myth #6: "ISO-8859-1 is the standard encoding for HTML." (This one has almost gone away.)

  8. Myth #7: "All company employees speak English, so only English needs to be supported by internal tools."

  9. Myth #8: "Administration interfaces don't need internationalization." 

  10. Myth #9: "We've never localized this product/module/component/blidget, so it doesn't need internationalization." 

  11. Myth #10: "We added internationalization in the last release, so we're done." 

  12. Myth #11: "If something is wrong, our customers will tell us." 

  13. Myth #12: "My product works in Japanese, therefore it's internationalized." 

  14. Myth #13: "Internationalization is implemented after the base product and is written by a separate group of engineers." 

  15. Myth #14: "Internationalization is only needed in the software development department."

  16. Myth #15: "Internationalization means making the code easily localizable." 

Monday, January 21, 2013

PL7: I-O, I-O, so off to work I go

I've said this before, but it bears repeating:  People buy software to do something to data.  For a more extensive discussion of this, see the Internationalization Myth #1 blog.  Go on and read it, it's very short, and I can wait.

There, that didn't take long. And now that my meaning is clear, you can see how important it is to test software to find out whether it can handle data in a particular character encoding. But the problem is: where can you get the encoded data to run through the system in a hurry? Easy. Just take your test data files, of which you most certainly have an extensive collection, and pseudo-localize them. You might need to tweak your pseudo-localization tools a bit, but it's well worth it.

If the data is pseudo-localized with characters from a broad cross-section of Unicode, you'll be able to see whether Unicode is processed and output correctly, without mangling anything. Using other character sets, such as EUC-JP, will show whether your software can handle those as well. It doesn't matter whether it's supposed to convert, reject, or process these character sets directly, so long as it handles them in a manner appropriate for the markets you want to sell to.
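A minimal pseudo-localizer sketch in Python (the lookalike table here is a tiny made-up sample; a real tool would cover the full alphabet and draw from a broader cross-section of Unicode blocks):

```python
# Map some ASCII letters to visually similar characters from other
# Unicode blocks; extend the table for real testing.
LOOKALIKES = str.maketrans({
    "a": "ä", "e": "ϵ", "i": "ἷ", "o": "ᴑ", "u": "ט",
    "s": "ѕ", "t": "ƭ", "d": "ᴆ", "l": "Ḽ", "z": "ʐ",
})

def pseudo_localize(text: str) -> str:
    """Replace ASCII letters with non-ASCII lookalikes."""
    return text.translate(LOOKALIKES)

print(pseudo_localize("Some pseudo localized text"))
# -> Sᴑmϵ pѕϵטᴆᴑ ḼᴑcäḼἷʐϵᴆ ƭϵxƭ
```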

Check the output and make sure the data has not been corrupted.  For example, if the phrase "Some pseudo localized text" were pseudo-localized with a cross-section of Unicode characters, it might look like this:
§õʍϵ Рѕәטᴆᴑ ᴌᴼᶜᶏḼἷ₹ﭺ口 下王丈৳
Just in case it doesn't reach you unmangled via this blog, or you don't have the fonts loaded, here it is as an image: [image of the pseudo-localized string above]
If the software you're testing is supposed to store the text in a database and then retrieve it intact, make sure that the output exactly matches the input. If it's supposed to break the text into individual characters, check that the characters are the same. You get the idea. With pseudo-localization, it should be easy to automate the testing (provided the testing software is properly internationalized) as well as to check it visually.
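A sketch of what that automation could look like, with an in-memory dict standing in for the real database (store and fetch are placeholders for hooks into the system under test):

```python
def check_round_trip(store, fetch, samples):
    """Store each sample, read it back, and demand an exact match."""
    for text in samples:
        key = store(text)
        result = fetch(key)
        assert result == text, f"corrupted: {text!r} came back as {result!r}"

# Stand-in "database" for a self-contained demo:
db = {}
def store(text):
    key = len(db)
    db[key] = text
    return key

def fetch(key):
    return db[key]

check_round_trip(store, fetch, ["§õʍϵ Рѕәטᴆᴑ ᴌᴼᶜᶏḼἷ₹ text", "日本語テスト"])
print("round trip intact")
```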

Please don't consider this a substitute for proper multilingual testing.  Each language has its own formatting and processing rules that cannot be tested via a simple pseudo-localization.  The classic example is Thai, which has no spaces or syntactic characters to divide words.  In order to parse Thai words correctly, a dictionary must be used, along with some grammatical processing.  In correct pro cess in go pens up big can sofw. or Ms.  Forth E sake of you ruse RS, create some actual input test data from the languages used in your markets.
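Here's a quick illustration of why naive splitting fails for Thai (the sample phrase is "hello" plus a polite particle, which a whitespace split cannot separate):

```python
thai = "สวัสดีครับ"        # two words: สวัสดี ("hello") + ครับ (polite particle)
print(thai.split())        # ['สวัสดีครับ'] - no boundaries found; a
                           # dictionary-based segmenter (e.g. ICU's
                           # BreakIterator) is needed instead
```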

And remember that this does not test locale-specific data handling.  That data will also have to be created to represent locales in your target markets.

Test the I-O, your 亹stomers wiѭ th⌘k yo=FA for ☃.  Or at least they'll be able to.