The I18n G.A.L.

All things international, only some of them software

Sunday, August 11, 2013

PL8: On the edge

Yet another of the 9 coding areas pseudo-localization can test is expansion and edge-case handling of I/O data.

In PL5: Expanding your universe, I discussed how pseudo-localization of the user interface can test the effect of translation expansion on look and layout. Here I'm focusing on I/O.  If input/output data is pseudo-localized such that strings are significantly expanded, it can verify how expanded strings are handled.  This is not just a display issue; it is also, and more significantly, a processing issue.  Once again, I repeat my mantra: people buy software to do something to their data.  If that something is not done correctly, the software is not useful to them, and they will look for software that is.

What tends to be difficult to determine is what an input field maximum is (and should be).  There is a count question:  should the limit be in bytes or characters?  This is less important than determining how many characters should logically be allowed in a particular field.  Even fields that seem fixed in one country could vary considerably in another.  For example, postal codes can vary from a length of zero (no code) to as many as 12 characters (for some US territories).  This brings up another question:  should the length restrictions vary with the locale?  (Or the localization?)  The same questions hold for a minimum size.
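The bytes-versus-characters question is easier to weigh with numbers in hand. Here is a minimal Python sketch (Python and the sample values are my own choices for illustration) that measures the same field values both ways. A 10-byte limit holds a French postal code with room to spare, but only three Japanese characters in UTF-8.

```python
def field_lengths(text: str) -> tuple[int, int, int]:
    """Return (code points, UTF-8 bytes, UTF-16 bytes) for a field value."""
    # len() counts Unicode code points, which is not always what a user
    # perceives as one character (e.g. a letter plus a combining accent).
    return (
        len(text),
        len(text.encode("utf-8")),
        len(text.encode("utf-16-le")),  # "-le" avoids counting a byte-order mark
    )


for value in ["75008", "Straße", "東京都千代田区"]:
    chars, utf8_bytes, utf16_bytes = field_lengths(value)
    print(f"{value}: {chars} chars, {utf8_bytes} UTF-8 bytes, {utf16_bytes} UTF-16 bytes")
```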

Determining what should happen if the data is greater than the maximum is important.  Should the data be truncated or rejected?  Should a warning message pop up if it's a user input field (as opposed to programmatic input), or should the user simply not be allowed to continue typing?  Again, similar questions need to be answered for data below the minimum.

All of the above is in the realm of design rather than testing, but the point is that the desired behavior should be defined.  Pseudo-localization testing will determine what is happening in certain circumstances, but if the behavior is undefined, it is difficult to decide what sort of pseudo-localization needs to be run, and how many locales should be tested.

Similar to PL5, you're looking at the layout and aesthetics of the UI with the expanded text: is it wrapping awkwardly? Is it overlapping other elements? Is it skewing the overall layout?  More important, though, is the integrity of the text:  has it been truncated?  If the pseudo-localized string ends in a multibyte character, is that last character intact?  Has some of the middle text been lost because characters beyond the maximum overwrote earlier characters in the string?
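Those last two questions are where most processing bugs hide. As a sketch of one defined behavior, assuming the field limit is in UTF-8 bytes and truncation (rather than rejection) is the chosen design, here is a hypothetical Python helper that cuts at the byte limit without leaving half a character at the end, next to the naive cut that does.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Shorten text so its UTF-8 encoding fits in max_bytes without
    splitting a multibyte character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # A cut at an arbitrary byte offset can leave a partial sequence at the end.
    # The input was valid UTF-8, so the only undecodable bytes are that partial
    # tail, and errors="ignore" drops exactly those.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


pseudo = "[Ŝàvé fîłé 下王丈]"
print(truncate_utf8(pseudo, 19))  # the partial 下 is dropped, not mangled

# A naive byte cut produces invalid UTF-8, which is what a mangled last
# character looks like downstream:
try:
    pseudo.encode("utf-8")[:19].decode("utf-8")
except UnicodeDecodeError as err:
    print("naive cut fails:", err.reason)
```

A pseudo-localized string ending in a multibyte character exercises exactly this case. The sketch does not protect combining sequences (a base letter plus an accent) from being split, so treat it as a starting point once the design question of truncate versus reject has been answered.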

To the extent that I/O will be displayed, pseudo-l10n can help test paging on smaller devices, as well as on PCs and tablets.  The expansion from the pseudo data will likely move things around, and it's important to determine what effect it will have, and whether it causes a problem for users working in other languages.

At the risk of boring you with repetition, this is not a substitute for testing with multilingual data, simply a quick way to get data for early testing of these important cases.
Because you want to do unto data as your users expect you to do.

Wednesday, May 08, 2013

Linking all the Internationalization Myths

Since the Internationalization Myth series spans quite a length of time, I thought I'd put links to all the posts in one blog entry, so without further ado, here they are!
  1. Republishing the myths - background information on the series. 

  2. Myth #1: "Internationalization means externalizing the user interface so the software can be translated." 

  3. Myth #2: "Translators choose the best phrase in the target language." 

  4. Myth #3: "The code is in Java and therefore it's internationalized." 

  5. Myth #4: "My product supports Unicode and therefore it's internationalized." 

  6. Myth #5: "My product uses open source and so internationalization requirements don't apply." 

  7. Myth #6: "ISO-8859-1 is the standard encoding for HTML." (This one has almost gone away.)

  8. Myth #7: "All company employees speak English, so only English needs to be supported by internal tools."

  9. Myth #8: "Administration interfaces don't need internationalization." 

  10. Myth #9: "We've never localized this product/module/component/blidget, so it doesn't need internationalization." 

  11. Myth #10: "We added internationalization in the last release, so we're done." 

  12. Myth #11: "If something is wrong, our customers will tell us." 

  13. Myth #12: "My product works in Japanese, therefore it's internationalized." 

  14. Myth #13: "Internationalization is implemented after the base product and is written by a separate group of engineers." 

  15. Myth #14: "Internationalization is only needed in the software development department."


Monday, January 21, 2013

PL7: I-O, I-O, so off to work I go

I've said this before, but it bears repeating:  People buy software to do something to data.  For a more extensive discussion of this, see the Internationalization Myth #1 blog.  Go on and read it, it's very short, and I can wait.

There, that didn't take long.  Now that my meaning is clear, you can see how important it is to test software to find out whether it can handle data in a particular character encoding.  But the problem is, where can you get the encoded data to run through the system in a hurry?  Easy, just take your test data files, of which you most certainly have an extensive collection, and pseudo-localize them.  You might need to tweak your pseudo-localization tools a bit, but it's well worth it.

If the data is pseudo-localized with characters from a broad cross-section of Unicode, you'll be able to see if Unicode is processed and output correctly, without mangling anything.  Using other character sets, such as EUC-JP, will bear out whether your software can handle those as well.  It doesn't matter if it's supposed to convert, reject, or process these character sets directly, so long as it handles them in an appropriate manner for the markets you want to sell to.
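To know what "appropriate" output looks like, it helps to know which of your pseudo-localized characters a given encoding can carry at all. A minimal Python sketch, assuming Python's built-in codecs (euc_jp is its name for EUC-JP); the function name is my own:

```python
def round_trip_failures(text: str, encoding: str) -> list[str]:
    """Return the characters of text that cannot be encoded in `encoding`
    or that come back changed after decoding."""
    failures = []
    for ch in text:
        try:
            if ch.encode(encoding).decode(encoding) != ch:
                failures.append(ch)
        except UnicodeEncodeError:
            failures.append(ch)  # not representable in this encoding at all
    return failures


pseudo = "§õʍϵ Рѕәטᴆᴑ ᴌᴼᶜᶏḼἷ₹ﭺ口 下王丈৳"
for enc in ("utf-8", "euc_jp"):
    print(enc, round_trip_failures(pseudo, enc))
```

Characters on the EUC-JP list should be converted, rejected, or flagged by your software in whatever way you've defined; they should never come out silently changed.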

Check the output and make sure the data has not been corrupted.  For example, if the phrase "Some pseudo localized text" were pseudo-localized with a cross-section of Unicode characters, it might look like this:
§õʍϵ Рѕәטᴆᴑ ᴌᴼᶜᶏḼἷ₹ﭺ口 下王丈৳
If the software you're testing is supposed to store it in a database, and then retrieve it intact, make sure that the output exactly matches the input.  If it's supposed to break the text into individual characters, check that the characters are the same.  You get the idea.  With pseudo-localization, it should be easy to automate the testing (provided the testing software is properly internationalized) as well as visually test it.
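The store-and-retrieve check is easy to automate. In this Python sketch an in-memory SQLite table is a hypothetical stand-in for your product's own storage, and the helper names are mine; the comparison is exact, code point by code point.

```python
import sqlite3


def store_and_fetch(text: str) -> str:
    """Write text to a throwaway SQLite table and read it back."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE notes (body TEXT)")
        conn.execute("INSERT INTO notes (body) VALUES (?)", (text,))
        (stored,) = conn.execute("SELECT body FROM notes").fetchone()
        return stored
    finally:
        conn.close()


def first_difference(expected: str, actual: str):
    """Index of the first differing code point, or None if identical."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i
    if len(expected) != len(actual):
        return min(len(expected), len(actual))  # one is a prefix of the other
    return None


pseudo = "§õʍϵ Рѕәטᴆᴑ ᴌᴼᶜᶏḼἷ₹ﭺ口 下王丈৳"
result = store_and_fetch(pseudo)
assert first_difference(pseudo, result) is None, "data was altered in storage"
```

If your system is supposed to normalize text (to NFC, say), compare after running both sides through unicodedata.normalize; otherwise any difference is a bug to investigate.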

Please don't consider this a substitute for proper multilingual testing.  Each language has its own formatting and processing rules that cannot be tested via a simple pseudo-localization.  The classic example is Thai, which has no spaces or syntactic characters to divide words.  In order to parse Thai words correctly, a dictionary must be used, along with some grammatical processing.  In correct pro cess in go pens up big can sofw. or Ms.  Forth E sake of you ruse RS, create some actual input test data from the languages used in your markets.

And remember that this does not test locale-specific data handling.  That data will also have to be created to represent locales in your target markets.

Test the I-O, your 亹stomers wiѭ th⌘k yo=FA for ☃.  Or at least they'll be able to.

Wednesday, September 26, 2012

PL6: Don't get too attached to fragments

While fragments might be good for a Greek mosaic, they aren't useful for Greek translation.  Or any other translation, for that matter.

In moving text strings to external resources, some engineers try to be as efficient as possible. Efficiency is certainly a laudable attribute, but when it comes to strings, it may mean taking a less obvious path.

Let me clarify.  A simple set of user interface strings might be:
n file(s) have been deleted.
n folder(s) have been deleted.
n file(s) have been moved.
n folder(s) have been moved.
Looking at these strings in English, one might be tempted to break them down into what seem to be obvious components:
file(s)
folder(s)
have been deleted.
have been moved.
Super!  The code just needs to pick up the components and concatenate them. Fewer words, less storage space taken, and the translation should be cheaper and faster, right?

Well, no, not exactly.  The problem is that while in English these components can be simply concatenated and still make sense, this is not true in other languages.  Take the possible strings, which appear in the UI in somewhat stilted English as:
1 file(s) have been deleted.
2 file(s) have been deleted.
1 folder(s) have been deleted.
2 folder(s) have been deleted.
1 file(s) have been moved.
2 file(s) have been moved.
1 folder(s) have been moved.
2 folder(s) have been moved.
In Italian, they should become:
1 file è stato eliminato.
2 file sono stati eliminati.
1 cartella è stata eliminata.
2 cartelle sono state eliminate.
1 file è stato spostato.
2 file sono stati spostati.
1 cartella è stata spostata.
2 cartelle sono state spostate.
Ignoring the possibility that this might not be the best Italian translation, note how many changes occur from sentence to sentence, due to changes in gender and number. The translator is not given the option to accommodate these changes, and must cope. The resulting translation would be extremely poor, akin to English users seeing a string such as "These file she have been deleted." Not exactly user-friendly.
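The alternative is to externalize each complete sentence and let the translator supply every form. A minimal Python sketch using the Italian strings above; the message keys and the plural function are hypothetical, and in practice you'd rely on your platform's plural support (gettext's ngettext, or ICU MessageFormat's plural and select arguments) rather than hand-written rules:

```python
# Hypothetical Italian catalog: one complete sentence per message and plural
# form, so the translator controls gender and number agreement.
CATALOG_IT = {
    "files_deleted":   {"one": "{n} file è stato eliminato.",
                        "other": "{n} file sono stati eliminati."},
    "folders_deleted": {"one": "{n} cartella è stata eliminata.",
                        "other": "{n} cartelle sono state eliminate."},
    "files_moved":     {"one": "{n} file è stato spostato.",
                        "other": "{n} file sono stati spostati."},
    "folders_moved":   {"one": "{n} cartella è stata spostata.",
                        "other": "{n} cartelle sono state spostate."},
}


def plural_category_it(n: int) -> str:
    # Simplified Italian rule for whole numbers: "one" for 1, "other" otherwise.
    return "one" if n == 1 else "other"


def message(catalog: dict, key: str, n: int) -> str:
    """Pick the whole sentence for this count and fill in the number."""
    return catalog[key][plural_category_it(n)].format(n=n)


print(message(CATALOG_IT, "folders_deleted", 2))
```

More resource strings, yes, but each one is a sentence a translator can get right. Other languages have more plural categories than Italian's two (Polish and Arabic, for example), which is another reason to let a library supply the rule.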

Using a pseudo-localization string that has some sort of opening and closing characters, such as curly braces, will expose concatenated fragments. If a closing brace is followed by an opening brace, chances are the resource strings need some defragging.
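Here's a sketch of that check in Python, assuming curly braces are the wrapper characters as above and that you can capture the displayed text. If your resources already use braces for placeholders, pick wrapper characters that never appear in them.

```python
import re

# Two pseudo-localized fragments sitting side by side: "}" then optional
# whitespace then "{".
ADJACENT = re.compile(r"\}\s*\{")


def pseudo_wrap(text: str) -> str:
    """Mark the start and end of a resource string."""
    return "{" + text + "}"


def concatenation_points(ui_text: str) -> list[int]:
    """Positions in displayed text where one wrapped string ends and another begins."""
    return [m.start() for m in ADJACENT.finditer(ui_text)]


# Fragments glued together by code, as in the example above:
fragmented = "3 " + pseudo_wrap("file(s)") + " " + pseudo_wrap("have been deleted.")
# One complete sentence from a single resource:
whole = pseudo_wrap("3 file(s) have been deleted.")

print(fragmented, concatenation_points(fragmented))
print(whole, concatenation_points(whole))
```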


Wednesday, January 19, 2011

PL5: Expanding your universe

I don't expect this next test in the series to cover the universe, but it will get you further towards whole Earth applications.

The pseudo-localization applied to your text resources should grow the length of the strings. Depending on your application, I recommend increasing the length by about 50%, at least for short strings. The additional text should also include Asian characters, that is, characters from the Han range of Unicode. Let me explain why.

One aspect of testing is edge cases - testing limits and restrictions to see if they are working and how. This includes minimum, below minimum, maximum, and above maximum lengths. Translation usually grows the length of strings, both in byte count and in display. Fields in the user interface (UI) are often limited in length (byte and display), due to display constraints. Expanded pseudo-localized text will show how the UI could change. And more.

What are you looking for? For starters, you're looking at the layout and aesthetics of the UI with the expanded text. Is text wrapping awkwardly? Is it overlapping other text or screen elements? Is it pushing other objects out of position, skewing the overall layout? Do items line up where they're supposed to? Is the text truncated? Obviously limitations are necessary due to screen size, especially for mobile applications. But translators aren't mind readers - they're probably only seeing text in a resource file or translation tool. The only indication they'll have of a length restriction is the comment that you write next to the string to be translated. Which of course you have done.

Remember that I recommended using Asian text in the pseudo-l10n? This is useful for checking that these characters are legible in the space provided; by legible I mean that the stroke lines are distinct and the tops or bottoms of the character are not cut off. Asian characters are often more intricate and complex, requiring a larger font and additional vertical space to be rendered legibly. However, Asian translations will frequently shrink in width. A pseudo-l10n won't bear that out because it usually adds on to the existing string length. But you could run one that replaces the string with shorter Asian character strings, and see what that does to the UI.
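Here is a minimal Python sketch of both variants: growing strings by about 50% with Han characters, and replacing them with a shorter Han run. The filler characters, the brackets, and the replacement ratio are my own assumptions; tune them for your application.

```python
import math

# Arbitrary CJK Unified Ideographs used as filler; any Han characters will do.
HAN_FILLER = "下王丈口"


def han_run(length: int) -> str:
    """Repeat the filler characters to exactly `length` characters."""
    return (HAN_FILLER * (length // len(HAN_FILLER) + 1))[:length]


def expand(text: str, ratio: float = 0.5) -> str:
    """Grow text by about `ratio` of its length with Han characters, in brackets
    so truncation at either end is visible."""
    extra = max(1, math.ceil(len(text) * ratio))
    return "[" + text + han_run(extra) + "]"


def replace_with_han(text: str, ratio: float = 0.5) -> str:
    """Replace text with a shorter run of Han characters, to mimic an
    Asian translation that is narrower than the source."""
    return "[" + han_run(max(1, math.ceil(len(text) * ratio))) + "]"


print(expand("Save"))                # [Save下王]
print(expand("Settings"))            # [Settings下王丈口]
print(replace_with_han("Settings"))  # [下王丈口]
```

Han characters typically render about twice as wide as Latin letters, so the display expansion is larger than the character count suggests.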

We all need space for understanding, don't we?

Tuesday, January 11, 2011

PL4: For display only

This next pseudo-localization test in the series is very straightforward. Maybe.

You can verify whether your software correctly handles a particular character encoding by pseudo-localizing the resource files into that encoding. Use a broad spectrum of characters from the encoding, if not all of them, to check that the entire set is handled properly. Simply pseudo-localize the resources, bring up the pseudo-localized user interface, and view the text. Of course, you have to know what you're looking at, but if you've familiarized yourself with the set of characters used in the pseudo-localization, that shouldn't be a problem.
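One way to get a broad spectrum of characters is to ask the encoding itself. This Python sketch (function names are hypothetical) scans the Basic Multilingual Plane for printable characters that round-trip through a given codec, then rotates through them so that, across many resource strings, the whole set gets used. iso8859_15 is Python's codec name for ISO-8859-15.

```python
def encodable_chars(encoding: str, start: int = 0x80, stop: int = 0x10000) -> list[str]:
    """Printable non-ASCII characters in [start, stop) that round-trip through `encoding`."""
    chars = []
    for cp in range(start, stop):
        ch = chr(cp)
        if not ch.isprintable():  # skips controls, NBSP, soft hyphen, surrogates
            continue
        try:
            if ch.encode(encoding).decode(encoding) == ch:
                chars.append(ch)
        except UnicodeEncodeError:
            pass  # not in this character set
    return chars


def rotating_prefix(chars: list[str], index: int, length: int = 6) -> str:
    """Pick `length` characters for the index-th resource string, advancing
    through the set so that, across many strings, every character gets used."""
    start = (index * length) % len(chars)
    return "".join(chars[(start + i) % len(chars)] for i in range(length))


latin9 = encodable_chars("iso8859_15")
strings = ["Open", "Save", "Close"]
pseudo = [rotating_prefix(latin9, i) + s for i, s in enumerate(strings)]
print(pseudo)
```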

But that's not all! Since all of you are using a Unicode encoding for your user interface (a-hem), there's something else you can test. Some characters use a different font depending on the language they are representing. So for example, if characters from the Han section of Unicode are used in your pseudo-localization, you can set the locale to a Chinese region and verify the characters are in a Chinese font. Then change the locale to a Japanese region and see if a Japanese font is used to display them. Again, you have to know what you're looking at, but if your software is responsible for display, this is an important test. (Take a look at these charts, particularly the second and third sections.) If necessary, select a few key Han characters that differ in Chinese and Japanese, make up an image chart, and familiarize yourself with them. As they're used over and over in the pseudo-localization, you'll get to know them well.

Your customers will be happy you did.


PL3: May we dance with your dates?

This is the third entry in my series of the 9 coding areas pseudo-localization can test.

Amongst ourselves in internationalization, we refer to certain types of data as locale-sensitive. This is data that changes in some way from region to region (locale to locale), typically in format. The classic example is a date. And the classic example of this example is the short format in the US, month/day/year, that becomes day/month/year in most European countries; today is January 11, 2011, written 1/11/11 in the US, but 11/1/11 in the UK. (Often this is accompanied by some US baiting about how illogical that format is, but we won't go there and mention that in human terms, the day becomes less important beyond a week or so, and the month comes into prominence, lasting an entire year, when finally the year is needed, so it's stuck on the end. And it's written month day, year in longer formats, as above. But I won't mention any of that here.)

There are many more locale-sensitive pieces of data: numbers, time, prices, measurements, weights, telephone numbers, addresses, sizes, etc. Some of these data formats have been standardized, so that programs can select them using locale identifiers and apply them to the data. The Common Locale Data Repository (CLDR) is a public database with locale formats for many locales throughout the world (though admittedly there are no sizes, nor telephone numbers, nor postal addresses). Internationalization library functions and methods use it for formatting and parsing locale-sensitive data.

As these formats can be accessed programmatically for most of the locales of the world, they should be. That is, rather than externalize a date format for a localizer to alter to suit another locale, dates should be programmatically formatted. Why? The long answer is another blog entry; the short answer is that there are many more formats than there are localizations. For example, the English localization is usually a single localization, but formats for English-speaking locales vary (see the short date format example above). The same is true for French, Spanish, Traditional Chinese, and so on.

But what does all this have to do with pseudo-localization testing?

You may remember my post about using pseudo-l10n to test whether localized files are picked up when they should be. Or perhaps you don't, in which case you might want to review it. Anyway, once you've set up your system such that the pseudo-localized files are mimicking an actual locale, you can exploit that setup to check on whether locale-sensitive formats are programmatically determined.

Run the system with the locale set to the pseudo-localized files' locale, and pay special attention to the formats of dates and numbers. Now change the locale setting to one that has different formats and run again. Check the dates and numbers - have they changed format? You might want to verify the formats against those listed in the CLDR; if you're looking online, check the summary charts. Start with the language base link, then drill down to get to the specific locale you're looking for.

Change the locale to a third value and check again. The formats should change appropriately each time. If not, today might just look like November 1, 2011 to your customers.


Friday, December 10, 2010

PL2: It's good to be resourceful

Another aspect of software internationalization that pseudo-localization can test is whether all relevant resources have been made localizable.

Now, I understand that in order for this test to work, everything that is shown to the user must be visually inspected. This means the entire fixed user interface must be made to display, including windows that come up when various menu options are selected, pop-up windows, help messages, alternate text, user messages, and, that most difficult of entities to bring to view, error messages.

It's funny, when I tell people pseudo-localization can test 9 aspects of internationalization (actually I started with 5 and found more as time went on), this is the test they claim is not valid. And yet, at the same time, this aspect is the primary one that people claim they are testing when they pseudo-localize their product. Go figure. Personally I'm not confident that you can verify the entire user interface has been properly externalized this way, only a good portion of it, and therefore likely all of it. The key word is likely.

So run your pseudo-localization tool, bring up the product (hopefully it picks up the pseudo-localized resources - see the previous post), and have a good look. Try to create some errors so you can at least check that some error messages are being picked up correctly. In case you're wondering, you're checking for text that is still in the source language, without the additional characters or character transformations that the pseudo-l10n creates.
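If your pseudo-localizer wraps every string in markers, part of this inspection can be automated. A Python sketch, assuming square-bracket markers and that you can capture the displayed text somehow (UI automation or logs, for instance); anything with ASCII words left over after removing the bracketed strings is a candidate for hard-coded text:

```python
import re

# Assumes the pseudo-localizer wraps every resource string as "[...]".
BRACKETED = re.compile(r"\[[^\[\]]*\]")
# Two or more ASCII letters in a row: likely untransformed source text.
ASCII_WORD = re.compile(r"[A-Za-z]{2,}")


def suspicious_fragments(screen_text: str) -> list[str]:
    """ASCII words found outside pseudo-localized markers."""
    outside = BRACKETED.sub(" ", screen_text)
    return ASCII_WORD.findall(outside)


print(suspicious_fragments("[Ŝàvé fîłé下王] Cancel [Ôþéñ下]"))
```

It won't catch a string that sits inside markers but was never transformed, and product names or URLs will show up as false positives, so a human still has to look.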

And then, move on to the next test.


PL1: A loose l10n is an easy pick-up

The first in my series of the 9 coding areas pseudo-localization can test.
Note: localization is often abbreviated l10n, because there are 10 letters between the l and the n. Sometimes people will capitalize the L to distinguish it from the 1. (PL1 is Pseudo-Localization 1, not Programming Language 1, for those of you old enough to remember that.)

Provided the pseudo-l10n is configured the way a real localization will be, it tests whether or not the product will pick up the translated resources.

Once you run your resource files through a pseudo-translator tool, set them up as being for some locale other than the source locale. For example, if your code is typically written in the US and run on environments set to the en-US language/locale, then treat your pseudo-localized files as if they were the Japanese translation for Japan, that is, ja-JP. That typically means appending "ja-JP" to the file names, but may mean they should be stored in a ja-JP directory or database. This exercise also provides familiarity with the localization structure of the product - where the resources reside and what the naming conventions are.
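Here's a Python sketch of that setup for a simple key=value resource file. The naming follows Java's ResourceBundle convention (messages_ja_JP.properties) purely as an example; substitute whatever your product expects. The pseudo() transform is a hypothetical placeholder for your real tool, and the parser ignores the rest of the .properties syntax (colon separators, line continuations).

```python
import tempfile
from pathlib import Path


def pseudo(text: str) -> str:
    """Hypothetical minimal transform: bracket the text and add Han filler."""
    return "[" + text + "下王]"


def write_pseudo_properties(source: Path, locale_tag: str = "ja_JP") -> Path:
    """Write a pseudo-localized copy of a simple key=value file next to the
    source, named with a locale suffix (messages.properties -> messages_ja_JP.properties)."""
    target = source.with_name(f"{source.stem}_{locale_tag}{source.suffix}")
    out = []
    for line in source.read_text(encoding="utf-8").splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "!")) or "=" not in line:
            out.append(line)  # comments, blanks, and anything unusual pass through
            continue
        key, value = line.split("=", 1)
        out.append(f"{key}={pseudo(value)}")
    target.write_text("\n".join(out) + "\n", encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "messages.properties"
    src.write_text("# File menu\nopen=Open\nsave=Save\n", encoding="utf-8")
    result = write_pseudo_properties(src)
    print(result.name)
    print(result.read_text(encoding="utf-8"))
```

If the consumer is Java, note that resource bundle .properties files are read as UTF-8 by default only from Java 9 on; earlier versions assume ISO-8859-1 and need \uXXXX escapes.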

Next, continuing with the example, the environment needs to look like a typical Japanese customer's. Whether that means the system settings are set to ja-JP, or the software itself is set to Japanese, or just the session, make it so. Maybe any of those settings are supposed to make your software switch to Japanese - all environments should then be tested. A side benefit of this is learning how other language/locale environments are set, and how the product is supposed to behave (yes, many folks don't know what the expected behavior should be in some cases).

Now, run your software. Is it picking up the pseudo-Japanese files in all areas of the product? Is it displaying date, numeric, and other dynamically determined formats according to the Japanese locale preferences? Did it do this without requiring a rebuild?

If not, you've got some work to do.


Monday, September 14, 2009

Long time, new series - The 9 Coding Areas Pseudo Localization can Test

Hello out there!
It's been a very long time, but not in my head. That is, I always have lots of topics that I'd like to blog about, just very little time to blog. The topic I'd like to begin on today is pseudo-localization, also known as pseudo-translation, pseudolocalization, pseudo-l10n, and probably some other names.

What pseudo-localization is
Pseudo-localization is the programmatic localization of a software product to enable early testing of product localizability. Typically this entails automatically transforming the translatable text into characters outside of ASCII in another character encoding. Sometimes it may include changing localizable objects that are not text, such as data templates and patterns.

For example, some pseudo-localization tools will allow the user to specify a string of characters to be added to the beginning of a text string and another to the end. When the tool is run against the text resources of the product, all text strings will have these characters added to them. The intention is that the added strings contain accented characters and characters from other scripts (such as Russian or Chinese).
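A minimal Python sketch of such a tool, combining the prefix and suffix with a simple substitution of accented lookalikes. The character map, default prefix and suffix, and placeholder syntax are all assumptions for illustration; a real tool must skip whatever placeholder and markup syntax your resources use, as this one does for {0} and %s.

```python
import re

# Hypothetical lookalike map: ASCII letters to accented letters.
ACCENTED = str.maketrans({
    "a": "à", "e": "é", "i": "î", "o": "õ", "u": "ü", "c": "ç", "n": "ñ", "s": "š",
    "A": "Å", "E": "É", "I": "Ï", "O": "Ø", "U": "Ü", "C": "Ç", "N": "Ñ", "S": "Š",
})

# Placeholders like {0} or %s must survive untouched, or the code that fills
# them in will break.
PLACEHOLDER = re.compile(r"(\{[^{}]*\}|%[sd])")


def pseudo_localize(text: str, prefix: str = "[Жà下", suffix: str = "王ü]") -> str:
    # re.split with a capturing group returns placeholders at odd indexes.
    parts = PLACEHOLDER.split(text)
    body = "".join(p if i % 2 else p.translate(ACCENTED) for i, p in enumerate(parts))
    return prefix + body + suffix


for s in ["Open file", "{0} files deleted", "Saved %s"]:
    print(pseudo_localize(s))
```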

Why it is used
The usual reason given for pseudo-localization testing is to determine whether the user interface text is localizable; that is, whether the text has been put into separate resource files to enable localization without changing the program code files. Since the translation is automated, this testing can be done much earlier in the development process than if the real localization were used.

Two other aspects are also commonly tested as an adjunct to the localizability area: whether strings have expansion room, and whether another character encoding is handled correctly.

But the truth is that there is much more pseudo-localization can help test. In this series, I'll explain how it can be used to test 9 areas of internationalization. Stay tuned.


Monday, September 15, 2008

Tribute to Unicode, or, Unic-Ode

(In honor of Unicode's 20th anniversary!)

Here's to Unicode, that internationalization panacea,
Let's hope that we'll never find need to replace ya!
Your uniformity, uniqueness, and most of all, universality
Make up your (ten) principled character personality.

On round trips you're compatible with the most obscure graphemes
Forcing them into normalcy, then encoding schemes.
With the patience of saints, it is no surprise
You have an RFC role to canonicalize.

You bring characters together in dynamic combinations
Assisting programs in correct interpretation
(But never insisting on specific presentations,
Instead leaving that to experts in localization.)

Shall I extol your streamlined Basic Multilingual Plane
Even as it goes beyond FFFF again and again?
About your capacity I can't be too complimentary
I mean, over a million possible characters supplementary!

All the world's modern and ancient scripts are included
(Though Klingon is one the committee eluded;
But still there's no need to resort to hysteria,
For one is equipped with a Private Use Area.)

Each character, too, has a property selection
Such as function and case, position and direction.
Bi-directionality I mean, the right-to-left, left-to-right one,
Not something as complex as boustrophedon.

Your versatility knows not a boundary;
Many character encodings have emerged from your foundry.
There are 8-, 16-, and 32-bit selections.
(The 7-bit was deprecated for everyone's protection.)
Though, due to certain platform vending and
Other factors, 16-bit is endian.

But Unicode, don't let this praise make you haughty,
For over the years you too have been naughty!
In case you've forgotten, the Korean move,
European currency symbol, and a few others prove
You're not without fault; but these foibles don't mask the
Incredible improvement over US-ASCII.

So here's to Unicode, and to Joe, Mark, and Lee,
For all their hard work and a very low fee.
This repertoire extraordinaire has made its place.
After all, where else could one find a zero-width no-break space?

(c)2008 I18n G.A.L. All rights reserved. The rest of her is less restrained.

Friday, August 31, 2007

Myth #14 - CEOs, something you should know (last of the old series)

This is, in fact, the final myth of the old series, and it's huge:
"Internationalization is only needed in the software development department."
I have sent Jonathan Schwartz (Sun Microsystems' CEO) exactly one email. I have no idea if he read it, but in it I wrote:
1. Internationalization is not a solved problem.
2. Internationalization cannot be accomplished by engineering alone.
3. Internationalization must be an integral part of everyone's job; it cannot be accomplished by separate people labeled "internationalization specialists".
Today's myth is really #2 in the email, but the other 2 items are related. What do I mean by saying that "internationalization cannot be accomplished by engineering alone"? After all, doesn't software internationalization mean development work? Well, yes it does, but not in a vacuum.

Imagine designing, architecting, and implementing a software product with no specification, no customer requirements, no input from marketing, no data from market research companies - would you do that? Doesn't make sense, does it? And yet that's exactly how we internationalization folks work. There is no one in marketing to gather international market or customer data; very little international market research is available in the areas internationalization needs to know about; and we don't do much in the way of gathering input from our existing international customers. And yet amazingly, people ask me all the time what languages, writing systems, and locales we should support. Sure, I can use my judgement, just as any other development architect could make her best guess on what to do. But that's not a very good way to do business.

Software internationalization engineers are expected to not only know their specialty, but to know everything about how it is applied everywhere. Just yesterday I had an email exchange with a manager in user interface engineering (UIE). We have been talking about putting together training to help the UIE folks better understand internationalization issues in the user interface. But the word is that their director is not interested. Huh? Sooooo, what does that mean, that UIE designs interfaces that only serve 40% of our market, and that's OK? Maybe this director is a firm believer in Myth #13. I'm not a UIE expert and have no plans to become one. Nor do I plan to move into int'l marketing, nor int'l financial reporting, nor int'l product distribution, nor manufacturing, nor software release, nor any other aspect of getting an international software product out in the marketplace. But all of these areas need to understand how international affects what they do. In other words, everyone needs to internationalize.

Let me give you another example. The globalization team is always trying to report numbers illustrating the value we add. We would like to show sales figures based on the localizations we have completed. The problem is that we have no way of knowing which customers are using which language versions of our products. This is not something that can be discerned from part numbers, because we include many localizations with each product. What we need is feedback from customers on which language they use as the primary one for the product they purchased. Globalization is an engineering organization; we are not positioned to garner this kind of information. Other groups could do it by simply internationalizing what they already do.

Or we could just keep guessing.


Friday, August 03, 2007

Myth #13 - Software engineering managers and triskaidekaphobics, take care!

I am nearing the end of the myth series, but I won't end on 13!
"Internationalization is implemented after the base product and is written by a separate group of engineers."
This one really hurts. And it does take a lot of explaining, but I'll do what I can to condense it. An internationalized product is one that processes data from all over the world. Whatever the product is supposed to do with data, for example, receive and send emails, it can do it with data from lots of different places in lots of different languages using lots of different locale formats. In order for software to be able to do that, every place in the code where the data is touched needs to be internationalized. If the internationalization takes place after the "base" code is written, then that means that a developer is going over the code twice (at least). From a logical standpoint, would you pay development engineers to write the same code twice? That's a very expensive way to develop software. Moreover, would you let engineers who don't have a very thorough knowledge of your code work on the core functionality of your product? That would likely introduce a number of new bugs, some of them quite serious. Yet if you are writing a "base" product and then paying a 3rd party internationalization group to internationalize your code, that's exactly what's happening. And if that internationalized code isn't incorporated into the source tree for the next revision, then you're paying even more to have the same code reworked each time you put out a new version.

'But,' you may wonder, 'my development engineers aren't internationalization experts. How can they write internationalized code?' The answer is training. Internationalization is an aspect of good coding practice. That it isn't yet taught in universities is a real problem, and those of us in the industry are keenly aware of the issue. So you may need to make up for it in house. Set up internationalization training for all development engineers, test engineers, and engineering managers. There are a couple of vendors who provide these services (I don't have firsthand experience with any particular vendor). Furthermore, there is a tremendous amount of documentation in books and online, talking about the concepts as well as the implementation details involved in internationalization. Take a look at the Sun Globalization Resources, and the Java Internationalization site to start with. There are a couple of links on my blog page that have good information on internationalization. Have a look at my article, Internationalization in Software Design, Architecture, and Implementation, and take advantage of some of the mailing lists. In the end, it's much cheaper to train your own engineers than to pay for some other company to rewrite your code. And your engineers may be a little bit more content in their jobs when they're learning new things and have more control over their own code.

Try it out, and let me know how it works.

Myth #12 - More quality by the dozen

Continuing in the myth series, we address the scope of internationalization:
"My product works in Japanese, therefore it's internationalized."
Seems sort of logical doesn't it? I mean, if Japanese data gets processed and displayed correctly, surely that means the internationalization is correct? Alas, no. In fact, for a long time there was (and to some extent still is) a specialty called "Japanization". So many companies in the U.S. wanted to take advantage of the lucrative Japanese market that firms sprang up, offering this service. They would take the software and alter it to enable Japanese data processing and display on Japanese-capable machines. Sounds pretty good, huh? Except that it was expensive, time-consuming, and had to be repeated for each release. But we'll get into that myth later.

But let's say you didn't send it to a third party "Japanizer", and you really tried to do proper internationalization over the course of design and development. If you test using Japanese, then that must indicate everything is OK, right? Well, no. It's true that testing with Japanese can uncover some real internationalization problems, but not all of them. Arabic, Hindi, and French have different i18n issues from Japanese - even Chinese is different. What are these differences? These other languages often use different charsets, and in the case of Arabic and Hindi, different rendering capabilities. Beyond language, the Japan locale has its own data formats, which are not the same as France, or Brazil, or Russia, etc. And as a further area to test, Japan is within a single time zone, so you haven't tested a multiple time zone situation (a genuine problem for the U.S. with its 7 time zones).
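
To make the time zone point concrete, here is a minimal Python sketch of one stored instant rendered for several zones. The zone table and the `show_local_times` helper are hypothetical. The fixed UTC offsets are an assumption made only to keep the example self-contained. They ignore daylight saving time changes, so real code should use the IANA time zone database (for example Python's `zoneinfo` module).

```python
from datetime import datetime, timedelta, timezone

# Illustrative fixed offsets for a July date (US daylight saving time in effect;
# Hawaii stays at UTC-10 all year). Real code should look zones up by IANA name,
# e.g. zoneinfo.ZoneInfo("America/New_York"), instead of hardcoding offsets.
ZONES = {
    "Tokyo (UTC+9)": timezone(timedelta(hours=9)),
    "New York (UTC-4)": timezone(timedelta(hours=-4)),
    "Denver (UTC-6)": timezone(timedelta(hours=-6)),
    "San Francisco (UTC-7)": timezone(timedelta(hours=-7)),
    "Honolulu (UTC-10)": timezone(timedelta(hours=-10)),
}

def show_local_times(instant: datetime) -> dict[str, str]:
    """Render one absolute instant as local wall-clock time in each zone."""
    if instant.tzinfo is None:
        raise ValueError("store timestamps as timezone-aware values, not naive ones")
    return {name: instant.astimezone(tz).strftime("%Y-%m-%d %H:%M")
            for name, tz in ZONES.items()}

# A meeting stored once, in UTC.
meeting = datetime(2007, 7, 25, 16, 0, tzinfo=timezone.utc)
for zone, local in show_local_times(meeting).items():
    print(f"{zone:24} {local}")
```

The Tokyo reading lands on the next calendar day, which is exactly the kind of date shift a single-zone test setup does not exercise.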

But wait, there's more! Most customers require multilingual capabilities in their software, and that needs to be tested. For example, if you're working on server software, consider that companies will run the software on a server which is set to one language and locale (with the admin possibly using a different language for the console), but clients will be using lots of different languages and locales as they contact the server. This is a very common configuration nowadays. And it's not limited to servers. Just because someone is running the Italian localization of your email client doesn't mean that all their emails will be in Italian, or even Latin script. They may be emailing in Thai, and you need to verify that your product can handle these sorts of multilingual requirements.
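
A server in this situation has to pick its user-facing text per request, based on the client's locale, rather than on the locale the server itself runs in. Here is a minimal Python sketch assuming hypothetical hardcoded catalogs and a simple fallback chain (pt_BR, then pt, then en). A real product would load externalized resources and take the client locale from something like the HTTP `Accept-Language` header.

```python
# Hypothetical catalogs; a real product would load externalized resource
# files rather than hardcode strings in source.
CATALOGS = {
    "en": {"saved": "Your changes were saved."},
    "it": {"saved": "Le modifiche sono state salvate."},
    "pt": {"saved": "Suas alterações foram salvas."},
    "th": {"saved": "บันทึกการเปลี่ยนแปลงแล้ว"},
}

def fallback_chain(locale_tag: str, default: str = "en") -> list[str]:
    """'pt_BR' -> ['pt_BR', 'pt', 'en']: most specific first, then the default."""
    chain = [locale_tag]
    language = locale_tag.split("_")[0]
    if language not in chain:
        chain.append(language)
    if default not in chain:
        chain.append(default)
    return chain

def message(key: str, client_locale: str) -> str:
    """Choose text by the requesting client's locale, not the server's own locale."""
    for candidate in fallback_chain(client_locale):
        catalog = CATALOGS.get(candidate, {})
        if key in catalog:
            return catalog[key]
    raise KeyError(key)

# One server (perhaps configured for it_IT), several clients at once.
for client in ["it_IT", "pt_BR", "th_TH", "fr_FR"]:
    print(client, message("saved", client))
```

The French client falls back to English only because this sketch has no French catalog; the point is that the choice is made per client, not once for the whole server.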

For more information on testing the internationalization of software, see my article, Internationalizing Testing. And keep reading the myths!

Wednesday, July 25, 2007

Myth #11 - Ahhhh, elevenses, where everyone is literate in English™

Now we enter the land of fantasy, where eating pizza and ice cream causes weight loss and lowers blood cholesterol, and everyone is able to read and write English. Here we have a new myth in the series:
"If something is wrong, our customers will tell us."
The question is, what venues do you give them to tell you? And how do you know whether they didn't buy your product because of certain internationalization (or other) problems? Those are the two main stumbling blocks to overcome, but there are undertones as well.
Starting with the venues, have you provided your customer with easy ways to tell you about problems? That is:
  • Do they know what URL to go to, what email address to mail to, and what number to call?
  • Is the text at the URL translated into their language?
  • Does the form at the URL accept text in their language?
  • Is the person who answers the phone the person who helps them throughout the problem reporting process (regardless of who has the correct solution for the problem)?
  • Does the person on the phone speak their language?
  • Are emails in their language?
  • Is there a discussion forum in their language, and is someone from your company monitoring it?
  • Can your bug tracking system handle data in other languages (which your software processes)?
  • Do you conduct seminars and conferences on your products in different parts of the world, and are there interpreters and translations of slides and papers into the local language(s)?
I'm sure there are loads more things that can be done, but this is what I mean by venue. Some of the above points are more general; making sure your customers (or potential customers) know exactly where to go to ask questions and report problems (they should be the same place: one contact point to your company, regardless of the communication channel) is basic customer service. Providing different channels (Web, telephone, email) gives them the flexibility to choose what they are most comfortable with. Providing these venues in the customers' languages is also essential; it shows you care about them and will listen to them. Even more fundamentally, it enables them to communicate! Not all customers are comfortable enough with English to report a problem. And there are cultural issues to consider: many people don't want to embarrass themselves by using their so-so English and potentially creating a misunderstanding. In some cultures, pointing out a problem is an incredibly rude thing to do, even with a product they've paid good money for. They may be more inclined to quietly return the product, or eat the cost and move to a competitor's product. In those cases you will never learn of the problem that lost you that customer, and maybe others, too.
This brings me to the second point, do you ever find out exactly why a potential customer never became an actual customer? Does your company do follow-up? Have you listened carefully to your potential customer's requirements - in their language, using their cultural conventions - and verified that your product fulfills them? Are you approaching potential customers in a culturally appropriate way? Are you advertising in the right fora (forums)?
It may sound like a lot of work, but the payback is huge. If you think your company does OK in this area, take a closer look. Talk to the people who live and work in other countries and get the real story. Most recently I heard from someone who worked in the support area in China. He knew of several cases where customers reported problems with the product running in the Chinese environment. Because the people trying to reproduce the problem didn't try it in a Chinese environment, they couldn't reproduce the problem, and closed the bug report! (Yes, this happened in 2004.)
Do you think those customers ever reported another bug? They may not even be customers anymore. This actually points to a future myth, that is, that internationalization is only necessary for software development. Until then.

Tuesday, July 24, 2007

Myth #10 - For Senior VPs, the power of 10

What I wouldn't give for VPs to read this blog! Actually, it'd be great if they read all the myths, but you can't have everything (where would you put it?)
"We added internationalization in the last release, so we're done."
I find it almost unfathomable that any exec would believe this, and yet I have heard it from more than one. Saying this is like saying "We wrote the code in the last release, so we're done." I hope that there isn't a software VP out there who would say that. Internationalization is inherent in the architecture, design, and implementation of a product. In reality it is part of the entire process of creating, distributing, and selling a product, but I'll just stick to the software development portion. When designing a product, international requirements must be considered in the design and architecture; otherwise you may have to redesign and rewrite the product to support data in other languages and locales. I have seen it happen, and most recently, a product was scrapped because it wasn't worth rewriting. What sort of things can trip you up in design? Take a look at my article on this very topic:
Internationalization in Software Design, Architecture and Implementation

Every time the design changes, functionality is added, or new code is written, there is internationalization work to be done. Like every other aspect of a changing product, internationalization needs to change too. It is part and parcel of the entire software development process, from requirements gathering, to design specification, to implementation, to testing, to release. Unless the product itself is done, internationalization work needs to continue.

And the myths go on (they go to 11) ...

Monday, May 21, 2007

Myth #9 - Number 9, number 9, number 9, number 9

(Play this blog backwards for the hidden messages.) As for the more overt message, I continue with the myths series:
"We've never localized this product/module/component/blidget, so it doesn't need internationalization."
It's a good thing that nothing ever changes in products, nor in markets. Oh, they do? Yes, even though your product has never been localized before, and this may be a real shock, it might be localized in the future. Whoa! And, another shock, localization is a business decision, not a technical decision. In other words, if a customer says, "We'll buy 2 million licenses for your product if you localize it into Cloqrat," you don't want to have to say, "Gee, sorry, it'll take a massive reworking of the code, say, 12 months to get that to you" since your customer will respond "That's OK, we'll just buy Microbrain's version" and then your problem will be solved because you won't ever see that customer again.

But there's more to it than that. Internationalization is a lot more than making the product localizable. It's primarily about data processing (see Myth #1). That is, even folks in the USA have to process data that is not US data. Was that another shock? I'm sorry; try some rooibos tea.

Thursday, May 10, 2007

Oh no! Not Myth #8! Anything but Myth #8! ...Anything? ...OK, Myth #8

And now, we examine our heads, no, navels, no no, myths, we examine our myths (OK, can you tell I'm getting a little punchy here?):
"Administration interfaces don't need internationalization."
'Cause all sys admins everywhere speak, read, and write English fluently, don't they? You know, the funniest thing about this myth is that it's so often repeated, but I have yet to find any data, study, customer interview, or even an effort to obtain such, that supports it. Maybe it's a mantra. In any case, the hard facts are that many admins are not that comfortable with English, and some don't know any at all. If you're charged with keeping a company's systems up and running, how keen are you to do that through an interface in your second language? I thought so. Nothing like a message popping up on the screen with "Floozid iyarkaba panic gotrios piwec shutdown worqas!!" and there you are, madly flipping through your Cloqrat => English dictionary, trying to remember the conjugation of the verb gotrasco. And then more messages come flying across the screen...

The point being that admins are humans, just like you, and language has meaning for them too. They're going to function a lot better in a native language than in a second language, just like you. Many companies translate their admin interfaces - check out what your company does. And this actually leads into another myth, which I'll post at a later date, namely "We've never localized this before."

}sigh{

Friday, April 27, 2007

Myth #7 - IT depts., are you feeling lucky?

In this, the next installment (or instalment, if you prefer) of the myths series, we explore something nearer to our hearts^H^H^H^H^H^Hselves:

"All company employees speak English, so only English needs to be supported by internal tools."

Seems pretty straightforward, huh? But the question is, what are internal tools used for? Even if all employees speak and read English, their data may not be in English. If they live in Beijing, it might make sense to store their Chinese addresses, for example.


Much of the data that internal tools process is not employee data, however. Many companies have bug tracking and reporting tools. These tools are used to log and report on bugs found in company products. Not only can those bugs originate from people outside the company who may not be able to write in English, but the problems can be regarding the processing of data other than English. If the products are internationalized, customers use them to process data from all over the world. If they encounter a problem, they're going to want to put in the data they were using when the problem occurred (if you're lucky). Without that data, you may not be able to reproduce the problem. Worse, if you don't allow that data, the customer may not bother to tell you about the bug at all!
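
As a minimal illustration of the difference, here is a Python sketch using the standard library's `sqlite3` module, chosen only because it ships with Python. The `bugs` table and `file_bug` helper are hypothetical. The point is that the customer's Greek summary and Chinese repro data come back unchanged, whereas a Latin-1-only path cannot store them at all.

```python
import sqlite3

def file_bug(conn: sqlite3.Connection, summary: str, repro_data: str) -> int:
    """Store a bug report exactly as the customer entered it."""
    cur = conn.execute(
        "INSERT INTO bugs (summary, repro_data) VALUES (?, ?)", (summary, repro_data)
    )
    return cur.lastrowid

conn = sqlite3.connect(":memory:")  # SQLite TEXT columns hold Unicode text
conn.execute(
    "CREATE TABLE bugs (id INTEGER PRIMARY KEY, summary TEXT, repro_data TEXT)"
)

summary = "Αναζήτηση αποτυγχάνει"  # Greek: "Search fails"
repro = "地址：北京市海淀区"          # the Chinese address the customer was entering
bug_id = file_bug(conn, summary, repro)
row = conn.execute(
    "SELECT summary, repro_data FROM bugs WHERE id = ?", (bug_id,)
).fetchone()
print(row == (summary, repro))  # True: the data comes back unchanged

# By contrast, a tool that only stores Latin-1 cannot accept this report at all.
try:
    repro.encode("latin-1")
except UnicodeEncodeError as exc:
    print("Latin-1-only storage rejects it:", exc.reason)
```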


Think about your internal customer databases - can they handle customer data in the local languages? This can be extremely important for customer relations and communications. What about Web based feedback? Could a Greek customer who doesn't write English send you some feedback? Wouldn't you want to know? Getting the feedback translated is a simple matter; getting the customer back after they left in frustration from not being able to communicate with your company is a lot harder and more expensive.


There are many other examples of internal tools that need internationalization: survey tools, forum discussion software, employee benefits tools (if you have worldwide employees), product registration tools, financial tracking tools (does all your revenue arrive in US dollars?), etc. The trick, and I know this is really tricky, is to carefully look at the tool requirements and all the data that the tool will be processing in the foreseeable future. Then design, architect, and implement to cover it. Piece of pie. Easy as cake.
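
For the financial case, one common approach is to carry an ISO 4217 currency code with every amount, use `Decimal` rather than binary floats, and refuse to combine currencies implicitly. Here is a minimal Python sketch built around a hypothetical `Money` type. Conversion, if needed, then has to be an explicit step with a known rate and date.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class Money:
    """An amount that always travels with its ISO 4217 currency code."""
    amount: Decimal
    currency: str  # e.g. "USD", "EUR", "JPY"

    def __add__(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            # Refuse to guess an exchange rate; conversion must be explicit.
            raise ValueError(f"cannot add {other.currency} to {self.currency}")
        return Money(self.amount + other.amount, self.currency)

def totals_by_currency(entries: list[Money]) -> dict[str, Money]:
    """Sum revenue per currency instead of assuming everything is in US dollars."""
    totals: dict[str, Money] = {}
    for entry in entries:
        zero = Money(Decimal("0"), entry.currency)
        totals[entry.currency] = totals.get(entry.currency, zero) + entry
    return totals

revenue = [
    Money(Decimal("1200.50"), "USD"),
    Money(Decimal("980.00"), "EUR"),
    Money(Decimal("150000"), "JPY"),
    Money(Decimal("99.50"), "USD"),
]
for code, total in sorted(totals_by_currency(revenue).items()):
    print(code, total.amount)
```

This prints one total per currency (EUR 980.00, JPY 150000, USD 1300.00) rather than a single meaningless sum.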

Sunday, February 18, 2007

Myth #6 - Web page authors, this one's fer you


For goodness' sake, how many myths are there? The answer is, as many as I have heard. But truly, there are fewer than 15, so read on! This is one that keeps popping up like a bad penny:
"ISO-8859-1 is the standard encoding for HTML."
Sooooo, does that mean all those Web pages in Japanese and Chinese are a bunch of standard-violating hacks? No, of course not. It is perfectly legal to use any charset in a Web page, but it should be declared. Why? Because ISO-8859-1 is the default charset for HTML (yes, even in HTML 4.0). That means if you don't declare the charset of your page, a browser (or any other HTML interpreter) is free to assume that it's in ISO-8859-1. Now, admittedly, in practice browsers make other assumptions. Typically you set a preference for a default charset (or character encoding, if you prefer). This is sometimes set based on the localization you install; for example, if you install a Russian version of the browser, it may set the default charset as "KOI8-R". But the point is that assumptions will be made, unless you declare the charset in your document. And it's very straightforward. Just put a META tag as the first tag in the HEAD section, like so:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
Simple, right? Oh yes, and I sneaked in a better charset to "default" to - UTF-8. UTF-8 is an encoding of Unicode, nearly universally supported, covering most of the living languages of the world. Use it and all your cares will be over - uh oh, see Myth #4.
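
To see why the declaration matters, here is a minimal Python sketch of what an HTML consumer might do. It looks for a charset in a `META HTTP-EQUIV` tag and otherwise falls back to ISO-8859-1, as described above. `CharsetSniffer` and `decode_page` are hypothetical names. Real browsers use more elaborate detection rules and also recognize the HTML5 `<meta charset>` form and the HTTP `Content-Type` header, so treat this as an illustration rather than a spec-compliant decoder.

```python
from html.parser import HTMLParser

class CharsetSniffer(HTMLParser):
    """Records the charset from <meta http-equiv="Content-Type" content="...">."""

    def __init__(self):
        super().__init__()
        self.charset = None  # stays None when the page declares nothing

    def handle_starttag(self, tag, attrs):
        # HTMLParser delivers tag and attribute names already lowercased.
        if tag != "meta" or self.charset:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "content-type":
            return
        for part in attributes.get("content", "").split(";"):
            name, _, value = part.strip().partition("=")
            if name.strip().lower() == "charset" and value.strip():
                self.charset = value.strip().lower()

def decode_page(raw: bytes, default: str = "iso-8859-1") -> str:
    """Decode with the declared charset, else fall back to `default`."""
    # Assumption: the META tag itself is ASCII, so a Latin-1 pre-pass
    # (which never fails) is enough to find it.
    sniffer = CharsetSniffer()
    sniffer.feed(raw.decode("iso-8859-1"))
    return raw.decode(sniffer.charset or default)

text = "<p>Grüße aus Köln, Привет</p>"
head = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
declared = f"<html><head>{head}</head><body>{text}</body></html>".encode("utf-8")
undeclared = f"<html><head></head><body>{text}</body></html>".encode("utf-8")

print(text in decode_page(declared))    # True: decoded as UTF-8
print(text in decode_page(undeclared))  # False: guessed ISO-8859-1, mojibake
```

Given the same UTF-8 bytes without the META tag, `decode_page` silently produces mojibake rather than raising an error, which is why this kind of problem so easily slips through testing.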

Sunday, January 28, 2007

Myth #5 - For open source aficionados

Oh, the myths keep rolling along. This one is more and more commonly heard (used?) as open source is more and more commonly incorporated into company offerings.
"My product uses open source and so internationalization requirements don't apply."

Tell that to your customers when your software doesn't work for them.

A product is only as good as its weakest component. If you use open source in your product and it isn't internationalized, then it may be that your entire product can't handle international data. For example, say you base your product on one of the Linux flavors that isn't internationalized, and say your primary market is, oh, China (I hope a certain VP is reading this). Does that make sense? Why would Chinese customers run on a platform that doesn't support Chinese? Would you run on a platform that doesn't support English? My German is pretty near fluent, but I still run on an English platform. Even if I switch the interface to German, which I sometimes do, I make sure that it can process English correctly.

When producing a software product, all customer requirements should be considered. And whether you write your own code or pull it from open source, those requirements still count. If your market is worldwide, or even within the EU, internationalization is not a "nice-to-have", it's a must. So when choosing external components, be they from a vendor or from open source, consider the work it will take to get the internationalization up to snuff. Then make your decision.

Thursday, November 09, 2006

Myth #4 - Are we having fun yet?

And so the myths go on, and internationalization folks keep chuckling with a look of irony on their faces. Here is Myth #4:

"My product supports Unicode and therefore it's internationalized."

Interesting. Of course, being a myth, it isn't true. Why isn't it true? Doesn't Unicode cover all languages worldwide? Would that it were that easy. In brief, Unicode is a coded character set (see RFC 2130, The Report of the IAB Character Set Workshop), that is, a set of characters, each associated with an integer. It defines some character encoding schemes (and forms, but we won't get into that) which take the values associated with the characters and translate them into byte sequences that computers can store and process. OK, so what's all this got to do with the myth? The point is that only characters, or parts of characters, are encoded. There is no language information, no locale information, no font information, very little rendering information. What's more, the requirements for supporting Unicode are very lenient. In other words, your product can "support Unicode" and yet only recognize a single character. So, if a product only supports Unicode for its internationalization, not only are you missing information vital to internationalization, but you might not even handle the languages that Unicode covers.
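
A few lines of Python (standard library `unicodedata` only) illustrate the gap. Two strings that look identical can differ at the code point level until they are normalized. Code point order is not a linguistically meaningful sort order. And a unified Han ideograph carries no indication of whether the text is Japanese or Chinese. This is a sketch of the concepts, not a complete test suite.

```python
import unicodedata

composed = "caf\u00e9"      # "é" as one code point, U+00E9
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# Visually identical, but different code point sequences.
print(composed == decomposed)                                 # False
print(len(composed), len(decomposed))                         # 4 5
# Normalizing (NFC here) makes the comparison meaningful.
print(unicodedata.normalize("NFC", decomposed) == composed)   # True

# Code point order is not a collation: "é" (U+00E9) sorts after "z".
print(sorted(["zebra", "éclair", "apple"]))  # ['apple', 'zebra', 'éclair']

# No language information: U+76F4 is the same character in Japanese and
# Chinese text, even though the preferred glyph shapes differ.
print(unicodedata.name("\u76f4"))            # CJK UNIFIED IDEOGRAPH-76F4
```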

And therein lies the myth. By all means, support Unicode. Support all the currently assigned characters. Unicode is a handy tool for supporting languages around the world. But it is not an internationalization silver bullet panacea elixir cure-all.

Monday, October 16, 2006

Myth #3 - For all Java programmers and their managers

Another couple of days (well, maybe more), another myth:

"The code is in Java and therefore it's internationalized."

C'mon, admit it. How many of you think that? How many of you have actually said that?

Well, it's not true, I'm sorry to say. You see, long, long ago, before there was Java, or even any of the internationalization libraries currently available in C, there was internationalized code. "How could that possibly be?" one wonders, scratching the head in puzzlement. Amazingly enough, even back then, there were people who understood the requirements and designed and coded for them. They had to write a lot more code and make their own custom libraries and tools, but they did it.

It is true that it is much easier to write internationalized code in Java because it provides the tools. But you have to use those tools correctly, or you'll have problems. Use the locale-sensitive functionality available in many classes and methods in java.text and java.util. Make sure you're using i18n-friendly classes and methods whatever the package. Take a look at the Java Internationalization site to find out more.

And don't forget to tell the others...

Monday, September 18, 2006

Myth #2 - Just when you thought it was safe...

And so on to Myth #2 (for some background see my previous blog, "Republishing the Myths")
"Translators choose the best phrase in the target language."
Uh-huh. I can hear the translators rolling on the floor laughing (or ROTFL, for those who just love initialisms). Note, I am not disparaging the work of translators - they are professionals and most do a great job considering the limited context they are given. But bear in mind that:
1. They only have the context that you give them.
2. They get paid by the word, or sometimes by the project.
Therefore, given a word to translate, they don't sit there and ponder the literary nuances ... "Hmm, break, what could the programmer mean by this? Is it a break in the text? A break in the execution of the program? Something to cause the program to crash? Or could it refer to the programmer's innermost desire to break free from the shackles of structured code, moving on to more creative and fluid expressions of the starving software engineering soul?" ... No, this doesn't happen, translators would starve if they did this. Why should you care? Well, if you write any text that may possibly be seen by an external user, that is, error messages, help messages, and the like, then think carefully about the text you use. If your product has a glossary (hey, it could happen), use it. Make a comment in the resource file to give the translator some context. Keep like messages together. Use standard English (or German or Japanese or whatever language you're writing in) and stay away from jargon, slang, and local terms and phrases. The translators want to do a good job; give them that opportunity.
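
Here is a sketch of giving translators context using Python's standard `gettext` module and `pgettext` (Python 3.8 and later). `pgettext` attaches a context string to a message so that the same English word can be translated differently. `DemoGermanCatalog` is a hypothetical stand-in for a real compiled catalog, and the German strings are illustrative. The `TRANSLATORS:` comments follow a common convention; GNU xgettext can copy such comments into the PO file with its `--add-comments` option.

```python
import gettext

class DemoGermanCatalog(gettext.NullTranslations):
    """Hypothetical stand-in for a compiled .mo catalog, keyed by (context, msgid)."""
    _entries = {
        ("debugger", "Break"): "Unterbrechen",
        ("page-layout", "Break"): "Seitenumbruch",
    }

    def pgettext(self, context, message):
        # Fall back to the English msgid when no translation exists.
        return self._entries.get((context, message), message)

t = DemoGermanCatalog()

# TRANSLATORS: Debugger toolbar button. Pauses the running program.
pause_label = t.pgettext("debugger", "Break")
# TRANSLATORS: Insert menu item. Starts a new page at the cursor.
page_label = t.pgettext("page-layout", "Break")

print(pause_label, page_label)  # Unterbrechen Seitenumbruch
```

Without the context strings, both calls would share a single catalog entry for "Break", and the translator would have to pick one meaning for both.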

Next blog, Myth #3!

Sunday, September 17, 2006

Beginning again with Myth #1

And so, here it is, and still believed, Myth #1:
"Internationalization means externalizing the user interface so the software can be translated."
I've been in i18n* for over 16 years, and I haven't seen this assumption change in that time. So why is it a myth? Think about why people or companies buy software. Do they buy it for the user interface? If someone in Japan sees email software with a Japanese user interface which can only send and receive email in English (US-ASCII), do you think they'd buy it? Of course not. People buy software to do something to their data. If the software doesn't do to their data what they want or expect, then they're not going to buy it. Seems pretty straightforward, but what does this have to do with internationalization? The answer is, internationalization is, first and foremost, adapting the data processing to handle data from all over the world. This is far more essential than enabling the user interface to be translated, and a good deal more difficult. The difficulty lies mostly in the planning and design, or rather, getting it into the planning and design. The implementation is only difficult in getting implementers to learn a few things and then execute with those things in mind. Externalizing messages is a piece of cake (don't forget images and sounds).
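
As a small illustration of the data side, here is a Python sketch using the standard `email` package. A message with a Japanese subject and body survives serialization and parsing intact, while a code path that assumes US-ASCII cannot even encode the subject. The addresses and the `build_message` helper are hypothetical.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

def build_message(subject: str, body: str) -> bytes:
    """Serialize a message whose headers and body may be in any language."""
    msg = EmailMessage()                  # defaults to email.policy.default
    msg["From"] = "taro@example.jp"
    msg["To"] = "support@example.com"
    msg["Subject"] = subject              # non-ASCII is RFC 2047-encoded on output
    msg.set_content(body)                 # declares a charset (utf-8 by default)
    return msg.as_bytes()

subject = "会議の件"
body = "明日の会議は10時からです。"

raw = build_message(subject, body)
parsed = message_from_bytes(raw, policy=policy.default)
print(parsed["Subject"])
print(parsed.get_content())

# An ASCII-only code path fails on the very same data.
try:
    subject.encode("us-ascii")
except UnicodeEncodeError:
    print("US-ASCII cannot represent this subject line")
```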

* see Norbert's blog

Republishing the myths

Even though my old blog at Sun is still extant, I don't know how long that will last. So I've decided to republish my Internationalization Myths series here, where I can keep an eye on them. Maybe I'll publish a few more, who knows?

For those of you who want to see the original intro to the Myths series, I have decided to re-post it here:

Humor for internationalization engineers (and others, too)
Allow me to introduce myself. I am I18n G.A.L., and rather than just tell you what that stands for, you can find out or you can guess - creative responses get extra credit. I'm a big fan of creativity, as it's pretty much a requirement for anyone trying to incorporate internationalization into a software organization.
I myself have been in this industry specialty for over 16 years (hey, who knew internationalization has been around that long, and longer?) and in software development for longer than that. But enough of my resume ... or résumé ... or CV, on to the topic at hand. If you're really interested in finding out more about me, there'll be a little of me in everything I write.
What prompted this, my very first blog, is twofold. One, Jonathan Schwartz (then CEO of Sun Microsystems, Inc.) is a fan of blogs and blogging, and so encourages it. Two, I actually thought of something to write. Recently, a group of us internationalization folks at Sun were preparing a presentation for a conference. The presentation is "Architecting Products for the Global Market" (titled so as to keep the word internationalization out and maybe attract non-internationalization folk, ha ha, but I digress). At the end of the presentation, we have a series of myths. We were reviewing the presentation draft and making corrections when we got to the Myths section. As we read each myth, invariably all of us would chuckle. We can't help it. We've all encountered these myths in some form or another, often stated almost verbatim by some developer or executive.
Hence the subject of my blog. I like to make people laugh, and if by publishing these myths even one more person laughs, well, the world is a better place. But of course I must explain why each one is considered a myth, and since there are quite a few, I thought I'd better make it a series of blogs.

Friday, August 05, 2005

Splitting the blog

I have decided to split up my blogging. I18n G.A.L. is really about all things international, and I'm sure the technical readers aren't interested in my personal move to England. Likewise, I'm sure my family and friends aren't interested in my technical posts! So I've created a second blog, since I can't see an obvious way on Blogger to create topic areas within one blog:

http://exusexen.blogspot.com

When you go there you will see why I have chosen that name. I will create an inaugural blog there momentarily. I expect that I will be posting more there than here for the time being.

Thanks for reading.

Sunday, July 24, 2005

Arriving and some observations

I have received the comment that all my posts thus far have been about leaving, and none about arriving. I've also received some queries about our safety and well-being.

We are fine, all of us. I don't know anyone who has been directly affected by the recent terrorist actions. But as I said to a friend, we cannot let these things change the way we live. Whether it's religious fanatics blowing up subways or health clinics, random violence from people with guns who shouldn't have them, or men abusing women in societies that tolerate such actions, we must live in the way we believe is right. I believe in tolerance and peace.

As for the more mundane aspects of arriving, our container should arrive tomorrow. We will have to store everything somewhere, as we don't yet have a house or jobs. We are actively searching for a car.

But since arriving, there are things I've come to appreciate in a very short time. I thought that living in California had spoiled me for fresh fruits and vegetables, and that the farmers' market was something I would really miss. I remembered little of this from my many visits here in the past, and I have since learned more. You see, England being so small, things are closer together. That includes the farms to the towns. You can drive 10 minutes and get to little farms, many of which sell their goods right on site. As you drive around, there are small signs up advertising fresh eggs and various vegetables. And on Thursdays, there's a man who comes around here with a truck full of fresh vegetables, fruit, and eggs. The milkman delivers 3 times a week, and in addition to milk, has juice and eggs, too. (Eggs are a recurring theme.) Within a 10-minute walk there are 2 greengrocers, as well as 2 bakeries, 2 butchers, 1 fish shop, and a specialty cooked meat shop. There are supermarkets within walking distance, too, but the products at the specialty shops are superb. Dairy products are much tastier, and the eggs are phenomenal. You can get a greater variety of both (dairy and eggs); for example, there's single cream, double cream, whipping cream, and clotted cream as well as the usual skimmed, semi-skimmed, and regular milks; you can buy duck eggs, or eggs from certain breeds of chicken. Amazing. And produce is significantly cheaper than in the Bay Area. So some things here can spoil a body, and that body at the moment is me!

Wednesday, July 13, 2005

We made it!

(I hate Internet Explorer, which kindly deleted my entire post when I tried to enable pop-ups temporarily so I could run a spell-check. Next time I'm bringing up Firefox...)

We are finally here. It seemed like we'd never get out. Even though we supposedly had lots of extra time to get everything cleaned up, packed away, stored, and donated, we were working until the very end. I sold my car the day we left! That is, I half sold it, and dear Izzy finished the job. I hope the agency we left our stuff out on the curb for actually came by and picked it up. I assume our tenants would have said something (although at this writing they still haven't moved in).

It's amazing how tasks seem to fill the time allotted.

Some advice should you ever decide to move a long distance but retain your house (apart from "don't!") - it is a huge task. Unless you are a minimalist and extremely organized, it is exhausting, both mentally and physically. Enlist all the help you can get, accept all the help that is offered. Plan to get rid of loads of things you thought you would keep. Don't have any grand ideas of cooking the food in your fridge, freezer, and pantry. Try and secure cleaners well in advance. Plan to spend more money than you budgeted. Order a large garbage pickup on the latest date possible. And get babysitters for the kids, for their sanity as well as yours.

The mental exhaustion comes from making literally hundreds of tiny decisions: do I keep this thing, give it to a certain person, store it in the attic for when we return (maybe 10 years hence), donate it, recycle it, or throw it in the garbage? This goes for nearly every single object you put into a box (if you're not hiring packers). Because, after all, what's the point of putting it into a box if you never want to see it again? This is your opportunity to get rid of the clutter. Towards the end I nearly called a charity agency to just come and clear the rest of the house out, if only to save my mind and my back. This all made it easier to leave. So far so good. The real test comes a couple of months hence, when the novelty wears off and I start missing people, places, and decent Mexican food.

Wednesday, June 29, 2005

Final wrap up to leave the house

Well, as I'm waiting for files to copy from my flash drive (wonderful little devices, those) to the laptop, I thought I'd blog on our progress. The house is mostly cleaned and packed. The container left the morning of the 23rd, and that was a bit of a panic. Were it not for the help of some very good friends, we would have been up all night the night before (and we're too old for that sort of thing!). The 20 ft. x 8 ft. x 8 ft. container was nearly full, all the way to the top. Lots of stuff, and we'd pared down quite a bit! Some of the stuff we'd "pared down" was actually to be packed up and put into the attic. One disadvantage of retaining our house was the abundance of free attic storage. Translation: we've kept a lot of stuff we wouldn't have kept, I'm sure. But truly we have gotten rid of a lot of junk. And if and when we move back, I hope we can be even more discriminating. We still have more of the kitchen to pack up, and boxes and boxes of stuff to take to various agencies. I need to pop into the office to shred a few docs.

Thank goodness for Izzy, who has come almost every day to pack and make us keep moving. Today she packed and packed and cleaned and cleaned and cleaned, which is a good thing since I can't seem to get another friend's cleaning lady to call me back. I am so looking forward to a few days at Izzy's house, when everything is done and we can kick back by her pool before the long flight across the pond. I'll be able to get some luxuries done, like a haircut, manicure, and pedicure. Aaaaaah. Maybe we'll even take in a movie (Hitchhiker's Guide, anyone?). For now I must finish copying files from my old PC to the laptop so we can give the old PC to our neighbor. Yet another delivery to make tomorrow...

Monday, June 20, 2005

Thinking and not sleeping

There are so many things to take care of when you're leaving a country. And when you're busy all day just doing the things that have to be done at that moment, you don't have a chance to consider all the other things that are yet to be done. So, when you lie down to sleep, that's when the mental hamsters start running on the wheel. Many a night in the last couple of weeks I have found myself staring into the darkness, composing emails to old boyfriends, letters to long lost friends, figuratively packing my case and arranging items into box categories ("Open right away", "Open fairly soon", "No hurry", etc.) }sigh{ no rest for the wicked...

Tuesday, June 14, 2005

Welcome!

Hello and welcome to the new location for the I18n G.A.L.!
To those of you who have followed me from blogs.sun.com, thank you. To the new folks, you might want to read some of my previous blogs.
I need to keep this short due to a time constraint. You see, I'm off to a new adventure, moving to another country. Needless to say, there's loads to take care of. I expect my new experiences will give me quite a lot to write about. Until then, I'll be packing and sorting through 15 years' worth of accumulation, and having lots of fun doing it, rest assured.

Until later...