Wednesday, September 26, 2012

PL6: Don't get too attached to fragments

While fragments might be good for a Greek mosaic, they aren't useful for Greek translation.  Or any other translation, for that matter.

In moving text strings to external resources, some engineers try to be as efficient as possible. Efficiency is certainly a laudable attribute, but when it comes to strings, it may mean taking a less obvious path.

Let me clarify.  A simple set of user interface strings might be:
n file(s) have been deleted.
n folder(s) have been deleted.
n file(s) have been moved.
n folder(s) have been moved.
Looking at these strings in English, one might be tempted to break them down into what seem to be obvious components:
file(s)
folder(s)
have been deleted.
have been moved.
Super!  The code just needs to pick up the components and concatenate them. Fewer words, less storage space taken, and the translation should be cheaper and faster, right?

Well, no, not exactly.  The problem is that while these components can simply be concatenated in English and still make sense, this is not true in other languages.  Strings that appear in the UI in somewhat stilted English as:
1 file(s) have been deleted.
2 file(s) have been deleted.
1 folder(s) have been deleted.
2 folder(s) have been deleted.
1 file(s) have been moved.
2 file(s) have been moved.
1 folder(s) have been moved.
2 folder(s) have been moved.
should become, in Italian:
1 file è stato eliminato.
2 file sono stati eliminati.
1 cartella è stata eliminata.
2 cartelle sono state eliminate.
1 file è stato spostato.
2 file sono stati spostati.
1 cartella è stata spostata.
2 cartelle sono state spostate.
Ignoring the possibility that this might not be the best Italian translation, note how many changes occur from sentence to sentence, due to changes in gender and number. With concatenated fragments, each fragment is translated only once, so the translator has no way to accommodate these changes and must cope. The resulting translation would be extremely poor, akin to English users seeing a string such as "These file she have been deleted." Not exactly user-friendly.
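
To make the contrast concrete, here is a minimal Python sketch. Everything in it is hypothetical: the table names, the message keys, and the fragment translations (a guess at what a translator might supply when shown each fragment only once). The plural rule is a simplification assumed for this example; real locales need proper plural rules from a localization library.

```python
# Fragment approach: each piece is translated once, with no context.
# These fragment translations are hypothetical, for illustration only.
FRAGMENTS_IT = {
    "file": "file",
    "folder": "cartella",
    "deleted": "è stato eliminato.",
}

def fragment_message(n, noun, verb):
    # Concatenation cannot adjust for gender or number.
    return f"{n} {FRAGMENTS_IT[noun]} {FRAGMENTS_IT[verb]}"

# Whole-sentence approach: one complete string per message and plural form,
# so the translator controls agreement in every sentence.
MESSAGES_IT = {
    "files_deleted": {
        "one": "{n} file è stato eliminato.",
        "other": "{n} file sono stati eliminati.",
    },
    "folders_deleted": {
        "one": "{n} cartella è stata eliminata.",
        "other": "{n} cartelle sono state eliminate.",
    },
}

def plural_category(n):
    # Simplified rule assumed for this sketch: exactly 1 is "one".
    return "one" if n == 1 else "other"

def sentence_message(key, n):
    template = MESSAGES_IT[key][plural_category(n)]
    return template.format(n=n)

print(fragment_message(2, "folder", "deleted"))  # ungrammatical Italian
print(sentence_message("folders_deleted", 2))    # agrees in gender and number
```

The second table holds more strings, but each one is a complete sentence that a translator can get right.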

Pseudo-localizing each resource string with some sort of opening and closing characters, such as curly braces, will expose concatenated fragments. If a closing brace is followed by an opening brace in the UI, chances are the resource strings need some defragging.
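
The brace check can be sketched in a few lines of Python. The function names are hypothetical, and allowing whitespace between the closing and opening brace is an assumption of this sketch, since fragments are often joined with a space.

```python
import re

def pseudo_localize(text):
    # Wrap a resource string in braces so its boundaries show up in the UI.
    return "{" + text + "}"

# A closing brace followed by an opening brace, optionally separated by
# whitespace, suggests two resource strings were glued together.
SEAM = re.compile(r"\}\s*\{")

def looks_concatenated(rendered):
    return SEAM.search(rendered) is not None

# A message built from two pseudo-localized fragments.
from_fragments = "3 " + pseudo_localize("file(s)") + " " + pseudo_localize("have been deleted.")

# The same message stored as one complete resource string.
from_sentence = pseudo_localize("3 file(s) have been deleted.")

print(from_fragments, looks_concatenated(from_fragments))
print(from_sentence, looks_concatenated(from_sentence))
```

A match is only a hint: two unrelated labels sitting side by side would trigger it too, so each hit still deserves a look.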