Chapter 12

Language Translation

Much of today’s software is written for an international market. Worldwide sales enable vendors to maximize profits. In addition, multinational companies often must build systems that span countries, cultures, and languages. Language translation can be difficult. Data is often stored in the language of entry, but metadata, such as labels in forms and reports, may need to be translated. This chapter presents the nucleus of a string translation model.

12.1 Alternative Architectures

Table 12.1 summarizes several approaches to language translation. It is convenient to consider abbreviation along with translation.

Table 12.1

Language Translation Approaches

Attribute translation in place
Synopsis: Each translated or abbreviated attribute has multiple parallel fields.
Advantages:
  • Simplicity.
  • Precise translation.
  • No language bias.
  • Supports abbreviation.
Disadvantages:
  • Must add fields.
  • Translations can be inconsistent.
  • A person must provide the translations.

Phrase-to-phrase translation
Synopsis: A lookup mechanism converts a source phrase into a target language and abbreviation.
Advantages:
  • No disruption to applications.
  • Supports abbreviation.
Disadvantages:
  • Multiple meanings can lead to translation errors.
  • Language bias.
  • A person must provide the translations.

Language-neutral translation
Synopsis: Applications store concept IDs. A lookup table maps IDs to phrases.
Advantages:
  • Precise translation.
  • No language bias.
  • Supports abbreviation.
Disadvantages:
  • Translated application fields must be stored as IDs.
  • A person must provide the translations.

Automated translation
Synopsis: A software algorithm translates a phrase from one language into another.
Advantages:
  • Persons do not make any translations.
Disadvantages:
  • Poor translation quality.
  • May not handle abbreviation.

One option is to add parallel columns for translations and abbreviations. This approach is certainly simple, but it is verbose (many columns could be needed) and brittle (each added translation or abbreviation causes modification of the schema).

A dedicated lookup table can convert a phrase from a base language to a target language and can also handle abbreviations. The advantage is that the application schema is not disrupted. The downside is that phrases can be translated out of context, leading to errors. For example, the word “bank” has multiple meanings (a financial institution or the edge of a river).

The language-neutral translation service is a robust choice. It also uses a lookup table, but a concept ID represents the source idea. This separates the multiple meanings of words and phrases, yielding a clean translation. The drawback is that application databases must replace translatable strings with concept IDs. Consequently, this approach is normally limited to new applications.
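The difference between keying on a phrase and keying on a concept can be seen in a minimal Python sketch of the “bank” problem. The concept IDs 101 and 102, the variable names, and the French renderings (banque for a financial institution, rive for a riverbank) are illustrative assumptions, not part of the chapter's models.

```python
# Phrase-to-phrase: the lookup key is the English source phrase itself.
# A dict holds one value per key, so the two senses of "bank" collide and
# the second assignment silently replaces the first.
phrase_to_french: dict[str, str] = {}
phrase_to_french["bank"] = "banque"  # financial institution
phrase_to_french["bank"] = "rive"    # edge of a river; overwrites "banque"

# Language-neutral: the lookup key is a concept ID (hypothetical values),
# so each sense of "bank" keeps its own translations.
FINANCIAL_BANK = 101
RIVER_BANK = 102
concept_phrases: dict[tuple[int, str], str] = {
    (FINANCIAL_BANK, "English"): "bank",
    (FINANCIAL_BANK, "French"): "banque",
    (RIVER_BANK, "English"): "bank",
    (RIVER_BANK, "French"): "rive",
}


def translate_concept(concept_id: int, language: str) -> str:
    """Return the phrase recorded for a concept in the given language."""
    return concept_phrases[(concept_id, language)]


print(phrase_to_french["bank"])                     # rive
print(translate_concept(FINANCIAL_BANK, "French"))  # banque
```

Both English entries read “bank”, yet the concept-keyed table can still tell them apart; the phrase-keyed table cannot.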

Some Web sites implement the last option. For example, Babel Fish and Google Language Tools can both translate a phrase from a source language to a target language. Such an approach is not viable for most applications because translation quality is often poor.

The next sections elaborate the first three options.

12.2 Attribute Translation In Place

The simplest approach is to add columns for translations and abbreviations. Figure 12.1 shows an example. The birthplace, hair color, and eye color strings are stored in both English and Spanish; the other fields are not translated. This approach is vulnerable to inconsistencies. For example, two persons could both have brown hair, yet their Spanish hair color fields could hold different translations.
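A minimal Python sketch of attribute translation in place follows, assuming a Person record shaped like the one described for Figure 12.1. The field names, the single untranslated `name` field (standing in for the figure's other untranslated fields), and the `translated` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Untranslated attribute (hypothetical stand-in for Figure 12.1's others).
    name: str
    # Each translated attribute has one parallel field per language.
    # Supporting French would mean adding three more fields to the schema.
    birth_place_english: str
    birth_place_spanish: str
    hair_color_english: str
    hair_color_spanish: str
    eye_color_english: str
    eye_color_spanish: str

    def translated(self, attribute: str, language: str) -> str:
        """Read a parallel field, e.g. translated("hair_color", "spanish")."""
        return getattr(self, f"{attribute}_{language}")


ana = Person("Ana", "Spain", "España", "brown", "castaño", "green", "verde")
luis = Person("Luis", "Mexico", "México", "brown", "marrón", "brown", "marrón")

# Same English value, different Spanish values: the schema cannot prevent it.
print(ana.translated("hair_color", "spanish"))   # castaño
print(luis.translated("hair_color", "spanish"))  # marrón
```

Nothing ties “brown” to a single Spanish rendering, which is exactly the inconsistency the text warns about.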

Figure 12.1


Attribute translation in place: Person model. Consider when few fields must be translated and for XML files.

Consider this approach when only a few fields must be translated. Also consider this approach when XML files store data. XML files can handle parallel fields with nested elements (unlike relational database tables).
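To illustrate the point about XML, here is a small sketch using Python's standard `xml.etree.ElementTree`. The element names (`person`, `hairColor`, `text`) and the `lang` attribute are assumptions chosen for the example, not a format defined by the chapter.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Parallel translations nest under one element. Adding a language means
# adding a child element rather than altering a table schema.
XML_TEXT = """<person>
  <name>Ana</name>
  <hairColor>
    <text lang="en">brown</text>
    <text lang="es">castaño</text>
  </hairColor>
</person>"""


def translated(root: ET.Element, field: str, lang: str) -> Optional[str]:
    """Return the text of <field>/<text lang="...">, or None if it is absent."""
    node = root.find(f"{field}/text[@lang='{lang}']")
    return node.text if node is not None else None


root = ET.fromstring(XML_TEXT)
print(translated(root, "hairColor", "es"))  # castaño
print(translated(root, "hairColor", "fr"))  # None
```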

12.3 Phrase-to-Phrase Translation

Figure 12.2 and Figure 12.3 model the lookup mechanism for phrase-to-phrase translation. The advantage of this approach is that there is no disruption to any existing application schema. Consider this approach when you can limit the phrase vocabulary and avoid multiple meanings.

Figure 12.2


Phrase-to-phrase translation: UML model. Consider when you can limit the phrase vocabulary and avoid multiple meanings.

Figure 12.3


Phrase-to-phrase translation: IDEF1X model.

A Phrase is a string with a specific Language and AbbreviationType. The Language for a string can be a Dialect, a MajorLanguage, or AllLanguage. A MajorLanguage is a natural language, such as French, English, and Japanese. A Dialect is a variation of a MajorLanguage, such as UK English, US English, and Australian English. AllLanguage has a single record for strings that do not vary across languages.
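The Language generalization can be sketched as a small Python class hierarchy. This is one possible rendering of the model, not the book's code; the class and constant names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    name: str


@dataclass(frozen=True)
class MajorLanguage(Language):
    """A natural language, such as French, English, or Japanese."""


@dataclass(frozen=True)
class Dialect(Language):
    """A variation of a MajorLanguage, such as UK English."""
    major: MajorLanguage


@dataclass(frozen=True)
class AllLanguage(Language):
    """Stands for strings that do not vary across languages."""


ENGLISH = MajorLanguage("English")
FRENCH = MajorLanguage("French")
UK_ENGLISH = Dialect("UK English", ENGLISH)
US_ENGLISH = Dialect("US English", ENGLISH)
ALL_LANGUAGE = AllLanguage("All")  # the single AllLanguage record

print(UK_ENGLISH.major.name)  # English
```

Giving each Dialect a reference to its MajorLanguage is what later allows a lookup to fall back from a Dialect to its parent language.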

Each Phrase has an AbbreviationType, which specifies the maximum length of the string. For example, there may be a short name (5 characters), a medium name (10 characters), a long name (20 characters), and an extra long name (80 characters). Abbreviations are especially handy for reports and user interface forms.
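Treating an AbbreviationType as a maximum length suggests a simple validation rule. The Python sketch below uses the lengths from the example above; the enum and the `check_phrase` function are hypothetical names.

```python
from enum import Enum


class AbbreviationType(Enum):
    """Maximum phrase length in characters (values from the text's example)."""
    SHORT = 5
    MEDIUM = 10
    LONG = 20
    EXTRA_LONG = 80

    def fits(self, text: str) -> bool:
        return len(text) <= self.value


def check_phrase(text: str, abbreviation: AbbreviationType) -> str:
    """Reject a phrase that is longer than its AbbreviationType allows."""
    if not abbreviation.fits(text):
        raise ValueError(
            f"{text!r} has {len(text)} characters; "
            f"{abbreviation.name} allows {abbreviation.value}"
        )
    return text


check_phrase("trk", AbbreviationType.SHORT)   # accepted
check_phrase("truck", AbbreviationType.LONG)  # accepted
# check_phrase("trucks", AbbreviationType.SHORT) raises ValueError (6 > 5).
```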

PhraseEquivalence cross-references Phrases with the same meaning. (See the Symmetric relationship antipattern in Chapter 8.) Synonymous Phrases can exist across Languages and AbbreviationTypes, but a PhraseEquivalence has at most one Phrase for each combination of Language and AbbreviationType (hence the uniqueness constraint).
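A PhraseEquivalence and its uniqueness constraint can be sketched in Python as a group keyed by (language, abbreviation). Each Phrase joins a shared group instead of being linked pairwise to every synonym, so no link has to be stored in both directions; that is our reading of the pointer to the Symmetric relationship antipattern. Languages are simplified to plain names, and all class and method names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Phrase:
    text: str
    language: str      # simplified to a name for this sketch
    abbreviation: str  # e.g. "short" or "long"


@dataclass
class PhraseEquivalence:
    """A group of synonymous Phrases, keyed by (language, abbreviation)."""
    members: dict[tuple[str, str], Phrase] = field(default_factory=dict)

    def add(self, phrase: Phrase) -> None:
        key = (phrase.language, phrase.abbreviation)
        # Uniqueness constraint: one Phrase per Language and AbbreviationType.
        if key in self.members:
            raise ValueError(f"group already has a phrase for {key}")
        self.members[key] = phrase

    def lookup(self, language: str, abbreviation: str) -> Optional[Phrase]:
        return self.members.get((language, abbreviation))


truck = PhraseEquivalence()
truck.add(Phrase("truck", "English", "long"))
truck.add(Phrase("camion", "French", "long"))
truck.add(Phrase("trk", "English", "short"))
# truck.add(Phrase("lorry", "English", "long")) would raise ValueError.
```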

The translation service is dedicated software that runs apart from client applications. The translation database stores corresponding Phrases for various Languages and AbbreviationTypes. (A person must populate the translation database.) Upon request, the service finds the translation given a source Phrase, target Language, and target AbbreviationType.
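The service's request (source Phrase, target Language, target AbbreviationType) can be sketched as follows. The in-memory list stands in for the translation database, and the sample data and the `translate` signature are assumptions; a real service would index phrases rather than scan.

```python
from typing import Optional

# Sample translation database: each dict is one PhraseEquivalence group,
# mapping (language, abbreviationType) to phrase text. Data is illustrative.
EQUIVALENCES: list[dict[tuple[str, str], str]] = [
    {("English", "long"): "truck", ("French", "long"): "camion",
     ("English", "short"): "trk"},
    {("English", "long"): "car", ("French", "long"): "voiture"},
]


def translate(source_text: str, source_language: str, source_abbreviation: str,
              target_language: str, target_abbreviation: str) -> Optional[str]:
    """Find the source Phrase's group, then return the target Phrase from it."""
    for group in EQUIVALENCES:  # linear scan; a real service would index this
        if group.get((source_language, source_abbreviation)) == source_text:
            # First match wins, so a source phrase with several meanings
            # (several groups) can yield the wrong translation.
            return group.get((target_language, target_abbreviation))
    return None


print(translate("truck", "English", "long", "French", "long"))    # camion
print(translate("truck", "English", "long", "English", "short"))  # trk
```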

Figure 12.4 shows a sample application table that could be subject to the translation mechanism. The phrase-to-phrase approach has a language bias. For example, the source data may be stored in English and converted to another language only upon translation mapping. Architecturally, a language bias is undesirable because users may detect the favored language.

Figure 12.4


Phrase-to-phrase translation: Person model.

The pseudocode in Figure 12.5 illustrates the logic for finding a translation. (The pseudocode is written using the UML’s Object Constraint Language [Warmer-1999].) The basic logic is to first look for an exact match to the target language. Otherwise, if a Dialect is specified, look for the corresponding MajorLanguage. If that fails, make one last attempt to find the AllLanguage record.
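Because the OCL of Figure 12.5 is not reproduced here, the following Python sketch only mirrors the fallback order described in the text. The Dialect-to-MajorLanguage mapping, the sample groups, and the assumption that "km" is invariant across languages are all illustrative.

```python
from typing import Optional

ALL_LANGUAGE = "AllLanguage"
# Hypothetical Dialect -> MajorLanguage mapping.
MAJOR_OF = {"UK English": "English", "US English": "English",
            "Australian English": "English"}

# One PhraseEquivalence group: (language, abbreviationType) -> phrase text.
Group = dict[tuple[str, str], str]


def find_phrase(group: Group, language: str, abbreviation: str) -> Optional[str]:
    """Try the exact language, then a Dialect's MajorLanguage, then AllLanguage."""
    candidates = [language]
    if language in MAJOR_OF:  # the target is a Dialect
        candidates.append(MAJOR_OF[language])
    candidates.append(ALL_LANGUAGE)
    for lang in candidates:
        text = group.get((lang, abbreviation))
        if text is not None:
            return text
    return None


truck: Group = {("English", "long"): "truck", ("UK English", "long"): "lorry",
                ("French", "long"): "camion"}
kilometre: Group = {(ALL_LANGUAGE, "short"): "km"}

print(find_phrase(truck, "UK English", "long"))          # lorry
print(find_phrase(truck, "Australian English", "long"))  # truck
print(find_phrase(kilometre, "Japanese", "short"))       # km
```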

Figure 12.5


Phrase-to-phrase translation: Pseudocode for finding a phrase.

12.4 Language-Neutral Translation

Figure 12.6 and Figure 12.7 show a model for a language-neutral translation service. This approach separates the multiple meanings of words and phrases for a clean translation. However, you must replace translatable strings with concept IDs, which limits this approach to new applications.

Figure 12.6


Language-neutral translation: UML model. Consider for new applications that require a robust translation approach.

Figure 12.7

Language-neutral translation: IDEF1X model.


A Phrase is a string with a specific Language and AbbreviationType. The Language for a string can be a Dialect, a MajorLanguage, or AllLanguage. A MajorLanguage is a natural language, such as French, English, and Japanese. A Dialect is a variation of a MajorLanguage, such as UK English, US English, and Australian English. AllLanguage has a single record for strings that do not vary across languages.

Each Phrase has an AbbreviationType, which specifies the maximum length of the string. For example, a name may be short (5 characters), medium (10 characters), long (20 characters), or extra long (80 characters). Abbreviations are especially handy for reports and forms.

A TranslationConcept is the idea in a person’s mind that underlies a group of related Phrases. The premise of language-neutral translation is that an idea can be precisely expressed in any Language. Of course, this assumption is not exactly true because each language has its nuances. However, it is a good approximation for translating short phrases such as those that occur in user interface screens and reports. The translation service is not intended for long passages such as those in documents and books.

Table 12.2 shows a simple example. A person has the concept “truck” in mind with a translationConceptID of 2054.

Table 12.2

Language-Neutral Translation: Sample Phrases

translationConceptID | Language | AbbreviationType | Phrase
2054 | MajorLanguage = English | long | truck
2054 | MajorLanguage = French | long | camion
2054 | MajorLanguage = English | short | trk
2054 | Dialect = British English | long | lorry

  • A MajorLanguage of English and long AbbreviationType yields a Phrase of "truck."
  • A MajorLanguage of French and long AbbreviationType yields a Phrase of "camion."
  • A MajorLanguage of English and short AbbreviationType yields a Phrase of "trk."
  • A Dialect of British English and long AbbreviationType yields a Phrase of "lorry."

In practice, many persons could populate data and define redundant TranslationConcepts. Multiple definitions are undesirable but difficult to avoid. These multiple definitions ripple throughout application databases and are difficult to consolidate.

ConceptEquivalence provides a cross-reference for synonymous TranslationConcepts and effects a logical merge. (See Chapter 11.) The application tables store translationConceptIDs. ConceptEquivalence serves only as a cross-reference and is not referenced by application tables. (See the Symmetric relationship antipattern in Chapter 8.) Each occurrence of ConceptEquivalence has a preferred TranslationConcept.
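One plausible way to realize the logical merge is to resolve any concept ID to its group's preferred concept at lookup time, so that application rows holding a redundant ID need no update. The chapter does not spell out this mechanism; the Python sketch below is an assumption, and the IDs 2054 and 3117, the rule that phrases are stored only under the preferred concept, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConceptEquivalence:
    """Cross-reference of synonymous TranslationConcepts."""
    preferred: int
    members: frozenset[int]

    def __post_init__(self) -> None:
        if self.preferred not in self.members:
            raise ValueError("the preferred concept must be a member")


# Hypothetical: 3117 was defined redundantly for the same idea as 2054.
EQUIVALENCES = [ConceptEquivalence(preferred=2054, members=frozenset({2054, 3117}))]
# In this sketch, phrases are recorded only under the preferred concept.
PHRASES = {(2054, "French", "long"): "camion"}


def preferred_concept(concept_id: int) -> int:
    """Map a concept ID to its group's preferred ID, or to itself if ungrouped."""
    for eq in EQUIVALENCES:
        if concept_id in eq.members:
            return eq.preferred
    return concept_id


def find_phrase(concept_id: int, language: str, abbreviation: str) -> Optional[str]:
    return PHRASES.get((preferred_concept(concept_id), language, abbreviation))


# An application row that stores 3117 still translates without being updated.
print(find_phrase(3117, "French", "long"))  # camion
```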

The translation service is dedicated software that runs apart from client applications. To use the service, an application database substitutes a translationConceptID for each translatable phrase. For each TranslationConcept, the translation database stores the corresponding Phrases for the pertinent Languages and AbbreviationTypes. (A person must populate the translation database.) Upon request, the service finds the Phrase for the specified TranslationConcept, Language, and AbbreviationType.

Figure 12.8 shows a sample application table that is subject to language-neutral translation. The use of concept IDs works well for a new application. But it would be disruptive for an existing application to change strings to IDs.
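The following Python sketch shows what storing concept IDs looks like from the application side, assuming a Person table whose hair and eye colors hold translationConceptIDs. The concept IDs 7001 and 7002, the field names, and the `display` function are hypothetical; only the concept-ID fields are translated, at display time.

```python
from dataclasses import dataclass

# Translation database: (translationConceptID, language, abbreviationType) -> phrase.
# Concept IDs and phrases are illustrative.
PHRASES = {
    (7001, "English", "long"): "brown",
    (7001, "Spanish", "long"): "castaño",
    (7002, "English", "long"): "green",
    (7002, "Spanish", "long"): "verde",
}


@dataclass
class Person:
    name: str                # not translated; stored as a plain string
    hair_color_concept: int  # a translationConceptID instead of a string
    eye_color_concept: int


def display(person: Person, language: str) -> dict[str, str]:
    """Render a Person row, translating only the concept-ID fields."""
    def phrase(concept_id: int) -> str:
        return PHRASES[(concept_id, language, "long")]
    return {"name": person.name,
            "hairColor": phrase(person.hair_color_concept),
            "eyeColor": phrase(person.eye_color_concept)}


ana = Person("Ana", hair_color_concept=7001, eye_color_concept=7002)
print(display(ana, "Spanish"))
# {'name': 'Ana', 'hairColor': 'castaño', 'eyeColor': 'verde'}
```

Every person with brown hair stores the same concept ID, so each language has exactly one rendering of “brown”, unlike the parallel-field approach.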

Figure 12.8


Language-neutral translation: Person model.

The pseudocode in Figure 12.9 illustrates the logic for finding a phrase, given a TranslationConcept, AbbreviationType, and Language. The basic logic is to first look for an exact match to the target language. Otherwise, if a Dialect is specified, look for the corresponding MajorLanguage. If that fails, make one last attempt to find the AllLanguage record.
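Since Figure 12.9 is not reproduced here, the Python sketch below follows only the fallback order described in the text, replaying the rows of Table 12.2. The Dialect-to-MajorLanguage mapping (including Australian English) and the function name are assumptions.

```python
from typing import Optional

ALL_LANGUAGE = "AllLanguage"
# Dialect -> MajorLanguage (assumed mapping for this sketch).
MAJOR_OF = {"British English": "English", "Australian English": "English"}

# Table 12.2 as (translationConceptID, language, abbreviationType) -> phrase.
PHRASES = {
    (2054, "English", "long"): "truck",
    (2054, "French", "long"): "camion",
    (2054, "English", "short"): "trk",
    (2054, "British English", "long"): "lorry",
}


def find_phrase(concept_id: int, language: str, abbreviation: str) -> Optional[str]:
    # 1. exact match on the requested language
    # 2. if the language is a Dialect, its MajorLanguage
    # 3. the AllLanguage record
    candidates = [language]
    major = MAJOR_OF.get(language)
    if major is not None:
        candidates.append(major)
    candidates.append(ALL_LANGUAGE)
    for lang in candidates:
        phrase = PHRASES.get((concept_id, lang, abbreviation))
        if phrase is not None:
            return phrase
    return None


print(find_phrase(2054, "British English", "long"))     # lorry
print(find_phrase(2054, "British English", "short"))    # trk (via English)
print(find_phrase(2054, "Australian English", "long"))  # truck (via English)
```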

Figure 12.9


Language-neutral translation: Pseudocode for finding a phrase.

12.5 Chapter Summary

A translation service is helpful when software must support multiple languages, such as English, French, and Japanese. The need for such a capability arises often, and the capability can be delivered as a service apart from any particular application. This chapter presents several approaches to language translation.

Bibliographic Notes

Several commercial products have language translation capabilities, including Multilizer, Schaudin, Lionbridge, and Xataface.

The terms internationalization and localization are prominent in the literature. “Internationalization is the process of designing a software application so that it can be adapted to various languages and regions without engineering changes. Localization is the process of adapting software for a specific region or language by adding locale-specific components and translating text.” [Wikipedia] The models in this chapter deal with internationalization. The population of data addresses localization.

References

[Warmer-1999] Jos Warmer and Anneke Kleppe. The Object Constraint Language. Boston: Addison-Wesley, 1999.
