10.1. An Overview of Globalization Support

Oracle's globalization support is a collection of features that allow you to manage data in multiple native languages within the same database instance. It also greatly simplifies application development by offering a rich set of globalization functionality to the developer.

Globalization support provides the character sets and datatypes needed to store multilingual data. It ensures that date, time, monetary, numeric, and calendar data will follow any supported locale conventions and display properly. It provides utilities and error messages translated to many different languages. It also provides the internal functionality to sort and to query multilingual data using proper linguistic rules.

In the following sections, you will learn about Oracle's globalization support features. You will get an overview of each feature and the functionality that it provides.

You will learn about the underlying architecture upon which globalization support is built. You'll be introduced to the National Language Support Runtime Library (NLSRTL) and see how its modular design provides flexibility and saves resources.

You will also learn how applications interact with Oracle from a globalization perspective. And finally, you will be introduced to Unicode and the advantages that it offers in a multilingual environment.

10.1.1. Globalization Support Features

Globalization support provides a rich set of functionality to the Oracle database. But it is important to make two distinctions perfectly clear regarding what globalization support does not do:

  • Globalization support does not translate text into different languages.

  • Globalization support does not control how multilingual text is displayed on client machines.

Globalization support simply provides the infrastructure to allow text to be stored, manipulated, sorted, and searched in many languages using linguistically significant means. It also allows the data to be displayed using the standard conventions for a specific region.

Globalization support includes these features:


Language support

Globalization support allows data to be stored, processed, and retrieved in virtually any scripted language. For many of these languages, Oracle also provides additional support such as text-sorting conventions, date-formatting conventions (including translated month names), and even error message and utility interface translation.


Territory support

Cultural conventions often differ between geographical locations. For example, local time format, date format, and numeric and monetary conventions can differ significantly between regions, even though they may share a common language. To allow for these differences, the NLS_TERRITORY parameter can be used to define which conventions to follow.

However, these default settings can still be overridden through the use of NLS parameter settings. Overriding the default settings allows finer granularity in defining and customizing display formats to account for special circumstances. For example, it is possible to set the primary currency to the Japanese yen and the secondary currency to the dollar even with the territory defined as India.
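The currency example above might look like the following in a session. This is only a sketch: it assumes a SQL*Plus session whose client character set can display the yen sign, and the amount is arbitrary. In the number format model, L prints the local currency symbol (NLS_CURRENCY) and U prints the dual currency symbol (NLS_DUAL_CURRENCY).

```sql
-- Start from the conventions for India. Set the territory first,
-- because changing NLS_TERRITORY resets the currency defaults.
ALTER SESSION SET NLS_TERRITORY = 'INDIA';

-- Override two of the defaults that the territory supplied.
ALTER SESSION SET NLS_CURRENCY = '¥';
ALTER SESSION SET NLS_DUAL_CURRENCY = '$';

-- L = local (primary) currency symbol, U = dual (secondary) currency symbol.
SELECT TO_CHAR(1234.56, 'L9G999D99') AS primary_amount,
       TO_CHAR(1234.56, 'U9G999D99') AS secondary_amount
  FROM dual;
```

The group and decimal separators (G and D) still follow the territory, because only the two currency symbols were overridden.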


Linguistic sorting and searching

Globalization support offers culturally accurate case conversion, sorting, and searching for all supported languages. It offers the ability to search and sort based on the rules of language rather than simply on the order in which the characters are encoded in the character set. It also offers case-insensitive sorts and searches as well as accent-insensitive sorts and searches.

Linguistic sorts are defined separately from the language itself, allowing the ability to share sort definitions between languages. Linguistic sort defaults can also be overridden through the use of NLS parameter settings. This allows you the flexibility to customize your environment as needed.
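As an illustration, the statements below contrast a binary sort with a linguistic sort and then override the session default. The table city_names is hypothetical and exists only for this sketch; the sort names are standard Oracle sort definitions, but which ones suit your data is an assumption to verify.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE city_names (name VARCHAR2(40));

-- Binary sort: rows are ordered by their encoded character values.
SELECT name FROM city_names
 ORDER BY NLSSORT(name, 'NLS_SORT = BINARY');

-- Linguistic sort: rows follow the named sort definition instead.
SELECT name FROM city_names
 ORDER BY NLSSORT(name, 'NLS_SORT = GERMAN');

-- Session-level override: a plain ORDER BY now uses the French sort.
ALTER SESSION SET NLS_SORT = FRENCH;
SELECT name FROM city_names ORDER BY name;

-- A _CI suffix requests a case-insensitive sort; _AI requests an
-- accent-insensitive (and case-insensitive) sort.
ALTER SESSION SET NLS_SORT = BINARY_AI;
SELECT name FROM city_names ORDER BY name;
```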


Character sets and semantics

Oracle supports a vast number of character sets based on national and international standards, including Unicode. With such a wide variety available, users can often find a single character set that supports all of the languages they need.

Unicode is a universal character set that supports all known written languages. Oracle offers full support of the Unicode 3.2 standard and offers several Unicode encoding options.

Unicode can be defined as the database character set, making it the default datatype for all character columns. If Unicode is not defined as the database character set, it can still be used by defining specific columns as Unicode datatypes (in other words, NCHAR, NVARCHAR2, NCLOB).

Many multi-byte character sets use variable widths when storing data. This means that, depending on the character being stored, Oracle may use anywhere from one to four bytes to store it. Therefore, defining column widths in terms of the number of characters, rather than the number of bytes, becomes crucial. Character semantics allow character data to be specified in terms of the number of characters, regardless of the number of bytes actually required. Byte semantics, the default, assume a single byte character set, where one character always requires one byte of storage.
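The difference between the two semantics can be sketched as follows. The table semantics_demo is hypothetical; the BYTE and CHAR qualifiers, the LENGTH and LENGTHB functions, and the NLS_LENGTH_SEMANTICS parameter are standard Oracle features.

```sql
-- Hypothetical table: the same declared width under each semantics.
CREATE TABLE semantics_demo (
  byte_col VARCHAR2(10 BYTE),  -- holds at most 10 bytes
  char_col VARCHAR2(10 CHAR)   -- holds at most 10 characters
);

-- LENGTH counts characters; LENGTHB counts bytes. In a multi-byte
-- character set the two values can differ for the same string.
SELECT LENGTH(char_col)  AS char_count,
       LENGTHB(char_col) AS byte_count
  FROM semantics_demo;

-- Make character semantics the default for columns created later
-- in this session without an explicit BYTE or CHAR qualifier.
ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR;
```

In a multi-byte database, a ten-character string that contains multi-byte characters can overflow byte_col while still fitting in char_col.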

While Unicode may seem like the logical choice for any database, the decision to use it needs to be weighed carefully. There are performance and space usage penalties associated with using Unicode. If a smaller code set is available that encompasses all of the languages you are likely to ever need, then the overhead of Unicode makes it an illogical choice.


Calendars

Different geographic areas often utilize different calendar systems, which can make international transactions hard to synchronize. Oracle supports seven distinct calendar systems: Gregorian, Japanese Imperial, ROC (Republic of China) Official, Thai Buddha, Persian, English Hijrah, and Arabic Hijrah. Globalization support offers functionality to resolve calendar system differences.
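As a brief sketch, the calendar system can be switched for a session with the NLS_CALENDAR parameter. The date format mask below is an arbitrary choice for illustration, and the output depends on the current date, so none is shown.

```sql
-- Display today's date using the English Hijrah calendar.
ALTER SESSION SET NLS_CALENDAR = 'English Hijrah';
SELECT TO_CHAR(SYSDATE, 'DD Month YYYY') AS hijrah_date FROM dual;

-- Switch back to the Gregorian calendar.
ALTER SESSION SET NLS_CALENDAR = 'Gregorian';
SELECT TO_CHAR(SYSDATE, 'DD Month YYYY') AS gregorian_date FROM dual;
```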

Locale and calendar customization

Oracle's Locale Builder utility allows customization of globalization definitions, including language, character set, territory, and linguistic sorting. Calendars can also be customized using the NLS Calendar utility. Coverage of the Locale Builder and NLS Calendar utilities falls outside the scope of this book.

10.1.2. Globalization Support Architecture

Globalization support in Oracle 10g is implemented through the Oracle National Language Support Runtime Library (NLSRTL). The NLSRTL offers a set of language-independent text and character-processing functions, as well as functions for language convention manipulation. The behavior of these algorithms is determined at runtime (database startup), as the name suggests.

At database startup time, NLSRTL looks for a file named lx1boot.nlb. This file defines the set of locale definitions available to the database. To determine where to look for this file, NLSRTL will first check the environment for the existence of an ORA_NLS10 variable.

If ORA_NLS10 is defined, it will contain the path to where the lx1boot.nlb file resides. If the variable is not set, the default location of $ORACLE_HOME/nls/data will be used instead.

NOTE

By default, ORA_NLS10 is not set. It should be set only in a multi-homed environment where the locale-specific files are shared.

The lx1boot.nlb file identifies the set of locales available to the NLSRTL. These locales are defined in a collection of locale definition files that reside in the same directory as the lx1boot.nlb file.

There are four types of locale definition files:

  • Language

  • Territory

  • Character set

  • Linguistic sort

Each file contains data relating to only one particular locale type. For each locale type, there can be many different definition files.

For example, one language file will exist for French, one for Italian, and so on. In fact, there are approximately 66 different language files, 120 different territory files, 86 different character sets, and 8 different linguistic sorts.

This modular design of the locale definition files offers several distinct benefits, including the following:

  • By using only the set of locales that you need, memory won't be wasted on unnecessary locales.

  • Locale definitions can be mixed and matched.

  • Locale files can be modified without affecting any other files.

  • New locale files can be created without affecting existing files.

All the locale definition files follow the common naming convention:

Code    Position    Meaning
lx      1–2         The standard prefix for all locale definition files
t       3           Represents the locale type: 0 = language, 1 = territory, 2 = character set, 3 = linguistic sort
nnnn    4–7         The object ID (in hex)
.nlb    8–11        The standard extension for all locale definition files

For example, the file lx1001b.nlb would represent a territory with a territory ID of 0x001B (decimal 27), as shown in Figure 10.1. This happens to be the territory of Algeria.
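The hexadecimal arithmetic in this example can be checked from SQL. The sketch below uses only the standard TO_NUMBER and TO_CHAR hexadecimal format element (X); assembling the filename from its parts is just an illustration of the naming convention, not something Oracle requires you to do.

```sql
-- Convert the object ID portion of lx1001b.nlb from hex to decimal.
-- 0x001B is decimal 27.
SELECT TO_NUMBER('001B', 'XXXX') AS territory_id FROM dual;

-- Rebuild the filename from its parts: prefix, locale type
-- (1 = territory), four-digit hex object ID, and extension.
SELECT 'lx' || '1' || LOWER(TO_CHAR(27, 'FM000X')) || '.nlb' AS file_name
  FROM dual;
```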

The complete set of locale definition files represents the globalization options available inside the database. Locale definitions can also be added or modified to support new functionality.

Figure 10.1. Locale definition file lx1001b.nlb

10.1.3. Supporting Multilingual Applications

Globalization allows the database to support multi-tier and client/server applications in any language for which it is configured. Locale-dependent operations are governed by NLS parameters and NLS environment variables set on both the client and server sides.

In this section, you will learn how client applications interact with the server from a globalization viewpoint. You will learn the purpose of the character sets defined at database creation time. You'll learn how data conversion issues can affect session performance. And finally, you'll learn how clients resolve globalization environment differences when they connect to a server.

10.1.3.1. Database Character Sets

When a database is created, two session-independent NLS parameters are specified: the database character set and the national character set.

The database character set defines the character set that will govern default text storage in the database. This includes all CHAR, VARCHAR2, LONG, and fixed-width CLOB data as well as all SQL and PL/SQL text.

The national character set is an alternate Unicode character set that governs NCHAR, NVARCHAR2, and NCLOB data.

Together, these two settings define the available character sets for the database.
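To see which two character sets an existing database was created with, a query along these lines can be used. It relies on the standard NLS_DATABASE_PARAMETERS data dictionary view; the values returned depend entirely on how your database was created.

```sql
-- NLS_CHARACTERSET is the database character set;
-- NLS_NCHAR_CHARACTERSET is the national character set.
SELECT parameter, value
  FROM nls_database_parameters
 WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
```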

10.1.3.2. Automatic Data Conversion

When a client makes a connection to a database server, the character sets used on both the client and server are compared. If they do not match, Oracle will need to perform automatic data conversion to resolve the difference. There is overhead involved in this conversion process as well as a risk of data loss. Performance will be affected relative to the level of conversion required.

The exception to this rule is when the database character set is a strict superset of the client character set. Two things must be true in order to classify a character set as a strict superset of another:

  • The superset must contain all of the characters defined in the subset.

  • The encoded values of all characters defined in the subset must match their encoded values in the superset.

If Oracle determines that both of these requirements are met, it will not perform automatic data conversion, because it is not necessary.

10.1.3.3. Resolving Client/Server Settings

Any application that connects to the server is considered to be a client, in terms of globalization. Even if the application lives on the same physical machine as the server, it will still be classified as a client. This includes middle-tier application servers. Therefore, from a globalization perspective, all applications are governed by client-side NLS parameters.

When a client application is run, the client NLS environment is initialized from the environment variable settings. All local NLS operations are executed using these settings. Local NLS operations are client operations performed independently of any Oracle server session (for example, display formatting in Oracle Developer applications).

When the application completes a connection to the database server, the resulting session is initialized with the NLS environment settings of the server.

However, immediately after the session is established, the client implicitly issues an ALTER SESSION statement to synchronize the session NLS environment to match the client's NLS environment. In fact, the session environment can be modified at any time by using the ALTER SESSION statement, as shown here:

SQL*Plus: Release 10.1.0.2.0 - Production on Sat Aug 28 14:02:56 2004

Copyright (c) 1982, 2004, Oracle. All rights reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> select sysdate from dual;

SYSDATE
---------
28-AUG-04

SQL> alter session set NLS_LANGUAGE=French;

Session altered.

SQL> select sysdate from dual;

SYSDATE
-----------
28-AOÛT -04

SQL> alter session set NLS_LANGUAGE=Italian;

Session altered.

SQL> select sysdate from dual;

SYSDATE
---------
28-AGO-04

Remember, however, that using ALTER SESSION changes only the session NLS environment. It does not change the client NLS environment.
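The scope of ALTER SESSION can be observed by comparing the session-level and database-level views, as in the sketch below. Both views are standard data dictionary views. The client NLS environment itself (for example, the NLS_LANG environment variable) lives outside the database and is not visible through either view.

```sql
ALTER SESSION SET NLS_LANGUAGE = French;

-- The session row reflects the change; the database row still shows
-- the value chosen when the database was created.
SELECT 'session' AS scope, value
  FROM nls_session_parameters
 WHERE parameter = 'NLS_LANGUAGE'
UNION ALL
SELECT 'database' AS scope, value
  FROM nls_database_parameters
 WHERE parameter = 'NLS_LANGUAGE';
```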

10.1.4. Using Unicode in a Multilingual Database

Unicode is a universal character set that encompasses all known written languages in the world. Historically, dealing with multiple languages in a database or an application has been a difficult proposition. Existing character sets have always been too limited. Many don't even offer all the characters required for a single language, much less for all languages!

Supporting a wide variety of languages often meant that applications, databases, and programs using different character sets had to interact and exchange data, with proper data conversion taking place every step of the way.

To address this problem, Unicode was created with this simple motto:

Unicode provides a unique number for every character,

no matter what the platform,

no matter what the program,

no matter what the language.

Source: http://www.unicode.org/standard/WhatIsUnicode.html

Unicode assigns a guaranteed unique value (known as a code point) to every character to assure that no conflicts exist. Oracle supports version 3.2 of Unicode and offers the encoding methods listed in Table 10.1.

Unicode can be used in Oracle in several ways:

  • It can be defined as the database character set, thereby becoming the default for all SQL CHAR datatypes (CHAR, VARCHAR2, CLOB, and LONG). In this setup, the UTF-8 encoding method will be used.

  • It can be used as needed by creating columns using the NCHAR datatypes, also known as Unicode datatypes (NCHAR, NVARCHAR2, and NCLOB). Unicode data can be encoded as either UTF-8 or UTF-16 when used in this scenario.
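The second approach might be sketched as follows. The table product_names and its data are hypothetical; NVARCHAR2, UNISTR, LENGTH, and LENGTHB are standard Oracle features. The byte count returned depends on the national character set of your database, so no output is shown.

```sql
-- Hypothetical table mixing both kinds of character column.
CREATE TABLE product_names (
  id          NUMBER,
  local_name  VARCHAR2(100),   -- stored in the database character set
  global_name NVARCHAR2(100)   -- stored in the national (Unicode) character set
);

-- UNISTR builds a Unicode string from code points; \00E9 is
-- LATIN SMALL LETTER E WITH ACUTE.
INSERT INTO product_names (id, local_name, global_name)
VALUES (1, 'cafe', UNISTR('caf\00E9'));

-- Compare the character count with the byte count of the Unicode column.
SELECT global_name,
       LENGTH(global_name)  AS char_count,
       LENGTHB(global_name) AS byte_count
  FROM product_names;
```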

Table 10.1. Oracle-Supported Unicode Encoding Methods

Encoding Method    Description
UTF-8              An eight-bit encoding method that uses one to four bytes to store characters, as needed. UTF-8 is a strict superset of ASCII, meaning that every ASCII character is represented in UTF-8 and has the same code point value in both character sets. UTF-8 is supported on Unix platforms, HTML, and most Internet browsers.
UCS-2              A fixed-width, 16-bit encoding method, meaning that each character is stored in two bytes. Both Microsoft Windows NT and Java support UCS-2 encoding. UCS-2 supports the older Unicode 3 standard; therefore it does not support supplementary characters.
UTF-16             A strict superset of UCS-2 that supports supplementary characters by using two UCS-2 code points for each supplementary character. Newer versions of Windows (2000, XP) are based on this encoding method.
