Index
A
abjad alphabet 3–5
abstract character repertoire 9, 137
abugida 3–4
accent mark 137
accented character 137
ADJUST_BYTE_SEMANTIC_COLUMN_LENGTHS option, LIBNAME statement 60
ADJUST_NCHAR_COLUMN_LENGTHS option, LIBNAME statement 60
Afrikaans language 5, 17, 72
Albanian language
encodings and 77
ISO 8859 standard and 17, 20
scripts and 5
alphabets 2–3, 137
American National Standards Institute (ANSI) 14, 21
American Standard Code for Information Interchange
See ASCII (American Standard Code for Information Interchange)
ANSI (American National Standards Institute) 14, 21
ANSI code pages 21–25
Apple Macintosh encodings 26
Arabic language
ANSI code page 23, 25
characters in 6–7
described 3–4, 138
encodings and 68–69
ISO 8859 standard 18
OEM code page 25
Windows support 90
Armenian language 90
ASCII (American Standard Code for Information Interchange)
coded character sets and 13
described 14–16, 137–138
hexadecimal notation in 81
host-to-host translation tables 42–43
multilingual data handling 116, 120
transcoding errors 67
transport-format translation tables 43–44
ATTRIB statement 49, 116
B
Basic Multilingual Plane (BMP) 32
Basque language 17, 20
Belarusian language 4, 18, 70
Bengali language 4, 70
bidirectional text 18, 38n, 138
Big5 encoding 30–31, 138
big-endian machines 34
bits and bytes 12–13
BLOB data type 55, 83n
BMP (Basic Multilingual Plane) 32
BOM (byte-order mark) 34, 138
Bosnian language 17, 77
Breton language 17, 19–20
Bulgarian language 4, 18, 70
byte-order mark (BOM) 34, 138
BYTE semantics 57–63
bytes and bits 12–13
C
CALL INSERT_HTML function 125
Canada
ANSI code page 25
national-use code positions 14
OEM code page 25
Catalan language 17, 20, 72
CCS
See coded character sets
CCSID (coded character set identifier) 103, 140
CDE (Common Desktop Environment) 95
CEDA (cross-environment data access)
described 140
encoding data sets 51, 53
transcoding problems 106
CES
See character encoding schemes
CHAR data type 55
character encoding
See encoding
character encoding schemes
8-bit 21–28
described 12–13, 139, 141
history of 13–31
problems with encodings 31–36
character repertoires
abstract 9, 137
coded character sets and 12
described 9, 140
CHARACTER semantics 57–63
character sets 13, 123, 140
See also coded character sets
character translation
See translation process
character variable padding (CVP) 106, 111
characters
described 6–9
extended 142
forms of 7
graphic 144
invariant 26
national 14–15, 148
special 108–109, 115, 151
variant 15, 26–28, 30, 153
CHARSET= option 49, 123
Chinese characters
ANSI code page 25
described 4
encodings and 70
extended UNIX code and 13
multi-byte encoding schemes 30
OEM code page 25
simplified 8, 25, 68, 70, 151
traditional 7–8, 25, 68, 70, 151
Windows support 90
Chinese National Standards (CNS) 30, 139
CIMPORT procedure 47, 53
CJK language group
coded character sets for 29–31
described 8–9, 139
Unicode and 32
CJKV language group 139
CLOB data type 55, 83n
CNS (Chinese National Standards) 30, 139
code pages
described 13, 140
OEM 23–25
Windows ANSI 21–24
code point 140
code positions 14–15, 140
code table 140
coded character set identifier (CCSID) 103, 140
coded character sets
See also specific encodings
CJK language group and 29–30
described 12–13, 139–140
history of standards 13–31
problems with 31–36
Coe, Michael D. 1–2
Common Desktop Environment (CDE) 95
computing
bits and bytes 12–13
character encoding standards 13–31
problems with encodings 31–36
consonantal alphabets 3
CONTENTS procedure 52
COPY procedure 106
Cornish language 20, 72
CORRECTENCODING= option, MODIFY statement (DATASETS) 49
CPORT procedure 47, 53
Croatian language
encodings and 77
ISO 8859 standard and 17, 20
scripts and 5
cross-environment data access (CEDA)
described 140
encoding data sets 51, 53
transcoding problems 106
CS
See simplified Chinese characters
CVP (character variable padding) 106, 111
CVPBYTES= option, LIBNAME statement 49
CVPMULT= option, LIBNAME statement 49
Cyrillic alphabet
ANSI code page 22, 25
described 3–4
ISO 8859 standard 18
OEM code page 25
Unicode and 33
Czech language
encodings and 77
ISO 8859 standard and 17, 20
scripts and 5
D
Danish language 5, 20, 72
Data Integration Studio 108–109
data sets
described 150
encoding options 51–54
transcoding problems 106, 110–115
DATA steps 140–141
DATASETS procedure 49, 121
DATESTYLE= system option 87
DB2 RDBMS 57
DBCLIENT_ENCODING_FIXED option, LIBNAME statement 61
DBCLIENT_MAX_BYTES option, LIBNAME statement 60, 62–63
DBCS (double-byte character set)
described 29, 45, 141
transcoding problems 109–110
DBCS system option 45
DBCSLANG= system option 45–46, 48–49, 87–88
DBCSTAB procedure 46
DBCSTYPE= system option 45–46, 48–49, 87–88
DBENCODING= option, IMPORT procedure 128
DBSERVER_ENCODING_FIXED option, LIBNAME statement 61
DBSERVER_MAX_BYTES option, LIBNAME statement 60, 62–63
DEC Multinational Character Set (MCS) 26
Devanagari script 4, 19
device maps 65
DFLANG= system option 87
diacritic 137
Digital Equipment Corporation 26
digraphs 141
double-byte character set (DBCS)
described 29, 45, 141
transcoding problems 109–110
DOWNLOAD procedure 43, 47
Dutch language
encodings and 72
ISO 8859 standard 17, 20
national-use code positions 14
scripts and 5
E
EBCDIC (Extended Binary Coded Decimal Interchange Code)
described 26–28, 142
hexadecimal notation in 80–81
host-to-host translation tables 42–43
multilingual data handling 117–120
transcoding errors 67
transport-format translation tables 43–44
variant characters and 15, 27–28
8-bit encoding schemes 21–26
Einstein, Albert 135
EMU (European Monetary Union) 68
ENCODCOMPAT function 49, 67
encoding
See also character encoding schemes
See also coded character sets
See also troubleshooting encoding problems
additional 8-bit schemes 21–28
ASCII 13–16, 137–138
Big5 30–31, 138
checking 79–82
data sets 51–54
ensuring compatibility 53, 67–79
external files 50–51
in SAS 6 42–46
in SAS 8 46–47
in SAS 9 47–64
ISO 8859 standard 16–21
multi-octet schemes 28–31
of output 64
on UNIX systems 92–94
on Windows systems 88–92
on X Window System 94–103
problems with 31–36
RDBMS tables 54–63
SAS/GRAPH approach to 64–67
scripts and 68
transcoding considerations 34–36, 67–79
ENCODING= data set option 52–53, 111
ENCODING= system option
described 87–88
encoding external files and 50–51
multilingual data handling 116, 119–120
SAS 8 encoding and 46, 48
transcoding problems 106
ENCODISVALID function 49, 67
endianness 34
English language
ANSI code page 25
character repertoire of 9
encodings and 73
ISO 8859 standard 17, 20
national-use code positions 14
OEM code page 25
scripts and 5
Esperanto language 18, 20
Estonian language 18, 78
EUC (extended UNIX code) 13, 141
Euro symbol 19, 68, 105
European Monetary Union (EMU) 68
Extended Binary Coded Decimal Interchange Code
See EBCDIC (Extended Binary Coded Decimal Interchange Code)
extended characters 142
extended UNIX code (EUC) 13, 141
external files 50–51, 115
F
Faeroese language 17, 20, 73
FILE statement 50
FILENAME statement 50
final form of characters 7
Finnish language
encodings and 73
ISO 8859 standard 17, 20
national-use code positions 14
scripts and 5
fonts
described 6, 142
monospaced 147
proportional 149
software 66–67
UNIX system support 96–97
French language
character repertoire of 9
encodings and 74
ISO 8859 standard 17, 20
national-use code positions 14
scripts and 5
Frisian language 20
G
Galician language 20
garbage characters in output 121–129
GB (Guojia Biaozhun) 30, 142–144
GB 2312 142–143
GB 18030 142
GBK extension 143
Georgian language 90
German language
encodings and 74
ISO 8859 standard 17–18, 20
national-use code positions 14
scripts and 5
GETOPTION function 53
GFONT0.FONTS catalog 65
glyph collection 143
glyph image 143
glyph metrics 143
glyph presentation 143
glyph representation 143
glyph shape 143
glyphs 6–9, 143
graphemes 2, 143
graphic characters 144
Greek language
ANSI code page 23, 25
described 2, 4
encodings and 71
forms of characters 7
ISO 8859 standard 18
OEM code page 25
Unicode and 33
Greenlandic language 18, 20, 74
Guojia Biaozhun (GB) 30, 142–144
H
han character set 144
Han unification 39n, 144
hangul writing system 4, 8, 144
hanja writing system 8, 144
Hanzi script 4
Hardy, G. H. 136
Hart, Edwin 11, 15
Hebrew language
ANSI code page 23, 25
described 3, 5, 18
encodings and 71
garbage characters example 127–129
OEM code page 25
Windows support 90
hexadecimal notation 12, 79–82
$HEXw. format 81
high ASCII characters 142
Hindi language 4, 71
hiragana (syllabary)
described 2, 5, 144
forms of characters 7
homophones 2
host-to-host translation tables 42–43
Hungarian language
encodings and 77
garbage character example 124–126
ISO 8859 standard and 17, 20
scripts and 5
I
I18n (internationalization) 146
IANA Charset Registry 13, 123
IBM 3270 standard 144
ICCCM (Inter-Client Communication Conventions Manual) 95
ICE (internal character encoding) 64–65
Icelandic language
ANSI code page 25
encodings and 75
ISO 8859 standard 17, 19–20
OEM code page 25
scripts and 5
iconv transcoding tool 36
ICU (International Components for Unicode) 145
ideographs (ideograms) 4, 7, 145
IEC (International Electrotechnical Commission) 145
IME (input method editor) 145
IMPORT procedure 128
Indian Standard Code for Information Interchange (ISCII) 19, 68
Indonesian language 5, 75
INENCODING= option, LIBNAME statement 49, 52, 106
INFILE statement 50
Informix RDBMS 57
initial form of characters 7
input method editor (IME) 145
Inter-Client Communication Conventions Manual (ICCCM) 95
internal character encoding (ICE) 64–65
International Components for Unicode (ICU) 145
international configuration option 145
International Electrotechnical Commission (IEC) 145
International Organization for Standardization
See ISO (International Organization for Standardization)
International Reference Version (IRV) 14
internationalization (I18n) 146
invariant characters 26
Irish language 17, 19–20
IRV (International Reference Version) 14
ISCII (Indian Standard Code for Information Interchange) 19, 68
ISO (International Organization for Standardization)
described 4, 14, 145
standards released by 16–21
universal coded character sets 32
ISO Arabic standard 18
ISO Cyrillic standard 18
ISO Greek standard 18
ISO Hebrew standard 18
ISO Latin-1 standard 16–17, 26
ISO Latin-2 standard 17
ISO Latin-3 standard 18
ISO Latin-4 standard 18
ISO Latin-5 standard 18
ISO Latin-6 standard 19
ISO Latin-7 standard 19
ISO Latin-8 standard 19
ISO Latin-9 standard 19
ISO Latin-10 standard 19
ISO 646 standard 14, 146
ISO 2022 standard 30
ISO 8859 family
coded character sets and 13
coverage of languages by 20–21
described 16–21, 146
transcoding problems 107
X Window System and 96–97
ISO 10646 standard 32–33
ISO 15924 standard 4–5
ISO Thai standard 19
isolated form of characters 7
ISPF editor 80
Italian language
encodings and 75
ISO 8859 standard 17–18, 20
national-use code positions 14
scripts and 5
J
jamo 8, 146
Japanese language
ANSI code page 25
Chinese script and 8
described 2, 5
encodings and 71
extended UNIX code and 13
forms of characters 7
garbage characters in output 121, 126–127
multi-byte encoding schemes 30
national-use code positions 14
OEM code page 25
Windows support 90
X resource examples 101–103
Japanese Standards Association (JSA) 30
JIS X 0208 standard 30
JIS X 0212 standard 30
JSA (Japanese Standards Association) 30
K
kana 2, 146
kanji character set 2, 5, 146
katakana (syllabary)
described 2, 5, 146
forms of characters 7
Japanese encoding and 30
Kazakh language 4
KCVT function 51
key maps 65
Konkani language 4
Korean language
ANSI code page 25
described 4–5, 8–9
encodings and 71
extended UNIX code and 13
multi-byte character sets 31
OEM code page 25
Windows support 90
Kurdish language 4
L
L10N (localization) 147
LANG environment variable 94
Lappish language 18
Latin 1 character repertoire 9, 12
Latin 2 character repertoire 9
Latin alphabet
ANSI code pages 21–22, 25
described 2–3, 5
forms of characters 6–7
ISO 8859 standard 16–20
OEM code page 25
pinyin system and 7
rōmaji 2, 150
Unicode and 33
Latvian language
encodings and 78
ISO 8859 standard and 18, 20
scripts and 5
LC_ALL environment variable 94
LC_COLLATE environment variable 94
LC_CTYPE environment variable 94
LC_MESSAGES environment variable 94
LC_MONETARY environment variable 94
LC_NUMERIC environment variable 94
LC_TIME environment variable 94
LIBNAME statement
ADJUST_BYTE_SEMANTIC_COLUMN_LENGTHS option 60
ADJUST_NCHAR_COLUMN_LENGTHS option 60
CVPBYTES= option 49
CVPMULT= option 49
DBCLIENT_ENCODING_FIXED option 61
DBCLIENT_MAX_BYTES option 60, 62–63
DBSERVER_ENCODING_FIXED option 61
DBSERVER_MAX_BYTES option 60, 62–63
INENCODING= option 49, 52, 106
ODSCHARSET= option 49
OUTENCODING= option 49, 52
XMLENCODING= option 49
Lithuanian language
encodings and 79
ISO 8859 standard and 18, 20
scripts and 5
little-endian machines 34
LOB data type 55
locale
described 147
on UNIX systems 92–94
on Windows systems 88–92
SAS 6 encoding support 42–46
system 89, 126
locale command 92–93
Locale Setup Manager 83n
Locale Setup Window (LSW) 44
LOCALE= system option 46, 48, 87–88
localization (L10N) 147
logical order 147
logograms (logographs) 4, 147
LSW (Locale Setup Window) 44
Luxembourgish language 17, 20
M
Mac OS Roman encoding 26
Macedonian language 4, 18, 70
Malay language 5, 75
Maltese language 18, 20, 78
Manx language 19–20, 75
Marathi language 4, 71
MBCS (multi-byte character set) 28–31, 148
MCS (Multinational Character Set) 26
medial form of characters 7
METADATA_SETASSN function 107
Microsoft SQL Server 57
modal encodings 29
MODIFY statement, DATASETS procedure 49, 121
mojibake 121, 147
Mongolian language 4
monospaced font 147
morphemes 4, 148
MS-DOS Editor 24
multi-byte character set (MBCS) 28–31, 148
multilingual data 115–121, 148
Multinational Character Set (MCS) 26
MySQL RDBMS 58
N
national characters 14–15, 148
national language support (NLS) 148
natural language 148
NCHAR data type 55, 83n
Nepali language 4
Netezza RDBMS 58
Netherlands
See Dutch language
NLS (national language support) 148
NLSSETUP application 44
NOCLONE option, COPY procedure 106
NODBCS system option 45, 87
NOLOCALELANGCHG system option 87
non-modal encodings 29
NONLSCOMPATMODE option 49, 87
nonspacing character 148
Norwegian language
encodings and 75
ISO 8859 standard 17, 20
national-use code positions 14
scripts and 5
NVARCHAR data type 55
O
Occitan language 17
octets 12, 28–31
od program 80–81
ODS HTML statement 49
ODS MARKUP statement 49
ODSCHARSET= option, LIBNAME statement 49
_ODSOPTIONS_ macro variable 122–123
OEM code pages 23–25
OPTIONS procedure 46, 87
Oracle RDBMS
encoding 55–56, 58, 60–63
NLS_LANG parameter 130–131
troubleshooting encoding problems 129–131
OUTENCODING= option, LIBNAME statement 49, 52
output
encoding 64
garbage characters in 121–129
P
PAPERSIZE= system option 87
Persian language 4, 70
phonemes 2–3, 148
pictographs (pictograms) 7, 149
pinyin writing system 7, 149
Polish language
encodings and 77
ISO 8859 standard and 17, 20
scripts and 5
Windows support 89, 91–92
X resources example 97–99
Portuguese language
ANSI code page 25
encodings and 76
ISO 8859 standard 17, 20
national-use code positions 14
OEM code page 25
scripts and 5
PostgreSQL RDBMS 58
presentation, glyph 143
presentation form 149
proportional font 149
R
radical (Chinese character) 149
RDBMS (relational database management system)
client variable values 55
described 149–150
encoding tables 54–63
problems accessing 127–131
rebus principle 2–3, 7
Regional and Language Options dialog box 89–91
relational database management system (RDBMS)
client variable values 55
described 149–150
encoding tables 54–63
problems accessing 127–131
REMOTE engine 43
Rhaeto-Romance language 17, 20
rōmaji character set 2, 150
Roman8 encoding 26
Romanian language 5, 17, 77
romanization 150
RSASIOTRANSERROR system option 49, 87
Russian language
ANSI code page 25
encodings and 70
ISO 8859 standard and 18
OEM code page 25
scripts and 4
transcoding problems example 113–114
S
Sami language 20
SAS 6 encoding 42–44
SAS 8 encoding 46–47, 106, 111
SAS 9 encoding
data sets 51–54
described 47–50
external files 50–51
of output 64
RDBMS tables 54–63
transcoding problems 106, 111
TrueType fonts 66–67
SAS Data Integration Studio 108–109
SAS Explorer 52
SAS/GRAPH encoding 64–67
SASHELP.FONTS catalog 65–66
SASLCL table 43–44
SASXPT table 43–44
SBCS (single-byte character set) 151
Scottish Gaelic language 17, 19–20
scripts
described 2–5, 150
transcoding problems 106
writing systems and 10n, 68
semantic-phonetic compound characters 7–8
Serbian language
encodings and 78
ISO 8859 standard and 18, 71
scripts and 4
setinit authorization code 150
Shift-JIS
ANSI code pages and 25
described 151
encoding external files and 51
encoding problems 110
multi-octet encoding schemes and 29
OEM code pages and 25
simplified Chinese characters
ANSI code page 25
described 8, 151
encodings and 68, 70
extended UNIX code and 13
OEM code page 25
single-byte character set (SBCS) 151
Slovak language
encodings and 78
ISO 8859 standard and 17, 20
scripts and 5
Slovenian language
encodings and 78
ISO 8859 standard and 17, 21
scripts and 5
software fonts 66–67
software globalization 151
Sorbian language 17, 21
Spanish language
encodings and 76
ISO 8859 standard 17–18, 21
multilingual data handling example 117–119
national-use code positions 14
scripts and 5
special characters
described 151
transcoding problems 108–109, 115
Swahili language 5, 17
Swedish language
ISO 8859 standard 17, 21
national-use code positions 14
scripts and 5
Switzerland national-use code positions 14
Sybase RDBMS 58
syllabaries 3–4, 151
%SYSFUNC macro function 52–53, 68
system locale 89, 126
system options, locale-related 86–88
T
Tagalog language 5
Taiwanese character sets 30–31
Tamil language 5, 79
Tatar language 4
Telugu language 5, 79
Teradata RDBMS 59
terminal emulator 103–104, 151
Thai language
ANSI code page 25
described 5
encodings and 79
ISO 8859 standard 19
OEM code page 25
Windows support 90
TRABASE macro 44
traditional Chinese characters
ANSI code page 25
classes of 7–8
described 8, 151
encodings and 68, 70
extended UNIX code and 13
OEM code page 25
TRANSCODE= option, ATTRIB statement 49, 116
transcoding process
described 34–36
determining encoding compatibility 53, 67–79
multilingual data handling 116
translation tables and 42–46
troubleshooting problems in 105–115
transcription process 152
translation process
described 34
host-to-host translation tables 42–43
transport-format translation tables 42–46
transliteration process 7, 152
transport-format translation tables 42–46
TRANTAB procedure 44
TRANTAB= system option 46, 87–88
troubleshooting encoding problems
encoding and locale-related system options 86–88
garbage characters in output 121–127
general remarks 86
multilingual data handling 115–121
operating system-specific options 88–104
problems accessing RDBMS 127–131
transcoding 105–115
TrueType fonts 66–67
Turkish language
ANSI code page 25
encodings and 78
ISO 8859 standard 18, 21
OEM code page 25
scripts and 5
U
UCS (Universal Character Set) 33
Ukrainian language 4, 18, 71
Unicode Consortium 31, 152
Unicode server 152
Unicode standard
data types and 55
DEC MCS and 26
described 13, 31–34, 37n, 152
fundamental principles 32
multilingual data handling 115
transcoding problems 107, 110
Windows operating system and 88
Unicode Transformation Format 8
See UTF-8 (Unicode Transformation Format 8)
Unicode Transformation Format 16 (UTF-16) 13, 152
Unicode Transformation Format 32 (UTF-32) 153
Universal Character Set (UCS) 33
UNIX systems
fonts supported 96–97
iconv transcoding tool 36
od program 80–81
system locale and encoding on 92–94
transcoding problems 105–107, 115
UPLOAD procedure 43, 47
Urdu language 4
user locale 89
UTF-8 (Unicode Transformation Format 8)
described 13, 152
encoding external files and 25
multilingual data handling 116
transcoding problems 106, 110–113, 115
UTF-16 (Unicode Transformation Format 16) 13, 152
UTF-32 (Unicode Transformation Format 32) 153
V
VARCHAR data type 55
variant characters
described 26–27, 153
EBCDIC and 15, 27–28
ISO 646 standard and 15
Japanese encoding 30
Vietnamese language
ANSI code page 21, 25
Chinese script and 8
encodings and 68, 79
Latin script and 5
OEM code page 25
Windows support 90
visual order 153
vowels in alphabets 3
W
Welsh language 5, 19, 21
William of Occam 85
Windows Cyrillic standard 23
Windows Latin-1 standard 22
Windows Latin-2 standard 22
Windows Latin-5 standard 22
Windows operating system
ANSI code pages 21–25
system locale and encoding on 88–92, 126
transcoding problems 105
X Window System implementation 95
Wolof language 5
writing systems
alphabets 3
categories of 3–5
correlation with languages 68
described 2–3, 153
logographic 4
scripts and 10n
syllabaries 3–4, 151
Unicode and 32
X
X Window System
customizing X resources 96
described 94–95
fonts supported 96–97
loading X resources 95–96
X resource examples 97–103
XAPPLRESDIR environment variable 96
Xhosa language 5
XLFD standard 97
xlsfonts command 96–97
XMLENCODING= option, LIBNAME statement 49
XUSERFILESEARCHPATH environment variable 96
Y
Yoruba language 5
Z
z/OS operating system
encoding data sets and 54
German session encoding example 46–47
multilingual data handling 119
Russian session encoding example 47
system locale and encoding on 103–104
ZTERMCID system variable 103
Zulu language 5