Index

A

abjad alphabet 3–5

abstract character repertoire 9, 137

abugida 3–4

accent mark 137

accented character 137

ADJUST_BYTE_SEMANTIC_COLUMN_LENGTHS option, LIBNAME statement 60

ADJUST_NCHAR_COLUMN_LENGTHS option, LIBNAME statement 60

Afrikaans language 5, 17, 72

Albanian language

encodings and 77

ISO 8859 standard and 17, 20

scripts and 5

alphabets 2–3, 137

American National Standards Institute (ANSI) 14, 21

American Standard Code for Information Interchange

See ASCII (American Standard Code for Information Interchange)

ANSI (American National Standards Institute) 14, 21

ANSI code pages 21–25

Apple Macintosh encodings 26

Arabic language

ANSI code page 23, 25

characters in 6–7

described 3–4, 138

encodings and 68–69

ISO 8859 standard 18

OEM code page 25

Windows support 90

Armenian language 90

ASCII (American Standard Code for Information Interchange)

coded character sets and 13

described 14–16, 137–138

hexadecimal notation in 81

host-to-host translation tables 42–43

multilingual data handling 116, 120

transcoding errors 67

transport-format translation tables 43–44

ATTRIB statement 49, 116

B

Basic Multilingual Plane (BMP) 32

Basque language 17, 20

Belarusian language 4, 18, 70

Bengali language 4, 70

bidirectional text 18, 38n, 138

Big5 encoding 30–31, 138

big-endian machines 34

bits and bytes 12–13

BLOB data type 55, 83n

BMP (Basic Multilingual Plane) 32

BOM (byte-order mark) 34, 138

Bosnian language 17, 77

Breton language 17, 19–20

Bulgarian language 4, 18, 70

byte-order mark (BOM) 34, 138

BYTE semantics 57–63

bytes and bits 12–13

C

CALL INSERT_HTML function 125

Canada

ANSI code page 25

national-use code positions 14

OEM code page 25

Catalan language 17, 20, 72

CCS

See coded character sets

CCSID (coded character set identifier) 103, 140

CDE (Common Desktop Environment) 95

CEDA (cross-environment data access)

described 140

encoding data sets 51, 53

transcoding problems 106

CES

See character encoding schemes

CHAR data type 55

character encoding

See encoding

character encoding schemes

8-bit 21–28

described 12–13, 139, 141

history of 13–31

problems with encodings 31–36

character repertoires

abstract 9, 137

coded character sets and 12

described 9, 140

CHARACTER semantics 57–63

character sets 13, 123, 140

See also coded character sets

character translation

See translation process

character variable padding (CVP) 106, 111

characters

described 6–9

extended 142

forms of 7

graphic 144

invariant 26

national 14–15, 148

special 108–109, 115, 151

variant 15, 26–28, 30, 153

CHARSET= option 49, 123

Chinese characters

ANSI code page 25

described 4

encodings and 70

extended UNIX code and 13

multi-byte encoding schemes 30

OEM code page 25

simplified 8, 25, 68, 70, 151

traditional 7–8, 25, 68, 70, 151

Windows support 90

Chinese National Standards (CNS) 30, 139

CIMPORT procedure 47, 53

CJK language group

coded character sets for 29–31

described 8–9, 139

Unicode and 32

CJKV language group 139

CLOB data type 55, 83n

CNS (Chinese National Standards) 30, 139

code pages

described 13, 140

OEM 23–25

Windows ANSI 21–24

code point 140

code positions 14–15, 140

code table 140

coded character set identifier (CCSID) 103, 140

coded character sets

See also specific encodings

CJK language group and 29–30

described 12–13, 139–140

history of standards 13–31

problems with 31–36

Coe, Michael D. 1–2

Common Desktop Environment (CDE) 95

computing

bits and bytes 12–13

character encoding standards 13–31

problems with encodings 31–36

consonantal alphabets 3

CONTENTS procedure 52

COPY procedure 106

Cornish language 20, 72

CORRECTENCODING= option, MODIFY statement (DATASETS) 49

CPORT procedure 47, 53

Croatian language

encodings and 77

ISO 8859 standard and 17, 20

scripts and 5

cross-environment data access (CEDA)

described 140

encoding data sets 51, 53

transcoding problems 106

CS

See simplified Chinese characters

CVP (character variable padding) 106, 111

CVPBYTES= option, LIBNAME statement 49

CVPMULT= option, LIBNAME statement 49

Cyrillic alphabet

ANSI code page 22, 25

described 3–4

ISO 8859 standard 18

OEM code page 25

Unicode and 33

Czech language

encodings and 77

ISO 8859 standard and 17, 20

scripts and 5

D

Danish language 5, 20, 72

Data Integration Studio 108–109

data sets

described 150

encoding options 51–54

transcoding problems 106, 110–115

DATA steps 140–141

DATASETS procedure 49, 121

DATESTYLE= system option 87

DB2 RDBMS 57

DBCLIENT_ENCODING_FIXED option, LIBNAME statement 61

DBCLIENT_MAX_BYTES option, LIBNAME statement 60, 62–63

DBCS (double-byte character set)

described 29, 45, 141

transcoding problems 109–110

DBCS system option 45

DBCSLANG= system option 45–46, 48–49, 87–88

DBCSTAB procedure 46

DBCSTYPE= system option 45–46, 48–49, 87–88

DBENCODING= option, IMPORT procedure 128

DBSERVER_ENCODING_FIXED option, LIBNAME statement 61

DBSERVER_MAX_BYTES option, LIBNAME statement 60, 62–63

DEC Multinational Character Set (MCS) 26

Devanagari script 4, 19

device maps 65

DFLANG= system option 87

diacritic 137

Digital Equipment Corporation 26

digraphs 141

double-byte character set (DBCS)

described 29, 45, 141

transcoding problems 109–110

DOWNLOAD procedure 43, 47

Dutch language

encodings and 72

ISO 8859 standard 17, 20

national-use code positions 14

scripts and 5

E

EBCDIC (Extended Binary Coded Decimal Interchange Code)

described 26–28, 142

hexadecimal notation in 80–81

host-to-host translation tables 42–43

multilingual data handling 117–120

transcoding errors 67

transport-format translation tables 43–44

variant characters and 15, 27–28

8-bit encoding schemes 21–26

Einstein, Albert 135

EMU (European Monetary Union) 68

ENCODCOMPAT function 49, 67

encoding

See also character encoding schemes

See also coded character sets

See also troubleshooting encoding problems

additional 8-bit schemes 21–28

ASCII 13–16, 137–138

Big5 30–31, 138

checking 79–82

data sets 51–54

ensuring compatibility 53, 67–79

external files 50–51

in SAS 6 42–46

in SAS 8 46–47

in SAS 9 47–64

ISO 8859 standard 16–21

multi-octet schemes 28–31

of output 64

on UNIX systems 92–94

on Windows systems 88–92

on X Window System 94–103

problems with 31–36

RDBMS tables 54–63

SAS/GRAPH approach to 64–67

scripts and 68

transcoding considerations 34–36, 67–79

ENCODING= data set option 52–53, 111

ENCODING= system option

described 87–88

encoding external files and 50–51

multilingual data handling 116, 119–120

SAS 8 encoding and 46, 48

transcoding problems 106

ENCODISVALID function 49, 67

endianness 34

English language

ANSI code page 25

character repertoire of 9

encodings and 73

ISO 8859 standard 17, 20

national-use code positions 14

OEM code page 25

scripts and 5

Esperanto language 18, 20

Estonian language 18, 78

EUC (extended UNIX code) 13, 141

Euro symbol 19, 68, 105

European Monetary Union (EMU) 68

Extended Binary Coded Decimal Interchange Code

See EBCDIC (Extended Binary Coded Decimal Interchange Code)

extended characters 142

extended UNIX code (EUC) 13, 141

external files 50–51, 115

F

Faeroese language 17, 20, 73

FILE statement 50

FILENAME statement 50

final form of characters 7

Finnish language

encodings and 73

ISO 8859 standard 17, 20

national-use code positions 14

scripts and 5

fonts

described 6, 142

monospaced 147

proportional 149

software 66–67

UNIX system support 96–97

French language

character repertoire of 9

encodings and 74

ISO 8859 standard 17, 20

national-use code positions 14

scripts and 5

Frisian language 20

G

Galician language 20

garbage characters in output 121–129

GB (Guojia Biaozhun) 30, 142–144

GB 2312 142–143

GB 18030 142

GBK extension 143

Georgian language 90

German language

encodings and 74

ISO 8859 standard 17–18, 20

national-use code positions 14

scripts and 5

GETOPTION function 53

GFONT0.FONTS catalog 65

glyph collection 143

glyph image 143

glyph metrics 143

glyph presentation 143

glyph representation 143

glyph shape 143

glyphs 6–9, 143

graphemes 2, 143

graphic characters 144

Greek language

ANSI code page 23, 25

described 2, 4

encodings and 71

forms of characters 7

ISO 8859 standard 18

OEM code page 25

Unicode and 33

Greenlandic language 18, 20, 74

Guojia Biaozhun (GB) 30, 142–144

H

han character set 144

Han unification 39n, 144

hangul writing system 4, 8, 144

hanja writing system 8, 144

Hanzi script 4

Hardy, G. H. 136

Hart, Edwin 11, 15

Hebrew language

ANSI code page 23, 25

described 3, 5, 18

encodings and 71

garbage characters example 127–129

OEM code page 25

Windows support 90

hexadecimal notation 12, 79–82

$HEXw. format 81

high ASCII characters 142

Hindi language 4, 71

hiragana (syllabary)

described 2, 5, 144

forms of characters 7

homophones 2

host-to-host translation tables 42–43

Hungarian language

encodings and 77

garbage character example 124–126

ISO 8859 standard and 17, 20

scripts and 5

I

I18n (internationalization) 146

IANA Charset Registry 13, 123

IBM 3270 standard 144

ICCCM (Inter-Client Communication Conventions Manual) 95

ICE (internal character encoding) 64–65

Icelandic language

ANSI code page 25

encodings and 75

ISO 8859 standard 17, 19–20

OEM code page 25

scripts and 5

iconv transcoding tool 36

ICU (International Components for Unicode) 145

ideographs (ideograms) 4, 7, 145

IEC (International Electrotechnical Commission) 145

IME (input method editor) 145

IMPORT procedure 128

Indian Standard Code for Information Interchange (ISCII) 19, 68

Indonesian language 5, 75

INENCODING= option, LIBNAME statement 49, 52, 106

INFILE statement 50

Informix RDBMS 57

initial form of characters 7

input method editor (IME) 145

Inter-Client Communication Conventions Manual (ICCCM) 95

internal character encoding (ICE) 64–65

International Components for Unicode (ICU) 145

international configuration option 145

International Electrotechnical Commission (IEC) 145

International Organization for Standardization

See ISO (International Organization for Standardization)

International Reference Version (IRV) 14

internationalization (I18n) 146

invariant characters 26

Irish language 17, 19–20

IRV (International Reference Version) 14

ISCII (Indian Standard Code for Information Interchange) 19, 68

ISO (International Organization for Standardization)

described 4, 14, 145

standards released by 16–21

universal coded character sets 32

ISO Arabic standard 18

ISO Cyrillic standard 18

ISO Greek standard 18

ISO Hebrew standard 18

ISO Latin-1 standard 16–17, 26

ISO Latin-2 standard 17

ISO Latin-3 standard 18

ISO Latin-4 standard 18

ISO Latin-5 standard 18

ISO Latin-6 standard 19

ISO Latin-7 standard 19

ISO Latin-8 standard 19

ISO Latin-9 standard 19

ISO Latin-10 standard 19

ISO 646 standard 14, 146

ISO 2022 standard 30

ISO 8859 family

coded character sets and 13

coverage of languages by 20–21

described 16–21, 146

transcoding problems 107

X Window System and 96–97

ISO 10646 standard 32–33

ISO 15924 standard 4–5

ISO Thai standard 19

isolated form of characters 7

ISPF editor 80

Italian language

encodings and 75

ISO 8859 standard 17–18, 20

national-use code positions 14

scripts and 5

J

jamo 8, 146

Japanese language

ANSI code page 25

Chinese script and 8

described 2, 5

encodings and 71

extended UNIX code and 13

forms of characters 7

garbage characters in output 121, 126–127

multi-byte encoding schemes 30

national-use code positions 14

OEM code page 25

Windows support 90

X resource examples 101–103

Japanese Standards Association (JSA) 30

JIS X 0208 standard 30

JIS X 0212 standard 30

JSA (Japanese Standards Association) 30

K

kana 2, 146

kanji character set 2, 5, 146

katakana (syllabary)

described 2, 5, 146

forms of characters 7

Japanese encoding and 30

Kazakh language 4

KCVT function 51

key maps 65

Konkani language 4

Korean language

ANSI code page 25

described 4–5, 8–9

encodings and 71

extended UNIX code and 13

multi-byte character sets 31

OEM code page 25

Windows support 90

Kurdish language 4

L

L10N (localization) 147

LANG environment variable 94

Lappish language 18

Latin 1 character repertoire 9, 12

Latin 2 character repertoire 9

Latin alphabet

ANSI code pages 21–22, 25

described 2–3, 5

forms of characters 6–7

ISO 8859 standard 16–20

OEM code page 25

pinyin system and 7

rōmaji 2, 150

Unicode and 33

Latvian language

encodings and 78

ISO 8859 standard and 18, 20

scripts and 5

LC_ALL environment variable 94

LC_COLLATE environment variable 94

LC_CTYPE environment variable 94

LC_MESSAGES environment variable 94

LC_MONETARY environment variable 94

LC_NUMERIC environment variable 94

LC_TIME environment variable 94

LIBNAME statement

ADJUST_BYTE_SEMANTIC_COLUMN_LENGTHS option 60

ADJUST_NCHAR_COLUMN_LENGTHS option 60

CVPBYTES= option 49

CVPMULT= option 49

DBCLIENT_ENCODING_FIXED option 61

DBCLIENT_MAX_BYTES option 60, 62–63

DBSERVER_ENCODING_FIXED option 61

DBSERVER_MAX_BYTES option 60, 62–63

INENCODING= option 49, 52, 106

ODSCHARSET= option 49

OUTENCODING= option 49, 52

XMLENCODING= option 49

Lithuanian language

encodings and 79

ISO 8859 standard and 18, 20

scripts and 5

little-endian machines 34

LOB data type 55

locale

described 147

on UNIX systems 92–94

on Windows systems 88–92

SAS 6 encoding support 42–46

system 89, 126

locale command 92–93

Locale Setup Manager 83n

Locale Setup Window (LSW) 44

LOCALE= system option 46, 48, 87–88

localization (L10N) 147

logical order 147

logograms (logographs) 4, 147

LSW (Locale Setup Window) 44

Luxembourgish language 17, 20

M

Mac OS Roman encoding 26

Macedonian language 4, 18, 70

Malay language 5, 75

Maltese language 18, 20, 78

Manx language 19–20, 75

Marathi language 4, 71

MBCS (multi-byte character set) 28–31, 148

MCS (Multinational Character Set) 26

medial form of characters 7

METADATA_SETASSN function 107

Microsoft SQL Server 57

modal encodings 29

MODIFY statement, DATASETS procedure 49, 121

mojibake 121, 147

Mongolian language 4

monospaced font 147

morphemes 4, 148

MS-DOS Editor 24

multi-byte character set (MBCS) 28–31, 148

multilingual data 115–121, 148

Multinational Character Set (MCS) 26

MySQL RDBMS 58

N

national characters 14–15, 148

national language support (NLS) 148

natural language 148

NCHAR data type 55, 83n

Nepali language 4

Netezza RDBMS 58

Netherlands

See Dutch language

NLS (national language support) 148

NLSSETUP application 44

NOCLONE option, COPY procedure 106

NODBCS system option 45, 87

NOLOCALELANGCHG system option 87

non-modal encodings 29

NONLSCOMPATMODE option 49, 87

nonspacing character 148

Norwegian language

encodings and 75

ISO 8859 standard 17, 20

national-use code positions 14

scripts and 5

NVARCHAR data type 55

O

Occitan language 17

octets 12, 28–31

od program 80–81

ODS HTML statement 49

ODS MARKUP statement 49

ODSCHARSET= option, LIBNAME statement 49

_ODSOPTIONS_ macro variable 122–123

OEM code pages 23–25

OPTIONS procedure 46, 87

Oracle RDBMS

encoding 55–56, 58, 60–63

NLS_LANG parameter 130–131

troubleshooting encoding problems 129–131

OUTENCODING= option, LIBNAME statement 49, 52

output

encoding 64

garbage characters in 121–129

P

PAPERSIZE= system option 87

Persian language 4, 70

phonemes 2–3, 148

pictographs (pictograms) 7, 149

pinyin writing system 7, 149

Polish language

encodings and 77

ISO 8859 standard and 17, 20

scripts and 5

Windows support 89, 91–92

X resources example 97–99

Portuguese language

ANSI code page 25

encodings and 76

ISO 8859 standard 17, 20

national-use code positions 14

OEM code page 25

scripts and 5

PostgreSQL RDBMS 58

presentation, glyph 143

presentation form 149

proportional font 149

R

radical (Chinese character) 149

RDBMS (relational database management system)

client variable values 55

described 149–150

encoding tables 54–63

problems accessing 127–131

rebus principle 2–3, 7

Regional and Language Options dialog box 89–91

relational database management system (RDBMS)

client variable values 55

described 149–150

encoding tables 54–63

problems accessing 127–131

REMOTE engine 43

Rhaeto-Romance language 17, 20

rōmaji character set 2, 150

Roman8 encoding 26

Romanian language 5, 17, 77

romanization 150

RSASIOTRANSERROR system option 49, 87

Russian language

ANSI code page 25

encodings and 70

ISO 8859 standard and 18

OEM code page 25

scripts and 4

transcoding problems example 113–114

S

Sami language 20

SAS 6 encoding 42–44

SAS 8 encoding 46–47, 106, 111

SAS 9 encoding

data sets 51–54

described 47–50

external files 50–51

of output 64

RDBMS tables 54–63

transcoding problems 106, 111

TrueType fonts 66–67

SAS Data Integration Studio 108–109

SAS Explorer 52

SAS/GRAPH encoding 64–67

SASHELP.FONTS catalog 65–66

SASLCL table 43–44

SASXPT table 43–44

SBCS (single-byte character set) 151

Scottish Gaelic language 17, 19–20

scripts

described 2–5, 150

transcoding problems 106

writing systems and 10n, 68

semantic-phonetic compound characters 7–8

Serbian language

encodings and 78

ISO 8859 standard and 18, 71

scripts and 4

setinit authorization code 150

Shift-JIS

ANSI code pages and 25

described 151

encoding external files and 51

encoding problems 110

multi-octet encoding schemes and 29

OEM code pages and 25

simplified Chinese characters

ANSI code page 25

described 8, 151

encodings and 68, 70

extended UNIX code and 13

OEM code page 25

single-byte character set (SBCS) 151

Slovak language

encodings and 78

ISO 8859 standard and 17, 20

scripts and 5

Slovenian language

encodings and 78

ISO 8859 standard and 17, 21

scripts and 5

software fonts 66–67

software globalization 151

Sorbian language 17, 21

Spanish language

encodings and 76

ISO 8859 standard 17–18, 21

multilingual data handling example 117–119

national-use code positions 14

scripts and 5

special characters

described 151

transcoding problems 108–109, 115

Swahili language 5, 17

Swedish language

ISO 8859 standard 17, 21

national-use code positions 14

scripts and 5

Switzerland national-use code positions 14

Sybase RDBMS 58

syllabaries 3–4, 151

%SYSFUNC macro function 52–53, 68

system locale 89, 126

system options, locale-related 86–88

T

Tagalog language 5

Taiwanese character sets 30–31

Tamil language 5, 79

Tatar language 4

Telugu language 5, 79

Teradata RDBMS 59

terminal emulator 103–104, 151

Thai language

ANSI code page 25

described 5

encodings and 79

ISO 8859 standard 19

OEM code page 25

Windows support 90

TRABASE macro 44

traditional Chinese characters

ANSI code page 25

classes of 7–8

described 8, 151

encodings and 68, 70

extended UNIX code and 13

OEM code page 25

TRANSCODE= option, ATTRIB statement 49, 116

transcoding process

described 34–36

determining encoding compatibility 53, 67–79

multilingual data handling 116

translation tables and 42–46

troubleshooting problems in 105–115

transcription process 152

translation process

described 34

host-to-host translation tables 42–43

transport-format translation tables 42–46

transliteration process 7, 152

transport-format translation tables 42–46

TRANTAB procedure 44

TRANTAB= system option 46, 87–88

troubleshooting encoding problems

encoding and locale-related system options 86–88

garbage characters in output 121–127

general remarks 86

multilingual data handling 115–121

operating system-specific options 88–104

problems accessing RDBMS 127–131

transcoding 105–115

TrueType fonts 66–67

Turkish language

ANSI code page 25

encodings and 78

ISO 8859 standard 18, 21

OEM code page 25

scripts and 5

U

UCS (Universal Character Set) 33

Ukrainian language 4, 18, 71

Unicode Consortium 31, 152

Unicode server 152

Unicode standard

data types and 55

DEC MCS and 26

described 13, 31–34, 37n, 152

fundamental principles 32

multilingual data handling 115

transcoding problems 107, 110

Windows operating system and 88

Unicode Transformation Format 8

See UTF-8 (Unicode Transformation Format 8)

Unicode Transformation Format 16 (UTF-16) 13, 152

Unicode Transformation Format 32 (UTF-32) 153

Universal Character Set (UCS) 33

UNIX systems

fonts supported 96–97

iconv transcoding tool 36

od program 80–81

system locale and encoding on 92–94

transcoding problems 105–107, 115

UPLOAD procedure 43, 47

Urdu language 4

user locale 89

UTF-8 (Unicode Transformation Format 8)

described 13, 152

encoding external files and 25

multilingual data handling 116

transcoding problems 106, 110–113, 115

UTF-16 (Unicode Transformation Format 16) 13, 152

UTF-32 (Unicode Transformation Format 32) 153

V

VARCHAR data type 55

variant characters

described 26–27, 153

EBCDIC and 15, 27–28

ISO 646 standard and 15

Japanese encoding 30

Vietnamese language

ANSI code page 21, 25

Chinese script and 8

encodings and 68, 79

Latin script and 5

OEM code page 25

Windows support 90

visual order 153

vowels in alphabets 3

W

Welsh language 5, 19, 21

William of Occam 85

Windows Cyrillic standard 23

Windows Latin-1 standard 22

Windows Latin-2 standard 22

Windows Latin-5 standard 22

Windows operating system

ANSI code pages 21–25

system locale and encoding on 88–92, 126

transcoding problems 105

X Window System implementation 95

Wolof language 5

writing systems

alphabets 3

categories of 3–5

correlation with languages 68

described 2–3, 153

logographic 4

scripts and 10n

syllabaries 3–4, 151

Unicode and 32

X

X Window System

customizing X resources 96

described 94–95

fonts supported 96–97

loading X resources 95–96

X resource examples 97–103

XAPPLRESDIR environment variable 96

Xhosa language 5

XLFD standard 97

xlsfonts command 96–97

XMLENCODING= option, LIBNAME statement 49

XUSERFILESEARCHPATH environment variable 96

Y

Yoruba language 5

Z

z/OS operating system

encoding data sets and 54

German session encoding example 46–47

multilingual data handling 119

Russian session encoding example 47

system locale and encoding on 103–104

ZTERMCID system variable 103

Zulu language 5
