Selecting and Configuring Character Sets
This
chapter provides an overview of character sets and discusses how to:
- Select character sets.
- Convert between character sets.
- Set data field length checking.
- Use CJK ideographic characters in name character
fields.
- Detect and convert characters.
This
section discusses:
- Character sets.
- Common character sets.
- The Unicode standard.
- Non-Unicode character sets.
- Character sets across the tiers of the PeopleSoft
architecture.
Before you install your PeopleSoft system,
Oracle recommends that you choose an appropriate character set for PeopleSoft
client workstations, web servers, application servers, and database servers, as
well as for file attachment storage locations (that is, FTP sites, HTTP
repositories, and database tables).
A character set, also known as a code
page, is an ordered set of characters in which each character is mapped to
a numeric index, called a codepoint. A computer system stores character
data as sequences of these codepoints. Many hundreds of character sets exist. Some are
international standards, maintained by the International Organization for
Standardization (ISO), some are country-specific standards, and others are
specific to a particular computer system vendor. Given the number of separate
computers that are involved in a typical PeopleSoft installation, it is likely
that your system uses several different character sets.
Although there is general agreement on the
content and arrangement of most character sets, especially those that are
maintained by the ISO, many different names are used by vendors and software
packages for similar or identical character sets. US-ASCII encodes the basic
characters and symbols that are needed to write the English language. However,
US-ASCII is limited to 128 characters and cannot represent many characters that
are needed by Western European languages, such as French and German, let alone
ideographic languages, such as Japanese and Chinese, in which each character
represents a word or concept. Many character sets, however, include all
US-ASCII characters in addition to their other characters.
The following table illustrates just a few
common character sets that you are likely to encounter and some of the names
that are used by different vendors to refer to them:
Character Set | Description and Comments | Type | PeopleSoft and SQR Name | Oracle DBMS Name | Microsoft Windows Name
ISO 8859-1 | Western European Latin-1. Contains characters that are required to represent Western European languages. However, does not include the euro symbol, the trademark (TM) symbol, or the oe ligature. | ISO | LATIN1 or ISO_8859-1 | WE8ISO8859P1 | CP28591
Microsoft Code Page 1252 | Microsoft Code Page 1252 - Western European. Very similar to ISO 8859-1, except for the inclusion of additional characters. Includes the euro symbol, trademark (TM) symbol, and oe ligature, but using a different codepoint than ISO 8859-15. | Vendor (Microsoft) | CP1252 | WE8MSWIN1252 | CP1252
ISO 8859-2 | Central/Eastern European Latin-2. Contains characters that are required for Central European languages, including Czech, Hungarian, and Polish. Does not include the euro symbol. | ISO | LATIN2 or ISO_8859-2 | EE8ISO8859P2 | CP28592
ISO 8859-15 | Western European extended Latin-9. Similar to ISO 8859-1, but contains the euro symbol, oe ligature, and several other characters that are required for French and Finnish. | ISO | LATIN9 or ISO_8859-15 | WE8ISO8859P15 | CP28605
Shift-JIS | Most common Japanese character set. Defines thousands of characters for writing Japanese. | Country (Japan) | SJIS | JA16SJIS or JA16SJISTILDE | CP932
IBM CCSID 37 | IBM Coded Character Set ID 37. Western European Multilingual EBCDIC-based character set. | Vendor (IBM) | EBCDIC | WE8EBCDIC37 | CP1140
GB18030 | Chinese national character set. | Country (China) | GB18030 | GB18030 | GB18030
Some of these character sets, such as ISO
8859-1 and IBM CCSID 37, require only one byte to represent each character. For
example, in ISO 8859-1, the hexadecimal number 61 represents the lowercase
Latin letter a. However, larger character sets, such as Shift-JIS, may
require more than one byte to represent each character.
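The single-byte versus multi-byte distinction can be observed with a minimal Python sketch. The codec names used here (latin-1, cp037, shift_jis) are Python's names for these character sets, not PeopleSoft names, and cp037 is assumed to correspond to IBM CCSID 37.

```python
# Sketch: how many bytes a character needs in different character sets.
# Codec names are Python's own, not PeopleSoft character set names.

def encoded_hex(text: str, codec: str) -> str:
    """Return the encoded bytes of text as an uppercase hex string."""
    return text.encode(codec).hex().upper()

# Lowercase Latin letter a: one byte in both single-byte sets,
# but a different byte value in the EBCDIC-based set.
print(encoded_hex("a", "latin-1"))    # 61
print(encoded_hex("a", "cp037"))      # 81

# A Japanese ideograph needs two bytes in Shift-JIS,
# while a US-ASCII letter still needs only one.
print(len("日".encode("shift_jis")))  # 2
print(len("a".encode("shift_jis")))   # 1
```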
The most important consideration when dealing
with character sets across a system is ensuring that all characters that you
plan to represent within the PeopleSoft system exist in the character set that
is used by each component of the system.
For example, if you plan to maintain Japanese
characters in employee names, you must ensure that:
- The character set that is used by the database system
includes Japanese characters.
- Each external system feeding into or out of the
PeopleSoft system expects data in a character set that includes Japanese
characters.
- Workstations and printers are installed with fonts that
include those characters.
For example, the Japanese Shift-JIS character
set contains Japanese and many US-ASCII characters; it is sufficient for encoding
both English data and the primary characters that are required in Japanese.
However, it does not include the accented Latin characters that are needed for
French, German, and other languages, so it is not a suitable character set for
implementations that encompass Western European countries.
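One way to test whether a candidate character set covers your data is to try encoding representative sample text. The following Python sketch illustrates the idea; the function name and the sample strings are hypothetical, and Python's shift_jis codec is assumed to be a reasonable stand-in for the Shift-JIS character set described above.

```python
# Sketch: check whether a character set contains every character in a sample.

def can_represent(text: str, codec: str) -> bool:
    """Return True if every character in text exists in the given character set."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

print(can_represent("山田 Taro", "shift_jis"))  # True: Japanese plus US-ASCII
print(can_represent("Général", "shift_jis"))    # False: accented Latin letter
print(can_represent("Général", "latin-1"))      # True
```

A check like this is only as good as the sample text, so it helps to include names, addresses, and symbols that actually occur in your data.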
Given the sample list of common character sets
in the previous table and the number of languages that are required by a
typical global PeopleSoft implementation, selecting a character set can be
daunting, especially when you are planning to support a large list of
languages.
To simplify this situation, an industry
consortium of vendors devised a universal character standard: the Unicode
standard. This internationally recognized character standard represents every
character that is required to write virtually every written language. The
Unicode standard was developed and is maintained by the Unicode Consortium in
conjunction with ISO. This standard shares the character repertoire with
ISO/IEC standard 10646: the Universal Multiple-Octet Coded Character Set (UCS),
also known as the Universal Character Set for short.
The PeopleSoft system uses Unicode throughout
PeopleTools to simplify character handling. The PeopleSoft system allows the
use of Unicode within PeopleSoft databases to enable you to maintain a single
database with characters from virtually any language.
The Unicode standard and the ISO 10646
standard are available from their respective organizations.
Unicode defines a code space of more than one
million code points (or characters). Unicode code points are referred to by
writing “U+” followed by the hexadecimal number—for example, U+0000, U+0061,
U+FFFF, U+27741, and so on. To manage such a large repertoire of characters,
Unicode defines multiple planes, each comprising 65,536 code points, or
character positions. Plane 0, covering the range of U+0000 to U+FFFF, is known
as the Basic Multilingual Plane (BMP) and is generally sufficient for almost
all modern languages. The other planes are intended to encode extended
ideographic characters, archaic scripts, and other rarely used characters (such
as advanced mathematical symbols). All characters from planes outside the BMP
are known as supplementary characters.
PeopleTools fully supports the use of
characters from the BMP only; supplementary character support is limited to
display in the browser, storage in the database, and reporting output in BI
Publisher. To view Unicode character properties, you can use the tool at
http://www.unicode.org/charts/unihan.html. If a character is at codepoint
U+10000 or higher, it is a supplementary character.
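Because the BMP covers U+0000 to U+FFFF, a character's plane can be derived from its code point, and any character above U+FFFF lies outside the BMP. The following Python sketch shows one way to compute this; the function names are illustrative and not part of PeopleTools.

```python
# Sketch: classify characters by Unicode plane.

def plane_of(ch: str) -> int:
    """Return the Unicode plane number of a single character (0 is the BMP)."""
    # Plane 0 spans U+0000 to U+FFFF, so the plane is the code point
    # with its low 16 bits shifted away.
    return ord(ch) >> 16

def is_supplementary(ch: str) -> bool:
    """Return True if the character lies outside the BMP (above U+FFFF)."""
    return ord(ch) > 0xFFFF

for ch in ("a", "\u20ac", "\U00027741"):
    print(f"U+{ord(ch):04X} plane={plane_of(ch)} supplementary={is_supplementary(ch)}")
```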
Several different Unicode encoding forms have
been standardized, based on two encoding methods: the Unicode Transformation
Format (UTF) and UCS. An
encoding maps the Unicode code points (or perhaps a subset of code points) to
sequences of code values of some fixed size. In UTF encodings, the number in
the encoding name indicates the number of bits per code value; in UCS
encodings, the number indicates the number of bytes per code value.
Four common encodings of Unicode are widely
used:
- UTF-32 — a 32-bit, fixed-width encoding; equivalent to
UCS-4.
- UTF-16 — a 16-bit, variable-width encoding.
- UTF-8 — an 8-bit, variable-width encoding, which
maximizes compatibility with ASCII.
- UCS-2 — a 2-byte, fixed-width encoding; a subset of
UTF-16 supporting characters in the BMP only.
Other Unicode encodings, such as CESU-8,
Java’s Modified UTF-8, and UTF-1, have specific, and sometimes internal,
applications and are not widely used for the interchange of information.
While the PeopleSoft system currently supports
only the UTF-8 and UCS-2 encodings, the following table presents a brief
comparison of all four common encodings:
This section includes Unicode encoding
examples for the following characters:
Character | Unicode Code Point | Description
a | U+0061 | Latin small letter a.
ñ | U+00F1 | Latin small letter ñ.
€ | U+20AC | Euro symbol.
The following table shows the hexadecimal
representation of these characters in each of the four Unicode encodings:
Unicode Encoding | Latin Small Letter a (U+0061) | Latin Small Letter ñ (U+00F1) | € Symbol (U+20AC)
UTF-32 | 0x00000061 | 0x000000F1 | 0x000020AC
UTF-16 | 0x0061 | 0x00F1 | 0x20AC
UTF-8 | 0x61 | 0xC3B1 | 0xE282AC
UCS-2 | 0x0061 | 0x00F1 | 0x20AC
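The encoded forms of the three example characters can be computed with a short Python sketch. Python has no separate UCS-2 codec, so big-endian UTF-16 stands in for UCS-2 here; that substitution is an assumption that holds only for characters in the BMP. Big-endian byte order is chosen so that the output reads as a single code value.

```python
# Sketch: compute the encoded form of each example character.

CHARS = ["a", "ñ", "€"]

# UCS-2 is approximated by UTF-16 (valid for BMP characters only).
ENCODINGS = {
    "UTF-32": "utf-32-be",
    "UTF-16": "utf-16-be",
    "UTF-8": "utf-8",
    "UCS-2": "utf-16-be",
}

def hex_form(ch: str, codec: str) -> str:
    """Return the encoded bytes of ch as a 0x-prefixed uppercase hex string."""
    return "0x" + ch.encode(codec).hex().upper()

for name, codec in ENCODINGS.items():
    print(name, [hex_form(ch, codec) for ch in CHARS])

# A supplementary character needs two 16-bit code values in UTF-16,
# which is why UTF-16 is variable-width and UCS-2 cannot represent it.
print(len("\U00027741".encode("utf-16-be")))  # 4
```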
Although
much of the PeopleSoft system runs by using Unicode, you can configure several
components with a non-Unicode character set. When making these choices, you
should understand the types of character sets other than Unicode that exist.
This
section discusses:
- Single-byte character sets (SBCSs).
- Double-byte character sets (DBCSs).
Note. For the sake of terminology, some systems, such as Microsoft
Windows, refer to two types of character sets: Unicode and ANSI. ANSI, in this
context, refers to the American National Standards Institute, which maintains
equivalent standards for many national and international standard character
sets. Informally, ANSI character sets refer to non-Unicode character sets,
which can be any international, national, or vendor standard character set,
such as those that are discussed at the beginning of this chapter.
Most character sets use one byte to represent
each character and are therefore known as SBCSs. These character sets are
relatively simple and can represent up to 256 unique characters. Examples of
SBCSs are ISO 8859-1 (Latin1), ISO 8859-2 (Latin2), Microsoft CP1252 (similar
to Latin1, but vendor specific), and IBM CCSID 37.
DBCSs use one or two bytes to represent each
character and are typically used for writing ideographic scripts, such as
Japanese, Chinese, and Korean. Most DBCSs allow a mix of one-byte and two-byte
characters, so you cannot assume that a string occupies an even number of bytes. Encoding with a
mix of one- and two-byte characters is also known as variable-width encoding,
and such a character set is sometimes referred to as a multi-byte character set
(MBCS).
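The effect of mixing one-byte and two-byte characters can be seen by comparing character counts with byte counts. This Python sketch uses Python's shift_jis codec as an example DBCS; the function name is illustrative.

```python
# Sketch: character count versus byte count in a variable-width DBCS.

def char_and_byte_counts(text: str, codec: str = "shift_jis") -> tuple:
    """Return (number of characters, number of encoded bytes)."""
    return len(text), len(text.encode(codec))

print(char_and_byte_counts("ABC"))    # (3, 3): one byte per character
print(char_and_byte_counts("日本"))    # (2, 4): two bytes per character
print(char_and_byte_counts("A日本"))   # (3, 5): mixed, so an odd byte length
```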
The PeopleSoft system supports two types of
DBCSs:
- Nonshifting
- Shifting
The difference between these types of DBCSs is
in the way in which the system determines whether a particular byte represents
one character or is part of a two-byte character.
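As a rough illustration, compare how the same character is encoded in a nonshifting set and in a shifting set. In Shift-JIS, the value of the first byte signals that a second byte follows. In ISO-2022-JP, escape sequences switch the byte stream into and out of two-byte mode. The sketch below uses Python's shift_jis and iso2022_jp codecs, which are assumed to be representative of the two types.

```python
# Sketch: nonshifting versus shifting encoding of one hiragana character.

def spaced_hex(data: bytes) -> str:
    """Return bytes as space-separated lowercase hex pairs."""
    return " ".join(f"{b:02x}" for b in data)

text = "あ"

# Nonshifting: just the two bytes of the character itself.
print(spaced_hex(text.encode("shift_jis")))   # 82 a0

# Shifting: ESC $ B enters two-byte mode, then the two character bytes,
# then ESC ( B returns to single-byte mode.
print(spaced_hex(text.encode("iso2022_jp")))  # 1b 24 42 24 22 1b 28 42
```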
PeopleSoft installations include multiple
components, each of which must be able to handle differing character sets.
PeopleSoft application servers and clients
(for example, the PeopleTools development environment and PeopleSoft Pure
Internet Architecture pages) use Unicode exclusively and do not rely on other
character sets to represent and process data. However, depending on your
environment, not all system components support Unicode-encoded data. Therefore,
you might not be able to run all parts of your system in Unicode. For example,
some database platforms and third-party products do not support Unicode. The
following table illustrates support for Unicode in the PeopleSoft system.
Tier | Component | Unicode Support
Client | PeopleTools development environment | Yes
Client | PeopleSoft Pure Internet Architecture pages | Yes
Web server | Web server | Yes
Application server | Application server | Yes
Database server | Non-Unicode DB (Western European or Japanese) | No
Database server | Unicode DB | Yes
File attachment storage location | FTP server | Yes
File attachment storage location | HTTP repository | Yes
File attachment storage location | Database table | See the previous entries for the database server tier
Examples of how to configure these tiers are
provided in this chapter.
In addition to the tiers listed in the
previous table, PeopleTools enables you to configure these system components to
use other character sets:
The character set that is used for PeopleSoft
COBOL processing must match the character set of the database. If you created a
Unicode database for the PeopleSoft implementation, you must also run COBOL in
Unicode.
All direct file I/O operations in PeopleTools,
including file layout objects, trace and log files, and file operations from
Structured Query Report (SQR) programs can be performed in Unicode or any
supported non-Unicode character set. This is useful in situations in which you
must interface with an external system that does not support Unicode.
Some third-party products that are supported
by PeopleTools do not yet support Unicode. In this case, PeopleTools converts
application data to a specific, non-Unicode character set before communicating
with these tools. Check the product documentation for your third-party
application regarding Unicode compliance before determining how the application
and the PeopleSoft system will interoperate.
When Unicode is not used for any of these
types of operations or data storage, the PeopleSoft system transparently
handles the conversion from Unicode to a non-Unicode character set. The
non-Unicode character set that is used depends on several settings, which are
discussed in detail later in this chapter.
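Converting from Unicode to a non-Unicode character set can lose information when the target set lacks a character. The following Python sketch illustrates the general idea only; it does not show how PeopleTools performs the conversion, and the replacement of unmappable characters with a question mark is Python's behavior, assumed here for illustration.

```python
# Sketch: Unicode to non-Unicode conversion can be lossy.

def to_non_unicode(text: str, codec: str) -> bytes:
    """Encode Unicode text in a non-Unicode character set.

    Characters that the target set lacks are replaced with '?'.
    """
    return text.encode(codec, errors="replace")

# ISO 8859-1 has no euro symbol, so it is replaced.
print(to_non_unicode("Price: 10 €", "latin-1"))  # b'Price: 10 ?'

# Microsoft Code Page 1252 includes the euro symbol, so nothing is lost.
print(to_non_unicode("Price: 10 €", "cp1252"))   # b'Price: 10 \x80'
```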
The character sets that the PeopleSoft system
supports are defined in the PSCHARSETS table. The following table lists these
character sets and the names by which they may be referred to in PeopleSoft
applications. You may need to know the correct character set name to use in
several situations including:
- In PeopleCode programs for manipulating file layout
objects.
- In the Unix/Linux application server configuration to
determine the default, non-Unicode character set for log files, trace
files, and operating system interfaces.
- When creating your database.
Refer to your hardware and software
requirements guide for details about the character sets that are supported for
your database platform.
Character Set | Description and Comments | Character Set Type
 | Current ANSI-based code page. Not really a character set, but causes the system to use the default non-Unicode character set of the host operating system. | SBCS or DBCS, depending on the host operating system.
ASCII | 7-bit US-ASCII | SBCS
Big5 | Big5 (Traditional Chinese) | Nonshifting DBCS
CCSID1027 | IBM EBCDIC 1027 (Japanese-Latin) | SBCS
CCSID1047 | IBM EBCDIC 1047 (Latin1) | SBCS
CCSID290 (note 1) | IBM EBCDIC 290 (Katakana) | SBCS
CCSID300 (note 1) | IBM EBCDIC 300 (Kanji) | Nonshifting DBCS
CCSID930 (note 2) | IBM EBCDIC 930 (Kana-Kanji) | Shifting DBCS
CCSID935 (note 2) | IBM EBCDIC 935 (Simplified Chinese) | Shifting DBCS
CCSID937 (note 2) | IBM EBCDIC 937 (Traditional Chinese) | Shifting DBCS
CCSID939 (note 2) | IBM EBCDIC 939 (Latin-Kanji) | Shifting DBCS
CCSID942 | IBM EBCDIC 942 (Japanese PC) | Nonshifting DBCS
CP1026 | Windows 1026 (EBCDIC) | SBCS
CP1250 | Windows 1250 (Eastern Europe) | SBCS
CP1251 | Windows 1251 (Cyrillic) | SBCS
CP1252 | Windows 1252 (Western Europe) | SBCS
CP1253 | Windows 1253 (Greek) | SBCS
CP1254 | Windows 1254 (Turkish) | SBCS
CP1255 | Windows 1255 (Hebrew) | SBCS
CP1256 | Windows 1256 (Arabic) | SBCS
CP1257 | Windows 1257 (Baltic) | SBCS
CP1258 | Windows 1258 (Vietnamese) | SBCS
CP1361 | Windows 1361 (Korean Johab) | SBCS
CP437 | MS-DOS 437 (U.S.) | SBCS
CP500 | Windows 500 (EBCDIC 500V1) | SBCS
CP708 | Windows 708 (Arabic - ASMO708) | SBCS
CP720 | Windows 720 (Arabic - ASMO) | SBCS
CP737 | Windows 737 (Greek - 437G) | SBCS
CP775 | Windows 775 (Baltic) | SBCS
CP850 | MS-DOS 850 (Western Europe) | SBCS
CP852 | MS-DOS 852 (Eastern Europe) | SBCS
CP855 | MS-DOS 855 (IBM Cyrillic) | SBCS
CP857 | MS-DOS 857 (IBM Turkish) | SBCS
CP860 | MS-DOS 860 (IBM Portuguese) | SBCS
CP861 | MS-DOS 861 (Icelandic) | SBCS
CP862 | MS-DOS 862 (Hebrew) | SBCS
CP863 | MS-DOS 863 (Canadian French) | SBCS
CP864 | MS-DOS 864 (Arabic) | SBCS
CP865 | MS-DOS 865 (Nordic) | SBCS
CP866 | MS-DOS 866 (Russian) | SBCS
CP869 | MS-DOS 869 (Modern Greek) | SBCS
CP870 | Windows 870 | SBCS
CP874 | Windows 874 (Thai) | SBCS
CP875 | Windows 875 (EBCDIC) | SBCS
CP932 | Windows 932 (Japanese) | Nonshifting DBCS
CP936 | Windows 936 (Simplified Chinese) | Nonshifting DBCS
CP949 | Windows 949 (Korean) | Nonshifting DBCS
CP950 | Windows 950 (Traditional Chinese) | Nonshifting DBCS
EBCDIC | IBM EBCDIC CCSID37 (USA) | SBCS
EUC-JP | Extended UNIX code (Japanese) | Nonshifting DBCS
EUC-KR | Extended UNIX code (Korean) | Nonshifting DBCS
EUC-TW | Extended UNIX code (Taiwan) | Nonshifting DBCS
EUC-TW-1986 | Extended UNIX code (TW-1986) | Nonshifting DBCS
GB12345 | GB 2312 (Simplified Chinese) | Nonshifting DBCS
GB18030 | GB18030 (Simplified Chinese) | Nonshifting DBCS
GB2312 | GB 2312 (Simplified Chinese) | Nonshifting DBCS
HKSCS | Hong Kong Supplementary Character Set | Nonshifting DBCS
ISO-2022-JP (notes 2, 3) | ISO-2022-JP Japanese | Shifting DBCS
ISO-2022-KR (note 2) | ISO-2022-KR Korean | Shifting DBCS
ISO_8859-1 | ISO 8859-1 (Latin1) | SBCS
ISO_8859-10 | ISO 8859-10 (Latin6) | SBCS
ISO_8859-11 | ISO 8859-11 (Thai) | SBCS
ISO_8859-14 | ISO 8859-14 (Latin8) | SBCS
ISO_8859-15 | ISO 8859-15 (Latin9/Latin0) | SBCS
ISO_8859-2 | ISO 8859-2 (Latin2) | SBCS
ISO_8859-3 | ISO 8859-3 (Latin3) | SBCS
ISO_8859-4 | ISO 8859-4 (Latin4) | SBCS
ISO_8859-5 | ISO 8859-5 (Cyrillic) | SBCS
ISO_8859-6 | ISO 8859-6 (Arabic) | SBCS
ISO_8859-7 | ISO 8859-7 (Greek) | SBCS
ISO_8859-8 | ISO 8859-8 (Hebrew) | SBCS
ISO_8859-9 | ISO 8859-9 (Latin5) | SBCS
JIS_X0201 (note 1) | Japanese Half-width Katakana | Nonshifting DBCS
JIS_X_0208 | Japanese Kanji | Nonshifting DBCS
Java | Java (Unicode encoding) | Unicode
Johab | Johab (Korean) | Nonshifting DBCS
Shift_JIS | Shift-JIS (Japanese) | Nonshifting DBCS
UCS2 | Unicode UCS-2 | Unicode
UTF-8 | Unicode UTF-8 | Unicode
UTF7 (note 1) | Unicode UTF-7. (An outdated Unicode 7-bit clean transformation that is sometimes used for email that must pass through gateways that do not support 8-bit characters.) | Unicode
UTF8 | Unicode UTF-8 | Unicode
UTF8BOM | Unicode UTF-8 with BOM (byte-order mark) | Unicode
1 Not commonly used.
2 In the PeopleSoft system, shifting DBCSs have limited usage,
such as for file I/O, and are not supported for use as a database character
set.
3 To use certain Windows-31J (also known as Microsoft CP932)
characters in incoming or outgoing email messages, you must complete additional
configuration of your web server (incoming email) and application server or
PeopleSoft Process Scheduler (outgoing email).
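The difference between UTF-8 with and without a byte-order mark, as in the UTF8 and UTF8BOM entries above, is three leading bytes. The following Python sketch shows those bytes; utf-8-sig is Python's codec name and is assumed here to correspond to UTF-8 with BOM.

```python
# Sketch: what a UTF-8 byte-order mark looks like at the start of a file.
import codecs

plain = "abc".encode("utf-8")
with_bom = "abc".encode("utf-8-sig")

print(plain.hex())     # 616263
print(with_bom.hex())  # efbbbf616263

# The BOM is the three bytes EF BB BF.
print(with_bom.startswith(codecs.BOM_UTF8))   # True

# Decoding with utf-8-sig strips the BOM again.
print(with_bom.decode("utf-8-sig") == "abc")  # True
```

An external system that does not expect a BOM may treat these three bytes as data, which is a common reason to choose one form over the other.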
This PeopleBook also contains information
about supported character set encodings for globalization when using SQR for
PeopleSoft.
See Also
PeopleTools 8.52 Hardware and Software
Requirements Guide
For more information and code charts for
Microsoft code pages, visit http://msdn.microsoft.com/en-us/goglobal/bb964654.aspx
This section provides an overview of selecting
character sets and discusses how to:
- Select database character sets.
- Select application server character sets.
- Select and manage client workstation character sets.
- Select email character sets.
When configuring your PeopleSoft system, you
need to consider the character set (or sets) that will be in use on the
following tiers:
- Client.
- Web server.
- Application server.
- Database server.
- File attachment storage location (FTP site, HTTP
repository, or database table).
- Email.
Some operations of your PeopleSoft system
require the interaction of multiple tiers. For example, the uploading of a file
attachment involves the browser on the client, the web server, the application
server, the database server, and ultimately the file attachment storage
location. To ensure the correct transfer of data and files between these tiers,
Oracle recommends configuring each server tier (web server, application server,
database server, and file storage location) to use the same character set as
follows:
- If your PeopleSoft system operates in a multi-language
environment, use a UTF-8 character set on each server tier.
- If your PeopleSoft system operates in a single language
environment, use the native language character set for that language on
each server tier. Alternatively, you could use a UTF-8 character set on
each server tier, which would provide more flexibility than using the
native language character set.
Clients can always be configured to use the
native language of the user of that workstation or browser.
The following table depicts example character
set settings across all tiers for three typical configurations—a multi-language
environment, a single language environment (Western), and a single language
environment (non-Western).
Note. This table shows examples for a particular
combination of languages and platforms; your specific configuration could
differ.
Tier (Platform) | Multi-Language | Single Language (Western: French) | Single Language (Non-Western: Japanese) | Where to Check
Client (Windows) | Any (for example, English uses CP1252). | French (uses CP1252). | Japanese (uses CP932). | Start, Settings, Control Panel, Regional Options
Web server (Linux) – Shell processes | en_US.utf8 | fr_FR.iso88915 | ja_JP.sjis | locale command
Application server (Linux) – PSAPPSRV processes | utf-8 | latin15 | sjis | psappsrv.cfg [PSTOOLS] Character Set
Application server (Linux) – Email processes | utf-8 | utf-8 | utf-8 | psappsrv.cfg [SMTP Settings] SMTP Character Set
Application server (Linux) – Shell processes | en_US.utf8 | fr_FR.iso88915 | ja_JP.sjis | locale command
Database server (Oracle) | AL32UTF8 | WE8ISO8859P15 | JA16SJISTILDE | NLS_DATABASE_PARAMETERS
File attachments: FTP site (note 4) (Linux) – Shell processes | en_US.utf8 | fr_FR.iso88915 | ja_JP.sjis | locale command
4 For file attachments, if the storage location is a database
table or an HTTP repository, then the configuration of one of the other server tiers
will also configure the character set in use for a file attachment storage
location on that tier. Specifically, a database table as a storage location
depends on the settings for the database server; an HTTP file repository as a
storage location depends on the web server settings if the HTTP repository is
deployed on the web server. In the preceding table, information is provided for
an FTP site as a storage location only because an FTP site can be deployed
independently from the other server tiers.
Failure to configure character sets correctly
across server tiers can result in garbled file names.
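The following Python sketch shows how such garbling can arise when one tier writes a file name in UTF-8 and another tier interprets the same bytes as ISO 8859-1. The file name is a made-up example, and the two codecs stand in for two mismatched tiers.

```python
# Sketch: a file name garbled by mismatched character sets between tiers.

name = "résumé.txt"

# One tier sends the name encoded as UTF-8.
sent = name.encode("utf-8")

# A tier configured for ISO 8859-1 misreads each UTF-8 byte as one character.
garbled = sent.decode("latin-1")
print(garbled)               # rÃ©sumÃ©.txt

# A tier configured for UTF-8 recovers the original name.
print(sent.decode("utf-8"))  # résumé.txt
```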
The primary character set decision that you
must make when installing a PeopleSoft implementation is which character set to
use for the database system. Ideally, all databases are encoded in Unicode;
however, in some cases Unicode requires several bytes to represent each
character when only one byte may be required in a non-Unicode character set.
Therefore, the PeopleSoft system enables you to use certain non-Unicode
character sets for the database.
By using a Unicode encoded database, you can
maintain a single database with data in any combination of languages. A single
PeopleSoft application server can serve multiple users connecting to the
mixed-language database, regardless of the language or character set of those
users’ client machines. The only restriction on a user’s ability to access
mixed-language data is the capability of the user’s client workstation to
interpret, display, and accept keyboard entry of the characters from the various
languages.
Most
language or region-specific non-Unicode character sets provide sufficient
characters for only a few languages. If you create a non-Unicode database, you
must ensure that all of the characters for all of the languages that you plan
on using can be represented in the character set that you choose.
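A quick way to explore this requirement is to test sample text for each planned language against several candidate character sets. The Python sketch below is illustrative only: the function name, the sample words, and the candidate list are assumptions, and Python codec names are used rather than database character set names.

```python
# Sketch: find the character sets that cover every planned language.

def covering_charsets(samples: dict, candidates: list) -> list:
    """Return the candidate codecs that can encode every sample string."""
    result = []
    for codec in candidates:
        try:
            for text in samples.values():
                text.encode(codec)
        except UnicodeEncodeError:
            continue  # this character set lacks at least one character
        result.append(codec)
    return result

candidates = ["latin-1", "iso8859_2", "shift_jis", "utf-8"]

print(covering_charsets({"French": "élève"}, candidates))
# ['latin-1', 'utf-8']

print(covering_charsets({"French": "élève", "Polish": "łódź"}, candidates))
# ['utf-8']
```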
The following table lists whether a PeopleSoft
language is supported in a Unicode or non-Unicode database character set:
Language Code | Language | Database Character Set
ARA | Arabic | Unicode
BUL | Bulgarian | Unicode
CFR | Canadian French | Unicode or non-Unicode
CRO | Croatian | Unicode
CZE | Czech | Unicode
DAN | Danish | Unicode or non-Unicode
DUT | Dutch | Unicode or non-Unicode
ENG | US English | Unicode or non-Unicode
FIN | Finnish | Unicode or non-Unicode
ESP | Spanish | Unicode or non-Unicode
FRA | French | Unicode or non-Unicode
GER | German | Unicode or non-Unicode
HUN | Hungarian | Unicode
ITA | Italian | Unicode or non-Unicode
JPN | Japanese | Unicode or non-Unicode
KOR | Korean | Unicode
NOR | Norwegian | Unicode or non-Unicode
POL | Polish | Unicode
POR | Portuguese | Unicode or non-Unicode
ROM | Romanian | Unicode
RUS | Russian | Unicode
SER | Serbian | Unicode
SLK | Slovak | Unicode
SLV | Slovenian | Unicode
SVE | Swedish | Unicode or non-Unicode
THA | Thai | Unicode
UKE | English | Unicode or non-Unicode
ZHS | Simplified Chinese | Unicode
ZHT | Traditional Chinese | Unicode
Depending on the data that you store and how
the database stores Unicode characters, a Unicode database can be significantly
larger than a non-Unicode database. However, only the storage of character data
is affected; the space that is required for non-character data, such as numbers
and dates (which are stored by the database system as numbers), is not
affected.
Depending on the database platform, you can
use one of the four character set types (SBCS, nonshifting DBCS, shifting DBCS,
or Unicode) when creating the database. However, the number of characters that
you can store in each column is affected greatly by the type of character set
that you choose for the database encoding.
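The effect of the encoding on storage can be estimated by comparing encoded byte lengths. This Python sketch assumes a column whose length limit is measured in bytes; whether a limit counts bytes or characters depends on your database platform and its configuration, so treat the function below as a hypothetical illustration.

```python
# Sketch: the same text needs a different number of bytes per encoding.

def fits_in_column(text: str, codec: str, column_bytes: int) -> bool:
    """Return True if the encoded text fits a byte-limited column."""
    return len(text.encode(codec)) <= column_bytes

for codec in ("shift_jis", "utf-8"):
    print(codec, len("日本語".encode(codec)))
# shift_jis 6
# utf-8 9

for codec in ("latin-1", "utf-8"):
    print(codec, len("Müller".encode(codec)))
# latin-1 6
# utf-8 7

print(fits_in_column("日本語", "shift_jis", 6))  # True
print(fits_in_column("日本語", "utf-8", 6))      # False
```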
See Also
PeopleTools 8.52 Hardware and Software
Requirements Guide
PeopleTools 8.52 installation guide for your
database platform
Your operating system and database guides
All data that is stored in memory and
processed by the PeopleTools application server is held in Unicode. However,
files that are created on the server through PeopleCode file layout objects,
as well as log and trace files, can be written in Unicode or in a
non-Unicode character set.
Each PeopleSoft application server is
configured with a default non-Unicode character set. If a file operation must
create a non-Unicode file, this character set is used, unless another character
set is explicitly specified in the file operation. For example, if you create a
file layout object to write a non-Unicode file, but you don’t specify in which
character set the file should be created, the default non-Unicode character set
of the application server is used.
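The fallback rule described above can be sketched in Python as follows. This is not PeopleCode and does not reflect the PeopleTools API; the constant, the function name, and the choice of cp1252 as the server default are all hypothetical.

```python
# Sketch: use an explicit character set if given, else a server default.
import os
import tempfile

# Hypothetical stand-in for the server's default non-Unicode character set.
SERVER_DEFAULT_CHARSET = "cp1252"

def write_text_file(path: str, text: str, charset: str = None) -> str:
    """Write text to path and return the character set that was used."""
    codec = charset or SERVER_DEFAULT_CHARSET
    with open(path, "w", encoding=codec, newline="") as handle:
        handle.write(text)
    return codec

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")

    # No character set specified: the default is used.
    print(write_text_file(path, "€"))           # cp1252
    with open(path, "rb") as handle:
        print(handle.read())                    # b'\x80'

    # Character set specified explicitly: it overrides the default.
    print(write_text_file(path, "€", "utf-8"))  # utf-8
    with open(path, "rb") as handle:
        print(handle.read())                    # b'\xe2\x82\xac'
```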
Microsoft Windows enables you to change the
default character set of the system, although as installed, the default
character set matches the default locale of the Microsoft Windows installation.
To change the system default locale (and therefore the character set), on
Microsoft Windows servers, use the Control Panel’s Regional Options menu. In
the Language settings for the system section, click the Set Default button.
When running on Unix/Linux, the PeopleSoft
application server enables you to specify the default non-Unicode character set
in the application server’s configuration file, which you select by using the
PSADMIN tool. Any valid PeopleSoft character set with a character set type of
SBCS or nonshifting DBCS is a valid default non-Unicode character set for
PeopleSoft application servers that run on Unix/Linux.
You must consider the client components of
PeopleTools when you are planning your language strategy. The requirements for
language support on client workstations are different, depending on whether you
are using the PeopleSoft Pure Internet Architecture or the PeopleTools
development tools for Microsoft Windows.
This section discusses:
- Character sets and fonts in the PeopleSoft Pure
Internet Architecture.
- Fonts and the PeopleTools development environment.
- Input methods.
The PeopleSoft Pure Internet Architecture
serves all HTML pages in the UTF-8 encoding of Unicode. This encoding is
recognized automatically by the web browser, because the encoding of the page
is announced in the HTTP header when the browser communicates with the web
server. All browsers supported by PeopleTools can support UTF-8 encoded HTML
pages.
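As an illustration of how an encoding announced in an HTTP header can be read, the following Python sketch parses a Content-Type header value with the standard library. The header values are examples, not captured from a PeopleSoft system, and the function name is hypothetical.

```python
# Sketch: read the charset parameter from a Content-Type header value.
from email.message import Message

def charset_from_content_type(header_value: str):
    """Return the lowercased charset parameter, or None if absent."""
    message = Message()
    message["Content-Type"] = header_value
    return message.get_content_charset()

print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/html"))                 # None
```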
However, the browser needs other components to
correctly display and enter the vast array of characters that are available in
Unicode. Specifically, you need appropriate fonts to display the various
scripts in which you expect data to be maintained. In addition, you might need
alternate keyboard layouts or, in the case of ideographic scripts such as
Chinese, Japanese, and Korean, you need input method editors (IMEs) to convert
sequences of keystrokes into ideographs. The requirement for alternate keyboard
layouts and IMEs is the same for both the PeopleSoft Pure Internet Architecture and the
PeopleTools development environment.
Not all fonts contain a full repertoire of
Unicode characters, because many fonts are tailored to address a specific list
of languages and contain only the glyphs that are required by those languages.
If you try to view Unicode data with a font that does not contain the
appropriate characters for the displayed language, you will most likely see
square boxes in place of the appropriate characters. The data has not been
corrupted; there is just no glyph available in the current font for the
character that the system is trying to display. For this reason, you may need
to license or configure several fonts for a global PeopleSoft system.
The
PeopleSoft Pure Internet Architecture includes a set of style sheets, defined
with Application Designer, that determine the font that is used to display HTML
pages. In some cases, the application data may contain characters that are not
present in this font and that require a different font.
The Albany TrueType fonts shipped in the PS_HOME\fonts\truetype
directory support all of the languages supported by the PeopleSoft system.
Alternatively, you may need to obtain and configure fonts that contain the
characters for the languages that you are planning to use, if your workstations
are not already configured with these fonts. Obtain fonts from the following
sources:
- Many Microsoft Windows and other operating system
applications are packaged with Unicode fonts containing glyphs covering a
large range of languages.
Microsoft Office is packaged with several
fonts containing a large portion of the characters in Unicode, including the
Microsoft Sans Serif font. Use these fonts in the PeopleSoft Pure Internet
Architecture by specifying them in the Application Designer style sheet
definitions or by following the browser-specific instructions in this section.
- Many
public domain fonts exist that contain a large character repertoire for
use in web browsers. The unifont.org web site is one location to get
additional information on public domain fonts.
Commercial font foundries that offer such fonts include Monotype,
Bitstream, and Tiro Typeworks.
Depending on your browser, you can also
download fonts from your browser’s manufacturer.
To enable the display of GB18030 characters,
you can use either the SimSun-18030 font from Microsoft or the Albany fonts
shipped in the PS_HOME\fonts directory. Both of these fonts have glyphs
for the supported ranges of the GB18030 character set.
PeopleTools enables you to specify the font
that is used for all graphical components for all PeopleTools modules that run
on Windows, such as Application Designer. Use these methods to specify fonts:
The PeopleSoft Configuration Manager font setting affects the font that is
used by all of the designer components of PeopleTools, including all of the
text that is contained in the Microsoft Windows resource files.
Changing this font setting may be necessary if
your workstation’s default locale does not contain the characters that are used
for the language that you are attempting to display or maintain. For example,
if you are attempting to view Japanese characters on an English Microsoft
Windows workstation, you can change the PeopleSoft Configuration Manager font
setting to select a font that contains the characters for the language that you
are trying to display.
The Albany TrueType fonts shipped in PS_HOME\fonts
directory support all of the languages supported by the PeopleSoft system.
In addition, several fonts that are shipped
with Microsoft Windows and Microsoft Office, including Arial Unicode MS and
Microsoft Sans Serif, contain a large number of glyphs covering most of the
languages that are supported by the Unicode character set. Microsoft Windows
can also be configured with fonts for most worldwide languages by selecting the
required languages under the Regional Settings Control Panel menu.
The PeopleCode editor in Application Designer
also enables you to select a font for character display in the editor’s window
itself. This is useful if the PeopleCode programs that you are working on
contain Unicode characters. To set the font in Application Designer, open the
PeopleCode program and select Edit, Display Fonts and Color.
If users will enter translated data by using
PeopleSoft Pure Internet Architecture or the PeopleTools development
environment, you must ensure that an appropriate keyboard layout or input
method editor is installed on the workstation.
Most alphabetic languages can be typed by
using a relatively simple keyboard layout. Several specialized keyboard layouts
exist for most languages; configure these keyboard layouts through your
operating system. For example, a Spanish keyboard layout contains keys for the
n-tilde character (ñ) and several other accented characters.
However, certain PeopleSoft hot keys do not
work as expected on alternate, non-U.S. keyboard layouts. For example, Alt+', Alt+\,
and Alt+/ do not produce the
expected results on the AZERTY keyboard. This occurs because some keys on
non-U.S. keyboards produce different key codes than the same key on a U.S.
keyboard (also known as a QWERTY keyboard).
A solution to this problem can be found in the
appendix.
There are several ways to enter accented
characters by using a nonlocalized keyboard. Your operating system manual can
help you use specialized keyboard layouts, such as the English international
layout, which enables you to enter accented characters by using two keystrokes.
The Microsoft web site contains information about keyboards that are supported
by Microsoft Windows and instructions for installing and configuring Windows
keyboard layouts.
Ideographic languages, such as Chinese, Japanese, and Korean, require the use
of a front-end processor to intercept multiple keyboard strokes and transform
them into an ideographic character. These front-end processors are known as
input method editors (IMEs), and they must be installed on each workstation
where you plan to enter the ideographic languages.
Most
localized versions of operating systems for these languages come preconfigured
with IMEs that are appropriate for the language that is supported by the
operating system. But on systems where the default locale is not Chinese,
Japanese, or Korean, you may need to configure or license an IME from a
third-party vendor. The PeopleSoft Pure Internet Architecture supports any IME
that is supported by your browser. The designer tools in Microsoft Windows
support all standard Microsoft IMEs.
The PeopleSoft system supports UTF-8 for
outgoing Simple Mail Transfer Protocol (SMTP) email messages from PeopleTools
application servers. In addition, the PeopleSoft system supports several
additional encodings for outgoing email.
PeopleSoft application servers support the
following for outgoing email:
- UTF-8 (default).
- ISO-2022-JP, Shift_JIS, EUC-JP (for Japanese).
- ISO-2022-KR, EUC-KR (for Korean).
- GBK, Big5, GB18030 (for Chinese).
Specifying Email Character Sets
You specify an email character set in the
SMTPCharacterSet parameter in the application server configuration file,
psappsrv.cfg. By default, the SMTPCharacterSet parameter is set to UTF-8.
Note. You should specify a value for the
SMTPCharacterSet. If you do not specify a value for the parameter, email is
sent as-is, with no encoding. Leave the parameter set to the default value of
UTF-8 if you are not certain about which value to use.
For example, to use ISO-2022-JP encoding for
outgoing SMTP mail, in the psappsrv.cfg file, set the SMTPCharacterSet
parameter to ISO-2022-JP, as shown in the following example:
[SMTP Setting]
...
SMTPCharacterSet=ISO-2022-JP
SMTPEncodingDLL=blank
You can also write your own SMTPEncodingDLL
modules, if necessary.
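Before changing SMTPCharacterSet from its UTF-8 default, it can help to confirm that typical message text is representable in the candidate character set. The following Python sketch is illustrative only and is not part of PeopleTools. The CODECS mapping and the usable_charsets function are names invented for this example; the mapping pairs the character set names listed in this section with the Python codecs that most closely correspond to them, which is an assumption rather than a statement about how the application server encodes mail.

```python
# Hypothetical mapping from the charset names in this section to Python codecs.
CODECS = {
    "UTF-8": "utf_8",
    "ISO-2022-JP": "iso2022_jp",
    "Shift_JIS": "shift_jis",
    "EUC-JP": "euc_jp",
    "ISO-2022-KR": "iso2022_kr",
    "EUC-KR": "euc_kr",
    "GBK": "gbk",
    "Big5": "big5",
    "GB18030": "gb18030",
}


def usable_charsets(text):
    """Return the charset names that can encode every character in text."""
    usable = []
    for name, codec in CODECS.items():
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue  # at least one character has no mapping in this charset
        usable.append(name)
    return usable


if __name__ == "__main__":
    # A Korean name is not representable in the Japanese-only charsets.
    print(usable_charsets("홍 길동"))
```

If a sample of real message text cannot be encoded in the character set you intend to use, leaving the parameter at UTF-8 is the safer choice, as the note above recommends.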
To use certain Windows-31J (also known as
Microsoft CP932) characters—specifically, NEC special characters, NEC-selected
IBM extended characters, IBM extension characters, and user-defined
characters—in incoming or outgoing email messages with the ISO-2022-JP Japanese
character set, you must complete additional configuration of your web server
(for incoming email) and application server or PeopleSoft Process Scheduler
(for outgoing email).
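The reason for the extra configuration can be seen outside of PeopleTools. The Python sketch below is only an illustration, and it assumes that Python's cp932 and iso2022_jp codecs are reasonable stand-ins for Windows-31J and standard ISO-2022-JP. It shows that an NEC special character such as the circled digit one exists in Windows-31J but has no mapping in standard ISO-2022-JP. The encodable helper is a name made up for this example.

```python
def encodable(text, codec):
    """Return True if every character in text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# U+2460 CIRCLED DIGIT ONE is one of the NEC special characters.
nec_special = "\u2460"

if __name__ == "__main__":
    print("cp932:", encodable(nec_special, "cp932"))
    print("iso2022_jp:", encodable(nec_special, "iso2022_jp"))
```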
For incoming email on the web server, the
following JVM setting must be added to the JAVA_OPTIONS_WIN32 parameter in the
setenv.cmd file:
SET JAVA_OPTIONS_WIN32="-Dsun.nio.cs.map=x-windows-iso2022jp/ISO-2022-JP"
For outgoing email on the application server
or PeopleSoft Process Scheduler, the following JVM option must be added to
either the psappsrv.cfg file or the psprcs.cfg file, depending on whether the
application server or an Application Engine (AE) program, respectively, will be
handling outgoing email messages. JVM options are set in the PSTOOLS section of
the file:
[PSTOOLS]
...
JavaVM Options=-Dsun.nio.cs.map=x-windows-iso2022jp/ISO-2022-JP
In addition, your web server, application
server, and PeopleSoft Process Scheduler must be using a Java Runtime
Environment (JRE) or Java Development Kit (JDK) that is supported for extended
Japanese characters. See the release notes on the My Oracle Support website.
See My Oracle Support, Knowledge, Tools and
Technology, Documentation, Release Notes.
You can use PeopleCode file functions to
convert files or text strings from one supported PeopleSoft character set to
another supported PeopleSoft character set.
PeopleCode operations such as GetFile,
GetTempFile, Open, ReadLine, and WriteLine automatically account for the file
encoding. Therefore, you can:
- Convert from a Unicode character set to a non-Unicode
character set.
- Convert from a non-Unicode character set to a Unicode
character set.
- Automatically handle the Unicode BOM if it is present
or needs to be written.
When you use a character set such as UCS2 or
UTF8BOM, WriteLine adds the BOM at the beginning of the file contents. The
ReadLine PeopleCode function skips the BOM when reading and does not interpret
it as a text character. Because the BOM is recognized as metadata and not as
part of the file's text, the BOM is not added to the file contents when you
write in a non-Unicode character set with the WriteLine PeopleCode function.
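The same BOM behavior can be observed in other environments. The following Python sketch is an analogy only: it assumes that Python's utf-8-sig and utf-16 codecs behave comparably to the UTF8BOM and UCS2 character sets described above, and the helper names write_text and read_text are invented for this example.

```python
import codecs


def write_text(text, encoding):
    """Encode text as it would be written to a file in the given encoding."""
    return text.encode(encoding)


def read_text(data, encoding):
    """Decode file bytes; BOM-aware codecs drop the BOM instead of returning it."""
    return data.decode(encoding)


if __name__ == "__main__":
    with_bom = write_text("abc", "utf-8-sig")
    print(with_bom.startswith(codecs.BOM_UTF8))   # BOM was written first
    print(read_text(with_bom, "utf-8-sig"))       # BOM is not part of the text
    print(write_text("abc", "shift_jis"))         # non-Unicode: no BOM at all
```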
In the following example PeopleCode program,
the FileEncodingConversion function converts files from one supported
character set to another. In the body of the program, the function is called to
convert from the UTF8BOM character set to the UCS2 character set:
/* FileEncodingConversion() converts the character encoding of an input */
/* file to the character encoding of an output file.                    */
/* An example of how to call the function is at the end of this program. */
Local File &InputFile, &OutputFile, &DummyFile;
Local string &LogLine;
Local array of string &FileEncoding;
Local boolean &ret;

Function FileEncodingConversion(&InputEncoding, &InputDirectoryFile, &OutputEncoding, &OutputDirectoryFile) Returns boolean
   &InputFile = GetFile(&InputDirectoryFile, "R", &InputEncoding, %FilePath_Absolute);
   &OutputFile = GetFile(&OutputDirectoryFile, "W", &OutputEncoding, %FilePath_Absolute);
   If &InputFile.IsOpen And
         &OutputFile.IsOpen Then
      /* Copy line by line; each file object applies its own encoding. */
      While &InputFile.ReadLine(&LogLine)
         &OutputFile.WriteLine(&LogLine);
      End-While;
      &InputFile.Close();
      &OutputFile.Close();
      Return True;
   Else
      /* Report the file that failed to open and close the one that did. */
      If Not &InputFile.IsOpen Then
         WinMessage("Error: PeopleCode: File Encoding I/O: " | "Failed to open: " | &InputDirectoryFile);
      Else
         &InputFile.Close();
      End-If;
      If Not &OutputFile.IsOpen Then
         WinMessage("Error: PeopleCode: File Encoding I/O: " | "Failed to open: " | &OutputDirectoryFile);
      Else
         &OutputFile.Close();
      End-If;
      Return False;
   End-If;
End-Function;

/*-----------------------------------------------------------------------*/
/* Function IsUnix                                                       */
/* Check whether the operating system is UNIX.                           */
/*-----------------------------------------------------------------------*/
Function IsUnix Returns boolean
   /* Mode "E" is used here only to test whether /bin/sh exists. */
   &DummyFile = GetFile("/bin/sh", "E", %FilePath_Absolute);
   If &DummyFile.IsOpen Then
      &DummyFile.Close();
      Return True;
   Else
      Return False;
   End-If;
End-Function;

/* Test the function above. */
&FileEncoding = CreateArray("UTF8BOM", "UCS2", "SJIS", "GB18030", "UTF8", "a", "u");
If IsUnix() Then
   /* for UNIX */
   &ret = FileEncodingConversion(&FileEncoding [1], "/home/FS_" | &FileEncoding [1] | ".txt", &FileEncoding [2], "/home/BEFORE/FS_PCode_" | &FileEncoding [1] | "_to_" | &FileEncoding [2] | ".txt");
Else
   /* for Windows */
   &ret = FileEncodingConversion(&FileEncoding [1], "D:\TMP\FS_" | &FileEncoding [1] | ".TXT", &FileEncoding [2], "D:\TMP\AFTER\FS_PCode_" | &FileEncoding [1] | "_to_" | &FileEncoding [2] | ".TXT");
End-If;
REM WinMessage("ret: " | &ret);
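For readers who want to try the same line-by-line conversion outside of PeopleTools, the following Python sketch follows the same pattern. It is not PeopleCode and is not part of the product. The names convert_file and demo, the temporary file names, and the use of the utf-8-sig and utf-16 codecs as approximations of UTF8BOM and UCS2 are all assumptions made for this example.

```python
import codecs
import os
import tempfile


def convert_file(src_path, src_encoding, dst_path, dst_encoding):
    """Copy src_path to dst_path line by line, re-encoding the text."""
    # newline="" keeps the original line endings untouched on every platform.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        for line in src:
            dst.write(line)


def demo():
    """Convert a small UTF-8-with-BOM file to UTF-16; return the output bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "in.txt")
        dst_path = os.path.join(tmp, "out.txt")
        with open(src_path, "wb") as f:
            f.write(codecs.BOM_UTF8 + "日本語\n".encode("utf-8"))
        convert_file(src_path, "utf-8-sig", dst_path, "utf-16")
        with open(dst_path, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(demo())
```

As in the PeopleCode program, the caller names only the two encodings; reading drops the input BOM and writing adds the BOM that the output encoding requires.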
This section provides overviews of Application
Designer field length semantics and field length checking for non-Unicode
databases and discusses how to enable or disable data field length checking.
The database character set determines the way
that PeopleTools interprets the column length that is defined in Application
Designer.
If you create a Unicode database, the field
length, as shown in Application Designer, indicates the maximum number of
Unicode BMP characters that are permitted in the field, regardless of the
Unicode encoding that is used by the database. Some database platforms, such as
Oracle with byte semantics, use byte lengths to measure column sizes when
operating in a Unicode database, while others use character lengths.
When the database uses byte-sized column
lengths, the PeopleSoft system sizes the database columns based on the
worst-case ratio between bytes and characters in the Unicode encoding that is
used by your database. For example, if the AL32UTF8 character set is used by
Oracle with byte semantics, the worst-case character-to-byte ratio when running
against an Oracle Unicode database is 1:3. So, column size is tripled when
creating a Unicode database on Oracle. A field that is defined in Application
Designer as a CHAR(10) is created on an Oracle Unicode database with a type of VARCHAR2(30).
This tripling of the maximum column size does not affect the actual size of the
database, because variable length character fields do not reserve space in the
database.
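The 1:3 worst case can be checked directly. The following Python sketch is an illustration only and assumes that Python's utf-8 codec matches AL32UTF8 for BMP characters. It measures the widest UTF-8 encoding of any BMP character and applies that ratio to a field length. The function names are invented for this example.

```python
def max_utf8_bytes_per_bmp_char():
    """Return the largest number of UTF-8 bytes needed by any BMP character."""
    widest = 0
    for code_point in range(0x10000):
        if 0xD800 <= code_point <= 0xDFFF:
            continue  # surrogate code points are not encodable characters
        widest = max(widest, len(chr(code_point).encode("utf-8")))
    return widest


def byte_semantics_column_size(field_length):
    """Worst-case byte size for a field defined with field_length characters."""
    return field_length * max_utf8_bytes_per_bmp_char()


if __name__ == "__main__":
    print(byte_semantics_column_size(10))
```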
Other database platforms use character-based
column lengths whose sizes represent the maximum number of Unicode characters
instead of bytes that may be stored. Examples of this implementation are the
NCHAR data type in Microsoft SQL Server and the GRAPHIC data type in DB2 UDB
for Linux, Unix, and Microsoft Windows.
If you create a non-Unicode database, the
field length in Application Designer represents the number of bytes that are
permitted in the field, based on the character set that you used to create the
database. Therefore, a PeopleSoft Unicode database gives you significantly
more space for character data within the database when dealing with ideographic
languages, such as Japanese, that require more than one byte of storage per
character.
The following tables show some of the possible
database encodings for database platforms that the PeopleSoft system supports
in Unicode and DBCS and their effects on database column sizes. Each table
shows the database representation and the worst-case number of characters
allowed for a character field defined in Application Designer with a length
of 10.
This table shows the information for an Oracle
database with byte semantics (used by PeopleSoft 8.9 applications and earlier):
Database Character Set | Database Representation | Number of Characters
Unicode (AL32UTF8) | VARCHAR2(30) | 10
Any SBCS | VARCHAR2(10) | 10
Shift-JIS (JA16SJIS or JA16SJISTILDE) | VARCHAR2(10) | 5
This table shows the information for an Oracle
database with character semantics (used by PeopleSoft 9.0 applications and later):
Database Character Set | Database Representation | Number of Characters
Unicode (AL32UTF8) | VARCHAR2(10) | 10
Any SBCS | VARCHAR2(10) | 10
Shift-JIS (JA16SJIS or JA16SJISTILDE) | VARCHAR2(10) | 10
Database Character Set | Database Representation | Number of Characters
Unicode (UCS-2) | NVARCHAR(10) | 10
Any SBCS | VARCHAR(10) | 10
Shift-JIS (CP932) | VARCHAR(10) | 5
This table shows the information for a
Microsoft SQL Server database with CHAR semantics:
Database Character Set | Database Representation | Number of Characters
Unicode (UCS-2) | NCHAR(10) | 10
Any SBCS | CHAR(10) | 10
Shift-JIS (CP932) | CHAR(10) | 5
Database Character Set | Database Representation | Number of Characters
Any SBCS | CHAR(10) | 10
Shifting DBCS (CCSID 930/939) | CHAR(10) | 4 (4 x 2-byte characters, plus shift-in and shift-out bytes)
This table shows the information for all other
databases:
Database Character Set | Database Representation | Number of Characters
Any SBCS | CHAR(10) | 10
The
maximum number of characters that are permitted in a PeopleSoft field varies,
depending on the character set of the database. Because all components of
PeopleTools use Unicode for internal storage, by default, field length checking
occurs in terms of Unicode character counts. This calculation is appropriate
for Unicode databases and for any SBCS databases.
However,
if you are using a non-Unicode DBCS, special length checking must occur each
time you move off a field to ensure that the string that you entered fits in
the database column when the string is converted to the database’s character
set.
For
graphically sizing page fields, PeopleTools uses the Unicode length of the
field as defined in Application Designer. For example, if a field is defined in
Application Designer as a 10-character field, page fields in both the
PeopleSoft Pure Internet Architecture and the PeopleTools clients for Microsoft
Windows allow 10 characters to be displayed unless manually resized by the
developer.
However, if the database is encoded in a
non-Unicode DBCS character set, such as Japanese Shift-JIS, special length
validation must occur because the database column size is created relative to a
byte count, not to a character count as is used by the simple field length
validation.
For
example, if a user enters 10 Japanese characters into a field that is defined
as CHAR(10) in Application Designer, this string needs 20 bytes of storage in a
nonshifting DBCS character set and 22 bytes of storage in a shifting character
set. This 10-character input would fail insertion in both of these databases.
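The arithmetic in this example can be reproduced with a short Python sketch. It is an illustration only and does not show how the page processor performs its check. It assumes that Python's shift_jis codec represents the nonshifting case, and it estimates the shifting case as two bytes per character plus one shift-out and one shift-in byte around a single run of double-byte characters. All function names are invented for this example.

```python
def nonshifting_bytes(text):
    """Bytes needed in a nonshifting DBCS, using Shift-JIS as the example."""
    return len(text.encode("shift_jis"))


def shifting_bytes_estimate(double_byte_text):
    """Estimated bytes in a shifting DBCS for one run of double-byte characters.

    Assumes every character is double-byte and that the run is wrapped in a
    single shift-out/shift-in pair.
    """
    return 2 * len(double_byte_text) + 2


def fits(byte_count, column_bytes):
    """Return True if the encoded value fits in a column of column_bytes."""
    return byte_count <= column_bytes


if __name__ == "__main__":
    ten_chars = "あいうえおかきくけこ"  # 10 Japanese characters
    print(nonshifting_bytes(ten_chars), shifting_bytes_estimate(ten_chars))
    print(fits(nonshifting_bytes(ten_chars), 10))
```

Under the same assumptions, five double-byte characters fit a 10-byte column in the nonshifting case and four fit in the shifting case, which matches the character counts shown in the tables earlier in this section.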
To
address this issue, the page processor checks the Data Field Length Checking
option on the PeopleTools Options page and performs character-set specific
length validation against the contents of each field when the field is
validated. Typically, length validation occurs when the field’s FieldChange
PeopleCode event fires, so the actual time of validation may differ, depending
on whether your page uses deferred mode processing.
To enable or disable data field length
checking:
- Select PeopleTools, Utilities, Administration,
PeopleTools Options.
The PeopleTools Options page appears.
- From the Data Field Length Checking drop-down list box,
select a value based on the character set that you are using for the
database:
Others | Select if you are using a Unicode encoded database or a non-Unicode SBCS database. This option prevents special field length checking, which is not required by these types of databases.
DB2 MBCS | Select if you are running a Japanese database on the DB2 UDB for Linux, Unix, and Microsoft Windows platform. This option enables field length checking based on a shifting DBCS.
MBCS | Select if you are running a non-Unicode Japanese database on any other platform. This option enables field length checking based on a nonshifting DBCS.
Note. The non-Unicode DBCS settings are specifically oriented
to Japanese language installations, because Japanese is the only language
that the PeopleSoft system supports in a non-Unicode DBCS encoding. All
languages other than Western European languages and Japanese are supported
by the PeopleSoft system only when using Unicode encoded databases.
- Click the Save button.
This section discusses PeopleSoft standard
name conventions, including name conventions for Chinese, Japanese, and Korean
(CJK) ideographic characters. PeopleSoft standard name conventions apply when
data is entered or displayed in character fields that use Name as the format
type. These conventions should be used when a complete name is constructed from
multiple name character fields or when all name data is entered into a single
name character field.
The PeopleSoft standard name convention is:
[lastname] [suffix],[prefix] [firstname] [middle name/initial]
Examples of typical suffixes include degrees,
affiliations, and titles such as MD, PhD, Jr., and III. Examples of typical
prefixes include titles and honorifics such as Ms., Mr., Dr., Rev., and Hon.
Valid examples of these conventions include:
Name as Displayed by PeopleSoft Convention | Name Elements Used | Actual Name
O’Brien,Michael | [lastname],[firstname] | Michael O’Brien
Jones IV,James | [lastname] [suffix],[firstname] | James Jones IV
Phillips MD,Deanna Lynn | [lastname] [suffix],[firstname] [middle name] | Deanna Lynn Phillips, MD
Reynolds Jr.,Dr. John Q. | [lastname] [suffix],[prefix] [firstname] [middle initial] | Dr. John Q. Reynolds Jr.
Phipps-Scott,Ms. Adrienne | [lastname],[prefix] [firstname] | Ms. Adrienne Phipps-Scott
Knauft,Günter | [lastname],[firstname] | Günter Knauft
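The standard convention can be expressed as a small formatting routine. The following Python sketch is a hypothetical helper written for this example, not a PeopleTools function. It joins the last name and optional suffix with a space, then a comma with no following space, then the optional prefix, first name, and optional middle name or initial.

```python
def format_name(last, first, middle="", prefix="", suffix=""):
    """Build [lastname] [suffix],[prefix] [firstname] [middle name/initial]."""
    left = " ".join(part for part in (last, suffix) if part)
    right = " ".join(part for part in (prefix, first, middle) if part)
    return left + "," + right


if __name__ == "__main__":
    print(format_name("Reynolds", "John", middle="Q.", prefix="Dr.", suffix="Jr."))
```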
However, if the name contains any CJK
ideographic characters, different standard name conventions apply.
If the name contains any Japanese or Korean
ideographic characters, the first and last names are separated by a space
instead of a comma. In Japanese, a prefix or suffix is optional; in Korean,
only an optional prefix can be used. These modified PeopleSoft standard name
conventions can be used when a name includes any of the following types of
characters:
- Japanese or Korean unified ideographs (Japanese Kanji
or Korean Hanja).
- Japanese half-width or full-width Katakana.
- Japanese Hiragana.
- Korean Hangul.
The PeopleSoft standard name convention for
Japanese names including these ideographic characters is:
[lastname] [firstname][{suffix|prefix}]
The PeopleSoft standard name convention for
Korean names including these ideographic characters is:
[lastname] [firstname][prefix]
Valid examples of these conventions include:
Name as Displayed by PeopleSoft Convention | Name Elements Used | English Equivalent
塩次 伸二 | [lastname] [firstname] | Shinji Shiotsugu
塩次 伸二様 | [lastname] [firstname][prefix] | Mr. Shinji Shiotsugu
홍 길동 | [lastname] [firstname] | Hong Gildong
홍 길동씨 | [lastname] [firstname][prefix] | Mr. (or Ms.) Hong Gildong
If the name contains Chinese Hanzi, there is
no space or comma between the first name, last name, suffix, and prefix. The
PeopleSoft standard name convention for names including Chinese Hanzi
characters is:
[lastname][firstname][{suffix|prefix}]
Valid examples of this convention include:
Name as Displayed by PeopleSoft Convention | Name Elements Used | English Equivalent
陳嘉明 | [lastname][firstname] | Chan Ka Ming
陳嘉明先生 | [lastname][firstname][prefix] | Mr. Chan Ka Ming
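The CJK conventions differ from the standard convention only in the separator and in where the title is attached. The following Python sketch is a hypothetical helper written for this example, not a PeopleTools function. It uses a space between last and first name for Japanese and Korean names, no separator for Chinese names, and appends the optional prefix or suffix directly after the first name in both cases.

```python
def format_cjk_name(last, first, title="", chinese=False):
    """Build a CJK name: 'last first[title]', or 'lastfirst[title]' for Chinese."""
    separator = "" if chinese else " "
    return last + separator + first + title


if __name__ == "__main__":
    print(format_cjk_name("塩次", "伸二", title="様"))
    print(format_cjk_name("陳", "嘉明", title="先生", chinese=True))
```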
PeopleTools
also provides PeopleCode string functions that recognize and convert between
different characters within the Japanese character set. This enables you to
detect, convert, and enforce the types of characters that you can enter in any
PeopleSoft field. For example, the PeopleSoft system uses these functions in
the development of the Alternate Character Architecture in some PeopleSoft
applications. The Alternate Character Architecture is used in several PeopleSoft
applications to provide a feature that enables the entry of, and enforces the
characters contained in, Japanese phonetic spellings (Furigana) by using the
Hiragana or Katakana scripts.
The
following PeopleCode string functions can be used to recognize and convert
between different characters within the Japanese character set:
- CharType
- ContainsCharType
- ContainsOnlyCharType
- ConvertChar
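To give a sense of what detecting and converting Japanese character types involves, the following Python sketch classifies and converts characters by Unicode code point range. It is not PeopleCode and does not reproduce the signatures or behavior of the functions listed above. The function names are invented for this example, and the approach relies on the fact that the Hiragana block (U+3041 to U+3096) and the corresponding Katakana range (U+30A1 to U+30F6) are offset by 0x60.

```python
HIRAGANA = range(0x3041, 0x3097)   # U+3041..U+3096
KATAKANA_OFFSET = 0x60             # distance from a Hiragana letter to its Katakana


def is_hiragana(char):
    """Return True if char is a Hiragana letter."""
    return ord(char) in HIRAGANA


def contains_only_hiragana(text):
    """Return True if text is non-empty and every character is Hiragana."""
    return bool(text) and all(is_hiragana(char) for char in text)


def hiragana_to_katakana(text):
    """Convert Hiragana letters to full-width Katakana; leave others unchanged."""
    return "".join(
        chr(ord(char) + KATAKANA_OFFSET) if is_hiragana(char) else char
        for char in text
    )


if __name__ == "__main__":
    print(hiragana_to_katakana("ひらがな"))
```

A check of this kind is what allows a field to accept a phonetic spelling only when it is written entirely in the expected script.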