When you use Unicode in OpenEdge applications, the following restrictions, cautions, and suggestions apply:
With the OpenEdge UTF-8 BASIC collation, composed and decomposed characters are treated as different characters. With the International Components for Unicode (ICU) collations, composed and decomposed characters are treated as the same character for comparisons and indexes.
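For example, the character "é" can be stored either as the composed code point U+00E9 or as the decomposed sequence U+0065 U+0301. The following sketch uses the ABL COMPARE function to test equality under a named collation; it assumes a session running with -cpinternal UTF-8, that CHR returns the character for a given Unicode code point in such a session, and that the collation names and strength value shown are installed and valid for your OpenEdge version:

    /* Sketch: composed vs. decomposed "e acute" under different collations. */
    DEFINE VARIABLE cComposed   AS CHARACTER NO-UNDO.
    DEFINE VARIABLE cDecomposed AS CHARACTER NO-UNDO.

    ASSIGN
        cComposed   = CHR(233)          /* U+00E9, composed              */
        cDecomposed = "e" + CHR(769).   /* U+0065 + U+0301, decomposed   */

    /* Assumed behavior: under BASIC the two strings compare as different;
       under an ICU collation (the name "ICU-UCA" is illustrative) they
       compare as equal.                                                  */
    MESSAGE
        "BASIC:"   COMPARE(cComposed, "EQ", cDecomposed, "CASE-SENSITIVE", "BASIC") SKIP
        "ICU-UCA:" COMPARE(cComposed, "EQ", cDecomposed, "CASE-SENSITIVE", "ICU-UCA")
        VIEW-AS ALERT-BOX.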
The OpenEdge UTF-8 BASIC collation sorts Unicode data in binary order. The ICU collations, by contrast, sort Unicode data according to the language-specific requirements of a locale.
Note: You can specify an OpenEdge collation or an ICU collation for sorting data using either the Collation Table (-cpcoll) startup parameter or the COLLATE option on the FOR statement, the OPEN QUERY statement, and the PRESELECT phrase. For more information on the -cpcoll startup parameter, see OpenEdge Deployment: Startup Command and Parameter Reference. For more information on the ABL elements, see OpenEdge Development: ABL Reference.
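As a sketch of the two approaches, the following assumes the Customer table from the sports2000 demo database and an installed ICU collation named ICU-fr; the collation name and strength value are illustrative:

    /* Session-wide collation, set at startup (illustrative command line): */
    /*   prowin32 -pf myapp.pf -cpinternal UTF-8 -cpcoll ICU-fr            */

    /* Per-query collation, using the COLLATE option on the BY phrase:     */
    FOR EACH Customer NO-LOCK
        BY Customer.Name COLLATE("ICU-fr", "CASE-SENSITIVE"):
        DISPLAY Customer.Name.
    END.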
For information about using ICU collations as database collations, see Using Databases.
Before sorting Unicode data with the UTF-8 BASIC collation, normalize the data using the ABL NORMALIZE function. Normalizing the data converts the data into a standardized form that allows for more accurate and consistent sorting and indexing. This is important when working with characters or sequences of characters that have multiple representations (for example, base characters and combining characters) because it ensures that equivalent strings have a unique binary representation. For more information on the ABL NORMALIZE function, see OpenEdge Development: ABL Reference.
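A minimal sketch of normalizing a value before it is compared or indexed under the BASIC collation follows; it assumes NORMALIZE takes the source string and a normalization form name such as "NFC" or "NFD", and that the session runs with -cpinternal UTF-8:

    DEFINE VARIABLE cRaw        AS CHARACTER NO-UNDO.
    DEFINE VARIABLE cNormalized AS CHARACTER NO-UNDO.

    cRaw        = "e" + CHR(769).           /* decomposed: U+0065 + U+0301 */
    cNormalized = NORMALIZE(cRaw, "NFC").   /* composed form: U+00E9       */

    /* Store or compare cNormalized so that equivalent strings share one
       binary representation under the BASIC collation.                    */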
Note: When sorting Unicode data with an ICU collation, you do not need to normalize the data.
When UTF-8 data contains decomposed characters, you cannot convert it to a single-byte code page. You must first compose the data using the ABL NORMALIZE function. When you convert data from a single-byte code page to Unicode, the result is always composed data.
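The following sketch composes decomposed data and then converts it with the ABL CODEPAGE-CONVERT function; the target code page ISO8859-1 is illustrative, and the session is assumed to run with -cpinternal UTF-8:

    DEFINE VARIABLE cUtf8   AS CHARACTER NO-UNDO.
    DEFINE VARIABLE cLatin1 AS CHARACTER NO-UNDO.

    cUtf8   = "e" + CHR(769).                          /* decomposed form  */
    cLatin1 = CODEPAGE-CONVERT(NORMALIZE(cUtf8, "NFC"), "ISO8859-1").
    /* Without the NORMALIZE call, the combining character U+0301 has no
       equivalent in ISO8859-1 and cannot be converted.                    */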
When an existing database is converted to UTF-8, the amount of storage required for non-ASCII characters increases: each non-ASCII Latin-alphabet character typically requires two bytes in UTF-8, and each double-byte Chinese, Japanese, or Korean character typically requires three bytes.
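To see the expansion for a given value, the ABL LENGTH function can report both a character count and a byte count, as in this sketch (assuming a session with -cpinternal UTF-8):

    DEFINE VARIABLE cValue AS CHARACTER NO-UNDO.

    cValue = "café".   /* the "é" occupies two bytes when encoded as UTF-8 */

    MESSAGE
        "Characters:" LENGTH(cValue, "CHARACTER") SKIP   /* 4 */
        "Bytes:"      LENGTH(cValue, "RAW")              /* 5 */
        VIEW-AS ALERT-BOX.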
To display and print Unicode data, consider using a Unicode font. Unicode fonts are commercially available.