Internationalizing Applications
Choosing the appropriate unit of measure
Several ABL elements, including the LENGTH function, OVERLAY statement, SUBSTRING function, and SUBSTRING statement, let you specify the unit of measure as the character, the byte, or the column. If you choose the wrong unit of measure, you might split or overlay a multi-byte character. Consider the following example:
DEFINE VARIABLE cCharOver AS CHARACTER FORMAT "X(8)" NO-UNDO.
cCharOver = "abc efg".
OVERLAY(cCharOver, 1, 4, "RAW") = "wxyz". /* RAW is wrong */
DISPLAY cCharOver WITH 1 COLUMN.
The example defines a character variable and sets it to a string of seven characters, the fourth of which is double-byte. It then overlays a string of four single-byte characters on the original string, starting at position one and continuing for four positions. Because the unit of measure is the byte (specified by RAW), the positions are counted in bytes: the fourth byte of the second string, the character z, overlays the fourth byte of the original string, which is the lead byte of the double-byte character.
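One way to see whether byte positions and character positions line up in a given string is to measure it in each unit with the LENGTH function. The following is a minimal sketch, not part of the original example: the variable name cSample and the labels are hypothetical, and the value assigned is a single-byte placeholder that you would replace with your own data.

```abl
/* Sketch only: cSample and the labels are hypothetical names. */
DEFINE VARIABLE cSample AS CHARACTER NO-UNDO.

/* Placeholder value; substitute a string that contains multi-byte characters. */
cSample = "abcdefg".

DISPLAY
    LENGTH(cSample, "CHARACTER") LABEL "Characters"  /* length in characters */
    LENGTH(cSample, "RAW")       LABEL "Bytes"       /* length in bytes */
    LENGTH(cSample, "COLUMN")    LABEL "Columns"     /* length in display columns */
    WITH 1 COLUMN.
```

For a string made up only of single-byte characters, the character and byte counts agree. When the byte count is larger than the character count, the string contains at least one multi-byte character, and a position counted in bytes can fall in the middle of it.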
The following figure shows how the z in the second string overlays the lead byte of the double-byte character in the original string.
Figure 18. A single-byte character overlaying a lead byte
All that remains of the double-byte character is its trail byte, as shown in the following figure.
Figure 19. Result of a single-byte character overlaying a lead byte
To fix this error, change the unit of measure to CHARACTER, as shown:
DEFINE VARIABLE cCharOver AS CHARACTER FORMAT "X(8)" NO-UNDO.
cCharOver = "abc efg".
OVERLAY(cCharOver, 1, 4, "CHARACTER") = "wxyz". /* CHARACTER is correct */
DISPLAY cCharOver WITH 1 COLUMN.
The corrected program produces the string shown in the following figure.
Figure 20. String produced by an OVERLAY statement whose unit of measure is the character
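The same choice of unit applies when reading part of a string. The following sketch uses the SUBSTRING function with the CHARACTER unit so that the start position and length are counted in characters rather than bytes. The variable names cSource and cFirstFour are hypothetical, and the assigned value is a single-byte placeholder standing in for a string that contains multi-byte characters.

```abl
/* Sketch only: cSource and cFirstFour are hypothetical names. */
DEFINE VARIABLE cSource    AS CHARACTER NO-UNDO.
DEFINE VARIABLE cFirstFour AS CHARACTER NO-UNDO.

/* Placeholder value; substitute a string that contains multi-byte characters. */
cSource = "abcdefg".

/* Position 1 and length 4 are counted in characters, so a */
/* multi-byte character in that range is extracted whole.  */
cFirstFour = SUBSTRING(cSource, 1, 4, "CHARACTER").

DISPLAY cFirstFour FORMAT "X(8)" WITH 1 COLUMN.
```

With RAW in place of CHARACTER, the same call would count bytes and could return only the lead byte of a multi-byte character at the end of the range.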