Processing by characters

To keep your application code flexible enough to handle multi-byte character data (the Chinese, Japanese, and Korean languages use double-byte code pages), do not process character data byte by byte, and do not assume that two bytes always equal two characters. By default, ABL functions process character data as whole characters, not as bytes, so process all character data at the character level whenever appropriate. For example, a string-processing routine that examines each byte can mistake the second byte of a double-byte character for a new character.
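A minimal sketch of character-level processing in ABL follows. The vowel-counting task, the sample string, and the variable names are hypothetical; the sketch relies only on LENGTH and SUBSTRING using their default "CHARACTER" type, and it assumes the procedure file is saved in the session's internal code page (-cpinternal).

```abl
/* Hypothetical example: count vowels one character at a time.
   LENGTH and SUBSTRING default to the "CHARACTER" type, so each
   iteration sees a whole character, even a multi-byte one. */
DEFINE VARIABLE cText   AS CHARACTER NO-UNDO.
DEFINE VARIABLE cChar   AS CHARACTER NO-UNDO.
DEFINE VARIABLE iPos    AS INTEGER   NO-UNDO.
DEFINE VARIABLE iVowels AS INTEGER   NO-UNDO.

cText = "Café crème".

DO iPos = 1 TO LENGTH(cText):          /* character count, not bytes */
  cChar = SUBSTRING(cText, iPos, 1).   /* exactly one whole character */
  IF INDEX("aeiouéè", cChar) > 0 THEN
    iVowels = iVowels + 1.
END.

/* The byte count depends on -cpinternal; the character count does not. */
DISPLAY LENGTH(cText)        LABEL "Characters"
        LENGTH(cText, "RAW") LABEL "Bytes"
        iVowels              LABEL "Vowels".
```

Replacing SUBSTRING(cText, iPos, 1) with SUBSTRING(cText, iPos, 1, "RAW") and looping up to the byte count would reproduce the byte-by-byte mistake described above.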

The following ABL code measures names in characters, not bytes. As a result, it miscalculates the storage required when names contain multi-byte characters:

DEFINE VARIABLE iFloppySize AS INTEGER NO-UNDO INITIAL 1457664.
DEFINE VARIABLE iTotal      AS INTEGER NO-UNDO.

OUTPUT TO namelist.txt.
FOR EACH Customer NO-LOCK WHILE iTotal < iFloppySize:
  EXPORT Customer.Name.
  /* Calculate file size in bytes by adding name length, 2 quotes, carriage
     return and linefeed. */
  ASSIGN iTotal = iTotal + LENGTH(Customer.Name) + 4.
END.

OUTPUT TO terminal.
DISPLAY "NAMELIST.TXT is " iTotal "bytes".
IF iTotal >= iFloppySize THEN
  DISPLAY "namelist.txt will not fit on 1 floppy disk.".

This procedure exports customer names to a file, namelist.txt, that is intended to fit on a 3.5-inch, 1.44 MB floppy diskette. The procedure stops exporting names as soon as the running total reaches or exceeds iFloppySize (1,457,664 bytes). However, because the LENGTH function returns the character count of a string by default, not its byte count, this procedure undercounts the file size when names contain multi-byte characters. To correct the procedure, use LENGTH(Customer.Name, "RAW") instead of LENGTH(Customer.Name). Calling the LENGTH function with the type argument "RAW" tells OpenEdge to return the byte count rather than the character count.
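To see the difference the "RAW" type makes, the following sketch measures one hypothetical name both ways. It assumes a UTF-8 internal code page (-cpinternal), in which each of these CJK characters occupies 3 bytes; under a double-byte code page the byte count would be 8 instead. Note that "RAW" counts bytes in the internal code page, so if the output stream is converted to a different code page (-cpstream), the size on disk can differ.

```abl
/* Hypothetical sample value; assumes -cpinternal is UTF-8 and that
   this file is saved in that code page. */
DEFINE VARIABLE cName AS CHARACTER NO-UNDO INITIAL "東京商事".

/* Default type "CHARACTER": counts characters (4 here).
   Type "RAW": counts bytes, 3 per character in UTF-8 (12 here). */
DISPLAY LENGTH(cName)        LABEL "Characters"
        LENGTH(cName, "RAW") LABEL "Bytes".

/* In the namelist procedure, the corrected running total would be:
   ASSIGN iTotal = iTotal + LENGTH(Customer.Name, "RAW") + 4. */
```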

Another practice that might cause problems for international applications is referring to a character by a specific numeric value. Different locales use different character sets with different numeric encodings. For example, the letter "é" maps to the hexadecimal value E9 in the ISO 8859-1 code page, but to the hexadecimal value 82 in the IBM850 code page. Code that tests for or builds a character by its numeric value therefore works only in the code page it was written for.
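The following sketch shows this code-page dependence directly. It uses the ASC function's optional target code page argument to request the value of "é" in each of the two code pages named above, and assumes the session's internal code page can represent "é" and that this file is saved in that code page. The fragile and portable tests in the closing comment are illustrative only.

```abl
/* ASC(expression, target-codepage) returns the character's value
   in the named code page. */
DEFINE VARIABLE iLatin1 AS INTEGER NO-UNDO.
DEFINE VARIABLE iIbm850 AS INTEGER NO-UNDO.

iLatin1 = ASC("é", "ISO8859-1").   /* 233, hexadecimal E9 */
iIbm850 = ASC("é", "IBM850").      /* 130, hexadecimal 82 */

DISPLAY iLatin1 iIbm850.

/* Fragile:  IF ASC(cChar) = 233 THEN ...  (holds only in ISO 8859-1)
   Portable: IF cChar = "é" THEN ...       (compares characters)     */
```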