Internationalizing Applications

Convmap changes

The convmap files support supplementary characters in the DBCS-to-UTF-8 conversion (type 17). Any conversion that uses supplementary characters must use the Unicode compression option. Unicode compression is specified as up to five ranges, listed in increasing order. Each range is written as a pair of Unicode values, high value first and then low value, followed by 0, and the whole list ends with an additional 0.
The following excerpt from a conversion specification compresses the Unicode values from four ranges in plane 0 and one range in plane 8:
CONVERT
SOURCE-NAME "SUPPLEMENTARY-CP"
TARGET-NAME "UTF-8"
TYPE "17"
FIRST-CONSTANTS
# single, 1st start, 2nd start, gap
0xFFFF 0 0x580 0x4ABF
# ccount, umin, umax, index power, delta power, bad character
779 0x0 0x0FFD 6 6 0
# unicode compression
0x451 0xA7 0 0x266F 0x2000 0 0x33CD 0x3000 0 0xFA2D 0xF900 0 0x8FFE5 0x8FF00 0 0

# High range
0x00 0x00 0x81 0xFC
# low range
0x40 0x7E 0x80 0xFC 0xFF 0xFE
ENDCONSTANTS
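As an illustration of the layout described above, the following Python sketch splits a unicode compression line into its (high, low) ranges and checks the stated rules. It is not part of OpenEdge: the function name `parse_unicode_compression` is hypothetical, and treating an all-zero `0 0 0` group as an unused range slot is an assumption inferred from the excerpts in this section.

```python
from typing import List, Tuple


def parse_unicode_compression(line: str, max_ranges: int = 5) -> List[Tuple[int, int]]:
    """Parse 'high low 0 high low 0 ... 0' into a list of (high, low) ranges.

    Hypothetical helper for illustration only; not an OpenEdge API.
    """
    values = [int(token, 16) for token in line.split()]
    if not values or values[-1] != 0:
        raise ValueError("the list must end with an additional 0")

    body = values[:-1]
    if len(body) % 3 != 0:
        raise ValueError("each range must be written as: high, low, 0")

    ranges: List[Tuple[int, int]] = []
    for i in range(0, len(body), 3):
        high, low, terminator = body[i:i + 3]
        if terminator != 0:
            raise ValueError("each high/low pair must be followed by 0")
        if high == 0 and low == 0:
            continue  # assumption: an all-zero group is an unused slot
        if high < low:
            raise ValueError("the high value must come first in each pair")
        if ranges and low <= ranges[-1][0]:
            raise ValueError("ranges must be listed in increasing order")
        ranges.append((high, low))

    if len(ranges) > max_ranges:
        raise ValueError(f"at most {max_ranges} ranges are allowed")
    return ranges


if __name__ == "__main__":
    line = ("0x451 0xA7 0 0x266F 0x2000 0 0x33CD 0x3000 0 "
            "0xFA2D 0xF900 0 0x8FFE5 0x8FF00 0 0")
    for high, low in parse_unicode_compression(line):
        print(f"low=0x{low:X} high=0x{high:X}")
```

Run against the compression line from the excerpt, the sketch yields five ranges, the last of which (0x8FF00 to 0x8FFE5) lies in plane 8.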
A single compression range cannot span more than one plane. For example, the following is not supported, because its second range starts in plane 2 (0x20000) and ends in plane 3 (0x345EE):
0x451 0xA7 0 0x345EE 0x20000 0 0 0 0 0 0 0 0 0 0 0
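The single-plane rule can be checked with simple arithmetic, because the plane of a Unicode code point is its value shifted right by 16 bits. The sketch below is a hedged illustration in Python; the helper names `plane` and `range_stays_in_one_plane` are hypothetical and are not OpenEdge functions.

```python
def plane(code_point: int) -> int:
    """Return the Unicode plane (0 to 16) that contains the code point."""
    return code_point >> 16


def range_stays_in_one_plane(high: int, low: int) -> bool:
    """True when both ends of a compression range lie in the same plane."""
    return plane(high) == plane(low)


if __name__ == "__main__":
    # Range from the supported excerpt: both ends are in plane 8.
    print(range_stays_in_one_plane(0x8FFE5, 0x8FF00))  # True
    # Range from the unsupported line: plane 3 versus plane 2.
    print(range_stays_in_one_plane(0x345EE, 0x20000))  # False
```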
The actual mappings appear in the -DATA section of the table, with each Unicode value expressed as the hexadecimal value of its UTF-32 code point, as shown:
FIRST-DATA
0x00D8 0x8FF98
0x00DF 0x8FF9F
0x8140 0x3000
0x8141 0x3001
0x8143 0x8FF0C
0x8144 0x8FF0E
ENDTABLE
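To see what the UTF-32 code point values in the data rows correspond to on the UTF-8 side, the following Python sketch encodes a code point and prints the resulting bytes. It only illustrates standard UTF-8 encoding and makes no claim about how OpenEdge performs the conversion internally. The helper name `utf8_hex` is hypothetical, and reading the first column of each row as the source code-page value is an assumption based on the excerpt.

```python
def utf8_hex(code_point: int) -> str:
    """Encode one Unicode code point as UTF-8 and return the bytes as hex."""
    encoded = chr(code_point).encode("utf-8")
    return " ".join(f"{byte:02X}" for byte in encoded)


if __name__ == "__main__":
    # Rows copied from the data excerpt: (source value, UTF-32 code point).
    rows = [(0x00D8, 0x8FF98), (0x8140, 0x3000)]
    for source, code_point in rows:
        print(f"0x{source:04X} -> U+{code_point:04X} -> {utf8_hex(code_point)}")
    # 0x00D8 -> U+8FF98 -> F2 8F BE 98
    # 0x8140 -> U+3000 -> E3 80 80
```

A supplementary code point such as 0x8FF98 always takes four bytes in UTF-8, while a plane 0 value such as 0x3000 takes three.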