Word-break table

The word-break table has the following syntax:

Syntax

[ #define symbolic-name symbol-value ] ...
[ Version = 9
Codepage = codepage-name
wordrules-name = wordrules-name
type = table-type
]
word_attr =
{
{ char-literal | hex-value | decimal-value } , word-delimiter-attribute
[
, { char-literal | hex-value | decimal-value }
, word-delimiter-attribute ] ...
};
symbolic-name
The name of a symbol; for example, DOLLAR-SIGN.
symbol-value
The value of the symbol; for example, "$".
Note: Although OpenEdge lets you compile word-break tables that omit all of the items within the second pair of square brackets (Version, Codepage, wordrules-name, and type), Progress Software Corporation recommends that you always include them. If the source-code version of a compiled word-break table lacks these items, and the associated database is small enough to make it feasible, Progress Software Corporation recommends that you add the items to the table, recompile the table, reassociate the table with the database, and rebuild the indexes.
codepage-name
The name, not surrounded by quotes, of the code page that the word-break table is associated with. The maximum length is 20 characters. For example: UTF-8.
wordrules-name
The name, not surrounded by quotes, of the compiled word-break table. The maximum length is 20 characters. For example: utf8sample.
table-type
The number 3.
Note: OpenEdge allows a table type of 1 or 2. Although these are still supported, Progress Software Corporation recommends, if feasible, that you change the table type to 3, recompile the word-break table, reassociate it with the database, and rebuild the indexes.
char-literal
A character within single quotes, or a symbolic-name, that represents a character in the code page. For example: '#'.
hex-value
A hexadecimal value, or a symbolic-name, that represents a character in the code page. For example: 0xAC.
decimal-value
A decimal value, or a symbolic-name, that represents a character in the code page. For example: 39.
word-delimiter-attribute
The context in which the character acts as a word delimiter.
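The three ways of writing a character in a word_attr entry (quoted character, hexadecimal value, decimal value) can be illustrated with a short Python sketch. This is not the OpenEdge word-break table compiler. The names ENTRY and parse_word_attr are hypothetical, the sketch ignores #define symbolic names and the header items, and it assumes that a quoted character's code in the code page equals its Unicode code point, which holds for ASCII characters.

```python
import re

# One entry: a character written as 'c', 0xHH, or a decimal number,
# followed by a comma and an attribute name such as BEFORE_DIGIT.
ENTRY = re.compile(r"\s*('.'|0[xX][0-9A-Fa-f]+|\d+)\s*,\s*([A-Z_]+)\s*,?")


def parse_word_attr(body):
    """Map character code -> attribute name for the text between { and }."""
    table = {}
    body = body.strip()
    pos = 0
    while pos < len(body):
        match = ENTRY.match(body, pos)
        if match is None:
            raise ValueError(f"unrecognized entry at offset {pos}")
        token, attribute = match.groups()
        if token.startswith("'"):
            code = ord(token[1])      # char-literal, for example '#'
        elif token.lower().startswith("0x"):
            code = int(token, 16)     # hex-value, for example 0xAC
        else:
            code = int(token)         # decimal-value, for example 39
        table[code] = attribute
        pos = match.end()
    return table


if __name__ == "__main__":
    print(parse_word_attr("'#', USE_IT, 0xAC, TERMINATOR, 39, IGNORE"))
```

All three forms end up as a numeric character code, so '#' and 35 would name the same character in an ASCII-compatible code page.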
The following table describes the word-delimiter attributes.
Table 36. Word delimiter attributes
| Word delimiter attribute | Description | Default |
|---|---|---|
| LETTER | Always part of a word | Assigned to all characters the current attribute table defines as letters. In English, these are the uppercase characters A-Z and the lowercase characters a-z. |
| DIGIT | Always part of a word | Assigned to the numerals 0-9. |
| USE_IT | Always part of a word | Assigned to the dollar sign ($), percent sign (%), number sign (#), at symbol (@), and underline (_). |
| BEFORE_LETTER | Part of a word only if followed by a character with the LETTER attribute; otherwise, treated as a word delimiter | - |
| BEFORE_DIGIT | Treated as part of a word only if followed by a character with the DIGIT attribute | Assigned to the period (.), comma (,), and hyphen (-). For example, "12.34" is one word, but "ab.cd" is two words. |
| BEFORE_LET_DIG | Treated as part of a word only if followed by a character with the LETTER or DIGIT attribute | - |
| IGNORE | Ignored | Assigned to the apostrophe ('). For example, "John's" is equivalent to "Johns". |
| TERMINATOR | Word delimiter | Assigned to all other characters. |
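The effect of the default attributes can be sketched in Python. This is an illustration of the rules in the table, not the OpenEdge implementation. The function names default_attr and split_words are hypothetical, the defaults cover only the English letters listed above, and "followed by" is assumed to mean the immediately following character.

```python
def default_attr(ch):
    """Return the default word-delimiter attribute described in the table."""
    if ch.isascii() and ch.isalpha():
        return "LETTER"
    if ch.isascii() and ch.isdigit():
        return "DIGIT"
    if ch in "$%#@_":
        return "USE_IT"
    if ch in ".,-":
        return "BEFORE_DIGIT"
    if ch == "'":
        return "IGNORE"
    return "TERMINATOR"


def split_words(text, attr_of=default_attr):
    """Split text into words according to each character's attribute."""
    words, current = [], []
    for i, ch in enumerate(text):
        attr = attr_of(ch)
        # Attribute of the next character; end of text acts as a delimiter.
        nxt = attr_of(text[i + 1]) if i + 1 < len(text) else "TERMINATOR"
        if attr == "IGNORE":
            continue  # dropped without ending the current word
        if attr in ("LETTER", "DIGIT", "USE_IT"):
            keep = True
        elif attr == "BEFORE_LETTER":
            keep = nxt == "LETTER"
        elif attr == "BEFORE_DIGIT":
            keep = nxt == "DIGIT"
        elif attr == "BEFORE_LET_DIG":
            keep = nxt in ("LETTER", "DIGIT")
        else:
            keep = False  # TERMINATOR
        if keep:
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


if __name__ == "__main__":
    print(split_words("12.34"))   # ['12.34']
    print(split_words("ab.cd"))   # ['ab', 'cd']
    print(split_words("John's"))  # ['Johns']
```

Passing a different attr_of function, for example one backed by a mapping of character codes to attributes, shows how a custom word-break table would change the split.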