Character precedence: Unicode and Java Collator

The Unicode standard assigns a hexadecimal code to every character (four digits for the characters listed here), including many that cannot be typed on standard keyboards. Java, and hence Progress Corticon software, uses a class named Collator to sort these characters in specific sequences based on the internationalization (I18n) locale of the user.
While sorting by locale allows for regional variations of language-specific characters such as accented letters, the combination of these two systems can also make determining character precedence very complicated. The Unicode codes and Java Collator sequence for the characters on standard keyboards in the US-English locale are shown in the table below.
Sequences for other languages and locales may differ, and many other Unicode characters are available but are not shown in the table. We recommend http://www.unicode.org/charts for more information on the Unicode system and http://java.sun.com/docs/books/tutorial/i18n/text/locale.html for more information on the Java Collator class.
The following comparisons use the precedence values from the table:
* ‘Z’ = ‘z’ evaluates to true because character Z has the same precedence as z (69 = 69). A given letter has the same precedence regardless of its case. This is an important difference between character precedence determined by ISO or ASCII systems and the Java Collator system used by Corticon.
* ‘C & S’ < ‘C and S’ evaluates to true because character a has a higher precedence than & (26 < 44). These characters are decisive because they are the first characters that differ when the two strings are compared from position 1 onward.
* ‘B’ > ‘aardvark’ evaluates to true because character B has a higher precedence than a (45 > 44).
* ‘Marilynn’ < ‘Marilyn’ evaluates to false because character n has a higher precedence than <space> (57 > 1). The first seven characters of each string are identical, so the final character comparison is decisive.
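The comparisons above can be approximated with the standard `java.text.Collator` class. This is a minimal sketch, not Corticon's actual implementation: the class name `CollatorComparisons` and the helper `caseInsensitiveUsCollator` are hypothetical, and the `SECONDARY` strength is an assumption chosen because it makes upper and lower case compare as equal. The guide does not say which strength Corticon configures.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorComparisons {

    // Hypothetical helper: a US-English collator that ignores case differences.
    // SECONDARY strength is an assumption; Java's default strength (TERTIARY)
    // would treat "Z" and "z" as different.
    static Collator caseInsensitiveUsCollator() {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(Collator.SECONDARY);
        return collator;
    }

    public static void main(String[] args) {
        Collator collator = caseInsensitiveUsCollator();

        // compare(a, b) returns a negative number, zero, or a positive number
        // when a sorts before, equal to, or after b.
        System.out.println("'Z' = 'z'              : " + (collator.compare("Z", "z") == 0));
        System.out.println("'C & S' < 'C and S'    : " + (collator.compare("C & S", "C and S") < 0));
        System.out.println("'B' > 'aardvark'       : " + (collator.compare("B", "aardvark") > 0));
        System.out.println("'Marilynn' < 'Marilyn' : " + (collator.compare("Marilynn", "Marilyn") < 0));

        // For contrast: String.compareTo uses raw UTF-16 code unit values,
        // so an uppercase letter sorts before every lowercase letter.
        System.out.println("\"B\".compareTo(\"aardvark\") > 0 : " + ("B".compareTo("aardvark") > 0));
    }
}
```

If the strength assumption holds, the four collator lines should print true, true, true, and false, matching the list above, while the last line prints false because plain code-value comparison places B before a.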
| Character | Name | Precedence | Unicode 5.0 code |
|---|---|---|---|
| (space) | typed space | 1 | 0020 |
| - | dash or minus sign | 2 | 002D |
| _ | underline or underscore | 3 | 005F |
| , | comma | 4 | 002C |
| ; | semicolon | 5 | 003B |
| : | colon | 6 | 003A |
| ! | exclamation point | 7 | 0021 |
| ? | question mark | 8 | 003F |
| / | slash | 9 | 002F |
| . | period | 10 | 002E |
| ` | grave accent | 11 | 0060 |
| ^ | circumflex | 12 | 005E |
| ~ | tilde | 13 | 007E |
| ' | apostrophe | 14 | 0027 |
| " | quotation marks | 15 | 0022 |
| ( | left parenthesis | 16 | 0028 |
| ) | right parenthesis | 17 | 0029 |
| [ | left bracket | 18 | 005B |
| ] | right bracket | 19 | 005D |
| { | left brace | 20 | 007B |
| } | right brace | 21 | 007D |
| @ | at symbol | 22 | 0040 |
| $ | dollar sign | 23 | 0024 |
| * | asterisk | 24 | 002A |
| \ | backslash | 25 | 005C |
| & | ampersand | 26 | 0026 |
| # | number sign or hash sign | 27 | 0023 |
| % | percent sign | 28 | 0025 |
| + | plus sign | 29 | 002B |
| < | less than sign | 30 | 003C |
| = | equals sign | 31 | 003D |
| > | greater than sign | 32 | 003E |
| \| | vertical line | 33 | 007C |
| 0..9 | digits 0 through 9 | 34-43 | 0030-0039 |
| a, A | letter a, small and capital | 44 | 0061, 0041 |
| b, B | letter b, small and capital | 45 | 0062, 0042 |
| c, C | letter c, small and capital | 46 | 0063, 0043 |
| d, D | letter d, small and capital | 47 | 0064, 0044 |
| e, E | letter e, small and capital | 48 | 0065, 0045 |
| f, F | letter f, small and capital | 49 | 0066, 0046 |
| g, G | letter g, small and capital | 50 | 0067, 0047 |
| h, H | letter h, small and capital | 51 | 0068, 0048 |
| i, I | letter i, small and capital | 52 | 0069, 0049 |
| j, J | letter j, small and capital | 53 | 006A, 004A |
| k, K | letter k, small and capital | 54 | 006B, 004B |
| l, L | letter l, small and capital | 55 | 006C, 004C |
| m, M | letter m, small and capital | 56 | 006D, 004D |
| n, N | letter n, small and capital | 57 | 006E, 004E |
| o, O | letter o, small and capital | 58 | 006F, 004F |
| p, P | letter p, small and capital | 59 | 0070, 0050 |
| q, Q | letter q, small and capital | 60 | 0071, 0051 |
| r, R | letter r, small and capital | 61 | 0072, 0052 |
| s, S | letter s, small and capital | 62 | 0073, 0053 |
| t, T | letter t, small and capital | 63 | 0074, 0054 |
| u, U | letter u, small and capital | 64 | 0075, 0055 |
| v, V | letter v, small and capital | 65 | 0076, 0056 |
| w, W | letter w, small and capital | 66 | 0077, 0057 |
| x, X | letter x, small and capital | 67 | 0078, 0058 |
| y, Y | letter y, small and capital | 68 | 0079, 0059 |
| z, Z | letter z, small and capital | 69 | 007A, 005A |
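
To see how the precedence column differs from the Unicode code column, the following sketch sorts a few sample characters twice: once with a US-English `java.text.Collator` and once by raw code value. The class name `PrecedenceOrder` and its methods are hypothetical names chosen for illustration, and the sketch assumes the JDK's default US-English collation rules rather than any Corticon-specific configuration.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class PrecedenceOrder {

    // Returns a sorted copy of the input using the US-English Collator
    // (an assumption: JDK default rules and default strength).
    static List<String> sortByCollator(List<String> items) {
        List<String> sorted = new ArrayList<>(items);
        sorted.sort(Collator.getInstance(Locale.US));
        return sorted;
    }

    // Returns a sorted copy using String's natural order,
    // which compares raw UTF-16 code unit values.
    static List<String> sortByCodeUnit(List<String> items) {
        List<String> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // Formats the first character's code point as four hex digits, e.g. "005F".
    static String unicodeCode(String s) {
        return String.format("%04X", s.codePointAt(0));
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("b", "Z", "a", "&", "1", "_");

        System.out.println("Collator order:");
        for (String s : sortByCollator(sample)) {
            System.out.println("  " + s + "  (Unicode " + unicodeCode(s) + ")");
        }

        System.out.println("Code-value order: " + sortByCodeUnit(sample));
    }
}
```

With the precedence values in the table, the Collator order should be _ & 1 a b Z, whereas sorting by code value gives & 1 Z _ a b, because the capital Z (005A) and the underscore (005F) both have lower codes than the small letters.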