Corticon Studio: Rule Language Guide : Character precedence: Unicode and Java Collator
 


Character precedence: Unicode and Java Collator

The Unicode standard assigns a unique hexadecimal code to every character, including many that can't be typed on a standard keyboard. Each character in the table below has a four-digit code. Java (and hence Progress Corticon software) uses the Collator class (java.text.Collator) to sort these characters in sequences that depend on the user's locale.
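As a small illustration, and not part of Corticon itself, Java can report a character's Unicode code directly. `String.codePointAt` returns the code point as an `int`, and formatting it with `%04X` gives the four-digit hexadecimal form used in the table. The class name `UnicodeCodes` and the helper `hexCode` are hypothetical.

```java
public class UnicodeCodes {
    // Returns the Unicode code point of the first character in s as an
    // uppercase hexadecimal string padded to at least four digits ("0041" for "A").
    static String hexCode(String s) {
        return String.format("%04X", s.codePointAt(0));
    }

    public static void main(String[] args) {
        // A few characters from the table below, including the typed space.
        String[] samples = {" ", "&", "A", "a", "z"};
        for (String s : samples) {
            System.out.println("'" + s + "' -> " + hexCode(s));
        }
    }
}
```

For example, the typed space prints as `' ' -> 0020` and `z` prints as `'z' -> 007A`, matching the codes in the table.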
Sorting by locale accommodates regional conventions for language-specific characters such as accented letters, but combining the two systems also makes character precedence hard to predict, because precedence does not simply follow Unicode code order. The table below shows the Unicode code and Java Collator sequence for the characters on a standard keyboard in the US-English locale.
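The following sketch contrasts the two orders in plain Java. `String.compareTo` compares UTF-16 code values, so every capital letter sorts before every small letter, while a `java.text.Collator` obtained with `Collator.getInstance(Locale.US)` sorts by locale rules. The class and method names are hypothetical, and the sketch assumes nothing about how Corticon itself invokes the Collator.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollatorSortDemo {
    // Sorts a copy of the input by raw UTF-16 code values (String.compareTo).
    static List<String> byCodePoint(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(String::compareTo);
        return copy;
    }

    // Sorts a copy of the input with the locale-sensitive Collator.
    // Collator implements Comparator<Object>, so it can be passed to List.sort.
    static List<String> byCollator(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "Apple", "apple", "Banana");
        System.out.println("compareTo: " + byCodePoint(words));
        System.out.println("Collator : " + byCollator(words, Locale.US));
    }
}
```

The code-value order is `[Apple, Banana, apple, banana]`. In the Collator order, `apple` and `Apple` stay next to each other ahead of `banana` and `Banana`, because the Collator compares letters first and case only as a tie-breaker.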
Sequences for other languages and locales may differ, and many other Unicode characters exist that are not shown in the table. For more information, see http://www.unicode.org/charts on the Unicode system and http://java.sun.com/docs/books/tutorial/i18n/text/locale.html on the Java Collator class.
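For example, using the precedence values in the table below:
* ‘Z’=‘z’ evaluates to false. Although z and Z share precedence 69, the small and capital forms of a letter are still different characters.
* ‘C & S’ < ‘C and S’ evaluates to true because character a has a higher precedence than & (44 > 26). Both strings begin with C followed by a space, so & and a, in position 3, are the first differing characters encountered when the strings are compared from position 1 onward, and they decide the result.
* ‘B’ > ‘aardvark’ evaluates to true because character B has a higher precedence than a (45 > 44), even though B has the lower Unicode code (0042 versus 0061).
* ‘Marilynn’ < ‘Marilyn ’ (with a trailing space) evaluates to false because character n has a higher precedence than <space> (57 > 1). The first seven characters of each String are identical, so the comparison of the eighth characters is decisive.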
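The four examples above can be checked against Java's Collator with a short sketch. It assumes, without confirmation from this guide, that Corticon's String comparison operators behave like `Collator.compare` for `Locale.US` at the Collator's default (TERTIARY) strength. `lessThan`, `greaterThan`, and `equalTo` are hypothetical helpers, not Corticon operators.

```java
import java.text.Collator;
import java.util.Locale;

public class PrecedenceExamples {
    // ASSUMPTION: Corticon String comparisons behave like a US-English
    // java.text.Collator at its default strength (TERTIARY), which still
    // distinguishes a small letter from its capital.
    private static final Collator US = Collator.getInstance(Locale.US);

    // Hypothetical helpers mirroring the <, > and = operators in the examples.
    static boolean lessThan(String a, String b)    { return US.compare(a, b) < 0; }
    static boolean greaterThan(String a, String b) { return US.compare(a, b) > 0; }
    static boolean equalTo(String a, String b)     { return US.compare(a, b) == 0; }

    public static void main(String[] args) {
        System.out.println("'Z' = 'z'               -> " + equalTo("Z", "z"));
        System.out.println("'C & S' < 'C and S'     -> " + lessThan("C & S", "C and S"));
        System.out.println("'B' > 'aardvark'        -> " + greaterThan("B", "aardvark"));
        // n outranks a trailing space in position 8.
        System.out.println("'Marilynn' < 'Marilyn ' -> " + lessThan("Marilynn", "Marilyn "));
        // Without the space, 'Marilyn' is a prefix of 'Marilynn' and sorts first.
        System.out.println("'Marilynn' < 'Marilyn'  -> " + lessThan("Marilynn", "Marilyn"));
    }
}
```

Under that assumption the program prints false, true, true, false, false, which agrees with the results stated in the examples.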
| character | name | precedence | Unicode 5.0 code |
|---|---|---|---|
| (space) | typed space | 1 | 0020 |
| - | dash or minus sign | 2 | 002D |
| _ | underline or underscore | 3 | 005F |
| , | comma | 4 | 002C |
| ; | semicolon | 5 | 003B |
| : | colon | 6 | 003A |
| ! | exclamation point | 7 | 0021 |
| ? | question mark | 8 | 003F |
| / | slash | 9 | 002F |
| . | period | 10 | 002E |
| ` | grave accent | 11 | 0060 |
| ^ | circumflex | 12 | 005E |
| ~ | tilde | 13 | 007E |
| ' | apostrophe | 14 | 0027 |
| " | quotation marks | 15 | 0022 |
| ( | left parenthesis | 16 | 0028 |
| ) | right parenthesis | 17 | 0029 |
| [ | left bracket | 18 | 005B |
| ] | right bracket | 19 | 005D |
| { | left brace | 20 | 007B |
| } | right brace | 21 | 007D |
| @ | at symbol | 22 | 0040 |
| $ | dollar sign | 23 | 0024 |
| * | asterisk | 24 | 002A |
| \ | backslash | 25 | 005C |
| & | ampersand | 26 | 0026 |
| # | number sign or hash sign | 27 | 0023 |
| % | percent sign | 28 | 0025 |
| + | plus sign | 29 | 002B |
| < | less than sign | 30 | 003C |
| = | equals sign | 31 | 003D |
| > | greater than sign | 32 | 003E |
| \| | vertical line | 33 | 007C |
| 0..9 | numbers 0 through 9 | 34-43 | 0030-0039 |
| a, A | letter a, small and capital | 44 | 0061, 0041 |
| b, B | letter b, small and capital | 45 | 0062, 0042 |
| c, C | letter c, small and capital | 46 | 0063, 0043 |
| d, D | letter d, small and capital | 47 | 0064, 0044 |
| e, E | letter e, small and capital | 48 | 0065, 0045 |
| f, F | letter f, small and capital | 49 | 0066, 0046 |
| g, G | letter g, small and capital | 50 | 0067, 0047 |
| h, H | letter h, small and capital | 51 | 0068, 0048 |
| i, I | letter i, small and capital | 52 | 0069, 0049 |
| j, J | letter j, small and capital | 53 | 006A, 004A |
| k, K | letter k, small and capital | 54 | 006B, 004B |
| l, L | letter l, small and capital | 55 | 006C, 004C |
| m, M | letter m, small and capital | 56 | 006D, 004D |
| n, N | letter n, small and capital | 57 | 006E, 004E |
| o, O | letter o, small and capital | 58 | 006F, 004F |
| p, P | letter p, small and capital | 59 | 0070, 0050 |
| q, Q | letter q, small and capital | 60 | 0071, 0051 |
| r, R | letter r, small and capital | 61 | 0072, 0052 |
| s, S | letter s, small and capital | 62 | 0073, 0053 |
| t, T | letter t, small and capital | 63 | 0074, 0054 |
| u, U | letter u, small and capital | 64 | 0075, 0055 |
| v, V | letter v, small and capital | 65 | 0076, 0056 |
| w, W | letter w, small and capital | 66 | 0077, 0057 |
| x, X | letter x, small and capital | 67 | 0078, 0058 |
| y, Y | letter y, small and capital | 68 | 0079, 0059 |
| z, Z | letter z, small and capital | 69 | 007A, 005A |
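A table like the one above can be regenerated, approximately, for any locale. This sketch sorts the printable ASCII characters (U+0020 to U+007E) with a Collator set to PRIMARY strength, so a small letter and its capital share a rank, and prints each character's rank and Unicode code. The class and method names are hypothetical. Because the ranks come from the collation rules of the JDK you run, the output may differ in places from the table above.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PrecedenceTable {
    // A collator at PRIMARY strength ignores case (and accent) differences,
    // so a small letter and its capital compare as equal.
    static Collator primaryCollator(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    // The printable ASCII characters (U+0020..U+007E), one per String, sorted
    // by the locale's collation order. List.sort is stable, so characters that
    // compare as equal keep their code-point order (A before a).
    static List<String> collationOrder(Locale locale) {
        List<String> chars = new ArrayList<>();
        for (char c = 0x20; c <= 0x7E; c++) {
            chars.add(String.valueOf(c));
        }
        chars.sort(primaryCollator(locale));
        return chars;
    }

    public static void main(String[] args) {
        Collator primary = primaryCollator(Locale.US);
        int rank = 0;
        String previous = null;
        for (String s : collationOrder(Locale.US)) {
            // Start a new rank only when this character differs from the previous one.
            if (previous == null || primary.compare(previous, s) != 0) {
                rank++;
            }
            System.out.printf("%3d  U+%04X  %s%n", rank, (int) s.charAt(0), s);
            previous = s;
        }
    }
}
```

Replacing `Locale.US` in `main` with another locale is one way to see how sequences for other locales differ.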