Namespace: OpenEdge.Core.Util
Class: UTF8Encoder
Inherits: Progress.Lang.Object

Copyright (c) 2020 by Progress Software Corporation. All rights reserved.
File: UTF8Encoder
Purpose: Converts UTF-8 strings, characters and values to and from Unicode codepoints.
Author(s): pjudge
Created: 2020-03-02
Notes: Ranges and algorithms taken from https://en.wikipedia.org/wiki/UTF-8#Description and
https://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8

Method Summary
Return type Name Purpose
LONGCHAR Decode (longchar, character) Attempts to decode a previously encoded string. By default, 4-character hex values are used for escape sequences like U+ or \u. If the escape sequence has a terminator, like \u{&1}, the length of the hex value is determined using the mask. Parameters: the encoded string; the escape sequence (U+ by default, but JSON may need \u; may include &1 for more complex sequences, like Swift's \u{&1}). Returns: the decoded UTF-8 string.
LONGCHAR Encode (longchar, character) Encodes non-ASCII characters (ASCII here meaning a UTF-8 value where 32 < utf8 < 128) as Unicode codepoints, using the given escape sequence/mask. Parameters: the string to encode; the escape sequence (U+ by default, but JSON may need \u; may include &1 for more complex sequences, like Swift's \u{&1}). Returns: the encoded string.
INT64 UnicodeToUtf8 (character) Converts a Unicode codepoint, given as a hex value, into a UTF-8 value. For example, U+20AC is decoded as decimal 8364, which is used as the input for the UnicodeToUtf8(int64) method. Returns 0 if the codepoint is unknown or negative.
INT64 UnicodeToUtf8 (int64) Converts a Unicode codepoint into a UTF-8 value. For example, codepoint 8364 (U+20AC) is returned as decimal 14844588, which can then be turned into a character with CHR(<val>, 'utf-8':u, 'utf-8':u). Returns 0 if the codepoint is unknown or negative.
INTEGER Utf8ToUnicode (character) Converts a UTF-8 character into a Unicode codepoint. Only the first character of the input value is used. Returns 0 if the codepoint is null.
INTEGER Utf8ToUnicode (int64) Converts a UTF-8 value (from the ASC() function or elsewhere) into a Unicode codepoint. Returns 0 if the codepoint is null.


Method Detail

LONGCHAR Decode (longchar, character)

Purpose: Attempts to decode a previously encoded string. By default, 4-character hex values are used for escape sequences like U+ or \u. If the escape sequence has a terminator, like \u{&1}, the length of the hex value is determined using the mask.
Parameters:
pString LONGCHAR
The encoded string.
pEscapeSeq CHARACTER
The escape sequence: U+ by default, but for JSON it may need to be \u. May include &1 for more complex sequences (like Swift's \u{&1}).
Returns LONGCHAR
A decoded UTF-8 string.
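The decoding rule above can be illustrated with a short Python sketch. It is not the OpenEdge implementation; the function name `decode` and the use of a regular expression are assumptions. Without `&1`, exactly four hex digits are read after the escape sequence; with a mask such as `\u{&1}`, the text before `&1` is treated as the prefix, the text after it as the terminator, and any number of hex digits in between forms the codepoint.

```python
import re


def decode(text: str, escape_seq: str = "U+") -> str:
    # Hypothetical sketch of escape-sequence decoding, not the ABL code.
    if "&1" in escape_seq:
        # Mask form, e.g. \u{&1}: prefix before &1, terminator after it,
        # so the hex value may have any length.
        prefix, terminator = escape_seq.split("&1", 1)
        pattern = re.escape(prefix) + r"([0-9A-Fa-f]+)" + re.escape(terminator)
    else:
        # Plain form, e.g. U+ or \u: exactly 4 hex digits follow.
        pattern = re.escape(escape_seq) + r"([0-9A-Fa-f]{4})"

    def _replace(match: "re.Match[str]") -> str:
        codepoint = int(match.group(1), 16)
        # Leave the escape untouched if it is not a valid codepoint.
        return chr(codepoint) if codepoint <= 0x10FFFF else match.group(0)

    return re.sub(pattern, _replace, text)
```

With the default mask, `U+20AC5` decodes to `€5` because only four hex digits are consumed; with the `\u{&1}` mask, the terminator lets a value such as `1F600` (above U+FFFF) be read in full.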

LONGCHAR Encode (longchar, character)

Purpose: Encodes non-ASCII characters (ASCII here meaning a UTF-8 value where 32 < utf8 < 128) as Unicode codepoints, using the given escape sequence/mask.
Parameters:
pString LONGCHAR
The string to encode.
pEscapeSeq CHARACTER
The escape sequence: U+ by default, but for JSON it may need to be \u. May include &1 for more complex sequences (like Swift's \u{&1}).
Returns LONGCHAR
The encoded string.
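A matching Python sketch of the encoding direction follows. It is a hypothetical illustration rather than the ABL code: the function name `encode`, the uppercase hex digits and the zero-padding to at least four digits are assumptions. It reads the documented range literally, so only characters with 32 < code < 128 are copied unchanged.

```python
def encode(text: str, escape_seq: str = "U+") -> str:
    # Hypothetical sketch: characters whose code is NOT in 32 < code < 128
    # are replaced by an escape; everything else is copied unchanged.
    out = []
    for ch in text:
        code = ord(ch)
        if 32 < code < 128:
            out.append(ch)
            continue
        hex_value = format(code, "04X")  # at least 4 uppercase hex digits (assumption)
        if "&1" in escape_seq:
            # Mask form, e.g. \u{&1}: the hex value is substituted for &1.
            out.append(escape_seq.replace("&1", hex_value))
        else:
            # Plain form, e.g. U+ or \u: the hex value follows the prefix.
            out.append(escape_seq + hex_value)
    return "".join(out)
```

Under that literal reading a space (code 32) is also escaped, as `U+0020`. For JSON, `\u` escapes are UTF-16 code units, so a character above U+FFFF would need a surrogate pair; this sketch does not produce one.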

INT64 UnicodeToUtf8 (character)

Purpose: Converts a Unicode codepoint, given as a hex value, into a UTF-8 value. For example, U+20AC is decoded as decimal 8364, which is then used as the input for the UnicodeToUtf8(int64) method.
Parameters:
pHexValue CHARACTER
The Unicode codepoint as a hex value.
Returns INT64
The UTF-8 value. Will be 0 if the codepoint is unknown or negative.
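The hex-to-decimal step in the example (U+20AC becomes 8364) is ordinary base-16 parsing. The hypothetical Python helper below shows just that step; accepting an optional `U+` prefix and returning 0 for unparsable or negative input are assumptions modelled on the documented 0 return value.

```python
def hex_to_codepoint(hex_value: str) -> int:
    # Hypothetical helper: accept "20AC" or "U+20AC" and return the codepoint.
    value = hex_value.strip()
    if value.upper().startswith("U+"):
        value = value[2:]
    try:
        codepoint = int(value, 16)  # base-16 parse: 0x20AC == 8364
    except ValueError:
        return 0  # unparsable input maps to 0 (assumption)
    return codepoint if codepoint >= 0 else 0
```

The resulting integer is the input the `INT64` overload expects.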

INT64 UnicodeToUtf8 (int64)

Purpose: Converts a Unicode codepoint into a UTF-8 value. For example, the Unicode codepoint 8364 (U+20AC) is returned as decimal 14844588 (hex E282AC, the three UTF-8 bytes E2 82 AC), which can then be turned into a character with CHR(<val>, 'utf-8':u, 'utf-8':u).
Parameters:
pUnicode INT64
The Unicode codepoint.
Returns INT64
The UTF-8 value. Will be 0 if the codepoint is unknown or negative.
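The value 14844588 is the three UTF-8 bytes of U+20AC read as one big-endian integer: 0xE2 × 65536 + 0x82 × 256 + 0xAC = 14844588. The Python sketch below builds that value using the standard UTF-8 bit layout from the Wikipedia table cited in the class notes. The function name is hypothetical, and treating codepoints above U+10FFFF as "unknown" (returning 0) is an assumption.

```python
def unicode_to_utf8(codepoint: int) -> int:
    # Hypothetical sketch: return the UTF-8 bytes of `codepoint` packed
    # big-endian into one integer (U+20AC -> E2 82 AC -> 0xE282AC).
    # Negative values and values above U+10FFFF return 0 (assumption).
    if codepoint < 0 or codepoint > 0x10FFFF:
        return 0
    if codepoint < 0x80:                      # 1 byte: 0xxxxxxx
        encoded = [codepoint]
    elif codepoint < 0x800:                   # 2 bytes: 110xxxxx 10xxxxxx
        encoded = [0xC0 | (codepoint >> 6),
                   0x80 | (codepoint & 0x3F)]
    elif codepoint < 0x10000:                 # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        encoded = [0xE0 | (codepoint >> 12),
                   0x80 | ((codepoint >> 6) & 0x3F),
                   0x80 | (codepoint & 0x3F)]
    else:                                     # 4 bytes: 11110xxx + 3 continuation bytes
        encoded = [0xF0 | (codepoint >> 18),
                   0x80 | ((codepoint >> 12) & 0x3F),
                   0x80 | ((codepoint >> 6) & 0x3F),
                   0x80 | (codepoint & 0x3F)]
    value = 0
    for byte in encoded:
        value = (value << 8) | byte           # pack the bytes big-endian
    return value
```

For example, `unicode_to_utf8(8364)` gives 14844588, matching the documented example.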

INTEGER Utf8ToUnicode (character)

Purpose: Converts a UTF-8 character into a Unicode codepoint.
Parameters:
pUtf8 CHARACTER
The UTF-8 character string. Only the first character of the input value is used.
Returns INTEGER
The Unicode codepoint. Will be 0 if the codepoint is null.
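Going the other way, the lead byte of a UTF-8 sequence says how many bytes the sequence has and how many payload bits it carries; each continuation byte adds six more bits. This hypothetical Python sketch applies that to the first character of the input only, and returns 0 for an empty or missing input to mirror the documented null case.

```python
from typing import Optional


def utf8_to_unicode(text: Optional[str]) -> int:
    # Hypothetical sketch: only the first character of the input is used.
    if not text:
        return 0  # None or empty input maps to 0
    data = text[0].encode("utf-8")  # UTF-8 bytes of the first character
    lead = data[0]
    if lead < 0x80:
        return lead  # 1-byte sequence: 0xxxxxxx
    # The lead byte carries 3, 4 or 5 payload bits: 11110xxx, 1110xxxx, 110xxxxx.
    if lead >= 0xF0:
        codepoint = lead & 0x07
    elif lead >= 0xE0:
        codepoint = lead & 0x0F
    else:
        codepoint = lead & 0x1F
    for byte in data[1:]:
        codepoint = (codepoint << 6) | (byte & 0x3F)  # 10xxxxxx continuation bytes
    return codepoint
```

For example, `utf8_to_unicode("€uro")` returns 8364, because only the euro sign is read.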

INTEGER Utf8ToUnicode (int64)

Purpose: Converts a UTF-8 value into a Unicode codepoint.
Parameters:
pUtf8 INT64
The UTF-8 value (from the ASC() function or elsewhere).
Returns INTEGER
The Unicode codepoint. Will be 0 if the codepoint is null.
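For the `INT64` overload, the input is assumed here to be a character's UTF-8 bytes packed big-endian into one integer, the same layout as the 14844588 example returned by UnicodeToUtf8(int64). The hypothetical Python sketch unpacks the bytes and lets the standard UTF-8 decoder validate them; mapping invalid or multi-character sequences to 0 is an assumption.

```python
from typing import Optional


def utf8_value_to_unicode(utf8_value: Optional[int]) -> int:
    # Hypothetical sketch: treat utf8_value as UTF-8 bytes packed big-endian
    # (14844588 == 0xE282AC -> bytes E2 82 AC) and decode them.
    if utf8_value is None or utf8_value <= 0:
        return 0
    data = utf8_value.to_bytes((utf8_value.bit_length() + 7) // 8, "big")
    try:
        decoded = data.decode("utf-8")
    except UnicodeDecodeError:
        return 0  # invalid byte sequence maps to 0 (assumption)
    # A valid value for a single codepoint decodes to exactly one character.
    return ord(decoded) if len(decoded) == 1 else 0
```

Using the library decoder instead of hand-written bit masks rejects malformed sequences (such as a lone 0xFF byte) without extra checks.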


Copyright © 2022 Progress Software Corporation. All rights Reserved.

Progress® OpenEdge® Release 11.7.15