OpenEdge ABL API Reference

Namespace:

OpenEdge.Core.Util

Class

UTF8Encoder

Parent classes:

Progress.Lang.Object

OpenEdge.Core.Util.UTF8Encoder

Inherits:

Progress.Lang.Object

	Copyright (c) 2020 by Progress Software Corporation. All rights reserved.
File:	UTF8Encoder
Purpose:	Encodes UTF-8 strings, characters and values to and from Unicode codepoints
Author(s):	pjudge
Created:	2020-03-02
Notes:	* Ranges and algorithms taken from https://en.wikipedia.org/wiki/UTF-8#Description and
	https://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8

Method Summary

Options Name Purpose

LONGCHAR Decode (longchar, character)

Attempts to decode a previously-encoded string. By default, 4-character hex values are used, for escape sequences like U+ or \u . If the escape sequence has a terminator, like \U{&1}, then the length of the hex value is determined using the mask.
@param longchar The encoded string.
@param character The escape sequence, by default U+ but for JSON may need to be \u . May include &1 for more complex sequences (like Swift's \u{&1} ).
@return longchar A decoded UTF-8 string.

LONGCHAR Encode (longchar, character)

Encodes non-ASCII characters (those whose UTF-8 value falls outside the ASCII range 32 < utf8 < 128) into Unicode codepoints, using the escape sequence/mask given.
@param longchar The string to encode.
@param character The escape sequence, by default U+ but for JSON may need to be \u . May include &1 for more complex sequences (like Swift's \u{&1} ).
@return longchar The encoded string.

INT64 UnicodeToUtf8 (character)

Converts a Unicode codepoint into a UTF-8 value. For example, U+20AC is decoded as decimal 8364, which is used as an input for the UnicodeToUtf8(int64) method.
@param character The Unicode codepoint as a HEX value.
@return int64 The UTF-8 value. Will be ZERO/0 if the codepoint is unknown or negative.

INT64 UnicodeToUtf8 (int64)

Converts a Unicode codepoint into a UTF-8 value. For example, the Unicode codepoint 8364 (which is U+20AC) is returned as decimal 14844588, which can then be turned into a character with CHR(<val>, 'utf-8':u, 'utf-8':u).
@param int64 The Unicode codepoint.
@return int64 The UTF-8 value. Will be ZERO/0 if the codepoint is unknown or negative.

INTEGER Utf8ToUnicode (character)

Converts a UTF-8 character into a Unicode codepoint.
@param character The UTF-8 character string. Only the first character of the input value is used.
@return integer The Unicode codepoint. Will be ZERO/0 if the codepoint is null.

INTEGER Utf8ToUnicode (int64)

Converts a UTF-8 value into a Unicode codepoint.
@param int64 The UTF-8 value (from the ASC() function or elsewhere).
@return integer The Unicode codepoint. Will be ZERO/0 if the codepoint is null.

Method Detail

LONGCHAR Decode (longchar, character)

Purpose:	Attempts to decode a previously-encoded string. By default, 4-character hex values are
	used, for escape sequences like U+ or \u . If the escape sequence has a terminator, like \U{&1},
	then the length of the hex value is determined using the mask.

Parameters:
pString	LONGCHAR
	The encoded string.

pEscapeSeq	CHARACTER
	The escape sequence, by default U+ but for JSON may need to be \u . May include &1 for more
	complex sequences (like Swift's \u{&1} ).

Returns	LONGCHAR
	A decoded UTF-8 string.
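
The decoding rule described for Decode can be illustrated with a short Python sketch. This is not the ABL implementation: the function name decode_escaped and the regular-expression approach are assumptions made purely for illustration. A plain prefix such as U+ or \u is taken to be followed by exactly four hex digits, while a mask containing &1 takes its hex digits from between the text before and after &1.

```python
import re


def decode_escaped(text: str, escape_seq: str = "U+") -> str:
    """Replace escaped codepoints in text with the characters they denote.

    Hypothetical helper, for illustration only.
    """
    if "&1" in escape_seq:
        # Mask form, e.g. \u{&1}: the hex digits sit between the text
        # before &1 and the terminator after it, so their length may vary.
        prefix, terminator = escape_seq.split("&1", 1)
        pattern = re.escape(prefix) + r"([0-9A-Fa-f]+)" + re.escape(terminator)
    else:
        # Plain form, e.g. U+ or \u: exactly four hex digits follow.
        pattern = re.escape(escape_seq) + r"([0-9A-Fa-f]{4})"
    # Convert each captured hex value to its character.
    return re.sub(pattern, lambda m: chr(int(m.group(1), 16)), text)
```

With the default escape sequence, only the four digits after U+ are consumed, so any digit that follows is left in place as ordinary text.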

LONGCHAR Encode (longchar, character)

Purpose:	Encodes non-ASCII characters (those whose UTF-8 value falls outside the ASCII range
	32 < utf8 < 128) into Unicode codepoints, using the escape sequence/mask given.

Parameters:
pString	LONGCHAR
	The string to encode.

pEscapeSeq	CHARACTER
	The escape sequence, by default U+ but for JSON may need to be \u . May include &1 for more
	complex sequences (like Swift's \u{&1} ).

Returns	LONGCHAR
	The encoded string.
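
As a rough illustration of what Encode is described as doing, the Python sketch below escapes every character outside the ASCII range and leaves the rest untouched. It is a sketch under stated assumptions, not the ABL implementation: the name encode_non_ascii is hypothetical, the space character (32) is treated as ASCII, and the hex value is written in upper case and padded to at least four digits. None of these details are confirmed by this page.

```python
def encode_non_ascii(text: str, escape_seq: str = "U+") -> str:
    """Escape characters outside the ASCII range as hex codepoints.

    Hypothetical helper, for illustration only.
    """
    parts = []
    for ch in text:
        codepoint = ord(ch)
        # Assumption: 32..127 counts as ASCII and is copied unchanged.
        if 32 <= codepoint < 128:
            parts.append(ch)
            continue
        # Assumption: upper-case hex, zero-padded to four digits.
        hex_value = format(codepoint, "04X")
        if "&1" in escape_seq:
            # Mask form: substitute the hex value for the &1 placeholder.
            parts.append(escape_seq.replace("&1", hex_value))
        else:
            # Plain form: append the hex value to the prefix.
            parts.append(escape_seq + hex_value)
    return "".join(parts)
```

A mask with a terminator, such as \u{&1}, keeps the output unambiguous for codepoints that need more than four hex digits.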

INT64 UnicodeToUtf8 (character)

Purpose:	Converts a Unicode codepoint into a UTF-8 value.
	For example, U+20AC is decoded as decimal 8364, which is used as an input
	for the UnicodeToUtf8(int64) method.

Parameters:
pHexValue	CHARACTER
	The Unicode codepoint as a HEX value.

Returns	INT64
	The UTF-8 value. Will be ZERO/0 if the codepoint is unknown or negative.
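
The first step of this overload, turning the hex string into a decimal codepoint, can be sketched in Python as follows. The helper name hex_to_codepoint is hypothetical, and returning 0 for unparseable input is an assumption modelled on the documented zero return value. The sketch stops at the decimal codepoint and does not perform the conversion to a UTF-8 value.

```python
def hex_to_codepoint(hex_value: str) -> int:
    """Parse a hex codepoint such as '20AC' into its decimal value.

    Hypothetical helper, for illustration only.
    """
    try:
        codepoint = int(hex_value, 16)
    except ValueError:
        # Assumption: input that is not valid hex maps to 0.
        return 0
    # Negative values are treated like unknown codepoints.
    return codepoint if codepoint > 0 else 0
```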

INT64 UnicodeToUtf8 (int64)

Purpose:	Converts a Unicode codepoint into a UTF-8 value.
	For example, the Unicode codepoint 8364 (which is U+20AC) is returned as
	decimal 14844588, which can then be turned into a character with
	CHR(<val>, 'utf-8':u, 'utf-8':u).

Parameters:
pUnicode	INT64
	The Unicode codepoint.

Returns	INT64
	The UTF-8 value. Will be ZERO/0 if the codepoint is unknown or negative.
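
The example value can be checked by hand: U+20AC encodes to the three UTF-8 bytes E2 82 AC, and reading those bytes as one big-endian integer gives 0xE282AC = 14844588. The Python sketch below reproduces that arithmetic using the standard UTF-8 byte layout. It is an illustration, not the ABL implementation: the name unicode_to_utf8_value is hypothetical, and treating anything above U+10FFFF as "unknown" is an assumption.

```python
def unicode_to_utf8_value(cp: int) -> int:
    """Pack the UTF-8 bytes of a codepoint into one big-endian integer.

    Hypothetical helper, for illustration only.
    """
    # Assumption: negative or out-of-range codepoints yield 0.
    if cp < 0 or cp > 0x10FFFF:
        return 0
    if cp < 0x80:
        # One byte: 0xxxxxxx
        return cp
    if cp < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return ((0xC0 | (cp >> 6)) << 8) | (0x80 | (cp & 0x3F))
    if cp < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return (((0xE0 | (cp >> 12)) << 16)
                | ((0x80 | ((cp >> 6) & 0x3F)) << 8)
                | (0x80 | (cp & 0x3F)))
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return (((0xF0 | (cp >> 18)) << 24)
            | ((0x80 | ((cp >> 12) & 0x3F)) << 16)
            | ((0x80 | ((cp >> 6) & 0x3F)) << 8)
            | (0x80 | (cp & 0x3F)))
```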

INTEGER Utf8ToUnicode (character)

Purpose:	Converts a UTF-8 character into a Unicode codepoint.

Parameters:
pUtf8	CHARACTER
	The UTF-8 character string. Only the first character of the input value is used.

Returns	INTEGER
	The Unicode codepoint. Will be ZERO/0 if the codepoint is null.
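
The method summary notes that only the first character of the input is used. A minimal Python sketch of that behaviour follows. The name first_char_codepoint is hypothetical, and mapping an empty string to 0 is an assumption. Python strings are already sequences of codepoints, so no byte-level decoding is shown here.

```python
def first_char_codepoint(text: str) -> int:
    """Return the codepoint of the first character, ignoring the rest.

    Hypothetical helper, for illustration only.
    """
    # Assumption: empty input maps to 0.
    if not text:
        return 0
    return ord(text[0])
```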

INTEGER Utf8ToUnicode (int64)

Purpose:	Converts a UTF-8 value into a Unicode codepoint.

Parameters:
pUtf8	INT64
	The UTF-8 value (from the ASC() function or elsewhere).

Returns	INTEGER
	The Unicode codepoint. Will be ZERO/0 if the codepoint is null.
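
The reverse conversion can be sketched in Python by splitting the integer back into its bytes and decoding them as UTF-8, so that 14844588 (bytes E2 82 AC) yields 8364. This is an illustration under assumptions, not the ABL implementation: the name utf8_value_to_unicode is hypothetical, and returning 0 for values that are not a single valid UTF-8 character is an assumption.

```python
def utf8_value_to_unicode(value: int) -> int:
    """Unpack a big-endian UTF-8 value into its Unicode codepoint.

    Hypothetical helper, for illustration only.
    """
    if value <= 0:
        return 0
    # Split the integer into the minimum number of big-endian bytes.
    raw = value.to_bytes((value.bit_length() + 7) // 8, "big")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: byte sequences that are not valid UTF-8 yield 0.
        return 0
    # Assumption: the value must represent exactly one character.
    return ord(text) if len(text) == 1 else 0
```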

Progress® OpenEdge® Release 11.7.15