From 108411ee2db72d035b1ae5131065b0537080544e Mon Sep 17 00:00:00 2001 From: dsiders Date: Thu, 29 Aug 2024 17:01:04 +0100 Subject: [PATCH] Docs: LazUtils/lazutf8. Adds UTF8CodepointCount topics for changes in c8a1f93a, --- docs/xml/lazutils/lazutf8.xml | 64 +++++++++++++++++++++++++++++++++++ 1 file changed, 64 insertions(+) diff --git a/docs/xml/lazutils/lazutf8.xml b/docs/xml/lazutils/lazutf8.xml index a98451bba4..f1999f39f1 100644 --- a/docs/xml/lazutils/lazutf8.xml +++ b/docs/xml/lazutils/lazutf8.xml @@ -632,6 +632,70 @@ Even faster UTF-8 character counting Number of byte values in the UTF-8-encoded string. + + +Gets the number of UTF-8-encoded codepoints in the specified value. + + +

+UTF8CodepointCount is an overloaded PtrInt function used +to determine the number of UTF-8 codepoints found in the specified value. The +overloaded variants allow the value to be specified using either the String or +the PChar type. +

+

+UTF8CodepointCount iterates over the byte values in the s or p arguments, and +increments the return value when a valid UTF-8 codepoint is found. Valid +codepoints include those represented by combining character combinations. +UTF8CodepointLen (in system.pp) is called to get the size for each +of the UTF-8 codepoints. The process is repeated until all of the bytes in the +input value have been examined, or a codepoint with a length of zero (0) is encountered. +

+

+The return value is zero (0) if the s or p arguments are empty, or when the +ByteCount argument is zero (0). +

+
+ +Added in LazUtils version 4.0. (c8a1f93a) + + + +I wrote a test application to compare the results for UTF8Length and +UTF8CodepointCount. They return exactly the same values for the UTF-8 strings I +cribbed from the Unicode web site. +So basically, why is this routine needed? + + + + + + + +UTF8CodepointLen + +
+ + +PtrInt value with the number of codepoints, including combining characters. + + + + +String with the codepoints examined in the routine. + + + + +PChar type with the codepoints examined in the routine. + + + + +Number of bytes in the PChar value. + + + Converts a UTF-8-encoded character to its unique Unicode U+XXXX character