Returns the number of bytes of the codepoint starting at p. It returns 0 if p is nil. It returns 1 if p points to a 1-byte UTF-8 codepoint or to an invalid UTF-8 sequence. Otherwise it returns a number in the range 2..4. It does not check for malicious codepoints like #$c0#$80, nor for undefined codepoints like #$f3#$a0#$87#$b9. Use UTF8CharacterLength to step through a string with a simple loop:

while p^<>#0 do begin inc(p,UTF8CharacterLength(p)); end;
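
The stepping rule can be illustrated without the library. The following is a minimal sketch, assuming plain Free Pascal in objfpc mode; SketchCodepointSize is a hypothetical stand-in written for this example, not the library routine, and it only mirrors the documented return values (0 for nil, 1 for a 1-byte or invalid sequence, otherwise 2..4).

```pascal
program StepDemo;
{$mode objfpc}{$H+}

// Hypothetical helper for illustration only, not the library routine.
// Returns the length implied by the lead byte at p, or 1 when the
// bytes that follow are not continuation bytes.
function SketchCodepointSize(p: PChar): Integer;
var
  Need, i: Integer;
begin
  if p = nil then Exit(0);
  case Ord(p^) of
    $C0..$DF: Need := 2;   // 110xxxxx
    $E0..$EF: Need := 3;   // 1110xxxx
    $F0..$F7: Need := 4;   // 11110xxx
  else
    Exit(1);               // ASCII, stray continuation byte, or invalid lead byte
  end;
  // Each following byte must look like 10xxxxxx. The terminating #0
  // fails this test, so the scan never reads past the end of the string.
  for i := 1 to Need - 1 do
    if (Ord(p[i]) and $C0) <> $80 then Exit(1);
  Result := Need;
end;

var
  s: string;
  p: PChar;
  Count: Integer;
begin
  // 'a', U+00E4 (2 bytes), U+20AC (3 bytes), one invalid byte, 'z'
  s := 'a' + #$C3#$A4 + #$E2#$82#$AC + #$FF + 'z';
  p := PChar(s);
  Count := 0;
  while p^ <> #0 do
  begin
    Inc(p, SketchCodepointSize(p));
    Inc(Count);
  end;
  WriteLn(Count);  // number of steps taken through the string
end.
```

For this input the loop takes five steps: the invalid byte #$FF is skipped as a single byte, so the pointer always advances and stops at the terminating #0. Like the routine described here, the sketch accepts #$c0#$80 as a 2-byte sequence and does not reject overlong or undefined codepoints.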

Even if p contains invalid UTF-8, the loop runs through the string without overflow. See also UTF8CharacterStrictLength.

Returns the codepoint at p and the number of bytes to skip. If p=nil then CharLen and Result are 0; otherwise CharLen>0. If there is an encoding error, Result is 0 and CharLen=1 so that the caller can skip forward. It is safe to do:

var s: string; p, CharLen: integer;
p:=1;
while p<=length(s) do begin
  UTF8CharacterToUnicode(@s[p],CharLen);
  inc(p,CharLen);
end;

For speed reasons this function only checks for 1-, 2-, 3- and 4-byte encoding errors. In particular, it does not check whether the codepoint is defined in the Unicode table.

Encodes the given codepoint as a UTF-8 sequence of 1 to 4 bytes. It does not add a #0.

Simple and fast function that writes a single Unicode codepoint as UTF-8 to Buf and returns the number of bytes written. It does not append a #0. It does not check whether the codepoint actually exists in the Unicode tables. It returns 0 if the codepoint cannot be represented as a 1 to 4 byte UTF-8 sequence.

Replaces all invalid UTF-8 characters with spaces. Stops at #0.

For a valid UTF-8 character it returns the length in bytes (1..4). Otherwise it returns 0.

Returns the character index where SearchForText starts in SearchInText. An optional StartPos can be given (as a character index, not a byte index). Returns 0 if not found.

Returns -1 if ok, otherwise the byte index of the invalid UTF-8 codepoint. It always stops on irregular codepoints. For example, codepoint 0 is normally encoded as #0, but it can also be encoded as #192#128. Because most software does not check this, it can be exploited and is a security risk. If StopOnNonUTF8 is false it will ignore undefined codes, for example #128. By default it stops on such codes.

Replaces invalid UTF-8 and replaces the characters #0..#31 with '#0'..'#31'.

Removes spaces, tabs, line breaks and control characters at start and end. Use Flags to only delete at start or only at end, or to not delete line breaks. Control characters are the Unicode sets C0 and C1 and the left-to-right and right-to-left marks.

Returns the position where SearchForText starts in SearchInText. If not found it returns nil. Null characters #0 are treated as normal characters.

Compares two strings using language-specific sorting.
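
The buffer-writing encoder described earlier in this section (write one codepoint as 1 to 4 bytes, return the byte count, return 0 when it does not fit, append no #0) can be sketched in self-contained Free Pascal. SketchEncodeUTF8 and its signature are assumptions made for this example, not the library's declaration, and the 21-bit upper limit is inferred from the "1 to 4 byte" wording rather than taken from the library source.

```pascal
program EncodeDemo;
{$mode objfpc}{$H+}

// Hypothetical helper for illustration only; name and signature are assumed.
// Writes CodePoint to Buf as UTF-8 and returns the number of bytes written.
// Returns 0 if the value needs more than 4 bytes (more than 21 bits).
// No #0 is appended and the value is not checked against Unicode tables.
function SketchEncodeUTF8(CodePoint: Cardinal; Buf: PChar): Integer;
begin
  if CodePoint <= $7F then
  begin
    Buf[0] := Chr(CodePoint);
    Result := 1;
  end
  else if CodePoint <= $7FF then
  begin
    Buf[0] := Chr($C0 or (CodePoint shr 6));
    Buf[1] := Chr($80 or (CodePoint and $3F));
    Result := 2;
  end
  else if CodePoint <= $FFFF then
  begin
    Buf[0] := Chr($E0 or (CodePoint shr 12));
    Buf[1] := Chr($80 or ((CodePoint shr 6) and $3F));
    Buf[2] := Chr($80 or (CodePoint and $3F));
    Result := 3;
  end
  else if CodePoint <= $1FFFFF then
  begin
    Buf[0] := Chr($F0 or (CodePoint shr 18));
    Buf[1] := Chr($80 or ((CodePoint shr 12) and $3F));
    Buf[2] := Chr($80 or ((CodePoint shr 6) and $3F));
    Buf[3] := Chr($80 or (CodePoint and $3F));
    Result := 4;
  end
  else
    Result := 0;  // does not fit into a 1 to 4 byte sequence
end;

var
  Buf: array[0..3] of Char;
  Len, i: Integer;
begin
  Len := SketchEncodeUTF8($20AC, @Buf[0]);  // U+20AC EURO SIGN
  Write(Len, ':');
  for i := 0 to Len - 1 do
    Write(' ', HexStr(Ord(Buf[i]), 2));
  WriteLn;  // expected: 3: E2 82 AC
end.
```

Because no #0 is appended, a caller that wants a null-terminated string has to reserve one extra byte and write the terminator itself after the returned length.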