Docs: LazUtils/lazutf8. Updates UTF8FixBroken topic to include the ReplaceChar argument added in cc3fc445.

(cherry picked from commit 0f3fec23f7)
dsiders 2024-08-12 23:40:01 +01:00
parent 7fa257e772
commit 9bf9348da0

@@ -1065,35 +1065,38 @@ Deprecated. Use UTF8CodepointToByteIndex instead.
 <element name="UTF8FixBroken">
 <short>
-Replaces all invalid UTF-8 characters with spaces.
+Replaces all invalid UTF-8 characters in a string with the specified character.
 </short>
 <descr>
 <p>
 <var>UTF8FixBroken</var> is an overloaded routine used to replace all invalid
-UTF-8 characters with spaces. The overloaded variants allow the UTF-8-encoded
-content to be specified using either a PChar or a String type.
+UTF-8 characters in the specified value with a replacement character. The
+overloaded variants allow the UTF-8-encoded content to be specified using
+either a PChar or a String type.
 </p>
+<p>
+<var>ReplaceChar</var> contains the character used to replace any invalid UTF-8
+characters found in the input value. The default value for ReplaceChar is the
+Space character (Hex $20 Decimal 32).
+</p>
 <p>
 The PChar variant examines the specified byte values to determine when an
-invalid UTF-8 codepoint is found. This includes byte values that fall outside
-of the ranges allowed in UTF-8, and common byte sequences used to inject XSS
-vulnerabilities.
+invalid UTF-8 codepoint is found. This includes 1, 2, or 3 byte values, those
+that fall outside of the ranges allowed in UTF-8, and common byte sequences
+used to inject XSS vulnerabilities. UTF8FixBroken stops processing at the first
+occurrence of the byte value #0 (Decimal 0). UTF-8 byte sequences updated in
+the routine are stored in the original PChar argument.
 </p>
-<p>
-UTF-8 byte sequences updated in the routine are stored in the original PChar
-argument.
-</p>
-<p>
-UTF8FixBroken processing at the first occurrence of the byte value #0
-(Decimal 0).
-</p>
 <p>
-The String variant converts the argument to a PChar type and calls
-FindInvalidUTF8Codepoint to locate invalid UTF-8 byte sequences. When found,
-UniqueString is called to get a new reference-counted String for the return
-value.
+The String variant converts the input argument to a PChar type and calls
+FindInvalidUTF8Codepoint to locate invalid UTF-8 byte sequences. If invalid
+bytes are found, UniqueString is called to get a new reference-counted String
+for the return value generated by calling the overloaded PChar variant.
 </p>
 </descr>
+<version>
+Modified in LazUtils version 4.0 to include the ReplaceChar argument.
+</version>
 <seealso>
 <link id="FindInvalidUTF8Codepoint"/>
 <link id="#rtl.system.UniqueString">UniqueString</link>
@@ -1109,6 +1112,12 @@ PChar with the UTF-8-encoded values examined in the routine.
 String with the UTF-8-encoded values examined in the routine.
 </short>
 </element>
+<element name="UTF8FixBroken.ReplaceChar">
+<short>
+Character used to replace invalid codepoints in the input argument. The default
+value for the argument is the Space character (decimal 32 hex $20).
+</short>
+</element>
 <element name="UTF8CodepointStrictSize">
 <short>Gets the number of bytes needed for the UTF-8 codepoint.</short>
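The PChar behavior documented in this commit can be illustrated with a short sketch. That behavior is in-place replacement with ReplaceChar (default Space), stopping at the first #0 byte, and rejecting malformed and overlong "XSS" sequences. This is a hypothetical Python approximation written for illustration, not the LazUTF8 implementation. The name `fix_broken_utf8`, the use of a `bytearray` to stand in for a PChar buffer, and the exact set of rejected sequences are all assumptions. The rejected sequences here are overlong forms, UTF-16 surrogates, and values above U+10FFFF. Replacing only the lead byte of a truncated sequence is also an assumed policy that may differ from UTF8FixBroken.

```python
def fix_broken_utf8(buf: bytearray, replace_char: bytes = b" ") -> None:
    """Hypothetical sketch: overwrite invalid UTF-8 bytes in buf, in place.

    Scanning stops at the first 0 byte, like a NUL-terminated PChar.
    replace_char is assumed to be a single byte (default: Space, $20).
    """
    rep = replace_char[0]
    i, n = 0, len(buf)
    while i < n and buf[i] != 0:
        c = buf[i]
        if c < 0x80:                      # ASCII is always valid
            i += 1
            continue
        if (c & 0xE0) == 0xC0:            # lead byte of a 2-byte sequence
            length, min_cp, cp = 2, 0x80, c & 0x1F
        elif (c & 0xF0) == 0xE0:          # lead byte of a 3-byte sequence
            length, min_cp, cp = 3, 0x800, c & 0x0F
        elif (c & 0xF8) == 0xF0:          # lead byte of a 4-byte sequence
            length, min_cp, cp = 4, 0x10000, c & 0x07
        else:                             # stray continuation byte or 0xF8..0xFF
            buf[i] = rep
            i += 1
            continue
        # Every trailing byte must look like 10xxxxxx; a 0 byte or the end
        # of the buffer ends the sequence early.
        j = 1
        while j < length and i + j < n and (buf[i + j] & 0xC0) == 0x80:
            cp = (cp << 6) | (buf[i + j] & 0x3F)
            j += 1
        if j < length:                    # truncated: replace the lead byte only
            buf[i] = rep
            i += 1
            continue
        if cp < min_cp or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            # Overlong (e.g. C0 BC encoding '<'), out of range, or surrogate:
            # replace every byte of the sequence.
            buf[i:i + length] = bytes([rep]) * length
        i += length


if __name__ == "__main__":
    data = bytearray(b"a\xc0\xbcb\xffc\x00\xff")
    fix_broken_utf8(data, b"?")
    print(data)  # bytearray(b'a??b?c\x00\xff')
```

The trailing `\xff` survives because scanning ends at the `\x00` byte. This mirrors the documented rule that processing stops at the first #0.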