mirror of
https://gitlab.com/freepascal.org/lazarus/lazarus.git
synced 2025-08-22 04:39:27 +02:00
Docs: LazUtils/lazutf8. Updates UTF8FixBroken topic to include the ReplaceChar argument added in cc3fc445.
parent
f30f2e2a71
commit
0f3fec23f7
@ -1065,35 +1065,38 @@ Deprecated. Use UTF8CodepointToByteIndex instead.
|
|||||||
|
|
||||||
<element name="UTF8FixBroken">
|
<element name="UTF8FixBroken">
|
||||||
<short>
|
<short>
|
||||||
Replaces all invalid UTF-8 characters with spaces.
|
Replaces all invalid UTF-8 characters in a string with the specified character.
|
||||||
</short>
|
</short>
|
||||||
<descr>
|
<descr>
|
||||||
<p>
|
<p>
|
||||||
<var>UTF8FixBroken</var> is an overloaded routine used to replace all invalid
|
<var>UTF8FixBroken</var> is an overloaded routine used to replace all invalid
|
||||||
UTF-8 characters with spaces. The overloaded variants allow the UTF-8-encoded
|
UTF-8 characters in the specified value with a replacement character. The
|
||||||
content to be specified using either a PChar or a String type.
|
overloaded variants allow the UTF-8-encoded content to be specified using
|
||||||
|
either a PChar or a String type.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
<var>ReplaceChar</var> contains the character used to replace any invalid UTF-8
|
||||||
|
characters found in the input value. The default value for ReplaceChar is the
|
||||||
|
Space character (Hex $20, Decimal 32).
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
The PChar variant examines the specified byte values to determine when an
|
The PChar variant examines the specified byte values to determine when an
|
||||||
invalid UTF-8 codepoint is found. This includes byte values that fall outside
|
invalid UTF-8 codepoint is found. This includes 1, 2, or 3 byte values, those
|
||||||
of the ranges allowed in UTF-8, and common byte sequences used to inject XSS
|
that fall outside of the ranges allowed in UTF-8, and common byte sequences
|
||||||
vulnerabilities.
|
used to inject XSS vulnerabilities. UTF8FixBroken stops processing at the first
|
||||||
|
occurrence of the byte value #0 (Decimal 0). UTF-8 byte sequences updated in
|
||||||
|
the routine are stored in the original PChar argument.
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
UTF-8 byte sequences updated in the routine are stored in the original PChar
|
The String variant converts the input argument to a PChar type and calls
|
||||||
argument.
|
FindInvalidUTF8Codepoint to locate invalid UTF-8 byte sequences. If invalid
|
||||||
</p>
|
bytes are found, UniqueString is called to get a new reference-counted String
|
||||||
<p>
|
for the return value generated by calling the overloaded PChar variant.
|
||||||
UTF8FixBroken processing at the first occurrence of the byte value #0
|
|
||||||
(Decimal 0).
|
|
||||||
</p>
|
|
||||||
<p>
|
|
||||||
The String variant converts the argument to a PChar type and calls
|
|
||||||
FindInvalidUTF8Codepoint to locate invalid UTF-8 byte sequences. When found,
|
|
||||||
UniqueString is called to get a new reference-counted String for the return
|
|
||||||
value.
|
|
||||||
</p>
|
</p>
|
||||||
</descr>
|
</descr>
|
||||||
|
<version>
|
||||||
|
Modified in LazUtils version 4.0 to include the ReplaceChar argument.
|
||||||
|
</version>
|
||||||
<seealso>
|
<seealso>
|
||||||
<link id="FindInvalidUTF8Codepoint"/>
|
<link id="FindInvalidUTF8Codepoint"/>
|
||||||
<link id="#rtl.system.UniqueString">UniqueString</link>
|
<link id="#rtl.system.UniqueString">UniqueString</link>
|
||||||
@ -1109,6 +1112,12 @@ PChar with the UTF-8-encoded values examined in the routine.
|
|||||||
String with the UTF-8-encoded values examined in the routine.
|
String with the UTF-8-encoded values examined in the routine.
|
||||||
</short>
|
</short>
|
||||||
</element>
|
</element>
|
||||||
|
<element name="UTF8FixBroken.ReplaceChar">
|
||||||
|
<short>
|
||||||
|
Character used to replace invalid codepoints in the input argument. The default
|
||||||
|
value for the argument is the Space character (Hex $20, Decimal 32).
|
||||||
|
</short>
|
||||||
|
</element>
|
||||||
|
|
||||||
<element name="UTF8CodepointStrictSize">
|
<element name="UTF8CodepointStrictSize">
|
||||||
<short>Gets the number of bytes needed for the UTF-8 codepoint.</short>
|
<short>Gets the number of bytes needed for the UTF-8 codepoint.</short>
|
||||||
|
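A minimal usage sketch of the String variant described in the updated topic, not part of the commit. It assumes the overload is declared as procedure UTF8FixBroken(var S: string; ReplaceChar: Char = ' ') and that LazUtils 4.0 or later is on the unit search path. The program name and the sample bytes are hypothetical.

```pascal
program FixBrokenDemo; // hypothetical program name

{$mode objfpc}{$H+}

uses
  LazUTF8; // from the LazUtils package

var
  S: string;
begin
  // #$C3 is a lead byte for a 2-byte sequence, but 'x' is not a valid
  // continuation byte, so the sequence is invalid UTF-8.
  S := 'ab' + #$C3 + 'xy';

  // Assumed signature: ReplaceChar is omitted, so the documented default
  // (the Space character) is used for the replacement.
  UTF8FixBroken(S);
  WriteLn('[', S, ']');

  // Same input, with an explicit replacement character.
  S := 'ab' + #$C3 + 'xy';
  UTF8FixBroken(S, '?');
  WriteLn('[', S, ']');
end.
```

Because the String variant modifies its argument in place, the sketch assigns the sample value again before the second call.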