Mirror of https://gitlab.com/freepascal.org/lazarus/lazarus.git, synced 2025-04-18 21:50:05 +02:00
Docs: LazUtils/lazutf8. Updates UTF8FixBroken topic to include the ReplaceChar argument added in cc3fc445.
(cherry picked from commit 0f3fec23f7)
parent 7fa257e772
commit 9bf9348da0
@@ -1065,35 +1065,38 @@ Deprecated. Use UTF8CodepointToByteIndex instead.
 <element name="UTF8FixBroken">
 <short>
-Replaces all invalid UTF-8 characters with spaces.
+Replaces all invalid UTF-8 characters in a string with the specified character.
 </short>
 <descr>
 <p>
 <var>UTF8FixBroken</var> is an overloaded routine used to replace all invalid
-UTF-8 characters with spaces. The overloaded variants allow the UTF-8-encoded
-content to be specified using either a PChar or a String type.
+UTF-8 characters in the specified value with a replacement character. The
+overloaded variants allow the UTF-8-encoded content to be specified using
+either a PChar or a String type.
 </p>
+<p>
+<var>ReplaceChar</var> contains the character used to replace any invalid UTF-8
+characters found in the input value. The default value for ReplaceChar is the
+Space character (Hex $20 Decimal 32).
+</p>
 <p>
 The PChar variant examines the specified byte values to determine when an
-invalid UTF-8 codepoint is found. This includes byte values that fall outside
-of the ranges allowed in UTF-8, and common byte sequences used to inject XSS
-vulnerabilities.
+invalid UTF-8 codepoint is found. This includes 1, 2, or 3 byte values, those
+that fall outside of the ranges allowed in UTF-8, and common byte sequences
+used to inject XSS vulnerabilities. UTF8FixBroken stops processing at the first
+occurrence of the byte value #0 (Decimal 0). UTF-8 byte sequences updated in
+the routine are stored in the original PChar argument.
 </p>
-<p>
-UTF-8 byte sequences updated in the routine are stored in the original PChar
-argument.
-</p>
-<p>
-UTF8FixBroken processing at the first occurrence of the byte value #0
-(Decimal 0).
-</p>
 <p>
-The String variant converts the argument to a PChar type and calls
-FindInvalidUTF8Codepoint to locate invalid UTF-8 byte sequences. When found,
-UniqueString is called to get a new reference-counted String for the return
-value.
+The String variant converts the input argument to a PChar type and calls
+FindInvalidUTF8Codepoint to locate invalid UTF-8 byte sequences. If invalid
+bytes are found, UniqueString is called to get a new reference-counted String
+for the return value generated by calling the overloaded PChar variant.
 </p>
 </descr>
+<version>
+Modified in LazUtils version 4.0 to include the ReplaceChar argument.
+</version>
 <seealso>
 <link id="FindInvalidUTF8Codepoint"/>
 <link id="#rtl.system.UniqueString">UniqueString</link>
@@ -1109,6 +1112,12 @@ PChar with the UTF-8-encoded values examined in the routine.
 String with the UTF-8-encoded values examined in the routine.
 </short>
 </element>
+<element name="UTF8FixBroken.ReplaceChar">
+<short>
+Character used to replace invalid codepoints in the input argument. The default
+value for the argument is the Space character (decimal 32 hex $20).
+</short>
+</element>
 
 <element name="UTF8CodepointStrictSize">
 <short>Gets the number of bytes needed for the UTF-8 codepoint.</short>
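The behaviour documented in this commit can be illustrated with a minimal Python sketch. The behaviour in question: invalid bytes are replaced with ReplaceChar, processing stops at the first #0, and the String variant copies only when FindInvalidUTF8Codepoint reports a problem. This sketch is not the LazUtils implementation. The function names `valid_sequence_length`, `find_invalid`, `fix_broken_in_place` and `fix_broken` are hypothetical, and `bytearray`/`bytes` stand in for PChar/String. The validity rules are an assumption taken from RFC 3629: they reject overlong forms (such as the C0 BC encoding of "<" often used for XSS), UTF-16 surrogates, and values above U+10FFFF.

```python
"""Hypothetical sketch of UTF8FixBroken-style repair; not the LazUtils code.

Validity rules are assumed to follow RFC 3629 (no overlong forms,
no surrogates, nothing above U+10FFFF).
"""


def valid_sequence_length(buf, i):
    """Return the byte length of a valid UTF-8 sequence at buf[i], or 0 if invalid."""
    b0 = buf[i]
    if b0 < 0x80:
        return 1                          # plain ASCII byte
    if 0xC2 <= b0 <= 0xDF:
        size, cp = 2, b0 & 0x1F
    elif 0xE0 <= b0 <= 0xEF:
        size, cp = 3, b0 & 0x0F
    elif 0xF0 <= b0 <= 0xF4:
        size, cp = 4, b0 & 0x07
    else:
        return 0                          # stray continuation byte, C0/C1 or F5..FF
    if i + size > len(buf):
        return 0                          # sequence truncated by the end of the buffer
    for k in range(1, size):
        b = buf[i + k]
        if (b & 0xC0) != 0x80:
            return 0                      # missing continuation byte (also catches #0)
        cp = (cp << 6) | (b & 0x3F)
    if size == 3 and (cp < 0x800 or 0xD800 <= cp <= 0xDFFF):
        return 0                          # overlong 3-byte form or UTF-16 surrogate
    if size == 4 and not (0x10000 <= cp <= 0x10FFFF):
        return 0                          # overlong 4-byte form or beyond U+10FFFF
    return size


def find_invalid(buf):
    """Index of the first invalid byte before any #0, or -1 if none is found."""
    i = 0
    while i < len(buf) and buf[i] != 0:
        size = valid_sequence_length(buf, i)
        if size == 0:
            return i
        i += size
    return -1


def fix_broken_in_place(buf, replace_char=b" "):
    """PChar-style variant: overwrite invalid bytes in a bytearray, stopping at #0."""
    if len(replace_char) != 1:
        raise ValueError("replace_char must be a single byte")
    i = 0
    while i < len(buf) and buf[i] != 0:
        size = valid_sequence_length(buf, i)
        if size == 0:
            buf[i] = replace_char[0]      # replace one byte, then resynchronise
            i += 1
        else:
            i += size


def fix_broken(data, replace_char=b" "):
    """String-style variant: copy only when something must change."""
    if find_invalid(data) == -1:
        return data                       # already valid: hand back the same object
    buf = bytearray(data)                 # private copy, analogous to UniqueString
    fix_broken_in_place(buf, replace_char)
    return bytes(buf)
```

For example, `fix_broken(b"a\xffb", b"?")` yields `b"a?b"`, and a valid input is returned unchanged as the very same object. The sketch replaces one byte at a time, so the output keeps the input's length. That matches the statement that the PChar variant writes its changes back into the original buffer. Whether LazUtils replaces only the lead byte of a broken sequence or every byte in it is not stated in this commit.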