Docs: LazUtils/lazutf8. Updates UTF8FixBroken topic to include the ReplaceChar argument added in cc3fc445.

(cherry picked from commit 0f3fec23f7)
2025-04-18 21:50:05 +02:00 · 2024-08-12 23:40:01 +01:00 · 9bf9348da0
commit 9bf9348da0
parent 7fa257e772
1 changed file with 27 additions and 18 deletions
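
The behaviour described in the updated topic can be illustrated with a rough sketch. This is not the LazUtils code: it is a Python approximation written for this note, and the names `utf8_fix_broken`, `_valid_sequence_length`, and `replace_char` are hypothetical. It assumes each invalid byte is replaced individually, that the default replacement is a space, and that scanning stops at the first zero byte, as the topic text states. Python's strict UTF-8 decoder stands in for the validity checks, so details such as exactly which sequences are rejected may differ from the Pascal routine.

```python
def _valid_sequence_length(buf: bytearray, i: int) -> int:
    """Return the byte length of a valid UTF-8 sequence starting at i, or 0."""
    for size in (1, 2, 3, 4):
        try:
            # Python's strict decoder rejects overlong forms and stray bytes.
            bytes(buf[i:i + size]).decode("utf-8")
            return size
        except UnicodeDecodeError:
            continue
    return 0


def utf8_fix_broken(data: bytes, replace_char: bytes = b" ") -> bytes:
    """Replace every invalid byte with replace_char, stopping at the first NUL."""
    out = bytearray(data)
    i = 0
    while i < len(out) and out[i] != 0:
        size = _valid_sequence_length(out, i)
        if size == 0:
            out[i] = replace_char[0]  # overwrite the offending byte only
            i += 1
        else:
            i += size  # skip over the whole valid sequence
    return bytes(out)


if __name__ == "__main__":
    print(utf8_fix_broken(b"a\xffb"))         # invalid byte, default space
    print(utf8_fix_broken(b"a\xffb", b"?"))   # caller-supplied replacement
    print(utf8_fix_broken(b"\xc0\xaf"))       # overlong encoding of "/"
```

In this sketch the replacement is a single byte, which mirrors the single-character ReplaceChar argument described in the topic; the real routine updates the PChar buffer in place rather than returning a copy.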
--- a/docs/xml/lazutils/lazutf8.xml
+++ b/docs/xml/lazutils/lazutf8.xml
@@ -1065,35 +1065,38 @@ Deprecated. Use UTF8CodepointToByteIndex instead.

 <element name="UTF8FixBroken">
 <short>
-Replaces all invalid UTF-8 characters with spaces.
+Replaces all invalid UTF-8 characters in a string with the specified character.
 </short>
 <descr>
 <p>
 <var>UTF8FixBroken</var> is an overloaded routine used to replace all invalid 
-UTF-8 characters with spaces. The overloaded variants allow the UTF-8-encoded 
-content to be specified using either a PChar or a String type.
+UTF-8 characters in the specified value with a replacement character. The 
+overloaded variants allow the UTF-8-encoded content to be specified using 
+either a PChar or a String type.
+</p>
+<p>
+<var>ReplaceChar</var> contains the character used to replace any invalid UTF-8 
+characters found in the input value. The default value for ReplaceChar is the 
+Space character (hex $20, decimal 32).
 </p>
 <p>
 The PChar variant examines the specified byte values to determine when an 
-invalid UTF-8 codepoint is found. This includes byte values that fall outside 
-of the ranges allowed in UTF-8, and common byte sequences used to inject XSS 
-vulnerabilities.
+invalid UTF-8 codepoint is found. This includes 1, 2, or 3 byte values, those 
+that fall outside of the ranges allowed in UTF-8, and common byte sequences 
+used to inject XSS vulnerabilities. UTF8FixBroken stops processing at the first 
+occurrence of the byte value #0 (Decimal 0). UTF-8 byte sequences updated in 
+the routine are stored in the original PChar argument.
 </p>
 <p>
-UTF-8 byte sequences updated in the routine are stored in the original PChar 
-argument.
-</p>
-<p>
-UTF8FixBroken processing at the first occurrence of the byte value #0 
-(Decimal 0).
-</p>
-<p>
-The String variant converts the argument to a PChar type and calls 
-FindInvalidUTF8Codepoint to locate invalid UTF-8 byte sequences. When found, 
-UniqueString is called to get a new reference-counted String for the return 
-value.
+The String variant converts the input argument to a PChar type and calls 
+FindInvalidUTF8Codepoint to locate invalid UTF-8 byte sequences. If invalid 
+bytes are found, UniqueString is called to get a new reference-counted String 
+for the return value generated by calling the overloaded PChar variant.
 </p>
 </descr>
+<version>
+Modified in LazUtils version 4.0 to include the ReplaceChar argument.
+</version>
 <seealso>
 <link id="FindInvalidUTF8Codepoint"/>
 <link id="#rtl.system.UniqueString">UniqueString</link>
@@ -1109,6 +1112,12 @@ PChar with the UTF-8-encoded values examined in the routine.
 String with the UTF-8-encoded values examined in the routine.
 </short>
 </element>
+<element name="UTF8FixBroken.ReplaceChar">
+<short>
+Character used to replace invalid codepoints in the input argument. The default 
+value for the argument is the Space character (decimal 32, hex $20).
+</short>
+</element>

 <element name="UTF8CodepointStrictSize">
 <short>Gets the number of bytes needed for the UTF-8 codepoint.</short>