Contains routines used for string manipulation.

LazStringUtils.pas contains routines used for string manipulation. It is part of the LazUtils package.

File added in LCL version 2.0.X (revision 58631).

Represents comment styles available in CommentText.

Auto-detected comment style. No comment. Comment surrounded by { } characters. Delphi inline comment using a // marker. TurboPascal comment using (* *) markers. C++ comment using /* */ markers. Perl comment using a # marker. HTML/XML comment using <!-- --> markers.

Set type for TCommentType enumeration values.

End-of-line character sequence.

EndOfLine is a ShortString constant used to represent the end-of-line character sequence. The value is set to the LineEnding for the platform or operating system.

See also: LineEnding.

Determines if a string starts with the specified value.

LazStartsStr is a Boolean function used to determine if the string in AText starts with the text specified in ASubText. It is a modified version of StartsStr from the RTL strutils.pp unit.

LazStartsStr casts the values in ASubText and AText to PChar types, and calls StrLComp to perform the comparison.

The return value is True when AText begins with the specified sub-text. It also returns True when ASubText is an empty string (''), which is Delphi compatible.
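
A minimal usage sketch for LazStartsStr, assuming a console program built against the LazUtils package. The program name and sample strings are illustrative only.

```pascal
program StartsStrDemo;
{$mode objfpc}{$H+}
uses
  LazStringUtils;
begin
  // Arguments are (ASubText, AText); the comparison is case-sensitive.
  WriteLn(LazStartsStr('Laz', 'Lazarus'));  // TRUE
  WriteLn(LazStartsStr('laz', 'Lazarus'));  // FALSE: case differs
  WriteLn(LazStartsStr('', 'Lazarus'));     // TRUE: empty sub-text
end.
```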

True when the string starts with the specified sub-text. Value to look for at the start of the text. Text examined in the routine.

Determines if a string ends with the specified value.

LazEndsStr is a Boolean function used to determine if the string in AText ends with the text specified in ASubText. It is a modified version of EndsStr from the RTL strutils.pp unit.

LazEndsStr casts the values in ASubText and AText to PChar types, and calls StrLComp to perform the comparison.

The return value is True when AText ends with the specified sub-text. It also returns True when ASubText is an empty string (''), which is Delphi compatible.

True when the string ends with the specified value. Text to look for at the end of the string. String examined in the routine.

Determines if a string occurs at the beginning of the specified text.

LazStartsText is a fast implementation of StartsText. The version in the RTL strutils.pp unit calls AnsiCompareText and is very slow.

LazStartsText casts the values in ASubText and AText to PChar types, and calls StrLIComp to perform a case-insensitive comparison for the number of characters in ASubText.

The return value is True when ASubText and AText start with the same values. The return value is also True when ASubText is an empty string (''); this is Delphi compatible. The return value is False when StrLIComp returns a non-zero value.

See also: StartsText, StrLIComp.

True when the sub-text is at the start of the string value. Value to locate at the start of the string. String value used in the comparison.

Determines if a string occurs at the end of the specified text.

LazEndsText is a fast implementation of EndsText. The version in the RTL strutils.pp unit calls AnsiCompareText and is very slow.

LazEndsText casts the values in ASubText and AText to PChar types, and calls StrLIComp to perform a case-insensitive comparison for the number of characters in ASubText at the end of the value in AText.

The return value is True when AText ends with the value in ASubText. The return value is True when ASubText is an empty string (''); this is Delphi compatible. It is False when StrLIComp returns a non-zero value, or when ASubText is longer than AText.
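
The difference between the case-insensitive and case-sensitive suffix tests can be sketched as follows. This is an illustrative console program, not code from the unit; the file names are made up.

```pascal
program EndsTextDemo;
{$mode objfpc}{$H+}
uses
  LazStringUtils;
begin
  // Arguments are (ASubText, AText).
  WriteLn(LazEndsText('.PAS', 'unit1.pas'));  // TRUE: case is ignored
  WriteLn(LazEndsStr('.PAS', 'unit1.pas'));   // FALSE: case-sensitive variant
  WriteLn(LazEndsText('unit1.pas', '.pas'));  // FALSE: sub-text longer than text
end.
```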

See also: EndsText, StrLIComp.

True when the sub-text is at the end of the string value. Value to locate at the end of the string. String value used in the comparison.

A case-insensitive optimized version of the Pos routine.

PosI implements a case-insensitive optimized version of the Pos routine found in the RTL system unit. It only accepts String values in its parameters, unlike the RTL overloaded variants which accept combinations of the Char, ShortString, AnsiString, UnicodeString, WideString, and Variant types.

It is also an alternative to the ContainsText routine in the RTL StrUtils unit, which has a very slow implementation.

SubStr contains the value to locate in S.

PosI supports ASCII comparison. It converts the values in SubStr and S to lowercase char by char for fast comparisons. The result is much faster than Pos(Lowercase(SubStr),Lowercase(S)) which is often used.

The return value contains the position in S where SubStr is located, or 0 when SubStr does not occur in S. The position is 1-based just like indexed access to values in the String type.
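
A short sketch of the 1-based result, assuming ASCII-only input as described above. The sample strings are illustrative.

```pascal
program PosIDemo;
{$mode objfpc}{$H+}
uses
  LazStringUtils;
begin
  // Arguments are (SubStr, S), the same order as Pos.
  WriteLn(PosI('WORLD', 'Hello, world'));  // 8
  WriteLn(PosI('xyz', 'Hello, world'));    // 0: not found
end.
```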

See also: Pos, ContainsText.

Position of the sub-string within the searched value, or 0 when not found. Value to locate in the searched value. Value searched for the specified sub-string.

Indicates whether the characters in the specified value are numeric digits in the range '0'..'9'.

Examines each character in s (in reverse order).

Returns True if all of the characters in s are in the range '0'..'9'. Returns False if any character is not in the required range, or when s is an empty string ('').
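
The rules above can be checked with a small console program. This sketch assumes LazUtils version 4.0 or later, where IsNumeric is available.

```pascal
program IsNumericDemo;
{$mode objfpc}{$H+}
uses
  LazStringUtils;
begin
  WriteLn(IsNumeric('2024'));  // TRUE
  WriteLn(IsNumeric('20x4'));  // FALSE: 'x' is not a digit
  WriteLn(IsNumeric('-12'));   // FALSE: the sign is outside '0'..'9'
  WriteLn(IsNumeric(''));      // FALSE: empty string
end.
```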

Added in LazUtils version 4.0. Replaces the deprecated IsNumber() routine.

Returns True if all of the characters in s are in the range '0'..'9'. String with the characters examined in the routine.

Deprecated. Use IsNumeric instead.

Calls the IsNumeric routine to determine the return value.

Returns True if all of the characters in s are in the range '0'..'9'. Returns False if any character is not in the required range, or when s is an empty string ('').

Deprecated in LazUtils version 4.0 to avoid confusion with the RTL routine of the same name. Use IsNumeric instead.

True when characters in s are in the range '0'..'9'. String with values examined in the routine.

Gets the number of LineEnding sequences in the specified text.

Number of LineEnding sequences in the text. Text examined in the routine. Number of characters in the last line.

Converts CR and LF characters in a string to the specified line-ending sequence.

ChangeLineEndings is a String function used to convert CR or LF characters in a string to the specified line-ending character sequence. ChangeLineEndings assumes that values in S are single-byte values and not multi-byte UTF-8 characters.

ChangeLineEndings iterates over the byte values in S, and replaces any occurrences of CR (#13), LF (#10), or CR+LF (#13#10) with the specified end-of-line character sequence. No actions are performed in the function when S is an empty string ('').
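
A minimal sketch of the conversion, using a visible '|' marker as the replacement sequence so the result is easy to read. The input string is illustrative.

```pascal
program ChangeLineEndingsDemo;
{$mode objfpc}{$H+}
uses
  LazStringUtils;
var
  S: String;
begin
  // Mixed CR+LF, CR and LF line endings in one value.
  S := 'one'#13#10'two'#13'three'#10'four';
  WriteLn(ChangeLineEndings(S, '|'));  // one|two|three|four
end.
```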

Value from S after conversion of line endings. Values examined in the routine. End-of-line character sequence applied to the return value.

Normalizes end-of-line characters in a string to the value in LineEnding.

Calls ChangeLineEndings to normalize line ending sequences to the value in LineEnding.

Value after line ending sequences have been replaced. String with values examined and updated in the routine.

Converts CR or LF characters in a string to the specified delimiter character.

Value after converting CR, LF to the delimiter character. String with values converted in the routine. Delimiter character used in place of CR, LF characters.

Converts all Tab characters in a string to the specified number of space characters.

Replaces all Tab characters (#9) in S with the specified number of space characters.

String value after converting Tab characters to spaces. String with values updated in the routine. Number of space characters to use for each Tab character. True when UTF-8 codepoints are used to convert individual characters.

Converts a string into a comment using the specified comment style.

CommentText is a String function used to convert the specified string into a comment using a given comment style.

CommentType is a TCommentType value which indicates the comment style applied to the value in S. See TCommentType for more information about the enumeration values and their meanings.

No actions are performed in the function when CommentType contains comtNone.

When CommentType contains comtDefault, the comtPascal comment style is applied to the value in S.

An internal procedure is used to apply the starting marker (and ending marker when needed) as well as a continuation character sequence for a multi-line comment.

CommentText is used in the implementation of the source code editor in the Lazarus IDE. An Exception can be raised if the comment length does not match the expected length for the comment style.
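
The handling of comtNone and comtDefault described above can be sketched as follows. The sample text is illustrative, and the exact spacing around the markers is left to the implementation, so the last line is printed without an expected value.

```pascal
program CommentTextDemo;
{$mode objfpc}{$H+}
uses
  LazStringUtils;
const
  Sample = 'first line';
begin
  // comtNone leaves the value untouched.
  WriteLn(CommentText(Sample, comtNone) = Sample);  // TRUE
  // comtDefault is resolved to the comtPascal style.
  WriteLn(CommentText(Sample, comtDefault) = CommentText(Sample, comtPascal));  // TRUE
  // Show the Pascal-style comment produced for the sample text.
  WriteLn(CommentText(Sample, comtPascal));
end.
```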

Raises an Exception with the message 'CommentText ERROR: ' when an unexpected length is found for the commented value.

Comment with the specified style marker(s). Value converted into a comment. Comment type applied to the string.

Creates a regular expression from a filter expression used in IDE dialogs.

Ensures that characters in Src are converted to the notation needed for regular expressions, including '.', ',', ';', '*', '+', '?', and '\'. The return value is enclosed in a regex single line expression ('^(...)$').

Used in the implementation of the Clean Directory and Change Encoding dialogs in the Lazarus IDE.

Regular expression for the value in Src. Filter expression converted to a regular expression in the routine.

Replaces control characters with Pascal-style character constants.

Replaces control characters (#0..#31) with Pascal-style character constants using the #nnn notation.

String after replacing binary characters. Values examined and converted in the routine.

Converts occurrences of special characters to spaces.

Converts special characters (#0..#31, #127) to a Space character. Converts line breaks to a single Space character (#32). Trims leading and trailing Spaces. Calls UTF8FixBroken when FixUTF8 is True.

String after converting special characters. String examined and converted in the routine. True when invalid UTF-8 codepoints are repaired in the string.

Shortens and "ellipsifies" the specified value.

ShortDotsLine is a String function used to generate a shortened and "ellipsified" string for the value in Line.

ShortDotsLine calls Utf8EscapeControlChars to convert any control characters in Line to their representation as a hexadecimal character value.

The value in the MaxTextLen constant is used as the maximum length for the "ellipsified" string value. If the number of UTF-8 codepoints in the line is larger than the value in MaxTextLen, the string is shortened to the maximum length and 3 (three) Period ('.') characters are appended to the return value.
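
A small sketch of the shortening rule, assuming plain ASCII input so that bytes and codepoints coincide. The variable name Shortened is illustrative.

```pascal
program ShortDotsDemo;
{$mode objfpc}{$H+}
uses
  LazStringUtils;
var
  Shortened: String;
begin
  // 200 ASCII characters, well above the MaxTextLen limit.
  Shortened := ShortDotsLine(StringOfChar('x', 200));
  // Truncated to MaxTextLen characters plus the three periods.
  WriteLn(Length(Shortened) = MaxTextLen + 3);  // TRUE
  // Values within the limit are returned as-is.
  WriteLn(ShortDotsLine('short text'));         // short text
end.
```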

See also: Utf8EscapeControlChars.

Shortened and "ellipsified" value for the specified line of text. Line of text examined in the routine.

Combines and optionally shortens the specified values.

BeautifyLineXY is a String function used to combine the values in the Filename, Line, X and Y arguments into a formatted message. The message is in the form:

examplefile.pas (123, 1) The error message goes here.

Filename contains a file name used at the start of the formatted message.

X represents the line number in the context for the message.

Y represents the column number in the context for the message.

Line contains the context for the formatted message. The ShortDotsLine routine is called to shorten and "ellipsify" the message in Line when needed.

BeautifyLineXY is used in the implementation of Jump History and Search Result views in the Lazarus IDE.

Formatted message using the specified values. File name used at the start of the formatted message. Contains the context for the formatted message. Line number for the message context. Column number for the message context.

Applies line breaks and indenting to a string value.

BreakString is a String function used to apply line breaks and indent spacing to the text specified in S.

MaxLineLength contains the maximum number of characters allowed on any given line.

Indent contains the number of space characters used to indent text following a line break. The value in Indent may be adjusted if it is too large for the value specified in MaxLineLength.

BreakString examines values in S and counts the number of characters in each of the lines. Existing CR (#13) or LF (#10) characters are preserved. If the value in MaxLineLength is reached for any given line, a new line is created by inserting the value in LineEnding. The line break occurs at a natural word boundary when one can be determined.

Inserting a line break causes an indent with the number of space characters in Indent to be inserted in the return value following the line break.

The process is repeated until all values in S have been handled.

The return value contains the content in S after applying line breaks and indent spacing.

See also: LineEnding.

String with values after applying line breaks and indent spacing. Contains the text examined and formatted in the routine. Maximum length of lines in the converted value. Number of Space characters prepended as an indent for lines in the converted value.

Creates and populates a TStringList with lines determined using the specified delimiter character.

TStrings instance created and populated in the routine. String with the values examined and loaded into the string list. Character used to delimit lines of text in S. TStrings instance where lines of text are stored. True to clear the string list; False to append lines to existing values.

Gets a string with the lines of text from the specified TStrings instance.

String with the delimited lines of text from the TStrings instance. TStrings instance with text values retrieved in the routine. End-of-Line sequence used to delimit lines of text in the result value. True to omit empty lines in the string list from the return value.

Converts the specified lines in a TStrings instance to a string value.

StringListPartToText is a String function used to get a line of text which contains the specified range of lines from a TStrings instance.

List is the TStrings instance which contains the lines of text examined in the function.

FromIndex and ToIndex indicate the line numbers in List used in the return value for the function. They must contain valid ordinal positions, and are used to access the indexed Strings property in the TStrings instance.

If FromIndex contains -1, it defaults to the first ordinal position (0). ToIndex must be equal to or larger than the value in FromIndex, and valid for the number of Strings in the string list. It defaults to the upper limit for the string list when it is too large. FromIndex cannot have a value that is larger than the one in ToIndex.

No actions are performed in the function when List is unassigned (contains Nil), or when values in the FromIndex or ToIndex parameters are invalid. The return value is an empty string ('') in these scenarios.

IgnoreEmptyLines indicates whether empty lines in the string list are omitted from the return value for the function. When set to True, any Strings value that is an empty string ('') is discarded. Otherwise, the empty value is denoted by adding the value in Delimiter to the return value.

Delimiter contains the end-of-line sequence used to separate strings added to the return value.
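
A hedged usage sketch for StringListPartToText, assuming the parameter order List, FromIndex, ToIndex, Delimiter, IgnoreEmptyLines used in the description above. The list content is illustrative.

```pascal
program PartToTextDemo;
{$mode objfpc}{$H+}
uses
  Classes, LazStringUtils;
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.Add('alpha');
    Lines.Add('');
    Lines.Add('beta');
    Lines.Add('gamma');
    // Lines 0..2 joined with ', '; the empty entry is skipped.
    WriteLn(StringListPartToText(Lines, 0, 2, ', ', True));
    // An unassigned list yields an empty string.
    WriteLn(StringListPartToText(nil, 0, 2, ', ', True) = '');  // TRUE
  finally
    Lines.Free;
  end;
end.
```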

See also: TStrings, TStringList.

Text representing the specified lines in the TStrings instance. TStrings instance examined in the method. First line included in the text. Last line included in the text. Delimiter inserted between lines in the text. Indicates if empty lines are excluded from the text.

Converts the content in a TStrings instance to a string value.

Adds a LF (#10) character to the end of text lines in the string list. Quotes each string value which ends with a LF character using surrounding Quote (') characters.

String representing the contents of the TStrings instance. TStrings instance examined in the method. First line included in the string value. Last line included in the string value. Indicates if empty lines are excluded from the result.

Stores a multi-line string as separate lines in a TStrings instance.

StringToStringList is a procedure used to convert the multi-line String in S to separate lines of text in a TStrings instance.

The LF (#10) character is used to mark the end of a line in S, and causes the preceding text to be added to the string list in List. The LF character is not included in the value added to the string list.

If no end-of-line characters are found in S, then a single line of text is added to the string list.

String with values extracted and stored in the string list. TStrings instance where values are stored in the routine.

Gets the next delimited value in List starting at the specified position.

GetNextDelimitedItem is a String function used to get the next item in a delimited list of items starting at the specified position.

List contains the list of values examined in the routine.

Delimiter is the character used to separate item values in List.

Position contains the initial character position in List examined in the routine.

GetNextDelimitedItem iterates over the characters in List starting at the character in Position. When the character in Delimiter is encountered, the characters starting at Position and prior to the position for the Delimiter are copied into the return value.

The value in Position is incremented to skip both the character values and the delimiter for the list item.
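
The iteration described above can be sketched as a simple loop. The Colors constant is illustrative; Position starts at 1 because string positions are 1-based.

```pascal
program DelimitedItemsDemo;
{$mode objfpc}{$H+}
uses
  LazStringUtils;
const
  Colors = 'red,green,blue';
var
  Position: Integer;
begin
  Position := 1;
  // Each call returns one item and advances Position past its delimiter.
  while Position <= Length(Colors) do
    WriteLn(GetNextDelimitedItem(Colors, ',', Position));
  // Prints red, green and blue on separate lines.
end.
```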

Value for the list item at the specified position. List of delimited values examined in the routine. Delimiter character used to separate values in the list. Initial position in the list where the characters are examined.

Determines if a value exists in a delimited list of values.

HasDelimitedItem is a Boolean function used to determine if the specified value exists in a delimited list of values.

List contains the item values examined in the routine.

Delimiter is the character used to separate the item values in List.

FindItem contains the value to locate in the List of items.

HasDelimitedItem calls FindNextDelimitedItem to get the return value for the method. The return value is True when FindNextDelimitedItem returns a non-empty string value.

True when the specified item is found in the list. Values checked for the specified item. Delimiter used to separate items in the list. Value to locate in the list of items.

Finds the next occurrence of a specific value in a delimited list.

FindNextDelimitedItem is a String function used to find the next occurrence of a specific value in a delimited list of values.

List is the list with the delimited values searched in the routine.

Delimiter is the character used to separate the values in List.

Position is a variable parameter with the character index in List where the find operation is started. Position is 1-based, like using indexed access to the values in a String.

FindItem is the value to locate in List starting at the given position.

FindNextDelimitedItem calls the GetNextDelimitedItem routine to get the individual delimited values in List. GetNextDelimitedItem updates the value in Position when an item is retrieved. The value in Position is set to Length(List)+1 if FindItem is not found in the routine.

The return value is set to the value in FindItem if it is found in List. The return value is an empty string ('') if FindItem is not found in the List starting at the given position.
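
A short sketch covering both HasDelimitedItem and FindNextDelimitedItem, including the value left in Position when the item is not found. The list and item values are illustrative.

```pascal
program FindItemDemo;
{$mode objfpc}{$H+}
uses
  LazStringUtils;
const
  Colors = 'red,green,blue';
var
  Position: Integer;
begin
  WriteLn(HasDelimitedItem(Colors, ',', 'green'));   // TRUE
  WriteLn(HasDelimitedItem(Colors, ',', 'purple'));  // FALSE

  Position := 1;
  WriteLn(FindNextDelimitedItem(Colors, ',', Position, 'green'));  // green

  Position := 1;
  // Not found: the result is empty and Position moves past the end of the list.
  WriteLn(FindNextDelimitedItem(Colors, ',', Position, 'purple') = '');  // TRUE
  WriteLn(Position = Length(Colors) + 1);                                // TRUE
end.
```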

The value in FindItem when found, or an empty string. List with delimited values examined in the routine. Character used to separate values in the list. Starting position for the values examined in the routine. Item value to locate in List.

Combines two string values using the specified delimiter character.

The value in Delimiter is omitted from the return value if either A or B is an empty string ('').
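
The rule for omitting the delimiter can be illustrated with three calls. This is a sketch with made-up values, assuming the parameter order A, B, Delimiter.

```pascal
program MergeDemo;
{$mode objfpc}{$H+}
uses
  LazStringUtils;
begin
  WriteLn(MergeWithDelimiter('a', 'b', ';'));  // a;b
  WriteLn(MergeWithDelimiter('a', '', ';'));   // a
  WriteLn(MergeWithDelimiter('', 'b', ';'));   // b
end.
```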

Combined values using the specified delimiter. First value merged in the result. Second value merged after the delimiter. Delimiter used to separate values.

Gets the first line of text up to an end-of-line character.

StripLn is a String function used to get the first line of text in the specified value up to an end-of-line character. CR and LF characters are recognized as end-of-line characters.

The return value contains the values from ALine prior to the first end-of-line character, or the entire contents of ALine when an end-of-line character is not found.
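
A minimal sketch of both cases described above: a value with an end-of-line sequence, and a value without one. The sample strings are illustrative.

```pascal
program StripLnDemo;
{$mode objfpc}{$H+}
uses
  LazStringUtils;
begin
  // Everything from the first CR or LF onward is dropped.
  WriteLn(StripLn('first'#13#10'second'));  // first
  // No end-of-line character: the whole value is returned.
  WriteLn(StripLn('only line'));            // only line
end.
```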

The value in ALine is a constant parameter and is not altered in any way in the routine.

Text in the initial line of text. Values examined in the routine.

Implements formatting facilities in the debugger.

GetPart is an overloaded String function. It is used to implement facilities in the debugger.

Converts a multi-line string to a single line of text.

TextToSingleLine is a String function used to replace end-of-line characters (like CR (#13) and LF (#10)) in AText with Space (#32) characters. Duplicate adjacent Space characters are converted to a single Space character in the return value.


// uses LazStringUtils;
// var sMultiLn, sSingleLn: String;

sMultiLn := 'The rain  '#10' in Spain  '#13#10' falls  mainly  on  the  plain.  ';

sSingleLn := TextToSingleLine(sMultiLn);
// value is: 'The rain in Spain falls mainly on the plain.'

Text after removing end-of-line characters and duplicate space characters. Text values examined and converted in the routine.

Inverts the case for characters in the specified text.

Inverts the case for characters in the specified string value. Like using LowerCase and UpperCase simultaneously.
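
A one-line illustration, assuming ASCII letters only. The sample value is made up.

```pascal
program SwapCaseDemo;
{$mode objfpc}{$H+}
uses
  LazStringUtils;
begin
  // Upper-case letters become lower-case and vice versa.
  WriteLn(SwapCase('Hello World'));  // hELLO wORLD
end.
```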

String with values after inverting the case for each character. String with the characters converted in the routine.

Replaces (or appends) the specified number of bytes at a given position.

ReplaceSubstring is a procedure used to replace a portion of a string with the specified value.

StartPos contains the byte position in S where the substitution is performed. Byte positions are 1-based. When StartPos is larger than the number of bytes in S, the value in Insertion is appended to the existing string value.

Count contains the number of bytes in the string to be replaced in the routine. Count cannot exceed the number of bytes available starting at StartPos. No actions are performed in the routine when Count is 0 (zero) and the length of the Insertion parameter is 0 (zero).

ReplaceSubstring calls CompareMem to determine if the specified range in S and the value in Insertion have the same content. No actions are performed when they contain the same values.

The affected byte values in S are transferred by calling the System.Move routine. SetLength is called to update the new length for the string.
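
The replace and append behaviors can be sketched as follows. The positions are byte positions, and the sample text is ASCII so bytes and characters coincide.

```pascal
program ReplaceSubstringDemo;
{$mode objfpc}{$H+}
uses
  LazStringUtils;
var
  S: String;
begin
  S := 'Hello world';
  // Replace the 5 bytes starting at position 7 ('world').
  ReplaceSubstring(S, 7, 5, 'Lazarus');
  WriteLn(S);  // Hello Lazarus
  // A start position beyond the end appends the insertion.
  ReplaceSubstring(S, 100, 0, '!');
  WriteLn(S);  // Hello Lazarus!
end.
```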

String with values examined and updated in the routine. Initial byte position in the string where the substitution occurs. Number of bytes replaced in the string. Value inserted (or appended) to the value in the string.

Emulates the CASE .. OF statement for string values.

StringCase is an overloaded Integer function used to emulate the Pascal CASE .. OF statement.

AString contains the value compared to the elements in the ACase array.

ACase is an array of String values with the case constants used in the comparison to AString.

AIgnoreCase indicates whether case is significant when AString is compared to elements in ACase. When set to False (the default), case is significant. When set to True, a case-insensitive comparison is performed for the values using the CompareText routine.

APartial indicates whether the value in AString can be a partial match for the value in an array element. When set to False, AString must match the array element exactly to be considered a match. When set to True, any value in ACase which starts with the value in AString is considered a match.

The return value contains the ordinal position for the element in ACase which matches the value in AString. The return value is -1 if a match was not found for the value in AString.


// var SelOpt: Integer;

// returns 2
SelOpt := StringCase('Charlie', [ 'Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo' ]);

// returns 3 for case-insensitive partial match
SelOpt := StringCase('del', [ 'Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo' ], True, True);

// returns -1
SelOpt := StringCase('foo', [ 'Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo' ]);

See also: CompareText.

Ordinal position for the selected case constant, or -1 when a match is not found. Value compared to the array elements in ACase. Case constants which determine the selected case in the return value. True if case is ignored in the comparison. True if a partial match at the start of a selector is considered a match.

Returns True if P1 and P2 have the same content.

Returns False if either P1 or P2 are unassigned (contain Nil).

True when P1 and P2 have the same content, or are the same pointer. Pointer to characters compared in the routine. Pointer to characters compared in the routine.

Like StrScan but compares only the specified number of characters in MaxLen.

The return value is Nil when P is unassigned (contains Nil), or when a terminating null character is reached, or MaxLen characters have been examined, before c is found.

When c is located in P, the return value is a PChar pointer to the location where c was located.

Pointer to the character located in P, or Nil. Pointer to characters examined in the routine. Character to locate in the specified values. Maximum number of characters examined in the routine.

Writes the specified string value to the specified file name.

SaveStringToFile creates a TFileStream instance for the specified file name, and writes the content in AString to the stream. The stream is created with fmCreate mode. The Write method in the file stream is used to store the value in AString. The return value is True if the string value was successfully written to the stream instance.

Use LoadStringFromFile to read a string value from a file.

Added in LazUtils version 4.0.

See also: TFileStream.

Returns True if the string was successfully written to the file. String value stored in the routine. File name where the string value is stored.

Reads a string with the contents of the specified file name.

LoadStringFromFile creates a TFileStream instance used to load the values from the file name in AFileName. The stream is created with the fmOpenRead file mode, and its Read method is called to read the contents of the entire file stream.

The return value is a String type with the values read from the stream, and its size matches the size of the file stream. If the file stream has no content, the return value is an empty string ('').

Use SaveStringToFile to store a string to a specified file name.
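
A round-trip sketch using both routines, assuming LazUtils version 4.0 or later. The temporary file name comes from SysUtils.GetTempFileName, which is an illustrative choice and not something the unit requires.

```pascal
program StringFileDemo;
{$mode objfpc}{$H+}
uses
  SysUtils, LazStringUtils;
var
  FileName: String;
begin
  FileName := GetTempFileName;
  // Arguments are (AString, AFileName); True signals a successful write.
  if SaveStringToFile('Hello from LazUtils', FileName) then
    WriteLn(LoadStringFromFile(FileName))  // Hello from LazUtils
  else
    WriteLn('Could not write ', FileName);
  // Remove the temporary file again.
  DeleteFile(FileName);
end.
```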

Added in LazUtils version 4.0.

See also: TFileStream.

String with content read from the specified file name. Qualified file name with the string value read in the routine.

Deprecated. Use SysUtils.IsValidIdent in RTL instead.

LazIsValidIdent is a Boolean function used to determine if the name specified in Ident contains a valid identifier name. The identifier name can represent a Namespace, Package, Unit, Component, or a Member.

The return value is True when Ident contains a valid identifier name using the Pascal naming convention. The return value is False if Ident contains an empty string ('').

AllowDots indicates whether Period ('.') characters are allowed in the identifier name. The default value for the argument is False (not allowed).

StrictDots indicates whether a Period character in the identifier name causes name validation to restart at the next character in the identifier name. When set to False, a Period character is ignored during identifier name validation. The default value for the argument is False, and is significant only when AllowDots is set to True.

LazIsValidIdent examines each of the characters in Ident to ensure that the value uses the Pascal identifier naming convention. This requires the value to contain only alphanumeric characters or the Underscore ('_') character. It must start with an alphabetic character or the Underscore character; numeric characters cannot be used at the start of the identifier name. The remaining characters in the identifier name can be any alphanumeric character.

Case is not significant when determining the validity of an identifier name.
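
The validation rules above can be sketched with a few calls. The identifier values are illustrative; because the routine is deprecated, the compiler is expected to emit a deprecation warning for this program.

```pascal
program ValidIdentDemo;
{$mode objfpc}{$H+}
uses
  LazStringUtils;
begin
  WriteLn(LazIsValidIdent('MyUnit'));                 // TRUE
  WriteLn(LazIsValidIdent('_private1'));              // TRUE: leading underscore
  WriteLn(LazIsValidIdent('2Fast'));                  // FALSE: starts with a digit
  WriteLn(LazIsValidIdent(''));                       // FALSE: empty string
  WriteLn(LazIsValidIdent('System.SysUtils'));        // FALSE: dots not allowed
  WriteLn(LazIsValidIdent('System.SysUtils', True));  // TRUE: AllowDots enabled
end.
```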

LazIsValidIdent is used in the implementation of various tools in the Lazarus IDE. (IDEIntf, CodeTools, BuildIntf)

Deprecated since LCL version 3.6. Use SysUtils.IsValidIdent in RTL instead.

See also: IsValidIdent.

True if the specified value is a valid identifier name. Identifier name examined in the routine. True if period characters are allowed in the identifier name. True if the position of a period character is validated.

Defines the maximum length for shortened or "ellipsified" text.

Used in the ShortDotsLine routine.