Provides routines used to read XML content from a file, text file, or stream.

laz2_xmlread.pas provides routines used to read XML content from a file, text file, or stream. It is copied from the FCL unit xmlread.pp (SVN revision 15251), and adapted to use UTF-8 instead of widestrings by Mattias Gaertner.

Copyright (c) 1999-2000 by Sebastian Guenther, sg@freepascal.org
Modified in 2006 by Sergei Gorelkin, sergei_gorelkin@mail.ru
Converted to use UTF-8 instead of widestrings by Mattias Gaertner.

laz2_xmlread.pas is part of the lazutils package.

Represents error severity codes for XML read operations.

TErrorSeverity is an enumerated type that represents error severity codes for XML read operations. TErrorSeverity is the type used for the Severity property in EXMLReadError, and is used by TXMLReader when it reports an error through its error handler.

  • Error is a warning.
  • Error is an error (not a warning).
  • Error is an unrecoverable fatal error.

Represents options enabled when reading XML content.

TXMLReaderFlag is an enumerated type with values for options enabled when reading XML content. TXMLReaderFlag values are stored in the TXMLReaderFlags set type, and passed as an argument to the ReadXMLFile and ReadXMLFragment routines.

  • Indicates the < character is allowed in attribute values.
  • Indicates control characters and reserved XML characters are allowed in XML attribute values. Please note that enabling (and using) this option can result in an XML document that is technically not valid. Null characters are never allowed in an XML document. Control characters are allowed in XML 1.1, but must be encoded as character references, a requirement this flag circumvents.
  • Indicates if '--' is allowed in an XML comment.
  • Indicates whitespace is preserved when reading XML content.

Set type used to store options enabled when reading XML content.

TXMLReaderFlags is a set type used to store zero or more values from the TXMLReaderFlag enumeration. TXMLReaderFlags is the type passed as an argument to the ReadXMLFile and ReadXMLFragment routines.
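A minimal sketch of building a TXMLReaderFlags set and passing it to the stream variant of ReadXMLFile. It assumes the LazUtils units laz2_DOM and laz2_XMLRead are available, and that the flag permitting a literal < in attribute values is named xrfAllowLowerThanInAttributeValue (check the identifier against your copy of laz2_xmlread.pas). RootTitle is a hypothetical helper, not part of LazUtils.

```pascal
program ReaderFlagsDemo;

{$mode objfpc}{$H+}

uses
  SysUtils, Classes, laz2_DOM, laz2_XMLRead;

// Hypothetical helper: parses XML held in a string using the options in
// Flags, and returns the "title" attribute of the root element.
function RootTitle(const XML: string; Flags: TXMLReaderFlags): string;
var
  Stream: TStringStream;
  Doc: TXMLDocument;
begin
  Doc := nil;
  Stream := TStringStream.Create(XML);
  try
    ReadXMLFile(Doc, Stream, Flags);
    Result := Doc.DocumentElement.GetAttribute('title');
  finally
    Doc.Free;     // the caller owns the document created by ReadXMLFile
    Stream.Free;
  end;
end;

var
  Flags: TXMLReaderFlags;
begin
  Flags := [];  // the default: no options enabled
  // An escaped '<' (&lt;) is accepted without any flag
  WriteLn(RootTitle('<root title="a&lt;b"/>', Flags));
  // A literal '<' in an attribute value needs the (assumed) flag below
  Include(Flags, xrfAllowLowerThanInAttributeValue);
  WriteLn(RootTitle('<root title="a<b"/>', Flags));
end.
```

Starting from [] and adding options with Include keeps the default behavior explicit at the call site.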

Exception raised when reading XML content.

EXMLReadError is an Exception descendant raised when an error occurs while reading XML content. EXMLReadError provides properties that indicate the severity of the error, its error message, and the position where the error occurred.

EXMLReadError is used in the TXMLReader implementation class which de-serializes XML documents.

Severity for the XML read error.

Severity is a read-only TErrorSeverity property that identifies the severity of the error. The value in Severity is assigned when an error is encountered, and an exception is created in TXMLReader.

Use ErrorMessage to get the description for the error condition. Use Line and LinePos to determine the line and column numbers in the XML input source where the error was encountered.

Error message for the XML read error.

ErrorMessage is a read-only String property that contains a description for the error condition. The value in ErrorMessage is assigned when the exception is created in TXMLReader.

Use Severity to determine the error level for the exception. Use Line and LinePos to get the line and column numbers where the error occurred in the XML input source.
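The following sketch shows one way to catch EXMLReadError and build a location-aware report from its Severity, Line, LinePos, and ErrorMessage properties. CheckXML and SeverityText are hypothetical names; the lookup table assumes TErrorSeverity has exactly the three values described above, in that order.

```pascal
program ReadErrorDemo;

{$mode objfpc}{$H+}

uses
  SysUtils, Classes, laz2_DOM, laz2_XMLRead;

const
  // One label per TErrorSeverity value: warning, error, fatal error
  SeverityText: array[TErrorSeverity] of string = ('warning', 'error', 'fatal error');

// Hypothetical helper: returns '' when XML parses, otherwise a
// "line:column: severity: message" report taken from the EXMLReadError.
function CheckXML(const XML: string): string;
var
  Stream: TStringStream;
  Doc: TXMLDocument;
begin
  Result := '';
  Doc := nil;
  Stream := TStringStream.Create(XML);
  try
    try
      ReadXMLFile(Doc, Stream);
    except
      on E: EXMLReadError do
        Result := Format('%d:%d: %s: %s',
          [E.Line, E.LinePos, SeverityText[E.Severity], E.ErrorMessage]);
    end;
  finally
    Doc.Free;     // frees whatever ReadXMLFile left in Doc (Free is a no-op on nil)
    Stream.Free;
  end;
end;

begin
  WriteLn(CheckXML('<root><item></root>'));  // mismatched end tag
end.
```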

Line number in the XML content where the error occurred.

Offset in the line where the error occurred.

TPoint with the line and column numbers where the error occurred.
Value for the TPoint type.

Reads the content of an XML file into the specified XML document.

ReadXMLFile is an overloaded procedure used to read the XML content from the specified source. Overloaded variants are provided which use a file name, a Text file variable, or a TStream (with an optional base URI) as the input source.

When using the variant that specifies AFileName, the parameter must represent a valid file name. An exception is raised if AFileName does not exist on the local file system. The value in AFileName is converted to a File URI and used as the BaseURI in the XML Document.

The Flags parameter contains TXMLReaderFlag values enabled in the routine. The default value for the parameter is an empty set ([]), and indicates that no options are enabled by default. Values in the Flags parameter control the behavior enabled when XML content is de-serialized. Please note that use of the xrfAllowSpecialCharsInAttributeValue flag allows characters which are not normally allowed in an XML document. An XML document created with this option enabled (and used) cannot be exchanged and processed by an external validating XML processor. Technically, the XML document is invalid. WriteXML and WriteXMLFile in LazUtils will accept and process these values when configured to do so.

ReadXMLFile creates a TXMLReader instance that is used to read, parse, and store the values from the XML input source using its ProcessXML method. Values read in the routine are stored in the TXMLDocument instance in ADoc.
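A sketch of the usual calling pattern for the file-name variant: ReadXMLFile allocates the TXMLDocument, and the caller frees it in a finally block. It assumes the program may write a scratch file named sample.xml in the current directory; WriteSample and CountRootChildren are hypothetical helpers.

```pascal
program ReadFileDemo;

{$mode objfpc}{$H+}

uses
  SysUtils, Classes, laz2_DOM, laz2_XMLRead;

// Hypothetical helper: writes XML text to FileName.
procedure WriteSample(const FileName, XML: string);
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.Text := XML;
    Lines.SaveToFile(FileName);
  finally
    Lines.Free;
  end;
end;

// Hypothetical helper: counts the element children of the root element.
function CountRootChildren(const FileName: string): Integer;
var
  Doc: TXMLDocument;
  Node: TDOMNode;
begin
  Result := 0;
  Doc := nil;
  try
    ReadXMLFile(Doc, FileName);  // raises if the file is missing or malformed
    Node := Doc.DocumentElement.FirstChild;
    while Assigned(Node) do
    begin
      if Node.NodeType = ELEMENT_NODE then
        Inc(Result);
      Node := Node.NextSibling;
    end;
  finally
    Doc.Free;
  end;
end;

begin
  WriteSample('sample.xml', '<list><a/><b/><c/></list>');
  WriteLn(CountRootChildren('sample.xml'));  // 3
end.
```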

XML document populated in the routine.
File name with the XML content read in the routine.
Options enabled when reading the XML content.
Stream with the XML content read in the routine.
Base URI for XML content read in the routine.

Reads an XML fragment into the specified DOM Node.

ReadXMLFragment is an overloaded procedure used to read an XML Document Fragment from the specified input source. Overloaded variants are provided that use a file name, a Text file variable, or a TStream (with an optional base URI) as the input source for the routine. Values read in the routine are stored as children of the DOM Node specified in AParentNode.

When using the variant that specifies AFileName, the parameter must represent a valid file name. An exception is raised if AFileName does not exist on the local file system. The value in AFileName is converted to a File URI and used as the BaseURI in the XML Document.

The Flags parameter contains TXMLReaderFlag values enabled in the routine. The default value for the parameter is an empty set ([]), and indicates that no options are enabled by default. Values in the Flags parameter control the behavior enabled when XML content is de-serialized.

ReadXMLFragment creates a TXMLReader instance that is used to read, parse, and store the values from the XML input source using its ProcessFragment method.
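A sketch of the stream variant of ReadXMLFragment adding several sibling elements beneath an existing node. The parent is first attached to a fresh TXMLDocument, on the assumption that the parsed nodes are created by the parent's owner document; FragmentNames is a hypothetical helper.

```pascal
program ReadFragmentDemo;

{$mode objfpc}{$H+}

uses
  SysUtils, Classes, laz2_DOM, laz2_XMLRead;

// Hypothetical helper: reads Fragment (one or more sibling nodes) into a
// new <root> element and returns the names of root's children.
function FragmentNames(const Fragment: string): string;
var
  Doc: TXMLDocument;
  Root: TDOMElement;
  Stream: TStringStream;
  Node: TDOMNode;
begin
  Result := '';
  Doc := TXMLDocument.Create;
  Stream := TStringStream.Create(Fragment);
  try
    Root := Doc.CreateElement('root');
    Doc.AppendChild(Root);            // the parent must belong to a document
    ReadXMLFragment(Root, Stream);    // parsed nodes become children of Root
    Node := Root.FirstChild;
    while Assigned(Node) do
    begin
      Result := Result + Node.NodeName + ' ';
      Node := Node.NextSibling;
    end;
    Result := Trim(Result);
  finally
    Stream.Free;
    Doc.Free;                         // also frees Root and the parsed nodes
  end;
end;

begin
  WriteLn(FragmentNames('<first/><second/>'));  // first second
end.
```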

TDOMNode
DOM Node used as the parent for the XML fragment.
File name with the XML content read in the routine.
Options enabled while reading XML content.
Stream with the XML content read in the routine.
Base URI for XML content read in the routine.

Reads and stores a DTD file into the specified XML document.

XML document where the DTD is stored in the routine.
File name with the content for the DTD.
Stream with the content for the DTD.
Base URI for values read in the routine.

Represents option settings for the TDOMParser class.

TDOMParseOptions represents parser options used in the TDOMParser class. TDOMParseOptions contains properties that control the behavior of the DOM parser, and the TXMLReader that uses the parser.

Gets the value for the CanonicalForm property.
Value for the CanonicalForm property.

Sets the value for the CanonicalForm property.
New value for the CanonicalForm property.

Indicates if validation is performed when reading XML content.

Indicates if whitespace is preserved when reading XML content.

Indicates if entity references are expanded when reading XML values.

Indicates if comments are ignored when reading XML content.

Indicates if CDATA sections are treated as Text nodes.

Not used in the current implementation.

Indicates if Namespaces are handled when reading XML content.

Indicates if DTDs are handled when reading XML content.

Maximum number of characters allowed in expanded entity references.

Indicates whether XML Canonical Form is used for parsed XML content.

CanonicalForm is a Boolean property which indicates whether an XML parser should convert XML content to Canonical XML form. This is also referred to as Normal form, and means that the following are used in the parsed XML document:

  • UTF-8 encoding is used.
  • An End-of-line is represented using the newline character (#10).
  • Whitespace in attribute values is normalized.
  • Entity references and non-special character references are expanded.
  • CDATA sections are replaced with their character content.
  • Empty elements are encoded as start and end element tag pairs, not using the special empty element syntax.
  • Default attributes are explicitly declared.
  • Superfluous namespace declarations are deleted.

The property value is True only when the value most recently assigned to CanonicalForm is True and all of the following properties are also True:

  • ExpandEntities
  • CDSectionsAsText
  • Namespaces
  • PreserveWhitespace

Changing the property value to True causes the following properties to be set to True:

  • ExpandEntities
  • CDSectionsAsText
  • Namespaces
  • PreserveWhitespace

The value in CanonicalForm is used in the constructor for a TXMLReader class instance which uses an XML parser with these parse options.
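The getter and setter behavior described above can be observed with a short program. This sketch assumes the LazUtils unit laz2_XMLRead is available; CanonicalAfter is a hypothetical helper.

```pascal
program CanonicalFormDemo;

{$mode objfpc}{$H+}

uses
  laz2_XMLRead;

// Hypothetical helper: sets CanonicalForm on a fresh parser (which also
// turns on ExpandEntities, CDSectionsAsText, Namespaces and
// PreserveWhitespace), optionally clears PreserveWhitespace afterwards,
// and returns the resulting CanonicalForm value.
function CanonicalAfter(ClearWhitespace: Boolean): Boolean;
var
  Parser: TDOMParser;
begin
  Parser := TDOMParser.Create;
  try
    Parser.Options.CanonicalForm := True;
    if ClearWhitespace then
      Parser.Options.PreserveWhitespace := False;
    Result := Parser.Options.CanonicalForm;
  finally
    Parser.Free;
  end;
end;

begin
  WriteLn(CanonicalAfter(False));  // TRUE
  WriteLn(CanonicalAfter(True));   // FALSE: a required option was cleared
end.
```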

Values that control how XML content is stored in a DOM sub-tree.

TXMLContextAction is an enumeration type with values that control how XML content is stored in the DOM sub-tree for a specified context. TXMLContextAction represents the value passed as an argument to the TDOMParser.ParseWithContext method.

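  • xaAppendAsChildren: The DOM Fragment is appended to the child nodes of the context node.
  • xaReplaceChildren: The DOM Fragment replaces the child nodes of the context node.
  • xaInsertBefore: The DOM Fragment is inserted before the context node, in the parent of the context node.
  • xaInsertAfter: The DOM Fragment is inserted after the context node, in the parent of the context node.
  • xaReplace: The DOM Fragment replaces the context node in its parent node.

Event type signalled when an error occurs while processing XML content.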

TXMLErrorEvent is an object procedure type that specifies an event signalled when an error occurs while reading and parsing XML content. TXMLErrorEvent is the type used for the TDOMParser.OnError event handler, and allows the parser and its TXMLReader class instance to share error information and control.
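A sketch of an OnError handler: any object method with the TXMLErrorEvent signature can be assigned. Here TErrorLog is a hypothetical class that counts and prints the errors it receives while Validate is enabled. The sketch assumes, as in the FCL reader this unit was copied from, that validity errors are reported to OnError without aborting the parse, while fatal errors are still raised as exceptions.

```pascal
program OnErrorDemo;

{$mode objfpc}{$H+}

uses
  SysUtils, Classes, laz2_DOM, laz2_XMLRead;

type
  // Hypothetical listener; HandleError matches the TXMLErrorEvent signature
  TErrorLog = class
  public
    Count: Integer;
    procedure HandleError(E: EXMLReadError);
  end;

procedure TErrorLog.HandleError(E: EXMLReadError);
begin
  Inc(Count);
  WriteLn(Format('line %d, col %d: %s', [E.Line, E.LinePos, E.ErrorMessage]));
end;

// Hypothetical helper: parses XML with validation enabled and returns how
// many errors the OnError handler received.
function CountReportedErrors(const XML: string): Integer;
var
  Log: TErrorLog;
  Parser: TDOMParser;
  Src: TXMLInputSource;
  Doc: TXMLDocument;
begin
  Doc := nil;
  Log := TErrorLog.Create;
  Parser := TDOMParser.Create;
  Src := TXMLInputSource.Create(XML);
  try
    Parser.Options.Validate := True;
    Parser.OnError := @Log.HandleError;
    Parser.Parse(Src, Doc);
    Result := Log.Count;
  finally
    Doc.Free;
    Src.Free;
    Parser.Free;
    Log.Free;
  end;
end;

const
  DTD = '<!DOCTYPE r [<!ELEMENT r EMPTY>]>';
begin
  WriteLn(CountReportedErrors(DTD + '<r/>'));         // valid document
  WriteLn(CountReportedErrors(DTD + '<r><x/></r>'));  // r must be empty; x is undeclared
end.
```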

Exception for the event notification.

Represents the input source for XML content.

TXMLInputSource is a class used to represent an input source with XML content. TXMLInputSource is based on the DOM InputSource interface from DOM Level 3. It is not a fully compliant implementation of the interface. It does not implement separate byteStream and characterStream properties; the Stream and StringData properties are provided instead. In addition, it does not implement the Encoding property. All content in the input source is expected to use the UTF-8 encoding.

Use the overloaded constructors to create an input source with either a string or a stream as the storage for the XML content.

A TXMLInputSource instance is passed as an argument to the TDOMParser.Parse and TDOMParser.ParseWithContext methods. It is subsequently passed to the TXMLReader instance that reads XML content from the input source.

Constructor for the class instance.

Create is the overloaded constructor for the class instance. The variants allow the content for the input source to be assigned using a String or a TStream descendant. Both variants call the inherited constructor to initialize the class instance. The values in AStringData and AStream are assigned to their corresponding properties.
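A sketch using both constructor variants. RootNameOf is a hypothetical helper; the sketch assumes the input source does not take ownership of the stream passed to it, so the stream is freed separately.

```pascal
program InputSourceDemo;

{$mode objfpc}{$H+}

uses
  SysUtils, Classes, laz2_DOM, laz2_XMLRead;

// Hypothetical helper: parses an input source and returns the name of its
// document element.
function RootNameOf(Src: TXMLInputSource): string;
var
  Parser: TDOMParser;
  Doc: TXMLDocument;
begin
  Doc := nil;
  Parser := TDOMParser.Create;
  try
    Parser.Parse(Src, Doc);
    Result := Doc.DocumentElement.NodeName;
  finally
    Doc.Free;
    Parser.Free;
  end;
end;

var
  FromString, FromStream: TXMLInputSource;
  Data: TStringStream;
begin
  Data := TStringStream.Create('<farewell>bye</farewell>');
  FromString := TXMLInputSource.Create('<greeting>hi</greeting>');  // StringData variant
  FromStream := TXMLInputSource.Create(Data);                       // Stream variant
  try
    WriteLn(RootNameOf(FromString));  // greeting
    WriteLn(RootNameOf(FromStream));  // farewell
  finally
    FromStream.Free;
    FromString.Free;
    Data.Free;  // assumed: the input source does not own the stream
  end;
end.
```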

Stream with the content for the XML input source.
Values used as the content for the XML input source.

Stream with the XML content for the input source.

Stream is a read-only TStream property that contains the TStream descendant with the XML content for the input source. The value in Stream is assigned in the Create constructor.

String with the XML content for the input source.

StringData is a read-only String property that contains the XML content for the input source. The value in StringData is assigned in the Create constructor.

Base URI for content in the XML input source.

BaseURI is a String property that represents the absolute resource identifier used to resolve relative URIs found in the XML input source. BaseURI contains the value passed as an argument to the ReadXMLFile, ReadXMLFragment, and ReadDTDFile routines. BaseURI is supplied to TXMLReader and TDOMParser to resolve relative URIs when processing the XML input source.

System Identifier for content in the XML input source.

SystemID is a String property that represents the System Identifier for content in the XML input source. SystemID normally contains a URL for the resource.

The initial value in SystemID is the BaseURI passed as an argument to the ReadXMLFile, ReadXMLFragment, and ReadDTDFile routines. SystemID is updated when TXMLReader is used to resolve entity references, notations, or document types in the XML content for the input source.

Public Identifier for content in the XML input source.

PublicID is a String property that represents the Public Identifier for content in the XML input source. PublicID contains a value in the following format:

[Prefix]//[OwnerID]//[TextClass] [TextDescription]//[Language]//[DisplayVersion]

Where the components have the following values and meanings:

  • [Prefix]: One of the values '-', '+', or 'ISO'.
  • [OwnerID]: A value like 'W3C' or 'mozilla.org'.
  • [TextClass]: Values like 'DTD' or 'NOTATION'.
  • [TextDescription]: Values like 'HTML 4.01' or 'DocBook XML V5.0'.
  • [Language]: Values like 'EN', 'FR', 'DE'.
  • [DisplayVersion]: Optional values.

For example, the PublicID for DocBook version 5 is:

-//OASIS//DTD DocBook V5.0//EN

Use SystemID to get the URI (or URL) for the resource.
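PublicID holds the identifier as a plain string, so splitting it into the components listed above is left to the caller. The sketch below is a hypothetical helper (SplitPublicID, NextField, and TPublicIDParts are not part of LazUtils) that uses only System routines and assumes a well-formed identifier.

```pascal
program PublicIDDemo;

{$mode objfpc}{$H+}

type
  // Hypothetical record for the components of a public identifier
  TPublicIDParts = record
    Prefix, OwnerID, TextClass, TextDescription, Language, DisplayVersion: string;
  end;

// Returns the text before the next '//' in Rest and removes it (and the
// separator) from Rest; returns all of Rest when no separator remains.
function NextField(var Rest: string): string;
var
  P: Integer;
begin
  P := Pos('//', Rest);
  if P = 0 then
  begin
    Result := Rest;
    Rest := '';
  end
  else
  begin
    Result := Copy(Rest, 1, P - 1);
    Delete(Rest, 1, P + 1);  // field (P - 1 chars) plus the 2-char separator
  end;
end;

// Hypothetical helper: splits a PublicID such as
// '-//OASIS//DTD DocBook V5.0//EN' into its components.
function SplitPublicID(const ID: string): TPublicIDParts;
var
  Rest, TextPart: string;
  SpacePos: Integer;
begin
  Rest := ID;
  Result.Prefix := NextField(Rest);
  Result.OwnerID := NextField(Rest);
  TextPart := NextField(Rest);
  Result.Language := NextField(Rest);
  Result.DisplayVersion := NextField(Rest);  // '' when absent
  // [TextClass] and [TextDescription] are separated by the first space
  SpacePos := Pos(' ', TextPart);
  if SpacePos > 0 then
  begin
    Result.TextClass := Copy(TextPart, 1, SpacePos - 1);
    Result.TextDescription := Copy(TextPart, SpacePos + 1, MaxInt);
  end
  else
  begin
    Result.TextClass := TextPart;
    Result.TextDescription := '';
  end;
end;

var
  Parts: TPublicIDParts;
begin
  Parts := SplitPublicID('-//OASIS//DTD DocBook V5.0//EN');
  WriteLn(Parts.OwnerID);          // OASIS
  WriteLn(Parts.TextDescription);  // DocBook V5.0
  WriteLn(Parts.Language);         // EN
end.
```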

Implements a parser used to de-serialize XML content into DOM Nodes.

TDOMParser is a class which implements a DOM Parser component. TDOMParser provides methods to parse XML content specified by its URI or using an XML input source. TDOMParser creates DOM nodes needed to represent the XML content, and stores the nodes in a TXMLDocument instance or a DOM content node.

Use the Options property to enable specific features or behavior in the parser. Of particular importance is the Validate option, which enables validation when processing XML content in the parser.

DOM (Document Object Model) does not specify an interface for a parser class. TDOMParser utilizes the TXMLReader class to perform the actions required to convert and de-serialize XML content. TXMLReader closely resembles the XMLReader class defined in SAX (Simple API for XML), but uses its own methods instead of Handler class instances to process various DOM Node types. TXMLReader is compliant with the XML 1.0 specification.

Constructor for the class instance.

Create is the constructor for the class instance. Create allocates resources needed for the Options property.

Destructor for the class instance.

Destroy is the destructor for the class instance. Destroy frees resources allocated to the Options property, and calls the inherited destructor.

Parses the input source and updates the specified XML document.

Parse is a procedure used to read and process the XML content in the specified XML input source. Parse creates DOM nodes needed to represent the XML content, and adds them to the specified TXMLDocument.

Parse creates TXMLReader and TXMLCharSource class instances that are used to convert and process the XML content in Src.

ADoc is an output parameter that represents the TXMLDocument created and updated in the XML reader class instance.
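A sketch of Parse with an input source built from a string. It assumes the TDOMParseOptions property that drops comments is named IgnoreComments; FirstChildName is a hypothetical helper. The caller, not the parser, frees the document returned in ADoc.

```pascal
program ParseDemo;

{$mode objfpc}{$H+}

uses
  SysUtils, laz2_DOM, laz2_XMLRead;

// Hypothetical helper: parses XML from a string and returns the node name
// of the first child of the document element.
function FirstChildName(const XML: string; SkipComments: Boolean): string;
var
  Parser: TDOMParser;
  Src: TXMLInputSource;
  Doc: TXMLDocument;
begin
  Doc := nil;
  Parser := TDOMParser.Create;
  Src := TXMLInputSource.Create(XML);
  try
    Parser.Options.IgnoreComments := SkipComments;
    Parser.Parse(Src, Doc);  // Doc is created by the reader; we must free it
    Result := Doc.DocumentElement.FirstChild.NodeName;
  finally
    Doc.Free;
    Src.Free;
    Parser.Free;
  end;
end;

begin
  WriteLn(FirstChildName('<r><!--note--><a/></r>', False));  // #comment
  WriteLn(FirstChildName('<r><!--note--><a/></r>', True));   // a
end.
```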

XML input source with content parsed in the method.
XML Document updated in the method.

Parses XML content from the specified URI into the XML document.

ParseURI is a procedure used to read and process the XML content from the specified URI. ParseURI creates DOM nodes needed to represent the XML content, and adds them to the specified TXMLDocument.

ParseURI creates TXMLReader and TXMLCharSource class instances that are used to convert and process the XML content in URI. The value in URI is resolved to determine the absolute URI used when processing the XML content.

ADoc is an output parameter that represents the TXMLDocument created and updated in the XML reader class instance.
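A sketch of ParseURI with a local file. It assumes the FCL URIParser unit and its FilenameToURI function are available to build the file: URI, and that the program may write uri_demo.xml in the current directory; RootNameViaURI is a hypothetical helper.

```pascal
program ParseURIDemo;

{$mode objfpc}{$H+}

uses
  SysUtils, Classes, URIParser, laz2_DOM, laz2_XMLRead;

// Hypothetical helper: saves XML to FileName, then loads it back through
// TDOMParser.ParseURI and returns the name of the document element.
function RootNameViaURI(const FileName, XML: string): string;
var
  Lines: TStringList;
  Parser: TDOMParser;
  Doc: TXMLDocument;
begin
  Lines := TStringList.Create;
  try
    Lines.Text := XML;
    Lines.SaveToFile(FileName);
  finally
    Lines.Free;
  end;

  Doc := nil;
  Parser := TDOMParser.Create;
  try
    // ParseURI takes a URI, so convert the absolute path to a file: URI
    Parser.ParseURI(FilenameToURI(ExpandFileName(FileName)), Doc);
    Result := Doc.DocumentElement.NodeName;
  finally
    Doc.Free;
    Parser.Free;
  end;
end;

begin
  WriteLn(RootNameViaURI('uri_demo.xml', '<catalog/>'));  // catalog
end.
```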

URI with the content parsed in the method.
XML document updated in the method.

Parses the XML input source into the specified DOM context Node.

ParseWithContext is a TDOMNode function used to read and process XML content in the specified XML input source. ParseWithContext creates DOM nodes needed to represent the XML content.

The newly created DOM nodes are accumulated in a temporary DOM Document Fragment, and its child nodes are ultimately added to the DOM node in Context. The insertion point for the new nodes is determined using the Action argument. When Action contains xaAppendAsChildren or xaReplaceChildren, the nodes are added (or replaced) as child nodes in Context. When Action contains xaInsertBefore, xaInsertAfter, or xaReplace, the nodes are added (or replaced) as child nodes in the parent node for Context.

ParseWithContext creates a TXMLReader that is used to convert and process the XML content in the XML input source. The return value contains the first child node in the DOM Document Fragment created in the method.

An EDOMNotSupported exception is raised when Action contains xaReplaceChildren and the target node is also a DOM Document node. Replacing the DOM Document is not supported in TDOMParser.

An EDOMHierarchyRequest exception is raised when the target node is not a DOM Element or Document Fragment node.
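A sketch of ParseWithContext that shows where the parsed nodes land for three of the TXMLContextAction values. RootChildrenAfter is a hypothetical helper; the expected results in the comments follow the insertion rules described above.

```pascal
program ParseWithContextDemo;

{$mode objfpc}{$H+}

uses
  SysUtils, laz2_DOM, laz2_XMLRead;

// Hypothetical helper: builds <root><anchor/></root>, parses Fragment
// relative to <anchor> using Action, and returns the names of root's children.
function RootChildrenAfter(const Fragment: string; Action: TXMLContextAction): string;
var
  Doc: TXMLDocument;
  Root, Anchor: TDOMElement;
  Parser: TDOMParser;
  Src: TXMLInputSource;
  Node: TDOMNode;
begin
  Result := '';
  Doc := TXMLDocument.Create;
  Parser := TDOMParser.Create;
  Src := TXMLInputSource.Create(Fragment);
  try
    Root := Doc.CreateElement('root');
    Doc.AppendChild(Root);
    Anchor := Doc.CreateElement('anchor');
    Root.AppendChild(Anchor);
    // The target node (Anchor, or Root for the insert/replace actions) must
    // be an Element or Document Fragment node; the return value is ignored
    Parser.ParseWithContext(Src, Anchor, Action);
    Node := Root.FirstChild;
    while Assigned(Node) do
    begin
      Result := Result + Node.NodeName + ' ';
      Node := Node.NextSibling;
    end;
    Result := Trim(Result);
  finally
    Src.Free;
    Parser.Free;
    Doc.Free;
  end;
end;

begin
  WriteLn(RootChildrenAfter('<new/>', xaAppendAsChildren));  // anchor (new is inside it)
  WriteLn(RootChildrenAfter('<new/>', xaInsertBefore));      // new anchor
  WriteLn(RootChildrenAfter('<new/>', xaInsertAfter));       // anchor new
end.
```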

TDOMNode
Not used in the current implementation.
XML input source for the operation.
DOM Node that provides the context for DOM nodes read in the method.
Load and Save action requested for the specified context.

Options for the DOM parser.

Options is a read-only TDOMParseOptions property that represents option settings used when processing XML content in the parser.

Event handler signalled when an error occurs while reading XML content.

OnError is a TXMLErrorEvent property that implements the event handler signalled when an error occurs while reading or processing XML content in the parser.

Decoder used to convert encoded values in XML content.

Pointer to the context for the decoder.

Routine used to decode content using the specified buffer arguments.

Decode is the address of the Integer function used to decode content using the specified buffer arguments.

Number of characters decoded in the routine.
Buffer with values decoded in the routine.
Length of the input buffer.
Output buffer for the values decoded in the routine.
Length of the output buffer.

Optional routine used to perform cleanup actions for the decoder Context.

Cleanup is the address of the procedure used to perform cleanup actions for the decoder Context.

Pointer to the Context for the decoder.

Function type used to get a decoder routine for the specified target encoding.

TGetDecoderProc is a Boolean function type used to get the Decoder routine that converts values in an arbitrary encoding to the encoding specified in AEncoding.

TGetDecoderProc is the type passed as an argument to the RegisterDecoder routine.

True if a decoder routine is found for the specified target encoding.
Encoding name for the converted values from the decoder routine.
Returns a record with the decode and clean-up routines for a given context.

Registers the procedure used to decode values in a specific encoding.
Procedure to register in the routine.