Commit Graph

10 Commits

Author SHA1 Message Date
sergei
c7259969ce + fcl-xml, implemented TDOMNode.BaseURI property.
* Moved element loading procedure from xmlread.pp to dom.pp, speeds things up a bit.

git-svn-id: trunk@20558 -
2012-03-21 22:19:27 +00:00
sergei
6adf381867 * fcl-xml, upgrade to comply with XML 1.0 Fifth Edition. This makes naming rules for xml 1.0 identical to ones for xml 1.1.
git-svn-id: trunk@20422 -
2012-02-24 06:25:32 +00:00
sergei
92e0f43f5d * Changed Utf8String to AnsiString, fixes compilation after merging of cpstring.
git-svn-id: trunk@19639 -
2011-11-14 22:58:48 +00:00
sergei
8f29def76e * xmlread.pp: when IgnoreComments=True, merge together text nodes that precede and follow the skipped comment. With this fix, the reader finally produces normalized documents in all modes, so remove the corresponding cheat from the testing program (xmlts.pp).
git-svn-id: trunk@15442 -
2010-06-15 16:13:42 +00:00
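
A minimal sketch of the text-node merging described in commit 8f29def76e, written in Python for illustration only. It is not the code from xmlread.pp: the function name `build_children` and the `(kind, value)` event tuples are hypothetical, and the real reader works on DOM nodes rather than tuples.

```python
def build_children(events, ignore_comments=True):
    """Build a child list from (kind, value) events.

    When comments are skipped, the text before and after the skipped
    comment is merged, so the result never holds two adjacent text nodes.
    """
    children = []
    for kind, value in events:
        if kind == "comment" and ignore_comments:
            # Dropping the comment makes its neighbours adjacent.
            continue
        if kind == "text" and children and children[-1][0] == "text":
            # Extend the preceding text node instead of appending a new one.
            children[-1] = ("text", children[-1][1] + value)
        else:
            children.append((kind, value))
    return children
```

With comments kept, the same input yields three separate nodes, which is why the merge is only needed when IgnoreComments=True.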
sergei
f387d7cb2d * Only EXMLReadError is expected to be thrown from a negative test, any other exception is unexpected and should render the test as failed.
git-svn-id: trunk@13860 -
2009-10-15 21:27:31 +00:00
michael
213f8a41c7 * Large patch from Sergei Gorelkin:
xmlutils.pp, names.inc:
    * exclude colon from the NameChar bitmap and handle it in code.

  dom.pp:
    + TDOMText.IsElementContentWhitespace now implemented completely.
    * Attributes created by TDOMElement.SetAttribute get their
     OwnerElement property assigned properly
    * Attributes replaced by TDOMNamedNodeMap.SetNamedItem get their
     OwnerElement reset to nil
    * TDOMElement.SetAttributeNode does not destroy the attribute when it
     is being replaced by itself
    * Most node boolean properties collected into a single FFlags field
     to reduce memory requirements.

  xmlread.pp:
    + Syntax-level support of namespaces: handle colons in names, check
     correct qualified name syntax, prohibit colons in entity/notation/PI
     names and ID/IDREF attribute values (all this only happens when
     Options.Namespaces is set to True - not by default).
    * Reaching end of input while parsing the Ignore Section is a fatal
     error because parameter entities are not recognized there.
    * Reaching end of input while parsing entity value literal that was
     started in a parameter entity aborts immediately instead of hopelessly
     scanning the whole document up to its end.
    * Fixed parsing duplicate Element declarations. The content models of
     subsequent declarations are now discarded as they should - not
     appended to the existing model.
    * Fixed parsing duplicate Attlist declarations. In addition to dropping
     the attribute declaration itself, do not modify the corresponding
     element declaration and suppress 'Duplicate ID attribute' and
     'Duplicate NOTATION attribute' validation errors.
    * Fixed error position in cases when attribute value lacks the closing
     quote.
    * Some refactoring in order to reduce number of WideString vars and code
     size (some SkipX and ExpectX merged into SkipX(required: Boolean)).
    * TXMLCharSource.FLocation record replaced by single integer FLineNo
     because LinePosition is always calculated.
    * TXMLCharSource.FCursor replaced by local var.
    * TXMLReader.NameIs changed to a more general BufEquals(), it eliminates
     TXMLReader.GetString and some WideString variables.

  tests/xmlts.pp:
    * Ignored tests do not change suite conformance state.

  tests/testgen.pp
    * Added a forgotten semicolon.

git-svn-id: trunk@11869 -
2008-10-08 18:06:52 +00:00
michael
d812fa0c92 * Patch from Sergei Gorelkin:
* excludes #$FFFE and #$FFFF from allowed XML 1.1 name chars, so
  IsXmlName result is correct when its argument comes not from the
  parser.
xmlread.pp:
+ Two new parsing options, Namespaces and ResolveExternals (not
  functional yet but needed to proceed).
* Fixed checking of WFC [28a], forces fatal error as soon as possible
  and prevents parsing of further (potentially malicious) data.
  Hopefully it now truly complies with the specs rather than just
  satisfying the tests.
* In entity value literals, nesting is checked by entity, not by the
  input source (consistent with other places).
- Saving FCursor around attribute default value isn't necessary because
  FCursor is always nil while parsing the DTD.
* TList's changed to more lightweight TFPList's.
* Changed once more (probably for the last time) how the standalone
  percent sign in parameter entity declarations is recognized. The
  rationale is that FCurChar is no longer out of sync with
  FSource.FBuf^ and therefore may be removed.

tests/xmlts.pp and tests/README:
+ Added support for the latest XML test suite (by skipping tests
  targeted for the upcoming fifth edition of XML specs).
+ 'Namespaces' option is passed to the parser.
* README updated with the latest testsuite URL.

git-svn-id: trunk@11303 -
2008-07-01 19:14:56 +00:00
michael
77b38b6be5 * Patch from Sergei Gorelkin:
fcl-xml/src/dom.pp: resolved a number of Level 1 conformance issues:

  * Node.Normalize always deletes empty text nodes
  * Node.Normalize is recursive into Attributes
  * Node.InsertBefore corrected exception code in case when RefChild is
    not one of node's children
  + Node.InsertBefore added missing check for possible cycle in tree
  + Node.AppendChild and Node.InsertBefore added checking type of NewChild
  + CloneNode enabled for Fragment and Entity
  - CloneNode deleted for DocumentType (w3 specs directly prohibit cloning
    it between documents, and cloning within one document is claimed
    'implementation specific' - but makes no sense).
  + Node.ImportNode is now working

  * Uncommented Level 2 node properties (NamespaceURI, localName and
    Prefix), this caused a name clash and a lot of function argument
    renames.

  fcl-xml/src/xmlutils.pp:

  + overloaded IsXmlName() that accepts PWideChars

  fcl-xml/src/xmlconf.pp

  * Applied a fix similar to xmlcfg.pp for Mantis #10554

  fcl-xml/src/xmlread.pp:

  * Major: Got errors reported at correct locations for all 1600+ negative
    tests. Easy to say, but it required modifying almost every second
    line of code.
  * TContentParticle references an existing element definition instead of
    storing its own name (this allows content model matching without
    string comparisons).
  * Resorted to old-style 'object' for TElementValidator and to plain
    procedures for decoders (this allows dropping almost all related
    memory management).
  * Moved parameter entity detection from char to token level, this
    simplifies things a bit.
  + Added second level of buffering to input source (a step towards
    supporting arbitrary encodings).
  * The main parsing loop contains no implicit exception frames now.


  fcl-xml/src/xmlwrite.pp

  * Replaced the stupid indenting algorithm with a simple rule: "Do not
    write indents adjacent to text nodes". Now it does not make a mess
    out of the documents which were parsed with PreserveWhitespace=True.
  * Use specialized node properties instead of generic ones, this
    eliminates WideString copies and results in almost 2x performance
    boost in Windows.
  * Even more performance:
    * Write line endings together with indents -> half as many calls.
    * Increase slack in buffer and write strings with known length (i.e.
      most of markup) without overflow checking.

  fcl-xml/tests/xmlts.pp:

  * Use parser options instead of dedicated procedure to 'canonicalize'
    documents, the parser has become mature enough to do that.
  * Fatal error in non-valid category is a test failure, as well as
    validation error alone in not-well-formed category.

  fcl-xml/src/README

  * Brought a bit up to date

  fcl-xml/tests/README

  + Added testsuite errata/issues

git-svn-id: trunk@10314 -
2008-02-13 10:31:09 +00:00
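
The indenting rule from the xmlwrite.pp notes of commit 77b38b6be5 ("Do not write indents adjacent to text nodes") can be illustrated with a small Python sketch. This is an assumption-laden illustration, not the xmlwrite.pp implementation: the `Element` class and `serialize` function are hypothetical, text is written without escaping, and attributes are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    children: list = field(default_factory=list)  # items are Element or str


def serialize(node, depth=0):
    """Serialize a tree, never writing an indent next to a text node."""
    if isinstance(node, str):
        return node  # text is emitted verbatim (no escaping in this sketch)
    parts = ["<" + node.name + ">"]
    prev = None
    for child in node.children:
        # Indent before a child element only if the previous sibling is not text.
        if isinstance(child, Element) and not isinstance(prev, str):
            parts.append("\n" + "  " * (depth + 1))
        parts.append(serialize(child, depth + 1))
        prev = child
    # Indent before the end tag only if the last child is not text.
    if node.children and not isinstance(prev, str):
        parts.append("\n" + "  " * depth)
    parts.append("</" + node.name + ">")
    return "".join(parts)
```

Under this rule, element-only content is pretty-printed, while mixed content such as `<p>hi <b>x</b> there</p>` is written back unchanged, so whitespace preserved by the parser is not disturbed. The line ending and the indent are appended as one string, in the spirit of the "write line endings together with indents" note.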
michael
4e6cd59d8c * Patch by Sergei Gorelkin:
xmlread.pp:
  * As a step towards SAX-based validation, element content validator is
  rewritten from scratch, so it now accepts child elements one by
  one. This also enables reporting location of validation errors (however,
  most locations aren't reported correctly yet).
  * More straightforward handling of comments and PIs in internal subset.
  * Attribute text is handled separately from element text.
  * Unified handling of fatal and validation errors.

  xmlutils.pp:
  * Removed auto widechar->char conversions. These should have been a part
  of fix for #9528, but were not noticed at that moment.

  dom.pp:
  * Reworked 'ugly workarounds' in node removal code.
  + Element nodes remove themselves from document list of IDs, so no invalid pointers are left around.

  xmlts.pp:
  * Corrected validation diagnostics (display the first message and ignore subsequent ones).
  * Validation error alone in a not-well-formed case is a test failure.

git-svn-id: trunk@8896 -
2007-10-21 16:09:41 +00:00
michael
645b0d2cb1 * Patch from Sergei Gorelkin
+ DTD validation
  + Correct reporting of the position of most fatal errors
  + TDOMDocument.CreateElement and others check their arguments for validity
    (INVALID_CHARACTER_ERR is reported where specification says)
  + property TDOMAttr.DataType
  + implemented TDOMDocument.GetElementByID
  * Common code moved to xmlutils.pp
  * whitespace in PublicID literals is normalized

git-svn-id: trunk@6749 -
2007-03-08 09:40:00 +00:00