- XMLVersion property has been made available in TDOMDocument, removed inheritance check.
* Improved parsing the content model a bit
* Cleanup comments
git-svn-id: trunk@16987 -
* Namespace handling rewritten to fit into XMLReader's own data structures.
* Remaining TDOMElementDef's replaced by TElementDecl.
* Removed DoAttrText(), it has become obsolete.
* Create objects that are needed for namespace processing only if actually doing namespace processing, reduces memory requirements.
* Improved TAttributeDef construction.
git-svn-id: trunk@16230 -
* TXMLNodeType, TNodeData and TAttrDataType moved to xmlutils.pp, so they can be shared between dom, xmlread and dtdmodel.
* TContentParticle class moved from xmlread.pp to dtdmodel.pp.
* dom.pp and xmlread.pp switched to DOM-independent representation of DTD element declarations and attribute defaults.
git-svn-id: trunk@16221 -
* TContentParticle only stores and compares a pointer to an element definition, a particular type of that definition doesn't matter - so change it to TObject.
* In case of mixed content model, assign Type and Quantity to the root content particle, and process it the same way as element-only models.
* While parsing, store entities in THashTable instead of TDOMNamedNodeMap.
* Assign Prefix to element and attribute NodeData.
git-svn-id: trunk@16208 -
* ExpectAttValue() now builds both plain string value and first-level node chain.
* Normalize() procedure moved to xmlutils.pp and made publicly available as BufNormalize.
* ParseLiteral() cleaned of attribute-specific code; at this point it is clear it won't be used for parsing attributes.
git-svn-id: trunk@16186 -
+ state transitions needed to report start/end element events correctly
+ procedures for maintaining attribute data
* excluded FCursor from attribute value parsing
git-svn-id: trunk@16161 -
later: the same as i386/darwin, except
a) uses the non-fragile Objective-C ABI/runtime
b) does not require stubs for direct calls/jumps (not required for
i386/darwin under 10.6 and later either, but still generated
there for backwards compatibility)
c) only the same packages are enabled as for ARM/Darwin
d) MacOSAll is compiled specifically for the iPhoneSimulator SDK
This target also defines the symbol "darwin" apart from the target
name "iphonesim" for source code compatibility reasons.
git-svn-id: trunk@16065 -
* Parse entities by creating another instance of TXMLReader. This is much more straightforward than saving/restoring context of the existing reader.
* Fixed version setting logic so that ReadXMLFragment procedures are now suitable to read entities:
accept streams conforming to extParsedEnt [78], correctly read fragments into documents having version=1.1.
git-svn-id: trunk@16046 -
* r15443 changed the node class with biggest instance size from TDOMAttr to TDOMEntity. Changed that in TDOMDocument constructor, too. Otherwise nodes created with TDOMEntity.CloneNode will leak (they cannot be inserted into tree).
* Do not restore default attributes during document destruction.
* Also added a general check that raises exception if someone tries to allocate from node pool during destruction.
* Fixed replaceChild() method: it was deleting node if that node was replaced by itself.
+ Test for replaceChild.
git-svn-id: trunk@16010 -
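The replaceChild() fix above can be illustrated with a minimal sketch. It is written in Python with a toy Node class (a hypothetical stand-in, not the TDOMNode code from dom.pp) and only shows the guard that makes self-replacement a no-op; how the real method is structured is not stated in this log.

```python
class Node:
    """Toy tree node used for illustration only (not TDOMNode)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def append_child(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def replace_child(self, new_child, old_child):
        if old_child not in self.children:
            raise ValueError("old_child is not a child of this node")
        # Guard: replacing a node by itself must leave the tree untouched.
        if new_child is old_child:
            return old_child
        # Detach new_child from its current parent before inserting it.
        if new_child.parent is not None:
            new_child.parent.children.remove(new_child)
        # Look up the index only now: it shifts if new_child was an earlier sibling.
        index = self.children.index(old_child)
        self.children[index] = new_child
        new_child.parent = self
        old_child.parent = None
        return old_child


root = Node("root")
a = root.append_child(Node("a"))
b = root.append_child(Node("b"))
root.replace_child(a, a)   # no-op: "a" stays in the tree
root.replace_child(a, b)   # "a" moves into the slot of "b"
```

Without the identity check, the generic path would detach the node and then fail to find it again, which is the kind of self-destruction the entry describes.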
* Renamed ParseElement to ParseStartTag to reflect its actual functionality
* Changed ParseQuantity into a function returning an enumeration type
* Simplified TXMLDecodingSource.NewLine
* Changed the main loop (ParseContent) so that multiple calls to DoText() are replaced by a single call.
- Removed "if FCDSectionsAsText" branch in DoCDSect. It is obsolete since this case is handled in ParseContent.
git-svn-id: trunk@15975 -
* Applied counterpart of sax_html.pp r15564, eliminating redundant wide-to-ansi conversions;
* AStart parameter of IgnorableWhitespace event should be zero, not 1;
* XML is case-sensitive, removed calls to lowercase();
* Accumulate token characters in FRawTokenText, then convert it all at once to SAXString. Without it, handling multi-byte encodings like UTF-8 was impossible, because it was converting by individual bytes which always resulted in errors. Provides a partial fix for Mantis #16732. Also provides a single location to insert a proper decoding procedure.
git-svn-id: trunk@15738 -
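Why byte-by-byte conversion cannot work for UTF-8 can be shown with a small Python sketch. The function names are hypothetical and the code does not mirror FRawTokenText handling in the SAX reader; it only contrasts decoding single bytes with decoding an accumulated buffer.

```python
def decode_per_byte(raw: bytes) -> str:
    # Flawed approach: every byte is decoded in isolation, so any byte that
    # belongs to a multi-byte sequence raises UnicodeDecodeError.
    return "".join(bytes([b]).decode("utf-8") for b in raw)


def decode_accumulated(chunks) -> str:
    # Collect the raw token bytes first, then convert them all at once.
    raw_token = bytearray()
    for chunk in chunks:
        raw_token.extend(chunk)
    return raw_token.decode("utf-8")


# The two bytes of U+00E9 arrive in separate chunks and still decode correctly.
text = decode_accumulated([b"caf", b"\xc3", b"\xa9"])
```

Decoding in one place also gives a single spot where a different decoder could be plugged in, which is the point made in the entry above.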
* For ancestor and ancestor-or-self axes, added checks for attribute nodes similar to parent axis.
* For reverse axes, collect and filter nodes in 'natural' (i.e. reversed) order, and only then reverse order while adding to result node set. This is much simpler to implement.
* Fixed memory leak (not destroying TXPathFilterNode.FExpr)
git-svn-id: trunk@15652 -
o The prefix to resolve should not include following ':*' characters
o NextToken changes CurTokenString, so NextToken must be after reading CurTokenString.
o Added a test for that
* XPath test suite, fixed comparison of numeric result (it is quite tricky in presence of NaNs).
git-svn-id: trunk@15639 -
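The NaN difficulty mentioned for the XPath test suite comes from NaN never comparing equal to itself. A minimal Python sketch of a NaN-aware comparison follows; the helper name and the tolerance are assumptions, and the actual check in xpathts.pp is not described in this log.

```python
import math


def same_number(expected: float, actual: float, rel_tol: float = 1e-12) -> bool:
    # NaN != NaN under IEEE 754, so a test harness has to treat
    # "both are NaN" as a match explicitly.
    if math.isnan(expected) or math.isnan(actual):
        return math.isnan(expected) and math.isnan(actual)
    # Infinities match only when they have the same sign.
    if math.isinf(expected) or math.isinf(actual):
        return expected == actual
    return math.isclose(expected, actual, rel_tol=rel_tol)
```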
- Deleted TXPathLocationPathNode, it was too much overhead to store a single bit of information. The path root (if any) is now represented by TStep node with Axis=axisRoot.
* Changed TStep linkage from 'right' to 'left', this is consistent with the way of parsing expressions and considerably simplifies evaluation.
* Fixed ParsePathExpr procedure so it no longer accepts empty/truncated expressions.
git-svn-id: trunk@15632 -
* Also modified it so the data is accumulated in local vars, and resulting TStep objects are created only after the parsing is successfully complete.
* TXPathScanner.ParsePrimaryExpr: eliminated variable.
git-svn-id: trunk@15628 -
+ Implemented DOM level 3 properties xmlVersion and xmlEncoding for both TDOMDocument and TDOMEntity classes. Also declared property inputEncoding for these classes.
* Non-conformant TXMLDocument.Encoding has been deprecated; it is now an alias for xmlEncoding property.
* TDOMDocument and TDOMEntity now share a common ancestor, TDOMNode_TopLevel.
* api.xml: enabled testing for the new properties
git-svn-id: trunk@15443 -
Without this, certain (malformed) documents (e.g. eduni/xml-1.1/005.xml) were causing
InputSource leaks.
Note: these leaks are a side effect from recent changes to entity processing and are not
observed with older versions.
git-svn-id: trunk@14361 -
* Also modified TXMLReader.ParseContent so that it produces normalized text nodes, i.e. merges text nodes on entity boundaries (when Options.ExpandEntities=True, of course) and merges the text coming from CDATA sections when Options.CDSectionsAsText=True.
git-svn-id: trunk@14248 -
* General entities are now processed non-recursively;
* They are now re-parsed on each inclusion, enabling proper validation and ensuring SAX-compatible order of events. Also less dependent on DOM-specific calls like CloneNode.
git-svn-id: trunk@14232 -
- Removed recognition of 'ISO8859-1', as it was a workaround for incorrect fpdoc encodings.
- Removed the with statement in ParseContent; it won't work if we handle entities non-recursively, because FSource will be changing.
git-svn-id: trunk@14207 -
* Moved line ending processing from the encoder to a higher level; without this, implementing/using external encoders is very problematic.
+ Implemented line ending processing for c14n mode.
git-svn-id: trunk@14194 -
* dedicated procedure for writing the document node;
* no indenting in c14n mode;
* ignore Specified property of the attributes in c14n mode.
git-svn-id: trunk@14192 -
* Hard limit of literal lengths: 3 on version, 30 on encoding name, 2 or 3 on standalone. Without this, a misplaced quote could cause excessive amount of processing, because input buffer is reloaded by small 3-char chunks at this time.
* Encoding validity is checked in-line, the very first illegal character aborts processing.
git-svn-id: trunk@13961 -
* Delay switching to xml 1.1 rules until the declaration has been parsed, this ensures that NEL and LSEP chars in declaration are rejected (rmt-056, rmt-057).
git-svn-id: trunk@13922 -
- TXMLCharSource.PublicID removed, it is unused.
* Base URI of an entity is stored in FURI field of entity, and passed to ResolveEntity.
* When error happens while parsing an internal entity, report the URI where that entity was declared, not where it was included.
git-svn-id: trunk@13921 -
* text() selector matches text and CDATA nodes, but not comments
* Names of processing instructions are now matched as they should.
git-svn-id: trunk@13915 -
* Also changed TSpecialCharCallback to take the string and the index, so it can process certain sequences, not only single chars.
* In canonical mode, CDATA sections are written as text.
git-svn-id: trunk@13906 -
* separate procedure SkipQuote, reused by SkipQuotedLiteral and ExpectAttValue;
* inlined SkipPubidLiteral to the (only) place where it is called.
git-svn-id: trunk@13858 -
* Progress with namespace support. Resolve namespace prefixes while parsing, compare namespaceURI/localNames while evaluating. Existing tests for namespace-uri(), local-name() and name() now all pass, but the resolving interface isn't ready for general use yet.
* Fixed name() to default to context node if argument is omitted.
xpathts.pp:
+ support for prefix resolving while testing.
git-svn-id: trunk@13846 -
* Report unclosed Ignore section as soon as it is detected, improves error messages and simplifies the code.
* Since SkipUntilSeq is only ever called with 1 or 2 delimiter chars, support just that rather than arbitrary-length array. Simplifies code.
git-svn-id: trunk@13818 -
+ Added TDOMTestBase.LoadStringData method, allows loading documents from string.
* Don't return empty string from GetResourceURI when file doesn't exist. Thus we can see the problematic filename in the test output.
+ Added extras.pp, contains a few tests not present in w3.org test suite.
+ Added extras2.pp, contains some tests ported by hand because no automatic conversion is possible yet. It addresses namespace fixup during serialization and canonical-form issues.
README_DOM.txt: updated to reflect the added units.
git-svn-id: trunk@13729 -
* dynamic -> virtual (does not change anything, in fact, because FPC handles these two keywords identically).
* Default implementations for GetFeature, GetProperty, SetFeature, SetProperty - removes warnings about abstract methods at build time.
git-svn-id: trunk@13382 -
* Recognize only five predefined XML entities, not all the stuff defined for HTML.
* Recognize character refs in hex notation only using lowercase 'x'.
git-svn-id: trunk@13376 -
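The two rules above (only the five predefined XML entities, and hexadecimal character references introduced by a lowercase 'x' only) can be sketched in Python as follows. The function name and the return convention are assumptions; the sketch does not reproduce the reader's code and does not check that the resulting character is a legal XML char.

```python
import re

# The five entities predefined by XML; HTML names such as "nbsp" are not included.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}

# Decimal: '#' digits. Hexadecimal: '#x' hex digits, lowercase 'x' only.
_CHAR_REF = re.compile(r"#(?:x([0-9a-fA-F]+)|([0-9]+))\Z")


def resolve_reference(name: str):
    """Return the text for the body of '&name;', or None if it is not resolvable."""
    if name in PREDEFINED:
        return PREDEFINED[name]
    m = _CHAR_REF.match(name)
    if m is None:
        return None
    code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
    if code > 0x10FFFF:  # beyond the Unicode range
        return None
    return chr(code)
```

With this, resolve_reference("#x41") and resolve_reference("#65") both give "A", while "#X41" and "nbsp" are left unresolved.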
* htmldefs.pp - no longer limited to Latin-1; uses binary search instead of linear.
* sax_html.pp - no longer emits SkippedEntity events; any reference is either resolved or handled as text.
* sax_xml.pp - in contrast to HTML, never handles entities as text (either resolved or passed to SkippedEntity).
git-svn-id: trunk@13368 -
* In case of decoding error, count line endings in the same way as during normal processing.
* Improved error diagnostics in ParseAttlistDecl().
git-svn-id: trunk@13359 -
+ Define elements which may omit end-tag (except HTML, HEAD and BODY which may also omit the start-tag)
+ Define which elements may close other elements (modelled after libxml2).
* DIV may have #PCDATA content.
sax_html.pp:
* Improve the parser to report startElement/endElement events properly. Should resolve Mantis #14073 and related element hierarchy issues.
git-svn-id: trunk@13357 -
- Removed (made abstract) default implementations of TXPathVariable.AsText(), AsNumber() and
AsBoolean(). These methods are overridden by all TXPathVariable descendants, therefore in
TXPathVariable itself they are dead and only increase executable size.
- Removed debug statement committed by accident in r13256.
tests/xpathts.pp:
* Annotated some tests, added a few tests for name(), namespace-uri() and local-name().
git-svn-id: trunk@13322 -
+ Character count checks for parameter entities, protects against entity expansion attacks using PE's.
+ Cache external PE's so they are only fetched once, considerably reduces traffic and CPU load in
case of attack.
* Do not repeat attempts to read from input stream once the read operation has returned fewer bytes
than requested.
git-svn-id: trunk@13321 -
+ New option TDOMParseOptions.DisallowDoctype - prohibits processing of the DTD (specs compliant,
targeted for SOAP applications).
+ New option TDOMParseOptions.MaxChars - limits max document length, protects against entity
expansion attacks and DoS by feeding in too long documents. Default value of 0 means no
restrictions. Tested with internal and external general entities, TBD with parameter entities.
* Fixed calculation of URIs used to retrieve external entities, they should be evaluated at the
point of entity declaration rather than at the point of resolving (which happens at the first
inclusion).
dom.pp:
* TDOMNode.SetReadOnly, calling Attributes was causing creation of TAttributeMap on every element.
Fixed.
git-svn-id: trunk@13313 -
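The idea behind a MaxChars-style limit can be sketched in Python: count every character produced during entity expansion and abort once the total passes the limit, with 0 meaning no restriction as stated above. All names here are hypothetical, the regular expression is a simplification of XML names, and where exactly the real reader does its counting is not stated in this log.

```python
import re

_REF = re.compile(r"&(\w+);")


class CharLimitExceeded(Exception):
    """Raised when the expanded text grows past the configured limit."""


def expand_entities(text, entities, max_chars=0):
    produced = 0  # running total of characters emitted so far

    def emit(piece, out):
        nonlocal produced
        produced += len(piece)
        if max_chars and produced > max_chars:
            raise CharLimitExceeded(f"more than {max_chars} characters")
        out.append(piece)

    def walk(fragment, out, active):
        pos = 0
        for m in _REF.finditer(fragment):
            emit(fragment[pos:m.start()], out)
            name = m.group(1)
            if name in active:
                raise ValueError(f"entity {name!r} references itself")
            walk(entities[name], out, active | {name})
            pos = m.end()
        emit(fragment[pos:], out)

    out = []
    walk(text, out, frozenset())
    return "".join(out)


# Nested entities multiply: 10 chars -> 50 -> 250 from a four-character document.
entities = {"a": "x" * 10, "b": "&a;" * 5, "c": "&b;" * 5}
```

Because the check runs while text is being produced, an oversized expansion is stopped early instead of after the whole result has been built.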
recursive with respect to entities, however). This enables more useful backtrace in case of
parsing errors, and makes profiling more fun.
git-svn-id: trunk@13275 -
- Removed remaining assignments of TXMLFileInputSource.SystemID, as it is assigned in constructor.
- As attributes now remove themselves from owner upon destruction, removed the redundant check.
* ReadXMLFile, ReadDTDFile: Moved all assignments of Document to the 'finally' sections to avoid
leaks if parsing error happens. This was already done in most frequently used overloaded
ReadXMLFile, but wasn't noticed in other places.
* TXMLReader.CheckName was unable to detect a malformed local part of a QName if its prefix is
well-formed and crosses the buffer boundary. Fixed.
git-svn-id: trunk@13259 -
* Qualified names, 'NCName:*' and variable references are handled as single tokens (no
whitespace is allowed between parts).
* Function and variable names may have a prefix now.
git-svn-id: trunk@13256 -
+ Utility function TXPathScanner.SkipToken, saves some amount of typing.
* Allow TXPathLocationPathNode to have FFirstStep = nil, and don't create a redundant
initial step while parsing.
git-svn-id: trunk@13253 -
TStep.SelectNodes, TStep.ApplyPredicates and the remaining part.
* Since predicates contained in a location path are evaluated within separate contexts of their own,
evaluation of the location path itself does not require a full context (only need context nodes).
This simplifies things quite a bit.
+ Added support for evaluating filter expressions followed by location path. Things like
"id('foo')/bar" work now.
git-svn-id: trunk@13244 -
* Store predicates of TStep and TXPathFilterNode in dynarrays instead of TList, this way we allocate
only as much memory as needed, and allocate anything only when predicates are actually present
(that's minority of all cases).
* Eliminated intermediate TList in filtering step results.
* Remaining TList's replaced by TFPList's.
* Fixed ordering of nodes on preceding-sibling axis.
git-svn-id: trunk@13230 -
* Default attributes which have a colon in name now get a namespace assigned upon restoring. This
fixes remaining level 2 testsuite failures.
git-svn-id: trunk@13229 -
+ Modified TDOMImplementation.HasFeature to return True for level 2 features.
Now the DOM Level 2 support is fairly complete (1 error and 1 failure still remain).
git-svn-id: trunk@13228 -
calling NodeFilter for each node. The NodeFilter return value determines whether the node will
be added to the list, and whether node's children should be recursively iterated.
This considerably simplifies creating TDOMNodeList descendants, as they no longer need to mess
with the class internals (FRevision, FList, etc).
git-svn-id: trunk@13227 -
* Cloning/importing attributes and elements preserves their namespaces
* Importing an element does not import non-specified attributes.
+ Separate, much faster, implementation for element cloning.
git-svn-id: trunk@13226 -
+ Initial part of TDOMElement.SetAttributeNS
+ TDOMImplementation.CreateDocumentType checks validity of QualifiedName
* Use 'managed' memory allocation in TDOMEntity.CloneNode and TDOMNotation.CloneNode to avoid leaks
* TDOMDocument.RemoveID now using THashTable.RemoveData(), simplifies things
git-svn-id: trunk@13200 -
+ Implemented handling of default attributes:
* creating an element also creates and attaches the default attributes;
* removing an attribute restores it with default value, if there's one.
+ Attribute nodes remove themselves from the owner element upon destruction, making it possible
to Free attributes manually.
* TDOMNamedNodeMap.SetNamedItem does not reset attribute OwnerElement if the argument node is already
contained in the map (and the whole operation is therefore a no-op).
git-svn-id: trunk@13196 -
* Every node created by Document.CreateXXX method is now guaranteed to be destroyed with the
document, whether it is part of the tree or not. Therefore, DOM methods which remove nodes
from the tree (namely, TDOMNode.RemoveChild, TDOMNode.ReplaceChild,
TDOMElement.SetAttributeNode and TDOMElement.SetAttributeNodeNS) no longer need to destroy
their return value and are now conformant to the specs.
* Nodes are allocated in arrays of instances (emulates 'placement new operator' in C++ terms).
Allocation and freeing are as fast as possible (just assigns a couple of pointers).
* Behaviour of nodes that are created by direct call to constructor is unchanged.
git-svn-id: trunk@13185 -
src/dom.pp:
* GetElementsByTagName[NS] results now get cached in a hashtable. Repeated calls to
GetElementsByTagName with same arguments return the same instance of NodeList. All NodeLists
created during document lifetime are destroyed with the document.
src/xmlutils.pp:
* THashTable.Lookup(), changed SetString to SetLength+Move because SetString truncates on #0
+ added THashTable.RemoveData() method
tests/api.xml:
- No longer need to 'garbage collect' the NodeLists.
git-svn-id: trunk@13180 -
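The caching behaviour described for GetElementsByTagName[NS] (same arguments return the same list instance, and the document owns every list) can be sketched in Python. The Document class below is a hypothetical stand-in that models the tree as a flat list of tag names; unlike real DOM NodeLists, the cached lists here are not live, and the THashTable-based storage of dom.pp is not reproduced.

```python
class Document:
    def __init__(self, tags):
        self._tags = list(tags)   # stand-in for the tree: tag names in document order
        self._lists = {}          # cache: tag name -> list object owned by the document

    def get_elements_by_tag_name(self, name):
        node_list = self._lists.get(name)
        if node_list is None:
            # Build the list once; "*" matches every element.
            node_list = [i for i, tag in enumerate(self._tags)
                         if name == "*" or tag == name]
            self._lists[name] = node_list
        return node_list


doc = Document(["html", "body", "p", "p"])
first = doc.get_elements_by_tag_name("p")
second = doc.get_elements_by_tag_name("p")   # same object as `first`
```

Since the document keeps every list it hands out, callers no longer have to free the results themselves, which matches the tests/api.xml note above.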
* Do not convert tests which request implementation attribute 'signed'='true'.
Such tests aren't applicable to our unsigned DOM, they only cause compiler warnings
and noise in the test report.
+ Support for default properties (obj.item(x) -> obj[x]).
+ Support black-listing of testcases. Some of them (in HTML testsuite) are easier to
rewrite by hand than to convert.
+ Support adding certain units to 'uses' clause (e.g. HTML suite must use dom_html).
git-svn-id: trunk@13172 -
Since the procedure still raises NO_MODIFICATION_ERR later (while inserting new node to the parent),
the testsuite wasn't able to detect this bug, causing the old node to be modified and the new node to leak :/
git-svn-id: trunk@13144 -
* Partial fix for #13605 (fixes the issue as reported, but a fix for nodelists returned by
GetElementsByTagName[NS] is also needed):
- Each node which can have children has an associated instance of TDOMNodeList;
- Other nodes are substituted by special 'null' node, having an empty child list;
- Multiple calls to ChildNodes return the same instance of TDOMNodeList;
- Calling Free on result of ChildNodes is optional; if not freed, it will be destroyed with
the owning node.
tests/domunit.pp:
* Changed TFPObjectList to TObjectList, which has been found to destroy its elements in opposite
order (reported as #13715). As NodeLists became owned by Document, old (FIFO) destruction order is
causing their double destruction.
git-svn-id: trunk@13143 -
* Identifier is treated as AxisName only if it is followed by '::'
* Identifier is treated as NodeType only if it is followed by '('.
git-svn-id: trunk@13123 -
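The two lookahead rules above follow the lexical disambiguation in the XPath 1.0 specification, which also allows whitespace before '::' and '('. A minimal Python sketch is given below; the function name, its arguments and the returned labels are assumptions and do not correspond to TXPathScanner methods.

```python
AXIS_NAMES = {
    "ancestor", "ancestor-or-self", "attribute", "child", "descendant",
    "descendant-or-self", "following", "following-sibling", "namespace",
    "parent", "preceding", "preceding-sibling", "self",
}
NODE_TYPES = {"comment", "text", "processing-instruction", "node"}


def classify_identifier(expr: str, end: int, name: str) -> str:
    """Classify `name`, which ends at index `end` of `expr`, by what follows it."""
    rest = expr[end:].lstrip(" \t\r\n")
    if rest.startswith("::"):
        if name not in AXIS_NAMES:
            raise ValueError(f"unknown axis {name!r}")
        return "axis"
    if rest.startswith("("):
        # Followed by '(': either a node type test or a function call.
        return "node-type" if name in NODE_TYPES else "function"
    # Otherwise it is an ordinary name test, even if it spells "child" or "text".
    return "name-test"
```

For example, "child" in "child::para" is an axis, while a bare "child" or "text" is just an element name test.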
+ Implemented [more or less] correct parsing of last two variations of PathExpr [19].
(only parsing - evaluation still has to be done).
git-svn-id: trunk@13116 -
* Split parsing location steps into a separate procedure, TXPathScanner.ParseStep();
* Changed some case statements to if's, in order to improve indentation and readability.
git-svn-id: trunk@13115 -
- Removed TRefClass, memory of TDOMNodeList's must be managed in some other way (part of #13605).
- Removed unnecessary forward class declarations.
git-svn-id: trunk@13113 -
* #12 is not a whitespace char;
* '!' is not valid unless it is a part of '!=' token;
* Accept full XML 1.0 name character range as identifiers.
git-svn-id: trunk@13109 -
xmlread.pp:
* Moved assignment of TXMLFileInputSource.SystemID into TXMLFileInputSource.Create.
* Eliminated nested procedure in TXMLReader.ProcessDefaultAttributes, it was redundant since
r12026.
xpath.pp:
* Moved predicate evaluation code, which is common for filter and step nodes, into
TXPathExprNode.EvalPredicate().
git-svn-id: trunk@13056 -
domunit.pp:
+ Added AssertEqualsNoCase() method.
* TDOMTestBase.Load: changed the doc argument to untyped, because some tests use TDOMNode instead of a
TDOMDocument.
* Completed the URIEquals() method; changed parameter types to PChar in order to be able to
distinguish nil from empty string.
* Modified GetResourceURI method so it is capable of handling the directory structure of CVS
snapshot (level2/html using data files from level1/html).
testgen.pp:
* Counterparts to changes in domunit.pp
+ Support for renaming APIs (e.g. 'type'->'htmlType')
+ Support for enforcing the correct type of function arguments (emits 'as' operator).
api.xml:
+ Added data for DOM Level3 XPath and Level2 HTML suites.
git-svn-id: trunk@13055 -
* Handle empty tags by emitting a pair of StartElement/EndElement
events. While this is optional in HTML parser, here in XML parser it is a requirement.
git-svn-id: trunk@13048 -
src/xpath.pp:
+ Implemented sum() and normalize-space() core functions.
* Rewrote comparison code, nodesets are now handled correctly (to the
extent of the tests which I could find).
* starts-with() and contains() return True when second argument is an
empty string - that differs from Pos() behavior.
* NaNs are propagated through mod operator, floor() and ceiling().
* Fixed memory leak caused by not releasing arguments after function
calls.
* string-value of a nodeset is the value of its first node, not all
nodes glued together.
tests/xpathts.pp:
+ Added 120 tests, most coming from OASIS Prototype XSLT/XPath 1.0
conformance test suite.
+ Tests can now take an input xml data as a string.
git-svn-id: trunk@13046 -
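Two of the semantic points in this commit (the empty second argument of starts-with() and contains(), and the string-value of a node-set) can be sketched in Python. The node-set is modelled here simply as a list of string-values in document order, which is an assumption made for the example; none of this mirrors the xpath.pp code.

```python
def xpath_contains(haystack: str, needle: str) -> bool:
    # Every string contains the empty string, so an empty needle yields True.
    return needle in haystack


def xpath_starts_with(text: str, prefix: str) -> bool:
    # Likewise, every string starts with the empty string.
    return text.startswith(prefix)


def nodeset_string_value(node_values) -> str:
    # Only the first node (in document order) contributes; an empty set gives "".
    return node_values[0] if node_values else ""
```

So nodeset_string_value(["a", "b"]) is "a", not "ab".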
+ added TXPathBinaryNode as a common ancestor for binary operations;
+ TXPathBooleanOpNode now handles only 'and' and 'or' operators,
the purpose is to not evaluate the second argument if the result can
be determined by the first argument;
* Comparison operations moved to TXPathCompareNode and fixed
to support INFs and NANs correctly;
* Fixed TranslateWideString() that was not deleting characters;
* Fixed 'substring-after' function so its result is empty when argument
string does not contain the pattern;
* Fixed 'round' function so it complies with the specs;
* Completed implementation of 'substring' function (but surrogate pairs
are not handled yet);
* Mask exInvalidOp and exZeroDivide FPU exceptions while evaluating
expressions, this ensures correct calculations with respect to INFs
and NANs.
+ Added testsuite for xpath
git-svn-id: trunk@12961 -
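For reference, the rounding rules that XPath 1.0 prescribes for round() can be sketched in Python: ties go toward positive infinity, NaN and the infinities are returned unchanged, and arguments in [-0.5, 0) give negative zero. The function name is hypothetical and the sketch is not a translation of the xpath.pp implementation.

```python
import math


def xpath_round(x: float) -> float:
    # NaN, infinities and both zeros are returned unchanged.
    if math.isnan(x) or math.isinf(x) or x == 0.0:
        return x
    # Values in [-0.5, 0) round to negative zero.
    if -0.5 <= x < 0.0:
        return -0.0
    # Otherwise pick the nearest integer; an exact tie goes toward +infinity.
    lower = math.floor(x)
    return float(lower + 1) if x - lower >= 0.5 else float(lower)
```

This differs from Python's built-in round(), which rounds ties to the nearest even integer (round(2.5) is 2).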
* TXPathUnionNode.Evaluate: fixed two crashes. The object returned by
TXPathVariable.AsNodeSet is owned by that TXPathVariable and should
not be explicitly destroyed. Also TXPathVariable should not be
released if its AsNodeSet result will be used later.
* TXPathLocationPathNode.Evaluate/EvaluateStep:
- fixed crash in axisFollowing case branch (caused by wrong variable
being used in the loop).
- rewrote axisPreceding branch so it builds the result node list in
correct (document) order.
- Fixed predicate match condition that was always evaluating as True.
* TXPathScanner.ParseLocationPath: modified so it never returns nil.
This fixes crash in cases when '/' or '//' are used otherwise than
the whole expression (e.g. 'string(/)').
* Replaced manual searching in TList by calls to IndexOf() in two
places.
git-svn-id: trunk@12934 -
Fixes a bug in internal DTD subset processing, which was preventing
tokens that cross input buffer boundary from being correctly added to
DocType.InternalSubset (the first part of such tokens was dropped).
git-svn-id: trunk@12860 -
* Fixed crash resulting from changing TDOMDocument.OwnerDocument
from Self to nil a while ago.
* Refactored the TXPathScanner code so that it:
a) does not use UngetToken.
b) does not build CurTokenString by appending single characters.
c) moved parsing code from nested procedures to regular TXPathScanner
methods.
+ Implemented many (but not all yet) core library functions.
+ Support for scanning 'processing-instruction("name")' syntax.
+ Support for scanning 'foo:*' and 'foo:bar' node name tests.
* NodeSets are always convertible to numbers and booleans.
* String representation of an Element is its TextContent, not NodeName.
* TXPathConstantNode must Release its value, not destroy it (enables
correct result of expressions that consist of a single constant).
* Some fixes in attempt to make math operations conformant to the specs.
git-svn-id: trunk@12855 -
dom.pp:
* Added a comment about TDOMDocument destruction order.
* Changed declaration of standard namespaces from literals to typed
consts. This makes package operational on arm-wince which has a flaw
in WideString literal assignments (issue #13237).
Even without that bug, the change saves some bytes in the
executable, because typed consts are only put there once, while
literals are compiled in for every unit that uses them.
htmwrite.pp:
* removed an unused variable
git-svn-id: trunk@12819 -
that descendent classes can override the NodeName properly
* Fixed an AV when GetNodeName is called and there is no NodeName set
* Removed the THtmlCustomElement.NodeName property and override the GetNodeName
method instead. The hashtable of TDOMNode_NS is not used because
THtmlCustomElement uses a faster lookupsystem for tag/node-names
* Added a basic test for the htmlwriter unit
git-svn-id: trunk@12732 -
+ add posix thread support
* improve signal handling
* synchronize haiku's baseunix unit with the unix one (maybe it will be possible to remove Haiku's one in a future patch, but I keep it for now)
+ add support for standard sockets
* fix some functions import to use the right libraries under Haiku
* fix packages compilation
git-svn-id: trunk@12636 -
* Changed the design of input decoders so they process data by chunks
instead of char-by-char. It is much faster, and allows supporting
external pluggable decoders.
+ Interface for external decoders.
* ResolvePredefined() is rewritten so it doesn't call CompareMem five
times to determine a single char.
* ParseCharRef renamed to ParseRef, because it parses entity refs as
well.
* Added guard conditions to prevent integer overflows in ParseRef.
* ContextPush(TXMLCharSource) merged into Initialize().
xmliconv.pas is a new unit containing a libiconv-based decoder. It depends on the existing iconvenc package, and
thus supports all platforms that are supported by iconvenc.
xmliconv_windows.pas is the variation that allows using libiconv functionality on Windows (it would require
iconv.dll to be distributed with the application, but since I haven't succeeded yet in writing a native
Windows decoder, this is better than nothing).
git-svn-id: trunk@12582 -
xmlread.pp:
* Remove TXMLReader.FCurChar, by replacing it by FSource.FBuf^.
* Aiming to support any input encoding, the parser has been refactored
to consume UTF-16 produced by 'xml-unaware' decoder (i.e. with line
endings unadjusted and possibly containing chars that are invalid for
XML). The majority of parsing is now done in SkipUntil methods of
TXMLCharSource and TXMLDecodingSource. This design also considerably
increases performance because it processes chars in batches instead of
one-by-one (the decoders still process chars one-by-one, but they are
due to be replaced soon).
* Signature of BufAppendChunk changed to take starting and ending
addresses of the buffer instead of starting address and length.
* More sophisticated parsing of end-tags, avoids calls to StoreLocation
if possible (despite its trivial look, StoreLocation is quite expensive
in CPU cycles).
dom.pp:
* Some progress with DOM level 2. Implemented namespaceURI, prefix,
localName properties for Elements and Attributes. The namespace
information occupies only 32 bits per node.
* Implemented storing names of elements and attributes in a hash table.
This considerably reduces amount of used memory because each unique
string is stored only once. Reducing memory allocation count also
improves parsing speed.
* Using the hash table also allows linking DTD declarations directly to
the element nodes, avoiding any lookup at all.
dom_html.pp:
* Merely fixes compilation after changes to the DOM.
git-svn-id: trunk@12318 -
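The name hash table described for dom.pp above stores each unique name once and lets a declaration hang off the shared entry. A minimal Python sketch of that idea follows; NameEntry and NameTable are hypothetical names, a plain dict stands in for the hash table, and the real memory layout in dom.pp is not reproduced.

```python
class NameEntry:
    """One unique name plus an optional link to its DTD declaration."""
    __slots__ = ("name", "decl")

    def __init__(self, name):
        self.name = name
        self.decl = None


class NameTable:
    def __init__(self):
        self._entries = {}

    def intern(self, name):
        # Return the shared entry for `name`, creating it on first use.
        entry = self._entries.get(name)
        if entry is None:
            entry = NameEntry(name)
            self._entries[name] = entry
        return entry

    def __len__(self):
        return len(self._entries)


table = NameTable()
# Four element nodes, but only two distinct names are stored.
nodes = [table.intern(n) for n in ["item", "item", "list", "item"]]
# Attaching a declaration to the entry makes it reachable from every node
# that shares the entry, without a further lookup.
table.intern("item").decl = "<!ELEMENT item (#PCDATA)>"
```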
src/xmlread.pp, src/dom.pp
* Improvements to attribute processing: attributes are now validated as
they come. This enables reporting of the corresponding validation
errors at correct positions (previously everything was reported at the
end of element start-tag).
* Search for a declaration for attribute, not for an attribute
corresponding to the declaration. This reduces number of lookups
(because unspecified attributes are not searched) and removes the
need for an FDeclared field on every attribute.
tests/domunit.pp, tests/testgen.pp:
* Various improvements required to support converting of the
DOM level 3 XPath module.
git-svn-id: trunk@12026 -
a) they should not be necessary and only hide the symptoms of a not
understood bug on some platforms
b) doing so breaks things on some other platforms
git-svn-id: trunk@11951 -
xmlutils.pp:
+ Added THashTable - a simple hashed container with WideString keys.
dom.pp:
* Use the hash table instead of a sorted list for storing document IDs.
* Replaced all TLists by TFPList (which is smaller and faster).
* Fixed TDOMElement.RemoveAttributeNode to throw NOT_FOUND_ERR when
the requested node is not one of the element's attributes.
+ Added node read-only checks where required by the specs, this fixes
about 50 DOM tests.
xmlread.pp:
* Got rid of TXMLCharSource.FReloadHook, the corresponding procedure may
be called directly.
* Used a separate buffer to store the entity value literals, this
enables correct including of external PEs that have a text declaration
at the beginning.
* Some refactoring: ParseAttribute has been split into a separate
procedure, ProcessTextAndRefs was merged into ParseContent.
git-svn-id: trunk@11942 -
xmlutils.pp, names.inc:
* exclude colon from the NameChar bitmap and handle it in code.
dom.pp:
+ TDOMText.IsElementContentWhitespace now implemented completely.
* Attributes created by TDOMElement.SetAttribute get their
OwnerElement property assigned properly
* Attributes replaced by TDOMNamedNodeMap.SetNamedItem get their
OwnerElement reset to nil
* TDOMElement.SetAttributeNode does not destroy the attribute when it
is being replaced by itself
* Most node boolean properties collected into a single FFlags field
to reduce memory requirements.
xmlread.pp:
+ Syntax-level support of namespaces: handle colons in names, check
correct qualified name syntax, prohibit colons in entity/notation/PI
names and ID/IDREF attribute values (all this only happens when
Options.Namespaces is set to True - not by default).
* Reaching end of input while parsing the Ignore Section is a fatal
error because parameter entities are not recognized there.
* Reaching end of input while parsing entity value literal that was
started in a parameter entity aborts immediately instead of hopelessly
scanning the whole document up to its end.
* Fixed parsing duplicate Element declarations. The content models of
subsequent declarations are now discarded as they should - not
appended to the existing model.
* Fixed parsing duplicate Attlist declarations. In addition to dropping
the attribute declaration itself, do not modify the corresponding
element declaration and suppress 'Duplicate ID attribute' and
'Duplicate NOTATION attribute' validation errors.
* Fixed error position in cases when attribute value lacks the closing
quote.
* Some refactoring in order to reduce number of WideString vars and code
size (some SkipX and ExpectX merged into SkipX(required: Boolean)).
* TXMLCharSource.FLocation record replaced by single integer FLineNo
because LinePosition is always calculated.
* TXMLCharSource.FCursor replaced by local var.
* TXMLReader.NameIs changed to a more general BufEquals(), it eliminates
TXMLReader.GetString and some WideString variables.
tests/xmlts.pp:
* Ignored tests do not change suite conformance state.
tests/testgen.pp:
* Added a forgotten semicolon.
git-svn-id: trunk@11869 -
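As an illustration of the syntax-level namespace checks listed in the
xmlread.pp notes above, here is a minimal Python sketch. It is not the
parser's actual code: the function names are hypothetical, and the NCName
pattern is an ASCII-only approximation of the real XML name character
tables.

```python
import re

# ASCII-only approximation of an NCName (a name without colons).
# The real XML tables also admit many non-ASCII characters.
_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ncname(s):
    """True if s is a colon-free name under the ASCII approximation."""
    return _NCNAME.fullmatch(s) is not None


def is_qname(name):
    """True if name is 'local' or 'prefix:local' with both parts NCNames."""
    prefix, colon, local = name.partition(":")
    if not colon:
        return is_ncname(name)
    # A second colon ends up in 'local' and is rejected by is_ncname.
    return is_ncname(prefix) and is_ncname(local)


def is_colon_free_name(name):
    """Check for entity, notation and PI names, which may not contain colons."""
    return is_ncname(name)
```

Under these assumptions, is_qname accepts "xsl:template" and "item" but
rejects ":a", "a:" and "a:b:c", while is_colon_free_name rejects any name
that contains a colon.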
* testgen.pp - a utility to convert w3.org tests from XML format
into fpcunit-compatible Pascal source. The official testsuite uses
xslt for conversion, but, since there is no xslt for Pascal, and no
xslt support in FCL yet, I wrote a utility.
* api.xml - API 'database', needed by testgen.
* domunit.pp - an fpcunit extension, provides DOM-specific runtime
support.
* README_DOM - provides some instructions about putting it all together.
git-svn-id: trunk@11390 -
* excludes #$FFFE and #$FFFF from allowed XML 1.1 name chars, so the
IsXmlName result is correct when its argument does not come from the
parser.
xmlread.pp:
+ Two new parsing options, Namespaces and ResolveExternals (not
functional yet but needed to proceed).
* Fixed checking of WFC [28a], forces fatal error as soon as possible
and prevents parsing of further (potentially malicious) data.
Hopefully it is now truly compliant with the specs rather than merely
satisfying the tests.
* In entity value literals, nesting is checked by entity, not by the
input source (consistent with other places).
- Saving FCursor around attribute default value isn't necessary because
FCursor is always nil while parsing the DTD.
* TList's changed to more lightweight TFPList's.
* Changed once more (probably for the last time) how the standalone
percent sign is recognized in parameter entity declarations. The
rationale is that FCurChar is no longer out of sync with FSource.FBuf^,
and therefore may be removed.
tests/xmlts.pp and tests/README:
+ Added support for the latest XML test suite (by skipping tests
targeted for the upcoming fifth edition of XML specs).
+ 'Namespaces' option is passed to the parser.
* README updated with the latest testsuite URL.
git-svn-id: trunk@11303 -
dom.pp:
* Document.OwnerDocument returns nil.
* Document.TextContent returns empty string and setting it does nothing.
* Fixed EntityReference, it now gets its children upon creation and is
correctly imported between documents.
+ Node.IsSupported()
* DOM feature name comparison is done case-insensitive.
* Reworked Node.AppendChild/Node.InsertBefore. Duplicate functionality
removed. Resolves remaining issues with hierarchy/ownership checks
(except for Document nodes which is a different story altogether).
The same code is now executed for nodes attached to a Fragment as
well as for regular nodes.
+ Text.SplitText checks for valid ParentNode.
xmlread.pp:
+ Implemented TDOMParser.ParseWithContext (except the case of replacing
the whole document)
* Fixed AV when calling ParseXXX methods with input source that could
not be resolved.
* Completely ignore comments in the external DTD subset; this fixes a
couple of DOM tests and has no effect on the XML testsuite.
git-svn-id: trunk@11217 -
fcl-xml/src/dom.pp: resolved a number of Level 1 conformance issues:
* Node.Normalize always deletes empty text nodes
* Node.Normalize is recursive into Attributes
* Node.InsertBefore corrected exception code in case when RefChild is
not one of node's children
+ Node.InsertBefore added missing check for possible cycle in tree
+ Node.AppendChild and Node.InsertBefore added checking type of NewChild
+ CloneNode enabled for Fragment and Entity
- CloneNode deleted for DocumentType (w3 specs directly prohibit cloning
it between documents, and cloning within one document is claimed
'implementation specific' - but makes no sense).
+ Node.ImportNode is now working
* Uncommented Level 2 node properties (NamespaceURI, localName and
Prefix), this caused a name clash and a lot of function argument
renames.
fcl-xml/src/xmlutils.pp:
+ overloaded IsXmlName() that accepts PWideChars
fcl-xml/src/xmlconf.pp:
* Applied a fix similar to xmlcfg.pp for Mantis #10554
fcl-xml/src/xmlread.pp:
* Major: Got errors reported at correct locations for all 1600+ negative
tests. Easy to say, but it required modifying almost every second
line of code.
* TContentParticle references an existing element definition instead of
storing its own name (this allows content model matching without
string comparisons).
* Resorted to old-style 'object' for TElementValidator and to plain
procedures for decoders (this allows dropping almost all related memory
management).
* Moved parameter entity detection from char to token level, this
simplifies things a bit.
+ Added second level of buffering to input source (a step towards
supporting arbitrary encodings).
* The main parsing loop contains no implicit exception frames now.
fcl-xml/src/xmlwrite.pp:
* Replaced the stupid indenting algorithm with a simple rule: "Do not
write indents adjacent to text nodes". Now it does not make a mess
out of the documents which were parsed with PreserveWhitespace=True.
* Use specialized node properties instead of generic ones; this
eliminates WideString copies and results in an almost 2x performance
boost on Windows.
* Even more performance:
* Write line endings together with indents -> half as many calls.
* Increase slack in buffer and write strings with known length (i.e.
most of markup) without overflow checking.
fcl-xml/tests/xmlts.pp:
* Use parser options instead of dedicated procedure to 'canonicalize'
documents, the parser has become mature enough to do that.
* Fatal error in non-valid category is a test failure, as well as
validation error alone in not-wellformed category.
fcl-xml/src/README:
* Brought a bit up to date
fcl-xml/tests/README:
+ Added testsuite errata/issues
git-svn-id: trunk@10314 -
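The indenting rule quoted in the xmlwrite.pp notes above ("Do not write
indents adjacent to text nodes") can be sketched in Python with
xml.dom.minidom. This is an illustrative sketch only, not the xmlwrite.pp
code: the function names are hypothetical, and the sketch handles element
and text nodes only.

```python
from xml.dom import minidom, Node
from xml.sax.saxutils import escape, quoteattr

INDENT = "  "


def is_text(node):
    """True for text and CDATA nodes; None (no sibling) counts as non-text."""
    return node is not None and node.nodeType in (
        Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def write_element(elem, out, level=0):
    attrs = "".join(f" {name}={quoteattr(value)}"
                    for name, value in elem.attributes.items())
    # This sketch ignores comments, PIs and other node types.
    children = [c for c in elem.childNodes
                if is_text(c) or c.nodeType == Node.ELEMENT_NODE]
    if not children:
        out.append(f"<{elem.tagName}{attrs}/>")
        return
    out.append(f"<{elem.tagName}{attrs}>")
    prev = None
    for child in children:
        if is_text(child):
            out.append(escape(child.data))
        else:
            # Indent a child element only if no text node precedes it.
            if not is_text(prev):
                out.append("\n" + INDENT * (level + 1))
            write_element(child, out, level + 1)
        prev = child
    # Indent the end tag only if the last child was not a text node.
    if not is_text(prev):
        out.append("\n" + INDENT * level)
    out.append(f"</{elem.tagName}>")


def to_string(xml_text):
    """Parse xml_text and serialize its root element with the rule above."""
    doc = minidom.parseString(xml_text)
    out = []
    write_element(doc.documentElement, out)
    return "".join(out)
```

With this rule, a document parsed with its whitespace preserved is written
back unchanged, because every element already has text nodes next to it,
while a document without whitespace text nodes gets regular indentation.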
xmlread.pp:
* As a step towards SAX-based validation, element content validator is
rewritten from scratch, so it now accepts child elements one by
one. This also enables reporting location of validation errors (however,
most locations aren't reported correctly yet).
* More straightforward handling of comments and PIs in internal subset.
* Attribute text is handled separately from element text.
* Unified handling of fatal and validation errors.
xmlutils.pp:
* Removed auto widechar->char conversions. These should have been a part
of fix for #9528, but were not noticed at that moment.
dom.pp:
* Reworked 'ugly workarounds' in node removal code.
+ Element nodes remove themselves from document list of IDs, so no invalid pointers are left around.
xmlts.pp:
* Corrected validation diagnostics (display the first message and ignore subsequent ones).
* Validation error alone in a not-well-formed case is a test failure.
git-svn-id: trunk@8896 -
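The idea of a content validator that accepts child elements one by one, as
described in the xmlread.pp notes above, can be sketched in Python. The
log does not say how the real validator works internally; this sketch
substitutes a regular-expression derivative technique, and the model
representation and all names in it are hypothetical.

```python
# Content model nodes (hypothetical representation):
#   ("name", "a")      a single child element
#   ("seq", [m, ...])  sequence  (m, ...)
#   ("alt", [m, ...])  choice    (m | ...)
#   ("star", m), ("opt", m), ("plus", m)   m*, m?, m+
EMPTY = ("seq", [])   # matches "no more children"
FAIL = ("alt", [])    # matches nothing at all


def nullable(m):
    """True if the model accepts ending the child list right here."""
    kind, arg = m
    if kind == "name":
        return False
    if kind == "seq":
        return all(nullable(x) for x in arg)
    if kind == "alt":
        return any(nullable(x) for x in arg)
    if kind in ("star", "opt"):
        return True
    return nullable(arg)  # plus


def is_dead(m):
    """True if no sequence of children can satisfy the model."""
    kind, arg = m
    if kind == "name":
        return False
    if kind == "seq":
        return any(is_dead(x) for x in arg)
    if kind == "alt":
        return all(is_dead(x) for x in arg)
    if kind in ("star", "opt"):
        return False
    return is_dead(arg)  # plus


def derive(m, name):
    """Model for the remaining children after seeing child 'name'."""
    kind, arg = m
    if kind == "name":
        return EMPTY if arg == name else FAIL
    if kind == "alt":
        return ("alt", [derive(x, name) for x in arg])
    if kind == "seq":
        if not arg:
            return FAIL
        head, rest = arg[0], ("seq", arg[1:])
        first = ("seq", [derive(head, name), rest])
        if nullable(head):
            return ("alt", [first, derive(rest, name)])
        return first
    if kind == "opt":
        return derive(arg, name)
    if kind == "star":
        return ("seq", [derive(arg, name), m])
    return ("seq", [derive(arg, name), ("star", arg)])  # plus


class ElementValidator:
    """Feeds children one at a time, so an error has a known position."""

    def __init__(self, model):
        self.state = model

    def accept_child(self, name):
        nxt = derive(self.state, name)
        if is_dead(nxt):
            return False  # state is kept, so the caller can report and go on
        self.state = nxt
        return True

    def is_complete(self):
        return nullable(self.state)
```

For a model such as (head, (p | div)*, foot?), the validator accepts head,
then any number of p or div children, then an optional foot. Because each
child is checked as it arrives, the first offending child is known at once,
which is what makes it possible to report where a validation error occurred.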