Contains an HTML-to-Text renderer

html2textrender.pas contains an HTML-to-Text renderer. It converts HTML into plain text by stripping tags and their attributes.

THTML2TextRenderer is an HTML-to-Text renderer. It converts HTML into plain text by stripping tags and their attributes. The converted text is indented by a configurable amount for each HTML tag that affects the indentation level. The following HTML tags receive special processing in the renderer:

  • HTML
  • BODY
  • P
  • BR
  • HR
  • OL
  • UL
  • LI
  • DIV CLASS="TITLE" (forces title mark output)

The following named character entities are converted to their plain text equivalents:

  &nbsp;   ' '
  &lt;     '<'
  &gt;     '>'
  &amp;    '&'

Other named character entities or numeric character entities are included verbatim in the plain text output.
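The entity handling described above can be illustrated with a minimal, self-contained sketch. DecodeBasicEntities is a hypothetical helper written for this illustration, not a routine from html2textrender.pas, and the renderer's actual implementation may differ (for example in case sensitivity). The sketch assumes &nbsp; maps to an ordinary space.

```pascal
program EntityDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper (not part of html2textrender.pas): converts the four
// named entities listed above and leaves every other entity untouched.
function DecodeBasicEntities(const S: string): string;
begin
  Result := StringReplace(S, '&nbsp;', ' ', [rfReplaceAll]);
  Result := StringReplace(Result, '&lt;', '<', [rfReplaceAll]);
  Result := StringReplace(Result, '&gt;', '>', [rfReplaceAll]);
  // '&amp;' is handled last so that '&amp;lt;' yields the literal text
  // '&lt;' rather than being decoded a second time into '<'.
  Result := StringReplace(Result, '&amp;', '&', [rfReplaceAll]);
end;

begin
  WriteLn(DecodeBasicEntities('a &lt; b &amp;&amp; c &gt; d')); // a < b && c > d
  WriteLn(DecodeBasicEntities('&copy; 2020 &#169;'));           // &copy; 2020 &#169;
end.
```

The second call shows the verbatim pass-through: neither the named entity &copy; nor the numeric entity &#169; is in the conversion list, so both appear unchanged in the output.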

A UTF-8 Byte Order Mark in the HTML is ignored.
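As a rough illustration of what ignoring the Byte Order Mark means, the sketch below drops the three-byte UTF-8 BOM (EF BB BF) from the start of a string. SkipUtf8Bom is a hypothetical helper for this example only. It is not taken from the unit, and the renderer may skip the mark in a different way.

```pascal
program BomDemo;

{$mode objfpc}{$H+}

const
  // The UTF-8 encoding of U+FEFF: bytes EF BB BF.
  UTF8BOM = #$EF#$BB#$BF;

// Hypothetical helper (not part of html2textrender.pas): returns S without
// a leading UTF-8 BOM, or S unchanged when no BOM is present.
function SkipUtf8Bom(const S: string): string;
begin
  if (Length(S) >= 3) and (Copy(S, 1, 3) = UTF8BOM) then
    Result := Copy(S, 4, MaxInt)
  else
    Result := S;
end;

begin
  // Both calls print the same markup, since only the BOM is removed.
  WriteLn(SkipUtf8Bom(UTF8BOM + '<p>Hi</p>'));
  WriteLn(SkipUtf8Bom('<p>Hi</p>'));
end.
```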

  • End of line marker, by default the standard LineEnding
  • Markup used at the start/end of title text
  • Markup used for an HR tag
  • Markup used at the start of an anchor tag
  • Markup used at the end of an anchor tag
  • Markup used for a list item tag
  • Text added when there are too many lines
  • Increment (in spaces) for each nested HTML level
  • Sets a pending line break to be added later
  • Sets a maximum of one pending line break to be added later
  • Appends text to the plain-text output for the renderer
  • Handles an HTML tag and its attribute values
  • Handles an HTML character entity
  • Resets the state and output for the renderer
  • Creates the class instance
  • Frees the class instance
  • Parses the HTML and renders to plain text

Parses the HTML and renders to plain text. Output is limited to aMaxLines lines. Note: AddOutput, HtmlTag and HtmlEntity return False once the aMaxLines limit has been exceeded.
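A minimal usage sketch follows. It rests on several assumptions to verify against the unit's interface section: that the unit is named HTML2TextRender, that the constructor accepts the HTML as a string, and that the rendering method is called Render and returns the plain text. Only the class name and the aMaxLines parameter come from the text above.

```pascal
program RenderDemo;

{$mode objfpc}{$H+}

uses
  HTML2TextRender; // assumed unit name; requires the LazUtils package

var
  Renderer: THTML2TextRenderer;
begin
  // Assumption: the constructor takes the HTML source as a string.
  Renderer := THTML2TextRenderer.Create(
    '<html><body><p>Hello &amp; welcome</p>' +
    '<ul><li>One</li><li>Two</li></ul></body></html>');
  try
    // Assumption: Render returns the plain text, limited here to 10 lines.
    WriteLn(Renderer.Render(10));
  finally
    Renderer.Free;
  end;
end.
```

Passing a small line limit is a convenient way to build short previews, such as hint text, from a longer HTML document.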