+ Initial implementation

This commit is contained in:
michael 2000-01-12 23:31:23 +00:00
parent a1bfbbfeea
commit 78794fe79a
2 changed files with 1399 additions and 0 deletions

350
install/man/man1/plex.1 Normal file

@@ -0,0 +1,350 @@
.TH plex 1 "10 Jan 2000" FreePascal "Pascal lexical analyzer generator"
.SH NAME
plex - The Pascal Lex lexical analyzer generator.
.SH Usage
.B plex [options] lex-file[.l] [output-file[.pas]]
.SH Options
.TP
.B \-v
.I Verbose:
Lex generates a readable description of the generated
lexical analyzer, written to lex-file with new extension
.I .lst
.TP
.B \-o
.I Optimize:
Lex optimizes DFA tables to produce a minimal DFA.
.SH Description
TP Lex is a program generator: it produces the Turbo Pascal source
code for a lexical analyzer subroutine from the specification of an input
language by a regular expression grammar.
TP Lex parses the source grammar contained in lex-file (with default suffix
.l) and writes the constructed lexical analyzer subroutine to the specified
output-file (with default suffix .pas); if no output file is specified, output
goes to lex-file with new suffix .pas. If any errors are found during
compilation, error messages are written to the list file (lex-file with new
suffix .lst).
The generated output file contains a lexical analyzer routine, yylex,
implemented as:
.RS
function yylex : Integer;
.RE
This routine has to be called by your main program to execute the lexical
analyzer. The return value of the yylex routine usually denotes the number
of a token recognized by the lexical analyzer (see the return routine in the
LexLib unit). At end-of-file the yylex routine normally returns 0.
The code template for the yylex routine may be found in the yylex.cod
file. This file is needed by TP Lex when it constructs the output file. It
must be present either in the current directory or in the directory from which
TP Lex was executed (TP Lex searches these directories in the indicated
order). (NB: the Linux/Free Pascal version searches for the code template
in a directory defined at compile time, usually /usr/lib/fpc/lexyacc,
instead of the execution path.)
The TP Lex library (LexLib) unit is required by programs using Lex-generated
lexical analyzers; you will therefore have to put an appropriate uses clause
into your program or unit that contains the lexical analyzer routine. The
LexLib unit also provides various useful utility routines; see the file
lexlib.pas for further information.
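.LP
For illustration, a minimal driver program might look as shown below. This
is only a sketch: the program and file names are placeholders, and the way
the generated analyzer is brought into the program (here an include
directive) depends on how you organize your sources. The essential parts
are the uses clause and the loop around yylex.
.RS
.nf
program ScanDemo;
uses LexLib;          { required by the generated analyzer }

{$I scandemo.pas}     { hypothetical: the file generated by TP Lex }

begin
  { run the analyzer until it reports end-of-file (0) }
  while yylex <> 0 do
    ;
end.
.fi
.RE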
.SH Lex Source
A TP Lex program consists of three sections separated with the %% delimiter:
.LP
definitions
.LP
%%
.LP
rules
.LP
%%
.LP
auxiliary procedures
All sections may be empty. The TP Lex language is line-oriented; definitions
and rules are separated by line breaks. There is no special notation for
comments, but (Turbo Pascal style) comments may be included as Turbo Pascal
fragments (see below).
The definitions section may contain the following elements:
.TP
.B regular definitions in the format:
name substitution
which serve to abbreviate common subexpressions. The {name} notation
causes the corresponding substitution from the definitions section to
be inserted into a regular expression. The name must be a legal
identifier (letter followed by a sequence of letters and digits;
the underscore counts as a letter; upper- and lowercase are distinct).
Regular definitions must be non-recursive.
.TP
.B start state definitions in the format:
%start name ...
which are used in specifying start conditions on rules (described
below). The %start keyword may also be abbreviated as %s or %S.
.TP
.B Turbo Pascal declarations enclosed between %{ and %}.
These will be
inserted into the output file (at global scope). Also, any line that
does not look like a Lex definition (e.g., starts with blank or tab)
will be treated as Turbo Pascal code. (In particular, this also allows
you to include Turbo Pascal comments in your Lex program.)
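.LP
A small definitions section combining these elements might read as follows
(all names are illustrative):
.RS
.nf
D      [0-9]
L      [a-zA-Z_]
%start COMMENT
%{
var parenLevel : Integer;  { Turbo Pascal declaration, global scope }
%}
.fi
.RE
With these definitions, a pattern such as {L}({L}|{D})* in the rules
section is expanded to [a-zA-Z_]([a-zA-Z_]|[0-9])*.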
.SH Rules
The rules section of a TP Lex program contains the actual specification of
the lexical analyzer routine. It may be thought of as a big CASE statement
discriminating over the different patterns to be matched and listing the
corresponding statements (actions) to be executed. Each rule consists of a
regular expression describing the strings to be matched in the input, and a
corresponding action, a Turbo Pascal statement to be executed when the
expression matches. Expression and statement are delimited with whitespace
(blanks and/or tabs). Thus the format of a Lex grammar rule is:
expression statement;
Note that the action must be a single Turbo Pascal statement terminated
with a semicolon (use begin ... end for compound statements). The statement
may span multiple lines if the successor lines are indented with at least
one blank or tab. The action may also be replaced by the | character,
indicating that the action for this rule is the same as that for the next
one.
The TP Lex library unit provides various variables and routines which are
useful in the programming of actions. In particular, the yytext string
variable holds the text of the matched string, and the yyleng Byte variable
its length.
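.LP
For instance, the following rules recognize identifiers and skip blanks
and tabs; id_token stands for a hypothetical token code of your program:
.RS
.nf
[A-Za-z_][A-Za-z0-9_]*  begin
                          writeln(yytext, ' (', yyleng, ' chars)');
                          return(id_token);  { id_token: your constant }
                        end;
[ \\t]+                  ;
.fi
.RE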
Regular expressions are used to describe the strings to be matched in a
grammar rule. They are built from the usual constructs describing character
classes and sequences, and operators specifying repetitions and alternatives.
The precise format of regular expressions is described in the next section.
The rules section may also start with some Turbo Pascal declarations
(enclosed in %{ %}) which are treated as local declarations of the
actions routine.
Finally, the auxiliary procedures section may contain arbitrary Turbo
Pascal code (such as supporting routines or a main program) which is
simply tacked on to the end of the output file. The auxiliary procedures
section is optional.
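.LP
Putting the three sections together, here is a complete (if contrived)
TP Lex program which copies its input to its output while converting
lowercase letters to uppercase. It relies on the default action to copy
all other characters, and assumes that yyoutput is the LexLib output file:
.RS
.nf
%%
[a-z]   write(yyoutput, UpCase(yytext[1]));
%%
begin
  if yylex=0 then ;
end.
.fi
.RE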
.SH Regular Expressions
The following table summarizes the format of the regular expressions
recognized by TP Lex (also compare Aho, Sethi, Ullman 1986, fig. 3.48).
c stands for a single character, s for a string, r for a regular expression,
and n,m for nonnegative integers.
expression matches example
---------- ---------------------------- -------
c any non-operator character c a
\\c character c literally \\*
"s" string s literally "**"
. any character but newline a.*b
^ beginning of line ^abc
$ end of line abc$
[s] any character in s [abc]
[^s] any character not in s [^abc]
r* zero or more r's a*
r+ one or more r's a+
r? zero or one r a?
r{m,n} m to n occurrences of r a{1,5}
r{m} m occurrences of r a{5}
r1r2 r1 then r2 ab
r1|r2 r1 or r2 a|b
(r) r (a|b)
r1/r2 r1 when followed by r2 a/b
<x>r r when in start condition x <x>abc
---------------------------------------------------
The operators *, +, ? and {} have highest precedence, followed by
concatenation. The | operator has lowest precedence. Parentheses ()
may be used to group expressions and override default precedences.
The <> and / operators may only occur once in an expression.
The usual C-like escapes are recognized:
\\n denotes newline
\\r denotes carriage return
\\t denotes tab
\\b denotes backspace
\\f denotes form feed
\\NNN denotes character no. NNN in octal base
You can also use the \\ character to quote characters which would otherwise
be interpreted as operator symbols. In character classes, you may use
the - character to denote ranges of characters. For instance, [a-z]
denotes the class of all lowercase letters.
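.LP
For instance, the following rules use quoting, an escaped operator
character, a character range and the | action; the token codes addop and
hexnum are placeholders:
.RS
.nf
"+"        |
\\-         return(addop);
[a-f0-9]+  return(hexnum);
\\n         ;
.fi
.RE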
The expressions in a TP Lex program may be ambiguous, i.e., there may be inputs
which match more than one rule. In such a case, the lexical analyzer prefers
the longest match and, if it still has the choice between different rules,
it picks the first of these. If no rule matches, the lexical analyzer
executes a default action which consists of copying the input character
to the output unchanged. Thus, if the purpose of a lexical analyzer is
to translate some parts of the input, and leave the rest unchanged, you
only have to specify the patterns which have to be treated specially. If,
however, the lexical analyzer has to absorb its whole input, you will have
to provide rules that match everything. E.g., you might use the rules
. |
\\n ;
which match "any other character" (and ignore it).
Sometimes certain patterns have to be analyzed differently depending on some
amount of context in which the pattern appears. In such a case the / operator
is useful. For instance, the expression a/b matches a, but only if followed
by b. Note that the b does not belong to the match; rather, the lexical
analyzer, when matching an a, will look ahead in the input to see whether
it is followed by a b, before it declares that it has matched an a. Such
lookahead may be arbitrarily complex (up to the size of the LexLib input
buffer). E.g., the pattern a/.*b matches an a which is followed by a b
somewhere on the same input line. TP Lex also has a means to specify left
context which is described in the next section.
.SH Start Conditions
TP Lex provides some features which make it possible to handle left context.
The ^ character at the beginning of a regular expression may be used to
denote the beginning of the line. More distant left context can be described
conveniently by using start conditions on rules.
Any rule which is prefixed with the <> construct is only valid if the lexical
analyzer is in the denoted start state. For instance, the expression <x>a
can only be matched if the lexical analyzer is in start state x. You can have
multiple start states in a rule; e.g., <x,y>a can be matched in start states
x or y.
Start states have to be declared in the definitions section by means of
one or more start state definitions (see above). The lexical analyzer enters
a start state through a call to the LexLib routine start. E.g., you may
write:
%start x y
%%
<x>a start(y);
<y>b start(x);
%%
begin
start(x); if yylex=0 then ;
end.
Upon initialization, the lexical analyzer is put into state x. It then
proceeds in state x until it matches an a which puts it into state y.
In state y it may match a b which puts it into state x again, etc.
Start conditions are useful when certain constructs have to be analyzed
differently depending on some left context (such as a special character
at the beginning of the line), and if multiple lexical analyzers have to
work in concert. If a rule is not prefixed with a start condition, it is
valid in all user-defined start states, as well as in the lexical analyzer's
default start state.
.SH Lex Library
The TP Lex library (LexLib) unit provides various variables and routines
which are used by Lex-generated lexical analyzers and application programs.
It provides the input and output streams and other internal data structures
used by the lexical analyzer routine, and supplies some variables and utility
routines which may be used by actions and application programs. Refer to
the file lexlib.pas for a closer description.
You can also modify the Lex library unit (and/or the code template in the
yylex.cod file) to customize TP Lex to your target applications. E.g.,
you might wish to optimize the code of the lexical analyzer for some
special application, make the analyzer read from/write to memory instead
of files, etc.
.SH Implementation Restrictions
Internal table sizes and the main memory available limit the complexity of
source grammars that TP Lex can handle. There is currently no way to
change internal table sizes (apart from modifying the sources of TP Lex
itself), but the maximum table sizes provided by TP Lex seem to be large
enough to handle most realistic applications. The actual table sizes depend on
the particular implementation (they are much larger than the defaults if TP
Lex has been compiled with one of the 32 bit compilers such as Delphi 2 or
Free Pascal), and are shown in the statistics printed by TP Lex when a
compilation is finished. The units given there are "p" (positions, i.e. items
in the position table used to construct the DFA), "s" (DFA states) and "t"
(transitions of the generated DFA).
As implemented, the generated DFA table is stored as a typed array constant
which is inserted into the yylex.cod code template. The transitions in each
state are stored in order. Of course it would have been more efficient to
generate a big CASE statement instead, but I found that this may cause
problems with the encoding of large DFA tables because Turbo Pascal has
a quite rigid limit on the code size of individual procedures. I decided to
use a scheme in which transitions on different symbols to the same state are
merged into one single transition (specifying a character set and the
corresponding next state). This keeps the number of transitions in each state
quite small and still allows a fairly efficient access to the transition
table.
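.LP
The following declarations sketch one possible Pascal representation of
such a merged-transition table; they are only illustrative, the actual
declarations are found in the yylex.cod template:
.RS
.nf
type
  TransRec = record
    cc   : set of Char;  { characters labelling the transition }
    next : Integer;      { target DFA state }
  end;
const
  { hypothetical fragment: letters keep state 1, digits go to state 2 }
  yyTable : array [1..2] of TransRec = (
    (cc: ['A'..'Z','a'..'z']; next: 1),
    (cc: ['0'..'9'];          next: 2));
.fi
.RE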
The TP Lex program has an option (-o) to optimize DFA tables. This causes a
minimal DFA to be generated, using the algorithm described in Aho, Sethi,
Ullman (1986). Although the absolute limit on the number of DFA states that TP
Lex can handle is at least 300, TP Lex poses an additional restriction (100)
on the number of states in the initial partition of the DFA optimization
algorithm. Thus, you may get a fatal `integer set overflow' message when using
the -o option even when TP Lex is able to generate an unoptimized DFA. In such
cases you will just have to be content with the unoptimized DFA. (Hopefully,
this will be fixed in a future version. Anyhow, using the merged transitions
scheme described above, TP Lex usually constructs unoptimized DFAs which are
not far from being optimal, and thus in most cases DFA optimization won't have
a great impact on DFA table sizes.)
.SH Differences from UNIX Lex
Major differences between TP Lex and UNIX Lex are listed below.
.IP \-
TP Lex produces output code for Turbo Pascal, rather than for C.
.IP \-
Character tables (%T) are not supported; neither are any directives
to determine internal table sizes (%p, %n, etc.).
.IP \-
Library routines are named differently from the UNIX version (e.g.,
the `start' routine takes the place of the `BEGIN' macro of UNIX
Lex), and, of course, all macros of UNIX Lex (ECHO, REJECT, etc.) had
to be implemented as procedures.
.IP \-
The TP Lex library unit starts counting line numbers at 0, incrementing
the count BEFORE a line is read (in contrast, UNIX Lex initializes
yylineno to 1 and increments it AFTER the line end has been read). This
is motivated by the way in which TP Lex maintains the current line,
and will not affect your programs unless you explicitly reset the
yylineno value (e.g., when opening a new input file). In such a case
you should set yylineno to 0 rather than 1.
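.LP
For example, when switching the analyzer to a new input file you might
write (assuming yyinput is the LexLib input file variable; the file name
is a placeholder):
.RS
.nf
assign(yyinput, 'next.in');  { hypothetical input file }
reset(yyinput);
yylineno := 0;               { 0, not 1: the count is incremented
                               BEFORE each line is read }
.fi
.RE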

1049
install/man/man1/pyacc.1 Normal file

File diff suppressed because it is too large.