mirror of https://gitlab.com/freepascal.org/fpc/source.git
synced 2025-08-11 10:26:05 +02:00

+ Initial implementation
This commit is contained in:
parent a1bfbbfeea
commit 78794fe79a

install/man/man1/plex.1 (new file, 350 lines)
.TH plex 1 "10 Jan 2000" FreePascal "Pascal lexical analyzer generator"
.SH NAME
plex - The Pascal Lex lexical analyzer generator.

.SH Usage
.B plex [options] lex-file[.l] [output-file[.pas]]

.SH Options
.TP
.B \-v
.I Verbose:
Lex generates a readable description of the generated
lexical analyzer, written to lex-file with new extension
.I .lst
.TP
.B \-o
.I Optimize:
Lex optimizes DFA tables to produce a minimal DFA.

.SH Description
TP Lex is a program generator that is used to generate the Turbo Pascal source
code for a lexical analyzer subroutine from the specification of an input
language by a regular expression grammar.

TP Lex parses the source grammar contained in lex-file (with default
suffix .l) and writes the constructed lexical analyzer subroutine to the
specified output-file (with default suffix .pas); if no output file is
specified, output goes to lex-file with new suffix .pas. If any errors are
found during compilation, error messages are written to the list file
(lex-file with new suffix .lst).
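
For example, assuming a grammar file named scan.l in the current directory
(the file name is illustrative), a typical invocation might be:

.RS
.nf
plex -v scan.l
.fi
.RE

This writes the generated analyzer to scan.pas and, because of \-v, a
readable description of the constructed analyzer to scan.lst.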

The generated output file contains a lexical analyzer routine, yylex,
implemented as:

.RS
function yylex : Integer;
.RE

This routine has to be called by your main program to execute the lexical
analyzer. The return value of the yylex routine usually denotes the number
of a token recognized by the lexical analyzer (see the return routine in the
LexLib unit). At end-of-file the yylex routine normally returns 0.
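
A minimal driver might look like the following sketch, which assumes the
generated analyzer is included into the program with an $I directive and
simply prints each token number until end-of-file (the file name scan.pas
and the use of an include directive are illustrative):

.RS
.nf
uses LexLib;

{$I scan.pas}  { the Lex-generated yylex routine }

var tok : Integer;
begin
  tok := yylex;
  while tok <> 0 do
  begin
    writeln('token: ', tok);
    tok := yylex;
  end;
end.
.fi
.RE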

The code template for the yylex routine may be found in the yylex.cod
file. This file is needed by TP Lex when it constructs the output file. It
must be present either in the current directory or in the directory from which
TP Lex was executed (TP Lex searches these directories in the indicated
order). (NB: For the Linux/Free Pascal version, the code template is searched
for in a directory defined at compile time instead of along the execution
path, usually /usr/lib/fpc/lexyacc.)

The TP Lex library (LexLib) unit is required by programs using Lex-generated
lexical analyzers; you will therefore have to put an appropriate uses clause
into the program or unit that contains the lexical analyzer routine. The
LexLib unit also provides various useful utility routines; see the file
lexlib.pas for further information.

.SH Lex Source
A TP Lex program consists of three sections separated with the %% delimiter:

definitions
.LP
%%
.LP
rules
.LP
%%
.LP
auxiliary procedures

All sections may be empty. The TP Lex language is line-oriented; definitions
and rules are separated by line breaks. There is no special notation for
comments, but (Turbo Pascal style) comments may be included as Turbo Pascal
fragments (see below).

The definitions section may contain the following elements:

.TP
.B regular definitions in the format:

name substitution

which serve to abbreviate common subexpressions. The {name} notation
causes the corresponding substitution from the definitions section to
be inserted into a regular expression. The name must be a legal
identifier (letter followed by a sequence of letters and digits;
the underscore counts as a letter; upper- and lowercase are distinct).
Regular definitions must be non-recursive.
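
For instance, a definition of the form shown below lets the rules section
match digit sequences without spelling out the character class each time
(the name digit and the token code num are illustrative; num stands for a
constant supplied by your program, and return is the LexLib routine
mentioned above):

.RS
.nf
digit   [0-9]
%%
{digit}+   return(num);
.fi
.RE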

.TP
.B start state definitions in the format:

%start name ...

which are used in specifying start conditions on rules (described
below). The %start keyword may also be abbreviated as %s or %S.

.TP
.B Turbo Pascal declarations enclosed between %{ and %}.

These will be
inserted into the output file (at global scope). Also, any line that
does not look like a Lex definition (e.g., starts with blank or tab)
will be treated as Turbo Pascal code. (In particular, this also allows
you to include Turbo Pascal comments in your Lex program.)

.SH Rules
The rules section of a TP Lex program contains the actual specification of
the lexical analyzer routine. It may be thought of as a big CASE statement
discriminating over the different patterns to be matched and listing the
corresponding statements (actions) to be executed. Each rule consists of a
regular expression describing the strings to be matched in the input, and a
corresponding action, a Turbo Pascal statement to be executed when the
expression matches. Expression and statement are delimited with whitespace
(blanks and/or tabs). Thus the format of a Lex grammar rule is:

.RS
expression	statement;
.RE

Note that the action must be a single Turbo Pascal statement terminated
with a semicolon (use begin ... end for compound statements). The statement
may span multiple lines if the successor lines are indented with at least
one blank or tab. The action may also be replaced by the | character,
indicating that the action for this rule is the same as that for the next
one.
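
For instance, the following sketch gives two patterns the same action (the
token code num again stands for a constant supplied by your program):

.RS
.nf
[0-9]+           |
[0-9]+\\.[0-9]+   return(num);
.fi
.RE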

The TP Lex library unit provides various variables and routines which are
useful in the programming of actions. In particular, the yytext string
variable holds the text of the matched string, and the yyleng Byte variable
its length.
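
For example, an action can inspect the matched text through these variables
(the identifier pattern is illustrative):

.RS
.nf
[a-zA-Z_][a-zA-Z_0-9]*   writeln('id of length ', yyleng, ': ', yytext);
.fi
.RE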

Regular expressions are used to describe the strings to be matched in a
grammar rule. They are built from the usual constructs describing character
classes and sequences, and operators specifying repetitions and alternatives.
The precise format of regular expressions is described in the next section.

The rules section may also start with some Turbo Pascal declarations
(enclosed in %{ %}) which are treated as local declarations of the
actions routine.

Finally, the auxiliary procedures section may contain arbitrary Turbo
Pascal code (such as supporting routines or a main program) which is
simply tacked on to the end of the output file. The auxiliary procedures
section is optional.

.SH Regular Expressions
The following table summarizes the format of the regular expressions
recognized by TP Lex (also compare Aho, Sethi, Ullman 1986, fig. 3.48).
c stands for a single character, s for a string, r for a regular expression,
and n,m for nonnegative integers.

.nf
expression  matches                        example
----------  -----------------------------  -------
c           any non-operator character c   a
\\c          character c literally          \\*
"s"         string s literally             "**"
.           any character but newline      a.*b
^           beginning of line              ^abc
$           end of line                    abc$
[s]         any character in s             [abc]
[^s]        any character not in s         [^abc]
r*          zero or more r's               a*
r+          one or more r's                a+
r?          zero or one r                  a?
r{m,n}      m to n occurrences of r        a{1,5}
r{m}        m occurrences of r             a{5}
r1r2        r1 then r2                     ab
r1|r2       r1 or r2                       a|b
(r)         r                              (a|b)
r1/r2       r1 when followed by r2         a/b
<x>r        r when in start condition x    <x>abc
---------------------------------------------------
.fi

The operators *, +, ? and {} have highest precedence, followed by
concatenation. The | operator has lowest precedence. Parentheses ()
may be used to group expressions and override the default precedences.
The <> and / operators may only occur once in an expression.

The usual C-like escapes are recognized:

.nf
\\n    denotes newline
\\r    denotes carriage return
\\t    denotes tab
\\b    denotes backspace
\\f    denotes form feed
\\NNN  denotes character no. NNN in octal base
.fi

You can also use the \\ character to quote characters which would otherwise
be interpreted as operator symbols. In character classes, you may use
the - character to denote ranges of characters. For instance, [a-z]
denotes the class of all lowercase letters.

The expressions in a TP Lex program may be ambiguous, i.e. there may be inputs
which match more than one rule. In such a case, the lexical analyzer prefers
the longest match and, if it still has the choice between different rules,
it picks the first of these. If no rule matches, the lexical analyzer
executes a default action which consists of copying the input character
to the output unchanged. Thus, if the purpose of a lexical analyzer is
to translate some parts of the input, and leave the rest unchanged, you
only have to specify the patterns which have to be treated specially. If,
however, the lexical analyzer has to absorb its whole input, you will have
to provide rules that match everything. E.g., you might use the rules

.RS
.nf
\&.   |
\\n  ;
.fi
.RE

which match "any other character" (and ignore it).

Sometimes certain patterns have to be analyzed differently depending on some
amount of context in which the pattern appears. In such a case the / operator
is useful. For instance, the expression a/b matches a, but only if followed
by b. Note that the b does not belong to the match; rather, the lexical
analyzer, when matching an a, will look ahead in the input to see whether
it is followed by a b, before it declares that it has matched an a. Such
lookahead may be arbitrarily complex (up to the size of the LexLib input
buffer). E.g., the pattern a/.*b matches an a which is followed by a b
somewhere on the same input line. TP Lex also has a means to specify left
context which is described in the next section.

.SH Start Conditions
TP Lex provides some features which make it possible to handle left context.
The ^ character at the beginning of a regular expression may be used to
denote the beginning of the line. More distant left context can be described
conveniently by using start conditions on rules.

Any rule which is prefixed with the <> construct is only valid if the lexical
analyzer is in the denoted start state. For instance, the expression <x>a
can only be matched if the lexical analyzer is in start state x. You can have
multiple start states in a rule; e.g., <x,y>a can be matched in start states
x or y.

Start states have to be declared in the definitions section by means of
one or more start state definitions (see above). The lexical analyzer enters
a start state through a call to the LexLib routine start. E.g., you may
write:

.RS
.nf
%start x y
%%
<x>a    start(y);
<y>b    start(x);
%%
begin
  start(x); if yylex=0 then ;
end.
.fi
.RE

Upon initialization, the lexical analyzer is put into state x. It then
proceeds in state x until it matches an a which puts it into state y.
In state y it may match a b which puts it into state x again, etc.

Start conditions are useful when certain constructs have to be analyzed
differently depending on some left context (such as a special character
at the beginning of the line), and if multiple lexical analyzers have to
work in concert. If a rule is not prefixed with a start condition, it is
valid in all user-defined start states, as well as in the lexical analyzer's
default start state.

.SH Lex Library
The TP Lex library (LexLib) unit provides various variables and routines
which are used by Lex-generated lexical analyzers and application programs.
It provides the input and output streams and other internal data structures
used by the lexical analyzer routine, and supplies some variables and utility
routines which may be used by actions and application programs. Refer to
the file lexlib.pas for a closer description.

You can also modify the Lex library unit (and/or the code template in the
yylex.cod file) to customize TP Lex to your target applications. E.g.,
you might wish to optimize the code of the lexical analyzer for some
special application, make the analyzer read from/write to memory instead
of files, etc.

.SH Implementation Restrictions
Internal table sizes and the main memory available limit the complexity of
source grammars that TP Lex can handle. There is currently no possibility to
change internal table sizes (apart from modifying the sources of TP Lex
itself), but the maximum table sizes provided by TP Lex seem to be large
enough to handle most realistic applications. The actual table sizes depend on
the particular implementation (they are much larger than the defaults if TP
Lex has been compiled with one of the 32 bit compilers such as Delphi 2 or
Free Pascal), and are shown in the statistics printed by TP Lex when a
compilation is finished. The units given there are "p" (positions, i.e. items
in the position table used to construct the DFA), "s" (DFA states) and "t"
(transitions of the generated DFA).

As implemented, the generated DFA table is stored as a typed array constant
which is inserted into the yylex.cod code template. The transitions in each
state are stored in order. Of course it would have been more efficient to
generate a big CASE statement instead, but I found that this may cause
problems with the encoding of large DFA tables because Turbo Pascal has
a quite rigid limit on the code size of individual procedures. I decided to
use a scheme in which transitions on different symbols to the same state are
merged into one single transition (specifying a character set and the
corresponding next state). This keeps the number of transitions in each state
quite small and still allows fairly efficient access to the transition
table.

The TP Lex program has an option (-o) to optimize DFA tables. This causes a
minimal DFA to be generated, using the algorithm described in Aho, Sethi,
Ullman (1986). Although the absolute limit on the number of DFA states that TP
Lex can handle is at least 300, TP Lex poses an additional restriction (100)
on the number of states in the initial partition of the DFA optimization
algorithm. Thus, you may get a fatal `integer set overflow' message when using
the -o option even when TP Lex is able to generate an unoptimized DFA. In such
cases you will just have to be content with the unoptimized DFA. (Hopefully,
this will be fixed in a future version. Anyhow, using the merged transitions
scheme described above, TP Lex usually constructs unoptimized DFAs which are
not far from being optimal, and thus in most cases DFA optimization won't have
a great impact on DFA table sizes.)

.SH Differences from UNIX Lex
Major differences between TP Lex and UNIX Lex are listed below.

TP Lex produces output code for Turbo Pascal, rather than for C.

Character tables (%T) are not supported; neither are any directives
to determine internal table sizes (%p, %n, etc.).

Library routines are named differently from the UNIX version (e.g.,
the `start' routine takes the place of the `BEGIN' macro of UNIX
Lex), and, of course, all macros of UNIX Lex (ECHO, REJECT, etc.) had
to be implemented as procedures.

The TP Lex library unit starts counting line numbers at 0, incrementing
the count BEFORE a line is read (in contrast, UNIX Lex initializes
yylineno to 1 and increments it AFTER the line end has been read). This
is motivated by the way in which TP Lex maintains the current line,
and will not affect your programs unless you explicitly reset the
yylineno value (e.g., when opening a new input file). In such a case
you should set yylineno to 0 rather than 1.