mirror of https://gitlab.com/freepascal.org/fpc/source.git
synced 2025-10-31 09:32:00 +01:00
Initial implementation
This commit is contained in:
parent a1bfbbfeea
commit 78794fe79a
install/man/man1/plex.1 (350 lines, Normal file)
@@ -0,0 +1,350 @@
.TH plex 1 "10 Jan 2000" FreePascal "Pascal lexical analyzer generator"
.SH NAME
plex \- the Pascal Lex lexical analyzer generator

.SH Usage

.B plex [options] lex-file[.l] [output-file[.pas]]

.SH Options

.TP
.B \-v
.I Verbose:
Lex generates a readable description of the generated
lexical analyzer, written to lex-file with new extension
.I .lst
.TP
.B \-o
.I Optimize:
Lex optimizes DFA tables to produce a minimal DFA.
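
For instance, the following invocation (the file names are hypothetical)
generates an optimized analyzer scanner.pas from the grammar scanner.l:

.RS
  plex -o scanner.l scanner.pas
.RE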

.SH Description

TP Lex is a program generator that is used to generate the Turbo Pascal source
code for a lexical analyzer subroutine from the specification of an input
language by a regular expression grammar.

TP Lex parses the source grammar contained in lex-file (with default suffix
.l) and writes the constructed lexical analyzer subroutine to the specified
output-file (with default suffix .pas); if no output file is specified, output
goes to lex-file with new suffix .pas. If any errors are found during
compilation, error messages are written to the list file (lex-file with new
suffix .lst).

The generated output file contains a lexical analyzer routine, yylex,
implemented as:

.RS
  function yylex : Integer;
.RE

This routine has to be called by your main program to execute the lexical
analyzer. The return value of the yylex routine usually denotes the number
of a token recognized by the lexical analyzer (see the return routine in the
LexLib unit). At end-of-file the yylex routine normally returns 0.
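
A minimal driver loop might look as follows (a sketch only; it assumes the
generated analyzer and the LexLib unit are in scope, and what you do with
each token number is application-specific):

.RS
.nf
  var tok : Integer;
  begin
    repeat
      tok := yylex;
      { dispatch on the token number tok here }
    until tok = 0;
  end.
.fi
.RE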

The code template for the yylex routine may be found in the yylex.cod
file. This file is needed by TP Lex when it constructs the output file. It
must be present either in the current directory or in the directory from which
TP Lex was executed (TP Lex searches these directories in the indicated
order). (NB: For the Linux/Free Pascal version, the code template is searched
for in a directory defined at compile time instead of the execution path,
usually /usr/lib/fpc/lexyacc.)

The TP Lex library (LexLib) unit is required by programs using Lex-generated
lexical analyzers; you will therefore have to put an appropriate uses clause
into your program or unit that contains the lexical analyzer routine. The
LexLib unit also provides various useful utility routines; see the file
lexlib.pas for further information.


.SH Lex Source

A TP Lex program consists of three sections separated with the %% delimiter:

.LP
definitions
.LP
%%
.LP
rules
.LP
%%
.LP
auxiliary procedures

All sections may be empty. The TP Lex language is line-oriented; definitions
and rules are separated by line breaks. There is no special notation for
comments, but (Turbo Pascal style) comments may be included as Turbo Pascal
fragments (see below).

The definitions section may contain the following elements:

.TP
.B regular definitions in the format:

     name   substitution

which serve to abbreviate common subexpressions. The {name} notation
causes the corresponding substitution from the definitions section to
be inserted into a regular expression. The name must be a legal
identifier (a letter followed by a sequence of letters and digits;
the underscore counts as a letter; upper- and lowercase are distinct).
Regular definitions must be non-recursive.
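
For instance, the (illustrative) definition

.RS
.nf
  digit   [0-9]
.fi
.RE

allows a rule pattern to be written as {digit}+ instead of [0-9]+.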

.TP
.B start state definitions in the format:

     %start name ...

which are used in specifying start conditions on rules (described
below). The %start keyword may also be abbreviated as %s or %S.

.TP
.B Turbo Pascal declarations enclosed between %{ and %}.

These will be
inserted into the output file (at global scope). Also, any line that
does not look like a Lex definition (e.g., one that starts with a blank or
tab) will be treated as Turbo Pascal code. (In particular, this also allows
you to include Turbo Pascal comments in your Lex program.)
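
An illustrative definitions-section fragment (the declared constant is
hypothetical):

.RS
.nf
  %{
  const MaxIdLen = 32;  { copied verbatim into the output file }
  { a Turbo Pascal comment, also passed through }
  %}
.fi
.RE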

.SH Rules

The rules section of a TP Lex program contains the actual specification of
the lexical analyzer routine. It may be thought of as a big CASE statement
discriminating over the different patterns to be matched and listing the
corresponding statements (actions) to be executed. Each rule consists of a
regular expression describing the strings to be matched in the input, and a
corresponding action, a Turbo Pascal statement to be executed when the
expression matches. Expression and statement are delimited with whitespace
(blanks and/or tabs). Thus the format of a Lex grammar rule is:

   expression      statement;

Note that the action must be a single Turbo Pascal statement terminated
with a semicolon (use begin ... end for compound statements). The statement
may span multiple lines if the successor lines are indented with at least
one blank or tab. The action may also be replaced by the | character,
indicating that the action for this rule is the same as that for the next
one.
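
For instance (a sketch; the token constants T_BEGIN and T_NUM are
hypothetical):

.RS
.nf
  "begin"   |
  "BEGIN"   return(T_BEGIN);
  [0-9]+    begin
              writeln('number: ', yytext);
              return(T_NUM);
            end;
.fi
.RE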

The TP Lex library unit provides various variables and routines which are
useful in the programming of actions. In particular, the yytext string
variable holds the text of the matched string, and the yyleng Byte variable
its length.
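
For instance, an action may refer to the matched text and its length:

.RS
.nf
  [a-zA-Z]+   writeln(yytext, ' (', yyleng, ' characters)');
.fi
.RE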

Regular expressions are used to describe the strings to be matched in a
grammar rule. They are built from the usual constructs describing character
classes and sequences, and operators specifying repetitions and alternatives.
The precise format of regular expressions is described in the next section.

The rules section may also start with some Turbo Pascal declarations
(enclosed in %{ %}) which are treated as local declarations of the
actions routine.

Finally, the auxiliary procedures section may contain arbitrary Turbo
Pascal code (such as supporting routines or a main program) which is
simply tacked on to the end of the output file. The auxiliary procedures
section is optional.


.SH Regular Expressions

The following table summarizes the format of the regular expressions
recognized by TP Lex (also compare Aho, Sethi, Ullman 1986, fig. 3.48).
c stands for a single character, s for a string, r for a regular expression,
and n,m for nonnegative integers.

.nf
expression   matches                        example
----------   ----------------------------   -------
c            any non-operator character c   a
\\c           character c literally          \\*
"s"          string s literally             "**"
.            any character but newline      a.*b
^            beginning of line              ^abc
$            end of line                    abc$
[s]          any character in s             [abc]
[^s]         any character not in s         [^abc]
r*           zero or more r's               a*
r+           one or more r's                a+
r?           zero or one r                  a?
r{m,n}       m to n occurrences of r        a{1,5}
r{m}         m occurrences of r             a{5}
r1r2         r1 then r2                     ab
r1|r2        r1 or r2                       a|b
(r)          r                              (a|b)
r1/r2        r1 when followed by r2         a/b
<x>r         r when in start condition x    <x>abc
---------------------------------------------------
.fi

The operators *, +, ? and {} have highest precedence, followed by
concatenation. The | operator has lowest precedence. Parentheses ()
may be used to group expressions and override the default precedences.
The <> and / operators may only occur once in an expression.
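
For example, these precedences imply:

.RS
.nf
  ab*     an a followed by zero or more b's
  (ab)*   zero or more repetitions of ab
  a|bc    either a, or the sequence bc
.fi
.RE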

The usual C-like escapes are recognized:

.nf
\\n     denotes newline
\\r     denotes carriage return
\\t     denotes tab
\\b     denotes backspace
\\f     denotes form feed
\\NNN   denotes character no. NNN in octal base
.fi

You can also use the \\ character to quote characters which would otherwise
be interpreted as operator symbols. In character classes, you may use
the - character to denote ranges of characters. For instance, [a-z]
denotes the class of all lowercase letters.

The expressions in a TP Lex program may be ambiguous, i.e. there may be inputs
which match more than one rule. In such a case, the lexical analyzer prefers
the longest match and, if it still has the choice between different rules,
it picks the first of these. If no rule matches, the lexical analyzer
executes a default action which consists of copying the input character
to the output unchanged. Thus, if the purpose of a lexical analyzer is
to translate some parts of the input, and leave the rest unchanged, you
only have to specify the patterns which have to be treated specially. If,
however, the lexical analyzer has to absorb its whole input, you will have
to provide rules that match everything. E.g., you might use the rules

.nf
   .   |
   \\n  ;
.fi

which match "any other character" (and ignore it).

Sometimes certain patterns have to be analyzed differently depending on some
amount of context in which the pattern appears. In such a case the / operator
is useful. For instance, the expression a/b matches a, but only if followed
by b. Note that the b does not belong to the match; rather, the lexical
analyzer, when matching an a, will look ahead in the input to see whether
it is followed by a b, before it declares that it has matched an a. Such
lookahead may be arbitrarily complex (up to the size of the LexLib input
buffer). E.g., the pattern a/.*b matches an a which is followed by a b
somewhere on the same input line. TP Lex also has a means to specify left
context, which is described in the next section.


.SH Start Conditions

TP Lex provides some features which make it possible to handle left context.
The ^ character at the beginning of a regular expression may be used to
denote the beginning of the line. More distant left context can be described
conveniently by using start conditions on rules.

Any rule which is prefixed with the <> construct is only valid if the lexical
analyzer is in the denoted start state. For instance, the expression <x>a
can only be matched if the lexical analyzer is in start state x. You can have
multiple start states in a rule; e.g., <x,y>a can be matched in start states
x or y.

Start states have to be declared in the definitions section by means of
one or more start state definitions (see above). The lexical analyzer enters
a start state through a call to the LexLib routine start. E.g., you may
write:

.nf
%start x y
%%
<x>a    start(y);
<y>b    start(x);
%%
begin
  start(x); if yylex=0 then ;
end.
.fi

Upon initialization, the lexical analyzer is put into state x. It then
proceeds in state x until it matches an a, which puts it into state y.
In state y it may match a b, which puts it into state x again, etc.

Start conditions are useful when certain constructs have to be analyzed
differently depending on some left context (such as a special character
at the beginning of the line), and when multiple lexical analyzers have to
work in concert. If a rule is not prefixed with a start condition, it is
valid in all user-defined start states, as well as in the lexical analyzer's
default start state.


.SH Lex Library

The TP Lex library (LexLib) unit provides various variables and routines
which are used by Lex-generated lexical analyzers and application programs.
It provides the input and output streams and other internal data structures
used by the lexical analyzer routine, and supplies some variables and utility
routines which may be used by actions and application programs. Refer to
the file lexlib.pas for a closer description.

You can also modify the Lex library unit (and/or the code template in the
yylex.cod file) to customize TP Lex to your target applications. E.g.,
you might wish to optimize the code of the lexical analyzer for some
special application, make the analyzer read from/write to memory instead
of files, etc.


.SH Implementation Restrictions

Internal table sizes and the main memory available limit the complexity of
source grammars that TP Lex can handle. There is currently no way to
change internal table sizes (apart from modifying the sources of TP Lex
itself), but the maximum table sizes provided by TP Lex seem to be large
enough to handle most realistic applications. The actual table sizes depend on
the particular implementation (they are much larger than the defaults if TP
Lex has been compiled with one of the 32 bit compilers such as Delphi 2 or
Free Pascal), and are shown in the statistics printed by TP Lex when a
compilation is finished. The units given there are "p" (positions, i.e. items
in the position table used to construct the DFA), "s" (DFA states) and "t"
(transitions of the generated DFA).

As implemented, the generated DFA table is stored as a typed array constant
which is inserted into the yylex.cod code template. The transitions in each
state are stored in order. Of course it would have been more efficient to
generate a big CASE statement instead, but I found that this may cause
problems with the encoding of large DFA tables because Turbo Pascal has
a quite rigid limit on the code size of individual procedures. I decided to
use a scheme in which transitions on different symbols to the same state are
merged into one single transition (specifying a character set and the
corresponding next state). This keeps the number of transitions in each state
quite small and still allows fairly efficient access to the transition
table.

The TP Lex program has an option (-o) to optimize DFA tables. This causes a
minimal DFA to be generated, using the algorithm described in Aho, Sethi,
Ullman (1986). Although the absolute limit on the number of DFA states that TP
Lex can handle is at least 300, TP Lex poses an additional restriction (100)
on the number of states in the initial partition of the DFA optimization
algorithm. Thus, you may get a fatal `integer set overflow' message when using
the -o option even when TP Lex is able to generate an unoptimized DFA. In such
cases you will just have to be content with the unoptimized DFA. (Hopefully,
this will be fixed in a future version. Anyhow, using the merged transitions
scheme described above, TP Lex usually constructs unoptimized DFAs which are
not far from optimal, and thus in most cases DFA optimization won't have
a great impact on DFA table sizes.)


.SH Differences from UNIX Lex

Major differences between TP Lex and UNIX Lex are listed below.

TP Lex produces output code for Turbo Pascal, rather than for C.

Character tables (%T) are not supported; neither are any directives
to determine internal table sizes (%p, %n, etc.).

Library routines are named differently from the UNIX version (e.g.,
the `start' routine takes the place of the `BEGIN' macro of UNIX
Lex), and, of course, all macros of UNIX Lex (ECHO, REJECT, etc.) had
to be implemented as procedures.

The TP Lex library unit starts counting line numbers at 0, incrementing
the count BEFORE a line is read (in contrast, UNIX Lex initializes
yylineno to 1 and increments it AFTER the line end has been read). This
is motivated by the way in which TP Lex maintains the current line,
and will not affect your programs unless you explicitly reset the
yylineno value (e.g., when opening a new input file). In such a case
you should set yylineno to 0 rather than 1.
install/man/man1/pyacc.1 (1049 lines, Normal file)
(File diff suppressed because it is too large)
michael