%
%   $Id$
%   This file is part of the FPC documentation.
%   Copyright (C) 1998 by Florian Klaempfl
%
%   The FPC documentation is free text; you can redistribute it and/or
%   modify it under the terms of the GNU Library General Public License as
%   published by the Free Software Foundation; either version 2 of the
%   License, or (at your option) any later version.
%
%   The FPC Documentation is distributed in the hope that it will be useful,
%   but WITHOUT ANY WARRANTY; without even the implied warranty of
%   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
%   Library General Public License for more details.
%
%   You should have received a copy of the GNU Library General Public
%   License along with the FPC documentation; see the file COPYING.LIB.  If not,
%   write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
%   Boston, MA 02111-1307, USA.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Preamble.
\input{preamble.inc}
\ifpdf
\pdfinfo{/Author(Michael Van Canneyt)
         /Title(Programmers' Guide)
         /Subject(Free Pascal Compiler documentation)
         /Keywords(Free Pascal, Compiler, Internals)
         }
\fi
%
% Settings
%
\makeindex
%
% Start of document.
%
\begin{document}
\title{Free Pascal :\\ Compiler documentation}
\docdescription{Compiler documentation for \fpc, version \fpcversion}
\docversion{1.0}
\input{date.inc}
\author{Micha\"el Van Canneyt\\Florian Kl\"ampfl}
\maketitle
\tableofcontents

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Introduction
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Introduction}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% About this document
\section{About this document}

This document tries to clarify the internal workings of \fpc.
It assumes that the reader has some knowledge of compiler
construction.

This document describes the compiler as it exists and functions at the
time of writing.
Since the compiler is under continuous development, some of the things
described here may be outdated. In case of doubt, consult the \file{README}
files distributed with the compiler.
The \file{README} files are, in case of conflict with this manual,
authoritative.

I hope my poor English is understandable. Feel free to correct
spelling mistakes.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% About the compiler
\section{About the compiler}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Getting more information.
\section{Getting more information.}

The ultimate source of information about compiler internals is
the compiler source itself, though it isn't very well documented. If you
need more information, join the developers mailing list
or contact the developers directly.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Overview
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Overview}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% History
\section{History}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The compiler passes
\section{The compiler passes}

It isn't easy to divide the compilation process of \fpc into passes
in the way many texts on compiler construction describe it,
but I would say \fpc does the compilation in five passes:

\begin{enumerate}
\item Scanning and Parsing. The compiler reads the input file,
does preprocessing (i.e.
reading include files, expanding macros ...) (\ref{ch:scanner}),
and the parser (\ref{ch:parser}) creates a parse tree (\ref{ch:parse_tree}).
During this pass the compiler also builds the symbol tables
(\ref{ch:symbol_tables}).
\item Semantic analysis. This pass checks whether the semantics of
the code are correct, i.e. whether the types of expressions match
the operators (\ref{ch:semantical_analysis}). This pass also determines
how many registers are needed to evaluate an expression; this
information is used by the code generator later.
\item Code generation
\item Optimization of the generated assembler
\item Assembler writing
\end{enumerate}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The scanner
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% \chapter{The scanner}
\label{ch:scanner}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The symbol tables
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{The symbol tables}
\label{ch:symbol_tables}

The symbol table is used to store information about all
symbols, declarations and definitions in a program.
Viewed abstractly, a symbol table is a database indexed by a
string field. \fpc implements the symbol table mainly as a
binary tree, but for big symbol tables some hashing techniques
are used. The implementation can be found in
\file{symtable.pas}, object \var{tsymtable}.

The symbol table module can't be associated with a single stage of the
compiler, because every stage accesses it.
The scanner uses a symbol table to handle preprocessor symbols, the
parser inserts declarations, and the code generator uses the collected
information about symbols and types to generate the code.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Definitions
\section{Definitions}

Definitions are one of the most important data structures in \fpc.
They are used to describe types: for example, the type of a variable
symbol is given by a definition, and the result type
of an expression is given as a definition.
They have nothing to do with the definition of a procedure.
Definitions are implemented as an object (in file \file{symtable.pas},
\var{tdef} and its descendants). There are many different
definitions, for example to describe
ordinal types, arrays, pointers, procedures, ...

To make this clearer, let's have a look at the fields of \var{tdef}:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Symbols
%% \section{Symbols}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Working with symbol tables
%% \section{Working with symbol tables}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The parse tree
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% \chapter{The parse tree}
\label{ch:parse_tree}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The parser
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% \chapter{The parser}
\label{ch:parser}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The semantical analysis
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% \chapter{The semantical analysis}
\label{ch:semantical_analysis}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The code generation
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% \chapter{The code generation}
\label{ch:code_generation}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The assembler writers
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{The assembler writers}
\label{ch:assembler_writers}

\fpc doesn't generate machine language; it generates
assembler, which must then be assembled and linked.

The assembler output is configurable: \fpc can create
assembler for the \file{GNU AS}, the \file{NASM} (Netwide assembler) and
the assemblers of Borland and Microsoft. The default assembler
is the \file{GNU AS}, because it is fast and available on
many platforms. Why don't we use the \file{NASM}? It is 2-4 times
slower than the \file{GNU AS} and it was created for
hand-written assembler, while the \file{GNU AS} is designed
as a back end for a compiler.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Miscellaneous
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% \chapter{Miscellaneous}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The register allocation
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{The register allocation}

The register allocation is very hairy, so it gets
its own chapter in this manual. Please be careful when changing things
regarding the register allocation and test such changes intensively.

Future versions may implement another kind of register allocation
to make this part of the compiler more robust, see
\ref{se:future_plans}. But the current
system more or less works and changing it would be a lot of
work, so we have to live with it.

The current register allocation mechanism was implemented 5 years
ago, and I didn't think that the compiler would become
so popular, so not much time was spent on its design.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Basics
\section{Basics}

The register allocation is done in the first and the second pass of
the compiler.
The first pass of a node has to calculate how many registers
are necessary to generate code for the node, and it also has
to take into account how many registers its child nodes need.

Registers are allocated via \var{getregister\*}.
%(where * is \var{32} or \var{mmx}).

Registers can be released via \var{ungetregister\*}. All registers
of a reference (i.e. base and index) can be released by
\var{del\_reference}. These procedures take care of the register type,
i.e. stack/base registers and registers allocated by register
variables aren't added to the set of unused registers.

If there is a problem in the register allocation, an
\var{internalerror(10)} occurs.
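
The behaviour described above can be modelled with a small stand-alone
program. This is only a sketch and not the compiler's code: the reduced
\var{tregister} enumeration, the \var{allocatable} and \var{unused} sets
and the stub \var{internalerror} are assumptions made for illustration;
only the names \var{getregister32}, \var{ungetregister32} and
\var{usableregs} are taken from this chapter. Registers outside the
allocatable pool stand in for stack/base registers and register variables.

\begin{verbatim}
program regalloc_sketch;

{$mode objfpc}

type
  // hypothetical, reduced register enumeration
  tregister = (R_NO, R_EAX, R_EBX, R_ECX, R_EDX, R_ESI, R_EDI);
  tregisterset = set of tregister;

const
  // registers the toy allocator may hand out (an assumption)
  allocatable = [R_EAX, R_EBX, R_ECX, R_EDX];

var
  unused : tregisterset;   // set of currently free registers
  usableregs : integer;    // number of currently free registers

// stand-in for the compiler's internal error handler
procedure internalerror(i : integer);
begin
  writeln('internal error ', i);
  halt(1);
end;

// take the first free register out of the pool
function getregister32 : tregister;
var
  r : tregister;
begin
  for r := R_EAX to R_EDX do
    if r in unused then
      begin
        exclude(unused, r);
        dec(usableregs);
        getregister32 := r;
        exit;
      end;
  // no register left: a problem in the register allocation
  internalerror(10);
  getregister32 := R_NO;
end;

// give a register back; registers that the allocator does not
// own are silently ignored and never enter the unused set
procedure ungetregister32(r : tregister);
begin
  if (r in allocatable) and not (r in unused) then
    begin
      include(unused, r);
      inc(usableregs);
    end;
end;

var
  hr : tregister;
begin
  unused := allocatable;
  usableregs := 4;
  hr := getregister32;
  writeln(usableregs);      // 3: one register is in use
  ungetregister32(R_ESI);   // ignored, not an allocatable register
  ungetregister32(hr);
  writeln(usableregs);      // 4: the register is free again
end.
\end{verbatim}

Keeping the free registers in a set makes the type check in the release
procedure a single membership test, which is why the sketch uses a set
rather than a list.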

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% A simple example
\section{A simple example}

\subsection{The first pass}

This is a part of the first pass for a pointer dereference
(\var{p\^\ }); the type determination and some other details are left out.

\begin{verbatim}
procedure firstderef(var p : ptree);

  begin
     // .....
     // first pass of the child node
     firstpass(p^.left);

     // .....

     // to dereference a pointer we need one register,
     // but if the child node needs more registers, we
     // have to pass this to our parent node
     p^.registers32:=max(p^.left^.registers32,1);

     // a pointer dereference doesn't need
     // fpu or mmx registers of its own
     p^.registersfpu:=p^.left^.registersfpu;
     p^.registersmmx:=p^.left^.registersmmx;

     // .....
  end;
\end{verbatim}

\subsection{The second pass}

The following code contains the complete second pass for
a pointer dereferencing node as it is used by current
compiler versions:

\begin{verbatim}
procedure secondderef(var p : ptree);

  var
     hr : tregister;

  begin
     // second pass of the child node, this also generates
     // the code of the child node
     secondpass(p^.left);
     // set up the reference (this sets all values to nil, zero or
     // R_NO)
     clear_reference(p^.location.reference);

     // now we have to distinguish the different locations where
     // the result of the child node could be stored
     case p^.left^.location.loc of

        LOC_REGISTER:
          // LOC_REGISTER allows us to simply use the
          // result register of the left node
          p^.location.reference.base:=p^.left^.location.register;

        LOC_CREGISTER:
          begin
             // we shouldn't destroy the result register of the
             // child node, because it is a register variable,
             // so we allocate a register
             hr:=getregister32;

             // generate the loading instruction
             emit_reg_reg(A_MOV,S_L,p^.left^.location.register,hr);

             // set up the result location of the current node
             p^.location.reference.base:=hr;
          end;

        LOC_MEM,LOC_REFERENCE:
          begin
             // first, we have to release the registers of
             // the reference before we can allocate a
             // register; del_reference releases only the
             // registers used by the reference,
             // the contents of the registers aren't destroyed
             del_reference(p^.left^.location.reference);

             // now there should be at least one register free, so
             // we can allocate one for the base of the result
             hr:=getregister32;

             // generate the dereferencing instruction
             exprasmlist^.concat(new(pai386,op_ref_reg(
               A_MOV,S_L,newreference(p^.left^.location.reference),
               hr)));

             // set up the location of the newly created reference
             p^.location.reference.base:=hr;
          end;
     end;
  end;
\end{verbatim}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Binary nodes
\section{Binary nodes}

The whole thing becomes a little bit more hairy if you have to
generate code for a binary+ node (a node with two or more
children). If a node calls the second pass for a child node,
it has to ensure that enough registers are free
to evaluate the child node
(\var{usableregs>=childnode\^.registers32}). If this condition isn't
met, the current node has to store and restore all registers which the
node owns in order to release registers. This should be done using the
procedures \var{maybe\_push} and \var{restore}. If still
\var{usableregs