fpc/docs/internal.tex
2001-07-29 20:28:00 +00:00

602 lines
20 KiB
TeX

%
% $Id$
% This file is part of the FPC documentation.
% Copyright (C) 1998 by Florian Klaempfl
%
% The FPC documentation is free text; you can redistribute it and/or
% modify it under the terms of the GNU Library General Public License as
% published by the Free Software Foundation; either version 2 of the
% License, or (at your option) any later version.
%
% The FPC Documentation is distributed in the hope that it will be useful,
% but WITHOUT ANY WARRANTY; without even the implied warranty of
% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
% Library General Public License for more details.
%
% You should have received a copy of the GNU Library General Public
% License along with the FPC documentation; see the file COPYING.LIB. If not,
% write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
% Boston, MA 02111-1307, USA.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Preamble.
\input{preamble.inc}
\ifpdf
\pdfinfo{/Author(Michael Van Canneyt)
/Title(Programmers' Guide)
/Subject(Free Pascal Compiler documentation)
/Keywords(Free Pascal, Compiler, Internals)
}
\fi
%
% Settings
%
\makeindex
%
% Start of document.
%
\begin{document}
\title{Free Pascal :\\ Compiler documentation}
\docdescription{Compiler documentation for \fpc, version \fpcversion}
\docversion{1.0}
\input{date.inc}
\author{Micha\"el Van Canneyt\\Florian Kl\"ampfl}
\maketitle
\tableofcontents
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Introduction
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Introduction}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% About this document
\section{About this document}
This document tries to make the internal workings of \fpc more clear.
It is assumed that the reader has some knowledge about compiler
building.
This document describes the compiler as it is/functions at the time of
writing. Since the compiler is under continuous development, some of the
things described here may be outdated. In case of doubt, consult the
\file{README} files distributed with the compiler.
The \file{README} files are, in case of conflict with this manual,
authoritative.
I hope, my poor english is quite understandable. Feel free to correct
spelling mistakes.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% About the compiler
\section{About the compiler}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Getting more information.
\section{Getting more information.}
The ultimate source for information about compiler internals is
the compiler source, though it isn't very well documented. If you
need more information you should join the developers mailing
list or you can contact the developers.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Overview
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Overview}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% History
\section{History}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The compiler passes
\section{The compiler passes}
It isn't easy to divide the compilation process of \fpc into passes
how it is described by many thesis about compiler building,
but I would say \fpc does the compilation in five passes:
\begin{enumerate}
\item Scanning and Parsing. The compiler reads the input file,
does preprocessing (i. e.
reading include files, expanding macros ...) (\ref{ch:scanner})
and the parser (\ref{ch:parser}) creates a parse tree (\ref{ch:parse_tree}).
While this pass the compiler builds also the symbol tables
(\ref{ch:symbol_tables}).
\item Semantic analysis. This pass checks if semantic of
the code is correct, i.e. if the types of expressions matches
to the operators (\ref{ch:semantical_analysis}). This pass determines
also how many registers are needed to evalute an expression, this
information is used by the code generator later.
\item Code generation
\item Optimizing of the assembler
\item Assembler writing
\end{enumerate}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The scanner
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% \chapter{The scanner}
\label{ch:scanner}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The symbol tables
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{The symbol tables}
\label{ch:symbol_tables}
The symbol table is used to store information about all
symbols, declarations and definitions in a program.
In an abstract view, a symbol table is a data base with a string field
as index. \fpc implements the symbol table mainly as a binary tree, but
for big symbol tables some hash technics are used. The implementation
can be found in symtable.pas, object tsymtable.
The symbol table module can't be associated with a stage of the compiler,
each stage accesses it.
The scanner uses a symbol table to handle preprocessor symbols, the
parser inserts declaration and the code generator uses the collected
information about symbols and types to generate the code.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Definitions
\section{Definitions}
Definitions are one of the most important data structures in \fpc.
They are used to describe types, for example the type of a variable
symbol is given by a definition and the result type
of a expression is given as a definition.
They have nothing to do with the definition of a procedure.
Definitions are implemented as an object (in file \file{symtable.pas},
\var{tdef} and its descendents). There are a lot of different
definitions, for example to describe
ordinal types, arrays, pointers, procedures, ...
To make it more clear let's have a look at the fields of tdef:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Symbols
%% \section{Symbols}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Working with symbol tables
%% \section{Working with symbol tables}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The parse tree
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% \chapter{The parse tree}
\label{ch:parse_tree}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The parser
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% \chapter{The parser}
\label{ch:parser}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The semantical analysis
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% \chapter{The semantical analysis}
\label{ch:semantical_analysis}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The code generation
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% \chapter{The code generation}
\label{ch:code_generation}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The assembler writers
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{The assembler writers}
\label{ch:assembler_writers}
\fpc doesn't generate machine language, it generates
assembler which must be assembled and linked.
The assembler output is configurable, \fpc can create
assembler for the \file{GNU AS}, the \file{NASM} (Netwide assembler) and
the assemblers of Borland and Microsoft. The default assembler
is the \file{GNU AS}, because it is fast and and available on
many platforms. Why don't we use the \file{NASM}? It is 2-4 times
slower than the \file{GNU AS} and it is created for
hand-written assembler, while the \file{GNU AS} is designed
as back end for a compiler.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Miscalleanous
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% \chapter{Miscalleanous}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The register allocation
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{The register allocation}
The register allocation is very hairy, so it gets
an own chapter in this manual. Please be careful when changing things
regarding the register allocation and test such changes intensive.
Future versions will may implement another kind of register allocation
to make this part of the compiler more robust, see
\ref{se:future_plans}. But the current
system is less or more working and changing it would be a lot of
work, so we have to live with it.
The current register allocation mechanism was implemented 5 years
ago and I didn't think that the compiler would become
so popular, so not much time was spent in the design of it.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Basics
\section{Basics}
The register allocation is done in the first and the second pass of
the compiler.
The first pass of a node has to calculate how much registers
are necessary to generate code for the node, but it also has
to take care of child nodes i.e. how much registers
they need.
The register allocation is done via \var{getregister\*}
%(where * is \var{32} or \var{mmx}).
Registers can be released via \var{ungetregister\*}. All registers
of a reference (i.e. base and index) can be released by
\var{del\_reference}. These procedures take care of the register type,
i.e. stack/base registers and registers allocated by register
variables aren't added to the set of unused registers.
If there is a problem in the register allocation an \var{internalerror(10)}
occurs.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% A simple example
\section{A simple example}
\subsection{The first pass}
This is a part of the first pass for a pointer dereferencation
(\var{p\^\ }), the type determination and some other stuff are left out.
\begin{verbatim}
procedure firstderef(var p : ptree);
begin
// .....
// first pass of the child node
firstpass(p^.left);
// .....
// to dereference a pointer we need one one register
// but if the child node needs more registers, we
// have to pass this to our parent node
p^.registers32:=max(p^.left^.registers32,1);
// a pointer dereferencation doesn't need
// fpu or mmx registers
p^.registersfpu:=p^.left^.registersfpu;
p^.registersmmx:=p^.left^.registersmmx;
// .....
end;
\end{verbatim}
\subsection{The second pass}
The following code contains the complete second pass for
a pointer dereferencing node as it is used by current
compiler versions:
\begin{verbatim}
procedure secondderef(var p : ptree);
var
hr : tregister;
begin
// second pass of the child node, this generates also
// the code of the child node
secondpass(p^.left);
// setup the reference (this sets all values to nil, zero or
// R_NO)
clear_reference(p^.location.reference);
// now we have to distinguish the different locations where
// the child node could be stored
case p^.left^.location.loc of
LOC_REGISTER:
// LOC_REGISTER allows us to use simply the
// result register of the left node
p^.location.reference.base:=p^.left^.location.register;
LOC_CREGISTER:
begin
// we shouldn't destroy the result register of the
// result node, because it is a register variable
// so we allocate a register
hr:=getregister32;
// generate the loading instruction
emit_reg_reg(A_MOV,S_L,p^.left^.location.register,hr);
// setup the result location of the current node
p^.location.reference.base:=hr;
end;
LOC_MEM,LOC_REFERENCE:
begin
// first, we have to release the registers of
// the reference, before we can allocate
// register, del_reference release only the
// registers used by the reference,
// the contents of the registers isn't destroyed
del_reference(p^.left^.location.reference);
// now there should be at least one register free, so
// we can allocate one for the base of the result
hr:=getregister32;
// generate dereferencing instruction
exprasmlist^.concat(new(pai386,op_ref_reg(
A_MOV,S_L,newreference(p^.left^.location.reference),
hr)));
// setup the location of the new created reference
p^.location.reference.base:=hr;
end;
end;
end;
\end{verbatim}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Binary nodes
\section{Binary nodes}
The whole thing becomes a little bit more hairy if you have to
generate code for a binary+ node (a node with two or more
childs). If a node calls second pass for a child node,
it has to ensure that enough registers are free
to evaluate the child node (\var{usableregs>=childnode\^.registers32}).
If this condition isn't met, the current node has
to store and restore all registers which the node owns to
release registers. This should be done using the
procedures \var{maybe\_push} and \var{restore}. If still
\var{usableregs<childnode\^.registers32}, the child nodes have to solve
the problem. The point is: if \var{usableregs<childnode\^.registers32},
the current node has to release all registers which it owns
before the second pass is called. An example for generating
code of a binary node is \var{cg386add.secondadd}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% FPU registers
\section{FPU registers}
The number of required FPU registers also has to be calculated, but
there's one difference: you don't have to save registers. If not
enough FPU registers are free, an error message is generated, as the user
has to take care of this situation since this is a consequence
of the stack structure of the FPU.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Testing register allocation
\section{Testing register allocation}
To test new stuff, you should compile a procedure which contains some local
longint variables with \file{-Or}, to limit the number of
registers:
\begin{verbatim}
procedure test;
var
l,i,j,k : longint;
begin
l:=i; // this forces the compiler to assign as much as
j:=k; // possible variables to registers
// here you should insert your code
end;
\end{verbatim}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Future plans
\section{Future plans}
\label{se:future_plans}
\appendix
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Coding style guide
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Coding style guide}
This chapter describes what you should consider if you modify the
compiler sources.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The formatting of the source
\section{The formatting of the sources}
Rules how to format the sources.
\begin{itemize}
\item All compiler files should be saved in UNIX format i.e. only
a line feed (\#10), no carrige return (\#13).
\item Don't use tabs
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Some hints how to write the code
\section{Some hints how to write the code}
\begin{itemize}
\item Assigned should be used instead of checking for nil directly, as
it can help solving pointer problems when in real mode.
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Compiler Defines
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Compiler Defines}
The compiler can be configured using command line defines, the
basic set is decribed here, switches which change rapidly or
which are only used temporarly are described in the header
of \file{PP.PAS}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Target Processor
\section{Target processor}
The target processor must be set always and it can be:
\begin{description}
\item [\var{I386}] for Intel 32 bit processors of the i386 class
\item [\var{M68K}] for Motorola processors of the 68000 class
\end{description}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Include compiler Parts
\section{Include compiler Parts}
\subsection{General}
\begin{description}
\item[\var{GDB}] include GDB stab debugging (\file{-g}) support
\item[\var{UseBrowser}] include Browser (\file{-b}) support
\end{description}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Leave Out specific Parts
\section{Leave Out specific Parts}
Leaving out parts of the compiler can be useful if you want to create
a compiler which should also run on systems with less memory
requirements (for example a real mode version compiled with Turbo Pascal).
\subsection{General}
\begin{description}
\item[\var{NoOpt}] will leave out the optimizer
\end{description}
\subsection{I386 specific}
The following defines apply only to the i386 version of the compiler.
\begin{description}
\item[\var{NoAg386Int}] No Intel styled assembler (for MASM/TASM) writer
\item[\var{NoAg386Nsm}] No NASM assembler writer
\item[\var{NoAg386Att}] No AT\&T assembler (for the GNU AS) writer
\item[\var{NoRA386Int}] No Intel assembler parser
\item[\var{NoRA386Dir}] No direct assembler parser
\item[\var{NoRA386Att}] No AT\&T assembler parser
\end{description}
\subsection{M68k specific}
The following defines apply only to the M68k version of the compiler.
\begin{description}
\item[\var{NoAg68kGas}] No gas asm writer
\item[\var{NoAg68kMit}] No mit asm writer
\item[\var{NoAg68kMot}] No mot asm writer
\item[\var{NoRA68kMot}] No Motorola assembler parser
\end{description}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Location of the code generator functions
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Location of the code generator functions}
This appendix describes where to find the functions of
the code generator. The file names are given for the
i386, for the m68k rename the 386 to 68k
\begin{description}
\item[\file{cg386con}] Constant generation
\begin{description}
\item[\var{secondordconst}]
\item[\var{secondrealconst}]
\item[\var{secondstringconst}]
\item[\var{secondfixconst}]
\item[\var{secondsetconst}]
\item[\var{secondniln}]
\end{description}
\item[\file{cg386mat}] Mathematic functions
\begin{description}
\item[\var{secondmoddiv}]
\item[\var{secondshlshr}]
\item[\var{secondumminus}]
\item[\var{secondnot}]
\end{description}
\item[\file{cg386cnv}] Type conversion functions
\begin{description}
\item[\var{secondtypeconv}]
\item[\var{secondis}]
\item[\var{secondas}]
\end{description}
\item[\file{cg386add}] Add/concat functions
\begin{description}
\item[\var{secondadd}]
\end{description}
\item[\file{cg386mem}] Memory functions
\begin{description}
\item[\var{secondvecn}]
\item[\var{secondaddr}]
\item[\var{seconddoubleaddr}]
\item[\var{secondsimplenewdispose}]
\item[\var{secondhnewn}]
\item[\var{secondhdisposen}]
\item[\var{secondselfn}]
\item[\var{secondwith}]
\item[\var{secondloadvmt}]
\item[\var{secondsubscriptn}]
\item[\var{secondderef}]
\end{description}
\item[\file{cg386flw}] Flow functions
\begin{description}
\item[\var{secondifn}]
\item[\var{second\_while\_repeatn}]
\item[\var{secondfor}]
\item[\var{secondcontinuen}]
\item[\var{secondbreakn}]
\item[\var{secondexitn}]
\item[\var{secondlabel}]
\item[\var{secondgoto}]
\item[\var{secondtryfinally}]
\item[\var{secondtryexcept}]
\item[\var{secondraise}]
\item[\var{secondfail}]
\end{description}
\item[\file{cg386ld}] Load/Store functions
\begin{description}
\item[\var{secondload}]
\item[\var{secondassignment}]
\item[\var{secondfuncret}]
\end{description}
\item[\file{cg386set}] Set functions
\begin{description}
\item[\var{secondcase}]
\item[\var{secondin}]
\end{description}
\item[\file{cg386cal}] Call/inline functions
\begin{description}
\item[\var{secondparacall}]
\item[\var{secondcall}]
\item[\var{secondprocinline}]
\item[\var{secondinline}]
\end{description}
\item[\file{cgi386}] Main secondpass handling
\begin{description}
\item[\var{secondnothing}]
\item[\var{seconderror}]
\item[\var{secondasm}]
\item[\var{secondblockn}]
\item[\var{secondstatement}]
\end{description}
\end{description}
\end{document}