+ Added Carl's patches.

2025-08-16 04:29:20 +02:00 · 1998-07-22 13:51:21 +00:00 · 1998-07-22 13:51:21 +00:00 · 61285973f9
commit 61285973f9
parent 08bb4c9d4b
2 changed files with 685 additions and 361 deletions
--- a/docs/prog.tex
+++ b/docs/prog.tex
@ -31,8 +31,8 @@
 \begin{document}
 \title{Free Pascal \\ Programmers' manual}
 \docdescription{Programmers' manual for \fpc, version \fpcversion}
-\docversion{1.3}
-\date{March 1998}
+\docversion{1.4}
+\date{July 1998}
 \author{Micha\"el Van Canneyt}
 \maketitle
 \tableofcontents
@ -45,7 +45,7 @@
 This is the programmer's manual for \fpc.

 It describes some of the peculiarities of the \fpc compiler, and provides a
-glimp of how the compiler generates its code, and how you can change the
+glimpse of how the compiler generates its code, and how you can change the
 generated code. It will not, however, provide you with a detailed account of
 the inner workings of the compiler, nor will it tell you how to use the
 compiler (described in the \userref). It also will not describe the inner
@ -193,17 +193,17 @@ appears on your system.

 {\em Remark :} Take care that the object file you're linking is in a
 format the linker understands. Which format this is, depends on the platform
-you're on. Typing \var{ld} on th command line gives a list of formats
+you're on. Typing \var{ld} on the command line gives a list of formats
 \var{ld} knows about.

 You can pass other files and options to the linker using the \var{-k}
 command-line option. You can specify more than one of these options, and
-they
-will be passed to the linker, in the order that you specified them on the
-command line, just before the names of the object files that must be linked.
+they will be passed to the linker, in the order that you specified them on
+the command line, just before the names of the object files that must be
+linked.

 % Assembler type
-\subsection{\var{\$I386\_XXX} : Specify assembler format}
+\subsection{\var{\$I386\_XXX} : Specify assembler format (Intel x86 only)}
 This switch informs the compiler what kind of assembler it can expect in an
 \var{asm} block. The \var{XXX} should be replaced by one of the following:
 \begin{description}
@ -218,7 +218,7 @@ is compiled, unless they are replaced by another directive of the same type.
 The command-line switch that corresponds to this switch is \var{-R}.


-\subsection{\var{\$MMX} : MMX support}
+\subsection{\var{\$MMX} : MMX support (Intel x86 only)}
 As of version 0.9.8, \fpc supports optimization for the \textbf{MMX} Intel
 processor (see also \ref{ch:MMXSupport}). This optimizes certain code parts for the \textbf{MMX} Intel
 processor, thus greatly improving speed. The speed is noticed mostly when
@ -267,7 +267,7 @@ generated. You can specify this switch \textbf{only} befor the \var{Program}
 or \var{Unit} clause in your source file. The different kinds of formats are
 shown in \seet{Formats}.

-\begin{FPCltable}{ll}{Formats generated by the compiler}{Formats} \hline
+\begin{FPCltable}{ll}{Formats generated by the x86 compiler}{Formats} \hline
 Switch value & Generated format \\ \hline
 att  & AT\&T assembler file. \\
 o    & Unix object file.\\
@ -315,23 +315,42 @@ the executable. The effect of this switch is the same as the command-line
 switch \var{-g}. By default, insertion of debugging information is off.

 \subsection{\var{\$E} : Emulation of coprocessor}
-This directive controls the emulation of the coprocessor. On the i386
-processor, it is supported for
-compatibility with Turbo Pascal. The compiler itself doesn't do the emulation
-of the coprocessor. Under \dos, the \dos extender does this, and under
-\linux, the kernel takes care of the coprocessor support.

-If you use the Motorola 680x0 version, then the switch is recognized, as
-there is no extender to emulate the coprocessor, so the compiler must do
-that by itself.
+This directive controls the emulation of the coprocessor. There is no
+command-line counterpart for this directive.
+
+\subsubsection{ Intel x86 version }
+
+When this switch is enabled, all floating point instructions
+which are not supported by standard coprocessor emulators will generate
+a warning.
+
+The compiler itself doesn't do the emulation of the coprocessor.
+
+To use coprocessor emulation under \dos go32v1 there is nothing special
+required, as it is handled automatically.
+
+To use coprocessor emulation under \dos go32v2 you must use the
+emu387 unit, which contains correct initialization code for the
+emulator.
+
+Under \linux, the kernel takes care of the coprocessor support.
+
+\subsubsection{ Motorola 680x0 version }
+
+When the switch is on, no floating point opcodes are emitted
+by the code generator. Instead, internal run-time library routines
+are called to do the necessary calculations. In this case all
+real types are mapped to the single IEEE floating point type.
+
+\emph{ Remark : } By default, emulation is on. It is possible to
+intermix emulation code with real floating point opcodes, as
+long as the only type used is single or real.

-There is no command-line counterpart for this directive.

 \subsection{\var{\$G} : Generate 80286 code}

-This option is recognised for Turbo Pascal cmpatibility, but is ignored,
-because the compiler needs at least a 386 or higher class processor.
-
+This option is recognised for Turbo Pascal compatibility, but is ignored.

 \subsection{\var{\$L} : Local symbol information}

@ -348,15 +367,20 @@ mathematics.
 \subsection{\var{\$O} : Overlay code generation }

 This switch is recognised for Turbo Pascal compatibility, but is otherwise
-ignored, since the compiler requires a 386 or higher computer, with at 
-least 4 Mb. of ram.
+ignored.

 \subsection{\var{\$Q} : Overflow checking}
 The \var{\{\$Q+\}} directive turns on integer overflow checking.
 This means that the compiler inserts code to check for overflow when doing
 computations with an integer.
 When an overflow occurs, the run-time library will print a message
-\var{Overflow at xxx}, and exit the program with exit code 1.
+\var{Overflow at xxx}, and exit the program with exit code 215.
+
+\emph{ Remark: } Overflow checking behaviour is not the same as in
+Turbo Pascal since all arithmetic operations are done via 32-bit
+values. Furthermore, the Inc() and Dec() standard system procedures
+\emph{ are } checked for overflow in \fpc, while in Turbo Pascal they
+are not.

 Using the \var{\{\$Q-\}} switch switches off the overflow checking code
 generation.
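+
+As a minimal sketch of the behaviour described above (the program name
+is hypothetical, and the run-time error is what this section leads one
+to expect, not a verified result), the following program increments a
+\var{Longint} that already holds its largest value:
+\begin{verbatim}
+{$Q+}
+program OverflowDemo;
+var
+  I: Longint;
+begin
+  I := 2147483647;   { largest Longint value }
+  Inc(I);            { checked for overflow under $Q+ }
+  Writeln(I);        { not reached if the overflow check fires }
+end.
+\end{verbatim}
+Compiled with \var{\{\$Q-\}} instead, no checking code is generated for
+the \var{Inc} call.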
@ -370,27 +394,26 @@ indices, enumeration types, subrange types, etc. Specifying the
 \var{\{\$R+\}} switch tells the computer to generate code to check these
 indices. If, at run-time, an index or enumeration type is specified that is
 out of the declared range of the compiler, then a run-time error is
-generated, and the program exits with exit code 1.
+generated, and the program exits with exit code 201.

 The \var{\{\$R-\}} switch tells the compiler not to generate range checking
 code. This may result in faulty program behaviour, but no run-time errors
 will be generated.

-{\em Remark: } this has not been implemented completely yet.
+{\em Remark: } Range checking for sets and enumerations is not yet fully
+implemented.
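+
+A minimal sketch of an array index that falls outside its declared
+range (the program and variable names are hypothetical; the index is
+held in a variable so that the check happens at run-time):
+\begin{verbatim}
+{$R+}
+program RangeDemo;
+var
+  A: array[1..10] of Longint;
+  I: Longint;
+begin
+  I := 11;
+  A[I] := 0;   { index outside 1..10: checked under $R+ }
+end.
+\end{verbatim}
+With \var{\{\$R+\}} the assignment is expected to stop the program with
+a run-time error; with \var{\{\$R-\}} the write goes through unchecked.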

 \subsection{\var{\$S} : Stack checking}
 The \var{\{\$S+\}} directive tells the compiler to generate stack checking
 code. This generates code to check if a stack overflow occurred, i.e. to see
 whether the stack has grown beyond its maximally allowed size. If the stack
 grows beyond the maximum size, then a run-time error is generated, and the
-program will exit with exit code 1.
+program will exit with exit code 202.

 Specifying \var{\{\$S-\}} will turn generation of stack-checking code off.

-There is no command-line switch which is equivalent to this directive.
-
-{\em Remark: } In principle, the stack is almost unlimited, 
-i.e. limited to the total free amount of memory on the computer.
+The command-line compiler switch \var{-Ct} has the same effect as the
+\var{\{\$S+\}} directive.


 \subsection{\var{\$X} : Extended syntax}
@ -410,10 +433,10 @@ end;
 {$X-}
 Func (A);
 \end{verbatim}
-The reason this construct is supported is that
-you may wish to call a function for certain side-effects it has, but you
-don't need the function result. In this case you don't need to assign the
-function result, saving you an extra variable.
+The reason this construct is supported is that you may wish to call a
+function for certain side-effects it has, but you don't need the function
+result. In this case you don't need to assign the function result, saving
+you an extra variable.

 The command-line compiler switch \var{-Sa1} has the same effect as the
 \var{\{\$X+\}} directive.
@ -500,7 +523,7 @@ you should change \var{v} with the version number of the compiler
 you're using, \var{r} with the release number and \var{p}
 with the patch-number of the compiler. 'OS' needs to be changed by the type
 of operating system. Currently this can be one of \var{DOS}, \var{GO32V2},
-\var{LINUX}, \var{OS2} or \var{WIN32}. This symbol is undefined if you
+\var{LINUX}, \var{OS2}, \var{WIN32}, \var{MACOS}, \var{AMIGA} or \var{ATARI}. This symbol is undefined if you
 specify a target that is different from the platform you're compiling on.
 the \var{-TSomeOS} option on the command line will define the \var{SomeOS} symbol,
 and will undefined the existing platform symbol\footnote{In versions prior to
@ -809,7 +832,7 @@ need to compile with the \var{-Sm} command-line switch.
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 % Using assembly language
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\chapter{Using assembly language}
+\chapter{Using Assembly language}
 \label{ch:AsmLang}
 \fpc supports inserting of assembler instructions in your code. The
 mechanism for this is the same as under Turbo Pascal. There are, however
@ -817,7 +840,7 @@ some substantial differences, as will be explained in the following.

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 % Intel syntax
-\section{Intel syntax}
+\section{Intel syntax (Intel x86 only) }
 \label{se:Intel}

 As of version 0.9.7, \fpc supports Intel syntax in its \var{asm} blocks.
@ -956,7 +979,7 @@ The Intel inline assembler supports the following macros :

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 % AT&T syntax
-\section{AT\&T Syntax}
+\section{AT\&T Syntax (Intel x86 only) }
 \label{se:AttSyntax}
 \fpc uses the \gnu \var{as} assembler to generate its object files. Since
 the \gnu assembler uses AT\&T assembly syntax, the code you write should
@ -1057,57 +1080,33 @@ they are pushed {\em right} to {\em left}, instead of left to right for
 Turbo Pascal. This is especially important if you have some assembly
 subroutines in Turbo Pascal which you would like to translate to \fpc.

-Function results are returned in the first register, if they fit in the
-register. For more information on this, see \sees{Stack}
+Function results are returned in the accumulator, if they fit in the
+register.

 The registers are {\em not} saved when calling a function or procedure. If
 you want to call a procedure or function from assembly language, you must
 save any registers you wish to preserve.

 The first thing a procedure does is saving the base pointer, and setting the
-base (\var{\%ebp}) pointer equal to the stack pointer (\var{\%esp}). 
-References to the pushed parameters and local variables are constructed 
-using the base pointer.
+base pointer equal to the stack pointer. References to the pushed parameters
+and local variables are constructed using the base pointer.

-In practice this amounts to the following assembly code as the procedure or
-function header :
-\begin{verbatim}
-   pushl   %ebp
-   movl    %esp,%ebp
-\end{verbatim}  
-
-When the procedure or function exits, it clears the stack by means of the
-\var{RET xx} call, where \var{xx} is the total size of the pushed parameters
-on the stack. Thus, in case parameters with a total size of \var{xx} have
-been passed to a function, the generated exit sequence looks as follows:
-\begin{verbatim}
-  leave
-  ret  $xx
-\end{verbatim}
+When the procedure or function exits, it clears the stack.

 When you want your code to be called by a C library or used in a C
 program, you will run into trouble because of this calling mechanism. In C,
 the calling procedure is expected to clear the stack, not the called
-procedure. To avoid this problem, \fpc supports the \var{export} modifier.
-Procedures that are defined using the export modifier, use a C-compatible
-calling mechanism. This means that they can be called from a C program or
-library, or that you can use them as a callback function.
+procedure. In other words, the arguments still are on the stack when the
+procedure exits. To avoid this problem, \fpc supports the \var{export}
+modifier. Procedures that are defined using the export modifier, use a
+C-compatible calling mechanism. This means that they can be called from a
+C program or library, or that you can use them as a callback function.

 This also means that you cannot call this procedure or function from your
 own program, since your program uses the Pascal calling convention.
 However, in the exported function, you can of course call other Pascal
 routines.

-Technically, the C calling mechanism is implemented by generating the
-following exit sequence at the end of your function or procedure:
-\begin{verbatim}
-  leave         {Copies EBP to ESP, pops EBP from the stack.}
-  ret
-\end{verbatim}
-Comparing this exit sequence with the previous one makes it clear why you
-cannot call this procedure from within Pascal: The arguments still are on
-the stack when the procedure exits.
-
 As of version 0.9.8, the \fpc compiler supports also the \var{cdecl} and
 \var{stdcall} modifiers, as found in Delphi. The \var{cdecl} modifier does
 the same as the \var{export} modifier, and \var{stdcall} does nothing, since
@ -1136,6 +1135,54 @@ popstack & Right-to-left & Caller  & No \\ \hline

 More about this can be found in \seec{Linking} on linking.

+
+
+
+\subsection{ Intel x86 calling conventions }
+
+Standard entry code for procedures and functions is as follows on the
+x86 architecture:
+\begin{verbatim}
+   pushl   %ebp
+   movl    %esp,%ebp
+\end{verbatim}
+
+The generated exit sequence for procedures and functions looks as follows:
+\begin{verbatim}
+  leave
+  ret  $xx
+\end{verbatim}
+
+Where \var{xx} is the total size of the pushed parameters.
+
+For more information on function return values, see \sees{RegConvs}.
+
+
+\subsection{ Motorola 680x0 calling conventions }
+
+Standard entry code for procedures and functions is as follows on the
+680x0 architecture:
+\begin{verbatim}
+   move.l  a6,-(sp)
+   move.l  sp,a6
+\end{verbatim}
+
+The generated exit sequence for procedures and functions looks as follows:
+\begin{verbatim}
+  unlk   a6
+  move.l (sp)+,a0     ; Get return address
+  add.l  #xx,sp       ; Remove allocated stack
+  move.l a0,-(sp)     ; Put back return address on top of the stack
+  rts                 ; Return to the caller
+\end{verbatim}
+
+Where \var{xx} is the total size of the pushed parameters.
+
+For more information on function return values, see \sees{RegConvs}.
+
+
+
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 % Telling the compiler what registers have changed
 \section{Telling the compiler what registers have changed}
@ -1153,9 +1200,8 @@ asm
  ...
 end ['R1',...,'Rn'];
 \end{verbatim}
-Here \var{R1} to \var{Rn} are the names of the (extended) registers you 
-modify in your assembly code. They can be one of \var{'EAX', 'EBX', 'ECX',
-'EDX', 'EDI', 'ESI'} for the Intel processor.
+Here \var{R1} to \var{Rn} are the names of the 32-bit registers you
+modify in your assembly code.

 As an example :
 \begin{verbatim}
@ -1167,6 +1213,27 @@ As an example :
 \end{verbatim}
 This example tells the compiler that the \var{EAX} register was modified.

+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+% Register conventions
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Register Conventions}
+\label{se:RegConvs}
+
+The compiler has different register conventions, depending on the
+target processor used.
+
+\subsection{ Intel x86 version }
+
+When optimizations are on, no register can be freely modified, without
+first being saved and then restored. Otherwise, EDI is usually used as
+a scratch register and can be freely used in assembler blocks.
+
+\subsection{ Motorola 680x0 version }
+
+Registers which can be freely modified without saving are registers
+D0, D1, D6, A0, A1, and floating point registers FP2 to FP7. All other
+registers are to be considered reserved and should be saved and then
+restored when used in assembler blocks.

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 % Linking issues
@ -1542,14 +1609,14 @@ instructions on how to use and declare objects, see \refref.
 When using objects that need virtual methods, the compiler uses two help
 procedures that are in the run-time library. They are called
 \var{Help\_Destructor} and \var{Help\_Constructor}, and they are written in
-assebly language. They are used to allocate the necessary memory if needed,
+assembly language. They are used to allocate the necessary memory if needed,
 and to insert the Virtual Method Table (VMT) pointer in the newly allocated
 object.

 When the compiler encounters a call to an object's constructor,
 it sets up the stack frame for the call, and inserts a call to the
 \var{Help\_Constructor}
-procedure before issuing the call to the real constuctor. 
+procedure before issuing the call to the real constructor.
 The helper procedure allocates the needed memory (if needed) and inserts the
 VMT pointer in the object. After that, the real constructor is called.

@ -1690,7 +1757,7 @@ set up the stack. Then it calls the main program.
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 % MMX Support
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\chapter{MMX support}
+\chapter{MMX support (Intel x86 only) }
 \label{ch:MMXSupport}

 \section{What is it about ?}
@ -1846,7 +1913,7 @@ procedure.
 \label{se:ThirtytwoBit}
 The \fpc Pascal compiler issues 32-bit code. This has several consequences:
 \begin{itemize}
-\item You need a i386 or higher processor to run the generated code. The
+\item You need a 386 or higher processor to run the generated code. The
 compiler functions on a 286 when you compile it using Turbo Pascal,
 but the generated programs cannot be assembled or executed.
 \item You don't need to bother with segment selectors. Memory can be
@ -1909,8 +1976,8 @@ pointer to \var{self} is pushed on the stack.
 \item If the procedure or function is nested in another function or
 procedure, then the frame pointer of the parent procedure is pushed on the
 stack.
-\item The return address is pushed on the stack (by the \var{Call}
-instruction).
+\item The return address is pushed on the stack (this is done automatically
+by the instruction which calls the subroutine).
 \end{enumerate}

 The resulting stack frame upon entering looks as in \seet{StackFrame}.
@ -1924,19 +1991,71 @@ Offset & What is stored & Optional ? \\ \hline
 +0 & Return address & No\\ \hline
 \end{FPCltable}

+\subsection{ Intel x86 version }
+
 The stack is cleared with the \var{ret} I386 instruction, meaning that the
 size of all pushed parameters is limited to 64K.

-The stack size is unlimited for all supported platforms. On the \var{GO32V2}
-platform, the minimum guaranteed stack is 128Kb, but this can be set with
-the \var{-Ctxxx} compiler switch. 
+\subsubsection{ DOS }

+Under the DOS targets, the default stack size is 256Kb. This value
+cannot be modified for the GO32V1 target, but it can be modified
+for the GO32V2 target using the DJGPP utility \var{stubedit}.
+Note that the stack size can also be set with some compiler
+switches; that size is used only if it is \emph{greater} than the
+default stack size, otherwise the default stack size is used.
+
+\subsubsection{ Linux }
+
+Under Linux, stack size is limited only by the memory available on
+the system.
+
+\subsubsection{ OS/2 }
+
+Under OS/2, stack size is determined by one of the runtime
+environment variables set for EMX. Therefore, the stack size
+is user defined.
+
+\subsection{ Motorola 680x0 version }
+
+Depending on the target processor, the stack can be cleared in two
+ways. If the target processor is a MC68020 or higher, the stack is
+cleared with a simple \var{rtd} instruction, meaning that the size
+of all pushed parameters is limited to 32K.
+
+Otherwise, on MC68000/68010 processors, the stack clearing mechanism
+is slightly more complicated; the exit code will look like this:
+
+\begin{verbatim}
+{
+  move.l  (sp)+,a0
+  add.l   #paramsize,sp
+  move.l  a0,-(sp)
+  rts
+}
+\end{verbatim}
+
+\subsubsection{ Amiga }
+
+Under AmigaOS, stack size is determined by the user, who sets this
+value using the stack program. Typical sizes range from 4K to 40K.
+
+\subsubsection{ Atari }
+
+Under Atari TOS, stack size is currently limited to 8K, and it cannot
+be modified. This may change in a future release of the compiler.
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+% The heap
+\section{The heap}
+\label{se:Heap}
 The heap is used to store all dynamic variables, and to store class
 instances. The interface to the heap is the same as in Turbo Pascal,
 although the effects are maybe not the same. On top of that, the \fpc
 run-time library has some extra possibilities, not available in Turbo
 Pascal. These extra possibilities are explained in the next subsections.

+
 % The heap grows
 \subsection{The heap grows}
 \fpc supports the \var{HeapEerror} procedural variable. If this variable is
@ -2064,7 +2183,7 @@ ReleaseTempHeap;

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 % Accessing DOS memory under the GO32 extender
-\section{Accessing \dos memory under the Go32 extender}
+\section{Accessing \dos memory under the Go32 extender (Intel x86 only) }
 \label{se:AccessingDosMemory}

 Because \fpc is a 32 bit compiler, and uses a \dos extender, accessing DOS
@ -2118,6 +2237,317 @@ After using the selector, you must free it again using the
 More information on all this can be found in the \unitsref, the chapter on
 the \file{GO32} unit.

+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+% Optimizations done in the compiler
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\chapter{Optimizations}
+
+\section{ Non processor specific }
+
+The following sections describe the general optimizations
+done by the compiler; they are not processor specific. Some
+of these require a compiler switch, while others are done
+automatically (those which require a switch will be noted as such).
+
+\subsection{ Constant folding }
+
+In \fpc, if the operand(s) of an operator are constants, they
+will be evaluated at compile time.
+
+Example
+
+\begin{verbatim}
+   x:=1+2+3+6+5;
+will generate the same code as
+   x:=17;
+\end{verbatim}
+
+Furthermore, if an array index is a constant, the offset will
+be evaluated at compile time. This means that accessing MyData[5]
+is as efficient as accessing a normal variable.
+
+Finally, calling \var{Chr}, \var{Hi}, \var{Lo}, \var{Ord}, \var{Pred},
+or \var{Succ} functions with constant parameters generates no
+run-time library calls; instead, the values are evaluated at
+compile time.
+
+\subsection{ Constant merging }
+
+Using the same constant string two or more times generates only
+one copy of the string constant.
+
+\subsection{ Short cut evaluation }
+
+Evaluation of a boolean expression stops as soon as the result is
+known, which makes code execute faster than if all boolean operands
+were evaluated.
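+
+A minimal sketch of why this matters in practice (the type and variable
+names are hypothetical): because the first operand is already false,
+the second operand is never evaluated, so the \var{nil} pointer is not
+dereferenced.
+\begin{verbatim}
+program ShortCutDemo;
+type
+  PNode = ^TNode;
+  TNode = record
+    Value: Longint;
+  end;
+var
+  P: PNode;
+begin
+  P := nil;
+  { P^.Value is only evaluated when P <> nil holds }
+  if (P <> nil) and (P^.Value > 0) then
+    Writeln('positive')
+  else
+    Writeln('nil or not positive');
+end.
+\end{verbatim}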
+
+\subsection{ Constant set inlining }
+
+Using the \var{in} operator is always more efficient than using the
+equivalent <>, =, <=, >=, < and > operators. This is because
+range comparisons can be done more easily with \var{in} than with
+normal comparison operators.
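+
+As an illustration (a sketch with hypothetical names, not taken from the
+compiler sources), the two tests below check the same range; the second
+form is the one this section recommends:
+\begin{verbatim}
+program InDemo;
+var
+  C: Char;
+begin
+  C := 'q';
+  { Range test written with comparison operators }
+  if (C >= 'a') and (C <= 'z') then
+    Writeln('lower case (comparisons)');
+  { The same test written with the in operator }
+  if C in ['a'..'z'] then
+    Writeln('lower case (in)');
+end.
+\end{verbatim}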
+
+\subsection{ Small sets }
+
+Sets which contain fewer than 33 elements can be directly encoded
+using a 32-bit value, therefore no run-time library calls to
+evaluate operands on these sets are required; they are directly encoded
+by the code generator.
+
+\subsection{ Range checking }
+
+Assignments of constants to variables are range checked at compile
+time, which removes the need to generate runtime range checking
+code.
+
+\emph{Remark:} This feature was not implemented before version
+0.99.5 of \fpc.
+
+\subsection{ Shifts instead of multiply or divide }
+
+When one of the operands in a multiplication is a power of
+two, the multiplication is encoded using arithmetic shift instructions,
+which generates more efficient code.
+
+Similarly, if the divisor in a \var{div} operation is a power
+of two, it is encoded using arithmetic shift instructions.
+
+The same is true when accessing array indexes which are
+powers of two: the address is calculated using arithmetic
+shifts instead of the multiply instruction.
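+
+The equivalence that the compiler relies on can be sketched at source
+level as follows (hypothetical program; a non-negative value is assumed,
+since \var{shr} and \var{div} differ for negative operands):
+\begin{verbatim}
+program ShiftDemo;
+var
+  X: Longint;
+begin
+  X := 5;
+  Writeln(X * 8);     { 40 }
+  Writeln(X shl 3);   { 40: same result as X * 8 }
+  Writeln(X div 4);   { 1 }
+  Writeln(X shr 2);   { 1: same result as X div 4 }
+end.
+\end{verbatim}
+There is no need to write the shifts by hand; the point is only that
+both forms give the same result for these operands.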
+
+\subsection{ Automatic alignment }
+
+By default all variables larger than a byte are guaranteed to be aligned
+at least on a word boundary.
+
+Furthermore all pointers allocated using the standard runtime
+library (\var{New} and \var{GetMem} among others) are guaranteed
+to return pointers aligned on a quadword boundary (64-bit alignment).
+
+Alignment of variables on the stack depends on the target processor.
+
+\emph{ Remark: } Quadword alignment of pointers is not guaranteed
+on systems which don't use an internal heap, such as for the Win32
+target.
+
+\emph{ Remark: } Alignment is also done \emph{between} fields in
+records, objects and classes. This is \emph{not} the same as
+in Turbo Pascal and may cause problems when using disk I/O with these
+types. To get no alignment between fields use the \var{packed} directive
+or the \var{\{\$PackRecords n\}} switch. For further information, take a
+look at the reference manual under the \var{record} heading.
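+
+A minimal sketch of the difference (the type names are hypothetical, and
+the size of the unpacked record depends on the target and on the
+\var{\{\$PackRecords n\}} setting, so it is not stated here):
+\begin{verbatim}
+program AlignDemo;
+type
+  TPlain = record
+    B: Byte;
+    L: Longint;
+  end;
+  TPacked = packed record
+    B: Byte;
+    L: Longint;
+  end;
+begin
+  { May include padding between B and L }
+  Writeln('Plain : ', SizeOf(TPlain));
+  { 5: one byte plus four bytes, no padding }
+  Writeln('Packed: ', SizeOf(TPacked));
+end.
+\end{verbatim}
+Records that are written to or read from disk are the typical case
+where the packed form is wanted.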
+
+\subsection{ Smart linking }
+
+This feature removes all unreferenced code in the final executable
+file, making the executable file much smaller.
+
+\emph{ Remark: } Smart linking was implemented starting with
+version 0.99.6 of \fpc.
+
+\subsection{ Inline routines }
+
+The following runtime library routines are coded directly into the
+final executable : \var{Lo}, \var{Hi}, \var{High}, \var{Sizeof},
+\var{TypeOf}, \var{Length}, \var{Pred}, \var{Succ}, \var{Inc},
+\var{Dec} and \var{Assigned}.
+
+\emph{ Remark: } Inline \var{Inc} and \var{Dec} were not completely
+implemented until version 0.99.6 of \fpc.
+
+\subsection{ Case optimization }
+
+When using the \var{-Oa} switch, some case statements will
+be decoded using a jump table, which in certain cases will make the
+case statement execute faster.
+
+\subsection{ Stack frame omission }
+
+When using the \var{-Ox} switch, under certain specific conditions,
+the stack frame (entry and exit code for the routine) will be omitted, and
+variables will be accessed directly via the stack pointer.
+
+Conditions for omission of the stack frame :
+
+\begin{itemize}
+\item Routine does not call other routines
+\item Routine does not contain assembler statements
+\item Routine is not declared using the \var{Interrupt} directive
+\item Routine is not a constructor or destructor
+\end{itemize}
+
+\subsection{ Register variables }
+
+When using the \var{-Ox} switch, local variables or parameters
+which are used very often will be moved to registers for faster
+access.
+
+\emph{ Remark: } Register variable allocation is currently
+broken and should not be used.
+
+\subsection{ Intel x86 specific }
+
+Here follows a listing of the optimizing techniques used in the compiler:
+\begin{enumerate}
+\item When optimizing for a specific processor (\var{-O3, -O4, -O5, -O6}),
+the following is done:
+\begin{itemize}
+\item In \var{case} statements, a check is done whether a jump table
+or a sequence of conditional jumps should be used for optimal performance.
+\item Determines a number of strategies when doing peephole optimization:
+\var{movzbl (\%ebp), \%eax} on PentiumPro and PII systems will be changed
+into \var{xorl \%eax,\%eax; movb (\%ebp),\%al } for lesser systems.
+\end{itemize}
+Cyrix \var{6x86} processor owners should optimize with \var{-O4} instead of
+\var{-O5}, because \var{-O5} leads to larger code, and thus to lower
+speed, according to the Cyrix developers FAQ.
+  \item When optimizing for speed (\var{-OG}) or size (\var{-Og}), a choice is
+made between using shorter instructions (for size) such as \var{enter \$4},
+or longer instructions \var{subl \$4,\%esp} for speed. When smaller size is
+requested, things aren't aligned on 4-byte boundaries.  When speed is
+requested, things are aligned on 4-byte boundaries as much as possible.
+\item Simple optimization (\var{-Oa}) makes sure the peephole optimizer is
+used, as well as the reloading optimizer.
+\item Uncertain optimizations (\var{-Oz}): With this switch, the reloading
+optimizer (enabled with \var{-Oa}) can be forced into making uncertain
+optimizations.
+
+You can enable uncertain optimizations only in certain cases,
+otherwise you will produce a bug; the following technical description
+tells you when to use them:
+\begin{quote}
+% Jonas's own words..
+\em
+If uncertain optimizations are enabled, the reloading optimizer assumes
+that
+\begin{itemize}
+\item If something is written to a local/global register or a
+procedure/function parameter, this value doesn't overwrite the value to
+which a pointer points.
+\item If something is written to memory pointed to by a pointer variable,
+this value doesn't overwrite the value of a local/global variable or a
+procedure/function parameter.
+\end{itemize}
+% end of quote
+\end{quote}
+The practical upshot of this is that you cannot use the uncertain
+optimizations if you access any local or global variables through pointers. In
+theory, this includes \var{Var} parameters, but it is all right
+if you don't both read the variable once through its \var{Var} reference
+and then read it using its name.
+
+The following example will produce bad code when you switch on
+uncertain optimizations:
+\begin{verbatim}
+Var temp: Longint;
+
+Procedure Foo(Var Bar: Longint);
+Begin
+  If (Bar = temp)
+    Then
+      Begin
+        Inc(Bar);
+        If (Bar <> temp) then Writeln('bug!')
+      End
+End;
+
+Begin
+  Foo(Temp);
+End.
+\end{verbatim}
+The reason it produces bad code is that you access the global variable
+\var{Temp} both through its name \var{Temp} and through a pointer, in this
+case using the \var{Bar} variable parameter, which is nothing but a pointer
+to \var{Temp} in the above code.
+
+On the other hand, you can use the uncertain optimizations if
+you access global/local variables or parameters through pointers,
+and {\em only} access them through this pointer\footnote{
+You can use multiple pointers to point to the same variable as well, that
+doesn't matter.}.
+
+For example:
+\begin{verbatim}
+Type TMyRec = Record
+                a, b: Longint;
+              End;
+     PMyRec = ^TMyRec;
+
+
+     TMyRecArray = Array [1..100000] of TMyRec;
+     PMyRecArray = ^TMyRecArray;
+
+Var MyRecArrayPtr: PMyRecArray;
+    MyRecPtr: PMyRec;
+    Counter: Longint;
+
+Begin
+  New(MyRecArrayPtr);
+  For Counter := 1 to 100000 Do
+    Begin
+       MyRecPtr := @MyRecArrayPtr^[Counter];
+       MyRecPtr^.a := Counter;
+       MyRecPtr^.b := Counter div 2;
+    End;
+End.
+\end{verbatim}
+Will produce correct code, because the global variable \var{MyRecArrayPtr}
+is not accessed directly, but through a pointer (\var{MyRecPtr} in this
+case).
+
+In conclusion, one could say that you can use uncertain optimizations {\em
+only} when you know what you're doing.
+\end{enumerate}
+
+\subsection{ Motorola 680x0 specific }
+
+The \var{-O2} switch enables several optimizations in the
+generated code, the most notable being:
+
+\begin{itemize}
+\item Sign extension from byte to long will use \var{EXTB}
+\item Function returns will use \var{RTD}
+\item Range checking will generate no run-time calls
+\item Multiplication will use the long \var{MULS} instruction, no
+runtime library call will be generated
+\item Division will use the long \var{DIVS} instruction, no
+runtime library call will be generated
+\end{itemize}
+
+
+\section{ Floating point }
+
+This section contains processor-specific information on the floating
+point code generated by the compiler.
+
+\subsection{ Intel x86 specific }
+
+All normal floating point types map to their real type, including
+\var{comp} and \var{extended}.
+
+\subsection{ Motorola 680x0 specific }
+
+Early generations of the Motorola 680x0 processors did not have integrated
+floating point units. To work around this, all floating point
+operations are emulated (when the \var{\$E+} switch is on, which is the
+default) using the IEEE \var{Single} floating point type. In other words, when
+emulation is on, Real, Single, Double and Extended all map to the
+\var{single} floating point type.
+
+When the \var{\$E} switch is turned off, normal 68882/68881/68040
+floating point opcodes are emitted. The Real type still maps to
+\var{Single} but the other types map to their true floating point
+types. Only basic FPU opcodes are used, which means that the generated
+code also works correctly on 68040 processors.
+
+\emph{ Remark: } \var{Double} and \var{Extended} types in true floating
+point mode have not been extensively tested as of version 0.99.5.
+
+\emph{ Remark: } The \var{comp} data type is currently not supported.
+
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 % Appendices
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -2281,130 +2711,6 @@ changed by changing the \var{bytearray1} type in \file{cobjects.pas}
 compiler. When using the 32-bit compiler, the limit is set to 1024. You can
 change this by redefining the \var{maxunits} constant in the
 \file{files.pas} compiler source file.
-\item Procedures or functions accept parameters with a total size up to 
-\var{\$ffff} bytes. This limit is due to the \var{RET} instruction of the I386
-processor. If the calls were made using the C convention this limit would
-disappear.
 \end{enumerate}

-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-% Appendix D
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-\chapter{Optimizing techniques used in the compiler.}
-Here follows a listing of the opimizing techniques used in the compiler:
-\begin{enumerate}
-\item When optimizing for a specific Processor (\var{-O3, -O4, -O5 -O6}, 
-the following is done:
-\begin{itemize}
-\item In \var{case} statements, a check is done whether a jump table
-or a sequence of conditional jumps should be used for optimal performance.
-\item Determines a number of strategies when doing peephole optimization:
-\var{movzbl (\%ebp), \%eax} on PentiumPro and PII systems will be changed
-into \var{xorl \%eax,\%eax; movb (\%ebp),\%al } for lesser systems.
-\end{itemize}
-Cyrix \var{6x86} processor owners should optimize with \var{-O4} instead of
-\var{-O5}, because \var{-O5} leads to larger code, and thus to smaller
-speed, according to the Cyrix developers FAQ.
-  \item When optimizing for speed (\var{-OG}) or size (\var{-Og}), a choice is
-made between using shorter instructions (for size) such as \var{enter \$4},
-or longer instructions \var{subl \$4,\%esp} for speed. When smaller size is
-requested, things aren't aligned on 4-byte boundaries.  When speed is
-requested, things are aligned on 4-byte boundaries as much as possible.
-\item Simple optimization (\var{-Oa}) makes sure the peephole optimizer is
-used, as well as the reloading optimizer.
-\item Maximum optimization (\var{-Ox}) avoids creation of stack frames if
-they aren't required, and unnecessary loading of registers is avoided as
-much as possible. (buggy at the moment (version 0.99.0).
-\item Uncertain optimizations (\var{-Oz}): With this switch, the reloading 
-optimizer (enabled with \var{-Oa}) can be forced into making uncertain 
-optimizations.
-
-You can enable uncertain optimizations only in certain cases, 
-otherwise you will produce a bug; the following technical description 
-tells you when to use them:
-\begin{quote}
-% Jonas's own words..
-\em
-If uncertain optimizations are enabled, the reloading optimizer assumes 
-that
-\begin{itemize}
-\item If something is written to a local/global register or a 
-procedure/function parameter, this value doesn't overwrite the value to 
-which a pointer points.
-\item If something is written to memory pointed to by a pointer variable, 
-this value doesn't overwrite the value of a local/global variable or a 
-procedure/function parameter.
-\end{itemize}
-% end of quote
-\end{quote}
-The practical upshot of this is that you cannot use the uncertain
-optimizations if you access any local or global variables through pointers. In 
-theory, this includes \var{Var} parameters, but it is all right 
-if you don't both read the variable once through its \var{Var} reference 
-and then read it using it's name. 
-
-The following example will produce bad code when you switch on
-uncertain optimizations:
-\begin{verbatim}
-Var temp: Longint;
-
-Procedure Foo(Var Bar: Longint);
-Begin
-  If (Bar = temp)
-    Then
-      Begin
-        Inc(Bar);
-        If (Bar <> temp) then Writeln('bug!')
-      End
-End;
-
-Begin
-  Foo(Temp);
-End.
-\end{verbatim}
-The reason it produces bad code is because you access the global variable
-\var{Temp} both through its name \var{Temp} and through a pointer, in this
-case using the \var{Bar} variable parameter, which is nothing but a pointer
-to \var{Temp} in the above code.
-
-On the other hand, you can use the uncertain optimizations if
-you access global/local variables or parameters through pointers, 
-and {\em only} access them through this pointer\footnote{
-You can use multiple pointers to point to the same variable as well, that 
-doesn't matter.}.
-
-For example:
-\begin{verbatim}
-Type TMyRec = Record
-                a, b: Longint;
-              End;
-     PMyRec = ^TMyRec;
-
-
-     TMyRecArray = Array [1..100000] of TMyRec;
-     PMyRecArray = ^TMyRecArray;
-
-Var MyRecArrayPtr: PMyRecArray;
-    MyRecPtr: PMyRec;
-    Counter: Longint;
-
-Begin
-  New(MyRecArrayPtr);
-  For Counter := 1 to 100000 Do
-    Begin
-       MyRecPtr := @MyRecArrayPtr^[Counter];
-       MyRecPtr^.a := Counter;
-       MyRecPtr^.b := Counter div 2;
-    End;
-End.
-\end{verbatim}
-Will produce correct code, because the global variable \var{MyRecArrayPtr}
-is not accessed directly, but through a pointer (\var{MyRecPtr} in this
-case). 
-
-In conclusion, one could say that you can use uncertain optimizations {\em
-only} when you know what you're doing.
-\end{enumerate} 
 \end{document}
--- a/docs/ref.tex
+++ b/docs/ref.tex
@ -114,10 +114,10 @@ percent sign (\var{\%}). Thus, \var{255} can be specified in binary notation
 as \var{\%11111111}.

 \subsection{Real types}
-\fpc uses the math coprocessor (or an emulation) for al its floating-point 
-calculations. The Real native type for is processor dependant,
+\fpc uses the math coprocessor (or an emulation) for all its floating-point
+calculations. The native Real type is processor dependent,
 but it is either Single or Double. Only the IEEE floating point type are
-supported, and these depend on the target processor and emulation options .
+supported, and these depend on the target processor and emulation options.
 The true Turbo Pascal compatible types are listed in
 \seet{Reals}.
 \begin{FPCltable}{lccr}{Supported Real types}{Reals}
@ -812,21 +812,6 @@ command-line switch.
 {\em Remark:} These constructions are just for typing convenience, they
 don't generate different code.

-\fpc also supports typed assignments. This means that an assignment
-statement has a definite type, and hence can be assigned to another
-variable. The type of the assignment \var{a:=b} is the type of \var{a}
-(or, in this case, of \var{b}), and this can be assigned to another
-variable : \var{c:=a:=b;}.
-To summarize: the construct
-\begin{verbatim}
- a:=b:=c;
-\end{verbatim}
-results in both \var{a} and \var{b} being assign the value of \var{c}, which
-may be an expression.
-
-For this construct to be allowed, it is necessary to specify the \var{-Sa4}
-switch on the command line.
-
 \subsection{The \var{Case} statement}
 \fpc supports the \var{case} statement. Its prototype is
 \begin{verbatim}
@ -968,7 +953,11 @@ Be aware of the fact that the boolean expressions \var{Expression1} and
 will be stopped at the point where the outcome is known with certainty)

 \subsection{The \var{With} statement}
-The with statement serves to access the elements of a record, without
+
+The with statement serves to access the elements of a record\footnote{The
+\var{with} statement does not work correctly when used with objects or
+classes until version 0.99.6.}, without
 having to specify the name of the record. Given the declaration:
 \begin{verbatim}
 Type Passenger = Record
@ -1063,10 +1052,12 @@ are also declared with open arrays as parameters, {\em not} to functions or
 procedures which accept arrays of fixed length.

 \section{Using assembler in your code}
+
 \fpc supports the use of assembler in your code, but not inline
-assembler macros. Assembly functions (i.e. functions declared with the
-\var{Assembler} keyword) are supported as of version 0.9.7. (see
-\progref for more information about this).
+assembler macros. For more information on the processor-specific
+assembler syntax and its limitations, see the \progref.
+
+\subsection{ Assembler statements }

 The following is an example of assembler inclusion in your code.
 \begin{verbatim}
@ -1090,6 +1081,34 @@ recognise it, and treat it as any other conditionals.
 \emph{ Remark: } Before version 0.99.1, \fpc did not support
 reference to variables by their names in the assembler parts of your code.

+\subsection{ Assembler procedures and functions }
+
+Assembler procedures and functions are declared using the
+\var{Assembler} directive. The \var{Assembler} keyword is supported
+as of version 0.9.7. Declaring a routine this way permits the code
+generator to make a number of optimizations.
+
+The code generator does not generate any stack frame (entry and exit
+code for the routine) if it contains no local variables. In the case
+of functions, ordinal values must be returned in the accumulator. How
+floating point values are returned depends on the target processor
+and emulation options.
+
+\emph{ Remark: } Currently, the \var{Assembler} directive does not have the
+same effect as in Turbo Pascal, so beware! In \fpc, parameters are
+treated normally, which is not the case in Turbo Pascal. Furthermore,
+the stack frame is omitted if there are no local variables; in that
+case, any parameters of the assembly routine are referenced
+directly via the stack pointer. This is \emph{not} like Turbo Pascal, where
+the stack frame is only omitted if there are no parameters \emph{and} no
+local variables. Therefore, if your assembly routine modifies the stack
+pointer, for instance by pushing or popping values on the stack, the
+\var{Assembler} keyword should not be used. Instead, use a normal procedure
+with \var{Asm} blocks.
+
 \section{Modifiers}
 \fpc doesn't support all Turbo Pascal modifiers, but
 does support a number of additional modifiers. They are used mainly for assembler and
@ -1207,7 +1226,6 @@ function must be exactly the same.
 The \var{external} modifier has also an extended syntax:
 \begin{enumerate}
 \item
-
 \begin{verbatim}
 external 'lname';
 \end{verbatim}