The old code generated a strange IT-sequence:
IT EQ
MOVEQ r0, #1
IT NE
MOVNE r0, #1
Now we generate:
ITE EQ
MOVEQ r0, #1
MOVNE r0, #1
IT stands for IfThen, ITE for IfThenElse it has a couple of other forms
where the instruction gets extended to handle more of the following
instructions. So we have ITEE, ITETE etc, up to 4 instructions can be
handled.
git-svn-id: trunk@21996 -
If the needed adjustment is not expressible in a shifterconst, the old code
loaded a temporary register (fixed to r12) via a_load_const_reg and used it
to adjust the SP. Resulting in:
mov r12, #44
orr r12, r12, #4096
sub sp, sp, r12
The new code will try to split the adjustment into 2 shifterconstants and
will do two seperate adjustments:
sub sp, sp, #44
sub sp, sp, #4096
If that doesn't work we'll fall back to the old code. But that should
happen VERY rarely, only for stacks bigger than 256k which are not
expressible in 2 shifter constants.
git-svn-id: trunk@21863 -
This removes the duplications in a_op_reg_reg_reg_checkoverflow.
OP_ROL stays seperate because it needs some special treatment again.
The code for OP_ROL was changed, previously it generated:
mov tempreg, #32
sub src1, tempreg, src1
mov dst, src2, ror src1
This would trash src1, which MIGHT be a problem, but i'm not totally
sure. But the mov/sub was replaced with rsb, so the new code looks like
this.
rsb tempreg, src1, #32
mov dst, src2, ror tempreg
If src1 gets freed afterwards the regallocator should be able to change
that into:
rsb src1, src1, #32
mov dst, src2, ror src1
git-svn-id: trunk@21804 -
The previous code was full with duplicated code, this new version just
maps the OP_* to the correct SM_* and does some special handling for
OP_ROL which is done via OP_ROR.
git-svn-id: trunk@21801 -
getintparaloc + adapted all call sites of getintparaloc. This
led to a number of additional, related changes:
o corrected the type information for some getintparaloc parameters
o don't allocate some intparalocs in cases they aren't used
o changed "const tvardata" parameter into "constref tvardata" for
fpc_variant_copy_overwrite to make pass-by-reference semantics
explicit
o moved a number of routines that now have to call find_system_type()
from cgobj to hlcgobj so that cgobj doesn't have to start depending
on the symtable unit
o added versions of the cpureg alloc/dealloc methods to hlcgobj that
call through to their cgobj counter parts, so we can call save/restore
the cpu registers before/after calling system helpers from hlcgobj
(not implemented in hlcgobj itself, because all basic register
allocator functionality is still part of cgobj/cgcpu)
git-svn-id: trunk@21696 -
This now generates:
mvn r0, r0, lsl #24/#16
mov r0, r0, lsr/asr #24/#16
The lsr/asr might be folded into a following instruction, making the
whole operation 1 cycle instead of 2-3 with the previous solution.
git-svn-id: trunk@21658 -
OP_ADD, OP_SUB, OP_ORR will be split into two intructions if possible when a load/const
construction is required.
OP_AND is a bit different, because we can't just split it up, but we try
to find a two instruction BIC-equivalent to it.
Till now code like
a:= a and $FFFF;
produced code like
mov r0, $FF00
orr r0, r0, $FF
and r1, r1, r0
With this addition we produce code like:
bic r0, r0, $FF00
bic r0, r0, $FF
Saving us at least a cycle and in some cases also a load from the
constant-pool.
This uses the new split_into_shifter_const function.
git-svn-id: trunk@21647 -
* use split_into_shifter_const to reduce the MOV/ORR combination to a
single check and allow a broader rang of combinations.
* Introduce MVN/BIC combination to load values which have more 1 than 0
bits set (like small negative values)
git-svn-id: trunk@21646 -
BIC clears the specified bits, while AND keeps them. The usage of BIC
allows a broader range of shifterconsts to be used on the ARM cpu, often
saving a cycle.
Previously code like:
Data:=Data and $FFFFFF00
would result in
mvn r1, #255
and r0, r0, r1
This patch changes this to
bic r0, r0, #255
git-svn-id: trunk@21510 -
BX is supported from ARMv4T onwards, but i don't have a armv4t device to
test it.
Using BX instead of mov pc,lr allows for a better pipeline utilization
by enabling the CPUs branch predictor to work properly.
git-svn-id: trunk@21505 -
Reorder unaligned Load sequence on ARM
The old version produced code like that:
ldrb rDEST, [rBASE]
ldrb rTemp, [rBASE, #1]
orr rDEST, rDEST, rTEMP lsl #8 (2 stall cycles)
ldrb rTemp, [rBASE, #2]
orr rDEST, rDEST, rTEMP lsl #16 (2 stall cycles)
ldrb rTemp, [rBASE, #3]
orr rDEST, rDEST, rTEMP lsl #24 (2 stall cycles)
This creates a lot of stall-cycles on ARM Implementations with load
delay slots like Marvel Kirkwood or Intel XScale. With the usual up to 2
stall-cycles this code requires a total of 13 cycles (7 instructions + 6 stall
cycles) in best case.
The new code uses a second temp register to avoid the stall cycles.
ldrb rDEST, [rBASE]
ldrb rTemp1, [rBASE, #1]
ldrb rTemp2, [rBASE, #2]
orr rDEST, rDEST, rTEMP1 lsl #8
ldrb rTemp1, [rBASE, #3]
orr rDEST, rDEST, rTEMP2 lsl #16
orr rDEST, rDEST, rTEMP1 lsl #24 (1 stall cycle)
The rescheduling and second register bring the total cycles down to 8.
If a later rescheduling should happen for the last orr it even can go
down to 7.
git-svn-id: trunk@21363 -
Optimize ARM OP_MUL/OP_IMUL for x*ispowerof2(const+1) cases
Calculations like a*7 can be optimized to a*8-a with the usage of RSB and left
shifts which can be done in a single cycle.
git-svn-id: trunk@21351 -
always points to the previous r7 on the stack (with the saved return
address coming right after it) so that the debugger and crashreporter
can use it for backtraces as specified in the ABI
o changed NR_FRAME_POINTER_REG and RS_FRAME_POINTER_REG from a symbolic
into a typed constant, and added a new method to tprocinfo that can
be used to initialze it (so it can be inited to r7/r11 depending on
the target platform)
* allow using r9 on Darwin, it was only used by the system on iOS up to
2.x, which we no longer support
* prefer using r9 and r12 before r4..r11 on Darwin, because they are
volatile and hence do not have to be saved
git-svn-id: trunk@20661 -
o new eabihf (hard float) abi
o vfpv3_d16 variant of VFP (default variant used by EABI assemblers: VFPv3
with only 16 double registers instead of 32) and pass it to GNU as
o make the odd numbered single precision floating point VFP registers
available for explicit allocation for use by the calling convention
* fixed copy/paste error in stdname of S30 register
-> use -dFPC_ARMHF to create an ARM eabi hard float compiler
(mantis #21554)
git-svn-id: trunk@20660 -
* $CPU/cgcpu.pas: disable the generation of VMT loading code
* dbgstabs.pas, dbgdwarf.pas: treat virtual methods of helpers as normal methods
* ncgcal.pas: don't register virtual helper methods for WPO
* ncgrtti.pas: write virtual helper methods as normal methods to RTTI
* nobj.pas: correctly handle final and override cases in helpers
* pdecvar.pas: property getters
* rautils.pas: no VMT offset in records
git-svn-id: branches/svenbarth/classhelpers@17150 -
* generate add.w instead of add for thumb-2 in case one of the registers
is > r8
* add register interferences for the "add" instruction so the register
allocator can detect invalid instruction forms (even for assembler code)
* fixed error in thumb2.inc detected by the previous change
git-svn-id: trunk@16633 -
be used outside the code generator
* renamed tabstractprocdef.requiredargarea into callerargareasize,
and also added calleeargareasize field; added init_paraloc_info(side)
method to init the parameter locations and init those size fields and
replaced all "if not procdef.has_paraloc_info then ..." blocks with
procdef.init_paraloc_info(callersize)"
* moved detection of stack tainting parameters from psub to
symdef/tabstractprocdef
+ added tcallparanode.contains_stack_tainting_call(), which detects
whether a parameter contains a call that makes use of stack paramters
* record for each parameter whether or not any following parameter
contains a call with stack parameters; if not, in case the current
parameter itself is a stack parameter immediately place it in its
final location also for use_fixed_stack platforms rather than
first putting it in a temporary location (part of mantis #17442)
* on use_fixed_stack platforms, always first evaluate parameters
containing a stack tainting call, since those force any preceding
stack parameters of the current call to be stored in a temp location
and copied to the final location afterwards
git-svn-id: trunk@16050 -
- sort of reverted r14134, which is no longer required after the above
change (new_section() inserts the alignment itself)
* made the tai_section.create() constructor private so it cannot be
called directly anymore
git-svn-id: trunk@15482 -
represent complex locations (required for full x86-64 ABI support,
which is not yet implemented) -> lots of special result handling
code has been removed and replaced by the parameter handling
routines
+ added support for composite parameters (and hence function
results) to tcg.a_load_ref_cgpara() (so it can be used for
handling, e.g., 64 bit parameters on 32 bit platforms)
* the above fixed writing past the end of allocated memory when
handling records returned in registers on x86-64 whose size is
not a multiple of 8 bytes (mantis #16357)
- removed the x86-64 and PPC specific versions of a_load_ref_cgpara(),
as they are now handled correctly by the generic version
* moved the responsibility of allocating tcgpara cpu registers
(using paramanager.allocparaloc()) from the callers of
cg.a_load*_cgpara() to the cg.a_load*_cgpara() methods
themselves (so the register allocation can be done efficiently
when dealing with function results)
* for the above, renamed paramanager.alloc/freeparaloc() to
paramanager.alloc/freecgpara(), and use paramanager.allocparaloc()
to allocate individual pcgparalocations instead
* fixed the register size of SSE2 function result registers for
x86-64 (when used for floating point), which results in removing
a few superfluous "movs? %xmm0,%xmm0" instructions
* fixed compilation of paramanagers of avr, m68k and mips after r13695
and also updated them for these new changes
git-svn-id: trunk@15350 -
and above, so this also works when calling thumb code (should actually
also be done for ARMv5T, but we don't have a monicker for that yet)
* use BX instead of "mov r15, r14" for simple returns from subroutines
on ARMv6+ to support returning to thumb code from ARM code (idem)
git-svn-id: trunk@14332 -
+ RTL support:
o VFP exceptions are disabled by default on Darwin,
because they cause kernel panics on iPhoneOS 2.2.1 at least
o all denormals are truncated to 0 on Darwin, because disabling
that also causes kernel panics on iPhoneOS 2.2.1 (probably
because otherwise denormals can also cause exceptions)
* set softfloat rounding mode correctly for non-wince/darwin/vfp
targets
+ compiler support: only half the number of single precision
registers is available due to limitations of the register
allocator
+ added a number of comments about why the stackframe on ARM is
set up the way it is by the compiler
+ added regtype and subregtype info to regsets, because they're
also used for VFP registers (+ support in assembler reader)
+ various generic support routines for dealing with floating point
values located in integer registers that have to be transferred to
mm registers (needed for VFP)
* renamed use_sse() to use_vectorfpu() and also use it for
ARM/vfp support
o only superficially tested for Linux (compiler compiled with -Cpvfpv6
-Cfvfpv2 works on a Cortex-A8, no testsuite run performed -- at least
the fpu exception handler still needs to be implemented), Darwin has
been tested more thoroughly
+ added ARMv6 cpu type and made it default for Darwin/ARM
+ ARMv6+ implementations of atomic operations using ldrex/strex
* don't use r9 on Darwin/ARM, as it's reserved under certain
circumstances (don't know yet which ones)
* changed C-test object files for ARM/Darwin to ARMv6 versions
* check in assembler reader that regsets are not empty, because
instructions with a regset operand have undefined behaviour in that
case
* fixed resultdef of tarmtypeconvnode.first_int_to_real in case of
int64->single type conversion
* fixed constant pool locations in case 64 bit constants are generated,
and/or when vfp instructions with limited reach are present
WARNING: when using VFP on an ARMv6 or later cpu, you *must* compile all
code with -Cparmv6 (or higher), or you will get crashes. The reason is
that storing/restoring multiple VFP registers must happen using
different instructions on pre/post-ARMv6.
git-svn-id: trunk@14317 -
-- Zusammenführen der Unterschiede zwischen Projektarchiv-URLs in ».«:
U rtl/arm/setjump.inc
A rtl/arm/thumb2.inc
U rtl/arm/divide.inc
A rtl/embedded/arm/stm32f103.pp
U rtl/inc/system.inc
U compiler/alpha/cgcpu.pas
U compiler/sparc/cgcpu.pas
U compiler/i386/cgcpu.pas
U compiler/ncgld.pas
U compiler/powerpc/cgcpu.pas
U compiler/avr/cgcpu.pas
U compiler/aggas.pas
U compiler/powerpc64/cgcpu.pas
U compiler/x86_64/cgcpu.pas
U compiler/cgobj.pas
U compiler/psystem.pas
U compiler/aasmtai.pas
U compiler/m68k/cgcpu.pas
U compiler/ncgutil.pas
U compiler/rautils.pas
U compiler/arm/raarmgas.pas
U compiler/arm/armatts.inc
U compiler/arm/cgcpu.pas
U compiler/arm/armins.dat
U compiler/arm/rgcpu.pas
U compiler/arm/cpubase.pas
U compiler/arm/agarmgas.pas
U compiler/arm/cpuinfo.pas
U compiler/arm/armop.inc
U compiler/arm/narmadd.pas
U compiler/arm/aoptcpu.pas
U compiler/arm/armatt.inc
U compiler/arm/aasmcpu.pas
U compiler/systems/t_embed.pas
U compiler/psub.pas
U compiler/options.pas
git-svn-id: trunk@13801 -
alignment for each memory reference (mantis #12137, and
test/packages/fcl-registry/tregistry1.pp on sparc). This also
enables better code generation for packed records in many cases.
o several changes were made to the compiler to minimise the chances
of accidentally forgetting to set the alignment of memory references
in the future:
- reference_reset*() now has an extra alignment parameter
- location_reset() can now only be used for non LOC_(C)REFERENCE,
use location_reset_ref() for those (split the tloc enum so the
compiler can catch errors using range checking)
git-svn-id: trunk@12719 -
the syntax is exactly the same as for "external", except for
the keyword. It is currently only active for Darwin targets.
It should also work at least for Linux targets, but only with
the GNU assembler (which is why it is not activated there)
+ test for this functionality
git-svn-id: trunk@12009 -
+ write the header for non-pic darwin/arm call stubs properly in aggas
* r9 is not available for general use on darwin/arm according to the llvm
code generator
git-svn-id: trunk@11862 -
* Improved generic a_load_ref_reg_unaligned if ref alignment is 2.
* Improved unaligned load/store of register for ARM.
* It fixes passing records by value on ARM.
+ New test.
git-svn-id: trunk@10681 -
* Fixed sign/zero-extension in second_int_to_bool for all CPUs. x86 and pppc were not affected by this bug, but I fixed it for all CPUs for consistency.
* cg/tcnvint1 is passed on arm now.
git-svn-id: trunk@10669 -
for qwordbool) + test:
o assigning true to such a variable now sets them to $ff/$ffff/$ffffffff
o these types are now all signed
o converting an integer type to a byte/word/long/qwordbool using an
explicit type cast keeps the integer's original value stored in the
bool, instead of forcing it to ord(true)/ord(false)
(mantis #10233 and #10613, implemented for all architectures, testsuite
tested for ppc32, sparc and x86)
* fixed some places where the rtl depended on longbool(true) having the
value 1
* extended several boolean tests (and adapted some to no longer assume
that byte/word/long/qwordbool(true)=1)
+ support for converting to qwordbool in second_int_to_bool for x86, ppc
and sparc
git-svn-id: trunk@9898 -
* fixed downsizing the precision of floating point values
* floating point constants are now treated using only the minimal
precision required (e.g. 2.0 is now a single, 1.1 extended etc)
(Delphi compatible)
git-svn-id: trunk@5927 -
* symtables based on TFPHashObjectList and TFPObjectList
* rename torddef.typ to torddef.ordtype
* rename tfloatdef.typ to tfloatdef.floattype
* rename tdef.deftype to tdef.typ
* remove obsolete browser code, browcol is kept so the ide
can still be compiled
git-svn-id: trunk@5192 -
getjumplabel and added type para to getlabel for specific types
* moved lineinfo generation from assemble and aggas to dbgstabs
git-svn-id: trunk@1120 -
was not being initialised -> getfinaldestination always failed, which
caused wrong optimizations in some cases)
* changed the inverse_cond into a function, because tasmcond is a record
on ppc
+ added a compare_conditions() function for the same reason
- invalid calling conventions for a certain cpu are rejected
- arm softfloat calling conventions
- -Sp for cpu dependend code generation
- several arm fixes
- remaining code for value open array paras on heap
* started to fix ppc, needs an overhaul
+ stabs info improve for spilling, not sure if it works correctly/completly
- MMX_SUPPORT removed from Makefile.fpc
* move some protected and private field around
* the temp. register for register parameters/arguments are now released
before the move to the parameter register is done. This improves
the code in a lot of cases.