like field reordering (possible problems with cracker classes) or using ebp as a normal register (broken
stack traces from dump_stack)
+ niln is also valid in a cse domain
* parameters passed by reference shall have a complexity >1
* load nodes from outer scopes shall have a complexity >1
* better cse debugging
+ more node types added to cse
* consider parameters passed by reference in cse
* take care of cse in parameters in simple cases
git-svn-id: trunk@22050 -
Especially with 64bit operators the CG sometimes generates:
and r0, r1, #0
Which just clears r0 and is equivalent to
mov r0, #0
git-svn-id: trunk@22032 -
In certain cases the CG would emit something like
bic r1, r0, #0
As BIC is clearing the specified bits this is equivalent to
mov r1, r0
This patch changes the CG to emit the mov instead which the register
allocator will hopefully remove most of the time.
git-svn-id: trunk@22024 -
Currently the register spiller cannot handle the "bond" between IT* and
a following instruction, sometimes breaking them apart, which breaks the
build or, worse, the result.
So for now we're not emitting A_IT* in second_cmp64bit anymore but use a
conditional jump instead.
This fixes Mantis #22520
git-svn-id: trunk@22009 -
In r21686 I introduced optimized 64bit shifts for ARM. But the
methods did not check which CPU they have to generate code for.
This patch disables the optimized code for now if the target is in
cpu_thumb2 and falls back to the generic code.
There are 2 problems with the current code:
1.) Thumb-2 does not support shift by register on all data instructions
as ARM does.
2.) The code does not generate the required IT-block for the conditionally
executed code.
git-svn-id: trunk@21997 -
The old code generated a strange IT-sequence:
IT EQ
MOVEQ r0, #1
IT NE
MOVNE r0, #1
Now we generate:
ITE EQ
MOVEQ r0, #1
MOVNE r0, #1
IT stands for IfThen, ITE for IfThenElse. There are a couple of other forms
where the instruction gets extended to handle more of the following
instructions. So we have ITEE, ITETE etc.; up to 4 instructions can be
handled.
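As an illustration of how the IT mnemonic follows from the conditions of the
instructions it covers, here is a minimal Python sketch. It is not code from
the compiler; the function name it_mnemonic and the INVERSE table are
hypothetical and only model the naming rule described above.

```python
# Inverse pairs of ARM condition codes (a subset that is enough for this sketch).
INVERSE = {"EQ": "NE", "NE": "EQ", "CS": "CC", "CC": "CS",
           "MI": "PL", "PL": "MI", "VS": "VC", "VC": "VS",
           "HI": "LS", "LS": "HI", "GE": "LT", "LT": "GE",
           "GT": "LE", "LE": "GT"}


def it_mnemonic(conditions):
    """Build the IT mnemonic for a list of 1 to 4 instruction conditions.

    The first condition is the base condition. Every following instruction
    must use either the same condition (T) or its inverse (E).
    """
    if not 1 <= len(conditions) <= 4:
        raise ValueError("an IT block covers 1 to 4 instructions")
    base = conditions[0]
    if base not in INVERSE:
        raise ValueError("unsupported condition: " + base)
    suffix = ""
    for cond in conditions[1:]:
        if cond == base:
            suffix += "T"
        elif cond == INVERSE[base]:
            suffix += "E"
        else:
            raise ValueError("condition does not fit into this IT block: " + cond)
    return "IT" + suffix + " " + base


print(it_mnemonic(["EQ", "NE"]))
```

For the MOVEQ/MOVNE pair above this yields the ITE EQ form instead of two
separate IT instructions.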
git-svn-id: trunk@21996 -
r21885 added a new peephole optimizer. The associated code refactoring
missed a check for
tai(hp1).typ = tai_instruction
Which can lead to an access violation later on, because the rest of the
code expects to find a taicpu in hp1.
git-svn-id: trunk@21949 -
order to minimise memory losses due to alignment padding. Not yet enabled
by default at any optimization level, but can be (de)activated separately
via -Oo(no)orderfields
o added separate tdef.structalignment method that returns the alignment
of a type when it appears in a record/object/class (factors out
AIX-specific double alignment in structs)
o changed the handling of the offset of a delegate interface
implemented via a field, by taking the field offset on demand
rather than at declaration time (because the ordering optimization
causes the offsets of fields to be unknown until the entire
declaration has been parsed)
git-svn-id: trunk@21947 -
Like MOV these instructions support 2 operands, with the second being a
shifterop.
Without this patch the asm reader would fail on something like
cmp r0, r1, lsr 16
with
Error: Unknown identifier "LSR"
git-svn-id: trunk@21911 -
ARM can not reference an arbitrary offset so it needs some special
handling if the offset goes beyond abs(4095).
The code for do_spill_read and do_spill_written used to be very similar.
I've partially factored out the code into spilling_create_load_store.
The former code loaded the offset from a constant pool, which is a waste
of memory-bandwidth and cache lines. The new code tries to find a way to
adjust the baseregister so the memory location can be reached more
easily, this allows us to handle at least +-1MB with just a single
additional ADD or SUB instruction. If that fails we'll resort to the normal
constant loading code, which on its own will fall back to loading the
constant from a constant-pool.
So instead of:
ldr r1, =16388
ldr r0, [r13, r1]
which will use at least 4 cycles (2 instruction cycles + 2 stall
cycles) on most cores.
We try to generate:
add r1, r13, #16384
ldr r0, [r1, #4]
which most armv5+ cores will execute in 2 cycles. We'll also save on
DCache usage.
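The offset split can be sketched in Python as follows. This is only a model of
the idea, not the code in spilling_create_load_store: the name
split_spill_offset is hypothetical, and the sketch only covers the +-1MB case
mentioned above (base adjustment is an 8-bit value shifted left by 12).

```python
def split_spill_offset(offset):
    """Return (base_adjust, ldr_offset), or None if a constant load is needed.

    base_adjust is added to the base register with a single ADD/SUB,
    ldr_offset is the remaining immediate offset with abs(ldr_offset) <= 4095.
    """
    if abs(offset) <= 4095:
        return 0, offset              # reachable directly, no adjustment
    sign = -1 if offset < 0 else 1
    magnitude = abs(offset)
    low = magnitude & 0xFFF           # part that fits the LDR/STR immediate
    high = magnitude - low            # part that has to go into ADD/SUB
    if (high >> 12) <= 0xFF:          # 8-bit value shifted left by 12
        return sign * high, sign * low
    return None                       # fall back to normal constant loading


print(split_spill_offset(16388))
```

For the offset 16388 used above this gives an adjustment of 16384 and a
remaining offset of 4, matching the add/ldr pair.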
git-svn-id: trunk@21889 -
If the needed adjustment is not expressible in a shifterconst, the old code
loaded a temporary register (fixed to r12) via a_load_const_reg and used it
to adjust the SP, resulting in:
mov r12, #44
orr r12, r12, #4096
sub sp, sp, r12
The new code will try to split the adjustment into 2 shifterconstants and
will do two separate adjustments:
sub sp, sp, #44
sub sp, sp, #4096
If that doesn't work we'll fall back to the old code. But that should
happen VERY rarely, only for stacks bigger than 256k which are not
expressible in 2 shifter constants.
git-svn-id: trunk@21863 -
This removes the duplications in a_op_reg_reg_reg_checkoverflow.
OP_ROL stays separate because it needs some special treatment again.
The code for OP_ROL was changed, previously it generated:
mov tempreg, #32
sub src1, tempreg, src1
mov dst, src2, ror src1
This would trash src1, which MIGHT be a problem, but I'm not totally
sure. In any case the mov/sub was replaced with rsb, so the new code looks like
this:
rsb tempreg, src1, #32
mov dst, src2, ror tempreg
If src1 gets freed afterwards the regallocator should be able to change
that into:
rsb src1, src1, #32
mov dst, src2, ror src1
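The identity behind the rsb/ror pair (rotate left by n equals rotate right by
32-n) can be checked with a small Python model. The function names are
hypothetical; the sketch assumes 32-bit registers and shift amounts from 0 to
31.

```python
MASK32 = 0xFFFFFFFF


def ror32(value, amount):
    """Rotate a 32-bit value right; amounts are taken modulo 32."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def rol32_via_ror(src2, src1):
    """Model of: rsb tempreg, src1, #32 / mov dst, src2, ror tempreg."""
    tempreg = 32 - src1               # rsb tempreg, src1, #32
    return ror32(src2, tempreg)       # mov dst, src2, ror tempreg


print(hex(rol32_via_ror(0x12345678, 8)))
```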
git-svn-id: trunk@21804 -
The previous code was full of duplicated code; this new version just
maps the OP_* to the correct SM_* and does some special handling for
OP_ROL which is done via OP_ROR.
git-svn-id: trunk@21801 -
This fixes 64bit shifts on arm with a constant shift value of 0.
The old code would have emitted something like this
mov r0, r0, lsl #32
as 32 is an invalid shift value (and would be wrong anyway) the
assembler declined to assemble the produced source.
The new code will just not emit any code for a shift value of 0.
tests/test/tint642.pp now tests shl/shr 0 on 64 bit values.
tests/webtbs/tw22326.pp is also added as an additional test.
git-svn-id: trunk@21746 -
getintparaloc + adapted all call sites of getintparaloc. This
led to a number of additional, related changes:
o corrected the type information for some getintparaloc parameters
o don't allocate some intparalocs in cases they aren't used
o changed "const tvardata" parameter into "constref tvardata" for
fpc_variant_copy_overwrite to make pass-by-reference semantics
explicit
o moved a number of routines that now have to call find_system_type()
from cgobj to hlcgobj so that cgobj doesn't have to start depending
on the symtable unit
o added versions of the cpureg alloc/dealloc methods to hlcgobj that
call through to their cgobj counter parts, so we can call save/restore
the cpu registers before/after calling system helpers from hlcgobj
(not implemented in hlcgobj itself, because all basic register
allocator functionality is still part of cgobj/cgcpu)
git-svn-id: trunk@21696 -
* set tcgpara.def for the function return location (field introduced for and
already used by the JVM code generator, required for future hlcg
functionality)
git-svn-id: trunk@21691 -
This code generates different versions of assembly depending on the
amount to shift.
Variable Amount: 6 cycles (5 if last shift can be folded)
Constant 1 : 2 cycles
Constant 2-31 : 3 cycles (2 if last shift can be folded)
Constant 32 : 1 cycle (depends on the register allocator)
Constant 33-64 : 2 cycles
This should speed up softfpu on arm a bit.
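To show why the constant cases differ, here is a hedged Python model of a
64-bit left shift by a constant built from two 32-bit halves. It is not the
code generator itself; shl64_const is a hypothetical name and the sketch
handles amounts from 0 to 63 only.

```python
MASK32 = 0xFFFFFFFF


def shl64_const(lo, hi, amount):
    """Shift the 64-bit value (hi:lo) left by a constant, using 32-bit halves."""
    if not 0 <= amount <= 63:
        raise ValueError("shift amount out of range")
    if amount == 0:
        return lo, hi                             # nothing to do
    if amount < 32:
        # bits shifted out of lo move into the low end of hi
        new_hi = ((hi << amount) | (lo >> (32 - amount))) & MASK32
        new_lo = (lo << amount) & MASK32
        return new_lo, new_hi
    if amount == 32:
        return 0, lo                              # hi simply becomes lo
    return 0, (lo << (amount - 32)) & MASK32      # only lo contributes


lo, hi = shl64_const(0x89ABCDEF, 0x01234567, 4)
print(hex(hi), hex(lo))
```

The case split mirrors the table above: a shift by 32 only moves a register,
and larger amounts need a single shift of the low half.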
git-svn-id: trunk@21686 -
RRX (Rotate Right with eXtend) does a single bit right rotation through
the carry. So it does not take any arguments, neither constant nor
register.
Also remove redundant shiftmode2str and replace usage of it with gas_shiftmode2str.
git-svn-id: trunk@21685 -
This code will generate the following sequence on arm:
r1=dst
r0=src
movs r1, r0
rsbmi r1, r0, #0
movs will set the N-flag when the MSB of r0 is set. If it is set, rsb
will calculate dst:=0-src;
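A small Python model of the two instructions, assuming 32-bit two's complement
registers (arm_abs is a hypothetical name used only for this sketch):

```python
MASK32 = 0xFFFFFFFF


def arm_abs(src):
    """Model of: movs r1, r0 / rsbmi r1, r0, #0 on 32-bit registers."""
    src &= MASK32
    dst = src                              # movs r1, r0
    n_flag = bool(dst & 0x80000000)        # N flag = MSB of the result
    if n_flag:                             # "mi" condition: N flag set
        dst = (0 - src) & MASK32           # rsbmi r1, r0, #0
    return dst


print(arm_abs(-5 & MASK32))
```

Note that the most negative value $80000000 maps to itself, as with any two's
complement abs.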
git-svn-id: trunk@21678 -
The old version did not check the S-Postfix for MOV, which results in
removing instructions like:
movs r0, r0
which breaks later flag usage.
git-svn-id: trunk@21676 -
This now generates:
mvn r0, r0, lsl #24/#16
mov r0, r0, lsr/asr #24/#16
The lsr/asr might be folded into a following instruction, making the
whole operation 1 cycle instead of 2-3 with the previous solution.
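Reading the two instructions literally, the sequence computes a bitwise NOT of
an 8 or 16 bit value held in a 32-bit register. The following Python sketch
models the 8-bit case under that reading; not8_unsigned and not8_signed are
hypothetical names, not compiler routines.

```python
MASK32 = 0xFFFFFFFF


def not8_unsigned(r0):
    """mvn r0, r0, lsl #24 / mov r0, r0, lsr #24"""
    r0 = ~(r0 << 24) & MASK32      # mvn with the operand shifted left by 24
    return r0 >> 24                # logical shift right, zero extends


def not8_signed(r0):
    """mvn r0, r0, lsl #24 / mov r0, r0, asr #24"""
    r0 = ~(r0 << 24) & MASK32
    result = r0 >> 24
    if r0 & 0x80000000:            # arithmetic shift copies the sign bit
        result |= 0xFFFFFF00
    return result


print(hex(not8_unsigned(0x0F)), hex(not8_signed(0x0F)))
```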
git-svn-id: trunk@21658 -
OP_ADD, OP_SUB, OP_ORR will be split into two instructions if possible when a load/const
construction is required.
OP_AND is a bit different, because we can't just split it up, but we try
to find a two instruction BIC-equivalent to it.
Till now code like
a:= a and $FFFF;
produced code like
mov r0, $FF00
orr r0, r0, $FF
and r1, r1, r0
With this addition we produce code like:
bic r0, r0, $FF000000
bic r0, r0, $FF0000
Saving us at least a cycle and in some cases also a load from the
constant-pool.
This uses the new split_into_shifter_const function.
git-svn-id: trunk@21647 -
* use split_into_shifter_const to reduce the MOV/ORR combination to a
single check and allow a broader range of combinations.
* Introduce MVN/BIC combination to load values which have more 1 than 0
bits set (like small negative values)
git-svn-id: trunk@21646 -
This function tries to split up a 32-bit value into two shifter
constants. This approach finds a broader range for two shifter constant
combinations.
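A rough Python sketch of the idea follows. It is an illustration, not the
implementation of split_into_shifter_const: the search strategy and the names
are assumptions. A shifter constant is modelled as an 8-bit value rotated
right by an even number of bits.

```python
MASK32 = 0xFFFFFFFF


def rol32(value, bits):
    """Rotate a 32-bit value left; bits is taken modulo 32."""
    bits %= 32
    return ((value << bits) | (value >> (32 - bits))) & MASK32


def is_shifter_const(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= MASK32
    # rotating left by r undoes a rotate right by r
    return any(rol32(value, r) <= 0xFF for r in range(0, 32, 2))


def split_into_two_shifter_consts(value):
    """Return (first, second) with first | second == value, or None."""
    value &= MASK32
    for rot in range(0, 32, 2):
        window = rol32(0xFF, 32 - rot)     # 0xFF rotated right by rot
        first = value & window             # bits covered by this window
        second = value & ~window & MASK32  # everything else
        if first and is_shifter_const(second):
            return first, second
    return None


print(split_into_two_shifter_consts(4140))
```

Because the two parts never share bits, they can be combined with ORR or ADD,
or cleared one after the other with BIC.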
git-svn-id: trunk@21645 -
cgutils, and define them so they are no larger than what is required by
the current target platform
* added cgutils to the uses clause of several units that use the
tcpuregisterset type
git-svn-id: trunk@21624 -
RS_INVALID superregister (instead of sometimes RS_NO and sometimes
RS_INVALID)
* check for RS_INVALID in tcg.g_save_registers() and ignore such entries
git-svn-id: trunk@21622 -
This slightly changes the semantics of RegUsedAfterInstruction.
We now check if the `current value` of the register will be used later.
It will do `the right thing` for all the normal use cases.
git-svn-id: trunk@21519 -
BIC clears the specified bits, while AND keeps them. The usage of BIC
allows a broader range of shifterconsts to be used on the ARM cpu, often
saving a cycle.
Previously code like:
Data:=Data and $FFFFFF00
would result in
mvn r1, #255
and r0, r0, r1
This patch changes this to
bic r0, r0, #255
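The equivalence can be checked with a short Python model. The selection rule
below is a simplification (it only accepts a plain 8-bit inverted mask, the
simplest kind of shifter constant), and and_const is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def bic(value, bits):
    """BIC clears the given bits: value and (not bits)."""
    return value & ~bits & MASK32


def and_const(value, mask):
    """Return (mnemonic, result) for `value and mask` on 32-bit registers."""
    inverted = ~mask & MASK32
    if inverted <= 0xFF:                   # inverted mask fits an immediate
        return "bic", bic(value, inverted)
    return "and", value & mask & MASK32    # keep the plain AND otherwise


print(and_const(0x12345678, 0xFFFFFF00))
```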
git-svn-id: trunk@21510 -
The loop checked for the wrong instruction for .opcode = A_STR, making
the whole optimizer non-functional but at least not destructive.
git-svn-id: trunk@21508 -
This optimizer folds shift/roll operations into following data
instructions.
It will change code like:
mov r0, r0, lsl #16
add r1, r0, r1
into
add r1, r1, r0, lsl #16
Source registers will be reordered when necessary, also SUB/SBC will be
replaced with RSB/RSC and vice versa when reordering is required.
It could be expanded to support more operations like LDR/STR.
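The folding rule, including the operand reordering, can be sketched in Python
on instructions written as tuples. This is a simplified model and not the
optimizer code: fold_shift and the tuple layout are hypothetical, and the
sketch assumes the shifted register is not needed afterwards.

```python
# Opcode to use when the two source operands have to be swapped.
SWAPPED = {"add": "add", "orr": "orr", "and": "and", "eor": "eor",
           "sub": "rsb", "rsb": "sub", "sbc": "rsc", "rsc": "sbc"}


def fold_shift(mov, data):
    """Fold ('mov', dst, src, shift) into the following (op, dst, rn, rm).

    Returns the combined instruction (op, dst, rn, rm, shift) or None.
    """
    _, mov_dst, mov_src, shift = mov
    op, dst, rn, rm = data
    if rm == mov_dst and rn != mov_dst:
        # shifted value is already the second operand
        return (op, dst, rn, mov_src, shift)
    if rn == mov_dst and rm != mov_dst and op in SWAPPED:
        # shifted value is the first operand: swap and adjust the opcode
        return (SWAPPED[op], dst, rm, mov_src, shift)
    return None


print(fold_shift(("mov", "r0", "r0", ("lsl", 16)), ("add", "r1", "r0", "r1")))
```

For the example above this returns the add with r1 as first operand and
r0, lsl #16 as shifter operand.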
git-svn-id: trunk@21507 -
This changes the ARM Peephole optimizer RedundantMovProcess to also
recognize and modify something like the following sequence.
mov r0, r1
mov r0, r0, lsl #8
this would be changed into
mov r0, r1, lsl #8
git-svn-id: trunk@21506 -
BX is supported from ARMv4T onwards, but I don't have an ARMv4T device to
test it.
Using BX instead of mov pc,lr allows for a better pipeline utilization
by enabling the CPU's branch predictor to work properly.
git-svn-id: trunk@21505 -
- Support MLA and MUL in DataMov2Data
- SMLAL and UMLAL are also reading from oper[0]
- UMLAL, UMULL, SMLAL and SMULL are writing to oper[1]
git-svn-id: trunk@21421 -
Reorder unaligned Load sequence on ARM
The old version produced code like this:
ldrb rDEST, [rBASE]
ldrb rTEMP, [rBASE, #1]
orr rDEST, rDEST, rTEMP, lsl #8 (2 stall cycles)
ldrb rTEMP, [rBASE, #2]
orr rDEST, rDEST, rTEMP, lsl #16 (2 stall cycles)
ldrb rTEMP, [rBASE, #3]
orr rDEST, rDEST, rTEMP, lsl #24 (2 stall cycles)
This creates a lot of stall cycles on ARM implementations with load
delay slots like Marvell Kirkwood or Intel XScale. With the usual up to 2
stall cycles this code requires a total of 13 cycles (7 instructions + 6 stall
cycles) in the best case.
The new code uses a second temp register to avoid the stall cycles.
ldrb rDEST, [rBASE]
ldrb rTEMP1, [rBASE, #1]
ldrb rTEMP2, [rBASE, #2]
orr rDEST, rDEST, rTEMP1, lsl #8
ldrb rTEMP1, [rBASE, #3]
orr rDEST, rDEST, rTEMP2, lsl #16
orr rDEST, rDEST, rTEMP1, lsl #24 (1 stall cycle)
The rescheduling and second register bring the total cycles down to 8.
If a later rescheduling should happen for the last orr it can even go
down to 7.
git-svn-id: trunk@21363 -
Optimize ARM OP_MUL/OP_IMUL for x*ispowerof2(const+1) cases
Calculations like a*7 can be optimized to a*8-a with the usage of RSB and left
shifts which can be done in a single cycle.
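The transformation can be modelled in Python as below. mul_via_rsb is a
hypothetical name; the sketch assumes 32-bit wrap-around arithmetic and only
handles constants of the form 2^n-1.

```python
MASK32 = 0xFFFFFFFF


def mul_via_rsb(a, const):
    """Compute a*const as (a shl n) - a when const+1 is a power of two.

    Models: rsb dst, a, a, lsl #n. Returns None for other constants.
    """
    n = const + 1
    if const <= 0 or n & (n - 1):          # const+1 must be a power of two
        return None
    shift = n.bit_length() - 1             # log2(const+1)
    return ((a << shift) - a) & MASK32


print(mul_via_rsb(5, 7))
```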
git-svn-id: trunk@21351 -
Improve ARM-Peephole Optimizers
1.) Introduce an ARM-specific RegUsedAfterInstruction which analyzes
instructions and reg allocation information to see if a register is
really needed afterwards to decide if some special optimizations can be
done.
2.) Introduce "RemoveSuperfluousMove"
This tries to fold mov into a previous Data-Instruction (ADD, ORR, etc)
or LDR-Instruction.
3.) Introduce new Optimizer "DataMov2Data" and modify LdrMov2Ldr to use
RemoveSuperfluousMove
4.) Expand Ldr* and Str* Optimizers to also work on {Ldr,Str}{,b,h}
git-svn-id: trunk@21314 -
Inline a couple of small functions of the ARM-Compiler
These small changes improved overall compile times of the fpc suite by
about 2-3% running on a 1.2GHz Kirkwood.
git-svn-id: trunk@21312 -
future use by high level code generator targets
o this in turn required that all a_load*_loc* methods are called via
hlcg rather than via cg, since a location can be a subsetref/reg and
those are no longer handled in tcg
o that then required moving several force_location_* routines into
thlcg because they use a_load_loc*, but did not take tdef size
parameters (which are required by the thlcg a_load_loc* routines)
o the only practical consequence is that from now on, you have to
use hlcg.location_force_mem/reg() (fpureg not yet) and
hlcg.gen_load_loc_cgpara() instead of the removed versions from ncgutil,
and hlcg.a_load*loc*() instead of cg.a_load*loc* if a subsetref/reg
might be involved
git-svn-id: trunk@21287 -
* Introduce MatchInstruction and MatchOperand
MatchInstruction allows matching an instruction by condition and
oppostfix. MatchOperand checks if an operand is a register and matches
another operand. In the future this could be overloaded with other
versions not only accepting TRegister.
* Optimize cmp,moveq,movne sequence on ARM
This patch implements a peephole optimizer for the following sequence:
cmp reg,const1
movne reg,const2
moveq reg,const1
* Small improvements to the ARM peephole optimizer
Most instructions in the ARM ISA have taicpu(p).oper[0]^.typ = top_reg
as the only option, so there is no need to check for it if we're
looking at those instructions.
* Remove redundant mov instructions on ARM
This is an addition to the ARM PeepHole Optimizer.
It folds code like this:
mov reg1, reg2
add reg1, reg1, (const|reg)
git-svn-id: trunk@21024 -
o support for the new codepage-aware ansistrings in the jvm branch
o empty ansistrings are now always represented by a nil pointer rather than
by an empty string, because an empty string also has a code page which
can confuse code (although this will make ansistrings harder to use
in Java code)
o more string helpers code shared between the general and jvm rtl
o support for indexbyte/word in the jvm rtl (warning: first parameter
is an open array rather than an untyped parameter there, so
indexchar(pcharvar^,10,0) will be equivalent to
indexchar([pcharvar^],10,0) there, which is different from what is
intended; changing it to an untyped parameter wouldn't help though)
o default() support is not yet complete
o calling fpcres is currently broken due to limitations in
sysutils.executeprocess() regarding handling unix quoting and
the compiler using the same command lines for scripts and directly
calling external programs
o compiling the Java compiler currently requires adding ALLOW_WARNINGS=1
to the make command line
git-svn-id: branches/jvmbackend@20887 -
of the AVR-specific ifdef'ed variant
o since the only special character we use in mangled names on all platforms
is $, added a new field to tasminfo called "dollarsign" that holds the
character $'s should be replaced with (if it doesn't have to be replaced,
leave it at $)
git-svn-id: trunk@20801 -
always points to the previous r7 on the stack (with the saved return
address coming right after it) so that the debugger and crashreporter
can use it for backtraces as specified in the ABI
o changed NR_FRAME_POINTER_REG and RS_FRAME_POINTER_REG from a symbolic
into a typed constant, and added a new method to tprocinfo that can
be used to initialize it (so it can be inited to r7/r11 depending on
the target platform)
* allow using r9 on Darwin, it was only used by the system on iOS up to
2.x, which we no longer support
* prefer using r9 and r12 before r4..r11 on Darwin, because they are
volatile and hence do not have to be saved
git-svn-id: trunk@20661 -
o new eabihf (hard float) abi
o vfpv3_d16 variant of VFP (default variant used by EABI assemblers: VFPv3
with only 16 double registers instead of 32) and pass it to GNU as
o make the odd numbered single precision floating point VFP registers
available for explicit allocation for use by the calling convention
* fixed copy/paste error in stdname of S30 register
-> use -dFPC_ARMHF to create an ARM eabi hard float compiler
(mantis #21554)
git-svn-id: trunk@20660 -
* remove unnecessary ldr after str to the same memory location; however, to do this optimization safely, we should add support for volatile variables
git-svn-id: trunk@20399 -