This slightly changes the semantics of RegUsedAfterInstruction.
We now check if the `current value` of the register will be used later.
It will do `the right thing` for all the normal use cases.
git-svn-id: trunk@21519 -
BIC clears the specified bits, while AND keeps them. The usage of BIC
allows a broader range of shifterconsts to be used on the ARM cpu, often
saving a cycle.
Previously code like:
Data:=Data and $FFFFFF00
would result in
mvn r1, #255
and r0, r0, r1
This patch changes this to
bic r0, r0, #255
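The choice between AND and BIC can be sketched as below. This is a minimal Python model, not the compiler's code: the helper names `is_shifter_const` and `select_and` are made up for illustration, and it assumes the classic ARM immediate encoding (an 8-bit value rotated right by an even amount).

```python
MASK32 = 0xFFFFFFFF

def is_shifter_const(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= MASK32
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate-right by rot.
        rotated = ((value << rot) | (value >> (32 - rot))) & MASK32
        if rotated < 256:
            return True
    return False

def select_and(mask):
    """Pick a single instruction for 'r0 := r0 and mask' (hypothetical helper).

    Returns None when neither the mask nor its complement is encodable,
    in which case a longer sequence would be needed.
    """
    mask &= MASK32
    if is_shifter_const(mask):
        return "and r0, r0, #%d" % mask
    inverted = ~mask & MASK32
    if is_shifter_const(inverted):
        # BIC clears the bits set in its operand, so pass the complement.
        return "bic r0, r0, #%d" % inverted
    return None
```

For the example above, `select_and(0xFFFFFF00)` picks the BIC form because only the complement, 255, fits the immediate encoding.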
git-svn-id: trunk@21510 -
The loop checked for the wrong instruction for .opcode = A_STR, making
the whole optimizer non-functional but at least not destructive.
git-svn-id: trunk@21508 -
This optimizer folds shift/rotate operations into the following data
instructions.
It will change code like:
mov r0, r0, lsl #16
add r1, r0, r1
into
add r1, r1, r0, lsl #16
Source registers will be reordered when necessary, also SUB/SBC will be
replaced with RSB/RSC and vice versa when reordering is required.
It could be expanded to support more operations like LDR/STR.
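The reordering rule can be sketched as follows. This is a simplified Python model with a made-up tuple representation and a hypothetical `fold_shift` helper. It deliberately ignores the liveness checks a real optimizer needs (the shifted register must not be needed afterwards) and only covers a handful of opcodes.

```python
# Opcode to use when the two source operands have to be swapped so that
# the shifted register ends up as the last (shifter) operand.
SWAPPED = {
    "add": "add", "and": "and", "orr": "orr", "eor": "eor",  # commutative
    "sub": "rsb", "rsb": "sub", "sbc": "rsc", "rsc": "sbc",  # reversed forms
}

def fold_shift(mov, op):
    """Fold mov = ("mov", rd, rm, shift) into op = (opcode, dest, src1, src2).

    Returns the folded instruction as (opcode, dest, src1, src2, shift),
    or None if this sketch cannot fold the pair.
    """
    _, rd, rm, shift = mov
    opcode, dest, src1, src2 = op
    if opcode not in SWAPPED:
        return None
    if src1 == rd and src2 == rd:
        return None  # both sources would need the shifted value
    if src2 == rd:
        return (opcode, dest, src1, rm, shift)
    if src1 == rd:
        # The shifted value is the first source: swap operands and, for
        # non-commutative operations, switch to the reversed opcode.
        return (SWAPPED[opcode], dest, src2, rm, shift)
    return None
```

With the example above, `fold_shift(("mov", "r0", "r0", "lsl #16"), ("add", "r1", "r0", "r1"))` yields `("add", "r1", "r1", "r0", "lsl #16")`, while a `sub r1, r0, r2` after the same mov becomes an `rsb`.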
git-svn-id: trunk@21507 -
This changes the ARM Peephole optimizer RedundantMovProcess to also
recognize and modify something like the following sequence.
mov r0, r1
mov r0, r0, lsl #8
this would be changed into
mov r0, r1, lsl #8
git-svn-id: trunk@21506 -
BX is supported from ARMv4T onwards, but I don't have an ARMv4T device to
test it.
Using BX instead of mov pc,lr allows for better pipeline utilization
by enabling the CPU's branch predictor to work properly.
git-svn-id: trunk@21505 -
- Support MLA and MUL in DataMov2Data
- SMLAL and UMLAL are also reading from oper[0]
- UMLAL, UMULL, SMLAL and SMULL are writing to oper[1]
git-svn-id: trunk@21421 -
Reorder unaligned Load sequence on ARM
The old version produced code like this:
ldrb rDEST, [rBASE]
ldrb rTemp, [rBASE, #1]
orr rDEST, rDEST, rTemp, lsl #8 (2 stall cycles)
ldrb rTemp, [rBASE, #2]
orr rDEST, rDEST, rTemp, lsl #16 (2 stall cycles)
ldrb rTemp, [rBASE, #3]
orr rDEST, rDEST, rTemp, lsl #24 (2 stall cycles)
This creates a lot of stall cycles on ARM implementations with load
delay slots like Marvell Kirkwood or Intel XScale. With the usual up to 2
stall cycles, this code requires a total of 13 cycles (7 instructions + 6 stall
cycles) in the best case.
The new code uses a second temp register to avoid the stall cycles.
ldrb rDEST, [rBASE]
ldrb rTemp1, [rBASE, #1]
ldrb rTemp2, [rBASE, #2]
orr rDEST, rDEST, rTemp1, lsl #8
ldrb rTemp1, [rBASE, #3]
orr rDEST, rDEST, rTemp2, lsl #16
orr rDEST, rDEST, rTemp1, lsl #24 (1 stall cycle)
The rescheduling and the second register bring the total down to 8 cycles.
If a later rescheduling pass moves the last orr, it can even go
down to 7.
git-svn-id: trunk@21363 -
Optimize ARM OP_MUL/OP_IMUL for x*ispowerof2(const+1) cases
Calculations like a*7 can be optimized to a*8-a using RSB with a
left-shifted operand, which can be done in a single cycle.
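The arithmetic behind this can be checked with a small Python sketch. The helper names `rsb_shift_for` and `mul_via_rsb` are invented for illustration and are not the compiler's routines. The sketch only models the case where const+1 is a power of two and emulates 32-bit wraparound.

```python
def rsb_shift_for(const):
    """Return n if const == 2**n - 1 with n >= 2, else None."""
    candidate = const + 1
    if const >= 3 and candidate & (candidate - 1) == 0:
        return candidate.bit_length() - 1
    return None

def mul_via_rsb(a, const):
    """Emulate 'rsb rd, ra, ra, lsl #n', which computes (ra << n) - ra."""
    n = rsb_shift_for(const)
    if n is None:
        return None  # not of the form 2**n - 1; this sketch gives up
    return ((a << n) - a) & 0xFFFFFFFF
```

For a*7 the shift amount is 3, so the multiplication corresponds to a single `rsb rd, ra, ra, lsl #3`.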
git-svn-id: trunk@21351 -
Improve ARM-Peephole Optimizers
1.) Introduce an ARM-specific RegUsedAfterInstruction which analyzes
instructions and reg allocation information to see if a register is
really needed afterwards to decide if some special optimizations can be
done.
2.) Introduce "RemoveSuperfluousMove"
This tries to fold mov into a previous Data-Instruction (ADD, ORR, etc)
or LDR-Instruction.
3.) Introduce new Optimizer "DataMov2Data" and modify LdrMov2Ldr to use
RemoveSuperfluousMove
4.) Expand Ldr* and Str* Optimizers to also work on {Ldr,Str}{,b,h}
git-svn-id: trunk@21314 -
Inline a couple of small functions of the ARM-Compiler
These small changes improved overall compile times of the fpc suite by
about 2-3% running on a 1.2GHz Kirkwood.
git-svn-id: trunk@21312 -
future use by high level code generator targets
o this in turn required that all a_load*_loc* methods are called via
hlcg rather than via cg, since a location can be a subsetref/reg
and those are no longer handled in tcg
o that then required moving several force_location_* routines into
thlcg because they use a_load_loc*, but did not take tdef size
parameters (which are required by the thlcg a_load_loc* routines)
o the only practical consequence is that from now on, you have to
use hlcg.location_force_mem/reg() (fpureg not yet) and
hlcg.gen_load_loc_cgpara() instead of the removed versions from ncgutil,
and hlcg.a_load*loc*() instead of cg.a_load*loc* if a subsetref/reg
might be involved
git-svn-id: trunk@21287 -
* Introduce MatchInstruction and MatchOperand
MatchInstruction allows matching an instruction by condition and
oppostfix. MatchOperand checks if an operand is a register and matches
another operand. In the future this could be overloaded with other
versions not only accepting TRegister.
* Optimize cmp,moveq,movne sequence on ARM
This patch implements a peephole optimizer for the following sequence:
cmp reg,const1
movne reg,const2
moveq reg,const1
* Small improvements to the ARM peephole optimizer
Most instructions in the ARM ISA have taicpu(p).oper[0]^.typ = top_reg
as the only option, so there is no need to check for it if we're
looking at those instructions.
* Remove redundant mov instructions on ARM
This is an addition to the ARM PeepHole Optimizer.
It folds code like this:
mov reg1, reg2
add reg1, reg1, (const|reg)
git-svn-id: trunk@21024 -
o support for the new codepage-aware ansistrings in the jvm branch
o empty ansistrings are now always represented by a nil pointer rather than
by an empty string, because an empty string also has a code page which
can confuse code (although this will make ansistrings harder to use
in Java code)
o more string helpers code shared between the general and jvm rtl
o support for indexbyte/word in the jvm rtl (warning: first parameter
is an open array rather than an untyped parameter there, so
indexchar(pcharvar^,10,0) will be equivalent to
indexchar([pcharvar^],10,0) there, which is different from what is
intended; changing it to an untyped parameter wouldn't help though)
o default() support is not yet complete
o calling fpcres is currently broken due to limitations in
sysutils.executeprocess() regarding handling unix quoting and
the compiler using the same command lines for scripts and directly
calling external programs
o compiling the Java compiler currently requires adding ALLOW_WARNINGS=1
to the make command line
git-svn-id: branches/jvmbackend@20887 -
of the AVR-specific ifdef'ed variant
o since the only special character we use in mangled names on all platforms
is $, added a new field to tasminfo called "dollarsign" that holds the
character $'s should be replaced with (if it doesn't have to be replaced,
leave it at $)
git-svn-id: trunk@20801 -
always points to the previous r7 on the stack (with the saved return
address coming right after it) so that the debugger and crashreporter
can use it for backtraces as specified in the ABI
o changed NR_FRAME_POINTER_REG and RS_FRAME_POINTER_REG from a symbolic
into a typed constant, and added a new method to tprocinfo that can
be used to initialize it (so it can be inited to r7/r11 depending on
the target platform)
* allow using r9 on Darwin, it was only used by the system on iOS up to
2.x, which we no longer support
* prefer using r9 and r12 before r4..r11 on Darwin, because they are
volatile and hence do not have to be saved
git-svn-id: trunk@20661 -
o new eabihf (hard float) abi
o vfpv3_d16 variant of VFP (default variant used by EABI assemblers: VFPv3
with only 16 double registers instead of 32) and pass it to GNU as
o make the odd numbered single precision floating point VFP registers
available for explicit allocation for use by the calling convention
* fixed copy/paste error in stdname of S30 register
-> use -dFPC_ARMHF to create an ARM eabi hard float compiler
(mantis #21554)
git-svn-id: trunk@20660 -