mirror of
https://gitlab.com/freepascal.org/fpc/source.git
synced 2025-11-19 16:19:38 +01:00
Reorder unaligned load sequence on ARM

The old version produced code like this:

    ldrb rDEST, [rBASE]
    ldrb rTemp, [rBASE, #1]
    orr  rDEST, rDEST, rTemp, lsl #8     (2 stall cycles)
    ldrb rTemp, [rBASE, #2]
    orr  rDEST, rDEST, rTemp, lsl #16    (2 stall cycles)
    ldrb rTemp, [rBASE, #3]
    orr  rDEST, rDEST, rTemp, lsl #24    (2 stall cycles)

This creates a lot of stall cycles on ARM implementations with load delay slots, such as Marvell Kirkwood or Intel XScale. With the usual penalty of up to 2 stall cycles per load, this code requires a total of 13 cycles (7 instructions + 6 stall cycles) in the best case.

The new code uses a second temp register to avoid the stall cycles:

    ldrb rDEST, [rBASE]
    ldrb rTemp1, [rBASE, #1]
    ldrb rTemp2, [rBASE, #2]
    orr  rDEST, rDEST, rTemp1, lsl #8
    ldrb rTemp1, [rBASE, #3]
    orr  rDEST, rDEST, rTemp2, lsl #16
    orr  rDEST, rDEST, rTemp1, lsl #24   (1 stall cycle)

The rescheduling and the second register bring the total down to 8 cycles. If a later scheduling pass manages to move another instruction in front of the last orr, the total can even drop to 7.

git-svn-id: trunk@21363 -

| | | |
|---|---|---|
| aasmcpu.pas | ||
| agarmgas.pas | ||
| aoptcpu.pas | ||
| aoptcpub.pas | ||
| aoptcpuc.pas | ||
| aoptcpud.pas | ||
| armatt.inc | ||
| armatts.inc | ||
| armins.dat | ||
| armnop.inc | ||
| armop.inc | ||
| armreg.dat | ||
| armtab.inc | ||
| cgcpu.pas | ||
| cpubase.pas | ||
| cpuinfo.pas | ||
| cpunode.pas | ||
| cpupara.pas | ||
| cpupi.pas | ||
| cputarg.pas | ||
| hlcgcpu.pas | ||
| itcpugas.pas | ||
| narmadd.pas | ||
| narmcal.pas | ||
| narmcnv.pas | ||
| narmcon.pas | ||
| narminl.pas | ||
| narmmat.pas | ||
| narmset.pas | ||
| pp.lpi.template | ||
| raarm.pas | ||
| raarmgas.pas | ||
| rarmcon.inc | ||
| rarmdwa.inc | ||
| rarmnor.inc | ||
| rarmnum.inc | ||
| rarmrni.inc | ||
| rarmsri.inc | ||
| rarmsta.inc | ||
| rarmstd.inc | ||
| rarmsup.inc | ||
| rgcpu.pas | ||
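
The commit message at the top of this listing describes two instruction orderings for the same unaligned 32-bit load. As a rough illustration (not the compiler's actual code, which lives in the Pascal sources above), the sketch below models both orderings in Python and checks that they assemble the same little-endian value. The helper names `ldrb`, `orr_lsl`, `load_old` and `load_new` are hypothetical and exist only for this example; the model covers the arithmetic, not the pipeline timing.

```python
def ldrb(mem: bytes, addr: int) -> int:
    """Model of ARM `ldrb`: load one byte, zero-extended into a register."""
    return mem[addr]


def orr_lsl(dest: int, src: int, shift: int) -> int:
    """Model of `orr rd, rd, rs, lsl #shift` on a 32-bit register."""
    return (dest | (src << shift)) & 0xFFFFFFFF


def load_old(mem: bytes, base: int) -> int:
    """Old ordering: one temp register, each orr directly follows its load."""
    dest = ldrb(mem, base)
    temp = ldrb(mem, base + 1)
    dest = orr_lsl(dest, temp, 8)
    temp = ldrb(mem, base + 2)
    dest = orr_lsl(dest, temp, 16)
    temp = ldrb(mem, base + 3)
    dest = orr_lsl(dest, temp, 24)
    return dest


def load_new(mem: bytes, base: int) -> int:
    """New ordering: two temp registers, loads hoisted ahead of their uses."""
    dest = ldrb(mem, base)
    temp1 = ldrb(mem, base + 1)
    temp2 = ldrb(mem, base + 2)
    dest = orr_lsl(dest, temp1, 8)
    temp1 = ldrb(mem, base + 3)  # temp1 is free again after the first orr
    dest = orr_lsl(dest, temp2, 16)
    dest = orr_lsl(dest, temp1, 24)
    return dest


# Four payload bytes at an odd (unaligned) offset inside a small buffer.
memory = bytes([0xAA, 0x78, 0x56, 0x34, 0x12, 0xBB])
old_value = load_old(memory, 1)
new_value = load_new(memory, 1)
```

Both functions return the same value as `int.from_bytes(memory[1:5], "little")`, so the reordering changes only when each load is issued relative to its use, not the value that is loaded.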