Reorder unaligned Load sequence on ARM
The old version produced code like this:
ldrb rDEST, [rBASE]
ldrb rTEMP, [rBASE, #1]
orr rDEST, rDEST, rTEMP, lsl #8 (2 stall cycles)
ldrb rTEMP, [rBASE, #2]
orr rDEST, rDEST, rTEMP, lsl #16 (2 stall cycles)
ldrb rTEMP, [rBASE, #3]
orr rDEST, rDEST, rTEMP, lsl #24 (2 stall cycles)
This creates a lot of stall cycles on ARM implementations with load
delay slots like Marvell Kirkwood or Intel XScale. With the usual stall of
up to 2 cycles per load, this code requires a total of 13 cycles
(7 instructions + 6 stall cycles) in the best case.
The new code uses a second temp register to avoid the stall cycles.
ldrb rDEST, [rBASE]
ldrb rTEMP1, [rBASE, #1]
ldrb rTEMP2, [rBASE, #2]
orr rDEST, rDEST, rTEMP1, lsl #8
ldrb rTEMP1, [rBASE, #3]
orr rDEST, rDEST, rTEMP2, lsl #16
orr rDEST, rDEST, rTEMP1, lsl #24 (1 stall cycle)
The rescheduling and the second register bring the total down to 8 cycles.
If a later rescheduling pass fills the remaining stall before the last orr,
the total can even go down to 7.
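
As a sanity check on the reordering, here is a minimal Python sketch (a toy
model, not compiler code) that mirrors both instruction orders and shows they
assemble the same little-endian 32-bit value. The function names and the
sample buffer are made up for illustration; stall cycles are not modelled.

```python
def load_u32_old(mem, base):
    """Mirror the old sequence: every ldrb is followed directly by its orr."""
    dest = mem[base]            # ldrb rDEST, [rBASE]
    temp = mem[base + 1]        # ldrb rTEMP, [rBASE, #1]
    dest |= temp << 8           # orr  rDEST, rDEST, rTEMP, lsl #8
    temp = mem[base + 2]        # ldrb rTEMP, [rBASE, #2]
    dest |= temp << 16          # orr  rDEST, rDEST, rTEMP, lsl #16
    temp = mem[base + 3]        # ldrb rTEMP, [rBASE, #3]
    dest |= temp << 24          # orr  rDEST, rDEST, rTEMP, lsl #24
    return dest


def load_u32_new(mem, base):
    """Mirror the new sequence: loads are issued ahead of the orr using them."""
    dest = mem[base]            # ldrb rDEST, [rBASE]
    temp1 = mem[base + 1]       # ldrb rTEMP1, [rBASE, #1]
    temp2 = mem[base + 2]       # ldrb rTEMP2, [rBASE, #2]
    dest |= temp1 << 8          # orr  rDEST, rDEST, rTEMP1, lsl #8
    temp1 = mem[base + 3]       # ldrb rTEMP1, [rBASE, #3]
    dest |= temp2 << 16         # orr  rDEST, rDEST, rTEMP2, lsl #16
    dest |= temp1 << 24         # orr  rDEST, rDEST, rTEMP1, lsl #24
    return dest


# Hypothetical buffer; the word starts at the unaligned offset 1.
mem = bytes([0xAA, 0x78, 0x56, 0x34, 0x12, 0xBB])
assert load_u32_old(mem, 1) == load_u32_new(mem, 1) == 0x12345678
```

The orr operations touch disjoint byte lanes, so their order does not affect
the result; only the distance between each load and its first use changes.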
git-svn-id: trunk@21363 -
* symtable.pas:
+ add function "get_generic_in_hierarchy_by_name" which returns a def with the name of the symbol in the given object/record hierarchy (useful only in non-Delphi modes)
+ add function "return_specialization_of_generic" which returns the specialization def of a given class inside the given object/record hierarchy
* pexpr.pas, factor, factor_read_id: instead of checking whether the names of the found symbol and the current_structdef are equal, check whether the generic appears in the hierarchy of current_structdef
* ptype.pas:
* id_type: if the found symbol is a generic dummy and we are currently parsing a generic, return the correct def of the generic instead of the dummy one
* single_type: when using the generic type without type parameters, the def must resolve to the specialized def when specializing the class, instead of the generic def which the dummy symbol points to
* read_named_type, expr_type: like in "single_type" we need to resolve the use of the parameterless type name to the correct specialization def instead of the generic def
* pdecobj.pas, object_dec: also set the typesym of the current_structdef, as otherwise some assumptions about generics made by the above-mentioned changes are no longer valid (like the def, the typesym is unset again afterwards)
+ add tests for both bug reports (the one for 19499 is slightly modified so that it does not contain any errors)
git-svn-id: trunk@21361 -
contents of a procvar had to be loaded in case of a procedure of object
or nested procvar rather than only the code address (harmless, because
this code is only active for low level targets currently and since
r21330 the location's size was used because the source and destination
types were the same)
git-svn-id: trunk@21352 -
Optimize ARM OP_MUL/OP_IMUL for x*ispowerof2(const+1) cases
Calculations like a*7 can be rewritten as a*8-a using RSB with a
left-shifted operand, which can be done in a single cycle.
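
The identity behind this can be sketched in Python (a toy model of the
arithmetic, not the code generator itself). The helper names and the 32-bit
mask are assumptions for illustration; the exact conditions the compiler
checks are not shown in this message.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit register wrap-around


def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def mul_by_pow2_minus_1(a, const):
    """Compute a*const as (a << k) - a, where const + 1 == 2**k."""
    if not is_power_of_two(const + 1):
        raise ValueError("const + 1 must be a power of two")
    shift = (const + 1).bit_length() - 1
    # Corresponds to: rsb rd, ra, ra, lsl #shift   (rd = (ra << shift) - ra)
    return ((a << shift) - a) & MASK32


assert mul_by_pow2_minus_1(5, 7) == 35                      # 5*8 - 5
assert mul_by_pow2_minus_1(3, 15) == 45                     # 3*16 - 3
assert mul_by_pow2_minus_1(0xFFFFFFFF, 7) == (0xFFFFFFFF * 7) & MASK32
```

The last assertion shows that the rewrite also agrees with a plain multiply
when the result wraps around modulo 2**32.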
git-svn-id: trunk@21351 -
rather than with single/double quotes depending on the target platform
(ld only supports double quotes), and rather than only quoting when
necessary (wastes time since quotes are always allowed, and double
quotes inside a directory name cannot be escaped for ld; they are
simply not supported by the program) (mantis #22059, follow-up to
r21069 and r21208)
git-svn-id: trunk@21343 -
target the AIX linker
* never quote file names added to link.res when it's not a linker script
(only newline is a separator in that case)
git-svn-id: trunk@21342 -
+ implement trashing of local variables if subroutine is inlined
* fix some errors related to interprocedural gotos and inlining
+ node_count function
* inline cannot be used with iochecking and safecall calling conventions
* track inherited usage
* don't inline if inherited is used
git-svn-id: trunk@21335 -
method's visibility, because this is common practice in Object-Pascal.
It can be re-enabled with {$warn INTF_RAISE_VISIBILITY on} (mainly
useful when trying to keep code compatible with the JVM target)
git-svn-id: trunk@21329 -
constants on Darwin, because its linker uses global symbols as delimiters
of subsections for dead code stripping. This was previously worked around by
never making any ansistring constants smart linkable, a restriction that is
no longer needed
git-svn-id: trunk@21328 -
* added some new minor deprecations (mostly unused constants that are
potentially not cross-platform, and a shortstring version of a function)
git-svn-id: trunk@21321 -
* modified fpmake files in packages to allow a "make all" for NativeNT in packages to work (nearly all packages need to be disabled; the main cause is the still-missing DOS and Objects units for the target)
git-svn-id: trunk@21319 -
Improve ARM-Peephole Optimizers
1.) Introduce an ARM-specific RegUsedAfterInstruction which analyzes
instructions and reg allocation information to see if a register is
really needed afterwards to decide if some special optimizations can be
done.
2.) Introduce "RemoveSuperfluousMove"
This tries to fold mov into a previous Data-Instruction (ADD, ORR, etc)
or LDR-Instruction.
3.) Introduce new Optimizer "DataMov2Data" and modify LdrMov2Ldr to use
RemoveSuperfluousMove
4.) Expand Ldr* and Str* Optimizers to also work on {Ldr,Str}{,b,h}
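
The idea behind points 1 and 2 can be illustrated with a small Python toy
model. This is not the compiler's implementation: the tuple-based instruction
format and the functions reg_used_after and remove_superfluous_moves are
hypothetical stand-ins, and the sketch assumes straight-line code in which a
register that is not read later in the list is dead.

```python
# Each instruction is (opcode, destination register, tuple of source registers).
DATA_OPS = {"add", "sub", "orr", "and", "eor", "ldr"}


def reg_used_after(code, index, reg):
    """True if reg is read after code[index] before being overwritten."""
    for _op, dest, srcs in code[index + 1:]:
        if reg in srcs:
            return True
        if dest == reg:
            return False
    return False  # assumption: registers are dead at the end of the list


def remove_superfluous_moves(code):
    """Fold 'op rX, ...; mov rY, rX' into 'op rY, ...' when rX is dead."""
    out = list(code)
    i = 0
    while i + 1 < len(out):
        op, dest, srcs = out[i]
        next_op, next_dest, next_srcs = out[i + 1]
        if (op in DATA_OPS and next_op == "mov" and next_srcs == (dest,)
                and not reg_used_after(out, i + 1, dest)):
            out[i] = (op, next_dest, srcs)  # write the result where mov put it
            del out[i + 1]                  # the mov is now superfluous
        else:
            i += 1
    return out


foldable = [("add", "r0", ("r1", "r2")),
            ("mov", "r3", ("r0",)),
            ("orr", "r4", ("r3", "r1"))]
assert remove_superfluous_moves(foldable) == [("add", "r3", ("r1", "r2")),
                                              ("orr", "r4", ("r3", "r1"))]
```

If r0 were still read by a later instruction, the liveness check would fail
and the mov would be kept, which is why the register-usage analysis is needed
before the fold.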
git-svn-id: trunk@21314 -
Inline a couple of small functions of the ARM-Compiler
These small changes improved overall compile times of the fpc suite by
about 2-3% running on a 1.2GHz Kirkwood.
git-svn-id: trunk@21312 -