Move ThumbFunc flag from section to symbol.
Make .w forms optional the other way around. If .w is explicitly put on an instruction the assembler should always chose a wide form.
git-svn-id: branches/laksen/armiw@29341 -
Switched codegeneration of VFPv2 and VFPv3 to use UAL mnemonics and syntax.
Updated VFP code in RTL to use UAL syntax too.
Added preliminary ELF support for ARM.
Added support for linking of WinCE COFF files. Should work for with a standard ARMv4-I target.
git-svn-id: branches/laksen/armiw@29247 -
* Changed type of softfloat_exception_mask and softfloat_exception_flags to TFPUExceptionMask, softfloat_rounding_mode to TFPURoundingMode.
- Cleaned out numerous conversions happening when getting/setting exception mask and rounding mode.
git-svn-id: trunk@27215 -
- memory barriers are only needed on armv6 and up
- DMB on ARMv6 is "mcr 15, 0, r0, cr7, cr10, {5}", not "mcr 15, 0, r0, cr7, cr10, {4}"
- improve write barrier on armv7 by using "dmb st" instead of "dmb sy"
todo: The use of the correct barrier code should be determined during runtime.
git-svn-id: trunk@22867 -
Add simple Mul+Sub/Mul+Add into MLS/MLA optimizations
Fix some other small issues in the optimizer
Implement Interlocked* functions with proper use of LDREX/STREX
git-svn-id: branches/laksen/arm-embedded@22801 -
fpc_ansistr_incr_ref for Darwin/ARM: they don't follow the Darwin/ARM
ABI for function calls, the code already contains enough ifdefs and
I don't want to spend time on maintaining OS-specific assembler
implementations
git-svn-id: trunk@22121 -
An LDR will have two load latency cycles on most ARM implementations,
moving the
mov r4, r0
two instructions away from the corresponding ldr will avoid the stalls.
git-svn-id: trunk@22107 -
+ CPUARM_HAS_BX is defined if the CPU supports the BX* instruction
+ CPUARM_HAS_REV is defined if the CPU supports the REV instruction. Note that you still have to check for compiler versions > 2.6.0 since the assembler reader of 2.6.0 does not understand that instruction.
+ CPUARM_HAS_IDIV is defined if the CPU supports the sdiv, udiv instructions. Use of this fixes a bug where previously these instruction were only used for armv7-m, while cortex3m cpus also support it.
+ CPUARM_HAS_LDREX is defined if the CPU supports the ldrex/strex instructions. Use of this fixes a bug with armv7(-a) cpus where this path has not been used.
+ SYSTEM_HAS_KUSER_CMPXCHG is defined if the system (mainly OS) support the kuser_cmpxchg functions. Use of this fixes a bug where ARMHF systems did not use it for synchronization (although ARMHF is armv7+ only, i.e. the LDREX path is used anyway)
git-svn-id: trunk@22081 -
Instead of directly using "swp" in InterlockedExchange, use
- kuser_cmpxchg if available (on Linux/armel)
- the fpc global mutex (fpc_system_lock) otherwise
to implement it.
git-svn-id: trunk@22062 -
Optimized to minimize load latency and icache usage. Together with the
previous fpc_ansistr_decr_ref optimization this little test programm
runs about 40% faster.
program stringspeed;
procedure test(s:string);
begin
end;
var
s:string;
i: cardinal;
begin
s:='abcd';
for i:=0 to $FFFFFF do
test(s);
end.
Even with s:='' it's about 30% faster.
git-svn-id: trunk@22035 -
As fpc_ansistr_decr_ref is a very often called procedure in typical
pascal programs this optimized version will shave off some cycles
compared to the generic one.
It tries to avoid load latencies as much as possible and also uses the
new Z-flag functionality of the InterlockedDecrement from the previous
patch. Also FreeMem is called as a tail-function.
git-svn-id: trunk@22034 -
Use movs instead of mov when setting the result in r0. This way the Z
flag will be set for the calling function which might allow some smaller
optimizations later on. It does not affect current code in any way,
because flags are not expected to be used across function calls.
git-svn-id: trunk@22033 -
We're currently using rev for armv6+, but FPC 2.6 could not handle the
instruction. So if somebody wants to build trunk it can't be for armv6+.
We'll circumvent the problem by always using the the generic code when
build with FPC 2.6.
git-svn-id: trunk@22003 -
This adds some small improvements to Move_pld and Move_blended.
1.) Overlapping memory is handled as "unusual" and the code is placed at
the end of the function for better icache/bpu performance
2.) Fused the overlap check into 3 instructions with a single jump
instead of 5 instructions with 2 jumps.
2.) Use ldmia/stmia with 2 registers instead of ldr/str for faster
copying.
3.) Some code cleanup
git-svn-id: trunk@21992 -
This corrects the handling of exception masks and ARM VFP
implementations. The old code enable the exception when it was present
in the mask. So in fact it did the contrary of what it was supposed to
do.
VFP-Support is currently broken, this patch at least allows to build a
working VFP-native compiler. But the full build still breaks because of
some compiler options not properly beeing passed down to packages/ which
results in:
"Trying to use a unit which was compiled with a different FPU mode"
because somehow OPT="-Cfvfpv2" did not get passed down.
git-svn-id: trunk@21952 -