The linker will always change BL to BLX if necessary, but not vice versa (linker version dependent).
"BLX label" ALWAYS changes the instruction set. It changes a processor in ARM state to Thumb state,
or a processor in Thumb state to ARM state.
git-svn-id: trunk@36086 -
Move ThumbFunc flag from section to symbol.
Make .w forms optional the other way around. If .w is explicitly put on an instruction the assembler should always chose a wide form.
git-svn-id: branches/laksen/armiw@29341 -
Switched codegeneration of VFPv2 and VFPv3 to use UAL mnemonics and syntax.
Updated VFP code in RTL to use UAL syntax too.
Added preliminary ELF support for ARM.
Added support for linking of WinCE COFF files. Should work for with a standard ARMv4-I target.
git-svn-id: branches/laksen/armiw@29247 -
- memory barriers are only needed on armv6 and up
- DMB on ARMv6 is "mcr 15, 0, r0, cr7, cr10, {5}", not "mcr 15, 0, r0, cr7, cr10, {4}"
- improve write barrier on armv7 by using "dmb st" instead of "dmb sy"
todo: The use of the correct barrier code should be determined during runtime.
git-svn-id: trunk@22867 -
fpc_ansistr_incr_ref for Darwin/ARM: they don't follow the Darwin/ARM
ABI for function calls, the code already contains enough ifdefs and
I don't want to spend time on maintaining OS-specific assembler
implementations
git-svn-id: trunk@22121 -
An LDR will have two load latency cycles on most ARM implementations,
moving the
mov r4, r0
two instructions away from the corresponding ldr will avoid the stalls.
git-svn-id: trunk@22107 -
+ CPUARM_HAS_BX is defined if the CPU supports the BX* instruction
+ CPUARM_HAS_REV is defined if the CPU supports the REV instruction. Note that you still have to check for compiler versions > 2.6.0 since the assembler reader of 2.6.0 does not understand that instruction.
+ CPUARM_HAS_IDIV is defined if the CPU supports the sdiv, udiv instructions. Use of this fixes a bug where previously these instruction were only used for armv7-m, while cortex3m cpus also support it.
+ CPUARM_HAS_LDREX is defined if the CPU supports the ldrex/strex instructions. Use of this fixes a bug with armv7(-a) cpus where this path has not been used.
+ SYSTEM_HAS_KUSER_CMPXCHG is defined if the system (mainly OS) support the kuser_cmpxchg functions. Use of this fixes a bug where ARMHF systems did not use it for synchronization (although ARMHF is armv7+ only, i.e. the LDREX path is used anyway)
git-svn-id: trunk@22081 -
Instead of directly using "swp" in InterlockedExchange, use
- kuser_cmpxchg if available (on Linux/armel)
- the fpc global mutex (fpc_system_lock) otherwise
to implement it.
git-svn-id: trunk@22062 -
Optimized to minimize load latency and icache usage. Together with the
previous fpc_ansistr_decr_ref optimization this little test programm
runs about 40% faster.
program stringspeed;
procedure test(s:string);
begin
end;
var
s:string;
i: cardinal;
begin
s:='abcd';
for i:=0 to $FFFFFF do
test(s);
end.
Even with s:='' it's about 30% faster.
git-svn-id: trunk@22035 -
As fpc_ansistr_decr_ref is a very often called procedure in typical
pascal programs this optimized version will shave off some cycles
compared to the generic one.
It tries to avoid load latencies as much as possible and also uses the
new Z-flag functionality of the InterlockedDecrement from the previous
patch. Also FreeMem is called as a tail-function.
git-svn-id: trunk@22034 -
Use movs instead of mov when setting the result in r0. This way the Z
flag will be set for the calling function which might allow some smaller
optimizations later on. It does not affect current code in any way,
because flags are not expected to be used across function calls.
git-svn-id: trunk@22033 -
We're currently using rev for armv6+, but FPC 2.6 could not handle the
instruction. So if somebody wants to build trunk it can't be for armv6+.
We'll circumvent the problem by always using the the generic code when
build with FPC 2.6.
git-svn-id: trunk@22003 -
This adds some small improvements to Move_pld and Move_blended.
1.) Overlapping memory is handled as "unusual" and the code is placed at
the end of the function for better icache/bpu performance
2.) Fused the overlap check into 3 instructions with a single jump
instead of 5 instructions with 2 jumps.
2.) Use ldmia/stmia with 2 registers instead of ldr/str for faster
copying.
3.) Some code cleanup
git-svn-id: trunk@21992 -
The new version is more optimized to the "common case"
We assume most of the data will be aligned, thats why the unaligned
case has been moved to the end of the function so the aligned case is
more cache- and pipeline friendly.
I've also reduced the loop unrolling for the block transfer loop,
because for large blocks we'll most likely hit the write buffer limit
anyway.
I've did some measurements. The new routine is a bit slower for less
than 8 bytes, but beats the old one by 10-15% with 8 bytes++
git-svn-id: trunk@21760 -
get_caller_frame, get_caller_addr and dump_stack
with default NIL value to systemh.inc.
+ Added new get_addr function.
system.inc: Use get_addr and get_frame to call
HandleErrorAddrFrame instead of HandleErrorFrame
in several error functions.
Modify dump_stack to use frame and addr parameters.
Provide a dummy get_addr function returning nil.
i386/i386.inc, x86_64./x86_64.inc: Provide real
implementation of get_addr function.
git-svn-id: trunk@21697 -