Use "nostackframe" for:
- Sptr (broken without nostackframe)
- get_caller_addr
- get_caller_frame
Use cmp+ldrne instead of movs+beq+ldr, its a bit more pipeline-friendly
and takes burden of the BPU.
git-svn-id: trunk@20506 -
The following functions where changed to make use of the kernel helper
kuser_cmpxchg:
InterLockedDecrement
InterLockedIncrement
InterLockedExchangeAdd
InterLockedCompareExchange
The previous implementation using a spinlock had a couple of drawbacks:
1.) The functions could not be used safely on values not completly managed
by the process itself, because the spinlock did not protect data but the
functions. For example, think about two processes using shared memory.
They would not be able to share fpc_system_lock, making it unsafe to use
these functions.
2.) With many active threads, there was a high chance that the scheduler
would interrupt a thread while fpc_system_lock was taken, which would
result in the following threads using one of these functions to spinlock till
the end of its timeslice. This could result in unwanted and unnecessary
latencies.
3.) Every function contained a pointer to fpc_system_lock. Resulting in
two polluted DCache-Lines per call and possible latencies through dcache
misses.
The new implementation only works on Linux Kernel >= 2.6.16
The functions are implemented in a way which tries to minimize cache pollution
and load latencies.
Even without Multithreading the new functions are a lot faster. I've did
comparisons on my Kirkwood 1.2GHz with the following template code:
var X: longint;
begin
X := 0;
while X < longint(100*1000000) do
FUNCTION(X);
Writeln(X);
end.
Function New Old
InterLockedIncrement: 0m3.696s 0m23.220s
InterLockedExchangeAdd: 0m4.034s 0m23.242s
InterLockedCompareExchange: 0m4.703s 0m24.006s
This speedup is most probably because of the reduced memory access,
which resulted in lots of cache misses.
git-svn-id: trunk@20491 -
* generate add.w instead of add for thumb-2 in case one of the registers
is > r8
* add register interferences for the "add" instruction so the register
allocator can detect invalid instruction forms (even for assembler code)
* fixed error in thumb2.inc detected by the previous change
git-svn-id: trunk@16633 -
and above, so this also works when calling thumb code (should actually
also be done for ARMv5T, but we don't have a monicker for that yet)
* use BX instead of "mov r15, r14" for simple returns from subroutines
on ARMv6+ to support returning to thumb code from ARM code (idem)
git-svn-id: trunk@14332 -
+ RTL support:
o VFP exceptions are disabled by default on Darwin,
because they cause kernel panics on iPhoneOS 2.2.1 at least
o all denormals are truncated to 0 on Darwin, because disabling
that also causes kernel panics on iPhoneOS 2.2.1 (probably
because otherwise denormals can also cause exceptions)
* set softfloat rounding mode correctly for non-wince/darwin/vfp
targets
+ compiler support: only half the number of single precision
registers is available due to limitations of the register
allocator
+ added a number of comments about why the stackframe on ARM is
set up the way it is by the compiler
+ added regtype and subregtype info to regsets, because they're
also used for VFP registers (+ support in assembler reader)
+ various generic support routines for dealing with floating point
values located in integer registers that have to be transferred to
mm registers (needed for VFP)
* renamed use_sse() to use_vectorfpu() and also use it for
ARM/vfp support
o only superficially tested for Linux (compiler compiled with -Cpvfpv6
-Cfvfpv2 works on a Cortex-A8, no testsuite run performed -- at least
the fpu exception handler still needs to be implemented), Darwin has
been tested more thoroughly
+ added ARMv6 cpu type and made it default for Darwin/ARM
+ ARMv6+ implementations of atomic operations using ldrex/strex
* don't use r9 on Darwin/ARM, as it's reserved under certain
circumstances (don't know yet which ones)
* changed C-test object files for ARM/Darwin to ARMv6 versions
* check in assembler reader that regsets are not empty, because
instructions with a regset operand have undefined behaviour in that
case
* fixed resultdef of tarmtypeconvnode.first_int_to_real in case of
int64->single type conversion
* fixed constant pool locations in case 64 bit constants are generated,
and/or when vfp instructions with limited reach are present
WARNING: when using VFP on an ARMv6 or later cpu, you *must* compile all
code with -Cparmv6 (or higher), or you will get crashes. The reason is
that storing/restoring multiple VFP registers must happen using
different instructions on pre/post-ARMv6.
git-svn-id: trunk@14317 -
-- Zusammenführen der Unterschiede zwischen Projektarchiv-URLs in ».«:
U rtl/arm/setjump.inc
A rtl/arm/thumb2.inc
U rtl/arm/divide.inc
A rtl/embedded/arm/stm32f103.pp
U rtl/inc/system.inc
U compiler/alpha/cgcpu.pas
U compiler/sparc/cgcpu.pas
U compiler/i386/cgcpu.pas
U compiler/ncgld.pas
U compiler/powerpc/cgcpu.pas
U compiler/avr/cgcpu.pas
U compiler/aggas.pas
U compiler/powerpc64/cgcpu.pas
U compiler/x86_64/cgcpu.pas
U compiler/cgobj.pas
U compiler/psystem.pas
U compiler/aasmtai.pas
U compiler/m68k/cgcpu.pas
U compiler/ncgutil.pas
U compiler/rautils.pas
U compiler/arm/raarmgas.pas
U compiler/arm/armatts.inc
U compiler/arm/cgcpu.pas
U compiler/arm/armins.dat
U compiler/arm/rgcpu.pas
U compiler/arm/cpubase.pas
U compiler/arm/agarmgas.pas
U compiler/arm/cpuinfo.pas
U compiler/arm/armop.inc
U compiler/arm/narmadd.pas
U compiler/arm/aoptcpu.pas
U compiler/arm/armatt.inc
U compiler/arm/aasmcpu.pas
U compiler/systems/t_embed.pas
U compiler/psub.pas
U compiler/options.pas
git-svn-id: trunk@13801 -
multi-platform version of patch in r12461, which caused the i386 version
of fpc_pchar_length to return 0 in all cases, which used tabs, and did
not include a test case)
git-svn-id: trunk@12464 -
+ int_str assembler implementations for i386
+ fpc_shortstr_to_shortstr assembler implementation for ARM
+ fpc_shortstr_assign assembler implementation for ARM
+ fpc_Pchar_length assembler implementation for ARM
git-svn-id: trunk@9582 -