paweld/fpc

mirror of https://gitlab.com/freepascal.org/fpc/source.git synced 2025-04-10 17:57:58 +02:00

Author	SHA1	Message	Date
Jeppe Johansen	3ee29eb219	Fixed ARMv7-EM code generation and RTL compilation Added LM4F120H5 controller type and startup code git-svn-id: branches/laksen/arm-embedded@22903 -	2012-11-01 17:25:01 +00:00
Jeppe Johansen	4e84431dde	Fix some optimizations which assume that there are 3 operands Add simple Mul+Sub/Mul+Add into MLS/MLA optimizations Fix some other small issues in the optimizer Implement Interlocked* functions with proper use of LDREX/STREX git-svn-id: branches/laksen/arm-embedded@22801 -	2012-10-21 16:20:52 +00:00
Jeppe Johansen	a8f9b0dac4	Added initial support for the Cortex-M4F FPv4_S16 FPU git-svn-id: branches/laksen/arm-embedded@22597 -	2012-10-08 20:10:45 +00:00
florian	1bb6248186	* disable hand optimized assembler for arm thumb2 as well git-svn-id: trunk@22313 -	2012-09-04 14:29:52 +00:00
florian	86a6cee8fa	- removed due to BSD license header git-svn-id: trunk@22286 -	2012-09-02 20:46:58 +00:00
Jonas Maebe	2dbe48a76c	* extra "addr" parameters for get_caller_addr/frame (patch by Jeppe Johansen, mantis #22727) git-svn-id: trunk@22252 -	2012-08-26 19:07:36 +00:00
masta	13e2572140	Remove unnecessary compiler version checks in rtl/arm/arm.inc The CPUARM_HAS_* flags are never defined in 2.6, so there is no reason to check for the compiler version. git-svn-id: trunk@22128 -	2012-08-19 15:51:44 +00:00
Jonas Maebe	c29e6bbcb8	* disabled assembler implementations of fpc_ansistr_decr_ref/ fpc_ansistr_incr_ref for Darwin/ARM: they don't follow the Darwin/ARM ABI for function calls, the code already contains enough ifdefs and I don't want to spend time on maintaining OS-specific assembler implementations git-svn-id: trunk@22121 -	2012-08-19 09:37:07 +00:00
florian	312984cb4f	* ifdef blx InterlockedExchange correctly git-svn-id: trunk@22117 -	2012-08-17 20:30:19 +00:00
masta	6729164fcc	Work around load latency in InterlockedExchange for ARM An LDR will have two load latency cycles on most ARM implementations, moving the mov r4, r0 two instructions away from the corresponding ldr will avoid the stalls. git-svn-id: trunk@22107 -	2012-08-17 12:42:49 +00:00
florian	e353222a8a	* if the selected cpu type supports pld, provide and use only the pld variant git-svn-id: trunk@22105 -	2012-08-17 10:37:36 +00:00
tom_at_work	38226169a9	Make use of "blx" instruction in fpc_ansistr_dec_ref conditional on CPUARM_HAS_BX, otherwise just use the "bl" instruction. Bug introduced in r22035. Fixes bug report 22632. git-svn-id: trunk@22102 -	2012-08-17 08:28:08 +00:00
florian	e6efbd36ad	* compiler defined cpuflags instead of creating them during system unit compilation git-svn-id: trunk@22091 -	2012-08-15 15:49:11 +00:00
tom_at_work	f252fd369e	Tried to reorganize the ARM define mess in rtl/arm/arm.inc. Instead of requiring to enumerate all possible ARM variants each time a CPU feature is used, add a define of the format CPUARM_HAS_XXX and use that. Note that a better solution would be to properly implement the compiler cpuinfo infrastructure, however that is much more work. + CPUARM_HAS_BX is defined if the CPU supports the BX* instruction + CPUARM_HAS_REV is defined if the CPU supports the REV instruction. Note that you still have to check for compiler versions > 2.6.0 since the assembler reader of 2.6.0 does not understand that instruction. + CPUARM_HAS_IDIV is defined if the CPU supports the sdiv, udiv instructions. Use of this fixes a bug where previously these instruction were only used for armv7-m, while cortex3m cpus also support it. + CPUARM_HAS_LDREX is defined if the CPU supports the ldrex/strex instructions. Use of this fixes a bug with armv7(-a) cpus where this path has not been used. + SYSTEM_HAS_KUSER_CMPXCHG is defined if the system (mainly OS) support the kuser_cmpxchg functions. Use of this fixes a bug where ARMHF systems did not use it for synchronization (although ARMHF is armv7+ only, i.e. the LDREX path is used anyway) git-svn-id: trunk@22081 -	2012-08-14 19:45:03 +00:00
tom_at_work	9a82fb9eb4	Fix InterlockedExchange for non-armv6+ ARMV processors. Original InterlockedExchange was not atomic in regards to the other Interlocked* functions, leading to crashes if they were used. Instead of directly using "swp" in InterlockedExchange, use - kuser_cmpxchg if available (on Linux/armel) - the fpc global mutex (fpc_system_lock) otherwise to implement it. git-svn-id: trunk@22062 -	2012-08-11 19:32:11 +00:00
florian	2fc350eabd	* the reference counter offset depends only on the current rtl, not the compiler version, so no ifdef needed git-svn-id: trunk@22038 -	2012-08-08 18:59:19 +00:00
masta	51af7bd440	Assembly version of fpc_ansistr_incr_ref for ARM Optimized to minimize load latency and icache usage. Together with the previous fpc_ansistr_decr_ref optimization this little test program runs about 40% faster. program stringspeed; procedure test(s:string); begin end; var s:string; i: cardinal; begin s:='abcd'; for i:=0 to $FFFFFF do test(s); end. Even with s:='' it's about 30% faster. git-svn-id: trunk@22035 -	2012-08-08 15:29:26 +00:00
masta	b9770519f8	Assembly version of fpc_ansistr_decr_ref for ARM As fpc_ansistr_decr_ref is a very often called procedure in typical pascal programs this optimized version will shave off some cycles compared to the generic one. It tries to avoid load latencies as much as possible and also uses the new Z-flag functionality of the InterlockedDecrement from the previous patch. Also FreeMem is called as a tail-function. git-svn-id: trunk@22034 -	2012-08-08 06:44:31 +00:00
masta	25e2f5f3fa	Small improvement to InterlockedExchange on ARM Use movs instead of mov when setting the result in r0. This way the Z flag will be set for the calling function which might allow some smaller optimizations later on. It does not affect current code in any way, because flags are not expected to be used across function calls. git-svn-id: trunk@22033 -	2012-08-08 06:44:26 +00:00
masta	e4a719fcff	Fix ARM SwapEndian on armv6+ for compilation with FPC 2.6 We're currently using rev for armv6+, but FPC 2.6 could not handle the instruction. So if somebody wants to build trunk it can't be for armv6+. We'll circumvent the problem by always using the generic code when building with FPC 2.6. git-svn-id: trunk@22003 -	2012-08-03 22:38:07 +00:00
florian	291157330e	* fix setjump for arm<=armv5 with vfp git-svn-id: trunk@22002 -	2012-08-03 22:04:22 +00:00
masta	2e0203b7a2	Improved Move implementation on ARM This adds some small improvements to Move_pld and Move_blended. 1.) Overlapping memory is handled as "unusual" and the code is placed at the end of the function for better icache/bpu performance 2.) Fused the overlap check into 3 instructions with a single jump instead of 5 instructions with 2 jumps. 3.) Use ldmia/stmia with 2 registers instead of ldr/str for faster copying. 4.) Some code cleanup git-svn-id: trunk@21992 -	2012-08-01 11:15:20 +00:00
masta	f354651180	Fix ARM FPU Exceptions for WinCE r21952 introduced wrong code (through copy&waste) for the wince exception-setup routines. This patch hopefully fixes the code again. git-svn-id: trunk@21961 -	2012-07-23 22:58:02 +00:00
masta	386738a7c3	Fix ARM FPU exception masks This corrects the handling of exception masks and ARM VFP implementations. The old code enabled the exception when it was present in the mask. So in fact it did the contrary of what it was supposed to do. VFP-Support is currently broken, this patch at least allows to build a working VFP-native compiler. But the full build still breaks because of some compiler options not properly being passed down to packages/ which results in: "Trying to use a unit which was compiled with a different FPU mode" because somehow OPT="-Cfvfpv2" did not get passed down. git-svn-id: trunk@21952 -	2012-07-23 07:26:57 +00:00
masta	64c122100f	Small optimizations to FillChar for ARM The new version is more optimized to the "common case" We assume most of the data will be aligned, that's why the unaligned case has been moved to the end of the function so the aligned case is more cache- and pipeline friendly. I've also reduced the loop unrolling for the block transfer loop, because for large blocks we'll most likely hit the write buffer limit anyway. I did some measurements. The new routine is a bit slower for less than 8 bytes, but beats the old one by 10-15% with 8 bytes++ git-svn-id: trunk@21760 -	2012-07-02 23:54:19 +00:00
pierre	8469741700	+ Added additional addr pointer parameter to get_caller_frame, get_caller_addr and dump_stack with default NIL value to systemh.inc. + Added new get_addr function. system.inc: Use get_addr and get_frame to call HandleErrorAddrFrame instead of HandleErrorFrame in several error functions. Modify dump_stack to use frame and addr parameters. Provide a dummy get_addr function returning nil. i386/i386.inc, x86_64./x86_64.inc: Provide real implementation of get_addr function. git-svn-id: trunk@21697 -	2012-06-24 21:22:09 +00:00
masta	c5fbe3bb3b	Use bx lr in ARM-RTL for armv5 ARMv5 supports the BX instruction. BX usually is better supported by Branch Prediction Units than mov pc,lr. git-svn-id: trunk@21649 -	2012-06-18 16:59:39 +00:00
masta	c5d7ae513a	ARM assembly versions of strupper and strlower This is about 1/3 faster than the generic code. git-svn-id: trunk@21648 -	2012-06-18 16:59:34 +00:00
florian	2a2a1e5788	* patch by Nico Erfurth: Optimize SwapEndian for ARM The new version uses a pure pascal version for the 32bit case. With the latest compiler optimizations this generates optimal 4-instruction code which can be inlined. The rev-versions for armv6+ are gone now, the inlineable pascal-code is faster than the call-overhead for the rev-implementation. The 64-bit versions received an updated assembly version which saves 4 cycles total on <armv6. git-svn-id: trunk@21511 -	2012-06-06 19:46:06 +00:00
florian	c39d12a618	* fix longjmp for -Cparmv7m, resolves #22014 git-svn-id: trunk@21311 -	2012-05-15 18:56:27 +00:00
florian	df0201799e	o patch by Nico Erfurth: Support Assembly optimized functions of SwapEndian on ARM Currently the ARM-Port uses generic functions for SwapEndian, which are relatively slow. This patch adds optimized functions for the 32 and 64-bit case, the 16 bit case is still handled with a normal function, while the generated code is far from optimal, the inlining (which is not possible with asm-functions) makes it faster than the optimized function. Some Numbers from my 1.2GHz Kirkwood (ARMv5): Old New Result SwapEndian(Integer) 12.168s 5.411s 44.47% SwapEndian(Int64) 168.28s 9.015s 5.36% Testcode was begin I := $FFFFFFF; while I > 0 do begin Val2 := MySwapEndian(Val); Dec(I); end; end. Currently only the ARM implementation is tested. ARMv6+ includes a rev instruction, while I've implemented them, I was not able to test them. git-svn-id: trunk@20685 -	2012-04-01 17:31:49 +00:00
Jonas Maebe	bba4b02eb2	* use r7 instead of r11 as frame pointer on Darwin/iOS, and make sure r7 always points to the previous r7 on the stack (with the saved return address coming right after it) so that the debugger and crashreporter can use it for backtraces as specified in the ABI o changed NR_FRAME_POINTER_REG and RS_FRAME_POINTER_REG from a symbolic into a typed constant, and added a new method to tprocinfo that can be used to initialize it (so it can be inited to r7/r11 depending on the target platform) * allow using r9 on Darwin, it was only used by the system on iOS up to 2.x, which we no longer support * prefer using r9 and r12 before r4..r11 on Darwin, because they are volatile and hence do not have to be saved git-svn-id: trunk@20661 -	2012-03-29 20:54:33 +00:00
Jonas Maebe	6ba8dc7146	+ support for the ARM hard float EABI on Linux (patch by Peter Green): o new eabihf (hard float) abi o vfpv3_d16 variant of VFP (default variant used by EABI assemblers: VFPv3 with only 16 double registers instead of 32) and pass it to GNU as o make the odd numbered single precision floating point VFP registers available for explicit allocation for use by the calling convention * fixed copy/paste error in stdname of S30 register -> use -dFPC_ARMHF to create an ARM eabi hard float compiler (mantis #21554) git-svn-id: trunk@20660 -	2012-03-29 20:50:09 +00:00
florian	e9c5458dd2	o patch by Nico Erfurth: * Fix for InterLockedCompareExchange on ARMEL InterLockedCompareExchange would not return the current data on failure. Getting this to work correctly is a bit tricky. As kuser_cmpxchg does not return the set value, we have to load it. There is a tiny chance that we get rescheduled between calling kuser_cmpxchg and loading the value. If the value changed in between there is the possibility that we would return the Comperand without having done an actual swap. Which might cause havoc and destruction. So, if the exchange failed, compare the value and loop again in case of CurrentValue == Comperand. * Improve testing of InterLockedCompareExchange Added a test to check for the case when Comperand is different from the current value. git-svn-id: trunk@20514 -	2012-03-11 21:08:57 +00:00
florian	891d7b9349	* committed wrong patch in r20491, fixed with this revision git-svn-id: trunk@20510 -	2012-03-11 07:38:21 +00:00
florian	18866623cd	o patch by Nico Erfurth: Optimize some ARM-RTL functions Use "nostackframe" for: - Sptr (broken without nostackframe) - get_caller_addr - get_caller_frame Use cmp+ldrne instead of movs+beq+ldr, it's a bit more pipeline-friendly and takes burden off the BPU. git-svn-id: trunk@20506 -	2012-03-10 21:52:06 +00:00
florian	5b03826549	o patch by Nico Erfurth: Better Locked* implementation for arm on linux The following functions were changed to make use of the kernel helper kuser_cmpxchg: InterLockedDecrement InterLockedIncrement InterLockedExchangeAdd InterLockedCompareExchange The previous implementation using a spinlock had a couple of drawbacks: 1.) The functions could not be used safely on values not completely managed by the process itself, because the spinlock did not protect data but the functions. For example, think about two processes using shared memory. They would not be able to share fpc_system_lock, making it unsafe to use these functions. 2.) With many active threads, there was a high chance that the scheduler would interrupt a thread while fpc_system_lock was taken, which would result in the following threads using one of these functions to spinlock till the end of its timeslice. This could result in unwanted and unnecessary latencies. 3.) Every function contained a pointer to fpc_system_lock. Resulting in two polluted DCache-Lines per call and possible latencies through dcache misses. The new implementation only works on Linux Kernel >= 2.6.16 The functions are implemented in a way which tries to minimize cache pollution and load latencies. Even without Multithreading the new functions are a lot faster. I did comparisons on my Kirkwood 1.2GHz with the following template code: var X: longint; begin X := 0; while X < longint(100*1000000) do FUNCTION(X); Writeln(X); end. Function New Old InterLockedIncrement: 0m3.696s 0m23.220s InterLockedExchangeAdd: 0m4.034s 0m23.242s InterLockedCompareExchange: 0m4.703s 0m24.006s This speedup is most probably because of the reduced memory access, which resulted in lots of cache misses. git-svn-id: trunk@20491 -	2012-03-10 11:33:20 +00:00
florian	5fa184c952	+ patch by Jeppe Johansen to make use of the div/udiv instruction on arm7m, resolves #20022 * explicitly make symbol addressing PC relative git-svn-id: trunk@19221 -	2011-09-24 21:41:01 +00:00
sergei	4ebc34c5e7	* Promoted result type of FPC_PCHAR_LENGTH and FPC_PWIDECHAR_LENGTH to SizeInt. + Check for nil pointer in FPC_PWIDECHAR_LENGTH git-svn-id: trunk@17733 -	2011-06-13 04:59:17 +00:00
florian	8bff2a0de4	* patch by Jeppe Johansen to fix thumb2 epilog generation, resolves #18392 git-svn-id: trunk@17252 -	2011-04-05 19:25:20 +00:00
florian	0e74cea8ed	* patch by Simon Ley to improve move on arm: unneeded plds are removed, resolves #19050 git-svn-id: trunk@17251 -	2011-04-05 18:44:10 +00:00
Jonas Maebe	780e75bfac	o patch by Jeppe Johansen to fix mantis #17472 : * generate add.w instead of add for thumb-2 in case one of the registers is > r8 * add register interferences for the "add" instruction so the register allocator can detect invalid instruction forms (even for assembler code) * fixed error in thumb2.inc detected by the previous change git-svn-id: trunk@16633 -	2010-12-24 15:54:39 +00:00
Jonas Maebe	c14574bb56	* don't change the fpu control word in the initialisation code of dynamic libraries (mantis #16263, #16801) git-svn-id: trunk@16347 -	2010-11-14 16:00:25 +00:00
florian	24fea58b92	+ initial implementation of iso style gotos in iso mode * made setjmp/longjmp accessible to the compiler by compiler proc, they are used by the iso goto code git-svn-id: trunk@15711 -	2010-08-05 19:20:46 +00:00
florian	3aa1315c06	* thumb2 opcode fixes by Jeppe Johansen, resolves #16306 git-svn-id: trunk@15154 -	2010-04-21 17:40:35 +00:00
Jonas Maebe	fbebd87593	* use BLX instead of "mov r14, r15; mov r15, reg" for a_call_reg on ARMv6 and above, so this also works when calling thumb code (should actually also be done for ARMv5T, but we don't have a monicker for that yet) * use BX instead of "mov r15, r14" for simple returns from subroutines on ARMv6+ to support returning to thumb code from ARM code (idem) git-svn-id: trunk@14332 -	2009-12-04 22:38:50 +00:00
Jonas Maebe	91fc26a530	* the bits in the VFP fpscr don't mask exceptions, but enable them (was used correctly in fpu init code in arm.inc, but inverted in setexceptionmask logic) git-svn-id: trunk@14328 -	2009-12-04 19:54:35 +00:00
Jonas Maebe	d1538ab023	o added ARM VFPv2/VFPv3 support: + RTL support: o VFP exceptions are disabled by default on Darwin, because they cause kernel panics on iPhoneOS 2.2.1 at least o all denormals are truncated to 0 on Darwin, because disabling that also causes kernel panics on iPhoneOS 2.2.1 (probably because otherwise denormals can also cause exceptions) * set softfloat rounding mode correctly for non-wince/darwin/vfp targets + compiler support: only half the number of single precision registers is available due to limitations of the register allocator + added a number of comments about why the stackframe on ARM is set up the way it is by the compiler + added regtype and subregtype info to regsets, because they're also used for VFP registers (+ support in assembler reader) + various generic support routines for dealing with floating point values located in integer registers that have to be transferred to mm registers (needed for VFP) * renamed use_sse() to use_vectorfpu() and also use it for ARM/vfp support o only superficially tested for Linux (compiler compiled with -Cpvfpv6 -Cfvfpv2 works on a Cortex-A8, no testsuite run performed -- at least the fpu exception handler still needs to be implemented), Darwin has been tested more thoroughly + added ARMv6 cpu type and made it default for Darwin/ARM + ARMv6+ implementations of atomic operations using ldrex/strex * don't use r9 on Darwin/ARM, as it's reserved under certain circumstances (don't know yet which ones) * changed C-test object files for ARM/Darwin to ARMv6 versions * check in assembler reader that regsets are not empty, because instructions with a regset operand have undefined behaviour in that case * fixed resultdef of tarmtypeconvnode.first_int_to_real in case of int64->single type conversion * fixed constant pool locations in case 64 bit constants are generated, and/or when vfp instructions with limited reach are present WARNING: when using VFP on an ARMv6 or later cpu, you must compile all code with -Cparmv6 (or higher), or you will get crashes. The reason is that storing/restoring multiple VFP registers must happen using different instructions on pre/post-ARMv6. git-svn-id: trunk@14317 -	2009-12-03 22:46:30 +00:00
florian	515774b864	* merged armthum branch -- Merging differences between repository URLs into '.': U rtl/arm/setjump.inc A rtl/arm/thumb2.inc U rtl/arm/divide.inc A rtl/embedded/arm/stm32f103.pp U rtl/inc/system.inc U compiler/alpha/cgcpu.pas U compiler/sparc/cgcpu.pas U compiler/i386/cgcpu.pas U compiler/ncgld.pas U compiler/powerpc/cgcpu.pas U compiler/avr/cgcpu.pas U compiler/aggas.pas U compiler/powerpc64/cgcpu.pas U compiler/x86_64/cgcpu.pas U compiler/cgobj.pas U compiler/psystem.pas U compiler/aasmtai.pas U compiler/m68k/cgcpu.pas U compiler/ncgutil.pas U compiler/rautils.pas U compiler/arm/raarmgas.pas U compiler/arm/armatts.inc U compiler/arm/cgcpu.pas U compiler/arm/armins.dat U compiler/arm/rgcpu.pas U compiler/arm/cpubase.pas U compiler/arm/agarmgas.pas U compiler/arm/cpuinfo.pas U compiler/arm/armop.inc U compiler/arm/narmadd.pas U compiler/arm/aoptcpu.pas U compiler/arm/armatt.inc U compiler/arm/aasmcpu.pas U compiler/systems/t_embed.pas U compiler/psub.pas U compiler/options.pas git-svn-id: trunk@13801 -	2009-10-04 09:03:44 +00:00
Jonas Maebe	22aacd2a60	* return 0 for length(pchar(0)), like Kylix does (using corrected and multi-platform version of patch in r12461, which caused the i386 version of fpc_pchar_length to return 0 in all cases, which used tabs, and did not include a test case) git-svn-id: trunk@12464 -	2009-01-01 22:02:17 +00:00
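
The SwapEndian commits in this log (trunk@20685, trunk@21511) mention a 32-bit byte swap that comes down to four ARM instructions. The sketch below models one well-known four-operation sequence (EOR, BIC, ROR, EOR) in Python. Whether FPC emits exactly this sequence is an assumption, not something the log confirms, and the names `ror32` and `swap_endian32` are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def ror32(x, n):
    """Rotate a 32-bit value right by n bits (0 < n < 32)."""
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32


def swap_endian32(x):
    """Byte-swap a 32-bit value using four ALU-style operations.

    The comments show the ARM instruction each step is modelled on;
    this is an illustration, not code taken from the FPC RTL.
    """
    x &= MASK32
    t = x ^ ror32(x, 16)        # eor t, x, x, ror #16
    t &= ~0x00FF0000 & MASK32   # bic t, t, #0x00FF0000
    x = ror32(x, 8)             # mov x, x, ror #8
    return x ^ (t >> 8)         # eor x, x, t, lsr #8
```

A quick cross-check is to compare the result with `int.from_bytes(value.to_bytes(4, "little"), "big")` for a handful of values.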

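The InterLockedCompareExchange fix in trunk@20514 describes a retry rule: the kernel helper does not return the value it saw, so after a failed exchange the current value is loaded separately, and if it equals Comperand the call loops again. The following is a minimal single-threaded Python model of that control flow only. `Cell`, the boolean-returning `kuser_cmpxchg` stand-in, and the `cmpxchg` parameter are hypothetical names for illustration; nothing here is atomic, and the real helper's calling convention differs.

```python
class Cell:
    """One shared memory word (stand-in for the target address)."""

    def __init__(self, value):
        self.value = value


def kuser_cmpxchg(oldval, newval, cell):
    """Hypothetical stand-in for the kernel helper.

    Reports only whether the swap happened, never the value it saw.
    This Python model is NOT atomic.
    """
    if cell.value == oldval:
        cell.value = newval
        return True
    return False


def interlocked_compare_exchange(cell, new_value, comperand,
                                 cmpxchg=kuser_cmpxchg):
    """Return the value that was in cell before the call, as described in the log."""
    while True:
        if cmpxchg(comperand, new_value, cell):
            return comperand      # swap done, so the old value was comperand
        current = cell.value      # separate load; the value may have changed
        if current != comperand:
            return current        # genuine failure: report what is there
        # The value equals comperand again although the swap failed:
        # returning it would falsely signal success, so retry.
```

The `cmpxchg` parameter exists only so the retry path can be exercised with a helper that fails once while leaving the value equal to Comperand.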

123 Commits