256 Commits

Author SHA1 Message Date
Rika Ichinose
ce6db34224 Shortcut Compare*(a, a) before entering the aligned loop. 2025-03-29 22:07:03 +01:00
Jonas Maebe
91667644f4 fpc_cpuinit: add destroyed register lists to assembler blocks
Otherwise the compiler assumes no registers are overwritten. And while the
regular code generator won't use register variables if assembler blocks are
present, LLVM is not restricted like that (and it could still cause issues
even with the default code generator in case PIC-rebased addresses are
accessed).
2025-03-24 23:02:10 +01:00
Rika Ichinose
ff2492edf5 Add System.UMul64x64_128. 2025-03-15 22:18:55 +01:00
Rika Ichinose
94a1f33f60 Shorten i386 and x86-64 atomic implementations to offset the LoC cost of the previous commit. 2024-12-19 19:42:25 +00:00
Rika Ichinose
bb43afd26d Add more specialized atomics for i386 and x86-64. 2024-12-19 19:42:25 +00:00
Sven/Sarah Barth
e94d02a067 * with all existing RTLs switched over to the atomic intrinsics, the define FPC_SYSTEM_INTERLOCKED_USE_INTRIN can be removed again 2024-12-12 22:05:20 +01:00
Sven/Sarah Barth
ba7e87aff3 * switch x86_64 RTL to provide the atomic intrinsics instead of Interlocked* functions 2024-12-12 22:05:16 +01:00
florian
e471c08cf8 + SHA512Support 2024-12-07 11:10:34 +01:00
florian
73e96f8f1e * simplify SysResetFPU 2024-12-06 21:21:02 +01:00
florian
ccae78f97a + RiscV64: apply OptPass1OP also to addiw 2024-11-13 22:56:13 +01:00
florian
54dcfa78f8 * cleanup 2024-10-26 20:32:14 +02:00
Rika Ichinose
aed4292017 SSE set operations (i386). 2024-10-26 15:48:17 +00:00
Alligator-1
00d5351b55 partial revert 2024-08-26 20:20:57 +00:00
Alligator-1
8c3829e698 nostackframe 2024-08-26 13:02:45 +00:00
Rika Ichinose
d7352e7b66 Remove most of the VER3_0 conditionals. 2024-08-25 09:44:11 +00:00
Rika Ichinose
ca0e04a346 Faster path for IndexBytes with a match at the beginning. 2024-08-19 20:15:54 +00:00
Rika Ichinose
1030f67fb4 Use IndexQWord_SSE41 directly if -Cp RTL compiled with supports SSE 4.1. 2024-07-21 08:40:12 +00:00
Rika Ichinose
8bf2dc3f2b Simplify CPU units (70 LoC + 500 b code + 500 b data). 2024-07-18 20:13:11 +00:00
Rika Ichinose
a575a5c0fd Move Int128Rec to System; remove i386 and x86_64 CPU unit dependency on SysUtils. 2024-07-15 13:31:20 +00:00
Rika Ichinose
0ca608243c SSE4.1 IndexQWord for i386 and x86-64. 2024-06-29 20:37:55 +00:00
florian
567187d4ba + TSCSupport 2024-06-29 22:32:36 +02:00
florian
a0cae50af6 * rtl part of #35433 2024-05-01 23:15:12 +02:00
Rika Ichinose
b87e22151a Use non-conservative Fill thresholds. 2024-04-22 19:37:36 +00:00
florian
11f076f0e7 + CMPXCHG16BSupport 2024-02-28 22:18:42 +01:00
Rika Ichinose
2d6294eb26 MovQ + Shr → PExtrW. 2024-02-18 21:37:39 +00:00
Rika Ichinose
c29dd86bb2 Remove runtime ABI adapter in x86_64.inc:IndexByte/Word, and save two jumps in the common case. 2024-02-11 15:05:03 +00:00
Rika Ichinose
7bf502ad40 Change Mov*DQ to Mov*PS; they are always equivalent because no operations but the memory transfers are performed, and 1 byte shorter each. 2024-02-10 22:47:40 +00:00
Rika Ichinose
12f18177ae Simplify x86_64.inc:Move non-temporal loops, and adjust thresholds for move distances considered too short for NT. 2024-02-10 22:47:40 +00:00
Rika Ichinose
0b5998ee8b Write two last values after 2× loops unconditionally instead of an extra check. 2024-02-10 22:47:40 +00:00
Rika Ichinose
e395166cb7 Check for Move overlaps in more obvious way (that also does no jumps in forward case). 2024-02-10 22:47:40 +00:00
Rika Ichinose
0d5f7fa66b Increase non-temporal i386 & x64 Fill* thresholds to 4 Mb. 2024-01-01 18:33:33 +00:00
Rika Ichinose
1ec0326995 REP STOS branch for x64 Fill* (only for System V ABI for now). 2023-11-26 15:06:59 +00:00
Rika Ichinose
a4c324ee23 Fill* for x64, physically sharing half of the code with FillChar. 2023-11-26 15:06:59 +00:00
Rika Ichinose
b468793c63 Index/Compare refined by hand instead of mostly being GCC output. 2023-11-21 22:32:16 +00:00
florian
b164817e18 * check also for XGETBV support, resolves problem reported by Pierre 2023-11-20 22:55:25 +01:00
florian
704ad21b23 + centralized cpu capability detection 2023-11-18 22:28:50 +01:00
Rika Ichinose
c07f36b30b Post-modern CompareByte for x86-64/SSE2. 2023-11-16 21:42:51 +00:00
Rika Ichinose
0bc1d8d446 Deny effective RTM support if CPUID bit RTM_ALWAYS_ABORT is set. 2023-11-01 17:10:14 +00:00
Rika Ichinose
e00ab51185 On i386 and x86_64, add cpu.CPUID — high-level wrapper to CPUID instruction, and cpu.CPUBrandString — convenience for CPUID leaves 80000002, 80000003, and 80000004. 2023-10-31 21:20:45 +03:00
Rika Ichinose
0e426db5de x86_64.inc: shorten Interlocked*, perform macro-fused test+jz in Index* early. 2023-10-25 21:05:21 +00:00
Rika Ichinose
2dca69f2ac Specialized fpc_varset_OP_sets for i386 and x86-64. 2023-08-30 19:38:33 +00:00
Michael VAN CANNEYT
ccfa38c68e * Dotted RTL compiles 2023-07-27 19:04:03 +02:00
Michael VAN CANNEYT
5ce739135b * Char -> AnsiChar 2023-07-14 17:26:10 +02:00
Rika Ichinose
669d41172c Fix UTF-8 symbols in comments. 2023-07-08 21:18:55 +00:00
Rika Ichinose
8d5d7b480d Supposedly faster Move for x64. 2023-07-08 21:18:55 +00:00
Rika Ichinose
f20c7b9ae9 Shorter x86_64.inc:inc/declocked. 2023-06-14 21:19:11 +00:00
Rika Ichinose
b56cbad50e Supposedly faster FillChar for x64. 2023-04-13 15:55:42 +00:00
Rika Ichinose
8e884d9acd Handle Index* / Compare* tail by directly reading last VECSIZE bytes, if there was at least one full vector. 2023-04-03 20:08:56 +00:00
florian
ee16fc7b96 * patch by Rika, trivial adjustments to !373, resolves #40172 2023-02-27 22:07:06 +01:00
Rika Ichinose
da12cfc867 Improved CompareWord for i386 and x86_64. 2023-02-25 22:52:38 +00:00