584 Commits

Author SHA1 Message Date
Rika Ichinose
ce6db34224 Shortcut Compare*(a, a) before entering the aligned loop. 2025-03-29 22:07:03 +01:00
Rika Ichinose
ff2492edf5 Add System.UMul64x64_128. 2025-03-15 22:18:55 +01:00
Rika Ichinose
4f92679625 BMI1 → BMI2. 2025-03-13 01:02:15 +03:00
Rika Ichinose
900b1fc4ec Check for refcount = 1 first. 2025-02-16 15:17:48 +03:00
Rika Ichinose
6ccad3dc4e Shortcut declocked on refcount = 1. 2025-01-31 22:03:25 +00:00
Rika Ichinose
94a1f33f60 Shorten i386 and x86-64 atomic implementations to offset the LoC cost of the previous commit. 2024-12-19 19:42:25 +00:00
Rika Ichinose
bb43afd26d Add more specialized atomics for i386 and x86-64. 2024-12-19 19:42:25 +00:00
Sven/Sarah Barth
e94d02a067 * with all existing RTLs switched over to the atomic intrinsics, the define FPC_SYSTEM_INTERLOCKED_USE_INTRIN can be removed again 2024-12-12 22:05:20 +01:00
Sven/Sarah Barth
295d3f0969 * switch i386 RTL to provide the atomic intrinsics instead of Interlocked* functions 2024-12-12 22:05:16 +01:00
florian
e471c08cf8 + SHA512Support 2024-12-07 11:10:34 +01:00
Rika Ichinose
d1db5d2104 Darwin: re-enable new assembler fill*word variants
Work around with an extra jump to an extra function.
2024-11-23 19:06:47 +03:00
Jonas Maebe
28e9ebc7da Darwin: disable new assembler fill*word variants
They use interprocedural gotos at the assembler level, which is incompatible
with auto-generated CFI
2024-11-20 21:39:19 +01:00
Rika Ichinose
6e655eb5a3 Remove fpc_varset_* indirections if SSE support is guaranteed. 2024-10-27 08:16:25 +00:00
florian
54dcfa78f8 * cleanup 2024-10-26 20:32:14 +02:00
Rika Ichinose
aed4292017 SSE set operations (i386). 2024-10-26 15:48:17 +00:00
Rika Ichinose
9917350ef0 AVX2 CompareByte for i386. 2024-09-23 20:10:57 +00:00
Rika Ichinose
fc1050a834 Make use of CPUX86_HINT_BSX_DEST_UNCHANGED_ON_ZF_1 in Bsf*/Bsr*. 2024-09-22 08:33:44 +00:00
Rika Ichinose
d7352e7b66 Remove most of the VER3_0 conditionals. 2024-08-25 09:44:11 +00:00
Rika Ichinose
ea33fdcdf8 Decimate rtl/i386/strings.inc. 2024-08-19 20:34:10 +00:00
Rika Ichinose
ca0e04a346 Faster path for IndexBytes with a match at the beginning. 2024-08-19 20:15:54 +00:00
Rika Ichinose
1030f67fb4 Use IndexQWord_SSE41 directly if -Cp RTL compiled with supports SSE 4.1. 2024-07-21 08:40:12 +00:00
Rika Ichinose
8bf2dc3f2b Simplify CPU units (70 LoC + 500 b code + 500 b data). 2024-07-18 20:13:11 +00:00
Rika Ichinose
a575a5c0fd Move Int128Rec to System; remove i386 and x86_64 CPU unit dependency on SysUtils. 2024-07-15 13:31:20 +00:00
Rika Ichinose
73bf0c82bb Disable _Plain versions when compiling RTL for newer CPUs. 2024-07-14 14:36:17 +00:00
florian
9d957cd6b3 * fix TSC support bit as mentioned by Rika 2024-07-01 22:26:03 +02:00
florian
bdef7af09e * corrected rte number after last merge 2024-06-30 15:25:08 +02:00
Rika Ichinose
ea271c1088 Make int64 division helpers “nostackframe”. 2024-06-30 12:51:40 +00:00
Rika Ichinose
0ca608243c SSE4.1 IndexQWord for i386 and x86-64. 2024-06-29 20:37:55 +00:00
florian
567187d4ba + TSCSupport 2024-06-29 22:32:36 +02:00
florian
a0cae50af6 * rtl part of #35433 2024-05-01 23:15:12 +02:00
Rika Ichinose
0655b342d4 Shorter IndexByte_Plain. 2024-04-26 18:35:47 +00:00
Rika Ichinose
b87e22151a Use non-conservative Fill thresholds. 2024-04-22 19:37:36 +00:00
Rika Ichinose
bad42011ab Better i386.inc:fpc_ansistr_unique. 2024-04-08 18:47:18 +00:00
Rika Ichinose
e42209457e Shorter i386 Exp(). 2024-03-29 21:04:32 +00:00
florian
22e9033076 + MMXSupport added to cpu unit
* mmx unit makes more use of cpu unit
2024-03-28 10:42:08 +01:00
Rika Ichinose
6c8acf28cd Shorten MMX unit. 2024-03-28 09:13:43 +00:00
Rika Ichinose
a35577593b Don’t misalign FillChar pattern. 2024-03-10 21:24:21 +00:00
Rika Ichinose
e87e14c7cc Make some i386.inc functions “nostackframe”. 2024-02-21 21:08:36 +00:00
Rika Ichinose
e5b47310c8 Supposedly faster i386 int() and frac(). 2024-02-18 21:37:39 +00:00
Rika Ichinose
0b5998ee8b Write two last values after 2× loops unconditionally instead of an extra check. 2024-02-10 22:47:40 +00:00
Rika Ichinose
e395166cb7 Check for Move overlaps in more obvious way (that also does no jumps in forward case). 2024-02-10 22:47:40 +00:00
Rika Ichinose
e4a0b1adb4 Use ERMS in all eligible cases, again.
Namely, when Move.count > NtThreshold but move distance is too short. 8310b169b7 messed with the logic and made this case fall back to a regular loop instead of more preferable ERMS.
2024-01-11 21:51:44 +00:00
Rika Ichinose
35345fe145 Fix FillQWord_SSE2 stack usage. 2024-01-06 21:18:56 +00:00
Rika Ichinose
9d8b801e4c Improve i386 fpc_shortstr_to_shortstr(), fpc_shortstr_compare(), and add fpc_shortstr_compare_equal(). 2024-01-01 21:12:52 +00:00
Rika Ichinose
0d5f7fa66b Increase non-temporal i386 & x64 Fill* thresholds to 4 Mb. 2024-01-01 18:33:33 +00:00
Rika Ichinose
b7d32e4933 ERMSB-aware Fill* for i386. 2024-01-01 18:33:33 +00:00
Rika Ichinose
8310b169b7 Move ERMS branch into a separate function instead of runtime checks of fast_large_repmovstosb. 2023-12-31 09:54:09 +00:00
Rika Ichinose
f14aced9c5 Attempt to ERMS backward i386 ‘Move’s. 2023-12-31 09:54:09 +00:00
Rika Ichinose
ecc56d7e68 Attempt to save push/pop ebx on small non-GPR moves. 2023-12-10 13:26:39 +00:00
Rika Ichinose
0750777fc8 Supposedly better fastmove.inc. 2023-12-10 13:26:39 +00:00