fix: harden reassociation barriers for fast-math #1276
DiamonDinoia wants to merge 1 commit into xtensor-stack:master
Conversation
Force-pushed f9a9992 to 2f9a431
@serge-sans-paille this is also ready for review.
Force-pushed 3685ff0 to c4dec73
I think I can simplify this a bit more. A is not needed anymore if we go this route.
Force-pushed 6d6bc61 to d571f2f
template <class T>
XSIMD_INLINE void reassociation_barrier(T& x, const char*) noexcept
{
#if XSIMD_WITH_INLINE_ASM
Shouldn't we make this empty if we're not under fast-math?
Well, if we are not under fast-math this should essentially be a no-op. Also, the fast-math check does not detect when only associative math is enabled, which also breaks it. So it becomes a matter of checking every compiler flag that can reorder floating-point operations.
On Sunday, March 29, 2026, serge-sans-paille commented on this pull request, in include/xsimd/arch/common/xsimd_common_details.hpp:
// exists at each call site; it is unused at runtime.
//
// Two overloads:
//   reassociation_barrier(reg, reason)   – raw register
//   reassociation_barrier(batch, reason) – extracts .data
//
// Uses the tightest register-class constraint for the target so
// the value stays in its native SIMD register (no spill):
//   x86 (SSE/AVX/AVX-512) : "+x"  – XMM / YMM / ZMM
//   ARM (NEON / SVE)      : "+w"  – vector / SVE Z-reg
//   PPC (VSX)             : "+wa" – VS register
//   other / MSVC          : address + memory clobber (fallback)
Well, I don't think we can say that it is a no-op. Should we allow for user-defined
Let me think about this for a second. MSVC does not allow inline asm. I should update the comment there. There might be better alternatives for scalars, WASM, and RISC-V that don't impact performance.
Force-pushed b34c94a to d95d1d3
- Add zero-cost register constraints for all supported architectures: x86 "+x", ARM NEON/SVE "+w", PPC VSX "+wa", RISC-V scalar "+f", RISC-V RVV "+vr" (GCC 15+ / Clang 20+).
- Replace the old "r"(&x):"memory" fallback with "+m", guarded by the new XSIMD_REASSOCIATIVE_MATH macro, so unknown targets only spill when the compiler can actually reassociate.
- Add the XSIMD_REASSOCIATIVE_MATH config macro, auto-detected from __FAST_MATH__ / __ASSOCIATIVE_MATH__, user-overridable for Clang with standalone -fassociative-math.
- Add a std::array overload so emulated batches get per-element barriers instead of spilling the whole array.
- Add missing barriers in exp/exp2/exp10 range reduction (float and double) after nearbyint() to prevent compensated-subtraction reordering.
- Add missing barriers in log2 (float and double) after to_float(k) to protect the Kahan compensated summation.
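The auto-detection described in the second and third bullets could look roughly like the fragment below. This is a sketch inferred from the commit message, not the PR's actual code; in particular, whether a given compiler defines __ASSOCIATIVE_MATH__ for a standalone -fassociative-math varies, which is why the macro is left user-overridable:

```cpp
// Sketch: 1 when the compiler is allowed to reassociate FP math.
// Clang with a standalone -fassociative-math defines neither macro,
// so such users must define XSIMD_REASSOCIATIVE_MATH themselves.
#ifndef XSIMD_REASSOCIATIVE_MATH
#if defined(__FAST_MATH__) || defined(__ASSOCIATIVE_MATH__)
#define XSIMD_REASSOCIATIVE_MATH 1
#else
#define XSIMD_REASSOCIATIVE_MATH 0
#endif
#endif
```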
Force-pushed d95d1d3 to edba47b
I tried to have the no-op always executed, while the spill asm is guarded with macros. I think this is the best solution, as users should not have to worry about FP reordering in practice, except in corner cases. We could document those if needed.
There is a bug in nearbyint: -fassociative-math breaks it, as it does not define __FAST_MATH__. Also, the barrier previously used was causing a stack spill.
I centralized a barrier function that we can use everywhere in the code and used it in the places I know it helps.
Now we can just use reassociation_barrier to avoid compiler reordering of instructions. Let me know if you like the internal API and if you need changes to it. I find that this is the solution that minimizes ifdef boilerplate and allows dispatching to all archs. (With C++17 this will be simpler.)
Cheers,
Marco