fix: harden reassociation barriers for fast-math #1276
DiamonDinoia wants to merge 1 commit into xtensor-stack:master
Conversation
Force-pushed f9a9992 to 2f9a431
@serge-sans-paille this is also ready for review.
Force-pushed 3685ff0 to c4dec73
I think I can simplify this a bit more. A is not needed anymore if we go this route.
Force-pushed 6d6bc61 to d571f2f
template <class T>
XSIMD_INLINE void reassociation_barrier(T& x, const char*) noexcept
{
#if XSIMD_WITH_INLINE_ASM
Shouldn't we make this empty if we're not under fast-math?
Well, if we are not under fast-math this should essentially be a no-op. Also, the fast-math check does not detect when only associative math is enabled, which also breaks it. So it becomes a matter of checking every compiler flag that can reorder floating-point operations.
On Sunday, March 29, 2026, serge-sans-paille commented on this pull request, in include/xsimd/arch/common/xsimd_common_details.hpp:
// exists at each call site; it is unused at runtime.
//
// Two overloads:
//   reassociation_barrier(reg, reason)   – raw register
//   reassociation_barrier(batch, reason) – extracts .data
//
// Uses the tightest register-class constraint for the target so
// the value stays in its native SIMD register (no spill):
//   x86 (SSE/AVX/AVX-512) : "+x"  – XMM / YMM / ZMM
//   ARM (NEON / SVE)      : "+w"  – vector / SVE Z-reg
//   PPC (VSX)             : "+wa" – VS register
//   other / MSVC          : address + memory clobber (fallback)
Well, I don't think we can say that it is a no-op. Should we allow for user-defined
Let me think about this for a second. MSVC does not allow inline asm. I should update the comment there. There might be better alternatives for scalars, WASM, and RISC-V that don't impact performance.
Force-pushed b34c94a to d95d1d3
- Add zero-cost register constraints for all supported architectures: x86 "+x", ARM NEON/SVE "+w", PPC VSX "+wa", RISC-V scalar "+f", RISC-V RVV "+vr" (GCC 15+ / Clang 20+).
- Replace the old "r"(&x):"memory" fallback with "+m", guarded by the new XSIMD_REASSOCIATIVE_MATH macro, so unknown targets only spill when the compiler can actually reassociate.
- Add the XSIMD_REASSOCIATIVE_MATH config macro, auto-detected from __FAST_MATH__ / __ASSOCIATIVE_MATH__, user-overridable for Clang with standalone -fassociative-math.
- Add a std::array overload so emulated batches get per-element barriers instead of spilling the whole array.
- Add missing barriers in exp/exp2/exp10 range reduction (float and double) after nearbyint() to prevent compensated-subtraction reordering.
- Add missing barriers in log2 (float and double) after to_float(k) to protect the Kahan compensated summation.
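The auto-detection described in the second and third bullets could look roughly like the fragment below. This is a sketch inferred from the commit message, not the PR's actual code; in particular, whether a given compiler defines __ASSOCIATIVE_MATH__ for a standalone -fassociative-math varies, which is why the macro is left user-overridable:

```cpp
// Sketch: 1 when the compiler is allowed to reassociate FP math.
// Clang with a standalone -fassociative-math defines neither macro,
// so such users must define XSIMD_REASSOCIATIVE_MATH themselves.
#ifndef XSIMD_REASSOCIATIVE_MATH
#if defined(__FAST_MATH__) || defined(__ASSOCIATIVE_MATH__)
#define XSIMD_REASSOCIATIVE_MATH 1
#else
#define XSIMD_REASSOCIATIVE_MATH 0
#endif
#endif
```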
Force-pushed d95d1d3 to edba47b
I tried to have the no-op always executed, while the spill asm is guarded with macros. I think this is the best solution, as users should not have to worry about FP reordering in practice, except in corner cases. We could document those if needed.
There is a bug in nearbyint: -fassociative-math breaks it, as it does not define __FAST_MATH__. Also, the barrier previously used was causing a stack spill.
I centralized a barrier function that we can use everywhere in the code and used it in the places I know it helps.
Now we can just use reassociation_barrier to avoid compiler reordering of instructions. Let me know if you like the internal API and if you need changes to it. I find that this is the solution that minimizes ifdef boilerplate and allows dispatching to all archs. (With C++17 this will be simpler.)
Cheers,
Marco