Skip to content

perf: Element-wise comparison only for tolerance-requiring data types#26

Merged
Marius Merkle (MariusMerkleQC) merged 18 commits intomainfrom
optimize
Mar 30, 2026
Merged

perf: Element-wise comparison only for tolerance-requiring data types#26
Marius Merkle (MariusMerkleQC) merged 18 commits intomainfrom
optimize

Conversation

@MariusMerkleQC
Copy link
Copy Markdown
Collaborator

@MariusMerkleQC Marius Merkle (MariusMerkleQC) commented Mar 27, 2026

Motivation

See this comment.

Changes

Introduce a function _needs_element_wise_comparison that checks whether element-wise comparison needs to be performed; this is the case for

(1) float vs numeric columns -> absolute and relative tolerances apply (-> _is_float_numeric_pair())
(2) temporal columns -> absolute temporal tolerance applies (-> _is_temporal_pair())
(3) Different enums
(4) Enum vs categorical comparison

In all other cases, naive comparison suffices, and this shortcut is taken if the above helper returns False. This avoids the expensive _compare_sequence_columns(). The performance improvement can be seen in the benchmark test.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (ff8439c) to head (8dfe919).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #26   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           10        10           
  Lines          758       776   +18     
=========================================
+ Hits           758       776   +18     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes condition_equal_columns for nested list/array columns by avoiding the expensive element-wise comparison path when tolerances/special handling aren’t needed, and updates the performance benchmark accordingly.

Changes:

  • Add _needs_element_wise_comparison() (plus helpers) to decide when list/array columns require element-wise comparison.
  • Shortcut list/array comparisons to eq_missing() when element-wise handling is deemed unnecessary.
  • Update the performance test to assert comparable performance for list<i64> comparisons.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
diffly/_conditions.py Introduces dtype-based gating to skip element-wise list/array comparisons unless tolerances/special handling are needed.
tests/test_performance.py Updates benchmark expectations to ensure the optimized path is not significantly slower than direct eq_missing().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks

Base automatically changed from benchmark to main March 30, 2026 11:30
@MariusMerkleQC Marius Merkle (MariusMerkleQC) merged commit a6992fa into main Mar 30, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants