Add consumed_args callback optimization and apply to array_reduce() for carry to improve performance #21340

Open
drealecs wants to merge 2 commits into php:master from drealecs:improve-array-reduce-consume-carry

drealecs commented Mar 4, 2026

Related to #8283

For some callback operations, it is safe to consume (reuse) a value passed as an argument instead of copying it first.
This PR adds a generic internal mechanism for that (through zend_fcall_info.consumed_args) and applies it to the carry argument of array_reduce() and to a few other callback-using functions.

Because the callback parameter then holds the only reference to the value, modifying it no longer forces a copy-on-write duplication. This significantly improves array_reduce() performance for mutable accumulator patterns, while keeping userland semantics unchanged.

The same mechanism can be used in other callback-using functions, and it is also applied here to preg_replace_callback() / preg_replace_callback_array() and ob_start().
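
For readers unfamiliar with the term, a minimal userland sketch of a "mutable accumulator" reduce follows. The helper name `countValues` is hypothetical and not part of the PR; the sketch only shows the pattern (the callback writes to `$carry` and returns it) and the userland behavior that is expected to stay the same with or without the optimization.

```php
<?php

declare(strict_types=1);

// Hypothetical helper, for illustration only: counts occurrences of each value
// by mutating the accumulator inside the callback and returning it.
function countValues(array $items, array $initial = []): array
{
    return array_reduce(
        $items,
        static function (array $carry, string $item): array {
            // Writing to $carry is where copy-on-write matters: if the array
            // is still referenced elsewhere, PHP has to duplicate it first.
            $carry[$item] = ($carry[$item] ?? 0) + 1;
            return $carry;
        },
        $initial,
    );
}

$initial = ['a' => 10];
$counts = countValues(['a', 'b', 'a'], $initial);

var_dump($counts === ['a' => 12, 'b' => 1]); // bool(true)
var_dump($initial === ['a' => 10]);          // bool(true): the caller's array is not modified
```

The second check is the semantic guarantee in question: whatever the engine does internally with the carry value, the caller's `$initial` must look untouched.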

@drealecs drealecs force-pushed the improve-array-reduce-consume-carry branch 2 times, most recently from 86f212d to 2f9aa58 Compare March 4, 2026 17:24
@drealecs drealecs changed the title from "Draft: Improve array_reduce() consume carry" to "Improve array_reduce() consume carry" Mar 4, 2026
@drealecs drealecs force-pushed the improve-array-reduce-consume-carry branch from 2f9aa58 to 3299db5 Compare March 4, 2026 18:55

TimWolla (Member) commented Mar 4, 2026

Can you provide the benchmark you used to verify this improves the situation?

drealecs (Author) commented Mar 5, 2026

> Can you provide the benchmark you used to verify this improves the situation?

Yes, this is a good point.
Here is a simple test I used locally:

<?php

declare(strict_types=1);

function test($size) {
    $data = [];
    for ($i = 0; $i < $size; $i++) {
        $data[] = substr(hash('crc32b', (string) $i), 0, 4);
    }

    $start = hrtime(true);
    $result = array_reduce(
        $data,
        static function ($carry, $item) {
            @$carry[$item]++;
            return $carry;
        },
        [],
    );
    $duration = hrtime(true) - $start;

    printf("size=%d, unique=%d, duration=%.3f ms\n", $size, count($result), $duration/1_000_000);
}

test(10_000);
test(20_000);
test(40_000);

On master, this is the output:

$ ./sapi/cli/php test_array_reduce.php
size=10000, unique=9980, duration=1199.907 ms
size=20000, unique=19084, duration=4844.224 ms
size=40000, unique=34066, duration=19404.972 ms

and we can see that the complexity is O(n^2).

On this branch, this is the output:

$ ./sapi/cli/php test_array_reduce.php
size=10000, unique=9980, duration=9.108 ms
size=20000, unique=19084, duration=18.905 ms
size=40000, unique=34066, duration=35.392 ms

with O(n) complexity.
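
The complexity claims can be sanity-checked against the timings quoted above: the input size doubles between runs, so a quadratic run time should roughly quadruple and a linear one should roughly double. A small sketch of that arithmetic (the function name `growthRatios` is made up for this example):

```php
<?php

declare(strict_types=1);

// Hypothetical helper: ratio of each timing to the previous one.
// Input sizes double between runs, so ~4 suggests O(n^2) and ~2 suggests O(n).
function growthRatios(array $durationsMs): array
{
    $ratios = [];
    for ($i = 1; $i < count($durationsMs); $i++) {
        $ratios[] = $durationsMs[$i] / $durationsMs[$i - 1];
    }
    return $ratios;
}

// Timings copied from the two runs above (10k, 20k, 40k elements).
$master = [1199.907, 4844.224, 19404.972];
$branch = [9.108, 18.905, 35.392];

foreach (['master' => $master, 'branch' => $branch] as $name => $durations) {
    $formatted = array_map(
        static fn (float $ratio): string => sprintf('%.2f', $ratio),
        growthRatios($durations),
    );
    printf("%s: %s\n", $name, implode(', ', $formatted));
}
```

The ratios come out at roughly 4 for master and roughly 2 for the branch, which matches the quadratic and linear readings of the two outputs.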

@drealecs drealecs force-pushed the improve-array-reduce-consume-carry branch from 3299db5 to cdbe42a Compare March 5, 2026 11:53

iluuu1994 (Member) left a comment

Nice proposal! The approach makes sense to me. Applying this to all cases will require some care. This is safe for any argument that is owned. So this is fine for $initial (and the return value of $callback in subsequent iterations), but it wouldn't be for the elements of $array (ZEND_HASH_FOREACH_VAL(htbl, operand)), given that operand has not been ADDREF'd at this point.
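
From userland, the reason the array elements cannot simply be handed over is visible in the following sketch (written for this discussion, not taken from the PR's tests). The callback writes to `$item`, and the source array must still be unchanged afterwards, so the element's storage cannot be consumed by the callback.

```php
<?php

declare(strict_types=1);

$data = [[1, 2], [3, 4]];

$total = array_reduce(
    $data,
    static function (int $carry, array $item): int {
        // $item refers to the same array as the element stored in $data,
        // so this write has to work on a separate copy.
        $item[] = 100;
        return $carry + array_sum($item);
    },
    0,
);

var_dump($total);                     // int(210)
var_dump($data === [[1, 2], [3, 4]]); // bool(true): the source array is untouched
```

The carry value is different: once array_reduce() has passed it on, nothing else needs to observe its old contents, which is what makes it a candidate for consumption.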

Before merging this, it may make sense to scout the code base to see which functions this may be applied to. Some of the use cases might influence the design.

@iluuu1994 iluuu1994 requested a review from Girgias March 6, 2026 13:04

drealecs (Author) commented Mar 6, 2026

> Before merging this, it may make sense to scout the code base to see which functions this may be applied to. Some of the use cases might influence the design.

I already did some analysis, and there are a few other places where we could implement it to get some improvements:

  • preg_replace_callback() / preg_replace_callback_array(): for $matches, useful if the callback modifies it.
  • ob_start(): for $buffer, useful if the callback modifies it.
  • there are other cases, such as $key for array_find(), but it is unusual for callbacks to modify those parameters.
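
To make the first two cases concrete, here is a rough sketch of callbacks that write to the argument they receive. The pattern, input string, and variable names are invented for illustration; only the shape of the callbacks matters.

```php
<?php

declare(strict_types=1);

// preg_replace_callback(): the callback modifies its own $matches array.
$swapped = preg_replace_callback(
    '/(\w+)@(\w+)/',
    static function (array $matches): string {
        $matches[1] = strtoupper($matches[1]); // write to the argument
        return $matches[2] . '@' . $matches[1];
    },
    'user@host',
);

// ob_start(): the callback modifies the $buffer string it receives.
ob_start(); // outer buffer, used here only to capture the result
ob_start(static function (string $buffer): string {
    $buffer .= '!'; // write to the argument
    return $buffer;
});
echo 'hello';
ob_end_flush();           // runs the callback; its result goes to the outer buffer
$output = ob_get_clean(); // collect what the callback produced

var_dump($swapped); // string(9) "host@USER"
var_dump($output);  // string(6) "hello!"
```

In both callbacks the write targets a value that the calling function created solely for that call, which is the situation where consuming the argument could avoid a copy.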

What do you think? And related to the design, what can we improve?

Girgias (Member) left a comment

I'd rather have this split into two PRs. One that adds the consumed_args with various tests added to the zend_test extension (e.g. ref variables as that seems to be explicitly checked and I'd like to understand more) and then a follow-up PR to add support for array_reduce() and whatever other functions.

@drealecs drealecs force-pushed the improve-array-reduce-consume-carry branch 2 times, most recently from 008446f to fb5617f Compare March 7, 2026 11:39
@drealecs drealecs requested a review from kocsismate as a code owner March 7, 2026 11:39

drealecs (Author) commented Mar 7, 2026

> I'd rather have this split into two PRs. One that adds the consumed_args with various tests added to the zend_test extension (e.g. ref variables as that seems to be explicitly checked and I'd like to understand more) and then a follow-up PR to add support for array_reduce() and whatever other functions.

Thank you for the review, Gina. Great point about the tests.
I implemented them using the zend_test extension, including the reference-sensitive paths. Please let me know if this covers the cases you had in mind.

I’d prefer to keep this as a single PR if that works for you. I created exactly two commits from the start, and I plan to keep them that way, so the separation is explicit and easier to review. They now contain:

  1. engine-only consumed_args support in zend_call_function() + tests.
  2. adoption in array_reduce() + some other callback-using functions.

If you still prefer two separate PRs, I can split them, but I hoped this structure would keep the review manageable while preserving context. If the PR can be merged without squashing, the git history will stay meaningful, but I don't know whether that is common practice here, so let me know if I should change it or if it works for you this way. Thanks again!

@drealecs drealecs force-pushed the improve-array-reduce-consume-carry branch from 2847abb to 9171681 Compare March 7, 2026 12:43
@drealecs drealecs requested a review from Girgias March 7, 2026 12:44

iluuu1994 (Member) left a comment

LGTM otherwise

@drealecs drealecs force-pushed the improve-array-reduce-consume-carry branch 4 times, most recently from ba43bb5 to 23853ee Compare March 19, 2026 09:41

iluuu1994 (Member) left a comment

Assuming tests pass. Thanks for the changes @drealecs!

@drealecs drealecs force-pushed the improve-array-reduce-consume-carry branch from 23853ee to b54002d Compare March 19, 2026 10:11
@drealecs drealecs force-pushed the improve-array-reduce-consume-carry branch 2 times, most recently from 7eedaf8 to 24ca52b Compare March 19, 2026 12:09
@drealecs drealecs force-pushed the improve-array-reduce-consume-carry branch 3 times, most recently from bd76567 to 804c1ed Compare March 23, 2026 15:25

TimWolla (Member) left a comment

I'm not seeing any more obvious issues, but I didn't check this in detail beyond "reading the diff".

@ndossche ndossche removed their request for review March 24, 2026 22:09
@drealecs drealecs force-pushed the improve-array-reduce-consume-carry branch 3 times, most recently from 5fafbe4 to 5bddc39 Compare March 26, 2026 14:38
Comment on lines +26 to +29
gc_collect_cycles();
gc_mem_caches();
memory_reset_peak_usage();
$start = memory_get_peak_usage();

Member commented

I'm not sure this is very reliable :/ Maybe the better option is to create a new "debug" function in zend_test that takes part of the debug_zval_dump() code and just returns an integer for the refcount number.

if (Z_REFCOUNTED_P(struc)) {
	return Z_REFCOUNT_P(struc);
} else {
	/* interned */
	return -1;
}

But I totally get that you're fed up with the wild goose chase, so also happy to just drop the test. :)

drealecs (Author) commented Mar 27, 2026

I have now pushed the last commit, which tests ob_start() using a new refcount test function, zend_test_refcount().

drealecs (Author) commented

> But I totally get that you're fed up with the wild goose chase, so also happy to just drop the test. :)

😄 No worries, I expected from the start that it might take some time.

Also, since things are (almost) ready, and as you were the one who suggested splitting the PR: I have kept it as it is. The PR is made of two commits, and I took care to keep each one well composed and focused on what it represents.
So, just to be sure: does this sound good? And can the merge be done without squashing, so that both commits are preserved?

@drealecs drealecs force-pushed the improve-array-reduce-consume-carry branch from 5bddc39 to f4789c0 Compare March 27, 2026 14:37
@drealecs drealecs force-pushed the improve-array-reduce-consume-carry branch from f4789c0 to c59fd5d Compare March 29, 2026 11:47