
Add Process-wide memory-aware limiter for OOM protection #2425

@lalitb

Description

Pre-filing checklist

  • I searched existing issues and didn't find a duplicate

Component(s)

Rust OTAP dataflow (rust/otap-dataflow/)

Is your feature request related to a problem?

As an extension of the original resource-control discussion in #919, I think the Rust collector still lacks one important protection that the Go collector has today: a process-wide, memory-aware limiter.

The collector already has strong structural backpressure mechanisms such as bounded channels, topic queue limits, fanout max_inflight, and disk-budget controls. These are useful, but they are all structural. None of them react to actual process memory usage.

That matters because total memory is not determined only by queue capacities. It also includes:

  • request bodies and decode buffers
  • compression/decompression overhead
  • connection/request state
  • in-flight work outside queue accounting
  • allocator fragmentation / resident pages
  • thread stacks, runtime overhead, mmap/native allocations

It also depends on the size of each individual event or batch. Bounded queues usually cap the number of queued items, not the number of bytes they occupy.

For example, with a channel size of 128:

  • if each event is roughly 100 KB, the queue payload alone is about 12.8 MB
  • if each event is roughly 1 MB, the queue payload alone is about 128 MB

The same static queue configuration can therefore produce very different memory footprints depending on payload size, which is another reason a runtime memory-aware limiter is still needed.
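To make the arithmetic concrete, here is a minimal sketch (illustrative only; no collector types involved) of the worst-case payload held by a channel that bounds item count rather than bytes:

```rust
/// Illustrative only: worst-case queued payload for a channel that caps
/// the number of items, not the number of bytes they occupy.
fn queue_payload_bytes(capacity: usize, avg_event_bytes: usize) -> usize {
    capacity * avg_event_bytes
}

fn main() {
    // 128 slots of ~100 KB events -> ~12.8 MB of queued payload.
    println!("{}", queue_payload_bytes(128, 100_000));
    // 128 slots of ~1 MB events -> ~128 MB of queued payload.
    println!("{}", queue_payload_bytes(128, 1_000_000));
}
```

The same `capacity` value spans an order of magnitude in bytes, which is exactly the gap a byte-unaware bound cannot close.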

Even with bounded channels, the process can still approach cgroup / RSS limits and be OOM-killed.

This is especially relevant in Kubernetes, where the collector runs under a cgroup memory limit: bounded internal queues do not prevent the OOM killer from firing, so reacting to actual process/cgroup memory usage is important.

Proposed Solution

Introduce a Phase 1 process-wide memory limiter for the Rust collector.

This would be a narrower, concrete next step that complements the broader framework discussion in #919, rather than replacing it.

Scope

Add a process-wide limiter that:

  • periodically samples real memory usage
  • maintains Normal / Soft / Hard pressure state
  • rejects new ingress while under pressure
  • optionally fails readiness on hard pressure
  • exposes telemetry so operators can see when the limiter is active
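One way the pressure states above could be modeled, as a hedged sketch (the enum, function names, and thresholds are assumptions for illustration, not existing collector code):

```rust
/// Sketch of the limiter's pressure states; names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Pressure {
    Normal,
    Soft,
    Hard,
}

/// Classify a sampled memory reading against configured limits.
fn classify(used_bytes: u64, soft_limit: u64, hard_limit: u64) -> Pressure {
    if used_bytes >= hard_limit {
        Pressure::Hard
    } else if used_bytes >= soft_limit {
        Pressure::Soft
    } else {
        Pressure::Normal
    }
}

/// New ingress is rejected whenever the limiter is under any pressure.
fn should_reject_ingress(p: Pressure) -> bool {
    p != Pressure::Normal
}

fn main() {
    // 900 MB used against 800 MB soft / 950 MB hard -> Soft pressure.
    let p = classify(900_000_000, 800_000_000, 950_000_000);
    println!("{:?} reject={}", p, should_reject_ingress(p));
}
```

A real implementation would likely add hysteresis so the state does not flap around a threshold, but the three-state shape is the core idea.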

Memory sources

Use the following priority order:

  1. cgroup working set, when available
  2. RSS
  3. jemalloc resident

This aligns the limiter with what the host/container environment actually cares about for OOM behavior.
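The fallback chain could look like the following sketch (types and function names are hypothetical; each `Option` stands in for a sampler that may be unavailable on a given platform):

```rust
/// Sketch of the memory-source priority order; names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemorySource {
    CgroupWorkingSet,
    Rss,
    JemallocResident,
}

/// Pick the first source that produced a sample:
/// cgroup working set -> RSS -> jemalloc resident.
fn pick_source(
    cgroup: Option<u64>,
    rss: Option<u64>,
    jemalloc: Option<u64>,
) -> Option<(MemorySource, u64)> {
    cgroup
        .map(|b| (MemorySource::CgroupWorkingSet, b))
        .or_else(|| rss.map(|b| (MemorySource::Rss, b)))
        .or_else(|| jemalloc.map(|b| (MemorySource::JemallocResident, b)))
}

fn main() {
    // No cgroup available (e.g. non-container host) -> falls back to RSS.
    println!("{:?}", pick_source(None, Some(512_000_000), Some(500_000_000)));
}
```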

Ingress behavior

When memory exceeds the configured soft limit:

  • reject new OTLP/gRPC requests early with ResourceExhausted
  • reject new OTLP/HTTP requests early with 503 Service Unavailable
  • allow in-flight work to drain naturally

When memory exceeds the hard limit:

  • continue rejecting new ingress
  • optionally fail readiness so Kubernetes / load balancers stop routing traffic
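The mapping from pressure to ingress responses could be sketched as follows (the `Pressure` enum and function names are assumptions; the gRPC numeric code 8 is the standard value for `RESOURCE_EXHAUSTED`):

```rust
/// Sketch only: how limiter pressure maps to ingress responses.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Pressure {
    Normal,
    Soft,
    Hard,
}

/// gRPC: None means accept; Some(8) is RESOURCE_EXHAUSTED.
fn grpc_status(p: Pressure) -> Option<u32> {
    match p {
        Pressure::Normal => None,
        Pressure::Soft | Pressure::Hard => Some(8),
    }
}

/// HTTP: 200 means accept; 503 is Service Unavailable.
fn http_status(p: Pressure) -> u16 {
    match p {
        Pressure::Normal => 200,
        Pressure::Soft | Pressure::Hard => 503,
    }
}

/// Readiness optionally fails only on hard pressure, so Kubernetes /
/// load balancers stop routing traffic while in-flight work drains.
fn ready(p: Pressure) -> bool {
    p != Pressure::Hard
}

fn main() {
    println!("{:?} {} {}", grpc_status(Pressure::Soft), http_status(Pressure::Soft), ready(Pressure::Soft));
}
```

Note that soft pressure already rejects new work but keeps the pod "ready", so already-routed traffic can drain naturally.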

Why ingress shedding

If the limiter only acts deeper in the pipeline, the collector has already accepted request state and often already allocated the memory we are trying to protect against.

Rejecting at ingress is the highest-value behavior because it prevents the collector from taking on additional memory load once it is already under pressure.

Why process-wide first

Phase 1 should be process-wide because the actual failure boundary is process / cgroup OOM.

Per-pipeline-group limits may be valuable later, but they are harder to enforce correctly because memory ownership is shared across:

  • allocators
  • runtime state
  • topics and shared buffers
  • connection/request handling
  • fragmentation and resident pages

So a process-wide limiter is the most reliable and highest-value first step.

Phase 1

Phase 1 could include:

  • process-wide limiter state
  • periodic memory sampling
  • configurable soft / hard limits
  • early ingress rejection for OTLP HTTP and gRPC
  • readiness failure on hard pressure
  • metrics/logging for limiter state and rejected requests
  • cgroup-aware limit derivation when appropriate
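As a rough shape for the configuration surface, a hedged sketch (every field name is illustrative, and the 80% / 90% fractions are placeholder defaults, not a proposal):

```rust
/// Hypothetical Phase 1 limiter configuration; field names are illustrative.
#[derive(Debug)]
struct MemoryLimiterConfig {
    sample_interval_ms: u64,
    soft_limit_bytes: u64,
    hard_limit_bytes: u64,
    fail_readiness_on_hard: bool,
}

/// Derive soft/hard limits as fractions of a cgroup memory limit,
/// e.g. soft at 80% and hard at 90% of the container limit.
fn derive_from_cgroup_limit(cgroup_limit_bytes: u64) -> MemoryLimiterConfig {
    MemoryLimiterConfig {
        sample_interval_ms: 1_000,
        soft_limit_bytes: cgroup_limit_bytes / 10 * 8,
        hard_limit_bytes: cgroup_limit_bytes / 10 * 9,
        fail_readiness_on_hard: true,
    }
}

fn main() {
    // A 1 GB container limit -> 800 MB soft / 900 MB hard.
    println!("{:?}", derive_from_cgroup_limit(1_000_000_000));
}
```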

Phase 2

Phase 2 could build on this with allocator- and environment-specific recovery behavior, for example:

  • jemalloc tuning / reclaim hooks on hard-pressure transitions
  • more detailed metrics and reason-specific rejection counters
  • per-signal or per-receiver priority policies
  • admin/debug visibility into current limiter state and sampled source
  • future exploration of per-pipeline-group budgeting if justified

Alternatives Considered

A few alternatives were considered:

1. Rely only on existing bounded queues / static queue sizing

This helps, but it is not sufficient.

Bounded queues and existing backpressure mechanisms limit some parts of the system, but they do not cap total process memory. They usually bound item count rather than byte size, and request/batch sizes can vary significantly, so the same queue configuration can produce very different memory footprints.

2. Implement a generalized limiter framework first

A broader limiter framework may eventually support memory, rate, and other resource-control policies in a unified way, and that direction is already being discussed in #919.

However, a dedicated memory limiter is still a worthwhile Phase 1 feature because it solves a concrete OOM-protection problem with much lower complexity and immediate operational value. It should not be viewed as a throwaway patch; several of its core pieces would still be useful if a broader limiter framework is introduced later.

3. Per-pipeline-group memory limits first

This may be useful later, but it is harder to make correct because actual memory ownership is process-wide and shared across allocators, runtime state, topics, and connection handling.

A process-wide limiter is a better first step because the actual failure boundary is process / cgroup OOM.

Additional Context

How this relates to #919

Issue #919 discusses resource control more broadly, including rate limiting and memory limiting as part of a larger framework.

This proposal is intended as a concrete next step in that direction:

  • narrow scope
  • high operational value
  • minimal architectural risk
  • directly addresses process OOM protection

In other words, this can be viewed as a practical Phase 1 memory-limiter implementation that complements the broader framework discussion in #919 rather than replacing it.

This should not be viewed as a throwaway patch. It is a useful Phase 1 feature that addresses a real OOM-protection gap today, while also providing pieces that can be reused or adapted if a broader limiter framework is implemented later.

Relevant Go collector prior art

One useful takeaway from the Go collector's memory_limiter processor is that memory-aware limiting is a real operational safeguard, not just a tuning feature: it periodically checks memory usage and starts refusing data once configured limits are exceeded.

Platform notes

While cgroup-aware behavior is especially useful on Linux/Kubernetes, the basic limiter concept is not Linux-only:

  • RSS-based limiting can be cross-platform
  • cgroup-aware behavior is an additional Linux/container-specific improvement
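On Linux, for instance, RSS can be read from `/proc/self/statm`, whose second field is the resident page count. A hedged sketch (the 4096-byte page size is an assumption here; a real implementation would query it at runtime):

```rust
/// Parse the resident set size from a /proc/<pid>/statm-formatted string.
/// The second whitespace-separated field is resident pages (see proc(5)).
fn parse_statm_rss_bytes(statm: &str, page_size: u64) -> Option<u64> {
    statm
        .split_whitespace()
        .nth(1)?
        .parse::<u64>()
        .ok()
        .map(|pages| pages * page_size)
}

/// Linux-only sampler sketch; returns None on platforms without procfs.
fn sample_rss_bytes() -> Option<u64> {
    let statm = std::fs::read_to_string("/proc/self/statm").ok()?;
    parse_statm_rss_bytes(&statm, 4096)
}

fn main() {
    println!("{:?}", sample_rss_bytes());
}
```

Other platforms would need their own samplers (e.g. via platform APIs), which is why the source-selection fallback matters.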
