
Add Process-wide memory-aware limiter for OOM protection #2425

@lalitb

Description

Pre-filing checklist

  • I searched existing issues and didn't find a duplicate

Component(s)

Rust OTAP dataflow (rust/otap-dataflow/)

Is your feature request related to a problem?

As an extension of the original resource-control discussion in #919, I think the Rust collector still lacks one important protection that the Go collector has today: a process-wide, memory-aware limiter.

The collector already has strong structural backpressure mechanisms such as bounded channels, topic queue limits, fanout max_inflight, and disk-budget controls. These are useful, but they are all structural. None of them react to actual process memory usage.

That matters because total memory is not determined only by queue capacities. It also includes:

  • request bodies and decode buffers
  • compression/decompression overhead
  • connection/request state
  • in-flight work outside queue accounting
  • allocator fragmentation / resident pages
  • thread stacks, runtime overhead, mmap/native allocations

It also depends on the size of each individual event or batch. Bounded queues usually cap the number of queued items, not the number of bytes they occupy.

For example, with a channel size of 128:

  • if each event is roughly 100 KB, the queue payload alone is about 12.8 MB
  • if each event is roughly 1 MB, the queue payload alone is about 128 MB

The same static queue configuration can therefore produce very different memory footprints depending on payload size, which is another reason a runtime memory-aware limiter is still needed.
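To make the arithmetic concrete, here is a minimal sketch (illustrative only; no collector types involved) of the worst-case payload held by a channel that bounds item count rather than bytes:

```rust
/// Illustrative only: worst-case queued payload for a channel that caps
/// the number of items, not the number of bytes they occupy.
fn queue_payload_bytes(capacity: usize, avg_event_bytes: usize) -> usize {
    capacity * avg_event_bytes
}

fn main() {
    // 128 slots of ~100 KB events -> ~12.8 MB of queued payload.
    println!("{}", queue_payload_bytes(128, 100_000));
    // 128 slots of ~1 MB events -> ~128 MB of queued payload.
    println!("{}", queue_payload_bytes(128, 1_000_000));
}
```

The same `capacity` value spans an order of magnitude in bytes, which is exactly the gap a byte-unaware bound cannot close.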

Even with bounded channels, the process can still approach cgroup / RSS limits and be OOM-killed.

This is especially relevant in Kubernetes, where the collector runs under a cgroup memory limit: bounded internal queues do not prevent the OOM killer from firing, so reacting to actual process/cgroup memory usage is important.

Proposed Solution

Introduce a Phase 1 process-wide memory limiter for the Rust collector.

This would be a narrower, concrete next step that complements the broader framework discussion in #919, rather than replacing it.

Scope

Add a process-wide limiter that:

  • periodically samples real memory usage
  • maintains Normal / Soft / Hard pressure state
  • rejects new ingress while under pressure
  • optionally fails readiness on hard pressure
  • exposes telemetry so operators can see when the limiter is active
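One way the pressure states above could be modeled, as a hedged sketch (the enum, function names, and thresholds are assumptions for illustration, not existing collector code):

```rust
/// Sketch of the limiter's pressure states; names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Pressure {
    Normal,
    Soft,
    Hard,
}

/// Classify a sampled memory reading against configured limits.
fn classify(used_bytes: u64, soft_limit: u64, hard_limit: u64) -> Pressure {
    if used_bytes >= hard_limit {
        Pressure::Hard
    } else if used_bytes >= soft_limit {
        Pressure::Soft
    } else {
        Pressure::Normal
    }
}

/// New ingress is rejected whenever the limiter is under any pressure.
fn should_reject_ingress(p: Pressure) -> bool {
    p != Pressure::Normal
}

fn main() {
    // 900 MB used against 800 MB soft / 950 MB hard -> Soft pressure.
    let p = classify(900_000_000, 800_000_000, 950_000_000);
    println!("{:?} reject={}", p, should_reject_ingress(p));
}
```

A real implementation would likely add hysteresis so the state does not flap around a threshold, but the three-state shape is the core idea.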

Memory sources

Use the following priority order:

  1. cgroup working set, when available
  2. RSS
  3. jemalloc resident

This aligns the limiter with what the host/container environment actually cares about for OOM behavior.
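The fallback chain could look like the following sketch (types and function names are hypothetical; each `Option` stands in for a sampler that may be unavailable on a given platform):

```rust
/// Sketch of the memory-source priority order; names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemorySource {
    CgroupWorkingSet,
    Rss,
    JemallocResident,
}

/// Pick the first source that produced a sample:
/// cgroup working set -> RSS -> jemalloc resident.
fn pick_source(
    cgroup: Option<u64>,
    rss: Option<u64>,
    jemalloc: Option<u64>,
) -> Option<(MemorySource, u64)> {
    cgroup
        .map(|b| (MemorySource::CgroupWorkingSet, b))
        .or_else(|| rss.map(|b| (MemorySource::Rss, b)))
        .or_else(|| jemalloc.map(|b| (MemorySource::JemallocResident, b)))
}

fn main() {
    // No cgroup available (e.g. non-container host) -> falls back to RSS.
    println!("{:?}", pick_source(None, Some(512_000_000), Some(500_000_000)));
}
```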

Ingress behavior

When memory exceeds the configured soft limit:

  • reject new OTLP/gRPC requests early with ResourceExhausted
  • reject new OTLP/HTTP requests early with 503 Service Unavailable
  • allow in-flight work to drain naturally

When memory exceeds the hard limit:

  • continue rejecting new ingress
  • optionally fail readiness so Kubernetes / load balancers stop routing traffic
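The mapping from pressure to ingress responses could be sketched as follows (the `Pressure` enum and function names are assumptions; the gRPC numeric code 8 is the standard value for `RESOURCE_EXHAUSTED`):

```rust
/// Sketch only: how limiter pressure maps to ingress responses.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Pressure {
    Normal,
    Soft,
    Hard,
}

/// gRPC: None means accept; Some(8) is RESOURCE_EXHAUSTED.
fn grpc_status(p: Pressure) -> Option<u32> {
    match p {
        Pressure::Normal => None,
        Pressure::Soft | Pressure::Hard => Some(8),
    }
}

/// HTTP: 200 means accept; 503 is Service Unavailable.
fn http_status(p: Pressure) -> u16 {
    match p {
        Pressure::Normal => 200,
        Pressure::Soft | Pressure::Hard => 503,
    }
}

/// Readiness optionally fails only on hard pressure, so Kubernetes /
/// load balancers stop routing traffic while in-flight work drains.
fn ready(p: Pressure) -> bool {
    p != Pressure::Hard
}

fn main() {
    println!("{:?} {} {}", grpc_status(Pressure::Soft), http_status(Pressure::Soft), ready(Pressure::Soft));
}
```

Note that soft pressure already rejects new work but keeps the pod "ready", so already-routed traffic can drain naturally.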

Why ingress shedding

If the limiter only acts deeper in the pipeline, the collector has already accepted request state and often already allocated the memory we are trying to protect against.

Rejecting at ingress is the highest-value behavior because it prevents the collector from taking on additional memory load once it is already under pressure.

Why process-wide first

Phase 1 should be process-wide because the actual failure boundary is process / cgroup OOM.

Per-pipeline-group limits may be valuable later, but they are harder to enforce correctly because memory ownership is shared across:

  • allocators
  • runtime state
  • topics and shared buffers
  • connection/request handling
  • fragmentation and resident pages

So a process-wide limiter is the most reliable and highest-value first step.

Phase 1

Phase 1 could include:

  • process-wide limiter state
  • periodic memory sampling
  • configurable soft / hard limits
  • early ingress rejection for OTLP HTTP and gRPC
  • readiness failure on hard pressure
  • metrics/logging for limiter state and rejected requests
  • cgroup-aware limit derivation when appropriate
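As a rough shape for the configuration surface, a hedged sketch (every field name is illustrative, and the 80% / 90% fractions are placeholder defaults, not a proposal):

```rust
/// Hypothetical Phase 1 limiter configuration; field names are illustrative.
#[derive(Debug)]
struct MemoryLimiterConfig {
    sample_interval_ms: u64,
    soft_limit_bytes: u64,
    hard_limit_bytes: u64,
    fail_readiness_on_hard: bool,
}

/// Derive soft/hard limits as fractions of a cgroup memory limit,
/// e.g. soft at 80% and hard at 90% of the container limit.
fn derive_from_cgroup_limit(cgroup_limit_bytes: u64) -> MemoryLimiterConfig {
    MemoryLimiterConfig {
        sample_interval_ms: 1_000,
        soft_limit_bytes: cgroup_limit_bytes / 10 * 8,
        hard_limit_bytes: cgroup_limit_bytes / 10 * 9,
        fail_readiness_on_hard: true,
    }
}

fn main() {
    // A 1 GB container limit -> 800 MB soft / 900 MB hard.
    println!("{:?}", derive_from_cgroup_limit(1_000_000_000));
}
```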

Phase 2

Phase 2 could build on this with allocator- and environment-specific recovery behavior, for example:

  • jemalloc tuning / reclaim hooks on hard-pressure transitions
  • more detailed metrics and reason-specific rejection counters
  • per-signal or per-receiver priority policies
  • admin/debug visibility into current limiter state and sampled source
  • future exploration of per-pipeline-group budgeting if justified

Alternatives Considered

A few alternatives were considered:

1. Rely only on existing bounded queues / static queue sizing

This helps, but it is not sufficient.

Bounded queues and existing backpressure mechanisms limit some parts of the system, but they do not cap total process memory. They usually bound item count rather than byte size, and request/batch sizes can vary significantly, so the same queue configuration can produce very different memory footprints.

2. Implement a generalized limiter framework first

A broader limiter framework may eventually support memory, rate, and other resource-control policies in a unified way, and that direction is already being discussed in #919.

However, a dedicated memory limiter is still a worthwhile Phase 1 feature because it solves a concrete OOM-protection problem with much lower complexity and immediate operational value. It should not be viewed as a throwaway patch; several of its core pieces would still be useful if a broader limiter framework is introduced later.

3. Per-pipeline-group memory limits first

This may be useful later, but it is harder to make correct because actual memory ownership is process-wide and shared across allocators, runtime state, topics, and connection handling.

A process-wide limiter is a better first step because the actual failure boundary is process / cgroup OOM.

Additional Context

How this relates to #919

Issue #919 discusses resource control more broadly, including rate limiting and memory limiting as part of a larger framework.

This proposal is intended as a concrete next step in that direction:

  • narrow scope
  • high operational value
  • minimal architectural risk
  • directly addresses process OOM protection

In other words, this can be viewed as a practical Phase 1 memory-limiter implementation that complements the broader framework discussion in #919 rather than replacing it.

This should not be viewed as a throwaway patch. It is a useful Phase 1 feature that addresses a real OOM-protection gap today, while also providing pieces that can be reused or adapted if a broader limiter framework is implemented later.

Relevant Go collector prior art

One useful takeaway from the Go collector's memory_limiter processor is that memory-aware limiting is a real operational safeguard, not just a tuning feature: it periodically checks memory usage and starts refusing data once configured limits are exceeded.

Platform notes

While cgroup-aware behavior is especially useful on Linux/Kubernetes, the basic limiter concept is not Linux-only:

  • RSS-based limiting can be cross-platform
  • cgroup-aware behavior is an additional Linux/container-specific improvement
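On Linux, for instance, RSS can be read from `/proc/self/statm`, whose second field is the resident page count. A hedged sketch (the 4096-byte page size is an assumption here; a real implementation would query it at runtime):

```rust
/// Parse the resident set size from a /proc/<pid>/statm-formatted string.
/// The second whitespace-separated field is resident pages (see proc(5)).
fn parse_statm_rss_bytes(statm: &str, page_size: u64) -> Option<u64> {
    statm
        .split_whitespace()
        .nth(1)?
        .parse::<u64>()
        .ok()
        .map(|pages| pages * page_size)
}

/// Linux-only sampler sketch; returns None on platforms without procfs.
fn sample_rss_bytes() -> Option<u64> {
    let statm = std::fs::read_to_string("/proc/self/statm").ok()?;
    parse_statm_rss_bytes(&statm, 4096)
}

fn main() {
    println!("{:?}", sample_rss_bytes());
}
```

Other platforms would need their own samplers (e.g. via platform APIs), which is why the source-selection fallback matters.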
