Description
Pre-filing checklist
- I searched existing issues and didn't find a duplicate
Component(s)
Rust OTAP dataflow (rust/otap-dataflow/)
Is your feature request related to a problem?
As an extension of the original resource-control discussion in #919, I think the Rust collector still lacks one important protection that the Go collector has today: a process-wide, memory-aware limiter.
The collector already has strong structural backpressure mechanisms such as bounded channels, topic queue limits, fanout max_inflight, and disk-budget controls. These are useful, but they are all structural. None of them react to actual process memory usage.
That matters because total memory is not determined only by queue capacities. It also includes:
- request bodies and decode buffers
- compression/decompression overhead
- connection/request state
- in-flight work outside queue accounting
- allocator fragmentation / resident pages
- thread stacks, runtime overhead, mmap/native allocations
It also depends on the size of each individual event or batch. Bounded queues usually cap the number of queued items, not the number of bytes they occupy.
For example, with a channel size of 128:
- if each event is roughly 100 KB, the queue payload alone is about 12.8 MB
- if each event is roughly 1 MB, the queue payload alone is about 128 MB
So the same static queue configuration can produce very different memory footprints depending on payload size, which is another reason a runtime memory-aware limiter is still needed.
Even with bounded channels, then, the process can still approach cgroup / RSS limits and get OOM-killed.
This is especially relevant in Kubernetes, where the collector runs under a cgroup memory limit and can be OOM-killed even when internal queues are bounded; reacting to actual process/cgroup memory usage is therefore important.
Proposed Solution
Introduce a Phase 1 process-wide memory limiter for the Rust collector.
This would be a narrower, concrete next step that complements the broader framework discussion in #919, rather than replacing it.
Scope
Add a process-wide limiter that:
- periodically samples real memory usage
- maintains Normal / Soft / Hard pressure state
- rejects new ingress while under pressure
- optionally fails readiness on hard pressure
- exposes telemetry so operators can see when the limiter is active
Memory sources
Use the following priority order:
- cgroup working set, when available
- RSS
- jemalloc resident
This aligns the limiter with what the host/container environment actually cares about for OOM behavior.
Ingress behavior
When memory exceeds the configured soft limit:
- reject new OTLP/gRPC requests early with ResourceExhausted
- reject new OTLP/HTTP requests early with 503 Service Unavailable
- allow in-flight work to drain naturally
When memory exceeds the hard limit:
- continue rejecting new ingress
- optionally fail readiness so Kubernetes / load balancers stop routing traffic
Why ingress shedding
If the limiter only acts deeper in the pipeline, the collector has already accepted request state and often already allocated the very memory the limiter is meant to protect.
Rejecting at ingress is the highest-value behavior because it prevents the collector from taking on additional memory load once it is already under pressure.
Why process-wide first
Phase 1 should be process-wide because the actual failure boundary is process / cgroup OOM.
Per-pipeline-group limits may be valuable later, but they are harder to enforce correctly because memory ownership is shared across:
- allocators
- runtime state
- topics and shared buffers
- connection/request handling
- fragmentation and resident pages
So a process-wide limiter is the most reliable and highest-value first step.
Phase 1
Phase 1 could include:
- process-wide limiter state
- periodic memory sampling
- configurable soft / hard limits
- early ingress rejection for OTLP HTTP and gRPC
- readiness failure on hard pressure
- metrics/logging for limiter state and rejected requests
- cgroup-aware limit derivation when appropriate
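For concreteness, a Phase 1 configuration surface might look like the fragment below. Every key name and default here is hypothetical, offered only to show the shape of the soft/hard-limit and readiness options:

```yaml
# Hypothetical configuration shape; key names are illustrative only.
memory_limiter:
  check_interval: 1s
  soft_limit_mib: 800        # start rejecting new ingress
  hard_limit_mib: 950        # keep rejecting; optionally fail readiness
  fail_readiness_on_hard: true
  # When running under a cgroup limit, soft/hard could instead be
  # derived as percentages of the detected cgroup memory limit.
```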
Phase 2
Phase 2 could build on this with allocator- and environment-specific recovery behavior, for example:
- jemalloc tuning / reclaim hooks on hard-pressure transitions
- more detailed metrics and reason-specific rejection counters
- per-signal or per-receiver priority policies
- admin/debug visibility into current limiter state and sampled source
- future exploration of per-pipeline-group budgeting if justified
Alternatives Considered
A few alternatives were considered:
1. Rely only on existing bounded queues / static queue sizing
This helps, but it is not sufficient.
Bounded queues and existing backpressure mechanisms limit some parts of the system, but they do not cap total process memory. They usually bound item count rather than byte size, and request/batch sizes can vary significantly, so the same queue configuration can produce very different memory footprints.
2. Implement a generalized limiter framework first
A broader limiter framework may eventually support memory, rate, and other resource-control policies in a unified way, and that direction is already being discussed in #919.
However, a dedicated memory limiter is still a worthwhile Phase 1 feature because it solves a concrete OOM-protection problem with much lower complexity and immediate operational value. It should not be viewed as a throwaway patch; several of its core pieces would still be useful if a broader limiter framework is introduced later.
3. Per-pipeline-group memory limits first
This may be useful later, but it is harder to make correct because actual memory ownership is process-wide and shared across allocators, runtime state, topics, and connection handling.
A process-wide limiter is a better first step because the actual failure boundary is process / cgroup OOM.
Additional Context
How this relates to #919
Issue #919 discusses resource-control more broadly, including rate limiting and memory limiting as part of a larger framework.
This proposal is intended as a concrete next step in that direction:
- narrow scope
- high operational value
- minimal architectural risk
- directly addresses process OOM protection
In other words, this can be viewed as a practical Phase 1 memory-limiter implementation that complements the broader framework discussion in #919 rather than replacing it.
This should not be viewed as a throwaway patch. It is a useful Phase 1 feature that addresses a real OOM-protection gap today, while also providing pieces that can be reused or adapted if a broader limiter framework is implemented later.
Relevant Go collector prior art
- Go memory limiter processor docs: go.opentelemetry.io/collector/processor/memorylimiterprocessor
- Go collector repo: open-telemetry/opentelemetry-collector
- Related discussion about applying limiting at receivers / ingress: Applying memory_limiter extension #9591
One useful takeaway from the Go collector is that memory-aware limiting is a real operational safeguard, not just a tuning feature.
Platform notes
While cgroup-aware behavior is especially useful on Linux/Kubernetes, the basic limiter concept is not Linux-only:
- RSS-based limiting can be cross-platform
- cgroup-aware behavior is an additional Linux/container-specific improvement