Here are 10 public repositories matching this topic.
LLM inference engine from scratch — paged KV cache, continuous batching, chunked prefill, prefix caching, speculative decoding, CUDA graph, tensor parallelism, MoE expert parallelism, OpenAI-compatible serving
Updated Mar 28, 2026 · Python
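Several of the techniques in that feature list recur across this topic. As a hedged illustration of the first one: a paged KV cache maps each sequence's logical token positions onto fixed-size physical blocks through a per-sequence block table, so memory is allocated on demand instead of reserved at maximum length up front. A minimal Python sketch of that bookkeeping (all names here are hypothetical, not taken from the repo):

```python
# Minimal paged KV cache sketch: logical token positions map to fixed-size
# physical blocks, so sequences grow without reserving max-length memory.
# All names are illustrative, not taken from any listed repo.

BLOCK_SIZE = 16  # tokens per physical block


class BlockAllocator:
    """Hands out physical block ids from a fixed pool and recycles freed ones."""

    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("KV cache pool exhausted")
        return self.free_blocks.pop()

    def free(self, block_id: int) -> None:
        self.free_blocks.append(block_id)


class Sequence:
    """Tracks one request's block table: logical block index -> physical block id."""

    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []
        self.num_tokens = 0

    def append_token(self) -> None:
        # Allocate a new physical block only when the last one fills up.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def release(self) -> None:
        for block_id in self.block_table:
            self.allocator.free(block_id)
        self.block_table.clear()


allocator = BlockAllocator(num_blocks=1024)
seq = Sequence(allocator)
for _ in range(40):        # 40 tokens -> ceil(40 / 16) = 3 blocks
    seq.append_token()
print(seq.block_table)     # three physical block ids, not one contiguous slab
seq.release()              # blocks return to the pool for other requests
```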
A High-Performance LLM Inference Engine with vLLM-Style Continuous Batching
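Continuous batching, the technique this and several other entries are built around, means the scheduler re-forms the batch at every decode iteration: finished requests exit immediately and waiting requests join mid-flight, rather than the whole batch finishing together. A minimal sketch of that loop, with `model_step` and the request fields as hypothetical stand-ins:

```python
# Continuous batching in a nutshell: the batch is rebuilt every iteration,
# so finished requests leave and waiting requests join mid-flight.
# `model_step` and the request attributes are hypothetical stand-ins.
from collections import deque


def serve(model_step, waiting: deque, max_batch: int = 8):
    running = []
    while waiting or running:
        # Admit new requests up to the batch budget.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())

        # One decode step for every running request; assume model_step
        # returns one (token, finished) pair per request.
        results = model_step(running)

        still_running = []
        for req, (token, finished) in zip(running, results):
            req.output.append(token)
            if finished:
                req.complete()        # stream/return the result immediately
            else:
                still_running.append(req)
        running = still_running       # freed slots are refilled next iteration
```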
gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling
Updated Mar 25, 2026 · Python
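Token throttling, as the title suggests, is an admission-control idea: cap the number of in-flight tokens so no pipeline stage is oversubscribed. The sketch below shows only that generic gate and makes no claim about gLLM's actual API:

```python
# Illustrative token-throttling gate: admit a request only while the total
# number of in-flight tokens stays under a global budget. Not gLLM's actual
# API; just the admission-control idea its title describes.
class TokenThrottle:
    def __init__(self, token_budget: int):
        self.token_budget = token_budget
        self.in_flight = 0

    def try_admit(self, request_tokens: int) -> bool:
        if self.in_flight + request_tokens > self.token_budget:
            return False          # hold the request in the waiting queue
        self.in_flight += request_tokens
        return True

    def release(self, request_tokens: int) -> None:
        self.in_flight -= request_tokens


gate = TokenThrottle(token_budget=4096)
assert gate.try_admit(1024)       # fits within the budget
assert not gate.try_admit(3584)   # would exceed the budget; throttled
gate.release(1024)
```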
Continuous batching for TTS — like vLLM, but for voice. Serve 10+ simultaneous text-to-speech requests on a single GPU.
Updated Mar 15, 2026 · Python
Fork of an OpenAI- and Anthropic-compatible server for Apple Silicon. Native MLX backend, 500+ tok/s. Run LLMs and vision-language models with continuous batching, MCP tool calling, and multimodal support.
Updated Mar 20, 2026 · Python
OpenAI-compatible server with continuous batching for MLX on Apple Silicon
Updated Dec 4, 2025 · Python
Updated Jun 19, 2024 · Jupyter Notebook
Adaptive LLM inference scheduler simulation — continuous batching, priority preemption, KV-cache routing, and speculative decoding in Python/asyncio.
Updated Mar 10, 2026 · Python
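The combination of asyncio with priority preemption suggests a scheduler that re-sorts runnable requests at every step, so an urgent arrival can jump ahead of in-progress low-priority work at step granularity. A toy, self-contained version of that idea (not this repo's code):

```python
# Toy asyncio scheduler with priority preemption: each "decode step", the
# highest-priority runnable request is served, and running requests are
# requeued after every step so a newer, more urgent arrival can overtake
# them. Purely illustrative; not taken from the listed repo.
import asyncio
import heapq


async def scheduler(queue: asyncio.Queue, steps_per_request: int = 3):
    runnable = []    # heap of (priority, arrival_order, name); lower runs first
    progress = {}
    order = 0
    while True:
        # Drain newly arrived requests into the priority heap.
        while not queue.empty():
            prio, name = queue.get_nowait()
            heapq.heappush(runnable, (prio, order, name))
            progress[name] = 0
            order += 1
        if not runnable:
            if queue.empty():
                return            # nothing left to do in this toy run
            continue
        prio, arrived, name = heapq.heappop(runnable)
        progress[name] += 1       # one "decode step" of work
        print(f"step {name} (prio {prio}): {progress[name]}/{steps_per_request}")
        if progress[name] < steps_per_request:
            heapq.heappush(runnable, (prio, arrived, name))  # requeue: preemptible
        await asyncio.sleep(0)    # yield so new arrivals can preempt next step


async def main():
    q = asyncio.Queue()
    q.put_nowait((1, "background"))
    q.put_nowait((0, "interactive"))   # lower number = higher priority
    await scheduler(q)                 # serves "interactive" to completion first


asyncio.run(main())
```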
PagedAttention + continuous batching inference engine prototype (Rust): paged KV cache management and dynamic scheduling.
Updated Mar 24, 2026 · Rust
Process batches of large language model tasks efficiently using multithreading in C++ for faster, more scalable LLM workflows.
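The same pattern in Python terms: worker threads drain a pool of independent tasks and results are collected as they complete. `run_llm_task` below is a hypothetical stand-in for whatever per-task call the repo's C++ workers would make:

```python
# Batch processing with a worker pool: submit many independent LLM tasks
# and collect results as they complete. `run_llm_task` is a hypothetical
# placeholder, not this repo's API.
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_llm_task(prompt: str) -> str:
    # Placeholder for an actual model or API call.
    return prompt.upper()


prompts = [f"task {i}" for i in range(32)]

with ThreadPoolExecutor(max_workers=8) as pool:
    futures = {pool.submit(run_llm_task, p): p for p in prompts}
    for fut in as_completed(futures):
        print(futures[fut], "->", fut.result())
```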
Improve this page
Add a description, image, and links to the continuous-batching topic page so that developers can more easily learn about it.
Add this topic to your repo
To associate your repository with the continuous-batching topic, visit your repo's landing page and select "manage topics."