
fix: lora-syncer gracefully handles adapter load failures without blocking#2702

Open
gyliu513 wants to merge 1 commit into kubernetes-sigs:main from gyliu513:sc

Conversation

@gyliu513
Contributor

@gyliu513 gyliu513 commented Mar 27, 2026

What type of PR is this?

What this PR does / why we need it:

Fixes lora-syncer error handling so that a single adapter load/unload failure does not block processing of remaining adapters. Previously, a permanently invalid adapter (e.g., LoRA rank exceeding vLLM's max) would cause misleading
error logs and prevent other valid adapters from being loaded on every reconcile cycle.
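The fix described above comes down to catching per-adapter failures inside the reconcile loop instead of letting one failure abort the pass. A minimal Python sketch of that pattern (function and adapter names here are illustrative, not the sidecar's actual code):

```python
def reconcile_adapters(adapters, load_fn):
    """Try to load every adapter; collect failures instead of aborting.

    `load_fn` stands in for the call that asks vLLM to load one adapter.
    A permanent failure for one adapter (e.g. LoRA rank exceeding vLLM's
    max) must not prevent the remaining adapters from being processed.
    """
    loaded, failed = [], {}
    for adapter in adapters:
        try:
            load_fn(adapter)
            loaded.append(adapter)
        except Exception as exc:
            failed[adapter] = str(exc)  # record the error and keep going
    return loaded, failed


def fake_load(adapter):
    # Simulates one permanently invalid adapter among valid ones.
    if adapter == "bad-rank-adapter":
        raise ValueError("LoRA rank 64 exceeds max_lora_rank 16")


loaded, failed = reconcile_adapters(
    ["adapter-a", "bad-rank-adapter", "adapter-b"], fake_load
)
print(loaded)   # ['adapter-a', 'adapter-b']
print(sorted(failed))  # ['bad-rank-adapter']
```

With this shape, a permanently invalid adapter produces one clear error entry per reconcile cycle while the valid adapters still load.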

Which issue(s) this PR fixes:

Fixes #584

/kind bug

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Mar 27, 2026
@netlify

netlify bot commented Mar 27, 2026

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: bfe17bb
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/69c6b3e6a99d8400087470b0
😎 Deploy Preview: https://deploy-preview-2702--gateway-api-inference-extension.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

@k8s-ci-robot k8s-ci-robot requested review from ahg-g and liu-cong March 27, 2026 16:44
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 27, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gyliu513
Once this PR has been reviewed and has the lgtm label, please assign ahg-g for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 27, 2026
@k8s-ci-robot
Contributor

Hi @gyliu513. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 27, 2026
@gyliu513
Contributor Author

/cc @kfswain

@k8s-ci-robot k8s-ci-robot requested a review from kfswain March 27, 2026 16:45
@kfswain
Collaborator

kfswain commented Mar 27, 2026

Thanks @gyliu513!

We've actually deprecated this feature. I really need to remove it.

https://docs.vllm.ai/en/stable/design/lora_resolver_plugins/ would be the suggested path forward, so you can mount a volume to your vLLM replicas and assume that the Lora adapter will exist.

Will you lmk if that works for you? If not, would love to find out why and see if we can solve those pain points

@gyliu513
Contributor Author

Thanks @kfswain for the context! Good to know this is being deprecated.

I'll take a look at the vLLM LoRA Resolver Plugin approach. A couple of questions as I evaluate it:

  1. Lazy vs preloading: The resolver plugin resolves adapters on-demand when a request arrives, which means the first request may hit cold-start latency. The lora-syncer preloads adapters upfront via ConfigMap. Is there a recommended way to handle this with the resolver plugin?

  2. EPP LoRA Affinity integration: The current EPP scheduling relies on vLLM's running_lora_adapters / waiting_lora_adapters / max_lora Prometheus metrics to do pod affinity scoring. Will these metrics still work correctly with the resolver plugin approach?

  3. Explicit unloading: lora-syncer supports ensureNotExist to actively unload adapters and free GPU memory. Is there an equivalent mechanism with the resolver plugin?

I can close this PR if the path forward is clear. Did you open any issues tracking the plan to remove lora-syncer? I'd be happy to contribute, thanks!


Labels

cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
kind/bug - Categorizes issue or PR as related to a bug.
needs-ok-to-test - Indicates a PR that requires an org member to verify it is safe to test.
size/M - Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

lora-syncer tool's error handling needs improvement

3 participants