fix: lora-syncer gracefully handles adapter load failures without blocking #2702
gyliu513 wants to merge 1 commit into kubernetes-sigs:main
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: gyliu513.
Hi @gyliu513. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test.
/cc @kfswain
Thanks @gyliu513! We've actually deprecated this feature. I really need to remove it. https://docs.vllm.ai/en/stable/design/lora_resolver_plugins/ would be the suggested path forward, so you can mount a volume to your vLLM replicas and assume that the LoRA adapter will exist. Will you let me know if that works for you? If not, I'd love to find out why and see if we can solve those pain points.
Thanks @kfswain for the context! Good to know this is being deprecated. I'll take a look at the vLLM LoRA Resolver Plugin approach. A couple of questions as I evaluate it:
I can close this PR if the path forward is clear. Have you opened any issues for the plan to remove lora-syncer? I'd be happy to contribute, thanks!
What type of PR is this?
What this PR does / why we need it:
Fixes lora-syncer error handling so that a single adapter load/unload failure does not block processing of remaining adapters. Previously, a permanently invalid adapter (e.g., LoRA rank exceeding vLLM's max) would cause misleading error logs and prevent other valid adapters from being loaded on every reconcile cycle.
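For reviewers, here is a minimal Python sketch of the behavior described above. It is an illustration only, not the actual lora-syncer code: `reconcile_adapters`, `fake_load`, and the adapter names are hypothetical, and the real sidecar talks to vLLM rather than calling a local function. The idea is that each adapter is attempted independently and failures are collected instead of aborting the loop.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def reconcile_adapters(
    adapters: Iterable[str],
    load: Callable[[str], None],
) -> Tuple[List[str], Dict[str, str]]:
    """Try every adapter; record failures instead of stopping at the first one.

    Hypothetical helper for illustration; not part of lora-syncer.
    """
    loaded: List[str] = []
    failed: Dict[str, str] = {}
    for adapter_id in adapters:
        try:
            load(adapter_id)
        except Exception as exc:
            # Remember why this adapter failed, then move on to the next one.
            failed[adapter_id] = str(exc)
            continue
        loaded.append(adapter_id)
    return loaded, failed


def fake_load(adapter_id: str) -> None:
    """Stand-in loader (assumption): rejects one permanently invalid adapter."""
    if adapter_id == "bad-rank":
        raise ValueError("LoRA rank exceeds the configured maximum")


loaded, failed = reconcile_adapters(["a", "bad-rank", "c"], fake_load)
print(loaded)
print(failed)
```

With this shape, the invalid adapter is reported once per cycle under its own name, and the adapters after it are still attempted.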
Which issue(s) this PR fixes:
Fixes #584
/kind bug
Does this PR introduce a user-facing change?: