{"id":630,"date":"2026-03-21T22:52:57","date_gmt":"2026-03-21T14:52:57","guid":{"rendered":"https:\/\/pa.yingzhi8.cn\/index.php\/2026\/03\/21\/providers-vllm\/"},"modified":"2026-03-21T23:08:57","modified_gmt":"2026-03-21T15:08:57","slug":"providers-vllm","status":"publish","type":"post","link":"https:\/\/pa.yingzhi8.cn\/index.php\/2026\/03\/21\/providers-vllm\/","title":{"rendered":"vLLM"},"content":{"rendered":"<h1>vLLM<\/h1>\n<p>vLLM can serve open-source (and some custom) models via an <strong>OpenAI-compatible<\/strong> HTTP API. OpenClaw can connect to vLLM using the <code>openai-completions<\/code> API.<\/p>\n<p>OpenClaw can also <strong>auto-discover<\/strong> available models from vLLM when you opt in with <code>VLLM_API_KEY<\/code> (any value works if your server doesn\u2019t enforce auth) and you do not define an explicit <code>models.providers.vllm<\/code> entry.<\/p>\n<h2>Quick start<\/h2>\n<ol>\n<li>Start vLLM with an OpenAI-compatible server.<\/li>\n<\/ol>\n<p>Your base URL should expose <code>\/v1<\/code> endpoints (e.g. <code>\/v1\/models<\/code>, <code>\/v1\/chat\/completions<\/code>). vLLM commonly runs on:<\/p>\n<ul>\n<li><code>http:\/\/127.0.0.1:8000\/v1<\/code><\/li>\n<\/ul>\n<ol start=\"2\">\n<li>Opt in (any value works if no auth is configured):<\/li>\n<\/ol>\n<pre><code>export VLLM_API_KEY=&quot;vllm-local&quot;\n<\/code><\/pre>\n<ol start=\"3\">\n<li>Select a model (replace with one of your vLLM model IDs):<\/li>\n<\/ol>\n<pre><code>{\n  agents: {\n    defaults: {\n      model: { primary: &quot;vllm\/your-model-id&quot; },\n    },\n  },\n}\n<\/code><\/pre>\n<h2>Model discovery (implicit provider)<\/h2>\n<p>When <code>VLLM_API_KEY<\/code> is set (or an auth profile exists) and you <strong>do not<\/strong> define <code>models.providers.vllm<\/code>, OpenClaw will query:<\/p>\n<ul>\n<li><code>GET http:\/\/127.0.0.1:8000\/v1\/models<\/code><\/li>\n<\/ul>\n<p>\u2026and convert the returned IDs into model entries.<\/p>\n<p>If you set <code>models.providers.vllm<\/code> explicitly, auto-discovery is skipped and you must define models manually.<\/p>\n<h2>Explicit configuration (manual models)<\/h2>\n<p>Use explicit config when:<\/p>\n<ul>\n<li>vLLM runs on a different host\/port.<\/li>\n<li>You want to pin <code>contextWindow<\/code>\/<code>maxTokens<\/code> values.<\/li>\n<li>Your server requires a real API key (or you want to control headers).<\/li>\n<\/ul>\n<pre><code>{\n  models: {\n    providers: {\n      vllm: {\n        baseUrl: &quot;http:\/\/127.0.0.1:8000\/v1&quot;,\n        apiKey: &quot;${VLLM_API_KEY}&quot;,\n        api: &quot;openai-completions&quot;,\n        models: [\n          {\n            id: &quot;your-model-id&quot;,\n            name: &quot;Local vLLM Model&quot;,\n            reasoning: false,\n            input: [&quot;text&quot;],\n            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },\n            contextWindow: 128000,\n            maxTokens: 8192,\n          },\n        ],\n      },\n    },\n  },\n}\n<\/code><\/pre>\n<h2>Troubleshooting<\/h2>\n<ul>\n<li>Check the server is reachable:<\/li>\n<\/ul>\n<pre><code>curl http:\/\/127.0.0.1:8000\/v1\/models\n<\/code><\/pre>\n<ul>\n<li>If requests fail with auth errors, set a real <code>VLLM_API_KEY<\/code> that matches your server configuration, or configure the provider explicitly under <code>models.providers.vllm<\/code>.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>vLLM vLLM can serve open-source (and some custom) model [&hellip;]<\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-630","post","type-post","status-publish","format-standard","hentry","category-docs"],"_links":{"self":[{"href":"https:\/\/pa.yingzhi8.cn\/index.php\/wp-json\/wp\/v2\/posts\/630","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pa.yingzhi8.cn\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/pa.yingzhi8.cn\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/pa.yingzhi8.cn\/index.php\/wp-json\/wp\/v2\/comments?post=630"}],"version-history":[{"count":2,"href":"https:\/\/pa.yingzhi8.cn\/index.php\/wp-json\/wp\/v2\/posts\/630\/revisions"}],"predecessor-version":[{"id":734,"href":"https:\/\/pa.yingzhi8.cn\/index.php\/wp-json\/wp\/v2\/posts\/630\/revisions\/734"}],"wp:attachment":[{"href":"https:\/\/pa.yingzhi8.cn\/index.php\/wp-json\/wp\/v2\/media?parent=630"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/pa.yingzhi8.cn\/index.php\/wp-json\/wp\/v2\/categories?post=630"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/pa.yingzhi8.cn\/index.php\/wp-json\/wp\/v2\/tags?post=630"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}