Skill details (on-site mirror, no comments)
Author: Alex Jones @alexsjones
License: MIT-0
MIT-0 · Free to use, modify, and redistribute. No attribution required.
Version: v0.2.2
Stats: ⭐ 1 · 662 · 6 current installs · 6 all-time installs
Package: llmfit
Security scan (ClawHub)
- VirusTotal: benign
- OpenClaw: suspicious
OpenClaw assessment
The skill's runtime instructions match its stated purpose (detect hardware and recommend local LLMs), but the install metadata is inconsistent and the brew/cargo install sources are unclear; verify both before installing or running the binary.
Purpose
The name and description match the runtime instructions: SKILL.md tells the agent to run the llmfit CLI to detect hardware and generate model recommendations. The required binary (llmfit) is consistent with that purpose, and no unrelated credentials or files are requested.
Instruction scope
The instructions are limited to running llmfit commands (`system`, `recommend`) and mapping the output to local providers (Ollama, vLLM, LM Studio). They do not direct the agent to read arbitrary system files or exfiltrate secrets. The skill suggests editing openclaw.json to configure models, which is consistent with its stated goal.
Install mechanism
The install metadata is inconsistent and potentially risky: SKILL.md and the registry list a Homebrew formula "AlexsJones/llmfit" (a third-party tap) plus a second install entry labeled "cargo install llmfit" but tagged kind: "node" (the registry also lists "node"). This mismatch (a node kind with a cargo label) is sloppy and prevents a clear review of the install source. Homebrew taps from unknown owners should be reviewed before use, because they…
Credentials
The skill requests no environment variables or credentials, which is proportionate for a local hardware-detection and recommendation tool.
Persistence
Always false; the skill does not request or automatically modify other skills' configuration. It only suggests user-driven edits to openclaw.json. Neither the skill metadata nor the instructions request privileges or a persistent presence.
Install (copy to 龙虾 AI)
Copy the entire block below into a 龙虾中文库 conversation, and 龙虾 will complete the installation per SKILL.md.
Please hand this block to 龙虾中文库 (龙虾 AI) to execute: install the OpenClaw skill "llmfit" on this machine. Summary: detects local hardware (RAM, CPU, GPU/VRAM) and recommends the best-fit local LLM models with optimal quantization, speed estimates, and fit scoring.
Please fetch the following URL, read SKILL.md, and complete the installation as documented: https://raw.githubusercontent.com/openclaw/skills/refs/heads/main/skills/alexsjones/llmfit/SKILL.md
(Source: yingzhi8.cn skill library)
SKILL.md
---
name: llmfit-advisor
description: Detect local hardware (RAM, CPU, GPU/VRAM) and recommend the best-fit local LLM models with optimal quantization, speed estimates, and fit scoring.
metadata:
  {
    "openclaw": {
      "emoji": "🧠",
      "requires": { "bins": ["llmfit"] },
      "install": [
        {
          "id": "brew",
          "kind": "brew",
          "formula": "AlexsJones/llmfit",
          "bins": ["llmfit"],
          "label": "Install llmfit (brew tap AlexsJones/llmfit && brew install llmfit)",
        },
        {
          "id": "cargo",
          "kind": "node",
          "bins": ["llmfit"],
          "label": "Install llmfit (cargo install llmfit)",
        },
      ],
    },
  }
---
# llmfit-advisor
Hardware-aware local LLM advisor. Detects your system specs (RAM, CPU, GPU/VRAM) and recommends models that actually fit, with optimal quantization and speed estimates.
## When to use (trigger phrases)
Use this skill immediately when the user asks any of:
- "what local models can I run?"
- "which LLMs fit my hardware?"
- "recommend a local model"
- "what's the best model for my GPU?"
- "can I run Llama 70B locally?"
- "configure local models"
- "set up Ollama models"
- "what models fit my VRAM?"
- "help me pick a local model for coding"
Also use this skill when:
- The user wants to configure `models.providers.ollama` or `models.providers.lmstudio`
- The user mentions running models locally and you need to know what fits
- A model recommendation is needed and the user has local inference capability (Ollama, vLLM, LM Studio)
## Quick start
### Detect hardware
```bash
llmfit --json system
```
Returns JSON with CPU, RAM, GPU name, VRAM, multi-GPU info, and whether memory is unified (Apple Silicon).
### Get top recommendations
```bash
llmfit recommend --json --limit 5
```
Returns the top 5 models ranked by a composite score (quality, speed, fit, context) with optimal quantization for the detected hardware.
### Filter by use case
```bash
llmfit recommend --json --use-case coding --limit 3
llmfit recommend --json --use-case reasoning --limit 3
llmfit recommend --json --use-case chat --limit 3
```
Valid use cases: `general`, `coding`, `reasoning`, `chat`, `multimodal`, `embedding`.
### Filter by minimum fit level
```bash
llmfit recommend --json --min-fit good --limit 10
```
Valid fit levels (best to worst): `perfect`, `good`, `marginal`.
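These filters can be combined; a quick sketch (assuming the flags compose, which the examples above imply but never show together):
```bash
# Top coding models that fit comfortably on the detected hardware
llmfit recommend --json --use-case coding --min-fit good --limit 3
```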
## Understanding the output
### System JSON
```json
{
  "system": {
    "cpu_name": "Apple M2 Max",
    "cpu_cores": 12,
    "total_ram_gb": 32.0,
    "available_ram_gb": 24.5,
    "has_gpu": true,
    "gpu_name": "Apple M2 Max",
    "gpu_vram_gb": 32.0,
    "gpu_count": 1,
    "backend": "Metal",
    "unified_memory": true
  }
}
```
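If `jq` is installed, individual fields can be extracted straight from this output; a minimal sketch using the field names from the example above:
```bash
# Pull out the detected VRAM and unified-memory flag for a quick capability check
llmfit --json system | jq '.system | {gpu_vram_gb, unified_memory}'
```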
### Recommendation JSON
Each model in the `models` array includes:
| Field | Meaning |
|---|---|
| `name` | HuggingFace model ID (e.g. `meta-llama/Llama-3.1-8B-Instruct`) |
| `provider` | Model provider (Meta, Alibaba, Google, etc.) |
| `params_b` | Parameter count in billions |
| `score` | Composite score 0–100 (higher is better) |
| `score_components` | Breakdown: `quality`, `speed`, `fit`, `context` (each 0–100) |
| `fit_level` | `Perfect`, `Good`, `Marginal`, or `TooTight` |
| `run_mode` | `GPU`, `CPU+GPU Offload`, or `CPU Only` |
| `best_quant` | Optimal quantization for the hardware (e.g. `Q5_K_M`, `Q4_K_M`) |
| `estimated_tps` | Estimated tokens per second |
| `memory_required_gb` | VRAM/RAM needed at this quantization |
| `memory_available_gb` | Available VRAM/RAM detected |
| `utilization_pct` | How much of available memory the model uses |
| `use_case` | What the model is designed for |
| `context_length` | Maximum context window |
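For orientation, here is a hypothetical single-entry response illustrating the fields above (all values invented; `utilization_pct` appears to be `memory_required_gb / memory_available_gb × 100`):
```json
{
  "models": [
    {
      "name": "Qwen/Qwen2.5-Coder-7B-Instruct",
      "provider": "Alibaba",
      "params_b": 7.6,
      "score": 84,
      "score_components": { "quality": 78, "speed": 90, "fit": 95, "context": 72 },
      "fit_level": "Perfect",
      "run_mode": "GPU",
      "best_quant": "Q5_K_M",
      "estimated_tps": 42.0,
      "memory_required_gb": 6.3,
      "memory_available_gb": 32.0,
      "utilization_pct": 19.7,
      "use_case": "coding",
      "context_length": 32768
    }
  ]
}
```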
### Fit levels explained
- **Perfect**: Model fits comfortably with room to spare. Ideal choice.
- **Good**: Model fits but uses most available memory. Will work well.
- **Marginal**: Model barely fits. May work but expect slower performance or reduced context.
- **TooTight**: Model does not fit. Do not recommend.
### Run modes explained
- **GPU**: Full GPU inference. Fastest. Model weights loaded entirely into VRAM.
- **CPU+GPU Offload**: Some layers on GPU, rest in system RAM. Slower than pure GPU.
- **CPU Only**: All inference on CPU using system RAM. Slowest but works without GPU.
## Configuring OpenClaw with results
After getting recommendations, configure the user's local model provider.
### For Ollama
Map the HuggingFace model name to its Ollama tag. Common mappings:
| llmfit name | Ollama tag |
|---|---|
| `meta-llama/Llama-3.1-8B-Instruct` | `llama3.1:8b` |
| `meta-llama/Llama-3.3-70B-Instruct` | `llama3.3:70b` |
| `Qwen/Qwen2.5-Coder-7B-Instruct` | `qwen2.5-coder:7b` |
| `Qwen/Qwen2.5-72B-Instruct` | `qwen2.5:72b` |
| `deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct` | `deepseek-coder-v2:16b` |
| `deepseek-ai/DeepSeek-R1-Distill-Qwen-32B` | `deepseek-r1:32b` |
| `google/gemma-2-9b-it` | `gemma2:9b` |
| `mistralai/Mistral-7B-Instruct-v0.3` | `mistral:7b` |
| `microsoft/Phi-3-mini-4k-instruct` | `phi3:mini` |
| `microsoft/Phi-4-mini-instruct` | `phi4-mini` |
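Once a tag is chosen, it can be pulled with the standard Ollama CLI so the model is available locally, for example:
```bash
# Download the mapped model before wiring it into openclaw.json
ollama pull qwen2.5-coder:7b
```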
Then update `openclaw.json`:
```json
{
  "models": {
    "providers": {
      "ollama": {
        "models": ["ollama/<ollama-tag>"]
      }
    }
  }
}
```
And optionally set as default:
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/<ollama-tag>"
      }
    }
  }
}
```
### For vLLM / LM Studio
Use the HuggingFace model name directly as the model identifier with the appropriate provider prefix (`vllm/` or `lmstudio/`).
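A minimal sketch, assuming the `providers` block takes the same shape as the Ollama example above (the exact per-provider config shape is an assumption):
```json
{
  "models": {
    "providers": {
      "lmstudio": {
        "models": ["lmstudio/Qwen/Qwen2.5-Coder-7B-Instruct"]
      }
    }
  }
}
```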
## Workflow example
When a user asks "what local models can I run?":
1. Run `llmfit --json system` to show hardware summary
2. Run `llmfit recommend --json --limit 5` to get top picks
3. Present the recommendations with scores and fit levels
4. If the user wants to configure one, map it to the appropriate Ollama/vLLM/LM Studio tag
5. Offer to update `openclaw.json` with the chosen model
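Steps 1–3 can be combined into a short script; a sketch assuming `jq` is available and the output shapes shown earlier:
```bash
# Summarize the hardware, then list the top picks with score, fit, and quantization
llmfit --json system | jq '.system | {cpu_name, total_ram_gb, gpu_name, gpu_vram_gb}'
llmfit recommend --json --limit 5 \
  | jq -r '.models[] | "\(.name)  score=\(.score)  fit=\(.fit_level)  quant=\(.best_quant)"'
```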
When a user asks for a specific use case like "recommend a coding model":
1. Run `llmfit recommend --json --use-case coding --limit 3`
2. Present the coding-specific recommendations
3. Offer to pull via Ollama and configure
## Notes
- llmfit detects NVIDIA GPUs (via nvidia-smi), AMD GPUs (via rocm-smi), and Apple Silicon (unified memory).
- Multi-GPU setups aggregate VRAM across cards automatically.
- The `best_quant` field tells you the optimal quantization — higher quant (Q6_K, Q8_0) means better quality if VRAM allows.
- Speed estimates (`estimated_tps`) are approximate and vary by hardware and quantization.
- Models with `fit_level: "TooTight"` should never be recommended to users.