OpenClaw

Skill details (on-site mirror, no comments)


Terminal tool that detects your hardware and recommends which LLM models will actually run well on your system

AI & Large Models

License: MIT-0

MIT-0 · Free to use, modify, and redistribute. No attribution required.

Version: v1.0.0

Stats: ⭐ 0 · 39 · 1 current install · 1 all-time install

Package: adisinghstudent/llmfit-hardware-model-matcher

Security scan (ClawHub)

  • VirusTotal: Suspicious
  • OpenClaw: Suspicious

OpenClaw evaluation

The skill's purpose and runtime instructions are coherent, but its provenance and install instructions are inconsistent: it tells users to run a remote install script (curl | sh) and to start a network service, both of which are risky without verification.

Purpose

The name/description (detect hardware and recommend runnable LLMs) match the SKILL.md commands and features (system detection, fit/recommend/plan, provider runtimes). There are no unexpected credential or file-access demands for this functionality.

Instruction scope

Instructions stay within the stated purpose (detect hardware, score models, run a local API). However, the SKILL.md instructs running a local HTTP server (llmfit serve --host 0.0.0.0), which exposes a service to the network, and the tool needs to probe hardware (nvidia-smi, system info). Both are expected for the feature but carry operational risk and should be run only after verifying the binary/source.
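
If the API is only needed locally, binding to loopback avoids exposing it to the network; a minimal sketch using the serve flags documented in the SKILL.md below:

```sh
# Bind the REST API to loopback only, so it is unreachable from other hosts
llmfit serve --host 127.0.0.1 --port 8787
```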

Install mechanism

The file contains multiple install paths: brew, scoop, Docker, git clone, and a quick-install curl -fsSL https://llmfit.axjns.dev/install.sh | sh (pipe-to-shell). The curl|sh approach pulls and executes a remote script from an unfamiliar domain (llmfit.axjns.dev), which is high-risk. Docker image and GitHub repo references (ghcr.io/alexsjones and github.com/AlexsJones) appear, but the skill metadata lists no homepage/source and elsewhere r…
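
A more cautious alternative to pipe-to-shell is to download the script, read it, and only then run it; a sketch assuming the same script URL from the SKILL.md:

```sh
# Save the installer to a file instead of piping it straight into sh
curl -fsSL -o install.sh https://llmfit.axjns.dev/install.sh
# Inspect what it actually does before executing
less install.sh
# Run it once reviewed; --local installs to ~/.local/bin per the SKILL.md
sh install.sh --local
```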

Credentials

The skill declares no required environment variables or credentials. The SKILL.md mentions an optional OLLAMA_CONTEXT_LENGTH env var and hardware overrides; nothing requests unrelated secrets or broad credential access.

Persistence

always:false (good). The skill suggests running a long-lived REST server bound to 0.0.0.0, which can expose local machine services if started; this is an operational risk rather than a policy or privilege mis-declaration. Autonomous invocation is allowed (the default) but is not combined with other broad privileges here.

Install (copy to 龙虾 AI)

Copy the entire block below into a 龙虾中文库 conversation; 龙虾 will follow SKILL.md to complete the installation.

Please hand this block to 龙虾中文库 (龙虾 AI) to execute: install the OpenClaw skill "llmfit-hardware-model-matcher" on this machine. Description: Terminal tool that detects your hardware and recommends which LLM models will a…
Please fetch the following URL, read SKILL.md, and complete the installation as documented: https://raw.githubusercontent.com/openclaw/skills/refs/heads/main/skills/adisinghstudent/llmfit-hardware-model-matcher/SKILL.md
(Source: yingzhi8.cn skill library)

SKILL.md

Open the original SKILL.md (GitHub raw)

---
name: llmfit-hardware-model-matcher
description: Terminal tool that detects your hardware and recommends which LLM models will actually run well on your system
triggers:
  - "find LLM models that fit my hardware"
  - "which AI models can I run locally"
  - "recommend models for my GPU RAM"
  - "check if a model will run on my machine"
  - "llmfit model recommendations"
  - "local LLM hardware compatibility"
  - "what LLM fits my system specs"
  - "score models for my computer"
---

# llmfit Hardware Model Matcher

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

llmfit detects your system's RAM, CPU, and GPU then scores hundreds of LLM models across quality, speed, fit, and context dimensions — telling you exactly which models will run well on your hardware. It ships with an interactive TUI and a CLI, supports multi-GPU, MoE architectures, dynamic quantization, and local runtime providers (Ollama, llama.cpp, MLX, Docker Model Runner).

---

## Installation

### macOS / Linux (Homebrew)
```sh
brew install llmfit
```

### Quick install script
```sh
curl -fsSL https://llmfit.axjns.dev/install.sh | sh

# Without sudo, installs to ~/.local/bin
curl -fsSL https://llmfit.axjns.dev/install.sh | sh -s -- --local
```

### Windows (Scoop)
```sh
scoop install llmfit
```

### Docker / Podman
```sh
docker run ghcr.io/alexsjones/llmfit

# With jq for scripting
podman run ghcr.io/alexsjones/llmfit recommend --use-case coding | jq '.models[].name'
```

### From source (Rust)
```sh
git clone https://github.com/AlexsJones/llmfit.git
cd llmfit
cargo build --release
# binary at target/release/llmfit
```

---

## Core Concepts

- **Fit tiers**: `perfect` (runs great), `good` (runs well), `marginal` (runs but tight), `too_tight` (won't run)
- **Scoring dimensions**: quality, speed (tok/s estimate), fit (memory headroom), context capacity (see the sketch after this list)
- **Run modes**: GPU, CPU+GPU offload, CPU-only, MoE
- **Quantization**: automatically selects best quant (e.g. Q4_K_M, Q5_K_S, mlx-4bit) for your hardware
- **Providers**: Ollama, llama.cpp, MLX, Docker Model Runner
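
A quick way to see these concepts on live output is to dump one recommendation as JSON; a sketch where the field names (`name`, `fit`, `score`, `quantization`, `tps`) are assumptions taken from the jq paths used in the scripting examples later in this document:

```sh
# Show the top recommendation with its fit tier, score, quant, and speed estimate
llmfit recommend --json --limit 1 |
  jq '.models[0] | {name, fit, score, quantization, tps}'
```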

---

## Key Commands

### Launch Interactive TUI
```sh
llmfit
```

### CLI Table Output
```sh
llmfit --cli
```

### Show System Hardware Detection
```sh
llmfit system
llmfit --json system   # JSON output
```

### List All Models
```sh
llmfit list
```

### Search Models
```sh
llmfit search "llama 8b"
llmfit search "mistral"
llmfit search "qwen coding"
```

### Fit Analysis
```sh
# All runnable models ranked by fit
llmfit fit

# Only perfect fits, top 5
llmfit fit --perfect -n 5

# JSON output
llmfit --json fit -n 10
```

### Model Detail
```sh
llmfit info "Mistral-7B"
llmfit info "Llama-3.1-70B"
```

### Recommendations
```sh
# Top 5 recommendations (JSON default)
llmfit recommend --json --limit 5

# Filter by use case: general, coding, reasoning, chat, multimodal, embedding
llmfit recommend --json --use-case coding --limit 3
llmfit recommend --json --use-case reasoning --limit 5
```

### Hardware Planning (invert: what hardware do I need?)
```sh
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --quant mlx-4bit
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --target-tps 25 --json
llmfit plan "Qwen/Qwen2.5-Coder-0.5B-Instruct" --context 8192 --json
```

### REST API Server (for cluster scheduling)
```sh
llmfit serve
llmfit serve --host 0.0.0.0 --port 8787
```

---

## Hardware Overrides

When autodetection fails (VMs, broken nvidia-smi, passthrough setups):

```sh
# Override GPU VRAM
llmfit --memory=32G
llmfit --memory=24G --cli
llmfit --memory=24G fit --perfect -n 5
llmfit --memory=24G recommend --json

# Megabytes
llmfit --memory=32000M

# Works with any subcommand
llmfit --memory=16G info "Llama-3.1-70B"
```

Accepted suffixes: `G`/`GB`/`GiB`, `M`/`MB`/`MiB`, `T`/`TB`/`TiB` (case-insensitive).

### Context Length Cap
```sh
# Estimate memory fit at 4K context
llmfit --max-context 4096 --cli

# With subcommands
llmfit --max-context 8192 fit --perfect -n 5
llmfit --max-context 16384 recommend --json --limit 5

# Environment variable alternative
export OLLAMA_CONTEXT_LENGTH=8192
llmfit recommend --json
```

---

## REST API Reference

Start the server:
```sh
llmfit serve --host 0.0.0.0 --port 8787
```

### Endpoints

```sh
# Health check
curl http://localhost:8787/health

# Node hardware info
curl http://localhost:8787/api/v1/system

# Full model list with filters
curl "http://localhost:8787/api/v1/models?min_fit=marginal&runtime=llamacpp&sort=score&limit=20"

# Top runnable models for this node (key scheduling endpoint)
curl "http://localhost:8787/api/v1/models/top?limit=5&min_fit=good&use_case=coding"

# Search by model name/provider
curl "http://localhost:8787/api/v1/models/Mistral?runtime=any"
```

### Query Parameters for `/models` and `/models/top`

| Param | Values | Description |
|---|---|---|
| `limit` / `n` | integer | Max rows returned |
| `min_fit` | `perfect` \| `good` \| `marginal` \| `too_tight` | Minimum fit tier |
| `perfect` | `true` \| `false` | Force perfect-only |
| `runtime` | `any` \| `mlx` \| `llamacpp` | Filter by runtime |
| `use_case` | `general` \| `coding` \| `reasoning` \| `chat` \| `multimodal` \| `embedding` | Use case filter |
| `provider` | string | Substring match on provider |
| `search` | string | Free-text across name/provider/size/use-case |
| `sort` | `score` \| `tps` \| `params` \| `mem` \| `ctx` \| `date` \| `use_case` | Sort column |
| `include_too_tight` | `true` \| `false` | Include non-runnable models |
| `max_context` | integer | Per-request context cap |
---

## Scripting & Automation Examples

### Bash: Get top coding models as JSON
```bash
#!/bin/bash
# Get top 3 coding models that fit perfectly
llmfit recommend --json --use-case coding --limit 3 | 
  jq -r '.models[] | "\(.name) (\(.score)) - \(.quantization)"'
```

### Bash: Check if a specific model fits
```bash
#!/bin/bash
MODEL="Mistral-7B"
RESULT=$(llmfit info "$MODEL" --json 2>/dev/null)
FIT=$(echo "$RESULT" | jq -r '.fit')
if [[ "$FIT" == "perfect" || "$FIT" == "good" ]]; then
  echo "$MODEL will run well (fit: $FIT)"
else
  echo "$MODEL may not run well (fit: $FIT)"
fi
```

### Bash: Auto-pull top Ollama model
```bash
#!/bin/bash
# Get the top fitting model name and pull it with Ollama
TOP_MODEL=$(llmfit recommend --json --limit 1 | jq -r '.models[0].name')
echo "Pulling: $TOP_MODEL"
ollama pull "$TOP_MODEL"
```

### Python: Query the REST API
```python
import requests

BASE_URL = "http://localhost:8787"

def get_system_info():
    resp = requests.get(f"{BASE_URL}/api/v1/system")
    return resp.json()

def get_top_models(use_case="coding", limit=5, min_fit="good"):
    params = {
        "use_case": use_case,
        "limit": limit,
        "min_fit": min_fit,
        "sort": "score"
    }
    resp = requests.get(f"{BASE_URL}/api/v1/models/top", params=params)
    return resp.json()

def search_models(query, runtime="any"):
    resp = requests.get(
        f"{BASE_URL}/api/v1/models/{query}",
        params={"runtime": runtime}
    )
    return resp.json()

# Example usage
system = get_system_info()
print(f"GPU: {system.get('gpu_name')} | VRAM: {system.get('vram_gb')}GB")

models = get_top_models(use_case="reasoning", limit=3)
for m in models.get("models", []):
    print(f"{m['name']}: score={m['score']}, fit={m['fit']}, quant={m['quantization']}")
```

### Python: Hardware-aware model selector for agents
```python
import subprocess
import json

def get_best_model_for_task(use_case: str, min_fit: str = "good") -> dict:
    """Use llmfit to select the best model for a given task."""
    result = subprocess.run(
        ["llmfit", "recommend", "--json", "--use-case", use_case, "--limit", "1"],
        capture_output=True,
        text=True
    )
    data = json.loads(result.stdout)
    models = data.get("models", [])
    return models[0] if models else None

def plan_hardware_requirements(model_name: str, context: int = 4096) -> dict:
    """Get hardware requirements for running a specific model."""
    result = subprocess.run(
        ["llmfit", "plan", model_name, "--context", str(context), "--json"],
        capture_output=True,
        text=True
    )
    return json.loads(result.stdout)

# Select best coding model
best = get_best_model_for_task("coding")
if best:
    print(f"Best coding model: {best['name']}")
    print(f"  Quantization: {best['quantization']}")
    print(f"  Estimated tok/s: {best['tps']}")
    print(f"  Memory usage: {best['mem_pct']}%")

# Plan hardware for a specific model
plan = plan_hardware_requirements("Qwen/Qwen3-4B-MLX-4bit", context=8192)
print(f"Min VRAM needed: {plan['hardware']['min_vram_gb']}GB")
print(f"Recommended VRAM: {plan['hardware']['recommended_vram_gb']}GB")
```

### Docker Compose: Node scheduler pattern
```yaml
version: "3.8"
services:
  llmfit-api:
    image: ghcr.io/alexsjones/llmfit
    command: serve --host 0.0.0.0 --port 8787
    ports:
      - "8787:8787"
    environment:
      - OLLAMA_CONTEXT_LENGTH=8192
    devices:
      - /dev/nvidia0:/dev/nvidia0  # pass GPU through
```
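
With the compose file above in the current directory, the `/health` endpoint from the API reference doubles as a readiness check:

```sh
docker compose up -d llmfit-api
# Confirm the node is answering before pointing a scheduler at it
curl -fsS http://localhost:8787/health
```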

---

## TUI Key Reference

| Key | Action |
|---|---|
| `↑`/`↓` or `j`/`k` | Navigate models |
| `/` | Search (name, provider, params, use case) |
| `Esc`/`Enter` | Exit search |
| `Ctrl-U` | Clear search |
| `f` | Cycle fit filter: All → Runnable → Perfect → Good → Marginal |
| `a` | Cycle availability: All → GGUF Avail → Installed |
| `s` | Cycle sort: Score → Params → Mem% → Ctx → Date → Use Case |
| `t` | Cycle color theme (auto-saved) |
| `v` | Visual mode (multi-select for comparison) |
| `V` | Select mode (column-based filtering) |
| `p` | Plan mode (what hardware needed for this model?) |
| `P` | Provider filter popup |
| `U` | Use-case filter popup |
| `C` | Capability filter popup |
| `m` | Mark model for comparison |
| `c` | Compare view (marked vs selected) |
| `d` | Download model (via detected runtime) |
| `r` | Refresh installed models from runtimes |
| `Enter` | Toggle detail view |
| `g`/`G` | Jump to top/bottom |
| `q` | Quit |

### Themes
`t` cycles: Default → Dracula → Solarized → Nord → Monokai → Gruvbox  
Theme saved to `~/.config/llmfit/theme`

---

## GPU Detection Details

| GPU Vendor | Detection Method |
|---|---|
| NVIDIA | `nvidia-smi` (multi-GPU, aggregates VRAM) |
| AMD | `rocm-smi` |
| Intel Arc | sysfs (discrete) / `lspci` (integrated) |
| Apple Silicon | `system_profiler` (unified memory = VRAM) |
| Ascend | `npu-smi` |
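
When detection looks wrong, it can help to run the vendor tool llmfit relies on directly and compare; a sketch (each command exists only on its matching platform):

```sh
# NVIDIA: per-GPU name and total VRAM
nvidia-smi --query-gpu=name,memory.total --format=csv

# AMD (ROCm): VRAM info
rocm-smi --showmeminfo vram

# Apple Silicon: unified memory size
system_profiler SPHardwareDataType | grep Memory
```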

---

## Common Patterns

### "What can I run on my 16GB M2 Mac?"
```sh
llmfit fit --perfect -n 10
# or interactively
llmfit
# press 'f' to filter to Perfect fit
```

### "I have a 3090 (24GB VRAM), what coding models fit?"
```sh
llmfit recommend --json --use-case coding | jq '.models[]'
# or with manual override if detection fails
llmfit --memory=24G recommend --json --use-case coding
```

### "Can Llama 70B run on my machine?"
```sh
llmfit info "Llama-3.1-70B"
# Plan what hardware you'd need
llmfit plan "Llama-3.1-70B" --context 4096 --json
```

### "Show me only models already installed in Ollama"
```sh
llmfit
# press 'a' to cycle to Installed filter
# or
llmfit fit -n 20  # run, press 'i' in TUI for installed-first
```

### "Script: find best model and start Ollama"
```bash
MODEL=$(llmfit recommend --json --limit 1 | jq -r '.models[0].name')
ollama serve &
ollama run "$MODEL"
```

### "API: poll node capabilities for cluster scheduler"
```bash
# Check node, get top 3 good+ models for reasoning
curl -s "http://node1:8787/api/v1/models/top?limit=3&min_fit=good&use_case=reasoning" | 
  jq '.models[].name'
```

---

## Troubleshooting

**GPU not detected / wrong VRAM reported**
```sh
# Verify detection
llmfit system

# Manual override
llmfit --memory=24G --cli
```

**`nvidia-smi` not found but you have an NVIDIA GPU**
```sh
# Install CUDA toolkit or nvidia-utils, then retry
# Or override manually:
llmfit --memory=8G fit --perfect
```

**Models show as too_tight but you have enough RAM**
```sh
# llmfit may be using context-inflated estimates; cap context
llmfit --max-context 2048 fit --perfect -n 10
```

**REST API: test endpoints**
```sh
# Spawn server and run validation suite
python3 scripts/test_api.py --spawn

# Test already-running server
python3 scripts/test_api.py --base-url http://127.0.0.1:8787
```

**Apple Silicon: VRAM shows as system RAM (expected)**
```sh
# This is correct — Apple Silicon uses unified memory
# llmfit accounts for this automatically
llmfit system  # should show backend: Metal
```

**Context length environment variable**
```sh
export OLLAMA_CONTEXT_LENGTH=4096
llmfit recommend --json  # uses 4096 as context cap
```