VoiceClaw — 技能 — openclaw中文资讯站

技能详情（站内镜像，无评论）

Local voice I/O for OpenClaw agents. Transcribe inbound audio/voice messages using local Whisper (whisper.cpp) and generate voice replies using local Piper T...

通信与消息

作者：Asif @asif2bd

许可证：MIT-0

MIT-0 ·免费使用、修改和重新分发。无需归因。

版本：v1.0.6

统计：⭐ 0 · 350 · 3 current installs · 3 all-time installs

⭐ 0

安装量（当前） 3

🛡 VirusTotal ：良性 · OpenClaw ：良性

Package：asif2bd/voiceclaw

安全扫描（ClawHub）

VirusTotal ：良性
OpenClaw ：良性

OpenClaw 评估

VoiceClaw's files, scripts, and SKILL.md are internally consistent with its stated purpose: offline local STT/TTS using whisper, piper, and ffmpeg, and it does not request unrelated credentials or network access.

综合结论

This skill appears to be what it says: an offline STT/TTS helper that runs local whisper/piper/ffmpeg. Before installing: ensure you obtain whisper, piper, ffmpeg, and voice/model files from trusted sources; verify the model files exist (scripts will error otherwise); note that any one-time model download (documented in README) will contact the network only if you run the manual curl/git commands yourself; review and test the included scripts …

安装（复制给龙虾 AI）

将下方整段复制到龙虾中文库对话中，由龙虾按 SKILL.md 完成安装。

请把本段交给龙虾中文库（龙虾 AI）执行：为本机安装 OpenClaw 技能「VoiceClaw」。简介：Local voice I/O for OpenClaw agents. Transcribe inbound audio/voice messages us…。
请 fetch 以下地址读取 SKILL.md 并按文档完成安装：https://raw.githubusercontent.com/openclaw/skills/refs/heads/main/skills/asif2bd/voiceclaw/SKILL.md
（来源：yingzhi8.cn 技能库）

SKILL.md

打开原始 SKILL.md（GitHub raw）

---
name: voiceclaw
description: "Local voice I/O for OpenClaw agents. Transcribe inbound audio/voice messages using local Whisper (whisper.cpp) and generate voice replies using local Piper TTS. Requires whisper, piper, and ffmpeg pre-installed on the system. All inference runs on-device — no network calls, no cloud APIs, no API keys. Use when an agent receives a voice/audio message and should respond in both voice and text, or when any text response should be synthesized and sent as audio. Triggers on: voice messages, audio attachments, respond in voice, send as audio, speak this, voiceclaw."
metadata:
  {
    "openclaw":
      {
        "requires": { "bins": ["whisper", "piper", "ffmpeg"] },
        "network": "none",
        "env":
          [
            { "name": "WHISPER_BIN", "description": "Path to whisper binary (default: auto-detected via which)" },
            { "name": "WHISPER_MODEL", "description": "Path to ggml-base.en.bin model file (default: ~/.cache/whisper/ggml-base.en.bin)" },
            { "name": "PIPER_BIN", "description": "Path to piper binary (default: auto-detected via which)" },
            { "name": "VOICECLAW_VOICES_DIR", "description": "Path to directory containing .onnx voice model files (default: ~/.local/share/piper/voices)" }
          ]
      }
  }
---

# VoiceClaw

Local-only voice I/O for OpenClaw agents.

- **STT:** `transcribe.sh` — converts audio to text via local Whisper binary
- **TTS:** `speak.sh` — converts text to speech via local Piper binary
- **Network calls: none** — both scripts run fully offline
- **No cloud APIs, no API keys required**

---

## Prerequisites

The following must be installed on the system before using this skill:

| Requirement | Purpose |
|---|---|
| `whisper` binary | Speech-to-text inference |
| `ggml-base.en.bin` model file | Whisper STT model |
| `piper` binary | Text-to-speech synthesis |
| `*.onnx` voice model files | Piper TTS voices |
| `ffmpeg` | Audio format conversion |

See **README.md** for installation and setup instructions.

---

## Environment Variables

| Variable | Default | Purpose |
|---|---|---|
| `WHISPER_BIN` | auto-detected via `which` | Path to whisper binary |
| `WHISPER_MODEL` | `~/.cache/whisper/ggml-base.en.bin` | Path to Whisper model file |
| `PIPER_BIN` | auto-detected via `which` | Path to piper binary |
| `VOICECLAW_VOICES_DIR` | `~/.local/share/piper/voices` | Directory containing `.onnx` voice model files |

---

## Verify Setup

```bash
which whisper && echo "STT binary: OK"
which piper   && echo "TTS binary: OK"
which ffmpeg  && echo "ffmpeg: OK"
ls "${WHISPER_MODEL:-$HOME/.cache/whisper/ggml-base.en.bin}" && echo "STT model: OK"
ls "${VOICECLAW_VOICES_DIR:-$HOME/.local/share/piper/voices}"/*.onnx 2>/dev/null | head -1 && echo "TTS voices: OK"
```

---

## Inbound Voice: Transcribe

```bash
# Transcribe audio → text (supports ogg, mp3, m4a, wav, flac)
TRANSCRIPT=$(bash scripts/transcribe.sh /path/to/audio.ogg)
```

Override model path:
```bash
WHISPER_MODEL=/path/to/ggml-base.en.bin bash scripts/transcribe.sh audio.ogg
```

---

## Outbound Voice: Speak

```bash
# Step 1: Generate WAV (local Piper — no network)
WAV=$(bash scripts/speak.sh "Your response here." /tmp/reply.wav en_US-lessac-medium)

# Step 2: Convert to OGG Opus (Telegram voice requirement)
ffmpeg -i "$WAV" -c:a libopus -b:a 32k /tmp/reply.ogg -y -loglevel error

# Step 3: Send via message tool (filePath=/tmp/reply.ogg)
```

Override voice directory:
```bash
VOICECLAW_VOICES_DIR=/path/to/voices bash scripts/speak.sh "Hello." /tmp/reply.wav
```

---

## Available Voices

| Voice | Style |
|---|---|
| `en_US-lessac-medium` | Neutral American (default) |
| `en_US-amy-medium` | Warm American female |
| `en_US-joe-medium` | American male |
| `en_US-kusal-medium` | Expressive American male |
| `en_US-danny-low` | Deep American male (fast) |
| `en_GB-alba-medium` | British female |
| `en_GB-northern_english_male-medium` | Northern British male |

---

## Agent Behavior Rules

1. **Voice in → Voice + Text out.** Always respond with both a voice reply and a text reply when a voice message is received.
2. **Include the transcript.** Show *"🎙️ I heard: [transcript]"* at the top of every text reply to a voice message.
3. **Keep voice responses concise.** Piper TTS works best under ~200 words — summarize for audio, include full detail in text.
4. **Local only.** Never use a cloud TTS/STT API. Only the local `whisper` and `piper` binaries.
5. **Send voice before text.** Send the audio file first, then follow with the text reply.

---

## Full Example

```bash
# 1. Transcribe inbound voice message
TRANSCRIPT=$(bash path/to/voiceclaw/scripts/transcribe.sh /path/to/voice.ogg)

# 2. Compose reply and generate audio
RESPONSE="Deployment complete. All checks passed."
WAV=$(bash path/to/voiceclaw/scripts/speak.sh "$RESPONSE" /tmp/reply_$$.wav)
ffmpeg -i "$WAV" -c:a libopus -b:a 32k /tmp/reply_$$.ogg -y -loglevel error

# 3. Send voice + text
# message(action=send, filePath=/tmp/reply_$$.ogg, ...)
# reply: "🎙️ I heard: $TRANSCRIPTnn$RESPONSE"
```

---

## Troubleshooting

| Issue | Fix |
|---|---|
| `whisper: command not found` | Ensure whisper binary is installed and in PATH |
| Whisper model not found | Set `WHISPER_MODEL=/path/to/ggml-base.en.bin` |
| `piper: command not found` | Ensure piper binary is installed and in PATH |
| Voice model missing | Set `VOICECLAW_VOICES_DIR=/path/to/voices/` |
| OGG won't play on Telegram | Ensure `-c:a libopus` flag in ffmpeg command |