Speak Turbo - Talk to your Claude 90ms latency! - Skill

Skill details (on-site mirror, no comments)

Give your agent the ability to speak to you real-time. Talk to your Claude! Ultra-fast TTS, text-to-speech, voice synthesis, audio output with ~90ms latency....

Media & Content

Author: Jay @emzod

License: MIT-0

MIT-0 · Free to use, modify, and redistribute. No attribution required.

Version: v1.0.7

Stats: ⭐ 0 · 531 · 0 current installs · 0 all-time installs

🛡 VirusTotal: benign · OpenClaw: benign

Package: speakturbo-tts

Security scan (ClawHub)

VirusTotal: benign
OpenClaw: benign

OpenClaw assessment

The skill's code, docs, and runtime instructions are consistent with a local TTS daemon/CLI: it requires no external credentials, binds to localhost, and its install/runtime behavior matches the described purpose.

Purpose

Name/description (low-latency local TTS for agents) matches the code and SKILL.md: a local FastAPI daemon (pocket-tts) + Rust/Python CLI that streams audio from http://127.0.0.1:7125. Required capabilities (none) align with a purely local service.

Instruction scope

SKILL.md and CLI/daemon code stick to the stated scope: starting a local daemon, calling GET /tts and /health on localhost, validating input, and writing audio to allowed paths. There are no instructions to read unrelated files, exfiltrate secrets, or call external endpoints from the runtime code (aside from installing dependencies during setup).

Install mechanism

The registry entry has no formal install spec, but the bundle includes an install.sh which pip-installs dependencies and may build a Rust binary. pip/cargo will fetch packages from upstream registries if run by the user — this is expected for native Python/Rust projects but is a point for users to review (see guidance). No downloads from obscure/personal servers are embedded.

Credentials

No environment variables, credentials, or secret/config paths are requested. The skill writes to ~/.local/bin and ~/.speakturbo and creates a PID file under the user's home directory, which is reasonable for a local CLI/daemon. No unrelated service credentials are required.

Persistence

The skill is not force-enabled (always: false) and can be invoked by the agent. It persists minimally under the user's home (~/.speakturbo, ~/.local/bin) and writes a daemon PID file; it does not modify other skills or system-wide configs. Users should be aware of files created under their home and the optional daemon background process.

Overall conclusion

This package appears coherent for a local, privacy-focused TTS service. Before installing, consider: (1) install.sh will run pip install and optionally cargo build — these commands will fetch packages from upstream registries (review pocket-tts and other dependencies before running). (2) The installer writes a CLI into ~/.local/bin and creates ~/.speakturbo (daemon PID and optional config). If you run the daemon, it listens only on 127.0.0.1:7…

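Install (copy to 龙虾 AI)

Copy the entire block below into a 龙虾中文库 chat; 龙虾 will then complete the installation by following SKILL.md.

Please hand this block to 龙虾中文库 (龙虾 AI) to execute: install the OpenClaw skill "Speak Turbo - Talk to your Claude 90ms latency!" on this machine. Summary: Give your agent the ability to speak to you real-time. Talk to your Claude! Ult….
Please fetch the following URL to read SKILL.md and complete the installation as documented: https://raw.githubusercontent.com/openclaw/skills/refs/heads/main/skills/emzod/speakturbo-tts/SKILL.md
(Source: yingzhi8.cn skill library)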

SKILL.md

---
name: speakturbo-tts
description: Give your agent the ability to speak to you real-time. Talk to your Claude! Ultra-fast TTS, text-to-speech, voice synthesis, audio output with ~90ms latency. 8 built-in voices for instant voice responses. For voice cloning, use the speak skill.
---

# speakturbo - Talk to your Claude!

Give your agent the ability to speak to you real-time. Ultra-fast text-to-speech with ~90ms latency and 8 built-in voices.

## Quick Start

```bash
# Play immediately - you should hear "Hello world" through your speakers
speakturbo "Hello world"
# Output: ⚡ 92ms → ▶ 93ms → ✓ 1245ms

# Verify it's working by saving to file
speakturbo "Hello world" -o test.wav
ls -lh test.wav  # Should show ~50-100KB file
```

**Output explained:** `⚡` = first audio received, `▶` = playback started, `✓` = done
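
If a script needs the timings rather than the glyphs, the status line can be reduced to its millisecond values. The sketch below is only an illustration: `status_ms` is a hypothetical helper, and it assumes the status line has already been captured into a string in the format shown above (which stream the CLI prints it to is not documented here).

```bash
# Hypothetical helper: print each "<number>ms" value from a status line,
# one per line, with the "ms" unit removed.
status_ms() {
  printf '%s\n' "$1" | grep -oE '[0-9]+ms' | tr -d 'ms'
}

# Example with the sample line from Quick Start; prints three lines
# (first audio, playback start, done).
status_ms '⚡ 92ms → ▶ 93ms → ✓ 1245ms'
```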

## First Run

The **first execution takes 2-5 seconds** while the daemon starts and loads the model into memory. Subsequent calls reach first sound in ~90ms.

```bash
# First run (slow - daemon starting)
speakturbo "Starting up"  # ~2-5 seconds

# Second run (fast - daemon already running)
speakturbo "Now I'm fast"  # ~90ms
```
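
A script that cares about the warm-path latency may want to confirm the daemon is answering before it speaks. The following is a minimal sketch, not part of speakturbo: `wait_for_daemon` is a hypothetical helper that polls the `/health` endpoint used elsewhere in this document, and it assumes a successful HTTP response means the daemon is usable. It only waits; it does not start the daemon itself.

```bash
# Hypothetical helper: poll the local health endpoint until it responds.
# $1 = maximum number of attempts (default 10), one second apart.
# The function body runs in a subshell so its variables stay local.
wait_for_daemon() (
  tries="${1:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl fail on HTTP errors; --max-time bounds each attempt.
    if curl -fsS --max-time 1 http://127.0.0.1:7125/health >/dev/null 2>&1; then
      exit 0
    fi
    i=$((i + 1))
    sleep 1
  done
  exit 1
)
```

For example, `wait_for_daemon 5 && speakturbo "Now I'm fast"` would speak only once the daemon has answered within five attempts.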

## Usage

```bash
# Basic - plays immediately (default voice: alba)
speakturbo "Hello world"

# Save to file (no audio playback)
speakturbo "Hello" -o output.wav

# Save to specific file
speakturbo "Goodbye" -o goodbye.wav

# Quiet mode (suppress status messages, still plays audio)
speakturbo "Hello" -q

# List available voices
speakturbo --list-voices
```

## Available Voices

| Voice | Type |
|-------|------|
| `alba` | Female (default) |
| `marius` | Male |
| `javert` | Male |
| `jean` | Male |
| `fantine` | Female |
| `cosette` | Female |
| `eponine` | Female |
| `azelma` | Female |

## Performance

| Metric | Value |
|--------|-------|
| Time to first sound | ~90ms (daemon warm) |
| First run | 2-5s (daemon startup) |
| Real-time factor | ~4x faster than real time |
| Sample rate | 24kHz mono |
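
The sample rate makes it possible to sanity-check a saved file by size alone. The sketch below assumes 16-bit PCM samples and a 44-byte WAV header, neither of which is stated in this document; under those assumptions, 24kHz mono is 48,000 bytes per second, so the ~50-100KB file from Quick Start would hold roughly 1-2 seconds of audio. `wav_duration_ms` is a hypothetical helper name.

```bash
# Hypothetical helper: estimate the duration of a saved WAV in milliseconds.
# ASSUMPTIONS (not from the speakturbo docs): 16-bit PCM, 44-byte header.
# 24000 samples/s * 2 bytes * 1 channel = 48000 bytes per second.
wav_duration_ms() (
  size=$(wc -c < "$1" | tr -d ' ')
  echo $(( (size - 44) * 1000 / 48000 ))
)
```

Usage would look like `wav_duration_ms test.wav` after saving a file with `-o test.wav`.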

## Architecture

```
speakturbo (Rust CLI, 2.2MB)
    │
    │ HTTP streaming (port 7125)
    ▼
speakturbo-daemon (Python + pocket-tts)
    │
    │ Model in memory, auto-shutdown after 1hr idle
    ▼
Audio playback (rodio)
```

## Text Input

- **Encoding:** UTF-8
- **Quotes in text:** Escape them with a backslash: `speakturbo "She said \"hello\""`
- **Long text:** Supported, streams as it generates
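
Quoting is handled by the shell before speakturbo ever sees the text, so any standard shell quoting form works. As a small illustration (the variable names are arbitrary), the two assignments below produce the same string:

```bash
# Two equivalent ways to build the text: She said "hello"
escaped="She said \"hello\""   # backslash-escape inside double quotes
single='She said "hello"'      # single quotes keep double quotes literal

# Both lines print identically.
printf '%s\n' "$escaped" "$single"
```

Either variable can then be passed as one argument, for example `speakturbo "$single"`.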

## Output Path Security

The `-o` flag only writes to directories that are on the allowlist. By default, these are:

- `/tmp` and system temp directories
- Your current working directory
- `~/.speakturbo/`

If you need to write elsewhere, use `--allow-dir`:

```bash
speakturbo "Hello" -o /custom/path/audio.wav --allow-dir /custom/path
```

To permanently allow a directory, add it to `~/.speakturbo/config`:

```bash
mkdir -p ~/.speakturbo && echo "/custom/path" >> ~/.speakturbo/config
```

The config file is one directory per line. Lines starting with `#` are comments.
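
Because the one-liner above appends unconditionally, running it twice leaves a duplicate entry. A possible idempotent variant is sketched below; `allow_dir` is a hypothetical helper, not a speakturbo command, and the optional second argument exists only so the function can be tried against a scratch file.

```bash
# Hypothetical helper: add a directory to the allowlist only if it is
# not already listed. $1 = directory, $2 = config file (optional).
allow_dir() (
  dir="$1"
  config="${2:-$HOME/.speakturbo/config}"
  mkdir -p "$(dirname "$config")"
  touch "$config"
  # -x matches whole lines only, -F treats the path as a literal string.
  if ! grep -qxF -- "$dir" "$config"; then
    printf '%s\n' "$dir" >> "$config"
  fi
)
```

Calling `allow_dir /custom/path` repeatedly leaves a single `/custom/path` line in the config file.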

## Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success (audio played/saved) |
| 1 | Error (daemon connection failed, invalid args) |
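
In a script, the exit code is enough to decide whether to fall back to plain text. The wrapper below is a sketch under that assumption; `try_speak` is a hypothetical name, and the only speakturbo features it relies on are the positional text argument and the `-q` flag shown under Usage.

```bash
# Hypothetical wrapper: speak the text quietly; if speakturbo exits
# non-zero, report the text on stderr so it is not lost.
try_speak() {
  if speakturbo "$1" -q; then
    return 0
  fi
  printf 'speakturbo failed; text was: %s\n' "$1" >&2
  return 1
}
```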

## When to Use

**Use speakturbo when:**
- You need instant audio feedback (~90ms)
- Speed matters more than voice variety
- Built-in voices are sufficient

**Use `speak` instead when:**
- You need custom voice cloning (Morgan Freeman, etc.)
  → `speak "text" --voice ~/.chatter/voices/morgan_freeman.wav`
- You need emotion tags like `[laugh]`, `[sigh]`
- Quality/variety matters more than speed

See the `speak` skill documentation for full usage.

## Troubleshooting

**No audio plays:**
```bash
# Check daemon is running
curl http://127.0.0.1:7125/health
# Expected: {"status":"ready","voices":["alba","marius",...]}

# Verify by saving to file and playing manually
speakturbo "test" -o /tmp/test.wav
afplay /tmp/test.wav  # macOS
aplay /tmp/test.wav   # Linux
```

**Daemon won't start:**
```bash
# Check port availability
lsof -i :7125

# Manually kill and restart
pkill -f "daemon_streaming"
speakturbo "test"  # Auto-restarts daemon
```

**First run is slow:**
This is expected. The daemon needs to load the ~100MB model into memory. Subsequent calls will be fast (~90ms).

## Daemon Management

The daemon auto-starts on first use and **auto-shuts down after 1 hour idle**.

```bash
# Check status
curl http://127.0.0.1:7125/health

# Manual stop
pkill -f "daemon_streaming"

# View logs
cat /tmp/speakturbo.log
```
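
For scripted checks, the `status` field can be pulled out of the health response without extra tooling. The sketch below assumes the response is a single line of JSON shaped like the example in Troubleshooting; `health_status` is a hypothetical helper, and `ready` is the only status value this document shows.

```bash
# Hypothetical helper: read a health response on stdin and print the
# value of its "status" field (prints nothing if the field is absent).
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

A check could then read `[ "$(curl -s http://127.0.0.1:7125/health | health_status)" = "ready" ]`.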

## Comparison with speak

| Feature | speakturbo | speak |
|---------|------------|-------|
| Time to first sound | ~90ms | ~4-8s |
| Voice cloning | ❌ | ✅ |
| Emotion tags | ❌ | ✅ |
| Voices | 8 built-in | Custom wav files |
| Engine | pocket-tts | Chatterbox |