ListenHub Asr · Skill · openclaw Chinese news site

Skill details (on-site mirror, no comments)

Transcribe audio files to text using local speech recognition. Triggers on: "转录", "transcribe", "语音转文字", "ASR", "识别音频", "把这段音频转成文字".

Media and Content

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.

Version: v0.1.0

Stats: ⭐ 0 · 133 · 0 current installs · 0 all-time installs

Package: 0xfango/marswave-asr

Security scan (ClawHub)

VirusTotal: benign
OpenClaw: benign

OpenClaw assessment

The skill's requests and instructions are coherent with its stated purpose (local offline transcription); it asks to run the local coli CLI, may write small config files, and may auto-download speech models, but it does not request unrelated credentials or surprising permissions.

Purpose

The skill's purpose (local ASR via the coli CLI) matches its instructions: it checks for coli and ffmpeg, runs `coli asr`, and parses JSON output. Nothing in the metadata or SKILL.md requests unrelated services or credentials.

Instruction scope

Instructions perform local environment checks (`which coli`, `which ffmpeg`), read and write small config files in the current directory and $HOME, run `coli asr` (which may auto-download models), and may write transcript Markdown files to the current working directory. These are within scope, but they are persistent file operations and involve network downloads initiated by the coli tool.

Install mechanism

This is an instruction-only skill with no install spec. The SKILL.md suggests installing `@marswave/coli` via npm if missing, but the skill itself does not fetch or execute remote archives. Risk from installs is therefore limited to user-initiated npm/brew/apt commands.

Credentials

The skill declares no required environment variables or credentials. It only references local paths (config dirs and ~/.coli/models) appropriate to running a local ASR CLI. No unrelated secrets are requested.

Persistence

The skill writes config to .listenhub/asr in the current directory and to $HOME/.listenhub/asr, and `coli` may persist models under ~/.coli/models (~60MB). `always` is `false`, so the skill is not force-enabled, but it does create files and download models when run.

Overall conclusion

This skill appears to do what it says: local transcription via the coli CLI. Before installing or using it, be aware that:
- It will create small config files in the current directory and in $HOME (~/.listenhub/asr).
- The coli CLI may auto-download speech models (~60MB) into ~/.coli/models; this involves a network download and disk usage.
- If coli is missing, the skill will recommend `npm install -g @marswave/coli`; review that npm package and it…

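Install (copy to 龙虾 AI)

Copy the entire block below into a 龙虾中文库 conversation, and 龙虾 will complete the installation by following SKILL.md.

Please hand this block to 龙虾中文库 (龙虾 AI) to execute: install the OpenClaw skill "ListenHub Asr" on this machine. Summary: Transcribe audio files to text using local speech recognition. Triggers on: "转录….
Please fetch the following address to read SKILL.md and complete the installation as documented: https://raw.githubusercontent.com/openclaw/skills/refs/heads/main/skills/0xfango/marswave-asr/SKILL.md
(Source: yingzhi8.cn skill library)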

SKILL.md

---
name: asr
metadata:
  openclaw:
    emoji: "🎙️"
    requires:
      tools: ["coli"]
description: |
  Transcribe audio files to text using local speech recognition. Triggers on:
  "转录", "transcribe", "语音转文字", "ASR", "识别音频", "把这段音频转成文字".
---

## When to Use

- User wants to transcribe an audio file to text
- User provides an audio file path and asks for transcription
- User says "转录", "识别", "transcribe", "语音转文字"

## When NOT to Use

- User wants to synthesize speech from text (use `/tts`)
- User wants to create a podcast or explainer (use `/podcast` or `/explainer`)

## Purpose

Transcribe audio files to text using `coli asr`, which runs fully offline via local
speech recognition models. No API key required. Supports Chinese, English, Japanese,
Korean, and Cantonese (sensevoice model) or English-only (whisper model).

Run `coli asr --help` for current CLI options and supported flags.

## Hard Constraints

- No shell scripts. Use direct commands only.
- Always read config following `shared/config-pattern.md` before any interaction
- Follow `shared/common-patterns.md` for interaction patterns
- Never ask more than one question at a time

<HARD-GATE>
Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as
plain text. Ask one question at a time. Wait for the user's answer before proceeding.
After all parameters are collected, summarize and ask the user to confirm before
running any transcription.

</HARD-GATE>

## Interaction Flow

### Prerequisites Check (before Step 0)

Before config setup, silently check the environment:

```bash
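# Discard `which` output so each flag holds only "yes" or "no".
COLI_OK=$(which coli >/dev/null 2>&1 && echo yes || echo no)
FFMPEG_OK=$(which ffmpeg >/dev/null 2>&1 && echo yes || echo no)
MODELS_DIR="$HOME/.coli/models"
# Models count as present when the directory contains a sherpa entry.
MODELS_OK=$([ -d "$MODELS_DIR" ] && ls "$MODELS_DIR" | grep -q sherpa && echo yes || echo no)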
```

| Issue | Action |
|-------|--------|
| `coli` not found | Block. Tell user to run `npm install -g @marswave/coli` first |
| `ffmpeg` not found | Warn (WAV files still work). Suggest `brew install ffmpeg` / `sudo apt install ffmpeg` |
| Models not downloaded | Inform user: first transcription will auto-download models (~60MB) to `~/.coli/models/` |

If `coli` is missing, stop here and do not proceed.
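
The table above can be read as a small decision function. The following is a minimal sketch, assuming the three `yes`/`no` flags produced by the check block; the function name `prereq_action` and its `BLOCK`/`WARN`/`INFO` labels are hypothetical and not part of the skill. It is for illustration only, since the skill's hard constraints ask for direct commands rather than scripts.

```bash
# Hypothetical helper: maps the three yes/no flags to the actions in the table.
# Prints one line per issue; returns 1 only when coli is missing (the blocking case).
prereq_action() {
  local coli_ok="$1" ffmpeg_ok="$2" models_ok="$3"
  if [ "$coli_ok" != "yes" ]; then
    echo "BLOCK: run 'npm install -g @marswave/coli' first"
    return 1
  fi
  if [ "$ffmpeg_ok" != "yes" ]; then
    echo "WARN: ffmpeg not found (WAV files still work)"
  fi
  if [ "$models_ok" != "yes" ]; then
    echo "INFO: first transcription will download models (~60MB) to ~/.coli/models/"
  fi
  return 0
}
```

Calling `prereq_action "$COLI_OK" "$FFMPEG_OK" "$MODELS_OK"` would then stop the flow on a non-zero return and relay any warning or info lines to the user otherwise.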

### Step 0: Config Setup

Follow `shared/config-pattern.md` Step 0.

Initial defaults:
```bash
# Current directory (project-local config):
mkdir -p ".listenhub/asr"
echo '{"model":"sensevoice","polish":true}' > ".listenhub/asr/config.json"
CONFIG_PATH=".listenhub/asr/config.json"

# Global (per-user config):
mkdir -p "$HOME/.listenhub/asr"
echo '{"model":"sensevoice","polish":true}' > "$HOME/.listenhub/asr/config.json"
CONFIG_PATH="$HOME/.listenhub/asr/config.json"
```

Config summary display:
```
Current config (asr):
  Model: sensevoice / whisper-tiny.en
  Polish: on / off
```

### Setup Flow (first run or reconfigure)

Ask in order:

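1. **model**: "Which speech recognition model should be the default?"
   - "sensevoice (recommended)": supports Chinese, English, Japanese, Korean, and Cantonese; can detect language, emotion, and audio events
   - "whisper-tiny.en": English only

2. **polish**: "Have the AI polish the text after transcription? (fix punctuation, remove filler words, improve readability)"
   - "Yes (recommended)" → `polish: true`
   - "No, keep the raw transcript" → `polish: false`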

Save all answers at once after collecting them.
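
Saving both answers in one write could look like the sketch below. The helper name `save_asr_config` and its argument order are assumptions made for illustration; the JSON shape mirrors the initial defaults shown earlier, and the authoritative save procedure is whatever `shared/config-pattern.md` prescribes.

```bash
# Hypothetical helper: validates both answers, then writes them in a single step.
# $1 = model (sensevoice | whisper-tiny.en), $2 = polish (true | false),
# $3 = config directory (defaults to the project-local one).
save_asr_config() {
  local model="$1" polish="$2" dir="${3:-.listenhub/asr}"
  case "$model" in
    sensevoice|whisper-tiny.en) ;;
    *) echo "unknown model: $model" >&2; return 1 ;;
  esac
  case "$polish" in
    true|false) ;;
    *) echo "polish must be true or false" >&2; return 1 ;;
  esac
  mkdir -p "$dir" || return 1
  printf '{"model":"%s","polish":%s}\n' "$model" "$polish" > "$dir/config.json"
}
```

For example, `save_asr_config whisper-tiny.en false "$HOME/.listenhub/asr"` would store an English-only, unpolished default in the global location.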

### Step 1: Get Audio File

If the user hasn't provided a file path, ask:

> "Please provide the path to the audio file you want to transcribe."

Verify the file exists before proceeding.
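
One way to express that check, as a sketch: the helper name `audio_file_ok` is hypothetical, and it only tests that the path points to a readable regular file, not that the content is valid audio.

```bash
# Hypothetical helper: succeeds only for a non-empty path that points to
# an existing, readable regular file (directories are rejected).
audio_file_ok() {
  [ -n "$1" ] && [ -f "$1" ] && [ -r "$1" ]
}
```

A typical use would be `audio_file_ok "$file" || echo "File not found: $file"`, followed by asking the user for the path again.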

### Step 2: Confirm

```
Ready to transcribe:

  File: {filename}
  Model: {model}
  Polish: {yes / no}

Continue?
```

### Step 3: Transcribe

Run `coli asr` with JSON output (to get metadata):

```bash
coli asr -j --model {model} "{file}"
```

On first run, `coli` will automatically download the required model. This may take a
moment — inform the user if models haven't been downloaded yet.

Parse the JSON result to extract `text`, `lang`, `emotion`, `event`, `duration`.
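
A minimal sketch of the field extraction, under two assumptions the skill does not confirm: that `jq` is installed, and that the five fields are top-level keys of the JSON object printed by `coli asr -j`. The helper name `asr_field` is hypothetical.

```bash
# Hypothetical helper: reads the JSON result on stdin and prints the value of
# the requested top-level key; prints nothing when the key is absent or null.
# Assumes jq is available (the skill itself does not name a JSON tool).
asr_field() {
  jq -r --arg key "$1" '.[$key] // empty'
}

# Possible usage (not executed here):
#   result=$(coli asr -j --model sensevoice "meeting.m4a")
#   text=$(printf '%s' "$result" | asr_field text)
#   lang=$(printf '%s' "$result" | asr_field lang)
```

If the actual output nests these fields differently, the jq filter would need to be adjusted; `coli asr --help` is the place to confirm the current output format.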

### Step 4: Polish (if enabled)

If `polish` is `true`, take the raw `text` from the transcription result and rewrite
it to fix punctuation, remove filler words, and improve readability. Preserve the
original meaning and speaker intent. Do not summarize or paraphrase.

### Step 5: Present Result

Display the transcript directly in the conversation:

```
Transcription complete

{transcript text}

─────────────────
Language: {lang} · Emotion: {emotion} · Duration: {duration}s
```

If polished, show the polished version with a note that it was AI-refined. Offer to
show the raw original on request.

### Step 6: Export as Markdown (optional)

After presenting the result, ask:

```
Question: "Save as a Markdown file in the current directory?"
Options:
  - "Yes": save to current directory
  - "No": done
```

If yes, write `{audio-filename}-transcript.md` to the **current working directory**
(where the user is running Claude Code). The file should contain the transcript text
(polished version if polish was enabled), with a front-matter header:

```markdown
---
source: {original audio filename}
date: {YYYY-MM-DD}
model: {model used}
duration: {duration}s
lang: {detected language}
---

{transcript text}
```
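
The export step can be sketched as follows. The helper name `write_transcript_md` and its argument order are hypothetical, and the sketch assumes that `{audio-filename}` means the audio file's name with its extension removed, which the skill does not state explicitly.

```bash
# Hypothetical helper: writes "<audio name without extension>-transcript.md"
# into the current directory and prints the path of the file it wrote.
# $1 audio path, $2 model, $3 duration in seconds, $4 language, $5 transcript text
write_transcript_md() {
  local audio="$1" model="$2" duration="$3" lang="$4" text="$5"
  local name base out
  name=$(basename "$audio")
  base="${name%.*}"                 # strip the last extension, if any
  out="./${base}-transcript.md"
  {
    printf '%s\n' '---'
    printf 'source: %s\n' "$name"
    printf 'date: %s\n' "$(date +%Y-%m-%d)"
    printf 'model: %s\n' "$model"
    printf 'duration: %ss\n' "$duration"
    printf 'lang: %s\n' "$lang"
    printf '%s\n\n' '---'
    printf '%s\n' "$text"
  } > "$out"
  printf '%s\n' "$out"
}
```

With made-up values, `write_transcript_md "meeting.m4a" sensevoice 12.3 zh "$text"` would create `./meeting-transcript.md` with the front matter shown above, followed by the transcript.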

## Composability

- **Invoked by**: future skills that need to transcribe recorded audio
- **Invokes**: nothing

## Examples

> "Help me transcribe this file: meeting.m4a"

1. Check prerequisites
2. Read config
3. Confirm: meeting.m4a, sensevoice, polish on
4. Run `coli asr -j --model sensevoice "meeting.m4a"`
5. Polish the raw text
6. Display inline

> "transcribe interview.wav, no polish"

1. Check prerequisites
2. Read config
3. Override polish to false for this session
4. Run `coli asr -j --model sensevoice "interview.wav"`
5. Display raw transcript inline