Skill details (site mirror, no comments)
License: MIT-0
MIT-0 · Free to use, modify, and redistribute. No attribution required.
Version: v0.1.1
Stats: ⭐ 0 · 1.2k · 0 current installs · 0 all-time installs
Package: atacan/speechall-cli
Security scan (ClawHub)
- VirusTotal: benign
- OpenClaw: suspicious
OpenClaw evaluation
The skill's instructions and purpose are generally coherent for a CLI wrapper around an external Speechall service, but the runtime instructions require an API key and reference external download/homebrew taps while the skill metadata does not declare any required credentials or a homepage — this metadata/instructions mismatch and the unknown source warrant caution.
Purpose
The name/description match the SKILL.md: it documents installing and using a speechall CLI to transcribe audio via the Speechall API and lists sensible commands and options (transcribe, models, diarization, output formats). The claimed multi-provider support is plausible for a single aggregator API.
Instruction scope
The SKILL.md stays within the stated purpose: it only instructs how to install the CLI, set an API key, and run transcription/model-listing commands. It references external endpoints (speechall.com console and GitHub releases) which is expected, but the instructions explicitly require an API key (SPEECHALL_API_KEY) even though the skill metadata did not declare any required env vars — see environment_proportionality.
Installation mechanism
There is no install spec in the registry (instruction-only), and the SKILL.md suggests installing via Homebrew or downloading GitHub releases. Those are standard distribution channels; nothing in the instructions calls out obscure or shortener URLs or archive extraction behavior. However, there is no published homepage/source in the metadata to verify the tap or GitHub repo beyond the URLs in the instructions.
Credentials
The SKILL.md requires an API key supplied via SPEECHALL_API_KEY or a flag and points users to speechall.com to create keys, yet the registry metadata lists no required env vars/primary credential. That mismatch is an incoherence: the skill effectively needs a sensitive credential but does not declare it. Also the skill references external provider names (OpenAI, Deepgram, Google, etc.) which might imply additional credentials in some setups — …
Persistence
The skill does not request always:true, does not include an install spec that writes to disk via the registry, and does not ask to modify other skills or system config. It only instructs the agent/user to install a CLI locally, so no elevated persistent privileges are requested by the skill itself.
Install (copy to 龙虾 AI)
Copy the entire block below into a 龙虾中文库 conversation and let 龙虾 complete the installation per SKILL.md.
Hand this block to 龙虾中文库 (龙虾 AI) to execute: install the OpenClaw skill "Speechall command-line tool for fast speech-to-text transcription using multiple providers" on this machine. Summary: Install and use the speechall CLI tool for speech-to-text transcription. Use wh….
Please fetch the following URL, read SKILL.md, and complete the installation as documented: https://raw.githubusercontent.com/openclaw/skills/refs/heads/main/skills/atacan/speechall-cli/SKILL.md
(Source: yingzhi8.cn skill library)
SKILL.md
---
name: speechall-cli
description: "Install and use the speechall CLI tool for speech-to-text transcription. Use when the user wants to: (1) transcribe audio or video files to text, (2) install speechall on macOS or Linux, (3) list available STT models and their capabilities, (4) use speaker diarization, subtitles, or other transcription features from the terminal. Triggers on mentions of speechall, audio transcription CLI, or speech-to-text from the command line."
---
# speechall-cli
CLI for speech-to-text transcription via the Speechall API. Supports multiple providers (OpenAI, Deepgram, AssemblyAI, Google, Gemini, Groq, ElevenLabs, Cloudflare, and more).
## Installation
### Homebrew (macOS and Linux)
```bash
brew install Speechall/tap/speechall
```
**Without Homebrew**: Download the binary for your platform from https://github.com/Speechall/speechall-cli/releases and place it on your `PATH`.
### Verify
```bash
speechall --version
```
## Authentication
An API key is required. Provide it via environment variable (preferred) or flag:
```bash
export SPEECHALL_API_KEY="your-key-here"
# or
speechall --api-key "your-key-here" audio.wav
```
The user can create an API key on https://speechall.com/console/api-keys
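Because a missing key only fails at request time, a small pre-flight check can make scripts fail fast. A minimal sketch, assuming POSIX sh; the `check_api_key` helper name and messages are ours, only the `SPEECHALL_API_KEY` variable comes from the docs above:

```shell
# Hypothetical helper: verify the key is set before calling speechall.
check_api_key() {
  if [ -z "${SPEECHALL_API_KEY:-}" ]; then
    echo "error: SPEECHALL_API_KEY is not set" >&2
    return 1
  fi
}

# Example: with a key exported, the check succeeds silently.
export SPEECHALL_API_KEY="your-key-here"
check_api_key && echo "API key present"
```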
## Commands
### transcribe (default)
Transcribe an audio or video file. This is the default subcommand — `speechall audio.wav` is equivalent to `speechall transcribe audio.wav`.
```bash
speechall <file> [options]
```
**Options:**
| Flag | Description | Default |
|---|---|---|
| `--model <provider.model>` | STT model identifier | `openai.gpt-4o-mini-transcribe` |
| `--language <code>` | Language code (e.g. `en`, `tr`, `de`) | API default (auto-detect) |
| `--output-format <format>` | Output format (`text`, `json`, `verbose_json`, `srt`, `vtt`) | API default |
| `--diarization` | Enable speaker diarization | off |
| `--speakers-expected <n>` | Expected number of speakers (use with `--diarization`) | — |
| `--no-punctuation` | Disable automatic punctuation | — |
| `--temperature <0.0-1.0>` | Model temperature | — |
| `--initial-prompt <text>` | Text prompt to guide model style | — |
| `--custom-vocabulary <term>` | Terms to boost recognition (repeatable) | — |
| `--ruleset-id <uuid>` | Replacement ruleset UUID | — |
| `--api-key <key>` | API key (overrides `SPEECHALL_API_KEY` env var) | — |
**Examples:**
```bash
# Basic transcription
speechall interview.mp3
# Specific model and language
speechall call.wav --model deepgram.nova-2 --language en
# Speaker diarization with SRT output
speechall meeting.wav --diarization --speakers-expected 3 --output-format srt
# Custom vocabulary for domain-specific terms
speechall medical.wav --custom-vocabulary "myocardial" --custom-vocabulary "infarction"
# Transcribe a video file (macOS extracts audio automatically)
speechall presentation.mp4
```
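The single-file examples above extend naturally to a batch job. A hedged sketch, using only flags documented above; the `recordings/` layout and `.txt` naming are illustrative, not part of the CLI:

```shell
# Sketch: transcribe every .wav under recordings/ into a sibling .txt transcript.
for f in recordings/*.wav; do
  [ -e "$f" ] || continue                         # skip when the glob matches nothing
  speechall "$f" --output-format text > "${f%.wav}.txt"
done
```

The `${f%.wav}.txt` expansion strips the `.wav` suffix and appends `.txt`, so `recordings/interview.wav` becomes `recordings/interview.txt`.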
### models
List available speech-to-text models. Outputs JSON to stdout. Filters combine with AND logic.
```bash
speechall models [options]
```
**Filter flags:**
| Flag | Description |
|---|---|
| `--provider <name>` | Filter by provider (e.g. `openai`, `deepgram`) |
| `--language <code>` | Filter by supported language (`tr` matches `tr`, `tr-TR`, `tr-CY`) |
| `--diarization` | Only models supporting speaker diarization |
| `--srt` | Only models supporting SRT output |
| `--vtt` | Only models supporting VTT output |
| `--punctuation` | Only models supporting automatic punctuation |
| `--streamable` | Only models supporting real-time streaming |
| `--vocabulary` | Only models supporting custom vocabulary |
**Examples:**
```bash
# List all available models
speechall models
# Models from a specific provider
speechall models --provider deepgram
# Models that support Turkish and diarization
speechall models --language tr --diarization
# Pipe to jq for specific fields
speechall models --provider openai | jq '.[].identifier'
```
## Tips
- On macOS, video files (`.mp4`, `.mov`, etc.) are automatically converted to audio before upload.
- On Linux, pass audio files directly (`.wav`, `.mp3`, `.m4a`, `.flac`, etc.).
- Output goes to stdout. Redirect to save: `speechall audio.wav > transcript.txt`
- Errors go to stderr, so piping stdout is safe.
- Run `speechall --help`, `speechall transcribe --help`, or `speechall models --help` to see all valid enum values for model identifiers, language codes, and output formats.
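Putting the stdout/stderr tips together, one pattern for unattended runs is to split the two streams into separate files. A sketch; the filenames are illustrative, and the `command -v` guard only keeps it harmless on machines where the CLI is absent:

```shell
# Sketch: keep the transcript and the diagnostics apart.
# stdout carries only the transcript; stderr carries errors (see tips above),
# so the two redirections below never mix the streams.
if command -v speechall >/dev/null 2>&1; then
  speechall meeting.wav --output-format srt > meeting.srt 2> speechall.err
fi
```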