Smallest Ai - Skill - openclaw中文资讯站 (OpenClaw Chinese news site)

Skill details (on-site mirror, no comments)

Ultra-fast text-to-speech and speech-to-text via Smallest AI's Lightning v3.1 and Pulse models. Use when the user wants to generate speech, convert text to v...

Media & Content

License: MIT-0

MIT-0 · Free to use, modify, and redistribute. No attribution required.

Version: v1.0.1

Stats: ⭐ 0 · 90 · 0 current installs · 0 all-time installs

Package: abhishekmishragithub/smallest-ai

Security scan (ClawHub)

VirusTotal: Benign
OpenClaw: Benign

OpenClaw assessment

The skill's files, instructions, and requested environment (SMALLEST_API_KEY and curl) are consistent with a Smallest AI text-to-speech / speech-to-text integration; nothing in the package appears to request unrelated secrets or perform unexpected system access.

Purpose

Name/description, scripts, and documentation all describe TTS/STT via Smallest AI and the only required credential is SMALLEST_API_KEY; required binary (curl) is appropriate for the provided curl-based scripts. No unrelated services, credentials, or binaries are requested.

Instruction scope

SKILL.md and included scripts instruct the agent to call smallest.ai endpoints, synthesize or transcribe audio, and write local media files. The runtime instructions do not ask the agent to read unrelated system files or other environment variables; all file I/O is local (media/tmp) and aligned with the stated functionality.

Install mechanism

There is no remote install/download step; this is an instruction+scripts skill with bundled scripts and docs. No arbitrary external archives or shortener URLs are used in install steps, lowering install-time risk.

Credentials

Only SMALLEST_API_KEY is required (declared as primaryEnv). That single credential is appropriate and expected for a third-party TTS/STT provider; no other secrets or config paths are requested.

Persistence

The skill is not marked always:true, does not request system-wide privileges, and does not modify other skills' configs. Agent autonomous invocation remains default but is not combined with excessive privileges here.

Overall conclusion

This package appears to be a straightforward Smallest AI TTS/STT integration and only needs your Smallest API key and curl. Before installing: (1) verify the skill's origin (source/homepage is listed as unknown here) and prefer official provider repos if available; (2) be aware that any text and audio you send will be transmitted to smallest.ai (so avoid sending sensitive secrets or private audio you don't want shared with the provider); (3) s…

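Install (copy to 龙虾 AI, "Lobster AI")

Copy the entire block below into a 龙虾中文库 conversation; 龙虾 will then complete the installation by following SKILL.md.

Please hand this block to 龙虾中文库 (龙虾 AI) to execute: install the OpenClaw skill "Smallest Ai" on this machine. Summary: Ultra-fast text-to-speech and speech-to-text via Smallest AI's Lightning v3.1 a….
Please fetch the following address to read SKILL.md and complete the installation as documented: https://raw.githubusercontent.com/openclaw/skills/refs/heads/main/skills/abhishekmishragithub/smallest-ai/SKILL.md
(Source: yingzhi8.cn skill library)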

SKILL.md

---
name: smallest-ai
description: >
  Ultra-fast text-to-speech and speech-to-text via Smallest AI's Lightning v3.1 and Pulse models.
  Use when the user wants to generate speech, convert text to voice, read text aloud,
  create voice notes, transcribe audio to text, or clone a voice.
  Sub-100ms latency TTS. 64ms TTFT STT. Supports 30+ languages including Hindi and Spanish.
  Voices include sophia, robert, advika, vivaan, camilla, and 80+ more.
metadata:
  openclaw:
    emoji: "⚡"
    requires:
      bins: ["curl"]
      env: ["SMALLEST_API_KEY"]
    primaryEnv: "SMALLEST_API_KEY"
---

# Smallest AI — Ultra-Fast Voice Suite

Text-to-speech (sub-100ms) via Lightning v3.1 and speech-to-text (64ms TTFT) via Pulse.

## Setup

1. Go to https://waves.smallest.ai and click "API Key" in the left panel to get an API key
2. Set `SMALLEST_API_KEY` in your environment:
```bash
export SMALLEST_API_KEY="your_key_here"
```
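Before running any script, it can help to confirm that both requirements declared in the metadata (`SMALLEST_API_KEY` and `curl`) are present. The following is a minimal sketch in Python; `preflight` is a hypothetical helper and is not part of the skill.

```python
import os
import shutil


def preflight(env=None, which=shutil.which):
    """Return a list of problems; an empty list means the setup looks ready.

    `env` and `which` are injectable so the check can be exercised
    without touching the real environment (an assumption of this sketch).
    """
    env = os.environ if env is None else env
    problems = []
    # The skill declares SMALLEST_API_KEY as its only required credential.
    if not env.get("SMALLEST_API_KEY"):
        problems.append("SMALLEST_API_KEY is not set")
    # The shell scripts are curl-based, so curl must be on PATH.
    if which("curl") is None:
        problems.append("curl was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(f"setup problem: {problem}")
```

An empty result only means the variable is set and `curl` is installed; it does not verify that the key itself is valid.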

## Defaults

- Default female voice: `sophia` (American English)
- Default male voice: `robert` (American English)
- Default language: `en`
- Default speed: `1.0`
- Default sample rate: `24000`

## Voice Selection Rules

Follow these rules to select the voice:

1. If user explicitly names a voice (e.g. "use advika"), use that voice.
2. If user asks for a **male** voice, use the configured `defaultVoiceMale`.
3. If user asks for a **female** voice, use the configured `defaultVoiceFemale`.
4. If no gender preference, use `defaultVoiceFemale` (sophia by default).
5. For **Hindi** content: use `advika` (female) or `vivaan` (male).
6. For **Spanish** content: use `camilla` (female) or `carlos` (male).
7. For **Tamil** content: use `anitha` (female) or `raju` (male).

Always pass the configured `defaultLanguage`, `defaultSpeed`, and `defaultSampleRate` as `--lang`, `--speed`, and `--rate` flags unless the user overrides them.
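The rules above can be sketched as a small Python function. This is an illustration only: `select_voice` and `tts_flags` are hypothetical names, the precedence between the language rules and the gender rules is an assumption (the rules do not state it), and the `ta` code for Tamil is assumed rather than taken from the skill's language reference.

```python
# Language-specific voices named in rules 5-7.
LANGUAGE_VOICES = {
    "hi": {"female": "advika", "male": "vivaan"},
    "es": {"female": "camilla", "male": "carlos"},
    "ta": {"female": "anitha", "male": "raju"},  # "ta" is an assumed code
}


def select_voice(explicit=None, gender=None, lang="en",
                 default_female="sophia", default_male="robert"):
    """Pick a voice id following the selection rules."""
    # Rule 1: an explicitly named voice always wins.
    if explicit:
        return explicit
    # Rule 4: with no gender preference, fall back to the female default.
    gender = "male" if gender == "male" else "female"
    # Rules 5-7 (assumed to take precedence over the generic defaults).
    by_lang = LANGUAGE_VOICES.get(lang)
    if by_lang:
        return by_lang[gender]
    # Rules 2-3: configured defaults per gender.
    return default_male if gender == "male" else default_female


def tts_flags(voice, lang="en", speed=1.0, rate=24000):
    """Build the flag list that the rules say should always be passed."""
    return ["--voice", voice, "--lang", lang,
            "--speed", str(speed), "--rate", str(rate)]


if __name__ == "__main__":
    print(tts_flags(select_voice(gender="male", lang="hi"), lang="hi"))
```

Keeping the language table as data makes it easy to adjust if the configured defaults or preferred voices change.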

## Text-to-Speech

Generate speech audio from text using the Lightning v3.1 model.

### Shell (preferred — zero dependencies)

```bash
{baseDir}/scripts/tts.sh "Text to speak" --voice sophia --rate 24000 --speed 1.0 --lang en
```

### Python (requires `pip install smallestai` or just `requests`)

```bash
python3 {baseDir}/scripts/tts.py "Text to speak" --voice sophia --speed 1.0 --lang en --out speech.wav
```

### Voices

| Voice   | Gender | Accent        | Best For                        |
|---------|--------|---------------|---------------------------------|
| sophia  | Female | American      | General use (default)           |
| robert  | Male   | American      | Professional, reports (default) |
| advika  | Female | Indian        | Hindi content, code-switch      |
| vivaan  | Male   | Indian        | Bilingual English/Hindi         |
| camilla | Female | Mexican/Latin | Spanish content                 |
| zara    | Female | American      | Conversational                  |
| melody  | Female | American      | Storytelling, greetings         |
| arjun   | Male   | Indian        | English/Hindi bilingual         |
| stella  | Female | American      | Expressive, warm                |

80+ more voices available. List all with: `{baseDir}/scripts/voices.sh`

### Options

- `--voice <id>`: Voice identifier (default: sophia)
- `--rate <hz>`: Sample rate — 8000 | 16000 | 24000 | 44100 (default: 24000)
- `--speed <n>`: Playback speed 0.5–2.0 (default: 1.0)
- `--lang <code>`: Language code (default: en). See `{baseDir}/references/languages.md`
- `--out <path>`: Output file (default: auto-named `media/tts_<timestamp>.wav`)
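The documented ranges for `--rate` and `--speed` can be checked before a request is sent. The sketch below is one possible way to do that in Python; `validate_tts_options` is a hypothetical helper, and whether the bundled scripts perform the same checks themselves is not stated here.

```python
# Values taken from the option list above.
ALLOWED_RATES = (8000, 16000, 24000, 44100)
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def validate_tts_options(rate=24000, speed=1.0):
    """Return a list of validation errors; empty means the options are in range."""
    errors = []
    if rate not in ALLOWED_RATES:
        errors.append(f"rate must be one of {ALLOWED_RATES}, got {rate}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        errors.append(f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}")
    return errors


if __name__ == "__main__":
    print(validate_tts_options(rate=22050, speed=3.0))
```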

### Output

Scripts print `MEDIA: <filepath>` on success. OpenClaw sends this as an audio attachment.
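A caller that wraps the scripts needs to pull the file path out of that `MEDIA:` line. A minimal sketch, assuming the line appears on its own in standard output (`parse_media_path` is a hypothetical helper):

```python
def parse_media_path(stdout):
    """Return the path from the first 'MEDIA: <filepath>' line, or None."""
    prefix = "MEDIA:"
    for line in stdout.splitlines():
        if line.startswith(prefix):
            path = line[len(prefix):].strip()
            if path:
                return path
    return None


if __name__ == "__main__":
    # Sample output made up for illustration.
    print(parse_media_path("MEDIA: media/tts_1700000000.wav\n"))
```

Returning `None` rather than raising lets the caller decide whether a missing line is an error.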

### Multilingual

Supports 30+ languages. Pass `--lang` with ISO code:

```bash
{baseDir}/scripts/tts.sh "नमस्ते, कैसे हैं आप?" --voice advika --lang hi
{baseDir}/scripts/tts.sh "Bonjour le monde" --voice sophia --lang fr
{baseDir}/scripts/tts.sh "Hola, buenos días" --voice camilla --lang es
```

Code-switching (mixing languages within one utterance) works automatically; no extra flag is needed beyond `--lang`:

```bash
{baseDir}/scripts/tts.sh "Hey, मुझे meeting remind कर दो" --voice advika --lang hi
```

## Speech-to-Text

Transcribe audio files using the Pulse model. Supports WAV, MP3, OGG, and FLAC.

### Shell

```bash
{baseDir}/scripts/stt.sh /path/to/audio.wav
{baseDir}/scripts/stt.sh /path/to/audio.wav --diarize --timestamps --emotions
```

### Python

```bash
python3 {baseDir}/scripts/stt.py /path/to/audio.wav --diarize --timestamps --lang en
```

### Options

- `--lang <code>`: Language (default: en)
- `--diarize`: Identify different speakers
- `--timestamps`: Word-level timing
- `--emotions`: Detect emotional tone

### Output

Returns JSON with a `transcription` field. With `--diarize`, the output also includes a speaker label for each word.
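Reading the transcript out of that JSON can be sketched as follows. Only the `transcription` field is taken from the description above; the shape of the diarization data is not documented here, so this hypothetical helper leaves it untouched.

```python
import json


def extract_transcription(raw):
    """Parse the STT output and return the 'transcription' string."""
    data = json.loads(raw)
    text = data.get("transcription") if isinstance(data, dict) else None
    if not isinstance(text, str):
        raise ValueError("response has no 'transcription' string field")
    return text


if __name__ == "__main__":
    # Sample payload made up for illustration.
    print(extract_transcription('{"transcription": "hello world"}'))
```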

## When to Use

Trigger this skill when the user:

- Asks to "say", "speak", "read aloud", or "generate speech/audio"
- Wants a "voice message", "voice note", or "audio file"
- Asks to "transcribe", "convert speech/audio to text"
- Mentions "Smallest AI", "Lightning TTS", or "Pulse STT"
- Needs fast or low-latency speech generation
- Wants Hindi, Spanish, multilingual, or code-switched voice output
- Asks to compare TTS providers or benchmark latency

## Error Handling

- Missing API key → tell user to set `SMALLEST_API_KEY`
- HTTP 401 → invalid or expired API key
- HTTP 429 → rate limited, wait and retry
- HTTP 400 → check text length (max ~5000 chars per request). Split long text into chunks.
- Empty audio → verify voice_id is valid
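The status handling above could be sketched like this. The exponential backoff schedule is an assumption (the list only says to wait and retry), and `send` is a hypothetical callable that performs one request and returns a `(status, body)` pair.

```python
import time

# Hints mirror the error list above.
ERROR_HINTS = {
    400: "bad request: check text length (max ~5000 chars) and split long text",
    401: "invalid or expired API key",
    429: "rate limited: wait and retry",
}


def describe_error(status):
    """Map an HTTP status code to a user-facing hint."""
    return ERROR_HINTS.get(status, f"unexpected HTTP status {status}")


def call_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry only on HTTP 429, doubling the wait each time."""
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body


if __name__ == "__main__":
    print(describe_error(401))
```

Other statuses are returned immediately, since retrying a 400 or 401 without changing the request would not help.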

## Limits

- Max text per request: ~5000 characters
- For longer text: split into sentences, synthesize each, concatenate with sox or ffmpeg
- Free tier: 30 minutes/month of TTS
- Basic ($5/mo): 3 hours of TTS + 1 voice clone
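Splitting long text at sentence boundaries, as suggested above, might look like the sketch below. The 5000-character figure is the approximate limit quoted in this section; the sentence-ending pattern and the hard split for oversized sentences are assumptions, and `chunk_text` is a hypothetical helper.

```python
import re

MAX_CHARS = 5000  # approximate per-request limit quoted above


def chunk_text(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Split after ., !, ? or the Devanagari danda, followed by whitespace.
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for i, chunk in enumerate(chunk_text("One. Two. Three.", max_chars=10)):
        print(i, chunk)
```

Each chunk would then be synthesized separately and the resulting audio files joined with sox or ffmpeg.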