Gemini STT · Skill · openclaw Chinese news site

Skill details (on-site mirror, no comments)

Transcribe audio files using Google's Gemini API or Vertex AI

Media and Content

License: MIT-0

MIT-0 · Free to use, modify, and redistribute. No attribution required.

Version: v1.1.0

Stats: ⭐ 2 · 2.8k · 11 current installs · 11 all-time installs

🛡 VirusTotal: benign · OpenClaw: benign

Package: araa47/gemini-stt

Security scan (ClawHub)

OpenClaw assessment

The skill appears to do what it claims—transcribe audio via Gemini or Vertex AI—and its code and instructions are consistent with that purpose, but the registry metadata omits required authentication details and should be corrected/verified before use.

Purpose

Skill name/description (Gemini/Vertex STT) match the code and runtime instructions. The only mismatch is registry metadata claiming 'no required env vars' while SKILL.md and the script require either GEMINI_API_KEY or Google ADC (gcloud). This is an inconsistency in metadata, not in functionality.

Instruction scope

Runtime instructions and the script are scoped to reading an audio file, base64-encoding it, and calling Google Gemini or Vertex endpoints. It invokes 'gcloud' only to obtain an access token/project configuration. It does not read unrelated system files or send data to unexpected endpoints.

Install mechanism

No install spec; the skill is instruction-only with a single Python script that uses only the standard library. Low risk from installation artifacts.

Credentials

Authentication requirements (GEMINI_API_KEY or gcloud ADC and possibly GOOGLE_CLOUD_PROJECT/CLOUDSDK_CORE_PROJECT) are appropriate for contacting Gemini/Vertex. However, the skill metadata declares no required environment variables or primary credential, which is inaccurate and could mislead users about needed credentials.

Persistence

The skill does not request permanent inclusion (always:false), does not modify other skills or system settings, and does not persist credentials. It runs commands locally (gcloud) but does not escalate privileges or change system-wide configuration.

Overall conclusion

This skill is coherent with its stated purpose, but before installing: (1) be aware it requires authentication—either set GEMINI_API_KEY or run 'gcloud auth application-default login' and ensure a proper GCP project is configured; the registry metadata currently omits these requirements. (2) Using ADC (gcloud) will cause the script to call 'gcloud auth print-access-token' and use your ADC permissions to call Vertex; prefer a least-privilege se…

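Install (copy to Lobster AI)

Copy the entire block below into a 龙虾中文库 (Lobster AI) conversation; Lobster AI will complete the installation by following SKILL.md.

Please hand this block to 龙虾中文库 (Lobster AI) to execute: install the OpenClaw skill "Gemini STT" on this machine. Summary: Transcribe audio files using Google's Gemini API or Vertex AI.
Fetch the following address to read SKILL.md and complete the installation as documented: https://raw.githubusercontent.com/openclaw/skills/refs/heads/main/skills/araa47/gemini-stt/SKILL.md
(Source: yingzhi8.cn skill library)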

SKILL.md

---
name: gemini-stt
description: Transcribe audio files using Google's Gemini API or Vertex AI
metadata: {"clawdbot":{"emoji":"🎤","os":["linux","darwin"]}}
---

# Gemini Speech-to-Text Skill

Transcribe audio files using Google's Gemini API or Vertex AI. The default model is `gemini-2.0-flash-lite`, chosen for the fastest transcription.

## Authentication (choose one)

### Option 1: Vertex AI with Application Default Credentials (Recommended)

```bash
gcloud auth application-default login
gcloud config set project YOUR_PROJECT_ID
```

The script will automatically detect and use ADC when available.

### Option 2: Direct Gemini API Key

Set `GEMINI_API_KEY` in the environment (e.g., in `~/.env` or `~/.clawdbot/.env`).

## Requirements

- Python 3.10+ (no external dependencies)
- Either GEMINI_API_KEY or gcloud CLI with ADC configured

## Supported Formats

- `.ogg` / `.opus` (Telegram voice messages)
- `.mp3`
- `.wav`
- `.m4a`
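
Since the MIME type is derived from the file extension, a lookup table is one plausible way to cover the formats above. The sketch below is illustrative only: the name `guess_audio_mime` and the specific MIME strings are assumptions (common values for these containers), not values read from the script.

```python
from pathlib import Path

# Assumed extension-to-MIME mapping; the script's own table may differ.
MIME_BY_EXTENSION = {
    ".ogg": "audio/ogg",
    ".opus": "audio/ogg",
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".m4a": "audio/mp4",
}


def guess_audio_mime(path: str) -> str:
    """Return a MIME type for a supported audio file, or raise ValueError."""
    suffix = Path(path).suffix.lower()  # case-insensitive match on the extension
    try:
        return MIME_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"Unsupported audio format: {suffix or '(none)'}") from None
```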

## Usage

```bash
# Auto-detect auth (tries ADC first, then GEMINI_API_KEY)
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg

# Force Vertex AI
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --vertex

# With a specific model
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --model gemini-2.5-pro

# Vertex AI with specific project and region
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --vertex --project my-project --region us-central1

# With Clawdbot media
python ~/.claude/skills/gemini-stt/transcribe.py ~/.clawdbot/media/inbound/voice-message.ogg
```

## Options

| Option | Description |
|--------|-------------|
| `<audio_file>` | Path to the audio file (required) |
| `--model`, `-m` | Gemini model to use (default: `gemini-2.0-flash-lite`) |
| `--vertex`, `-v` | Force use of Vertex AI with ADC |
| `--project`, `-p` | GCP project ID (for Vertex, defaults to gcloud config) |
| `--region`, `-r` | GCP region (for Vertex, default: `us-central1`) |
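
For readers who want to see how such a command line could be declared, here is a minimal `argparse` sketch that mirrors the table. It is a reconstruction from the documented flags and defaults; the function name `build_parser` and the help strings are assumptions, and the real script may define its parser differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the flags and defaults listed in the options table."""
    parser = argparse.ArgumentParser(description="Transcribe audio with Gemini")
    parser.add_argument("audio_file", help="Path to the audio file")
    parser.add_argument("--model", "-m", default="gemini-2.0-flash-lite",
                        help="Gemini model to use")
    parser.add_argument("--vertex", "-v", action="store_true",
                        help="Force use of Vertex AI with ADC")
    parser.add_argument("--project", "-p", default=None,
                        help="GCP project ID (Vertex only)")
    parser.add_argument("--region", "-r", default="us-central1",
                        help="GCP region (Vertex only)")
    return parser


# Parse an explicit argument list instead of sys.argv, for illustration.
args = build_parser().parse_args(["voice.ogg", "--vertex", "-r", "europe-west4"])
```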

## Supported Models

Any Gemini model that supports audio input can be used. Recommended models:

| Model | Notes |
|-------|-------|
| `gemini-2.0-flash-lite` | **Default.** Fastest transcription speed. |
| `gemini-2.0-flash` | Fast and cost-effective. |
| `gemini-2.5-flash-lite` | Lightweight 2.5 model. |
| `gemini-2.5-flash` | Balanced speed and quality. |
| `gemini-2.5-pro` | Higher quality, slower. |
| `gemini-3-flash-preview` | Latest flash model. |
| `gemini-3-pro-preview` | Latest pro model, best quality. |

See [Gemini API Models](https://ai.google.dev/gemini-api/docs/models) for the latest list.

## How It Works

1. Reads the audio file and base64-encodes it
2. Auto-detects authentication:
   - If ADC is available (gcloud), uses the Vertex AI endpoint
   - Otherwise, uses GEMINI_API_KEY with the direct Gemini API
3. Sends the audio to the selected Gemini model with a transcription prompt
4. Returns the transcribed text

## Example Integration

For Clawdbot voice message handling:

```bash
# Transcribe incoming voice message
TRANSCRIPT=$(python ~/.claude/skills/gemini-stt/transcribe.py "$AUDIO_PATH")
echo "User said: $TRANSCRIPT"
```

## Error Handling

The script exits with code 1 and prints to stderr on:
- No authentication available (neither ADC nor GEMINI_API_KEY)
- File not found
- API errors
- Missing GCP project (when using Vertex)
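
Because failures are signalled through a non-zero exit code plus a message on stderr, a caller can wrap the script and turn that into an exception. The wrapper below is a minimal sketch: `transcribe`, `TranscriptionError`, and the `script` parameter are hypothetical names, and the default path simply reuses the one from the usage examples.

```python
import subprocess
import sys
from pathlib import Path

# Path taken from the usage examples; adjust to wherever the skill is installed.
SCRIPT = Path.home() / ".claude/skills/gemini-stt/transcribe.py"


class TranscriptionError(RuntimeError):
    """Raised when the transcription script exits with a non-zero code."""


def transcribe(audio_path: str, *extra_args: str, script: Path = SCRIPT) -> str:
    """Run the script and return its stdout, or raise with its stderr message."""
    result = subprocess.run(
        [sys.executable, str(script), audio_path, *extra_args],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        message = result.stderr.strip() or f"exit code {result.returncode}"
        raise TranscriptionError(message)
    return result.stdout.strip()
```

Extra flags pass straight through, for example `transcribe("/path/to/audio.ogg", "--model", "gemini-2.5-pro")`.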

## Notes

- Uses Gemini 2.0 Flash Lite by default for fastest transcription
- No external Python dependencies (uses stdlib only)
- Automatically detects MIME type from file extension
- Prefers Vertex AI with ADC when available (no API key management needed)