Skill details (site mirror, no comments)
License: MIT-0
MIT-0 · Free to use, modify, and redistribute. No attribution required.
Version: v0.1.6
Stats: ⭐ 7 · 4.3k · 21 current installs · 24 all-time installs
🛡 Security scan (ClawHub): VirusTotal: Benign · OpenClaw: Suspicious
Package:avatarneil/discord-voice
OpenClaw assessment
The plugin mostly matches its stated purpose (Discord voice → STT/TTS → agent), but it includes prompt-injection surface (config can inject hints into agent prompts) and accesses agent session/workspace data — both of which widen its capability beyond a simple voice bridge and deserve careful review.
Purpose
Name/description match what the code and manifests request: a Discord bot token, optional STT/TTS API keys (OpenAI, ElevenLabs, Deepgram), ffmpeg and native build tools. package.json and openclaw.plugin.json list TTS/STT providers and corresponding SDKs (discord.js, @deepgram/sdk, @aws-sdk/client-polly, kokoro-js, @xenova/transformers), which are proportionate to a voice plugin.
Instruction scope
SKILL.md and plugin metadata explicitly support injecting a TTS hint into the agent prompt (noEmojiHint) and route transcribed text through the agent and its session/workspace. That is functionally a prompt-injection channel (the static scan also flagged 'system-prompt-override'). While some hinting is useful for TTS formatting, this capability lets configuration modify agent prompts and thus can influence model behavior beyond simple transcri…
Install mechanism
No opaque remote download/install spec in the registry entry; installation is standard (npm install / git clone). Dependencies are normal NPM packages from known registries. No high-risk custom URL downloads or archive extraction were found in the provided manifests.
Credentials
Declared secrets are proportional: a Discord bot token is required and various optional provider API keys are listed. However, the plugin loads core agent dependencies and reads/writes the agent session store and workspace (resolveStorePath, loadSessionStore, ensureAgentWorkspace). That access is likely necessary for maintaining per-guild sessions, but it also gives the plugin programmatic access to agent data and potentially other configuratio…
Persistence
The skill is not force-included (always:false) and follows the platform default allowing autonomous invocation. It can auto-join voice channels if configured and will interact with the agent runtime and session stores — a normal level of privilege for this kind of plugin, but combined with prompt-injection/config-driven hinting this increases the attack surface.
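Install (copy to Lobster AI)
Copy the entire passage below into a conversation with Lobster (龙虾中文库); Lobster will complete the installation by following SKILL.md.
Please give this passage to Lobster (Lobster AI) to execute: install the OpenClaw skill "Discord Voice" on this machine. Summary: Real-time voice conversations in Discord voice channels with Claude AI.
Fetch the following URL to read SKILL.md, then complete the installation as documented: https://raw.githubusercontent.com/openclaw/skills/refs/heads/main/skills/avatarneil/discord-voice/SKILL.md
(Source: yingzhi8.cn skill library)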
SKILL.md
---
name: discord-voice
description: Real-time voice conversations in Discord voice channels with Claude AI
metadata:
  clawdbot:
    config:
      requiredConfig:
        - discord.token
      optionalEnv:
        - OPENAI_API_KEY
        - ELEVENLABS_API_KEY
        - DEEPGRAM_API_KEY
      systemDependencies:
        - ffmpeg
        - build-essential
      example: |
        {
          "plugins": {
            "entries": {
              "discord-voice": {
                "enabled": true,
                "config": {
                  "sttProvider": "local-whisper",
                  "ttsProvider": "openai",
                  "ttsVoice": "nova",
                  "vadSensitivity": "medium",
                  "streamingSTT": true,
                  "bargeIn": true,
                  "allowedUsers": []
                }
              }
            }
          }
        }
---
# Discord Voice Plugin for Clawdbot
Real-time voice conversations in Discord voice channels. Join a voice channel, speak, and have your words transcribed, processed by Claude, and spoken back.
## Features
- **Join/Leave Voice Channels**: Via slash commands, CLI, or agent tool
- **Voice Activity Detection (VAD)**: Automatically detects when users are speaking
- **Speech-to-Text**: Whisper API (OpenAI), Deepgram, or Local Whisper (Offline)
- **Streaming STT**: Real-time transcription with Deepgram WebSocket (~1s latency reduction)
- **Agent Integration**: Transcribed speech is routed through the Clawdbot agent
- **Text-to-Speech**: OpenAI TTS, ElevenLabs, or Kokoro (Local/Offline)
- **Audio Playback**: Responses are spoken back in the voice channel
- **Barge-in Support**: Stops speaking immediately when user starts talking
- **Auto-reconnect**: Automatic heartbeat monitoring and reconnection on disconnect
## Requirements
- Discord bot with voice permissions (Connect, Speak, Use Voice Activity)
- API keys for the cloud STT and TTS providers you use (OpenAI, ElevenLabs, Deepgram); the Local Whisper and Kokoro options run offline
- System dependencies for voice:
  - `ffmpeg` (audio processing)
  - Native build tools for `@discordjs/opus` and `sodium-native`
## Installation
### 1. Install System Dependencies
```bash
# Ubuntu/Debian
sudo apt-get install ffmpeg build-essential python3
# Fedora/RHEL
sudo dnf install ffmpeg gcc-c++ make python3
# macOS
brew install ffmpeg
```
### 2. Install via ClawdHub
```bash
clawdhub install discord-voice
```
Or manually:
```bash
cd ~/.clawdbot/extensions
git clone <repository-url> discord-voice
cd discord-voice
npm install
```
### 3. Configure in clawdbot.json
```json5
{
  plugins: {
    entries: {
      "discord-voice": {
        enabled: true,
        config: {
          sttProvider: "local-whisper",
          ttsProvider: "openai",
          ttsVoice: "nova",
          vadSensitivity: "medium",
          allowedUsers: [], // Empty = allow all users
          silenceThresholdMs: 1500,
          maxRecordingMs: 30000,
          openai: {
            apiKey: "sk-...", // Or use OPENAI_API_KEY env var
          },
        },
      },
    },
  },
}
```
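The two timing options above, `silenceThresholdMs` and `maxRecordingMs`, decide when a recording is closed and handed to the STT provider. The TypeScript sketch below shows one plausible way to apply them to per-frame voice-activity decisions. The `UtteranceSegmenter` class and its `push` method are hypothetical, not the plugin's API, and the 20 ms frame length assumes Discord's usual Opus frame size.

```typescript
// Hypothetical utterance segmenter driven by per-frame VAD decisions.
// Not the plugin's real implementation; frameMs assumes 20 ms Opus frames.
class UtteranceSegmenter {
  private recordingMs = 0;
  private silenceMs = 0;
  private active = false;

  constructor(
    private readonly silenceThresholdMs = 1500,
    private readonly maxRecordingMs = 30000,
    private readonly frameMs = 20,
  ) {}

  // Feed one frame's VAD result. Returns true when the current utterance
  // should be closed and sent to the STT provider.
  push(isSpeech: boolean): boolean {
    if (!this.active) {
      if (!isSpeech) return false; // still waiting for the user to start talking
      this.active = true;
    }
    this.recordingMs += this.frameMs;
    // Any speech frame resets the trailing-silence counter.
    this.silenceMs = isSpeech ? 0 : this.silenceMs + this.frameMs;
    if (this.silenceMs >= this.silenceThresholdMs || this.recordingMs >= this.maxRecordingMs) {
      this.reset();
      return true;
    }
    return false;
  }

  private reset(): void {
    this.recordingMs = 0;
    this.silenceMs = 0;
    this.active = false;
  }
}
```

Because the silence counter resets on every speech frame, a pause shorter than `silenceThresholdMs` keeps the utterance open, while `maxRecordingMs` closes it regardless of what the user is doing.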
### 4. Discord Bot Setup
Ensure your Discord bot has these permissions:
- **Connect** - Join voice channels
- **Speak** - Play audio
- **Use Voice Activity** - Detect when users speak
Select these permissions when generating the bot's OAuth2 invite URL, or configure them in the Discord Developer Portal.
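If you build the invite URL by hand, the `permissions` query parameter is the sum of the permission bit flags. A minimal TypeScript sketch using Discord's documented flag values (Connect = 1 << 20, Speak = 1 << 21, Use Voice Activity = 1 << 25). Including the `applications.commands` scope is an assumption based on the plugin registering `/discord_voice` slash commands, and `clientId` stands for your application's ID.

```typescript
// Discord permission bits for the three permissions listed above.
const CONNECT = 2 ** 20; // Connect
const SPEAK = 2 ** 21; // Speak
const USE_VAD = 2 ** 25; // Use Voice Activity

// Each flag is a distinct power of two, so summing them equals OR-ing them.
function voicePermissions(): number {
  return CONNECT + SPEAK + USE_VAD;
}

// Build a bot invite URL; `applications.commands` lets slash commands register.
function inviteUrl(clientId: string): string {
  const scope = encodeURIComponent("bot applications.commands");
  return (
    "https://discord.com/oauth2/authorize" +
    `?client_id=${encodeURIComponent(clientId)}` +
    `&scope=${scope}` +
    `&permissions=${voicePermissions()}`
  );
}
```

For these three permissions the value is `36700160`; open the generated URL in a browser to add the bot to a server.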
## Configuration
| Option | Type | Default | Description |
| --------------------- | -------- | ----------------- | ----------------------------------------------- |
| `enabled` | boolean | `true` | Enable/disable the plugin |
| `sttProvider` | string | `"local-whisper"` | `"whisper"`, `"deepgram"`, or `"local-whisper"` |
| `streamingSTT` | boolean | `true` | Use streaming STT (Deepgram only, ~1s faster) |
| `ttsProvider` | string | `"openai"` | `"openai"` or `"elevenlabs"` |
| `ttsVoice` | string | `"nova"` | Voice ID for TTS |
| `vadSensitivity` | string | `"medium"` | `"low"`, `"medium"`, or `"high"` |
| `bargeIn` | boolean | `true` | Stop speaking when user talks |
| `allowedUsers` | string[] | `[]` | User IDs allowed (empty = all) |
| `silenceThresholdMs` | number | `1500` | Silence before processing (ms) |
| `maxRecordingMs` | number | `30000` | Max recording length (ms) |
| `heartbeatIntervalMs` | number   | `30000`           | Connection health check interval (ms)           |
| `autoJoinChannel` | string | `undefined` | Channel ID to auto-join on startup |
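A typed view of the table can make the defaults and the special cases (streaming only with Deepgram, empty `allowedUsers` meaning everyone) explicit. The sketch below is hypothetical: `VoiceConfig`, `resolveConfig`, `usesStreamingStt`, and `isUserAllowed` are illustrative names, and the plugin's real internal types may differ. Only the option names, value sets, and defaults come from the table.

```typescript
// Hypothetical typed view of the options in the table above.
type SttProvider = "whisper" | "deepgram" | "local-whisper";
type TtsProvider = "openai" | "elevenlabs";
type VadSensitivity = "low" | "medium" | "high";

interface VoiceConfig {
  enabled: boolean;
  sttProvider: SttProvider;
  streamingSTT: boolean;
  ttsProvider: TtsProvider;
  ttsVoice: string;
  vadSensitivity: VadSensitivity;
  bargeIn: boolean;
  allowedUsers: string[];
  silenceThresholdMs: number;
  maxRecordingMs: number;
  heartbeatIntervalMs: number;
  autoJoinChannel?: string;
}

// Defaults copied from the Default column of the table.
const DEFAULTS: VoiceConfig = {
  enabled: true,
  sttProvider: "local-whisper",
  streamingSTT: true,
  ttsProvider: "openai",
  ttsVoice: "nova",
  vadSensitivity: "medium",
  bargeIn: true,
  allowedUsers: [],
  silenceThresholdMs: 1500,
  maxRecordingMs: 30000,
  heartbeatIntervalMs: 30000,
};

// Shallow-merge user options over the defaults.
function resolveConfig(user: Partial<VoiceConfig>): VoiceConfig {
  return { ...DEFAULTS, ...user };
}

// Per the table, streaming STT only applies when Deepgram is the provider.
function usesStreamingStt(cfg: VoiceConfig): boolean {
  return cfg.sttProvider === "deepgram" && cfg.streamingSTT;
}

// An empty allowlist means every user may talk to the bot.
function isUserAllowed(cfg: VoiceConfig, userId: string): boolean {
  return cfg.allowedUsers.length === 0 || cfg.allowedUsers.indexOf(userId) !== -1;
}
```

Note that `streamingSTT: true` is the default, yet with the default `local-whisper` provider it has no effect.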
### Provider Configuration
#### OpenAI (Whisper + TTS)
```json5
{
  openai: {
    apiKey: "sk-...",
    whisperModel: "whisper-1",
    ttsModel: "tts-1",
  },
}
```
#### ElevenLabs (TTS only)
```json5
{
  elevenlabs: {
    apiKey: "...",
    voiceId: "21m00Tcm4TlvDq8ikWAM", // Rachel
    modelId: "eleven_multilingual_v2",
  },
}
```
#### Deepgram (STT only)
```json5
{
  deepgram: {
    apiKey: "...",
    model: "nova-2",
  },
}
```
## Usage
### Slash Commands (Discord)
Once registered with Discord, use these commands:
- `/discord_voice join <channel>` - Join a voice channel
- `/discord_voice leave` - Leave the current voice channel
- `/discord_voice status` - Show voice connection status
### CLI Commands
```bash
# Join a voice channel
clawdbot discord_voice join <channelId>
# Leave voice
clawdbot discord_voice leave --guild <guildId>
# Check status
clawdbot discord_voice status
```
### Agent Tool
The agent can use the `discord_voice` tool. For example, ask it:
```
Join voice channel 1234567890
```
The tool supports actions:
- `join` - Join a voice channel (requires channelId)
- `leave` - Leave voice channel
- `speak` - Speak text in the voice channel
- `status` - Get current voice status
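Since tool calls originate from model output, it is worth validating them before acting. A minimal TypeScript sketch, assuming a hypothetical call shape: the four action names come from the list above, but the field names (`channelId`, `guildId`, `text`) and `parseToolCall` itself are illustrative, not the plugin's actual schema.

```typescript
// Hypothetical shape of a discord_voice tool call.
type VoiceToolCall =
  | { action: "join"; channelId: string }
  | { action: "leave"; guildId?: string }
  | { action: "speak"; text: string }
  | { action: "status" };

// Validate untrusted input (e.g. parsed model output) into a typed call.
function parseToolCall(input: unknown): VoiceToolCall {
  if (typeof input !== "object" || input === null) {
    throw new Error("tool input must be an object");
  }
  const obj = input as Record<string, unknown>;
  switch (obj.action) {
    case "join":
      // join is the only action that needs a channel ID
      if (typeof obj.channelId !== "string" || obj.channelId === "") {
        throw new Error("join requires channelId");
      }
      return { action: "join", channelId: obj.channelId };
    case "leave":
      return typeof obj.guildId === "string"
        ? { action: "leave", guildId: obj.guildId }
        : { action: "leave" };
    case "speak":
      if (typeof obj.text !== "string" || obj.text.trim() === "") {
        throw new Error("speak requires non-empty text");
      }
      return { action: "speak", text: obj.text };
    case "status":
      return { action: "status" };
    default:
      throw new Error(`unknown action: ${String(obj.action)}`);
  }
}
```

The discriminated union lets the caller `switch` on `action` and get the right fields without further casts.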
## How It Works
1. **Join**: Bot joins the specified voice channel
2. **Listen**: VAD detects when users start/stop speaking
3. **Record**: Audio is buffered while user speaks
4. **Transcribe**: On silence, audio is sent to STT provider
5. **Process**: Transcribed text is sent to Clawdbot agent
6. **Synthesize**: Agent response is converted to audio via TTS
7. **Play**: Audio is played back in the voice channel
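Steps 4 to 7 map onto a small async function per finished utterance. A minimal TypeScript sketch, assuming hypothetical `SpeechToText`, `TextToSpeech`, agent, and playback interfaces (the plugin's real module boundaries are not documented here). Steps 1 to 3 are assumed to have already produced a buffer of audio for one speaker.

```typescript
// Hypothetical provider interfaces, not the plugin's real types.
interface SpeechToText {
  transcribe(pcm: Uint8Array): Promise<string>;
}
interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>;
}
type Agent = (userId: string, text: string) => Promise<string>;
type Play = (audio: Uint8Array) => Promise<void>;

// Transcribe, ask the agent, synthesize the reply, and play it back.
// Returns the reply text, or null when nothing intelligible was heard.
async function handleUtterance(
  userId: string,
  pcm: Uint8Array,
  stt: SpeechToText,
  agent: Agent,
  tts: TextToSpeech,
  play: Play,
): Promise<string | null> {
  const text = (await stt.transcribe(pcm)).trim(); // step 4
  if (text === "") return null; // noise or silence: skip the agent call
  const reply = await agent(userId, text); // step 5
  const audio = await tts.synthesize(reply); // step 6
  await play(audio); // step 7
  return reply;
}
```

Passing the providers in as parameters keeps the pipeline independent of which STT/TTS backend is configured, and makes it easy to exercise with fakes.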
## Streaming STT (Deepgram)
When using Deepgram as your STT provider, streaming mode is enabled by default. This provides:
- **~1 second faster** end-to-end latency
- **Real-time feedback** with interim transcription results
- **Automatic keep-alive** to prevent connection timeouts
- **Fallback** to batch transcription if streaming fails
To use streaming STT:
```json5
{
  sttProvider: "deepgram",
  streamingSTT: true, // default
  deepgram: {
    apiKey: "...",
    model: "nova-2",
  },
}
```
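The batch fallback only works if the audio for the utterance is still available when the stream fails, which is one reason to keep the buffered audio from step 3 until transcription succeeds. The sketch below is hypothetical: `Transcriber` and `transcribeWithFallback` are illustrative names, and in a real setup `streaming` would wrap Deepgram's live (WebSocket) API and `batch` its prerecorded endpoint.

```typescript
// Hypothetical sketch of streaming-first transcription with a batch fallback.
type Transcriber = (pcm: Uint8Array) => Promise<string>;

async function transcribeWithFallback(
  pcm: Uint8Array,
  streaming: Transcriber,
  batch: Transcriber,
  onFallback: (err: unknown) => void = () => {},
): Promise<string> {
  try {
    // `await` inside the try block so a rejected stream is caught here.
    return await streaming(pcm);
  } catch (err) {
    onFallback(err); // e.g. record the error in the debug log
    return batch(pcm); // re-run the same buffered audio through the batch API
  }
}
```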
## Barge-in Support
When enabled (default), the bot will immediately stop speaking if a user starts talking. This creates a more natural conversational flow where you can interrupt the bot.
To disable it and let the bot always finish its response:
```json5
{
  bargeIn: false,
}
```
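With `@discordjs/voice`, a natural hook for barge-in is the receiver's speaking map (`connection.receiver.speaking.on("start", userId => ...)`) combined with `AudioPlayer.stop()`. To keep the sketch runnable without a Discord connection, playback is abstracted behind a hypothetical `Player` interface; `onUserStartedSpeaking` is likewise an illustrative name, not the plugin's code.

```typescript
// Hypothetical playback abstraction (an AudioPlayer would fill this role).
interface Player {
  isPlaying(): boolean;
  stop(): void;
}

// Call when VAD reports that a user started speaking.
// Returns true if the bot's playback was interrupted.
function onUserStartedSpeaking(player: Player, bargeIn: boolean): boolean {
  if (bargeIn && player.isPlaying()) {
    player.stop(); // cut the bot off so the user can take the turn
    return true;
  }
  return false; // barge-in disabled, or nothing was playing
}
```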
## Auto-reconnect
The plugin includes automatic connection health monitoring:
- **Heartbeat checks** every 30 seconds (configurable via `heartbeatIntervalMs`)
- **Auto-reconnect** on disconnect with exponential backoff
- **Max 3 attempts** before giving up
If the connection drops, you'll see logs like:
```
[discord-voice] Disconnected from voice channel
[discord-voice] Reconnection attempt 1/3
[discord-voice] Reconnected successfully
```
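The retry policy above can be sketched as a loop that doubles the delay after each failure. Only the attempt limit and the log lines come from this README; the 1-second base delay, the injectable `sleep` and `log` hooks, and `reconnectWithBackoff` itself are assumptions for illustration.

```typescript
// Sketch of "auto-reconnect with exponential backoff, max 3 attempts".
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function reconnectWithBackoff(
  connect: () => Promise<void>,
  maxAttempts = 3,
  baseDelayMs = 1000,
  sleep: Sleep = realSleep,
  log: (msg: string) => void = console.log,
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    log(`[discord-voice] Reconnection attempt ${attempt}/${maxAttempts}`);
    try {
      await connect();
      log("[discord-voice] Reconnected successfully");
      return true;
    } catch {
      // Wait 1x, 2x, 4x ... the base delay between failed attempts,
      // but not after the final one.
      if (attempt < maxAttempts) {
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  return false; // give up after maxAttempts failures
}
```

Injecting `sleep` keeps the policy testable without real waiting.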
## VAD Sensitivity
- **low**: Picks up quiet speech, may trigger on background noise
- **medium**: Balanced (recommended)
- **high**: Requires louder, clearer speech
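One common way to implement levels like these is an energy threshold: compute the root-mean-square (RMS) amplitude of each decoded 16-bit PCM frame and treat it as speech when it reaches a level-specific threshold. This is a generic technique, not necessarily what the plugin does, and the threshold numbers below are illustrative guesses.

```typescript
// Hypothetical energy-based VAD; thresholds are illustrative, not the plugin's.
type VadSensitivity = "low" | "medium" | "high";

const RMS_THRESHOLDS: Record<VadSensitivity, number> = {
  low: 300, // quiet speech passes, so background noise may too
  medium: 800,
  high: 1500, // only louder, clearer speech passes
};

// Root-mean-square amplitude of one frame of signed 16-bit samples.
function frameRms(samples: Int16Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

function isSpeech(samples: Int16Array, sensitivity: VadSensitivity): boolean {
  return frameRms(samples) >= RMS_THRESHOLDS[sensitivity];
}
```

Pure energy thresholds are cheap but cannot tell speech from loud noise, which matches the trade-off described for the `low` setting.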
## Troubleshooting
### "Discord client not available"
Ensure the Discord channel is configured in Clawdbot (with `discord.token` set) and the bot is connected before using voice.
### Opus/Sodium build errors
Make sure the system build tools from step 1 are installed, then rebuild the native modules:
```bash
npm install -g node-gyp
npm rebuild @discordjs/opus sodium-native
```
### No audio heard
1. Check that the bot has the Connect and Speak permissions
2. Check that the bot isn't server-muted
3. Verify that the TTS API key is valid
### Transcription not working
1. Check that the STT API key is valid
2. Check that audio is being recorded (see the debug logs)
3. Try adjusting VAD sensitivity
### Enable debug logging
```bash
DEBUG=discord-voice clawdbot gateway start
```
## Environment Variables
| Variable | Description |
| -------------------- | ------------------------------ |
| `DISCORD_TOKEN` | Discord bot token (required) |
| `OPENAI_API_KEY` | OpenAI API key (Whisper + TTS) |
| `ELEVENLABS_API_KEY` | ElevenLabs API key |
| `DEEPGRAM_API_KEY` | Deepgram API key |
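Provider keys can come either from `clawdbot.json` or from these variables (the config example above notes "Or use OPENAI_API_KEY env var"). The precedence is not documented here; the sketch below assumes an explicit config key wins over the environment. `resolveApiKey` and `requireApiKey` are hypothetical helpers, and `env` is passed in (e.g. `process.env`) so the lookup is easy to test.

```typescript
// Hypothetical key lookup; assumes config takes precedence over env vars.
type Env = Record<string, string | undefined>;

function resolveApiKey(configKey: string | undefined, envVar: string, env: Env): string | undefined {
  if (configKey && configKey.trim() !== "") return configKey;
  const fromEnv = env[envVar];
  return fromEnv && fromEnv.trim() !== "" ? fromEnv : undefined;
}

// Fail early with a message naming both places a key can be set.
function requireApiKey(provider: string, configKey: string | undefined, envVar: string, env: Env): string {
  const key = resolveApiKey(configKey, envVar, env);
  if (key === undefined) {
    throw new Error(`${provider}: set ${provider.toLowerCase()}.apiKey or ${envVar}`);
  }
  return key;
}
```

For example, `requireApiKey("Deepgram", undefined, "DEEPGRAM_API_KEY", process.env)` would throw at startup rather than on the first transcription if no key is configured.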
## Limitations
- Only one voice channel per guild at a time
- Maximum recording length: 30 seconds by default (configurable via `maxRecordingMs`)
- Requires stable network for real-time audio
- TTS output may have slight delay due to synthesis
## License
MIT