edge-tts
Edge-TTS Skill
Overview
Generate high-quality text-to-speech audio using Microsoft Edge's neural TTS service via the node-edge-tts npm package. Supports multiple languages, voices, adjustable speed/pitch, and subtitle generation.
Quick Start
When you detect TTS intent from triggers or user request:
- Call the tts tool (Clawdbot built-in) to convert text to speech
- The tool returns a MEDIA: path
- Clawdbot routes the audio to the current channel
// Example: Built-in tts tool usage
tts("Your text to convert to speech")
// Returns: MEDIA: /path/to/audio.mp3
Trigger Detection
Smart Trigger Keywords
The skill intelligently detects TTS intent from various natural language triggers:
English triggers:
- "say", "speak", "read", "voice", "audio", "speech"
- "tts", "text-to-speech", "convert to speech"
- "speak to me", "talk to me", "say this"
- "voice message", "voice reply"
Chinese triggers:
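- "说话" (speak), "读一下" (read this), "用声音" (use voice), "语音" (voice)
- "tts", "语音转文字" (literally "speech-to-text"; the phrase names STT, but users who type it in this context usually want TTS)
- "读给我听" (read it to me), "说给我听" (say it to me)
- "发语音" (send a voice message), "语音回复" (voice reply)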
Automatic Trigger Detection
When any trigger keyword is detected in user input:
- Extract content: Separate trigger keywords from the actual message content
- Convert to speech: Use edge-tts to generate audio
- Send both formats:
- Text content (as normal text)
- Voice message (using <qqvoice> tag or appropriate format)
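The detection and extraction steps above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: `detectTtsIntent`, `escapeRegExp`, and the `TRIGGERS` array are hypothetical names, and the trigger list is only a subset of the keywords listed earlier plus the longer phrases used in the examples. Short English words such as "say" are left out because plain substring matching would also fire on words like "essay".

```javascript
// Hypothetical trigger list; extend it with the keywords your deployment needs.
const TRIGGERS = [
  "请用语音回复我", "用语音回复我", "读一下这个", "读一下", "读给我听", "说给我听",
  "发语音", "语音回复", "说话", "用声音", "语音",
  "text-to-speech", "convert to speech", "speak to me", "talk to me",
  "say this", "voice message", "voice reply", "tts",
];

// Escape regex metacharacters so a trigger is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function detectTtsIntent(input) {
  // Try the longest triggers first so "读一下这个" wins over "读一下".
  const sorted = [...TRIGGERS].sort((a, b) => b.length - a.length);
  for (const trigger of sorted) {
    const match = new RegExp(escapeRegExp(trigger), "i").exec(input);
    if (!match) continue;
    // Drop separators that sat next to the trigger, keep the rest untouched.
    const before = input.slice(0, match.index).replace(/[\s,，:：]+$/, "");
    const after = input
      .slice(match.index + match[0].length)
      .replace(/^[\s,，:：。.!！?？]+/, "");
    const content = [before, after].filter(Boolean).join(" ");
    return { triggered: true, trigger, content };
  }
  return { triggered: false, trigger: null, content: input };
}
```

Only punctuation adjacent to the trigger is stripped, so a sentence-final "。" that belongs to the message itself is preserved.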
Example Behavior
User says: "你好,请用语音回复我。" (Hello, please reply to me by voice.)
Skill behavior:
- Detects trigger: "用语音回复我" (use voice to reply)
- Extracts content: "你好" (Hello)
- Generates audio: Creates voice file with zh-CN-XiaoxiaoNeural voice
- Sends to user:
- Text: "你好" (Hello)
- Voice: <qqvoice>path/to/audio.mp3</qqvoice>
User says: "读一下这个:今天天气很好。" (Read this: the weather is very good today.)
Skill behavior:
- Detects trigger: "读一下" (read this)
- Extracts content: "今天天气很好。" (Today's weather is very good.)
- Generates audio with specified text
- Sends both text and voice
Comprehensive Examples
Example 1: Simple voice command
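User: "你说话" (Speak)
Skill: Generates speech and sends the text + <qqvoice>audio file</qqvoice>
Example 2: Voice reply with specific content
User: "发语音:好的,我明白了" (Send a voice message: OK, I understand)
Skill: Generates speech for "好的,我明白了" + sends the text "好的,我明白了" + <qqvoice>audio file</qqvoice>
Example 3: Voice for multitasking
User: "读一下这个:会议将在下午三点开始" (Read this: the meeting starts at 3 p.m.)
Skill: Generates speech for "会议将在下午三点开始" + sends the text "会议将在下午三点开始" + <qqvoice>audio file</qqvoice>
Example 4: Voice for accessibility
User: "用语音回复我:我准备好了" (Reply to me by voice: I'm ready)
Skill: Generates speech for "我准备好了" + sends the text "我准备好了" + <qqvoice>audio file</qqvoice>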
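Each example ends with the same "text plus voice" reply. A minimal sketch of assembling that reply from the tts tool's `MEDIA:` output might look like this; `parseMediaPath` and `buildVoiceReply` are hypothetical helpers, and the exact message layout your channel expects is an assumption.

```javascript
// Pull the file path out of a "MEDIA: /path/to/audio.mp3" line.
function parseMediaPath(toolOutput) {
  const match = /^MEDIA:\s*(.+)$/m.exec(toolOutput);
  return match ? match[1].trim() : null;
}

// Combine the plain text with a <qqvoice> tag; fall back to text only
// when the tool output carries no MEDIA: line.
function buildVoiceReply(text, toolOutput) {
  const audioPath = parseMediaPath(toolOutput);
  if (!audioPath) return text;
  return `${text}\n<qqvoice>${audioPath}</qqvoice>`;
}
```

Falling back to text only keeps the reply useful when audio generation fails.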
Advanced Customization
Using the Node.js Scripts
For more control, use the bundled scripts directly:
#### TTS Converter
cd scripts
npm install
node tts-converter.js "Your text" --voice en-US-AriaNeural --rate +10% --output output.mp3
Options:
- --voice, -v: Voice name (default: en-US-AriaNeural)
- --lang, -l: Language code (e.g., en-US, es-ES)
- --format, -o: Output format (default: audio-24khz-48kbitrate-mono-mp3)
- --pitch: Pitch adjustment (e.g., +10%, -20%, default)
- --rate, -r: Rate adjustment (e.g., +10%, -20%, default)
- --volume: Volume adjustment (e.g., +0%, -10%, default)
- --save-subtitles, -s: Save subtitles as JSON file
- --output, -f: Output file path (default: tts_output.mp3)
- --proxy, -p: Proxy URL (e.g., http://localhost:7890)
- --timeout: Request timeout in milliseconds (default: 10000)
- --list-voices, -L: List available voices
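When calling the converter from another Node.js program, it can help to build the argument list from an options object rather than concatenating strings. The sketch below uses only the long flags listed above; `buildConverterArgs` and the option key names are hypothetical.

```javascript
// Map option keys (hypothetical names) to the long flags documented above.
const FLAG_FOR = {
  voice: "--voice",
  lang: "--lang",
  format: "--format",
  pitch: "--pitch",
  rate: "--rate",
  volume: "--volume",
  output: "--output",
  proxy: "--proxy",
  timeout: "--timeout",
};

function buildConverterArgs(text, options = {}) {
  const args = ["tts-converter.js", text];
  for (const [key, flag] of Object.entries(FLAG_FOR)) {
    if (options[key] !== undefined) args.push(flag, String(options[key]));
  }
  // Boolean flag: present or absent, no value.
  if (options.saveSubtitles) args.push("--save-subtitles");
  return args;
}
```

The resulting array can be passed to `child_process.execFile("node", args, { cwd: "scripts" })`, which avoids shell quoting problems with text that contains spaces or quotes.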
#### Configuration Manager
cd scripts
npm install
node config-manager.js --set-voice en-US-AriaNeural
node config-manager.js --set-rate +10%
node config-manager.js --get
node config-manager.js --reset
Voice Selection
Common voices (use --list-voices for full list):
English:
- en-US-MichelleNeural (female, natural)
- en-US-AriaNeural (female, natural)
- en-US-GuyNeural (male, natural)
- en-GB-SoniaNeural (female, British)
- en-GB-RyanNeural (male, British)
Chinese:
- zh-CN-XiaoxiaoNeural (female, natural, default)
- zh-CN-YunxiNeural (male, natural)
- zh-CN-YunyangNeural (male, broadcast style)
Other Languages:
- es-ES-ElviraNeural (Spanish, Spain)
- fr-FR-DeniseNeural (French)
- de-DE-KatjaNeural (German)
- ja-JP-NanamiNeural (Japanese)
- zh-CN-YunxiNeural (Chinese, male)
- zh-CN-YunyangNeural (Chinese, male, broadcast style)
- ar-SA-ZariyahNeural (Arabic)
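One possible way to choose among these voices automatically is to look at the script of the text. The sketch below is an assumption about how selection could work, not something the skill is documented to do; `pickVoice` is a hypothetical helper, and the English fallback voice is an arbitrary choice from the list above.

```javascript
// Hypothetical helper: pick a voice from the script used in the text.
function pickVoice(text, overrides = {}) {
  const voices = {
    zh: "zh-CN-XiaoxiaoNeural",
    ja: "ja-JP-NanamiNeural",
    ar: "ar-SA-ZariyahNeural",
    en: "en-US-AriaNeural", // assumed fallback
    ...overrides,
  };
  // Kana appears only in Japanese, so test it before Han characters,
  // which Chinese and Japanese share.
  if (/[\u3040-\u30ff]/.test(text)) return voices.ja;
  if (/\p{Script=Han}/u.test(text)) return voices.zh;
  if (/\p{Script=Arabic}/u.test(text)) return voices.ar;
  return voices.en;
}
```

Languages that share the Latin script (Spanish, French, German) cannot be told apart this way and would need an explicit `--lang` or `--voice` setting.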
Rate Guidelines
Rate values use percentage format:
- "default": Normal speed
- "-20%" to "-10%": Slow, clear (tutorials, stories, accessibility)
- "+10%": Slightly fast (summaries, natural pace, default for Chinese)
- "+20%": Moderate speed (conversational)
- "+30%" to "+50%": Fast (news, efficiency)
Output Formats
Choose audio quality based on use case:
- audio-24khz-48kbitrate-mono-mp3: Standard quality (voice notes, messages)
- audio-24khz-96kbitrate-mono-mp3: High quality (presentations, content)
- audio-48khz-96kbitrate-stereo-mp3: Highest quality (professional audio, music)
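The format names encode sample rate, bitrate, channel layout, and codec. A short parser makes that structure explicit; `parseOutputFormat` is a hypothetical helper and only handles names shaped like the three listed above.

```javascript
// Split a name such as "audio-24khz-48kbitrate-mono-mp3" into its parts.
function parseOutputFormat(name) {
  const match = /^audio-(\d+)khz-(\d+)kbitrate-(mono|stereo)-(\w+)$/.exec(name);
  if (!match) return null; // unrecognized naming pattern
  return {
    sampleRateKhz: Number(match[1]),
    bitrateKbps: Number(match[2]),
    channels: match[3] === "stereo" ? 2 : 1,
    codec: match[4],
  };
}
```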
Resources
scripts/tts-converter.js
Main TTS conversion script using node-edge-tts. Generates audio files with customizable voice, rate, volume, pitch, and format. Supports subtitle generation and voice listing.
scripts/config-manager.js
Manages persistent user preferences for TTS settings (voice, language, format, pitch, rate, volume). Stores config in ~/.tts-config.json.
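The persistence described here can be pictured as a JSON file merged over built-in defaults. The sketch below is not the bundled config-manager.js: `loadConfig`, `saveConfig`, and the key names and default values in `DEFAULTS` are assumptions made for illustration. Only the file location comes from the description above.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Assumed defaults; the real script may use different keys or values.
const DEFAULTS = { voice: "en-US-MichelleNeural", rate: "default", pitch: "default", volume: "default" };
const CONFIG_FILE = path.join(os.homedir(), ".tts-config.json");

// Read saved preferences and layer them over the defaults.
function loadConfig(file = CONFIG_FILE) {
  try {
    return { ...DEFAULTS, ...JSON.parse(fs.readFileSync(file, "utf8")) };
  } catch (err) {
    if (err.code === "ENOENT") return { ...DEFAULTS }; // no file yet
    throw err; // unreadable or malformed config should not be hidden
  }
}

// Merge a partial update into the stored config and write it back.
function saveConfig(patch, file = CONFIG_FILE) {
  const next = { ...loadConfig(file), ...patch };
  fs.writeFileSync(file, JSON.stringify(next, null, 2));
  return next;
}
```

Merging on load means a config file written by an older version still picks up defaults for keys it lacks.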
scripts/package.json
NPM package configuration with node-edge-tts dependency.
references/node_edge_tts_guide.md
Complete documentation for node-edge-tts npm package including:
- Full voice list by language
- Prosody options (rate, pitch, volume)
- Usage examples (CLI and Module)
- Subtitle generation
- Output formats
- Best practices and limitations
Voice Testing
Test different voices and preview audio quality at: https://tts.travisvn.com/
Refer to references/node_edge_tts_guide.md when you need specific voice details or advanced features.
Installation
To use the bundled scripts:
cd /home/user/clawd/skills/public/edge-tts/scripts
npm install
This installs:
- node-edge-tts: TTS library
- commander: CLI argument parsing
Workflow
- Detect intent: Check for "tts" trigger or keyword in user message
- Choose method: Use the built-in tts tool for simple requests, or scripts/tts-converter.js for customization
- Generate audio: Convert the target text (message, search results, summary)
- Return to user: The tts tool returns a MEDIA: path; Clawdbot handles delivery
Testing
Basic Test
Run the test script to verify TTS functionality:
cd /home/user/clawd/skills/public/edge-tts/scripts
npm test
This generates a test audio file and verifies the TTS service is working.
Integration Test
Use the built-in tts tool for quick testing:
// Example: Test TTS with default settings
tts("This is a test of the TTS functionality.")
Configuration Test
Verify configuration persistence:
cd /home/user/clawd/skills/public/edge-tts/scripts
node config-manager.js --get
node config-manager.js --set-voice en-US-GuyNeural
node config-manager.js --get
Troubleshooting
- Test connectivity: Run npm test to check if the TTS service is accessible
- Check voice availability: Use node tts-converter.js --list-voices to see available voices
- Verify proxy settings: If using a proxy, test with node tts-converter.js "test" --proxy http://localhost:7890
- Check audio output: The test should generate test-output.mp3 in the scripts directory
Notes
- node-edge-tts uses Microsoft Edge's online TTS service (updated, working authentication)
- No API key needed (free service)
- Output is MP3 format by default
- Requires internet connection
- Supports subtitle generation (JSON format with word-level timing)
- Temporary File Handling: By default, audio files are saved to the system's temporary directory (/tmp/edge-tts-temp/ on Unix, C:\Users\<user>\AppData\Local\Temp\edge-tts-temp\ on Windows) with unique filenames (e.g., tts_1234567890_abc123.mp3). Files are not automatically deleted; the calling application (Clawdbot) should handle cleanup after use. You can specify a custom output path with the --output option if permanent storage is needed.
- TTS keyword filtering: The skill automatically filters out TTS-related keywords (tts, TTS, text-to-speech) from text before conversion to avoid converting the trigger words themselves to audio
- For repeated preferences, use config-manager.js to set defaults
- Default voice: en-US-MichelleNeural (female, natural)
- Neural voices (ending in Neural) provide higher quality than Standard voices
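Since temporary audio files are not deleted automatically, the caller needs some cleanup routine. The sketch below shows one way to generate filenames in the pattern described in the notes and to remove stale files. `makeTempAudioPath` and `removeStaleAudio` are hypothetical helpers, and the way the timestamp and random suffix are produced is an assumption based on the example filename.

```javascript
const crypto = require("crypto");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Build a unique path such as <tmp>/edge-tts-temp/tts_<timestamp>_<suffix>.mp3.
function makeTempAudioPath(baseDir = path.join(os.tmpdir(), "edge-tts-temp")) {
  fs.mkdirSync(baseDir, { recursive: true });
  const suffix = crypto.randomBytes(3).toString("hex"); // 6 hex characters
  return path.join(baseDir, `tts_${Date.now()}_${suffix}.mp3`);
}

// Delete generated files older than maxAgeMs; returns how many were removed.
function removeStaleAudio(dir, maxAgeMs, now = Date.now()) {
  let removed = 0;
  for (const name of fs.readdirSync(dir)) {
    // Only touch files that match the generated naming pattern.
    if (!/^tts_\d+_[0-9a-z]+\.mp3$/.test(name)) continue;
    const file = path.join(dir, name);
    if (now - fs.statSync(file).mtimeMs > maxAgeMs) {
      fs.unlinkSync(file);
      removed += 1;
    }
  }
  return removed;
}
```

Restricting deletion to the naming pattern keeps the routine from removing unrelated files if it is pointed at a shared directory.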
💡 Installation method
Download the ZIP and extract it into the skills/ directory to start using the skill.