edge-tts

Verified

AI v1.0.0 · 小杨

Downloads: 235 · Size: 1137.6 KB · Date: 2026-05-25

Edge-TTS Skill

Overview

Generate high-quality text-to-speech audio using Microsoft Edge's neural TTS service via the node-edge-tts npm package. Supports multiple languages, voices, adjustable speed/pitch, and subtitle generation.

Quick Start

When you detect TTS intent from triggers or user request:

Call the tts tool (Clawdbot built-in) to convert text to speech
The tool returns a MEDIA: path
Clawdbot routes the audio to the current channel

// Example: Built-in tts tool usage
tts("Your text to convert to speech")
// Returns: MEDIA: /path/to/audio.mp3
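
If a caller needs the file path itself, the returned string can be parsed. The following is a minimal sketch, assuming the tool returns a single line of the form `MEDIA: <path>` as in the example above; `parseMediaPath` is a hypothetical helper, not part of Clawdbot.

```javascript
// Hypothetical helper: extract the file path from a "MEDIA: <path>" result.
// Returns null when the string does not have that shape.
function parseMediaPath(result) {
  const match = /^MEDIA:\s*(.+)$/.exec(String(result).trim());
  return match ? match[1].trim() : null;
}

console.log(parseMediaPath("MEDIA: /path/to/audio.mp3")); // prints /path/to/audio.mp3
```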

Trigger Detection

Smart Trigger Keywords

The skill detects TTS intent from the following natural-language triggers:

English triggers:

"say", "speak", "read", "voice", "audio", "speech"
"tts", "text-to-speech", "convert to speech"
"speak to me", "talk to me", "say this"
"voice message", "voice reply"

Chinese triggers:

"说话", "读一下", "用声音", "语音"
"tts", "语音转文字" (literally "speech-to-text", i.e. STT; treated as a TTS trigger when the context indicates TTS)
"读给我听", "说给我听"
"发语音", "语音回复"

Automatic Trigger Detection

When any trigger keyword is detected in user input:

Extract content: Separate trigger keywords from the actual message content
Convert to speech: Use edge-tts to generate audio
Send both formats:

Text content (as normal text)
Voice message (using <qqvoice> tag or appropriate format)
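
The detection and extraction steps above could be sketched roughly as follows. This is an illustration only, not the skill's actual implementation: `detectTtsIntent` and `hasTrigger` are hypothetical names, the trigger lists are a subset of the ones documented above, and the clause-splitting rule is an assumption chosen to reproduce the documented examples.

```javascript
// Subset of the documented trigger keywords.
const EN_TRIGGERS = ["say", "speak", "read", "voice", "audio", "speech", "tts", "text-to-speech"];
const ZH_TRIGGERS = ["说话", "读一下", "用声音", "语音", "读给我听", "说给我听", "发语音"];

// English triggers must match whole words; Chinese triggers match as substrings.
function hasTrigger(clause) {
  const lower = clause.toLowerCase();
  const english = EN_TRIGGERS.some((t) => new RegExp(`\\b${t}\\b`).test(lower));
  return english || ZH_TRIGGERS.some((t) => clause.includes(t));
}

function detectTtsIntent(input) {
  const text = input.trim();

  // Case 1: "<trigger>: <content>". Everything after the first colon is content.
  const colon = text.search(/[:：]/);
  if (colon !== -1 && hasTrigger(text.slice(0, colon))) {
    return { triggered: true, content: text.slice(colon + 1).trim() };
  }

  // Case 2: drop every clause that contains a trigger and keep the rest.
  const clauses = text.split(/[，,。.!！?？;；]/).map((c) => c.trim()).filter(Boolean);
  const kept = clauses.filter((c) => !hasTrigger(c));
  if (kept.length === clauses.length) return { triggered: false, content: text };
  return { triggered: true, content: kept.join(", ") };
}

console.log(detectTtsIntent("你好，请用语音回复我。").content);     // prints 你好
console.log(detectTtsIntent("读一下这个：今天天气很好。").content); // prints 今天天气很好。
```

Splitting on punctuation is deliberately crude (it would also split "3.5", for instance); a production detector would need a more careful tokenizer.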

Example Behavior

User says: "你好，请用语音回复我。"

Skill behavior:

Detects trigger: "用语音回复我" (use voice to reply)
Extracts content: "你好" (Hello)
Generates audio: Creates voice file with zh-CN-XiaoxiaoNeural voice
Sends to user:

Text: "你好" (Hello)
Voice: <qqvoice>path/to/audio.mp3</qqvoice>

User says: "读一下这个：今天天气很好。"

Skill behavior:

Detects trigger: "读一下" (read this)
Extracts content: "今天天气很好。" (Today's weather is very good.)
Generates audio with specified text
Sends both text and voice

Comprehensive Examples

Example 1: Simple voice command

User: "你说话" (You speak)
Skill: generates speech and sends the text + <qqvoice>audio file</qqvoice>

Example 2: Voice reply with specific content

User: "发语音：好的，我明白了" (Send a voice message: OK, I understand)
Skill: generates speech for "好的，我明白了" + sends the text "好的，我明白了" + <qqvoice>audio file</qqvoice>

Example 3: Voice for multitasking

User: "读一下这个：会议将在下午三点开始" (Read this: the meeting starts at 3 p.m.)
Skill: generates speech for "会议将在下午三点开始" + sends the text "会议将在下午三点开始" + <qqvoice>audio file</qqvoice>

Example 4: Voice for accessibility

User: "用语音回复我：我准备好了" (Reply to me by voice: I am ready)
Skill: generates speech for "我准备好了" + sends the text "我准备好了" + <qqvoice>audio file</qqvoice>
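
Each example ends with the same two-part reply: the plain text plus a voice element. A small sketch of how that pair might be assembled is shown here; `formatVoiceReply` and the channel name `"qq"` are assumptions made for illustration, while the `<qqvoice>` tag and the `MEDIA:` form are the two formats this document mentions.

```javascript
// Hypothetical formatter: "qq" is an assumed channel name for this sketch.
function formatVoiceReply(text, audioPath, channel = "qq") {
  const voice = channel === "qq"
    ? `<qqvoice>${audioPath}</qqvoice>` // tag format used in the examples above
    : `MEDIA: ${audioPath}`;            // path format returned by the built-in tts tool
  return { text, voice };
}

console.log(formatVoiceReply("好的，我明白了", "path/to/audio.mp3").voice);
// prints <qqvoice>path/to/audio.mp3</qqvoice>
```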

Advanced Customization

Using the Node.js Scripts

For more control, use the bundled scripts directly:

TTS Converter

cd scripts
npm install
node tts-converter.js "Your text" --voice en-US-AriaNeural --rate +10% --output output.mp3

Options:

--voice, -v: Voice name (default: en-US-AriaNeural)
--lang, -l: Language code (e.g., en-US, es-ES)
--format, -o: Output format (default: audio-24khz-48kbitrate-mono-mp3)
--pitch: Pitch adjustment (e.g., +10%, -20%, default)
--rate, -r: Rate adjustment (e.g., +10%, -20%, default)
--volume: Volume adjustment (e.g., +0%, -10%, default)
--save-subtitles, -s: Save subtitles as JSON file
--output, -f: Output file path (default: tts_output.mp3)
--proxy, -p: Proxy URL (e.g., http://localhost:7890)
--timeout: Request timeout in milliseconds (default: 10000)
--list-voices, -L: List available voices
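
When calling the converter from another Node.js program, it can help to build the argument list from a settings object rather than concatenating a shell string. The sketch below uses only the long flags listed above; `buildConverterArgs` and the option key names are hypothetical.

```javascript
// Map setting names (hypothetical) to the long flags documented above.
const FLAG_BY_OPTION = {
  voice: "--voice",
  lang: "--lang",
  format: "--format",
  pitch: "--pitch",
  rate: "--rate",
  volume: "--volume",
  output: "--output",
  proxy: "--proxy",
  timeout: "--timeout",
};

function buildConverterArgs(text, options = {}) {
  const args = ["tts-converter.js", text];
  for (const [name, value] of Object.entries(options)) {
    if (name === "saveSubtitles") {
      if (value) args.push("--save-subtitles"); // boolean flag, takes no value
      continue;
    }
    const flag = FLAG_BY_OPTION[name];
    if (!flag) throw new Error(`Unknown option: ${name}`);
    args.push(flag, String(value));
  }
  return args;
}

console.log(buildConverterArgs("Your text", { voice: "en-US-AriaNeural", rate: "+10%", output: "output.mp3" }));
```

The resulting array can be handed to `child_process.execFile("node", args, { cwd: "scripts" }, callback)`, which avoids shell quoting problems with text that contains spaces or quotes.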

Configuration Manager

cd scripts
npm install
node config-manager.js --set-voice en-US-AriaNeural

node config-manager.js --set-rate +10%

node config-manager.js --get

node config-manager.js --reset
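
According to the Resources section, these preferences are persisted in ~/.tts-config.json. Another script could read them along the lines of the sketch below. The key names inside the file and the fallback values are assumptions (the defaults are taken from the converter's option list), and `loadConfig` is a hypothetical helper.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Assumed fallbacks, taken from the defaults in the converter's option list.
const DEFAULTS = {
  voice: "en-US-AriaNeural",
  format: "audio-24khz-48kbitrate-mono-mp3",
  pitch: "default",
  rate: "default",
  volume: "default",
};

// Merge saved preferences over the defaults; a missing file is not an error.
function loadConfig(file = path.join(os.homedir(), ".tts-config.json")) {
  try {
    return { ...DEFAULTS, ...JSON.parse(fs.readFileSync(file, "utf8")) };
  } catch (err) {
    if (err.code === "ENOENT") return { ...DEFAULTS }; // no saved preferences yet
    throw err; // malformed JSON or permission problems should surface
  }
}
```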

Voice Selection

Common voices (use --list-voices for full list):

English:

en-US-MichelleNeural (female, natural)
en-US-AriaNeural (female, natural)
en-US-GuyNeural (male, natural)
en-GB-SoniaNeural (female, British)
en-GB-RyanNeural (male, British)

Chinese:

zh-CN-XiaoxiaoNeural (female, natural, default for Chinese)
zh-CN-YunxiNeural (male, natural)
zh-CN-YunyangNeural (male, broadcast style)

Other Languages:

es-ES-ElviraNeural (Spanish, Spain)
fr-FR-DeniseNeural (French)
de-DE-KatjaNeural (German)
ja-JP-NanamiNeural (Japanese)
ar-SA-ZariyahNeural (Arabic)
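
If the voice should follow the language of the text, a simple script-based heuristic can choose among the voices listed above. This is an illustrative sketch, not something the skill is documented to do; `pickVoice` is a hypothetical helper and the fallback voice is an assumption.

```javascript
// Heuristic voice picker based on Unicode script ranges.
function pickVoice(text) {
  if (/[\u3040-\u30ff]/.test(text)) return "ja-JP-NanamiNeural";   // hiragana or katakana
  if (/[\u4e00-\u9fff]/.test(text)) return "zh-CN-XiaoxiaoNeural"; // CJK ideographs
  if (/[\u0600-\u06ff]/.test(text)) return "ar-SA-ZariyahNeural";  // Arabic script
  return "en-US-AriaNeural"; // fallback; Latin-script languages are not told apart
}

console.log(pickVoice("今天天气很好。")); // prints zh-CN-XiaoxiaoNeural
```

Kana is checked before ideographs because Japanese text usually mixes both, whereas Chinese text contains no kana.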

Rate Guidelines

Rate values use percentage format:

"default": Normal speed
"-20%" to "-10%": Slow, clear (tutorials, stories, accessibility)
"+10%": Slightly fast (summaries, natural pace, default for Chinese)
"+20%": Moderate speed (conversational)
"+30%" to "+50%": Fast (news, efficiency)

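The guidelines above can be captured as named presets with a small validity check. The preset names are invented for this sketch, and the validator only accepts the forms shown in this document ("default" or a signed percentage); the service itself may accept other forms.

```javascript
// Hypothetical preset names mapped to the documented rate values.
const RATE_PRESETS = {
  slow: "-20%",
  normal: "default",
  summary: "+10%",
  conversational: "+20%",
  fast: "+30%",
};

// Accepts "default" or a signed percentage such as "+10%" or "-20%".
function isValidProsody(value) {
  return value === "default" || /^[+-]\d{1,3}%$/.test(value);
}

function rateFor(useCase) {
  const rate = RATE_PRESETS[useCase];
  if (rate === undefined) throw new Error(`Unknown use case: ${useCase}`);
  return rate;
}

console.log(rateFor("summary"), isValidProsody("+10%")); // prints +10% true
```
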
Output Formats

Choose audio quality based on use case:

audio-24khz-48kbitrate-mono-mp3: Standard quality (voice notes, messages)
audio-24khz-96kbitrate-mono-mp3: High quality (presentations, content)
audio-48khz-96kbitrate-stereo-mp3: Highest quality (professional audio, music)
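
The format identifiers encode sample rate, bitrate, and channel layout, so they can be parsed to estimate file size. The sketch below assumes the `audio-<rate>khz-<bitrate>kbitrate-<channels>-<codec>` naming pattern seen in the three identifiers above and a constant bitrate; both function names are hypothetical.

```javascript
// Split a format identifier into its parts.
function parseOutputFormat(format) {
  const m = /^audio-(\d+)khz-(\d+)kbitrate-(mono|stereo)-(\w+)$/.exec(format);
  if (!m) throw new Error(`Unrecognized output format: ${format}`);
  return {
    sampleRateKhz: Number(m[1]),
    bitrateKbps: Number(m[2]),
    channels: m[3],
    codec: m[4],
  };
}

// Approximate size in kilobytes: kilobits per second * seconds / 8.
function estimateSizeKB(format, seconds) {
  return (parseOutputFormat(format).bitrateKbps * seconds) / 8;
}

console.log(estimateSizeKB("audio-24khz-48kbitrate-mono-mp3", 60)); // prints 360
```

By this estimate, one minute of audio takes about 360 KB at 48 kbit/s and about 720 KB at 96 kbit/s.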

Resources

scripts/tts-converter.js

Main TTS conversion script using node-edge-tts. Generates audio files with customizable voice, rate, volume, pitch, and format. Supports subtitle generation and voice listing.
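
For orientation, a direct call to the library might look like the sketch below. It is not the bundled script's source. It assumes that node-edge-tts exports an `EdgeTTS` class whose constructor takes `voice`, `lang`, `outputFormat`, `saveSubtitles`, `proxy`, `pitch`, `rate`, `volume`, and `timeout`, and that it exposes `ttsPromise(text, path)`; check references/node_edge_tts_guide.md before relying on these names. `toEdgeTtsOptions` and `synthesize` are hypothetical helpers.

```javascript
// Translate CLI-style settings into the option names node-edge-tts is assumed to accept.
function toEdgeTtsOptions(settings = {}) {
  const options = {
    voice: settings.voice ?? "en-US-AriaNeural",
    outputFormat: settings.format ?? "audio-24khz-48kbitrate-mono-mp3",
    timeout: settings.timeout ?? 10000,
  };
  for (const key of ["lang", "pitch", "rate", "volume", "proxy"]) {
    if (settings[key] !== undefined) options[key] = settings[key];
  }
  if (settings.saveSubtitles) options.saveSubtitles = true;
  return options;
}

async function synthesize(text, outputPath, settings) {
  // Loaded lazily so the mapping above can be used without the package installed.
  const { EdgeTTS } = require("node-edge-tts");
  const tts = new EdgeTTS(toEdgeTtsOptions(settings));
  await tts.ttsPromise(text, outputPath); // requires network access
  return outputPath;
}
```

A call such as `synthesize("Hello", "hello.mp3", { rate: "+10%" })` would then write the audio file, provided the package is installed and the service is reachable.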

scripts/config-manager.js

Manages persistent user preferences for TTS settings (voice, language, format, pitch, rate, volume). Stores config in ~/.tts-config.json.

scripts/package.json

NPM package configuration with node-edge-tts dependency.

references/node_edge_tts_guide.md

Complete documentation for node-edge-tts npm package including:

Full voice list by language
Prosody options (rate, pitch, volume)
Usage examples (CLI and Module)
Subtitle generation
Output formats
Best practices and limitations

Voice Testing

Test different voices and preview audio quality at: https://tts.travisvn.com/

Refer to references/node_edge_tts_guide.md when you need specific voice details or advanced features.

Installation

To use the bundled scripts:

cd /home/user/clawd/skills/public/edge-tts/scripts
npm install

This installs:

node-edge-tts - TTS library
commander - CLI argument parsing

Workflow

Detect intent: Check for "tts" trigger or keyword in user message
Choose method: Use built-in tts tool for simple requests, or scripts/tts-converter.js for customization
Generate audio: Convert the target text (message, search results, summary)
Return to user: The tts tool returns a MEDIA: path; Clawdbot handles delivery

Testing

Basic Test

Run the test script to verify TTS functionality:

cd /home/user/clawd/skills/public/edge-tts/scripts
npm test

This generates a test audio file and verifies the TTS service is working.

Integration Test

Use the built-in tts tool for quick testing:

// Example: Test TTS with default settings
tts("This is a test of the TTS functionality.")

Configuration Test

Verify configuration persistence:

cd /home/user/clawd/skills/public/edge-tts/scripts
node config-manager.js --get
node config-manager.js --set-voice en-US-GuyNeural
node config-manager.js --get

Troubleshooting

Test connectivity: Run npm test to check if TTS service is accessible
Check voice availability: Use node tts-converter.js --list-voices to see available voices
Verify proxy settings: If using proxy, test with node tts-converter.js "test" --proxy http://localhost:7890
Check audio output: The test should generate test-output.mp3 in the scripts directory
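
The last check can be automated: a file that is missing or has zero bytes usually means the request never completed. The helper below is a hypothetical sketch using only Node's built-in `fs` module.

```javascript
const fs = require("node:fs");

// Returns "missing", "empty", or "ok" for the expected output file.
function checkAudioOutput(file) {
  if (!fs.existsSync(file)) return "missing";
  return fs.statSync(file).size > 0 ? "ok" : "empty";
}

console.log(checkAudioOutput("test-output.mp3"));
```

Run it from the scripts directory after `npm test`; any result other than "ok" points back to the connectivity or proxy checks above.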

Notes

node-edge-tts uses Microsoft Edge's online TTS service (updated, working authentication)
No API key needed (free service)
Output is MP3 format by default
Requires internet connection
Supports subtitle generation (JSON format with word-level timing)
Temporary File Handling: By default, audio files are saved to the system's temporary directory (/tmp/edge-tts-temp/ on Unix, C:\Users\<user>\AppData\Local\Temp\edge-tts-temp\ on Windows) with unique filenames (e.g., tts_1234567890_abc123.mp3). Files are not automatically deleted - the calling application (Clawdbot) should handle cleanup after use. You can specify a custom output path with the --output option if permanent storage is needed.
TTS keyword filtering: The skill automatically filters out TTS-related keywords (tts, TTS, text-to-speech) from text before conversion to avoid converting the trigger words themselves to audio
For repeated preferences, use config-manager.js to set defaults
Default voice: en-US-MichelleNeural (female, natural)
Neural voices (ending in Neural) provide higher quality than Standard voices
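
Because temporary files are not deleted automatically, the calling application needs its own cleanup step. The sketch below shows one way to generate names in the documented `tts_<timestamp>_<random>.mp3` style and to remove stale files. It is an illustration with hypothetical function names, not the skill's own code, and the age threshold is up to the caller.

```javascript
const crypto = require("node:crypto");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Matches the documented location: <system temp dir>/edge-tts-temp
const TEMP_DIR = path.join(os.tmpdir(), "edge-tts-temp");

// Create the directory if needed and return a unique file path inside it.
function makeTempAudioPath() {
  fs.mkdirSync(TEMP_DIR, { recursive: true });
  const suffix = crypto.randomBytes(3).toString("hex"); // 6 hex characters
  return path.join(TEMP_DIR, `tts_${Date.now()}_${suffix}.mp3`);
}

// Delete .mp3 files older than maxAgeMs; returns the number removed.
function cleanupTempAudio(maxAgeMs, dir = TEMP_DIR) {
  if (!fs.existsSync(dir)) return 0;
  let removed = 0;
  for (const name of fs.readdirSync(dir)) {
    const file = path.join(dir, name);
    if (name.endsWith(".mp3") && Date.now() - fs.statSync(file).mtimeMs > maxAgeMs) {
      fs.unlinkSync(file);
      removed += 1;
    }
  }
  return removed;
}
```

For example, calling `cleanupTempAudio(60 * 60 * 1000)` on a timer would remove audio files more than an hour old.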

💡 How to install

Download the ZIP and extract it into the skills/ directory; the skill is then ready to use.