Generate speech
Use AI for text-to-speech or to clone your voice via API
Models in this collection
Sorted by popularity (run_count)
Kokoro v1.0 - text-to-speech (82M params, based on StyleTTS2)
Text-to-Audio (T2A) model that offers voice synthesis, emotional expression, and multilingual capabilities. Designed for real-time applications with low latency.
Coqui XTTS-v2: Multilingual Text To Speech Voice Cloning
Text-to-Audio (T2A) model that offers voice synthesis, emotional expression, and multilingual capabilities. Optimized for high-fidelity applications like voiceovers and audiobooks.
Create song covers with any RVC v2 trained AI voice from audio files.
🔊 Text-Prompted Generative Audio Model
Generate expressive, natural speech. Features unique emotion control, instant voice cloning from short audio, and built-in watermarking.
NeonAI Coqui AI TTS Plugin.
Generate speech from text, clone voices from mp3 files. From James Betker AKA "neonbjb".
Generates speech from text
The fastest open source TTS model without sacrificing quality.
SeamlessM4T—Massively Multilingual & Multimodal Machine Translation
Updated to OpenVoice v2: Versatile Instant Voice Cloning
Clone voices to use with Minimax's speech-02-hd and speech-02-turbo
F5-TTS, the new state-of-the-art in open source voice cloning
Orpheus 3B - high quality, emotive Text to Speech
Generate expressive, natural speech in 23 languages. Features instant voice cloning from short audio, emotion control, and seamless cross-language voice transfer.
Generate expressive, natural speech with Resemble AI's Chatterbox.
MetaVoice-1B: 1.2B parameter base model trained on 100K hours of speech
Dia 1.6B by Nari Labs. Generates realistic dialogue audio from text, including non-verbal cues and voice cloning
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Minimax Speech 2.8 Turbo: Turn text into natural, expressive speech with voice cloning, emotion control, and support for 40+ languages
Lightweight text-to-speech (TTS) model, trained on 10.5K hours of audio data
An F5-TTS model fine-tuned for Spanish
CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs
Pheme generates a variety of conversational voices in 16 kHz for phone-call applications
A novel speech model for insane prosody.