Transcribe voice channels with speech-to-text, synthesize messages with text-to-speech, and download your audio and transcription files.
Visit our website: SeaVoice Discord Bot Homepage
The SeaVoice Bot is a new speech-to-text and text-to-speech Discord integration brought to you by Seasalt.ai, a startup run by some of the world's leading experts in deep speech recognition, neural speech synthesis, and natural language processing.
Watch the demo video: https://www.youtube.com/embed/drOVk_bexFY
SeaVoice is a voice intelligence bot that uses advanced AI technology to improve the Discord voice channel experience. One of the great things about Discord's text channels is that they keep a permanent log of the server's conversations. But what about the voice channels? Once something is said in a voice channel, it's gone: you can't catch up on a part of the conversation you missed or search it later.
Invite SeaVoice to the voice channel, and you get real-time speech transcriptions delivered to a chat channel as the conversation happens. You'll also receive a final version of your transcript and voice recording in a DM after the session ends. What sets SeaVoice apart from bots offering similar services is that it's backed by state-of-the-art deep learning models crafted by Seasalt.ai.
We feel that providing highly accurate transcriptions for voice channels is a huge accessibility improvement for Discord. Additionally, because transcriptions are automatically posted to a text channel, they are permanent, searchable, and shareable. Speech synthesis likewise boosts participation in voice channels by making them more accessible to people who can't or don't want to speak.
Capabilities
Speech-to-Text
Transcribe Audio from Discord Voice Channels
/recognize [language]
The bot joins the voice channel you're currently in, listens continuously, and outputs real-time transcriptions to the text channel where you entered the slash command. It records and transcribes everyone in the voice channel. When the session ends, the bot DMs the session creator a final transcription file, an SRT-formatted transcript file (used for subtitles), and a link to download the full audio. The session wraps up automatically if all users leave the voice channel, or if the bot shuts down or restarts for any reason (such as when a new version is released).
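The SRT file uses the standard SubRip layout: each cue is a sequence number, a start/end time range in HH:MM:SS,mmm format, the caption text, and a blank line. As a rough illustration of that layout (not SeaVoice's actual exporter), the Python sketch below renders a list of transcribed segments as SRT. The Segment class and the "speaker: text" caption style are assumptions made for this example; SeaVoice's own files may label speakers differently or not at all.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One transcribed utterance (hypothetical structure for this example)."""

    start: float  # seconds since the session started
    end: float  # seconds since the session started
    speaker: str
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"
        )
    return "\n".join(cues)
```

For example, to_srt([Segment(0.0, 2.5, "Alice", "Hello everyone")]) produces a single cue numbered 1 covering 00:00:00,000 --> 00:00:02,500. Saved with a .srt extension, output in this shape can be loaded as subtitles by common video players and editors.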
SeaVoice currently supports 12 languages. The English and Taiwanese Mandarin models are our own in-house models, trained from scratch; they are highly accurate and reliable. All other languages use a multilingual open-source model as the base. Its performance wasn't great out of the box, so we integrated it into our own STT pipeline and tuned the model to improve it. One thing you may notice with the open-source model is "hallucination". This can show up in several ways: inserting words or phrases that weren't said, transcribing in the wrong language, or translating the spoken language into a different language.
Language |
---|
English |
Mandarin (Taiwan) |
Spanish |
Italian |
Portuguese |
German |
Japanese |
Korean |
Russian |
Hindi |
Vietnamese |
Synthesize Speech from Chat to Voice Channel
Seasalt.ai also excels at speech synthesis. We offer a text-to-speech command, which allows users to type in a chat channel and have audio synthesized and played in a particular voice channel for them.
/speak [voice] [text]
To use this command, you should already be in a voice channel. In any text channel, type the /speak slash command, optionally specify which voice you would like to use, and then enter the text that you would like synthesized. When the TTS is done speaking, a reaction will be added to the command message. If no voice is specified, the default is Orca; you can also set your own default voice with the default_tts_voice option of the /user_config command. The available voices are listed below:
Name | Sex | Language |
---|---|---|
Orca | M | American English |
Narwhal | M | British English |
Angelfish | F | American English |
Starfish | F | Mandarin (Taiwan) |
Dolphin | F | Mandarin (Taiwan) |
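The voice-selection rules for /speak can be summarized as a small lookup: an explicitly requested voice wins, otherwise your personal default from /user_config is used, otherwise Orca. The Python sketch below is a hypothetical model of that precedence, not SeaVoice's code; the function name resolve_voice and the choice to reject unknown names with ValueError are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple

# Voices from the table above: name -> (sex, language).
VOICES: Dict[str, Tuple[str, str]] = {
    "Orca": ("M", "American English"),
    "Narwhal": ("M", "British English"),
    "Angelfish": ("F", "American English"),
    "Starfish": ("F", "Mandarin (Taiwan)"),
    "Dolphin": ("F", "Mandarin (Taiwan)"),
}
BUILT_IN_DEFAULT = "Orca"  # used when neither /speak nor /user_config names a voice


def resolve_voice(requested: Optional[str], user_default: Optional[str]) -> str:
    """Return the voice a /speak message would use (hypothetical model).

    Assumed precedence: the [voice] argument, then the user's
    default_tts_voice setting, then the built-in default.
    """
    for candidate in (requested, user_default):
        if candidate is not None:
            if candidate not in VOICES:
                # Rejecting unknown names is an assumption of this sketch.
                raise ValueError(f"unknown voice: {candidate!r}")
            return candidate
    return BUILT_IN_DEFAULT
```

Under this model, a user whose default is Dolphin still hears Narwhal when they explicitly ask for it, and hears Orca only when no voice has been chosen anywhere.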
Export Audio & Transcriptions from Voice Channels
Users can download their transcriptions and full audio recordings as files.
When the STT session ends, the bot DMs the session creator a final transcription file, an SRT-formatted transcript file (used for subtitles), and a link to download the full audio. To download the audio, follow the link, right-click in the web browser, and select "Save as…". Download links expire after 24 hours, so if you want a permanent copy of your files, download them to your computer.
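If you prefer a script to the browser's "Save as…" step, the Python sketch below copies the file behind a download link to disk using only the standard library. It assumes the link in the DM points directly at the audio file and needs no login or cookies; save_recording is a hypothetical helper name, not part of SeaVoice. Since links expire, run it within 24 hours of receiving the DM.

```python
import shutil
import urllib.request
from pathlib import Path


def save_recording(url: str, dest: Path) -> Path:
    """Download the file at `url` to `dest`, creating parent folders as needed.

    Hypothetical helper: assumes `url` points straight at the audio file
    and needs no login or cookies.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, dest.open("wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk in chunks
    return dest
```

Call it as save_recording(link, Path("recordings/my-session")), replacing link with the URL from your DM and adding whatever file extension the link uses.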
Configuration
SeaVoice offers customizable settings for both servers and individual users.
Note: If you update any settings, you must stop and restart any active /recognize sessions before the new settings take effect.
Configure settings for everyone in the server
/server_config [live_transcript] [transcript_recipients] [transcript_style] [ignore_bots] [censor]
Use the /server_config command to configure the settings for the server you are currently in. Only users with admin permissions in the server may use this command.
Servers currently have the following settings: live_transcript, transcript_recipients, transcript_style, ignore_bots, and censor.
Configure settings for just yourself
/user_config [exclude_stt] [default_tts_voice]
Use the /user_config command to configure your personal settings for your Discord account. These settings persist no matter which server you are in.
Users currently have the following settings: exclude_stt and default_tts_voice.
Check your current server or user configurations
/server_status
Run the /server_status command to get a breakdown of your current server configuration.
/user_status
Run the /user_status command to get a breakdown of your current user configuration.
The SeaVoice Discord bot is completely free. No sign-up required. Try it out and have fun!
About Seasalt.ai
Seasalt.ai is a Seattle-based startup founded by experts in speech and language technologies.
Data Usage
We collect anonymized voice data for the sole purpose of improving our speech and NLP models. We will never share or sell your data. You can read our full privacy policy here.
Keywords
Text-to-speech | TTS | Text to Speech | Text to Voice | Speech Synthesis
Speech-to-text | STT | Transcription | Speech Recognition
Real-time
Artificial Intelligence | AI
Communication
Utility
Voice Channel | Voice Chat
Accessibility