From virtual assistants to voiceovers for audiobooks, AI speech generation has become a fast-growing field — and it’s no wonder companies are rushing to realize the technology’s potential.
Among them is the Valencia-based company Voicemod. The startup has developed an AI voice changer and soundboard software that provides instant speech-to-speech conversion. The company claims that, unlike most of its competitors, it converts voices in real time and with low latency, so users can converse as they would in real life.
According to Jaime Bosch, CEO and co-founder of Voicemod, the company trains its AI model on publicly available datasets and recordings of professional voice actors, giving it a broad pool of vocal expressions, pitches, tones and emotions. Through machine learning techniques, the model learns to understand, analyze and predict the patterns and nuances of a person's speech.
“When a user speaks into our software or application, their speech input is processed in real time,” Bosch told TNW. “Our AI model then applies the learned patterns and transformations to the input, enabling instant speech conversion.”
Voicemod is primarily aimed at the entertainment industry, including gamers, streamers, content creators and VTubers, on platforms ranging from Discord and Twitch to Zoom and WhatsApp.
To keep meeting growing user demand for self-expression, pseudonymity and creativity online, the startup is now launching its “AI Humans” collection, adding to the 100 voice options already in its portfolio. Although Voicemod already offers filters for human voices, the company says the new collection is its most realistic yet.
AI Humans is based on recordings of voice actors and consists of 20 sound avatars that differ in personality, gender and age. The personas include Joe, an 80-year-old male voice with a “raspy, sardonic tone,” and Jennifer, a 25-year-old female voice with an “energetic and kind” character. Users can also adjust the pitch of each persona, which changes the perceived gender and age of the voice.
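Voicemod has not published how its pitch control works, so the following is only a rough illustration of the general idea, not the company's method. It assumes pitch is expressed in semitones and uses naive resampling, which also changes the length of the clip; real-time voice changers typically use more sophisticated techniques that preserve duration. The function names `pitch_ratio` and `shift_pitch_naive` are hypothetical.

```python
from typing import List


def pitch_ratio(semitones: float) -> float:
    """Frequency ratio for a shift of `semitones` in equal temperament.

    +12 semitones doubles the frequency (one octave up); -12 halves it.
    """
    return 2.0 ** (semitones / 12.0)


def shift_pitch_naive(samples: List[float], semitones: float) -> List[float]:
    """Shift pitch by resampling with linear interpolation.

    Illustrative assumption only: reading the input faster raises the
    pitch but also shortens the clip, and reading it slower does the
    opposite.
    """
    if not samples:
        return []
    ratio = pitch_ratio(semitones)
    out_len = max(1, int(len(samples) / ratio))
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * ratio              # fractional read position in the input
        lo = min(int(pos), last)     # sample at or before that position
        hi = min(lo + 1, last)       # next sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out
```

In this sketch, a shift of a few semitones up makes a voice sound higher and often younger, while a shift down has the opposite effect, which is the kind of perceptual change the pitch setting is described as producing.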
“AI voices offer exciting opportunities for industries looking to foster creative exploration and self-expression, improve personalization, and foster inclusion in digital spaces,” said Bosch.
However, for all the benefits AI speech generation can bring, the technology also carries numerous risks. These include abuse, fraud, identity theft and even voice theft, the last of which particularly affects professional voice actors.
According to Bosch, Voicemod is actively working to mitigate these risks. For example, the company is developing watermarking technology to help platforms identify and track AI-generated voices, while implementing measures to protect the intellectual property of the voice actors it works with.
Bosch believes that AI will become “a tool” for these professionals. “What is perhaps being overlooked in these discussions is that behind every deployment of real-time voice AI, the use case that Voicemod is targeting, there is a human effectively driving the AI,” he told TNW.
Voicemod already has over 40 million desktop downloads. The software is also set to launch on mobile devices and to reach millions of monthly active users. In addition, the company is working on B2B partnerships with gaming companies and VR headset platforms.
The software is available for free, with the option of a paid PRO version that unlocks additional features and content.