How data is used in AI dubbing... | Papercup Blog
by Sam Wells
August 2, 2024
4 min read

How data is used in AI dubbing

Data is the cornerstone of any generative AI company. In AI dubbing, the quality and effectiveness of synthetic voices hinge on the types of data used to train speech models. From custom speech data meticulously recorded for specific use cases to licensed and open-source datasets, each type contributes something different to the process. Let’s look at the data we use at Papercup, why we use it, and how we procure it.

The role of data in AI dubbing

What is training data?

Training data is the foundation on which AI algorithms learn patterns and relationships and make predictions or decisions. In AI dubbing, this training data primarily consists of speech data, which falls into three categories:

  • Open-source speech data: Data with an open license that anyone can use, modify, or distribute.

  • Custom speech data: High-quality data commissioned for specific use cases.

  • Licensed speech data: Pre-existing data licensed from content owners for use in model training.

The different types of data: custom, licensed, and open-source

Open-source data

Open-source data offers a vast pool of speech content, which allows us to establish a baseline for training AI models. This initial training phase helps establish basic model capabilities before refining them with high-quality, use-case-specific data.

Licensed data

Licensed data (data, in our case, procured via partnerships with top content studios) provides access to high-quality speech recordings that fit particular use cases. We never copy the underlying voices in the content, only the style of the speech.

Our partnerships are mutually beneficial. Content owners license their data to train our models and can then take advantage of the higher-grade AI dubs those models produce.

Custom speech data

At Papercup, custom speech data refers to high-quality speech datasets commissioned by Papercup specifically to build synthetic voices. To collect this data, we work with professional voice actors, a process that enables us to cast for the right voice characteristics—tone, pitch, and accent—and direct the performance.

Another reason we use custom data is that we can be very specific about how it will be used commercially—this protects both us and the voice actors, as we lay out in our ethical pledge.

Custom and licensed data provide high-quality, specific training examples; open-source data can fill in the gaps and increase the overall volume of training data. This combination ensures that models are well-rounded and capable of performing effectively in various scenarios.
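As a rough illustration, the two-stage strategy described above (establish a baseline on broad open-source data, then refine on smaller high-quality sets) could be sketched as a training schedule. Everything here, the dataset names and the `build_training_schedule` helper, is a hypothetical sketch, not Papercup's actual pipeline:

```python
# Toy sketch: pretrain on large open-source data, then fine-tune on
# smaller, higher-quality custom and licensed data. All names are
# illustrative assumptions.

def build_training_schedule(open_source, custom, licensed, finetune_epochs=3):
    """Return an ordered list of (phase, clip_id) training steps."""
    # Phase 1: one pass over the broad open-source pool for a baseline.
    schedule = [("pretrain", clip) for clip in open_source]
    # Phase 2: repeated passes over the scarce, high-quality data.
    high_quality = custom + licensed
    for _ in range(finetune_epochs):
        schedule += [("finetune", clip) for clip in high_quality]
    return schedule

schedule = build_training_schedule(
    open_source=["os_001", "os_002", "os_003"],
    custom=["va_001"],
    licensed=["studio_001"],
)
```

The point of the sketch is the ordering: the cheap, plentiful data establishes basic capabilities first, and the expensive, use-case-specific data is revisited repeatedly to shape the final voice quality.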

What are all these different types of data used for?

Producing high-quality, multilingual voices

The custom datasets created with voice talent enable Papercup to offer AI dubbing in a wide range of languages that sound native. Romance languages, known for their emotive nature, are well suited to expressive voice performances, while Germanic languages and Arabic present unique challenges due to their phonetic characteristics. Working with voice actors allows us to iteratively collect data that solves problems specific to each language. We optimize the data we collect by using phonetic guide markers and voice directors who speak the target language to maintain consistency and accuracy across languages.

Testing new languages

Open-source data allows us to roll out new languages faster by drawing on vast amounts of diverse, readily available speech data. While this alone cannot produce a complete new language, it allows us to test and learn before layering on custom data, which is more expensive and time-consuming to source.

Tailoring models to specific use cases

Customer data is invaluable for training AI models to handle specific use cases. For instance, a media company specializing in documentary films can provide speech data that helps our AI learn the nuances of documentary narration. Using this tailored data, we create models better suited for particular genres, improving the quality and relevance of the synthetic voices.

Using customer data is not a one-time process; new data is continuously incorporated to refine and improve the models. As more data is collected, the models become increasingly sophisticated and capable of generating even higher-quality synthetic voices.

Performance monitoring and adjustment

Customer feedback on the quality and accuracy of the dubs is crucial for identifying areas of improvement. This feedback loop enables us to make the necessary adjustments, ensuring our AI voices meet the high standards expected by our customers. 

How does Papercup procure its data?

Working with voice actors

Our voice actors record using a carefully constructed method: our scripts are meticulously crafted and optimized to capture the subtle nuances of real language and produce precisely the data we need. Creating this controlled environment is crucial for ensuring the quality of the synthetic voice. In our case, quality can refer to audio quality (fidelity), the expressivity of the voice, and its ability to produce all manner of natural speech and non-speech features (e.g., laughing, crying).
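To give a flavor of what an "optimized" recording script can mean in practice, here is a toy sketch of a phoneme-coverage check: given a candidate script, it reports which target phonemes the script fails to cover. The lexicon, phoneme labels, and function name are illustrative assumptions, not Papercup's actual tooling:

```python
# Toy word-to-phoneme lexicon (illustrative; a real system would use a
# full pronunciation dictionary or grapheme-to-phoneme model).
TOY_LEXICON = {
    "cat": ["k", "ae", "t"],
    "dog": ["d", "ao", "g"],
    "ship": ["sh", "ih", "p"],
}

def missing_phonemes(script_words, target_phonemes):
    """Return the target phonemes not covered by any word in the script."""
    covered = set()
    for word in script_words:
        covered.update(TOY_LEXICON.get(word, []))
    return sorted(set(target_phonemes) - covered)

# A script of "cat" and "dog" covers k/ae/t/d/ao/g but misses "sh".
missing = missing_phonemes(["cat", "dog"], {"k", "sh", "t", "g"})
```

A check like this is one plausible way to ensure a recording session captures the full range of sounds a voice model must learn to reproduce.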

Commissioning this high-quality data ensures we consistently train our models to produce equally high-quality AI voices for specific use cases. For example, a speech model trained solely on open-source sports material won’t produce high-quality dubs for a reality television series.

Content partnerships with top studios

As well as working directly with voice actors, we have data partnerships with top content studios (whose content aligns with our objectives) that give us access to a specific data pool to train our models. This works for the industry and us; the studios are investing in a service that enables them to scale global distribution, and we’re gaining access to top-quality, use-case-specific data.

Customer data

Papercup prioritizes the ethical use of data. We never use customer data without explicit permission. Some customers allow their data to be used to improve AI models, while others prefer to keep it solely for dubbing purposes. This transparent approach ensures that the use of data aligns with customer preferences and enhances the quality of the AI-generated voices.

Summary

Data is the foundation of AI dubbing and directly influences the quality of the synthetic voice models we produce. Papercup’s voices are geared towards providing AI dubbing for media companies (Fremantle, Fuse Media, Cineverse), digital publishers (Bloomberg, Sky News, Insider), and enterprises like Mindvalley. We factor this into our data collection and shape our process around it to create AI voices that surpass our customers’ quality bar and, perhaps even more importantly, that of their viewers.

To find out more about AI dubbing, take the tour.
