Audio Content Creation
Optimize text-to-speech voice output by easily adjusting and fine-tuning key speech attributes.
Get started
New to Speech Services? Create a Speech resource
Tailor audio to your unique needs
Audio Content Creation lets you visually inspect speech attributes in real time, such as voice style, rate, pitch,
volume, pronunciation and breaks. This lets you tailor speech patterns and quickly create more accurate,
expressive and customized audio output.
<mstts:express-as type="cheerful">"Oh my god, you are a genius."</mstts:express-as> Mom said to her son.
Customize output by <prosody rate="-50.00%">slowing down the speaking rate.</prosody>
Add a break <break time="600ms"/> between words.
You can pronounce it ASAP or <sub alias="as soon as possible">ASAP</sub>.
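The snippets above are fragments. To be synthesized they need to sit inside a complete SSML document. The following is a minimal sketch of what that could look like, not output copied from the tool. The voice name en-US-JennyNeural is an assumption chosen for illustration; any voice that supports the cheerful style should work. The express-as attribute is written as style here, which, to the best of our knowledge, is the name used in current Speech service documentation, whereas the fragments above use type.

```xml
<!-- Sketch only: the voice name below is an assumption, not taken from the page. -->
<speak version="1.0"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="https://www.w3.org/2001/mstts"
       xml:lang="en-US">
  <voice name="en-US-JennyNeural">
    <!-- Speaking style applied to the quoted sentence only -->
    <mstts:express-as style="cheerful">"Oh my god, you are a genius."</mstts:express-as> Mom said to her son.
    <!-- Rate is relative: -50.00% requests half the default speed -->
    Customize output by <prosody rate="-50.00%">slowing down the speaking rate.</prosody>
    <!-- Explicit 600 ms pause -->
    Add a break <break time="600ms"/> between words.
    <!-- The second ASAP is read aloud as the alias text -->
    You can pronounce it ASAP or <sub alias="as soon as possible">ASAP</sub>.
  </voice>
</speak>
```

The mstts namespace declaration on the speak element is what makes the mstts:express-as element valid; without it the document is not well-formed XML.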
Here’s how it all comes to life
Build accurate, authoritative audio for news reporting
Credibility is crucial to media organizations. This trust is built, in part, through accurate pronunciation of proper names, places and events. Our platform helps you tailor voice output to ensure precision in reporting.

Fine-tune custom expressions for conversational AI
Voice interaction has become ubiquitous. A large part of our daily voice interaction with conversational AI revolves around standard questions, which the AI answers with frequently used, fixed phrases that should sound pleasant and natural. To accomplish this, our platform lets you customize text-to-speech audio output, including tweaking domain-specific expressions.

Perform lifelike narration for training videos and product demos
Professional narration by a voice actor adds sophistication and polish to presentations. But human narrators are costly and time-consuming to direct. This platform creates human-like audio that’s both cost-effective and time-efficient.
Steps for getting the best audio
1 Create a Speech resource at
2 Create a new tuning file or upload your texts.
3 Choose a language and voices for your texts.
4 Customize and fine-tune the speech output.
5 Download the audio, or get the SSML code to embed.