Voice Generator

Create high-quality text-to-speech audio files for your NICE CXone contact center

Note: A Pro subscription is required to access all premium voices and unlimited generations.

Overview

The Voice Generator lets you create professional audio files for IVR systems and contact center applications. It uses the Google Cloud Text-to-Speech and Amazon Polly engines to produce natural-sounding voice recordings that you can import directly into NICE CXone Studio.

Key Features

Multiple Voice Providers: Choose from Google Cloud TTS and Amazon Polly voice engines
Extensive Voice Library: Access 200+ natural-sounding voices across multiple languages
Speech Synthesis Markup Language (SSML) Support: Fine-tune pronunciation, emphasis, pauses, and intonation
Audio Format Control: Generate audio in .wav format optimized for contact center use
Voice Sample Previews: Listen to voice samples before generating your audio file
Batch Processing: Generate multiple audio files in a single operation
Secure Storage: All audio files are securely stored in your account
Direct CXone Import: Easily import generated files into NICE CXone Studio

Getting Started

Follow these steps to generate your first audio file:

Navigate to Tools → Studio Tools → Voice Generator in the main menu
Enter your text in the input field
Select your preferred voice provider (Google Cloud or Amazon Polly)
Choose a voice from the dropdown menu
Adjust voice settings (speed, pitch, etc.) if needed
Click "Generate Audio" to create your audio file
Preview the audio using the built-in player
Download the .wav file or save it to your library

Voice Providers

Google Cloud Text-to-Speech

Google Cloud TTS offers neural voices, including WaveNet voices, that reproduce natural speech patterns for highly realistic audio.

Key features:

Neural voices with natural prosody and intonation
Support for 40+ languages and variants
Wide range of speaking styles
SSML support for advanced speech customization

Amazon Polly

Amazon Polly provides lifelike voices with consistent quality. The service includes standard voices as well as neural voices for even more natural-sounding speech.

Key features:

Neural and standard voice options
Support for 30+ languages
Newscaster speaking style
Extensive SSML support

Using Speech Synthesis Markup Language (SSML)

You can enhance your text-to-speech output using Speech Synthesis Markup Language (SSML). This allows you to control how the text is spoken, including pronunciation, pauses, emphasis, and more.

Example SSML Tags

<speak>
  Hello, my name is <say-as interpret-as="characters">DJ</say-as>.
  I'll be your <emphasis level="strong">virtual assistant</emphasis> today.
  <break time="500ms"/>
  Please select from the following options.
</speak>
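If you write SSML by hand, characters such as & and < have special meaning, so text containing them (for example, a company name like "Smith & Sons") must be escaped before it goes inside <speak>; the speech providers typically reject malformed SSML. The following Python sketch uses only the standard library to escape plain text, wrap it in <speak>, and check that the result parses as XML before you paste it into the input field. The function names to_ssml and is_well_formed are hypothetical and are not part of the Voice Generator.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_ssml(plain_text: str) -> str:
    """Escape &, <, and > in plain text and wrap the result in <speak>."""
    return f"<speak>{escape(plain_text)}</speak>"


def is_well_formed(ssml: str) -> bool:
    """Return True if the markup parses as XML and its root element is <speak>."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError:
        return False
    # Strip an optional XML namespace, e.g. "{http://www.w3.org/2001/10/synthesis}speak".
    return root.tag.rsplit("}", 1)[-1] == "speak"


if __name__ == "__main__":
    ssml = to_ssml("Thanks for calling Smith & Sons.")
    print(ssml)                                           # <speak>Thanks for calling Smith &amp; Sons.</speak>
    print(is_well_formed(ssml))                           # True
    print(is_well_formed("<speak>Smith & Sons</speak>"))  # False: unescaped &
```

A well-formedness check only confirms the markup is valid XML; whether a given tag or attribute is supported still depends on the selected voice provider.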

Common SSML Tags

Tag	Description	Example
<break>	Adds a pause	`<break time="500ms"/>`
<emphasis>	Emphasizes text	`<emphasis level="strong">important</emphasis>`
<say-as>	Controls how text is interpreted (for example, as individual characters)	`<say-as interpret-as="characters">PIN</say-as>`
<prosody>	Adjusts speaking rate, pitch, and volume	`<prosody rate="slow" pitch="low">text</prosody>`
<sub>	Speaks an alias in place of the written text	`<sub alias="Doctor">Dr.</sub>`
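If you assemble prompts in a script before pasting them into the Voice Generator, small helper functions can produce the tags listed above consistently. This is a hypothetical Python sketch, not a Voice Generator or provider API: each helper escapes its text and attribute values with the standard library and emits only the attributes shown in the table. Which attribute values are accepted (for example, emphasis levels or say-as types) depends on the voice provider, so check its SSML reference.

```python
from xml.sax.saxutils import escape, quoteattr


def brk(ms: int) -> str:
    """<break>: pause for `ms` milliseconds. Named brk because break is a Python keyword."""
    return f'<break time="{int(ms)}ms"/>'


def emphasis(text: str, level: str = "strong") -> str:
    """<emphasis>: stress a word or phrase at the given level."""
    return f"<emphasis level={quoteattr(level)}>{escape(text)}</emphasis>"


def say_as(text: str, interpret_as: str) -> str:
    """<say-as>: tell the engine how to read the text, e.g. as "characters"."""
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{escape(text)}</say-as>"


def prosody(text: str, **attrs: str) -> str:
    """<prosody>: set rate, pitch, and/or volume via keyword arguments."""
    attr_str = "".join(f" {name}={quoteattr(value)}" for name, value in attrs.items())
    return f"<prosody{attr_str}>{escape(text)}</prosody>"


def sub(written: str, alias: str) -> str:
    """<sub>: speak `alias` in place of the written text."""
    return f"<sub alias={quoteattr(alias)}>{escape(written)}</sub>"


if __name__ == "__main__":
    print(brk(500))                                   # <break time="500ms"/>
    print(emphasis("important"))                      # <emphasis level="strong">important</emphasis>
    print(say_as("PIN", "characters"))                # <say-as interpret-as="characters">PIN</say-as>
    print(prosody("text", rate="slow", pitch="low"))  # <prosody rate="slow" pitch="low">text</prosody>
    print(sub("Dr.", "Doctor"))                       # <sub alias="Doctor">Dr.</sub>
```

Each call reproduces the corresponding example from the table, and text such as "R&D" is escaped automatically.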

Audio File Management

All generated audio files are stored in your account and can be managed through the Voice Generator interface:

Library: Access all your previously generated audio files
Download: Download files in .wav format compatible with NICE CXone Studio
Rename: Change file names for better organization
Delete: Remove files you no longer need
Re-generate: Create a new version of an existing file with different settings

Best Practices

Follow these guidelines to get the best results from the Voice Generator:

Keep prompts concise: Shorter prompts are easier for callers to understand
Use punctuation: Proper punctuation improves the natural flow of speech
Test different voices: Different voices may handle certain phrases better than others
Use SSML for special terms: Acronyms, numbers, and specialized terms often benefit from SSML markup
Consider language localization: Choose voices that match your target audience's dialect and accent
Preview before saving: Always listen to the generated audio before finalizing
Maintain consistent voices: Use the same voice for all prompts in a single IVR flow
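The guideline on special terms can be partly automated. The Python sketch below is a hypothetical pre-check, not a Voice Generator feature: it lists runs of two or more capital letters (a rough stand-in for acronyms) that are not already inside a <say-as> element, so you can decide whether each one needs markup. The heuristic is an assumption and will miss some terms (for example, mixed-case abbreviations) while flagging others that read fine as words.

```python
import re

# A complete <say-as ...>...</say-as> element (non-greedy; not nested).
SAY_AS = re.compile(r"<say-as\b[^>]*>.*?</say-as>", re.DOTALL)
# Any other tag, removed so attribute values are not scanned.
TAG = re.compile(r"<[^>]+>")
# Two or more consecutive capital letters forming a whole word.
ACRONYM = re.compile(r"\b[A-Z]{2,}\b")


def unmarked_acronyms(prompt: str) -> list[str]:
    """Return all-caps words in `prompt` that are not inside <say-as>."""
    text = SAY_AS.sub(" ", prompt)
    text = TAG.sub(" ", text)
    return ACRONYM.findall(text)


if __name__ == "__main__":
    prompt = ('<speak>Enter your PIN, then say your '
              '<say-as interpret-as="characters">ID</say-as> number.</speak>')
    print(unmarked_acronyms(prompt))  # ['PIN']
```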

Examples

Here are some example use cases for the Voice Generator:

Basic Greeting

Thank you for calling Customer Service. This call may be recorded for quality and training purposes.

Menu Options with SSML

<speak>
  Please select from the following options:
  <break time="300ms"/>
  For Sales, press <emphasis level="strong">1</emphasis>.
  <break time="300ms"/>
  For Support, press <emphasis level="strong">2</emphasis>.
  <break time="300ms"/>
  For Billing, press <emphasis level="strong">3</emphasis>.
  <break time="300ms"/>
  To repeat this menu, press <emphasis level="strong">0</emphasis>.
</speak>
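For longer menus, you can generate this markup from a list of options so that every pause and emphasis tag stays consistent. This is a minimal Python sketch: build_menu_ssml is a hypothetical helper whose output you would paste into the Voice Generator input field, and the 300 ms default pause simply mirrors the example above. Called with the four options shown, it reproduces the example markup line for line.

```python
from xml.sax.saxutils import escape


def build_menu_ssml(options: list[tuple[str, str]], pause_ms: int = 300) -> str:
    """Build an IVR menu prompt: an intro line, then one "<phrase>, press <key>."
    line per option, each preceded by a pause and with the key emphasized."""
    pause = f'<break time="{pause_ms}ms"/>'
    lines = ["<speak>", "  Please select from the following options:"]
    for phrase, key in options:
        lines.append(f"  {pause}")
        lines.append(
            f'  {escape(phrase)}, press <emphasis level="strong">{escape(key)}</emphasis>.'
        )
    lines.append("</speak>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_menu_ssml([
        ("For Sales", "1"),
        ("For Support", "2"),
        ("For Billing", "3"),
        ("To repeat this menu", "0"),
    ]))
```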

Troubleshooting

If you encounter issues with the Voice Generator, try these solutions:

Audio sounds unnatural: Try different voices or add punctuation and SSML markup
Generation fails: Ensure your text doesn't exceed the character limit (4,000 characters)
Pronunciation issues: Use SSML <say-as> or <phoneme> tags to correct pronunciation
File size too large: Shorten your text or split it into multiple audio files
Import issues with Studio: Ensure you're using the correct audio format (.wav)
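When a script exceeds the 4,000-character limit, you can split it at sentence boundaries and generate one audio file per chunk. This Python sketch assumes plain text input (splitting SSML could cut a tag in half) and assumes the limit counts every character you enter; split_for_generation is a hypothetical helper. A sentence longer than the limit is broken at the last space that fits, or cut at the limit if it contains no spaces.

```python
import re

MAX_CHARS = 4000  # character limit stated in this guide


def split_for_generation(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split plain text into chunks of at most `limit` characters,
    preferring sentence boundaries (., !, or ? followed by whitespace)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Break an over-long sentence at the last space that fits (or hard-cut it).
        while len(sentence) > limit:
            cut = sentence.rfind(" ", 0, limit + 1)
            if cut <= 0:
                cut = limit
            piece, sentence = sentence[:cut].rstrip(), sentence[cut:].lstrip()
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece)
        # Add the sentence to the current chunk if it fits; otherwise start a new one.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    script = "Thank you for calling. " * 300  # about 6,900 characters
    parts = split_for_generation(script)
    print(len(parts), [len(p) for p in parts])  # 2 [3978, 2920]
```

Keeping whole sentences together avoids audio files that stop mid-phrase, at the cost of chunks that are often somewhat shorter than the limit.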