Voice Generator

Create high-quality text-to-speech audio files for your NICE CXone contact center

Note: The Voice Generator requires a Pro subscription to access all premium voices and unlimited generations.

Overview

The Voice Generator creates professional audio files for your IVR systems and contact center applications. It uses text-to-speech engines from Google Cloud and Amazon Polly to produce natural-sounding voice recordings that can be imported directly into NICE CXone Studio.

Key Features

  • Multiple Voice Providers: Choose from Google Cloud TTS and Amazon Polly voice engines
  • Extensive Voice Library: Access 200+ natural-sounding voices across multiple languages
  • SSML Support: Fine-tune pronunciation, emphasis, pauses, and intonation with Speech Synthesis Markup Language
  • Audio Format Control: Generate audio in .wav format optimized for contact center use
  • Voice Sample Previews: Listen to voice samples before generating your audio file
  • Batch Processing: Generate multiple audio files in a single operation
  • Secure Storage: All audio files are securely stored in your account
  • Direct CXone Import: Easily import generated files into NICE CXone Studio

Getting Started

Follow these steps to generate your first audio file:

  1. Navigate to Tools → Studio Tools → Voice Generator in the main menu
  2. Enter your text in the input field
  3. Select your preferred voice provider (Google Cloud or Amazon Polly)
  4. Choose a voice from the dropdown menu
  5. Adjust voice settings (speed, pitch, etc.) if needed
  6. Click "Generate Audio" to create your audio file
  7. Preview the audio using the built-in player
  8. Download the .wav file or save it to your library

Voice Providers

Google Cloud Text-to-Speech

Google Cloud TTS offers advanced neural voices with natural-sounding speech patterns. These voices use WaveNet technology for highly realistic audio generation.

Key features:

  • Neural voices with natural prosody and intonation
  • Support for 40+ languages and variants
  • Wide range of speaking styles
  • SSML support for advanced speech customization

Amazon Polly

Amazon Polly provides lifelike voices with consistent quality. The service includes standard voices as well as neural voices for even more natural-sounding speech.

Key features:

  • Neural and standard voice options
  • Support for 30+ languages
  • Newscaster speaking style
  • Extensive SSML markup support

Using Speech Synthesis Markup Language (SSML)

You can enhance your text-to-speech output using Speech Synthesis Markup Language (SSML). This allows you to control how the text is spoken, including pronunciation, pauses, emphasis, and more.

Example SSML Tags

<speak>
  Hello, my name is <say-as interpret-as="characters">DJ</say-as>.
  I'll be your <emphasis level="strong">virtual assistant</emphasis> today.
  <break time="500ms"/>
  Please select from the following options.
</speak>
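
If prompt text comes from a spreadsheet or script, characters such as & and < must be escaped before the text is wrapped in SSML, or the markup will not be valid XML. The following is a minimal Python sketch of that step. The function name build_ssml and its pause_ms parameter are illustrative assumptions, not part of the Voice Generator; the result is meant to be pasted into the text input field.

```python
from xml.sax.saxutils import escape


def build_ssml(text: str, pause_ms: int = 0) -> str:
    """Wrap plain text in <speak>, escaping XML special characters.

    If pause_ms is greater than 0, a <break> is appended after the text.
    (Hypothetical helper for illustration only.)
    """
    body = escape(text)  # converts &, <, > to &amp;, &lt;, &gt;
    if pause_ms > 0:
        body += f'<break time="{pause_ms}ms"/>'
    return f"<speak>{body}</speak>"


print(build_ssml("Terms & conditions apply.", pause_ms=500))
# prints: <speak>Terms &amp; conditions apply.<break time="500ms"/></speak>
```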

Common SSML Tags

Tag          Description                    Example
<break>      Adds a pause                   <break time="500ms"/>
<emphasis>   Emphasizes text                <emphasis level="strong">important</emphasis>
<say-as>     Controls how text is spoken    <say-as interpret-as="characters">PIN</say-as>
<prosody>    Adjusts rate, pitch, volume    <prosody rate="slow" pitch="low">text</prosody>
<sub>        Substitutes pronunciation      <sub alias="Doctor">Dr.</sub>
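
An unclosed or misspelled tag is an easy mistake to make in hand-written SSML. As a rough pre-check before pasting markup into the input field, the Python sketch below parses the text as XML and flags any tag that is not mentioned on this page. The function name check_ssml and the KNOWN_TAGS set are assumptions made for illustration; each provider supports additional tags and may reject some of these, so an empty result does not guarantee that generation will succeed.

```python
import xml.etree.ElementTree as ET

# Tags mentioned on this page, plus the <speak> root (an assumed allow-list).
KNOWN_TAGS = {"speak", "break", "emphasis", "say-as", "prosody", "sub", "phoneme"}


def check_ssml(ssml: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    problems = []
    if root.tag != "speak":
        problems.append(f"root element is <{root.tag}>, expected <speak>")
    for element in root.iter():  # iter() visits the root and all descendants
        if element.tag not in KNOWN_TAGS:
            problems.append(f"unrecognized tag <{element.tag}>")
    return problems


print(check_ssml('<speak>Hello <break time="500ms"/></speak>'))
print(check_ssml("<speak>Hello <blink>there</blink></speak>"))
```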

Audio File Management

All generated audio files are stored in your account and can be managed through the Voice Generator interface:

  • Library: Access all your previously generated audio files
  • Download: Download files in .wav format compatible with NICE CXone Studio
  • Rename: Change file names for better organization
  • Delete: Remove files you no longer need
  • Re-generate: Create a new version of an existing file with different settings

Best Practices

Follow these guidelines to get the best results from the Voice Generator:

  • Keep prompts concise: Shorter prompts are easier for callers to understand
  • Use punctuation: Proper punctuation improves the natural flow of speech
  • Test different voices: Different voices may handle certain phrases better than others
  • Use SSML for special terms: Acronyms, numbers, and specialized terms often benefit from SSML markup
  • Consider language localization: Choose voices that match your target audience's dialect and accent
  • Preview before saving: Always listen to the generated audio before finalizing
  • Maintain consistent voices: Use the same voice for all prompts in a single IVR flow

Examples

Here are some example use cases for the Voice Generator:

Basic Greeting

Thank you for calling Customer Service. This call may be recorded for quality and training purposes.

Menu Options with SSML

<speak>
  Please select from the following options:
  <break time="300ms"/>
  For Sales, press <emphasis level="strong">1</emphasis>.
  <break time="300ms"/>
  For Support, press <emphasis level="strong">2</emphasis>.
  <break time="300ms"/>
  For Billing, press <emphasis level="strong">3</emphasis>.
  <break time="300ms"/>
  To repeat this menu, press <emphasis level="strong">0</emphasis>.
</speak>
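
Menus like the one above follow a regular pattern, so the markup can be generated from a list of options instead of being typed by hand. The Python sketch below is one way to do that. The function name menu_ssml and its parameters are hypothetical, and the output is meant to be pasted into the Voice Generator's text input field.

```python
from xml.sax.saxutils import escape


def menu_ssml(options: list[tuple[str, str]], pause_ms: int = 300) -> str:
    """Build a menu prompt from (label, key) pairs, e.g. ("Sales", "1").

    Hypothetical helper for illustration only.
    """
    pause = f'<break time="{pause_ms}ms"/>'
    lines = ["<speak>", "  Please select from the following options:", f"  {pause}"]
    for label, key in options:
        lines.append(
            f"  For {escape(label)}, press "
            f'<emphasis level="strong">{escape(key)}</emphasis>.'
        )
        lines.append(f"  {pause}")
    lines.append('  To repeat this menu, press <emphasis level="strong">0</emphasis>.')
    lines.append("</speak>")
    return "\n".join(lines)


print(menu_ssml([("Sales", "1"), ("Support", "2"), ("Billing", "3")]))
```

Called with the three options shown, the function reproduces the menu example above.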

Troubleshooting

If you encounter issues with the Voice Generator, try these solutions:

  • Audio sounds unnatural: Try different voices or add punctuation and SSML markup
  • Generation fails: Ensure your text doesn't exceed the character limit (4,000 characters)
  • Pronunciation issues: Use SSML <say-as> or <phoneme> tags to correct pronunciation
  • File size too large: Shorten your text or split it into multiple audio files
  • Import issues with Studio: Ensure you're using the correct audio format (.wav)
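
For long scripts, the character-limit and file-size items above both come down to splitting the text into smaller prompts. The Python sketch below splits plain text at sentence boundaries so that each chunk stays within the limit. The function name split_prompt is an assumption for illustration. It is intended for plain text only, since splitting SSML this way could separate opening and closing tags. Whether SSML markup counts toward the 4,000-character limit is not stated here, so leave some headroom if you add tags afterwards.

```python
import re

CHAR_LIMIT = 4000  # limit quoted in the troubleshooting list above


def split_prompt(text: str, limit: int = CHAR_LIMIT) -> list[str]:
    """Split plain text into chunks of at most `limit` characters,
    breaking after sentence-ending punctuation where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


# A small limit is used here only to make the behavior visible.
print(split_prompt("One. Two. Three.", limit=10))
# prints: ['One. Two.', 'Three.']
```

Each chunk can then be generated as a separate audio file.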

Related Documentation

Need help with the Voice Generator? Contact our support team for assistance.