Voice - PolyAI Platform

This page requires Python familiarity. It covers programmatic voice configuration from Python functions. The PolyAI platform supports flexible voice selection for external providers such as ElevenLabs, Cartesia, Rime, PlayHT, Minimax, Hume, and Google TTS.

Provider classes

Use provider-specific TTSVoice classes when you need to pick a model, adjust stability, or access a third-party provider. The examples below assume they run inside a function where conv is the conversation object.

Example: ElevenLabs

from polyai.voice import ElevenLabsVoice

conv.set_voice(
    ElevenLabsVoice(
        provider_voice_id="gDnGxUcsitTxRiGHr904",
        model_id="eleven_turbo_v2_5",
        stability=1.0,          # Recommended starting point (Robust); eleven_v3 only supports 0.0, 0.5, 1.0
        similarity_boost=0.7,
        speed=1.0,              # Optional: 0.7–1.2, adjusts speech rate
    )
)

Available ElevenLabs model IDs: eleven_monolingual_v1, eleven_multilingual_v1, eleven_turbo_v2, eleven_turbo_v2_5, eleven_flash_v2_5, and eleven_v3. The default is eleven_turbo_v2_5. See the ElevenLabs documentation for details on each model.

eleven_v3 limitations:

Stability: The eleven_v3 model only supports discrete stability values: 0.0 (Creative), 0.5 (Natural), and 1.0 (Robust). Values between these are not supported and may produce unexpected results. This differs from earlier models where stability accepts a continuous range.
Streaming latency: Do not set optimize_streaming_latency when using eleven_v3 – this parameter is not supported by the v3 model and will cause an error.
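
Both eleven_v3 constraints can be checked before the voice is constructed. The following is a minimal sketch, not part of the PolyAI SDK: snap_stability and elevenlabs_settings are hypothetical helper names, and treating optimize_streaming_latency as a keyword setting is an assumption based only on the parameter name mentioned above.

```python
V3_STABILITY_LEVELS = (0.0, 0.5, 1.0)  # Creative, Natural, Robust


def snap_stability(model_id, stability):
    """Return a stability value that the given model accepts."""
    if model_id != "eleven_v3":
        # Earlier models accept a continuous range, so leave the value alone.
        return stability
    # eleven_v3 only supports discrete levels: pick the nearest one.
    return min(V3_STABILITY_LEVELS, key=lambda level: abs(level - stability))


def elevenlabs_settings(model_id, stability, optimize_streaming_latency=None):
    """Build a settings dict, dropping options that eleven_v3 rejects."""
    settings = {
        "model_id": model_id,
        "stability": snap_stability(model_id, stability),
    }
    # Hypothetical handling: only forward the latency option for non-v3 models.
    if optimize_streaming_latency is not None and model_id != "eleven_v3":
        settings["optimize_streaming_latency"] = optimize_streaming_latency
    return settings
```

With these helpers, a request for stability 0.7 on eleven_v3 is snapped to 0.5 (Natural) and any latency option is dropped, while the same request on eleven_turbo_v2_5 passes through unchanged.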

Example: Cartesia

from polyai.voice import CartesiaVoice, Emotion, EmotionKind, EmotionIntensity

conv.set_voice(
    CartesiaVoice(
        provider_voice_id="a1b2c3d4",
        speed=0.0,  # -1.0 (slowest) to 1.0 (fastest)
        emotions=[
            Emotion(EmotionKind.POSITIVITY, EmotionIntensity.HIGH)
        ],
        model_id="sonic"  # also: "sonic-preview", "sonic-3", "sonic-3.5", or a dated identifier e.g. "sonic-3-2025-10-27"
    )
)

Some Cartesia voices are faster than expected at the default speed. Test your chosen voice at speed=0.0 before deploying, and adjust toward -1.0 if the output is too fast.

Emotion options (legacy models):

EmotionKind: ANGER, POSITIVITY, SURPRISE
EmotionIntensity: LOWEST, LOW, HIGH, HIGHEST

Sonic 3 parameters: When using a Sonic 3 model ID (sonic-3 or sonic-3.5), the following additional parameters are supported:

volume (float, optional) – controls output volume (e.g. 0.5–2.0).
emotion (str, optional) – emotion string (e.g. "happy"). Sonic 3 models support a 10-emotion set.
language (str, optional) – language code (e.g. "en").
speed – on Sonic 3 models, the effective speed range is 0.6–1.5.

sonic-3.5 is the latest Cartesia model and inherits all Sonic 3 behavior. Use it for the most natural, expressive output; fall back to sonic-3 or sonic if you need an earlier model for parity with existing voices.
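
Because the speed scale differs between legacy and Sonic 3 models, a value that is valid for one family can fall outside the range of the other. The sketch below shows one way to keep a speed value inside the range documented above. The helper names are hypothetical, and treating every model ID that starts with "sonic-3" (including dated identifiers) as a Sonic 3 model is an assumption.

```python
LEGACY_SPEED_RANGE = (-1.0, 1.0)  # legacy Sonic models: -1.0 slowest, 1.0 fastest
SONIC3_SPEED_RANGE = (0.6, 1.5)   # effective range on Sonic 3 models


def is_sonic3(model_id):
    # Assumption: "sonic-3", "sonic-3.5" and dated IDs such as
    # "sonic-3-2025-10-27" all belong to the Sonic 3 family.
    return model_id.startswith("sonic-3")


def clamp_cartesia_speed(model_id, speed):
    """Clamp speed to the documented range for the model family."""
    low, high = SONIC3_SPEED_RANGE if is_sonic3(model_id) else LEGACY_SPEED_RANGE
    return max(low, min(high, speed))
```

Note that speed=0.0, the neutral value on legacy models, lies below the Sonic 3 range, so a speed setting is worth revisiting whenever a voice moves between the two families.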

Example: Rime

from polyai.voice import RimeVoice

conv.set_voice(
    RimeVoice(
        provider_voice_id="voice_id",
        speech_alpha=1.0,  # <1.0 faster, >1.0 slower
        model_id="mistv2"  # or "mist"
    )
)

Example: Minimax

from polyai.voice import MinimaxVoice

conv.set_voice(
    MinimaxVoice(
        model_id="speech-02-hd",  # or speech-02-turbo, speech-01-hd, speech-01-turbo
        voice_id="voice_id",
        speed=1.0,      # 0.5-2.0
        vol=1.0,        # 0-10
        pitch=0,        # -12 to 12
        emotion="happy" # happy, sad, angry, fearful, disgusted, surprised, neutral
    )
)
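
The ranges in the comments above can be checked before the voice is created. This is an illustrative sketch only: validate_minimax_params is a hypothetical helper, not a platform function, and treating the range endpoints as inclusive is an assumption.

```python
MINIMAX_EMOTIONS = {
    "happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral",
}


def validate_minimax_params(speed=1.0, vol=1.0, pitch=0, emotion="neutral"):
    """Return a list of problems; an empty list means every value is in range."""
    errors = []
    if not 0.5 <= speed <= 2.0:
        errors.append(f"speed {speed} is outside 0.5-2.0")
    if not 0 <= vol <= 10:
        errors.append(f"vol {vol} is outside 0-10")
    if not -12 <= pitch <= 12:
        errors.append(f"pitch {pitch} is outside -12 to 12")
    if emotion not in MINIMAX_EMOTIONS:
        errors.append(f"emotion {emotion!r} is not a listed option")
    return errors
```

Returning a list rather than raising on the first problem lets a caller report every out-of-range value at once.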

Example: Hume

from polyai.voice import HumeVoice

conv.set_voice(
    HumeVoice(
        provider_voice_id="voice_uuid_or_name",
        voice_description="patient, empathetic counselor",  # Optional
        version="2",        # "1" for octave-1, "2" for octave-2
        instant_mode=False, # Ultra-low latency mode
        provider="HUME_AI"  # "CUSTOM_VOICE" or "HUME_AI"
    )
)

Example: Google TTS

from polyai.voice import GoogleVoice

conv.set_voice(
    GoogleVoice(
        provider_voice_id="ja-JP-Neural2-B",
        gender="male"  # "male", "female", or "neutral"
    )
)

Example: Custom provider

from polyai.voice import CustomVoice

conv.set_voice(
    CustomVoice(
        provider="MY_PROVIDER",
        provider_voice_id="voice_id",
        custom_param="value"  # Any additional kwargs
    )
)

Voice randomization

Use VoiceWeighting to randomly select a voice based on weighted probabilities:

from polyai.voice import VoiceWeighting, ElevenLabsVoice

conv.randomize_voice([
    VoiceWeighting(
        voice=ElevenLabsVoice(provider_voice_id="voice1"),
        weight=0.7
    ),
    VoiceWeighting(
        voice=ElevenLabsVoice(provider_voice_id="voice2"),
        weight=0.3
    ),
])

Weights must sum to 1.0.
Voices without explicit weights share the remaining probability equally.
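
To make the two weight rules concrete, the sketch below reproduces them in plain Python. It illustrates the described behavior under stated assumptions and is not the platform's implementation: resolve_weights and pick_voice are hypothetical names, a missing weight is represented as None, and voices are represented by their ID strings.

```python
import random


def resolve_weights(weights, tolerance=1e-9):
    """Fill in missing weights (None) so that the result sums to 1.0."""
    explicit = sum(w for w in weights if w is not None)
    missing = sum(1 for w in weights if w is None)
    if explicit > 1.0 + tolerance:
        raise ValueError("explicit weights exceed 1.0")
    if missing == 0:
        if abs(explicit - 1.0) > tolerance:
            raise ValueError("weights must sum to 1.0")
        return list(weights)
    # Voices without a weight split the remaining probability equally.
    share = (1.0 - explicit) / missing
    return [share if w is None else w for w in weights]


def pick_voice(voice_ids, weights, rng=random):
    """Pick one voice ID at random according to the resolved weights."""
    return rng.choices(voice_ids, weights=resolve_weights(weights), k=1)[0]
```

For example, resolve_weights([0.5, None, None]) yields [0.5, 0.25, 0.25]: the first voice keeps its explicit weight and the other two share the remaining half.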

Cache behavior

Changing model_id does not automatically invalidate cached audio.
To reset cached audio:
- Go to Channels > Voice > Audio management and delete existing cache entries.
- Or, create a new voice entry with a different voice ID.

Prepend the model ID to the voice ID (e.g. eleven_turbo_v2_5/a1b2c3...) to isolate cache entries per model. This is the most reliable way to ensure the correct model is used after a switch.
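
The prefixing convention can be applied consistently with a small helper. This is a minimal sketch: cache_scoped_voice_id is a hypothetical name, the voice ID used in the test values is a placeholder, and where the resulting ID is entered depends on how your voice entries are set up.

```python
def cache_scoped_voice_id(model_id, voice_id):
    """Prefix the voice ID with the model ID so each model gets its own cache entries."""
    prefix = f"{model_id}/"
    # Avoid double-prefixing when the ID is already scoped to this model.
    if voice_id.startswith(prefix):
        return voice_id
    return prefix + voice_id
```

Because the model ID is part of the resulting voice ID, switching models produces a different ID, so audio cached for the previous model is not reused.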

Language codes

When configuring a voice, make sure the language code in the provider_voice_id matches your deployment’s locale. An incorrect language code (e.g. en-GB instead of en-IE) can cause the TTS provider to render a different accent or voice than expected, even when the correct voice ID is set.
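
A locale mismatch like this can be caught with a simple check before deployment. The sketch below assumes voice IDs that begin with a language and region code, as in the Google TTS example above. voice_matches_locale is a hypothetical helper, and opaque IDs (such as ElevenLabs voice IDs) carry no locale, so they cannot be checked this way.

```python
import re

# Matches a leading "<language>-<region>-" prefix such as "ja-JP-" or "en-GB-".
_LOCALE_PREFIX = re.compile(r"^([A-Za-z]{2,3}-[A-Za-z]{2})-")


def voice_matches_locale(provider_voice_id, locale):
    """Return True/False when the ID embeds a locale, or None when it does not."""
    match = _LOCALE_PREFIX.match(provider_voice_id)
    if match is None:
        return None  # no embedded locale to compare against
    return match.group(1).lower() == locale.lower()
```

Returning None for opaque IDs keeps "cannot tell" distinct from "definitely wrong", so a deployment check can warn only on a confirmed mismatch.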

Additional options

stability – controls tone variability across runs (ElevenLabs).
speed – adjusts speech rate (ElevenLabs: 0.7–1.2; PlayHT: 0.1–5.0; other providers may differ).
randomize_voice() – supports external providers for weighted selection.
