Input

Language accent (region)
Voice (quality)
Text to process
You can add synthetic pauses by inserting silence tags, with the duration given in seconds, e.g. Hello[1s]Kokoro[0.2s]Web
Speed 1x
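
To illustrate the silence-tag syntax from the "Text to process" field, here is a minimal sketch of how such input could be split into spoken text and pauses. It is not Kokoro Web's actual implementation: the function name `split_on_silence_tags`, the `("text", ...)` / `("pause", ...)` tuple shape, and the exact tag grammar (a non-negative decimal number followed by `s` inside square brackets) are assumptions inferred from the example `Hello[1s]Kokoro[0.2s]Web`.

```python
import re
from typing import List, Tuple, Union

# Assumed tag grammar, inferred from the example in the help text:
# "[" + non-negative decimal number + "s" + "]", e.g. [1s] or [0.2s].
SILENCE_TAG = re.compile(r"\[(\d+(?:\.\d+)?)s\]")

Segment = Tuple[str, Union[str, float]]


def split_on_silence_tags(text: str) -> List[Segment]:
    """Split input into ("text", str) and ("pause", seconds) segments.

    Hypothetical helper for illustration only.
    """
    segments: List[Segment] = []
    pos = 0
    for match in SILENCE_TAG.finditer(text):
        # Keep any text that appears before this tag.
        if match.start() > pos:
            segments.append(("text", text[pos:match.start()]))
        # Record the pause duration in seconds.
        segments.append(("pause", float(match.group(1))))
        pos = match.end()
    # Keep any text left after the last tag.
    if pos < len(text):
        segments.append(("text", text[pos:]))
    return segments


if __name__ == "__main__":
    for kind, value in split_on_silence_tags("Hello[1s]Kokoro[0.2s]Web"):
        print(kind, value)
```

Under these assumptions, each text segment would be passed to the synthesizer and each pause rendered as that many seconds of silence between the resulting audio chunks. Anything that does not match the assumed tag pattern (for example `[fast]`) is left in the text unchanged.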