How To Use Pyro

Garrett Cahill
November 27, 2023

Pyro generates high-quality AI voices so you can create ads in seconds. Below are effective techniques for guiding Pyro's AI when adding pauses, controlling pronunciation, conveying emotion, and pacing speech.

Pauses

There are a few ways to introduce a pause and influence the rhythm and cadence of the speaker. The most consistent way is to use the break tag, <break time="1.5s" />, which creates a pause of the specified length. This is not just silence inserted between words: the AI interprets the tag and produces a natural-sounding pause.

However, because the pause is more than inserted silence, how the AI handles it can vary. As always, the voice you choose plays a major role in the output. Some voices, particularly those trained on audio containing a few “uh”s and “ah”s, sometimes insert those vocal mannerisms during a pause, as a real speaker might.

An example could look like this:

"Give me one second to think about it." <break time="1.0s" /> "Yes, that would work."

Specify the break time in seconds. The AI can handle pauses of up to 3 seconds.
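If you assemble scripts in code, a small helper can format break tags consistently and reject pauses longer than the 3-second limit. The sketch below is hypothetical: `break_tag` is not part of any Pyro API; it only produces the tag text you would paste into a script.

```python
def break_tag(seconds: float) -> str:
    """Build a break tag such as <break time="1.0s" /> (hypothetical helper, not a Pyro API)."""
    # Break times are written in seconds, and the AI handles pauses of up to 3 seconds.
    if not 0 < seconds <= 3:
        raise ValueError(f"break time must be greater than 0 and at most 3 seconds, got {seconds}")
    # Round to hundredths to avoid float noise; str() of a float keeps a decimal, e.g. "1.0".
    value = round(float(seconds), 2)
    return f'<break time="{value}s" />'


script = (
    '"Give me one second to think about it." '
    + break_tag(1.0)
    + ' "Yes, that would work."'
)
print(script)
# prints: "Give me one second to think about it." <break time="1.0s" /> "Yes, that would work."
```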

Avoid using too many break tags in a single script, as this has been known to make the AI unstable: the speech may speed up noticeably, or the audio may pick up extra noise and other strange artifacts. We are working on resolving this.
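Since no exact limit is given for "too many" break tags, a quick check before generating can flag scripts that lean heavily on them. In this hypothetical sketch, `check_breaks` is not a Pyro feature, and the `max_breaks` threshold is one you choose yourself; it is not a published Pyro limit.

```python
import re

# Matches break tags written like <break time="1.5s" /> and captures the number of seconds.
BREAK_RE = re.compile(r'<break\s+time="(\d+(?:\.\d+)?)s"\s*/>')


def check_breaks(script: str, max_breaks: int) -> list[str]:
    """Return warnings about break tags in a script (hypothetical pre-flight check).

    max_breaks is a threshold you pick yourself; it is not a documented Pyro limit.
    """
    times = [float(t) for t in BREAK_RE.findall(script)]
    warnings = []
    if len(times) > max_breaks:
        warnings.append(f"{len(times)} break tags found (your limit: {max_breaks})")
    for t in times:
        # The AI handles pauses of up to 3 seconds.
        if t > 3:
            warnings.append(f"break of {t}s exceeds the 3-second maximum")
    return warnings


script = 'Wait. <break time="1.0s" /> Okay. <break time="4.0s" /> Go.'
for warning in check_breaks(script, max_breaks=1):
    print(warning)
# prints:
# 2 break tags found (your limit: 1)
# break of 4.0s exceeds the 3-second maximum
```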

Alternatives

The following options are less consistent and may not always work. For reliable results, we recommend the break tag described above.

Apart from the break tag, the trick that seems to give the most consistent results is a simple dash (-) or an em dash (—). You can even add multiple dashes, such as -- --, for a longer pause.

"It – is - getting late."

An ellipsis (...) can sometimes add a pause between words as well, but it usually also adds some “hesitation” or “nervousness” to the voice, which may not always fit.

“I... yeah, I guess so...”

Pronunciation

In certain instances, you may want the model to pronounce a word, name, or phrase in a specific way. You can specify pronunciation using standardized phonetic alphabets.

We currently support the International Phonetic Alphabet (IPA) and CMU Arpabet. Pronunciations are specified by wrapping words in the Speech Synthesis Markup Language (SSML) phoneme tag.

To use this feature, wrap the desired word in <phoneme alphabet="ipa" ph="your-IPA-pronunciation-here">word</phoneme> for IPA, or in <phoneme alphabet="cmu-arpabet" ph="your-CMU-pronunciation-here">word</phoneme> for CMU Arpabet, replacing "your-IPA-pronunciation-here" or "your-CMU-pronunciation-here" with the desired IPA or CMU Arpabet pronunciation.
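If you generate many scripts, a helper that builds the phoneme tag avoids typos in the attribute syntax. This is a hypothetical sketch, not a Pyro API; it uses Python's standard `xml.sax.saxutils` functions to quote the attributes and escape the word.

```python
from xml.sax.saxutils import escape, quoteattr

# The two alphabets this guide says are supported.
SUPPORTED_ALPHABETS = {"ipa", "cmu-arpabet"}


def phoneme(word: str, ph: str, alphabet: str = "ipa") -> str:
    """Wrap one word in an SSML phoneme tag (hypothetical helper, not a Pyro API)."""
    if alphabet not in SUPPORTED_ALPHABETS:
        raise ValueError(f"unsupported alphabet: {alphabet!r}")
    # quoteattr adds the surrounding double quotes and escapes characters such as & and <.
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(word)}</phoneme>"
    )


print(phoneme("actually", "ˈæktʃuəli"))
# prints: <phoneme alphabet="ipa" ph="ˈæktʃuəli">actually</phoneme>
```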

An example for IPA:

<phoneme alphabet="ipa" ph="ˈæktʃuəli">actually</phoneme>

An example for CMU Arpabet:

<phoneme alphabet="cmu-arpabet" ph="AE1 K CH UW0 AH0 L IY0">actually</phoneme>

Note that this only works on a per-word basis. If, for example, you want a person's first and last name pronounced a certain way, you will need to create a separate pronunciation for each word.
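Because phoneme tags apply to one word at a time, a multi-word name needs one tag per word. The hypothetical helper below (not a Pyro API) splits a phrase on whitespace and pairs each word with its own pronunciation. The name "Ada Lovelace" and its Arpabet spellings are illustrative only.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme_phrase(phrase: str, pronunciations: list[str], alphabet: str = "cmu-arpabet") -> str:
    """Wrap each word of a phrase in its own phoneme tag (hypothetical helper).

    Phoneme tags only work per word, so a first and last name need one
    pronunciation each.
    """
    words = phrase.split()
    if len(words) != len(pronunciations):
        raise ValueError(f"{len(words)} words but {len(pronunciations)} pronunciations")
    tags = [
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>{escape(word)}</phoneme>"
        for word, ph in zip(words, pronunciations)
    ]
    # Keep a plain space between the tagged words so they are still read as separate words.
    return " ".join(tags)


# Illustrative pronunciations for a two-word name.
print(phoneme_phrase("Ada Lovelace", ["EY1 D AH0", "L AH1 V L EY2 S"]))
```

The helper refuses to guess when the word count and pronunciation count differ, which catches the common mistake of writing one pronunciation for a whole name.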

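English is a lexical stress language: in words with more than one syllable, some syllables are emphasized more than others. The relative stress on each syllable is essential for correct pronunciation and can even distinguish meaning (compare the noun “REcord” with the verb “reCORD”). It is therefore important to include lexical stress when writing both IPA and Arpabet; otherwise, the result may not be what you want. In IPA, primary stress is marked with ˈ before the stressed syllable (as in the “actually” example above) and secondary stress with ˌ. In Arpabet, stress is marked by a digit after each vowel: 1 for primary stress, 2 for secondary stress, and 0 for no stress.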

Take the word “talon”, for example.

Incorrect:

<phoneme alphabet="cmu-arpabet" ph="T AE L AH N">talon</phoneme>

Correct:

<phoneme alphabet="cmu-arpabet" ph="T AE1 L AH0 N">talon</phoneme>

The first example may place the primary stress on either AE or AH, varying from one generation to the next, while the second will reliably stress AE and leave AH unstressed.

If you write it as:

<phoneme alphabet="cmu-arpabet" ph="T AE0 L AH1 N">talon</phoneme>

It will always put emphasis on AH instead of AE.
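Before pasting an Arpabet pronunciation into a phoneme tag, it can help to check that every vowel carries a stress digit and that the primary stress lands where you expect. This is a hypothetical helper, not a Pyro feature; it assumes the standard CMU set of 15 vowel phonemes, where each vowel takes 0 (no stress), 1 (primary), or 2 (secondary).

```python
# The 15 vowel phonemes of the CMU Arpabet set; only vowels take a stress digit.
ARPABET_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def missing_stress(ph: str) -> list[str]:
    """Return vowels written without a stress digit, e.g. "AE" instead of "AE1"."""
    return [token for token in ph.split() if token in ARPABET_VOWELS]


def primary_stress(ph: str) -> list[str]:
    """Return the vowels marked with primary stress (digit 1)."""
    return [
        token[:-1]
        for token in ph.split()
        if token.endswith("1") and token[:-1] in ARPABET_VOWELS
    ]


for ph in ["T AE L AH N", "T AE1 L AH0 N", "T AE0 L AH1 N"]:
    print(ph, "| missing stress:", missing_stress(ph), "| primary:", primary_stress(ph))
# prints:
# T AE L AH N | missing stress: ['AE', 'AH'] | primary: []
# T AE1 L AH0 N | missing stress: [] | primary: ['AE']
# T AE0 L AH1 N | missing stress: [] | primary: ['AH']
```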

Emotion

If you want the AI to express a specific emotion, the best approach is to write in a style similar to that of a book. To find good prompts to use, you can flip through some books and identify words and phrases that convey the desired emotion.

For instance, you can use dialogue tags such as “he said, confused” or “he shouted angrily” to convey emotion. Prompts like these help the AI understand the desired emotional tone and generate a voiceover that reflects it, so you can tailor the delivery to a wide range of applications.

"Are you sure about that?" he said, confused.

"Don’t test me!" he shouted angrily.

Because the AI reads exactly what you give it, the dialogue tags will be spoken as well, so you will need to remove them from the generated audio. The AI can also sometimes infer the intended emotion from the text’s context alone, even without tags.

"That is funny!"

"You think so?"

This is not always reliable, since you are relying on the AI to judge from context whether a line is sarcastic, funny, or neutral.

Pacing

To control the speaker’s pacing, use the same approach as for emotion: write in a style similar to that of a book. It is not a perfect solution, but it can help the AI deliver the voiceover at the right speed, making it easier to listen to.

"I wish you were right, I truly do, but you're not," he said slowly.

If you want to try voice cloning or use a voice other than those available on Pyro, please reach out to gcahill@firebaystudios.com.

Examples

Shouting

Rising anger, whispering to shouting, “No, you clearly don’t know who you’re talking to, so let me clue you in. I am not in danger, Skyler. I AM the danger. A guy opens his door and gets shot and you think that of me? No. I am the one who knocks!”

Emotion

“Noooo. I don’t want to!” she cried. “I want… to eat… my ice cream!” She sobbed uncontrollably.

Whispering

“When you get to the gate, use the key! – and - make sure to not let… the… demons… in!”

Laughter

“Haha! That’s funny! I wish I would have thought of that. I guess it doesn’t make sense really.” He giggled.

Accents

“It sure does, Jackie. My mama always said: ‘In Carolina, the air’s so thick you can wear it!’”

Pauses

“If you want to introduce a pause ––- you can use dashes or … you can use an ellipsis.”
