Text-to-speech functionality in web applications using the iSpeech API

“Turn right onto Main Street.” If you use a car navigation system like Garmin or TomTom, you will be all too familiar with phrases like this, spoken by the device mounted on your dashboard as it magically guides you through the maze of local roads and highways to your destination. Such text-to-speech capabilities are usually associated with native software rather than with web applications. With the iSpeech API, that changes: you can now build web applications that implement text-to-speech functionality with the power of JavaScript and HTML5.

The iSpeech demo page gives you an idea of what the API can do: it provides a regular text input and turns whatever you type into spoken words. Let’s take a closer look at what is required to make this work in a web application.

Step 1: Sign up for a Developer Account and set up an API key

A developer account is required to use the API. After creating your account and logging in, you can create a new API key. Fill in the form and select “Desktop, Web, Other” as the application type so that the key can be used with iSpeech’s REST API.

Step 2: Trying Out the API

After creating an API key, you should find it in your list of available API keys. Clicking “Settings” leads you to a form where you can configure text-to-speech parameters such as file format, bit rate and frequency, and add more information about the app that uses the key. For this step, we just want to try out the API and get some text converted into speech, so the first thing to do is familiarize ourselves with how an API request is constructed. The iSpeech documentation covers this well and gives the general layout of the request:

http://api.ispeech.org/api/rest?apikey=YOURAPIKEYHERE&action=convert&text=This+is+the+text+I+want+to+convert

Now we only need to replace YOURAPIKEYHERE with the API key we created in Step 1. We can then either paste the request URL into the browser’s address bar or use it as the source of an HTML5 audio element embedded in our HTML document.
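Rather than assembling the query string by hand, the request URL can be built programmatically. The following is a minimal TypeScript sketch that assumes only the endpoint and the apikey, action and text parameters from the request layout above; the function name buildConvertUrl is hypothetical. URLSearchParams (available in modern browsers and in Node.js) encodes spaces as "+", which matches the documented layout, and it also escapes characters such as "&" that would otherwise break the query string.

```typescript
// Endpoint taken from the request layout shown in the iSpeech documentation.
const ISPEECH_REST_ENDPOINT = "http://api.ispeech.org/api/rest";

// Hypothetical helper: builds a text-to-speech "convert" request URL.
// URLSearchParams form-encodes each value (spaces become "+", "&" becomes "%26").
function buildConvertUrl(apiKey: string, text: string): string {
  const params = new URLSearchParams();
  params.append("apikey", apiKey);
  params.append("action", "convert");
  params.append("text", text);
  return `${ISPEECH_REST_ENDPOINT}?${params.toString()}`;
}

// Usage: reproduces the example request from the documentation.
const url = buildConvertUrl("YOURAPIKEYHERE", "This is the text I want to convert");
console.log(url);
// prints http://api.ispeech.org/api/rest?apikey=YOURAPIKEYHERE&action=convert&text=This+is+the+text+I+want+to+convert
```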

Step 3: Tweaking the Request Parameters

Since we are embedding the audio in a web page, we need to provide several fallback versions in addition to the MP3 format that the iSpeech API generates by default, because modern browsers differ in which audio codecs they support. For this purpose, the iSpeech API has a format parameter that lets us specify the format in which the audio should be returned. A complete list of supported formats can be found in the iSpeech API documentation.

To request the audio in Ogg Vorbis format, we add format=ogg to the request:

http://api.ispeech.org/api/rest?apikey=YOURAPIKEYHERE&action=convert&text=This+is+the+text+I+want+to+convert&format=ogg
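The fallback versions can then be listed as separate source children of a single audio element, so that the browser plays the first format it supports. This TypeScript sketch generates that markup as a string. It assumes that omitting format yields MP3 (the default mentioned above) and that format=ogg yields Ogg Vorbis; the buildAudioMarkup name and the MIME types audio/mpeg and audio/ogg are our own choices, not something prescribed by iSpeech. Because the URL sits inside an HTML attribute, its "&" separators are escaped as "&amp;".

```typescript
const ISPEECH_REST_ENDPOINT = "http://api.ispeech.org/api/rest";

// Formats to offer, in order of preference. "mp3" is the API default, so no
// format parameter is sent for it (assumption based on the article text).
// The MIME types are standard audio types chosen here, not taken from iSpeech.
const FORMAT_MIME_TYPES: Record<string, string> = {
  mp3: "audio/mpeg",
  ogg: "audio/ogg",
};

// Escapes characters that are significant inside a double-quoted HTML attribute.
function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: returns an <audio> element with one <source> per format.
function buildAudioMarkup(apiKey: string, text: string, formats: string[] = ["mp3", "ogg"]): string {
  const sources = formats.map((format) => {
    const mimeType = FORMAT_MIME_TYPES[format];
    if (mimeType === undefined) {
      throw new Error(`No MIME type known for format "${format}"`);
    }
    const params = new URLSearchParams({ apikey: apiKey, action: "convert", text });
    if (format !== "mp3") {
      params.append("format", format);
    }
    const src = `${ISPEECH_REST_ENDPOINT}?${params.toString()}`;
    return `  <source src="${escapeAttribute(src)}" type="${mimeType}">`;
  });
  return ["<audio controls>", ...sources, "</audio>"].join("\n");
}

// Usage: print markup that could be pasted into an HTML document.
console.log(buildAudioMarkup("YOURAPIKEYHERE", "Turn right onto Main Street"));
```

Listing MP3 first simply mirrors the API default; the order only matters for browsers that can play more than one of the listed formats.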

We can also specify a different voice using the “voice” parameter:

http://api.ispeech.org/api/rest?apikey=YOURAPIKEYHERE&action=convert&text=This+is+the+text+I+want+to+convert&format=ogg&voice=auenglishfemale

It is also possible to slow down the voice by providing a “speed” parameter, which accepts values between -10 (very slow) and 10 (very fast). However, I noticed that not all voices provided by iSpeech support this parameter.

http://api.ispeech.org/api/rest?apikey=YOURAPIKEYHERE&action=convert&text=This+is+the+text+I+want+to+convert&format=ogg&speed=-5
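To combine the optional parameters from this step, a small TypeScript sketch can accept format, voice and speed as options and reject speed values outside the documented -10 to 10 range before any request is sent. The ConvertOptions interface and buildConvertUrl function are hypothetical; the range check only enforces the limits stated above and cannot tell whether the chosen voice actually honors speed.

```typescript
const ISPEECH_REST_ENDPOINT = "http://api.ispeech.org/api/rest";

// Optional request parameters discussed in this step (hypothetical interface).
interface ConvertOptions {
  format?: string; // e.g. "ogg"; the API returns MP3 when omitted
  voice?: string;  // e.g. "auenglishfemale"
  speed?: number;  // -10 (very slow) to 10 (very fast)
}

function buildConvertUrl(apiKey: string, text: string, options: ConvertOptions = {}): string {
  const params = new URLSearchParams({ apikey: apiKey, action: "convert", text });
  if (options.format !== undefined) {
    params.append("format", options.format);
  }
  if (options.voice !== undefined) {
    params.append("voice", options.voice);
  }
  if (options.speed !== undefined) {
    // Reject values outside the documented range instead of sending them.
    if (!Number.isFinite(options.speed) || options.speed < -10 || options.speed > 10) {
      throw new RangeError(`speed must be between -10 and 10, got ${options.speed}`);
    }
    params.append("speed", String(options.speed));
  }
  return `${ISPEECH_REST_ENDPOINT}?${params.toString()}`;
}

// Usage: reproduces the slowed-down Ogg Vorbis request shown above.
console.log(
  buildConvertUrl("YOURAPIKEYHERE", "This is the text I want to convert", { format: "ogg", speed: -5 }),
);
// prints http://api.ispeech.org/api/rest?apikey=YOURAPIKEYHERE&action=convert&text=This+is+the+text+I+want+to+convert&format=ogg&speed=-5
```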

Sounds great (literally), but…
