HTML5: Reading texts with SpeechSynthesis

Voice output, long familiar from navigation devices, can now also be used by browsers. SpeechSynthesis from the Web Speech API lets you output text with a human-sounding voice. Voices are available for several languages – sometimes several voices per language. At the touch of a button, your visitors can simply have the text of an HTML document read aloud.

Speech synthesis is a technology that allows applications to read textual content aloud with a human voice. Also part of the Web Speech API is SpeechRecognition, the speech recognition engine, which makes it possible to record voice input asynchronously and recognize its content. This is exactly what Siri, Cortana, the Google Assistant and others do – just not on the web, but natively in an app or in the operating system.

In the following text I focus on SpeechSynthesis. You will find the reason at the bottom of the post. ;)


Reader with variety of languages and voices

Of course, SpeechSynthesis is only worthwhile if output is possible in more than just one language. The number of available voices per language, on the other hand, is more a matter of taste – or, for me, a question of equality.

In general, the number of available languages and voices depends on the browser. Chrome currently supports more than a dozen languages, including British and American English, German, French, Spanish and Italian, as well as Russian and Chinese. It is possible, although usually pointless, to output a German text with a French voice, for example. The text is then reproduced in German, but with a French accent.

This is how easy it is to start playback

To have a text spoken, a new utterance has to be created. So that the browser knows which language the utterance should be reproduced in, you can either assign one of the voices directly via the voice property or pass the language via the lang property. Then you let the browser speak the utterance.

var words = new SpeechSynthesisUtterance("Welcome");
words.lang = "de-DE";
window.speechSynthesis.speak(words);

In the example, the language is set to German. Since some languages – such as English – are differentiated by country, lang requires a language tag that includes the country. This distinguishes American English (en-US) from British English (en-GB). The speak() method then has the text spoken with a German voice.

var words = new SpeechSynthesisUtterance("Welcome");
var voices = window.speechSynthesis.getVoices();
words.voice = voices[6];
window.speechSynthesis.speak(words);

In the second example, a voice is selected directly instead of a language. getVoices() retrieves all available voices. Via the voice property, one of these voices is assigned to the utterance; here, voice 6 corresponds to the German voice. The result of the two examples is therefore the same.
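Both ways of configuring an utterance can be wrapped in a small helper function. This is a minimal sketch; the helper name makeUtterance and its options parameter are made up for illustration and are not part of the API:

```javascript
// Hypothetical helper: builds an utterance and selects either a
// language tag or a concrete voice (or both), depending on the options.
function makeUtterance(text, options) {
  var utterance = new SpeechSynthesisUtterance(text);
  if (options.lang) {
    utterance.lang = options.lang;   // e.g. "de-DE" or "en-GB"
  }
  if (options.voice) {
    utterance.voice = options.voice; // an entry from getVoices()
  }
  return utterance;
}

// Browser usage (not run here):
// window.speechSynthesis.speak(makeUtterance("Welcome", { lang: "de-DE" }));
```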

This is how you query available voices in the browser

As soon as you access the browser's voices via getVoices(), it is important to work with event listeners, because the browser does not load the voices together with the document. That means getVoices() cannot simply be called when the page loads. Instead, a function is first attached via the DOMContentLoaded event listener; an if statement in that function checks whether SpeechSynthesis is available or supported by the browser at all. The voice output is then triggered via a click event.

window.addEventListener("DOMContentLoaded", function () {
  if (window.speechSynthesis !== undefined) {
    document.getElementById("playback").addEventListener("click", function () {
      var voices = window.speechSynthesis.getVoices();
      for (var i = 0; i < voices.length; i++) {
        console.log("Voice " + i.toString() + " " + voices[i].name);
      }
    }, false);
  }
}, false);

In the example, clicking on the element with the ID playback executes a function that writes all voices to the console, each with its internal number and name. The German voice in Chrome is named “Google Deutsch”.

This useful script, which I found on Stack Overflow, lists the available voices – but only for the browser it is run in.
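An alternative to waiting for DOMContentLoaded is the voiceschanged event, which the speech synthesis interface fires once the voice list has actually been loaded. The sketch below takes the synthesis object as a parameter only so the listing logic stays testable; in the browser you would pass window.speechSynthesis:

```javascript
// Logs every available voice with its index, name and language,
// and returns the number of voices found.
function logVoices(synth) {
  var voices = synth.getVoices();
  for (var i = 0; i < voices.length; i++) {
    console.log("Voice " + i + ": " + voices[i].name + " (" + voices[i].lang + ")");
  }
  return voices.length;
}

// Browser usage (not run here): Chrome may return an empty list at
// first, so log again once the voiceschanged event fires.
// logVoices(window.speechSynthesis);
// window.speechSynthesis.addEventListener("voiceschanged", function () {
//   logVoices(window.speechSynthesis);
// });
```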

How to control voice frequency and speed

In addition to the voices for the individual languages, there is also a default voice that works for all languages. To adapt it as well as possible, you can define the frequency of the voice used as well as the speech rate.

var words = new SpeechSynthesisUtterance("This is faster and higher.");
var voices = window.speechSynthesis.getVoices();
words.voice = voices[10];
words.pitch = 2;
words.rate = 10;
window.speechSynthesis.speak(words);

The pitch property determines the voice frequency. A value between 0 and 2 is allowed, where 1 corresponds to the normal frequency of the voice. Everything below 1 results in a lower frequency, everything above 1 in a higher one.

The rate property controls the speech rate. Values between 0.1 and 10 are permitted, with 1 representing the normal speech rate. Values below 1 slow the speech down accordingly; values above 1 speed it up.

The disadvantage of this default voice is that it sounds much more synthetic than the others. The voices available for the individual languages are much more natural.

Control voice output

In addition to the speak() method, which starts the speech output, playback can be stopped with pause(). The resume() method continues a previously paused playback.

document.getElementById("pause").addEventListener("click", function () {
  window.speechSynthesis.pause();
}, false);
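A matching counterpart for a resume button could look like the following sketch. The element IDs "pause" and "resume" are made-up examples; the function takes the document and synthesis objects as parameters only to keep it testable outside a browser:

```javascript
// Wires a pause button and a resume button to the speech synthesis queue.
function wirePlaybackControls(doc, synth) {
  doc.getElementById("pause").addEventListener("click", function () {
    synth.pause();
  }, false);
  doc.getElementById("resume").addEventListener("click", function () {
    synth.resume();
  }, false);
}

// Browser usage (not run here):
// wirePlaybackControls(document, window.speechSynthesis);
```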
				

The volume property controls the playback volume; values between 0 and 1 are allowed. There are also several events for which listeners can be registered, so that functions are executed, for example, at the start and at the end of the speech output.

words.addEventListener("start", function () {
  document.title = "Listen to ...";
}, false);

words.addEventListener("end", function () {
  document.title = "... over.";
}, false);
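The volume property is set on the utterance itself, just like pitch and rate. A minimal sketch; the helper name quietUtterance and the chosen value are made up for illustration:

```javascript
// Hypothetical helper: builds an utterance at half volume.
function quietUtterance(text) {
  var words = new SpeechSynthesisUtterance(text);
  words.volume = 0.5; // allowed range: 0 (silent) to 1 (full volume)
  return words;
}

// Browser usage (not run here):
// window.speechSynthesis.speak(quietUtterance("Spoken quietly."));
```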
				

If you start several voice outputs at the same time, or start an output while another one is still playing, the individual outputs initiated via speak() are processed one after the other. The pending property tells you whether utterances are waiting in the queue; it returns either true or false. With speaking, you can additionally determine whether speech output is currently playing.
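The two properties can be combined into a simple status check, sketched here with the synthesis object as a parameter (in the browser you would pass window.speechSynthesis); the function name and the returned strings are made up:

```javascript
// Returns a short description of the current queue state,
// based on the speaking and pending flags.
function queueStatus(synth) {
  if (synth.speaking) {
    return synth.pending ? "speaking, more queued" : "speaking";
  }
  return synth.pending ? "queued, not yet speaking" : "idle";
}
```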

Browser support

Although SpeechSynthesis is still considered an experimental feature, browser support is good. There are only restrictions regarding some features in Opera and Safari on the desktop. Opera for Android does not support the API at all, but is also a rather minor player among browsers. SpeechRecognition is currently fully supported only by Chrome and is therefore not yet a serious option for production use.

Post picture: Depositphotos

(The article first appeared in July 2014 and has been kept up to date since then, the last update was in April 2019.)