How does VoiceXML work?
Let's compare running a VoiceXML phone application with viewing a web page written in HTML.
What do we (the users) use?
-> HTML: a visual interface (screen + keyboard/mouse) combined with software that interprets HTML and can interact with web servers. We think of this as a standard web browser.
-> VoiceXML: an audio interface (a phone + engines that recognize spoken input and "read" out text) combined with software that interprets VoiceXML and can interact with web servers. We can think of this as an audio web browser.
1. Start up the browser
-> HTML: connect to the Internet and open the browser software on your screen.
-> VoiceXML: use a phone to call the VoiceXML platform (where the VoiceXML interpreter, speech recognizer, and Text-to-Speech engine run).
2. Request a page from any web server
-> HTML: type the URL into the Address input box.
-> VoiceXML: the platform will look up the URL associated with the number you dialed.
3. The page is fetched
With both visual and audio browsers, pages are fetched over HTTP, the same protocol used for ordinary web traffic. This means that server-side technologies like Perl (for example, via the Common Gateway Interface, CGI), PHP, ColdFusion, ASP, and JSP can be used to write code that generates VoiceXML/HTML dynamically.
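To make the idea of dynamic generation concrete, here is a minimal sketch in Python (not one of the technologies listed above, but the principle is the same). The function name `render_greeting_vxml` and the greeting text are assumptions made up for illustration; the only thing taken from VoiceXML itself is the basic `<vxml>`, `<form>`, `<block>`, `<prompt>` page structure.

```python
from xml.sax.saxutils import escape


def render_greeting_vxml(caller_name: str) -> str:
    """Return a small VoiceXML 2.0 page that greets the caller by name.

    Hypothetical helper: a real application would typically pull the
    name from a database or a request parameter.
    """
    # Escape the name so characters such as & or < cannot break the XML.
    safe_name = escape(caller_name)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        '  <form>\n'
        '    <block>\n'
        f'      <prompt>Hello, {safe_name}. Welcome to the demo line.</prompt>\n'
        '    </block>\n'
        '  </form>\n'
        '</vxml>\n'
    )


if __name__ == "__main__":
    # A web server would return this string as the body of its HTTP response.
    print(render_greeting_vxml("Ada"))
```

The server side works exactly as it would for HTML: the code builds a text document and returns it in the HTTP response. Only the markup inside the document differs.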
4. The page is sent to the browser
-> HTML: the page is interpreted by the browser software - output is presented as text and graphics on the screen, and input is accepted from the keyboard or mouse.
-> VoiceXML: the page is interpreted by the VoiceXML interpreter - output is presented as audio (recorded or Text-to-Speech), and the user provides input by speaking or pressing touchtone keys.
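The audio-browser side of steps 2 and 4 can be sketched roughly as follows. This is a toy illustration, not how any particular VoiceXML platform is implemented: the routing table `NUMBER_TO_URL`, the phone numbers, the example.com URLs, and both function names are assumptions invented for the example. A real interpreter also does far more than collect prompts (it plays them through Text-to-Speech, listens for input, and moves between forms).

```python
import xml.etree.ElementTree as ET

# Namespace used by VoiceXML 2.0 documents, in ElementTree's {uri} notation.
VXML_NS = "{http://www.w3.org/2001/vxml}"

# Hypothetical routing table: dialed number -> start URL of the application.
NUMBER_TO_URL = {
    "+15550100": "http://example.com/weather/start.vxml",
    "+15550101": "http://example.com/banking/start.vxml",
}

# A hand-written sample page standing in for what the web server returns.
# (The speech grammar a real <field> needs is omitted to keep this short.)
SAMPLE_PAGE = """\
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block><prompt>Welcome to the weather line.</prompt></block>
    <field name="city">
      <prompt>Please say a city name.</prompt>
    </field>
  </form>
</vxml>
"""


def url_for_dialed_number(dialed: str) -> str:
    """Step 2: look up the start URL associated with the number called."""
    return NUMBER_TO_URL[dialed]


def prompts_to_speak(vxml_text: str) -> list:
    """Step 4 (output half): collect prompt text in document order."""
    root = ET.fromstring(vxml_text)
    prompts = []
    for prompt in root.iter(VXML_NS + "prompt"):
        # Join nested text and collapse runs of whitespace.
        prompts.append(" ".join("".join(prompt.itertext()).split()))
    return prompts


if __name__ == "__main__":
    print("Fetch:", url_for_dialed_number("+15550100"))
    for line in prompts_to_speak(SAMPLE_PAGE):
        print("Speak:", line)
```

The fetch itself (step 3) is left out on purpose, since it is an ordinary HTTP request and needs a live server to demonstrate.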