Lesson 2: Using Voice Recognition

Before you start this lesson, be sure you understand the concepts presented in Lesson 1.

Step 1. Start with the skeletal VoiceXML structure

As mentioned in Lesson 1, every VoiceXML application contains at least the <?xml>, <vxml> and </vxml> lines:


<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- The code will go here -->
</vxml>

Notice the comment enclosed in "<!--" and "-->".

Step 2. Now we will add the mechanism to prompt the user and wait for a response


<form id="Choices">
  <field name="choice">
    <prompt>
      Please choose from the following:
      To add a course, say add.
      To drop a course, say drop.
      To check your schedule, say check.
    </prompt>
    <grammar type="application/srgs+xml" version="1.0" root="root">
      <rule id="root">
        <one-of>
          <item>add</item>
          <item>drop</item>
          <item>check</item>
        </one-of>
      </rule>
    </grammar>
  </field>
</form>

Notes

Test your code by saying nothing. What happened?
Test it again by saying "Whatever". What happened?
Test it again by interrupting the prompt with your response.

Step 3. Now let's override the default for "No Input":


<field>
  ...
  <noinput>
    I didn't hear you.
    <reprompt/>
  </noinput>
  ...
</field>

Notes: <reprompt/> tells the interpreter to play the field's prompt again. It is also an example of a stand-alone (or empty) tag: a tag that is used by itself rather than in an opening and closing pair. A stand-alone tag ends with a slash (/) just before the closing angle bracket.
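
As a small illustration of the empty-tag syntax, the two fragments below should be treated identically by an XML parser. This is a general XML rule rather than anything specific to this lesson's platform:

```xml
<!-- Stand-alone (empty) form, as used in this lesson -->
<reprompt/>

<!-- Equivalent paired form with nothing between the tags -->
<reprompt></reprompt>
```

The stand-alone form is shorter and is the one you will normally see in VoiceXML documents.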

The <form> tag is what groups sections of input and output together. It is also known as a "dialog". In this document, there is only one form, but a document could of course contain any number of forms.
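
To illustrate a document with more than one form, here is a minimal sketch. The form id "Welcome" and both prompt texts are made up for this example and are not part of the lesson's application. It assumes the standard VoiceXML 2.0 behavior that the first form in the document runs first, and that <goto> moves to the form named after the "#":

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- The first form in document order is entered first -->
  <form id="Welcome">
    <block>
      <prompt>Welcome to course registration.</prompt>
      <!-- Jump to the form whose id is "Choices" -->
      <goto next="#Choices"/>
    </block>
  </form>

  <!-- Second form; in this lesson it would hold the field from Step 2 -->
  <form id="Choices">
    <block>
      <prompt>This is the second form.</prompt>
    </block>
  </form>
</vxml>
```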

Step 4. Now let's override the default for "No Match":


<field>
  ...
  <nomatch>
    I didn't quite understand you.
    <reprompt/>
  </nomatch>
  ...
</field>
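
If you want a different message when the caller fails more than once, VoiceXML 2.0 handlers accept a "count" attribute. The sketch below is one possible variation, not part of this lesson's finished code, and the wording of the second message is invented for illustration:

```xml
<field name="choice">
  <!-- The prompt and grammar from Step 2 go here -->

  <!-- Used on the first failed match -->
  <nomatch count="1">
    I didn't quite understand you.
    <reprompt/>
  </nomatch>

  <!-- Used on the second and later failed matches -->
  <nomatch count="2">
    Sorry, I still didn't understand. Please say add, drop, or check.
  </nomatch>
</field>
```

The interpreter picks the handler with the highest count that does not exceed the number of times the event has occurred, so the second handler also covers the third and later attempts. The same attribute works on <noinput>.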

Step 5. Try out bargein

By default, the "bargein" property is "true", which lets the caller interrupt a prompt by speaking over it. While you're testing (at least), it's a good idea to set it to "false". You will usually notice higher speech recognition accuracy with barge-in turned off.


<property name="bargein" value="false"/>
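
As an alternative sketch, VoiceXML 2.0 also allows a "bargein" attribute on an individual <prompt>, which should override the property for that prompt only. The shortened prompt text here is just for illustration:

```xml
<field name="choice">
  <!-- Barge-in is disabled for this one prompt only -->
  <prompt bargein="false">
    To add a course, say add. To drop a course, say drop.
  </prompt>
  <!-- The grammar from Step 2 goes here -->
</field>
```

The rest of this lesson keeps the document-wide property shown above.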

So, here's our code so far. We have an application that accepts an input with three choices, but doesn't do anything yet with the input:


<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <property name="bargein" value="false"/>
  <form id="Choices">
    <field name="choice">
      <prompt>
        Please choose from the following:
        To add a course, say add.
        To drop a course, say drop.
        To check your schedule, say check.
      </prompt>
      <grammar type="application/srgs+xml" version="1.0" root="root">
        <rule id="root">
          <one-of>
            <item>add</item>
            <item>drop</item>
            <item>check</item>
          </one-of>
        </rule>
      </grammar>
      <noinput>
        I didn't hear you.
        <reprompt/>
      </noinput>
      <nomatch>
        I didn't quite understand you.
        <reprompt/>
      </nomatch>
    </field>
  </form>
</vxml>

Step 6. Take action when match is found:

The <filled> tag lets us take some action once the caller's response matches the grammar and the field variable is set. The VoiceXML <if> tag works much like the if statement in most programming languages.


<field>
  ...
  <filled>
    <if cond="choice=='add'">
      OK let's add a course.
    <elseif cond="choice=='drop'"/>
      OK let's drop a course.
    <else/>
      OK let's check your schedule.
    </if>
  </filled>
  ...
</field>

Notes:
  1. Match the case of the grammar items exactly in the "cond=" expression (here, all lower-case). The expression following "cond=" is ECMAScript, the standardized language behind JavaScript and JScript, and it is case-sensitive.
  2. Notice the double-quotes around the entire expression following "cond=". Within those double-quotes is a pair of single-quotes around the string 'drop' or 'add'.
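
The case-sensitivity point can be sketched as follows. This fragment is illustrative only and is not part of the lesson's finished code. It assumes the field variable holds the recognized word exactly as written in the grammar, which is how this lesson's own code treats it:

```xml
<filled>
  <!-- Echo the recognized word back to the caller -->
  <prompt>You said <value expr="choice"/>.</prompt>

  <!-- Matches when the caller says "add": the grammar item is lower-case -->
  <if cond="choice=='add'">
    OK let's add a course.
  </if>

  <!-- Does not match "add": 'Add' and 'add' differ in case -->
  <if cond="choice=='Add'">
    This message is not played.
  </if>
</filled>
```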

Step 7. Add Debugging Information

As a final step (which perhaps should have been step 2 or 3), we will add lines to help with debugging:


<meta name="maintainer" content="yourname@yourserver.com"/>
<property name="loglevel" value="3"/>
<property name="metricslevel" value="2"/>

Notes:
  1. The "maintainer" is the e-mail address to which any error messages are sent.
  2. The "loglevel" is the amount of detail included in those e-mails. The maximum is "4".
  3. The "metricslevel" is the amount of detail in the on-line trouble-shooting. To access this trouble-shooting method, log in to the Developer's Workshop, then click on "Call Log Explorer" and input the extension you just called.
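
Besides these platform properties, standard VoiceXML 2.0 provides a <log> element for writing your own debug messages. The sketch below is a possible addition rather than part of the lesson's finished code. Where the message ends up (for example, whether it appears in the Call Log Explorer) depends on the platform and is an assumption here:

```xml
<filled>
  <!-- Record the recognized value in the platform's log -->
  <log>Caller chose: <value expr="choice"/></log>
</filled>
```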

Step 8. Putting it all together

Our script is now complete, and the entire file should look like this:


<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <meta name="maintainer" content="yourname@yourserver.com"/>
  <property name="loglevel" value="3"/>
  <property name="metricslevel" value="2"/>
  <property name="bargein" value="false"/>
  <form id="Choices">
    <field name="choice">
      <prompt>
        Please choose from the following:
        To add a course, say add.
        To drop a course, say drop.
        To check your schedule, say check.
      </prompt>
      <grammar type="application/srgs+xml" version="1.0" root="root">
        <rule id="root">
          <one-of>
            <item>add</item>
            <item>drop</item>
            <item>check</item>
          </one-of>
        </rule>
      </grammar>
      <noinput>
        I didn't hear you.
        <reprompt/>
      </noinput>
      <nomatch>
        I didn't quite understand you.
        <reprompt/>
      </nomatch>
      <filled>
        <if cond="choice=='add'">
          OK let's add a course.
        <elseif cond="choice=='drop'"/>
          OK let's drop a course.
        <else/>
          OK let's check your schedule.
        </if>
      </filled>
    </field>
  </form>
</vxml>

Step 9. Upload and test

Save this file on your web site with a ".vxml" extension, for example:
http://www.freewebsite.com/yourname/lesson2.vxml

Create an extension for this application, just as in Lesson 1, then call the number and test your VoiceXML program!

What did we learn in Lesson 2?

  1. A form groups sections of input and output together.
  2. A field contains a variable where user input is stored.
  3. <noinput> and <nomatch> are tags that can be used to override the default action taken when there is no input or input that doesn't match a grammar.
  4. The recognizer is usually more accurate if "bargein" is turned off.
  5. <if>, <elseif/> and <else/> control the flow of events.
  6. VoiceGenie offers several methods to debug your application, which are controlled by adding a <property> line to the program.