The VoiceGenie platform now allows VoiceXML applications to be run using recorded audio as speech input.
This document tells you how to use this feature in your application.
Two uses for this feature are tuning an application and running maintenance checks on a deployed application.
To tune an application for better recognition accuracy, you need to assess performance before and after each tuning adjustment. To be sure that any change in performance comes from the adjustment, all other factors must be held constant, including the input utterances. Using this feature, you can reuse a set of utterances (either recorded by testers from a call script, or recordings of actual callers) as input to the application while you make changes to the grammars and properties.
Once an application is deployed, you don't want to wait for negative feedback from your users to find out that something is not working properly. Using this feature, you can create test scripts that periodically "call" the application. You can then check the results of these calls to make sure that the ASR and your grammars are correctly accepting speech input.
This feature is very simple to use. For each <field> in your application, if you want the recognizer to use recorded audio as speech input, specify the audio source with the audioinexpr attribute. Otherwise, the recognizer will wait for speech input to come from the caller, as usual.
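As a minimal sketch of the difference, the form below contains two fields: the first takes its input from recorded audio, and the second waits for the caller as usual. The audio URL and the field names are hypothetical placeholders. The inner single quotes follow the convention used in the examples in this document, where a literal URI is quoted and a variable reference is not.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <property name="bargein" value="false"/>
  <form>
    <!-- Recorded input: the recognizer uses the audio named by audioinexpr.
         The URL is a hypothetical placeholder. -->
    <field name="recorded" audioinexpr="'http://example.com/audio/goodbye.vox'">
      <prompt> Please say goodbye. </prompt>
      <grammar xml:lang="en-US" version="1.0" root="ROOT"
          xmlns="http://www.w3.org/2001/06/grammar" type="application/srgs+xml">
        <rule id="ROOT" scope="public">
          <item> goodbye </item>
        </rule>
      </grammar>
      <!-- Leave the application on failure, since a retry would
           reuse the same recorded audio -->
      <catch event="nomatch noinput">
        <exit/>
      </catch>
    </field>
    <!-- Live input: no audioinexpr, so the recognizer waits for the caller -->
    <field name="live">
      <prompt> Please say goodbye. </prompt>
      <grammar xml:lang="en-US" version="1.0" root="ROOT"
          xmlns="http://www.w3.org/2001/06/grammar" type="application/srgs+xml">
        <rule id="ROOT" scope="public">
          <item> goodbye </item>
        </rule>
      </grammar>
      <filled>
        The recording gave <value expr="recorded"/>
        and you said <value expr="live"/>.
      </filled>
    </field>
  </form>
</vxml>
```

The only difference between the two fields is the audioinexpr attribute and the failure handler that goes with it.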
This attribute should specify one of the following sources:
- the URI of an audio file
- a <record> variable, if one was recorded earlier in the application
- $.utteranceaudio from an earlier <field>, if saveutterance is enabled
Note: If you own a VoiceGenie platform, you can put audio files on the platform and reference them with audioinexpr="'file:///file path'". Or, if you put the audio files in VoiceGenie's builtin "audio" directory (or any subdirectory of "audio"), then you can reference them with audioinexpr="'builtin:file path, relative to audio directory'".
Here are three examples that show <field>s receiving their input from different sources of recorded audio.
Note: A caller will not be able to hear the recorded audio input.
Example 1. Speech input from an audio file. The recognition results are sent to a script that processes them, so the application's recognition performance can be assessed. Note that this example is only to illustrate the use of this feature; no implementation is given for stats.jsp.
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <meta name="maintainer" content="yourname@yourserver.com"/>
  <meta name="application" content="Recorded Audio Input 1"/>
  <property name="bargein" value="false"/>
  <form>
    <!-- The caller will not hear the input -->
    <field name="field1"
        audioinexpr="'http://developer.voicegenie.com/libraries/audio/common/goodbye.vox'">
      <prompt> Please say goodbye. </prompt>
      <grammar xml:lang="en-US" version="1.0" root="ROOT"
          xmlns="http://www.w3.org/2001/06/grammar" type="application/srgs+xml">
        <rule id="ROOT" scope="public">
          <item> goodbye </item>
        </rule>
      </grammar>
      <nomatch>
        <var name="field1" expr="'nomatch'"/>
        <submit next="stats.jsp" namelist="field1"/>
      </nomatch>
      <noinput>
        <var name="field1" expr="'noinput'"/>
        <submit next="stats.jsp" namelist="field1"/>
      </noinput>
      <filled>
        <var name="confidence" expr="field1$.confidence"/>
        <submit next="stats.jsp" namelist="field1 confidence"/>
      </filled>
    </field>
  </form>
</vxml>
Example 2. Speech input from the result of an earlier <record>.
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <meta name="maintainer" content="yourname@yourserver.com"/>
  <meta name="application" content="Recorded Audio Input 2"/>
  <property name="bargein" value="false"/>
  <form>
    <record name="recordaudio" beep="true" dtmfterm="true">
      <prompt> At the tone, please say goodbye, then press the pound key. </prompt>
    </record>
    <!-- Use recording from above as input here -->
    <!-- The caller will not hear the input -->
    <field name="field1" audioinexpr="recordaudio">
      <prompt> Please say goodbye. </prompt>
      <grammar xml:lang="en-US" version="1.0" root="ROOT"
          xmlns="http://www.w3.org/2001/06/grammar" type="application/srgs+xml">
        <rule id="ROOT" scope="public">
          <item> goodbye </item>
        </rule>
      </grammar>
      <nomatch>
        I didn't understand your recording.
        <exit/>
      </nomatch>
      <noinput>
        I didn't hear your recording.
        <exit/>
      </noinput>
      <filled>
        I heard your recording say <value expr="field1"/>.
      </filled>
    </field>
  </form>
</vxml>
Example 3. Speech input from the recording of the caller's earlier input. Check out the tutorial on saving caller utterances for more information.
Note: The following example uses the tag format that is supported by OSR (<tag>command='goodbye';</tag>). If you want to run this example with a different ASR engine, confirm the format supported by that engine, and modify the tag content below if necessary.
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <meta name="maintainer" content="yourname@yourserver.com"/>
  <meta name="application" content="Recorded Audio Input 3"/>
  <property name="ASRENGINE" value="SPEECHWORKS"/>
  <property name="bargein" value="false"/>
  <form>
    <grammar xml:lang="en-US" version="1.0" root="ROOT"
        xmlns="http://www.w3.org/2001/06/grammar" type="application/srgs+xml">
      <rule id="ROOT" scope="public">
        <item> goodbye <tag>command='goodbye';</tag> </item>
      </rule>
    </grammar>
    <field name="field1" saveutterance="true" slot="command">
      <prompt> Please say goodbye. </prompt>
      <catch event="nomatch noinput">
        Try again.
        <reprompt/>
      </catch>
      <filled>
        I recognized goodbye with a confidence of
        <value expr="field1$.confidence"/>.
        Let's make sure I'm a consistent recognizer.
      </filled>
    </field>
    <!-- Use recording of caller's last input as input here -->
    <!-- The caller will not hear the input -->
    <field name="field2" audioinexpr="field1$.utteranceaudio" slot="command">
      <prompt> Please say goodbye. </prompt>
      <filled>
        I recognized goodbye with a confidence of
        <value expr="field2$.confidence"/>.
        <if cond="field1$.confidence == field2$.confidence">
          See, I am a consistent recognizer!
        <else/>
          <!-- This should not happen -->
          I guess I'm not a consistent recognizer.
        </if>
      </filled>
    </field>
  </form>
</vxml>
Usage Notes
If recognition fails on recorded audio input, retrying will fail in the same way. Provide logic in the <nomatch> and <noinput> handlers to leave the field in this case, by hanging up or exiting, by transitioning to another field/form/document, by setting the field variable or cond attribute, etc. If no such logic is provided (i.e. if the handlers simply prompt to retry recognition), an infinite loop will occur, since the same input will be used each time.
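As a rough sketch of two of these exit strategies, the field below leaves on <nomatch> by setting its own field variable, and leaves on <noinput> by exiting. The audio URL is a hypothetical placeholder, and the sketch relies on standard VoiceXML 2.0 behavior, where a field whose variable is already defined is not visited again.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <property name="bargein" value="false"/>
  <form>
    <!-- Hypothetical placeholder URL for the recorded utterance -->
    <field name="field1" audioinexpr="'http://example.com/audio/goodbye.vox'">
      <prompt> Please say goodbye. </prompt>
      <grammar xml:lang="en-US" version="1.0" root="ROOT"
          xmlns="http://www.w3.org/2001/06/grammar" type="application/srgs+xml">
        <rule id="ROOT" scope="public">
          <item> goodbye </item>
        </rule>
      </grammar>
      <!-- Strategy 1: set the field variable, so the form moves on
           to the next item instead of retrying the same audio -->
      <nomatch>
        <assign name="field1" expr="'nomatch'"/>
      </nomatch>
      <!-- Strategy 2: exit the application -->
      <noinput>
        I didn't hear the recording.
        <exit/>
      </noinput>
    </field>
    <!-- Reached after a successful match, or after the nomatch handler -->
    <block>
      The result for the recorded input was <value expr="field1"/>.
    </block>
  </form>
</vxml>
```

Neither handler uses <reprompt/> or returns to the field with its variable still undefined, so the recorded audio is recognized at most once.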
The following example clears the recorded audio source after a failed recognition, so that the retry listens to the live caller instead:
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <property name="bargein" value="false"/>
  <form>
    <var name="autoinput"
        expr="'http://developer.voicegenie.com/libraries/audio/horoscopes/zodiacs.vox'"/>
    <field name="input" audioinexpr="autoinput">
      <prompt> Please say test. </prompt>
      <grammar xml:lang="en-US" version="1.0" root="ROOT"
          xmlns="http://www.w3.org/2001/06/grammar" type="application/srgs+xml">
        <rule id="ROOT" scope="public">
          test
        </rule>
      </grammar>
      <nomatch>
        <!-- Clear the audio source before retrying -->
        <assign name="autoinput" expr=""/>
        Sorry, I didn't understand the automated input.
        Now I'll listen to the live caller.
        <reprompt/>
      </nomatch>
      <filled>
        You said <value expr="input"/>.
      </filled>
    </field>
  </form>
</vxml>
Here is the <field> attribute that is used to indicate that speech input is coming from recorded audio, and what the source of that audio is:
| Attribute | Possible Value | Example |
| audioinexpr | Full http URI | audioinexpr="'http://blah.com/audio/utterance.vox'" |
| audioinexpr | Full file URI (audio on the platform) | audioinexpr="'file:///usr/local/phoneweb/blah/utterance.vox'" |
| audioinexpr | Relative URI | audioinexpr="'audio/utterance.vox'" |
| audioinexpr | Reference to builtin audio on the platform | audioinexpr="'builtin:test1/utterance.vox'" |
| audioinexpr | <record> field variable | audioinexpr="recording1" |
| audioinexpr | $.utteranceaudio shadow variable | audioinexpr="field1$.utteranceaudio" |
Here are the supported audio formats:
| File Format | Extension | Sample Rate | Encoding |
| Dialogic Vox | .vox | 8 kHz | u-law, a-law |
| Microsoft WAVE | .wav | 8 kHz | u-law, a-law |
| AU Audio | .au | 8 kHz | u-law, a-law |
| NIST Sphere | .wav | 8 kHz | u-law, a-law |