voice recognition - Can Microsoft Bing Speech be configured to return only numbers / letters?
Can the Microsoft Bing Speech API be configured to return numbers and letters, as opposed to full words?
The use case is translating Canadian postal codes, e.g. M 1 B 0 R 3. Microsoft may return "em 1 0 3".
Our audio file is 8000 Hz and encoded as "m-ulaw". We have no flexibility in changing the sample rate or encoding. We are using the "smd" scenario, but I can't find documentation on what it does. The base request URI is:
https://speech.platform.bing.com/recognize?scenarios=smd&appid=d4d52672-91d7-4c74-8ad8-42b1d98141a5&device.os=your_device_os&version=3.0
Is there a way to get a more accurate response from Microsoft for this use case?
Thank you.
You could try using Microsoft's Custom Speech Service (previously known as the Custom Recognition Intelligent Service, or CRIS) to create and use a custom language model.
The guidelines for transcription of custom language models say that "common acronyms can be left as a single entity without periods or spaces between the letters, but all other acronyms should be written out in separate letters, with each letter separated by a single space", and they include this example:
Original text            After normalization
-----------------------  ---------------------------
play ou812 van halen     play o u 8 1 2 van halen
So, following these guidelines, you would have a custom language model file in which each line looks like this:
m 1 b 0 r 3
You can generate a file containing thousands of examples of Canadian postal codes based on the structure of the codes, which in regular expression format looks like this:
[abceghjklmnprstvxy][0-9][abceghjklmnprstvwxyz][0-9][abceghjklmnprstvwxyz][0-9]
(The above expression is taken from this answer on validating postal codes.)
By doing this you're telling the recognizer the sort of things you're expecting people to say, and helping it choose when there are multiple possibilities that sound alike (e.g. "u" vs. "you"). I think it will make a huge difference in the results you get.
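As a rough illustration of generating such a file, here is a minimal Python sketch. It assumes the per-position letter sets from the regular expression above, and it produces strings that match the pattern, not necessarily postal codes that are actually in use. The function names and the output file name are hypothetical, and the sketch does not upload anything to the Custom Speech Service.

```python
import random

# Allowed characters per position, copied from the regular expression above.
FIRST_LETTERS = "abceghjklmnprstvxy"
OTHER_LETTERS = "abceghjklmnprstvwxyz"
DIGITS = "0123456789"


def random_postal_code_line(rng):
    """Return one pattern-valid code as space-separated characters."""
    chars = [
        rng.choice(FIRST_LETTERS),
        rng.choice(DIGITS),
        rng.choice(OTHER_LETTERS),
        rng.choice(DIGITS),
        rng.choice(OTHER_LETTERS),
        rng.choice(DIGITS),
    ]
    return " ".join(chars)


def generate_lines(count, seed=0):
    """Return `count` distinct lines in sorted order."""
    rng = random.Random(seed)
    lines = set()
    # The pattern allows 7,200,000 combinations, so this loop ends
    # quickly for counts in the thousands.
    while len(lines) < count:
        lines.add(random_postal_code_line(rng))
    return sorted(lines)


def write_language_data(path, count, seed=0):
    """Write one spaced-out postal code per line to a text file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(generate_lines(count, seed)) + "\n")
```

Calling `write_language_data("postal_codes.txt", 5000)` would write 5000 lines in the `m 1 b 0 r 3` style. If you have a list of real postal codes, using those instead of random pattern matches is probably the better choice.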