Azure speech-to-text - Continuous Recognition

Ask Time：2019-01-13T13:41:36 Author：F_M

I would like to evaluate the accuracy of Azure's speech services, specifically speech-to-text on an audio file.

I have been reading the documentation https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/?view=azure-python and playing around with the suggested code from the MS quickstart page. The code works fine and I get a transcription, but it only transcribes the beginning of the audio (the first utterance):

import azure.cognitiveservices.speech as speechsdk

speechKey = 'xxx'
service_region = 'westus'

speech_config = speechsdk.SpeechConfig(subscription=speechKey, region=service_region, speech_recognition_language="es-MX")
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=False, filename='lala.wav')

sr = speechsdk.SpeechRecognizer(speech_config, audio_config)

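# recognize_once() returns after a single utterance (it stops listening at the
# first pause in speech), so only the beginning of the file gets transcribed.
result = sr.recognize_once()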

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))

Based on the documentation, it looks like I have to use signals and events to capture the full audio with the method start_continuous_recognition (which is not documented for Python, although the method and related classes appear to be implemented). I tried to follow other examples in C# and Java but was not able to implement this in Python.

Has anyone been able to do this and can provide some pointers? Thank you very much!

Author: F_M, reproduced under the CC BY-SA 4.0 copyright license with a link to the original source and this disclaimer.
Link to original article：https://stackoverflow.com/questions/54166387/azure-speech-to-text-continuos-recognition

David Beauchemin :

To further improve @manyways' solution, here is how to collect the data:

all_results = []

def handle_final_result(evt):
    all_results.append(evt.result.text)

speech_recognizer.recognized.connect(handle_final_result)  # to collect data at the end
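If you also want to know where in the audio each utterance occurred, the same `recognized` callback can record timestamps. This is a minimal sketch, assuming that `evt.result.offset` and `evt.result.duration` are reported in ticks of 100 nanoseconds (as the Python SDK reference describes for `RecognitionResult`); `make_timed_collector` and `TICKS_PER_SECOND` are hypothetical names introduced here for illustration.

```python
# Assumption: the Speech SDK reports RecognitionResult.offset and .duration
# in ticks of 100 ns, i.e. 10,000,000 ticks per second.
TICKS_PER_SECOND = 10_000_000


def make_timed_collector(segments):
    """Return a `recognized` callback that appends (start_s, end_s, text) to `segments`."""

    def handle_final_result(evt):
        result = evt.result
        if not result.text:
            return  # skip empty (NoMatch) results
        start = result.offset / TICKS_PER_SECOND
        end = (result.offset + result.duration) / TICKS_PER_SECOND
        segments.append((start, end, result.text))

    return handle_final_result
```

Usage would be `segments = []` followed by `speech_recognizer.recognized.connect(make_timed_collector(segments))`; once recognition has stopped, `" ".join(text for _, _, text in segments)` gives the plain transcript.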

2021-09-27T17:22:29

manyways :

Check the Azure Python sample: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/python/console/speech_sample.py

Or the samples in other languages: https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples

Basically, it comes down to the following:

import time

import azure.cognitiveservices.speech as speechsdk

# speech_key, service_region and weatherfilename are defined elsewhere in the sample


def speech_recognize_continuous_from_file():
    """performs continuous speech recognition with input from an audio file"""
    # <SpeechContinuousRecognitionWithFile>
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)

    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    done = False

    def stop_cb(evt):
        """callback that stops continuous recognition upon receiving an event `evt`"""
        print('CLOSING on {}'.format(evt))
        speech_recognizer.stop_continuous_recognition()
        nonlocal done
        done = True

    # Connect callbacks to the events fired by the speech recognizer
    speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
    speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
    speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
    # stop continuous recognition on either session stopped or canceled events
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)

    # Start continuous speech recognition
    speech_recognizer.start_continuous_recognition()
    while not done:
        time.sleep(.5)
    # </SpeechContinuousRecognitionWithFile>
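The sample above waits by polling a `done` flag every half second and calls `stop_continuous_recognition()` from inside a callback. One possible variant, not the official sample, blocks on a `threading.Event` instead of polling, keeps the stop call on the calling thread, and returns the recognized text rather than printing it. `transcribe_continuous` and `make_file_recognizer` are hypothetical helper names; the recognizer is only assumed to expose the `recognized`, `session_stopped` and `canceled` signals and the start/stop methods used in the answers here.

```python
import threading


def transcribe_continuous(recognizer, timeout=None):
    """Run continuous recognition and return the final text of each utterance.

    `recognizer` is assumed to behave like speechsdk.SpeechRecognizer: it has
    `recognized`, `session_stopped` and `canceled` signals with .connect(), and
    start_continuous_recognition() / stop_continuous_recognition().
    """
    texts = []
    finished = threading.Event()

    def on_recognized(evt):
        # One `recognized` event per final utterance; NoMatch results have empty text.
        if evt.result.text:
            texts.append(evt.result.text)

    def on_finished(evt):
        # session_stopped fires when the session ends; canceled fires on errors
        # and may also fire (reason EndOfStream) when the audio file runs out.
        finished.set()

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(on_finished)
    recognizer.canceled.connect(on_finished)

    recognizer.start_continuous_recognition()
    finished.wait(timeout)  # block without polling; None waits indefinitely
    recognizer.stop_continuous_recognition()  # stop from the calling thread
    return texts


def make_file_recognizer(key, region, wav_path, language="es-MX"):
    """Hypothetical helper: build a recognizer for a WAV file (needs the Speech SDK)."""
    import azure.cognitiveservices.speech as speechsdk  # imported lazily

    speech_config = speechsdk.SpeechConfig(
        subscription=key, region=region, speech_recognition_language=language
    )
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    return speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )
```

With the values from the question, usage would look like `texts = transcribe_continuous(make_file_recognizer("xxx", "westus", "lala.wav"))` followed by `print(" ".join(texts))`. Keeping the SDK import inside `make_file_recognizer` lets `transcribe_continuous` be exercised with any object that has the same signals and methods.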

2020-01-02T08:14:29

datariel :

You could try this:

import time

import azure.cognitiveservices.speech as speechsdk

speech_key, service_region = "xyz", "WestEurope"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region, speech_recognition_language="it-IT")
# No audio_config is passed, so the recognizer listens on the default microphone
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
speech_recognizer.session_stopped.connect(lambda evt: print('\nSESSION STOPPED {}'.format(evt)))
speech_recognizer.recognized.connect(lambda evt: print('\n{}'.format(evt.result.text)))

print('Say a few words\n\n')
speech_recognizer.start_continuous_recognition()
time.sleep(10)
speech_recognizer.stop_continuous_recognition()

speech_recognizer.session_started.disconnect_all()
speech_recognizer.recognized.disconnect_all()
speech_recognizer.session_stopped.disconnect_all()

Remember to set your preferred language. It's not much, but it's a good starting point, and it works. I will continue experimenting.
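The snippet above stops recognition and disconnects the handlers by hand, so that cleanup is skipped if anything raises during the sleep (for example Ctrl+C). A minimal sketch of the same start/stop/disconnect sequence as a context manager, assuming the recognizer exposes the `session_started`, `session_stopped`, `recognizing`, `recognized` and `canceled` signals seen in the answers here, each with `disconnect_all()`; `continuous_session` is a hypothetical helper name.

```python
from contextlib import contextmanager

# Signals of speechsdk.SpeechRecognizer (as used in the answers) to clean up afterwards.
_SIGNAL_NAMES = ("session_started", "session_stopped", "recognizing", "recognized", "canceled")


@contextmanager
def continuous_session(recognizer):
    """Start continuous recognition; always stop it and disconnect handlers on exit."""
    recognizer.start_continuous_recognition()
    try:
        yield recognizer
    finally:
        # Runs on normal exit and on exceptions such as KeyboardInterrupt.
        recognizer.stop_continuous_recognition()
        for name in _SIGNAL_NAMES:
            getattr(recognizer, name).disconnect_all()
```

The last part of the snippet would then become `with continuous_session(speech_recognizer):` wrapping `time.sleep(10)`, with the handlers connected beforehand exactly as shown.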

2019-02-27T02:01:41
