[ Automatic Speech Recognition ]
Automatic speech recognition (ASR) is a technology that can be used to transcribe spoken words into written text.
Ubiqus uses one form of ASR, Large Vocabulary Continuous Speech Recognition (LVCSR), which is based on the automatic identification of very short audio sequences. This technology makes it possible to produce a high-quality transcription, provided the audio recording is itself of high quality.
The state of the art in ASR has evolved greatly in recent years, and our R&D team is contributing to its continued progress.
There are 4 Steps to the Process:
1) Voice Activity Detection
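As a rough illustration of what voice activity detection does, the sketch below marks a frame as speech when its short-term energy exceeds a threshold. This is a textbook simplification and not necessarily the method Ubiqus uses; the function names, the frame size, the threshold, and the 8 kHz sampling rate are all assumptions made for the example.

```python
import math

def frame_energies(samples, frame_size):
    """Mean squared amplitude of each full frame of the signal."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(sum(x * x for x in frame) / frame_size)
    return energies

def detect_speech(samples, frame_size=160, threshold=0.01):
    """Return (start_frame, end_frame) pairs, end exclusive,
    for each run of consecutive frames whose energy exceeds the threshold."""
    energies = frame_energies(samples, frame_size)
    segments = []
    start = None
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i                      # a speech run begins
        elif energy <= threshold and start is not None:
            segments.append((start, i))    # the run ends before frame i
            start = None
    if start is not None:
        segments.append((start, len(energies)))
    return segments

# Synthetic test signal (hypothetical data): silence, a 200 Hz tone, silence.
silence = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(1600)]
signal = silence + tone + silence

segments = detect_speech(signal)
print(segments)
```

With 160-sample frames, the tone occupies frames 5 through 14, so only that span is reported as active. Production systems generally rely on trained models rather than a fixed energy threshold, which fails on noisy recordings.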
Next, it’s important to identify the different speakers in each recording and to group the audio into segments according to speaker identity, a task known as speaker diarization, which answers the question ‘who spoke when?’. For this, the machine uses different models containing specific data (languages, voices). It can therefore distinguish the subtleties of a language, such as accents. Note that at this point, we are still in the “mathematical” processing of the data.
This process is applied to every segment of the recording to ultimately produce the complete transcription.
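The speaker-grouping step described above can be sketched as a clustering problem: each audio segment is summarised by a numeric feature vector, and segments whose vectors are similar enough are given the same speaker label. The sketch below is a minimal greedy version under stated assumptions; the two-dimensional vectors, the cosine-similarity threshold of 0.8, and the function names are hypothetical, and real systems use learned voice models rather than hand-written vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def assign_speakers(segment_vectors, threshold=0.8):
    """Greedily label each segment with a speaker index.

    A segment joins the most similar existing speaker if the similarity
    reaches the threshold; otherwise it starts a new speaker.
    """
    centroids = []  # per speaker: running sum of that speaker's vectors
    labels = []
    for vec in segment_vectors:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(list(vec))
            best = len(centroids) - 1
        else:
            # Cosine similarity ignores scale, so a running sum
            # behaves like a running mean here.
            centroids[best] = [c + v for c, v in zip(centroids[best], vec)]
        labels.append(best)
    return labels

# Hypothetical feature vectors for four consecutive segments.
vectors = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.9]]
labels = assign_speakers(vectors)
print(labels)
```

Here the first and third segments receive one label and the second and fourth another, which is the ‘who spoke when?’ answer in its simplest form. A greedy pass is sensitive to segment order and to the threshold; it is shown only because it is short.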
At the end of this automated process, the document is proofread by our teams, as we do for any other Ubiqus document. In addition to verifying the content as a whole, the proofreader also ensures that the speech has been correctly attributed to each speaker.