Automatic Speech-to-Text Transcription: Preliminary Results – OpenSpires

How well does speech recognition work for transcription? Here's a research project that was set up to find some answers.

As part of the SPINDLE project we are running a series of experiments to evaluate the use of Large Vocabulary Continuous Speech Recognition (LVCSR) software for the automatic transcription of podcasts. We already know that automatic transcription will not be 100% accurate at the transcript level, but it may be 'good enough' to enrich the existing metadata of the University podcasts with a set of keywords generated from the automatic transcripts.

Today we present some preliminary results for three podcasts already available on the University of Oxford Podcasts website. We used the Speech Analysis tool in Adobe Premiere Pro CS5 to transcribe them automatically, selecting the UK English language option and the High (slower) quality setting.

The table below shows the characteristics of the three podcasts (title, duration, number of words in the manual transcription and number of words in the automatic transcription). We report the results in terms of Word Accuracy, computed from the Levenshtein distance between the manual transcript and the automatic transcript (the higher the better).
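To make the metric concrete, here is a minimal Python sketch of word-level Levenshtein distance and the Word Accuracy derived from it. This illustrates the measure rather than reproducing our exact scoring code; the lower-casing and whitespace tokenisation are simplifying assumptions.

```python
def levenshtein(ref_words, hyp_words):
    """Word-level Levenshtein (edit) distance via dynamic programming."""
    m, n = len(ref_words), len(hyp_words)
    dp = list(range(n + 1))  # dp[j]: distance between current ref prefix and hyp_words[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            dp[j] = min(dp[j] + 1,      # deletion
                        dp[j - 1] + 1,  # insertion
                        prev + cost)    # substitution (or match)
            prev = cur
    return dp[n]

def word_accuracy(reference, hypothesis):
    """Word Accuracy = 1 - edit_distance / reference_length (floored at 0)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    return max(0.0, 1.0 - levenshtein(ref, hyp) / len(ref))

# Two deletions against a six-word reference -> accuracy 1 - 2/6 = 0.67
print(word_accuracy("the cat sat on the mat", "the cat sat mat"))
```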

Analysing the results, we see that accuracy ranges from 17% to 56%. Why is it so variable? Listening to the recordings and analysing the audio signals, we see that the recording conditions of the three podcasts differ greatly, and we consider this the main factor behind such different results.

The first podcast contains background noise and even a speaker joining by video conference.

The second podcast has a very low signal level.

The last podcast was professionally recorded and edited, and therefore obtains the best results.
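As a practical aside, a quick level check can flag a quiet recording like the second podcast before transcribing it. Here is a small sketch, assuming a 16-bit PCM WAV file; the file name and the -30 dBFS threshold are illustrative choices, not values from our experiments.

```python
import wave
import numpy as np

def rms_dbfs(path):
    """Return the overall RMS level of a 16-bit PCM WAV file in dBFS."""
    with wave.open(path, "rb") as wav:
        assert wav.getsampwidth() == 2, "expects 16-bit PCM"
        samples = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)
    rms = np.sqrt(np.mean(samples.astype(np.float64) ** 2))
    return 20 * np.log10(rms / 32768.0)  # 0 dBFS = digital full scale

level = rms_dbfs("podcast.wav")  # hypothetical file name
if level < -30:                  # illustrative threshold
    print(f"Low signal level ({level:.1f} dBFS): consider normalising first")
else:
    print(f"Signal level looks OK ({level:.1f} dBFS)")
```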

There may be other factors affecting the accuracy of the automatic transcription, such as the podcast topic (language model), out-of-vocabulary words (dictionary) or accents (acoustic model).
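To illustrate the dictionary factor: a word missing from the recogniser's vocabulary can never be transcribed correctly, so the out-of-vocabulary (OOV) rate of a manual transcript puts a ceiling on achievable accuracy. A hypothetical check (a dictionary file whose lines start with the word is an assumption):

```python
def oov_rate(transcript, vocabulary):
    """Fraction of transcript words missing from the recogniser's dictionary."""
    words = transcript.lower().split()
    oov = [w for w in words if w not in vocabulary]
    return len(oov) / len(words), sorted(set(oov))

# Hypothetical usage: each line of 'dictionary.txt' begins with a vocabulary word
with open("dictionary.txt") as f:
    vocab = {line.split()[0].lower() for line in f if line.strip()}

rate, missing = oov_rate("the haemodynamics of oxygenation", vocab)
print(f"OOV rate: {rate:.1%}, missing words: {missing}")
```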

In the following weeks we will report on how we generate keywords automatically from these transcripts. Stay tuned!
