In the Name of God
Master Thesis Defense Session
Computer Engineering, Artificial Intelligence Engineering
Supervisors:
Dr. Peyman Adibi
Dr. Ali Reza Darvishy
Internal Reviewer:
Dr. Hamid Reza Baradaran Kashani
External Reviewer:
Dr. Mohammad Reza Yazdchi
Researcher:
Sina Eshraghi Samani
Date: 21 September 2022
Time: 2:00 PM
Location:
Ansari Building, third floor, Dr. Braani Hall
Topic:
Improving a formal Persian automatic speech recognition system based on deep learning
This project aims to build a well-performing speech recognition system for the Persian language. To this end, several state-of-the-art models are first implemented, and after their results are compared, the best-performing model is selected. The structure of the selected model is then studied further, and changes are made to it to improve the performance of the speech recognition system. Among the advanced models implemented and evaluated, the CRDNN model gives the best results. Its encoder and decoder are therefore examined in depth, and modifications to each of them improve the model's efficiency. The encoder of this model combines CNN, RNN, and DNN blocks, and the structure of its CNN blocks is inspired by the VGG model. In this work, that structure is modified with inspiration from the ResNet network, and in parts of the CNN block, depthwise separable convolution layers replace standard convolution layers. In the decoder, a language model based on a basic LSTM is used together with BPE tokenization with 1000 tokens. In addition, data augmentation is applied to make the speech recognition system more robust to noise. Experiments on the Persian Mozilla Common Voice dataset show improved performance: the WER and CER improved to 4.13% and 2.1%, respectively, about 1-2% better than the base model, and the number of model parameters was also reduced.
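One plausible source of the reported parameter reduction is the replacement of standard convolutions with depthwise separable ones, which split a k x k convolution into a per-channel (depthwise) k x k filter followed by a 1 x 1 (pointwise) convolution. The Python sketch below only does the parameter arithmetic; the helper names and the layer sizes (128 input channels, 256 output channels, 3 x 3 kernels) are hypothetical and are not taken from the thesis's actual CRDNN configuration.

```python
def conv2d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Parameter count of a standard k x k 2-D convolution (hypothetical helper)."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def depthwise_separable_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Parameter count of a depthwise k x k convolution (one filter per input
    channel) followed by a pointwise 1 x 1 convolution (hypothetical helper)."""
    depthwise = c_in * k * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative layer size only; not the thesis's real configuration.
    c_in, c_out, k = 128, 256, 3
    std = conv2d_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.3f}")
    # prints "standard: 295168, separable: 34304, ratio: 0.116"
```

For this assumed layer, the separable version needs roughly 8.6 times fewer parameters; the saving grows with the kernel size and the number of output channels, at the cost of some representational capacity.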
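WER and CER are both Levenshtein edit distances (substitutions, deletions, and insertions) normalized by the reference length, computed over words and characters respectively. The following is a minimal sketch of the two metrics, not the evaluation code used in the thesis (toolkits such as SpeechBrain ship their own implementations); the function names and the Persian sample sentence are illustrative assumptions, and whether spaces count as characters in CER varies between toolkits.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (dynamic programming, two rows)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; this sketch counts spaces as characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    ref = "من به مدرسه رفتم"  # "I went to school"
    hyp = "من به مدرسه رفت"   # last word misrecognized (final character dropped)
    print(f"WER = {wer(ref, hyp):.4f}, CER = {cer(ref, hyp):.4f}")
    # prints "WER = 0.2500, CER = 0.0625"
```

For a whole test set, the errors and reference lengths are usually summed over all utterances before dividing, rather than averaging per-utterance rates, so long utterances carry proportionally more weight.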