

Master Thesis Defense Session

Master Thesis Defense Session by Sina Eshraghi Samani: Improving a formal Persian automatic speech recognition system based on deep learning

News Code: 19

Publishing Date: 21 Sep 2022 22:37

In The Name of God

Master Thesis Defense Session

Computer Engineering, Artificial Intelligence Engineering



Dr. Peyman Adibi

Dr. Ali Reza Darvishy


Internal Reviewer:

Dr. Hamid Reza Baradaran Kashani


External Reviewer:

Dr. Mohammad Reza Yazdchi



Sina Eshraghi Samani


Date: 21 September 2022

Time: 2:00 PM



Ansari building, Third floor, Dr. Braani Hall



Improving a formal Persian automatic speech recognition system based on deep learning

This project aims to build a well-functioning automatic speech recognition system for the Persian language. Several current advanced models were implemented and evaluated, and the best-performing one was selected. The structure of the selected model was then studied and modified to improve the performance of the speech recognition system. Among the models examined, the CRDNN model gave the best results. Its encoder and decoder parts were examined in detail, and changes were made to each of them to improve the model's efficiency. The encoder consists of a combination of CNN, RNN, and DNN blocks. The CNN blocks of the original model are inspired by the VGG model. Here, this structure is modified along the lines of the ResNet network, and in parts of the CNN blocks the standard convolution layer is replaced by a depthwise separable convolution layer. In the decoder, a basic LSTM-based language model is used together with the BPE tokenization method with 1000 tokens. In addition, an augmentation technique is used to make the speech recognition system more robust to noise. Tests on the Persian Mozilla Common Voice dataset show an improvement in the performance criteria. The WER and CER criteria were improved by 4.13% and 2.1%, respectively, about 1-2% compared to the base model. The number of model parameters was also reduced.
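
For readers unfamiliar with the WER and CER criteria mentioned above, the following is a minimal sketch of how word error rate and character error rate are commonly computed from an edit distance. It is not the evaluation code used in the thesis. The function names (`edit_distance`, `wer`, `cer`) and the example sentences are assumptions chosen for illustration, and the sketch assumes that CER counts every character, including spaces.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions and deletions
    needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length.

    Assumption: all characters, including spaces, are counted.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    print(f"WER: {wer(reference, hypothesis):.3f}")
    print(f"CER: {cer(reference, hypothesis):.3f}")
```

Both measures are ratios, so lower is better, and a value can exceed 1 when the hypothesis contains many insertions.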