Master Thesis Defense Session


Master Thesis Defense Session by Sina Eshraghi Samani, Improving a formal Persian automatic speech recognition system based on deep learning

News Code: 19

Publishing Date: 21 Sep 2022 22:37

In The Name of God

Master Thesis Defense Session

Computer Engineering, Artificial Intelligence Engineering

Supervisors:

Dr. Peyman Adibi

Dr. Ali Reza Darvishy

Internal Reviewer:

Dr. Hamid Reza Baradaran Kashani

External Reviewer:

Dr. Mohammad Reza Yazdchi

Researcher:

Sina Eshraghi Samani

Date: 21 September 2022

Time: 2:00 PM

Location:

Ansari building, Third floor, Dr. Braani Hall

Topic:

Improving a formal Persian automatic speech recognition system based on deep learning

In this project, an attempt is made to build a well-functioning speech recognition system for the Persian language. To that end, several currently advanced models were implemented and their results compared in order to select the best-performing one. The structure of the selected model was then studied further and modified to improve the performance of the speech recognition system. Among the models implemented and evaluated, the CRDNN model gave the best results. Its encoder and decoder parts were examined in more depth, and changes were made to each of them to improve the model's efficiency.

The encoder of this model combines CNN, RNN, and DNN blocks. The structure of the CNN blocks is inspired by the VGG model. Here this structure is modified with inspiration from the ResNet network, and in parts of the CNN block the normal convolution layer is replaced by a depthwise separable convolution layer. In the decoder, a language model based on a basic LSTM is used together with the BPE tokenization method with 1000 tokens. In addition, an augmentation technique is used to make the speech recognition system more robust to noise.

Tests on the Persian Mozilla Common Voice dataset show an improvement in the performance criteria. The WER and CER criteria were improved by 4.13% and 2.1%, respectively, about 1-2% compared to the base model. The number of model parameters was also reduced.
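For readers unfamiliar with the WER and CER criteria named above, the following is a minimal sketch of how word error rate and character error rate are commonly computed from the edit distance between a reference transcript and a recognizer's hypothesis. It is not the evaluation code used in the thesis. The function names are hypothetical, and counting spaces as characters in CER is an assumption, since conventions differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; spaces are counted as characters (an assumption)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # Toy strings for illustration only, not data from the thesis.
    print(wer("the cat sat", "the cat sit"))  # one substituted word out of three
    print(cer("abc", "abd"))                  # one substituted character out of three
```

Both functions return a fraction; multiplying by 100 gives the percentage form in which such figures are usually reported.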
