Urdu-Based ASR System

Speech recognition is the process of identifying the words uttered by a speaker. It converts speech to text, which can then be used for natural language processing; for that purpose, the text obtained must follow the general grammar rules of the language. Pakistan has the 5th largest population in the world, so the number of people who speak each of its regional languages is large. Urdu is the most spoken regional language in Pakistan, so it is important to develop a speech recognition system capable of understanding Urdu. The work involves collecting speech data, transcribing it, and then using existing tools to perform speech recognition.

The project develops an Automatic Speech Recognition (ASR) system for the Urdu language that converts Urdu speech into text using the Whisper Model. The accompanying mobile application lets users record audio through the microphone, play back and stop recordings, and view the transcription returned by the Whisper Model's API on the mobile interface; in this configuration the system operates in online mode. The proposed ASR system will also operate in offline mode by reducing the model size and integrating the model within the mobile application. This will let Urdu speakers use the system without an internet connection, improving its accessibility and convenience. The Whisper small Model trained on a la
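As a rough illustration of the speech-to-text step described above, the following minimal sketch transcribes an Urdu recording with a Whisper small checkpoint through the Hugging Face `transformers` pipeline. It is not the project's actual code: the checkpoint name `openai/whisper-small` (the public pretrained model, not the team's trained one), the helper names `build_generate_kwargs` and `transcribe`, and the choice to run the model locally rather than behind an API are all assumptions made for this example.

```python
from typing import Any, Dict

# Assumption: the public pretrained checkpoint, not the project's own model.
MODEL_ID = "openai/whisper-small"


def build_generate_kwargs(language: str = "urdu",
                          task: str = "transcribe") -> Dict[str, Any]:
    """Decoding options that ask Whisper for Urdu text (hypothetical helper)."""
    return {"language": language, "task": task}


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Return the transcription of one audio file (hypothetical helper).

    Requires the `transformers` and `torch` packages, plus ffmpeg so that
    the pipeline can decode the audio file.
    """
    # Imported lazily so the module loads even without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path, generate_kwargs=build_generate_kwargs())
    return result["text"]
```

Calling `transcribe("recording.wav")`, where `recording.wav` is a placeholder file name, would download the checkpoint on first use and return the recognised text as a string.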

Keywords: Automatic Speech Recognition, Urdu language, Common Voice dataset, Whisper Model, training, mobile application, online mode, offline mode
Tools: PyTorch, NumPy, Pandas, Whisper Model, Android Studio, Flutter, Dart, Python 3, Hugging Face
Department: Department of Computer Science

Project Team Members

Name Email
Sania Bibi sania2019@namal.edu.pk
Inayat Ullah anayat2019@namal.edu.pk
Bakht Ullah bakhtullah2019@namal.edu.pk

Project Poster

Copyrights © 2024. Namal University Mianwali. All Rights Reserved.