Pakistan has the fifth-largest population in the world, so its regional languages have very large numbers of speakers. Urdu is the national language of Pakistan, so developing a natural language processing (NLP) system capable of understanding Urdu is crucial. NLP is a challenging research area with interesting problems. Core NLP tasks include part-of-speech (POS) tagging and building word embeddings, whereas its applications include translation, intent classification, and sentiment analysis. Work is therefore needed on both the basic processes and the applications of NLP. Our project's primary focus is to collect text data, tag parts of speech, and then train a system to recognize parts of speech in Urdu text. The goals of the project are:
- Collection of 100,000 tokens for training the system
- Annotation/tagging of the tokens for reference
- Development of a deep learning-based approach for Urdu POS tagging
- A mobile application for POS tagging
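One step that usually sits between the annotated tokens and a subword-based model such as XLM-RoBERTa is label alignment: the annotation has one tag per word, while the tokenizer may split a word into several pieces. The following is a minimal sketch of that step, not the project's actual pipeline. The function name `align_labels`, the tag set, the example sentence, and the subword split are all hypothetical; the `word_ids` list imitates what a Hugging Face fast tokenizer returns from `BatchEncoding.word_ids()`, and `-100` is assumed as the masking value, following the convention in Hugging Face token-classification examples.

```python
# Value used to mask positions that should not contribute to the loss
# (assumed here, following Hugging Face token-classification examples).
IGNORE_INDEX = -100


def align_labels(word_ids, word_tags, label2id):
    """Map one POS tag per word onto one label id per subword token."""
    labels = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens (e.g. <s>, </s>) belong to no word.
            labels.append(IGNORE_INDEX)
        elif word_id != previous:
            # The first subword of a word carries the word's tag.
            labels.append(label2id[word_tags[word_id]])
        else:
            # Later subwords of the same word are masked.
            labels.append(IGNORE_INDEX)
        previous = word_id
    return labels


# Hypothetical tag set and sentence, for illustration only:
# "میں کتاب پڑھتا ہوں" ("I read a book"), one tag per word.
label2id = {"PRON": 0, "NOUN": 1, "VERB": 2, "AUX": 3}
word_tags = ["PRON", "NOUN", "VERB", "AUX"]
# Assume the tokenizer split the second word into two subwords.
word_ids = [None, 0, 1, 1, 2, 3, None]

print(align_labels(word_ids, word_tags, label2id))
# → [-100, 0, 1, -100, 2, 3, -100]
```

Labelling only the first subword keeps the evaluation at word level, which matches how the tokens are annotated; labelling every subword with the word's tag is a common alternative.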
Tools: Hugging Face, JupyterLab, Visual Studio Code, TensorFlow, XLM-RoBERTa base model, Keras, scikit-learn, Python
Department: Department of Computer Science
Project Team Members
Name | Email |
---|---|
Kiran Zafar | kiranzafar2019@namal.edu.pk |
Hurmat Ilyas | hurmat2019@namal.edu.pk |
Muhammad Bilal | mbilal2019@namal.edu.pk |