Automatic Parts-of-Speech Tagger Based on BIS Tagset in Assamese
Implementing Organization
Dibrugarh University
Principal Investigator
Dr. Nomi Baruah
Dibrugarh University
Project Overview
Part-of-speech (POS) tagging is a challenging task in Natural Language Processing (NLP) because it requires deep linguistic knowledge of the specific language, particularly when working with large volumes of data. Despite the growing body of work on POS tagging for Indian languages such as Hindi and Bengali, resources remain scarce for Assamese, one of India's officially recognized languages, with 15.3 million speakers worldwide. As NLP research on the Assamese language grows, a high-accuracy automatic POS tagger becomes necessary. A dataset annotated with the BIS tagset will be developed from Assamese novels, news articles, and sports texts, making it one of the pioneering works of its kind for Assamese and Indian languages. The POS tagger will be implemented using RNN-based deep learning methods and a newly designed hybrid method. The outputs and performance of these methods will be critically analyzed for their effectiveness.
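As an illustration of how the outputs of the taggers might be compared against the annotated dataset, the following is a minimal sketch of a token-level evaluation. It is not the project's evaluation code: the function name tagging_accuracy, the toy tag sequences, and the choice of overall and per-tag accuracy as metrics are assumptions made for this example. The tag labels (N_NN, N_NNP, JJ, V_VM, PSP, RD_PUNC) follow BIS-style naming, and the sentences are placeholders rather than real Assamese data or real tagger output.

```python
from collections import Counter


def tagging_accuracy(gold, predicted):
    """Return (overall accuracy, per-tag accuracy) for sentence-aligned tag sequences.

    gold and predicted are lists of sentences; each sentence is a list of tags.
    Per-tag accuracy is the fraction of gold tokens of that tag predicted correctly.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of sentences")
    correct = Counter()  # correctly predicted tokens, keyed by gold tag
    total = Counter()    # all tokens, keyed by gold tag
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("each sentence pair must have the same number of tokens")
        for gold_tag, pred_tag in zip(gold_sent, pred_sent):
            total[gold_tag] += 1
            if gold_tag == pred_tag:
                correct[gold_tag] += 1
    n_tokens = sum(total.values())
    overall = sum(correct.values()) / n_tokens if n_tokens else 0.0
    per_tag = {tag: correct[tag] / total[tag] for tag in total}
    return overall, per_tag


# Hypothetical gold annotations and tagger output (placeholders, not project data).
gold = [
    ["N_NN", "JJ", "N_NN", "V_VM", "RD_PUNC"],
    ["N_NNP", "PSP", "V_VM", "RD_PUNC"],
]
predicted = [
    ["N_NN", "JJ", "N_NN", "V_VM", "RD_PUNC"],
    ["N_NN", "PSP", "V_VM", "RD_PUNC"],  # proper noun mistagged as common noun
]

overall, per_tag = tagging_accuracy(gold, predicted)
print(f"overall accuracy: {overall:.3f}")
for tag, acc in sorted(per_tag.items()):
    print(f"{tag}: {acc:.2f}")
```

Running the same function on the output of each method over a shared held-out set would give directly comparable figures, and the per-tag breakdown shows which categories a given method confuses.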