
Research Projects

Advancement of NLP Techniques for Indian Languages with Focus on Bangla and Hindi

Implementing Organization

Indian Institute of Technology Kanpur
Principal Investigator
Dr. Arnab Bhattacharya
Indian Institute of Technology Kanpur
Co-Principal Investigator
Dr. Pawan Goyal
Indian Institute of Technology (IIT)

Project Overview

India is a land of many languages, and the digital divide is being addressed by making resources available in people's mother tongues. Automated computational processing of natural language has improved significantly in recent years, but Indian languages, particularly Bangla and Hindi, still struggle with basic NLP tasks. The project aims to improve performance for these languages by creating large corpora, developing benchmarks, building better NLP models, and using cross-lingual knowledge transfer.

Large corpora are essential for building state-of-the-art deep-learning NLP tools, but text from newspapers, blogs, and social media posts lacks quality and variety. Literature articles are best suited for this purpose, and task-specific annotation of them can deliver quality benchmark datasets similar to what GLUE provides for English.

The project also aims to build a generalized framework for automatic grammar correction that can be applied to other Indian languages as well. In addition, cross-lingual knowledge transfer from higher-resource Indian languages to lower-resource ones can help create better models, because these languages share traits such as scripts and sentence structure.

Lastly, the project plans to showcase this work on an interactive website where users can download resources, try out trained models, help annotate data, and provide feedback. Together, these efforts aim to bridge the digital divide and improve the performance of NLP tools for Indian languages.
Funding Organization
Science and Engineering Research Board (SERB), New Delhi
Anusandhan National Research Foundation (ANRF)
Quick Information
Area of Research
Computer Sciences and Information Technology
Start Year
2024
End Year
2027
Sanction Amount
₹40.81 lakh
Status
Ongoing
Output
No. of Research Papers
0
Technologies (If Any)
0
No. of PhDs Produced
N/A
Startups (If Any)
0
No. of Patents
Filed: 0
Granted: 0