


Research Projects

A cross-lingual study of neuron-level explainability of deep natural language processing models and its application in framework building for cross-lingual natural language processing systems

Implementing Organization

Indian Institute of Technology (Indian School of Mines) Dhanbad
Principal Investigator
Dr. Ayan Das
Indian Institute of Technology (Indian School of Mines) Dhanbad

Project Overview

Natural language processing (NLP) systems are traditionally trained on annotated data, but such data is scarce for most languages because annotation is costly and time-consuming. Cross-lingual approaches are therefore adopted to build NLP systems for low-resourced languages. Transfer-learning-based cross-lingual approaches rely on contextual word representations from large language models pre-trained on raw text in many languages. However, the quality of these representations may degrade when the volume of target-language text is small or when the other languages are syntactically different from the target language.

Recent studies have attempted to explain the predictions of NLP systems by associating each prediction category with a subset of neurons in the representations. These studies show that the activations of a subset of neurons are predominantly responsible for encoding the knowledge needed to predict a particular category, and that some NLP systems can be controlled by altering the activation values of such a subset.

This project aims to extend this idea to cross-lingual settings by conducting a neuron-level analysis of the cross-lingual performance of deep multilingual models for resource-deficient languages. The goal is to identify, for a given NLP task, the subsets of neurons that encode most of the information corresponding to different prediction classes in different languages. This information will be used to develop a framework for building cross-lingual systems for under-resourced languages, particularly low-resourced Indian languages.
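The idea of associating a prediction category with a subset of neurons, and then controlling a system by altering those neurons' activations, can be illustrated with a minimal, hypothetical sketch. It assumes activation vectors have already been extracted from a model (one vector per token or sentence, each with a class label). The scoring rule used here (mean activation on a class minus mean activation on all other classes) is a simple heuristic chosen for illustration only; it is not the method proposed by this project, and the names `class_selectivity`, `top_neurons` and `ablate` and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def class_selectivity(
    activations: Sequence[Sequence[float]], labels: Sequence[str]
) -> Dict[str, List[float]]:
    """Hypothetical heuristic: for each class c and neuron j, the mean
    activation of j on examples of c minus its mean on all other examples."""
    n_neurons = len(activations[0])
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0] * n_neurons)
    counts: Dict[str, int] = defaultdict(int)
    total = [0.0] * n_neurons
    for vec, lab in zip(activations, labels):
        counts[lab] += 1
        for j, a in enumerate(vec):
            sums[lab][j] += a
            total[j] += a
    n = len(labels)
    scores: Dict[str, List[float]] = {}
    for lab, s in sums.items():
        rest = n - counts[lab]  # number of examples from other classes
        scores[lab] = [
            s[j] / counts[lab] - ((total[j] - s[j]) / rest if rest else 0.0)
            for j in range(n_neurons)
        ]
    return scores


def top_neurons(scores: Dict[str, List[float]], label: str, k: int) -> List[int]:
    """Indices of the k neurons with the highest selectivity for `label`."""
    ranked = sorted(
        range(len(scores[label])), key=lambda j: scores[label][j], reverse=True
    )
    return ranked[:k]


def ablate(vec: Sequence[float], neurons: Sequence[int], value: float = 0.0) -> List[float]:
    """Return a copy of `vec` with the chosen neurons overwritten by `value`,
    a simple way of altering a subset of activations."""
    out = list(vec)
    for j in neurons:
        out[j] = value
    return out


# Toy data (hypothetical): 3-neuron activations for two part-of-speech classes.
acts = [[0.9, 0.1, 0.2], [0.8, 0.0, 0.3], [0.1, 0.7, 0.2], [0.2, 0.9, 0.1]]
labels = ["NOUN", "NOUN", "VERB", "VERB"]
scores = class_selectivity(acts, labels)
noun_neurons = top_neurons(scores, "NOUN", 1)
ablated = ablate(acts[0], noun_neurons)
```

On this toy data, neuron 0 is the most NOUN-selective and neuron 1 the most VERB-selective; zeroing neuron 0 in a NOUN example removes most of the signal that distinguishes it. In a cross-lingual study, one would compare such neuron subsets across languages for the same task.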
Funding Organization
Science and Engineering Research Board (SERB), New Delhi
Anusandhan National Research Foundation (ANRF)
Quick Information
Area of Research
Computer Sciences and Information Technology
Focus Area
Artificial Intelligence, Natural Language Processing
Start Year
2024
End Year
2026
Sanction Amount
₹30.06 lakh
Status
Ongoing
Output
No. of Research Papers
00
Technologies (If Any)
00
No. of PhD Produced
N/A
Startup (If Any)
00
No. of Patents
Filed: 00
Granted: 00