A cross-lingual study of neuron-level explainability of deep natural language processing models and its application in framework building for cross-lingual natural language processing systems


Implementing Organization

Indian Institute of Technology (IIT), Dhanbad, Jharkhand

Principal Investigator

Dr. Ayan Das

Indian Institute of Technology (IIT), Dhanbad, Jharkhand

About

Natural language processing (NLP) systems are traditionally trained on annotated data, but such data is unavailable for most languages because annotation is costly and time-consuming. Cross-lingual approaches are therefore adopted to develop NLP systems for low-resource languages. Transfer learning-based cross-lingual approaches rely on contextual word representations from large language models pre-trained on raw text in different languages. However, the quality of these representations can degrade if the volume of target-language text is small or if the other languages differ syntactically from the target language. Recent studies have attempted to explain the predictions of NLP systems by associating each prediction category with a subset of neurons in the representations. These studies show that the activations of a subset of neurons are predominantly responsible for encoding the knowledge needed to predict a particular category, and that some NLP systems can be controlled by altering the activation values of such a subset. This project aims to extend this idea to cross-lingual settings by conducting a neuron-level analysis of the cross-lingual performance of deep multilingual models on resource-deficient languages. The goal is to identify the subsets of neurons that encode most of the information corresponding to the different prediction classes in different languages for a given NLP task. The findings will be used to develop a framework for building cross-lingual systems for under-resourced languages, particularly low-resource Indian languages.
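The idea of tying a prediction category to a subset of neurons can be illustrated with a minimal sketch. It is not the project's method: the activations are synthetic, the class signal is planted by hand in three hypothetical neurons, and the ranking score (spread of per-class mean activations) and the nearest-centroid classifier are simple stand-ins chosen only to keep the example self-contained.

```python
import numpy as np

def rank_neurons(activations, labels):
    """Rank neurons by how strongly their mean activation separates the classes."""
    classes = np.unique(labels)
    # Per-class mean activation of every neuron: shape (n_classes, n_neurons)
    class_means = np.stack([activations[labels == c].mean(axis=0) for c in classes])
    # Spread of the class means, scaled by each neuron's overall standard deviation
    scores = class_means.std(axis=0) / (activations.std(axis=0) + 1e-8)
    return np.argsort(scores)[::-1], scores

def nearest_centroid_accuracy(train_x, train_y, test_x, test_y):
    """Accuracy of a nearest-centroid classifier (a stand-in for a trained probe)."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = ((test_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == test_y).mean())

# Synthetic "representations": 400 tokens, 32 neurons, 2 prediction classes.
# ASSUMPTION: the class signal lives in three hand-picked neurons.
rng = np.random.default_rng(0)
n_samples, n_neurons = 400, 32
planted = [3, 7, 19]
labels = rng.integers(0, 2, size=n_samples)
activations = rng.normal(size=(n_samples, n_neurons))
activations[:, planted] += 3.0 * labels[:, None]

# Identify the subset of neurons most associated with the class distinction.
ranking, scores = rank_neurons(activations, labels)
top_k = ranking[:len(planted)]

# Alter the activations of that subset (here: zero them out) and re-evaluate.
ablated = activations.copy()
ablated[:, top_k] = 0.0

half = n_samples // 2
acc_full = nearest_centroid_accuracy(
    activations[:half], labels[:half], activations[half:], labels[half:])
acc_ablated = nearest_centroid_accuracy(
    ablated[:half], labels[:half], ablated[half:], labels[half:])

print("top neurons:", sorted(top_k.tolist()))
print("accuracy with all neurons:", acc_full)
print("accuracy after ablating the top neurons:", acc_ablated)
```

In a cross-lingual setting, one could run the same ranking separately on activations from each language and compare the resulting neuron subsets. That comparison is only suggested here, not implemented.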

Source

Anusandhan National Research Foundation/Science and Engineering Research Board (SERB), DST 2023-24


Funding Organization

Science and Engineering Research Board (SERB), New Delhi

Anusandhan National Research Foundation (ANRF)

Quick Information

Area of Research

Computer Sciences and Information Technology

Focus Area

Artificial Intelligence, Natural Language Processing

Start Year

2024

End Year

2026

Sanction Amount

₹ 30.06 lakh

Status

Ongoing

Contact

ayandas@iitism.ac.in

Output

No. of Research Papers

Technologies (If Any)

No. of PhDs Produced

No. of Patents

Filed : 00

Grant : 00

