Research Projects

Study and development of Information Entropy-based distance measures for Categorical and Continuous Data in a Metric Space for Clustering

Implementing Organization

Pandit Dwarka Prasad Mishra Indian Institute Of Information Technology, Design & Manufacturing, Jabalpur, Madhya Pradesh
Principal Investigator
Dr. Sraban Kumar Mohanty
Pandit Dwarka Prasad Mishra Indian Institute Of Information Technology, Design & Manufacturing, Jabalpur, Madhya Pradesh

Project Overview

This proposal aims to develop (dis)similarity measures in a metric space for numerical, categorical, and mixed datasets, using information entropy to capture the disorder and ensemble properties of the data distribution along each feature. The salient feature of the proposal is that it captures the statistical significance of each attribute of the dataset from the number of possible microstates for that feature. Entropy would also be used to compute a weight for each attribute, signifying the contribution of the different features. The proposed measure would be free of user-defined parameters and independent of the distribution of the data points.

1. In general, the characteristic length of a system indicates its scale in the Euclidean feature space. The characteristic length of a feature measures the width or inhomogeneity of its all-pair differences. A large characteristic length indicates that the all-pair absolute differences are widely distributed, which makes it a good measure of the weight for that feature. Based on this, a weighted metric would be proposed for numerical data to improve the performance of clustering methods.
2. The characteristic length and Boltzmann entropy would be employed to capture the intra-attribute statistical information along features, in order to discover the significance of attributes for clustering categorical data.
3. Similarly, both the intra-attribute and inter-attribute data distributions would be captured by entropy to devise a (dis)similarity measure for mixed datasets.
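The overview does not give formulas, so the following is only an illustrative sketch of the general idea of entropy-derived attribute weights for categorical data. It assumes Shannon entropy of the observed value frequencies as the weight of each attribute and a weighted mismatch (Hamming-style) distance. These choices are stand-ins, not the project's measure, which is described in terms of characteristic length and Boltzmann entropy. The function names `attribute_entropy`, `entropy_weights`, and `weighted_mismatch_distance` are hypothetical.

```python
import math
from collections import Counter


def attribute_entropy(values):
    """Shannon entropy (in nats) of the value frequencies of one attribute."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def entropy_weights(rows):
    """Return one weight per attribute, proportional to its entropy.

    Assumption for illustration: a higher-entropy attribute gets a larger
    weight. Weights are normalised to sum to 1. If every attribute is
    constant (total entropy 0), fall back to uniform weights.
    """
    columns = list(zip(*rows))
    entropies = [attribute_entropy(col) for col in columns]
    total = sum(entropies)
    if total == 0:
        return [1.0 / len(columns)] * len(columns)
    return [h / total for h in entropies]


def weighted_mismatch_distance(a, b, weights):
    """Sum the weights of the attributes on which records a and b differ."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)
```

With non-negative weights, this weighted mismatch distance is symmetric and satisfies the triangle inequality, and no user-defined parameter is involved, since the weights come from the data alone.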
Funding Organization
Science and Engineering Research Board (SERB), New Delhi
Anusandhan National Research Foundation (ANRF)
Quick Information
Area of Research
Mathematical Sciences
Start Year
2023
End Year
2026
Sanction Amount
₹ 6.60 lakh
Status
Ongoing
Output
No. of Research Papers
00
Technologies (If Any)
00
No. of PhD Produced
N/A
Startup (If Any)
00
No. of Patents
Filed: 01
Granted: 00