Research Topics
[FM] Data Transformation Quality Metrics based on Information Theory
A crucial step in harnessing large datasets for Big Data and AI applications is the reduction of the dataset's dimensionality. Ideally, dimensionality reduction simplifies handling the dataset while preserving information. Most Dimensionality Reduction algorithms output a key figure to indicate the success of a run. t-SNE, for example, reports its optimisation criterion, the KL divergence.
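As a concrete illustration, the following minimal sketch (assuming scikit-learn's t-SNE implementation and a synthetic dataset; not part of the topic description) shows how such an algorithm-specific key figure is obtained in practice:

```python
# Minimal sketch: t-SNE exposes its optimisation criterion, the final
# KL divergence between high- and low-dimensional neighbourhood
# distributions, as a fitted attribute. Data here is purely synthetic.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))          # hypothetical high-dimensional dataset

tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_2d = tsne.fit_transform(X)            # reduced 2-D representation

# The key figure reported by this particular algorithm:
print(f"final KL divergence: {tsne.kl_divergence_:.3f}")
```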
There are several problems:
* Trusting the key figures from algorithms means trusting that they work correctly.
* Each algorithm uses its own metric or key figure, and these are in general not comparable with each other.
* The key figures might not indicate the extent of information loss.
* Publications presenting a Dimensionality Reduction algorithm usually do not contain an extensive survey of suitable input values or of space and time complexity.
The goal of this treatise is to answer the questions:
* whether an information-theoretic approach can be utilised to evaluate the performance of Dimensionality Reduction algorithms
* and if such a key figure is easier to interpret.
The core aspect of this endeavour is the assumption that the sets of data points to be simplified can be examined from an information-theoretic perspective. Information Theory is based on events and probabilities. However, spatial configurations of data points do not per se yield probabilities.
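One possible way to bridge this gap is sketched below, under the assumption of a Gaussian kernel over pairwise distances (similar in spirit to the affinities used by t-SNE); the bandwidth sigma and the helper names are illustrative choices, not prescribed by the topic description:

```python
# Hedged illustration: deriving a probability distribution from a purely
# spatial configuration of data points, so that information-theoretic
# measures (here, Shannon entropy) become applicable.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def pairwise_probabilities(X: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Convert point coordinates into a joint probability matrix summing to 1."""
    d2 = squareform(pdist(X, metric="sqeuclidean"))     # squared pairwise distances
    affinities = np.exp(-d2 / (2.0 * sigma ** 2))       # Gaussian kernel
    np.fill_diagonal(affinities, 0.0)                   # ignore self-affinities
    return affinities / affinities.sum()                # normalise to probabilities

def shannon_entropy(P: np.ndarray) -> float:
    """Shannon entropy (in bits) of the induced distribution."""
    p = P[P > 0]
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
X_high = rng.normal(size=(200, 20))      # hypothetical dataset before reduction
print(shannon_entropy(pairwise_probabilities(X_high)))
```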
Supervisor: Karsten Wendt