Publications about the project

Project publications are originally saved on a Zenodo community. Access the project's community page to see the details.

CoSy: Evaluating Textual Explanations of Neurons

Kopf, Laura; Bommer, Philine Lou; Hedström, Anna; Lapuschkin, Sebastian; Höhne, Marina M.-C.; Bykov, Kirill
Publication date: 05/12/2024 - DOI: 10.52202/079017-1093

A crucial aspect of understanding the complex nature of Deep Neural Networks (DNNs) is the ability to explain learned concepts within their latent representations. While methods exist to connect neurons to human-understandable textual descriptions, evaluating the quality of these explanations is challenging due to the lack of a unified quantitative approach. We introduce CoSy (Concept Synthesis), a novel, architecture-agnostic framework for evaluating textual explanations of latent neurons. Given textual explanations, our proposed framework uses a generative model conditioned on textual input to create data points representing the explanations. By comparing the neuron's response to these generated data points and control data points, we can estimate the quality of the explanation. We validate our framework through sanity checks and benchmark various neuron description methods for Computer Vision tasks, revealing significant differences in quality.
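The comparison step described in this abstract can be illustrated with a small sketch. It assumes the neuron's activations on the generated (concept) images and on the control images have already been collected, and it scores the explanation with an AUC-style statistic. The function name `auc_score`, the choice of statistic, and the sample activation values are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np

def auc_score(concept_acts, control_acts):
    """Estimate how well a neuron separates concept images from control images.

    Returns the fraction of (concept, control) pairs in which the concept
    activation is higher, counting ties as half. 1.0 means perfect
    separation, 0.5 means the explanation is no better than chance.
    """
    concept = np.asarray(concept_acts, dtype=float)
    control = np.asarray(control_acts, dtype=float)
    # Pairwise differences between every concept and every control activation.
    diff = concept[:, None] - control[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

# Hypothetical activations of one neuron (made-up numbers for illustration).
concept_acts = [2.1, 1.8, 2.5, 0.9]   # responses to images generated from the explanation
control_acts = [0.2, 1.0, 0.4, 0.1]   # responses to unrelated control images
score = auc_score(concept_acts, control_acts)
```

A score close to 1 would suggest the textual explanation matches what the neuron responds to, while a score near 0.5 would suggest the explanation is uninformative.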

Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation

Weber, Leander; Berend, Jim; Weckbecker, Moritz; Binde, Alexander; Wiegand, Thomas; Samek, Wojciech; Lapuschkin, Sebastian
Publication date: 19/06/2025 - DOI: 10.48550/arXiv.2308.12053

Gradient-based optimization has been a cornerstone of machine learning that enabled the vast advances of Artificial Intelligence (AI) development over the past decades. However, this type of optimization requires differentiation, and with recent evidence of the benefits of non-differentiable (e.g. neuromorphic) architectures over classical models w.r.t. efficiency, such constraints can become limiting in the future. We present Layer-wise Feedback Propagation (LFP), a novel training principle for neural network-like predictors that utilizes methods from the domain of explainability to decompose a reward to individual neurons based on their respective contributions. Leveraging these neuron-wise rewards, our method then implements a greedy approach reinforcing helpful parts of the network and weakening harmful ones. While having comparable computational complexity to gradient descent, LFP does not require gradient computation and generates sparse and thereby memory- and energy-efficient parameter updates and models. We establish the convergence of LFP theoretically and empirically, demonstrating its effectiveness on various models and datasets. Via two applications - neural network pruning and the approximation-free training of Spiking Neural Networks (SNNs) - we demonstrate that LFP combines increased efficiency in terms of computation and representation with flexibility w.r.t. choice of model architecture and objective function.

Quanda: An Interpretability Toolkit for Training Data Attribution Evaluation and Beyond

Bareeva, Dilyara; Yolcu, Galip Ümit; Hedström, Anna; Schmolenski, Niklas; Wiegand, Thomas; Samek, Wojciech; Lapuschkin, Sebastian
Publication date: 10/10/2024 - DOI: 10.48550/arXiv.2410.07158

In recent years, training data attribution (TDA) methods have emerged as a promising direction for the interpretability of neural networks. While research around TDA is thriving, limited effort has been dedicated to the evaluation of attributions. Similar to the development of evaluation metrics for traditional feature attribution approaches, several standalone metrics have been proposed to evaluate the quality of TDA methods across various contexts. However, the lack of a unified framework that allows for systematic comparison limits trust in TDA methods and stunts their widespread adoption. To address this research gap, we introduce Quanda, a Python toolkit designed to facilitate the evaluation of TDA methods. Beyond offering a comprehensive set of evaluation metrics, Quanda provides a uniform interface for seamless integration with existing TDA implementations across different repositories, thus enabling systematic benchmarking. The toolkit is user-friendly, thoroughly tested, well-documented, and available as an open-source library on PyPi and under https://github.com/dilyabareeva/quanda.

A Close Look at Decomposition-based XAI-Methods for Transformer Language Models

Arras, Leila; Puri, Bruno; Kahardipraja, Patrick; Lapuschkin, Sebastian; Samek, Wojciech
Publication date: 21/02/2025 - DOI: 10.48550/arXiv.2502.15886

Various XAI attribution methods have been recently proposed for the transformer architecture, allowing for insights into the decision-making process of large language models by assigning importance scores to input tokens and intermediate representations. One class of methods that seems very promising in this direction includes decomposition-based approaches, i.e., XAI-methods that redistribute the model's prediction logit through the network, as this value is directly related to the prediction. In the previous literature we note though that two prominent methods of this category, namely ALTI-Logit and LRP, have not yet been analyzed in juxtaposition and hence we propose to close this gap by conducting a careful quantitative evaluation w.r.t. ground truth annotations on a subject-verb agreement task, as well as various qualitative inspections, using BERT, GPT-2 and LLaMA-3 as a testbed. Along the way we compare and extend the ALTI-Logit and LRP methods, including the recently proposed AttnLRP variant, from an algorithmic and implementation perspective. We further incorporate in our benchmark two widely-used gradient-based attribution techniques. Finally, we make our carefully constructed benchmark dataset for evaluating attributions on language models, as well as our code, publicly available in order to foster evaluation of XAI-methods on a well-defined common ground.

RoboFireFuseNet: Robust Fusion of Visible and Infrared Wildfire Imaging for Real-Time Flame and Smoke Segmentation

Publication date: 09/01/2026 - DOI: 10.5281/zenodo.18199667

Concurrent image segmentation of flames and smoke is challenging, as smoke frequently obscures fire in standard RGB imagery, necessitating the use of other spectral bands such as Infrared (IR). Existing multimodal models are either too computationally demanding for real-time deployment or too lightweight to capture fine-grained sparse fire patterns that may escalate into large wildfires. Moreover, they are typically trained and validated on simplistic datasets, such as Corsican or FLAME1, which lack the dense smoke occlusion present in real-world scenarios. We introduce RoboFireFuseNet (RFFNet), a real-time deep neural network that fuses RGB and IR data using attention mechanisms, a detail-preserving decoder, and class-balance training techniques. RFFNet establishes a benchmark on a challenging, realistic wildfire dataset with dense smoke, creating a foundation for practical comparison in future wildfire segmentation research. Despite its lightweight design, it also achieves state-of-the-art results on a general urban benchmark, demonstrating versatility. Its combination of accuracy, real-time performance, and multimodal fusion makes RFFNet well-suited for proactive, robust and accurate wildfire monitoring. Code is available at: https://gitfront.io/r/dfotiou/eiTd3o9UURjn/RoboFireFuseNet-private

Spatio-temporal invariant descriptors for skeleton-based human action recognition

Kamel, Aouaidjia; Zhang, Chongsheng; Pitas, Ioannis
Publication date: 09/01/2026 - DOI: 10.1016/j.ins.2024.121832

Skeleton-based human action recognition is crucial for many practical applications. However, existing methods often rely on a single skeleton sequence representation, which may not fully capture the complex features of actions. To tackle this issue, we propose IMDAR (Invariant Multi-Descriptors for Action Recognition), a framework that uses multiple spatio-temporal invariant representations to improve action feature learning. These representations capture the evolution of skeleton poses, considering the motion of joints and limbs. We transform each skeleton in the sequence into a graph representation, and the sequence of graph features is structured into a spatio-temporal matrix. To capture the motion dynamics, we design three spatio-temporal distance matrices that represent the variation in inter-joint distances, inter-frame joint distances, and inter-limb angles across the sequence. The matrices are then transformed into image descriptors, which are used for training action prediction models. A Voting and Priority Score Selection (VPSS) algorithm is proposed to determine the correct class from multiple descriptor predictions. Experiments on benchmark datasets demonstrate the invariance capability of IMDAR, and show 2.4%, 1.3%, 1.8% and 2.8% improvement in accuracy on NTU-RGB+D 60, NTU-RGB+D 120, N-UCLA and UTD-MHAD datasets, respectively. Code and models are made available on the GitHub repository.
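To make the idea of invariant distance matrices concrete, the following sketch computes two of the quantities named in this abstract, inter-joint distances and inter-frame joint distances, for a skeleton sequence stored as a `(frames, joints, 3)` array. The array layout, the function names, and the random test sequence are assumptions made for illustration; the abstract does not specify how IMDAR builds its matrices, so this is not the authors' implementation.

```python
import numpy as np

def inter_joint_distances(sequence):
    """Per-frame Euclidean distances between every pair of joints.

    sequence: array of shape (T, J, 3) with T frames and J joints.
    Returns an array of shape (T, J*(J-1)/2): one row per frame.
    """
    seq = np.asarray(sequence, dtype=float)
    n_joints = seq.shape[1]
    i, j = np.triu_indices(n_joints, k=1)  # index pairs with i < j
    return np.linalg.norm(seq[:, i, :] - seq[:, j, :], axis=-1)

def inter_frame_joint_distances(sequence):
    """Distance each joint travels between consecutive frames, shape (T-1, J)."""
    seq = np.asarray(sequence, dtype=float)
    return np.linalg.norm(seq[1:] - seq[:-1], axis=-1)

# Hypothetical sequence: 5 frames, 4 joints, random 3D coordinates.
rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 4, 3))

# Apply the same rotation (about the z-axis) and translation to every frame.
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0,            0.0,           1.0]])
moved = seq @ rot.T + np.array([1.0, -2.0, 3.0])

# Distances are unchanged by the rigid transformation.
same_joint = np.allclose(inter_joint_distances(seq), inter_joint_distances(moved))
same_frame = np.allclose(inter_frame_joint_distances(seq), inter_frame_joint_distances(moved))
```

Because both quantities depend only on differences between points, they do not change when the whole sequence is rotated or translated, which is one simple sense in which such descriptors can be called invariant.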

Generative Representation Learning in Recurrent Neural Networks for Causal Timeseries Forecasting

Publication date: 09/01/2026 - DOI: 10.1109/TAI.2024.3446465

Feed-forward Deep Neural Networks (DNNs) are the state-of-the-art in timeseries forecasting. A particularly significant scenario is the causal one: when an arbitrary subset of variables of a given multivariate timeseries is specified as forecasting target, with the remaining ones (exogenous variables) causing the target at each time instance. Then, the goal is to predict a temporal window of future target values, given a window of historical exogenous values. To this end, this paper proposes a novel deep recurrent neural architecture, called Generative-Regressing Recurrent Neural Network (GRRNN), which surpasses competing ones in causal forecasting evaluation metrics, by smartly combining generative learning and regression. During training, the generative module learns to synthesize historical target timeseries from historical exogenous inputs via conditional adversarial learning, thus internally encoding the input timeseries into semantically meaningful features. During a forward pass, these features are passed over as input to the regression module, which outputs the actual future target forecasts in a sequence-to-sequence fashion. Thus, the task of timeseries generation is synergistically combined with the task of timeseries forecasting, under an end-to-end multitask training setting. Methodologically, GRRNN contributes a novel augmentation of pure supervised learning, tailored to causal timeseries forecasting, which essentially forces the generative module to transform the historical exogenous timeseries to a more appropriate representation, before feeding it as input to the actual forecasting regressor. Extensive experimental evaluation on relevant public datasets obtained from disparate fields, ranging from air pollution data to sentiment analysis of social media posts, confirms that GRRNN achieves top performance in multistep long-term forecasting.
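The causal forecasting setup described in this abstract (a window of historical exogenous values as input, a window of future target values as output) can be sketched as a data-preparation step. The function name `make_causal_windows`, the parameter names `history` and `horizon`, and the toy series are hypothetical; the sketch only shows the input/output pairing and says nothing about the GRRNN architecture itself.

```python
import numpy as np

def make_causal_windows(exog, target, history, horizon):
    """Pair windows of past exogenous values with windows of future targets.

    exog:   array of shape (T, D), D exogenous variables over T time steps.
    target: array of shape (T,), the forecasting target.
    Returns X of shape (N, history, D) and Y of shape (N, horizon), where
    Y[n] holds the target values immediately following the window X[n].
    """
    exog = np.asarray(exog, dtype=float)
    target = np.asarray(target, dtype=float)
    n_samples = len(target) - history - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested window sizes")
    X = np.stack([exog[t:t + history] for t in range(n_samples)])
    Y = np.stack([target[t + history:t + history + horizon]
                  for t in range(n_samples)])
    return X, Y

# Toy series: 10 time steps, 2 exogenous variables (made-up values).
exog = np.arange(20, dtype=float).reshape(10, 2)
target = np.arange(10, dtype=float) * 10.0
X, Y = make_causal_windows(exog, target, history=4, horizon=2)
```

Each row of `X` contains only exogenous history, so any model trained on these pairs must forecast the target without seeing its past values, matching the causal scenario the abstract defines.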

FCL-ViT: Task-Aware Attention Tuning for Continual Learning

Publication date: 09/01/2026 - DOI: 10.1016/j.patrec.2025.08.003

Continual Learning (CL) involves adapting the prior Deep Neural Network (DNN) knowledge to new tasks, without forgetting the old ones. However, modern CL techniques focus on provisioning memory capabilities to existing DNN models rather than designing new ones that are able to adapt according to the task at hand. This paper presents the novel Feedback Continual Learning Vision Transformer (FCL-ViT) that uses a feedback mechanism to generate real-time dynamic attention features tailored to the current task. The FCL-ViT operates in two phases. In phase 1, the generic image features are produced and determine where the Transformer should attend on the current image. In phase 2, task-specific image features are generated that leverage dynamic attention. To this end, Tunable self-Attention Blocks (TABs) and Task Specific Blocks (TSBs) are introduced that operate in both phases and are responsible for tuning the TABs' attention, respectively. The FCL-ViT surpasses state-of-the-art performance on Continual Learning compared to benchmark methods, while retaining a small number of trainable DNN parameters.

UnrealFire: A Synthetic Dataset Creation Pipeline for Annotated Fire Imagery in Unreal Engine

Publication date: 09/01/2026 - DOI: 10.5281/zenodo.18198757

High-quality training data are essential for Deep Neural Network (DNN) training. In Natural Disaster Management (NDM) scenarios, annotated training data are needed to train DNN models, e.g., for wildfire detection/segmentation. However, image annotation in such scenarios is prone to annotation errors, mostly due to the unpredictable visual structure of the fire/smoke. To this end, photorealistic simulators hold substantial promise, since they allow the creation of synthetic wildfire images. Yet, existing assets depicting fires in simulator engines are typically inserted as particle objects. As a result, existing assets do not feature a set 3D mesh, causing them to have no 2D projection, i.e., it is not trivial how to generate fire segmentation annotation maps. This paper presents a free, open-access pipeline for creating diverse synthetic annotated wildfire image datasets. More specifically, we developed a novel particle segmentation camera for the AirSim plugin, which enables the generation of segmentation maps of objects made of particles. We also integrate Procedural Content Generation (PCG) tools to gather unlimited amounts of diverse, high-quality annotated training data. To evaluate our framework, we generated a sample fire dataset called AUTH-Unreal-Wildfire (AUW) for wildfire segmentation. In our experiments we use a state-of-the-art segmentation DNN, namely PIDNet, and compare our synthetic wildfire images to different real image datasets, along with their potential to augment real wildfire datasets.

Ensuring Medical AI Safety: Interpretability-Driven Detection and Mitigation of Spurious Model Behavior and Associated Data

Publication date: 29/07/2025 - DOI: 10.1007/s10994-025-06834-w

Deep neural networks are increasingly employed in high-stakes medical applications, despite their tendency for shortcut learning in the presence of spurious correlations, which can have potentially fatal consequences in practice. Whereas a multitude of works address either the detection or mitigation of such shortcut behavior in isolation, the Reveal2Revise approach provides a comprehensive bias mitigation framework combining these steps. However, effectively addressing these biases often requires substantial labeling efforts from domain experts. In this work, we review the steps of the Reveal2Revise framework and enhance it with semi-automated interpretability-based bias annotation capabilities. This includes methods for the sample- and feature-level bias annotation, providing valuable information for bias mitigation methods to unlearn the undesired shortcut behavior. We show the applicability of the framework using four medical datasets across two modalities, featuring controlled and real-world spurious correlations caused by data artifacts. We successfully identify and mitigate these biases in VGG16, ResNet50, and contemporary Vision Transformer models, ultimately increasing their robustness and applicability for real-world medical tasks. Our code is available at https://github.com/frederikpahde/medical-ai-safety.