Datasets and Publications

Project publications are archived in a Zenodo community. See the project's community page for details.

There are 44 publications

Improving Multilabel Text Emotion Detection with Emotion Interrelation Anchors

Giannouris, Polydoros; Mygdalis, Vasileios; Pitas, Ioannis

Publication date: 26/06/2025 - DOI: 10.1016/j.nlp.2025.100170

Emotion detection studies the problem of automatic identification of emotions expressed in text. Since multiple emotions may co-occur in a single text excerpt, state-of-the-art approaches often cast this multi-label classification task to multiple, independent binary classification tasks, each specialized for one emotion class. The main disadvantage of such approaches is that, by design, each binary classifier overlooks typical emotion interrelationships, such as co-occurrence (e.g., anger and fear) or mutual exclusiveness (e.g., sadness and joy). This paper proposes a simple and lightweight approach to re-introduce emotion interrelations into each binary classification task, where each binary classifier is able to understand the presence of other emotions, without directly inferring them. This is achieved by incorporating the proposed emotion anchors (i.e. features of representative emotional phrases) into the model of each binary classifier. More specifically, the model is trained to incorporate other emotions in its representation by learning the parameters of an attention mechanism. Based on experiments on multiple datasets, our approach improves emotion classification performance in both supervised and few-shot domain adaptation settings, outperforming standard binary models in terms of accuracy and macro averaged F1-scores. The approach is generic and can be applied to other interrelated multi-label binary classification tasks.

Exploring the Role of Engagement in Learning within a Rescue Department Community of Practice

Sever, Filip

Publication date: 26/06/2025 - DOI: 10.53615/2232-5697.14.207-219

Purpose: The purpose of this study is to investigate the role of engagement in building a community of practice (CoP) within a rescue department and its influence on workplace learning, knowledge exchange, and professional growth.

Study design/methodology/approach: This qualitative study employed focus group interviews with firefighters and fire officers, preceded by an expert interview, to explore the context of work and learning within a rescue department.

Findings: Findings reveal that engagement in a rescue department CoP is fostered by factors such as peer support, facilitation, intrinsic motivation, and flexible participation. These elements, alongside supportive organizational structures and adaptive leadership practices, are crucial for building and sustaining the CoP and influencing workplace learning, knowledge exchange, and professional growth.

Originality/value: This paper provides new insights into CoP dynamics in the emergency services, highlighting the importance of inclusive practices, adaptive leadership, and digital facilitation to foster engagement.

A Weighting Loss Approach for Transformer-Based Object Detection

Tzimas, Matthaios D.; Mygdalis, Vasileios; Pitas, Ioannis

Publication date: 30/05/2025 - DOI: 10.5281/zenodo.15552851

This paper introduces a training loss function tailored for object detection in transformer-based architectures. Our approach addresses the imbalance in ground-truth bounding box sizes during training by implementing a coordinate-based error-weighting mechanism for the $L_1$ loss. This modification stabilizes optimization and enhances detection performance, particularly in detection problems requiring bounding boxes of varying sizes within the same image, such as fire/smoke detection applications. By integrating this method into the Real-Time Detection Transformer (RT-DETR), we conduct extensive experiments across three fire/smoke detection datasets and compare our findings against leading real-time object detection algorithms, such as YOLO models. To further validate the generalizability of the proposed loss function, we incorporate it into various DETR-based architectures. Our experiments demonstrate the superior fire detection accuracy of RT-DETR trained with our method across all three datasets while ensuring its effectiveness on more complex datasets. This study not only enhances the capabilities of transformer-based architectures for real-time detection tasks but also contributes to the development of more efficient and reliable fire detection systems.

DIVIDE-AND-SUMMARIZE: ENHANCING DEEP NEURAL VIDEO SUMMARIZATION

Charalampakis, Evangelos; Papaioannidis, Christos; Pitas, Ioannis

Publication date: 22/05/2025 - DOI: 10.5281/zenodo.15487775

Sequence-based neural architectures, such as Long Short-Term Memory (LSTM) networks and Transformers, have driven advances in supervised video summarization by modeling inter-frame dependencies. However, existing methods assume that long-range dependencies are essential for summary generation, which may lead to unnecessary computational overhead. To address this, we propose a Field of View (FOV) adjustment strategy, Divide-and-Summarize (DIV-SUM). By partitioning input videos into smaller fragments of predefined size, our approach explicitly models short-range interframe relationships, enabling a fully parallelizable end-to-end video summarization pipeline. Furthermore, most prior work formulates neural video summarization as a frame-wise score regression task. We introduce a simple yet effective target space quantization module, which discretizes the regression targets into classes, introducing a tolerance margin that improves performance. Our approach offers two key benefits: (1) we achieve state-of-the-art performance on the SumMe benchmark while remaining competitive on TVSum, and (2) we significantly reduce the computational cost of inference, improving efficiency without sacrificing quality.
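
As a rough illustration of the two ideas named in this abstract (fixed-size fragment partitioning and target space quantization), the following minimal Python sketch splits a frame sequence into fragments and discretizes frame-wise importance scores into classes. It is not the authors' implementation: the function names, the assumption that scores lie in [0, 1], the uniform bin layout, and the default of 5 classes are all hypothetical choices made for this example.

```python
def split_into_fragments(frames, fragment_size):
    """Partition a frame sequence into consecutive fragments of a fixed size.

    The last fragment may be shorter. Each fragment could then be processed
    independently (and hence in parallel) by a summarization model.
    """
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    return [frames[i:i + fragment_size] for i in range(0, len(frames), fragment_size)]


def quantize_scores(scores, num_classes=5):
    """Map importance scores in [0, 1] to class labels 0..num_classes-1.

    Uniform bins are an assumption of this sketch; all scores falling in the
    same bin share one label, which gives the tolerance margin.
    """
    labels = []
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores are assumed to lie in [0, 1]")
        # A score of exactly 1.0 is clamped into the top bin.
        labels.append(min(int(s * num_classes), num_classes - 1))
    return labels
```

With these definitions, the regression target for each frame is replaced by a class label, so a classifier can be trained per fragment instead of a frame-wise regressor.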

Neural Architecture Search and Knowledge Distillation for Semantic Image Segmentation on Big Wildfire Datasets

Vlachos, Evgenios; Papaioannidis, Christos; Pitas, Ioannis

Publication date: 21/05/2025 - DOI: 10.5281/zenodo.15479971

The increasing complexity of Deep Neural Network (DNN) models poses computational challenges for both DNN model development and their real-world deployment, particularly in the case of large training and test dataset scenarios. This is the case of forest fires, where huge UAV and synthetic image data have to be analyzed in real-time for efficient wildfire management. In this paper, we propose a novel combination of Neural Architecture Search (NAS) with Knowledge Distillation for burnt area image segmentation in the aftermath of a wildfire, by exploring a vast search space of DNN architectures and transferring learned DNN knowledge. We conducted our experiments on the BLAZE dataset depicting wildfires in Greece to evaluate the effectiveness of our approach on five different image segmentation DNN architectures. Our experiments demonstrated that for the best performing architecture, we have found a combination that can provide a 62.3% reduction of total trainable DNN parameters, alongside an increase of 1.02% in semantic image segmentation performance in terms of the mIoU metric.

IMPROVE REAL-TIME FLOOD SEGMENTATION BY ENCODING AND DISTILLING FOREGROUND INFORMATION

Mentesidis, Pantelis; Mygdalis, Vasileios; Pitas, Ioannis

Publication date: 21/05/2025 - DOI: 10.5281/zenodo.15479831

Flood segmentation systems play a crucial role in natural disaster management, particularly for real-time flood monitoring, thus real-time lightweight deep neural network (DNN) models constitute the state-of-the-art (SOTA) solution. A neglected aspect during the design of such solutions is that flood segmentation is a computer vision problem where the variance of visual appearance between the foreground (flood) and the background is imbalanced. This paper tackles this imbalance using Knowledge Distillation (KD), enhancing the capabilities of real-time SOTA DNN models for flood segmentation in complex and challenging environments. The proposed method employs a Self-KD approach, where a Teacher model, trained on augmented inputs with reduced background variance by exploiting traditional image processing techniques (e.g., blurring), guides a Student model operating on real-world data. By consistently processing augmented inputs, the Teacher model facilitates the Student’s ability to learn robust representations, effectively suppressing noisy background elements. Experimental results on a flood dataset demonstrate improvement of up to 2.5% in mean Intersection over Union (mIoU) over baseline SOTA models which scored 85% mIoU, highlighting the effectiveness of the proposed method. Furthermore, our approach is model-agnostic, consistently improving the performance of various SOTA DNN architectures across different models.

Promoting learning in the rescue department: A community of practice perspective

Sever, Filip

Publication date: 09/05/2025 - DOI: 10.3384/rela.2000-7426.5297

The modern digitalised workplace requires continuous learning to maintain the skills and knowledge required for civil protection work. The purpose of this study is to identify the primary factors that enable learning in the rescue department. Data was collected in semi-structured interviews with firefighters and fire officers from a Finnish rescue department. The results of the study show that peer support and learning preference are valued across organisational ranks. Technology has a crucial role as it disrupts workflows and necessitates new work requirements, while serving as a tool in social interactions, learning and knowledge management. The findings contribute to research on workplace learning through the development of communities of practice for civil protection workers, emphasising the need for collaboration and adaptive strategies for learning in the workplace.

Real-Time Flood Water Segmentation with Deep Neural Networks

Gerontopoulos, Anastasios; Papaioannou, Dimitrios; Papaioannidis, Christos; Pitas, Ioannis

Publication date: 28/04/2025 - DOI: 10.5281/zenodo.15296811

Nowadays, extreme floods pose severe threats to human lives and infrastructure, necessitating effective flood disaster management plans and systems. In this paper, we address the crucial need for real-time flood monitoring by leveraging computer vision models. To this end, benchmark deep neural networks are trained on the flood water segmentation task, using a novel dataset that was created by combining and annotating flood images from different sources. Our experimental evaluation on two real-world flood videos showcases the potential of computer vision in fast and accurate flood monitoring. Moreover, we investigated the use of semi-supervised training methods to enhance the flood segmentation performance by taking advantage of large unlabeled datasets. Our work emphasizes the potential of applying state-of-the-art big visual data analytics tools to mitigate the devastating impacts of floods on communities worldwide.

Extreme Weakly Supervised Binary Semantic Image Segmentation Via One-Pixel Supervision

Tzimas, Matthaios Dimitrios; Mygdalis, Vasileios; Papaioannidis, Christos; Pitas, Ioannis

Publication date: 15/04/2025 - DOI: 10.2139/ssrn.5217487

Despite recent advancements, Unsupervised Semantic Segmentation (USS) methods still exhibit a significant performance deficit compared to supervised approaches, particularly in binary semantic segmentation. This limitation arises because, without supervision, USS methods struggle to distinguish foreground from background image regions, particularly when the foreground contains small or uncommon objects. This issue is addressed by our proposed Extremely Weakly Supervised Binary Semantic Segmentation (EWS) framework. EWS expects minimal supervision, consisting only of a small set of one-pixel annotations explicitly belonging to the foreground class across the entire image dataset. Our approach leverages these one-pixel annotations and employs two contrastive losses to map visual transformer features into well-separated foreground and background feature clusters. Additionally, we propose a novel loss function to eliminate the need for hyperparameter tuning of the contrastive loss threshold, by dynamically computing it based on the similarity between the input image features. Even if we employ a single one-pixel annotation, EWS achieves competitive results in binary segmentation tasks while maintaining low computational costs, making it an efficient solution for critical segmentation applications.

A Decentralized Sharding BFT Consensus Approach, for Efficient Decentralized DNN Inference Classification

Papaioannou, Dimitrios; Mygdalis, Vasileios; Pitas, Ioannis

Publication date: 25/04/2025 - DOI: 10.5281/zenodo.15281260

The security and trustworthiness of participating DNN nodes are often overlooked during the design of modern Decentralized Deep Neural Networks (D-DNN). This paper introduces a shard-based distributed consensus protocol specifically tailored for DNN nodes operating over unreliable communication links. The proposed approach enhances D-DNN scalability by enabling D-DNN systems with a large number of DNN nodes. This is achieved through a hierarchical consensus mechanism that partitions the D-DNN network into sub-networks (shards), leveraging Out-of-Distribution (OOD) detectors to localize and isolate the consensus process within each shard. Rather than randomly allocating DNN nodes into shards, the OOD detector can be employed to identify and group nodes with similar domain knowledge. This approach improves the overall D-DNN system robustness by identifying and isolating malicious DNN nodes or ones that have poor performance for a specific DNN task. Experimental results demonstrate improvements in the D-DNN system’s classification accuracy and reliability.

Trustworthy Majority Voting for Labeling and Analyzing Multi-Annotator Text Sentiment Datasets

Avgoustidis, Fotios; Bassia, Paraskevi; Pitas, Ioannis

Publication date: 23/04/2025 - DOI: 10.5281/zenodo.15267847

A typical way to label datasets for Deep Neural Network (DNN) training and testing is through crowdsourcing. However, there is no assurance that crowd workers will adhere to the data labeling criteria, refrain from introducing personal bias, or refrain from spamming random labels. In order to address this issue, we propose a graph-based technique to assess annotator trustworthiness and adjust their involvement in the labeling process. Our proposed method not only improves data label accuracy, by considering the agreement between annotators and ranking them based on their labeling trustworthiness, but also aims to enhance DNN inference performance by providing more accurate training data labels. We examine the constraints of conventional multi-annotation label aggregation techniques and compare them to our approach. Lastly, we demonstrate that our proposed method remains robust to artificially injected noisy annotations, surpassing the performance of previous state-of-the-art work. The effectiveness of the proposed method is validated on an intrinsically subjective task, namely text sentiment analysis.
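
To make the general idea of trust-adjusted label aggregation concrete, here is a minimal Python sketch. It is not the graph-based technique of the paper: as a simplifying assumption, each annotator's trust is estimated as the fraction of items on which they agree with the plain majority vote, and that trust then weights their votes. All function names and the tie-breaking rule (alphabetically smallest label) are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(item):
    """Plain majority vote over one item's {annotator: label} mapping.

    Ties are broken by picking the alphabetically smallest label.
    """
    counts = Counter(item.values())
    best = max(counts.values())
    return min(label for label, count in counts.items() if count == best)


def annotator_trust(annotations):
    """Estimate trust as each annotator's agreement rate with the majority."""
    agree = defaultdict(int)
    total = defaultdict(int)
    for item in annotations:
        consensus = majority_vote(item)
        for annotator, label in item.items():
            total[annotator] += 1
            agree[annotator] += int(label == consensus)
    return {a: agree[a] / total[a] for a in total}


def weighted_vote(item, trust):
    """Aggregate one item's labels, weighting each vote by annotator trust."""
    scores = defaultdict(float)
    for annotator, label in item.items():
        scores[label] += trust.get(annotator, 0.0)
    best = max(scores.values())
    return min(label for label, score in scores.items() if score == best)


annotations = [
    {"a": "pos", "b": "pos", "c": "neg"},
    {"a": "neg", "b": "neg", "c": "pos"},
    {"a": "pos", "b": "pos", "c": "pos"},
]
trust = annotator_trust(annotations)
# Annotators "a" and "c" disagree on a new item; the vote of the more
# trusted annotator "a" decides the aggregated label.
resolved = weighted_vote({"a": "neg", "c": "pos"}, trust)
```

In this toy data a plain majority vote on the last item would be tied, whereas the trust weights resolve it in favour of the annotator who has agreed with the consensus more often.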

Comparison of Visual Place Recognition Methods for UAV Imagery

Siavrakas, Michael; Vlachos, Eugene; Pitas, Ioannis

Publication date: 14/04/2025 - DOI: 10.5281/zenodo.15211727

In many real-world applications (natural disaster management, urban development, infrastructure inspection), Unmanned Aerial Vehicles (UAVs) perform flights at different times for scene image acquisition. Visual Place Recognition (VPR) methods can match newly acquired images with older ones, when the new and/or the old ones are not georeferenced. Most VPR solutions are based on image retrieval, where a query image scene is visually compared with that of many related images in a database, and the most relevant ones are retrieved. Deep learning-based VPR performance relies heavily on image dataset acquisition conditions, e.g. structured/unstructured scene visualization, single/multi-view image acquisition, illumination variations, or on-road/aerial view. Most VPR methods are trained and tested on on-road views. This paper addresses the issue of image retrieval performance when large image databases are employed. To this end, we perform a comparison of some state-of-the-art VPR methods on UAV image datasets, where the number of database images is scaled, examine how well they generalize, and expand on some dataset creation gaps for this task.

FOREST FIRE IMAGE CLASSIFICATION THROUGH DECENTRALIZED DNN INFERENCE

Papaioannou, Dimitrios; Mygdalis, Vasileios; Pitas, Ioannis

Publication date: 10/07/2025 - DOI: 10.1109/ICIPCW64161.2024.10769107

In the realm of Natural Disaster Management (NDM), timely communication with local authorities is paramount for an effective response. To achieve this, multi-agent systems play a pivotal role by proficiently identifying and categorizing various disasters. In the field of Distributed Deep Neural Network (D-DNN) inference, such approaches often require DNN nodes to transmit their results to the cloud for inference, or they necessitate the establishment of a fixed topology network to enable inference directly on the edge, a practice prone to security risks. In this work, we propose a decentralized inference strategy tailored for fire classification tasks. In this approach, individual DNN nodes communicate within a network and enhance their predictions by considering other DNN node inference outputs that contribute to improving their individual performance. The overall coordination of the system on a specific decision is achieved through a consensus protocol, which acts as a universally accepted inference rule adopted by all DNN nodes operating within the system. We present a comprehensive experimental analysis of the forest fire classification task, focusing on enhancing both individual DNN node performance and the stability of the consensus protocol.

Privacy-Shielding Autonomous Systems For Natural Disaster Management (NDM): Targeted Regulation Of The Use Of Autonomous Systems For Natural Disaster Management Goals Before The Materialization Of The Privacy Harm

Bouchagiar, Georgios; Mygdalis, Vasileios; Pitas, Ioannis

Publication date: 10/07/2025 - DOI: 10.54648/euro2023020

This contribution aims to recommend a fully-fledged privacy-assessment applicable to future uses of Autonomous Systems (AS) for Natural Disaster Management (NDM) purposes. It claims that certain implementations may interfere with the right to privacy and the protection of personal data and analyses challenges stemming from (non-) compliance with the General Data Protection Regulation (GDPR). Moreover, it subjects the use of autonomous systems to the European Court of Human Rights’ (ECtHR) Legality – Legitimacy – Necessity testing (LLN-check). On this basis, it proposes a targeted and ex ante privacy-assessment to address legal uncertainty, resulting from the GDPR’s tech-neutrality and case law’s ex post (after the harm) adjudication. The recommended scheme, ideally involving experts from various disciplines who would moreover be independent, could apply before the actual use of any AS and give a ‘proceed’, a ‘proceed with conditions’ or a ‘do not proceed’ decision.

Methodology and elaboration of model for Map of Wildfire Risk

Casule, Fabio; Secci, Romina; Usai, Antonio; Merella, Mauro

Publication date: 10/07/2025 - DOI: 10.1145/3632366.3632370

For civil protection purposes, risk is the probability of a calamitous event occurring that may cause harmful effects on the population, residential and productive settlements and infrastructure, within a particular area, in a given period of time. The work aimed to establish the municipal fire danger and fire risk indices (IR), which define, respectively, the degree of fire danger and of fire risk calculated on a regional basis and referred to the individual municipal territory, exploiting the typical functions of GIS tools and recent advances in the application of artificial intelligence. The longer-term goal is to transform the algorithms into automated processes that can be used in platforms capable of returning outputs to end users.

EDGEmergency: A Cloud-Edge Platform to Enable Pervasive Computing for Disaster Management

Colosi, Mario; Garofalo, Marco; Carnevale, Lorenzo; Marino, Roberto; Fazio, Maria; Villari, Massimo

Publication date: 12/06/2024 - DOI: 10.1145/3632366.3632372

EDGEmergency is a platform designed for disaster management that can dynamically leverage the edge infrastructure potentially already present within the emergency perimeter. Edge devices, from IoT to smartphones, possess an increasingly significant computational capacity that can be exploited, by changing their behavior in real-time and creating a pervasive local environment, capable of adapting perfectly to the specific context of reference. EDGEmergency, in fact, allows the creation of a unified computation environment leveraging the Cloud-Edge-Client Continuum concept, through which a computation cluster with zero configurations is created on-the-fly. The platform thus allows the deployment of distributed microservices on existing edge devices, installed by default for other purposes, through a modular and incremental logic that has the role of adapting best to the needs of the individual emergency, through advanced tools for analysis and monitoring, using artificial intelligence.

Supporting the Natural Disaster Management Distributing Federated Intelligence over the Cloud-Edge Continuum: the TEMA Architecture

Carnevale, Lorenzo; Filograna, Antonio; Arigliano, Francesco; Marino, Roberto; Ruggeri, Armando; Fazio, Maria

Publication date: 12/06/2024 - DOI: 10.1145/3632366.3632371

Natural disasters are more and more often present in our daily life. Many are the cases where these events affect people and economies. In this context, there is the need for a technological intervention in support of first responders, with solutions capable of making decisions on the disaster areas. Indeed, considering these scenarios are time-sensitive, the intention is to move the computation units closer to those areas. In this paper, we propose a computing continuum architecture for offloading distributed intelligences over cloud, edge and deep edge layers. Exploiting the federated learning paradigm, it enables mobile and stationary devices to independently train local models, contributing to the creation of the global common model.

Data Operational Driven AI-based Architecture for Natural Disaster Management

Sebbio, Serena; Carnevale, Lorenzo; Balouek-Thomert, Daniel; Galletta, Antonino; Parashar, Manish; Villari, Massimo

Publication date: 12/06/2024 - DOI: 10.5281/zenodo.11608662

Natural disasters pose increasing threats to communities and economies worldwide, emphasizing the urgency for technological interventions to support first responders and decision-makers in affected areas. To address this need, we introduce a novel computing continuum architecture designed for efficient offloading of distributed intelligences across cloud, edge, and deep edge tiers. Our approach leverages an AI cross-layer framework, integrating service, network, and infrastructure management, to optimize decision-making processes in time-sensitive disaster scenarios. By employing federated learning techniques, our architecture enables both mobile and stationary devices to autonomously train local models, contributing to the development of a comprehensive global common model. Through this collaborative approach, we aim to enhance the capabilities of disaster management systems, facilitating more effective responses to critical events.
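
The step in which locally trained models contribute to a global common model can be sketched as follows. The abstract does not state which aggregation rule the architecture uses, so this example assumes a FedAvg-style weighted average, in which each device's parameters are weighted by its local sample count. The function name and the flat-list parameter representation are hypothetical simplifications.

```python
def federated_average(client_weights, client_sizes):
    """Combine per-client parameter vectors into one global vector.

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes:   number of local training samples per client; clients
                    with more data contribute proportionally more.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    num_params = len(client_weights[0])
    global_weights = [0.0] * num_params
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != num_params:
            raise ValueError("all clients must share the same model shape")
        for i, value in enumerate(weights):
            global_weights[i] += value * size / total
    return global_weights


# Two devices: the second holds three times as much local data.
global_model = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only the model parameters and sample counts are exchanged in this scheme; the raw data stays on each device.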

Federated Learning on Raspberry Pi 4: A Comprehensive Power Consumption Analysis

Sebbio, Serena; Morabito, Gabriele; Catalfamo, Alessio; Carnevale, Lorenzo; Fazio, Maria

Publication date: 12/06/2024 - DOI: 10.1145/3603166.3632545

Edge Computing, a rapidly evolving sector within information technology, redefines data processing and analysis by shifting it closer to the data source, away from centralized cloud servers. This paradigm promises substantial benefits for diverse applications. In the realm of Artificial Intelligence and Machine Learning, Federated Learning emerges as a pioneering technique that harnesses Edge Computing for statistical model training. Federated Learning presents numerous advantages over traditional centralized Machine Learning, including reduced latency, heightened privacy, and real-time data processing. Nonetheless, it introduces concerns regarding energy consumption, particularly for battery-powered Edge devices designed for remote or harsh environments. This study provides a comprehensive assessment of power consumption within the context of Federated Learning operations. To achieve this, a Raspberry Pi 4 and an INA 219 current sensor are employed. Results show that, during communication operations, the power consumption of the target device increases from a minimum of 8% to a maximum of 32% with respect to its idle state. During the local training operations it increases respectively by up to 32% for a CNN model and by up to 40% for an RNN model.

Make Federated Learning a Standard in Robotics by Using ROS2

Marino, Roberto; Carnevale, Lorenzo; Fazio, Maria; Villari, Massimo

Publication date: 12/06/2024 - DOI: 10.1145/3632366.3632373

The use of the Federated Learning paradigm could be disruptive in robotics, where data are naturally distributed among teams of agents and centralizing them would increase latency and break privacy. Unfortunately, there is a lack of robot-oriented frameworks for federated learning that use state-of-the-art machine learning libraries. ROS2 (Robot Operating System) is a de facto standard in robotics for building up teams of robots in a multi-node, fully distributed manner. In this paper we present the integration of ROS2 with PyTorch, allowing an easy training of a global machine learning model starting from a set of local datasets. We present the architecture and the methodology used, and finally we discuss the experimental results on a well-known public dataset.

When Robotics Meets Distributed Learning: the Federated Learning Robotic Network Framework

Marino, Roberto; Carnevale, Lorenzo; Villari, Massimo

Publication date: 12/06/2024 - DOI: 10.1109/ISCC58397.2023.10218022

Federated Learning (FL) is a cutting-edge technology for distributed solving of large-scale problems using local data exclusively. The potential of Federated Learning is nowadays clear in different contexts, from automatic analysis of healthcare data to object recognition in video sources coming from public video streams, from distributed search for data breaches and financial fraud to collaborative learning of hand typing on mobile phones. Multi-robot systems can also largely benefit from FL for problems like trajectory prediction, non-colliding trajectory generation, distributed localization and mapping, or distributed reinforcement learning. In this paper we propose a multi-robot framework that includes distributed learning capabilities by using Decentralized Stochastic Gradient Descent on graphs. First, we motivate the position of the paper by discussing the privacy-preserving problem for multi-robot systems and the need for decentralized learning. Then we build our methodology starting from a set of prior definitions. Finally, we discuss in detail the possible applications in the robotics field.

FedROS: The ROS Framework for Federated Learning on Mobile Edge Devices

Carnevale, Lorenzo; Gambito, Mark Adrian; Marino, Roberto; Saraniti, Davide; Catalfamo, Alessio; Villari, Massimo

Publication date: 12/06/2024 - DOI: 10.5281/zenodo.11608934

Federated Learning is a computing paradigm that shifts the concept of learning from a single to a distributed system. Many applications have been considered in the literature, for example mobile edge computing. In this context, robotics is an emerging trend, which takes advantage in terms of infrastructure optimization, such as resource allocation and communication efficiency, as well as in business solutions. In this poster, we propose a novel framework for submitting FL jobs on ROS-based devices. The framework, called FedROS, composes the containers of FL clients and server ROS2 packages programmatically.

Layer-wise feedback propagation

Weber, Leander; Berend, Jim; Wiegand, Thomas; Samek, Wojciech; Lapuschkin, Sebastian

Publication date: 23/08/2023 - DOI: 10.5281/zenodo.11549616

In this paper, we present Layer-wise Feedback Propagation (LFP), a novel training approach for neural-network-like predictors that utilizes explainability, specifically Layer-wise Relevance Propagation (LRP), to assign rewards to individual connections based on their respective contributions to solving a given task. This differs from traditional gradient descent, which updates parameters towards an estimated loss minimum. LFP distributes a reward signal throughout the model without the need for gradient computations. It then strengthens structures that receive positive feedback while reducing the influence of structures that receive negative feedback. We establish the convergence of LFP theoretically and empirically, and demonstrate its effectiveness in achieving comparable performance to gradient descent on various models and datasets. Notably, LFP overcomes certain limitations associated with gradient-based methods, such as reliance on meaningful derivatives. We further investigate how the different LRP-rules can be extended to LFP, what their effects are on training, as well as potential applications, such as training models with no meaningful derivatives, e.g., step-function activated Spiking Neural Networks (SNNs), or for transfer learning, to efficiently utilize existing knowledge.

DualView: Data Attribution from the Dual Perspective

Yolcu, Galip Ümit; Wiegand, Thomas; Samek, Wojciech; Lapuschkin, Sebastian

Publication date: 19/02/2024 - DOI: 10.48550/arXiv.2402.12118

Local data attribution (or influence estimation) techniques aim at estimating the impact that individual data points seen during training have on particular predictions of an already trained Machine Learning model during test time. Previous methods either do not perform well consistently across different evaluation criteria from literature, are characterized by a high computational demand, or suffer from both. In this work we present DualView, a novel method for post-hoc data attribution based on surrogate modelling, demonstrating both high computational efficiency, as well as good evaluation results. With a focus on neural networks, we evaluate our proposed technique using suitable quantitative evaluation strategies from the literature against related principal local data attribution methods. We find that DualView requires considerably lower computational resources than other methods, while demonstrating comparable performance to competing approaches across evaluation metrics. Furthermore, our proposed method produces sparse explanations, where sparseness can be tuned via a hyperparameter. Finally, we showcase that with DualView, we can now render explanations from local data attributions compatible with established local feature attribution methods: For each prediction on (test) data points explained in terms of impactful samples from the training set, we are able to compute and visualize how the prediction on (test) sample relates to each influential training sample in terms of features recognized by the model. We provide an Open Source implementation of DualView online, together with implementations for all other local data attribution methods we compare against, as well as the metrics reported here, for full reproducibility.

Explainable AI for Time Series via Virtual Inspection Layers

Vielhaben, Johanna; Lapuschkin, Sebastian; Montavon, Grégoire; Samek, Wojciech

Publication date: 10/06/2024 - DOI: 10.48550/arXiv.2303.06365

The field of eXplainable Artificial Intelligence (XAI) has greatly advanced in recent years, but progress has mainly been made in computer vision and natural language processing. For time series, where the input is often not interpretable, only limited research on XAI is available. In this work, we put forward a virtual inspection layer that transforms the time series to an interpretable representation and allows relevance attributions to be propagated to this representation via local XAI methods like layer-wise relevance propagation (LRP). In this way, we extend the applicability of a family of XAI methods to domains (e.g. speech) where the input is only interpretable after a transformation. Here, we focus on the Fourier transformation, which is prominently applied in the interpretation of time series, and on LRP, and refer to our method as DFT-LRP. We demonstrate the usefulness of DFT-LRP in various time series classification settings like audio and electronic health records. We showcase how DFT-LRP reveals differences in the classification strategies of models trained in different domains (e.g., time vs. frequency domain) or helps to discover how models act on spurious correlations in the data.

Reactive Model Correction: Mitigating Harm to Task-Relevant Features via Conditional Bias Suppression

Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute; Bareeva, Dilyara; Dreyer, Maximilian; Pahde, Frederik; Samek, Wojciech; Lapuschkin, Sebastian

Publication date: 15/04/2024 - DOI: 10.5281/zenodo.11545313

Deep Neural Networks are prone to learning and relying on spurious correlations in the training data, which, for high-risk applications, can have fatal consequences. Various approaches to suppress model reliance on harmful features have been proposed that can be applied post-hoc without additional training. Whereas those methods can be applied with efficiency, they also tend to harm model performance by globally shifting the distribution of latent features. To mitigate unintended overcorrection of model behavior, we propose a reactive approach conditioned on model-derived knowledge and eXplainable Artificial Intelligence (XAI) insights. While the reactive approach can be applied to many post-hoc methods, we demonstrate the incorporation of reactivity in particular for P-ClArC (Projective Class Artifact Compensation), introducing a new method called R-ClArC (Reactive Class Artifact Compensation). Through rigorous experiments in controlled settings (FunnyBirds) and with a real-world dataset (ISIC2019), we show that introducing reactivity can minimize the detrimental effect of the applied correction while simultaneously ensuring low reliance on spurious features.

AudioMNIST: Exploring Explainable Artificial Intelligence for Audio Analysis on a Simple Benchmark

Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute; Becker, Sören; Vielhaben, Johanna; Ackermann, Marcel; Müller, Klaus-Robert; Lapuschkin, Sebastian; Samek, Wojciech

Publication date: 27/11/2023 - DOI: 10.1016/j.jfranklin.2023.11.038

Explainable Artificial Intelligence (XAI) is targeted at understanding how models perform feature selection and derive their classification decisions. This paper explores post-hoc explanations for deep neural networks in the audio domain. Notably, we present a novel Open Source audio dataset consisting of 30,000 audio samples of English spoken digits which we use for classification tasks on spoken digits and speakers’ biological sex. We use the popular XAI technique Layer-wise Relevance Propagation (LRP) to identify relevant features for two neural network architectures that process either waveform or spectrogram representations of the data. Based on the relevance scores obtained from LRP, hypotheses about the neural networks’ feature selection are derived and subsequently tested through systematic manipulations of the input data. Further, we take a step beyond visual explanations and introduce audible heatmaps. We demonstrate the superior interpretability of audible explanations over visual ones in a human user study.

Explaining Predictive Uncertainty by Exposing Second-Order Effects

Bley, Florian; Lapuschkin, Sebastian; Samek, Wojciech; Montavon, Grégoire

Publication date: 30/01/2024 - DOI: 10.48550/arXiv.2401.17441

Explainable AI has brought transparency into complex ML black boxes, making it possible, in particular, to identify which features these models use for their predictions. So far, the question of explaining predictive uncertainty, i.e. why a model ‘doubts’, has been scarcely studied. Our investigation reveals that predictive uncertainty is dominated by second-order effects, involving single features or product interactions between them. We contribute a new method for explaining predictive uncertainty based on these second-order effects. Computationally, our method reduces to a simple covariance computation over a collection of first-order explanations. Our method is generally applicable, allowing for turning common attribution techniques (LRP, Gradient × Input, etc.) into powerful second-order uncertainty explainers, which we call CovLRP, CovGI, etc. The accuracy of the explanations our method produces is demonstrated through systematic quantitative evaluations, and the overall usefulness of our method is demonstrated via two practical showcases.
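As a rough illustration of the "covariance computation over a collection of first-order explanations" mentioned in this abstract, the sketch below uses a toy setup that is an assumption made here, not the authors' implementation: a hypothetical ensemble of bias-free linear models, explained with Gradient × Input, whose attributions are stacked into a matrix and passed to `numpy.cov`. The function name `covariance_explanation` and all variable names are illustrative.

```python
import numpy as np


def covariance_explanation(attributions: np.ndarray) -> np.ndarray:
    """Covariance of first-order attributions across ensemble members.

    attributions has shape (n_models, n_features): one attribution vector
    per ensemble member, all computed for the same input.
    Returns an (n_features, n_features) matrix. Diagonal entries relate to
    single features, off-diagonal entries to pairs of features.
    """
    return np.cov(attributions, rowvar=False)


# Toy ensemble (assumption): bias-free linear models f_m(x) = w_m . x
rng = np.random.default_rng(0)
n_models, n_features = 50, 4
weights = rng.normal(size=(n_models, n_features))
x = rng.normal(size=n_features)

# Gradient x Input for a linear model is simply w_m * x (elementwise),
# and each attribution vector sums to that model's prediction.
attributions = weights * x
predictions = attributions.sum(axis=1)

uncertainty_map = covariance_explanation(attributions)
```

In this toy case the entries of `uncertainty_map` sum to the sample variance of the ensemble predictions, because each attribution vector sums to its model's output. That is what makes the matrix readable as a decomposition of predictive variance over features and feature pairs; whether the same holds for a given network depends on the attribution method used.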

Human-Centered Evaluation of XAI Methods

Dawoud, Karam; Samek, Wojciech; Eisert, Peter; Lapuschkin, Sebastian; Bosse, Sebastian

Publication date: 16/10/2023 - DOI: 10.1109/ICDMW60847.2023.00122

In the ever-evolving field of Artificial Intelligence, a critical challenge has been to decipher the decision-making processes within the so-called ”black boxes” in deep learning. Over recent years, a plethora of methods have emerged, dedicated to explaining decisions across diverse tasks. Particularly in tasks like image classification, these methods typically identify and emphasize the pivotal pixels that most influence a classifier’s prediction. Interestingly, this approach mirrors human behavior: when asked to explain our rationale for classifying an image, we often point to the most salient features or aspects. Capitalizing on this parallel, our research embarked on a user-centric study. We sought to objectively measure the interpretability of three leading explanation methods: (1) Prototypical Part Network, (2) Occlusion, and (3) Layer-wise Relevance Propagation. Intriguingly, our results highlight that while the regions spotlighted by these methods can vary widely, they all offer humans a nearly equivalent depth of understanding. This enables users to discern and categorize images efficiently, reinforcing the value of these methods in enhancing AI transparency.

From Hope to Safety: Unlearning Biases of Deep Models via Gradient Penalization in Latent Space

Dreyer, Maximilian; Pahde, Frederik; Anders, Christopher J.; Samek, Wojciech; Lapuschkin, Sebastian

Publication date: 18/12/2023 - DOI: 10.1609/aaai.v38i19.30096

Deep Neural Networks are prone to learning spurious correlations embedded in the training data, leading to potentially biased predictions. This poses risks when deploying these models for high-stake decision-making, such as in medical applications. Current methods for post-hoc model correction either require input-level annotations which are only possible for spatially localized biases, or augment the latent feature space, thereby hoping to enforce the right reasons. We present a novel method for model correction on the concept level that explicitly reduces model sensitivity towards biases via gradient penalization. When modeling biases via Concept Activation Vectors, we highlight the importance of choosing robust directions, as traditional regression-based approaches such as Support Vector Machines tend to result in diverging directions. We effectively mitigate biases in controlled and real-world settings on the ISIC, Bone Age, ImageNet and CelebA datasets using VGG, ResNet and EfficientNet architectures.

Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations

Dreyer, Maximilian; Achtibat, Reduan; Samek, Wojciech; Lapuschkin, Sebastian

Publication date: 29/04/2024 - DOI: 10.48550/arXiv.2311.16681

Ensuring both transparency and safety is critical when deploying Deep Neural Networks (DNNs) in high-risk applications, such as medicine. The field of explainable AI (XAI) has proposed various methods to comprehend the decision-making processes of opaque DNNs. However, only a few XAI methods are suitable for ensuring safety in practice, as they heavily rely on repeated labor-intensive and possibly biased human assessment. In this work, we present a novel post-hoc concept-based XAI framework that conveys, besides instance-wise (local), also class-wise (global) decision-making strategies via prototypes. What sets our approach apart is the combination of local and global strategies, enabling a clearer understanding of the (dis-)similarities in model decisions compared to the expected (prototypical) concept use, ultimately reducing the dependence on human long-term assessment. Quantifying the deviation from prototypical behavior not only allows associating predictions with specific model sub-strategies but also detecting outlier behavior. As such, our approach constitutes an intuitive and explainable tool for model validation. We demonstrate the effectiveness of our approach in identifying out-of-distribution samples, spurious model behavior and data quality issues across three datasets (ImageNet, CUB-200, and CIFAR-10) utilizing VGG, ResNet, and EfficientNet architectures.

PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits

Dreyer, Maximilian; Purelku, Erblina; Vielhaben, Johanna; Samek, Wojciech; Lapuschkin, Sebastian

Publication date: 09/04/2024 - DOI: 10.48550/arXiv.2404.06453

The field of mechanistic interpretability aims to study the role of individual neurons in Deep Neural Networks. Single neurons, however, have the capability to act poly-semantically and encode for multiple (unrelated) features, which renders their interpretation difficult. We present a method for disentangling polysemanticity of any Deep Neural Network by decomposing a polysemantic neuron into multiple monosemantic “virtual” neurons. This is achieved by identifying the relevant sub-graph (“circuit”) for each “pure” feature. We demonstrate how our approach allows us to find and disentangle various polysemantic units of ResNet models trained on ImageNet. While evaluating feature visualizations using CLIP, our method effectively disentangles representations, improving upon methods based on neuron activations.

XAI-based Comparison of Input Representations for Audio Event Classification

Frommholz, Annika; Seipel, Fabian; Lapuschkin, Sebastian; Samek, Wojciech; Vielhaben, Johanna

Publication date: 27/04/2023 - DOI: 10.1145/3617233.3617265

Deep neural networks are a promising tool for Audio Event Classification. In contrast to other data like natural images, there are many sensible and non-obvious representations for audio data, which could serve as input to these models. Due to their black-box nature, the effect of different input representations has so far mostly been investigated by measuring classification performance. In this work, we leverage eXplainable AI (XAI) to understand the underlying classification strategies of models trained on different input representations. Specifically, we compare two model architectures with regard to relevant input features used for Audio Event Detection: one directly processes the signal as the raw waveform, and the other takes in its time-frequency spectrogram representation. We show how relevance heatmaps obtained via Layer-wise Relevance Propagation uncover representation-dependent decision strategies. With these insights, we can make a well-informed decision about the best input representation in terms of robustness and representativity and confirm that the model’s classification strategies align with human requirements.

The Meta-Evaluation Problem in Explainable AI: Identifying Reliable Estimators with MetaQuantus

Hedström, Anna; Bommer, Philine; Wickstroem, Kristoffer K.; Samek, Wojciech; Lapuschkin, Sebastian; Höhne, Marina

Publication date: 19/07/2023 - DOI: 10.48550/arXiv.2302.07265

One of the unsolved challenges in the field of Explainable AI (XAI) is determining how to most reliably estimate the quality of an explanation method in the absence of ground truth explanation labels. Resolving this issue is of utmost importance as the evaluation outcomes generated by competing evaluation methods (or “quality estimators”), which aim at measuring the same property of an explanation method, frequently present conflicting rankings. Such disagreements can be challenging for practitioners to interpret, thereby complicating their ability to select the best-performing explanation method. We address this problem through a meta-evaluation of different quality estimators in XAI, which we define as “the process of evaluating the evaluation method”. Our novel framework, MetaQuantus, analyses two complementary performance characteristics of a quality estimator: its resilience to noise and reactivity to randomness, thus circumventing the need for ground truth labels. We demonstrate the effectiveness of our framework through a series of experiments, targeting various open questions in XAI such as the selection and hyperparameter optimisation of quality estimators. Our work is released under an open-source license to serve as a development tool for XAI and Machine Learning (ML) practitioners to verify and benchmark newly constructed quality estimators in a given explainability context. With this work, we provide the community with clear and theoretically-grounded guidance for identifying reliable evaluation methods, thus facilitating reproducibility in the field.

A Fresh Look at Sanity Checks for Saliency Maps

Hedström, Anna; Weber, Leander; Lapuschkin, Sebastian; Höhne, Marina

Publication date: 03/05/2024 - DOI: 10.5281/zenodo.11546698

The Model Parameter Randomisation Test (MPRT) is highly recognised in the eXplainable Artificial Intelligence (XAI) community due to its fundamental evaluative criterion: explanations should be sensitive to the parameters of the model they seek to explain. However, recent studies have raised several methodological concerns for the empirical interpretation of MPRT. In response, we propose two modifications to the original test: Smooth MPRT and Efficient MPRT. The former reduces the impact of noise on evaluation outcomes via sampling, while the latter avoids the need for biased similarity measurements by re-interpreting the test through the increase in explanation complexity after full model randomisation. Our experiments show that these modifications enhance the metric reliability, facilitating a more trustworthy deployment of explanation methods.

Explainable concept mappings of MRI: Revealing the mechanisms underlying deep learning-based brain disease classification

Tinauer, Christian; Damulina, Anna; Soellradl, Martin; Achtibat, Reduan; Dreyer, Maximilian; Pahde, Frederik; Lapuschkin, Sebastian; Schmidt, Reinhold; Ropele, Stefan; Langkammer, Christian

Publication date: 16/04/2024 - DOI: 10.48550/arXiv.2404.10433

Motivation. While recent studies show high accuracy in the classification of Alzheimer’s disease using deep neural networks, the underlying learned concepts have not been investigated.
Goals. To systematically identify changes in brain regions through concepts learned by the deep neural network for model validation.
Approach. Using quantitative R2* maps we separated Alzheimer’s patients (n=117) from normal controls (n=219) by using a convolutional neural network and systematically investigated the learned concepts using Concept Relevance Propagation and compared these results to a conventional region of interest-based analysis.
Results. In line with established histological findings and the region of interest-based analyses, highly relevant concepts were primarily found in and adjacent to the basal ganglia.
Impact. The identification of concepts learned by deep neural networks for disease classification enables validation of the models and could potentially improve reliability.

Detection and Estimation of Gas Sources with Arbitrary Locations based on Poisson's Equation

Shutin, Dmitriy; Wiedemann, Thomas; Hinsen, Patrick

Publication date: 21/12/2023 - DOI: 10.1109/OJSP.2023.3344076

Accurate estimation of the number and locations of dispersed material sources is critical for optimal disaster response in Chemical, Biological, Radiological, or Nuclear accidents. This paper introduces a novel approach to Gas Source Localization that uses sparse Bayesian learning adapted to models based on Partial Differential Equations for modeling gas dynamics. Using the method of Green’s functions and the adjoint state method, a gradient-based optimization with respect to source location is derived, allowing superresolving (arbitrary) source locations. By combining the latter with sparse Bayesian learning, a sparse source support can be identified, thus indirectly assessing the number of sources. Simulation results and comparisons with classical sparse estimators for linear models demonstrate the effectiveness of the proposed approach. The proposed sparsity-constrained gas source localization method thus offers a flexible solution for disaster response and robotic exploration in hazardous environments.

Evaluating Deep Neural Network-based Fire Detection for Natural Disaster Management

Tzimas, Matthaios Dimitrios; Papaioannidis, Christos; Mygdalis, Vasileios; Pitas, Ioannis

Publication date: 10/07/2025 - DOI: 10.1145/3632366.3632369

© ACM 2023. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in BDCAT'23, https://doi.org/10.1145/3632366.3632369.

Recently, climate change has led to more frequent extreme weather events, introducing new challenges for Natural Disaster Management (NDM) organizations. This fact makes the employment of modern technological tools such as Deep Neural Network-based fire detectors a necessity, as they can assist such organizations in managing these extreme events more effectively. In this work, we argue that the mean Average Precision (mAP) metric that is commonly used to evaluate typical object detection algorithms cannot be trusted for the fire detection task, due to its high dependence on the employed data annotation strategy. This means that the mAP score of a fire detection algorithm may be low even when it predicts fire bounding boxes that accurately enclose the depicted fires. In this direction, a new evaluation metric for fire detection is proposed, denoted as Image-level mean Average Precision (ImAP), which reduces the dependence on the bounding box annotation strategy by rewarding/penalizing bounding box predictions on image level, rather than on bounding box level. Experiments using different object detection algorithms have shown that the proposed ImAP metric reveals the true fire detection capabilities of the tested algorithms more effectively.

STRUCTURED EFFICIENT SELF-ATTENTION SHOWCASED ON DETR-BASED DETECTORS

Militsis, Nikolaos Marios; Mygdalis, Vasileios; Pitas, Ioannis

Publication date: 07/01/2025 - DOI: 10.5281/zenodo.14608445

The Multi-Head Self-Attention (MHSA) mechanism stands as the cornerstone of Transformer architectures, endowing them with unparalleled expressive capabilities. The main learnable parameters in a transformer self-attention block include matrices that project the input features into subspaces, where similarity metrics are then calculated. In this paper, we argue that we could use fewer learnable parameters for achieving good projections. We propose the Structured Efficient Self-Attention (SESA) module, a generic paradigm inspired by the Johnson-Lindenstrauss (JL) lemma, that employs an Adaptive Fast JL Transform (A-FJLT) parameterised by a single learnable vector for each projection. This allows us to eliminate a substantial 75% of the learnable parameters of the legacy MHSA, with very slight sacrifices to accuracy. SESA properties are showcased on the demanding task of object detection on the COCO dataset, achieving comparable performance with its computationally intensive counterparts.

These Maps Are Made by Propagation: Adapting Deep Stereo Networks to Road Scenarios with Decisive Disparity Diffusion

Chuang-Wei Liu; Yikang Zhang; Qijun Chen; Ioannis Pitas; Rui Fan

Publication date: 06/11/2024 - DOI: 10.48550/arXiv.2411.03717

Stereo matching has emerged as a cost-effective solution for road surface 3D reconstruction, garnering significant attention towards improving both computational efficiency and accuracy. This article introduces decisive disparity diffusion (D3Stereo), marking the first exploration of dense deep feature matching that adapts pre-trained deep convolutional neural networks (DCNNs) to previously unseen road scenarios. A pyramid of cost volumes is initially created using various levels of learned representations. Subsequently, a novel recursive bilateral filtering algorithm is employed to aggregate these costs. A key innovation of D3Stereo lies in its alternating decisive disparity diffusion strategy, wherein intra-scale diffusion is employed to complete sparse disparity images, while inter-scale inheritance provides valuable prior information for higher resolutions. Extensive experiments conducted on our created UDTIRI-Stereo and Stereo-Road datasets underscore the effectiveness of the D3Stereo strategy in adapting pre-trained DCNNs and its superior performance compared to all other explicit programming-based algorithms designed specifically for road surface 3D reconstruction. Additional experiments conducted on the Middlebury dataset with backbone DCNNs pre-trained on the ImageNet database further validate the versatility of the D3Stereo strategy in tackling general stereo matching problems.

3D-Flood Dataset

Kitsos, Filippos

Publication date: 27/05/2024 - DOI: 10.5281/zenodo.11349721

The Aristotle University of Thessaloniki (hereinafter, AUTH) created the following dataset, entitled ‘3D-Flood’, within the context of the project TEMA that was funded by the European Commission-European Union.

The dataset will be used for the construction of a 3D model regarding the district of Agios Thomas in Larisa, Greece, after the flood events of 2023. It is comprised of 795 UAV video frames, taken from 4 YouTube videos.

We provide the links for each YouTube video, along with the frame numbers that we kept for each video.

Details on acquiring the dataset can be found here.

Flood Master Dataset

Kitsos, Filippos; Zamioudis, Alexandros

Publication date: 06/06/2024 - DOI: 10.5281/zenodo.11501494

Our Master Flood Dataset consists of flood images picked from different publicly available datasets. The origin of each image is specified in the "sources.csv" file.

The dataset consists of 282 train, 87 validation and 1973 test frames. We provide the frames from the sourced videos and segmentation masks of the flooded areas.

Details on acquiring the dataset can be found here.

Blaze Fire Classification – Segmentation Dataset

Siamvrakas, Michalis; Kitsos, Filippos

Publication date: 06/06/2024 - DOI: 10.5281/zenodo.11501836

The dataset is intended for wildfire image classification and burnt area segmentation tasks for Unmanned Aerial Vehicles. It comprises 5,408 frames of aerial views taken from 56 videos and 2 public datasets. From the D-Fire public dataset, 829 photographs were used, and from the Burned Area UAV public dataset, 34 images were used. For the classification task, there are 5 classes (‘Burnt’, ‘Half-Burnt’, ‘Non-Burnt’, ‘Fire’, ‘Smoke’). As for the segmentation task, 404 segmentation masks have been created on a subset, which assign to each pixel of the image the class ‘burnt’ or the class ‘non-burnt’.

Details on acquiring the dataset can be found here.

Mastodon Posts Dataset

Avgoustidis, Fotios; Giannouris, Polydoros; Kitsos, Filippos

Publication date: 06/06/2024 - DOI: 10.5281/zenodo.11502116

The dataset comprises 766 social media posts in the Greek language from the platform “Mastodon”, spanning the 2023 wildfires in Greece. Each post was annotated internally with Plutchik-8 emotions.

Details on acquiring the dataset can be found here.
