
Publications about the project

Project publications are archived in a Zenodo community; visit the project's community page for the full records.

Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations

Dreyer, Maximilian; Achtibat, Reduan; Samek, Wojciech; Lapuschkin, Sebastian
Publication date: 29/04/2024 - DOI: 10.48550/arXiv.2311.16681

Ensuring both transparency and safety is critical when deploying Deep Neural Networks (DNNs) in high-risk applications such as medicine. The field of explainable AI (XAI) has proposed various methods to comprehend the decision-making processes of opaque DNNs. However, only a few XAI methods are suitable for ensuring safety in practice, as they heavily rely on repeated, labor-intensive and possibly biased human assessment. In this work, we present a novel post-hoc concept-based XAI framework that conveys not only instance-wise (local) but also class-wise (global) decision-making strategies via prototypes. What sets our approach apart is the combination of local and global strategies, enabling a clearer understanding of the (dis-)similarities between a model's decisions and the expected (prototypical) concept use, ultimately reducing the dependence on long-term human assessment. Quantifying the deviation from prototypical behavior not only allows us to associate predictions with specific model sub-strategies but also to detect outlier behavior. As such, our approach constitutes an intuitive and explainable tool for model validation. We demonstrate the effectiveness of our approach in identifying out-of-distribution samples, spurious model behavior and data quality issues across three datasets (ImageNet, CUB-200, and CIFAR-10) using VGG, ResNet, and EfficientNet architectures.
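
A minimal sketch of the general idea described in the abstract, not the authors' implementation: given per-sample concept relevance vectors and predicted labels, fit class-wise "prototypes" of concept use and flag predictions that deviate strongly from them. All names, the Gaussian-mixture choice, and the thresholds are illustrative assumptions.

```python
# Sketch only: prototype-based deviation scoring for model validation.
# R: (n_samples, n_concepts) concept relevance vectors, y: predicted class labels.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_prototypes(R, y, n_prototypes=3):
    """Fit a small Gaussian mixture per class as prototypical concept-use patterns."""
    prototypes = {}
    for c in np.unique(y):
        gm = GaussianMixture(n_components=n_prototypes, covariance_type="diag")
        gm.fit(R[y == c])
        prototypes[c] = gm
    return prototypes

def deviation_score(r, c, prototypes):
    """Negative log-likelihood under the class's prototype model: high = atypical."""
    return -prototypes[c].score_samples(r[None, :])[0]

# Usage: rank test predictions by deviation and inspect the most atypical ones,
# e.g. as candidate out-of-distribution samples or data quality issues.
# scores = [deviation_score(r, c, prototypes) for r, c in zip(R_test, y_pred)]
# suspects = np.argsort(scores)[-20:]
```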

PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits

Dreyer, Maximilian; Purelku, Erblina; Vielhaben, Johanna; Samek, Wojciech; Lapuschkin, Sebastian
Publication date: 09/04/2024 - DOI: 10.48550/arXiv.2404.06453

The field of mechanistic interpretability aims to study the role of individual neurons in Deep Neural Networks. Single neurons, however, can act polysemantically and encode multiple (unrelated) features, which renders their interpretation difficult. We present a method for disentangling the polysemanticity of any Deep Neural Network by decomposing a polysemantic neuron into multiple monosemantic “virtual” neurons. This is achieved by identifying the relevant sub-graph (“circuit”) for each “pure” feature. We demonstrate how our approach allows us to find and disentangle various polysemantic units of ResNet models trained on ImageNet. When feature visualizations are evaluated using CLIP, our method effectively disentangles representations, improving upon methods based on neuron activations.
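
A rough sketch of the underlying intuition, not the paper's circuit-identification procedure: a polysemantic neuron can be split into "virtual" neurons by clustering how upstream channels contribute to its activation across samples. The input `contribs` and the clustering choice are assumptions for illustration.

```python
# Sketch only: split a polysemantic neuron by its upstream contribution patterns.
# contribs: (n_samples, n_upstream_channels) per-sample contributions to the target
# neuron, e.g. from gradient x activation restricted to that neuron.
import numpy as np
from sklearn.cluster import KMeans

def split_virtual_neurons(contribs, n_virtual=2, seed=0):
    """Assign each sample to one 'virtual' neuron based on its contribution pattern."""
    norm = np.linalg.norm(contribs, axis=1, keepdims=True) + 1e-9
    km = KMeans(n_clusters=n_virtual, n_init=10, random_state=seed)
    labels = km.fit_predict(contribs / norm)
    return labels, km.cluster_centers_  # centers ~ the sub-circuits driving each virtual neuron

# Samples sharing a label activate the neuron through a similar sub-circuit, so
# visualizing the top samples per label separates the neuron's mixed features.
```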

XAI-based Comparison of Input Representations for Audio Event Classification

Frommholz, Annika; Seipel, Fabian; Lapuschkin, Sebastian; Samek, Wojciech; Vielhaben, Johanna
Publication date: 27/04/2023 - DOI: 10.1145/3617233.3617265

Deep neural networks are a promising tool for Audio Event Classification. In contrast to other data such as natural images, there are many sensible and non-obvious representations for audio data that could serve as input to these models. Due to the models' black-box nature, the effect of different input representations has so far mostly been investigated by measuring classification performance. In this work, we leverage eXplainable AI (XAI) to understand the underlying classification strategies of models trained on different input representations. Specifically, we compare two model architectures with regard to the input features relevant for Audio Event Detection: one directly processes the signal as the raw waveform, while the other takes in its time-frequency spectrogram representation. We show how relevance heatmaps obtained via Layer-wise Relevance Propagation uncover representation-dependent decision strategies. With these insights, we can make a well-informed decision about the best input representation in terms of robustness and representativity, and confirm that the model’s classification strategies align with human requirements.
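
A hedged sketch of such an analysis pipeline, not the paper's code: compute Layer-wise Relevance Propagation heatmaps for a waveform-based model and a spectrogram-based model and compare where relevance concentrates. The use of Captum here, and the names `wave_model`, `spec_model`, `waveform`, `spectrogram`, `label`, are assumptions.

```python
# Sketch only: representation-wise LRP heatmaps for audio classifiers.
# Note: Captum's LRP supports a limited set of layer types; zennit is an alternative.
import torch
from captum.attr import LRP

def relevance_map(model, x, target):
    """Return an LRP relevance heatmap with the same shape as the input x."""
    model.eval()
    lrp = LRP(model)
    return lrp.attribute(x.requires_grad_(True), target=target)

# r_wave has the shape of the waveform (time axis), r_spec the shape of the
# spectrogram (time x frequency), exposing which parts of each representation
# the respective model relies on.
# r_wave = relevance_map(wave_model, waveform, label)
# r_spec = relevance_map(spec_model, spectrogram, label)
```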

The Meta-Evaluation Problem in Explainable AI: Identifying Reliable Estimators with MetaQuantus

Hedström, Anna; Bommer, Philine; Wickstroem, Kristoffer K.; Samek, Wojciech; Lapuschkin, Sebastian; Höhne, Marina
Publication date: 19/07/2023 - DOI: 10.48550/arXiv.2302.07265

One of the unsolved challenges in the field of Explainable AI (XAI) is determining how to most reliably estimate the quality of an explanation method in the absence of ground-truth explanation labels. Resolving this issue is of utmost importance because the evaluation outcomes generated by competing evaluation methods (or “quality estimators”), which aim to measure the same property of an explanation method, frequently produce conflicting rankings. Such disagreements can be challenging for practitioners to interpret, complicating their ability to select the best-performing explanation method. We address this problem through a meta-evaluation of different quality estimators in XAI, which we define as “the process of evaluating the evaluation method”. Our novel framework, MetaQuantus, analyses two complementary performance characteristics of a quality estimator: its resilience to noise and its reactivity to randomness, thus circumventing the need for ground-truth labels. We demonstrate the effectiveness of our framework through a series of experiments targeting various open questions in XAI, such as the selection and hyperparameter optimisation of quality estimators. Our work is released under an open-source license to serve as a development tool for XAI and Machine Learning (ML) practitioners to verify and benchmark newly constructed quality estimators in a given explainability context. With this work, we provide the community with clear, theoretically grounded guidance for identifying reliable evaluation methods, thus facilitating reproducibility in the field.
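
A conceptual sketch of the two properties the abstract names, not the MetaQuantus API or its statistical tests: a reliable quality estimator should barely change under small, meaning-preserving input noise (resilience) and should change clearly when the setup is disrupted, e.g. by randomising the model (reactivity). Function signatures and the [0, 1] score assumption are illustrative.

```python
# Sketch only: probing a quality estimator's resilience and reactivity.
# estimator(model, x, attribution) -> score in [0, 1]; explain(model, x) -> attribution.
import numpy as np

def meta_evaluate(estimator, explain, model, randomized_model, X, noise_std=0.01, seed=0):
    rng = np.random.default_rng(seed)
    base, noisy, disrupted = [], [], []
    for x in X:
        base.append(estimator(model, x, explain(model, x)))
        x_n = x + rng.normal(0, noise_std, x.shape)          # minor perturbation
        noisy.append(estimator(model, x_n, explain(model, x_n)))
        disrupted.append(estimator(randomized_model, x, explain(randomized_model, x)))
    resilience = 1.0 - np.mean(np.abs(np.array(base) - np.array(noisy)))   # want close to 1
    reactivity = np.mean(np.abs(np.array(base) - np.array(disrupted)))     # want clearly > 0
    return resilience, reactivity
```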

A Fresh Look at Sanity Checks for Saliency Maps

Hedström, Anna; Weber, Leander; Lapuschkin, Sebastian; Höhne, Marina
Publication date: 03/05/2024 - DOI: 10.5281/zenodo.11546698

The Model Parameter Randomisation Test (MPRT) is widely recognised in the eXplainable Artificial Intelligence (XAI) community for its fundamental evaluative criterion: explanations should be sensitive to the parameters of the model they seek to explain. However, recent studies have raised several methodological concerns about the empirical interpretation of MPRT. In response, we propose two modifications to the original test: Smooth MPRT and Efficient MPRT. The former reduces the impact of noise on evaluation outcomes via sampling, while the latter avoids the need for biased similarity measurements by re-interpreting the test through the increase in explanation complexity after full model randomisation. Our experiments show that these modifications enhance metric reliability, facilitating a more trustworthy deployment of explanation methods.
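
An illustrative sketch of the Efficient MPRT idea as described in the abstract, not the authors' released code: rather than measuring similarity between explanations, compare explanation complexity (here, entropy of the normalised absolute attribution) before and after fully randomising the model. The entropy choice and function names are assumptions.

```python
# Sketch only: complexity-based re-interpretation of the randomisation test.
import numpy as np

def explanation_entropy(attribution, eps=1e-12):
    """Entropy of the normalised absolute attribution: a simple complexity proxy."""
    p = np.abs(attribution).ravel()
    p = p / (p.sum() + eps)
    return -np.sum(p * np.log(p + eps))

def efficient_mprt_score(explain, model, randomized_model, X):
    """Positive score = explanations become more complex after randomisation, as expected."""
    before = np.mean([explanation_entropy(explain(model, x)) for x in X])
    after = np.mean([explanation_entropy(explain(randomized_model, x)) for x in X])
    return after - before
```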

Explainable concept mappings of MRI: Revealing the mechanisms underlying deep learning-based brain disease classification

Publication date: 16/04/2024 - DOI: 10.48550/arXiv.2404.10433

Motivation. While recent studies show high accuracy in the classification of Alzheimer’s disease using deep neural networks, the underlying learned concepts have not been investigated.
Goals. To systematically identify changes in brain regions through concepts learned by the deep neural network for model validation.
Approach. Using quantitative R2* maps, we separated Alzheimer’s patients (n=117) from normal controls (n=219) with a convolutional neural network, systematically investigated the learned concepts using Concept Relevance Propagation, and compared the results to a conventional region-of-interest-based analysis.
Results. In line with established histological findings and the region of interest-based analyses, highly relevant concepts were primarily found in and adjacent to the basal ganglia.
Impact. The identification of concepts learned by deep neural networks for disease classification enables validation of the models and could potentially improve reliability.

Detection and Estimation of Gas Sources with Arbitrary Locations based on Poisson's Equation

Publication date: 21/12/2023 - DOI: 10.1109/OJSP.2023.3344076

Accurate estimation of the number and locations of dispersed material sources is critical for optimal disaster response in Chemical, Biological, Radiological, or Nuclear accidents. This paper introduces a novel approach to Gas Source Localization that uses sparse Bayesian learning adapted to models based on Partial Differential Equations for modeling gas dynamics. Using the method of Green’s functions and the adjoint state method, a gradient-based optimization with respect to source location is derived, allowing super-resolution of (arbitrary) source locations. By combining the latter with sparse Bayesian learning, a sparse source support can be identified, thus indirectly assessing the number of sources. Simulation results and comparisons with classical sparse estimators for linear models demonstrate the effectiveness of the proposed approach. The proposed sparsity-constrained gas source localization method thus offers a flexible solution for disaster response and robotic exploration in hazardous environments.
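
A worked formulation of the forward model the abstract refers to, in our own notation (not the paper's): K candidate point sources with strengths s_k at locations ξ_k, sensors at x_m, and G the Green's function of the Poisson operator.

```latex
% Steady-state concentration from point sources (Poisson-type model) and the
% resulting linear-in-strength, nonlinear-in-location measurement model:
\nabla^2 c(\mathbf{x}) = -\sum_{k=1}^{K} s_k\,\delta(\mathbf{x}-\boldsymbol{\xi}_k),
\qquad
y_m = \sum_{k=1}^{K} s_k\, G(\mathbf{x}_m,\boldsymbol{\xi}_k) + n_m .
```

Under this model, sparse Bayesian learning places a sparsity-promoting prior on the strengths s_k (so most candidates are pruned, revealing the number of sources), while the locations ξ_k are refined by gradient steps whose gradients the adjoint state method supplies.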

Evaluating Deep Neural Network-based Fire Detection for Natural Disaster Management

Publication date: 17/12/2025 - DOI: 10.1145/3632366.3632369

© ACM 2023. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in BDCAT'23, https://doi.org/10.1145/3632366.3632369.

 

Recently, climate change has led to more frequent extreme weather events, introducing new challenges for Natural Disaster Management (NDM) organizations. This makes modern technological tools such as Deep Neural Network-based fire detectors a necessity, as they can help such organizations manage these extreme events more effectively. In this work, we argue that the mean Average Precision (mAP) metric commonly used to evaluate typical object detection algorithms cannot be trusted for the fire detection task, due to its high dependence on the employed data annotation strategy. This means that the mAP score of a fire detection algorithm may be low even when it predicts bounding boxes that accurately enclose the depicted fires. To this end, a new evaluation metric for fire detection is proposed, denoted Image-level mean Average Precision (ImAP), which reduces the dependence on the bounding-box annotation strategy by rewarding/penalizing bounding box predictions at the image level rather than at the bounding box level. Experiments using different object detection algorithms show that the proposed ImAP metric reveals the true fire detection capabilities of the tested algorithms more effectively.
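
An illustrative reading of the image-level scoring described above, not the authors' exact definition of ImAP: an image containing fire counts as a true positive at a given confidence threshold if at least one predicted box overlaps any ground-truth fire region, and precision/recall are then computed over images. The IoU threshold, sweep, and helper names are assumptions.

```python
# Sketch only: image-level precision/recall for fire detection.
# preds: per-image list of {"box": ..., "score": ...}; gts: per-image list of boxes.
import numpy as np

def image_level_ap(preds, gts, iou_fn, iou_thr=0.1, thresholds=np.linspace(0.05, 0.95, 19)):
    precisions, recalls = [], []
    for t in thresholds:
        tp = fp = fn = 0
        for boxes, gt_boxes in zip(preds, gts):
            kept = [b for b in boxes if b["score"] >= t]
            hit = any(iou_fn(b["box"], g) >= iou_thr for b in kept for g in gt_boxes)
            if gt_boxes and hit:
                tp += 1                      # image with fire, detected
            elif gt_boxes and not hit:
                fn += 1                      # image with fire, missed
            elif not gt_boxes and kept:
                fp += 1                      # fire-free image with a false alarm
        precisions.append(tp / (tp + fp + 1e-9))
        recalls.append(tp / (tp + fn + 1e-9))
    order = np.argsort(recalls)              # area under the image-level PR curve
    return float(np.trapz(np.array(precisions)[order], np.array(recalls)[order]))
```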

Structured Efficient Self-Attention Showcased on DETR-based Detectors

Militsis, Nikolaos Marios; Mygdalis, Vasileios; Pitas, Ioannis
Publication date: 07/01/2025 - DOI: 10.5281/zenodo.14608445

© 2025 N. Militsis, V. Mygdalis, I. Pitas. This is the authors' version of the work. It is posted here for your personal use. Not for redistribution.

 

The Multi-Head Self-Attention (MHSA) mechanism stands as the cornerstone of Transformer architectures, endowing them with unparalleled expressive capabilities. The main learnable parameters in a Transformer self-attention block are the matrices that project the input features into subspaces, where similarity metrics are then calculated. In this paper, we argue that good projections can be achieved with fewer learnable parameters. We propose the Structured Efficient Self-Attention (SESA) module, a generic paradigm inspired by the Johnson-Lindenstrauss (JL) lemma, which employs an Adaptive Fast JL Transform (A-FJLT) parameterised by a single learnable vector for each projection. This allows us to eliminate a substantial 75% of the learnable parameters of the legacy MHSA with only a slight sacrifice in accuracy. SESA's properties are showcased on the demanding task of object detection on the COCO dataset, achieving performance comparable to its computationally intensive counterparts.
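
A hedged sketch of the kind of projection the abstract describes, under our reading rather than the authors' code: a fast-JL-style transform (fixed Hadamard mixing plus fixed subsampling) whose only learnable part is a single vector, replacing a dense projection matrix. The class name, normalisation, and seeding are illustrative assumptions.

```python
# Sketch only: a JL-inspired projection with a single learnable vector.
import torch
import torch.nn as nn
from scipy.linalg import hadamard

class AdaptiveFJLTProjection(nn.Module):
    def __init__(self, dim, out_dim, seed=0):
        super().__init__()
        assert dim & (dim - 1) == 0, "sketch assumes dim is a power of two"
        g = torch.Generator().manual_seed(seed)
        self.register_buffer("H", torch.tensor(hadamard(dim), dtype=torch.float32) / dim ** 0.5)
        self.register_buffer("idx", torch.randperm(dim, generator=g)[:out_dim])
        self.diag = nn.Parameter(torch.ones(dim))        # the single learnable vector

    def forward(self, x):                                 # x: (..., dim)
        return (x * self.diag) @ self.H[:, self.idx]      # scale, mix with H, keep out_dim coords

# Compared with nn.Linear(dim, out_dim), the learnable parameter count drops from
# roughly dim * out_dim to dim, which is consistent with the large reduction the
# abstract reports when applied to the Q/K/V projections of MHSA.
```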

These Maps Are Made by Propagation: Adapting Deep Stereo Networks to Road Scenarios with Decisive Disparity Diffusion

Liu, Chuang-Wei; Zhang, Yikang; Chen, Qijun; Pitas, Ioannis; Fan, Rui
Publication date: 06/11/2024 - DOI: 10.48550/arXiv.2411.03717

Stereo matching has emerged as a cost-effective solution for road surface 3D reconstruction, garnering significant attention towards improving both computational efficiency and accuracy. This article introduces decisive disparity diffusion (D3Stereo), marking the first exploration of dense deep feature matching that adapts pre-trained deep convolutional neural networks (DCNNs) to previously unseen road scenarios. A pyramid of cost volumes is initially created using various levels of learned representations. Subsequently, a novel recursive bilateral filtering algorithm is employed to aggregate these costs. A key innovation of D3Stereo lies in its alternating decisive disparity diffusion strategy, wherein intra-scale diffusion is employed to complete sparse disparity images, while inter-scale inheritance provides valuable prior information for higher resolutions. Extensive experiments conducted on our UDTIRI-Stereo and Stereo-Road datasets underscore the effectiveness of the D3Stereo strategy in adapting pre-trained DCNNs and its superior performance compared to all other explicit programming-based algorithms designed specifically for road surface 3D reconstruction. Additional experiments on the Middlebury dataset with backbone DCNNs pre-trained on the ImageNet database further validate the versatility of the D3Stereo strategy in tackling general stereo matching problems.
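
A very loose sketch of the "intra-scale diffusion" notion mentioned above, not the D3Stereo algorithm itself: grow a sparse disparity map by copying a pixel's disparity to 4-connected neighbours whenever their deep features are sufficiently similar. The inputs, similarity threshold, and iteration count are illustrative assumptions.

```python
# Sketch only: feature-guided propagation of sparse disparities.
# feat: (H, W, C) feature map; disp: (H, W) with np.nan where disparity is undecided.
import numpy as np

def diffuse_disparities(disp, feat, sim_thr=0.9, iters=10):
    disp = disp.copy()
    H, W = disp.shape
    f = feat / (np.linalg.norm(feat, axis=-1, keepdims=True) + 1e-9)
    for _ in range(iters):
        known = np.argwhere(~np.isnan(disp))
        for y, x in known:
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < H and 0 <= nx < W and np.isnan(disp[ny, nx]):
                    if float(f[y, x] @ f[ny, nx]) >= sim_thr:   # cosine similarity of features
                        disp[ny, nx] = disp[y, x]
    return disp
```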