Publications about the project
Few-Shot Learning for Relevance Classification of Textual Social Media Posts in Disaster Response
Structured Efficient Self-Attention Showcased on DETR-Based Detectors
© 2025 N. Militsis, V. Mygdalis, I. Pitas. This is the authors' version of the work. It is posted here for your personal use. Not for redistribution
The Multi-Head Self-Attention (MHSA) mechanism is the cornerstone of Transformer architectures and the source of much of their expressive power. The main learnable parameters in a Transformer self-attention block are the matrices that project the input features into subspaces, where similarity metrics are then calculated. In this paper, we argue that good projections can be achieved with fewer learnable parameters. We propose the Structured Efficient Self-Attention (SESA) module, a generic paradigm inspired by the Johnson-Lindenstrauss (JL) lemma, that employs an Adaptive Fast JL Transform (A-FJLT) parameterised by a single learnable vector for each projection. This allows us to eliminate 75% of the learnable parameters of the legacy MHSA, with only a slight loss in accuracy. The properties of SESA are showcased on the demanding task of object detection on the COCO dataset, where it achieves performance comparable to that of its computationally intensive counterparts.
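To make the idea of a projection "parameterised by a single learnable vector" more concrete, the following is a minimal sketch of a classic Fast JL-style structured transform: an element-wise scaling by a vector followed by a Walsh-Hadamard transform. It is not the authors' A-FJLT; the function names `fwht`, `structured_projection` and `projection_parameter_counts` are hypothetical, and the assumption that the learnable vector has the same length d as the input is ours.

```python
import math


def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform; length must be a power of two."""
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine pairs of entries that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def structured_projection(x, diag):
    """Compute H @ (diag * x): the only free parameters are the entries of `diag`.

    Assumption: `diag` stands in for the single learnable vector; in a
    trained model its values would be learned rather than fixed.
    """
    return fwht([d * xi for d, xi in zip(diag, x)])


def projection_parameter_counts(d):
    """Parameters of one dense d x d projection versus one length-d vector."""
    return d * d, d
```

With a diagonal of ±1 entries the transform is orthogonal, so it preserves vector norms exactly, which is the kind of distance-preserving behaviour the JL lemma is concerned with. For a single projection of width d = 256, `projection_parameter_counts(256)` gives 65,536 parameters for the dense matrix against 256 for the vector.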
These Maps Are Made by Propagation: Adapting Deep Stereo Networks to Road Scenarios with Decisive Disparity Diffusion
Stereo matching has emerged as a cost-effective solution for road surface 3D reconstruction, garnering significant attention towards improving both computational efficiency and accuracy. This article introduces decisive disparity diffusion (D3Stereo), marking the first exploration of dense deep feature matching that adapts pre-trained deep convolutional neural networks (DCNNs) to previously unseen road scenarios. A pyramid of cost volumes is initially created using various levels of learned representations. Subsequently, a novel recursive bilateral filtering algorithm is employed to aggregate these costs. A key innovation of D3Stereo lies in its alternating decisive disparity diffusion strategy, wherein intra-scale diffusion is employed to complete sparse disparity images, while inter-scale inheritance provides valuable prior information for higher resolutions. Extensive experiments conducted on our UDTIRI-Stereo and Stereo-Road datasets underscore the effectiveness of the D3Stereo strategy in adapting pre-trained DCNNs and its superior performance compared to all other explicit programming-based algorithms designed specifically for road surface 3D reconstruction. Additional experiments conducted on the Middlebury dataset with backbone DCNNs pre-trained on the ImageNet database further validate the versatility of the D3Stereo strategy in tackling general stereo matching problems.
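As background for the term "cost volume", here is a minimal sketch of how matching scores between left and right feature vectors can be tabulated per candidate disparity along a single scanline, followed by a plain winner-take-all choice. It illustrates generic stereo matching only: D3Stereo's recursive bilateral filtering and disparity diffusion steps are not reproduced, and the function names and the use of cosine similarity are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def cost_volume_row(left, right, max_disp):
    """Return volume[d][x]: similarity of left pixel x and right pixel x - d."""
    width = len(left)
    volume = []
    for d in range(max_disp + 1):
        row = []
        for x in range(width):
            if x - d >= 0:
                row.append(cosine(left[x], right[x - d]))
            else:
                # The candidate match would fall outside the right image.
                row.append(float("-inf"))
        volume.append(row)
    return volume


def winner_take_all(volume):
    """Pick, for every pixel, the disparity with the highest similarity."""
    width = len(volume[0])
    return [max(range(len(volume)), key=lambda d: volume[d][x]) for x in range(width)]
```

In a pyramid setting, one such volume would be built per feature level, with coarser levels covering a smaller disparity range.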
3D-Flood Dataset
The Aristotle University of Thessaloniki (hereinafter, AUTH) created the following dataset, entitled ‘3D-Flood’, within the context of the project TEMA that was funded by the European Commission-European Union.
The dataset will be used to construct a 3D model of the district of Agios Thomas in Larisa, Greece, after the flood events of 2023. It comprises 795 UAV video frames, taken from 4 YouTube videos.
We provide the links for each YouTube video, along with the frame numbers that we kept for each video.
Details on acquiring the dataset can be found here.
Flood Master Dataset
Our Flood Master Dataset consists of flood images selected from different publicly available datasets. The origin of each image is specified in the "sources.csv" file.
The dataset consists of 282 training, 87 validation and 1,973 test frames. We provide the frames from the sourced videos and segmentation masks of the flooded areas.
Details on acquiring the dataset can be found here.
Blaze Fire Classification – Segmentation Dataset
The dataset is intended for wildfire image classification and burnt area segmentation tasks for Unmanned Aerial Vehicles. It comprises 5,408 frames of aerial views taken from 56 videos and 2 public datasets: 829 photographs from the D-Fire public dataset and 34 images from the Burned Area UAV public dataset. The classification task has 5 classes (‘Burnt’, ‘Half-Burnt’, ‘Non-Burnt’, ‘Fire’, ‘Smoke’). For the segmentation task, 404 segmentation masks have been created on a subset of the frames; each mask assigns every pixel of the image to either the ‘burnt’ or the ‘non-burnt’ class.
Details on acquiring the dataset can be found here.
Mastodon Posts Dataset
The dataset comprises 766 social media posts in Greek from the platform “Mastodon”, covering the 2023 wildfires in Greece. Each post was annotated internally with Plutchik-8 emotions.
Details on acquiring the dataset can be found here.