Datasets and Publications

Project publications are originally saved on a Zenodo community. Access the project's community page to see the details.

Displaying 81-89 of 89 records

More than just Tweets: the potential of alternative geo-social media data for disaster management

Vračević, Nikola; Schmidt, Sebastian; Keskin, Merve; Hanny, David; Resch, Bernd

Publication date: 20/07/2025 - DOI: 10.1007/s13278-025-01494-z

Natural disasters are increasingly prevalent worldwide, necessitating the utilisation of diverse datasets for effective disaster response. While geo-social media data represents a valuable resource in this context, the recent restrictions to Twitter data have significantly impacted its availability for disaster research. Alternative social media platforms to Twitter remain underexplored, leading to limited understanding of their potential. To address this gap, we collected posts for a specific use case, Hurricane Ian, from four social media platforms (Mastodon, Reddit, Telegram, and TikTok), and subsequently geoparsed each post. We then computed spatial and temporal patterns and evaluated their correlations to investigate the potential applicability of other data sources for disaster response efforts. While none of the platforms can fully substitute Twitter’s role in disaster management, the findings demonstrated that substantial amounts of potentially valuable data can be sourced from other platforms. Despite consistent overall patterns, subtle differences in temporal activity and spatial distribution suggest that each platform offers unique insights that enhance situational awareness. However, a significant challenge in using these platforms for disaster response is the low spatial accuracy achievable through geoparsing.

Bluesky as a social media data source for disaster management: investigating spatio-temporal, semantic and emotional patterns for floods and wildfires

Hanny, David; Schmidt, Sebastian; Resch, Bernd

Publication date: 19/12/2025 - DOI: 10.1007/s42001-025-00448-x

Social media has become a key data source for near-real-time disaster monitoring and response, with Twitter playing a central role for over a decade. However, recent Application Programming Interface (API) changes on Twitter (now: X) have restricted academic data access, creating an urgent need to identify viable alternatives. This study investigates the suitability of the decentralised social media platform Bluesky for disaster-related geo-social media analysis, aiming to evaluate whether it can serve as a viable alternative microblogging platform for spatio-temporal disaster monitoring. Using a keyword-based crawling pipeline, we collected 676,337 posts related to two major natural disasters: the September 2024 Central Europe floods and the January 2025 Southern California wildfires. We applied a multilingual analysis pipeline covering semantic, emotional, geospatial, and temporal modalities. It includes disaster-relatedness classification, emotion detection, geoparsing and subsequent spatio-temporal aggregation. Our results show that disaster-related content on Bluesky surged in direct response to the disasters, peaking at up to 80% of daily posts during the main impact phases. Emotional expressions, particularly fear and anger, rose sharply alongside event progression. Geospatial analysis of the geoparsed data revealed heightened disaster-related posting activity in affected areas, demonstrating the platform’s utility for geographic disaster monitoring. However, large differences between urban and rural regions, as well as between different countries, were identified. Furthermore, we demonstrate current platform limitations such as user penetration, API constraints and sensitivity to keyword selection.

Few-Shot Learning for Relevance Classification of Textual Social Media Posts in Disaster Response

Hanny, David; Schmidt, Sebastian; Gandhi, Shaily; Resch, Bernd

Publication date: 26/06/2025 - DOI: 10.5281/zenodo.18234131

Social media can provide real-time insights during natural disasters, yet efficiently identifying relevant content remains a challenge due to the reliance on large labelled datasets and high computational costs. This study therefore investigates the potential of Few-Shot Learning (FSL) for relevance classification of textual social media posts during disasters. We compare few-shot prompting using eight Small Language Models (SLMs) and a contrastive learning approach (SetFit) with data from five disasters across the world: the 2020 California wildfires, 2021 Ahr Valley floods, 2023 Chile wildfires, 2023 Emilia-Romagna floods, and 2023 Turkey/Syria earthquake. GPT-4o-mini achieves the highest average macro F1 score (0.77) using just five labelled examples per class, while the multilingual-e5-base model fine-tuned with SetFit offers a strong alternative (avg. macro F1 = 0.65) without reliance on prompt engineering. Our findings highlight the potential of SLMs and FSL for scalable and resource-efficient data analytics in disaster management and broader social science research.

STRUCTURED EFFICIENT SELF-ATTENTION SHOWCASED ON DETR-BASED DETECTORS

Militsis, Nikolaos Marios; Mygdalis, Vasileios; Pitas, Ioannis

Publication date: 07/01/2025 - DOI: 10.5281/zenodo.14608445

The Multi-Head Self-Attention (MHSA) mechanism stands as the cornerstone of Transformer architectures, endowing them with unparalleled expressive capabilities. The main learnable parameters in a transformer self-attention block include matrices that project the input features into subspaces, where similarity metrics are thereby calculated. In this paper, we argue that we could use less learnable parameters for achieving good projections. We propose the Structured Efficient Self-Attention (SESA) module, a generic paradigm inspired by the Johnson-Lindenstrauss (JL) lemma, that employs an Adaptive Fast JL Transform (A-FJLT) parameterised by a single learnable vector for each projection. This allows us to eliminate a substantial 75% of the learnable parameters of the legacy MHSA, with very slight sacrifices to accuracy. SESA properties are showcased on the demanding task of object detection at the COCO dataset, achieving comparable performance with its computationally intensive counterparts.

These Maps Are Made by Propagation: Adapting Deep Stereo Networks to Road Scenarios with Decisive Disparity Diffusion

Chuang-Wei Liu; Yikang Zhang; Qijun Chen; Ioannis Pitas; Rui Fan

Publication date: 06/11/2024 - DOI: 10.48550/arXiv.2411.03717

Stereo matching has emerged as a cost-effective solution for road surface 3D reconstruction, garnering significant attention towards improving both computational efficiency and accuracy. This article introduces decisive disparity diffusion (D3Stereo), marking the first exploration of dense deep feature matching that adapts pre-trained deep convolutional neural networks (DCNNs) to previously unseen road scenarios. A pyramid of cost volumes is initially created using various levels of learned representations. Subsequently, a novel recursive bilateral filtering algorithm is employed to aggregate these costs. A key innovation of D3Stereo lies in its alternating decisive disparity diffusion strategy, wherein intra-scale diffusion is employed to complete sparse disparity images, while inter-scale inheritance provides valuable prior information for higher resolutions. Extensive experiments conducted on our created UDTIRI-Stereo and Stereo-Road datasets underscore the effectiveness of D3Stereo strategy in adapting pre-trained DCNNs and its superior performance compared to all other explicit programming-based algorithms designed specifically for road surface 3D reconstruction. Additional experiments conducted on the Middlebury dataset with backbone DCNNs pre-trained on the ImageNet database further validate the versatility of D3Stereo strategy in tackling general stereo matching problems.

3D-Flood Dataset

Kitsos, Filippos

Publication date: 27/05/2024 - DOI: 10.5281/zenodo.11349721

General description of the dataset

The dataset will be used for the construction of a 3D model regarding the district of Agios Thomas in Larisa, Greece, after the flood events of 2023. It is comprised of 795 UAV video frames, taken from four publicly available videos.

Dataset Structure

Within the dataset, a dedicated folder contains four CSV files. Each file provides the link to the original video and specifies the video frames that were utilized from each source.

Details on acquiring the dataset can be found here.

Flood Master Dataset

Kitsos, Filippos; Zamioudis, Alexandros

Publication date: 06/06/2024 - DOI: 10.5281/zenodo.11501494

General description of the dataset

This dataset comprises flood-related images collected from several publicly available datasets, as well as frames extracted and annotated from related videos. The training and validation sets were constructed using the following sources: “Flood Area Segmentation,” “Water Dataset,” and “Roadway Flooding Image Dataset.” For the first two sources, the corresponding binary masks were normalized to values in {0,1}, while the third source was retained in its original normalized form. The test set consists of video frames extracted from real flooding scenarios in Greece and Italy, which were annotated for segmentation purposes. Specifically, the Greek and Italian videos contributed 567 and 1,406 frames, respectively. If one uses any part of these datasets in his/her work, he/she is kindly asked to cite the following papers:

P. Mentesidis, V. Mygdalis and I.Pitas, "Improve Real-time flood segmentation by encoding and distilling foreground information", IEEE International Conference on Image Processing (ICIP), Anchorage, Alaska, USA, 13-17 September, 2025.
A. Gerontopoulos, D. Papaioannou, C. Papaioannidis and I.Pitas, "Real-Time Flood Water Segmentation with Deep Neural Networks", IEEE 25th International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW), Tromsø, Norway, pp. 85-91, 2025

Dataset Structure

The dataset is organized into separate folders for the training, validation, and test sets, each containing the corresponding annotations and a CSV file specifying the image paths, annotation paths, and the source of each image. The training–validation split was performed using a 3:1 ratio. Finally, the URLs of the original data sources are provided in the “sources.csv” file.

Details on acquiring the dataset can be found here.

Blaze Fire Classification – Segmentation Dataset

Michalis, Siavrakas; Kitsos, Filippos

Publication date: 06/06/2024 - DOI: 10.5281/zenodo.11501836

The dataset will be used for wildfire image classification and burnt area segmentation tasks for Unmanned Aerial Vehicles. It is comprised of 5,408 frames of aerial views taken from 56 videos and 2 public datasets. From the D-Fire public dataset, 829 photographs were used; and from the Burned Area UAV public dataset 34 images were used. For the classification task, there are 5 classes (‘Burnt’, ‘Half-Burnt’, ’Non-Burnt’, ‘Fire’, ‘Smoke’). As for the segmentation task, 404 segmentation masks on a subset have been created, which assign to each pixel of the image the class ‘burnt’ or the class ‘non-burnt’. If one uses any part of these datasets in his/her work, he/she is kindly asked to cite the following paper:

M. Siavrakas, C. Papaioannidis and I.Pitas, “BLAZE: A dataset for wildfire and burnt area UAV image classification and segmentation”, IEEE International Conference on Image Processing (ICIP), Anchorage, Alaska, USA, 13-17 September, 2025.

Dataset Structure

CSV files are provided containing the frames taken from every video, the class that has been assigned to them, the path to the respective segmentation mask along with the mask for the segmentation subset and the related links to the public videos and the 2 public datasets.More details on the dataset are available in the following papers:

de Venâncio, P.V.A.B., Lisboa, A.C. & Barbosa, A.V. An automatic fire detection system based on deep convolutional neural networks for low-power, resource-constrained devices. Neural Comput & Applic 34, 15349–15368 (2022). DOI
Tiago F.R. Ribeiro, Fernando Silva, José Moreira, Rogério Luís de C. Costa,Burned area semantic segmentation: A novel dataset and evaluation using convolutional networks,ISPRS Journal of Photogrammetry and Remote Sensing,Volume 202,2023,Pages 565-580,ISSN 0924-2716. DOI

Details on acquiring the dataset can be found here.

Mastodon Posts Dataset

Avgoustidis, Fotios; Giannouris, Polydoros; Kitsos, Filippos

Publication date: 06/06/2024 - DOI: 10.5281/zenodo.11502116

General description of the dataset

The dataset comprises of 766 social media posts in Greek from the platform “Mastodon” spanning the 2023 wildfires in Greece. Each post was annotated internally with Plutchik-8 emotions. To obtain texts use the Mastodon API (https://docs.joinmastodon.org/api/) along with the provided IDs.

Dataset Structure

The dataset comprises a single CSV file containing pairs of text identifiers and their corresponding sentiment labels.

Details on acquiring the dataset can be found here.

First page
Previous page
1
2
3
4
5
6
7
8
9

Datasets and Publications

Publications about the project