Publications about the project
Unsupervised Multimodal Graph-based Model for Geo-social Analysis
The systematic analysis of user-generated social media content, especially when enriched with geospatial context, plays a vital role in domains such as disaster management and public opinion monitoring. Although multimodal approaches have made significant progress, most existing models remain fragmented, processing each modality separately rather than integrating them into a unified end-to-end model. To address this, we propose an unsupervised, multimodal graph-based methodology that jointly embeds semantic and geographic information into a shared representation space. The proposed methodology comprises two architectural paradigms: a mono graph (MonoGraph) model that jointly encodes both modalities, and a multi graph (MultiGraph) model that separately models semantic and geographic relationships and subsequently integrates them through multi-head attention mechanisms. A composite loss, combining contrastive, coherence, and alignment objectives, guides the learning process to produce semantically coherent and spatially compact clusters. Experiments on four real-world disaster datasets demonstrate that our models consistently outperform existing baselines in topic quality, spatial coherence, and interpretability. Inherently domain-independent, the framework can be readily extended to diverse forms of multimodal data and a wide range of downstream analysis tasks.
Enhancing satellite-based emergency mapping: Identifying wildfires through geo-social media analysis
When a disaster emerges, timely acquisition of information is crucial for a rapid situation assessment. Although automation in the standard satellite-based emergency mapping workflow has advanced, delays still occur at crucial steps. In order to speed up the provision of satellite-based crisis products to emergency managers, this paper proposes a geo-social media-based approach that detects disaster events based on the spatio-temporal analysis of georeferenced, disaster-related Tweets. The proposed methodology is validated on the basis of two use cases: wildfires in Chile and British Columbia. The results show the general ability of Twitter to forecast events several days in advance, at least for the Chile use case. However, there are large spatial differences, as there is a correlation between population density and the reliability of Twitter data. Consequently, only a few meaningful alerts could be generated for British Columbia, an area with very low population numbers.
Multimodal GeoAI: An integrated spatio-temporal topic-sentiment model for the analysis of geo-social media posts for disaster management
A multimodal GeoAI approach to combining text with spatiotemporal features for enhanced relevance classification of social media posts in disaster response
Geo-referenced social media data supports disaster management by offering real-time insights through user-generated content. To identify critical information amid high volumes of noise, classifying the relevance of posts is essential. Most existing methods primarily use textual features, neglecting spatial and temporal context despite its importance in determining relevance. This study proposes a multimodal approach that integrates text with spatiotemporal features for relevance classification of geo-referenced social media posts. We evaluate our method on 4,574 manually labelled posts from five disasters: the 2020 California wildfires, 2021 Ahr Valley floods, 2023 Chile wildfires, 2023 Turkey earthquake and 2023 Emilia-Romagna floods. Labels were assigned based on text, geographic location and time. Our spatiotemporal features include proximity to disaster impact sites, local co-occurrences with disaster-related posts, event type and geographic context. When utilised on their own, they achieved a macro F1 score of 0.713 with a random forest classifier. A fine-tuned TwHIN-BERT-base model using only text scored 0.779. For multimodal classification, we tested feature concatenation, in-context learning, stacking and partial stacking. Partial stacking produced the highest macro F1 score (0.814). Our multilingual, context-aware classification approach lays the groundwork for more integrated GeoAI applications in disaster management, the social sciences and beyond.
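For readers unfamiliar with the macro F1 score reported in this abstract, the following minimal sketch shows how the metric is commonly computed: the F1 score is calculated per class and then averaged without weighting by class size. The function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the paper's code or data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    # Consider every label that occurs in either the truth or the predictions.
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy example: 1 = relevant post, 0 = not relevant (made-up labels).
    truth = [1, 1, 0, 0]
    predicted = [1, 0, 0, 0]
    print(round(macro_f1(truth, predicted), 4))
```

Because every class contributes equally to the average, macro F1 penalises poor performance on a minority class (for example, the relevant posts) more strongly than plain accuracy would.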
Clustering-Based Joint Topic-Sentiment Modeling of Social Media Data: A Neural Networks Approach
With the vast amount of social media posts available online, topic modeling and sentiment analysis have become central methods to better understand and analyze online behavior and opinion. However, semantic and sentiment analysis have rarely been combined for joint topic-sentiment modeling which yields semantic topics associated with sentiments. Recent breakthroughs in natural language processing have also not been leveraged for joint topic-sentiment modeling so far. Inspired by these advancements, this paper presents a novel framework for joint topic-sentiment modeling of short texts based on pre-trained language models and a clustering approach. The method leverages techniques from dimensionality reduction and clustering for which multiple algorithms were considered. All configurations were experimentally compared against existing joint topic-sentiment models and an independent sequential baseline. Our framework produced clusters with semantic topic quality scores of up to 0.23 while the best score among the previous approaches was 0.12. The sentiment classification accuracy increased from 0.35 to 0.72 and the uniformity of sentiments within the clusters reached up to 0.9 in contrast to the baseline of 0.56. The presented approach can benefit various research areas such as disaster management where sentiments associated with topics can provide practical useful information.
Multimodal Geo-Information Extraction from Social Media for Supporting Decision-Making in Disaster Management
Effective decision-making in natural disaster management relies heavily on a comprehensive understanding of the situation in affected areas. Social media has been established as a tool to monitor human response and damage assessment. Given the vast amounts of data available, computational methods such as topic modelling are typically employed to reduce information complexity. However, these methods mostly neglect aspects such as geographic location and emotional response, which frequently results in sequential workflows of initial semantic filtering and subsequent spatial or spatio-temporal analysis. This study presents a novel approach for multimodal information extraction from geo-social media data for aiding decision support in disaster management. The method leverages a spatial, temporal, semantic, and sentiment-based clustering approach of social media posts to extract clusters that provide insights into disaster-related content. A case study in the Ahr Valley region in Germany demonstrates the method’s effectiveness in providing actionable insights for disaster response and management. The approach offers a tool for the quick assessment of disaster-related information from social media, potentially aiding timely and informed decision-making.
Active Learning for Identifying Disaster-Related Tweets: A Comparison with Keyword Filtering and Generic Fine-Tuning
Social media can provide essential information for emergency response during natural disasters in near real-time. However, it is a difficult task to identify the disaster-related posts among the large amount of unstructured data available. Previous methods often use keyword filtering, topic modelling or classification-based techniques to identify such posts. Active Learning (AL) presents a promising sub-field of Machine Learning (ML) that has not been used much in the field of text classification of social media content. This study therefore investigates the potential of AL for identifying disaster-related Tweets. We compare a keyword filtering approach, a RoBERTa model fine-tuned with generic data from CrisisLex, a base RoBERTa model trained with AL and a fine-tuned RoBERTa model trained with AL regarding classification performance. For testing, data from CrisisLex and manually labelled data from the 2021 flood in Germany and the 2023 Chile forest fires were considered. The results show that generic fine-tuning combined with 10 rounds of AL outperformed all other approaches. Consequently, a broadly applicable model for the identification of disaster-related Tweets could be trained with very little labelling effort. The model can be applied to use cases beyond this study and provides a useful tool for further research in social media analysis.
Assessing the spatial accuracy of geocoding flood-related imagery using Vision Language Models
While the capabilities of large language models and visual language models for various classification tasks have advanced significantly, their potential for location inference remains largely underexplored. Therefore, this study evaluates the performance of four prominent models — BLIP-2, LLaVA1.6, OpenFlamingo, and GPT-4o — for geocoding flood-related images from Flickr. Model inferences are compared against the original photo locations and human-labelled assessments. Our findings reveal that GPT-4o achieves the highest spatial accuracy (median deviation of 89.12 km). OpenFlamingo geocodes the highest number of images (90.7%), albeit with fluctuating quality (median 408.35 km), while still outperforming the human annotators. LLaVA1.6 geocodes only 18.9% of all images, while BLIP-2 exhibits the highest median deviation (1,781 km). We observe a spatial bias in our results, with inferences being most accurate in Central Europe. Additionally, model results improve when images feature recognisable landmarks. The proposed workflow could significantly increase the amount of geocoded web-based data available for disaster management, though further research is required to enhance accuracy across diverse geographic contexts.
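The median deviation figures in this abstract are distances between inferred and original photo locations. As a rough sketch of how such a figure could be computed, the example below uses the haversine great-circle distance on a spherical Earth. The abstract does not state which distance formula the study used, so the haversine choice, the function names and the sample coordinates are all assumptions made for illustration.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def median_deviation_km(predicted, actual):
    """Median distance between paired predicted and actual (lat, lon) tuples."""
    distances = [
        haversine_km(p[0], p[1], a[0], a[1])
        for p, a in zip(predicted, actual)
    ]
    return median(distances)


if __name__ == "__main__":
    # Made-up coordinates: model guesses vs. original photo locations.
    guesses = [(50.5, 7.0), (48.2, 16.4), (45.0, 9.0)]
    originals = [(50.54, 7.11), (47.8, 13.0), (44.5, 11.3)]
    print(f"{median_deviation_km(guesses, originals):.1f} km")
```

The median is less sensitive than the mean to a handful of inferences that land on the wrong continent, which is one plausible reason to report it for geocoding accuracy.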
More than just Tweets: the potential of alternative geo-social media data for disaster management
Natural disasters are increasingly prevalent worldwide, necessitating the utilisation of diverse datasets for effective disaster response. While geo-social media data represents a valuable resource in this context, the recent restrictions to Twitter data have significantly impacted its availability for disaster research. Alternative social media platforms to Twitter remain underexplored, leading to limited understanding of their potential. To address this gap, we collected posts for a specific use case, Hurricane Ian, from four social media platforms (Mastodon, Reddit, Telegram, and TikTok), and subsequently geoparsed each post. We then computed spatial and temporal patterns and evaluated their correlations to investigate the potential applicability of other data sources for disaster response efforts. While none of the platforms can fully substitute Twitter’s role in disaster management, the findings demonstrated that substantial amounts of potentially valuable data can be sourced from other platforms. Despite consistent overall patterns, subtle differences in temporal activity and spatial distribution suggest that each platform offers unique insights that enhance situational awareness. However, a significant challenge in using these platforms for disaster response is the low spatial accuracy achievable through geoparsing.
Bluesky as a social media data source for disaster management: investigating spatio-temporal, semantic and emotional patterns for floods and wildfires
Social media has become a key data source for near-real-time disaster monitoring and response, with Twitter playing a central role for over a decade. However, recent Application Programming Interface (API) changes on Twitter (now: X) have restricted academic data access, creating an urgent need to identify viable alternatives. This study investigates the suitability of the decentralised social media platform Bluesky for disaster-related geo-social media analysis, aiming to evaluate whether it can serve as a viable alternative microblogging platform for spatio-temporal disaster monitoring. Using a keyword-based crawling pipeline, we collected 676,337 posts related to two major natural disasters: the September 2024 Central Europe floods and the January 2025 Southern California wildfires. We applied a multilingual analysis pipeline covering semantic, emotional, geospatial, and temporal modalities. It includes disaster-relatedness classification, emotion detection, geoparsing and subsequent spatio-temporal aggregation. Our results show that disaster-related content on Bluesky surged in direct response to the disasters, peaking at up to 80% of daily posts during the main impact phases. Emotional expressions, particularly fear and anger, rose sharply alongside event progression. Geospatial analysis of the geoparsed data revealed heightened disaster-related posting activity in affected areas, demonstrating the platform’s utility for geographic disaster monitoring. However, large differences between urban and rural regions, as well as between different countries, were identified. Furthermore, we demonstrate current platform limitations such as user penetration, API constraints and sensitivity to keyword selection.