RumorCrisisDB: A Social-Media Crisis Rumor Database for Misinformation Diffusion Analytics
Main article
Abstract
Crisis events concentrate the conditions under which rumors thrive: high uncertainty, intense emotion, and an accelerated demand for information that official channels cannot immediately satisfy. Although a number of valuable public corpora capture fragments of this phenomenon, they were built for different tasks, follow incompatible schemas, use divergent label vocabularies, and rarely preserve the full propagation structure that diffusion analytics requires. This article presents RumorCrisisDB, a relational database design and construction framework that integrates heterogeneous crisis-rumor resources into a single event-centric, cascade-preserving, and annotation-harmonized data model. We first analyze the gap left by existing resources and articulate four use cases that an integrated database must serve: diffusion measurement, detection benchmarking, intervention evaluation, and longitudinal crisis comparison. We then specify the six-entity schema; a six-stage construction pipeline covering re-collection, normalization, linkage, and label harmonization; and the quality-control procedures attached to each stage. The analytics layer is validated through controlled stochastic experiments: Galton–Watson cascade simulations reproduce the heavy-tailed size distributions reported for empirical rumor cascades, and Maki–Thompson-style spreading experiments quantify how the timing of debunking responses changes peak rumor prevalence, with early intervention reducing the simulated peak by more than half relative to a late response. The article closes with the reproducibility and open-access protocol, which is built on identifier-based redistribution, FAIR principles, and datasheet documentation, and with an explicit account of the design’s limitations.
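
As a rough illustration of the kind of Galton–Watson cascade simulation referred to above, the following Python sketch grows each cascade generation by generation from a single root post and records its total size. It is not the article's experimental code: the Poisson offspring distribution, the function names (sample_poisson, cascade_size, simulate_sizes), the mean of 0.95 reshares per post, and the size cap are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def cascade_size(rng, mean_offspring, max_size=10_000):
    """Total number of posts in one cascade grown from a single root.

    Assumption: every post triggers a Poisson(mean_offspring) number of
    reshares, independently of all other posts. Growth stops when a
    generation is empty or the (hypothetical) size cap is reached.
    """
    size = 1      # the root post
    active = 1    # posts in the current generation
    while active > 0 and size < max_size:
        children = sum(sample_poisson(rng, mean_offspring) for _ in range(active))
        size += children
        active = children
    return min(size, max_size)


def simulate_sizes(n_cascades, mean_offspring, seed=0):
    """Simulate n_cascades independent cascades with a fixed seed."""
    rng = random.Random(seed)
    return [cascade_size(rng, mean_offspring) for _ in range(n_cascades)]


if __name__ == "__main__":
    # Illustrative parameters only, not the article's configuration.
    sizes = simulate_sizes(2000, 0.95)
    print("median size:", statistics.median(sizes))
    print("mean size:", statistics.mean(sizes))
    print("largest cascade:", max(sizes))
```

With a mean offspring count just below one, most simulated cascades die out after a handful of posts while a few grow very large, so the mean size sits well above the median. That skew is the qualitative signature of a heavy-tailed size distribution; a comparison against empirical cascades would additionally require fitting the tail, which this sketch does not attempt.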
