RumorCrisisDB: A Social-Media Crisis Rumor Database for Misinformation Diffusion Analytics

Michael Anderson; Sarah Mitchell; David Thompson

doi:10.63646/

Published 2025-12-30

Michael Anderson

Department of Computer Science, Stanford University, Stanford, CA, USA

Sarah Mitchell*

School of Engineering, University of Michigan, Ann Arbor, MI, USA
sarah.mitchell@umich.edu

David Thompson

Department of Computer Science, Stanford University, Stanford, CA, USA

Abstract

Crisis events concentrate the conditions under which rumors thrive: high uncertainty, intense emotion, and an accelerated demand for information that official channels cannot immediately satisfy. Although a number of valuable public corpora capture fragments of this phenomenon, they were built for different tasks, follow incompatible schemas, use divergent label vocabularies, and rarely preserve the full propagation structure that diffusion analytics requires. This article presents RumorCrisisDB, a relational database design and construction framework that integrates heterogeneous crisis-rumor resources into a single event-centric, cascade-preserving, and annotation-harmonized data model. We first analyze the gap left by existing resources and articulate four use cases that an integrated database must serve: diffusion measurement, detection benchmarking, intervention evaluation, and longitudinal crisis comparison. We then specify the six-entity schema, a six-stage construction pipeline covering re-collection, normalization, linkage, and label harmonization, and the quality-control procedures attached to each stage. The analytics layer is validated through controlled stochastic experiments: Galton–Watson cascade simulations reproduce the heavy-tailed size distributions reported for empirical rumor cascades, and Maki–Thompson-style spreading experiments quantify how the timing of debunking responses changes peak rumor prevalence, with early intervention reducing the simulated peak by more than half relative to a late response. The article closes with the reproducibility and open-access protocol, built on identifier-based redistribution, FAIR principles, and datasheet documentation, together with an explicit account of the design’s limitations.
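The cascade experiment mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Poisson offspring distribution, the `max_size` cap, and the names `poisson`, `cascade_size`, and `simulate` are assumptions made for illustration only. Each post is reshared by a random number of users, and the total cascade size is the number of posts produced before the process dies out.

```python
import math
import random


def poisson(rng, mean):
    """Draw one Poisson(mean) variate with Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def cascade_size(rng, mean_offspring, max_size=100_000):
    """Total number of posts in one Galton-Watson cascade, root included.

    Assumption: offspring counts are Poisson distributed; the abstract
    does not state which offspring distribution the article uses.
    """
    size = 1    # the root post
    active = 1  # posts in the current generation
    while active > 0 and size < max_size:
        children = sum(poisson(rng, mean_offspring) for _ in range(active))
        size += children
        active = children
    return min(size, max_size)  # hypothetical cap to bound the run time


def simulate(n_cascades, mean_offspring, seed=0):
    """Sizes of n_cascades independent cascades, reproducible via the seed."""
    rng = random.Random(seed)
    return [cascade_size(rng, mean_offspring) for _ in range(n_cascades)]


if __name__ == "__main__":
    sizes = sorted(simulate(5_000, mean_offspring=0.95))
    print("median size:", sizes[len(sizes) // 2])
    print("largest size:", sizes[-1])
```

With a mean offspring count just below 1, most simulated cascades stay very small while a few grow large, which is the qualitative heavy-tailed pattern the abstract refers to. The specific parameter values here are illustrative and are not taken from the article.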

Keywords: Crisis informatics; rumor detection; misinformation diffusion; social media datasets; database schema; cascade analysis; reproducibility

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Anderson, M., Mitchell, S., & Thompson, D. (2025). RumorCrisisDB: A Social-Media Crisis Rumor Database for Misinformation Diffusion Analytics. DATAMIND, 3(4), 5-28. https://doi.org/10.63646/
