Urban Air-Quality Knowledge Graphs: Database-Driven Policy Analytics for Pollution Intervention Planning
Main article
Abstract
Urban air pollution is among the most complex policy challenges in environmental governance: the causal pathways linking emission sources to observed concentrations, and interventions to measured outcomes, vary across cities, sectors, and seasons. Conventional tabular air-quality databases store pollutant measurements and regulatory records in isolation, which prevents the structured linking of pollution events to emission inventories, meteorological context, policy timelines, and health outcomes that effective intervention planning requires. This paper presents AirKGDB, an open urban air-quality knowledge graph database that formalises these multi-domain linkages through a seven-entity ontological schema grounded in the W3C Semantic Sensor Network (SSN) and PROV-O provenance standards. AirKGDB integrates five heterogeneous source streams (national ground monitoring networks, MODIS and Sentinel-5P satellite retrievals, traffic and industrial emission inventories, weather reanalysis, and structured policy corpora) for 288 Chinese prefecture-level cities over the period 2014–2022. The database comprises 18.6 million pollution event nodes, 4.3 million policy-intervention nodes, and 214 million typed edges linking events to sources, interventions, weather contexts, and health outcomes. We conduct three reproducible experiments: (1) a difference-in-differences (DiD) evaluation of the 2013 Air Pollution Prevention and Control Action Plan across ten major cities, which finds a mean PM2.5 reduction of 22.4 μg/m³ (95% CI: [17.8, 27.0]); (2) a regional heterogeneity decomposition that attributes PM2.5 reductions to industrial control, traffic restriction, and residential heating interventions; and (3) a policy case-retrieval experiment that achieves NDCG@5 = 0.80 and Precision@1 = 0.82 for similar-context policy recommendation.
AirKGDB, its construction pipeline, SPARQL/Cypher query templates, and evaluation scripts are released under CC-BY 4.0 to support reproducible environmental AI research.
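
For readers unfamiliar with the difference-in-differences design named in experiment (1), the following is a minimal sketch of the textbook two-group, two-period estimator. It is not the AirKGDB evaluation script: the function name `did_estimate`, the record layout, and the toy concentrations are all hypothetical and chosen only for illustration, and the paper's actual specification may include covariates, fixed effects, or multiple periods.

```python
from statistics import mean


def did_estimate(records):
    """Textbook 2x2 difference-in-differences estimate.

    `records` is an iterable of (treated, post, value) tuples, where
    `treated` and `post` are booleans and `value` is the outcome
    (here, a PM2.5 concentration in ug/m3). Hypothetical layout.
    """
    cells = {}
    for treated, post, value in records:
        cells.setdefault((bool(treated), bool(post)), []).append(value)

    # All four group-by-period cells must be observed.
    required = [(True, True), (True, False), (False, True), (False, False)]
    missing = [cell for cell in required if cell not in cells]
    if missing:
        raise ValueError(f"missing (treated, post) cells: {missing}")

    m = {cell: mean(values) for cell, values in cells.items()}
    # Change in the treated group minus change in the control group.
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    return treated_change - control_change


# Toy numbers for illustration only; they are NOT taken from AirKGDB.
toy = [
    (True, False, 90.0), (True, False, 86.0),    # treated cities, before policy
    (True, True, 60.0), (True, True, 64.0),      # treated cities, after policy
    (False, False, 70.0), (False, False, 74.0),  # control cities, before policy
    (False, True, 66.0), (False, True, 68.0),    # control cities, after policy
]
effect = did_estimate(toy)
print(f"DiD estimate: {effect:.1f} ug/m3")
```

A negative estimate indicates that concentrations fell more in the treated group than in the control group. The interpretation as a causal effect rests on the parallel-trends assumption, which this sketch does not test.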
