
Jian Li
School of Environmental Science, Nanjing University of Information Science and Technology, Nanjing 210044, China
Mei Wang
College of Geography and Environmental Science, Henan University, Kaifeng 475004, China
Xiaofeng Zhang*
School of Environment and Resource Sciences, Shanxi University, Taiyuan 030006, China
xfzhang@sxu.edu.cn

DOI: https://doi.org/10.63646/datamind.2023.010205

Abstract

Urban air pollution poses one of the most complex policy challenges in environmental governance: the causal pathways linking emission sources to observed concentrations, and interventions to measured outcomes, are inherently heterogeneous across cities, sectors, and seasons. Conventional tabular air-quality databases store pollutant measurements and regulatory records in isolation, preventing the structured linking of pollution events with emission inventories, meteorological context, policy timelines, and health outcomes that effective intervention planning requires. This paper presents AirKGDB, an open urban air-quality knowledge graph database that formalises these multi-domain linkages through a seven-entity ontological schema grounded in W3C Semantic Sensor Network (SSN) and PROV-O provenance standards. AirKGDB integrates five heterogeneous source streams (national ground monitoring networks, MODIS and Sentinel-5P satellite retrievals, traffic and industrial emission inventories, weather reanalysis, and structured policy corpora) for 288 Chinese prefecture-level cities over the period 2014–2022. The database comprises 18.6 million pollution event nodes, 4.3 million policy-intervention nodes, and 214 million typed edges linking events to sources, interventions, weather contexts, and health outcomes. We conduct three reproducible experiments: (1) a difference-in-differences (DiD) evaluation of the 2013 Air Pollution Prevention and Control Action Plan across ten major cities, finding mean PM2.5 reductions of 22.4 μg/m³ (95% CI: [17.8, 27.0]); (2) a regional heterogeneity decomposition attributing PM2.5 reductions to industrial control, traffic restriction, and residential heating interventions; and (3) a policy case-retrieval experiment achieving NDCG@5 = 0.80 and Precision@1 = 0.82 for similar-context policy recommendation. AirKGDB, its construction pipeline, SPARQL/Cypher query templates, and evaluation scripts are released under CC-BY 4.0 to support reproducible environmental AI research.

Article details

How to Cite

Li, J., Wang, M., & Zhang, X. (2023). Urban Air-Quality Knowledge Graphs: Database-Driven Policy Analytics for Pollution Intervention Planning. DATAMIND, 1(2), 47-57. https://doi.org/10.63646/datamind.2023.010205