HealthQueryHub: A Privacy-Preserving Federated Database Gateway for Cross-Institution Clinical Studies
Abstract
Clinical research across institutions is increasingly constrained by fragmented data silos, heterogeneous record systems, and strict regulatory requirements for patient privacy. Individual hospitals accumulate rich longitudinal datasets, but cross-institutional queries are generally not possible without centralising patient data, which creates significant barriers to reproducible multi-centre studies, rare-disease cohort assembly, and large-scale validation of clinical artificial intelligence models. HealthQueryHub is a federated database gateway designed to address these challenges by providing a secure, auditable query interface over distributed clinical repositories. The system enables researchers to construct and execute cohort queries spanning multiple institutions without transferring raw patient records. Four technical components form the core architecture: a semantic ontology layer built on UMLS and SNOMED CT for cross-site field harmonisation, a role-based access control module with institutional ethics enforcement, a secure aggregation pipeline combining differential privacy and homomorphic encryption, and an immutable audit logging subsystem for regulatory accountability. The underlying database schema follows an extended OMOP Common Data Model augmented with provenance metadata and quality-control fields. Experimental evaluation on simulated multi-institutional datasets across three hospital nodes demonstrates an overall query accuracy of 97.4%, a mean federated query latency of 2.79 seconds, and differential privacy budget expenditure within epsilon = 1.0 per session. Ablation experiments confirm that the ontology mapping layer contributes the largest single accuracy gain, while the privacy pipeline adds only a modest latency overhead of 0.91 seconds relative to an unprotected baseline.
HealthQueryHub provides a reusable, reproducible, and ethically governed infrastructure for clinical research, with its full schema, API specification, and pipeline code released under an open-access licence.
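
To make the per-session privacy budget concrete, the following is a minimal sketch of a differentially private federated count, not the HealthQueryHub implementation. It assumes the Laplace mechanism, a count query of sensitivity 1, and disjoint patient populations across sites (so that parallel composition applies and one query costs epsilon in total). The names `Session`, `federated_count`, `laplace_noise`, and `PrivacyBudgetExceeded` are hypothetical, and the homomorphic encryption step of the pipeline is omitted entirely.

```python
import random
from dataclasses import dataclass


class PrivacyBudgetExceeded(Exception):
    """Raised when a query would push a session past its epsilon budget."""


@dataclass
class Session:
    # Hypothetical per-session accountant; 1.0 mirrors the budget in the abstract.
    epsilon_budget: float = 1.0
    spent: float = 0.0

    def charge(self, epsilon: float) -> None:
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance guards against floating-point accumulation error.
        if self.spent + epsilon > self.epsilon_budget + 1e-12:
            raise PrivacyBudgetExceeded(
                f"requested {epsilon}, remaining {self.epsilon_budget - self.spent}"
            )
        self.spent += epsilon


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def federated_count(site_counts, epsilon, session, rng):
    """Sum per-site cohort counts after each site adds its own Laplace noise.

    Assumption: each patient appears at exactly one site, so the sites'
    epsilon-DP releases compose in parallel and the query costs `epsilon`.
    """
    session.charge(epsilon)
    scale = 1.0 / epsilon  # sensitivity of a count query is 1
    noisy = [count + laplace_noise(scale, rng) for count in site_counts]
    return sum(noisy)


if __name__ == "__main__":
    rng = random.Random(42)
    session = Session()
    # Three hypothetical hospital nodes reporting local cohort sizes.
    estimate = federated_count([120, 87, 203], epsilon=0.5, session=session, rng=rng)
    print(f"noisy cohort size: {estimate:.1f}, epsilon spent: {session.spent}")
```

In this sketch only the noisy per-site counts leave each node, and a further query is refused once the session has spent its budget. A real deployment would also have to account for patients who appear at more than one institution, which breaks the parallel-composition assumption made here.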
