Synthetic Telematics Insurance Data for Transparent Rate-Making: A Benchmark Framework for Claim Frequency, Severity, and Territory Clustering
Abstract
Telematics-enabled usage-based insurance (UBI) is reshaping how motor insurers measure and price individual driver risk, yet the move from traditional rating tables to data-rich pricing engines also creates new regulatory pressures around transparency, fairness, and territorial differentiation. This study proposes a benchmark modelling framework that combines penalised generalised linear models, generalised additive models with spline-based smooth functions, gradient-boosted decision trees, and an interpretable low-dimensional clustering procedure for territory design. Using a publicly available synthetic telematics portfolio that mirrors the statistical properties of operational UBI books, we estimate claim frequency and claim severity separately, examine variable interactions between annual mileage and traditional rating factors, and rank predictor importance using SHAP values from an XGBoost model. We then apply a regularised k-means procedure to cluster territories on policy-relevant risk indicators, with the optimal number of clusters chosen by a penalised MAD/MSE criterion. The results show that generalised additive models deliver substantially better group-level calibration than linear pricing models for young drivers and for high-mileage segments, while interpretable clustering on two-dimensional risk maps produces transparent territory structures that regulators can inspect, defend, and compare across rate filings. The framework supports actuarial fairness review, exposure-based rate justification, and reproducible UBI rate-making in markets where model explainability is a regulatory requirement rather than an optional feature.
