AutoDBBench: An Automated Benchmarking Workbench for Relational, Graph, Vector, and Lakehouse Databases

Zhiqiang Hu; Lijie Tan; Bowen Qin; Min Yu

doi:10.63646/datamind.2023.010103

Published 2023-03-05

Zhiqiang Hu

School of Computer Science and Information Engineering, Hubei University, Wuhan 430062, China

Lijie Tan*

School of Information Engineering, Nanchang University, Nanchang 330031, China
tan.lijie@ncu.edu.cn

Bowen Qin

School of Software, Yunnan University, Kunming 650091, China

Min Yu

School of Data Science, Qingdao University, Qingdao 266071, China

Abstract

The modern data stack now routinely combines relational engines, property graph databases, vector indexes, and lakehouse storage in the same analytical workflow, yet there is no widely accepted benchmarking infrastructure that treats these four families as comparable members of a single evaluation universe. Existing benchmarks such as TPC-C, TPC-H, LDBC SNB, and ANN-Benchmarks are excellent within their respective domains but use incompatible workload generators, metric reporters, and reproducibility conventions, so a practitioner comparing alternatives must stitch together heterogeneous tools and accept that the resulting numbers will not be directly comparable. This article presents AutoDBBench, an automated benchmarking workbench that unifies workload generation, query execution, resource monitoring, and visualization across the four database families through a common internal data model. AutoDBBench centers its design on the database itself: a documented schema, a typed field dictionary, indexed metric storage, a quality control pipeline, and a reusable application programming interface together turn the workbench into a research database rather than a one-off scripting harness. We describe the architecture, the internal data model, the workload-template grammar, and the fault-injection facility, and we report a runnable experiment over eight target databases on a five-node test cluster. Across the chosen workloads AutoDBBench surfaces a 5.6 percent throughput regression in a graph engine that none of the upstream vendor benchmarks detected, attributes a 41.7 percent latency tail to a buffer pool misconfiguration, demonstrates linear scalability up to 16 nodes for three of the four families, and recovers from injected network partitions in 35 to 88 seconds. The full workbench, including configuration files, dictionaries, and reproducible containers, is released under an open license.

Keywords: Database benchmarking; workload generation; reproducible experiments; relational database; graph database; vector database; lakehouse; performance monitoring

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Hu, Z., Tan, L., Qin, B., & Yu, M. (2023). AutoDBBench: An Automated Benchmarking Workbench for Relational, Graph, Vector, and Lakehouse Databases. DATAMIND, 1(1), 20-32. https://doi.org/10.63646/datamind.2023.010103
