Learning Analytics Databases for Early Warning of Student Dropout
Abstract
Student dropout is one of the most consequential outcomes in higher education, affecting individual career trajectories, institutional funding, and national workforce development. Despite the proliferation of Learning Management Systems (LMS) that generate rich longitudinal records of student behaviour, most universities lack a structured, integrated database infrastructure that converts raw LMS event logs, enrolment records, financial aid data, and social network interactions into actionable early warning signals. This article presents EWS-LMS-DB, a purpose-built learning analytics database designed to support reproducible early warning research and evidence-based intervention in undergraduate education. The database integrates six core relational tables (student demographics, course enrolment records, LMS interaction events, financial aid transactions, risk alerts, and intervention outcomes) and covers 28,640 students across 14 semesters at three federal universities in Brazil. An LSTM-based sequential model, EWS-LSTM, is benchmarked against Random Forest and Logistic Regression baselines; it achieves an AUC-ROC of 0.88 at semester week 8 and provides an average of 8.4 weeks of early warning lead time before confirmed dropout. A fairness analysis across eight demographic groups indicates that true positive rate parity is maintained within ±5 percentage points across gender, first-generation, scholarship, and rural-urban subgroups. An intervention backtest using matched control groups shows that students who received an advisor contact within one week of a high-risk alert had a semester retention rate 23 percentage points higher at twelve weeks than their matched controls. The database schema, field dictionary, ingestion pipeline, and a 20 percent open sample are released for reproducible experimentation.
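
To make the fairness criterion concrete, the following is a minimal sketch of how a true positive rate (TPR) parity check of this kind could be computed. It is not the article's implementation. It assumes that "within ±5 percentage points" means each subgroup's TPR is compared with the overall TPR; the function names, the tolerance argument, and the toy data are all hypothetical.

```python
from collections import defaultdict


def tpr_by_group(y_true, y_pred, groups):
    """Return {group: TPR}, computed only over students who actually dropped out."""
    hits = defaultdict(int)
    positives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            hits[group] += int(pred == 1)
    return {g: hits[g] / positives[g] for g in positives}


def overall_tpr(y_true, y_pred):
    """TPR over all students who actually dropped out, ignoring group membership."""
    flagged = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(flagged) / len(flagged)


def within_parity(y_true, y_pred, groups, tolerance=0.05):
    """True if every subgroup TPR lies within `tolerance` of the overall TPR.

    Assumption: parity is measured against the overall TPR, not pairwise.
    """
    reference = overall_tpr(y_true, y_pred)
    tprs = tpr_by_group(y_true, y_pred, groups)
    return all(abs(rate - reference) <= tolerance for rate in tprs.values())


# Toy data (hypothetical): 1 = dropout (y_true) or high-risk alert (y_pred).
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 1]
groups = ["urban"] * 5 + ["rural"] * 5

tprs = tpr_by_group(y_true, y_pred, groups)
parity_ok = within_parity(y_true, y_pred, groups)
```

In this toy example the urban and rural subgroups have TPRs of 0.75 and 0.50, so the check fails at a 5 percentage point tolerance. A pairwise definition (largest TPR minus smallest) would be a stricter alternative.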
