Sentencing standardization based on judicial big data using gradient boosting decision trees

By: Ying Chieh Lin¹, Shaojun Liu¹
¹ School of Civil, Commercial, and Economic Law, China University of Political Science and Law

Abstract

Integrating big data analytics into sentencing decisions has become a key direction for improving judicial transparency and fairness in the era of digital justice. This paper proposes a sentencing standardization framework based on judicial big data and interpretable machine learning. Focusing on adjudication documents for online fraud cases from the Chinese judiciary, we construct a domain-specific database, using a hybrid of keyword-based pattern matching and association rule analysis to extract structured features such as criminal intent, means, economic loss, and mitigating factors. These features are encoded as machine-readable vectors and fed into a gradient boosting decision tree (GBDT) model, implemented with LightGBM, to predict sentencing outcomes. Experiments on real-world fraud cases show high predictive performance, with R² scores reaching 0.98 and a small average deviation between predicted and actual sentences. Visual and statistical evaluations, including boxplots, Taylor diagrams, and regression fits, support the model's robustness and its ability to replicate human sentencing logic.
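To make the feature-extraction step concrete, the following is a minimal sketch of keyword-based pattern matching followed by vector encoding. It is not the authors' implementation: the abstract does not list the actual rules, so every pattern, feature name, and the sample text below are hypothetical, and English keywords stand in for the Chinese-language judgments used in the paper. The association rule analysis step is not covered here.

```python
import re

# Hypothetical keyword patterns (assumption): the paper's real rules are not
# given in the abstract and would target Chinese-language judgment text.
BINARY_PATTERNS = {
    "confessed": re.compile(r"\bconfess(ed|ion)?\b", re.IGNORECASE),
    "voluntary_surrender": re.compile(
        r"\bvoluntar(y|ily) surrender(ed)?\b", re.IGNORECASE
    ),
    "compensated_victim": re.compile(r"\b(compensat|refund)\w*", re.IGNORECASE),
    "recidivist": re.compile(r"\brecidivis[tm]\b", re.IGNORECASE),
}

# Hypothetical pattern for the amount of economic loss, in yuan.
LOSS_PATTERN = re.compile(
    r"(?:loss(?:es)? of|defrauded of)\s+(?:RMB\s*)?([\d,]+(?:\.\d+)?)\s*yuan",
    re.IGNORECASE,
)

# Fixed column order so every document maps to a vector of the same layout.
FEATURE_ORDER = [
    "economic_loss",
    "confessed",
    "voluntary_surrender",
    "compensated_victim",
    "recidivist",
]


def extract_features(text):
    """Return a dict of structured features found in one judgment text."""
    features = {
        name: int(bool(pattern.search(text)))
        for name, pattern in BINARY_PATTERNS.items()
    }
    match = LOSS_PATTERN.search(text)
    # Missing amounts are encoded as 0.0 here purely to keep the sketch simple.
    features["economic_loss"] = (
        float(match.group(1).replace(",", "")) if match else 0.0
    )
    return features


def encode(features):
    """Turn the feature dict into a numeric vector in FEATURE_ORDER."""
    return [float(features[name]) for name in FEATURE_ORDER]


# Invented sample text, for illustration only.
sample = (
    "The defendant defrauded victims online, causing a loss of 52,000 yuan. "
    "He voluntarily surrendered and confessed, and compensated the victims."
)
vector = encode(extract_features(sample))
print(vector)
```

Vectors of this kind, stacked into a matrix alongside the sentence lengths as targets, are the sort of input a GBDT regressor such as LightGBM's `LGBMRegressor` would consume. In a real pipeline a missing loss amount would likely be better represented as a missing value than as zero, since zero is itself a meaningful amount.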