Educational quality assessment must address increasingly diverse student needs, and traditional assessment methods struggle to account for individual differences. This study proposes a method for educational quality assessment and learning path design that combines K-means clustering with reinforcement learning. First, we construct student user profiles and establish a teaching quality evaluation system comprising four primary indicators and eighteen secondary indicators. Next, we partition the student population with K-means clustering, determining the optimal number of clusters from the within-cluster sum of squared distances. Finally, we apply reinforcement learning to design a personalized learning path recommendation system. The empirical study collected online education assessment scores from 30,000 students, and the cluster analysis identified four clearly characterized groups: the first group (10,144 students) had moderate learning pressure but poor memory; the second group (6,845) had high learning pressure but lacked motivation; the third group (7,168) had extreme learning pressure and poor emotional control; and the fourth group (5,843) had good grades and little learning pressure but poor emotional control. Learning path recommendation experiments show that the reinforcement learning system reaches a stable learning gain after 45 iterations and generates different learning paths according to learners' individual needs. The results indicate that this method can effectively identify the characteristics of student groups, provide learning paths that meet individual needs, and offer a feasible approach to intelligent, personalized teaching.
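The clustering step can be sketched as follows, assuming each student is represented as a numeric feature vector (for example, normalized scores on the secondary indicators). This is a minimal pure-Python K-means with k-means++ seeding and several restarts, together with a hypothetical stopping rule (`pick_k`, threshold `tol`) that selects the smallest k beyond which one more cluster lowers the within-cluster sum of squared errors (SSE) by less than a fraction of the one-cluster SSE. The abstract does not state which elbow criterion, initialization, or features the authors used; those choices, every function name, and the synthetic data below are illustrative assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """k-means++ seeding: later seeds are drawn with probability proportional to D^2."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if w > 0 and acc >= r:
                centroids.append(p)
                break
        else:  # all points coincide with seeds, or float rounding: fall back to a random point
            centroids.append(rng.choice(points))
    return centroids


def lloyd(points: List[Point], centroids: List[Point], max_iter: int = 100):
    """Alternate assignment and update steps until labels stop changing."""
    k = len(centroids)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each student joins the nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


def kmeans(points: List[Point], k: int, n_init: int = 5, seed: int = 0):
    """Run several seeded restarts and keep the solution with the lowest SSE."""
    rng = random.Random(seed)
    best: Optional[tuple] = None
    for _ in range(n_init):
        result = lloyd(points, kmeanspp_init(points, k, rng))
        if best is None or result[2] < best[2]:
            best = result
    return best


def sse_curve(points: List[Point], k_max: int, seed: int = 0) -> List[float]:
    """Within-cluster SSE for k = 1..k_max (the curve inspected in the elbow method)."""
    return [kmeans(points, k, seed=seed)[2] for k in range(1, k_max + 1)]


def pick_k(curve: List[float], tol: float = 0.1) -> int:
    """Hypothetical elbow rule (assumption, not from the source): smallest k whose
    next cluster would reduce SSE by less than tol * SSE(k=1)."""
    for i in range(len(curve) - 1):
        if curve[i] - curve[i + 1] < tol * curve[0]:
            return i + 1
    return len(curve)


if __name__ == "__main__":
    # Synthetic stand-in for student profiles: four well-separated groups in 2-D.
    rng = random.Random(1)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    students = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
                for cx, cy in centers for _ in range(25)]
    curve = sse_curve(students, k_max=6)
    print([round(s, 1) for s in curve])
    print("chosen k:", pick_k(curve))
```

On these four well-separated synthetic groups the SSE curve flattens after k = 4, so `pick_k` returns 4. On real assessment data the elbow is usually less sharp, so the threshold (or the elbow criterion itself) would need tuning or replacing.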