Analysis of the Application of Speech Recognition Technology in French Cross-Cultural Communication and Its Impact on Improving Students’ Language Proficiency

By: Min Shang¹,², Yifeng Wang², Jin Chai¹
¹ Xi’an International University, Xi’an, Shaanxi, 710077, China
² Xidian University, Xi’an, Shaanxi, 710126, China

Abstract

This study focuses on the development of high-precision French speech recognition technology and its application in cross-cultural communication teaching. First, we propose an end-to-end French phoneme recognition method based on cross-modal knowledge distillation, which uses a CTC decoder to address phoneme alignment and combines a frame-level distillation weight adaptation mechanism with sequence-level distillation. We then integrate i-vector-based speaker recognition, using factor analysis to extract low-dimensional speaker features and thereby improve the system’s adaptability to individual learners. We also propose a teaching strategy that aims to improve students’ language proficiency by cultivating French thinking, creating authentic contexts, strengthening cross-cultural awareness, and establishing a layered interactive teaching model. Experiments on French speech datasets show that the English pre-trained model performs best, with a character error rate (CER) of 8.87% and a syllable error rate (SER) of 10.46% on the Latin and French alphabet sets. The CTC decoder significantly outperforms the Transformer and Conformer, with a CER 9.42 percentage points lower than the Transformer encoder’s 24.95%. After i-vectors are introduced, the maximum error rate reduction reaches 61.2%, and the SER on multilingual character sets decreases from 18.60% to 7.22%. A stepwise multiple regression analysis of 476 student questionnaires shows that language attitude is the core predictor of conversational ability (β = 0.24, explaining 13.4% of the variance), self-efficacy dominates French proficiency improvement (β = 0.24, ΔR² = 0.065), and learning resources contribute most to reading ability (β = 0.33, explaining 21.1% of the variance).
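To make the decoding step and the reported error rates concrete, the following is a minimal sketch, not the authors' implementation. It shows the standard greedy CTC collapse rule (merge repeated frame labels, then drop blanks) and an edit-distance-based error rate of the kind usually meant by CER. The function names, the blank index `0`, and the toy inputs are assumptions made for illustration only. The same `error_rate` function applies to syllable sequences if the inputs are lists of syllables rather than characters.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_collapse(frame_ids: Sequence[int], blank: int = BLANK) -> List[int]:
    """Collapse consecutive repeated frame labels, then drop blanks."""
    out: List[int] = []
    prev = None
    for t in frame_ids:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        row = [i]
        for j, h in enumerate(hyp, start=1):
            row.append(min(
                prev_row[j] + 1,             # deletion
                row[j - 1] + 1,              # insertion
                prev_row[j - 1] + (r != h),  # substitution or match
            ))
        prev_row = row
    return prev_row[-1]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance normalised by reference length (CER for characters)."""
    if len(ref) == 0:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `ctc_greedy_collapse([0, 1, 1, 0, 1, 2, 2, 0])` keeps the two occurrences of label `1` separate because a blank sits between them, and `error_rate("bonjour", "bonjur")` counts one deletion against a seven-character reference.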