Speech recognition systems must cope with speakers whose accents differ across countries and regions, so research on multi-accent speech recognition methods has received extensive attention. This paper proposes an end-to-end English accent recognition method based on a Bidirectional Long Short-Term Memory network with Connectionist Temporal Classification and an Attention Mechanism (BiLSTM-CTC-AM), and combines it with a speech separation model based on the parameter-free Fourier transform network (FNet) to realize an automatic English accent recognition system. The recognized English text is then used as the input corpus to construct a grammatical error correction model based on augmented multi-head attention. Experimental results show that, under the multi-task learning framework, speech recognition enhanced with accent information is further improved, with the word error rate reduced by an absolute 0.4% and 1.1% on the Common Voice and AESRC2020 datasets, respectively. Compared with the RNN-CTC and LSTM-CTC models, the word error rate of the proposed BiLSTM-CTC-AM model is reduced by 11.23% and 3.68%, respectively, to only 9.70%, which verifies the superiority of the model. In addition, for the correction of all error types, the proposed automatic grammatical error correction method outperforms the UIUC method, indicating that the approach is effective. This paper provides a practical tool for automatic recognition of spoken language and automatic correction of grammatical errors in the teaching of spoken English in colleges and universities.
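To make the multi-task setup concrete, the following is a minimal sketch of a BiLSTM encoder trained jointly with a CTC transcription head and an attention-pooled accent-classification head. All hyperparameters (hidden size, vocabulary size, number of accent classes, loss weight) and the `BiLSTMCTCAM` class itself are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class BiLSTMCTCAM(nn.Module):
    """Illustrative BiLSTM-CTC model with an attention-pooled accent branch."""
    def __init__(self, n_mels=80, hidden=256, vocab_size=32, n_accents=8):
        super().__init__()
        self.encoder = nn.LSTM(n_mels, hidden, num_layers=3,
                               bidirectional=True, batch_first=True)
        # CTC branch: frame-level character posteriors for transcription
        self.ctc_head = nn.Linear(2 * hidden, vocab_size)
        # Attention pooling over time for the accent-classification branch
        self.attn = nn.Linear(2 * hidden, 1)
        self.accent_head = nn.Linear(2 * hidden, n_accents)

    def forward(self, feats):                            # feats: (B, T, n_mels)
        enc, _ = self.encoder(feats)                     # (B, T, 2*hidden)
        ctc_logits = self.ctc_head(enc)                  # (B, T, vocab_size)
        weights = torch.softmax(self.attn(enc), dim=1)   # (B, T, 1)
        pooled = (weights * enc).sum(dim=1)              # (B, 2*hidden)
        return ctc_logits, self.accent_head(pooled)

# Multi-task loss: CTC for transcription + cross-entropy for accent labels
model = BiLSTMCTCAM()
feats = torch.randn(4, 200, 80)                          # dummy log-mel batch
targets = torch.randint(1, 32, (4, 50))                  # dummy character targets
in_lens = torch.full((4,), 200, dtype=torch.long)
tgt_lens = torch.full((4,), 50, dtype=torch.long)
accents = torch.randint(0, 8, (4,))

ctc_logits, accent_logits = model(feats)
log_probs = ctc_logits.log_softmax(-1).transpose(0, 1)   # (T, B, V) for nn.CTCLoss
loss = nn.CTCLoss(blank=0)(log_probs, targets, in_lens, tgt_lens) \
       + 0.3 * nn.CrossEntropyLoss()(accent_logits, accents)  # 0.3 weight assumed
```

The shared BiLSTM encoder is what lets accent information regularize the recognition branch; the relative weighting of the two losses is a tunable choice in such a setup.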