Analyzing Nature Imagery in Tang Poetry Using LDA Thematic Modeling

By: Xiaoyi Dong¹
¹ China University of Petroleum (Beijing) Karamay, Karamay, Xinjiang, 834000, China

Abstract

This study draws a corpus from All Tang Poems and prepares it through data cleaning, de-duplication, and related operations, retaining the text relevant to the analysis. An LDA topic model classifies the poems by theme according to the topic words that appear in them, and the TF-IDF algorithm is combined with it to calculate the probability distribution of natural imagery themes in each poem. The affective indexes of the different natural imagery themes are then analyzed to study the emotional characteristics the poets express. The LDA method categorizes the themes of Tang poems accurately, and the topic words grouped under each theme are closely related. TF-IDF is effective for determining a poem's themes: in “The End of Spring”, for example, the probabilities of the “natural imagery” and “wandering the world” themes are 0.214 and 0.550, respectively. Within the “natural imagery” theme, “cold rain” has the lowest positive affective index (0.17) and “pine and cypress” the highest (0.83). The high-frequency words associated with the artistic conception of mountains include “moon”, “rain”, and “smoke”, all of which appear more than 15 times. These results support the applicability of the LDA model to text mining of Tang poems.
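The pipeline summarized in the abstract (de-duplication, TF-IDF weighting, and LDA theme probabilities per poem) can be illustrated with a minimal, standard-library sketch. Everything in it is an assumption made for illustration: the toy pre-segmented poems, the function names `deduplicate`, `tf_idf`, and `lda_gibbs`, the plain tf × log(N/df) weighting, the collapsed Gibbs sampler, and the hyperparameters `alpha` and `beta`. The abstract does not state which LDA inference method, TF-IDF variant, or word segmenter the study used, so this is not a reproduction of the author's implementation.

```python
import math
import random
from collections import Counter

# Toy, pre-segmented "poems" (hypothetical data, not from the paper's corpus).
poems = [
    ["moon", "mountain", "pine", "rain"],
    ["moon", "mountain", "pine", "rain"],  # exact duplicate, removed below
    ["rain", "smoke", "river", "moon"],
    ["road", "horse", "frontier", "wind"],
    ["horse", "road", "inn", "wind"],
]

def deduplicate(docs):
    """Drop exact duplicate documents while preserving order."""
    seen, unique = set(), []
    for doc in docs:
        key = tuple(doc)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

def tf_idf(docs):
    """Return one {word: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({w: (c / len(doc)) * math.log(n_docs / df[w])
                        for w, c in counts.items()})
    return weights

def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA (assumed inference method).

    Returns the smoothed topic distribution of each document.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                probs = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=probs)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return [
        [(count + alpha) / (len(doc) + n_topics * alpha) for count in counts]
        for doc, counts in zip(docs, doc_topic)
    ]

corpus = deduplicate(poems)
weights = tf_idf(corpus)   # per-poem TF-IDF weights
theta = lda_gibbs(corpus)  # per-poem topic probabilities
```

Each row of `theta` is a probability distribution over topics for one poem, which is the kind of quantity the abstract reports for “The End of Spring”. On a real corpus, the Chinese text would first need word segmentation, and a maintained LDA library would normally replace this hand-written sampler.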