Text Analysis and Risk Identification of Financial Reporting Based on Natural Language Processing Algorithms in the Era of Intelligence

Wang, Xiangling

Research article
DOI: https://doi.org/10.70517/ijhsa46345

Volume 46, Issue 3
Pages: 615-626
Open Access

By: Wang, Xiangling¹

¹Department of Economics and Trade, Yongcheng Vocational College, Yongcheng, Henan, 476600, China

Published: 03/08/2025

Abstract

The financial reports of listed companies are an important source of data on annual operations, production, and financial condition, and analyzing the sentiment of their text has considerable value for financial risk identification. This paper takes the financial reports of listed companies as its research object and uses TF-IDF to extract structured, data-oriented, and visualized information from the text. An N-Gram model is then used to represent the text information as word vectors. Next, an improved sentiment co-occurrence algorithm extracts and expands a general sentiment lexicon to construct a financial report sentiment lexicon, and the SEN-TF-IDF algorithm is introduced to build the annual report sentiment dataset. The financial report sentiment dictionary is thus constructed and refined through extraction of financial report text information and learning of word-vector representations. At the study's threshold of 0.6, the financial report sentiment dictionary achieves the highest F1 value, 0.872, compared with the general sentiment dictionary, which demonstrates its advantage in analyzing and mining sentiment tendency in financial reporting.

Keywords: financial reporting, sentiment dictionary, risk identification, SEN-TF-IDF
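
The TF-IDF weighting mentioned in the abstract can be illustrated with a minimal sketch. This is standard TF-IDF (term frequency multiplied by the logarithm of inverse document frequency), not the SEN-TF-IDF variant, whose definition is not given on this page. The function name `tf_idf` and the toy sentences are hypothetical and are not taken from the paper's data.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes plain TF-IDF: (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy, made-up sentences standing in for annual report text.
reports = [
    "revenue growth remained strong this year".split(),
    "impairment loss reduced revenue this year".split(),
    "litigation risk and impairment loss increased".split(),
]
weights = tf_idf(reports)

# Highest-weighted term in each toy document.
top_terms = [max(w, key=w.get) for w in weights]
```

Terms that occur in every document receive a weight of zero under this formulation, so words shared across all reports drop out and document-specific terms rise to the top.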
