Cross-cultural analysis and travel preference prediction of social media data: a study based on the K-nearest neighbor algorithm

By: Dongxia Wu¹
¹School of Culture and Tourism, Huangshan Vocational and Technical College, Huangshan, Anhui, 245000, China

Abstract

Travel-related content on social media platforms is growing rapidly, and tourists from different cultural backgrounds differ significantly in their travel behaviors and in how they express preferences. This study combines text mining techniques with the K-nearest neighbor (KNN) algorithm to analyze travel data from social media platforms across cultures and to predict tourists’ preferences. The study crawled 3,500 travel tips from Poor Traveler and GoWhere.com and retained 3,000 valid records after cleaning. The TF-IDF algorithm is used to extract 50 high-frequency feature words, a correlation matrix is constructed using the Ochiai coefficient, and hierarchical clustering is used to classify tourism behaviors into three major categories and seven subclasses; the three major categories are perception of scenic areas’ unique resources, entertainment experience behavior, and facility and service appeals. In addition, an improved KNN algorithm based on vector orthogonalization and an updated out-of-sample prediction method is proposed to predict passenger flow at subway stations in City A. The results show that, at a 5-minute time granularity, the average time-sharing prediction error across the whole network is 11.64% and the cumulative all-day prediction error is 2.37%, and that the model’s prediction accuracy is significantly better than that of the traditional method. More than 90% of the successfully matched samples fell within one year before the prediction date, and prediction accuracy was higher at stations with higher passenger flow. This study provides an effective cross-cultural analysis framework and tourist preference prediction tool for the tourism industry, which can help companies develop precise marketing strategies and personalized service plans.
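
As a rough illustration of the correlation-matrix step mentioned in the abstract, the sketch below computes the Ochiai coefficient between feature words, using the standard definition: the number of documents containing both words divided by the square root of the product of their individual document counts. The function name `ochiai_matrix`, the toy documents, and the word list are hypothetical and are not taken from the study's data or code.

```python
import math

def ochiai_matrix(docs, terms):
    """Pairwise Ochiai coefficients between feature words.

    docs  : list of token lists, one per document (hypothetical toy input)
    terms : list of feature words to compare
    """
    doc_sets = [set(d) for d in docs]
    # Number of documents in which each term appears.
    df = {t: sum(t in d for d in doc_sets) for t in terms}
    matrix = []
    for a in terms:
        row = []
        for b in terms:
            # Number of documents containing both terms.
            both = sum((a in d) and (b in d) for d in doc_sets)
            denom = math.sqrt(df[a] * df[b])
            row.append(both / denom if denom else 0.0)
        matrix.append(row)
    return matrix

# Toy corpus (assumed for illustration only).
docs = [
    ["scenery", "temple", "ticket"],
    ["scenery", "temple"],
    ["hotel", "ticket"],
    ["scenery", "hotel"],
]
terms = ["scenery", "temple", "ticket", "hotel"]
m = ochiai_matrix(docs, terms)
```

A matrix of this kind could then serve as the similarity input to a hierarchical clustering routine. The abstract does not specify the linkage or distance settings used in the study, so they are left open here.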