To address the low accuracy and poor robustness of recommendation systems for cultural tourism attractions, this article combines a multimodal Visual Geometry Group 16 (VGG16) model with neural collaborative filtering (NCF) to study the intelligent identification and recommendation of cultural tourism attractions. First, the VGG16 convolutional neural network (CNN) extracts features from scenic spot images, and these features are combined with multimodal data to help the recommendation system better capture the characteristics of scenic spots and to improve the accuracy of scenic spot image recognition and classification. Then, an NCF model is introduced that takes tourist information into account to produce personalized attraction recommendations, improving tourist satisfaction and recommendation accuracy. The recommendation performance of four models (NCF, content filtering, collaborative filtering, and matrix factorization) is compared on a self-built dataset. The experimental results indicate that the NCF model reaches a recommendation accuracy of 97.21%, which is 8.07% higher than that of collaborative filtering, a recommendation coverage of 99.41%, and a response time of only 64.1 ms. These results show that the approach improves the accuracy and adaptability of the cultural tourism attraction recommendation system in different situations.
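
As a rough illustration of the scoring step described above, the following minimal sketch follows the commonly used multilayer-perceptron formulation of NCF: a user embedding and an attraction embedding are concatenated with an image feature vector, passed through one hidden layer, and mapped to a score between 0 and 1. The abstract does not specify the network architecture, so everything here is an assumption: the class `TinyNCF`, its parameters, the layer sizes, and the random, untrained weights are hypothetical, and the short random vectors merely stand in for VGG16 image features.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class TinyNCF:
    """Hypothetical MLP-style NCF scorer (untrained, for illustration only)."""

    def __init__(self, n_users, n_items, image_dim,
                 embed_dim=8, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        self.user_emb = make_matrix(n_users, embed_dim, rng)
        self.item_emb = make_matrix(n_items, embed_dim, rng)
        in_dim = 2 * embed_dim + image_dim
        self.w_hidden = make_matrix(hidden_dim, in_dim, rng)
        self.w_out = make_matrix(1, hidden_dim, rng)

    def score(self, user, item, image_features):
        # Concatenate user embedding, item embedding, and image features.
        x = self.user_emb[user] + self.item_emb[item] + list(image_features)
        # One hidden layer with ReLU, then a sigmoid output in (0, 1).
        hidden = [max(0.0, h) for h in matvec(self.w_hidden, x)]
        logit = matvec(self.w_out, hidden)[0]
        return 1.0 / (1.0 + math.exp(-logit))

    def recommend(self, user, item_features, k=3):
        # Score every attraction for this user and keep the k highest.
        scored = [(self.score(user, i, f), i)
                  for i, f in enumerate(item_features)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [i for _, i in scored[:k]]


# Assumed toy setup: 3 tourists, 5 attractions, 4-dimensional image features.
rng = random.Random(1)
image_dim = 4
features = [[rng.random() for _ in range(image_dim)] for _ in range(5)]
model = TinyNCF(n_users=3, n_items=5, image_dim=image_dim)
top = model.recommend(user=0, item_features=features, k=3)
print(top)  # indices of the three highest-scoring attractions
```

Because the weights are random, the ranking itself carries no meaning; the sketch only shows how tourist and attraction representations could be combined into a single preference score before the top-scoring attractions are returned.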