This work aims to quantify the subjective criteria used to evaluate vocal timbre in singing and to apply them in intelligent, objective evaluation software. It first examines how singing timbre is evaluated subjectively and refines the commonly used assessment indices. It then investigates how these criteria can be converted into numerical vectors that serve as input to an intelligent assessment system. Finally, it builds a model from multilayer perceptrons and convolutional neural networks to extract timbre features and perform automatic evaluation. The implementation of the algorithm is described in detail, including data sampling, preprocessing, the embedding layer, the convolutional operations of the intermediate layers, and the post-processing algorithm. The experimental findings indicate that the system can eliminate human bias and produce timbre judgments that are somewhat consistent with subjective evaluation. Although the current system does not yet take aspects such as mood and style into account, this study offers a theoretical framework and technical support for the validity and applicability of intelligent vocal evaluation systems.
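As a rough illustration of the pipeline summarized above, the following minimal sketch converts subjective ratings into a numerical vector and passes a sequence of feature frames through one convolutional stage and a dense output layer. All names in it are assumptions made for illustration and are not taken from this work: the index names in `INDICES`, the 1 to 5 rating scale, the layer sizes, and the random weights are hypothetical, and a single dense layer stands in for the multilayer perceptron.

```python
import math
import random
from statistics import mean

# Hypothetical timbre indices; the actual assessment indices are not listed here.
INDICES = ("brightness", "roundness", "purity", "resonance")


def ratings_to_vector(ratings, low=1.0, high=5.0):
    """Average each index over all judges and rescale it to [0, 1].

    The 1-5 scale is an assumption made for this sketch.
    """
    vector = []
    for name in INDICES:
        values = [judge[name] for judge in ratings]
        if any(v < low or v > high for v in values):
            raise ValueError(f"rating for {name!r} outside [{low}, {high}]")
        vector.append((mean(values) - low) / (high - low))
    return vector


def conv1d_relu(frames, kernel):
    """Valid 1-D convolution of one kernel along the frame axis, then ReLU.

    frames: list of per-frame feature lists.
    kernel: list of `width` weight lists, one per covered frame.
    """
    width = len(kernel)
    out = []
    for t in range(len(frames) - width + 1):
        total = sum(
            w * x
            for k_row, f_row in zip(kernel, frames[t:t + width])
            for w, x in zip(k_row, f_row)
        )
        out.append(max(total, 0.0))
    return out


def predict_scores(frames, kernels, w_out):
    """Convolution -> global average pooling -> dense layer -> sigmoid."""
    pooled = [mean(conv1d_relu(frames, k)) for k in kernels]
    return [
        1.0 / (1.0 + math.exp(-sum(w * p for w, p in zip(row, pooled))))
        for row in w_out
    ]


# Two hypothetical judges rating one recording.
judges = [
    {"brightness": 4, "roundness": 3, "purity": 5, "resonance": 2},
    {"brightness": 2, "roundness": 3, "purity": 5, "resonance": 4},
]
target = ratings_to_vector(judges)

# Random stand-ins for preprocessed audio frames and untrained weights.
rng = random.Random(0)
frames = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
kernels = [[[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(2)]
           for _ in range(3)]
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in INDICES]
predicted = predict_scores(frames, kernels, w_out)
```

Here `target` is the vector form of the subjective evaluation and `predicted` has the same length, so a trained model could be fitted and compared against it index by index. The weights above are untrained, so the predicted values themselves carry no meaning.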