This study addresses the challenge of transformer condition monitoring in the complex electromagnetic environment of urban building complexes by proposing an intelligent diagnostic method based on acoustic signature analysis. Using electromagnetic-mechanical coupling theory, the study clarifies how winding vibration and core magnetostriction generate acoustic signatures, and reveals the resonance risks caused by harmonic interference. A multi-channel, high-speed synchronous data acquisition system integrating high-precision sensors and FPGA modules is designed to collect vibration data from a 110 kV transformer. An improved EEMD denoising algorithm is proposed that uses minimum cutoff frequency constraints and multi-sensor fusion strategies to enhance noise suppression. Based on the denoised acoustic signature features, an SSAE-IELM fault diagnosis model is constructed, in which an incremental extreme learning machine enables rapid classification. Experiments show that the improved CEEMD algorithm achieves a signal-to-noise ratio of 18.11 dB, a 26% improvement over EEMD, while reducing the mean square error to 0.0177 and improving computational efficiency by one-third. In fault identification tests across four transformer states (normal, short-circuit impact, DC bias, and partial discharge), the model achieves an accuracy of 94.11%, outperforming a CNN, which reaches 82.46%.
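For readers who want to reproduce the two denoising metrics quoted above, the following is a minimal sketch of how a signal-to-noise ratio (in dB) and a mean square error are commonly computed against a known reference signal. The abstract does not give the exact definitions used in the study, so the formulas, the function names `snr_db` and `mse`, and the synthetic test tone are all assumptions made for illustration.

```python
import math


def mse(reference, estimate):
    """Mean square error between a reference signal and its estimate."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def snr_db(reference, estimate):
    """SNR in dB: reference energy over residual (reference - estimate) energy.

    Assumed definition; the study may define SNR differently.
    """
    signal_energy = sum(r * r for r in reference)
    noise_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise_energy == 0:
        return math.inf
    return 10.0 * math.log10(signal_energy / noise_energy)


# Hypothetical data: a 100 Hz tone sampled at 1 kHz stands in for a clean
# vibration signal, and a small 37 Hz component stands in for the residual
# noise left after denoising.
fs = 1000
reference = [math.sin(2 * math.pi * 100 * n / fs) for n in range(fs)]
estimate = [r + 0.1 * math.cos(2 * math.pi * 37 * n / fs)
            for n, r in enumerate(reference)]

print(f"SNR = {snr_db(reference, estimate):.2f} dB")
print(f"MSE = {mse(reference, estimate):.4f}")
```

Both metrics require a clean reference, which is typically available only for simulated signals. For field recordings, a proxy such as a low-noise baseline measurement would be needed.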