Fine-grained image classification (FGIC) is increasingly important in computer vision, driven by its extensive use across various domains. However, existing methods often struggle to achieve both discriminative feature representation and semantic consistency, primarily due to subtle inter-class differences, complex object structures, and background clutter. To tackle these issues, this paper proposes an innovative framework named CARE-Net (Cross-level Adaptive Recalibration and Enhancement Network), which enhances feature learning through three synergistic mechanisms: multi-scale fusion, cross-layer guidance, and explicit feature reconstruction. Specifically, CARE-Net extracts multi-scale features from different semantic levels and employs a guidance enhancement module to recalibrate shallow features using high-level semantic cues. A lightweight attention-based module is then introduced to adaptively fuse features across scales, reinforcing responses in key discriminative regions. Finally, an auxiliary reconstruction branch is incorporated to enforce structural consistency across semantic layers under supervision. Experimental results on the CUB-200-2011 and Stanford Dogs datasets show that CARE-Net achieves Top-1 classification accuracies of 76.3% and 75.8%, respectively, outperforming several mainstream baselines. Ablation experiments provide further evidence for the effectiveness and complementary nature of each module. These results demonstrate that CARE-Net provides an efficient and interpretable solution for FGIC in complex visual environments.
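The adaptive cross-scale fusion described above can be illustrated with a minimal sketch. The function names and the scalar-per-scale weighting scheme below are hypothetical simplifications, not the paper's actual module: each scale's feature map is summarized by global average pooling, a softmax over these summaries yields per-scale attention weights, and the fused map is the weighted sum.

```python
import numpy as np

def global_avg_pool(feat):
    # feat: (C, H, W) -> per-channel means, shape (C,)
    return feat.mean(axis=(1, 2))

def softmax(x):
    # numerically stable softmax over a 1-D array
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fuse(features):
    """Adaptively fuse multi-scale features (hypothetical sketch).

    features: list of (C, H, W) arrays, assumed already resized to a
    common resolution. One scalar attention score per scale is derived
    from its pooled channel statistics; a softmax normalizes the scores
    into fusion weights.
    """
    scores = np.array([global_avg_pool(f).mean() for f in features])
    weights = softmax(scores)  # per-scale weights, summing to 1
    fused = sum(w * f for w, f in zip(weights, features))
    return fused, weights

# Example: fuse three feature maps of shape (4, 8, 8)
feats = [np.random.rand(4, 8, 8) for _ in range(3)]
fused, weights = attention_fuse(feats)
```

In CARE-Net the attention module is learned end-to-end together with the backbone; this sketch only conveys the weighted-fusion structure, not the trainable parameterization.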