Precise detection of structural variations in wheat genomes based on deep learning

By: Yanling Li 1, Zijing Dong 1, Yuhong Li 2, Fernando Bacao 3, Yuyang Zhao 1, Haiping Si 1
1 College of Information and Management Science, Henan Agricultural University, Zhengzhou, Henan, 450002, China
2 School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Ave, 639798, Singapore
3 NOVA Information Management School (NOVA IMS), Campus de Campolide, Universidade Nova de Lisboa, 1070-312 Lisboa, Portugal

Abstract

Wheat is an important global food crop, and structural variation in its genome directly affects yield and quality. This study constructs a complete framework for detecting structural variations in the wheat genome, comprising four core modules: data preprocessing, image generation, data augmentation, and deep learning prediction. First, effective structural variation information is extracted from VCF files and written to BED files. Gene sequence data are then converted into RGB images using gene visualization methods, and different types of structural variation are handled with a purpose-designed breakpoint strategy and compression strategy. To address data imbalance, an improved generative adversarial network is proposed for data augmentation, reaching an F1-score of 67.46% at a 1:1 ratio of positive to negative samples. Subsequently, the DLSVPre deep learning prediction model is built with ResNet as the backbone network and a spatial attention mechanism, using Kaiming initialization and the ReLU activation function to optimize model performance. Experimental results show that DLSVPre achieves a prediction accuracy of 98.45%, a recall of 97.26%, and an F1-score of 97.85% on the HG001 dataset. On the PacBio dataset, its F1-score is 60.58% higher than that of the traditional GATK method. These results demonstrate that the method provides an effective tool for high-precision detection of structural variants in the wheat genome.
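The preprocessing step, extracting structural variation records from a VCF file into BED intervals, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that structural variants are marked with the standard `SVTYPE` and `END` INFO keys of the VCF specification, and that a record without `END` (for example, an insertion) is represented by a one-base interval. The function names `parse_info` and `vcf_to_bed`, and the example records, are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple, Union

# (chromosome, 0-based start, exclusive end, SV type)
BedRecord = Tuple[str, int, int, str]


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Parse a VCF INFO column into a dict; flag-style keys map to True."""
    fields: Dict[str, Union[str, bool]] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def vcf_to_bed(lines: Iterable[str]) -> Iterator[BedRecord]:
    """Yield BED-style intervals for records that carry an SVTYPE key."""
    for line in lines:
        # Skip blank lines, meta-information lines, and the column header.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # malformed record: INFO column is missing
        chrom, pos, info = cols[0], int(cols[1]), parse_info(cols[7])
        sv_type = info.get("SVTYPE")
        if not isinstance(sv_type, str):
            continue  # assumption: records without SVTYPE are not SVs
        end_value = info.get("END")
        # Assumption: without END, use a one-base interval at POS.
        end = int(end_value) if isinstance(end_value, str) else pos
        # VCF positions are 1-based; BED starts are 0-based, ends exclusive.
        yield chrom, pos - 1, end, sv_type


# Hypothetical records for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1A\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=1500;SVLEN=-500",
    "chr1A\t2000\tsnp1\tA\tG\t.\tPASS\tDP=30",
    "chr2B\t300\tsv2\tN\t<INS>\t.\tPASS\tSVTYPE=INS;SVLEN=120",
]
records = list(vcf_to_bed(example))
for chrom, start, end, sv_type in records:
    print(f"{chrom}\t{start}\t{end}\t{sv_type}")
```

In this sketch the single-nucleotide record is dropped because it has no `SVTYPE` key, and each remaining record becomes one tab-separated BED line. A real pipeline would likely also filter on quality and variant length, which the abstract does not specify.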