Optimized Design of Distributed Computing Architecture for Massive Data Storage

By: Xiang Li 1
1 Image and Text Information Center, Jiangsu Province Nantong Industry & Trade Technician College, Nantong, Jiangsu, 226010, China

Abstract

With the full arrival of the big data era, global data volume is growing at an unprecedented rate, and traditional centralized storage architectures show serious performance limitations when handling data at this scale. To address this issue, this paper presents an optimized design for distributed storage systems based on HDFS, MapReduce, and cloud computing technologies, which fully exploits the parallel processing capability of a cluster by distributing data and computation tasks across multiple nodes. Experimental results show that, for data of the same size, the distributed approach reduces computation time to one quarter of that of the original system. The system layers interconnect and exchange data through standardized interfaces, and the system adopts a hybrid storage model that combines the strengths of relational and non-relational databases to manage structured, semi-structured, and unstructured data efficiently. The optimized system significantly outperforms the traditional system on key indicators such as data write speed, read speed, query response time, and system resource utilization, and offers good scalability and high reliability. These results have important theoretical value and practical significance for promoting the in-depth application of big data technology across industries.
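To make the abstract's claim about cluster parallelism concrete, the following is a minimal sketch of a Hadoop MapReduce job using the standard org.apache.hadoop.mapreduce API; it is an illustrative word-count example, not the paper's actual workload, and the input/output paths passed as args[0] and args[1] are assumed HDFS directories. Each mapper processes its locally stored HDFS blocks in parallel, which is the mechanism by which computation time scales down with the number of nodes.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: each cluster node processes its local HDFS blocks in parallel.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          word.set(token);
          context.write(word, ONE);
        }
      }
    }
  }

  // Reducer: aggregates the partial counts shuffled from all mappers.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation reduces shuffle traffic
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // assumed HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // assumed HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Under this model, data locality (mappers running where the HDFS blocks reside) and the combiner's local pre-aggregation are the main levers behind the kind of speedup the abstract reports, though the exact factor depends on cluster size and workload.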