2022 Development of Data and AI Integration in China

Source: iResearch, July 29, 2022, 6:31 PM

Overview

Traditional data warehouses, and the separation of data lake from data warehouse, make it difficult to integrate data with AI and to support agile enterprise decision-making. Because data silos persist, decisions are not based on full data. Moving data between systems is costly, slow, and undermines timeliness. When storage, cache and computing are separated and the data lake, data warehouse and AI data share unified metadata management, enterprises can strike the best balance among data volume, cost, efficiency and agility.

The open-source model contributes a great deal to the data and AI ecosystem. However, this does not mean every company needs to build its own data and intelligence platform from open-source products. In fact, most companies focus on their core businesses and choose commercial data and intelligence platforms that offer stable performance, data-intelligence integration, and end-to-end automation and intelligence, and that require no operation or maintenance on their part. Enterprises that are more flexible and open have lower IT talent replenishment costs.

Rising Data Volume and Unstructured Data Proportion 

Global data volume is surging at an annual growth rate of over 59%, and 80% of that data is unstructured or semi-structured. Data volume in China is rising even faster. As data volume and the share of unstructured data grow, data lakes built on object storage are becoming increasingly common. Unified management, query and use of data are the new challenges.
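To make the growth figure concrete, the short Python sketch below treats the quoted 59% as a constant compound annual rate, which is a simplifying assumption and not a claim from the report. The function names are illustrative.

```python
import math


def years_to_multiply(annual_growth: float, factor: float) -> float:
    """Years needed to grow by `factor` at a constant compound annual rate."""
    return math.log(factor) / math.log(1.0 + annual_growth)


def projected_volume(start: float, annual_growth: float, years: int) -> float:
    """Volume after `years` of compound growth from `start`."""
    return start * (1.0 + annual_growth) ** years


GROWTH = 0.59  # lower bound quoted in the text; assumed constant here

doubling_years = years_to_multiply(GROWTH, 2.0)
five_year_multiple = projected_volume(1.0, GROWTH, 5)

print(f"doubling time: {doubling_years:.2f} years")
print(f"volume after 5 years: {five_year_multiple:.1f}x the starting volume")
```

Under this assumption, data volume doubles in about a year and a half and grows roughly tenfold in five years, which helps explain why low-cost object storage becomes attractive.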

Development of AI Needs Large Amount of Accurate Data

When enterprises deploy AI applications, the results depend largely on the quality of their data resources. Targeted data governance is therefore the first step toward high-quality AI applications. The traditional data governance systems that enterprises have built mostly focus on structured data, so they can hardly meet AI applications' demanding requirements for data quality, field richness, data distribution and real-time availability. Enterprises need to carry out a second round of data governance aimed specifically at AI applications.
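As a rough illustration of the four dimensions named above (data quality, field richness, data distribution and real-time data), the following Python sketch computes one simple indicator for each over a list of records. The metrics, field names and sample records are hypothetical choices made for this example; the report does not prescribe specific measures.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def readiness_report(records, required_fields, label_field, now, max_age):
    """Compute one simple indicator per governance dimension."""
    if not records:
        raise ValueError("records must not be empty")
    total = len(records)
    # Data quality: share of records whose required fields are all filled in.
    complete = sum(
        all(r.get(f) not in (None, "") for f in required_fields) for r in records
    )
    # Field richness: number of distinct fields seen across all records.
    fields_seen = set()
    for r in records:
        fields_seen.update(r.keys())
    # Data distribution: share held by the most common label (1.0 = one class only).
    labels = Counter(r[label_field] for r in records if r.get(label_field) is not None)
    majority_share = max(labels.values()) / sum(labels.values()) if labels else 0.0
    # Real-time data: share of records updated within `max_age`.
    fresh = sum(1 for r in records if now - r["updated_at"] <= max_age)
    return {
        "completeness": complete / total,
        "field_count": len(fields_seen),
        "majority_label_share": majority_share,
        "freshness": fresh / total,
    }


now = datetime(2022, 7, 29, tzinfo=timezone.utc)
records = [  # hypothetical sample data
    {"user_id": 1, "city": "Beijing", "label": "churn", "updated_at": now - timedelta(hours=1)},
    {"user_id": 2, "city": "", "label": "stay", "updated_at": now - timedelta(hours=2)},
    {"user_id": 3, "city": "Shanghai", "label": "stay", "updated_at": now - timedelta(days=3)},
    {"user_id": 4, "city": "Shenzhen", "label": "stay", "updated_at": now - timedelta(hours=5)},
]
report = readiness_report(
    records, ["user_id", "city"], "label", now, max_age=timedelta(days=1)
)
print(report)
```

A governance pipeline could compare such indicators against thresholds before a dataset is released for model training.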

Pain Point 2: Data Warehouse, Data Lake and AI Data Form New Data Silos

Data analysis and AI analysis have developed over many years, producing many specialized data systems for different tasks. Data warehouse systems process structured data, but at a small scale; big data systems handle massive amounts of data, including unstructured data; AI systems generally store data locally. These proprietary systems have shortcomings. Some form new data silos. Others require data to be migrated before a different business workload can be developed, which consumes time, storage and network resources. Worse, migration carries a risk of data inconsistency, and when abnormal data is found it is relatively difficult to drill down and trace it. Such systems can hardly meet the need for agile data analysis in a rapidly changing market environment.

Strategy 2: Unify metadata to a central node

Unify the data directory, data permissions, transaction consistency, multi-version management and other capabilities of the data warehouse, data lake and AI data in a central node, and access data through this node, so that data use is no longer constrained by isolated systems. This master-slave architecture, with distributed storage and unified management, is similar to MapReduce in the computing field. The approach breaks down data silos and allows the same set of data to be shared freely among multiple engines. Users have the same permissions and consistent transaction control regardless of the tool they use to access data. It also avoids the waste of resources caused by data migration. Every stage can see the full data within its authority. All models are based on a single source of truth (the raw data), which avoids inconsistent results caused by different teams analyzing different copies of the data. Once an anomaly is found, it can easily be drilled into and traced back.
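The idea of a central metadata node can be sketched in a few lines of Python. This is a minimal, in-memory illustration under stated assumptions, not the design of any particular product: the class names, the `lake://sales/orders` location and the user names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TableEntry:
    location: str                              # where the single copy of the data lives
    readers: set = field(default_factory=set)  # users allowed to read the table
    version: int = 1                           # bumped on every committed change


class MetadataCatalog:
    """Central node: one directory, one permission model, one version counter."""

    def __init__(self):
        self._tables = {}

    def register(self, name, location, readers):
        self._tables[name] = TableEntry(location=location, readers=set(readers))

    def resolve(self, user, name):
        entry = self._tables[name]
        if user not in entry.readers:
            raise PermissionError(f"{user} may not read {name}")
        return entry.location, entry.version

    def commit(self, name):
        entry = self._tables[name]
        entry.version += 1
        return entry.version


class Engine:
    """Any compute engine (SQL, ML training, ...) that resolves data via the catalog."""

    def __init__(self, kind, catalog):
        self.kind = kind
        self.catalog = catalog

    def plan_read(self, user, table):
        location, version = self.catalog.resolve(user, table)
        return {"engine": self.kind, "location": location, "version": version}


catalog = MetadataCatalog()
catalog.register("orders", "lake://sales/orders", readers={"analyst"})  # hypothetical path
sql_engine = Engine("sql", catalog)
ml_engine = Engine("ml-training", catalog)

sql_plan = sql_engine.plan_read("analyst", "orders")
ml_plan = ml_engine.plan_read("analyst", "orders")
print(sql_plan["location"] == ml_plan["location"])  # both engines read the same copy
```

Because both engines resolve the table through the same catalog, they see the same location, the same version and the same permission decision, and no data is copied between systems.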

Huawei Cloud DataArts+ModelArts

Huawei Cloud combines big data and AI through the integration of DataArts and ModelArts. It unifies metadata so that the same data can be used for multiple purposes, which breaks down data silos and avoids data migration. The separation of storage, cache and computing balances storage capacity, cost and computing performance. The combination of DataOps and MLOps allows different departments of the enterprise to use data in the way they prefer. Low code, no code and AI4Data make the whole process automated and intelligent.

Huawei Cloud has advantages in software and hardware integration, industry practice, and the open-source ecosystem. In software and hardware integration, Huawei Cloud is strong in the underlying technologies of computing, storage and networking. For example, Huawei has a clear lead in the number of patent applications for RDMA, one of the key technologies of memory pooling. In industry practice, Huawei Cloud has adhered to the principle of everything as a service and has accumulated extensive practical experience with Internet, government and enterprise customers, which it uses to improve its products. In the open-source ecosystem, Huawei Cloud contributes heavily to the Hadoop and Spark communities. As a result, Huawei Cloud has a deeper understanding of the security, stability and other aspects of these open-source products, and DataArts has better compatibility with their mainstream versions.

Table of Contents of the Full Report


1 Development Background of Data and AI Integration in China
1.1 Social Background
1.1.1 Data Volume and the Proportion of Unstructured Data Are Rising
1.1.2 Multi-source and Heterogeneous Data Become Normal
1.1.3 The Volume, Variety, Veracity, Value and Velocity of Big Data Need to be Further Released
1.2 Technology Background
1.2.1 Cloud Native: From Micro Service to Serverless
1.2.2 Development of AI Needs a Large Amount of Accurate Data
1.2.3 Business Agility Needs De-process of IT Architecture

2 Pain Points of Enterprises' Data and AI Integration and Corresponding Strategies
2.1 Pain Point 1: Large Data Volume, Low Cost and High Efficiency Can Hardly be Achieved at the Same Time
2.2 Solution 1: Separation of Storage, Cache and Computing
2.3 Pain Point 2: Warehouse-Lake-AI Data Form New Data Silos
2.4 Solution 2: Unify Metadata to Central Nodes
2.5 Pain Point 3: Rich Open-source Products; Difficult Development, Operation and Maintenance
2.6 Solution 3: Integration of DataOps and MLOps
2.7 Pain Point 4: Complex and Inefficient Data Preparation
2.8 Solution 4: End-to-end Automation and Intelligence

3 Typical Practices of Data and AI Integration
3.1 Core Advantages of Data and AI Integration of Huawei Cloud
3.1.1 DataArts + ModelArts of Huawei Cloud
3.2 Typical Practices of Data and AI Integration of Huawei Cloud  
3.2.1 IT Service
3.2.2 Online Car-hailing
3.2.3 Social Network


In the VUCA era, market changes have accelerated. Enterprises need more agile and accurate decision-making, based on accurate and full data instead of partial or dirty data. Business personnel and data analysts should be able to initiate such analysis on demand, without complex processes or coordination between departments.
