Since the advent of computers, the von Neumann architecture has been consistently employed, with computation and storage at its core. The CPU, as the processing unit, is responsible for executing various arithmetic and logical calculations. RAM and hard drives are in charge of data storage and interact with the CPU.
As multimedia software such as graphics and 3D design developed rapidly, workloads grew larger and more complex. To relieve the pressure on the CPU, GPUs (Graphics Processing Units), designed specifically for image and graphics processing, came into being.
Nowadays, with the booming digital economy, especially the rapid popularization and implementation of applications like generative artificial intelligence, big data analysis, autonomous driving, and the metaverse, the global demand for large-scale computing power across various industries is showing a sharp increase. At this juncture, DPU (Data Processing Unit), with its exceptional performance and unique advantages, is gradually emerging as one of the key technologies to boost computing power.
NVIDIA CEO Jen-Hsun Huang once stated in a speech: "DPU will become one of the three pillars of future computing, and the standard configuration for future data centers will be 'CPU + DPU + GPU'. The CPU is for general computing, the GPU for accelerated computing, and the DPU for data processing."
So, what is the main function of the DPU, and what advantages does it have over CPUs and GPUs?
01
The Main Differences between DPU, CPU, and GPU
The emergence of the DPU is not a coincidence but a strong response to the growing demand for data processing.
Functionally, although CPUs, GPUs, and DPUs are all computing processors, each excels at different work. The CPU is the "brain" of the computer, responsible for the overall operation of the system. It suits a wide range of applications, but its performance is relatively limited when dealing with large-scale data and specialized computational tasks. GPUs are specialized processors designed for graphics computing tasks such as 3D image rendering and video processing. They have clear advantages in large-scale parallel computing tasks such as deep learning training, but may not be the best choice for other kinds of specialized work.
DPU, on the other hand, is specifically designed for data processing tasks, featuring highly optimized hardware structures that cater to the computational needs of specific domains. Its flexibility and high performance make it an essential component of future computing.
From an architectural perspective, CPUs consist of several powerful processing cores optimized for serial processing, with the advantage of executing tasks sequentially one by one. GPUs contain a large number of simpler cores optimized for parallel processing, with the advantage of handling a multitude of tasks simultaneously. DPU is composed of processing cores, hardware accelerator components, and high-performance networking interfaces, facilitating its handling of data-centric large-scale tasks.
Looking at the application fields, CPUs are present in almost all computing devices, including smartphones, computers, and servers. GPUs are commonly used in gaming PCs and, as noted above, for parallel workloads such as deep learning training.
DPU is primarily used in data centers. FPGA is one of the core technologies of DPU, with the ability to reconfigure at the hardware level, making it suitable for a variety of computing tasks. DPU leverages the flexibility of FPGA to achieve efficient data processing by reconfiguring hardware. Heterogeneous computing is another key technology of DPU, which improves overall performance by utilizing different types of processing units to execute tasks simultaneously. The processing units in heterogeneous computing can include CPUs, GPUs, FPGAs, etc., working together to complete computing tasks. With the support of these two technologies, DPU can fully exert its performance advantages, providing strong computational power support for data centers.
In fact, DPU is not the first product to gain attention for compensating for the insufficiency of CPU capabilities. The popularity of GPUs years ago was also to make up for the shortcomings of CPUs in graphic processing capabilities. In other words, from CPUs, GPUs, to today's DPU, what is reflected behind the technological changes is actually the change of the times and the change of user needs.
02
DPU Applications Are Becoming More Scenario-Oriented
The tasks undertaken by DPU can be summarized in four key words: virtualization, networking, storage, and security.
By offloading the control plane to the DPU, the host's business workloads are completely isolated from the control plane, which enhances the security of the virtual environment. The DPU's efficient data processing also accelerates communication between virtual machines, improving virtualization performance. Additionally, innovative algorithms and implementations from the storage industry can be deployed on the DPU architecture, independently of the server operating system. DPU technology helps storage vendors achieve true "compute-storage separation."
In the realm of networking and security, with the increasing frequency of data and privacy breaches, data security and privacy protection have become highly concerning issues. DPU can leverage programmable hardware to offload and accelerate inline security services, providing robust zero-trust protection, effectively isolating host business and control planes, and ensuring data security.
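The division of labor described above can be sketched as a toy dispatcher. This is only an illustrative model, not how any real scheduler or DPU SDK works: the four DPU task categories come from the text, while the GPU task names, the `assign_processor` function, and the fallback rule are assumptions made for the example.

```python
from enum import Enum


class Processor(Enum):
    CPU = "CPU"  # general computing
    GPU = "GPU"  # accelerated (parallel) computing
    DPU = "DPU"  # data processing / infrastructure tasks


# The four DPU task categories named in the text.
DPU_TASKS = {"virtualization", "networking", "storage", "security"}

# Hypothetical examples of parallel workloads routed to the GPU.
GPU_TASKS = {"rendering", "video_processing", "deep_learning_training"}


def assign_processor(task_kind: str) -> Processor:
    """Return the processor class that handles a task kind in this toy model."""
    kind = task_kind.strip().lower()
    if kind in DPU_TASKS:
        return Processor.DPU
    if kind in GPU_TASKS:
        return Processor.GPU
    # Everything else falls back to the general-purpose CPU.
    return Processor.CPU


if __name__ == "__main__":
    for kind in ["networking", "deep_learning_training", "business_logic"]:
        print(f"{kind} -> {assign_processor(kind).value}")
```

The point of the sketch is simply that infrastructure work (virtualization, networking, storage, security) leaves the CPU's queue, so the CPU is left with the business workload itself.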
In terms of specific application scenarios, the data center mentioned earlier is only one of the primary areas where DPUs are applied.
Beyond data centers, DPU is equally capable of "mastering" a multitude of applications. In HPC and AI scenarios, DPU can offer ultra-high bandwidth, lossless networking, and high-speed storage access capabilities, providing the ultra-high-performance networking required for HPC and AI operations. Networking, storage, and security are the main applications of DPU.
In the burgeoning field of edge computing, the introduction of DPU is highly beneficial. As business grows, the demand for edge computing power and bandwidth significantly increases, but the scale and capabilities of edge facilities are limited. CPUs primarily meet the computational needs of core business, leaving little time for handling networking, storage, and security, which are not their forte. The introduction of DPU can greatly reduce the consumption of CPU resources for such tasks, while using specialized hardware to enhance processing performance, thereby significantly improving the processing capabilities of edge computing.
In the intelligent computing scenario, DPU also has a vast market space. DPU, through high-performance networking and domain-specific hardware offloading, provides intelligent computing centers with infrastructure capabilities of large bandwidth, high throughput, and low latency, thereby eliminating data IO bottlenecks and unleashing computational power. This makes DPU a must-have for the infrastructure of intelligent computing centers, significantly enhancing the computational efficiency ratio of computing clusters.
Diverse application scenarios bring abundant business opportunities for DPU, and in the future, DPU is expected to further expand into fields such as autonomous driving, artificial intelligence, and the metaverse.
03
A Flourishing DPU Battlefield
As DPU technology solutions mature and data center deployment accelerates worldwide, manufacturers such as NVIDIA and Intel are mass-producing data processing chips (DPUs/IPUs). The global DPU market is expected to grow explosively in the coming years.
The DPU industry is highly concentrated. According to data from the LeadLeo Research Institute, in recent years the domestic (Chinese) DPU market has been dominated by three international giants, NVIDIA, Broadcom, and Intel, with market shares of 55%, 36%, and 9%, respectively. Companies such as Xilinx, Marvell, Pensando, Fungible, Amazon, and Microsoft have also been producing DPUs or products with similar architectures over the past two to five years, earlier than their domestic counterparts.
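To put a number on "high market concentration", one common yardstick is the Herfindahl-Hirschman Index (HHI), the sum of squared market shares in percentage points. The article's source does not use this metric; it is introduced here only as an illustration, applied to the three shares quoted above.

```python
def herfindahl_hirschman_index(shares_percent):
    """Sum of squared market shares, with shares given in percentage points."""
    return sum(share ** 2 for share in shares_percent)


# Market shares quoted in the text (percent).
shares = {"NVIDIA": 55, "Broadcom": 36, "Intel": 9}

if __name__ == "__main__":
    hhi = herfindahl_hirschman_index(shares.values())
    top3 = sum(shares.values())
    print(f"Top-3 share: {top3}%")  # prints "Top-3 share: 100%"
    print(f"HHI: {hhi}")            # prints "HHI: 4402"
```

The index ranges from near 0 (many tiny players) to 10,000 (a monopoly). Thresholds often cited for a highly concentrated market fall roughly between 1,800 and 2,500, so a value of 4,402 is consistent with the description above.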
NVIDIA's Data Center "Ambition"
Among the many companies listed in the table above, NVIDIA has a first-mover advantage. In March 2019, NVIDIA announced a $6.9 billion deal to acquire the Israeli chip company Mellanox, which closed in April 2020. NVIDIA combined Mellanox's ConnectX series of high-speed network adapter technology with its own and, in 2020, launched two DPU products, the BlueField-2 DPU and BlueField-2X DPU, raising the curtain on DPU development.
Today, NVIDIA's BlueField series of chips has reached its third generation. The NVIDIA BlueField-3 DPU is an infrastructure computing platform that supports speeds of 400Gb/s, capable of line-rate processing of software-defined networking, storage, and cybersecurity tasks. BlueField-3 combines powerful computing capabilities, high-speed networking, and extensive programmability, providing software-defined hardware acceleration solutions for demanding workloads. From accelerating AI to hybrid cloud and high-performance computing, to 5G wireless networks, BlueField-3 redefines various possibilities.
The main functions of NVIDIA's DPU are data security, network security, and storage offloading. In NVIDIA's layout of DPU, it is also evident that it has ambitions in the application field of data centers. Some people say that NVIDIA is "trying to use DPU to replicate the path of GPU replacing display accelerator cards to become a universal display chip."
Intel Launches IPU to Challenge Data Centers
In June 2021, Intel introduced a new IPU product (which can be seen as Intel's version of DPU), integrating FPGA with the Xeon D series processor, becoming a strong competitor in the DPU race. The IPU is an advanced networking device with enhanced accelerators and Ethernet connections, using tightly coupled, dedicated programmable cores to accelerate and manage infrastructure functions. During the 2022 Vision Global User Conference, Intel also unveiled the IPU development roadmap, showcasing products and platforms that will be released in the next three years and beyond.
In the development roadmap, Intel revealed three IPU product types: ASIC, IPU platform, and SmartNIC. Intel also described two development paths: one based on a dedicated ASIC IPU, code-named Mount Evans, and the other based on an FPGA acceleration solution, the IPU platform code-named Oak Springs Canyon.
Intel also explained how the two types of IPU differ in their application characteristics. An FPGA-based IPU can reach the market quickly, keep pace with evolving network standards, and, thanks to its reprogrammability and secure data paths, flexibly handle a variety of specific workloads. An ASIC-based IPU offers the best balance of performance and power consumption and can be used to secure networking and storage workloads.
According to the roadmap, Intel's second-generation IPUs launched in 2022: Mount Evans (Intel's first ASIC IPU) and Oak Springs Canyon (Intel's second-generation FPGA IPU), both currently shipping to Google and other service providers. The third-generation 400GB IPUs, code-named Mount Morgan and Hot Springs Canyon, are expected to begin shipping to customers and partners in 2023/2024. The next-generation 800GB IPUs are anticipated to start shipping to customers and partners in 2025/2026.
Furthermore, Xilinx has introduced its Alveo SmartNIC product portfolio as a DPU offering. A DPU can be used as a standalone embedded processor, but it is typically integrated into a SmartNIC. Broadcom has the Stingray, while Marvell offers the OCTEON and ARMADA product series.
Compared to the CPU and GPU arenas, the DPU is undoubtedly a brand-new playing field. With the exponential increase in network traffic, the DPU market outlook is vast. As international giants intensify their DPU business layout, the domestic chip market also frequently brings good news.
04
Domestic Manufacturers Showcase Their Strengths
In recent years, the country has been continuously promoting the rapid development of the digital economy. Computational infrastructure is an essential foundation for the digital economy, and computing power and high-performance networking have become the core capabilities of computational infrastructure. Especially driven by demands such as artificial intelligence and edge computing, high-performance networking and DPU have become increasingly important.
Recently, six departments including the Ministry of Industry and Information Technology, the Cyberspace Administration of China, the Ministry of Education, the National Health Commission, the People's Bank of China, and the State-owned Assets Supervision and Administration Commission of the State Council, jointly issued the "Action Plan for High-Quality Development of Computational Infrastructure." The plan specifically outlines the main goals, key tasks, and safeguard measures for the development of computational infrastructure by 2025. It emphasizes the need to carry out technology upgrades and pilot applications of DPU and other technologies for scenarios such as intelligent computing, supercomputing, and edge computing, to achieve high-performance transmission in computational center networks. This is the first time a national-level document has pointed out the direction for DPU development over the next three years.
With the rapid development of the DPU industry, a large number of DPU companies have also emerged domestically.
Zhongke Yushu
Zhongke Yushu, based on its self-developed agile heterogeneous KPU chip architecture and DPU software development platform HADOS, has independently developed the industry's first DPU chip and standard acceleration card series products that integrate high-performance networking and database acceleration functions. These products can be widely applied to scenarios such as ultra-low latency networks, big data processing, 5G edge computing, and high-speed storage, helping computing power become the new productivity of the digital era.
In terms of R&D iteration of DPU products, Zhongke Yushu taped out the first-generation DPU chip K1 in 2019, and the second-generation DPU chip K2 was successfully taped out at the beginning of 2022. Currently, the company has started the development of the third-generation DPU chip K2 Pro. In response to key performance bottlenecks and business needs in data centers, Zhongke Yushu has also launched a series of products such as RDMA acceleration cards and cloud-native network acceleration cards based on its self-developed DPU chips. These support the interconnection of ultra-large-scale networking computing power, to support the construction of the computing power base with the necessary 100G+ ultra-high bandwidth and low latency. This allows more CPU/GPU computing power to truly serve the business, providing a complete set of solutions with higher performance and better computing power for the construction of intelligent computing centers.
Under the wave of domestic construction, Zhongke Yushu is also fully embracing the domestic ecosystem and actively carrying out product compatibility certification with domestic industry chain upstream and downstream manufacturers. Currently, Zhongke Yushu has completed compatibility adaptation with six major domestic CPU chip manufacturers, 12 mainstream operating systems, nine mainstream database manufacturers, eight leading cloud/cloud-native manufacturers, and 17 top-level server manufacturers.
Xinqi Yuan
Xinqi Yuan has a fully independent intellectual property rights DPU chip. The Xinqi Yuan DPU provides greater processing power, stronger flexibility, programmable packet processing, and an expandable Chiplet structure compared to traditional smart network cards. The chip design adopts the NP-SoC model, combining the general ARM architecture with a highly optimized packet-oriented NP chip (RISC-V core) and a multi-threaded processing mode, enabling it to achieve the data processing capabilities of ASIC solidified chips. At the same time, it takes into account the attributes of full programmability and flexible scalability to support performance goals of 400Gbps and above, low power, and cost-effectiveness.
Xinqi Yuan began developing its first generation of FPGA smart network cards in 2019 and started launching second-generation products based on the NP-SoC architecture in 2020, gradually entering the market. Xinqi Yuan has since launched a DPU-based smart network card built on the NP-SoC architecture, featuring programmability, scalability, and high performance. It has been mass-produced, shipped, and deployed commercially, and it adapts to a wide range of application scenarios, making Xinqi Yuan one of the first domestic chip companies to enter the DPU field in the true sense.
It is reported that the new generation of NFP-7000 DPU chip under development by Xinqi Yuan will benchmark against Nvidia's BlueField-3 and promote the domestication of industry network cards with the "general-purpose chip + customized software" model. From the design goal perspective, the performance and functionality of this chip are no less than Nvidia's BlueField-3. Moreover, the chip will set its capability range according to different scenario needs in the future, which will greatly reduce the cost of the chip and better meet the multi-scenario needs of domestic chips.
Yunbao Intelligence
Currently, Yunbao Intelligence leads the domestic data center scenario with a domestically produced DPU chip solution. The Yunbao Intelligence DPU SoC is the first general-purpose programmable DPU chip in China, with rich programmability and complete DPU functions. It supports different cloud computing scenarios and unified resource management, optimizing the utilization of data center computing resources.
By offloading, accelerating, and isolating high-speed networking, elastic storage, security services, and reliable operation and management, the Yunbao Intelligence DPU provides a new generation of computing platform for cloud computing, data centers, artificial intelligence, and edge computing.
Yunbao Intelligence has also launched in-depth cooperation with leading cloud computing companies, telecommunications operators, and state-owned enterprises to jointly promote the industrialization of DPUs.
Dayu Zhixin
Dayu Zhixin also has successful experience in DPU design and development and in large-scale commercial deployment. Its Paratus series brings easy-to-use, practical DPU products to the broad commercial market through three parallel product lines:
Paratus 1.0, as the first product line of Dayu Zhixin DPU, uses ARM SoC as the main processing unit, providing multiple 10Gbps/25Gbps service network interfaces, and for user convenience in management, a separate RJ45 management port is set up.
Paratus 2.0, as the second product line of Dayu Zhixin DPU, adopts the hardware architecture of ARM SoC + FPGA. Based on the Paratus 1.0 product, it uses FPGA to achieve high-performance forwarding of data packets with solidifiable logic, providing multiple 10G/25G, 100G service network interfaces.
Paratus 3.0, as the third product line, will adopt a self-developed DPU chip by Dayu Zhixin. This chip will integrate the company's understanding of DPU-related technologies and future application scenarios, along with valuable customer feedback and experience accumulated from the actual deployment of the first two product lines (Paratus 1.0 and Paratus 2.0), to form a highly integrated DPU product.
Alibaba Cloud CIPU
At the 2022 Alibaba Cloud Summit, Alibaba Cloud officially released the Cloud Infrastructure Processor (CIPU). The predecessor of CIPU is the MoC card (Micro Server on a Card), which is the essence of the Shenlong architecture. The MoC card has independent I/O, storage, and processing units, undertaking the work of network, storage, and device virtualization.
The first and second generations of MoC cards solved the problem of zero overhead in the narrow sense of computing virtualization, while the virtualization of the network and storage parts was still implemented by software. The third generation of MoC cards achieved partial hardening of network forwarding functions, significantly improving network performance. The fourth generation of MoC cards achieved full hardware offloading of network and storage, and also supported RDMA capabilities.
In addition to the companies listed above, several outstanding domestic manufacturers such as Nebula Intelligence and Ruiwen Technology have seized the market with their advantages in technological innovation and product definition, following a differentiated path. However, it is important to note that the domestic DPU industry is still in its early stages of development. For domestic DPU companies, the most critical task at hand is to first produce actual products and test them in application scenarios. After all, as an emerging technological field, the development of DPU products is challenging, and the market has extremely high demands for their performance, stability, and security.
05
DPU Enters a Period of Explosive Growth
According to data from CCID Consulting, the global DPU market size will exceed ten billion US dollars starting from 2023 and enter a fast track with an annual growth rate of over 50%. The Chinese DPU market size is also expected to exceed 30 billion yuan in 2023, showing a leapfrog growth, with the domestic market size projected to reach 56.59 billion yuan by 2025, achieving a five-year compound annual growth rate of 170.60%.
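As a quick sanity check on the growth figure, the compound annual growth rate (CAGR) formula can be inverted to see what starting market size the quoted numbers imply. The sketch below assumes the five-year window ends in 2025, which the text does not state explicitly; the function names are illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


def implied_base(end_value: float, rate: float, years: int) -> float:
    """Starting value implied by an end value and a CAGR over `years` years."""
    return end_value / (1 + rate) ** years


if __name__ == "__main__":
    end_2025 = 56.59      # billion yuan, from the text
    quoted_cagr = 1.7060  # 170.60%, from the text
    years = 5             # assumption: the window ends in 2025

    base = implied_base(end_2025, quoted_cagr, years)
    print(f"Implied starting market size: {base:.2f} billion yuan")
    print(f"Round-trip CAGR: {cagr(base, end_2025, years):.2%}")
```

Under that assumption, the quoted rate implies a starting market of roughly 0.39 billion yuan, which shows how small the base was: growth of 170.60% per year for five years multiplies the market by about 145.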
At present, cloud computing leaders including Amazon, Alibaba Cloud, and Huawei are all developing DPU product lines that meet their own requirements.
In addition to data centers, intelligent driving, data communication, and network security are also downstream application fields for DPU.
Furthermore, DPU and DOCA (NVIDIA's software framework for BlueField DPUs) are of great significance for large models and generative AI. According to Gartner, by 2026 more than 80% of enterprises are expected to use generative AI application programming interfaces (APIs) or models, or to deploy applications that support generative AI in production environments. This proportion was less than 5% in 2023, which means that within just three years, the share of enterprises adopting or creating generative AI models is expected to grow at least 16-fold.
This implies that the next three years are a window of opportunity for the explosive growth of generative AI and a period of opportunity for the popularization of BlueField DPU and DOCA.