With the breakout success of ChatGPT, it is evident that society at large is paying close attention to generative artificial intelligence. As large models proliferate and their parameter counts continue to surge, the demand for computing power is growing in step.
According to the definition in the "China Computing Power Development Index White Paper," computing power is the computational ability of a device to process data and achieve specific output results.
The core of computing power lies in computing chips such as CPUs and GPUs, carried by computers, servers, and all manner of intelligent terminals. The processing of massive data and the operation of every digital application depend on it.
So, what kind of computing power chips are suitable for different application scenarios, and what are the differences between different computing power chips?
01
What Kind of Computing Power Chips Are Needed for Different Scenarios
From small devices like earphones, smartphones, and PCs, to large-scale applications such as automobiles, the internet, artificial intelligence, data centers, supercomputers, and space rockets, "computing power" plays a central role. Different computing power scenarios have varying requirements for chips.
Data centers, as the core infrastructure of the digital age, shoulder enormous data processing, storage, and transmission workloads, and therefore require powerful computing power to cope with complex computational demands. Data centers and supercomputers need high-performance chips delivering more than 1000 TOPS. Supercomputing centers have already entered the exascale era (on the order of 10^18 operations per second) and are developing toward zettascale (a thousand times exascale). Data centers place extremely high demands on chips' low power consumption, low cost, reliability, and generality.

Intelligent autonomous driving involves human-machine interaction, visual processing, intelligent decision-making, and more. The growing number of vehicle-mounted sensors (lidar, cameras, millimeter-wave radar, etc.) and the ever-rising demands on the real-time performance, complexity, and accuracy of data processing set a high bar for onboard computing power. The industry typically estimates that Level 2 (L2) driving assistance requires under 10 TOPS, L3 requires 30-60 TOPS, L4 requires over 300 TOPS, and L5 requires over 1000 TOPS, possibly even more than 4000 TOPS. The computing power needed for autonomous driving thus far exceeds what is common in smartphones and computers: the NIO ET5's processors reach 1016 TOPS, and the Xiaopeng P7's reach 508 TOPS. Because safety is paramount in intelligent driving, the scenario places extremely high demands on chip reliability, with relatively high requirements for versatility and comparatively less stringent demands on power consumption and cost.
To handle complex tasks such as video processing, facial recognition, and anomaly detection, while reserving computing resources for future upgrades and expansion, intelligent security systems require approximately 4-20 TOPS of computing power. Although this figure is far smaller than a data center's, it suffices for efficient and stable operation, and as AI security enters its second half and computing power grows in importance, the figure keeps rising. Intelligent security places high demands on low cost and reliability, with moderate requirements for power consumption and versatility.
In the field of intelligent mobile terminals, small products such as wearables have relatively modest computing power needs, but demand in smartphones and laptops is rising rapidly. For example, the A14 chip in the iPhone 12 of a few years ago delivered about 11 TOPS, while the Snapdragon 865 in the Xiaomi 10 delivered 15 TOPS. As AI becomes more deeply integrated into smartphones, the Snapdragon 888 reached 26 TOPS, and successors such as the 8 Gen 1 and 8 Gen 2 have improved significantly further. Intelligent mobile terminals likewise demand low power consumption and low cost, with relatively high reliability requirements and few constraints on versatility.
02
Mainstream Computing Chips and Their Characteristics
Currently, basic computing power is primarily provided by servers based on CPU chips, which are oriented towards basic general computing. Intelligent computing power is mainly provided by accelerated computing platforms based on chips such as GPUs, FPGAs, and ASICs, which are oriented towards artificial intelligence computing. High-performance computing power is mainly provided by computing clusters that integrate CPU and GPU chips, mainly oriented towards applications such as scientific and engineering computing.
The CPU is the king of traditional general-purpose computing, comprising an arithmetic logic unit (ALU), a control unit, and memory as its main components. Data is stored in memory; the control unit fetches it and hands it to the ALU for computation, and when the computation completes, the results are written back to memory. The CPU's strength is its versatility: it can handle every type of computing task, but its computational efficiency falls short of chips designed for a specific task.
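The fetch-compute-store cycle described above can be sketched in a few lines. This is purely illustrative: the names (`alu`, `control_unit`, the instruction tuples) are invented for the example, and a real CPU implements all of this in hardware, not software.

```python
# Illustrative sketch of the CPU cycle: the control unit fetches operands
# from memory, dispatches them to the ALU, and stores the result back.
memory = {"a": 6, "b": 7, "result": None}  # data lives in memory

def alu(op, x, y):
    """Arithmetic logic unit: performs the actual computation."""
    return {"add": x + y, "mul": x * y}[op]

def control_unit(program, memory):
    """For each instruction: fetch operands, compute, write back."""
    for op, src1, src2, dst in program:
        x, y = memory[src1], memory[src2]   # fetch from memory
        memory[dst] = alu(op, x, y)         # compute, then store

control_unit([("mul", "a", "b", "result")], memory)
print(memory["result"])  # 42
```

The same loop would happily run any instruction sequence, which mirrors the CPU's defining trait: general, but serial.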
GPU was initially used to accelerate graphics rendering and is also known as a powerful tool for graphic processing. In recent years, GPUs have shown excellent performance in fields such as deep learning and have been widely used in artificial intelligence computing. The characteristic of the GPU is its large number of parallel computing units, capable of processing a large amount of data simultaneously, making it very efficient in parallel computing tasks. However, the versatility of the GPU is not as high as that of the CPU and is only suitable for specific types of computing tasks.
ASIC is a chip designed for a specific task. Because it implements the algorithm directly in hardware, it achieves extremely high computational and energy efficiency on that task. The ASIC's defining trait is its strong specificity: it applies only to its target task, but its efficiency far exceeds that of CPUs and GPUs, making it suitable for large-scale or mature products.

FPGAs compute directly in configurable gate circuits, which is relatively fast. Compared to GPUs, FPGAs offer higher processing speeds and lower power consumption, though they still fall short of ASICs built on the same process. FPGAs can be reprogrammed, however, making them more flexible than ASICs. They suit rapid iteration or small-batch production, and in the AI field, FPGA cards can serve as accelerators that speed up AI algorithms.
GPGPU, or General-Purpose computing on Graphics Processing Units, prepends "general-purpose" (GP) to "GPU." The goal is to harness the GPU's parallel computing capability to accelerate general, non-graphics computing tasks. In layman's terms, a GPGPU is a tool that helps the CPU perform computations unrelated to graphics, and it suits large-scale parallel workloads such as scientific computing, data analysis, and machine learning.
03
GPUs are the optimal solution for AI, but not necessarily the only one.
In the AI boom triggered by ChatGPT, GPUs are the most popular choice. To advance AI, leading tech giants around the world are competing to stockpile NVIDIA's GPUs. Why are GPUs favored by many manufacturers in the AI era?
The reason is straightforward: AI computing is similar to graphics computing, involving a large number of high-intensity parallel computing tasks.
Specifically, training and inference are the cornerstones of large AI models. During training, a complex neural network model is fitted by feeding in vast amounts of data; during inference, the trained model is applied to new data to draw conclusions.
The training and inference processes of neural networks involve a series of specific algorithms, such as matrix multiplication, convolution, recurrent layer processing, and gradient computation. These algorithms can typically be highly parallelized, meaning they can be broken down into numerous small tasks that can be executed simultaneously.
GPUs possess a multitude of parallel processing units that can rapidly perform the matrix operations required in deep learning, thus accelerating the training and inference of models.
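The parallelism described above can be made concrete with matrix multiplication, the workhorse of deep learning: every entry of the product is an independent dot product, so all entries can in principle be computed at once, which is exactly what a GPU's many processing units exploit. The sketch below decomposes a tiny product into independent tasks; real frameworks do this on GPU hardware, not with a Python thread pool.

```python
# Each entry of C = A x B is one independent "small task" (a dot product),
# so the tasks can all be dispatched concurrently with no dependencies.
from concurrent.futures import ThreadPoolExecutor

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

def cell(i, j):
    """Compute the single output entry C[i][j]: a dot product."""
    return sum(A[i][k] * B[k][j] for k in range(len(B)))

with ThreadPoolExecutor() as pool:
    # Submit every (i, j) task at once; none waits on another.
    futures = {(i, j): pool.submit(cell, i, j)
               for i in range(len(A)) for j in range(len(B[0]))}

C = [[futures[(i, j)].result() for j in range(len(B[0]))]
     for i in range(len(A))]
print(C)  # [[19, 22], [43, 50]]
```

A GPU runs thousands of such cells simultaneously in hardware, which is why it accelerates both training and inference so dramatically.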
Currently, most enterprises use NVIDIA GPU clusters for AI training. With proper optimization, a single GPU card can deliver computing power equivalent to dozens or even hundreds of CPU servers. AMD and Intel are actively enhancing their technical capabilities to compete for market share, and leading Chinese manufacturers include Jingjia Micro, Loongson Technology, Haiguang Information, Cambricon, and Sino IC Design, among others.

In the AI field, then, GPUs are leading the pack, just as NVIDIA positions itself as a leader in artificial intelligence: almost every current AI application depends on GPUs.
At this juncture, one might question whether GPUs alone are sufficient in the era of prevailing AI. Will GPUs monopolize the future AI market, becoming the undisputed favorite?
The author believes that this is not the case. While GPUs are indeed the optimal solution at present, they are not necessarily the only solution.
CPUs can play a more significant role.
Although GPUs currently dominate the AI field, they also face certain challenges and limitations. For instance, supply chain issues with GPUs have led to price increases and shortages, which are burdens for AI developers and users. CPUs, on the other hand, have more competitors and partners, which can promote technological advancement and reduce costs. Moreover, CPUs also have more optimization techniques and innovative directions, allowing them to play a greater role in the AI field.
Some more streamlined or compact models can also run with excellent efficiency on traditional CPUs, often more cost-effectively and with less energy. This shows that hardware choices must weigh the strengths of different processors against the specific application scenario and model complexity. For example, Julien Simon, Chief AI Evangelist at HuggingFace, demonstrated Q8-Chat, a language model running on Intel Xeon processors. With 7 billion parameters, it runs on a 32-core CPU and offers a chat interface similar to OpenAI's ChatGPT, answering user questions quickly, reportedly even faster than ChatGPT.
In addition to running ultra-large-scale language models, CPUs can also run smaller and more efficient language models. These language models, through innovative technologies, can significantly reduce computational load and memory usage, thus adapting to the characteristics of CPUs. This also means that CPUs have not been completely marginalized in the AI field but possess undeniable advantages and potential.
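One family of techniques behind such CPU-friendly models is low-bit weight quantization (the "Q8" in Q8-Chat suggests 8-bit). The sketch below shows the basic idea of mapping float weights to int8 plus a single scale factor; the scaling and rounding scheme here is a simplified assumption for illustration, not the actual implementation used in Q8-Chat.

```python
# Minimal sketch of symmetric 8-bit quantization: store each weight as an
# int8 plus one shared float scale, cutting per-weight storage to 1/4 of
# float32 and easing memory-bandwidth pressure on CPUs.

def quantize(weights):
    """Map float weights to integers in [-127, 127] plus a scale."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights."""
    return [x * scale for x in q]

w = [0.12, -0.5, 0.33, 1.27]
q, s = quantize(w)
approx = dequantize(q, s)
# q holds small integers; approx is close to w (error at most scale / 2)
```

The reconstruction error per weight is bounded by half the scale, which is why modest-precision formats often preserve model quality while sharply reducing compute and memory cost.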
The global CPU market is dominated by a duopoly of Intel and AMD, with a combined market share exceeding 95%. Meanwhile, six major domestic CPU manufacturers, Loongson, Sunway, Haiguang, Zhaoxin, Kunpeng, and Phytium, are rapidly emerging, accelerating the development of domestic CPUs.
CPU + FPGA, CPU + ASIC also have potential.
Moreover, because AI acceleration servers are heterogeneous, the market offers various architectures beyond the CPU + GPU combination, such as CPU + FPGA, CPU + ASIC, and CPU + multiple acceleration cards.

Technology changes rapidly, and more efficient new technologies tailored to AI computing may well emerge. CPU + FPGA and CPU + ASIC combinations are among the possibilities for the future.
CPUs excel at logical control and serial processing, while FPGAs possess parallel processing capabilities and hardware acceleration features. By combining the two, the overall performance of the system can be significantly enhanced, especially when dealing with complex tasks and large-scale data. The programmability of FPGAs allows for flexible configuration and customization based on specific application scenarios. This means that the CPU + FPGA architecture can adapt to a variety of different needs, from general computing to acceleration of specific applications, all of which can be achieved by adjusting the FPGA configuration.
ASICs, on the other hand, are integrated circuits specifically designed for particular applications, and thus they are usually highly optimized in terms of performance and power consumption. When used in conjunction with CPUs, they can ensure that the system has excellent performance and efficiency when handling specific tasks. Additionally, the design of ASICs is fixed, and once manufactured, their functionality will not change. This makes ASICs perform well in scenarios that require long-term stable operation and high reliability.
The global FPGA chip market is primarily dominated by a duopoly of Xilinx and Intel, with a combined market share as high as 87%. Major domestic manufacturers include Fudan Microelectronics, Unigroup Guoxin, and Anlogic Technology. International giants such as Google, Intel, and NVIDIA have successively released ASIC chips. Domestic manufacturers like Cambricon, HiSilicon, and Horizon Robotics have also launched ASIC chips that accelerate deep neural networks.
GPGPUs can utilize higher-level programming languages and are more powerful in terms of performance and versatility, making them one of the mainstream choices for AI acceleration servers at present. The core manufacturers of GPGPUs mainly include NVIDIA, AMD, Biren Technology, MuXi, and TianShu Zhixin, among others.
04
What is the scale of China's computing power?
According to IDC's forecast, the new data generated globally over the next three years will exceed the total of the past 30 years, and by 2024 the total global data volume is expected to grow at a compound annual rate of 26% to 142.6 zettabytes (ZB). This will drive exponential growth in demand for data storage, transmission, and processing, continuously raising the demand for computing resources. In addition, large-scale model training and inference for scenarios such as artificial intelligence require a robust supply of high-performance computing power.
In recent years, China has made significant progress in the construction of computing power infrastructure.
By the end of 2023, the total scale of data center racks in use nationwide exceeded 8.1 million standard racks, with total computing power reaching 230 EFLOPS. Computing power is accelerating its penetration into government affairs, industry, transportation, healthcare, and other fields. At the same time, under the layout of the "East Data West Computing" project and the national integrated computing power network, Phase I of the China Computing Power Network (Smart Computing Network) has gone live, and a national computing power "one network" has taken shape.

On the policy front, China has successively issued documents such as the "Implementation Plan for the Computing Hubs of the National Integrated Big Data Center Collaborative Innovation System," the "Action Plan for the High-Quality Development of Computing Infrastructure," and the "14th Five-Year Plan for Digital Economy Development" to promote the construction of computing infrastructure. The country is also promoting intelligent computing centers across multiple regions, gradually expanding from east to west; more than 30 Chinese cities are building or planning to build such centers. Under policies issued by the Ministry of Science and Technology, "in jointly deployed public computing platforms, the nominal computing power supplied by independently developed chips should be no less than 60%, and domestically developed frameworks should be prioritized, with a usage rate of no less than 60%." The penetration rate of domestic AI chips is therefore expected to rise rapidly. According to IDC, China's intelligent computing power will grow quickly, at a compound annual rate of 52.3% from 2021 to 2026.