Continuing to employ advanced packaging technologies for ongoing miniaturization will necessitate a transformation across the entire semiconductor ecosystem.
Multi-chip designs pose challenges in managing design complexity, driving up the cost per transistor, compressing market windows, and spurring the entire chip industry to race for new tools and methods. For decades, the entire semiconductor design ecosystem (from EDA and IP providers to wafer fabs and equipment manufacturers) has evolved based on the assumption that more functionality can be integrated into chips and packages while improving power consumption, performance, and the area/cost equation. However, as integrating all these functionalities into a single chip or package becomes more challenging, the complexity of developing these devices has increased dramatically.
It is estimated that advanced packaging technologies will soon accommodate a trillion transistors in a single package, and tightly controlling power consumption, performance, and area/cost (PPA/C) at that scale will require significant shifts at every stage of the design-to-manufacturing flow.
"The industry is not ready yet, but we are moving in that direction," said Sutirtha Kabir, Senior Architect of Research and Development Engineering at Synopsys. "We believe there are steps between today and that year, whether it's 2030 or earlier. Suppose you take an SoC and fold it [a simple 3D-IC analogy], and suppose all you do is put them into two chips with the same functionality, without any other changes. Your transistor count doesn't change, but what you're doing in the process is adding interfaces between these two chips, whether it's bumps or hybrid bonding interconnects (HBI)."
Designs that were previously completed on a single chip have become more complex, with functionality now distributed across multiple chips or chiplets. "Essentially, tasks that used to be straightforward have become more difficult," said Ron Press, Senior Director of Technology Enablement for Tessent Silicon Lifecycle Solutions at Siemens EDA. "Remember Bill Gates' famous 1981 line, '640K of memory should be enough'? That was applicable at the time. Complexity is the driving force behind EDA. Once a task becomes too difficult to perform using traditional methods, some form of abstraction and automation becomes necessary. From the early days of electronics, this is what has driven everything from compilers for programming languages to silicon design and many EDA tools. So the definition of complexity is always relative to the current level of technology."
Higher data rates add to this complexity. "If you look at the relationship between data rates and time, for 2G, 2.5G, 3G, 4G, and 5G, the data rates they support have roughly kept pace with the growth of Moore's Law, which also confirms the continuous increase in complexity," noted Chris Mueth, Director of New Market Management at Keysight. "The 2G phones of yesteryear were composed of a bunch of components: transistors, small modules, and discrete parts. Back then, phones were filled with electronic components, leaving little room for additional functionality. But now everything is integrated. The modules are almost the same size as the IC chips of old, containing everything inside. And 3D-IC will take this to a new level."
This also significantly increases verification challenges. "In the 2.5G era, a phone might have had 130 specifications, while a 5G phone might have 1,500 specifications that need to be verified," Mueth said. "Now there are many different frequency bands, different operating modes, different voltages, digital controls, and so on. You have to verify everything before shipping, because the last thing you want is to find a problem after the phone has already hit the market."

All of this has led to a huge increase in complexity and is seriously disrupting long-standing chip design methods.
"In the past, a single-chip designer might have worried about these issues, but that was more of a packaging problem," said Kabir from Synopsys. "Let the packaging people worry about it. The chip design team only needs to work up to the pins. RDL bump connections will always have something happening. But now, because the connections between signals and signals are made through these bumps between the chips, chip designers must worry about this issue. What we're seeing this year is that we started with millions of bumps, and now the number of bumps is rapidly increasing to about 10 million, and it's expected that within two or three years, multi-chip designs will include 50 million HBI connections."
Others share the same view. "In the many years I've been in this industry, I've always felt that we were solving the most complex problems of the time," noted Arif Khan, Senior Director of Product Marketing for Design IP at Cadence. "Moore's Law applied to monolithic systems until it hit the reticle limit and process limitations. Transistor density did not grow linearly with process technology advancements, and our demand for increasingly complex designs continued unabated, pushing us to the physical limits of the lithography image field (the reticle limit). NVIDIA's GH100 design, for example, packs roughly 80 billion transistors into an 814 square millimeter die on a 4-nanometer process."
Shrinking on Multiple Dimensions
As advanced process technologies become more complex, wafer costs are exceeding historical norms. Combined with the diminishing transistor scaling delivered by each new process generation, the cost per transistor at each successive leading-edge node is now higher than at the previous one.
"This poses a dilemma for design because the cost of designing and manufacturing at newer process nodes is much higher," Khan said. "Larger designs naturally produce fewer wafers. When considering random defects, the loss in yield is greater when the wafer size is larger, and a smaller fraction of the denominator will be unusable unless these wafers can be repaired. As process technology advances beyond 5 nanometers, extreme ultraviolet technology has reached the limit of single-layer lithography. High numerical aperture EUV technology is now coming into play, doubling the magnification and allowing for smaller pitches, but halving the mask size. Therefore, today's increasingly complex and larger designs have no choice but to be broken down, and chiplet technology is the holy grail."
At the same time, there is a greater focus on adding new features to the design, with the main limitation being the mask size. This adds a whole new level of complexity.
"In the heyday of IBM mainframes and Intel/AMD x86 servers, everything was about clock speed and performance," observed Ashish Darbari, CEO of Axiomise. "Due to the Arm architecture, from the late 90s, power consumption became the dominant driving factor in the industry, and as chips were compressed into smaller form factors such as mobile phones, watches, and micro-sensors, performance, power, and area (PPA) determined the quotient of design complexity. According to a 2022 report by Wilson Research, it is reported that 72% of ASIC power management is active, and power management verification is a growing challenge. However, with the rapid application of silicon in automotive and IoT, functional safety and design complexity have taken the lead. When designing chips, you cannot ignore power, performance, and area (PPA) — as well as security and/or confidentiality.According to the Wilson Research report by Harry Foster, 71% of FPGA projects and 75% of ASIC projects consider both security and confidentiality. With the emergence of "Meltdown" and "Spectre" (2018), along with a series of ongoing chip security vulnerabilities, including "GoFetch" in 2024—security issues are proving to be a direct result of design complexity. What's worse, security vulnerabilities often stem from performance-enhancing optimizations, such as speculative prefetching and branch prediction.
"To achieve low-power optimization, designers have employed selective state retention, clock gating, clock dividers, hot and cold resets, and power islands, which introduce verification challenges in clock and reset validation," said Darbari. "Multi-speed clocks introduce challenges regarding glitches, clock domain crossing, and reset domain crossing."
Although computational performance has always dominated the design field, it is now just one of many factors, alongside mobility and access to the growing amount of data generated by sensors and AI/ML. "HBM is one of the cornerstones of AI/ML chips, and that is the direction our industry is heading," said Darbari. "If you look at design complexity more broadly, beyond PPA, safety, and security, we should note that in the era of hundreds of cores on a single chip and AI/ML, we are re-examining the design challenges of high-performance computing while minimizing the power footprint and optimizing arithmetic (fixed-point/floating-point) data formats and correctness. Moving data faster at low power over high-performance NoCs introduces deadlock and livelock challenges for designers. The RISC-V architecture has opened the door for anyone to design processors, leading to clever designs that can serve as both CPUs and GPUs, but the foundational design complexity around PPA, safety, and security, as well as deadlock, livelock, and compute- and memory-intensive optimizations, will remain as relevant for RISC-V as it was before the RISC-V era. Over the past six years, a significant amount of work has gone into establishing compliance between RISC-V micro-architectural implementations and the RISC-V instruction set architecture (ISA), using simulation for bring-up testing and formal methods to mathematically prove compliance. RISC-V validation, especially for low-power, multi-core processors, will open a Pandora's box of verification challenges, as not many design companies have the same level of verification capability as more established companies. Wilson Research's report suggests that for ASICs, 74% of designs surveyed have one or more processor cores, 52% have two or more, and 15% have eight or more, something we are seeing more of in our own work deploying formal verification."
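Deadlock, one of the hazards Darbari cites for NoC-based designs, can be reproduced in miniature. The toy below is a software analogy rather than an NoC model: two routines each hold one lock and wait for the other's, producing the classic circular wait. The lock names and timeouts are invented for illustration.

```python
# Toy illustration of circular-wait deadlock (a software analogy for the
# kind of resource cycles that NoC designers must prove cannot occur).
import threading
import time

link_a = threading.Lock()   # stand-ins for two shared resources,
link_b = threading.Lock()   # e.g. buffers along two routes

def route_1():
    with link_a:
        time.sleep(0.1)                      # hold A, then request B
        acquired = link_b.acquire(timeout=1)
        print("route_1 got B:", acquired)
        if acquired:
            link_b.release()

def route_2():
    with link_b:
        time.sleep(0.1)                      # hold B, then request A -> circular wait
        acquired = link_a.acquire(timeout=1)
        print("route_2 got A:", acquired)
        if acquired:
            link_a.release()

t1 = threading.Thread(target=route_1)
t2 = threading.Thread(target=route_2)
t1.start(); t2.start()
t1.join(); t2.join()
# Both acquire attempts time out: neither route can make progress.
# The usual fix is a global ordering on resource acquisition, the software
# cousin of ordering virtual channels to break cycles in a NoC.
```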
Solutions to Complexity Challenges
Complexity challenges are addressed by continuously building on the capabilities of previous generations through automation and abstraction.
"Over time, more and more trade-offs and optimizations are embedded into EDA tools, so users can provide less complex 'intent' commands, allowing the tools to do the difficult and tedious work," said Press of Siemens. "Innovation is necessary to tackle some of the complexity, such as how to communicate between devices and sort data. In the testing community, scan is a method of transforming the design into shift registers and combinational logic. Scan makes automatic test pattern generation possible, so EDA tools can generate high-quality test patterns without someone having to understand the functional design. As data and test times become too large, embedded compression is used to improve efficiency."
Darbari agrees. "Testing and verification have evolved from the architectural verification kits of the '70s and '80s to constrained-random, formal verification, and emulation. Each new verification technology deals with designs at a different level of abstraction, and used correctly they are complementary. While emulation can reason about function and performance at the full-chip level, constrained-random and formal are good techniques at the RTL level, and formal verification is the only technique for constructing proofs of bug absence. We are seeing increased application of formal verification in architectural validation, as well as in finding deadlocks, livelocks, and logic-related errors."
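As a toy illustration of Darbari's point about proofs of bug absence (a software sketch with an invented bug, not a claim about any particular formal tool), the example below exhaustively checks a small function over all 65,536 input pairs, while random testing has only a modest chance of ever hitting the single failing case.

```python
# Toy contrast between exhaustive (formal-style) checking and random simulation.
# The buggy function and bit-widths are invented for illustration.
import random

def reference(a, b):
    return (a + b) & 0xFF                 # intended 8-bit adder behavior

def implementation(a, b):
    if a == 0xDE and b == 0xAD:           # a single corner-case bug
        return 0
    return (a + b) & 0xFF

# "Formal": exhaustively enumerate the whole input space (feasible here).
counterexamples = [(a, b) for a in range(256) for b in range(256)
                   if implementation(a, b) != reference(a, b)]
print("exhaustive check found:", counterexamples)

# Constrained-random: 10,000 random trials have roughly a 14% chance of
# ever hitting the one bad input pair (1 - (1 - 1/65536) ** 10000).
hits = 0
for _ in range(10_000):
    a, b = random.randrange(256), random.randrange(256)
    if implementation(a, b) != reference(a, b):
        hits += 1
print("random trials that exposed the bug:", hits)
```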
There are other types of complexity as well. "You can define complexity based on the application domain and where it occurs in the process," said Frank Schirrmeister, Vice President of Solutions and Business Development at Arteris. "You can define complexity based on the system you are going to build. Obviously, when you consider systems, you can go back to the old V-shaped diagram, which gives you a sense of complexity. Then, you can define complexity based on technology nodes and process data. In addition, there is a very traditional definition of complexity, which is addressed by raising the level of abstraction. But what happens next?"

Chiplets
The answer is chiplets, but as chiplets and other advanced packaging methods gradually become popular, designers must confront numerous issues.
"Chiplets provide a modular solution to this increasing complexity," says Khan from Cadence. "For instance, a complex SoC designed at the 'N' process node has many subsystems—computing, memory, I/O, etc. Moving to the next node (N+1) to add additional performance/features does not necessarily bring significant benefits, considering the limited scaling improvements and other factors (development time, cost, yield, etc.). If the original design is modular, then only those subsystems that benefit from process scaling need to migrate to the advanced node, while other chiplets remain at the older process node. Breaking down the design so that each subsystem matches its ideal process node addresses a key aspect of development complexity. There is an initial cost for designing the decomposed architecture, but subsequent generations reap significant benefits in terms of reduced development costs and increased SKU generation options. Leading processor companies like Intel (Ponte Vecchio) and AMD (MI300) have adopted this approach."
Customizing chiplets to achieve the ideal power consumption, performance, and area/cost is particularly important for managing costs and time to market. "New features can be added without redesigning the entire chip, allowing the design to hit the market window while maintaining the product refresh cadence, which would otherwise slow down due to the development and productization time required at advanced nodes," Khan says. "The 'nirvana' of the chiplet market envisioned by companies like Arm proposes a chiplet system architecture to standardize chiplet types and partitioning choices (within its ecosystem). SoC designers still need to customize designs for their secret sauce, which provides the differentiation in their implementations. Automation will be a key driver in reducing complexity here. In the past few years, the complexity of inter-chip communication has been largely mitigated through die-to-die standards such as UCIe. However, designers must overcome additional implementation complexity when transitioning from 2.5D-IC to 3D-IC flows. How do you partition logic across chiplets to get the best split, with direct die-to-die interconnections for stacked dies? The next frontier is shifting this problem from user-driven partitioning to automated, AI-driven design partitioning. One can envision a generation of AI processors becoming the main engine for designing the next generation of chiplet-based processors."
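As a thought experiment on what automated partitioning might optimize, the sketch below is entirely hypothetical: the subsystem names, costs, scaling benefits, and targets are invented here, not drawn from any real design or tool. It simply enumerates node assignments for a few subsystems and picks the cheapest one that still meets a performance target.

```python
# Hypothetical sketch of chiplet partitioning as a small search problem.
from itertools import product

SUBSYSTEMS = ["compute", "memory_ctrl", "io"]
NODES = {"N": {"cost": 1.0}, "N+1": {"cost": 1.8}}    # relative cost per subsystem die
SCALING_BENEFIT = {"compute": 1.35, "memory_ctrl": 1.05, "io": 1.02}  # gain if moved to N+1
PERF_TARGET = 1.30        # required overall speedup vs. an all-N baseline
INTERFACE_COST = 0.15     # assumed cost adder per die-to-die boundary

def evaluate(assignment):
    """Return (cost, performance) for one node assignment per subsystem."""
    cost = sum(NODES[node]["cost"] for node in assignment)
    # Each boundary between differently assigned subsystems adds interface cost.
    cost += INTERFACE_COST * sum(1 for a, b in zip(assignment, assignment[1:]) if a != b)
    perf = 1.0
    for name, node in zip(SUBSYSTEMS, assignment):
        if node == "N+1":
            perf *= SCALING_BENEFIT[name]
    return cost, perf

best = None
for assignment in product(NODES, repeat=len(SUBSYSTEMS)):
    cost, perf = evaluate(assignment)
    if perf >= PERF_TARGET and (best is None or cost < best[0]):
        best = (cost, perf, assignment)

cost, perf, assignment = best
print(dict(zip(SUBSYSTEMS, assignment)), f"cost={cost:.2f}", f"perf={perf:.2f}x")
```

A real partitioner would also weigh bandwidth, latency, thermal coupling, and packaging constraints, which is precisely the complexity Khan expects AI-driven tools to absorb.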
Meanwhile, chiplets introduce a new dimension of verification—verifying inter-chip communication based on the UCIe protocol, while also understanding the complexities of latency and thermal issues.
In other words, chiplets represent another evolution in design growth and expansion, says Press from Siemens. "Like many previous technologies, standards that enable more plug-and-play approaches are important. Designers should not have to wrestle with increasingly complex tradeoffs; they should adopt methods that eliminate the difficult tradeoffs. In the scan test field, packetized scan delivery can eliminate an entire layer of complexity, allowing chiplet designers to optimize the test logic and patterns for their chiplet alone. With plug-and-play interfaces and self-optimizing pattern delivery, users do not need to worry about core or chiplet embedding or I/O pins to get scan data to the chiplet. The idea is to simplify the problem with plug-and-play methods and automatic optimization."
How to Best Manage Complexity
Given the multitude of considerations and challenges involved in multi-chip designs, complexity is not easily managed. However, there are methods that can help address this issue.
Darbari from Axiomise points out that leveraging more advanced techniques, such as formal verification, with the intent of shifting verification left can have a significant impact on outcomes. "Using formal verification early in the DV process ensures that we catch errors faster, find bugs in corner cases, establish proofs of bug absence, ensure freedom from deadlocks and livelocks, and obtain coverage that identifies unreachable code. Constrained-random simulation should only be used where formal verification is not applicable."

However, there is another side to the coin. In many cases, complex problems cannot be solved at the level of the whole multi-die system. "You have to break it down into pieces," says Kabir from Synopsys. "Address the smaller issues, but make sure you are still solving the larger problem. In multi-chip designs, this is the biggest challenge. We still hear, 'This is a thermal issue. No, this is a power issue.' But yesterday you were designing the same chip. Sometimes when chips come back from the lab, the timing turns out to be off because the thermal or power effects on timing were not properly considered. The models and standard libraries did not predict these effects, which can have a significant impact. Designs are therefore always built with large margins. How can we compress those margins? That means considering multi-physics effects together with timing and physical construction."
Breaking down complex problems into manageable parts is a challenge that chip design engineers are still grappling with. "This is a new puzzle, and I see a lot of people struggling with it, and this is just one of the challenges of complexity, not even touching on the atomic level," Kabir says. "What is the design process like? Who goes first, who goes later? Which problem do you solve first? And it's not just that, how do you ensure that the problem is solved throughout the process, and all the different chips can be merged together? No company knows how to do this; we must solve it together. Everyone will provide different solutions, and this is where artificial intelligence/machine learning tools can be very helpful."
Mueth from Keysight agrees. "This is definitely a multidisciplinary challenge. Your digital designers must communicate with your RF designers, who in turn must communicate with your analog designers; a chip designer must communicate with a packaging designer; thermal analysis, vibration analysis. It's a multidisciplinary world because now you have your systems and systems of systems. You have the underlying components. It's really complex. There are four different dimensions, and then you have to look at it throughout the entire engineering lifecycle. Sometimes it's amazing what people can accomplish."
This may be an understatement. Although complexity is growing exponentially, the number of staff has not increased accordingly. "The average tenure of engineers in the United States is 4.5 years. In Silicon Valley, that number is 2.5 years," Mueth adds. "When they leave, they take all the design knowledge, tribal knowledge, company knowledge, and you're left with gaps. So, you really want to have a way to digitize your processes, lock them down, and lock down the intellectual property you've developed. You have to find a way to bridge or close the gap between staff and complexity, which includes looking for new automation processes. We've seen a lot of people working hard to develop large platforms. But we already know that large platforms cannot cover everything. They can't. There are too many changes, and too many applications. The solution is a combination of application-specific workflows, peripheral engineering management, and peripheral processes, because engineers do not spend 100% of their time on simulation, or even on design. Most of their time is spent on peripheral processes, which sadly have not been automated."