

# A Holistic Evaluation of Buried Power Rails and Back-Side Power for Sub-5 nm Technology Nodes

S. S. Teja Nibhanupudi<sup>®</sup>, *Graduate Student Member, IEEE*, Divya Prasad<sup>®</sup>, Shidhartha Das<sup>®</sup>, *Member, IEEE*, Odysseas Zografos, Alex Robinson, Anshul Gupta<sup>®</sup>, *Member, IEEE*, Alessio Spessot<sup>®</sup>, *Member, IEEE*, Peter Debacker<sup>®</sup>, Diederik Verkest, Julien Ryckaert, Geert Hellings<sup>®</sup>, *Senior Member, IEEE*, James Myers, *Member, IEEE*, Brian Cline, *Member, IEEE*, and Jaydeep P. Kulkarni<sup>®</sup>, *Senior Member, IEEE* 

Abstract—Buried power rail (BPR) and back-side power delivery grid have been proposed as solutions to scaling challenges that arise beyond the 5-nm technology node, mainly to lower IR drop and further shrink area. This article demonstrates a holistic evaluation of this technology and its variants at the microprocessor level. This is carried out by taking an Arm Cortex-A53 design through the standard VLSI physical design implementation flow on imec's iN6 node, equivalent to the industry 3-nm technology node, which features the buried power technology. The power, performance, area, on-chip IR drop, and off-chip voltage droop metrics are benchmarked, and implications on power gating are explored. An extensive Design-Technology-Co-Optimization (DTCO) study of the back-side power grid is presented to enhance the decoupling capacitance by sweeping associated technology parameters, showcasing further optimization opportunities in manufacturing. The conclusions of this work highlight that the front-side (FS) power delivery network (PDN) with buried rails achieves a 25% lower on-chip IR drop and 17% lower off-chip voltage droop (power supply noise), resulting in 21% lower guard band voltage. On the other hand, the back-side power grid with BPRs achieves 85% lower on-chip IR drop and 30% lower off-chip voltage droop, resulting in 60% lower guard band voltage. In addition, the impact of BPRs and back-side power grids on power-gated designs is evaluated.

*Index Terms*—Buried power rail (BPR), Design-Technology-Co-Optimization (DTCO), IR drop, off-chip voltage droop.

Manuscript received 27 April 2022; revised 16 June 2022; accepted 22 June 2022. Date of publication 7 July 2022; date of current version 25 July 2022. The review of this article was arranged by Editor P. Thadesar. (*Corresponding author: S. S. Teja Nibhanupudi.*)

S. S. Teja Nibhanupudi and Jaydeep P. Kulkarni are with the Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712 USA (e-mail: subrahmanya\_teja@utexas.edu; jaydeep@austin.utexas.edu).

Divya Prasad, Shidhartha Das, Alex Robinson, James Myers, and Brian Cline are with ARM Research, Austin, TX 78735 USA.

Odysseas Zografos, Anshul Gupta, Alessio Spessot, Peter Debacker, Diederik Verkest, Julien Ryckaert, and Geert Hellings are with imec, 3001 Leuven, Belgium.

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TED.2022.3186657.

Digital Object Identifier 10.1109/TED.2022.3186657

# I. INTRODUCTION

TRADITIONAL dimension scaling in semiconductor technology nodes has been achieved by scaling of metal pitch (MP) and contacted poly pitch (CPP). In the advanced CMOS technology nodes (sub-10 nm), the metal half-pitch has scaled to very narrow dimensions (sub-20 nm) [1]. At these metal line widths, the resistivity of the metal increases significantly due to size effects such as surface and grain-boundary scattering [2]. The increased resistivity aggravates the IR drop problem and has become a significant bottleneck in high-performance designs at sub-5-nm CMOS technology nodes. To ensure a lower IR drop, designers are often forced to trade off signal routing resources to build finer, more robust power grids. Buried power rails (BPRs) have been recently proposed as a technology booster for sub-5-nm CMOS nodes to enable standard cell area scaling and to lower the IR drop [3], [4]. In this technology, the power rails (e.g., VDD, VSS) are buried within the silicon substrate and tapped through special vias to connect to the power grid (front- or back-side). The BPRs with a high aspect ratio minimize the IR drop by allowing a lower-resistance path for power delivery to the transistors [5].

In addition to on-chip IR drop, parasitic effects introduced by the other components of the power delivery network (PDN) such as PCB, package, C4 bumps, and so on induce voltage droop upon transient current spike events [6]. This off-chip voltage droop can be lowered by increasing the on-chip decoupling capacitance. In the buried rail technology, since the metal lines are routed beneath the substrate where no signals are routed, low-resistance high-aspect-ratio power rails can be realized. The high aspect ratio helps increase the decoupling capacitance between supply and ground, thus lowering the off-chip voltage droop associated with spontaneous current spike events.

In this article, we present a holistic evaluation of the BPRs and back-side power grid by considering three PDN configurations: conventional PDN named front side (FS), FS power delivery with BPRs (FSBPRs), and back-side power delivery with BPRs (BSBPRs). These configurations are evaluated to quantify the impact on Performance–Power–Area (PPA), on-chip IR drop, off-chip voltage droop, and power gating using a representative 64-bit CPU such as the Arm<sup>1</sup> Cortex<sup>1</sup>-A53 CPU. Previous studies in this domain focused either on physical design [4] or PDN modeling [7].

0018-9383 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

Fig. 1. (a) FS PDN schematic. (b) FSBPR schematic. (c) BSBPR schematic. (d) Table showing the resistance of critical metal layers.

The main contributions of this article are as follows.

- Comprehensive analysis elaborating on the critical tradeoffs between microprocessor performance and on-chip IR drop for different PDN configurations presented in our previous work [5].
- Holistic Design-Technology-Co-Optimization (DTCO) study of the backside PDN through sweeping technology parameters to optimize both on-chip IR drop and off-chip voltage droop.
- Evaluating potential challenges involved in the integration of power gates with BPRs and backside power grid technology. Extensive analysis highlighting the impact of BPRs on local power grid resistance.

The rest of the article is organized as follows. Section II provides an introduction to the concept of BPRs and backside power delivery. Various power delivery configurations are described in Section III. Section IV presents the results for CPU design, on-chip IR drop, off-chip voltage droop, and gated power grid design. Section V presents a summary and conclusion.

#### II. TECHNOLOGY DETAILS

# A. Buried Power Rails

BPRs can be implemented in the FinFET technology at the cell boundaries [4]. After fin formation, the BPR process modules begin by etching a cavity in the Shallow-Trench-Isolation (STI) extending into bulk silicon. This is followed by atomic-layer deposition (ALD) of a thin dielectric barrier layer to isolate the buried rail from the Si bulk electrically [8]. The cavity is then filled with metals that can withstand front-end of line (FEOL) thermal budgets such as ruthenium (Ru) or tungsten (W) and capped for further FEOL integration. Experimentally demonstrated BPRs exhibit resistances between 30 and 50  $\Omega/\mu$ m [4].

## B. Backside Power Delivery

Backside power delivery is a unique 3-D-integration technique wherein the entire PDN is integrated on the backside of the chip [9]. Fine-pitch micro-through-silicon-vias ($\mu$TSVs) connect the BPRs to the PDN on the backside. After the processing on the FS, the wafer is thinned to 500 nm, and $\mu$TSVs are etched from the backside, using the BPR metal as an etch stop layer [10]. This is followed by the deposition of the backside metal layers to distribute the power to C4 bumps. Recent studies have successfully demonstrated functional transistors powered through a backside power grid [11].

<sup>1</sup>Registered trademark.

# **III. PDN CONFIGURATIONS**

In this article, we explore three different power delivery configurations; namely:

- 1) FS, where the signal and power nets are routed on the FS of the chip [see Fig. 1(a)];
- 2) FSBPR, which is similar to FS except that the power supply tracks for the standard cells are buried within the substrate [see Fig. 1(b)]; and
- 3) BSBPR, where the power nets are routed on the backside of the chip but the signal nets are routed on the FS of the chip [see Fig. 1(c)].

# A. BPR Standard Cell Design and Metal Interconnect

To evaluate the system-level impact of BPRs, standard cells in imec's iN6 technology node (equivalent to IRDS 3 nm [1]) have been designed with and without incorporating the BPRs. The FS configuration is implemented through the conventional design flow using a standard cell library without BPR. The FSBPR/BSBPR configurations use the standard cell library implementing BPR technology along with a modified standard-cell layout, technology Library Exchange Format (LEF), and interconnect RC files which capture the effect of BPR. The standard cells in this technology node are six-track high with four tracks reserved for routing and two tracks for power rails. Although BPR can enable standard cell height scaling to five tracks, this work considers only six-track high cells. This enables a fair and effective evaluation of just the buried rail as a technology booster for advanced nodes from an IR drop perspective. The iN6 technology has a 14-metal layer interconnect stack: M1-M13 and MINT (intermediate metal layer in the iN6 technology node for local routing), with interconnect pitches representative of the $\sim 3$ nm technology node and a single re-distribution layer (RDL) to connect the C4 bumps. The FSBPR and BSBPR configurations have an additional buried metal layer (MBUR). Fig. 1(d) presents the resistance of some of the fine-pitch metal interconnect layers.

 TABLE I

 PDN CONFIGURATION SPECIFICATIONS (CPP = 45 nm)

| PDN configuration | PDN design  | Layers             | VDD-VSS pitch          |
|-------------------|-------------|--------------------|------------------------|
| FS (FSBPR)        | PDN1 (PDN4) | M3-M4<br>M7-M13    | 32CPP<br>32CPP         |
| FS (FSBPR)        | PDN2 (PDN5) | M3-M4<br>M7-M13    | 16CPP<br>16CPP         |
| FS (FSBPR)        | PDN3 (PDN6) | M3<br>M4<br>M7-M13 | 16CPP<br>8CPP<br>16CPP |
| BSBPR             | PDN7        | μTSV<br>BM1<br>BM2 | 2 μm<br>2 μm<br>2 μm   |
| BSBPR             | PDN8        | μTSV<br>BM1<br>BM2 | 750 nm<br>750 nm<br>2 μm |
| BSBPR             | PDN9        | μTSV<br>BM1<br>BM2 | 500 nm<br>500 nm<br>2 μm |

# B. FS/Conventional PDN

In the FS configuration, the power supply tracks reside on the MINT metal layer (the first back-end of line (BEOL) metal layer, below M1). The small MINT MP (~22 nm) results in highly resistive power rails with a resistance of about 900 $\Omega/\mu$m [12], resulting in IR drop hot spots in the CPU design, as will be discussed in subsequent sections. To study the impact of PDN design on performance/IR drop, three different PDN designs are considered with increasing power grid density (PDN1 being the sparsest and PDN3 the densest). Table I presents the specifications of each of the PDN designs for the FS configuration. Each of the three PDN designs dedicates four metal layers (M1, M2, M5, and M6) to local signal routing. The power grid on these four metal layers is restricted to via structures only (no metal stripes on these layers).

# C. FS PDN With Buried Rails

The power rails of standard cells use the MBUR layer (buried metal layer) in the FSBPR configuration. The access to the BPRs is restricted to specific "tapping" points where the dielectric is etched to create special vias (buried via VBUR). These special vias are embedded within dedicated tap cells carefully placed across the design. The tap cells consume additional area and obstruct the placement of the standard cells in the design. Therefore, the placement of tap cells is a crucial design constraint in FSBPR designs. Similar to FS configuration, three PDN designs are considered (PDN4–PDN6) for FSBPR configuration, as presented in Table I.

# D. Backside PDN With Buried Power Rails

The BSBPR configuration eliminates the overhead due to tap cells needed in the FSBPR configuration. It reduces the wiring congestion on the FS since all the metal resources can be dedicated to signal/clock routing. However, the signal I/Os would have to pass through the backside of the chip along with power rails to finally connect to C4 bumps [13]. The additional parasitic capacitance induced by BPR and $\mu$TSVs can be compensated using technology innovations [13]. In recent years, this configuration has gained attention (even from semiconductor companies) owing to the tremendous potential of realizing low-power chips [14]. In this study, the PDN on the backside is limited to three metal layers (MBUR: buried metal, BM1: backside metal-1, BM2: backside metal-2). Additional layers can be added on the backside if required by the design specifications. Since the backside metal stack is dedicated to power-ground routing, metal interconnects can have large track widths (>250 nm), which can significantly lower the resistance of the grid. The IR drop is reduced as the $\mu$TSV pitch reduces, but at the cost of increased process complexity. To study the impact of $\mu$TSV pitch on the IR drop, three PDN designs are considered in this study with decreasing $\mu$TSV pitch (PDN7 having the largest and PDN9 having the smallest), as presented in Table I.

#### IV. RESULTS

#### A. CPU Performance and IR Drop

To evaluate the system-level impact of the different PDN configurations, physical design implementation of a representative 64-bit high-efficiency CPU such as Cortex-A53 is realized using imec's iN6 library. The designs have a single power domain (VDD = 0.7 V) and are compared under iso-area conditions (die area: 150 $\mu$m $\times$ 150 $\mu$m). Fig. 2(a) shows the power versus performance of the chip for the three PDN configurations. The implementations that utilize the BSBPR configuration consume lower power than the FS/FSBPR configurations across the entire range of design frequencies due to the significantly reduced routing congestion. In comparison, the FSBPR configuration consumes higher power than the FS configuration due to the overhead created by tap cells. Overall, at an iso-frequency of 1.4 (normalized units), the FSBPR consumes 10% higher power, and BSBPR consumes 8% lower power than the FS configuration.

The placed and routed physical design of the Cortex-A53 CPU is used for vector-less dynamic IR drop analysis in the Cadence Voltus environment [15]. Fig. 2(b) shows the layer-based distribution of IR drop for each of the nine PDN designs (see Table I). In the FS configuration, the highly resistive local metal layers (MINT-M3) contribute about 60% of the IR drop in PDN1 with 32CPP pitch. This significant IR drop in the local metal layers is reduced to half in PDN2 with 16CPP pitch. The IR drop is further reduced by specifically reducing only the M4 pitch to 8CPP in PDN3. The denser M4 grid reduces the IR drop on the highly resistive M3 metal layer. In the FSBPR configuration, the less resistive MBUR layer (30 $\Omega/\mu$m) replaces the highly resistive MINT layer (900 $\Omega/\mu$m). This reduces the overall IR drop in the FSBPR configuration by $\sim 30\%$ across all three PDN designs (PDN4–PDN6). The BSBPR configuration IR drop is strongly dependent on the $\mu$TSV pitch. Although the MBUR has low resistance (30 $\Omega/\mu$m), increasing the $\mu$TSV pitch significantly increases the voltage drop on the MBUR layer compared to the less resistive backside metal layers (BM1, BM2).
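The dependence of the MBUR voltage drop on tap spacing can be illustrated with a first-order distributed-rail estimate. The sketch below is not the Voltus analysis used in this work; it assumes a rail segment fed at both ends with a uniform current draw per unit length, for which the midpoint drop is r·j·p²/8. The load density `LOAD_A_PER_UM` is a hypothetical value, and treating the Table I $\mu$TSV pitch as the tap spacing along a single rail is also an assumption. Only the 30 and 900 $\Omega/\mu$m resistances and the three pitches come from the text.

```python
def midpoint_ir_drop(r_ohm_per_um, tap_pitch_um, load_a_per_um):
    """Worst-case drop (V) at the midpoint of a rail segment fed from both ends.

    With a uniform load j (A/um), the rail current falls linearly to zero
    at the midpoint, so integrating r * I(x) over half the pitch gives
    r * j * p**2 / 8.
    """
    return r_ohm_per_um * load_a_per_um * tap_pitch_um ** 2 / 8.0


R_MBUR_OHM_PER_UM = 30.0   # MBUR resistance quoted in the text
R_MINT_OHM_PER_UM = 900.0  # MINT resistance quoted in the text
LOAD_A_PER_UM = 1e-4       # hypothetical load density, for illustration only

# uTSV pitches from Table I, assumed here to equal the tap spacing on a rail
TAP_PITCH_UM = {"PDN7": 2.0, "PDN8": 0.75, "PDN9": 0.5}

mbur_drop_v = {
    name: midpoint_ir_drop(R_MBUR_OHM_PER_UM, pitch, LOAD_A_PER_UM)
    for name, pitch in TAP_PITCH_UM.items()
}
# Normalizing to PDN7 removes the dependence on the assumed load density.
relative_drop = {name: v / mbur_drop_v["PDN7"] for name, v in mbur_drop_v.items()}

for name, pitch in TAP_PITCH_UM.items():
    print(f"{name}: pitch = {pitch} um, MBUR drop relative to PDN7 = {relative_drop[name]:.4f}")
```

Under these assumptions the rail drop scales with the square of the tap pitch and linearly with the rail resistance, which is consistent with the qualitative trend described above; the absolute numbers depend entirely on the assumed load.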

Fig. 2. (a) Normalized power versus performance [5]. (b) Layer-based IR drop distributions for PDN1-9. (c) Frequency and IR drop variation for PDN1-3. (d) Energy versus dynamic IR drop for all the PDN configurations [5].

Fig. 2(c) shows the variation in maximum achieved frequency and IR drop for the three PDN designs in the FS configuration. As expected, both the IR drop and the maximum achieved frequency reduce as the power grid density increases. The performance degrades by 30%, and the IR drop improves by 70%, moving from PDN1 to PDN3. A similar trend is observed for the PDN designs when implemented using the FSBPR configuration. Fig. 2(d) summarizes the tradeoff between power/performance and IR drop for all the PDN designs considered in this study. The power divided by performance metric (mW/GHz) estimates the energy loss incurred to lower the IR drop in different PDN configurations. Despite the increased energy loss incurred while moving from PDN1 to PDN3, the FS configuration does not meet the IR drop target. Although the FSBPR configuration meets the IR drop target, it experiences higher energy loss due to the overhead of the tap cells. In contrast, the BSBPR configuration completely decouples this tradeoff and does not incur any energy loss to lower the IR drop.
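The power divided by performance metric has a direct physical reading: 1 mW/GHz equals 1 pJ per clock cycle. The short sketch below makes this conversion explicit. The absolute operating point (100 mW at 2 GHz) is hypothetical; the relative power values are the iso-frequency figures reported in Section IV-A.

```python
def energy_per_cycle_pj(power_mw, frequency_ghz):
    """Energy per clock cycle in pJ; 1 mW / 1 GHz = 1e-3 W / 1e9 Hz = 1 pJ."""
    if frequency_ghz <= 0:
        raise ValueError("frequency must be positive")
    return power_mw / frequency_ghz


# Hypothetical absolute operating point, for illustration only.
example_energy_pj = energy_per_cycle_pj(power_mw=100.0, frequency_ghz=2.0)

# Relative power at iso-frequency as reported in the text (FS = 1.0).
RELATIVE_POWER = {"FS": 1.00, "FSBPR": 1.10, "BSBPR": 0.92}

# At a fixed frequency, relative energy per cycle equals relative power.
relative_energy = {
    name: energy_per_cycle_pj(power, 1.0) for name, power in RELATIVE_POWER.items()
}

print(f"example: {example_energy_pj} pJ per cycle")
for name, value in relative_energy.items():
    print(f"{name}: {value:.2f}x energy per cycle at iso-frequency")
```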

## B. Off-Chip Voltage Droop Analysis

The IR drop analysis presented in Section IV-A is limited to on-chip PDN. However, the chip-package-PCB parasitics induce power supply noise (off-chip voltage droop) during transient current spike events. The power supply noise is estimated by simulating the equivalent circuit shown in Fig. 3(a) [6]. This power supply noise can be lowered by increasing the on-chip decoupling capacitance [6], [16].
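The effect of on-chip decoupling capacitance on supply noise can be illustrated with a lumped model that is much simpler than the circuit of Fig. 3(a). The sketch below assumes a single package branch (series R and L) in parallel with a single on-die decoupling branch (C with series resistance), as seen from the switching current source. All component values are hypothetical; only the 59% capacitance increase is taken from the comparison reported later in this section.

```python
import math


def pdn_impedance(freq_hz, l_pkg_h, r_pkg_ohm, c_die_f, r_die_ohm):
    """Impedance seen by the die: (R_pkg + jwL) in parallel with (R_die + 1/(jwC))."""
    w = 2.0 * math.pi * freq_hz
    z_supply = complex(r_pkg_ohm, w * l_pkg_h)
    z_decap = complex(r_die_ohm, -1.0 / (w * c_die_f))
    return z_supply * z_decap / (z_supply + z_decap)


def peak_impedance(c_die_f, l_pkg_h=50e-12, r_pkg_ohm=1e-3, r_die_ohm=2e-3,
                   n_points=2000):
    """Return (frequency_hz, |Z|_ohm) at the impedance peak of a log sweep."""
    best_freq, best_mag = 0.0, 0.0
    for i in range(n_points):
        # Logarithmic sweep from 1 MHz to 10 GHz.
        freq = 10.0 ** (6.0 + 4.0 * i / (n_points - 1))
        mag = abs(pdn_impedance(freq, l_pkg_h, r_pkg_ohm, c_die_f, r_die_ohm))
        if mag > best_mag:
            best_freq, best_mag = freq, mag
    return best_freq, best_mag


C_BASE_F = 20e-9              # hypothetical baseline on-die decoupling capacitance
C_MORE_F = C_BASE_F * 1.59    # 59% more capacitance, the ratio quoted in the text

f_base, z_base = peak_impedance(C_BASE_F)
f_more, z_more = peak_impedance(C_MORE_F)

print(f"baseline : peak {z_base:.3f} ohm at {f_base / 1e6:.0f} MHz")
print(f"more cap : peak {z_more:.3f} ohm at {f_more / 1e6:.0f} MHz")
```

In this simplified model, adding decoupling capacitance moves the resonance to a lower frequency and lowers the peak impedance, which reduces the first droop for a given current step. The magnitudes are only indicative because the real network contains additional package and PCB stages.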

The BPRs increase the decoupling capacitance as they are enclosed in a relatively high permittivity material (silicon: $k \sim 11.7$) compared to the inter-layer dielectric ($k \sim 1.8$). The buried rail thickness is also higher than that of the local metal layers (M1-M6), which enhances the sidewall decoupling capacitance. The resistance/capacitance of the buried rail and the rest of the PDN are obtained using the Synopsys Raphael RC extraction engine. The increased decoupling capacitance due to the buried rail lowers the power supply noise (magnitude of first voltage droop) by 17% in FSBPR compared to the FS configuration. The simulations have been conducted considering an eight-core configuration with one core switching and the other cores providing useful decoupling capacitance. In the BSBPR configuration, the power grid on the backside can be optimized independently to increase the decoupling capacitance and lower the power supply noise. The decoupling capacitance from the backside power grid can be increased by:

- 1) increasing backside dielectric relative permittivity;
- 2) increasing backside metal thickness;
- 3) reducing via height connecting backside metal layers; and
- 4) reducing $\mu$TSV pitch.

These modifications are not possible in the FS/FSBPR configuration due to the risk of increasing signal-to-signal noise coupling. Therefore, the BSBPR configuration provides this unique opportunity to increase the decoupling capacitance and reduce the power supply noise (magnitude of first voltage droop) without impacting the signal integrity. Fig. 3(c) shows the variation of the power supply noise with the backside dielectric relative permittivity for three different BPR aspect ratios. CMOS-compatible high-k oxides such as Al<sub>2</sub>O<sub>3</sub> ($k \sim 9$) [17] or HfO<sub>2</sub> ($k \sim 23$) [18] can replace the usually preferred low-k inter-layer dielectrics ($k \sim 1.8$). The power supply noise is reduced by 12% by increasing the dielectric relative permittivity from 1.8 to about 25. Increasing the backside metal thickness increases the sidewall capacitance and reduces the power supply noise, as shown in Fig. 3(d). Similarly, reducing the via height between the backside metal layers increases the capacitance between the BM1 and BM2 layers, as shown in Fig. 3(e). Furthermore, reducing the $\mu$TSV pitch increases the sidewall capacitance and helps reduce the power supply noise. As shown in Fig. 3(f), reducing the $\mu$TSV pitch from 1 to 0.25 $\mu$m reduces the power supply noise by 15%. Overall, the optimized BSBPR configuration (dielectric permittivity = 25, thickness/width ratio = 3, via height = 140 nm, $\mu$TSV pitch = 250 nm) has 59% and 21% higher decoupling capacitance compared to FS and FSBPR configurations, respectively. Fig. 4(a) shows the decoupling capacitance contribution from various components of the on-chip PDN. Fig. 4(b) shows the variation in power supply noise with the di/dt for the transient current spike event. The optimized BSBPR configuration has lower power supply noise compared to FS and FSBPR.
In the frequency domain, the peak impedance of BSBPR shifts to lower frequencies, and the magnitude of the peak is reduced by 34% as seen from Fig. 4(c). The corresponding time-domain voltage transient simulated with step current input is shown in Fig. 4(d). The waveform shows the worst-case instance experiencing dynamic IR drop (due to power grid resistance) superimposed by the power supply noise (due to off-chip voltage droop) for FS, FSBPR, and BSBPR configurations.
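The first three backside knobs discussed in this section can be related to the ideal parallel-plate formula C = ε0·k·A/d. The sketch below is a first-order illustration only: it ignores fringing fields, and every geometric dimension in it is hypothetical. The relative permittivities (1.8 and 25) and the 140-nm via height are taken from the text. The fourth knob, $\mu$TSV pitch, is not modeled here.

```python
EPS0_F_PER_M = 8.8541878128e-12  # vacuum permittivity


def plate_capacitance_f(rel_permittivity, area_um2, gap_um):
    """Ideal parallel-plate capacitance in farads (fringing fields ignored)."""
    area_m2 = area_um2 * 1e-12
    gap_m = gap_um * 1e-6
    return EPS0_F_PER_M * rel_permittivity * area_m2 / gap_m


# Hypothetical geometry, for illustration only.
LINE_LENGTH_UM = 100.0     # run length of adjacent VDD/VSS backside lines
LINE_SPACING_UM = 0.25     # lateral gap between the two lines
LINE_THICKNESS_UM = 0.25   # backside metal thickness
OVERLAP_AREA_UM2 = 10.0    # BM1/BM2 overlap area

# Knob 1: dielectric permittivity (sidewall capacitance scales linearly with k).
c_low_k = plate_capacitance_f(1.8, LINE_THICKNESS_UM * LINE_LENGTH_UM, LINE_SPACING_UM)
c_high_k = plate_capacitance_f(25.0, LINE_THICKNESS_UM * LINE_LENGTH_UM, LINE_SPACING_UM)

# Knob 2: metal thickness (sidewall area = thickness x length).
c_thick = plate_capacitance_f(1.8, 2 * LINE_THICKNESS_UM * LINE_LENGTH_UM, LINE_SPACING_UM)

# Knob 3: via height (the gap between the BM1 and BM2 planes).
c_tall_via = plate_capacitance_f(1.8, OVERLAP_AREA_UM2, 0.28)
c_short_via = plate_capacitance_f(1.8, OVERLAP_AREA_UM2, 0.14)  # 140 nm from the text

print(f"permittivity 1.8 -> 25 : x{c_high_k / c_low_k:.2f}")
print(f"thickness doubled      : x{c_thick / c_low_k:.2f}")
print(f"via height halved      : x{c_short_via / c_tall_via:.2f}")
```

These ratios describe only the backside grid capacitance in isolation. The reported noise reductions are smaller than the ratios because the backside grid is one of several contributors to the total decoupling capacitance shown in Fig. 4(a).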

### C. Power Gate Implementation

The on-chip IR drop and off-chip voltage droop analyses presented in Sections IV-A and IV-B do not account for power gates in the design. However, most modern SoC modules implement power gating to minimize the leakage power consumed by inactive cores [19]. In the power gating technique, the local power grid (or power domain) is connected to the global power grid (main power supply) through switchable transistors called power gates. This section presents the impact of BPRs on the designs employing the power gating technique. Fig. 5(a) and (b) illustrate the potential power gate integration methodology for FSBPR and BSBPR designs, respectively. The standard cells and BEOL metal interconnects are omitted in the diagram for clarity. Also, the global VDD (main power supply), local VDD (power domain), and VSS interconnects are color coded to match the power gate schematic [see Fig. 5(a) and (b) (inset)]. In the FSBPR configuration, the power gate is connected to the local VDD on the BPR layer (instead of the MINT layer for the FS configuration). To facilitate this connection, the VBPR needs to be accommodated within the standard cell of the power gate. In the BSBPR configuration, low-resistance backside metals are employed to implement both local and global power grids to minimize IR drop. The global VDD connects to the power gate through backside metals and $\mu$TSVs. Since the $\mu$TSV needs a BPR to land on, the VSS BPR can be split to create an isolated global VDD BPR island, as shown in Fig. 5(b). The drain of the power gate then connects to the local VDD BPR, which distributes power to the standard cells through the backside metals.

Fig. 3. (a) Equivalent circuit model for chip-package-PCB system. (b) Illustration of possible optimizations in BSBPR. Power supply noise variation with (c) backside dielectric relative permittivity, (d) backside metal thickness-to-width ratio, (e) BM1-BM2 via height, and (f) $\mu$TSV pitch.



Fig. 4. (a) Decoupling capacitance comparison highlighting the contribution of different components. (b) Power supply noise variation with rate of change of current for step input. (c) Impedance profile of the die-package-PCB system. (d) Voltage transient upon transient current spike event.

Typically in power gating implementations, the global power grid is designed using higher (BEOL) metal layers (>M6 or M7), and the local power grid is designed using lower metal layers. The highly resistive local power grid contributes significantly to the overall IR drop of the design. Therefore, in this section, the impact of buried rails on power gating designs is studied by considering the local power grid resistance. The local power grid resistance (for FS, FSBPR, and BSBPR designs) is analyzed using the Cadence Voltus simulation environment. The power gates are uniformly distributed across the chip area [see Fig. 5(c)] and the effective resistance from the power gates to every standard cell is determined. Fig. 5(d)-(f) shows the effective resistance heatmap for tight pitch PDN designs of each configuration: FS-PDN3, FSBPR-PDN6, and BSBPR-PDN9. The number of power gates required in a design is dependent on several factors such as power-gate resistance, the operational frequency of the chip, input vectors, and so on. Therefore, we analyze the variation of peak resistance (at the worst-case located standard cell) with the number of power gates in the design, as shown in Fig. 5(g). Here, an important trend can be observed: the resistance of the power grid decreases as power gates are added and saturates at a certain value for each configuration. This minimum limit is determined by the resistance of the lowest metal layer from the nearest via to the worst-case standard cell (located halfway between two VDD lines). Since the resistance of the MINT layer is high (MINT resistance is $\sim 30 \times$ MBUR resistance [see Fig. 1(d)]), the FS configuration local power grid resistance is $4.5 \times$ higher than that of the FSBPR and $40 \times$ higher than that of the BSBPR configuration.

In the FS/FSBPR configurations, the local power grid uses up to six lower metal layers where most of the signals are also routed. If the local power grid resistance can meet the desired target resistance with fewer metal layers, additional metal layers can be exclusively allocated for signal routing and global power grid routing. Fig. 5(h) shows the variation in local power grid resistance with the number of metal layers employed. The BSBPR configuration is shown with a single point since the local power grid is designed using the two available backside metal layers. In the FS/FSBPR configurations, the resistance reduces as the higher metal layers (having lower resistance) are added to the local power grid. As expected, the FSBPR local power grid has lower resistance than the FS local power grid. To meet the desired resistance target (derived from activity in the cores), the FSBPR power grid can employ fewer metal layers compared to an FS configuration. Overall, the FSBPR local power grids achieve $\sim 4.5 \times$ lower resistance compared to FS configuration local power grids owing to the enhanced current carrying capability of the BPRs. Since very low resistive backside metals are used to implement the local power grid in BSBPR, the tight pitch PDN9 has $\sim 40 \times$ lower resistance compared to the FS configuration. Overall, the low-resistance BPR and backside metal layers [see Fig. 1(d)] can help alleviate the IR drop problem in power-gated designs.

Fig. 5. Schematic showing (a) FSBPR power gate implementation and (b) BSBPR power gate implementation. (c) Power grid map showing uniformly distributed power gate. Effective resistance heatmap of (d) FS, (e) FSBPR, and (f) BSBPR PDN designs. (g) Local power grid resistance variation with the number of power gates in the design. (h) Local power grid resistance variation with number of metal layers in the power grid.

TABLE II

COMPARISON OF IMPORTANT DESIGN METRICS

| PDN   | Area | Frequency | Power | IR drop | Power supply noise | Local power grid resistance |
|-------|------|-----------|-------|---------|--------------------|-----------------------------|
| FS    | 1x   | 1x        | 1x    | 1x      | 1x                 | 1x                          |
| FSBPR | 1x   | 1x        | 1.1x  | 0.75x   | 0.85x              | 0.25x                       |
| BSBPR | 1x   | 1x        | 0.92x | 0.15x   | 0.7x               | 0.02x                       |
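The IR drop and power supply noise columns of Table II can be combined into a rough guard band estimate. The sketch below assumes that the guard band is the sum of the on-chip IR drop and the off-chip droop, and that the two contribute equally in the FS baseline. Both are assumptions made here for illustration; the article does not state how the guard band is composed. The parameter `ir_share` is hypothetical and can be varied.

```python
# (relative IR drop, relative power supply noise) from Table II, FS = 1.0
TABLE_II = {
    "FS": (1.00, 1.00),
    "FSBPR": (0.75, 0.85),
    "BSBPR": (0.15, 0.70),
}


def relative_guard_band(rel_ir_drop, rel_noise, ir_share=0.5):
    """Guard band relative to FS, assuming guard band = IR drop + droop.

    ir_share is the assumed fraction of the FS guard band caused by IR drop
    (hypothetical; 0.5 means equal contributions).
    """
    if not 0.0 <= ir_share <= 1.0:
        raise ValueError("ir_share must be between 0 and 1")
    return ir_share * rel_ir_drop + (1.0 - ir_share) * rel_noise


guard_band = {
    name: relative_guard_band(ir, noise) for name, (ir, noise) in TABLE_II.items()
}

for name, value in guard_band.items():
    print(f"{name}: guard band = {value:.3f}x of FS ({(1 - value) * 100:.1f}% lower)")
```

With an equal split, this estimate gives roughly 20% and 57.5% lower guard band for FSBPR and BSBPR, which is close to, but not identical to, the 21% and 60% reported in the abstract.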

## V. CONCLUSION

A thorough PDN design study considering different possible power delivery configurations using BPR technology is presented. The system-level impact has been evaluated through the physical design implementation of a 64-bit high-efficiency CPU such as Cortex-A53 at the sub-5-nm node. The power, performance, and IR drop metrics have been presented for the FS, FSBPR, and BSBPR configurations. The FSBPR and BSBPR configurations can lower the IR drop by 25% and 85%, respectively, compared to the FS configuration, thereby comfortably meeting the 10% IR drop target. Furthermore, a unique method to enhance the decoupling capacitance of the BSBPR configuration is presented, resulting in 30% lower power supply noise than the FS configuration. Finally, the impact of BPR on power gating implementation is presented. The local power grids of the FSBPR and BSBPR configurations show $\sim 4.5 \times$ and $\sim 40 \times$ lower resistance, respectively, compared to the FS configuration.

#### ACKNOWLEDGMENT

The authors would like to thank Dr. Saurabh Sinha, Rogier Baert, Bilal Chehab, and Satadru Sarkar for helpful suggestions and contributions to this work.

# REFERENCES

- [1] *International Roadmap for Devices and Systems (IRDS)*, IEEE, Piscataway, NJ, USA, 2020.
- [2] H. M. van der Veen *et al.*, "Damascene benchmark of Ru, Co and Cu in scaled dimensions," in *Proc. IEEE Int. Interconnect Technol. Conf. (IITC)*, Jun. 2018, pp. 172–174.
- [3] A. Gupta *et al.*, "High-aspect-ratio ruthenium lines for buried power rail," in *Proc. IEEE Int. Interconnect Technol. Conf. (IITC)*, Jun. 2018, pp. 4–6.
- [4] J. Ryckaert *et al.*, "Extending the roadmap beyond 3 nm through system scaling boosters: A case study on buried power rail and backside power delivery," in *Proc. Electron Devices Technol. Manuf. Conf. (EDTM)*, Mar. 2019, pp. 50–52.
- [5] D. Prasad *et al.*, "Buried power rails and back-side power grids: Arm CPU power delivery network design beyond 5 nm," in *IEDM Tech. Dig.*, Dec. 2019, pp. 1–19.

- [6] S. Pant, "Design and analysis of power distribution networks in VLSI circuits," Ph.D. thesis, Univ. Michigan Ann Arbor, Ann Arbor, MI, USA, 2008.
- [7] M. O. Hossen, B. Chava, G. V. D. Plas, E. Beyne, and M. S. Bakir, "Power delivery network (PDN) modeling for backside-PDN configurations with buried power rails and μTSVs," *IEEE Trans. Electron Devices*, vol. 67, no. 1, pp. 11–17, Dec. 2019.
- [8] A. Gupta *et al.*, "Buried power rail integration with FinFETs for ultimate CMOS scaling," *IEEE Trans. Electron Devices*, vol. 67, no. 12, pp. 5349–5354, Dec. 2020.
- [9] B. Chava *et al.*, "Backside power delivery as a scaling knob for future systems," *Proc. SPIE*, vol. 10962, Mar. 2019, Art. no. 1096205.
- [10] A. Jourdain *et al.*, "Extreme wafer thinning and nano-TSV processing for 3D heterogeneous integration," in *Proc. IEEE 70th Electron. Compon. Technol. Conf. (ECTC)*, Jun. 2020, pp. 42–48.
- [11] A. Veloso *et al.*, "Enabling logic with backside connectivity via n-TSVs and its potential as a scaling booster," in *Proc. Symp. VLSI Technol.*, Jun. 2021, pp. 1–2.
- [12] B. Chava *et al.*, "DTCO exploration for efficient standard cell power rails," *Proc. SPIE*, vol. 10588, Mar. 2018, Art. no. 105880B.

- [13] W.-C. Chen *et al.*, "External I/O interfaces in sub-5 nm GAA NS technology and STCO scaling options," in *Proc. Symp. VLSI Technol.*, Jun. 2021, pp. 1–2.
- [14] IEEE Spectrum. Accessed: Aug. 26, 2021. [Online]. Available: https://spectrum.ieee.org/next-gen-chips-will-be-powered-from-below/
- [15] Cadence Voltus IC Power Integrity Solution User Guide, Cadence Des. Syst., San Jose, CA, USA.
- [16] S. Das, P. Whatmough, and D. Bull, "Modeling and characterization of the system-level power delivery network for a dual-core ARM cortex-A57 cluster in 28 nm CMOS," in *Proc. IEEE/ACM Int. Symp. Low Power Electron. Design (ISLPED)*, Jul. 2015, pp. 146–151.
- [17] S. Jakschik, U. Schroeder, T. Hecht, M. Gutsche, H. Seidl, and J. W. Bartha, "Crystallization behavior of thin ALD-Al<sub>2</sub>O<sub>3</sub> films," *Thin Solid Films*, vol. 425, nos. 1–2, pp. 216–220, Feb. 2003.
- [18] Y.-S. Lin, R. Puthenkovilakam, and J. P. Chang, "Dielectric property and thermal stability of HfO<sub>2</sub> on silicon," *Appl. Phys. Lett.*, vol. 81, no. 11, pp. 2041–2043, Sep. 2002.
- [19] K. Agarwal, K. Nowka, H. Deogun, and D. Sylvester, "Power gating with multiple sleep modes," in *Proc. 7th Int. Symp. Quality Electron. Design (ISQED)*, Mar. 2006, p. 5.