# Efficient Backside Power Delivery for High-Performance Computing Systems

Hesheng Lin, Student Member, IEEE, Geert Van der Plas, Member, IEEE, Xiao Sun, Member, IEEE, Dimitrios Velenis, Member, IEEE, Francky Catthoor, Fellow, IEEE, Rudy Lauwereins, Fellow, IEEE, and Eric Beyne, Senior Member, IEEE

Abstract—In this work, we present a thin-profile, efficient power delivery approach that combines a voltage regulator using an in-package power inductor with a backside power delivery network (PDN). To meet the 1  $W/mm^2$  power-density target of high-performance computing systems, a 150-µm-thick in-molding power inductor with a high Q-factor of 25 (at 300 MHz) is provided for high-efficiency point-of-load voltage regulation. Meanwhile, a novel analytical model of backside power delivery is developed for a computer-aided-design procedure that optimizes the system efficiency. For the power flowing from the bumps (57-µm V<sub>DD</sub>-bump pitch) and the backside PDN to the active devices, the area resistances contributed by the backside PDN and the buried power rail (BPR) are 23% and 77%, respectively, if a nano-through-silicon-via with a 10-µm horizontal pitch is available. The resulting impact on power dissipation is within 1% and therefore negligible. A higher-ratio (0.5) buck converter with maintained efficiency is combined to better benefit the external interconnect. An overall power delivery efficiency of η=83% can be obtained for the  $1 W/mm^2$  power-density target. The power losses contributed by the air-core inductor, the power switches, and the PDN/BPR/redistribution layer are 26%, 66%, and 8%, respectively.

*Index Terms*—Backside PDN, buck converter, air-core inductor, system integration, system optimization.

## I. INTRODUCTION

**D**RIVEN by the explosive growth of big data, cloud computing and the Internet of Things, high-performance computing (HPC) systems have been in huge demand over the past years [1][2]. As computing workloads become more heterogeneous and explosive, and as the supply voltage is lowered by device scaling and power saving, the aggressive supply current poses significant challenges for the power delivery network (PDN). Recently, the buried power rail (BPR) and the backside PDN were proposed for their advantages of lower energy and lower IR drop [3]–[5]. As reported in the literature, a power rail resistance of 50  $\Omega/\mu$m can be obtained with a high-aspect-ratio ruthenium-based BPR. Meanwhile, by landing high-density *nano*-through-silicon-vias (*n*TSVs) on the BPR, the backside metal layer can reach the BPR to form a complete backside PDN. To address the power loss and critical IR drop on the external interconnect, and to maintain a fast transient response, an integrated voltage regulator (IVR) is highly desirable. It can be located in the package or even on the logic die for point-of-load (PoL) voltage regulation [6]–[9].



Fig. 1. (a) Schematic of buck converter, and (b) its in-package air-core inductor with labels of device dimensions.



Fig. 2. Cross section of backside power delivery: buck converter with in-molding inductor (pillar height=120  $\mu$m), backside PDN and the logic die. The converter's power switches (small chip area) are designed in the logic die.

In this paper, we focus on the analytical model required for backside power delivery, which allows the system efficiency and the IR drop to be evaluated. Combining a buck converter with an in-molding air-core inductor (Fig. 1) and a backside PDN, Fig. 2 shows the heterogeneous package integration of the backside power delivery, where RDL denotes the redistribution layer. An air-core inductor with 150- $\mu$m thickness is selected, because it can achieve a high

© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. https://doi.org/10.1109/TVLSI.2022.3183904

The authors are with imec, Heverlee 3001, Belgium. H. Lin, F. Catthoor and R. Lauwereins are also with the Department of Electrical Engineering, KU Leuven, Leuven 3000, Belgium (email: hesheng.lin@imec.be, geert.vanderplas@imec.be, francky.catthoor@imec.be, rudy.lauwereins@imec.be).

| Layer                    | Parameter                                        | Description                                                                    | Value                                                                                   |
|--------------------------|--------------------------------------------------|--------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|
| BPR                      | r<sub>BPR</sub>                                  | Ruthenium-based BPR's resistance density                                       | 50 Ω/µm                                                                                 |
| BPR                      | P<sub>BPR</sub>                                  | BPR pitch; 2·<i>P</i><sub>BPR</sub> is also the <i>n</i>TSV vertical pitch     | 0.105 µm                                                                                |
| <i>n</i>TSV              | R<sub>nTSV</sub>                                 | Resistance of each nTSV                                                        | 2 Ω                                                                                     |
| <i>n</i>TSV              | P<sub>nTSV</sub>                                 | nTSV horizontal pitch                                                          | 2~20 µm                                                                                 |
| Tap via (BSM3/2 to BSM1) | R<sub>TAP</sub>                                  | Resistance of each tap via                                                     | $0.5\Omega \cdot (2\mu m/P_{nTSV})^2$                                                   |
| Tap via (BSM3/2 to BSM1) | I<sub>TAP</sub>                                  | Delivered current of each tap via                                              | <sup>a</sup> $4j_0 \cdot P_{nTSV}^2$                                                    |
| Backside metals          | $R_{\square BSM1}, w_{BSM1}$                     | Square resistance and metal width of BSM1                                      | R<sub>□BSM1</sub>=80 mΩ/□, w<sub>BSM1</sub>=0.5 µm (0.25-µm-thick)                      |
| Backside metals          | R<sub>□BSM3</sub>                                | Square resistance of BSM3/2                                                    | 40 mΩ/□ (0.5-µm-thick)                                                                  |
| Bump                     | R<sub>BUMP</sub>                                 | Resistance per bump                                                            | 10 mΩ                                                                                   |
| Bump                     | P<sub>B</sub>                                    | V<sub>DD</sub> (or V<sub>SS</sub>)-bump pitch                                  | ≤160 µm (typical: 57 µm)                                                                |
| RDL3                     | α                                                | Factor to define RDL3's spacing                                                | 3                                                                                       |
| RDL3                     | W<sub>BUS</sub>                                  | Width of RDL3's bus line                                                       | $P_B\cdot(\alpha+1)/\alpha$, PDN size: <sup>b</sup> $P_B\cdot(2\alpha+1)/\alpha$, size: 0.6 |
| RDL3                     | W<sub>RDL3</sub>                                 | Width of RDL3's branch wire                                                    | $P_B/\alpha$                                                                            |
| RDL3                     | R<sub>□RDL3</sub>                                | Square resistance of RDL3                                                      | 2 mΩ/□                                                                                  |
| Via                      | h<sub>VIA</sub>, D<sub>VIA</sub>                 | Height and diameter of via contact                                             | h<sub>VIA</sub>=10 µm, D<sub>VIA</sub>=50 µm                                            |
| Pillar                   | h<sub>P</sub>, D<sub>P</sub>, s<sub>P</sub>      | Height, diameter and spacing of the pillars                                    | h<sub>P</sub>=120 µm, D<sub>P</sub>=75 µm, s<sub>P</sub>≥75 µm                          |
| RDL1, RDL2               | $t_{RDL}, w_{RDL}, s_{RDL}$                      | Thickness, width and spacing of RDL1/2; RDL3's thickness is also s<sub>RDL</sub> | t<sub>RDL</sub>=10 µm, w<sub>RDL</sub>, s<sub>RDL</sub>≥10 µm                         |
| PDN or inductor          | pgrid_xpitch<br>pgrid_ypitch                     | Dimension of inductor (or PDN) unit in x-/y-direction                          | 400 µm × integer number                                                                 |

TABLE I. SPECIFICATIONS OF THE BACKSIDE-PDN AND POWER INDUCTOR

<sup>a</sup>  $j_0 = P/V_{DD}$  is the current density, and P is the power density; <sup>b</sup> design wider bus lines for optimal resistive loss in RDL3 layer.

inductance and a low resistance without complicated processing like core lamination [10]. Based on this proposed system, the main contributions of this work are listed below:

(1) The analytical model of a thin-profile, efficient backside power delivery is proposed, which could help to evaluate the power dissipation and voltage drop in each delivery segment: inductor with RDL3 layer and bumps  $\rightarrow$  decoupling capacitor (metals BSM3 and BSM2)  $\rightarrow$  backside power grid (metal BSM1+*n*TSV+BPR)  $\rightarrow$  active devices (on-chip loadings);

(2) The measurement data of the 3D air-core inductors are provided and verified against the modeling and simulation results; moreover, the model of the buck converter with the air-core inductors also agrees with the post-layout simulation results;

(3) The system optimization of the whole backside power delivery is proposed using a computer-aided-design (CAD) procedure. This gives the loss breakdown for the efficient delivery system and the optimal inductor / circuit parameters.

## II. MODELING OF BACKSIDE PDN/BPR WITH THE INDUCTOR

In Fig. 2, the power inductor's output voltage  $V_{DD}$ (~0.7 V) is globally distributed through an RDL3 layer together with the  $V_{SS}$ ground terminal. Subsequently, terminals  $V_{DD}$  and  $V_{SS}$  are connected to the backside PDN via bumps ( $V_{DD}$-bump pitch= $P_B$ ). Two backside metal layers (BSM2, BSM3) are used to create a high-density decoupling capacitor (decap,  $C_L$ in Fig. 1), which is required to remove the high-frequency noise, while BSM1, the *n*TSVs and the BPR form the perpendicular power grids and deliver uniform power to the nearby active devices.

To make the analyses clear, Table I lists the key parameters for the backside PDN and the power inductor. The square resistance and the metal width of the corresponding metal layer are defined as  $R_{\Box BSMi}$  and  $W_{BSMi}$ (*i* = 1, 2, 3), respectively.  $R_{BUMP}$  is the resistance of each solder bump. The *n*TSVs have a horizontal pitch of  $P_{nTSV}$  and a resistance of  $R_{nTSV}$  each. The BPR's resistance density and pitch are defined as  $r_{BPR}$  (50  $\Omega/\mu m$  in [5]) and  $P_{BPR}$, respectively. Let  $j_0 = P/V_{DD}$  be the average current density in the BPR, where  $P$  is the average power density; the current densities on the external interconnects  $V_{SW}$  and  $V_{IN}/V_{SS}$  are then  $j_0$  and  $D \cdot j_0$, respectively, where  $D$  is the duty cycle of the control signals given in Fig. 1. The assumption of uniform loading is our first step in studying the PDN resistive drop, and it is reasonable for representing the IVR's load [11]. Moreover, due to the large size of the PDN unit (about 0.64  $mm^2$), this simplification can be accurate if the hotspots in the CPU core are small enough (~600  $\mu m^2$  in [5]). A PDN model for a more specific CPU core is left as further work. With uniform workloads, the voltage-drop analyses for the  $V_{DD}$  grid are derived below; the analyses for the other grids are similar.

### A. Voltage Drop in RDL3 Layer

For the bulky power inductor ( $mm^2$-scale area), direct high-current delivery to the thin backside metal layers would result in a large power dissipation in the interconnect. Therefore, a 10-µm-thick RDL3 copper layer (square resistance  $R_{\square RDL3}$) is proposed to distribute uniform power with a low loss before its connection to the backside PDN.

As shown in Fig. 3, the inductor's  $V_{DD}$  terminal first connects to RDL3's bus line, which then distributes the power to every branch with a finer width  $W_{RDL3} = P_B/\alpha$. For the case with  $P_B=57 \ \mu m$  and an RDL3 thickness and spacing of 10  $\mu m$,  $\alpha$=3 is derived. The width of the bus line in Fig. 3 is designed as  $W_{BUS} = P_B(1 + 1/\alpha)$. For a larger inductor, a larger  $W_{BUS}$  is needed. We assume that each  $V_{DD}$  bump delivers the same current. For an inductor with dimensions of  $N_X \cdot P_B$  (X-axis) and  $N_Y \cdot P_B$  (Y-axis), the simplified  $V_{DD}$  currents in RDL3's bus line and in every branch are given in Fig. 4. The  $V_{DD}$  voltage drops on RDL3's bus line and on a branch are given in (1) and (2), respectively, derived by integrating the product of current and resistance along the wires. The voltage drop on one  $V_{DD}$  solder bump is given in (3). Replacing  $j_0$  with  $D \cdot j_0$  in (1)–(3) yields the  $V_{SS}$  voltage drop on RDL3 and the bumps.



Fig. 3. RDL3 layer with bumps (an example with bump pitch  $P_B=57 \ \mu m$,  $N_x = N_y = 14$): connection between the power inductor and the backside PDN.  $V_{DD}$  and  $V_{SS}$  are globally distributed as they are the logic die's power supplies.  $V_{IN}$  and  $V_{SW}$  are only connected to the power switches: a few local bumps are enough for them to have a low IR drop although they deliver a high power.

$$\Delta V_{RDL3,bus} = \begin{cases} \frac{\alpha N_X N_Y^2}{8(\alpha+1)} j_0 \cdot P_B^2 \cdot R_{\Box RDL3}, & N_Y \text{ even} \\ \frac{\alpha N_X (N_Y^2+1)}{8(\alpha+1)} j_0 \cdot P_B^2 \cdot R_{\Box RDL3}, & N_Y \text{ odd} \end{cases} \tag{1}$$

$$\Delta V_{RDL3,branch} = \frac{\alpha}{2} (N_X - 3) (N_X - 2) j_0 \cdot P_B^2 \cdot R_{\Box RDL3} \tag{2}$$

$$\Delta V_{BUMP} = R_{BUMP} \cdot j_0 \cdot P_B^2 \tag{3}$$
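As an illustration of how (1)–(3) can be evaluated, the following Python sketch computes the three  $V_{DD}$  drops for the Fig. 3 example ( $N_x = N_y = 14$,  $P_B$=57 µm,  $\alpha$=3) with the Table I resistances. The function name `rdl3_drops`, its argument names and the choice of 1 $W/mm^2$ at 0.7 V for  $j_0$  are assumptions made for this example only; it is a direct transcription of the formulas, not the authors' CAD code.

```python
def rdl3_drops(n_x, n_y, p_b_um, alpha, r_sq_rdl3, r_bump, j0_a_per_um2):
    """Return (bus, branch, bump) V_DD drops in volts, following Eqs. (1)-(3)."""
    i_bump = j0_a_per_um2 * p_b_um ** 2  # current delivered by one V_DD bump, in A
    # Eq. (1): the numerator uses N_Y^2 for even N_Y and N_Y^2 + 1 for odd N_Y
    ny_term = n_y ** 2 if n_y % 2 == 0 else n_y ** 2 + 1
    dv_bus = alpha * n_x * ny_term / (8 * (alpha + 1)) * i_bump * r_sq_rdl3
    # Eq. (2): drop along one branch wire
    dv_branch = alpha / 2 * (n_x - 3) * (n_x - 2) * i_bump * r_sq_rdl3
    # Eq. (3): drop across a single solder bump
    dv_bump = r_bump * i_bump
    return dv_bus, dv_branch, dv_bump


# Assumed operating point: P = 1 W/mm^2, V_DD = 0.7 V, so j0 = P/V_DD in A/um^2
J0 = (1.0 / 0.7) * 1e-6
bus, branch, bump = rdl3_drops(14, 14, 57.0, 3, 2e-3, 10e-3, J0)
print(f"bus {bus * 1e3:.2f} mV, branch {branch * 1e3:.2f} mV, bump {bump * 1e3:.3f} mV")
```

Per the text, the corresponding  $V_{SS}$  drops would follow by passing  $D \cdot j_0$  instead of  $j_0$.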

Fig. 4. Simplified current flows in RDL3's (a) bus line and (b) branches.

In Fig. 3, the  $V_{DD}$  and  $V_{SS}$  bumps are globally distributed because they are the logic die's power supplies. For the locally distributed terminals  $V_{IN}$  and  $V_{SW}$, which connect only to the specific power switches, eight parallel bumps are normally enough at a 1- $W/mm^2$  power density to keep the interconnect resistive loss low. The maximum voltage drops on the bumps/RDL3 contributed by  $V_{SW}$  and  $V_{IN}$  are given in (4) and (5), respectively. Both terminals are connected to the power switches on the logic die through low-loss interconnect.

$$\Delta V_{SW} = N_X N_Y j_0 P_B^2 \cdot \left(\frac{\alpha}{\alpha+2} R_{\Box RDL3} + \frac{R_{BUMP}}{8}\right) \tag{4}$$

$$\Delta V_{IN} = D \cdot \Delta V_{SW} \tag{5}$$

### B. Voltage Drop from Bumps to Decoupling Capacitors

As shown in Fig. 5, the regulated power is injected from the bumps into the decap (formed by BSM2 and BSM3) and the PDN grid of BSM1; finally, the current is delivered to the BPR and the active devices through the nTSVs. To reduce the standby or leakage power, power gating can be added on the logic die via the BPR connections (Fig. 6). The BPR grids that directly supply the logic circuits are named  $V_{DD}$-Switched and  $V_{SS}$. Generally, the power gating takes <5% of the total chip area, and its power dissipation accounts for less than 5% of the total power loss, so it is negligible.



Fig. 5. (a) RDL pedestals and backside metals below flip chip bumps, and (b) power grid: BSM1, nTSV, BPR; nTSV pitch=P<sub>nTSV</sub> (horizontal), 2P<sub>BPR</sub> (vertical).



Fig. 6. Power gating on the logic die and its connections with the BPR.

To reduce the voltage drop from bumps to decap, pedestals are inserted to spread the current around the capacitor. With the "cross feed" current, the maximum voltage drop from the bump to the corresponding capacitor corner is reduced to (6).

$$\Delta V_{CAP,MAX} = \frac{R_{\Box BSM3}}{4\pi} \cdot j_0 \cdot P_B^2 \tag{6}$$

For the tap via that connects the decap and BSM1 layer,  $V_{DD}$  (or  $V_{SS}$ ) tap pitch= $2P_{nTSV}$  (both horizontal and vertical directions) is designed. Generally, the tap resistance  $R_{TAP}$  is inversely proportional to the square of the tap pitch (or  $P_{nTSV}$ ). As each tap delivers an average current in (7), the resulting voltage drop in (8) is independent of this tap pitch (or  $P_{nTSV}$ ).

$$I_{TAP} = j_0 \cdot 4P_{nTSV}^2 \tag{7}$$

$$\Delta V_{TAP} = 4R_{TAP} \cdot j_0 \cdot P_{nTSV}^2 \tag{8}$$

### C. Voltage Drop from Power Grids to Active Devices

For each tap via with current equal to (7) and tap pitch of  $2P_{nTSV}$ , the BSM1 wire between two tap vias has a maximum voltage drop shown in (9). For a cell size of  $2P_{nTSV} \times 2P_{BPR}$  including dense *n*TSV and BPR, each *n*TSV contributes voltage drop in (10). Besides, (11) gives the maximum voltage drop on BPR for the uniform workloads.

$$\Delta V_{BSM1,MAX} = R_{\Box BSM1} \cdot \frac{j_0 \cdot P_{nTSV}^3}{W_{BSM1}} \tag{9}$$

$$\Delta V_{nTSV} = 4 j_0 \cdot R_{nTSV} \cdot P_{nTSV} \cdot P_{BPR} \tag{10}$$

$$\Delta V_{BPR,MAX} = r_{BPR} \cdot j_0 \cdot P_{nTSV}^2 \cdot P_{BPR} \tag{11}$$

The overall voltage drop on the backside PDN/BPR and RDL3 layer can be derived by combining the expressions above with the related  $V_{SS}$  counterpart. Dividing the total voltage drop by the current density  $j_0$, we obtain the area resistance of the backside PDN/BPR and RDL3 layer. The power dissipation density in the backside PDN/BPR/RDL3 layer is then obtained as "area resistance  $\cdot j_0^2$". This can be incorporated into the power converter design to optimize the whole power delivery efficiency, as discussed in Sections III and IV. Moreover, the obtained IR drop (=area resistance  $\cdot j_0$) is also a key specification for HPC systems [4], but it is outside the scope of this work.

### D. Evaluation of Backside PDN and BPR

Based on the analyses above, Fig. 7 gives the area resistance contributed by the backside PDN and the BPR. The contribution of the RDL3 layer is not included, because it depends on the inductor (or PDN) size and is better investigated together with the whole backside power delivery in Sections III and IV. For logic dies with large hotspots, the relevant power dissipation and IR drop should be minimized. Thus, the scaling trends of  $P_{nTSV}$  and  $P_B$  (relevant to the backside PDN) are discussed below. Some parameters, such as the metal thicknesses, are kept constant as in Table I; their scaling trends are left as future work.

(1)  $P_{nTSV}$  scaling: generally, the resistances contributed by tap via and *n*TSVs are negligible. The critical resistances in the BSM1 and BPR scale with the  $P_{nTSV}$  value following Eqs. (9) and (11).

(2)  $P_B$  scaling: The  $P_B$  scaling only has an impact on the voltage drop of the bump and the decap (BSM3/BSM2). The voltage drop is reasonably low if  $P_B$ =57 µm is designed.

For the case with 57- $\mu$ m  $V_{DD}$ -bump pitch (see Fig. 3) and 10- $\mu$ m *n*TSV horizontal-pitch, the resistances contributed by the backside PDN and BPR are 23% and 77%, respectively. With 1500  $\Omega$ · $\mu$ m<sup>2</sup> as the total PDN/BPR area resistance, the PDN resistive loss is less than 1% of the total workloads (power delivery with P=1  $W/mm^2$  and  $V_{DD}$ =0.7 V) and hence negligible.
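The orders of magnitude quoted above can be checked against (3), (6) and (8)–(11) with the Table I values. The Python sketch below is a minimal illustration under two assumptions that are not stated in this form in the text: the  $V_{SS}$  path is taken to mirror the  $V_{DD}$  path (so the per-rail sum is doubled), and every segment's maximum drop is simply added. The function and key names are hypothetical. Under these assumptions the total lands close to the quoted 1500 Ω·µm² and the loss stays well below 1%; the 23%/77% split depends on how the segments are grouped in Fig. 7 and is not reproduced here.

```python
import math


def area_resistance_per_rail(p_b, p_ntsv, p_bpr=0.105, r_bump=10e-3,
                             r_sq_bsm3=40e-3, r_sq_bsm1=80e-3, w_bsm1=0.5,
                             r_ntsv=2.0, r_bpr=50.0):
    """Per-segment area resistance in ohm*um^2 (voltage drop divided by j0).

    Lengths are in um, resistances in ohm; defaults are the Table I values.
    """
    r_tap = 0.5 * (2.0 / p_ntsv) ** 2  # tap-via resistance from Table I
    return {
        "bump": r_bump * p_b ** 2,                      # Eq. (3)
        "decap": r_sq_bsm3 / (4 * math.pi) * p_b ** 2,  # Eq. (6)
        "tap": 4 * r_tap * p_ntsv ** 2,                 # Eq. (8)
        "BSM1": r_sq_bsm1 * p_ntsv ** 3 / w_bsm1,       # Eq. (9)
        "nTSV": 4 * r_ntsv * p_ntsv * p_bpr,            # Eq. (10)
        "BPR": r_bpr * p_ntsv ** 2 * p_bpr,             # Eq. (11)
    }


segments = area_resistance_per_rail(p_b=57.0, p_ntsv=10.0)
total = 2 * sum(segments.values())   # assumption: V_SS path mirrors the V_DD path
j0 = (1.0 / 0.7) * 1e-6              # A/um^2 for P = 1 W/mm^2 and V_DD = 0.7 V
loss_fraction = total * j0 ** 2 / 1e-6  # loss density over P (1 W/mm^2 = 1e-6 W/um^2)

for name, value in segments.items():
    print(f"{name:6s} {value:7.1f} ohm*um^2")
print(f"total (both rails) {total:.0f} ohm*um^2, loss {loss_fraction * 100:.2f}% of P")
```

Note that the tap-via term evaluates to the same value for any `p_ntsv`, which is the pitch independence stated for (8).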



Fig. 7. Contributions of backside PDN and BPR resistance.

## III. BUCK CONVERTER WITH 3D AIR-CORE INDUCTOR

For the IVR in HPC systems, both the power density (P) and the efficiency  $(\eta)$  are important. For the buck converter design, the critical issue is to determine the optimal inductor option (size and number of winding turns n). As the power losses in the inductor, the power switches and the PDN are correlated with each other, providing only the inductor efficiency as in [12] would not be accurate. To optimize the whole converter circuit, we can use the accurate models of the buck converter circuit and the 3D air-core inductor from [13][14]. For an *n*-turn air-core inductor with 4n wire segments in Fig. 1(b), the DC and AC equivalent series resistance (ESR) can be obtained using the skin-effect model [14]. Its inductance can be computed by summing the self-inductance of all the segments and the mutual inductance of every pair of segments, an approach that is independent of the inductor structure. To verify this theoretical inductance model, several air-core inductors were fabricated with a fixed coil width  $w_{RDL}$=100 µm, *xpitch*=300 µm and a middle pillar pitch D=165  $\mu$m (Fig. 8). The number of winding turns *n* and the y-direction pitch *ypitch* are varied. In Fig. 9, the measurement of a 5-turn inductor with *ypitch*=635 µm shows an inductance L=4.15 nH, a Q-factor of 25 at 300 MHz and a resonance frequency in the GHz range. The measured L and Q-factor coincide with the simulation results from the 3D electromagnetic High-Frequency Structure Simulator (HFSS), which is based on the finite element method. In Table II, the inductances of four fabricated inductors with *n*=5, 9 and *ypitch*=535 µm, 635 µm are investigated. The measurement results coincide with the simulation results and the theoretical calculation within a 4% mismatch. Consequently, the inductor's L, ESR, and winding copper loss  $P_{ESR}$  can be derived precisely from the model, and this approach is independent of the inductor structure.



Fig. 8. (a) The micrograph and (b) the layout of a 3D air-core inductor with ypitch=635  $\mu$ m and turn n=5.



Fig. 9. (a) The inductance and (b) the Q-factor of a 5-turn air-core inductor (ypitch=635 µm): simulation result versus measurement data.

TABLE II. INDUCTANCE: MODELING, SIMULATION, AND MEASUREMENT

| Turn, <i>n</i> | <i>ypitch</i><br>[µm] | Model,<br>L [nH] | Simulation*,<br>L [nH] | Measurement*,<br>L [nH] |
|----------------|-----------------------|------------------|------------------------|-------------------------|
| 5              | 535                   | 3.44             | 3.47                   | 3.35                    |
| 5              | 635                   | 4.03             | 4.07                   | 4.15                    |
| 9              | 535                   | 5.32             | 5.39                   | 5.38                    |
| 9              | 635                   | 6.05             | 6.27                   | 6.17                    |

\*The results of simulation and measurement are obtained at 300MHz.
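The 4% mismatch claim can be checked directly from the Table II entries. The short Python sketch below takes the measurement as the reference and reports the largest relative deviation of the model and the simulation; using the measurement as the reference is an assumption of this example, since the text does not state how the mismatch is normalized.

```python
# Rows of Table II: (turns n, ypitch in um, model L, simulated L, measured L), L in nH
TABLE_II = [
    (5, 535, 3.44, 3.47, 3.35),
    (5, 635, 4.03, 4.07, 4.15),
    (9, 535, 5.32, 5.39, 5.38),
    (9, 635, 6.05, 6.27, 6.17),
]


def max_mismatch(rows):
    """Largest |L - L_measured| / L_measured over the model and simulation columns."""
    worst = 0.0
    for _, _, model, sim, meas in rows:
        for value in (model, sim):
            worst = max(worst, abs(value - meas) / meas)
    return worst


worst = max_mismatch(TABLE_II)
print(f"worst-case mismatch relative to measurement: {worst * 100:.1f}%")
```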





Fig. 10. (a) The layout of power switches which is normalized as 1, and (b) the waveforms of the inductor current  $I_L$  and the switching node  $V_{SW}$  with its typical values based on simulations, and (c) BEOL extraction (color in grey) of the routing resistance and the coupling capacitor.

In this work, we consider the air-core inductor structure shown in Fig. 1(b), because the converter only needs an inductor with *n*=2 or 3 turns, and this inductor is easier to fit into a  $V_{DD}$  power grid with a minimal pitch of 400 µm. Given a 2-turn, 0.64-mm<sup>2</sup> inductor with L=0.85 nH, DC ESR=26 m$\Omega$  and AC ESR=53 m$\Omega$  at 300 MHz, together with a 28-nm switch, Fig. 10 shows the simulation-based circuit parameters that are used in the circuit model. The unit cell of the power switch layout (optimized at  $1 W/mm^2$, 0.64 W output, 0.7-V gate drive) is 0.04 mm<sup>2</sup> (Fig. 10(a)). Fig. 10(b) gives the rising/falling times of the high-side switch  $(t_{rH}, t_{fH})$  and the low-side switch  $(t_{rL}, t_{fL})$, and the deadtime and diode on-time  $(t_{Dr}, t_{Df})$, based on the simulation results. As a larger switch has larger rising/falling times, we double the values when the normalized switch size s is larger than 1.5. We simplify this computation because our designed conversion ratio (1/2) is not high and the switching loss is only a small part of the total losses. Generally, within the range of power switch sizes of interest, the switching loss is no more than 16% of the total losses in the power converter, and the deviation between its modeling and simulation results is within 3% and thus negligible (Fig. 11). Hence, it is acceptable to use this simplified calculation in the model. Besides, the forward voltage  $V_D$  of the low-side switch's body diode is 0.2 V and 0.3 V for  $1 W/mm^2$  and  $2 W/mm^2$  power delivery, respectively. These extracted results are used to compute the switching loss and the deadtime loss in the buck converter. Fig. 10(c) provides the back-end-of-line (BEOL) metal routing information of a unit converter module. For a unit converter cell, the coupling capacitance between two terminals ( $V_{SS}/V_{parL}$,  $V_{parL}/V_{SW}$,  $V_{SW}/V_{parH}$,  $V_{parH}/V_{IN}$ ) is 5 pF, which is proportional to the power switch size and limits the circuit frequency during the system optimization. Its coupling to the transistors' gates (0.5 pF) is neglected to simplify the model. The routing resistance between two neighboring switches is close to 8 m$\Omega$  for a 28-nm technology; it depends on the BEOL technology but is almost independent of the switch size.



Fig. 11. The resulting switching loss (rising/falling time related) at  $1 W/mm^2$  and  $2 W/mm^2$  and a varied switch size: modeling and simulation.



Fig. 12. The comparison (only the power converter) of simulation results and theoretical model based on a 2-turn, 0.64- $mm^2$ -area inductor and 28nm CMOS.

Fig. 12 provides the power efficiency of a 0.5-ratio buck converter with the 2-turn, 0.64-mm<sup>2</sup> inductor based on the post-layout simulation results and the theoretical calculation, with optimal frequencies of 220 MHz and 150 MHz for  $1 W/mm^2$  and  $2 W/mm^2$  delivery, respectively. The converter model fits the simulation results; thus, it is used to find the optimal circuit efficiency by sweeping the switch size. From the simulation, a 0.5-ratio buck converter ( $V_{DD}$=0.7 V) can achieve  $\eta = 84\%$  at  $1 W/mm^2$  with a 0.04-mm<sup>2</sup> switch size (s=1). When the power density goes up to  $2 W/mm^2$, a switch size of 0.08  $mm^2$  (s=2) is enough, and the converter achieves  $\eta$=81%. Compared with the inductor size, the chip area of the power switches is only 12% even for  $2 W/mm^2$  delivery; therefore, approximately 88% of the space is still left for the HPC system design, and the inductor size is taken as the basis for the power density calculation, as shown above.

## IV. PERFORMANCE OF BACKSIDE POWER DELIVERY

To optimize the whole backside power delivery with an IVR, we can use the models of the 3D air-core inductor and the converter circuit presented above to accelerate the design procedure and narrow down the inductor options. Together with the modeling of the backside PDN/BPR/RDL3, the whole power delivery efficiency of Fig. 2 is evaluated. If another type of PDN is required, it can also be incorporated into the design procedure given below.



Fig. 13. Optimization flowchart of the whole backside power delivery. *r* is the inductor's current ripple percentage and D is duty cycle of the control signals.

TABLE III. INDUCTOR DESIGNS

| Inductor Design         | A    | B    | C    |
|-------------------------|------|------|------|
| n                       | 2    | 3    | 3    |
| Size [mm <sup>2</sup> ] | 0.64 | 0.64 | 0.32 |
| L [nH]                  | 0.85 | 1.52 | 0.77 |
| DC ESR [mΩ]             | 26   | 47   | 26   |
| AC ESR[mΩ]<br>@300MHz   | 53   | 95   | 59   |

In Fig. 13, we explore the CAD procedure to find the  $\eta$-P Pareto front for the backside power delivery with a buck converter. By sweeping the inductor's design parameters (number of winding turns n, and inductor/PDN size defined by pgrid\_xpitch/pgrid\_ypitch) together with its current ripple percentage r and the switch size s, a wide range of values for the circuit frequency f is also covered. For every fixed inductor design, the power losses contributed by the inductor's copper winding  $P_{ESR}$, the power switches and the BEOL  $P_{SW}$, and the PDN  $P_{PDN}$  can be computed and optimized effectively. Therefore, an optimized power delivery efficiency and its related device/circuit parameters can be obtained by sweeping the overall design parameter space. As each power module should fit into the power grid, *pgrid\_xpitch/pgrid\_ypitch* (the inductor dimensions) is constrained to an integer multiple of 400 µm, and the number of inductor turns *n* is an integer. The design space for the switch size is s>0, and the r parameter is limited to  $0\sim2$  to guarantee an inductor current  $I_L > 0$  during continuous-conduction-mode (CCM) operation (see Fig. 10(b)). The other parameters are not swept and depend on the specific application. For the example of 7-nm HPC systems, typical values such as  $V_{DD}=0.7$  V and  $P_{target}=1$  $W/mm^2$  are used, and  $V_{IN}$  is determined by the required conversion ratio. Based on the optimization method above together with a 28-nm switch, Table III lists the inductor designs identified on the  $\eta$-P Pareto front, and their parameters (L, ESR) are given from HFSS simulation results.
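To make the link between the swept ripple parameter r and the circuit frequency f concrete, the following Python sketch enumerates a few ripple values for the Table III inductors. It is not the Fig. 13 flow: it assumes an ideal (lossless) CCM buck converter, for which the peak-to-peak inductor ripple is  $V_{DD}(1-D)/(L f)$, treats r as the ratio of that ripple to the average output current, models the ripple as triangular, and applies the 300-MHz AC ESR at every frequency. The switch loss  $P_{SW}$  and the PDN loss  $P_{PDN}$  are omitted, and all function names are hypothetical.

```python
import math

# Table III values: L in H, ESR in ohm, footprint in mm^2
DESIGNS = {
    "A": {"L": 0.85e-9, "r_dc": 0.026, "r_ac": 0.053, "area": 0.64},
    "B": {"L": 1.52e-9, "r_dc": 0.047, "r_ac": 0.095, "area": 0.64},
    "C": {"L": 0.77e-9, "r_dc": 0.026, "r_ac": 0.059, "area": 0.32},
}


def switching_frequency(v_out, duty, inductance, ripple, i_out):
    """Ideal CCM buck: ripple * i_out = v_out * (1 - duty) / (L * f), solved for f."""
    return v_out * (1.0 - duty) / (inductance * ripple * i_out)


def winding_loss(i_out, ripple, r_dc, r_ac):
    """DC loss plus AC loss of a triangular ripple (rms = peak-to-peak / (2*sqrt(3)))."""
    i_ac_rms = ripple * i_out / (2.0 * math.sqrt(3.0))
    return i_out ** 2 * r_dc + i_ac_rms ** 2 * r_ac


def sweep(power_density=1.0, v_dd=0.7, duty=0.5, ripples=(0.5, 1.0, 1.5, 2.0)):
    """Enumerate (design, r, f in Hz, P_ESR in W) for every design and ripple value."""
    rows = []
    for name, d in DESIGNS.items():
        i_out = power_density * d["area"] / v_dd  # average output current in A
        for r in ripples:
            f = switching_frequency(v_dd, duty, d["L"], r, i_out)
            rows.append((name, r, f, winding_loss(i_out, r, d["r_dc"], d["r_ac"])))
    return rows


for name, r, f, p_esr in sweep():
    print(f"{name}: r={r:.1f}  f={f / 1e6:6.1f} MHz  P_ESR={p_esr * 1e3:5.1f} mW")
```

A full optimization would add the switch and PDN loss models for each point and keep the designs that are not dominated in efficiency and power density.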



Fig. 14.  $\eta$-P evaluation of power delivery with a 0.5-ratio buck converter with different inductor designs. The square, round and triangle dots are the simulation results with inductor A, B and C, respectively.

Fig. 14 gives the optimized results of the backside power delivery including a 0.5-ratio buck converter with  $V_{DD}$=0.7 V. For the backside PDN, a  $V_{DD}$-bump pitch  $P_B$=57 µm and an *n*TSV horizontal pitch  $P_{nTSV}$=10 µm are used. The square, round and triangle dots are the simulation results (PDN model + post-layout result of the power converter), and they coincide with the design methodology. It shows that the inductor designs with *n*=2, 3 and an inductor (or PDN) size of 0.32 or 0.64 *mm*<sup>2</sup> are preferred. The optimal inductor option depends on the power density range. If the HPC system requires <1 *W/mm*<sup>2</sup> or 1~1.5 *W/mm*<sup>2</sup> power delivery, inductor B or A is the optimal option, respectively. If the HPC system requires a higher power density (>1.5 *W/mm*<sup>2</sup>), inductor C, which has a smaller size, is preferred to reduce the PDN resistive loss and the winding copper loss. From Fig. 15,  $1 W/mm^2$  delivery achieves  $\eta$=83% with a PDN unit of 0.64  $mm^2$. The converter's optimal frequency is 220 MHz with inductor A and 180 MHz with inductor B. With inductor A, the losses in the power inductor, the power switches, and the backside PDN/BPR/RDL3 layer account for 26%, 66%, and 8% of the total, respectively. For a 5  $W/mm^2$  power delivery with 76% overall efficiency, a smaller inductor/PDN unit cell is designed to reduce the RDL3 area resistance in (1) and (2), which mitigates the power dissipation. Meanwhile, the resulting 2x lower area resistance (unit:  $\Omega \cdot mm^2$) of the power inductor (inductor C versus inductor A) also helps to reduce the copper winding loss  $P_{ESR}$.
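The 2x statement can be reproduced from Table III if the inductor's area resistance is read as ESR multiplied by footprint; that reading, and the helper name `summarize`, are assumptions of this Python sketch. The Q-factor is included for reference using the standard definition  $Q = 2\pi f L / \mathrm{ESR}_{AC}$.

```python
import math

# Table III: (size in mm^2, L in nH, DC ESR in mOhm, AC ESR in mOhm at 300 MHz)
TABLE_III = {
    "A": (0.64, 0.85, 26, 53),
    "B": (0.64, 1.52, 47, 95),
    "C": (0.32, 0.77, 26, 59),
}


def summarize(size_mm2, l_nh, dc_mohm, ac_mohm, f_hz=300e6):
    """Q at f_hz and ESR x footprint (assumed meaning of 'area resistance')."""
    q = 2 * math.pi * f_hz * l_nh * 1e-9 / (ac_mohm * 1e-3)
    return {
        "Q": q,
        "dc_area_res": dc_mohm * size_mm2,  # mOhm*mm^2
        "ac_area_res": ac_mohm * size_mm2,  # mOhm*mm^2
    }


summary = {name: summarize(*row) for name, row in TABLE_III.items()}
ratio_dc = summary["A"]["dc_area_res"] / summary["C"]["dc_area_res"]
for name, s in summary.items():
    print(f"{name}: Q={s['Q']:.1f}, DC {s['dc_area_res']:.2f}, AC {s['ac_area_res']:.2f} mOhm*mm^2")
print(f"DC area resistance, A over C: {ratio_dc:.2f}x")
```

With the DC values the A-to-C ratio is exactly 2; with the AC values at 300 MHz it is somewhat lower, so the quoted 2x is matched only by the DC reading.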



Fig. 15. With (a)  $1 W/mm^2$  and (b)  $5 W/mm^2$  power density, loss breakdown of the whole power delivery including a 0.5-ratio buck converter. The AC resistive power loss in the RDL3/PDN/BPR is not included.

Compared with the state-of-the-art power delivery designs (1 $W/mm^2$ range) in Table IV, our design has a thinner profile (200 µm versus 700 µm in [16], and a thick silicon interposer in [15]), which is more suitable for in-package integration [17]. For the microprocessors in extremely thin laptops and tablets, the overall thickness (both the die and the package) is limited to 1.05 mm [18]. Therefore, one improved design in [8] reduces the passive to 200-µm thickness, but at the cost of a lower inductor performance and a lower system efficiency.

| Work | ISSCC'19<br>[8] | ISSCC'19<br>[8] | ISSCC'19<br>[8] | VLSIC'15<br>[15] | APEC'14<br>[16] | This<br>work |
|---|---|---|---|---|---|---|
| CMOS Tech. | 14 nm | 14 nm | 14 nm | 45 nm SOI | 22 nm | 28 nm |
| Inductor,<br>Thickness | Air-core<br>200 µm | Air-core<br>200 µm | Air-core<br>200 µm | Magnetic-core<br>NA | Air-core<br>700 µm<sup>b</sup> | Air-core<br>150 µm |
| Integration<br>Type | In Package | In Package | In Package | In Interposer | In LGA Package | In Package |
| With PDN | Yes | Yes | Yes | Yes | NA | Yes |
| Frequency [MHz] | 70 | 70 | 70 | 150 | 140 | 180 |
| V<sub>IN</sub>/V<sub>DD</sub> [V/V] | 1.6/1.2 | 1.6/0.8 | 1.4/0.7 | 1.66/0.83 | 1.7/1.05 | 1.4/0.7 |
| η<sub>peak</sub>@P [W/mm²] | 88%@0.86 | 75%@0.57 | - | 82%@0.9 | 90%@1 | 83%<sup>c</sup>@1;<br>84%<sup>d</sup>@1 |
| η (simulated<sup>a</sup>)@P [W/mm²] | 89%<sup>a</sup>@0.86 (f<sub>SW</sub>=70 MHz);<br>91%<sup>a</sup>@0.86 (f<sub>SW</sub>=100 MHz) | 77%<sup>a</sup>@0.57 (f<sub>SW</sub>=70 MHz);<br>85%<sup>a</sup>@0.57 (f<sub>SW</sub>=150 MHz) | 81%<sup>a</sup>@1 (f<sub>SW</sub>=70 MHz);<br>84%<sup>a</sup>@1 (f<sub>SW</sub>=100~150 MHz) | | | |

<sup>a</sup>Not the published measurement data. We simulate it (converter only) based on a 28-nm (not 14-nm) switch with a normalized switch size $s \approx 1$; a higher $f_{SW}$ (>70 MHz, but with optimal $\eta$) is used to avoid the reverse current loss in the power inductor during the CCM operation. The inductor's AC ESR is obtained from [20].

<sup>b</sup>Obtained from Ref. [18]; <sup>c</sup>PDN model + post-layout simulation result of the buck converter; <sup>d</sup>Only post-layout simulation result of the buck converter.

To have a fair comparison of the system performance, we also provide the simulation result with a CCM operation and an optimal switch size based on the given inductor parameters in [8]. As is shown in Table IV, the simulated efficiency for a 1.6V-to-1.2V conversion is close to the measurement data (91% versus 88%), with only a limited 3% efficiency deviation coming from the PDN resistive loss that is not yet considered in the simulation. The absolute errors are up to 10% for a 1.6V-to-0.8V conversion (85% as optimal versus 75%), but the relative ranking between the simulated and measured efficiency stays the same, which is the most important property we want to achieve. The additional errors probably come from the non-optimal circuit frequency and switch size of the converter, and from the larger reverse current loss [19] caused by a larger feedback-loop delay in a comparator (supplied by $V_{DD}$) for a lower $V_{DD} = 0.8$ V. Therefore, the simulated efficiency would be even closer to the measured result if those secondary effects were also incorporated. Based on the simulation result, our design achieves a clearly higher efficiency ($\eta$=84% versus 81% in [8]) and a higher step-down conversion ratio ($V_{DD}/V_{IN}$=0.5 versus 0.62 in [16]). The possible reasons for the higher $\eta$ are the optimal circuit frequency and the lower-AC-ESR inductor that is fabricated in the low-loss molding compound. Including the PDN impact, our design can still achieve $\eta$=83%. Due to the lower designed $V_{DD}$ (0.7 V versus 1.05 V in [16]) at the same power-density target, a potentially 2.25x higher PDN-related power dissipation would be incurred. However, that cost is mitigated by our proposed backside power delivery solution. For high-volume applications, integrated solutions are preferred, and our solution has that advantage over Ref. [8]. Moreover, it uses fewer layers in the fabrication, which gives an additional advantage in this respect (2-layer RDLs versus 3-layer laminations).
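The 2.25x factor follows from a simple scaling argument, sketched here under the assumption of a purely resistive PDN loss and an identical PDN area resistance in both cases. The symbols $R_A$ (area resistance) and $J$ (supply current density) are introduced only for this illustration:

$$
J = \frac{P_{target}}{V_{DD}}, \qquad P_{PDN} = J^2 R_A = \frac{P_{target}^2 R_A}{V_{DD}^2}
$$

$$
\frac{P_{PDN}\big|_{V_{DD}=0.7\,\mathrm{V}}}{P_{PDN}\big|_{V_{DD}=1.05\,\mathrm{V}}} = \left(\frac{1.05}{0.7}\right)^2 = 1.5^2 = 2.25
$$

At a fixed power-density target, lowering $V_{DD}$ by a factor of 1.5 raises the current density by the same factor, and the resistive loss grows with its square.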

# V. CONCLUSION

An analytical model of backside power delivery is developed from the power-loss perspective. For efficient power delivery, the power dissipation in the backside PDN and BPR is less than 1% of the output power. Combining a buck converter with an in-package, high-*Q*-factor air-core inductor for PoL voltage regulation, the overall efficiency is evaluated via a systematic CAD procedure with accurate circuit and device models (within 4% deviation). It reveals that an overall power delivery efficiency of $\eta$=83% can be obtained for a 1-$W/mm^2$ application. Compared with state-of-the-art designs, we support a higher conversion ratio, a thinner profile and a lower designed $V_{DD}$ while maintaining the efficiency. Prior works exhibit either lower efficiency or strong integration challenges.

#### REFERENCES

- [1] L. T. Su, S. Naffziger, and M. Papermaster, "Multi-chip technologies to unleash computing performance gains over the next decade," in *IEDM Tech. Dig.*, Dec. 2017, pp. 1.1.1–1.1.8.
- [2] M. Horowitz, "Computing's energy problem (and what we can do about it)," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC)*, Feb. 2014, pp. 10–14.
- [3] M. O. Hossen, B. Chava, G. Van der Plas *et al.*, "Power delivery network (PDN) modeling for backside-PDN configurations with buried power rails and μTSVs," *IEEE Trans. Electron. Devices*, vol. 67, no. 1, pp. 11–17, Jan. 2020.
- [4] D. Prasad, S. S. Teja Nibhanupudi, S. Das *et al.*, "Buried power rails and back-side power grids: Arm® CPU power delivery network design beyond 5nm," in *IEDM Tech. Dig.*, Dec. 2019, pp. 19.1.1–19.1.4.
- [5] B. Chava, J. Ryckaert, L. Mattii *et al.*, "DTCO exploration for efficient standard cell power rails," *Proc. SPIE*, vol. 10588, Mar. 2018, Art. no. 105880B.
- [6] H. Lin, D. Velenis, P. Nolmans *et al.*, "84%-efficiency full integrated voltage regulator for computing systems enabled by 2.5-D high-density MIM capacitor," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 30, no. 5, pp. 661–665, May 2022.
- [7] H. Lin, G. Hiblot, X. Sun *et al.*, "Backside power delivery with a direct 14:1/19:1 high-ratio point-of-load power converter for servers and datacenters," in *Proc. Symp. VLSI Technol.*, Jun. 2021, pp. 1–2.
- [8] C. Schaef, N. Desai, H. Krishnamurthy *et al.*, "A fully integrated voltage regulator in 14nm CMOS with package-embedded air-core inductor featuring self-trimmed, digitally controlled variable on-time discontinuous conduction mode operation," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 154–155.
- [9] H. K. Krishnamurthy, V. Vaidya, P. Kumar *et al.*, "A digitally controlled fully integrated voltage regulator with on-die solenoid inductor with planar magnetic core in 14-nm Tri-Gate CMOS," *IEEE J. Solid-State Circuits*, vol. 53, no. 1, pp. 8–19, Jan. 2018.
- [10] W. J. Lambert, M. J. Hill, K. P. O'Brien *et al.*, "Study of thin film magnetic inductors applied to integrated voltage regulators," *IEEE Trans. Power Electron.*, vol. 35, no. 6, pp. 6208–6220, Jun. 2020.
- [11] T. M. Andersen, F. Krismer, J. W. Kolar *et al.*, "A 10 W on-chip switched capacitor voltage regulator with feedforward regulation capability for granular microprocessor power delivery," *IEEE Trans. Power Electron.*, vol. 32, no. 1, pp. 378–393, Jan. 2017.

TABLE IV. COMPARISON WITH THE REPORTED IVRS

- [12] T. M. Andersen, C. M. Zingerli, F. Krismer *et al.*, "Modeling and pareto optimization of microfabricated inductors for power supply on chip," *IEEE Trans. Power Electron.*, vol. 28, no. 9, pp. 4422–4430, Sept. 2013.
- [13] Rohm Semiconductor, Switching Regulator IC Series: Efficiency of Buck Converter, Application Note. Accessed: 2016. [Online]. Available: https://fscdn.rohm.com/en/products/databook/applinote/ic/power/switching\_regulator/buck\_converter\_efficiency\_app-e.pdf
- [14] H. Lin, G. Van der Plas, X. Sun *et al.*, "System optimization: high-frequency buck converter with 3D in-package air-core inductor," *IEEE Trans. Compon., Packag., Manuf. Technol.*, vol. 12, no. 3, pp. 401–409, Mar. 2022.
- [15] K. Tien, N. Sturcken, N. Wang *et al.*, "An 82%-efficient multiphase voltage-regulator 3D interposer with on-chip magnetic inductors," in *Proc. IEEE Symp. VLSI Circuits*, Jun. 2015, pp. C192–C193.
- [16] E. A. Burton, G. Schrom, F. Paillet *et al.*, "FIVR–Fully integrated voltage regulators on the 4th generation Intel Core SoCs," in *Proc. 29th Annu. IEEE Appl. Power Electron. Spec. Conf. Expo.*, Mar. 2014, pp. 432–439.
- [17] D. Yu, "A new integration technology platform: integrated fan-out wafer-level-packaging for mobile applications," in *Proc. Symp. VLSI Technol.*, Jun. 2015, pp. T46–T47.
- [18] W. J. Lambert, M. J. Hill, K. Radhakrishnan *et al.*, "Package inductors for Intel fully integrated voltage regulators," *IEEE Trans. Compon., Packag., Manuf. Technol.*, vol. 6, no. 1, pp. 3–11, Jan. 2016.
- [19] C. Huang and P. K. T. Mok, "An 84.7% efficiency 100-MHz package bondwire-based fully integrated buck converter with precise DCM operation and enhanced light-load efficiency," *IEEE J. Solid-State Circuits*, vol. 48, no. 11, pp. 2595–2607, Nov. 2013.
- [20] C. Schaef, N. Desai, H. K. Krishnamurthy *et al.*, "A light-load efficient fully integrated voltage regulator in 14-nm CMOS with 2.5-nH package-embedded air-core inductors," *IEEE J. Solid-State Circuits*, vol. 54, no. 12, pp. 3316–3325, Dec. 2019.



**Hesheng Lin** (S'21) received the M.S. (Hons.) degree from the School of Electronic and Computer Engineering, Peking University, Beijing, in 2016. He is currently pursuing the Ph.D. degree with KU Leuven, Leuven, Belgium, continuing his study on power delivery network designs for computing systems and their related power management integrated circuits (ICs).

He held an Internship (or Research Assistant) position with Solomon Systech Ltd., Shenzhen, China, from 2014 to 2016, SKL-AMSV, University of Macau, Macau, from 2016 to 2017, and Peking University Shenzhen Graduate School, Shenzhen, in fall 2017. He is currently with imec and KU Leuven, Leuven, Belgium.



**Dimitrios Velenis** is group leader of the 3D and silicon Photonics Device and Component group at imec. He has been with imec for more than 13 years working on benchmarking and cost model activities for integration flows in 3D and Silicon Photonics interconnects.

Previously, Dimitrios worked as Assistant Professor at the ECE Department at Illinois Institute of Technology, and as Research Associate at the University of Rochester. He has obtained M.Sc. and Ph.D. degrees from the University of Rochester. His Ph.D. work was awarded with the 2004 Outstanding Dissertation Award from the European Design Automation Association. He is the author and co-author of over 70 papers in journals and conference proceedings.



**Geert Van der Plas** obtained the Ph.D. degree from the Katholieke Universiteit Leuven, Belgium, in 2001. He joined imec, Belgium, in 2003.

He has been working on energy-efficient data converters, power/signal integrity and 3D integration technologies.

Currently he is program manager in the 3D program addressing system scaling using advanced 3D (TSV) and packaging (FO-WLP) technology for high performance, mobile and IoT applications. His interests are in characterization, modeling, system exploration and design enablement of 3D integration technologies.



**Francky Catthoor** has been active in research on architectural methodologies; co-exploration of application, computer architecture and deep-submicron technology aspects; wireless, IoT and biomedical systems; and renewable energy systems, all at IMEC. He is IMEC senior fellow, part-time full professor at KU Leuven and IEEE fellow.



**Xiao Sun** received the Ph.D. degree in electrical engineering from Université Grenoble Alpes, Grenoble, France, in 2001.

Currently, she is Principal Member of Technical Staff at imec, Leuven, Belgium. She has (co)authored over 70 peer-reviewed journal and conference articles and one book chapter. Her research interests include RF design, modeling and characterization for 3-D interconnects and their impact on 3-D integrated circuits (ICs), heterogeneous integration, and packaging for RF and 5G applications.

Dr. Sun was a recipient of the prestigious Outstanding Session Paper Award from the IEEE Electronic Components and Technology Conference (ECTC) 2015.



**Rudy Lauwereins** is vice president at imec responsible for the digital and user-centric solutions unit, which focuses on security, connectivity, image processing, sensor fusion, artificial intelligence, machine learning, data analytics and on making technology society proof. He is a part-time Full Professor at the Katholieke Universiteit Leuven, Belgium and has authored and co-authored more than 500 peer-reviewed publications in international journals, books and conference proceedings. He is a fellow of the IEEE.

**Eric Beyne** (biography not available at this moment)