# High-Density Standard Cell Libraries with Backside Power Options in A14 Nanosheet Node

Halil Kükner, Gioele Mirabelli, Sheng Yang, Yun Zhou, Alexander Makarov, Yang Xiang, Juergen Boemmels, Anabela Veloso, Odysseas Zografos, Pieter Weckx, Julien Ryckaert, and Geert Hellings

imec, Kapeldreef 75, 3001 Leuven, Belgium

## ABSTRACT

Beyond FinFET device nodes, nanosheet is the next transistor architecture in CMOS scaling roadmaps. On top of the newer device architectures and materials, several other CMOS scaling boosters are being considered that can further improve power, performance and area scaling.

The backside power delivery network (BSPDN) is one of the promising scaling boosters: it frees metal routing resources on the frontside from power delivery, resulting in lower routing congestion. Hence, the BSPDN booster paves the way for higher frequency and a smaller area footprint. However, ad hoc standard cell design and optimization are required to connect the BSPDN to the logic devices located in the front-end-of-line (FEOL).

In this study, the implementation of different connection options to the BSPDN is studied in imec's A14 nanosheet node, i.e. Through Silicon Via in the Middle of Line (TSVM), buried power rail (BPR) and direct backside contact (BSC). Their implications for standard cell design, such as cell track height, routing and the main process challenges, are then compared to the classic frontside power delivery option.

Finally, high-density (HD) standard cell libraries are implemented and characterized. Normalized area and delay comparisons at the library level are presented. Area gains reach up to 25% for the BSC BSPDN option. Furthermore, maximum delay gains reach up to 20%, depending on the standard cell type.

Keywords: Backside power delivery, scaling booster, nanosheet, standard cell library, scaling roadmap, DTCO

## 1. INTRODUCTION

CMOS logic area scaling has enabled higher functionality per unit die area with each scaled node, where for decades the main strategy was lithography-driven dimensional scaling. When the pace of dimensional scaling slowed down, standard cell track height scaling became the scaling method<sup>1,2</sup> for CMOS logic roadmaps, which reduced the number of effective routing tracks.<sup>3</sup> Scaled cell heights also constrained the frontside power and ground (PG) rails at the standard cell boundaries: the deeply scaled PG rails became highly resistive (e.g. 900 $\Omega/\mu m$)<sup>4</sup> while still being wider than their routing track counterparts. Overall, the fewer routing resources and the highly resistive PG rails significantly degraded the design Quality of Results metrics, e.g. the routing congestion rate and the IR drop level.<sup>4</sup> Under these circumstances, the BSPDN concept was proposed<sup>5,6</sup> to further boost logic scaling by freeing routing track resources and by leveraging wider-pitched, hence lower-resistance, backside metal rails. Since its proposal, the BSPDN concept has gained attention. Power and performance gains were shown at both ring oscillator (RO) and block level for both High Performance (HP) and High Density (HD) logic families, with an area reduction of 8% to 16% and IR drop levels limited to below 35 mV;<sup>7,8</sup> power, performance, area and IR drop gains between 10% and 24% were also reported.<sup>9</sup> In addition, processor-level simulations reported 30%, 6% and 20% improvements in platform voltage droop, frequency and wire length, respectively.<sup>10</sup>

This study presents high-density (HD) standard cell libraries in imec's A14 nanosheet node with the frontside (M0) and the backside (TSVM, BPR and BSC) power delivery options. Area and delay comparisons are made at the library level. The results show the promise of backside power delivery for high-density application domains.

DTCO and Computational Patterning III, edited by Neal V. Lafferty, Proc. of SPIE Vol. 12954, 1295409 · © 2024 SPIE · 0277-786X · doi: 10.1117/12.3010866

Further author information: halil.kukner@imec.be

The rest of the paper is organized as follows. Section 2 summarizes the standard cell architectures. Section 3 describes the experimental setup. Section 4 presents and discusses the results. Finally, the last section concludes the paper.

## 2. STANDARD CELL ARCHITECTURES

A summary of the standard cell library parameters for the frontside (FS) and the backside (BS) power delivery options is listed in Table 1, e.g. CPP: 42 nm, M0P: 18 nm and M1P: 28 nm metal pitch values, gate extension: 9 nm, n-p separation: 28 nm, device width: 12 nm, gate length: 14 nm, etc. All PDN technology options are based on the same design ground rules. The cell track heights, the power rail layers and the via layers are explained next.

| Parameter         | FS M0 | BS TSVM | BS BPR | BS BSC |
|-------------------|-------|---------|--------|--------|
| Power Rail Layer  | M0    | BM0     | MBUR   | BM0    |
| Via Layer         | VD    | TSVM    | VBUR   | BSC    |
| CPP [nm]          | 42    | 42      | 42     | 42     |
| Cell Track Height | 6T    | 6T      | 5T     | 4.5T   |
| M0 Pitch [nm]     | 18    | 18      | 18     | 18     |
| M1 Pitch [nm]     | 28    | 28      | 28     | 28     |

Table 1. Standard cell library specifications for the frontside and the backside power delivery options.

Process cross-sections for the FS and the BS power delivery options are shown in Fig. 1. Nanosheets (i.e. Active) are merged at the source and drain regions by the grown epi layer. The source and drain diffusion regions are contacted by a contact metal (i.e. MD).

The traditional FS M0 option has the power and ground (PG) rails at the FS M0 layer, where the VD via connects the M0 rails to MD. Although the FS M0 cells have a height of 6 tracks (6T) in total, 2 routing tracks are lost due to the wider frontside PG rails at the standard cell boundaries. Therefore, the number of available routing tracks at the M0 layer is effectively reduced to 4.

The BS TSVM option makes use of the TSVM technology, where the PG rails are located at the backside M0 (i.e. BM0). A tall TSVM via connects the BS BM0 to the extended side of the MD layer at the frontside. Due to its formation process, the TSVM has an etch tapering angle of ~$87^{\circ}$.<sup>7</sup> To guarantee a landing space at least 10 nm wide on the BM0 rails, the TSVM width has to be at least 26 nm at the FS device level. Given that the M0 pitch is 18 nm (10 nm of width and 8 nm of spacing), a minimum TSVM width of 26 nm results in a loss of 2 M0 tracks in total at the cell boundaries. Therefore, the number of M0 routing tracks that can be used internally by a cell effectively drops to 4. Although the TSVM technology enables a BSPDN connection to the standard cells, the total cell height remains the same as in the traditional FS M0 technology, due to the wide structure of the TSVM.
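The relation between the taper angle, the landing width and the frontside width can be sketched with simple trapezoid geometry. The snippet below is a minimal illustration, assuming straight sidewalls tapered symmetrically on both sides; the function names are hypothetical, and the via depth it returns is only what the quoted numbers imply under that assumption, not a value stated in this paper. The pitch-count estimate of blocked tracks is likewise a simplification.

```python
import math


def top_width_nm(landing_width_nm, depth_nm, taper_deg):
    """Width at the wide (frontside) end of a via with symmetric tapered sidewalls."""
    # Each sidewall moves outward by depth / tan(taper) over the full via depth.
    widening_per_side = depth_nm / math.tan(math.radians(taper_deg))
    return landing_width_nm + 2.0 * widening_per_side


def implied_depth_nm(landing_width_nm, frontside_width_nm, taper_deg):
    """Via depth implied by the two widths (inverse of top_width_nm)."""
    half_widening = 0.5 * (frontside_width_nm - landing_width_nm)
    return half_widening * math.tan(math.radians(taper_deg))


def blocked_tracks(width_nm, pitch_nm):
    """Simplified estimate: number of routing pitches covered by a given width."""
    return math.ceil(width_nm / pitch_nm)


# Numbers quoted in the text: 10 nm landing, 26 nm frontside width, ~87 degree taper.
depth = implied_depth_nm(10.0, 26.0, 87.0)
print(f"implied via depth (assumed geometry): {depth:.0f} nm")
print(f"M0 tracks blocked at 18 nm pitch: {blocked_tracks(26.0, 18.0)}")
```

Under these assumptions the quoted widths correspond to a via depth on the order of 150 nm, which illustrates why a narrower via has to be shorter for a fixed taper angle.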

To overcome this limitation, a narrower via can lower the via area overhead at the cell boundary. However, in this scenario, the power rails must be brought closer to the device level, since a narrower via must also be shorter because of the vertical aspect ratio limitations.

The BS BPR option brings a two-stage solution to connect the frontside devices to the BSPDN.<sup>3, 5, 11</sup> First, local PG rails (i.e. BPR) located close to the device plane are partially buried in the shallow trench isolation (STI) oxide and silicon, so that a shorter via (i.e. VBUR) can be used to access the BPR PG rails. Second, nano-through-silicon vias (nano-TSV) connect the BPR rails to the backside BM0 rails. An alignment accuracy better than 10 nm between the nano-TSVs and the BPR rails has been achieved.<sup>11</sup> On the other hand, the narrower VBUR raises the via resistance by $3\times$ with respect to the BS TSVM technology.<sup>7</sup> Moreover, the BS BPR option carries a risk of contaminating the FEOL process.<sup>5</sup> The BS BPR option has a cell height of 5T, where 1 M0 track is lost due to the VBUR via width and the MD tip-to-tip limitations at the cell boundaries.

Figure 1. Process cross-sections for the frontside and the backside power delivery options: a. FS M0, b. BS TSVM, c. BS BPR, and d. BS BSC.

Finally, the BS BSC option is the ultimate scaling scenario for coplanar nanosheet-based MOS devices, where the area overhead of the PG vias at the cell boundaries is completely eliminated. In this scenario, the total cell height is determined by the device width, the n-p separation, the gate extension and the MD tip-to-tip parameters. The BS BSC option results in an almost ideal 4.5T total cell height, with 4 M0 routing tracks. There are two variants of BSC:<sup>7</sup> BSC-E (epi), where the BSC via lands directly on the device epi from the backside, and BSC-M (metal), where the BSC via is laterally extended to create a full Wrap Around Contact (WAC), hence lowering the access resistance. In this study, the BSC-M variant is chosen for the standard cell designs and is referred to as BSC in the text.
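The track accounting for the four options can be collected in a few lines. This is a minimal sketch using only the cell heights and routing track counts quoted in this section; the `PdnOption` class and the `routing_fraction` metric are hypothetical names introduced for illustration, not metrics used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PdnOption:
    name: str
    cell_height_tracks: float  # total cell height in M0 tracks (Table 1)
    routing_tracks: int        # M0 tracks left for intra-cell routing (Section 2)


OPTIONS = [
    PdnOption("FS M0", 6.0, 4),
    PdnOption("BS TSVM", 6.0, 4),
    PdnOption("BS BPR", 5.0, 4),
    PdnOption("BS BSC", 4.5, 4),
]


def overhead_tracks(opt):
    """Cell height (in tracks) not available for routing, e.g. PG rail or via overhead."""
    return opt.cell_height_tracks - opt.routing_tracks


def routing_fraction(opt):
    """Illustrative metric: share of the cell height usable for M0 routing."""
    return opt.routing_tracks / opt.cell_height_tracks


for opt in OPTIONS:
    print(f"{opt.name:8s} overhead={overhead_tracks(opt):.1f}T "
          f"routing fraction={routing_fraction(opt):.2f}")
```

All four options keep 4 routing tracks, so the differences in cell height come entirely from the overhead at the cell boundaries.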

As a final remark, the BS BPR and BS BSC options can also be configured to provide larger device widths, hence stronger drive strengths, than the FS M0 and BS TSVM options under the assumption of the same cell height. In this study, we limit our analysis to the area-scaled versions of the BS PDN options while keeping the device widths the same.

## 3. EXPERIMENTAL SETUP

Standard cell libraries for the aforementioned 4 PDN options are designed and characterized, where each library consists of around 200 cells. This number of cells covers a wide set of drive strengths and representative logic groups, ranging from inverters and buffers to multi-bit flip-flops. An in-house library characterization flow based on commercial EDA tools is used at the different stages: the standard cell design,<sup>12</sup> the cell-level parasitics extraction,<sup>13</sup> the library characterization and the library comparison.<sup>14, 15</sup>

## 4. RESULTS

Normalized area values for the frontside and the backside power delivery options are shown in Fig. 2. Since the CPP and M0 pitch values are the same across the PDN options, the area scaling is directly proportional to the standard cell track height scaling, i.e. FS M0: 6T, BS TSVM: 6T, BS BPR: 5T (17% area reduction), BS BSC: 4.5T (25% area reduction), as listed in Table 1.

Figure 2. Library-level area comparisons of the frontside and the backside power delivery options.
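The quoted area reductions follow directly from the track heights in Table 1. The sketch below reproduces the arithmetic, assuming the usual convention that the cell height equals the track count times the M0 pitch and that the cell width (in CPP) is unchanged between options; the function names are illustrative.

```python
M0_PITCH_NM = 18  # Table 1, identical for all options

# Cell track heights from Table 1.
TRACKS = {"FS M0": 6.0, "BS TSVM": 6.0, "BS BPR": 5.0, "BS BSC": 4.5}


def cell_height_nm(option):
    """Cell height, assuming height = track count x M0 pitch."""
    return TRACKS[option] * M0_PITCH_NM


def area_gain_percent(option, reference="FS M0"):
    """Area reduction versus the reference at equal cell width (same CPP count)."""
    return 100.0 * (1.0 - TRACKS[option] / TRACKS[reference])


for name in TRACKS:
    print(f"{name:8s} height={cell_height_nm(name):5.1f} nm "
          f"area gain={area_gain_percent(name):4.1f}%")
```

The 5T and 4.5T libraries give 16.7% and 25% respectively, matching the rounded 17% and 25% figures in the text.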

The delay data of each timing arc of each standard cell in the Liberty libraries are compared. The library comparison EDA tool<sup>14</sup> uses the delay values stored in the lookup tables for varying slew rate and cell output capacitive load values. Although the cell output capacitive load values do not match across the scaled libraries, the EDA tool<sup>14</sup> automatically calculates interpolated values during the comparison. Fig. 3 shows the distributions of the ratio of delay differences at each data point between the FS M0 library and the BS TSVM, BPR and BSC libraries, respectively. The y-axis shows the number of occurrences, while the x-axis (i.e. *Ratio of Delay Difference* (%)) shows the relative delay difference in percent between the timing libraries. A negative relative delay difference means that the comparison library has a lower delay value than the reference library at a given delay data point.
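The kind of comparison described here can be illustrated with a generic bilinear interpolation over a slew-by-load delay table. This is only a sketch: how the commercial tool interpolates internally is not described in this paper, the function names are hypothetical, and the table values below are made-up placeholders rather than characterization data.

```python
from bisect import bisect_right


def _segment(axis, x):
    """Indices of the axis segment used for x, clamped to the first or last segment."""
    i = bisect_right(axis, x) - 1
    i = max(0, min(i, len(axis) - 2))
    return i, i + 1


def lut_delay(slews, loads, table, slew, load):
    """Bilinear interpolation in a table indexed as table[slew_index][load_index]."""
    i0, i1 = _segment(slews, slew)
    j0, j1 = _segment(loads, load)
    ts = (slew - slews[i0]) / (slews[i1] - slews[i0])
    tl = (load - loads[j0]) / (loads[j1] - loads[j0])
    at_low_slew = table[i0][j0] * (1 - tl) + table[i0][j1] * tl
    at_high_slew = table[i1][j0] * (1 - tl) + table[i1][j1] * tl
    return at_low_slew * (1 - ts) + at_high_slew * ts


def relative_delay_difference(ref_delay, cmp_delay):
    """Relative difference in percent; negative means the comparison library is faster."""
    return 100.0 * (cmp_delay - ref_delay) / ref_delay


# Placeholder tables (ps) on different load axes (fF); NOT data from this study.
ref = dict(slews=[10.0, 40.0], loads=[1.0, 4.0], table=[[10.0, 22.0], [16.0, 28.0]])
cmp_ = dict(slews=[10.0, 40.0], loads=[0.8, 3.2], table=[[8.0, 18.0], [13.0, 23.0]])

# Both libraries are evaluated at the same (slew, load) point before comparing.
d_ref = lut_delay(ref["slews"], ref["loads"], ref["table"], 25.0, 2.5)
d_cmp = lut_delay(cmp_["slews"], cmp_["loads"], cmp_["table"], 25.0, 2.5)
print(f"relative delay difference: {relative_delay_difference(d_ref, d_cmp):.1f}%")
```

Evaluating both tables at a common load point is what allows libraries with different characterization axes to be compared arc by arc.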



Figure 3. Distributions for the ratio of delay differences between the FS M0 and BS TSVM, BPR and BSC libraries.

The FS M0 library is taken as the reference library, while the BSPDN libraries are taken as the comparison libraries. The library-level comparison results show that the BSPDN libraries have up to 20% lower cell delays than the FS M0 library. On the other hand, the delay difference distributions shift towards the origin when the PDN option switches from BS TSVM to BPR and BSC. However, it is important to consider that the library comparison EDA tool<sup>14</sup> compares the cell delays at the same interpolated output capacitive load levels. In other words, the delays of the highly scaled BS BPR and BS BSC cells are compared at higher capacitive load levels, which are closer to those of the FS M0 library. Therefore, the delay gains of the scaled BSPDN options appear to diminish in the relative delay difference graphs based on the library-level comparisons.<sup>7</sup>

The normalized mean delay difference of the library comparison distributions and the normalized average stage delay of a ring oscillator benchmark are shown in Fig. 4. At the library level (blue bars), the BS PDN libraries exhibit mean delay values close to those of the FS M0 library. More critically, the relative performance trends between the BS PDN libraries (i.e. TSVM, BPR and BSC) are not visible at the library level. The BS BPR and BSC options show higher mean delay levels than the BS TSVM option, since the cell delays are compared at the same output capacitive load levels. To further investigate the delay comparisons of the scaled libraries, a 15-stage INVD1 ring oscillator with a fan-out of 3 is built for each of the 4 PDN options, including the back-end-of-line loads. The normalized average stage delay in the RO benchmark (orange bars) is shown for the FS M0, BS TSVM, BPR and BSC PDN options. Here the delay scaling trend is visible: the BS BSC option outperforms the BS TSVM and BPR options, since the output capacitive loads are lower in the scaled libraries for the same sized devices. In a real usage scenario similar to the RO benchmark experiment, the 4.5T BS BSC critical path cells will be loaded by other 4.5T BS BSC cells, not by 6T FS M0 or 6T BS TSVM cells. In summary, direct conclusions based on the library comparisons can be misleading, especially between the scaled libraries, where the cell-level parasitics and the capacitive load levels vary significantly.
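For reference, an average stage delay is commonly extracted from a ring oscillator through the relation $T = 2 N t_{stage}$ for an odd number of stages $N$. The sketch below applies that relation and normalizes to the FS M0 option. It assumes this standard extraction, which the paper does not spell out, and the oscillation periods are arbitrary placeholders, not simulation results.

```python
def average_stage_delay(period_ps, n_stages=15):
    """Average stage delay of an N-stage ring oscillator, using T = 2 * N * t_stage."""
    if n_stages % 2 == 0:
        raise ValueError("an inverter ring oscillator needs an odd number of stages")
    return period_ps / (2 * n_stages)


def normalize(values, reference="FS M0"):
    """Express every entry relative to the reference option."""
    ref = values[reference]
    return {name: value / ref for name, value in values.items()}


# Placeholder oscillation periods in ps (hypothetical, for illustration only).
periods_ps = {"FS M0": 300.0, "BS TSVM": 285.0, "BS BPR": 270.0, "BS BSC": 255.0}

stage_delays = {name: average_stage_delay(t) for name, t in periods_ps.items()}
for name, ratio in normalize(stage_delays).items():
    print(f"{name:8s} normalized stage delay = {ratio:.2f}")
```

Because each stage in the ring is loaded by cells from its own library, this benchmark captures the load reduction of the scaled libraries that the equal-load library comparison hides.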



Figure 4. (Blue) Normalized mean delays based on the library comparisons, (orange) normalized average stage delay in the ring oscillator benchmarks for the frontside and the backside power delivery options.

## 5. CONCLUSIONS

In this study, the design implications of the different BSPDN options (i.e. BS TSVM, BS BPR and BS BSC) on HD standard cells are compared for imec's A14 nanosheet node. Normalized area and delay comparisons at the library level are presented. Because the library comparison EDA tool<sup>14</sup> compares cell delays at the same interpolated load levels, it is hard to observe and determine the real delay gains between the FS M0 and the deeply scaled BSPDN libraries. Therefore, DTCO and STCO studies at higher abstraction levels are strongly required to identify the power-performance-area trade-offs between the aforementioned BSPDN options.

## REFERENCES

- [1] Bardon, M. G., Sherazi, Y., Schuddinck, P., Jang, D., Yakimets, D., Debacker, P., Baert, R., Mertens, H., Badaroglu, M., Mocuta, A., Horiguchi, N., Mocuta, D., Raghavan, P., Ryckaert, J., Spessot, A., Verkest, D., and Steegen, A., "Extreme scaling enabled by 5 tracks cells: Holistic design-device co-optimization for finfets and lateral nanowires," in [2016 IEEE International Electron Devices Meeting (IEDM)], 28.2.1–28.2.4 (2016).
- [2] Mocuta, A., Weckx, P., Demuynck, S., Radisic, D., Oniki, Y., and Ryckaert, J., "Enabling cmos scaling towards 3nm and beyond," in [2018 IEEE Symposium on VLSI Technology], 147–148 (2018).
- [3] Beyne, E., Jourdain, A., and Beyer, G., "Nano-through silicon vias (ntsv) for backside power delivery networks (bspdn)," in [2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits)], 1–2 (2023).
- [4] Nibhanupudi, S. S. T., Prasad, D., Das, S., Zografos, O., Robinson, A., Gupta, A., Spessot, A., Debacker, P., Verkest, D., Ryckaert, J., Hellings, G., Myers, J., Cline, B., and Kulkarni, J. P., "A holistic evaluation of buried power rails and back-side power for sub-5 nm technology nodes," *IEEE Transactions on Electron Devices* 69(8), 4453–4459 (2022).
- [5] Ryckaert, J., Gupta, A., Jourdain, A., Chava, B., Van der Plas, G., Verkest, D., and Beyne, E., "Extending the roadmap beyond 3nm through system scaling boosters: A case study on buried power rail and backside power delivery," in [2019 Electron Devices Technology and Manufacturing Conference (EDTM)], 50–52 (2019).
- [6] Gupta, A., Pedreira, O. V., Arutchelvan, G., Zahedmanesh, H., Devriendt, K., Mertens, H., Tao, Z., Ritzenthaler, R., Wang, S., Radisic, D., Kenis, K., Teugels, L., Sebai, F., Lorant, C., Jourdan, N., Chan, B. T., Subramanian, S., Schleicher, F., Hopf, T., Peter, A. P., Rassoul, N., Debruyn, H., Demonie, I., Siew, Y. K., Chiarella, T., Briggs, B., Zhou, X., Rosseel, E., De Keersgieter, A., Capogreco, E., Litta, E. D., Boccardi, G., Baudot, S., Mannaert, G., Bontemps, N., Sepulveda, A., Mertens, S., Kim, M.-S., Dupuy, E., Vandersmissen, K., Paolillo, S., Yakimets, D., Chehab, B., Favia, P., Drijbooms, C., Cousserier, J., Jaysankar, M., Lazzarino, F., Morin, P., Altamirano, E., Mitard, J., Wilson, C. J., Holsteyns, F., Boemmels, J., Demuynck, S., Tokei, Z., and Horiguchi, N., "Buried power rail integration with finfets for ultimate cmos scaling," *IEEE Transactions on Electron Devices* 67(12), 5349–5354 (2020).
- [7] Yang, S., Schuddinck, P., Garcia-Bardon, M., Xiang, Y., Veloso, A., Chan, B. T., Mirabelli, G., Hiblot, G., Hellings, G., and Ryckaert, J., "Ppa and scaling potential of backside power options in n2 and a14 nanosheet technology," in [2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits)], 1–2 (2023).
- [8] Sisto, G., Preston, R., Chen, R., Mirabelli, G., Farokhnejad, A., Zhou, Y., Ciofi, I., Jourdain, A., Veloso, A., Stucchi, M., Zografos, O., Weckx, P., Hellings, G., and Ryckaert, J., "Block-level evaluation and optimization of backside pdn for high-performance computing at the a14 node," in [2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits)], 1–2 (2023).
- [9] Choi, S., Jung, J., Kahng, A. B., Kim, M., Park, C.-H., Pramanik, B., and Yoon, D., "Probe3.0: A systematic framework for design-technology pathfinding with improved design enablement," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 1–1 (2023).
- [10] Shamanna, M., Abuayob, E., Aenuganti, G., Alvares, C., Antony, J., Bahudhanam, A., Chandran, A., Chew, P., Chatterjee, A., Chauhan, B., Dandeti, N., Desai, J., Doyle, M., Dmukauskas, T., Farache, P., Fetzer, E., Fischer, K., Hack, P., Greenzweig, Y., Giacobbe, J., Hafez, W., Haralson, E., Hegde, A., Illa, A., Islam, M., Jain, S., Jang, M., Nguyen, J., Tong, T., Jiang, L., Karl, E., Kalangi, P., Khoo, G., Krishnamoorthy, A., Kuns, B., Li, W., Livengood, R., Malik, T., Priyanka, R., Faraby, H., Maymon, Y., Mistry, K., Morgan, K., Natarajan, S., Nevo, O., Oh, M., Pardy, P., Park, J., Penmatsa, P., Phelps, B., Peterson, C., Rajappa, S., Raveh, A., Rezaie, A., Ravishankar, T., Ramaswamy, R., Reddy, S., Saha, R., Sen, S., Sanchez, R., Sanaga, R., Simkhovich, B., Sell, B., Senger, M., Schnarch, B., Seshadri, M., Sidorov, O., Subramaniam, S., Subramanian, K., Truong, B., Bangalore, S., Hicks, J., Venkatesh, S., Christensen, D., Bhargav, K., Haartman, M. V., Joshi, P., Zickel, S., Lin, C.-H., Huening, J., Wu, T.-H., Bakken, N., Afzal, A., Raman, A., Rao, S., Kawar, V., Neirynck, J., Bradley, D., Duwe, M., Wu, S., Patil, V., and Bayoumy, M., "E-core implementation in intel 4 with powervia (backside power) technology," in [2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits)], 1–2 (2023).

- [11] Jourdain, A., Stucchi, M., Van der Plas, G., Beyer, G., and Beyne, E., "Buried power rails and nano-scale tsv: Technology boosters for backside power delivery network and 3d heterogeneous integration," in [2022 IEEE 72nd Electronic Components and Technology Conference (ECTC)], 1531–1538 (2022).
- [12] Cadence® Virtuoso® Layout Suite.
- [13] Synopsys® QuickCap® NX.
- [14] Cadence® Liberate<sup>™</sup> Trio<sup>™</sup> Characterization Suite.
- [15] Cadence® Spectre® Circuit Simulator.