# A Bypassable Scan Flip-flop for Low Power Testing with Data Retention Capability

Xugang Cao<sup>1</sup>, Hailong Jiao<sup>1, 3</sup>, Member, IEEE, and Erik Jan Marinissen<sup>2, 3</sup>, Fellow, IEEE

<sup>1</sup>School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen, China

<sup>2</sup>IMEC, Leuven, Belgium

<sup>3</sup>Electronic Systems group, Eindhoven University of Technology, Eindhoven, The Netherlands

Abstract—The power consumption of modern highly complex chips during scan test is significantly higher than the power consumed during functional mode. This leads to substantial heat dissipation, excessive IR drop, and unrealistic timing failures of the integrated circuits (ICs) under test. In this brief, a ByPassable Scan Data Retention Flip-Flop (BPS-DRFF) is proposed for low-power IC test. The proposed flip-flop contains two secondary latches. The output of the "function" secondary latch goes to the following combinational circuits, while the other "shadow" secondary latch is used to shift test vectors during scan test. By gating the output of the function secondary latch, the redundant switching activity in the combinational circuits is eliminated during scan shift, thereby reducing the test power consumption significantly. The suppressed switching activity also leads to lower IR drop across the chip, increasing the chip manufacturing yield. Furthermore, the shadow latch is reused for data retention in the sleep mode while performing power gating, thereby alleviating the area cost of the shadow latch. The proposed BPS-DRFF also eases the hold time sign-off in the test mode due to the elongated clock-to-Q contamination delay that is brought in by the shadow latch. The proposed design is applied to an AES-128 crypto core in a UMC 55-nm low power CMOS technology. Experimental results show that 68.5% of the power is saved during scan test with the proposed BPS-DRFF, compared to the standard scan retention flip-flop.

*Index Terms*—Low power test, design for testability, scan shift, shadow latch, power gating.

## I. INTRODUCTION

Serial scan design is a widely used design-for-testability (DfT) technique for achieving high-quality digital ICs. The complexity of digital ICs increases substantially with technology scaling due to the ever-increasing demand for higher performance and richer functionality. The power consumption of these digital ICs during test has also grown dramatically and is significantly higher than the power consumed in the functional mode, due to the high switching activity of combinational circuits during test vector shifting [1]. High scan shifting power leads to excessive heat dissipation and IR drop as well as unrealistic timing failures [2], lowering circuit reliability and manufacturing yield. Low-power test techniques are therefore in high demand.

During scan test of digital ICs, the test vectors are shifted serially into the flip-flops of the scan chain. Since the combinational circuits are not isolated from the outputs of the scan flip-flops, massive redundant switching activity occurs in the combinational circuits. It is reported that $\sim$78% of the total energy during scan test is dissipated in the combinational circuits [3].

Various techniques have been proposed for scan test power reduction in the literature, which can be categorized into two types: software and hardware techniques. The software innovations focus on the modification of automatic test pattern generation (ATPG) [4], [5]. Test vector reordering reduces the number of transitions between two consecutive vectors [4]. The X-Filling technique assigns values to the don't-care bits (X bits) of the test patterns to reduce the switching activity during a test [5]. However, the effectiveness of these techniques depends on the characteristics of the specific test patterns.

The hardware innovations focus on the modification of the hardware structure, including the scan chain architecture [6]-[8] and the scan cell [3], [9], [10]. The scan chain reordering method, also known as scan chain stitching, can reduce the number of transitions by changing the order of scan cells [7]. However, modifying the scan architecture typically requires modification of the traditional ATPG flow and brings in area, performance, and power cost as well. Some techniques modify the basic scan cells to eliminate the useless transitions in the combinational circuits. In [3], a NAND or NOR gate is used to freeze the combinational inputs to logic 1 or 0 during scan shift. In [9], a modified scan flip-flop with an extra transmission gate is proposed. The transmission gate isolates the secondary latch from the following combinational circuits during scan shift. However, in both [3] and [9], the scan output is not gated during the functional mode, which leads to significantly higher power consumption compared with the standard scan flip-flop.

In this brief, a new scan flip-flop that contains two secondary latches is proposed. The output of one "function" secondary latch goes to the combinational circuits, while the other "shadow" one is used to shift test vectors during scan test. By gating the function output of the flip-flop, the redundant switching activity in the combinational circuits is eliminated, thereby saving substantial dynamic power and suppressing the IR drop across the chip. The chip manufacturing yield is therefore increased. Furthermore, the shadow latch is reused for data retention during power gating, thereby alleviating the area cost of the shadow latch. The proposed scan flip-flop also eases the hold time sign-off in the test mode due to the elongated clock-to-Q contamination delay caused by the shadow latch.

## II. THE PROPOSED BYPASSABLE SCAN FLIP-FLOP WITH DATA RETENTION CAPABILITY

The proposed bypassable scan flip-flop with data retention capability (BPS-DRFF) is presented in this section. The schematic of BPS-DRFF is shown in Fig. 1. The proposed scan flip-flop has two outputs, Q and SQ, which are driven by two secondary latches, the function secondary latch and the shadow secondary latch, respectively. Q goes to the following combinational circuits. SQ is used to shift test vectors during serial shifting. During the functional mode, SQ is gated, so that BPS-DRFF performs as a conventional flip-flop. During scan shifting, Q is gated. The unnecessary switching of the following combinational circuits is thereby avoided. To reduce the area cost, the shadow secondary latch is reused for data retention when power gating is performed, thereby serving two purposes with one circuit.

<sup>© 2022</sup> IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. https://doi.org/10.1109/TCSII.2021.3096885



Fig. 1. The schematic of the proposed scan flip-flop. The logic gates in blue are always powered on. HVT transistors and CMOS logic gates are represented with thick lines in the transistor and gate symbols, respectively.

In BPS-DRFF, regular threshold voltage (RVT) transistors are used for the logic gates along the critical signal path. High threshold voltage (HVT) transistors are used for logic gates that are on non-critical paths to suppress leakage currents. The logic gates in blue are powered by an always-on power supply. SE is the scan enable signal. SAVE and RESTORE are used to perform data retention and restoration while performing power gating. C1 is used to isolate the shadow secondary latch in the functional mode. C2 serves as the inverted clock signal in the test mode and helps save data in the sleep mode. C1 and C2 are generated as in (1) and (2), as shown in Fig. 1.

$$C1 = SAVE + SE. \tag{1}$$

$$C2 = \overline{SE \cdot bclk + NSE \cdot SAVE}. \tag{2}$$
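As a quick check of (1) and (2), the following Python sketch evaluates C1 and C2 for ideal 0/1 signal values. It assumes that NSE is the complement of SE and that bclk is the internal clock signal named in (2). The function name `control_signals` is illustrative and is not part of the original design.

```python
from typing import Tuple


def control_signals(se: int, save: int, bclk: int) -> Tuple[int, int]:
    """Return (C1, C2) for 0/1 inputs, following Eqs. (1) and (2)."""
    nse = 1 - se                              # assumed: NSE is the inverse of SE
    c1 = save | se                            # Eq. (1): C1 = SAVE + SE
    c2 = 1 - ((se & bclk) | (nse & save))     # Eq. (2): inverted sum of products
    return c1, c2


if __name__ == "__main__":
    # (mode label, SE, SAVE); the labels are only for readability.
    for mode, se, save in [("functional", 0, 0), ("test", 1, 0), ("save", 0, 1)]:
        for bclk in (0, 1):
            print(mode, "bclk =", bclk, "->", control_signals(se, save, bclk))
```

With SE = 0 and SAVE = 0 the sketch gives C1 = 0 and C2 = 1 regardless of the clock, and with SE = 1 it gives C1 = 1 and C2 equal to the inverted clock.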

### A. Functional Mode

In the functional mode, SE = 0, SAVE = 0, and RESTORE = 0. Therefore, C1 = 0 and C2 = 1. Both T5 and T6 are thereby cut off, and the shadow secondary latch is detached from the main flip-flop. Meanwhile, T3 and T4 are turned on. The BPS-DRFF therefore operates in the same way as a conventional flip-flop.

### B. Test Mode

In the test mode, SE transitions high. Meanwhile, SAVE = 0 and RESTORE = 0. Therefore, C1 = 1 and C2 =  $\overline{bclk}$ . T3 is cut off, while T6 is turned on. The function secondary latch is detached from the main flip-flop, while the shadow secondary latch is connected to the main latch for serial test vector shifting through SQ. The BPS-DRFF performs as a standard scan flip-flop with a scan output SQ.

The connection between two adjacent scan flip-flops in a scan chain is shown in Fig. 2. The scan output SQ of the first BPS-DRFF is connected to the scan input (SI) of the second BPS-DRFF in the scan chain directly, thereby bypassing the combinational logic circuits in between. When the test vectors are shifted through this scan chain, the combinational circuits between the BPS-DRFFs remain quiet.



Fig. 2. The connection between two adjacent BPS-DRFFs.
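The effect of bypassing the combinational logic during shift can be illustrated with a purely behavioral Python sketch. It is an abstraction under stated assumptions, not a model of the transistor-level circuit: each cell is reduced to one scan-path bit and one Q bit, the chain starts at all zeros, and the function name `shift_in` and the example pattern are hypothetical.

```python
from typing import List


def shift_in(pattern: List[int], chain_len: int, bypass: bool) -> int:
    """Shift `pattern` into a scan chain and count toggles on the Q outputs."""
    scan_state = [0] * chain_len   # values travelling along the scan path
    q_outputs = [0] * chain_len    # values driving the combinational logic
    toggles = 0
    for bit in pattern:
        scan_state = [bit] + scan_state[:-1]   # one shift clock
        if not bypass:
            # Conventional cell: Q is also the scan output, so Q follows the shift.
            toggles += sum(a != b for a, b in zip(q_outputs, scan_state))
            q_outputs = list(scan_state)
        # Bypassable cell: Q is gated during shift, so only SQ changes.
    return toggles


conventional = shift_in([1, 0, 1, 1], chain_len=4, bypass=False)
bypassed = shift_in([1, 0, 1, 1], chain_len=4, bypass=True)
print(conventional, bypassed)
```

In this toy model the conventional chain exposes every shifted bit to the combinational inputs, while the bypassed chain exposes none until the vector is transferred to the function secondary latches.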

When the serial shift-in phase is finished, the test vectors are stored in the shadow secondary latches rather than the function secondary latches. Before the capture phase starts, these test vectors must be transferred into the function secondary latches that drive the combinational circuits. Similarly, before the beginning of the shift-out phase, the captured values in the function secondary latches must be transferred into the shadow secondary latches. Because of this difference from the conventional scan flip-flop, the waveform of SE needs to be adjusted as shown in Fig. 3.



Fig. 3. The waveform of serial scan for stuck-at test.

Before the capture phase, SE transitions low while the clock signal is high, as indicated by ② in Fig. 3. T6 is cut off while T3 is turned on, so the test vectors propagate to the function secondary latches. The responses of the combinational circuits to these test vectors are then captured by the following scan flip-flops at the next rising edge of the clock, as indicated by ③ in Fig. 3. Subsequently, before the computation results are shifted out, SE transitions high during the high phase of the clock signal, as indicated by ④ in Fig. 3. T3 is cut off while T6 is activated. The computation results are transferred from the main latches to the shadow secondary latches rather than the function secondary latches for shifting out. The unnecessary switching activity in the combinational circuits during the shift-out phase is thereby avoided as well.

The clock period during the last shift cycle and the capture cycle is typically longer than during the normal shift [10]. By determining the propagation delay of combinational logic circuit ( $t_{comb}$ ) and the routing delay of SE signal ( $t_{SE\_routing}$ ), the minimum duration of the clock high phase ( $t_{clk\_high\_sc}$ ) and clock period ( $t_{clk\_sc}$ ) during the last shift and capture cycles can be adjusted accordingly. The timing constraints are as follows and illustrated in Fig. 3.

$$t_{clk\_high\_sc} \ge t_{C\_Q} + t_{SE\_routing} + t_{SE\_setup}. \tag{3}$$

$$t_{clk\_sc} \ge t_{C\_Q} + t_{SE\_routing} + t_{SE\_Q} + t_{comb} + t_{setup}. \tag{4}$$

$t_{SE\_setup}$ is the setup time of SE before the falling edge of the clock; $t_{setup}$ is the setup time of the flip-flop; $t_{C\_Q}$ is the clock-to-Q propagation delay of the flip-flop. During the last shift cycle, $t_{SE\_Q}$ represents the duration from the rising edge of SE to the switching edge of SQ. During the capture phase, $t_{SE\_Q}$ represents the duration from the falling edge of SE to the switching edge of SQ.
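Constraints (3) and (4) are simple sums, so the minimum clock high phase and clock period for the last shift and capture cycles can be computed directly. The sketch below is only illustrative: the function name `min_clock_timing` is hypothetical, and the delay values in the example are placeholders in picoseconds, not measurements from this work.

```python
from typing import Tuple


def min_clock_timing(t_c_q: int, t_se_routing: int, t_se_setup: int,
                     t_se_q: int, t_comb: int, t_setup: int) -> Tuple[int, int]:
    """Return (minimum clock high phase, minimum clock period), same unit as inputs."""
    t_clk_high_sc = t_c_q + t_se_routing + t_se_setup            # Eq. (3)
    t_clk_sc = t_c_q + t_se_routing + t_se_q + t_comb + t_setup  # Eq. (4)
    return t_clk_high_sc, t_clk_sc


# Placeholder delays in ps (assumed values, chosen only to exercise the formulas).
high_ps, period_ps = min_clock_timing(
    t_c_q=200, t_se_routing=500, t_se_setup=100,
    t_se_q=150, t_comb=3000, t_setup=80,
)
print(high_ps, period_ps)
```

The example makes the dependence visible: the SE routing delay appears in both bounds, while the combinational delay only affects the clock period.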

At-speed scan test based on the launch-on-capture (LOC) scheme [13] is widely used to detect timing defects due to its simplicity, high fault coverage, and strong diagnostic support. In LOC at-speed test, SE is maintained at 0 for two consecutive clock cycles. The first test vector initializes the circuit, while the second vector launches a transition. The timing diagram of LOC test with the proposed BPS-DRFF is shown in Fig. 4.



Fig. 4. The timing diagram of launch on capture (LOC) test.

### C. Data Retention Sleep Mode

When the chip test is finished, SE is maintained at 0 and T3 is always on. The circuit operates either in the active mode or in the sleep mode. The data retention capability of the proposed BPS-DRFF can then be utilized in the sleep mode.

The timing diagram of mode transitions between the active mode and the sleep mode is shown in Fig. 5. Both power gating (with header sleep transistor) and ground gating (with footer sleep transistor) are supported by the proposed BPS-DRFF. Ground gating is taken as an example in Fig. 5.

Before entering the sleep mode, the system clock is gated first. SAVE transitions high, thereby turning on T6. The data that is stored in the function secondary latch is transferred to the shadow secondary latch. SAVE then transitions low to detach the shadow secondary latch from the main flip-flop. The cross-coupled inverters in the shadow secondary latch are always powered on to maintain the data when the sleep signal transitions low to start the sleep mode.

During wake-up, the sleep signal transitions high first to reactivate the footer sleep transistors. Then RESTORE transitions high to activate T5 and cut off T4. The data in the shadow secondary latch is restored to the main latch. Finally, the clock signal starts toggling to resume the normal operation.



Fig. 5. The timing diagram of mode transitions between the active mode and the sleep mode.
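The save, sleep, and restore sequence described above can be summarized with a small behavioral sketch. It assumes an idealized cell in which the gated domain loses its state completely during sleep and the always-on shadow latch keeps its value. The class `RetentionCellModel` and its method names are hypothetical and only mirror the order of events in the text.

```python
class RetentionCellModel:
    """Behavioral stand-in for one BPS-DRFF during sleep entry and wake-up."""

    def __init__(self, value: int) -> None:
        self.gated_state = value    # main and function latches (gated domain)
        self.shadow_state = None    # shadow secondary latch (always-on supply)
        self.asleep = False

    def save(self) -> None:
        # SAVE pulse with the clock gated: copy the data into the shadow latch.
        self.shadow_state = self.gated_state

    def sleep(self) -> None:
        # Sleep signal goes low: the gated domain is assumed to lose its state.
        self.gated_state = None
        self.asleep = True

    def wake_and_restore(self) -> None:
        # Sleep signal goes high first, then RESTORE writes the data back.
        self.asleep = False
        self.gated_state = self.shadow_state


cell = RetentionCellModel(1)
cell.save()
cell.sleep()
state_during_sleep = cell.gated_state
cell.wake_and_restore()
print(state_during_sleep, cell.gated_state)
```

Skipping the `save()` call in this model leaves the shadow latch empty, which is why SAVE must be pulsed before the sleep signal transitions low.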

## III. EVALUATION OF THE PROPOSED BPS-DRFF

The proposed BPS-DRFF is designed in the UMC 55-nm low power CMOS technology [14]. The evaluation of BPS-DRFF at the cell level is performed by SPICE simulation in Section III-A. Then the BPS-DRFF is applied to an AES-128 crypto core [15] for evaluation at the system level in Section III-B.

### A. Cell-Level Evaluation

The layout of BPS-DRFF is drawn in Cadence Virtuoso with a double-row architecture to minimize the interconnect length, as shown in Fig. 6. Accurate parasitic extraction is conducted with Siemens Calibre. Post-layout SPICE simulations are performed. A conventional data retention scan flip-flop (SDRFF) in the foundry standard cell library (also with a double-row layout) is used for comparison with the proposed BPS-DRFF. SDRFF has 52 transistors, while the proposed BPS-DRFF has 68 transistors. SDRFF has a 34.9% area advantage over the proposed BPS-DRFF.



Fig. 6. The layout of the proposed BPS-DRFF.

The five process corners (TT: typical, FF: fast NMOS fast PMOS, FS: fast NMOS slow PMOS, SF: slow NMOS fast PMOS, and SS: slow NMOS slow PMOS) of the 55-nm CMOS technology are used to evaluate the influence of process variations on the timing metrics of flip-flops. The simulations are performed at  $T = 25^{\circ}$ C. The power supply voltage is 1.2 V. The transistors in BPS-DRFF are sized to achieve similar (within 5%) setup time to SDRFF at the typical process corner. The additional switch T3 along the critical signal path and the longer wires (caused by the larger layout area) of BPS-DRFF lead to inferior timing behavior. Across the different process corners, the SDRFF reduces the functional mode D-to-Q propagation delay (setup time + clock-to-Q propagation delay) and hold time by 17.9% and 33.3% on average, respectively, compared to the proposed BPS-DRFF, as shown in Fig. 7.

Hold time violation in the test mode is a serious issue for scan flip-flops. In the test mode, BPS-DRFF has a 20% longer hold time (39.8 ps longer) on average compared to SDRFF. However, the transistors on the test mode clock-to-Q path of BPS-DRFF are all minimum-sized HVT transistors. In contrast, the clock-to-Q path of SDRFF is the same in both functional and test modes and is constrained by the speed requirement in the functional mode. The clock-to-Q contamination delay of BPS-DRFF is therefore on average (across the five process corners) 161 ps longer than that of SDRFF. Hold time sign-off in the test mode is thereby easier with BPS-DRFF than with SDRFF.
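The net effect of the two averages quoted above can be worked out with the usual hold check, in which the hold slack of a direct scan path is the launch contamination delay plus the path delay minus the capture hold time. The sketch below assumes identical path delay and clock skew for both cell types, so only the two differences matter. The function name is hypothetical.

```python
def hold_margin_change_ps(delta_contamination_ps: float, delta_hold_ps: float) -> float:
    """Change in hold slack when both the launch and capture cells are swapped.

    A positive result means more hold margin. Path delay and clock skew are
    assumed unchanged, so they cancel out of the difference.
    """
    return delta_contamination_ps - delta_hold_ps


# Averages quoted in the text: +161 ps contamination delay, +39.8 ps hold time.
change_ps = hold_margin_change_ps(161.0, 39.8)
print(round(change_ps, 1))
```

Under these assumptions the longer contamination delay outweighs the longer hold time by roughly 121 ps on average.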



Fig. 7. The D-to-Q propagation delay (setup time + clock-to-Q propagation delay) and hold time of different FFs at different process corners.

The dynamic and leakage power consumption of flip-flops is evaluated at the typical process corner with different temperatures, as shown in Fig. 8. The active mode dynamic power consumption is simulated at 2 GHz clock frequency for each FF. The input D keeps changing the logic value in each clock cycle. The dynamic power consumption of SDRFF is on average 5.69% lower than BPS-DRFF. Similarly, the leakage power consumption of SDRFF is on average 15.2% lower than BPS-DRFF due to the lower number of transistors.



Fig. 8. The power consumption of scan flip-flops at different temperatures.

### B. System-Level Evaluation

The prominent feature of the proposed BPS-DRFF is the power reduction during scan shift. To evaluate the effectiveness of power savings at the system level, an AES-128 core is used as a test circuit, where all flip-flops are replaced by BPS-DRFFs. The foundry-provided SDRFF is applied to the same circuit for comparison. Cadence Liberate and Abstract are used to generate the liberty and LEF files of BPS-DRFF, respectively. The Cadence low power flow with power gating is performed. Cadence Genus and Innovus are used for logic synthesis and physical implementation, respectively. ATPG is performed with Synopsys TestMAX. The test patterns are used for post-layout simulation with Cadence NC-Sim to generate value change dump (VCD) files for accurate power analysis in Cadence Voltus. As discussed, hold time sign-off in the test mode is easier with BPS-DRFF than with SDRFF. For hold time sign-off in the test mode, one delay buffer must be added in each scan path (from the output Q to the scan input SI of the next scan cell) of the SDRFF based circuit. For the BPS-DRFF based circuit, these buffers are unnecessary. 7391 buffers are saved, thereby alleviating the area cost of the BPS-DRFF based circuit.

7391 flip-flops are connected in one scan chain in the test mode in each AES core. Ten test patterns are used for the power analysis. As shown in Fig. 9, the average power consumption during scan shift is reduced by 68.5% with the BPS-DRFF based design compared to the SDRFF based design. Note that the situation is different during the last shift and capture cycles. The rise or fall transitions of SE in these two cycles lead to the data transmission between the function secondary latch and shadow secondary latch and the transitions of combinational circuits. The power consumption of BPS-DRFF based design in the last shift and capture cycles is therefore 6.79% and 6.52% higher, respectively, compared to the SDRFF based design.



Fig. 9. The average power consumption of scan flip-flops in each test phase.

Assume that the total length of the scan chain is N. There are (2N - 1) cycles of normal shift, but only one last shift cycle and one capture cycle. The average power consumption $(P_{average})$ for one test pattern can then be calculated by (5).

$$P_{average} = \frac{(2N-1) \cdot E_{average\_normal\_shift} + E_{last\_shift} + E_{capture}}{T_{total}}. \tag{5}$$

$E_{average\_normal\_shift}$ is the average energy consumption of one normal shift cycle. $E_{last\_shift}$ and $E_{capture}$ are the energy consumption of the last shift cycle and capture cycle, respectively. $T_{total}$ is the total test time. The number of scan flip-flops in a scan chain is typically much larger than one. Therefore, the average power consumption of one full test pattern is mainly determined by the normal shift. In this work, the length of the scan chain N is 7391. Therefore, the proposed design efficiently reduces the power consumption by 68.5% compared to the SDRFF based design during scan test.
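To see how little the last shift and capture cycles weigh in (5) for N = 7391, the following sketch evaluates the energy of the BPS-DRFF based design relative to the SDRFF based design over one test pattern. It rests on a strong simplifying assumption that is not stated in this work: the baseline design is taken to spend the same energy in every cycle, and all cycles are taken to have the same duration, so that the per-phase ratios reported above (68.5% lower, 6.79% higher, 6.52% higher) can be weighted by cycle count. The function name is hypothetical.

```python
def relative_test_energy(n: int, shift_ratio: float,
                         last_shift_ratio: float, capture_ratio: float) -> float:
    """Energy of the new design relative to the baseline over one test pattern.

    Assumes (simplification) equal baseline energy and equal duration per cycle.
    """
    shift_cycles = 2 * n - 1                       # normal shift cycles in Eq. (5)
    total = shift_cycles * shift_ratio + last_shift_ratio + capture_ratio
    return total / (shift_cycles + 2)              # plus last shift and capture


ratio = relative_test_energy(7391, 1 - 0.685, 1 + 0.0679, 1 + 0.0652)
saving_percent = (1 - ratio) * 100
print(round(saving_percent, 1))
```

Under this assumption the two non-shift cycles change the overall saving only in the second decimal place, which is consistent with the statement that the normal shift dominates.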

The dynamic power rail analysis during the normal shift is performed with Cadence Voltus. The results show that the worst-case voltage drop of the BPS-DRFF based design is 53 mV, while the worst-case voltage drop of the SDRFF based design is 271 mV. The high voltage drop of the SDRFF based design indicates that this circuit draws excessive current during the normal shift, affecting the power rail of the neighboring cores or blocks under test. Furthermore, the high power consumption of the SDRFF based design leads to undesirable heat dissipation, degrading the performance of the neighboring cores or blocks as well as itself. In contrast, the proposed BPS-DRFF not only achieves power savings, but also avoids unrealistic test failures and improves the manufacturing yield.

Power gating is also implemented for both the SDRFF and BPS-DRFF based AES cores. A power management unit (PMU) is designed for each AES core to generate the control signals. These signals control the mode transitions between the active and sleep modes. The AES core has two power domains: "TOP" and "CORE". The "TOP" power domain is always powered on, while the "CORE" domain is ground-gated in the sleep mode.

In the active mode, the total power consumption of the BPS-DRFF based design is 2.2% larger (see Table I) than that of the SDRFF based design, due to the larger number of transistors in each flip-flop. In the sleep mode, the total power consumption is mainly from the "TOP" power domain and the always-on cells in the "CORE" domain. The total power consumption of the BPS-DRFF based design is 5.1% larger than that of the SDRFF based design, due to the more complicated control signals that need to be generated by the PMU (the relative scale of the PMU can be much smaller if designed for a larger core). Focusing on the "CORE" domain alone, the leakage power of the BPS-DRFF based design is only 3% higher than that of the SDRFF based design.

TABLE I

EVALUATION OF THE AES CORES BASED ON DIFFERENT FFS

| Item             | Metric                         | SDRFF based design   | BPS-DRFF based design |
|------------------|--------------------------------|----------------------|-----------------------|
| Area             |                                | 0.846 mm<sup>2</sup> | 0.846 mm<sup>2</sup>  |
| Area utilization |                                | 80.1%                | 86.4%                 |
| Number of FFs    |                                | 7391                 | 7391                  |
| Frequency        |                                | 100 MHz              | 100 MHz               |
| Test mode        | Total power                    | 44.34 mW             | 13.98 mW              |
| Active mode      | Total power                    | 8.023 mW             | 8.201 mW              |
| Active mode      | Leakage power                  | 96.3 μW              | 97.4 μW               |
| Sleep mode       | Total power                    | 21.5 μW              | 22.6 μW               |
| Sleep mode       | Leakage power                  | 0.115 μW             | 0.117 μW              |
| Sleep mode       | Leakage power of "CORE" domain | 0.0658 μW            | 0.0678 μW             |
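The percentages quoted in the text follow directly from the entries of Table I and the reported worst-case voltage drops. The short sketch below recomputes them. The helper name `percent_change` and the dictionary keys are illustrative, and the numbers are copied from the table and the text.

```python
def percent_change(baseline: float, new: float) -> float:
    """Relative change of `new` with respect to `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# (SDRFF based design, BPS-DRFF based design), values taken from Table I.
table_i = {
    "test_total_mW": (44.34, 13.98),
    "active_total_mW": (8.023, 8.201),
    "sleep_total_uW": (21.5, 22.6),
    "core_leakage_uW": (0.0658, 0.0678),
}

changes = {name: round(percent_change(*pair), 1) for name, pair in table_i.items()}
ir_drop_ratio = round(271 / 53, 1)   # worst-case voltage drop, SDRFF over BPS-DRFF

print(changes)
print(ir_drop_ratio)
```

A negative value means the BPS-DRFF based design consumes less than the SDRFF based design for that entry.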

## IV. CONCLUSION

In this brief, a scan-power-aware flip-flop with data retention capability, BPS-DRFF, is proposed. By adding a secondary latch, the test vectors bypass the combinational circuits between the scan flip-flops in the scan shift mode. The redundant switching activity in the combinational circuits is eliminated, thereby saving significant power. The added secondary latch is also used for data retention during power gating, alleviating the area cost. The proposed scan flip-flop also eases the hold time sign-off in the test mode due to the elongated clock-to-Q contamination delay caused by the added secondary latch. The proposed BPS-DRFF is used in an AES-128 core to evaluate the characteristics at the system level in the UMC 55-nm CMOS technology. Compared with the standard scan retention flip-flop based design, the proposed BPS-DRFF based design reduces the power consumption during scan test by 68.5%, and the worst-case voltage drop by 5.1x. Despite the slightly increased delay, power, and area in the functional mode, the substantial power savings in the test mode allow more good chips to pass the test and enter the functional life cycle, thereby increasing the manufacturing yield to achieve lower chip cost. The proposed BPS-DRFF is therefore an attractive alternative to the standard scan retention flip-flop for achieving significant test power savings and higher manufacturing yield.

## REFERENCES

- [1] E. Alpaslan, Y. Huang, X. Lin, W. Cheng, and J. Dworak, "On reducing scan shift activity at RTL," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 29, no. 7, pp. 1110-1120, July 2010.
- [2] W. Ding, H. Hsieh, C. Han, J. C. Li, and X. Wen, "Test pattern modification for average IR-drop reduction," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 24, no. 1, pp. 38-49, Jan. 2016.
- [3] S. Gerstendorfer and H.-J. Wunderlich, "Minimized power consumption for scan-based BIST," in *Proc. IEEE International Test Conference (ITC)*, 1999, pp. 77-84.
- [4] J. T. Tudu, E. Larsson, V. Singh, and V. D. Agrawal, "On minimization of peak power for scan circuit during test," in *Proc. IEEE European Test Symposium (ETS)*, 2009, pp. 25-30.
- [5] J. Li, Q. Xu, Y. Hu, and X. Li, "X-Filling for simultaneous shift- and capture-power reduction in at-speed scan-based testing," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 18, no. 7, pp. 1081-1092, July 2010.
- [6] S. Seo, Y. Lee, H. Lim, J. Lee, H. Yoo, Y. Kim, and S. Kang, "Scan chain reordering-aware X-Filling and stitching for scan shift power reduction," in *Proc. IEEE Asian Test Symposium (ATS)*, 2015, pp. 1-6.
- [7] S. Lee, K. Cho, S. Choi, and S. Kang, "A new logic topology-based scan chain stitching for test-power reduction," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 67, no. 12, pp. 3432-3436, Dec. 2020.
- [8] W. Pradeep, P. Narayanan, R. Mittal, N. Maheshwari, and N. Naresh, "Frequency scaled segmented (FSS) scan architecture for optimized scan-shift power and faster test application time," in *Proc. IEEE International Test Conference (ITC)*, 2017, pp. 1-10.
- [9] A. Mishra, N. Sinha, Satdev, V. Singh, S. Chakravarty, and A. D. Singh, "Modified scan flip-flop for low power testing," in *Proc. IEEE Asian Test Symposium (ATS)*, 2010, pp. 367-370.
- [10] Y. Lin, J. Huang, and X. Wen, "A transition isolation scan cell design for low shift and capture power," in *Proc. IEEE Asian Test Symposium (ATS)*, 2012, pp. 107-112.
- [11] S. Shigematsu, S. Mutoh, Y. Matsuya, Y. Tanabe, and J. Yamada, "A 1-V high-speed MTCMOS circuit scheme for power-down application circuits," *IEEE Journal of Solid-State Circuits*, vol. 32, no. 6, pp. 861-869, June 1997.
- [12] H. Jiao and Z. Zhang, "A compact low-power data retention flip-flop with easy-sleep mode," in *Proc. IEEE International Symposium on Circuits and Systems (ISCAS)*, 2020, pp. 1-5.
- [13] A. D. Singh, "Cell aware and stuck-open tests," in *Proc. IEEE European Test Symposium (ETS)*, 2016, pp. 1-6.
- [14] UMC 55-nm low power CMOS technology. Available on-line: https://www.umc.com/en/Product/technologies/Detail/55\_65\_90nm
- [15] AES. Available on-line: https://opencores.org/projects/tiny\_aes