# Characterization and Fault Modeling of Intermediate State Defects in STT-MRAM

Lizhou Wu<sup>\*</sup> Siddharth Rao<sup>‡</sup> Mottaqiallah Taouil<sup>\*†</sup> Erik Jan Marinissen<sup>‡</sup> Gouri Sankar Kar<sup>‡</sup> Said Hamdioui<sup>\*†</sup> <sup>\*</sup>TUDelft, Delft, The Netherlands <sup>†</sup> CognitiveIC, Delft, The Netherlands <sup>‡</sup>IMEC, Leuven, Belgium

{Lizhou.Wu, M.Taouil, S.Hamdioui}@tudelft.nl

{Siddharth.Rao, Erik.Jan.Marinissen, Gouri.Kar}@imec.be

Abstract—Understanding the defects in magnetic tunnel junctions (MTJs) and their faulty behaviors is paramount for developing high-quality tests for STT-MRAM. This paper characterizes and models intermediate (IM) state defects in MTJs; the IM state manifests itself as an abnormal third resistive state, apart from the two bi-stable states of the MTJ. We performed silicon measurements on MTJ devices with diameters ranging from 60 nm to 120 nm; the results reveal that the occurrence probability of the IM state strongly depends on the switching direction, device size, and applied bias voltage. To test for such defects, appropriate fault models are needed. Therefore, we use the advanced device-aware modeling approach, where we first physically model the defect, incorporate it into a Verilog-A MTJ compact model, and calibrate it with silicon data. Thereafter, we use a systematic fault analysis to accurately validate a theoretically predefined fault space and derive realistic fault models. Our simulation results show that the IM state defect causes intermittent write transition faults. This paper also demonstrates that the conventional resistor-based fault modeling and test approach fails to model IM defects appropriately and is hence incapable of detecting them.

## I. INTRODUCTION

Spin-transfer torque magnetic RAM (STT-MRAM) technology offers competitive write performance, endurance, retention, and low power consumption [1]. The tunability of these aspects makes it customizable as both an embedded and a discrete memory solution for a variety of applications such as edge AI, IoT, and aerospace [2]. Therefore, STT-MRAM has received a large amount of attention for commercialization from major foundries such as TSMC [2], Intel [3], and Samsung [1]. To enable STT-MRAM mass production, high-quality yet cost-efficient manufacturing test solutions are crucial to ensure the required quality of products being shipped to end customers. The STT-MRAM manufacturing process involves not only the conventional CMOS process but also MTJ fabrication and integration [4]. The latter is more vulnerable to defects as it requires deposition, etch, and integration of magnetic materials with new tools. Hence, a blind application of conventional tests for existing memories such as SRAMs and DRAMs to STT-MRAMs may lead to test escapes and yield loss.

STT-MRAM testing is still an ongoing research topic. Several fault models, e.g., multi-victim and kink faults [5], were proposed for field-driven MRAMs. However, these fault models are not applicable to current-driven STT-MRAMs. Chintaluri *et al.* [6] derived fault models such as transition faults and read disturb faults in STT-MRAM arrays by simulating the impact of resistive defects in the presence of process variations; a March algorithm and its built-in self-test implementation were also introduced. Nair *et al.* [7] performed layout-aware defect injection and fault analysis, whereby they observed dynamic incorrect read faults. Nevertheless, all these papers assumed, without any justification, that STT-MRAM defects, including those in MTJ devices, are equivalent to linear resistors. Recently, Wu *et al.* [8,9] presented both experimental data and simulation results of pinhole defects in MTJ devices, and demonstrated that modeling pinhole defects as linear resistors is inaccurate and results in wrong fault models. As a solution to address the limitations of the traditional test approach, Fieback *et al.* [10,11] proposed the concept of Device-Aware Test (DAT). The DAT approach models physical defects accurately by incorporating the impact of such defects into the technology parameters and subsequently into the electrical parameters of the device. With the obtained defective device model, a systematic fault analysis based on circuit simulations can be conducted to develop realistic fault models; these are then used to develop test solutions.

This paper characterizes intermediate (IM) state defects in STT-MRAMs and applies DAT to develop accurate and realistic fault models. Normally, an MTJ device only has two bi-stable states representing logic '0' and '1'. However, due to some physical imperfections such as unreversed magnetic bubbles [12], inhomogeneous distribution of stray field [13] or even skyrmion generation [14], a third resistive state (i.e., IM state) may arise, leading to unintended memory faulty behaviors. The contributions of this paper are as follows.

- Characterize the IM state in MTJs with diameters ranging from 60 nm to 120 nm based on *silicon measurements*.
- Develop a Verilog-A compact model for a defective MTJ device suffering from IM state defect, and calibrate the model based on silicon measurements.
- Perform device-aware fault modeling to develop accurate and realistic fault models induced by the IM state defect.
- Demonstrate that the conventional fault modeling and test method fails to derive appropriate fault models for the IM state defect; hence, it fails to detect such a defect.

The rest of this paper is organized as follows. Section II introduces STT-MRAM fundamentals. Section III presents characterization results of IM state defect. Section IV elaborates the device-aware defect modeling of IM state. Section V presents and compares the fault modeling results using the device-aware and the conventional resistor-based fault modeling approaches. Section VI concludes this paper.



Fig. 1. (a) Simplified MTJ stack, (b) 1T-1MTJ cell and its access operations.

## II. BACKGROUND

### A. MTJ Device Technology

The magnetic tunnel junction (MTJ) is the data-recording element in STT-MRAMs; it encodes two bi-stable magnetic states into one-bit data. Fig. 1a shows the schematic of a simplified MTJ device; its critical diameter (CD) is typically 20 nm to 150 nm. The cross-sectional area  $A_0 = \frac{1}{4} \pi \text{CD}^2$  is a key technology parameter of the device [4]. Fundamentally, the MTJ consists of two ferromagnetic layers sandwiching an ultra-thin dielectric tunnel barrier (TB). The top ferromagnetic layer is named the *free layer* (FL), where the magnetization can be switched by applying a spin-polarized current going through it. The bottom ferromagnetic layer is called the *pinned layer* (PL), where the magnetization is strongly pinned to a certain direction. Therefore, the FL's magnetization can be either parallel (P state) or anti-parallel (AP state) to the PL's. The MTJ's resistance depends on both the thickness of the TB  $(t_{\text{TB}})$  and the magnetic state (i.e., P or AP). This is well known as the tunneling magneto-resistance (TMR) effect [15], which is characterized by the TMR ratio. It is defined as  $(R_{\rm AP} - R_{\rm P})/R_{\rm P}$,  where  $R_{\rm AP}$  and  $R_{\rm P}$  are the resistances in AP and P states, respectively. The key technology parameters of the MTJ are listed in Table I [4].
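The two definitions above can be evaluated directly. The following is a minimal Python sketch; the function names and the example resistances are hypothetical and are not taken from the measured devices.

```python
import math

def cross_sectional_area_nm2(cd_nm: float) -> float:
    """A_0 = (1/4) * pi * CD^2, with CD in nm and the result in nm^2."""
    return 0.25 * math.pi * cd_nm ** 2

def tmr_ratio(r_ap: float, r_p: float) -> float:
    """TMR ratio = (R_AP - R_P) / R_P, with both resistances in ohms."""
    return (r_ap - r_p) / r_p

# Hypothetical example values (not measured data).
area = cross_sectional_area_nm2(100.0)    # CD = 100 nm
tmr = tmr_ratio(r_ap=2000.0, r_p=1000.0)  # R_AP = 2 kOhm, R_P = 1 kOhm
print(f"A_0 = {area:.0f} nm^2, TMR = {tmr:.0%}")
```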

### B. 1T-1MTJ Cell Design

Fig. 1b shows a bottom-pinned 1T-1MTJ bit cell and the associated write and read operations. The three-terminal cell includes an MTJ device and an NMOS selector. The three terminals are connected to a bit line (BL), a source line (SL), and a word line (WL). The voltages on the BL and SL determine which operation is performed on the cell when the WL is asserted. For instance, a write '0' operation requires the BL at  $V_{\rm DD}$  and the SL grounded, which leads to a current  $I_{w0}$  flowing from BL to SL. In contrast, a current  $I_{w1}$  with the opposite direction goes through the cell during a write '1' operation. To guarantee a successful transition of the MTJ state, the write current (both  $I_{w0}$  and  $I_{w1}$ ) has to be larger than the *critical switching current*  $I_{\rm c}$ . The further the current exceeds  $I_{\rm c}$ , the faster the switching can be. It is worth noting that the *actual switching time*  $t_w$  under a fixed pulse varies from one cycle to another since STT switching is intrinsically stochastic. During a read operation, a voltage  $V_{\rm read}$  significantly smaller than  $V_{\rm DD}$  is applied on the BL to draw a read current  $I_{\rm rd}$ , which can be as small as  $\sim 10 \,\mu\text{A}$ , so that a sense amplifier can read the resistive state ( $R_{\rm P}$  or  $R_{\rm AP}$ ) of the MTJ device.

TABLE I. STT-MRAM KEY PARAMETERS.

| Technology parameter | Description | Electrical parameter | Description |
|----------------------|-------------|----------------------|-------------|
| $A_0$ | Cross-sectional area of MTJ | $R$ | Device resistance |
| $M_{\rm s}$ | Saturation magnetization of FL | $I_{\rm c}$ | Critical switching current |
| $H_{\rm k}$ | Magnetic anisotropy field of FL | $t_{\rm w}$ | Switching time |
| $TMR$ | Tunneling magnetoresistance ratio | | |

## III. DEFECT CHARACTERIZATION OF IM STATE

Electrical characterization with pulses is a common practice to evaluate the write performance of STT-MRAM devices. When we performed comprehensive characterization on devices with CD ranging from 60 nm to 120 nm, some devices showed an abnormal third resistive state apart from the two bi-stable P and AP states. As the resistance of this unexpected state is always between  $R_{\rm P}$  and  $R_{\rm AP}$ , we refer to it as *intermediate* (IM) state in this paper. Next, we will present the measurement data of IM state and elaborate it in detail.

### A. Identification of IM State


To experimentally characterize the  $P \rightarrow AP$  switching behavior under voltage pulses, a negative write pulse  $(V_{\rm p}=-0.8\,{\rm V}, t_{\rm p}=50\,{\rm ns})$  was first applied to the MTJ device under test to initialize it to the P state. The pulse was followed by a read operation using a relatively long but small voltage pulse  $(V_{\rm p}=10\,{\rm mV}, t_{\rm p}=0.7\,{\rm ms})$  to check whether the device had been initialized to the P state successfully. After the read, a positive write pulse with  $t_p=15 \text{ ns}$  was applied to the device to study  $P \rightarrow AP$  switching. Similarly, a second read was applied to read out the resistive state of the device. As the switching behavior is intrinsically stochastic, we repeated these four operations for 10k cycles to obtain a statistical result. To cover the switching probability  $P_{\rm sw}$  from 0% to 100%, we swept the pulse amplitude  $V_{\rm p}$  of the second write pulse in a carefully tuned range. For  $AP \rightarrow P$  switching characterization, a similar measurement was conducted with the polarity of both write pulses reversed.

Fig. 2a and 2b show the  $P \rightarrow AP$  switching characterization results of a representative normal MTJ and a defective MTJ with an IM state, respectively, at  $V_{\rm p}$=0.45 V; the nominal CD of both devices is 100 nm. Note that each point in the two figures represents a readout resistance of the second read operation. It can be seen in Fig. 2a that 99.2% of the measured 10k cycles experience a successful transition, while the remaining cycles (0.8%) experience a failed transition due to the intrinsic switching stochasticity. In contrast, three resistive states are observed in Fig. 2b for the defective MTJ under the same experimental conditions; a line of unexpected yellow points (IM state) shows up. The occurrence probability of the IM state is 0.6%. It is also worth noting that the failed transition probability of the defective MTJ is 44.8%, which is much higher than that of the defect-free one (0.8%) under the same applied pulses. The disparity in  $R_{\rm P}$ (red lines) and  $R_{\rm AP}$ (green lines) between these two devices is attributed to process variations; the slight TMR drop seen in this defective MTJ relative to the good one was not observed consistently across all defective MTJs with IM states.



Fig. 2. Measurement results: (a) normal MTJ, (b) defective MTJ with an IM state.
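The per-cycle statistics reported above (successful transition, failed transition, and IM occurrence) amount to binning the second readout resistance of every cycle. The sketch below illustrates one way to do this in Python; the relative tolerance `tol`, the function names, and the synthetic readouts are assumptions made for illustration and are not the procedure or data used in the measurements.

```python
from collections import Counter

def classify_readout(r_read, r_p, r_ap, tol=0.05):
    """Label one readout resistance as 'P', 'AP', 'IM', or 'other'.

    tol is a hypothetical relative tolerance around the nominal R_P and R_AP.
    """
    if abs(r_read - r_p) <= tol * r_p:
        return "P"
    if abs(r_read - r_ap) <= tol * r_ap:
        return "AP"
    if r_p < r_read < r_ap:
        return "IM"  # strictly between the two bi-stable resistances
    return "other"

def state_probabilities(readouts, r_p, r_ap, tol=0.05):
    """Fraction of cycles that ended in each state."""
    counts = Counter(classify_readout(r, r_p, r_ap, tol) for r in readouts)
    total = len(readouts)
    return {s: counts[s] / total for s in ("P", "AP", "IM", "other")}

# Synthetic readouts (ohms) for a P->AP experiment, for illustration only.
readouts = [2000.0] * 6 + [1000.0] * 3 + [1400.0] * 1
probs = state_probabilities(readouts, r_p=1000.0, r_ap=2000.0)
print(probs)
```

For a $P \rightarrow AP$ experiment, `probs["AP"]` plays the role of the successful transition probability, `probs["P"]` the failed transition probability, and `probs["IM"]` the IM occurrence probability.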

### B. Bias, Device Size, and Switching Direction Dependence

We also conducted numerous experiments to investigate which factors impact the occurrence probability of the IM state. Fig. 3 shows the bias dependence of the IM state for four different MTJ devices in the  $P \rightarrow AP$  switching direction; the measurement data for the other switching direction is similar and thus not shown here due to space limitations. The nominal CD of MTJ A and B is 100 nm, while it is 120 nm for MTJ C and D. It can be seen that the successful transition probability  $P_{\rm ST}$  from P to AP increases from 0% to 100% as  $V_{\rm p}$  increases (green squares corresponding to the left y-axis). The orange circles represent the occurrence probability of the IM state  $P_{\rm IM}$, corresponding to the right y-axis, at various  $V_{\rm p}$  points (from  $0.35\,{\rm V}$  to  $0.55\,{\rm V}$  in steps of 0.02 V). One can observe that  $P_{\rm IM}$  increases with  $V_{\rm p}$  until reaching a peak at  $P_{\rm ST}{\approx}50\%$  (marked with the dashed line), then decreases as  $V_{\rm p}$  increases further. We observed that this trend applies to all MTJ devices with IM states in both switching directions, although the peak height of  $P_{\rm IM}$  varies from one device to another.

In addition to the bias voltage, we observed that the switching direction and device size also play a role in determining the occurrence probability of the IM state. During the measurements, four different device sizes (CD=60 nm, 75 nm, 100 nm, and 120 nm) were covered; for each size, 60 MTJ samples were measured. As shown in Fig. 4, the smaller the MTJ device (i.e., the smaller the CD), the less likely IM states are to appear in our devices. More specifically, 57 out of the 60 measured devices with CD=120 nm exhibit IM states, whereas the number is 5 and 0 for MTJs with CD=75 nm and 60 nm, respectively. Among those devices with observed IM states, the median of the maximum occurrence probability of the IM state (i.e., the peak height of  $P_{\rm IM}$  in Fig. 3) becomes smaller when CD decreases, as shown by the two curves corresponding to the right y-axis in Fig. 4. It is also worth noting that the median of the maximum  $P_{\rm IM}$  in the  $AP \rightarrow P$  switching direction is slightly smaller than that in the  $P \rightarrow AP$  switching direction for a given MTJ size. This is probably because  $AP \rightarrow P$  switching generates more Joule heating than the opposite switching direction, which reduces the retention time of the IM state; thus, the number of IM states captured on average is smaller in the  $AP \rightarrow P$  switching direction under the same measurement set-up. Interestingly, Intel also presented similar measurement results in [13]. Based on the above observations, it can be inferred that STT-MRAM technology down-scaling helps reduce the probability of having IM states in MTJ devices, thus leading to a more deterministic and uniform transition between the bi-stable AP and P states.

Fig. 4. Device size and switching direction dependence of IM state.

### C. Related Work and Potential Causes

There are several prior works studying IM states in MTJ devices based on experiments and/or simulations. Yao *et al.* [16] observed stable IM states in both switching directions after the removal of write pulses with a measurement set-up similar to ours. They attributed the physical causes of the IM state to the multi-structure of the FL induced by the dipole field and large device size. Subsequently, further studies [12,13] reported that the observed IM states are caused by the inhomogeneous distribution of the stray field at the FL and by unreversed magnetic bubbles. In the past two years, studies have revealed that IM states in MTJ devices can arise from skyrmion formation and that their retention time can be as long as that of the bi-stable P and AP states [14].

In this work, our measurement data also clearly demonstrates the existence of IM states in MTJ devices especially for large sizes above 75 nm. As the occurrence of IM state is probabilistic depending on the switching direction, bias voltage, and device size, the conventional linear-resistor-based defect modeling approach for memory testing is not qualified to cover this defect in MTJ devices.

## IV. DEVICE-AWARE DEFECT MODELING OF IM STATE

To appropriately model the IM defect at the functional level, we use the recently proposed device-aware defect modeling approach [5], which consists of three steps: 1) physical defect analysis and modeling, 2) electrical modeling of defective device, and 3) fitting and model optimization.

Design, Automation and Test in Europe Conference (DATE 2021)



Fig. 5. MTJ schematics with both P-state and AP-state regions in the FL.

### A. Physical Defect Analysis and Modeling

Based on the characteristics and root causes of IM state, we physically model the IM state at the following two aspects.

1) Partial switching behavior of the FL: As explained in the previous section, the most probable cause of the IM state in MTJ devices is that some parts of the FL switch to the intended state under a write pulse while the rest remain in their initial state due to unreversed magnetic bubbles, inhomogeneous distribution of the stray field at the FL, or even skyrmion generation. Therefore, we model this partial switching behavior by splitting the FL into two regions: 1) a P-state region and 2) an AP-state region, with the assumption that these two regions are magnetically and electrically independent. Fig. 5a and Fig. 5b show the vertical and horizontal cross-section schematics of an MTJ device with both P-state and AP-state regions, respectively. As a result, we can derive:

$$1 = \frac{A_{\rm P}}{A_0} + \frac{A_{\rm AP}}{A_0} = A_{\rm IMP} + A_{\rm IMAP},\tag{1}$$

where  $A_{\rm P}$  and  $A_{\rm AP}$  are the cross-sectional area of the P-state and AP-state regions, respectively.  $A_{\rm IMP}$  and  $A_{\rm IMAP}$  are the normalized area with respect to the entire area  $A_0$ ; they can be any value in [0, 1]. Note that this model also covers the defect-free case where the P and AP states exist exclusively; i.e.,  $A_{\rm IMP}=0$  represents AP state and  $A_{\rm IMP}=1$  means P state.

2) Probabilistic occurrence of IM state: As introduced previously, the IM state does not show up in all write cycles. Instead, we observed experimentally that it has a certain occurrence probability depending on the applied bias voltage  $V_{\rm p}$ , MTJ size CD, and the switching direction. Apart from that, it is expected that the FL thickness ( $t_{\rm FL}$ ) also plays a role in determining the IM occurrence probability, as it significantly influences the thermal stability of the device [15].

We define a discrete random variable X as whether or not the IM state occurs. For a given  $V_p$ , CD, and  $t_{FL}$ , X obeys a Bernoulli distribution. Its probability mass function Pr (X) is:

$$Pr(X) = \begin{cases} 1 - P_{\rm IM}(V_{\rm p}, CD, t_{\rm FL}) & X = 0\\ P_{\rm IM}(V_{\rm p}, CD, t_{\rm FL}) & X = 1. \end{cases}\tag{2}$$

As shown in Fig. 3, the correlation between  $P_{\rm IM}$  and  $V_{\rm p}$  exhibits a curve which is quite similar to Gaussian function (Bell curve). Thus, we model the  $V_{\rm p}$  dependence of  $P_{\rm IM}$  as:

$$P_{\rm IM} = H_{\rm IM} \cdot \exp\left(\frac{-(V_{\rm p} - V_{\rm pk})^2}{2V_{\rm wd}^2}\right),\tag{3}$$

Fig. 6. (a) R-V loop experimental data vs. fitting curves to extract  $R_{\rm P}$  and  $R_{\rm AP}$  at varying bias voltage, (b)  $R_{\rm IM}$  vs.  $A_{\rm IMP}$  with respect to three biases.

where  $V_{\rm pk}$  is the applied bias voltage at which  $P_{\rm IM}$  reaches its peak  $H_{\rm IM}$ , and  $V_{\rm wd}$  is a parameter controlling the width of the bell curve. Note that the polarity of  $V_{\rm p}$  determines the switching direction; a negative  $V_{\rm p}$  results in an AP$\rightarrow$P transition while a positive  $V_{\rm p}$  leads to the reverse transition. Since  $H_{\rm IM}$  shows a linear scaling trend with CD, as shown in Fig. 4, it can be modeled as a piecewise linear function:

$$H_{\rm IM} = \begin{cases} S_{\rm lp} \cdot (CD - 60) & CD \ge 60\\ 0 & CD < 60, \end{cases}\tag{4}$$

where  $S_{\rm lp}$  is the slope of the curve. Since all the measurements we performed were on MTJ devices with the same  $t_{\rm FL}$ , it is assumed that  $t_{\rm FL}$  has no impact on  $P_{\rm IM}$ . However, for a generic model covering devices with different  $t_{\rm FL}$ , such impact should be incorporated. Combining Equations (2)-(4),  $S_{\rm lp}$ ,  $V_{\rm pk}$ , and  $V_{\rm wd}$  are three fitting parameters which can be tuned and fitted to measurement data; this is covered later.
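Equations (2)-(4) can be combined into a small sampling routine. The following Python sketch is an illustration only, not the calibrated model itself: the function names are hypothetical, CD is assumed to be given in nm, $P_{\rm IM}$ is treated as a fraction in [0, 1], and the parameter values in the example are made up.

```python
import math
import random

def peak_height(cd_nm, slope):
    """Eq. (4): H_IM = S_lp * (CD - 60) for CD >= 60 nm, otherwise 0."""
    return slope * (cd_nm - 60.0) if cd_nm >= 60.0 else 0.0

def p_im(v_p, cd_nm, slope, v_pk, v_wd):
    """Eq. (3): bell-shaped dependence of P_IM on the bias voltage V_p."""
    h_im = peak_height(cd_nm, slope)
    return h_im * math.exp(-((v_p - v_pk) ** 2) / (2.0 * v_wd ** 2))

def im_occurs(v_p, cd_nm, slope, v_pk, v_wd, rng):
    """Eq. (2): draw X ~ Bernoulli(P_IM); True means the IM state occurred."""
    return rng.random() < p_im(v_p, cd_nm, slope, v_pk, v_wd)

# Hypothetical parameters for illustration (not the fitted values).
params = dict(cd_nm=100.0, slope=2e-3, v_pk=0.5, v_wd=0.02)
rng = random.Random(0)
cycles = 10_000
hits = sum(im_occurs(0.5, rng=rng, **params) for _ in range(cycles))
print(f"P_IM at the peak: {p_im(0.5, **params):.3f}, sampled: {hits / cycles:.3f}")
```

Because $t_{\rm FL}$ is assumed to have no impact here, it does not appear as an argument; a generic model would add it to `p_im`.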

### B. Electrical Modeling of MTJ Devices with a Single IM State

With the obtained model of IM state, we can map it to the three key electrical parameters: R,  $I_c$ , and  $t_w$  as a reflection of the impact on the device's electrical behavior.

As we model the IM state by splitting the FL into AP-state and P-state regions (see Fig. 5), electrons can go through via either the P-state region or the AP-state region in an electric field. Therefore, the overall conductance of IM state is the sum of the conductance of these two parallel regions. By replacing the conductance with the reciprocal of resistance, we derive:

$$R_{\rm IM}(A_{\rm IMP}) = \frac{R_{\rm P} \cdot R_{\rm AP}}{R_{\rm P} \cdot (1 - A_{\rm IMP}) + R_{\rm AP} \cdot A_{\rm IMP}}.\tag{5}$$

 $R_{\rm P}$  and  $R_{\rm AP}$  are both dependent on the bias voltage  $V_{\rm MTJ}$  applied across the MTJ device. Fig. 6a shows the measured R-V loop of the MTJ device A (see also Fig. 3); the red solid curves are fitting curves used to extract the exact resistance at a given bias voltage with the physical model in [9]. With  $R_{\rm P}$  and  $R_{\rm AP}$  extracted from measurement data, we can calculate  $R_{\rm IM}$  for different  $A_{\rm IMP}$  values using Equation (5); the results are shown in Fig. 6b for  $V_{\rm MTJ}$ = 10 mV, 300 mV, and 700 mV.
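Equation (5) and its inverse are short enough to sketch in Python. The inverse, $A_{\rm IMP} = R_{\rm P}(R_{\rm AP} - R_{\rm IM}) / (R_{\rm IM}(R_{\rm AP} - R_{\rm P}))$, follows from solving Equation (5) for $A_{\rm IMP}$. The function names and the resistance values below are hypothetical; in practice $R_{\rm P}$ and $R_{\rm AP}$ would be the bias-dependent values extracted from the R-V loop.

```python
def r_im(a_imp, r_p, r_ap):
    """Eq. (5): parallel combination of the P-state and AP-state regions."""
    return (r_p * r_ap) / (r_p * (1.0 - a_imp) + r_ap * a_imp)

def a_imp_from_r_im(r_im_value, r_p, r_ap):
    """Invert Eq. (5) to recover A_IMP from a measured IM resistance."""
    return r_p * (r_ap - r_im_value) / (r_im_value * (r_ap - r_p))

# Hypothetical R_P and R_AP at a fixed read bias (ohms), for illustration.
R_P, R_AP = 1000.0, 2000.0
r_mid = r_im(0.48, R_P, R_AP)
print(f"R_IM = {r_mid:.1f} ohm, recovered A_IMP = {a_imp_from_r_im(r_mid, R_P, R_AP):.2f}")
```

The limiting cases match the defect-free states: `r_im(1.0, ...)` returns $R_{\rm P}$ and `r_im(0.0, ...)` returns $R_{\rm AP}$.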

To map the IM state defect model to  $I_c$ , we modify the equation of  $I_c$  in the STT-switching model as follows [15].

$$I_{\rm c}(A_{\rm IMP}) = \begin{cases} \frac{1}{\eta} \frac{2\alpha e}{\hbar} M_{\rm s} H_{\rm k} t_{\rm FL} A_0 A_{\rm IMP}, & \text{IM}(P) \to AP\\ \frac{1}{\eta} \frac{2\alpha e}{\hbar} M_{\rm s} H_{\rm k} t_{\rm FL} A_0 (1 - A_{\rm IMP}), & \text{IM}(AP) \to P \end{cases}\tag{6}$$



Fig. 7. Curve fitting of  $P_{\rm ST}$  and  $P_{\rm IM}$  to measurement data.

In this equation,  $\eta$  is the STT efficiency,  $\alpha$  the magnetic damping constant, e the elementary charge, and  $\hbar$  the reduced Planck constant. When  $A_{\text{IMP}}=1$  (indicating P state), the above equation collapses to the original equation for  $I_c(P \rightarrow AP)$ . When  $A_{\text{IMP}} \in (0, 1)$  (indicating IM state),  $I_c(\text{IM} \rightarrow AP)$  is smaller than  $I_c(P \rightarrow AP)$  because only the P-state region in the FL needs to flip. A similar interpretation holds for IM(AP)$\rightarrow$P switching. Note that the switching from P or AP state to IM state is governed by the aforementioned statistical model in Equations (2)-(4).
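Since every factor in Equation (6) except the area fraction is shared with the defect-free expression, the equation reduces to scaling the full-FL critical current. The Python sketch below expresses it that way to stay independent of the unit system; `ic_full` stands for the prefactor $\frac{1}{\eta}\frac{2\alpha e}{\hbar} M_{\rm s} H_{\rm k} t_{\rm FL} A_0$, and the function name, direction labels, and the 50 µA value are hypothetical.

```python
def critical_current(a_imp, ic_full, direction):
    """Eq. (6): critical current for leaving an IM state.

    ic_full is the critical current of the whole FL (the A_IMP-independent
    prefactor in Eq. (6)), in amperes.
    """
    if not 0.0 <= a_imp <= 1.0:
        raise ValueError("A_IMP must lie in [0, 1]")
    if direction == "IM(P)->AP":
        return ic_full * a_imp          # only the P-state region must flip
    if direction == "IM(AP)->P":
        return ic_full * (1.0 - a_imp)  # only the AP-state region must flip
    raise ValueError(f"unknown switching direction: {direction}")

# Hypothetical full-FL critical current of 50 uA, for illustration only.
ic_to_ap = critical_current(0.48, 50e-6, "IM(P)->AP")
ic_to_p = critical_current(0.48, 50e-6, "IM(AP)->P")
print(f"{ic_to_ap * 1e6:.1f} uA to reach AP, {ic_to_p * 1e6:.1f} uA to reach P")
```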

To model the changes in  $t_w$  due to an IM state defect, we use Sun's model for the STT switching behavior [9].

$$\mu(t_{\rm w}) = \left(\frac{2}{C + \ln(\frac{\pi^2 \Delta}{4})} \cdot \frac{\mu_{\rm B} P}{e \cdot m \cdot (1 + P^2)} \cdot I_{\rm d}\right)^{-1},\tag{7}$$

$$I_{\rm d} = \frac{V_{\rm MTJ}}{R(V_{\rm MTJ})} - I_{\rm c}(A_{\rm IMP}),\tag{8}$$

$$t_{\rm w} \sim \mathcal{N}(\mu(t_{\rm w}), \, \sigma(t_{\rm w})^2).\tag{9}$$

Here,  $C \approx 0.577$  is Euler's constant,  $\Delta$  the thermal stability in P, AP, or IM state depending on the switching direction,  $\mu_{\rm B}$  the Bohr magneton, P the spin polarization, and m the magnetic moment of the FL.  $V_{\rm MTJ}$  is the bias voltage across the MTJ device to switch its state.  $R(V_{\rm MTJ})$  is the resistance of the MTJ device; it shows a non-linear dependence on  $V_{\rm MTJ}$  (see Fig. 6). In addition, we assume that  $t_{\rm w}$  obeys a normal distribution at a given  $V_{\rm MTJ}$  as a model for the switching stochasticity.
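Equations (7)-(9) can be sketched in Python as follows, using SI units throughout. This is an illustration rather than the calibrated compact model: the function names are hypothetical, all device values in the example are made up, and the standard deviation passed to the sampler is an assumption because no expression for $\sigma(t_{\rm w})$ is given above.

```python
import math
import random

MU_B = 9.2740100783e-24     # Bohr magneton, J/T
E_CHARGE = 1.602176634e-19  # elementary charge, C
EULER_C = 0.5772156649      # Euler's constant C in Eq. (7)

def overdrive_current(v_mtj, r_mtj, i_c):
    """Eq. (8): I_d = V_MTJ / R(V_MTJ) - I_c(A_IMP), in amperes."""
    return v_mtj / r_mtj - i_c

def mean_switching_time(i_d, delta, spin_pol, moment):
    """Eq. (7): mean switching time in seconds; moment is m in A*m^2."""
    if i_d <= 0.0:
        raise ValueError("Eq. (7) needs a current above the critical current")
    rate = (2.0 / (EULER_C + math.log(math.pi ** 2 * delta / 4.0))
            * MU_B * spin_pol / (E_CHARGE * moment * (1.0 + spin_pol ** 2))
            * i_d)
    return 1.0 / rate

def sample_switching_time(mu, sigma, rng):
    """Eq. (9): draw t_w ~ N(mu, sigma^2)."""
    return rng.gauss(mu, sigma)

# Hypothetical operating point, for illustration only.
i_d = overdrive_current(v_mtj=0.5, r_mtj=1000.0, i_c=100e-6)
mu = mean_switching_time(i_d, delta=60.0, spin_pol=0.6, moment=1.2e-17)
rng = random.Random(1)
samples = [sample_switching_time(mu, 0.1 * mu, rng) for _ in range(1000)]
print(f"mean t_w = {mu * 1e9:.2f} ns over {len(samples)} samples")
```

Doubling the overdrive current halves the mean switching time, which reflects the statement that a larger current above $I_{\rm c}$ gives faster switching.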

### C. Fitting and Model Optimization

We use the measured data of MTJ A to calibrate the obtained model as an example. First,  $R_{\rm P}$  and  $R_{\rm AP}$  can be extracted from R-V loops (see Fig. 6a). As the measured  $R_{\rm IM}$ =1050  $\Omega$  and the read bias is 10 mV, we can calculate the  $A_{\rm IMP}$  value based on the  $R_{\rm IM}$  model; the result is marked with the blue point ( $A_{\rm IMP}$ =0.48) in Fig. 6b. Second, the fitting results of  $P_{\rm ST}$  and  $P_{\rm IM}$  are shown in Fig. 7. On the positive side  $V_{\rm p}$ >0 for P $\rightarrow$ AP switching,  $S_{\rm lp}$ =1e-3,  $V_{\rm pk}$ =0.4369, and  $V_{\rm wd}$ =0.0145. On the negative side  $V_{\rm p}$ <0 for AP $\rightarrow$ P switching,  $S_{\rm lp}$ =3.9e-4,  $V_{\rm pk}$ =-0.7096, and  $V_{\rm wd}$ =0.0182.
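As a quick check of Equation (4) with the fitted slopes, the peak heights for MTJ A (nominal CD = 100 nm) can be worked out directly. This assumes that $S_{\rm lp}$ is expressed per nanometer of CD and that $P_{\rm IM}$ is a fraction rather than a percentage; neither unit is stated explicitly above.

$$H_{\rm IM}^{P \rightarrow AP} = 10^{-3} \cdot (100 - 60) = 0.04, \qquad H_{\rm IM}^{AP \rightarrow P} = 3.9 \times 10^{-4} \cdot (100 - 60) = 0.0156.$$

Under these assumptions, the modeled peak IM probability is about 4% for P$\rightarrow$AP switching and about 1.6% for AP$\rightarrow$P switching, reached at $V_{\rm p} = V_{\rm pk}$ in each direction.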

The output of device-aware defect modeling is a calibrated Verilog-A MTJ compact model. After verifying the MTJ model in Python, we moved this model to Verilog-A so as to make it compatible with circuit simulators such as Cadence Spectre adopted in this paper for subsequent fault modeling.

TABLE II. FAULT PRIMITIVE NOTATION [4].



Fig. 8. Simulation result statistics of: (a)  $P_{\rm ST}$  and (b)  $P_{\rm IM}$  in 0w1 operations at varying word line voltage  $V_{\rm WL}$  and pulse width  $t_{\rm p}.$ 

## V. DEVICE-AWARE FAULT MODELING OF IM STATE

In this work, we limit the analysis to single-cell faults, which involve a single STT-MRAM cell. Single-cell faults can be systematically described by the *fault primitive* (FP) notation [4]:  $\langle S/F_n/R \rangle$, as shown in Table II. It describes the deviation of the observed memory behavior from the expected behavior.

### A. Simulation Set-up and Results

The simulation circuits comprise a  $3 \times 3$  1T-1MTJ array and peripheral circuits (e.g., write driver and sense amplifier). Process variations in transistors are lumped into the variation in the threshold voltage  $V_{\rm th}$ , which deviates by 10% from its nominal value at the  $3\sigma$  corners. For MTJ devices, our Verilog-A MTJ compact model with CD=100 nm is adopted; variations in the MTJ performance are taken into account by enabling: 1) switching stochasticity, 2) process variations, and 3) temperature variation from  $-40\,^{\circ}\text{C}$  to  $125\,^{\circ}\text{C}$  in the MTJ model. The defect injection is executed by replacing the defect-free MTJ model (with only P and AP states) located in the center of the array with a defective one (with P, AP, and IM states). The defect strength is configured by assigning a floating-point number to  $A_{\text{IMP}} \in (0, 1)$  as an input parameter of the MTJ model; it is swept from 0 to 1 in 100 steps in the simulations. The remaining eight MTJs surrounding the central one are always defect-free. Since the simulation overhead is immense due to Monte Carlo simulations (2k cycles) and sweeping multiple variables  $(A_{\rm IMP}, V_{\rm WL}, t_{\rm p})$, we performed circuit simulations in a cluster with eight compute nodes to speed up the simulation by exploiting job-level parallelism.
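One way to organize such a sweep for job-level parallelism is to enumerate every parameter combination as an independent job and deal the jobs out to the compute nodes. The Python sketch below only illustrates that bookkeeping; it is not the scripts used for the simulations. The $V_{\rm WL}$ and $t_{\rm p}$ grids are hypothetical, and interpreting "100 steps" as 101 grid points is an assumption.

```python
from itertools import product

def linspace(start, stop, num):
    """Evenly spaced grid of num points including both end points."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

def build_sweep(a_imp_values, v_wl_values, t_p_values):
    """One job per (A_IMP, V_WL, t_p) combination."""
    return list(product(a_imp_values, v_wl_values, t_p_values))

def partition(jobs, num_nodes):
    """Round-robin split of the job list across compute nodes."""
    return [jobs[i::num_nodes] for i in range(num_nodes)]

a_imp_grid = linspace(0.0, 1.0, 101)      # 0 to 1 in 100 steps (assumed 101 points)
v_wl_grid = [1.0, 1.2, 1.4, 1.6, 1.8]     # volts, hypothetical
t_p_grid = [5e-9, 10e-9, 15e-9, 20e-9]    # seconds, hypothetical
jobs = build_sweep(a_imp_grid, v_wl_grid, t_p_grid)
per_node = partition(jobs, num_nodes=8)   # eight compute nodes
print(len(jobs), [len(chunk) for chunk in per_node])
```

Each job would then run its own Monte Carlo cycles in the circuit simulator, so no communication between nodes is needed until the results are collected.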

Fig. 8a shows the simulation result statistics of S=0w1 at varying  $V_{WL}$  and  $t_p$  in the defect-free case. The successful transition probability  $P_{ST}$  rises from 0% (red area) to 100% (blue area) as  $V_{WL}$  and  $t_p$  increase. However, one can observe that the indeterministic switching area occupies a large area, which poses a big design challenge for reliable and deterministic write operations in STT-MRAMs. Typically, this issue can be addressed by circuit-level designs such as boosting  $V_{WL}$ , write-verify-write, and self-write-termination schemes [3]. Therefore, failed transitions due to switching stochasticity are not considered as memory faults here.


TABLE III. FAULT MODELING RESULTS OF IM STATE DEFECT USING OUR DEVICE-AWARE (DA) MODEL.

| Defect model | $A_{\rm IMP}$ | Sensitized FP | FP name and abbreviation | Detection condition |
|--------------|---------------|---------------|--------------------------|---------------------|
| DA model | [0.30, 0.61] | $\langle 0w1/U_i/-\rangle$ | Intermittent write transition fault: W1TFU<sub>i</sub> | DfT |
| DA model | [0.30, 0.61] | $\langle 1w0/U_i/-\rangle$ | Intermittent write transition fault: W0TFU<sub>i</sub> | DfT |

Fig. 8b shows the IM state statistics in S=0w1 operations at varying  $V_{\rm WL}$  and  $t_{\rm p}$  in the defective case ( $A_{\rm IMP}$=0.48). It can be seen that the IM state shows up with different probability  $P_{\rm IM}$  in a large area of the contour map, especially in the area where  $P_{\rm ST}$  is around 50%. The closer the operating point is to the top-right corner, the less likely an IM state is and the more likely a successful transition becomes. However, large  $V_{\rm WL}$  and  $t_{\rm p}$  come with severe costs: 1) back-hopping, 2) large energy consumption, 3) long write latency, and 4) reduced endurance. Hence, in practice, a trade-off has to be made, and a flexible write scheme with  $V_{\rm WL}$  and  $t_{\rm p}$  configurable in the field is more desirable.

From a test perspective, the occurrence of the IM state may shift the MTJ resistance into an undefined region (i.e., 'U' state), thus leading to memory faults. Table III lists the fault modeling results due to the IM state defect. When  $A_{\rm IMP} \in [0.30, 0.61]$ , two fault primitives were observed:  $\langle 0w1/U_i/-\rangle$ and  $\langle 1w0/U_i/-\rangle$ . The intermittent write transition fault W1TFU<sub>i</sub>= $\langle 0w1/U_i/-\rangle$  means that an up-transition operation on a memory cell with initial state '0' transforms the memory cell into a 'U' state with a certain probability. The probability depends on  $V_{\rm WL}$  and  $t_{\rm p}$ , as shown in Fig. 8b. Since these two FPs both involve the 'U' state, a read operation cannot guarantee their detection; the readout results can be random. Thus, these two FPs are hard-to-detect faults, which require special Design-for-Testability (DfT) solutions to detect them.
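The mapping from simulated write cycles to an FP of the form $\langle S/F_n/R \rangle$ can be sketched in Python as follows. This is an illustration, not the fault analysis flow itself: the class and function names are hypothetical, only the 'U' state is treated as faulty (failed transitions due to switching stochasticity are excluded, as discussed earlier), and the label `p` for a fault that occurs in every cycle is an assumption, since only the intermittent label `i` appears above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FaultPrimitive:
    sensitizer: str    # S, e.g. "0w1"
    faulty_state: str  # F, e.g. "U"
    nature: str        # n, "i" for intermittent
    read_output: str   # R, "-" when S does not end with a read

    def __str__(self):
        return f"<{self.sensitizer}/{self.faulty_state}{self.nature}/{self.read_output}>"

def classify_write(sensitizer, observed_states):
    """Return the FP sensitized by repeated write cycles, or None.

    observed_states holds the final cell state of each cycle: '0', '1', or 'U'.
    """
    u_count = observed_states.count("U")
    if u_count == 0:
        return None
    # 'i' if the 'U' state shows up only in some cycles; 'p' (assumed label) if in all.
    nature = "i" if u_count < len(observed_states) else "p"
    return FaultPrimitive(sensitizer, "U", nature, "-")

fp = classify_write("0w1", ["1", "1", "U", "1", "0", "1"])
print(fp)
```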

### B. Comparison to the Conventional Resistive Model

We also performed fault modeling using the conventional resistive model for the IM state defect. Unlike the DA model, the resistive model assumes that a defect in an MTJ device can be modeled as a resistor either in series with  $(R_{\rm sd})$  or in parallel to  $(R_{\rm pd})$  an ideal MTJ device. We added  $R_{\rm sd}$  and  $R_{\rm pd}$  separately to the defect-free netlist and performed the same fault analysis process; the resistance was swept from  $10^9 \ \Omega$.

Fig. 9 shows a Venn diagram which compares the fault modeling results of our DA model and the conventional resistive model. Clearly, the DA model leads to two hard-to-detect faults while the resistive model results in three easy-to-detect faults. There is no overlap between the two circles. This means that the IM state defect in MTJ devices exhibits unique faulty behaviors which cannot be covered by the resistor-based defect models. The two FPs sensitized using our DA model are intermittent and involve the 'U' state, which makes them hard to detect with March tests. In contrast, the resistive models result in only easy-to-detect faults, since the MTJ device is considered as an ideal black box and thus only '0' and '1' states are observed in the simulations.



Fig. 9. Comparison of sensitized fault primitives using our device-aware model and the conventional resistive model for IM state defects.

The above comparison clearly indicates that test algorithms developed with the conventional approach cannot guarantee the detection of IM state defects and may also waste test time and resources. Hence, DAT can be a unique and complementary approach that specifically targets MTJ-internal defects and analyzes their faulty behaviors.

## VI. CONCLUSION

This paper presents a comprehensive characterization of the IM state defect in STT-MRAM devices. The occurrence probability of the defect depends on bias voltage, device size, switching direction, and FL thickness. The paper also demonstrates that the traditional fault modeling and test approach fails to accurately model the functional behavior of this defect; hence, it fails to detect such a defect during manufacturing tests. Therefore, a new fault modeling and test approach is required. The use of DAT suggests that an IM state defect leads to intermittent write transition faults. These are hard-to-detect faults and require special DfT solutions to be detected.

## REFERENCES

- [1] K. Lee *et al.*, "1Gbit high density embedded STT-MRAM in 28nm FDSOI technology," in *IEDM*, Dec. 2019, pp. 2.2.1–2.2.4.
- [2] W.J. Gallagher *et al.*, "22nm STT-MRAM for reflow and automotive uses with high yield, reliability, and magnetic immunity and with performance and shielding options," in *IEDM*, 2019, pp. 2.7.1–2.7.4.
- [3] L. Wei et al., "A 7Mb STT-MRAM in 22FFL FinFET technology with 4ns read sensing time at 0.9V using write-verify-write scheme and offset-cancellation sensing technique," in *ISSCC*, 2019, pp. 214–216.
- [4] L. Wu et al., "Defect and fault modeling framework for STT-MRAM testing," IEEE Trans. Emerg. Topics Comput., pp. 1–14, Dec. 2019.
- [5] J. Azevedo et al., "A complete resistive-open defect analysis for thermally assisted switching MRAMs," TVLSI, vol. 22, Nov. 2014.
- [6] I. Yoon et al., "EMACS: efficient MBIST architecture for test and characterization of STT-MRAM arrays," in ITC, Nov. 2016, pp. 1–10.
- [7] S.M. Nair *et al.*, "Defect injection, fault modeling and test algorithm generation methodology for STT-MRAM," in *ITC*, Oct. 2018, pp. 1–10.
- [8] L. Wu *et al.*, "Electrical modeling of STT-MRAM defects," in *ITC*, Oct. 2018, pp. 1–10.
- [9] L. Wu *et al.*, "Pinhole defect characterization and fault modeling for STT-MRAM testing," in *ETS*, May 2019, pp. 1–6.
- [10] M. Fieback *et al.*, "Device-aware test: a new test approach towards DPPB," in *ITC*, Nov. 2019, pp. 1–10.
- [11] L. Wu et al., "Characterization, modeling and test of synthetic antiferromagnet flip defect in STT-MRAMs," in *ITC*, 2020, pp. 1–10.
- [12] T. Devolder *et al.*, "Size dependence of nanosecond-scale spin-torque switching in perpendicularly magnetized tunnel junctions," *Phys. Rev. B*, vol. 93, p. 224432, Jun. 2016.
- [13] O. Golonzka et al., "MRAM as embedded non-volatile memory solution for 22FFL FinFET technology," in *IEDM*, Dec. 2018, pp. 18.1.1–18.1.4.
- [14] X. Zhang et al., "Skyrmions in magnetic tunnel junctions," ACS Appl. Mater. Interfaces, vol. 10, pp. 16887–16892, May 2018.
- [15] A.V. Khvalkovskiy *et al.*, "Basic principles of STT-MRAM cell operation in memory arrays," *J. Phys. D: Appl. Phys*, vol. 46, p. 139601, Feb. 2013.
- [16] X. Yao *et al.*, "Observation of intermediate states in magnetic tunnel junctions with composite free layer," *IEEE Trans. Magnetics*, vol. 44, pp. 2496–2499, Nov. 2008.