# Testing STT-MRAM: Manufacturing Defects, Fault Models, and Test Solutions

Lizhou Wu<sup>∗</sup> Siddharth Rao‡ Mottaqiallah Taouil∗† Erik Jan Marinissen‡ Gouri Sankar Kar‡ Said Hamdioui∗†  $\dagger$  CognitiveIC, Delft, The Netherlands <sup>‡</sup>IMEC, Leuven, Belgium njuwulizhou@gmail.com {M.Taouil, S.Hamdioui}@tudelft.nl {Siddharth.Rao, Erik.Jan.Marinissen, Gouri.Kar}@imec.be

*Abstract*—STT-MRAM is one of the most promising emerging non-volatile memory technologies. As its mass production and industrial deployment are around the corner, high-quality yet cost-efficient manufacturing test solutions are crucial to ensure the required quality of products shipped to end customers. This paper focuses on STT-MRAM testing, covering three abstraction levels: manufacturing defects, fault models, and test solutions. We first survey the STT-MRAM manufacturing defect space and apply the conventional resistor-based test approach to develop test solutions. We then demonstrate with silicon measurements that this approach fails to appropriately model and test defects in the STT-MRAM storage devices, i.e., magnetic tunnel junctions (MTJs), although it remains adequate for interconnect/contact defects. Therefore, we propose a new test approach, device-aware test (DAT), to specifically target device-internal defects. We apply DAT to three key types of MTJ defects: pinhole, synthetic anti-ferromagnet flip, and intermediate state defects. After developing accurate defect models and calibrating them with silicon data, we perform comprehensive fault analyses based on SPICE circuit simulations to derive accurate and realistic fault models. Some faults unique to STT-MRAM are identified, including both permanent and intermittent faults. Based on the obtained fault models, high-quality test solutions are proposed. In addition, this paper proposes a magnetic coupling model and a magnetic-field-aware compact MTJ model for fast and robust STT-MRAM designs.

## I. INTRODUCTION

Spin-transfer torque magnetic random access memory (STT-MRAM) is one of the most promising emerging memory technologies, thanks to its advantageous features: non-volatility, fast access speed, high endurance, nearly zero leakage power, and CMOS compatibility [1]. The flexible trade-off between write speed, endurance, and retention also allows it to be tailored to different layers of the present memory hierarchy, ranging from high-retention storage to high-performance caches [2]. Therefore, STT-MRAM has stimulated several start-ups (e.g., Everspin [3], Avalanche [4]) and major semiconductor companies worldwide (e.g., Samsung [5], TSMC [2]) to commercialize this technology. Nevertheless, to enable high-volume production of STT-MRAM, high-quality test solutions are paramount to meet the increasingly stringent quality requirements of IC chips shipped to end customers. The STT-MRAM manufacturing process involves not only the conventional CMOS process but also magnetic tunnel junction (MTJ) fabrication and integration [5]. The latter is more vulnerable to defects, as it requires deposition, etch, and integration of magnetic materials with new tools [6,7]. A blind application of conventional tests for existing memories such as SRAM and DRAM to STT-MRAM may lead to test escapes and yield loss. Hence, understanding MTJ-internal defects and their resultant faulty behaviors is crucial for developing high-quality STT-MRAM test solutions.

STT-MRAM testing is still in its infancy, with limited publications. In 2015, Chintaluri *et al.* [8,9] studied the faulty behaviors of STT-MRAM induced by resistive opens and shorts as well as extreme process variations. Based on circuit simulations, they derived six fault models: stuck-at fault, transition fault, incorrect read fault, read disturb fault, retention fault, and coupling fault. In 2016, the same research group presented a memory built-in self-test (MBIST) to detect these faults; furthermore, this MBIST design was also claimed to be capable of characterizing the retention time of STT-MRAM cells at affordable test time [10]. In 2018, Nair *et al.* [11] performed layout-aware defect injection and fault analyses, whereby they observed dynamic incorrect read faults; a test algorithm was also proposed in the same paper to detect all observed faults. More recently, Radhakrishnan *et al.* [12] developed and implemented a design-for-testability (DfT) scheme for STT-MRAM parametric testing and process optimization. The CMOS-based DfT circuit replicates the electrical characteristics of MTJ devices. They also extended this DfT design to monitor deviations in the electrical parameters of MTJ devices caused by the formation of aging defects over time [13].

Scanning the prior work on testing STT-MRAM or MRAM reveals four major limitations. First, *linear resistors* are used to model all manufacturing defects, including those in MTJ devices, which are the data-storing elements in STT-MRAMs. However, linear resistors (with only electrical properties) *cannot* reflect the impact of defects on the MTJ's magnetic properties, which are as important as the electrical ones. Second, there is a lack of characterization data on defective STT-MRAM cells/devices; such data is needed to understand the mechanisms, causes, locations, and impact of STT-MRAM defects. Third, existing fault modeling approaches are unsystematic, and the fault model terminology is ambiguous. For instance, Chintaluri *et al.* [9] refer to a failed transition write fault as *transition fault* (TF), while Vatajelu *et al.* [14] use the term *slow write fault* (SWF) to describe the same faulty behavior. In addition, the term *read disturb fault* (RDF) is used to describe different faulty behaviors with different failure mechanisms in [9] and [15]. Finally, the proposed test solutions in the prior art have never been implemented in real-world STT-MRAM prototype chips; therefore, their effectiveness in detecting STT-MRAM-specific defects has not yet been validated with silicon data.

Fig. 1. (a) MTJ stack, (b) cross-sectional TEM image, and (c) 1T-1MTJ cell.

This paper addresses the above-mentioned limitations in STT-MRAM testing at all three abstraction levels: manufacturing defects, fault models, and test solutions. The main contributions are listed as follows.

- Survey the STT-MRAM manufacturing process and define a complete space of STT-MRAM defects [6,16].
- Propose device-aware test (DAT), a new test approach targeting defective-parts-per-billion (DPPB) quality levels [17,18].
- Characterize, analyze, and model three key types of MTJ-internal defects using device-aware defect modeling based on silicon measurements [19–21].
- Develop fault models and test solutions for both MTJ-internal defects and interconnect defects [21–23].
- Analyze STT-MRAM robustness by characterizing, modeling, and evaluating magnetic coupling effects based on silicon measurements [24].
- Develop a magnetic-field-aware compact model of the pMTJ (named the MFA-MTJ model) in Verilog-A for electrical/magnetic co-simulation of STT-MRAMs [25].

The remainder of this paper is organized as follows. Section II provides a background for STT-MRAM. Section III introduces STT-MRAM manufacturing defects, the conventional test and its limitations. Section IV presents the DAT approach. Section V, Section VI, and Section VII apply DAT to pinhole, SAFF, and IM state defects, respectively. Section VIII details the magnetic coupling model and Section IX elaborates the MFA-MTJ model. Section X concludes this paper.

## II. BACKGROUND

This section briefly introduces the MTJ device technology and the most widely-used 1T-1MTJ cell design.

#### *A. MTJ Device Organization*

The *magnetic tunnel junction* (MTJ) is the fundamental component of STT-MRAMs. It acts as the data-recording element, in which one bit of data is encoded in the relative magnetization directions of its ferromagnetic layers. Fig. 1a shows its stack organization, which fundamentally comprises four layers [26].

The top layer is called the *free layer* (FL), which is composed of CoFeB-based materials. The typical thickness of the FL is 1.5 nm [26]. The FL's magnetization can be reversed by a spin-polarized current going through it. The *tunnel barrier* (TB) is the second layer, below the FL; it is typically made of MgO. As the TB layer is ultra-thin (∼1 nm) [26], electrons have a chance to tunnel through it, overcoming its *potential barrier height* $\overline{\varphi}$ [6]. This makes the device behave as a tunneling-like resistor. The third layer is called the *reference layer* (RL), which is based on CoFeB and is typically 2 nm thick. The RL has a magnetization fixed in a certain direction to provide a reference for the magnetization in the FL, as shown in the device schematic. Due to the *tunneling magneto-resistance* (TMR) effect [27], the MTJ's resistance is relatively low ($R_P$) when the magnetization of the FL is parallel to that of the RL, and relatively high ($R_{AP}$) in the antiparallel state. The bottom layer is Co/Ni-based (∼5 nm); it is commonly referred to as the *hard layer* (HL). Its function is to strongly pin the magnetization in the RL by means of the anti-ferromagnetic coupling effect [26].

TABLE I. KEY TECHNOLOGY AND ELECTRICAL PARAMETERS OF MTJ.

| Symbol | <b>Technology Parameters</b> | Symbol | <b>Electrical Parameters</b> |
|--------|------------------------------|--------|------------------------------|
| $A_0$ | Cross-sectional area of MTJ | $R_P$ | Resistance in P state |
| $M_s$ | Saturation magnetization of the FL | $R_{AP}$ | Resistance in AP state |
| $H_k$ | Magnetic anisotropy field of the FL | $I_c(\text{P}\rightarrow\text{AP})$ | $\text{P}\rightarrow\text{AP}$ critical switching current |
| $\overline{\varphi}$ | Potential barrier height of the TB | $I_c(\text{AP}\rightarrow\text{P})$ | $\text{AP}\rightarrow\text{P}$ critical switching current |
| $RA$ | Resistance-area product | $t_{\rm w}(\text{P}\rightarrow\text{AP})$ | $\text{P}\rightarrow\text{AP}$ switching time |
| TMR | Tunneling magneto-resistance ratio | $t_{\rm w}(\text{AP}\rightarrow\text{P})$ | $\text{AP}\rightarrow\text{P}$ switching time |
| $H_{\rm stray}$ | Stray field at the FL | | |

It is worth noting that the RL and HL together form a *synthetic anti-ferromagnet* (SAF) structure [26], which as a whole is sometimes also referred to as the *pinned layer* (PL). Fig. 1b shows the cross-sectional TEM image of a $\approx$ 55 nm MTJ device fabricated at IMEC. The key technology parameters of the MTJ are listed in Table I.

#### *B. 1T-1MTJ Cell Design*


Fig. 1c shows a bottom-pinned 1T-1MTJ memory cell and its corresponding voltage configurations for read/write (R/W) operations. The three-terminal cell includes an MTJ device (storage element) and an NMOS transistor (access selector). The three terminals are connected to a bit line (BL), a source line (SL), and a word line (WL), as shown in the figure.

The voltages on the BL and SL control R/W operations on the cell when the WL is asserted. For instance, a write '0' operation requires the BL at $V_{\text{DD}}$ and the SL grounded, which leads to a current $I_{\rm w0}$ flowing from the BL to the SL. In contrast, a current $I_{\rm w1}$ in the opposite direction flows through the cell during a write '1' operation. To guarantee a successful transition of the MTJ state, the magnitude of the write current (both $I_{\rm w0}$ and $I_{\rm w1}$) has to be larger than the MTJ's *critical switching current* $I_c$. The further the current exceeds $I_c$, the faster the switching can be. Due to the bias dependence of the STT efficiency and stray fields [27], $I_c(\text{P}\rightarrow\text{AP})$ can differ significantly from $I_c(\text{AP}\rightarrow\text{P})$ in practice. It is worth noting that the *actual switching time* $t_{\rm w}$ under a fixed pulse varies from cycle to cycle, since STT-induced magnetization switching is intrinsically stochastic [1]. During a read operation, a voltage $V_{\text{read}}$ significantly smaller than $V_{\text{DD}}$ is applied to the BL to draw a read current $I_{\text{rd}}$, which can be as small as $\sim$10 µA or 0.06$I_c$ [28]; a sense amplifier then uses this current to read the resistive state of the MTJ device.

Table I lists the key technology and electrical parameters of MTJ device to be used for the DAT-based defect modeling.
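As a rough illustration of how the technology parameters in Table I relate to the electrical ones, the sketch below uses the standard first-order, low-bias relations $R_P = RA/A$ and $R_{AP} = R_P\,(1+\text{TMR})$ for a circular MTJ. The function name and all numeric values are placeholder assumptions chosen for illustration, not measured data from this work; bias and temperature dependence are ignored.

```python
import math
from typing import Tuple


def mtj_resistances(ra_ohm_um2: float, tmr: float, diameter_nm: float) -> Tuple[float, float]:
    """Return (R_P, R_AP) in ohms for a circular MTJ at near-zero bias.

    ra_ohm_um2  : resistance-area product RA in ohm * um^2
    tmr         : TMR ratio as a fraction (1.5 means 150%)
    diameter_nm : MTJ diameter in nm
    """
    radius_um = diameter_nm * 1e-3 / 2.0
    area_um2 = math.pi * radius_um ** 2   # cross-sectional area of the MTJ
    r_p = ra_ohm_um2 / area_um2           # parallel-state resistance
    r_ap = r_p * (1.0 + tmr)              # antiparallel-state resistance
    return r_p, r_ap


# Hypothetical example values (assumptions, not silicon data).
r_p, r_ap = mtj_resistances(ra_ohm_um2=5.0, tmr=1.5, diameter_nm=55.0)
print(f"R_P  = {r_p:.0f} ohm")
print(f"R_AP = {r_ap:.0f} ohm")
```

A defect that changes RA, TMR, or the effective area therefore shifts both resistive states, which is one reason the defect modeling later in the paper works at the level of technology parameters.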



TABLE II. STT-MRAM DEFECTS AND CLASSIFICATION.

| FEOL: Transistor | BEOL: Interconnect | BEOL: <b>MTJ</b> Device |
|------------------|--------------------|-------------------------|
| Material impurity<br>Crystal imperfection<br>Pinholes in gate oxides<br>Shifting of dopants<br>Patterning proximity<br>etc. | Open vias/contacts<br>Irregular shapes<br>Big bubbles<br>Small particles<br>etc. | Pinholes in TB<br>Redepositions on MTJ sidewalls<br>Synthetic anti-ferromagnet flip<br>Intermediate states<br>Back-hopping<br>Extreme thickness variation of TB<br>MgO/CoFeB interface roughness<br>Atom inter-diffusion<br>Magnetic layer corrosion<br>etc. |

## III. DEFECT SPACE AND CONVENTIONAL TEST

#### *A. STT-MRAM Manufacturing Process and Defects*

A defect is a physical imperfection in manufactured chips (i.e., an unintended difference from the intended design) [29]. To guarantee a high-quality test solution and improve the manufacturing process itself so as to improve yield, understanding all potential defects is of great importance. The STT-MRAM manufacturing process mainly consists of the standard CMOS fabrication steps and the integration of MTJ devices into metal layers. Fig. 2 shows the bottom-up manufacturing flow and the vertical structure of STT-MRAM cells [30]. Based on the manufacturing phase, STT-MRAM defects can be classified into front-end-of-line (FEOL) and back-end-of-line (BEOL) defects. As MTJs are integrated into metal layers during BEOL processing, BEOL defects can be further categorized into interconnect defects and MTJ-internal defects. All potential defects are listed in Table II; the detailed explanations of these defects can be found in [31].

#### *B. Conventional Test for STT-MRAM*

Defect modeling is the first critical step in the test development process. It abstracts physical defects and presents them at electrical level so as to be processed by circuit simulators such as SPICE. Traditionally, a spot defect in an electronic circuit is modeled as a linear resistor (e.g., open or bridge), and the defect strength is represented by its resistance value [10]. Fig. 3 shows how the resistive models are used to model defects in interconnects and contacts of an STT-MRAM cell.

The resistive models in Fig. 3 are used to develop appropriate fault models. Table III presents the fault modeling results for all resistive bridges. For instance, the resistive bridge $BC_{\text{SL-IN}}$ (which connects the SL to the internal cell node, as shown in Fig. 3) results in an incorrect read fault IRF1=$\langle 1r1/1/0 \rangle$ [22] when the resistance is below $13\,\text{k}\Omega$. Detecting IRF1 requires reading '1' from each memory cell, denoted as the march element $\Updownarrow(\ldots 1, r1, \ldots)$. If the resistance is larger than $13\,\text{k}\Omega$, it leads to a weak fault. The complete fault modeling results can be found in the thesis [31].

TABLE III. SINGLE-CELL STATIC FAULT MODELING RESULTS OF RESISTIVE BRIDGES.

| <b>Defect</b> | <b>Resistance</b> ($\Omega$) | <b>Sensitized FP</b> | <b>Fault Model & FP Name</b> | <b>Detection Condition</b> |
|---------------|------------------------------|----------------------|------------------------------|----------------------------|
| $BC_{\text{SL-IN}}$ | [0, 13k) | $\langle 1r1/1/0 \rangle$ | Incorrect Read Fault: IRF1 | $\Updownarrow(\ldots 1, r1, \ldots)$ |
| $BC_{\text{BL-IN}}$ | [0, 1.1k) | $\langle 1r1/1/0 \rangle$ | Incorrect Read Fault: IRF1 | $\Updownarrow(\ldots 1, r1, \ldots)$ |
| | | $\langle 0w1/0/- \rangle$ | Transition Fault: TF0 | $\Updownarrow(\ldots 0, w1, r1, \ldots)$ |
| | | $\langle 1w0/1/- \rangle$ | Transition Fault: TF1 | $\Updownarrow(\ldots 1, w0, r0, \ldots)$ |
| | [1.1k, 3.1k) | $\langle 1r1/1/0 \rangle$ | Incorrect Read Fault: IRF1 | $\Updownarrow(\ldots 1, r1, \ldots)$ |
| | | $\langle 1w0/1/- \rangle$ | Transition Fault: TF1 | $\Updownarrow(\ldots 1, w0, r0, \ldots)$ |
| $BC_{\text{WL-SL}}$ | [0, 5.6k) | $\langle 0r0/0/1 \rangle$ | Incorrect Read Fault: IRF0 | $\Updownarrow(\ldots 0, r0, \ldots)$ |
| | | $\langle 1w0/1/- \rangle$ | Transition Fault: TF1 | $\Updownarrow(\ldots 1, w0, r0, \ldots)$ |
| | [5.6k, 56.1k] | $\langle 0r0/0/1 \rangle$ | Incorrect Read Fault: IRF0 | $\Updownarrow(\ldots 0, r0, \ldots)$ |
| $BC_{\text{WL-IN}}$ | [0, 7.7k) | $\langle 0r0/0/1 \rangle$ | Incorrect Read Fault: IRF0 | $\Updownarrow(\ldots 0, r0, \ldots)$ |
| | | $\langle 1w0/1/- \rangle$ | Transition Fault: TF1 | $\Updownarrow(\ldots 1, w0, r0, \ldots)$ |
| | [7.7k, 13.1k) | $\langle 0r0/0/1 \rangle$ | Incorrect Read Fault: IRF0 | $\Updownarrow(\ldots 0, r0, \ldots)$ |

Based on the previous fault analysis results, appropriate test solutions can be developed. All easy-to-detect faults can be detected by March tests. To minimize the test cost, the minimal detection condition for each resistance (defect strength) range is first identified. Thereafter, all the detection conditions for all resistance ranges are merged to obtain an optimal test algorithm. For instance, our experimental results suggest that March C- [32,33] can be used to detect all observed faults.
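To make the link between single-cell fault models and March tests concrete, the following is a minimal behavioral sketch (not the SPICE-based flow used in this work). It runs the well-known March C- sequence over a small memory with one injected faulty cell. The class names, the memory size, and the purely logical cell models are illustrative assumptions; only the IRF1, TF0, and TF1 fault primitives are taken from Table III.

```python
class Cell:
    """Fault-free single-bit cell (purely logical model)."""

    def __init__(self) -> None:
        self.state = 0

    def write(self, value: int) -> None:
        self.state = value

    def read(self) -> int:
        return self.state


class IRF1Cell(Cell):
    """Incorrect read fault <1r1/1/0>: reading '1' returns '0', state is kept."""

    def read(self) -> int:
        return 0 if self.state == 1 else self.state


class TF0Cell(Cell):
    """Transition fault <0w1/0/->: the 0 -> 1 write fails."""

    def write(self, value: int) -> None:
        if not (self.state == 0 and value == 1):
            self.state = value


class TF1Cell(Cell):
    """Transition fault <1w0/1/->: the 1 -> 0 write fails."""

    def write(self, value: int) -> None:
        if not (self.state == 1 and value == 0):
            self.state = value


# March C-: {any(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); any(r0)}
MARCH_C_MINUS = [
    ("any", ["w0"]),
    ("up", ["r0", "w1"]),
    ("up", ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("any", ["r0"]),
]


def run_march(memory, test) -> bool:
    """Return True if any read deviates from its expected value."""
    size = len(memory)
    for order, ops in test:
        addresses = range(size - 1, -1, -1) if order == "down" else range(size)
        for addr in addresses:
            for op in ops:
                value = int(op[1])
                if op[0] == "w":
                    memory[addr].write(value)
                elif memory[addr].read() != value:
                    return True  # fault detected
    return False


def make_memory(size: int, faulty_cell=None, faulty_addr: int = 0):
    memory = [Cell() for _ in range(size)]
    if faulty_cell is not None:
        memory[faulty_addr] = faulty_cell
    return memory


for name, cell in (("fault-free", None), ("IRF1", IRF1Cell()),
                   ("TF0", TF0Cell()), ("TF1", TF1Cell())):
    detected = run_march(make_memory(8, cell, faulty_addr=3), MARCH_C_MINUS)
    print(f"{name}: detected={detected}")
```

In this simplified model, each listed fault is caught because March C- contains the corresponding detection condition from Table III. Weak faults and coupling effects are outside the scope of the sketch.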

#### *C. Limitations of the Conventional Test Approach*

The conventional test approach assumes that any defect in a semiconductor device can be modeled as a linear resistor either in parallel to ($R_{\text{pd}}$) or in series with ($R_{\text{sd}}$) a defect-free MTJ device. The physical mechanism of the defect is never taken into account, so it is not reflected in the defect model. This can be seen in the prior work on STT-MRAM testing [9–11,16,34–36]. However, it has recently been demonstrated that this assumption is inaccurate for emerging technologies such as RRAM and STT-MRAM [18]; the results showed that the traditional approach may even lead to wrong fault models.

Fig. 4a shows the measured R-V loop of a good MTJ and Fig. 4b shows the measurement data of a defective MTJ. Due to the non-linear behavior of MTJ, it is impossible to model the impact of a physical defect on the R-V loop simply by adding a linear resistor. For an MTJ device, its magnetic properties are as important as electrical ones; but linear resistors are unable to capture defect-induced changes in magnetic properties [19]. Hence, a new test approach is required to develop high-quality yet cost-efficient test solutions for device-internal defects.
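For reference, the conventional injection discussed above amounts to simple resistor arithmetic, sketched below. The state resistances and defect resistances are hypothetical values chosen only for illustration. The point is that a linear resistor merely rescales the two resistive states (and thus the apparent TMR); nothing in this model can alter magnetic properties such as the stray field or the anisotropy field.

```python
def series_defect(r_mtj: float, r_sd: float) -> float:
    """Effective resistance with a linear resistor R_sd in series with the MTJ."""
    return r_mtj + r_sd


def parallel_defect(r_mtj: float, r_pd: float) -> float:
    """Effective resistance with a linear resistor R_pd in parallel to the MTJ."""
    return r_mtj * r_pd / (r_mtj + r_pd)


def effective_tmr(r_p: float, r_ap: float) -> float:
    """Apparent TMR ratio seen at the cell terminals."""
    return (r_ap - r_p) / r_p


# Hypothetical defect-free state resistances in ohms (assumptions).
R_P, R_AP = 2_000.0, 5_000.0

for label, inject, r_defect in (
    ("series   R_sd = 1 kOhm ", series_defect, 1_000.0),
    ("parallel R_pd = 10 kOhm", parallel_defect, 10_000.0),
):
    r_p_eff = inject(R_P, r_defect)
    r_ap_eff = inject(R_AP, r_defect)
    print(f"{label}: R_P={r_p_eff:.0f}, R_AP={r_ap_eff:.0f}, "
          f"TMR={effective_tmr(r_p_eff, r_ap_eff):.2f}")
```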



Fig. 6. Device-aware test steps: (a) generic device-aware defect modeling flow, (b) fault analysis procedure, and (c) device-aware test development.

## IV. DEVICE-AWARE TEST (DAT) APPROACH

To overcome the limitations of the conventional test approach, we propose a new test approach, which we call *device-aware test* (DAT) [18]. The DAT flow, shown in Fig. 5, consists of the following three steps.

*1) Device-aware defect modeling.* Fig. 6a shows a three-step modeling flow. Its inputs are: (a) a defect-free MTJ compact model and (b) a defect under investigation. The output is an optimized (parameterized) defective MTJ compact model. Note that the device can also be an RRAM device, a PCM device, a planar or FinFET transistor, etc. The approach consists of the following three sub-steps. First, the defect is physically analyzed and characterized to understand its forming mechanism, location, occurrence rate, and the key technology parameters that it impacts. Thereafter, the effects of the defect are quantitatively incorporated into these technology parameters. Second, the defect-induced changes in the technology parameters are mapped onto the device's electrical parameters. This allows us to convert the defect-free device model into a parameterized defective model. Third, the obtained model can be further calibrated by fitting it to silicon data, if available.

*2) Device-aware fault modeling.* First, a complete fault space that describes all possible faults in emerging memories is defined. This is achieved by extending the conventional *fault primitive* (FP) notation $\langle S/F/R \rangle$ [37]; the complete fault space can be found in [22]. Based on the extended FP definition, all memory faults are classified into two categories: *easy-to-detect* (EtD) faults and *hard-to-detect* (HtD) faults [18]. EtD faults are those that can be detected by applying normal write and read operations, i.e., March tests, while HtD faults are those whose detection cannot be guaranteed by March tests. Second, a systematic fault analysis based on circuit simulations is conducted for each targeted defect, as shown in Fig. 6b; this derives the realistic faults that can be sensitized by such a defect within the pre-defined fault space.

*3) Device-aware test development.* The accurate and realistic faults obtained from the previous step are used to develop test solutions at DPPB level. Specifically, EtD faults can simply be detected by March tests; HtD faults, however, need special DfT or stress tests (see Fig. 6c). The clear mapping relations between physical defects and fault models enable us to not only reduce test escapes and time but also speed up yield learning [18].
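The FP notation and the EtD/HtD split in the steps above can be sketched as a small data structure. The classification rule used here is a simplification assumed for illustration: an FP is treated as HtD if the faulty cell state is one of the forbidden states H, L, or U (discussed later for pinhole defects) or if the readout is random. The symbol `?` for a random readout and the example FP `<1w0/L/->` are hypothetical; the authoritative fault-space definition is the one in [22].

```python
from dataclasses import dataclass

FORBIDDEN_STATES = {"H", "L", "U"}  # non-binary resistive states
RANDOM_READ = "?"                   # assumed symbol for a random readout


@dataclass(frozen=True)
class FaultPrimitive:
    """Fault primitive <S/F/R>.

    s: sensitizing operation sequence, e.g. '1r1' or '0w1'
    f: faulty cell state after applying s
    r: read output ('-' if s does not end with a read)
    """
    s: str
    f: str
    r: str

    @classmethod
    def parse(cls, text: str) -> "FaultPrimitive":
        s, f, r = text.strip().strip("<>").split("/")
        return cls(s, f, r)

    def is_hard_to_detect(self) -> bool:
        # Simplified rule (assumption): forbidden state or random readout.
        return self.f in FORBIDDEN_STATES or self.r == RANDOM_READ

    def category(self) -> str:
        return "HtD" if self.is_hard_to_detect() else "EtD"


for text in ("<1r1/1/0>", "<0w1/0/->", "<1w0/L/->"):
    fp = FaultPrimitive.parse(text)
    print(text, "->", fp.category())
```

Under this rule, the first two FPs (taken from Table III) are EtD and can be targeted by March elements, whereas the hypothetical third FP would require a DfT or stress-based solution.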

## V. DAT FOR PINHOLE DEFECTS

In this section, we apply the three-step DAT approach to a key type of MTJ-internal defects: *pinhole* defects [19].

#### *A. Device-Aware Defect Modeling for Pinhole Defects*

Pinhole defects in the tunnel barrier of the MTJ are seen as one of the most important types of STT-MRAM manufacturing defects; they arise during multi-layer deposition [38–40]. The inset in Fig. 7a shows a schematic of a pinhole defect; it can form due to unoptimized deposition processes [38]. We performed comprehensive characterization of MTJ devices with pinhole defects. Fig. 7a shows the R-H hysteresis loops of four selected devices from the same wafer; Fig. 7b shows the measured R-V loops. More details about pinhole defect characterization can be found in [19].

TABLE IV. SINGLE-CELL STATIC FAULT MODELING RESULTS OF PINHOLE DEFECTS.

We then applied the three-step defect modeling approach (see Fig. 6a) to pinhole defects [19]. With the obtained pinhole defect model, we simulated the R-V loop of an MTJ device for different pinhole sizes; the simulation results are shown as the solid curves in Fig. 7. The simulation results of our defective MTJ model clearly match the measured silicon data in terms of resistance and switching voltage.

#### *B. Device-Aware Fault Modeling for Pinhole Defects*

The upper part of Table IV shows the fault modeling results of pinhole defects in MTJ devices using our proposed DAT approach; the fault detection conditions for different pinhole size ranges are listed in the last column. It can be seen that sufficiently large pinholes  $(A_{ph}>0.61\%)$  make the MTJ device fall into the resistance range of '0' state or even



Fig. 8. Comparison of sensitized FPs due to pinhole defects: device-aware test vs. conventional test based on linear resistors.



Fig. 9. Stress test for detecting small pinhole defects (HtD faults).

primitives are listed in the table. As the pinhole gets smaller ($A_{\text{ph}}\in (0.07\%, 0.61\%]$), it makes $R_{\text{P}}$ fall into the 'L' state and $R_{\text{AP}}$ into the 'U' state. If the pinhole size is smaller than 0.04%, it leads to a weak fault, while the cell still behaves logically correctly.

We also performed fault modeling using the conventional linear resistor model; the results are shown in the lower part of Table IV. Fig. 8 is a Venn diagram comparing the fault modeling results of the two approaches. Our DAT approach results in 18 FPs. Of these, 17 FPs are not observed with the resistor models $R_{\text{pd}}$ and $R_{\text{sd}}$, while only a single EtD fault (W1TF0=$\langle 0w1/0/-\rangle$) is common to both. Among the 17 FPs unique to our DAT approach, 10 are HtD faults and the remaining 7 are EtD faults. With the resistor-based defect models, only the '0' and '1' states were observed in the simulations, leading to 4 EtD faults. This is because the MTJ device is treated as an *ideal black box*.

#### *C. Device-Aware Test Development for Pinhole Defects*

Based on the previous fault analysis results, appropriate test solutions can be developed to detect pinhole defects of different sizes. Large pinhole defects ($A_{\text{ph}} > 0.35\%$) lead to EtD faults; therefore, any March algorithm including the element $\Updownarrow(\ldots 1, r1, \ldots)$ can guarantee their detection. However, for smaller pinhole defects ($A_{\text{ph}} \leq 0.35\%$), HtD faults are sensitized. They are typically related to the cell being in a forbidden state (i.e., H, L, or U) or to random readout values. One possible solution is to subject the STT-MRAM to a hammering write '1' operation sequence with elevated voltage or prolonged pulse width to deliberately speed up the growth of pinhole defects, so as to transform HtD faults into EtD faults. Fig. 9 shows the measurement data of four selected MTJ devices under a stress test. Under pulse stress, the pinhole defects quickly grow, leading to a drop in the resistance of the MTJ devices.
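The stress-test idea can be outlined as a simple loop: hammer the cell with stressed write '1' operations, then check with a normal read. The sketch below is a toy illustration only. The multiplicative growth law, the `growth_factor` value, and the class and function names are assumptions made for the example and do not represent the measured pinhole dynamics in Fig. 9; only the 0.35% EtD threshold is taken from the text. Small pinholes are simplified here to read back correctly, whereas in reality they cause HtD behavior.

```python
from typing import Optional

ETD_THRESHOLD_PERCENT = 0.35  # pinholes above this size lead to EtD faults (from the text)


class ToyPinholeCell:
    """Toy cell whose pinhole area (in % of MTJ area) grows under stress.

    The growth law is hypothetical: each stressed write multiplies the
    pinhole area by `growth_factor`.
    """

    def __init__(self, a_ph_percent: float, growth_factor: float = 1.5) -> None:
        self.a_ph_percent = a_ph_percent
        self.growth_factor = growth_factor

    def stressed_write1(self) -> None:
        """Write '1' with elevated voltage or prolonged pulse width."""
        self.a_ph_percent *= self.growth_factor

    def read_after_write1(self) -> int:
        """Normal read after a write '1'; large pinholes read back as '0'."""
        return 0 if self.a_ph_percent > ETD_THRESHOLD_PERCENT else 1


def stress_test(cell: ToyPinholeCell, max_pulses: int) -> Optional[int]:
    """Return the number of stress pulses after which the cell fails a
    read-'1' check, or None if it still passes after max_pulses."""
    for pulse in range(1, max_pulses + 1):
        cell.stressed_write1()
        if cell.read_after_write1() != 1:
            return pulse
    return None


print(stress_test(ToyPinholeCell(a_ph_percent=0.0), max_pulses=20))
print(stress_test(ToyPinholeCell(a_ph_percent=0.1), max_pulses=20))
```

In practice, the stress conditions must be chosen so that defective devices degrade quickly while defect-free devices are not damaged; that trade-off is not captured by this toy model.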



Fig. 10. Comparison between a defect-free MTJ (upper) and a SAFF-defective MTJ (lower) with $\sim$ 55 nm: (a) schematic of AP state, (b) R-H loop.



Fig. 11. Simulation results vs. measurement data for a SAFF-defective MTJ.

## VI. DAT FOR SYNTHETIC ANTI-FERROMAGNET FLIP (SAFF) DEFECTS

In this section, we apply the three-step DAT approach to another key type of MTJ-internal defects: *synthetic antiferromagnet flip* (SAFF) defects [21].

#### *A. Device-Aware Defect Modeling for SAFF Defects*

We performed comprehensive magnetic and electrical characterization of MTJs with diameters ranging from 35 nm to 175 nm on four wafers. A small fraction of devices across different sizes showed *horizontally* flipped R-H loops (see Fig. 10b) together with *normal* R-V loops. We attribute the root cause to a flip of the magnetization in both the HL and the RL, which we call a SAFF defect. A probable cause of SAFF defects is an initial HL reversal. Due to inhomogeneities arising during device fabrication, an HL with significantly reduced $H_c$ may exist in certain outlier devices. Given the strong anti-ferromagnetic coupling between the HL and the RL ($>10$ kOe for our devices), the RL also flips with the HL. Due to a SAFF defect, the polarity of the stray field at the FL is reversed compared to a good MTJ device (see Fig. 10).

We then applied the three-step defect modeling approach (see Fig. 6a) to SAFF defects [21]. Fig. 11 shows the measured and simulated $V_p$ vs. $t_p$ at switching probability $P_{sw}=0.50$ for a SAFF-defective MTJ with *electrical critical diameter* eCD=35 nm. It can be seen that our simulation results match the silicon data. The output of device-aware defect modeling is a calibrated Verilog-A SAFF-defective MTJ model, which is compatible with analog circuit simulations for subsequent fault modeling.



Fig. 12. Comparison of sensitized FPs due to SAFF defects: device-aware test vs. conventional test based on linear resistors.



Fig. 13. Testing SAFF defects using a magnetic write '0' operation  $(w0_H)$ .

#### *B. Device-Aware Fault Modeling for SAFF Defects*

Fig. 12 compares the fault modeling results of our DAT approach with those of the conventional test approach based on linear resistor injection. With our DAT approach, a SAFF defect results in an HtD fault: the *intermittent passive neighborhood pattern sensitive fault* (PNPSF1<sub>i</sub>). This fault cannot be obtained with the conventional fault modeling approach, where a linear resistor is injected in parallel to or in series with an ideal defect-free MTJ device. Instead, the conventional approach results in four EtD faults, as shown in the figure. This indicates that these four faults do not cover SAFF defects in STT-MRAMs. Accordingly, March tests targeting these four EtD faults cannot guarantee the detection of SAFF defects.

#### *C. Device-Aware Test Development for SAFF Defects*

Based on the fault modeling results, we have proposed two test solutions. One straightforward test solution could be a March algorithm such as:

$$
\{\mathcal{D}(w1); \mathcal{D}(w0, r0, w1)^n\}.
$$

Here  $n (n \in \mathbb{Z}^+)$  denotes the number of times that the second march element should be repeated. It can be inferred that getting high confidence in the detection comes at the cost of a long test time (large  $n$ ); 100% detection is hard to guarantee. The second test solution aims at guaranteeing the detection by incorporating *magnetic* write operations in the March test:

$$
\{\mathcal{D}(w0_H); \mathcal{D}(r0)\} \text{ or } \{\mathcal{D}(w1_H); \mathcal{D}(r1)\}.
$$

Here, the first element  $w0_H$  ($w1_H$) indicates a magnetic write '0' ('1') operation; i.e., an external field  $H_{ext}$  is applied to switch the MTJ state rather than driving an electric current through the MTJ device. Fig. 13 illustrates the test process to guarantee the detection of SAFF defects.
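The trade-off between the repetition count  $n$  and detection confidence in the first test solution can be illustrated with a short calculation. This is a minimal sketch, assuming that each repetition of the second march element sensitizes the intermittent fault independently with a fixed probability `p_fault`; that probability and both function names are hypothetical and are not taken from the paper.

```python
import math


def detection_probability(p_fault: float, n: int) -> float:
    """Probability that at least one of n independent repetitions shows the fault."""
    return 1.0 - (1.0 - p_fault) ** n


def repetitions_needed(p_fault: float, confidence: float) -> int:
    """Smallest n for which detection_probability(p_fault, n) >= confidence."""
    if not (0.0 < p_fault < 1.0) or not (0.0 < confidence < 1.0):
        raise ValueError("p_fault and confidence must lie strictly between 0 and 1")
    # Solve 1 - (1 - p)^n >= c for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fault))


# Hypothetical example: a fault that shows up in 1% of the repetitions.
n_99 = repetitions_needed(p_fault=0.01, confidence=0.99)
```

Under this assumption the detection probability approaches, but never reaches, 100% for any finite  $n$, which is consistent with the observation that full detection is hard to guarantee with the repeated march element alone.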



Fig. 14. IM state defects: (a) Cross-sectional schematic of an MTJ in the IM state, (b) energy barrier diagram for an MTJ with AP, P, and IM states, and (c) IM state measurement data.



Fig. 16. Curve fitting of  $P_{ST}$  and  $P_{IM}$  to measurement data.

#### VII. DAT FOR INTERMEDIATE (IM) STATE DEFECTS

In this section, we apply the three-step DAT approach to another key type of MTJ-internal defect: *intermediate* (IM) state defects [23].

#### *A. Device-Aware Defect Modeling for IM State Defects*

As introduced in Section II, MTJ devices normally have two stable magnetic states: P and AP. However, the fabrication and integration process of MTJ devices is vulnerable to several defects, among which IM state defects are a critical type. An IM state manifests itself as a third resistive state between  $R_P$  and  $R_{AP}$, leading to unintended faulty memory behavior. The root causes can be attributed to: 1) a multi-domain structure of the FL induced by the dipole field and large device size, or 2) an inhomogeneous distribution of the stray field at the FL from the SAF layers, etc.

Fig. 14a shows a cross-sectional schematic of an MTJ in the IM state; the FL is split into two regions: 1) a P-state region and 2) an AP-state region. Fig. 14b illustrates the energy barrier diagram of a defective MTJ with AP, P, and IM states. We performed comprehensive characterization on devices with eCD ranging from 60 nm to 120 nm. The majority of measured devices showed two resistive states. However, some devices showed IM states with resistance values between  $R_P$  and  $R_{AP}$; an example is shown in Fig. 14c. We also observed that the occurrence of the IM state depends significantly on the applied bias voltage, the switching direction (i.e.,  $AP \rightarrow P$  or  $P \rightarrow AP$ ), and the device size in our experiments [20].

We then followed the three-step defect modeling approach to model IM state defects [20]. Fig. 16 shows the fitting results of the successful transition probability  $P_{ST}$  between AP and P states and the occurrence probability of IM state  $P_{IM}$ ; it can be seen that our simulation results match the measurement data.







Fig. 17. Proposed March algorithm with a weak write operation  $\hat{w}0/\hat{w}0_H$ .

#### *B. Device-Aware Fault Modeling for IM State Defects*

The resulting accurate model of IM state defects is used to develop realistic fault models using SPICE-based circuit simulations. Fig. 15 shows a Venn diagram that compares the fault modeling results using our device-aware defect model and the conventional resistive defect model. The device-aware defect model leads to two HtD faults, both of which are *intermittent write transition faults*. For example,  $W1TFU_i = \langle 0w1/U_i/- \rangle$  means that an up-transition write operation on a memory cell with the initial state '0' puts the memory cell into a 'U' state with a certain probability (i.e., intermittently). In contrast, the resistive defect model results in four EtD faults. There is no overlap between the two circles, which means that IM state defects in MTJ devices exhibit unique faulty behaviors that cannot be covered by resistive defect models.

#### *C. Device-Aware Test Development for IM State Defects*

Since the two FPs sensitized using our device-aware defect model are intermittent and involve the 'U' state, they are hard to detect with normal March tests. To detect IM state defects, we proposed the following March algorithm with a weak write operation, as illustrated in Fig. 17:

$$
\{\mathcal{D} \ (\mathbf{w}0); \mathcal{D} \ (\mathbf{w}1, \mathbf{r}1); \mathcal{U} \ (\widehat{\mathbf{w}}0/\widehat{\mathbf{w}}0_{\mathrm{H}}, \mathbf{r}1)\}.
$$

Here  $\hat{w}0$  denotes a write '0' operation with a relatively weak current; it can be implemented by reducing the current amplitude or duration compared to normal write operations. Similarly,  $\widehat{w}0_H$  denotes a write '0' operation using a weak magnetic field. The weak write operation induces an  $IM\rightarrow P$  transition, while it is not strong enough to change the AP state. A more detailed explanation of this test, its circuit implementation, and its verification can be found in [23].
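The detection principle of this March algorithm can be illustrated with a toy behavioral simulation. This is only a sketch under stated assumptions, not the circuit-level verification reported in [23]: the class `ToyCell`, the probability `p_im` that a write '1' ends in the IM state, and the assumption that reading an IM-state cell returns a random value are all hypothetical.

```python
import random

P, AP, IM = "P", "AP", "IM"  # P stores '0', AP stores '1', IM is the undefined 'U' state


class ToyCell:
    """Behavioral stand-in for one STT-MRAM cell with an optional IM state defect."""

    def __init__(self, p_im: float, rng: random.Random):
        self.p_im = p_im  # assumed probability that w1 gets stuck in the IM state
        self.rng = rng
        self.state = P

    def write0(self) -> None:
        self.state = P

    def write1(self) -> None:
        # Intermittent fault: the up-transition sometimes ends in IM instead of AP.
        self.state = IM if self.rng.random() < self.p_im else AP

    def weak_write0(self) -> None:
        # A weak write completes IM -> P but is too weak to switch a cell in AP.
        if self.state == IM:
            self.state = P

    def read(self) -> int:
        if self.state == IM:
            return self.rng.choice((0, 1))  # assumption: 'U' reads as a random value
        return 1 if self.state == AP else 0


def march_im(cell: ToyCell) -> bool:
    """Run (w0); (w1, r1); (weak w0, r1) on one cell; True means the cell is flagged."""
    cell.write0()
    cell.write1()
    if cell.read() != 1:
        return True
    cell.weak_write0()
    return cell.read() != 1
```

In this toy model a cell that lands in the IM state is always flagged: either the first read already returns '0', or the weak write pushes the cell to P and the final read returns '0'. A cell that reaches AP correctly is unaffected by the weak write and passes.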



Fig. 18. (a) MTJ stack and the intra-cell stray fields from the RL and HL, (b)  $3\times3$  MTJ array and the inter-cell stray fields from neighboring cells, (c) SEM image of the 0T1R wafer floorplan, and (d) SEM image of MTJ array.









Fig. 22.  $H_{\text{s\_inter}}^z$  at the FL of victim C8 under various combinations of the number of 1s in direct neighbors and diagonal neighbors.

#### VIII. MAGNETIC COUPLING MODEL

As a unique mechanism for MRAMs, magnetic coupling needs to be taken into account when designing memory arrays. This section introduces the magnetic coupling mechanism, followed by measurement data and a model [24].

#### *A. Magnetic Coupling Mechanism*

To obtain high TMR and strong interfacial perpendicular magnetic anisotropy (iPMA), our MTJ devices were annealed at 375 °C for 30 min in a vacuum chamber under a perpendicular (out-of-plane) magnetic field of 20 kOe. Once the ferromagnetic layers (i.e., FL, RL, and HL) in the MTJ stack are magnetized, each of them inevitably generates a stray field in the surrounding space. Fig. 18a illustrates the intra-cell stray field  $H_{\text{s\_intra}}$  perceived at the FL, generated by the RL and HL together. Furthermore, as the density of STT-MRAMs increases, the spacing between neighboring MTJs becomes narrower (i.e., smaller pitch). This makes stray fields from neighboring cells non-negligible [41]. Fig. 18b-d shows the inter-cell stray field  $H_{\text{s\_inter}}$  in an STT-MRAM array.

#### *B. Magnetic Coupling Characterization*

 $H_{\text{s\_intra}}^{z}$  can be extracted from R-H loops. Fig. 19a shows a measured R-H loop for a representative MTJ with the HL/RL configuration shown in Fig. 18a. Due to the existence of stray fields at the FL, the loop is always offset to the positive side. Given that  $RA$  does not change with the device size, the eCD of each device can be derived by:

$$
eCD = \sqrt{\frac{4}{\pi} \cdot \frac{RA}{R_P}},
$$

where  $RA=4.5\ \Omega \cdot \mu\text{m}^2$  (measured at the blanket stage) for this wafer, and  $R_P$  can be extracted from the R-H loop. For the device shown in Fig. 19a, the calculated eCD is 55 nm. In this way, we can obtain  $H_{\text{s\_intra}}^z$  and eCD for MTJ devices with different sizes on the same wafer. The measurement results are shown in Fig. 19b. The error bars indicate the device-to-device variation in the measured values due to process variations and the intrinsic switching stochasticity. It can be seen that the smaller the device size (i.e., the smaller the eCD), the higher  $H_{\text{s\_intra}}^z$; the trend even tends to grow exponentially for  $eCD < 100$  nm.
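The eCD extraction above amounts to treating the MTJ as a circular pillar whose area equals  $RA/R_P$. The following sketch shows the unit handling, assuming  $RA$  in  $\Omega \cdot \mu\text{m}^2$  and  $R_P$  in  $\Omega$; the function names are illustrative and not part of the authors' tooling.

```python
import math


def ecd_nm(ra_ohm_um2: float, r_p_ohm: float) -> float:
    """Electrical critical diameter in nm from RA (ohm*um^2) and R_P (ohm)."""
    area_um2 = ra_ohm_um2 / r_p_ohm  # effective junction area
    diameter_um = math.sqrt(4.0 / math.pi * area_um2)
    return diameter_um * 1e3  # um -> nm


def parallel_resistance_ohm(ra_ohm_um2: float, ecd_nm_value: float) -> float:
    """Inverse relation: R_P (ohm) of a circular junction with the given eCD (nm)."""
    diameter_um = ecd_nm_value * 1e-3
    return ra_ohm_um2 / (math.pi / 4.0 * diameter_um ** 2)


# With RA = 4.5 ohm*um^2, an eCD of 55 nm corresponds to an R_P of roughly 1.9 kOhm.
r_p_55 = parallel_resistance_ohm(4.5, 55.0)
```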

#### *C. Magnetic Coupling Modeling*

To analyze the impact of magnetic coupling on the MTJ's performance, we have developed a physics-based model for both inter-cell and intra-cell magnetic coupling [24]. Fig. 20 shows the 3D distribution of  $H_{\text{s\_intra}}$  for a modeled MTJ. Fig. 19b presents the simulation results of  $H_{\text{s\_intra}}^z$  vs. eCD (solid curve), which match the silicon data. We have also proposed the *inter-cell magnetic coupling factor*  $\Psi$  to indicate the coupling strength; the  $\Psi$  value varies with device size and array pitch, as shown in Fig. 21. For our devices,  $\Psi$=2% can be considered as the threshold point, where the array density is maximized with negligible inter-cell magnetic coupling. For a given eCD and pitch,  $H_{\text{s\_inter}}$  also changes with the data pattern in the neighborhood (see Fig. 22). Using this model, we can evaluate the impact of magnetic coupling on STT-MRAM performance parameters such as  $I_c$,  $t_w$, and  $\Delta$ [24].
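The data-pattern dependence of the inter-cell stray field can be illustrated by enumerating the neighborhood patterns of a  $3\times3$  array. The sketch below is not the physics-based model of [24]; it assumes a point-dipole approximation in which only the FLs of the eight neighbors change with the stored data, the field magnitude falls off with the cube of the distance, and a neighbor storing '1' contributes with the opposite sign to one storing '0'. The sign convention, the normalization `h_direct`, and the function name are all hypothetical.

```python
import itertools


def inter_cell_field(n1_direct: int, n1_diag: int, h_direct: float = 1.0) -> float:
    """Normalized data-dependent z-field at the victim FL.

    n1_direct, n1_diag: number of 1s among the 4 direct and 4 diagonal neighbors.
    h_direct: field magnitude of one direct neighbor's FL (arbitrary units).
    """
    if not (0 <= n1_direct <= 4 and 0 <= n1_diag <= 4):
        raise ValueError("each neighbor group has exactly four cells")
    # Diagonal neighbors sit at sqrt(2) * pitch, so a 1/r^3 law scales them by 2^(-3/2).
    h_diag = h_direct / 2 ** 1.5
    # Assumed sign convention: a '1' contributes +h, a '0' contributes -h.
    return h_direct * (2 * n1_direct - 4) + h_diag * (2 * n1_diag - 4)


# All 5 x 5 combinations of 1-counts in direct and diagonal neighbors.
patterns = {
    (d, g): inter_cell_field(d, g)
    for d, g in itertools.product(range(5), repeat=2)
}
```

Under these assumptions the all-0 and all-1 neighborhoods give the two extreme values, and direct neighbors dominate because of their shorter distance. The constant contribution of the neighbors' fixed layers is omitted here.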



Fig. 23. Block diagram of the proposed MFA-MTJ model for simulations of hybrid MTJ/CMOS circuits.



#### IX. MFA-MTJ MODEL: A MAGNETIC-FIELD-AWARE COMPACT MODEL OF PMTJ

It is well recognized that the performance of STT-MRAM is very sensitive to all sources of magnetic fields. This section introduces a magnetic-field-aware compact model of perpendicular MTJ (the MFA-MTJ model) for fast and robust STT-MRAM design [25].

#### *A. Implementation of MFA-MTJ Model*

Fig. 23 illustrates the block diagram of our MFA-MTJ model. The model has two terminals and obeys Ohm's law; i.e.,  $V(T1, T2) = I_{\text{MTJ}} \cdot R_{\text{MTJ}}$. The MTJ resistance  $R_{\text{MTJ}}$  depends on the magnetic state (AP or P), the bias voltage  $V(T1, T2)$, and the ambient temperature  $T$;  $R_{\text{MTJ}}$  can also be switched between  $R_{\text{P}}$  and  $R_{\text{AP}}$, depending on the current  $I_{\text{MTJ}}$  and its duration. In essence, the compact MTJ model describes the complex relationships between these three electrical variables. It abstracts MTJ devices from the physical level to the electrical level via compact behavioral modeling, described in Verilog-A. In other words, the inputs of the MTJ model are physical and technology parameters (e.g., eCD and RA) and the outputs are the MTJ's electrical parameters (e.g.,  $R_P$  and  $I_c$); the mapping from the inputs to the outputs is described analytically by physical equations. The internal implementation of the MTJ compact model consists of different functional modules, as shown in the figure. The detailed implementation of this model can be found in [25].

#### *B. STT-MRAM Full Circuit Simulations*

With the obtained MFA-MTJ model, we can perform electrical/magnetic co-simulation of STT-MRAM circuits. Fig. 24 presents the simulation results of *write error rate* (WER) vs.  $V_p$  at  $t_p=10$  ns when different external fields  $H_{ext}$  are applied to an MTJ device. With  $H_{ext}=500$  Oe, the WER curve shifts to the right in comparison to  $H_{ext}=0$  Oe; at a given  $V_p$, the change in WER can be up to  $\times 10^2$. We also simulated STT-MRAM full circuits with peripherals [22]. Fig. 25 shows the waveforms of seven key signals during the transient simulation of the operation sequence 0w1r1w0r0. When the pitch changes from  $3 \times eCD$  to  $1.5 \times eCD$, the switching time  $t_w$  becomes larger during the w1 operation while it becomes smaller during the w0 operation, due to the inter-cell magnetic coupling effect.

#### *C. Design Space with Various Variation Sources*

Our STT-MRAM simulation platform with the proposed MFA-MTJ model enables us to explore the STT-MRAM design space considering five variation sources: 1) process variation (device-to-device variation), 2) supply voltage variation, 3) operating temperature variation, 4) MTJ switching stochasticity (cycle-to-cycle variation), and 5) magnetic field variation. Fig. 26 shows a contour plot of the WER of the 0w1 operation with respect to  $t_p$  and  $V_{\text{WL}}$, when  $H_{\text{ext}}$=500 Oe at room temperature. It can be seen that WER(0w1) gradually decreases from the lower-left corner to the upper-right corner. We define the *area of design space*  $A_{ds}$  as the normalized area where WER= $10^{-6}$  with respect to the entire area of the contour plot. In the figure,  $A_{ds}$=0.254; this is 13.8% smaller than the baseline  $A_{ds}$  value where there is no external field; see [25] for more details.
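As a rough illustration of how such a normalized area could be evaluated from simulation data, the sketch below counts the fraction of points on a uniform  $(t_p, V_{\text{WL}})$  grid that meet a WER target. It assumes that the design space is the region where the WER is at or below  $10^{-6}$  and that the grid is uniformly sampled; both are assumptions, and the function name and sample values are hypothetical rather than taken from [25].

```python
from typing import Sequence


def design_space_area(wer_grid: Sequence[Sequence[float]], target: float = 1e-6) -> float:
    """Fraction of grid points whose WER is at or below the target."""
    cells = [wer for row in wer_grid for wer in row]
    if not cells:
        raise ValueError("wer_grid must contain at least one value")
    return sum(1 for wer in cells if wer <= target) / len(cells)


# Hypothetical 2 x 2 grid: rows are V_WL steps, columns are t_p steps.
toy_grid = [
    [1e-2, 1e-4],
    [1e-6, 1e-8],
]
toy_area = design_space_area(toy_grid)

# Baseline implied by the reported numbers: 0.254 is 13.8% below the baseline.
baseline_area = 0.254 / (1.0 - 0.138)
```

With the reported values, the implied baseline  $A_{ds}$  without an external field is about 0.295.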

#### X. CONCLUSION

In this paper, we have explored high-quality testing for STT-MRAM. We have demonstrated that the conventional resistor-based test approach fails to derive accurate fault models and high-quality tests for STT-MRAM device defects. Therefore, we have proposed the device-aware test approach and applied it to three key types of MTJ-internal defects; clear mapping relationships between manufacturing defects, fault models, and test solutions have been created and verified with silicon measurements. Emerging memory technologies such as STT-MRAM, RRAM, and PCM require unique manufacturing steps, which could cause unique defect mechanisms. This calls for a better understanding of new defect mechanisms and for better fault modeling and test approaches such as device-aware test.

Moreover, spintronic circuits such as STT-MRAM require magnetic/electrical co-simulations of MTJ/CMOS circuits. Therefore, we have proposed a magnetic coupling model to evaluate the impact of magnetic coupling and density on STT-MRAM performance. In addition, we have presented the MFA-MTJ model for fast and robust STT-MRAM device/circuit co-design.

#### **REFERENCES**

- [1] D. Apalkov *et al.*, "Magnetoresistive random access memory," *Proc. IEEE*, vol. 104, pp. 1796–1830, Aug. 2016.
- [2] W.J. Gallagher *et al.*, "22nm STT-MRAM for reflow and automotive uses with high yield, reliability, and magnetic immunity and with performance and shielding options," in *Int. Electron Devices Meeting*, Dec. 2019, pp. 2.7.1–2.7.4.
- [3] S. Ikegawa *et al.*, "Magnetoresistive random access memory: Present and future," *Trans. Electron Devices*, vol. 67, pp. 1407–1419, Jan. 2020.
- [4] Avalanche Technology, "Avalanche STT-MRAM products," accessed in Jun. 2020.
- [5] K. Lee *et al.*, "1Gbit high density embedded STT-MRAM in 28nm FDSOI technology," in *Int. Electron Devices Meeting*, Dec. 2019, pp. 2.2.1–2.2.4.
- [6] L. Wu *et al.*, "Electrical modeling of STT-MRAM defects," in *Int. Test Conf.*, Oct. 2018, pp. 1–10.
- [7] R. Bishnoi *et al.*, "Special session–emerging memristor based memory and CIM architecture: Test, repair and yield analysis," in *VLSI Test Symp.*, Apr. 2020, pp. 1–10.
- [8] A. Chintaluri *et al.*, "A model study of defects and faults in embedded spin transfer torque (STT) MRAM arrays," in *Asian Test Symp.*, Nov. 2015, pp. 187–192.
- [9] A. Chintaluri *et al.*, "Analysis of defects and variations in embedded spin transfer torque (STT) MRAM arrays," *J. Emerg. Sel. Topics Circuits Syst.*, vol. 6, pp. 319–329, Sep. 2016.
- [10] I. Yoon *et al.*, "EMACS: efficient MBIST architecture for test and characterization of STT-MRAM arrays," in *Int. Test Conf.*, Nov. 2016, pp. 1–10.
- [11] S.M. Nair *et al.*, "Defect injection, fault modeling and test algorithm generation methodology for STT-MRAM," in *Int. Test Conf.*, Oct. 2018, pp. 1–10.
- [12] G. Radhakrishnan *et al.*, "A parametric DFT scheme for STT-MRAMs," *Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 27, pp. 1685–1696, Jul. 2019.
- [13] G. Radhakrishnan *et al.*, "Monitoring aging defects in STT-MRAMs," *Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 39, pp. 4645–4656, Mar. 2020.
- [14] E.I. Vatajelu *et al.*, "Challenges and solutions in emerging memory testing," *Trans. Emerg. Topics Comput.*, vol. 7, pp. 493–506, Jul.–Sep. 2019.
- [15] R. Bishnoi *et al.*, "Read disturb fault detection in STT-MRAM," in *Int. Test Conf.*, Oct. 2014, pp. 1–7.
- [16] L. Wu *et al.*, "Survey on STT-MRAM testing: Failure mechanisms, fault models, and tests," *arXiv preprint*, pp. 1–24, Jan. 2020.
- [17] L. Wu *et al.*, "Device-aware test for emerging memories: Enabling your test program for DPPB level," in *Eur. Test Symp.*, May 2020, pp. 1–2.
- [18] M. Fieback *et al.*, "Device-aware test: A new test approach towards DPPB level," in *Int. Test Conf.*, Nov. 2019, pp. 1–10.
- [19] L. Wu *et al.*, "Pinhole defect characterization and fault modeling for STT-MRAM testing," in *Eur. Test Symp.*, May 2019, pp. 1–6.
- [20] L. Wu *et al.*, "Characterization and fault modeling of intermediate state defects in STT-MRAM," in *Design Autom. & Test in Europe Conf.*, Feb. 2021, pp. 1–6.
- [21] L. Wu *et al.*, "Characterization, modeling and test of synthetic antiferromagnet flip defect in STT-MRAMs," in *Int. Test Conf.*, Nov. 2020, pp. 1–10.
- [22] L. Wu *et al.*, "Defect and fault modeling framework for STT-MRAM testing," *Trans. Emerg. Topics Comput.*, pp. 1–15, Dec. 2019.
- [23] L. Wu *et al.*, "Characterization and fault modeling of intermediate state defects in STT-MRAM," *Trans. Comput.*, pp. 1–14, 2021, (minor revision).
- [24] L. Wu *et al.*, "Impact of magnetic coupling and density on STT-MRAM performance," in *Design Autom. & Test in Europe Conf.*, Mar. 2020, pp. 1211–1216.
- [25] L. Wu *et al.*, "A magnetic-field-aware compact model of pMTJ for robust STT-MRAM design," *arXiv preprint*, pp. 1–14, 2021.
- [26] G.S. Kar *et al.*, "Co/Ni based p-MTJ stack for sub-20nm high density stand alone and high performance embedded memory application," in *Int. Electron Devices Meeting*, Dec. 2014, pp. 19.1.1–19.1.4.
- [27] A.V. Khvalkovskiy *et al.*, "Basic principles of STT-MRAM cell operation in memory arrays," *J. Phys. D: Appl. Phys*, vol. 46, p. 139601, Feb. 2013.
- [28] W. Zhao *et al.*, "Design considerations and strategies for high-reliable STT-MRAM," *Microelectronics Rel.*, vol. 51, pp. 1454–1458, Sep.-Nov. 2011.
- [29] M. Bushnell *et al.*, *Essentials of electronic testing for digital, memory and mixed-signal VLSI circuits*. Springer Science & Business Media, 2004, vol. 17.
- [30] Y.J. Song *et al.*, "Highly functional and reliable 8Mb STT-MRAM embedded in 28nm logic," in *Int. Electron Devices Meeting*, Dec. 2016.
- [31] L. Wu, "Testing STT-MRAM: Manufacturing defects, fault models, and test solutions," PhD dissertation, TU Delft, Feb. 2021.
- [32] A.J. Van De Goor, "Using march tests to test SRAMs," *IEEE Design Test of Computers*, vol. 10, pp. 8–14, Mar. 1993.
- [33] S. Hamdioui *et al.*, "Memory test experiment: industrial results and data," *IEE Proc. Computers and Digital Techniques*, vol. 153, pp. 1–8, Jan. 2006.
- [34] C.L. Su *et al.*, "MRAM defect analysis and fault modeling," in *Int. Test Conf.*, Oct. 2004, pp. 124–133.
- [35] J. Azevedo *et al.*, "A complete resistive-open defect analysis for thermally assisted switching MRAMs," *Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 22, pp. 2326–2335, Nov. 2014.
- [36] C. Su *et al.*, "Testing MRAM for write disturbance fault," in *Int. Test Conf.*, Oct. 2006, pp. 1–9.
- [37] S. Hamdioui *et al.*, "An experimental analysis of spot defects in SRAMs: realistic fault models and tests," in *IEEE Asian Test Symp.*, Dec. 2000, pp. 131–138.
- [38] W. Zhao *et al.*, "Failure analysis in magnetic tunnel junction nanopillar with interfacial perpendicular magnetic anisotropy," *Materials*, vol. 9, pp. 1–17, Jan. 2016.
- [39] B. Oliver *et al.*, "Two breakdown mechanisms in ultrathin alumina barrier magnetic tunnel junctions," *J. Appl. Phys.*, vol. 95, pp. 1315–1322, Jan. 2004.
- [40] S. Mukherjee *et al.*, "Role of boron diffusion in CoFeB/MgO magnetic tunnel junctions," *Phys. Rev. B*, vol. 91, p. 085311, 2015.
- [41] C. Augustine *et al.*, "Numerical analysis of typical STT-MTJ stacks for 1T-1R memory arrays," in *Int. Electron Devices Meeting*, Dec. 2010, pp. 22.7.1–22.7.4.