

# Instant Test and Repair for TSVs using Differential Signaling

Ching-Yi Wen<sup>1</sup> · Shi-Yu Huang<sup>1,2</sup>

Received: 22 December 2023 / Accepted: 9 March 2024 / Published online: 3 April 2024 © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024

#### **Abstract**

A faulty Through Silicon Via (TSV) could spoil a 3D IC and cause a hefty loss, as the potentially expensive known-good dies bonded together must be discarded. This work presents a Fault-tolerant TSV scheme to avoid such a disastrous situation. Our method uses two differential TSVs for each binary signal to be transmitted. Compared to the previous Fault-tolerant TSV schemes, our test and repair scheme is instant and much simpler, requiring neither global test result analysis nor a complex reconfiguration process. This makes it especially suitable for situations in which the more involved TSV test and repair schemes cannot be easily supported by some die providers in a multi-vendor 3D-IC design environment.

**Keywords** 3D-IC · TSV · Fault Tolerance · Differential Signaling · Instant Repair · Yield enhancement

## 1 Introduction

In a 3D-IC, several known good dies are stacked vertically via Through Silicon Vias (TSVs). It is known that thermal and mechanical stresses during the die stacking and bonding process may introduce parametric defects in the TSV structure (such as delay faults or leakage faults). If a TSV is faulty, the entire 3D-IC could fail and all the known-good dies bonded are wasted, leading to a hefty loss. To avoid such a last-minute disaster during the 3D-IC integration, yield and reliability enhancement strategies for the faulty TSVs are essential to guarantee the success of a product.

In general, a TSV could connect several dies vertically. However, for simplicity without losing generality, we assume in this work that there are only two dies connected by TSVs—the master die and the slave die. Supporting circuits need to be inserted in both dies to support the TSV repair process – including the "redundancy architecture" and the "test-and-repair controllers" to execute the test and repair protocol.

Responsible Editor: K.-J. Lee

- Shi-Yu Huang syhuang@ee.nthu.edu.tw
- <sup>1</sup> Electrical Engineering Department, National Tsing Hua University, Hsinchu, Taiwan
- <sup>2</sup> Electrical Engineering Department, Southern Taiwan University of Science and Technology, Tainan, Taiwan

Numerous TSV repair schemes have been proposed in the literature [1, 3, 6, 7, 10, 12–15, 17, 18, 20–23]. In general, a faulty TSV has to be identified first by a test method and then replaced by some spare TSVs. To support the replacement, input switching logic (such as de-multiplexers) and output switching logic (such as multiplexers) are needed along the signal path of each TSV. The previous Fault-tolerant TSV schemes have targeted a wide range of different situations. They can be characterized by several features in terms of the targeted fault types, the test methods, the diagnostic methods, the spare structures, the replacement algorithms, etc. The quality metrics include the repair cost, the repair rate, and the ability to repair special faults (e.g., delay faults, leakage faults, electromigration faults, clustered faults, etc.).

In this work, we consider the Fault-tolerant TSVs from an aspect that takes into account the limitation of a realistic "cooperative multi-vendor design environment", in which a 3D-IC integrator may not design all the functional dies. In other words, a 3D-IC integrator may incorporate some 3rd-party functional dies, e.g., a memory die. A memory vendor typically aims to produce dies that work with various SoC dies by using the same core circuitry with some standardized interfaces. Sometimes, it might be hard to persuade a die vendor to adopt a sophisticated TSV repair scheme, considering that every die involved in a 3D-IC project needs to implement all the supporting circuitry for the TSV test and repair, including the spare structure, the controller for the test and diagnosis of the faulty TSVs, and the on-chip memory and logic to analyze the test results for deciding the repair routes.



In this work, we pursue a "simple yet effective test and repair scheme for TSVs" with the following principles to resolve this difficulty.

- 1. All the design-for-repair circuitry will be inserted just like local wrappers around the TSV areas. That is, *there are no global repair controllers at all*.
- 2. The test-and-repair operations are simplified so that they can be executed on the triggering of some control signal, e.g., when the signal *Test* goes high for only 3 test clock cycles. The test results are processed within each TSV wrapper, and the repair process is completed instantly afterward. Therefore, there is no need to transmit the test results to any global controller for further processing, and there is no need to decide the routing of the signals through TSVs. This merit is not found in the previous methods.

Following the above principles, we developed a simple Fault-tolerant TSV with the following features:

- 1. We use "differential TSV pairs" to provide redundancy for fault tolerance. An input binary signal and its complementary signal are used to drive two TSVs, respectively. At the receiver ends, they are "differentially sensed" and converted back into a binary value. Even though two TSVs are used for each binary signal transmission, with the ever-decreasing TSV pitch into the less-than-10µm realm [2, 16], the cost has become not just affordable but also worthwhile.
- 2. We use the simple "pulse-vanishing test" in [8, 9]. A short-duration pulse signal, e.g., with a pulse width of 1 ns, extracted from the high-speed system clock is used to drive each TSV within a test clock cycle. If the pulse signal driving a status flip-flop's clock port can "register" the flip-flop at the receiver end, then the corresponding TSV is considered fault-free. Otherwise, it is regarded as faulty. This test method can detect not just hard faults, but also delay faults. It is worth mentioning that the term "pulse" used in this paper refers to a single-shot pulse, as opposed to a long series of pulse waves as defined in [11].
- 3. A "sensor cell" at the receiver end of each "differential TSV pair" is specially designed to "fix" or "mask" a severely faulty TSV instantly based on the prior test result stored locally in the status flip-flop. After the test, the 3D-IC can start its functional mode and the "reconfigured sensor cell" will operate correctly.
- 4. A fault is called a minor fault if it escapes our test method. For this kind of minor fault, our sensor cell does not mask it completely. However, the fault effect is mitigated because of the differential nature of our sensor cell, as will be reported later.



The rest of this paper is organized as follows. In Section 2, we provide the preliminaries. In Section 3, we introduce our basic Fault-tolerant TSV structure, including the "differential TSV pair" and the basic sensor cell. We also discuss our instant test-and-repair process. In Section 4, we present simulation results, and in Section 5 we conclude.

## 2 Preliminaries

#### 2.1 Delay Test Method

As shown in Fig. 1, the Pulse-Vanishing Test (PV-Test) proposed in [8] applies a short-duration pulse signal at the driver end of the TSV under test. The pulse width is roughly equal to the system clock cycle time. If the TSV is fault-free as illustrated in Fig. 1(a), the pulse signal arrives at the receiver end of the TSV (denoted as WO). This pulse waveform is said to have "survived" the journey through the TSV and will be restored to full swing after passing the receiver. On the other hand, if the TSV is faulty with excessive resistance as shown in Fig. 1(b), the pulse signal may vanish altogether. This happens because a faulty TSV can be so resistive that the rising transition at node WO in the figure is distorted and never reaches a level above the threshold voltage of the receiver. As a result, the pulse waveform at node WO is treated as a glitch and "filtered" by the receiver, leading to a no-pulse situation at the receiver's output (i.e., node B), which indicates a failed TSV.
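As a rough illustration of why a resistive TSV makes the pulse vanish, the sketch below treats the TSV as a single lumped RC stage charged by an ideal step lasting one pulse width. The 242 fF capacitance and the 1 ns pulse width are values quoted elsewhere in this paper; the 1.0 V supply, the half-supply receiver threshold, the ideal driver, and the function names are assumptions made only for this example, so the resulting numbers should not be read as the actual test threshold of the PV-Test.

```python
import math

C_TSV = 242e-15      # lumped TSV capacitance in farads (value quoted in this paper)
PULSE_WIDTH = 1e-9   # test pulse width in seconds (1 GHz system clock)
VDD = 1.0            # assumed supply voltage in volts
V_TH = 0.5 * VDD     # assumed receiver switching threshold


def peak_voltage(r_fault_ohm: float) -> float:
    """Peak voltage at node WO when the pulse charges C_TSV through r_fault_ohm."""
    tau = r_fault_ohm * C_TSV
    return VDD * (1.0 - math.exp(-PULSE_WIDTH / tau))


def pulse_survives(r_fault_ohm: float) -> bool:
    """True if node WO crosses the receiver threshold before the pulse ends."""
    return peak_voltage(r_fault_ohm) > V_TH


for r in (100.0, 1e3, 1e4, 1e5):
    print(f"R = {r:8.0f} ohm  peak = {peak_voltage(r):.3f} V  survives = {pulse_survives(r)}")
```

In this simplified model the pulse survives for small series resistances and vanishes once the RC time constant becomes comparable to the pulse width, which is the qualitative behavior described above.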



Fig. 1 The pulse-vanishing test for a TSV as proposed in [8]





Fig. 2 A typical infrastructure for TSV repair [10]

#### 2.2 An Exemplar TSV Repair Scheme

In general, a TSV test-and-repair process involves three major steps, namely, fault detection, diagnosis (to identify the locations of the faulty TSVs), and repair (to reconfigure the redundancy structure for bypassing identified faulty TSVs).

Figure 2 shows an example of the required infrastructure to carry out an on-chip test-and-repair process [10]. Overall, there is a Main Controller located in a die designated as the master die (i.e., Die 1 in the figure). Further in each die, there is a Test Controller (named TC in the figure), and a local Repair Controller (named RC in the figure).

The Main Controller regulates the entire test-and-repair process. It initiates a test session that shifts in the test stimuli and shifts out the test results through a global TDI-to-TDO daisy chain. Notably, the test results need to be "broadcast" to every die so that each die can check the test results and generate the "repair pattern" accordingly, which is then loaded into the input switch boxes (indicated by trapezoidal shapes in Die 1) and the output switch boxes (indicated by trapezoidal shapes in Die 2). It is noteworthy that there is one "repair scan chain" as well in each die, consisting of all its input or output switch boxes. The repair scan chain orders of different dies should match carefully so that the "two repair bits" at the two ends of a TSV can match after the repair. For example, the repair patterns are both "0111" for Die 1 and Die 2 if the 2nd TSV has been identified as faulty and is to be bypassed. Once the repair is completed, the Main Controller will issue another test session to check if the repair is successful. In this case, the TSV group is said to be "4-out-of-5", meaning that it will function correctly after repair if at least 4 TSVs are fault-free.
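To make the "0111" example concrete, the following sketch generates the repair pattern for one TSV group. It assumes a shifting-based redundancy in which repair bit i = 1 means that signal i is moved from TSV i to TSV i+1; this interpretation and the function name `repair_pattern` are illustrative assumptions that are consistent with the example above, not details taken from [10].

```python
from typing import Optional


def repair_pattern(num_signals: int, faulty_tsv: Optional[int]) -> str:
    """Return one repair bit per signal for a group with a single spare TSV.

    TSVs are numbered from 1 to num_signals + 1. Signals located before the
    faulty TSV keep their own TSV (bit '0'); signals at or after it are
    shifted to the next TSV (bit '1').
    """
    if faulty_tsv is None:
        return "0" * num_signals
    if not 1 <= faulty_tsv <= num_signals + 1:
        raise ValueError("faulty_tsv is outside the group")
    return "".join("1" if i >= faulty_tsv else "0" for i in range(1, num_signals + 1))


print(repair_pattern(4, 2))  # 2nd TSV of a 4-out-of-5 group is faulty
```

Because both dies derive the same bit string from the same broadcast test result, the two ends of every TSV select matching routes, which is why the scan chain orders must agree.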

Overall, the above test-and-repair process requires "global infrastructure" and sophisticated repair scheduling. Simplifying this process by eliminating the global Repair Controllers, the switch boxes, and the repair scan chains is the major motivation for this work.

Table 1 summarizes the differences between our scheme (to be proposed) and previous methods in five criteria – target fault type(s), test method, diagnosis technique, redundancy structure, and repair scheme.

- 1. The targeted faults include hard faults (stuck-at, open, short faults) and parametric faults (delay faults, leakage faults, and online electromigration faults). Some even consider clustered faults. Our scheme will target both hard faults and parametric faults.
- 2. Previous works have offered numerous test methods. In this work, we use the pulse-vanishing test.
- 3. The diagnosis techniques used to pinpoint the locations of faulty TSVs could be complicated and require the assistance of external ATE or on-chip diagnosis hardware. In our method, it is as simple as just checking the status flip-flop of each TSV after testing.
- 4. The redundancy structure ranges from shifting-based and switching-network-based to more sophisticated routing-based structures. In this work, we use differential TSV pairs as our redundancy structure.
- 5. The repair schemes involve how the repair information is loaded into the switch boxes of the redundancy structure. Sometimes, the repair information needs to be stored in a repair memory first before being applied through the assistance of some repair scan chains. Sometimes, the repair memory can be eliminated if the repair is done on the fly in a way that the repair information is generated as a bitstream and immediately loaded into the repair chains. In our work, we need neither the repair memory nor the repair scan chains because our repair is conducted on the fly and locally.

Table 1 Comparison of TSV repair methods

| Criterion             | Prior Methods                                                               | Proposed Scheme                 |
|-----------------------|-----------------------------------------------------------------------------|---------------------------------|
| Targeted Fault Types  | Hard faults (stuck-at, open, short)<br>Delay faults, Leakage faults, etc    | Both Hard and Parametric Faults |
| Test Methods          | Boundary-scan test<br>Memory-Like BIST<br>Pulse-Vanishing Test, etc         | Pulse-Vanishing Test            |
| Diagnosis Techniques  | External-ATE-based diagnosis, On-chip diagnosis, On-the-spot diagnosis, etc | On-the-spot diagnosis           |
| Redundancy Structures | Shifting-based<br>Switching-Network-based<br>Multi-Hop Routing-based, etc   | Differential TSV Pair           |
| Repair Schemes        | Store-and-Load repair<br>On-the-fly repair, etc                             | Instant Local Repair            |

## 3 Proposed Fault-Tolerant TSV Scheme

With TSV pitch sizes shrinking below $10~\mu m$, the use of differential TSV pairs for one binary signal transmission has become more and more cost-effective and appealing. In such a differential TSV pair, a binary signal and its complement are transmitted through two independent TSVs. At the receiver end, they are combined back into a binary signal by a "differential sensor cell" or simply "sensor cell" for short. Since a binary signal is transmitted over two TSVs, any fault occurring in only one of them can be tolerated. In the following, we investigate the details of this differential TSV pair in terms of its basic circuit and operation in the functional mode, its fault detection method, and then its self-repair.

#### 3.1 Basic Differential TSV Pair

As illustrated in Fig. 3, the differential TSV pair consists of the following 3 parts:

- 1. The two TSVs, one as the native TSV and the other as the complementary TSV.
- 2. The driver-side buffer and inverter, for enforcing the differential signals down the two TSVs.
- 3. The receiver-side sensor cell, for converting the received differential signals back to a binary signal.

| Signal Name | Meaning                         |
|-------------|---------------------------------|
| A           | Input binary signal             |
| B           | Output binary signal            |
| P           | End signal of native TSV        |
| N           | End signal of complementary TSV |

Fig. 3 Proposed Differential TSV pair

The transistor-level schematic of the sensor cell is shown in Fig. 4. It is a logic gate based on the Cascode Voltage Switch Logic [19]. It has 3 parts: the differential input stage, the common current sink, and the load portion. The differential input stage consists of nMOS transistors M5 and M7, sitting on top of the common current sink made of an always-on footer nMOS transistor M11. The two input signals to this sensor cell are the signal through the native TSV (driving the right-hand side of the input stage and denoted as P throughout this paper) and the signal through the complementary TSV (driving the left-hand side of the input stage and denoted as N throughout this paper). Whichever of these two input signals has the larger voltage will dominate the input stage and drain most of the current of the common current sink. The load portion at the upper part can be viewed as a pseudo-cross-coupled pMOS latch consisting of M1 and M2. Its purpose is to enhance the sensor result to a rail-to-rail logic signal, as illustrated in the following example. The binary output signal of this sensor cell is called signal B.

(Example 1) Consider the two input conditions driving our basic sensor cell. Consider the first case when the input P is logic '1' and the input N is logic '0' as illustrated in Fig. 5(a). The current mainly flows through the right-hand-side branch of the input stage, thereby discharging node Y2. After a while, the left-hand-side pMOS in the load portion starts to conduct and build up a relatively high voltage at node Y1. Through the pseudo-cross-coupled pMOS latch, a stable condition between Y1 and Y2 is established and output B becomes '1'. Now consider the other case when the input P is logic '0' and the input N is logic '1' as illustrated in Fig. 5(b). The current mainly flows through the left-hand-side branch, and output B will become '0'.



Fig. 4 The schematic of our basic sensor cell, following the principle of Cascode Voltage Switch Logic [19]



Fig. 5 The response of our basic sensor cell in two different input conditions (P='1', N='0') and (P='0', N='1')

For detecting parametric faults, a test method is associated with a test threshold. When the severity of the fault exceeds the test threshold, it is detected and called a severe fault in this paper. Otherwise, it may escape the test method and be called a minor fault. Our basic strategy is to detect severe faults and mask them while mitigating minor faults.

In this subsection, we analyze the fault-effect-mitigation ability of our differential TSV pair for minor faults. We use a single TSV for comparison. In this experiment, we consider a delay fault with the fault resistance increasing gradually from $1~\mathrm{k}\Omega$ to $10~\mathrm{k}\Omega$ on a TSV. As shown in Fig. 6, for the single TSV version, the TSV delay (from A to B) increases accordingly, e.g., from its fault-free delay of 226 ps to more than 1647 ps at $10~\mathrm{k}\Omega$. On the other hand, when the fault resistance is injected into one TSV of our differential TSV pair, the TSV pair delay (from A to the output of our sensor cell, denoted as B) increases as well, but at a slower rate than in the single TSV version.



Fig. 6 The delay of our differential TSV pair using the basic sensor cell versus the fault resistance of a delay fault

This indicates that our differential TSV pair can mitigate the fault effect to some extent.

#### 3.2 Dual-Railed Sensor Cell

The fault-effect-mitigation ability of our sensor cell can be further enhanced by turning the input stage into a dual-railed version as shown in Fig. 7. Consider a situation in the original sensor cell in which input N is undergoing a normal '1-to-0' falling transition, while input P is undergoing a slow '0-to-1' rising transition due to a delay fault. The basic sensor cell may not react very quickly since the input stage is high-active. It waits for the faulty input P to rise above the threshold voltage before it reacts. To fix this lagging reaction, we adopt a dual-railed input stage. In addition to the original high-active nMOS transistors, M5 and M7, we add two low-active pMOS transistors, M3 and M9. In this new version, instead of waiting for the slow-to-rise input P, the quick-to-fall input signal N drives the input stage, leading to an overall faster turnaround time for our sensor.

We run the fault-effect-mitigation experiment again with the new dual-railed sensor cell design. The basic single TSV delay and our two versions of differential TSV pair delays are compared in Fig. 8. It can be seen that when the single TSV experiences a delay of 1000 ps, the delay of our differential TSV pair improves from 880 ps to 562 ps by using the dual-railed sensor cell.

#### 3.3 Complete Test-and-Repair Wrapper

The complete test-and-repair wrapper for a differential TSV pair is shown in Fig. 9. Its operations can be described in the following aspects.



Fig. 7 Dual-railed sensor cell to speed up the signal propagation for half-faulty differential TSV pair





Fig. 8 The delay of our differential TSV pair using the dual-railed sensor cell in the presence of a delay fault

Fig. 9 Complete test wrapper for our differential TSV pair with test-and-repair ability

- 1. We use a toggle flip-flop (on the driver side) to convert a two-pulse signal, i.e., TP, into a single-pulse signal, i.e., T, which is then applied to both TSVs during the test mode. The detailed reason why this is needed can be found in the literature [8]. The resulting pulse width of the single-pulse signal is therefore equivalent to the system clock cycle time. For example, if the system clock signal is 1 GHz, then the pulse width of the test pulse in our test method will be 1 ns. When a delay fault on a TSV is so severe that such a pulse signal is prevented from reaching its termination end, the TSV is considered faulty.
- 2. At the termination end of each TSV, we use a status flip-flop to record the test result. Before the test session begins, all status flip-flops are initialized to '0'. During a test session, if a pulse signal survives the journey through a TSV and arrives at the clock port of the status flip-flop, a logic '1' will be registered to that status flip-flop, indicating a test result of "pass". Otherwise, the status flip-flop stays at '0' at the end of the test session to indicate a test result of "fail". As shown in the figure, we have two status flip-flops for each differential TSV pair, named PE and NE respectively, with the following meanings:

- If the native TSV has a severe fault, 'PE' will stay at its initial reset value of '0' after a test session. Otherwise, 'PE' will become '1' to indicate a fault-free or minor fault condition.
- If the complementary TSV has a severe fault, 'NE' will stay at its initial reset value of '0' after a test session. Otherwise, 'NE' will become '1' to indicate a fault-free or minor fault condition.
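The two wrapper elements described above can be mimicked with a small behavioral sketch: a toggle flip-flop that turns the two-pulse signal TP into a single pulse T, and a status flip-flop that captures '1' only if a pulse reaches its clock port. The sampled-waveform representation, the 0.25 ns sample step, and the function names are assumptions made for illustration; they are not part of the wrapper design itself.

```python
def toggle_ff(tp_samples, initial=0):
    """Toggle the output on every rising edge of the sampled clock input."""
    out, state, prev = [], initial, 0
    for sample in tp_samples:
        if prev == 0 and sample == 1:
            state ^= 1
        out.append(state)
        prev = sample
    return out


def status_ff(received_samples):
    """Return 1 if a rising edge reaches the clock port, else keep the reset value 0."""
    prev = 0
    for sample in received_samples:
        if prev == 0 and sample == 1:
            return 1
        prev = sample
    return 0


# Two narrow pulses whose rising edges are 1 ns apart (4 samples of 0.25 ns each).
TP = [0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
T = toggle_ff(TP)
PE = status_ff(T)                 # the pulse survives the native TSV
NE = status_ff([0] * len(T))      # the pulse vanished on the complementary TSV
print(T, PE, NE)
```

In this example T stays high for four samples, i.e., one system clock period, and the resulting (PE, NE) = (1, 0) marks the complementary TSV as severely faulty.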

#### 3.4 Fault Tolerant Sensor Cell

To utilize the test results to achieve fault tolerance in the presence of hard faults or severe faults, our sensor cell is modified as in Fig. 10, and turned into a so-called Fault-Aware (FA) Dual-Railed Sensor Cell. It means that once PE and NE can indicate the faulty or fault-free status of the TSV pair, this sensor cell can function correctly in the presence of hard or severe faults.

Fig. 10 Fault-Aware (FA) Dual-railed sensor cell for hard or severe half-faulty TSV pair

In this fault-aware sensor cell, the footer nMOS transistor used to implement the common current sink is not changed. The load portion is not changed either. However, the input-stage part that takes in the two signals from the native TSV (i.e., P) and the complementary TSV (i.e., N) is now expanded to incorporate the status signals from PE and NE. It can now be considered an input-stage network with four discharging paths, two on the left-hand side and two on the right-hand side. These two sides compete to become the dominating discharging force while pushing the other side to the opposite state.

In this input-stage network, we intend to use 'PE' and 'NE' as the gating signals of the 4 discharging paths. If 'PE' and 'NE' are both '1', two TSVs in a differential TSV pair are both fault-free. Under this condition, our fault-tolerant sensor cell will degenerate to its primitive dual-railed version. Otherwise, the differential TSV pair is half-faulty. It follows that either 'PE' is '0' or 'NE' is '0', assuming there is only one fault in each TSV pair. Next, we elaborate on the two conditions, respectively.

- 1. When 'PE' is '0' (faulty) and 'NE' is '1' (fault-free), it indicates that the native TSV is faulty and the complementary TSV is fault-free. Our fault-tolerant sensor cell degenerates to the form shown in Fig. 11. Since the native TSV is faulty, our sensor takes the signal from the complementary TSV, i.e., N, as the sole input. When N is logic '0', the right-hand-side discharging path dominates. On the other hand, when N is logic '1', the left-hand-side discharging path dominates. Each will produce a correct output response.
- 2. When 'NE' is '0' (faulty) and 'PE' is '1' (fault-free), it indicates that the complementary TSV is faulty and the native TSV is fault-free. Our fault-tolerant sensor cell degenerates to the form shown in Fig. 12. Since the complementary TSV is faulty, our sensor takes the signal from the native TSV, i.e., P, as the sole input. When P is logic '1', the right-hand-side discharging path dominates. On the other hand, when P is logic '0', the left-hand-side discharging path dominates. Each will produce a correct output response.

When the native TSV is identified as faulty but the complementary TSV is fault-free, PE is '0' and NE is '1'.

Fig. 11 Degenerate form of our fault-aware sensor cell when its differential TSV pair is half-faulty such that PE is '0' and NE is '1'

When the complementary TSV is identified as faulty but the native TSV is fault-free, NE is '0' and PE is '1'.

Fig. 12 Degenerate form of our fault-aware sensor cell when its differential TSV pair is half-faulty such that PE is '1' and NE is '0'
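At the logic level, the three cases discussed above reduce to a small selection rule. The sketch below is a purely behavioral abstraction with a hypothetical function name `fa_sensor`; it ignores the transistor-level current competition and only states which input decides output B for each (PE, NE) combination. The case in which both TSVs fail lies outside the single-fault assumption and is reported as an error here.

```python
def fa_sensor(p: int, n: int, pe: int, ne: int) -> int:
    """Behavioral output B of the fault-aware sensor cell for logic-level inputs."""
    if pe and ne:
        # Both TSVs passed the test: normal differential sensing of (P, N).
        if p == n:
            raise ValueError("P and N must be complementary when both TSVs are healthy")
        return 1 if p > n else 0
    if ne:
        # Native TSV failed: use N alone, so B is the complement of N.
        return 1 - n
    if pe:
        # Complementary TSV failed: use P alone.
        return p
    raise ValueError("both TSVs failed; not repairable under the single-fault assumption")


STUCK = 0  # value seen on a faulty TSV in this example (arbitrary, it is masked)
for a in (0, 1):
    p, n = a, 1 - a
    print(a, fa_sensor(p, n, 1, 1), fa_sensor(STUCK, n, 0, 1), fa_sensor(p, STUCK, 1, 0))
```

In all three configurations the output equals the transmitted bit A, which is the masking behavior the degenerate forms are intended to provide.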

#### 3.5 Coping Strategy for Bridging Faults

A bridging fault may occur between the native TSV and the complementary TSV of a signal as shown in Fig. 13. If we change the injected bridging fault resistance,  $R_{bridge}$ , from  $0\Omega$  to  $10~\mathrm{k}\Omega$  in a step size of  $100\Omega$ , the propagation delay of our differential TSV pair will change accordingly as depicted in Fig. 14. Our TSV pair fails to function normally when  $R_{bridge}$  is less than a threshold value of  $300\Omega$ . Note that a smaller  $R_{bridge}$  value means a more severe bridging fault.

Fig. 13 A differential TSV pair affected by a bridging fault

Fig. 14 Delay plot of a differential TSV pair with a bridging fault

To tolerate such a bridging fault, we resort to a signal shuffling technique, in which the 4 constituent TSVs of every 2 neighboring signals, denoted as A1 and A2, are placed physically in an interleaved manner as shown in Fig. 15. By doing so, a bridging fault is turned into two single faults in the sense that it only affects one constituent TSV of each of the 2 differential signals, and thus, their fault effects can now be masked by our scheme. In this case, we are shuffling the constituent TSVs of 2 signals to combat a bridging fault. If there is a cluster fault that bridges more than 2 physically neighboring TSVs, then the shuffling of a larger group of signals, e.g., a group of 4 signals, can be employed. In that case, the physical distance between the two constituent TSVs of each signal is made even larger. Thereby, a multiple-TSV bridging effect can be diluted into several single faults and tolerated individually.
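The effect of shuffling can be checked with a short sketch. It assumes the TSVs of a group are laid out in a single row, with all native TSVs of the group followed by all complementary TSVs; this particular ordering and the helper names are assumptions made for illustration and may differ from the physical arrangement in Fig. 15. The check confirms that a bridge across a window of adjacent TSVs touches each signal at most once, as long as the window is no larger than the group.

```python
def shuffled_layout(num_signals: int):
    """Row order of TSVs: all native TSVs first, then all complementary ones."""
    natives = [(s, "P") for s in range(1, num_signals + 1)]
    complements = [(s, "N") for s in range(1, num_signals + 1)]
    return natives + complements


def cluster_is_tolerable(layout, start: int, size: int) -> bool:
    """True if bridging `size` adjacent TSVs hits each signal at most once."""
    signals = [sig for sig, _ in layout[start:start + size]]
    return len(signals) == len(set(signals))


def all_clusters_tolerable(layout, size: int) -> bool:
    """Check every window of `size` physically adjacent TSVs in the row."""
    return all(cluster_is_tolerable(layout, i, size)
               for i in range(len(layout) - size + 1))


pair_group = shuffled_layout(2)   # signals A1 and A2 interleaved
quad_group = shuffled_layout(4)   # larger group for cluster faults
print(pair_group)
print(all_clusters_tolerable(pair_group, 2), all_clusters_tolerable(pair_group, 3))
print(all_clusters_tolerable(quad_group, 4))
```

Under this assumed ordering, a group of 2 signals tolerates any bridge between 2 neighbors but not one spanning 3 TSVs, whereas a group of 4 signals tolerates bridges spanning up to 4 neighbors.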

#### 3.6 Instant Test and Repair Flow

Two signal paths through the TSVs are mixed in this figure: (A1→B1) and (A2→B2)

Fig. 15 Two differential TSV pairs after the signal shuffling

Consider a 3D IC consisting of some stacked dice. The bottom one is considered the master die, where the Test-and-Repair Center is located. The Test-and-Repair Center issues basic logistic signals to all dice through 3D-IC test channels, including {TCLK, TRST}. Meanwhile, we assume a high-speed clock signal, e.g., 1 GHz, is also available on all dice. However, these clocks do not have to be synchronized among the dice. This is because our test method is asynchronous and each TSV is tested quite independently. Then, our overall test-and-repair flow can be performed in the following steps:

- 1. A low-frequency test clock signal TCLK, e.g., 10 MHz, is sent to all dice.
- 2. Originally, the test reset signal TRST is high, which forces the initialization of all flip-flops in both the driver-side wrappers and the receiver-side wrappers of all dice to their initial values, i.e., logic '0'.
- 3. Note that the two-pulse signal in each die can be generated by a simple test-pulse generator in each die. The first PV-test cycle triggers a two-pulse signal to all odd-numbered TSV pairs, and the second PV-test cycle triggers a two-pulse signal to all even-numbered TSV pairs. By this arrangement, the test pulses along physically neighboring TSVs are dispersed, because of our aforementioned signal shuffling technique. In other words, when a test pulse is launched along a TSV, all its neighbor TSVs will be grounded to create the most hostile environment to trigger a bridging fault if any exists.
- 4. It is worth emphasizing that at the end of the above two test clock cycles, all our sensor cells at the receiver ends are properly configured and ready to sense the incoming differential signal while tolerating fault effects along some TSVs.

In summary, our instant test and repair process takes only 3 test clock cycles, as illustrated in Fig. 16, including (1) the reset cycle, (2) the PV-Test-1 cycle for odd-numbered TSV pairs, and (3) the PV-Test-2 cycle for even-numbered TSV pairs. After that, the TSVs are ready to support normal operation.
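The three-cycle flow can be summarized by the following behavioral sketch. It assumes one boolean per TSV that says whether a test pulse survives it, and it uses a hypothetical function name `instant_test_and_repair`; the 10 MHz test clock is the example value given in the steps above.

```python
TCLK_MHZ = 10.0  # example test clock frequency from the text


def instant_test_and_repair(pulse_survives):
    """Run reset, PV-Test-1 (odd pairs), and PV-Test-2 (even pairs).

    pulse_survives: list of (native_ok, comp_ok) booleans, one per TSV pair.
    Returns the (PE, NE) status of every pair and the number of TCLK cycles used.
    """
    # Cycle 1: TRST initializes every status flip-flop to '0'.
    status = [[0, 0] for _ in pulse_survives]
    cycles = 1
    # Cycles 2 and 3: pairs are numbered from 1; odd-numbered pairs are pulsed first.
    for parity in (1, 0):
        for number, (native_ok, comp_ok) in enumerate(pulse_survives, start=1):
            if number % 2 == parity:
                status[number - 1] = [int(native_ok), int(comp_ok)]
        cycles += 1
    return [tuple(s) for s in status], cycles


status, cycles = instant_test_and_repair([(True, True), (False, True), (True, False)])
total_ns = cycles * 1e3 / TCLK_MHZ
print(status, cycles, total_ns)
```

With a 10 MHz test clock, the three cycles amount to 300 ns in this model, after which every sensor cell already holds its own (PE, NE) configuration.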



TP1 is the two-pulse signal driving odd-numbered differential TSV pairs; TP2 is the two-pulse signal driving even-numbered differential TSV pairs

Fig. 16 Our instant test and repair flow using only 3 test clock cycles

## 4 Simulation Results

#### 4.1 Layout and Area Overhead

We have implemented the proposed test-and-repair wrapper in a 90 nm CMOS process, divided into the driver-side wrapper and the receiver-side wrapper, with the following area estimates.

- 1. The driver-side wrapper includes 3 tri-state buffers, 1 tri-state inverter, 1 inverter, and 1 resettable flip-flop. The total area is estimated as 50.80 μm².
- 2. The receiver-side wrapper includes two 2-input AND gates, 2 resettable flip-flops, and one sensor cell. The full-custom layout of our sensor cell has an area of 39.34 μm² as shown in Fig. 17. All the nMOS transistors in this layout have a W/L ratio of (480 nm/100 nm). The 2 pMOS transistors in the dual-railed input stage have a W/L ratio of (1440 nm/100 nm), and the 2 pMOS transistors in the load portion have a W/L ratio of (420 nm/100 nm). The total area of the receiver-side wrapper is thus estimated as 77.44 μm².

The breakdown of the area overhead of our test wrappers is summarized in Table 2. The total area is $(50.8+77.44)=128.24~\mu m^2$. In comparison, the area of a boundary scan cell in this 90 nm CMOS process is $52.22~\mu m^2$. Considering that an interconnect requires two boundary scan cells (one at the driver side and the other at the receiver side), the area of the proposed wrappers is about $(128.24 - 2 \times 52.22)/(2 \times 52.22) \approx 22.8\%$ larger than that of a boundary scan test for a single TSV. In addition to the test wrapper, two TSVs are required for each binary signal.
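The area figures can be cross-checked with a few lines of arithmetic. The component areas below are copied from Table 2 and the boundary scan cell area from the text; only the variable names are introduced here.

```python
# Component areas in square micrometers, as listed in Table 2.
driver_parts = {
    "3 tri-state buffers": 23.28,
    "1 tri-state inverter": 9.88,
    "1 inverter": 2.12,
    "1 resettable FF": 15.52,
}
receiver_parts = {
    "2 AND2": 7.06,
    "2 resettable FFs": 31.04,
    "1 sensor cell": 39.34,
}
BOUNDARY_SCAN_CELL = 52.22  # area of one boundary scan cell in the same process

driver = sum(driver_parts.values())
receiver = sum(receiver_parts.values())
total = driver + receiver
reference = 2 * BOUNDARY_SCAN_CELL  # one cell on each side of the interconnect
overhead = (total - reference) / reference

print(f"driver = {driver:.2f}, receiver = {receiver:.2f}, total = {total:.2f} um^2")
print(f"overhead vs. two boundary scan cells = {overhead:.1%}")
```

The component sums reproduce the per-side totals of Table 2, and the relative overhead comes out at roughly 22.8%.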

#### 4.2 Circuit Model for Simulation

Fig. 17 Full-custom layout of our fault-aware dual-railed sensor cell in a 90 nm CMOS process

**Table 2** Area overhead of our test wrappers for supporting the proposed differential TSV pair in a 90 nm CMOS process

| Circuit       | Component            | Area (µm²) | Total area (µm²) |
|---------------|----------------------|------------|------------------|
| Driver-side   | 3 tri-state buffers  | 23.28      | 50.8             |
|               | 1 tri-state inverter | 9.88       |                  |
|               | 1 inverter           | 2.12       |                  |
|               | 1 resettable FF      | 15.52      |                  |
| Receiver-side | 2 AND2               | 7.06       | 77.44            |
|               | 2 resettable FFs     | 31.04      |                  |
|               | 1 sensor cell        | 39.34      |                  |

Ref: A boundary scan cell is 52.22 μm<sup>2</sup>

For performance estimation, 3 different types of circuit models as shown in Fig. 18 are used. It is worth mentioning again that the "delay" reported in this section is taken as the larger of the low-to-high propagation delay and the high-to-low propagation delay.

- 1. Figure 18(a) shows a basic single TSV (the bare-metal case) without any test or repair mechanism. We assume a simple RC model for the TSV [5], with a series resistance of 2 mΩ and a lumped capacitance of 242 fF. The delay from A to B in this bare-metal case is 236 ps.
- 2. Figure 18(b) shows 2 basic repairable TSVs using the traditional input/output switch-box-based repair scheme to bypass identified faulty TSV(s). The signal path from A1 to B1 passes through 3 MUXes. The first 2 MUXes from A1 to TSV1 are needed for supporting the test mode (when 'TEST' is 1) and the input switch box, while the last MUX from TSV1 to B1 is for supporting the output switch box. The delay from A1 to B1 in the functional mode is 436 ps. Compared to the bare-metal case, the extra delay is (436 ps - 236 ps) = 200 ps.
- 3. Figure 18(c) shows the proposed differential TSV pair. The delay from A to B is 363 ps during the functional mode.
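The fault-free delays of the three circuit models can be compared against the bare-metal baseline as follows. The three delay values are the ones quoted in the list above; the names are illustrative, and the 127 ps figure for the differential pair is simple arithmetic on those values rather than a number reported in the text.

```python
# Fault-free functional-mode delays in ps, as quoted for Fig. 18(a)-(c).
FAULT_FREE_DELAY_PS = {
    "bare-metal TSV": 236,
    "switch-box repair": 436,
    "differential pair": 363,
}


def extra_delay_ps(model, baseline="bare-metal TSV"):
    """Delay added by a repair scheme relative to the baseline model."""
    return FAULT_FREE_DELAY_PS[model] - FAULT_FREE_DELAY_PS[baseline]


if __name__ == "__main__":
    for model in FAULT_FREE_DELAY_PS:
        print(f"{model}: +{extra_delay_ps(model)} ps")
```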

For the evaluation of a resistive open fault, a resistance, $R_{\rm open}$, is inserted in series with the affected TSV model. The larger this resistance, the more severe the fault. For the evaluation of a resistive leakage fault, a resistance, $R_{\rm leak}$, connects the affected TSV to ground, i.e., logic '0'. The smaller this resistance, the more severe the leakage fault. Similarly, for the evaluation of a bridging fault, a resistance, $R_{\rm bridge}$, connects the two affected TSVs. The smaller this resistance, the more severe the bridging fault.
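To build intuition for why severity grows with $R_{\rm open}$ and shrinks with $R_{\rm leak}$, the following first-order sketch may help. It is not the SPICE setup used in this work: the open-fault estimate is a textbook single-pole RC approximation that ignores the driver and receiver, and the leakage estimate is a static resistive divider in which `r_driver_ohm` (the driver on-resistance) and the normalized supply `vdd` are hypothetical parameters. Only the TSV resistance and capacitance come from the RC model above.

```python
import math

C_TSV_F = 242e-15  # lumped TSV capacitance from the RC model (242 fF)
R_TSV_OHM = 2e-3   # series TSV resistance from the RC model (2 mOhm)


def open_fault_extra_delay_ps(r_open_ohm):
    """Rough extra 50% delay (ps) caused by R_open in series with the TSV.

    Single-pole estimate t50 = ln(2) * R * C; a trend indicator only.
    """
    return math.log(2) * r_open_ohm * C_TSV_F * 1e12


def leak_fault_high_level(r_leak_ohm, r_driver_ohm, vdd=1.0):
    """Static logic-'1' level on a TSV that leaks to ground through R_leak.

    r_driver_ohm is an assumed driver on-resistance, not a value from the text.
    """
    return vdd * r_leak_ohm / (r_driver_ohm + R_TSV_OHM + r_leak_ohm)


if __name__ == "__main__":
    for r in (100, 1000, 5000):
        print(f"R_open={r} ohm -> about +{open_fault_extra_delay_ps(r):.0f} ps")
    for r in (100, 1000, 5000):
        level = leak_fault_high_level(r, r_driver_ohm=1000)
        print(f"R_leak={r} ohm -> high level about {level:.2f} x Vdd")
```

The sketch only reproduces the direction of each trend; the detection thresholds and delays reported in the following subsections come from SPICE simulation of the full circuits.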

### 4.3 Analysis for Stuck-At Faults

If a stuck-at-0 fault occurs in our differential TSV pair but affects just one TSV, our test method detects this stuck-at fault during the test clock cycle and then reconfigures our fault-aware sensor cell to react to the other, fault-free TSV alone. As a result, our differential TSV pair can still function correctly. The delay from A to B is then 585 ps.
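The masking behavior can be illustrated with a purely logical model. This is a behavioral sketch under stated assumptions, not the transistor-level sensor cell: it assumes that the native TSV carries the bit and the second TSV carries its complement, and that the test step has already set a per-TSV fault flag. The function and argument names are hypothetical.

```python
def sensor_output(native, complement, native_faulty=False, complement_faulty=False):
    """Recover the transmitted bit from a differential TSV pair.

    Assumption: `native` carries the bit and `complement` carries its inverse.
    A TSV flagged as faulty is ignored, and the other one decides the output.
    """
    if native_faulty and complement_faulty:
        raise ValueError("both TSVs flagged faulty: the pair cannot be repaired")
    if native_faulty:
        return 1 - complement  # rely on the complementary TSV alone
    return native              # fault-free pair, or complementary TSV masked


if __name__ == "__main__":
    # Stuck-at-0 on the native TSV: the observed native value is always 0.
    for bit in (0, 1):
        received = sensor_output(native=0, complement=1 - bit, native_faulty=True)
        print(f"sent {bit}, received {received}")
```

The model captures only the logical outcome; the 585 ps delay in the half-faulty case is a property of the actual circuit and is not represented here.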







(b) Two TSVs using the traditional switch-box-based repair scheme. Notation: 'IS' is the input-switch-box control bit; 'OS' is the output-switch-box control bit.



Fig. 18 Circuit models used in our simulation for performance estimation

### 4.4 Performance Analysis for Resistive Open Faults

Figure 19 shows the delay plot of our half-faulty differential TSV pair with a resistive open fault occurring to the native TSV, based on SPICE simulation. The fault resistance is swept up to $10~\text{k}\Omega$ in steps of $100~\Omega$. In this experiment, we apply a test pulse width of 1000 ps. When the fault resistance of the faulty TSV exceeds 4.2 k$\Omega$, the fault is successfully detected by our test method and classified as a "severe fault". Otherwise, it is not detected and is classified as a "minor fault". When the fault is severe, the faulty TSV is "ignored" by our sensor cell, so the delay from A to B is determined by the fault-free TSV alone, with a delay time of 585 ps. It is worth mentioning that this is the maximum delay of our differential TSV pair no matter how large the fault resistance grows, since the faulty TSV has been "masked" out. In other words, our method enjoys a timing margin of at least (1000 ps - 585 ps) = 415 ps, regardless of the fault resistance.

Fig. 19 Delay plot of a half-faulty differential TSV pair with a resistive open fault

Fig. 20 Delay plot of a half-faulty differential TSV pair with a leakage fault

### 4.5 Performance Analysis for Leakage Faults

Figure 20 shows the delay plot of our half-faulty differential TSV pair with a leakage fault occurring to the native TSV, based on SPICE simulation. Here, a leakage resistance $R_{\rm leak}$ is inserted between the faulty native TSV and the ground node. The smaller $R_{\rm leak}$, the more severe the fault. We sweep $R_{\rm leak}$ from $0~\Omega$ to $10~\text{k}\Omega$ in steps of $100~\Omega$. We found that a leakage fault with an $R_{\rm leak}$ smaller than $900~\Omega$ is detected by our test method and regarded as a severe fault. A basic single TSV with a severe leakage fault will fail. In contrast, our differential TSV pair still functions correctly, with a delay of about 640 ps when there is a hard leakage fault to the ground.
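The severe/minor classification reported for open and leakage faults can be summarized as a simple threshold rule over the swept resistance. The two thresholds (4.2 kΩ and 900 Ω) and the 100 Ω step are taken from the text; starting the open-fault sweep at 0 Ω is an assumption, and the function names are illustrative. The rule restates the simulation outcome and does not model the detection circuit.

```python
OPEN_SEVERE_ABOVE_OHM = 4200  # open fault detected when R_open exceeds 4.2 kOhm
LEAK_SEVERE_BELOW_OHM = 900   # leakage fault detected when R_leak is below 900 Ohm


def is_severe(kind, r_ohm):
    """Return True if the fault is detected (severe) at a 1000 ps test pulse."""
    if kind == "open":
        return r_ohm > OPEN_SEVERE_ABOVE_OHM
    if kind == "leak":
        return r_ohm < LEAK_SEVERE_BELOW_OHM
    raise ValueError(f"unknown fault kind: {kind!r}")


def sweep(kind, stop_ohm=10_000, step_ohm=100):
    """Classify every point of a 0..stop_ohm sweep (start value assumed)."""
    return [(r, is_severe(kind, r)) for r in range(0, stop_ohm + 1, step_ohm)]


if __name__ == "__main__":
    for kind in ("open", "leak"):
        severe = [r for r, flag in sweep(kind) if flag]
        print(f"{kind}: {len(severe)} severe points, "
              f"from {severe[0]} to {severe[-1]} ohm")
```

Points classified as not severe are the ones the test does not flag; for those, the sensor cell's mitigation rather than masking determines the delay.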

### 4.6 Performance Analysis for Bridging Faults

Consider a bridging fault occurring to 2 TSVs. Two cases are studied as illustrated in Fig. 21(a) and (b).

In Fig. 21(a), signal A1 is transmitted to signal B1 via a single TSV, and signal A2 is transmitted to signal B2 via another TSV. These two functional TSVs are bridged. We analyze the worst-case delay from A1 to B1 under the influence of the bridging fault. In the figure, the two input patterns for A1 and A2 are also illustrated. In the first pattern, a "rising transition" is applied to signal A1, while signal A2 is set to logic '0' to induce a worst-case bridging effect. Then, we examine the waveform of signal B1 to determine the propagation delay, $\tau_{PLH}$, from A1 to B1. In the second pattern, a "falling transition" is applied to signal A1, while signal A2 is set to logic '1' to induce a worst-case bridging effect. Again, we examine the waveform of signal B1 to determine the propagation delay, $\tau_{PHL}$, from A1 to B1. The larger of $\tau_{PLH}$ and $\tau_{PHL}$ is reported as the final delay.
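The two worst-case patterns and the reporting rule can be written down compactly. The pattern definitions follow the description above; the helper names are illustrative, and the delay values in the usage example are placeholders rather than simulation results.

```python
# Worst-case bridging patterns: the aggressor A2 is held at the level that
# opposes the victim's transition on A1.
PATTERNS = (
    {"a1": "rise", "a2": 0, "measures": "tau_PLH"},
    {"a1": "fall", "a2": 1, "measures": "tau_PHL"},
)


def aggressor_level(victim_transition):
    """Static A2 level that opposes the given A1 transition."""
    if victim_transition not in ("rise", "fall"):
        raise ValueError("transition must be 'rise' or 'fall'")
    return 0 if victim_transition == "rise" else 1


def reported_delay_ps(tau_plh_ps, tau_phl_ps):
    """Final delay: the larger of the two propagation delays."""
    return max(tau_plh_ps, tau_phl_ps)


if __name__ == "__main__":
    # Placeholder delays, for illustration only.
    print(reported_delay_ps(tau_plh_ps=300.0, tau_phl_ps=350.0))
```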

In Fig. 21(b), signal A1 is transmitted to signal B1 via a proposed differential TSV pair, and signal A2 is transmitted to signal B2 via another proposed differential TSV pair. As mentioned previously, the signal shuffling technique is applied, so the bridging fault affects one TSV in each of the two signal paths, i.e., the two native TSVs in the figure. The two input patterns for A1 and A2 are the same as in Fig. 21(a).

Figure 22 shows the plot of the delay from A1 to B1 under the influence of the bridging fault. Again, we sweep $R_{\rm bridge}$ from $0~\Omega$ to $10~\text{k}\Omega$ in steps of $100~\Omega$. We found that, with a bridging fault whose $R_{\rm bridge}$ is smaller than $200~\Omega$, the single TSV fails completely. In contrast, our shuffled differential TSV pairs still function correctly, with a delay of at most 557 ps, regardless of the severity of the bridging fault.



(a) A single TSV (A1 $\rightarrow$ B1) affected by a bridging fault to a neighbor TSV (A2 $\rightarrow$ B2)



(b) A bridging fault between two of our differential TSV pairs with signal shuffling. A1-to-B1 is the victim path under consideration, and A2-to-B2 is the aggressor.

Fig. 21 Circuits in our simulation for the evaluation of the bridging fault effect



**Fig. 22** Delay plot of our differential TSV pair affected by a bridging fault, with the bridging fault resistance between two shuffled differential TSV pairs

### 4.7 Performance Comparison

In summary, assuming a system clock rate of 1 GHz, we test our TSVs with a pulse width of 1000 ps. Then, the maximum propagation delay of our differential TSV pair is limited to at most 585 ps, 640 ps, and 557 ps in the presence of a delay fault (resistive open), a leakage fault, or a bridging fault, respectively, regardless of the severity of the fault. In other words, the delay can be kept at or below 640 ps at all times using the proposed method, independent of the fault type. We thus enjoy a timing margin of (1000 ps - 640 ps) = 360 ps. This timing margin is due to the minor-fault mitigation of our fault-aware sensor cell.
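The margin calculation in this summary amounts to subtracting the worst per-fault delay from the test pulse width. The sketch below uses the three maximum delays and the 1000 ps pulse width quoted above; the names are illustrative.

```python
TEST_PULSE_WIDTH_PS = 1000  # test pulse width for a 1 GHz system clock

# Maximum delay of the differential TSV pair per fault type, in ps.
MAX_DELAY_PS = {
    "resistive open": 585,
    "leakage": 640,
    "bridging": 557,
}


def margins_ps(pulse_width_ps=TEST_PULSE_WIDTH_PS):
    """Timing margin for each fault type."""
    return {fault: pulse_width_ps - d for fault, d in MAX_DELAY_PS.items()}


def guaranteed_margin_ps(pulse_width_ps=TEST_PULSE_WIDTH_PS):
    """Margin that holds for every fault type (set by the slowest case)."""
    return min(margins_ps(pulse_width_ps).values())


if __name__ == "__main__":
    print(margins_ps())
    print(guaranteed_margin_ps())
```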

Next, we compare our method to the traditional switch-box-based repair scheme shown previously in Fig. 18(b). Consider the resistive open fault as an example. The overall delay across this traditional switch-box-based TSV structure grows with increasing $R_{\rm open}$ and peaks at 1188 ps when $R_{\rm open}$ is 5 k$\Omega$. Beyond that, the fault becomes severe and is detected. Then, the faulty TSV can be replaced by a fault-free one, and the signal path delay comes down to 436 ps after the repair process. In other words, the traditional switch-box-based repair scheme could limit the delay to only 436 ps when the fault is severe and detected. However, when the fault is minor and undetected, its delay could increase to an unexpected level as high as 1188 ps. In comparison, our method is much more resistant to the effects of minor faults.
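The contrast between the two schemes for a resistive open fault can be expressed as worst-case slack. The delay values are the ones quoted in this subsection; comparing them against a 1000 ps budget (the test pulse width at a 1 GHz clock) is an illustrative assumption about how the path would be timed, and the names are hypothetical.

```python
TIMING_BUDGET_PS = 1000  # assumed budget, equal to the 1000 ps test pulse width

# Worst delays under a resistive open fault, in ps, by fault severity.
OPEN_FAULT_DELAY_PS = {
    "switch-box repair": {"minor (undetected)": 1188, "severe (repaired)": 436},
    "differential pair": {"any severity": 585},
}


def worst_case_slack_ps(scheme, budget_ps=TIMING_BUDGET_PS):
    """Budget minus the largest delay the scheme can exhibit (negative = violation)."""
    return budget_ps - max(OPEN_FAULT_DELAY_PS[scheme].values())


if __name__ == "__main__":
    for scheme in OPEN_FAULT_DELAY_PS:
        print(f"{scheme}: worst-case slack {worst_case_slack_ps(scheme)} ps")
```

Under this assumption, an undetected minor fault leaves the switch-box scheme with negative slack, whereas the differential pair keeps a positive margin at every severity.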

A fair quantitative comparison of existing TSV repair methods is very challenging, if not impossible, considering that these methods are highly diverse in terms of their test, diagnosis, and repair schemes. Some of them can perform online Built-In-Self-Repair (BISR) entirely in hardware, while others need the support of software for identifying the faulty TSVs and for the subsequent repair information generation. Some of them require the storage of the repair information using on-chip nonvolatile memory, while others can perform BISR on the fly. Furthermore, the fault detection ability and repair ability are not the same from one method to another.

Overall, a test-diagnosis-repair protocol that covers all kinds of TSV faults is rather intimidating. The major contribution of this work is a fundamentally new way to simplify this complicated test-diagnosis-repair protocol to a level as easy as "plug-and-play". In other words, we can achieve instant online BISR for faulty TSVs using only "local wrappers in just a few test clock cycles". We believe this will be an appealing feature in situations where TSVs keep shrinking and using two TSVs to transmit a binary signal becomes a viable choice.

## 5 Conclusion

Comprehensive fault-tolerant TSV methods have been proposed in the literature. However, most of them require scattered hardware and sophisticated test-and-repair processes. For situations in which this difficulty needs to be avoided, we propose in this work an instant on-chip test and repair scheme for TSVs. It features a "differential TSV pair" for each binary signal to be transmitted. With the relentless decrease of the TSV pitch into the 10 µm realm enabled by recently emerging bonding techniques, this double-TSV redundancy could become affordable and cost-effective in some situations. In our differential TSV scheme, all test and repair activities are done instantly and locally with the TSV wrappers, each occupying an area of only 128.24 µm<sup>2</sup> in a 90 nm CMOS process for a TSV pair (or only 22.8% more than the 2 boundary scan cells used in a boundary scan test). Our differential fault-aware sensor design is the key to this scheme. It can mask severe faults and mitigate the timing penalty of minor faults at the same time. The maximum propagation delay across a TSV pair is limited to 640 ps regardless of the fault type and the fault severity under a system clock rate of 1 GHz.

**Funding** This work was sponsored in part by the Ministry of Science and Technology (MOST) of Taiwan under research grants 111-2221-E-007-115-MY3, and in part, by Power Semiconductor Manufacturing Corp. We also acknowledge the help of Taiwan Semiconductor Research Institute (TSRI) for providing the access to the EDA tools [4].

**Data Availability Statement** The datasets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.

## **Declarations**

**Competing Interests** The authors declare no competing interests.



## References

- 1. Chen S, Xu Q, Yu B (2019) Adaptive 3D-IC TSV fault tolerance structure generation. IEEE Trans Comput Aided Des Integr Circuits Syst 38(5):949–960
- 2. Chen MF, Lin CS, Liao EB, Chiou WC, Kuo CC, Hu CC, Tsai CH, Wang CT, Yu DCH (2020) SoIC for low-temperature, multilayer 3D memory integration. Proc of IEEE Electronic Components and Technology Conf. pp 855–860
- 3. Chi CC, Wu CW, Wang MJ, Lin HC (2013) 3D-IC TSV test, diagnosis, and repair. Proc of IEEE VLSI Test Symp. pp 1–6
- 4. EDA cloud Cell-based Flow. Taiwan Semiconductor Research Institute
- 5. Heller L, Griffin W, Davis J, Thoma N (1984) Cascode Voltage Switch Logic: A Differential CMOS Logic Family. Proc of Int'l Solid-State Circuits Conf. pp 16–17
- 6. Hsieh A, Hwang T, Chang M, Tsai M, Tseng C, Li HC (2010) TSV redundancy: architecture and design issues in 3D-IC. Proc of IEEE Design, Automation, and Test in Europe. pp 166–171
- 7. Huang YJ, Li JF (2012) Built-in self-repair scheme for the TSVs in 3-D ICs. IEEE Trans Comput Aided Des Integr Circuits Syst 31(10):1600–1613
- 8. Huang SY, Lee JY, Tsai KH, Cheng WT (2013) At-speed BIST for interposer wires supporting on-the-spot diagnosis. Proc of Int'l On-Line Test Symp. pp 67–72
- 9. Huang SY, Lee JY, Tsai KH, Cheng WT (2014) Pulse-vanishing test for interposers wires in 2.5-D IC. IEEE Trans Comput Aided Des Electron Circuits (TCAD) 33(8):1258–1268
- 10. Huang SY, Tsai MT, Zeng ZF, Tsai KH, Cheng WT (2015) General timing-aware built-in self-repair for die-to-die TSVs. IEEE Trans Comput Aided Des Integr Circuits Syst 34(11):1836–1846
- 11. Jeong J, Iizuka T, Nakura T, Ikeda M, Asada K (2011) All-digital PMOS and NMOS Process Variability Monitor Utilizing Buffer Ring with Pulse Counter. Proc of 16th Asia and South Pacific Design Automation Conference (ASP-DAC). Yokohama, Japan, pp 79–80
- 12. Jiang L, Xu Q, Eklow B (2012) On effective TSV repair for 3D-stacked ICs. Proc of IEEE Design, Automation, and Test in Europe. pp 6–11
- 13. Jiang L, Ye F, Xu Q, Chakrabarty K, Eklow B (2013) On effective and efficient in-field TSV repair for stacked 3D-ICs. Proc of Design Automation Conf. pp 1–6
- 14. Kang U, Chung H, Heo S, Park D, Lee H, Kim J, Ahn S, Cha S, Ahn J, Kwon D et al (2010) 8 GB 3-D DDR3 DRAM using Through-Silicon-Via Technology. IEEE J Solid-State Circuits 45(1):111–119
- 15. Liang H, Li D, Yang Z, Ni T, Huang Z, Jiang C (2021) An N:1 single-channel TDMC Fault-tolerant technique for TSVs in a 3D-ICs. Proc of Int'l Test Conf. in Asia. pp 1–5
- 16. Liang SW, Wu G, Yee KC, Wang CT, Cui JJ, Yu DCH (2022) High-Performance and Energy Efficient Computing with Advanced SoIC Scaling. Proc of IEEE Electronic Components and Technology Conf. pp 1090–1094
- 17. Loi I, Mitra S, Lee T, Fujita S, Benini L (2008) A low-overhead fault tolerance scheme for TSV-based 3D network on chip links. Proc of Int'l Conf. on Computer-Aided Design. pp 598–602
- 18. Nicolaidis M, Pasca V, Anghel L (2012) Through-Silicon-Via Built-In Self-Repair for Aggressive 3D integration. Proc of IEEE Int'l On-Line Testing Symp. (IOLTS). pp 91–96
- 19. O'Brien PRO, Savarino TL (1989) Modeling the Driving-Point Characteristic of Resistive TSV for Accurate Delay Estimation. Proc of Design Automation Conf. pp 512–515
- 20. Reddy RP, Acharyya A, Khursheed S (2017) A cost-effective fault tolerance technique for functional TSV in 3-D ICs. IEEE Trans VLSI Syst 25(7):2071–2080
- 21. Wang S, Tahoori MB, Chakrabarty K (2016) Thermal-aware TSV repair for electromigration in 3D ICs. Proc of Design Automation & Test in Europe Conf. (DATE). pp 1291–1296
- 22. Wang Q, Liu Z, Jiang J, Jing N, Sheng W (2019) A new cellular-based redundant TSV structure for clustered faults. IEEE Trans VLSI Syst 27(2):458–467
- 23. Zhao Y, Khursheed S, Al-Hashimi BM (2015) Online fault tolerance technique for TSV-based 3-D-IC. IEEE Trans VLSI Syst 23(8):1567–1571


**Ching-Yi Wen** received her B.S. degree in Engineering and System Science from National Tsing Hua University, Taiwan, in 2020, and her M.S. degree in Electrical Engineering from National Tsing Hua University, Taiwan in 2023. Her research interests include VLSI design and TSV testing and repair.

**Shi-Yu Huang** received his B.S. and M.S. degrees in Electrical Engineering from National Taiwan University in 1988 and 1992, respectively, and his Ph.D. degree in Electrical and Computer Engineering from the University of California, Santa Barbara, in 1997. He has been on the faculty of the EE Dept., National Tsing Hua University, Taiwan, since 1999. His research interests include VLSI design, automation, and testing, with a current emphasis on All-Digital Phase-Locked Loop (ADPLL) design and its application in parametric fault testing and monitoring in 3D ICs. Dr. Huang is an IEEE Senior Member and is currently serving as an Associate Editor of IEEE Trans. on Emerging Topics in Computing.

