Chapter 15. Electromigration (EM) Reliability Analysis

15.1 Introduction to EM Reliability Analysis

The electrical analysis flows described in previous chapters pertain to the correct functionality of the (tapeout) design database. An additional analysis requirement is to ensure that the design will satisfy product lifetime reliability requirements. The product lifetime is typically denoted as “N power-on hours per year for Y years.” Often, the package engineering team adds the consideration “M power cycles per year” for its analysis of the mechanical stress on the die/package attach technology due to thermal cycling.

15.1.1 Design Robustness and Reliability

There are two classes of SoC design specifications to address for product lifetime operation: (1) robust design, in the presence of circuit changes due to parameter drift or noise sources, and (2) reliability, due to failure mechanisms.

Robust design refers to the steps taken to maintain functionality during the operational lifetime. For example, the SoC is subject to high-energy incident particles, either from extraterrestrial sources or radioactive decay of (trace) materials in electronic packages. (Since the initial identification of radioactive decay alpha particles as a major source of soft errors in DRAM modules, package and die attach materials have been modified, reducing this particle flux substantially.) These particles traversing through the die may result in a “collision” and the generation of free electrons and holes. A sufficient concentration of particle collision-related free charge near a sensitive circuit node may disrupt a stored logic value. The soft error rate (SER) in the SoC circuitry can be minimized through robust circuit and layout design so that the magnitude of free charge collection does not cause a flip in stored value (see Section 16.3).

Section 4.2 highlights device-level operational effects that result in parameter drift and device current degradation (e.g., hot carrier injection, bias temperature instability). The SoC methodology team needs to evaluate these device model impacts and establish an appropriate design strategy to maintain functionality. In addition, analog mixed-signal IP needs to ensure proper behavior in response to device noise mechanisms (e.g., thermal, shot, and flicker noise).

Design robustness addresses maintaining the correct circuit response to electrical disruption during operation. SoC reliability relates to wearout mechanisms, typically associated with material stresses that result in (significant) changes in material properties or mechanical fractures. As mentioned earlier, thermal cycling exerts stress on the die and package attach materials due to coefficient of thermal expansion differences; over time, a material fracture may result in a (catastrophic) failure. The major reliability issue within the die relates to electromigration (EM). This chapter briefly reviews EM as a failure mechanism and techniques to evaluate the related die failure rate.

15.1.2 MTTF and FIT Rate Specification

The end customer market for the SoC design defines the acceptable chip failure rate specification over the product lifetime. The SoC methodology and package engineering teams need to demonstrate that these specifications will be met, whether for consumer, automotive, medical, or aerospace applications. There are two common metrics to represent SoC reliability: the mean time to failure (MTTF; this acronym is also used for median time to failure) and the FIT rate.

Mean Time to Failure

Given a population of parts, the MTTF is the expected (mean) time until a part no longer meets the functional specifications. Specifically, if the failure probability density function is denoted as f(t), the expression for the MTTF is given in Figure 15.1.

\[ \mathrm{MTTF} = \int_{0}^{\infty} t \, f(t) \, dt \]

Figure 15.1 MTTF equation, in terms of the failure probability density function, f(t).

The failure probability function is related to several other metrics:

  • The reliability function, R(t)—R(t) is also referred to as the survival function, as it is the probability of the part surviving until time t:

    R(t) = 1 - F(t)    (Eqn. 15.1)

    where F(t) is the integral of the failure probability density function, f(t), from 0 to t; that is, F(t) is the cumulative distribution function (CDF) associated with f(t).

  • The hazard rate, h(t)—The hazard rate at time t is the (conditional) probability of failure among the remaining part population that has survived to time t. Mathematically, the hazard rate is represented by:

    h(t) = f(t) / (1 - F(t)) = f(t) / R(t)    (Eqn. 15.2)

  • The failure rate—In general, the failure rate is the probability that a product will fail at time t among the remaining product population, and the broadest definition includes repaired products that remain in the population. As a failed SoC is not repairable, the hazard rate and failure rate are equivalent. The term failure rate is used in the subsequent discussion, denoted as h(t); the notation f(t) refers to the failure probability density function as opposed to the failure rate.
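The relationships among f(t), F(t), R(t), and h(t) can be illustrated with a minimal Python sketch. It assumes a two-parameter Weibull distribution purely for illustration; the function names and the characteristic life value are hypothetical and are not taken from any PDK.

```python
import math

def weibull_pdf(t, shape, scale):
    """Failure probability density f(t) for a Weibull distribution."""
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def weibull_cdf(t, shape, scale):
    """Cumulative failure probability F(t)."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def reliability(t, shape, scale):
    """Survival function R(t) = 1 - F(t)."""
    return 1.0 - weibull_cdf(t, shape, scale)

def hazard_rate(t, shape, scale):
    """Hazard rate h(t) = f(t) / R(t)."""
    return weibull_pdf(t, shape, scale) / reliability(t, shape, scale)

SCALE_HOURS = 1.0e6  # hypothetical characteristic life, in hours

for t in (1.0e3, 1.0e4, 1.0e5):
    # shape = 1 gives a constant hazard rate; shape > 1 gives a rising (wearout) rate
    print(t, hazard_rate(t, 1.0, SCALE_HOURS), hazard_rate(t, 3.0, SCALE_HOURS))
```

With shape = 1 the hazard rate reduces to 1/scale at every time point, which is the constant failure rate case used later for FIT calculations.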

Failure rate data are measured from a sample and thus represent an estimate for the total population. For reliability calculations due to electromigration, the sample data are derived from fabricated testsites that allow specific failure mechanisms to be isolated. These reliability data are included as part of the foundry PDK release.

The mean time to failure measure is often replaced by the median time to failure (also, confusingly, denoted MTTF), which is the time at which the cumulative failure probability reaches 0.5, as defined by the integral in Figure 15.2.

\[ 0.5 = \int_{0}^{\mathrm{MTTF}} f(t) \, dt \]

Figure 15.2 Definition of the median time to failure (also denoted as MTTF). When discussing reliability failure calculations, it is important to clarify which interpretation of MTTF is being referenced.

The median time to failure is perhaps a more meaningful metric, given the very asymmetric nature of the typical failure rate function for microelectronics. Figure 15.3 depicts a bathtub curve to illustrate the SoC population failures over time. (Note that infant fails are ideally screened and removed by burn-in stress testing, as discussed in Section 21.1.)

A bathtub curve shows the failure rate function for microelectronics.

Figure 15.3 Illustration of a typical “bathtub curve” failure rate function, h(t). Assuming that infant fails are screened, the part population failure rate is relatively constant until the wearout region.

The probability density function f(t) associated with the bathtub curve failure rate h(t) is skewed. A composite Weibull distribution is often used, with shape = 1 for the part of the population representing the constant failure rate and shape > 1 for the wearout region. The mean value of the distribution is significantly larger than the median due to the wearout region. As a result, the median time to failure (assuming a constant failure rate during the useful product lifetime) is more informative than the mean time to failure. Due to the nature of statistical sampling, there is also a confidence level typically associated with the median time to failure specification.

FIT Rate

The failure in time (FIT) rate is the reliability metric that is more commonly used, as it is numerically straightforward to apply it to a large SoC product volume with the power-on-hours lifetime specification. The FIT value is an estimate of the number of failing parts in 10**9 hours of operation accumulated across the entire part population. The 10**9 hours is a de facto standard for reference. Mathematically, the FIT value and MTTF are related, as follows.

Assume a constant failure rate, h(t), over the useful life of the population, neglecting the wearout region. The FIT rate is equal to ((10**9) * h(t)). For example, if the estimated SoC failure rate is h(t) = 3.5 * (10**-8) failures/hour, the FIT rate for the part is 35. (Note that a single constant failure rate assumes that the part is consistently operating with a device junction temperature equal to Tmax; if the temperature profile over the operating lifetime is known to differ from Tmax, a composite calculation using multiple failure rates may be more appropriate.)
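The FIT arithmetic above can be sketched in a few lines of Python. The fleet size and power-on-hours values are hypothetical, chosen only to show how a FIT value translates into an expected number of field failures; the linear estimate is reasonable only while the cumulative failure fraction remains small.

```python
HOURS_PER_FIT = 1.0e9  # FIT counts failures per 10**9 device-hours

def fit_from_failure_rate(failures_per_hour):
    """Convert a constant failure rate h(t) into a FIT value."""
    return failures_per_hour * HOURS_PER_FIT

def expected_failures(fit, units, power_on_hours_per_year, years):
    """Expected failures for a population, given accumulated device-hours."""
    device_hours = units * power_on_hours_per_year * years
    return fit * device_hours / HOURS_PER_FIT

fit = fit_from_failure_rate(3.5e-8)  # the example rate from the text
# Hypothetical product: one million units, 5000 power-on hours/year, 10 years
print(fit, expected_failures(fit, 1_000_000, 5000, 10))
```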

For a constant failure rate, the survival function R(t) is a very slowly decaying exponential. The R(t) and MTTF are illustrated in Figure 15.4.

Equations of R of t, F of t, and f of t for a constant failure rate is shown.

Figure 15.4 Illustration of the survival function, R(t), and the failure probability density function, f(t), for a microelectronic part with a constant failure rate during the operational lifetime.
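For reference, the standard constant-failure-rate relationships can be written out as follows. This is a sketch of the textbook exponential model, not a transcription of Figure 15.4; the symbol λ is introduced here as shorthand for the constant value of h(t).

```latex
\begin{aligned}
R(t) &= e^{-\lambda t}, \qquad F(t) = 1 - e^{-\lambda t}, \qquad f(t) = \lambda e^{-\lambda t} \\
\text{mean time to failure} &= \int_{0}^{\infty} t \, \lambda e^{-\lambda t} \, dt = \frac{1}{\lambda} \\
\text{median time to failure} &= \frac{\ln 2}{\lambda} \\
\mathrm{FIT} &= 10^{9} \, \lambda \quad \Rightarrow \quad \text{mean time to failure} = \frac{10^{9}}{\mathrm{FIT}} \ \text{hours}
\end{aligned}
```

For the FIT = 35 example, the mean time to failure under this model is roughly 2.9 * (10**7) hours, which illustrates why R(t) decays so slowly over a typical product lifetime.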

The motivation for the discussion of both failure rate and MTTF is that the key electromigration mechanism is described in terms of MTTF, whereas SoC specifications commonly use the FIT value. Product developers add the FIT rates of individual components to calculate the overall expected failures.

Note that there are additional reliability metrics used when referring to a (repairable) end product outside the field of microelectronics. The mean time between failures (MTBF) is a probabilistic estimate of the time between successive failures of a product. The mean time to repair (MTTR) is an estimate of the resources required to return a product to functional use. These metrics are evaluated by product developers when estimating service costs and maintenance (downtime) schedules, including (possibly preventive) replacement strategies for field-repairable units (e.g., swapping an existing printed circuit board with a new replacement). The (non-repairable) SoC MTTF or FIT specifications are incorporated into the overall product MTBF calculation.

These SoC reliability projections are inherently statistical approximations, as the corresponding failure mechanisms are probabilistic in nature; this is certainly the case for electromigration. As discussed in the remainder of this chapter, the goal is to establish current density limits for metal interconnects, vias, and contacts such that the total failure rate for the entire SoC due to electromigration does not exceed the reliability target. SoC electrical data from other flows are analyzed to compare actual current densities against these limits.

15.1.3 Sum of Failure Rates

The typical reliability model used for a large electronic system is the sum of failure rates.[1] This model is based on the following assumptions:

  • Each failure rate is independent (at least up until the first failure occurrence).

  • The first failure among the independent mechanisms represents the failure of the system.
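Under these two assumptions, and with constant individual failure rates, the system survives only if every mechanism survives, so the system failure rate is the sum of the individual rates. A minimal sketch follows; the three rate values used in the example are hypothetical.

```python
import math

def system_failure_rate(rates_per_hour):
    """Sum-of-failure-rates model: independent mechanisms, first failure is fatal."""
    return sum(rates_per_hour)

def system_reliability(rates_per_hour, t_hours):
    """Series system: product of the individual survival probabilities."""
    r = 1.0
    for lam in rates_per_hour:
        r *= math.exp(-lam * t_hours)
    return r

rates = [1.0e-9, 2.0e-9, 0.5e-9]  # hypothetical per-mechanism failure rates (1/hour)
total = system_failure_rate(rates)
# The product of exponentials matches a single exponential with the summed rate
print(total, system_reliability(rates, 1.0e5), math.exp(-total * 1.0e5))
```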

The failure rate data measured experimentally by the foundry are based on the testsite population, using elevated current density and temperature to accelerate the electromigration mechanisms. Using a key relationship (i.e., Black’s equation), the measured failures can be scaled to calculate the operational MTTF and FIT with corresponding current density limits. Note that the definition of “a failure” in testsite measurement data subsequently needs to be applied to the analysis of the electrical model of the SoC; this is discussed further in the next section.

15.2 Fundamentals of Electromigration

The metals used in the fabrication of wires, contacts, and vias are subject to several forces during operation:

  • Electric field—The electric field across a metal segment results in electron motion and current; however, there is no significant local net charge concentration, and thus there is no electric force on metal atoms.

  • Electron momentum transfer—The electric current in the metal gives momentum to electrons. An electron collision with a metal atom/ion imparts a force on the atom, which may result in a permanent displacement of the atom, potentially moving it to a lattice vacancy. Metal atom vibrations in their lattice position increase with temperature; thus, the collision probability is strongly dependent on temperature. The net flux of metal atoms due to the electron collision force over the lifetime of the SoC is denoted as electromigration.

  • Thermal gradients—The current in metal wires results in resistive power dissipation. The Joule self-heating energy due to the ((I**2)*R) dissipation results in a temperature rise in the metal and thermal gradients throughout the local neighborhood. The Joule heating temperature increase (over the reference die substrate) has a multi-faceted impact:

    • Electron-collision migration is accelerated for the wire.

    • Thermal self-diffusion (thermo-migration) moves metal atoms to vacancies.

      Although the thermo-migration flux is relatively small, it may result in additional paths for electron collision-driven atom displacements, accelerating electromigration.

    • The temperature of neighboring structures increases, based on the relative thermal resistances between adjacent wires and the substrate.

    • The temperature coefficient of resistance results in higher R, adding to interconnect delays.

The deposition and patterning of metal wires has characteristics that enable migration. The metal volume is composed of grains with abutting grain boundaries and contains lattice vacancies, as mentioned earlier. As depicted in Figure 15.5, the electron momentum imparted to a metal atom is in the direction opposite the current flow. Metal displacement results in accumulation (or hillocks) toward the anode, and the cathode end of the wire is subject to the nucleation and subsequent growth of voids. The paths for metal migration include vacancies, grain boundaries, and the area along the surrounding metal/dielectric material interface.

A figure shows the direction of electron flow and current flow due to metal atom migration.

Figure 15.5 Metal atom migration due to current flow. The atom flux is in the opposite direction of the current flow as a result of electron momentum collisions. Examples of metal grains, hillocks, and voids are depicted.

Fabrication process development has focused on reducing the diffusivity of metal atoms along these paths:

  • Wire and via metallurgies have evolved, from Al to AlCu to Cu.

  • Wire patterning using damascene technology includes thin trench metal barrier and seed growth layers, plus capping layers, with lower interface diffusivity.

  • Local device-level and contact metallurgies have evolved, and refractory metals are increasingly being used.

  • Metal grain structures and size have received considerable process development focus.

If the grain size and orientation can effectively span the cross-section of the wire, the overall cathode-to-anode diffusivity is reduced.

15.2.1 Black’s Equation and Blech Length

The study of electromigration in integrated circuits began in the 1960s. Subsequent research has expanded the understanding of the metal migration mechanisms, the influence of bidirectional versus unidirectional current, and the influence of circuit-level device self-heating on the thermal profiles of local metals. The original EM investigations resulted in a model known as Black’s equation for the median time to failure due to electron collisions that has continued to be used in the subsequent decades (see Figure 15.6).

\[ \mathrm{MTTF} = A \, j^{-n} \exp\!\left( \frac{E_A}{kT} \right) \]

Figure 15.6 Black’s equation for the median time to failure due to electromigration.

In the equation, the factor A is an empirical constant, j is the current density in the wire, and EA is the activation energy for metal ion displacement. The exponent n depends on the relative contribution of void nucleation (n ~ 2) followed by void growth (n ~ 1). For AlCu wires, the migration kinetics typically result in using n = 2 as the best fit to Black’s equation; for Cu wires, the kinetics favor n = 1. The key characteristics are the strong dependency of measured failures on current density and temperature. As mentioned in the previous section, this equation enables the fitting of accelerated test measurement data, which can then be applied to operating MTTF targets, as shown in Figure 15.7.

An equation for scaling measured failure data at accelerated conditions is shown.

Figure 15.7 Measured failures on silicon testsites utilize accelerated stress conditions. The MTTF data is subsequently scaled back to the SoC operating environment using Black’s equation.

The general modeling assumption is that equation coefficients A and n are independent of temperature; the ratio of the activation energy to kT represents the temperature dependence.
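The scaling from accelerated stress conditions to operating conditions can be sketched as follows, assuming the standard form of Black's equation, MTTF = A * j**(-n) * exp(EA / kT), with A and n independent of temperature as stated above. The stress and use conditions, activation energy, and exponent in the example are hypothetical values, not foundry data.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def black_mttf(a, j, n, ea_ev, temp_k):
    """Black's equation: median time to failure for current density j at temp_k."""
    return a * j ** (-n) * math.exp(ea_ev / (BOLTZMANN_EV_PER_K * temp_k))

def acceleration_factor(j_stress, t_stress_k, j_use, t_use_k, n, ea_ev):
    """Ratio MTTF_use / MTTF_stress; the empirical constant A cancels out."""
    current_term = (j_stress / j_use) ** n
    thermal_term = math.exp(
        (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k)
    )
    return current_term * thermal_term

# Hypothetical conditions: stress at 300 C and 2x current density, use at 105 C
af = acceleration_factor(2.0e6, 573.15, 1.0e6, 378.15, n=1.0, ea_ev=0.9)
mttf_stress_hours = 200.0  # hypothetical measured median time to failure
print(af, mttf_stress_hours * af)
```

Because A cancels in the ratio, only n and EA need to be fitted from the testsite data in order to project a stress measurement to the operating environment.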

Subsequent experimentation demonstrated that there is a substantial reversal of the metal atom displacement flux due to electron collisions when the wire is subject to a (high-frequency) bidirectional current, as would be the case for signal interconnections with moderate-to-high switching activity. A typical model for the MTTF for wires with bidirectional currents is shown in Figure 15.8.

A figure shows the black’s equation for bidirectional currents and a model for MTTF for wires.

Figure 15.8 MTTF model incorporating a healing factor to reflect the reversal of atom displacement for bidirectional currents in the metal.

In this revised model, the healing factor increases the allowable (reliability-limited) current density for bidirectional currents. As a result, signal interconnects between cells are analyzed using different criteria than are used for unidirectional (power rail and cell internal) metals.

Another experimental observation on EM failures noted a wire-length dependence. For test structure wires below a certain length, the failure rate dropped dramatically. As metal atoms are displaced, compressive stress in the metal is present at the anode, and tensile stress exists at the cathode. For short-length wires, these additional material stresses result in a force on metal atoms that counteracts the electron collision flow. The Blech length relationship is used to define metal segments for which significantly higher current density limits are applied. (The definition of an EM fail is discussed shortly.) The Blech length expression is ((j * LBlech) < process_limit), specific to each metal layer. The dielectric materials (and damascene barrier metal layers) influence the Blech length, as these interfaces also contribute to the metal stresses.
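The Blech length test reduces to a comparison of the product of current density and segment length against a per-layer limit. A minimal sketch is shown below; the function name, units, and the numerical limit are hypothetical, and the actual limit comes from the foundry PDK for each metal layer.

```python
def within_blech_limit(j_a_per_cm2, length_cm, process_limit_a_per_cm):
    """Return True when (j * L) is below the per-layer Blech product limit."""
    return j_a_per_cm2 * length_cm < process_limit_a_per_cm

HYPOTHETICAL_LIMIT = 3000.0  # A/cm, illustrative only

# 10 um and 50 um segments carrying the same current density
print(within_blech_limit(1.0e6, 10e-4, HYPOTHETICAL_LIMIT))
print(within_blech_limit(1.0e6, 50e-4, HYPOTHETICAL_LIMIT))
```

In this example, the shorter segment qualifies for the higher current density limit, whereas the longer segment does not.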

15.2.2 JDC, JAC, JRMS, and JPEAK

The electromigration reliability of wires, contacts, and vias is analyzed using four factors:

  • Unidirectional DC (average) current density, jDC or jAVG

  • Bidirectional current density (with healing), jAC

  • The wire temperature increase due to resistive Joule heating, represented for EM analysis by jRMS

  • Peak current density, jPEAK
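As an illustration of how three of these quantities could be derived from a simulated current waveform, the sketch below computes average, RMS, and peak current density from uniformly spaced samples over one period. The function name and units are assumptions; in particular, the foundry definition of the jPEAK measurement (Figure 15.11) may differ from the simple maximum used here, and jAC requires the healing model discussed below.

```python
import math

def current_density_metrics(samples_a, width_um, thickness_um):
    """Return (j_avg, j_rms, j_peak) in A/um^2 for uniformly spaced current samples."""
    area_um2 = width_um * thickness_um  # wire cross-section
    n = len(samples_a)
    i_avg = sum(samples_a) / n
    i_rms = math.sqrt(sum(i * i for i in samples_a) / n)
    i_peak = max(abs(i) for i in samples_a)
    return i_avg / area_um2, i_rms / area_um2, i_peak / area_um2

# Hypothetical waveform: a single 2 mA pulse occupying one quarter of the period
print(current_density_metrics([0.0, 2.0e-3, 0.0, 0.0], width_um=0.1, thickness_um=0.2))
```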

Unidirectional DC (Average) Current Density and Bidirectional Current Density (with Healing)

Using Black’s equation, the calculated jDC and jAC current density for all SoC wires, vias, and contacts are compared to the foundry PDK current limit to scale the PDK failure rate. The PDK provides the following:

  • jDC— the reference PDK current limits published for all metal and via layers

    • Blech length current multiplier for metal layers, using a B(Lwire, Wwire) relationship.

    • Via current limit multiplier, based on the Blech length of the connected metal layers n and n+1

  • jAC—healing factor per layer applied to jDC limits (see Figure 15.9)

    A graph shows a curve for the calculation for bidirectional current in a wire.

    Figure 15.9 Example of the calculation of jAC for a bidirectional wire current.

  • FIT value for the reference jDC current limit

Each SoC metal segment has a FIT value. The current density in the segment is compared to the PDK reference to scale the published FIT using Black’s equation. The jAC calculated for the bidirectional current density in the metal segment is submitted to the jDC failure rate calculation; the healing factor is assumed to be a constant over the typical range of signal switching frequencies. As there is an assumption of complete signal net charge/discharge for each transient, the integrals of the two currents shown in Figure 15.9 are (by definition) equal.
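A sketch of the per-segment scaling is given below, under two loudly stated assumptions. First, the failure rate is taken to scale inversely with the Black's-equation MTTF, so the FIT scales as (j / j_ref)**n at the reference temperature. Second, the effective bidirectional current density is modeled as the forward average minus a healing factor times the reverse average; this is a commonly used formulation, and the model in Figure 15.8 and the PDK may differ. All names and values are hypothetical.

```python
def effective_jac(j_forward_avg, j_reverse_avg, healing_factor):
    """Assumed healing model: reverse current partially undoes forward atom flux."""
    return j_forward_avg - healing_factor * j_reverse_avg

def scale_fit_by_current(fit_ref, j, j_ref, n):
    """Scale the PDK reference FIT, assuming failure rate ~ 1/MTTF ~ j**n."""
    return fit_ref * (j / j_ref) ** n

# Full charge/discharge of a signal net: forward and reverse averages are equal
j_eff = effective_jac(1.0, 1.0, healing_factor=0.8)  # hypothetical healing factor
print(j_eff, scale_fit_by_current(fit_ref=1.0, j=j_eff, j_ref=1.0, n=2))
```

With equal forward and reverse averages, the effective density collapses to (1 - healing_factor) times the forward average, which is why signal wires tolerate a much higher raw current density than power rails.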

Wire Temperature Increase Due to Resistive Joule Heating

The development of a self-consistent model for the wire jRMS, the local temperature profile, and EM reliability is difficult due to the interdependencies listed earlier, at the beginning of Section 15.2. A typical approach is to:

  1. Estimate the jRMS current in the wire (as discussed in subsequent sections).

  2. Utilize a (pre-characterized) model from the foundry for the wire temperature increase due to jRMS self-heating for the wire metal layer.

  3. Add the thermal contribution from local device power dissipation to the wire temperature.

The pre-characterized foundry model for the temperature increase due to jRMS is specific to each BEOL metallization stack, with the corresponding metal/dielectric layer thicknesses and thermal resistances.

For FinFET and FD-SOI device technologies, the thermal flow to the substrate is reduced over planar devices; an increasing fraction of the active device thermal energy flows into the metallization stack. The foundry provides the models for the (three-dimensional) temperature increase to neighboring wires due to device dissipation. The EDA vendor providing the EM analysis tool links the circuit-level data from the power dissipation analysis flow to these ΔT tables in the PDK to calculate the wire temperature adder due to device self-heating.

The “steady-state” wire temperature is the sum of the jRMS Joule self-heating ΔT, the ΔT from device self-heating, and the substrate temperature (e.g., Tsubstrate = Tjunction_max). If a more accurate die thermal map is available from the power analysis flow, the values from that map could be used instead of a single substrate temperature when calculating the wire temperature.

The calculated wire temperature would be used to further scale the wire failure rate from the foundry PDK data. This method is depicted in Figure 15.10. The calculated wire temperature combining the self-heating, device dissipation, and die substrate temperature factors would be compared to the foundry PDK reference temperature, and a failure rate multiplier would be determined for the wire using Black’s equation for the wire metal layer.

A graph compares the scaling factor against the temperature.

Figure 15.10 Illustration of the scaling multiplier for the failure rate of a specific wire, based upon the calculated wire temperature.
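The wire temperature sum and the resulting failure rate multiplier can be sketched as follows, assuming the Arrhenius term of Black's equation governs the temperature dependence and that the failure rate scales inversely with MTTF. The ΔT values, reference temperature, and activation energy are hypothetical; in practice they come from the PDK tables.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def wire_temperature_c(t_substrate_c, dt_joule_c, dt_device_c):
    """Steady-state wire temperature: substrate plus both self-heating adders."""
    return t_substrate_c + dt_joule_c + dt_device_c

def thermal_fit_multiplier(t_wire_c, t_ref_c, ea_ev):
    """Failure rate multiplier relative to the PDK reference temperature."""
    t_wire_k = t_wire_c + 273.15
    t_ref_k = t_ref_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_ref_k - 1.0 / t_wire_k))

t_wire = wire_temperature_c(t_substrate_c=105.0, dt_joule_c=5.0, dt_device_c=15.0)
print(t_wire, thermal_fit_multiplier(t_wire, t_ref_c=120.0, ea_ev=0.9))
```

The multiplier equals 1 at the reference temperature and rises steeply above it, which is the general shape described for Figure 15.10.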

Note that the jRMS temperature increase calculation applies to metal layers. The Joule heating contribution in vias is small (assuming that the layout includes sufficient vias to satisfy the jDC limit).

Peak Current Density

The average, bidirectional, and RMS calculations integrate current waveforms over a longer interval. There is also an EM current limit applicable to a short duration current pulse. Each wire segment has a peak current density limit, jPEAK. The definition of the peak current measurement is illustrated in Figure 15.11.

A graph shows a curve for the calculation of current density in a wire.

Figure 15.11 Illustration of the jPEAK calculation in a wire or via.

This current density limit is intended to avoid excessive Joule heating, which could lead to metal deformation. Exceeding the jPEAK PDK limit values for a wire or via should be regarded as an absolute failure, which requires design changes, rather than as a contribution to the SoC operating lifetime failure rate.

Also, it should be noted that different interconnect layout topologies may have significant current crowding at the transition between a wire and via or in wire jogs. However, EM analysis uses the extracted parasitic electrical network from the layout and thus is not able to accurately reflect non-uniform current densities in the cross-section of a resistive element. Separate EM-avoidance layout design guidelines are typically added to the PDK design ruleset, especially for via arrays between wide metals.

15.2.3 Definition of an EM Failure

The foundry’s process development and PDK design enablement teams fabricate a set of testsites to measure EM failure rates, consistent with Black’s equation (with AC healing) and the Blech length expression. The published jDC, jAC, jRMS ΔT, device power ΔT contribution, and jPEAK data for each layer are referenced to a specific failure rate and temperature. For example, (except for jPEAK, which is absolute) the current limits for each wire, via, and contact could be defined as follows:

h(t) = (1 * 10**-9) at 120 degrees C (e.g., 1 FIT)

The CAD team collaborates with the EDA tool vendor on the software utilities required to submit the wire current density and temperature results from the electromigration analysis flow to Black’s equation. This calculation with the actual current and temperature scales the published PDK FIT rate for each wire, via, and contact. The full-chip SoC reliability evaluation applies the sum-of-failure rates assumption to the entire population of wires, vias, and contacts to provide an overall FIT estimate.

However, it is also crucial for the SoC methodology team to review the testsite measurement criteria that denote failure. Metal atom flux results in the formation of voids, originating from the nucleation of metal lattice vacancies. The focus of accelerated testing at the foundry is the resistance increase in the wire or via due to the growth of voids. A “failure” is recorded when the starting resistance of a testsite structure increases by N%—typically N = 10%, or ΔR = (Rstress / R) = 1.1. A scattergram of ΔR versus current density is plotted for the resistance data measured after stress testing for each structure on the testsite wafers. This plot is analyzed to select the current density limit for each metal, via, and contact layer, corresponding to the reference FIT for that layer published in the PDK.

Although a wire segment resistance increase is reported as an EM failure, the SoC design may be sufficiently robust to maintain functionality. For example, the grid topology of the power/ground distribution may still be able to maintain an adequate voltage drop in the presence of (multiple) EM failures. The contribution of a “failing” power grid segment to the FIT calculation may be overstated. The EM analysis flow targeting the power grids integrates the voltage drop analysis algorithm described in Chapter 14, “Power Rail Voltage Drop Analysis,” with a modified conductance matrix due to high-resistance segment(s)—perhaps even removing the conductance of the failing EM segment altogether. If the power grid is still adequate, the FIT value merits correction.

The methodology for EM analysis for a signal wire in the presence of current density failures is more complex. The robustness of a static timing or noise path may remain in the presence of wire segment resistance increases of N%. Conversely, a critical path may fail static timing analysis if the resistive element(s) in the interconnect model are increased by N%.

The SoC methodology team is faced with multiple options in response to the results from the sigEM flow:

  • First, any jPEAK limit failures must be resolved.

  • Design ECOs may be required to address any “high FIT” wires/vias. If a signal wire has a high FIT due to its current density, this may be indicative of a physical layout mismatch between cell drive strength, wire widths, and capacitive load. Min/max signal slew limit checks applied to the static timing analysis results would have previously identified appropriate design changes and, ideally, reduced the mismatch that results in the high current density. Reference 15.2 compares the current density from a “matched” driver and interconnect to typical process technology jAC EM limits. If a signal via has a high FIT contribution, multiple vias (on wider wires) are warranted. If the ΔT temperature adder is a significant factor in the FIT value, more significant layout changes are required to increase the distance from power dissipation sources.

  • Additional performance margin for EM resistance increases over the SoC lifetime may be appropriate, in addition to BTI and HCI device parameter drift mechanisms. This is an unattractive option, to be sure.

  • The static timing analysis slack results for the paths corresponding to signal wires with high FIT are evaluated to assess whether each path has sufficient positive slack to absorb a potential increase in interconnect delay; if so, reducing the FIT contribution of the wire may be warranted.

Reference 15.3 presents an algorithm to correlate the wire EM failure probability density function, f(t), to a probability distribution for the increase in wire resistance. Then Monte Carlo-sampled circuit simulations are used to evaluate probabilistic path delay increases over time (e.g., 5 to 20 years). The SoC and CAD teams may opt to deploy a similar flow to assess the sensitivity of timing paths with high-FIT wires to the resistance increase as a function of operating lifetime.

15.2.4 Extraction Model for Electromigration

To apply the sum-of-failure-rate algorithm for the SoC reliability calculation, it is necessary to evaluate the current densities in a detailed parasitic extraction network without reduction. In addition, the parasitic extraction flow needs to be exercised in verbose mode, where resistive elements include full layout detail to enable current density calculation (see Figure 15.12). The resistor example in the figure includes a thickness parameter. The foundry may provide an analytical model for the chemical-mechanical polishing metal planarization process module. The layout extraction tool from the EDA vendor may include a corresponding CMP analysis algorithm to derive a (non-default) wire thickness estimate. (CMP analysis is also applied to yield prediction models, as any non-planarity of the wafer surface affects lithographic exposure accuracy due to depth-of-focus limits.)

An illustration shows a verbose-mode extracted resistor record. The base record reads: Rn, node1, node2, and value. The additional fields are W, L, layer, thickness, X1, Y1, X2, and Y2, which are the dimensional and coordinate parasitic extraction data for EM analysis.

Figure 15.12 Illustration of resistance extraction in “verbose” mode, with layout detail to enable current density calculation.
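A verbose-mode resistor record of this kind could be represented as shown in the sketch below. The field names, units, and record layout are hypothetical; the actual extracted netlist syntax is specific to the EDA tool. The point is that the width and thickness fields provide the cross-section needed to convert an element current into a current density.

```python
from dataclasses import dataclass

@dataclass
class VerboseResistor:
    """Hypothetical container for one verbose-mode extracted resistor."""
    name: str
    node1: str
    node2: str
    value_ohm: float
    width_um: float
    length_um: float
    layer: str
    thickness_um: float
    x1: float
    y1: float
    x2: float
    y2: float

def current_density(res, current_a):
    """Current density (A/um^2) through the wire cross-section W * thickness."""
    return abs(current_a) / (res.width_um * res.thickness_um)

r1 = VerboseResistor("R1", "n1", "n2", 0.5, 0.1, 10.0, "M2", 0.2, 0.0, 0.0, 10.0, 0.0)
print(r1.layer, current_density(r1, 1.0e-3))
```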

Electromigration models and the related failure rate calculations are complex probabilistic relationships that utilize an empirical fit of testsite data to the model of Black’s equation. As mentioned earlier in this section, the kinetics for atom displacement are unique for unidirectional and bidirectional currents. As a result, EM analysis is divided into power and ground rail (powerEM) and switching interconnect (sigEM) flows, discussed next.

15.3 Power Rail Electromigration Analysis: powerEM

The static I*R voltage drop analysis flow described in Section 14.2 applies the (average) circuit currents to the extracted resistive model of the VDD and GND grids. The matrix equation to solve for static I*R was presented as G * v = i, where G is the conductance matrix between grid nodes, i is the vector of circuit currents, and v is the vector of node voltages to solve. The powerEM analysis flow leverages the I*R voltage drop data. The jDC through each conductance element in the matrix is calculated using the equation in Figure 15.13.

An equation for the calculation of j subscript DC is shown.

Figure 15.13 Calculation of jDC in a power grid wire for the powerEM flow.

The assumption is made that the power and ground grids are adequately described by the static I*R rail voltage drop data, using average cell currents. The jDC failure probability spans years, as opposed to the time scale for the dynamic I*R rail voltage analysis flow results. For a DC current in the power and ground grids, jRMS = jDC.

Using the jDC value derived from static I*R analysis, the powerEM flow calculates the failure rate measure for each resistor, using the parameters provided in the foundry PDK. The FIT is scaled from the PDK data by the calculated jDC and wire temperature. The product of the metal line length (from the parasitic extraction detail) and the jDC current density determines whether Blech length kinetics apply to the failure rate calculation.
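The scaling itself is defined by the foundry PDK. As a hedged illustration only, the sketch below assumes the failure rate is inversely proportional to the Black's equation lifetime, MTTF = A * j^(-n) * exp(Ea / kT), so that a reference FIT scales with the current density ratio and an Arrhenius temperature term. The exponent n, the activation energy Ea, and the critical (j * L) product are placeholder values, and treating a Blech-immune segment as contributing zero FIT is also an assumption.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5


def scaled_fit(fit_ref, j, j_ref, temp_k, temp_ref_k, n=2.0, ea_ev=0.9):
    """Scale a reference FIT to the calculated j and wire temperature.

    Assumes FIT is proportional to 1/MTTF from Black's equation;
    n and ea_ev are placeholders, not PDK data.
    """
    current_term = (j / j_ref) ** n
    thermal_term = math.exp(
        (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / temp_ref_k - 1.0 / temp_k)
    )
    return fit_ref * current_term * thermal_term


def blech_immune(j, length_um, jl_crit):
    """True when the (j * L) product is below the assumed critical product."""
    return j * length_um < jl_crit


def segment_fit(fit_ref, j, j_ref, temp_k, temp_ref_k, length_um, jl_crit):
    """FIT contribution of one segment, zero if Blech kinetics apply."""
    if blech_immune(j, length_um, jl_crit):
        return 0.0
    return scaled_fit(fit_ref, j, j_ref, temp_k, temp_ref_k)


# Doubling j at the reference temperature with n = 2 quadruples the FIT.
print(scaled_fit(1.0, 2.0, 1.0, 378.15, 378.15))   # 4.0
```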

The powerEM flow then provides an output report with the cumulative FIT value and a listing of the individual high-FIT wires. Using the (x,y) coordinate information included with the parasitic resistive element, the powerEM flow also provides a physical view that highlights the high-FIT segments to allow the layout engineer to more easily visualize the wires and vias of concern.

The voltage solution from the dynamic I*R rail voltage drop flow enables a time-based current for conductance elements to be calculated (see Section 14.3):

G * x(t) + Y * x'(t) = s(t)    (Eqn. 15.1)

The solution x(t) from the dynamic I*R rail analysis provides the voltage at each node in the PDN network and is used in the calculation of the time-varying current in the metal segment. The jPEAK limit check applies to this dynamic conductance current. As mentioned in the previous section, the peak current density limit is based on the Joule heating rise:

jPEAK_limit = f(jPEAK_DC, pulse_duty_cycle)    (Eqn. 15.2)

For the power and ground grids, the current profile differs from the short-duration pulse of a switching signal, typically with small fluctuations about the average rail current. Applying the maximum current from dynamic I*R analysis for each conductance element (with duty_cycle ~ 1) would give an appropriate approximation in the jPEAK_limit comparison. Any powerEM jPEAK failures would require physical design modifications; these are hard violations rather than contributions to the SoC FIT calculation.
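The function f in Equation 15.2 is supplied by the foundry. As one illustrative possibility, if the limit is set by Joule heating and the current is modeled as a rectangular pulse with duty cycle r, then jRMS = jPEAK * sqrt(r), and the peak limit relaxes as 1/sqrt(r). The sketch below assumes that form; the function names and the example values are hypothetical.

```python
import math


def j_peak_limit(j_peak_dc, duty_cycle):
    """Assumed form of Eqn. 15.2: limit = jPEAK_DC / sqrt(duty_cycle)."""
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in (0, 1]")
    return j_peak_dc / math.sqrt(duty_cycle)


def peak_violations(max_j_by_segment, j_peak_dc, duty_cycle=1.0):
    """Names of segments whose maximum dynamic j exceeds the limit.

    For power and ground rails, duty_cycle ~ 1 as described in the text.
    """
    limit = j_peak_limit(j_peak_dc, duty_cycle)
    return [name for name, j_max in max_j_by_segment.items() if j_max > limit]


rail_j = {"vdd_m5_17": 8.0, "vdd_m5_18": 12.5, "gnd_m4_03": 9.9}
print(peak_violations(rail_j, j_peak_dc=10.0))   # ['vdd_m5_18']
```

With duty_cycle = 1, the limit reduces to jPEAK_DC, which matches the approximation suggested for the rails.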

The SoC methodology team reviews the powerEM results and proceeds with (potentially) multiple recommendations to address issues:

  • Modify the global power and ground grids, if there are regular high FIT “weak spots.” Global grid layout data are often created using a script developed by the CAD team. PowerEM flow results may suggest a pervasive change to the grid—for example, wider wires (on specific layers), more vias between segments on different layers, or a larger stacked via area. The impact on global routing track density would need to be assessed. Note that the powerEM flow could be exercised right after (early) static I*R analysis to identify this impact as soon as possible in the overall project schedule.

  • Locally modify the P/G grid. A unique version of the P/G grid layout cell could be manually generated in response to specific high-FIT segments. (A review of the I*R voltage drop and ΔT results in this local area would be appropriate to determine the root cause of the high FIT segment calculation.)

  • Iterate on static I*R and powerEM flows by removing the high-FIT segments from the grid. As mentioned in the previous section, the definition of an EM segment failure is based on a (probabilistic) percentage increase in the segment resistance. The power and ground grids may still be able to maintain an adequate voltage drop if (a small number of) grid segments have higher resistance. The CAD team provides a utility to remove selected high-FIT segments from the parasitic network to allow iteration on the static I*R drop and powerEM flows. (Although the EM failure criterion is a percentage increase in R, a conservative approach would be to remove the conductance segment in the revised static I*R calculation.) The overall flow is depicted in Figure 15.14.

A figure shows the power EM and static IR analysis flow.

Figure 15.14 Illustration of the (iterative) use of the powerEM and static I*R analysis flows to determine the impact of a power grid segment EM “failure”; the gij element in the conductance matrix is modified.

If the new power and ground voltage drop solution is still adequate with the segment removals in the conductance matrix, the high-FIT grid segments in the initial powerEM results could potentially be waived if the subsequent powerEM results do not demonstrate a significantly higher failure rate.
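The iteration can be sketched on a toy rail: two parallel straps feed one load from an ideal pad, the static drop is solved, a high-FIT segment is removed, and the drop is solved again. The network, the load current, and the 25 mV budget are made-up values; the small dense solver stands in for the sparse solver a production flow would use.

```python
def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rail_drops(segments, loads_a, nodes, pad="pad"):
    """Voltage drop below the pad at each node, from G * drop = i."""
    idx = {name: k for k, name in enumerate(nodes)}
    g = [[0.0] * len(nodes) for _ in nodes]
    for n1, n2, r_ohm in segments:
        cond = 1.0 / r_ohm
        for a, b in ((n1, n2), (n2, n1)):
            if a == pad:
                continue          # the pad is the fixed reference
            g[idx[a]][idx[a]] += cond
            if b != pad:
                g[idx[a]][idx[b]] -= cond
    i = [loads_a.get(name, 0.0) for name in nodes]
    return dict(zip(nodes, solve(g, i)))


nodes = ["a", "b", "load"]
grid = [("pad", "a", 1.0), ("a", "load", 1.0),
        ("pad", "b", 1.0), ("b", "load", 1.0)]
loads = {"load": 0.010}           # 10 mA average current
budget_v = 0.025                  # hypothetical static drop budget

before = rail_drops(grid, loads, nodes)
# Conservative treatment: drop the high-FIT segment entirely.
pruned = [s for s in grid if s[:2] != ("a", "load")]
after = rail_drops(pruned, loads, nodes)
print(f"{before['load']:.3f} V -> {after['load']:.3f} V")   # 0.010 V -> 0.020 V
print(after["load"] <= budget_v)                            # True
```

In this toy case the drop doubles but stays within the budget. The current in the remaining strap also doubles, which is why the powerEM flow has to be repeated on the pruned network before any waiver is considered.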

15.4 Signal Interconnect Electromigration Analysis: sigEM

As described in Section 15.2, the EM reliability analysis of interconnects between cells applies a different jAC current density calculation due to the different metal atom migration kinetics with healing for bidirectional currents. As a result, a specific sigEM flow is developed for the SoC methodology, different from the powerEM flow. These two flows may be exercised at different times in the project schedule and at different levels of the SoC design hierarchy. For example, the powerEM flow may be exercised early to evaluate the global P/G grid design, using average current density estimates for individual blocks with a “constant current per unit area” measure at the grid conductance nodes. Conversely, the sigEM flow would be exercised (repeatedly) during block-level physical design iterations to ensure proper route and via construction for each cell driver (using a block-level thermal map). A budget for the total FIT value would be assigned to each block (and global signals and clocks, as well) to enable block-level analysis to proceed independently.
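A FIT budget check is simple bookkeeping under the sum-of-failure-rate model. The sketch below is one hypothetical way to track it; the block names and values are invented.

```python
def fit_budget_report(budgets, measured):
    """Compare the sigEM FIT of each block against its assigned budget."""
    report = {}
    for block, budget in budgets.items():
        fit = measured.get(block)
        if fit is None:
            report[block] = "not analyzed"
        elif fit <= budget:
            report[block] = "ok"
        else:
            report[block] = "over budget"
    return report


budgets = {"cpu_core": 2.0, "gpu": 3.0, "global_clocks": 0.5}
measured = {"cpu_core": 1.4, "gpu": 3.6}

print(fit_budget_report(budgets, measured))
# {'cpu_core': 'ok', 'gpu': 'over budget', 'global_clocks': 'not analyzed'}
print(sum(budgets.values()))   # 5.5, the SoC-level sigEM allocation
```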

Before discussing the sigEM flow, there are two topics to highlight: cell-level EM and EM analysis for clocks.

15.4.1 Cell-Level EM Analysis

The development of a cell IP library for release to SoC designers requires analysis of the FIT rates associated with the wires, vias, and contacts within the cell layout. The cell contains a mix of unidirectional and bidirectional wire currents, necessitating calculation of both jAVG and jAC current densities during the cell characterization circuit simulations.

The library developer needs to ensure a very low FIT rate for each cell, across a wide range of characterization and usage conditions. The calculation of jPEAK, jAVG, jRMS, and jAC for each metal segment in the extracted model requires a conservative (high) estimate for the assumed circuit-switching activity. The current density calculations need to use the maximum values measured during characterization for any of the pin-to-pin delay arcs and the range of input pin slews and output loads.
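A minimal sketch of the per-segment measurement follows, assuming each characterization run provides uniformly sampled segment currents over one switching period. It computes jPEAK, jAVG, and jRMS per arc and keeps the maximum over all arcs. The jAC value is omitted because it depends on the healing model defined in Section 15.2. The arc names and sample values are illustrative.

```python
import math


def waveform_measures(samples_ma, area_um2):
    """jPEAK, jAVG, and jRMS (mA/um^2) from uniformly sampled current."""
    j = [i / area_um2 for i in samples_ma]
    n = len(j)
    return {
        "jPEAK": max(abs(x) for x in j),
        "jAVG": sum(j) / n,                           # signed average
        "jRMS": math.sqrt(sum(x * x for x in j) / n),
    }


def worst_case(arc_waveforms, area_um2):
    """Maximum magnitude of each measure over all characterized arcs."""
    per_arc = [waveform_measures(w, area_um2) for w in arc_waveforms.values()]
    return {key: max(abs(m[key]) for m in per_arc)
            for key in ("jPEAK", "jAVG", "jRMS")}


arcs = {
    "A->Z, fast slew, high load": [0.0, 2.0, 0.0, -2.0],   # bidirectional
    "B->Z, slow slew, low load": [0.0, 1.0, 1.0, 0.0],     # unidirectional
}
print(worst_case(arcs, area_um2=1.0))
```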

There are several electromigration concerns specific to cell-based analysis, as illustrated in Figure 15.15:

  • The proximity to device self-heating results in a significant ΔT contribution, in addition to the jRMS wire self-heating.

  • The availability of local-interconnect metal layers in a process technology results in both horizontal and vertical current density in the metal, with different PDK EM limits for the M0 layout for the two current directions.

  • The introduction of area pins enables the router to select among multiple wiring tracks; the specific pin location selected modifies the unidirectional and bidirectional current profiles; a worst-case pin selection EM analysis is required.

An illustration depicts cell-level EM analysis.

Figure 15.15 The cell-level EM analysis requires interpretation of the current flow in metal segments. The M0 layer local interconnect has both horizontal and vertical currents. The specific connection to an area pin results in different current directions for metal segments under the pin.

These same EM analysis considerations apply to the reliability estimates for larger IP macros, as well. The SoC methodology team needs to review the algorithms used by the IP vendor when evaluating the library IP FIT rate specification.

15.4.2 EM Analysis for Clocks

The design of clock repowering grids and H-trees requires focus on optimal sizing of clock buffers and interconnects. Early EM analysis of clock jAC and jRMS is encouraged, as soon as the (global) clock distribution is available, to confirm the buffer and wire design assumptions.

Like the device-level BTI and HCI parameter drift mechanisms, the gradual resistance increase in clock wires contributes to arrival skew variation at clock distribution endpoints. The previous section references an algorithm to correlate interconnect electromigration (void growth) data with a probability distribution model for resistance increase; this model would be applicable in clock simulations to develop a consolidated BTI/HCI/EM lifetime skew margin to adopt in static timing analysis.

15.4.3 SigEM Current Density Calculation

The sigEM analysis flow for IP library cells measures the current densities of interest using the circuit simulation data from cell characterization. For (block and global) signals between cells, a different method for current density calculations for sigEM is required.

The static timing analysis flow provides a model for the driving current source on each cell output for both RDLY and FDLY transitions (for each STA mode/corner). Although there are different current source profiles for different input pin-to-output pin arcs, STA propagates a single arc for path delay analysis. Assuming that this arc represents the greatest sensitivity to an increase in the interconnect resistance, using the STA-propagated driver current profile for sigEM analysis is appropriate.

The circuit simulation of each interconnect tree to determine jAC, jRMS, and jPEAK would be computationally expensive. Reference 15.4 describes an alternative approach, which involves solving a matrix-based formulation to calculate these currents. The translation of the interconnect model into a linear matrix is based on evaluating the total charge delivered to the network from the driving current at the cell output, as depicted in Figure 15.16.

A circuit diagram and a matrix formulation to calculate total charge are shown.

Figure 15.16 Illustration of an algorithm used to compute the current through each resistive element in an RC interconnect tree from a transition on the driving output pin.

The matrix equations for the signal interconnect RC trees are similar to those developed for the dynamic power voltage drop flow using Kirchhoff’s current law, as described in Section 14.3:

G * v(t) + C * v'(t) = s(t)    (Eqn. 15.3)

In Equation 15.3, G is the interconnect conductance matrix of dimensionality (N x N), where there are N nodes in the interconnect tree (including a ground reference node), and gij is the conductance between nodes i and j. For the case of an extracted signal interconnect, this conductance matrix is very sparse (and there is no conductance between any extracted node and ground). C is the network capacitance matrix, and v is the vector of interconnect node voltages. The vector s describes the current sources into the network. For sigEM analysis, it is assumed that the only current source is from the driver node; no other side currents are injected into the interconnect network during the cell output transient (thus neglecting aggressor noise-coupled transients).

Integration of both sides of Equation 15.3 from t = 0 to t = T provides the following equivalent relationship:

G * w = QD − C * (vT − v0)    (Eqn. 15.4)

The integral of the vector v' results in two simple vectors for the node voltages at t = 0 and t = T. For the case of a signal interconnect, assume that the initial and final values of all (capacitive storage) node voltages are known (e.g., v0 = [ 0 ] and vT = [ VDD ] for an RDLY transition). The integral of the current source vector s provides a vector representing the total charge delivered by the driving current source, QD. This vector is nonzero only at the cell output node. The total driver charge can be easily calculated: it is equal to the charge delivered to all the network capacitances during the transition, (Ctotal * VDD). The vector w is the voltage integral at each node as a result of the driver transient, and it is the vector to be solved in the matrix equation.

Solving for the vector w also provides the total charge through each interconnect resistor as a result of the driver transient: qij = gij * |wj − wi|, where gij is the conductance (1/R) between nodes i and j. The jAC for each resistive element in the signal interconnect can thus be determined from the total charge calculation, as shown in Figure 15.17.

A figure shows the expressions for the calculation of j subscript AC and j subscript AC subscript R i j.

Figure 15.17 Calculation of jAC for a segment in the RC interconnect tree, using the solution to the matrix formulation.
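The charge-based formulation can be sketched on a small RC tree. Because G has no path to ground, w is determined only up to a common offset, so this sketch references w to the driver node and solves the remaining rows, which involve only the node capacitances. It assumes grounded capacitors (a diagonal C) and an RDLY transition from 0 to VDD. The net, the element values, and the dense solver are illustrative and are not the implementation in Reference 15.4.

```python
def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def segment_charges(resistors, caps_f, driver, vdd):
    """Charge (C) through each resistor for one 0 -> VDD transition."""
    nodes = [n for n in caps_f if n != driver]
    idx = {n: k for k, n in enumerate(nodes)}
    g = [[0.0] * len(nodes) for _ in nodes]
    for n1, n2, r_ohm in resistors:
        cond = 1.0 / r_ohm
        for a, b in ((n1, n2), (n2, n1)):
            if a == driver:
                continue              # w is referenced to the driver node
            g[idx[a]][idx[a]] += cond
            if b != driver:
                g[idx[a]][idx[b]] -= cond
    # Rows of Eqn. 15.4 away from the driver: QD is zero there.
    rhs = [-caps_f[n] * vdd for n in nodes]
    w = dict(zip(nodes, solve(g, rhs)))
    w[driver] = 0.0
    return {(n1, n2): abs(w[n2] - w[n1]) / r_ohm for n1, n2, r_ohm in resistors}


net = [("drv", "a", 100.0), ("a", "b", 200.0), ("a", "c", 50.0)]
caps = {"drv": 1e-15, "a": 2e-15, "b": 3e-15, "c": 4e-15}
q = segment_charges(net, caps, driver="drv", vdd=1.0)
for seg, charge in q.items():
    print(seg, f"{charge * 1e15:.2f} fC")
```

For a tree, each result equals the downstream capacitance times VDD (9, 3, and 4 fC here), which gives a quick check on the matrix solution.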

The calculation of jRMS and jPEAK for a signal interconnect element requires waveform detail. To avoid the computational demand of circuit simulation, Reference 15.4 also proposes an approximation for these waveforms, leveraging the earlier matrix solution for the total charge through each resistive element during a transition. Referring to Figure 15.18, assume that the current waveform through each resistive element is (roughly) triangular. The total charge delivered for the transition, combined with a (minimum) transition time for the signal from static timing analysis results, defines the approximate current waveform and the subsequent jRMS and jPEAK values for the sigEM FIT calculation.

A circuit diagram and a graph are shown.

Figure 15.18 Illustration of the approximation used for the current waveform through each resistive segment to calculate jRMS and jPEAK, from the total charge through the resistor during the transient.
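For a triangular pulse of base t (the transition time) carrying charge Q, the peak current is 2Q/t and the integral of the squared current is Ipeak^2 * t / 3. The sketch below applies those relations and averages the squared current over a period. The number of transitions per period is left as a parameter, since the text does not specify it, and the units are an assumption.

```python
import math


def triangular_estimate(charge_c, transition_s, period_s, area_um2,
                        transitions_per_period=1):
    """Approximate (jPEAK, jRMS) in A/um^2 from the charge per transition.

    Assumes a triangular current pulse whose base equals the transition time.
    """
    i_peak = 2.0 * charge_c / transition_s
    i_rms = i_peak * math.sqrt(
        transitions_per_period * transition_s / (3.0 * period_s)
    )
    return i_peak / area_um2, i_rms / area_um2


# 9 fC delivered in a 20 ps transition, 1 ns period, rise and fall per period.
j_peak, j_rms = triangular_estimate(9e-15, 20e-12, 1e-9, area_um2=0.01,
                                    transitions_per_period=2)
print(f"jPEAK = {j_peak:.3e} A/um^2, jRMS = {j_rms:.3e} A/um^2")
```

Using the minimum transition time from static timing analysis gives the largest peak for a fixed charge, so the estimate errs on the conservative side.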

With the estimated values for jAC, jRMS, jPEAK, device self-heating ΔT, and duty cycle for each resistive element, the foundry PDK limits can be checked and the FIT scaling factors can be calculated.

The sigEM FIT results at block, core, and full-chip hierarchy levels require review by the SoC methodology team to determine whether significant FIT contributions by resistive elements require physical design updates or further simulation analysis to assess the sensitivity of the (probabilistic) resistance increase to overall functionality.

15.5 Summary

This chapter reviews what is likely to become a limiting factor to the pace of VLSI process technology scaling: the SoC reliability impact of electromigration in metal wires, vias, and contacts. The transition to device technologies that are thermally distant from the die substrate will exacerbate the issue due to the local temperature increase from device self-heating. The SoC markets for extremely high-reliability products (e.g., medical, aerospace, automotive) are growing rapidly. The challenging temperature environments of aerospace and automotive applications require increasing focus on EM reliability analysis and sufficient design margins to continue to function despite the probability of a wire “failure,” as reflected in the resistance increase.

The sheer magnitude of the number of SoC signals multiplied by the number of resistive elements in the extracted interconnect tree for each signal presents an interesting question: Is the sum-of-failure-rate assumption still the best reliability model? Individual resistive elements may have an infinitesimally small failure rate, but their quantity is large, potentially resulting in a significant FIT estimate. The validity of the summation of a large number of very small failure rates warrants further research. With the increasing demand for high-reliability product applications, and with continued process scaling providing even greater numbers of signals (and parasitic elements), this research will be crucial for future SoC designs.
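The arithmetic behind this concern is easy to illustrate. One FIT is one failure per 10^9 device-hours. The element count and the per-element rate below are invented, and converting the total to a lifetime failure probability assumes a constant failure rate (an exponential model).

```python
import math

DEVICE_HOURS_PER_FIT = 1e9


def total_fit(fit_per_element, element_count):
    """Sum-of-failure-rate total for identical, independent elements."""
    return fit_per_element * element_count


def lifetime_failure_probability(fit, power_on_hours):
    """P(fail) = 1 - exp(-rate * t), assuming a constant failure rate."""
    rate_per_hour = fit / DEVICE_HOURS_PER_FIT
    return -math.expm1(-rate_per_hour * power_on_hours)


soc_fit = total_fit(1e-7, 2_000_000_000)    # 2 billion elements at 1e-7 FIT
p_fail = lifetime_failure_probability(soc_fit, 10 * 8760)   # 10 years, always on
print(f"{soc_fit:.1f} FIT, {p_fail:.2%} lifetime failure probability")
```

A per-element rate far too small to validate on a testsite still sums to hundreds of FIT here, which is the reason the summation assumption warrants scrutiny.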

References

[1] White, M., and Bernstein, J., Microelectronics Reliability: Physics-of-Failure Based Modeling and Lifetime Evaluation, National Aeronautics and Space Administration (NASA) Jet Propulsion Laboratory, JPL Publication 08-5, 2008.

[2] Banerjee, K., and Mehrotra, A., “Coupled Analysis of Electromigration Reliability and Performance in ULSI Signal Nets,” Proceedings of the 2001 IEEE International Conference on Computer-Aided Design (ICCAD), 2001, pp. 158–164.

[3] Mishra, V., and Sapatnekar, S., “Circuit Delay Variability Due to Wire Resistance Evolution Under AC Electromigration,” IEEE International Reliability Physics Symposium (IRPS), 2015, pp. 3D.3.1–3D.3.7.

[4] Oh, C., et al., “Static Electromigration Analysis for Signal Interconnects,” Proceedings of the Fourth International Symposium on Quality Electronic Design (ISQED), 2003, pp. 377–382.

Further Research

Flowcharts for EM Analysis with Data from Other Flows

This chapter highlights algorithms used for powerEM and sigEM analysis of the FIT rate and the data required from other flows to enable this analysis; however, the chapter discussion lacks visual descriptions. Develop comprehensive flowcharts for EM reliability analysis, including:

  • powerEM

    • The (non-reduced) model from parasitic extraction

    • The required data from the power rail voltage drop analysis flow

    • jAVG, jRMS, and jPEAK calculations

    • Scaling of the foundry FIT data for the calculated current density of each power and ground resistive segment

    • Power rail voltage drop analysis with modified gij elements for high-FIT segments

  • sigEM

    • The model from parasitic extraction

    • The required data from the STA flow for the output driver current for each signal net

    • The switching activity data from (stressmark) simulation testcases

    • The jAC, jRMS, and jPEAK calculations (with healing factor)

    • Scaling of the foundry FIT data for the calculated current density of each RC interconnect segment

    • Additional path timing analysis options for high-FIT interconnect segments

Electromigration and Wire/Via/Contact Metallurgy (Advanced)

Process development engineers are continually evaluating new metallurgies for wires and contacts/vias. These engineering teams assess the resistivity and electromigration parameters versus the difficulty in material deposition and patterning.

Research and describe the (temperature-dependent) resistivity and electromigration activation (Black’s equation) for metals used in SoC fabrication (e.g., Al, Cu, Co, W, Ti).

Describe the process development techniques used to increase grain size and/or orient grain boundaries to minimize atom displacement.

Electromigration “Resistive” Fails

The discussion in this chapter proposes addressing EM failures with additional analysis using increased resistance values in the conductance model of the power grid or the RC interconnect model for a timing path delay. The intent is to justify the specification of a lower overall FIT rate for SoC reliability. However, the nucleation and growth rate of voids in a metal wire leading to a resistance increase is a non-linear function, increasing over time.

Research and describe the rate of resistance increase in a metal wire subject to high current densities and void growth for various metallurgies. Describe the rate of change in the resistivity after an increase of N~10%.

Describe the risks associated with the justification of reducing the contribution from high-FIT segments, based on subsequent power rail voltage drop and path timing analysis with modified (R + 10%) elements.

Blech Length (Advanced)

For efficiency, the SoC methodology team may opt to skip the current density and FIT rate calculation altogether for wire segments less than the Blech length.

Research and describe the Blech length and MTTF correction factor for metals (and surrounding dielectric materials) in advanced process nodes. Determine a strategy for optimizing EM flow runtime for segments less than the Blech length.
