# The Evolution of Waveform Relaxation for Circuit and Electromagnetic Solvers

Albert Ruehli, Missouri S&T EMC Laboratory, University of Science & Technology, Rolla, MO

with contributions by Giulio Antonini, Univ. L'Aquila, Italy

July, 2010



- Evolution of Waveform Relaxation (WR)
- WR in the Circuit Domain
- WR for Transmission Lines
- WR for Electromagnetic Solvers

### **SYSTEMS TO BE SOLVED**

Time Domain Solution for Large Systems

- May contain non-linear parts
- Heterogeneous: VLSI circuit
- Problems with many transmission lines
- Homogeneous: EM circuit PEEC models
- Sparse MNA System solution Time  $O(n^{1.5})$

$$C\dot{x}(t) + Gx(t) = Bu;$$

## BEGINNING OF WAVEFORM RELAXATION

Time Domain Solution for Large Systems

- Logic circuits are almost One-Way
- Forward coupling from left to right
- Miller capacitances introduce back coupling!



### WAVEFORM RELAXATION SOLUTION

Example for Weak One-Way Coupled Subsystems

• Solve SSy1 for window in time, Solve SSy2 ...



 $C_1[x_1(t), \varepsilon x_2(t)]\dot{x_1}(t) + G_1[x_1(t), \varepsilon x_2(t)] = Bu;$  $C_2[x_1(t), x_2(t)]\dot{x_2}(t) + G_2[x_1(t), x_2(t)] = 0$ 

## BEGINNING OF WAVEFORM TECHNIQUES

#### Start was Work on One-Way Systems and WR

- 1980 Work on one-way systems (Paper: Ruehli, Sangiovanni, Rabbat)
- 1981 first ideas on WR, Lelarasmee, Ruehli, Sangiovanni-Vincentelli
- 1982 Trans. on CAD paper on WR
- First application goal: large logic circuits

## **OUTLINE OF GENERAL WR SOLUTION**

Assume System Partitioned into Subsystems (SSy)

- Partition circuit into SSy
- Most logic SSy are One-Way forward

Other Fundamental Steps in WR Approach

- Ordering of SSy (make labels)
- Scheduling of SSy for solver
- Solve an SSy for window in time
- Store waveform for time window segment

### **PARTITIONING FOR CIRCUITS**

Circuits are Heterogeneous Systems!

- First, exploit hierarchy from top down
- Short feedback loops are in one SSy
- Weak coupling allows cutting at circuit inputs
- Break circuits depending on strong coupling



#### Non-uniform Structure of Circuits

- Circuit level partition at detail level
- Assemble SSy bottom up branch by branch
- Assemble nodes into strongly coupled SSy



Assemble SSy according to coupling  $R_2$  Only EigenV = 0.19;  $R_2$  and  $R_6 EigenV = 0.25$ 

 $R_2$  and  $R_4$  and  $R_6$  *EigenV* = 0.37

## RAPID CONVERGENCE FOR RC CIRCUITS

#### CONVERGENCE IN SMALL TIME WINDOW

• Analytic Expression For Convergence

$$\|X_E^{(k)}\|_T \le \|1 - e^{-2\alpha t} \sum_{m=0}^{(k-1)} \frac{(2\alpha t)^m}{m!} \|T\| E(0)\|_T$$



Rapid Convergence For Window *T* :  $k \le 2eT$ 

## PARTITIONING DIFFICULT FOR SOME CKTs

#### High Pass Connection, Strong Coupling

- Short circuit between nodes 1 and 2
- Needs more advanced partitioning approach

$$\dot{v}_1^{(k)}(t) + \alpha v_1^{(k)}(t) = \dot{v}_2^{(k-1)}(t) + I_1(t)/C_2$$
  
$$\dot{v}_2^{(k)}(t) + \beta v_2^{(k)}(t) = \dot{v}_1^{(k-1)}(t)$$



$$\mathbf{x} = \begin{bmatrix} 0 & \frac{s}{s+1} \\ \frac{s}{s+1} & 0 \end{bmatrix}$$

### **SSy ORDERING**

Time Domain Solution for Large Systems

- Start at inputs (sources)
- Logic circuits: levelize the graph
- SSy labeling according to graph results



### **SCHEDULING OF SSy**

Scheduling After Ordering

- Assignment of SSy sequence
- Scheduling based on ordering
- However, can be different from ordering

### **SCHEDULING OF SSy SOLVER**

#### **Basic Scheduling of SSy**

- Simple chain example
- Follow ordering = **basic** schedule
- Visit all SSy until all voltages, currents converged



#### $\epsilon$ Theorem Scheduling

- Assume that the SSy have directionality (logic ckts with ε feedback signal)
- The error of cutting the feedback at SSy k results in a back direction error  $O(\epsilon^{(N-k)})$
- The error propagating in the forward direction is  $O(\varepsilon)$ . EPSILON SCHEDULE

### **SSy SOLVE STEP**

#### Given:

- Partitioning done: We have SSys
- Ordering done, Static schedule known
- Solve SSys according to static schedule

#### Use of Updated Waveforms?

- Gauss-Jacobi: Update at end of solving all SSy, Converges slower
- Gauss-Seidel: New waveforms each SSy solve

## SSy LATENCY (DORMANCY)

Avoid Solve Compute Time for Latent SSy

- A Subsystem is Latent if
- All external waveforms  $x_E$  do not change



### WAVEFORM EXCHANGE, STORAGE

Example: Waveforms for two SSy

- Solve SSy 1 using waveforms from SSy 2
- WFs are divided into time windows, Store by windows



### **OVER AND UNDER WAVEFORM**

#### OVER (UNDER) RELAXATION FACTOR $0 \le \beta \le 2$

- Scale the update by  $\boldsymbol{\beta}$
- Under-relaxation
- Over-relaxation
- Approximate  $\beta(t)$

$$\begin{split} \dot{y}_{1}^{(k+1)}(t) &= f_{1}[y_{1}^{(k+1)}(t), x_{2}^{(k)}(t)] \\ x_{1}^{(k+1)}(t) &= \beta(t)y_{1}^{(k+1)}(t) + (1 - \beta(t))y_{1}^{(k)}(t) \\ \dot{y}_{2}^{(k+1)}(t) &= f_{2}[x_{1}^{(k+1)}(t), y_{2}^{(k+1)}(t)] \\ x_{2}^{(k+1)}(t) &= \beta(t)y_{2}^{(k+1)}(t) + (1 - \beta(t))y_{2}^{(k)}(t) \end{split}$$

## PARALLEL PROCESSING FOR WR

Suitability for Parallel Processing

- Used for large circuits with many SSys
- Need algorithms which keep most processors busy
- Would like to have number of processors smaller than number of SSy

#### Experience with Parallel Circuit Solver

- Faster if use more aggressive partitioning allowing for non-uniform iterations
- Best approach for parallel (Spice) circuit solver?

## **COMPARE WR TO SPICE**

#### **SPICE-** Time Point By Point Computations

- Computations are localized in single matrix
- Short compute times only
- Cannot tolerate delays(latency) in processor communication, less suitable for parallel

#### WR- Compute all Point for Time Window

- Put a small Spice on each processor as solver
- WR, processor exchange of *waveforms* rather than point data only
- Can tolerate larger communication latency

### PARALLEL WR FOR LARGE CIRCUITS

- Circuits with up to 186k transistors
- 256 processor WR circuit solver speedup



## SUMMARY OF VLSI CIRCUIT PART

#### WR For Pure Circuit Problems

- Many interesting circuit specific algorithms
- Good enhancement of WR performance
- High efficiency for parallel processing

#### State Of The Art

- Cheap parallel processors are widely available
- Makes WR more useful
- Still much work needs to be done

## APPLICATION TO COMBINED ELMAG./CIRCUIT PROBLEMS

#### EM/Ckt Problems

- General: EM and Ckt interactions challenging
- 3D EM solutions much different from Ckt
- Full wave solution adds challenges
- Nonlinear combined solvers are difficult

#### **Observations About Partitioning**

- EM problems can be systematically partitioned
- Homogeneous structures with fixed partitioning
- Partitioning and convergence can be controlled

Transverse Partitioning for Multi-Lines

- Modeling with many TLs is very time expensive
- Use transverse WR partitioning for problem



### **PARTITIONING FOR TLs**

Excessive Spice Compute Time

- Modeling with many TLs is very time expensive
- Without WR compute time



### **PARTITIONING FOR TL RESULTS**

#### **Excessive Spice Compute Time**

- Modeling for multi TLs is very time expensive
- Compute time with transverse WR is linear!



### **PEEC for 3D WR-EM SOLUTION**

#### PEEC - Transforms EM Problem to Circuit Domain

- Transient (and frequency) domain EM solutions
- PEEC ckt. models: Consists of capacitances, inductances, resistances, voltage, current sources
- Partitioning at coupled elements
- Using modified nodal analysis (MNA) formulation
- PEEC gets low frequency and *dc* solution

### **Basic Derivation of PEEC Model**

Equation for Total Electric Field
KVL:  $v = \int \mathbf{E} \cdot \mathbf{dl}$ 

$$\bar{E}^{i}(\bar{r},t) = \frac{\bar{J}(\bar{r},t)}{\sigma} + \mu \int_{\nu'} G(\bar{r},\bar{r}') \frac{\partial \bar{J}(\bar{r}',t_d)}{\partial t} d\nu' + \frac{\nabla}{\varepsilon_0} \int_{\nu'} G(\bar{r},\bar{r}') q(r',t_d) d\nu'$$
(1)

- PEEC Circuit Model Element Computation
  KVL: Voltage = R I + s Lp I + Q/C
- RHS Term 1: Resistance
- RHS Term 2: Partial Inductance
- RHS Term 3: Coefficient of Potential

## (Lp,P,R,\u03c7)PEEC Equivalent Circuit Model

PEEC Equivalent Circuits For Two Basic Cells

- Example: 3 Node Discretization of "Metal Stick"
- Path along metal conductor is strongly coupled
- Coupled Partial Inductances and Capacitances



### **Partitioning Into SSy**



EM Interactions between SSys

- PEEC mutual coupling between all EM SSy
- Challenge is coupled branches SSy to SSy

## **OUTLINE OF EM SSy PARTITIONING**

#### EM Geometry Partitioning into SSys

- Circuit topology is same for all PEEC cells
- Partial inductance coupling decreases with d
- Capacitive coupling decreases with d
- SSy formed based on weak coupling

#### **PEEC Model Direct Coupling**

- Break at less coupled parts
- Need to break resistive conduction path
- Galvanic-ally isolated units are easy to decouple
- Trade-off between SSy size and no. iterations

## PRE-ESTIMATION OF COUPLING STRENGTH

#### Coupling factors checks for partitioning

- Do we have to know the circuit details to estimate couplings?
- Good news. WR coupling may be large compared to EM coupling!
- WR Coupling factors  $\gamma \le 0.25$  is small
- Each iteration error will be reduced by factor 4
- Convergence in 3 to 5 iterations
- EM Coupling  $10^{-3}$  may still be large!
- Cannot neglect such EM couplings

### **ESTIMATE OF COUPLING STRENGTH**



#### Coupling factors checks for partitioning

- Inductive coupling:  $\gamma = Lp_{12}^2/(Lp_{11}Lp_{22})$
- Can also use distance-size related criteria
- Capacitive couplings, use similar approximation
- WR couplings is weak if  $\gamma \ll 0.1$

### Inductive SSy WR Decoupling



 $V_2 = Lp_{26}sI_6 + Lp_{29}sI_9 + \cdots; V_6 = Lp_{62}sI_2 + Lp_{69}sI_9 + \cdots$ 

# Capacitive SSy WR Decoupling



## ASSEMBLING THE SSy FROM ELEMENTS





Test Coupling all Elements Between SSy Elements

• *dc* paths are directly coupled

## **SOLUTION OF PARTITIONED** SSys

- Neutral Delay Differential Equations (NDDE) in Modified Nodal Analysis (MNA) form
- $\mathcal{C}^*_0 \dot{x} + \mathcal{G}^*_0 x + \sum_i \mathcal{G}^*_i x(t \tau_i) + \sum_i \mathcal{C}^*_i \dot{x}(t \tau_i) \sum_i \mathcal{B}^*_i u_i(t \tau_i) = -\sum_i \mathcal{C}^+_i \dot{x}(t \tau_i) \sum_i \mathcal{G}^+_i u_i(t \tau_i) + \sum_i \mathcal{B}^+_i u_i(t \tau_i)$
- Solve the subsystems SSy in usual Spice form
- Each processor has its own Spice Circuit solver
- Always use latest waveform results
- Each subsystem *SSy* has its own time-step
- Need Multi-Rate interpolation among the coupled waveforms

### **ORDERING AND SCHEDULING FOR** *SSys*



Ordering: (Pin1: SSy1), (Pin3: SSy2), (Pin4: SSy3), (Pin5: SSy 4), (Pin2,Gnd: SSy 5)

• Basic schedule SSy1, SSy2, SSy4, SSy5, SSy3

### MIXED WR-PARALLEL MATRIX SOLVER

- Large dependence on number of available processors
- Large dependence on system size (Number of SSy)
- Convergence in 3 to 10 iterations
- Conventional solution: At most 1 processor per SSy
- New solution: Assign matrix solver
- Number of processors depends on size of SSy
- SSy compute time is more uniform

## VALIDATION PROBLEMS FOR WR SOLUTION

The first contact is driven by a pulse voltage source with rise time  $\tau_r = 50$  ps.



Left: transient voltage; Right: magnitude spectrum.

### PARALLEL MATRIX SOLUTION OF SSy

- Size of each SSy is different for real problems
- Several processors to solve SSy circuits



#### Parallel compute time for 1 pin and pin + ground

#### **A CONNECTOR TEST PROBLEM**



### WAVEFORM COMPARISON WITH WR



Input and output WF flat and WR comparison

### **CONNECTOR MODELING RESULTS**

| Inductive cells | Capacitive cells | Nodes |
|-----------------|------------------|-------|
| 552             | 752              | 200   |

#### Table 1: Global problem.

| Inductive cells | Capacitive cells | Nodes |
|-----------------|------------------|-------|
| 264             | 304              | 80    |

Table 2: Grounded pin+ground plane.

| Global [s] | Grounded pin+ground plane [s] | Ratio |
|------------|-------------------------------|-------|
| 119.4      | 21.35                         | 5.5   |

Table 3: CPU-time requirements.

## LARGE EM SYSTEM BEHAVIOR FOR WR

- Original circuit matrix size: NxN
- Number of subsystems *SSy*: S
- Number of processors: P
- Number of WR iterations: K
- Circuit solver run time assumed:  $O(N^2)$
- S = P = 3; K=3;

$$Time_{Full} = N^2$$
(2)  
$$Time_{WR} = \frac{KN^2}{S} = N^2$$
(3)

## **SUMMARY AND CONCLUSIONS**

#### WR for circuits

- Introduction of issues for circuit WR
- Status: Ongoing work on improving partitioning

#### **PEEC** solver status

• Work in starting phase, several problems solved

#### **General Status**

- Many papers have been published, parallel WR
- Continuous progress on new algorithms and implementations
- Commercial interest is in large parallel solvers