Mastering IS-IS – Part 4: Convergence and Resiliency – Optimizing for Speed and Stability

In Part 3, we dissected the LSDB, explored how LSPs are originated and flooded, and examined the SPF process in depth using our reference multi-level IS-IS topology. Now, in Part 4, we move beyond database synchronization to focus on network responsiveness and fault tolerance: ensuring that IS-IS not only calculates optimal paths but can also recalculate them in milliseconds when the network changes. We will cover the key mechanisms that make this possible.

Convergence Components in IS-IS

IS-IS convergence is a three-phase process:

Failure Detection: The first step is quickly recognizing a loss of adjacency or link failure.
Topology Update Propagation: The router must flood updated LSPs across the affected level(s) to inform all other routers of the change.¹
SPF Recalculation and RIB/FIB Installation: Each router then independently computes new paths and pushes them into the forwarding plane.
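
To make the three phases concrete, here is a minimal sketch that treats total convergence time as the sum of the three phase durations. The class name `ConvergenceBudget`, its field names, and every timing value are illustrative assumptions, not measurements from the reference topology; real phases can overlap and vary by platform.

```python
from dataclasses import dataclass


@dataclass
class ConvergenceBudget:
    """Per-phase timings in milliseconds (illustrative values, not measurements)."""

    detection_ms: float     # phase 1: failure detection
    flooding_ms: float      # phase 2: LSP flooding across the affected level(s)
    spf_and_fib_ms: float   # phase 3: SPF run plus RIB/FIB installation

    def total_ms(self) -> float:
        # Simplifying assumption: the phases run back to back, so the total is their sum.
        return self.detection_ms + self.flooding_ms + self.spf_and_fib_ms

    def dominant_phase(self) -> str:
        # Name of the phase that contributes the most time.
        phases = {
            "detection": self.detection_ms,
            "flooding": self.flooding_ms,
            "spf_and_fib": self.spf_and_fib_ms,
        }
        return max(phases, key=phases.get)


if __name__ == "__main__":
    # Hypothetical numbers: slow (30 s) versus fast (150 ms) failure detection.
    slow = ConvergenceBudget(detection_ms=30_000, flooding_ms=50, spf_and_fib_ms=100)
    fast = ConvergenceBudget(detection_ms=150, flooding_ms=50, spf_and_fib_ms=100)
    for label, budget in (("slow detection", slow), ("fast detection", fast)):
        print(f"{label}: {budget.total_ms()} ms total, "
              f"dominated by {budget.dominant_phase()}")
```

Under these assumed numbers, detection is the largest term in both cases, which is why failure detection is usually the first phase to tune.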

In our reference topology from Part 3, a failure between R3 and R4 affects only the Level 1 area. A failure on the R1 to R3 link impacts the Level 2 backbone and both Level 1 areas, while a failure between R1 and R2 is contained within a single Level 1 area.

Fast Hello Timers and BFD for Rapid Failure Detection

The default IS-IS Hello Timers are often too slow for modern networks. For critical links, we need sub-second failure detection.

Fast Hello Timers: The Hello Interval controls how often IS-IS sends a Hello PDU. The Hold Time, configured as a multiple of the Hello Interval (e.g., three times), is how long a router waits without hearing a Hello before it declares the neighbor down. Decreasing the Hello Interval therefore shortens the time needed to detect a failed neighbor.
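- Default: typically a 10-second Hello Interval with a multiplier of 3 (a 30-second Hold Time) on both broadcast and point-to-point links; on a broadcast segment, the DIS sends Hellos three times as often, roughly every 3.3 seconds. Exact defaults vary by vendor.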
- Tuned: 1 second or less for critical adjacencies.
Bidirectional Forwarding Detection (BFD): BFD is the industry standard for sub-second failure detection. Unlike fast hellos, BFD is a lightweight, independent protocol that can detect failures on any media type. It is generally preferred because it is less CPU-intensive than very fast hellos, and on many platforms BFD sessions can be offloaded to the forwarding hardware.
- BFD uses a minimum interval to define how often BFD packets are sent, and a multiplier to define how many consecutive packets can be missed before a link is declared down. For example, a 50 ms interval with a multiplier of 3 means the router will declare the link down after 150 ms (50 ms × 3) without receiving a BFD packet.
- In our topology: BFD should be considered mandatory on the Level 2 backbone link between R1 and R3 and is highly recommended for the Level 1 links as well.
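
The timer arithmetic for both mechanisms can be sketched in a few lines. The function names `hello_hold_time_s` and `bfd_detection_time_ms` are hypothetical helpers for illustration, and the BFD calculation assumes both ends are configured with the same interval and multiplier; in a real session the detection time is derived from the values negotiated between the two peers.

```python
def hello_hold_time_s(hello_interval_s: float, multiplier: int = 3) -> float:
    """Hold time in seconds: Hello Interval multiplied by the hello multiplier."""
    if hello_interval_s <= 0 or multiplier < 1:
        raise ValueError("interval must be positive and multiplier at least 1")
    return hello_interval_s * multiplier


def bfd_detection_time_ms(min_interval_ms: float, multiplier: int = 3) -> float:
    """Detection time in milliseconds, assuming symmetric settings on both peers."""
    if min_interval_ms <= 0 or multiplier < 1:
        raise ValueError("interval must be positive and multiplier at least 1")
    return min_interval_ms * multiplier


if __name__ == "__main__":
    # Compare a default-style hello, a tuned 1-second hello, and the 50 ms x 3 BFD example.
    print("hello 10 s x 3 ->", hello_hold_time_s(10), "s")
    print("hello 1 s x 3  ->", hello_hold_time_s(1), "s")
    print("BFD 50 ms x 3  ->", bfd_detection_time_ms(50, 3), "ms")
```

Even a hello tuned to 1 second still takes seconds to expire, while the BFD example from the text detects the failure in 150 ms.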