Mastering IS-IS – Part 4: Convergence and Resiliency – Optimizing for Speed and Stability

Optimize your network for speed and stability with this deep dive into IS-IS convergence. We explore fast timers, BFD, and SPF throttling to achieve sub-second convergence. Learn how Graceful Restart and the Overload Bit ensure resiliency during critical events.

In Part 3, we dissected the LSDB, explored how LSPs are originated and flooded, and examined the SPF process in depth using our reference multi-level IS-IS topology. Now, in Part 4, we move beyond database synchronization to focus on network responsiveness and fault tolerance: ensuring that IS-IS not only calculates optimal paths but can also recalculate them in milliseconds when the network changes. We will cover the key mechanisms that make this possible.

Convergence Components in IS-IS

IS-IS convergence is a three-phase process:

  1. Failure Detection: The first step is quickly recognizing a loss of adjacency or link failure.
  2. Topology Update Propagation: The router must flood updated LSPs across the affected level(s) to inform all other routers of the change.1
  3. SPF Recalculation and RIB/FIB Installation: Each router then independently computes new paths and pushes them into the forwarding plane.
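Because the three phases run roughly in sequence, end-to-end convergence time can be approximated as the sum of the time spent in each phase. The following is a minimal Python sketch of that idea. The class name, field names, and millisecond figures are hypothetical placeholders chosen for illustration, not measurements from the reference topology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvergenceBudget:
    """Hypothetical per-phase time budget, in milliseconds."""
    detection_ms: float     # phase 1: failure detection
    propagation_ms: float   # phase 2: LSP flooding across the affected level(s)
    spf_and_fib_ms: float   # phase 3: SPF run plus RIB/FIB installation

    def total_ms(self) -> float:
        # Approximation: the phases are treated as strictly sequential.
        return self.detection_ms + self.propagation_ms + self.spf_and_fib_ms

    def dominant_phase(self) -> str:
        # Name of the phase that contributes the most time.
        phases = {
            "detection": self.detection_ms,
            "propagation": self.propagation_ms,
            "spf_and_fib": self.spf_and_fib_ms,
        }
        return max(phases, key=phases.get)


# Illustrative numbers only (assumptions, not measured values).
budget = ConvergenceBudget(detection_ms=150, propagation_ms=50, spf_and_fib_ms=200)
print(budget.total_ms())
print(budget.dominant_phase())
```

Breaking the total down this way shows which phase is worth tuning first: shaving time off a phase that is already small does little if another phase dominates.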

In our reference topology from Part 3, a failure between R3 and R4 affects only the Level 1 area. A failure on the R1 to R3 link impacts the Level 2 backbone and both Level 1 areas, while a failure between R1 and R2 is contained within a single Level 1 area.

Fast Hello Timers and BFD for Rapid Failure Detection

The default IS-IS Hello Timers are often too slow for modern networks. For critical links, we need sub-second failure detection.

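  • Fast Hello Timers: The Hello Interval controls how often IS-IS sends a Hello PDU. The Hold Time, the period a neighbor waits without hearing a Hello before declaring the adjacency down, is set as a multiple of the Hello Interval (e.g., three times). Decreasing the interval therefore shortens the Hold Time and speeds up how quickly a failed neighbor is declared down.
    • Default: typically 10 seconds, which gives a 30-second Hold Time with a multiplier of 3. On a broadcast segment, the DIS sends Hellos at one-third of that interval (about 3.3 seconds).
    • Tuned: 1 second or less for critical adjacencies.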
  • Bidirectional Forwarding Detection (BFD): BFD is the industry standard for sub-second failure detection. Unlike fast hellos, BFD is a lightweight, independent protocol that can detect failures on any media type. It is generally preferred because it is less CPU-intensive than very fast hellos.
    • BFD uses a minimum interval to define how often BFD packets are sent, and a multiplier to define how many consecutive packets can be missed before a link is declared down. For example, a 50 ms interval with a multiplier of 3 means the router will declare the link down after 150 ms (50 ms × 3) without receiving a BFD packet.
    • In our topology: BFD should be considered mandatory on the Level 2 backbone link between R1 and R3 and is highly recommended for the Level 1 links as well.
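Both mechanisms share the same arithmetic: detection time is roughly the send interval multiplied by the number of missed packets tolerated. The short Python sketch below compares the figures discussed above. The function name is hypothetical, and the multiplier of 3 applied to the hello examples is an assumption taken from the "three times" example in the text.

```python
def detection_time_ms(interval_ms: float, multiplier: int) -> float:
    """Approximate worst-case time to declare a neighbor down.

    interval_ms: how often hello or BFD packets are sent, in milliseconds.
    multiplier:  how many consecutive packets may be missed.
    """
    if interval_ms <= 0 or multiplier < 1:
        raise ValueError("interval must be positive and multiplier at least 1")
    return interval_ms * multiplier


# Figures from the discussion above; the hello multiplier of 3 is an assumption.
scenarios = {
    "default hello (10 s x 3)": detection_time_ms(10_000, 3),
    "fast hello (1 s x 3)": detection_time_ms(1_000, 3),
    "BFD (50 ms x 3)": detection_time_ms(50, 3),
}

for name, ms in scenarios.items():
    print(f"{name}: {ms} ms")
```

Even with a 1-second Hello Interval, detection still takes seconds, which is why BFD is the usual choice when the target is sub-second convergence.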
