Chapter 5. Troubleshooting BGP Convergence


The following topics are covered in this chapter:

Understanding BGP route convergence

Troubleshooting convergence issues

BGP slow peer

Troubleshooting BGP route churns

Understanding BGP Route Convergence

Every network design needs to be planned and tested before it is deployed. Various tests, such as load tests, failure tests, and convergence tests, should be performed before the network carries production traffic. Convergence testing is usually the most challenging of these, because it is difficult to define how well a network converges. Its purpose is to determine how quickly the network recovers once it is in production and carrying live traffic. Routing convergence can be broadly defined as how quickly a routing protocol becomes stable after a change occurs in the network, for example, a protocol or link flap. In terms of Border Gateway Protocol (BGP), the network is considered converged when, after a network event or any other change, all BGP neighbor sessions have been established and the neighbors updated, routes have been learned from all neighbors and installed into the routing table, and all routing tables across the network are consistent.

Faster convergence leads to higher availability and improved network stability. It is therefore important to measure convergence time through thorough testing before the network is deployed in production. But what is convergence time? Consider the topology shown in Figure 5-1. There are multiple paths from the source router to the destination router, but for simplicity, consider just two: primary and secondary. The primary path extends from R1 to R2 to R4 to R6, whereas the secondary path extends from R1 to R3 to R5 to R6. If a link on the primary path fails, the best path is impacted, which leads to traffic loss, and a next-best path is computed. Convergence time is the interval from the moment traffic loss begins, while no alternate path is yet available, to the moment traffic starts flowing again.
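The definition above can be illustrated with a minimal Python sketch. It assumes that probe packets are sent from the source toward the destination at a known rate during a failure test and that each probe is recorded as a (send time, received) pair. The function name and the probe format are hypothetical and are not taken from any test tool.

```python
from typing import Iterable, Tuple


def convergence_time(probes: Iterable[Tuple[float, bool]]) -> float:
    """Return the seconds between the first lost probe and the first
    probe received after it. Returns 0.0 if no probe was lost.

    Each probe is (send_time_in_seconds, received), in send order.
    """
    loss_start = None
    for sent_at, received in probes:
        if not received and loss_start is None:
            loss_start = sent_at          # traffic loss begins
        elif received and loss_start is not None:
            return sent_at - loss_start   # traffic is flowing again
    if loss_start is not None:
        raise ValueError("traffic never recovered during the test")
    return 0.0


# Hypothetical test run: the primary path fails at t=2.0 and traffic
# resumes over the secondary path at t=5.0.
probes = [(0.0, True), (1.0, True), (2.0, False),
          (3.0, False), (4.0, False), (5.0, True), (6.0, True)]
print(convergence_time(probes))
```

The resolution of such a measurement is limited by the probe interval, so a shorter interval gives a more precise estimate.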

Figure 5-1 Topology with Primary and Secondary Path

Like any other dynamic routing protocol, BGP accepts routing updates from its neighbors. It then advertises a received route to its other peers, but not back to the peer from which it was received, and only if the route is the best path. BGP uses an explicit withdrawal section in the update message to inform peers of the loss of a path so that they can update their BGP tables accordingly. Similarly, BGP uses implicit signaling: a new update for an already learned prefix replaces the existing path information. As the BGP update message in Figure 5-2 shows, each update message is bound to a single set of BGP attributes. Thus, any update with a different set of attributes must be formatted as a separate update message before it is replicated to peers.
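The relationship between attribute sets and update messages can be sketched in Python as follows. This is a simplified illustration, not a BGP implementation: the attribute names, the dictionary layout, and the pack_updates function are assumptions, and the sketch ignores the maximum message size, which can force even prefixes with identical attributes to be split across several messages.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# An attribute set is modeled as a hashable set of (name, value) pairs.
AttributeSet = FrozenSet[Tuple[str, str]]


def pack_updates(best_paths: Dict[str, Dict[str, str]]) -> Dict[AttributeSet, List[str]]:
    """Group prefixes so that each group shares one set of attributes.

    Each resulting group corresponds to one update message in this
    simplified model.
    """
    updates: Dict[AttributeSet, List[str]] = defaultdict(list)
    for prefix, attributes in best_paths.items():
        updates[frozenset(attributes.items())].append(prefix)
    return dict(updates)


# Hypothetical best paths: two prefixes share attributes, one differs.
best_paths = {
    "10.1.1.0/24": {"as_path": "65001", "next_hop": "192.0.2.1"},
    "10.1.2.0/24": {"as_path": "65001", "next_hop": "192.0.2.1"},
    "10.2.0.0/16": {"as_path": "65002 65010", "next_hop": "192.0.2.1"},
}
for attributes, prefixes in pack_updates(best_paths).items():
    print(sorted(attributes), prefixes)
```

In this model, the more prefixes share an attribute set, the fewer messages have to be formatted.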

Figure 5-2 BGP Update Message

As networks grow larger, maintaining an ever-increasing number of Transmission Control Protocol (TCP) sessions and routes can pose scalability challenges and convergence issues, especially in service provider and enterprise networks. As the scale of the network increases, the BGP process has to process all the routes present in the BGP table and update its peers. In addition, a router processing updates in such a scaled environment demands more memory and CPU resources. Because BGP is a key protocol for the Internet, it is important to ensure that BGP converges quickly even at increased scale.

BGP convergence depends on various factors. It is all about the speed of the following:

Establishing sessions with a number of peers

Locally generating all the BGP paths, either via network statements or redistribution of static, connected, or IGP routes, and/or from other components for other address-families (for example, Multicast Virtual Private Network (MVPN) from multicast, Layer 2 Virtual Private Network (L2VPN) from the l2vpn manager, and so on)

Sending and receiving multiple BGP tables, that is, different BGP address-families, to and from each peer

Performing the best-path calculation, after receiving all the paths from peers, to find the best path and/or multipath, additional path, or backup path

Installing the best path into multiple routing tables, such as the default or Virtual Routing and Forwarding (VRF) routing table

Running the import and export mechanism

Passing the path calculation result to different lower-layer components for other address-families, such as l2vpn or multicast

BGP uses a lot of CPU cycles when processing BGP updates and requires memory for maintaining BGP peers and routes in the BGP table. Appropriate hardware should be chosen based on the role of the BGP router in the network. The more memory a router has, the more routes it can support, just as a router with a faster CPU can support a larger number of peers.

Because BGP updates rely on TCP, optimizing router resources such as memory, along with TCP session parameters such as maximum segment size (MSS), path MTU discovery, interface input queues, and TCP window size, helps improve convergence.
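To give a rough sense of why MSS matters, the following Python sketch counts the TCP segments needed to carry a given volume of BGP update data at different MSS values. It assumes IPv4 with 20-byte IP and TCP headers and no options, and the 1,000,000-byte payload is an arbitrary illustrative figure, not a measurement. The value 536 is the conventional default MSS used when nothing larger is negotiated or discovered.

```python
IPV4_HEADER_BYTES = 20  # assumes no IP options
TCP_HEADER_BYTES = 20   # assumes no TCP options


def mss_for_mtu(mtu: int) -> int:
    """MSS that fits in one packet of the given MTU (IPv4, no options)."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


def segments_needed(payload_bytes: int, mss: int) -> int:
    """Number of TCP segments needed to carry payload_bytes (ceiling division)."""
    return (payload_bytes + mss - 1) // mss


payload = 1_000_000  # hypothetical volume of update data, in bytes
for label, mss in [("default MSS", 536),
                   ("1500-byte MTU", mss_for_mtu(1500)),
                   ("9000-byte MTU", mss_for_mtu(9000))]:
    print(f"{label}: MSS={mss}, segments={segments_needed(payload, mss)}")
```

Fewer, larger segments mean fewer packets for the sender and receiver to process and acknowledge, which is one reason path MTU discovery can help convergence. The actual gain depends on the platform and the path.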

BGP Update Groups

An update group is a collection of peers with an identical outbound policy. Update groups are formed dynamically as the peers are configured. Two peers are part of the same update group if one of the following conditions is met:

Peers are in the same peer group.

Peers use the same template.

If neither of the preceding conditions applies, the peers have the same outbound policy.
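The grouping rule can be sketched in Python as follows. This is only a conceptual model: the Peer class, the representation of an outbound policy as a tuple of strings, and the policy labels are assumptions made for illustration and do not reflect how any router stores its configuration. Peers that share a peer group or template inherit the same outbound policy, so the sketch keys on the policy alone.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Peer:
    name: str
    # Hypothetical stand-in for everything that shapes outbound updates.
    outbound_policy: Tuple[str, ...]


def form_update_groups(peers: Iterable[Peer]) -> Dict[Tuple[str, ...], List[str]]:
    """Place peers with an identical outbound policy in the same group."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for peer in peers:
        groups[peer.outbound_policy].append(peer.name)
    return dict(groups)


# Hypothetical view from one route reflector: four clients share a
# policy, and the two other route reflectors share a different one.
peers = [
    Peer("R2", ("route-reflector-client",)),
    Peer("R3", ("route-reflector-client",)),
    Peer("R4", ("route-reflector-client",)),
    Peer("R5", ("route-reflector-client",)),
    Peer("R10", ()),
    Peer("R20", ()),
]
for policy, members in form_update_groups(peers).items():
    print(policy, members)
```

Any difference in outbound policy, however small, places a peer in a separate group in this model.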

After an update group is formed, one peer within the update group is selected as the group leader. The BGP process walks the BGP table for the leader and formats the messages, which are then replicated to the other members of the update group. Because all the peers in the update group need the same information, the router formats each update only once, for the leader, rather than once per peer. This saves a lot of resources and processing time on the router.

On IOS, an update group is verified using the command show bgp ipv4 unicast update-group [group-index]. This command displays the update group index, the address-family under which the update group is formed, the messages formatted in the update group, the messages replicated to the peers in the update group, and all the peers in the update group. If a peer is still being replicated to or has not yet converged, an asterisk (*) is seen beside the peer, indicating that the peer is still being updated. The topology in Figure 5-3 is used to explain update groups and the update generation process. In this topology, R1, R10, and R20 are the three route reflectors (RRs), whereas R2, R3, R4, and R5 are the RR clients.
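The format-once, replicate-to-many idea can be sketched in Python as follows. This is a conceptual model only: picking the first member as leader, the format_update callback, and the way the counters are computed are all assumptions for illustration and may not match how any platform selects a leader or counts messages.

```python
from typing import Callable, Dict, List, Tuple


def generate_for_group(
    members: List[str],
    routes: List[str],
    format_update: Callable[[str], bytes],
) -> Tuple[Dict[str, List[bytes]], int, int]:
    """Format each update once and copy it to every group member.

    Returns (per-peer output queues, messages formatted, messages replicated).
    """
    if not members:
        raise ValueError("an update group needs at least one peer")
    leader = members[0]  # assumption: the leader selection rule is not modeled
    # Expensive step, performed once per route on behalf of the leader.
    formatted = [format_update(route) for route in routes]
    # Cheap step: every member receives a copy of the same messages.
    queues = {peer: list(formatted) for peer in members}
    replicated = len(formatted) * (len(members) - 1)  # copies for non-leaders
    assert leader in queues
    return queues, len(formatted), replicated


# Hypothetical formatter that records how often it is called.
calls = []


def format_update(route: str) -> bytes:
    calls.append(route)
    return f"UPDATE {route}".encode()


queues, n_formatted, n_replicated = generate_for_group(
    ["R2", "R3", "R4", "R5"], ["10.1.1.0/24", "10.1.2.0/24"], format_update
)
print(len(calls), n_formatted, n_replicated)
```

With four members and two routes, the formatter in this sketch runs twice instead of eight times, which is the saving that update groups are designed to provide.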

Figure 5-3 Topology with Route Reflectors

Example 5-1 displays the output of the show bgp ipv4 unicast update-group command, which includes the previously discussed information. The command also shows the BGP update version, which generally matches the update version in the output of the show bgp ipv4 unicast summary command. If the peers of the update group are configured as route-reflector clients, this is indicated in the command output. The command also displays any outbound policy attached to the update group.

Example 5-1 BGP Update Group on IOS
