
Cisco Nexus 9300 – VXLAN with BGP EVPN Control Plane – Part 1

For the last few weeks I have been configuring, testing and bringing into use the new Cisco Nexus 9300 (Nexus 9000) platform with VXLAN and a BGP EVPN control plane. It proved somewhat challenging because documentation and user experiences are still sparse, and some posts, configuration guides and official documents tell you to do things differently, with no clear explanation of why one way was chosen over another. So I decided to write this post to clear things up, and as always, if you have questions or agree/disagree with something, please comment below. Also note that this post is more a configuration guide than a VXLAN (or BGP EVPN) introduction; Google and the Cisco documentation can help with that. Part 2 will introduce the DCI (Data Center Interconnect) and how to implement it with VXLAN and BGP EVPN.

Two important notes before we begin:

  • If you use BGP as the ingress-replication protocol, you do not need any multicast configuration!
  • Also note that the configuration below uses eBGP (an iBGP configuration is quite different)!

The infra is built with the following specs and software:

  • Spines: Cisco Nexus 9332PQ
  • Leafs: Cisco Nexus 9372PX
  • All switches are running the 7.0(3)I1(3) software (latest as of 3.9.2015)

Topology overview (DCI will be implemented in Part 2):

Topology overview

Topology in more detail:

Detailed topology

Spine configurations

Let’s take a look at configuring the Spines first. The Spines do not require any VXLAN features as they simply route the traffic, but they do of course require the BGP EVPN control plane for propagating the MAC addresses and MAC-IP pairs. I’m also including the PIM (Protocol Independent Multicast) configuration in case you want to use multicast for BUM traffic (Broadcast, Unknown unicast and Multicast); I’m using BGP for BUM traffic, as you can see further below in the Leaf-switch configurations. Only the Spine-1 configuration is shown, Spine-2 should be self-explanatory. The features that need to be enabled on the Spines are as follows; most are self-explanatory, but note that “nv overlay evpn” is what enables the BGP EVPN control plane on the switch.
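
Something along these lines (leave PIM and BFD out if you are not going to use multicast or BFD):

nv overlay evpn
feature bgp
feature pim
feature bfd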

First we will configure the required loopback addresses. Loopback0 is the unique loopback address of each Spine and loopback1 is the shared PIM Anycast-RP (Rendezvous Point) address (the same on both Spines). Also enable PIM sparse-mode on both loopbacks for multicast to work.
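
For example on Spine-1 (every address in these sketches is a placeholder, use your own addressing plan):

interface loopback0
  description Spine-1 unique loopback / router-id
  ip address 10.0.0.1/32
  ip pim sparse-mode

interface loopback1
  description Anycast-RP address, shared with Spine-2
  ip address 10.0.0.100/32
  ip pim sparse-mode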

If you want to use multicast for BUM traffic, below is the configuration that makes the Spines act as PIM RPs. The Anycast-RP configuration provides redundancy: if one Spine goes down, the other will still handle the multicast.
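
Using the placeholder addresses above (10.0.0.100 is the shared loopback1 address, 10.0.0.1 and 10.0.0.2 are the unique loopback0 addresses of Spine-1 and Spine-2):

ip pim rp-address 10.0.0.100 group-list 224.0.0.0/4
ip pim anycast-rp 10.0.0.100 10.0.0.1
ip pim anycast-rp 10.0.0.100 10.0.0.2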

The interface configurations are as follows. Enable jumbo frames on all interfaces (mtu 9216); please note that “system jumbomtu 9216” must also be configured. Enable PIM sparse-mode and BFD (Bidirectional Forwarding Detection) if you want to use multicast (BFD gives faster PIM convergence). “no ip redirects” and “no ipv6 redirects” must be configured for BFD to work properly. If and when you have more Leafs, the interface configurations are similar for all of them.
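
For example, the interface towards Leaf-1 (interface numbers, addresses and BFD timers are again placeholders):

system jumbomtu 9216

interface Ethernet1/1
  description Link to Leaf-1
  no switchport
  mtu 9216
  no ip redirects
  no ipv6 redirects
  ip address 10.1.1.1/30
  ip pim sparse-mode
  bfd interval 250 min_rx 250 multiplier 3
  no shutdown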

The only thing left in the Spine configuration is BGP. As displayed in the diagram above, the Spines share one AS, and Leafs that form a vPC pair are always in the same AS. You can of course alter the AS-number scheme, but make sure that everything is properly advertised and installed into the routing tables where need be. Notice especially that if you use the same AS-number for all Leafs, you need to configure “disable-peer-as-check” on the Spine per neighbor (for both address-families, ipv4 unicast and l2vpn evpn) so that routes learned from one Leaf in that AS are advertised to the other Leafs in the same AS.
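
A sketch of the Spine-1 BGP configuration (AS numbers, addresses and the route-map name are placeholders). “retain route-target all” is needed because the Spines carry no VRFs, “disable-peer-as-check” is shown here because Leaf-1 and Leaf-2 share an AS, and the outbound route-map (shown after this) stops eBGP from rewriting the EVPN next-hop:

router bgp 65000
  router-id 10.0.0.1
  address-family ipv4 unicast
    network 10.0.0.1/32
    network 10.0.0.100/32
    maximum-paths 4
  address-family l2vpn evpn
    retain route-target all
  neighbor 10.1.1.2 remote-as 65001
    description Leaf-1
    bfd
    address-family ipv4 unicast
      disable-peer-as-check
    address-family l2vpn evpn
      send-community extended
      disable-peer-as-check
      route-map RM-NH-UNCHANGED out
  neighbor 10.1.1.6 remote-as 65001
    description Leaf-2
    bfd
    address-family ipv4 unicast
      disable-peer-as-check
    address-family l2vpn evpn
      send-community extended
      disable-peer-as-check
      route-map RM-NH-UNCHANGED out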

And below is the route-map used in the above BGP-configurations:
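
The key point in the eBGP design is that the Spine must not rewrite the next-hop of the EVPN routes, so the outbound route-map simply keeps it unchanged (RM-NH-UNCHANGED is the placeholder name used in the sketch above):

route-map RM-NH-UNCHANGED permit 10
  set ip next-hop unchanged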

Leaf configurations

Next we will go through the Leaf-switch configurations. These are quite a bit more complex due to the VXLAN configuration. Let’s go through this in the same order as for the Spines: first features, then loopbacks, multicast (if needed) and interface configurations. Only the Leaf-1 configuration is shown; again it is easy to configure the next switches using the same template. In this configuration example Leaf-1 and Leaf-2 are configured as a vPC domain, so we will go through the vPC configuration here as well.
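
The Leaf feature set could look something like this (PIM and BFD again only if you use them):

nv overlay evpn
feature bgp
feature pim
feature bfd
feature fabric forwarding
feature interface-vlan
feature vn-segment-vlan-based
feature nv overlay
feature vpc
feature lacp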

The loopback configuration is a bit different when using vPC. The primary IP address is unique on each vPC peer, while the secondary address is the same on both switches. When BGP EVPN advertises routes to the Spines it uses the secondary IP address as the next-hop, which allows both switches to encapsulate and terminate VXLAN traffic.
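
On Leaf-1, for example (10.0.0.11 is unique to Leaf-1 and 10.0.0.33 is the secondary address shared with Leaf-2; placeholders, as before):

interface loopback0
  description VTEP / underlay loopback
  ip address 10.0.0.11/32
  ip address 10.0.0.33/32 secondary
  ip pim sparse-mode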

The multicast configuration (if needed) is not complicated; the Anycast-RP address is simply used as the RP.
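
With the placeholder addressing above you simply point to the Anycast-RP address:

ip pim rp-address 10.0.0.100 group-list 224.0.0.0/4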

The Spine-facing interface configuration is below, with the same specifics as in the Spine interface configuration above.
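
For example, the uplink towards Spine-1 (the Spine-2 uplink is configured the same way with its own addresses):

interface Ethernet1/49
  description Link to Spine-1
  no switchport
  mtu 9216
  no ip redirects
  no ipv6 redirects
  ip address 10.1.1.2/30
  ip pim sparse-mode
  bfd interval 250 min_rx 250 multiplier 3
  no shutdown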

vPC configuration and the required features (the peer-link configuration is not shown here, as it is no different from a traditional Nexus vPC implementation):
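
A sketch (the domain number and keepalive addresses are placeholders):

vpc domain 10
  peer-switch
  peer-gateway
  peer-keepalive destination 192.168.0.12 source 192.168.0.11 vrf management
  ip arp synchronize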

SVI for vPC peer-gateway functionality. Remember to allow the VLAN over peer-link!
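
For example (the VLAN number and addressing are placeholders; Leaf-2 uses the other address of the /30):

vlan 999

interface Vlan999
  description Routed path between the vPC peers over the peer-link
  no shutdown
  mtu 9216
  no ip redirects
  ip address 10.255.255.1/30
  ip pim sparse-mode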

The BGP configuration: the router-id is unique on each switch in the vPC pair (it is best to use the primary, unique loopback address).
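
A sketch for Leaf-1, using the same placeholder values as the Spine example:

router bgp 65001
  router-id 10.0.0.11
  address-family ipv4 unicast
    redistribute direct route-map RM-LOOPBACKS
    maximum-paths 4
  neighbor 10.1.1.1 remote-as 65000
    description Spine-1
    bfd
    address-family ipv4 unicast
    address-family l2vpn evpn
      send-community extended
  neighbor 10.1.2.1 remote-as 65000
    description Spine-2
    bfd
    address-family ipv4 unicast
    address-family l2vpn evpn
      send-community extended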

And below are the route-maps used in the above BGP-configurations:
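
For example, a route-map and prefix-list that advertise only the loopback addresses into the underlay (the names and the prefix range are placeholders):

ip prefix-list PL-LOOPBACKS seq 5 permit 10.0.0.0/24 eq 32

route-map RM-LOOPBACKS permit 10
  match ip address prefix-list PL-LOOPBACKS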

Now we can get to configuring the VXLAN networks. First configure the VLANs and their vn-segments (VNI, Virtual Network Identifier). The first three Customer VLANs are for normal traffic and the last one (1001) is for VXLAN <–> VXLAN routing (see articles on Symmetric IRB [Integrated Routing and Bridging], which Cisco uses).
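
For example (the VLAN-to-VNI numbering is just one possible scheme, a fixed prefix plus the VLAN ID; VLAN 300 is assumed here to be a pure Layer 2 network):

vlan 100
  name CUST-X-NET-100
  vn-segment 100100
vlan 200
  name CUST-X-NET-200
  vn-segment 100200
vlan 300
  name CUST-X-L2-ONLY
  vn-segment 100300
vlan 1001
  name CUST-X-L3-VNI
  vn-segment 101001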

Configure a VRF for the Customer.
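
Using the L3 VNI from above (the rd and route-targets are left on auto to keep the template simple):

vrf context VRF-CUSTOMER-X
  vni 101001
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn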

To enable routing between VXLANs you must create an SVI in the routing VNI. You must also create anycast gateways for all the L3-routable networks (VLAN 100 and 200 in the example below).
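
Something like this (the anycast-gateway MAC and the gateway addresses are placeholders; they must be configured identically on every Leaf):

fabric forwarding anycast-gateway-mac 0000.2222.3333

interface Vlan1001
  description Routing VNI for VRF-CUSTOMER-X
  no shutdown
  mtu 9216
  vrf member VRF-CUSTOMER-X
  ip forward

interface Vlan100
  no shutdown
  mtu 9216
  vrf member VRF-CUSTOMER-X
  ip address 192.168.100.1/24
  fabric forwarding mode anycast-gateway

interface Vlan200
  no shutdown
  mtu 9216
  vrf member VRF-CUSTOMER-X
  ip address 192.168.200.1/24
  fabric forwarding mode anycast-gateway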

The NVE (Network Virtualization Endpoint) interface is the VTEP (VXLAN Tunnel Endpoint). Use the loopback0 interface as the NVE source-interface. In the vPC case this means the shared secondary loopback address (the same on both peers) is used; otherwise the address would be unique on every Leaf.
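
With BGP ingress-replication and ARP suppression on the routed networks, the interface could look like this (the VNI numbers follow the earlier placeholders; with multicast you would configure an mcast-group per VNI instead of ingress-replication):

interface nve1
  no shutdown
  source-interface loopback0
  host-reachability protocol bgp
  member vni 100100
    suppress-arp
    ingress-replication protocol bgp
  member vni 100200
    suppress-arp
    ingress-replication protocol bgp
  member vni 100300
    ingress-replication protocol bgp
  member vni 101001 associate-vrf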

Add the Customer VRF configuration under “router bgp 65001” to propagate the Layer 3 information as well.
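
A minimal sketch with the VRF from above:

router bgp 65001
  vrf VRF-CUSTOMER-X
    address-family ipv4 unicast
      advertise l2vpn evpn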

And as a last step you need to enable EVPN Layer 2 information propagation. The routing VNI does not belong here, only the networks that carry Layer 2 traffic. Again, an easy way to generate route-targets is to use the VNI number as the prefix and the VLAN as the suffix (VNI:VLAN).
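
With the placeholder VLAN/VNI numbers used above, that would look like this:

evpn
  vni 100100 l2
    rd auto
    route-target import 100100:100
    route-target export 100100:100
  vni 100200 l2
    rd auto
    route-target import 100200:200
    route-target export 100200:200
  vni 100300 l2
    rd auto
    route-target import 100300:300
    route-target export 100300:300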

Normal end-host-facing interfaces, or interfaces facing for example non-VXLAN Layer 2 switches, are configured just as they would be without any VXLAN functionality.

Verifying that everything works

And now on to verifying that everything works. Of course, if you can successfully ping from a connected host to a host behind another switch, that is quite a good indication that everything works (remember though that traffic between hosts in the same VLAN within the same vPC domain takes the peer-link, not the VXLAN-encapsulated path). Please ask if you need more specific outputs; I expect most of these to be familiar, so I’m just throwing some tips here.

For the underlay, from the Spines and Leafs you can check that there are routes to all the VTEPs and that everything is propagated correctly in BGP. Also check that BFD and ECMP work properly.

  • show bfd neighbors
  • show bfd neighbors details
  • show ip route
  • show forwarding ipv4 route
  • show ip bgp summary
  • show ip bgp neighbors xxx.xxx.xxx.xxx routes
  • show ip bgp neighbors xxx.xxx.xxx.xxx advertised-routes

Verify that the EVPN routes are correctly propagated and installed. Using these commands you can also check the MAC – IP pairs and how they are advertised / received.

  • show bgp l2vpn evpn summary
  • show bgp l2vpn evpn neighbors xxx.xxx.xxx.xxx routes
  • show bgp l2vpn evpn neighbors xxx.xxx.xxx.xxx advertised-routes

From Leafs you can check the Layer 2, Layer 3 VRF and VXLAN specifics.

  • show mac address-table vlan 100       # Remote MAC addresses are shown behind the respective NVE neighbor

  • show l2route evpn mac-ip all       # This will display the MAC-IP pair table (only for the vn-segments you have ARP-suppression enabled)

  • show l2route evpn mac all       # This will display the MAC and next-hop (VTEP) table (also for pure L2 networks with no L3 configured)

  • show ip route vrf VRF-CUSTOMER-X       # See the routing table for the Customer VRF. Note that the hosts are in the table also (.10 via other VTEP and .100 local)

  • show nve peers      # Displays all the peer NVE-interfaces (VTEPs)

  • show l2route topology      # Displays all vn-segments configured on the switch

  • show l2route evpn imet all      # Displays vn-segments configured on peer VTEPs

  • show nve interface nve1 detail      # Check details from NVE (VTEP) interface

  • show nve vni     # Display all configured vn-segments (VNIs) and their type
  • show nve vni ingress-replication     # Display ingress replication peers
  • show ip arp suppression topo-info     # Display ARP-suppression status on different vn-segments
  • show nve vni xxxxx counters     # Display vn-segment specific counters

This is the end of Part 1. Now you should have a perfectly working VXLAN infrastructure with a BGP EVPN control plane. In the next post, I will go through how to implement a DCI using the same technologies.

12 thoughts to “Cisco Nexus 9300 – VXLAN with BGP EVPN Control Plane – Part 1”

  1. Great write-up for EVPN-based VXLAN, finally happy to see someone up here doing this. I noticed you chose eBGP instead of iBGP; any reason for your choice? I’ve noticed more users choosing iBGP over eBGP in “single customer” environments; however, I could see eBGP as a viable choice in some hosted environments too, but I just wanted to know what drove your decision?

    One thing I wanted to touch on briefly, was your choice in what to resize for TCAM entries. Taking QOS out of the TCAM, IMO, is not such a great idea and here is why:

    In a UCS environment where end users choose specific traffic to be “jumbo enabled” based on COS markings in the 802.1q headers, you’ll find some frustration with Inter-leaf “inter-domain” communication failures, despite having the vDS set the COS tags correctly and UCS QOS setup correct. The issue actually is based on the RFC for VXLAN which states you SHOULD NOT (verbatim caps) copy 802.1q information into the VXLAN header; thus, you lose the COS markings when the packet arrives at the VTEP and on the receiving side, there is no COS marking and now you have broken QOS and jumbo frames.

    The workaround for this is to have your vDS set both the COS and DSCP, so the end users will want to have a consistent end-to-end mapping scheme, and then create a QOS policy at the ToR leaf switches to match on DSCP and set the COS in the frame before sending the frame to the fabric interconnects on the receiving side.

    This will require e-qos and qos to be set; thus, what I do for TCAM carving is:
    region vacl 0
    region span 0
    region redirect 256
    region rp-qos 0
    region mac-qos 0
    (you can set ipv6 qos, but my environment runs IPv6 and IPv4 together)

    An example QOS configuration on the leaf switches would look like this
    class-map type qos match-any GOLD
    match cos 4
    match dscp 26
    class-map type qos match-any SILVER
    match dscp 16
    match cos 2
    class-map type qos match-any PLATINUM
    match cos 6
    match dscp 48

    policy-map type qos SET-COS-FROM-DSCP
    class SILVER
    set cos 2
    class GOLD
    set cos 4
    class PLATINUM
    set cos 6

    The class-maps are also used for the QoS system policy for traffic leaving the UCS and entering the network, hence the match cos statements in there as well.

    This motivates me to get my examples on my blog too, once I can find the time and energy to do it, but what you have here is a great example for the community, and showing a DCI example is more than anyone could ask.

    1. Thanks for the comment!

      Mainly the decision between iBGP and eBGP was due to the traffic engineering capabilities of eBGP. Since this is quite a large environment, those capabilities make maintenance, steering traffic flows, etc. quite a lot easier.

      And you have a very valid point on the QoS. Configuration example above was for an environment where QoS is not used nor needed. And in turn some of the other features where TCAM carving would have been possible were required. Wouldn’t it be really nice to have all the features and a larger TCAM? 🙂

      But yes, if you have the time and interest, please post your own findings on your blog. The content around VXLAN and BGP EVPN is so sparse that the more useful content there is, the better! I will post the DCI, as well as some other useful examples, in the near future, as soon as I have the time…

  2. Hi there. Great post. Are you doing Part 2 soon? I am just about to deploy this across 2 DCs.

    Thanks

    1. Great post indeed, and I too am looking at deploying this across 2 DCs. When are you likely to post about the DCI, or are there any posts you’d recommend?

      Thanks
      Steve

      1. Hi,

        Thanks for the comments! I’ve been really busy with work lately, so of course Parts 2 and 3 have been delayed.
        I will post Part 2 this coming weekend at the latest. I will cover the DCI using VXLAN there.

        Part 2 will also have some IPv6-related stuff. Part 3, which will be posted later on, will include border leaf, optimization etc.

        – Jesse

        1. Hi Jesse

          Thanks a lot for the post, it was very helpful in my deployment. When can we expect Part 2? Please let me know.

          Thanks
          Shankar

        2. Hi,
          Thank you for the article.
          Have you done the same configuration to address IPv6 services?
          Regards,
          John.

  3. Hi Jesse

    I have a question about evpn routes.

    If I can receive the EVPN BGP update messages, I want to create an IP-MAC address table like this:

    the table contents:

    ip, mac, l2vni, l3vni, vtep

    I know how I can get the IP, MAC and VTEP address (the NLRI next-hop address), but how can I get the L2 VNI and L3 VNI from the BGP update attributes? And which attribute should it be?

    thanks.

  4. Just a note: the comment about the requirement for “store and forward” does NOT apply to Nexus 9K platforms, but only to Nexus 5600 switches. I would remove it from the article to avoid confusion.

    Cheers,
    -Max
