
Juniper QFX, IP-Fabric and VXLAN – Part 1

See the second part here: Juniper QFX, IP-Fabric and VXLAN – Part 2

Recently I have been lab testing and evaluating some Juniper QFX switches and new DC LAN architectures. In this and upcoming posts I will show some configuration guides and hints regarding Juniper QFX (QFX5100-24Q and QFX5100-48S), IP-Fabric (a complete L3 eBGP fabric) and VXLAN configuration. The fabric could of course use iBGP, OSPF or IS-IS instead; I decided to go with eBGP because of its traffic engineering features. An L3 fabric raises some interesting questions and issues that we did not have to think about in the previous “old school” L2 networks:

  • Bare-metal server connectivity and L2 dual homing
  • Virtual-to-Virtual, Virtual-to-Physical, Physical-to-Physical
  • L2 overlay which is still needed (not only for vMotion)
  • Firewall, load balancer connectivity (talking about non-overlay, non-VXLAN, devices)
  • DCI

As you probably know, VXLAN is used as an overlay to extend L2 segments over a routed L3 network using MAC-in-UDP encapsulation. This can be used for applications that require L2 connectivity. I’m not going to deep dive into how VXLAN works, but rather post some configuration snippets and guidelines with sample topologies. If you need more detailed specifications regarding VXLAN, please check the VMware, Cisco, Cisco Live! and Juniper documentation; these are really good resources, and the Cisco Live! materials in particular are worth checking out.

The test IP-fabric design is based on a spine-leaf architecture with eBGP running in the core. There are two spine switches (QFX5100-24Q) and two leaf switches (QFX5100-48S). Every leaf switch is connected to every spine switch. The routing protocol is eBGP over point-to-point links, and each switch runs in its own AS. The L2 overlay is built with VXLAN. In this design I’m introducing directly connected servers / appliances to the VXLAN network. In Part 1 I will show the configuration of the IP fabric; we’ll dive into VXLAN in the next part. See the physical and logical topologies below.

Physical topology


 

Logical topology


In addition, VXLAN on the QFX (with no VXLAN control plane) requires multicast configuration. PIM sparse mode and Rendezvous Point (RP) configuration are the minimum for VXLAN to work. The spine switches are the perfect place for the RPs, as they are always one hop away from the leaf switches. For redundancy (and load balancing) it is important to configure multiple RPs, either with PIM Anycast-RP (shown in this example) or with MSDP (Multicast Source Discovery Protocol).

The configuration examples include only what is needed for the IP fabric, multicast and VXLAN (the latter in the next part of this post). Basic configuration of the switches is of course required as well. The software version used in this lab was 14.1X53-D15.

 

Configuration for the Spine-switches (only Spine 1 configuration shown)

See the interface configurations below. Note that I’m using maximum-sized jumbo frames at the L2 level (9216 bytes) and limiting IPv4 to 9100 bytes. The loopback interface is configured with a secondary IP address for the multicast PIM Anycast RP (please note that the secondary loopback IP address is the same on both spines!).
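Something along these lines, for example (the interface names and IP addresses here are placeholders for illustration; only the Anycast RP address 172.1.0.1 is the one referenced later in the multicast section):

set interfaces et-0/0/0 description "to Leaf 1"
set interfaces et-0/0/0 mtu 9216
set interfaces et-0/0/0 unit 0 family inet mtu 9100
set interfaces et-0/0/0 unit 0 family inet address 10.0.1.0/31
set interfaces et-0/0/1 description "to Leaf 2"
set interfaces et-0/0/1 mtu 9216
set interfaces et-0/0/1 unit 0 family inet mtu 9100
set interfaces et-0/0/1 unit 0 family inet address 10.0.2.0/31
# primary loopback address plus the shared Anycast RP address as a secondary
set interfaces lo0 unit 0 family inet address 172.1.1.1/32 primary
set interfaces lo0 unit 0 family inet address 172.1.0.1/32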

Routing-options includes the basics and also the forwarding-table export policy. This export policy is used to enable ECMP (Equal-Cost Multipath) load-balanced routing on the uplinks.
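For example (the policy name, router-id and AS number are placeholders):

set routing-options router-id 172.1.1.1
set routing-options autonomous-system 65001
set routing-options forwarding-table export ECMP-LB
# "per-packet" is the Junos knob name, the QFX still hashes per flow
set policy-options policy-statement ECMP-LB then load-balance per-packet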

The BGP configuration and the import and export policies can be seen below. bgp-fabric-in accepts advertisements of loopback addresses and of the networks received from the leaves (192.168.0.0/16 orlonger).
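A sketch of what the import policy can look like (the loopback range 172.1.0.0/16 is an assumption here, standing in for the fabric loopback addressing):

set policy-options policy-statement bgp-fabric-in term loopbacks from route-filter 172.1.0.0/16 orlonger
set policy-options policy-statement bgp-fabric-in term loopbacks then accept
set policy-options policy-statement bgp-fabric-in term fabric-networks from route-filter 192.168.0.0/16 orlonger
set policy-options policy-statement bgp-fabric-in term fabric-networks then accept
set policy-options policy-statement bgp-fabric-in term reject-rest then reject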

bgp-fabric-out allows advertisement of the loopback addresses originating from the switch itself (protocol direct) and re-advertises all other networks received via BGP (from the leaves). BGP next-hop-self is not needed here: when a route is advertised to an eBGP neighbor, the next hop is automatically rewritten to the advertising router unless the original next hop sits on a network shared with that neighbor, which is never the case on these point-to-point links.
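Again as a sketch (the prefix ranges are assumptions, the term names are arbitrary):

set policy-options policy-statement bgp-fabric-out term own-loopback from protocol direct
set policy-options policy-statement bgp-fabric-out term own-loopback from route-filter 172.1.0.0/16 orlonger
set policy-options policy-statement bgp-fabric-out term own-loopback then accept
set policy-options policy-statement bgp-fabric-out term bgp-routes from protocol bgp
set policy-options policy-statement bgp-fabric-out term bgp-routes then accept
set policy-options policy-statement bgp-fabric-out term reject-rest then reject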

Graceful-restart is enabled so that forwarding keeps working even while BGP is being reinitialized. BFD is enabled for faster convergence in case of failures. Multipath multiple-as is enabled because all the peers are in different ASes; it allows ECMP to work even though the AS numbers differ (multiple paths are installed in the forwarding table despite the differing AS numbers).
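The BGP group itself could look roughly like this (neighbor addresses, peer AS numbers, the group name and the BFD timers are placeholders):

set routing-options graceful-restart
set protocols bgp group IP-FABRIC type external
set protocols bgp group IP-FABRIC import bgp-fabric-in
set protocols bgp group IP-FABRIC export bgp-fabric-out
set protocols bgp group IP-FABRIC multipath multiple-as
set protocols bgp group IP-FABRIC bfd-liveness-detection minimum-interval 350
set protocols bgp group IP-FABRIC bfd-liveness-detection multiplier 3
set protocols bgp group IP-FABRIC neighbor 10.0.1.1 peer-as 65011
set protocols bgp group IP-FABRIC neighbor 10.0.2.1 peer-as 65012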

Below is the multicast configuration for the spines. Multicast is required for multicast-based VXLAN to work (this is how the “manual” [no controller or control plane] QFX VXLAN implementation works). I have also configured redundancy via PIM Anycast RP. Be sure to advertise the spine loopback addresses via eBGP so the spines can see each other’s loopbacks (Anycast RP depends on this). The configuration below puts all downlink interfaces (towards the leaf switches) and the loopback interface into PIM sparse mode. The RP configuration uses the RP address 172.1.0.1, which is the Anycast RP address and is the same on both spines! The Anycast PIM rp-set address is the neighboring spine’s “real” loopback address, and the local address is the “real” loopback address of the spine being configured.
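As a sketch for Spine 1 (172.1.0.1 is the Anycast RP address mentioned above; the “real” loopbacks 172.1.1.1 for Spine 1 and 172.1.1.2 for Spine 2, as well as the interface names, are placeholders):

set protocols pim rp local family inet address 172.1.0.1
set protocols pim rp local family inet anycast-pim local-address 172.1.1.1
set protocols pim rp local family inet anycast-pim rp-set address 172.1.1.2
set protocols pim interface et-0/0/0.0 mode sparse
set protocols pim interface et-0/0/1.0 mode sparse
set protocols pim interface lo0.0 mode sparse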

 

Configuration for the Leaf-switches (only Leaf 1 configuration shown)

See the interface configurations below. As on the spines, I’m using maximum-sized jumbo frames at the L2 level (9216 bytes) and limiting IPv4 to 9100 bytes. The loopback interface is also configured.
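For example (interface names and addresses are again placeholders; on the QFX5100-48S the 40G uplink ports start at et-0/0/48):

set interfaces et-0/0/48 description "to Spine 1"
set interfaces et-0/0/48 mtu 9216
set interfaces et-0/0/48 unit 0 family inet mtu 9100
set interfaces et-0/0/48 unit 0 family inet address 10.0.1.1/31
set interfaces et-0/0/49 description "to Spine 2"
set interfaces et-0/0/49 mtu 9216
set interfaces et-0/0/49 unit 0 family inet mtu 9100
set interfaces et-0/0/49 unit 0 family inet address 10.0.3.1/31
set interfaces lo0 unit 0 family inet address 172.1.1.11/32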

Routing-options again includes the basics and the forwarding-table export policy that enables ECMP (Equal-Cost Multipath) load-balanced routing on the switch uplinks.

The BGP configuration and the import and export policies can be seen below. bgp-fabric-in accepts advertisements of loopback addresses and of the networks received from the spines (192.168.0.0/16 orlonger).

bgp-fabric-out allows advertisement of the loopback addresses originating from the switch itself (protocol direct) and of its directly connected 192.168.0.0/16 orlonger networks (also protocol direct). Nothing else is advertised, as the last term is a reject.
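A sketch of the leaf export policy (the prefix ranges and term names are assumptions based on the description above):

set policy-options policy-statement bgp-fabric-out term own-loopback from protocol direct
set policy-options policy-statement bgp-fabric-out term own-loopback from route-filter 172.1.0.0/16 orlonger
set policy-options policy-statement bgp-fabric-out term own-loopback then accept
set policy-options policy-statement bgp-fabric-out term server-networks from protocol direct
set policy-options policy-statement bgp-fabric-out term server-networks from route-filter 192.168.0.0/16 orlonger
set policy-options policy-statement bgp-fabric-out term server-networks then accept
set policy-options policy-statement bgp-fabric-out term reject-rest then reject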

As on the spines, graceful-restart is enabled so that forwarding keeps working while BGP is being reinitialized, BFD is enabled for faster convergence, and multipath multiple-as is enabled so that ECMP works even though all the peers are in different ASes (multiple paths are installed in the forwarding table despite the differing AS numbers).

Multicast configuration for the leaves: all interfaces are put into PIM sparse mode (disabled on the management interface), and the RP configuration points to the remote RP address 172.1.0.1, which is the Anycast RP address.
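Roughly like this (em0 as the management interface is an assumption, adjust to your platform):

set protocols pim rp static address 172.1.0.1
set protocols pim interface all mode sparse
set protocols pim interface em0.0 disable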

 

Verifying that everything works

On the spine switches, verify that routing, the forwarding table (you can check that ECMP is working) and multicast all behave as expected.
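A few useful operational commands for this (the destination prefix is just an example):

show bgp summary
show bfd session
show route protocol bgp
# a working ECMP route shows multiple next hops in the forwarding table
show route forwarding-table destination 192.168.100.0/24
show pim neighbors
show pim rps extensive
show pim join extensive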

 

On the leaf switches, run the same checks: routing, the forwarding table (ECMP) and multicast. You can also configure some routed SVIs on the leaves to verify route propagation and test pinging across the fabric.
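As an example of such a test SVI on Leaf 1 (the VLAN ID, names and addresses are made up; on ELS-based Junos such as 14.1X53 the routed VLAN interface is irb):

set vlans TEST-V100 vlan-id 100
set vlans TEST-V100 l3-interface irb.100
set interfaces irb unit 100 family inet address 192.168.100.1/24

After committing a similar SVI on Leaf 2 (say 192.168.200.1/24), its route should show up on Leaf 1 via both spines, and a ping sourced from the local SVI should succeed:

show route 192.168.200.0/24
ping 192.168.200.1 source 192.168.100.1 count 5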

 

This is all that is needed to build a simple L3 fabric with eBGP. Please comment if you have any questions or remarks. To keep this post from growing to epic proportions, I will go through how to configure and set up VXLAN properly in Part 2.
