This application note presents a high-performance, high-availability implementation of Top of Rack (TOR) switch interconnect with multiple subtending compute servers. The switch used in this example is the EX4600 model from Juniper Networks. The servers are IBM’s Power System E980 model. The connectivity between these devices is provided by point-to-point 10Gbps Ethernet over OM3 multimode duplex fibers. Additional items required are FluxLight 10GBASE-SR optical transceivers and a 10Gbps Network Interface Card (NIC), also referred to as a Host Bus Adaptor (HBA). The sections below describe each of these components in more detail.
TOR Leaf Switch: Juniper Networks EX4600
Juniper’s EX4600 Ethernet Switch (shown below) is positioned as an enterprise campus distribution switching point and as a low-density data center leaf (aka, Top of Rack or TOR) switch. The EX4600 is a 1RU (1.72” height) chassis and has a total switching capacity of 1.44 Terabits/second (Tbps). The basic system includes 24 SFP+ 10GbE slots, which can also run at 1GbE. Two expansion slots may be equipped with either 8-port SFP+ 10GbE modules or 4-port QSFP+ 40Gbps modules. Each QSFP+ port may be configured as four 10GbE interfaces by means of a breakout cable.
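The port counts above can be tallied with a short sketch. It counts only the ports described in this note (24 fixed SFP+ slots plus two expansion slots), so treat the totals as illustrative rather than as a datasheet figure. The module labels and the function name are made up for this example.

```python
from typing import Optional, Sequence

FIXED_SFP_PLUS = 24      # fixed SFP+ 10GbE slots, per this note
EXPANSION_SLOTS = 2      # expansion slots, per this note

# 10GbE interfaces contributed by each expansion module type.
# The labels are hypothetical names used only in this sketch.
MODULE_10G_PORTS = {
    "8xSFP+": 8,         # 8-port SFP+ module
    "4xQSFP+": 4 * 4,    # 4 QSFP+ ports, each broken out to 4x10GbE
    None: 0,             # empty slot
}


def max_10gbe_ports(modules: Sequence[Optional[str]]) -> int:
    """Count 10GbE interfaces for a given choice of expansion modules."""
    if len(modules) > EXPANSION_SLOTS:
        raise ValueError("the EX4600 has only two expansion slots")
    for module in modules:
        if module not in MODULE_10G_PORTS:
            raise ValueError(f"unknown module type: {module!r}")
    return FIXED_SFP_PLUS + sum(MODULE_10G_PORTS[m] for m in modules)


if __name__ == "__main__":
    print(max_10gbe_ports(["8xSFP+", "8xSFP+"]))
    print(max_10gbe_ports(["4xQSFP+", "4xQSFP+"]))
```

Under these assumptions, the 16 server-facing links per switch used later in this note fit within the 24 fixed SFP+ slots without any expansion modules.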
The EX4600 implements Juniper Networks’ proprietary “Virtual Chassis” technology. This allows up to 10 interconnected EX4600 switches (and select other EX-family switches) to operate as a single logical switch managed via a single IP address. Juniper touts “tremendous configuration flexibility for 10GbE server connectivity by only requiring redundant links between Virtual Chassis groups rather than each physical switch to ensure high availability.”
Rack-mounted Server: IBM Power System E980
Each IBM Power System E980 is composed of one 2RU System Control Unit (SCU) and up to four 5RU System Nodes. Therefore, a fully built out Power System E980 is 22RU or about half of the vertical space in a standard equipment rack.
The SCU (shown below), powered through redundant connections from the system nodes, contains redundant “Master Service Processors.” It includes an operator panel with a small LCD display and a system USB port, and contains what IBM refers to as the system Vital Product Data (VPD). One SCU is required for each server.
The compute power of the E980 server is contained in the 5RU system nodes (shown below). While the vast majority of data center servers are based on Intel Xeon Gold or AMD EPYC processors, the E980 is based on IBM’s proprietary POWER9 processors. Each E980 system node contains four Single-Chip Modules (SCMs). These SCMs may be selected with 8, 10, 11 or 12 POWER9 cores, each running at up to 4 GHz with up to 8 threads per core. Each SCM has 128MB of off-chip eDRAM L4 cache accessed through integrated dual memory controllers with sustained memory bandwidth of 230GBps (920GBps per node). Additionally, up to 410GBps of peak bandwidth is supported between the L4 cache and the 32 DIMM slots, which support up to 16TB of DDR4 memory. Each system node contains eight Gen4 x16 low-profile PCIe slots. Therefore, a four-node server offers a total of 32 PCIe expansion slots.
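The sizing arithmetic in this section (rack height, core and thread counts, per-node memory bandwidth and PCIe slot totals) can be reproduced with a minimal sketch. The constants are the figures quoted in this note. The function and key names are illustrative only.

```python
# Figures quoted in this note for the IBM Power System E980.
SCU_HEIGHT_RU = 2                    # one System Control Unit per server
NODE_HEIGHT_RU = 5                   # each system node
MAX_NODES = 4
SCMS_PER_NODE = 4
VALID_CORES_PER_SCM = (8, 10, 11, 12)
THREADS_PER_CORE = 8
SCM_MEMORY_BW_GBPS = 230             # sustained GB/s per SCM
PCIE_SLOTS_PER_NODE = 8


def e980_totals(nodes: int, cores_per_scm: int) -> dict:
    """Derive server-level totals from the per-node figures."""
    if not 1 <= nodes <= MAX_NODES:
        raise ValueError("an E980 has between 1 and 4 system nodes")
    if cores_per_scm not in VALID_CORES_PER_SCM:
        raise ValueError("SCMs come with 8, 10, 11 or 12 cores")
    cores = nodes * SCMS_PER_NODE * cores_per_scm
    return {
        "rack_units": SCU_HEIGHT_RU + nodes * NODE_HEIGHT_RU,
        "cores": cores,
        "threads": cores * THREADS_PER_CORE,
        "memory_bw_gbps_per_node": SCMS_PER_NODE * SCM_MEMORY_BW_GBPS,
        "pcie_slots": nodes * PCIE_SLOTS_PER_NODE,
    }


if __name__ == "__main__":
    for name, value in e980_totals(nodes=4, cores_per_scm=12).items():
        print(f"{name}: {value}")
```

For a fully built-out server with 12-core SCMs this gives 22RU, 920GBps per node and 32 PCIe slots, matching the figures above, along with 192 cores and 1,536 hardware threads.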
The EN16 adaptor includes four 10Gbps small form-factor pluggable (SFP+) transceiver ports. All four ports support 10Gbps Ethernet Network Interface Controller (NIC) functionality, including IEEE 802.3ae, 802.1p VLAN priority tagging, 802.1Q VLAN tagging, 802.3x flow control, and 802.3ad link aggregation with load balancing and failover. Also included are jumbo frame support up to 9.6KB, TCP, UDP, and TCP segmentation offload (TSO, also known as large send offload) for IPv4 and IPv6.
FluxLight’s EX-SFP-10GE-SR is a Juniper Networks® compatible 10GBASE-SR SFP+ optical transceiver that is factory pre-programmed with all the necessary configuration data for seamless network integration. These transceivers perform identically to Juniper® original transceivers, are 100% compatible and support full hot-swap operation. The EX-SFP-10GE-SR is 100% MSA (Multi-Source Agreement) compliant.
The 46C3447 SFP+ optical transceiver from FluxLight is 100% equivalent to the IBM transceiver of the same part number. It is fully compatible with the IBM EN16 HBA described in this note. Like all FluxLight optical transceivers, both the 46C3447 and the EX-SFP-10GE-SR are backed by FluxLight’s limited lifetime warranty.
Fiber Jumper Cables: FluxLight FL-LCLC-OM3-2M
FluxLight carries a wide range of fiber jumper cables, with various terminations, types of fiber and lengths. Particularly well suited to the short-range 10Gbps Ethernet links used in this example is FluxLight part number FL-LCLC-OM3-2M. This is a duplex (2-fiber) cable with dual LC-type connectors on each end. The dual LC connector mates with the dual LC receptacles on the SFP+ optical transceivers we are using in this example. The OM3 multimode fiber has sufficient bandwidth to allow 10Gbps links up to 300 meters.
Putting it all together
In this example, for enhanced availability, the IBM Power System E980 servers will be dual-homed to the Juniper EX4600 leaf switches. The block diagram below shows two 4-node IBM Power System E980s, each node of which includes the IBM EN16 4-port HBA (NIC), dual-homing to two EX4600 TOR switches. That is a total of 16 10Gbps multimode fiber (MMF) links from each 4-node E980. These 16 links are split, 8 to each EX4600.
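The dual-homing layout can be sketched as a generated cabling plan. This note only states that each server's 16 links are split 8 and 8 between the two switches. The per-node split (two EN16 ports to each switch), the switch names TOR-A and TOR-B, and the sequential switch port numbering are assumptions made for this illustration.

```python
from collections import Counter
from typing import List, NamedTuple, Sequence


class Link(NamedTuple):
    server: int       # 0-based server index
    node: int         # 0-based system node index within the server
    nic_port: int     # 0-based port on that node's EN16 NIC
    switch: str       # hypothetical switch name
    switch_port: int  # hypothetical 0-based port index on the switch


def cabling_plan(
    servers: int = 2,
    nodes_per_server: int = 4,
    ports_per_nic: int = 4,
    switches: Sequence[str] = ("TOR-A", "TOR-B"),
) -> List[Link]:
    """Split every NIC's ports evenly across the switches (assumed layout)."""
    if ports_per_nic % len(switches) != 0:
        raise ValueError("NIC ports must divide evenly across the switches")
    ports_per_switch = ports_per_nic // len(switches)
    next_free = {name: 0 for name in switches}
    links = []
    for server in range(servers):
        for node in range(nodes_per_server):
            for nic_port in range(ports_per_nic):
                name = switches[nic_port // ports_per_switch]
                links.append(Link(server, node, nic_port, name, next_free[name]))
                next_free[name] += 1
    return links


if __name__ == "__main__":
    plan = cabling_plan()
    print(len(plan), "links")
    print(Counter(link.switch for link in plan))
```

Splitting at the NIC level means every system node keeps a path to both switches, so the loss of one EX4600 leaves each node with half of its links rather than isolating any node.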
The individual pieces described in this note are installed as follows. 16 FluxLight EX-SFP-10GE-SR optical transceiver modules are installed in each of the two Juniper EX4600 switches (total of 32 EX-SFP-10GE-SR optical modules). One IBM EN16 NIC is installed in each IBM Power E980 node. There are 8 nodes, so 8 EN16 NICs are required. Four FluxLight 46C3447 IBM-compatible SFP+ 10GbE optical transceivers are installed in each EN16 NIC, for a total of 32 46C3447 optical transceivers. Since all of the 8x 10GbE fiber link groups are identical, a single connection group is shown below indicating the devices required for each group.
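The parts count for this design follows directly from the server, node and port counts, and a small sketch makes it easy to recompute for a different build. The jumper count is an inference (one duplex FL-LCLC-OM3-2M per link) rather than a figure stated in this note, and the function and key names are illustrative.

```python
def bill_of_materials(
    servers: int = 2,
    nodes_per_server: int = 4,
    ports_per_nic: int = 4,
    switches: int = 2,
) -> dict:
    """Derive part quantities, assuming one EN16 NIC per system node."""
    nics = servers * nodes_per_server
    links = nics * ports_per_nic
    if links % switches != 0:
        raise ValueError("links must divide evenly across the switches")
    return {
        "IBM EN16 NIC": nics,
        "FluxLight 46C3447 (server side)": links,
        "FluxLight EX-SFP-10GE-SR (switch side)": links,
        "EX-SFP-10GE-SR per EX4600": links // switches,
        # Inferred: each link needs one duplex LC-LC jumper.
        "FluxLight FL-LCLC-OM3-2M jumper": links,
    }


if __name__ == "__main__":
    for part, quantity in bill_of_materials().items():
        print(f"{quantity:3d} x {part}")
```

With the defaults, which mirror the two 4-node servers and two switches in this example, the result is 8 NICs, 32 transceivers of each type and 16 switch-side transceivers per EX4600.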
Since its inception in 2004, FluxLight has continually refined its processes to assure the best possible quality for our customers. The foundational elements of these processes are Record-Keeping and Quality Assurance Testing (QAT). To maintain a robust data archive, each part is individually serialized both in its internal EEPROM memory and on its external label. All QAT results and other important information (e.g., manufacture date, compatibility programming) are associated with each Serial Number stored in our cloud-based secure database. QAT consists of visual part inspection as well as verification of all critical parameters and performance of each part. For the 46C3447, optical output power and wavelength (850nm) are verified on optical spectrum analyzers. Receiver sensitivity is verified at both minimum distance and the 300m maximum range over actual (not simulated) multimode fiber (MMF) cable. Digital Diagnostics Monitoring (DDM) parameters, including optical output power, receive optical power, internal module temperature, transmit bias current and supply voltage, are tested for compliance with specifications. In addition, the part is verified for proper operation across its full operating temperature range, 0°C to 50°C.
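As an illustration of the kind of range check applied to DDM readings, the sketch below flags any parameter that is missing or outside its limits. It is not FluxLight's test procedure. Only the 0°C to 50°C temperature range comes from this note. The supply voltage limits, the parameter names and the sample readings are hypothetical placeholders.

```python
from typing import Dict, List, Tuple

Limits = Dict[str, Tuple[float, float]]

# Only the temperature range is taken from this note.
# The voltage limits are placeholders for illustration.
EXAMPLE_LIMITS: Limits = {
    "temperature_c": (0.0, 50.0),
    "supply_voltage_v": (3.13, 3.47),  # hypothetical limits
}


def out_of_range(reading: Dict[str, float], limits: Limits) -> List[str]:
    """Return the names of parameters that are missing or out of limits."""
    failures = []
    for name, (low, high) in limits.items():
        value = reading.get(name)
        if value is None or not (low <= value <= high):
            failures.append(name)
    return failures


if __name__ == "__main__":
    sample = {"temperature_c": 41.5, "supply_voltage_v": 3.30}  # made-up data
    problems = out_of_range(sample, EXAMPLE_LIMITS)
    print("PASS" if not problems else f"FAIL: {problems}")
```

Passing the limits in as data keeps the check independent of any one part number, so the same function could be reused with limits taken from a given transceiver's specification sheet.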