Networking with z/OS and Cisco Routers : An Interoperability Guide 9780738423432


English Pages 376 Year 2002


Author / Uploaded
IBM Redbooks


Front cover

Networking with z/OS and Cisco Routers: An Interoperability Guide

Implement advanced z/OS and Cisco functionality in your network
Details OSPF, EIGRP, MNLB and Sysplex Distributor
Includes useful samples and scenarios

Adolfo Rodriguez, Edward Mazurek, Roland Peschke, Rick Williams

ibm.com/redbooks

International Technical Support Organization

Networking with z/OS and Cisco Routers: An Interoperability Guide

May 2002

SG24-6297-00

Take Note! Before using this information and the product it supports, be sure to read the general information in “Notices” on page ix.

First Edition (May 2002)

This edition applies to Version 1, Release 2 of Communications Server for z/OS.

Comments may be addressed to:
IBM Corporation, International Technical Support Organization
Dept. HQ7B Building 662
P.O. Box 12195
Research Triangle Park, NC 27709-2195

When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.

© Copyright International Business Machines Corporation 2002. All rights reserved.

Note to U.S Government Users – Documentation related to restricted rights – Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.

Contents

Notices
  Trademarks
Preface
  The team that wrote this redbook
  Special notice
  Comments welcome

Part 1. z/OS and Cisco interoperation

Chapter 1. Introduction
  1.1 Physical connectivity
    1.1.1 Channel attachment
    1.1.2 Shared networks
  1.2 Routing in your network
    1.2.1 Static routing
    1.2.2 Dynamic routing
  1.3 Quality of Service
    1.3.1 Integrated Services
    1.3.2 Differentiated Services
  1.4 High availability and load distribution
    1.4.1 DNS mapping
    1.4.2 Connection dispatching
    1.4.3 Virtual IP Addressing (VIPA)
    1.4.4 Round-robin DNS
    1.4.5 Connection Optimization (DNS/WLM)
    1.4.6 Network Dispatcher
    1.4.7 Sysplex Distributor
    1.4.8 MultiNode Load Balancing (MNLB)
    1.4.9 Sysplex Distributor/MNLB joint solution

Chapter 2. Connecting z/OS systems and Cisco routers
  2.1 Channel-attached Cisco routers
    2.1.1 CIP and xCPA
    2.1.2 Channel protocols
  2.2 LAN-attached Cisco routers
    2.2.1 OSA-2 adapters
    2.2.2 OSA-Express adapters
  2.3 Summary


Chapter 3. Routing overview
  3.1 Static routing
  3.2 Dynamic routing
  3.3 RIP
    3.3.1 RIP Version 1
    3.3.2 RIP Version 2
  3.4 OSPF
    3.4.1 OSPF terminology
    3.4.2 Neighbor communication
    3.4.3 OSPF virtual links and transit areas
    3.4.4 OSPF route redistribution
    3.4.5 OSPF stub areas
    3.4.6 OSPF route summarization
  3.5 EIGRP
    3.5.1 Features of EIGRP
    3.5.2 Terminology
    3.5.3 Neighbor discovery and recovery
    3.5.4 The DUAL algorithm

Chapter 4. Quality of Service
  4.1 Overview of QoS protocols
    4.1.1 Service models
  4.2 Steps in QoS deployment
    4.2.1 Traffic audit
    4.2.2 Traffic classification
    4.2.3 Defining policies for the classes
    4.2.4 Planning for RSVP configuration
  4.3 QoS on the z/OS Communications Server
    4.3.1 PAGENT policies
    4.3.2 Configuring QoS in z/OS Communication Server
  4.4 Ensuring QoS across the Cisco network
    4.4.1 Cisco IOS QoS support features
    4.4.2 Configuring QoS in the network
    4.4.3 SNA QoS
  4.5 Managing Quality of Service
    4.5.1 What management tools are available?
  4.6 QoS summary
    4.6.1 QoS reduces costs

Chapter 5. Load distribution solutions
  5.1 Connection dispatching
    5.1.1 What this chapter includes
    5.1.2 Distribution manager/forwarding agent in the sysplex


    5.1.3 Distribution manager/forwarding agent outside the sysplex
    5.1.4 Distribution within sysplex, forwarding outside the sysplex
  5.2 IBM Sysplex Distributor
    5.2.1 Sysplex Distributor elements
    5.2.2 Sysplex Distributor start tasks and takeover/takeback
    5.2.3 Sysplex Distributor load-balancing rules
    5.2.4 Handling connection requests
    5.2.5 Data path after connection establishment
    5.2.6 Takeover/takeback
    5.2.7 Reaching the goals of availability and load balancing
  5.3 Cisco LocalDirector
    5.3.1 Overview
    5.3.2 Connection and datagram flow
  5.4 Cisco MultiNode Load Balancing (MNLB)
    5.4.1 Overview of the MultiNode Load Balancing (MNLB) functions
    5.4.2 Connection establishment and subsequent data flow
    5.4.3 Client/server connection restart
    5.4.4 Reaching the goals of availability and load balancing
  5.5 IBM Sysplex Distributor and Cisco MNLB
    5.5.1 What does this mean?
    5.5.2 Overview of IBM Sysplex Distributor with Service Manager
    5.5.3 Cisco Forwarding Agent, overview and functions
    5.5.4 Cisco Workload Agent
    5.5.5 Connection establishment process
    5.5.6 Failure of application server, TCP/IP stack, system/LPAR
    5.5.7 Failure of the Sysplex Distributor
    5.5.8 Routing packets
    5.5.9 Additional tasks of the MNLB components

Part 2. Implementation examples

Chapter 6. Configuring CLAW, MPC+ and OSA-Express
  6.1 Cisco CLAW support
    6.1.1 IOCP definitions for CLAW devices
    6.1.2 Router definitions
    6.1.3 Host TCP/IP profile statements
    6.1.4 Router show commands
    6.1.5 z/OS CLAW commands
  6.2 Cisco CMPC+ support
    6.2.1 IOCP definitions for CMPC+ devices
    6.2.2 Cisco MPC+ router definitions
    6.2.3 MPC+ host definitions
    6.2.4 z/OS MPC+ commands


    6.2.5 Router show commands
  6.3 Configuring for the OSA-Express adapter
    6.3.1 IOCP for OSA-Express devices
    6.3.2 Catalyst 6500 configuration
    6.3.3 7507 configuration
    6.3.4 7206 configuration
    6.3.5 VTAM and TCP/IP definition
    6.3.6 z/OS OSA-Express commands

Chapter 7. Routing with OSPF and EIGRP
  7.1 Topology overview
    7.1.1 Routing topology
  7.2 OSPF configuration in the sysplex
    7.2.1 OMPROUTE configuration
    7.2.2 Verify routing from the host
  7.3 ASBR configuration and redistribution
    7.3.1 Redistribution
    7.3.2 Verify routing from the router
  7.4 Summary

Chapter 8. Implementing QoS in a z/OS and Cisco environment
  8.1 Implementation steps
    8.1.1 Perform traffic audit
    8.1.2 Traffic classification
    8.1.3 QoS policy definition
  8.2 Configuration examples
    8.2.1 z/OS configuration
    8.2.2 Cisco network configuration
  8.3 QoS test results
    8.3.1 Summary

Chapter 9. Load distribution with MNLB and Sysplex Distributor
  9.1 Connection distribution for a sysplex
  9.2 Advantages of the solution
  9.3 IP addresses used during our tests
  9.4 Data flow: Service Manager and Forwarding Agent
    9.4.1 Wildcard affinity and processing
    9.4.2 Service Manager processes TCP connection request
    9.4.3 Continuation of the TCP connection establishment process
    9.4.4 Fixed affinity processing
    9.4.5 Prerequisites for the CASA protocol exchange
    9.4.6 Message flow of wildcard and fixed affinities, SYN, ACK, data
    9.4.7 Message flow for connection data with no fixed affinity
    9.4.8 Message flow for closing a TCP connection


  9.5 Service Manager implementation
    9.5.1 Service Manager new TCPIP.PROFILE definitions
  9.6 TCP/IP stack of the target systems
    9.6.1 TCPIP.PROFILE definitions
    9.6.2 Basic TCPIP.PROFILE definitions
  9.7 Forwarding Agent definitions
    9.7.1 CASA definitions for Cisco 7507
    9.7.2 CASA definitions for Cisco router 7206VXR
  9.8 Operations: control and displays
    9.8.1 CASA information in the Sysplex Distributor
    9.8.2 CASA information in the Forwarding Agent
    9.8.3 Integrated CASA information
  9.9 Sysplex Distributor backup
    9.9.1 TCPIP.PROFILE definitions
    9.9.2 Sysplex Distributor backup procedures
  9.10 Generic Routing Encapsulation (GRE) protocol
    9.10.1 The need for GRE
    9.10.2 Search for a shared OSA-Express solution
    9.10.3 Generic Routing Encapsulation (GRE) overview
    9.10.4 Definitions in the Cisco routers 7507 and 7206

Related publications
  IBM Redbooks
    Other resources
  Referenced Web sites
  How to get IBM Redbooks
    IBM Redbooks collections

Index


Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.


Trademarks

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:

Redbooks(logo)™
Advanced Peer-to-Peer Networking®
AIX®
APPN®
CICS®
CUA®
DB2®
DFS™
ESCON®
IBM®
IBM server®
MVS™
OpenEdition®
OS/390®
RACF®
Redbooks™
SAA®
S/390®
S/390 Parallel Enterprise Server™
SP™
VM/ESA®
VTAM®
WebSphere®
z/OS™
z/VM™
zSeries™

The following terms are trademarks of other companies:

ActionMedia, LANDesk, MMX, Pentium and ProShare are trademarks of Intel Corporation in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

C-bus is a trademark of Corollary, Inc. in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

SET, SET Secure Electronic Transaction, and the SET Logo are trademarks owned by SET Secure Electronic Transaction LLC.

Other company, product, and service names may be trademarks or service marks of others.


Preface

The increased popularity of Cisco routers has led to their ubiquitous presence within the network infrastructure of many enterprises. In such large corporations, it is also common for many applications to execute on the z/OS (formerly OS/390) platform. As a result, the interoperation of z/OS-based systems and Cisco network infrastructures is a crucial aspect of many enterprise internetworks.

This IBM Redbook provides a survey of the components necessary to achieve full interoperation between your z/OS-based servers and your Cisco IP routing environment. It may be used as a network design guide for understanding the considerations of the many aspects of interoperation. We divide this discussion into four major components:

• The options and configuration of channel-attached Cisco routers
• The design considerations for combining OSPF-based z/OS systems with Cisco-based EIGRP networks
• A methodology for deploying Quality of Service policies throughout the network
• The implementation of load balancing and high availability using Sysplex Distributor and MNLB (including new z/OS V1R2 support)

We highlight our discussion with a realistic implementation scenario and real configurations that will aid you in the deployment of these solutions. In addition, we provide in-depth discussions, traces, and traffic visualizations to show the technology at work.

The team that wrote this redbook

This redbook was produced by a team of specialists from around the world working at the International Technical Support Organization, Raleigh Center.

Adolfo Rodriguez is an Advisory I/T Specialist at the International Technical Support Organization, Raleigh Center. He writes extensively and teaches IBM classes worldwide on all areas of TCP/IP. Before joining the ITSO, Adolfo worked in the design and development of Communications Server for OS/390, in RTP, NC. He holds a B.A. degree in Mathematics and B.S. and M.S. degrees in Computer Science, from Duke University. He is currently pursuing the Ph.D. degree in Computer Science at Duke University, with a concentration on Networking Systems.

Edward Mazurek is a Customer Support Engineer in the Technical Assistance Center with Cisco Systems, Inc. in RTP, NC. He is CCIE certified (#6448). He has 18 years of experience in the networking field and holds a degree in Mathematics/Computer Science from Binghamton University. His areas of expertise include Cisco routers, System/390 SNA and TCP/IP, and VM/ESA.

Roland Peschke is a networking consultant working for IBM customers requesting consulting and education services for the OS/390 TCP/IP and SNA environment. His comprehensive experience in these areas is based on more than three decades of work at IBM Germany and the ITSO Raleigh. He worked intensively on several SNA and TCP/IP redbooks.

Rick Williams is a Technical Marketing Engineer with Cisco Systems, Inc. in RTP, NC. He has 18 years of experience in the networking field and specializes in IP/SNA integration. He holds a degree in Computer Science from Southeast Missouri State University.

Thanks to the following people for their contributions to this project:

Gail Christensen, Margaret Ticknor, Jeanne Tucker, Juan Rodriguez, Linda Robinson
International Technical Support Organization, Raleigh Center

Mike Law
IBM NIVT (Network Integration Verification Test), RTP, NC

Jeff Haggar, Bebe Isrel, Van Zimmerman, Tom Moore, Mac Devine
IBM Communication Server for z/OS Development

Indu Mahadevan, Mark Albert, Dan McCullough
Cisco Systems, Inc.

Special notice

This publication is intended to help users of Communications Server for z/OS to understand interoperability issues with Cisco routers. The information in this publication is not intended as the specification of any programming interfaces that are provided by Communications Server for z/OS. See the Publications section of the IBM Announcement for Communications Server for z/OS for more information about what publications are considered to be product documentation.


Comments welcome

Your comments are important to us! We want our IBM Redbooks to be as helpful as possible. Send us your comments about this or other Redbooks in one of the following ways:

• Use the online Contact us review redbook form found at: ibm.com/redbooks
• Send your comments in an Internet note to: [email protected]
• Mail your comments to the address on page ii.


Part 1. z/OS and Cisco interoperation

© Copyright IBM Corp. 2002


Chapter 1. Introduction

The interoperation of z/OS and OS/390-based systems and Cisco routers involves not only the necessary physical and internet (IP) connection of these, but goes beyond basic infrastructure support. z/OS systems provide many advanced IP functions, including Quality of Service (QoS) and load distribution. When used in conjunction with Cisco routers, z/OS systems can globally implement ubiquitous policy-based QoS and collaborative load distribution (sometimes referred to as load balancing).

This chapter gives an introduction to the issues associated with z/OS and Cisco interoperation and includes a survey of alternate self-contained methodologies. As you will see, such interoperation enables you to harness the power of z/OS and Cisco routers, leveraging cooperative functionality in each to produce technologically superior solutions within your network.


1.1 Physical connectivity

The first issue to consider when discussing z/OS and Cisco router interoperability is that of physical connectivity between the two machines. That is, data must physically flow between the z/OS system and the Cisco router to enable their interoperation. Although some interoperation can take place between z/OS and Cisco routers without a shared physical connection, the most interesting scenarios arise when such a connection exists. Additionally, most deployed z/OS IP systems are directly connected to some subset of the Cisco routers that comprise the network fabric.

There are a number of options available to physically connect a z/OS system to Cisco routers. In general, these can be divided into two broad categories:

• Channel attachment
• Shared networks

1.1.1 Channel attachment

Cisco routers have two ways of channel attaching to a z/OS system. The first is with a Channel Interface Processor (CIP) installed in a Cisco 7000/7500 router. The CIP can contain one or two IBM channel connections. These channel connections can be any combination of Enterprise Systems Connection (ESCON) or bus and tag (parallel) channels. The 7000/7500 series routers can also run a wide variety of other interface processors for both LAN and WAN connections (for example, Fast Ethernet, serial, ATM).

The second type of channel connection is with a Channel Port Adapter (xCPA) installed in a Cisco 7200 series router. The xCPA can either be an ESCON CPA (ECPA) or a parallel CPA (PCPA). Together they are referred to as xCPAs or simply CPAs. The Cisco 7200 series routers also support a wide variety of interfaces running in port adapter (PA) cards.

Together the CIP and CPA are referred to as Cisco Mainframe Channel Connection Adapters (CMCCs). 7000/7500 series routers can contain multiple CIPs and 7200 series routers can contain multiple CPAs as long as the router’s total capacity has not been exceeded.

Both the CIP and ECPA support the same IP channel protocols when connected to a z/OS system. These two channel protocols are Common Link Access to Workstation (CLAW) and the newer Multipath Channel (MPC or MPC+).

CLAW

Common Link Access to Workstation (CLAW) is a channel protocol designed to minimize interruptions to the host while a steady stream of data is being read or written. It utilizes one read subchannel and one write subchannel. The read and write subchannels share the same channel interface in the router. This effectively reduces the channel bandwidth in a single direction by the bandwidth being used in the other direction.

CLAW utilizes S/390 channel programs that seldom end (especially when the z/OS system is receiving data). Also, the channel program adapts to the current state of the z/OS device driver, thereby minimizing the number of interrupts generated to the z/OS system. There is no overhead added (headers, trailers, and so on) to the actual IP payload itself. When both the CLAW device (for example, the CIP or CPA) and the z/OS system are moving a consistently large amount of data, the z/OS system operates very efficiently with respect to the z/OS system itself. However, this efficiency comes at the price of large channel program overhead.

MPC+

The IBM MPC+ protocol has been implemented in Cisco routers as Cisco CMPC+ support with Cisco IOS Release 12.0(3) and later router software. Multipath Channel (MPC+) is a channel protocol designed to move large amounts of data between a z/OS system and either another z/OS system or a channel-attached router or interface adapter (such as an OSA). It utilizes one or more separate subchannels for reading and writing data. In the Cisco implementation, only one read and one write subchannel are allowed but, unlike CLAW, the read and write subchannels can be split across the two real channel interfaces in a CIP (this does not apply to a CPA because it has only one channel interface). This allows the full channel bandwidth to be utilized by data being sent to the z/OS system and data being sent from the z/OS system.

MPC+ utilizes seldom-ending channel programs to increase performance on both the read and write subchannels. These are channel programs that seldom, if ever, actually terminate but are always active (either actually running or suspended). This reduces the interrupt load on the z/OS system much like the CLAW channel programs. Also, the channel program itself is fairly simple, with most CCWs actually reading or writing data in large chunks (up to 64 KB per CCW). In the Cisco implementation, MPC+ transports only IP data. SNA is not supported unless it is encapsulated in IP (with, for example, Enterprise Extender).

Figure 1-1 shows the connection of the Cisco router to the IBM zSeries.


Figure 1-1 Cisco MPC+ channel connection to IBM z/OS (diagram labels: host TCP/IP stack, VTAM TRLE, ESCON/bus-and-tag channel with write and read subchannels, router running IOS 12.0(3) or later, Cisco channel with CMPC+, LAN/WAN interfaces, IP network)

1.1.2 Shared networks

An alternative way of connecting z/OS systems and Cisco routers is via some shared network. In this case, both the z/OS system and the Cisco router independently connect to some network, such as a LAN or WAN, and use this network for direct communication. On the Cisco router side, a channel processor is not necessary; instead, an interface adapter connecting the router to the shared network is required. On the z/OS side, a channel attachment may be used to provide connectivity to the interface adapter or hardware, such as the Open Systems Adapter (OSA) using the MPC or LAN Channel Station (LCS) protocols. Alternatively, the next generation in OSA adapters, the OSA-Express, can supply the required connectivity via the Queued Direct I/O (QDIO) interface. QDIO enables connectivity to high-speed, high-bandwidth network attachments such as Gigabit Ethernet by employing Direct Memory Access (DMA), which allows the sharing of memory directly between the z/OS system and the OSA adapter.

The most common shared network environments between z/OS and Cisco routers include:

• Ethernet (Gigabit, Fast Ethernet, or 10 Mbps)
• Token-ring
• ATM (Classic or LAN Emulation)

1.2 Routing in your network

This book also covers another fundamental issue when interconnecting z/OS systems and Cisco routers: routing. One of the major functions of a network protocol such as IP is to connect together a number of disparate networks efficiently. IP routing is the intelligence at the boundaries of all of these networks, which can look at the packets flowing and make rational decisions as to where and how they should be forwarded. Routing allows you to create networks that can be managed separately but which are still linked and can communicate with one another.

In order to route packets, each network interface on a machine (such as a z/OS system or Cisco router) on the network has a unique IP address. Whenever a packet is sent, the destination and source addresses are included in the packet’s header information. Routers examine the destination address to see if there is a matching address in their routing tables. These tables are either created by the system administrator, or built dynamically using information received from other routers, or (often) a combination of both.

Along with the IP address of each interface, a subnet mask is also defined, which indicates to the network the distinction between the part of the address that represents the subnetwork (a LAN or point-to-point connection, for example) and the part that represents the host (the network interface on a device). The use of a subnet mask allows flexibility in network design (networks may be small or large), but the proper use of routing tables requires that the subnetwork address and the host portion of an IP address be discernible.

When sending data to a remote destination, a host passes datagrams to a local router. The router forwards the datagrams towards the final destination. They travel from one router to another until they reach a router connected to the destination’s LAN segment. Each router along the end-to-end path selects the next hop device used to reach the destination. The next hop represents the next device along the path to reach the destination. It is located on a physical network connected to this intermediate system. Since this physical network differs from the one on which the system originally received the datagram, the intermediate host has forwarded (that is, routed) the IP datagram from one physical network to another.

Figure 1-2 shows an environment where Host C is positioned to forward packets between Network X and Network Y.

Figure 1-2 IP routing operations (diagram labels: Host A and Host B, each with an application, TCP, and IP; Host C acting as router with IP routing; Interface X on Network X; Interface Y on Network Y)
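The forwarding decision described above can be illustrated with a minimal Python sketch. It is not taken from any z/OS or Cisco implementation: the routing table contents, the addresses, and the function names are hypothetical, and the sketch assumes that when several table entries match, the most specific one (longest prefix) is used.

```python
import ipaddress

# Hypothetical routing table for Host C: (destination network, next hop, interface).
# A next hop of None means the network is directly attached.
ROUTES = [
    (ipaddress.ip_network("10.1.0.0/16"), None, "X"),
    (ipaddress.ip_network("10.2.0.0/16"), None, "Y"),
    (ipaddress.ip_network("10.2.5.0/24"), "10.2.0.254", "Y"),
    (ipaddress.ip_network("0.0.0.0/0"), "10.1.0.1", "X"),
]


def lookup(destination):
    """Return (next_hop, interface) for the most specific matching route."""
    addr = ipaddress.ip_address(destination)
    matches = [route for route in ROUTES if addr in route[0]]
    if not matches:
        raise LookupError(f"no route to {destination}")
    _, next_hop, interface = max(matches, key=lambda route: route[0].prefixlen)
    return next_hop, interface


def split_address(interface_cidr):
    """Use the subnet mask to separate the subnetwork and host parts."""
    iface = ipaddress.ip_interface(interface_cidr)
    host_part = int(iface.ip) & int(iface.network.hostmask)
    return str(iface.network.network_address), host_part
```

For example, `lookup("10.2.5.7")` selects the /24 entry rather than the broader /16 entry, and `split_address("10.2.5.7/24")` separates the subnetwork `10.2.5.0` from host number 7.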

A robust routing protocol provides the ability to dynamically build and manage the information in the IP routing table. As network topology changes occur, the routing tables are updated with minimal or no manual intervention. This chapter details several IP routing protocols and how each protocol manages this information. Every IP host can route IP datagrams. It maintains an IP routing table that indicates the IP address of the next hop in order to route an IP datagram. Each host or router along the way needs to know only the next hop IP address in the path. Routing tables, of course, need to be maintained in both directions. There are two types of route entries in TCP/IP: static and dynamic. The Internet is logically divided into autonomous systems, which are essentially individual customers' networks. There are two classes of dynamic routing protocols used in IP: exterior gateway protocols (EGPs) are used between autonomous systems and interior gateway protocols (IGPs) are used within the autonomous systems.


1.2.1 Static routing

With static routing, the paths to reach networks and hosts are hard-coded in a routing file, accessible to a TCP/IP host. Each host has its own set of definitions. If something changes (a route for a host or network is added or deleted), then the static routes of some or all hosts in a network may need to be updated. Static routing is suitable for small, stable networks, but quite inadequate for large or often changing scenarios. With static routing, however, one has better administrative control over address allocation and resource access.

In CS for z/OS IP, static routes are configured in the TCPIP profile by coding either a BEGINROUTES/ENDROUTES block statement or by making use of the GATEWAY statement. The BEGINROUTES block was created in CS for OS/390 V2R10 IP to overcome inconsistencies and the often awkward syntax of the GATEWAY statement. It defines static IP routing table entries in standard BSD format. For more information, please consult z/OS V1R2.0 CS: IP Configuration Reference, SC31-8776.

1.2.2 Dynamic routing

Dynamic routing removes the need for coding and manually maintaining static routing tables. All router addressing and path information is built dynamically. These tables are automatically exchanged between routers in a network. This information sharing enables routers to calculate the best path through the network to any given destination.

When taking advantage of dynamic routing, an IP host employs a routing daemon. The routing daemon adds, deletes, or changes route entries within the host’s routing table. Additionally, a routing daemon executing on a particular host communicates with routing daemons on neighbor hosts to exchange topological routing information. Ultimately, this information exchange leads to the update of routing table entries. The most common dynamic routing protocols include Routing Information Protocol (RIP), Open Shortest Path First (OSPF), and Enhanced Interior Gateway Routing Protocol (EIGRP).

z/OS CS IP implements both RIP and OSPF dynamic routing protocols. Before CS for OS/390 V2R6, only RIP was available and was implemented by the ORouteD server (in V2R5) and by RouteD (in previous releases). In V2R6 a new strategic server called OMPROUTE was introduced, which runs under UNIX System Services and provides both RIP and OSPF functionality. ORouteD may be withdrawn from z/OS CS IP in due course, leaving OMPROUTE as the only routing daemon. Both ORouteD and OMPROUTE are UNIX System Services applications and require the Hierarchical File System (HFS).


Often, you will come across the term RouteD meaning routing daemon. This is a common term for a RIP server. ORouteD and its predecessor RouteD are so named because they are indeed RIP servers. The expression GateD is also used, usually for a router with more function, such as OSPF or EGP capability.

RIP

RIP is an interior gateway protocol (IGP) designed to manage relatively small networks. RIP uses a hop count (distance vector) to determine the best possible route to a network or host. The hop count is also known as the routing metric. A router is defined as being zero hops away from its directly connected networks, one hop from networks that can be reached through one gateway (router), and so on. In RIP, a hop count of 16 means infinity, or the destination cannot be reached. Thus, very large networks with more than 15 hops between potential partners cannot make use of RIP.

The RIP server broadcasts routing information (in other words, its own distance vector tables) to the gateways of directly connected networks every 30 seconds. The server receives updates from neighboring gateways periodically and updates its routing tables. If an update is not received for three minutes, the gateway is assumed down and all the routes through that gateway are set to a metric of 16 (infinity). The server can, for example, determine if a new route has been created, if a route is temporarily unavailable, or if a more efficient route exists. A complete definition of RIP Version 1 is documented in RFC 1058.

RIP Version 2 is compatible with existing RIP Version 1 implementations. It also supports variable subnet masks. Variable subnet mask support means that an IP device can use different subnet masks on each of its network interfaces. It requires, in general, that both the TCP/IP stack and the dynamic update protocol that is used by that stack support variable length subnet masks. The TCP/IP stack from OS/390 V2R5 IP onward can be enabled for variable subnetting via the VARSUBNETTING keyword on the IPCONFIG statement.

Techniques such as immediate next hop for shorter paths and multicast addressing are used in RIP Version 2 to reduce the load on hosts. A full description of this protocol is detailed in RFCs 1721, 1722, 1723 and 1724.
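The hop-count behavior described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the ORouteD or OMPROUTE logic: the table layout, the function names, and the router names in the usage note are hypothetical, and timers are left out (the caller decides when a neighbor has been silent for three minutes).

```python
INFINITY = 16  # RIP metric meaning "destination unreachable"


def merge_update(table, neighbor, advertised):
    """Apply one neighbor's advertised distance vector to our routing table.

    table maps destination -> (metric, next_hop); advertised maps
    destination -> metric as seen by the neighbor.
    """
    for destination, metric in advertised.items():
        new_metric = min(metric + 1, INFINITY)  # one extra hop via the neighbor
        current = table.get(destination)
        if current is None:
            if new_metric < INFINITY:
                table[destination] = (new_metric, neighbor)
        elif current[1] == neighbor or new_metric < current[0]:
            # Accept news from the current next hop; otherwise only a shorter path.
            table[destination] = (new_metric, neighbor)
    return table


def expire_neighbor(table, neighbor):
    """Set every route through a silent neighbor to the infinity metric."""
    for destination, (metric, next_hop) in table.items():
        if next_hop == neighbor:
            table[destination] = (INFINITY, next_hop)
    return table
```

Starting from `{"net-a": (0, None)}` for a directly connected network, an update from router `r1` advertising `net-c` at 3 hops installs it at 4 hops; a later update from `r2` advertising `net-c` at 1 hop replaces it with a 2-hop route through `r2`.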

OSPF

Where RIP is based on distance vectors (hop counts), OSPF is based on link states. In other words, OSPF routing tables contain details of the links between routers, their status (active or inactive), their cost (desirability for routing) and so on. Updates are broadcast whenever a link changes status, and consist merely of a description of the changed status. This is in contrast to RIP, where broadcasts occur every 30 seconds and contain the complete distance vector tables. Because of this difference, and for other reasons such as the lack of the 16-hop limit, OSPF is more suitable for large networks than RIP.

In fact, OSPF is similar in concept to Advanced Peer-to-Peer Networking (APPN), where the routers (network nodes) maintain the network topology and broadcast any changes whenever they occur. OSPF, like APPN, can divide its network into topology subnets (known as areas) within which broadcasts are confined. The current version (V2) of OSPF is described fully in RFC 2328.

The OMPROUTE server in z/OS CS IP makes use of advanced OSPF V2 techniques for improving availability and performance. For example, it can balance the connection load among up to four alternative routes if the costs of the routes are equal (equal cost multipath). It can also interact with VIPA, whether using RIP or OSPF, to ensure that an IP route is always available to host applications as long as at least one interface to the network is active.
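To make the link-state idea concrete, the following Python sketch runs a shortest-path computation (Dijkstra's algorithm) over a small link-state database and keeps every first hop that ties for the lowest cost, which is the basis of equal cost multipath. It is a simplified illustration, not OMPROUTE's implementation: the data layout and names are hypothetical, link costs are assumed to be positive, and the four-route limit mentioned above is not enforced.

```python
import heapq


def shortest_paths(links, source):
    """Dijkstra over a link-state database.

    links maps router -> {neighbor: cost}. Returns destination ->
    (cost, set of first-hop neighbors), keeping all equal-cost first hops.
    """
    best = {source: (0, set())}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best[node][0]:
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, link_cost in links.get(node, {}).items():
            new_cost = cost + link_cost
            first_hops = {neighbor} if node == source else best[node][1]
            known = best.get(neighbor)
            if known is None or new_cost < known[0]:
                best[neighbor] = (new_cost, set(first_hops))
                heapq.heappush(queue, (new_cost, neighbor))
            elif new_cost == known[0]:
                known[1].update(first_hops)  # another equal-cost path
    return best
```

When a link changes status or cost, only that entry in `links` needs to be updated before the computation is rerun, which mirrors the way link-state updates describe only what changed.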

EIGRP

The Enhanced Interior Gateway Routing Protocol (EIGRP) is a proprietary protocol developed by Cisco Systems, Inc. At the time of this writing, it is not an IETF standard protocol. EIGRP is categorized as a hybrid routing protocol. Similar to a distance vector algorithm, EIGRP uses metrics to determine network paths. However, like a link state protocol, topology updates in an EIGRP environment are event driven.

EIGRP, as the name implies, is an interior gateway protocol designed for use within an autonomous system. In properly designed networks, EIGRP has the potential for improved scalability and faster convergence over standard distance vector algorithms. EIGRP is also better positioned to support complex, highly redundant networks. EIGRP provides several benefits, including:

• Fast convergence: EIGRP maintains a list of alternate routes that can be used if a preferred path fails. When the path fails, the new route is immediately installed in the IP routing table. No route recomputation is performed.

• Partial routing updates: When EIGRP discovers a neighboring router, each device exchanges its entire routing table. After the initial information exchange, only routing table changes are propagated. There is no periodic rebroadcasting of the entire routing table.

• Low bandwidth utilization: During normal network operations, only hello packets are transmitted through a stable network.

• CIDR and VLSM: EIGRP supports supernetting and variable length subnet masks. This allows the network administrator to efficiently allocate IP address resources.

• Route summarization: EIGRP supports the ability to summarize routing announcements. This limits the advertisement of unnecessary subnet information.

• Multiple protocols: EIGRP can provide network layer routing for AppleTalk, IPX and IP networks.

• Unequal cost load balancing: EIGRP supports the simultaneous use of multiple unequal cost paths to a destination. Each route is installed in the IP routing table. EIGRP also intelligently balances traffic load over the multiple paths.

As of this writing, z/OS CS IP does not support EIGRP. As a result, networks using EIGRP tend to be mixed environments in which another dynamic routing protocol, such as OSPF, also runs. Integrating the two protocols is therefore key to the network’s complete routing viability.
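The effect of supernetting and route summarization can be shown with a short Python sketch. It uses generic CIDR aggregation from the standard `ipaddress` module and does not model how EIGRP itself chooses or configures summary routes; the prefixes are hypothetical.

```python
import ipaddress


def summarize(prefixes):
    """Collapse contiguous prefixes into the fewest covering networks."""
    networks = [ipaddress.ip_network(prefix) for prefix in prefixes]
    return [str(network) for network in ipaddress.collapse_addresses(networks)]
```

Four contiguous /24 subnets such as `10.1.0.0/24` through `10.1.3.0/24` collapse into a single `10.1.0.0/22` announcement, whereas subnets that are not contiguous are left as separate entries.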

1.3 Quality of Service

Various protocols have been developed to tackle the problem of providing end-to-end services based on application requirements. Initially, IP was defined to provide only a best-effort delivery service. However, it soon became apparent that Quality of Service (QoS) mechanisms were necessary to appropriately divide constrained network resources among applications with differing demands and importance.

A network built using no additional QoS mechanisms is still very robust and services many disparate applications, as proven by the popularity and scale of the global Internet. However, as enterprises make heavier use of the Internet and build their own intranets, bandwidth will inevitably become constrained. No longer will the network be able to offer the desired services to each and every application. Convergence beyond data adds new requirements for traffic delivery. Voice, video, and other digitally encoded streams have real-time transport service requirements that cannot be accommodated without supplemental protocols and policies. QoS protocols allow intelligent packet forwarding decisions to be made based on the end-to-end service goals of the applications.

Service levels apply to applications and traffic streams on an end-to-end basis. This is an important point to remember when designing a network with QoS. It is also fundamental to the ability of a network, or a subnetwork, to provide different levels of service without introducing complexity that cannot be managed. There are three commonly referred-to models of end-to-end service (Figure 1-3):

• Best-effort service
Best-effort service is the type of service provided by all general IP-based networks. The network will deliver data on a first-in, first-out basis as long as resources are available to do so. No guarantees or assurances are made with respect to delay, packet loss, or throughput. You could say a best-effort service lacks any QoS mechanisms.

• Differentiated Services
Differentiated Services involves the handling of individual packets or flows within a network node. Each packet is associated with a particular class of service. Each node along the network path handles packets in a cooperative manner according to a common set of rules resulting in end-to-end service classes. This is usually implemented in IP networks with the Type of Service (TOS) bits or the Differentiated Services (DS) field in the IP header.

• Integrated or Reserved Services
Integrated Services, also known as Reserved or Guaranteed Services, provides the bandwidth and delay characteristics as requested by the application or configured for specific types of traffic. The application reserves the necessary resources ahead of time so that the network saves sufficient resources to satisfy the reservation. The most well-known protocol implementing Integrated Services is the Resource Reservation Protocol (RSVP).

Figure 1-3 End to end service models (diagram labels: Best Effort (IP, IPX, AppleTalk), ubiquitous connectivity; Differentiated (First, Business, Coach Class), some traffic is more important than the rest; Reserved (Bandwidth, Delay, Jitter), certain applications require specific network resources)

For levels of service beyond basic IP services or best-effort service, signaling protocols and queuing, traffic shaping and filtering mechanisms are employed to supplement IP and provide the services required by the application. Keep in mind, however, that QoS is not a substitute for necessary bandwidth. Sufficient bandwidth is necessary, and more is certainly better, to minimize congestion within the network. When congestion does occur, QoS mechanisms help to ensure less critical or more tolerant traffic encounters network delay before application traffic with real-time requirements or that of a more important business nature.

CS for z/OS IP provides for Quality of Service with the use of its Policy Agent (PAGENT). It allows for specification of Differentiated Services and Integrated Services policies. Together with policy specification in the network fabric (Cisco environment), the Policy Agent is a critical component in a QoS strategy. The Policy Agent can install the policies from a Lightweight Directory Access Protocol (LDAP) server as well as a local file. PAGENT also supports new policies for traffic regulation and management to limit the number of concurrent connections to any given port.

1.3.1 Integrated Services

With Integrated Services, or IntServ, a particular QoS is negotiated at the time it is requested. Resource Reservation Protocol (RSVP) is used to allow an application to request or signal the network to reserve a certain amount of bandwidth with particular QoS criteria, such as minimum latency, before data is actually sent. Information is provided relative to the traffic profile the application expects to send and service is requested in terms of required bandwidth and maximum tolerated delay. The network responds to the QoS request based on available resources and then commits to meeting the requested service level or denies the request.

While RSVP provides the highest level of guaranteed services, it is also the most complex of the QoS protocols. This is because the reservation must be done across the entire data path with state in each node (router in the path) maintained throughout the duration of the connection. This is certainly not a trivial task when thousands of connections are anticipated. The resulting overhead is the main reason why RSVP and IntServ have received little acceptance in large networks. This problem of per-flow state has led to increased emphasis in current research efforts.

The CS for z/OS IP QoS solution provides an RSVP Agent for the handling of RSVP requests. This agent makes use of the RSVP API, which uses AF_UNIX socket communication between a QoS-aware application and RSVP Agent.
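The commit-or-deny behavior and the per-flow state it creates can be sketched as follows. This Python fragment is a deliberately centralized simplification and does not model RSVP's hop-by-hop signaling or the z/OS RSVP API; the node names, bandwidth units, and function names are hypothetical.

```python
def reserve(path, available, flow_id, bandwidth, reservations):
    """Admit a flow only if every node on the path can commit the bandwidth.

    available maps node -> unreserved capacity; reservations maps
    flow_id -> (path, bandwidth) and stands in for the per-flow state.
    """
    if any(available[node] < bandwidth for node in path):
        return False  # request denied, nothing is reserved anywhere
    for node in path:
        available[node] -= bandwidth
    reservations[flow_id] = (tuple(path), bandwidth)
    return True


def release(available, flow_id, reservations):
    """Tear down a reservation and return its bandwidth to each node."""
    path, bandwidth = reservations.pop(flow_id)
    for node in path:
        available[node] += bandwidth
```

The `reservations` dictionary grows by one entry per admitted flow, which hints at why keeping such state in every router on the path becomes costly when thousands of connections are expected.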


1.3.2 Differentiated Services

Differentiated Services (DiffServ) was developed to allow a network to support multiple service classes without the need to maintain state for each traffic flow along the path or to perform signaling between nodes. It can, therefore, scale to support the traffic seen in today’s global networks.

DiffServ is less complex than Integrated or Reserved Services. It is less network intensive and is appropriate for networks of networks even where portions of the network are outside the administrative control of the network domain manager. In DiffServ, the network domain manager or administrator defines aggregate traffic service classes, sometimes referred to as the Olympic classes, that is, platinum, gold, silver, and bronze. Packets are classified into these classes at an ingress node, which could be the sending host. These packets are marked as belonging to their class by using either the TOS bits or the DSCP field in the IP header.

Figure 1-4 DiffServ end-to-end architecture (diagram labels: DiffServ Domains A, B, and C; DS ingress/egress node; DS boundary node; TCB process with policer, shaper, LLQ, WRED; classes Premium, Gold, Silver, Bronze; bit bucket; packet color in DSCP; PHB LLQ/WRED)


DiffServ is meant to handle traffic aggregates. This means that traffic is classified according to the application requirements relative to other application traffic. Each node then handles the traffic using internal mechanisms to control bandwidth, delay, jitter, and packet loss. Through the use of standard per-hop-behaviors (PHBs), packets receive the proper handling and the result is end-to-end QoS. For true end-to-end QoS, each administrative domain must implement cooperative policies and PHBs. Packets entering a DiffServ domain can be metered, marked, shaped, or policed to implement traffic policies as defined by the administrative authority. This is handled by the DiffServ traffic conditioner block (TCB) function. DiffServ boundary nodes will typically perform traffic conditioning. A traffic conditioner typically classifies the incoming packets into pre-defined aggregate classes, meters them to determine compliance to traffic parameters, marks them appropriately by writing or re-writing the DSCP, and finally shapes the traffic as it leaves the node. The CS for z/OS IP Policy Agent also has support for Differentiated Services. You can use policy statements to set the TOS byte for outgoing packets. This TOS byte can be used by Queued Direct I/O interfaces on z/OS to map priority levels.
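The relationship between the DSCP and the TOS byte mentioned above is simple bit arithmetic: the 6-bit DSCP occupies the upper six bits of the 8-bit field, and the three high-order bits coincide with the older IP precedence value. The Python sketch below illustrates this; the mapping of class names to code points is a hypothetical policy chosen for the example and is not taken from the Policy Agent.

```python
# Hypothetical policy: service class name -> DSCP value.
# 46 is Expedited Forwarding, 34 is AF41, 18 is AF21, 0 is best effort.
CLASS_TO_DSCP = {"platinum": 46, "gold": 34, "silver": 18, "bronze": 0}


def dscp_to_tos(dscp):
    """Place a 6-bit DSCP in the upper six bits of the 8-bit TOS/DS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in six bits")
    return dscp << 2


def tos_to_dscp(tos):
    """Recover the DSCP from a TOS/DS byte."""
    return (tos & 0xFF) >> 2


def precedence(tos):
    """Return the three high-order (IP precedence) bits of the TOS byte."""
    return (tos & 0xFF) >> 5
```

With this mapping, traffic marked "platinum" carries a TOS byte of 184 (0xB8), which corresponds to precedence 5.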

1.4 High availability and load distribution

The traditional view of a single server has been primarily a single machine with perhaps a few network interfaces (IP addresses). This tends to lead to many potential points of failure within the server: the machine itself (hardware), the operating system (including TCP/IP stack) kernel executing on the machine, or a network interface (and the IP address associated with it). High server availability, therefore, is the ability to overcome these types of failures to ensure service availability even during times of server failure. z/OS addresses high server availability with the use of Virtual IP Addresses (VIPAs).

Load distribution (sometimes erroneously, yet acceptably, referred to as load balancing) is the ability of a group of closely related machines called a cluster to spread workload (based on some policy) across the target servers comprising the cluster. Usually, this load balancing is measured by some notion of perceived load on each of the target servers.

Clustering techniques that address the load balancing of connection requests also typically provide for some high server availability. That is, these techniques dispatch connections to target servers and can exclude failed servers from the list of target servers that can receive connections. In this way, the dispatching function avoids routing connections and requests to a server incapable of satisfying such requests.

By providing load distribution, clustering techniques must also provide for other system requirements in addition to the dispatching of connections. These include the ability to advertise some single system-wide image or identity so that clients can uniquely and easily identify the service. Typically this system-wide identity is either an IP address, known as the cluster address, or a host name, known as the service name. In the former, clients will always use the service via the same IP address. In the latter, although the host name is always the same, the service will be identified by different IP addresses depending on server load. This roughly leads to two categories of clustering load distribution: DNS mapping and connection dispatching.

1.4.1 DNS mapping

DNS mapping refers to the dynamic changing of DNS entries to map a service name (or host name) to different IP addresses to identify which target server should receive new connections. Of course, this straightforward mechanism has its limitations, including its dependence on host name resolution for load distribution. As a result, it is not a strategic form of load distribution. On the z/OS platform, there are two DNS mapping options, neither of which is strategic:

• Round-robin DNS
• Connection Optimization (DNS/WLM)
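The round-robin variant of DNS mapping can be sketched in Python as a resolver that hands out the addresses for a service name in rotation. The class, the service name, and the addresses are hypothetical; a real name server would also involve record time-to-live values and client-side caching, which this sketch ignores.

```python
from itertools import cycle


class RoundRobinResolver:
    """Answer each query for a service name with the next address in turn."""

    def __init__(self, records):
        # records maps service name -> list of target server addresses.
        self._cycles = {name: cycle(addresses) for name, addresses in records.items()}

    def resolve(self, name):
        return next(self._cycles[name])
```

Because the rotation takes no account of server load or failures, a sketch like this also makes the limitations noted above easy to see.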

1.4.2 Connection dispatching

Connection dispatching is the dispatching of TCP connections from a dispatching (or distributing) node to a group of target servers. With this technique, the client perceives a traditional TCP connection with a server. The dispatching node, however, receives data from the client and forwards it to the appropriate server, which can reply directly to the client. The z/OS platform (as a target server) has a number of options of this type including:

• Network Dispatcher
• Sysplex Distributor
• MultiNode Load Balancing (MNLB)
• Sysplex Distributor/MNLB joint solution

Chapter 1. Introduction


1.4.3 Virtual IP Addressing (VIPA)

Static VIPAs (introduced by z/OS) eliminate a host application's dependence on a particular network attachment. A client connecting to a server would normally select one of several network interfaces (IP addresses) to reach the server. If the chosen interface goes down, the connection also goes down and has to be reestablished over another interface. Additionally, while the interface is down, new connections to the failed interface (and IP address) cannot be established.

With VIPA, you define a virtual IP address that does not correspond to any physical attachment or interface. The TCP/IP stack then makes it appear to the IP network that the VIPA address is on a separate subnetwork, and that it is the gateway to that subnetwork. A client selecting the VIPA address to contact its server will have packets routed to the VIPA via any one of the available real host interfaces. If that interface fails, the packets will be rerouted (using dynamic routing) nondisruptively to the VIPA address using another active interface.

CS for z/OS IP extends the availability coverage of the VIPA concept to allow for the recovery of failed system images and entire TCP/IP stacks. In particular, the automatic VIPA takeover (sometimes loosely referred to as Dynamic VIPA or DVIPA) function allows you to define the same VIPA address on multiple TCP/IP stacks in a sysplex. One stack is defined as the primary or owning stack and the others are defined as secondary or backup stacks for the VIPA. Only the primary one is made known to the IP network. If the owning stack fails, then one of the secondary stacks takes its place and assumes ownership of the VIPA. The network simply sees a change in the routing tables. In this case, applications associated with these DVIPAs are active on the backup systems, thereby providing a hot standby for the services.
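The takeover behavior described above can be illustrated with a small Python model. This is only a sketch: the `Stack` and `DynamicVipa` classes, the stack names, and the address 10.1.1.1 are hypothetical and do not correspond to any z/OS configuration statement or API.

```python
from dataclasses import dataclass, field


@dataclass
class Stack:
    """Hypothetical model of one TCP/IP stack in a sysplex."""
    name: str
    alive: bool = True


@dataclass
class DynamicVipa:
    """A VIPA defined on a primary stack and an ordered list of backups."""
    address: str
    primary: Stack
    backups: list = field(default_factory=list)

    def owner(self):
        """Return the stack currently advertising the VIPA, or None."""
        if self.primary.alive:
            return self.primary
        # Takeover: the first surviving backup assumes ownership.
        for stack in self.backups:
            if stack.alive:
                return stack
        return None


stack_a, stack_b, stack_c = Stack("STACKA"), Stack("STACKB"), Stack("STACKC")
vipa = DynamicVipa("10.1.1.1", primary=stack_a, backups=[stack_b, stack_c])

owner_before = vipa.owner().name   # the primary owns the address
stack_a.alive = False              # the owning stack fails
owner_after = vipa.owner().name    # a backup now owns the same address
```

The address never changes in this model; only the owning stack does, which mirrors the statement that the network simply sees a change in the routing tables.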

1.4.4 Round-robin DNS

Early solutions to address load balancing were often located at the point where host names are translated into actual IP addresses. By rotating through a table of alternate IP addresses for a specific service, some degree of load balancing is achieved. This approach is often called round-robin DNS. The advantages of this approach are that it is transparent both to the client and the destination host. Also it is performed only once at the start of the connection. Figure 1-5 shows an example of round-robin name resolution. In this case, the client resolves the name “www.acme.com”. The DNS server responsible for this domain cycles through its list of possible target servers and returns to the client the next server to use.


[Figure 1-5 Round-robin DNS: a client resolves www.acme.com through its local name server and intermediate name servers. The company name server gives out the "next" IP address (for example, 9.37.38.2) from the servers 9.37.38.1, 9.37.38.2, and 9.37.38.3.]
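As a rough illustration of the rotation shown in Figure 1-5, the following Python sketch models a name server that cycles through its address list. The `RoundRobinResolver` class is hypothetical and is not a real DNS implementation; the addresses are the ones used in the figure.

```python
from itertools import cycle


class RoundRobinResolver:
    """Toy name server that hands out a service's addresses in rotation."""

    def __init__(self, records):
        # records maps a host name to the list of target server addresses.
        self._rotations = {name: cycle(addrs) for name, addrs in records.items()}

    def resolve(self, name):
        """Return the next address for name; no health or load check is made."""
        return next(self._rotations[name])


resolver = RoundRobinResolver(
    {"www.acme.com": ["9.37.38.1", "9.37.38.2", "9.37.38.3"]}
)
answers = [resolver.resolve("www.acme.com") for _ in range(4)]
```

The fourth answer wraps around to the first address. Nothing in the rotation looks at server availability or workload.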

Unfortunately, this approach is sometimes defeated by the fact that intermediate name servers and client software (including some of the most popular browsers) cache the IP address returned by the DNS service and ignore an expressly specified time-to-live (TTL) value of the name resolution. As a result, the balancing function provided by the DNS is bypassed because the client continues to use a cached IP address instead of resolving again. Even if a client does not cache the IP address, basic round-robin DNS still has other limitations:

• It does not provide the ability to differentiate by port.
• It has no awareness of the availability of the target servers.
• It does not take into account the workload on the target servers.

1.4.5 Connection Optimization (DNS/WLM)

The DNS/WLM solution is based on the DNS name server and the z/OS Workload Manager (WLM). Intelligent sysplex distribution of connections is provided through cooperation between WLM and DNS. For customers who elect to place a name server in a z/OS sysplex, the name server can utilize WLM to determine the best system to service a given client request.


In general, DNS/WLM also relies on the host name to IP address resolution for the mechanism by which to distribute load among target servers. The DNS/WLM approach works only in a sysplex environment, because of the WLM requirement. If the server applications are not all in the same sysplex, then there can be no single WLM policy and no meaningful coordination between WLM and DNS. Although the DNS/WLM approach overcomes many of the challenges of round-robin DNS, it is still susceptible to host name caching. Additionally, it requires the application to register on the local host, which essentially means a code change in the application. For these reasons, the DNS/WLM solution is no longer considered strategic.

1.4.6 Network Dispatcher

Network Dispatcher (NDR) is one part of IBM’s connection dispatching software technology. With NDR, clients send connection requests to a well-known cluster (service) address. At the time the TCP connection is established, NDR selects from its list of target servers the one best suited to handle this connection. When subsequent packets reach the Network Dispatcher, it forwards them to the chosen server. The chosen target is aware of this special cluster address and accepts data destined for it.

NDR has knowledge of the available servers through advisors that keep a watch on various protocols (HTTP, Telnet, FTP) and, in the case of z/OS, an MVS advisor that communicates with WLM regarding server load. NDR uses all the information obtained from the advisors to select a target server.

With NDR, as with all connection dispatchers, all packets from the client to the server pass through the Network Dispatcher, since the IP network knows only one address for the servers (the cluster address) and that address belongs to NDR. From the server back to the client, packets use normal IP routing, because the client's IP address is given to the server as the source of the packet.

Network Dispatcher also provides a high-availability option, utilizing a standby machine that remains ready to take over load balancing in case of failure of the primary Network Dispatcher. Network Dispatcher is part of the WebSphere Edge Server and is available on Linux, AIX, Solaris UNIX, and Windows operating platforms. Target servers may have different hardware architectures and operating systems including, of course, z/OS systems.
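The dispatching behavior described in this section (choose a server once at connection setup, then forward every later packet of that connection to the same server) can be sketched as follows. The `Dispatcher` class, the server names, and the load scale are assumptions made for illustration; they are not Network Dispatcher interfaces.

```python
class Dispatcher:
    """Toy connection dispatcher fronting a single cluster address."""

    def __init__(self, cluster_address, server_load):
        self.cluster_address = cluster_address
        # server_load: server name -> load figure reported by an advisor
        # (lower is better). The scale is an assumption of this sketch.
        self.server_load = dict(server_load)
        self.connections = {}  # (client_ip, client_port) -> chosen server

    def forward(self, client_ip, client_port):
        """Return the server that should receive a packet from this client."""
        key = (client_ip, client_port)
        if key not in self.connections:
            # New connection: pick the least-loaded server once.
            self.connections[key] = min(self.server_load, key=self.server_load.get)
        return self.connections[key]


ndr = Dispatcher("9.37.38.10", {"MVS1": 70, "MVS2": 20, "AIX1": 45})
first = ndr.forward("10.0.0.5", 40001)
ndr.server_load["MVS2"] = 95             # load changes mid-connection
again = ndr.forward("10.0.0.5", 40001)   # same connection stays on its server
other = ndr.forward("10.0.0.6", 40002)   # a new connection sees the new loads
```

Only inbound packets are modeled here; replies from the server to the client would bypass the dispatcher and use normal IP routing.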


1.4.7 Sysplex Distributor

Sysplex Distributor is the state of the art in connection dispatching technology among z/OS IP servers. In contrast with Network Dispatcher, the dispatching entity in this solution is a z/OS system within a sysplex and the target servers are exclusively z/OS systems within the same sysplex. Essentially, Sysplex Distributor extends the notion of automatic VIPA takeover to allow for load distribution among target servers within the sysplex. It combines technology used with Network Dispatcher for the distribution of incoming connections with that of Dynamic VIPAs to ensure high availability of a particular service within the sysplex.

Technically speaking, the functionality of Sysplex Distributor is similar to that of Network Dispatcher in that one IP entity advertises ownership of some IP address by which a particular service is known. In this fashion, the single system image of Sysplex Distributor is also that of a special IP address. However, in the case of Sysplex Distributor, this IP address (known as the cluster address in Network Dispatcher) is called a distributed DVIPA. Further, in Sysplex Distributor, the IP entity advertising the distributed DVIPA and dispatching connections destined for it is itself a system image within the sysplex, referred to as the distributing stack.

Like Network Dispatcher and DNS/WLM, Sysplex Distributor also makes use of Workload Manager (WLM) and its ability to gauge server load. In this paradigm, WLM informs the distributing stack of this server load so that the distributing stack may make the most intelligent decision regarding where to send incoming connection requests. Additionally, Sysplex Distributor has the ability to specify certain policies within the Policy Agent so that it may use QoS information from target stacks in addition to WLM server load. Further, these policies can specify which target stacks are candidates for clients in particular subnetworks.

As with NDR, connection requests are directed to the distributing stack of Sysplex Distributor. The stack selects which target server is the best candidate to receive an individual request and routes the request to it. It maintains state so that it can forward data packets associated with this connection to the correct stack. Additionally, data sent from servers within the sysplex need not travel through the distributing stack.

Sysplex Distributor also allows a VIPA to move nondisruptively to another stack. In the past, a VIPA was allowed to be active on only one stack in the sysplex. This led to potential disruptions in service when connections existed on one stack, yet the intent was to move the VIPA to another stack. With Sysplex Distributor, the movement of VIPAs can now occur without disrupting existing connections on the original VIPA owning stack.
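To make the combination of WLM load information and subnet policies more concrete, here is a minimal Python sketch of a target selection step. The `pick_target` function, the weight scale, and the policy structure are all invented for illustration; the actual Sysplex Distributor and Policy Agent definitions are not shown in this section and are not modeled here.

```python
import ipaddress


def pick_target(client_ip, wlm_weight, subnet_policy):
    """Choose a target stack for a new connection.

    wlm_weight: stack name -> weight (higher means more spare capacity).
    subnet_policy: list of (network, allowed stack names); first match wins.
    Both structures are assumptions of this sketch.
    """
    address = ipaddress.ip_address(client_ip)
    candidates = set(wlm_weight)
    for network, allowed in subnet_policy:
        if address in ipaddress.ip_network(network):
            candidates &= set(allowed)
            break
    if not candidates:
        return None
    # sorted() makes ties deterministic.
    return max(sorted(candidates), key=wlm_weight.get)


weights = {"SYSA": 10, "SYSB": 40, "SYSC": 25}
policy = [("10.1.0.0/16", ["SYSA", "SYSC"])]
inside = pick_target("10.1.2.3", weights, policy)      # restricted by policy
outside = pick_target("192.168.1.9", weights, policy)  # all stacks eligible
```

A client in the policy subnet is limited to the listed stacks even though another stack has more spare capacity.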


1.4.8 MultiNode Load Balancing (MNLB)

MultiNode Load Balancing (MNLB) is Cisco’s connection dispatching technology. Whereas the dispatching entity is in a PC or workstation with Network Dispatcher and in the sysplex with Sysplex Distributor, with MNLB it is located in the router or switch. The MNLB architecture consists of the following four components:

• Service Manager
• Forwarding Agent
• Workload Agent
• Backup Service Manager

Figure 1-6 shows the location of the MNLB components.


[Figure 1-6 MNLB components: Cluster1 and Cluster2, each with Workload Agents and XCF; routers and switches acting as Forwarding Agents; a Services Manager (Local Director) and a Backup Services Manager; clients attached through the IP network.]

A Service Manager, such as the Cisco LocalDirector, is responsible for the distribution of connection requests. That is, the Service Manager decides which target server will receive a connection. It bases this decision on information about application availability and server processor capacity, together with a load-balancing algorithm such as round robin or least connections, or on information received through the Dynamic Feedback Protocol (DFP). For example, DFP carries information from the Workload Manager (WLM).
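Of the algorithms named above, least connections is easy to show in a few lines. The following Python sketch is illustrative only; the function name and server names are made up, and it does not represent how any Cisco product implements the algorithm.

```python
def least_connections(active):
    """Return the server with the fewest active connections.

    active: server name -> current connection count.
    sorted() makes the choice deterministic when counts are equal.
    """
    return min(sorted(active), key=active.get)


active = {"srv1": 12, "srv2": 7, "srv3": 9}
assigned = []
for _ in range(4):
    target = least_connections(active)
    active[target] += 1          # the new connection now counts against it
    assigned.append(target)
```

New connections keep going to the least busy server until its count catches up with the others.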


A Workload Agent provides the information the Service Manager needs to calculate an optimal load balancing result for the server selection. The Workload Agent is software that runs on server platforms or machines that manage server farms or clustered server environments. The Cisco Workload Agent for OS/390 uses WLM data. It converts this data into the common DFP format before it is sent to the Service Manager. The Cisco Workload Agent for OS/390 optimizes load balancing in an IBM sysplex environment.

A Forwarding Agent is used as a packet redirector that forwards packets based on the Service Manager’s instructions. The Forwarding Agent is software running in Cisco routers or route switch modules. There can be multiple Forwarding Agents for the same distributed service (application).

A Backup Service Manager is responsible for providing connection establishment when the primary Service Manager fails.

1.4.9 Sysplex Distributor/MNLB joint solution

Because of the explosive growth of IP applications, which is forecast to continue in the near future, and the need to leverage access to traditional OS/390 and z/OS controlled transactions and databases, IBM and Cisco cooperated in searching for a “High Availability Web Services” solution. This solution provides an extended and adapted package of hardware and software cooperation between the IBM sysplex server site with its dynamic functions and Cisco’s MNLB architecture. The advantages of this solution include:

• Inbound traffic need not flow through the Sysplex Distributor
• No delays in learning load balancing information
• Policy or Quality of Service (QoS) information can be used for the selection of the “best” server
• No need to install the LocalDirector for sysplex traffic

Essentially, this solution leverages the strengths of Sysplex Distributor in making accurate connection dispatching decisions by having it provide the Service Manager functionality. Likewise, it leverages the strengths of Cisco routers and switches by having the Forwarding Agent in these boxes, thereby leaving forwarding to the machines that perform forwarding best. With this solution, the Sysplex Distributor selects the most appropriate server based on each system’s WLM information, or Quality of Service (QoS) data, or policy information provided by the Policy Agent (PAGENT). Since the Sysplex Distributor does the load balancing based on the data the WLMs within the cluster provide, the usage of the DFP will no longer be necessary.


The Sysplex Distributor then provides connection information to the MNLB Forwarding Agent. This information is transferred via a proprietary Cisco protocol called Cisco Appliance Services Architecture (CASA). The switch (or router) uses this information to forward subsequent client/server data directly to the selected server within the sysplex, thus avoiding having all inbound traffic go through a single point such as the Sysplex Distributor. We recommend the use of this connection dispatching technology. It takes advantage of the inside knowledge that the Sysplex Distributor has about servers within the sysplex. Further, it leverages the packet forwarding supremacy of Cisco routers and switches. Using the MNLB architecture, it allows for the use of multiple forwarding entities, as opposed to just one with base Sysplex Distributor. The net result is a highly tuned mechanism for supporting the e-business applications of today.
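The division of labor in the joint solution can be pictured with the short Python sketch below: a decision maker is consulted once per connection, and the forwarding entity handles all later packets from its own table. The `ServiceManager` and `ForwardingAgent` classes, the weights, and the flow tuple are hypothetical; the CASA message exchange itself is not modeled.

```python
class ServiceManager:
    """Stands in for the Sysplex Distributor acting as MNLB Service Manager."""

    def __init__(self, weights):
        self.weights = weights  # target server -> weight, higher is better
        self.queries = 0

    def select(self, flow):
        """Pick a target for a new flow and count how often we are asked."""
        self.queries += 1
        return max(sorted(self.weights), key=self.weights.get)


class ForwardingAgent:
    """Stands in for a router or switch that forwards packets itself."""

    def __init__(self, manager):
        self.manager = manager
        self.flows = {}  # flow tuple -> target server learned from the manager

    def forward(self, flow):
        """Return the target for a packet, asking the manager only once."""
        if flow not in self.flows:
            self.flows[flow] = self.manager.select(flow)
        return self.flows[flow]


manager = ServiceManager({"SYSA": 30, "SYSB": 50})
agent = ForwardingAgent(manager)
flow = ("10.0.0.5", 40001, "9.37.38.10", 80)
targets = [agent.forward(flow) for _ in range(3)]
```

Three packets of the same connection reach the same target, but the manager is queried only for the first one. Several such agents could share one manager, which corresponds to the multiple forwarding entities mentioned above.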


Chapter 2. Connecting z/OS systems and Cisco routers

This chapter provides an overview of the various ways of connecting Cisco routers to an IBM z/OS system in an IP network. Included in the overview are Cisco 7000/7500 series routers containing a Channel Interface Processor (CIP) and Cisco 7200 series routers containing a Channel Port Adapter (xCPA). We also discuss LAN-attached Cisco routers using OSA-2 or OSA-Express adapters.

Starting with OS/390 V2R5 IP, TCP/IP I/O processing was redesigned and made more tightly coupled with VTAM. This means that IBM Communications Server for z/OS VTAM provides all I/O support for CS for z/OS IP. This collaboration has led to increased performance with the leveraging of Communications Storage Manager (CSM), High Performance Data Transfer (HPDT), and Queued Direct I/O (QDIO).

The Communications Storage Manager provides a common storage environment to facilitate and expedite the interaction of TCP/IP and VTAM services with CS for z/OS. That is, CSM alleviates the need to copy data as it passes through these components. CSM provides a well-defined interface that allows quick access to the data contained within buffers. As a result, the same buffer can be used by the application, through the TCP/IP stack, and out through VTAM’s I/O. All TCP/IP devices maintained by VTAM make use of CSM.

© Copyright IBM Corp. 2002


High Performance Data Transfer (HPDT) connections such as MPC+ provide multiple read and write subchannels that expand bandwidth and availability of the channels by increasing the number of logical paths available. As a result, HPDT increases throughput of the channels while reducing CPU costs. Furthermore, IP and SNA traffic can share HPDT connections, thereby reducing the amount of overhead required to manage separate connections for SNA and IP.

The OSA-Express feature uses the newer I/O architecture called Queued Direct I/O (QDIO). This architecture provides a new, highly optimized data transfer interface that eliminates the need for channel command words (CCWs) and interrupts during data transmission, resulting in accelerated TCP/IP packet transmission. This is done by providing a data queue between TCP/IP and the OSA-Express. OSA-Express utilizes a Direct Memory Access (DMA) protocol to transfer the data to and from TCP/IP. This design eliminates ESCON bus performance limitations. For these reasons, the QDIO architecture is the strategic connectivity option to Cisco routers, provided the LAN connectivity options are available. Without LAN adapters (OSA-Express and Cisco adapter), channel attachment is the only choice.

2.1 Channel-attached Cisco routers

There are two types of considerations when channel attaching a Cisco router to a z/OS system. First, depending on the type of router, the physical channel connection uses either a Channel Interface Processor (CIP) or a Channel Port Adapter (xCPA). Second, the channel protocol used can be either Common Link Access to Workstations (CLAW) or the newer Multipath Channel Plus (MPC+).

2.1.1 CIP and xCPA

Cisco routers have two ways of channel attaching to a z/OS system. The first is with a Channel Interface Processor (CIP) installed in a Cisco 7000/7500 router. The CIP can contain one or two IBM channel connections. These channel connections can be any combination of Enterprise System Connection (ESCON) or bus and tag (parallel) channels. The 7000/7500 series routers can also run a wide variety of other interface processors for both LAN and WAN connections (for example, Fast Ethernet, serial, ATM).

The second type of channel connection is with a Channel Port Adapter (xCPA) installed in a Cisco 7200 series router. The xCPA can be either an ESCON CPA (ECPA) or a parallel CPA (PCPA). Together they are referred to as xCPAs or simply CPAs. When the term CPA is used, the information applies to both ECPAs and PCPAs. When the term ECPA is used, the information applies to ECPAs only (both ECPAs and ECPA4s if not further qualified). The Cisco 7200 series routers also support a wide variety of interfaces running in port adapter (PA) cards.

Together the CIP and CPA are referred to as Cisco Mainframe Channel Connection Adapters (CMCCs). Because parallel channels are older, provide lower performance, and are not as widely used in z/OS systems, we do not discuss them in further detail in this book. Only ESCON-attached CIPs and ECPAs are discussed.

7000/7500 series routers can contain multiple CIPs and 7200 series routers can contain multiple CPAs as long as the router’s total capacity is not exceeded. Cisco routers run an operating system named Internetwork Operating System (IOS). The CIP and CPA run specific microcode for their respective cards. Only certain combinations of IOS/CIP/CPA microcode are allowed. Information on IOS levels, CIP and CPA microcode levels, and compatible combinations is available at Cisco’s Web site at: http://www.cisco.com

Cisco 7000s/7500s and CIPs

The Cisco 7000 series includes the Cisco 7000 and Cisco 7010 routers. The Cisco 7500 series includes the following routers: Cisco 7505, Cisco 7507, Cisco 7513, and Cisco 7576. These are high-end, high-performance, multiprotocol, multimedia routers. They support a wide variety of interface processors and port adapters (PAs). An interface processor is a full-height printed circuit board that provides a variety of media. Cisco has a wide variety of interface processors such as Asynchronous Transfer Mode (ATM), Basic Rate Interface (BRI), Ethernet, Fast Ethernet, synchronous serial, token-ring, etc.

Cisco 7000/7500 series routers also support port adapters (PAs) when they are installed in a Versatile Interface Processor (VIP). A PA is a half-height printed circuit board that also provides a wide variety of media. PAs install in either a Cisco 7200 series router or in a VIP. Cisco has a wide variety of PAs such as ATM, Ethernet/Fast Ethernet/Gigabit Ethernet, Voice, etc. The VIP is a standard size interface processor that allows up to two PAs to be installed in it. This allows more flexibility in the router’s interfaces as well as increased performance, because the VIP can switch packets from one PA to another without utilizing the processor in the router itself.

The CIP is the interface processor that allows the 7000s/7500s to channel attach to an IBM z/OS system. The CIP referenced in this book is actually the second-generation CIP, known as a CIP2. For ease of reading, and because this is how they are commonly referred to, CIP2s are referred to as CIPs in this book. As mentioned previously, a CIP can have one or two channel interfaces, in any combination of bus and tag and ESCON. In addition to its “real” channel interface(s), a CIP always has a “virtual” channel interface. Together these show up as two or three channel interfaces. This virtual channel interface is not present on an ECPA, which leads to differences in certain configuration and router show commands. These differences will be pointed out whenever they occur. CIPs have a 100 MHz MIPS RISC processor, 512 KB cache memory, and between 32 MB and 128 MB of system Dynamic Random Access Memory (DRAM).

Cisco 7200s and ECPAs

The Cisco 7200 series includes the following routers: Cisco 7202, Cisco 7204, Cisco 7204VXR, Cisco 7206 and Cisco 7206VXR. The Cisco 7200 series routers are high-end, high-performance, multiprotocol, multimedia routers in a more up-to-date, compact chassis. The VXR series are the most up-to-date high-performance versions designed with a midplane that provides increased support for multiple high-bandwidth port adapters. The port adapters installed in the Cisco 7200s are of the same type as those installed on the second-generation Versatile Interface Processors (VIP2s) in the Cisco 7500 series routers.

The ECPA is the port adapter that allows a 7200 to connect to an IBM z/OS system. One of the main differences between a CIP and a CPA is that a CIP can have one or two channel interfaces whereas the CPA has only a single channel interface. As mentioned previously, CPAs do not have a virtual channel interface, and this leads to some differences in configuration and router show commands. CPAs have a 100 MHz MIPS RISC processor, 512 KB cache memory, and either 16 MB or 32 MB system DRAM. The latest version, the ECPA4, has a 266 MHz MIPS RISC processor, 256 KB cache memory, and between 32 MB and 128 MB system DRAM. There are no configuration differences between ECPAs and ECPA4s.

2.1.2 Channel protocols

Both the CIP and ECPA support the same IP channel protocols when connected to a z/OS system. These two channel protocols are Common Link Access to Workstations (CLAW) and the newer Multipath Channel Plus (MPC+).

CLAW

CLAW is a channel protocol designed to minimize interruptions to the host while a steady stream of data is being read or written. It utilizes one read subchannel and one write subchannel. The read and write subchannels share the same channel interface in the router. This effectively reduces the channel bandwidth in a single direction by the bandwidth being used in the other direction. It utilizes S/390 channel programs that seldom end (especially when the z/OS system is receiving data). Also, the channel program adapts to the current state of the z/OS device driver, thereby minimizing the number of interrupts generated to the z/OS system. There is no overhead added (headers/trailers, etc.) to the actual IP packet itself. When both the CLAW device (for example the CIP or CPA) and the z/OS system are moving a consistently large amount of data, the z/OS system operates very efficiently. However, this efficiency comes at a price of a large channel program overhead.

Read subchannel processing

The read channel program consists of blocks of channel command words (CCWs) that together read a single IP packet (assuming CLAW packing is not being used). The number of read CCW blocks in the channel program is controlled by the TCP/IP DEVICE statement. When the z/OS system is receiving data (that is, on the read subchannel), the IP packet is transferred up the channel with the first read CCW. The next CCW is also a read type CCW but this one transfers only four bytes of information (that is, length, and a flag byte) pertaining to the packet that was just read. The first IP packet will cause the setting of a status byte in the z/OS main storage to x’FF’ and a Program Controlled Interrupt (PCI) is generated to inform the z/OS device driver that an IP packet has been read.

The channel program then does not terminate but continues to another similar block of CCWs that can read a subsequent packet (if available). If no packet is immediately available, the CLAW device places the read CCW in a command retry status. This effectively suspends the read channel program. If there are subsequent packets, then the read CCW causes the data to be transferred to the z/OS main storage without setting the status byte to x’FF’ or generating the PCI interrupt. This makes subsequent packets that immediately follow somewhat more efficient as two CCWs are bypassed. If there is no subsequent read CCW (because the maximum number of read CCW blocks has been processed), then the channel program will terminate and the z/OS device driver will have to restart it. Restarting the channel program causes additional overhead.

Meanwhile, the z/OS device driver gets control (because of the PCI interrupt) and it processes all packets transferred so far. Once all packets have been processed, it adds the appropriate number of new read CCW blocks to the end of the channel program (enough to get it to its predefined maximum). It then zeros the status byte and exits. Any subsequent packets will again cause the status byte to be set and a PCI interrupt to be generated. It is in this way that the read channel program can run indefinitely.

A data rate that is not large enough to keep the z/OS system busy receiving causes the z/OS device driver to continually zero the status byte and exit. This forces the channel program through its less efficient path, setting the status byte and generating interrupts to z/OS. In this less efficient path there are six individual CCWs per IP packet read (the more efficient path still has four CCWs).
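The six-versus-four CCW figures quoted above lend themselves to a quick calculation. The Python sketch below assumes a simplified model in which the first packet of each burst takes the less efficient path and every packet that immediately follows takes the more efficient one; real behavior also depends on driver timing and on the configured number of read CCW blocks, which this sketch ignores.

```python
SLOW_PATH_CCWS = 6  # status byte set and PCI interrupt generated
FAST_PATH_CCWS = 4  # those two CCWs are bypassed


def read_ccws(burst_sizes):
    """Total read CCWs for a sequence of bursts of back-to-back packets.

    Simplifying assumption: the device driver has zeroed the status byte
    before each burst, so only the first packet of a burst is slow.
    """
    total = 0
    for packets in burst_sizes:
        if packets > 0:
            total += SLOW_PATH_CCWS + FAST_PATH_CCWS * (packets - 1)
    return total


steady = read_ccws([100])        # one burst of 100 back-to-back packets
trickle = read_ccws([1] * 100)   # 100 isolated packets
```

Under these assumptions the same 100 packets cost 402 CCWs when they arrive as a steady stream and 600 CCWs when they trickle in one at a time.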


A data rate larger than the z/OS system can process causes the channel program to terminate. This causes the data in the CLAW device to either be queued or dropped until the channel program can be restarted by the z/OS system. The largest problem, however, is with the sheer number of CCWs needed to transfer just one IP packet into z/OS.

Write subchannel processing

The write channel program consists of blocks of CCWs that together write a single IP packet (assuming again CLAW packing is not being used). The number of write CCW blocks in the channel program is also controlled by the TCP/IP DEVICE statement. There are three individual CCWs per block. All three CCWs are needed for each IP packet written out by z/OS.

When the z/OS device driver wants to write out one or more packets, it constructs the channel program with as many write CCW blocks as it needs (up to the previously discussed maximum) and starts the channel program. As each packet is received, the CLAW device sets a per-packet status byte in the z/OS main storage to x’FF’ using the same type of CCW as in the read operation. This informs the z/OS device driver that the specific packet has been successfully received by the CLAW device and that the driver is now free to remove that packet from the still-running channel program. The driver is also free to add additional write CCW blocks, with their corresponding packets, to the end of the channel program. It is in this way that the write channel program can run for long periods of time. If the z/OS system has no further packets to be written, the channel program terminates and will be restarted when additional packets need to be written.

CLAW packing

Recently, both IBM and Cisco have enhanced the CLAW protocol to pack together multiple IP packets in a single read or write CCW. This can lead to increased data throughput in both the read and write directions. This enhancement requires configuration in the z/OS system only. When running in packed mode, the block size that CLAW utilizes can be either 32 KB or 60 KB (depending on how the z/OS DEVICE statement is specified). The number of packets that will be packed into a block is known as the packing factor.

When running in non-packed mode, the data can be read directly into CSM buffers and passed off to the application without any data moves. When running in packed mode, each packet must be moved into its own CSM buffer out of the channel buffer. This in itself is not desirable, but since the z/OS system receives significantly fewer I/O interrupts, the performance of the z/OS system is better even with the additional data moves. Since it increases performance without any negative side effects, all of our examples will utilize this enhancement.
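As a rough feel for the packing factor, the Python sketch below estimates how many equal-sized packets fit into a packed block and how many blocks a stream of packets would need. It assumes that packets are packed back to back with no per-packet framing, which this section does not confirm, so treat the numbers as upper bounds rather than measured values. The function names and packet sizes are illustrative.

```python
import math

KB = 1024
PACKED_BLOCK_SIZES = (32 * KB, 60 * KB)  # the two block sizes named above


def packing_factor(block_size, packet_size):
    """Whole packets per block, ignoring any per-packet framing (assumption)."""
    if block_size not in PACKED_BLOCK_SIZES:
        raise ValueError("packed CLAW blocks are 32 KB or 60 KB")
    return block_size // packet_size


def blocks_needed(packets, block_size, packet_size):
    """Blocks needed to carry a number of equal-sized packets."""
    return math.ceil(packets / packing_factor(block_size, packet_size))


factor_small = packing_factor(32 * KB, 576)
factor_large = packing_factor(60 * KB, 1500)
blocks = blocks_needed(1000, 32 * KB, 576)
```

With these assumed sizes, 1000 small packets travel in 18 blocks instead of 1000 separate transfers, which is the source of the reduced interrupt load.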


The prerequisite microcode from Cisco is cip26-17 or xcpa26-17 for 12.0 IOS releases and cip27-11 or xcpa27-11 for 12.1 IOS releases; also, any future image that has the following problems (DDTs) resolved: CSCds19174 and CSCds24793.

Summary

Properties of CLAW include the following:

• Read and write subchannels on the same physical channel
• High data to non-data ratio
• High channel program overhead
• Poor performer when packets are small
• Good CMCC processor efficiency
• Packed option increases efficiency with small packets
• No delays in transmitting data

MPC+

Multipath Channel Plus (MPC+) is a channel protocol designed to move large amounts of data between a z/OS system and either another z/OS system or a channel-attached router. It utilizes one or more separate subchannels for reading and writing data. In the Cisco implementation only one read and one write subchannel are allowed but, unlike CLAW, the read and write subchannels can be split across the two real channel interfaces in a CIP (this doesn’t apply to a CPA because it has only one channel interface). This allows the full channel bandwidth to be utilized by data being sent to the z/OS system and data being sent from the z/OS system. Also, in the Cisco implementation, MPC+ only transports IP data. SNA is not supported unless it is encapsulated in IP (with, for example, Enterprise Extender).

MPC+ utilizes seldom-ending channel programs to increase performance on both the read and write subchannels. These are channel programs that seldom, if ever, actually terminate but are always active (either actually running or suspended). This reduces the interrupt load on the z/OS system much like the CLAW channel programs. Also, the channel program itself is fairly simple with most CCWs actually reading or writing data in large chunks (up to 64 KB per CCW).

There can be a significant amount of overhead added in the data that moves across the channel. Each read or write starts with a header page that is typically 4 KB. This is always transferred in its entirety. In this header page are a series of descriptors for the actual IP packets that follow in the data segment. If there is no data segment (because all the packets were contained in the header page) and the amount of descriptors and data is less than 2 KB, only 2 KB will be transferred.

Chapter 2. Connecting z/OS systems and Cisco routers


Write subchannel processing

There are three ways an IP packet can be processed when being transmitted by VTAM:

• It could be copied into the header page itself

  If the packet is marked to be included in the header page and it is 256 bytes or less, then the packet is copied into the header page. z/OS packets are marked to be included in the header page by the TCP/IP stack.

• It is pointed to by an Indirect Address Word (IDAW)

  If the data is in CSM storage, then it can be utilized as is. The data is not moved but is simply included in the IDAWs. One consequence of this is that all IDAWs end on a 2 KB boundary. Except for the first IDAW, all must point to the beginning of a 2 KB block. This means that if the data resides in a CSM buffer and is not on a 2 KB boundary, then there will be filler space prior to the actual packet. If the packet does not end exactly on a 2 KB boundary, then there will be filler space after the packet up to the 2 KB boundary as well. This filler space will be sent across the channel.

• It is moved into a separate 2 KB block of storage (which is then pointed to by an IDAW)

  If the data does not reside in a CSM buffer, then it is copied into a 2 KB block of storage and the IDAW is updated to point to it. If the packet does not end exactly on a 2 KB boundary, then there will be filler space after the packet up to the 2 KB boundary. This filler space will be sent across the channel.

Read subchannel processing

There are two ways an IP packet can be processed when being sent by the CMCC:

• It could be copied into the header page itself

  If the packet is marked to be included in the header page and it is 256 bytes or less, then the packet is copied into the header page. In the CMCC, packets are marked to be included in the header page when the packet is part of a connectionless flow. This typically means it is a UDP packet, but there are other types of connectionless data as well.

• It is moved into a separate 2 KB block of storage

  The data is then copied into a 2 KB block of storage aligned on a 2 KB boundary to be sent up the channel in response to a read CCW. If the packet does not end exactly on a 2 KB boundary, then there will be filler space after the packet up to the 2 KB boundary. This filler space will be sent on the channel.
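The filler cost described above can be estimated with simple arithmetic. The following Python sketch is illustrative only: it assumes a packet that has been copied to the start of a 2 KB block (so there is no leading filler), it ignores the header page and its descriptors, and the function names are hypothetical rather than part of any VTAM or CMCC interface.

```python
BLOCK_SIZE = 2048  # the 2 KB block size described in the text


def channel_bytes(packet_len: int) -> int:
    """Bytes sent on the channel for one packet copied to the start of a 2 KB block."""
    if packet_len <= 0:
        raise ValueError("packet length must be positive")
    blocks = -(-packet_len // BLOCK_SIZE)  # ceiling division
    return blocks * BLOCK_SIZE


def filler_bytes(packet_len: int) -> int:
    """Filler space that follows the packet up to the next 2 KB boundary."""
    return channel_bytes(packet_len) - packet_len


def data_ratio(packet_len: int) -> float:
    """Fraction of the transferred bytes that is actual packet data."""
    return packet_len / channel_bytes(packet_len)


if __name__ == "__main__":
    for size in (300, 1500, 2048, 4000):
        print(size, channel_bytes(size), filler_bytes(size), round(data_ratio(size), 3))
```

Under these assumptions a 300-byte packet still occupies a full 2,048-byte block, so only about 15 percent of what crosses the channel is packet data, whereas a 4,000-byte packet wastes just 96 bytes.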


Since there is a large amount of initial overhead (the header page), the CMCC delays sending data to the z/OS system for up to 10 ms. This allows more data to accumulate, which lessens the per-packet overhead. The delay is not configurable. If sufficient data is received to fill up an entire 64 KB CCW, then no further delay is added and the data is sent to the z/OS system.

Summary

Properties of MPC+ include the following:

• Read and write subchannels can be on separate physical channels
• Potential low data to non-data ratio
• Low channel program overhead
• Poor performer when packets are small
• Higher CMCC processor load
• No packed option available
• Potential delays in transmitting data
• Higher potential throughput

Which to use (CLAW or MPC+)?

The MPC+ protocol is much more efficient than CLAW with its smaller number of CCWs and with the large blocks of data it sends/receives. But if there is not a sufficient rate of data traversing the channel, significant overhead and delays can occur. Also, the data that actually traverses the channel may contain a high percentage of filler space that is not actual packet data. This can make MPC+ a better performer when the amount of actual data is high but a much worse performer when the amount of data, the data rate, or both are low.

Another point to consider is that the data structures required by MPC+ (the header page and the subsequent 2 KB blocks) require more CPU processing in both the router and the z/OS system than CLAW.

Finally, MPC+ utilizes High Performance Data Transfer (HPDT), which minimizes data moves in the z/OS system. It does this by utilizing memory provided by the Communications Storage Manager (CSM). On the inbound path, data received by z/OS on an MPC+ connection can be given directly to an HPDT-enabled application without copying it into the application’s receive buffers.

2.2 LAN-attached Cisco routers

The z/OS system can also directly LAN attach to any Cisco router with a compatible interface medium. It can do this with special interface cards known as Open Systems Adapters (OSAs). Modern OSAs come in two very different flavors: OSA-2 and OSA-Express. These interface cards install directly into the z/OS complex I/O slots just like a channel card. Most often the network interface types will be Ethernet (either Fast Ethernet or Gigabit Ethernet). The cable or fiber will go from the OSA adapter to either an Ethernet switch or a router. Generally speaking, a hub should not be used because it forces a half-duplex shared connection. The switch or router should be configured to run full-duplex and at the full media-capable speed (10 Mbps, 100 Mbps or 1000 Mbps).

Cisco has a wide variety of high performance routers and switches that can be attached to OSAs. This is not limited to the Cisco 7000/7500 and Cisco 7200 series routers previously discussed.

2.2.1 OSA-2 adapters

OSA-2 adapters support either LAN Channel Station (LCS) channel protocol or MPC+ (MPCOSA) channel protocol. Even though the OSA-2 adapter is installed in the z/OS processor complex, it still utilizes ESCON protocol and channel programs. It supports Ethernet, 802.3, token-ring, FDDI and ATM LANE network attachment types.

2.2.2 OSA-Express adapters

The next generation of Open Systems Adapter is called OSA-Express. The original OSA-Express implementation within CS for OS/390 included support only for Gigabit Ethernet (GbE). CS for z/OS IP now includes support for both Fast Ethernet and high-speed token-ring. OSA-Express is only supported in the S/390 Parallel Enterprise Servers Generation 5 (G5) and higher family of processors.

Gigabit Ethernet employs the same Carrier Sense Multiple Access with Collision Detection (CSMA/CD) protocol, the same frame format and the same frame size as 10 and 100 Mbps Ethernet. The primary difference is the transmission rate being 1000 Mbps. The OSA-Express provides interoperability with all sorts of networking media through a switch or router.

Queued Direct I/O

The OSA-Express feature uses the new I/O architecture called Queued Direct I/O (QDIO). This architecture provides a highly optimized data transfer interface that eliminates the need for channel command words (CCWs) and interrupts during data transmission, resulting in accelerated TCP/IP packet transmission. This is done by providing a data queue between TCP/IP and the OSA-Express. OSA-Express utilizes a direct memory access (DMA) protocol to transfer the data to and from TCP/IP. This new design eliminates ESCON bus performance limitations.

The OSA-Express also provides offloading of IP processing from the host, which is called IP assist (IPA). With IP assist, the OSA-Express offloads the following processing from the host:


• All MAC handling, which is done in the card. TCP/IP no longer has to fully format the datagrams for LAN-specific media.
• ARP processing for identifying the physical address.
• Packet filtering, screening and discarding of LAN packets.

QDIO also provides a function that assigns a priority value to each outbound datagram and which attempts to provide preferential service to the higher-priority data. This allows Quality of Service (QOS) policies to be implemented at the device level. The data transfer model of the OSA-Express and the OSA-2 interfaces is shown in Figure 2-1.

Figure 2-1 OSA-Express Gigabit Ethernet and OSA-2 host memory access
(Diagram labels. OSA-Express path: Host memory, OSA-Express. OSA-2 path: Host memory, IOP, Channel, Control unit, OSA-2.)


2.3 Summary

You should consider using a channel-attached Cisco router with a CMCC when:

• The processor does not support the OSA-Express

• Aggregation of SNA and IP traffic is needed

  The CMCC router can aggregate both SNA and IP traffic and efficiently transport both types of data across the ESCON channel(s).

• Little or no space is left in the processor complex’s I/O section to install an OSA

  CMCCs can be attached through ESCON directors to existing ESCON channels, thus not requiring a dedicated ESCON I/O slot.

• Other features in the CMCC (such as its TN3270) are desired to offload z/OS processing to the CMCC

You should consider using an OSA-Express when:

• The processor is an IBM Generation 5 or later

• High throughput TCP/IP traffic is desired

  SNA traffic is supported as long as it is using Enterprise Extender (also known as EE, EX or HPR/IP). Legacy traffic is supported on the lower-speed OSA-Express adapters, but it does not use QDIO or DMA and represents an expensive use of a valuable ESCON I/O slot.

• Space is available in the processor complex’s I/O area for the OSA adapter

  Each of the processor complex’s ESCON I/O slots can have up to four ESCON connections. These four possible ESCON connections must be weighed against a single OSA adapter.


Chapter 3. Routing overview

One of the major functions of a network protocol such as TCP/IP is to connect together a number of disparate networks efficiently. These networks may include LANs and WANs, fast and slow, reliable and unreliable, inexpensive and expensive connections. The simplest way to connect them together is to bridge them. However, this results in every part of the network receiving all traffic and leads to slower links being overloaded and perhaps to network failure altogether.

What is needed, particularly when all these networks are joined together in a worldwide network such as the Internet, is some form of intelligence at the boundaries of all these networks, which can look at the packets flowing and make rational decisions as to where and how they should be forwarded. This function is known as IP routing. Routing allows you to create networks that can be managed separately but are still linked and can communicate with one another.

In order to route packets, each network interface on a machine (such as a z/OS system or Cisco router) on the network has a unique IP address. Whenever a packet is sent, the destination and source addresses are included in the packet’s header information. Routers examine the destination address to see if there is a matching address in their routing tables. These tables are either created by the system administrator, or built dynamically using information received from other routers, or (often) a combination of both.

© Copyright IBM Corp. 2002


Along with the IP address of each interface, a subnet mask is also defined, which indicates to the network the distinction between the part of the address that represents the subnetwork (a LAN or point-to-point connection, for example) and the part that represents the host (the network interface on a device). The use of a subnet mask allows flexibility in network design (networks may be small or large), but the proper use of routing tables requires that the subnetwork address and the host portion of an IP address be discernible.

Figure 3-1 shows that the routing function in a TCP/IP network is performed at the internetwork layer (layer 3) of the architectural model. Each node on the connection path from client to server must:

• Inspect the destination address of the packet (192.168.200.1, in this example)
• Divide it into subnetwork and host addresses
• Determine whether the subnetwork (represented by the DLC layer) is directly attached to it
• If not, forward the packet to the next router as defined by the local routing table
• If so, forward the packet directly to the destination over the appropriate DLC
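The per-node steps listed above can be sketched in a few lines of Python using the standard ipaddress module. This is a minimal illustration, not how z/OS or IOS implements forwarding: the interface address, the /24 masks and the next-hop address 192.168.109.254 are assumptions made for the example, and only the client and server addresses come from Figure 3-1.

```python
import ipaddress

# Hypothetical node configuration (masks and next hop are assumed for illustration).
INTERFACES = [ipaddress.ip_interface("192.168.109.1/24")]
ROUTES = {
    ipaddress.ip_network("192.168.200.0/24"): ipaddress.ip_address("192.168.109.254"),
}


def split_address(address: str, mask: str):
    """Divide an address into its subnetwork address and host number."""
    addr = int(ipaddress.ip_address(address))
    mask_bits = int(ipaddress.ip_address(mask))
    subnet = ipaddress.ip_address(addr & mask_bits)
    host = addr & ~mask_bits & 0xFFFFFFFF
    return subnet, host


def forward(destination: str) -> str:
    """Decide between direct delivery and forwarding to a next-hop router."""
    dest = ipaddress.ip_address(destination)
    for iface in INTERFACES:
        if dest in iface.network:  # subnetwork is directly attached
            return f"deliver directly on {iface.network}"
    for network, next_hop in ROUTES.items():
        if dest in network:  # consult the local routing table
            return f"forward to next hop {next_hop}"
    return "no route"


if __name__ == "__main__":
    print(split_address("192.168.200.1", "255.255.255.0"))
    print(forward("192.168.200.1"))
    print(forward("192.168.109.88"))
```

With these assumed values, the server address is not on an attached subnetwork and is handed to the next hop, while the client address is delivered directly.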

Figure 3-1 Network routing flow
(Diagram: a packet addressed to the server IP address travels from the IP client, address 192.168.109.88, through Router 1, Router 2 and Router 3 to the IP server, address 192.168.200.1. Each router is shown with IP, Link and Physical layers.)


Every IP host can route IP datagrams. Each maintains an IP routing table that indicates the IP address of the next hop in order to route an IP datagram. Each host or router along the way needs to know only the next hop IP address in the path. Routing tables, of course, need to be maintained in both directions. There are two types of route entries in TCP/IP: static and dynamic.

The Internet is logically divided into autonomous systems, which are essentially individual customers’ networks. There are two classes of dynamic routing protocols used in IP: exterior gateway protocols (EGPs) are used between autonomous systems and interior gateway protocols (IGPs) are used within the autonomous systems.

3.1 Static routing

With static routing, the paths to reach networks and hosts are hard-coded in a routing file and accessible to a TCP/IP host. Each host has its own set of definitions. If something changes (a route for a host or network is added or deleted), then the static routes of some or all hosts in a network may need to be updated. Static routing is suitable for small, stable networks, but quite inadequate for large or often changing scenarios. With static routing, however, one has better administrative control over address allocation and resource access.

In CS for z/OS IP, static routes are configured in the TCPIP profile by coding either a BEGINROUTES/ENDROUTES block statement or by making use of the GATEWAY statement. The BEGINROUTES block was created in CS for OS/390 V2R10 IP to overcome inconsistencies and the often awkward syntax of the GATEWAY statement. It defines static IP routing table entries in standard BSD format. For more information, please consult z/OS V1R2.0 CS: IP Configuration Reference, SC31-8776.

3.2 Dynamic routing

Dynamic routing removes the need for coding and manually maintaining static routing tables. All router addressing and path information is built dynamically. These tables are automatically exchanged between routers in a network. This information sharing enables routers to calculate the best path through the network to any given destination.


When taking advantage of dynamic routing, an IP host employs the use of a routing daemon. The routing daemon adds, deletes, or changes route entries within the host’s routing table. Additionally, a routing daemon executing on a particular host communicates with routing daemons on neighbor hosts to exchange topological routing information. Ultimately, this information exchange leads to the update of routing table entries. The most common dynamic routing algorithms include:

• Distance vector such as that used in Routing Information Protocol (RIP)
• Link state such as that used in Open Shortest Path First (OSPF)
• Hybrid such as that used in Enhanced Interior Gateway Routing Protocol (EIGRP)

Distance vector routing

Distance vector algorithms are examples of dynamic routing protocols. These algorithms allow each device in the network to build and maintain a local IP routing table automatically.

The principle behind distance vector routing is simple. Each router in the internetwork maintains the distance or cost from itself to every known destination. This value represents the overall desirability of the path. Paths associated with a smaller cost value are more attractive to use than paths associated with a larger value. The path represented by the smallest cost becomes the preferred path to reach the destination. This information is maintained in a distance vector table. The table is periodically advertised to each neighboring router. Each router processes these advertisements to determine the best paths through the network.

The main advantage of distance vector algorithms is that they are typically easy to implement and debug. They are very useful in small networks with limited redundancy. However, there are several disadvantages with this type of protocol:

• During an adverse condition, the length of time for every device in the network to produce an accurate routing table is called the convergence time. In large, complex internetworks using distance vector algorithms, this time can be excessive. While the routing tables are converging, networks are susceptible to inconsistent routing behavior. This can cause routing loops or other types of unstable packet forwarding.
• To reduce convergence time, a limit is often placed on the maximum number of hops contained in a single route. Valid paths exceeding this limit are not usable in distance vector networks.


• Distance vector routing tables are periodically transmitted to neighboring devices. They are sent even if no changes have been made to the contents of the table. This may cause noticeable periods of increased utilization in reduced capacity environments.

Enhancements to the basic distance vector algorithm have been developed to reduce the convergence and instability exposures. RIP and BGP are two popular examples of distance vector routing protocols.

Link state routing

The growth in the size and complexity of networks in recent years has necessitated the development of more robust routing algorithms. These algorithms address the shortcomings observed in distance vector protocols.

These algorithms use the principle of a link state to determine network topology. A link state is the description of an interface on a router (for example, IP address, subnet mask, type of network) and its relationship to neighboring routers. The collection of these link states forms a link state database.

The process used by link state algorithms to determine network topology is straightforward:

• Each router identifies all other routing devices on the directly connected networks.
• Each router advertises a list of all directly connected network links and the associated cost of each link. This is performed through the exchange of link state advertisements (LSAs) with other routers in the network.
• Using these advertisements, each router creates a database detailing the current network topology. The topology database in each router is identical.
• Each router uses the information in the topology database to compute the most desirable routes to each destination network. This information is used to update the IP routing table.

The OSPF protocol is a popular example of a link state routing protocol.
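The last step of the link state process, computing the most desirable routes from the topology database, is commonly done with Dijkstra's shortest path algorithm. The sketch below is a simplified illustration rather than an OSPF implementation: the four-router topology, its link costs and the names LSDB and shortest_paths are all hypothetical.

```python
import heapq

# Hypothetical topology database: router -> {neighbor: link cost}.
LSDB = {
    "R1": {"R2": 1, "R3": 5},
    "R2": {"R1": 1, "R3": 2, "R4": 4},
    "R3": {"R1": 5, "R2": 2, "R4": 1},
    "R4": {"R2": 4, "R3": 1},
}


def shortest_paths(lsdb, root):
    """Return {destination: (total_cost, first_hop)} as seen from the root router."""
    best = {root: (0, None)}
    queue = [(0, root)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best[node][0]:
            continue  # stale queue entry; a cheaper path was already found
        first_hop = best[node][1]
        for nbr, link_cost in lsdb[node].items():
            new_cost = cost + link_cost
            # The first hop is the neighbor itself when leaving the root.
            hop = nbr if node == root else first_hop
            if nbr not in best or new_cost < best[nbr][0]:
                best[nbr] = (new_cost, hop)
                heapq.heappush(queue, (new_cost, nbr))
    return best


if __name__ == "__main__":
    for dest, (cost, hop) in sorted(shortest_paths(LSDB, "R1").items()):
        print(dest, cost, hop)
```

Because every router holds the same database, each one can run this computation independently with itself as the root and arrive at consistent routes.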

Hybrid routing

The last category of routing protocols is hybrid protocols. These protocols attempt to combine the positive attributes of both distance vector and link state protocols. Like distance vector, hybrid protocols use metrics to assign a preference to a route. However, the metrics are more accurate than conventional distance vector protocols. Like link state algorithms, routing updates in hybrid protocols are event driven rather than periodic. Networks using hybrid protocols tend to converge more quickly than networks using distance vector protocols. Finally, these protocols potentially reduce the overhead of link state updates and distance vector advertisements.

Although open hybrid protocols exist, this category is almost exclusively associated with the proprietary EIGRP algorithm. EIGRP was developed by Cisco Systems, Inc.

Route redistribution

Redistribution is the term for importing routing protocol data from one specific routing protocol to another. Defaults can be assigned so that one routing protocol can use the same metric for all redistributed routes, thereby simplifying the routing redistribution mechanism.

In the case of this book, the actual router network runs the EIGRP routing protocol. The z/OS system does not run EIGRP, so the Cisco router that is connected to it runs both EIGRP and OSPF. The router redistributes the EIGRP routes into the OSPF routing protocol and vice versa. The z/OS system then has knowledge of the underlying EIGRP network, and the routers have knowledge of any OSPF routes that exist.

CS for z/OS IP support

CS for z/OS IP implements both RIP and OSPF dynamic routing protocols. Before CS for OS/390 V2R6, only RIP was available and was implemented by the ORouteD server (in V2R5) and by RouteD (in previous releases). In V2R6, a new strategic server called OMPROUTE was introduced, which runs under UNIX System Services and provides both RIP and OSPF functionality. ORouteD may be withdrawn from CS for z/OS IP in due course, leaving OMPROUTE as the only routing daemon. Both ORouteD and OMPROUTE are UNIX System Services applications and require the Hierarchical File System (HFS).

Often, you will come across the term RouteD meaning routing daemon. This is a common term for a RIP server. ORouteD and its predecessor RouteD are so named because they are indeed RIP servers. The expression GateD is also used, usually for a router with more function, such as OSPF or EGP capability.

3.3 RIP

RIP is an interior gateway protocol (IGP) designed to manage relatively small networks. RIP uses a hop count (distance vector) to determine the best possible route to a network or host. The hop count is also known as the routing metric. A router is defined as being zero hops away from its directly connected networks, one hop from networks that can be reached through one gateway (router), and so on. In RIP, a hop count of 16 means infinity, or the destination cannot be reached. Thus, very large networks with more than 15 hops between potential partners cannot make use of RIP.

3.3.1 RIP Version 1

The distance vector table describes each destination network. The entries in this table contain the following information:

• The destination network (vector) described by this entry in the table.

• The associated cost (distance) of the most attractive path to reach this destination.

  This provides the ability to differentiate between multiple paths to a destination. In this context, the terms distance and cost can be misleading. They have no direct relationship to physical distance or monetary cost.

• The IP address of the next-hop device used to reach the destination network.

Each time a routing table advertisement is received by a device, it is processed to determine if any destination can be reached via a lower cost path. This is done using the RIP distance vector algorithm. The algorithm can be summarized as:

• At router initialization, each device contains a distance vector table listing each directly attached network and its configured cost. Typically, each network is assigned a cost of 1. This represents a single hop through the network. The total number of hops in a route is equal to the total cost of the route. However, cost can be changed to reflect other measurements, such as utilization, speed, or reliability.
• Each router periodically (typically every 30 seconds) transmits its distance vector table to each of its neighbors. The router may also transmit the table when a topology change occurs.
• Each router uses this information to update its local distance vector table:
  – The total cost to each destination is calculated by adding the cost reported in a neighbor's distance vector table to the cost of the link to that neighbor. The path with the least cost is stored in the distance vector table.
  – All updates automatically supersede the previous information in the distance vector table. This allows RIP to maintain the integrity of the routes in the routing table.
• The IP routing table is updated to reflect the least-cost path to each destination.
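The update step summarized above can be sketched as a short Python function. This is an illustrative model, not the ORouteD or OMPROUTE implementation: the function name and table layout are hypothetical, and the rule that an update from the current next hop always replaces the stored entry is the usual RIP behavior assumed here.

```python
INFINITY = 16  # RIP metric meaning "unreachable"


def process_advertisement(table, neighbor, link_cost, advertised):
    """Merge a neighbor's distance vector into the local table.

    table:      {network: (metric, next_hop)} for this router
    advertised: {network: metric} as reported by the neighbor
    Returns True if any entry changed.
    """
    changed = False
    for network, metric in advertised.items():
        total = min(metric + link_cost, INFINITY)
        current_metric, current_hop = table.get(network, (INFINITY, None))
        # Updates from the current next hop always replace the old entry;
        # updates from other neighbors are used only if strictly cheaper.
        if current_hop == neighbor or total < current_metric:
            if (total, neighbor) != (current_metric, current_hop):
                table[network] = (total, neighbor)
                changed = True
    return changed


if __name__ == "__main__":
    # A router with two directly attached networks hears from two neighbors.
    r3 = {"N3": (1, "Direct"), "N4": (1, "Direct")}
    process_advertisement(r3, "R4", 1, {"N1": 4, "N2": 3, "N3": 2, "N4": 1, "N5": 1, "N6": 2})
    process_advertisement(r3, "R2", 1, {"N1": 2, "N2": 1, "N3": 1, "N4": 2, "N5": 3, "N6": 4})
    for net in sorted(r3):
        print(net, *r3[net])
```

In the usage example, the router first learns expensive routes to N1 and N2 from R4 and then replaces them with the cheaper routes advertised by R2, while its directly attached networks are never displaced.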

Figure 3-2 illustrates the distance vector tables for three routers within a simple internetwork.

(Diagram: networks N1 through N6 interconnected by routers R1 through R5.)

Router R2 Distance Vector Table

Net   Next Hop   Metric
N1    R1         2
N2    Direct     1
N3    Direct     1
N4    R3         2
N5    R3         3
N6    R3         4

Router R3 Distance Vector Table

Net   Next Hop   Metric
N1    R2         3
N2    R2         2
N3    Direct     1
N4    Direct     1
N5    R4         2
N6    R4         3

Router R4 Distance Vector Table

Net   Next Hop   Metric
N1    R3         4
N2    R3         3
N3    R3         2
N4    Direct     1
N5    Direct     1
N6    R5         2

Figure 3-2 A sample distance vector routing table

Convergence and counting to infinity

Given sufficient time, this algorithm will correctly calculate the distance vector table on each device. However, during this convergence time, erroneous routes may propagate through the network. This problem is shown in Figure 3-3.

Figure 3-3 Counting to infinity sample network
(Diagram: routers A, B, C and D and the target network, with each link labeled (n) = network cost.)


This network contains four interconnected routers. Each link has a cost of 1, except for the link connecting router C and router D; this link has a cost of 10. The costs have been defined so that forwarding packets on the link connecting router C and router D is undesirable. Once the network has converged, each device has routing information describing all networks. For example, to reach the target network, the routers have the following information:

• Router D to the target network: Directly connected network. Metric is 1.
• Router B to the target network: Next hop is router D. Metric is 2.
• Router C to the target network: Next hop is router B. Metric is 3.
• Router A to the target network: Next hop is router B. Metric is 3.

Consider an adverse condition where the link connecting router B and router D fails. Once the network has re-converged, all routes use the link connecting router C and router D to reach the target network. However, this re-convergence time can be considerable. Figure 3-4 illustrates how the routes to the target network are updated throughout the re-convergence period. For simplicity, this figure assumes all routers send updates at the same time.

Time (each entry shows next hop and metric)

D:   Direct 1      Direct 1   Direct 1   Direct 1   ....   Direct 1   Direct 1
B:   Unreachable   C 4        C 5        C 6        ....   C 11       C 12
C:   B 3           A 4        A 5        A 6        ....   A 11       D 11
A:   B 3           C 4        C 5        C 6        ....   C 11       C 12

Figure 3-4 Network convergence sequence

Re-convergence begins when router B notices that the route to router D is unavailable. Router B is able to immediately remove the failed route because the link has timed out. However, a considerable amount of time passes before the other routers remove their references to the failed route. This is described in the sequence of updates shown in Figure 3-4:

1. Prior to the adverse condition occurring, router A and router C have a route to the target network via router B.
2. The adverse condition occurs when the link connecting router D and router B fails. Router B recognizes that its preferred path to the target network is now invalid.


3. Router A and router C continue to send updates reflecting the route via router B. This route is actually invalid since the link connecting router D and router B has failed.
4. Router B receives the updates from router A and router C. Router B believes it should now route traffic to the target network through either router A or router C. In reality, this is not a valid route, since the routes in router A and router C are vestiges of the previous route through router B.
5. Using the routing advertisement sent by router B, router A and router C are able to determine that the route via router B has failed. However, router A and router C now believe the preferred route exists via the partner.

Network convergence continues as router A and router C engage in an extended period of mutual deception. Each device claims to be able to reach the target network via the partner device. The path to reach the target network now contains a routing loop.

The manner in which the costs in the distance vector table increase gives rise to the term counting to infinity. The costs continue to increase, theoretically to infinity. To minimize this exposure, whenever a network is unavailable, the incrementing of metrics through routing updates must be halted as soon as it is practical to do so. In a RIP environment, costs continue to increase until they reach a maximum value of 16. This limit is defined in the RFC.

A side effect of the metric limit is that it also limits the number of hops a packet can traverse from source network to destination network. In a RIP environment, any path exceeding 15 hops is considered invalid. The routing algorithm will discard these paths.

There are two enhancements to the basic distance vector algorithm that can minimize the counting to infinity problem:

• Split horizon with poison reverse
• Triggered updates

These enhancements do not impact the maximum metric limit.
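The counting-to-infinity behavior can be reproduced with a small synchronous simulation of the sample network after the link between router B and router D has failed. The sketch below is a simplified model under stated assumptions: all routers exchange updates at the same instant, no split horizon is applied, and ties are broken in favor of the current next hop (otherwise by link order), so intermediate next hops may differ from those in the figure. The names LINKS, step and converge are hypothetical.

```python
INFINITY = 16  # RIP's "unreachable" metric

# Link costs of the sample network after the B-D link has failed.
LINKS = {("A", "B"): 1, ("A", "C"): 1, ("B", "C"): 1, ("C", "D"): 10}


def neighbors(router):
    """Yield (neighbor, link_cost) pairs for a router."""
    for (x, y), cost in LINKS.items():
        if x == router:
            yield y, cost
        elif y == router:
            yield x, cost


def step(table):
    """One synchronous round: every router processes its neighbors' last adverts."""
    new_table = {"D": table["D"]}  # D is directly attached; its entry never changes
    for router in ("A", "B", "C"):
        current_hop = table[router][1]
        best = (INFINITY, None)
        for nbr, cost in neighbors(router):
            metric = min(table[nbr][0] + cost, INFINITY)
            # Strictly better wins; on a tie, stay with the current next hop.
            if metric < best[0] or (metric == best[0] and nbr == current_hop):
                best = (metric, nbr)
        new_table[router] = best if best[0] < INFINITY else (INFINITY, None)
    return new_table


def converge(table):
    """Repeat rounds until nothing changes; return the final table and round count."""
    rounds = 0
    while True:
        nxt = step(table)
        if nxt == table:
            return table, rounds
        table, rounds = nxt, rounds + 1


if __name__ == "__main__":
    start = {"D": (1, "Direct"), "B": (INFINITY, None), "C": (3, "B"), "A": (3, "B")}
    final, rounds = converge(start)
    print(final, rounds)
```

In this model the metrics climb by one per round until router C finally prefers its cost-10 link to router D. At the typical 30-second update interval, the ten rounds needed here would correspond to about five minutes of inconsistent routing.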

Split horizon

The excessive convergence time caused by counting to infinity may be reduced with the use of split horizon. This rule dictates that routing information is prevented from exiting the router on an interface through which the information was received.


The basic split horizon rule is not supported in RFC 1058. Instead, the standard specifies the enhanced split horizon with poison reverse algorithm. The basic rule is presented here for background and completeness. The enhanced algorithm is reviewed in the next section.

The incorporation of split horizon modifies the sequence of routing updates shown in Figure 3-4. The new sequence is shown in Figure 3-5. The tables show that convergence occurs considerably faster using the split horizon rule.

Time (each entry shows next hop and metric)

D:   Direct 1      Direct 1      Direct 1      Direct 1
B:   Unreachable   Unreachable   Unreachable   C 12
C:   B 3           A 4           D 11          D 11
A:   B 3           C 4           Unreachable   C 12

Note: Faster Routing Table Convergence

Figure 3-5 Network convergence with split horizon

The limitation to this rule is that each node must wait for the route to the unreachable destination to time out before the route is removed from the distance vector table. In RIP environments, this timeout is at least three minutes after the initial outage. During that time, the device continues to provide erroneous information to other nodes about the unreachable destination. This propagates routing loops and other routing anomalies.

Split horizon with poison reverse

Poison reverse is an enhancement to the standard split horizon implementation. It is supported in RFC 1058. With poison reverse, all known networks are advertised in each routing update. However, those networks learned through a specific interface are advertised as unreachable in the routing announcements sent out to that interface. This drastically improves convergence time in complex, highly redundant environments. With poison reverse, when a routing update indicates that a network is unreachable, routes are immediately removed from the routing table. This breaks erroneous, looping routes before they can propagate through the network. This approach differs from the basic split horizon rule where routes are eliminated through timeouts.


Poison reverse has no benefit in networks with no redundancy (single path networks). One disadvantage to poison reverse is that it may significantly increase the size of routing announcements exchanged between neighbors. This is because all routes in the distance vector table are included in each announcement. While this is generally not an issue on local area networks, it can cause periods of increased utilization on lower-capacity WAN connections.
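The difference between plain split horizon and split horizon with poison reverse comes down to how the advertisement for each outgoing interface is built. The following sketch illustrates that choice; the function name, the mode strings and the interface names are hypothetical and do not correspond to any router or OMPROUTE configuration option.

```python
INFINITY = 16  # RIP metric meaning "unreachable"


def build_advertisement(table, out_interface, mode="poison_reverse"):
    """Return the {network: metric} advertisement to send out one interface.

    table: {network: (metric, learned_on_interface)}; None marks a direct network
    mode:  "none", "split_horizon", or "poison_reverse"
    """
    if mode not in ("none", "split_horizon", "poison_reverse"):
        raise ValueError(f"unknown mode: {mode}")
    advert = {}
    for network, (metric, learned_on) in table.items():
        if learned_on == out_interface:
            if mode == "split_horizon":
                continue  # suppress the route on the interface it came from
            if mode == "poison_reverse":
                metric = INFINITY  # advertise it back as unreachable
        advert[network] = metric
    return advert


if __name__ == "__main__":
    table = {"N1": (3, "eth0"), "N2": (1, None), "N3": (2, "eth1")}
    for mode in ("none", "split_horizon", "poison_reverse"):
        print(mode, build_advertisement(table, "eth0", mode))
```

The poison reverse advertisement is always at least as large as the split horizon one, which is the size penalty noted above for lower-capacity WAN connections.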

Triggered updates

Like split horizon with poison reverse, algorithms implementing triggered updates are designed to reduce network convergence time. With triggered updates, whenever a router changes the cost of a route, it immediately sends the modified distance vector table to neighboring devices. This mechanism ensures that topology change notifications are propagated quickly, rather than at the normal periodic interval. Triggered updates are supported in RFC 1058.

RIP-1 limitations

There are a number of limitations observed in RIP environments:
• Path cost limits: The resolution to the counting-to-infinity problem enforces a maximum cost for a network path. This places an upper limit on the maximum network diameter. Networks requiring paths greater than 15 hops must use an alternate routing protocol.
• Network-intensive table updates: Periodic broadcasting of the distance vector table can result in increased utilization of network resources. This can be a concern in reduced-capacity segments.
• Relatively slow convergence: RIP, like other distance vector protocols, is relatively slow to converge. The algorithms rely on timers to initiate routing table advertisements.
• No support for variable-length subnet masking: Route advertisements in a RIP environment do not include subnet masking information. This makes it impossible for RIP networks to deploy variable-length subnet masks.

3.3.2 RIP Version 2

The IETF recognizes two versions of RIP:
• RIP Version 1 (RIP-1): This protocol is described in RFC 1058.
• RIP Version 2 (RIP-2): RIP-2 is also a distance vector protocol designed for use within an autonomous system (AS). It was developed to address the limitations observed in RIP-1. RIP-2 is described in RFC 1723. The standard was published in late 1994.

In practice, the term RIP refers to RIP-1. Whenever the reader encounters the term RIP in TCP/IP literature, it is safe to assume the reference is to RIP Version 1 unless otherwise stated. This same convention is used in this document. However, when the two versions are being compared, the term RIP-1 is used to avoid confusion.

RIP-2 is similar to RIP-1. It was developed to extend RIP-1 functionality in small networks. RIP-2 provides these additional benefits not available in RIP-1:
• Support for CIDR and VLSM: RIP-2 supports supernetting (that is, CIDR) and variable-length subnet masking. This support was the major reason the new standard was developed. This enhancement positions the standard to accommodate a degree of addressing complexity not supported in RIP-1.
• Support for multicasting: RIP-2 supports the use of multicasting rather than simple broadcasting of routing announcements. This reduces the processing load on hosts not listening for RIP-2 messages. To ensure interoperability with RIP-1 environments, this option is configured on each network interface.
• Support for authentication: RIP-2 supports authentication of any node transmitting route advertisements. This prevents fraudulent sources from corrupting the routing table.
• Support for RIP-1: RIP-2 is fully interoperable with RIP-1. This provides backward-compatibility between the two standards.

As noted in the RIP-1 section, one notable shortcoming in the RIP-1 standard is the implementation of the metric field. RIP-1 specifies the metric as a value between 0 and 16. To ensure compatibility with RIP-1 networks, RIP-2 preserves this definition. In both standards, network paths with a hop count greater than 15 are interpreted as unreachable.
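The effect of the metric ceiling can be illustrated with a small Python sketch. The function names are hypothetical; the only values taken from the text are the limit of 15 usable hops and the treatment of anything beyond that as unreachable.

```python
RIP_INFINITY = 16  # metric value interpreted as "unreachable"


def received_metric(advertised_metric, link_cost=1):
    """Metric a router stores for a route heard from a neighbor.

    The link cost (normally one hop) is added and the result is capped
    at RIP_INFINITY, so the metric can never count past 16.
    """
    return min(advertised_metric + link_cost, RIP_INFINITY)


def is_reachable(metric):
    """A destination is usable only while its metric is below infinity."""
    return metric < RIP_INFINITY


print(received_metric(14), is_reachable(received_metric(14)))
print(received_metric(15), is_reachable(received_metric(15)))
```

A route advertised at 14 hops is still usable after one more hop, while a route advertised at 15 hops becomes unreachable at the next router.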

RIP-2 limitations

RIP-2 was developed to address many of the limitations observed in RIP-1. However, the path cost limits and slow convergence inherent in RIP-1 networks are also concerns in RIP-2 environments. In addition to these concerns, there are limitations to the RIP-2 authentication process. The RIP-2 standard does not encrypt the authentication password. It is transmitted in clear text. This makes the network vulnerable to attack by anyone with direct physical access to the environment.


3.4 OSPF

Where RIP is based on distance vectors (hop counts), OSPF is based on link states. In other words, OSPF routing tables contain details of the connections between routers, their status (active or inactive), their cost (desirability for routing) and so on. Updates are broadcast whenever a link changes status, and consist merely of a description of the changed status. This is in contrast with RIP where broadcasts occur every 30 seconds and contain the complete distance vector tables. Because of this difference, and for other reasons such as the lack of the 16-hop limit, OSPF is more suitable for large networks than RIP.

In fact, OSPF is similar in concept to APPN, where the routers (network nodes) maintain the network topology and broadcast any changes whenever they occur. OSPF, like APPN, can divide its network into topology subnets (known as areas) within which broadcasts are confined. The current version (V2) of OSPF is described fully in RFC 2328.

OSPF provides a number of features not found in distance vector protocols. Support for these features has made OSPF a widely deployed routing protocol in large networking environments. In fact, RFC 1812 (Requirements for IPv4 Routers) lists OSPF as the only required dynamic routing protocol. The following features contribute to the continued acceptance of the OSPF standard:
• Equal cost load balancing: The simultaneous use of multiple paths may provide more efficient utilization of network resources.
• Logical partitioning of the network: This reduces the propagation of outage information during adverse conditions. It also provides the ability to aggregate routing announcements that limit the advertisement of unnecessary subnet information.
• Support for authentication: OSPF supports the authentication of any node transmitting route advertisements. This prevents fraudulent sources from corrupting the routing tables.
• Faster convergence time: OSPF provides instantaneous propagation of routing changes. This expedites the convergence time required to update network topologies.
• Support for CIDR and VLSM: This allows the network administrator to efficiently allocate IP address resources.

OSPF is a link state protocol. As with other link state protocols, each OSPF router executes the SPF algorithm to process the information stored in the link state database. The algorithm produces a shortest-path tree detailing the preferred routes to each destination network.
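The shortest-path computation can be sketched with Dijkstra's algorithm, the usual basis for SPF. The sketch below is a simplification under stated assumptions: the link state database is reduced to a dictionary of router-to-neighbor costs, the router names and costs are invented for illustration, and the function name `shortest_path_tree` is hypothetical.

```python
import heapq


def shortest_path_tree(link_state_db, root):
    """Compute least-cost routes from root over a simplified database.

    link_state_db maps router -> {neighbor: link cost}.
    Returns destination -> (total cost, first hop from root).
    """
    best = {root: (0, None)}
    queue = [(0, root, None)]
    while queue:
        cost, node, first_hop = heapq.heappop(queue)
        if cost > best[node][0]:
            continue  # a cheaper path to this node was already processed
        for neighbor, link_cost in link_state_db.get(node, {}).items():
            new_cost = cost + link_cost
            # Neighbors of the root are their own first hop.
            hop = neighbor if node == root else first_hop
            if neighbor not in best or new_cost < best[neighbor][0]:
                best[neighbor] = (new_cost, hop)
                heapq.heappush(queue, (new_cost, neighbor, hop))
    return best


db = {
    "R1": {"R2": 10, "R3": 5},
    "R2": {"R1": 10, "R3": 3, "R4": 1},
    "R3": {"R1": 5, "R2": 3, "R4": 9},
    "R4": {"R2": 1, "R3": 9},
}
for dest, (cost, hop) in sorted(shortest_path_tree(db, "R1").items()):
    print(dest, cost, hop)
```

In this made-up topology R1 reaches R2 through R3 at a cost of 8, because the two-hop path is cheaper than the direct link with cost 10.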


3.4.1 OSPF terminology

OSPF uses specific terminology to describe the operation of the protocol.

OSPF areas

OSPF networks are divided into a collection of areas. An area consists of a logical grouping of networks and routers. The area may coincide with geographic or administrative boundaries. Each area is assigned a 32-bit area ID. Subdividing the network provides the following benefits:
• Within an area, every router maintains an identical topology database describing the routing devices and links within the area. These routers have no knowledge of topologies outside the area. They are only aware of routes to these external destinations. This reduces the size of the topology database maintained by each router.
• Areas limit the potentially explosive growth in the number of link state updates. Most LSAs are distributed only within an area.
• Areas reduce the CPU processing required to maintain the topology database. The SPF algorithm is limited to managing changes within the area.

Backbone area and area 0

All OSPF networks contain at least one area. This area is known as area 0 or the backbone area. Additional areas may be created based on network topology or other design requirements. In networks containing multiple areas, the backbone physically connects to all other areas. OSPF expects all areas to announce routing information directly into the backbone. The backbone then announces this information into other areas. Figure 3-6 depicts a network with a backbone area and four additional areas.

Intra-Area, Area Border and AS Boundary Routers

There are three classifications of routers in an OSPF network. Figure 3-6 illustrates the interaction of these devices.


[Figure: AS 10 with the backbone (Area 0) and Areas 1 through 4. Labels: ASBR with AS external links, ABR, IA. Key: ASBR = AS Border Router, ABR = Area Border Router, IA = Intra-Area Router]

Figure 3-6 OSPF router types

• Intra-Area Routers: This class of router is logically located entirely within an OSPF area. Intra-area routers maintain a topology database for their local area.
• Area Border Routers (ABR): This class of router is logically connected to two or more areas. One area must be the backbone area. An ABR is used to interconnect areas. ABRs maintain a separate topology database for each attached area. They also execute separate instances of the SPF algorithm for each area.
• AS Boundary Routers (ASBR): This class of router is located at the periphery of an OSPF internetwork. It functions as a gateway exchanging reachability information between the OSPF network and other routing environments. ASBRs are responsible for announcing autonomous system (AS) external link advertisements through the AS. External link advertisements are further detailed in 3.4.4, “OSPF route redistribution” on page 62.


Each router is assigned a 32-bit router ID (RID). The RID uniquely identifies the device. One popular implementation assigns the RID from the lowest-numbered IP address configured on the router.

Physical network types

OSPF categorizes network segments into three types. The frequency and types of communication occurring between OSPF devices connected to these networks is impacted by the network type:
• Point-to-point networks directly link two routers.
• Multi-access networks support the attachment of more than two routers. They are further subdivided into two types:
  – Broadcast networks have the capability of simultaneously directing a packet to all attached routers. This capability uses an address that is recognized by all devices. Ethernet and token-ring LANs are examples of OSPF broadcast multi-access networks.
  – Non-broadcast networks do not have broadcasting capabilities. Each packet must be specifically addressed to every router in the network. X.25 and frame relay networks are examples of OSPF non-broadcast multi-access networks.
• Point-to-multipoint networks are a special case of multi-access, non-broadcast networks. In a point-to-multipoint network, a device is not required to have a direct connection to every other device. This is known as a partially meshed environment.

Neighbor routers and adjacencies

Routers that share a common network segment establish a neighbor relationship on the segment. Routers must agree on the following information to become neighbors:
• Area-id: The routers must belong to the same OSPF area.
• Authentication: If authentication is defined, the routers must specify the same password.
• Hello and dead intervals: The routers must specify the same timer intervals used in the Hello protocol.
• Stub area flag: The routers must agree that the area is configured as a stub area. Stub areas are further described in 3.4.5, “OSPF stub areas” on page 64.
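The agreement check above can be expressed as a simple comparison of the parameters each router carries. The sketch below is hypothetical: the `HelloParams` class, its field names, and the sample values (including the 10-second and 40-second timers) are assumptions chosen for illustration, not the layout of a real hello packet.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class HelloParams:
    """Parameters two routers must agree on (illustrative field names)."""
    area_id: str
    auth_key: str
    hello_interval: int
    dead_interval: int
    stub_flag: bool


def mismatches(local, remote):
    """Return the names of the parameters on which two routers disagree."""
    return [
        f.name
        for f in fields(local)
        if getattr(local, f.name) != getattr(remote, f.name)
    ]


def can_become_neighbors(local, remote):
    """Routers become neighbors only if every parameter matches."""
    return not mismatches(local, remote)


r1 = HelloParams("0.0.0.1", "secret", 10, 40, False)
r2 = HelloParams("0.0.0.1", "secret", 30, 40, False)
print(can_become_neighbors(r1, r2), mismatches(r1, r2))
```

Reporting the mismatched field names, rather than only a yes or no answer, mirrors how such problems are usually diagnosed: a neighbor that never forms is often traced to a single disagreeing timer or area setting.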


Once two routers have become neighbors, an adjacency relationship can be formed between the devices. Neighboring routers are considered adjacent when they have synchronized their topology databases. This occurs through the exchange of link state information.

Designated and backup designated router

The exchange of link state information between neighbors can create significant quantities of network traffic. To reduce the total bandwidth required to synchronize databases and advertise link state information, a router does not necessarily develop adjacencies with every neighboring device:
• Multi-access networks: Adjacencies are formed between an individual router and the (backup) designated router.
• Point-to-point networks: An adjacency is formed between both devices.

Each multi-access network elects a designated router (DR) and backup designated router (BDR). The DR performs two key functions on the network segment:
• It forms adjacencies with all routers on the multi-access network. This causes the DR to become the focal point for forwarding LSAs.
• It generates network link advertisements listing each router connected to the multi-access network. Additional information regarding network link advertisements is contained in “Link state advertisements and flooding” on page 57.

The BDR forms the same adjacencies as the designated router. It assumes DR functionality when the DR fails. Each router is assigned an 8-bit priority, indicating its ability to be selected as the DR or BDR. A router priority of zero indicates that the router is not eligible to be selected. The priority is configured on each interface in the router. Figure 3-7 illustrates the relationship between neighbors. No adjacencies are formed between routers that are not selected to be the DR or BDR.


[Figure labels: DR, BDR, Other, Adjacent Neighbors, Neighbors]

Figure 3-7 Relationship between adjacencies and neighbors

Link state database

The link state database is also called the topology database. It contains the set of link state advertisements describing the OSPF network and any external connections. Each router within the area maintains an identical copy of the link state database.

Link state advertisements and flooding

The contents of an LSA describe an individual network component (that is, router, segment, or external destination). LSAs are exchanged between adjacent OSPF routers. This is done to synchronize the link state database on each device.

When a router generates or modifies an LSA, it must communicate this change throughout the network. The router starts this process by forwarding the LSA to each adjacent device. Upon receipt of the LSA, these neighbors store the information in their link state database and communicate the LSA to their neighbors. This store-and-forward activity continues until all devices receive the update. This process is called reliable flooding.

Two steps are taken to ensure this flooding effectively transmits changes without overloading the network with excessive quantities of LSA traffic:
• Each router stores the LSA for a period of time before propagating the information to its neighbors. If, during that time, a new copy of the LSA arrives, the router replaces the stored version. However, if the new copy is outdated, it is discarded.
• To ensure reliability, each link state advertisement must be acknowledged. Multiple acknowledgments can be grouped together into a single acknowledgment packet. If an acknowledgment is not received, the original link state update packet is retransmitted.

Link state advertisements contain five types of information. Together these advertisements provide the information needed to describe the entire OSPF network and any external environments:
• Router LSAs: This type of advertisement describes the state of the router's interfaces (links) within the area. They are generated by every OSPF router. The advertisements are flooded throughout the area.
• Network LSAs: This type of advertisement lists the routers connected to a multi-access network. They are generated by the DR on a multi-access segment. The advertisements are flooded throughout the area.
• Summary LSAs (Type-3 and Type-4): This type of advertisement is generated by an ABR. There are two types of summary link advertisements:
  – Type-3 summary LSAs describe routes to destinations in other areas within the OSPF network (inter-area destinations).
  – Type-4 summary LSAs describe routes to ASBRs.
  Summary LSAs are used to exchange reachability information between areas. Normally, information is announced into the backbone area. The backbone then injects this information into other areas.
• AS external LSAs: This type of advertisement describes routes to destinations external to the OSPF network. They are generated by an ASBR. The advertisements are flooded throughout all areas in the OSPF network.

Figure 3-8 illustrates the different types of link state advertisements.


[Figure: four panels, one per advertisement type]
• Router Links: Advertised by router. Describes state/cost of routers' links.
• Network Links: Advertised by designated router. Describes all routers attached to network.
• Summary Links: Advertised by ABR (between Area X and Area 0). Describes inter-area and ASBR reachability.
• External Links: Advertised by ASBR. Describes networks outside of OSPF AS.

Figure 3-8 OSPF link state advertisements

3.4.2 Neighbor communication

OSPF is responsible for determining the optimum set of paths through a network. To accomplish this, each router exchanges LSAs with other routers in the network. The OSPF protocol defines a number of activities to accomplish this information exchange:
• Discovering neighbors
• Electing a designated router
• Establishing adjacencies and synchronizing databases

The five OSPF packet types are used to support these information exchanges.


Discovering neighbors - the OSPF Hello protocol

The Hello protocol discovers and maintains relationships with neighboring routers. Hello packets are periodically sent out on each router interface. The packet contains the RIDs of other routers whose hello packets have already been received over the interface. When a device sees its own RID in the hello packet generated by another router, these devices establish a neighbor relationship. The hello packet also contains the router priority, DR identifier, and BDR identifier. These parameters are used to elect the DR on multi-access networks.

Electing a designated router

All multi-access networks must have a DR. A BDR may also be selected. The backup ensures there is no extended loss of routing capability if the DR fails. The DR and BDR are selected using information contained in hello packets. The device with the highest OSPF router priority on a segment becomes the DR for that segment. The same process is repeated to select the BDR. In case of a tie, the router with the highest RID is selected. A router declared the DR is ineligible to become the BDR. Once elected, the DR and BDR proceed to establish adjacencies with all routers on the multi-access segment.
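The selection rules described above (highest priority wins, ties go to the highest RID, priority zero is ineligible, and the DR cannot also be the BDR) can be sketched as follows. This is a deliberately simplified illustration: the function name and input format are assumptions, and it models a fresh election on a segment rather than every detail of the election procedure in the OSPF specification.

```python
import ipaddress


def elect_dr_bdr(routers):
    """Pick the DR and BDR for one multi-access segment.

    routers is a list of (router ID, priority) pairs, with the router ID
    written in dotted-decimal form. Returns (dr, bdr); either may be None.
    """
    # A priority of zero means the router is not eligible.
    eligible = [(rid, prio) for rid, prio in routers if prio > 0]
    # Rank by priority first, then by numeric router ID, highest first.
    ranked = sorted(
        eligible,
        key=lambda r: (r[1], ipaddress.IPv4Address(r[0])),
        reverse=True,
    )
    dr = ranked[0][0] if ranked else None
    # The DR is excluded, so the next-ranked router becomes the BDR.
    bdr = ranked[1][0] if len(ranked) > 1 else None
    return dr, bdr


segment = [("10.0.0.1", 1), ("10.0.0.9", 1), ("10.0.0.5", 0), ("10.0.0.2", 5)]
print(elect_dr_bdr(segment))
```

Here 10.0.0.2 becomes the DR on priority alone, and the tie between the two priority-1 routers is broken in favor of the higher RID, 10.0.0.9.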

Establishing adjacencies and synchronizing databases

Neighboring routers are considered adjacent when they have synchronized their link state databases. A router does not develop an adjacency with every neighboring device. On multi-access networks, adjacencies are formed only with the DR and BDR. This is a two-step process:
• Step 1: Database exchange process
  The first phase of database synchronization is the database exchange process. This occurs immediately after two neighbors attempt to establish an adjacency. The process consists of an exchange of database description packets. The packets contain a list of the LSAs stored in the local database. During the database exchange process, the routers form a master/slave relationship. The master is the first to transmit. Each packet is identified by a sequence number. Using this sequence number, the slave acknowledges each database description packet from the master. The slave also includes its own set of link state headers in the acknowledgments.
• Step 2: Database loading
  During the database exchange process, each router notes the link state headers for which the neighbor has a more current instance (all advertisements are time stamped). Once the process is complete, each router requests the more current information from the neighbor. This request is made with a link state request packet. When a router receives a link state request, it must reply with a set of link state update packets providing the requested LSA. Each transmitted LSA is acknowledged by the receiver. This process is similar to the reliable flooding procedure used to transmit topology changes throughout the network.

Every LSA contains an age field indicating the time in seconds since the origin of the advertisement. The age continues to increase after the LSA is installed in the topology database. It also increases during each hop of the flooding process. When the maximum age is reached, the LSA is no longer used to determine routing information and is discarded from the link state database. This age is also used to distinguish between two otherwise identical copies of an advertisement.
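The aging behavior can be sketched as a periodic pass over the database. The sketch is a simplification under stated assumptions: the database is reduced to a dictionary of LSA identifiers and ages, the identifiers are invented, and expired entries are simply dropped. The limit of 3600 seconds is the MaxAge constant defined in RFC 2328.

```python
MAX_AGE_SECONDS = 3600  # MaxAge architectural constant from RFC 2328


def age_lsas(lsdb, elapsed_seconds):
    """Advance the age of every LSA and discard those that reach MaxAge.

    lsdb maps an LSA identifier to its current age in seconds. A new
    dictionary is returned; the input is left unchanged.
    """
    aged = {}
    for lsa_id, age in lsdb.items():
        new_age = min(age + elapsed_seconds, MAX_AGE_SECONDS)
        if new_age < MAX_AGE_SECONDS:
            aged[lsa_id] = new_age
        # LSAs that reach MaxAge are no longer used and are dropped.
    return aged


lsdb = {"router-lsa-1": 100, "network-lsa-1": 3590}
print(age_lsas(lsdb, 20))
```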

3.4.3 OSPF virtual links and transit areas

Virtual links are used when a network does not support the standard OSPF network topology. This topology defines a backbone area that directly connects to each additional OSPF area. The virtual link addresses two conditions:
• It may logically connect the backbone area when it is not contiguous.
• It may connect an area to the backbone when a direct connection does not exist.

A virtual link is established between two ABRs sharing a common non-backbone area. The link is treated as a point-to-point link. The common area is known as a transit area. Figure 3-9 illustrates the interaction between virtual links and transit areas when used to connect an area to the backbone.


[Figure: Area 0, two ABRs joined by a virtual link, Area 2 (transit area), and Area 1]

Figure 3-9 OSPF virtual link and transit areas

This diagram shows that area 1 does not have a direct connection to the backbone. Area 2 can be used as a transit area to provide this connection. A virtual link is established between the two ABRs located in area 2. Establishing this virtual link logically extends the backbone area to connect to area 1. A virtual link is used only to transmit routing information. It does not carry regular traffic between the remote area and the backbone. This traffic, in addition to the virtual link traffic, is routed using the standard intra-area routing within the transit area.

3.4.4 OSPF route redistribution

Route redistribution is the process of introducing external routes into an OSPF network. These routes may be either static routes or routes learned via another routing protocol. They are advertised into the OSPF network by an ASBR. These routes become OSPF external routes. The ASBR advertises these routes by flooding OSPF AS external LSAs throughout the entire OSPF network. The routes describe an end-to-end path consisting of two portions:
• External portion: This is the portion of the path external to the OSPF network. When these routes are distributed into OSPF, the ASBR assigns an initial cost. This cost represents the external cost associated with traversing the external portion of the path.
• Internal portion: This is the portion of the path internal to the OSPF network. Costs for this portion of the network are calculated using standard OSPF algorithms.

OSPF differentiates between two types of external routes. They differ in the way the cost of the route is calculated. The ASBR is configured to redistribute the route as:
• External type 1: The total cost of the route is the sum of the external cost and any internal OSPF costs.
• External type 2: The total cost of the route is always the external cost. This ignores any internal OSPF costs required to reach the ASBR.

Figure 3-10 illustrates an example of the types of OSPF external routes.

[Figure: the ASBR connects the RIP network, which contains 10.99.5.0/24, to the OSPF network containing R1 and R2. 10.99.5.0/24 is redistributed with external cost 50. Link costs shown in the OSPF network: (10), (15), (20). R1 routing table: 10.99.5.0/24 E1: Cost 60 or E2: Cost 50. R2 routing table: 10.99.5.0/24 E1: Cost 65 or E2: Cost 50]

Figure 3-10 OSPF route redistribution


In this example, the ASBR is redistributing the 10.99.5.0/24 route into the OSPF network. This subnet is located within the RIP network. The route is announced into OSPF with an external cost of 50. This represents the cost for the portion of the path traversing the RIP network.
• If the ASBR redistributes the route as an E1 route, R1 contains an external route to this subnet with a cost of 60 (50 + 10). R2 has an external route with a cost of 65 (50 + 15).
• If the ASBR redistributes the route as an E2 route, both R1 and R2 contain an external route to this subnet with a cost of 50. Any costs associated with traversing segments within the OSPF network are not included in the total cost to reach the destination.
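The two cost calculations can be checked with a short Python sketch that reproduces the numbers from this example. The function name and the string labels "E1" and "E2" are illustrative choices, not part of any router's configuration syntax.

```python
def external_route_cost(external_cost, internal_cost, route_type):
    """Total cost of an OSPF external route as seen by an internal router.

    external_cost is the cost assigned by the ASBR; internal_cost is the
    OSPF cost from the local router to the ASBR.
    """
    if route_type == "E1":
        # Type 1: external cost plus the internal cost to reach the ASBR.
        return external_cost + internal_cost
    if route_type == "E2":
        # Type 2: the external cost only; internal costs are ignored.
        return external_cost
    raise ValueError(f"unknown external route type: {route_type}")


# Internal costs to the ASBR taken from the example: R1 = 10, R2 = 15.
for router, internal in (("R1", 10), ("R2", 15)):
    e1 = external_route_cost(50, internal, "E1")
    e2 = external_route_cost(50, internal, "E2")
    print(router, "E1:", e1, "E2:", e2)
```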

3.4.5 OSPF stub areas

OSPF allows certain areas to be defined as stub areas. A stub area is created when the ABR connecting to a stub area excludes AS external LSAs from being flooded into the area. This is done to reduce the size of the link state database maintained within the stub area routers. Since there are no specific routes to external networks, routing to these destinations is based on a default route generated by the ABR. The link state databases maintained within the stub area contain only the default route and the routes from within the OSPF environment (for example, intra-area and inter-area routes).

Since a stub area does not allow external LSAs, a stub area cannot contain an ASBR. No external routes can be generated from within the stub area.

Stub areas can be deployed when there is a single exit point connecting the area to the backbone. An area with multiple exit points can also be a stub area. However, there is no guarantee that packets exiting the area will follow an optimal path. This is because each ABR generates a default route. There is no ability to associate traffic with a specific default route.

All routers within the area must be configured as stub routers. This configuration is verified through the exchange of hello packets.

Not-so-stubby areas

An extension to the stub area concept is the not-so-stubby area (NSSA). This alternative is documented in RFC 1587. An NSSA is similar to a stub area in that the ABR servicing the NSSA does not flood any external routes into the NSSA. The only routes flooded into the NSSA are the default route and any other routes from within the OSPF environment (for example, intra-area and inter-area).


However, unlike a stub area, an ASBR can be located within an NSSA. This ASBR can generate external routes. Therefore, the link state databases maintained within the NSSA contain the default route, routes from within the OSPF environment (for example, intra-area and inter-area routes), and the external routes generated by the ASBR within the area. The ABR servicing the NSSA floods the external routes from within the NSSA throughout the rest of the OSPF network.

3.4.6 OSPF route summarization

Route summarization is the process of consolidating multiple contiguous routing entries into a single advertisement. This reduces the size of the link state database and the IP routing table. In an OSPF network, summarization is performed at a border router. There are two types of summarization:
• Inter-area route summarization: Inter-area summarization is performed by the ABR for an area. It is used to summarize route advertisements originating within the area. The summarized route is announced to the backbone. The backbone receives the aggregated route and announces the summary into other areas.
• External route summarization: This type of summarization applies specifically to external routes injected into OSPF. This is performed by the ASBR distributing the routes into the OSPF network.

Figure 3-11 illustrates an example of OSPF route summarization.


[Figure labels: OSPF Area 0, OSPF Area 1, OSPF Area 2 with R1, RIP Network, ASBR, ABR; subnet ranges 10.99.0.0/24 through 10.99.63.0/24 and 10.99.192.0/24 through 10.99.254.0/24; External Summary 10.99.0.0/26; Inter-Area Summary 10.99.192.0/26]

Figure 3-11 OSPF route summarization

In this figure, the ASBR is advertising a single summary route for the 64 subnetworks located in the RIP environment. This single summary route is flooded throughout the entire OSPF network. In addition, the ABR is generating a single summary route for the 64 subnetworks located in area 1. This summary route is flooded through area 0 and area 2. Depending on the configuration of the ASBR, the inter-area summary route may also be redistributed into the RIP network.
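The arithmetic behind consolidating 64 contiguous subnets into one advertisement can be checked with Python's standard ipaddress module. This sketch only illustrates CIDR aggregation; it is not how a router is configured, and the helper name `summarize` is hypothetical. The subnet range 10.99.0.0/24 through 10.99.63.0/24 is used as the sample input.

```python
import ipaddress


def summarize(networks):
    """Collapse contiguous networks into the fewest covering prefixes."""
    return list(ipaddress.collapse_addresses(networks))


# 64 contiguous /24 subnets: 10.99.0.0/24 through 10.99.63.0/24.
subnets = [ipaddress.ip_network(f"10.99.{i}.0/24") for i in range(64)]

summary = summarize(subnets)
print(len(subnets), "subnets ->", [str(net) for net in summary])
```

Because the block is aligned on a 64-subnet boundary, the 64 entries collapse into the single prefix 10.99.0.0/18, so only one advertisement has to be carried in place of 64.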

3.5 EIGRP

The Enhanced Interior Gateway Routing Protocol (EIGRP) is categorized as a hybrid routing protocol. Similar to a distance vector algorithm, EIGRP uses metrics to determine network paths. However, like a link state protocol, topology updates in an EIGRP environment are event driven.


EIGRP, as the name implies, is an interior gateway protocol designed for use within an autonomous system. In properly designed networks, EIGRP has the potential for improved scalability and faster convergence over standard distance vector algorithms. EIGRP is also better positioned to support complex, highly redundant networks. EIGRP is a proprietary protocol developed by Cisco Systems, Inc. At the time of this writing, it is not an IETF standard protocol.

3.5.1 Features of EIGRP

EIGRP provides several benefits. Some of these benefits are also available in distance vector or link state algorithms.
• Faster convergence: EIGRP maintains a list of alternate routes that can be used if a preferred path fails. When the path fails, the new route is immediately installed in the IP routing table. No route recomputation is performed.
• Partial routing updates: When EIGRP discovers a neighboring router, the devices exchange their entire routing tables. After the initial information exchange, only routing table changes are propagated. There is no periodic rebroadcasting of the entire routing table.
• Low bandwidth utilization: During normal network operations, only hello packets are transmitted through a stable network.
• CIDR and VLSM: EIGRP supports supernetting and variable-length subnet masks. This allows the network administrator to efficiently allocate IP address resources.
• Route summarization: EIGRP supports the ability to summarize routing announcements. This limits the advertisement of unnecessary subnet information.
• Multiple protocols: EIGRP can provide network layer routing for AppleTalk, IPX and IP networks.
• Unequal cost load balancing: EIGRP supports the simultaneous use of multiple unequal cost paths to a destination. Each route is installed in the IP routing table. EIGRP also intelligently balances traffic load over the multiple paths.

3.5.2 Terminology

EIGRP uses specific terminology to describe the operation of the protocol:
• Successor: For a specific destination, the successor is the neighbor router currently used for packet forwarding. This device has the least-cost path to the destination and is guaranteed not to be participating in a routing loop. To reach the target network shown in Figure 3-12, router B is the current successor for router A.
• Feasible successor: A feasible successor assumes forwarding responsibility when the current successor router fails. The set of feasible successors represents the devices that can become a successor without requiring a route recomputation or introducing routing loops.
  The set of feasible successors to a destination is determined by reviewing the complete list of minimum cost paths advertised by neighboring routers. From this list, neighbors that have an advertised metric less than the current routing table metric are considered feasible successors.

Figure 3-12 provides an example of a feasible successor relationship.

[Figure: routers A, B, C, D, and E and the target network. Hop costs are shown in parentheses, (n) = Hop Cost. Path costs of 25, 30, and 40 are shown]

Figure 3-12 EIGRP feasible successors

In this diagram, the costs to reach the target network are shown. For example, the cost from router C to the target network is 40 (15 + 10 + 10 + 5). The cost from router E to the target network is 30 (15 + 10 + 5). Router E is advertising a cost (30) that is less than the current routing table metric on router C (40). Therefore, router C recognizes router E as a feasible successor to reach the target network. Note that the reverse is not true. The cost advertised by router C (40) is more than the current route on router E (30). Therefore, router E does not recognize router C as a feasible successor to the destination network.
• Neighbor table: EIGRP maintains a table to track the state of each adjacent neighbor. The table contains the address and interface used to reach the neighbor. It also contains the last sequence number contained in a packet from the neighbor. This allows the reliable transport mechanism of EIGRP to detect out-of-order packets.

68

Networking with z/OS and Cisco Routers: An Interoperability Guide

򐂰 Topology table: EIGRP uses a topology table to install routes into the IP routing table. The topology table lists all destination networks currently advertised by neighboring routers. The table contains all the information needed to build a set of distances and vectors to each destination. This information includes:

– Smallest bandwidth available on a segment used to reach this destination. – Total delay, reliability, and loading of the path. – Minimum MTU used on the path. – The feasible distance of the path. This represents the best metric along the path to the destination network. It includes the metric used to reach the neighbor advertising the path. – The reported distance of the path. This represents the total metric along the path to a destination network as advertised by an upstream neighbor. – The source of the route. EIGRP marks external routes. This provides the ability to implement policy controls that customize routing patterns. An entry in the topology table can have one of two states: – Passive state: The router is not performing a route recomputation for the entry. – Active state: The router is performing a route recomputation for the entry. If a feasible successor exists for a route, the entry never enters this state. This avoids processor-intensive route recomputation. 򐂰 Reliable transport protocol: EIGRP can guarantee the ordered delivery of packets to a neighbor. However, not all types of packets must be reliably transmitted. For example, in a network that supports multicasting, there is no need to send individual, acknowledged hello packets to each neighbor. To provide efficient operation, reliability is provided only when needed. This improves convergence time in networks containing varying speed connections.
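The feasible successor test described for Figure 3-12 can be sketched in a few lines of Python. This is only an illustration of the comparison stated in the text (advertised metric less than the current routing table metric); the function name and the dictionary layout are assumptions made for the example and do not come from any router implementation.

```python
def feasible_successors(current_metric, advertised):
    """Return the neighbors whose advertised metric is lower than the
    metric of the route currently in the routing table.

    advertised maps a neighbor name to the cost that neighbor advertises
    for the destination (hypothetical layout for this sketch).
    """
    return sorted(name for name, metric in advertised.items()
                  if metric < current_metric)


# Values taken from Figure 3-12.
# Router C has a current metric of 40; neighbor E advertises 30.
print(feasible_successors(40, {"E": 30}))
# Router E has a current metric of 30; neighbor C advertises 40.
print(feasible_successors(30, {"C": 40}))
```

With the figure's values, router C accepts router E as a feasible successor, while router E rejects router C, which matches the asymmetry noted above.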

3.5.3 Neighbor discovery and recovery

EIGRP can dynamically learn about other routers on directly attached networks. This is similar to the Hello protocol used for neighbor discovery in an OSPF environment. Devices in an EIGRP network exchange hello packets to verify each neighbor is operational. Like OSPF, the frequency used to exchange packets is based on the network type. Packets are exchanged at five-second intervals on high bandwidth links (for example, LAN segments). Otherwise, hello packets on lower bandwidth connections are exchanged every 60 seconds.

Like OSPF, EIGRP uses a hold timer to remove inactive neighbors. This timer indicates the amount of time that a device will continue to consider a neighbor active without receiving a hello packet from the neighbor.
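As a rough illustration of the hold timer, the following Python sketch drops neighbors whose last hello is older than their hold time. The class and field names are hypothetical, and the hold times used (15 and 180 seconds, three times the hello intervals given above) are an assumption chosen for the example rather than values stated in this guide.

```python
from dataclasses import dataclass


@dataclass
class Neighbor:
    address: str
    hold_time: float   # seconds a neighbor stays active without a hello
    last_hello: float  # time (in seconds) the last hello was received


def active_neighbors(neighbors, now):
    """Keep only neighbors whose hold timer has not expired."""
    return [n for n in neighbors if now - n.last_hello <= n.hold_time]


# Assumed hold times: three times the 5-second and 60-second hello intervals.
lan = Neighbor("10.0.0.2", hold_time=15.0, last_hello=100.0)
wan = Neighbor("10.0.1.2", hold_time=180.0, last_hello=100.0)

# Twenty seconds after the last hello, only the slow-link neighbor remains.
print([n.address for n in active_neighbors([lan, wan], now=120.0)])
```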

3.5.4 The DUAL algorithm

A typical distance vector protocol uses periodic updates to compute the best path to a destination. It uses distance, next hop, and local interface costs to determine the path. Once this information is processed, it is discarded.

EIGRP does not rely on periodic updates to converge on the topology. Instead, it builds a topology table containing the advertisements of each of its neighbors. Unlike a distance vector protocol, this data is not discarded. EIGRP processes the information in the topology table to determine the best paths to each destination network.

EIGRP implements an algorithm known as DUAL (Diffusing Update ALgorithm). This algorithm provides several benefits:

• The DUAL algorithm guarantees loop-free operations throughout the route computation and convergence period.
• The DUAL algorithm allows all routers to synchronize at the same time. This is unlike a RIP environment, in which the propagation of routing updates causes devices to converge at different rates.
• The DUAL algorithm allows routers not involved with a topology change to avoid route recomputation.

The DUAL algorithm is used to find the set of feasible successors for a destination. When an adverse condition occurs in the network, the alternate route is immediately added to the IP routing table. This avoids unnecessary computation to determine an alternate path. If no feasible successor is known, a route recomputation occurs. This behavior is shown in Figure 3-13 on page 71 and Figure 3-14 on page 72.


[Diagram: routers A, B, C, D, and E and the target network, with the link between router A and router C marked as failed (X); (n) = hop cost]

Figure 3-13 Using a feasible successor

In this example, router C uses router E as a feasible successor to reach the target network. If the connection between router A and router C fails, router C will immediately reroute traffic through router E. The new route is updated in the IP routing table.

Route recomputation

A route recomputation occurs when there is no known feasible successor to the destination. The process starts with a router sending a multicast query packet to determine if any neighbor is aware of a feasible successor to the destination. A neighbor replies if it has a feasible successor. If the neighbor does not have a feasible successor, the neighbor may return a query indicating that it is also performing a route recomputation.

Figure 3-14 shows an example of querying to determine a feasible successor. In this example, router E does not have a feasible successor to the target network. When the link connecting router E and router B fails, router E must determine a new path. Router E sends a multicast query to each of its neighbors. Router C has a feasible successor and responds to router E. Router E updates its IP routing table with the new path at a cost of 55.


[Diagram: routers A, B, C, D, and E and the target network, with the link between router E and router B marked as failed (X). Router E asks its neighbors: "Do you have a feasible successor to the target network?" (n) = hop cost]

Figure 3-14 Query for a feasible successor

When the link to a neighbor fails, all routes that used that neighbor as the only feasible successor require a route recomputation.
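The two behaviors illustrated in Figure 3-13 and Figure 3-14 can be summarized in a small Python sketch: fail over at once when a feasible successor is known, otherwise query the neighbors. This is a simplified illustration, not the DUAL state machine itself; the function, its parameters, and the callback used to stand in for the multicast query are all hypothetical.

```python
def on_successor_failure(feasible, link_cost, query_neighbors):
    """Decide how to recover after the current successor is lost.

    feasible        -- {neighbor: reported distance} for known feasible successors
    link_cost       -- {neighbor: cost of the local link to that neighbor}
    query_neighbors -- callable standing in for the multicast query; returns
                       {neighbor: reported distance} for neighbors that reply

    Returns (state entered, next hop, new cost).
    """
    if feasible:
        # A feasible successor exists: the entry stays passive and the
        # alternate route is installed immediately.
        best = min(feasible, key=lambda n: feasible[n] + link_cost[n])
        return "passive", best, feasible[best] + link_cost[best]

    # No feasible successor: the entry goes active and neighbors are queried.
    replies = query_neighbors()
    if not replies:
        return "active", None, None
    best = min(replies, key=lambda n: replies[n] + link_cost[n])
    return "active", best, replies[best] + link_cost[best]


# Figure 3-13: router C loses router A and fails over to router E (30 + 15).
print(on_successor_failure({"E": 30}, {"E": 15}, lambda: {}))
# Figure 3-14: router E loses router B, queries, and router C replies (40 + 15).
print(on_successor_failure({}, {"C": 15}, lambda: {"C": 40}))
```

The second call reproduces the cost of 55 given in the text for router E's new path through router C.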

EIGRP metrics

EIGRP uses a mathematical formula to determine the metric associated with a path. By default, the formula references the minimum bandwidth of a segment used to reach the destination. It also sums the delays on the path. The default formula to determine the metric is:

$$\text{metric} = \left( \frac{10^{7}}{\text{min bandwidth}} + \text{sum of delays} \right) \times 256$$

EIGRP supports the inclusion of other measurements in the metric calculation.
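A worked example may help make the default formula concrete. The sketch below assumes the units commonly used with this formula on Cisco routers, which this section does not state: bandwidth in kilobits per second, delay in tens of microseconds, and integer truncation of the bandwidth term. The function name is hypothetical.

```python
def eigrp_default_metric(min_bandwidth_kbps, delays_tens_of_usec):
    """Default composite metric: (10^7 / min bandwidth + sum of delays) * 256.

    Assumed units (not stated in the text): bandwidth in kbps and each
    delay in tens of microseconds; the division is truncated to an integer.
    """
    bandwidth_term = 10**7 // min_bandwidth_kbps
    return (bandwidth_term + sum(delays_tens_of_usec)) * 256


# A single 1544 kbps link with a delay of 20,000 microseconds (2000 tens of usec):
# (6476 + 2000) * 256
print(eigrp_default_metric(1544, [2000]))  # 2169856
```

Because the bandwidth term uses the smallest bandwidth on the path, a single slow segment raises the metric for the whole path, while delays accumulate hop by hop.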


Chapter 4. Quality of Service

As application data, voice and video traffic converge over a single utilitarian network based on the Internet Protocol, it becomes increasingly important to provide the exact services that each traffic type requires. A network that provides services in terms of the latency, jitter, and packet delivery required by each application and based on proper, established business policies will meet the needs of the organization more efficiently. The result is a cost-effective, converged network offering increased productivity and competitive advantage by allowing applications to be developed, enhanced, and deployed faster than ever.

This chapter will help you to define Quality of Service (QoS) levels within a Cisco network and map z/OS systems and applications traffic to the services based on their requirements. The chapter is organized according to the following topics:

• Overview of QoS protocols
• Steps in QoS deployment
• QoS on the z/OS Communications Server
• Ensuring QoS across the Cisco network
• Managing QoS

Included is information to help you configure the QoS features of Communications Server for z/OS to operate with the QoS features of the Cisco network.

© Copyright IBM Corp. 2002


4.1 Overview of QoS protocols

Various protocols have been developed to tackle the problem of providing end-to-end services based on application requirements. SNA and APPN have evolved over time with the idea of class of service, assigned according to application requirements, central to their development.

Today’s networks are IP-based. Initially, IP was defined to provide only a best-effort delivery service. A network built using no additional QoS mechanisms is still very robust and services many disparate applications, as proven by the popularity and scale of the global Internet. However, as enterprises make heavier use of the Internet and build their own intranets, bandwidth will inevitably become constrained. No longer will the network be able to offer the desired services to each and every application.

Convergence beyond data adds new requirements for traffic delivery. Voice, video, and other digitally encoded streams have real-time transport service requirements that cannot be accommodated without supplemental protocols and policies. QoS protocols, then, are necessary to allow intelligent packet forwarding decisions to be made based on the end-to-end service goals of the application.

4.1.1 Service models

Service levels apply to applications and traffic streams on an end-to-end basis. This is an important point to remember when designing a network with QoS. It is also fundamental to the ability of a network, or a subnetwork, to provide different levels of service without introducing complexity that cannot be managed. Let’s start with a basic definition of three commonly referred-to models of end-to-end service (Figure 4-1):

• Best-effort service

Best-effort service is the type of service provided by all general IP-based networks. The network will deliver data on a first-in, first-out basis as long as resources are available to do so. No guarantees or assurances are made with respect to delay, packet loss, or throughput. You could say a best-effort service lacks any QoS mechanisms.

• Differentiated Services

Differentiated Services involves the handling of individual packets or flows within a network node. Each packet is associated with a particular class of service. Each node along the network path handles packets in a cooperative manner according to a common set of rules, resulting in end-to-end service classes.

• Integrated or Reserved Services

Integrated Services, also known as Reserved or Guaranteed Services, provides the bandwidth and delay characteristics as requested by the application or configured for specific types of traffic.

[Diagram: three service models offered by the network. Best Effort (IP, IPX, AppleTalk): Internet, ubiquitous connectivity. Differentiated (First, Business, Coach Class): some traffic is more important than the rest. Guaranteed (Bandwidth, Delay, Jitter): certain applications require specific network resources.]

Figure 4-1 End to end service models

For levels of service beyond basic IP services or best-effort service, signaling protocols and queuing, traffic shaping and filtering mechanisms are employed to supplement IP and provide the services required by the application. Keep in mind, however, that QoS is not a substitute for necessary bandwidth. Sufficient bandwidth is necessary, and more is certainly better, to minimize congestion within the network. When congestion does occur, QoS mechanisms help to ensure that less critical or more tolerant traffic will encounter network delay before application traffic with real-time requirements or that of a more important business nature.

Integrated or Reserved Services

With Integrated Services, or IntServ, a particular QoS is negotiated at the time it is requested. Resource Reservation Protocol (RSVP) is used to allow an application to request or signal the network to reserve a certain amount of bandwidth with particular QoS criteria, such as minimum latency. RSVP is defined in IETF RFC 2205. RSVP can be used to provide something similar to a dedicated circuit over an IP network. See Figure 4-2 on page 76.

While RSVP provides the highest level of guaranteed services, it is also the most complex of the QoS protocols. This is because the reservation must be done across the entire data path with the state of each node maintained throughout the duration of the connection. This is certainly not a trivial task when thousands of reservation requests are anticipated. You can think of Integrated Services as a superset of the mechanisms necessary for the simpler Differentiated Services.


[Diagram: end-to-end RSVP (Guaranteed Service) between a host and a client host, with weighted fair (WF) queuing separating high bandwidth flows, interactive flows, and reserved flows into best effort/fair and guaranteed service]

Figure 4-2 Integrated services

A host uses RSVP to request a level of service on behalf of the application from the network before data is actually sent. Information is provided relative to the traffic profile the application expects to send and service is requested in terms of required bandwidth and maximum tolerated delay. The network responds to the QoS request based on available resources and then commits to meeting the requested service level or denies the request. Refer to Figure 4-3 on page 77.

Packet scheduler

The packet scheduler manages the forwarding of different packet streams in hosts and routers, based on their service class, using queue management and various scheduling algorithms.

Packet classifier

The packet classifier identifies packets of an IP flow in hosts and routers that will receive a certain level of service.

Admission control

The admission control contains the decision algorithm that a router uses to determine if there are enough routing resources to accept the requested QoS for a new flow.

The Resource Reservation Protocol (RSVP)

The Resource Reservation Protocol (RSVP) is used by Integrated Services to set up and control QoS reservations.

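[Diagram: a host and a router. The host contains an RSVP application, an RSVP process, policy control, application control, a classifier, and a packet scheduler. The router contains an RSVP process, a routing process, policy control, admission control, a classifier, and a packet scheduler. Data passes through the classifier and packet scheduler in each.]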

Figure 4-3 Integrated services operational model

Service classes

The Integrated Services model includes two standard service classes defined by the IETF. Controlled Load Service is defined in RFC 2211 and Guaranteed Service is defined in RFC 2212.

Controlled Load Service provides the data stream with approximately the same QoS as the stream would receive if the network were operating in its optimal, uncongested state with best-effort service. It uses admission control to ensure this service is received even when the network becomes overloaded and packets begin to encounter queuing delays. The language “same as uncongested” is intentionally vague and is meant to imply a rough approximation of an unloaded network’s ability to deliver packets. A Controlled Load Service flow can expect to have a high percentage of packets delivered successfully, and the delay experienced by a high percentage of packets will not be significantly more than the minimum transit delay of any one packet. Controlled Load Service is meant to be used for a wide range of applications, particularly those demanding applications that encounter problems when a network experiences increased delays above the uncongested condition.

Guaranteed Service, on the other hand, provides firm bounds on queuing delays and bandwidth capacity. The ability to provide the required service level is dependent on each node in the path supporting the service in an adequate fashion. Guaranteed Service ensures that packets will arrive at their destination within the guaranteed delivery time and will not be discarded due to queue overflows, provided the offered traffic load stays within its traffic specification. The service is intended for applications that require specific delay characteristics. Consider, for example, a video or audio stream in which data can be buffered only up to a point; if packets arrive later than the buffer allows, playback halts. The service controls only the maximum queuing delay. Guaranteed Service provides approximately the same level of service as that provided by a leased line.

Differentiated Services

Differentiated Services (DiffServ) was developed to allow a network to support multiple service classes without the need to maintain the state of each traffic flow along the path or to perform signaling between nodes. It can, therefore, scale to support the traffic seen in today’s global networks. The network domain manager or administrator defines aggregate traffic service classes, sometimes referred to as the Olympic classes: platinum, gold, silver, and bronze. DiffServ is, therefore, less complex than Integrated or Reserved Services. It is less network intensive and is appropriate for networks of networks even where portions of the network are outside the control of the network domain manager.


[Diagram: DiffServ domains A, B, and C joined by DS ingress/egress nodes and DS boundary nodes that run the TCB process (policer, shaper, LLQ, WRED, MQC). Traffic is sorted into premium, gold, silver, and bronze classes or dropped (bit bucket); the packet color is carried in the DSCP, and the PHB is applied with LLQ/WRED.]

Figure 4-4 DiffServ end-to-end architecture

DiffServ is described in IETF RFC 2474, RFC 2475, RFC 2597 and RFC 2598. DiffServ is meant to handle traffic aggregates. This means that traffic is classified according to the application requirements relative to other application traffic. Each node then handles the traffic using internal mechanisms to control bandwidth, delay, jitter, and packet loss. Through the use of standard per-hop-behaviors (PHBs), packets receive the proper handling and the result is end-to-end QoS. For true end-to-end QoS, each administrative domain must implement cooperative policies and PHBs.

Packets entering a DiffServ domain can be metered, marked, shaped, or policed to implement traffic policies as defined by the administrative authority. This is handled by the DiffServ traffic conditioner block (TCB) function. DiffServ boundary nodes will typically perform traffic conditioning. A traffic conditioner typically classifies the incoming packets into predefined aggregate classes, meters them to determine compliance to traffic parameters, marks them appropriately by writing or re-writing the DSCP, and finally shapes the traffic as it leaves the node.

The DS field

To distinguish the data packets from different customers in DS-capable network devices, the IP packets are modified in a specific field. A small bit pattern, called the DS field, in each IP packet is used to mark the packets that receive a particular forwarding treatment at each network node. The DS field uses the space of the former TOS octet in the IPv4 header and the traffic class octet in the IPv6 header. All network traffic inside a domain receives a service that depends on the traffic class that is specified in the DS field.

The DS field uses six bits to carry the Differentiated Services Code Point (DSCP) as defined in RFC 2474 and RFC 2475. This code point is used by each node in the network to select the PHB. The remaining two bits form a currently unused (CU) field, which is reserved. The value of the CU bits is ignored by Differentiated Services-compliant nodes when they determine the PHB to apply to a received packet. Figure 4-5 shows the structure of the defined DS field.

[Diagram: the eight bits of the DS field, numbered 0 to 7: DSCP (bits 0 to 5) and CU (bits 6 and 7)]

Figure 4-5 DS field

In the event that some nodes in a network recognize only the IP precedence bits, standard DSCP PHBs are constructed in such a way that they remain compatible with IP precedence. For example, the DSCP values can be used such that the values for IP precedence relate to the classes as shown in Table 4-1.

Table 4-1 Relationship between IP precedence and DSCP

RFC 791 precedence                   RFC 2474, RFC 2475 DiffServ
Name                  Bits (value)   Class                  DSCP
Network Control       111 (7)        Preserved              111000
Internetwork Control  110 (6)        Preserved              110000
CRITIC/ECP            101 (5)        Expedited Forwarding   101xxx
Flash Override        100 (4)        Class 4                100xxx
Flash                 011 (3)        Class 3                011xxx
Immediate             010 (2)        Class 2                010xxx
Priority              001 (1)        Class 1                001xxx
Routine               000 (0)        Best Effort            000000
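The relationship in Table 4-1 comes down to bit positions: the three IP precedence bits are the three high-order bits of the six-bit DSCP, and the DSCP occupies the six high-order bits of the DS field octet. The following Python sketch shows the arithmetic. The function names are hypothetical, and the Expedited Forwarding code point 101110 used in the example is taken from RFC 2598 rather than from the table, which lists only the 101xxx pattern.

```python
def precedence_from_dscp(dscp):
    """Return the IP precedence (0-7) seen by a node that reads only the
    three high-order bits of the six-bit DSCP."""
    if not 0 <= dscp <= 0b111111:
        raise ValueError("DSCP must fit in six bits")
    return dscp >> 3


def class_selector_dscp(precedence):
    """Return the DSCP of the form xxx000 that preserves an IP precedence."""
    if not 0 <= precedence <= 7:
        raise ValueError("precedence must fit in three bits")
    return precedence << 3


def ds_field(dscp, cu=0):
    """Build the full DS field octet: six DSCP bits followed by two CU bits."""
    return (dscp << 2) | (cu & 0b11)


ef = 0b101110  # Expedited Forwarding code point (RFC 2598)
print(precedence_from_dscp(ef))            # 5, the CRITIC/ECP row
print(format(class_selector_dscp(7), "06b"))  # 111000, Network Control
print(hex(ds_field(ef)))                   # 0xb8
```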

4.2 Steps in QoS deployment

Network resources and bandwidth are of finite quantity. While there have been many predictions regarding free bandwidth, that is not the case today nor in the foreseeable future. Implementing QoS, then, means that in times of network congestion, some traffic will get better service while other traffic will likely see increased delay and lower throughput. Therefore, it makes sense to think of the network and applications as a system. As with any system, analysis and planning are necessary to ensure efficient operation.

In order to successfully deploy QoS to meet the needs of an enterprise, a step-by-step process is followed prior to implementation. This includes the following steps:

• Traffic audit
• Traffic classification
• Defining policies
• Planning for RSVP configuration (if applicable)

4.2.1 Traffic audit

The first step in designing a network with QoS is to determine the different application requirements and allocate each traffic stream to the required class of service. For many organizations this is not as easy as it sounds. However, it is a critical step. This step can be broken down into the following tasks:

• Network audit

A network audit determines what traffic is present in the network, the capacity required, and the time that it is required.

• Business audit

Here the application traffic is ranked in terms of business importance.

• Application audit

During the application audit, the specific network requirements and traffic characteristics of the applications are determined.

• Normalize ranks and assign service levels

Once the audit process is complete, assignments can be made to correlate business and applications requirements with a set of service levels. It may be necessary to revisit the current network design, since this exercise may have identified requirements beyond what the network is capable of delivering.

4.2.2 Traffic classification

Using the results of the auditing step, specific application traffic or traffic categories are assigned to a number of classes. We suggest you begin with a small number of classes, perhaps using the Olympic reference model of platinum, gold, silver, and bronze classifications. Figure 4-6 illustrates the Olympic scheme.

[Chart: Olympic Model QoS Bandwidth Allocation. Vertical axis: bandwidth percentage (0 to 40). Categories: Platinum, Gold, Silver, Bronze, Default.]

Figure 4-6 Olympic scheme

There are many ways that a given traffic stream can be classified. A class can be assigned by application using such identifiers as Uniform Resource Locators (URLs), IP ports, task or application name. Classes can also be assigned based on network criteria such as origin or destination IP address, MAC address, or a combination. Groups or departments could also be assigned a class based on their IP subnet or network interface, for example. The point is that there are so many ways classes can be assigned that restraint is necessary to minimize the number of classes and assignment mechanisms so that complexity is mitigated.

Two other points are worth mentioning. First, we recommend that you classify, or color, traffic nearest the source at the edge of the network by setting the DSCP bits to properly designated values. Second, avoid host-based or application-based coloring of traffic when possible, since this will inevitably become a problem if not managed properly and may create inconsistencies between traffic streams.

4.2.3 Defining policies for the classes

When defining policies for a class, you are defining the actual end-to-end service you want the traffic stream assigned to the class to receive. Here we are specifying service in terms such as minimum guaranteed bandwidth, the maximum amount of bandwidth this class will be permitted to use, and a priority in relation to other traffic classes.

Important: The goal of your policy definitions, when considered as a system, is to offer an appropriate combination of consistent services that meet the requirements of all traffic streams.

Take network topology into account and plan for future change and growth. Document all policies and classes and the desired relationships between traffic categories.

4.2.4 Planning for RSVP configuration

Bandwidth reservation using RSVP signaling requires careful planning. First, make sure reservation is necessary. DiffServ may be sufficient and simpler to implement. Often, data traffic does not need reserved bandwidth. Only certain applications with particular real-time requirements that cannot be accommodated using DiffServ techniques will use reservation signaling. Usually, these are applications that are multicast in nature. However, RSVP can be used for unicast between two application endpoints.

Consider these questions when planning for RSVP:

• How much bandwidth should be allowed per application flow?
• How much bandwidth must be excluded from RSVP reservations so that normal traffic is serviced properly?
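The two planning questions above translate into two limits: a per-flow cap and a ceiling on the total bandwidth that may be reserved. The Python sketch below models only that arithmetic. It is not router admission control code; the class name, parameter names, and the sample figures (a 1544 kbps link with 1158 kbps reservable and a 500 kbps per-flow cap) are assumptions chosen for illustration.

```python
class RsvpLinkBudget:
    """Track reservations on one link against the two planning limits."""

    def __init__(self, link_kbps, reservable_kbps, max_flow_kbps):
        if reservable_kbps > link_kbps:
            raise ValueError("cannot reserve more than the link carries")
        self.link_kbps = link_kbps
        self.reservable_kbps = reservable_kbps  # ceiling for all reservations
        self.max_flow_kbps = max_flow_kbps      # cap for a single flow
        self.reserved_kbps = 0

    def admit(self, flow_kbps):
        """Accept a reservation only if both limits are respected."""
        if flow_kbps > self.max_flow_kbps:
            return False
        if self.reserved_kbps + flow_kbps > self.reservable_kbps:
            return False
        self.reserved_kbps += flow_kbps
        return True

    def unreservable_kbps(self):
        """Bandwidth that can never be reserved, left for normal traffic."""
        return self.link_kbps - self.reservable_kbps


budget = RsvpLinkBudget(link_kbps=1544, reservable_kbps=1158, max_flow_kbps=500)
print(budget.admit(500), budget.admit(500))  # both fit under the ceiling
print(budget.admit(200))                     # would exceed 1158 kbps in total
print(budget.admit(600))                     # exceeds the per-flow cap
print(budget.unreservable_kbps())            # 386 kbps kept for normal traffic
```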


It is imperative that you understand the traffic mix, network performance characteristics, and application requirements before entering RSVP configuration commands that affect network traffic.

4.3 QoS on the z/OS Communications Server

In the z/OS CS environment, support for Integrated Services is provided by the RSVP Agent. The RSVP Agent queries the Policy Agent for relevant information and communicates with the Cisco router to request the desired QoS on behalf of the application.

Differentiated Services is supported by the z/OS CS Policy Agent (PAGENT). PAGENT reads policy definitions from a local configuration file or a Lightweight Directory Access Protocol (LDAP) server. PAGENT then installs the policies in the z/OS CS stacks as desired. Support for environments with multiple TCP/IP stacks is possible using the configuration techniques described in 4.3.2, “Configuring QoS in z/OS Communication Server” on page 89.

Figure 4-7 shows the relationship between the various z/OS QoS components. Tasks or daemons such as PAGENT and RSVPD work together and with the TCP/IP protocol stack to classify and mark packets for QoS. Data collection points are also available for performance management.


[Diagram: LDAP server (service policy), Policy Agent (maintains policies), RSVP Agent (RSVP flows), QoS-aware and non-QoS-aware applications (data traffic), SNMP subagent (obtains MIB values; SNMP queries, responses, and traps), the TCP/UDP and IP stack with the active service policy (monitors and enforces TCP data rates and connection limits, sets the DS field, collects and maintains performance metrics for enforcement and monitoring as MIB variables), priority-based output queuing (Queued Direct I/O), and interfaces to Cisco 6500 and Cisco 7xxx devices]

Figure 4-7 z/OS CS QoS components

A cooperative framework of host-based components and QoS mechanisms within the Cisco network allows end-to-end service levels to be established.

4.3.1 PAGENT policies

We suggest that when you first implement QoS policies you start with a small number of critical applications or traffic types. Then, as you develop more knowledge of the traffic patterns and interactions, continue to apply a set of service classes to applications or traffic streams as needed.

The Policy Agent (PAGENT) supports the following types of policies:

• Quality of Service (QoS) policies

– Differentiated Services (DS) policies
– Integrated Services (RSVP) policies
– Sysplex Distributor (SD) policies

• Intrusion Detection Services (IDS) policies

– Scan policies
– Attack policies
– Traffic Regulation policies

QoS policies

Policy conditions consist of a variety of selection criteria that act as traffic filters. Traffic can be filtered based on source/destination IP addresses, source/destination ports, protocol, inbound/outbound interfaces, application name, application-specific data or application priority. Only packets that match the filter criteria are selected to receive the accompanying action. Policy rules can refer to several policy actions, but only one policy action is executed per policy scope. A given policy action may be referred to by several policy rules.
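The idea of conditions acting as traffic filters, with a matching rule selecting an action, can be sketched as follows. This Python example is purely conceptual: it does not reproduce Policy Agent configuration syntax or behavior, the class, field, and action names are hypothetical, and the first-match ordering is an assumption made to keep the example small.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass
class PolicyRule:
    """A rule with optional selection criteria; unset criteria match anything."""
    name: str
    action: str
    source_net: Optional[str] = None
    dest_port: Optional[int] = None
    protocol: Optional[str] = None

    def matches(self, packet):
        if self.source_net is not None and \
                ip_address(packet["src"]) not in ip_network(self.source_net):
            return False
        if self.dest_port is not None and packet["dport"] != self.dest_port:
            return False
        if self.protocol is not None and packet["proto"] != self.protocol:
            return False
        return True


def select_action(rules, packet, default="best-effort"):
    """Return the action of the first matching rule (assumed ordering)."""
    for rule in rules:
        if rule.matches(packet):
            return rule.action
    return default


rules = [
    PolicyRule("interactive", "gold", dest_port=23, protocol="tcp"),
    PolicyRule("branch-office", "silver", source_net="10.1.0.0/16"),
]
print(select_action(rules, {"src": "10.1.2.3", "dport": 23, "proto": "tcp"}))
print(select_action(rules, {"src": "10.1.2.3", "dport": 80, "proto": "tcp"}))
print(select_action(rules, {"src": "192.168.5.9", "dport": 80, "proto": "tcp"}))
```

Two rules may refer to the same action name, which mirrors the statement that a given policy action may be referred to by several policy rules.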

Differentiated Services (DS) policies

Policies to be implemented can be configured via the Policy Agent configuration file, in an LDAP server, or both. Once read, the policies are combined into a single list. Policy rules and actions map subsets of outbound traffic to various QoS classes and can be used to create end-to-end Differentiated Services.

Setting DSCP using the Policy Agent

PAGENT policies are defined by rules and actions. The rules consist of a variety of selection criteria to provide a match condition. Matching the rule then triggers the action. The following actions can be performed using Differentiated Services policies:

• Set the DS Field or Type of Service (TOS) byte and map to S/390 Queued Direct I/O (QDIO) device priority
• Committed access bandwidth (mean rate and peak rate) control and enforcement
• TCP connection limits
• Maximum and minimum TCP connection rates, TCP maximum delay

Of particular importance here is the setting of the DS Field. Outbound traffic can be marked with the desired Differentiated Services Code Point (DSCP) value. This marking is then examined by the Cisco router or switch and the appropriate PHB is applied as the packet traverses the network. We see how this is done in Chapter 8, “Implementing QoS in a z/OS and Cisco environment” on page 231.


The host PAGENT can be defined as a started task. Upon startup it reads a configuration data set that contains the commands to configure the Policy Agent.

Integrated Services (RSVP) policies

Although RSVP policies are installed into the TCP/IP stack, they are only used for collecting policy statistics. For policy use and limit enforcement, these policies are requested from the Policy Agent by the RSVP Agent, to apply to RSVP reservation requests from RSVP applications.

RSVP Agent

The RSVP Agent includes an application programming interface called RAPI. The RAPI interface is a set of C language routines that allow a custom application to make enhanced QoS requests. The RSVP Agent then issues RSVP protocol messages to the network. Each router in the path of the data flow may accept or deny the request, depending on the resources available to meet the requirement. If the request is denied, the RSVP Agent returns the decision to the application using the RAPI.

Applications can use the RSVP Agent to establish resource reservations within the network. Reservation requests include a Traffic Specification, or Tspec, that consists of the following values:

• Token bucket mean rate (r)
• Token bucket depth (b)
• Peak rate (p)
• Minimum policed unit (m)
• Maximum packet size (M)
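To show how the Tspec values interact, here is a minimal token bucket sketch in Python. It is not RAPI code and makes several assumptions that this guide does not state: rates are in bytes per second, sizes are in bytes, packets smaller than m are counted as m bytes, and the peak rate p is recorded but not enforced. The class names and sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tspec:
    r: float  # token bucket mean rate (assumed bytes per second)
    b: float  # token bucket depth (assumed bytes)
    p: float  # peak rate (recorded only; not enforced in this sketch)
    m: int    # minimum policed unit (bytes)
    M: int    # maximum packet size (bytes)


class TokenBucket:
    """Check whether packets stay within the mean rate and bucket depth."""

    def __init__(self, tspec):
        self.tspec = tspec
        self.tokens = tspec.b  # the bucket starts full
        self.last = 0.0

    def conforms(self, size, now):
        t = self.tspec
        if size > t.M:
            return False          # larger than the maximum packet size
        size = max(size, t.m)     # small packets are policed as m bytes
        # Refill at the mean rate, never beyond the bucket depth.
        self.tokens = min(t.b, self.tokens + (now - self.last) * t.r)
        self.last = now
        if size > self.tokens:
            return False
        self.tokens -= size
        return True


bucket = TokenBucket(Tspec(r=1000, b=1500, p=2000, m=64, M=1500))
print(bucket.conforms(1500, now=0.0))  # uses the whole initial burst
print(bucket.conforms(1000, now=0.5))  # only 500 bytes have been refilled
print(bucket.conforms(40, now=0.5))    # counted as 64 bytes, fits
```

The depth b bounds the burst a flow may send at once, while r bounds its long-term average; together they describe the traffic profile that a reservation request asks the network to carry.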

The RSVP Agent can be defined as a started task. Upon startup it reads a configuration data set that contains the commands necessary to configure the RSVP Agent. Applications invoke QoS reservations using the RSVP application programming interface (RAPI). Information on RAPI can be found in z/OS V1R2.0 CS: IP Application Programming Interface Guide, SC31-8788.

Sysplex Distributor policies

Sysplex Distributor policies are used to specify a group of target nodes for a given traffic set. For example, suppose you want all FTP traffic from a certain subnet to be assigned to one group of systems and FTP traffic from another subnet to be assigned to a different set of systems. This can be accomplished using SD policies. SD policies establish load balancing rules by combining Workload Manager (WLM) information with the defined SD policy. For more information on load balancing, see Chapter 5, “Load distribution solutions” on page 121.


The goal of SD policy is to limit the target TCP/IP stacks for inbound traffic from a given subnet. The Policy Agent running on SD target nodes within a sysplex can collect network QoS performance data on behalf of policies defined for a target port or application. It then assigns a weight fraction to such a policy. This weight is then used by the SD distributing stack, in conjunction with weights assigned by the Workload Manager, to take QoS performance into consideration for load balancing decisions. The PolicyPerfMonitorForSDR statement in the PAGENT configuration file is used to define the characteristics of the SD policy performance monitoring.

Figure 4-8 illustrates the relationship of SD policies, the distributing stack and the target stacks. The SD policies should be installed in the SD distributing stack, while the SD policy performance monitoring is done by the Policy Agent running on the SD target stacks.

Figure 4-8 Sysplex distributor policies (diagram labels: SD target stacks, each with a Policy Agent, TCP/IP stack statistics on TCP timeouts and retransmissions for the source port, and a QoS weight fraction; CF; SD distributing stack with Policy Agent, SD/WLM interface, SD policies, the list of target XCF addresses for inbound traffic to the target port, and the TRM constrained destination port)

Intrusion Detection Services

Intrusion Detection Services (IDS) support is available to detect and report on network intrusion events. The Traffic Regulation Management (TRM) support provided in V2R10 has been extended and incorporated into the IDS support. IDS policy regulates the types of events to report and provides the definition of several types of events. IDS policy may be defined for scans, attacks and traffic regulation for both TCP and UDP ports.

Notes:
1. IDS policies may be defined only on an LDAP server.
2. Policy Scope TR policies found in a Policy Agent configuration file are compatibly transformed into IDS TR TCP policies by the Policy Agent.
3. pasearch will display the transformed policy.

In this chapter, we are concerned with QoS and therefore do not cover security and intrusion detection policies.

4.3.2 Configuring QoS in z/OS Communication Server

The two components mainly responsible for QoS within z/OS CS are the Policy Agent and the RSVP Agent. In this section, we provide an overview of the configuration steps necessary to use the z/OS CS Policy Agent. PAGENT supports QoS functions other than reading/installing policies, such as Sysplex Distributor policy performance monitoring, and mapping Type of Service (TOS) byte values to outbound interface and virtual LAN (VLAN) user priorities.

The Policy Agent runs in the z/OS environment and reads policy definitions from a local configuration file and/or a central repository that uses the Lightweight Directory Access Protocol (LDAP). The Policy Agent also installs policies in one or more z/OS CS stacks. It can be used to replace existing policies or update them as necessary.

The z/OS CS RSVP Agent provides Integrated Services functions, such as communicating with RSVP Agents on other hosts/routers and reserving resources on certain types of outbound interfaces. The RSVP Agent queries the Policy Agent for policies that relate to RSVP processing.

Basic configuration

Before defining policies, some basic operational characteristics of the Policy Agent need to be configured in the PAGENT configuration file. In this section, we detail the following configuration steps:
1. Define the TcpImage statements
2. Define the appropriate logging level
3. Define security product authorization for the Policy Agent

Define the TcpImage statement(s)

The Policy Agent can be configured to install policies on one or more TCP/IP stacks, or images. Each TCP/IP stack is configured using a TcpImage statement in the main configuration file. A secondary configuration file can be defined for any given stack, a set of stacks can share configuration information in the main configuration file, or a combination of these techniques can be used.

To install different sets of policies to different stacks, configure each image with a different secondary configuration file. In this case, each image can be configured with a different policy refresh interval if desired. The refresh interval used for the main configuration file will be the smallest of the values specified for the different stacks.

Figure 4-9 Multiple stacks, multiple policy definitions (diagram: the PAGENT started procedure, started with START PAGENT, names the first-level configuration file /etc/pagent.conf on its -c parameter and /etc/pagent.env on its STDENV DD statement; the first-level file contains the statements TcpImage TCPIPA /etc/pagent.r2615a.conf, TcpImage TCPIPB /etc/pagent.r2615b.conf, and TcpImage TCPIPC /etc/pagent.r2615c.conf; each second-level image configuration file holds its own policyRule, policyAction, and SetSubnetPrioTosMask statements, and the Policy Agent installs the different policies into each of the TCP/IP stacks TCPIPA, TCPIPB, and TCPIPC)

Note: When the main configuration file is an MVS data set, it is reread at each refresh interval (which is the smallest of the individual stack refresh intervals), regardless of whether it has actually been changed or not. Because PAGENT restarts all stack-related processing when the main configuration file is reread, this effectively makes the refresh interval for all stacks the same as this smallest configured interval.
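The refresh interval rule in this note can be sketched in a few lines of Python. This is only one reading of the note, not Policy Agent code: the function name and the interval values are hypothetical, and the treatment of the non-MVS case (each stack keeps its own interval) is an assumption.

```python
def effective_refresh(stack_intervals, main_is_mvs_dataset):
    """Return (main file interval, per-stack effective intervals), all in seconds.

    stack_intervals maps a TCP/IP image name to its configured refresh interval.
    """
    # The main configuration file uses the smallest configured stack interval.
    main_interval = min(stack_intervals.values())
    if main_is_mvs_dataset:
        # Rereading the main file restarts all stack processing, so every
        # stack is effectively refreshed at the smallest interval.
        return main_interval, {name: main_interval for name in stack_intervals}
    # Assumption: otherwise each stack keeps its own configured interval.
    return main_interval, dict(stack_intervals)
```

For example, with hypothetical intervals of 1800, 600, and 3600 seconds for TCPIPA, TCPIPB, and TCPIPC, the main file is reread every 600 seconds, and with an MVS data set all three stacks are effectively refreshed every 600 seconds.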

To install a common set of policies to a set of stacks, don't specify secondary configuration files for each image. In this case, there is only one configuration file (the main one) and the policy information contained in it is installed to all of the configured stacks. Different refresh intervals can also be configured for each image, but would probably be less useful in this case.

Figure 4-10 Multi-stack, single policy definition (diagram: the PAGENT started procedure, started with START PAGENT, names the first-level configuration file /etc/pagent.conf on its -c parameter and /etc/pagent.env on its STDENV DD statement; the first-level file contains the TcpImage TCPIPA, TcpImage TCPIPB, and TcpImage TCPIPC statements together with the policyRule, policyAction, and SetSubnetPrioTosMask statements; the Policy Agent installs the same policies into all of the TCP/IP stacks; the UNIX environment variable definitions are PAGENT_CONFIG_FILE, PAGENT_LOG_FILE, PAGENT_LOG_FILE_CONTROL, and TZ)

In either case, it is possible that TCP/IP stacks configured to the Policy Agent are not started or even defined. The Policy Agent will fail when trying to connect to those stacks and log appropriate error messages. The Policy Agent does not end when any (or all) stacks end. When the stack(s) are restarted, active policies are automatically reinstalled.


When the Policy Agent is shut down normally (that is, using KILL or STOP), all policies are purged from a stack if the PURGE option was coded on the TcpImage statement for that stack.

The TcpImage statement specifies a TCP/IP image and the associated policy configuration file to be installed to that image. The following example installs the policy control file /tmp/TCPCS.policy to the TCPCS TCP/IP image, after flushing the existing policy control data:

TcpImage TCPCS /tmp/TCPCS.policy FLUSH

Define the appropriate logging level

The LogLevel statement is used to define the amount of information to be logged by the Policy Agent. The default is to log only event, error, console, and warning messages. This might be appropriate for a stable policy configuration, but more information might be required to understand policy processing or debug problems when first setting up policies or when making significant changes. Specify the LogLevel statement with the appropriate logging level in the main configuration file.

Note: The maximum logging level (511) can produce a significant amount of output, especially with large LDAP configurations. This is not a concern if an HFS log file is used, because the Policy Agent uses a set of log files with a finite size in a round-robin configuration (the number and size of these files is controllable with the PAGENT_LOG_FILE_CONTROL environment variable). But when using the syslog daemon as the log file, the amount of log output produced should be taken into consideration.

Define security product authorization for the Policy Agent

Because the Policy Agent can affect system operation significantly, security product authority (for example, RACF) is required to start the Policy Agent. Refer to the EZARACF sample in SEZAINST for sample commands needed to create the profile name and permit users to it.

If Policy Agent clients (that is, pasearch) are not defined as a superuser, then security product authority in the SERVAUTH class for that client must be defined to retrieve policies. These profiles can be defined by TCP/IP stack (that is, TcpImage) and policy type (that is, ptype = QoS or IDS). The profile name has the following form:

EZB.PAGENT.sysname.TcpImage.ptype

where:
• sysname is the system name defined in the sysplex
• TcpImage is the TCP/IP stack name for which policy information is being requested
• ptype is the policy type that is being requested, where:
  – QOS is the Policy QoS
  – IDS is the Policy IDS

Note: Wildcarding is allowed on segments of the profile name.

The Policy Agent checks each client request to verify that the SERVAUTH class is active and that a profile exists for the TcpImage(s) and policy type(s) in the request. If a client's request is for all TcpImages and policy types defined, then the Policy Agent returns information only for those objects for which permission is granted. For example, if the request is for all policy types, and both QoS and IDS policy types are defined, but the user is only granted permission for the QoS policy types, then only QoS policy information will be returned.

The following rules apply:
• If the SERVAUTH class is not active (not RACLISTed), or the profile(s) are absent for a client's request (that is, TcpImage and policy type), then permission is denied and data is not returned.
• If the SERVAUTH class is active, the profile(s) are present for a client's request, and the MVS user (that is, the Policy Agent client) is defined for all profile(s), then permission is granted and data is returned.
• If the SERVAUTH class is active and the profile(s) are present for a client's request, but the MVS user is not defined for all profile(s), then permission is refused and data is not returned.
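The authorization rules above can be summarized as a small decision function. The Python sketch below is one reading of those rules and is not how the Policy Agent or the security product is implemented: the function name and data structures are hypothetical, profiles are modeled as a plain dictionary, and wildcarded profile names are not handled.

```python
def check_request(requested, servauth_active, profiles, user, request_all=False):
    """Return the (TcpImage, ptype) pairs whose policy data may be returned.

    requested: list of (tcp_image, ptype) pairs named in the client request.
    profiles:  dict mapping (tcp_image, ptype) to the set of permitted users.
    """
    if not servauth_active:
        return []  # class not active: permission denied, no data returned
    # A missing profile yields an empty set, so the user is never permitted.
    permitted = [key for key in requested if user in profiles.get(key, set())]
    if request_all:
        # A request for everything returns only what the user is permitted to see.
        return permitted
    # A specific request is all-or-nothing: every profile must permit the user.
    return permitted if len(permitted) == len(requested) else []
```

For instance, a user permitted only to the QOS profile of a stack receives QoS data when asking for all policy types, but receives nothing when explicitly asking for both QOS and IDS.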

Defining policies

Policies consist of several related objects. The main object is the policy rule. A policy rule object refers to one or more policy condition, policy action, or policy time period condition objects, and also contains information on how these objects are to be used. Policy time period objects are used to determine when a given policy rule is active. Active policy objects are related in a way that is analogous to an IF statement in a program. For example:

IF condition THEN action

In other words, when the set of conditions referred to by a policy rule is TRUE, then the policy actions associated with the policy rule are executed.
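The IF condition THEN action relationship can be sketched in Python as follows. This is a conceptual model only: the dictionary layout, the function name, and the use of an hour-of-day check for the time period are assumptions made for illustration, not the Policy Agent's internal representation.

```python
def evaluate(rule, packet, now_hour):
    """Return the rule's actions if the rule is active and all conditions hold."""
    if not rule["active"](now_hour):
        return []  # the time period condition says the rule is not active
    if all(condition(packet) for condition in rule["conditions"]):
        return list(rule["actions"])  # IF conditions THEN actions
    return []


# Hypothetical rule: TCP traffic from source ports 20-21, active 06:00 to 22:00.
ftp_rule = {
    "active": lambda hour: 6 <= hour < 22,
    "conditions": [
        lambda pkt: pkt["protocol"] == 6,
        lambda pkt: 20 <= pkt["source_port"] <= 21,
    ],
    "actions": ["tokenbucket"],
}
```

Calling evaluate(ftp_rule, {"protocol": 6, "source_port": 20}, 9) yields the action list, while the same packet at hour 23, or a packet from another port, yields an empty list.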


Differentiated Services rule

The most common QoS deployment will use rules to map outbound traffic from particular applications into sub-classes. Example 4-1 illustrates this type of policy. The goal of this Differentiated Services policy is to map a subset of the traffic outbound from an FTP server.

Example 4-1 Sample DiffServ rule
PolicyRule diffServ
{
  ProtocolNumberRange     6
  SourceAddressRange      200.50.23.11
  SourcePortRange         20-21
  PolicyActionReference   tokenbucket
  PolicyRulePriority      10
  ConditionTimeRange      20000701000000:20050630235959
  DayOfMonthMask          1111111111111111111111111111111
  DayOfWeekMask           0111110
  TimeOfDayRange          06:00-22:00
}
PolicyAction tokenbucket
{
  PolicyScope                            DataTraffic
  OutgoingTOS                            10000000
  DiffServInProfileRate                  256        # 256 Kbps
  DiffServInProfileTokenBucket           512        # 512 Kbits
  DiffServInProfilePeakRate              512        # 512 Kbps
  DiffServInProfileMaxPacketSize         120        # 120 Kbits
  DiffServOutProfileTransmittedTOSByte   00000000
  DiffServExcessTrafficTreatment         BestEffort
}

This policy is identified as a Differentiated Services policy by the PolicyScope DataTraffic attribute on the PolicyAction statement, as well as the use of several DS-only attributes. The following statements apply to the example in this section:
• The policy rule selects TCP traffic originated by ports in the range 20-21 (the FTP outbound data connection uses port 20) from the source address 200.50.23.11.
• The policy rule is active on weekdays between 6 a.m. and 10 p.m. local time, between the dates 7/1/2000 and 6/30/2005.
• The policy action specifies that the TOS byte be set to '10000000' for traffic that conforms to this policy.
• The action establishes a token bucket traffic conditioner with a mean rate of 256 kilobits per second, a peak rate of 512 kilobits per second, and a burst size of 64 kilobytes (the 512 kilobit token bucket). Any traffic that exceeds these specifications will be sent as best effort, with an accompanying TOS byte of '00000000'.
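The values in Example 4-1 can be checked with a little arithmetic. The Python sketch below decodes the TOS byte, converts the token bucket size, and expands the day-of-week mask; the helper names are hypothetical, and the assumption that the mask starts on Sunday is inferred from the statement that the rule is active on weekdays.

```python
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def decode_tos(bits):
    """Split an 8-bit TOS string into IP precedence (top 3 bits) and DSCP (top 6 bits)."""
    value = int(bits, 2)
    return {"precedence": value >> 5, "dscp": value >> 2}


def kbits_to_kbytes(kbits):
    """Convert kilobits to kilobytes (8 bits per byte)."""
    return kbits / 8


def active_days(mask):
    """Expand a 7-character DayOfWeekMask, assuming the first position is Sunday."""
    return [day for day, bit in zip(DAYS, mask) if bit == "1"]
```

Here decode_tos("10000000") gives precedence 4 and DSCP 32, kbits_to_kbytes(512) gives the 64 kilobyte burst size, and active_days("0111110") gives Monday through Friday.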

Example 4-2 shows another example of a DS policy.

Example 4-2 Web policy sample
PolicyRule web-catalog          # web catalog traffic
{
  protocolNumberRange     6
  SourcePortRange         80
  ApplicationData         /catalog
  policyActionReference   interactive1
}
PolicyAction interactive1
{
  policyScope   DataTraffic
  outgoingTOS   10000000
}

The goal of this policy is to ensure that outgoing data that matches the specified attributes will be assigned a QoS service level defined in action "interactive1". The following statements apply to the example in this section:
• This rule will only match traffic on TCP connections (protocol 6) with a source port of 80 (that is, HTTP server) and application-defined data beginning with the string "/catalog".
• Since we are dealing with HTTP traffic, this rule is basically indicating that all outgoing traffic associated with a URI that begins with "/catalog" should be managed using the DS characteristics specified in the "interactive1" policy action.

RSVP policy example

The goal of this RSVP policy is to establish limits on resource reservations requested by RSVP applications using the RSVP API (RAPI) interface. The policy is identified as an RSVP policy by the PolicyScope attribute on the PolicyAction statement, as well as the use of RSVP-only attributes.

Example 4-3 RSVP policy sample
PolicyRule intserv
{
  SourcePortRange         8000 8001
  ProtocolNumberRange     6
  PolicyActionReference   intserv1
  PolicyActionReference   intserv2
}
PolicyAction intserv1
{
  PolicyScope   DataTraffic
  OutgoingTOS   01100000
}
PolicyAction intserv2
{
  PolicyScope             RSVP
  OutgoingTOS             01100000
  FlowServiceType         ControlledLoad
  MaxRatePerFlow          400        # 50000 bytes/second
  MaxTokenBucketPerFlow   48         # 6000 bytes
  MaxFlows                10
}

The following statements apply to Example 4-3:
• The policy rule selects traffic from source ports in the range 8000 to 8001, with a protocol ID of 6 (TCP).
• The DataTraffic policy action specifies that the TOS byte be set to '01100000' for Differentiated Services traffic that conforms to this policy. Essentially, any traffic sent by the target application without an RSVP reservation in place will use this policy action. Once an RSVP reservation is in place, the RSVP action gets used.
• The RSVP policy action specifies that the TOS byte be set to '01100000' while an RSVP reservation is in place. It also limits the type of RSVP service requested by RSVP applications to Controlled Load. Applications requesting Guaranteed Service are "downgraded" to using Controlled Load Service. In addition, the action limits the mean rate and token bucket size to 50000 bytes per second and 6000 bytes, respectively. These values are requested by RSVP applications in the traffic specification, or Tspec.
• The action also limits the number of active RSVP flows that map to this policy to 10.
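The limits in Example 4-3 can be restated as a short Python sketch. It converts the configured values (400 and 48, in kilobits, taking 1 kilobit as 1000 bits so that the results match the comments in the example) into bytes and reports how a reservation request compares with them. The function and key names are hypothetical, and what the RSVP Agent actually does with a request that exceeds a limit is not modeled here.

```python
MAX_RATE_BYTES = 400 * 1000 // 8    # MaxRatePerFlow 400 -> 50000 bytes/second
MAX_BUCKET_BYTES = 48 * 1000 // 8   # MaxTokenBucketPerFlow 48 -> 6000 bytes
MAX_FLOWS = 10                      # MaxFlows 10


def check_reservation(service, rate_bytes_per_s, bucket_bytes, active_flows):
    """Compare a reservation request with the intserv2 limits of Example 4-3."""
    # Guaranteed Service requests are downgraded to Controlled Load.
    granted = "ControlledLoad" if service == "Guaranteed" else service
    return {
        "service": granted,
        "rate_within_limit": rate_bytes_per_s <= MAX_RATE_BYTES,
        "bucket_within_limit": bucket_bytes <= MAX_BUCKET_BYTES,
        "flow_slot_available": active_flows < MAX_FLOWS,
    }
```

A Guaranteed Service request for 60000 bytes per second, for example, is reported as Controlled Load with a mean rate above the 50000 bytes per second limit.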

For complete information about defining policies and configuring the PAGENT, refer to z/OS V1R2.0 CS: IP Configuration Reference, SC31-8776 and z/OS V1R2.0 CS: IP Configuration Guide, SC31-8775.


Configuring the RSVP Agent

To configure the RSVP Agent, you must first authorize the RSVP Agent using the security product. See SEZAINST(EZARACF) for SAF considerations for started tasks. Then update the configuration file to specify RSVP Agent operational parameters using the LogLevel, TcpImage, Interface and RSVP statements. Example 4-4 shows a sample RSVP configuration file.

Example 4-4 Sample RSVP configuration file
Interface 10.11.12.13 Disabled
{
}
Interface 200.1.1.1 Enabled
{
  TrafficControl Disabled
}
Interface Others Enabled
{
}
Rsvp All Enabled
{
  MaxFlows 50
}

This example:
• Runs the RSVP Agent on the stack selected using the standard resolver search order, because a TcpImage statement is not configured.
• Disables RSVP processing on interface 10.11.12.13, while enabling it for all other interfaces.
• Disables traffic control on interface 200.1.1.1. This means that no reservations will be made on this interface.
• Allows a maximum of 50 active RSVP flows per interface.

For complete information on how to configure the RSVP Agent, refer to z/OS V1R2.0 CS: IP Configuration Reference, SC31-8776 and z/OS V1R2.0 CS: IP Configuration Guide, SC31-8775.

4.4 Ensuring QoS across the Cisco network

Cisco IOS contains many features to support QoS and ensure that all types of applications and traffic streams receive the network services they require. Remember, adequate bandwidth is still a necessity. QoS cannot overcome an inadequately designed network.


4.4.1 Cisco IOS QoS support features

The Cisco IOS software is the result of an evolution. Early networking devices performed simple store-and-forward services. Today, Cisco IOS software can recognize, classify, and set priority for network traffic, optimize routing, and support multimedia applications such as voice, video, and e-learning applications.

Cisco has optimized the configuration method for defining the Quality of Service parameters of the network and created the Modular QoS CLI (MQC). This is a command-line interface specifically designed for QoS configuration. The MQC allows you to specify a traffic class independently of QoS policies. You then create traffic policies and attach these policies to interfaces. A traffic policy contains a traffic class and one or more QoS features. A traffic class is used to classify traffic, while the QoS features in the traffic policy determine how to treat the classified traffic. We discuss configuration in more detail later. For now, understand that use of the Modular QoS CLI is the recommended method for configuration of QoS features.

The specific features relative to QoS can be categorized under the following headings:
• Classification
• Congestion management
• Congestion avoidance
• Policing and shaping

Classification

As mentioned in 4.2.2, “Traffic classification” on page 82, classification is necessary to determine how policies should be applied to various data streams. The Cisco features pertaining to classification of application traffic are described below.

Policy-Based Routing (PBR)

PBR is one of the ways Cisco IOS permits the setting of IP precedence values for classification of traffic. PBR also allows you to control routing based on the defined traffic classes. For example, with PBR you might define a special path for certain traffic to take, possibly over a secure link. With PBR, traffic is identified through the use of extended access control lists (ACLs). For more information on ACLs, refer to Chapter 8, “Implementing QoS in a z/OS and Cisco environment” on page 231. Policies can be based on IP address, port numbers, protocols, or packet size. Any or all of these criteria can be specified. Route maps are used to supplement existing routing mechanisms and implement policy-based routing.

Committed Access Rate (CAR)

CAR is actually a combination of classification and policing. CAR can be used to mark traffic using IP precedence, again using access control lists and criteria such as physical port, source or destination IP address, source or destination MAC address, application port, protocol, and so on. When a packet is classified outside the network, by a host for example, the network can accept or override that classification according to a policy defined using CAR. CAR has been superseded by the introduction of class-based classification and policing methods. The use of class-based configuration using the MQC is preferred going forward.

Class-Based Packet Marking

This IOS feature is another method for marking packets for QoS. Class-Based Packet Marking supports the following RFCs:
• RFC 2474, Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers
• RFC 2475, An Architecture for Differentiated Services
• RFC 2597, Assured Forwarding PHB Group
• RFC 2598, An Expedited Forwarding PHB

The Class-Based Packet Marking feature provides a user-friendly command-line interface (CLI) for efficient packet marking. Network administrators can differentiate packets based on the designated markings. The Class-Based Packet Marking feature can:
• Mark packets by setting the IP precedence bits or the IP Differentiated Services code point (DSCP) in the IP type of service (TOS) byte.
• Mark packets by setting the Layer 2 Class of Service (CoS) value.
• Associate a local Quality of Service (QoS) group value with a packet.
• Set the Cell Loss Priority (CLP) bit setting in the ATM header of a packet from 0 to 1.
• Set the Frame Relay Discard Eligibility (DE) bit in the address field of the frame relay frame from 0 to 1.

If you need to mark packets in your network and all of the devices support IP DSCP marking and matching, we suggest you use the IP DSCP marking to mark your packets.


Network-Based Application Recognition (NBAR)

NBAR is a feature of IOS that performs protocol discovery and analyzes traffic patterns in real time. The discovered traffic is classified using information known about certain applications. Although access control lists can be used to identify traffic, NBAR is easier to configure. NBAR also allows classification of applications that dynamically assign TCP/UDP port numbers, and classification of HTTP traffic by URL, host, or MIME type. Using NBAR, downstream actions, or per-hop behaviors, can be invoked based on QoS policies, random early detection, class-based queuing, and policing.

Applications not currently visible by NBAR discovery can be added using Packet Description Language Modules (PDLMs). A PDLM can be used to extend the NBAR list of recognized protocols and is loaded at runtime without requiring a router reload. PDLMs are supplied by Cisco and available for download via the Web.

Congestion management

Network devices handle congestion management through the use of queuing mechanisms. Queuing algorithms sort the traffic and then apply prioritization to determine which packets should be sent first. The QoS features of Cisco IOS offer four types of queuing algorithms:
• First-In, First-Out (FIFO)
  FIFO queuing offers no additional QoS mechanisms above standard best-effort IP service.
• Weighted Fair Queuing (WFQ)
  WFQ divides bandwidth across queues of traffic based on weights. There are five subtypes or flavors of WFQ:
  – Flow-based WFQ (WFQ)
  – Distributed WFQ (DWFQ)
  – Class-Based WFQ (CBWFQ)
  – Distributed Class-Based WFQ (DCBWFQ)
  – Low Latency Queuing (LLQ)
• Custom Queuing (CQ)
  With CQ, bandwidth is allocated proportionately for each class of traffic.
• Priority Queuing (PQ)
  With PQ, traffic belonging to one class is sent before all lower priority traffic.

Each queuing technique is discussed in more detail on the following pages.


FIFO queuing

As previously stated, FIFO queuing is really an absence of QoS and, therefore, is not very interesting to us since we are concerned with the implementation of QoS. With FIFO queuing, packets are stored during periods of delay or congestion and forwarded in the order they were received once the congestion clears. FIFO queuing makes no decisions relative to priority and provides no protection for applications that attempt to overrun the network. Bursty sources, therefore, can cause high delays for other traffic flows, resulting in poor and inconsistent response times.

Flow-Based Weighted Fair Queuing (WFQ)

WFQ is the default queuing method for all physical interfaces with bandwidth less than 2.048 Mbps, except certain interfaces where it does not apply. WFQ is a flow-based algorithm that classifies traffic into conversations and applies priority, or weights, to determine how much bandwidth each conversation is allowed relative to the others. WFQ is effective at ensuring each flow or conversation gets its fair share of the available bandwidth and requires little or no configuration.

It is also important to note that WFQ is aware of IP precedence settings within the packets that it classifies. Therefore, as IP precedence values increase, so does the amount of bandwidth assigned to the traffic flow. WFQ assigns a weight to each flow, which determines the transmit order for queued packets. In this scheme, lower weights are served first. IP precedence serves as a divisor to this weighting factor. For instance, traffic with an IP precedence field value of 7 gets a lower weight than traffic with an IP precedence field value of 3, and thus has priority in the transmit order.

Suppose, for example, that you have one flow at each precedence level (0 through 7) on an interface. Each flow gets precedence + 1 parts of the link, and the total number of parts is:

1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 = 36

The flows get 8/36, 7/36, 6/36, and 5/36 of the link, and so on. However, if you have 18 precedence level 1 flows and one flow at each of the other levels, the total becomes:

1 + 18 × 2 + 3 + 4 + 5 + 6 + 7 + 8 = 70

The single flows get 8/70, 7/70, 6/70, 5/70, 4/70, 3/70, and 1/70 of the link, and each of the 18 precedence level 1 flows gets 2/70 of the link.

WFQ is also RSVP aware. RSVP uses WFQ to allocate buffer space and schedule packets, and guarantees bandwidth for reserved flows.
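The share arithmetic above can be reproduced with a short Python sketch. It models only the "precedence + 1 parts" rule described in the text, not the actual weight values that IOS computes internally; the function name is hypothetical.

```python
from fractions import Fraction


def wfq_shares(flow_counts):
    """Return the per-flow link share for each precedence level.

    flow_counts maps an IP precedence value (0-7) to the number of flows at that level.
    Each flow receives (precedence + 1) parts of the link.
    """
    total_parts = sum((prec + 1) * count for prec, count in flow_counts.items())
    return {prec: Fraction(prec + 1, total_parts) for prec in flow_counts}


# One flow at each precedence level: 36 parts in total.
one_each = wfq_shares({prec: 1 for prec in range(8)})

# Eighteen precedence 1 flows and one flow at every other level: 70 parts in total.
counts = {prec: 1 for prec in range(8)}
counts[1] = 18
crowded = wfq_shares(counts)
```

In the first case the precedence 7 flow receives 8/36 of the link; in the second it receives 8/70, and each precedence 1 flow receives 2/70.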


Figure 4-11 Weighted fair queuing (diagram: traffic destined for the interface is classified into a configurable number of "flow" queues, scheduled by weighted fair scheduling into the transmit queue, and passed to the output hardware. Flow-based classification is by source and destination address, protocol, and session identifier (port/socket). The weight is determined by requested QoS (IP precedence, RSVP), Frame Relay FECN, BECN, and DE (for Frame Relay traffic), and flow throughput (weighted-fair). The queues manage interface buffer resources, and the scheduler allocates a "fair" proportion of link bandwidth.)

Distributed Weighted Fair Queuing (DWFQ)

DWFQ applies to Virtual Interface Processor (VIP) hardware available for the 7000 and 7500 series routers. To use DWFQ, distributed Cisco Express Forwarding (dCEF) must be enabled on the interface. DWFQ is different from the WFQ that runs on other platforms. There are two forms of DWFQ: flow-based and class-based. With flow-based DWFQ, each conversation, or flow, is allocated an equal share of the available bandwidth.


With class-based DWFQ, packets are assigned to different queues based on their QoS group or the IP precedence in the TOS byte.

Class-Based Weighted Fair Queuing (CBWFQ)

CBWFQ allows you to define service classes based on match criteria. Packets are matched on criteria such as protocol, access control lists (ACLs), input interfaces, IP precedence, and DSCP values. All packets that match the defined criteria constitute that class and are assigned service characteristics that characterize the class. The service characteristics you configure are bandwidth or bandwidth percentage and, if necessary, queue limit.

Each class will have its own queue reserved for it. The assigned bandwidth is the amount of bandwidth guaranteed during periods of congestion. If a queue reaches its configured queue limit, packets are dropped from the tail of the queue or using random early detection depending on how the class policy is configured (see “Weighted Random Early Detection (WRED)” on page 107).

CBWFQ extends Weighted Fair Queuing (WFQ). With WFQ, flows are classified, or grouped, according to packets with common source IP address, destination IP address, source TCP or UDP port number, or destination TCP or UDP port number. WFQ allocates an equal share of bandwidth to each flow. CBWFQ uses the bandwidth you assigned to the class when you configured it to calculate an internal weight. In this sense, the weight for a class is user-configurable. The available bandwidth, then, is shared based on these weights.
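The class matching and queue limit behavior described above can be sketched in Python as follows. This is a simplified model for illustration and not Cisco IOS code: the class and function names are hypothetical, only tail drop is modeled (not WRED), and the scheduling of the class queues by weight is left out.

```python
from collections import deque


class TrafficClass:
    """A user-defined class: a match function, a bandwidth, and a bounded queue."""

    def __init__(self, name, match, bandwidth_kbps, queue_limit):
        self.name = name
        self.match = match                  # callable(packet) -> bool
        self.bandwidth_kbps = bandwidth_kbps
        self.queue_limit = queue_limit      # maximum number of queued packets
        self.queue = deque()


def enqueue(classes, packet):
    """Place the packet in the first matching class queue.

    Returns (class name, True) if queued, (class name, False) if tail-dropped,
    or None if no user-defined class matches the packet.
    """
    for traffic_class in classes:
        if traffic_class.match(packet):
            if len(traffic_class.queue) >= traffic_class.queue_limit:
                return traffic_class.name, False   # queue full: tail drop
            traffic_class.queue.append(packet)
            return traffic_class.name, True
    return None
```

A class that matches, say, DSCP 32 with a queue limit of 2 accepts two packets and tail-drops the third until the queue drains.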

Distributed Class-Based WFQ (DCBWFQ)

DCBWFQ extends the standard WFQ function and provides support for user-defined classes on the VIP hardware available for the 7000 and 7500 series routers. The maximum number of packets allowed to accumulate in a traffic class queue is called the queue limit. Packets for a traffic class are subject to the bandwidth allocation and the queue limits that characterize the class. Once a traffic class queue has reached its queue limit, additional packets will be dropped from the queue (tail drop or WRED, depending on service policy configuration).

Low Latency Queuing (LLQ)

LLQ is primarily used for Voice over IP (VoIP) traffic. LLQ provides priority queuing within Class-Based Weighted Fair Queuing. It enables use of a single, strict priority queue within CBWFQ at the class level, so voice traffic will not be subjected to delay and jitter.


While it is possible to map traffic classes other than voice to the strict priority queue, we strongly suggest that you refrain from doing so. Voice traffic, unlike other application traffic that may also require high priority, is well behaved. Allowing this queue to be used for other traffic will likely have an impact on the quality of voice traffic.

Custom Queuing (CQ)

With CQ, you allocate bandwidth (queues) to all traffic classes. The size of the queue is defined by the configured packet-count, which controls the bandwidth access.

Figure 4-12 Custom queuing (diagram: traffic destined for the interface is classified into up to 16 class queues, whose length is defined by the queue limit; the queues are served by weighted round robin (byte count) into the transmit queue according to a link utilization ratio, shown as 1/10, 1/10, 3/10, 2/10, and 3/10, and passed to the output hardware. Classification is by protocol (IP, IPX, AppleTalk, SNA, DECnet, Bridge, and so on) and incoming interface (E0, S0, S1, and so on). Interface hardware: Ethernet, Frame Relay, ATM, serial link. The queues manage interface buffer resources, and the scheduler allocates the configured proportion of link bandwidth.)

CQ ensures that no application, or specified group of applications, receives more than a predetermined proportion of overall capacity when the line is congested. CQ is statically configured and does not automatically adapt to changing network conditions.
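The round-robin service that gives each custom queue its proportion of the link can be sketched in Python. This is a simplified model under stated assumptions, not the IOS scheduler: the function name is hypothetical, each queue holds packet sizes in bytes, and a queue is served until the bytes sent in the current round reach its configured byte count (so the last packet may overshoot the count).

```python
from collections import deque


def custom_queue_round(queues, byte_counts):
    """Serve every queue once, in order, and return the (queue index, size) pairs sent.

    queues:      list of deques holding packet sizes in bytes.
    byte_counts: per-queue byte count that controls its share of the link.
    """
    sent_order = []
    for index, (queue, limit) in enumerate(zip(queues, byte_counts)):
        sent = 0
        # Keep sending whole packets until this queue's byte count is reached.
        while queue and sent < limit:
            size = queue.popleft()
            sent += size
            sent_order.append((index, size))
    return sent_order
```

With byte counts of 1000 and 1500, a queue of three 500-byte packets sends two of them per round, and a queue of 1500-byte packets sends one per round.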


Priority Queuing (PQ)

PQ enforces a strict priority. With PQ, one type of traffic will be sent at the expense or degradation of all others. Low priority traffic has the potential to be starved out if the critical traffic rate is high.

Figure 4-13 Priority queuing (diagram: traffic destined for the interface is classified by protocol, such as IP, IPX, AppleTalk, SNA, DECnet, or Bridge, or by incoming interface, such as E0, S0, or S1, into four class queues, High, Medium, Normal, and Low, whose length is defined by the queue limit; absolute priority scheduling allocates link bandwidth by source priority to the transmit queue and the output hardware: Ethernet, Frame Relay, ATM, or serial link)

In PQ, each packet is placed in one of four queues (High, Medium, Normal, or Low) based on an assigned priority. Packets that are not classified fall into the Normal queue. During transmission, the algorithm gives the higher priority queues absolute preferential treatment over the lower priority queues. This is a simple and intuitive approach, but it has the potential to cause excessive delays for lower priority traffic. Higher-priority traffic can be rate-limited to avoid this problem.
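The strict-priority behavior described above, including the risk of starvation, can be illustrated with a short Python sketch. The class and packet names are hypothetical and the model ignores queue limits; it only shows the scheduling rule.

```python
from collections import deque

# Queue names in strict service order, highest first.
PRIORITIES = ("high", "medium", "normal", "low")


class PriorityQueuing:
    """Toy model of four-queue strict priority scheduling (illustrative only)."""

    def __init__(self):
        self.queues = {name: deque() for name in PRIORITIES}

    def enqueue(self, packet, priority=None):
        # Packets that are not classified fall into the Normal queue.
        self.queues[priority or "normal"].append(packet)

    def dequeue(self):
        # Always serve the highest-priority queue that has a packet waiting.
        for name in PRIORITIES:
            if self.queues[name]:
                return self.queues[name].popleft()
        return None


pq = PriorityQueuing()
pq.enqueue("bulk-transfer", "low")
pq.enqueue("unclassified")
pq.enqueue("sna-interactive", "high")
print([pq.dequeue() for _ in range(3)])
```

As long as the High queue keeps receiving packets, `dequeue` never reaches the lower queues, which is the starvation effect the text warns about.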


Queuing comparison
Table 4-2 summarizes the differences among several possible queuing methods.

Table 4-2 Queuing method comparison

Number of queues
  Flow-based WFQ:    Configurable, default 256
  CBWFQ:             1 per class, up to 64 classes
  Custom queuing:    16
  Priority queuing:  4

Kind of service
  Flow-based WFQ:    Ensures fairness among all traffic flows based on weights. Strict priority queuing is available through use of the IP RTP Priority or Frame Relay IP RTP Priority features.
  CBWFQ:             Provides class bandwidth guarantee for user-defined traffic classes. Provides flow-based WFQ support for non-user-defined traffic classes. Strict priority queuing is available through use of the IP RTP Priority, Frame Relay IP RTP Priority, LLQ, Distributed LLQ, and LLQ for frame relay features.
  Custom queuing:    Round-robin service.
  Priority queuing:  High priority queues are serviced first. Absolute prioritization; ensures critical traffic of highest priority through use of the Frame Relay PVC Interface Priority Queuing feature.

Configuration
  Flow-based WFQ:    None
  CBWFQ:             Required
  Custom queuing:    Required
  Priority queuing:  Required

CBWFQ and LLQ, along with the Distributed versions of these for the 7500, are the class-based queuing mechanisms. We recommend you use these features going forward. CQ and PQ are deemed legacy queuing features and are less desirable than using the class-based techniques available in current releases of Cisco IOS software.

Congestion avoidance While congestion management techniques are used to manage traffic after congestion has occurred, congestion avoidance techniques are used to prevent congestion. Cisco networks use Weighted Random Early Detection (WRED) as their primary congestion avoidance tool.


Weighted Random Early Detection (WRED) When congestion occurs, excess traffic is dropped from the end of the queue. This is called tail drop. Tail drop treats all traffic equally and does not differentiate between classes of service. It causes a phenomenon called global synchronization. Global synchronization occurs when multiple TCP hosts reduce their transmission rates in response to packet dropping, and then increase their transmission rates once again when the congestion is reduced. This causes inefficient use of bandwidth.

Figure 4-14 Global synchronization (diagram: queue utilization, up to 100%, plotted against time under tail drop; three traffic flows start at different times, another traffic flow starts later, and over time the flows become synchronized)

WRED is a congestion-avoidance technique that monitors the link and, as congestion builds, starts dropping a small number of packets from selected traffic, avoiding bottlenecks and global synchronization on an interface. It is intended for TCP traffic. WRED calculates the drop probability for a packet based on its IP precedence or DSCP value and the associated minimum and maximum thresholds and mark probability values. This setup allows differential treatment for different traffic types.

Note: If you use WRED packet drop instead of tail drop for one or more classes in a policy map, you must not configure WRED on the interface where you apply the service policy.
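How the thresholds and the mark probability combine into a drop probability can be sketched as follows. This is a simplified model under stated assumptions: the drop probability rises linearly between the minimum and maximum thresholds up to 1/(mark probability denominator), and everything above the maximum threshold is dropped. The function name and the threshold values are hypothetical.

```python
def wred_drop_probability(avg_queue, min_threshold, max_threshold, mark_prob_denominator):
    """Return the probability of dropping an arriving packet.

    avg_queue             -- average queue depth in packets
    min_threshold         -- below this depth nothing is dropped
    max_threshold         -- above this depth everything is dropped
    mark_prob_denominator -- 1/denominator is the drop probability at max_threshold
    """
    if avg_queue < min_threshold:
        return 0.0
    if avg_queue > max_threshold:
        return 1.0
    fraction = (avg_queue - min_threshold) / (max_threshold - min_threshold)
    return fraction / mark_prob_denominator


# Hypothetical profiles: a lower-precedence class starts dropping earlier.
for label, min_th in (("precedence 0", 20), ("precedence 5", 30)):
    print(label, wred_drop_probability(32, min_th, 40, 10))
```

Giving each precedence or DSCP value its own minimum threshold is what makes the treatment differential: at the same average depth, the lower-priority class sees a higher drop probability.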


Policing and shaping
Policing and shaping are used to modify the flow of network traffic. They ensure that a packet, or data source, adheres to a stipulated contract and determine the QoS for the packet. Policing and shaping mechanisms use the traffic descriptor for a packet, indicated by the classification of the packet, to ensure this adherence and service. Policers and shapers usually identify traffic descriptor violations in an identical manner. They usually differ, however, in the way they respond to violations, for example:
• A policer typically will either drop the packet or rewrite its IP precedence by resetting the TOS bits in the packet header.
• A shaper typically delays excess traffic using a buffer or queuing mechanism to hold packets and shape the flow when the data rate of the source is higher than expected.

Policing can be used to limit the input or output transmission rate of a class of traffic based on user-defined criteria. Policing can also be used to color packets using the various means for marking QoS, such as IP Differentiated Services Code Point (DSCP) values, IP precedence bits, and so on. Policing is used at the network edge to limit traffic coming into or going out of interfaces so the traffic rate adheres to a desired set of parameters. Traffic that falls within the specification is forwarded. Traffic that exceeds the specification is either dropped or transmitted with a different priority.

The primary reason to use traffic shaping is to control access to available bandwidth and to regulate the flow of traffic in order to prevent network congestion. Traffic shaping allows you to control outbound network traffic so that the flow corresponds to a desired rate. The rate can be shaped to be equal to that of a remote interface at a branch office, for example. This action helps eliminate bottlenecks that can occur when upstream interfaces operate at speeds higher than downstream interfaces. You can also use traffic shaping to conform to subrate speeds, for example, to partition a 45 Mbps link into smaller channels. Traffic shaping can be used to prevent packet loss for latency-bounded traffic such as voice.

Cisco IOS QoS incorporates different traffic shaping techniques:
• Generic Traffic Shaping (GTS)
• Class-Based Traffic Shaping (CBTS)
• Frame Relay Traffic Shaping (FRTS)
• Distributed Traffic Shaping (DTS)

GTS and CBTS use a weighted fair queue to delay packets in order to shape the flow. DTS and FRTS use either a priority queue, a custom queue, or a FIFO queue, depending on how the router IOS is configured. These are explained in further detail in the following sections.

Generic Traffic Shaping (GTS) GTS is configured on a per-interface basis. Access control lists can be used to select traffic to be shaped. GTS uses the token bucket mechanism to reduce outbound traffic flow, as shown in Figure 4-15.

Figure 4-15 Token bucket mechanism used for shaping (diagram: traffic destined for the interface is classified by extended access control list (ACL); matching traffic passes through "token bucket" shaping at the configured rate and a WFQ queuing method, while non-matching traffic bypasses the shaper; outgoing packets then go to the transmit queue)

Tokens are put into the bucket at a specified constant rate. Each token is permission for the source to send a certain number of bits into the network. The bucket itself has a specified capacity. If the bucket fills to capacity, newly arriving tokens are discarded. To transmit a packet, the regulator must remove from the bucket a number of tokens equal in representation to the packet size. If there are not enough tokens in the bucket to send a packet, the packet either waits until the bucket has enough tokens or the packet is discarded. If the bucket is already full of tokens, incoming tokens overflow and are not available to future packets. At any time, the largest burst a source can send into the network is roughly proportional to the size of the bucket.
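The token bucket rules just described can be written out as a small Python sketch. It is a minimal model under stated assumptions: one token stands for one bit, the bucket starts full, and the caller supplies the clock. The class name and the numbers are hypothetical.

```python
class TokenBucket:
    """Minimal token bucket; one token is permission to send one bit (assumption)."""

    def __init__(self, rate_bps, capacity_bits):
        self.rate_bps = rate_bps            # tokens added per second
        self.capacity_bits = capacity_bits  # bucket size, which bounds the largest burst
        self.tokens = capacity_bits         # assume the bucket starts full
        self.last_time = 0.0

    def _refill(self, now):
        # Tokens arrive at a constant rate; tokens beyond the capacity are discarded.
        elapsed = now - self.last_time
        self.tokens = min(self.capacity_bits, self.tokens + elapsed * self.rate_bps)
        self.last_time = now

    def try_send(self, packet_bits, now):
        """Return True and remove tokens if the packet may be sent at time `now`."""
        self._refill(now)
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True
        return False


bucket = TokenBucket(rate_bps=8000, capacity_bits=8000)
print(bucket.try_send(8000, now=0.0))   # a burst up to the bucket size passes
print(bucket.try_send(4000, now=0.25))  # only 2000 tokens have arrived since
print(bucket.try_send(4000, now=0.5))   # 4000 tokens are now available
```

What happens when `try_send` returns False is the caller's decision: a shaper would hold the packet in a buffer and retry later, while a policer would drop or re-mark it.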


Note that the token bucket mechanism used for traffic shaping has both a token bucket and a queue to buffer data. If it did not have a data buffer, it would be a policer. For traffic shaping, packets coming in that cannot be transmitted immediately are delayed in the data buffer. GTS applies on a per-line interface basis and can use access lists to select traffic to shape. Mainly it is used for layer 2 protocols such as frame relay, ATM, Ethernet, and SMDS.

Class-Based Traffic Shaping (CBTS) Class-Based Traffic Shaping can be enabled on any interface that supports GTS. Class-Based Traffic Shaping is more flexible than GTS in that you can assign GTS directly to a specific class of traffic without relying only on access control lists. Class-Based Traffic Shaping offers the ability to configure a CBWFQ inside of GTS so that the shaped bandwidth is allocated to the classified traffic in the desired proportions. It can also be set to deliver an average rate of traffic to an output interface or be configured for a peak rate enabling a traffic rate in excess of CIR when bandwidth is available. At the time of this writing, adaptive traffic shaping for frame relay networks is not supported using the Class-Based Traffic Shaping feature. It is recommended that, for frame relay networks, you use Frame Relay Traffic Shaping particularly when you want to configure traffic shaping on a per-PVC basis or adapt the rate in response to layer 2 congestion notification (BECN).

Frame Relay Traffic Shaping (FRTS) FRTS, like GTS, can be used to eliminate congestion when using high-speed central-site connections to the frame relay network and slower remote connections. FRTS is configured on a per virtual circuit or Data Link Connection Identifier (DLCI) basis for frame relay interfaces. Rates can be configured to match the Committed Information Rate (CIR) or some other speed. FRTS will dynamically throttle traffic upon receipt of BECN. Packets are held in buffers and the injection rate into the cloud is adjusted based on the number of frames received with the BECN indicator set. Cisco’s FRTS feature can also be used to integrate ATM ForeSight closed loop congestion control to adapt to downstream network conditions.

Distributed Traffic Shaping (DTS) and Frame Relay Traffic Shaping
Distributed traffic shaping and distributed FRTS are applicable for 7500 series routers that use the Versatile Interface Processor (VIP) hardware models VIP2-40, VIP2-50, or greater. DTS offloads traffic shaping from the Route Switch Processor (RSP) to the VIP.

DTS uses queues to buffer traffic surges that can congest a network and sends the data to the network at a regulated rate. This ensures that traffic conforms to the configured descriptor, as defined by the CIR, the committed burst (Bc), and the excess burst (Be). From the defined average bit rate and the burst size that is acceptable on the shaped entity, you can derive a time interval value.
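The derivation of the time interval can be made concrete with a short calculation. The sketch assumes the usual relationship Tc = Bc / CIR, with the CIR in bits per second and Bc in bits; the function name and the example values are hypothetical.

```python
def shaping_interval_seconds(cir_bps, bc_bits):
    """Time interval Tc over which Bc bits may be sent to average out at the CIR."""
    return bc_bits / cir_bps


# Hypothetical descriptor: 64 kbps CIR with an 8000-bit committed burst.
tc = shaping_interval_seconds(cir_bps=64000, bc_bits=8000)
print(tc)  # interval length in seconds
```

A smaller Bc gives a shorter interval and therefore smoother output, at the cost of smaller permitted bursts.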

Class-Based Policer
The recommended way to perform policing is to use the Class-Based Policer. You configure it using the modular QoS CLI. With the Class-Based Policer you can configure actions for three conditions: conform, exceed, and in-violation. These conditions are based on the specified rate criteria. You designate a rate by specifying the average rate in bits per second, the normal burst size in bytes, and the excess burst size in bytes.

The conform action applies to packets that conform to the specified rate condition. These are packets that are metered at or under the average rate. The exceed action applies to packets that are above the average rate but meet the normal and maximum burst criteria. The in-violation action applies to packets that exceed the normal and maximum burst criteria. Various actions can be specified for each of these three conditions. For instance, you could simply transmit the packet, drop the packet, or mark it (using DSCP or IP precedence, for example) and then transmit.

It is important to note that packets marked by the policer are not reclassified. In other words, any attributes changed by the policer (for example, a new DSCP value) are not considered for classification purposes. Policies defined on the next hop, however, can take advantage of the re-marked priority. If you need to take actions based on the new setting, mark the packets on the incoming interface, and then apply the policies on the outgoing interface based on this new setting.
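The three conditions can be illustrated with a two-bucket meter in the style of the single-rate three-color marker of RFC 2697. This is a sketch under assumptions, not the IOS implementation: the class name, the result strings, and the rule that tokens overflowing the conform bucket spill into the exceed bucket are all illustrative choices.

```python
class ThreeActionPolicer:
    """Two-bucket single-rate meter returning conform, exceed, or violate (sketch)."""

    def __init__(self, rate_bps, normal_burst_bytes, excess_burst_bytes):
        self.rate_bytes_per_sec = rate_bps / 8
        self.bc = normal_burst_bytes   # size of the conform bucket
        self.be = excess_burst_bytes   # size of the exceed bucket
        self.conform_tokens = float(normal_burst_bytes)
        self.exceed_tokens = float(excess_burst_bytes)
        self.last_time = 0.0

    def _refill(self, now):
        new_tokens = (now - self.last_time) * self.rate_bytes_per_sec
        self.last_time = now
        # Fill the conform bucket first; any overflow goes to the exceed bucket.
        to_conform = min(new_tokens, self.bc - self.conform_tokens)
        self.conform_tokens += to_conform
        self.exceed_tokens = min(self.be, self.exceed_tokens + new_tokens - to_conform)

    def classify(self, packet_bytes, now):
        self._refill(now)
        if packet_bytes <= self.conform_tokens:
            self.conform_tokens -= packet_bytes
            return "conform"
        if packet_bytes <= self.exceed_tokens:
            self.exceed_tokens -= packet_bytes
            return "exceed"
        return "violate"


policer = ThreeActionPolicer(rate_bps=8000, normal_burst_bytes=1500, excess_burst_bytes=3000)
print([policer.classify(1500, now=0.0) for _ in range(4)])
```

The caller would then map each result to the configured action, such as transmit, drop, or mark and transmit.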

4.4.2 Configuring QoS in the network
In order to ensure that the desired QoS is preserved across the network, consistent definitions are required:
• At the hosts involved with setting TOS or DSCP values or making reservation requests
• At the edge of the network where classification, marking, or remarking may be done
• At the core routers so QoS is maintained using MPLS or queuing techniques based on TOS or DSCP values

The use of standard DSCP values under the DiffServ model helps maintain this consistency.

Table 4-3 shows the 6-bit DSCP values defined under the assured forwarding and expedited forwarding per-hop behaviors.

Table 4-3 Standard DiffServ code points

Assured Forwarding (AF)
Drop Precedence   Class 1       Class 2       Class 3       Class 4
Low               AF11 001010   AF21 010010   AF31 011010   AF41 100010
Medium            AF12 001100   AF22 010100   AF32 011100   AF42 100100
High              AF13 001110   AF23 010110   AF33 011110   AF43 100110

Expedited Forwarding (EF)
EF 101110
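The regular structure of the AF code points in Table 4-3 can be checked with a few lines of Python: the first three bits carry the class, the next two the drop precedence, and the last bit is zero. The helper names are hypothetical.

```python
EF_DSCP = 0b101110  # Expedited Forwarding, decimal 46


def af_dscp(af_class, drop_precedence):
    """Return the DSCP value for AF<class><drop_precedence>, for example AF21."""
    if not (1 <= af_class <= 4 and 1 <= drop_precedence <= 3):
        raise ValueError("AF classes are 1-4 and drop precedences are 1-3")
    return (af_class << 3) | (drop_precedence << 1)


def dscp_bits(value):
    """Format a DSCP value as the 6-bit string used in the table."""
    return format(value, "06b")


for af_class in range(1, 5):
    print([f"AF{af_class}{dp} {dscp_bits(af_dscp(af_class, dp))}" for dp in range(1, 4)])
```

The same arithmetic gives the decimal values accepted on the command line, for example 10 for AF11 and 46 for EF.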

Configuring DiffServ
When implementing DiffServ using Cisco IOS software, you define class maps and create the policy maps using the defined class maps. Finally, you apply the policy on the desired interface (or sub-interface) in either the incoming or outgoing direction. The class maps are used to classify packets into one or more Behavior Aggregates (BAs). For example, the following classes may be defined on a DS-node:

class-map VoIP-EF
class-map Gold-AF1
class-map Silver-AF2
class-map Bronze-AF3
class-map BestEffort-AF4

In the policy map, mechanisms such as Weighted Random Early Detection (WRED), Policing, Generic Traffic Shaping (GTS), and Low Latency Queuing (LLQ) for traffic such as VoIP can be specified for each class. Further, on the Cisco 7500 platforms, VIP-based distributed LLQ, GTS, WRED, and FRTS are available to offload these algorithms from the main processor and achieve high-end scalability. These mechanisms enable traffic conditioning at the edge of a DiffServ domain, or PHBs in a DiffServ internal node. For example, the following policy may be defined on the classes defined above:

policy-map DiffServ-Premium-and-Olympic-Policy
 class-map VoIP-EF
 class-map Gold-AF1
 class-map Silver-AF2
 class-map Bronze-AF3
 class-map BestEffort-AF4

Note: The policer behavior above is compliant with RFC 2597. Traffic that is within the Token Bucket parameter Bc (configured burst) in an interval is within the configured access rate, traffic between Bc and Bc + Be (excess burst) is excess traffic, and traffic that is more than Bc + Be is in-violation traffic that will be dropped.

Finally, the policy can be applied on an interface or sub-interface, on an incoming or outgoing basis. For example:

interface Serial1
 service-policy output DiffServ-Premium-and-Olympic-Policy


The policy-based mechanism offers a clean, simple method of implementing DiffServ. Sometimes, it is not necessary to implement a full-blown set of policies. An example might be a small network where only policing of all traffic is necessary, without any requirements for classification. In this case, we could have a policy with just class-default.

Configuring Class-Based Packet Marking
You will need to perform the following tasks to implement Class-Based Packet Marking:
• Configure a class
• Configure a policy for the class
• Configure IP precedence (if desired)
• Configure an IP DSCP value (recommended)
• Configure a QoS Group value (if desired)
• Configure a Class of Service value (if desired)

Configure a class and define the match criteria. In this example, the match criterion is the destination MAC address:
#class-map class-name
 match destination-address mac MAC-address

Configure a policy map and give it a name:
#policy-map policy-name

Then tie the policy to a predefined class:
#class class-name

where class-name is the name of a class defined using the class-map command within a service policy.

To configure the router to mark a packet by setting the IP precedence bits in the TOS byte, use the following command:
#set ip precedence ip-precedence-value

where ip-precedence-value is a number from 0 to 7.

To configure the router to mark a packet by setting the IP DSCP value, use the following command:
#set ip dscp ip-dscp-value

where ip-dscp-value is a decimal number in the range 0-63. You can also use one of the reserved words such as EF (expedited forwarding) or AF11 (assured forwarding class AF11).
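The two marking commands write to the same byte of the IP header, which the following Python sketch makes explicit: IP precedence occupies the top three bits of the TOS byte, and DSCP occupies the top six. The function names are hypothetical and the code only models the bit layout, not router behavior.

```python
def tos_from_precedence(precedence):
    """TOS byte produced by setting only the 3-bit IP precedence (0-7)."""
    if not 0 <= precedence <= 7:
        raise ValueError("IP precedence must be in the range 0-7")
    return precedence << 5


def tos_from_dscp(dscp):
    """TOS byte produced by setting the 6-bit DSCP value (0-63)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp << 2


def precedence_of(tos):
    """IP precedence that a precedence-only device reads from a TOS byte."""
    return tos >> 5


ef_tos = tos_from_dscp(46)              # EF
print(hex(ef_tos), precedence_of(ef_tos))
```

Because the first three DSCP bits overlap the precedence field, a device that only understands IP precedence still sees a sensible priority for DSCP-marked packets.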

Configuring RSVP You configure RSVP so that the Cisco router network will establish a reserved bandwidth path between hosts using out-of-band signaling. After you have carefully planned your RSVP deployment, perform these tasks to configure the network for RSVP. Refer to appropriate Cisco Systems, Inc. documentation for more details.

Enable RSVP
RSVP must be enabled, since it is disabled by default. Remember that RSVP must be enabled on all routers along the data path where it is to be used. To enable RSVP on an interface, use the command:
#ip rsvp bandwidth [interface-kbps] [single-flow-kbps]

This command starts RSVP and sets the bandwidth limits. The first number is the total amount of bandwidth that can be reserved on the interface. The second is the amount of bandwidth that can be reserved by a single flow. RSVP works in conjunction with WFQ or RED to ensure RSVP QoS. Enable multicast on a router interface by coding:
#ip pim {sparse-mode | sparse-dense-mode | dense-mode [proxy-register {list access-list | route-map map-name

For example:
#ip pim dense-mode

4.4.3 SNA QoS SNA traffic has some additional Quality of Service features when configured for Enterprise Extender. Enterprise Extender traffic is transmitted as IP UDP packets. The IP packets have the IP precedence bits set according to SNA Class of Service (COS) automatically. Specific UDP port numbers are also used to separate the traffic by SNA COS.


In a network that observes IP precedence, no further configuration may be necessary to gain a benefit. However, more granular and more comprehensive control can be achieved by further classifying traffic and marking it using DiffServ DSCP values.

Table 4-4 Priority mapping with Enterprise Extender

APPN Priority   CoS        IP precedence   Precedence           UDP Port
LLC commands    N/A        110             internet (6)         12000
Network         N/A        110             internet (6)         12001
High            #INTER     100             flash-override (4)   12002
Medium          #CONNECT   010             immediate (2)        12003
Low             #BATCH     001             priority (1)         12004

The information in Table 4-4 will be important as we configure the host PAGENT and network DiffServ policies. Pay particular attention to the relationship between default COS names, IP precedence, and UDP port numbers. Refer to this table when analyzing the QoS deployment scenario in a later chapter.
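The relationships in Table 4-4 can be captured as a lookup keyed by UDP port, which is the kind of classification a network policy would perform on Enterprise Extender traffic. The dictionary layout and function name below are hypothetical; the values are taken from the table.

```python
# UDP port -> (APPN priority, default SNA CoS name, IP precedence), from Table 4-4.
EE_PRIORITY_MAP = {
    12000: ("LLC commands", None, 6),
    12001: ("Network", None, 6),
    12002: ("High", "#INTER", 4),
    12003: ("Medium", "#CONNECT", 2),
    12004: ("Low", "#BATCH", 1),
}


def classify_ee_packet(udp_port):
    """Return the Enterprise Extender priority details for a UDP port, or None."""
    entry = EE_PRIORITY_MAP.get(udp_port)
    if entry is None:
        return None  # not an Enterprise Extender port
    appn_priority, cos, precedence = entry
    return {
        "appn_priority": appn_priority,
        "cos": cos,
        "ip_precedence": precedence,
        "precedence_bits": format(precedence, "03b"),
    }


print(classify_ee_packet(12002))
```

Matching on the port rather than on the precedence bits lets a policy tell SNA traffic apart from other IP traffic that happens to carry the same precedence.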


Figure 4-16 IP/SNA convergence (diagram: SNA CoS is mapped to IP ToS at the data center, and queuing technology in the network provides QoS; example shares shown are SNA Interactive 40%, Telnet 30%, SNA Batch 5%, and FTP 5%)

As illustrated in Figure 4-16, SNA Class of Service (COS) can be mapped to IP TOS or assigned DSCP values for classification and traffic management throughout the network.

4.5 Managing Quality of Service
So far, we've discussed QoS in terms of how to enable the network to handle various applications with different network requirements. But what network remains static? Equally important is the ability to manage the QoS features and services and to review constantly the policies for network resource allocation. QoS essentially provides better service to some and worse service to others, with the goal being, at least, adequate service to all. QoS management helps to keep it that way and provides assurance that the network is functioning as designed, that the service levels are what the architects intended, and that service levels are what the customers require and are entitled to.


4.5.1 What management tools are available? There are several tools available that will help you manage your network QoS configuration and parameters.

z/OS CS SNMP SLA subagent The z/OS CS SLA subagent allows network administrators to retrieve data and determine if the current set of SLA policy definitions are performing as needed or if adjustments need to be made. The SLA subagent supports the Service Level Agreement Performance Monitor (SLAPM) MIB. Refer to RFC 2758 for more information about the SLAPM MIB.

Cisco QoS MIBs
The new Cisco Class-Based QoS MIB provides the same statistics as the show policy-map interface command. To measure packet loss through the network with regard to different classes of service, use the Class-Based QoS Management Information Base (CBQoSMIB, available in 12.1(5)T), SAA/IPM, or QPM. See also:
• CISCO-CLASS-BASED-QOS-MIB.my
• CISCO-CLASS-BASED-QOS-MIB-CAPABILITY.my

Cisco QoS Device Manager Cisco QoS Device Manager (QDM) is a Web-based network management application that provides an easy-to-use graphical user interface for configuring and monitoring advanced IP-based Quality of Service (QoS) functionality in Cisco Systems routers. QDM is intended for users that are configuring QoS functionality in their network for the first time. These customers need an easy-to-use management application to help them configure and monitor QoS features in the most critical router devices in their networks. Using QDM, network managers can quickly and easily configure QoS functionality and immediately observe the effect that this QoS configuration has on the pattern of network traffic through the router.

Cisco QoS Policy Manager
QPM is a QoS policy system that makes it easy to define traffic policies and automate multiple service levels across any network topology. The product enables network-wide, content-based Differentiated Services, centralized policy control for voice/video/data networks, automated QoS configuration and deployment, and campus-to-WAN policy control. By automating the process of translating application performance requirements into QoS policy, QPM helps ensure reliable performance for Internet business applications and voice traffic that contend with noncritical traffic. Using QPM, a network administrator can quickly construct rules-based QoS policies that identify and partition application traffic into multiple levels of service.

4.6 QoS summary Network requirements are driven by application requirements. While that statement is true, often the capabilities of the network lead to new application development that was not previously possible. The result is constant change and continual advancement of the capabilities of both applications and networks leading to higher and higher productivity levels. As application data transfer requirements shift from the store-and-forward variety to more real-time streaming and multimedia needs, the ability to tailor network services more closely to the actual application requirements is increasingly important. Enterprises need QoS to deploy a converged network utility that supports Voice over IP, multimedia e-learning, video conferencing, Web-based services, transaction-based applications, and the list goes on. Service providers benefit in much the same way by passing on the same services to their customers. In addition, QoS allows providers to offer tiered services, QoS-based service level agreements (SLAs), and on-demand services such as content streaming, video conferencing, etc.

4.6.1 QoS reduces costs
Deploying QoS within the network and cooperatively based on application requirements reduces the total cost of communications services in many ways.
• QoS reduces wide area network costs by using available bandwidth more efficiently.
• The combination of voice and data over the same network reduces long-distance fees and presents a more cost-effective alternative to inter-office trunks.
• With service policies in place, network managers can concern themselves more with managing the traffic mix than chasing down each performance problem, application by application.


Chapter 5. Load distribution solutions

The overloading and low availability of single server services has led to the widespread use of server clustering. That is, to increase availability of the service, the service is replicated across multiple servers in a cluster. Likewise, to reduce the workload on a single server, the replication enables the work to be spread over multiple servers in the cluster. Because the types of clusters we consider in this book are z/OS systems, it is convenient for us to view the sysplex as the cluster.

In providing load distribution, clustering techniques must also meet other system requirements beyond the dispatching of connections. These include the ability to advertise a single system-wide image or identity so that clients can uniquely and easily identify the service. Typically this system-wide identity is either an IP address, known as the cluster address, or a host name, known as the service name. In the former case, clients always use the service via the same IP address. In the latter case, although the host name is always the same, the service is identified by different IP addresses depending on server load.

This leads to two categories of clustering load distribution: DNS mapping and connection dispatching. Because DNS mapping techniques rely heavily on the DNS infrastructure and host name resolution, they are susceptible to the inefficiencies of this lookup. As a result, connection dispatching mechanisms are the clustering technique of choice for load distribution.

© Copyright IBM Corp. 2002


Note: The term load distribution is commonly but erroneously used interchangeably with the term load balancing. The distribution of load is a much easier problem to solve than the balancing of load. In general, we can view load distribution as an approximation of the more desired load balancing.

5.1 Connection dispatching Connection dispatching is the dispatching of TCP connections from a dispatching (or distributing) node to a group of target servers. With this technique, the client perceives a traditional TCP connection with a server. The dispatching node, however, receives data from the client and forwards it to the appropriate target server within the cluster that can reply directly to the client. All systems in this cluster provide information about their workload to a dispatching entity, which is generally referred to as a distribution manager. This manager is responsible for distributing connection requests from clients to the target systems where the application servers are running. The distribution is based on the current workload information collected by a distribution manager. Users at client workstations are not aware of such application server clusters. They try to connect to a service with a virtual IP address, assuming it is running in the machine of the distribution manager. Therefore, the name server should translate the cluster name of application servers into an IP address that points to this virtual address.

5.1.1 What this chapter includes
The z/OS platform (as a target server) has a number of options of this type, including:
• Network Dispatcher
• Sysplex Distributor
• MultiNode Load Balancing (MNLB) (and LocalDirector)
• Sysplex Distributor/MNLB joint solution

This chapter covers the latter three solutions in great detail. These solutions are in the following categories of distribution manager placement:
1. A distribution manager and Forwarding Agent implemented in the sysplex.
2. A distribution manager and Forwarding Agent implemented outside of the sysplex in a Cisco machine in the network.
3. A distribution manager implemented in the sysplex communicating with Forwarding Agents in routers. This is essentially a hybrid approach.


First, however, we begin with a review of the TCP connection initialization.

TCP connection flow
Because connection dispatching technologies leverage the flow of a TCP connection, we summarize this process here. Of particular interest is the initialization flow of the TCP connection. This is known as the TCP handshake process. This handshake is initiated by the client of the connection. Normally, the handshake process consists of:
1. The client sends a SYN request to the server requesting the start of the connection.
2. The server responds by sending a SYN ACK response (acknowledgment) to the client.
3. The client sends an ACK response to the server.

At the completion of this handshake, the connection is established. Data flows from client to server and vice versa during the lifetime of the connection. In the context of connection dispatching, this initialization handshake can be exploited by directing client SYN requests to the appropriate target server to service the connection requests. The distribution manager makes the decision as to which target server to forward the request. A Forwarding Agent makes subsequent forwarding decisions based on the distribution manager's target selection.
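The division of labor between the distribution manager and the Forwarding Agent can be sketched as a toy model. Everything here is illustrative: the class names, the least-load selection rule, and the use of the client address and port as the connection key are assumptions, not the behavior of any specific IBM or Cisco product.

```python
class DistributionManager:
    """Chooses a target server for each new connection (assumed: lowest load wins)."""

    def __init__(self, server_loads):
        self.server_loads = dict(server_loads)  # server name -> current load

    def select_target(self):
        return min(self.server_loads, key=self.server_loads.get)


class ForwardingAgent:
    """Forwards packets, asking the manager only when a new SYN arrives."""

    def __init__(self, manager):
        self.manager = manager
        self.connections = {}  # (client_ip, client_port) -> chosen server

    def forward(self, client_ip, client_port, flags):
        key = (client_ip, client_port)
        if flags == "SYN" and key not in self.connections:
            # New connection: record the distribution manager's decision.
            self.connections[key] = self.manager.select_target()
        # Later packets of the same connection follow the recorded decision.
        return self.connections.get(key)


manager = DistributionManager({"sysA": 70, "sysB": 20})
agent = ForwardingAgent(manager)
print(agent.forward("10.1.1.1", 1025, "SYN"))  # decision made on the SYN
manager.server_loads["sysB"] = 90              # load changes afterwards
print(agent.forward("10.1.1.1", 1025, "ACK"))  # same connection, same target
print(agent.forward("10.1.1.2", 1026, "SYN"))  # new connection, new decision
```

Keeping the per-connection decision in the forwarding path is what allows an established connection to stay on one server even as the workload picture changes.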

5.1.2 Distribution manager/forwarding agent in the sysplex In the first case, the distribution manager receives the TCP connection request and forwards the connection request based on a load-balancing decision to the real application server, which is one of several servers in a clustered group. This solution is applied by the IBM Sysplex Distributor running on a z/OS or OS/390 system. The TCP connection request is received by the Sysplex Distributor, which is the representative with its IP group address for the requested application. After making the load-balancing decision, it forwards the connection request via the IBM Cross Coupling Facility (XCF) links to the target application servers. Subsequent IP datagrams from the client to the target server are also sent to the Sysplex Distributor and forwarded via XCF links to the target server. This is because the Sysplex Distributor only knows the connection endpoint of the TCP connection.


5.1.3 Distribution manager/forwarding agent outside the sysplex The second case is broken down further into subcases, one that has a single forwarding agent, the other having multiple forwarding agents.

Forwarding through a distribution manager In the first case, subcase a, the distribution manager outside of the clustered system at the edge of the network receives all IP packets and routes these directly to the target system. TCP connection distribution is based on workload information of the target systems. This solution is applied by the Cisco LocalDirector. It is a machine at the edge of the clustered application server group that is responsible for the forwarding decision to the target application server for the current connection. In order to select the target application server with the lowest load in an IBM Sysplex environment, the Cisco LocalDirector uses IBM Workload Manager (WLM) information obtained via the Cisco Dynamic Feedback Protocol (DFP). DFP communicates with Cisco Workload Agents, an application in IBM systems.

Forwarding through special forwarding agents

In the second subcase (subcase b), a Cisco Service Manager, a machine with functions similar to the LocalDirector, prepares the forwarding decision to the target application server for the current connection. The Cisco Service Manager is also implemented outside the sysplex at the edge of the network. Here it works as an advisor for Cisco Forwarding Agents, which ask the Service Manager for the IP address of the application server with the lowest load for each new TCP connection. The Cisco Service Manager sends this forwarding information for the TCP connection request to the Cisco Forwarding Agent. Forwarding Agents are also located outside the sysplex that runs the target application servers. A Cisco Forwarding Agent is a Cisco router that is able to forward the TCP connection request and all following IP datagrams of the same TCP connection directly to the selected application server. The Cisco Service Manager selects the appropriate application server from a group of clustered servers based on IBM Workload Manager (WLM) information obtained via the Cisco Dynamic Feedback Protocol (DFP). DFP communicates with Cisco Workload Agents, applications that run on the IBM systems. Multiple Cisco Forwarding Agents at the edge of the sysplex may be involved to provide parallel paths from the client to the target application server, so that client traffic can be balanced across multiple upstream nodes. This solution is known as the Cisco MultiNode Load Balancing (MNLB) function.


5.1.4 Distribution within sysplex, forwarding outside the sysplex

The third case describes the latest solution, which IBM and Cisco developed in cooperation, based on research into a “High Availability Web Services” solution. This solution provides an extended and adapted package of hardware and software cooperation between the IBM Sysplex server site, with all its dynamic functions, and Cisco's MultiNode Load Balancing (MNLB) functions. The advantages of this solution include:
• Avoid inbound traffic flow through the Sysplex Distributor
• No delays in learning load-balancing information
• Use policy or Quality of Service (QoS) information for the selection of the "best" server
• No need for installation of the LocalDirector/Service Manager for the sysplex traffic

What does this mean for the IBM Sysplex Distributor and Cisco LocalDirector/Service Manager and MNLB implementation?
• IBM Sysplex Distributor provides the load-balancing decisions instead of functions outside of the sysplex.
• Cisco MNLB provides the forwarding and routing of the TCP connection traffic.
• The Sysplex Distributor performs functions of the Cisco Service Manager for the Cisco MNLB environment.
• The IBM Sysplex Distributor selects the most appropriate server based on each system's WLM information.
• In addition to Quality of Service (QoS) data, policy information provided by the Policy Agent (PAGENT) may also be applied.
• Since the Sysplex Distributor does the load balancing based on the data the WLMs within the cluster provide, the usage of the Dynamic Feedback Protocol (DFP) will no longer be required.
• The Sysplex Distributor provides connection information to the switch's (such as Cisco CAT 6500) MNLB function, the Forwarding Agent. This information is transferred via the Cisco protocol Cisco Appliance Services Architecture (CASA).
• The Cisco switch (or router) uses this information to forward subsequent client/server data directly to the selected server within the sysplex, thus avoiding having all inbound traffic go through a single point such as the LocalDirector or the Sysplex Distributor.
• Existing Forwarding Agents will be used.


• The path for inbound traffic via the Sysplex Distributor and over Cross Coupling Facility (XCF) links is no longer used. The switch/router uses the direct way to the server, for example via the OSA-Express adapter. If the OSA-Express adapter is shared by multiple TCP/IP stacks in other LPARs/systems within a sysplex, the Generic Routing Encapsulation (GRE) Protocol will be used between the target stack and the Cisco switch/router.

5.2 IBM Sysplex Distributor

This section describes the following aspects:
• Elements of the Sysplex Distributor
• Start and takeover/takeback tasks
• Load-balancing rules
• Handling connection requests

5.2.1 Sysplex Distributor elements

The Sysplex Distributor environment consists of several systems/LPARs within a sysplex cluster. All TCP/IP stacks are connected via the Cross Coupling Facility to the other systems, and have IP links to LAN switches/routers (with, for example, OSA-Express) or channel-attached routers. The Sysplex Distributor is a TCP/IP function that is defined in the TCP/IP profile. It contains information about distributed dynamic virtual IP addresses (DVIPAs) and their association with TCP/IP applications. In our network, the distributed DVIPA is 172.7.1.1. It is associated with the FTP server (file transfer daemon), which is accessed via ports 20 and 21. The DVIPA in the Sysplex Distributor has to be regarded as a cluster or group address. In our network, for example, 172.7.1.1 is the group address for the FTP server. Multiple FTP servers may run on different TCP/IP target stacks with this DVIPA address. For load-balancing decisions, the Sysplex Distributor knows to which target systems this DVIPA may be distributed. In our network, the DVIPA for FTP is distributed to systems A, B and C. These target systems all have the DVIPA 172.7.1.1 active simultaneously. The DVIPAs on the target systems, however, are hidden DVIPAs. This means they can only be addressed by the Sysplex Distributor. They are not propagated to the network for routing purposes. All routes in the network to 172.7.1.1 point to the distributing stack.


5.2.2 Sysplex Distributor start tasks and takeover/takeback

When the Sysplex Distributor (SYS1 in our sample) is started, it informs all defined target stacks of its dynamic XCF address on the cross coupling link and of the distributed DVIPAs in the target TCP/IP stacks. Target stacks add the distributed DVIPAs to their own TCP/IP profile as non-routable DVIPAs. These distributed DVIPAs will not be propagated to the network. The target stacks inform the distribution stack about the state of the server application associated with the distributed DVIPA. This state and some other information are used by the Sysplex Distributor to forward connection requests to the target system.

In case of a system or a TCP/IP failure, a backup Sysplex Distributor (SYS2 in our sample) will take over all activities of SYS1. In order to shorten the time for a takeover, the primary Sysplex Distributor (SYS1) informs the backup Sysplex Distributor (SYS2) about the distributed DVIPAs of the target stacks. Thus the backup Sysplex Distributor has information regarding all DVIPAs that the primary Sysplex Distributor maintains.

To maintain connections, the Sysplex Distributor first has to create and update a destination port table for active DVIPAs and the ports through which the applications on the target stacks are reached. There is also a “current” routing table to be maintained for distributing connection requests. The content of these tables is used for operator displays, which show for example:
• Which DVIPA target stacks and ports are currently available with applications ready to receive workload
• Over which dynamic XCF IP address this target stack can be reached
• How many connections are routed to the DVIPA and port target stacks
• What WLM weights and QoS values are currently valid for the DVIPA and port target stack

The following figures show samples of the port table and current routing table.


d tcpip,tcpsys1,net,vdpt
EZZ2500I NETSTAT CS V1R2 TCPSYS1 439
DYNAMIC VIPA DISTRIBUTION PORT TABLE:
DEST IPADDR     DPORT DESTXCF ADDR    READY      TOTALCONN
--------------- ----- --------------- ---------- -----------
172.7.1.1       00021 172.16.1.3      0000000000 00000000001
172.7.1.1       00021 172.16.1.4      0000000000 00000000001
172.7.1.1       00021 172.16.1.5      0000000000 00000000001
172.7.1.2       04444 172.16.1.3      0000000000 00000000001

Figure 5-1 Dynamic VIPA distribution table

Figure 5-1 shows that FTP connections to the DVIPA cluster address 172.7.1.1 and port 21 are distributed by the Sysplex Distributor to three different TCP/IP stacks with dynamic XCF IP addresses 172.16.1.3, 172.16.1.4 and 172.16.1.5. It also shows that these three servers are ready to receive additional connection requests and that one connection has already been distributed to each DESTXCF and DPORT combination. The distribution to destination port 4444 is to be read in the same way.

d tcpip,tcpsys1,net,vcrt
EZZ2500I NETSTAT CS V1R2 TCPSYS1 363
DYNAMIC VIPA CONNECTION ROUTING TABLE:
DEST IPADDR     DPORT SRC IPADDR      SPORT DESTXCF ADDR
--------------- ----- --------------- ----- ---------------
172.7.1.1       00021 192.16.3.3      05299 172.16.1.3
172.7.1.1       00021 192.16.3.7      05423 172.16.1.4
172.7.1.1       00021 192.16.3.5      05318 172.16.1.5
172.7.1.2       04444 192.16.10.2     05712 172.16.1.3

Figure 5-2 Dynamic VIPA connection routing table

Figure 5-2 shows the TCP connection partners: the group IP address (for example 172.7.1.1) with the destination port number, and the client IP address with its source port (for example 192.16.3.3, port 05299). For this connection, the Sysplex Distributor selected the target FTP server that is reached under the destination XCF address 172.16.1.3.
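As an illustration of how the connection routing table in Figure 5-2 can be read, the following Python sketch parses the four data rows of the figure and counts the connections per target stack. The class and function names are hypothetical and are not part of any IBM interface; the sketch only post-processes text that an operator might have captured from the display.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class VcrtEntry(NamedTuple):
    """One row of the connection routing table (field names are illustrative)."""
    dest_ip: str     # distributed DVIPA (group address)
    dest_port: int
    src_ip: str      # client address
    src_port: int
    dest_xcf: str    # dynamic XCF address of the selected target stack


# Data rows copied from Figure 5-2.
SAMPLE_ROWS = """\
172.7.1.1 00021 192.16.3.3 05299 172.16.1.3
172.7.1.1 00021 192.16.3.7 05423 172.16.1.4
172.7.1.1 00021 192.16.3.5 05318 172.16.1.5
172.7.1.2 04444 192.16.10.2 05712 172.16.1.3
"""


def parse_vcrt(text: str) -> List[VcrtEntry]:
    """Parse whitespace-separated data rows; lines with another shape are skipped."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue
        dest_ip, dport, src_ip, sport, xcf = fields
        entries.append(VcrtEntry(dest_ip, int(dport), src_ip, int(sport), xcf))
    return entries


def connections_per_target(entries: List[VcrtEntry]) -> Dict[Tuple[str, int, str], int]:
    """Count connections per (DVIPA, port, target XCF address)."""
    counts: Dict[Tuple[str, int, str], int] = defaultdict(int)
    for e in entries:
        counts[(e.dest_ip, e.dest_port, e.dest_xcf)] += 1
    return dict(counts)


if __name__ == "__main__":
    for key, count in connections_per_target(parse_vcrt(SAMPLE_ROWS)).items():
        print(key, count)
```

Each of the four (DVIPA, port, target) combinations in the sample carries one connection, which matches the TOTALCONN column of Figure 5-1.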


5.2.3 Sysplex Distributor load-balancing rules

There are different load-balancing rules for the incoming connection requests. Distribution may be done based on:
• Workload Manager (WLM) information of the target systems and application state
• WLM information and Quality of Service (QoS) information supplied by the Policy Agent, an additional TCP/IP application
• “Round robin”, a sequential load balancing
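The difference between a weight-based rule and round robin can be sketched in a few lines of Python. This is only an assumed illustration: the text does not specify how the Sysplex Distributor converts WLM weights into a selection, so the proportional rule, the weight values, and all names below are hypothetical.

```python
import itertools
from typing import Dict, Iterator, List, Optional


def select_by_weight(weights: Dict[str, int],
                     ready: Dict[str, bool],
                     active: Dict[str, int]) -> Optional[str]:
    """Return the ready target with the fewest active connections per unit of weight.

    ASSUMPTION: this proportional rule is illustrative only; the source does not
    describe the actual selection algorithm.
    """
    candidates = [t for t, w in weights.items() if w > 0 and ready.get(t, False)]
    if not candidates:
        return None
    # Tie-break on the address string so the result is deterministic.
    return min(candidates, key=lambda t: (active.get(t, 0) / weights[t], t))


def round_robin(targets: List[str]) -> Iterator[str]:
    """Sequential distribution: hand out the targets in order, wrapping around."""
    return itertools.cycle(targets)


if __name__ == "__main__":
    weights = {"172.16.1.3": 4, "172.16.1.4": 2, "172.16.1.5": 2}  # hypothetical weights
    ready = {t: True for t in weights}
    active = {"172.16.1.3": 2, "172.16.1.4": 2, "172.16.1.5": 0}
    print(select_by_weight(weights, ready, active))
    rr = round_robin(list(weights))
    print([next(rr) for _ in range(4)])
```

A target whose application is not ready is never selected, regardless of its weight, which mirrors the role of the application state in the first rule.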

5.2.4 Handling connection requests

When the Sysplex Distributor receives a connection request, SYN (see 1 in Figure 5-3 on page 131), from the client via the Internet or intranet, it consults its tables to find out which target systems have the desired application with the requested distributed DVIPA. Based on the load-balancing rules, it selects the appropriate target system. In our sample, the Sysplex Distributor selected target system C for the FTP connection. The connection request is now forwarded (see 2 in Figure 5-3 on page 131) to the distributed DVIPA using the link via the Cross Coupling Facility. The application server starts the connection establishment process by sending the ACK as a response to the SYN from the client, together with its own SYN request to the client, to establish the full-duplex connection. The way back to the client does not traverse the Sysplex Distributor. It goes directly (in our sample via the OSA adapter) to the network, to a switch or router.
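The inbound handling described here can be modeled as two table lookups: the destination port table is consulted for a new SYN, and the connection routing table for every packet of a known connection. The Python sketch below is a simplified model under that assumption; the class, its methods, and the pluggable `choose` function are hypothetical and do not reflect the internal implementation of the Sysplex Distributor.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (client IP, client port, DVIPA, destination port)
FlowKey = Tuple[str, int, str, int]


class Distributor:
    """Toy model of a distributing stack (all names are illustrative)."""

    def __init__(self,
                 port_table: Dict[Tuple[str, int], List[str]],
                 choose: Callable[[List[str]], str]) -> None:
        # (DVIPA, port) -> dynamic XCF addresses of ready target stacks
        self.port_table = port_table
        # load-balancing rule, supplied by the caller
        self.choose = choose
        # established connections -> selected target stack
        self.routing_table: Dict[FlowKey, str] = {}

    def inbound(self, src_ip: str, src_port: int, dvipa: str, port: int,
                syn: bool = False) -> Optional[str]:
        """Return the XCF address to forward an inbound packet to, or None."""
        key = (src_ip, src_port, dvipa, port)
        if syn and key not in self.routing_table:
            targets = self.port_table.get((dvipa, port), [])
            if not targets:
                return None  # no target stack offers this DVIPA and port
            self.routing_table[key] = self.choose(targets)
        # Later packets of the connection follow the recorded decision.
        return self.routing_table.get(key)


if __name__ == "__main__":
    table = {("172.7.1.1", 21): ["172.16.1.3", "172.16.1.4", "172.16.1.5"]}
    d = Distributor(table, choose=lambda targets: targets[-1])  # stand-in rule
    print(d.inbound("192.16.3.5", 5318, "172.7.1.1", 21, syn=True))
    print(d.inbound("192.16.3.5", 5318, "172.7.1.1", 21))
```

Only the inbound direction is modeled, because the return path from the server to the client bypasses the distributing stack.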

5.2.5 Data path after connection establishment

The subsequent data flow for this connection from the client to the application server always passes through the Sysplex Distributor and the cross coupling links. From the application server to the client, there is again a direct path via the OSA adapter to the switch/router. The Sysplex Distributor may become a bottleneck for the inbound traffic. However, because the outbound traffic uses another path to the network, this bottleneck is often not significant: in most connections, such as Web server traffic, the amount of inbound data is usually much smaller than the amount of outbound data. The advantage of this solution is that all functions are located internally within the sysplex cluster. No function is on external systems or routers.


System definitions and/or changes are handled by the mainframe administration only.

5.2.6 Takeover/takeback

Takeover

If the primary sysplex distribution stack fails, all backup distribution stacks are informed immediately about this failure via the Cross Coupling Facility. The backup distribution stack with the highest rank activates all distributed DVIPAs that it learned from the primary distribution stack when that stack was started. The rank is a DVIPA definition in the TCP/IP profile. The new distribution stack (the backup stack) informs all target stacks that from now on it is the distribution stack for all “hidden” DVIPAs. Again, the target stacks inform the backup distribution stack about the state of the DVIPAs and their ports representing the application servers and the current connections. The backup distribution stack creates the destination port table and the current routing table for the operator displays. Finally, it propagates its taken-over distributed DVIPAs to the network. This enables routers to update their routing table and their path to the new entry point into the sysplex cluster. The new entry point is the IP address of the backup distribution stack (in our sample the OSA adapter of SYS2).
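The rank-based choice of the backup distribution stack can be sketched as follows. The rank values and the second backup SYS3 are hypothetical (the sample network in this chapter has only SYS2 as a backup), and the tie-breaking behavior is an assumption of the sketch.

```python
from typing import Dict, Optional, Set


def elect_backup(ranks: Dict[str, int], alive: Set[str]) -> Optional[str]:
    """Return the surviving backup stack with the highest rank, or None.

    ASSUMPTION: ranks are plain integers where larger means preferred; on a tie
    the stack listed first in `ranks` wins.
    """
    candidates = [name for name in ranks if name in alive]
    if not candidates:
        return None
    return max(candidates, key=lambda name: ranks[name])


if __name__ == "__main__":
    backup_ranks = {"SYS2": 100, "SYS3": 50}  # hypothetical rank definitions
    print(elect_backup(backup_ranks, alive={"SYS2", "SYS3"}))
    print(elect_backup(backup_ranks, alive={"SYS3"}))
```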

Takeback

When the primary sysplex distribution stack is restarted, all defined distributed DVIPAs are activated. The primary sysplex distribution stack informs the backup distribution stack and all defined target stacks that it is ready to take back connections. The target stacks inform the primary distribution stack about the state of their “hidden” DVIPAs and ports. The backup distribution stack transfers the content of the current routing table with all connections to the primary distribution stack. It also updates its distribution routing table by deleting the returned DVIPAs. This is done only if these DVIPAs are predefined as “moveable immediately”. Finally, the primary distribution stack propagates its distributed DVIPAs to the network. Routers will update their routing tables and paths to reach the primary sysplex distribution stack again.


[Figure 5-3 diagram: target stacks A, B, C and D (TCP/IP 172.16.1.3 through 172.16.1.6), each with an OSA adapter, host instances of FTP (DVIPA 172.7.1.1) and WEB (DVIPA 172.7.1.2). The distribution stack SYS1 and the backup distribution stack SYS2 (TCP/IP 172.16.1.1 and 172.16.1.2) hold DVIPA FTP 172.7.1.1 and DVIPA WEB 172.7.1.2; they connect to the target stacks via XCF links and to the switch/router via OSA adapters. Clients in the network log in to FTP (ports 20, 21; cached IP address 172.7.1.1) and to WEB (port 4444; cached IP address 172.7.1.2). Data flow: network, (1) distribution stack, (2) target stack (LPAR C), (3) network.]

Figure 5-3 Sysplex Distributor elements and connection paths

5.2.7 Reaching the goals of availability and load balancing

Application availability is provided by implementing multiple target stacks on systems/LPARs with multiple application instances running in parallel. If one system/LPAR or application fails, the end user may reconnect very quickly using the same application name. This application name points to the same “hidden” DVIPA, but in another system/LPAR.


Permanent access to the Sysplex Distributor stack is also provided through one or multiple backup Sysplex Distributor stacks. Takeover and takeback of connections between the defined backup distribution stack(s) and the primary distribution stack are handled automatically without reconnection. All distribution stacks use information from the Workload Manager and/or Policy Agent to select the currently best target address within the sysplex cluster.

5.3 Cisco LocalDirector

This section provides a brief overview of the functionality and explains the client connection and data flow.

5.3.1 Overview

The LocalDirector is a special network unit designed for building highly redundant and fault-tolerant server farms. It works like a transparent learning bridge, which forwards incoming data packets from the IP network (Internet or intranet) to a selected application server within the server farm. The LocalDirector uses load-balancing methods for directing IP datagrams containing TCP or UDP content. Figure 5-4 shows the location of the LocalDirector in the IP network.


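[Figure 5-4 diagram: clients in the IP network reach the LocalDirector through a router. The LocalDirector holds the table of active servers (a.a.a.b, a.a.a.c, a.a.a.d) for the cluster VIPA a.a.a.a. A switch connects the LocalDirector to the cluster, whose servers have the VIPAs a.a.a.b, a.a.a.c and a.a.a.d and run Workload Agents that report via DFP.]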

Figure 5-4 LocalDirector bridge between IP-network and server cluster

All servers appear as one virtual server. Therefore, only one IP address (for example, a.a.a.a) and a single Uniform Resource Locator (URL) is required for all members of the server cluster. The LocalDirector communicates with the servers within the cluster to get information such as state of the application server, current workload of the server, etc. This information is transferred via the Dynamic Feedback Protocol (DFP) from the servers to the LocalDirector. IBM OS/390 has a Workload Agent that is the DFP partner for the LocalDirector. DFP will also be used to distribute information about server availability and load to a Cisco Distributed Director (not shown in the figure) at remote sites of the network. This facility improves Web site response time for clients.


The LocalDirector is able to redirect Hypertext Transfer Protocol (HTTP) requests, used mainly for Web traffic, to a different location if a server or even a whole server cluster fails. If all real servers are no longer available, clients are redirected to another Web site. The LocalDirector also has functions to prepare a quick failover to a standby LocalDirector. Should a failure occur, a failover mechanism automatically replicates the configuration of the original LocalDirector to the standby LocalDirector. Current user connections to servers that were maintained by the original LocalDirector are now maintained by the secondary LocalDirector. The connections between clients and servers are not dropped.

5.3.2 Connection and datagram flow

Clients get access to the server cluster using a single URL, which is translated by the domain name server into a server cluster IP address. This address is also called a virtual IP address for the cluster. A static VIPA may be used in an IBM sysplex. Connection setup through the LocalDirector to a server cluster differs from the traditional client/server connection setup without the LocalDirector. Because the real server is not known by the client, the server cluster IP address is used. But this address is used only to reach the LocalDirector as representative of the server cluster. Only the LocalDirector knows if the resource for the client request is available. If it is available, the server cluster IP address is mapped to the IP address of a real server. This procedure requires that the first SYN request be intercepted at the LocalDirector. The LocalDirector first has to find out, based on load-balancing rules, which real server has to be selected. When the selection process is done, it assigns the real destination IP address and forwards the SYN to the selected server. It also adds the connection to the list of all client/server connections going through this LocalDirector. This information is important in case of a takeover by a secondary LocalDirector. The TCP handshaking process is then continued between the real server and the client. Every time a packet flows through the LocalDirector, the LocalDirector is aware that the connection is still running, and that the application and the system are also still working.


5.4 Cisco MultiNode Load Balancing (MNLB)

Compared to the IBM Sysplex Distributor solution, Cisco’s preferred way to reach the goals of availability, scalability and load balancing is completely different. Availability of application servers is seen from the view of the network attached to the server cluster. Access to the server cluster is reached via switches or routers. Control of the following is exercised entirely from outside the server cluster:
• The access to available application servers within the server cluster
• The scalability of extending the access to the server cluster
• The administration of the workload of each server within the server cluster
• The load-balancing process to select the currently best machine for the requested client/server connection
• The forwarding of this connection request to the selected server target

This requires that information about the servers within the cluster be sent to special service applications implemented in machines (switches, routers or other servers) attached to the intranet. These service applications are called MNLB components. Four components are available:
• Service Manager
• Forwarding Agent
• Workload Agent
• Backup Service Manager

Figure 5-5 shows the location of the MNLB components.


[Figure 5-5 diagram: two clusters (Cluster1 and Cluster2), each with Workload Agents and XCF, are attached to routers acting as Forwarding Agents. Switches and further routers connect them to the Services Manager (LocalDirector), the Backup Services Manager, and the clients in the IP network.]

Figure 5-5 MNLB components

5.4.1 Overview of the MultiNode Load Balancing (MNLB) functions

MNLB is designed as a server load-balancing solution for new e-commerce and e-business applications. MNLB consists of software running in Cisco routers and switches, Cisco LocalDirector, and application server platforms. It distributes TCP connection requests and IP datagrams based on load-balancing information across any number of routers, providing a higher level of availability, scalability, and performance for server applications, including those located within the IBM sysplex environment.


A Service Manager, located in the Cisco LocalDirector, is responsible for the distribution of connection requests. This is done using information about application availability, server processor capacity, and load-balancing algorithms such as round robin or least connections, or information received through the Dynamic Feedback Protocol (DFP). This protocol, for example, carries information from the IBM OS/390 Workload Manager (WLM).

A Workload Agent provides the information the Service Manager needs to calculate an optimum load-balancing result for the server selection. The Workload Agent is software that runs on server platforms or machines that manage server farms or clustered server environments. The Cisco Workload Agent for OS/390 uses weight data obtained from the IBM OS/390 Workload Manager (WLM). It converts this data into the common Dynamic Feedback Protocol (DFP) format before it is sent to the Service Manager. The Cisco Workload Agent for OS/390 optimizes load balancing in an IBM sysplex environment.

A Forwarding Agent is used as a packet redirector that forwards packets based on the Service Manager’s instructions. The Forwarding Agent is software running in Cisco routers or route switch modules.

A Backup Service Manager is responsible for providing connection establishment when the primary Service Manager fails.

5.4.2 Connection establishment and subsequent data flow

1. The client starts a connection by sending the connection request to an application server.
2. A name server resolves the domain name into a virtual IP address, also called the cluster IP address, or cluster VIPA, or distributed VIPA. It is a generic IP address that addresses an application group running in a clustered environment. This address has to be assigned to a real IP address after the load-balancing process.
3. When a Forwarding Agent receives this connection request with the cluster VIPA, the request is forwarded to the Service Manager. The connection request is discovered by the Forwarding Agent by checking the TCP header with the SYN flag on.
4. The Service Manager makes its load-balancing decision based on periodic information provided by the Workload Agents. The optimal application server is selected out of a cluster of servers.
5. The Service Manager sends the connection request, containing the IP address of the optimal application server, back to the Forwarding Agent.


6. Thus the Forwarding Agent learns the destination of the specific connection request. It forwards the connection request to the application server.
7. The server receives the connection request. It tries to establish the connection by sending the acknowledgment for the received SYN request. It also sends a SYN request to the client.
8. When the client accepts the SYN request, it also acknowledges the received SYN and sends the first data directly to the application server using the IP address provided by the Service Manager. The Service Manager is no longer contacted during the existing connection.

Figure 5-6 on page 139 shows the previously discussed connection flow.

Note: Clients do not use the host IP address for the application but the cluster VIPA address, which is a group address only for all hosts within the cluster. The real host IP address for the application is determined by the Service Manager based on workload management information.
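The division of labor in the connection setup steps of this section can be modeled in a few lines of Python. This is a simplified sketch, not Cisco's implementation: the class names, the "lowest reported load wins" rule, and the load values are assumptions made for illustration. The addresses a.a.a.a through a.a.a.d are the placeholders used in the figures of this chapter.

```python
from typing import Dict, List, Optional, Tuple

# (client IP, client port, cluster VIPA, destination port)
Flow = Tuple[str, int, str, int]


class ServiceManager:
    """Selects a real server for a cluster VIPA (illustrative model)."""

    def __init__(self, clusters: Dict[str, List[str]], loads: Dict[str, int]) -> None:
        self.clusters = clusters  # cluster VIPA -> real server addresses
        self.loads = loads        # real server -> load reported by Workload Agents

    def assign(self, cluster_vipa: str) -> Optional[str]:
        servers = self.clusters.get(cluster_vipa, [])
        if not servers:
            return None
        # ASSUMPTION: the server with the lowest reported load is chosen.
        return min(servers, key=lambda s: self.loads.get(s, float("inf")))


class ForwardingAgent:
    """Detects SYNs, asks the Service Manager once, then forwards directly."""

    def __init__(self, service_manager: ServiceManager) -> None:
        self.service_manager = service_manager
        self.flows: Dict[Flow, str] = {}
        self.queries = 0  # how often the Service Manager was contacted

    def handle(self, flow: Flow, syn: bool) -> Optional[str]:
        """Return the real server address for a packet, or None if unknown."""
        if flow not in self.flows:
            if not syn:
                return None  # not a connection request and no known flow
            self.queries += 1
            server = self.service_manager.assign(flow[2])
            if server is None:
                return None
            self.flows[flow] = server
        return self.flows[flow]


if __name__ == "__main__":
    sm = ServiceManager(
        clusters={"a.a.a.a": ["a.a.a.b", "a.a.a.c", "a.a.a.d"]},
        loads={"a.a.a.b": 70, "a.a.a.c": 20, "a.a.a.d": 50},  # hypothetical loads
    )
    fa = ForwardingAgent(sm)
    flow = ("client-1", 40000, "a.a.a.a", 80)
    print(fa.handle(flow, syn=True))   # connection request: Service Manager is asked
    print(fa.handle(flow, syn=False))  # later packet: forwarded from the flow table
    print(fa.queries)
```

The `queries` counter makes the point of step 8 visible: after the first SYN, packets of the same connection are forwarded without contacting the Service Manager again.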

5.4.3 Client/server connection restart

If a system/LPAR, a TCP/IP stack, or an application fails, the end user may reconnect very quickly to another application instance on a different TCP/IP stack behind the same cluster VIPA address. The end user will again use the known application domain name, which will be translated by the domain name server into the cluster VIPA address. The Forwarding Agent within the router again discovers the SYN request and asks the Service Manager for the real host VIPA. The Service Manager has learned of the failing system/LPAR, TCP/IP stack, or application through the DFP message exchange with the Workload Agent, which includes the state of the currently running applications. It instructs the Forwarding Agent to use a new host VIPA address for the desired application based on load-balancing policies. The rest of the connection establishment process is done as described under 5.4.2, “Connection establishment and subsequent data flow” on page 137.


[Figure 5-6 diagram: a client in the IP network sends a connection request (SYN) to the cluster VIPA a.a.a.a. A router with a Forwarding Agent asks the Services Manager (LocalDirector) for a.a.a.a. The Services Manager and the Backup Services Manager each hold the table of active servers for a.a.a.a (a.a.a.b, a.a.a.c, a.a.a.d), which is fed via DFP by the Workload Agents in the cluster. The Services Manager returns a.a.a.c, the Forwarding Agent sends the SYN to a.a.a.c, and the handshake continues with SYN ACK and ACK.]

Figure 5-6 Connection setup flow

5.4.4 Reaching the goals of availability and load balancing

The analysis is divided into two parts:
• The first part views the clustered application server within the sysplex environment and the connection via the OSA adapter to the switch.


• The second part views components of the intranet such as the Service Managers, the Forwarding Agent, and routers.

Sysplex environment

MNLB does not necessarily require dynamic VIPAs. Applications may also be associated with static VIPAs defined for each host. Multiple application instances are started on the other systems/LPARs behind the same cluster VIPA. DVIPAs have the advantage that they may be moved to another TCP/IP stack if the original stack fails. Which DVIPA/VIPA address will finally be used for the client/server connection depends on the result of the Service Manager’s load-balancing decision. Thus the use of the application resources is controlled from outside the sysplex environment by the Service Manager, in cooperation with the internal function of the Workload Agent residing in each system/LPAR.

Application server availability is determined by the load of the machines within the sysplex and the number of running application instances. Static VIPAs do not provide automatic takeover or takeback. This is not necessary because the external Service Manager is responsible for the distribution of the connections. Thus the Service Manager determines the workload of the components within the sysplex. Connections from the sysplex to the intranet may have several backups to provide alternative paths. Availability and load balancing are thus handled jointly by the components within the sysplex and the external Service Manager.

Should the business grow, a scalable solution should be planned. This could be, for example, installing additional servers within the sysplex without interrupting current connections. This can be achieved by adding new systems, using dynamic XCF for all systems within the sysplex. New DVIPAs/VIPAs have to be added to the existing cluster DVIPA/VIPA and propagated to the Service Manager.

Components of the intranet

Components of the intranet include the Service Managers, the Forwarding Agent, and routers.


Service Managers

The most important component is the Service Manager located in the LocalDirector’s machine. The Service Manager is the address for all client connection requests. If this component fails or is not reachable due to failed hardware (for example, an adapter or a line), then no connection request can be handled from any client to applications running in the sysplex. Therefore, a backup Service Manager should be installed. If the primary Service Manager fails, a quick switch to the backup Service Manager has to occur. The prerequisite for a fast switchover is that the backup has the same knowledge as the primary Service Manager about the states of the components of the sysplex. This means that the primary Service Manager has to exchange data such as workload data and information about the states of application servers with the backup Service Manager.

Service Managers multicast their IP address and the cluster DVIPAs/VIPAs they are responsible for into the network. These addresses are used by Forwarding Agents to determine to which Service Manager a connection request has to be sent for assigning the application’s DVIPA/VIPA.

Implementation of only one Service Manager might cause a bottleneck, especially when Web services are requested. Web services use the Hypertext Transfer Protocol (HTTP), which uses many TCP connections during Web surfing. Thus the availability of the external Service Managers may become crucial for quick client access to application servers. Caching of repeating connection requests and their associated DVIPAs/VIPAs in the Forwarding Agent might reduce the load of the Service Manager. If scalability is needed, additional machines with Service Manager functions may be installed in the intranet. Consequently, a single Service Manager should not be responsible for too many cluster VIPAs.

Forwarding Agents

Since Forwarding Agents discover connection requests, they are important as well. A failure of a Forwarding Agent may be covered by a backup solution within the intranet. In this situation, another Forwarding Agent on the path to the desired cluster VIPA would discover the connection request. It would send the request to the IP address of the Service Manager that is responsible for the addressed cluster VIPA. The availability of the external Forwarding Agent is also critical for quick client access to application servers.


Scalability is easily achieved by installing additional routers with Forwarding Agents in the intranet. This would have no impact on currently running connections. What remains to be solved is how to implement parallel paths from clients to Forwarding Agents in order to avoid overloading the path to a Forwarding Agent, because this path is also used by the current datagram traffic. This discussion, however, is beyond the scope of this book.

5.5 IBM Sysplex Distributor and Cisco MNLB
Because of the explosive growth of IP applications, the growth forecast for the near future, and the need to leverage access to traditional OS/390 and z/OS controlled transactions and databases, IBM and Cisco jointly researched a “High Availability Web Services” solution. This solution provides an extended and adapted package of cooperating hardware and software that combines the IBM sysplex server site, with all its dynamic functions, and Cisco’s MultiNode Load Balancing (MNLB) functions. The advantages of this solution include:
• Inbound traffic does not flow through the Sysplex Distributor
• No delays in learning load-balancing information
• Policy or Quality of Service (QoS) information is used for the selection of the “best” server
• No need to install the LocalDirector for the sysplex traffic

5.5.1 What does this mean?
IBM Sysplex Distributor provides the load balancing. Cisco MNLB provides the routing.
• The Sysplex Distributor takes over the functions of the Cisco Service Manager for the Cisco MNLB.
  – It selects the “best” server based on each system’s WLM information, Quality of Service (QoS) data, or policy information provided by the Policy Agent (PAGENT). Because the Sysplex Distributor does the load balancing based on the data that the WLMs within the cluster provide, the Dynamic Feedback Protocol (DFP) is no longer required.
  – It provides connection information to the Forwarding Agent, the MNLB function of the switch (such as a Cisco 6500). This information is transferred via the Cisco Appliance Services Architecture (CASA) protocol.


Networking with z/OS and Cisco Routers: An Interoperability Guide

• The switch (or router) uses this information to forward subsequent client/server data directly to the selected server within the sysplex, so that inbound traffic no longer has to go through a single point such as the LocalDirector or the Sysplex Distributor.
  – Existing MNLB Forwarding Agents are used. The path for inbound traffic via the Sysplex Distributor and over cross-coupling links is no longer used. The switch takes the direct path to the server, for example via the OSA-Express adapter. If multiple LPARs/systems share the OSA-Express GbE adapter, the Generic Routing Encapsulation (GRE) protocol is used to address the application’s DVIPA on the selected TCP/IP stack. In this case, a GRE tunnel is set up from the Cisco switch/router to the selected TCP/IP stack.

Note: The Sysplex Distributor may be used concurrently as designed for OS/390 V2.10 and for the new functions. This means that there are DVIPAs for use with the Cisco Service Manager function in the Sysplex Distributor, and DVIPAs for use in the previous V2.10 workload distribution solution, which may be defined in the TCP/IP profile.

5.5.2 Overview of IBM Sysplex Distributor with Service Manager
In the Cisco MNLB configuration, the Service Manager provides a load-balancing algorithm for the distribution. To get workload data for the load-balancing process, the Service Manager communicates with the Cisco Workload Agent in each server. This program retrieves information from the OS/390 WLM. The Service Manager uses the Dynamic Feedback Protocol (DFP) to communicate with the Workload Agent. After the load-balancing process, the Service Manager selects the appropriate application server from a server cluster. Finally, it informs the Forwarding Agent to which real server IP address the connection request has to be forwarded. This information is exchanged between the Service Manager and the Forwarding Agent via the Cisco Appliance Services Architecture (CASA) protocol.

In the IBM z/OS solution, the Sysplex Distributor now has the functions of the Cisco Service Manager. The Sysplex Distributor uses the same technology as in the previous release: it uses a distributed DVIPA as the cluster address, and it selects the "best" application server based on z/OS Workload Manager (WLM) weighted information. This selection is extended by further information about QoS and about policies defined in the Policy Agent’s database, such as the LDAP server or a private database.
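The idea of selecting a server from WLM weighted information can be pictured with a small sketch. This is not the Sysplex Distributor's actual algorithm: the weights, the `build_schedule` helper and the round-robin expansion are hypothetical, and the sketch only assumes that a stack with a higher weight should receive a proportionally larger share of new connections and that a stack with weight 0 receives none.

```python
from itertools import cycle


def build_schedule(weights):
    """Expand {server: weight} into a list with one slot per weight unit."""
    slots = []
    for server, weight in sorted(weights.items()):
        slots.extend([server] * weight)
    return slots


# Hypothetical WLM weights for three target stacks (illustrative values only).
wlm_weights = {"x.x.x.1": 3, "x.x.x.2": 1, "x.x.x.3": 0}

schedule = build_schedule(wlm_weights)

# Hand out the next eight connection requests by cycling through the schedule.
picker = cycle(schedule)
first_eight = [next(picker) for _ in range(8)]
print(first_eight)
```

With these assumed weights, three of every four new connections go to x.x.x.1 and none go to x.x.x.3. QoS and policy data would act as further inputs to the weights in a real selection.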


The IBM Sysplex Distributor also uses the CASA protocol to exchange data with the external Cisco Forwarding Agent in a switch or router. The Sysplex Distributor tells the Forwarding Agent which real IP address is valid for the application server in the connection request (see 5.5.5, “Connection establishment process” on page 145). To let the Forwarding Agent know which cluster server IP addresses and ports are maintained by the Sysplex Distributor, a device activation process has to be started before the first connection request is resolved. This is done by sending the Forwarding Agent a CASA packet with information such as:
• The destination IP address (a DVIPA) and port that define the cluster address
• The local address (a Dynamic XCF IP address) and port that the Forwarding Agent is to use for CASA packets

This information, called the wildcard cache (see Figure 5-9 on page 149), causes the Forwarding Agent to watch for packets containing the destination DVIPA with the specified port. If such a packet is a connection request, the Forwarding Agent obtains the real server IP address for the cluster address from the Sysplex Distributor, which it reaches at the specified Dynamic XCF IP address, and records it in the affinity table (see Figure 5-8 on page 149).

5.5.3 Cisco Forwarding Agent, overview and functions
The Cisco Forwarding Agent works as described under 5.4.1, “Overview of the MultiNode Load Balancing (MNLB) functions” on page 136. It is responsible for detection of all incoming connection requests, requesting the real server’s IP address from the Sysplex Distributor and routing the connection requests and all subsequent datagrams directly to the real server. The Forwarding Agent creates an affinity record for all connections, which is updated when the Service Manager returns the real server IP address.

5.5.4 Cisco Workload Agent
The Workload Agent resides in the system of the application server. It provides feedback to the Service Manager concerning workload and availability. The Workload Agent uses the DFP protocol to provide the Service Manager with workload information. The Cisco OS/390 Workload Agent is an implementation of the Workload Agent. It is a program running in an OS/390 address space that retrieves workload information from the IBM OS/390 Workload Manager (WLM).

In a sysplex configuration, the Sysplex Distributor communicates directly with the IBM Workload Managers of each system/LPAR. Therefore, the Cisco OS/390 Workload Agent is not required.


5.5.5 Connection establishment process
The following description is an overview of the TCP connection establishment process, which is illustrated in Figure 5-7 on page 147.
1. The client starts a TCP connection request by logging in to a TN3270E server, FTP server, Web server, and so on. As the host name, the client selects not a real application server but a server group or cluster name within the sysplex cluster. A domain name server translates the host name into an IP address for the application group within the sysplex cluster.
2. The connection request, a TCP SYN request, is transmitted to the IP network. The first router (the default router for the client) finds in its routing table the IP address of the sysplex cluster, which is defined as a virtual IP address in a table of the Cisco Forwarding Agent responsible for the requested sysplex cluster. We explain later how this sysplex cluster IP address, a dynamic virtual IP address (DVIPA), is registered in the Forwarding Agent's table and propagated to the network.
3. When the Forwarding Agent receives an IP packet, it examines the content of the packet. If the packet is a connection request (marked by the SYN bit in the TCP header), the Forwarding Agent checks the cluster IP address to see which Service Manager is responsible for the cluster address, which represents a group of application servers within the sysplex. This cluster address was previously multicast by the Service Manager in the Sysplex Distributor to all Forwarding Agents, and the Forwarding Agent has propagated it to the network. After locating the responsible Service Manager, the Forwarding Agent forwards the received connection request, encapsulated as a CASA packet, to the Service Manager in the IBM Sysplex Distributor.
4. The Service Manager in the Sysplex Distributor uses the WLM and QoS definitions to select the "best" application server within the sysplex and forwards the connection request to the real target server for the client/server connection. The connection request is forwarded via the XCF link to this target server. The XCF link IP address is a dynamically created IP address. It is built when the TCP/IP stack comes up and starts connections to other TCP/IP stacks within the sysplex.
5. The Service Manager also sends a unicast packet to the Forwarding Agent containing the selected target server IP address for this TCP connection. This is the XCF link IP address of the target server. The Forwarding Agent updates its affinity cache with the real server address.
6. The TCP/IP stack of the application server system/LPAR checks whether the IP and TCP headers are valid. If they are, TCP returns an acknowledgment (ACK) for the received SYN request and also sends a SYN request to the client to establish a full-duplex connection.


7. The client returns an ACK for the SYN of the server. This packet does not use the path to the server via the Sysplex Distributor and the XCF link, as in OS/390 V2.10, but the direct path via the Forwarding Agent’s router and the OSA-Express adapter to the target server.
   – If the path from the Forwarding Agent to the TCP/IP stack goes via an OSA-Express adapter and this adapter shares its Ethernet attachment with multiple LPARs, the Forwarding Agent encapsulates the IP datagram in a Generic Routing Encapsulation (GRE) frame.
8. After the target server receives the last ACK from the client, the full-duplex TCP connection is established and the first data is sent to the client.
9. Data sent from the client uses the direct path to the target server and does not touch the Sysplex Distributor.
10. During the connection establishment process, the target server informs the Sysplex Distributor about the connection state. The Sysplex Distributor updates its connection routing table.

Other Forwarding Agents may also be used on the path from the client to the server and back. These Forwarding Agents allow multiple paths for load-balancing traffic over the network and for backup purposes. If another Forwarding Agent receives an IP packet for the referenced application server, it checks its affinity tables for an existing source-destination IP and port address affinity.
– If a matching record is found, the client packet is directed to the real server IP address to which a connection already exists, but now over a parallel path via another Forwarding Agent.
– If no matching record is found (for example, no TCP connection is reported yet, or an existing connection was initiated via another Forwarding Agent), the IP packet is sent to the Service Manager, which is known through its IP address that was previously multicast together with the cluster addresses. The flow continues using the same procedure described in steps 5 to 10.
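The decision a Forwarding Agent makes for each packet in the steps above can be sketched as follows. This is a simplified model under stated assumptions, not Cisco's implementation: the `Packet` and `ForwardingAgent` classes and the `ask_service_manager` callback are hypothetical names, and the CASA exchange with the Service Manager is reduced to a single function call that returns the real server address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    sport: int
    dst: str
    dport: int


class ForwardingAgent:
    def __init__(self, cluster_services, ask_service_manager):
        # (cluster address, port) pairs announced by the Service Manager
        self.cluster_services = cluster_services
        # Hypothetical stand-in for the CASA request/unicast reply
        self.ask_service_manager = ask_service_manager
        # Affinity cache: connection key -> real server address
        self.affinity = {}

    def handle(self, pkt):
        key = (pkt.src, pkt.sport, pkt.dst, pkt.dport)
        real_server = self.affinity.get(key)
        if real_server is not None:
            # Existing affinity: take the direct path (steps 7 and 9).
            return ("direct", real_server)
        if (pkt.dst, pkt.dport) in self.cluster_services:
            # No affinity yet: hand the packet to the Service Manager and
            # record the real server it reports back (steps 3 to 5).
            self.affinity[key] = self.ask_service_manager(pkt)
            return ("via_service_manager", self.affinity[key])
        # Not a cluster address this agent watches: ordinary routing.
        return ("normal_routing", pkt.dst)


agent = ForwardingAgent({("a.a.a.a", 23)}, lambda pkt: "x.x.x.1")
syn = agent.handle(Packet("y.y.y.2", 4213, "a.a.a.a", 23))
ack = agent.handle(Packet("y.y.y.2", 4213, "a.a.a.a", 23))
other = agent.handle(Packet("y.y.y.2", 4300, "z.z.z.9", 80))
print(syn, ack, other)
```

The first packet of the telnet connection goes through the Service Manager; every later packet with the same addresses and ports is sent directly to x.x.x.1.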


Figure 5-7 Establishment of a telnet connection
(Diagram: clients y.y.y.1 and y.y.y.2 in the IP network send an ftp connection request to a.a.a.b and a telnet connection request to a.a.a.a through routers to switches/routers with Forwarding Agents and their affinity tables. In the sysplex, the Sysplex Distributor (Service Manager), the backup Sysplex Distributor (Service Manager) and the target stacks for DVIPAs a.a.a.a and a.a.a.b are connected via XCF, with addresses x.x.x.1 to x.x.x.4. Numbered flows: 1, 2 SYN; 3 CASA packet; 4 forwarded packet; 5 CASA unicast; 6 SYN, ACK; 7 ACK; 8 connection established, data; 9 data; 10 connection information.)

Figure 5-7 shows a telnet connection from a client to a telnet server group a.a.a.a. For this connection, the Sysplex Distributor selected the real server IP address x.x.x.1. Depending on the load-balancing process, another client connection may be routed to a different server; in the figure, the other client has a connection to the real server x.x.x.2.


5.5.6 Failure of application server, TCP/IP stack, system/LPAR
If the application server, the TCP/IP stack, or the system/LPAR fails, there is no real IP address available and the client connection is lost. The Sysplex Distributor learns about this failure through WLM information and, if the stack fails, through the lost XCF link connection. The client may immediately start a new connection to the same cluster address (for example, a.a.a.a for telnet). The Forwarding Agent sets up a new CASA request to the Sysplex Distributor. The Sysplex Distributor looks for an alternative running application instance on another TCP/IP stack and returns another real IP address to the Forwarding Agent (for example, x.x.x.2). The connection establishment process continues as described before.

5.5.7 Failure of the Sysplex Distributor
If the Sysplex Distributor fails, the backup Sysplex Distributor takes over all responsibilities of the primary Sysplex Distributor. The backup Sysplex Distributor learns of a failing application or TCP/IP stack via the cross-system coupling facility (XCF) and via WLM. If the whole system is down, the WLMPOLL of the backup Sysplex Distributor times out. This is the signal for the takeover process. For detailed information about the takeover and takeback process, see 5.2.6, “Takeover/takeback” on page 130.

5.5.8 Routing packets
Inbound packets are routed by the Forwarding Agent (installed in the Cisco switch). Inbound packets carry the cluster IP address of the application server. The Forwarding Agent looks at the content of the packet and compares the information with its affinity cache. If it finds a matching entry, it knows that there is a running connection and routes the packet to the associated real IP address of the application server. Outbound packets are sent via the switch or router to the intranet. The OSPF load-balancing mechanism may be used to distribute the load over parallel equal-cost paths. The information in the affinity cache consists of:
• Source IP address/port
• Destination IP address/port
• Protocol
• Real IP address

Figure 5-8 shows the content of the affinity cache based on the previous connection sample.


Affinity Table
                                                 internal connection information
Source Address   Port   Dest. Address   Port     Prot   real server
y.y.y.2          4213   a.a.a.a         23       TCP    x.x.x.1
a.a.a.a          23     y.y.y.2         4213     TCP    x.x.x.1
y.y.y.1          4178   a.a.a.b         20       TCP    x.x.x.2
y.y.y.1          4178   a.a.a.b         21       TCP    x.x.x.2
a.a.a.b          20     y.y.y.1         4178     TCP    x.x.x.2
a.a.a.b          21     y.y.y.1         4178     TCP    x.x.x.2

Figure 5-8 Affinity table cache
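One way to picture the six rows of Figure 5-8 is as a lookup table keyed by source address/port, destination address/port and protocol. The sketch below is only an illustration: the `affinity_entries` helper and the use of a Python dictionary are assumptions, not the Forwarding Agent's internal format. It shows how each connection contributes one inbound and one outbound row that point to the same real server.

```python
def affinity_entries(client, client_port, cluster, cluster_port, real_server, prot="TCP"):
    """Return the inbound and outbound affinity rows for one connection."""
    inbound = (client, client_port, cluster, cluster_port, prot)
    outbound = (cluster, cluster_port, client, client_port, prot)
    return {inbound: real_server, outbound: real_server}


cache = {}
# Telnet connection from y.y.y.2 to cluster address a.a.a.a
cache.update(affinity_entries("y.y.y.2", 4213, "a.a.a.a", 23, "x.x.x.1"))
# ftp connections from y.y.y.1 to cluster address a.a.a.b (ports 20 and 21)
cache.update(affinity_entries("y.y.y.1", 4178, "a.a.a.b", 20, "x.x.x.2"))
cache.update(affinity_entries("y.y.y.1", 4178, "a.a.a.b", 21, "x.x.x.2"))

# An inbound telnet packet is mapped to the real server by a single lookup.
telnet_target = cache.get(("y.y.y.2", 4213, "a.a.a.a", 23, "TCP"))
# A packet with no affinity yields None and would go to the Service Manager.
unknown_target = cache.get(("y.y.y.9", 5000, "a.a.a.a", 23, "TCP"))
print(len(cache), telnet_target, unknown_target)
```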

5.5.9 Additional tasks of the MNLB components

Service Manager
The Service Manager is configured with DVIPAs that specify the cluster addresses, each of which maps to a real server address. The Service Manager sends its mapping information to Forwarding Agents via the CASA protocol, using a multicast (Internet Group Management Protocol, IGMP) IP address and port. This initial multicast contains wildcard information to offer services for the cluster addresses (in our sample a.a.a.a and a.a.a.b). The information creates the following wildcard cache entries for the inbound and outbound traffic to be observed by the Forwarding Agent. The Sysplex Distributor offers the TCP protocol.

Source Address   Source Mask       Port   Dest Address   Dest Mask         Port   Prot
0.0.0.0          0.0.0.0           0      a.a.a.a        255.255.255.255   23     TCP
0.0.0.0          0.0.0.0           0      a.a.a.b        255.255.255.255   20     TCP
0.0.0.0          0.0.0.0           0      a.a.a.b        255.255.255.255   21     TCP
a.a.a.a          255.255.255.255   23     0.0.0.0        0.0.0.0           0      TCP
a.a.a.b          255.255.255.255   20     0.0.0.0        0.0.0.0           0      TCP
a.a.a.b          255.255.255.255   21     0.0.0.0        0.0.0.0           0      TCP

Figure 5-9 Wildcard cache

The first three lines in Figure 5-9 determine that the Forwarding Agent accepts incoming client packets from any IP address with any source port for the cluster addresses with the defined destination and port address (for example, a.a.a.a, 23). Lines 4 to 6 describe the allowed outbound address combinations.
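The matching rule behind the wildcard cache can be sketched as an address-and-mask comparison. The sketch rests on two assumptions that go beyond the text: a mask of 0.0.0.0 means "any address" because every bit is ignored, and a port value of 0 means "any port". The helper names are hypothetical, and 192.0.2.10 is only a numeric stand-in for the cluster address a.a.a.a.

```python
import ipaddress


def masked_equal(addr, pattern, mask):
    """Compare two IPv4 addresses on the bits selected by the mask."""
    a = int(ipaddress.IPv4Address(addr))
    p = int(ipaddress.IPv4Address(pattern))
    m = int(ipaddress.IPv4Address(mask))
    return (a & m) == (p & m)


def port_equal(port, pattern):
    """Assumption: a pattern of 0 matches any port."""
    return pattern == 0 or port == pattern


def matches(entry, src, sport, dst, dport):
    return (masked_equal(src, entry["src"], entry["src_mask"])
            and port_equal(sport, entry["sport"])
            and masked_equal(dst, entry["dst"], entry["dst_mask"])
            and port_equal(dport, entry["dport"]))


CLUSTER = "192.0.2.10"  # hypothetical stand-in for a.a.a.a

# First row of Figure 5-9: any client, any source port, to the cluster address on port 23
inbound_telnet = {"src": "0.0.0.0", "src_mask": "0.0.0.0", "sport": 0,
                  "dst": CLUSTER, "dst_mask": "255.255.255.255", "dport": 23}

hit = matches(inbound_telnet, "198.51.100.7", 4213, CLUSTER, 23)
wrong_port = matches(inbound_telnet, "198.51.100.7", 4213, CLUSTER, 80)
wrong_host = matches(inbound_telnet, "198.51.100.7", 4213, "192.0.2.11", 23)
print(hit, wrong_port, wrong_host)
```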


All information will be used by the Forwarding Agent for content checking of all IP packets. Content checking means that the Forwarding Agent looks into every IP datagram in order to check if it belongs to a connection establishment request or to a current connection. Thus there is a steady control of all TCP connections to the sysplex. This information is transferred to the Sysplex Distributor for display purposes and for distribution to the backup Sysplex Distributor in case of a takeover.

Backup Service Manager
The entire CASA architecture depends on the Service Manager. If the Service Manager fails, no client request for connection setup can be executed. Therefore, implementing one or more backup Service Managers is very important. The Backup Service Manager periodically registers flow information via wildcard affinities with the Forwarding Agents. If the Service Manager fails, the Forwarding Agent automatically selects, in order of precedence, the Backup Service Manager with the next higher backup precedence.

Forwarding Agent
The Forwarding Agent is responsible for the routing of IP packets. Routing is done based on the information from the Service Manager. The Forwarding Agent learns which functions to perform from the wildcards it receives from the Service Manager. This means that the Forwarding Agent has to look at IP packets for the source and destination IP addresses and ports that the Service Manager has provided in the wildcards. The Forwarding Agent then checks all IP headers of the inbound and outbound traffic to see whether a new connection request has come in or whether a registration has already occurred. A registration is an entry in the affinity cache.
• If there is no registration, the Service Manager is contacted to provide the IP address of the “best” target server.
• If the packet matches the information of an affinity entry in the cache, a connection already exists.

The Forwarding Agent is able to receive wildcard information via the same multicast IGMP address and port as the Service Manager. Thus the Sysplex Distributor running the CASA protocol has to be configured with the same multicast IGMP address and port as the Forwarding Agent.


Part 2. Implementation examples

© Copyright IBM Corp. 2002


Chapter 6. Configuring CLAW, MPC+ and OSA-Express

This chapter describes how to configure IBM z/OS IOCP, VTAM, TCP/IP and Cisco 7x00 series routers containing CIP(s) or ECPA(s) for CLAW, MPC+ and QDIO. Support for CLAW is contained in every level of IOS/CIP/CPA microcode. Support for CMPC+ is in IOS 12.0(3)T (and all later releases) running CIP or CPA 27-1 (and all later releases). Note that 12.0(3)T and CIP/CPA 27-1 are the minimum levels of IOS and CIP/xCPA microcode. They are not the currently suggested levels. If not running an ECPA4, then a suggested IOS level would be the latest 12.1 mainline (that is, having no letter suffix, such as T) along with the latest 27 level CIP/CPA microcode. The most current IOS and CIP/CPA levels are 12.2 and 28-x. These are required if the CPA is an ECPA4.

In the network created for this book, we have the following hardware:
• Sysplex consisting of four simulated LPARs in four virtual machines running under z/VM.
  Note: The real device addresses and the virtual device addresses are the same.
• ESCON director.
• OSA-Express adapter that is shared among the four LPARs.
• Cisco 7507 router with a dual-port CIP, dual-port Fast Ethernet port adapter (PA) and a four-port serial PA.


• Cisco 7206 with an ECPA, a Gigabit Ethernet PA and a four-port serial PA.
• Cisco Catalyst 6509 switch with a dual-port Gigabit Ethernet blade and a 48-port Fast Ethernet blade.

We chose these two routers along with the OSA-Express adapter and the Catalyst 6509 switch to demonstrate a wide variety of host connections and configurations.

We have MPC+ connections to each LPAR from the dual-port CIP. We configured the read subchannels on one physical channel interface and the write subchannels on the other physical interface to demonstrate this CMPC+ capability. This is only possible on the dual-port CIP. Depending on actual traffic patterns, it may be preferable to split the read and write subchannels differently. This setup also demonstrates the configuration differences that exist between the CIP and the CPA due to the existence of the virtual (x/2) channel interface.

We also have CLAW packing connections to the four LPARs from the Cisco 7206 with ECPA. Both the CLAW read and write subchannels must go over the same physical interface, so nothing is lost by using the (single-port) CPA.

Our third type of host connection is via the shared OSA-Express adapter and the Catalyst 6509 switch. This provides the highest performance of any of these host connections. To demonstrate two different ways of connecting routers to the switch, we connected the Cisco 7206 through a Gigabit Ethernet connection and the Cisco 7507 through a Fast EtherChannel (port-channel interface) that includes two Fast Ethernet interfaces.

Configuring a CMCC adapter, a Catalyst 6509 and their associated features requires that you perform configuration tasks on both the mainframe side and the router/switch side of the network. Our network is illustrated in Figure 6-1.


Figure 6-1 Our network
(Diagram: an IBM z/OS sysplex of four systems, MVS001, MVS062, MVS069 and MVS154, interconnected through XCF on subnet 9.67.156.72/29, hosts .73 to .76. The Sysplex Distributor (Service Manager) has static VIPA 9.67.156.1/30, dynamic VIPAs 9.67.156.25/29 and 9.67.156.26/29, and distributed VIPAs 9.67.157.17/29 and 9.67.157.18/29. The backup Sysplex Distributor (Service Manager) has static VIPA 9.67.156.5/30 and dynamic VIPAs 9.67.156.49/29 and 9.67.156.50/29. The other two systems have static VIPA 9.67.156.161/30 with dynamic VIPAs 9.67.156.33/29 and 9.67.156.34/29, and static VIPA 9.67.156.165/30 with dynamic VIPAs 9.67.156.41/29 and 9.67.156.42/29. Through the ESCON Director, the sysplex attaches to the Cisco 7507 over CMPC+ with separate READ and WRITE paths on interfaces c6/0 and c6/1, subnet 9.67.156.16/29, hosts .17 to .20, router .21, and to the Cisco 7206VXR over CLAW on interface c3/0, subnet 9.67.156.64/29, hosts .66 to .69, router .65. The OSA-Express Gigabit Ethernet adapter attaches the sysplex, hosts .129 to .132, to the Catalyst 6509 on subnet 9.67.157.128/28, vlan 400. The 6509 connects to the 7507 interfaces fa0/0/0 and fa0/0/1 and to the 7206VXR interface g1/0. Addresses .136 and .137 on this subnet and GRE tunnels are shown at the routers.)


6.1 Cisco CLAW support
Cisco 7000 and 7500 series routers with a CIP card and 7200 series routers with a CPA card supply CLAW channel connectivity to the IBM z/OS system via an ESCON channel. The system IOCP must be configured, the router CMCC must be configured, and the z/OS TCP/IP profile must be modified.

6.1.1 IOCP definitions for CLAW devices
Each CLAW or CMPC+ connection requires two devices out of a maximum of 256. Although this allows for a maximum of 128 CLAW and CMPC+ connections per interface, Cisco recommends a maximum of 32 connections per interface. Because the bandwidth of the ESCON channel is fixed, all CMPC, CLAW, and CSNA connections share that fixed bandwidth: the more connections that are transferring data, the less bandwidth is available for each connection.

With CLAW, both devices must exist on the same subchannel, and they use consecutive device addresses. With MPC+, the read and write devices may exist on the same subchannel or, if running a dual-port CIP, on separate ones. For both CMPC and CLAW, the CNTLUNIT and the IODEVICE should be defined as a 3172.

You can see the sample host IOCP statements that we used for our 7206 ECPA CLAW devices in Example 6-1. Because the four z/OS systems in the sysplex are really virtual machines running under z/VM, there is only one IOCP for all four z/OS systems.

Example 6-1 z/VM IOCP for all z/OS guests for CLAW

*CLAW 1FA0 - 1FA7
         RESOURCE PART=((RALVM9,1),(TIVMVS2,2),(RALNS3,3),(CARVM4,9),  X
               (TIVVM1,B),(RALHCD,F))
         CHPID PATH=A8,TYPE=CNC,SWITCH=08,                             X
               PARTITION=((RALVM9),(RALVM9),REC)
         CNTLUNIT CUNUMBR=1FA0,PATH=A8,UNIT=3172,                      X
               UNITADD=((A0,16)),LINK=AC,CUADD=2
         IODEVICE ADDRESS=(1FA0,16),CUNUMBR=1FA0,UNIT=3172

This defines 16 unit addresses (1FA0-1FAF), of which we used a total of eight (1FA0-1FA7).
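The device arithmetic above can be checked with a short sketch. The `claw_device_pairs` helper is a hypothetical name; it simply groups a range of device addresses into the consecutive pairs that CLAW connections use, starting from the `ADDRESS=(1FA0,16)` definition in Example 6-1.

```python
def claw_device_pairs(first_device, unit_count):
    """Group a range of device addresses into consecutive (even, odd) pairs."""
    base = int(first_device, 16)
    return [(f"{base + i:04X}", f"{base + i + 1:04X}")
            for i in range(0, unit_count - 1, 2)]


# ADDRESS=(1FA0,16) from Example 6-1
pairs = claw_device_pairs("1FA0", 16)

# The four CLAW connections in this chapter use the first four pairs.
used = pairs[:4]

# 256 devices per interface, two per connection
max_connections = 256 // 2

print(used)
print(max_connections)
```

The first address of each used pair (1FA0, 1FA2, 1FA4, 1FA6) is the device address that appears in the DEVICE statement of the corresponding TCP/IP profile.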


6.1.2 Router definitions
The Cisco router channel port is configured with an IP address for the connection to the z/OS TCP/IP stacks. You can see our definitions in Example 6-2.

Example 6-2 CMCC CLAW definitions

interface Channel3/0
 description CLAW interfaces to sysplex hosts
 ip address 9.67.156.65 255.255.255.248
 no ip redirects
 ip directed-broadcast
 ip ospf network point-to-multipoint
 ip igmp join-group 224.0.1.2
 no ip mroute-cache
 load-interval 30
 no keepalive
 service-policy input SETDSCP
 claw D102 A0 9.67.156.66 MVS001B C7507A PACKED PACKED broadcast
 claw D102 A2 9.67.156.67 MVS069B C7507B PACKED PACKED broadcast
 claw D102 A4 9.67.156.68 MVS154B C7507C PACKED PACKED broadcast
 claw D102 A6 9.67.156.69 MVS062B C7507D PACKED PACKED broadcast
end

The ip address command defines the IP address of the interface. The claw command has the following format:

claw path device-address ip-address host-name device-name host-app device-app [broadcast]

Where:
• The path value consists of four hexadecimal digits. The first two digits are between 01 and FF. For parallel or directly connected ESCON channels (that is, not going to an ESCON director but directly connected to the processor’s CHPID), these first two digits are 01. For a channel through an ESCON director, they give the ESCON director’s upstream port number, that is, the port on the ESCON director that connects to the CHPID. The third digit is the LPAR number. An LPAR number is only valid when the ESCON Multiple Image Facility (EMIF) is being used. The fourth digit must match the CUADD value if the IOCP defines one (CUADD=2 in Example 6-1, hence D102).
  Note: In our case, the CHPID is not SHARED so the LPAR number in the CLAW path statement is zero.
• The device-address should match the unitadd parameter on the CNTLUNIT macro for the addresses used for the device. In our examples, these values are A0, A2, A4 and A6. Because each CLAW connection uses two consecutive unit addresses, unit addresses A0 through A7 are used for these CLAW devices.
• The ip-address is the IP address of the host at the other end of the router channel interface. This address is specified in the HOME statement of the host TCP/IP configuration file.
• The host-name is the name of the system that is configured with the SYSNAME parameter in the IEASYSxx parmlib member.
• The device-name is the name of the router and can be any name.
• The host-app and device-app should be TCPIP. If the same interface is used by more than one TCP/IP stack, the host-app has to be the TCP/IP stack name.
• The broadcast parameter enables the sending of broadcasts and multicasts across the connection. This is necessary to get OSPF routing updates to and from the z/OS system, as well as Sysplex Distributor CASA multicasts from the host.
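The structure of the path value can be made concrete with a small parser. This is an illustrative sketch, not a Cisco tool: `parse_claw_path` and its field names are hypothetical, and it assumes that the fourth digit carries the CUADD value, which is how the pairing of path D102 in Example 6-2 with CUADD=2 in Example 6-1 reads.

```python
def parse_claw_path(path):
    """Split a four-digit CLAW path value into its three parts."""
    if len(path) != 4:
        raise ValueError("path must be four hexadecimal digits")
    int(path, 16)  # raises ValueError if a digit is not hexadecimal
    path = path.upper()
    return {
        "director_port": path[:2],  # 01 for a directly connected channel
        "lpar": path[2],            # 0 when the CHPID is not shared
        "cuadd": path[3],           # assumed to mirror CUADD in the IOCP
    }


parts = parse_claw_path("D102")
print(parts)
```

For the path used in this chapter, the sketch yields director port D1, LPAR number 0 and CUADD 2.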

6.1.3 Host TCP/IP profile statements
In the host, the Cisco TCP/IP interface is defined as a CLAW device. Our TCP/IP profile definition statements are shown in Example 6-3 through Example 6-6.

Example 6-3 TCP/IP profile excerpt from MVS001

;******************************************************************
; CISCO CLAW DEFINITIONS - 7200 - Z55
;******************************************************************
;
DEVICE CIP1A CLAW 1FA0 MVS001B C7507A PACKED 15 15 32768 32768
LINK CISCO1 IP 0 CIP1A
START CIP1A
;
;******************************************************************
; HOME ADDRESSES FOR THIS STACK
;******************************************************************
;
HOME
;
; ADDRESS      LINK_NAME
; ===========  =========
  9.67.156.1   VLINK0    ; Static VIPA for EE Connections
  9.67.157.129 GIGELINK  ; This goes to Cisco 7513 thru GIGE
  9.67.156.66  CISCO1    ; This goes to Cisco 7206 CLAW Packing
  9.67.156.17  CISCO2    ; This goes to Cisco 7507 CMPC

Example 6-4 TCP/IP profile excerpt from MVS062

;******************************************************************
; CISCO CLAW DEFINITIONS - 7200 - Z55
;******************************************************************
;
DEVICE CIP1A CLAW 1FA6 MVS062B C7507D PACKED 15 15 32768 32768
LINK CISCO1 IP 0 CIP1A
START CIP1A
;
;******************************************************************
; HOME STATEMENT
;******************************************************************
;
HOME
;
; ADDRESS      LINK_NAME
; ===========  =========
  9.67.156.161 VLINK0    ; Static VIPA for EE Connections
  9.67.157.130 GIGELINK  ; This goes to Cisco 7513 thru GIGE
  9.67.156.69  CISCO1    ; This goes to Cisco 7206 CLAW Packing
  9.67.156.18  CISCO2    ; This goes to Cisco 7507 CMPC

Example 6-5 TCP/IP profile excerpt from MVS069

;******************************************************************
; CISCO CLAW DEFINITIONS - 7200 - Z55
;******************************************************************
;
DEVICE CIP1A CLAW 1FA2 MVS069B C7507B PACKED 15 15 32768 32768
LINK CISCO1 IP 0 CIP1A
START CIP1A
;
;******************************************************************
; HOME ADDRESSES FOR THIS STACK
;******************************************************************
;
HOME
;
; ADDRESS      LINK_NAME
; ===========  =========
  9.67.156.5   VLINK0    ; Static VIPA for EE Connections
  9.67.157.132 GIGELINK  ; This goes to Cisco 7513 thru GIGE
  9.67.156.67  CISCO1    ; This goes to Cisco 7206 CLAW Packing
  9.67.156.20  CISCO2    ; This goes to Cisco 7507 CMPC

Example 6-6 TCP/IP profile excerpt from MVS154

;******************************************************************
; CISCO CLAW DEFINITIONS - 7200 - Z55
;******************************************************************
;
DEVICE CIP1A CLAW 1FA4 MVS154B C7507C PACKED 15 15 32768 32768
LINK CISCO1 IP 0 CIP1A
START CIP1A
; ******************************************************************
; HOME STATEMENT
;******************************************************************
;
HOME
;
; ADDRESS      LINK_NAME
; ===========  =========
  9.67.156.165 VLINK0    ; Static VIPA for EE Connections
  9.67.157.131 GIGELINK  ; This goes to Cisco 7513 thru GIGE
  9.67.156.68  CISCO1    ; This goes to Cisco 7206 CLAW Packing
  9.67.156.19  CISCO2    ; This goes to Cisco 7507 CMPC

The format of the DEVICE and LINK statements are: DEVICE dev-name CLAW cuaddr host-name router-name NONE | PACKED read-buf write-buf read-size write-size AUTORESTART LINK link-name IP 0 dev-name

Where:
• The dev-name parameter defines the interface device name.
• The cuaddr parameter must match the channel IOCP ADDRESS value in the IODEVICE macro.
• The host-name is the name of the system supplied in the router configuration above.
• The router-name should be the same as the one used in the router configuration above. This value is not used; it is only for information.
• PACKED configures the CLAW connection to run in packed mode.
• The read-buf parameter specifies the number of buffers for the read channel program. It is between 1 and 512. The default is 15.
• The write-buf parameter specifies the number of buffers for the write channel program. It is between 1 and 512. The default is 15.
• The read-size value is the size of each read buffer. This value must be either 32 KB or 60 KB for CLAW devices running in packed mode.
• The write-size value is the size of each write buffer. This value must be either 32 KB or 60 KB for CLAW devices running in packed mode.
• AUTORESTART is the parameter for restarting the device when stopped.
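The limits listed above lend themselves to a quick sanity check before a profile is activated. The following Python sketch is only an illustration: check_claw_device is a hypothetical helper, not part of any IBM or Cisco tool, and it assumes that 32 KB and 60 KB are coded as 32768 and 61440 bytes.

```python
# Hypothetical helper: checks a CLAW DEVICE statement against the limits
# quoted in the text. Assumes 32 KB = 32768 and 60 KB = 61440 bytes.
PACKED_SIZES = {32 * 1024, 60 * 1024}


def check_claw_device(statement):
    """Return a list of problems found in a CLAW DEVICE statement."""
    tokens = statement.split()
    if len(tokens) < 11 or tokens[0] != "DEVICE" or tokens[2] != "CLAW":
        return ["not a complete CLAW DEVICE statement"]
    mode = tokens[6]  # NONE or PACKED
    read_buf, write_buf, read_size, write_size = (int(t) for t in tokens[7:11])
    problems = []
    for name, value in (("read-buf", read_buf), ("write-buf", write_buf)):
        if not 1 <= value <= 512:
            problems.append(f"{name} {value} is outside 1-512")
    if mode == "PACKED":
        for name, value in (("read-size", read_size), ("write-size", write_size)):
            if value not in PACKED_SIZES:
                problems.append(f"{name} {value} is not 32 KB or 60 KB")
    return problems


# The DEVICE statement from Example 6-5 yields an empty problem list.
print(check_claw_device(
    "DEVICE CIP1A CLAW 1FA2 MVS069B C7507B PACKED 15 15 32768 32768"))
```

The check only covers the numeric parameters; it does not confirm that cuaddr, host-name, or router-name agree with the IOCP or the router configuration.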

The format of the HOME statement is:
HOME ip_address1 link_name1 ip_address2 link_name2...

Each link name must have an entry in the HOME section. This entry defines the z/OS side of the CLAW link. Its IP address must match the IP address in the CMCC’s claw statement and be in the same subnet as the router’s IP address (specified on the CMCC’s channel interface).
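The same-subnet requirement can be verified with the Python standard library. This is a minimal sketch, assuming the router address and prefix length are taken from the channel interface (9.67.156.65/29 in our 7200); the function name is hypothetical.

```python
import ipaddress


def home_in_router_subnet(home_address, router_interface):
    """True if the z/OS HOME address lies in the router interface's subnet."""
    network = ipaddress.ip_interface(router_interface).network
    return ipaddress.ip_address(home_address) in network


# CISCO1 link on MVS069 against the 7200 channel interface address.
print(home_in_router_subnet("9.67.156.67", "9.67.156.65/29"))
```

A False result for a CLAW link address usually means that the HOME entry or the claw statement was coded with an address from another link.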

6.1.4 Router show commands

The Cisco routers provide some statistics and status information with the show commands, some of which are described in the following sections.

The show interface channel command

The show interface channel 3/0 command shows the status of channel interface number 0 (the only one that ever exists on a CPA) in slot 3. The bold fields are described in detail.

Example 6-7 Show interface channel 3/0
C7200-Z55#sho int channel 3/0
Channel3/0 is up, line protocol is up
  Hardware is Escon Channel
  Description: CLAW interfaces to sysplex hosts
  Internet address is 9.67.156.65/29
  MTU 4472 bytes, BW 98304 Kbit, DLY 100 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation CHANNEL, loopback not set
  ECA adapter card
  Data transfer rate 12 Mbytes, number of subchannels 8
  Last input 00:00:00, output 00:00:01, output hang never
  Last clearing of "show interface" counters never
  Queueing strategy: fifo
  Output queue 0/40, 0 drops; input queue 0/75, 0 drops
  30 second input rate 0 bits/sec, 0 packets/sec
  30 second output rate 0 bits/sec, 0 packets/sec
     145447 packets input, 14827150 bytes, 0 no buffer
     Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
     344878 packets output, 26850192 bytes, 0 underruns
     0 output errors, 0 collisions, 1 interface resets
     0 output buffer failures, 0 output buffers swapped out
C7200-Z55#


• Channel3/0 is up, line protocol is up

These are two status fields that indicate the state of the interface and of the protocol. Both must be in the up state for normal operation. The first up indicates whether ESCON synchronization has been achieved. If it is “administratively down”, then the channel has been shut down. You must enter interface configuration mode and issue a no shut command. If the channel is simply down, then most likely an ESCON cable is not plugged in or there is a problem with the ESCON connection. The line protocol status is an internal status and should always be up when the channel status is up.

• 30 second input rate 0 bits/sec, 0 packets/sec

These two fields indicate the amount of data the entire channel interface is transferring from the z/OS system to the router, in bits per second and packets per second, over the last 30-second interval. To get a rough average of the size of the packets being sent from the z/OS system, divide the rate in bits per second by the rate in packets per second. This produces the average packet size in bits. Divide this by 8 to get the average size in bytes. The figure covers all the channel connections defined under this interface (for example, CLAW, CMPC, CSNA, etc.). For more specific, per-connection information, use the show extended channel x/y stats command.

• 30 second output rate 0 bits/sec, 0 packets/sec

These two fields indicate the amount of data the entire channel interface is transferring to the z/OS system from the router, in bits per second and packets per second, over the last 30-second interval. To get a rough average of the size of the packets being sent to the z/OS system, divide the rate in bits per second by the rate in packets per second. This produces the average packet size in bits. Divide this by 8 to get the average size in bytes. The figure covers all the channel connections defined under this interface (for example, CLAW, CMPC, CSNA, etc.). For more specific, per-connection information, use the show extended channel x/y stats command.
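The average packet size estimate amounts to dividing the bit rate by the packet rate and then converting bits to bytes, which means dividing by 8. The short Python sketch below shows the arithmetic; the function name and the sample rates are hypothetical, because the rates in Example 6-7 are zero.

```python
def average_packet_bytes(bits_per_sec, packets_per_sec):
    """Rough average packet size in bytes from the two interface rates."""
    if packets_per_sec == 0:
        return 0.0  # idle interface: no meaningful average
    bits_per_packet = bits_per_sec / packets_per_sec
    return bits_per_packet / 8  # 8 bits per byte


# Hypothetical rates: 800000 bits/sec at 100 packets/sec.
print(average_packet_bytes(800000, 100))
```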

The show extended channel statistics command

The show extended channel 3/0 statistics command gives information regarding the specific CLAW connections. In Example 6-8, the command had the path and device explicitly entered to reduce the amount of output. If the device were omitted, then all the devices utilizing that path would be displayed. If both the path and device were omitted, all the connections under that channel interface would be displayed. Fields in bold are described further following the example.

Example 6-8 show extended channel statistics
C7200-Z55#sho ext chan 3/0 stat d102 a2
Path: D102  -- ESTABLISHED
                 Command            Selective  System  Device  CU
Dev   Connects   Retries   Cancels  Reset      Reset   Errors  Busy
A2      240786    213830        14         35      0       0     0

               Blocks              Bytes           Dropped Blk   Memd
Dev-Lnk     Read   Write       Read     Write     Read   Write   wait  Con
A2-00         55       0       1760         0      222       0      0    Y
A2-01     214759       0   13136684         0       19       0      0    Y
Total:    214814       0   13138444         0      241       0      0

Last statistics 5 seconds old, next in 5 seconds
C7200-Z55#

Where:
• Path: D102 -- ESTABLISHED indicates that the information following is for all the connections utilizing channel path D102. ESTABLISHED indicates that at least one mainframe has established an ESCON logical path.
• Dev A2 indicates that the information following is for the channel connection for the device with path D102 A2.
• Connects is the number of times the channel started a channel program on the device. Each time the z/OS system issues a Start Subchannel (SSCH), the CMCC increments the Connects counter. Each time a CCW is retried, the CMCC increments the Connects counter.
• Command Retries is the number of times the CMCC adapter either had no data to send to the channel (for the read subchannel) or had no buffers to hold data from the channel (for the write subchannel). Every command retry that is resumed results in a connect. A CLAW connection (both on the read and write subchannels) runs most efficiently when the channel programs are long running. With respect to the read subchannel, if the ratio between the connects and command retries is approximately one-to-one, then the channel programs are running well. If the ratio approaches two-to-one or more, then the z/OS system is having trouble keeping up with the CMCC, and the number of read buffers (on the TCP/IP DEVICE statement) should probably be increased. With respect to the write subchannel, the number of command retries should generally be zero. If this number is non-zero, the z/OS system is having to retry the write operations, which indicates that the router is having trouble keeping up for some reason. This should generally not occur.
• Cancels increments when the CLAW connection is stopped and started. You should see this change only when the device is started and stopped in TCP/IP in z/OS. It indicates that a halt subchannel was issued.
• Selective Reset indicates that a Clear Subchannel (CSCH) was issued by the host. If this number is incrementing, it may indicate a communication problem between the router and the z/OS system. This counter may also increment if the Missing Interrupt Handler (MIH) is detecting problems for this subchannel pair on the host.
• System Reset increments each time the ESCON daughter card activates. If there are many of these, there is probably a channel problem. Possibilities include a bad channel card on the host, a bad channel cable, or a defective CIP card.
• Device Errors is an indication of a CMCC device error.
• Dev-Lnk indicates the two logical links (00 and 01) for each subchannel address. Logical Link 00 is used for CLAW control data. This is a connection-oriented communication. When the CMCC CLAW connection to the device is first activated, you will usually see 5 read and 4 write blocks on Logical Link 00. Logical Link 01 is used for data transfer. Packets flowing on Logical Link 01 are the TCP/IP packets being routed to and from the mainframe.
• Blocks Read/Write indicates the number of blocks of packets sent from the CMCC to the z/OS system (read) or sent from the z/OS system to the CMCC (write). When running in packed mode, each block can contain one or more packets and be a maximum of 32 KB or 60 KB, depending on the size specified in the z/OS TCP/IP DEVICE statement. When not running in packed mode, each block contains exactly one packet.
• Bytes Read/Write indicates the number of bytes sent from the CMCC to the z/OS system (read) or sent from the z/OS system to the CMCC (write). By dividing the number of bytes read or written by the number of blocks read or written, the average block size read or written can be calculated. When running in packed mode, you cannot determine the number of packets or calculate the average size of a packet from this display. Use the show extended channel packing stats command to get information regarding packets.
• Con, if ‘Y’, indicates that the CLAW connection was established between the router and the z/OS system. Both the read subchannel and the write subchannel must have a ‘Y’ in order for the CLAW connection to be active.
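The two calculations mentioned in this list, the connects to command retries ratio and the average block size, can be worked through with the counters from Example 6-8. The Python sketch below is an illustration only; the function names are hypothetical, and the two-to-one threshold is simply the rule of thumb given above.

```python
def connect_retry_ratio(connects, command_retries):
    """Ratio of Connects to Command Retries, or None if there are no retries."""
    if command_retries == 0:
        return None
    return connects / command_retries


def host_falling_behind(connects, command_retries):
    """Apply the two-to-one rule of thumb for the read subchannel."""
    ratio = connect_retry_ratio(connects, command_retries)
    return ratio is not None and ratio >= 2.0


def average_block_bytes(bytes_moved, blocks):
    """Average block size: bytes divided by blocks."""
    return bytes_moved / blocks if blocks else 0.0


# Counters for device A2 in Example 6-8.
print(round(connect_retry_ratio(240786, 213830), 2))    # close to one-to-one
print(host_falling_behind(240786, 213830))
print(round(average_block_bytes(13136684, 214759), 1))  # Logical Link 01
```

With these counters the ratio stays close to one-to-one, so by the guideline above the read channel programs are running well.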

The show extended channel 3/0 packing stats command gives information regarding the specific CLAW connection. In Example 6-9, the command had the path and device explicitly entered to reduce the amount of output. If the device were omitted, then all the devices utilizing that path would be displayed. If both the path and device were omitted, then all the connections under that channel interface would be displayed. Fields in bold are described further following the example.


Example 6-9 Show extended channel packing stats
C7200-Z55#sho ext chan 3/0 packing stats d102 a2
Path: D102  Devs: A2,A3  CLAW Link: 1  Read Blks: 215443  Wrt Blks: 36120
                Packets               Bytes              Drops
Linkname     Read    Write       Read      Write     Read  Write  Err  C
CONTROL        22       11        704        352        0      0    0  Y
IP         215544    36109   12137766    3423192        0      0    0  Y
CKSUM           0        0          0          0        0      0    0  N
Total:     215566    36120   12138470    3423544        0      0    0
C7200-Z55#

• Read Blks indicates the number of blocks of packets sent from the CMCC to the z/OS system. Each block can contain one or more packets and be a maximum of 32 KB or 60 KB, depending on the size specified in the z/OS TCP/IP DEVICE statement.
• Wrt Blks indicates the number of blocks of packets sent from the z/OS system to the CMCC. Each block can contain one or more packets and be a maximum of 32 KB or 60 KB, depending on the size specified in the z/OS TCP/IP DEVICE statement.
• Packets Read/Write indicates the number of packets sent from the CMCC to the z/OS system (read) or sent from the z/OS system to the CMCC (write) on a per-link basis. By dividing the number of packets read or written by the number of Read Blks or Wrt Blks, the average number of packets per block can be calculated. When done on the IP logical link, this is an indication of the effectiveness of the CLAW packing.
• Bytes Read/Write indicates the number of bytes sent from the CMCC to the z/OS system (read) or sent from the z/OS system to the CMCC (write). By dividing the number of bytes read or written by the number of blocks read or written, the average block size read or written can be calculated. By dividing the number of bytes read or written by the number of packets read or written, the average size of a packet can be calculated.
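As a worked example of the divisions described above, the following Python sketch applies them to the IP link counters from Example 6-9. The function name is hypothetical, and the reading of the result (values near one packet per block suggest that little packing took place during this lightly loaded test) is our interpretation rather than a statement from the router documentation.

```python
def packing_summary(packets, total_bytes, blocks):
    """Return (packets per block, average packet size in bytes)."""
    per_block = packets / blocks if blocks else 0.0
    per_packet = total_bytes / packets if packets else 0.0
    return per_block, per_packet


# IP link counters and block totals from Example 6-9.
read_per_block, read_avg = packing_summary(215544, 12137766, 215443)
write_per_block, write_avg = packing_summary(36109, 3423192, 36120)

print(f"read:  {read_per_block:.3f} packets/block, {read_avg:.1f} bytes/packet")
print(f"write: {write_per_block:.3f} packets/block, {write_avg:.1f} bytes/packet")
```

Because Read Blks and Wrt Blks also include the CONTROL link blocks, the per-block figures for the IP link alone are slight approximations.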

6.1.5 z/OS CLAW commands

Example 6-10 VARY START,device command
VARY TCPIP,,START,CIP1A
EZZ0060I PROCESSING COMMAND: VARY TCPIP,,START,CIP1A
EZZ0053I COMMAND VARY START COMPLETED SUCCESSFULLY
EZZ4313I INITIALIZATION COMPLETE FOR DEVICE CIP1A


Example 6-11 VARY STOP,device command
VARY TCPIP,,STOP,CIP1A
EZZ0060I PROCESSING COMMAND: VARY TCPIP,,STOP,CIP1A
EZZ0053I COMMAND VARY STOP COMPLETED SUCCESSFULLY
EZZ4315I DEACTIVATION COMPLETE FOR DEVICE CIP1A

Example 6-12 NETSTAT DEVLINKS command
D TCPIP,,NETSTAT,DEVLINKS
EZZ2500I NETSTAT CS V1R2 TCP 954
. . .
DEVNAME: CIP1A             DEVTYPE: CLAW      DEVNUM: 1FA0
  DEVSTATUS: READY         CFGPACKING: YES    ACTPACKING: PACKED
  LNKNAME: CISCO1          LNKTYPE: CLAW      LNKSTATUS: READY
    NETNUM: 0   QUESIZE: 0
    BYTESIN: 231120        BYTESOUT: 668636
  BSD ROUTING PARAMETERS:
    MTU SIZE: 04096        METRIC: 25
    DESTADDR: 0.0.0.0      SUBNETMASK: 255.255.255.248
  MULTICAST SPECIFIC:
    MULTICAST CAPABILITY: YES
    GROUP          REFCNT
    -----          ------
    224.0.0.5      0000000001
    224.0.0.1      0000000001

6.2 Cisco CMPC+ support

The IBM MPC+ protocol has been implemented in Cisco routers as Cisco CMPC+ support with Cisco IOS Release 12.0(3)T and later router software. CMPC+ requires the use of two data channels, one for read and the other for write. These can run over the same physical channel or over different physical channels. An MPC+ definition requires the IOCP channels to be defined as CTC. On the host side, a VTAM TRLE definition for MPC and TCP/IP profile interface definitions are also required. On the router side, router configuration is required. Figure 6-2 shows the connection of the Cisco router to the IBM z/OS system.


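[Figure: a host TCP/IP stack and VTAM TRLE connected over an ESCON or bus-and-tag channel, with separate WRITE and READ paths, to the Cisco channel adapter running CMPC+ in a router with IOS 12.0(3) or later; the router's LAN/WAN interfaces lead to the IP network.]

Figure 6-2 Cisco MPC+ channel connection to IBM z/OS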

6.2.1 IOCP definitions for CMPC+ devices

Each CMPC+ connection requires two devices out of a maximum of 256. Although this allows for a maximum of 128 CLAW and CMPC+ connections per interface, Cisco recommends a maximum of 32 connections per interface. Because the bandwidth of the ESCON channel is fixed, all CMPC, CLAW, and CSNA connections share that fixed bandwidth. The more connections that are defined, the less bandwidth is available for each connection.


With MPC+ the read and write may exist on the same subchannel or, if running a dual-port CIP, may be on separate ones. For both CMPC and CLAW, the CNTLUNIT should be defined as a 3172 and the IODEVICE should also be defined as a 3172. You can see sample host IOCP statements that we used for our 7507 CIP CMPC+ devices in Example 6-13. Since the four z/OS systems in the sysplex are really virtual machines running under z/VM, there is only one IOCP for all four z/OS systems.

Example 6-13 z/VM IOCP for all z/OS guests for CMPC+
*CMPC+
 RESOURCE PART=((RALVM9,1),(TIVMVS2,2),(RALNS3,3),(CARVM4,9),          X
               (TIVVM1,B),(RALHCD,F))
 CHPID PATH=6A,TYPE=CNC,SWITCH=05,PARTITION=(RALVM9,REC)
 CHPID PATH=68,TYPE=CNC,SWITCH=01,PARTITION=(RALVM9,REC)
*Read 21E0 - 21E6
*Write 1E10 - 1E16
 CNTLUNIT CUNUMBR=21E0,PATH=6A,UNIT=SCTC,LINK=AD,                      X
               UNITADD=((E0,16)),CUADD=2
*
 CNTLUNIT CUNUMBR=1E10,PATH=68,UNIT=3172,CUADD=0,                      X
               UNITADD=((10,16)),LINK=F9
 IODEVICE ADDRESS=(21E0,16),CUNUMBR=21E0,UNIT=SCTC
 IODEVICE ADDRESS=(1E10,16),CUNUMBR=1E10,UNIT=3172

This defines 32 unit addresses (21E0-21EF and 1E10-1E1F).
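The address ranges follow directly from the base address and count coded on each IODEVICE statement. The Python sketch below expands them; device_range is a hypothetical helper used only to illustrate the arithmetic.

```python
def device_range(base_hex, count):
    """Expand IODEVICE ADDRESS=(base,count) into a list of device numbers."""
    base = int(base_hex, 16)
    return [format(base + offset, "04X") for offset in range(count)]


read_devices = device_range("21E0", 16)   # IODEVICE ADDRESS=(21E0,16)
write_devices = device_range("1E10", 16)  # IODEVICE ADDRESS=(1E10,16)

print(read_devices[0], read_devices[-1])
print(write_devices[0], write_devices[-1])
print(len(read_devices) + len(write_devices))
```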

6.2.2 Cisco MPC+ router definitions

Cisco routers support CMPC+ with IOS Release 12.0(3)T and later on 7000, 7200 and 7500 series platforms. In our network, we have a dual-port CIP in slot 6 in a 7507 router. We will configure the CIP interfaces channel 6/0, 6/1 and 6/2. If a CPA is being used, configure channel x/0 only.


We configured the CIP to have its read devices on channel 6/0 and its write devices on channel 6/1 as shown in Example 6-14. This was done to demonstrate this capability of the dual-port CIP. In a real installation, you should analyze your traffic patterns and split the read and write devices accordingly. Also, the TG is defined on the virtual channel interface, channel 6/2. The IP address 9.67.156.21 is the IP address of this interface as defined in the HOME statement in the TCP/IP profile.

Example 6-14 Cisco MPC+ configuration
NIVT7507#
interface Channel6/0
 description CPMC read interfaces to sysplex hosts
 no ip address
 ip directed-broadcast
 load-interval 30
 no keepalive
 cmpc CA02 E0 TGMVS001 READ
 cmpc CA02 E2 TGMVS062 READ
 cmpc CA02 E4 TGMVS069 READ
 cmpc CA02 E6 TGMVS154 READ
!
interface Channel6/1
 description CMPC write interfaces to sysplex hosts
 no ip address
 ip directed-broadcast
 ip ospf network point-to-multipoint
 load-interval 30
 no keepalive
 cmpc D900 10 TGMVS001 WRITE
 cmpc D900 12 TGMVS062 WRITE
 cmpc D900 14 TGMVS069 WRITE
 cmpc D900 16 TGMVS154 WRITE
!
interface Channel6/2
 description TN3270 Server Port
 ip address 9.67.156.21 255.255.255.248
 ip ospf network point-to-multipoint
 ip igmp join-group 224.0.1.2
 no keepalive
 tg TGMVS001 ip 9.67.156.17 9.67.156.21 broadcast
 tg TGMVS062 ip 9.67.156.18 9.67.156.21 broadcast
 tg TGMVS069 ip 9.67.156.19 9.67.156.21 broadcast
 tg TGMVS154 ip 9.67.156.20 9.67.156.21 broadcast
!


A CPA would have both the read and write devices defined on the same channel because the CPA has only one channel interface. Also, it does not utilize the /2 virtual channel interface for the TG.

The format of the CMPC and TG statements is:
cmpc path device tg-name {read | write}
tg tg-name {ip | hsas-ip} host-ip-addr local-ip-addr broadcast

Where:
• The path value’s first two digits are between 01 and FF. For parallel or directly connected ESCON channels (that is, not going to an ESCON director but directly connected to the processor’s CHPID), these first two digits are 01. For a channel through an ESCON director, this value shows the ESCON director’s upstream port number. This is the port number on the ESCON director between it and the CHPID. The third digit shows the LPAR number. An LPAR number is only valid when the ESCON Multiple Image Facility (EMIF) is being used. The last digit is generally not used, but if the IOCP has a cuadd definition, this value must match it.
Note: In our case, the CHPID is not SHARED so the LPAR number in the CMPC path statements is zero.
• The device-address should match the unitadd parameter on the CNTLUNIT macro for the addresses used for the device. In our example, this value is 02. That means unit addresses 02 and 03 are used for this CLAW device.
• The tg-name is the name of the CMPC+ TG. The maximum length of the name is 8 characters. You must use the same tg-name on exactly one read device, one write device and one tg command. This can be any name and does not have to match any value defined in the z/OS system.
• The read and write values define this particular device as a read subchannel or write subchannel. These should match what is specified in the TRL major node.
• ip specifies that this TG connects to the TCP/IP stack.
• The host-ip-addr value specifies the IP address of the channel-attached host using this TG. This is the IP address in the HOME statement for the MPC+ device in the host TCP/IP profile.
• The local-ip-addr value must match an IP address configured on the virtual channel interface (the /2 interface). This specifies the IP address of the router to be used for this TG. Since we will be running the OSPF routing protocol, do not specify this address in the DEFAULTNET statement in the host TCP/IP profile.
• The broadcast parameter enables the sending of broadcasts and multicasts across the CMPC+ connection. This is necessary to get OSPF routing updates to and from the z/OS system, as well as Sysplex Distributor CASA multicasts from the host.
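Two of the rules above are easy to check mechanically: how the four-digit path breaks down, and the requirement that each tg-name appears on exactly one read device, one write device and one tg command. The Python sketch below is a minimal illustration under those stated rules; both function names are hypothetical and the parsing assumes the statement layout shown in the format lines above.

```python
from collections import Counter


def split_path(path):
    """Split a four-digit CMPC path into the parts described in the text."""
    return {"port": path[:2], "lpar": path[2], "cuadd": path[3]}


def unpaired_tgs(config_lines):
    """Return TG names lacking exactly one read, one write and one tg line."""
    seen = Counter()
    for line in config_lines:
        words = line.split()
        if len(words) >= 5 and words[0] == "cmpc":
            seen[(words[3], words[4].lower())] += 1  # tg-name, read or write
        elif len(words) >= 2 and words[0] == "tg":
            seen[(words[1], "tg")] += 1
    names = {name for name, _ in seen}
    return sorted(
        name for name in names
        if any(seen[(name, kind)] != 1 for kind in ("read", "write", "tg"))
    )


config = [
    "cmpc CA02 E0 TGMVS001 READ",
    "cmpc D900 10 TGMVS001 WRITE",
    "tg TGMVS001 ip 9.67.156.17 9.67.156.21 broadcast",
]
print(split_path("CA02"))
print(unpaired_tgs(config))
```

An empty list from unpaired_tgs means that every TG in the supplied lines has its read device, write device and tg command.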

6.2.3 MPC+ host definitions

There are IOCP statements defining the read and write devices for each MPC+ connection. Also, there are VTAM TRLE macros (in a TRL major node) that define the devices to VTAM. Lastly, there are TCP/IP profile statements that define the MPC+ connection to the stack.

IOCP statements

The 7507 has two CHPIDs for its two channel interfaces through an ESCON director. This IOCP definition is shown in Example 6-15. In our example, the first two unit addresses (00 and 01) are used for CMPC+.

Example 6-15 CMPC IOCP definition sample
 CHPID PATH=12,TYPE=CNC
 CNTLUNIT CUNUMBR=1000,PATH=(12),UNIT=SCTC,                            X
               UNITADD=(00,16)
 IODEVICE ADDRESS=(800,8),CUNUMBR=800,UNIT=RS6K

VTAM TRL major node

The VTAM TRLE definitions are shown in Examples 6-16 through 6-19. The TRLE name must match the MPC device name given in the TCP/IP profile as shown in Examples 6-20 through 6-23. For our example, these names are N04CMPC through N07CMPC.

Example 6-16 MVS001 MPC+ TRLE
***********************************************************
* TRL TO CISCO 7507 - CMPC
***********************************************************
*
N04CMPC  TRLE  LNCTL=MPC,MAXBFRU=255,REPLYTO=25.5,MAXREADS=8,          *
               STORAGE=DS,MPCLEVEL=HPDT,                               *
               READ=(21E0),WRITE=(1E10)

Example 6-17 MVS062 MPC+ TRLE
***********************************************************
* TRL TO CISCO 7507 - CMPC
***********************************************************
*
N05CMPC  TRLE  LNCTL=MPC,MAXBFRU=255,REPLYTO=25.5,MAXREADS=8,          *
               STORAGE=DS,MPCLEVEL=HPDT,                               *
               READ=(21E2),WRITE=(1E12)

Example 6-18 MVS069 MPC+ TRLE
***********************************************************
* TRL TO CISCO 7507 - CMPC
***********************************************************
*
N07CMPC  TRLE  LNCTL=MPC,MAXBFRU=255,REPLYTO=25.5,MAXREADS=8,          *
               STORAGE=DS,MPCLEVEL=HPDT,                               *
               READ=(21E6),WRITE=(1E16)

Example 6-19 MVS154 MPC+ TRLE
***********************************************************
* TRL TO CISCO 7507 - CMPC
***********************************************************
*
N06CMPC  TRLE  LNCTL=MPC,MAXBFRU=16,REPLYTO=25.5,MAXREADS=8,           *
               STORAGE=DS,MPCLEVEL=HPDT,                               *
               READ=(21E4),WRITE=(1E14)

The TRLE macro has the following key parameters:
• LNCTL=MPC must be specified for an MPC+ connection.
• MAXBFRU specifies the number of 4 KB pages used for reading data. 16 is the maximum and it should be specified.
• MAXREADS specifies the number of read CCWs contained in the read channel program. Multiplying this by MAXBFRU gives the amount of fixed storage, in 4 KB pages (locked into real processor memory), in use by the read channel program.
• STORAGE=DS specifies that the virtual storage used for the read and write channel programs is in a CSM data space.
• MPCLEVEL=HPDT specifies that this MPC connection utilizes High Performance Data Transfer. This is the “+” in MPC+. This is required.
• READ and WRITE list the device addresses used for this MPC+ connection. Only one read device and one write device are supported by the CMCC.
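The fixed storage figure described for MAXREADS can be worked out as pages times page size. The following Python sketch is a hedged illustration: the function name is hypothetical, and it assumes that each read CCW uses MAXBFRU pages of 4096 bytes, as the parameter descriptions above suggest.

```python
PAGE_BYTES = 4096  # each MAXBFRU buffer unit is one 4 KB page


def read_fixed_storage_bytes(maxbfru, maxreads):
    """Fixed storage used by the read channel program, in bytes."""
    return maxbfru * maxreads * PAGE_BYTES


# MAXBFRU=16 and MAXREADS=8, as coded in Example 6-19.
storage = read_fixed_storage_bytes(16, 8)
print(storage, "bytes =", storage // 1024, "KB")
```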

TCP/IP profile statements

The TCP/IP profile excerpts are shown in Example 6-20 through Example 6-23. Note that the address listed under the HOME statement in each stack must match the first IP address specified in the tg command in the router.


Example 6-20 MVS001 TCP/IP profile excerpt for MPC+
;******************************************************************
; CISCO 7507 - CMPC Channel
;******************************************************************
;
DEVICE N04CMPC MPCPTP
LINK CISCO2 MPCPTP N04CMPC
START N04CMPC
;
;******************************************************************
; HOME ADDRESSES FOR THIS STACK
;******************************************************************
;
HOME
;
; ADDRESS       LINK_NAME
; ===========   =========
  9.67.156.1    VLINK0     ; Static VIPA for EE Connections
  9.67.157.129  GIGELINK   ; This goes to Cisco 7513 thru GIGE
  9.67.156.66   CISCO1     ; This goes to Cisco 7206 CLAW Packing
  9.67.156.17   CISCO2     ; This goes to Cisco 7507 CMPC

Example 6-21 MVS062 TCP/IP profile excerpt for MPC+
;******************************************************************
; CISCO 7507 - CMPC Channel
;******************************************************************
;
DEVICE N05CMPC MPCPTP
LINK CISCO2 MPCPTP N05CMPC
START N05CMPC
;
;******************************************************************
; HOME STATEMENT
;******************************************************************
;
HOME
;
; ADDRESS       LINK_NAME
; ===========   =========
  9.67.156.161  VLINK0     ; Static VIPA for EE Connections
  9.67.157.130  GIGELINK   ; This goes to Cisco 7513 thru GIGE
  9.67.156.69   CISCO1     ; This goes to Cisco 7206 CLAW Packing
  9.67.156.18   CISCO2     ; This goes to Cisco 7507 CMPC
;


Example 6-22 MVS069 TCP/IP profile excerpt for MPC+
;******************************************************************
; CISCO 7507 - CMPC Channel
;******************************************************************
;
DEVICE N07CMPC MPCPTP
LINK CISCO2 MPCPTP N07CMPC
START N07CMPC
;
;******************************************************************
; HOME ADDRESSES FOR THIS STACK
;******************************************************************
;
HOME
;
; ADDRESS       LINK_NAME
; ===========   =========
  9.67.156.5    VLINK0     ; Static VIPA for EE Connections
  9.67.157.132  GIGELINK   ; This goes to Cisco 7513 thru GIGE
  9.67.156.67   CISCO1     ; This goes to Cisco 7206 CLAW Packing
  9.67.156.20   CISCO2     ; This goes to Cisco 7507 CMPC

Example 6-23 MVS154 TCP/IP profile excerpt for MPC+
;******************************************************************
; CISCO 7507 - CMPC Channel
;******************************************************************
;
DEVICE N06CMPC MPCPTP
LINK CISCO2 MPCPTP N06CMPC
START N06CMPC
;
;******************************************************************
; HOME STATEMENT
;******************************************************************
;
HOME
;
; ADDRESS       LINK_NAME
; ===========   =========
  9.67.156.165  VLINK0     ; Static VIPA for EE Connections
  9.67.157.131  GIGELINK   ; This goes to Cisco 7513 thru GIGE
  9.67.156.68   CISCO1     ; This goes to Cisco 7206 CLAW Packing
  9.67.156.19   CISCO2     ; This goes to Cisco 7507 CMPC


The format of the DEVICE and LINK statements is as follows:
DEVICE device-name MPCPTP AUTORESTART | NOAUTORESTART
LINK link-name MPCPTP device-name CHKSUM |

Where:
• The device_name value for HPDT MPC connections must be the TRLE name of an HPDT connection. The TRLE is defined in a VTAM TRL major node and must be active to start the device. The maximum length is eight characters.
• MPCPTP specifies the device is a multipath channel point-to-point device.
• AUTORESTART specifies that in the event of a device failure, the TCP/IP stack attempts to reactivate the device.
• NOAUTORESTART specifies that the TCP/IP stack does not attempt to reactivate this device.
• The link_name value is the name of the link. The maximum length is 16 characters. The link name is associated with a home address on the HOME statement.
• MPCPTP specifies that the link is for MPCPTP.
• The device_name value must be the same as specified in the previous DEVICE statement. The maximum length is 8 characters.
• CHECKSUM indicates that an inbound checksum calculation is performed for all packets received on this interface. This is the default.
• NOCHECKSUM indicates that an inbound checksum calculation is not performed for any packets received on this interface.

The format of the HOME statement is:
HOME ip_address1 link_name1 ip_address2 link_name2...

Each link name must have an entry in the HOME section. This entry defines the z/OS side of the MPC+ link. Its IP address must match the first IP address in the CMCC’s tg statement and be in the same subnet as the router’s IP address (specified on the CMCC’s channel interface and as the second IP address on the tg statement).
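Both conditions on the HOME address can be checked against a router tg statement. The following Python sketch is an illustration only: check_tg is a hypothetical helper, and the prefix length of 29 is an assumption taken from the 255.255.255.248 mask on the Channel6/2 interface.

```python
import ipaddress


def check_tg(tg_line, home_address, prefix_len):
    """Compare one router tg statement with the host's HOME address."""
    words = tg_line.split()
    host_ip, local_ip = words[3], words[4]  # first and second IP address
    subnet = ipaddress.ip_interface(f"{local_ip}/{prefix_len}").network
    return {
        "home_matches": host_ip == home_address,
        "same_subnet": ipaddress.ip_address(home_address) in subnet,
    }


# TGMVS001 on the router against the CISCO2 HOME entry on MVS001.
print(check_tg("tg TGMVS001 ip 9.67.156.17 9.67.156.21 broadcast",
               "9.67.156.17", 29))
```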

6.2.4 z/OS MPC+ commands

The Display TRL command

You can check the VTAM TRLE when the TCP/IP device is started. To do this, use the VTAM TRL display command (D NET,TRL) as shown in Example 6-24. First all the TRLEs are displayed. Then a specific MPC+ TRLE is displayed.


Example 6-24 VTAM display of TRLE for MPC+
D NET,TRL
IST097I DISPLAY ACCEPTED
IST350I DISPLAY TYPE = TRL 994
IST1314I TRLE = N05GIG1   STATUS = ACTIV   CONTROL = MPC
IST1314I TRLE = N05CMPC   STATUS = ACTIV   CONTROL = MPC
IST1314I TRLE = ISTTN5N7  STATUS = ACTIV   CONTROL = XCF
IST1314I TRLE = IUTSAMEH  STATUS = ACTIV   CONTROL = MPC
IST1314I TRLE = ISTTN5N4  STATUS = ACTIV   CONTROL = XCF
IST1454I 5 TRLE(S) DISPLAYED
IST314I END

D NET,ID=N05CMPC,E
IST097I DISPLAY ACCEPTED
IST075I NAME = N05CMPC, TYPE = TRLE 997
IST486I STATUS= ACTIV, DESIRED STATE= ACTIV
IST087I TYPE = LEASED , CONTROL = MPC , HPDT = YES
IST1715I MPCLEVEL = HPDT MPCUSAGE = SHARE
IST1717I ULPID = TCP
IST1801I UNITS OF WORK FOR NCB AT ADDRESS X'12529800'
IST1802I CURRENT = 0 AVERAGE = 1 MAXIMUM = 2
IST1577I HEADER SIZE = 4092 DATA SIZE = 60 STORAGE = ***NA***
IST1221I WRITE DEV = 1E12 STATUS = ACTIVE STATE = ONLINE
IST1577I HEADER SIZE = 4092 DATA SIZE = 60 STORAGE = DATASPACE
IST1221I READ DEV = 21E2 STATUS = ACTIVE STATE = ONLINE
IST1500I STATE TRACE = OFF
IST314I END
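When many TRLEs are defined, the D NET,TRL response can be scanned for entries that are not active. The Python sketch below is a minimal illustration that assumes the IST1314I layout shown in Example 6-24; the function name is hypothetical, and the INACT status in the sample is invented for the demonstration.

```python
import re

# Matches the TRLE name and status on an IST1314I message line.
TRLE_LINE = re.compile(r"IST1314I\s+TRLE\s*=\s*(\S+)\s+STATUS\s*=\s*(\S+)")


def inactive_trles(display_output):
    """Return the names of TRLEs whose status is not ACTIV."""
    return [name for name, status in TRLE_LINE.findall(display_output)
            if status != "ACTIV"]


sample = """\
IST1314I TRLE = N05GIG1 STATUS = ACTIV CONTROL = MPC
IST1314I TRLE = N05CMPC STATUS = INACT CONTROL = MPC
IST1454I 2 TRLE(S) DISPLAYED
"""
print(inactive_trles(sample))
```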

The Display Netstat Devlinks command

You can display the device status with the normal TCP/IP display command as shown in Example 6-25. Unfortunately, this display shows all devices, and you must search the response for the device you are interested in. In the example, the extra devices have been deleted. The LNKSTATUS must show READY for the device to be operational.

Example 6-25 D TCPIP,,NETSTAT,DEVLINKS command for MPC+ on MVS001
D TCPIP,,NETSTAT,DEVLINKS
EZZ2500I NETSTAT CS V1R2 TCP 954
. . .
DEVNAME: N04CMPC           DEVTYPE: MPC       DEVNUM: 0000
  DEVSTATUS: READY
  LNKNAME: CISCO2          LNKTYPE: MPC       LNKSTATUS: READY
    NETNUM: 0   QUESIZE: 0
    BYTESIN: 136804        BYTESOUT: 628716
  BSD ROUTING PARAMETERS:
    MTU SIZE: 04472        METRIC: 25
    DESTADDR: 0.0.0.0      SUBNETMASK: 255.255.255.248
  MULTICAST SPECIFIC:
    MULTICAST CAPABILITY: YES
    GROUP          REFCNT
    -----          ------
    224.0.0.5      0000000001
    224.0.0.1      0000000001
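Because the DEVLINKS response lists every device, a small filter can pick out the link of interest. The following Python sketch assumes the LNKNAME and LNKSTATUS labels appear as in the examples in this chapter; link_statuses is a hypothetical helper, and the second link in the sample is invented.

```python
import re


def link_statuses(netstat_output):
    """Map each LNKNAME to its LNKSTATUS in NETSTAT DEVLINKS output."""
    pairs = re.findall(r"LNKNAME:\s*(\S+).*?LNKSTATUS:\s*(\S+)",
                       netstat_output, re.S)
    return dict(pairs)


sample = """\
DEVNAME: N04CMPC DEVTYPE: MPC DEVNUM: 0000 DEVSTATUS: READY
LNKNAME: CISCO2 LNKTYPE: MPC LNKSTATUS: READY
DEVNAME: CIP1A DEVTYPE: CLAW DEVNUM: 1FA0 DEVSTATUS: SENT SETUP
LNKNAME: CISCO1 LNKTYPE: CLAW LNKSTATUS: NOT ACTIVE
"""
statuses = link_statuses(sample)
print(statuses["CISCO2"] == "READY")
```

Only the first word of each status is captured, which is enough to tell READY from any other state.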

6.2.5 Router show commands

The Cisco routers provide some statistics and status information with the show commands. Some useful show commands are listed below.

The show interface channel command

The show interface channel 6/0 command shows the status of channel interface number 0 in slot 6. There are two status fields that indicate the state of the interface and of the protocol. Both must be in the up state for normal operation.

Example 6-26 show interface command on physical interface for CMPC+
NIVT7507#sho int chan 6/0
Channel6/0 is up, line protocol is up
  Hardware is cyBus Channel Interface
  Description: CPMC read interfaces to sysplex hosts
  MTU 4096 bytes, BW 98304 Kbit, DLY 100 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation CHANNEL, loopback not set
  ECA adapter card
  Data transfer rate 12 Mbytes, number of subchannels 4
  Last input never, output never, output hang never
  Last clearing of "show interface" counters 5d01h
  Queueing strategy: fifo
  Output queue 0/40, 0 drops; input queue 0/75, 0 drops
  30 second input rate 0 bits/sec, 0 packets/sec
  30 second output rate 0 bits/sec, 0 packets/sec
     0 packets input, 0 bytes, 0 no buffer
     Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
     0 packets output, 0 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 output buffer failures, 0 output buffers swapped out

• Channel6/0 is up, line protocol is up

These are two status fields that indicate the state of the interface and of the protocol. Both must be in the up state for normal operation. The first up indicates whether ESCON synchronization has been achieved. If it is “administratively down”, then the channel has been shut down. You must enter interface configuration mode and issue a no shut command. If the channel is simply down, then most likely an ESCON cable is not plugged in or there is a problem with the ESCON connection. The line protocol status is an internal status and should always be up when the channel status is up.

• 0 packets input, 0 bytes, 0 packets output, 0 bytes

Notice that there are no packets input or output on the physical (x/0) channel interface. The same is true for the x/1 physical channel interface. This is because packets are not counted under the physical channel interfaces for CMPC on a CIP. They are counted only on the virtual (x/2) channel interface, as shown in Example 6-27.

Example 6-27 Show interface command on virtual interface for CMPC+
NIVT7507#sho int chan 6/2
Channel6/2 is up, line protocol is up
  Hardware is cyBus Channel Interface
  Description: TN3270 Server Port
  Internet address is 9.67.156.21/29
  MTU 4472 bytes, BW 98304 Kbit, DLY 100 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation CHANNEL, loopback not set
  Virtual interface
  Last input 00:00:09, output 00:00:06, output hang never
  Last clearing of "show interface" counters 5d01h
  Queueing strategy: fifo
  Output queue 0/40, 0 drops; input queue 0/75, 0 drops
  5 minute input rate 0 bits/sec, 0 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec
     58687 packets input, 5260328 bytes, 0 no buffer
     Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
     29979 packets output, 2794987 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 output buffer failures, 0 output buffers swapped out


Networking with z/OS and Cisco Routers: An Interoperability Guide

• Channel6/2 is up, line protocol is up

These are two status fields that indicate the state of the interface and of the protocol. Both must be up for normal operation. If the first status is "administratively down", the channel has been shut down; enter interface configuration mode and issue a no shut command. Because this is a virtual channel interface, it should never be in a down state. The line protocol status is an internal status and should always be up when the channel status is up.

• 5 minute input rate 0 bits/sec, 0 packets/sec

These two fields indicate the amount of data the virtual channel interface is transferring from the z/OS system to the router, in bits per second and packets per second, over the last 5-minute interval. To get a rough average of the size of the packets being sent from the z/OS system, divide the rate in bits per second by the rate in packets per second. This gives the average packet size in bits. Divide this by 8 to get the average size in bytes. The rates include all defined CMPC TGs. For more specific, per-connection information, use the show extended channel x/y stats command on the physical channel interface.

• 5 minute output rate 0 bits/sec, 0 packets/sec

These two fields indicate the amount of data the virtual channel interface is transferring to the z/OS system from the router, in bits per second and packets per second, over the last 5-minute interval. To get a rough average of the size of the packets being sent to the z/OS system, divide the rate in bits per second by the rate in packets per second. This gives the average packet size in bits. Divide this by 8 to get the average size in bytes. The rates include all defined CMPC TGs. For more specific, per-connection information, use the show extended channel x/y stats command on the physical channel interface.
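The average packet size calculation can be sketched in a few lines of Python. This is only an illustration: the function name and the sample rates are hypothetical and are not taken from the router output in this chapter. The one fixed fact it relies on is that a byte is 8 bits, so the bits-per-packet figure is divided by 8.

```python
def average_packet_size_bytes(bits_per_sec: float, packets_per_sec: float) -> float:
    """Rough average packet size in bytes from the 5-minute rate fields."""
    if packets_per_sec == 0:
        return 0.0  # idle interface: there are no packets to average
    bits_per_packet = bits_per_sec / packets_per_sec
    return bits_per_packet / 8  # 8 bits per byte


# Hypothetical rates (not from the examples in this chapter)
print(average_packet_size_bytes(1_200_000, 250))  # 600.0
```

Because the rates are 5-minute averages, the result is only a rough indication of packet size.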

The show extended channel statistics command

The show extended channel 6/0 statistics command gives information regarding the specific CMPC connections. In Example 6-28 all the read devices under channel 6/0 are displayed. In Example 6-29 all the write devices under channel 6/1 are displayed. The key fields are described following the examples.

Example 6-28 show extended channel statistics for CMPC read devices
NIVT7507#sho ext chan 6/0 stat
Path: CA02  -- ESTABLISHED
               Command           Selective  System  Device  CU
Dev  Connects  Retries  Cancels  Reset      Reset   Errors  Busy
E0   111       88       1        1          1       0       0
E2   111       89       1        1          0       0       0
E4   94        73       0        1          0       0       0
E6   100       79       0        1          0       0       0

            Blocks         Bytes            Dropped Blk   Memd
Dev-Lnk     Read   Write   Read     Write   Read   Write  wait  Con
E0-00       92     6       40178    789     0      0      0     Y
E2-00       92     6       40178    789     0      0      0     Y
E4-00       78     6       33582    789     0      0      0     Y
E6-00       84     6       34710    789     0      0      0     Y
Path CA02
Total:      346    24      148648   3156    0      0      0

Last statistics 2 seconds old, next in 8 seconds

Example 6-29 show extended channel statistics for CMPC write devices
NIVT7507#sho ext chan 6/1 stat
Path: D900  -- ESTABLISHED
               Command           Selective  System  Device  CU
Dev  Connects  Retries  Cancels  Reset      Reset   Errors  Busy
10   413       6        1        1          1       0       0
12   105       7        1        1          0       0       0
14   102       14       4        1          0       0       0
16   107       13       4        1          0       0       0

            Blocks         Bytes            Dropped Blk   Memd
Dev-Lnk     Read   Write   Read     Write   Read   Write  wait  Con
10-00       9      404     801      90290   0      0      0     Y
12-00       9      92      801      37966   0      0      0     Y
14-00       18     84      1602     35986   0      0      0     Y
16-00       18     89      1602     36958   0      0      0     Y
Path D900
Total:      54     669     4806     201200  0      0      0

Last statistics 2 seconds old, next in 8 seconds

Where:

• Path: D102 -- ESTABLISHED indicates that the information following is for all the connections using channel path D102. ESTABLISHED indicates that at least one mainframe has established an ESCON logical path.

• Dev A2 indicates that the information following is for the channel connection for the device with path D102 and device address A2.

• Connects is the number of times the channel started a channel program on the device. Each time the z/OS system issues a Start Subchannel (SSCH), the CMCC increments the Connects counter. Each time a CCW is retried, the CMCC also increments the Connects counter.

• Command Retries is the number of times the CMCC adapter had no data to send to the channel (for the read subchannel) or had no buffers to hold data from the channel (for the write subchannel). Every command retry that is resumed results in a connect. A CLAW connection (both on the read and write subchannels) runs most efficiently when the channel programs are long running.

With respect to the read subchannel, if the ratio between the connects and command retries is approximately one-to-one, the channel programs are running well. If the ratio approaches two-to-one or more, the z/OS system is having trouble keeping up with the CMCC. The number of read buffers (on the TCP/IP DEVICE statement) should probably be increased.

With respect to the write subchannel, the number of command retries should generally be zero. If this number is non-zero, the z/OS system is having to retry the write operations. This indicates that the router is having trouble keeping up for some reason, which should generally not occur.

• Cancels increments when the CLAW connection is stopped and started. You should see this change only when the device is started and stopped in TCP/IP in z/OS. It indicates that a Halt Subchannel was issued.

• Selective Reset indicates that a Clear Subchannel (CSCH) was issued by the host. If this number is incrementing, it may indicate a communication problem between the router and z/OS. This counter may also increment if the Missing Interrupt Handler (MIH) is detecting problems for this subchannel pair on the host.

• System Reset increments each time the ESCON daughter card activates. If there are many of these, there is probably a channel problem. Possibilities include a bad channel card on the host, a bad channel cable, or a defective CIP card.

• Device Errors is an indication of a CMCC device error.

• Dev-Lnk identifies the logical link for each subchannel address. There are two logical links (00 and 01) for each subchannel address. Logical Link 00 is used for CLAW control data. This is connection-oriented communication. When the CMCC CLAW connection to the device is first activated, you will usually see 5 read and 4 write blocks on Logical Link 00.

Logical Link 01 is used for data transfer. Packets flowing on Logical Link 01 are the TCP/IP packets being routed to and from the mainframe.

• Blocks Read/Write indicates the number of blocks of packets sent from the CMCC to z/OS (Read) or sent from z/OS to the CMCC (Write). When running in packed mode, each block can contain one or more packets and be a maximum of 32 KB or 60 KB, depending on the size specified in the z/OS TCP/IP DEVICE statement. When not running in packed mode, each block contains exactly one packet.

• Bytes Read/Write indicates the number of bytes sent from the CMCC to z/OS (Read) or sent from z/OS to the CMCC (Write). Dividing the number of bytes read or written by the number of blocks read or written gives the average block size read or written. When running in packed mode, you cannot determine the number of packets or calculate the average size of a packet from these counters. Use the show extended channel packing stats command to get information regarding packets.

• Con, if 'Y', indicates that the CLAW connection was established between the router and z/OS. Both the read subchannel and the write subchannel must show 'Y' for the CLAW connection to be active.
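As a rough illustration of how these counters can be combined, the following Python sketch computes the connects-to-command-retries ratio and the average block size. The function names are hypothetical, and the sample values are the counters reported for read device E0 in Example 6-28. The text gives only approximate guidance on the ratio (about one-to-one is good, two-to-one or more suggests a problem), so no hard threshold is coded here.

```python
def connect_retry_ratio(connects: int, command_retries: int) -> float:
    """Ratio of Connects to Command Retries for one subchannel."""
    if command_retries == 0:
        return float("inf")  # no retries recorded, so the ratio is unbounded
    return connects / command_retries


def average_block_size(byte_count: int, block_count: int) -> float:
    """Average block size in bytes (Bytes Read/Write over Blocks Read/Write)."""
    if block_count == 0:
        return 0.0  # nothing transferred yet
    return byte_count / block_count


# Counters for read device E0 as reported in Example 6-28
ratio = connect_retry_ratio(connects=111, command_retries=88)
block_size = average_block_size(byte_count=40178, block_count=92)
print(f"connects:retries = {ratio:.2f}:1, average block = {block_size:.0f} bytes")
```

A ratio near 1 on a read subchannel matches the "running well" case described above; the same ratio is not meaningful for a write subchannel, where command retries are expected to stay at or near zero.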

The show extended channel tg tgname command

The show extended channel slot/port tg tgname command shows the CMPC+ status. On a CIP the command must be issued against the virtual channel interface (the /2 interface) because this is where the TG is defined. A status of Ready shows the CMPC virtual interface is operational. When the CMPC+ TG connection comes to the active state, the connection status shows Active.

Example 6-30 Show extended channel tg tgname
NIVT7507#sho ext chan 6/2 tg tgmvs001
CMPC-TG:TGMVS001   Status:Ready
  Local IP Address:9.67.156.21      Remote IP Address :9.67.156.17
  MPC+ Information: Connection Type=TCP/IP
    Local VC Token   :05DF001009    Local Conn. Token :05E300100A
    Remote VC Token  :05000101AE    Remote Conn. Token:05000101B0
    VC Status        :Active        Connection Status :Active

The show extended channel tg stat tgname command

The show extended channel slot/port tg stat tgname command shows some statistics about data traffic. A sample of the TG to MVS001 is shown in Example 6-31.

Example 6-31 Show extended channel tg stat tgname for MVS001
NIVT7507#sho ext chan 6/2 tg stat tgmvs001
CMPC-TG:TGMVS001
  PacketsIn   : 134        PacketsOut  : 600
  BytesIn     : 15669      BytesOut    : 60076
  ConnNr      : 0          ConnNs      : 0
  SweepReqsIn : 0          SweepReqsOut: 0
  SweepRspsIn : 0          SweepRspsOut: 0
  Wraps       : 0
  LastSeqNoIn : 605        LastSeqNoOut: 134
  LastSeqNoFailureCause     : None
  TimeSinceLastSeqNoFailure : never

The show extended channel cmpc command

The show extended channel 6/0 cmpc command shows CMPC connections that are either Active or Inactive. Operational CMPC connections are in Active status. All four of our CMPC connections are shown in Example 6-32; the read devices and the write devices are displayed by separate commands.

Example 6-32 Show extended channel cmpc for both read and write devices
NIVT7507#sho ext chan 6/0 cmpc
      Path  Dv  TGName    Dir    Bfrs  Status
CMPC  CA02  E0  TGMVS001  READ   16    Active+
CMPC  CA02  E2  TGMVS062  READ   16    Active+
CMPC  CA02  E4  TGMVS154  READ   16    Active+
CMPC  CA02  E6  TGMVS069  READ   16    Active+
NIVT7507#sho ext chan 6/1 cmpc
      Path  Dv  TGName    Dir    Bfrs  Status
CMPC  D900  10  TGMVS001  WRITE  16    Active+
CMPC  D900  12  TGMVS062  WRITE  16    Active+
CMPC  D900  14  TGMVS154  WRITE  16    Active+
CMPC  D900  16  TGMVS069  WRITE  16    Active+

6.3 Configuring for the OSA-Express adapter

The OSA-Express can be shared by two or more LPARs in the same system or it can be dedicated to a particular LPAR. In our case we shared it among all the LPARs in our sysplex. There are several new configuration parameters that you must use when defining the OSA-Express feature. In general, you must make IOCP definitions as well as VTAM and TCP/IP PROFILE.TCPIP configuration changes.


6.3.1 IOCP for OSA-Express devices

An OSA-Express device is defined in IOCP as channel type OSD. At least three devices must be defined to each partition that will use the OSA-Express adapter. One of the devices is used to transfer data, while the other two are for read and write control data. Example 6-33 shows the IOCP definition that we used in our environment. Note that in our case, we were using Gigabit Ethernet with our OSA-Express.

Example 6-33 z/VM IOCP for all z/OS guests for OSA-Express
RESOURCE PART=((RALVM9,1),(TIVMVS2,2),(RALNS3,3),(CARVM4,9),           X
               (TIVVM1,B),(RALHCD,F))
CHPID    PATH=F5,TYPE=OSD,SHARED
CNTLUNIT CUNUMBR=2F00,PATH=F5,UNIT=OSA
IODEVICE ADDRESS=(2F00,28),CUNUMBR=2F00,UNIT=OSA,                      X
               UNITADD=00,PART=(RALVM9,RALNS3)

Unlike an OSA ATM, the OSA-Express Gigabit Ethernet does not require OSA/SF for definition or configuration. You can still use OSA/SF for monitoring. In that case you need to define an OSAD device for the utility; otherwise the OSAD definition is optional. If you want to use OSA/SF, you can use the sample OSAD IOCP definitions shown in Example 6-34 and Example 6-35.

Example 6-34 z/VM IOCP for OSA/SF
IODEVICE ADDRESS=(2F1E,1),CUNUMBR=2F00,UNIT=OSAD,                      X
               UNITADD=FE,PART=(RALVM9,RALNS3)

Example 6-35 IOCP definition for OSA/SF
IODEVICE ADDRESS=23AF,UNITADD=FE,CUNUMBR=(23A0),UNIT=OSAD

6.3.2 Catalyst 6500 configuration

Our Catalyst 6509 switch contained a dual-port Gigabit Ethernet blade and a 48-port Fast Ethernet blade. We used one Gigabit Ethernet port to go to the OSA-Express and the other to go to the 7206. We used two of the Fast Ethernet ports, combined in a Fast EtherChannel, to go to the 7507. This demonstrates two possibilities and shows that a Gigabit Ethernet connection is not required downstream from the switch to the router. All the previously mentioned ports were put into the same VLAN. The configuration is shown in Example 6-36 and is described immediately following. Note that some of the configuration was deleted from the example for clarity.

Example 6-36 Catalyst 6500 configuration
begin
!
# ***** NON-DEFAULT CONFIGURATION *****
!
!
#time: Wed Jul 25 2001, 04:06:22
!
#version 5.5(4b)
!
set prompt CAT6K
!
#vtp
set vtp mode transparent
set vlan 400 name OSA-Express-VLAN type ethernet mtu 1500 said 100400 state active
#port channel
set port channel 2/25-28 96
!
# default port status is enable
!
!
#module 1 : 2-port 1000BaseX Supervisor
set vlan 400 1/1-2
!
#module 2 : 48-port 10/100BaseTX Ethernet
set vlan 400 2/25-26
set port speed 2/25-26 100
set port duplex 2/25-26 full
set port channel 2/25-26 mode on
!

Where:

• set vlan 400 name... defines the Ethernet VLAN for our OSA-Express and two routers.

• set port channel 2/25-28 96 defines the administrative group for the Fast EtherChannel. In this case it is a system-default value of 96. This allows the command show port channel 96 (described later) to be used.

• set vlan 400 1/1-2 and set vlan 400 2/25-26 put the two Gigabit Ethernet ports (the ports in module 1) and two Fast Ethernet ports (in module 2) into VLAN 400.

• set port speed...100 sets the speed of the Fast Ethernet ports to 100 Mbps. Because this is a 10/100 module, the speed would by default be negotiated. In an EtherChannel all the ports must be the same speed and duplex. This setting eliminates any possibility that speed auto-negotiation would set the speed to 10 Mbps.

• set port duplex... full sets the duplex of the Fast Ethernet ports to full. The duplex would by default be negotiated. In an EtherChannel all the ports must be the same speed and duplex. This setting eliminates any possibility that duplex auto-negotiation would set the duplex to half.

• set port channel... on puts ports 2/25 and 2/26 into channel mode without using Port Aggregation Protocol (PAgP). Because this is an EtherChannel to a router and not to another switch, the mode must be set to on. Routers do not use PAgP, so without this command the ports would not channel.
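The requirement that all EtherChannel members share the same speed and duplex can be expressed as a simple consistency check. The Python sketch below is illustrative only: the PortSettings class and the channel_mismatches function are hypothetical names, and the port settings are typed in by hand rather than read from the switch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PortSettings:
    port: str        # module/port, for example "2/25"
    speed_mbps: int  # configured speed in Mbps
    duplex: str      # "full" or "half"


def channel_mismatches(members: List[PortSettings]) -> List[str]:
    """Return the ports whose speed or duplex differs from the first member."""
    if not members:
        return []
    reference = (members[0].speed_mbps, members[0].duplex)
    return [m.port for m in members[1:] if (m.speed_mbps, m.duplex) != reference]


# Settings matching the set port speed / set port duplex commands in Example 6-36
members = [PortSettings("2/25", 100, "full"), PortSettings("2/26", 100, "full")]
print(channel_mismatches(members))  # []
```

An empty list means the members are consistent. A port that had negotiated 10 Mbps or half duplex would be listed, which is the situation the explicit speed and duplex settings are meant to prevent.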

6.3.3 7507 configuration

The 7507 is connected to the Catalyst 6509 via two Fast Ethernet ports that were channelized. Example 6-37 shows the portion of the configuration we used to connect the 7507 to the Catalyst 6509 (and thereby to the 7206 and the OSA-Express). It is described following the example.

Example 6-37 7507 configuration for OSA-Express
interface Port-channel1
 ip address 9.67.157.137 255.255.255.240
 ip route-cache distributed
 ip ospf cost 1
 ip igmp join-group 224.0.1.2
 full-duplex
 hold-queue 300 in
!
interface FastEthernet0/0/0
 no ip address
 ip route-cache distributed
 full-duplex
 service-policy input SETDSCP
 channel-group 1
!
interface FastEthernet0/0/1
 no ip address
 ip route-cache distributed
 full-duplex
 service-policy input SETDSCP
 channel-group 1
!

Where:

• interface Port-channel1 is the port channel interface. All the Fast Ethernet ports are included in this and then treated as a single interface.

• ip address... is the IP address in the same subnet as the OSA-Express and the Gigabit Ethernet interface in the 7206.

• full-duplex indicates that all the ports aggregated in this port channel will be in full duplex.

• interface FastEthernet... are the two Fast Ethernet interfaces on the IO card itself. These are 100 Mbps only and not 10/100 Mbps. Consequently, the speed does not need to be set to 100 Mbps. In fact, the speed command is not even valid under these interfaces because they run only at 100 Mbps.

• no ip address indicates that each member of the port channel does not have a separate IP address. There is only one IP address and it is under the port-channel interface itself.

• full-duplex sets the interfaces to run in full-duplex mode. This prevents any duplex auto-negotiation problems.

• channel-group 1 puts this Fast Ethernet interface into port channel 1. Each physical interface that is to be part of the port channel needs this command.

6.3.4 7206 configuration

The 7206 is connected to the Catalyst 6509 via a Gigabit Ethernet interface. Example 6-38 shows the portion of the configuration we used to connect the 7206 to the Catalyst 6509 (and thereby to the 7507 and the OSA-Express). It is described following the example.

Example 6-38 7206 configuration for OSA-Express
interface GigabitEthernet4/0
 ip address 9.67.157.136 255.255.255.240
 ip pim dense-mode
 ip igmp join-group 224.0.1.2
 negotiation auto
 service-policy input SETDSCP
 ip rsvp bandwidth 1 1
 ip rsvp udp-multicasts 224.0.0.14
 ip rsvp dsbm candidate 100
!

Where:

• interface GigabitEthernet... is the Gigabit Ethernet blade.

• ip address... is the IP address in the same subnet as the OSA-Express and the Fast EtherChannel interface in the 7507.

6.3.5 VTAM and TCP/IP definition

Currently only CS for z/OS IP uses the OSA-Express function (SNA can use it only indirectly, through Enterprise Extender). You need to define a TRLE statement in VTAM and several statements in the TCP/IP profile. Descriptions of each follow.

VTAM TRLE definitions for OSA-Express

Example 6-39 through Example 6-42 show the TRL major nodes that we defined for the OSA-Express device.

Example 6-39 MVS001 OSA-Express TRLE
***********************************************************
* TRL FOR OSA GIGABIT ETHERNET - 2F00-2F0E
***********************************************************
*
N04GIG1  TRLE  LNCTL=MPC,                                              *
               READ=(2F14),                                            *
               WRITE=(2F15),                                           *
               MPCLEVEL=QDIO,                                          *
               DATAPATH=(2F16,2F17),                                   *
               PORTNAME=(GIGE2F00,0)
*

Example 6-40 MVS062 OSA-Express TRLE
***********************************************************
* TRL FOR OSA GIGABIT ETHERNET - 2F00-2F0E
***********************************************************
*
N05GIG1  TRLE  LNCTL=MPC,                                              *
               READ=(2F04),                                            *
               WRITE=(2F05),                                           *
               MPCLEVEL=QDIO,                                          *
               DATAPATH=(2F06,2F07),                                   *
               PORTNAME=(GIGE2F00,0)
*

Example 6-41 MVS069 OSA-Express TRLE
***********************************************************
* TRL FOR OSA GIGABIT ETHERNET - 2F00-2F0E
***********************************************************
*
N07GIG1  TRLE  LNCTL=MPC,                                              *
               READ=(2F18),                                            *
               WRITE=(2F19),                                           *
               MPCLEVEL=QDIO,                                          *
               DATAPATH=(2F1A,2F1B),                                   *
               PORTNAME=(GIGE2F00,0)
*

Example 6-42 MVS154 OSA-Express TRLE
***********************************************************
* TRL FOR OSA GIGABIT ETHERNET - 2F00-2F0E
***********************************************************
*
N07GIG1  TRLE  LNCTL=MPC,                                              *
               READ=(2F08),                                            *
               WRITE=(2F09),                                           *
               MPCLEVEL=QDIO,                                          *
               DATAPATH=(2F0A,2F0B),                                   *
               PORTNAME=(GIGE2F00,0)
*

Where:

• READ=(xxxx) defines the device address that is used for reading control data.

• WRITE=(xxxx) defines the device address that is used for writing control data.

• MPCLEVEL=QDIO indicates that VTAM is to use the Queued Direct I/O interface for communicating with the device. OSA-Express must use QDIO in its MPCLEVEL definition.

• DATAPATH=(xxxx,...) specifies the subchannel addresses used to read and write data through an OSA-Express connection. Each TCP/IP instance within an LPAR that issues a START statement for an OSA-Express device is assigned one of the DATAPATH channels by VTAM. Sufficient DATAPATH subchannel addresses must be coded for the number of concurrent instances that will be using an OSA-Express port in this LPAR. In our case we had only a single TCP/IP stack per LPAR, but two addresses were defined, so one was unused.

• PORTNAME= specifies the name that will be used in the TCP/IP profile to define the device. In comparison, the other TCP/IP devices use the TRLE name in the device definition.
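One simple cross-check between the IOCP and the TRLE definitions is to confirm that every READ, WRITE, and DATAPATH address falls inside the device range defined on the IODEVICE statement. The Python sketch below is an illustration under stated assumptions: the helper names are hypothetical, and it assumes that the device count in ADDRESS=(2F00,28) is a decimal number while the device addresses themselves are hexadecimal.

```python
from typing import Dict, List


def iodevice_range(first_address: str, count: int) -> range:
    """Device numbers covered by ADDRESS=(first_address,count).

    Assumes a hexadecimal starting address and a decimal device count.
    """
    start = int(first_address, 16)
    return range(start, start + count)


def addresses_outside_range(addresses: List[str], device_range: range) -> List[str]:
    """Return the addresses that are not covered by the IODEVICE range."""
    return [a for a in addresses if int(a, 16) not in device_range]


osa_range = iodevice_range("2F00", 28)  # from Example 6-33

# READ, WRITE and DATAPATH addresses from Example 6-39 through Example 6-42
trle_addresses: Dict[str, List[str]] = {
    "MVS001": ["2F14", "2F15", "2F16", "2F17"],
    "MVS062": ["2F04", "2F05", "2F06", "2F07"],
    "MVS069": ["2F18", "2F19", "2F1A", "2F1B"],
    "MVS154": ["2F08", "2F09", "2F0A", "2F0B"],
}

for system, addresses in trle_addresses.items():
    print(system, addresses_outside_range(addresses, osa_range))
```

Under these assumptions the range ends at 2F1B, so all four TRLE definitions fit within it and each system prints an empty list.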

TCP/IP profile statements for OSA-Express

OSA-Express devices need a DEVICE, LINK and START statement as well as an entry in the HOME statement. Example 6-43 through Example 6-46 show our definitions for our four LPARs. Descriptions follow.

Example 6-43 TCP/IP profile excerpt from MVS001 for OSA-Express
;******************************************************************
; GIGABIT ETHERNET                                                *
;******************************************************************
;
DEVICE GIGE2F00 MPCIPA PRIROUTER
LINK GIGELINK IPAQGNET GIGE2F00
START GIGE2F00
;
;******************************************************************
; HOME ADDRESSES FOR THIS STACK
;******************************************************************
;
HOME
;
;  ADDRESS      LINK_NAME
;  ===========  =========
   9.67.156.1    VLINK0    ; Static VIPA for EE Connections
   9.67.157.129  GIGELINK  ; This goes to Cisco 7513 thru GIGE
   9.67.156.66   CISCO1    ; This goes to Cisco 7206 CLAW Packing
   9.67.156.17   CISCO2    ; This goes to Cisco 7507 CMPC

Example 6-44 TCP/IP profile excerpt from MVS062 for OSA-Express
;******************************************************************
; GIGABIT ETHERNET                                                *
;******************************************************************
;
DEVICE GIGE2F00 MPCIPA NONROUTER
LINK GIGELINK IPAQGNET GIGE2F00
START GIGE2F00
;
;
;******************************************************************
; HOME STATEMENT
;******************************************************************
;
HOME
;
;  ADDRESS      LINK_NAME
;  ===========  =========
   9.67.156.161  VLINK0    ; Static VIPA for EE Connections
   9.67.157.130  GIGELINK  ; This goes to Cisco 7513 thru GIGE
   9.67.156.69   CISCO1    ; This goes to Cisco 7206 CLAW Packing
   9.67.156.18   CISCO2    ; This goes to Cisco 7507 CMPC

Example 6-45 TCP/IP profile excerpt from MVS069 for OSA-Express
;******************************************************************
; GIGABIT ETHERNET                                                *
;******************************************************************
;
DEVICE GIGE2F00 MPCIPA SECROUTER
LINK GIGELINK IPAQGNET GIGE2F00
START GIGE2F00
;
;******************************************************************
; HOME ADDRESSES FOR THIS STACK
;******************************************************************
;
HOME
;
;  ADDRESS      LINK_NAME
;  ===========  =========
   9.67.156.5    VLINK0    ; Static VIPA for EE Connections
   9.67.157.132  GIGELINK  ; This goes to Cisco 7513 thru GIGE
   9.67.156.67   CISCO1    ; This goes to Cisco 7206 CLAW Packing
   9.67.156.20   CISCO2    ; This goes to Cisco 7507 CMPC

Example 6-46 TCP/IP profile excerpt from MVS154 for OSA-Express
;******************************************************************
; GIGABIT ETHERNET                                                *
;******************************************************************
;
DEVICE GIGE2F00 MPCIPA NONROUTER
LINK GIGELINK IPAQGNET GIGE2F00
START GIGE2F00
;
;******************************************************************
; HOME STATEMENT
;******************************************************************
;
HOME
;
;  ADDRESS      LINK_NAME
;  ===========  =========
   9.67.156.165  VLINK0    ; Static VIPA for EE Connections
   9.67.157.131  GIGELINK  ; This goes to Cisco 7513 thru GIGE
   9.67.156.68   CISCO1    ; This goes to Cisco 7206 CLAW Packing
   9.67.156.19   CISCO2    ; This goes to Cisco 7507 CMPC

The format of the DEVICE and LINK statements is:

DEVICE dev-name MPCIPA NONRouter | PRIRouter | SECRouter NOAUTORestart | AUTORestart
LINK link-name IPAQGNET dev-name

Where:

• dev-name defines the device name for this device. The dev-name fields in both the DEVICE and LINK statements should match the PORTNAME specified in the TRLE definition of this OSA-Express Gigabit Ethernet device.

• MPCIPA defines the OSA-Express device to TCP/IP.

• PRIRouter specifies that if a datagram is received at this device for an unknown IP address, the device will route the datagram to this TCP/IP stack. Alternatively, this TCP/IP stack could function as a backup to the primary router by coding SECRouter as a parameter. However, if NONRouter is specified, or no TCP/IP stack is designated as PRIRouter or SECRouter, the device simply discards the datagram.

• When the OSA-Express GbE is started, TCP/IP registers the entire set of local (home) IP addresses for this TCP/IP instance to OSA-Express. This allows the device to route datagrams destined for those IP addresses to this TCP/IP instance. (If the device receives a datagram destined for an unregistered IP address, OSA-Express sends the datagram to the TCP/IP instance that is defined as the primary router or secondary router for this device.) If you change any of the home IP addresses in this TCP/IP stack, you need not stop and restart the MPCIPA device for the OSA-Express routing to take effect for the new home IP addresses, because the stack dynamically registers and deregisters the addresses.

• IPAQGNET defines this device as an IP-assisted, QDIO family, Gigabit Ethernet or Fast Ethernet device. The link type IPAQENET is also supported for either Gigabit Ethernet or Fast Ethernet, indicating the more generic family of Ethernet (ENET) links.
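The datagram handling described above can be modeled as a small decision function. The following Python sketch is a simplified illustration, not a description of the OSA-Express implementation: the select_stack name is hypothetical, only the GIGELINK home addresses from Example 6-43 through Example 6-46 are registered, and the secondary router is modeled as being used only when no primary router stack is available.

```python
from typing import Dict, Optional


def select_stack(
    dest_ip: str,
    registered: Dict[str, str],
    primary: Optional[str] = None,
    secondary: Optional[str] = None,
) -> Optional[str]:
    """Return the TCP/IP stack that receives a datagram, or None if it is discarded."""
    if dest_ip in registered:
        return registered[dest_ip]  # destination is a registered home address
    if primary is not None:
        return primary              # unknown address goes to the PRIRouter stack
    if secondary is not None:
        return secondary            # SECRouter stack acts as the backup
    return None                     # no router stack: the datagram is discarded


# GIGELINK home addresses taken from the TCP/IP profile excerpts
registered = {
    "9.67.157.129": "MVS001",
    "9.67.157.130": "MVS062",
    "9.67.157.132": "MVS069",
    "9.67.157.131": "MVS154",
}

print(select_stack("9.67.157.130", registered, "MVS001", "MVS069"))  # MVS062
print(select_stack("10.1.1.1", registered, "MVS001", "MVS069"))      # MVS001
```

The second call uses a made-up destination address to show the fallback to the stack coded with PRIROUTER (MVS001 in our profiles).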

The format of the HOME statement is:

HOME ip_address1 link_name1
     ip_address2 link_name2...

Each link name must have an entry in the HOME section. For the OSA-Express link, this entry defines the z/OS side of the link. Its address must be in the same subnet as the routers' LAN interface IP addresses.
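The same-subnet requirement can be checked with Python's standard ipaddress module. This is a minimal sketch: the same_subnet function is a hypothetical helper, while the addresses are the GIGELINK home addresses and the 7206 Gigabit Ethernet interface address (9.67.157.136 with mask 255.255.255.240, that is, /28) from the examples in this section.

```python
import ipaddress
from typing import Dict, List


def same_subnet(router_interface: str, host_addresses: List[str]) -> Dict[str, bool]:
    """Report whether each host address lies in the router interface's subnet."""
    network = ipaddress.ip_interface(router_interface).network
    return {addr: ipaddress.ip_address(addr) in network for addr in host_addresses}


gigelink_homes = ["9.67.157.129", "9.67.157.130", "9.67.157.131", "9.67.157.132"]
result = same_subnet("9.67.157.136/28", gigelink_homes)
for address, ok in result.items():
    print(address, "same subnet" if ok else "DIFFERENT subnet")
```

A VIPA such as 9.67.156.1 would be reported as being in a different subnet, which is expected because it is not the address of the OSA-Express link.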


Router show commands

Because the routers do not specifically know they are talking with an OSA-Express, the commands are just the typical LAN type commands. In the 7206, the interface is the Gigabit Ethernet. In the 7507, the interface is the port channel. Descriptions follow.

Example 6-47 7206 show interface gigabitEthernet
C7200-Z55#sho interface gigabitEthernet 4/0
GigabitEthernet4/0 is up, line protocol is up
  Hardware is WISEMAN, address is 0002.4adf.ec70 (bia 0002.4adf.ec70)
  Internet address is 9.67.157.136/28
  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Full-duplex mode, link type is autonegotiation, media type is SX
  output flow-control is on, input flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:00, output 00:00:00, output hang never
  Last clearing of "show interface" counters 1w1d
  Queueing strategy: fifo
  Output queue 0/40, 0 drops; input queue 0/75, 0 drops
  5 minute input rate 1000 bits/sec, 2 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec
     1392115 packets input, 250322215 bytes, 0 no buffer
     Received 2528 broadcasts, 0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 701243 multicast, 0 pause input
     0 input packets with dribble condition detected
     1420969 packets output, 111325326 bytes, 0 underruns(0/0/0)
     0 output errors, 0 collisions, 7 interface resets
     0 babbles, 0 late collision, 0 deferred
     1 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out

Example 6-48 7507 show interface port-channel command
NIVT7507#sho interfaces port-channel 1
Port-channel1 is up, line protocol is up
  Hardware is FEChannel, address is 00e0.fe16.9800 (bia 0000.0000.0000)
  Internet address is 9.67.157.137/28
  MTU 1500 bytes, BW 200000 Kbit, DLY 100 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  ARP type: ARPA, ARP Timeout 04:00:00
    No. of active members in this channel: 2
        Member 0 : FastEthernet0/0/0 , Full-duplex, 100Mb/s
        Member 1 : FastEthernet0/0/1 , Full-duplex, 100Mb/s
  Last input 00:00:01, output never, output hang never
  Last clearing of "show interface" counters 00:02:32
  Queueing strategy: fifo
  Output queue 0/80, 0 drops; input queue 0/150, 0 drops
  5 minute input rate 1000 bits/sec, 2 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec
     329 packets input, 25952 bytes, 0 no buffer
     Received 4294967195 broadcasts, 0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog
     0 input packets with dribble condition detected
     117 packets output, 8648 bytes, 0 underruns(0/0/0)
     0 output errors, 0 collisions, 0 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier
     0 output buffer failures, 0 output buffers swapped out

Where:

• GigabitEthernet4/0 is up, line protocol is up
  Port-channel1 is up, line protocol is up

These are two status fields that indicate the state of the interface and of the protocol. Both must be up for normal operation. The first status indicates whether the interface hardware is currently active or has been taken down by an administrator. If it is "administratively down", the interface has been shut down; enter interface configuration mode and issue a no shut command. If the interface is simply down, most likely an Ethernet cable is not plugged in or there is a problem with the switch (for example, the port is disabled). The line protocol status is an internal status indicating whether the software processes that handle the line protocol consider the port usable.

• 5 minute input rate 1000 bits/sec, 2 packets/sec

These two fields indicate the amount of data the interface is transferring from the 6509 switch to the router, in bits per second and packets per second, over the last 5-minute interval. To get a rough average of the size of the packets being sent from the 6509 switch, divide the rate in bits per second by the rate in packets per second. This gives the average packet size in bits. Divide this by 8 to get the average size in bytes. For the port channel, this is the aggregate of all the included physical interfaces.

• 5 minute output rate 0 bits/sec, 0 packets/sec

These two fields indicate the amount of data the entire interface is transferring to the 6509 switch from the router, in bits per second and packets per second, over the last 5-minute interval. To get a rough average of the size of the packets being sent to the 6509 switch, divide the rate in bits per second by the rate in packets per second. This gives the average packet size in bits. Divide this by 8 to get the average size in bytes.

Switch show commands

The show module command shows the modules (blades) installed in the Catalyst switch. Although our switch had an MSFC (router module) installed, we did not use it.

Example 6-49 Catalyst 6509 show module command
CAT6K (enable) show module
Mod Slot Ports Module-Type               Model               Sub Status
--- ---- ----- ------------------------- ------------------- --- --------
1   1    2     1000BaseX Supervisor      WS-X6K-SUP1A-2GE    yes ok
15  1    1     Multilayer Switch Feature WS-F6K-MSFC         no  ok
2   2    48    10/100BaseTX Ethernet     WS-X6248-RJ-45      no  ok

Mod Module-Name         Serial-Num
--- ------------------- -----------
1                       SAD0351053W
15                      SAD0351014G
2                       SAD03131473

Mod MAC-Address(es)                        Hw     Fw         Sw
--- -------------------------------------- ------ ---------- --------------
1   00-d0-bc-f2-17-da to 00-d0-bc-f2-17-db 1.0    5.2(1)     5.5(4b)
    00-d0-bc-f2-17-d8 to 00-d0-bc-f2-17-d9
    00-d0-03-16-18-00 to 00-d0-03-16-1b-ff
15  00-d0-bc-f2-17-dc to 00-d0-bc-f2-18-1b 1.2    12.1(2)E,  12.1(2)E,
2   00-50-54-6d-97-0c to 00-50-54-6d-97-3b 1.0    4.2(0.24)V 5.5(4b)

Mod Sub-Type                Sub-Model           Sub-Serial  Sub-Hw
--- ----------------------- ------------------- ----------- ------
1   L3 Switching Engine     WS-F6K-PFC          SAD035100BL 1.0

The show channel group command shows the EtherChannel group status information.

Example 6-50 Catalyst 6509 show channel group command
CAT6K (enable) show channel group 96
Admin Port  Status     Channel              Channel
group                  Mode                 id
----- ----- ---------- -------------------- --------
96    2/25  connected  on                   813
96    2/26  connected  on                   813

Admin Port  Device-ID                        Port-ID            Platform
group
----- ----- -------------------------------- ------------------ ----------
96    2/25  Not directly connected to switch
96    2/26  NIVT7507                         FastEthernet0/0/1  cisco RSP4

CAT6K (enable) sho port channel statistics
Port  Admin   PAgP Pkts   PAgP Pkts PAgP Pkts PAgP Pkts PAgP Pkts PAgP Pkts
      Group   Transmitted Received  InFlush   RetnFlush OutFlush  InError
----- ------- ----------- --------- --------- --------- --------- ---------
2/25  96      65          0         0         0         0         0
2/26  96      65          0         0         0         0         0
----- ------- ----------- --------- --------- --------- --------- ---------

Each port that is a member of the EtherChannel group is listed along with its status. Notice that port 2/25 shows “Not directly connected to switch” while port 2/26 shows the name, interface, and platform of the 7507. This is because the router does not support Port Aggregation Protocol (PAgP), which is why the ports had to be explicitly channelized in the switch for the EtherChannel to work. The switch then determines its neighbor using Cisco Discovery Protocol (CDP). The router sends the CDP frame describing itself only through the port channel, so the frame travels down only one of the FastEthernet links in the PortChannel (in this case, FastEthernet 0/0/1). This is just a cosmetic issue and does not affect the performance of the EtherChannel.

There is also a show channel group info command that gives more detailed information for the channel group. The show port channel statistics command shows that the switch has sent PAgP packets to the router but has not received any. This is another indication that routers do not use PAgP.

The show port command displays port status information for one or more ports. Example 6-51 shows the command issued against one of the Fast Ethernet ports in the EtherChannel.

Example 6-51 Catalyst 6509 show port command
CAT6K (enable) sho port 2/25
Port  Name               Status     Vlan       Duplex Speed Type
----- ------------------ ---------- ---------- ------ ----- ------------
2/25                     connected  400        full   100   10/100BaseTX

Port  AuxiliaryVlan AuxVlan-Status InlinePowered         PowerAllocated
                                   Admin Oper   Detected mWatt mA @42V
----- ------------- -------------- ----- ------ -------- ----- --------
2/25  none          none           -

Port  Security Violation Shutdown-Time Age-Time Max-Addr Trap     IfIndex
----- -------- --------- ------------- -------- -------- -------- -------
2/25  disabled shutdown  0             0        1        disabled 35

Port  Num-Addr Secure-Src-Addr   Age-Left Last-Src-Addr     Shutdown/Time-Left
----- -------- ----------------- -------- ----------------- ------------------
2/25  0

Port     Broadcast-Limit Broadcast-Drop
-------- --------------- --------------------
2/25 0

Port  Send FlowControl  Receive FlowControl RxPause TxPause Unsupported
      admin    oper     admin    oper                       opcodes
----- -------- -------- -------- --------   ------- ------- -----------
2/25  off      off      off      off        0       0       0

Port  Status     Channel              Admin Ch
                 Mode                 Group Id
----- ---------- -------------------- ----- -----
2/25  connected  on                   96    813
2/26  connected  on                   96    813
----- ---------- -------------------- ----- -----

Port  Align-Err  FCS-Err    Xmit-Err   Rcv-Err    UnderSize
----- ---------- ---------- ---------- ---------- ---------
2/25  0          0          0          0          0

Port  Single-Col Multi-Coll Late-Coll  Excess-Col Carri-Sen Runts     Giants
----- ---------- ---------- ---------- ---------- --------- --------- ---------
2/25  0          0          0          0          0         2         0

Last-Time-Cleared
--------------------------
Wed Jul 4 2001, 01:01:54

Where:
• Status connected indicates the port is operational.
• VLAN 400 indicates the VLAN that this port is a member of.
• Duplex full indicates that the port is running in full duplex mode.
• Speed 100 indicates that the port is running in 100 Mbps mode.
• Type 10/100BaseTX indicates that this port is capable of running as either a 10 Mbps or a 100 Mbps port. This is why we hard-coded the speed to 100 Mbps.

The show port counters command shows some of the most common error counters for a port or ports.

Example 6-52 Catalyst 6509 show port counters command
CAT6K (enable) sho port counters 2/25
Port  Align-Err  FCS-Err    Xmit-Err   Rcv-Err    UnderSize
----- ---------- ---------- ---------- ---------- ---------
2/25  0          0          0          0          0

Port  Single-Col Multi-Coll Late-Coll  Excess-Col Carri-Sen Runts     Giants
----- ---------- ---------- ---------- ---------- --------- --------- ---------
2/25  0          0          0          0          0         2         0

Last-Time-Cleared
--------------------------
Wed Jul 4 2001, 01:01:54

The show vlan command shows information regarding a specific VLAN. In our case this is the VLAN that is common to the OSA-Express, the 7206’s Gigabit Ethernet, and the 7507’s EtherChannel.

Example 6-53 Catalyst 6509 show vlan command
CAT6K (enable) sho vlan 400
VLAN Name                             Status    IfIndex Mod/Ports, Vlans
---- -------------------------------- --------- ------- ----------------------
400  OSA-Express-VLAN                 active    62      1/1-2
                                                        2/25-26
                                                        15/1

VLAN Type  SAID       MTU   Parent RingNo BrdgNo Stp  BrdgMode Trans1 Trans2
---- ----- ---------- ----- ------ ------ ------ ---- -------- ------ ------
400  enet  100400     1500                                     0      0

VLAN DynCreated RSPAN
---- ---------- --------
400  static     disabled

VLAN AREHops STEHops Backup CRF 1q VLAN
---- ------- ------- ---------- -------

The VLAN 400 name is shown, followed by its status. Mod/Ports lists all the ports that are members of this VLAN. Notice port 15/1. This is the MSFC’s connection to the switch and is not a physical port. It would allow the MSFC to have a VLAN defined and participate in routing if needed.

The show counters command displays detailed statistical information for a given port. The command must be entered in its entirety. All these counters are documented in the Cisco Catalyst 6000 5.1 switch documentation.

Example 6-54 Catalyst 6509 show counters command
CAT6K (enable) sho counters ?
Unrecognized command!
CAT6K (enable) sho counters
Usage: show counters
CAT6K (enable) sho counters 2/1
64 bit counters
0  rxHCTotalPkts                      = 934443
1  txHCTotalPkts                      = 1566712
2  rxHCUnicastPkts                    = 430716
3  txHCUnicastPkts                    = 258427
4  rxHCMulticastPkts                  = 425634
5  txHCMulticastPkts                  = 1134608
6  rxHCBroadcastPkts                  = 78092
7  txHCBroadcastPkts                  = 173677
8  rxHCOctets                         = 90504483
9  txHCOctets                         = 123839729
10 rxTxHCPkts64Octets                 = 1408128
11 rxTxHCPkts65to127Octets            = 956639
12 rxTxHCPkts128to255Octets           = 92323
13 rxTxHCPkts256to511Octets           = 34513
14 rxTxHCpkts512to1023Octets          = 833
15 rxTxHCpkts1024to1518Octets         = 8717
16 txHCTrunkFrames                    = 0
17 rxHCTrunkFrames                    = 0
18 rxHCDropEvents                     = 0
32 bit counters
0  rxCRCAlignErrors                   = 0
1  rxUndersizedPkts                   = 0
2  rxOversizedPkts                    = 0
3  rxFragmentPkts                     = 1
4  rxJabbers                          = 0
5  txCollisions                       = 3
6  ifInErrors                         = 1
7  ifOutErrors                        = 2
8  ifInDiscards                       = 46
9  ifInUnknownProtos                  = 0
10 ifOutDiscards                      = 3
11 txDelayExceededDiscards            = 0
12 txCRC                              = 2
13 linkChange                         = 12
14 wrongEncapFrames                   = 0
0  dot3StatsAlignmentErrors           = 0
1  dot3StatsFCSErrors                 = 0
2  dot3StatsSingleColFrames           = 3
3  dot3StatsMultiColFrames            = 0
4  dot3StatsSQETestErrors             = 0
5  dot3StatsDeferredTransmisions      = 16
6  dot3StatsLateCollisions            = 2
7  dot3StatsExcessiveCollisions       = 0
8  dot3StatsInternalMacTransmitErrors = 0
9  dot3StatsCarrierSenseErrors        = 0
10 dot3StatsFrameTooLongs             = 0
11 dot3StatsInternalMacReceiveErrors  = 0
0  txPause                            = 0
1  rxPause                            = 0
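As a rough illustration of how these counters can be used, the following Python sketch takes a few of the 64-bit values from Example 6-54 and derives the average frame size in each direction, then checks that the unicast, multicast, and broadcast counts add up to the totals. The dictionary layout and function names are our own and are not part of any Cisco tool.

```python
# Selected 64-bit counter values copied from Example 6-54 (port 2/1).
COUNTERS = {
    "rxHCTotalPkts": 934443,
    "txHCTotalPkts": 1566712,
    "rxHCUnicastPkts": 430716,
    "txHCUnicastPkts": 258427,
    "rxHCMulticastPkts": 425634,
    "txHCMulticastPkts": 1134608,
    "rxHCBroadcastPkts": 78092,
    "txHCBroadcastPkts": 173677,
    "rxHCOctets": 90504483,
    "txHCOctets": 123839729,
}


def average_frame_bytes(counters: dict, direction: str) -> float:
    """Average frame size in bytes for 'rx' or 'tx' (octets divided by packets)."""
    packets = counters[f"{direction}HCTotalPkts"]
    return counters[f"{direction}HCOctets"] / packets if packets else 0.0


def per_type_difference(counters: dict, direction: str) -> int:
    """Total packets minus the sum of unicast, multicast, and broadcast packets."""
    by_type = sum(counters[f"{direction}HC{kind}Pkts"]
                  for kind in ("Unicast", "Multicast", "Broadcast"))
    return counters[f"{direction}HCTotalPkts"] - by_type


if __name__ == "__main__":
    for direction in ("rx", "tx"):
        print(direction,
              round(average_frame_bytes(COUNTERS, direction), 1),
              per_type_difference(COUNTERS, direction))
```

A small nonzero difference is plausible because the counters keep changing while the switch formats the display; that explanation is an assumption, not something stated in the switch documentation cited here.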

6.3.6 z/OS OSA-Express commands

The TRL display command

You can check the VTAM TRLE when a TCP/IP device is started. To do this, use the VTAM TRL display command (D NET,TRL) as shown in Example 6-55. First all the TRLEs are displayed. Then a specific MPC+ TRLE is displayed.

Example 6-55 D NET,TRL and D NET,ID=trle commands for OSA-Express on MVS001
D NET,TRL
IST097I DISPLAY ACCEPTED
IST350I DISPLAY TYPE = TRL 080
IST1314I TRLE = ISTTN4N5 STATUS = ACTIV CONTROL = XCF
IST1314I TRLE = ISTTN4N6 STATUS = ACTIV CONTROL = XCF
IST1314I TRLE = IUTW1FA0 STATUS = ACTIV CONTROL = TCP
IST1314I TRLE = ISTTN4N7 STATUS = ACTIV CONTROL = XCF
IST1314I TRLE = IUTSAMEH STATUS = ACTIV CONTROL = MPC
IST1314I TRLE = TRLMVS62 STATUS = ACTIV CONTROL = MPC
IST1314I TRLE = TRLMVS69 STATUS = ACTIV CONTROL = MPC
IST1314I TRLE = TRLMVS54 STATUS = ACTIV CONTROL = MPC
IST1314I TRLE = N04GIG1  STATUS = ACTIV CONTROL = MPC
IST1314I TRLE = N04CMPC  STATUS = INACT CONTROL = MPC
IST1454I 10 TRLE(S) DISPLAYED
IST314I END
D NET,ID=N04GIG1
IST097I DISPLAY ACCEPTED
IST075I NAME = N04GIG1, TYPE = TRLE 083
IST486I STATUS= ACTIV, DESIRED STATE= ACTIV
IST087I TYPE = LEASED , CONTROL = MPC , HPDT = YES
IST1715I MPCLEVEL = QDIO MPCUSAGE = SHARE
IST1716I PORTNAME = GIGE2F00 LINKNUM = 0 OSA CODE LEVEL = 0414
IST1577I HEADER SIZE = 4096 DATA SIZE = 0 STORAGE = ***NA***
IST1221I WRITE DEV = 2F15 STATUS = ACTIVE STATE = ONLINE
IST1577I HEADER SIZE = 4092 DATA SIZE = 0 STORAGE = ***NA***
IST1221I READ DEV = 2F14 STATUS = ACTIVE STATE = ONLINE
IST1221I DATA DEV = 2F16 STATUS = ACTIVE STATE = N/A
IST1724I I/O TRACE = OFF TRACE LENGTH = *NA*
IST1717I ULPID = TCP
IST1815I IQDIO ROUTING DISABLED
IST1757I PRIORITY1: UNCONGESTED PRIORITY2: UNCONGESTED
IST1757I PRIORITY3: UNCONGESTED PRIORITY4: UNCONGESTED
IST1801I UNITS OF WORK FOR NCB AT ADDRESS X'120FE010'
IST1802I P1 CURRENT = 1 AVERAGE = 1 MAXIMUM = 3
IST1802I P2 CURRENT = 0 AVERAGE = 0 MAXIMUM = 0
IST1802I P3 CURRENT = 0 AVERAGE = 0 MAXIMUM = 0
IST1802I P4 CURRENT = 0 AVERAGE = 1 MAXIMUM = 6
IST1221I DATA DEV = 2F17 STATUS = RESET STATE = N/A
IST1724I I/O TRACE = OFF TRACE LENGTH = *NA*
IST1500I STATE TRACE = OFF
IST314I END

Where:
• STATUS=ACTIV indicates that the OSA-Express has been activated by the TCP/IP stack. If the TRLE’s status is inactive (INACT), the V TCPIP,,START,dev-name command needs to be issued. The TRLE itself cannot be activated.
• MPCLEVEL=QDIO indicates this is an OSA-Express.
• Each read, write, and data device is shown with its status by the display command.
• You can also see priority queue information. The first set of lines (the IST1757I messages) shows the last 10 seconds of congestion if congestion is detected.
• The next set of lines (the IST1802I messages) indicates the current number of packets per queue, the average, and the high-water mark.
• The last IST1221I message, for device 2F17, indicates a status of RESET. This was the second device listed in the DATAPATH and, since we have only one TCP/IP stack running in this LPAR, it is unused.
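When many TRLEs are defined, scanning the display by eye for an inactive entry is tedious. The following Python sketch shows one way to pick out TRLEs that are not ACTIV from captured D NET,TRL output. It assumes the IST1314I lines have the TRLE = name STATUS = status layout shown in Example 6-55; the function name and the way the output is captured are hypothetical.

```python
import re
from typing import List

# Matches the name and status fields of an IST1314I message line.
TRLE_LINE = re.compile(r"IST1314I\s+TRLE\s*=\s*(\S+)\s+STATUS\s*=\s*(\S+)")


def inactive_trles(display_output: str) -> List[str]:
    """Return the names of TRLEs whose STATUS is anything other than ACTIV."""
    return [name
            for name, status in TRLE_LINE.findall(display_output)
            if status != "ACTIV"]


# Two lines taken from Example 6-55.
SAMPLE = """\
IST1314I TRLE = N04GIG1 STATUS = ACTIV CONTROL = MPC
IST1314I TRLE = N04CMPC STATUS = INACT CONTROL = MPC
"""

if __name__ == "__main__":
    print(inactive_trles(SAMPLE))
```

For each name returned, the corresponding TCP/IP device is a candidate for the V TCPIP,,START,dev-name command described above.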


Display Netstat Devlinks command

You can display the device status with the normal TCP/IP display command as shown in Example 6-56. Unfortunately, this display shows all devices, and you must search the response for the device you are interested in. In this example the extra devices have been deleted. The LNKSTATUS has to show READY for the device to be operational.

Example 6-56 D TCPIP,,NETSTAT,DEVLINKS command on MVS001
D TCPIP,,NETSTAT,DEVLINKS
EZZ2500I NETSTAT CS V1R2 TCP 088
DEVNAME: LOOPBACK DEVTYPE: LOOPBACK DEVNUM: 0000
  DEVSTATUS: READY
.
. lines deleted
.
DEVNAME: GIGE2F00 DEVTYPE: MPCIPA DEVNUM: 0000
  DEVSTATUS: READY CFGROUTER: PRI ACTROUTER: PRI
  LNKNAME: GIGELINK LNKTYPE: IPAQENET LNKSTATUS: READY
    NETNUM: 0 QUESIZE: 0 SPEED: 0000001000
    BYTESIN: 13718096 BYTESOUT: 8758489
    BROADCASTCAPABILITY: NO
    ARPOFFLOAD: YES ARPOFFLOADINFO: YES
  BSD ROUTING PARAMETERS:
    MTU SIZE: 01500 METRIC: 01
    DESTADDR: 0.0.0.0 SUBNETMASK: 255.255.255.240
  MULTICAST SPECIFIC:
    MULTICAST CAPABILITY: YES
    GROUP            REFCNT
    -----            ------
    224.0.0.5        0000000001
    224.0.0.1        0000000001

Where:
• IPAQENET indicates that the link uses the IP assist-based interface (IPA), belongs to the QDIO family of interfaces (Q), and uses the Ethernet protocol (ENET), in this case Gigabit Ethernet.
• ACTROUTER: PRI indicates that this stack is the primary router for packets with unknown (non-registered) addresses. If you have multiple TCP/IP stacks sharing the OSA-Express device, you must ensure that there is only one designated primary and one designated secondary router for the device. Trying to assign another TCP/IP stack as either primary or secondary results in a warning message during START processing of the device for that TCP/IP stack.


When the OSA-Express GbE is started, TCP/IP registers the entire set of local (home) IP addresses for this TCP/IP instance with the OSA-Express. This allows the device to route datagrams destined for those IP addresses to this TCP/IP instance. (If the device receives a datagram destined for an unregistered IP address, the OSA-Express sends the datagram to the TCP/IP instance that is defined as the primary router or secondary router for this device.) If you change any of the home IP addresses in this TCP/IP stack, you must stop and restart the MPCIPA device for the OSA-Express routing to take effect for the new home IP addresses.

Two multicast groups are listed: 224.0.0.1, which is the All Systems on this Subnet address, and 224.0.0.5, which is the OSPF All SPF Routers address. If the z/OS OSPF were the Designated Router (DR) on the subnet, the All DR Routers multicast address of 224.0.0.6 would also be listed. The CASA multicast address of 224.0.1.2 is not listed here because the forwarding agents in the two routers never multicast to the Services Manager. Consequently, the Services Manager does not need to perform an IGMP join for the CASA multicast address.
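The home address registration described above can be pictured with a toy model. The Python sketch below is not OSA-Express code; it only mimics the behavior described in the text. The class, the stack names TCPA and TCPB, and the rule that the secondary router is used when no primary is defined are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional


class SharedOsaPort:
    """Toy model of inbound datagram dispatch on a shared OSA-Express port."""

    def __init__(self, primary_router: Optional[str] = None,
                 secondary_router: Optional[str] = None) -> None:
        self.primary_router = primary_router
        self.secondary_router = secondary_router
        self.registered: Dict[str, str] = {}  # home IP address -> stack name

    def start_device(self, stack: str, home_addresses: Iterable[str]) -> None:
        """(Re)register a stack's home list, as happens when the device starts."""
        # Drop the stack's old registrations, then record its current home list.
        self.registered = {ip: owner for ip, owner in self.registered.items()
                           if owner != stack}
        for ip in home_addresses:
            self.registered[ip] = stack

    def deliver(self, destination_ip: str) -> Optional[str]:
        """Return the stack that receives a datagram for destination_ip."""
        owner = self.registered.get(destination_ip)
        if owner is not None:
            return owner
        # Unregistered address: hand it to the designated router stack.
        return self.primary_router or self.secondary_router


if __name__ == "__main__":
    port = SharedOsaPort(primary_router="TCPA")
    port.start_device("TCPA", ["9.67.157.129", "9.67.156.1"])
    port.start_device("TCPB", ["9.67.157.130"])
    print(port.deliver("9.67.157.130"))  # registered address
    print(port.deliver("10.1.1.1"))      # unregistered address
```

In the model, a changed home list takes effect only when start_device is called again, which mirrors the need to stop and restart the MPCIPA device after changing home addresses.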

Chapter 7. Routing with OSPF and EIGRP

This chapter describes the configuration and interaction of dynamic routing within the sysplex and dynamic routing within the network. While dynamic routing is normally thought of purely as a networking function, in the sysplex environment with multiple system images, TCP/IP stacks, and application spaces, dynamic routing on the host becomes attractive, particularly when you want to make use of dynamic VIPAs, Sysplex Distributor, and redundant shared interfaces.

In this chapter we cover the configuration used in our lab environment in RTP, NC. We’ll examine the specific configuration steps to implement OSPF in the sysplex and how to connect to a Cisco network that uses EIGRP.

© Copyright IBM Corp. 2002


7.1 Topology overview

Before we can discuss the configuration details, we must have a good understanding of the environment and the connectivity we want to achieve. Our goal in creating the test scenario was to show the configuration of dynamic routing using OSPF within the sysplex and the connectivity to a Cisco network using EIGRP routing, along with the redistribution necessary to achieve a stable environment. Figure 7-1 illustrates the network used in our lab environment.

[Figure: the IBM z/OS sysplex (OSPF network, area 1.1.1.1) attaches through the 6509 switch (ports 1/1, 1/2, 2/25, and 2/26; subnet 9.67.157.128/28, VLAN 400) to the c7507 (fa0/0/0 and fa0/0/1, .137) and the c7206 (g4/0, .136). In the EIGRP network, a frame relay network of 211.1.2.x/30 links connects the remote routers c3640n1 (fa 3/0, .97, 9.67.156.96/28, VLAN 100), c3640n2 (fa 3/0, .113, 9.67.156.112/28, VLAN 200), and c3640n3 (fa 3/0, .129, 9.67.156.128/28, VLAN 300). The c7513 and the 211.1.1.0/24 subnet are also shown.]

Figure 7-1 High-level network diagram

As shown in Figure 7-1, our sysplex is connected to a Cisco Catalyst 6509 switch and two Cisco routers: a 7507 and a 7206. There is also connectivity from the sysplex to a Cisco 7513 that is not detailed in the diagram. Three remote Cisco 3640 routers are connected via a frame relay network. The frame relay network included a Cisco IGX 8400 WAN switch.

The sysplex environment used in our lab tests was shared among other groups and connected to a production network. Because of this, some special design considerations were made in the creation of our test network. Many of the subnets used in the network were previously assigned. Some of the subnets were added specifically for our purposes and are not really part of any engineered addressing scheme.

When you deploy OSPF routing and initially design your own addressing scheme for your sysplex environment, it is important that you choose your subnets and addresses carefully, following an enterprise addressing scheme. Plan ahead and reserve enough address space for all of the potential systems and TCP/IP stacks that you will implement. You will find that the extra time spent analyzing your requirements and planning will pay off.

7.1.1 Routing topology

We used EIGRP for the network routing protocol so we could describe the configuration required to redistribute routing information between EIGRP and OSPF and vice versa. As mentioned before, the sysplex environment was already connected to a production network and shared by other groups. So we decided to separate the production network from our test environment in such a way as to be able to accomplish our objectives with the least disruption to other testers.

Our network, therefore, was segmented into a separate OSPF area that contained only the sysplex. We designated this area to be area 1.1.1.1. This strategy is recommended for most enterprises. It is wise to create an area specifically for the sysplex. It is also recommended that the area be defined as a stub area to minimize routing updates and OSPF topology calculations on the host systems.

The Cisco router network will be configured for EIGRP with the two 7000 series routers acting as autonomous system border routers (ASBRs). It will be the responsibility of the ASBRs to perform the redistribution of routing information between EIGRP and OSPF.


[Figure: the IBM z/OS sysplex in OSPF area 1.1.1.1, with a path to OSPF area 0.0.0.0 and a connection to the EIGRP network.]

Figure 7-2 Routing topology

As depicted in Figure 7-2, the sysplex environment we used in our tests was configured as OSPF area 1.1.1.1. There was a connection to a production network through a 7513 router (see path to area 0.0.0.0). Our small test network consisted of two 7000 series routers and three 3640 routers. The two 7000 series routers redistribute routing information between OSPF and EIGRP.


7.2 OSPF configuration in the sysplex

The topology for the sysplex includes four LPARs: MVS001, MVS062, MVS069, and MVS154. Each LPAR contains one TCP/IP stack. Each of the LPARs is connected to the two 7000 series routers over an ESCON channel as well as over a shared Gigabit Ethernet OSA-Express adapter. The ESCON interface to the 7507 is configured for CMPC+ and the ESCON interface to the 7206 is configured for CLAW. Refer to Figure 7-3 for details on IP interface and subnet addresses.

The routers that connect via the shared OSA-Express adapter are also configured with Generic Routing Encapsulation (GRE) tunnels. This is necessary to ensure the shared OSA-Express adapter can distinguish packets destined for each stack when distributed VIPAs are active on multiple stacks, as in the case of multi-node load balancing. Within the sysplex, XCF links also connect each of the stacks on each LPAR.


[Figure: the IBM z/OS sysplex with LPARs MVS001, MVS062, MVS069, and MVS154 joined by XCF links on 9.67.156.72/29 (.73 through .76). The diagram marks the Sysplex Distributor (Service Manager) and the backup Sysplex Distributor (Service Manager), and lists static VIPAs (9.67.156.1/30, 9.67.156.5/30, 9.67.156.161/30, 9.67.156.165/30), dynamic VIPAs (9.67.156.25/29 and .26/29, 9.67.156.33/29 and .34/29, 9.67.156.41/29 and .42/29, 9.67.156.49/29 and .50/29), and distributed VIPAs (9.67.157.17/29 and 9.67.157.18/29). ESCON channels run through the ESCON Director to the 7507 (CMPC+, 9.67.156.16/29, router address .21, interfaces c6/0 and c6/1) and to the 7206VXR (CLAW, 9.67.156.64/29, router address .65, interface c3/0). The shared OSA-Express Gigabit Ethernet on 9.67.157.128/28 (VLAN 400, stack addresses .129 through .132) connects through the 6509, with GRE tunnels to the 7507 (fa0/0/0 and fa0/0/1, .137) and the 7206VXR (g1/0, .136).]

Figure 7-3 z/OS sysplex topology


7.2.1 OMPROUTE configuration

OMPROUTE was defined to be a started task. The AUTOLOG statement in the TCPIP.PROFILE starts OMPROUTE automatically when the stack is activated. The OMPROUTE procedure (Example 7-1) references data set DPTA39.TCPIP.PROFILES.OMPENV to set its environment.

Example 7-1 OMPROUTE PROC
//OMPROUTE PROC RE=0M,HOST=&SYSNAME
//OMPROUTE EXEC PGM=BPXBATCH,REGION=&RE,TIME=NOLIMIT,
//         PARM='PGM /usr/sbin/omproute'
//*        PARM='PGM /usr/sbin/omproute -s1 -t2 -d4'
//STDENV   DD DSN=DPTA39.TCPIP.PROFILES.OMPENV(&HOST),DISP=SHR
//STDOUT   DD PATH='/tmp/omproute.stdout',
//         PATHOPTS=(OWRONLY,OCREAT,OAPPEND),
//         PATHMODE=(SIRUSR,SIWUSR,SIRGRP,SIWGRP)
//STDERR   DD PATH='/tmp/omproute.stderr',
//         PATHOPTS=(OWRONLY,OCREAT,OAPPEND),
//         PATHMODE=(SIRUSR,SIWUSR,SIRGRP,SIWGRP)
//CEEDUMP  DD SYSOUT=*,DCB=(RECFM=FB,LRECL=132,BLKSIZE=132)

The member, MVS001, from DPTA39.TCPIP.PROFILES.OMPENV is included in Example 7-2.

Example 7-2 OMPROUTE ENV
RESOLVER_CONFIG=//'DPTA39.TCPIP.DATA(MVS001)'
OMPROUTE_FILE=//'DPTA39.TCPIP.PROFILES.OMPCONF(MVS001)'
OMPROUTE_DEBUG_FILE=/tmp/omproute.log

The environment variable OMPROUTE_FILE points to the OMPROUTE configuration data set. This data set contains the statements and parameters that dictate OMPROUTE’s operation. The OMPROUTE configuration file for MVS001 is listed in Example 7-3.

Example 7-3 OMPROUTE configuration file for MVS001
RouterID=9.67.156.1;
Area Area_Number=1.1.1.1;
AS_Boundary_Routing
     Import_Direct_Routes=YES;
Comparison=Type1;
OSPF_Interface
     Name = GIGELINK
     IP_Address = 9.67.157.129
     Attaches_To_Area = 1.1.1.1
     MTU = 1500
     Retransmission_Interval = 5
     Transmission_Delay = 40
     Hello_Interval = 10
     Dead_Router_Interval = 40
     Cost0 = 1
     Subnet = Yes
     Subnet_Mask = 255.255.255.240;
OSPF_Interface
     Name = CISCO1
     IP_Address = 9.67.156.66
     Destination_Addr = 9.67.156.65
     Attaches_To_Area = 1.1.1.1
     MTU = 4096
     Retransmission_Interval = 5
     Transmission_Delay = 120
     Hello_Interval = 30
     Dead_Router_Interval = 120
     Cost0 = 25
     Router_Priority = 2
     Subnet = Yes
     Subnet_Mask = 255.255.255.248;
OSPF_Interface
     Name = CISCO2
     IP_Address = 9.67.156.17
     Destination_Addr = 9.67.156.21
     Attaches_To_Area = 1.1.1.1
     MTU = 4472
     Retransmission_Interval = 5
     Transmission_Delay = 120
     Hello_Interval = 30
     Dead_Router_Interval = 120
     Cost0 = 25
     Router_Priority = 2
     Subnet = Yes
     Subnet_Mask = 255.255.255.248;
;SPF_Interface
;    Name = CISCO3
;    IP_Address = 9.67.157.242
;    Destination_Addr = 9.67.157.241
;    Attaches_To_Area = 1.1.1.1
;    MTU = 4096
;    Retransmission_Interval = 5
;    Transmission_Delay = 120
;    Hello_Interval = 30
;    Dead_Router_Interval = 120
;    Cost0 = 25
;    Router_Priority = 2
;    Subnet = Yes
;    Subnet_Mask = 255.255.255.248;
OSPF_Interface
     Name = EZAXCFN5
     IP_Address = 9.67.156.73
     Subnet_Mask = 255.255.255.248
     MTU = 4472
     Hello_Interval = 20
     Retransmission_Interval = 40
     Dead_Router_Interval = 120
     Cost0 = 200
     Attaches_To_Area = 1.1.1.1
     Router_Priority = 0
     Subnet = YES;
OSPF_Interface
     Name = EZAXCFN6
     IP_Address = 9.67.156.73
     Subnet_Mask = 255.255.255.248
     MTU = 4472
     Hello_Interval = 20
     Retransmission_Interval = 40
     Dead_Router_Interval = 120
     Cost0 = 200
     Attaches_To_Area = 1.1.1.1
     Router_Priority = 0
     Subnet = YES;
OSPF_Interface
     Name = EZAXCFN7
     IP_Address = 9.67.156.73
     Subnet_Mask = 255.255.255.248
     MTU = 4472
     Hello_Interval = 20
     Retransmission_Interval = 40
     Dead_Router_Interval = 120
     Cost0 = 200
     Attaches_To_Area = 1.1.1.1
     Router_Priority = 0
     Subnet = YES;
Interface
     Name = DynamicVipa1
     IP_Address = 9.67.156.25
     MTU = 4472
     Subnet_Mask = 255.255.255.248;
Interface
     Name = DynamicVipa2
     IP_Address = 9.67.156.26
     MTU = 4472
     Subnet_Mask = 255.255.255.248;
Interface
     Name = DynamicVipa3
     IP_Address = 9.67.156.33
     MTU = 4472
     Subnet_Mask = 255.255.255.248;
Interface
     Name = DynamicVipa4
     IP_Address = 9.67.156.34
     MTU = 4472
     Subnet_Mask = 255.255.255.248;
Interface
     Name = DynamicVipa5
     IP_Address = 9.67.156.41
     MTU = 4472
     Subnet_Mask = 255.255.255.248;
Interface
     Name = DynamicVipa6
     IP_Address = 9.67.156.42
     MTU = 4472
     Subnet_Mask = 255.255.255.248;
Interface
     Name = DynamicVipa7
     IP_Address = 9.67.156.49
     MTU = 4472
     Subnet_Mask = 255.255.255.248;
Interface
     Name = DynamicVipa8
     IP_Address = 9.67.156.50
     MTU = 4472
     Subnet_Mask = 255.255.255.248;
Interface
     Name = DistVipa1
     IP_Address = 9.67.157.17
     MTU = 4472
     Subnet_Mask = 255.255.255.248;
Interface
     Name = DistVipa2
     IP_Address = 9.67.157.18
     MTU = 4472
     Subnet_Mask = 255.255.255.248;
Interface
     Name = TOVTAM
     IP_Address = 9.67.156.2
     MTU = 4472
     Subnet_Mask = 255.255.255.252;
Interface
     Name = VLINK0
     IP_Address = 9.67.156.1
     MTU = 4472
     Subnet_Mask = 255.255.255.252;


Note the Area_Number = 1.1.1.1 and the Attaches_To_Area = 1.1.1.1 statements in the file above. These statements set the area number for the stack and the area to which the OSPF interface is attached.

Note also that we did not configure the area as a stub area. While this would be recommended for most enterprise customers, we elected not to configure a stub area because we did not want to inject a default route into the sysplex area. This was because, as stated earlier, our test environment was connected via another path to a production network. The other reason we did not configure the sysplex as a stub area is so we could show how to perform redistribution in autonomous system border routers.

Important: It is recommended that you consider configuring your sysplex OSPF area as a stub area.

To configure the area as a stub area, use the following parameter on the Area statement:
Stub_Area = Yes

If you specify Stub_Area = Yes, the area does not receive any AS external link advertisements, reducing the size of your database and decreasing memory usage for routers in the stub area. Be aware that you cannot configure virtual links through a stub area, nor can you configure a router within the stub area as an AS boundary router. You also cannot configure the backbone as a stub area.

External routing in stub areas is based on a default route. Each area border router attaching to a stub area originates a default route for this purpose. The cost of this default route is also configurable with the Area statement.

Note the following in the configuration shown in Example 7-3:
AS_Boundary_Routing Import_Direct_Routes=YES;

This is used to advertise the non-OSPF_Interfaces to OSPF neighbors. By configuring local interfaces (XCF, VIPA, etc.) as Interface rather than OSPF_Interface, extraneous routing information is not exchanged between stacks in the sysplex.

7.2.2 Verify routing from the host

There are several useful commands you can use to verify your routing configuration from the host.


The output from the netstat route command is shown in Example 7-4.

Example 7-4 netstat route command
netstat route:
MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP       14:28:09
Destination      Gateway          Flags  Refcnt  Interface
-----------      -------          -----  ------  ---------
1.1.1.1          9.67.157.137     UGHO   000000  GIGELINK
1.1.1.2          9.67.157.136     UGHO   000000  GIGELINK
9.67.156.1       0.0.0.0          UH     000000  VLINK0
9.67.156.2       0.0.0.0          UH     000000  TOVTAM
9.67.156.4       9.67.157.132     UGO    000000  GIGELINK
9.67.156.5       9.67.157.132     UGHO   000000  GIGELINK
9.67.156.16      0.0.0.0          UC     000000  CISCO2
9.67.156.17      0.0.0.0          UH     000000  CISCO2
9.67.156.18      9.67.157.130     UGHO   000000  GIGELINK
9.67.156.19      9.67.157.131     UGHO   000000  GIGELINK
9.67.156.20      9.67.157.132     UGHO   000000  GIGELINK
9.67.156.21      0.0.0.0          UHC    000000  CISCO2
9.67.156.21      9.67.157.137     UGHO   000000  GIGELINK
9.67.156.25      0.0.0.0          UH     000000  VIPL09439C19
9.67.156.26      0.0.0.0          UH     000000  VIPL09439C1A
9.67.156.32      9.67.157.130     UGO    000000  GIGELINK
9.67.156.33      9.67.157.130     UGHO   000000  GIGELINK
9.67.156.34      9.67.157.130     UGHO   000000  GIGELINK
9.67.156.40      9.67.157.131     UGO    000000  GIGELINK
9.67.156.41      9.67.157.131     UGHO   000000  GIGELINK
9.67.156.42      9.67.157.131     UGHO   000000  GIGELINK
9.67.156.48      9.67.157.132     UGO    000000  GIGELINK
9.67.156.49      9.67.157.132     UGHO   000000  GIGELINK
9.67.156.50      9.67.157.132     UGHO   000000  GIGELINK
9.67.156.65      9.67.157.136     UGHO   000000  GIGELINK
9.67.156.66      0.0.0.0          UH     000000  CISCO1
9.67.156.67      9.67.157.132     UGHO   000000  GIGELINK
9.67.156.68      9.67.157.131     UGHO   000000  GIGELINK
9.67.156.69      9.67.157.130     UGHO   000000  GIGELINK
9.67.156.72      0.0.0.0          UC     000000  EZAXCFN6
9.67.156.72      0.0.0.0          UC     000000  EZAXCFN7
9.67.156.72      0.0.0.0          UC     000000  EZAXCFN5
9.67.156.73      0.0.0.0          UH     000001  EZAXCFN6
9.67.156.73      0.0.0.0          UH     000000  EZAXCFN7
9.67.156.73      0.0.0.0          UH     000000  EZAXCFN5
9.67.156.74      0.0.0.0          UHS    000000  EZAXCFN5
9.67.156.75      0.0.0.0          UHS    000000  EZAXCFN6
9.67.156.76      0.0.0.0          UHS    000000  EZAXCFN7
9.67.156.96      9.67.157.136     UGO    000000  GIGELINK
9.67.156.112     9.67.157.136     UGO    000037  GIGELINK
9.67.156.128     9.67.157.136     UGO    000000  GIGELINK
9.67.156.160     9.67.157.130     UGO    000000  GIGELINK
9.67.156.161     9.67.157.130     UGHO   000000  GIGELINK
9.67.156.164     9.67.157.131     UGO    000000  GIGELINK
9.67.156.165     9.67.157.131     UGHO   000000  GIGELINK
9.67.157.0       9.67.157.136     UGO    000000  GIGELINK
9.67.157.4       9.67.157.136     UGO    000002  GIGELINK
9.67.157.8       9.67.157.136     UGO    000000  GIGELINK
9.67.157.17      0.0.0.0          UH     000000  VIPL09439D11
9.67.157.18      0.0.0.0          UH     000000  VIPL09439D12
9.67.157.128     0.0.0.0          UO     000000  GIGELINK
9.67.157.129     0.0.0.0          UH     000001  GIGELINK
127.0.0.1        0.0.0.0          UH     000004  LOOPBACK
211.1.2.12       9.67.157.136     UGO    000000  GIGELINK
211.1.2.16       9.67.157.136     UGO    000000  GIGELINK
211.1.2.20       9.67.157.136     UGO    000000  GIGELINK

The output from the DISPLAY TCPIP,,OMPROUTE,OSPF,NEIGHBOR command issued from MVS001 is shown in Example 7-5.

Example 7-5 D TCPIP,,OMPR,OSPF,NBR
D TCPIP,,OMPR,OSPF,NBR
EZZ7851I NEIGHBOR SUMMARY 405
NEIGHBOR ADDR   NEIGHBOR ID     STATE LSRXL DBSUM LSREQ HSUP IFC
9.67.156.76     9.67.156.5      128   0     0     0     OFF  EZAXCFN7
9.67.156.65     9.67.156.89     128   0     0     0     OFF  CISCO1
9.67.157.131    9.67.156.75     128   0     0     0     OFF  GIGELINK
9.67.157.130    9.67.156.74     8     0     0     0     OFF  GIGELINK
9.67.157.132    9.67.156.5      128   0     0     0     OFF  GIGELINK
9.67.157.137    9.67.156.82     8     0     0     0     OFF  GIGELINK
9.67.157.136    9.67.156.89     8     0     0     0     OFF  GIGELINK
9.67.156.74     9.67.156.74     128   0     0     0     OFF  EZAXCFN5
9.67.156.75     9.67.156.75     128   0     0     0     OFF  EZAXCFN6

7.3 ASBR configuration and redistribution

An autonomous system border router (ASBR) is a router that runs both OSPF and at least one other routing protocol (in our case, EIGRP). Let’s examine the OSPF configuration for the 7206 ASBR. The configuration for the 7507 is not detailed because it is essentially the same as what is shown here.


Example 7-6 Router OSPF configuration
router ospf 1
 router-id 9.67.156.89
 log-adjacency-changes
 auto-cost reference-bandwidth 1000
 redistribute eigrp 1 subnets route-map sendem
 network 1.1.1.2 0.0.0.0 area 1.1.1.1
 network 9.67.156.0 0.0.0.255 area 1.1.1.1
 network 9.67.157.0 0.0.0.255 area 1.1.1.1
 maximum-paths 3

The following items pertain to Example 7-6:
1. The router ospf 1 command configures the OSPF routing process.
2. The network statements define the interfaces where OSPF runs and determine the OSPF area for the interfaces.
3. The auto-cost reference-bandwidth 1000 command controls how OSPF calculates default metrics for the interfaces. The OSPF metric is calculated as the reference-bandwidth-value divided by the bandwidth, with reference-bandwidth-value equal to 10^8 bits per second (100 Mbps) by default, and bandwidth determined by the bandwidth command. The default calculation gives Fast Ethernet a metric of 1. Setting reference-bandwidth to 1000 (the value is specified in Mbps) changes the calculation so that Fast Ethernet has a cost of 10 and Gigabit Ethernet has a cost of 1.
4. The redistribute command causes EIGRP routing information to be imported to the OSPF process.

Whenever you specifically configure redistribution of routes into an OSPF routing domain, the router automatically becomes an ASBR. However, an ASBR does not, by default, generate a default route into the OSPF routing domain. To force the ASBR to inject a default route, use the command:
default-information originate [always] [metric metric-value] [metric-type type-value] [route-map map-name]
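The cost calculation in item 3 can be checked with a short Python sketch. The function name is hypothetical, and the sketch assumes that the quotient is truncated to an integer and that the result is never less than 1.

```python
def ospf_cost(interface_bandwidth_mbps: float,
              reference_bandwidth_mbps: float = 100.0) -> int:
    """OSPF interface cost: reference bandwidth divided by interface bandwidth.

    Assumption: the quotient is truncated and the minimum cost is 1.
    The default reference bandwidth of 100 Mbps equals 10^8 bits per second.
    """
    if interface_bandwidth_mbps <= 0:
        raise ValueError("interface bandwidth must be positive")
    return max(1, int(reference_bandwidth_mbps / interface_bandwidth_mbps))


if __name__ == "__main__":
    for name, mbps in (("Ethernet", 10), ("Fast Ethernet", 100),
                       ("Gigabit Ethernet", 1000)):
        # Default reference bandwidth, then auto-cost reference-bandwidth 1000
        print(name, ospf_cost(mbps), ospf_cost(mbps, 1000))
```

With the default reference bandwidth, Fast Ethernet and Gigabit Ethernet both come out at a cost of 1, so OSPF cannot prefer the faster link. Raising the reference bandwidth to 1000 restores the distinction.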

7.3.1 Redistribution

In our network, the 7000 series routers ran two routing protocols simultaneously, OSPF and EIGRP. The Cisco IOS software can redistribute information from one routing protocol to another. In our case, we redistributed information from EIGRP into OSPF and from OSPF into EIGRP. This allows OSPF routes to be known in the remote 3640 routers within the EIGRP network and allows the LPARs within the sysplex to be aware of routes from EIGRP.


Networking with z/OS and Cisco Routers: An Interoperability Guide

You must be careful not to cause routing loops when using redistribution. We used route maps to conditionally control the redistribution of routes between routing domains. Refer to Example 7-7 as we describe the configuration.

Example 7-7 Redistribution using route maps

router eigrp 1
 redistribute ospf 1 route-map stopem
router ospf 1
 redistribute eigrp 1 subnets route-map sendem
route-map stopem deny 10
 match tag 1
!
route-map stopem permit 11
 match tag 0
!
route-map sendem permit 10
 match ip address 10
 set tag 1
access-list 10 permit any

Notice the redistribute command under router eigrp 1 and router ospf 1. When considering the action of the redistribute command, remember that the associated router command indicates the destination routing process and the redistribute command specifies the source of the routing information. The redistribute command is defined as follows:

redistribute protocol [process-id] {level-1 | level-1-2 | level-2} [as-number] [metric metric-value] [metric-type type-value] [match {internal | external 1 | external 2}] [tag tag-value] [route-map map-tag] [weight number-value] [subnets]

In our case, the configuration commands router ospf 1 and redistribute eigrp 1 subnets route-map sendem mean:
• That we want to redistribute from EIGRP (source) into OSPF (destination).
• The scope of redistribution includes subnets.
• The route map named sendem should be interrogated to filter the importation of routes from EIGRP into OSPF. Our route map tags these routes with a value of 1 so they don’t come back into EIGRP.


The other redistribute command accomplishes the reverse, that is, OSPF into EIGRP. In Example 7-7, two route maps are defined, called sendem and stopem. The route map named sendem is applied to the redistribute command under router ospf 1 (EIGRP into OSPF), and the route map named stopem is applied to the redistribute command under router eigrp 1 (OSPF into EIGRP). The route-map command is defined as follows:

route-map map-tag [permit | deny] [sequence-number]

Where:
• The map-tag is the name that is referenced by the redistribute command.
• The permit or deny keyword has the following effect:
  – If the match criteria are met for this route map, and the permit keyword is specified, the route is redistributed as controlled by the set actions. In the case of policy routing, the packet is policy routed. If the match criteria are not met, and the permit keyword is specified, the next route map with the same map tag is tested. If a route passes none of the match criteria for the set of route maps sharing the same name, it is not redistributed by that set.
  – If the match criteria are met for the route map, and the deny keyword is specified, the route is not redistributed or, in the case of policy routing, the packet is not policy routed, and no further route maps sharing the same map tag name will be examined. If the packet is not policy routed, the normal forwarding algorithm is used.
• The sequence-number is a number that indicates the position a new route map is to have in the list of route maps already configured with the same name. If given with the no form of this command, the position of the route map should be deleted.

Note also the match and set commands. One or more match commands and one or more set commands typically follow a route-map command. If there are no match commands, then everything matches. If there are no set commands, nothing is done (other than the match). Therefore, you need at least one match or set command.


In our case, these configuration commands apply when EIGRP routes are redistributed into OSPF:

route-map sendem permit 10
 match ip address 10
 set tag 1

• The route-map command defines a route map named sendem.
• Because of the permit keyword, all routes that match are redistributed from EIGRP into OSPF.
• The match condition, match ip address 10, refers to access control list 10 (permit any), which matches all routes, so all routes redistributed from EIGRP into OSPF are tagged with a value of 1 by set tag 1.

These commands apply when OSPF routes are redistributed into EIGRP:

route-map stopem deny 10
 match tag 1
route-map stopem permit 11
 match tag 0

• Remember, all native OSPF routes have a tag value of 0.
• The route-map commands define a route map named stopem with match conditions evaluated in the order designated by the sequence numbers, 10 and 11.
• The first entry, match tag 1, matches all routes tagged with 1 (these are tagged by the route map named sendem, described above). Because of the deny keyword, these routes are not redistributed into EIGRP from OSPF.
• The second entry is then evaluated. Its match condition, match tag 0, allows all native OSPF routes to be redistributed into EIGRP.

Attention: Our route maps, then, have the effect of blocking circular routes. Routes that are learned by OSPF are redistributed into EIGRP and routes that are learned by EIGRP are redistributed into OSPF. But routes learned by EIGRP that are redistributed into OSPF are not passed back into EIGRP again.
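The tagging scheme can be traced with a small simulation. The following Python sketch is not router code: the Route class, the entry tuples, and apply_route_map are hypothetical names of our own, and the model covers only the permit/deny, match, and set tag behavior described in this section.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    tag: int = 0          # routes carry tag 0 unless a route map sets one


# One route-map entry: (sequence, action, match predicate, tag to set or None)
Entry = Tuple[int, str, Callable[[Route], bool], Optional[int]]

# route-map sendem permit 10 / match ip address 10 (permit any) / set tag 1
SENDEM: List[Entry] = [(10, "permit", lambda r: True, 1)]

# route-map stopem deny 10 / match tag 1, then permit 11 / match tag 0
STOPEM: List[Entry] = [
    (10, "deny", lambda r: r.tag == 1, None),
    (11, "permit", lambda r: r.tag == 0, None),
]


def apply_route_map(entries: List[Entry], route: Route) -> Optional[Route]:
    """Return the redistributed route, or None if the route is filtered."""
    for _seq, action, matches, set_tag in sorted(entries, key=lambda e: e[0]):
        if not matches(route):
            continue                      # try the next entry with the same name
        if action == "deny":
            return None                   # matched a deny entry: not redistributed
        return route if set_tag is None else replace(route, tag=set_tag)
    return None                           # matched no entry: not redistributed


eigrp_route = Route("9.67.156.96/28")             # learned from a 3640 through EIGRP
in_ospf = apply_route_map(SENDEM, eigrp_route)    # EIGRP into OSPF, tagged 1
back_in_eigrp = apply_route_map(STOPEM, in_ospf)  # OSPF into EIGRP: blocked
native_ospf = apply_route_map(STOPEM, Route("9.67.156.0/30"))  # tag 0: passes

print(in_ospf, back_in_eigrp, native_ospf)
```

The route that entered OSPF through sendem carries tag 1 and is dropped by the deny entry of stopem, while a route that originated on the OSPF side keeps tag 0 and is passed into EIGRP unchanged.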

The effect of the route-map commands and redistribution can be seen using the show ip ospf database command as illustrated in Example 7-8.

Example 7-8 show ip ospf database command

C7200-Z55#sh ip ospf dat

       OSPF Router with ID (9.67.156.89) (Process ID 1)

                Router Link States (Area 1.1.1.1)

Link ID         ADV Router      Age         Seq#       Checksum Link count
9.67.156.1      9.67.156.1      1295        0x800000CC 0x931E   9
9.67.156.5      9.67.156.5      1291        0x800000C8 0x65C8   11
9.67.156.74     9.67.156.74     1263        0x800000C0 0xE915   11
9.67.156.75     9.67.156.75     1247        0x800000B7 0xC837   11
9.67.156.82     9.67.156.82     1739        0x80000076 0xF5B3   6
9.67.156.89     9.67.156.89     1552        0x80000275 0xF5C7   7

                Net Link States (Area 1.1.1.1)

Link ID         ADV Router      Age         Seq#       Checksum
9.67.157.131    9.67.156.75     1258        0x8000003F 0x4F4D

                Type-5 AS External Link States

Link ID         ADV Router      Age         Seq#       Checksum Tag
9.67.156.0      9.67.156.1      156         0x800000A0 0x259D   0
9.67.156.1      9.67.156.1      156         0x800000A0 0x2D91   0
9.67.156.4      9.67.156.5      715         0x800000A0 0xE4D5   0
9.67.156.5      9.67.156.5      719         0x800000A0 0xECC9   0
9.67.156.16     9.67.156.5      430         0x80000066 0xC824   0
9.67.156.16     9.67.156.74     367         0x80000066 0x297E   0
9.67.156.16     9.67.156.75     139         0x80000066 0x2383   0
9.67.156.24     9.67.156.1      160         0x800000A0 0x1C92   0
9.67.156.25     9.67.156.1      160         0x800000A0 0x3C6A   0
9.67.156.26     9.67.156.1      160         0x800000A0 0x3273   0
9.67.156.32     9.67.156.74     920         0x8000009F 0x1648   0
9.67.156.33     9.67.156.74     920         0x8000009F 0x3620   0
9.67.156.34     9.67.156.74     920         0x8000009F 0x2C29   0
9.67.156.40     9.67.156.75     455         0x800000A0 0xBD96   0
9.67.156.41     9.67.156.75     455         0x800000A0 0xDD6E   0
9.67.156.42     9.67.156.75     455         0x800000A0 0xD377   0
9.67.156.48     9.67.156.5      719         0x800000A0 0x137F   0
9.67.156.49     9.67.156.5      720         0x800000A0 0x3357   0
9.67.156.50     9.67.156.5      720         0x800000A0 0x2960   0
9.67.156.64     9.67.156.82     994         0x8000005E 0x994B   1
9.67.156.72     9.67.156.1      161         0x800000A0 0x3A44   0
9.67.156.72     9.67.156.5      720         0x800000A0 0x2258   0
9.67.156.72     9.67.156.74     921         0x8000009F 0x84B1   0
9.67.156.72     9.67.156.75     455         0x800000A0 0x7CB7   0
9.67.156.96     9.67.156.82     491         0x80000061 0x22A7   1
9.67.156.96     9.67.156.89     1044        0x8000015B 0x1C6    1
9.67.156.112    9.67.156.82     492         0x80000061 0x8138   1
9.67.156.112    9.67.156.89     1044        0x8000015B 0x6057   1
9.67.156.128    9.67.156.82     492         0x80000061 0xE0C8   1
9.67.156.128    9.67.156.89     1044        0x8000015B 0xBFE7   1
9.67.156.160    9.67.156.74     922         0x8000009F 0x29B0   0
9.67.156.161    9.67.156.74     922         0x8000009F 0x31A4   0
9.67.156.164    9.67.156.75     456         0x800000A0 0xF8DA   0
9.67.156.165    9.67.156.75     456         0x800000A0 0x1CE    0
9.67.157.0      9.67.156.82     492         0x80000061 0x23F9   1
9.67.157.0      9.67.156.89     1044        0x8000015B 0x219    1
9.67.157.4      9.67.156.82     492         0x80000061 0xFA1E   1
9.67.157.4      9.67.156.89     1044        0x8000015B 0xD93D   1
9.67.157.8      9.67.156.82     492         0x80000061 0xD242   1
9.67.157.8      9.67.156.89     1044        0x8000015B 0xB161   1
9.67.157.16     9.67.156.1      163         0x800000A0 0x6154   0
9.67.157.17     9.67.156.1      163         0x800000A0 0x812C   0
9.67.157.18     9.67.156.1      163         0x800000A0 0x7735   0
211.1.2.0       9.67.156.82     493         0x80000061 0xA18E   1
211.1.2.0       9.67.156.89     551         0x80000061 0x77B1   1
211.1.2.4       9.67.156.82     493         0x80000061 0x79B2   1
211.1.2.4       9.67.156.89     551         0x80000061 0x4FD5   1
211.1.2.8       9.67.156.82     493         0x80000061 0x51D6   1
211.1.2.8       9.67.156.89     552         0x80000061 0x27F9   1
211.1.2.12      9.67.156.82     493         0x80000061 0x29FA   1
211.1.2.12      9.67.156.89     552         0x8000015C 0x61B    1
211.1.2.16      9.67.156.82     493         0x80000061 0x11F    1
211.1.2.16      9.67.156.89     552         0x8000015C 0xDD3F   1
211.1.2.20      9.67.156.82     493         0x80000061 0xD843   1
211.1.2.20      9.67.156.89     552         0x8000015C 0xB563   1
C7200-Z55#

Notice that the route tag values in the right-hand column show a value of 0 for native OSPF routes and 1 for routes learned from EIGRP.

Tip: The use of route maps is an effective way to safely redistribute routing information between autonomous systems.

When routes from other protocols are redistributed into OSPF, each route is advertised individually in an external LSA. In many cases, you can limit the number of individual routes by configuring the Cisco IOS software to advertise a single route for all the redistributed routes that are covered by a specified network address and mask. Doing this will decrease the size of the OSPF link-state database and minimize OSPF route calculations on the host. Of course, to take advantage of this type of configuration, your addressing scheme must be such that summarization at the boundary is possible.

To advertise one summary route for all redistributed routes covered by a network address and mask, use the following command:

#summary-address {ip-address mask | prefix mask} [not-advertise] [tag tag]
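Before configuring a summary, it helps to confirm which redistributed prefixes a candidate address and mask would actually cover. The sketch below uses Python's standard ipaddress module with the six 211.1.2.x/30 frame relay subnets from our network. The summary 211.1.2.0 255.255.255.224 is a hypothetical value chosen for illustration; we did not configure it in the lab.

```python
import ipaddress

# The six /30 frame relay subnets redistributed from EIGRP into OSPF
redistributed = [
    ipaddress.ip_network(f"211.1.2.{start}/30") for start in (0, 4, 8, 12, 16, 20)
]

# Hypothetical candidate: summary-address 211.1.2.0 255.255.255.224
summary = ipaddress.ip_network("211.1.2.0/255.255.255.224")

# Which redistributed routes fall inside the candidate summary?
covered = [net for net in redistributed if net.subnet_of(summary)]
print(f"{len(covered)} of {len(redistributed)} routes covered by {summary}")

# The tightest aggregates that cover exactly the same address space
aggregates = list(ipaddress.collapse_addresses(redistributed))
print(aggregates)
```

A single summary replaces six external LSAs per ASBR in this case, at the price of also advertising the unused part of the summarized range.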


7.3.2 Verify routing from the router

Here are some useful commands to confirm that your routing configuration is operating as expected. Example 7-9 shows the output from the show ip ospf neighbor command. In our network, each 7000 series router is connected to all four LPARs over an Ethernet (Ethernet port channel for the 7507 and Gigabit Ethernet for the 7206) and a channel interface. Notice that each LPAR shows a state of FULL.

Example 7-9 show ip ospf neighbor

C7200-Z55#sh ip ospf nei

Neighbor ID     Pri   State           Dead Time   Address         Interface
9.67.156.1        1   FULL/DROTHER    00:00:36    9.67.157.129    GigabitEthernet4/0
9.67.156.75       1   FULL/DROTHER    00:00:33    9.67.157.131    GigabitEthernet4/0
9.67.156.5        1   FULL/DROTHER    00:00:34    9.67.157.132    GigabitEthernet4/0
9.67.157.243      1   FULL/DROTHER    00:00:36    9.67.157.130    GigabitEthernet4/0
9.67.156.82       1   FULL/BDR        00:00:32    9.67.157.137    GigabitEthernet4/0
9.67.156.1        2   FULL/  -        00:01:39    9.67.156.66     Channel3/0
9.67.156.75       2   FULL/  -        00:01:48    9.67.156.68     Channel3/0
9.67.156.5        2   FULL/  -        00:01:50    9.67.156.67     Channel3/0
9.67.157.243      2   FULL/  -        00:01:56    9.67.156.69     Channel3/0
N/A               0   DOWN/  -        -           9.67.156.82     ATM2/0.1
N/A               0   DOWN/  -        -           9.67.156.81     ATM2/0.1
C7200-Z55#

NIVT7507#sh ip ospf nei

Neighbor ID     Pri   State           Dead Time   Address         Interface
9.67.156.1        1   FULL/DROTHER    00:00:36    9.67.157.129    Port-channel1
9.67.156.75       1   FULL/DROTHER    00:00:34    9.67.157.131    Port-channel1
9.67.156.5        1   FULL/DROTHER    00:00:35    9.67.157.132    Port-channel1
9.67.157.243      1   FULL/DROTHER    00:00:37    9.67.157.130    Port-channel1
9.67.156.89       1   FULL/DR         00:00:33    9.67.157.136    Port-channel1
9.67.156.1        2   FULL/  -        00:01:33    9.67.156.17     Channel6/2
9.67.156.75       2   FULL/  -        00:01:38    9.67.156.19     Channel6/2
9.67.156.5        2   FULL/  -        00:01:40    9.67.156.20     Channel6/2
9.67.157.243      2   FULL/  -        00:01:51    9.67.156.18     Channel6/2
NIVT7507#
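When the neighbor list is long, the check for adjacencies that are not FULL can be scripted. The following Python sketch assumes the command output has already been captured as text; the function name neighbors_not_full and the abbreviated sample are our own, and the parsing relies only on whitespace-separated columns with a numeric priority in the second field.

```python
def neighbors_not_full(show_output: str) -> list:
    """Return (neighbor_id, state, interface) for neighbors whose state is not FULL."""
    problems = []
    for line in show_output.splitlines():
        fields = line.split()
        # Data rows have a numeric priority in the second column;
        # the header row and blank lines are skipped.
        if len(fields) < 5 or not fields[1].isdigit():
            continue
        neighbor_id, state, interface = fields[0], fields[2], fields[-1]
        if not state.startswith("FULL"):
            problems.append((neighbor_id, state, interface))
    return problems


# Abbreviated sample in the layout of the show ip ospf neighbor output
sample = """\
Neighbor ID     Pri   State           Dead Time   Address         Interface
9.67.156.1        1   FULL/DROTHER    00:00:36    9.67.157.129    GigabitEthernet4/0
9.67.156.82       1   FULL/BDR        00:00:32    9.67.157.137    GigabitEthernet4/0
9.67.156.1        2   FULL/  -        00:01:39    9.67.156.66     Channel3/0
N/A               0   DOWN/  -        -           9.67.156.82     ATM2/0.1
"""

print(neighbors_not_full(sample))
```

In our lab output, only the neighbors on the administratively down ATM subinterface would be reported.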


Example 7-10 shows the output from the show ip ospf interface command for both the 7206 and the 7507. This command shows all interfaces configured for OSPF using the network command. Notice the OSPF costs for each of the interfaces in the output.

Example 7-10 show ip ospf interface

C7200-Z55#sh ip ospf int
CASA1 is up, line protocol is up
  Internet Address 1.1.1.2/32, Area 1.1.1.1
  Process ID 1, Router ID 9.67.156.89, Network Type LOOPBACK, Cost: 1
  Loopback interface is treated as a stub Host
GigabitEthernet4/0 is up, line protocol is up
  Internet Address 9.67.157.136/28, Area 1.1.1.1
  Process ID 1, Router ID 9.67.156.89, Network Type BROADCAST, Cost: 1
  Transmit Delay is 1 sec, State DROTHER, Priority 0
  Designated Router (ID) 9.67.156.75, Interface address 9.67.157.131
  Backup Designated router (ID) 9.67.156.5, Interface address 9.67.157.132
  Timer intervals configured, Hello 10, Dead 40, Wait 40, Retransmit 5
    Hello due in 00:00:02
  Index 8/8, flood queue length 0
  Next 0x0(0)/0x0(0)
  Last flood scan length is 6, maximum is 27
  Last flood scan time is 0 msec, maximum is 0 msec
  Neighbor Count is 5, Adjacent neighbor count is 2
    Adjacent with neighbor 9.67.156.75 (Designated Router)
    Adjacent with neighbor 9.67.156.5 (Backup Designated Router)
  Suppress hello for 0 neighbor(s)
Channel3/0 is up, line protocol is up
  Internet Address 9.67.156.65/29, Area 1.1.1.1
  Process ID 1, Router ID 9.67.156.89, Network Type POINT_TO_MULTIPOINT, Cost: 10
  Transmit Delay is 1 sec, State POINT_TO_MULTIPOINT,
  Timer intervals configured, Hello 30, Dead 120, Wait 120, Retransmit 5
    Hello due in 00:00:15
  Index 2/2, flood queue length 0
  Next 0x0(0)/0x0(0)
  Last flood scan length is 6, maximum is 27
  Last flood scan time is 0 msec, maximum is 0 msec
  Neighbor Count is 4, Adjacent neighbor count is 4
    Adjacent with neighbor 9.67.156.74
    Adjacent with neighbor 9.67.156.75
    Adjacent with neighbor 9.67.156.1
    Adjacent with neighbor 9.67.156.5
  Suppress hello for 0 neighbor(s)
ATM2/0.1 is administratively down, line protocol is down
  Internet Address 9.67.156.89/28, Area 1.1.1.1
  Process ID 1, Router ID 9.67.156.89, Network Type POINT_TO_MULTIPOINT, Cost: 1
  Transmit Delay is 1 sec, State DOWN,
  Timer intervals configured, Hello 30, Dead 120, Wait 120, Retransmit 5
C7200-Z55#

NIVT7507#sh ip ospf int
Port-channel1 is up, line protocol is up
  Internet Address 9.67.157.137/28, Area 1.1.1.1
  Process ID 7514, Router ID 9.67.156.82, Network Type BROADCAST, Cost: 1
  Transmit Delay is 1 sec, State DROTHER, Priority 0
  Designated Router (ID) 9.67.156.75, Interface address 9.67.157.131
  Backup Designated router (ID) 9.67.156.5, Interface address 9.67.157.132
  Timer intervals configured, Hello 10, Dead 40, Wait 40, Retransmit 5
    Hello due in 00:00:09
  Index 4/4, flood queue length 0
  Next 0x0(0)/0x0(0)
  Last flood scan length is 1, maximum is 25
  Last flood scan time is 0 msec, maximum is 0 msec
  Neighbor Count is 5, Adjacent neighbor count is 2
    Adjacent with neighbor 9.67.156.75 (Designated Router)
    Adjacent with neighbor 9.67.156.5 (Backup Designated Router)
  Suppress hello for 0 neighbor(s)
Channel6/2 is up, line protocol is up
  Internet Address 9.67.156.21/29, Area 1.1.1.1
  Process ID 7514, Router ID 9.67.156.82, Network Type POINT_TO_MULTIPOINT, Cost: 10
  Transmit Delay is 1 sec, State POINT_TO_MULTIPOINT,
  Timer intervals configured, Hello 30, Dead 120, Wait 120, Retransmit 5
    Hello due in 00:00:29
  Index 3/3, flood queue length 0
  Next 0x0(0)/0x0(0)
  Last flood scan length is 1, maximum is 12
  Last flood scan time is 0 msec, maximum is 0 msec
  Neighbor Count is 3, Adjacent neighbor count is 3
    Adjacent with neighbor 9.67.156.74
    Adjacent with neighbor 9.67.156.75
    Adjacent with neighbor 9.67.156.5
  Suppress hello for 0 neighbor(s)
ATM1/0.1 is administratively down, line protocol is down
  Internet Address 9.67.156.82/28, Area 1.1.1.1
  Process ID 7514, Router ID 9.67.156.82, Network Type POINT_TO_MULTIPOINT, Cost: 1
  Transmit Delay is 1 sec, State DOWN,
  Timer intervals configured, Hello 30, Dead 120, Wait 120, Retransmit 5
CASA1 is up, line protocol is up
  Internet Address 1.1.1.1/32, Area 1.1.1.1
  Process ID 7514, Router ID 9.67.156.82, Network Type LOOPBACK, Cost: 1
  Loopback interface is treated as a stub Host
NIVT7507#

Example 7-11 shows the output from the show ip eigrp interface command for both the 7206 and the 7507. This command shows all interfaces configured for EIGRP using the network command. Here you can see the interfaces that connect to the three remote 3640 routers.

Example 7-11 show ip eigrp interface

C7200-Z55# sh ip eigrp int
IP-EIGRP interfaces for process 1

                    Xmit Queue   Mean   Pacing Time   Multicast    Pending
Interface    Peers  Un/Reliable  SRTT   Un/Reliable   Flow Timer   Routes
Se5/0.1        1        0/0        40       0/23          151          0
Se5/0.2        1        0/0        41       0/23          143          0
Se5/0.3        1        0/0        38       2/95          227          0
C7200-Z55#

NIVT7507#sh ip eigrp int
IP-EIGRP interfaces for process 1

                    Xmit Queue   Mean   Pacing Time   Multicast    Pending
Interface    Peers  Un/Reliable  SRTT   Un/Reliable   Flow Timer   Routes
Fd5/0          0        0/0         0       0/10            0          0
Se4/0/0.1      1        0/0        64       2/95          287          0
Se4/0/0.2      1        0/0        65       2/95          307          0
Se4/0/0.3      1        0/0        65       2/95          319          0
NIVT7507#

Example 7-12 shows the output from the show ip route command issued from the 7507. Notice the codes at the start of each line of output. The codes designate the type of route (C - connected, S - static, I - IGRP, R - RIP, M - mobile, B - BGP, D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1, and so on). You can also see the administrative distance and the metric (or OSPF cost) in the brackets ([]) for each entry.

Example 7-12 show ip route

NIVT7507#sh ip route
Codes: C - connected, S - static, I - IGRP, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
       i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, ia - IS-IS inter area
       * - candidate default, U - per-user static route, o - ODR
       P - periodic downloaded static route

Gateway of last resort is not set

     1.0.0.0/32 is subnetted, 2 subnets
C       1.1.1.1 is directly connected, CASA1
O       1.1.1.2 [110/2] via 9.67.157.136, 07:38:15, Port-channel1
     7.0.0.0/30 is subnetted, 1 subnets
C       7.7.7.0 is directly connected, Tunnel1
     8.0.0.0/30 is subnetted, 1 subnets
C       8.8.8.0 is directly connected, Tunnel62
     9.0.0.0/8 is variably subnetted, 45 subnets, 4 masks
D       9.67.156.128/28 [90/10514432] via 211.1.2.10, 2d05h, Serial4/0/0.3
C       9.67.157.128/28 is directly connected, Port-channel1
O E1    9.67.156.164/30 [110/2] via 9.67.157.131, 07:38:15, Port-channel1
O E1    9.67.156.165/32 [110/2] via 9.67.157.131, 07:38:15, Port-channel1
O E1    9.67.156.160/30 [110/2] via 9.67.157.130, 07:38:15, Port-channel1
O E1    9.67.156.161/32 [110/2] via 9.67.157.130, 07:38:17, Port-channel1
O       9.67.156.68/32 [110/1] via 9.67.157.131, 07:38:17, Port-channel1
O       9.67.156.69/32 [110/1] via 9.67.157.130, 07:38:17, Port-channel1
O       9.67.156.66/32 [110/1] via 9.67.157.129, 07:38:17, Port-channel1
O       9.67.156.67/32 [110/1] via 9.67.157.132, 07:38:17, Port-channel1
D EX    9.67.156.64/29 [170/11024128] via 211.1.2.2, 2d05h, Serial4/0/0.1
                       [170/11024128] via 211.1.2.10, 2d05h, Serial4/0/0.3
                       [170/11024128] via 211.1.2.6, 2d05h, Serial4/0/0.2
O       9.67.156.65/32 [110/1] via 9.67.157.136, 07:38:17, Port-channel1
S       9.67.156.76/32 is directly connected, Tunnel69
S       9.67.156.74/32 is directly connected, Tunnel62
S       9.67.156.75/32 is directly connected, Tunnel154
O E1    9.67.156.72/29 [110/2] via 9.67.157.129, 07:38:17, Port-channel1
                       [110/2] via 9.67.157.132, 07:38:17, Port-channel1
                       [110/2] via 9.67.157.130, 07:38:17, Port-channel1
S       9.67.156.73/32 is directly connected, Tunnel1
D       9.67.156.112/28 [90/10514432] via 211.1.2.6, 2d05h, Serial4/0/0.2
D       9.67.156.96/28 [90/10514432] via 211.1.2.2, 2d05h, Serial4/0/0.1
S       9.67.156.20/32 [1/0] via 9.67.156.20, Channel6/2
S       9.67.156.18/32 [1/0] via 9.67.156.18, Channel6/2
S       9.67.156.19/32 [1/0] via 9.67.156.19, Channel6/2
O E1    9.67.157.18/32 [110/2] via 9.67.157.129, 07:38:17, Port-channel1
O E1    9.67.157.17/32 [110/2] via 9.67.157.129, 07:38:17, Port-channel1
C       9.67.156.16/29 is directly connected, Channel6/2
O E1    9.67.157.16/29 [110/2] via 9.67.157.129, 07:38:17, Port-channel1
O E1    9.67.156.26/32 [110/2] via 9.67.157.129, 07:38:17, Port-channel1
O E1    9.67.156.24/29 [110/2] via 9.67.157.129, 07:38:17, Port-channel1
O E1    9.67.156.25/32 [110/2] via 9.67.157.129, 07:38:17, Port-channel1
O E1    9.67.156.4/30 [110/2] via 9.67.157.132, 07:38:17, Port-channel1
O E1    9.67.156.5/32 [110/2] via 9.67.157.132, 07:38:17, Port-channel1
D       9.67.157.4/30 [90/10639872] via 211.1.2.6, 2d05h, Serial4/0/0.2
O E1    9.67.156.0/30 [110/2] via 9.67.157.129, 07:38:17, Port-channel1
O E1    9.67.156.1/32 [110/2] via 9.67.157.129, 07:38:17, Port-channel1
D       9.67.157.0/30 [90/10639872] via 211.1.2.2, 2d05h, Serial4/0/0.1
D       9.67.157.8/30 [90/10639872] via 211.1.2.10, 2d05h, Serial4/0/0.3
O E1    9.67.156.50/32 [110/2] via 9.67.157.132, 07:38:17, Port-channel1
O E1    9.67.156.48/29 [110/2] via 9.67.157.132, 07:38:17, Port-channel1
O E1    9.67.156.49/32 [110/2] via 9.67.157.132, 07:38:17, Port-channel1
O E1    9.67.156.34/32 [110/2] via 9.67.157.130, 07:38:17, Port-channel1
O E1    9.67.156.32/29 [110/2] via 9.67.157.130, 07:38:17, Port-channel1
O E1    9.67.156.33/32 [110/2] via 9.67.157.130, 07:38:17, Port-channel1
O E1    9.67.156.42/32 [110/2] via 9.67.157.131, 07:38:17, Port-channel1
O E1    9.67.156.40/29 [110/2] via 9.67.157.131, 07:38:17, Port-channel1
O E1    9.67.156.41/32 [110/2] via 9.67.157.131, 07:38:17, Port-channel1
     11.0.0.0/30 is subnetted, 1 subnets
C       11.11.11.0 is directly connected, Tunnel69
     211.1.2.0/30 is subnetted, 6 subnets
D       211.1.2.16 [90/11023872] via 211.1.2.6, 2d05h, Serial4/0/0.2
D       211.1.2.20 [90/11023872] via 211.1.2.10, 2d05h, Serial4/0/0.3
C       211.1.2.0 is directly connected, Serial4/0/0.1
C       211.1.2.4 is directly connected, Serial4/0/0.2
C       211.1.2.8 is directly connected, Serial4/0/0.3
D       211.1.2.12 [90/11023872] via 211.1.2.2, 2d05h, Serial4/0/0.1
     12.0.0.0/30 is subnetted, 1 subnets
C       12.12.12.0 is directly connected, Tunnel154
NIVT7507#
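The bracketed pair in each routing table entry can be pulled out mechanically, which is handy when comparing many routes. This Python sketch is illustrative only: the function name distance_and_metric is ours, and the sample lines are copied from the show ip route output with their route codes.

```python
import re

# Matches "[administrative distance/metric]" in a routing table entry
BRACKETS = re.compile(r"\[(\d+)/(\d+)\]")


def distance_and_metric(entry: str):
    """Return (administrative_distance, metric), or None if the entry has no brackets."""
    found = BRACKETS.search(entry)
    return (int(found.group(1)), int(found.group(2))) if found else None


entries = [
    "O E1 9.67.156.164/30 [110/2] via 9.67.157.131, 07:38:15, Port-channel1",
    "D    9.67.156.128/28 [90/10514432] via 211.1.2.10, 2d05h, Serial4/0/0.3",
    "D EX 9.67.156.64/29 [170/11024128] via 211.1.2.2, 2d05h, Serial4/0/0.1",
    "C    9.67.157.128/28 is directly connected, Port-channel1",
]

for entry in entries:
    print(entry.split()[0], distance_and_metric(entry))
```

The three administrative distances seen here (90 for EIGRP, 110 for OSPF, 170 for EIGRP external) are the values visible in the output above; connected entries carry no bracketed pair.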

7.4 Summary

This scenario showed how a Cisco network using EIGRP can be connected to a sysplex running OSPF. We also showed how we used route maps to control redistribution and prevent routing loops. We recommend you always try to simplify your routing topology whenever possible. Using stub areas and default routes is recommended for most enterprises.


Chapter 8. Implementing QoS in a z/OS and Cisco environment

The best way to understand how to design, configure, and deploy QoS policies to meet your business needs is to examine the details in the context of a scenario. In this chapter, we present a sample network to illustrate these concepts. To do so, we analyze our business requirements, create a set of traffic classes, and assign the classes to policies that relate to the business criteria. We look at how the capabilities of the z/OS Policy Agent can be combined with the QoS features of the Cisco IOS to enable end-to-end differentiated services. We examine the results of generated traffic sent over the QoS-enabled network in our lab environment and illustrate our policies in action.

While RSVP could also be defined and used for QoS signaling, we configured only DiffServ in our environment, since that is most applicable to a majority of enterprises.

© Copyright IBM Corp. 2002


8.1 Implementation steps

This section walks you through our sample scenario and illustrates how to develop a plan for QoS deployment, analyze traffic requirements, and assign classes and policies. As we will see, your policies will be directly influenced by the configuration of resources in your network as well as the types of application data flowing through them.

8.1.1 Perform traffic audit

As previously described in Chapter 4, “Quality of Service” on page 73, the first step in deploying QoS in a network is to analyze the application and traffic requirements. To adequately perform this step, it is necessary to have a good understanding of the current network environment. Figure 8-1 illustrates the network used in our lab scenario.

[Figure 8-1: network diagram, not reproduced here. Labels recoverable from the original: IBM z/OS Sysplex; OSPF network, area 1.1.1.1; 6509 switch; c7507 (.137) and c7206 (.136, g4/0) on 9.67.157.128/28, VLAN 400; 211.1.1.0/24; EIGRP network; frame relay network with subnets 211.1.2.0/30, 211.1.2.4/30, 211.1.2.8/30, 211.1.2.12/30, 211.1.2.16/30, and 211.1.2.20/30; c7513; c3640n1 (fa 3/0, .97, 9.67.156.96/28, VLAN 100); c3640n2 (fa 3/0, .113, 9.67.156.112/28, VLAN 200); c3640n3 (fa 3/0, .129, 9.67.156.128/28, VLAN 300).]

Figure 8-1 Our network environment

In our network, we have three remote sites connected via a frame relay network. Access to the sysplex is available through a Cisco 7507 and a Cisco 7206 VXR router. Each serial interface to the frame relay network operates at a speed of 1024 kbps. Permanent virtual circuits (PVCs) connect each of the remote sites to both sysplex ingress routers and have a defined committed information rate (CIR) of 256 kbps.


As part of the traffic audit, probes should be placed at various traffic aggregation points to analyze the traffic patterns present within the network. Armed with detailed information about the traffic types and patterns that exist within the network, the traffic audit can proceed to determine the associated applications and relative business priority for each traffic category. The results of our traffic audit are summarized in Table 8-1.

Table 8-1 Traffic audit results

Application                     Business Priority   Traffic Type
Order Entry, Customer Service   1                   SNA Interactive
CRM                             2                   Telnet
HR                              3                   HTTP
File Transfer                   4                   SNA Batch
File Transfer                   4                   FTP
Uncategorized                   5                   Other

Our sample network consists of several types of application traffic. To illustrate the support for QoS, we have identified Telnet, FTP, HTTP or Web traffic, SNA Interactive, and SNA Batch traffic types. We ranked the traffic in terms of business priority. Another thing to consider in the traffic audit is any new application to be deployed in the network. For example, voice traffic is not part of our sample IP-based network today. It is anticipated that Voice over IP (VoIP) will be deployed in the future throughout our network. As a result, planners must take this into consideration when assigning classifications as outlined in 8.1.2, “Traffic classification” on page 234.

8.1.2 Traffic classification

Using the information from the traffic audit, we determine exactly what traffic classifications we will include in the network. We associate traffic classes with each of the identified traffic types according to the assigned business priority as shown in Table 8-2. The goal is to minimize the number of classes and allocate traffic into aggregate classes.

Table 8-2 Traffic classification values

Application                     Business Priority   Traffic Type      Class
Future Voice                    1                   VoIP              LLQ
Order Entry, Customer Service   1                   SNA Interactive   Platinum
CRM                             2                   Telnet            Gold
HR                              3                   HTTP              Silver
File Transfer                   4                   SNA Batch         Bronze
File Transfer                   4                   FTP               Bronze
Uncategorized                   5                   Other             Best-effort

To meet our requirements, we define four general traffic classifications using the Olympic model: Platinum, Gold, Silver, and Bronze. Below these service levels is the standard, best-effort service provided by IP. Low Latency Queuing (LLQ) can also be configured above Platinum for voice traffic. Even though voice traffic may not currently be part of your network, we suggest refraining from allocating any other high priority data traffic to this top priority.

8.1.3 QoS policy definition

We now assign and document the QoS policies that guide our actual configuration throughout the network, edge routers and switches, and host components. Figure 8-2 on page 236 illustrates the ingress points where traffic marking takes place, and the congestion points where queuing and congestion management occur.

[Figure 8-2: network diagram, not reproduced here. It shows the same network as Figure 8-1 (z/OS sysplex in OSPF area 1.1.1.1, the 6509 switch, the c7507 and c7206 routers, the frame relay network with the 211.1.2.x/30 subnets, the c7513, and the remote routers c3640n1, c3640n2, and c3640n3), annotated with check marks at the classification points and "X" marks at the congestion points.]

Figure 8-2 Congestion and re-marking points

Traffic is classified at the edges of the network. In our case, classification takes place at the 7507 and 7206 routers for traffic coming from the sysplex and at the 3640 routers for traffic originating at the remote LANs. These points are marked by check marks in the diagram. In our simple network, the major congestion points are at the wide area network boundaries, marked by “X” in the diagram. It is at these points where Class-Based Weighted Fair Queuing (CBWFQ) will manage congestion and ensure applications are allocated bandwidth consistent with our business policies.


Voice traffic is not currently served by the IP network but is sure to be added in the future. By its nature, it requires consistent, low latency service, so the class-based priority queue, Low Latency Queuing (LLQ), will be reserved for voice. The other classes will be assigned relative bandwidth percentages. Standard assured forwarding and expedited forwarding DSCP values are used for compatibility with carrier service networks and consistency across management domains. The DSCP values used are depicted in Table 8-3.

Table 8-3 DSCP values

Class      Bandwidth     DSCP Value   DSCP Setting
VoIP       100k or 10%   46           101110, EF
Platinum   40%           10           001010, AF11
Gold       30%           12           001100, AF12
Silver     10%           18           010010, AF21
Bronze     5%            26           011010, AF31
Other      N/A           0            000000, or
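The decimal and binary columns of Table 8-3 follow from the standard assured forwarding numbering, in which AFxy corresponds to DSCP 8x + 2y (x is the class, y the drop precedence), and expedited forwarding is DSCP 46. The short Python sketch below reproduces the table values; the function name af_dscp is our own.

```python
def af_dscp(af_class: int, drop_precedence: int) -> int:
    """Return the decimal DSCP for AF<class><drop precedence>: 8 * class + 2 * drop."""
    if af_class not in (1, 2, 3, 4) or drop_precedence not in (1, 2, 3):
        raise ValueError("AF classes are 1-4 and drop precedences are 1-3")
    return 8 * af_class + 2 * drop_precedence


EF_DSCP = 46  # expedited forwarding

codepoints = [
    ("EF", EF_DSCP),
    ("AF11", af_dscp(1, 1)),
    ("AF12", af_dscp(1, 2)),
    ("AF21", af_dscp(2, 1)),
    ("AF31", af_dscp(3, 1)),
]

# Print each name with its decimal value and six-bit binary form
for name, dscp in codepoints:
    print(f"{name}: {dscp} = {dscp:06b}")
```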

8.2 Configuration examples

This section shows how we configured QoS in our network.

8.2.1 z/OS configuration

The z/OS Policy Agent (PAGENT) was configured to mark packets according to the traffic classes we defined. Example 8-1 shows the configuration statements used in the PAGENT configuration file.

Example 8-1 PAGENT configuration

##################################################################
# Traffic Priorities for Policy Agent - Defaults to All interfaces
##################################################################
PolicyAction            DFSRV10
{
  PolicyScope           DataTraffic
  OutgoingTOS           00101000        ;Bit Mask for TOS
}
PolicyAction            DFSRV12
{
  PolicyScope           DataTraffic
  OutgoingTOS           00110000        ;Bit Mask for TOS
}
PolicyAction            DFSRV18
{
  PolicyScope           DataTraffic
  OutgoingTOS           01001000        ;Bit Mask for TOS
}
PolicyAction            DFSRV26
{
  PolicyScope           DataTraffic
  OutgoingTOS           01101000        ;Bit Mask for TOS
}

##################################################################
# Allows all IP interfaces to use the Policy Agent (0.0.0.0)
##################################################################
#-----------------------------------------------------------------
# Rules for SNA Batch and FTP
#-----------------------------------------------------------------
PolicyRule              DFSRV26_Rule_EE_Batch
{
  Direction             Both
  PolicyScope           Both
  ProtocolNumber        UDP
  SourceAddressRange    9.67.156.1 9.67.156.1
  SourcePortRange       12004 12004
  ServiceReference      DFSRV26         # Service Category
}
PolicyRule              DFSRV26_Rule_FTP
{
  Direction             Both
  PolicyScope           Both
  SourcePortRange       20 21
  ServiceReference      DFSRV26         # Service Category
}

#-----------------------------------------------------------------
# Rules for HTTP
#-----------------------------------------------------------------
PolicyRule              DFSRV18_Rule_Base_Web
{
  Direction             Both
  PolicyScope           Both
  SourcePortRange       80 80           # HTTP Default Port
  PolicyActionReference DFSRV18         # Service Category
}
PolicyRule              DFSRV18_Rule_Secure_Web
{
  Direction             Both
  PolicyScope           Both
  SourcePortRange       443 443         # HTTP Secure Port
  PolicyActionReference DFSRV18         # Service Category
}

#-----------------------------------------------------------------
# Rules for TELNET
#-----------------------------------------------------------------
PolicyRule              DFSRV12_Rule_Telnet
{
  Direction             Both
  PolicyScope           Both
  SourcePortRange       23 23           # TELNET Default Port
  PolicyActionReference DFSRV12         # Service Category
}
PolicyRule              DFSRV12_Rule_Telnet_Secure
{
  Direction             Both
  PolicyScope           Both
  SourcePortRange       523 523         # TELNET Default Port
  PolicyActionReference DFSRV12         # Service Category
}

#-----------------------------------------------------------------
# SNA Interactive
#-----------------------------------------------------------------
PolicyRule              DFSRV10_Rule_EE_Interactive
{
  Direction             Both
  PolicyScope           Both
  ProtocolNumber        UDP
  SourceAddressRange    9.67.156.1 9.67.156.1
  SourcePortRange       12002 12002
  ServiceReference      DFSRV10         # Service Category
}

Chapter 8. Implementing QoS in a z/OS and Cisco environment

239

The OutgoingTOS statement is used to set the various DSCP values for packets according to our policy criteria. The PolicyRule statements establish a means to distinguish the different traffic classes using a combination of IP addresses and port numbers. Each rule is tied back to the PolicyAction name with the ServiceReference statement. SNA traffic being transported using Enterprise Extender (EE) is assigned to specific ports according to the SNA class of service and is always sourced from the static virtual IP address. This provides a convenient means to classify and mark this traffic. Please refer to Table 4-4 on page 116. For example, the rule DFSRV10_Rule_EE_Interactive designates source IP address 9.67.156.1 and port 12002. This port is used for high-priority traffic using SNA COS #INTER.
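The OutgoingTOS bit masks are full 8-bit TOS bytes, while the DSCP occupies only the upper six bits. The following Python sketch shows the conversion for the four PolicyAction definitions in Example 8-1. The function name tos_to_dscp and the dictionary OUTGOING_TOS are illustrative names, not part of PAGENT.

```python
def tos_to_dscp(tos_bits: str) -> int:
    """Convert an 8-bit TOS mask such as '00101000' to its DSCP value.

    The DSCP is the upper six bits of the byte, so the two low-order
    bits are shifted away.
    """
    if len(tos_bits) != 8 or set(tos_bits) - {"0", "1"}:
        raise ValueError("expected a string of eight binary digits")
    return int(tos_bits, 2) >> 2


# OutgoingTOS masks as they appear in Example 8-1
OUTGOING_TOS = {
    "DFSRV10": "00101000",
    "DFSRV12": "00110000",
    "DFSRV18": "01001000",
    "DFSRV26": "01101000",
}

for action, mask in OUTGOING_TOS.items():
    print(action, mask, tos_to_dscp(mask))
```

Each action name carries its DSCP value as a suffix, which makes a mismatch between the name and the mask easy to spot.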

8.2.2 Cisco network configuration

The remaining configuration examples show exactly how the Cisco network uses QoS to meet the end-to-end requirements developed in our sample scenario. First, the classes are defined in each of the routers. The commands in Example 8-2 define the classes.

Example 8-2 Cisco router class definitions

class-map match-all VoIP-EF
 description Voice traffic Expedited Forwarding
 match access-group 121
class-map match-all BestEffort-AF4
 description Default
 match access-group 114
class-map match-all Platinum-AF1
 description SNA Interactive
 match ip dscp 10
class-map match-any Silver-AF2
 description HTTP
 match ip dscp 18
 match access-group 112
class-map match-any Bronze-AF3
 description FTP, SNA Batch
 match ip dscp 26
 match access-group 113
class-map match-any Gold-AF1
 description Telnet
 match ip dscp 12
 match access-group 111


Access control lists are coupled with match conditions to classify the traffic. Match conditions that match on DiffServ DSCP values will classify traffic that is marked at trusted points in the network. In our case, the DSCP is set for traffic coming from the sysplex by PAGENT on the host. For example, PAGENT will have marked SNA interactive packets associated with the customer service and order entry applications with a DSCP value of b’00101000’. The first 6 bits of the DS field contain this DSCP value, b’001010’ = decimal 10. This matches the class-map “Platinum-AF1”. Notice that whenever you want an OR condition, use the match-any keyword rather than the match-all keyword. The access control lists associated with the access groups above are shown in Example 8-3.

Example 8-3 Access control lists

access-list 111 permit tcp any any eq telnet
access-list 112 permit tcp any any eq www
access-list 113 permit tcp any any eq ftp
access-list 114 permit ip any any
access-list 121 permit udp any any range 16384 32768
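The interaction of class maps, DSCP matches, and access lists can be modeled in a short Python sketch. It is a simplification under stated assumptions: packets are reduced to protocol, destination port, and DSCP; classes are evaluated in the order in which they appear in the FRpolicy policy map, with the first match winning; and the names Packet, ACLS, CLASS_MAPS, and classify are hypothetical. It is not a model of how Cisco IOS implements classification internally.

```python
from collections import namedtuple

# A packet reduced to the fields the class maps examine (DSCP defaults to 0)
Packet = namedtuple("Packet", "protocol dst_port dscp", defaults=(0,))

# Access lists from Example 8-3 as protocol and destination-port tests
ACLS = {
    111: lambda p: p.protocol == "tcp" and p.dst_port == 23,   # telnet
    112: lambda p: p.protocol == "tcp" and p.dst_port == 80,   # www
    113: lambda p: p.protocol == "tcp" and p.dst_port == 21,   # ftp
    114: lambda p: True,                                       # ip any any
    121: lambda p: p.protocol == "udp" and 16384 <= p.dst_port <= 32768,
}


def dscp(value):
    """Build a condition equivalent to 'match ip dscp <value>'."""
    return lambda p: p.dscp == value


def acl(number):
    """Build a condition equivalent to 'match access-group <number>'."""
    return ACLS[number]


# (class name, combiner, conditions): all = match-all (AND), any = match-any (OR)
CLASS_MAPS = [
    ("VoIP-EF", all, [acl(121)]),
    ("Platinum-AF1", all, [dscp(10)]),
    ("Gold-AF1", any, [dscp(12), acl(111)]),
    ("Silver-AF2", any, [dscp(18), acl(112)]),
    ("Bronze-AF3", any, [dscp(26), acl(113)]),
    ("BestEffort-AF4", all, [acl(114)]),
]


def classify(packet):
    """Return the first class whose conditions accept the packet."""
    for name, combine, conditions in CLASS_MAPS:
        if combine(cond(packet) for cond in conditions):
            return name
    return "class-default"


print(classify(Packet("udp", 12002, dscp=10)))  # EE interactive marked by PAGENT
print(classify(Packet("tcp", 23)))              # unmarked Telnet, caught by ACL 111
```

The second call illustrates why match-any is used: an unmarked Telnet packet fails the DSCP test but is still placed in Gold-AF1 through the access list.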

Next, a policy is applied for each of the traffic classes using the policy-map configuration command as illustrated in Example 8-4.

Example 8-4 Policy mapping

policy-map FRpolicy
 class VoIP-EF
  priority 32
 class Platinum-AF1
  bandwidth percent 40
 class Gold-AF1
  bandwidth percent 30
 class Silver-AF2
  bandwidth percent 10
  random-detect
 class Bronze-AF3
  bandwidth percent 5
  random-detect
 class BestEffort-AF4
  bandwidth percent 5

The FRpolicy policy map, when applied to the frame relay interface, configures CBWFQ and determines the relative bandwidth settings for the traffic classes. However, it is important to make sure we consider the CIR associated with the PVCs that connect each site to the hub routers. By using traffic shaping, we can be assured that our data traffic does not accumulate within the frame relay network and is instead buffered at the router and injected according to our QoS definition.


Example 8-5 Traffic shaping policy

policy-map FRshape
 class class-default
  shape average 256000
  service-policy FRpolicy

The statement shape average 256000 indicates that Generic Traffic Shaping (GTS) is used and the average rate is limited to the committed information rate of 256000 bps. The FRshape policy map will be applied to the frame relay sub-interfaces and it will, in turn, specify the policy FRpolicy to apply the desired bandwidth percentages. Configured this way, it is, in effect, a nested or hierarchical policy definition. The policy, FRshape, is shown applied to the frame relay sub-interfaces in Example 8-6.

Example 8-6 Generic traffic shaping

interface Serial5/0
 description NIVT Interoperability Frame Relay Cloud
 mtu 2106
 bandwidth 1024
 no ip address
 encapsulation frame-relay IETF
 no ip mroute-cache
 fair-queue 64 64 0
!
interface Serial5/0.1 point-to-point
 ip address 211.1.2.13 255.255.255.252
 service-policy output FRshape
 frame-relay interface-dlci 16
!
interface Serial5/0.2 point-to-point
 ip address 211.1.2.17 255.255.255.252
 service-policy output FRshape
 frame-relay interface-dlci 17
!
interface Serial5/0.3 point-to-point
 bandwidth 256
 ip address 211.1.2.21 255.255.255.252
 service-policy output FRshape
 frame-relay interface-dlci 18

We used the above configuration during our lab testing and achieved results we desired in our particular setup. However, we recommend that you use a configuration that specifies Frame Relay Traffic Shaping (FRTS). A preferred sample configuration using FRTS is shown in Example 8-7.


Example 8-7 FRTS sample configuration

interface Serial5/0
 description NIVT Interoperability Frame Relay Cloud
 mtu 2106
 bandwidth 1024
 no ip address
 encapsulation frame-relay IETF
 no ip mroute-cache
 frame-relay traffic-shaping
!
interface Serial5/0.1 point-to-point
 ip address 211.1.2.13 255.255.255.252
 frame-relay interface-dlci 16
  class ts-class
!
interface Serial5/0.2 point-to-point
 ip address 211.1.2.17 255.255.255.252
 frame-relay interface-dlci 17
  class ts-class
!
interface Serial5/0.3 point-to-point
 bandwidth 256
 ip address 211.1.2.21 255.255.255.252
 frame-relay interface-dlci 18
  class ts-class
!
map-class frame-relay ts-class
 no frame-relay adaptive-shaping
 service-policy output FRshape
 frame-relay fragment 320

Example 8-7 also shows how to enable fragmentation using the frame-relay fragment command. In this example we specify a fragment size of 320. This means that each fragment except the last will contain 320 bytes of the original payload. Fragmentation reduces the delay and jitter that might affect sensitive applications such as voice over IP. The policy map shown in Example 8-8, SETDSCP, when applied to the network ingress points, classifies and confirms the proper DSCP settings by re-marking.

Example 8-8 Policy to set DSCP

policy-map SETDSCP
 class VoIP-EF
  set ip dscp 46
 class Platinum-AF1
  set ip dscp 10
 class Gold-AF1
  set ip dscp 12
 class Silver-AF2
  set ip dscp 18
 class Bronze-AF3
  set ip dscp 26

The SETDSCP policy is applied to the ingress interface using the service-policy command in Example 8-9. Note the keyword “input” is specified to define it as an input policy.

Example 8-9 Applying the input policy for classification

interface GigabitEthernet4/0
 ip address 9.67.157.136 255.255.255.240
 ip igmp join-group 224.0.1.2
 negotiation auto
 service-policy input SETDSCP

As mentioned earlier, in cases where class-based packet marking cannot be applied to the interface (on the 7507 CIP channel interface, for example), we rely on the host-based Policy Agent (PAGENT) to mark the packets. Close coordination of definitions between those applied by the systems programmer and the network administrator is required.

On the remote 3640 routers, SNA traffic is handled by Cisco SNA Switching Services (SNASw). SNASw, configured for Enterprise Extender (EE), transmits SNA traffic as UDP packets sourced from a loopback interface. Class-Based Packet Marking for router-generated traffic is not supported until Cisco IOS 12.2(4). Since we are running Cisco IOS 12.2(1a), we used the police command to ensure the marking of DSCP according to our chosen scheme. Additional ACLs identify traffic using the source loopback interface IP address and the COS-related IP precedence bits set by SNASw as shown in Example 8-10.

Example 8-10 Additional access control lists

access-list 115 permit udp host 9.67.157.5 any precedence flash-override
access-list 115 permit udp host 9.67.157.5 any precedence immediate
access-list 115 permit udp host 9.67.157.9 any precedence network
access-list 116 permit udp host 9.67.157.5 any precedence priority

These lists are added to the classification using the keyword match-any on the class-map as shown in Example 8-11.


Example 8-11 Adding the ACLs to classifications

class-map match-any Platinum-AF1
 description SNA Interactive
 match ip dscp 10
 match access-group 115
class-map match-any Bronze-AF3
 description FTP, SNA Batch
 match ip dscp 26
 match access-group 113
 match access-group 116

Note the use of the match-any condition rather than match-all. This makes the condition that of a logical OR rather than an AND for the class. The police command is then used to mark the packets accordingly, as indicated in Example 8-12. In our case, we were not concerned with policing the traffic, but only ensuring the packets were marked with the correct DSCP value.

Example 8-12 Using the police command

policy-map FRpolicy
 class VoIP-EF
  priority 32
 class Platinum-AF1
  bandwidth percent 40
  police 512000 5000 5000 conform-action set-dscp-transmit 10 exceed-action set-dscp-transmit 10 violate-action set-dscp-transmit 10
 class Gold-AF1
  bandwidth percent 30
 class Silver-AF2
  bandwidth percent 10
  random-detect
 class Bronze-AF3
  bandwidth percent 5
  random-detect
  police 62000 2000 2000 conform-action set-dscp-transmit 26 exceed-action set-dscp-transmit 26 violate-action set-dscp-transmit 26
 class BestEffort-AF4
  bandwidth percent 5


The effects of the policy can be verified using the show policy-map interface command (abbreviated to sh pol int in the example). The match on access-list 115 causes the classification of SNASw-originated traffic as “Platinum-AF1”, and the policy ensures that the UDP packets transmitted over the frame relay interface are marked with DSCP value 10. Note the packet count under the “Platinum-AF1” class and the conformed packet count and action under the “police” heading shown in Example 8-13.

Example 8-13 Show policy interface command

C3640N2#sh pol int
* Some output deleted *
 Serial1/1.2
  Service-policy output: FRshape
    Class-map: class-default (match-any)
      74465 packets, 6373664 bytes
      5 minute offered rate 11000 bps, drop rate 0 bps
      Match: any
      Traffic Shaping
        Target   Byte    Sustain    Excess     Interval  Increment  Adapt
        Rate     Limit   bits/int   bits/int   (ms)      (bytes)    Active
        256000   1984    7936       7936       31        992        -

        Queue    Packets   Bytes      Packets   Bytes     Shaping
        Depth                         Delayed   Delayed   Active
        0        69937     6010446    978       303375    no
* Some output deleted *
        Class-map: Platinum-AF1 (match-any)
          1014 packets, 179623 bytes
          5 minute offered rate 0 bps, drop rate 0 bps
          Match: ip dscp 10
            0 packets, 0 bytes
            5 minute rate 0 bps
          Match: access-group 115
            1014 packets, 179623 bytes
            5 minute rate 0 bps
          Weighted Fair Queuing
            Output Queue: Conversation 41
            Bandwidth 40 (%) Max Threshold 64 (packets)
            (pkts matched/bytes matched) 10/2145
            (depth/total drops/no-buffer drops) 0/0/0
          police:
            512000 bps, 5000 limit, 5000 extended limit
            conformed 1015 packets, 179722 bytes; action: set-dscp-transmit 10
            exceeded 0 packets, 0 bytes; action: set-dscp-transmit 10
            violated 0 packets, 0 bytes; action: set-dscp-transmit 10
            conformed 0 bps, exceed 0 bps violate 0 bps
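The traffic shaping figures reported for class-default in Example 8-13 are consistent with simple arithmetic on the target rate and the sustained and excess bits per interval. The Python sketch below reproduces them. The function name shaping_parameters and the formulas are our reading of the displayed values, offered as an assumption rather than as a statement of how IOS derives them.

```python
def shaping_parameters(rate_bps: int, sustain_bits: int, excess_bits: int) -> dict:
    """Derive the interval, increment, and byte limit from shaping settings.

    rate_bps      target rate (256000 in Example 8-13)
    sustain_bits  sustained bits per interval (7936)
    excess_bits   excess bits per interval (7936)
    """
    return {
        # time needed to send the sustained burst at the target rate
        "interval_ms": sustain_bits * 1000 // rate_bps,
        # bytes of credit added per interval
        "increment_bytes": sustain_bits // 8,
        # largest burst allowed, in bytes
        "byte_limit": (sustain_bits + excess_bits) // 8,
    }


params = shaping_parameters(256000, 7936, 7936)
print(params)
```

With the values from the example, the result matches the Interval (31 ms), Increment (992 bytes), and Byte Limit (1984) columns of the output.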

Class-based packet marking is being enhanced with Cisco IOS 12.2(4) to allow for marking of router-generated packets such as those originated from SNASw. This will facilitate class-based marking without the use of class-based policing as was used here.

8.3 QoS test results

In order to show the effects of QoS and the interaction between the host-based PAGENT and the QoS features configured in the network, a traffic generator tool was used to simulate the application data. We used a tool called Chariot to send data among various endpoints at the remote sites and the host. Specific port numbers were used to simulate traffic categories, which were marked using host-based PAGENT. The traffic profiles for each class were normalized so that the effects of QoS would be apparent. In other words, the same traffic profile was used for each traffic class. The effects of PAGENT and verification of the defined policies can be seen using the netstat slap command on the host as shown in Example 8-14.

Example 8-14 PAGENT verification

===> netstat slap
MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP           20:36:16
PolicyRuleName: DFSRV10_Rule_EE_Interactive
  FirstActTime:   20:15:55      LastMapTime:      20:17:24
  TotalBytesIn:   0000000000    TotalBytesOut:    0001409598
  BytesInDiscard: 0000000000    BytesOutDiscard:  0000000000
  TotalInPackets: 0000000000    TotalOutPackets:  0000002054
  ActConnMap:     0000000000    MaxConnLimit:     0000000000
  AcceptConn:     0000000000    DeniedConn:       0000000000
  OutBytesInProf: 0000000000    Status:           Active
PolicyRuleName: DFSRV26_Rule_EE_Batch
  FirstActTime:   20:15:55      LastMapTime:      20:17:24
  TotalBytesIn:   0000000000    TotalBytesOut:    0000883228
  BytesInDiscard: 0000000000    BytesOutDiscard:  0000000000
  TotalInPackets: 0000000000    TotalOutPackets:  0000000999
  ActConnMap:     0000000006    MaxConnLimit:     0000000000
  AcceptConn:     0000000075    DeniedConn:       0000000000
  OutBytesInProf: 0000000000    Status:           Active
PolicyRuleName: DFSRV12_Rule_Telnet_Secure
  FirstActTime:   20:15:55      LastMapTime:      00:00:00
  TotalBytesIn:   0000000000    TotalBytesOut:    0000000000
  BytesInDiscard: 0000000000    BytesOutDiscard:  0000000000
  TotalInPackets: 0000000000    TotalOutPackets:  0000000000
  ActConnMap:     0000000006    MaxConnLimit:     0000000000
  AcceptConn:     0000000615    DeniedConn:       0000000000
  OutBytesInProf: 0000000000    Status:           Active
PolicyRuleName: DFSRV12_Rule_Telnet
  FirstActTime:   20:15:55      LastMapTime:      20:35:40
  TotalBytesIn:   0000243516    TotalBytesOut:    0005321980
  BytesInDiscard: 0000001660    BytesOutDiscard:  0000000000
  TotalInPackets: 0000004898    TotalOutPackets:  0000005725
  ActConnMap:     0000000006    MaxConnLimit:     0000000000
  AcceptConn:     0000000615    DeniedConn:       0000000000
  OutBytesInProf: 0000000000    Status:           Active
PolicyRuleName: DFSRV18_Rule_Secure_Web
  FirstActTime:   20:15:55      LastMapTime:      00:00:00
  TotalBytesIn:   0000000000    TotalBytesOut:    0000000000
  BytesInDiscard: 0000000000    BytesOutDiscard:  0000000000
  TotalInPackets: 0000000000    TotalOutPackets:  0000000000
  ActConnMap:     0000000006    MaxConnLimit:     0000000000
  AcceptConn:     0000000619    DeniedConn:       0000000000
  OutBytesInProf: 0000000000    Status:           Active
PolicyRuleName: DFSRV18_Rule_Base_Web
  FirstActTime:   20:15:55      LastMapTime:      20:35:40
  TotalBytesIn:   0000278896    TotalBytesOut:    0002639496
  BytesInDiscard: 0000001220    BytesOutDiscard:  0000000000
  TotalInPackets: 0000003155    TotalOutPackets:  0000003311
  ActConnMap:     0000000006    MaxConnLimit:     0000000000
  AcceptConn:     0000000619    DeniedConn:       0000000000
  OutBytesInProf: 0000000000    Status:           Active
PolicyRuleName: DFSRV26_Rule_FTP
  FirstActTime:   20:15:55      LastMapTime:      20:36:10
  TotalBytesIn:   0000062636    TotalBytesOut:    0003879069
  BytesInDiscard: 0000001796    BytesOutDiscard:  0000000000
  TotalInPackets: 0000001900    TotalOutPackets:  0000002821
  ActConnMap:     0000000006    MaxConnLimit:     0000000000
  AcceptConn:     0000000075    DeniedConn:       0000000000
  OutBytesInProf: 0000000000    Status:           Active

Note in Example 8-14 the packet counts in each of the traffic classes we defined: EE_Interactive, EE_Batch, Telnet, Base_Web, and FTP. The PAGENT registered our traffic sent over our test network in the categories as we expected.


The following examples show the effects of QoS when applied to the data center serial interfaces. We first show response time for various traffic categories without QoS applied. To do so, QoS policies were removed from the configuration at the congestion point where outbound traffic enters the frame relay network, as depicted in Example 8-15. The serial link on the 7507 was shut down so that all traffic traveled over the link attached to the 7206 and the QoS effects would be seen clearly.

Example 8-15 QoS policy removed

int s5/0.1
 no service-policy output FRshape
int s5/0.2
 no service-policy output FRshape
int s5/0.3
 no service-policy output FRshape

The output from the show interface command on the router confirms the lack of queuing set on the interface. Note the Queuing strategy: fifo line in Example 8-16. This indicates that only best-effort, first-in-first-out queuing is in effect.

Example 8-16 Show interface command after QoS is disabled

C7200-Z55#sh in s5/0
Serial5/0 is up, line protocol is up
  Hardware is M4T
  Description: NIVT Interoperability Frame Relay Cloud
  Internet address is 9.67.156.145/28
  MTU 2106 bytes, BW 256 Kbit, DLY 20000 usec,
     reliability 255/255, txload 242/255, rxload 23/255
  Encapsulation FRAME-RELAY IETF, crc 16, loopback not set
  Keepalive set (5 sec)
  LMI enq sent 300, LMI stat recvd 300, LMI upd recvd 0, DTE LMI up
  LMI enq recvd 0, LMI stat sent 0, LMI upd sent 0
  LMI DLCI 0 LMI type is ANSI Annex D frame relay DTE
  FR SVC disabled, LAPF state down
  Broadcast queue 0/64, broadcasts sent/dropped 153/0, interface broadcasts 26
  Last input 00:00:02, output 00:00:00, output hang never
  Last clearing of "show interface" counters 00:25:01
  Queuing strategy: fifo
  Output queue 0/150, 459 drops; input queue 0/75, 0 drops
  30 second input rate 25000 bits/sec, 32 packets/sec
  30 second output rate 258000 bits/sec, 34 packets/sec
     14655 packets input, 1364931 bytes, 0 no buffer
     Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
     14917 packets output, 11302859 bytes, 0 underruns
     0 output errors, 0 collisions, 1 interface resets
     0 output buffer failures, 0 output buffers swapped out
     1 carrier transitions DCD=up DSR=up DTR=up RTS=up CTS=up

The graph in Figure 8-3 was generated by Chariot and shows response time for each traffic category.

Figure 8-3 Response time without QoS in the network

It can be seen from the chart that SNA traffic (top two lines) is almost starved out by the competing IP data. The FIFO queuing offers no benefit to any particular traffic category. This provides an example of what can happen in a network without any regard to traffic classification or prioritization. Compare these results with those following application of the QoS policies. The QoS policies were re-applied to the outbound congestion point as shown in Example 8-17.

Example 8-17 QoS policy applied

int s5/0.1
 service-policy output FRshape
int s5/0.2
 service-policy output FRshape
int s5/0.3
 service-policy output FRshape

The output from the show interface command on the router confirms application of the policy and the active queuing strategy as shown in Example 8-18.

Example 8-18 Show interface command after QoS is enabled

C7200-Z55#sh in s5/0
Serial5/0 is up, line protocol is up
  Hardware is M4T
  Description: NIVT Interoperability Frame Relay Cloud
  Internet address is 9.67.156.145/28
  MTU 2106 bytes, BW 256 Kbit, DLY 20000 usec,
     reliability 255/255, txload 251/255, rxload 14/255
  Encapsulation FRAME-RELAY IETF, crc 16, loopback not set
  Keepalive set (5 sec)
  LMI enq sent 626, LMI stat recvd 626, LMI upd recvd 0, DTE LMI up
  LMI enq recvd 0, LMI stat sent 0, LMI upd sent 0
  LMI DLCI 0 LMI type is ANSI Annex D frame relay DTE
  FR SVC disabled, LAPF state down
  Broadcast queue 0/64, broadcasts sent/dropped 321/0, interface broadcasts 55
  Last input 00:00:03, output 00:00:00, output hang never
  Last clearing of "show interface" counters 00:52:12
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 825
  Queuing strategy: weighted fair
  Output queue: 0/150/64/0 (size/max total/threshold/drops)
     Conversations 0/1/64 (active/max active/max total)
     Reserved Conversations 0/0 (allocated/max allocated)
     Available Bandwidth 192 kilobits/sec
  30 second input rate 15000 bits/sec, 27 packets/sec
  30 second output rate 252000 bits/sec, 26 packets/sec
     23501 packets input, 2028921 bytes, 0 no buffer
     Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
     23659 packets output, 16517826 bytes, 0 underruns
     0 output errors, 0 collisions, 2 interface resets
     0 output buffer failures, 0 output buffers swapped out
     2 carrier transitions DCD=up DSR=up DTR=up RTS=up CTS=up
C7200-Z55#

The graph in Figure 8-4 shows the response time for the various traffic categories with QoS applied in the network.


Figure 8-4 Response time with QoS enabled in the network

The effects of the QoS policies can be clearly seen in the results. Traffic classes incur response times corresponding to their relative bandwidth allocation. Response time for SNA interactive traffic, for example, averaged 1.57 seconds with QoS enabled, as opposed to 10.95 seconds without QoS enabled, as illustrated in Table 8-4.

Table 8-4 Results of QoS policies on response time

Class             Average response time    Average response time
                  with QoS (seconds)       without QoS (seconds)
FTP receive       22.83638                 4.9296
HTTP text         10.14838                 4.97382
SNA batch         53.279                   20.7397
SNA interactive   1.57708                  10.9572
Telnet            1.75396                  4.35252
Other             47.17735                 4.58654
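To compare the two columns of Table 8-4 more directly, the ratio of the response time without QoS to the response time with QoS can be computed per class. The following Python sketch does this using the measured values from the table. The names RESPONSE_TIMES, speedup, and improved are illustrative only.

```python
# (with QoS, without QoS) average response times in seconds, from Table 8-4
RESPONSE_TIMES = {
    "FTP receive": (22.83638, 4.9296),
    "HTTP text": (10.14838, 4.97382),
    "SNA batch": (53.279, 20.7397),
    "SNA interactive": (1.57708, 10.9572),
    "Telnet": (1.75396, 4.35252),
    "Other": (47.17735, 4.58654),
}


def speedup(with_qos: float, without_qos: float) -> float:
    """Ratio above 1 means the class responds faster with QoS enabled."""
    return without_qos / with_qos


# Classes whose response time dropped once the policies were applied
improved = [name for name, times in RESPONSE_TIMES.items() if speedup(*times) > 1]

for name, times in RESPONSE_TIMES.items():
    print(f"{name:16s} {speedup(*times):6.2f}x")
print("Improved with QoS:", improved)
```

The classes with the largest bandwidth shares (SNA interactive in Platinum, Telnet in Gold) are the ones that improve, while the classes with small shares pay for that improvement with longer response times.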

8.3.1 Summary

This scenario illustrates how to create and use the QoS Policy Agent on the host in combination with the QoS features available in Cisco IOS to achieve service policies relative to business criteria. While the information was presented using a scenario in our lab environment rather than an actual enterprise customer deployment, the concepts and procedures apply to many enterprises and service providers that attempt to deploy end-to-end Quality of Service.


Chapter 9. Load distribution with MNLB and Sysplex Distributor

This chapter introduces a load distribution solution using Communications Server for z/OS Sysplex Distributor in cooperation with Cisco MultiNode Load Balancing (MNLB) functions in routers. The environment, the load distribution process, and the data flow are different from the solutions described in 5.2, “IBM Sysplex Distributor” on page 126, and 5.4, “Cisco MultiNode Load Balancing (MNLB)” on page 135, because two of the required main tasks are now done by software in the IBM Sysplex Distributor and in Cisco routers/switches operating cooperatively, rather than in the Sysplex Distributor or in the MNLB environment only. These main tasks are:
• Determination of the optimal application server
• Forwarding IP packets from clients to the selected application server

The following sections describe:
• Functions used for a TCP connection and distribution process
• The advantages of the combined IBM Sysplex Distributor/Cisco MNLB solution
• The IBM Sysplex Distributor/Cisco MNLB configuration used in our tests
• The IBM Sysplex Distributor Service Manager implementation
• Cisco Forwarding Agent definitions
• The data flow between the IBM Sysplex Distributor Service Manager and Cisco Forwarding Agent
• Displays used to control the definitions
• The IBM Sysplex Distributor backup and recovery tasks and definitions
• The Cisco Forwarding Agent definitions and backup considerations
• The Generic Routing Encapsulation (GRE) Protocol, and why it is needed

© Copyright IBM Corp. 2002

9.1 Connection distribution for a sysplex

The MNLB/Sysplex Distributor distribution provides an attractive solution when clients need high-speed access to TCP/IP services such as:
• Hypertext Transfer Protocol (HTTP) for Web services
• 3270 Telnet Server (TN3270 or TN3270E)
• File Transfer Protocol (FTP)

Multiple homogeneous application servers should be organized in a server cluster within a sysplex. The IBM z/OS Sysplex Distributor uses information provided periodically by the z/OS Workload Manager (WLM) to keep track of the current optimal server within the sysplex. This information is used later to distribute TCP connection requests to the “best” server.

The Sysplex Distributor propagates the server cluster address for specific application services to the network. Clients use this cluster address rather than directly addressing the desired target server. This cluster address is an IP address known by the name server. From the view of the client, it looks as if the Sysplex Distributor runs the application. However, the Sysplex Distributor only advises the router where to forward the packets for a specific connection. The real server is located on another LPAR/system in the sysplex.

When the client sends the connection request to the network, it arrives at the Cisco MNLB function, the Forwarding Agent. The Forwarding Agent is a function located in a Cisco router such as the Cisco 7505. Since the destination IP address for the application server points to the IBM Sysplex Distributor, the Forwarding Agent first sends the connection request to the Sysplex Distributor. The Sysplex Distributor searches for the “best” target server and sends the packet via the Cross Coupling Facility (XCF) link to the target system.

The target server processes the connection request, which is indicated by the SYN bit in the TCP header coming from the client. The target server responds to the client with a SYN and an ACK. The ACK is the acknowledgment for the synchronization request from the client to the server. The second SYN flows in the direction of the client to establish the full-duplex TCP connection. The response packet from the server to the client flows directly to the network without traversing the XCF link to the Sysplex Distributor. The route used is calculated by the routing daemon. In our test environment, we used OMPROUTE within the sysplex and Cisco’s EIGRP within the router-controlled network.

Up to this point, the packet flow is the same as in the solution provided by OS/390 CS V2.10. The subsequent packet flow, however, is completely different. The client responds to the SYN previously sent from the application server with an ACK. Since the client has no destination IP address other than that of the Sysplex Distributor, it believes that the desired application server runs in the stack of the Sysplex Distributor. Therefore it continues sending the ACK packet and the following packets to the Sysplex Distributor, which acts as the representative of the cluster IP address for the application server. From now on, however, when the client’s packets arrive at the Cisco router before entering the sysplex, the router should already know the real IP address of the target server. This enables direct routing to the target server without touching the Sysplex Distributor.

In order to provide the router or switch with knowledge about a shorter path to the target systems, the Sysplex Distributor has to transfer the routing information to the router. Remember, this routing information was based on the z/OS WLM’s workload management information. In a pure Cisco MNLB configuration, this routing information is provided by the Cisco Service Manager, which runs in a Cisco LocalDirector unit, installed externally from the IBM sysplex. The Cisco Service Manager communicates with a function, called the Cisco Forwarding Agent, implemented in the Cisco router. A special communication protocol is used to exchange routing information on accessible application servers in target systems. This protocol is called Cisco Appliance Services Architecture (CASA).

The z/OS CS V1.2 Sysplex Distributor now has the Service Manager function also. This enables the Sysplex Distributor to propagate to the Cisco Forwarding Agent routing information for a specific TCP connection. Thus the Cisco Forwarding Agent is able to directly send all packets of an existing connection to the real target server.
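The division of labor described above can be illustrated with a small Python sketch. It is a conceptual model only, under several assumptions: the class names ForwardingAgent and SysplexDistributor, their methods, and the WLM weights are hypothetical, the connection is identified by a simple tuple, and nothing here represents the actual CASA message formats.

```python
class ForwardingAgent:
    """Router-side function: forwards client packets for the cluster address."""

    def __init__(self, distributor: str):
        self.distributor = distributor
        self.affinities = {}  # connection tuple -> target stack

    def learn(self, connection, target: str) -> None:
        """Record the target that the Service Manager chose for a connection."""
        self.affinities[connection] = target

    def next_hop(self, connection) -> str:
        """Known connections go straight to the target; others to the distributor."""
        return self.affinities.get(connection, self.distributor)


class SysplexDistributor:
    """Host-side Service Manager: picks a target and informs the agent."""

    def __init__(self, agent: ForwardingAgent, wlm_weights: dict):
        self.agent = agent
        self.wlm_weights = wlm_weights  # hypothetical WLM recommendations

    def handle_syn(self, connection) -> str:
        target = max(self.wlm_weights, key=self.wlm_weights.get)
        self.agent.learn(connection, target)
        return target


agent = ForwardingAgent(distributor="MVS001")
distributor = SysplexDistributor(agent, {"MVS062": 3, "MVS154": 5})

# (client address, client port, cluster address, server port); values are examples
conn = ("10.1.1.1", 3001, "9.67.157.17", 23)

first_hop = agent.next_hop(conn)       # the SYN has no affinity yet
chosen = distributor.handle_syn(conn)  # distributor selects the target
later_hop = agent.next_hop(conn)       # the ACK and later packets bypass the distributor
print(first_hop, chosen, later_hop)
```

The sketch shows the key point of the combined solution: only the first packet of a connection visits the Sysplex Distributor, and all later inbound packets are forwarded by the router directly to the selected target.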

9.2 Advantages of the solution

Compared with the Sysplex Distributor on OS/390, the IBM z/OS Sysplex Distributor now works as a Service Manager for the Cisco Forwarding Agent in a MultiNode Load Balancing (MNLB) environment.

There are some significant advantages to implementing the Sysplex Distributor with the Service Manager function in an MNLB environment. For example:
• The inbound path for all TCP data to the target server now goes directly from the Forwarding Agent to the target server.
  – The inbound data path used in OS/390 V2.10 always went via the Sysplex Distributor and XCF link to the target server.
  – In some cases the OS/390 Sysplex Distributor could become a bottleneck for high-speed access used by Web services, because of the traffic load on links to the Sysplex Distributor, the CPU load needed for each IP packet on the IP layer, and the traffic load on the XCF links.
• Now a switch forwards client-to-server traffic at wire speed directly to the selected target server. This means:
  – Reduced CPU load for the Sysplex Distributor.
  – Less contention for the Cross Coupling Facility (XCF) resources.
• In a pure MNLB configuration, a separate Cisco LocalDirector unit (and a backup LocalDirector) containing the Cisco Service Manager was used to make the connection distribution. Now, when the IBM Service Manager in the Sysplex Distributor is used, the LocalDirector is no longer needed. A backup Sysplex Distributor would take over all functions of the primary Sysplex Distributor in case of a system failure.
• A separate Cisco OS/390 Workload Agent, running in each LPAR of all TCP/IP application server target stacks, is no longer required. Also the Cisco Dynamic Feedback Protocol (DFP) is obsolete and thus traffic load from the Workload Agent to the Forwarding Agent is avoided.
• The Service Manager in the Sysplex Distributor provides load balancing decisions based on Quality of Service (QoS) from the Policy Agent (PAGENT) definitions in the z/OS system, in addition to the WLM information.

9.3 IP addresses used during our tests

Our test configuration consists of a sysplex with four logical partitions (LPARs) named as follows:

• MVS001, which is the IBM Sysplex Distributor and the Service Manager for the Cisco MNLB configuration.
• MVS069, which is the IBM Backup Sysplex Distributor for MVS001.
• MVS062, which can be regarded as a data host running TCP/IP application servers on one TCP/IP stack. For our tests, we used mainly the TN3270E server to access TSO, a Web server, and the FTP server.


Networking with z/OS and Cisco Routers: An Interoperability Guide

• MVS154, which is a data host like MVS062. TCP connection requests are distributed to this host or to MVS062 by the Sysplex Distributor/Service Manager based on z/OS Workload Manager (WLM) information.

Each of the four LPARs runs one TCP/IP stack. The LPARs are connected to each other via the Cross Coupling Facility (XCF). All LPARs have connections to two Cisco routers controlling the network via:

• A shared OSA-Express Gigabit Ethernet (GbE) adapter.
• An ESCON director to a Cisco router, using the Common Link Access to Workstation (CLAW) protocol or the Multipath Channel Plus (MPC+) protocol.
  – The CLAW connection is to the Cisco router 7206VXR.
  – The MPC connection is to the Cisco 7507 router running Cisco MPC+ (CMPC+).


[Figure 9-1: network diagram. The four LPARs of the IBM z/OS sysplex are joined by XCF (subnet 9.67.156.72/29) and share one OSA-Express Gigabit Ethernet adapter (subnet 9.67.157.128/28, VLAN 400). An ESCON director connects them to the Cisco 7206VXR via CLAW (subnet 9.67.156.64/29) and to the Cisco 7507 via CMPC+ (subnet 9.67.156.16/29). GRE tunnels run from both routers. VIPAs shown per system:
MVS001 (Sysplex Distributor, Service Manager): static VIPA 9.67.156.1/30; dynamic VIPAs 9.67.156.25/29, 9.67.156.26/29; distributed VIPAs 9.67.157.17/29, 9.67.157.18/29
MVS062: static VIPA 9.67.156.161/30; dynamic VIPAs 9.67.156.33/29, 9.67.156.34/29
MVS069 (Backup Sysplex Distributor, Service Manager): static VIPA 9.67.156.5/30; dynamic VIPAs 9.67.156.49/29, 9.67.156.50/29
MVS154: static VIPA 9.67.156.165/30; dynamic VIPAs 9.67.156.41/29, 9.67.156.42/29]

Figure 9-1 IBM z/OS Sysplex and Cisco MNLB network


XCF interface

Dynamic XCF links connect the four LPARs via the Cross Coupling Facility (XCF). The subnet used is 9.67.156.72 with mask 255.255.255.248. The TCPIP.PROFILE definition for the system MVS001 for this link is:

IPCONFIG DYNAMICXCF 9.67.156.73 255.255.255.248 2

The definition for MVS062 is:

IPCONFIG DYNAMICXCF 9.67.156.74 255.255.255.248 2

The definitions for the remaining systems MVS069 and MVS154 are:

IPCONFIG DYNAMICXCF 9.67.156.76 255.255.255.248 2
IPCONFIG DYNAMICXCF 9.67.156.75 255.255.255.248 2

Note: The format of the DYNAMICXCF statement shows the IP address of the XCF link, a subnet mask and a metric value for the link. This metric value will be used only for the Routing Information Protocol (RIP) in BSDROUTINGPARMS statements of the ROUTED daemon. Since we used Open Shortest Path First (OSPF) as the routing protocol for the sysplex with OMPROUTE as the router daemon, all definitions for the metric of each interface have been made in the OMPROUTE configuration file. The metric values defined in the DYNAMICXCF statement are meaningless.
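As a quick sanity check of the addressing in the DYNAMICXCF statements, the following Python sketch confirms that each XCF address is a usable host address in the 9.67.156.72 subnet with mask 255.255.255.248. It uses only the standard `ipaddress` module; the function name `check_xcf_addresses` and the dictionary layout are illustrative assumptions, not part of any z/OS tooling.

```python
import ipaddress

# Values copied from the DYNAMICXCF statements of the four test systems.
XCF_SUBNET = ipaddress.ip_network("9.67.156.72/255.255.255.248")
XCF_ADDRESSES = {
    "MVS001": "9.67.156.73",
    "MVS062": "9.67.156.74",
    "MVS154": "9.67.156.75",
    "MVS069": "9.67.156.76",
}


def check_xcf_addresses(subnet, addresses):
    """Return the systems whose address is not a usable host in the subnet."""
    usable = set(subnet.hosts())  # excludes the network and broadcast address
    return [name for name, addr in addresses.items()
            if ipaddress.ip_address(addr) not in usable]


if __name__ == "__main__":
    print("subnet:", XCF_SUBNET.with_prefixlen)
    print("systems outside the subnet:",
          check_xcf_addresses(XCF_SUBNET, XCF_ADDRESSES))
```

A mask of 255.255.255.248 leaves six usable host addresses (.73 to .78), so the four stacks fit with two addresses to spare.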

When the system MVS001 has XCF links to all other systems, the HOME list on MVS001 would look like the following screen:


=> netstat home
MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP
Home address list:
Address          Link             Flg
-------          ----             ---
9.67.156.1       VLINK0           P
9.67.157.129     GIGELINK
9.67.156.66      CISCO1
9.67.156.17      CISCO2
9.67.157.42      TOLINUX
9.67.156.2       TOVTAM
9.67.156.25      VIPL09439C19
9.67.156.26      VIPL09439C1A
9.67.156.73      EZAXCFN7
9.67.156.73      EZAXCFN5
9.67.156.73      EZAXCFN6
127.0.0.1        LOOPBACK

Three HOME list statements with equal IP addresses but with different link names show the existing XCF links to the other systems. These HOME statements are created dynamically when the partner TCP/IP stack with its XCF interface is started. Also the DEVICE, LINK, and START statements are created automatically. An example of the dynamic XCF DEVICE and LINK definitions is shown in the following screen.


DevName: N07N           DevType: MPC        DevNum: 0000
  DevStatus: Ready
  LnkName: EZAXCFN7     LnkType: MPC        LnkStatus: Ready
    NetNum: 0   QueSize: 0   BytesIn: 41536   BytesOut: 40822
  BSD Routing Parameters:
    MTU Size: 04472         Metric: 200
    DestAddr: 0.0.0.0       SubnetMask: 255.255.255.248
  Multicast Specific:
    Multicast Capability: Yes
    Group          RefCnt
    -----          ------
    224.0.0.5      0000000001
    224.0.0.1      0000000001

The screen shows the definitions for the link to system N7. In the LINK name, N7 is taken from the &SYSCLONE value of the partner system, MVS069. The DEVICE name is taken from the VTAM CP node name. See the z/OS IPL flow in the next screen.

290  IEA007I STATIC SYSTEM SYMBOL VALUES 016
090    &SYSALVL.  = "1"
090    &SYSCLONE. = "N7"
090    &SYSNAME.  = "MVS069"
090    &SYSPLEX.  = "LOCAL"
090    &SYSR1.    = "OS120"
090    &CONSOLID. = "MVS069"
.....
090    &DOMAIN.   = "N07NV"
……… VTAM-Start…
090  IST093I N07N ACTIVE


Static VIPA IP address

Normally, static VIPA addresses are used as a device-independent destination endpoint. Whenever an interface of a device, such as an Ethernet or token-ring adapter, fails, a routing protocol finds an alternate path to the static VIPA address.

In our configuration, we used the static VIPA address (for example, 9.67.156.1 with subnet mask 255.255.255.252) as the tunnel endpoint for a tunnel between the Cisco routers 7507 and 7206VXR and the OSA-Express adapter. Tunnels are required only when the OSA-Express adapter is shared by multiple stacks. In our case, one OSA-Express adapter is connected to all four LPARs (MVS001, MVS062, MVS069, and MVS154) and is defined in shared mode. The tunnel carries Generic Routing Encapsulation (GRE) packets over the Gigabit Ethernet (GbE) to the OSA-Express adapter. Further detailed information is found in 9.10, “Generic Routing Encapsulation (GRE) protocol” on page 325.

The static VIPA defined in system MVS001 is actually used for Enterprise Extender. We used it for the tunnel, too. See the following definitions:

Example 9-1 Static VIPA definitions
DEVICE VIPA03 VIRTUAL 0
LINK VLINK0 VIRTUAL 0 VIPA03
HOME 9.67.156.1 VLINK0

OSA-Express adapter interfaces

One OSA-Express GbE adapter running QDIO shares its port with the four LPARs MVS001, MVS069, MVS062, and MVS154. The IP addresses of all stacks belong to the netid 9.67.157.128 with subnet mask 255.255.255.240. The TCPIP.PROFILE definitions for MVS001 look like this:

Example 9-2 OSA-Express adapter definitions for primary router
DEVICE GIGE2F00 MPCIPA PRIR
LINK GIGELINK IPAQGNET GIGE2F00
START GIGE2F00

For MVS001, the OSA adapter interface is defined as the primary router (see parameter PRIR in the DEVICE statement). This enables the OSA adapter to forward to this TCP/IP stack any packets whose destination IP address is not in the OSA Address Table (OAT). The OAT learns all IP addresses when the devices are started. These are the IP addresses defined in the HOME list either statically (such as the Ethernet or token-ring interfaces) or automatically (such as the dynamic XCF address).

The primary router definition “PRIR” for the OSA adapter is enabled for the TCP/IP stack of the Sysplex Distributor. The OSA adapter interface for the TCP/IP stack of the backup Sysplex Distributor is therefore defined as the secondary router. This means that, if the TCP/IP stack on MVS001 fails, the OSA adapter interface for the backup Sysplex Distributor takes over the function of receiving all IP packets, whether or not their IP addresses are known in the OAT. The OSA-Express GbE definitions for MVS069 look as follows.

Example 9-3 OSA-Express adapter definitions for the secondary router
DEVICE GIGE2F00 MPCIPA SECR
LINK GIGELINK IPAQGNET GIGE2F00
START GIGE2F00

The interfaces of the remaining LPARs are defined as non-router “NONR”. These are target systems that do not route or forward IP packets. They are the endpoints of the TCP connections and run only the real application servers. Apart from the NONR parameter, the OSA device definitions are exactly the same. See the following sample.

Example 9-4 OSA-Express Adapter definitions for the target systems
DEVICE GIGE2F00 MPCIPA NONR
LINK GIGELINK IPAQGNET GIGE2F00
START GIGE2F00

The VTAM Transport Resource List Element (TRLE) defines the channel path for the OSA-Express adapter. See the following definitions.

Example 9-5 VTAM TRLE entries for OSA-Express adapter
N04GIG1  TRLE  LNCTL=MPC,
               READ=(2F14),
               WRITE=(2F15),
               DATAPATH=(2F16,2F17),
               PORTNAME=(GIGE2F00,0)


Common Link Access to Workstation (CLAW) Protocol

This interface provides channel attachment to a Cisco router 7206 through an ESCON director. All of the LPARs have a channel path to the Cisco router. If the OSA-Express fails, this might be an alternate path between the sysplex and the IP network. The following TCPIP.PROFILE definitions are used.

Example 9-6 TCPIP.PROFILE definitions for CLAW device
DEVICE CIP1A CLAW D30 MVS001B C7507A PACKED 15 15 32768 32768
LINK CISCO1 IP 0 CIP1A
START CIP1A

Sub-channel addresses D30 and D31 are used for the receive and the send path from the MVS001 to the Cisco 7206 router. The host name is MVS001B. The workstation name is C7202A. The channel program uses a packed mode, which means that more than 4 KB buffer sizes for the channel-read or channel-write commands may be used. In packed mode, the size may be 32 KB or 60 KB. In our case we used 32768 bytes for the read and write buffer sizes, using 15 read buffers and 15 write buffers.

Multipath Channel (MPC) Protocol

In addition to CLAW, there is also an MPC channel attachment for all LPARs to a Cisco router. This attachment is to Cisco router 7507. The following TCPIP.PROFILE definitions show the attachment.

Example 9-7 TCPIP.PROFILE definitions for MPC device
DEVICE N04CMPC MPCPTP
LINK CISCO2 MPCPTP N04CMPC
START N04CMPC

The device name N04CMPC has to match the VTAM TRLE name. The VTAM statement carries the information about the channel addresses, as follows.

Example 9-8 VTAM TRLE definition for MPC device
N04CMPC  TRLE  LNCTL=MPC,MAXBFRU=255,REPLYTO=25.5,MAXREADS=8,
               STORAGE=DS,MPCLEVEL=HPDT,
               READ=(D20),WRITE=(D21)

Dynamic VIPA addresses

There are some dynamic VIPA addresses defined in our test machine which we did not use. An example for MVS001 is as follows.

Example 9-9 Dynamic VIPA definitions
VIPADYNAMIC
  VIPADEFINE 255.255.255.248 9.67.156.25 9.67.156.26
  VIPABACKUP 1 9.67.156.33 9.67.156.34
  VIPABACKUP 1 9.67.156.41 9.67.156.42
  VIPABACKUP 100 9.67.156.49 9.67.156.50
ENDVIPADYNAMIC

Dynamic Distributed VIPAs

There are also additional definitions for the distributed dynamic VIPAs (DVIPAs), which belong to the same VIPADYNAMIC/ENDVIPADYNAMIC block. They are defined in the TCPIP.PROFILE of the Sysplex Distributor only. The backup Sysplex Distributor will take over these distributed DVIPAs if the primary Sysplex Distributor fails. You may view the definitions for the backup Sysplex Distributor in 9.9, “Sysplex Distributor backup” on page 311. The TCPIP.PROFILE definitions for the target stacks with their real application servers may be viewed in 9.6.2, “Basic TCPIP.PROFILE definitions” on page 283.

The Sysplex Distributor uses the following DVIPAs for the distribution to the target machines MVS062 and MVS154. These are the definitions the Sysplex Distributor used in CS for OS/390 V2R10. The definitions for the Sysplex Distributor with Service Manager are shown in Example 9-11.

Example 9-10 Dynamic distributed VIPA definitions with no Service Manager
VIPADEFINE MOVEABLE IMMED 255.255.255.248 9.67.157.17
VIPADIST 9.67.157.17 PORT 80 443 23 523
         DESTIP 9.67.156.74 9.67.156.75

VIPADEFINE MOVEABLE IMMED 255.255.255.248 9.67.157.18
VIPADIST 9.67.157.18 PORT 20 21
         DESTIP 9.67.156.73 9.67.156.74

Three statements are used to define distributed dynamic VIPAs, also called distributed DVIPAs:


1. VIPADEFINE is used to specify that this IP address may be taken over by another TCP/IP stack. This might occur when a backup Sysplex Distributor takes over the DVIPA because the primary Sysplex Distributor fails. In this case a VIPABACKUP statement has to be defined in the backup Sysplex Distributor’s TCPIP.PROFILE. See an example in 9.9, “Sysplex Distributor backup” on page 311.

2. VIPADIST is used to specify which DVIPA is going to be distributed to TCP/IP target stacks. This statement does not define to which stack it is distributed. It defines which server ports are associated with this DVIPA. In our sample, the distributed DVIPA 9.67.157.17 is the IP address for Web services on ports 80 and 443. Port 80 is the non-secure port that is used for the HTTP protocol. Port 443 is the secure port that is used for the HTTPS protocol. On the same IP address, TN3270E connections are opened on port 23, and secure TN3270 connections on port 523. The distributed DVIPA 9.67.157.18 is defined for File Transfer Protocol (FTP) connections on port 20 for the data transmission and on port 21 for the control connection.

3. DESTIP is used to specify to which TCP/IP stack in another LPAR/system within a sysplex the DVIPA defined under VIPADIST may be distributed. The IP address of the TCP/IP stack is the dynamic XCF address. Whenever the TCP/IP stack in the target system is active, the XCF link to the Sysplex Distributor’s XCF IP address is used to exchange information between the stacks, telling them to implement and activate a distributed DVIPA on the target stack. In our sample, TCP connection requests for Web services and TN3270 services are distributed to the TCP/IP stacks with XCF addresses 9.67.156.74 (MVS062) and 9.67.156.75 (MVS154). The FTP services are distributed to 9.67.156.73 (MVS001) and to 9.67.156.74 (MVS062).

The following screen shows the dynamically created HOME statements on MVS062.


=> netstat home
MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP
Home address list:
Address          Link             Flg
-------          ----             ---
9.67.156.161     VLINK0           P
9.67.157.130     GIGELINK
9.67.156.69      CISCO1
9.67.156.18      CISCO2
9.67.157.243     CISCO3
9.67.156.162     TOVTAM
9.67.156.74      EZAXCFN6
9.67.156.33      VIPL09439C21
9.67.156.34      VIPL09439C22
9.67.156.74      EZAXCFN7
9.67.156.74      EZAXCFN4
9.67.157.17      VIPL09439D11     I
9.67.157.18      VIPL09439D12     I
127.0.0.1        LOOPBACK
***

The screen shows the two inserted (flag I) distributed DVIPAs, 9.67.157.17 and 9.67.157.18, along with special link names that are constructed from the hexadecimal values of each distributed DVIPA.
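The link names in the HOME lists appear to follow a simple rule: the prefix VIPL followed by the four bytes of the IPv4 address written as eight hexadecimal digits. The Python sketch below reproduces the names seen in the screens under that assumption. The rule is inferred from the displayed output rather than taken from a product specification, and the function name `dvipa_link_name` is hypothetical.

```python
import ipaddress


def dvipa_link_name(address):
    """Build 'VIPL' plus the IPv4 address as eight uppercase hex digits.

    Assumption: the naming rule is inferred from the NETSTAT HOME screens.
    """
    packed = ipaddress.IPv4Address(address).packed  # the four raw address bytes
    return "VIPL" + packed.hex().upper()


if __name__ == "__main__":
    for dvipa in ("9.67.157.17", "9.67.157.18", "9.67.156.25"):
        print(dvipa, "->", dvipa_link_name(dvipa))
```

For 9.67.157.17 the bytes are 09, 43, 9D, and 11 in hexadecimal, which gives VIPL09439D11, the name shown for that DVIPA on MVS062.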

9.4 Data flow: Service Manager and Forwarding Agent

The data flow between the Service Manager in the IBM Sysplex Distributor and the Forwarding Agent in the Cisco router consists of two main types of IP packets that are important for the Forwarding Agent. These are:

• Fixed affinity
• Wildcard affinity

Before we continue to describe the data flow, let’s first explain the meaning of the two affinity types.

Affinity

The term affinity describes the action of associating or coupling one thing with another.


In this context, information about the IP and TCP header is collected to build a unique identifier in order to distinguish and associate IP packets to a certain TCP connection. This identifier is used by Forwarding Agents to differentiate the IP packets from incoming TCP connection requests and from packets belonging to existing TCP connections.

Fixed affinity

A fixed affinity is one that matches specific information within a TCP connection identifier. This identifier is defined by its unique 5-tuple that spans the packet headers. These are:

• For IP header:
  – Protocol type (TCP only = value 6)
  – Source IP address
  – Destination IP address
• For TCP header:
  – Source port
  – Destination port

This composed information is used by the Forwarding Agent to differentiate incoming IP packets, and to associate the packets with a certain action. This may be, for example, forwarding the packet to a predetermined target system dictated for this specific TCP connection. In addition, a fixed affinity contains two other important pieces of information:

• The forward IP address, which names the real target server for the specific TCP connection. The Sysplex Distributor provides the forward IP address, which is the dynamic XCF link address of the TCP/IP stack of the application server.
• A time-to-live (TTL) value specifying how long the fixed affinity entry should be kept in the Forwarding Agent’s cache. This value is provided by the Sysplex Distributor. The value is 15 minutes.

Wildcard affinity

A wildcard affinity is a 5-tuple piece of information as well. Compared with fixed affinity, it has different values in the source IP address and in the source port.

• The source IP address contains a value of 0.0.0.0
• The source port contains a value of 0000

This allows the Forwarding Agent to accept all incoming packets from all IP addresses and ports for further processing.


Wildcard affinity is mainly used to accept TCP connection requests with the SYN bit set in the TCP header. Such a request is forwarded by the Forwarding Agent to the Service Manager for load-balanced distribution to one of the target systems running the application server. Wildcard affinity is created by the Service Manager and sent to the Forwarding Agent. It also contains a forwarding IP address, which is the dynamic XCF link IP address of the Sysplex Distributor.
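The two affinity types can be pictured as one record type whose source fields are either specific (fixed) or zero (wildcard). The following Python sketch models this under stated assumptions: the class and field names are illustrative and are not CASA message fields, and the forward address chosen for the fixed entry (9.67.156.74) is only an example target.

```python
from dataclasses import dataclass
from typing import Optional

TCP = 6  # IP protocol number for TCP, as stated in the text


@dataclass(frozen=True)
class Affinity:
    """One affinity entry; names are illustrative, not CASA field names."""
    protocol: int
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    forward_ip: str                    # dynamic XCF address to forward to
    ttl_seconds: Optional[int] = None  # in this sketch only fixed entries set it

    @property
    def is_wildcard(self) -> bool:
        # Wildcards carry 0.0.0.0 and port 0 in the source fields.
        return self.src_ip == "0.0.0.0" and self.src_port == 0

    def matches(self, protocol, src_ip, src_port, dst_ip, dst_port) -> bool:
        # Protocol and destination must always agree.
        if (protocol, dst_ip, dst_port) != (self.protocol, self.dst_ip, self.dst_port):
            return False
        # A wildcard accepts any source; a fixed entry needs the exact source.
        return self.is_wildcard or (src_ip, src_port) == (self.src_ip, self.src_port)


# Wildcard for the TN3270 cluster address, forwarding to the Sysplex Distributor.
wildcard = Affinity(TCP, "0.0.0.0", 0, "9.67.157.17", 23, forward_ip="9.67.156.73")
# Fixed entry for one client connection with the 15-minute TTL from the text.
fixed = Affinity(TCP, "9.67.156.104", 4970, "9.67.157.17", 23,
                 forward_ip="9.67.156.74", ttl_seconds=15 * 60)
```

The sketch ignores the source and destination masks that the router displays, and treats every wildcard as matching all sources.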

9.4.1 Wildcard affinity and processing

Wildcard affinity is used by the Forwarding Agent to know which IP packets have to be sent to the Service Manager. Therefore it has to know which IP address is used as the cluster address for a specific application service, and the port for this application.

When a Forwarding Agent receives an IP packet, it looks in the IP header for the cluster address, which is the destination IP address, and for the protocol value 6, which indicates TCP. It also checks the destination port in the TCP header for the requested application, for example 23 as the telnet port. The affinity also contains a source IP address and a source port. In a wildcard definition these two values are always set to zero. This means the Forwarding Agent accepts TCP packets from any source. Only the destination entries and the protocol are important.

The wildcard affinity information is provided by the Service Manager in multicast packets at startup of the Sysplex Distributor stack and periodically thereafter. The following screen, obtained from one of two Forwarding Agents, shows the wildcard information sent as multicast from the Sysplex Distributor that we used during our tests.

NIVT7507#show ip casa wildcard
Source Address  Source Mask  Port  Dest Address  Dest Mask        Port
0.0.0.0         0.0.0.0      0     9.67.157.17   255.255.255.255  523
0.0.0.0         0.0.0.0      0     9.67.157.17   255.255.255.255  443
0.0.0.0         0.0.0.0      0     9.67.157.17   255.255.255.255  80
0.0.0.0         0.0.0.0      0     9.67.157.17   255.255.255.255  23
0.0.0.0         0.0.0.0      0     9.67.157.18   255.255.255.255  21
0.0.0.0         0.0.0.0      0     9.67.157.18   255.255.255.255  20
NIVT7507#

The screen sample shows the information used by the Forwarding Agents:

• They accept IP packets from any client to two destination IP addresses (9.67.157.17 and 9.67.157.18). These IP addresses were defined previously in the TCPIP.PROFILE of MVS001 under VIPADYNAMIC, VIPADEFINE, VIPADIST, DESTIP, and ENDVIPADYNAMIC. See “Dynamic Distributed VIPAs” on page 267.
• They accept packets with TCP connection requests where the SYN bit in the TCP header is set on for:
  – Port 23 as the normal TN3270 connection
  – Port 523 for a secure TN3270 connection
  – Port 80 for HTTP to Web services
  – Port 443 for HTTPS secure access to Web services
  – Port 21 for an FTP server control connection
  – Port 20 for an FTP server data connection

When the Forwarding Agent receives a TCP connection request from a client, it checks the destination IP address, protocol and port number. Only if the destination IP address, port number and TCP protocol match the wildcard affinity entry and the SYN bit is on will this request be encapsulated in a CASA packet and sent to the Service Manager, in this case as a unicast packet. Further information may be obtained from “CASA information in the Forwarding Agent” on page 297.

9.4.2 Service Manager processes TCP connection request

The incoming CASA request for the TCP connection is processed by the Sysplex Distributor as follows:

• Unpacks the CASA packet.
• Checks the IP destination address, which is the cluster address of an application service.
• Selects the “best” target server, from a group defined under VIPADIST, PORT, and DESTIP, based on load-balancing information and rules.
• Forwards the initial IP packet to the target server via the XCF link.
• Sends a unicast packet, called fixed affinity, to the Forwarding Agent that sent the CASA packet with the SYN request. This unicast message contains the information about the forwarding IP address of the real target server for this particular TCP connection. The forwarding IP address is the XCF link IP address of the target stack.
• The XCF link address will be used by the Forwarding Agent to build a fixed affinity for the specific TCP connection. See “Fixed affinity” on page 270.
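The selection step in the list above can be sketched in Python as a lookup followed by a choice among the eligible targets. This is only an illustration under loud assumptions: the `weights` argument is a hypothetical stand-in for the WLM and QoS information the Sysplex Distributor really uses, the "highest weight wins" rule is invented for the sketch, and `select_target` is not a product function. The table contents are the VIPADIST, PORT, and DESTIP values from Example 9-10.

```python
# Distribution table taken from the VIPADIST/PORT/DESTIP values in Example 9-10.
DISTRIBUTION = {
    "9.67.157.17": {"ports": {80, 443, 23, 523},
                    "targets": ["9.67.156.74", "9.67.156.75"]},
    "9.67.157.18": {"ports": {20, 21},
                    "targets": ["9.67.156.73", "9.67.156.74"]},
}


def select_target(dst_ip, dst_port, weights, distribution=DISTRIBUTION):
    """Return the XCF address of the chosen target stack, or None.

    weights maps an XCF address to a number; higher means more capacity.
    This is a hypothetical stand-in for WLM/QoS input, not the real algorithm.
    """
    entry = distribution.get(dst_ip)
    if entry is None or dst_port not in entry["ports"]:
        return None  # not a distributed cluster address and port
    # Pick the eligible target with the highest weight (first one wins ties).
    return max(entry["targets"], key=lambda xcf: weights.get(xcf, 0))


if __name__ == "__main__":
    sample_weights = {"9.67.156.74": 3, "9.67.156.75": 5}  # invented values
    forward_ip = select_target("9.67.157.17", 23, sample_weights)
    print("fixed affinity forward address:", forward_ip)
```

The returned address is what the fixed affinity would carry as its forwarding IP address for the new connection.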


9.4.3 Continuation of the TCP connection establishment process

The connection establishment process continues between the application server and the client. The application server sends a SYN, ACK to the client. The client then returns an ACK to the application server. Since the Forwarding Agent sees all incoming packets, it is informed about the connection establishment process and updates the state of the connection.

9.4.4 Fixed affinity processing

When the TCP connection is established, the Forwarding Agent knows all information of a specific TCP connection. All TCP connections are kept in the Forwarding Agent’s table for fixed affinities. A sample of a fixed affinity table is shown in the following output taken from a Cisco router’s Forwarding Agent.

NIVT7507#show ip casa affinities
Source Address  Port  Dest Address  Port  Prot
9.67.156.104    4970  9.67.157.17   23    TCP
9.67.156.104    4971  9.67.157.18   21    TCP
NIVT7507#

A table entry for a fixed affinity is valid for one connection only. It is defined by its unique 5-tuple information taken from the IP header and TCP header. Only IP packets with matching 5-tuples will be recognized by the Forwarding Agent as packets belonging to an existing TCP connection. These packets will be forwarded directly to the real application server without traversing the path via the Sysplex Distributor and the XCF link. Therefore, the fixed affinity entry also must contain a forward IP address for the real application server. This IP address will be provided by the Sysplex Distributor via a unicast fixed affinity-update message. The forwarding IP address is the dynamic XCF link address for the target stack. In the screen above, the forwarding IP address is not shown. It is detailed information that is obtained using the command show ip casa aff det. For more information, see “CASA information in the Forwarding Agent” on page 297.


9.4.5 Prerequisites for the CASA protocol exchange

The communication between the Service Manager and the Forwarding Agents is done using the Cisco Appliance Services Architecture (CASA) protocol. In order to use CASA multicast packets, the following definitions are required:

• In the Sysplex Distributor:
  – Define the Sysplex Distributor as the Service Manager.
  – Determine the multicast IP address, which is a class D address, such as 224.0.1.2. This address was used in our tests, as recommended in Cisco’s documentation. This is a destination IP address to reach all Forwarding Agents.
  – Determine the CASA port. This is the destination port to reach the CASA protocol in all Forwarding Agents. We used the Cisco default recommendation port number 1637.
• In the Forwarding Agents:
  – Define the multicast IP address used by the Service Manager to address the Forwarding Agent. In our configuration we used 224.0.1.2 as the multicast address.
  – Define port 1637 as the listening port in the Forwarding Agent.

You will find the required definitions for the Sysplex Distributor in Example 9-11 on page 281 and for the Forwarding Agent in 9.7, “Forwarding Agent definitions” on page 283.


9.4.6 Message flow of wildcard and fixed affinities, SYN, ACK, data

[Figure 9-2: diagram of the Sysplex Distributor (Service Manager) on MVS001 (XCF 9.67.156.73, OSA-Express 9.67.157.129), the target stacks MVS062 (XCF 9.67.156.74) and MVS154 (XCF 9.67.156.75), and the backup Sysplex Distributor on MVS069 (XCF 9.67.156.76), attached through a switch to two router Forwarding Agents, each holding an affinity table of wildcard and fixed affinities. Numbered arrows: 1 multicast wildcard affinities, 2 SYN packet, 3 Interest Match with SYN packet, 4 forwarding, 5 SYN + ACK connection flow, 6 fixed affinity for connection. The client logs in with TN3270 on port 23; the cached IP address is 9.67.157.17.]

Figure 9-2 Message flow of wildcard and fixed affinities, SYN, ACK, and data

The following is an overview of the message flow needed to prepare the Forwarding Agent to accept client TCP connection requests and IP packets, and to tell the Forwarding Agent where to send the received IP packets, as shown in Figure 9-2:

1. The Sysplex Distributor configured as Service Manager multicasts a wildcard affinity update specifying that all new connection requests and existing connection packets, for which there is no fixed affinity entry available, should be sent to the Sysplex Distributor stack.
2. A connection request, represented as a SYN packet, is received by the Forwarding Agent.
3. The Forwarding Agent encapsulates the SYN packet in an Interest Match CASA packet and forwards it to the Service Manager in the Sysplex Distributor stack.
4. The Sysplex Distributor unpacks the SYN packet, makes the routing decision, and forwards the SYN packet to the selected application server. The IP address for the real target, running the application server, is the dynamic XCF link IP address.
5. The target server returns a SYN, ACK directly to the client without touching the Sysplex Distributor stack.
6. The Service Manager in the Sysplex Distributor sends a fixed affinity for this particular connection back to the Forwarding Agent that forwarded the SYN packet. The fixed affinity instructs the Forwarding Agent to send data for this connection via the most efficient route to the application server, identified by the dynamic XCF IP address of the target stack.
7. The subsequent data flow carrying the ACK from the client and data exchanged during the connection are not shown here. These packets travel the direct path from the Forwarding Agent to the target system known by the dynamic XCF link IP address, and no longer touch the Sysplex Distributor.

9.4.7 Message flow for connection data with no fixed affinity

This situation might occur when a Forwarding Agent other than the one that received the prior SYN request receives data. This might happen if the Forwarding Agent that has the fixed affinity for a particular connection fails. It also might happen when the path from the client to the application server target stack has changed and uses another Forwarding Agent. This second Forwarding Agent does not possess a fixed affinity yet. It only has a wildcard affinity whose forwarding address, the dynamic XCF link IP address, points to the Sysplex Distributor. If the Forwarding Agent had no wildcard affinity, it would not be able to communicate with the Service Manager in the Sysplex Distributor’s stack.

The following flow description indicates how this case is managed:

1. The Sysplex Distributor configured as Service Manager multicasts a wildcard affinity update specifying that all new connection requests and existing connection packets, for which there is no fixed affinity entry available, should be sent to the Sysplex Distributor stack.
2. An IP packet for an existing TCP connection now arrives at a Forwarding Agent, but the Forwarding Agent does not have a fixed affinity entry for this particular TCP connection. This Forwarding Agent also did not receive the SYN request for this TCP connection. It only has wildcard affinities.
3. If a matching fixed affinity is not found, the Forwarding Agent compares the packet against the wildcard affinities.
   – If there is an entry for the destination IP address and the destination port in the wildcard affinity table, the Service Manager is known for the particular TCP connection. The destination IP address should be the cluster address that the Sysplex Distributor previously multicast to all Forwarding Agents. The destination port should be an application service associated with the cluster address, also multicast by the Sysplex Distributor. A matching wildcard affinity causes the Forwarding Agent to create a CASA packet.
   – If there is no matching wildcard affinity entry (which should not normally occur, because the Service Manager multicasts wildcard affinities every 30 seconds to all Forwarding Agents), the IP packet is returned to the router. The router sends the packet to the IP address named in the IP header. Since this is the cluster address, which is the distributed DVIPA address, the routing mechanism delivers the packet to the Sysplex Distributor stack or, if this stack is down, to the backup Sysplex Distributor.
4. When a matching wildcard affinity is found, the Forwarding Agent encapsulates the IP packet as an IP-only CASA message type, and sends it to the known forwarding IP address, which is the dynamic XCF link address of the Service Manager in the Sysplex Distributor stack.
5. The Sysplex Distributor unpacks this IP-only CASA packet and checks the DVIPA connection routing table. If the 5-tuple information matches an existing TCP connection, it forwards the IP packet to the application server running the current connection. The forwarding IP address for the target running the application server is taken from the DVIPA connection routing table. It is the dynamic XCF link IP address of the target stack.
6. The Service Manager in the Sysplex Distributor returns a fixed affinity update to the Forwarding Agent that previously sent the CASA IP-only packet.
7. The Forwarding Agent updates its fixed affinity table. For subsequent packets of the connection it did not know before, it now has the information to forward IP packets directly to the correct target server without the assistance of the Service Manager.
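The decision order described in this flow (fixed affinity first, then wildcard affinity, then ordinary routing) can be sketched in Python as follows. The table layouts, the action labels, and the function names `dispatch` and `learn_fixed_affinity` are assumptions made for illustration. The sketch does not reproduce CASA message formats, TTL expiry, or the router's actual implementation, and it treats every packet as TCP.

```python
def dispatch(packet, fixed, wildcards):
    """Decide what a Forwarding Agent does with one TCP packet.

    packet    -- (src_ip, src_port, dst_ip, dst_port)
    fixed     -- dict mapping such tuples to a target XCF address
    wildcards -- dict mapping (dst_ip, dst_port) to the Service Manager address
    Returns an (action, address) pair; the action labels are illustrative.
    """
    if packet in fixed:
        # Known connection: forward straight to the target stack.
        return ("forward", fixed[packet])
    service_manager = wildcards.get(packet[2:])
    if service_manager is not None:
        # Unknown connection for a cluster address: encapsulate in CASA.
        return ("casa", service_manager)
    # No affinity at all: hand the packet back to normal IP routing.
    return ("route", packet[2])


def learn_fixed_affinity(fixed, packet, forward_ip):
    """Record the fixed affinity update returned by the Service Manager."""
    fixed[packet] = forward_ip


if __name__ == "__main__":
    fixed_table = {}
    wildcard_table = {("9.67.157.17", 23): "9.67.156.73"}
    pkt = ("9.67.156.104", 4970, "9.67.157.17", 23)
    print(dispatch(pkt, fixed_table, wildcard_table))   # first packet seen here
    learn_fixed_affinity(fixed_table, pkt, "9.67.156.74")
    print(dispatch(pkt, fixed_table, wildcard_table))   # later packets
```

Once the fixed affinity is learned, later packets of the same connection bypass the Service Manager, which is the behavior step 7 describes.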

Chapter 9. Load distribution with MNLB and Sysplex Distributor


9.4.8 Message flow for closing a TCP connection

The client closes a TCP connection by sending a packet with the FIN bit set in the TCP header. Viewed at the TCP level only, closing a connection is similar to establishing one. The client's FIN is sent to the application server. The server is notified that the connection will be closed. The server acknowledges the FIN with an ACK, and also sends a FIN of its own to finish the full-duplex TCP connection. The data path from the client to the application server, however, differs completely from the connection establishment path, because the Sysplex Distributor does not see the packets exchanged between the client and the server.


Networking with z/OS and Cisco Routers: An Interoperability Guide

[Figure 9-3 shows the Sysplex Distributor (Service Manager) on MVS001 with XCF address 9.67.156.73, the target stacks and the backup Sysplex Distributor with their XCF, DVIPA, and OSA Express addresses, the switch, the two router Forwarding Agents with their fixed affinity tables, and the TN3270 client. The numbered arrows 1 to 7 (FIN, FIN/ACK, ACK, VCRT update, and the multicast of fixed affinities) correspond to the steps listed below.]

Figure 9-3 Message flow for shutting down a TCP connection

The following is the message flow needed to close a TCP connection, with the activities of the client, the Forwarding Agent, the application server stack, and the Sysplex Distributor:

1. The client starts closing its TCP connection, for example when the user types quit in the application program. This creates an IP packet with the FIN bit set in the TCP header. The FIN packet is sent towards the application server via the network. The Forwarding Agent receives this packet and, using the fixed affinity entry for this particular TCP connection, forwards it to the target server identified by the forwarding IP address. This forwarding IP address is the dynamic XCF link address of the target system. The path from the Forwarding Agent to the target system goes directly upstream without touching the Sysplex Distributor.

2. The TCP/IP stack of the target system receives the FIN request, which signals that the client wants to close the existing TCP connection. The target stack acknowledges the received FIN with an ACK, and also sends a FIN to the client to close the second half of the full-duplex connection.

3. The client responds to the second FIN with an ACK, again using the direct path to the application server's stack.

4. The application server's TCP/IP stack informs the Sysplex Distributor about the closed TCP connection via the dynamic XCF link.

5. The Sysplex Distributor updates its cache of the dynamic VIPA connection routing table (VCRT) by deleting the entry for this connection. It recognizes the TCP connection by comparing the 5-tuple information. Remember, the 5-tuple consists of the following information:
   – Protocol used, which is always TCP (protocol number 6 in the IP header)
   – Source IP address
   – Destination IP address
   – Source port
   – Destination port

6. The Sysplex Distributor sends a multicast CASA message to all Forwarding Agents using the IP address 224.0.1.2 and port 1637 as the destination, and its own dynamic XCF link IP address and port 1637 as the source. This message is an Affinity-Delete type message with a time-to-live (TTL) value of 0 (zero). The TTL of 0 causes all Forwarding Agents to delete existing entries for the particular TCP connection.

7. When the Forwarding Agents receive the multicast packet, they delete the fixed affinity for this TCP connection from their caches.

The process of closing a TCP connection is now finished. The Sysplex Distributor no longer shows the previous TCP connection in its VIPA connection routing table. The show ip casa affinities command, issued at the Forwarding Agent, may still display the previously existing TCP connection for a couple of seconds, but the entry soon disappears.
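Steps 5 to 7 amount to removing one 5-tuple key from several tables. The following sketch models that cleanup. It assumes, purely for illustration, that the VCRT and each Forwarding Agent cache are dictionaries keyed by the 5-tuple; the function names are hypothetical and no real CASA message is built.

```python
from typing import Dict, List, Tuple

# (protocol number, source IP, destination IP, source port, destination port)
FiveTuple = Tuple[int, str, str, int, int]
AffinityTable = Dict[FiveTuple, str]  # 5-tuple -> XCF address of the target stack


def apply_affinity_message(cache: AffinityTable, key: FiveTuple,
                           dispatch_addr: str, ttl: int) -> None:
    """Apply one affinity message to a Forwarding Agent cache.

    A TTL of 0 is treated as a delete request, as described in step 6.
    """
    if ttl == 0:
        cache.pop(key, None)
    else:
        cache[key] = dispatch_addr


def close_connection(vcrt: AffinityTable, agents: List[AffinityTable],
                     key: FiveTuple) -> None:
    """Model steps 5 to 7: delete the VCRT entry, then 'multicast' a delete."""
    vcrt.pop(key, None)
    for cache in agents:  # every Forwarding Agent receives the multicast
        apply_affinity_message(cache, key, "", 0)


conn = (6, "9.67.156.104", "9.67.157.17", 3879, 23)
vcrt = {conn: "9.67.156.74"}
agent_7507 = {conn: "9.67.156.74"}
agent_7206 = {}  # this router never saw the connection
close_connection(vcrt, [agent_7507, agent_7206], conn)
print(vcrt, agent_7507, agent_7206)
```

Deleting with `pop(key, None)` keeps the operation harmless for a Forwarding Agent that never held the affinity, which matches the fact that the delete is multicast to all agents.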

9.5 Service Manager implementation

As described in 9.4.5, “Prerequisites for the CASA protocol exchange” on page 274, the Service Manager has to be defined:

• For the desired distributed DVIPAs
• Enabling the multicast support with:
   – Multicast address
   – Service Manager port

These definitions have to be done in the TCPIP.PROFILE.

9.5.1 Service Manager new TCPIP.PROFILE definitions

In order to switch on Service Manager functions in the TCP/IP stack, in addition to Sysplex Distributor functions, several statements have to be defined between the VIPADYNAMIC and ENDVIPADYNAMIC statements. The VIPADEFINE statement received a new parameter, called SERVICEMGR. This allows the Service Manager to propagate the associated IP cluster address to MNLB Forwarding Agents for further distribution by Cisco routers. Packets for a TCP connection with this specific destination cluster IP address for an application are now distributed by an MNLB Forwarding Agent according to the distribution information of the Sysplex Distributor’s Service Manager.

Example 9-11 VIPADEFINE ... SERVICEMGR
   VIPADEFINE MOVEABLE IMMED SERVICEMGR 255.255.255.248 9.67.157.17
   VIPADIST 9.67.157.17 PORT 80 443 23 523
            DESTIP 9.67.156.74 9.67.156.75

In our sample the Service Manager is switched on for the DVIPA 9.67.157.17. A TCP connection with this cluster IP address will be distributed by the Forwarding Agent to the real target server 9.67.156.74 or 9.67.156.75, depending on the decision the Service Manager made when processing the SYN request. The parameter SERVICEMGR is needed on each VIPADEFINE statement whose DVIPA address is to be distributed using VIPADIST and DESTIP. The new parameter SERVICEMGR is valid only if VIPADIST and DESTIP statements follow. If no VIPADIST and DESTIP are defined for the DVIPA, SERVICEMGR has no effect.

An additional statement, the VIPASMPARMS statement, has to be added. It tells the Service Manager which multicast IP address and port to use when sending out CASA messages.

Example 9-12 VIPASMPARMS
   VIPASMPARMS SMMCAST 224.0.1.2 SMPORT 1637


When multicasting wildcard affinities, the Service Manager uses the VIPASMPARMS parameters SMMCAST and SMPORT to address the Forwarding Agents. A special authentication password may be defined with the parameter SMSPASSword. This restricts communication to Service Managers and Forwarding Agents with matching values; the authentication is based on MD5 (message digest 5). The sample used is based on Cisco router default values. The same multicast IP address and port have to be defined in the Forwarding Agents.

9.6 TCP/IP stack of the target systems

The TCPIP.PROFILE definitions for applications in the target stacks are as usual. There are no changes.

9.6.1 TCPIP.PROFILE definitions

The following are the definitions for the applications we used in our tests. The definitions are for system MVS062.

Example 9-13 TCPIP.PROFILE definitions for a target stack
   ;******************************************************************
   ; PROCS TO AUTOSTART - PROCS RESIDE IN A66AAA.NCPSVT.PROCLIB
   ;******************************************************************
   ;
   AUTOLOG
     FTPMVS                          ; OE FTP Server
     TNOEA                           ; OE TELNET Server
     OSNMPD                          ; SNMP AGENT
     OMPROUTE                        ; OE RouteD Server
     WEBSDB2                         ; Web Server
     HODSERV5                        ; HOD Admin Server
   ENDAUTOLOG
   ;
   ;******************************************************************
   ; WELL KNOWN PORTS - Based on RFCs
   ;******************************************************************
   ;
   PORT
      20 TCP OMVS NOAUTOLOG          ; FTP server data port
      21 TCP OMVS                    ; FTP server control port
      23 TCP INTCLIEN                ; TELNET Server - Base
     523 TCP INTCLIEN                ; TELNET Server - Secure
     623 TCP OMVS                    ; Telnet Server
     161 UDP OSNMPD                  ; SNMP AGENT
     162 UDP OMVS                    ; SNMP CLIENT
     520 TCP OMPROUTE NOAUTOLOG      ; OE RouteD Server
     520 UDP OMPROUTE NOAUTOLOG      ; OE RouteD Server
      80 TCP OMVS                    ; Web Server
     443 TCP OMVS                    ; Web Server
   ;

9.6.2 Basic TCPIP.PROFILE definitions

The following IPCONFIG parameters have to be defined for the Sysplex Distributor, in addition to the DYNAMICXCF parameter already mentioned:

DATAGRAMFWD
   Enables rerouting IP packets to another TCP/IP stack.

IGNOREREDIRECT
   Enabled automatically when OMPROUTE is used.

VARSUBNETTING
   Allows variable subnet masks to be used.

MULTIPATH
   Enables multipath selection for outbound traffic.

PATHMTUDISCOVERY
   Dynamically discovers the path MTU, that is, the smallest maximum transmission unit of any hop on the way to the destination.

SYSPLEXROUTING
   Enables the TCP/IP stack to communicate with the WLM.

Other IPCONFIG parameters have to be considered, depending on the existing configuration.

9.7 Forwarding Agent definitions

The Forwarding Agent accepts multicast messages sent by the Service Manager with destination IP address 224.0.1.2, and listens for messages on port 1637, which is reserved for CASA UDP messages. These two definitions have to match those of the Service Manager. See 9.5.1, “Service Manager new TCPIP.PROFILE definitions” on page 281. The CASA protocol for the Forwarding Agents is defined in the Cisco routers. Our network used two Forwarding Agents, implemented in a Cisco 7507 router and a Cisco 7206VXR router.

9.7.1 CASA definitions for Cisco 7507

There are only three important lines:

   ip multicast-routing
   ip casa 1.1.1.1 224.0.1.2
    forwarding-agent 1637

These lines indicate:

• Multicast routing is enabled.
• CASA is enabled using a device-independent IP address. In our implementation, we used 1.1.1.1 as the unicast IP address. The IP address for receiving multicast packets is 224.0.1.2.
• The Forwarding Agent listens on port 1637 for packets destined to it and uses the same port number as the source port. CASA uses only UDP packets on port 1637.

9.7.2 CASA definitions for Cisco router 7206VXR

There are similar definitions for the Cisco router 7206VXR.

   ip multicast-routing
   ip casa 1.1.1.2 224.0.1.2
    forwarding-agent 1637

In order to be addressed by unicast packets, this router needs a device-independent IP address of its own. In our implementation we used 1.1.1.2 for this router. Again, it receives multicast packets with the IP address 224.0.1.2, and the listening port is 1637, which is also used as the source port.
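Because the multicast address and port must agree on both sides, and each router needs its own control address, a small consistency check can catch mismatches before testing. The sketch below is a hypothetical helper, not an IBM or Cisco tool; it assumes the statements are available as plain text in exactly the form shown in this chapter.

```python
from typing import Iterable, Tuple


def parse_vipasmparms(statement: str) -> Tuple[str, int]:
    """Extract (multicast address, port) from a VIPASMPARMS statement."""
    tokens = statement.split()
    upper = [t.upper() for t in tokens]
    mcast = tokens[upper.index("SMMCAST") + 1]
    port = int(tokens[upper.index("SMPORT") + 1])
    return mcast, port


def parse_casa_config(lines: Iterable[str]) -> Tuple[str, str, int]:
    """Extract (control address, multicast address, port) from router lines."""
    control = mcast = None
    port = None
    for line in lines:
        words = line.split()
        if words[:2] == ["ip", "casa"] and len(words) >= 4:
            control, mcast = words[2], words[3]
        elif words[:1] == ["forwarding-agent"] and len(words) >= 2:
            port = int(words[1])
    if control is None or mcast is None or port is None:
        raise ValueError("incomplete CASA configuration")
    return control, mcast, port


service_manager = parse_vipasmparms("VIPASMPARMS SMMCAST 224.0.1.2 SMPORT 1637")
routers = {
    "7507": parse_casa_config(
        ["ip multicast-routing", "ip casa 1.1.1.1 224.0.1.2", " forwarding-agent 1637"]),
    "7206VXR": parse_casa_config(
        ["ip multicast-routing", "ip casa 1.1.1.2 224.0.1.2", " forwarding-agent 1637"]),
}

# Every router must use the Service Manager's multicast address and port ...
all_match = all((m, p) == service_manager for _, m, p in routers.values())
# ... and every router must have its own control address.
controls_unique = len({c for c, _, _ in routers.values()}) == len(routers)
print(all_match, controls_unique)
```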

9.8 Operations: control and displays

A variety of displays help verify that the Service Manager definitions are correct and applied as desired. They are explained in the following sections. During our tests, we checked the requested information using the following displays and trace extracts. The displays are grouped as:

• CASA information in the Sysplex Distributor
• CASA information in the Cisco router
• Integrated CASA information for the Sysplex Distributor and Cisco router using the sample of closing a TCP connection


9.8.1 CASA information in the Sysplex Distributor

Multicast address and packet distribution flow

The defined multicast IP address and port may be checked by displaying the DVIPA configuration using the following commands:

   netstat vipadyn
   netstat vipadcfg

The display of the distributed VIPA port table shows which cluster addresses, with their associated ports, the Sysplex Distributor distributes to available target hosts. This display is obtained using the command netstat vdpt. The VIPA connection routing table provides information about currently distributed TCP connections. You get this information using the command netstat vcrt. In addition, we checked the data flow of the multicast messages and the distribution flow through an IPCS trace. Excerpts of the trace follow.

Dynamic VIPA configuration, part 1

   => netstat vipadyn
   MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP           18:00:42
   IP Address       AddressMask      Status  Origination  DistStat
   ----------       -----------      ------  -----------  --------
   9.67.156.25      255.255.255.248  Active  VIPADefine
   9.67.156.26      255.255.255.248  Active  VIPADefine
   9.67.156.33      255.255.255.248  Backup  VIPABackup
   9.67.156.34      255.255.255.248  Backup  VIPABackup
   9.67.156.41      255.255.255.248  Backup  VIPABackup
   9.67.156.42      255.255.255.248  Backup  VIPABackup
   9.67.156.49      255.255.255.248  Backup  VIPABackup
   9.67.156.50      255.255.255.248  Backup  VIPABackup
   9.67.157.17      255.255.255.248  Active  VIPADefine   Dist
   9.67.157.18      255.255.255.248  Active  VIPADefine   Dist/Dest
   ***

The last two entries contain the cluster IP addresses that may be distributed. To determine to which available target systems TCP connection requests may be distributed, use the netstat vdpt command. The current VIPA configuration, including the Service Manager definitions for the multicast IP address and the port number, is shown by the command netstat vipadcfg (see the following screen output).

The multicast address and the port number have to match the definitions in the Cisco routers running the Forwarding Agent. See 9.5.1, “Service Manager new TCPIP.PROFILE definitions” on page 281.

Dynamic VIPA configuration, part 2

   => netstat vipadcfg
   MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP           18:06:48
   Dynamic VIPA Information:
     VIPA Backup:
       IP Address       Rank
       ----------       ----
       9.67.156.33      000001
       9.67.156.34      000001
       9.67.156.41      000001
       9.67.156.42      000001
       9.67.156.49      000100
       9.67.156.50      000100
     VIPA Define:
       IP Address       AddressMask      Moveable   SrvMgr
       ----------       -----------      --------   ------
       9.67.156.25      255.255.255.248  Immediate  No
       9.67.156.26      255.255.255.248  Immediate  No
       9.67.157.17      255.255.255.248  Immediate  Yes
       9.67.157.18      255.255.255.248  Immediate  Yes
     VIPA Distribute:
       IP Address       Port   XCF Address
       ----------       ----   -----------
       9.67.157.17      00023  9.67.156.75
       9.67.157.17      00023  9.67.156.74
       9.67.157.17      00080  9.67.156.75
       9.67.157.17      00080  9.67.156.74
       9.67.157.17      00443  9.67.156.75
       9.67.157.17      00443  9.67.156.74
       9.67.157.17      00523  9.67.156.75
       9.67.157.17      00523  9.67.156.74
       9.67.157.18      00020  9.67.156.74
       9.67.157.18      00020  9.67.156.73
       9.67.157.18      00021  9.67.156.74
       9.67.157.18      00021  9.67.156.73
     VIPA Service Manager:
       McastGroup: 224.0.1.2    Port: 01637    Pwd: No

VIPA distribution port table

The VIPA distribution port table, obtained using the netstat vdpt command, lists the cluster IP addresses and ports for which the Sysplex Distributor is responsible. Each cluster address is defined in the TCPIP.PROFILE using the statements VIPADEFINE, VIPADIST, and DESTIP.

   ==> netstat vdpt
   MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP           14:50:09
   Dynamic VIPA Distribution Port Table:
   Dest IPaddr      DPort  DestXCF Addr     Rdy  TotalConn   WLM
   -----------      -----  ------------     ---  ---------   ---
   9.67.157.17      00023  9.67.156.74      001  0000000000  01
   9.67.157.17      00023  9.67.156.75      001  0000000000  01
   9.67.157.17      00080  9.67.156.74      001  0000000000  01
   9.67.157.17      00080  9.67.156.75      001  0000000000  01
   9.67.157.17      00443  9.67.156.74      001  0000000000  01
   9.67.157.17      00443  9.67.156.75      000  0000000000  01
   9.67.157.17      00523  9.67.156.74      001  0000000000  01
   9.67.157.17      00523  9.67.156.75      001  0000000000  01
   9.67.157.18      00020  9.67.156.73      000  0000000001  01
   9.67.157.18      00020  9.67.156.74      000  0000000000  01
   9.67.157.18      00021  9.67.156.73      001  0000000001  01
   9.67.157.18      00021  9.67.156.74      001  0000000000  01

Rdy
   Counts the number of applications ready to receive connection requests.

TotalConn
   Lists the total number of connections already routed to the stack identified by the XCF link IP address.

WLM
   Is the WLM weight of the target stack. The lowest value is the first to be used for distribution.

Detailed information about the VIPA distribution port table may be obtained using the command netstat vdpt det.


   ===> netstat vdpt detail
   MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP           20:17:50
   Dynamic VIPA Distribution Port Table:
   Dest IPaddr      DPort  DestXCF Addr     Rdy  TotalConn   WLM
   -----------      -----  ------------     ---  ---------   ---
   9.67.157.17      00023  9.67.156.74      001  0000000000  01
     QosPlcAct: *DEFAULT*                                    W/Q: 01
   9.67.157.17      00023  9.67.156.75      001  0000000000  01
     QosPlcAct: *DEFAULT*                                    W/Q: 01
   9.67.157.17      00080  9.67.156.74      001  0000000000  01
     QosPlcAct: *DEFAULT*                                    W/Q: 01
   9.67.157.17      00080  9.67.156.75      001  0000000000  01
     QosPlcAct: *DEFAULT*                                    W/Q: 01
   9.67.157.17      00443  9.67.156.74      001  0000000000  01
     QosPlcAct: *DEFAULT*                                    W/Q: 01
   9.67.157.17      00443  9.67.156.75      000  0000000000  01
     QosPlcAct: *DEFAULT*                                    W/Q: 01
   9.67.157.17      00523  9.67.156.74      001  0000000000  01
     QosPlcAct: *DEFAULT*                                    W/Q: 01
   9.67.157.17      00523  9.67.156.75      001  0000000000  01
     QosPlcAct: *DEFAULT*                                    W/Q: 01
   9.67.157.18      00020  9.67.156.73      000  0000000000  01
     QosPlcAct: *DEFAULT*                                    W/Q: 01
   9.67.157.18      00020  9.67.156.74      000  0000000000  01
     QosPlcAct: *DEFAULT*                                    W/Q: 01
   9.67.157.18      00021  9.67.156.73      001  0000000000  01
     QosPlcAct: *DEFAULT*                                    W/Q: 01
   9.67.157.18      00021  9.67.156.74      001  0000000000  01
     QosPlcAct: *DEFAULT*                                    W/Q: 01

W/Q
   Is the WLM weight after modification using QoS information provided by the Policy Agent. For the displayed QoS policy action, this value reflects the network performance (TCP retransmissions and timeouts).

QoSPlcAct
   Is the name of the QoS policy configured in the Policy Agent.


VIPA connection routing table

   => netstat vcrt
   MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP           14:48:53
   Dynamic VIPA Connection Routing Table:
   Dest IPaddr      DPort  Src IPaddr       SPort  DestXCF Addr
   -----------      -----  ----------       -----  ------------
   9.67.157.18      00021  9.67.155.223     01034  9.67.156.73
   9.67.157.18      00021  9.67.156.104     03494  9.67.156.73
   9.67.157.17      00023  9.67.156.104     03395  9.67.156.74
   9.67.157.17      00023  9.67.156.104     03460  9.67.156.75
   9.67.157.17      00023  9.67.156.104     03472  9.67.156.74
   9.67.157.17      00080  9.67.156.104     03469  9.67.156.74

The netstat vcrt command shows the current TCP connections distributed by the Sysplex Distributor to the target systems, which are identified by their XCF link IP addresses. These connections are addressed to the following:

• FTP server with its cluster address 9.67.157.18
• TN3270 server with its cluster address 9.67.157.17

TCP connections to FTP server

One connection was initiated by a client on a workstation with the IP address 9.67.155.223, and the other connection from workstation 9.67.156.104. Both connections were distributed to an FTP server running in the same TCP/IP stack as the Sysplex Distributor.

TCP connections to TN3270 server

All four connections to the cluster address 9.67.157.17 were initiated by a client on one workstation with IP address 9.67.156.104. Three of them (two to port 23 and one to port 80) were distributed to the TCP/IP stack with XCF address 9.67.156.74, and one (port 23) to 9.67.156.75. Detailed information may be obtained using the netstat vcrt detail command. This display also shows Policy Agent information. In our test case, we did not define policy rules.


   => netstat vcrt detail
   MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP           20:54:59
   Dynamic VIPA Connection Routing Table:
   Dest IPaddr      DPort  Src IPaddr       SPort  DestXCF Addr
   -----------      -----  ----------       -----  ------------
   9.67.157.17      00023  211.1.2.14       11033  9.67.156.74
     PolicyRule:   *NONE*
     PolicyAction: *NONE*

***

Trace details

1. Multicast messages from the Sysplex Distributor Service Manager

Multicast of a wildcard affinity update: This trace record shows a multicast message sent from the Service Manager to the network. Such messages are sent on each defined interface shown in the HOME list, including XCF links and IUTSAMEH links to stacks within the same LPAR. Multicast messages are sent in a cycle of 30 seconds. There is no statement in the TCPIP.PROFILE to change this value or to prevent an interface from sending multicast packets.


   238 MVS001 PACKET 00000001 14:31:59.255980 Packet Trace
    To Link         : GIGELINK        Device: QDIO Ethernet   Full=96
    Tod Clock       : 2001/07/24 14:31:59.255978
    Lost Records    : 0               Flags: Pkt Ver2 Out
    Source Port     : 1637            Dest Port: 1637  Asid: 0036 TCB:0000
    IpHeader: Version : 4             Header Length: 20
    Tos             : 00              QOS: Routine Normal Service
    Packet Length   : 96              ID Number: 626B
    Fragment        :                 Offset: 0
    TTL             : 1               Protocol: UDP    CheckSum: CF5B
    Source          : 9.67.157.129
    Destination     : 224.0.1.2 SGI-Dogfight
    UDP
    Source Port     : 1637 ()         Destination Port: 1637 ()
    Datagram Length : 76              CheckSum: 049B FFFF
    IP Header       : 20   IP: 9.67.157.129, 224.0.1.2
    000000 45000060 626B0000 0111CF5B 09439D81 E0000102
    Protocol Header : 8    Port: 1637, 1637
    000000 06650665 004C049B
    Data            : 68   Data Length: 68
    000000 00010101 81010024 00000006 00000000
    000010 00000000 09439D11 FFFFFFFF 00000050
    000020 00000000 003C0000 05050008 05040000
    000030 85040014 09439C49 06658100 81000000
    000040 00000000

[Trace excerpt incomplete: only the following data words of the next trace record survive.]

    09439C68 09439D11 0AD30017
    00000000 70024000 EF9D0000
    01010402

3. Sysplex Distributor sent IP packet with SYN request to target system

   300 MVS001 PACKET 00000001 17:54:26.609824 Packet Trace
    To Link         : EZAXCFN5        Device: MPCPTP          Full=48
    Tod Clock       : 2001/07/16 17:54:26.609824
    Lost Records    : 0               Flags: Pkt Ver2 Out
    Source Port     : 2771            Dest Port: 23    Asid: 0025 TCB:000
    IpHeader: Version : 4             Header Length: 20
    Tos             : 30              QOS: Priority MinimumDelay
    Packet Length   : 48              ID Number: B66E
    Fragment        : DontFragment    Offset: 0
    TTL             : 126             Protocol: TCP    CheckSum: FA29
    Source          : 9.67.156.104
    Destination     : 9.67.157.17
    TCP
    Source Port     : 2771 ()         Destination Port: 23 (telnet)
    Sequence Number : 1987348003      Ack Number: 0
    Header Length   : 28              Flags: Syn
    Window Size     : 16384           CheckSum: EF9D FFFF  Urgent Data P
    Option          : Max Seg Size    Len: 4  MSS: 1460
    Option          : NOP
    Option          : NOP
    Option          : SACK Permitted
    IP Header       : 20   IP: 9.67.156.104, 9.67.157.17
    000000 45300030 B66E4000 7E06FA29 09439C68 09439D11
    Protocol Header : 28   Port: 2771, 23
    000000 0AD30017 76748623 00000000 70024000 EF9D0000 020405B4 01010402
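The formatted fields of this trace record can be cross-checked against the raw hex words of the IP header line. The following sketch decodes a 20-byte IPv4 header and verifies its checksum with Python's standard library. The function name is hypothetical, and the layout is that of an IPv4 header without options.

```python
import socket
import struct


def parse_ipv4_header(hex_words: str) -> dict:
    """Decode a 20-byte IPv4 header given as space-separated hex words."""
    raw = bytes.fromhex(hex_words.replace(" ", ""))
    if len(raw) < 20:
        raise ValueError("an IPv4 header needs at least 20 bytes")
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    # One's-complement sum over all ten 16-bit words, checksum field included;
    # a valid header folds to 0xFFFF.
    total = sum(struct.unpack("!10H", raw[:20]))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return {
        "version": ver_ihl >> 4,
        "header_length": (ver_ihl & 0x0F) * 4,
        "tos": tos,
        "total_length": total_len,
        "id": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "ttl": ttl,
        "protocol": proto,
        "checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
        "checksum_ok": total == 0xFFFF,
    }


hdr = parse_ipv4_header("45300030 B66E4000 7E06FA29 09439C68 09439D11")
for name, value in hdr.items():
    print(f"{name:14}: {value}")
```

For the header shown above, the decoded values agree with the formatted trace: TTL 126, protocol 6 (TCP), packet length 48, and the DontFragment flag set.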

4. Sysplex Distributor sent fixed affinity update to Forwarding Agent

A unicast message is used to send a CASA packet from the XCF link address of the Sysplex Distributor, which is 9.67.156.73, to the Forwarding Agent with its IP address 1.1.1.2. Only the Forwarding Agent that sent the SYN to the Service Manager receives the CASA affinity update.


   301 MVS001 PACKET 00000001 17:54:26.610488 Packet Trace
    To Link         : GIGELINK        Device: QDIO Ethernet   Full=80
    Tod Clock       : 2001/07/16 17:54:26.610487
    Lost Records    : 0               Flags: Pkt Ver2 Out
    Source Port     : 1637            Dest Port: 1637  Asid: 0025 TCB:0000
    IpHeader: Version : 4             Header Length: 20
    Tos             : 00              QOS: Routine Normal Service
    Packet Length   : 80              ID Number: 512D
    Fragment        :                 Offset: 0
    TTL             : 1               Protocol: UDP    CheckSum: C0E1
    Source          : 9.67.156.73
    Destination     : 1.1.1.2
    UDP
    Source Port     : 1637 ()         Destination Port: 1637 ()
    Datagram Length : 60              CheckSum: 97E9 FFFF
    IP Header       : 20   IP: 9.67.156.73, 1.1.1.2
    000000 45000050 512D0000 0111C0E1 09439C49 01010102
    Protocol Header : 8    Port: 1637, 1637
    000000 06650665 003C97E9
    Data            : 52   Data Length: 52
    000000 00010102 8102001C 00020006 09439C68
    000010 09439D11 0AD30017 09439C4A 03840000
    000020 85040014 09439C49 06650002 00020000
    000030 00000000

Trace data of the CASA fixed affinity update packet

Line 000000:

00010102
   Defines a CASA fixed affinity update message

8102001C
   Specifies the message type

00020006
   Specifies flags and the protocol, which is x’06’ = TCP in the IP header

09439C68
   Defines the source IP address 9.67.156.104, which is the workstation that sent the SYN

Line 000010:

09439D11
   Defines the destination IP address 9.67.157.17, which is the distributed DVIPA, the cluster IP address of the application service

0AD3
   Source port of the client in the workstation

0017
   Destination port for the TCP connection, port 23 = TN3270 services

09439C4A
   Forward IP address of the real target system running this specific TCP connection; it is the XCF link IP address defined in the TCPIP.PROFILE with the DESTIP parameter of the VIPADIST statement

0384
   Defines the time-to-live (TTL) value of this fixed affinity for this TCP connection; x’0384’ = 900 seconds, that is, 15 minutes
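The field-by-field reading above can be reproduced mechanically. In the sketch below, the byte offsets are inferred only from the positions of the annotated words in this one trace record, not from a CASA specification, so treat them as assumptions. The function name is hypothetical.

```python
import socket
import struct


def decode_fixed_affinity(data_hex: str) -> dict:
    """Decode the annotated fields of the CASA fixed affinity update data.

    Offsets are assumptions taken from the trace annotation:
    byte 11 = protocol, bytes 12-29 = addresses, ports, forward address, TTL.
    """
    raw = bytes.fromhex("".join(data_hex.split()))
    protocol = raw[11]
    src, dst, sport, dport, fwd, ttl = struct.unpack_from("!4s4sHH4sH", raw, 12)
    return {
        "protocol": protocol,
        "source_ip": socket.inet_ntoa(src),
        "destination_ip": socket.inet_ntoa(dst),
        "source_port": sport,
        "destination_port": dport,
        "forward_ip": socket.inet_ntoa(fwd),
        "ttl_seconds": ttl,
    }


# Lines 000000 and 000010 of the trace data
affinity = decode_fixed_affinity(
    "00010102 8102001C 00020006 09439C68 "
    "09439D11 0AD30017 09439C4A 03840000"
)
print(affinity)
print("TTL in minutes:", affinity["ttl_seconds"] // 60)
```

The decoded forward address 9.67.156.74 is the XCF address of the target stack, and 900 seconds corresponds to the 15 minutes stated above.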

9.8.2 CASA information in the Forwarding Agent

Some displays help you to determine whether the Forwarding Agent is communicating with the Sysplex Distributor Service Manager. During our tests we noticed that this communication did not work as we thought it would. In certain cases, the Service Manager did not send wildcard affinities. These displays allowed us to understand what was really happening. You may check the Forwarding Agent by issuing several displays (show commands) at the Cisco router, for example:

show ip casa ?
   Displays which parameters are available.

With no parameter
   To display a list of the subfunctions named in the next lines

With parameter affinities
   To display fixed affinities, which are the current TCP connections

With parameter oper
   To display operational information of CASA

With parameter stats
   To display statistical information about fixed affinities

With parameter wildcard
   To display information on wildcard blocks


   NIVT7507>show ip casa ?
     affinities  display info on fa affinities
     oper        operational information for casa
     stats       statistical information for fa
     wildcard    display info on wildcard blocks

Display of CASA operational information

This display shows that CASA is defined with the corresponding multicast address and listen port number. These have to match the definitions of the Sysplex Distributor TCPIP.PROFILE statement VIPASMPARMS with parameters SMMCAST and SMPORT. In addition, a CASA control address has to be defined, which is used as the IP address for unicasts between the Forwarding Agent and the Service Manager. This address has to be unique for each Forwarding Agent. We used IP address 1.1.1.1 for the Forwarding Agent in the router 7507 and 1.1.1.2 in router 7206VXR. This display also shows that CASA is operating in the Cisco router.

   NIVT7507>show ip casa oper
   Casa is active:
     Casa control address is 1.1.1.1/32
     Casa multicast address is 224.0.1.2
     Listening on ports:
       Forwarding Agent Listen Port: 1637
       Current passwd: NONE       Pending passwd: NONE
       Passwd timeout: 180 sec (Default)

The next step should be to check if the Service Manager propagates wildcard affinities correctly.

Display of wildcard affinities

We experienced a situation where the Service Manager did not function properly. When we wanted to see the wildcard affinities propagated by the Service Manager, we received the following display result.

   NIVT7507>show ip casa affinities
   No matching entries in affinity cache
   NIVT7507>


After fixing the problem we received the expected result. The following display shows the wildcard affinities multicast by the Service Manager to the Forwarding Agents, as seen in the Cisco router 7507.

   NIVT7507>show ip casa wild
   Source Address  Source Mask  Port  Dest Address  Dest Mask        Port  Prot
   0.0.0.0         0.0.0.0      0     9.67.157.17   255.255.255.255  523   TCP
   0.0.0.0         0.0.0.0      0     9.67.157.17   255.255.255.255  443   TCP
   0.0.0.0         0.0.0.0      0     9.67.157.17   255.255.255.255  80    TCP
   0.0.0.0         0.0.0.0      0     9.67.157.17   255.255.255.255  23    TCP
   0.0.0.0         0.0.0.0      0     9.67.157.18   255.255.255.255  21    TCP
   0.0.0.0         0.0.0.0      0     9.67.157.18   255.255.255.255  20    TCP

The source IP address 0.0.0.0, together with the source mask 0.0.0.0 and source port 0, allows the Forwarding Agent to accept a TCP packet from any source address and any source port. The destination IP address is the cluster address of the application service identified by the destination port: for example, port 23 for TN3270E service, 523 for secure TN3270E service, 80 for Web services, 443 for secure Web services, and 21 and 20 for FTP service. The destination mask 255.255.255.255 requires a match on the whole 32-bit IP address. This wildcard affinity block is sent to the Forwarding Agents by the Service Manager every 30 seconds over each interface of the Sysplex Distributor.
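The mask semantics of a wildcard affinity can be expressed in a few lines. This is a sketch under assumptions: the class and function names are hypothetical, and a port value of 0 is interpreted as "any port", as the display above suggests.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Wildcard:
    """One row of the wildcard display (hypothetical structure)."""
    src_addr: str
    src_mask: str
    src_port: int   # 0 is treated as "any port"
    dst_addr: str
    dst_mask: str
    dst_port: int
    proto: str


def masked_equal(addr: str, pattern: str, mask: str) -> bool:
    """Compare two IPv4 addresses only in the bit positions set in the mask."""
    a = int(ipaddress.IPv4Address(addr))
    p = int(ipaddress.IPv4Address(pattern))
    m = int(ipaddress.IPv4Address(mask))
    return (a & m) == (p & m)


def matches(wc: Wildcard, proto: str, src: str, sport: int,
            dst: str, dport: int) -> bool:
    """Return True if a packet's 5-tuple falls under the wildcard affinity."""
    return (
        proto == wc.proto
        and masked_equal(src, wc.src_addr, wc.src_mask)
        and masked_equal(dst, wc.dst_addr, wc.dst_mask)
        and wc.src_port in (0, sport)
        and wc.dst_port in (0, dport)
    )


tn3270 = Wildcard("0.0.0.0", "0.0.0.0", 0,
                  "9.67.157.17", "255.255.255.255", 23, "TCP")

hit = matches(tn3270, "TCP", "9.67.156.104", 2771, "9.67.157.17", 23)
wrong_port = matches(tn3270, "TCP", "9.67.156.104", 2771, "9.67.157.17", 25)
wrong_addr = matches(tn3270, "TCP", "9.67.156.104", 2771, "9.67.157.18", 23)
print(hit, wrong_port, wrong_addr)
```

With a source mask of 0.0.0.0 every source address compares equal, while the destination mask of 255.255.255.255 admits only the exact cluster address.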

Display of wildcard affinities detailed information

This display is an excerpt of the block displayed in the previous screen. It provides more detailed information. For example:

• Interest address
   The interest address is the Sysplex Distributor’s XCF link address, 9.67.156.73.

• Port
   The port 1637 is the CASA listening port of the Sysplex Distributor.

• Interest packet
   All packets, including fragmented IP packets, are accepted by the Forwarding Agent.

• Dispatch address
   The dispatch address is the IP address of a target server running the desired application. At the moment no dispatch address is known. A dispatch address is determined, based on load-balancing rules, after a SYN request is processed in the Sysplex Distributor. The Sysplex Distributor sends this dispatch address to the Forwarding Agent in a unicast message once the target system is known.

   NIVT7507>show ip casa wild det
   Source Address  Source Mask  Port  Dest Address  Dest Mask        Port  Prot
   0.0.0.0         0.0.0.0      0     9.67.157.17   255.255.255.255  523   TCP
     Service Manager Details:
       Manager Addr: 9.67.156.17        Insert Time: 01:22:57 UTC 07/11/01
     Affinity Statistics:
       Affinity Count: 0                Interest Packet Timeouts: 0
     Packet Statistics:
       Packets: 0                       Bytes: 0
       Advertise Dest Address: NO       Match Fragments: YES
       Affinity TTL: 30
     Action Details:
       Interest Addr: 9.67.156.73       Interest Port: 1637
       Interest Packet: 0x8100 FRAG ALLPKTS
       Interest Tickle: 0x0000
       Dispatch (Layer 2): NO           Dispatch Address: 0.0.0.0
   --More--
   Source Address  Source Mask  Port  Dest Address  Dest Mask        Port  Prot
   0.0.0.0         0.0.0.0      0     9.67.157.17   255.255.255.255  443   TCP
     Service Manager Details:
       Manager Addr: 9.67.156.17        Insert Time: 01:22:57 UTC 07/11/01
     Affinity Statistics:
       Affinity Count: 0                Interest Packet Timeouts: 0
     Packet Statistics:
       Packets: 0                       Bytes: 0
       Advertise Dest Address: NO       Match Fragments: YES
       Affinity TTL: 30
     Action Details:
       Interest Addr: 9.67.156.73       Interest Port: 1637
       Interest Packet: 0x8100 FRAG ALLPKTS
       Interest Tickle: 0x0000
       Dispatch (Layer 2): NO           Dispatch Address: 0.0.0.0
   Source Address  Source Mask  Port  Dest Address  Dest Mask        Port  Prot
   0.0.0.0         0.0.0.0      0     9.67.157.17   255.255.255.255  80    TCP
     Service Manager Details:
       Manager Addr: 9.67.156.17        Insert Time: 01:22:57 UTC 07/11/01
     Affinity Statistics:
       Affinity Count: 0                Interest Packet Timeouts: 0
     Packet Statistics:
   --More--


CASA fixed affinity This display shows the TCP connections currently established and known by the Forwarding Agent. As you can see, only the source and destination IP addresses and ports of the IP and TCP header are shown, not the real target system’s IP address. If you want to see the target system’s address, you have to use the show ip casa det (detail) command.

NIVT7507>show ip casa aff
Source Address  Port  Dest Address  Port  Prot
9.67.156.104    3879  9.67.157.17   23    TCP
9.67.156.104    3890  9.67.157.18   21    TCP
NIVT7507>

Our sample shows two established TCP connections:
1. A telnet connection from workstation 9.67.156.104 to the cluster address 9.67.157.17. This connection is distributed to the target system with XCF link address (dispatch address) 9.67.156.74.
2. An FTP connection from workstation 9.67.156.104 to the cluster address 9.67.157.18. This connection is distributed to the target system with XCF link address (dispatch address) 9.67.156.73.

NIVT7507>show ip casa aff det
Source Address  Port  Dest Address  Port  Prot
9.67.156.104    3879  9.67.157.17   23    TCP
Action Details:
Interest Addr: 9.67.156.73       Interest Port: 1637
Interest Packet: 0x0002 SYN
Interest Tickle: 0x0000
Dispatch (Layer 2): YES          Dispatch Address: 9.67.156.74
Source Address  Port  Dest Address  Port  Prot
9.67.156.104    3890  9.67.157.18   21    TCP
Action Details:
Interest Addr: 9.67.156.73       Interest Port: 1637
Interest Packet: 0x0002 SYN
Interest Tickle: 0x0000
Dispatch (Layer 2): YES          Dispatch Address: 9.67.156.73

The detailed fixed affinity information in the Forwarding Agent may be compared with the information of the Sysplex Distributor obtained using the command netstat vcrt.

Chapter 9. Load distribution with MNLB and Sysplex Distributor


CASA statistics

The following screen provides information about the activities of the CASA protocol.

NIVT7507>show ip casa stats
Casa is active:
Wildcard Stats:
Wildcards: 6             Max Wildcards: 6
Wildcard Denies: 0       Wildcard Drops: 0
Pkts Throughput: 0       Bytes Throughput: 0
Affinity Stats:
Affinities: 0            Max Affinities: 0
Cache Hits: 0            Cache Misses: 0
Affinity Drops: 0        Interest Packet Timeouts: 0
Casa Stats:
Int Packet: 0            Int Tickle: 0
Casa Denies: 0           Drop Count: 0
Security Drops: 0

9.8.3 Integrated CASA information

This section shows how to follow an operation at the moment a TCP connection is closed. Unlike the two previous sections, which described the fundamentals of each system separately, the displays here are intentionally not separated: the Sysplex Distributor and the router are viewed together as the process unfolds over time. The section includes displays issued at the Sysplex Distributor and at the router, and a trace flow that helps to understand the activities. As a sample, we selected the process of closing a TCP connection.

A sample connection

A TN3270 connection was established between a client on workstation 9.67.156.104 and the TN3270 server cluster address 9.67.157.17. The Sysplex Distributor on system MVS001 distributed the connection to the target stack on system MVS154 (dynamic XCF address 9.67.156.75). The following screens show the steps in closing the TN3270 connection:
• Display the connection via netstat conn on MVS154
• Display the distributed VIPA connection routing table via netstat vcrt on the Sysplex Distributor on system MVS001
• Display the fixed affinity table in router 7206 using the show ip casa affinities det command
• Analyze trace records obtained from system MVS154
  – FIN sent from client to server, to signal the start of the connection close
  – ACK sent from server to client, to acknowledge the reception of the FIN
  – FIN sent from server to client, to signal that the second part of closing the connection has started
  – ACK sent from client to server, to complete the flow, after which the connection is closed
• Check that the connection no longer exists using the netstat conn command on MVS154
• Check that the connection has been removed from the VIPA connection routing table using the netstat vcrt command
• Check that the fixed affinity has been deleted in the router using the show ip casa affinities det command

Display the existing connection

The display shows the current connection in the target system MVS154. The information of the TCP connection between client and server is highlighted.

=> netstat conn
MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP
User Id   Conn      Local Socket      Foreign Socket      State
-------   ----      ------------      --------------      -----
BPXOINIT  0000001D  0.0.0.0..10007    0.0.0.0..0          Listen
ENDPT40   0000003C  0.0.0.0..10115    0.0.0.0..0          Listen
FTPMVS1   00000023  0.0.0.0..21       0.0.0.0..0          Listen
HODSERV5  0000002B  0.0.0.0..8999     0.0.0.0..0          Listen
................
OMPROUTE  00000022  127.0.0.1..1027   127.0.0.1..1028     Establsh
OSNMPD    00000021  0.0.0.0..1029     0.0.0.0..0          Listen
TCP       0000004E  9.67.157.17..23   9.67.156.104..4202  Establsh
TCP       00000010  0.0.0.0..523      0.0.0.0..0          Listen
TCP       0000000B  127.0.0.1..1025   0.0.0.0..0          Listen
TCP       0000000E  127.0.0.1..1026   127.0.0.1..1025     Establsh
TCP       00000011  0.0.0.0..23       0.0.0.0..0          Listen
TCP       0000001C  127.0.0.1..1028   127.0.0.1..1027     Establsh
TCP       0000000F  127.0.0.1..1025   127.0.0.1..1026     Establsh
WEBSDB2   00000019  0.0.0.0..80       0.0.0.0..0          Listen
ENDPT40   0000003B  0.0.0.0..10115    *..*                UDP
INETD1    00000041  0.0.0.0..7        *..*                UDP
INETD1    0000003F  0.0.0.0..19       *..*                UDP
INETD1    00000040  0.0.0.0..9        *..*                UDP
..............
***

The VIPA connection routing table in the Sysplex Distributor stack of system MVS001 also shows this connection.

===> netstat vcrt
MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP
Dynamic VIPA Connection Routing Table:
Dest IPaddr   DPort  Src IPaddr    SPort  DestXCF Addr
-----------   -----  ----------    -----  ------------
9.67.157.17   00023  9.67.156.104  04202  9.67.156.75
***

The router has a fixed affinity for this connection that points to the same target system, MVS154. The interest address points to the Sysplex Distributor’s Service Manager. The dispatch address points to the target system MVS154.

C7200-Z55>show ip casa aff det
Source Address  Port  Dest Address  Port  Prot
9.67.156.104    4202  9.67.157.17   23    TCP
Action Details:
Interest Addr: 9.67.156.73       Interest Port: 1637
Interest Packet: 0x0002 SYN
Interest Tickle: 0x0000
Dispatch (Layer 2): YES          Dispatch Address: 9.67.156.75
--More--
C7200-Z55>
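The comparison between the router's fixed affinity and the Sysplex Distributor's connection routing table can be sketched in a few lines of Python. This is only an illustration: the two dictionaries are hand-copied from the netstat vcrt and show ip casa aff det displays above, and the function name find_mismatches is a hypothetical helper, not part of any z/OS or Cisco tool.

```python
def find_mismatches(vcrt, affinities):
    """Return connections whose router dispatch address differs from
    the DestXCF address in the VIPA connection routing table."""
    mismatches = []
    for conn, dest_xcf in vcrt.items():
        dispatch = affinities.get(conn)  # None if the router has no affinity
        if dispatch != dest_xcf:
            mismatches.append((conn, dest_xcf, dispatch))
    return mismatches


# Key: (source address, source port, destination address, destination port)
# Values hand-copied from the displays in this section.
vcrt = {("9.67.156.104", 4202, "9.67.157.17", 23): "9.67.156.75"}
affinities = {("9.67.156.104", 4202, "9.67.157.17", 23): "9.67.156.75"}

print(find_mismatches(vcrt, affinities))
```

For the sample connection both systems agree on the target 9.67.156.75, so the list of mismatches is empty.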

Trace of closing the TN3270 connection

This trace shows the packets exchanged between the client and the target stack on system MVS154 when the client starts to close its TCP connection with a TN3270 server. The trace is recorded at system MVS154, the endpoint of the TCP connection. The trace records are numbered starting with record 503. The direction of the data flow is indicated as follows:

Inbound traffic to MVS154     From Link: GIGELINK
Outbound traffic to router    To Link  : GIGELINK

Inbound traffic records are intentionally shortened to reduce the complexity of the trace shown. Inbound traffic is additionally carried in the Generic Routing Encapsulation (GRE) protocol between the router and the target stack. Trace data for this part is not shown here. You may read more about GRE in 9.10, “Generic Routing Encapsulation (GRE) protocol” on page 325.

Trace record 503

The trace starts with record 503, listing a FIN sent from the client to the application server to start closing the connection.

Client IP address          Source address: 9.67.156.104
Client port number         Port: 4202
TN3270 server IP address   Destination address: 9.67.157.17 (cluster IP address)
TN3270 port                Port: 23 (telnet)
Client sequence number     1118528899 for the FIN to server
ACK sequence number        1627793504 for the last data from server

Example 9-14 Trace record 503: FIN sent from client

Start of FIN trace recorded at system MVS154
------------------------------------------------------------------------------
503 MVS154 PACKET 00000001 16:31:46.482764 Packet Trace
From Link       : GIGELINK        Device: QDIO Ethernet     Full=64
Tod Clock       : 2001/07/27 16:31:46.482763
------------------------------------------------------------------------------
GRE part is deleted here, to avoid the complexity of the trace
------------------------------------------------------------------------------
IpHeader: Version : 4             Header Length: 20
Tos             : 30              QOS: Priority MinimumDelay
Packet Length   : 40              ID Number: 1591
Fragment        : DontFragment    Offset: 0
TTL             : 127             Protocol: TCP             CheckSum: 9A0F
Source          : 9.67.156.104
Destination     : 9.67.157.17

TCP
Source Port     : 4202 ()         Destination Port: 23 (telnet)
Sequence Number : 1118528899      Ack Number: 1627793504
Header Length   : 20              Flags: Ack Fin
Window Size     : 17149           CheckSum: DEC0 FFFF       Urgent Data Pointer:

IP Header       : 20   IP: 9.67.156.104, 9.67.157.17
000000 45300028 15914000 7F069A0F 09439C68 09439D11
Protocol Header : 20   Port: 4202, 23
000000 106A0017 42AB6583 61062860 501142FD DEC00000
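The formatted TCP fields in the trace can be cross-checked against the hexadecimal "Protocol Header" line. The following Python sketch decodes the 20-byte TCP header from record 503 using the standard TCP header layout; the function name decode_tcp_header and the flag spellings are illustrative choices, not something produced by the z/OS packet trace formatter.

```python
import struct

# Standard TCP flag bits (low 6 bits of the 16-bit offset/flags field).
TCP_FLAGS = [(0x01, "Fin"), (0x02, "Syn"), (0x04, "Rst"),
             (0x08, "Psh"), (0x10, "Ack"), (0x20, "Urg")]


def decode_tcp_header(hex_words):
    """Decode a 20-byte TCP header given as space-separated hex words."""
    raw = bytes.fromhex(hex_words.replace(" ", ""))
    (src, dst, seq, ack, off_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", raw[:20])
    return {
        "source_port": src,
        "destination_port": dst,
        "sequence_number": seq,
        "ack_number": ack,
        "header_length": (off_flags >> 12) * 4,  # data offset in 32-bit words
        "flags": {name for bit, name in TCP_FLAGS if off_flags & bit},
        "window_size": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }


# Protocol Header line of trace record 503
record_503 = decode_tcp_header("106A0017 42AB6583 61062860 501142FD DEC00000")
for field, value in record_503.items():
    print(field, value)
```

The decoded ports, sequence number, acknowledgment number, window size, and flags match the formatted values shown in the record.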

Trace record 504

Trace record 504 lists an acknowledgment (ACK) only, sent from the server in response to the previous FIN from the client. PSH means PUSH: it forces the IP stack to send this packet to the network immediately instead of waiting for other data to fill the output buffer before sending the packet to the client.

TN3270 server IP address   Source address: 9.67.157.17 (again cluster IP address)
TN3270 port                Port: 23 (telnet)
Client IP address          Destination address: 9.67.156.104
Client port number         Port: 4202
Server sequence number     1627793504 (remains the same because no data was sent)
ACK sequence number        1118528900 for the ACK to client (one higher than the sequence number sent by the client, because the FIN consumes one sequence number)

504 MVS154 PACKET 00000001 16:31:46.482962 Packet Trace
To Link         : GIGELINK        Device: QDIO Ethernet     Full=40
Tod Clock       : 2001/07/27 16:31:46.482962
Lost Records    : 0               Flags: Pkt Ver2 Out
Source Port     : 23              Dest Port: 4202  Asid: 0034  TCB: 0000000
IpHeader: Version : 4             Header Length: 20
Tos             : 30              QOS: Priority MinimumDelay
Packet Length   : 40              ID Number: 0B06
Fragment        : DontFragment    Offset: 0
TTL             : 64              Protocol: TCP             CheckSum: E39A
Source          : 9.67.157.17
Destination     : 9.67.156.104

TCP
Source Port     : 23 (telnet)     Destination Port: 4202 ()
Sequence Number : 1627793504      Ack Number: 1118528900
Header Length   : 20              Flags: Ack Psh
Window Size     : 32760           CheckSum: A1BD FFFF       Urgent Data Pointer:

IP Header       : 20   IP: 9.67.157.17, 9.67.156.104
000000 45300028 0B064000 4006E39A 09439D11 09439C68
Protocol Header : 20   Port: 23, 4202
000000 0017106A 61062860 42AB6584 50187FF8 A1BD0000

Figure 9-4 Trace record 504: ACK sent from server

Trace record 505

Trace record 505 lists a FIN from the server to close the second part of the TCP connection. PSH means PUSH: it forces the IP stack to send this packet to the network immediately instead of waiting for other data to fill the output buffer before sending the packet to the client.

TN3270 server IP address   Source address: 9.67.157.17 (again cluster IP address)
TN3270 port                Port: 23 (telnet)
Client IP address          Destination address: 9.67.156.104
Client port number         Port: 4202
Server sequence number     1627793504 (remains the same because no data was sent)
ACK sequence number        1118528900 for the ACK to client (remains the same because no data came from the other end of the connection)

505 MVS154 PACKET 00000001 16:31:46.484090 Packet Trace
To Link         : GIGELINK        Device: QDIO Ethernet     Full=40
Tod Clock       : 2001/07/27 16:31:46.484089
Lost Records    : 0               Flags: Pkt Ver2 Out
Source Port     : 23              Dest Port: 4202  Asid: 0034  TCB: 007D42E
IpHeader: Version : 4             Header Length: 20
Tos             : 30              QOS: Priority MinimumDelay
Packet Length   : 40              ID Number: 0B07
Fragment        : DontFragment    Offset: 0
TTL             : 64              Protocol: TCP             CheckSum: E399
Source          : 9.67.157.17
Destination     : 9.67.156.104

TCP
Source Port     : 23 (telnet)     Destination Port: 4202 ()
Sequence Number : 1627793504      Ack Number: 1118528900
Header Length   : 20              Flags: Ack Psh Fin
Window Size     : 32760           CheckSum: A1BC FFFF       Urgent Data Pointer:

IP Header       : 20   IP: 9.67.157.17, 9.67.156.104
000000 45300028 0B074000 4006E399 09439D11 09439C68
Protocol Header : 20   Port: 23, 4202
000000 0017106A 61062860 42AB6584 50197FF8 A1BC0000

Figure 9-5 Trace record 505: FIN sent from server

Trace record 506

Trace record 506 lists an ACK sent from the client to the application server in response to the server's FIN. With this ACK the TCP connection is completely closed.

Client IP address          Source address: 9.67.156.104
Client port number         Port: 4202
TN3270 server IP address   Destination address: 9.67.157.17 (cluster IP address)
TN3270 port                Port: 23 (telnet)
Client sequence number     1118528900 for the ACK to server (remains the same, because no data was sent)
ACK number                 1627793505 (increased by 1 because the FIN was received from the server)

506 MVS154 PACKET 00000001 16:31:46.489254 Packet Trace
From Link       : GIGELINK        Device: QDIO Ethernet     Full=64
Tod Clock       : 2001/07/27 16:31:46.489253
Lost Records    : 0               Flags: Pkt Ver2 Gre
------------------------------------------------------------------------------
GRE part is deleted here, to avoid the complexity of the trace
------------------------------------------------------------------------------
IpHeader: Version : 4             Header Length: 20
Tos             : 30              QOS: Priority MinimumDelay
Packet Length   : 40              ID Number: 1592
Fragment        : DontFragment    Offset: 0
TTL             : 127             Protocol: TCP             CheckSum: 9A0E
Source          : 9.67.156.104
Destination     : 9.67.157.17

TCP
Source Port     : 4202 ()         Destination Port: 23 (telnet)
Sequence Number : 1118528900      Ack Number: 1627793505
Header Length   : 20              Flags: Ack
Window Size     : 17149           CheckSum: DEBF FFFF       Urgent Data Pointer:

IP Header       : 20   IP: 9.67.156.104, 9.67.157.17
000000 45300028 15924000 7F069A0E 09439C68 09439D11
Protocol Header : 20   Port: 4202, 23
000000 106A0017 42AB6584 61062861 501042FD DEBF0000
------------------------------------------------------------------------------
End of FIN trace

Figure 9-6 Trace record 506: ACK sent from client
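The sequence and acknowledgment numbers in records 503 to 506 follow the usual TCP rule that a FIN consumes one sequence number even when it carries no data. The sketch below restates that arithmetic in Python using the values from the four trace records; the record dictionaries and the helper expected_ack are illustrative names introduced here.

```python
# Values taken from trace records 503 to 506; none of the segments carries data.
records = {
    503: {"sender": "client", "seq": 1118528899, "ack": 1627793504, "fin": True},
    504: {"sender": "server", "seq": 1627793504, "ack": 1118528900, "fin": False},
    505: {"sender": "server", "seq": 1627793504, "ack": 1118528900, "fin": True},
    506: {"sender": "client", "seq": 1118528900, "ack": 1627793505, "fin": False},
}


def expected_ack(segment, payload_length=0):
    """Acknowledgment number the peer should send for this segment.

    A FIN (like a SYN) occupies one sequence number; sequence numbers
    wrap at 2**32.
    """
    consumed = payload_length + (1 if segment["fin"] else 0)
    return (segment["seq"] + consumed) % 2**32


# The server's ACK (504) acknowledges the client's FIN (503).
print(expected_ack(records[503]) == records[504]["ack"])
# The client's ACK (506) acknowledges the server's FIN (505).
print(expected_ack(records[505]) == records[506]["ack"])
```

Both comparisons hold for the trace shown, which is why the acknowledgment numbers in records 504 and 506 are each one higher than the sequence number of the FIN they answer.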

Display target system, SD, and router after closing connection

There is no connection on MVS154.

===> netstat conn
MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP
User Id   Conn      Local Socket      Foreign Socket      State
-------   ----      ------------      --------------      -----
BPXOINIT  0000001D  0.0.0.0..10007    0.0.0.0..0          Listen
ENDPT40   0000003C  0.0.0.0..10115    0.0.0.0..0          Listen
FTPMVS1   00000023  0.0.0.0..21       0.0.0.0..0          Listen
HODSERV5  0000002B  0.0.0.0..8999     0.0.0.0..0          Listen
INETD1    0000004B  0.0.0.0..623      0.0.0.0..0          Listen
INETD1    00000044  0.0.0.0..19       0.0.0.0..0          Listen
OMPROUTE  00000022  127.0.0.1..1027   127.0.0.1..1028     Establsh
OSNMPD    00000021  0.0.0.0..1029     0.0.0.0..0          Listen
TCP       00000010  0.0.0.0..523      0.0.0.0..0          Listen
TCP       0000000B  127.0.0.1..1025   0.0.0.0..0          Listen
TCP       0000000E  127.0.0.1..1026   127.0.0.1..1025     Establsh
TCP       00000011  0.0.0.0..23       0.0.0.0..0          Listen
TCP       0000001C  127.0.0.1..1028   127.0.0.1..1027     Establsh
TCP       0000000F  127.0.0.1..1025   127.0.0.1..1026     Establsh
WEBSDB2   00000019  0.0.0.0..80       0.0.0.0..0          Listen
OSNMPD    00000020  0.0.0.0..161      *..*                UDP

No entry is in the VIPA connection routing table in the Sysplex Distributor of MVS001.

===> netstat vcrt
MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP
Dynamic VIPA Connection Routing Table:
Dest IPaddr   DPort  Src IPaddr    SPort  DestXCF Addr
-----------   -----  ----------    -----  ------------
***

The router has also cleared the fixed affinity table.

C7200-Z55>show ip casa aff
No matching entries in affinity cache
C7200-Z55>

9.9 Sysplex Distributor backup

When the primary TCP/IP stack running the Sysplex Distributor fails, the backup stack with a secondary Sysplex Distributor automatically takes over all defined DVIPAs, including the distribution definitions such as VIPADISTRIBUTE and DESTIP. A prerequisite for this takeover is that VIPABACKUP is defined on the backup stack(s). If several backup stacks are defined, the stack with the highest defined rank automatically activates the DVIPA. Ranks may be defined from 1 (default) to 254; 1 is the lowest position in a backup chain.
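The rank rule can be illustrated with a small Python sketch. It is only a model of the selection described above, not of how the stacks negotiate over XCF; the function select_takeover_stack and the second stack name MVSXXX are hypothetical, while rank 200 for MVS069 matches the VIPABACKUP definitions used in this chapter.

```python
def select_takeover_stack(backup_ranks):
    """Pick the backup stack that activates a DVIPA.

    backup_ranks maps a stack name to its VIPABACKUP rank (1 to 254).
    The stack with the highest rank wins. How equal ranks are resolved
    is not covered here; max() simply returns the first one it sees.
    """
    for stack, rank in backup_ranks.items():
        if not 1 <= rank <= 254:
            raise ValueError(f"rank {rank} for {stack} is outside 1..254")
    if not backup_ranks:
        return None
    return max(backup_ranks, key=backup_ranks.get)


# MVS069 uses rank 200 in this chapter; MVSXXX is a made-up second backup.
print(select_takeover_stack({"MVSXXX": 100, "MVS069": 200}))
```

With these inputs the sketch selects MVS069, the stack that takes over the distributed DVIPAs in the scenario that follows.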

9.9.1 TCPIP.PROFILE definitions

These definitions were used on MVS069.

Example 9-15 Sysplex Distributor DVIPA backup definitions

VIPADYNAMIC
;----------------------------------------------------------
; DISTRIBUTED VIPA BACKUP FOR WEB AND TN3270
;----------------------------------------------------------
VIPABACKUP 200 9.67.157.17
;----------------------------------------------------------
; DISTRIBUTED VIPA BACKUP FOR FTP
;----------------------------------------------------------
VIPABACKUP 200 9.67.157.18
;
ENDVIPADYNAMIC

The backup stack will distribute new TCP connection requests based on the VIPADISTRIBUTE and DESTIP parameters of the original stack. Additional IPCONFIG statements have to be defined in the backup Sysplex Distributor, as shown in the following example.

Example 9-16 IPCONFIG statements in the backup Sysplex Distributor

DYNAMICXCF 9.67.156.76 255.255.255.248
DATAGRAMFWD
SYSPLEXROUTING
VARSUBNETTING
PATHMTUDISCOVERY
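As a quick plausibility check of the DYNAMICXCF address and mask above, the following Python sketch computes the subnet and tests whether the dynamic XCF addresses that appear in this chapter's displays fall inside it. The stack-to-address mapping is read from the figures and displays of this chapter, and the check assumes that the dynamic XCF addresses of all participating stacks are meant to share one subnet.

```python
import ipaddress

# Address and mask from the DYNAMICXCF statement of the backup stack MVS069
xcf_network = ipaddress.ip_interface("9.67.156.76/255.255.255.248").network

# Dynamic XCF addresses as they appear in this chapter (assumed mapping)
xcf_addresses = {
    "MVS001": "9.67.156.73",
    "MVS062": "9.67.156.74",
    "MVS154": "9.67.156.75",
    "MVS069": "9.67.156.76",
}

in_subnet = {
    stack: ipaddress.ip_address(addr) in xcf_network
    for stack, addr in xcf_addresses.items()
}

print(xcf_network)
print(in_subnet)
```

The mask 255.255.255.248 corresponds to a /29 prefix, so the subnet is 9.67.156.72/29 and all four addresses lie inside it.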

9.9.2 Sysplex Distributor backup procedures

This section gives an overview of the following:
• Activities of the backup Sysplex Distributor in a backup sequence when the primary Sysplex Distributor fails.
• Activities of the primary Sysplex Distributor in a recovery sequence when the primary Sysplex Distributor is restarted.

Sysplex Distributor backup sequence

[Figure: the Sysplex Distributor (Service Manager) on MVS001 (XCF 9.67.156.73), the backup Sysplex Distributor on MVS069 (XCF 9.67.156.76), and the target stacks MVS062 (XCF 9.67.156.74) and MVS154 (XCF 9.67.156.75), connected over XCF and attached through OSA Express adapters (9.67.157.129 to 9.67.157.132) to a switch and two routers acting as Forwarding Agents, each with an affinity table holding fixed affinities. The DVIPAs are 9.67.157.17 (TN3270) and 9.67.157.18 (FTP). A client in the network logs in to TN3270 on port 23 using the cached IP address 9.67.157.17. The numbered steps 1 to 7 are explained in the notes below.]

Figure 9-7 Sysplex Distributor takeover procedure

Notes to Figure 9-7:

1. The Distribution Stack (MVS001) fails.
2. The backup Distribution Stack (MVS069) automatically activates DVIPAs for which it is backup.
3. The backup Distribution Stack (MVS069) informs target stacks (MVS062 and MVS154).
4. Target stacks (MVS062 and MVS154) inform the backup Distribution Stack (MVS069) about application server status on ports defined for distribution.
5. The backup Distribution Stack (MVS069) builds and maintains its destination port table (DPT) and its current routing table (CRT).
6. The backup Distribution Stack (MVS069) advertises to the network the DVIPAs that it took over.
7. Wildcard affinities are multicast to the Forwarding Agents, propagating the new forward IP address, which is now the dynamic XCF link address 9.67.156.76 of the backup Sysplex Distributor (MVS069). Fixed affinities do not need to be updated in the Forwarding Agents.

Backup SD MVS069 displays before the primary SD MVS001 failed

The following display shows that the backup Sysplex Distributor has not yet activated the DVIPAs 9.67.157.17 and 9.67.157.18 of the primary Sysplex Distributor.

===> netstat vipadcfg
MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP       17:39:21
Dynamic VIPA Information:
VIPA Backup:
IP Address    Rank
----------    ----
9.67.156.25   000100
9.67.156.26   000100
9.67.156.33   000010
9.67.156.34   000010
9.67.156.41   000010
9.67.156.42   000010
VIPA Define:
IP Address    AddressMask      Moveable   SrvMgr
----------    -----------      --------   ------
9.67.156.49   255.255.255.248  Immediate  No
9.67.156.50   255.255.255.248  Immediate  No
***

The display of the VIPA configuration table of the primary Sysplex Distributor MVS001 shows that this stack is still Service Manager.


===> netstat vipadcfg
MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP
Dynamic VIPA Information:
VIPA Backup:
IP Address    Rank
----------    ----
9.67.156.33   000001
9.67.156.34   000001
9.67.156.41   000001
9.67.156.42   000001
9.67.156.49   000100
9.67.156.50   000100
VIPA Define:
IP Address    AddressMask      Moveable   SrvMgr
----------    -----------      --------   ------
9.67.156.25   255.255.255.248  Immediate  No
9.67.156.26   255.255.255.248  Immediate  No
9.67.157.17   255.255.255.248  Immediate  Yes
9.67.157.18   255.255.255.248  Immediate  Yes
VIPA Distribute:
IP Address    Port   XCF Address
----------    ----   -----------
9.67.157.17   00023  9.67.156.75
9.67.157.17   00023  9.67.156.74
9.67.157.17   00080  9.67.156.75
9.67.157.17   00080  9.67.156.74
9.67.157.17   00443  9.67.156.75
9.67.157.17   00443  9.67.156.74
9.67.157.17   00523  9.67.156.75
9.67.157.17   00523  9.67.156.74
9.67.157.18   00020  9.67.156.74
9.67.157.18   00020  9.67.156.73
9.67.157.18   00021  9.67.156.74
9.67.157.18   00021  9.67.156.73
VIPA Service Manager:
McastGroup: 224.0.1.2   Port: 01637   Pwd: No

The following screen shows that the Sysplex Distributor MVS001 currently has two TCP connections.


MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP
Dynamic VIPA Connection Routing Table:
Dest IPaddr   DPort  Src IPaddr    SPort  DestXCF Addr
-----------   -----  ----------    -----  ------------
9.67.157.17   00023  9.67.156.104  04219  9.67.156.74
9.67.157.17   00023  9.67.156.104  04220  9.67.156.75

Because the backup Sysplex Distributor is not the current Sysplex Distributor, it does not have a distributed VIPA port table as shown in the following screen.

===> netstat vdpt
MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP
Dynamic VIPA Distribution Port Table:
Dest IPaddr   DPort  DestXCF Addr  Rdy  TotalConn   WLM
-----------   -----  ------------  ---  ---------   ---
***

The backup Sysplex Distributor does not have any TCP connections. See the following screen output of the dynamic VIPA connection routing table of MVS069.

===> netstat vcrt
MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP
Dynamic VIPA Connection Routing Table:
Dest IPaddr   DPort  Src IPaddr    SPort  DestXCF Addr
-----------   -----  ----------    -----  ------------
***

The command netstat vipadyn issued at the backup Sysplex Distributor MVS069 also does not show any distributed DVIPAs. In the screen output, the DVIPAs 9.67.157.17 and 9.67.157.18 have the status Backup and an empty DistStat column.

MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP
IP Address    AddressMask      Status  Origination  DistStat
----------    -----------      ------  -----------  --------
9.67.156.25   255.255.255.248  Backup  VIPABackup
9.67.156.26   255.255.255.248  Backup  VIPABackup
9.67.156.33   255.255.255.248  Backup  VIPABackup
9.67.156.34   255.255.255.248  Backup  VIPABackup
9.67.156.41   255.255.255.248  Backup  VIPABackup
9.67.156.42   255.255.255.248  Backup  VIPABackup
9.67.156.49   255.255.255.248  Active  VIPADefine
9.67.156.50   255.255.255.248  Active  VIPADefine
9.67.157.17   255.255.255.248  Backup  VIPABackup
9.67.157.18   255.255.255.248  Backup  VIPABackup
***

Primary Sysplex Distributor fails

The backup Sysplex Distributor starts the takeover phase. It immediately activates the DVIPAs that are defined as VIPABACKUP in the TCPIP.PROFILE. See the following screen output, obtained using the netstat vipadyn command. The distributed DVIPAs 9.67.157.17 and 9.67.157.18 are taken over as VIPABACKUP DISTributed.

MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP
IP Address    AddressMask      Status  Origination  DistStat
----------    -----------      ------  -----------  --------
9.67.156.25   255.255.255.248  Active  VIPABackup
9.67.156.26   255.255.255.248  Active  VIPABackup
9.67.156.33   255.255.255.248  Backup  VIPABackup
9.67.156.34   255.255.255.248  Backup  VIPABackup
9.67.156.41   255.255.255.248  Backup  VIPABackup
9.67.156.42   255.255.255.248  Backup  VIPABackup
9.67.156.49   255.255.255.248  Active  VIPADefine
9.67.156.50   255.255.255.248  Active  VIPADefine
9.67.157.17   255.255.255.248  Active  VIPABackup   Dist
9.67.157.18   255.255.255.248  Active  VIPABackup   Dist
***

The netstat vipadcfg command issued at the backup Sysplex Distributor shows the following result.

Now the backup Sysplex Distributor is responsible for the taken-over distributed DVIPAs. All ports and the dynamic XCF IP addresses of the available target stacks are displayed.

===> netstat vipadcfg
MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP       17:49:26
Dynamic VIPA Information:
VIPA Backup:
IP Address    Rank
----------    ----
9.67.156.33   000010
9.67.156.34   000010
..........    ......
VIPA Define:
IP Address    AddressMask      Moveable   SrvMgr
----------    -----------      --------   ------
9.67.156.49   255.255.255.248  Immediate  No
9.67.156.50   255.255.255.248  Immediate  No
9.67.157.17   255.255.255.248  Immediate  Yes
9.67.157.18   255.255.255.248  Immediate  Yes
VIPA Distribute:
IP Address    Port   XCF Address
----------    ----   -----------
9.67.157.17   00023  9.67.156.75
9.67.157.17   00023  9.67.156.74
..........    .....  ...........
9.67.157.18   00020  9.67.156.74
9.67.157.18   00020  9.67.156.73
9.67.157.18   00021  9.67.156.74
9.67.157.18   00021  9.67.156.73
VIPA Service Manager:
McastGroup: 224.0.1.2   Port: 01637   Pwd: No

Backup SD informs the target stacks about DVIPAs and ports

The port table obtained with the netstat vdpt command is filled now.

MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP
Dynamic VIPA Distribution Port Table:
Dest IPaddr   DPort  DestXCF Addr  Rdy  TotalConn   WLM
-----------   -----  ------------  ---  ---------   ---
9.67.157.17   00023  9.67.156.74   001  0000000000  01
9.67.157.17   00023  9.67.156.75   001  0000000000  01
9.67.157.17   00080  9.67.156.74   001  0000000000  01
9.67.157.17   00080  9.67.156.75   001  0000000000  01
9.67.157.17   00443  9.67.156.74   001  0000000000  01
9.67.157.17   00443  9.67.156.75   000  0000000000  01
9.67.157.17   00523  9.67.156.74   001  0000000000  01
9.67.157.17   00523  9.67.156.75   001  0000000000  01
9.67.157.18   00020  9.67.156.74   000  0000000000  01
9.67.157.18   00021  9.67.156.74   001  0000000000  01

In the meantime, some more connections were established.

Target stacks provided connection information

MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP
Dynamic VIPA Connection Routing Table:
Dest IPaddr   DPort  Src IPaddr    SPort  DestXCF Addr
-----------   -----  ----------    -----  ------------
9.67.157.17   00023  9.67.156.104  04623  9.67.156.74
9.67.157.17   00023  9.67.156.104  04632  9.67.156.74
9.67.157.17   00023  9.67.156.104  04622  9.67.156.75
9.67.157.17   00023  9.67.156.104  04631  9.67.156.75
***

Backup SD stack propagates to the network

A router display obtained through the show ip casa wildcard det command shows the contents of the wildcard affinity table, including the new interest address of the backup Sysplex Distributor, 9.67.156.76. All packets, even fragmented packets, are accepted by the Sysplex Distributor. The interest address is the dynamic XCF link IP address of the backup Sysplex Distributor, which was propagated to the network in a multicast CASA packet as part of the takeover process. This means that, from now on, the Forwarding Agents have to send CASA requests by unicast to the dynamic XCF link IP address 9.67.156.76 and port 1637. See the highlighted lines in the following display. A dispatch address is not available for wildcard affinities. Don’t be confused by the Service Manager’s IP address 9.67.156.67. This is the IP address of the CLAW interface.

Source Address  Source Mask  Port  Dest Address  Dest Mask        Port  Prot
0.0.0.0         0.0.0.0      0     9.67.157.17   255.255.255.255  523   TCP
Service Manager Details:
Manager Addr: 9.67.156.67        Insert Time: 11:36:27 UTC 07/27/01
Affinity Statistics:
Affinity Count: 0                Interest Packet Timeouts: 0
Packet Statistics:
Packets: 0                       Bytes: 0
Advertise Dest Address: NO       Match Fragments: YES
Affinity TTL: 30
Action Details:
Interest Addr: 9.67.156.76       Interest Port: 1637
Interest Packet: 0x8100 FRAG ALLPKTS
Interest Tickle: 0x0000
Dispatch (Layer 2): NO           Dispatch Address: 0.0.0.0
--More--
Source Address  Source Mask  Port  Dest Address  Dest Mask        Port  Prot
0.0.0.0         0.0.0.0      0     9.67.157.17   255.255.255.255  443   TCP
Service Manager Details:
Manager Addr: 9.67.156.67        Insert Time: 11:36:27 UTC 07/27/01
Affinity Statistics:
Affinity Count: 0                Interest Packet Timeouts: 0
Packet Statistics:
Packets: 0                       Bytes: 0
Advertise Dest Address: NO       Match Fragments: YES
Affinity TTL: 30
Action Details:
Interest Addr: 9.67.156.76       Interest Port: 1637
Interest Packet: 0x8100 FRAG ALLPKTS
Interest Tickle: 0x0000
Dispatch (Layer 2): NO           Dispatch Address: 0.0.0.0
Source Address  Source Mask  Port  Dest Address  Dest Mask        Port  Prot
0.0.0.0         0.0.0.0      0     9.67.157.17   255.255.255.255  80    TCP
Service Manager Details:
Manager Addr: 9.67.156.67        Insert Time: 11:36:27 UTC 07/27/01
Affinity Statistics:
Affinity Count: 0                Interest Packet Timeouts: 0
--More--
Packets: 0                       Bytes: 0
Advertise Dest Address: NO       Match Fragments: YES
Affinity TTL: 30
Action Details:
Interest Addr: 9.67.156.76       Interest Port: 1637
Interest Packet: 0x8100 FRAG ALLPKTS
--More--

On router 7206, nothing has changed for the existing connections. The four TCP connections may be viewed by issuing the command show ip casa affinities. This is the short form of the display.

Source Address  Port  Dest Address  Port  Prot
9.67.156.104    4622  9.67.157.17   23    TCP
9.67.156.104    4623  9.67.157.17   23    TCP
9.67.156.104    4632  9.67.157.17   23    TCP
9.67.156.104    4631  9.67.157.17   23    TCP
C7200-Z55>

The command show ip casa affinities det shows that the backup Sysplex Distributor now maintains these connections. This is indicated by the interest address. See the highlighted line. Note that the following display is an excerpt only.

C7200-Z55> show ip casa aff det
Source Address  Port  Dest Address  Port  Prot
9.67.156.104    4622  9.67.157.17   23    TCP
Action Details:
Interest Addr: 9.67.156.76       Interest Port: 1637
Interest Packet: 0x0002 SYN
Interest Tickle: 0x0000
Dispatch (Layer 2): YES          Dispatch Address: 9.67.156.75
Source Address  Port  Dest Address  Port  Prot
9.67.156.104    4623  9.67.157.17   23    TCP
Action Details:
Interest Addr: 9.67.156.76       Interest Port: 1637
Interest Packet: 0x0002 SYN
Interest Tickle: 0x0000
Dispatch (Layer 2): YES          Dispatch Address: 9.67.156.74
--More--

The takeover process also leaves messages in the z/OS console log. See the following excerpt from the console of system MVS069, the backup Sysplex Distributor.

EZZ8301I VIPA 9.67.156.25 TAKEN OVER FROM TCP ON MVS001
EZZ8301I VIPA 9.67.156.26 TAKEN OVER FROM TCP ON MVS001
EZZ8301I VIPA 9.67.157.17 TAKEN OVER FROM TCP ON MVS001
EZZ8301I VIPA 9.67.157.18 TAKEN OVER FROM TCP ON MVS001

Sysplex Distributor recovery sequence

When the primary Sysplex Distributor is restarted, the recovery process defined for the DVIPAs takes back the resources depending on the VIPADEFINE parameter, MOVEABLE IMMEDIATE or MOVEABLE WHENIDLE.

MOVEABLE IMMEDIATE enables the primary Sysplex Distributor to immediately take back the maintenance of the current TCP connections that the backup Sysplex Distributor had taken over. This takeback happens without interruption of the TCP connection between the client and the application running on a target system. After the takeback phase, the connections no longer appear in the backup Sysplex Distributor’s distributed VIPA connection routing table. They are now maintained in the recovered Sysplex Distributor. All new connection requests are directed to the recovered Sysplex Distributor.

MOVEABLE WHENIDLE indicates that this DVIPA can be moved to another stack when there are no connections for this DVIPA on the current stack. While there are existing connections, any new connection request continues to be directed to the current stack.
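The difference between the two MOVEABLE settings can be summarized as a small decision function. The following Python sketch is a simplified model of the behavior described above, not of the stack's actual implementation; the function name may_move_dvipa and its parameters are hypothetical.

```python
def may_move_dvipa(moveable, connections_on_current_stack):
    """Decide whether a restarted primary stack may take back a DVIPA now.

    moveable: "IMMEDIATE" or "WHENIDLE" (the VIPADEFINE MOVEABLE setting)
    connections_on_current_stack: number of connections for this DVIPA
        on the stack that currently owns it (the backup stack)
    """
    if moveable == "IMMEDIATE":
        # Taken back at once; existing connections are not interrupted.
        return True
    if moveable == "WHENIDLE":
        # Moves only when the current stack has no connections for the DVIPA.
        return connections_on_current_stack == 0
    raise ValueError(f"unknown MOVEABLE setting: {moveable}")


# Four TN3270 connections were active on 9.67.157.17 after the takeover.
print(may_move_dvipa("IMMEDIATE", 4))
print(may_move_dvipa("WHENIDLE", 4))
print(may_move_dvipa("WHENIDLE", 0))
```

With MOVEABLE IMMEDIATE the DVIPA returns to the primary stack even though four connections exist; with MOVEABLE WHENIDLE it stays on the backup stack until those connections have ended.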

[Figure: the restarted Sysplex Distributor (Service Manager) on MVS001 (XCF 9.67.156.73), the backup Sysplex Distributor on MVS069 (XCF 9.67.156.76), and the target stacks MVS062 (XCF 9.67.156.74) and MVS154 (XCF 9.67.156.75), connected over XCF and attached through OSA Express adapters (9.67.157.129 to 9.67.157.132) to a switch and two routers acting as Forwarding Agents, each with an affinity table holding fixed affinities. The DVIPAs are 9.67.157.17 (TN3270) and 9.67.157.18 (FTP). A client in the network logs in to TN3270 on port 23 using the cached IP address 9.67.157.17. The numbered steps include updating the VCRT, deleting the DVIPAs on the backup stack, advertising the DVIPAs to the router, and multicasting wildcard affinities.]

Figure 9-8 Sysplex Distributor recovery sequence

Notes to Figure 9-8:

1. The Distribution Stack (MVS001) is restarted.
2. The Distribution Stack (MVS001) activates DVIPAs.
3. The Distribution Stack (MVS001) informs the backup Distribution Stack (MVS069) and the target stacks (MVS062 and MVS154).
4. The target stacks (MVS062 and MVS154) inform the Distribution Stack (MVS001) about application server status on ports defined for distribution.
5. The backup Distribution Stack (MVS069) sends a table to the Distribution Stack (MVS001) containing all connections currently routed.
6. The backup Distribution Stack (MVS069) cleans up its tables and deletes the DVIPAs given back.
7. The Distribution Stack (MVS001) advertises DVIPAs to the network.

Note: This procedure only works if DVIPAs in MVS001 are defined as MOVEable IMMEDiate (the default) on the VIPADefine statement.

Display documentation of the recovery sequence

The z/OS console of MVS069, the backup Sysplex Distributor, indicates that the DVIPAs are returned to the originator MVS001.

EZZ8303I VIPA 9.67.156.25 GIVEN TO TCP ON MVS001
EZZ8303I VIPA 9.67.156.26 GIVEN TO TCP ON MVS001
EZZ8303I VIPA 9.67.157.17 GIVEN TO TCP ON MVS001
EZZ8303I VIPA 9.67.157.18 GIVEN TO TCP ON MVS001

The z/OS console of MVS001, the primary Sysplex Distributor, indicates that the DVIPAs have returned.

EZZ8302I VIPA 9.67.156.25 TAKEN FROM TCP ON MVS069
EZZ8302I VIPA 9.67.156.26 TAKEN FROM TCP ON MVS069
EZZ8302I VIPA 9.67.157.17 TAKEN FROM TCP ON MVS069
EZZ8302I VIPA 9.67.157.18 TAKEN FROM TCP ON MVS069
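When checking a console log for these takeover and takeback footprints, a small parser can help. The following Python sketch assumes the message layout is exactly as it appears in the displays of this section (message ID, the word VIPA, the address, the action, the stack name, and the system name); the regular expression and the function name are illustrative assumptions, not a documented message grammar.

```python
import re

# Layout assumed from the displays in this section, for example:
#   EZZ8303I VIPA 9.67.156.25 GIVEN TO TCP ON MVS001
_VIPA_MSG = re.compile(
    r"^(?P<id>EZZ830[123]I) VIPA (?P<vipa>\d+(?:\.\d+){3}) "
    r"(?P<action>TAKEN OVER FROM|GIVEN TO|TAKEN FROM) "
    r"(?P<stack>\S+) ON (?P<system>\S+)$"
)


def parse_vipa_message(line: str):
    """Return the message fields as a dict, or None if the line does not match."""
    match = _VIPA_MSG.match(line.strip())
    return match.groupdict() if match else None


def moved_vipas(log_lines, message_id):
    """Collect the VIPA addresses reported under one message ID."""
    parsed = (parse_vipa_message(line) for line in log_lines)
    return [p["vipa"] for p in parsed if p and p["id"] == message_id]
```

For example, feeding the four EZZ8302I lines above to `moved_vipas(lines, "EZZ8302I")` would list the four DVIPAs that MVS001 took back from MVS069.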

DVIPAs are reactivated

DVIPAs are activated as shown in “Backup SD MVS069 displays before the primary SD MVS001 failed” on page 313.

Target stacks provide connection state information

The recovered Sysplex Distributor takes back the maintenance of the four TCP connections.


===> netstat vcrt
MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP
Dynamic VIPA Connection Routing Table:
Dest IPaddr     DPort Src IPaddr      SPort DestXCF Addr
-----------     ----- ----------      ----- ------------
9.67.157.17     00023 9.67.156.104    04623 9.67.156.74
9.67.157.17     00023 9.67.156.104    04632 9.67.156.74
9.67.157.17     00023 9.67.156.104    04622 9.67.156.75
9.67.157.17     00023 9.67.156.104    04631 9.67.156.75
***
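To see at a glance how the connections are spread over the target stacks, the VCRT rows can be grouped by their DestXCF address. The Python sketch below assumes each data row consists of five whitespace-separated fields in the order shown in the display; the function names are hypothetical.

```python
from collections import Counter

# Data rows copied from the netstat vcrt display in this section.
VCRT_ROWS = """\
9.67.157.17 00023 9.67.156.104 04623 9.67.156.74
9.67.157.17 00023 9.67.156.104 04632 9.67.156.74
9.67.157.17 00023 9.67.156.104 04622 9.67.156.75
9.67.157.17 00023 9.67.156.104 04631 9.67.156.75
"""


def parse_vcrt(text: str):
    """Turn VCRT data rows into dicts; header, separator and '***' lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # A data row has exactly five fields and numeric port columns.
        if len(parts) != 5 or not (parts[1].isdigit() and parts[3].isdigit()):
            continue
        dest, dport, src, sport, xcf = parts
        rows.append({"dest": dest, "dport": int(dport),
                     "src": src, "sport": int(sport), "dest_xcf": xcf})
    return rows


def connections_per_target(rows):
    """Count connections per DestXCF address (one address per target stack)."""
    return Counter(row["dest_xcf"] for row in rows)
```

Applied to the rows above, the count shows two connections routed to 9.67.156.74 (MVS062) and two to 9.67.156.75 (MVS154).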

Backup Sysplex Distributor stack cleans up port table

No ports are available because the backup stack is no longer the Service Manager. See the display in “Backup SD MVS069 displays before the primary SD MVS001 failed” on page 313.

===> netstat vdpt
MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP
Dynamic VIPA Distribution Port Table:
Dest IPaddr     DPort DestXCF Addr    Rdy TotalConn WLM
-----------     ----- ------------    --- --------- ---
***

Backup Sysplex Distributor stack cleans up the VIPA connection routing table

===> netstat vcrt
MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP
Dynamic VIPA Connection Routing Table:
Dest IPaddr     DPort Src IPaddr      SPort DestXCF Addr
-----------     ----- ----------      ----- ------------
***

Recovered SD stack advertises DVIPAs to the network

Fixed affinities are still viewable at the Forwarding Agent. In the detailed display, the new interest address is shown.


C7200-Z55>show ip casa aff
Source Address  Port  Dest Address  Port  Prot
9.67.156.104    4622  9.67.157.17   23    TCP
9.67.156.104    4623  9.67.157.17   23    TCP
9.67.156.104    4632  9.67.157.17   23    TCP
9.67.156.104    4631  9.67.157.17   23    TCP
C7200-Z55>

In the following detailed display, the interest IP address is that of the recovered Sysplex Distributor (9.67.156.73).

C7200-Z55>show ip casa aff det
Source Address  Port  Dest Address  Port  Prot
9.67.156.104    4631  9.67.157.17   23    TCP
  Action Details:
    Interest Addr:       9.67.156.73    Interest Port: 1637
    Interest Packet:     0x0002 SYN
    Interest Tickle:     0x0000
    Dispatch (Layer 2):  YES            Dispatch Address: 9.67.156.75
--More--

9.10 Generic Routing Encapsulation (GRE) protocol

There may be situations where IP packets have to be encapsulated with an additional IP header before they are sent over the network. This situation is applicable, for example, when there are multiple logical partitions (LPARs) in a z/OS sysplex environment, each running a TCP/IP stack and sharing one OSA-Express adapter, and each TCP/IP stack has defined distributed dynamic virtual IP addresses (DVIPAs). These DVIPAs are defined in the Sysplex Distributor, and in the backup Sysplex Distributor, for target stacks running various application services.

9.10.1 The need for GRE

Such a situation arises when multiple z/OS LPARs, each running a TCP/IP stack, share one OSA-Express adapter and a distributed dynamic virtual IP address (DVIPA). An MNLB Forwarding Agent distributes IP packets from clients of TCP connections to target application servers. For each connection, the Sysplex Distributor has selected a specific target server from a group of servers within the sysplex, based on load-balancing rules.


Remember, a route between the client and a server located in a z/OS sysplex has a destination IP address that is a distributed dynamic VIPA (DVIPA). The distributed DVIPA is the cluster address representing the application server. For load-balancing purposes, the Sysplex Distributor also defines the distributed DVIPA dynamically in the target stacks. Thus, the same distributed DVIPAs are known in multiple TCP/IP stacks in a sysplex.

The following display is taken from the target system MVS062. It shows the dynamically defined DVIPAs. For example, the IP address 9.67.157.17 for TN3270 and Web services also appears in the HOME list of the application hosts in the target systems MVS062 and MVS154.

=> netstat home
MVS TCP/IP NETSTAT CS V1R2       TCPIP NAME: TCP
Home address list:
Address          Link             Flg
-------          ----             ---
9.67.156.161     VLINK0           P
9.67.157.130     GIGELINK
9.67.156.69      CISCO1
9.67.156.18      CISCO2
9.67.157.243     CISCO3
9.67.156.162     TOVTAM
9.67.156.74      EZAXCFN6
9.67.156.33      VIPL09439C21
9.67.156.34      VIPL09439C22
9.67.156.74      EZAXCFN7
9.67.156.74      EZAXCFN4
9.67.157.17      VIPL09439D11     I
9.67.157.18      VIPL09439D12     I
127.0.0.1        LOOPBACK

A problem arises when a destination IP address, representing this DVIPA, has to be routed by the OSA-Express GbE adapter to the real application server’s IP address. Although the distributed DVIPAs, defined in the Sysplex Distributor with VIPADIST and DESTIP, also appear in the HOME list of all target stacks, they are “hidden” from the network. What does “hidden” mean? Distributed DVIPAs are not propagated by a router daemon and, most importantly, the target stacks do not download them into the OSA-Express adapter’s address table (OAT). Thus, for the target stacks, they cannot be mapped to the 48-bit ISO/OSI layer-2 medium access control (MAC) address.


In our test environment, all LPARs use the OSA-Express adapter in a shared mode. Thus the OSA-Express adapter knows the distributed DVIPAs, but they are owned by the Sysplex Distributor stack only. In our test case the Sysplex Distributor would be the receiver for packets with cluster addresses 9.67.157.17 and 9.67.157.18. Since all packets carrying the cluster address for existing connections would always be sent to the Sysplex Distributor and not to the real target server, the Forwarding Agent has to do some additional work to address the real target server rather than the Sysplex Distributor.

Before we explain how the target addressing problem is solved, we first review how the OAT is created in the OSA-Express adapter. It differs completely from the procedure you might follow when an OSA-2 OAT is defined. When a START DEVICE command is executed by the TCP/IP stack for an OSA-Express adapter that runs in MPCIPA mode (also known as IP assist mode), information is passed from the TCP/IP stack to the adapter, including:

• HCD definitions (channel path ID, control unit, device address, and operation mode (shr)). See the definitions in 6.3.1, “IOCP for OSA-Express devices” on page 184.
• TRLE definitions with a sub-channel address to access the LPAR (read, write, data path, and the port name, which matches the device name of the DEVICE statement that leads to the LINK name). See the definition in 6.3.5, “VTAM and TCP/IP definition” on page 188.
• DEVICE and LINK statement information.
• HOME statements, which list all IP addresses with associated link names known to the stack.

A sample of the OAT shows which IP addresses are downloaded from the TCP/IP stacks. You will discover the differences in the entries for:

• MVS001 as the Sysplex Distributor stack
• MVS062 as one of the target stacks
• MVS069 as the backup Sysplex Distributor

The backup Sysplex Distributor will receive entries for the VIPABACKUP addresses when the backup occurs. The entries for the other target stack MVS154 are not shown because they are similar to MVS062. We have edited the samples in Example 9-17 through Example 9-19 for a better understanding. Our additional text is typed in italic letters. The OAT entries are also not shown in the real sequence. The real sequence is sorted by the sub-channel addresses of the read, write, and data path addresses defined in the TRLE.

OAT for MVS001 (Sysplex Distributor)

Example 9-17 OSA-Express adapter address table (OAT) of Sysplex Distributor

THERE IS DATA FOR 55 OAT(s) ON CHPID F5
START OF OSA ADDRESS TABLE
UA(Dev)   Mode  Port  Entry specific information   Entry  Valid
LP 01 (RALVM9) is the VM system which carries the four z/OS LPARs MVS001, MVS069, MVS062, MVS154
MVS001 Sysplex Distributor (node name: N04)
14(2F14)  MPC   n/a       N04GIG1 (QDIO control)   SIU    ALL
15(2F15)  MPC   n/a       N04GIG1 (QDIO control)   SIU    ALL
16(2F16)  MPC   00   PRI  N04GIG1 (QDIO data)      SIU    ALL
          9.67.156.17  -> MPC CISCO2 MVS001
          9.67.156.66  -> CLAW CISCO1
          9.67.156.2   -> TOVTAM
          9.67.156.73  -> DXCF
          9.67.157.129 -> GIGELINK
          9.67.156.1   -> static VIPA
          9.67.156.25  -> DVIPA
          9.67.156.26  -> DVIPA
          9.67.157.17  -> distributed DVIPA
          9.67.157.18  -> distributed DVIPA

This display shows that z/OS sub-channel addresses 2F14 and 2F15 are the read and write sub-channel addresses for the QDIO control data. The real data transmission goes over the data path 2F16, defined in the TRLE. This line also shows that the path to MVS001 is defined as the primary router (PRI). Compare the equivalent line for the backup Sysplex Distributor, where the path is defined as the secondary router (SEC). You will also not find the distributed DVIPAs 9.67.157.17 and 9.67.157.18 in the entries of the backup Sysplex Distributor, which is consistent with its netstat home display, where they do not appear either. Remember, the distributed DVIPAs are activated on the backup in a backup case only.

The name between the router specification and the QDIO indication is the OSA name. It is the name of the TRLE entry. For example, N04GIG1 is the OAT entry that points to the TRLE name (see “OSA-Express adapter interfaces” on page 264).

When you look at the sub-channel addresses of all LPARs, you will notice that they are unique. This is because the OSA-Express adapter is defined to one system only. This system is a VM system, called RALVM9. The four z/OS systems are guest machines running under VM.


In a pure z/OS environment, without VM, you have four independent LPARs. This allows you to define the same three sub-channel addresses for all LPARs. For example:

2F14  for the read control sub-channel
2F15  for the write control sub-channel
2F16  for the data path

If the port name of the TRLE matches the device name in the TCPIP.PROFILE, then one TRLE may be defined and copied to the VTAMLST of the four different systems.

OAT for MVS062 (target system)

Example 9-18 OSA-Express adapter address table (OAT) of target stack MVS062

MVS062 Target system (node name: N05)
04(2F04)  MPC   n/a       N05GIG1 (QDIO control)   SIU    ALL
05(2F05)  MPC   n/a       N05GIG1 (QDIO control)   SIU    ALL
06(2F06)  MPC   00   No   N05GIG1 (QDIO data)      SIU    ALL
          9.67.156.18  -> MPC CISCO2 MVS062
          9.67.157.243 -> CLAW CISCO3
          9.67.156.162 -> TOVTAM
          9.67.156.74  -> XCF
          9.67.157.130 -> GIGELINK
          9.67.156.161 -> static VIPA
          9.67.156.33  -> DVIPA
          9.67.156.34  -> DVIPA

OAT for MVS069 (backup Sysplex Distributor)

Example 9-19 OSA-Express adapter address table (OAT) of backup Sysplex Distributor

MVS069 Backup Sysplex Distributor (node name: N07)
18(2F18)  MPC   n/a       N07GIG1 (QDIO control)   SIU    ALL
19(2F19)  MPC   n/a       N07GIG1 (QDIO control)   SIU    ALL
1A(2F1A)  MPC   00   SEC  N07GIG1 (QDIO data)      SIU    ALL
          9.67.156.20  -> MPC CISCO2 MVS069
          9.67.157.245 -> CLAW CISCO3
          9.67.156.5   -> TOVTAM
          9.67.156.76  -> DXCF
          9.67.157.132 -> GIGELINK
          9.67.156.5   -> static VIPA
          9.67.156.49  -> DVIPA
          9.67.156.50  -> DVIPA
-------------------------------------------------------------------------------
*** Legend for abbreviations ***
* Entry column / Valid column
* ----------------------
* S - Started / OSA - Does not exist in IOCDS, but is on OSA
* NS - Not started / CSS - Exists only in Channel Subsystem (IOCDS)
* SIU - Started & in use / ALL - Exists on the OSA and in IOCDS
* N/A - Not Applicable
* R - Rejected (see messages following this legend)
*
* Entry Specific Information
* ----------------------
* Passthru - Default entry, Home IP address & Netmask
* SNA - VTAM IDNUM (if port number is FF)
* MPC (IP or IPX) - OSA name
* MPC (QDIO Control) - OSA name
* MPC (QDIO Data) - Default entry & OSA name

CHPID F5 RETURNED THE FOLLOWING OAT REASON MESSAGES
None
***************************** End of Query data ************************

IP assist uses the downloaded information to create the OSA Address Table (OAT). This enables the OSA adapter to recognize IP addresses that are mapped to the MAC address of the adapter. If the OSA-Express GbE adapter runs in a shared mode, IP assist also uses the corresponding information, such as all IP addresses in the HOME lists, of the other TCP/IP stacks.

Now we return to the addressing situation caused by the fact that the destination IP address for all TCP connection data to the server is the cluster address. The only choice for the OSA-Express would be to send the packet to the Sysplex Distributor, which would check the 5-tuple information, determine by scanning the VIPA connection routing table (VCRT) to which TCP connection this packet belongs, and forward it to the correct target system. But this would end up being the non-MNLB Sysplex Distributor solution. It is not the preferred approach for a sysplex environment with MNLB. Therefore, another solution has to be made available to overcome the problem of distributed DVIPAs on a shared OSA adapter.
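The addressing problem can be made concrete with a toy lookup in Python. The table below is a subset of the OAT entries from Examples 9-17 through 9-19; treating the OAT as a simple address-to-stack dictionary, and falling back to the stack flagged PRI for unknown addresses, are simplifying assumptions made for illustration only.

```python
# Subset of the OAT entries from Examples 9-17 to 9-19: IP address -> owning stack.
OAT = {
    "9.67.156.1": "MVS001",    # static VIPA
    "9.67.157.129": "MVS001",  # GIGELINK
    "9.67.157.17": "MVS001",   # distributed DVIPA, downloaded by the distributor only
    "9.67.157.18": "MVS001",   # distributed DVIPA, downloaded by the distributor only
    "9.67.156.161": "MVS062",  # static VIPA
    "9.67.157.130": "MVS062",  # GIGELINK
    "9.67.156.5": "MVS069",    # static VIPA
    "9.67.157.132": "MVS069",  # GIGELINK
}

# Assumption: addresses not found in the table go to the stack flagged PRI.
PRIMARY_ROUTER_STACK = "MVS001"


def receiving_stack(dest_ip: str) -> str:
    """Return the stack that the shared adapter would hand the packet to."""
    return OAT.get(dest_ip, PRIMARY_ROUTER_STACK)
```

In this model a packet addressed to the cluster address 9.67.157.17 always lands on MVS001, even if the connection belongs to a server on MVS062, while a packet addressed to 9.67.156.161 reaches MVS062 directly.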

9.10.2 Search for a shared OSA-Express solution

One approach could be to encapsulate the IP packet received by the Forwarding Agent into another IP packet whose destination IP address is that of the real target system. The true destination address is known by the Forwarding Agent for every IP packet associated with the TCP connection. The target IP address was provided by the Sysplex Distributor’s Service Manager through a fixed affinity at connection establishment, in response to the CASA packet carrying the SYN sent to the Service Manager.


The protocol for encapsulating the original IP packet in a delivery IP packet is called the Generic Routing Encapsulation (GRE) protocol. This protocol is used for routing IP packets from the Forwarding Agent to the correct target system in a z/OS sysplex with multiple LPARs sharing one OSA-Express GbE adapter. Without GRE, a packet arriving at the OSA adapter would be forwarded to a default mapped IP address, which might not be the correct target system.

Note: In our case, GRE is required only for inbound packets from the Forwarding Agent to the target system. The path back to the client doesn't need GRE, since the route to the client can be determined correctly because the destination address is the unique client address.

9.10.3 Generic Routing Encapsulation (GRE) overview

A general description of GRE is found in RFC 1701. RFC 1702 is about using GRE over IPv4 networks. GRE is used in cases where a system has a packet that needs to be encapsulated and delivered to some destination. In our case, it is the MNLB Forwarding Agent that encapsulates and delivers the packet to the real target system. The original packet received by the Forwarding Agent is called a payload packet. The packet around the payload packet is called the delivery packet.

Supported payload protocols

GRE was designed to encapsulate any of the many networking protocols identified by the Ethernet protocol types shown in Table 9-1:

Table 9-1 List of supported protocol types (excerpt only)

Protocol Family                Protocol Type
---------------                -------------
SNA                            0004
OSI network layer              00FE
XNS                            0600
IP                             0800
RFC 826 ARP                    0806
Frame Relay ARP                0808
VINES                          0BAD
DECnet (Phase IV)              6003
Ethertalk (Appletalk)          809B
Novell IPX                     8137
RFC 1144 TCP/IP compression    876B
Secure Data                    876D

Overall packet structure

An additional header, the GRE header, has to be added to let the receiver of the delivery packet know what Ethernet type of protocol is encapsulated. The structure of the entire encapsulated packet is shown in Figure 9-9.

| Delivery Header | GRE Header | Payload Packet |

Figure 9-9 Overall packet structure

The delivery header consists of the 20-byte IPv4 header. The length of the GRE header depends on the functions used. For our tests we used protocol type IP (0x0800) only. The length of the payload packet depends on the data to be transmitted.
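The packet layout described above can be sketched in Python as follows. This is a minimal illustration, not the router's implementation: it builds a 20-byte IPv4 delivery header without options, a 4-byte GRE header with all flag bits off and protocol type 0x0800, and appends the payload unchanged. IP protocol number 47 is the value assigned to GRE; the function names, the TTL default, and the zeroed identification field are assumptions.

```python
import socket
import struct

GRE_PROTO_IP = 0x0800  # payload protocol type used in the tests described here
IPPROTO_GRE = 47       # IP protocol number assigned to GRE


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum over 16-bit words of an even-length IPv4 header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def gre_encapsulate(payload: bytes, src: str, dst: str, ttl: int = 255) -> bytes:
    """Wrap payload in a minimal GRE header and an IPv4 delivery header."""
    gre = struct.pack("!HH", 0, GRE_PROTO_IP)  # flags/version = 0, protocol type
    total_len = 20 + len(gre) + len(payload)
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, total_len,   # version 4, header length 5 words, TOS, total length
        0, 0,                 # identification, flags/fragment offset (assumed zero)
        ttl, IPPROTO_GRE, 0,  # TTL, protocol, checksum placeholder
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    checksum = struct.pack("!H", ipv4_checksum(header))
    return header[:10] + checksum + header[12:] + gre + payload
```

For instance, `gre_encapsulate(payload, "9.67.157.137", "9.67.156.1")` would produce a delivery packet from the GbE address of router 7507 to the static VIPA of MVS001, with the payload 24 bytes into the packet.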

GRE header structure

Figure 9-10 shows the structure of the GRE header.

| C | R | K | S | s | Recur | Flags | Ver | Protocol Type     |
| Checksum (optional)                     | Offset (optional) |
| Key (optional)                                              |
| Sequence Number (optional)                                  |
| Routing (optional)                                          |

Figure 9-10 GRE header structure

The descriptions of the GRE fields may be obtained from RFC 1701. The length of the GRE header depends on the functions used, which are selected by switching on the flag bits. Functions are:

Checksum          Calculated checksum of the GRE header and payload packet
Key               Used for authenticating the sender
Sequence number   To control the sequence of incoming GRE and payload packets
Routing           To determine a source route path

Payload packet structure

The payload packet structure is the usual IP packet containing the IPv4 header, TCP or UDP header, and data. In our case, we used the TCP header only, since the Service Manager in the Sysplex Distributor supports the TCP protocol only.

Forwarding and processing of GRE packets

The MNLB Forwarding Agent, in our test case implemented in Cisco routers 7507 and 7206VXR, uses RFC 1701 and RFC 1702 to encapsulate IP packets received from the client and forwards an IPv4 delivery packet if the path goes outbound over the Gigabit Ethernet interfaces. This means that GRE tunnels have to be defined for the GbE interfaces in both routers. The GRE tunnels have to be defined in the Cisco routers for outbound traffic to the sysplex only. Within the sysplex no tunnel definitions are required on any TCP/IP stack, either in the Sysplex Distributor or in the target stacks. For definitions in the Cisco routers, see 9.10.4, “Definitions in the Cisco routers 7507 and 7206” on page 337.

The payload packet has an Ethernet type of x'0800'. In our case it is the IP packet received by the Forwarding Agent, sent from the client. So GRE is used to encapsulate an IP packet in another IP packet according to RFC 1702.

Normally, a system that is forwarding delivery packets will not differentiate GRE packets from other packets in any way. This system supports GRE only, because it is enabled by the Cisco routers. However, a GRE packet may be received by a system, for example our z/OS system. This system must recognize that an encapsulated IP packet has been received. z/OS V1.2 supports GRE and is thus able to unpack the delivery packet. Once the delivery packet is recognized, the GRE contents are processed. For example:

• If the C-bit is set on, a checksum is recalculated and compared.
• If the R-bit is set on, the routing fields and the offset are available.
• If the K-bit is set on, the authentication key of the sender is checked.
• If the S-bit is set on, a sequence number field is present and may be checked with previous sequence numbers.
• If the s-bit is set on, a strict source routing path is defined.


Further information may be obtained from RFCs 1701 and 1702. As you will see later, none of the referenced bits in the GRE header were switched on by the Forwarding Agent in our test case.

What destination IP address in the delivery header?

This is a question that has to be answered before defining the tunnel in the two Cisco routers. The tunnel end determines the destination IP address for the delivery header.

Delivery packet IP header:   Source IP addr. = router IP address of GbE adapter
                             Dest. IP addr.  = ?????????????
GRE header:                  Protocol Type   = 0x0800
Payload packet IP header:    Source IP addr. = client IP address
                             Dest. IP addr.  = cluster IP address

Figure 9-11 What destination IP address for the delivery packet

Because the current documentation did not indicate what IP address is most suitable for the tunnel end, we had to find it out through tests. We tested:

• The IP address of the dynamic XCF link. This did not work. A PING timed out. OSPF has routes via the XCF to other stacks only.
• The IP address of any hardware interface. This showed the same characteristics. A PING timed out.
• A DVIPA. This seems not to be suitable.
• A static VIPA. This was a good choice. The PING worked successfully.

Delivery packet IP header:   Source IP addr. = router IP address of GbE adapter
                             Dest. IP addr.  = static VIPA of target system
GRE header:                  Protocol Type   = 0x0800
Payload packet IP header:    Source IP addr. = client IP address
                             Dest. IP addr.  = cluster IP address

Figure 9-12 Delivery packet

The test clarified that the tunnel end works with the static VIPA of the target system. This means a tunnel has to be defined in each Cisco router with:

• Source IP address of the GbE interface of the router
• Destination IP address of the static VIPA of the system to be reached


This means that for our tests four tunnels were needed from router 7507, one to each host in the sysplex. Four tunnels are also needed from router 7206. The delivery packet is sent via a tunnel from the router over the Gigabit Ethernet, a switch, and the OSA-Express adapter to the TCP/IP stack containing the static VIPA address of the desired host. Figure 9-13 shows the four tunnels from router 7507. The other four tunnels from router 7206 are indicated only.

[Figure 9-13 (diagram): router 7507 (Forwarding Agent, tunnel source 9.67.157.137) and router 7206 (Forwarding Agent, tunnel source 9.67.157.136) connect through a switch and Gigabit Ethernet to the shared OSA-Express. Four tunnels lead to four hosts: Tunnel1 to MVS001 (Sysplex Distributor, XCF 9.67.156.73, static VIPA 9.67.156.1, OSA-Express 9.67.157.129), Tunnel62 to MVS062 (XCF 9.67.156.74, static VIPA 9.67.156.161, OSA-Express 9.67.157.130), Tunnel69 to MVS069 (backup Sysplex Distributor, XCF 9.67.156.76, static VIPA 9.67.156.5, OSA-Express 9.67.157.132), and Tunnel154 to MVS154 (XCF 9.67.156.75, static VIPA 9.67.156.165, OSA-Express 9.67.157.131). All stacks carry DVIPA TN3270 9.67.157.17 and DVIPA FTP 9.67.157.18, as backup DVIPAs on MVS069. A TN3270 client logs in on port 23 with the cached IP address 9.67.157.17.]

Figure 9-13 Tunnels over the shared OSA-Express


One question is still open. How does the Forwarding Agent know whether it should take the destination IP address of Tunnel1, Tunnel62, Tunnel69, or Tunnel154? The answer can only be that the Forwarding Agent knows the forwarding address for the TCP connection, which is the dynamic XCF link IP address. If a static route is defined in the router pointing to the dynamic XCF link IP address using the tunnel name, such as Tunnel1, as the first hop, then the delivery packet will reach the correct destination.

[Figure 9-14 (diagram): the same configuration as Figure 9-13, with four tunnels and four static routes to four hosts. Each router (7507 with tunnel source 9.67.157.137, 7206 with tunnel source 9.67.157.136) has a static route using Tunnel1 to MVS001 (XCF 9.67.156.73, static VIPA 9.67.156.1), a static route using Tunnel62 to MVS062 (XCF 9.67.156.74, static VIPA 9.67.156.161), a static route using Tunnel69 to MVS069 (XCF 9.67.156.76, static VIPA 9.67.156.5), and a static route using Tunnel154 to MVS154 (XCF 9.67.156.75, static VIPA 9.67.156.165).]

Figure 9-14 Static routes using GRE tunnels

Another question may arise: What destination IP address will be taken by the Forwarding Agent when a SYN request is received or if there is no fixed affinity for an existing connection? In both cases, the Forwarding Agent has the forwarding address of the Sysplex Distributor, which is the dynamic XCF link IP address:

• In the wildcard affinity for the SYN
• In the wildcard affinity for cluster addresses and port numbers

Therefore, tunnels and static routes also have to be defined from each Forwarding Agent to the Sysplex Distributor with Service Manager and the corresponding backup Sysplex Distributor.
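The two-step resolution described above (forwarding address to tunnel, tunnel to delivery addresses) can be sketched in Python. The values are taken from the router 7507 definitions in Examples 9-20 and 9-21; the dictionary names and the function are illustrative assumptions, and the real router performs this through its routing table rather than through such a lookup.

```python
# Static routes in router 7507: dynamic XCF link address -> tunnel name.
STATIC_ROUTES = {
    "9.67.156.73": "Tunnel1",    # MVS001, Sysplex Distributor
    "9.67.156.74": "Tunnel62",   # MVS062, target stack
    "9.67.156.75": "Tunnel154",  # MVS154, target stack
    "9.67.156.76": "Tunnel69",   # MVS069, backup Sysplex Distributor
}

# Tunnel definitions in router 7507: tunnel name -> (tunnel source, tunnel destination).
TUNNELS = {
    "Tunnel1": ("9.67.157.137", "9.67.156.1"),
    "Tunnel62": ("9.67.157.137", "9.67.156.161"),
    "Tunnel69": ("9.67.157.137", "9.67.156.5"),
    "Tunnel154": ("9.67.157.137", "9.67.156.165"),
}


def delivery_addresses(forwarding_addr: str):
    """Map a forwarding (XCF) address to (tunnel, delivery source, delivery destination)."""
    tunnel = STATIC_ROUTES[forwarding_addr]
    source, destination = TUNNELS[tunnel]
    return tunnel, source, destination
```

A fixed affinity with dispatch address 9.67.156.75 therefore resolves to Tunnel154 and a delivery packet addressed to the static VIPA 9.67.156.165, while a SYN matched by a wildcard affinity for 9.67.156.73 resolves to Tunnel1 and the Sysplex Distributor.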

9.10.4 Definitions in the Cisco routers 7507 and 7206

• One tunnel is needed for each LPAR sharing the OSA adapter:
  – The source IP address of the tunnel is the Cisco GbE adapter address
  – The destination IP address of the tunnel end is the static VIPA address of the z/OS host
• For each dynamic XCF link IP address a static route is needed to the XCF link address using the tunnel.

Delivery packet IP header:   Source IP addr. = router IP address of GbE adapter
                             Dest. IP addr.  = static VIPA of target system
GRE header:                  Protocol Type   = 0x0800
Payload packet IP header:    Source IP addr. = client IP address
                             Dest. IP addr.  = cluster IP address

Figure 9-15 Complete delivery packet

Definitions in router 7507

Example 9-20 Tunnel definitions

interface Tunnel1
 description GRE tunnel to MVS001
 ip address 7.7.7.1 255.255.255.252
 no ip route-cache cef
 no ip route-cache
 no ip mroute-cache
 tunnel source 9.67.157.137
 tunnel destination 9.67.156.1
!
interface Tunnel62
 description GRE tunnel to MVS062
 ip address 8.8.8.1 255.255.255.252
 tunnel source 9.67.157.137
 tunnel destination 9.67.156.161
!
interface Tunnel69
 description GRE tunnel to MVS069
 ip address 11.11.11.1 255.255.255.252
 tunnel source 9.67.157.137
 tunnel destination 9.67.156.5
!
interface Tunnel154
 description GRE tunnel to MVS154
 ip address 12.12.12.1 255.255.255.252
 tunnel source 9.67.157.137
 tunnel destination 9.67.156.165

The tunnel is defined through an interface command with the parameters shown in Table 9-2. To keep the test simple, we did not use tunnel checksum, tunnel key, tunnel mode (GRE is the default), or tunnel sequence-datagrams.

Table 9-2 Tunnel definition through interface command

Command / Parameter: interface tunnel1
Description: Describes a tunnel name, which we used later for the static route. In our case it was tunnel1 from the router to MVS001.

Command / Parameter: description
Description: Allows text for documentation.

Command / Parameter: ip address
Description: This line is not used for this kind of tunnel to the z/OS hosts. In other situations, for example for VPNs, it is required.

Command / Parameter: tunnel source
Description: Defines the IP address where the tunnel starts. This address is used as the source address in the delivery packet.

Command / Parameter: tunnel destination
Description: Defines the host name or IP address of the end of the tunnel. This address is used as the destination address in the delivery packet.

Command / Parameter: tunnel checksum
Description: A checksum is created by the sender and transmitted in the GRE header to the receiver at the other end of the tunnel. The receiving IP stack recalculates the checksum and compares it with the sender’s. If there is a transmission error, the GRE header and the payload packet will not be forwarded.

Command / Parameter: tunnel key
Description: Is a kind of password, consisting only of numbers from 0 to 4294967295.

Command / Parameter: tunnel mode
Description: Describes the protocol used for the tunnel. In our case it was GRE.

Command / Parameter: tunnel sequence-datagrams
Description: The sender sets sequence numbers in the GRE header. The receiver may control out-of-order packets with this information.

The tunnel source IP address will become the source IP address in the delivery packet. The tunnel destination IP address will become the destination IP address in the delivery packet.

Example 9-21 Static routes in router 7507

ip route 9.67.156.73 255.255.255.255 Tunnel1
ip route 9.67.156.74 255.255.255.255 Tunnel62
ip route 9.67.156.75 255.255.255.255 Tunnel154
ip route 9.67.156.76 255.255.255.255 Tunnel69

The static routes from router 7507 to the dynamic XCF link addresses of the four stacks use the tunnel names. Each tunnel has as its endpoint the static VIPA defined in the corresponding z/OS host stack.

Example 9-22 Routing table in router 7507

NIVT7507#show ip route
Codes: C - connected, S - static, I - IGRP, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
       i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, ia - IS-IS inter area
       * - candidate default, U - per-user static route, o - ODR
       P - periodic downloaded static route

Gateway of last resort is not set

     1.0.0.0/32 is subnetted, 1 subnets
C       1.1.1.1 is directly connected, CASA1
     9.0.0.0/8 is variably subnetted, 51 subnets, 4 masks
C       9.67.156.144/28 is directly connected, Serial4/0/0
O E2    9.67.156.128/28 [110/20] via 9.67.156.149, 01:47:31, Serial4/0/0
C       9.67.157.128/28 is directly connected, Port-channel1
C       9.67.156.164/30 is directly connected, Tunnel154
O E1    9.67.156.165/32 [110/6] via 9.67.157.131, 01:47:31, Port-channel1
C       9.67.156.160/30 is directly connected, Tunnel62
O E1    9.67.156.161/32 [110/6] via 9.67.157.130, 01:47:32, Port-channel1
S       9.67.156.174/32 [1/0] via 9.67.156.174, Channel6/2
S       9.67.156.172/32 [1/0] via 9.67.156.172, Channel6/2
S       9.67.156.173/32 [1/0] via 9.67.156.173, Channel6/2
S       9.67.156.170/32 [1/0] via 9.67.156.170, Channel6/2
S       9.67.156.171/32 [1/0] via 9.67.156.171, Channel6/2
        9.67.156.68/32 [110/5] via 9.67.157.131, 01:52:32, Port-channel1
        9.67.156.69/32 [110/5] via 9.67.157.130, 01:52:32, Port-channel1
        9.67.156.66/32 [110/5] via 9.67.157.129, 01:52:33, Port-channel1
        9.67.156.67/32 [110/5] via 9.67.157.132, 01:52:33, Port-channel1
        9.67.156.65/32 [110/5] via 9.67.157.136, 01:52:34, Port-channel1
S       9.67.156.76/32 is directly connected, Tunnel69
S       9.67.156.74/32 is directly connected, Tunnel62
S       9.67.156.75/32 is directly connected, Tunnel154
O E1    9.67.156.72/29 [110/6] via 9.67.157.130, 01:52:36, Port-channel1
                       [110/6] via 9.67.157.131, 01:52:36, Port-channel1
                       [110/6] via 9.67.157.132, 01:52:37, Port-channel1
S       9.67.156.73/32 is directly connected, Tunnel1

The display shows that the static route to 9.67.156.76, for example, is directly connected through Tunnel69. Tunnel69 is the interface name used in the tunnel definition.

Definitions in router 7206

These definitions are similar to the definitions in router 7507. The only difference is the source IP address of the tunnel, which becomes the source IP address of the delivery packet.

Example 9-23 Tunnel definitions of router 7206
interface Tunnel1
 description GRE tunnel to MVS001
 ip address 3.3.3.1 255.255.255.252
 tunnel source 9.67.157.136
 tunnel destination 9.67.156.1
!
interface Tunnel62
 description GRE tunnel to MVS062
 ip address 6.6.6.1 255.255.255.252
 tunnel source 9.67.157.136
 tunnel destination 9.67.156.161
!
interface Tunnel69
 description GRE tunnel to MVS069
 ip address 4.4.4.1 255.255.255.252
 tunnel source 9.67.157.136
 tunnel destination 9.67.156.5
!
interface Tunnel154
 description GRE tunnel to MVS154
 ip address 5.5.5.5 255.255.255.252
 tunnel source 9.67.157.136
 tunnel destination 9.67.156.165
!

These definitions do not differ from the ones in router 7507:

Example 9-24 Static routes in the router 7206
ip route 9.67.156.73 255.255.255.255 Tunnel1
ip route 9.67.156.74 255.255.255.255 Tunnel62
ip route 9.67.156.75 255.255.255.255 Tunnel154
ip route 9.67.156.76 255.255.255.255 Tunnel69
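Because the tunnel definitions of the two routers differ only in the tunnel source address, the stanzas of Examples 9-23 and 9-24 could be produced from a single table. The following Python sketch illustrates that idea. The table values are copied from the examples above; the names `TUNNELS`, `render_tunnel`, `render_static_route`, and `render_config` are hypothetical and not part of any IBM or Cisco tool.

```python
# Illustrative sketch only: renders the GRE tunnel and static route
# stanzas shown in Examples 9-23 and 9-24 from one table.
# All helper names are hypothetical.

# (tunnel number, host, tunnel IP address, static VIPA, static route target)
TUNNELS = [
    (1, "MVS001", "3.3.3.1", "9.67.156.1", "9.67.156.73"),
    (62, "MVS062", "6.6.6.1", "9.67.156.161", "9.67.156.74"),
    (69, "MVS069", "4.4.4.1", "9.67.156.5", "9.67.156.76"),
    (154, "MVS154", "5.5.5.5", "9.67.156.165", "9.67.156.75"),
]


def render_tunnel(number, host, address, vipa, source):
    """Return one 'interface Tunnel' stanza; the tunnel ends at the static VIPA."""
    return "\n".join([
        f"interface Tunnel{number}",
        f" description GRE tunnel to {host}",
        f" ip address {address} 255.255.255.252",
        f" tunnel source {source}",
        f" tunnel destination {vipa}",
        "!",
    ])


def render_static_route(target, number):
    """Return the host route that points a stack's address at its tunnel."""
    return f"ip route {target} 255.255.255.255 Tunnel{number}"


def render_config(source):
    """Render all stanzas for a router whose tunnel source address is 'source'."""
    stanzas = [render_tunnel(n, h, a, v, source) for n, h, a, v, _ in TUNNELS]
    routes = [render_static_route(t, n) for n, _, _, _, t in TUNNELS]
    return "\n".join(stanzas + routes)


if __name__ == "__main__":
    # 9.67.157.136 is the tunnel source used by router 7206 in Example 9-23.
    print(render_config("9.67.157.136"))
```

Calling `render_config` with a different source address would produce the corresponding stanzas for another router, under the assumption that its tunnel numbers and addresses are otherwise identical.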

Trace examples

The following trace samples were taken from an IPCS packet trace of system MVS069. The trace shows:
▪ First the GRE header
▪ Then the delivery header
▪ Finally all together, including the payload packet

GRE header analysis

Generic Routing Encapsulation Header
GRE Options     :
Version         : 0    Protocol: IP
Recursion       : 0
Gre header size : 4
GRE Header      : 4
000000 00000800

Figure 9-16 GRE header

In analyzing this header, you see that the Cisco router did not build a checksum for the GRE header, used no source routing, and provided no authentication. The only information in this header is the definition of the payload protocol with its Ethernet type value x'0800', which stands for IP.

Chapter 9. Load distribution with MNLB and Sysplex Distributor


The GRE header sent by the Cisco router consists of 4 bytes:

Byte 0:         00
Bits:           0000 0000
                Checksum present
                Routing present
                Key present
                Sequence number present
                Strict source route
                Recursion control

Byte 1:         00
Bits:           0000 0000
                Flags
                Version

Byte 2 and 3:   0800
Bits:           0000 1000 0000 0000
                Protocol type

Figure 9-17 GRE header analysis
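The bit-by-bit analysis in Figure 9-17 can be reproduced with a few lines of Python. This is a minimal sketch, assuming the field order shown in the figure (which matches the GRE header layout of RFC 1701); the function name `parse_gre_header` and the dictionary keys are hypothetical. The input bytes are the `00000800` value from the trace.

```python
import struct


def parse_gre_header(raw: bytes) -> dict:
    """Decode the first four bytes of a GRE header into its fields."""
    # Bytes 0-1 carry the flag bits and the version; bytes 2-3 the protocol type.
    flags_version, protocol_type = struct.unpack("!HH", raw[:4])
    return {
        "checksum_present": bool(flags_version & 0x8000),
        "routing_present": bool(flags_version & 0x4000),
        "key_present": bool(flags_version & 0x2000),
        "sequence_number_present": bool(flags_version & 0x1000),
        "strict_source_route": bool(flags_version & 0x0800),
        "recursion_control": (flags_version >> 8) & 0x07,
        "flags": (flags_version >> 3) & 0x1F,
        "version": flags_version & 0x07,
        "protocol_type": protocol_type,
    }


# GRE header bytes as shown in the IPCS trace (Figure 9-16).
gre = parse_gre_header(bytes.fromhex("00000800"))

if __name__ == "__main__":
    for name, value in gre.items():
        print(f"{name:24}: {value}")
```

With the traced header, every flag bit decodes as off and the protocol type decodes as x'0800', which agrees with the analysis above.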

Delivery header

-----------------------------------------------------------------------------
503 MVS154 PACKET 00000001 16:31:46.482764 Packet Trace
From Link     : GIGELINK         Device: QDIO Ethernet    Full=64
Tod Clock     : 2001/07/27 16:31:46.482763
Lost Records  : 0                Flags: Pkt Ver2 Gre
Source Port   : 0                Dest Port: 0    Asid: 0034    TCB: 0000000
IpHeader: Version : 4            Header Length: 20
Tos           : 30               QOS: Priority MinimumDelay
Packet Length : 64               ID Number: 4AC1
Fragment      :                  Offset: 0
TTL           : 254              Protocol: GRE    CheckSum: 24EA
Source        : 9.67.157.136
Destination   : 9.67.156.165
IP Header     : 20
000000 45300040 4AC10000 FE2F24EA 09439D88 09439CA5

Figure 9-18 Delivery header

This delivery packet was received at host MVS154, a target system. The packet was sent over the GbE link from router 7206.
▪ Tunnel source was 9.67.157.136, which is the IP address of the GbE interface of router 7206.
▪ Tunnel destination was 9.67.156.165, which is the static VIPA address of the z/OS host MVS154.
▪ Protocol in the IP header is GRE (the tenth byte of the IP header, at offset 9, with value x'2F').
▪ The length of the delivery packet is 64 bytes, consisting of:
  – 20 bytes delivery header
  – 4 bytes GRE header
  – 40 bytes payload, consisting of:
    • 20 bytes IP header
    • 20 bytes TCP header
    • No data, because the client transmitted only an ACK, FIN segment to initiate closing the TCP connection
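The values listed above can be checked directly against the hex dump of the delivery header. The following Python sketch is illustrative only: it unpacks the 20 traced bytes, verifies the header checksum, and repeats the length arithmetic. The helper names `parse_ipv4_header` and `header_checksum_ok` are hypothetical; the 4-byte GRE header size is taken from the trace.

```python
import ipaddress
import struct

# Delivery header bytes copied from the hex dump in Figure 9-18.
DELIVERY_HEADER = bytes.fromhex("45300040 4AC10000 FE2F24EA 09439D88 09439CA5")
GRE_HEADER_SIZE = 4   # "Gre header size : 4" in the trace
IPPROTO_GRE = 0x2F    # protocol value that identifies GRE


def parse_ipv4_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte part of an IPv4 header."""
    (ver_ihl, tos, total_length, ident, frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_length": (ver_ihl & 0x0F) * 4,
        "tos": tos,
        "total_length": total_length,
        "ident": ident,
        "ttl": ttl,
        "protocol": protocol,
        "checksum": checksum,
        "source": str(ipaddress.IPv4Address(src)),
        "destination": str(ipaddress.IPv4Address(dst)),
    }


def header_checksum_ok(header: bytes) -> bool:
    """A valid IPv4 header sums to 0xFFFF in one's-complement arithmetic."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


delivery = parse_ipv4_header(DELIVERY_HEADER)
# Payload = total length minus delivery header minus GRE header.
payload_length = delivery["total_length"] - delivery["header_length"] - GRE_HEADER_SIZE

if __name__ == "__main__":
    print(delivery)
    print("payload bytes:", payload_length)
    print("checksum valid:", header_checksum_ok(DELIVERY_HEADER))
```

The sketch assumes a GRE header without optional fields, which holds for this trace because all GRE flag bits are zero.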

Entire delivery packet

The following trace record shows the IPCS output in a different order than IPCS provides it. The first part is the delivery header with its source address 9.67.157.136, which is the router’s GbE adapter address, and its destination address 9.67.156.165, which is the static VIPA address of the target system MVS154. The next block is the GRE header, which indicates only the IP protocol, since we did not define other GRE parameters. The last part is the payload with:

IP header
  Source: 9.67.156.104, which is the client
  Destination: 9.67.157.17, which is the cluster address for the application
TCP header
  Source port, which is the client port
  Destination port, which is the port of the TN3270 server


-----------------------------------------------------------------------------
503 MVS154 PACKET 00000001 16:31:46.482764 Packet Trace
From Link     : GIGELINK         Device: QDIO Ethernet    Full=64
Tod Clock     : 2001/07/27 16:31:46.482763
Lost Records  : 0                Flags: Pkt Ver2 Gre
Source Port   : 0                Dest Port: 0    Asid: 0034    TCB: 0000000
IpHeader: Version : 4            Header Length: 20
Tos           : 30               QOS: Priority MinimumDelay
Packet Length : 64               ID Number: 4AC1
Fragment      :                  Offset: 0
TTL           : 254              Protocol: GRE    CheckSum: 24EA
Source        : 9.67.157.136
Destination   : 9.67.156.165
Generic Routing Encapsulation Header
GRE Options     :
Version         : 0              Protocol: IP
Recursion       : 0
Gre header size : 4
IP Header     : 20
000000 45300040 4AC10000 FE2F24EA 09439D88 09439CA5
GRE Header    : 4
000000 00000800
IpHeader: Version : 4            Header Length: 20
Tos           : 30               QOS: Priority MinimumDelay
Packet Length : 40               ID Number: 1591
Fragment      : DontFragment     Offset: 0
TTL           : 127              Protocol: TCP    CheckSum: 9A0F
Source        : 9.67.156.104
Destination   : 9.67.157.17
TCP
Source Port     : 4202 ()        Destination Port: 23 (telnet)
Sequence Number : 1118528899     Ack Number: 1627793504
Header Length   : 20             Flags: Ack Fin
Window Size     : 17149          CheckSum: DEC0 FFFF    Urgent Data Pointer:
IP Header     : 20
IP: 9.67.156.104, 9.67.157.17
000000 45300028 15914000 7F069A0F 09439C68 09439D11
Protocol Header : 20
Port: 4202, 23
000000 106A0017 42AB6583 61062860 501142FD DEC00000

Figure 9-19 Entire delivery packet
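The layering shown in Figure 9-19 can be walked programmatically. The following Python sketch is an illustration under stated assumptions: it joins the four hex dumps of the trace into one 64-byte packet, splits it into delivery header, GRE header, payload IP header, and TCP header, and decodes the TCP fields. The names `split_layers` and `parse_tcp_header` are hypothetical, and the GRE header is assumed to be 4 bytes long, as it is in this trace.

```python
import ipaddress
import struct

# The four hex dumps of Figure 9-19, in on-the-wire order.
PACKET = bytes.fromhex(
    "45300040 4AC10000 FE2F24EA 09439D88 09439CA5"  # delivery header
    "00000800"                                      # GRE header
    "45300028 15914000 7F069A0F 09439C68 09439D11"  # payload IP header
    "106A0017 42AB6583 61062860 501142FD DEC00000"  # payload TCP header
)

# TCP flag bits, listed from the highest bit to the lowest.
TCP_FLAGS = [("Urg", 0x20), ("Ack", 0x10), ("Psh", 0x08),
             ("Rst", 0x04), ("Syn", 0x02), ("Fin", 0x01)]


def split_layers(packet: bytes):
    """Split a GRE delivery packet into its four headers."""
    outer_len = (packet[0] & 0x0F) * 4           # delivery header length
    gre = packet[outer_len:outer_len + 4]        # assumes no optional GRE fields
    inner = packet[outer_len + 4:]
    inner_len = (inner[0] & 0x0F) * 4            # payload IP header length
    return packet[:outer_len], gre, inner[:inner_len], inner[inner_len:]


def parse_tcp_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte part of a TCP header."""
    (src_port, dst_port, seq, ack, off_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", raw[:20])
    return {
        "source_port": src_port,
        "destination_port": dst_port,
        "sequence_number": seq,
        "ack_number": ack,
        "header_length": (off_flags >> 12) * 4,
        "flags": [name for name, bit in TCP_FLAGS if off_flags & bit],
        "window_size": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }


delivery_hdr, gre_hdr, payload_ip, payload_tcp = split_layers(PACKET)
client = str(ipaddress.IPv4Address(payload_ip[12:16]))
cluster = str(ipaddress.IPv4Address(payload_ip[16:20]))
tcp = parse_tcp_header(payload_tcp)

if __name__ == "__main__":
    print("client :", client, "port", tcp["source_port"])
    print("cluster:", cluster, "port", tcp["destination_port"])
    print("flags  :", " ".join(tcp["flags"]))
```

The decoded payload addresses and ports correspond to the client and to the cluster address of the TN3270 server described above.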


Networking with z/OS and Cisco Routers: An Interoperability Guide

© Copyright IBM Corp. 2002


Back cover


Networking with z/OS and Cisco Routers: An Interoperability Guide

Implement advanced z/OS and Cisco functionality in your network
Details OSPF, EIGRP, MNLB and Sysplex Distributor
Includes useful samples and scenarios

The increased popularity of Cisco routers has led to their ubiquitous presence within the network infrastructure of many enterprises. In such large corporations, it is also common for many applications to execute on the z/OS (formerly OS/390) platform. As a result, the interoperation of z/OS-based systems and Cisco network infrastructures is a crucial aspect of many enterprise internetworks. This IBM Redbook provides a survey of the components necessary to achieve full interoperation between your z/OS-based servers and your Cisco IP routing environment. It may be used as a network design guide for understanding the considerations of the many aspects of interoperation. We divide this discussion into four major components:
▪ The options and configuration of channel-attached Cisco routers
▪ The design considerations for combining OSPF-based z/OS systems with Cisco-based EIGRP networks
▪ A methodology for deploying Quality of Service policies throughout the network
▪ The implementation of load balancing and high availability using Sysplex Distributor and MNLB (including new z/OS V1R2 support)

We highlight our discussion with a realistic implementation scenario and real configurations that will aid you in the deployment of these solutions. In addition, we provide in-depth discussions, traces, and traffic visualizations to show the technology at work.

SG24-6297-00

ISBN 0738423432

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.

For more information: ibm.com/redbooks