Troubleshooting Tivoli Using the Latest Features 9780738426914


English Pages 1154 Year 2003




Front cover

Troubleshooting Tivoli Using the Latest Features

Insider’s guide to Tivoli troubleshooting

Updated for post 3.6 Framework and applications

New troubleshooting functions included

Vasfi Gucer, Orcun Atakan, Budi Darmawan, Jamie Carl, Murtuza Choilawala

ibm.com/redbooks

International Technical Support Organization

Troubleshooting Tivoli Using the Latest Features

March 2003

SG24-6614-00

Note: Before using this information and the product it supports, read the information in “Notices” on page xxix.

First Edition (March 2003)

This edition applies to Tivoli Management Framework Version 3, Release 1 onwards, IBM Tivoli Monitoring Version 5, Release 1, Tivoli Enterprise Console Version 3, Release 7 onwards, IBM Tivoli Configuration Manager Version 4, Release 2, Tivoli Workload Scheduler Version 8, Release 1, and Tivoli Remote Control Version 3, Release 7.

© Copyright International Business Machines Corporation 2003. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

Figures
Tables
Notices
Trademarks
Preface
The team that wrote this redbook
Become a published author
Comments welcome

Part 1. Introduction, installation, and core services

Chapter 1. Overview
1.1 Overview
1.2 Generic problem determination outline
1.3 User groups and other sources of information
1.3.1 Tivoli Field Guides
1.3.2 Tivoli mailing list
1.3.3 Orb Data Limited
1.4 Other Web sites containing Tivoli related information

Chapter 2. Tivoli Object Database architecture
2.1 The Tivoli Enterprise management challenge
2.2 Tivoli Enterprise architecture overview
2.2.1 About CORBA 1.1
2.2.2 Tivoli Enterprise CORBA implementation
2.2.3 Tivoli Enterprise heterogeneity and interoperability
2.2.4 Management services
2.3 Tivoli Management Framework concepts
2.3.1 Object identification
2.3.2 Interfaces and classes
2.3.3 Object repository architecture
2.3.4 Complementary terminology: TEIDL, Interface, and Repository
2.3.5 Internals of Interface: Datatypes, attributes, and operations
2.3.6 An overview of datatypes used in interface definitions
2.3.7 Internals of profile managers, profiles, and CCMS
2.3.8 The final (dataless) distribution

2.3.9 Database and dataless distribution levels
2.3.10 Types of distribution
2.3.11 Advanced CCMS concepts

Chapter 3. Problem determination
3.1 Object Repository tools
3.1.1 bdbe and bdbx
3.1.2 otherpages
3.1.3 objcall
3.1.4 idlcall
3.1.5 idlattr
3.1.6 odbls
3.1.7 irview
3.1.8 tmstat
3.1.9 odstat
3.1.10 wtrace
3.2 Troubleshooting and search techniques
3.2.1 Basic object database access commands
3.2.2 Troubleshooting methods using low level commands
3.2.3 odstat, wtrace, and tmstat scenarios
3.2.4 Maintaining the object database: wchkdb and wchknode
3.2.5 Object database
3.2.6 Gateway database
3.2.7 Consistency versus corruption
3.2.8 Database check output example
3.2.9 The when and how of issuing wchkdb
3.2.10 Problems with Tivoli object database checks
3.2.11 Improving your use and understanding of wchkdb
3.2.12 A second look at check_db and fix_db
3.2.13 wchknode
3.2.14 Issuing the wchknode command

Chapter 4. Using and maintaining log files
4.1 Tivoli generated log files
4.1.1 oservlog
4.1.2 gatelog
4.1.3 epmgrlog
4.1.4 Installation logs
4.1.5 Additional logs and files
4.1.6 Installation files
4.2 User generated log files
4.2.1 Subroutine to write to a log file with a time stamp
4.2.2 Logging configuration file

4.2.3 Logging setup and call
4.2.4 Putting it all together
4.3 Log rotation

Chapter 5. Autotrace
5.1 Introduction
5.2 Autotrace terminology
5.3 Run-time components
5.3.1 Common configuration files
5.4 How to use Autotrace in a customer environment
5.4.1 Initialization
5.4.2 Determine active products
5.4.3 Determine configuration channels
5.4.4 Enable tracing
5.4.5 Determine whether Autotrace is capturing data
5.4.6 Disable tracing
5.4.7 Channel operations
5.4.8 Collect trace
5.4.9 Autotrace Snap Viewer

Chapter 6. Tivoli core installation process
6.1 Anatomy of a Tivoli CD-ROM
6.2 Installation overview
6.2.1 General pre-install checks, hints, and tips
6.2.2 UNIX
6.2.3 Windows NT
6.2.4 NFS mounts
6.2.5 Environment files and variables
6.2.6 Automatic startup versus remote startup
6.2.7 Deciding when a re-install is best
6.3 Server installation: Behind the scenes
6.3.1 Troubleshooting server installs
6.4 Client installation: Behind the scenes
6.4.1 Start of managed node install
6.4.2 The install begins
6.4.3 Installation of the Tivoli Management Framework files
6.4.4 Tivoli Remote Access Account (TRAA)
6.4.5 Installing the files
6.4.6 Client database creation
6.4.7 Problems starting the oserv and contacting the TMR
6.4.8 Configuring the client database (the TMR Server perspective)
6.4.9 Configuring the client database (the client perspective)
6.4.10 Verifying a properly installed managed node

6.4.11 Problems creating the managed node
6.5 Finishing the install
6.5.1 Files installed
6.5.2 Updating Name Registry
6.5.3 Completion
6.6 Common errors for server and client installs
6.7 Reinstalling clients

Chapter 7. Software Installation Service (SIS)
7.1 SIS component overview
7.2 SIS considerations
7.3 Using SIS
7.3.1 Starting the SIS Graphical User Interface
7.3.2 Building the Install Repository
7.3.3 Select target for install
7.3.4 SIS Response Files
7.3.5 Using SIS to install Tivoli products
7.3.6 Tuning SIS
7.3.7 Unsuccessful install
7.3.8 Synchronize SIS with TMR
7.4 Troubleshooting SIS
7.4.1 SIS log files
7.4.2 Troubleshooting SIS desktop launches
7.4.3 Troubleshooting SIS startup
7.4.4 Troubleshooting SIS locks
7.4.5 Troubleshooting SIS usage
7.4.6 Important SIS files and executables

Chapter 8. ISMP based installation (Integrated Installation)
8.1 Overview of Integrated Install
8.2 Server Install
8.2.1 Authorization roles
8.2.2 Database requirements
8.2.3 Starting the installation programs
8.2.4 Server Install: Behind the scenes
8.2.5 Typical Install
8.2.6 Custom Install
8.3 Troubleshooting Server Install
8.3.1 Cmsummary.log
8.3.2 Cmismp.log
8.3.3 Traditional logs
8.3.4 General troubleshooting steps for Server Install
8.3.5 Server Install troubleshooting examples

8.4 Desktop Install
8.5 Troubleshooting Desktop Install
8.5.1 Desktop Install troubleshooting example
8.6 Web Gateway Install
8.7 Troubleshooting Web Gateway Install

Chapter 9. Patch maintenance
9.1 Patch application and information
9.1.1 The patch factory
9.1.2 Forms of patches
9.1.3 Learning about new patches
9.1.4 Understanding the prerequisites
9.1.5 Obtaining patch files
9.1.6 Turning the archives into binaries
9.1.7 Patch contents
9.1.8 Building your collection of patches
9.1.9 Testing patches
9.1.10 Applying patches to your systems
9.1.11 More rapid deployment
9.1.12 Tag files
9.1.13 Upgrading endpoints
9.1.14 Validating patch application
9.1.15 How to back out of a patch
9.1.16 Knowing what is installed where

Chapter 10. Backup and restore
10.1 Backup process
10.1.1 Backup roles and access rights
10.1.2 Database
10.1.3 Running backup from the command line
10.1.4 Backup process behind the scenes
10.1.5 Temporary backup file considerations
10.1.6 Binary backups
10.1.7 File system backups
10.2 Restore process
10.2.1 Restore roles and access rights
10.2.2 Restore examples
10.2.3 Rescue operation
10.2.4 Items not restored from a backup
10.3 Troubleshooting backup and restore operations
10.3.1 Restore with -r and -r -R options
10.3.2 Changing the default backup directory
10.3.3 Database cannot be backed up

10.3.4 Malformed ASCII exception
10.3.5 IOM route timeouts
10.3.6 Identifying managed nodes
10.3.7 Implications of using an old backup

Chapter 11. Tivoli Management Framework core services
11.1 Tivoli administrators
11.1.1 Authorization roles
11.1.2 Policy regions
11.1.3 Creating administrators
11.1.4 Using a single Tivoli administrator for multiple users
11.1.5 ID mapping
11.1.6 Removing and deleting administrators
11.1.7 Administrator commands
11.1.8 Administrator roles
11.1.9 Interregion administration
11.1.10 Summary of hints for defining administrators
11.1.11 Maintaining Tivoli administrators in large environments
11.2 Notice groups
11.2.1 Notice group commands
11.2.2 Notice group components
11.2.3 Notice expiration
11.2.4 Custom notice groups
11.2.5 Notices database corruption
11.3 Tivoli tasks
11.3.1 Tivoli jobs
11.3.2 Task library features
11.3.3 Task and job internals
11.3.4 Troubleshooting tasks and jobs
11.4 Scheduler
11.4.1 Scheduler commands
11.4.2 Tips for working with the Scheduler
11.4.3 Troubleshooting common Scheduler errors
11.5 Interconnected TMRs
11.5.1 The Tivoli Name Registry
11.5.2 Connecting TMRs
11.5.3 Resource visibility
11.5.4 Interregion updates and object time stamps
11.5.5 Resource updates
11.5.6 Case study: Hub-spoke architecture
11.5.7 Naming standards
11.5.8 Connections
11.5.9 Troubleshooting TMR connections

11.6 Multiplexed Distribution
11.6.1 Mdist and the distribution hierarchy
11.6.2 Repeater tuning in MDist1
11.6.3 Active distributions
11.6.4 MDist2 components and functionalities
11.6.5 What is new in MDist2
11.6.6 Troubleshooting MDist2

Chapter 12. RDBMS Interface Module (RIM)
12.1 Overview of RIM
12.2 Understanding RIM
12.2.1 RIM behind the scenes
12.2.2 RIM APIs
12.2.3 RDBMS_Interface translation layer
12.2.4 Vendor adaptor layer
12.2.5 Client application communication
12.3 Installing a RIM
12.3.1 Creating application database tables
12.3.2 Creating RIM objects
12.3.3 RIM scenario-Inventory RIM objects
12.4 Troubleshooting example: Failure to connect with a RDBMS
12.4.1 RIM specifics
12.5 Designing your Tivoli environment for a RIM

Chapter 13. Endpoints and endpoint management
13.1 Tivoli endpoint basics
13.1.1 Endpoint methods
13.1.2 Gateway methods
13.2 Common endpoint management problems
13.2.1 Common misconceptions
13.2.2 Understanding the login process
13.2.3 Recap of all configurable parameters for endpoint management
13.2.4 Endpoint Initial login failure
13.2.5 Multiple (duplicate) endpoints
13.2.6 Endpoint isolation/migration
13.2.7 Best practices for endpoint management
13.3 Endpoint policies and endpoint policy scripting
13.3.1 General rules for policy scripting
13.3.2 Information available to policy scripts
13.3.3 Viewing, modifying, and installing policy scripts
13.3.4 allow_install_policy
13.3.5 select_gateway_policy
13.3.6 after_install policy

13.3.7 login_policy
13.4 Endpoint manager and gateway internals
13.4.1 Why it is so important
13.4.2 Gateway thread usage
13.4.3 Gateway threads categorized
13.4.4 Endpoint manager thread usage
13.4.5 Gateway ALI
13.4.6 Configuring endpoint manager threads

Part 2. Performance and availability applications

Chapter 14. Tivoli Enterprise Console
14.1 Tivoli Enterprise Console architecture
14.1.1 Architecture scenarios
14.2 Installation debugging
14.3 Tivoli Enterprise Console 3.7 tracing and logging
14.3.1 Tivoli Enterprise Console Server 3.7 diagnostic logging
14.3.2 TEC User Interface Server 3.7 diagnostic logging
14.4 RIM tracing
14.5 Rule tracing, profiling, logging, and reporting
14.5.1 Rule tracing and profiling
14.5.2 Event Activity Reports
14.5.3 Common rule base errors
14.6 TEC Java Console 3.7 debugging
14.6.1 Installation and uninstallation
14.6.2 Run-time logging and debugging
14.7 TEC Sample Event Information 3.7: Spider daemon
14.8 Tivoli Enterprise Console 3.7 Windows Event Log Adapter debugging
14.8.1 Running an adapter in test mode
14.8.2 Running an adapter in debug mode
14.8.3 Running an adapter with diagnostic logging
14.8.4 Adapter logging example scenarios

Chapter 15. IBM Tivoli Monitoring
15.1 IBM Tivoli Monitoring 5.1
15.2 Architecture
15.2.1 Integration with Common Information Model
15.2.2 Resource model
15.3 New features of IBM Tivoli Monitoring 5.1
15.3.1 MDist2 support
15.3.2 Web-based Health Console
15.3.3 Tivoli Enterprise Data Warehouse support
15.3.4 Additional response actions
15.4 Troubleshooting IBM Tivoli Monitoring 5.1


15.4.1 Logs and traces format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 624 15.4.2 TMR Server logs and traces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 626 15.4.3 Gateway logs and traces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 626 15.4.4 Endpoint logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 632 15.4.5 Web Health Console logs and traces . . . . . . . . . . . . . . . . . . . . . . 638 15.5 Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 640 15.5.1 Tool to generate XML file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 640 15.5.2 Autotrace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 640 15.5.3 Serviceability tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 641 15.6 Known problems and their resolutions . . . . . . . . . . . . . . . . . . . . . . . . . 645 15.7 Problem determination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 645 Chapter 16. Tivoli Business Systems Manager . . . . . . . . . . . . . . . . . . . . 673 16.1 Internals of Tivoli Business Systems Manager . . . . . . . . . . . . . . . . . . . 674 16.1.1 Base services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 674 16.1.2 Object hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 681 16.1.3 Enterprise Edition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 708 16.1.4 Distributed Edition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 730 16.2 Configuration of Tivoli Business Systems Manager . . . . . . . . . . . . . . . 746 16.2.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 746 16.2.2 NT Servers installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . 747 16.3 Problem determination of Tivoli Business Systems Manager . . . . . . . . 750 16.3.1 General problem determination techniques . . . . . . . . . . . . . . . . . 750 16.3.2 TBSM base services logging. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 750 16.3.3 Remote access for Tivoli Business Systems Manager logs . . . . . 755 Chapter 17. Tivoli Enterprise Data Warehouse. . . . . . . . . . . . . . . . . . . . . 757 17.1 Tivoli Enterprise Data Warehouse introduction . . . . . . . . . . . . . . . . . . . 758 17.1.1 How Tivoli Enterprise Data Warehouse is packaged . . . . . . . . . . 760 17.2 Troubleshooting Tivoli Enterprise Data Warehouse . . . . . . . . . . . . . . . 760 17.2.1 Troubleshooting core installation . . . . . . . . . . . . . . . . . . . . . . . . . 761 17.2.2 Troubleshooting Warehouse Enablement Pack installations . . . . 762 17.2.3 Troubleshooting the IBM Console and the Report Interface . . . . . 763 17.2.4 Troubleshooting the customization . . . . . . . . . . . . . . . . . . . . . . . . 766 17.3 Troubleshooting ETLs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 769 17.3.1 Running ETLs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 769 17.3.2 The Data Warehouse Center fails to open . . . . . . . . . . . . . . . . . . 774 17.3.3 Data marts show old data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 776 17.4 Maintenance and backup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 777 17.4.1 Removing old data from the Data Warehouse Center logs. . . . . . 777 17.4.2 Removing old data from the central data warehouse . . . . . . . . . . 778 17.4.3 Reorganizing the data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 778 17.4.4 Updating system catalog statistics . . . . . . . . . . . . . . . . . . . . . . . . 780


17.4.5 Backup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 780 17.5 Un-install components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 781 17.5.1 Un-install Tivoli Enterprise Data Warehouse core product . . . . . . 781 17.5.2 Un-install the warehouse packs . . . . . . . . . . . . . . . . . . . . . . . . . . 783 17.6 Troubleshooting IBM Tivoli Monitoring Version 5.1.1 TEDW Support. . 783 17.6.1 Retrieving the date of last data upload into ITM database . . . . . . 784 17.6.2 Testing the connection between RIM host and ITM database . . . 785 17.6.3 Checking the status of distributed resource models . . . . . . . . . . . 785 17.6.4 Reviewing data collection parameters . . . . . . . . . . . . . . . . . . . . . 786 17.6.5 Checking trace files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 787 Part 3. Configuration and operation applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 791 Chapter 18. Tivoli Workload Scheduler . . . . . . . . . . . . . . . . . . . . . . . . . . . 793 18.1 Tivoli Workload Scheduler. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 794 18.1.1 The Tivoli Workload Scheduler network . . . . . . . . . . . . . . . . . . . . 795 18.1.2 Tivoli Workload Scheduler workstation types . . . . . . . . . . . . . . . . 797 18.2 Tivoli Workload Scheduler for z/OS . . . . . . . . . . . . . . . . . . . . . . . . . . . 799 18.2.1 Tivoli Workload Scheduler for z/OS configuration. . . . . . . . . . . . . 801 18.3 End-to-end scheduling architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . 805 18.3.1 How end-to-end scheduling works . . . . . . . . . . . . . . . . . . . . . . . . 805 18.3.2 Tivoli Workload Scheduler for z/OS end-to-end components . . . . 808 18.3.3 Tivoli Workload Scheduler for z/OS end-to-end configuration . . . 
812 18.3.4 Tivoli Workload Scheduler for z/OS end-to-end database objects 813 18.3.5 Tivoli Workload Scheduler for z/OS end-to-end plans . . . . . . . . . 815 18.4 Troubleshooting for Tivoli Workload Scheduler for z/OS . . . . . . . . . . . 819 18.4.1 Using keywords to describe a problem . . . . . . . . . . . . . . . . . . . . . 820 18.4.2 Searching the software-support database . . . . . . . . . . . . . . . . . . 820 18.4.3 Problem-type keywords. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 820 18.4.4 Problem analysis procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 823 18.4.5 Abnormal termination (ABEND or ABENDU) procedure . . . . . . . . 824 18.4.6 The diagnostic file (EQQDUMP) . . . . . . . . . . . . . . . . . . . . . . . . . . 825 18.4.7 Trace information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 825 18.4.8 System dump dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 825 18.4.9 LOOP procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 826 18.4.10 Message (MSG) procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 827 18.4.11 Performance (PERFM) procedure . . . . . . . . . . . . . . . . . . . . . . . 828 18.4.12 WAIT procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 829 18.4.13 Preparing a console dump . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 829 18.4.14 Dump the failing system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 831 18.4.15 Information needed for all problems . . . . . . . . . . . . . . . . . . . . . . 831 18.4.16 Performing problem determination for tracking events . . . . . . . . 832 18.5 Troubleshooting end-to-end solution . . . . . . . . . . . . . . . . . . . . . . . . . . . 839


18.5.1 End-to-end working directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 839 18.5.2 The standard list directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 841 18.5.3 The standard list messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 843 18.5.4 Diagnose and fix problems with unlinked workstations . . . . . . . . . 843 18.5.5 Symphony renew option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 847 18.5.6 UNIX System Services diagnostics . . . . . . . . . . . . . . . . . . . . . . . . 848 18.5.7 TCP/IP server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 851 18.5.8 Tivoli Workload Scheduler for z/OS connector . . . . . . . . . . . . . . . 852 18.6 Troubleshooting the Job Scheduling Console . . . . . . . . . . . . . . . . . . . . 854 18.6.1 Trace for the Job Scheduling Console . . . . . . . . . . . . . . . . . . . . . 857 18.7 Troubleshooting Tivoli Workload Scheduler . . . . . . . . . . . . . . . . . . . . . 859 18.7.1 FTAs not linking to the master . . . . . . . . . . . . . . . . . . . . . . . . . . . 859 18.7.2 Batchman not up or will not stay up (batchman down) . . . . . . . . . 861 18.7.3 Jobs not running . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 862 18.7.4 Jnextday is hung or still in EXEC state . . . . . . . . . . . . . . . . . . . . . 864 18.7.5 Jnextday in ABEND state . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 864 18.7.6 FTA still not linked after Jnextday . . . . . . . . . . . . . . . . . . . . . . . . . 865 18.7.7 Introduction to the Tivoli Workload Scheduler 8.1 tracing facility . 865 Chapter 19. IBM Tivoli Configuration Manager. . . . . . . . . . . . . . . . . . . . . 867 19.1 IBM Tivoli Configuration Manager 4.2 overview . . . . . . . . . . . . . . . . . . 869 19.2 IBM Tivoli Configuration Manager 4.2 components . . . . . . . . . . . . . . . 
869 19.3 IBM Tivoli Configuration Manager 4.2 new features . . . . . . . . . . . . . . . 871 19.3.1 New Web UI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 871 19.3.2 Resource Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 876 19.3.3 Device management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 877 19.3.4 Integration with Enterprise Directories . . . . . . . . . . . . . . . . . . . . . 880 19.3.5 Native packaging support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 881 19.3.6 Multicast distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 881 19.4 Troubleshooting Software Distribution . . . . . . . . . . . . . . . . . . . . . . . . . 882 19.4.1 General troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 882 19.4.2 Check the log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 884 19.4.3 Check the Distribution Status Console . . . . . . . . . . . . . . . . . . . . . 885 19.4.4 Make sure that Tivoli Management Framework is functional . . . . 886 19.4.5 Check for MDist2 problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 886 19.4.6 Verify the setup of endpoints. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 888 19.4.7 Check lost-n-found . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 888 19.4.8 Troubleshooting the software package . . . . . . . . . . . . . . . . . . . . . 889 19.4.9 Software Distribution traces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 890 19.4.10 Troubleshooting Data Moving . . . . . . . . . . . . . . . . . . . . . . . . . . . 894 19.4.11 Troubleshooting Mobile Computing . . . . . . . . . . . . . . . . . . . . . . 896 19.4.12 Troubleshooting a pristine installation . . . . . . . . . . . . . . . . . . . . . 898 19.4.13 Troubleshooting discovering and synchronization . . . . . . . . . . . 
900


19.4.14 Change Management Status summary. . . . . . . . . . . . . . . . . . . . 901 19.5 Troubleshooting Activity Planner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 902 19.5.1 Activity Planner processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 902 19.5.2 Activity Planner configuration file . . . . . . . . . . . . . . . . . . . . . . . . . 902 19.5.3 Activity Planner log files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 904 19.5.4 Activity Planner trace files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 905 19.6 Troubleshooting Change Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . 908 19.6.1 Change Manager configuration file . . . . . . . . . . . . . . . . . . . . . . . . 908 19.6.2 Change Manager log files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 908 19.6.3 Change Manager trace files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 909 19.7 Troubleshooting Web Gateway and device management. . . . . . . . . . . 910 19.7.1 Troubleshooting Web Gateway installation . . . . . . . . . . . . . . . . . . 910 19.7.2 Common Web Gateway and device management problems . . . . 912 19.7.3 Tracing the Web Gateway. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 922 19.8 Troubleshooting Web User Interface. . . . . . . . . . . . . . . . . . . . . . . . . . . 923 19.8.1 Common Web User Interface problems . . . . . . . . . . . . . . . . . . . . 923 19.8.2 Tracing the Web User Interface . . . . . . . . . . . . . . . . . . . . . . . . . . 925 19.9 Troubleshooting Enterprise Directory Integration . . . . . . . . . . . . . . . . . 927 19.10 Troubleshooting Inventory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 928 19.10.1 Enabling logging and tracing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 929 19.10.2 Troubleshooting on the endpoint. . . . . . . . . . . . . . . . . . . . . . . . . 940 Chapter 20. Tivoli Remote Control. 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 949 20.1 Tivoli Remote Control components . . . . . . . . . . . . . . . . . . . . . . . . . . . . 950 20.1.1 Remote Control trace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 951 20.1.2 Tivoli Remote Control logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . 952 20.2 Troubleshooting Tivoli Remote Control . . . . . . . . . . . . . . . . . . . . . . . . . 952 20.2.1 Tivoli Management Framework troubleshooting . . . . . . . . . . . . . . 953 20.2.2 Windows eventlog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 953 20.2.3 Trace files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 954 Part 4. Security applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 957 Chapter 21. IBM Tivoli Access Manager for Operating Systems . . . . . . 959 21.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 960 21.2 Components and architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 960 21.3 Auditing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 962 21.3.1 Auditing authorization decisions . . . . . . . . . . . . . . . . . . . . . . . . . . 962 21.3.2 Auditing administrative activity . . . . . . . . . . . . . . . . . . . . . . . . . . . 963 21.3.3 Auditing trace events. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 963 21.3.4 Global audit levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 964 21.3.5 Using warning mode to verify policy . . . . . . . . . . . . . . . . . . . . . . . 965 21.4 Troubleshooting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 965 21.4.1 IBM Tivoli Access Manager for Operating Systems log files. . . . . 966


21.4.2 Installation problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 966 21.4.3 Configuration problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 967 21.4.4 Run-time problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 970 Appendix A. Tivoli/Windows whitepaper . . . . . . . . . . . . . . . . . . . . . . . . . 973 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 974 Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 974 Tivoli Authentication Package (TAP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 974 Why TAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 974 Understanding TAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 975 Understanding the Tivoli Remote Access Account (TRAA) . . . . . . . . . . . 976 Order of account selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 978 wsettap.exe/wlcftap.exe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 979 Tivoli accounts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 979 Accounts created . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 979 Accounts used by Tivoli Management Framework . . . . . . . . . . . . . . . . . . 981 Changes to NT accounts used by Tivoli Management Framework. . . . . . 984 Privileged account comparison between Framework versions . . . . . . . . . 985 Examples of Account Management using different TME versions . . . . . . 986 Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 987 TME functions dependent on NT/Active Directory environment . . . . . . . . 
987 Security and TME: PDC/Windows NT or mixed NT/Windows 2000 environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 988 Security and TME: Active Directory and Windows 2000/XP . . . . . . . . . . . 990 File system considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 995 Tivoli Management Framework install and removal . . . . . . . . . . . . . . . . . . . . 996 Installation of the Tivoli Remote Installation Package (TRIP) . . . . . . . . . . 996 Creation of a Tivoli managed node (TMF) . . . . . . . . . . . . . . . . . . . . . . . . 997 Installation of the Tivoli Management Agent (TMA) . . . . . . . . . . . . . . . . . 999 Preparing NT for a Tivoli installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1002 Tivoli files placed under %SYSTEMROOT% . . . . . . . . . . . . . . . . . . . . . 1006 DLL conflicts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1008 Microsoft platform-specific topics regarding TME . . . . . . . . . . . . . . . . . . . . 1009 Windows NT (Version 4.0) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1010 Windows 2000. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1010 Windows XP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1010 Environmental considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1013 How shell and Perl scripts work on NT . . . . . . . . . . . . . . . . . . . . . . . . . . 1013 Dependencies and TMA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1014 Name resolution/WINS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1014 Sourcing the Tivoli environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1015 Tivoli Desktop for TMF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . 1015 Basic performance tuning considerations . . . . . . . . . . . . . . . . . . . . . . . . 1016


Non-US Keyboard issue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1016 Port restriction causes TIME_WAIT to last 169 seconds . . . . . . . . . . . . 1016 Service Pack versions supported with Tivoli . . . . . . . . . . . . . . . . . . . . . . 1017 TCP/IP speed tweaks for Windows NT, 2000 and XP . . . . . . . . . . . . . . 1017 Tivoli, Microsoft, and third party utilities for NT . . . . . . . . . . . . . . . . . . . . . . 1018 NT-specific commands provided by TME . . . . . . . . . . . . . . . . . . . . . . . . 1018 Windows commands for working with TME products . . . . . . . . . . . . . . . 1020 Other utilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1020 Concerns regarding perceived security vulnerabilities . . . . . . . . . . . . . . . . . 1020 Utilizing the LSA/Authentication package implementation . . . . . . . . . . . 1021 Vulnerability analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1021 Common problems, troubleshooting, and FAQs . . . . . . . . . . . . . . . . . . . . . 1022 Issues related to the OS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1022 Startup of oserv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1025 Using TRAA with tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1027 General Tivoli Management Framework . . . . . . . . . . . . . . . . . . . . . . . . . 1027 Install issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1028 Local responses fail on Windows 2000 Domain Controllers . . . . . . . . . . 1029 Identifying which user a given process will run as . . . . . . . . . . . . . . . . . . . . 1037 Identifying the user using ADE *.ist files . . . . . . . . . . . . . . . . . . . . . . . . . 1040 Options for SET_USER. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . 1040 Appendix B. Tivoli/NetWare whitepaper . . . . . . . . . . . . . . . . . . . . . . . . . 1043 NetWare considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1044 NetWare accounts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1044 Installing NetWare gateways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1045 Installing the NetWare binaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1046 Registering the NetWare managed node . . . . . . . . . . . . . . . . . . . . . . . . 1047 Creating the NetWare gateway . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1047 Installing endpoints on NetWare . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1047 Endpoints in Novell Directory Services (NDS) . . . . . . . . . . . . . . . . . . . . 1048 Migration of NetWare clients and managed sites . . . . . . . . . . . . . . . . . . . . . 1049 NetWare clients operating as PC managed nodes . . . . . . . . . . . . . . . . . 1049 NetWare managed sites with IPX/SPX clients . . . . . . . . . . . . . . . . . . . . 1049 NetWare Managed sites with TCP/IP clients . . . . . . . . . . . . . . . . . . . . . 1050 Migrating NetWare clients to endpoints . . . . . . . . . . . . . . . . . . . . . . . . . 1050 Replacing a NetWare managed site subscriber in a profile manager . . . 1051 Replacing a NetWare managed site with endpoint subscribers . . . . . . . 1052 Replacing a NetWare managed site with a profile manager . . . . . . . . . . 1054 Special considerations for NetWare managed site subscribers . . . . . . . 1056 Running the NetWare PC Agent and lcfd concurrently . . . . . . . . . . . . . . 1056 Network limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1056 Considerations for NetWare gateways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1057


NetWare gateways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1057 Endpoint considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1058 Communication protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1058 Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1060 Questions related Tivoli/NetWare troubleshooting . . . . . . . . . . . . . . . . . 1060 Appendix C. Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1065 log_it.pm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1066 allow_install_policy.pl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1067 rotate_logs.pl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1072 prodpatch.pl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1076 installed.pl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1080 Appendix D. Additional material . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1083 Locating the Web material . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1083 Using the Web material . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1083 System requirements for downloading the Web material . . . . . . . . . . . . 1084 How to use the Web material . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1084 Abbreviations and acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1085 Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1089 IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1089 Other resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
1089 Referenced Web sites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1091 How to get IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1092 IBM Redbooks collections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1092

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1093


Figures

1-1 Tivoli User Groups web site . . . . . 9
1-2 Tivoli User Groups web site . . . . . 10
1-3 FTP site for archived messages . . . . . 11
1-4 Orb Data’s Technical Exchange page . . . . . 12
1-5 Orb Data’s Technical Resources page . . . . . 13
1-6 Orb Data’s Most Recent Documents page . . . . . 14
1-7 Orb Data’s Journals page . . . . . 15
1-8 Orb Data’s Links page . . . . . 16
2-1 CORBA operation request . . . . . 21
2-2 Tivoli Enterprise application interfaces . . . . . 22
2-3 Management services in the X/Open reference model . . . . . 23
2-4 Object repository architecture . . . . . 26
2-5 Example collection hierarchy . . . . . 28
2-6 Relationships in the TNR . . . . . 29
2-7 Base objects contents . . . . . 30
2-8 oserv object contents . . . . . 31
2-9 ALI object contents . . . . . 32
2-10 Process of a new instance . . . . . 33
2-11 Object hierarchy . . . . . 34
2-12 Distribution of a user administration user profile . . . . . 46
2-13 Profile distribution levels graphical user interface . . . . . 49
5-1 Autotrace Snap Viewer . . . . . 143
7-1 SIS high-level design . . . . . 187
7-2 SIS desktop dialog . . . . . 189
7-3 SIS Warning on first initialization of a Shared IR . . . . . 206
7-4 SIS IR Read-Only Mode warning . . . . . 206
7-5 SIS Shared IR - No Write Access . . . . . 206
7-6 SIS Shared IR Type warning . . . . . 207
8-1 Typical Install: Choosing your install options . . . . . 219
8-2 Typical Install: Tivoli Server setup . . . . . 220
8-3 Typical Install: Database Vendor Information . . . . . 221
8-4 Typical Install: RDBMS and RIM information . . . . . 222
8-5 Typical Install: Summary window of settings . . . . . 223
8-6 Integrated Install in progress . . . . . 224
8-7 Custom Install . . . . . 225
8-8 Custom Install: Components choice . . . . . 226
8-9 Custom Install: Additional Languages . . . . . 227
8-10 Custom Install: Tivoli Server destination directory structure . . . . . 228


8-11 Custom Install: Repository configuration information . . . . . 229
8-12 RDBMS and RIM information . . . . . 230
8-13 Custom Install: Activity Planner user . . . . . 231
8-14 Custom Install: Activity Planner repository information . . . . . 232
8-15 Custom Install: Inventory repository information . . . . . 233
8-16 Custom Install: Change Manager repository information . . . . . 234
8-17 Custom Install: Enterprise Directory Query Facility configuration . . . . . 235
8-18 Custom Install: Review . . . . . 236
8-19 Case study-1 . . . . . 244
8-20 Case study-2 . . . . . 246
8-21 Desktop Install: Welcome screen . . . . . 249
8-22 Desktop Install: Type of Installation . . . . . 250
8-23 Desktop Install: Component to Install . . . . . 251
8-24 Desktop Install problem . . . . . 252
8-25 setup.log . . . . . 253
8-26 Web Gateway: Components to Install . . . . . 255
8-27 Endpoint Information . . . . . 255
8-28 Review screen . . . . . 256
8-29 Endpoint Installation: Installation successful . . . . . 257
9-1 Patches FTP site . . . . . 264
9-2 Tivoli patches . . . . . 265
10-1 Backup Tivoli Management Region dialog . . . . . 280
10-2 Add Scheduled Job dialog . . . . . 281
10-3 Select Notice Groups dialog . . . . . 283
10-4 Notice Group Messages dialog . . . . . 284
10-5 Example of problem when using old backups . . . . . 303
11-1 Sample of a desktop containing policy regions . . . . . 307
11-2 Setting Managed Resources for a policy region . . . . . 308
11-3 Set Login Names dialog - before and after pressing Enter . . . . . 311
11-4 Administrator Login Name versus Current Login Name . . . . . 313
11-5 Multiple use Tivoli administrator: Administrator Properties . . . . . 314
11-6 Multiple use Tivoli administrator: Set Login Names . . . . . 314
11-7 Multiple use Tivoli administrator: Notice Group messages . . . . . 315
11-8 Multiple use Tivoli administrator: Notice Group messages 2 . . . . . 316
11-9 Entering an ID Map for a Tivoli administrator User Login Name . . . . . 319
11-10 Administrator using an ID map . . . . . 320
11-11 Scheduler not running message . . . . . 351
11-12 Interregion Remote Connect Dialog . . . . . 357
11-13 Interregion Secure Connect Dialog . . . . . 357
11-14 Remote TMR can see Query in the GUI . . . . . 361
11-15 Hub-spoke architecture . . . . . 368
11-16 Subscription example in a hub-spoke model . . . . . 370
11-17 Two-way connected TMRs . . . . . 376


11-18 TMR disconnect failed
11-19 Update Resource Roles for an administrator
11-20 Always flag setting example
11-21 Using WAN option for interconnected TMRs
11-22 Net_load is distributed between connections of a single distribution
11-23 Mdist2 components
11-24 Asynchronous delivery concept
11-25 Synchronous delivery concept
11-26 MDist2 repeater queue
11-27 Available connections for each priority
11-28 Software Distribution GUI - Install Software Package: Setting priority
11-29 MDist2 maximum concurrent connections
11-30 MDist1 max_conn configuration: Multiple distribution scenario
11-31 MDist2 net_load/target_netload concepts
11-32 MDist1 positive net_load: Multiple distribution scenario
11-33 MDist1 negative net_load: Multiple distribution scenario
11-34 MDist2 configuration: mem_max and disk_max
11-35 MDist1 configuration: mem_max and disk_max in multiple distribution
11-36 Depot concepts
11-37 Depots between repeaters
11-38 Software Distribution GUI: Install Software Package with depot
11-39 Depot directory
11-40 An example of a large-scale distribution environment
11-41 Software Distribution scenario using a depot
11-42 Implementation of checkpoint restart
11-43 Retry option: Gateway repeater and endpoint
11-44 Retry option - endpoint gateway and another repeater
11-45 Software Distribution GUI: Time-out Settings
11-46 Automated Software Distribution scenario at power-on
12-1 RIM components
12-2 How an application uses RIM
12-3 Tivoli environment used in the scenarios
12-4 RIM connection failure message in the desktop
13-1 TMA downcall architecture
13-2 TMA upcall architecture
13-3 Tivoli endpoint policies
13-4 Gateway thread execution
13-5 Logstatus information workflow
13-6 odstat of initial login
13-7 odstat of initial login
13-8 epmgrlog
13-9 Nested TNR endpoint data
13-10 get_endpoints output


14-1 Tivoli Enterprise Console 3.7 architecture
14-2 Process flow of an successfully processed event
14-3 Process flow of “PARSING FAILED” event
14-4 Process flow acknowledging an event
14-5 Windows NT Task Manager
14-6 GUI actions to enable rule tracing
14-7 InstallShield searching for a Java Virtual Machine (JVM)
14-8 InstallShield message if no JVM is present
14-9 InstallShield dialog to specify a location for JVM
14-10 JRE installation selection
14-11 Dialog to specify path to already installed JVM
14-12 JRE installation directory structure
14-13 Windows NT registry search results
14-14 tec-jconsole-remove.sh output window (1)
14-15 tec-jconsole-remove.sh output window (2)
14-16 tec-jconsole-remove.sh output window (3)
14-17 Basic Tivoli Enterprise Console Java Console startup output
14-18 Preferences window to enable Debug Window
14-19 Tivoli Enterprise Console Java Console Debug Window
14-20 View of history of status updates
14-21 Starting Tivoli Enterprise Console Java Console displaying status
14-22 Connected to Tivoli Web interface on eastham
14-23 Directory structure of the TME NT Event Log Adapter
14-24 Tivoli Enterprise Console NT/Windows Event Log Adapter services
14-25 tecad_nt.exe usage statement
14-26 Error message if endpoint environment is not sourced properly
14-27 Adapter error message during startup
14-28 Output of adapter running in debug mode
14-29 Enable auditing in the Windows NT User Manager
14-30 Displaying Windows NT Security Event Log
14-31 Example events and details in the Windows Security Event Viewer
14-32 debug.log output
15-1 High-level overview
15-2 CIM architecture
16-1 Tivoli Business Systems Manager technical structure
16-2 ASIDBValidater setting
16-3 TestSQL for database validation
16-4 SQL Enterprise Manager
16-5 Content of obj_class table
16-6 Property window with unique objectID
16-7 The physical object hierarchy
16-8 The dumpfqueue command
16-9 The propagation concept


16-10 Status propagation component
16-11 Propagation agent dispatcher startup log file
16-12 Propagation dispatcher log for event processing
16-13 Propagation dispatcher shutdown log
16-14 Enqueue proxy server log
16-15 Remote execution server log file
16-16 Propagation agent log-1/2
16-17 Propagation agent log-2/2
16-18 Workstation processing component
16-19 Notification services: Startup
16-20 Notification services: Processing
16-21 Notification services: Shutdown
16-22 Application server: Startup (1 of 2)
16-23 Application server: Startup (2 of 2)
16-24 Application server: Startup (2 of 2)
16-25 Application server: Processing
16-26 Application server: Shutdown
16-27 Source/390 system
16-28 Tivoli Business Systems Manager data server startup log
16-29 Tivoli Business Systems Manager object server startup log
16-30 Tivoli Business Systems Manager object pump startup log
16-31 OS/390 input component
16-32 Initial connection for Tivoli Business Systems Manager connection
16-33 Sample message
16-34 Queue file contents
16-35 Sample LS log file
16-36 Sample MVSL log file
16-37 MVS event handler: Processing MVS listener up
16-38 MVS event handler: Variable identification
16-39 MVS Event handler: Object registration
16-40 MVS event handler: MVS listener down event
16-41 MVS upload rule service: Initialization
16-42 MVS upload rule service: Object initialization
16-43 MVS upload rule services: Message processing
16-44 Enqueue proxy server
16-45 MVS sender service: Initialization
16-46 MVS sender service: Variable registration
16-47 Tivoli Business Systems Manager Distributed Edition environment
16-48 Initialization of Agent Listener
16-49 Agent Listener: Initialization items
16-50 GEM object classes in Tivoli Business Systems Manager
16-51 Tables for CID G02H
16-52 GEMLookupCID


16-53 GEM_IDlookup
16-54 GEM_DMtoCID
16-55 GEM_InstFiltering
16-56 GEM LOB BCDF
16-57 GEM LOB Lookup
16-58 Log setting for a service
16-59 Adding LogLevel value
16-60 Tivoli Business Systems Manager log directories
17-1 Components of the Tivoli Enterprise Data Warehouse
17-2 Error messages in the Report Interface after database restart
17-3 Accessing ETL process logs in Work in Progress
17-4 Accessing the log details
17-5 Log details
17-6 Configuring a database’s transaction log file
17-7 Configuring the log file parameters
17-8 How to specify the control database
17-9 Control Database Management window
17-10 Create a reorganization step
18-1 Tivoli Workload Scheduler network with only one domain
18-2 Tivoli Workload Scheduler network with three domains
18-3 Tivoli Workload Scheduler network with different manager and agents
18-4 TWS for z/OS configuration with two sysplex environments
18-5 Using APPC server for remote panels to TWS for z/OS
18-6 JSC connection to Tivoli Workload Scheduler for z/OS
18-7 Tivoli Workload Scheduler for z/OS end-to-end scheduling
18-8 Tivoli Workload Scheduler for z/OS inter-process communication
18-9 Creation of Symphony file in TWS for z/OS plan programs
18-10 Symphony file distribution from TWS for z/OS server to TWS agents
18-11 Link the workstation
18-12 Displaying the Symphony run number
18-13 Setting status to active
18-14 JSC log on error message
18-15 Connector link failure
18-16 Disabled instance
18-17 Tivoli Management Framework failure
18-18 Allocation error message
19-1 WEB UI and firewalls
19-2 Resource Manager infrastructure
19-3 Data flow using Software Distribution to push to devices
19-4 Data Moving process flow
19-5 Process flow for Mobile Computing
19-6 Pristine tool process flow
19-7 APM executer trace file


19-8 Failed TWG installation dialog
19-9 Inventory scan job in Web Gateway database
19-10 Inventory discovery process
19-11 Distribution Status icon
19-12 Distribution status console
20-1 Remote Control Controller event
21-1 IBM Tivoli Access Manager architecture and components
21-2 IBM Tivoli Access Manager for Operating Systems log files
A-1 Bypass traverse checking options: Before the change
A-2 Bypass traverse checking options: After the change
A-3 Act as part of the operating system options: Before the change
A-4 Act as part of the operating system options: After the change
A-5 Increase quotas options: Before the change
A-6 Increase quotas options: After the change
A-7 Replace a process level token options: Before the change
A-8 Replace a process level token options: After the change
B-1 NetWare scenario
B-2 Replacing a NetWare managed site with endpoint subscribers 1/2
B-3 Replacing a NetWare managed site with endpoint subscribers 2/2
B-4 Replacing a NetWare managed site with a profile manager 1/2
B-5 Replacing a NetWare managed site with a profile manager 2/2


Tables

2-1 Derived datatypes and related examples
2-2 Mapping of distribution options
3-1 objcall flag
3-2 idlattr flags
3-3 odbls flags
3-4 Common irview commands
3-5 tmstat flags
3-6 odstat arguments
3-7 odstat output
3-8 wtrace flags
4-1 Errors of allow_install_policy in its log file
5-1 Autotrace run-time libraries and control files
5-2 Autotrace channel number assignment for TMF product
6-1 List of error files after a server install
6-2 Error files
7-1 SIS log files
8-1 Files written during server installation
11-1 Administrator commands for troubleshooting
11-2 Default policies in a Task Library
11-3 Validation policies in a task library
11-4 Task Library commands
11-5 Scheduler commands
11-6 wregister flags
11-7 wconnect flags
11-8 Repeater flags
11-9 Timeout for distributions
11-10 Differences between gateway and managed node repeaters
11-11 MDist1 and MDist2: Comparison
11-12 Relationship between the MDist2 net_load and the connections
11-13 Relationship of MDist2 net_load, target_net_load, and the connection
11-14 Relation of disposable option and permanent_storage configuration
13-1 Parameter values
13-2 Endpoint reaction to gateways
15-1 Gateway trace components summary
16-1 Tivoli Enterprise Console exits for event forwarding
16-2 AMS types
16-3 Software configuration
16-4 List of TBSM base services components installed

© Copyright IBM Corp. 2003. All rights reserved.


16-5 Tivoli Business Systems Manager related programs and services
16-6 Tivoli Business Systems Manager services
18-1 Keywords
18-2 Socket error codes
18-3 Tracking events
18-4 Problem determination of tracking events
18-5 Files and directory structure of UNIX System Services
18-6 Trace levels
18-7 Tracelevel values
18-8 Tracedata
19-1 Change management status summary
19-2 Location of apm.ini, APM configuration file
19-3 Location of APM log files
19-4 Location of CCM configuration file
19-5 Location of CM log file
19-6 Settings for trace_level
19-7 Log file information
21-1 Global audit levels


Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.

You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.


Trademarks

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:

AIX®
Approach®
AS/400®
DB2®
Domino™
Hummingbird®
IBM®
IBM ™
Informix®
iSeries™
Lotus®
MVS™
NetView®
Notes®
OS/2®
OS/390®
OS/400®
PAL®
Parallel Sysplex®
Perform™
Planet Tivoli®
RAA®
RACF®
Redbooks™
Redbooks (logo)™
RMF™
S/390®
SecureWay®
Sequent®
ThinkPad®
Tivoli®
Tivoli Enterprise™
Tivoli Enterprise Console®
Tivoli Management Environment®
Tivoli/Sentry®
TME®
TME 10™
VTAM®
WebSphere®
Whistle®
z/OS™

The following terms are trademarks of other companies: ActionMedia, LANDesk, MMX, Pentium and ProShare are trademarks of Intel Corporation in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. NetWare is a trademark of Novell Corporation in the United States, other countries, or both. Sun, Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. C-bus is a trademark of Corollary, Inc. in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. SET, SET Secure Electronic Transaction, and the SET Logo are trademarks owned by SET Secure Electronic Transaction LLC. Other company, product, and service names may be trademarks or service marks of others.


Preface

This IBM Redbook is an update of the existing Tivoli Enterprise Internals and Problem Determination, SG24-2034 redbook. The material is revised and updated for Tivoli Management Framework and applications post Version 3.6. Some of the applications that are covered from the troubleshooting point of view in this redbook are:
• Tivoli Management Framework and related concepts
• Tivoli Enterprise Console
• IBM Tivoli Monitoring
• Tivoli Business Systems Manager
• Tivoli Enterprise Data Warehouse
• Tivoli Workload Scheduler
• IBM Tivoli Configuration Manager
• Tivoli Remote Control
• IBM Tivoli Access Manager for Operating Systems

Another subject that is associated with troubleshooting is proper maintenance of your Tivoli environment, because proper Tivoli maintenance procedures eliminate many potential Tivoli problems. In addition to the troubleshooting information, this redbook briefly touches on some best practices information for maintaining your Tivoli environment, mostly from the troubleshooting perspective. Do not forget that there is an excellent redbook called Maintaining Your Tivoli Environment, SG24-5013, which describes best practices for maintaining a Tivoli environment. We also made use of this book in some selected topics.

This redbook will be a major reference for Tivoli administrators in troubleshooting problems related to Tivoli Enterprise.

The team that wrote this redbook

This redbook was produced by a team of specialists from around the world working at the International Technical Support Organization, Austin Center.

Vasfi Gucer is a Project Leader at the International Technical Support Organization, Austin Center. He has worked for IBM Turkey for 10 years and has been with the ITSO since January 1999. He has more than 10 years of experience in the areas of systems management, networking hardware, and software on mainframe and distributed platforms. He has worked on various Tivoli customer projects as a systems architect in Turkey and the U.S. He writes extensively and teaches IBM classes worldwide on Tivoli software. Vasfi is also an IBM Certified Senior IT Specialist.

Orcun Atakan is a Software Engineer in IBM Tivoli Worldwide Education, Austin, where he has been working for two years. He holds a degree in Computer Engineering from Istanbul Technical University, Turkey. His areas of expertise include network design, IP security, security implementations, Java, and electronic commerce. Orcun has previously authored four IBM Redbooks.

Jamie Carl is a member of Tivoli Software's Global Response Team in Austin, Texas. A two-year veteran of IBM, Jamie was a customer prior to coming to IBM, bringing with him three years of experience with Tivoli doing global deployment as an architect, implementer, and support person. Since joining IBM, Jamie has made valuable contributions to the company by using his knowledge and expertise with Windows NT/2000, including the most recent work he has done with Tivoli Management Framework and Active Directory environments. Jamie has made several appearances at SHARE, Tivoli User Groups, and Planet Tivoli presenting Distributed Monitoring 3.7/4.1 and various other topics. Jamie is also one of the authors of IBM Tivoli Monitoring Version 5.1: Advanced Resource Monitoring, SG24-5519.

Budi Darmawan is a Tivoli specialist at the International Technical Support Organization, Austin Center. He writes extensively on various Tivoli solutions and systems management in general. Before joining the ITSO three years ago, Budi worked in IBM Global Services Indonesia as a Solution Architect. His current expertise is in general Tivoli solutions and database administration. He is a Tivoli Certified Enterprise Consultant and an IBM Certified Solution Expert in the DB2 Version 7.1 Family.

Murtuza Choilawala is an Advisory Software Engineer in Level 2 Tivoli Management Framework support with IBM Tivoli. He has 10 years of experience in Information Technology and has been providing technical support for Tivoli for the past two years. Before joining Tivoli, he worked with the IBM Advance Technical Support team in Rochester, MN providing technical support to IBM Business Partners and Software Developers on iSeries and AS/400 systems. He was also an AS/400 Product Manager while working with an IBM Business Partner in India.

Thanks to the following people for their contributions to this project:

Wade Wallace
International Technical Support Organization, Austin Center


Ella Buslovich
International Technical Support Organization, Poughkeepsie Center

Jeff Achtermann, Kevin Alexander, Jan Byrd, Chuck Camp, Gene Cherry, Mark Fantacone, Joseph Hamblin, Brian Graham, Jerry Moffitt, Jerry Saulman
IBM US

Peter Elliott, Gary R. Hamilton, Mike Hau
IBM UK

Chris Maddams
London IT Support, HSBC Bank, UK

The team also would like to thank the project teams of the following redbooks:
• Tivoli Enterprise Internals and Problem Determination, SG24-2034
• Maintaining Your Tivoli Environment, SG24-5013

In addition, the redbook team wants to express thanks to the following organizations:
• Orb Data Limited, UK (http://www.orb-data.com)
• deconetix GmbH, Germany (http://www.deconetix.com)

The team would like to express special thanks to Mike Hahn from IBM US.

Become a published author

Join us for a two- to six-week residency program! Help write an IBM Redbook dealing with specific products or solutions, while getting hands-on experience with leading-edge technologies. You'll team with IBM technical professionals, Business Partners and/or customers. Your efforts will help increase product acceptance and customer satisfaction. As a bonus, you'll develop a network of contacts in IBM development labs, and increase your productivity and marketability.

Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html


Comments welcome

Your comments are important to us! We want our Redbooks to be as helpful as possible. Send us your comments about this or other Redbooks in one of the following ways:
• Use the online Contact us review redbook form found at: ibm.com/redbooks
• Send your comments in an Internet note to: [email protected]
• Mail your comments to: IBM Corporation, International Technical Support Organization, Dept. JN9B Building 003 Internal Zip 2834, 11400 Burnet Road, Austin, Texas 78758-3493


Part 1. Introduction, installation, and core services


Chapter 1. Overview

This chapter gives an overview of the book, then covers some additional resources that you can use for troubleshooting Tivoli Enterprise problems. In this chapter, the following topics are discussed:
• Section 1.1, “Overview”
• Section 1.2, “Generic problem determination outline”
• Section 1.3, “User groups and other sources of information”


1.1 Overview

The Tivoli Enterprise Management suite aims to make distributed systems and application management relatively easy. It achieves this through a consistent interface and the use of models, such as management by subscription. While the systems administrator can perform many tasks with relative ease, the code Tivoli and its partners provide to achieve those tasks is extraordinarily complex. With the solid foundation of the Tivoli Management Framework, this complexity can remain largely masked from the administrator.

However, with such a sophisticated set of products, there will be occasions when those designing, testing, and implementing Tivoli solutions will encounter situations that are not resolved by reference to product manuals alone. In problem-solving situations, you need to understand what is going on between the product components, what messages and trace output mean, and what extra actions you can take to try to resolve a problem.

This book starts with what is probably the most difficult subject: the core of the Tivoli Management Environment, the Tivoli Object Database. One of the reasons applications fit in so well in the Tivoli Management Framework paradigm is the object-oriented nature of Tivoli Enterprise. No discussion of advanced problem-solving techniques would be complete without an understanding of the database Tivoli maintains for all objects in the management environment. If you are new to objects, methods, and the like, Chapter 2, “Tivoli Object Database architecture” on page 17 will be difficult reading. You should try to gain some level of understanding of what we are trying to describe there. If you are already familiar with a particular Tivoli application, you should find the information in the subsequent product chapters useful even without a clear understanding of Chapter 2, “Tivoli Object Database architecture” on page 17.

Next, we will delve into the details of Tivoli object database troubleshooting. Whatever problem you are investigating, there is a core set of tools and commands that you are likely to use. In Chapter 3, “Problem determination” on page 57, we will cover the commands, tools, and techniques available for troubleshooting Tivoli object database problems.

Chapter 4, “Using and maintaining log files” on page 117 deals with log files generated by Tivoli Enterprise. A good understanding of these log files will certainly help you troubleshoot Tivoli Enterprise.

Autotrace is a new function within Tivoli Management Framework 4.1. It is a third party tool originally developed by the Kernel Group Inc. (TKG). It is mainly used for capturing traces in memory continuously, with minimum impact to performance, while the product is running. Chapter 5, “Autotrace” on page 131 will cover this new tool.


If there are any problems installing a Tivoli product, then it is not likely to work in the desired manner. Therefore, the next topic we discuss is the installation process. The emphasis in Chapter 6, “Tivoli core installation process” on page 145 is the installation of the Tivoli Management Framework, but much of the information will be relevant to patches and products since they all use the same or similar installation methods. Chapter 7, “Software Installation Service (SIS)” on page 185 covers the SIS internals and troubleshooting. SIS continues to be the preferred method for mass Tivoli installations. InstallShield Multi-Platform (ISMP) technology is part of Tivoli’s Install Imperative and IBM’s install strategy to achieve two major goals: consistent install and simplified maintenance. IBM Tivoli Configuration Manager is the first product that uses this technology for installation of its components and this feature is called Integrated Install. Chapter 8, “ISMP based installation (Integrated Installation)” on page 211 covers Integrated Install. In the future, all Tivoli products are expected to support this installation technology. Chapter 9, “Patch maintenance” on page 259 deals with a subject that goes side by side with installation: Patch maintenance. If we had to pick only one message to get across in this book, it would be the importance of regular and appropriate backups. This is especially true when troubleshooting. You may need a good backup to return to, or you may need to take backups before performing some procedure. We talk about the backup process in Chapter 10, “Backup and restore” on page 277. Most of the applications make regular use of Tivoli Management Framework services. In Chapter 11, “Tivoli Management Framework core services” on page 305, we take a look at how these services function and the steps you can take if they are not functioning correctly. 
A lot can depend on getting the right administrative authority, checking in the right place for notices, understanding the implications of interconnected TMRs, and so on.

No troubleshooting redbook is complete without covering the RDBMS Interface Module (RIM). RIM has an essential job in Tivoli Enterprise, which is to allow the applications to have a common set of APIs to get and store data. Chapter 12, “RDBMS Interface Module (RIM)” on page 441 covers details of RIM troubleshooting.

The introduction of the Tivoli Management Agent has been the most significant innovation in the systems management arena since Tivoli first introduced Tivoli Management Framework based management. We dedicated a chapter to the management of endpoints. Chapter 13, “Endpoints and endpoint management” on page 463 delves into the details of endpoints, endpoint gateway, and endpoint manager. Healthy operation of this trio is essential for the successful working of your Tivoli environment.

The following chapters look at the Enterprise applications themselves. Tivoli recently grouped Tivoli Enterprise applications into four groups according to their functions: Configuration and Operations, Performance and Availability, Security, and Storage Management. We will also follow this classification in this redbook and cover the first three of these groups. Accordingly:
• Chapter 14, “Tivoli Enterprise Console” on page 525, Chapter 15, “IBM Tivoli Monitoring” on page 617, Chapter 16, “Tivoli Business Systems Manager” on page 673, and Chapter 17, “Tivoli Enterprise Data Warehouse” on page 757 cover Performance and Availability applications.
• Chapter 18, “Tivoli Workload Scheduler” on page 793, Chapter 19, “IBM Tivoli Configuration Manager” on page 867, and Chapter 20, “Tivoli Remote Control” on page 949 cover Configuration and Operations applications.
• Chapter 21, “IBM Tivoli Access Manager for Operating Systems” on page 959 covers the Security group.

Also, in Appendix A, “Tivoli/Windows whitepaper” on page 973, you will find the Tivoli implementation on Windows whitepaper, which is updated for the latest versions of Windows. Finally, the Tivoli/NetWare whitepaper is covered in Appendix B, “Tivoli/NetWare whitepaper” on page 1043.

Note: Throughout the book, the term Windows NT will be used for both Windows NT and Windows 2000. If there is any special consideration for one of the versions, it will be explicitly specified.

1.2 Generic problem determination outline

If you start to receive errors, and you have questions about the cause, this generic outline for problem determination may help. If you have a scenario that you can re-create, the following is a generic list of steps to perform to gather documentation:

To obtain an overall picture:
1. Run odadmin odlist to determine the number of machines; keep for reference purposes.
2. Run odadmin alone to get information, such as the port range restrictions (if any) in place.
3. Run odadmin environ get to determine the environment the oserv is using.

To gather data from each suspected machine:
1. Log on as root and as a Tivoli root administrator. This helps ensure you are not experiencing authority problems.
2. Run odadmin trace errors, then odadmin trace objcalls, and then odadmin trace services, in that order.
3. Re-create the problem. On every involved machine, including the TMR server:
   a. Run odstat -v >odstat.txt.
   b. Run odadmin db_sync to flush any data held by the oserv to disk.
   c. Run wtrace -jkH $DBDIR >wtrace.txt (or %DBDIR% for Windows NT).
   d. Collect the .txt files plus oservlog and any useful system logs.
   e. Set odadmin trace off.
   f. Then run odadmin trace errors to revert to the default of just logging errors.

The trace should help you determine the failing objcall.

Note: When capturing the trace information, it is better to capture the full trace so that reproduction time of the problem is minimized.
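The outline above can be written down as a dry-run plan before anything is executed on a managed node. The following Python sketch only assembles the command strings named in the outline, in the order given; it does not run them. The function and variable names (build_plan, collection_commands, and so on) are our own illustrative choices and are not part of any Tivoli tool.

```python
"""Dry-run planner for the generic problem determination outline."""

# Commands for the overall picture, in the order the outline lists them.
OVERVIEW_COMMANDS = [
    "odadmin odlist",
    "odadmin",
    "odadmin environ get",
]

# Tracing is switched on in this order before the problem is re-created.
TRACE_ON_COMMANDS = [
    "odadmin trace errors",
    "odadmin trace objcalls",
    "odadmin trace services",
]


def collection_commands(windows=False):
    """Commands to run on every involved machine after the re-creation."""
    dbdir = "%DBDIR%" if windows else "$DBDIR"
    return [
        "odstat -v >odstat.txt",
        "odadmin db_sync",
        "wtrace -jkH " + dbdir + " >wtrace.txt",
        "odadmin trace off",
        "odadmin trace errors",  # back to the default of logging errors only
    ]


def build_plan(windows=False):
    """Return the full ordered list of steps as plain strings."""
    plan = list(OVERVIEW_COMMANDS)
    plan += TRACE_ON_COMMANDS
    plan.append("# re-create the problem here")
    plan += collection_commands(windows)
    return plan


if __name__ == "__main__":
    for step in build_plan():
        print(step)
```

Printing the plan first gives you a checklist to follow by hand; collecting oservlog and the system logs remains a manual step because their locations depend on the installation.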

1.3 User groups and other sources of information

Next, we will list several Web sites (and their content) that deal with the usage and optimization of the Tivoli environment.

1.3.1 Tivoli Field Guides

The Tivoli Field Guides from Tivoli professionals represent a new level in customer support by an Enterprise Systems Management company. This on-going initiative seeks to share the experience of Tivoli customers and Tivoli implementation experts with other Tivoli customers in order to provide the information needed to succeed.

Note: To access the Tivoli Field Guides, you have to be registered as a Tivoli user and authorized to access the secured pages of the Tivoli Customer Support Web site.


• Tivoli Field Guides – For Technical Issues
  These papers are designed to address specific technical scenarios or concepts, which are often complex to implement or difficult to understand. Some of the subjects discussed are: Endpoint Mobility, Migration, Heartbeat Monitoring, Getting the Most out of Traces, Endpoint Policy Issues, and so on.
• Tivoli Field Guides – For Business Issues
  These papers are designed to address specific business practices that have a high impact on the success or failure of an ESM project. The business papers cover the following topics: Change Management, Asset Management, Project Management, Developing Requirements, Building a case for a Test Environment, Going for the Quick Win, Understanding the Phases of Deployment, and so on.

Please refer to the following Web site for these guides:
http://www-3.ibm.com/software/sysmgmt/products/support/Field_Guides.html

1.3.2 Tivoli mailing list

Tivoli user/forum information can be found at:
http://publib-b.boulder.ibm.com/redbooks.nsf/Portals/TivoliCustom1

Important: Please note that the previous Tivoli mailing list was hosted on a different Web site. The TME10 list is used to discuss products by Tivoli Systems and partner products produced by members of the TME10+ Association. The list is available in a digest format (tme10-digest); individual list messages are compiled into a single e-mail message and a digest is sent out either daily or after the digest exceeds a certain size. For subscription to the TME10 list, see the policies on the Tivoli User Groups Web site shown in Figure 1-1 on page 9.


Figure 1-1 Tivoli User Groups web site

Figure 1-2 on page 10 shows the Download page of the Tivoli User Groups web site.


Figure 1-2 Tivoli User Groups web site

Current and archived Tivoli user/forum information can be found at the FTP site (see Figure 1-3 on page 11): ftp://www.redbooks.ibm.com/redbooks/tme10_archive/


Figure 1-3 FTP site for archived messages

1.3.3 Orb Data Limited

The Web site for Orb Data provides excellent information related to the Tivoli product suite and architecture:
http://www.orb-data.com/TechExchange.html

Figure 1-4 on page 12 shows Orb Data’s Technical Exchange page with links to journals, presentations, scripts, tips, and further Tivoli related Web sites.


Figure 1-4 Orb Data’s Technical Exchange page

When searching for tips and/or scripts, Orb Data provides technical resources for several categories, as shown in Figure 1-5 on page 13.


Figure 1-5 Orb Data’s Technical Resources page

In addition, it is possible to just list the most recent documents, as illustrated in Figure 1-6 on page 14.


Figure 1-6 Orb Data’s Most Recent Documents page

Figure 1-7 on page 15 shows Orb Data’s Journals page, which lists technical journals produced by Orb Data professionals.


Figure 1-7 Orb Data’s Journals page

For further information related to Tivoli, Orb Data has several other Web sites, as illustrated in Figure 1-8 on page 16.


Figure 1-8 Orb Data’s Links page

1.4 Other Web sites containing Tivoli related information

Other sources of Tivoli related information are found at:
http://www-3.ibm.com/software/sysmgmt/products/support/
ftp://ftp.software.ibm.com/software/tivoli_support/


Chapter 2. Tivoli Object Database architecture

This chapter explains Tivoli Enterprise’s object-oriented environment. This is crucial to help gain an in-depth understanding of how Tivoli works and to perform advanced problem determination activities without jeopardizing the integrity of the Tivoli management database. Reading this material will also help you become familiar with the terminology that is commonly used when working with Tivoli’s platform and applications.

This is a very complex subject. If you find this chapter too complex, but have Tivoli application experience, you should find plenty of other information in the later chapters that will still be of use to you. The definitions we have given are not necessarily 100 percent compliant with the CORBA specifications. Instead, we have tried to relate the various terms used to illustrate the hierarchy and the Tivoli implementation. An existing knowledge of Object Oriented technology will help you understand this chapter. Otherwise, persevere! You may need to read the chapter through more than once to grasp all the concepts presented.

The following topics are covered in this chapter:
• Section 2.1, “The Tivoli Enterprise management challenge”
• Section 2.2, “Tivoli Enterprise architecture overview”
• Section 2.3, “Tivoli Management Framework concepts”


2.1 The Tivoli Enterprise management challenge

Tivoli Enterprise is a suite of distributed systems management products that address the following system management needs in a distributed computing enterprise:


Heterogeneity

Runs on many different platforms. System administrators need not be concerned with the machine architecture. The network environment can support multiple architectures, so the management platform must be heterogeneous to reduce administrative complexity.

Interoperability

Enables many different platforms to operate together. A system administrator using one machine type can manage resources on other machine types regardless of the architecture. Such interoperability extends heterogeneity, enabling system administrators to seamlessly manage any type of machine from any other type of machine.

Scalability

Handles large computing enterprises. Managing networks comprising thousands of nodes can produce serious difficulties for system administrators. Using Tivoli Managed Regions (TMRs), system administrators can easily distribute changes (such as creating a new user) to large networks.

Distributed

Provides services across distributed systems, spreading the systems management work load. Not only do managed nodes maintain their own object databases, applications share the management burden by allowing major tasks to be handled by separate machines. Examples include TEC and RIM servers and Distributed Monitoring engines.

Robust APIs

Enables all products and customer-developed applications to work together and leverage standard APIs.

Dependability

System management transactions ensure consistency and can back out half-completed operations across the network. This can be very important in large distributed environments where multiple administrators can simultaneously perform operations.

CORBA

The Object Management Group (OMG) proposed CORBA 1.1 as a standard for all common Object Request Broker (ORB) systems. Tivoli Enterprise is compliant with this standard.


2.2 Tivoli Enterprise architecture overview

The Tivoli Enterprise applications all share a common framework, the Tivoli Management Framework. The Tivoli Management Framework is an open, object-oriented framework that includes a set of managers, brokers, and agents that conform to the Object Management Group (OMG) Common Object Request Broker Architecture (CORBA) specifications. This technology allows major differences between computer operating systems to be hidden from the Tivoli user and, to some extent, the applications. It allows key services to be encapsulated in objects that can be used by multiple management applications.

The Tivoli Management Framework provides platform-independence, a unifying architecture for all applications, and the ability for third-party vendors to easily adapt their offerings or plug them into the framework, allowing systems administrators to manage a wide variety of IT resources in a consistent way. In addition, a robust set of APIs and services enables customers to write their own applications that plug into and leverage the Tivoli Management Framework.

Tivoli Enterprise represents several major advancements for managing large networks of heterogeneous, distributed systems. It has two primary components: a comprehensive management platform (Tivoli Management Framework) and a set of X/Open-compliant APIs. The Tivoli Management Framework is built around an implementation of the OMG CORBA 1.1 environment. It also provides an implementation of the enabling services adopted by X/Open as the basis for a systems management framework.

2.2.1 About CORBA 1.1

The Object Management Group (OMG) is a non-profit, international association of more than 300 companies. Its goal is to define an architectural object framework through a series of detailed interface specifications. OMG's CORBA specification introduces the Interface Definition Language (IDL) and the concepts of an object request broker (ORB) and basic object adapter (BOA). The ORB and BOA provide a mechanism for invoking objects and returning the results to requestors.

The CORBA 1.1 specification presents an open system of service requestors and service providers in which the requestors are isolated from the providers. Requests are initiated without regard to the location or implementation of the service provider. The service provider could be on the same machine or on another system of a different architecture somewhere across the network.


CORBA 1.1 specifies interfaces to a set of low-level object services. It does not, however, specify implementation, security, or installation. Nor does it offer a means for multi-vendor ORB interoperability or C++ language bindings. These are up to the implementor to determine and, as in the case of the Tivoli Management Framework, can significantly enhance the function of a CORBA-based product.

The architecture uses three concepts to achieve the integration of a wide variety of object services in many different languages and systems:
• Object encapsulation: The object providing a requested service does so within its own context, which means that each object has the ability to respond differently to the same request. Thus, two different objects can support the same interface, and each can maintain a different implementation of that interface.
• Complete service requestor/provider isolation: Allows service requestors to make requests of a provider regardless of the provider location or implementation. A service request includes a service identifier (operation name), a provider identifier (object reference), and other optional data.
• Interface and implementation separation: Interfaces are defined without regard to the way in which they are implemented.

Within the CORBA 1.1 architecture, there are three primary components: the client, the object implementation, and the ORB/BOA. The client is the requestor of a service that an object implementation provides. The ORB delivers the request from the client to the object implementation through the BOA. The object implementation then performs the requested service, and any return data is delivered back to the client. Note that the client and object may or may not reside on the same physical computer system.

The client and the object implementation are isolated from each other. Neither has any knowledge of the other except through their interfaces to the ORB and BOA. Client requests are independent of the object implementation location and the programming language in which they are implemented. Furthermore, clients and object implementations are not capable of direct communication (clients can only initiate requests, and object implementations can only provide services at the request of a client).

Figure 2-1 on page 21 shows the steps involved when a client requests an operation of some object implementation. The request is shown as step 1.
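The three concepts listed above (encapsulation, requestor/provider isolation, and interface/implementation separation) can be illustrated outside of CORBA. The following is a minimal Python sketch, not Tivoli or CORBA code: the interface Describable, the two object classes, and the request function are hypothetical names chosen only for this illustration.

```python
from abc import ABC, abstractmethod


class Describable(ABC):
    """Hypothetical interface: names an operation but not how it is done."""

    @abstractmethod
    def describe(self):
        ...


class FileSystemObject(Describable):
    """One implementation of the interface, with its own private state."""

    def __init__(self, mount_point):
        self._mount_point = mount_point

    def describe(self):
        return "file system mounted at " + self._mount_point


class PrinterObject(Describable):
    """A second implementation that answers the same request differently."""

    def __init__(self, queue):
        self._queue = queue

    def describe(self):
        return "printer serving queue " + self._queue


def request(provider, operation):
    """A request carries a provider reference and an operation name.

    The requestor never inspects how the provider does its work.
    """
    return getattr(provider, operation)()


if __name__ == "__main__":
    for obj in (FileSystemObject("/usr"), PrinterObject("lp0")):
        print(request(obj, "describe"))
```

Both objects support the same interface, yet each responds to the describe request from within its own context, which is the behavior the encapsulation bullet describes.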


[Figure: request flow between the Client, Client Stub, Dynamic Invocation Interface, Direct ORB Interface, Object Request Broker, Basic Object Adapter (BOA), and Server Skeleton, with the client request and the returned results marked as steps 1 through 6]

Figure 2-1 CORBA operation request

The ORB delivers the request to the BOA (step 2) that activates the process under which the object implementation runs. The BOA then invokes the method associated with the request by way of the server skeleton (step 3). When the method is completed, the BOA manages the termination of the method (step 4) and coordinates the return of any results to the client (steps 5 and 6). Alternatively, if a request is unknown until runtime, the Dynamic Invocation Interface (DII) is used to build a request that is used in place of a client stub linked at compile time.
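The request path described above can be modeled in a few lines to make the division of labor visible. The following Python sketch is a toy, single-process model under our own assumptions: the class names mirror the CORBA roles, but the method names (send, invoke, dispatch, register), the Calculator object, and the object reference "calc-1" are hypothetical and do not correspond to any Tivoli or CORBA API.

```python
class Calculator:
    """Hypothetical object implementation that provides one service."""

    def add(self, a, b):
        return a + b


class ServerSkeleton:
    """Maps an operation name onto a method of the object implementation."""

    def __init__(self, implementation):
        self._impl = implementation

    def dispatch(self, operation, args):
        return getattr(self._impl, operation)(*args)


class BasicObjectAdapter:
    """Activates implementations on demand and invokes them via skeletons."""

    def __init__(self):
        self._factories = {}  # object reference -> factory for the object
        self._active = {}     # object reference -> skeleton once activated

    def register(self, object_ref, factory):
        self._factories[object_ref] = factory

    def invoke(self, object_ref, operation, args):
        if object_ref not in self._active:  # activate on first request
            self._active[object_ref] = ServerSkeleton(self._factories[object_ref]())
        return self._active[object_ref].dispatch(operation, args)


class ObjectRequestBroker:
    """Delivers requests to the adapter and hands results back."""

    def __init__(self, adapter):
        self._adapter = adapter
        self.log = []  # record of (object reference, operation) pairs

    def send(self, object_ref, operation, args):
        self.log.append((object_ref, operation))
        return self._adapter.invoke(object_ref, operation, args)


class CalculatorStub:
    """Client stub: the interface is known when the client is written."""

    def __init__(self, orb, object_ref):
        self._orb = orb
        self._ref = object_ref

    def add(self, a, b):
        return self._orb.send(self._ref, "add", (a, b))


def dynamic_invoke(orb, object_ref, operation, *args):
    """Stand-in for dynamic invocation: the request is built at run time."""
    return orb.send(object_ref, operation, args)


if __name__ == "__main__":
    boa = BasicObjectAdapter()
    boa.register("calc-1", Calculator)
    orb = ObjectRequestBroker(boa)
    print(CalculatorStub(orb, "calc-1").add(2, 3))
    print(dynamic_invoke(orb, "calc-1", "add", 10, 20))
```

The client only ever holds a stub or an object reference; the Calculator instance is created by the adapter on the first request, which loosely corresponds to the activation performed in step 2.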

2.2.2 Tivoli Enterprise CORBA implementation

On top of the CORBA ORB and the enabling services are a set of management services, user interface services, and advanced application services. These combined services form the application programming interface to which systems management applications are written and make up the Tivoli Advanced Development Environment (ADE). Figure 2-2 on page 22 identifies the application interfaces and their relationships to each other.


[Figure: stacked layers, in the order listed in the figure: Tivoli and Third-Party Applications; Advanced Application Services; UI Services; Enabling Services; CORBA ORB plus Security; Operating System & Transport]

Figure 2-2 Tivoli Enterprise application interfaces

You can choose to write to one or more layers from this API or substitute alternate services and libraries from third parties as appropriate. Tivoli provides support for the same programming interfaces across all Tivoli Management Framework-supported architectures, which provides a significant portability layer. Currently, some of these interfaces are better documented than others. Tivoli continues to strive to be a genuinely open platform and API documentation gets better with every revision.

2.2.3 Tivoli Enterprise heterogeneity and interoperability

Tivoli Enterprise supports heterogeneous systems management. This means that the Tivoli Server can be any supported architecture type, and that Tivoli clients can be any mix of supported architecture types.

In heterogeneous networks, applications have traditionally been required to explicitly cope with the data requirements of each platform. Each time the application transmits data, it must be able to convert the data from the native format to the destination format. The Tivoli Management Framework ORB removes this consideration from application programming.

When a request to run an operation on some object is made, a client stub initiates the request, collects the data associated with the request, and converts the data from its current format to a common format. This process (known as data marshalling) is performed in accordance with the ASN.1 standard. ASN.1 converts data to a canonical or simplistic form for transmission to a machine of an undetermined platform type.


When the data conversion is complete, the client stub passes the marshalled data to the ORB. The ORB then sends the data to the BOA and ultimately to the appropriate server skeleton. The server skeleton then reformats the data according to the requirements of the destination object implementation.
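The idea of marshalling to a common format can be shown with a small round trip. The following Python sketch is deliberately not ASN.1: as a stand-in for a canonical encoding, it uses the standard struct module with network (big-endian) byte order, so that the bytes are the same whatever the native format of the sending machine. The request layout (an object number, an operation name, and one numeric argument) and the function names are assumptions made for this illustration only.

```python
import struct

# Fixed-size header: 32-bit unsigned object number, 16-bit name length.
# The "!" prefix selects network byte order with no padding.
HEADER = "!IH"


def marshal(object_number, operation, value):
    """Client-stub side: convert native values to the common byte layout."""
    name = operation.encode("utf-8")
    return (struct.pack(HEADER, object_number, len(name))
            + name
            + struct.pack("!d", value))


def unmarshal(data):
    """Server-skeleton side: rebuild native values from the byte layout."""
    object_number, name_len = struct.unpack_from(HEADER, data, 0)
    offset = struct.calcsize(HEADER)
    operation = data[offset:offset + name_len].decode("utf-8")
    (value,) = struct.unpack_from("!d", data, offset + name_len)
    return object_number, operation, value


if __name__ == "__main__":
    wire = marshal(348, "set_threshold", 1.5)
    print(wire.hex())
    print(unmarshal(wire))
```

Neither side needs to know the byte order or word size of the other; each converts only between its own native representation and the agreed layout, which is the consideration the ORB removes from application code.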

2.2.4 Management services

The system management framework fits into the X/Open reference model and is built on top of an OMG CORBA 1.1 foundation. It provides a set of enabling management services for applications. These services include policy, extensibility, scheduling, collections, and instance management. When used with other interfaces and services found in Tivoli, they enable the development of sturdy, feature-rich systems management applications.

Figure 2-3 illustrates the X/Open reference model and shows the management services component indicating those areas that the Tivoli Management Framework targets.

[Figure 2-3 labels: User Interface; Hosts; File System; Data Source; Print; Common Facilities; Management Applications; Object Request Broker; Object Services; Management Services; Managed Objects; Customization, Scheduling, Instance Manager, Collections; Object Interface, Non-Object Interface]

Figure 2-3 Management services in the X/Open reference model

Tivoli specifically focuses on managing policy-driven objects. This management includes the mechanisms and facilities that enable the establishment and enforcement of policy on these objects.

Chapter 2. Tivoli Object Database architecture


2.3 Tivoli Management Framework concepts

The following section describes basic information about objects (self-sufficient program modules), which serve as the building blocks for Tivoli Management Framework and its associated applications.

Important: For this chapter, you should have experience with Tivoli Management Framework and its core applications, and working knowledge of tools like idlcall, idlattr, and objcall. We will use these commands to change object attributes without verification warnings, as well as commands that delete objects from the database. Exercise caution when using these commands.

2.3.1 Object identification

Each object has a unique object identifier (OID) and a label attribute. Most objects also have a resource type. A resource is an object that is registered in the Tivoli Name Registry (TNR). Collection objects act as containers of other objects. Collection objects also have a label attribute. If you want to identify an object, you can do so by its OID, label, or its collection. For example:
- OID: 1234567890.1.348
- Type/Label: @ManagedNode:hannover
- Collection: /Regions/Root_hannover-region/hannover
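As a small illustration of the first two forms, the following Python sketch splits an OID and a Type/Label reference into their parts. The field names region, dispatcher, and number are assumptions made for this sketch; the text above only shows that an OID consists of three dot-separated numbers.

```python
from typing import NamedTuple

class Oid(NamedTuple):
    region: int      # assumed meaning of the first field
    dispatcher: int  # assumed meaning of the second field
    number: int      # assumed meaning of the third field

def parse_oid(text):
    """Split a dotted OID such as '1234567890.1.348' into three integers."""
    parts = text.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated fields: %r" % text)
    return Oid(*(int(p) for p in parts))

def parse_type_label(text):
    """Split '@ManagedNode:hannover' into (resource type, label)."""
    if not text.startswith("@") or ":" not in text:
        raise ValueError("expected @Type:label: %r" % text)
    rtype, label = text[1:].split(":", 1)
    return rtype, label

print(parse_oid("1234567890.1.348"))
print(parse_type_label("@ManagedNode:hannover"))
```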

2.3.2 Interfaces and classes

An interface is a specification of behavior, attributes, constant declarations, exceptions, and datatypes. An interface defines these elements publicly, so other objects can access or inherit them.

The structure of an object is defined by its interface and its class. An interface specification defines public attributes, and a class specification defines private attributes. Sometimes the attributes of the interface are also defined as part of the class. Sometimes the attributes of the interface are implemented by the class as a method, that is, the object appears to have attributes that are stored in the repository, but they are actually calculated by a method.

For example, a Circle interface might declare two attributes: radius and area. A Circle class would declare an attribute radius, and a method _get_area that would calculate a value for the area attribute based on the value of the radius attribute.
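The Circle example can be sketched in Python as follows. This is an analogy only, not TEIDL: radius is a stored attribute, while area looks like an attribute to the caller but is calculated by a method each time it is read.

```python
import math

class Circle:
    """radius is stored; area is computed on each read."""

    def __init__(self, radius):
        self.radius = radius  # persistent attribute

    def _get_area(self):
        # Plays the role of the _get_area method mentioned in the text.
        return math.pi * self.radius ** 2

    # Callers read circle.area as if it were a stored attribute.
    area = property(_get_area)

c = Circle(2.0)
print(c.area)
c.radius = 3.0
print(c.area)  # reflects the new radius without any explicit update
```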


Objects are created, or instantiated, from Class objects, or instance managers. Class objects can inherit from other Class objects, which allows the behavior of the inherited classes to be shared by the derived Class object. There are two different types of classes: Abstract Class and Instantiable Class. You can only instantiate an object from an Instantiable Class. Abstract Classes are used to provide a base class from which other classes may be derived.

2.3.3 Object repository architecture

The repository stores data in several files located in the database directory ($DBDIR) on each node (see Figure 2-4 on page 26):
- Attributes in the object database
- Methods and inheritance in the inherited method database
- Notices in the notice database

Each of the files has a log file (odb.log, imdb.log, or notice.log), which is used to store pages that have been updated. This allows the pages to be rewritten (or rolled back) if an error occurs during the update. This is also used to back out of transactions that have been aborted.

The logical structure of the repository is defined by the objects that are stored in it. Each object can have zero or more attributes, each of which requires space in the repository. So, knowing the attributes of an object tells us what data is stored in the repository for each object. The data is referenced by a key based on the OID, a key type, and either an attribute name, a method name, or an inherited OID. The oserv process searches the repository files for a given key until it finds the correct entry. The entry contains the information necessary to retrieve the data.


[Figure 2-4 labels: OSERV; NOTICE DATA (notice.bdb - on server); METHOD DATA (imdb.bdb - on server); ATTRIBUTE DATA (odb.bdb - on all managed nodes)]

Figure 2-4 Object repository architecture

Attributes

Many of the services require one or more pieces of data (called an object attribute) in order to correctly execute. An attribute has a name and a datatype associated with an object. These data attributes must be persistent, that is, they must retain their value from one action to another. The persistence of attributes is achieved by storing them in the odb.bdb file of the object repository. They are accessed using a key formed from the OID and attribute name. The key contains a pointer to the actual data itself.

Methods

A method is stored as a definition in the imdb.bdb file of the repository. The method is accessed using a key formed from the method’s object identifier and the tag .meth.. The method key points to the method definition data storage location. The method definition data includes specific information about the execution, security, and inheritance of each method. This information includes the user ID, group ID, supported architectures, execution model, executable storage location, signature, implementation ID, access control lists (ACLs), and method roles.

Two other TRUE/FALSE characteristics are also stored: the execute and export attributes. An execute characteristic of TRUE means that the method can be executed in the context of the object that stores the method definition. An export characteristic of TRUE means that the method defined can be inherited by other objects when its object is inherited.


Inheritance and method resolution

Objects can inherit from one or more other objects. This means that an object can define a set of its own methods and attributes, and can also specify support for another object’s methods and attributes through inheritance. Given an inheritance structure, each client method request must be resolved to find the object that stores the method definition. The resolution method reads the inheritance information in the imdb.bdb file, and searches for an object that contains the method definition for the method being requested.

The resolve process returns an OID. The OID is used to first get the method header (part of the method definition) information, which includes the list of valid architectures. The oserv checks the architecture list to make sure that the interpreter type of the target is supported. Then, using the architecture value, it looks up the rest of the method definition consisting of storage and model characteristics. Using this information, it executes the method using a “fork” of the oserv and an “exec” of the method’s executable.
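The resolution steps above can be sketched in Python. This is a toy model under stated assumptions: the dictionaries PARENTS and METHODS are hypothetical stand-ins for the data in imdb.bdb, the object and method names are invented, and the depth-first search order is an assumption, since the text does not specify the order in which inherited objects are searched.

```python
# Hypothetical in-memory stand-ins for the inheritance and method data.
PARENTS = {"instance": ["extension"], "extension": ["behavior"], "behavior": []}
METHODS = {
    ("behavior", "get_label"): {"archs": ["aix4-r1", "solaris2"]},
    ("extension", "get_label"): {"archs": ["aix4-r1"]},
}

def resolve(oid, method):
    """Depth-first search for the first object that defines the method."""
    if (oid, method) in METHODS:
        return oid
    for parent in PARENTS.get(oid, []):
        found = resolve(parent, method)
        if found is not None:
            return found
    return None

def lookup(oid, method, interp):
    """Resolve the method, then check the interpreter type of the target."""
    owner = resolve(oid, method)
    if owner is None:
        raise LookupError("no definition of %s for %s" % (method, oid))
    if interp not in METHODS[(owner, method)]["archs"]:
        raise LookupError("%s not supported on %s" % (method, interp))
    return owner

print(lookup("instance", "get_label", "aix4-r1"))
```

The fork and exec step is deliberately left out; the sketch stops at the point where the framework would know which definition to execute.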

Collection objects

Collections are used to organize objects. There are three types of collections:
- Basic collection
- Filtered collection
- Nested collection

Each of these collections is provided by Tivoli as an inheritable abstract class. A collection contains lists (or sets) of other objects in the repository. Starting with the top-level collections, it is possible to browse the repository or objects using the collection hierarchy. Nested collections can have other collections as members. This results in a hierarchy.

Browsing collections is accomplished by using the wls command. Objects in collections and collections themselves are identified using an object path that reflects the object hierarchy. Object paths can be relative or absolute through the concept of a default collection. To change the default collection, use the wcd command. Object path notation supports both the “.” and “..” notation common to UNIX file directory commands.
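The way relative and absolute object paths relate to a default collection can be sketched in Python. The sketch simply reuses UNIX path rules, as the text suggests; it is an assumption that wcd and wls follow them in every detail, and resolve_path is a name invented for this illustration.

```python
import posixpath

def resolve_path(default_collection, path):
    """Resolve an object path against the current default collection."""
    # Absolute paths ignore the default collection; relative ones are joined.
    # normpath then folds away "." and ".." components.
    return posixpath.normpath(posixpath.join(default_collection, path))

here = "/Regions/Root_hannover-region"
print(resolve_path(here, "hannover"))
print(resolve_path(here, ".."))
print(resolve_path(here, "/Library"))
```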


[Figure 2-5 labels: Library; Administrators; Regions; ManagedNode; Root_host-region; TOPAS; RIVAS; Host Test]

Figure 2-5 Example collection hierarchy

Figure 2-5 contains an example of a collection hierarchy. The top (or /) of the hierarchy is the collection of distinguished objects. This is a logical model of how objects are organized into collections. This is not a physical location of objects; objects exist only in the object repository (the odb.bdb). They are linked into various collections in order to keep track of them, and to associate or group objects together for a common purpose.

Resources

A resource is an object that is registered in the Tivoli Name Registry (TNR). Each resource has a resource type. The type is, by convention, the label of the object’s Class object. Each resource type has a Class object registered under the Classes resource type. Each resource type is a list of ObjectInfo data elements. Each new class creates a new resource type, so the total list of all resource types is a dynamic list that can be changed.

Tivoli Name Registry

The TNR is the main directory service for Tivoli. With the wlookup command, you can get the OID of any registered resource. The TNR is implemented as a set of data pages in the repository, which are dedicated to storing lists of resources by their resource type and label. Each resource type has a separate master page index containing entries for all the resources of that type. The fact that each resource type has a separate index in the master page greatly improves response time for this critical repository component. See Figure 2-6 on page 29 for more details.
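The idea of one index per resource type can be sketched as a dictionary of dictionaries. This is a toy model, not the page layout of the real TNR; the class NameRegistry and its methods are hypothetical names used only for illustration.

```python
class NameRegistry:
    """Toy registry: one label-to-OID index per resource type."""

    def __init__(self):
        self._index = {}

    def register(self, rtype, label, oid):
        # Create the index for a new resource type on first use.
        self._index.setdefault(rtype, {})[label] = oid

    def lookup(self, rtype, label):
        # Only the index of the requested type is searched.
        return self._index[rtype][label]

    def labels(self, rtype):
        return sorted(self._index.get(rtype, {}))

tnr = NameRegistry()
tnr.register("ManagedNode", "hannover", "1234567890.1.348")
print(tnr.lookup("ManagedNode", "hannover"))
```

Because a lookup touches only the index of one type, the cost does not grow with the number of resources registered under other types, which is the benefit the text attributes to the separate indexes.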


[Figure 2-6 labels: resourcetyp1 with instance1, instance2, instance3 and oid1, oid2, oid3; resourcetyp2 with instance11, instance12, instance13 and oid11, oid12, oid13]

Figure 2-6 Relationships in the TNR

Distinguished resources

Distinguished resources are a special class of resources. They generally hold TMR-wide data or provide a TMR-wide service. For this reason, they are single instance objects and necessarily have a class object. There is only one instance of a distinguished resource per TMR. Examples of distinguished resources are:
- Tivoli Name Registry
- Scheduler

Substrate objects

Substrate objects are so called because they operate at the lowest level of Tivoli Management Framework. They provide key methods and attributes that are necessary for the correct operation of Tivoli Management Framework overall. Two substrate objects exist on each managed node: the base object and the oserv object. The base object is what all other objects inherit from. The oserv object is the object dispatcher. A third object exists only on the TMR Server. This is called the ALI, or the security object. ALI is a term that refers to the Authorization, Location, and Inheritance actions the oserv performs as part of processing each method request. The OIDs for these objects are always the same:
- Base object: ..0 (aliased to 0.0.0)
- Oserv object: ..2
- ALI object: .0.0

The methods of these objects are intrinsic, which means they are implemented by the oserv executable (internal methods). The oserv executable is a large, multi-threaded process that invokes requested methods using a fork and exec mechanism.

Base object

The base object is the building block for all other objects in a given repository. Base objects exist on each managed node. Each node has an alias for the base object, which is 0.0.0. The real base object’s OID is always the ..0 on each dispatcher. The base object provides fundamental attributes and methods, which are accessible by all other objects on a given node. Figure 2-7 shows the contents of the base object.

ATTRIBUTE:HostLocation
ATTRIBUTE:NameRegistry
ATTRIBUTE:baselist
ATTRIBUTE:fileioRef
ATTRIBUTE:master_base_oid
ATTRIBUTE:oserv
ATTRIBUTE:security_objid
ATTRIBUTE:skeleton
METHOD:addattr
METHOD:bo_set_acl
METHOD:clone
METHOD:contents
METHOD:corba_setattr
METHOD:echo
METHOD:get_baselist
METHOD:get_host_location
METHOD:get_master_base
METHOD:get_name_registry
METHOD:get_oserv
METHOD:get_security_objid
METHOD:getattr
METHOD:i_getattr
METHOD:i_setattr
METHOD:is_visible
METHOD:o_add_groups
METHOD:o_addattr
METHOD:o_backup
METHOD:o_clone
METHOD:o_contents
METHOD:o_get_capabilities
METHOD:o_get_groups
METHOD:o_get_principal
METHOD:o_getattr
METHOD:o_increment_version
METHOD:o_is_visible
METHOD:o_remove_groups
METHOD:o_restore
METHOD:o_rmattr
METHOD:o_rmobj
METHOD:o_self
METHOD:o_set_groups
METHOD:o_setattr
METHOD:o_visible
METHOD:oi_add
METHOD:oi_get_list
METHOD:oi_move
METHOD:oi_remove
METHOD:oi_stat
METHOD:om_add_header
METHOD:om_create
METHOD:om_debug
METHOD:om_define
METHOD:om_depend
METHOD:om_enable
METHOD:om_get_acl
METHOD:om_get_definition
METHOD:om_get_depend
METHOD:om_get_implid
METHOD:om_get_level
METHOD:om_get_roles
METHOD:om_get_sig
METHOD:om_remove
METHOD:om_set_acl
METHOD:om_set_catalog
METHOD:om_set_id
METHOD:om_set_implid
METHOD:om_set_level
METHOD:om_set_roles
METHOD:om_set_sig
METHOD:om_stat
METHOD:om_undefine
METHOD:resolve
METHOD:rmattr
METHOD:rmobj
METHOD:self
METHOD:setattr
METHOD:visible

Figure 2-7 Base objects contents

Whenever you are invoking any fundamental methods from the command line, be sure of what you are doing. The methods of the base object can be very destructive, particularly setattr, rmattr, or rmobj. None of these commands have safeguards against misuse.

Oserv object

The oserv object is a core object that contains information pertaining to the local oserv. One oserv object exists on each managed node. The oserv process is the core process of the Tivoli Management Framework. It invokes methods and is the access manager for the repository. All data retrievals and updates to and from the object repository are performed through the oserv. See the oserv object contents in Figure 2-8 on page 31.


ATTRIBUTE:httpd_port
ATTRIBUTE:security_level
METHOD:boot_method
METHOD:cntl
METHOD:dispatcher_status
METHOD:fileops
METHOD:get_interp
METHOD:gwlist
METHOD:mmcntl
METHOD:oc_dbops
METHOD:odbls
METHOD:odlist
METHOD:query

Figure 2-8 oserv object contents

ALI object

The ALI object is responsible for security-related methods and attributes. It manages the administrator login process, the inter-region methods, and the remote connections. It is also called the TMR Server or security object. It also supports the authorization process invoked whenever some action is requested, such as to list, distribute, add, remove, and so on.

For example, when the oserv receives a request to execute some method, it is necessary to look up the authority of the administrator that is making the request using the get_principal_roles method. The results of this method are compared with the method’s access control list (ACL, as returned by the Base object’s om_get_acl method) and the security group to which the object belongs (as returned by the Base object’s o_get_groups method). If the authorizations are correct, the method is executed; otherwise, a NO_PERMISSION exception is thrown. Figure 2-9 on page 32 shows the ALI object contents.
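The comparison described above can be sketched in Python. The data shapes are assumptions made for this illustration: the roles of the administrator are modeled as a mapping from security group to a set of role names, the ACL as a list of role names, and the rule that one matching role in one of the object's groups is sufficient is also an assumption, not a statement about the real oserv logic.

```python
class NoPermission(Exception):
    """Stands in for the NO_PERMISSION exception named in the text."""

def authorize(principal_roles, method_acl, object_groups):
    """Return normally if a role held in one of the object's groups is in the ACL."""
    allowed = set(method_acl)
    for group in object_groups:
        if principal_roles.get(group, set()) & allowed:
            return
    raise NoPermission("no matching role for this method")

# Hypothetical values standing in for the results of get_principal_roles,
# om_get_acl, and o_get_groups.
roles = {"hannover-region": {"admin", "user"}}
authorize(roles, ["admin", "senior"], ["hannover-region"])
print("authorized")
```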


METHOD:change_groups
METHOD:get_identity
METHOD:get_principal_admin
METHOD:get_principal_id
METHOD:get_principal_roles
METHOD:get_principals
METHOD:get_secure
METHOD:idmap
METHOD:idmap_list_entries
METHOD:idmap_list_maps
METHOD:idmap_resolve_entry
METHOD:region
METHOD:rls
METHOD:rls_query
METHOD:set_principal_admin
METHOD:set_principal_id
METHOD:set_principal_roles
METHOD:set_secure
METHOD:sysmode

Figure 2-9 ALI object contents

Class object

A basic element in the object system is the class object. A class object defines the attributes and methods common to a set of objects. The objects that share these common attributes and behavior are called members, or instances, of the class. Most new objects are instances of a class object and are created by the create_instance method of the class object. The class object provides instance management services for new objects.

A class object is itself an instance of an instance manager object and is created by the class library object. All class objects have the same structure (that of an instance manager) and the values of all class object attributes are stored on the TMR Server. See Figure 2-10 on page 33 for an illustration of the process of creating a new instance.

It is important to remember that an instance of an instantiable class object is not a clone of the class object, that is, the instance does not mimic the structure of the class object. Instead, the instance of an instantiable class object is the clone of one of the supporting objects (specifically the prototype object).


[Figure 2-10 labels: Instance Manager; Cloned by Library; Class Object; Prototype; Instance; Cloned by Class Object]

Figure 2-10 Process of a new instance

Object hierarchy

The behavior object of a class contains the definitions of all methods defined for its class object. All instances of the class object inherit the behavior object, and thus have access to all methods of the class. The behavior object inherits the behavior objects from all inherited classes. Thus, the instances have access to the methods of inherited classes.

The extension object contains only customized methods that have been added by the Tivoli Application Extension Facility (AEF) wputmeth command.

The prototype object is the class-specific skeleton object used to clone new instances of the class. The attributes of the prototype object are duplicated in each new instance. The attributes allocated to the prototype object are a merger of the attributes specific to the class object and the attributes of the prototype object of each inherited class object. The inheritance of the prototype object is also the inheritance of each new instance of the class. The prototype object first inherits from the extension object, which then inherits from the behavior object. In this way, any method defined by the extension object overloads or replaces any method of the behavior object that has the same method name.

The default policy and validation policy object lists contain the object references of each default and validation policy object created for the class. The methods of these policy objects implement the policy for the class object. The policy objects are not defined for every class object, only for class objects that define managed resources (for example, policy region and task library). See an illustration of the object hierarchy in Figure 2-11 on page 34.
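The two ideas above, cloning the prototype for each new instance and searching the extension object before the behavior object, can be sketched in Python. This is a toy model: ClassObject, create_instance, and call are names chosen for the illustration, and plain dictionaries stand in for the behavior, extension, and prototype objects.

```python
import copy
from collections import ChainMap

class ClassObject:
    def __init__(self, prototype_attrs, behavior, extension=None):
        self.prototype_attrs = prototype_attrs  # attributes copied into instances
        # The extension is searched first, so it overrides behavior methods
        # that have the same name.
        self.methods = ChainMap(extension or {}, behavior)

    def create_instance(self, **overrides):
        attrs = copy.deepcopy(self.prototype_attrs)  # clone the prototype
        attrs.update(overrides)
        return {"attrs": attrs, "methods": self.methods}

def call(instance, name):
    """Look up a method by name and run it against the instance attributes."""
    return instance["methods"][name](instance["attrs"])

node_class = ClassObject(
    {"label": "", "tags": []},
    behavior={"describe": lambda a: "node " + a["label"]},
    extension={"describe": lambda a: "custom " + a["label"]},
)
print(call(node_class.create_instance(label="hannover"), "describe"))
```

The instance never copies the class object itself; it receives a copy of the prototype attributes and a reference to the shared method lookup chain.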


[Figure 2-11 labels. Class Object attributes: behavior, prototypes, extension, def_policies, val_policies, pres_object, members. Referenced objects: behavior, prototype, extension, default policy, validation policy, presentation. Instance attribute: class_objid]

Figure 2-11 Object hierarchy

2.3.4 Complementary terminology: TEIDL, Interface, and Repository

The following list provides terminology for Tivoli Extended Interface Definition Language (TEIDL), Interface, and Repository:

- TEIDL

  Interface Definition Language (IDL) is an industry standard language used for describing objects. Tivoli has extended the IDL with support for exceptions and transactions, and has added new language components to specify implementation. This extended language is called Tivoli Extended IDL. The most prominent addition is the concept of class objects. IDL and TEIDL define the structure of the data in the object repository. The data includes properties of the resources and the definitions of the services provided by these objects.

- Interface

  IDL provides a standard for defining the services and data elements that make up the contents of the object repository. Services and data elements can be classified as interfaces. The implementation of the interface identifies which program will be executed when an operation is requested, what permissions are required, and what type of program it is (Perl script, C, or shell script). The implementation of an interface is specified in TEIDL. An interface defines a class of objects with specific behaviors, attributes, access requirements, and executable binaries, which implement methods related to the class. TEIDL defines an implementation for each interface, which results in the creation of new classes with specific properties. Interface definitions describe operations available to an object implementation. Interfaces are implemented as classes.

- Interface Repository

  Objects defined using IDL and TEIDL are stored in the Interface Repository (IR). The IR is a CORBA-defined entity. The types of things defined in the IR include attributes, modules, interfaces, operations, and datatypes. These entities are defined by the CORBA specification and are described in TME 10 ADE Tivoli Management Framework Services, GC31-8348. The retrieval of these definitions is based on their Interface Repository ID. The notation used to access the ID is ::::, such as:

  TMF_ManagedNode::Managed_Node::label

  which actually represents the name of a managed node in the object database.
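A scoped name such as the one above can be split into its components with a few lines of Python. The function name split_scoped_name is hypothetical, and reading the three parts of the example as module, interface, and attribute is an interpretation based on the surrounding text.

```python
def split_scoped_name(name):
    """Split 'A::B::C' into its components, ignoring a leading '::'."""
    parts = name.split("::")
    if parts and parts[0] == "":
        parts = parts[1:]
    if not parts or any(p == "" for p in parts):
        raise ValueError("malformed scoped name: %r" % name)
    return parts

module, interface, attribute = split_scoped_name(
    "TMF_ManagedNode::Managed_Node::label")
print(module, interface, attribute)
```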

2.3.5 Internals of Interface: Datatypes, attributes, and operations

An IDL interface specification contains the following components:
- Module definition
- Datatype definitions
- Interface definitions
- Attribute definitions
- Operation definitions

A module name is part of the fully qualified name of each component. The datatype definitions define the structure of object-specific information. These definitions are used to define the structure of object attributes and operation parameters. Attributes define the name and the type of data stored with each object. The operation definition defines the name and the parameters of each type of service provided by the object.

A module definition may consist of one or more datatype definitions, interface definitions, attribute definitions, and other module definitions. The module name becomes part of the full name of all elements within its boundaries.

The following example illustrates the relationship between module, interface, and methods using the irview command. The actual specification of the method, the module, and the interface is in the file man_node.idl, located in the /$BINDIR/.../include/$interp/tivoli directory on a managed node where Tivoli ADE is installed. For example, for an AIX Version 4 managed node, it is /usr/local/Tivoli/include/aix4-r1/tivoli.

Example 2-1 lists the interfaces defined in module TMF_ManagedNode. If you look in the file man_node.idl, you can see the definitions of each interface in the file. The command used to list the interfaces in the module TMF_ManagedNode is:

irview TMF_ManagedNode contents

Example 2-1 Interfaces defined in the module TMF_ManagedNode

bass:/#irview TMF_ManagedNode contents
1438246632.1.4##6@TMF_ManagedNode::t_stat
1438246632.1.4##6@TMF_ManagedNode::t_stat_list_t
1438246632.1.4##7@TMF_ManagedNode::T_S_IFMT
1438246632.1.4##7@TMF_ManagedNode::T_S_IFBLK
1438246632.1.4##7@TMF_ManagedNode::T_S_IFCHR
1438246632.1.4##7@TMF_ManagedNode::T_S_IFDIR
1438246632.1.4##7@TMF_ManagedNode::T_S_IFIFO
1438246632.1.4##7@TMF_ManagedNode::T_S_IFREG
1438246632.1.4##7@TMF_ManagedNode::T_S_IFLNK
1438246632.1.4##7@TMF_ManagedNode::T_ENOENT
1438246632.1.4##7@TMF_ManagedNode::T_EACCES
1438246632.1.4##7@TMF_ManagedNode::T_EFAULT
1438246632.1.4##7@TMF_ManagedNode::T_ENOTDIR
1438246632.1.4##7@TMF_ManagedNode::T_ENAMETOOLONG
1438246632.1.4##7@TMF_ManagedNode::T_EMLINK
1438246632.1.4##6@TMF_ManagedNode::file_settings
1438246632.1.4##6@TMF_ManagedNode::file_settings_list_t
1438246632.1.4##6@TMF_ManagedNode::net_drop
1438246632.1.4##6@TMF_ManagedNode::net_drop_list_t
1438246632.1.4##6@TMF_ManagedNode::consumer
1438246632.1.4##6@TMF_ManagedNode::consumer_list_t
1438246632.1.4##8@TMF_ManagedNode::ExMessage
1438246632.1.4##2@TMF_ManagedNode::SysInfo
1438246632.1.4##2@TMF_ManagedNode::AppInstall
1438246632.1.4##2@TMF_ManagedNode::gui
1438246632.1.4##2@TMF_ManagedNode::Managed_Node
1438246632.1.4##2@TMF_ManagedNode::Managed_NodePD
1438246632.1.4##2@TMF_ManagedNode::Managed_NodePV
1438246632.1.4##2@TMF_ManagedNode::TaskExecute
bass:/#
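When a module contains many entries, it can help to group the irview output mechanically. The following Python sketch parses lines of the form shown in Example 2-1 and groups the scoped names by the number that follows the ## marker. The meaning of that number is not stated in this chapter, so the sketch treats it as an opaque tag; in the listing above, the entries named as interfaces elsewhere in the text (such as Managed_NodePD) all carry tag 2.

```python
import re
from collections import defaultdict

# Matches entries such as 1438246632.1.4##6@TMF_ManagedNode::t_stat
LINE = re.compile(r"^(\d+\.\d+\.\d+)##(\d+)@(\S+)$")

def parse_irview(output):
    """Group scoped names from irview output by their numeric tag."""
    groups = defaultdict(list)
    for line in output.splitlines():
        match = LINE.match(line.strip())
        if match:  # skip shell prompts and anything that is not an entry
            _oid, tag, name = match.groups()
            groups[int(tag)].append(name)
    return dict(groups)

sample = """bass:/#irview TMF_ManagedNode contents
1438246632.1.4##6@TMF_ManagedNode::t_stat
1438246632.1.4##2@TMF_ManagedNode::Managed_Node
1438246632.1.4##2@TMF_ManagedNode::TaskExecute
bass:/#"""
print(parse_irview(sample))
```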

Example 2-2 on page 37 lists the methods defined in the interface Managed_NodePD. The command used to list the methods defined in the Managed_NodePD interface is:

irview TMF_ManagedNode::Managed_NodePD contents

Example 2-2 Methods accessible by the interface Managed_NodePD

bass:/#
bass:/#irview TMF_ManagedNode::Managed_NodePD contents
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::ManagedNode_get_create_dialog
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::ManagedNode_get_delete_dialog
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::ManagedNode_get_create_cli
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::set_local_label
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::is_supported_interface
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::set_policy_region_name
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::meth_wputpolm
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::meth_waddpolm
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::meth_wrmpolm
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::get_cache_info
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::add_backref_optimized
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::add_backref
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::get_backrefs
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::remove_backref
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::get_member_data
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::check_db
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::fix_db
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::get_resource_host
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::remove
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::add_backref
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::get_backrefs
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::remove_backref
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::get_manager
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::get_type_name
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::move_to_policy_region
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::get_policy_region
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::get_policy_region_name
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::check_db
1438246632.1.4##4@TMF_ManagedNode::Managed_NodePD::fix_db
1438246632.1.4##3@TMF_ManagedNode::Managed_NodePD::pres_object
1438246632.1.4##3@TMF_ManagedNode::Managed_NodePD::sort_name
1438246632.1.4##3@TMF_ManagedNode::Managed_NodePD::state
1438246632.1.4##3@TMF_ManagedNode::Managed_NodePD::label
bass:/#

Some methods in Example 2-2 are inherited by the Managed_NodePD interface and the rest of them are actually defined in the interface specification file man_node.idl. A double colon “::” is used to reference interfaces, methods, and attributes in the model.


As you can see in Example 2-3, methods such as ManagedNode_get_create_dialog are visible in the interface definition file. Use the following command to get more information about the method ManagedNode_get_create_dialog in the interface:

irview TMF_ManagedNode::Managed_NodePD::ManagedNode_get_create_dialog describe

Example 2-3 Part of man_node.idl interface definition file

/* Component: Managed Node Module
 * Description: Module for the Managed Node Interfaces.
 */
#if !defined man_node_idl
#define man_node_idl
#pragma generate False
#include
#include
#include
#include
#include
#include
#include
#include
#pragma generate True
#ifdef JAVA_IDL
#pragma package "com.tivoli.framework"
#endif
module TMF_ManagedNode {
#define PRIVILEGED_FILEIO "privileged_fileio"
..........
    /* Here is the class that provides the Default Policy */
    interface Managed_NodePD : TMF_SysAdmin::PolicyDrivenBase {
        /*
         * This is the call back that must be defined for
         * the creation of managed resources. The name and
         * the signature are determined by the policy region code.
         */
        void ManagedNode_get_create_dialog(
            in TMF_Types::ObjectList notused,
            in TMF_Types::StringList env,
            out TMF_Types::ObjectList results);
        void ManagedNode_get_delete_dialog(
            in TMF_Types::ObjectList objects,
            in TMF_Types::StringList env,
            out TMF_Types::ObjectList results);
        void ManagedNode_get_create_cli(
            in TMF_Types::StringList argv,
            in Object objid,
            out boolean work_to_do,
            out TMF_Types::XOpenMessage status);
    };
    /* Here is the class that provides the Validation Policy */
    interface Managed_NodePV : TMF_SysAdmin::PolicyDrivenBase {
        TMF_SysAdmin::TMF_PolicyResultList ManagedNode_validate_all(
            in SysAdminTypes::ObjectList objects);
    };

See the output of this command in Example 2-4 on page 39.

Example 2-4 Retrieving information about the methods in Managed_NodePD

bass:/#irview TMF_ManagedNode::Managed_NodePD::ManagedNode_get_create_dialog describe
OperationDescription
  name: ManagedNode_get_create_dialog
  id: TMF_ManagedNode::Managed_NodePD::ManagedNode_get_create_dialog
  defined in: TMF_ManagedNode::Managed_NodePD
  TypeCode: void
  kind: tk_void
  to_orb_free: 0
  size: 0
  # parms: 0
  mode: NORMAL
ParameterDescription
  name: notused
  id: TMF_ManagedNode::Managed_NodePD::ManagedNode_get_create_dialog::notused
  defined in: TMF_ManagedNode::Managed_NodePD::ManagedNode_get_create_dialog
  TypeCode: TMF_Types::_sequence_Object_ObjectList
  kind: tk_sequence
  to_orb_free: 1
  size: 12
  # parms: 2
  mode: IN
ParameterDescription
  name: env
  id: TMF_ManagedNode::Managed_NodePD::ManagedNode_get_create_dialog::env
  defined in: TMF_ManagedNode::Managed_NodePD::ManagedNode_get_create_dialog
  TypeCode: TMF_Types::_sequence_string_StringList
  kind: tk_sequence
  to_orb_free: 1
  size: 12
  # parms: 2
  mode: IN
ParameterDescription
  name: results
  id: TMF_ManagedNode::Managed_NodePD::ManagedNode_get_create_dialog::results
  defined in: TMF_ManagedNode::Managed_NodePD::ManagedNode_get_create_dialog
  TypeCode: TMF_Types::_sequence_Object_ObjectList
  kind: tk_sequence
  to_orb_free: 1
  size: 12
  # parms: 2
  mode: OUT

2.3.6 An overview of datatypes used in interface definitions

Table 2-1 shows the datatypes used in interface definitions. All datatypes are derived from five main types:
- Integer types (5, 500, -30): short, long, unsigned short, or unsigned long
- Floating point types (23.4, -5.7): float or double
- Character datatype (‘C’, ‘\344’): char
- Boolean: TRUE or FALSE
- Constructed datatypes

Table 2-1 Derived datatypes and related examples

40

Datatype

Syntax

Examples

array

Multi-dimensional in row-major order.

char char_array[3]; {‘A’ ‘B’ ‘C’}

enum

Unscoped enumerated name.

ex::color blue

sequence

Length followed by elements. No elements listed if length is zero.

StringList s_list; {3 “AA” “BB” “CC”}

object

String representation form.

1438246632.1.348

typecode

Fully scoped reference.

{ex::StringList}

Troubleshooting Tivoli Using the Latest Features

Datatype

Syntax

Examples

any

Typecode followed by value.

{{ex::StringList} {2 “AA” “:BB”}}

union

Discriminator followed by union.

ex::u {green “string value”}

struct

The members in order of description.

ex::u::s {{‘A’ ‘B’ ‘C’} {3 ‘A’ ‘B’ ‘C’} TRUE}
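The textual forms in Table 2-1 can be produced mechanically. The following is a minimal sketch, not part of any Tivoli tool, that renders a list of strings in the "length followed by elements" sequence form and wraps a value in the "typecode followed by value" form used for any. The function names are hypothetical, straight double quotes are assumed, and quoting of embedded quote characters is not handled.

```python
def format_sequence(items):
    """Render strings as a sequence: length first, then the quoted elements.

    An empty list yields only the length, matching the rule that no
    elements are listed if the length is zero.
    """
    parts = [str(len(items))] + ['"%s"' % s for s in items]
    return "{" + " ".join(parts) + "}"


def format_any(typecode, value_text):
    """Render an 'any': the fully scoped typecode in braces, then the value."""
    return "{{" + typecode + "} " + value_text + "}"


print(format_sequence(["AA", "BB", "CC"]))
print(format_any("ex::StringList", format_sequence(["AA", "BB"])))
```

Building the argument text this way keeps the element count and the elements consistent, which is easy to get wrong when typing a long sequence by hand.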

Datatypes are also used in the implementation of interfaces (classes). The implementation body for a class may contain:

• Datatypes
• Constants
• Attributes
• Methods

The implementation of interfaces is defined in *.imp files, such as man_node.imp. When you install Tivoli ADE, you can find these files in the /usr/local/Tivoli/include/aix4-r1/tivoli directory on an AIX machine. You can look in the imp files to get more information about methods, attributes, datatypes, and so on. For example, you can run the following command to find a specific attribute:

grep attr /*.i* | grep

An example of this command is shown in Example 2-5 on page 41.

Example 2-5 Finding a specific attribute in imp and idl files

bass:/#
bass:/#pwd
/usr/local/Tivoli/include/aix4-r1/tivoli
bass:/#grep attr /usr/local/Tivoli/include/aix4-r1/tivoli/*.i* |grep label
/usr/local/Tivoli/include/aix4-r1/tivoli/SysAdmin.idl: attribute string label;
/usr/local/Tivoli/include/aix4-r1/tivoli/TMF_SysAdmin.imp: attribute string label;
bass:/#

In the preceding example, the usage of an attribute label has been found in two files, SysAdmin.idl and TMF_SysAdmin.imp, in the form of: attribute string label;

This statement is an attribute declaration. The attribute label is declared as a datatype string, as shown in Example 2-6 on page 42.


Example 2-6 The declaration of attribute label as datatype string in SysAdmin.idl

interface PolicyDrivenBase : ... {
    attribute string label;
    SysAdminLifeCycle::HostLocation get_resource_host();
    void remove (); /* normally inherited from LifeCycleObject */
}; // End of PolicyDrivenBase interface
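The grep search shown in Example 2-5 can also be approximated in a script. The following is a minimal sketch, assuming the idl and imp files use the simple one-word declaration form "attribute type name;" seen in Example 2-6. The function name and the regular expression are illustrative only, and the match is stricter than the grep pipeline, which accepts any line containing both words.

```python
import re
from pathlib import Path

# Matches simple declarations such as "attribute string label;".
# Multi-word types (for example "unsigned long") are not handled here.
ATTR_RE = re.compile(
    r"^\s*(?:readonly\s+)?attribute\s+(?P<type>[\w:]+)\s+(?P<name>\w+)\s*;"
)


def find_attribute(directory, name):
    """Return (file name, declaration) pairs for attribute `name`.

    Scans every file matching *.i* (idl and imp files) in `directory`.
    """
    hits = []
    for path in sorted(Path(directory).glob("*.i*")):
        for line in path.read_text(errors="replace").splitlines():
            match = ATTR_RE.match(line)
            if match and match.group("name") == name:
                hits.append((path.name, line.strip()))
    return hits
```

For example, find_attribute("/usr/local/Tivoli/include/aix4-r1/tivoli", "label") would be the counterpart of the command in Example 2-5, provided the Tivoli ADE include files are installed in that directory.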

2.3.7 Internals of profile managers, profiles, and CCMS

The content in this section is mostly derived from the Tivoli Field Guide: The Tivoli Configuration and Change Management System (CCMS). See 1.3.1, “Tivoli Field Guides” on page 7 for information on how to find this publication.

The Tivoli Configuration and Change Management System (CCMS) is a common component provided with Tivoli Management Framework. It is concerned with profiles, profile managers, subscribers, and profile distribution. CCMS manages application data for all profile-based products, such as Tivoli Enterprise Console. It manipulates this data through profiles, such as the following:

• Tivoli Enterprise Console (TEC) has adapter configuration profiles (ACPs).
• IBM Tivoli Configuration Manager has software package profiles.

Note: Understanding how CCMS works can be invaluable for designing the implementation and maintenance of profile-based applications and debugging distribution problems.

This section details CCMS and how it manages the application data for the Tivoli products. It describes the CCMS terminology and the profile distribution mechanism.

CCMS components

CCMS uses profile managers, profiles, and subscribers for data management.

Profile managers and profiles

The basic component of CCMS is a profile. It consists of a TME object and a set of records in a CCMS database. Associated with the profile objects are dialogs for accessing the CCMS records and application-specific methods for maintaining the records. Profiles contain application records. For example, Tivoli User Administration user profile records contain user information, and Tivoli Distributed Monitoring sentry profile records contain the monitor definitions. The CCMS database can reside on traditional (database) profile managers (PM), dataless profile managers (DPM), and profile endpoints (PE).

Note: The terms database and dataless refer to the subscribers of the profile manager, not the profile manager itself. Both types of profile managers contain data, but the subscribers of the dataless profile manager, Tivoli Management Agents (TMAs), do not contain data.

The dataless profile manager was introduced with the three-tier architecture. The CCMS database is implemented as a set of records on the profile manager, dataless profile manager, or profile endpoint objects. It is not a relational database that is external to the Tivoli object database. Profile managers contain application-specific profiles, such as an Inventory profile. Although profiles contain records for application-specific data, profile records are associated with the profile manager object, not the profile itself. This relationship is shown in Example 2-7. The objcall command in the eighth line lists all attributes and methods associated with the profile manager All_Profiles_PM. The idlattr commands in the last eight lines in Example 2-7 return the label of each profile associated with the OIDs of sentry and ACP profiles.

Example 2-7 Working out profile names using idlattr

bass:/#wlookup -ar ProfileManager
ACPdefault 1438246632.1.983#TMF_CCMS::ProfileManager#
All_Profiles_PM 1438246632.1.1082#TMF_CCMS::ProfileManager#
I001_B_TOPAS_PM 1438246632.1.843#TMF_CCMS::ProfileManager#
I001_M_TOPAS_PM 1438246632.1.837#TMF_CCMS::ProfileManager#
I001_S_MaintMode_Alle_PM 1438246632.1.1038#TMF_CCMS::ProfileManager#
bass:/#
bass:/#objcall 1438246632.1.1082#TMF_CCMS::ProfileManager# contents
ATTRIBUTE:BDBPG:ACP:1438246632.1.1087:0
ATTRIBUTE:BDBPG:ACP:1438246632.1.1087:1
ATTRIBUTE:BDBPG:ACP:1438246632.1.1094:0
ATTRIBUTE:BDBPG:ACP:1438246632.1.1094:1
ATTRIBUTE:BDBPG:ACP:1438246632.1.1095:0
ATTRIBUTE:BDBPG:ACP:1438246632.1.1095:1
ATTRIBUTE:BDBPG:ACP:1438246632.1.1096:0
ATTRIBUTE:BDBPG:ACP:1438246632.1.1096:1
ATTRIBUTE:BDBPG:Sentry2.0:sentry_profile_0_1438246632.1.1088#Sentry::All#:0
ATTRIBUTE:BDBPG:Sentry2.0:sentry_profile_0_1438246632.1.1088#Sentry::All#:1
ATTRIBUTE:BDBPG:Sentry2.0:sentry_profile_1438246632.1.1083#Sentry::All#:0
ATTRIBUTE:BDBPG:Sentry2.0:sentry_profile_1438246632.1.1083#Sentry::All#:1
ATTRIBUTE:BDBPG:Sentry2.0:sentry_profile_1_1438246632.1.1089#Sentry::All#:0
ATTRIBUTE:BDBPG:Sentry2.0:sentry_profile_1_1438246632.1.1089#Sentry::All#:1
ATTRIBUTE:BDBPG:Sentry2.0:sentry_profile_2_1438246632.1.1090#Sentry::All#:0
ATTRIBUTE:BDBPG:Sentry2.0:sentry_profile_2_1438246632.1.1090#Sentry::All#:1
ATTRIBUTE:_BOA_id
ATTRIBUTE:class_objid
ATTRIBUTE:collections
ATTRIBUTE:databases
ATTRIBUTE:flags
ATTRIBUTE:label
ATTRIBUTE:last_failed
ATTRIBUTE:members
ATTRIBUTE:pres_object
ATTRIBUTE:pro
ATTRIBUTE:pro_name
ATTRIBUTE:profile_push_order
ATTRIBUTE:push_trans_commit_behavior
ATTRIBUTE:resource_host
ATTRIBUTE:skeleton
ATTRIBUTE:sort_name
ATTRIBUTE:state
ATTRIBUTE:subscribers
ATTRIBUTE:subscriptions
bass:/#idlattr -tgv 1438246632.1.1095 label string
"acp_profile_1"
bass:/#idlattr -tgv 1438246632.1.1096 label string
"acp_profile_2"
bass:/#idlattr -tgv 1438246632.1.1089 label string
"sentry_profile_1"
bass:/#idlattr -tgv 1438246632.1.1090 label string
"sentry_profile_2"
bass:/#

The BDBPG attributes are the CCMS records for the profiles and are divided as follows:

• The ACP attributes are the TEC adapter configuration profile records.
• The Sentry2.0 attributes are the Tivoli Distributed Monitoring profile records.

The BDBPG attributes for security profiles and user profiles are security_db and user_db, respectively. Thus, the profile manager is the CCMS database for all profiles it manages.
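When a profile manager holds many profiles, it can help to group the BDBPG attribute names by profile type. The following is a minimal sketch that works on saved output of the objcall contents command as shown in Example 2-7. The function name is hypothetical, and the meaning of the trailing :0 and :1 suffix is not described in this section, so the sketch only strips it.

```python
from collections import defaultdict


def group_ccms_records(contents_output):
    """Group BDBPG attribute names by profile type (ACP, Sentry2.0, ...).

    `contents_output` is the text printed by `objcall <OID> contents`.
    Lines that are not BDBPG attributes (label, flags, ...) are skipped.
    """
    groups = defaultdict(set)
    for line in contents_output.splitlines():
        line = line.strip()
        if not line.startswith("ATTRIBUTE:BDBPG:"):
            continue
        # ATTRIBUTE : BDBPG : <profile type> : <key>:<suffix>
        _, _, profile_type, rest = line.split(":", 3)
        key, _suffix = rest.rsplit(":", 1)  # suffix meaning is an unknown here
        groups[profile_type].add(key)
    return {ptype: sorted(keys) for ptype, keys in groups.items()}
```

The keys returned for each type contain the profile OIDs, which can then be passed to idlattr to look up the profile labels, as in the last lines of Example 2-7.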


Note: There is not a one-to-one relationship between the CCMS records in the profile manager and the records displayed in the profiles. There is a single CCMS record for each profile record, but there can be additional CCMS records that do not show up in a profile display, such as records pending deletion.

Subscribers

Profile managers, both database (PM) and dataless (DPM), can have zero or more subscribers. Traditional profile manager (PM) subscribers can be profile endpoints (PEs), other traditional profile managers, and dataless profile managers. Dataless profile manager (DPM) subscribers can be profile endpoints or TMA endpoints.

CCMS distribution

The management by subscription concept revolves around subscribers being configured according to the related profile records. The configuration relationship is reflected in the CCMS push (distribute) operation, where a set of records is pushed to an application. A distribution operation can be requested on a database profile manager, dataless profile manager, profile endpoint (managed node), or selected profiles. This operation can be directed at the entire subscriber list or to specific selected subscribers. A typical user profile distribution mechanism is illustrated in Figure 2-12 on page 46.

[Figure: User Profile, Profile Manager All_Nodes, Profile Manager All_UNIX, Managed Node, /etc/hosts]

Figure 2-12 Distribution of a user administration user profile

There are four levels of a profile push, as shown in Figure 2-12:

1. From the original user profile to the All_Nodes profile manager. This creates (or updates) a copy of the profile records on the All_Nodes PM object.
2. From the All_Nodes profile manager to the All_Unix profile manager. This creates (or updates) a copy of the profile records on the All_Unix PM object.
3. From the All_Unix profile manager to the UNIX managed node object (profile endpoint). This creates (or updates) a copy of the profile records on the managed node object.
4. From the managed node (profile endpoint) to the endpoint code (in this case, Tivoli User Administration code) to apply the changes to the system files.

The final distribution occurs between:

• A dataless profile manager and its subscribers, which can be PEs or TMA endpoints.
• A profile endpoint and its implicit subscriber, such as the last push in Figure 2-12.

Sending profile records to the application endpoint code is always the last distribution. All other profile distributions are between a traditional profile manager and its subscribing profile managers, dataless profile managers, and profile endpoints.


2.3.8 The final (dataless) distribution

The final distribution presents the profile data to the application endpoint code. This distribution can be either of the following:

• A push from a dataless profile manager to its subscribers (profile endpoints or TMA endpoints).
• A push from a profile endpoint to its implicit subscriber.

The mechanism involves CCMS delivering the contents of each profile involved in the push to the application responsible for the profile. The contents are delivered through a registered application-supplied endpoint method. At this time, the application becomes responsible for the configuration data and semantics of the push. Thus, the Tivoli Management Framework service hands over a set of records and some control information, such as the type of distribution, and the application endpoint code applies the data as it sees fit.

If a profile endpoint is subscribed to a dataless profile manager and a profile is pushed from the DPM to the PE, the CCMS database on the profile endpoint object is bypassed. For example, if pushing a new user profile from a dataless profile manager to a managed node, the profile records are passed directly to the Tivoli User Administration code without writing a copy of the user profile to the managed node object. Example 2-8 shows a final dataless distribution of a security profile to an OS/390 endpoint.

Example 2-8 lcfd.log for profile distribution to an OS/390 endpoint

Nov 09 14:14:15 Q lcfd New connection from 10.0.2.47+3498
Nov 09 14:14:15 Q lcfd Entering net_recv, receive a message
Nov 09 14:14:15 Q lcfd Leaving net_recv: bytes=288,
Nov 09 14:14:16 1 lcfd Spawning: /usr/lpp/Tivoli/lcf/preload//bin/os390/TME/SECURITYE/SecEpt, ses: 11d08b6d
Nov 09 14:14:22 Q lcfd Entering Listener (running).
Nov 09 14:14:22 Q lcfd Entering net_wait_for_connection, handle=0x9e768e8
Nov 09 14:14:22 Q lcfd cti_accept (timeout=-1)
Nov 09 14:14:27 Q MethInit Entering mrt_run
Nov 09 14:14:27 3 MethInit argvÝ0¨=/usr/lpp/Tivoli/lcf/preload//bin/os390/TME/SECURITYE/SecEpt
Nov 09 14:14:27 3 MethInit argvÝ1¨=11d08b6d
Nov 09 14:14:27 3 MethInit argvÝ2¨=/etc/Tivoli/lcf/dat/last.cfg
Nov 09 14:14:27 Q MethInit TIS init'd with table 1047
Nov 09 14:14:27 2 MethInit Looking for method: security_update.
Nov 09 14:14:27 Q security_update calling method.
Nov 09 14:14:27 Q security_update Entering send_methstat
Nov 09 14:14:27 Q security_update Entering send_struct
Nov 09 14:14:27 Q security_update net_send of 52 bytes, session 298879853
Nov 09 14:14:27 Q security_update Leaving send_struct
Nov 09 14:14:27 Q security_update Leaving send_methstat
Nov 09 14:14:27 Q security_update Entering net_recv, receive a message
Nov 09 14:14:27 Q security_update Leaving net_recv: bytes=16394, (type=11
Nov 09 14:14:29 Q security_update Entering LogInitAppend
Nov 09 14:14:33 Q security_update Entering net_recv, receive a message
Nov 09 14:14:33 Q security_update Leaving net_recv: bytes=4604, (type=11
Nov 09 14:14:33 Q security_update Entering net_recv, receive a
Nov 09 14:14:33 Q security_update Leaving net_recv: bytes=10
Nov 09 14:14:40 Q security_update method returned.
Nov 09 14:14:40 Q security_update send_results (max/len) 80/18
Nov 09 14:14:40 Q security_update Entering send_methstat
Nov 09 14:14:40 Q security_update Entering send_struct
Nov 09 14:14:40 Q security_update net_send of 72 bytes, session
Nov 09 14:14:40 Q security_update Leaving send_struct
Nov 09 14:14:40 Q security_update Leaving send_methstat
Nov 09 14:14:40 2 security_update Clean Shutdown security_update.

The dataless profile distribution is triggered from the gateway (a downcall, in three-tier terminology), as shown by the lcfd entries at the beginning of the log. The lcfd daemon initiates the security endpoint code (SecEpt) to receive and process the security profile records. The Tivoli Management Framework MethInit method initiates the Tivoli Security security_update method to receive and process the records. At this time, the CCMS method on the gateway is passing records to the security code on the endpoint.
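When an lcfd.log is long, it can be easier to follow a distribution by isolating the entries written by one component, such as security_update. The following is a minimal sketch that assumes every entry has the layout seen in Example 2-8: month, day, time, a one-token level column, the name of the component, and the message. The field names and function names are illustrative, and no meaning is assigned to the level column.

```python
import re

# Layout assumed from Example 2-8:
#   Nov 09 14:14:27 2 MethInit Looking for method: security_update.
LINE_RE = re.compile(
    r"^(?P<month>[A-Za-z]{3}) (?P<day>\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>\S+) (?P<source>\S+) (?P<message>.*)$"
)


def parse_lcfd_line(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def entries_from(lines, source):
    """Yield parsed entries whose component column equals `source`."""
    for line in lines:
        entry = parse_lcfd_line(line)
        if entry and entry["source"] == source:
            yield entry
```

For example, list(entries_from(open("lcfd.log"), "security_update")) would collect only the entries written while the security endpoint method was running.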

2.3.9 Database and dataless distribution levels

A push operation can be a single-level push or a multi-level push. In a single-level push, the work is carried out from the source profile manager, dataless profile manager, or profile endpoint to the immediate set of subscribers. If you perform a single-level push from a DPM to a managed node, the application code (such as Tivoli User Administration) receives the profile records. If you perform a single-level push from a PM to a managed node, the profile records are written to the managed node object but not pushed down to the application code. A second single-level push is required from the managed node object copy of the profile to get the records to the application code.

In a multi-level push, the single-level push operation is repeated after the first step to the next layer and continued until it cannot go any deeper (the last push being the final [dataless] push). For example, rather than performing two single-level pushes to get the profile records from the PM to the managed node object and then to the application code, one multi-level push would achieve the same thing.

Distribution can be performed using the desktop or command line interface (CLI). In the distribution graphical user interface (GUI), you can find the Distribute Defaults dialog and Distribute Profile dialogs under the profile pull-down menu. The two push level options are found in the Distribute To section of the Distribute Profile and Distribute Defaults dialogs. Figure 2-13 shows the distribution levels on the Tivoli Desktop profile panel.

Figure 2-13 Profile distribution levels graphical user interface

The single-level push option maps to the Next Level Of Subscribers option. The multi-level push option maps to the All Levels Of Subscribers option. Note: A drag-and-drop operation of a profile to a subscriber invokes the settings on the Distribute Defaults dialog for that profile.

When using the wdistrib command, the -m flag indicates a multi-level distribution. Absence of the flag indicates a single-level distribution.
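The difference between the two push levels can be illustrated with a small model. The following is a simplified sketch of the behavior described in this section, not CCMS code: the Node class, the kind codes (PM, DPM, PE, TMA), and the event strings are all hypothetical.

```python
class Node:
    """A profile manager (PM), dataless profile manager (DPM),
    profile endpoint (PE), or TMA endpoint (TMA) in a subscription tree."""

    def __init__(self, name, kind, subscribers=()):
        self.name = name
        self.kind = kind
        self.subscribers = list(subscribers)


def push(source, multi_level=False):
    """Return the list of events produced by a push that starts at `source`."""
    events = []
    if source.kind == "PE":
        # Final push: a profile endpoint to its implicit subscriber.
        events.append(f"{source.name}: records handed to application endpoint code")
        return events
    for sub in source.subscribers:
        if source.kind == "DPM":
            # Final (dataless) push: no profile copy is written on the subscriber.
            events.append(
                f"{source.name} -> {sub.name}: records handed to application endpoint code"
            )
        else:
            # Database push: a copy of the records is written on the subscriber object.
            events.append(f"{source.name} -> {sub.name}: profile copy written")
            if multi_level:
                events.extend(push(sub, multi_level=True))
    return events


node = Node("managed_node", "PE")
all_unix = Node("All_Unix", "PM", [node])
all_nodes = Node("All_Nodes", "PM", [all_unix])
for event in push(all_nodes, multi_level=True):
    print(event)
```

In this model, a single-level push from All_Nodes stops after writing the copy on All_Unix, while the multi-level push continues down to the application endpoint code.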

2.3.10 Types of distribution

The requestor of a distribution must decide how the records of a profile are to be selected and distributed. There are four types of distribution: NO_FORCE, FORCE, FORCE_ALL, and FORCE_ALL_NO_MERGE. These options can be set using the -l parameter with the wdistrib command. Some of the options can be set in the Distribute will section of the distribution dialogs (see Figure 2-13). The mapping of these options to the GUI and wdistrib options is shown in Table 2-2 on page 50.


Table 2-2 Mapping of distribution options

CCMS                  CLI (wdistrib -l)    GUI
NO_FORCE              maintain             Preserve Modifications option
FORCE                 over_opts            Not applicable
FORCE_ALL             over_all             Make EXACT COPY option
FORCE_ALL_NO_MERGE    over_all_no_merge    Not applicable
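The mapping in Table 2-2 can be kept in a small lookup table when scripting distributions or reading trace output. The following is a minimal sketch that only builds an argument list and does not run anything. The order of the arguments and the form of the profile name are assumptions, so check the wdistrib reference page before using a generated command.

```python
# CCMS distribution type -> value of the wdistrib -l parameter (Table 2-2).
WDISTRIB_LEVEL = {
    "NO_FORCE": "maintain",
    "FORCE": "over_opts",
    "FORCE_ALL": "over_all",
    "FORCE_ALL_NO_MERGE": "over_all_no_merge",
}


def wdistrib_args(ccms_type, name, subscribers=(), multi_level=False):
    """Build an argument list for wdistrib (argument order is an assumption)."""
    args = ["wdistrib", "-l", WDISTRIB_LEVEL[ccms_type]]
    if multi_level:
        args.append("-m")  # absence of -m means a single-level distribution
    args.append(name)
    args.extend(subscribers)
    return args


print(" ".join(wdistrib_args("FORCE", "@ProfileManager:All_Profiles_PM", multi_level=True)))
```

The reverse lookup is equally useful: when a trace shows FORCE_ALL, the table tells you that the distribution was requested with over_all or with the Make EXACT COPY option.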

Knowing the CCMS label for the types of push is useful when using the Tivoli tracing tools.

• Distribution maintain option

The maintain option is the CCMS NO_FORCE distribution type and the Preserve Modifications GUI option. With this type of distribution, records are selected based on internal flags (the profile_oneshots attribute). CCMS distributes only records that are flagged as having changed after the most recent distribution.

For the final (dataless) push, the application code determines how the data is applied to the system. For example, Tivoli User Administration overwrites the password in the /etc/passwd file in some instances, even with the maintain option.

For the other (database) pushes, if any of the downstream profile copies were changed, these changes are not overwritten. However, how the changes are applied depends on the application. CCMS does not know about attributes; it only knows about records. The application code (the methods associated with the profile object) determines how the data is applied. For example, a user profile and a profile copy are on a subscriber. If you change the UNIX login shell of the user in both the top-level profile and the profile copy (different in each) and also change another attribute in the top-level profile (such as the user identifier), when the profile is distributed with the maintain option, the user identifier in the profile copy is updated, but the UNIX login shell is not. This means that the application code (in this case, Tivoli User Administration) was involved in writing and updating the profile copy records. The behavior can change from application to application.

• Distribution over_opts option

The over_opts option is the CCMS FORCE option. It was externalized to wdistrib with Tivoli Management Framework Version 3.6. There is no way to specify this option through the GUI. CCMS selects every record in the specified profiles and distributes them, overwriting any changes in lower level profile copies. As part of the final distribution step, all records of the same profile type are merged and pushed down to the application endpoint code.

For example, consider what happens in a push of user profiles from a profile manager to a managed node. When the final push is performed from the managed node object to the Tivoli User Administration code, all user profile records on the managed node object, not just those from the profile that was pushed, are merged and presented to the Tivoli User Administration code. This option is the “softer” Make EXACT COPY GUI option, because it does not carry the implicit delete that the over_all options do.

The over_opts option is useful for forcing the distribution of all records without doing a Make EXACT COPY. For example, with Tivoli Distributed Monitoring, you can clear all monitors from the monitoring engine by running the wclreng command. This might be required to resolve a problem in the engine. To distribute all of the monitors back to the engine, you can do one of the following:

– Modify each record (monitor) and distribute with the maintain option.
– Use the over_opts option.

• Distribution over_all option

The over_all option is the CCMS FORCE_ALL option or the Make EXACT COPY GUI option. CCMS selects every record in the specified profiles and distributes them, overwriting any changes in lower level profile copies. Like the over_opts option, as part of the final distribution step, all records of the same profile type are merged and pushed down to the application endpoint code. However, unlike the over_opts option, the over_all option carries an implicit delete that some applications invoke. This means that the application endpoint code is given a set of records and told this is a FORCE_ALL push, so the application can replace all existing system records with those presented to it.

The software that primarily makes use of this option is Tivoli User Administration on UNIX and NT managed nodes and endpoints. With a FORCE_ALL push, Tivoli User Administration replaces the existing set of users on the box with those presented to it. Thus, it deletes all of the users not represented in the pushed profiles. This is the implicit delete. Not all applications behave this way with a FORCE_ALL. Some treat the push in the same way as an over_opts push, ignoring the implicit delete, while other applications ignore it completely.

• Distribution over_all_no_merge option

The over_all_no_merge option is CCMS FORCE_ALL_NO_MERGE. There is no way to specify this from the GUI. This option is exactly the same as the FORCE_ALL, except that there is no merge on the final step of the distribution. Thus, the records presented to the application endpoint code are only those in the selected profiles and not the merged set, as in the FORCE_ALL.

For example, you can use this option when you want the /etc/passwd file to contain only the records in a specific user profile. If you distributed with the over_all or Make EXACT COPY option, and if other user profile copies are on downstream profile managers, everything is merged into the final push. The /etc/passwd file then contains users from all merged user profiles, not just the expected one. In this case, an over_all_no_merge distribution must be used.
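The effect of the merge on the final distribution step can be illustrated with a small model. The following is a simplified sketch of the record selection described for the FORCE, FORCE_ALL, and FORCE_ALL_NO_MERGE types. It is not CCMS code, the function and profile names are hypothetical, and the NO_FORCE type (which depends on change flags) is deliberately left out.

```python
def final_push_records(copies_on_node, selected_profiles, dist_type):
    """Model the records handed to the application endpoint code.

    copies_on_node:    dict of profile name -> records, for every profile copy
                       of the same profile type on the managed node object.
    selected_profiles: names of the profiles that were actually pushed.
    Returns (records, implicit_delete).
    """
    if dist_type in ("FORCE", "FORCE_ALL"):
        # All records of the same profile type are merged.
        names = sorted(copies_on_node)
    elif dist_type == "FORCE_ALL_NO_MERGE":
        # Only the records of the selected profiles are presented.
        names = [n for n in sorted(copies_on_node) if n in selected_profiles]
    else:
        raise ValueError("distribution type not modeled: " + dist_type)
    records = [rec for name in names for rec in copies_on_node[name]]
    implicit_delete = dist_type in ("FORCE_ALL", "FORCE_ALL_NO_MERGE")
    return records, implicit_delete


copies = {"users_a": ["alice"], "users_b": ["bob"]}
print(final_push_records(copies, {"users_a"}, "FORCE_ALL"))
print(final_push_records(copies, {"users_a"}, "FORCE_ALL_NO_MERGE"))
```

With FORCE_ALL, the application sees the users from both profile copies. With FORCE_ALL_NO_MERGE, it sees only the users from the pushed profile, and an application that honors the implicit delete would then remove the others.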

2.3.11 Advanced CCMS concepts

CCMS is a Tivoli Management Framework component that provides methods for creating, managing, and distributing the profiles using Tivoli Management Framework distribution methods. CCMS has five components:

• CCMS configuration profiles

TMF_CCMS::Profile is the object used by the profile-based applications. Each type of profile collects some related set of parameters and represents the settings for each of those parameters in a platform-independent fashion. Users of the application create and populate instances of a profile type that abstracts some portion of their network or system configuration controlled by that application.

• CCMS profile databases

Every configuration profile is associated with one or more records in a database (TMF_CCMS::ProfileBase) that maintains the configuration information managed by the profile. Application users can create, view, modify, and delete the records in each database. When the user distributes the profile to its subscribers, the system creates, modifies, and deletes the actual resources managed by the profiles.

• CCMS profile organizers

CCMS configuration profiles do not access the records in a database directly. Instead, database access is controlled through a profile organizer. A profile organizer (TMF_CCMS::ProfileOrganizer) defines operations to manipulate records in a profile database and distributes them to subscribers. Profile managers and endpoints are examples of profile organizers.

• CCMS profile managers

Profile managers (TMF_CCMS::ProfileManager) are a front end to the profile organizer. The ProfileManager object stores the OIDs of all objects that are the subscribers to the profile manager.

• CCMS profile endpoints

Profile endpoints (TMF_CCMS::ProfileEndpoint) receive profile records from a profile distribution and write those records out to a local file system file or to a local API for implementation. An endpoint for a given profile type can be any object that has the necessary conversion methods. For example, managed nodes, endpoints, NIS domains, and application-specific databases can be endpoints for the associated types of profiles.


TMF_CCMS::ProfileManager

The following are the attributes and methods for TMF_CCMS::ProfileManager objects:

• Attributes:
– Flags (interface: TMF_CCMS::profile_manager_flags)
– Subscribers (interface: TMF_CCMS::subscriber_list)
• Methods:
– get_subscription_tree: Displays a list of the subscribers hierarchy.
– get_subscription_endpoints: Displays a list of subscribers that are endpoints.

By using low level commands, you can access the attributes of the TMF_CCMS::ProfileManager object to get the list of subscribers for advanced maintenance of profile managers, profiles, and subscribers. The wgetsub command in Example 2-9 lists the subscribers of the profile manager All_Profiles_PM.

Example 2-9 The output of wgetsub command for All_Profiles_PM

bass:/#
bass:/#wgetsub @ProfileManager:All_Profiles_PM
I001_S_MaintMode_Alle_PM
I001_S_MaintMode_TMR1_PM
I001_S_MaintMode_TMR2_PM
I001_S_MaintMode_TMR3_PM

The idlcall command in Example 2-10 lists the subscribers of All_Profiles_PM.

Example 2-10 List the subscribers of the profile manager All_Profiles_PM

bass:/#idlcall -v 1438246632.1.1082#TMF_CCMS::ProfileManager get_subscription_tree
{ { 1438246632.1.1082#TMF_CCMS::ProfileManager "All_Profiles_PM" } { 4
{ { 1438246632.1.1038#TMF_CCMS::ProfileManager# "I001_S_MaintMode_Alle_PM" } { 0 } TRUE }
{ { 1438246632.1.1039#TMF_CCMS::ProfileManager# "I001_S_MaintMode_TMR1_PM" } {0 } TRUE }
{ { 1438246632.1.1040#TMF_CCMS::ProfileManager# "I001_S_MaintMode_TMR2_PM" } { 0 } TRUE }
{ { 1438246632.1.1041#TMF_CCMS::ProfileManager# "I001_S_MaintMode_TMR3_PM" } { 0 } TRUE }
} TRUE }bass:/#
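The brace-delimited output of get_subscription_tree is hard to read for large subscription trees. The following is a minimal sketch of a parser for this kind of output, based only on the shape seen in Example 2-10: each node is a pair of OID and label, followed by a sequence of child nodes (length first), followed by a Boolean. The function names are hypothetical, the meaning of the Boolean is not interpreted, and the input is assumed to have balanced braces.

```python
import re

# Tokens: an opening brace, a closing brace, a quoted string, or a bare word.
TOKEN_RE = re.compile(r'\{|\}|"[^"]*"|[^\s{}"]+')


def parse_braces(text):
    """Parse brace-delimited text into nested Python lists of strings."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def parse_value():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "{":
            items = []
            while tokens[pos] != "}":
                items.append(parse_value())
            pos += 1  # consume the closing brace
            return items
        if token.startswith('"'):
            return token[1:-1]  # drop the surrounding quotes
        return token

    return parse_value()


def subscriber_labels(tree):
    """Return the labels of the direct subscribers of the top-level node."""
    (_oid, _label), children, _flag = tree
    # children[0] is the sequence length; the child nodes follow it.
    return [child[0][1].strip() for child in children[1:]]
```

Applied to the output in Example 2-10, subscriber_labels(parse_braces(output)) would return the same four names that wgetsub prints in Example 2-9.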


To use idlattr to access the profile manager subscribers attribute, do the following:

1. Use wlookup to get the OID for the profile manager All_Profiles_PM.
2. Use objcall to get the attribute names for the resource type ProfileManager.
3. Use irview to get the interface definitions for methods and attributes for profile manager.
4. Use irview to get the typecode for the attribute subscribers.

Example 2-11 shows the commands used to access the profile manager subscribers attribute of All_Profiles_PM.

Example 2-11 List the attributes of the resource type profile manager

bass:/#wlookup -ar ProfileManager
All_Profiles_PM 1438246632.1.1082#TMF_CCMS::ProfileManager#
I001_S_MaintMode_Alle_PM 1438246632.1.1038#TMF_CCMS::ProfileManager#
I001_S_MaintMode_TMR1_PM 1438246632.1.1039#TMF_CCMS::ProfileManager#
I001_S_MaintMode_TMR2_PM 1438246632.1.1040#TMF_CCMS::ProfileManager#
I001_S_MaintMode_TMR3_PM 1438246632.1.1041#TMF_CCMS::ProfileManager#
bass:/#objcall 1438246632.1.1082#TMF_CCMS::ProfileManager contents
ATTRIBUTE:BDBPG:ACP:1438246632.1.1087:0
ATTRIBUTE:BDBPG:ACP:1438246632.1.1087:1
ATTRIBUTE:BDBPG:ACP:1438246632.1.1094:0
ATTRIBUTE:BDBPG:ACP:1438246632.1.1094:1
ATTRIBUTE:BDBPG:ACP:1438246632.1.1095:0
ATTRIBUTE:BDBPG:ACP:1438246632.1.1095:1
ATTRIBUTE:BDBPG:ACP:1438246632.1.1096:0
ATTRIBUTE:BDBPG:ACP:1438246632.1.1096:1
ATTRIBUTE:BDBPG:Sentry2.0:sentry_profile_0_1438246632.1.1088#Sentry::All#:0
ATTRIBUTE:BDBPG:Sentry2.0:sentry_profile_0_1438246632.1.1088#Sentry::All#:1
ATTRIBUTE:BDBPG:Sentry2.0:sentry_profile_1438246632.1.1083#Sentry::All#:0
ATTRIBUTE:BDBPG:Sentry2.0:sentry_profile_1438246632.1.1083#Sentry::All#:1
ATTRIBUTE:BDBPG:Sentry2.0:sentry_profile_1_1438246632.1.1089#Sentry::All#:0
ATTRIBUTE:BDBPG:Sentry2.0:sentry_profile_1_1438246632.1.1089#Sentry::All#:1
ATTRIBUTE:BDBPG:Sentry2.0:sentry_profile_2_1438246632.1.1090#Sentry::All#:0
ATTRIBUTE:BDBPG:Sentry2.0:sentry_profile_2_1438246632.1.1090#Sentry::All#:1
ATTRIBUTE:_BOA_id
ATTRIBUTE:class_objid
ATTRIBUTE:collections
ATTRIBUTE:databases
ATTRIBUTE:flags
ATTRIBUTE:label
ATTRIBUTE:last_failed
ATTRIBUTE:members
ATTRIBUTE:pres_object
ATTRIBUTE:pro
ATTRIBUTE:pro_name
ATTRIBUTE:profile_push_order
ATTRIBUTE:push_trans_commit_behavior

For more information about CCMS concepts, methods and attributes, refer to Tivoli Application Services Manual Volume 2 3.6, SC31-8350.


Chapter 3. Problem determination

This section gives an overview of tools useful for analyzing the contents of the Tivoli Object Repository. These tools are executed from the command line and perform various types of queries against the repository. The tools range from invoking object methods to browsing the raw object data. Tools that browse the Interface Repository, tools that retrieve attribute values, and tools that examine the status of currently active transactions are all covered in this chapter. The usage of these commands is described in 3.2, “Troubleshooting and search techniques” on page 66.

This chapter has the following main topics:

• Section 3.1, “Object Repository tools” on page 58
• Section 3.2, “Troubleshooting and search techniques” on page 66

© Copyright IBM Corp. 2003. All rights reserved.


3.1 Object Repository tools

We will start by describing various tools for analyzing the contents of the Tivoli Object Repository.

3.1.1 bdbe and bdbx

Database editor tools provide searching (scanning) and editing capabilities. These tools can be used to perform a raw scan of the data. The scan can be combined with a grep command to find references to objects (all keys contain the OID) or method names. The presence of the OID or the method name in the database indicates the object or method is still in use somewhere. For example:

bdbe -s $DBDIR/odb.bdb | grep
bdbe -s $DBDIR/imdb.bdb | grep

The bdbx command provides improved search and export/import capabilities. This tool has been specifically designed to help with the untangling of physically corrupted .bdb files. You can get this tool by contacting Tivoli Customer Support.

3.1.2 otherpages

The default action is to look for pages in use that are not allocated, and page references that are invalid. otherpages also checks that all referenced pages are marked as allocated and that page references are valid (within range and existing). For example:

otherpages $DBDIR/odb.bdb
otherpages $DBDIR/imdb.bdb
otherpages $DBDIR/notice.bdb

On Windows NT, the oserv process must be shut down, or the files must be copied to some temporary location. The reason for this is the inability of NT to allow multiple programs to access the same data file at the same time. An odadmin db_sync should be issued prior to using this command.

3.1.3 objcall

The objcall command is used to invoke non-IDL-based methods from the shell. For example:

objcall [-a] [-b] [-c group:role:...] [-e] [-F filedescriptor] [-k len] [-n] [-p port] [-s] [-T transtype] OID method [arg...]

Table 3-1 on page 59 lists and describes the objcall flags.


Table 3-1 objcall flag

Option               Description
-a                   Performs the object call asynchronously.
-b                   Passes the objcall command’s standard input to the method’s standard input. If this argument is not specified, the method gets an empty standard input.
-c group:role:...    Performs the object call with the specified group and specified roles. The caller can only specify roles that the caller has. If this option is not specified, the method runs with all of the caller’s roles.
-e                   Passes the objcall command’s environment as the method’s environment. If this argument is not specified, the method is given a default miniature environment.
-F filedescriptor    Specifies the file descriptor number to which to write status information.
-k len               Reads the number of bytes specified by the len option from standard input for the key value. If the -k argument is not specified, no key is used.
-n                   Starts the method and exits the objcall command asynchronously without waiting for the method to return results.
-p port              Specifies the local object dispatcher port number.
-s                   Creates keys for sending input and output to and from a method. This argument should be specified only if the method being called expects these keys as input.
-T transtype         Specifies a transaction type.
OID                  Specifies the object ID of the object that is to run the method.
method               Specifies the method to be run.
arg...               Specifies one or more arguments for the method.

3.1.4 idlcall

The idlcall command is one of the most frequently used object tool commands. It is the correct way to invoke any IDL-based method of any object. The command is very similar to the objcall command, which is used to invoke non-IDL-based methods, or substrate object methods. In order for the idlcall command to work correctly, the method being invoked must have been defined as an operation, using the CORBA standard IDL, and the interface in which the operation was defined must be implemented directly or indirectly (through inheritance) by the referenced OID. For example:

    idlcall [-T transtype] -v [in/inout args]

The idlcall command can invoke the method with or without a transaction context (using the -T option). The method must be invoked with all the defined IN or INOUT mode parameters (as specified in the IDL operation definition). Furthermore, the parameters must be stated in clear text format in order to be marshaled correctly. The output information returned by the method is also in clear text format. The output data from any method is displayed in the order in which the output parameters were stated in the signature of the operation, followed by any results.

Note: If the result of a method resolve command is a substrate OID (such as .1.0), then use objcall to invoke the method.

3.1.5 idlattr

The idlattr command retrieves, sets, or adds attributes directly to the object via its object identifier. The datatype of the attribute must always be specified. The value of the attribute, if a set or add operation, must be provided using the clear text notation consistent with the datatype of the attribute. So, the idlattr command is best used only to read an attribute value and only for attributes that have not been defined in IDL, and therefore may not have an access method. Attributes can also be defined in the class definition and may not have a defined access method:

    idlattr -t [-g | -s | -a | -v]

Table 3-2 lists and describes the flags for idlattr.

Table 3-2 idlattr flags

Option    Description
-t        Datatype is provided (mandatory).
-g        Retrieve attribute value.
-s        Set attribute value (requires a value).
-a        Add attribute and value (requires a value).
-v        Verbose mode.
OID       Object identifier.
name      Name of the attribute as defined in its TEIDL.
type      Type code of the attribute’s datatype.
value     A type value.

Note: idlattr defaults to -s (set) and can corrupt data if not used with great care.

3.1.6 odbls

The odbls command is useful for finding method and attribute information on a per OID or per method basis. Table 3-3 lists the odbls flags.

Table 3-3 odbls flags

Option         Description
-a             Displays the attributes in the object database.
-I             Walks through the object database inheritance list.
-i             Displays the inheritance trees in the object database. To use this option, you must use the TMR Server’s database.
-k $DBDIR      The directory that contains the object database to be listed. If this option is not specified, the database in the current directory is listed.
-l             Displays a verbose listing of the requested information.
-M methname    Walks through method headers and dumps entries for methname. To use this option, you must use the TMR Server’s database (requires the imdb.bdb file).
-m             Displays the method headers and dumps all entries. To use this option, you must use the TMR Server’s database directory.
-O             Walks through the object database (odb.bdb). This is the default.
-s             Forces the appropriate object dispatcher to update the database that is to be listed. This synchronization ensures that the odbls command reports the same data that the object dispatcher is using. If this option is not specified, then no synchronization is performed before listing the object database contents.

You must have read permissions on the database to use the odbls command. In addition, you must have the super role to use the -s option.


3.1.7 irview

The irview command is invoked from the command line just like many other commands. It is used to browse the interface repository. The irview command uses a repository ID to retrieve definitions of:
• Attributes (IDL)
• Operations (IDL)
• Interfaces (IDL and TEIDL)
• Modules (IDL and TEIDL)
• Parameters (IDL)
• Datatypes (IDL and TEIDL)

Common commands are shown in Table 3-4.

Table 3-4 Common irview commands

Command               Action
contents              Lists the attributes and methods.
describe              Describes a resource type.
describe_contents     Describes each of the resource types returned by contents.
describe_interface    Describes the attributes and interfaces for the resource type.

To invoke the command, use:

    irview

3.1.8 tmstat

The tmstat command displays the currently running transactions and locks and their current state. This command is primarily a debugging tool for users who are developing transaction-based applications; it allows such users to observe their transaction hierarchy. Each transaction ID that is displayed implicitly contains the transaction hierarchy; you can interpret {transA}{transB} to be a child of {transA}. For example:

    tmstat [-k dbdir] [-p port] [-r region] [-va] [baseobjid...]
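The nesting rule for transaction IDs can be illustrated with a short Python sketch. It assumes that each level of the hierarchy is written as one brace-delimited token, exactly as in {transA}{transB}; the function names are hypothetical and are not part of any Tivoli tool, and real tmstat output may need additional handling.

```python
import re


def split_transaction_id(tid):
    """Split '{transA}{transB}' into ['transA', 'transB'], outermost first."""
    parts = re.findall(r"\{([^{}]*)\}", tid)
    # Reject input that is not made up purely of {...} tokens.
    if not parts or "".join("{%s}" % p for p in parts) != tid:
        raise ValueError("not a brace-delimited transaction ID: %r" % tid)
    return parts


def parent_of(tid):
    """Return the parent transaction ID, or None for a top-level transaction."""
    parts = split_transaction_id(tid)
    if len(parts) == 1:
        return None
    return "".join("{%s}" % p for p in parts[:-1])


if __name__ == "__main__":
    print(parent_of("{transA}{transB}"))
```

Dropping the last token yields the parent, so a transaction with a single token is a top-level transaction.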

Table 3-5 lists and describes the flags for tmstat.

Table 3-5 tmstat flags

Option          Description
-k $DBDIR       Specifies the database directory.
-p port         Specifies the local port number.
-r region       Queries a different region. The region option specifies the base object ID on the remote region Tivoli Management Region (TMR) server.
-v              Specifies verbose mode. Lists of subtransactions are dumped.
-a              Displays all object IDs in the base list of the local region or the region specified by -r region.
baseobjid...    Specifies the object dispatcher to query. Multiple baseobjid options can be specified.

“tmstat scenario” on page 87 gives an example of tmstat usage in a real-life scenario.

3.1.9 odstat

The odstat command lists the status of current and recent object calls for the specified object dispatcher. This command can list object calls from a running dispatcher. The most common form of the command is simply the command name invoked from the command line. The command is limited to listing not more than 200 active and inactive transactions. Table 3-6 lists and describes the flags for odstat.

Table 3-6 odstat arguments

Option       Description
-a           Lists all threads. By default, system threads are omitted.
-c           Lists currently running threads.
-d           Lists the active method-daemon processes.
-h           Terminated threads only.
-k $DBDIR    Returns information from the dispatcher that is using the database in the specified directory. Normally, you should also specify the object dispatcher’s process ID (PID). If you do not, the odstat command makes a guess of the object dispatcher’s process ID, and this guess is frequently wrong.
-l           Returns a long listing.
-o           Queries remote object server.
-p           Provides dispatcher process ID.
-s           Returns a short listing.
-v           Specifies verbose mode.
pid          Specifies the process ID of the dispatcher. If you specify this option, you must also specify the -k option.

Table 3-7 describes the various output data that can be obtained from the odstat command.

Table 3-7 odstat output

Output    Description
TID       An abbreviation for thread ID.
Type      Refers to what type of item is being called:
            O     Object call thread.
            M     Method thread.
            O+    Object call and method threads on the same machine.
            a     Asynchronous object call.
            h     “Helperless” method.
            d     Daemon method.
            b     One-way method.
PTID      The parent thread ID. If this field is blank, the objcall was external. Identifies the parent object’s dispatcher number and thread ID.
State     Refers to the status of the objcall (rwait, mwait, run, done, and so on).
Err       Displays error code. Global error codes are defined in . Additional error codes are defined in .
StdO      Represents the number of bytes written to the standard output.
StdE      Represents the number of bytes written to the standard error.
Start     Represents the time the objcall started.
Method    Refers to the method the objcall invoked.

“Odstat scenario” on page 78 gives an example of odstat usage in a real-life scenario. A wtrace is needed when an odstat alone does not give enough information. wtrace is the subject of the next section.

3.1.10 wtrace

The wtrace command is used to diagnose problems in methods and executables by examining method input, transactions, and method output. Trace data is stored in the odtrace.log file. This file enables the trace data to be persistent across invocations of the object dispatcher. The tracing state, however, is reset to:

    odadmin trace errors (objcalls, services)

Since wtrace examines the trace log directly, it does not require the object dispatcher to be active, although having one available to run odstat is helpful. You can use the odadmin utility to enable tracing for single or multiple object dispatchers, for the server only, or for clients only. Table 3-8 lists and describes the wtrace flags.

Table 3-8 wtrace flags

Option       Description
-D           Prints large blocks of input data.
-M           Prints entire argument list.
-E           Does not print error records.
-H           Prints hex dumps.
-I           Does not print input records.
-O           Does not print output records.
-V           Prints out version information.
-f           Prints numbers in formatted form.
-h           Prints header.
-j           Preferred format for normal screens (80 cols).
-J           Preferred format for wide screens (120 cols).
-k $DBDIR    Specifies the location of the database directory.
-l           Prints longer form.
-n           Prints new lines between transactions.
-o           Retains the old transaction form.
-u           Prints this usage message.
-v           Prints output records.

“wtrace scenario” on page 81 gives an example of wtrace usage in a real-life scenario.

3.2 Troubleshooting and search techniques

In the first part of this section, we will talk more about the commands, objects, database terms, and techniques used in searching for specific information in the object database, such as method definitions. In the second part, we will make use of the commands and techniques covered in the first part to troubleshoot the object database and to access objects, classes, and method definitions.

Note: The techniques and commands in this section may cause changes in the object database. Invocations of the objcall, idlcall, and idlattr commands can change the status of your database, which, in turn, may have a drastic effect on the operation of your TMR. Therefore, we recommend you back up your object database before executing any of the commands mentioned in this section. Techniques and commands should first be tested, and the outcome of each step should be carefully observed on an isolated TMR before implementing any of these actions on your production environment.

3.2.1 Basic object database access commands

The following is a list of the main object commands used for troubleshooting and searching. These commands are explained in detail in 3.1, “Object Repository tools” on page 58. A simple definition of each command is given here for quick reference.

• objcall: Used on objects to execute methods. For example:

    # objcall

  If the objcall method fails, try using idlcall or idlattr to execute methods or get/set the values of attributes.

• idlcall: Used to execute methods, retrieve, and set object attribute values. For example:

    # idlcall

• idlattr: Used to get and set attribute values of objects. For example:

    # idlattr -t [g,s]

  Attention: Use the idlattr command carefully, since the -s option (set attribute values) is the default.

• bdbx: Used to statistically analyze the object database files. For example:

    # bdbx bdbfile

• odbls: Used to retrieve information about the methods, inheritance, and attribute data. For example:

    # odbls -lM

• irview: Used for looking up information in the interface repository. For example:

    # irview

• wlookup: Used to look up an instance of a resource type from the Tivoli name registry.

• wls: Used to look up an object in the local TMR database:

    wls [-odl] [Path]
    wls [-odl] /Library/

Examples related to troubleshooting and search

Example 3-1 shows the output for odbls -a .

Example 3-1 A list of attributes of the managed node bass using odbls

    bass:/#pwd
    /var/spool/Tivoli/bass.db
    bass:/#wlookup -ar ManagedNode
    bass 1438246632.1.348#TMF_ManagedNode::Managed_Node#
    mackeral 1438246632.5.7#TMF_ManagedNode::Managed_Node#
    stuttgart 1438246632.6.7#TMF_ManagedNode::Managed_Node#
    bass:/#odbls -a 1438246632.1.348
    1438246632.1.348 attributes:
    __BOA_id

filename to make the oserv give you the key before you remove the old installation. You can then cut and paste it from the file name during the new install process.

You can use odadmin set_platform_license to load the key. You can install a TMR without a license key, but you will not be able to perform any remote functions or install a managed node or endpoint. You may also need to change the license key if you typed it incorrectly.

6.3 Server installation: Behind the scenes

The information in this section is largely derived from the Tivoli Field Guide: Demystifying the Installation Process (see 1.3.1, “Tivoli Field Guides” on page 7 for more information on how to find this publication). The Tivoli Framework 3.7.1 Installation Guide, GC32-0395, provides plenty of information on the actual install process. Here, we list the most important items to remember when performing a TMR Server installation.

Before installing the Tivoli Management Framework on the server, the user must have determined the following:
• Login ID with access as root (for UNIX) or Administrator (for Windows NT).
• A TMR name and server name.
• The TMR installation password (optional).
• The Tivoli license key.
• The path(s) where the files will reside.
• For Windows NT, a Tivoli remote access account (TRAA), if binaries will be shared over the network.

Once the Install or Install and Close buttons are chosen, the information is verified, and the installation begins.


Tip:
• It is more secure if the TRAA used for sharing binaries in a Windows NT domain is a user with fewer privileges than the local Administrator’s account.
• Do not install the TMR Server on a Windows NT Primary Domain Controller (PDC). It can cause problems if you use the domain administrator account and password for installs.
• Use the local administrator account on machines in a Windows NT domain.
• The first Windows NT system installed with TRIP in a TMR is identified by the variable CurrentNtRepeat. TRIP is copied from this machine to other Windows NT systems during the install. This variable may need to be changed if there are access problems between this machine and new clients.

The server installation performs the following:
• Transfers binaries, libraries, man pages, and message catalogs to the server.
• Installs a template of the object database and modifies it to include the server’s name, region number, interpreter type, and so on. We can see this activity in the oservlog of a new server installation, as shown in Example 6-2.

Example 6-2 oservlog of a new server installation

    Nov 06 16:08:21: $converting odlist region numbers (2099999999 -> 1360991896)
    Nov 06 16:08:21: $changing ALI's secret key to new random value.
    Nov 06 16:08:22: $Database mismatch (from stout/146.84.27.11)
    Nov 06 16:08:22: $Migrating ALI.
    Nov 06 16:08:22: $changing encryption type from none to simple
    Nov 06 16:08:22: TME 10 Framework (tmpbuild) #1 Thu Oct 3 08:08:58 CDT 1996
    Copyright Tivoli Systems, an IBM Company, 1996. All Rights Reserved.
    TMR 1360991896. ORB 1. TMR Server local:94. Port 94. pid 24052

The following is a list of some of the attributes that changed during the server installation:
• Region number
• Secret key (ALI refers to the TMR Server)
• Host name
• IP address
• Encryption level


The default object database shipped with the install code includes a place-holder machine name. This name (stout in this case) appears in this file. You might see this name when looking directly at objects in the object database. This name is replaced with the user-supplied name in externalized objects. If the server icon on the desktop does not show the correct server name, then the installation did not complete and was, therefore, unsuccessful.

6.3.1 Troubleshooting server installs

A complete list of error files after a server install is shown in Table 6-1.

Table 6-1 List of error files after a server install

File                                       UNIX location    Windows NT/2000 location    Location
tivoli.cinstall                            /tmp             %dbdir%\tmp                 Server
oservlog                                   $DBDIR           %dbdir%                     Server
install2.cfg.error, install2.cfg.output    /tmp             %dbdir%                     Server

The following sections list common server installation problems. In addition, please see 6.6, “Common errors for server and client installs” on page 182 for errors common to both server and client installs.

Copying CD-ROM

Use wcpcdrom in the original install directory where wpreinst.sh was executed:

    wcpcdrom /cdrom /cdrom.shadow

This creates a directory tree of soft links pointing to /cdrom. Small files (.cfg and .ind) are copied, so they can be edited if required. There have been cases where wcpcdrom alone does not create the right letter case for files, causing a problem when the link is used during installation. If you receive File Not Found messages after using wcpcdrom, you need to check for this problem.

If you want to create a disk-based image to install from, the recommended process is to use the Software Installation Service (SIS). The steps below provide another way to create a CD-ROM image of the Tivoli Management Framework on your disk if you still need to do so. This is not supported by SIS.

First, create the link tree. This wcpcdrom command just creates links:

    # cd /
    # mkdir temp.dir
    # wcpcdrom /cdrom /temp.dir

Then copy the image to disk, following the links:

    # mkdir /TME3
    # cd /temp.dir
    # tar -chf - . | (cd /TME3; tar -xvf -)

Finally, remove the temporary link tree:

    # cd /
    # rm -r temp.dir

Here, /temp.dir is a temporary directory, /cdrom is the path to your CD-ROM device, and /TME3 is the final destination of the image. There is a file called file0.tar created here, which you can use in place of the WPREINST.SH command. Copy file0.tar to a temporary directory and then untar it. This basically creates the same structure as WPREINST.SH would, and all you have to do to install a TMR Server is run ./wserver -c / from the temporary directory.

Host name

If you install the server using the fully qualified domain name, you will have to set up a .rhosts entry for the server; otherwise, Tivoli thinks that a remote install is being performed and uses rsh.

6.4 Client installation: Behind the scenes

There are several different ways that a managed node can be installed, including the following:
• Command line installation using wclient
• Software Installation Service (SIS) installation
• Using the Tivoli Management Framework graphical user interface (GUI) managed node product install
• Integrated Install (based on InstallShield Multi-Platform technology)

Although these methods all have somewhat different troubleshooting methodologies, there are troubleshooting tools common to all of them. We will explore that common ground by understanding the process by which a managed node is created as an entity. To do this, we will examine the case of using the GUI installation. While navigating through all the steps to create a managed node, we will show you possible points of failure and how to troubleshoot them.


For more information about the SIS installation, please refer to Chapter 7, “Software Installation Service (SIS)” on page 185. Integrated Install will be covered in Chapter 8, “ISMP based installation (Integrated Installation)” on page 211.

6.4.1 Start of managed node install

At the moment that a Tivoli administrator selects Create -> ManagedNode within a policy region, the Tivoli install begins.

Process begins

The processes client_gui and managed_node_pd start on the TMR Server. The processes run as the Tivoli administrator's user ID (UID), as defined in the Properties dialog. If a Tivoli administrator is authorized to install (has the install_client or super role assigned), then the administrator's UID must exist on the TMR Server (UNIX). You can deny login rights for a particular account if your security policies require this (/bin/false, for example).

If the tivoli.cinstall file does not exist, the file is created; otherwise, the file is truncated for the new install. It is important to save these files after each install in case a problem arises, because any install (creation of managed node, product install, or patch upgrade) reuses this file. The file is saved in %DBDIR%\tmp on the Windows platform, and /tmp on the UNIX platform. In the case that the file is truncated, look for the following lines to identify the start of the install:

    Previous debug file truncated...
    bug file truncated...
    End of previous debugging output.
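Because every install reuses and truncates tivoli.cinstall, one way to follow the advice above is to copy the file aside under a timestamped name after each install. The following is a minimal Python sketch, assuming Python is available on the server; the function name and the archive directory are hypothetical, and the log path must be supplied for your platform (/tmp on UNIX, %DBDIR%\tmp on Windows).

```python
import os
import shutil
import time


def archive_install_log(log_path, archive_dir):
    """Copy log_path into archive_dir under a timestamped name; return the new path."""
    os.makedirs(archive_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = os.path.join(archive_dir, "%s.%s" % (os.path.basename(log_path), stamp))
    shutil.copy2(log_path, target)  # copy2 also preserves the modification time
    return target


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        print(archive_install_log(sys.argv[1], sys.argv[2]))
```

Run it with the log path and an archive directory as its two arguments before starting the next install, so that the earlier debug output is still available for comparison.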

Install lock

This is an example of an install lock:

    Engine main in...
    Gui args in real ms=0/0, cpu ms=0/0
    Got install lock 0

Note the install lock. The install_engine does not allow for multiple instances.

Installation object

At this point, the install_engine reads in the data attribute of the Installation object as follows:

    idlcall `wlookup Installation` _get_data


This data is static and is reflected in the first 186 lines of the tivoli.cinstall. In almost all cases, this information has no bearing on troubleshooting a failed install. However, the information in the data object is important, because it contains the data to identify the different interp types, and what the inspection scripts look like. If problems occur before the transfer of files, and all other avenues are exhausted, the following section might provide a clue to the root cause:

    Line 8: New configuration item: type=generic, id=ALI_NAME opus
    Line 186: New configuration item: type=u6000_svr4mp, id=unpackOptions -u-F

Installation directory

The next step is to locate the last used directory for an install:

    localName get: opus
    Constructing Cdrom obj: opus, /data/cdrom/tmp31, solaris2

In some cases, if the media location was on a managed node that no longer exists, an error is reported and the Client Install window will not appear. To verify and correct the location, perform the following steps:

    idlcall `wlookup Installation` _get_media > c:\temp\media.attrib
    cat c:\temp\media.attrib
    { "solaris2" "opus" "/data/cdrom/tmp31" }

Edit the file to reflect a valid attribute. On Windows, use a utility, such as notepad.exe, that does not add carriage returns. Then, issue the following command:

    idlcall `wlookup Installation` _set_media < c:\temp\media.attrib
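The media attribute shown above is a brace-delimited group of quoted strings. As an alternative to hand editing, the following Python sketch parses such a value and writes it back with LF-only line endings, which avoids the carriage-return problem on Windows. It assumes the formatting seen in the example output and that the three fields are interpreter type, host, and path in that order; the function names are hypothetical.

```python
import re


def parse_media(text):
    """Return the quoted fields of a media attribute such as
    { "solaris2" "opus" "/data/cdrom/tmp31" } as a list of strings."""
    body = text.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("expected a brace-delimited value")
    return re.findall(r'"([^"]*)"', body[1:-1])


def format_media(fields):
    """Rebuild the attribute text from a list of fields."""
    return "{ " + " ".join('"%s"' % f for f in fields) + " }\n"


def write_media(path, fields):
    """Write the attribute in binary mode so no carriage returns are added."""
    with open(path, "wb") as handle:
        handle.write(format_media(fields).encode("ascii"))
```

A typical use would be to parse the saved file, replace the host or path field with a valid value, call write_media, and then feed the result to the _set_media call shown above.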

The managed_node_pd builds a list of the current managed nodes installed in the TMR. This list is built from the Tivoli Name Registry and the Installation object.

    Constructing InstallHost host: opus type solaris2 user: root
    Initializing managed node cache:
    freedom, 1423501536.2.7#TMF_ManagedNode::Managed_Node#, w32-ix86
    opus, 1423501536.1.327#TMF_ManagedNode::Managed_Node#, solaris2
    Finished initializing managed node cache.

If the error in the tivoli.cinstall is a slow interp on any of the managed nodes in the list, there are two possibilities:

• A managed node was not cleaned up correctly (the wrmnode command failed). Verify that no managed node name listed is invalid. Commands to look for bad references of a managed node are as follows:

    wls -l /Library/ManagedNode
    odadmin odlist
    wlookup -ar ManagedNode

  Often, the error slow interp is due to a reference in the Name Registry (wlookup). If so, issue the following command:

    wregister -u -r ManagedNode

• The managed node is a valid node, but the Installation object does not know the interpreter of the node. This can occur when a previous install did not update the Installation object. To correct this, issue the command:

    $BINDIR/TAS/INSTALL/init-nodes -a

This should update the managed node's interp in the Installation object. At this point, the Client Install dialog appears.
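One way to look for stale references is to compare the managed node cache lines from tivoli.cinstall with the output of wlookup -ar ManagedNode. The Python sketch below assumes the two line formats shown in the examples in this chapter (comma-separated in the cache listing, whitespace-separated in the wlookup output); the function names are hypothetical, and a reported name is only a candidate for closer inspection, not proof of a bad reference.

```python
def parse_cache_lines(lines):
    """Parse 'name, OID#class#, interp' lines from the managed node cache listing."""
    nodes = {}
    for line in lines:
        parts = [part.strip() for part in line.split(",")]
        if len(parts) == 3 and "#" in parts[1]:
            nodes[parts[0]] = parts[1].split("#", 1)[0]
    return nodes


def parse_wlookup_lines(lines):
    """Parse 'name OID#class#' lines as printed by wlookup -ar ManagedNode."""
    nodes = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 2 and "#" in fields[1]:
            nodes[fields[0]] = fields[1].split("#", 1)[0]
    return nodes


def stale_entries(cache, registry):
    """Names whose OID is missing from, or different in, one of the two listings."""
    names = set(cache) | set(registry)
    return sorted(n for n in names if cache.get(n) != registry.get(n))
```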

Client Install window

Within the client installation dialog, the Tivoli administrator can input several parameters for the target nodes, as shown in the following list:

Installation password
    This is defined when generating the TMR, and if used, is passed to the target node's oserv upon initialization.

Default access account
    This must be a UNIX root or NT administrator account that will be responsible for installing the files on the target. An NT target will also have Tivoli Remote Installation Process (TRIP) installed with this user account.

List of targets
    Names of managed nodes to be created.

The location of the media
    Note that if the media is copied to disk, it should be done with the wcpcdrom command.

Installation locations
    These locations are the directories and parameters on start of oserv.

6.4.2 The install begins

When the user selects the Install or Install & Close button, the target node is contacted. The process install_engine starts on the TMR Server and is now tasked with the actual installation.

Installation sequence for TRIP

The installation sequence for TRIP is covered in this section.

Initial probe of target nodes

If the user attempts to use the Trusted Host Default Access Method, the install_engine will attempt to execute the probe scripts. If the user chooses to use the Account Method (Windows does not support Trusted Host, for example), the install will attempt to use the rexec method.

TRIP

The Tivoli installation always assumes the target is an NT system. It first attempts to install TRIP (see Example 6-3).

Example 6-3 Installing TRIP

    [Lines 197-230 omitted]
    Processing host rosebud using global default access...
    configuration item unfound: type=generic, id=dontTrip
    Warning: empty substitution for dontTrip
    Substituted 1 instances of @dontTrip@
    Trying to install trip on host: rosebud, user: CRITSIT-LAB\mhahn

Note: In order to install TRIP, the customer has to have open security settings.

CurrentNtRepeat

The first Windows machine that is installed in the TMR must have the Tivoli Remote Installation Product (TRIP) installed manually. A manual install of TRIP enables the user to designate the directory and drive where TRIP is installed. However, subsequent installations of TRIP through the installation process are installed in c:\Tivoli\trip.

After the manual install of TRIP is performed on the first NT system, and that system is created as a managed node, the Tivoli Management Framework assigns this node as the CurrentNtRepeat machine. This becomes the node that will attempt to install TRIP remotely on subsequent targets. Care must be taken that the CurrentNtRepeat machine is in the same domain or a trusted domain, so that the Default Access Method Account and password are valid. Also, TRIP does not have to be running for the CurrentNtRepeat node to remotely install TRIP.

If needed, the CurrentNtRepeat machine can be reassigned to another Windows NT managed node in the TMR. To do this, follow these steps:

1. Look up the designated CurrentNtRepeat system as follows:

    $ wlookup CurrentNtRepeat
    1423501536.2.11#TMF_Install::NtRepeat#

2. Look up the Windows NT managed node that you want to designate as the CurrentNtRepeat node as follows:

    $ wlookup -r ManagedNode rosebud
    1423501536.3.7#TMF_ManagedNode::Managed_Node#

3. Unregister the current CurrentNtRepeat node and register the new Windows NT managed node as the CurrentNtRepeat system with the following commands:

    $ wregister -u CurrentNtRepeat
    $ wregister CurrentNtRepeat 1423501536.3.7
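The wlookup output in steps 1 and 2 carries a class name between # characters, while wregister in step 3 is given only the numeric part. A small Python sketch of that split follows. It assumes the three numbers are the region, dispatcher, and object number, as the examples in this chapter suggest; the function name is hypothetical.

```python
def split_object_reference(ref):
    """Split '1423501536.3.7#TMF_ManagedNode::Managed_Node#' into
    (oid, class_name, (region, dispatcher, object_number))."""
    oid, _, rest = ref.strip().partition("#")
    class_name = rest.rstrip("#")
    numbers = tuple(int(part) for part in oid.split("."))
    if len(numbers) != 3:
        raise ValueError("expected three dot-separated numbers: %r" % oid)
    return oid, class_name, numbers
```

The first element of the result is the value to pass to wregister; the class name is useful as a sanity check that the object looked up is really a managed node.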

How TRIP is installed

The CurrentNtRepeat node attempts to install TRIP using the NT Server Message Block protocol and requires that the C$ be shared for Administrators (it is, by default). If the target is a UNIX machine, the attempt fails, and the installation assumes that the target already has rexec capabilities. The CurrentNtRepeat node then attempts the inspection scripts. The CurrentNtRepeat node keeps a log of the attempted install in %DBDIR%\tmp called ntrepeatlog. The steps for a successful installation are shown in the following list:

1. Identify the target node and the account to use.
2. Install TRIP service on machine rosebud(w32-ix86), user CRITSIT-LAB\mhahn.
3. Set up the path for the copy of files.
4. Verify whether TRIP is already running.
5. Begin to copy the TRIP files to \\\C$. The files that are copied to the target node are referenced in the file $BINDIR/TAS/INSTALL/tripfiles on the CurrentNtRepeat machine.
6. Install the service. Although not noted in the ntrepeatlog, the actual command is trip -install, which creates the TRIP service and starts it. The registry changes are in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\trip.

The tivoli.cinstall shows that the TRIP service is installed, as well as the interp of the target node. If TRIP is installed, then the install_engine assigns the w32-ix86 interp to the target. If TRIP was not installed or was already running, the install_engine must still find out the interp. This is done in the inspection portion of the install (see Example 6-4 on page 165).


Example 6-4 Inspection portion of the install

    back from trip install
    installed Trip [Tivoli Remote Installation Service installed and started on rosebud...]
    Constructing InstallHost host: rosebud type w32-ix86 user:

Possible issues with the TRIP installation

There are several issues pertaining to the remote installation of TRIP. If the installation of TRIP fails, the ntrepeatlog might offer an explanation:

• Wrong or invalid password

  The ntrepeatlog shows no error other than that it is unable to verify that the service is running:

  Installing trip service on machine rosebud(w32-ix86), user CRITSIT-LAB\fred, drive C$(C:/)
  Remote path: \\rosebud\C$

• Port 512 already in use

  Several products available for Windows NT also use port 512, such as Hummingbird's Exceed product. Also, Windows NT 3.51 Service Pack 5 introduced a new spooler that can claim port 512 as one of its ports. The error in the ntrepeatlog is:

  StartService failed rc=8
  service installed but not started on rosebud
  Rexec socket already bound to something else...

• Workstation service is not started

  This installation method relies on the Workstation service. If this service is not running, the error is as follows:

  In Main:
  NtRepeat::add in, oid 1423501536.2.11#TMF_Install::NtRepeat#
  Exec dir: c:\Tiv31\bin
  NetWkstaGetInfo returned 53 ERROR_BAD_NETPATH, it could be a unix box. Abandon trip.
  Can't id machine asdfg
  NtRepeat::add out, oid 1423501536.2.11#TMF_Install::NtRepeat#, rv=

• Installing TRIP on a drive other than C:

  Sometimes, installing TRIP on the C: drive of the remote target is not possible (because of policy or space). In this case, you can set up the Tivoli install to force the install onto another drive. Dump the data attribute of the Installation object with the following command:

  idlcall `wlookup Installation` _get_data > /tmp/file

  Edit the file, and change the following lines:

  Line 150: generic TripClientDrivePath d:/
  Line 156: generic TripShareName D$

  Be sure that the new drive is shared to the administrator. Then, after making sure that you have a Tivoli backup, reset the data attribute:

  idlcall `wlookup Installation` _set_data < /tmp/file
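The edit of the dumped data attribute can be scripted instead of done by hand. The following Python fragment is only a minimal sketch: it assumes that each setting in the dump is a whitespace-separated line of the form "generic <id> <value>", as the two lines quoted above suggest, and the function name and the sample input (including the c:/ and C$ starting values) are hypothetical. It only rewrites text; dumping and reloading the attribute is still done with the idlcall commands shown above.

```python
def override_trip_drive(data_text, drive_path="d:/", share_name="D$"):
    """Return data_text with the two TRIP drive settings replaced.

    Assumption: settings appear as lines of the form 'generic <id> <value>'.
    Lines that do not match are passed through unchanged.
    """
    overrides = {"TripClientDrivePath": drive_path, "TripShareName": share_name}
    result = []
    for line in data_text.split("\n"):
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "generic" and parts[1] in overrides:
            line = "generic {} {}".format(parts[1], overrides[parts[1]])
        result.append(line)
    return "\n".join(result)


# Hypothetical excerpt of a dumped data attribute.
sample = (
    "generic TripClientDrivePath c:/\n"
    "generic TripShareName C$\n"
    "generic SomethingElse untouched\n"
)
edited = override_trip_drive(sample)
print(edited)
```

Compare the edited file with the original dump (for example, with diff) before loading it back, so that only the two intended lines differ.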

Validation that the media packet matches

Before the inspection of the installation, the install_engine validates whether the media that was defined matches the version of the Tivoli Management Framework installed. A mismatch typically does not occur unless Tier 2 managed nodes were installed, and a user might have modified the Installation object to install the product. The tivoli.cinstall verifies that the revisions are correct (see Example 6-5).

Example 6-5 Verifying the revisions

Line 243: media revision matches current product's rev.

If the revision is incorrect, the error appears like this in the tivoli.cinstall:

[The revision levels of the installation and media do not match: ``3.0 3.1''.]
[ ]
Had exception in client install check real ms=2083/203560, cpu ms=0/0
The revision levels of the installation and media do not match:
Entering goodbye real ms=0/203560, cpu ms=0/0

To correct this problem, confirm that you are using the correct CD-ROM image. Look at the /tmf.ind file; the first two lines show the version as follows:

TMF:description:TME 10 Framework, Version 3.1:TMF
TMF:revision:3.1

If the version is correct, verify that the Installation object has the correct version:

idlcall `wlookup Installation` _get_revision

If the version does not match, yet the Tivoli Management Framework is known to be correct, issue the command:

idlcall `wlookup Installation` _set_revision '"3.1"'


Be sure you have a Tivoli backup of the TMR Server before running this command.
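The comparison between the media index and the Installation object can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the index layout is taken from the two tmf.ind lines quoted above, and the assumption that the revision returned by _get_revision may be wrapped in double quotes is not confirmed by the text.

```python
def media_revision(index_text, tag="TMF"):
    """Return the revision recorded in an index file, or None if absent.

    Looks for a colon-separated line of the form '<tag>:revision:<value>'.
    """
    for line in index_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 3 and fields[0] == tag and fields[1] == "revision":
            return fields[2].strip()
    return None


def revisions_match(index_text, installation_revision):
    """Compare the media revision with the Installation object's revision.

    Assumption: the installation revision may arrive quoted, for example '"3.1"'.
    """
    return media_revision(index_text) == installation_revision.strip().strip('"')


index_sample = "TMF:description:TME 10 Framework, Version 3.1:TMF\nTMF:revision:3.1\n"
print(media_revision(index_sample))
print(revisions_match(index_sample, '"3.1"'))
print(revisions_match(index_sample, "3.0"))
```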

Inspection of the target node (REXEC method)

If the Tivoli administrator chose Account as the Default Access Method, the install_engine uses rexec to reach the target and determine its OS and major version. As noted previously, if TRIP is installed or discovered on the target, the install_engine knows that the target is a Windows NT node and does not execute the portion of the inspection script that determines the interp.

If the target was not a Windows NT node

Example 6-6 is the portion of a tivoli.cinstall when the target was a UNIX machine. Note that after it is determined that the target was not processing SMB packets, the install assumed that the node was a UNIX machine.

Example 6-6 The target is a UNIX machine

couldn't install trip on crescent
Substituted 1 instances of @uname@
uname:uname -a || /usr/bin/hostinfo || echo FAILED
Substituted 1 instances of @defaultPath@
configuration item unfound: type=generic, id=TripPath
Warning: empty substitution for TripPath
Substituted 1 instances of @TripPath@
Before Command real ms=20327/20327, cpu ms=0/0
executing following command #1 on crescent:
PATH="/bin:/usr/bin:/usr/ucb:$PATH";export PATH;umask 022;echo BEG_OF_STREAM;uname -a || /usr/bin/hostinfo || echo FAILED; echo; echo END_OF_STREAM
Adding local host alias: opus
rexec port is 512, real port is 512
stdout read 14 bytes.
stdout read 44 bytes.
0 byte read of stdout
0 byte read of stderr

At this point, the output is compared to regular expressions stored in the data attribute of the Installation object (see Example 6-7).

Example 6-7 Output is compared to regular expressions stored in the data attribute

output: SunOS crescent 4.1.4 2 sun4c
Substituted 1 instances of @identity@
identity info:
aix3-r2=^AIX [-A-Za-z0-9_+.]+ 2 3
aix4-r1=^AIX [-A-Za-z0-9_+.]+ [12] 4
hpux9=^HP-UX.*[A-Z].09.[0-9][0-9] [A-Z] 9000/
hpux10=^HP-UX.*[A-Z].10.[0-9][0-9] [A-Z] 9000/
sunos4=^SunOS [-A-Za-z0-9_+.]+ 4[.]1
sunos4=^CSROS.*
solaris2=^SunOS [-A-Za-z0-9_+.]+ 5[.]
sysv4-att=.* 386/486/MC
dgux5=^dgux
sysv4-m88k=^.* m88k
uw2-ix86=^.* 4[.]2MP 2[.]0.* i386.*
osf-axp=^OSF1.*V[23][.][0-9].* alpha
mips-irix5=^IRIX .* 5.* mips
nextstep3-ix86=.*NeXT.*
u6000_svr4mp=.* 4.0 2
w32-ix86=^Windows_NT
Found host match, crescent==sunos4
Constructing InstallHost host: crescent type sunos4 user:

Example 6-7 shows that crescent, the target managed node, was matched to the sunos4 interp. If an error occurs regarding an unknown interpreter type, this portion of the log can explain why.
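To see why a given uname string does or does not resolve to an interp, the patterns from Example 6-7 can be tried by hand. The Python fragment below is a sketch, not the install_engine's implementation: it copies a subset of the expressions from the example, and it assumes that the patterns are tried in order and that the first one that matches wins, which the log does not state explicitly.

```python
import re

# Subset of the identity patterns listed in Example 6-7, in the same order.
IDENTITY_PATTERNS = [
    ("aix4-r1", r"^AIX [-A-Za-z0-9_+.]+ [12] 4"),
    ("hpux10", r"^HP-UX.*[A-Z].10.[0-9][0-9] [A-Z] 9000/"),
    ("sunos4", r"^SunOS [-A-Za-z0-9_+.]+ 4[.]1"),
    ("solaris2", r"^SunOS [-A-Za-z0-9_+.]+ 5[.]"),
    ("w32-ix86", r"^Windows_NT"),
]


def match_interp(uname_output, patterns=IDENTITY_PATTERNS):
    """Return the first interp whose pattern matches, or None (unknown interp)."""
    for interp, pattern in patterns:
        if re.search(pattern, uname_output):
            return interp
    return None


print(match_interp("SunOS crescent 4.1.4 2 sun4c"))
print(match_interp("Linux somehost 2.4.18"))
```

A result of None corresponds to the unknown interp case: no expression in the list recognizes the output returned by the target.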

If the target is a Windows NT node

There is no need to determine the OS as you did for the UNIX portion, because the ability of the CurrentNtRepeat node to communicate with the target establishes that this target is a Windows NT node.

Inspection of the target node (Trusted Host Method)

If the administrator chose Trusted Host for the Default Access Method, the initial probe uses the trusted host method. This is not an option for a Windows NT node.

Creating the directory structure

In lines 245-391 of the tivoli.cinstall, the install_engine runs a series of scripts to generate the directory structures. These scripts run as the Default Access Method Account. The script is shown in Example 6-8.

Example 6-8 Generating the directory structures

input: PATH="/bin:/usr/bin:/usr/ucb:$PATH";export PATH;umask 022;PFN=TMF_;HN=crescent;(
echo LIB /opt/Tivoli31/lib sunos4 .installed/TMF_LIB;
echo BIN /opt/Tivoli31/bin sunos4 .installed/TMF_BIN;
echo DB /opt/Tivoli31/db crescent.db .installed/TMF_DB;
echo MAN /opt/Tivoli31/man sunos4 .installed/TMF_MAN;
echo CAT /opt/Tivoli31/msg_cat EMPTY .installed/TMF_CAT;
echo APPD /opt/Tivoli31/X11/app-defaults EMPTY .installed/TMF_APPD;
echo GBIN /opt/Tivoli31/bin generic_unix .installed/TMF_GBIN;
echo CONTRIB /opt/Tivoli31/bin sunos4/contrib .installed/TMF_CONTRIB;
) |while read T RD AD IF; do
  if [ -f /tmp/TransferEnv ]; then
    rm /tmp/TransferEnv
  fi
  if [ ! -d $RD -a "1" = "0" ]; then
    echo "CREATE_FAILURE:$T"
  elif [ ! -d $RD ]; then
    MDIR=$RD;/bin/mkdir -p $MDIR
    if [ ! -d $RD ]; then
      echo "READONLY:$T"
    fi
  fi
  if [ -d $RD ]; then
    ADP=$RD
    if [ $AD != EMPTY ]; then
      ADP=$RD/$AD
      if [ ! -d $ADP ]; then
        (cd $RD;MDIR=$RD/$AD;/bin/mkdir -p $MDIR)
      fi
    fi
    if [ ! -d $ADP ]; then
      echo "READONLY:$T"
    elif [ -r "$ADP/$IF" ]; then
      IPF="$ADP/.installed/$PFN$T"
      if [ -f "$IPF" ]; then
        cat "$IPF"
      else
        echo "INSTALLED:$T"
      fi
    else
      PF="$ADP/$PFN$T"
      if [ ! -f $PF ] || expr `cat $PF` : '.*:('$HN')$' >/dev/null; then
        if /bin/echo "GETFROM:$HN">$PF; then
          echo "GETFROM:$HN:$T"
          cd $ADP
          RWD=`/bin/pwd`
          echo `df . | tail +2` | (read a b c d e f; echo "DF_DATA#$RWD#$a#$d:1k#$f")
        else
          echo "READONLY:$T"
        fi
      else
        echo `cat $PF`":$T"
      fi
    fi
  fi
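The script in Example 6-8 reports its findings as status lines (CREATE_FAILURE, READONLY, INSTALLED, GETFROM) and DF_DATA records. When you read the inspection output in the tivoli.cinstall, it can help to sort those lines by file group. The following Python sketch does that under stated assumptions: the function name and the sample output are hypothetical, the field layout is taken from the echo statements in the script, and the fourth df column is assumed to be the available space, which is true for many but not all df layouts.

```python
def parse_inspection(output):
    """Split inspection output into per-group status and disk records.

    Returns (status, disk):
      status maps a file group (LIB, BIN, ...) to (state, host or None)
      disk is a list of (directory, filesystem, df_column4_kb, mount_point)
    """
    status = {}
    disk = []
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("DF_DATA#"):
            # Layout from the script: DF_DATA#<dir>#<filesystem>#<kb>:1k#<mount>
            _, directory, filesystem, size, mount = line.split("#")
            disk.append((directory, filesystem, int(size.split(":")[0]), mount))
        elif ":" in line:
            fields = line.split(":")
            state, group = fields[0], fields[-1]
            host = fields[1] if len(fields) == 3 else None
            status[group] = (state, host)
    return status, disk


# Hypothetical inspection output; device name and block count are made up.
sample_output = (
    "GETFROM:crescent:LIB\n"
    "DF_DATA#/opt/Tivoli31/lib/sunos4#/dev/sd0g#123456:1k#/opt\n"
    "INSTALLED:BIN\n"
    "READONLY:MAN\n"
)
status, disk = parse_inspection(sample_output)
print(status)
print(disk)
```

In this reading, GETFROM marks a group that still has to be copied, INSTALLED marks a group whose tag file was found, and READONLY marks a directory that could not be created or written.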


Files already exist

In production environments, a failed install is often caused by the creation of the client database of the target. In this case, the installation has failed, but all of the binaries, libraries, message catalogs, and other Tivoli files were probably laid down successfully. The subsequent install (after cleaning up the target in the Tivoli database) then sees the tag files, such as bin and lib, that were laid down in the directories specified in the Install window. The install_engine uses these tag files to verify whether an application or the Tivoli Management Framework was installed. They are located in the .installed directory of each of the various Tivoli directories, such as $BINDIR and $DBDIR.

If the creation of a managed node fails, you can minimize the time required to recreate the managed node. If you are confident that all the files were installed successfully, you can reinstall and distribute only the client database portion. The install_engine reinstalls any group of files if a ! is specified at the end of the directory location. This overrides the inspection script that looks for tag files. Skipping the file groups is not recommended if other Tivoli applications were installed in addition to the Tivoli Management Framework. In that case, remove the entire install and attempt the install again.
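The interaction between the tag files and the trailing ! can be summarized as a small decision function. The Python sketch below is only an illustration of the rule described above: the function name is hypothetical, it checks a local directory rather than probing a remote target the way the install_engine does, and the tag path (<location>/<subdirectory>/.installed/<tag>) follows the layout used in Example 6-8.

```python
import os
import tempfile


def needs_install(location_field, group_subdir, tag_name):
    """Decide whether one file group would be (re)installed.

    location_field: directory as typed in the Install window; a trailing
                    "!" forces the reinstall.
    group_subdir:   subdirectory holding the group, for example an interp.
    tag_name:       tag file name, for example TMF_LIB.
    """
    forced = location_field.endswith("!")
    base = location_field.rstrip("!")
    tag = os.path.join(base, group_subdir, ".installed", tag_name)
    return forced or not os.path.isfile(tag)


# Demonstration against a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sunos4", ".installed"))
    open(os.path.join(root, "sunos4", ".installed", "TMF_LIB"), "w").close()
    tag_present = needs_install(root, "sunos4", "TMF_LIB")
    tag_present_forced = needs_install(root + "!", "sunos4", "TMF_LIB")
    tag_missing = needs_install(root, "sunos4", "TMF_BIN")

print(tag_present, tag_present_forced, tag_missing)
```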

Problems with the inspection script

One possible issue with the inspection script is a hang of the script or of the commands on the endpoint. In both cases, the install GUI shows a series of dots. If there is a problem on the endpoint, the install_engine must be killed, and closer inspection of the endpoint or network is needed.

Creation of the tmersrvd and Tivoli_Admin_Privileges accounts on Windows NT

When TRIP is installed, it lays down a file called ntconfig.exe in the c:\tivoli\trip directory. This program is responsible for creating the accounts and checking permissions on several directories. Because ntconfig.exe needs to create the tmersrvd account and the Tivoli_Admin_Privileges group, the Default Access Method Account must have the appropriate user rights and administrator privileges.


Primary Domain Controller and Backup Domain Controller

If the target to install is a Backup Domain Controller (BDC), ensure that the Tivoli Management Framework was installed on a Primary Domain Controller (PDC) first. The reason for this is that the shared SAM database in NT must be updated on the PDC before the accounts are workable. If the PDC is not to be a target, the first install on the BDC will fail. Synchronize the domain before the next installation; otherwise, create the following accounts on the PDC and propagate the account information:

• tmersrvd user account
  – Part of the Everyone group
  – Does not need to be part of any other groups
  – Password can be anything
  – Password never expires
  – User right to log on locally
• Tivoli_Admin_Privileges group account
  – Requires administrators to be included
  – User rights required:
    Replace a process level token
    Act as part of the operating system
    Increase quotas
• Everyone (existing Windows NT group)
  – User rights required:
    Bypass Traverse Checking

Note: If security policies have disabled this for Everyone, assign the Bypass Traverse Checking user right to the tmersrvd account.

Processing the returned data from the inspection

Lines 399-577 of tivoli.cinstall show the install_engine verifying that the disk space reported by the inspection is sufficient for the installation of the Tivoli Management Framework.

6.4.3 Installation of the Tivoli Management Framework files

If all is fine, you will see the installation window shown in Example 6-9.

Example 6-9 Installation window

Unless you cancel, the following operations will be executed:
For the machines in the independent class:
hosts(rosebud)
need to copy the machine independent Client Database to:
d:/Tiv31/rosebud.db
need to copy the machine independent Message Catalogs to:
d:/Tiv31/msg_cat
need to copy the machine independent X11 Resource Files to:
d:/Tiv31/app-defaults
need to copy the machine independent Generic Binaries to:
d:/Tiv31/bin/generic_unix
For the machines in the w32-ix86 class:
hosts(rosebud)
need to copy the architecture specific Libraries to:
d:/Tiv31/lib/w32-ix86
need to copy the architecture specific Binaries to:
d:/Tiv31/bin/w32-ix86
need to copy the architecture specific Man Pages to:
d:/Tiv31/man/w32-ix86
need to copy the architecture specific Public Domain Contrib to:
d:/Tiv31/bin/w32-ix86/contrib

6.4.4 Tivoli Remote Access Account (TRAA)

If the target is a Windows NT node, the administrator is prompted for the TRAA account. Tivoli Management Framework uses this account to access remote objects, such as remote shares. This account has no bearing on the installation, and can be set to None or to an account that will be used later for access to remote objects in Windows NT. If None was checked, and the target was previously a managed node, ensure that the previous install did not have a TRAA account set; otherwise, the new install will use that account:

[tap-confirm-Continue?]
Waiting for continuation confirmation...
Continuation confirmed with tap user: CRITSIT-LAB\tivuser

6.4.5 Installing the files

The Tivoli install has seven (nine for TMP3.2) groups of files that are installed for a managed node.

The first set is OS specific:

• Libraries
• Binaries
• Man pages
• Contrib directories

The next set is independent of the OS:

• Client database
• Message catalogs
• Generic binaries
• HTML and Java files (3.2 only)
• Tivoli Management Agent files (3.2 only)

These files are installed on the target by a process called sapack. This executable is stored on the TMR Server in $BINDIR/../client_bundle/bin. Each sapack is OS specific and is installed on the target using rexec. The process runs as the Default Access Method Account. On UNIX, the sapack process runs in /tmp. On Windows NT, the process runs from %DBDIR%\tmp.

The sapack is responsible for the installation of all the files on the endpoint. It does not control the creation of client databases. The files are installed directly to their target directory; there are no temporary staging areas during the install. The install_engine first installs the libraries, then the binaries (see Example 6-10).

Example 6-10 Installing the libraries and the binaries

[Distributing architecture specific Libraries for rosebud
]
configuration item unfound: type=w32-ix86, id=ClientBundle
[Lines 613-635 omitted]
Sending file /data/cdrom/tmp31/FILE7.PKT
Sending 16384 byte chunks to the stream.
...sent 1546 bytes to remote command
[.]
XE[.]
stdout read 15 bytes.
XE[.]
XE[.]
XE[.]
stdout read 17 bytes.
0 byte read of stdout
0 byte read of stderr
Command on host rosebud finished. real ms=4486/52019, cpu ms=0/0
[ completed.
[Lines 653-669 omitted]
]
In archPath -> d:/Tiv31/lib
[
Line input: PUSHED:rosebud:LIB
sent 19 bytes to command
stdout read 51 bytes.
0 byte read of stdout
stderr read 68 bytes.
0 byte read of stderr
output: PUSHED:rosebud:LIB
stderr: Are you sure (Y/N)?processed dir: d:\Tiv31\lib\w32ix86\.installed
Command on host rosebud finished. real ms=1630/53655, cpu ms=0/0
Done adjusting probe file.
doToDo: I'm going to rm, at cleanup time rosebud:d:/Tiv31/lib/w32-ix86/TMF_LIB

This occurs for each type of file.

Common issues

Here are some common issues:

• MDist issues

  The creation of managed nodes does not use MDist. However, any patch or product install uses the MDist settings. Be sure that the TMR Server is properly configured as a repeater.

• File package issues

  If there are problems with the file package, check whether the files were read from a local copy or from a CD. If they were local, there might be an issue with the indexing of the files or a corruption in a file. Try using the CD.

• Permissions on /dev/null

  Be sure that /dev/null on the target is readable and writable by everyone.

• Slow installation

  If you are using an HP TMR Server and installing to a Windows NT node, there is an issue in how the HP rexec negotiates with the rexec of the target. There is no known resolution at this time.


6.4.6 Client database creation

A managed node install creates the database on the target. The database is the third part of the install, after the libraries and binaries.

Data sent to the target node

The installation only lays down the $DBDIR/file_versions directory. The remainder of the database portion is the configuration scripts.

Creation of client database

The creation of the client database is divided into two parts. The first includes starting the oserv on the target, and the second is the creation of the objects in the database of the target to make the target a managed node. The HostLocation object is created with the following two steps:

1. Create a temporary HostLocation object by cloning the Base Prototype object (for example, the skeleton object) and add a clone_bpo method.
2. Create a real HostLocation instance using the temporary object as the HostLocation.

Initializing the managed node object

The managed node object for the dispatcher is created and initialized after the $BINDIR/TMF/BASESVCS/client.cfg script completes.

• The HostLocation attribute value of the Base object is obtained and used as the location for the new ManagedNode instance.
• The HostLocation attribute is set to the new ManagedNode instance.
• The TaskExecute object instance is created.
• The NTRepeat object instance is created (NT only).
• The Presentation object of the new managed node is created.
• The new managed node is added to the policy region.

Object creation

At this point, the following objects have been created:

• $TMR..0 = Base Object
• $TMR..1 = Skeleton Object
• $TMR..2 = Oserv
• $TMR..3 = Temporary Object (gets removed during install)
• $TMR..4 = Prototype Object
• $TMR..5 = Instance HostLocation
• $TMR..6 = Prototype
• $TMR..7 = ManagedNode Instance
• $TMR..8 = TaskExecute Prototype
• $TMR..9 = TaskExecute Instance
• $TMR..10 = NTRepeat Prototype
• $TMR..11 = NTRepeat Instance

Initializing the managed node

At this point, the $BINDIR/TAS/INSTALL/client.cfg script completes and initializes the rest of the managed node objects and attributes:

• $TMR..12 = FileIO Prototype
• $TMR..13 = FileIO Instance
• $TMR..14 = BackupClient Prototype
• $TMR..15 = BackupClient Instance
• $TMR..16 = Httpd Prototype
• $TMR..17 = Httpd Instance
• $TMR..18 = DesktopList Prototype
• $TMR..19 = DesktopList Instance

Starting the oserv for the first time

The start of the oserv on the target is managed with the following command:

$BINDIR/TAS/INSTALL/install2.cfg

This command performs the following tasks:

• The command checks for any dispatchers on port 94.
• It determines the database directory.
• On a Windows NT node, install2.cfg creates the oserv service using the oinstall command.
• On a Windows NT node, it copies the $BINDIR/bin/TivoliAP.dll file to %SYSTEMROOT%/SYSTEM32/TivoliAP.dll.
• On a Windows NT node, the command checks to see if TAP is available. If not, it starts the oserv with the -u flag.
• If the host name does not match the label that Tivoli assigns to the target, it creates the /etc/wlocalhost file (UNIX) or the HKEY_LOCAL_MACHINE\SOFTWARE\Tivoli\Platform\wlocalhost registry key (Windows NT) that will contain the label by which Tivoli can identify the machine.
• The command starts the oserv and attempts to contact the TMR Server. It uses the following switches:

  -i  Initializes the client database. Never issue this command yourself.
  -h  TMR Server name
  -k  The database directory
  -b  Binary directory
  -l  Library directory
  -u  Used by Windows NT to bypass TAP for the initial boot. This is not something you normally want to do, but it is done in the initial install and can be used to test for TAP-related issues.

At this point, the oserv is running as root (UNIX) or SYSTEM (Windows NT). The Default Access Method Account is not used for the creation of the database.

6.4.7 Problems starting the oserv and contacting the TMR

The following are some of the common issues that involve starting the oserv for the first time. The client's $DBDIR/oservlog provides some helpful information for understanding the problem. On Windows NT, error 1067 is a generic error message and provides no information concerning the cause of the problem. The oservlog, however, can provide better detail on the failure. Also, the install2.cfg.error and install2.cfg.output files can provide insight into the failure. The files are located in %DBDIR%\tmp on Windows NT, and in the directory returned by the wtemp command on the target node (be sure to source the Tivoli environment first).

• odlist init Failed. Host Inacessible or Host Failure (48)

  The failure is due to the inability of the oserv to resolve back to the TMR Server. Verify that the target is able to resolve both the short and the long name of the TMR Server. If the label of your TMR Server is a short name, and the domains searched by the target's DNS configuration are in doubt, you can add the following variable to the oserv environment to force the oserv to use a fully qualified name:

  odadmin environ get > /tmp/file

  Edit the file and add the following line:

  ALI_HOST_NAME=

  Then load the environment back:

  odadmin environ set < /tmp/file

• Could not encrypt or decrypt data 43

  The installation password specified in the Client Install window is wrong. You can change this password with the odadmin set_install_pw command or by starting the TMR Server using the following command:

  oserv -k %DBDIR% -s

• odlist init Failed. Named Resource Already Exists 26

  There is already an entry in the odlist that shares the same IP address or host name as the target. Type odadmin odlist and search for the IP address or host name that already exists.

• LogonUser Failed: Logon Failure: the User Has Not Been Granted the Requested Logon Type

  The TRAA account used was invalid, the password was incorrect, or the user did not have the Log on locally user right.

• Bad Magic Number

  The installation password specified in the Client Install window is wrong.
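The symptoms listed above lend themselves to a simple lookup when you scan an oservlog or the install2.cfg output files. The following Python fragment is a sketch under stated assumptions: the function name is hypothetical, the message fragments and causes are copied from the list above (including the spelling of the first message as printed), and matching is a plain case-insensitive substring search rather than anything the Tivoli tools provide.

```python
# (message fragment, likely cause) pairs taken from the list above.
OSERV_START_HINTS = [
    ("Host Inacessible or Host Failure",
     "The client oserv cannot resolve the TMR Server; check short and long names."),
    ("Could not encrypt or decrypt data",
     "The installation password given in the Client Install window is wrong."),
    ("Named Resource Already Exists",
     "The odlist already has an entry with the same IP address or host name."),
    ("LogonUser Failed",
     "The TRAA account is invalid, its password is wrong, or it lacks Log on locally."),
    ("Bad Magic Number",
     "The installation password given in the Client Install window is wrong."),
]


def diagnose(log_text):
    """Return the causes whose message fragment occurs in log_text."""
    lowered = log_text.lower()
    return [cause for fragment, cause in OSERV_START_HINTS
            if fragment.lower() in lowered]


hints = diagnose("odlist init Failed. Named Resource Already Exists 26")
print(hints)
```

An empty result only means that none of the listed symptoms was found; the oservlog should still be read in full.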

6.4.8 Configuring the client database (the TMR Server perspective)

The tivoli.cinstall shows the output in Example 6-11 when the client oserv is started.

Example 6-11 tivoli.cinstall-client oserv started

0 byte read of stdout
0 byte read of stderr
stderr: messages:
temp script: nt_before: d:/Tiv31/rosebud.db/tmp\nt_b4.cfg
temp script: nt_after: d:/Tiv31/rosebud.db/tmp\nt_a5.cfg
starting script: d:/Tiv31/rosebud.db/tmp\nt_b4.cfg
script complete: exit code=0
starting script: d:/Tiv31/rosebud.db/tmp\nt_a5.cfg
script stderr:
[ Client bootstrap completed successfully.
Hey Install: initialize rosebud,reboot
]
script complete: exit code=0
Command on host rosebud finished. real ms=20858/139246, cpu ms=0/2000
client will need to be rebooted
[ Client connected, configuring...]


The target node now has an entry in the odlist and has three objects associated with the entry. The server starts the second phase of the client database install, as shown in Example 6-12.

Example 6-12 tivoli.cinstall-client database install

Bootstrap!rosebud
handleBootstrap(rosebud,1423501536.4.0,w32-ix86)
Substituted 1 instances of @PROS@
Substituted 1 instances of @ClientAddNoTrans@
Client add will not be done in a transaction.
Executing client_configure command real ms=12/139258, cpu ms=0/2000
1423501536.4.0 rosebud 1423501536.1.196#TMF_PolicyRegion::GUI# w32-ix86

The node now has a dispatch number and the label is being defined in the policy region you selected when starting the install. These arguments are passed to the client_configure method.

6.4.9 Configuring the client database (the client perspective)

When the oserv on the client has started, it begins creating the objects that it needs to be a managed node. This is managed by the $BINDIR/TAS/INSTALL/client.cfg process as follows:

• Initializes the oserv object
• Extends the base and oserv objects
• Creates the HostLocation object
• Adds the target to the policy region
• Creates the ManagedNode object
• Creates the TaskExecute object
• Creates the NTRepeat object (Windows NT nodes only)
• Checks to see if CurrentNtRepeat is set and, if not, makes this Windows NT node the CurrentNtRepeat node (Windows NT nodes only)
• Sets up the FileIO object
• Creates 10 desktop objects on the ManagedNode (not TMP3.2)
• Creates Extd_DesktopList on the ManagedNode

The client.cfg process runs as root (UNIX) or the built-in administrator (Windows NT). The next process used is bo_skel1, and this process runs as nobody (UNIX) or tmersrvd (Windows NT). A successful install in the tivoli.cinstall shows the output in Example 6-13.


Example 6-13 Successful install

Finished executing configure command real ms=93209/232467, cpu ms=0/2000
command returned status of 0x0
stdout: Client installation completed successfully.
[Client installation completed successfully.
]
[ completed.

6.4.10 Verifying a properly installed managed node

To verify whether the managed node is properly installed and configured, do a Get Properties from the Tivoli Desktop. This confirms whether the IOM channel can be created successfully.

6.4.11 Problems creating the managed node

The main source of problems is the permissions on the %SYSTEMROOT% directory. In many environments, global access to %SYSTEMROOT% is removed to prevent non-administrators from seeing files in these directories. The problem is seen when bo_skel1 attempts to start and is unable to read the msvcrt40.dll in %SYSTEMROOT%\system32. This should not be an issue unless the Windows NT administrator has also modified the user rights so that Everyone no longer has Bypass Traverse Checking. A complete list of error files is shown in Table 6-2.

Table 6-2 Error files

File                                      UNIX location   Windows NT/2000 location   Found on
tivoli.cinstall                           /tmp            %dbdir%\tmp                Server
oservlog                                  $DBDIR          %dbdir%                    Client, server
install2.cfg.error, install2.cfg.output   /tmp            %dbdir%\tmp                Client
client.cfg.error                          /tmp            %dbdir%\tmp                Client
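When collecting evidence for a failed install, the locations in Table 6-2 can be turned into a checklist of paths per machine. The Python fragment below is a minimal sketch: the function name and the "unix"/"nt" and "client"/"server" labels are hypothetical, the directories are those given in the table, and $DBDIR and %DBDIR% are left unexpanded because their values depend on the installation.

```python
# (file name, UNIX directory, Windows NT/2000 directory, where it is found)
LOG_FILES = [
    ("tivoli.cinstall", "/tmp", r"%DBDIR%\tmp", {"server"}),
    ("oservlog", "$DBDIR", "%DBDIR%", {"client", "server"}),
    ("install2.cfg.error", "/tmp", r"%DBDIR%\tmp", {"client"}),
    ("install2.cfg.output", "/tmp", r"%DBDIR%\tmp", {"client"}),
    ("client.cfg.error", "/tmp", r"%DBDIR%\tmp", {"client"}),
]


def logs_to_check(role, platform):
    """List the log paths for a role ("client" or "server") and a platform ("unix" or "nt")."""
    paths = []
    for name, unix_dir, nt_dir, roles in LOG_FILES:
        if role not in roles:
            continue
        if platform == "unix":
            paths.append(unix_dir + "/" + name)
        else:
            paths.append(nt_dir + "\\" + name)
    return paths


print(logs_to_check("server", "unix"))
print(logs_to_check("client", "nt"))
```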

6.5 Finishing the install

After the creation of the database of the managed node, several steps still remain for installation.

6.5.1 Files installed

The remaining files, such as the generic binaries and the man pages, are installed by the sapack executable, which continues to run as the Default Access Method Account.

6.5.2 Updating Name Registry

The install begins to update the server database after the files are completely installed on the managed node, as shown in Example 6-14.

Example 6-14 Updating Name Registry

reading in name registry info
Platform/Product/Patch map:
writing alias info into name registry.
Before Command real ms=2/284370, cpu ms=0/2000
[registered.
]
[ Finished client install

6.5.3 Completion

If the newly created managed node is a Windows NT system, it requires a reboot for the managed node to be fully operational.

Note: You can continue to use the Windows NT managed node before you reboot. However, attempting to restart the oserv on the Windows NT node before the reboot will fail. The reason for this is that the TivoliAP.dll must be loaded by the Local Security Authority (LSA) subsystem on Windows NT for the oserv to properly spawn processes.


6.6 Common errors for server and client installs

Check for the following possibilities common to most types of install:

• Error e=5 - permissions

  The most common cause of this error is that you have created the Tivoli install directories manually, and some or all of the directory permissions in the path are incorrect. The error comes from the underlying operating system. System error code 5 is usually an access denied message, such as could not run a method as nobody.

• Error e=1 - wrong usage

  A system call was incorrectly initiated, or an invalid call was made. Check the tivoli.cinstall file for messages.

• Error e=9 (or 6 on HP) - library path problem

  The exit errors (e=) are usually errors Tivoli received from the system or some other application. You may be able to use system documentation to obtain more information about the cause of the errors. In Windows NT, try NET HELPMSG n, where n is the number following e=. In many cases, the error can reflect a problem that happened on another system. When this happens, the system error you receive locally may not be as meaningful.

• Directory permission

  For example, in UNIX, /dev/null, /usr, /usr/local, and /usr/local/Tivoli (if /usr/local or /usr/local/Tivoli already exist) must all be rwxr-xr-x. Other products may change the permissions, as Oracle does for /dev/null.

• Lack of space

  Check space and permissions on $DBDIR and /tmp for UNIX and %DBDIR% for Windows NT.
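The e= codes above can be pulled out of a log line and paired with the hints from this list. The following Python fragment is a sketch only: the function name and the sample message are hypothetical, the hints cover just the codes named in this section, and any other code is referred back to the operating system documentation, as the text suggests.

```python
import re

# Hints for the exit codes named in this section.
EXIT_HINTS = {
    1: "Wrong usage: a system call was incorrectly initiated or invalid.",
    5: "Permissions: check the permissions of every directory in the install path.",
    6: "Library path problem (the code reported on HP).",
    9: "Library path problem.",
}


def explain_exit(line):
    """Return (code, hint) for the first e=<n> found in line, or None."""
    match = re.search(r"\be=(\d+)", line)
    if match is None:
        return None
    code = int(match.group(1))
    hint = EXIT_HINTS.get(
        code,
        "Not listed here; consult the system documentation "
        "(on Windows NT, try NET HELPMSG with this number).",
    )
    return code, hint


print(explain_exit("method failed, e=5: could not run a method as nobody"))
```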

6.7 Reinstalling clients

Any partially installed clients must be removed before they can be reinstalled.

To remove Tivoli from a Windows NT client

To remove Tivoli from a Windows NT client, perform the following steps from the Windows NT client:

1. Remove the oserv service from the Windows NT service manager:

   oinstall -remove

2. Remove TRIP from the Windows NT service manager:

   trip -remove

3. Remove the TAP internal key and unregister the TivoliAP.dll with the local security authority:

   wsettap -d

   If you are going to reinstall a client using a different installation password, also remove the TivoliAP.dll from the %SystemRoot%\system32 directory, because Tivoli will not overwrite an existing file.

4. Remove the Windows NT client code from the TMR Server (see the next section).

To remove a partially or fully-installed client from the TMR

To remove a partially or fully-installed client from the TMR, perform the following steps:

1. Determine where, or if, a client is installed with one of the following commands:

   wlookup -ar clientname
   wls /Library/clientname
   odadmin odlist

2. Make sure the oserv is not running on the client.

3. Perform one or more of these steps in order from the server until the client is successfully removed:

   – wrmnode clientname

     Removes the specified client from the Tivoli database.

   – wrmnode clientname -d dispatcher-number

     Shuts down the dispatcher of the specified managed node and removes it from the Tivoli database. The dispatcher number can be obtained with the odadmin odlist command.

   – odadmin odlist objects dispatcher-number

     Displays the object IDs of the objects owned by the dispatcher. If there are fewer than three objects, run the following command to remove the dispatcher and its objects from the TMR. References to the objects will still remain.

   – odadmin odlist rm_od dispatcher-number

     Removes the node.

4. Run wchkdb -u to update the Tivoli resource database.
5. Remove the client's database directory.
6. The client can now be reinstalled.
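The escalation in step 3, followed by the database check in step 4, can be expressed as a short driver. The Python sketch below is illustrative only: the function and parameter names are hypothetical, the command lines are those listed above, and the commands are not executed directly. Instead, the caller supplies a run function, a check that reports whether the client is gone, and a function that returns the number of objects still owned by the dispatcher, so that the logic can be followed (and tested) without a TMR.

```python
def remove_client(client, dispatcher, run, is_gone, object_count):
    """Try the removal steps in order, stopping once the client is gone.

    run(cmd) executes one command given as a list of strings;
    is_gone() reports whether the client has been removed;
    object_count() returns how many objects the dispatcher still owns.
    """
    steps = [
        ["wrmnode", client],
        ["wrmnode", client, "-d", str(dispatcher)],
    ]
    for cmd in steps:
        run(cmd)
        if is_gone():
            break
    else:
        # Both wrmnode attempts failed; rm_od applies only to a dispatcher
        # that owns fewer than three objects.
        if object_count() < 3:
            run(["odadmin", "odlist", "rm_od", str(dispatcher)])
    # Step 4: update the Tivoli resource database.
    run(["wchkdb", "-u"])


# Dry run with stand-ins: the client disappears after the second attempt.
calls = []
attempts = {"count": 0}


def fake_run(cmd):
    calls.append(cmd)


def gone_after_second():
    attempts["count"] += 1
    return attempts["count"] >= 2


remove_client("rosebud", 4, fake_run, gone_after_second, lambda: 2)
print(calls)
```

In practice, run could wrap subprocess.run and is_gone could inspect the output of odadmin odlist; both are left to the caller here because their details depend on the environment.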


Chapter 7. Software Installation Service (SIS)

The Tivoli Software Installation Service (SIS) is an application designed for faster and easier installation of Tivoli products in a Tivoli Management Region (TMR). SIS can push products to Tivoli clients and is intended to provide increased functionality over the standard Tivoli installation process used in previous releases of the Tivoli Management Framework. Using SIS, you can create an Install Repository (IR) that contains the installation images of the products, determine a product configuration for some or all of the machines in your Tivoli Management Region, and install that configuration on the machines you choose. This chapter covers SIS internals and troubleshooting.

The following topics are discussed in this chapter:

• Section 7.1, "SIS component overview"
• Section 7.2, "SIS considerations"
• Section 7.3, "Using SIS"
• Section 7.4, "Troubleshooting SIS"


7.1 SIS component overview

This section gives a high-level overview of the concepts of SIS. There are three components to SIS:

• The Tivoli Software Installation Service binaries

  The Tivoli SIS server is any managed node (including the TMR Server) that has the SIS binaries installed on it. Using either the graphical user interface (GUI) or the command line interface (CLI), you can invoke the Tivoli Software Installation Service that runs on the SIS server.

• The Install Repository

  SIS introduces the concept of an Install Repository (IR). The IR holds the images of all the products or patches that are to be installed using SIS. Products are imported into the IR either through the GUI or through the CLI. You control which products are to be installed on which targets. During the installation of SIS, you specify the location of the IR. You may find it useful to set a variable, $IR. The Install Repository location is referenced in scripts and commands in this chapter by the variable $IR. Note that this variable is not defined or used by SIS itself.

• A Response File

  Response Files are text files that contain product and machine attributes that are required by SIS for an installation, such as:

  – Product install directory paths
  – Machine name
  – Machine access methods
  – Operating system type
  – Password settings
  – Login account information

  You can define specific attributes for a product and machine, or use a global set of attributes for all machines, and pass these values to SIS in a Response File.

Figure 7-1 shows the high-level design of SIS.


Figure 7-1 SIS high-level design (diagram elements: SIS GUI or CLI, Response File, SIS Installation Engine, Dispatch Engine, Install Repository, Disk Store, Managed Nodes and Machines)

The steps for implementing SIS are:

1. Build the IR by importing the product images you need into the Install Repository, using either the SIS GUI or the CLI.
2. Once the IR has been built, configure which products you want to install on which nodes. This is done through the SIS GUI or through a Response File.
3. Invoke the SIS Installation Engine, from either the GUI or the CLI, to dispatch and install products to the machines you specified.

7.2 SIS considerations

You should note the following about SIS:

• SIS does not support installing to nodes located in interconnected TMRs. However, to conserve disk space, the Install Repository can be shared between different TMRs.
• You cannot use SIS to install any products or patches on an endpoint. However, you can create an endpoint itself using SIS on Windows NT or UNIX. This is discussed in the next item.
• SIS cannot create endpoints on any other machines unless the PC agent is running on them. This means that if you want to create a Windows 95, Windows 98, Windows 3.x, OS/2, or NetWare endpoint using SIS, the machine must already be a PC managed node.

Note: For Windows NT/2000 systems:
• You must run the bash command shell for the SIS command line commands.
• The correct directory for all SIS command line programs is: $BINDIR/../generic_unix/SIS

7.3 Using SIS

This section provides details on the components you will work with to use the Software Installation Service.

7.3.1 Starting the SIS Graphical User Interface

You can start SIS from the Tivoli Desktop or the command line. To start from the desktop, perform the following steps:

1. From the Tivoli Desktop menu, select Desktop -> Install -> Software Installation Service... to launch the SIS desktop.
2. When you are prompted for the Tivoli installation password by the Get Installation password dialog, enter the Tivoli Management Framework installation password, if you specified one, and select OK. If you do not have an installation password, select OK.

At this point, the SIS desktop, as shown in Figure 7-2 on page 189, should be displayed.


Figure 7-2 SIS desktop dialog

The desktop start method is the supported method for starting the SIS GUI. If it is necessary to start SIS from the command line (for example, if directed to do so by Tivoli support), enter the following in a UNIX environment:

  $BINDIR/../generic_unix/SIS/wsisgui

In a Windows NT environment, enter:

  sh %BINDIR%/../generic_unix/SIS/wsisgui

From the SIS dialog, you can choose:

Install

Starts up the software installation procedure.

Synchronize with TMR

Synchronizes the IR with the TMR Server database and updates product and managed node installation information in the IR.

Quit

Closes Tivoli Installation Service and removes TMR, IR, and usage locks.

About

View SIS product information.

View Logs

View the HTML log files generated by SIS.

The following occurs when SIS starts:

1. SIS writes to the $IR/sis-.out file for both shared and non-shared IRs.


2. SIS creates the necessary locks. There are four locks created during SIS startup:

TMR lock

When SIS starts, it creates the TMR lock in the object database using wregister -i SIS. The TMR lock prevents other users within the TMR from using SIS at the same time; that is, it prevents two users from attempting to distribute to the same machine at the same time. Instead, the second user sees a warning message that describes the TMR lock and displays the host name where SIS is being executed within the TMR.

IR lock

The IR lock prevents multiple users in different TMRs from using the IR directory in write mode concurrently. A warning message is displayed to all users after the initial launch of SIS. This warning lists the host name and region name of the machine running the initial SIS. If the IR is used in shared mode, the user has the option of continuing in read-only mode. If the IR is used in read-only mode, the IR lock is not created. The IR lock is created as an ir.lck file when SIS starts.

Usage lock

The IR can be shared between different TMRs at the same time. The first node that starts SIS has the write authority. Other nodes in different TMRs can still use the IR and will create the usage lock when they start SIS. The usage lock file is $IR/TMR/Defaults/ULOCK/.lck.

CLOSEDIR.lck

Used by wimport -remove in the ULOCK directory.

3. SIS reads the products in the IR. SIS first checks for the existence of the $IR/TMR/Defaults/miniprod.sav (non-shared IR) or $IR/TMR/ $DBDIR/ir.loc
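The file-based locks described above can be inspected from the shell when SIS refuses to start or offers only read-only mode. The following is a minimal sketch, not part of SIS: the function name list_ir_locks is hypothetical, and it assumes that every lock file sits somewhere under the Install Repository directory, whereas the text above gives an explicit path only for the usage lock. The TMR lock lives in the object database and is not covered.

```shell
#!/bin/sh
# Hypothetical helper: list every *.lck file under the Install Repository
# directory passed as $1 (for example, the directory you refer to as $IR).
# Assumption: all file-based SIS locks are kept below that directory.
list_ir_locks() {
    find "$1" -type f -name '*.lck' | sort
}

# Example (commented out so the sketch has no side effects):
# list_ir_locks "$IR"
```

The listing is read-only by design. Whether a leftover lock file is safe to remove depends on whether another SIS session is still running, which this sketch does not check.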

IR directory

All of the IR files are located in the directory you specified for IR during SIS installation. $BINDIR/../generic_unix/SIS/ contains the following files:

BUILD.TXT


This file contains the build version of SIS. It can be used to determine what version of SIS you are using.


FindIR

This script is used to determine the location of the Install Repository.

PointIR

This script is used to update the location of the Install Repository after the first invocation of SIS.

TMRSync.sh

This script is used to synchronize SIS with TMR.

launch_sis

This script is called by the desktop to launch SIS.
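When SIS fails to launch, a quick first check is whether the files listed above are present. The sketch below is an illustration only: the function name missing_sis_files is hypothetical, and the file list is taken from the descriptions above, so it may not be exhaustive for every SIS release.

```shell
#!/bin/sh
# Hypothetical helper: print the name of each SIS file described above
# that is absent from the directory passed as $1.
missing_sis_files() {
    for f in BUILD.TXT FindIR PointIR TMRSync.sh launch_sis; do
        [ -e "$1/$f" ] || printf '%s\n' "$f"
    done
}

# Example (commented out; requires a Tivoli environment):
# missing_sis_files "$BINDIR/../generic_unix/SIS"
# cat "$BINDIR/../generic_unix/SIS/BUILD.TXT"   # shows the SIS build version
```

An empty result means all five files were found. Reading BUILD.TXT afterwards gives the build version to quote when contacting support.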


Chapter 8. ISMP based installation (Integrated Installation)

This chapter provides an overview of the Integrated Installation (also called the local installation method) that was first introduced by IBM Tivoli Configuration Manager 4.2. Integrated Installation is the first application of the InstallShield Multi-Platform (ISMP) technology for installing Tivoli products. In the future, all Tivoli products are expected to conform to the ISMP technology, which is part of Tivoli’s Install Imperative and IBM’s install strategy.

In this chapter, we will show you how to install IBM Tivoli Configuration Manager components using Integrated Installation and then cover Integrated Installation troubleshooting.

The following topics will be covered in this chapter:
• Section 8.1, “Overview of Integrated Install”
• Section 8.2, “Server Install”
• Section 8.3, “Troubleshooting Server Install”
• Section 8.4, “Desktop Install”
• Section 8.5, “Troubleshooting Desktop Install”
• Section 8.6, “Web Gateway Install”
• Section 8.7, “Troubleshooting Web Gateway Install”


8.1 Overview of Integrated Install

InstallShield Multi-Platform (ISMP) is part of Tivoli’s Install Imperative and IBM’s install strategy, which is to achieve two major goals:
• Consistent install
• Simplified maintenance

The first principle helps achieve Tivoli’s goals by providing the customer with a similar installation experience for each Tivoli product. The second principle allows customers to apply maintenance (upgrades) to Tivoli products in a consistent and simplified way.

IBM Tivoli Configuration Manager 4.2 provides multiple scenarios in which Integrated Install is used to help you install or upgrade Tivoli Management Environments. For IBM Tivoli Configuration Manager 4.2, ISMP is used in the following scenarios:

1. Server Install (or Integrated Server Install)
2. Upgrade Plan Generator
3. Desktop Install (or Integrated Desktop Install)
4. Web Gateway Install (or Integrated Web Gateway Install)

This chapter focuses on fresh server install. We shall also cover the Integrated Endpoint and Desktop Installs.

Important: Since we used a version of IBM Tivoli Configuration Manager before it was generally available, the installation panels might differ slightly from the ones that you will see in the final version of the product.

8.2 Server Install

There are several ways to install IBM Tivoli Configuration Manager 4.2. In this chapter, we use the new ISMP methods of install. The Server Install scenario starts with CD 5 and should be used if:
• Tivoli Management Framework is not installed
• Tivoli Management Framework 4.1 is installed, but has a subset of Configuration Manager 4.2 applications

If the current installation does not meet one of these conditions, the installation program stops the installation.


There are two types of installation for the Server Install:
• Typical
• Custom

Attention: Integrated Install is a single node installation. All Tivoli components will be installed on a single TMR Server. This mode of installation is best suited to a simple environment for device management or other simple configurations, a testing or demonstration environment, or a training environment.

Most production deployments are complex Tivoli environments that require different components to be installed and configured on different systems. A typical complex environment also manages hundreds to thousands of endpoints. For a complex environment, use either the installation program provided by IBM Tivoli Configuration Manager or the installation mechanisms provided by Tivoli Management Framework or Tivoli Software Installation Service. When using the installation program provided by IBM Tivoli Configuration Manager, you might need to uninstall components and services of IBM Tivoli Configuration Manager and move RIM objects to other managed nodes after these managed nodes are created. Also note that when installing high availability environments (like HACMP), it is recommended to use the classical install methods rather than Integrated Install.

8.2.1 Authorization roles

To install IBM Tivoli Configuration Manager Version 4.2, the user must have the following authorization roles:
• Root access in a UNIX operating system.
• Member of the Administrators group on a Windows operating system.

8.2.2 Database requirements

RDBMS software needs to be installed and configured prior to installing IBM Tivoli Configuration Manager. The TMR Server can be the RDBMS server or client. The communication between the two needs to be functional before doing the install. Depending on the type of RDBMS server being used, follow the steps provided in Chapter 5, “Working with repositories and queries”, in the IBM Tivoli Configuration Manager Planning and Installation Guide Version 4.2, GC23-4702.

8.2.3 Starting the installation programs

Before starting the installation program, read the information about the installation you are planning to perform. The general procedure for starting the installation programs is shown in the next sections.

UNIX

From the /FRESH subdirectory of the IBM Tivoli Configuration Manager Installation CD 5, enter one of the following commands:

• If you do not have a Java Virtual Machine Version 1.3.1 on the system and you want to download this software to the /tmp directory, enter:

  ./file.bin

  Where file is the name of the file that starts the installation program. There is a different installation program for each UNIX operating system; for example, for IBM AIX, the installation program file is setup_aix.bin.

• If you want to download the Java Virtual Machine to a directory other than the /tmp directory, enter:

  ./file.bin -is:tempdir directory

  Where directory is the directory where you download the Java Virtual Machine.

  Note: You need at least 50 MB of free space in your tempdir directory.

• If you have a Java Virtual Machine on the system and do not want to use the Java provided by Tivoli, enter:

  java -Dis.external.home=path -jar setup.jar

  Where path is the path to the setup.jar file, which is located on the installation CD under the /FRESH subdirectory.

  Note: If the correct version of Java is not installed, the following message appears at the beginning of the install:

  #java -Dis.external.home=/img/cd5/FRESH -jar setup.jar
  -jar : illegal argument
  Usage: java [-options] class
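The choice between the bundled installer and an existing Java Virtual Machine can be made before launching anything, which avoids the "illegal argument" failure quoted above. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the banner format java version "x.y.z" is assumed for the first line of java -version output, and only a version string that starts with 1.3.1 is treated as suitable because the text names that version and no other.

```shell
#!/bin/sh
# Hypothetical helper: read `java -version` output on standard input and
# print the quoted version from its first line, e.g. java version "1.3.1".
java_version_from_banner() {
    sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}

# Hypothetical helper: succeed only when the version string in $1 starts
# with 1.3.1, the JVM level named in the text above.
is_java_131() {
    case "$1" in
        1.3.1*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example on AIX (commented out; paths are the ones quoted in the text):
# ver=$(java -version 2>&1 | java_version_from_banner)
# if is_java_131 "$ver"; then
#     java -Dis.external.home=/img/cd5/FRESH -jar setup.jar
# else
#     ./setup_aix.bin
# fi
```

Note that java -version writes to standard error, which is why the example redirects it before piping.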


Windows

From the /FRESH subdirectory of the IBM Tivoli Configuration Manager Installation CD 5, run the Setup.exe file.

8.2.4 Server Install: Behind the scenes

The Java front end wraps the CLI commands used to install IBM Tivoli Configuration Manager Version 4.2 products and patches. Here are the steps that are performed by the Integrated Server Install and the entries in the cmismp.log.

1. The database connection is verified by referring to information provided on the RIM windows:

  (Jul 18, 2002 3:00:04 PM), Setup.product.install, com.tivoli.cmismp.util.DB2Database, dbg, DB2 CLI to execute: [. "/home/db2data/sqllib/sqllib/db2profile", db2 -t +p]
  (Jul 18, 2002 3:00:05 PM), Setup.product.install, com.tivoli.cmismp.util.DB2Database, dbg, DB2 CLI exit value: 0
  (Jul 18, 2002 3:00:05 PM), Setup.product.install, com.tivoli.cmismp.util.DB2Database, dbg, DB2 CLI stdout:
  Database Connection Information
  Database server = DB2/6000 7.2.0
  SQL authorization ID = DB2DATA
  Local database alias = MDIST2
  DB20000I The QUIT command completed successfully.

2. The same check is made for PLANNER, INV, and CCM, depending on the type of installation.

3. The system checks to see if Tivoli Management Framework exists on the box.

4. Tivoli Management Framework 4.1 is installed by using the wserver command on a UNIX box or by using the Silent Install on the Windows box:

  cmd:./wserver -c /img/new1 LK=IBMTIVOLIMANAGEMENTREGIONLICENSEKEY41 ALIDB=/usr/local/Tivoli/db LIB=/usr/local/Tivoli/bin/../lib BIN=/usr/local/Tivoli/bin MAN=/usr/local/Tivoli/bin/../man APPD=/usr/local/Tivoli/bin/../appd CAT=/usr/local/Tivoli/bin/../msg_cat CreatePaths=1

5. The gateway is created on the server by using the wcrtgate command:

  (Jul 18, 2002 3:19:30 PM), Setup.product.install, com.tivoli.cmismp.services.TMEService, dbg, CLI command to execute: [. "/etc/Tivoli/setup_env.sh" || source "/etc/Tivoli/setup_env.csh", echo abcdefghijk, wcrtgate -h aix-inv01b -p 9494 -n aix-inv01b-gw]

6. Using winstall, the following products are installed:
   a. Java 1.3 for Tivoli
   b. Tivoli Java Client Framework 4.1
   c. JavaHelp 1.0 for Tivoli 4.1
   d. Tivoli Java RDBMS Interface Module (JRIM) 4.1
   e. Distribution Status Console, Version 4.1

   The command syntax is:

   winstall -c "/img/new2/JAVA" -i JRE130.IND -y -e BIN=! LIB=! MAN=! CAT=!

7. The mdist2 RIM object is created using the wcrtrim command.
8. On the Windows operating system, Tivoli Desktop for Windows Version 4.1 is installed next by using the Silent Install. The dswin.log file is created.
9. Web Interface Version 4.2 is installed next by using winstall.
10. Scalable Collection Service Version 4.1 is installed by using wpatch.
11. Inventory Version 4.2 is installed by using winstall.
12. The invdh_1 and inv_query RIM objects are created, followed by the admin and schema scripts for Inventory.
13. The Query Libraries are created.
14. The following products are installed by using winstall:
   a. Inventory Gateway Version 4.2
   b. Resource Manager
   c. Resource Manager - Gateway Component
   d. Activity Planner Version 4.2 (runs SQL scripts for Activity Planner)
   e. Change Manager Version 4.2 (runs SQL scripts for Change Manager)
   f. Software Distribution Version 4.2
   g. Software Distribution Gateway Version 4.2
   h. Software Distribution Software Package Editor Version 4.2
   i. IBM Tivoli Directory Query Version 4.2
15. The following plug-ins are installed and updated:
   a. APM plug-in for Inventory
   b. CM plug-in for Inventory
   c. Update Updating plug-in for Task Library
   d. Software Distribution
   e. Inventory (hardware conditions)
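The cmismp.log extracts quoted in the steps above record an exit value for each CLI call, so a failed step can often be located by filtering for non-zero values. The following is a minimal sketch, assuming each log entry occupies its own line and ends with the exit value, as in the "DB2 CLI exit value: 0" extract. The function name failed_cli_entries is hypothetical, and the log path is passed in because its location is not stated here.

```shell
#!/bin/sh
# Hypothetical helper: print the entries of the log file given as $1 that
# record a CLI exit value other than 0.
# Assumption: one entry per line, with the exit value at the end of the line.
failed_cli_entries() {
    grep 'exit value: ' "$1" | grep -v 'exit value: 0[[:space:]]*$'
}

# Example (commented out; supply the actual path of your cmismp.log):
# failed_cli_entries /path/to/cmismp.log
```

The timestamp at the start of each reported entry can then be used to find the surrounding "CLI to execute" and stdout entries for the same step.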


For detailed information regarding the above commands, please refer to Tivoli Management Framework Reference Manual Version 4.1, SC32-0806. In the next two sections, we shall explore the Typical and Custom Integrated Server installs.

8.2.5 Typical Install

When using this installation, the following components are installed, configured, or created:

• Tivoli Management Framework.
• To support a single server installation, a Tivoli gateway is also created on this machine. This gateway is automatically configured as a repeater. The installation of Tivoli Management Framework creates the Tivoli Server.
• Resource Manager and Resource Manager Gateway.
• Enterprise Directory Query Facility.
• Scalable Collection Service.

  Scalable Collection Service is considered part of Tivoli Management Framework, and it is used to collect inventory scan results.

• Distribution Status console.

  The Distribution Status console tracks software distributions and other profile distributions. The installation of the Distribution Status console requires the following Java components that are provided by Tivoli Management Framework:
  – Java 1.3 for Tivoli
  – Java RDBMS Interface Module
  – Java Client Framework for Tivoli

  These Java components are used by several of the other IBM Tivoli products.

• The installation of the Distribution Status console creates the mdist2 RIM object.
• Activity Planner.

  This installation creates the planner RIM object. This RIM object can be on the Tivoli Server.

• Change Manager.

  This installation creates the ccm RIM object.

• Inventory and Inventory Gateway.

  The installation of the Inventory component creates the inv_query and invdh_1 RIM objects.

• Software Distribution and Software Distribution Gateway.
• Software Package Editor.

This installation program can be used on all platforms supported as a Tivoli Server. For details about which platforms are supported as a Tivoli Server, see the Tivoli Management Framework Release Notes Version 4.1, GI11-0890. IBM has optimized and created a simple to use Java GUI to install the necessary components for IBM Tivoli Configuration Manager 4.2.

Note: The default actions for a Typical fresh installation are:
• Create a gateway called -gw (for example, lab16036-gw).
• Assign the gateway to port 9494.
• Run Create default tablespaces and run schema, using the default users and passwords, such as planner/planner and ccm/tivoli. The default user IDs and passwords are documented in Table 21 of IBM Tivoli Configuration Manager Planning and Installation Guide Version 4.2, GC23-4702.
• Install but do not configure LDAP.
• Create an APM user (tivapm with password tivapm).
• Install the language package of the machine locale.
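The gateway defaults listed in the note line up with the wcrtgate call recorded in the cmismp.log extract in 8.2.4 (host aix-inv01b, port 9494, gateway name aix-inv01b-gw). The following sketch reconstructs that command line for a given host. It is an illustration only: the function name default_gateway_command is hypothetical, and the rule "gateway name equals host name plus -gw" is inferred from the two examples rather than stated by the installer documentation.

```shell
#!/bin/sh
# Hypothetical helper: print the wcrtgate command line that matches the
# Typical Install defaults for the host name given as $1.
# Assumption: the gateway name is the host name followed by "-gw".
default_gateway_command() {
    printf 'wcrtgate -h %s -p 9494 -n %s-gw\n' "$1" "$1"
}

# Example:
# default_gateway_command aix-inv01b
```

Comparing this expected command with the one actually logged is a quick way to confirm which host name the installer used when a gateway appears under an unexpected name.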

Once the installation program is started as specified in 8.2.3, “Starting the installation programs” on page 214, and the language for install selected, you will be presented with the choice of either doing a Typical or a Custom Install, as seen in Figure 8-1 on page 219. Let us walk through the install choosing the Typical Install.


Figure 8-1 Typical Install: Choosing your install options

1. Enter the home directory of the Tivoli installation (Figure 8-2 on page 220). The default path is c:\Program Files\Tivoli for Windows, /usr/local/Tivoli for UNIX. Ensure there is sufficient disk space available to install the binaries and the database files for short-term and long-term capabilities. After selecting the location, click Next.


Figure 8-2 Typical Install: Tivoli Server setup

2. This brings you to the database vendor selection screen (Figure 8-3 on page 221). As mentioned earlier, the RDBMS server needs to be installed and configured for Integrated Install to complete. The IBM Tivoli Configuration Manager 4.2 server can be an RDBMS server or client. In our installation, we chose DB2 as our database vendor; the other choices are Oracle, Sybase, MSQL, and Informix. The Database client interface home is the database instance binary path. After selecting the right values, click Next.


Figure 8-3 Typical Install: Database Vendor Information

3. The next screen (Figure 8-4 on page 222) provides information for RDBMS and RIM. The data on this screen will be used by the wcrtrim and wsetrim commands during the installation:

Database path

The instance path. Please note that this is different than the Database client interface home that you have specified in Figure 8-3.

Server ID

Specifies the server ID for the database. This value enables the RIM host to connect to the RDBMS.

Database vendor

Specifies the vendor of the RDBMS you are using. This field should auto fill from the previous screen

DB2 Database Name

Specifies the unique name (database ID) of the database to which the RIM object will connect.

Database administrator name

Login name of the database administrator for that database and its instance.


Database administrator password

Password for the database administrator.

After filling in the right values, click Next. The Install program will now validate the information by trying to make a connection to the database using the information provided.

Figure 8-4 Typical Install: RDBMS and RIM information

4. If the connection to the database is successful, you should see the summary page outlining the products and directory structure where Tivoli Management Framework, Inventory, and Software Distribution will be installed, as seen in Figure 8-5 on page 223. Clicking Next will initiate the actual install of IBM Tivoli Configuration Manager 4.2.


Figure 8-5 Typical Install: Summary window of settings

5. Integrated Install should now request the location of the Install Images. You need to point to the Tivoli Management Framework CD. You will also see a progress window showing different components being installed, as seen in Figure 8-6 on page 224.

6. Once the installation of IBM Tivoli Configuration Manager is complete, you will see the cmsummary report. A complete listing can be viewed in the cmsummary.log file located in the temporary directory. Details about this log are discussed later in this chapter.


Figure 8-6 Integrated Install in progress

Note: If admin scripts are not modified, then, for a Typical Install, a single database called cm_db is created.
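After a Typical Install completes, the RIM objects it creates (mdist2, planner, ccm, inv_query, and invdh_1, per the component list in this section) can be reviewed one at a time. The following sketch prints one wgetrim invocation per object. The wgetrim command belongs to the Tivoli Management Framework CLI and is not mentioned in this chapter, so treat its use here as an assumption to confirm against the Framework reference manual. The variable and function names are hypothetical.

```shell
#!/bin/sh
# RIM objects created by the Typical Install, taken from the component
# list in this section.
TYPICAL_RIM_OBJECTS="mdist2 planner ccm inv_query invdh_1"

# Hypothetical helper: print one wgetrim command per RIM object.
# Printing rather than executing keeps the sketch safe to run anywhere;
# pipe the output to sh inside a sourced Tivoli environment to run it.
rim_check_commands() {
    for rim in $TYPICAL_RIM_OBJECTS; do
        printf 'wgetrim %s\n' "$rim"
    done
}

# Example (commented out; requires a sourced Tivoli environment):
# rim_check_commands | sh
```

Generating the commands as text keeps the object list in one place, so the same list can be reused for other per-object checks.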

8.2.6 Custom Install

When you select Inventory during a custom installation, all the components listed for a typical installation are installed with the exception of the Software Distribution, Software Distribution Gateway, and Software Package Editor components. However, when you select Software Distribution, which requires the selection of Inventory, all the additional components are installed, as mentioned in 8.2.5, “Typical Install” on page 217.

This installation program can be used on all platforms supported as a Tivoli Server. For details about which platforms are supported as a Tivoli Server, see the Tivoli Management Framework Release Notes Version 4.1, GI11-0890. IBM has optimized and created a simple to use Java GUI to install the necessary components for IBM Tivoli Configuration Manager 4.2. Our next few screens will walk you through a Custom Install and its required information.

Once the installation program is started, as specified in 8.2.3, “Starting the installation programs” on page 214, and the language for install selected, you will be presented with the choice of either doing a Typical or a Custom Install. In this section, we shall deal with the Custom Install.

1. Figure 8-7 shows the beginning of the Integrated Install Custom screen. Press Next to commence the installation.

Figure 8-7 Custom Install

2. The Custom Install allows you to install selectively, for example, if you want to install only Inventory or you would like to install Inventory and Software Distribution, as seen in Figure 8-8 on page 226. After selecting the components, click Next.


Restriction: You cannot Install Software Distribution alone when using Integrated Install. Inventory needs to be installed if Software Distribution is required.

Figure 8-8 Custom Install: Components choice

3. The next screen allows you to choose the additional languages you wish to install (Figure 8-9 on page 227). Select the required language and click Next.


Figure 8-9 Custom Install: Additional Languages

4. The next screen allows you to choose the Destination Directory and configure the Tivoli Management Framework Gateway (Figure 8-10 on page 228).


Figure 8-10 Custom Install: Tivoli Server destination directory structure

5. The next screen (Figure 8-11 on page 229) allows you to select how you would like to configure the database, that is, whether the SQL scripts are run and how they are run. As mentioned earlier, the RDBMS server needs to be installed and configured for Integrated Install to complete. The IBM Tivoli Configuration Manager 4.2 server can be an RDBMS server or client:
– No configuration: Admin and Schema scripts will not be run during the install. You can manually run them after the install completes. The RDBMS server connection needs to be established, even if No configuration is selected.
– Run schema scripts only, tablespaces already created: Install will only run the schema scripts; it assumes that the tablespaces were previously created.
– Create default tablespaces and run schema scripts: Install will run both the admin and the schema scripts. We recommend that you speak to your DBA about recreating and fine tuning these tablespaces at a later time according to sizing and performance.
– Create custom tablespaces and run schema scripts: This option allows you to run custom admin scripts and then run the schema scripts. You have probably received custom database information from your DBA. Again, plan accordingly for short-term and long-term data sizing and performance.


You also select the Database vendor and provide the location for Database binaries on this screen.

Figure 8-11 Custom Install: Repository configuration information

6. The next window (Figure 8-12 on page 230) allows you to configure the Distribution Status Console (MDist2) RIM object. This installation creates the MDist2 RIM object called mdist2. The fields are defined as follows:

RM Name

Name of the RIM object, mdist2.

Database Name

Name of the database to which this RIM object connects.

RIM User Name

Name of the RIM user.

RIM password

Password for RIM user.

Server ID

Specifies the server ID for the database. This value enables the RIM host to connect to the RDBMS.

Database Path

The instance path.

Database vendor

Specifies the vendor of the RDBMS you are using. This field should auto fill from the previous screen.


DB2 Instance Name

Specifies the unique name (database ID) of the database to which the RIM object will connect.

Database administrator name

Login name of the database administrator for that database and its instance.

Database administrator password

Password for the database administrator.

After filling in the right values, click Next. The Install program will now validate the information by trying to make a connection to the database using the information provided.

Figure 8-12 RDBMS and RIM information

7. The Activity Planner is the next component of installation. As the screen (Figure 8-13 on page 231) specifies, the user name and the password must be associated with the Tivoli Administrator for proper authentication and for overall security. You can also choose to associate this login with an operating system user name.


Figure 8-13 Custom Install: Activity Planner user

8. The information entered in the next screen (Figure 8-14 on page 232) pertains to the RDBMS and RIM for Activity Planner. The meaning of each field is identical to the previous screen for Distribution Status Console, though the fields are specific to the Activity Planner. The name of the RIM object created is planner.


Figure 8-14 Custom Install: Activity Planner repository information

9. The information entered in the next screen (Figure 8-15 on page 233) pertains to the RDBMS and RIM for Inventory. The meaning of each field is identical to the previous screen for Distribution, though the fields are specific to Inventory. The installation creates the inv_query and invdh_1 RIM objects on the TMR Server.


Figure 8-15 Custom Install: Inventory repository information

10. The information entered in the next window (Figure 8-16 on page 234) pertains to the RDBMS and RIM for Change Manager. The meaning of each field is identical to the previous screen for Activity Planner, though the fields are specific to Change Manager. The installation creates the ccm RIM object.

Note: It is not a product requirement for these RIM objects to be on the TMR Server. In fact, you could get better performance if you have these objects on other managed nodes, since the TMR Server will likely be a highly utilized machine. To move them to other managed nodes, use the wmvrim command.


Figure 8-16 Custom Install: Change Manager repository information

11.The Next screen (Figure 8-17 on page 235) is about the Enterprise Directory Query Facility configuration. Even though you may choose Do not configure the Enterprise Directory Query Facility, it will install it anyway, but just not configure it. If you do chose Configure LDAP access for the Enterprise Directory Query Facility, then the following must be installed prior to doing so: – Tivoli Management Framework – Java 1.3.0 for Tivoli – An installed and configured LDAP directory server. The LDAP Server host name field requires the Lightweight Directory Access Protocol (LDAP) server host name. The LDAP Distinguished user name field specifies the distinguished name of the user with LDAP Administrator privileges. The next field requires that you enter the password for this user ID. For example, the LDAP administrator for Microsoft Active Directory in the .swd.com domain would be specified as follows: CN=Administrator,CN=Users,dc=SWD,dc=COM

The LDAP Naming context field requires the naming context in the enterprise directory tree level used to retrieve information with a query.


For example, the naming context within Microsoft Active Directory that is required to make a query to obtain a list of users in the swd.com domain would be specified as follows:
CN=Users,dc=SWD,dc=COM
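The two Active Directory examples follow the same pattern: each label of the DNS domain becomes a dc= component under the Users container. The following Python sketch shows that mapping. The function names are hypothetical, and the upper-casing of the domain labels simply mirrors the examples in the text; it is not stated as a requirement.

```python
def users_naming_context(domain: str, container: str = "Users") -> str:
    """Build a naming context such as CN=Users,dc=SWD,dc=COM from a DNS domain."""
    labels = [label for label in domain.strip(".").split(".") if label]
    if not labels:
        raise ValueError("domain must contain at least one label")
    # Assumption: labels are upper-cased only to match the examples in the text.
    parts = [f"CN={container}"] + [f"dc={label.upper()}" for label in labels]
    return ",".join(parts)


def admin_distinguished_name(domain: str, user: str = "Administrator") -> str:
    """Distinguished name of a user stored under the Users container."""
    return f"CN={user}," + users_naming_context(domain)


print(users_naming_context("swd.com"))
print(admin_distinguished_name("swd.com"))
```

Other directory servers organize their trees differently, so treat the output as a starting point to verify against your own directory.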

Figure 8-17 Custom Install: Enterprise Directory Query Facility configuration

Note: After installing the Enterprise Directory Query Facility component, you must run the LDAP-specific script that extends the enterprise directory schema. These scripts are in LDAP Data Interchange Format (LDIF) and are located in the $BINDIR/TAS/DirQuery/SCRIPTS directory on the Tivoli Server. You must copy the file from the Tivoli Server to the LDAP server and run the script locally, so that you can run queries from the LDAP server against Enterprise Directory Query Facility:
• For Microsoft Active Directory: Adupd.ldf
• For IBM SecureWay: IBMupd.ldf
• For Novell Directory Server: NDSupd.ldf

12. The final step before the install commences is to review all the information provided to Integrated Install. Figure 8-18 on page 236 shows the review panel. A quick glance will confirm whether all the parameters are as required. If any change needs to be made, use the Back button to return to the screen that requires modification. Clicking Next initiates the actual install of IBM Tivoli Configuration Manager 4.2.

Figure 8-18 Custom Install: Review

13. Integrated Install now asks for the location of the install images. Point it to the Tivoli Management Framework CD. The progress bar indicates each component being installed, as seen in Figure 8-6 on page 224.
14. Once the installation of IBM Tivoli Configuration Manager is complete, you will see the cmsummary report.


A complete listing can be viewed in the cmsummary.log file located in the temporary directory. This log is discussed in the next section.

8.3 Troubleshooting Server Install
In this section, we outline which files and logs to check when a problem occurs during installation. Server Install for IBM Tivoli Configuration Manager 4.2 not only installs Tivoli Management Framework, but also installs Inventory, Software Distribution, and Deployment Services. In 8.2.4, “Server Install: Behind the scenes” on page 215, we explained which Tivoli commands are issued behind the scenes to install IBM Tivoli Configuration Manager 4.2. In this section, we show how you can use the traditional logs and the consolidated Integrated Server logs to troubleshoot a failed install.

8.3.1 Cmsummary.log
After the install completes, Integrated Install presents the install summary. This window shows a detailed view of which products were successfully installed and, if the install failed, which portion of the install failed. Cmsummary.log also shows all the wrapper commands used by the Integrated Install. This log is found in the $Temp directory. Example 8-1 shows a successful cmsummary.log from a custom install.

Example 8-1 Successful cmsummary.log
********************************************
********************************************
Installation Succeeded
Successful Items:
Tivoli Management Framework gateway
Java Client Framework
JavaHelp
Java RDBMS Interface Module
Distribution Status Console
Distribution Status admin script
Distribution Status Console Rim Object
Distribution Status schema script
Tivoli Desktop for Windows
Web Interface
Scalable Collection Service


Inventory
Inventory admin script
Inventory schema script
Inventory gateway
Resource Manager server
Resource Manager gateway
Activity Planner
Activity Planner admin script
Activity Planner schema script
Change Manager
Change Manager admin script
Change Manager schema script
Software Distribution
Software Distribution gateway
Software Package Editor
Enterprise Directory Query Facility
Registering plugins for Inventory
Registering plugins for Inventory
Registering plugins for Software Distribution
Registering plugins for Software Distribution
********************************************
2002.09.10 21:55:31 - Tivoli Management Framework gateway - Installation Succeeded
**********
2002.09.10 21:58:36 - Java - Installation Succeeded
winstall -c "D:/austin/fwork/images/41/20020901/new2/JAVA" -i JRE130.IND -y -e
**********
2002.09.10 21:59:38 - Java Client Framework - Installation Succeeded
winstall -c "D:/austin/fwork/images/41/20020901/new2/JAVA" -i JCF41.IND -y -e
**********
2002.09.10 22:00:19 - JavaHelp - Installation Succeeded
winstall -c "D:/austin/fwork/images/41/20020901/new2/JAVA" -i JHELP41.IND -y -e
**********
2002.09.10 22:01:12 - Java RDBMS Interface Module - Installation Succeeded
winstall -c "D:/austin/fwork/images/41/20020901/new2/JAVA" -i JRIM41.IND -y -e
**********
2002.09.10 22:03:15 - Distribution Status Console - Installation Succeeded
winstall -c "D:/austin/fwork/images/41/20020901/new2/JAVA" -i MDIST2GU.IND -y -e
**********
2002.09.10 22:03:40 - Distribution Status admin script - Installation Succeeded
**********
2002.09.10 22:03:55 - Distribution Status Console Rim Object - Installation Succeeded
**********
2002.09.10 22:04:08 - Distribution Status schema script - Installation Succeeded
**********
2002.09.10 22:04:23 - Tivoli Desktop for Windows - Installation Succeeded


D:\austin\fwork\images\41\20020901\new2\DESKTOP\NT_95\setup.exe -s SMS -f1"C:\TEMP\setup0.iss" -f2"C:\TEMP\\dswin.log"
**********
2002.09.10 22:08:20 - Web Interface - Installation Succeeded
winstall -c "D:/rome/cm/20020823/images/tcm/cd1/SWD" -i WEBUI.IND -y -e
**********
2002.09.10 22:12:00 - Scalable Collection Service - Installation Succeeded
wpatch -c "D:/rome/cm/20020823/images/tcm/cd1/MCOLLECT" -i MCOLLECT.IND -y -e
**********
2002.09.10 22:28:15 - Inventory - Installation Succeeded
winstall -c "D:/rome/cm/20020823/images/tcm/cd1/INVENTORY" -i 42_INV_F.IND -y -e @RDBMS_Vendor@=DB2 @RDBMS_DB_Name@=inv_db @RDBMS_DB_Home@="C:/Program Files/Sqllib" @RDBMS_DB_Param_one@=tcpip @RDBMS_DB_UserName@=invtiv @RDBMS_DB_Param_two@=~DB2
**********
2002.09.10 22:28:39 - Inventory admin script - Installation Succeeded
**********
2002.09.10 22:31:08 - Inventory schema script - Installation Succeeded
**********
2002.09.10 22:48:34 - Inventory gateway - Installation Succeeded
winstall -c "D:/rome/cm/20020823/images/tcm/cd1/INVENTORY" -i 42_GW_FR.IND -y -e
**********
2002.09.10 22:58:27 - Resource Manager server - Installation Succeeded
winstall -c "D:/rome/cm/20020823/images/tcm/cd1/TRM" -i ALI_TRM.IND -y -e
**********
2002.09.10 22:59:00 - * - Installation Succeeded
$BINDIR/TRM/RegisterUser.sh
**********
2002.09.10 23:00:07 - Resource Manager gateway - Installation Succeeded
winstall -c "D:/rome/cm/20020823/images/tcm/cd1/TRM" -i LCF_TRM.IND -y -e
**********
2002.09.10 23:05:15 - Activity Planner - Installation Succeeded
winstall -c "D:/rome/cm/20020823/images/tcm/cd1/SWD" -i APM.IND -y -e @From@=ISMP @Username@=tivapm @Password@=...@RDBMS_Vendor@=DB2 @RDBMS_DB_Name@=planner @RDBMS_DB_Home@="C:/Program Files/Sqllib" @RDBMS_DB_Param_one@=tcpip @RDBMS_DB_UserName@=planner @RDBMS_DB_Param_two@=~DB2
**********
2002.09.10 23:05:39 - Activity Planner admin script - Installation Succeeded
**********
2002.09.10 23:06:15 - Activity Planner schema script - Installation Succeeded
**********
2002.09.10 23:09:23 - Change Manager - Installation Succeeded
winstall -c "D:/rome/cm/20020823/images/tcm/cd1/SWD" -i CCM.IND -y -e @RDBMS_Vendor@=DB2 @RDBMS_DB_Name@=ccm @RDBMS_DB_Home@="C:/Program Files/Sqllib" @RDBMS_DB_Param_one@=tcpip @RDBMS_DB_UserName@=tivoli @RDBMS_DB_Param_two@=~DB2
**********


2002.09.10 23:09:53 - Change Manager admin script - Installation Succeeded
**********
2002.09.10 23:10:26 - Change Manager schema script - Installation Succeeded
**********
2002.09.10 23:20:56 - Software Distribution - Installation Succeeded
winstall -c "D:/rome/cm/20020823/images/tcm/cd1/SWD" -i SWDIS.IND -y -e
**********
2002.09.10 23:23:20 - Software Distribution gateway - Installation Succeeded
winstall -c "D:/rome/cm/20020823/images/tcm/cd1/SWD" -i SWDISGW.IND -y -e
**********
2002.09.10 23:24:16 - Software Package Editor - Installation Succeeded
winstall -c "D:/rome/cm/20020823/images/tcm/cd1/SWD" -i SWDISJPS.IND -y -e
**********
2002.09.10 23:31:01 - Enterprise Directory Query Facility - Installation Succeeded
winstall -c "D:/rome/cm/20020823/images/tcm/cd1/SWD" -i QUERYDIR.IND -y -e
**********
2002.09.10 23:31:47 - Registering plugins for Inventory - Installation Succeeded
$BINDIR/TME/APM/SCRIPTS/reg_inv_plugin.sh
**********
2002.09.10 23:32:02 - Registering plugins for Inventory - Installation Succeeded
$BINDIR/TME/CCM/SCRIPTS/reg_invscan_plugin.sh
**********
2002.09.10 23:32:25 - Registering plugins for Software Distribution Installation Succeeded
$BINDIR/TME/APM/SCRIPTS/reg_swd_plugin.sh
**********
2002.09.10 23:32:47 - Registering plugins for Software Distribution Installation Succeeded
$BINDIR/TME/CCM/SCRIPTS/reg_swd_plugin.sh
**********

8.3.2 Cmismp.log
The cmismp.log contains the date and time stamp of the installation, the chosen components, and a detailed description of each component and action, as well as the installation command. This is the only log that is produced by the Java-based Integrated Install. This log is also found in the $Temp directory. An example of cmismp.log is shown in Example 8-2 on page 241.


Example 8-2 Start of a cmismp.log
(Jul 18, 2002 2:58:12 PM), Setup.product.install, com.tivoli.cmismp.wizard.actions.CMILoadResponseFile dbg, Enter execute
(Jul 18, 2002 2:58:12 PM), Setup.product.install,CMISetGenericProperties, dbg, The hostname is aix-inv01b and the IP is x.x.x.x.x (the IP address will display)
(Jul 18, 2002 2:58:12 PM), Setup.product.install, com.tivoli.cmismp.wizard.actions.SetProductsDefaultAction, dbg =, Creating registry instance
p2=INV|4.2|SELECTED,VISIBLE|TMF|SWD|50
p3=APM|4.2|SELECTED|TMF|CCM|30
p4=CCM|4.2|SELECTED|TMF,APM|*|30
p5=SWD|4.2|SELECTED,VISIBLE|INV|*|50
p6=WebUi|4.2|SELECTED|TMF|*|20
p7=QRY|4.2|SELECTED|TMF|*|20

The log continues with the component and database connection information for each component. Example 8-3 is another example of the information in the log. In this case, we are looking at the connection status with the database for the MDist2 and Activity Planner RIM components.

Example 8-3 MDist2 and the Activity Planner database components
Database Connection Information
Database server       = DB2/6000 7.2.0
SQL authorization ID  = DB2DATA
Local database alias  = MDIST2

Database Connection Information
Database server       = DB2/6000 7.2.0
SQL authorization ID  = DB2DATA
Local database alias  = PLANNER

The installation log also records any dependencies that the installation program requires, together with their status. This output is extracted from the tivoli.cinstall files. Example 8-4 on page 242 is an example of the output.


Example 8-4 Dependencies
Checking product dependencies...
Product JRE130 is already installed as needed.
Product TMF_3.7 is already installed as needed.
Dependency check completed.
Inspecting node aix-inv01b...
Installing Product: Tivoli Java Client Framework 4.1
Unless you cancel, the following operations will be executed:
For the machines in the independent class:
hosts: aix-inv01b
need to copy the CAT (generic) to:
aix-inv01b:/usr/local/Tivoli/msg_cat
need to copy the GBIN (generic) to:
aix-inv01b:/usr/local/Tivoli/bin/generic_unix

In the event of a problem, the log shows the failure and points to where the problem can be identified. Example 8-5 is an example of an install failure.

Example 8-5 Installation problem
STACK_TRACE: 12
ProductException: (error code = 401; message="error loading product action")
Look at the log file /usr/local/Tivoli/install_tmp/ismp002/3756214.tmp for details.
(Jul 18, 2002 3:26:45 PM), Setup.product.install, com.tivoli.cmismp.product.actions.CheckCDImageAction,

Note: The cmismp.log file is appended to with every new install attempt. It is good practice to rename this log file before starting a fresh install so that you have a fresh log.
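Renaming the log before each attempt is easy to script. The following Python sketch renames an existing log to a name that carries its last-modified time. The function name archive_log and the naming scheme are illustrative choices, not part of the product.

```python
import time
from pathlib import Path
from typing import Optional


def archive_log(log_path: Path) -> Optional[Path]:
    """Rename log_path to <stem>.<timestamp><suffix>; return the new path, or None if absent."""
    if not log_path.is_file():
        return None
    # Use the file's last-modified time so the archive name reflects the old run.
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(log_path.stat().st_mtime))
    target = log_path.with_name(f"{log_path.stem}.{stamp}{log_path.suffix}")
    attempt = 1
    while target.exists():  # never overwrite an earlier archive
        target = log_path.with_name(f"{log_path.stem}.{stamp}-{attempt}{log_path.suffix}")
        attempt += 1
    log_path.rename(target)
    return target
```

For example, archive_log(Path(temp_dir) / "cmismp.log") would be called before launching the installer, where temp_dir is whatever directory your system uses as $Temp.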

8.3.3 Traditional logs
Table 8-1 highlights the log files written during a TMR Server installation.

Table 8-1 Files written during server installation
File                                    UNIX directory   Windows folder   Component
tivoli.sinstall                         /tmp             %DBDIR%\tmp      TMR Server
oservlog                                $DBDIR           %DBDIR%          TMR Server
install.cfg.error, install.cfg.output   /tmp             %DBDIR%\tmp      TMR Server

For every product and patch install after the initial installation, Tivoli Management Framework writes to tivoli.cinstall. This log is located in /tmp on UNIX or %DBDIR%\tmp on Windows. This file is truncated with every winstall, wpatch, or wclient command, so it reflects only the last product or patch installed on the system.

8.3.4 General troubleshooting steps for Server Install
If something goes wrong during the install, do the following:
1. Check the cmsummary.log to identify the component in error.
2. If needed, also look in the cmismp.log.
3. Make sure that $(TEMP) is set and has enough disk space to start the Java Virtual Machine (JVM).
4. If the failure is caused by a script, run the script manually.
5. If the failure is in a silent setup, run the setup manually from the CD to see the output.
6. Fix the problem and restart the installation program; the installation program is able to perform only the missing or failed steps.
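The TEMP check in step 3 can be automated before the installer is launched. The following Python sketch is one way to do it. The 100 MB threshold is an assumption made for illustration, since the text does not state how much space the JVM needs, and the function name check_temp is hypothetical.

```python
import os
import shutil
from typing import List, Mapping, Optional

MIN_FREE_MB = 100  # assumed threshold; the text gives no figure


def check_temp(env: Optional[Mapping[str, str]] = None,
               min_free_mb: int = MIN_FREE_MB) -> List[str]:
    """Return a list of problems with the TEMP directory (empty if none found)."""
    env = os.environ if env is None else env
    temp = env.get("TEMP") or env.get("TMP")
    if not temp:
        return ["TEMP is not set"]
    if not os.path.isdir(temp):
        return [f"TEMP points to a missing directory: {temp}"]
    problems = []
    free_mb = shutil.disk_usage(temp).free // (1024 * 1024)
    if free_mb < min_free_mb:
        problems.append(f"only {free_mb} MB free in {temp}")
    if not os.access(temp, os.W_OK):
        problems.append(f"{temp} is not writable")
    return problems
```

An empty list means none of these checks found a problem; it does not guarantee that the JVM will start.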

8.3.5 Server Install troubleshooting examples
In this section, we cover two real-life examples of Server Install troubleshooting.

Case study-1
The message in Figure 8-19 on page 244 was received during the Server Install.

Figure 8-19 Case study-1

Example 8-6 shows the corresponding cmsummary.log for this problem.

Example 8-6 cmsummary.log
Installation Failed
The following item failed to install
JAVA RIM
It failed for the following reason:
DISSE0020E File or directory "D:\tmp\cd3\SPB\Tivoli_JRIM.spb" not found
DISSE0005E Operation unsuccessful
Successful items:
Java Runtime Environment
JavaHelp
Java Client Framework
=====================================
2002.09.12 10:31:52 - Java Runtime Environment - Installation Succeeded
wdinstsp -f -n "Tivoli_JRE_NT.1.3.0" "D:\tmp\cd3\SPB\Tivoli_JRE_NT.spb"
=======
2002.09.12 10:31:53 - JavaHelp - Installation Succeeded
wdinstsp -f -n "Tivoli_JHelp.4.1" "D:\tmp\cd3\SPB\Tivoli_JHelp.spb"
=======
2002.09.12 10:31:55 - Java Client Framework - Installation Succeeded
wdinstsp -f -DDSWIN_DIR="D:\Program Files\Tivoli\desktop" -n "Tivoli_JCF" "D:\tmp\cd3\SPB\Tivoli_JCF.spb"
=======
2002.09.12 10:31:56 - Java RIM - Installation Failed
wdinstsp -f -DDSWIN_DIR="D:\Program Files\Tivoli\desktop" "Tivoli_JRIM.4.1" "D:\tmp\cd3\SPB\Tivoli_JRE_NT.spb"
=======
DISSE0020E File or directory "D:\tmp\cd3\SPB\Tivoli_JRIM.spb" not found.
DISSE0005E Operation unsuccessful
=======
2002.09.12 10:31:56 - Java RIM - Uninstallation Failed
wdrmvsp -f -DDSWIN_DIR="D:\Program Files\Tivoli\desktop" "Tivoli_JRIM.4.1"

Investigation of the cmsummary.log
• The first entries in the Summary Panel show us that one of the operations was unsuccessful:
The following item failed to install
JAVA RIM
It failed for the following reason:
DISSE0020E File or directory "D:\tmp\cd3\SPB\Tivoli_JRIM.spb" not found
DISSE0005E Operation unsuccessful
Successful items:
Java Runtime Environment
JavaHelp
Java Client Framework

• The following line shows us the time stamp, component, and operation result, which is Installation Failed:
2002.09.12 10:31:56 - Java RIM - Installation Failed

• The following lines show us the command that was executed and the reason for the failure:
DISSE0020E File or directory "D:\tmp\cd3\SPB\Tivoli_JRIM.spb" not found.
DISSE0005E Operation unsuccessful

• Finally, the following lines show us that the installation program attempted a rollback, but was unsuccessful. This is okay.
2002.09.12 10:31:56 - Java RIM - Uninstallation Failed
wdrmvsp -f -DDSWIN_DIR="D:\Program Files\Tivoli\desktop" "Tivoli_JRIM.4.1"


Conclusion
Something went wrong during the configuration or while running the database scripts. This error does not stop the installation procedure, which runs to completion, but the environment may not be properly configured. For resolution:
• Check the cmsummary.log to identify all the steps in error.
• If needed, also look in the cmismp.log.
• Fix the problem and manually run the missed steps or database scripts.
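Identifying every step in error is tedious in a long cmsummary.log, so it can help to filter the entries by result. The following Python sketch assumes that entries follow the "time stamp - component - result" layout seen in the examples in this chapter. The regular expression and the function name failed_steps are illustrative and not part of any Tivoli tool.

```python
import re

# time stamp, optional dash, component, optional dash, operation and result
ENTRY = re.compile(
    r"(\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2})"
    r"\s+-?\s*(.+?)"
    r"\s+-?\s*(Installation|Uninstallation) (Succeeded|Failed)"
)


def failed_steps(log_text):
    """Return (time stamp, component, operation) for every entry marked Failed."""
    return [
        (stamp, component.strip(), operation)
        for stamp, component, operation, result in ENTRY.findall(log_text)
        if result == "Failed"
    ]
```

The dashes are treated as optional because some entries in the sample logs omit them. The summary block at the top of the log carries no time stamps, so it is ignored.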

Case study-2
The message shown in Figure 8-20 was received during the Server Install.

Figure 8-20 Case study-2

This is an example of an error connecting to the database. Example 8-7 on page 247 shows the corresponding cmsummary.log (truncated).


Example 8-7 Portion of the cmsummary.log
Installation Failed
The following item failed to install
Change Manager schema script
It failed for the following reason:
Command can not be run
Successful items:
Configuration Manager admin script
Activity Planner
Activity Planner schema script
Registering plugins for Inventory
Registering plugins for Software Distribution
=====================================
2002.09.23 11:39:44 - Configuration Manager admin script - Installation Succeeded
========
2002.09.23 11:43:35 - Configuration Manager schema script - Installation Failed
========
Command cannot be run
========
2002.09.23 11:47:55 Registering plugins for Inventory -Installation Succeeded
%BINDIR/TME/APM/SCRIPT/reg_inv_plugin.sh

Investigation of the cmsummary.log
• The first entries in the Summary Panel show us that there was a failure running the SQL script:
The following item failed to install
Change Manager schema script
It failed for the following reason:
Command can not be run

• The following lines show that even though there was a problem executing the schema script, the installation did not abort and continued to register plug-ins:
2002.09.23 11:47:55 Registering plugins for Inventory -Installation Succeeded
%BINDIR/TME/APM/SCRIPT/reg_inv_plugin.sh

Investigation of the cmismp.log
We can also check the cmismp.log, which gives us more details about the preceding error condition. Example 8-8 on page 248 is the entry from this log showing the exact nature of the problem: an authorization error. This log is also useful for understanding which commands have been run.

Example 8-8 Portion of the cmismp.log
Setup.product.install,com.tivoli.cmismp.util.DB2Database,dbg,DB2CLI stdout
SQL1403N The user name and password supplied is incorrect SQLSTATE=08004

Conclusion
This is an example of an error running database scripts with an incorrect user or password. Correct the user and password and manually run the scripts.
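Database errors of this kind can be located quickly by scanning cmismp.log for DB2 message identifiers. The following Python sketch assumes the usual DB2 identifier shape (SQL, four or five digits, then N, W, or C) and that any SQLSTATE appears on the same line as the identifier. The function name db2_errors is hypothetical.

```python
import re

CODE = re.compile(r"\bSQL\d{4,5}[NWC]\b")
STATE = re.compile(r"\bSQLSTATE=(\w{5})\b")


def db2_errors(log_text):
    """Return (line number, message id, SQLSTATE or None) for each line with a DB2 message id."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        code = CODE.search(line)
        if code:
            state = STATE.search(line)
            hits.append((number, code.group(0), state.group(1) if state else None))
    return hits
```

If the log wraps the SQLSTATE onto the following line, the third element is None and the surrounding lines need to be read by hand.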

8.4 Desktop Install
With IBM Tivoli Configuration Manager 4.2, you can now make a Tivoli endpoint a fully operational Tivoli Console on a Windows PC. Prior to IBM Tivoli Configuration Manager 4.2, this process had the following limitations:
• The PC had to be a Tivoli managed node, and all the JCF-based GUIs (MDist2, APM_Editor, APM_Monitor, CCM, and INV) had to be installed on this node.
• All the required Java packages (Java Runtime Environment (JRE), Swing, JavaHelp, Java Client Framework (JCF), and JavaRIM (JRIM)) had to be installed.
• During the installation, credentials had to be specified again when the managed node to JCF spawning occurred.

In IBM Tivoli Configuration Manager 4.2, we can now install the following components on a Windows PC via Desktop Install:
• Tivoli Desktop for Windows
• Tivoli Java components
• Distribution Status Console
• Activity Planner GUI
• Change Manager GUI
• Inventory GUI
• Software Package Editor


During the Desktop Install, ISMP synchronously runs the following activities behind the scenes:
• Install Desktop for Windows
• Temporarily unpack the Software Distribution disconnected commands (SPB)
• Install a prerequisite SPB for environment setup
• Install all mandatory Java prerequisites
• Install selected applications
• Clean up the environment

To launch the Desktop Install:
1. Run setup.exe from the third IBM Tivoli Configuration Manager CD.
2. This brings up the Tivoli Desktop Install. The first window asks which language to use during the install, followed by the welcome screen, as seen in Figure 8-21.

Figure 8-21 Desktop Install: Welcome screen

3. Clicking Next brings you to the Software License Agreement window. Press Next to continue.
4. In the next window (Figure 8-22 on page 250), Integrated Desktop Install prompts you to select either Typical Install or Custom Install. If Typical Install is selected, then all the components except Software Package Editor are installed on the system. Custom Install lets you choose which components to install. After making your choice, click Next.

Figure 8-22 Desktop Install: Type of Installation

5. We have selected Custom Install, which brings up a selection window, as shown in Figure 8-23 on page 251.


Figure 8-23 Desktop Install: Component to Install

6. After selecting the components to install and pressing Next, Integrated Install installs the required components. Upon completion of install, Integrated Install will present the cmsummary.log and cmismp.log, which will be located in the temporary directory of the system. These files will show the status of the install.

8.5 Troubleshooting Desktop Install
For Desktop Install, logs and traces are available by default in the following files:
• $(Temp)/cmismp.log
• $(Temp)/cmsummary.log

Single components are installed using the Silent Installation and produce an additional log called $(Temp)/setup.log.

8.5.1 Desktop Install troubleshooting example
In this scenario, the desktop and endpoint are installed in silent mode. The error message in Figure 8-24 on page 252 is received.

Figure 8-24 Desktop Install problem

The corresponding cmsummary.log is shown in Example 8-9.

Example 8-9 cmsummary.log
Installation Failed
The following item failed to install
Tivoli Management Framework
It failed for the following reason:
Review the setup.log file
No items were installed

Finally, we check the setup.log, as seen in Figure 8-25 on page 253.


Figure 8-25 setup.log

Conclusion
Setup.log gives us more detailed information. Since the Silent Installation failed with a nonzero return code (RC=-3 means that there was an unexpected panel flow), we have to run the setup.exe program directly.
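The return code can also be read from setup.log programmatically. The following Python sketch assumes that setup.log uses the INI-style layout commonly written by InstallShield silent installs, with a [ResponseResult] section containing a ResultCode entry. Verify this against your own setup.log, since the log in Figure 8-25 may be laid out differently. The function name silent_result_code is hypothetical.

```python
import configparser
from typing import Optional


def silent_result_code(log_text: str) -> Optional[int]:
    """Return the ResultCode from an INI-style setup.log, or None if it cannot be read."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    try:
        parser.read_string(log_text)
        # Assumed layout: a [ResponseResult] section with a ResultCode entry.
        raw = parser.get("ResponseResult", "ResultCode", fallback=None)
        return int(raw) if raw is not None else None
    except (configparser.Error, ValueError):
        return None
```

Any value other than 0 indicates that the silent install did not complete normally, in which case running setup.exe interactively shows where the panel flow diverged.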

8.6 Web Gateway Install
The Web Gateway installation program installs a Tivoli endpoint, the Web Gateway component, the Web Infrastructure component, and the Inventory and Software Distribution plug-ins. Tivoli provides different options to install and configure endpoints. Our focus here is only on the endpoint install and multiple endpoint installs via the Install Shield. For detailed information on installing endpoints via Tivoli Management Framework, refer to the Tivoli Management Framework Enterprise Installation Guide Version 4.1, GC32-0804.

With IBM Tivoli Configuration Manager 4.2, application components can be packaged and deployed on a TMA. The fact that they are deployed on a TMA does not necessarily mean that they need to leverage a TMA, but simply that they do not require a classic ORB (a Tivoli managed node) to work. For this release, the SPB format, to be deployed to TMAs, is leveraged by:
• Tivoli Web Gateway (TWG), installing its engine on a TMA where WebSphere and IBM Tivoli Access Manager are available.
• Web User Interface, installing its engine on a TMA where TWG is available.
• Inventory, installing the servlets and tools needed to plug into TWG/WebUI.
• Software Distribution, installing the servlets and tools needed to plug into the TWG/Web User Interface.


SPBs follow a naming convention and leverage software dependencies. These dependencies occur because applications like TWG, which need to check for external software prerequisites, encapsulate the logic to check for WebSphere and IBM Tivoli Access Manager availability inside a custom script, whose failure prevents the SPB installation. Other applications, such as the Web User Interface, SWD_WebUI_plugin, and INV_WebUI_plugin, leverage the built-in capability of the SP to check for software prerequisites of other SPs.

In this section, we show a disconnected scenario where the ISMP program installs the TMA from a CD and then, using the disconnected Software Distribution CLI, also installs TWG, the Web User Interface, and the application plug-ins, which are packaged in SPB format. In the Typical Installation, the following components are installed:
• Tivoli endpoint
• Web Gateway database
• Web Gateway server
• Web Infrastructure
• Inventory plug-ins for Web Infrastructure
• Software Distribution plug-ins for Web Infrastructure

Custom Installation provides flexibility, as you can install any combination of the listed components. Some components require prerequisites. Let us walk through an endpoint installation scenario via Custom Install:
1. From the IBM Tivoli Configuration Manager CD 4 for Web Gateway:
• For UNIX
Run ./file.bin, where file is the name of the file that starts the installation program. There is a different installation program for each UNIX operating system; for example, for IBM AIX, the installation program file is setup_aix.bin.
• For Windows
Run the Setup.exe file.
2. Figure 8-26 on page 255 shows the component selection window for a custom install.


Figure 8-26 Web Gateway: Components to Install

3. The following windows are identical to the direct Endpoint Install windows when Install Shield is used. The destination directory, gateway, and endpoint ports and endpoint options are entered into a window similar to Figure 8-27.

Figure 8-27 Endpoint Information


Where:
Destination Directory   Location of the endpoint install.
Gateway Port            Gateway's port number. The default is 9494.
Endpoint Port           Endpoint's port number. The default is 9495.
Endpoint options        lcfd configuration options, for example, -g hostname to specify the intercepting gateway. Refer to the lcfd command in the Tivoli Management Framework Reference Manual Version 4.1, SC32-0806 for a list of valid arguments.
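Before entering a gateway host and port here, it can be worth confirming that the gateway port accepts TCP connections from the machine where the endpoint will run. The following Python sketch performs a plain TCP connection test. It is a generic network check and not a Tivoli command, and the function name port_reachable is hypothetical.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, port_reachable(gateway_host, 9494) tests the default gateway port, where gateway_host is the name you intend to pass with -g. A successful connection shows only that something is listening on the port, not that it is a Tivoli gateway.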

4. The summary screen (Figure 8-28) displays the information entered. Please review it carefully; if any of the information is incorrect, please choose the Back button and change the information accordingly.

Figure 8-28 Review screen

5. The install finishes. If any errors occur, an error message helps with finding a solution. Figure 8-29 on page 257 shows the successful installation of our endpoint. If you wish to view the details, refer to the installation log, cmsummary.log, located in the Temp directory on most Intel-based systems and in /tmp on UNIX-based systems.


Figure 8-29 Endpoint Installation: Installation successful

8.7 Troubleshooting Web Gateway Install
Logs and traces are available by default in the following files:
• $(Temp)/cmismp.log
• $(Temp)/cmsummary.log

As a general method for solving Web Gateway Install problems, do the following:
• Look for the variables set in the SPBs and see if they work well for your environment.
• If the component in error is Web Gateway, the SPB produces the following output:
– $(temp_dir)/TWGinst_stderr.log
– $(temp_dir)/TWGinst_stdout.log
• If the component in error is the Web User Interface, the SPB produces the following output:
– $(temp_dir)/webui_install.error
– $(temp_dir)/webui_install.output


• If the installation completes successfully, but the Web Gateway or Web User Interface does not work, make sure that:
– Plug-in registration is correct.
– WebSphere was running when the Web Gateway installation started.
– dmsadmin and dmuser were created before starting the installation.

Refer to 19.7.1, “Troubleshooting Web Gateway installation” on page 910 for more information.
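When gathering evidence for a Web Gateway Install problem, a quick first step is to see which of the log files named in this section actually exist. The following Python sketch lists them with their sizes. The file names come from this section; the function name existing_logs is hypothetical, and temp_dir must be set to whatever $(Temp) or $(temp_dir) resolves to on your system.

```python
from pathlib import Path

LOG_NAMES = [
    "cmismp.log",
    "cmsummary.log",
    "TWGinst_stderr.log",
    "TWGinst_stdout.log",
    "webui_install.error",
    "webui_install.output",
]


def existing_logs(temp_dir):
    """Return {file name: size in bytes} for each known log present in temp_dir."""
    base = Path(temp_dir)
    return {
        name: (base / name).stat().st_size
        for name in LOG_NAMES
        if (base / name).is_file()
    }
```

A missing TWGinst or webui_install file suggests that the corresponding SPB was never started, which narrows the search to the earlier steps recorded in cmsummary.log.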

Chapter 9. Patch maintenance
This chapter discusses best practices for patch maintenance. It has the following sections:
• Section 9.1.1, “The patch factory” on page 260
• Section 9.1.2, “Forms of patches” on page 261
• Section 9.1.4, “Understanding the prerequisites” on page 262
• Section 9.1.5, “Obtaining patch files” on page 263
• Section 9.1.6, “Turning the archives into binaries” on page 265
• Section 9.1.7, “Patch contents” on page 266
• Section 9.1.8, “Building your collection of patches” on page 266
• Section 9.1.9, “Testing patches” on page 267
• Section 9.1.10, “Applying patches to your systems” on page 268
• Section 9.1.11, “More rapid deployment” on page 269
• Section 9.1.12, “Tag files” on page 270
• Section 9.1.13, “Upgrading endpoints” on page 271
• Section 9.1.14, “Validating patch application” on page 272
• Section 9.1.15, “How to back out of a patch” on page 272
• Section 9.1.16, “Knowing what is installed where” on page 273


9.1 Patch application and information
The information in this section is largely derived from the Tivoli Field Guide: An Approach to Patches Version 1.0 (see 1.3.1, “Tivoli Field Guides” on page 7 for information on how to find this publication).

Patches are fixes to the Tivoli code. If you experience problems with Tivoli products, review the patches available for your TMR that are reported to fix the problem or symptoms that match your problem. Always test a patch first in a test environment to validate that the patch does indeed fix the problem.

In general, a very successful attitude about patches with regard to Tivoli products has always been, "If you do not experience the problem, do not apply the patch. Wait for the maintenance release to catch up." This still applies today. With the advent of the three-tier architecture, however, a different approach is advocated for patches that affect the most rapidly changing (and sometimes more temperamental) areas: the gateway and endpoint manager. The general recommendation is to install the very latest patches for these sections of code.

Every attempt is made within Tivoli to test patches properly against all releases that require testing. However, since the code is highly extensible and the tasks to which the Tivoli code can be subjected are widely diverse, there will be defects in patches that relate to the use of the code in some highly specific manner or under a specific set of (perhaps unforeseen or not reproducible) environmental conditions. It is not the intent of Tivoli to mass-produce single-fix patches.

9.1.1 The patch factory
The patch factory is a tool that provides a front-end view of the status of patches. This tool is available for use by support engineers, account management teams, and Tivoli management. A future version will be available to customers. This tool provides a graphical view of all the patches for each product, revealing the relevant information about all patches. The patch factory provides a means for your account management team to provide all the relevant information about the impact of issues facing all customers, which is then used to prioritize the teams who produce patches.


9.1.2 Forms of patches

A patch may take several forms:
• Interim fix (previously known as e-fix; an emergency fix created to alleviate an urgent issue specifically at your site)
• Limited availability patch
• General availability patch
• Fixpack

Interim fixes are emergency fixes meant to be deployed only by a single customer. Do not obtain interim fixes from other customers and apply them in your environment. The deployment of an interim fix is generally part of a high level of customer support provided in extenuating circumstances. This above-average support is reserved for such critical situations and is therefore not generally available to all customers. Interim fixes are usually customized for a specific customer’s environment and will typically not be exactly the same fix as the one in a subsequent patch (once released). There can therefore be serious repercussions for customers who apply interim fixes not intended for their environment. It is important to understand that interim fixes are not well tested. Customers are often asked to sign a waiver in order to download the fix and apply it to their environment.

Limited availability (LA) patches are a limited release of a patch just prior to general availability. The interp type that the LA patch was written for will be the same as in the GA code. All other interp types are completely untested at this stage and should not be implemented.

General availability (GA) patches are the patches intended to be distributed to all customers. These patches have completed the verification process.

Fixpacks are cumulative general availability patches that are heavily tested and recommended for installation in your environment. Install the latest fixpacks as preventive maintenance, even if you do not have an immediate problem in your environment.

9.1.3 Learning about new patches

As new patches are produced, Tivoli communicates that a patch has been released in various ways, including:
• The mailing of notices to internal resources. This notification alerts support, account management teams worldwide, Tivoli services professionals, and the IBM SPOCs (single points of contact) worldwide.
• The posting of notices on the external Tivoli Web site.
• An e-mail push notification when patches become available. The e-mail push is generated from the Tivoli Customer Support News Web-based support tool. To receive notification of patches, as well as other important information, please visit:
http://www-3.ibm.com/software/sysmgmt/products/support/Tivoli_Support_Web_Announcements.html

9.1.4 Understanding the prerequisites

Tivoli patches usually have prerequisite conditions for installation, including but not limited to:
• The base product that the patch affects must be deployed at the same version level as the patch reflects. For example, 3.6.1-DMN-xxxx patches can only be applied against an installation of Distributed Monitoring Version 3.6.1 (not 3.6 or 3.6.2). There may be exceptions to this for special cases such as tier-2 enabling patches, but the README files will clearly state this.
• Other prerequisite patches must be applied. The README file has a section that specifically deals with this. You can also see the prerequisites in the .IND file, in the lines marked by the word "depends," as in this example: 3.6.1-TMF-0063:depends:TMF_3.6.1.

Note: Do not change the .IND file in any way. You will compromise the integrity of your Tivoli environment, and you will not be supported.

Example 9-1 contains the contents of the Software Distribution 3.6.2 patch 20 (3.6.2-COU-0020) .IND file.

Example 9-1 Contents of 362COU20.IND file
3.6.2-COU-0020:description:Tivoli Software Distribution Patch 3.6.2-COU-0020:3.6.2-COU-0020
3.6.2-COU-0020:patch_for:Courier
3.6.2-COU-0020:id:CAT:Message Catalogs:both::default=/usr/local/Tivoli/msg_cat:ThisDir=@CAT@;ThisHost=@HostName@;ThisPkg=CAT;:
3.6.2-COU-0020:fp:CAT:generic::42:1
3.6.2-COU-0020:id:BIN:Binaries:both:@Arch@:default=/usr/local/Tivoli/bin:ThisDir=@BIN@;ThisHost=@HostName@;ThisPkg=BIN;:
3.6.2-COU-0020:fp:BIN:solaris2::4425:2
3.6.2-COU-0020:fp:BIN:hpux10::8408:3
3.6.2-COU-0020:fp:BIN:aix4-r1::5573:4
3.6.2-COU-0020:fp:BIN:w32-ix86::5825:5
3.6.2-COU-0020:fp:BIN:sunos4::5648:6
3.6.2-COU-0020:id:ALIDB:Server Database:server:@[email protected]:default=/var/spool/Tivoli:ThisDir=@ALIDB@;ThisHost=@HostName@;ThisPkg=ALIDB;:
3.6.2-COU-0020:fp:ALIDB:solaris2::292:7
3.6.2-COU-0020:fp:ALIDB:hpux10::291:8
3.6.2-COU-0020:fp:ALIDB:aix4-r1::291:9
3.6.2-COU-0020:fp:ALIDB:w32-ix86::295:10
3.6.2-COU-0020:fp:ALIDB:sunos4::291:11
3.6.2-COU-0020:id:MAN:Man Pages:both:@Arch@:default=/usr/local/Tivoli/man:ThisDir=@MAN@;ThisHost=@HostName@;ThisPkg=MAN;:
3.6.2-COU-0020:fp:MAN:solaris2::21:12
3.6.2-COU-0020:fp:MAN:hpux10::21:13
3.6.2-COU-0020:fp:MAN:aix4-r1::21:14
3.6.2-COU-0020:fp:MAN:sunos4::21:15
3.6.2-COU-0020:id:SBIN:unknown:both:::ThisDir=@SBIN@;ThisHost=@HostName@;ThisPkg=SBIN;:
3.6.2-COU-0020:fp:SBIN:generic::0:16
3.6.2-COU-0020:patch_id:3.6.2-COU-0020
3.6.2-COU-0020:patch_id:3.6.2-COU-0017
3.6.2-COU-0020:patch_id:3.6.2-COU-0014
3.6.2-COU-0020:patch_id:3.6.2-COU-0011
3.6.2-COU-0020:patch_id:3.6.2-COU-0008
3.6.2-COU-0020:patch_id:3.6.2-COU-0007
3.6.2-COU-0020:patch_id:3.6.2-COU-0003
3.6.2-COU-0020:depends:COU_Upgrade_to_3.6.2
3.6.2-COU-0020:depends:TMF_3.6.4

All the prerequisite conditions must be met before a patch can be successfully deployed. Please do not attempt to change the prerequisites or install the patch without meeting the prerequisites, even though this would be possible through significant modifications of the patch files.
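The record layout shown in Example 9-1 lends itself to a read-only check of a patch's prerequisites before you attempt an installation. The following Python sketch is an illustration only and is not part of any Tivoli tool. It assumes that every record has the form patch:type:value, as in the example; the function name parse_ind and the sample data are hypothetical. The sketch only reads the index text and never writes to the .IND file.

```python
from collections import defaultdict


def parse_ind(text):
    """Group the records of a .IND file by record type.

    Assumption: each record looks like <patch>:<type>:<value>, so the
    line is split on the first two colons only. The value itself may
    contain further colons and is kept as one string.
    """
    records = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(":", 2)
        if len(parts) < 3:
            continue  # not a well-formed record; skip it
        _patch, record_type, value = parts
        records[record_type].append(value)
    return dict(records)


# A few records copied from Example 9-1.
SAMPLE = """\
3.6.2-COU-0020:patch_for:Courier
3.6.2-COU-0020:fp:BIN:solaris2::4425:2
3.6.2-COU-0020:patch_id:3.6.2-COU-0020
3.6.2-COU-0020:patch_id:3.6.2-COU-0017
3.6.2-COU-0020:depends:COU_Upgrade_to_3.6.2
3.6.2-COU-0020:depends:TMF_3.6.4
"""

if __name__ == "__main__":
    info = parse_ind(SAMPLE)
    print("prerequisites:", info.get("depends", []))
    print("patch_id entries:", info.get("patch_id", []))
```

Comparing the "depends" entries with what is actually installed remains a manual step here; the commands for listing installed products and patches are covered in 9.1.16.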

9.1.5 Obtaining patch files

There are four main sources for receiving patch files from Tivoli:
• The Tivoli Support Web site
• The Tivoli Support FTP site
• Your account management team
• Tivoli's software production and distribution centers (when CD-ROM images are burned)

To obtain the patches from the FTP site, open the following link in a Web browser (you can also download the patches from the command line using FTP):
ftp://ftp.software.ibm.com/software/tivoli_support/

You will see a window similar to Figure 9-1 on page 264.

Figure 9-1 Patches FTP site

Click on the patches folder and you will see the Tivoli patches, as in Figure 9-2 on page 265.


Figure 9-2 Tivoli patches

The patches are segmented by the maintenance release level, then listed by product. If you are unable to locate a required patch, you can contact your Tivoli account team for assistance. Not all patches released are burned to a CD-ROM. This is usually reserved for maintenance release patches. To obtain a CD-ROM version of a maintenance release patch, send e-mail to [email protected]. You will be required to provide your Tivoli customer number to receive the requested patch(es).

9.1.6 Turning the archives into binaries

The archive file that you download (if you download the patches from the FTP or Web sites) will be a .tar file. You will have to un-tar the file using the UNIX tar command or the Tivoli-provided tar utility on Windows NT managed nodes.

On Windows NT systems, do not use any tool other than the Tivoli-provided tar utility to unpack the archive. Installation problems have been recorded when other tar utilities or products, such as WinZip, were used. Note that files ending in .gz are gzip files, generated on UNIX systems. Certain utilities can un-gzip these files, but it is still recommended that you use the Tivoli tar command (in %BINDIR%\tools) to un-tar them. The Tivoli-provided tar utility can be found on all Windows NT TMRs and managed nodes and is readily available once you source the Tivoli environment.

Review the README files for each patch for specific instructions.

Please note that there have been occasions when distributed CD-ROM media contained possible manufacturing defects that prevented the patch from applying correctly. This may show up as an unpack error when attempting to install a FILExx.PKT file. If you encounter this type of error, obtain a second copy of the patch from a new download or from a second CD-ROM.

9.1.7 Patch contents

The best place to begin to understand the contents of a patch is with the README file. The README file contains the following information about patches and should always be read when you are considering the necessity of a patch:
• A general description of the patch
• The date it was released
• The defects that were fixed in this patch
• The architectures impacted by this patch
• Whether database changes are included in this patch
• The files changed by the patch
• Instructions for applying the patch

9.1.8 Building your collection of patches

As you obtain your patches, it is useful to compile the collection into a single repository of these images. If you employ SIS, this functionality is one of the leading features of the product. If you employ the classic Tivoli Management Framework install, the wcpcdrom command enables you to build a single install media image. This script requires that you keep track of the last media packet it created so that it can continue the next product install at that point.


9.1.9 Testing patches

Before installing any software in a production environment, it should be tested in a test environment. This section discusses considerations for testing a patch in your Tivoli environment. This discussion is a precursor to an impending whitepaper on building and executing a proper test environment. For this reason, this redbook does not address how to build a test environment. The general perspective is that your test environment should represent your production environment as closely as possible.

In general, the testing of patches in your test environment proceeds on several fronts simultaneously:
• Everything that worked before still works.
• The problem that the patch was supposed to address in your environment is resolved.
• Outstanding issues (which you have diagnosed and been able to reproduce) are tested to determine if they are addressed by other fixes in this patch.

Testing that your current environment is still viable after the application of this patch should take the following general perspectives:
• On a product-by-product basis, you are still able to do the things you should be able to do: Software distributions still work, Remote Control still works, Inventory still works, TEC still receives events, the DM monitors you use still fire, and so on. This testing should be done across all products for every patch.
• All those same abilities still exist across TMR boundaries.
• All components of the Tivoli Management Framework are still functional (epmgr, gateway, and so on).

To accomplish these goals, develop a test plan for your environment based on your products and how you use them. The use of a formal test plan is absolutely necessary to do these tests consistently and reliably. Add problem testing and identification to your formal test plan. Note that if your problem identification work has not been completely successful, the patch in question might not resolve your problem.

Your test plan should also include plans to test other reproducible problems in your environment. If there is the potential that this patch may resolve additional problems, be sure to test those issues after installing the patch.


As a general part of your test plan, you should also watch that your environment performs normally. You should encounter no new inexplicable problems during your testing.

9.1.10 Applying patches to your systems

One of the great new features of the three-tier architecture is the auto-upgrade feature of endpoints. First, let us talk about managed nodes and classic architecture patching, because this will be required to install patches that affect the upgrade of endpoints.

When applying patches to your systems (managed nodes), the following general rules apply:
1. Read the installation instructions thoroughly from the README file. Pay attention when a restart of the oserv, gateway, or the endpoint manager is indicated.
2. If a patch fails to apply on the first system (usually your TMR Server), stop. Do not apply the patch to more systems until you ascertain the reason for failure.
3. Do not "force" a patch into place without the specific guidance of Tivoli support. It is likely that the application of the patch failed for some reason that requires some type of work prior to applying the patch.
4. Do not attempt to create successful conditions for the application of the patch. For example, if you note that the patch fails to apply because it cannot find some files it was supposed to delete, do not create those files by touching files. The real problem determination work that should be done is to figure out why the files were not there.

Beyond those general rules, apply common sense. If the patch fails to apply because of an MDist2 error, for example, attempt to determine what the repeater settings are, why they may be causing the problem, and what you can do to alleviate the problem.

When a patch fails to apply to managed nodes, there are always error logs to be viewed. In SIS, there are Web browser files that contain the short version of the error information. This may or may not be sufficient. For the full text of the error, search the /tmp directory or, on NT, the %dbdir%/tmp directory. You will find files that contain the word “error” in their names. Check the file date and time stamps to determine whether they are the ones that you should be reading. Note that these files are overwritten in subsequent attempts, so be sure to save them under other names if you want to retain them for comparison.
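Because the error files are overwritten on the next attempt, it can help to copy them aside automatically after each failed installation. The following Python sketch shows one way to do this. It is an illustration under stated assumptions: the function name preserve_error_logs and the directory arguments are hypothetical, and the only rule taken from the text is that the interesting files have "error" in their names.

```python
import os
import shutil
import time


def preserve_error_logs(tmp_dir, save_dir):
    """Copy files whose names contain 'error' from tmp_dir to save_dir.

    Each copy has the source file's modification time appended to its
    name, so a later install attempt cannot overwrite it. Returns the
    paths of the copies, newest source file first.
    """
    os.makedirs(save_dir, exist_ok=True)
    candidates = []
    for name in os.listdir(tmp_dir):
        path = os.path.join(tmp_dir, name)
        if os.path.isfile(path) and "error" in name.lower():
            candidates.append((os.path.getmtime(path), name, path))

    saved = []
    for mtime, name, path in sorted(candidates, reverse=True):
        stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(mtime))
        target = os.path.join(save_dir, f"{name}.{stamp}")
        shutil.copy2(path, target)  # copy2 also keeps the time stamps
        saved.append(target)
    return saved
```

A call such as preserve_error_logs("/tmp", "/var/tmp/patch-logs") would then keep one dated copy per failed attempt; the second path is only an example location.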


Sometimes the failure of a patch to apply can be understood more thoroughly by delving into the patch itself. It is possible to "unpack" a patch onto what must be a non-Tivoli system (no existing environment) so that you can examine the patch scripts for further troubleshooting purposes. This is accomplished by copying the executable sapack, which can be found in $BINDIR/TAS/Install, and the FILExx.PKT file that appears to be failing to this non-Tivoli system and then invoking:
sapack -u FILExx.PKT

This file packet attempts to do its job, but then fails to remove itself. You can then examine the contents of those scripts and possibly more accurately determine what is causing the patch install to fail. You can and should also review the FILExx.CFG file that corresponds to each file packet in a patch. These configuration files list the binaries that exist in each file packet and the general defaults that will be used in the execution of the installation of that file packet.

9.1.11 More rapid deployment

As an alternative method for more rapid patch deployment on managed nodes, consider a modified gold image process. This can be very significant, especially in environments where network bandwidth is at a premium. The process starts with one of the systems from your test environment that (as a good test environment should) exactly mirrors your production managed nodes. Apply the patch to this system and then make a tar file or file package with the binaries and libraries (and sections other than the database). The following two approaches may be employed:
• Build a CD-ROM and use it to update the binaries and libraries of the managed node, and then immediately follow with the installation of the patch (usually just the database portion). This can be done by a script from the CD-ROM using wpatch, or by coordinated effort using the GUI classic or SIS install.
• Deploy the gold-image binaries to your systems using the Software Distribution implementation that you have employed. This can be done over a very long period of time while you complete your rigorous testing procedures. When you are satisfied with the test results, do a commit distribution and then do your patching through the GUI or script. In this way, you lessen the impact of applying the patch by spreading the work over a longer period of time.


9.1.12 Tag files

When patches are applied to managed nodes, there are a series of marker files deployed to keep the installation process quick and clean. These marker files are the so-called tag files found in each main area of Tivoli within hidden (.installed) directories. These marker files are not indications that a patch was installed successfully with 100% accuracy.

With the classic Tivoli Management Framework installation methodology, these tag files are text files whose name indicates the patch type, as can be seen in the following example from $BINDIR/.installed.

Example 9-2 Tag files
-rw-rw-rw- 1 0 0 19 Mar 29 1999 ADE_ADE1
-rw-rw-rw- 1 0 0 18 Mar 29 1999 ADE_BIN
-rw-rw-rw- 1 0 0 19 Mar 29 1999 ADE_LBIN
-rw-rw-rw- 1 0 0 18 Mar 17 1999 Admin_BIN
-rw-rw-rw- 1 0 0 18 Mar 17 1999 CourierGw_BIN
-rw-rw-rw- 1 0 0 18 Mar 17 1999 Courier_BIN
-rw-rw-rw- 1 0 0 18 Mar 17 1999 SIS_3.6_BIN
-rw-rw-rw- 1 0 0 18 Dec 15 15:25 Sentry2.0.2_BIN
-rw-rw-rw- 1 0 0 7 Jun 16 1999 TMF_3.6.1_BIN
-rw-rw-rw- 1 0 0 22 Aug 10 1998 TMF_BIN
-rw-rw-rw- 1 0 0 18 Dec 15 15:40 pa_3.6.1DMN-MR_BIN

Each of these tag files contains the text: PUSHED::BIN

This text indicates the name of the machine on which the files were originally installed (it can be modified for gold image processes to say goldimage, for example) and the type of patch (in this case, binaries).

With the SIS installer, these tag files were changed. They still exist; however, they contain only the name of the node. SIS converts tag files from the classic install format to its own format. Those tag files cannot be automatically converted back for the purposes of doing a classic install; an attempt to do so results in a failure to install.

Sometimes the tag files can be left written in the wrong place (usually in the parent directory of .installed). This can indicate a failure in the patch process. When a patch is successfully deployed, these files are first written to the parent directory and then moved into the .installed directory during the cleanup activities. These tag files are not the registration of the patch, as you will see during the install process.

Using the SIS installer, you might need to override the tag files to repeat your attempt to install. You can read how to do this in the SIS manual. In the classic Tivoli Management Framework installation, you can override a prior installation by manually removing the tag files from the $BINDIR, $DBDIR, and $LIBDIR .installed directories before reinstalling, or with the wpatch command, such as the following:
wpatch -c -i BIN=BINDIR! DBDIR=DBDIR! LIB=$LIBDIR!
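Before removing or overriding tag files, it is useful to record what they currently contain. The following Python sketch lists the tag files under one area's .installed directory together with their text. It is a minimal illustration: the function name read_tag_files is hypothetical, and the only assumptions taken from the text are that the tag files are small text files and that the BINDIR variable is set once the Tivoli environment has been sourced.

```python
import os


def read_tag_files(area_dir):
    """Map each tag file name under area_dir/.installed to its text.

    area_dir is a directory such as the one $BINDIR points to.
    Returns an empty dict if there is no .installed directory.
    """
    installed = os.path.join(area_dir, ".installed")
    tags = {}
    if not os.path.isdir(installed):
        return tags
    for name in sorted(os.listdir(installed)):
        path = os.path.join(installed, name)
        if os.path.isfile(path):
            with open(path, "r", errors="replace") as handle:
                tags[name] = handle.read().strip()
    return tags


if __name__ == "__main__":
    # BINDIR is only set after the Tivoli environment has been sourced.
    bindir = os.environ.get("BINDIR")
    if bindir:
        for name, text in read_tag_files(bindir).items():
            print(f"{name}: {text}")
```

Saving this listing before and after an installation attempt gives you a simple record to compare, but as noted above, the presence of a tag file does not prove that the patch installed correctly.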

9.1.13 Upgrading endpoints

There are two types of endpoint upgrades:
• Endpoint binaries (daemon)
• Endpoint application binaries

To upgrade the endpoint binaries themselves, use the Tivoli command wadminep, which is run from a managed node.

Note: We recommend that you do not use the login_policy scripts to upgrade endpoints. For performance reasons, any w-commands should run outside of the endpoint policy scripts.

For large numbers of endpoints, use tasks or scripts to contact a manageable number of endpoints (preferably during a time of low TMR activity) and upgrade them. Do not attempt to run all the endpoints at once in batch mode. Those endpoints that cannot be contacted from the first list can get added to the next batch. This may span a period of several days or weeks, depending on the environment. The upgrade of binaries of applications on the endpoints happens without intervention as the binary is called. When the binary is called, a mismatch with the copy on the gateway is detected, and the new copy is automatically downloaded to the endpoint before execution. This is completely transparent to you.
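The batching approach described above can be sketched in a few lines of Python. This is an illustration of the control flow only: the function name upgrade_in_batches, the batch size, and the upgrade callable are all hypothetical. The upgrade callable stands in for whatever task or script actually contacts an endpoint and is assumed to return True on success.

```python
def upgrade_in_batches(endpoints, upgrade, batch_size=50, max_batches=10):
    """Upgrade endpoints one batch at a time, retrying failures later.

    upgrade is a caller-supplied function that takes an endpoint label
    and returns True on success. Endpoints that cannot be upgraded are
    appended to the pending list, so they are tried again in a later
    batch. Returns (upgraded, still_pending).
    """
    pending = list(endpoints)
    upgraded = []
    for _ in range(max_batches):
        if not pending:
            break
        batch, pending = pending[:batch_size], pending[batch_size:]
        for label in batch:
            if upgrade(label):
                upgraded.append(label)
            else:
                pending.append(label)  # add to a later batch
    return upgraded, pending
```

In practice you would run one batch per low-activity window rather than in a tight loop, and keep the still_pending list as the starting point for the next window.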


9.1.14 Validating patch application

When applying a patch to your environment, make use of the error reporting options made available through the Tivoli installation tools. However, because of certain limitations (human error, code failures, and so on), a patch may fail to install without being noticed. Make validating patch installations a part of your standard installation process. This helps to avoid undetected problems that can propagate through several backups. It also ensures that, if the problem you expected the patch to fix persists, the cause is not a failed patch installation. The deployment process in large networking environments can be lengthy and costly to repeat.

The most certain way to determine that a patch was properly applied to your systems is to compare the file date and time stamp, file size, and permissions listed in the image.rpt file (when included with the patch) with those on your system. This .rpt file can be found in the patch directory or subdirectory that was created when the patch tar file was unpacked.

If the patch affects code that is downloaded to run on the endpoints, you can and should compare both the copy on an endpoint and the copy on the gateway after attempting an operation that invokes it. For example, the inventory scanner executable is involved in an inventory scan. Check scanner.exe, which exists on the gateway in the lcf_bundle directory and in the related directory on the endpoint.
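The comparison itself is easy to automate once the expected values are known. The following Python sketch checks file sizes and permission bits against a table of expected values. It is an illustration under stated assumptions: the layout of image.rpt is not described here, so the expected table is a hypothetical dictionary that you would fill in by hand from the report, and the function name find_mismatches is also hypothetical. Time stamps are left out of the sketch because their comparison depends on how the report records them.

```python
import os
import stat


def find_mismatches(expected, root):
    """Compare files under root with expected sizes and permissions.

    expected maps a path relative to root to a dict with a required
    'size' (bytes) and an optional 'mode' (permission bits, e.g. 0o755).
    Returns a list of (relative path, description) for every problem.
    """
    problems = []
    for rel_path, want in sorted(expected.items()):
        path = os.path.join(root, rel_path)
        if not os.path.isfile(path):
            problems.append((rel_path, "missing"))
            continue
        info = os.stat(path)
        if info.st_size != want["size"]:
            problems.append(
                (rel_path, f"size {info.st_size} != {want['size']}"))
        mode = want.get("mode")
        actual_mode = stat.S_IMODE(info.st_mode)
        if mode is not None and actual_mode != mode:
            problems.append((rel_path, f"mode {actual_mode:o} != {mode:o}"))
    return problems
```

An empty result means that every listed file is present with the expected size and permissions. The same function can be run against the gateway copy and the endpoint copy of a downloaded binary.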

9.1.15 How to back out of a patch

Uninstalling a patch, whether it is partially or completely installed, is called backing out. To back out, you have to restore your Tivoli environment to its state prior to the patch installation.

Restoring managed nodes

With the current Tivoli architecture, there is no good method provided from within the Tivoli product to back out of an installation of a patch on managed nodes. However, the following two approaches are suggested methods for rolling back to a previous state:
1. Restore file system backups from before the installation of the patch. Restore the Tivoli database to the backup that corresponds to that file system backup.
2. Create a tar file of the Tivoli directory as if you were creating the previously mentioned gold image to deploy a new patch. This should be done before the application of the patch. You will need one tar file for each platform (provided that all managed nodes of each platform type have identical code installations). You can restore this single tar file to every managed node of that platform type and restore the Tivoli database backup to each node (if necessary).

It is important to note that in this case we are talking about identical product installs across all managed nodes of each platform type. You must create a backup or tar file for each different product install base and for each different platform type. It does not matter whether your managed node was created on the first day of your Tivoli implementation at lower product versions and later upgraded to the current level, or whether the current Tivoli product versions were loaded as a new installation. In order for Tivoli products to be functional, all the relevant code must be identical. A long-deployed managed node may have code in a directory that is no longer employed and that is not in the same directory on newer managed nodes. This code cannot be in use; if it were, it would be present on all your managed nodes.

Restoring endpoints

If you restore the file system backup to your gateway and TMR Server and issue the wadminep upgrade command, your endpoint LCF code reverts to the version restored on its gateway.

Note: Instead of wadminep upgrade, you can also use the wepupgrd command. Both write to the same log, epupg.log. For more information on wepupgrd, refer to Tivoli Management Framework Reference Manual Version 4.1, SC32-0806.

9.1.16 Knowing what is installed where

There are two ways to view what products and patches are installed in a Tivoli environment. One is to view all products and patches as a whole, installed on a particular node. The other is to view each product or patch individually across all nodes. You will find two scripts in Appendix C, “Scripts” on page 1065 that will allow you to quickly determine installations across your enterprise from either view.

Tivoli provides the wlsinst command with numerous options to determine installed products and patches; however, it is sometimes cumbersome to find the information you are looking for. The following scripts are customized to help find specific information on a particular product or node and are seen as an enhancement to the wlsinst command.


While the authors have provided scripts to automate some of the discovery process, it is worthwhile to explain how the information is accessed. To determine what products are installed on a TMR, execute the following command:
wls -l /Library/ProductInfo

See Example 9-3 for output of the command. This command identifies what products are installed, although it does not identify where they are installed or at what patch level.

Example 9-3 wls -l /Library/ProductInfo output
bass:/#wls -l /Library/ProductInfo
1438246632.1.560#TMF_Install::ProductInfo# Courier
1438246632.1.614#TMF_Install::ProductInfo# Sentry2.0.2
1438246632.1.705#TMF_Install::ProductInfo# TecMonitors
1438246632.1.707#TMF_Install::ProductInfo# TmeMonitors
1438246632.1.709#TMF_Install::ProductInfo# NTMonitors
1438246632.1.737#TMF_Install::ProductInfo# UnixMonitors
1438246632.1.739#TMF_Install::ProductInfo# CourierGw
1438246632.1.783#TMF_Install::ProductInfo# Inventory
1438246632.1.828#TMF_Install::ProductInfo# InventoryGW
1438246632.1.962#TMF_Install::ProductInfo# ACF
1438246632.1.1060#TMF_Install::ProductInfo# ADE
1438246632.1.1077#TMF_Install::ProductInfo# AEF
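When this listing is collected by a script, it is convenient to turn it into a lookup table from product label to object ID, because the object ID is what the idlcall commands in the rest of this section need. The following Python sketch shows one way to do this. It assumes that each output line pairs an object reference of the form OID#class# with a label, as in Example 9-3; the function name parse_wls is hypothetical.

```python
def parse_wls(output):
    """Return {label: object ID} from `wls -l /Library/ProductInfo` text.

    Assumption: every useful line has exactly two whitespace-separated
    fields, an object reference such as
    1438246632.1.560#TMF_Install::ProductInfo# and a label.
    Prompts, blank lines, and anything else are ignored.
    """
    products = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 2 or "#" not in fields[0]:
            continue
        reference, label = fields
        products[label] = reference.split("#", 1)[0]
    return products


# Three lines copied from Example 9-3, including the prompt line.
SAMPLE = """\
bass:/#wls -l /Library/ProductInfo
1438246632.1.560#TMF_Install::ProductInfo# Courier
1438246632.1.783#TMF_Install::ProductInfo# Inventory
"""

if __name__ == "__main__":
    for label, oid in parse_wls(SAMPLE).items():
        print(label, oid)
```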

To determine if a particular product is installed on a node, you can execute the following command:
idlcall $OID _get_host_locations '{ 1 "$HostName" }' | idlarg 1

Where $OID is the object ID number of a TMF_Install::ProductInfo object and $HostName is the name of the managed node or TMR you are inquiring about. In Example 9-4, we execute the command against the Software Distribution object.

Example 9-4 idlcall $OID _get_host_locations output
bass:/#idlcall 1438246632.1.560 get_host_locations '{ 1 "bass" }' | idlarg 1
5bass:/#

If the return code is anything other than 0, the product is installed on that node. In the above example, the return is 5. We get the return code in numeric format because we are piping the complete return through idlarg and asking for the first instance (the idlarg command extracts individual arguments from a list). If we did not pipe the results through idlarg, we would get the return seen in Example 9-5 on page 275.


Example 9-5 idlcall $OID _get_host_locations output
bass:/#idlcall 1438246632.1.560 get_host_locations '{ 1 "bass" }'
{ 5
{ "bass" "/usr/local/Tivoli/msg_cat " "CAT" "aix4-r1" "/mnt/CFG/FILE1.CFG" "installed" }
{ "bass" "/usr/local/Tivoli/bin aix4-r1 " "BIN" "aix4-r1" "/mnt/CFG/FILE2.CFG" "installed" }
{ "bass" "/var/spool/Tivoli bass.db " "ALIDB" "aix4-r1" "/mnt/CFG/FILE7.CFG" "installed" }
{ "bass" "/usr/local/Tivoli/man aix4-r1 " "MAN" "aix4-r1" "/mnt/CFG/FILE17.CFG" "installed" }
{ "bass" "/usr/local/Tivoli/bin generic " "SBIN" "aix4-r1" "/mnt/CFG/FILE22.CFG" "installed" }
}bass:/#

This provides us with further information on the installation of a particular product. The number 5 indicates that there are five components installed on that node. In this example, we see that the CAT, BIN, ALIDB, MAN, and SBIN components are installed, as well as the location and architecture type.

While there are many attributes associated with a ProductInfo object, there are four attributes that are helpful when researching installation information:
• OID
• Label
• Description
• Revision

We can obtain the OID and Label by executing wls -l /Library/ProductInfo. Once we have the OID, we can run idlcall to get the description and revision information:
idlcall $OID _get_description
idlcall $OID _get_revision

Example 9-6 continues with the same Software Distribution example.

Example 9-6 idlcall _get_revision and idlcall _get_description
bass:/#idlcall 1438246632.1.560 _get_revision
"3.6"
bass:/#idlcall 1438246632.1.560 _get_description
"TME 10 Software Distribution, Version 3.6"
bass:/#

To identify the patches installed for a particular product, run the following command:
idlcall $ProductOid _get_patches

Example 9-7 on page 276 contains the output of the command.


Example 9-7 idlcall $OID _get_patches
bass:/#idlcall 1438246632.1.560 _get_patches
{ 2
1438246632.1.613#TMF_Install::PatchInfo#
1438246632.1.742#TMF_Install::PatchInfo#
}bass:/#

This will provide you with the patch objects installed for a given product. It does not tell you where the patch is installed. You can use the same idlcall $OID _get_host_locations command to determine whether or not it is installed on a particular node. To get the label of a particular patch, execute the following:
idlcall $PatchOid _get_label

Many patches supersede, or are inclusive of, previous patches. Upon installation, the previous patches are registered as if they had been installed separately. To identify which previous patches are registered, you need to view the alias list for each patch. To determine the aliases for each patch, execute the following:
idlcall `wlookup NameRegistry` lookup '"PatchInfo" "$PatchLabel"' | idlarg 3 | idlarg 2

Where $PatchLabel is the label of the patch as identified above. Example 9-8 contains the output of the command.

Example 9-8 Patch alias information
bass:/#idlcall `wlookup NameRegistry` lookup '"PatchInfo" "3.6.2-COU-0020"' | idlarg 3 | idlarg 2
{ 7 "3.6.2-COU-0003" "3.6.2-COU-0007" "3.6.2-COU-0008" "3.6.2-COU-0011" "3.6.2-COU-0014" "3.6.2-COU-0017" "3.6.2-COU-0020" }
bass:/#
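The brace-delimited lists that idlcall prints, as in Examples 9-5, 9-7, and 9-8, are simple enough to parse in a script of your own. The following Python sketch converts such text into nested lists. It is an illustration based only on the output shown in this section, not on a formal description of the idlcall output format: it assumes that the output consists of braces, double-quoted strings, and bare tokens, and the function names parse_idl and component_names are hypothetical. Pass it the command output only, without the trailing shell prompt.

```python
import re

# A token is a double-quoted string, a single brace, or a bare word.
_TOKEN = re.compile(r'"([^"]*)"|([{}])|([^\s{}"]+)')


def parse_idl(text):
    """Parse brace-delimited idlcall output into nested Python lists.

    Quoted strings are kept as str (without the quotes), bare tokens
    made of digits become int, and every { ... } becomes a list.
    """
    stack = [[]]
    for match in _TOKEN.finditer(text):
        quoted, brace, bare = match.groups()
        if brace == "{":
            stack.append([])
        elif brace == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif quoted is not None:
            stack[-1].append(quoted)
        else:
            stack[-1].append(int(bare) if bare.isdigit() else bare)
    if len(stack) != 1:
        raise ValueError("unbalanced '{'")
    return stack[0]


def component_names(text):
    """Return the component names from get_host_locations output.

    Assumes the layout of Example 9-5: a count followed by one inner
    list per component, with the component name as the third item.
    """
    outer = parse_idl(text)[0]
    return [entry[2] for entry in outer[1:]]
```

For the alias list in Example 9-8, parse_idl returns one list whose first item is the count and whose remaining items are the patch labels, so a membership test such as "3.6.2-COU-0017" in parse_idl(text)[0] tells you whether an earlier patch is registered.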

The first script, “prodpatch.pl” on page 1076, allows you to view all products and patches installed on a given managed node with options. The second script, “installed.pl” on page 1080, views each product individually. This script will allow you to select which product you would like the installation information from; it will then list the product and patch attributes for all nodes in the TMR. These scripts use the commands described above as the base for gathering patch installation information. These scripts can be modified or you can create your own scripts using these commands.


Chapter 10. Backup and restore

Tivoli keeps all its management data and resource definitions in a distributed database. The TMR Server’s database contains all the objects from the whole TMR. Each managed node keeps a subset related to itself. These databases must be backed up on a regular basis. The backup should be done not just at scheduled regular intervals but also before and after a big change, such as a product installation.

The following topics are discussed in this chapter:
• Section 10.1, “Backup process”
• Section 10.2, “Restore process”
• Section 10.3, “Troubleshooting backup and restore operations”


10.1 Backup process

You can back up the database for one or several clients and the TMR Server, or for all of the TMR, from either the command line or the desktop. You can make this backup immediately or schedule a regular backup operation.

There are two methods for restoring a database. The first is the standard method to use if a system is otherwise operational (and the oserv is running). The second is referred to as a rescue operation and is used if the oserv cannot be started.

An important part of any maintenance program is verified Tivoli database and file system backups. These ensure the stability and recovery of the Tivoli environment in case of database corruption or disk failure. Ideally, these backups should be archived using a standard backup tool, such as Tivoli Storage Manager.

10.1.1 Backup roles and access rights

There are a number of security features used in the backup process. You need to check the following:
• You must have the backup or super role in the TMR to create a backup and the restore or super role in the TMR to perform a restore.
• The Administrator also requires a valid user login name and a group name for the machine in which the backup file will be stored.
• To change the user login name and group name, open the Administrators window, right-click the Administrator icon and select Edit Properties.... Be sure you have the desired IDs and then restart the Tivoli Desktop if a change was made. These changes must be done from the Tivoli Desktop, because there is no command for this action.
• The ID used will need write permissions for the directory that will contain the backup file.

The following is a list of ways to ensure correct access:
• Create a new backup group of all administrators who will perform backups and assign ownership of the backup directory to that new group.
• Change the permissions on the backup directory to allow anyone to write to the directory. (This is not recommended as a permanent solution.)
• Create a task for administrators with the backup role that runs as root and performs the wbkupdb command.
• Schedule the backups as the root administrator and let Tivoli perform the work.


• Change the user login name and group name to valid IDs that have enough permissions to make the backup. The steps to change the user login name and group name of an administrator are explained in 10.1.1, “Backup roles and access rights” on page 278.

Note: Be careful when writing backups to an NFS file system. You must consider the implications of using root on NFS file systems. For example, many NFS servers map the root user to an unprivileged ID, so a backup that runs as root may not be able to write to the exported directory.
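The operating system side of these requirements can be tested before a backup is attempted. The following is a minimal sketch and not a Tivoli-supplied tool: check_backup_dir is a hypothetical helper name, and the sketch assumes it is run under the login name that is configured for the administrator. It checks directory permissions only; it does not check Tivoli roles.

```shell
# check_backup_dir DIR
# Succeeds (exit status 0) only if DIR exists, is a directory, and is
# writable and searchable by the user running the check.
# The function name and the return codes are assumptions for this sketch.
check_backup_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "ERROR: $dir is not a directory" >&2
        return 1
    fi
    if [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
        echo "ERROR: $dir is not writable by $(id -un)" >&2
        return 2
    fi
    return 0
}

# Possible usage, with the example directory used later in this chapter:
# check_backup_dir /usr/backups && echo "backup directory is usable"
```

Running the check as the same ID that the backup will use matters, because a directory that root can write to may still be closed to the administrator's login name.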

10.1.2 Database

The built-in database backup tools back up only the Tivoli object database and associated files, not the binaries or libraries. In order to back up your entire Tivoli environment, you must include your TMR Servers in your regular file system backups.

It is advisable to periodically check the consistency of the database with wchkdb -u. You might also consider running bdbx as part of the backup procedure. However, failure to achieve a clean wchkdb or bdbx run should not stop you from making database backups at scheduled intervals. Many errors can be corrected, for example, by running wchkdb -u a second time. In the worst case, should your Tivoli database reach an unrecoverable state, a recent backup containing a functioning Tivoli database with some errors is better than no database at all.

Each environment is different, and the requirements for maintaining database backups differ as well. Backups should be performed often while developing the Tivoli environment and frequently once Tivoli is deployed. A simple way to determine the frequency for your database backups is to determine how much data you can afford to lose. A good practice is to perform backups immediately before and after a product installation or major maintenance procedure, such as the creation of numerous managed nodes. These backups should be kept on separate media and in a secure place.

Desktop

The simplest way to schedule database backups is through the Tivoli Desktop. Use the following steps to schedule a backup of one or more machines in the TMR:

1. Select Backup... from the Desktop menu to display the Backup Tivoli Management Region dialog.


Figure 10-1 Backup Tivoli Management Region dialog

2. Select one or more managed nodes from the Available managed nodes scrolling list and press the left arrow button. The chosen managed nodes are moved from the Available managed nodes scrolling list to the Backup these managed nodes scrolling list. If you wish to back up the entire TMR, select all managed nodes listed in the Available managed nodes scrolling list. This dialog also allows you to set the destination node of the save image and the device and file name of the backup. The %t shown includes the date in the file name.
3. Specify the machine you wish to save the backup image on in the Save image on node field.
4. Specify the device or file name of the backup in the Device/File field. The %t shown adds the date in the file name.
5. Press the Estimate Backup Size button. The Estimate Backup Size dialog is displayed.
6. Assuming the estimated size can be adequately handled, press Close to return to the Backup Tivoli Management Region dialog. See Chapter 5, “Temporary Backup File Considerations”, of Tivoli Enterprise Internals and Problem Determination, SG24-2034 for more information on space requirements for Tivoli backups.
7. To begin an immediate backup of the selected managed nodes, press Start Backup. The Backup Status dialog is displayed, and the backup operation begins.


8. To schedule the backup to occur at a later time or repeat on a timed schedule, press Schedule Backup (see Figure 10-2).
9. Complete the Add Scheduled Job dialog, and press Schedule Job & Close. The Add Scheduled Job dialog closes and you return to the Backup Tivoli Management Region dialog.

Figure 10-2 Add Scheduled Job dialog


Upon completion of a scheduled job, the scheduler provides notification of success or failure using one of four notification options:
• Post Tivoli Notice
• Post Status Dialog on Desktop
• Send E-mail to
• Log to File

Therefore, after each backup, a notification is sent that an individual must monitor in order to take action in case of errors. Especially in large enterprise environments, notification of all database backups can quickly become tedious. Your log files and notice groups may grow too large and e-mail quickly adds up, making it difficult to catch an error notification among many successful notifications. Also, whereas backup errors or failures must be reported so that action can be taken, no action is required for successful backups, and therefore notifications reporting a successful backup may not be needed.

In order to automate the backup monitoring process, you can use custom scripts to filter what is sent to Tivoli notice groups and log files. See “Custom scripts” on page 287 for more information about using custom scripts to log only failed backups.

Another way to make backup status monitoring easier is to create a custom notice group specifically for backup notifications. A custom notice group for database backups provides a single reference location to determine success or failure. Other currently defined notice groups may become too cluttered with administrative information to easily find failures. With the creation of a custom notice group, we can now select it to log our database backups, as shown in Figure 10-3 on page 283.


Figure 10-3 Select Notice Groups dialog

Upon completion of a backup, the corresponding notice gets posted in this notice group, as shown in Figure 10-4 on page 284.


Figure 10-4 Notice Group Messages dialog

A custom notice group is also beneficial for implementing custom backups via scripts. See “Custom scripts” on page 287 for more information.

10.1.3 Running backup from the command line

The wbkupdb command backs up and restores Tivoli databases. You can provide a list of managed node names as arguments to the wbkupdb command. We give a couple of examples here, but for more information, see the online manual page for the wbkupdb command or the Tivoli Management Framework Reference Manual Version 4.1, SC32-0806.


Important: Do not forget that wbkupdb only backs up the odb type files in $DBDIR (five altogether). It will not back up tasks, TEC rulebases, or any TLL or custom methods. You can write a simple script that collects these files and puts them under $DBDIR, so that wbkupdb will pick them up.

Examples of the wbkupdb command

The following example backs up the TME10 database for all managed nodes in the TMR from which the wbkupdb was run. The backups are written to the user-defined file /usr/backups/TMR1.bk:

    % wbkupdb -d /usr/backups/TMR1.bk

Note: The backup file will be created in the specified directory on the TMR Server unless you specify that the backup is to be created on another managed node. Use the -h option to specify the managed node on which the backup file will be created.

The second example backs up the database of a single managed node, rh0255a. In this example, a destination directory and file name are not specified. The backup is, therefore, written to the directory containing the Tivoli database directory under a subdirectory called backups. The subdirectory is created if it did not exist when the wbkupdb command was run:

    % wbkupdb rh0255a

10.1.4 Backup process behind the scenes

The following steps are taken by the backup process during a database backup:

1. The list of clients is generated if it was not passed through the backup dialog or the wbkupdb command. This list can be generated outside of the backup process using the command:

       wls /Library/BackupClient

2. The list of files needing to be backed up is found for the server and for the clients. These lists can be generated outside the backup process using the following commands:

       BIO=`wlookup -r Classes TMF_BackupImpl`
       idlcall $BIO _get_server_files
       idlcall $BIO _get_client_files

3. The backup host is contacted, and the backup file is created and opened.
4. A popup dialog stating that the backup process is beginning is displayed on active desktops.


5. A transaction is begun.
6. For each managed node passed in through the dialog, command line, or generated list, the backup process will do the following:
   a. Determine if the managed node is the TMR Server or a client.
   b. Contact the managed node and start the snapshot method, passing in the list of files to be backed up. The method executable is a shell script called $BINDIR/TAS/BACKUP/snapshot.sh for UNIX or %BINDIR%\TAS\BACKUP\snapshot for Windows NT.
   c. The managed node synchronizes its database to write any outstanding transactions.
   d. Database files are tarred and compressed.
   e. The database is synchronized a second time, and the files are tarred and compressed and compared to the first snapshot. If they match, the backup process continues; otherwise, this step is repeated up to three more times before failing.
   f. Data is sent back to the server using an established IOM channel.
   g. The TMR Server backs up its own database. By default, backups are put into a directory called backups, one directory up from $DBDIR ($DBDIR/../backups), with a name that includes a time and date stamp.
   h. The transaction ends.
   i. A notice is logged and sent to a notice group.
   j. If the backup process fails, then messages may be written to the backupdb.log file.

10.1.5 Temporary backup file considerations

As you can see from the previous list of backup steps, when Tivoli performs a backup, it creates two copies of the backup file to compare. If you back up from the desktop, this compare file is written to the database directory. If you run backup from the command line using the wbkupdb command, the compare file is written to the current directory. In both cases, the directory in which the file is written should have as much available disk space as the largest compressed database.

If the directory does not have enough space, you can change the directory in which this file is created by setting a Tivoli oserv environment variable. This is the TMPDIR environment variable on UNIX, or the TEMP or TMP environment variable on Windows NT systems.
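The free space in the directory that will receive the compare file can be checked before the backup starts. The sketch below is illustrative only: avail_kb and has_space are hypothetical helper names, and the required size has to be supplied by the caller (for example, the size in KB of the largest compressed backup seen so far).

```shell
# avail_kb DIR
# Print the free space, in KB, of the file system that holds DIR.
# df -P gives the portable one-line-per-file-system layout, in which
# the fourth column is the available space.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_space DIR NEEDED_KB
# Succeed if the file system holding DIR has at least NEEDED_KB free.
has_space() {
    free=$(avail_kb "$1")
    [ -n "$free" ] && [ "$free" -ge "$2" ]
}

# Possible usage before a command-line backup, where the compare file is
# written to the current directory (204800 KB is an arbitrary example):
# has_space . 204800 || echo "not enough room for the compare file" >&2
```

If the check fails, the TMPDIR procedure that follows can be used to point the oserv at a larger directory.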


The following steps detail how to set the TMPDIR environment variable specifically for the oserv. You can also use these steps to set the TEMP and TMP variables by substituting TMPDIR with the appropriate variable name:

1. Retrieve the current environment settings and write them to a file using the following command:

       odadmin environ get > file_name

2. Append the new TMPDIR setting to the file you created:

       echo "TMPDIR=/home/big_dir" >> file_name

3. Write the environment settings back to Tivoli using the following command:

       odadmin environ set < file_name

The temporary backup file is deleted when the backup procedure completes.
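When this procedure is repeated, appending with echo can leave more than one TMPDIR line in the file. The following sketch replaces an existing setting instead of appending a second one. The helper name set_env_var and the file name /tmp/oserv.env are hypothetical, and how odadmin treats a duplicate entry is not covered here, which is why the sketch avoids creating one.

```shell
# set_env_var FILE NAME VALUE
# Rewrite FILE so that it contains exactly one NAME=VALUE line.
# Any existing NAME=... line is dropped and the new one is appended.
set_env_var() {
    file=$1
    name=$2
    value=$3
    tmp="$file.tmp.$$"
    # grep exits non-zero when no other lines remain; that is fine here.
    grep -v "^$name=" "$file" > "$tmp" || :
    printf '%s=%s\n' "$name" "$value" >> "$tmp"
    mv "$tmp" "$file"
}

# Possible usage around the odadmin commands from the steps above:
# odadmin environ get > /tmp/oserv.env
# set_env_var /tmp/oserv.env TMPDIR /home/big_dir
# odadmin environ set < /tmp/oserv.env
```

The same helper works for TEMP or TMP by changing the second argument.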

Custom scripts

Using custom scripts to schedule backups allows increased configuration options and error reporting. These scripts can be configured to use different scheduling options, such as:
• Tivoli Workload Scheduler
• Tivoli Desktop Scheduler (via tasks)
• UNIX cron facility

The primary benefit of using custom scripts is tailored reporting. In certain instances, such as with backups, you may only wish to receive failure notices in your reports. Custom scripting enables you to accomplish this, automating the error notification process and reducing the administrative workload of having to review logs or notice groups.

The script shown in Example 10-1 on page 288 runs wbkupdb on every managed node in a TMR and logs only backup error notifications to the custom notice group CST DB Backups. You can run wbkupdb without any flags to back up all the managed nodes within a TMR. However, if even one managed node is unreachable, a failure notice results for the entire process. This requires an administrator to parse through the logs or notice groups to determine why the backup failed.

This script, however, first pings the first managed node listed in the TMR to check for a working connection. If the managed node is unreachable, an error is reported before attempting to run wbkupdb. If it is reachable, it runs wbkupdb and only reports if the backup was unsuccessful. It then repeats this process for each managed node in the TMR.

Automating your TMR backups this way reduces administrative and processing overhead because:


• It only runs wbkupdb if the node is reachable.
• It only reports errors for the managed nodes that were unreachable or failed wbkupdb for other reasons.
• It processes each managed node separately, so that the logging occurs separately for each node, eliminating the need to review long notices or log files.

An improvement to the custom backup script in Example 10-1 would be to send the error notices directly to the Tivoli Enterprise Console instead of the notice group or log file. This could be accomplished by using the following command:

    wpostemsg -r WARNING -m "ERROR: The database backup of $BackupHost, was not successful." hostname=$WLOCALHOST TMRMGMT_DB_BACKUP SENTRY

By sending the errors directly to TEC, an administrator does not need to continue to review log files or notice groups for errors. All errors will automatically be sent to TEC and can be monitored from there.

Example 10-1 Custom backup script

    #!/usr/bin/perl

    # Set the Tivoli environment variables
    &Set_Env();

    # Back up all known hosts
    for (`wls /Library/ManagedNode`) {
        ($Host) = split(/\s/, $_);
        &Backup($Host);
    }

    # Subroutine to complete the backup(s)
    sub Backup {
        local($BackupHost) = @_;
        chomp $BackupHost;
        $BackupHost =~ s/\s//g;

        # Only attempt the backup if the managed node answers a wping
        `wping $BackupHost`;
        if ($? == 0) {
            `wbkupdb -d $ENV{'DBDIR'}/../backups/$BackupHost%t $BackupHost`;
            if ($?) {
                $Error = "ERROR: The database backup of $BackupHost, was not successful.";
                `echo "$Error" | wsndnotif "CST DB Backups" Error`;
            }
        } else {
            $Error = "ERROR: The ManagedNode, $BackupHost, was not reachable via a wping. A database backup was not attempted.";
            `echo "$Error" | wsndnotif "CST DB Backups" Error`;
        }
    }

    # Subroutine to set the Tivoli environment variables
    sub Set_Env {
        $TivEnv = "/etc/Tivoli/setup_env.sh";
        @EnvVal = `. $TivEnv; env`;
        foreach $EnvValu (@EnvVal) {
            chomp($EnvValu);
            # Split on the first "=" only, so values that contain "=" survive
            ($Key, $Val) = split(/=/, $EnvValu, 2);
            $ENV{$Key} = $Val if (defined $Val);
        }
    }
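The report-only-on-failure pattern used in Example 10-1 can also be written in a few lines of shell. This is a sketch under stated assumptions rather than a replacement for the Perl script: report_failure, run_or_report, and backup_node are hypothetical names, and report_failure simply prints to standard error where Example 10-1 pipes the text to wsndnotif.

```shell
# report_failure MESSAGE
# Where failure reports go. This sketch prints to standard error.
report_failure() {
    echo "ERROR: $1" >&2
}

# run_or_report LABEL COMMAND [ARGS...]
# Run COMMAND with its output discarded; say nothing on success and
# report LABEL only when the command exits with a non-zero status.
run_or_report() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        return 0
    fi
    report_failure "$label failed"
    return 1
}

# backup_node NODE
# Reachability check first, then the backup itself, as in Example 10-1.
# Needs a sourced Tivoli environment (wping and wbkupdb on the PATH).
backup_node() {
    node=$1
    run_or_report "wping of $node" wping "$node" || return 1
    run_or_report "database backup of $node" wbkupdb "$node"
}
```

Keeping the reporting in one function means the destination (notice group, log file, or TEC event) can be changed in a single place.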

10.1.6 Binary backups

Since the Tivoli wbkupdb command only backs up the Tivoli database, you still have to make arrangements to back up the Tivoli binaries. Regularly scheduled backups should be part of any good server maintenance routine. Additionally, be sure to perform a system backup prior to any Tivoli patch installations.

It is recommended that you shut down the oserv before performing a binary backup (with odadmin shutdown, for example). A local backup of the binaries while the oserv is down takes much less time and puts less load on the server than writing to an offline device with the oserv running. If you are also performing a system backup to include the Tivoli database files, shutting down the oserv ensures that you capture a clean image. See 10.1.7, “File system backups” on page 293 for more details.

Example 10-2 shows a scripted binary backup that has a minimal impact on the database. It archives both your binary and database files into a compressed tar file, ready to send to your standard offline backup tool.

Example 10-2 Custom binary backup script

    #!/usr/bin/perl
    $Binary_Dir   = "/usr/local/Tivoli";
    $Database_Dir = "/var/spool/Tivoli";
    $WorkingDir   = "/CID";            # Directory to place tar file
    $InfoFile     = "InfoFile.dat";

    &Set_Env();
    print "The required Space to complete the backup is " . &Calc_Size() . " blocks\n";
    &Backup_Info();
    &Backup_Database();
    &Archive_Files();

    #--------------------------------------------------------
    # Subroutines below here
    #--------------------------------------------------------

    # Load the Tivoli environment variables into this process
    sub Set_Env {
        $TivEnv = "/etc/Tivoli/setup_env.sh";
        @EnvVal = `. $TivEnv; env`;
        foreach $EnvValu (@EnvVal) {
            chomp($EnvValu);
            # Split on the first "=" only, so values that contain "=" survive
            ($Key, $Val) = split(/=/, $EnvValu, 2);
            $ENV{$Key} = $Val if (defined $Val);
        }
    }

    # Estimate the space needed: twice the size of both directories
    sub Calc_Size {
        my $Size_Needed = 0;
        ($Size) = split(/\s/, `du -s $Binary_Dir`);
        $Dir_Size = int $Size;
        $Size_Needed = $Size_Needed + $Dir_Size;
        ($Size) = split(/\s/, `du -s $Database_Dir`);
        $Dir_Size = int $Size;
        $Size_Needed = $Size_Needed + $Dir_Size;
        $Size_Needed = $Size_Needed * 2;
        print "$Size_Needed\n";
        return $Size_Needed;
    }

    # Record the Tivoli configuration at the time of the backup
    sub Backup_Info {
        open(INFO, ">$WorkingDir/$InfoFile") or die;
        print INFO "Date:\t" . `date` . "";
        print INFO "TMR:\t" . `wtmrname`;
        print INFO "Host:\t", `hostname` . "\n";
        print INFO "COMMAND : odadmin odinfo\n" . `odadmin odinfo` . "\n\n";
        print INFO "COMMAND : wlsinst -a\n" . `wlsinst -a` . "\n\n";
        print INFO "COMMAND : odadmin odlist\n" . `odadmin odlist` . "\n\n";
        print INFO "COMMAND : wlookup -ar Gateway\n" . `wlookup -ar Gateway` . "\n\n";
        close(INFO);
    }

    # Run wbkupdb and ask whether to continue if it fails
    sub Backup_Database {
        # $BackupHost is not set in this script, so the file name is just
        # the %t time stamp
        `wbkupdb -fd $ENV{'DBDIR'}/../backups/$BackupHost%t`;
        if ($?) {
            print STDERR "The database backups failed!\n";
            print STDERR "Would you like to continue?(y/n)";
            $Ans = lc <STDIN>;
            chomp $Ans;
            if ($Ans ne "y") {
                print STDOUT "\nExiting Backup routine.\n\n";
                exit 0;
            }
            print "Continue\n";
        }
    }

    # Stop the oserv, archive binaries and database, restart the oserv
    sub Archive_Files {
        print "Shutting down the oserv.....";
        `odadmin shutdown`;
        `tar -cvf $WorkingDir/Archive.tar $Binary_Dir $Database_Dir`;
        if ($?) {
            print STDERR "The archive of the Tivoli Directory structures failed!\n";
            print STDERR "Exiting Archive routine\n\n";
            `odadmin start`;
            exit 1;
        }
        print "Starting oserv\n";
        `odadmin start`;
        if ($?) {
            print STDERR "An error was reported during the start of the oserv!\n";
            exit 1;
        }
        `compress $WorkingDir/Archive.tar`;
    }

This script also provides you with some details on what your Tivoli configuration was at the time of the backup. Example 10-3 contains an example of the InfoFile.dat that is generated when running the custom binary backup script.

Example 10-3 Custom binary backup

    Date:   Tue Feb 6 10:43:38 CST 2001
    TMR:    bass-region
    Host:   bass

    COMMAND : odadmin odinfo
    Region = 1438246632
    Dispatcher = 1
    Interpreter type = aix4-r1
    Database directory = /var/spool/Tivoli/bass.db
    Install directory = /usr/local/Tivoli/bin
    Inter-dispatcher encryption level = simple
    Kerberos in use = FALSE
    Remote client login allowed = TRUE
    Install library path = /usr/local/Tivoli/lib/aix4-r1:/usr/lib:/tmp/iblib/aix4-r1:/usr/lib
    Force socket bind to a single address = FALSE
    Perform local hostname lookup for IOM connections = FALSE
    TME 10 Framework (tmpbuild) #1 Thu Oct 26 23:17:15 CDT 2000
    Copyright Tivoli Systems, 1997. All Rights Reserved.
    Port range = (not restricted)
    ALLOW_NAT = FALSE
    State flags in use = TRUE
    State checking in use = TRUE
    State checking every 180 seconds
    Dynamic IP addressing allowed = FALSE
    Transaction manager will retry messages 4 times.

    COMMAND : wlsinst -a
    *----------------------------------------------------------------------*
    Product List
    *----------------------------------------------------------------------*
    TME 10 Framework 3.7
    TME 10 Software Distribution, Version 3.6
    TME 10 Software Distribution Gateway, Version 3.6
    Tivoli Inventory, Version 3.6.2
    Tivoli Inventory, Version 3.6.2, Gateway
    TME 10 Distributed Monitoring NT Monitors 3.6
    TME 10 Distributed Monitoring 3.6
    TME 10 Distributed Monitoring TEC Monitors 3.6
    TME 10 Distributed Monitoring TME Monitors 3.6
    TME 10 Distributed Monitoring Unix Monitors 3.6

    *--------------------------------------------------------------------*
    Patch List
    *--------------------------------------------------------------------*
    Tivoli Software Distribution Gateway Patch 3.6.2-COU-0018
    Tivoli Software Distribution Patch 3.6.2-COU-0020
    Tivoli Distributed Monitoring Patch 3.6.2-DMN-0001
    Tivoli Distributed Monitoring Upgrade, Version 3.6/3.6.1 to 3.6.2
    Tivoli Inventory 3.6.2 Gateway, Patch 3.6.2-INV-0004
    Framework Patch 3.7-TMF-0003 (build 12/13)
    TME 10 Software Distribution Upgrade to Version 3.6.2
    TME 10 Software Distribution Gateway, Upgrade to Version 3.6.2
    Tivoli Distributed Monitoring Upgrade, Version 3.6/3.6.1 to 3.6.2, NT Monitors
    Tivoli Distributed Monitoring NT Monitors Patch 3.6.2-DMN-0001
    Tivoli Distributed Monitoring Upgrade, Version 3.6/3.6.1 to 3.6.2, TEC Monitors
    Tivoli Distributed Monitoring Upgrade, Version 3.6/3.6.1 to 3.6.2, TME Monitors
    Tivoli Distributed Monitoring Upgrade, Version 3.6/3.6.1 to 3.6.2, UNIX Monitors
    Tivoli Distributed Monitoring Unix Monitors 3.6.2-DMN-0001

    COMMAND : odadmin odlist
    Region      Disp  Flags Port  IPaddr       Hostname(s)
    1438246632  1     ct94        9.3.187.155  bass.itsc.austin.ibm.com,bass
                5     ct94        9.3.187.178  mackeral.itsc.austin.ibm.com
                6     ct94        9.3.187.145  stuttgart.itsc.austin.ibm.com

    COMMAND : wlookup -ar Gateway
    mackeral-gateway   1438246632.5.25#TMF_Gateway::Gateway#
    stuttgart-gateway  1438246632.6.25#TMF_Gateway::Gateway#

10.1.7 File system backups

Most organizations have standard system backup procedures in place for servers. As with other servers, you need to pick a time when the TMR Server is not too busy. An ideal time to schedule regular file system backups is immediately after your regular Tivoli database (wbkupdb) backups. This provides a second backup of the Tivoli environment in case of failure.

Note: Do not rely solely on file system backups in place of your Tivoli database (wbkupdb) backups. If a file system backup occurs during a transaction while some data from the process in memory has not completed writing to disk, the captured image is not complete. A restore attempt of the incomplete image would result in inconsistency errors.

10.2 Restore process

If a disk drive on a Tivoli client fails, or the file system that stores the management information gets corrupted or is lost, the management data can be recovered by restoring the TMR Server and/or Tivoli clients from an earlier backup. Tivoli usually reports irreparable damage to the object database using messages, such as Persistent storage failure. When this happens, there is usually little option but to revert to a backup of the database. You can also use this process to restore the database after upgrading a client or the server to a new version of the operating system.

You can restore one client, several clients, the TMR Server, or the entire TMR. Since the nature of a restore operation affects the underlying database of a client or the server, you can only restore the data from the command line.


There is a distinction between a standard restore operation and a rescue operation. A standard restore can take place when a managed node’s object dispatcher is running. If the system is in such a state that even the oserv cannot be started, then there is a rescue procedure to restore the database.

10.2.1 Restore roles and access rights

You must have the restore or super role in the TMR to perform a restore. If you are performing a rescue operation, you must be Administrator (Windows NT) or root (UNIX) on the machine where the crashed database is located.

Note: As with the backup, the Administrator also requires a valid user login name and a group name for the machine on which the backup file is stored. The Administrator also needs read permissions for the directory that contains the backup file. See also 10.1.1, “Backup roles and access rights” on page 278 for more information.

10.2.2 Restore examples

The wbkupdb command not only backs up but also restores Tivoli databases. You can provide a list of managed node names as arguments to the wbkupdb command. There are two options that can be used during a restore:
• -r, which causes the oserv to re-execute once the restore is complete to pick up the changes.
• -r -R, which copies backup files (*.restore) to the database directory without restarting the oserv. The changes are picked up at the next oserv start or restart using odadmin reexec or one of its derivatives.

Here, we present an example of using wbkupdb to restore a system. For further information, see the online manual page for the wbkupdb command or the Tivoli Framework 3.7.1 Reference Manual, SC31-8434.

The following command example restores a single managed node, rh0255a. The backup file used to restore the managed node is /usr/backups/TMR1.bk:

    % wbkupdb -r -d /usr/backups/TMR1.bk rh0255a

Note: You cannot specify another machine in the wbkupdb command as a source for the restore action, so you have to be logged into the machine that has the backup file to make the restore from that file.


From time to time, you should verify that your Tivoli database backups contain all the database files and are free of corruption. To determine which files are being backed up, on either the server or clients, you can run the following commands:

    oid=`wlookup -r Classes TMF_Backupimpl`
    idlcall $oid _get_server_files
    idlcall $oid _get_client_files

Compare the files returned from these commands to those that are in the tar file created during a backup. To see the files that are actually contained in the tar, execute the following commands against a backup file. It is recommended that you copy the backup file to a temporary directory and run the commands from there:

    tar -xvf $DBFile
    uncompress -c < $DBFile | tar xvf -
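The comparison itself is easy to script once both lists are in plain text files. The following is a minimal sketch: missing_from_backup is a hypothetical helper name, and it assumes that the expected names (one per line, without quotes) and the names found in the backup (one per line, base names only) have already been written to two files. The .restore suffix is the one that the extracted files carry in the sample listing in this section.

```shell
# missing_from_backup EXPECTED LISTING
# EXPECTED: file with one database file name per line (for example odb.bdb).
# LISTING:  file with one extracted file name per line (for example
#           odb.bdb.restore).
# Prints every expected name that has no NAME.restore entry in LISTING
# and returns 1 if anything is missing, 0 otherwise.
missing_from_backup() {
    expected=$1
    listing=$2
    status=0
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        # -x matches the whole line, -F treats the name as a fixed string
        if ! grep -qxF -- "$name.restore" "$listing"; then
            echo "$name"
            status=1
        fi
    done < "$expected"
    return $status
}

# Possible usage, with hypothetical file names:
# missing_from_backup /tmp/expected.txt /tmp/listing.txt || echo "backup is incomplete" >&2
```

An empty output with a zero return code means every expected file was found in the backup.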

The following two examples (Example 10-4 and Example 10-5) show sample outputs from the idlcall $oid _get_server_files command and contents of a sample backup tar file.

Example 10-4 idlcall $oid _get_server_files output

    # idlcall 1056974642.1.376 _get_server_files
    { 9 "odb.bdb" "odb.log" "odlist.dat" "imdb.bdb" "notice.bdb" "notice.log" "file_versions" "epmgr.bdb" ".installed" }
    #

Compare the output of Example 10-4 to the result of the un-tar in Example 10-5.

Example 10-5 Database backup tar contents

    drwxr-xr-x  2 root sys         1024 Feb 06 17:03 .installed.restore
    drwxr-xr-x  2 root sys          512 Jan 25 14:35 epmgr.bdb.restore
    drwxr-xr-x  2 root system       512 Jan 25 14:35 file_versions.restore
    -rw-------  1 root system   2146304 Feb 06 17:02 imdb.bdb.restore
    -rw-------  1 root system     65536 Feb 12 11:24 notice.bdb.restore
    -rw-------  1 root sys            0 Feb 12 11:24 notice.log.restore
    -rw-------  1 root system  14950400 Feb 09 08:36 odb.bdb.restore
    -rw-------  1 root sys            0 Feb 12 11:27 odb.log.restore
    -rw-------  1 root sys          624 Feb 09 08:36 odlist.dat.restore

Verifying that the files are not corrupted is another important step in assuring a successful recovery. Running the Tivoli Management Framework otherpages command against the database file will determine any corruption in the actual file.


The undocumented command otherpages is provided on all versions of Tivoli Management Framework in the $BINDIR/bin directory. It does a structural analysis of the odb.bdb or imdb.bdb files to see if their b-tree structure is intact, rather than an object or logical cleanup of the database, such as wchkdb performs. This is useful, as it gives a different view of the database and checks a different level of integrity. To use the command, shut down the oserv and then run:

    otherpages /odb.bdb

or

    otherpages /imdb.bdb

Tip: This procedure is valid if you are running otherpages on the live *.bdb files. Another option is to copy the files elsewhere first and run the check on the copies. This has the benefit that utilities such as otherpages and bdbx do not have to be run against your live *.bdb files.

The script in Example 10-6 will un-tar and uncompress your backup file and check for corruption.

Example 10-6 Automated script to determine backup corruption

    #!/usr/bin/perl
    $TestDir  = "/tmp/wbkupdb";
    $BkupFile = $ARGV[0];
    chomp $BkupFile;

    &Set_Env();

    # Start from an empty working directory
    if (-d $TestDir) {
        `rm -r $TestDir`;
    }
    `mkdir $TestDir`;
    if ($?) {
        print STDERR "Could not create temporary wbkupdb directory.\n";
        exit 1;
    }

    # Un-tar and uncompress the backup file to a working directory
    `cp $ENV{'DBDIR'}/../backups/$BkupFile $TestDir/.`;
    $File = `cd $TestDir; tar -xvf $BkupFile`;
    ($action, $DBFile) = split(/[\s\,]/, $File);
    `cd $TestDir; uncompress -c < $DBFile | tar xvf - >/dev/null 2>&1`;

    # Check the structure of imdb.bdb
    `mv $TestDir/imdb.bdb.restore $TestDir/imdb.bdb`;
    `otherpages $TestDir/imdb.bdb`;
    if ($?) {
        $Error = "ERROR: The backup database file $BkupFile:imdb.bdb has errors";
        `echo "$Error" | wsndnotif "CST DB Backups" Error`;
    }

    # Check the structure of odb.bdb
    `mv $TestDir/odb.bdb.restore $TestDir/odb.bdb`;
    `otherpages $TestDir/odb.bdb`;
    if ($?) {
        $Error = "ERROR: The backup database file $BkupFile:odb.bdb has errors";
        `echo "$Error" | wsndnotif "CST DB Backups" Error`;
    }

    # Subroutine to set the Tivoli environment variables
    sub Set_Env {
        $TivEnv = "/etc/Tivoli/setup_env.sh";
        @EnvVal = `. $TivEnv; env`;
        foreach $EnvValu (@EnvVal) {
            chomp($EnvValu);
            # Split on the first "=" only, so values that contain "=" survive
            ($Key, $Val) = split(/=/, $EnvValu, 2);
            $ENV{$Key} = $Val if (defined $Val);
        }
    }

10.2.3 Rescue operation

If the object dispatcher that is to be restored is not running (and presumably cannot be run because its database is corrupted or missing), you can extract the database manually and put the files in the correct location in the database directory. This process is known as a rescue operation. If you are performing a rescue operation, you must be Administrator (Windows NT) or root (UNIX) on the machine where the crashed database is located. Example 10-7 is an example of using csh to rescue a TMR Server.

Note: For Windows NT, instead of using the uncompress -c command shown in this example, use compress -cd.

Example 10-7 Rescue operation

    # tar xvf /var/spool/Tivoli/backup.db shasta
    x shasta, 1027749 bytes, 2008 tape blocks
    # uncompress -c

    $TEMPDIR/admins
    while read ADMIN
    do
      BBGUI=`wls -l /Administrators/"$ADMIN"/Notices`
      rc=$?
      if [ $rc -eq 0 ]
      then
        COUNT=1
        BBGUI=`echo $BBGUI | cut -f1 -d'#'`
        idlattr -tg $BBGUI subscription_list TMF_NoticeImpl::ViewerAdmin::SubscriptionList > $TEMPDIR/$BBGUI.attr
        NUMGROUPS=`idlcall $BBGUI get_subscriptions | idlarg 1`
        INTEGERTEST=`echo $NUMGROUPS | awk '/[^0-9]+/ { print $1 }'`
        if [ -z "$INTEGERTEST" ]
        then
          touch $TEMPDIR/bbgui/$BBGUI
          echo "Unsubscribing $ADMIN ..."
          while [ $COUNT -le $NUMGROUPS ]
          do


                COUNT=`expr $COUNT + 1`
                GROUP=`idlcall $BBGUI get_subscriptions | idlarg $COUNT`
                GROUPOID=`echo $GROUP | cut -f1 -d'#'`
                GROUPREG=`echo $GROUPOID | cut -f1 -d'.'`
                if [ "$GROUPREG" = "$TMR" ]
                then
                    GROUPNAME=`idlattr -tg $GROUPOID label Object`
                    idlcall $BBGUI unsubscribe_group $GROUP
                    rc=$?
                    if [ "$rc" -ne "0" ]
                    then
                        echo "Problem with $ADMIN"
                        echo "It looks like database corruption"
                        echo "Moving to next admin."
                        echo
                        break
                    fi
                    echo $GROUPNAME >> $TEMPDIR/bbgui/$BBGUI
                    echo "from $GROUPNAME ..."
                    COUNT=`expr $COUNT - 1`
                    NUMGROUPS=`expr $NUMGROUPS - 1`
                fi
            done
            echo "Done Unsubscribing $ADMIN"
            echo
        else
            echo
            echo "Problem with $ADMIN administrator ... skipping"
            echo
        fi
    else
        echo
        echo "Problem with $ADMIN administrator ... skipping"
        echo
    fi
done < $TEMPDIR/admins

# At this point, all administrators, local and remote, should be
# unsubscribed from local notice groups if there were no errors.
# Now we will destroy the existing notice groups and recreate
# new ones.

# Killing NtfServer and removing notice.bdb and notice.log
case "$INTERP" in
'w32-ix86' )
    ntprocinfo -k `ntprocinfo | grep NtfServer | awk '{ print $1 }'` 2>/dev/null
    sleep 5
    ;;
'sunos4' )
    kill -9 `ps -eax | grep NtfServer | grep -v grep | awk '{print $1}'` 2>/dev/null
    ;;
* )
    kill -9 `ps -ef | grep NtfServer | grep -v grep | awk -F' ' '{ print $2 }'` 2>/dev/null
    ;;
esac

rm $DBDIR/notice.bdb $DBDIR/notice.log
NOTICEOID=`wls -ld /Library/TMF_Notice | cut -f1 -d'#'`
THISREGION=`wtmrname`

# The next couple of statements save info in $TEMPDIR for
# recovery purposes
idlattr -tg $NOTICEOID members TMF_imp_TSysAdmin::Collection::MemberList > $TEMPDIR/$NOTICEOID.attr
wlookup -ar TMF_Notice > $TEMPDIR/wlookup.out

echo "Deleting Notice Groups" ; echo
wls /Library/TMF_Notice > $TEMPDIR/groups
idlattr -ts $NOTICEOID members TMF_imp_TSysAdmin::Collection::MemberList '{ 0 }'
while read GROUP
do
    GROUPOID=`wlookup -r TMF_Notice "$GROUP"`
    rc=$?
    if [ $rc -ne 0 ]
    then
        GROUPOID=`wlookup -r TMF_Notice "$GROUP#$THISREGION" | cut -f1 -d'#'`
    else
        GROUPOID=`echo "$GROUPOID" | cut -f1 -d'#'`
    fi
    wregister -u -r TMF_Notice "$GROUP"
    objcall $GROUPOID rmobj
    echo "Rebuilding $GROUP ..."
    idlcall $NOTICEOID create_notice_group \"$GROUP\" 168 ; echo
done < $TEMPDIR/groups

# The new groups have been created. Now we use the information
# that we stored in $TEMPDIR/bbgui/ to resubscribe the
# administrators to the groups they were subscribed to.
echo
echo "Resubscribing Admins to their groups ..."
for BBGUI in `ls $TEMPDIR/bbgui`
do
    echo "$BBGUI ..."


    while read GROUP
    do
        echo "... $GROUP"
        GROUPOID=`wlookup -r TMF_Notice "$GROUP"`
        rc=$?
        if [ $rc -ne 0 ]
        then
            GROUPOID=`wlookup -r TMF_Notice "$GROUP#$THISREGION" | cut -f1 -d'#'`
        else
            GROUPOID=`echo "$GROUPOID" | cut -f1 -d'#'`
        fi
        idlcall $BBGUI subscribe_group "$GROUPOID"\#TMF_Notice::Group\#
    done < $TEMPDIR/bbgui/$BBGUI
    echo
done

# Since the interconnected TMRs still know about the old OIDS
# for the notice groups, we will tell them to update their
# TMF_Notice resources
echo
echo "Updating Notice resources in interconnected regions ... "
for INTTMR in `wlsconn | tail +2 | awk '{print $4}'`
do
    NAMEREG=`objcall $INTTMR.1.0 get_name_registry`
    rc=$?
    if [ "$rc" -eq 0 ]
    then
        INTREG=`idlcall $NAMEREG lookup \"distinguished\" \"InterRegion\" | idlarg 1`
        idlcall $INTREG update_resources \{ 1 \"$THISREGION\" \} \{ 1 \"TMF_Notice\" \}
    else
        echo "Cannot contact the $INTTMR TMR - resources in this TMR will not be updated"
    fi
done
echo ; echo
echo "Script Finished"
} 2>$TEMPDIR/error.out

11.3 Tivoli tasks
A task in Tivoli:
• Is the definition of the network operation that has to be executed.
• Can be executed several times.
• Is stored in a task library.


The definition of the task includes:
• Name of the task (Label).
• Platform on which the task will be executed.
• Tivoli role(s) required to execute the task.
• User and group under which the task will be executed.

Note: The role required to run a task is checked against the roles that the administrator holds in the policy region in which the task endpoint resides, not against the roles held in the policy region that contains the task library.

For example, the administrator test has admin as the role in the policy region where the task library resides, but in the policy region in which the task will be executed, the administrator has user. Assuming this administrator has no TMR roles, the role required to execute the task must be user in order for this administrator to be able to execute the task in that policy region. If the administrator had a TMR role specified, then that role could be used as the required role for the task.

11.3.1 Tivoli jobs
A job is a task that already has all of its run-time parameters set, including:
• The list of targets on which the task will be executed.
• The execution mode: serial (one target at a time) or parallel (all subscribers simultaneously).
• The output format: displayed on the desktop or saved to a file.

The principal difference between a task and a job is that the job has sufficient data associated with it to be scheduled, and the task does not. Therefore, if you want to execute the operation several times at predetermined intervals:
1. Define the task.
2. Define the job.
3. Schedule the execution of the job.

11.3.2 Task library features
Task libraries store binary files or scripts, which we generally refer to as executables. When you create a task, an image of the executable that you specified is stored in the TMR Server’s binary tree.


Note: If you modify the executable used in a task, you must either resave the task after updating the script on the TMR Server (to have it rewrite to the TASK_REPOSITORY area on the TMR Server), or find the file name in the TASK_REPOSITORY area on the TMR Server, and directly edit the file there manually instead.

To use the second option (direct editing), do the following:
• Use the wgettask command to see what the task name is (tasks are assigned unique names at creation time and they are located under $BINDIR/TAS/TASK_LIBRARY/bin/).
• Edit the task directly.

You can pass arguments to a task, but by default you can do this only when you execute the task from the command line. To pass arguments from the GUI environment, you need to use the Task Library Language (TLL): use wtll to export the task library definition into a flat text format, modify the TLL, and then use wtll to import the definition again.

The task library has default and validation policies:
• Default task library policies set the available options of endpoints and profile managers to run the task or job.
• Validation task library policies validate the creation and execution of the task or job in a task library.

All these policies can be customized using a shell script to set or validate data. This customization lets you control which users a task can run as, and validate the users that are specified.

11.3.3 Task and job internals This section gives a little more detail about the task creation, distribution process, and the use of default and validation policies.

Creating a task When you create a task, the following occurs behind the scenes: The host that was specified in the Executable for Task window is contacted, and the task is copied to the TMR Server. The directory that is used is: $BINDIR/../interp/TAS/TASK_LIBRARY/bin// The name of the task is _.


Note: The name of the task is stored differently than it was before Version 3.6. Previously, every time a task was edited and modified, the previous copies were kept in the directory, and the task name was appended with a version number, starting at 0 for the original task. Tivoli Management Framework Version 3.6 keeps only a single copy of the task.

Distribution of the task executables
Every time you execute a task, the server distributes to the targets:
• Executable files to run
• Access control list for the task
• Arguments for the task
• Environmental variables
• User and group name required
• Time-out value

The Tivoli method that is invoked is the run_task method. At the target, this method decrypts the task information, gets the correct executable for that interpreter type, and forks/execs the executable to perform the task. When the task is done executing, the distributed task is removed, and the output is collected and returned to the caller. For TMA endpoints, the task is sent to the TMA endpoint. The TMR method run_task then spawns the run_task method on the gateway. The gateway will then cause a downcall of task_endpoint to the TMA endpoints, and the task is run returning any status to the TMR Server. Note: There is no extension associated with the task; therefore, a default extension is associated with the task on the endpoint when it is distributed. This extension depends upon the interp type being used. For generic and any of the UNIX interp types, the extension is sh. For Windows platforms, the extension is cmd.
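The extension rule in the note above can be sketched as a small shell function. This is only an illustration: the function name task_extension is hypothetical, and treating every interp type that begins with w32- as Windows is an assumption (only w32-ix86 appears in this chapter).

```shell
#!/bin/sh
# Sketch only: map an interp type to the default extension that is given
# to a distributed task file. task_extension is a hypothetical name.
task_extension() {
    case "$1" in
        w32-*) echo "cmd" ;;   # assumption: Windows interp types start with w32-
        *)     echo "sh"  ;;   # generic and UNIX interp types
    esac
}

# Report the extension for the local interp type, if the Tivoli
# environment has set INTERP.
if [ -n "$INTERP" ]; then
    ext=`task_extension "$INTERP"`
    echo "Tasks for $INTERP are distributed with the extension: $ext"
fi
```

A check like this can help when you search an endpoint for a distributed task file and need to know which extension to look for.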

By default, the distribution of the binaries is only in the TMR Server (ALI), but when you have Tivoli applications, like IBM Tivoli Monitoring, that use executables stored in the task library, it will be useful to distribute the binaries to the file servers either in the local TMR (LOCAL) or interconnected TMRs (GLOBAL). Distributing the executable to all the file servers in a TMR gives the application faster access to the executable, and it will be more flexible and extensible.


Note: The distribution of tasks does not use MDist. The TMR Server still uses an IOM channel if the data is larger than 16K, but the server contacts each target directly. Tuning parameters such as mem_max are not used.

You can distribute the task binaries with the command line:
wdisttask -q library_name
wdisttask -s library_name mode
wdisttask -d library_name task_name
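As a hedged sketch, a wrapper can reject a mistyped distribution mode before wdisttask -s is called. The function names are hypothetical, and the list of accepted modes is limited to the three named in this section (ALI, LOCAL, and GLOBAL).

```shell
#!/bin/sh
# Sketch only: validate a distribution mode before calling wdisttask -s.
# is_valid_dist_mode and set_dist_mode are hypothetical helper names.
is_valid_dist_mode() {
    case "$1" in
        ALI|LOCAL|GLOBAL) return 0 ;;
        *)                return 1 ;;
    esac
}

set_dist_mode() {
    library=$1
    mode=$2
    if ! is_valid_dist_mode "$mode"; then
        echo "Unknown distribution mode: $mode (expected ALI, LOCAL, or GLOBAL)" >&2
        return 1
    fi
    # Same form as the second command shown above.
    wdisttask -s "$library" "$mode"
}
```

For example, set_dist_mode TaskLibrary LOCAL would call wdisttask only after the mode has been checked; the library name here is just a placeholder.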

Note: You cannot use the wdisttask command to pre-stage tasks on target systems. Distributed binaries are useful only if an application, such as IBM Tivoli Distributed Monitoring, is running tasks. Running tasks or jobs from the TMR will still cause the tasks to be distributed to the target systems.

Task library policies
As we have said before, the default and validation policies are used in the task library. Table 11-2 provides a list of the different default policies in a task library.

Table 11-2 Default policies in a Task Library
Default policy      Description
tl_def_dist_mode    Default mode for distributing task binaries throughout a TMR. The default is ALI.
tl_def_man_nodes    Default list of managed nodes, as displayed in the Execute Task and Create Job dialogs.
tl_def_prof_mgrs    Default list of profile managers, as displayed in the Execute Task and Create Job dialogs.
tl_def_set_gid      Default group ID: This is an actual ID, not GID.
tl_def_set_uid      Default user ID: This is an actual ID, not UID.

Table 11-3 provides the validation policies.

Table 11-3 Validation policies in a task library
Validation policy   Description
tl_val_dist_mode    Validates the endpoints on which a task or job will run.
tl_val_prof_mgrs    Validates the profile managers on which a task or job will run.
tl_val_set_gid      Validates the assigned group name of a task or job (Uses the actual name, not GID).
tl_val_set_uid      Validates the assigned user name of a task or job (Uses the actual name, not UID).

You can look at the policies with the following commands:

wlspol [ -d | -v ] TaskLibrary
Use wlspol to list the names of the policy default objects, such as BasicTaskLibrary.

wlspolm [ -d | -v ] TaskLibrary
Use wlspolm to list policy methods assigned to the TaskLibrary resource. It will display a list, such as the ones shown in the tables above.

wgetpolm [ -d | -v ] TaskLibrary BasicTaskLibrary {policy}
Use wgetpolm to list the body or constant value of a default or validation policy method.

wputpolm [ -d | -v ] TaskLibrary BasicTaskLibrary {policy} < binary of the new policy>
Use wputpolm to replace a policy method’s body.

For more details about these commands, consult the Tivoli Framework 3.7.1 Reference Manual, SC31-8434.
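When a policy body is replaced, the new script has to print its value without a trailing newline (see the printf example under "Task library common errors" in this chapter). The following is a minimal sketch of a helper that generates such a script. The name write_policy_script and the file paths are hypothetical, and the commented wgetpolm and wputpolm lines assume the policy object BasicTaskLibrary and that wputpolm reads the new body from standard input, as the syntax line above suggests.

```shell
#!/bin/sh
# Sketch only: generate a constant-value policy script that prints its
# value WITHOUT a trailing newline. write_policy_script is a hypothetical
# helper name. The value must not contain single quotes.
write_policy_script() {
    value=$1
    outfile=$2
    {
        echo '#!/bin/sh'
        echo "printf '%s' '$value'"
        echo 'exit 0'
    } > "$outfile"
    chmod +x "$outfile"
}

# Possible use (assumed invocation forms, verify against the reference manual):
#   wgetpolm -d TaskLibrary BasicTaskLibrary tl_def_dist_mode > /tmp/tl_def_dist_mode.orig
#   write_policy_script LOCAL /tmp/tl_def_dist_mode.sh
#   wputpolm -d TaskLibrary BasicTaskLibrary tl_def_dist_mode < /tmp/tl_def_dist_mode.sh
```

Saving the original body with wgetpolm first gives you something to put back if the new policy fails.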

Task library commands
Table 11-4 summarizes the task library commands.

Table 11-4 Task Library commands
Command      Purpose                                                           Role
wcrtjob      Creates a new job in a task library                               Admin, Senior, Super
wcrttlib     Creates a new task library                                        Admin, Senior, Super
wcrttask     Creates a new task in a task library                              Admin, Senior, Super
wdeljob      Deletes a job from a task library                                 Admin, Senior, Super
wdeltask     Deletes a task from a task library                                Admin, Senior, Super
wdisttask    Controls the distribution of task binaries from a task library   Admin, Senior, Super
wgetjob      Lists information about a job                                     User, Admin, Senior, Super
wgettask     Lists information about a task                                    User, Senior, Super
wlstlib      Lists information about a task library                            User, Senior, Super
wrunjob      Executes a job                                                    The role specified in the job definition
wruntask     Executes a task                                                   The role specified in the task definition
wsetjob      Sets the properties of a job                                      Admin, Senior, Super
wsettask     Sets the properties of a task                                     Admin, Senior, Super
wtaskabort   Aborts a task transaction and rolls back any uncommitted changes  Can only be used in a script and does not work by command line
wtll         Imports and exports task library definitions                      User, Admin, Senior, Super


11.3.4 Troubleshooting tasks and jobs
This is a collection of tips that may be useful in investigating problems with tasks:
• Be sure that your script or binary file is working as it should be, independent of the task/job process.
• Task Library Language can only pass 9 arguments down.
• You can use Tivoli to open an xterm on the machine where the problem task is running. You can use:
  wxterm -h ManagedNode -display mydesktop:0
  Not only does this give you the xterm, but it also confirms that remote initiation of programs is possible. The xterm will run with the same environment as the task.
• Remember that normally the first line of a task script must be #!/bin/sh. However, if the task is to execute on a TMA endpoint, then this line must be omitted.
• If the task is created on a UNIX TMR, and the target system is Windows, the lines of the task must end in ^M (Ctrl-M).
• The binary directory of the tasks must be writable for creating new tasks. The path of this directory is:
  ///TAS/TASK_LIBRARY//bin
• To change the policy region validation to allow root to run tasks, perform the following steps:
  a. Run:
     wcrtpol -v TaskLibrary new_name_for_library_copy
     wgetpolm -v TaskLibrary new_name_for_library_copy tl_val_set_uid >file_name
  b. Edit the file_name and remove the checks for root. Keep the part that says: echo TRUE and exit 0.
  c. Run:
     wputpolm -v TaskLibrary new_name_for_library_copy tl_val_set_uid
  d. If the desktop is running, exit the desktop and restart it.
  e. Open the Policy region window.
  f. Under Properties/Managed Resource Policies, open the button for Validation Policy and when you see this new policy, select it.
• If you are using commands from the Tivoli Management Framework, you MUST set up the environment variables in the script, as follows:
  – UNIX: /etc/Tivoli/setup_env.sh
  – Windows NT: c:/winnt/system32/drivers/etc/Tivoli/setup_env.cmd
• To list the tasks and jobs within a library, use the command:
  wlstlib library_name
• To list the properties of a task, use the command:
  wgettask [ -F file_name ] task_name library_name

Example 11-8 shows the listed properties of a task.

Example 11-8 List the properties of a task
[root@itso3]/> wgettask endpoint TaskLibrary
Task Name               endpoint
User Name               *
Group Name
Task ACL                senior:super:user
Supported Platforms     w32-ix86
        /w32-ix86/TAS/TASK_LIBRARY/bin/1295714281/TaskLibrary__vhyadwba
Task Comments
        Task Name : TaskLibrary/endpoint
        Task Created : Thu Dec 17 18:05:13 1998
        Task Created By : root@itso3
        Task Files
        w32-ix86 itso3 /tmp/task.txt
        Distribution Mode : ALI
        Task Comments :

• To list the properties of a job, use the command:
  wgetjob job_name library_name

Example 11-9 shows another example of listed properties.


Example 11-9 List properties of a job
[root@itso3]/swdist/logs/task> wgetjob datejob TaskLibrary
Job Name         : datejob
Task Name        : leedate
Execution Mode   : parallel
Timeout          : 60
Output Format    : task header
                   return code
                   standard output
                   standard error output
                   save output to file itso3 /swdist/logs/datejob.log
Managed Nodes    :
Profile Managers : UNIXPM (ProfileManager)

Note: If the target of a job is a profile manager, the subscribers of the profile manager are resolved when the task is run. So, if you originally create a job that has the profile manager UNIXPM as a target, the subscribers to the UNIXPM profile manager are retrieved each time the job is run.
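Two of the tips in this section (the 9-argument limit of the Task Library Language and the ^M line endings needed for Windows targets) are easy to check before a task is created. The following is a minimal sketch; the function names to_crlf and check_arg_count are hypothetical.

```shell
#!/bin/sh
# Sketch only: two pre-flight helpers for task scripts.
# to_crlf and check_arg_count are hypothetical names.

# Convert line endings to CR+LF (^M) for a script that will run on a
# Windows target. Lines that already end in CR are not doubled.
to_crlf() {
    awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }'
}

# Fail if more arguments are supplied than the 9 that TLL can pass down.
check_arg_count() {
    if [ "$#" -gt 9 ]; then
        echo "Too many arguments: $# (limit is 9)" >&2
        return 1
    fi
    return 0
}
```

For example, to_crlf < mytask.sh > mytask.cmd produces a copy with ^M line endings (both file names are placeholders), and a task script can call check_arg_count "$@" as its first step.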

To export and import task library definitions, use the following commands:
wtll [ -F ] export_file -l library_name
wtll [ -i ] [ -r ] -p policy_region [-P Preprocessor] import_file

Tip: You can use the -i flag with wtll to insert a new task or group of tasks into an existing task library. The import file is parsed, and specified tasks are created in the existing task library. The import file can be a complete task library or a list of individual tasks. An alternative (and recommended) way to add a task to a task library from the command line is to use the wcrttask command.

Example 11-10 shows a wtll export file.

Example 11-10 wtll export file
TaskLibrary "task-rh0255b" {
    Context = ("_!_","*",1);
    Distribute = ("_!_","ALI",1);
    HelpMessage = ("_!_","Conventional Task Library",1);
    Requires = ("_!_",">2.5",1);
    Version = ("_!_","1.0",1);
    ArgLayout Filename{
        TextChoice FileBrowser;
        ButtonLabel = (testmsg_BrowserButton);
    };
    Task backup {
        Description = ("_!_","Upgraded Task",1);
        HelpMessage = ("_!_","No Help Available",1);
        Uid = ("_!_","root",1);
        Gid = ("_!_","bin",1);
        Comments = ("_!_","Task Name : task-rh0255b/backup
Task Created : Thu Oct 16 14:49:31 1997
Task Created By : [email protected]
Task Files
default rh0255a /usr/local/bin/backuptmr
Distribution Mode : ALI
Task Comments :
----------------------------------------------------------
",1);
        Roles = ("_!_","user",1);
        Argument (testmsg_ArgDirname){
            Layout = "Filename";
            MustMatch = "^/.*";
        };
        Implementation ("default") Binary "0.default";
    };
}

Task library common errors
• timeout
  The task exceeded the amount of time allowed in the time-out setting.
• getpwname failed with code ##
  You are trying to execute the task as a user that does not have an account on the destination managed node. The user must exist before the task can be run.
• Getting method fork failed errors
  This can be an OS resource problem, for example, swap space, lack of threads, and so on.
• Command exited with signal 5, core=false
  This error can occur if you change any default or validation policy, and the script has an error. For example, you might want to change the tl_def_dist_mode to LOCAL. If you remake the script with the echo command, you will have this error because echo adds a newline character, so you will have to use printf like this:
  #!/bin/sh
  printf LOCAL
  exit 0
• open failed with code '13': 'Permission denied'
  This error occurs in tasks when the user ID running the task does not have permission to write to the log file. If you receive this error on tasks, also send the output to the desktop. If the desktop output is correct, but you still receive code 13, check the permissions. On jobs, this error was seen under two circumstances: you did not have permission to write to the log file, or the task specified in the job does not have a valid user ID on the target system.
• (14): no permission for `TaskLibrary/rhondatask' for operation `run_task'
  The person running the job or task does not have the authority to run the task. Check the Task ACL with the wgettask command. Check the TMR roles for the administrator. Remember: Roles are not hierarchical. Add an appropriate role to either the administrator or to the task.
• 'COMMconnect_host' failed with code '79':'rh2900c' or pctmp109 (Endpoint): ipc_create_remote failed: unable to connect to 146.84.32.208+9494: (67) IPC shutdown
  The target endpoint agent was not running on the endpoint.
• (4): resource `leedate' not found
  The task leedate was deleted, but the job still references it. This error was received when running the job referencing the leedate task.

Note: If you are still having problems, you can do the following to gather more information about the errors:

1. Enable the wtrace of errors and objcalls.
2. Execute an odstat.
3. Re-create the problem.
4. Execute an odstat and keep it in a file.
5. Execute wtrace -jk $DBDIR and keep it in another file.
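The five steps above can be sketched as one shell function. Treat this as an outline under stated assumptions: the names run, collect_task_diag, and DRY_RUN are hypothetical, the output file names are arbitrary, and the odadmin trace errors and odadmin trace objcalls commands used for step 1 are not shown in this section, so confirm them in the reference manual for your Framework level. DBDIR is expected to be set by the Tivoli environment setup script.

```shell
#!/bin/sh
# Sketch only: collect odstat and wtrace output around a failing task.
# With DRY_RUN=1 the commands are printed instead of being executed.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: collect_task_diag output_dir command_that_fails [args ...]
collect_task_diag() {
    outdir=$1
    shift
    run odadmin trace errors        # step 1 (assumed command form)
    run odadmin trace objcalls      # step 1 (assumed command form)
    run odstat                      # step 2
    run "$@"                        # step 3: re-create the problem
    if [ -n "$DRY_RUN" ]; then
        echo "odstat > $outdir/odstat.out"
        echo "wtrace -jk $DBDIR > $outdir/wtrace.out"
    else
        odstat > "$outdir/odstat.out"                 # step 4
        wtrace -jk "$DBDIR" > "$outdir/wtrace.out"    # step 5
    fi
}
```

Running the function once with DRY_RUN=1 shows the exact commands before anything is changed on the TMR Server.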


11.4 Scheduler As its name suggests, the Scheduler can be used to schedule jobs, backups and profile distributions. To begin to schedule a job, you simply drag and drop the icon of the job onto the icon of the Scheduler.

11.4.1 Scheduler commands
Table 11-5 is a quick summary of the Scheduler commands.

Table 11-5 Scheduler commands
Command      Purpose                                             Role
wdelsched    Removes jobs from the Scheduler                     Super, Senior, Admin
wedsched     Edits a job that currently exists in the Scheduler  Super, Senior, Admin
wenblsched   Disables or enables scheduled jobs                  Super, Senior, Admin
wgetsched    Retrieves information on jobs currently scheduled   Super, Senior, Admin, User
wschedjob    Schedules a job that exists in the task library     Super, Senior, Admin
wstartsched  Starts the TME10 Scheduler                          Super, Senior

Refer to the Tivoli Framework 3.7.1 Reference Manual, SC31-8434 for more details.

11.4.2 Tips for working with the Scheduler Use the wgetsched command to make sure the job you are scheduling is really in the Scheduler. The output from the command is shown in Example 11-11 on page 350.


Example 11-11 Output of the wgetsched command
[root@itso3]/swdist/logs/task> wgetsched
Job ID Job Label       Admin           Date & Time              Enbld Repeat Retry Cancel
------ --------------- --------------- ------------------------ ----- ------ ----- ------
000001 update X-TMR re root@itso3      Thu Dec 17 20:30:00 1998 YES   YES    NO    NO
000004 filepackage     root@itso3      Fri Dec 18 03:20:00 1998 YES   NO     NO    NO
000008 going to be del Lee@itso3       Fri Dec 18 16:45:00 1998 YES   YES    NO    NO

Check for the name of the job, the date and time scheduled, and the Administrator. Check the Scheduler for entries that repeat.

Do not delete a Tivoli Administrator that has jobs in the Scheduler. Doing this will cause the job not to run. Before you delete an administrator, use the wgetsched command to find all jobs that belong to that administrator and note their Scheduler ID numbers, as in Example 11-12. Then issue the wgetsched command in verbose mode to capture all the information about each job so that you can recreate the job once the administrator is deleted. Finally, delete the job from the Scheduler with the wdelsched command and re-add it using a different administrator. If you have already deleted the administrator, then every time the scheduled job is run, it will fail, and a notice will be logged.

Example 11-12 Example of a wgetsched command with verbose output
[root@itso3]/swdist/logs/task> wgetsched
Job ID Job Label       Admin           Date & Time              Enbld Repeat Retry Cancel
------ --------------- --------------- ------------------------ ----- ------ ----- ------
000001 update X-TMR re root@itso3      Thu Dec 17 21:00:00 1998 YES   YES    NO    NO
000004 filepackage     root@itso3      Fri Dec 18 03:20:00 1998 YES   NO     NO    NO
000008 going to be del Lee@itso3       Fri Dec 18 16:45:00 1998 YES   YES    NO    NO
[root@itso3]/swdist/logs/task> wgetsched -v -s 8
ID               : 8
Name             : rhondajob
Label            : going to be deleted
Description      : This is a job from the Task Library.
Administrator    : Lee@itso3
Original Time    : Fri Dec 18 16:45:00 1998
Next Time        : Fri Dec 18 16:45:00 1998
Enabled          : Yes
Repeat Type      : Finite
Repeat Increment : 5
Repeat Unit      : Minute
Repeat Times     : 5
Retry Type       : None
Retry Increment  : 0
Retry Unit       : Minute
Retry Times      : 0
Cancel Job       : No
Cancel Increment : 0
Cancel Unit      : Minute
Email            :
And so on

For regularly scheduled or repeating jobs, run the wgetsched command in verbose mode at least once a week and save their definitions. If the Scheduler must be cleaned, you can use these definitions to recreate the Scheduler jobs.
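A minimal sketch of how the weekly save could be scripted follows. It assumes that job lines in the wgetsched listing start with a six-digit job ID, as in Example 11-11, and it uses the wgetsched -v -s form shown in Example 11-12. The function names and the output file layout are hypothetical.

```shell
#!/bin/sh
# Sketch only: save the verbose definition of every scheduled job.
# sched_job_ids and save_sched_defs are hypothetical names.

# Read a wgetsched listing on stdin and print one numeric job ID per line.
# Assumes job lines start with a six-digit ID; leading zeros are removed.
sched_job_ids() {
    awk '$1 ~ /^[0-9][0-9][0-9][0-9][0-9][0-9]$/ { print $1 + 0 }'
}

# Write one file per job into the given directory.
save_sched_defs() {
    outdir=$1
    mkdir -p "$outdir" || return 1
    wgetsched | sched_job_ids | while read id
    do
        wgetsched -v -s "$id" > "$outdir/job_$id.txt"
    done
}
```

Because the listing truncates job labels, the saved files are named by job ID rather than by label.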

11.4.3 Troubleshooting common Scheduler errors In some cases, when you try to obtain the list of jobs scheduled, you may obtain the error shown in Figure 11-11. If so, you will need to start the Scheduler again by running the wstartsched command.

Figure 11-11 Scheduler not running message

When you execute the wstartsched command, it may seem to start the Scheduler, but when you try to retrieve the list of jobs, you still get the Scheduler not running error. If this happens, the Scheduler must be cleaned. Another indicator of this problem is the following in the oservlog:
Oct 22 17:01:34: ^hdaemon exit while in use: (0xa)
Oct 22 17:03:08: ^hdaemon exit while in use: (0x6)

Cleaning the Scheduler
You can clean up the Scheduler by executing the script in Example 11-13 on the TMR Server.

Example 11-13 Cleaning the Scheduler
#!/bin/sh
. /etc/Tivoli/setup_env.sh
index=0
SCHED=`wlookup Scheduler`
objcall $SCHED stop
set -e
NUM_CRED=`objcall $SCHED contents| grep CredDatabase| wc -l`
while [ $index -lt $NUM_CRED ]
do
    objcall $SCHED rmattr BDBPG:CredDatabase:$index
    index=`expr $index + 1`
    echo $index
done
index=0
NUM_SCHED=`objcall $SCHED contents| grep SchedulerDatabase| wc -l`
while [ $index -lt $NUM_SCHED ]
do
    objcall $SCHED rmattr BDBPG:SchedulerDatabase:$index
    index=`expr $index + 1`
done
wstartsched

11.5 Interconnected TMRs
Depending on the size and operational requirements of your organization, you may require more than one TMR. Creating multiple TMRs and interconnecting them provides more flexibility and improves system performance, administration, and security. Multiple servers reduce system and network load, allow you to organize TMRs by function or geographic location, and enable resource management to be divided among administrators of various security levels. Should something fail with one specific resource or TMR, the entire enterprise does not fail with it.


In order to understand the internals of TMR interconnections, you need to have a good understanding of Tivoli Name Registry, which we will describe next.

11.5.1 The Tivoli Name Registry
The Tivoli Name Registry (TNR) provides a fast, space-saving way to access objects in:
• Large server object databases
• Multiple connected TMRs with differing speed connections

Each TMR contains a name registry object. The name registry lists names of both local TMR and interregion objects. It resembles a table of contents for the Tivoli object database and all remote databases and contains only references to objects in those databases. All objects in the TMR that need to be referenced should be registered in the name registry when created, unregistered when deleted, and updated when their label is changed. Information from remote TMRs must be updated regularly to maintain accurate data. You can use the wlookup command to view the objects from remote TMRs. For a detailed description of the Tivoli Object Hierarchy and the Name Registry, see Chapter 2, “Tivoli Object Database architecture” on page 17.

Interconnected TMR Name Registry usage A Tivoli application can look at the local name registry to find the references to the remote resources. This removes the need for an expensive process to scan for a resource in each of the connected TMRs. Even though a remote TMR may not be available, a complete list of managed nodes can be retrieved locally. Example 11-14 on page 354 shows output from several commands displaying the connections and the local and remote resources in the Name Registry.


Example 11-14 Displaying the connections and the resources in the Name Registry
# wlsconn
MODE   NAME                                SERVER                       REGION
       rh0255b.itsc.austin.ibm.com-region  rh0255b.itsc.austin.ibm.com  1515280903
       tivdev02-region                     tivdev02                     1482082604
# odadmin region
Region      TME Srvr                      ipaddr     port
1360991896  rh0255c.itsc.austin.ibm.com   9.3.1.235  94    1360991896.1.0
1482082604  tivdev02.itsc.austin.ibm.com  9.3.1.134  94    1482082604.1.0
1515280903  rh0255b.itsc.austin.ibm.com   9.3.1.234  94    1515280903.1.0
# wls /Library/PolicyRegion
rh0255c.itsc.austin.ibm.com-region
Profiles-rh0255c
Queries-rh0255c
# wlookup -ar PolicyRegion
Profiles-rh0255c 1360991896.1.552#TMF_PolicyRegion::GUI#
Queries-rh0255c 1360991896.1.562#TMF_PolicyRegion::GUI#
TEC31Region 1515280903.1.673#TMF_PolicyRegion::GUI#
TEST-tivdev 1482082604.1.641#TMF_PolicyRegion::GUI#
Tivoli/Sentry Defaults-tivdev02-region 1482082604.1.596#TMF_PolicyRegion::GUI#
rh0255b.itsc.austin.ibm.com-region 1515280903.1.196#TMF_PolicyRegion::GUI#
rh0255c.itsc.austin.ibm.com-region 1360991896.1.196#TMF_PolicyRegion::GUI#
test-rh0255b 1515280903.1.536#TMF_PolicyRegion::GUI#
tivdev02-region 1482082604.1.195#TMF_PolicyRegion::GUI#
# wlookup -ar ManagedNode
k124a 1360991896.2.7#TMF_ManagedNode::Managed_Node#
rh0255a 1515280903.2.7#TMF_ManagedNode::Managed_Node#
rh0255b.itsc.austin.ibm.com 1515280903.1.327#TMF_ManagedNode::Managed_Node#
rh0255c.itsc.austin.ibm.com 1360991896.1.327#TMF_ManagedNode::Managed_Node#
rh0255e 1360991896.3.7#TMF_ManagedNode::Managed_Node#
rh0255f 1515280903.3.7#TMF_ManagedNode::Managed_Node#
tivdev02 1482082604.1.326#TMF_ManagedNode::Managed_Node#
# wlookup -ar TaskLibrary
T/EC Tasks 1515280903.1.675#TMF_Task::TaskLibrary#
tl-tivdev02 1482082604.1.633#TMF_Task::TaskLibrary#

Note the relationship between the region numbers in the result of the wlsconn command and the output from the wls and wlookup commands. A summary of interregion-related commands is shown in the next section.
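The relationship can be made visible with a short sketch that extracts the region number (the part of an object ID before the first dot) and counts registry entries per region. The function names are hypothetical, and the input is assumed to be in the "label OID" form that wlookup -ar prints in Example 11-14.

```shell
#!/bin/sh
# Sketch only: relate name registry entries to region numbers.
# oid_region and count_by_region are hypothetical names.

# Print the region number of one object ID.
oid_region() {
    echo "$1" | cut -f1 -d'.'
}

# Read "label OID" lines on stdin and print "region count" pairs.
# The OID is taken from the last field because labels can contain spaces.
count_by_region() {
    awk 'NF { split($NF, parts, "."); n[parts[1]]++ }
         END { for (r in n) print r, n[r] }' | sort
}
```

For example, wlookup -ar ManagedNode | count_by_region prints one line per region number, which you can compare with the REGION column of wlsconn.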


Summary of Name Registry commands
wls, wgetallinst, and wlookup all search and list resource information, but each command has its own purpose. For example, for the managed resource ManagedNode, you may get different results depending on which command you use.

The wls command lists the members of the selected collection (ManagedNode in this case):
# wls /Library/ManagedNode
monomoy

The wlookup command searches a resource's object information in the Tivoli Name Registry. If no type is specified, the default resource type is distinguished. If you use the wlookup command to retrieve the resource managed node, you get the following output:
# wlookup -ar ManagedNode
bass 1438246632.1.348#TMF_ManagedNode::Managed_Node#
hannover 1099910505.2.7#TMF_ManagedNode::Managed_Node#
mackeral 1438246632.5.7#TMF_ManagedNode::Managed_Node#
monomoy 1056974642.1.348#TMF_ManagedNode::Managed_Node#
salmon 1160615483.1.348#TMF_ManagedNode::Managed_Node#
stuttgart 1438246632.6.7#TMF_ManagedNode::Managed_Node#
trout 1099910505.1.348#TMF_ManagedNode::Managed_Node#

The output is different because wls only searches locally, whereas wlookup searches in the name registry that contains resources from interconnected TMRs (if updated). With the wgetallinst command, the output is the same as with wlookup in this case:
# wgetallinst -l ManagedNode
bass 1438246632.1.348#TMF_ManagedNode::Managed_Node#
hannover 1099910505.2.7#TMF_ManagedNode::Managed_Node#
mackeral 1438246632.5.7#TMF_ManagedNode::Managed_Node#
monomoy 1056974642.1.348#TMF_ManagedNode::Managed_Node#
salmon 1160615483.1.348#TMF_ManagedNode::Managed_Node#
stuttgart 1438246632.6.7#TMF_ManagedNode::Managed_Node#
trout 1099910505.1.348#TMF_ManagedNode::Managed_Node#

That is because wgetallinst is similar to the wlookup command. The difference is that wlookup displays only those resource types registered in the name registry, whereas wgetallinst displays both registered and unregistered resource types, including resource types that are supersets of other resource types.

Chapter 11. Tivoli Management Framework core services


For example, wgetallinst displays all instances of the Profileendpoint resource type, which includes instances of the ProfileManager, ManagedNode, and NisDomain resource types:

# wgetallinst Profileendpoint
bass
hannover
mackeral
monomoy
salmon
stuttgart
trout
bass
berlin
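The difference between the wls and wlookup listings shown earlier in this summary can also be checked mechanically. The following is a minimal sketch, not a Tivoli-supplied tool. It assumes that labels contain no white space, that the wls output holds one label per line, and that the wlookup -ar output holds the label in the first column, as in the listings above. The function name registry_only and the temporary file names are hypothetical.

```sh
#!/bin/sh
# registry_only LOCAL_FILE REGISTRY_FILE
#   LOCAL_FILE:    one label per line, e.g. saved from "wls /Library/ManagedNode"
#   REGISTRY_FILE: "label object-id" lines, e.g. saved from "wlookup -ar ManagedNode"
# Prints the labels that appear in the name registry listing but not in the
# local collection listing. (Hypothetical helper; the name is an assumption.)
registry_only() {
    awk '{ print $1 }' "$2" | sort -u | grep -v -x -F -f "$1"
}

# Typical use on a TMR server (skipped when the Tivoli commands are absent):
if command -v wls >/dev/null 2>&1 && command -v wlookup >/dev/null 2>&1; then
    wls /Library/ManagedNode > "/tmp/local.$$"
    wlookup -ar ManagedNode > "/tmp/registry.$$"
    registry_only "/tmp/local.$$" "/tmp/registry.$$"
    rm -f "/tmp/local.$$" "/tmp/registry.$$"
fi
```

With the listings above as input, the labels printed are the managed nodes that are known only through the name registry, that is, those that are not members of the local /Library/ManagedNode collection.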

11.5.2 Connecting TMRs

A secure or remote TMR connection can be made from the desktop or with the wconnect command. This section shows how to make the connection through the desktop. In 11.5.8, “Connections” on page 371, we cover the details of the command line method (wconnect) in the context of the hub-spoke TMRs scenario. The required role to connect TMRs is super.

Determine the following before making a TMR connection:
► The region name, region number, root or Administrator password, and interregion password for the remote TMR.
► Whether to use a one-way or two-way connection.
► Whether to use a secure or remote connection type.

Remote Connect means that you can initiate the entire connection from one machine. In this case, you must enter the password of the root administrator for the remote TMR Server. Figure 11-12 on page 357 details the Interregion Remote Connect dialog.


Troubleshooting Tivoli Using the Latest Features

Figure 11-12 Interregion Remote Connect Dialog

Secure connect means that you must initiate the connection from both TMRs. It is secure because you do not have to enter the password of the root administrator in the remote TMR, and that root password is, therefore, not sent over the wire. Figure 11-13 details the Interregion Secure Connect Dialog. Note that no trusted host login information is required in the dialog.

Figure 11-13 Interregion Secure Connect Dialog


Secure and remote connections

A secure connection does not require a remote login to connected TMRs. Instead, a connection request must be made on each communicating TMR. Each side does the following:
► Adds the remote TMR’s host name, region number, and encryption level to the interregion object (wlookup InterRegion). The interregion object keeps this data in an attribute called TMRs.
► Tries to communicate with the remote TMR.
► If communication is successful, adds the TMR name to the interregion object and exchanges resources.

A remote connection allows one TMR to provide all of the connection request information and make the connection. Remote connections perform the same steps as a secure connection, but rexec or rcmd is used to communicate with the remote TMR and start the connection there. To perform remote connections, the TMR making the connection must have the remote TMR Server’s root password or be a trusted host. Note that trusted host access cannot be used with Windows NT; there you must use TRIP with the shell service option. Because TRIP can be disabled after the initial install, you may need to check that TRIP is installed and running before making the TMR connection.

Note: Secure and remote are only methods of making a connection, not different types of connection. The only different types are one-way and two-way.

One-way and two-way connections

One-way and two-way connections determine the visibility of the interconnected TMRs’ resources. A one-way connection requires one of the TMRs to act as the manager of the other TMR’s resources. The managed TMR cannot view the managing TMR’s resources. A two-way connection allows each interconnected TMR to view all of the other’s resources, as long as they are updated regularly.

11.5.3 Resource visibility

Visibility describes available resources in the name registry or in collections. A collection is an object (an administrator’s desktop, a policy region, and so on) that can hold a list of references to other objects in the system. When an administrator opens a collection object, the member objects are displayed. Once TMRs have been connected, member objects could be located in a remote TMR. For example, a newly-created managed node object in TMR A can be visible in TMR B as soon as it is created, if it resides within a collection path that has a point visible in TMR B. This is true even though the new managed node will not be visible in the managed node resource type in the name registry in TMR B until the next update of resources from TMR A.

Name Registry visibility applies to:
► Available lists, as in Available Subscribers in a profile manager window.
► Resources listed with the wlookup command.
► Resources referenced from the command line with @Resource:instance.

Collection Path visibility applies to:
► Resources viewed by opening collections on the desktop. Collections here refer to any kind of a container object that is visible from the GUI.
► Resources managed through the file system-type commands (wls, wmv, and so on).
► Resources referenced from the command line with /xxx/yyy.

The following example shows screens from TMRs rh0255b and rh0255c. A query named FIND_NT has been created in TMR rh0255c. In TMR rh0255c, the query is in the local database, as confirmed with wls, and is also in the name registry, as shown with wlookup. In this case, only the queries in the local database are in the name registry. Another way to see this with wlookup is that all the object IDs start with the TMR number of rh0255c (see Example 11-15).

Example 11-15 wlookup from rh0255c
#wls -l /Library/Query
1360991896.1.565#TMF_Query::Query#   Find-AIX
1360991896.1.566#TMF_Query::Query#   Q-AIX-Cmdline
1360991896.1.567#TMF_Query::Query#   Q-AIX-Cmd
1360991896.1.576#TMF_Query::Query#   FIND_NT
#wlookup -ar Query
FIND_NT         1360991896.1.576#TMF_Query::Query#
Find-AIX        1360991896.1.565#TMF_Query::Query#
Q-AIX-Cmd       1360991896.1.567#TMF_Query::Query#
Q-AIX-Cmdline   1360991896.1.566#TMF_Query::Query#
#wruninvquery -l FIND_NT
rh0255e
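Because each object ID begins with the region number of the TMR that owns the object, the origin of the entries in a resource type can be tallied directly from the wlookup output. The following is a minimal sketch under that assumption, using the "label object-id" layout of Example 11-15. The function name count_by_region is hypothetical.

```sh
#!/bin/sh
# count_by_region: reads "label object-id" lines (the layout of "wlookup -ar"
# output) on stdin and prints "region-number count", one line per region.
# The region number is the part of the object ID before the first dot.
count_by_region() {
    awk 'NF >= 2 {
            split($NF, oid, ".")       # the object ID is the last field
            count[oid[1]]++
         }
         END { for (r in count) print r, count[r] }' | sort
}

# Typical use on a TMR server (skipped when wlookup is absent):
if command -v wlookup >/dev/null 2>&1; then
    wlookup -ar Query | count_by_region
fi
```

A single line of output means that every instance of the resource type is local. More than one line means that instances from other regions have been exchanged into the local name registry.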


In TMR rh0255b, wls shows that the only query local to this TMR is Get-AIX. The wlookup output shows that the FIND_NT query has not been added to the local name registry, because the resource type Query has not been exchanged since FIND_NT was created. Three other queries were previously exchanged with TMR rh0255c: Find-AIX, Q-AIX-Cmd, and Q-AIX-Cmdline. Therefore, the lookup for the FIND_NT query fails when you use wruninvquery from the command line:

Example 11-16 wrunquery fails (on rh0255b)
# wls -l /Library/Query
1515280903.1.773#TMF_Query::Query#   Get-AIX
# wlookup -ar Query
Find-AIX        1360991896.1.565#TMF_Query::Query#
Get-AIX         1515280903.1.773#TMF_Query::Query#
Q-AIX-Cmd       1360991896.1.567#TMF_Query::Query#
Q-AIX-Cmdline   1360991896.1.566#TMF_Query::Query#
# wgetquery -f FIND_NT
An instance named "FIND_NT" of resource "Query" was not found.
# wruninvquery -l FIND_NT
An instance named "FIND_NT" of resource "Query" was not found.

However, in TMR rh0255b, the query can be seen through the GUI by selecting Desktop -> TMR Connections -> Top Level Policy Regions..., opening the rh0255c-Queries PolicyRegion and then Q-test. The policy region containing the query library, Q-test, has been exchanged before, so the collection path is available to TMR rh0255b. If you run a distribution from the GUI and want to select from your subscribers using this query, it will run correctly. Figure 11-14 on page 361 details a GUI query.


Figure 11-14 Remote TMR can see Query in the GUI

In TMR rh0255b, you can select to update the resources from rh0255c. This can be achieved by selecting Desktop -> TMR Connections -> Update Resources..., or by using the wupdate command. Now you can see the FIND_NT query listed in the output from wlookup, as detailed in Example 11-17 on page 362. Using wruninvquery will also work.


Example 11-17 wrunquery successful (on rh0255b)
# wupdate -r Query rh0255c.itsc.austin.ibm.com-region
#
# wlookup -ar Query
FIND_NT         1360991896.1.580#TMF_Query::Query#
Find-AIX        1360991896.1.565#TMF_Query::Query#
Get-AIX         1515280903.1.773#TMF_Query::Query#
Q-AIX-Cmd       1360991896.1.567#TMF_Query::Query#
Q-AIX-Cmdline   1360991896.1.566#TMF_Query::Query#
#
# wgetquery -f FIND_NT
Name: FIND_NT
Description: Find machines with Windows_NT
RDBMS User: inventory
View: INVENTORYDATA
Fields: TME_OBJECT_ID TME_OBJECT_LABEL
Where Clause:
-------------------
(BOOTED_OS_NAME = 'Windows NT')
#
# wruninvquery -l FIND_NT
rh0255e
#
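The sequence in Examples 11-16 and 11-17 (lookup fails, update the resource type, lookup succeeds) can be wrapped in a small helper. This is only a sketch. It assumes that wlookup -r type name returns a nonzero exit status when the instance is not found, and the function name ensure_visible is hypothetical.

```sh
#!/bin/sh
# ensure_visible RESOURCE_TYPE LABEL REGION
# Looks LABEL up in the local name registry. If the lookup fails, pulls the
# resource type from REGION with wupdate and tries the lookup once more.
# Returns the exit status of the final lookup. (Hypothetical helper.)
ensure_visible() {
    if wlookup -r "$1" "$2" >/dev/null 2>&1; then
        return 0
    fi
    echo "$2 not in local name registry; updating $1 from $3" >&2
    wupdate -r "$1" "$3" || return 1
    wlookup -r "$1" "$2" >/dev/null 2>&1
}

# Example call, using the names from Example 11-17:
#   ensure_visible Query FIND_NT rh0255c.itsc.austin.ibm.com-region &&
#       wruninvquery -l FIND_NT
```

The helper updates only the one resource type that is needed, which keeps the name registry lock time short.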

11.5.4 Interregion updates and object time stamps

Each resource type in the name registry carries a time stamp. This time stamp is updated when:
► A new resource instance is added.
► An existing resource instance’s information changes.
► A resource instance is removed.

Each interregion object has a per-TMR/per-resource time stamp for the last time it received an update from a resource in a connected TMR. This time stamp is used to determine whether or not an update of a remote resource type is necessary. For example, in TMR B, the resource type ManagedNode has a time stamp. In connected TMR A, an interregion object keeps a time stamp that records the last time the resource type ManagedNode was updated in TMR B. If an administrator in TMR A requests an update of the ManagedNode resource type from TMR B, the interregion object in TMR A first checks the local time stamp against the time stamp in the name registry of TMR B. If the local time stamp is not older than the remote one, the resource is assumed to have not changed, and no exchange of data takes place. If the interregion time stamp in TMR A is older than the one in the name registry in TMR B, then the entire resource is updated. This means that all of the instances of the resource ManagedNode from TMR B are brought into TMR A and merged into the ManagedNode resource in the name registry in TMR A. A user can force an update of resources, regardless of the time stamp, with the wupdate -f option.

You can see the date and time stamp when a resource was last exchanged with another TMR. The command to use is wlsconn remote-regionname. In Example 11-18, we look at a partial listing of the TMR itso3 resources exchanged with TMR rh2900a. On rh2900a, we looked at the resources prior to issuing the wupdate command. Notice the date and time stamp on the Administrator resource of 11/19/98 07:11:55.

Example 11-18 Partial Listing of wlsconn
rh2900a: wlsconn Guyincharge
Name: Guyincharge
Server: itso3
Region: 1295714281
Mode: two_way
Port: 94

Resource Name   Last Exchange
-------------   -------------
TMF_Notice      11/09/97 03:22:34
Administrator   11/19/98 07:11:55
PolicyRegion    11/16/98 04:24:17
TaskLibrary     11/16/98 04:46:08

We then issued the wupdate -r Administrator Guyincharge command to update the Administrator resource. We had previously added two Tivoli Administrators in the Guyincharge region on itso3. Reissuing the wlsconn Guyincharge command gives the result shown in Example 11-19 on page 364.


Example 11-19 wlsconn after updating the Administrator resource
rh2900a: wlsconn Guyincharge
Name: Guyincharge
Server: itso3
Region: 1295714281
Mode: two_way
Port: 94

Resource Name   Last Exchange
-------------   -------------
TMF_Notice      11/09/97 03:22:34
Administrator   12/16/98 07:08:44
PolicyRegion    11/16/98 04:24:17
TaskLibrary     11/16/98 04:46:08
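When you only need the time stamp of one resource type, the Last Exchange column can be extracted from the wlsconn listing with a one-line filter. The following is a minimal sketch that assumes the two-column layout shown in Examples 11-18 and 11-19. The function name last_exchange is hypothetical.

```sh
#!/bin/sh
# last_exchange RESOURCE: reads "wlsconn <region>" output on stdin and prints
# the Last Exchange column for RESOURCE (no output if it is not listed).
last_exchange() {
    awk -v res="$1" '$1 == res { $1 = ""; sub(/^[ \t]+/, ""); print }'
}

# Typical use on a TMR server (skipped when wlsconn is absent):
if command -v wlsconn >/dev/null 2>&1; then
    wlsconn Guyincharge | last_exchange Administrator
fi
```

Running the filter before and after a wupdate gives a quick confirmation that the exchange actually took place.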

11.5.5 Resource updates

After establishing the connections, you have to update your resources with the wupdate command (you can do this with the -u flag of wconnect, but it is not recommended). This section explains how resources are classified in the Name Registry with respect to updates between interconnected TMRs. Each resource in the Name Registry carries a set of flags that affect how it is exchanged. A resource can be exchangeable, non-exchangeable, or custom:
► Exchangeable
  An exchangeable resource can be updated between TMRs. When an instance of an exchangeable resource is created in TMR A, and the resource type is updated in the interconnected TMR B, the newly-created resource from TMR A becomes visible in the name registry of TMR B. Exchangeable is the default for all resource types.
► Non-exchangeable
  A non-exchangeable resource cannot be updated between TMRs. This is for resources that need only be visible within a single TMR. Examples of non-exchangeable resources in the Tivoli Management Framework are distinguished, classes, presentation objects, and ActiveDesktopList objects.
► Custom
  A custom resource defines its own methods when exchanged between TMRs. If a resource’s custom flag is set, all instances of the resource receive a callback when updated. The resource that is implemented needs to support the InterRegionUpdate interface defined in the TMF_Application.idl file by providing an implementation for the update resource operation. This is documented in TME 10 ADE Framework Services, GC31-8348. There are three custom resource types shipped with the Tivoli Management Framework:
  – TaskRepository
  – AdministratorCollection
  – TopLevelPolicyRegion

To show the resource exchange status flags, use the script in Example 11-20.

Example 11-20 Resource exchange status script
#!/bin/sh
#
# Print the exchange status flag of every resource type in the Name Registry.
. /etc/Tivoli/setup_env.sh
#
TNR=`wlookup NameRegistry`
for resource in `wlookup -R`
do
    # The script reads the exchange flag from the fourth field of the idlcall output
    exchangeable=`idlcall $TNR TMF_TNR::Resource::get '"'$resource'"' | awk '{print $4}'`
    if [ "$exchangeable" = 0 ]
    then
        printf '%s\t\tnon-exchangeable\n' "$resource"
    elif [ "$exchangeable" = 1 ]
    then
        printf '%s\t\texchangeable\n' "$resource"
    elif [ "$exchangeable" = 2 ]
    then
        printf '%s\t\tcustom\n' "$resource"
    elif [ "$exchangeable" = 3 ]
    then
        printf '%s\t\texchangeable,custom\n' "$resource"
    fi
done
exit 0

The wupdate command updates resources in the local Name Registry from one or more remote Tivoli Management Regions (TMRs). When the wupdate command is run, resources are locked in the Name Registry. In some cases, the resource may already be locked, such as when another wupdate is running. The wupdate command will attempt to lock a resource for 60 seconds before timing out. The format is:

wupdate [-f] -r resource [-r resource...] TMRs...
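Because an update can time out while another wupdate holds the lock, a script that runs unattended may want to retry. The following is a minimal sketch of such a wrapper. It assumes that wupdate returns a nonzero exit status when it cannot obtain the lock, which is not stated above. The function name wupdate_retry and the variables MAX_TRIES and RETRY_DELAY are hypothetical.

```sh
#!/bin/sh
# wupdate_retry RESOURCE REGION
# Runs "wupdate -r RESOURCE REGION" and retries a few times when it fails,
# for example because another update still holds the name registry lock.
# MAX_TRIES and RETRY_DELAY (seconds) are hypothetical tuning knobs.
MAX_TRIES=${MAX_TRIES:-3}
RETRY_DELAY=${RETRY_DELAY:-120}

wupdate_retry() {
    try=1
    while [ "$try" -le "$MAX_TRIES" ]; do
        if wupdate -r "$1" "$2"; then
            return 0
        fi
        echo "wupdate of $1 from $2 failed (attempt $try of $MAX_TRIES)" >&2
        try=$((try + 1))
        # Pause only if another attempt will follow
        [ "$try" -le "$MAX_TRIES" ] && sleep "$RETRY_DELAY"
    done
    return 1
}

# Example call: wupdate_retry ManagedNode bass-region
```

The delay is deliberately longer than the 60-second lock timeout so that a running update has a chance to finish before the next attempt.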


Each inter-region object has a per-TMR/per-resource time stamp for the last time it received an update from an interconnected TMR. When issuing an update on a resource, the method first checks the local time stamp against the time stamp in the name registry of the remote TMR. If the local time stamp is not older than the remote one, the resource is assumed to have not changed, and no exchange of data takes place. With the -f flag, you can force an update regardless of the time stamp on the resource type. If you use ALL instead of a TMR name, the update is made with the corresponding resource from all connected TMRs.

Remember that the update of resources is a pull operation, which means you cannot push resources to another TMR. Therefore, if the remote TMR you are retrieving an update from contains resource types that your TMR does not have, you will have to register the resource type in your local TMR in order to see it. Register resources in the name registry with the wregister command:

wregister -i [-f n] -r resource_type
wregister [-i [-f n] -r resource_type] name object
wregister -u [-r resource_type] name

Table 11-6 lists the wregister flags.

Table 11-6 wregister flags

Option             Description
-i                 Initializes the resource cache. If this argument is not specified and the specified resource type does not already exist in the cache, wregister generates an error.
-f n               Specifies that the resource type being created should be non-exchangeable. Non-exchangeable resource types cannot be updated between connected Tivoli Management Regions (TMRs).
-r resource_type   Specifies the resource type of the resource to be registered. If omitted, the default resource type is distinguished.
-u                 Removes a resource from a resource type.
name               Specifies the name under which the resource is to be registered.
object             Specifies the object reference of the resource.

The script in Example 11-21 on page 367 automates the resource update and registration process by looking up each remote TMR resource in the local name registry. If it is not found, it registers it. After completing this process, it performs the update.


Example 11-21 Resource wregister and wupdate script
#!/bin/sh
. /etc/Tivoli/setup_env.sh
# Remote region to pull resources from
TMR=bass-region
# Resource types to register (if missing) and update
RESOURCEN="endpoint ManagedNode TMF_Notice TopLevelPolicyRegion PolicyRegion ProfileManager"
for RES in $RESOURCEN; do
    # A failed lookup means the resource type is not registered locally
    wlookup -ar $RES >/dev/null 2>&1
    if [ $? -ne 0 ]; then
        wregister -ir $RES
        echo "Resource: $RES was registered!"
    fi
    wupdate -r $RES $TMR
done
exit 0

You may have to modify the script to register and update only specific resources. You also need to modify the TMR name (you can make the TMR name dynamic in the script).
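One way to make the TMR name dynamic is to take the region names from the output of wlsconn. The following is a sketch only. It assumes that every data row of the plain wlsconn listing ends with the columns name, server, and region number, so the region name is the third field from the end regardless of what the MODE column contains. The function name connected_regions is hypothetical.

```sh
#!/bin/sh
# connected_regions: reads plain "wlsconn" output on stdin and prints the
# region name of every connection. Each data row is assumed to end with
# "name server region-number", so the name is the third field from the end.
# The header row is skipped because its last field is not a number.
connected_regions() {
    awk 'NF >= 3 && $NF ~ /^[0-9]+$/ { print $(NF - 2) }'
}

# Loop over all connected regions instead of using a fixed TMR name
# (skipped when the Tivoli commands are absent):
if command -v wlsconn >/dev/null 2>&1; then
    for TMR in `wlsconn | connected_regions`; do
        wupdate -r ManagedNode "$TMR"
    done
fi
```

If every connected region should be updated anyway, passing ALL to wupdate is simpler. The loop is useful when you want to pause between regions or skip some of them.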

11.5.6 Case study: Hub-spoke architecture

The TME physical topology is primarily determined by the underlying network topology and management system performance goals. In large network environments, we recommend deploying your TME using a hub-spoke architecture. In a hub-spoke architecture, the TME is segmented into several TMRs. Each TMR can be responsible for directly managing a different physical segment of the enterprise, serve a specific business unit, or be organized by security access levels. The central TMR that manages the other TMRs is called the “hub” and the TMRs it manages are called “spokes.”

Whether using a one-way or two-way connection between the hub and spoke TMRs, the hub TMR Server forms the central administration point from which all managed functions are performed within a TME. It is dedicated primarily to high-level management functions, such as creating administrator desktops and TEC consoles, creating, configuring, and distributing sentry profiles to spoke servers, and other hub-wide management activities.

Spoke TMRs provide the direct control function for all endpoints in the TME. Spoke regions can be used to group managed nodes by physical location in the network and to localize functions in order to improve network and system performance. Generally, Spoke TMRs are not used as entry points for administrators. Tivoli Administrators can use either the Hub TMR or any managed node strategically placed in the design as an entry point into the Tivoli Management Environment. Figure 11-15 illustrates the hub-spoke architecture.

Figure 11-15 Hub-spoke architecture (diagram labels: HUB TMR, TEC TMR, and two SPOKE TMR + Gateway boxes)

The TEC server can be configured either as a managed node contained within the Hub TMR Server or on a stand-alone TMR at the same management level as the Hub TMR Server. All managed systems (managed nodes, gateways, and endpoints) are spread out into the Tivoli environment beneath Spoke TMRs, as determined by function, server load, or physical network location. Managed nodes are still required in this environment. For example, managed nodes can be used to support remote Tivoli Administrators’ desktops or to serve for profile staging.


Endpoint gateways are installed throughout the Tivoli environment to host endpoints. In this TME hierarchy, all endpoint gateways are assigned to spoke TMRs only.

Recommendations

This section discusses recommendations for how to best design your hub-spoke TME.

Policy regions

Create policy regions based on Tivoli application (IBM Tivoli Configuration Manager, IBM Tivoli Monitoring, and so on) only on the Hub TMR Server. The subscriber policy regions then reside on the Spoke TMR Servers. The subscribed policy regions contain the profile managers used for distributing to endpoints. Organizing your policy regions in this manner enables the hub server to be the central point of operations for each application and associated functions. This also avoids subscribing endpoints across TMR boundaries. If an endpoint is subscribed across TMR boundaries, a new object is created in the object database and the wchkdb command must track the object directly, causing unnecessary transactions across TMR boundaries and server load. Instead, if endpoints stay subscribed to their local TMR, the hub and spoke TMRs only need to exchange resources, causing only an entry in the Tivoli Name Registry (TNR) on the Hub TMR. The Check Name Registry method does not require objects to be tracked across TMR boundaries. See Figure 11-16 on page 370 for more details.


Figure 11-16 Subscription example in a hub-spoke model (diagram labels: Hub TMR, Spoke TMR, DM_Appl.HUB.PR, WinNT_DM_Appl.HUB_PR, Win98_DM_Appl.HUB_PR, Subscribers_Spoke.PR, Subscribers_WinNT_Spoke.PR, Subscribers_Win98_Spoke.PR, Subscribers_WinNT_Spoke_PM)

Resource updates in hub and spoke architecture

Although the hub and spoke TMRs need to maintain current resource data with each other, updating resources too often can create server and network load, because each update takes time. Also avoid updating more than one TMR Server at the same time, as this can cause transaction locking errors. The update method write-locks the resources in the Name Registry that are being updated, and read-locks resources in remote TMRs. If an update method runs for any significant amount of time, other methods are locked out of portions of the Name Registry until it completes. This can result in severe performance problems. Wait at least 30 minutes between updates, depending on the number of resources that will be exchanged.
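The two recommendations above (never update more than one TMR at a time, and leave a gap between updates) can be enforced by a small wrapper around wupdate. The following is a minimal sketch, not a Tivoli-supplied tool. The function name update_regions, the lock directory LOCK_DIR, and the variable UPDATE_GAP are hypothetical.

```sh
#!/bin/sh
# update_regions RESOURCE REGION...
# Updates RESOURCE from each REGION in turn, never in parallel, and pauses
# between regions. A lock directory keeps a second copy of the script from
# starting while one is running. LOCK_DIR and UPDATE_GAP are assumptions.
LOCK_DIR=${LOCK_DIR:-/tmp/tmr_update.lock}
UPDATE_GAP=${UPDATE_GAP:-1800}      # seconds; the 30 minutes suggested above

update_regions() {
    resource=$1
    shift
    # mkdir is atomic: it fails if another run already holds the lock
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "another update is running ($LOCK_DIR exists)" >&2
        return 1
    fi
    status=0
    while [ $# -gt 0 ]; do
        wupdate -r "$resource" "$1" || status=1
        shift
        # Pause only if another region is still to be updated
        [ $# -gt 0 ] && sleep "$UPDATE_GAP"
    done
    rmdir "$LOCK_DIR"
    return $status
}

# Example call on the hub: update_regions ManagedNode trout-region salmon-region
```

The lock only protects against a second copy of this script on the same machine. It does not know about updates started from the desktop or from another TMR Server.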

Event server

In an interconnected TMR design, the Event Server and the RIM host must be in the same Tivoli Management Region (TMR). Depending on your interconnection needs, you might only deploy either local consoles or remote consoles. If you need remote consoles, there must be a two-way connection in order to allow the remote consoles to connect to the event server.


11.5.7 Naming standards

Use naming standards that include the TMR name in all resource object labels. Resource objects with the same name in multiple TMRs can cause conflicts. Upon updating resources between interconnected TMRs, the Name Registry appends the TMR name to duplicate labels. Also, once updated, the Name Registry prevents you from creating resource objects with the same name. However, this may not be obvious to an operator, making it easy to select the wrong object for an operation. Not using unique names can also create problems with your scripts when they find duplicate resources in one or more TMRs.
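Before a resource type is exchanged for the first time, you can check for labels that exist in both TMRs. The following is a minimal sketch that compares two saved wlookup -ar listings, one from each TMR. It assumes the "label object-id" layout shown in this chapter and labels without white space. The function name label_collisions and the temporary file names are hypothetical.

```sh
#!/bin/sh
# label_collisions FILE_A FILE_B
# Each file holds "label object-id" lines saved from "wlookup -ar <type>" in
# one TMR. Prints the labels that occur in both files.
label_collisions() {
    awk '{ print $1 }' "$1" | sort -u > "/tmp/labels_a.$$"
    awk '{ print $1 }' "$2" | sort -u > "/tmp/labels_b.$$"
    # comm -12 keeps only the lines common to both sorted lists
    comm -12 "/tmp/labels_a.$$" "/tmp/labels_b.$$"
    rm -f "/tmp/labels_a.$$" "/tmp/labels_b.$$"
}

# Example: run "wlookup -ar Query > query.hub" on the hub and
# "wlookup -ar Query > query.spoke" on the spoke, then:
#   label_collisions query.hub query.spoke
```

The comparison is meaningful only for listings taken before the exchange, because afterwards each name registry also contains the other region's instances.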

11.5.8 Connections

In this section, we provide the commands and some scripts you can use to connect, disconnect, and view your interconnections. In this scenario, bass is our hub TMR and trout and salmon are spoke TMRs. To connect your TMR, you can use the wconnect command as follows:

wconnect [-u][-c local_tmr_encrypt_level][-l login][-m Two-way | Managing | Managed][-n][-r remote_tmr_encrypt_level] remote_server

For example:

wconnect -n -l root -r simple -m two-way bass

This example connects the local TMR with the TMR bass in a two-way connection, as user root, with no prompt for a password (because root in this case has trusted host access). The encryption level for both TMRs is simple (unless specified, the command uses the default of simple for both -c and -r).

Note: We do not recommend using the -u flag to exchange resources between connected TMRs, because you will not need every resource from one TMR on the connected TMR. Do the exchange after the connection is established, and only for the resources you want exchanged.
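The advice in the note above (connect without -u, then exchange only the resources you need) can be sketched as a single function. This is an illustration only. It assumes trusted host access for root, as in the example above, and uses the mode keyword as spelled in the syntax line. The function name connect_and_update is hypothetical.

```sh
#!/bin/sh
# connect_and_update REMOTE_SERVER REGION RESOURCE...
# Makes a two-way remote connection without -u, then pulls only the named
# resource types from REGION. Assumes trusted host access for root, so
# that -n can be used. (Hypothetical helper.)
connect_and_update() {
    server=$1
    region=$2
    shift 2
    wconnect -n -l root -m Two-way "$server" || return 1
    for res in "$@"; do
        wupdate -r "$res" "$region" || echo "update of $res failed" >&2
    done
}

# Example call on a spoke, with bass as the hub:
#   connect_and_update bass bass-region ManagedNode PolicyRegion ProfileManager
```

Listing the resource types explicitly keeps the first exchange small and documents which resources the two regions are expected to share.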

All the wconnect flags are described in detail in Table 11-7 on page 372.


Table 11-7 wconnect flags

Option             Description
-n                 Instructs Tivoli not to prompt for passwords. You can use this argument only when you have trusted host access and do not require an encryption key. Note: Because Windows NT does not have trusted host access, this argument cannot be used when connecting to Windows NT TMR servers.
-u                 Updates resources between TMRs in a two-way connection or from the remote TMR in a one-way connection.
-c encrypt_level   Specifies the inter-region encryption level that is in use in the local TMR. Valid encryption levels are simple, none, or DES. If the encryption level is none, no key is required. If the level is simple or DES, the command prompts for the key in use. The encryption key is the same as the installation password specified during the server installation. The encryption level must be the same as that specified during the TMR Server installation. The default encryption level is simple.
-l login           Supplies a login name for the remote connection process. This login must be a valid login for a user on the remote server, and the user must have a Tivoli administrator with the super role defined in the remote TMR. If the -l argument is specified, the command prompts for a password. If the trusted hosts facility is used, do not type the password. Instead, press the Enter key to continue.
-m mode            Specifies the mode of connection to be established between TMRs. Valid connection modes are as follows:
                   ► Two-way: Establishes a two-way connection between TMRs. In a two-way connection, both TMR Servers have managing authority over the resources in both TMRs. This is the default value.
                   ► Managing: Establishes a one-way connection with the local TMR Server as the managing server. The local TMR Server can manage resources in the remote TMR, but the remote TMR Server cannot manage resources in the local TMR.
                   ► Managed: Establishes a one-way connection with the local TMR Server as the managed server. This option is valid on secure connections only. For remote connections, the local server can be only a managing server.
-r encrypt_level   Specifies the encryption level in use in the remote TMR. Valid encryption levels are simple, none, or DES. If the encryption level is none, no key is required. If the level is simple or DES, the command prompts for the key in use. The encryption key is the same as the installation password specified during the server installation. The encryption level must be the same as that specified during the TMR Server installation. The default encryption level is simple.
-p                 Specifies the port number to use for communication with the TMR Server if different from the local object dispatcher's port. Note: This option is meant for use in development and test environments only. It should not be used in a production environment.
-s                 Establishes the connection using the secure connection process. This requires running wconnect -s on both TMR Servers in the connection.
server             Specifies the name of the TMR Server in the remote TMR.

After you have established your connections, you can view them with the wlsconn command:

wlsconn TMR
wlsconn Region -u

If you use wlsconn without any option, you get all current connections with their mode, region name, server, and region number:

# wlsconn
MODE   NAME            SERVER   REGION
       bass-region     bass     1438246632
       salmon-region   salmon   1160615483
       trout-region    trout    1099910505

If you use wlsconn with the TMR name, you get the following information:

# wlsconn bass-region
Name: bass-region
Server: bass
Region: 1438246632
Mode: two_way
Port: 94

Resource Name    Last Exchange
-------------    -------------
endpoint         Tue Feb 20 11:33:29 CST 2001
ManagedNode      Mon Jan 29 09:54:31 CST 2001
TMF_Notice       Thu Feb 15 16:39:20 CST 2001
PolicyRegion     Sat Feb 17 17:30:39 CST 2001
ProfileManager   Sat Feb 17 17:31:57 CST 2001

If the -u argument is specified, wlsconn completes the exchange of connection information between the local TMR and the connected remote TMR. In this case, the output is no different from that of the wlsconn bass-region command:

# wlsconn bass-region -u
Name: bass-region
Server: bass
Region: 1438246632
Mode: two_way
Port: 94

Resource Name    Last Exchange
-------------    -------------
endpoint         Tue Feb 20 11:33:29 CST 2001
ManagedNode      Mon Jan 29 09:54:31 CST 2001
TMF_Notice       Thu Feb 15 16:39:20 CST 2001
PolicyRegion     Sat Feb 17 17:30:39 CST 2001
ProfileManager   Sat Feb 17 17:31:57 CST 2001

You can also use the odadmin command to retrieve information about local and remote TMRs registered by the oserv:

# odadmin region
Region       TME Srvr                     ipaddr        port
1438246632   bass.itsc.austin.ibm.com     9.3.187.155   94     1438246632.1.0
1160615483   salmon.itsc.austin.ibm.com   9.3.187.157   94     1160615483.1.0
1099910505   trout.itsc.austin.ibm.com    9.3.187.156   94     1099910505.1.0

Note: Certain commands may return different results depending on whether they are run from the Hub TMR or Spoke TMR. The information returned from a Spoke TMR may not match that of another Spoke TMR if the same resources have not been exchanged between all the Spoke TMRs and the Hub TMR.

11.5.9 Troubleshooting TMR connections

Some useful commands for troubleshooting connection problems are:
► wdisconn
  Disconnects a TMR. You can specify a region name, or use the -r option to specify the region number. If you do this from the Tivoli Desktop by selecting TMR Connections -> Disconnect..., you can only select a TMR by name.
► wdisconn -s
  Disconnects only one side of the TMR connection. The -r option can be used for the region number if required.
► wlsconn
  Lists the current connections. The desktop option is TMR Connections -> List Connections....
► wupdate
  Exchanges resources between TMRs. Select TMR Connections -> Update Resources... at the desktop.
► odadmin region
  Lists all the regions recognized as currently available by the oserv. This includes the local region. The command is only available from the command line. Useful options are:
  – add_region
    Adds a region; useful when a connection has only been partially made because a failure occurred.
  – delete_region
    Deletes a region; useful for cleaning up partially-completed or failed connections.

A TMR’s name used for a connection is the same as the default policy region created during the installation of the TMR server. After disconnecting TMRs, you should always run wchkdb -ux to clean up any invalid references.

Common problems

This section contains advice on problems you may encounter when managing interconnected TMRs. Many of these problems are easier to explain with reference to a diagram, so the following problems will refer to Figure 11-17 on page 376.


Figure 11-17 Two-way connected TMRs (diagram labels: TMR A (Primary TMR), TMR B, 2 way connection, Gateway 1, Endpoint Group 1, Gateway 2, Endpoint Group 2)

Unable to connect previously-connected TMRs

Issue the wlsconn command on TMR A to see if there is still a connection to TMR B. Log in to TMR B’s server and do the same. If only TMR B shows that the connection is active, issue the wdisconn command with the -s flag from TMR B.

Example 11-22 Using the wdisconn command to disconnect TMRs
# wlsconn
MODE   NAME                                 SERVER                         REGION
       rh0255c.itsc.austin.ibm.com-region   rh0255c.itsc.austin.ibm.com    1360991896
       tivdev02-region                      tivdev02.itsc.austin.ibm.com   1482082604
# wdisconn -s tivdev02-region
# wlsconn
MODE   NAME                                 SERVER                         REGION
       rh0255c.itsc.austin.ibm.com-region   rh0255c.itsc.austin.ibm.com    1360991896
# wchkdb -ux


It is advisable to run wchkdb -ux -o filename after the disconnect has completed. The -o option stores references to problem objects in a file. Subsequent checks can then use that file as input rather than rechecking every object. If errors occurred during the check, you can try taking some corrective actions before running wchkdb again. This time, however, replace -o with -f. This speeds up the check by looking only at the objects listed in the file. The next wchkdb will show if the errors persist. There may be occasions where simply running the check more than once is enough to resolve the errors. This is the format of wchkdb to use when specifying an input file:

wchkdb -ux -f filename
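The two-pass check described above can be sketched as follows. The sketch assumes that the file written by -o is empty or absent when no problem objects were found, which is not stated above, and it leaves any corrective action between the two passes to the administrator. The function name recheck_db and the file name in the example call are hypothetical.

```sh
#!/bin/sh
# recheck_db FILE
# First pass: full check with the -ux options, saving references to problem
# objects in FILE (-o). Second pass: recheck only the objects in FILE (-f).
# The second pass is skipped when FILE is empty or missing (an assumption
# about how wchkdb writes the file).
recheck_db() {
    wchkdb -ux -o "$1"
    if [ -s "$1" ]; then
        echo "problem objects recorded in $1; rechecking them" >&2
        wchkdb -ux -f "$1"
    fi
}

# Example call after a disconnect:
#   wdisconn -s tivdev02-region && recheck_db /tmp/wchkdb.objects
```

If the second pass still reports errors, keep the file and rerun only the -f form after each corrective action.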

Unable to disconnect a TMR

If you are unable to disconnect a TMR, either both TMRs or just one will show the connection.

• Both TMRs show a connection

If disconnecting TMRs fails (see Figure 11-18), check that both TMRs are showing a connection by selecting TMR Connections -> List Connections... or by using the wlsconn command.

Figure 11-18 TMR disconnect failed

If TMR B does not have the TMR A region listed in the output from the odadmin region command, use the following command on TMR B:

odadmin region add_region region_# regionname oserv_port encryption_level

Where:

region_#          The region number of TMR A.
regionname        The name of the TMR A region.
oserv_port        The port number that oserv is running on in TMR A (default is 94).
encryption_level  The level of encryption used for the TMR connection. The default is simple.

You will also be prompted for a password, which should be the same encryption password that you used for the initial connection. Once you have added the region manually, you can reissue the wdisconn command from TMR B (see Example 11-23).

Example 11-23 wdisconn

# wlsconn
MODE NAME SERVER REGION
     rh0255b.itsc.austin.ibm.com-region rh0255b.itsc.austin.ibm.com 1515280903
     tivdev02-region tivdev02 1482082604
# odadmin region
Region TME Srvr ipaddr port
1360991896 rh0255c.itsc.austin.ibm.com 9.3.1.235 94 1360991896.1.0
1515280903 rh0255b.itsc.austin.ibm.com 9.3.1.234 94 1515280903.1.0
# odadmin region add_region 1482082604 tivdev02.itsc.austin.ibm.com 94 simple
Remote region key:
# odadmin region
Region TME Srvr ipaddr port
1360991896 rh0255c.itsc.austin.ibm.com 9.3.1.235 94 1360991896.1.0
1482082604 tivdev02.itsc.austin.ibm.com 9.3.1.134 94 1482082604.1.0
1515280903 rh0255b.itsc.austin.ibm.com 9.3.1.234 94 1515280903.1.0
# wlsconn
MODE NAME SERVER REGION
     rh0255b.itsc.austin.ibm.com-region rh0255b.itsc.austin.ibm.com 1515280903
     tivdev02-region tivdev02 1482082604
# wdisconn tivdev02-region
# wlsconn
MODE NAME SERVER REGION
     rh0255b.itsc.austin.ibm.com-region rh0255b.itsc.austin.ibm.com 1515280903
# odadmin region
Region TME Srvr ipaddr port
1360991896 rh0255c.itsc.austin.ibm.com 9.3.1.235 94 1360991896.1.0
1515280903 rh0255b.itsc.austin.ibm.com 9.3.1.234 94 1515280903.1.0
#

The sample screen shows that:

– After running wlsconn, the tivdev02-region is connected.
– The oserv does not know about the resource when odadmin region is entered.
– The tivdev02-region is added manually using odadmin region add_region.
– odadmin region now agrees with the wlsconn command: tivdev02-region is connected, and oserv knows it.
– The wdisconn tivdev02-region command is issued to disconnect the TMRs.
– wlsconn and odadmin region now agree that tivdev02-region has been disconnected.

• Only one TMR shows a connection

If only one side shows a connection, then you can try a one-sided disconnect from the TMR where the connection appears active, as shown below:

wdisconn -s region-name
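The decision between a normal and a one-sided disconnect can be sketched in a few lines of Python. This is only an illustration, assuming you have captured the text printed by wlsconn on each TMR and that each data row ends with the NAME, SERVER, and REGION columns as in Example 11-22. The function names connected_regions and disconnect_plan are hypothetical helpers, not part of Tivoli.

```python
def connected_regions(wlsconn_output):
    """Return the region names listed in captured wlsconn output.

    Assumption: every data row ends with NAME, SERVER and a numeric REGION,
    optionally preceded by a MODE value, as in Example 11-22.
    """
    regions = set()
    for line in wlsconn_output.splitlines():
        fields = line.split()
        # Skip blank lines, shell prompts and the column header.
        if len(fields) < 3 or fields[0] == "MODE" or fields[0].startswith("#"):
            continue
        name, _server, region = fields[-3:]
        if region.isdigit():
            regions.add(name)
    return regions


def disconnect_plan(a_output, b_output, a_region, b_region):
    """Suggest which form of wdisconn fits what each side reports."""
    a_sees_b = b_region in connected_regions(a_output)
    b_sees_a = a_region in connected_regions(b_output)
    if a_sees_b and b_sees_a:
        return "both sides show the connection: wdisconn " + b_region
    if a_sees_b:
        return "only TMR A shows the connection: run wdisconn -s " + b_region + " on TMR A"
    if b_sees_a:
        return "only TMR B shows the connection: run wdisconn -s " + a_region + " on TMR B"
    return "neither side lists the connection"
```

The sketch only reads text that you supply; it does not run any Tivoli command itself, so the commands it suggests still need to be checked against the symptoms described above.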

Unable to see the remote resources from your TMR

To ensure that both TMR Servers reflect that there is a connection, use the wlsconn command or select TMR Connections -> List Connections.... Have the relevant resources been updated across the TMR connection since they were added or deleted in their home TMR? Updates can be scheduled or done manually using the GUI (select TMR Connections -> Update Resources...) or from the command line with wupdate (or wupdate -f to update only changed resources based on time stamps). See the section on resource updates in the Tivoli Management Framework User’s Guide Version 4.1, GC32-0805, for details.


Unable to perform actions on remote objects

Does your administrator user ID have the authority to operate on objects in policy regions in the remote TMR? This can be achieved in two ways:

1. Give the administrator the correct level of access across the policy regions in the remote TMR. Update the TMR Roles option when updating or creating an administrator.
2. Select specific roles for selected resources using the Resource Roles option when creating or updating an administrator (Figure 11-19).

Figure 11-19 Update Resource Roles for an administrator

You cannot update the roles for resources that you cannot manage yourself or update the resource roles of an administrator that has higher privileges.


Only one TMR is seeing the shared resources

Remember that the exchange of resources is a pull operation. The TMR not seeing any remote resources has probably not issued a wupdate to the other TMR. If you have a one-way connection, only the managing TMR will see the remote resources. The managing TMR is either the one selected with the Manager option during a secure connection or the one from which the remote connect was started.

An application fails across TMR boundaries

Check that you have a two-way connection. Some applications that run across a connection need to return information to the TMR from which they were started. If the process and information can only flow one way, the application may not function correctly. Check that the application is installed on both TMR A and TMR B plus all the managed and PC managed nodes. For example, running an Inventory scan on TMR A for resources in TMR B requires that the product be installed on both TMR Servers and all of the machines you wish to scan in both TMRs.

Installation fails across TMR connections

The installation of applications is not supported over TMR connections. Unfortunately, the standard installation process does not check this, and it will allow the installation to start.

Both TMRs in a one-way connection are set to Manager

Go to the TMR that should be managed and do a one-sided disconnect:

wdisconn -s region-name

Once the disconnect is complete, you can reissue the connection request with the correct settings.

11.6 Multiplexed Distribution

Multiplexed Distribution (MDist) is a core Tivoli Management Framework service that distributes large volumes of data to multiple recipients. To do this, it uses a fan-out structure, whereby a node receives a data stream and forwards it to multiple recipients, which in turn can forward it to other nodes. This fan-out node is called a repeater. Repeaters are used when the data being sent is greater than 16 KB. There is only one case in which a repeater is never used in a distribution: when the distribution is less than 16 KB and goes to only one target. This case uses oserv-to-oserv communication. When the data is greater than 16 KB, the technique used for delivery is called Bulk Data Transfer (BDT), which in turn uses a form of data communication called Inter Object Messaging (IOM).

There are currently two versions of MDist. Tivoli Management Framework Version 3.6.x and below provides only MDist1, while Tivoli Management Framework Version 3.7 and above provides both MDist1 and MDist2. The following sections discuss MDist2 in detail, emphasizing the differences between MDist2 and the previous version of MDist.

11.6.1 MDist and the distribution hierarchy

A repeater functions by having a list of end targets, the name and arguments of a CORBA method to be executed on the end target, and an input stream. Exactly the same data stream is sent to each end target. The process called rpt performs this function. The status of a distribution is reported only when the last end target has received its data stream; that is when the log and/or the Notices are updated. This is a synchronous distribution. Distribution flow is coordinated from the TMR by the repeater manager (rptm). This process also holds the configurations of all installed repeaters. Repeaters are configured with a single command, wrpt.

Data that comes to a repeater will be on one thread (the In Spool thread), while data that is sent to the end targets will be on other threads (the Out Spool threads). A gatelog at debug level 9 shows this relationship as well as other repeater information. The annotations in the original listing mark the tuning parms block (mem_max to stat_intv) as the repeater tuning values, the Opening cache file line as the location of the swap file, and the first Finished out_spool line as the results.

Example 11-24 Information related to repeater configuration in the gatelog

gwcache: hit key=
gwcache: hit key=
mdist: Registering repeater Manager: 1670943514.1.365
mdist: TMF_rptm_mgr::rpt_register called, tuning parms:
mdist: mem_max = 10000
mdist: disk_max = 50000
mdist: disk_hiwat = 50000
mdist: disk_time = 1
mdist: disk_dir = c:/Tivoli/db/win-inv01a.db/tmp/
mdist: net_load = 500
mdist: max_conn = 100
mdist: stat_intv = 180
mdist: Opening cache file: c:\Tivoli\db\win-inv01a.db\tmp\pmap2
sched: got a job
sched: got a job
sched: got a job
sched: got a job
mdist: in_spool_thread started: TID = 12272d0
mdist: out_spool_thread started: tid = 122de68 client = [1670943514.7.508+#TMF_endpoint::endpoint#]
mdist: out_spool_thread started: tid = 122dd08 client = [1670943514.8.508+#TMF_endpoint::endpoint#]
mdist: out_spool_thread started: tid = 122db50 client = [1670943514.9.508+#TMF_endpoint::endpoint#]
…..
new_session: 11a262c3, connecting to 172.24.1.15+8898...
reader_thread: received data: session=11a262c1, type=9, len=52
mdist: in_spool_thread finished: TID = 12272d0
reader_thread: received data: session=11a262c3, type=9, len=52
reader_thread: received data: session=11a262c1, type=5, len=116
destroying session 11a262c1
mdist: Finished out_spool to 1670943514.7.508+#TMF_endpoint::endpoint#
mdist: Result length for 1670943514.7.508+#TMF_endpoint::endpoint# = 42
mdist: out_spool_thread finished: tid = 122de68 client = [1670943514.7.508+#TMF_endpoint::endpoint#]
mdist: results collected, ncomplete = 1 nactive = 2
reader_thread: received data: session=11a262c2, type=9, len=52
reader_thread: received data: session=11a262c3, type=5, len=116
destroying session 11a262c3
TRUNCATED...
destroying session 11a262c2
mdist: Finished out_spool to 1670943514.8.508+#TMF_endpoint::endpoint#
mdist: Result length for 1670943514.8.508+#TMF_endpoint::endpoint# = 42
mdist: out_spool_thread finished: tid = 122dd08 client = [1670943514.8.508+#TMF_endpoint::endpoint#]
mdist: results collected, ncomplete = 3 nactive = 0
mdist: Distribution (2) finished.
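When a gatelog is long, pairing the out_spool_thread started and finished lines by thread ID is a quick way to spot targets whose Out Spool thread never reported completion. The following is a minimal sketch, assuming the log lines have the shape shown in Example 11-24; the function name unfinished_out_spools is hypothetical.

```python
import re

# Patterns modelled on the mdist lines in Example 11-24 (an assumption about
# the exact wording of other gatelog versions).
STARTED = re.compile(
    r"mdist: out_spool_thread started: tid = (\w+)\s+client = \[(.+?)\]")
FINISHED = re.compile(r"mdist: out_spool_thread finished: tid = (\w+)")


def unfinished_out_spools(log_text):
    """Map thread ID -> client for Out Spool threads with no 'finished' line."""
    pending = {}
    for match in STARTED.finditer(log_text):
        pending[match.group(1)] = match.group(2)
    for match in FINISHED.finditer(log_text):
        pending.pop(match.group(1), None)
    return pending
```

A thread that appears in the result is not necessarily hung. In a truncated log the finished line may simply have been cut, so treat the output as a list of targets worth a closer look.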

Repeater configuration and placement

The location and configuration of repeaters should be based upon the network topology, in terms of network speeds and network hardware (repeaters, bridges, and so on), and the structure of the organization, in terms of organizational units. When a TMR is installed, the TMR Server automatically becomes the first repeater in the hierarchy of the organization. Running the wrpt command will show the following for a TMR called win-tmr01a:

win-tmr01a [1] wd- []

The first element is the host name, followed by the dispatcher number in brackets, then the repeater flags. In this case, the w and d flags are set. Table 11-8 on page 384 shows the repeater flags and what they represent. To set a flag, use the wrpt command.


Table 11-8 Repeater flags

Flag         Description
Default (d)  This flag should be set for only one repeater per TMR. The repeater with the default flag is the one that services all repeaters that are not part of another repeater's range. The TMR Server is the initial default repeater.
Nodefault    Turns off the default flag.
Wan (w)      When a repeater has this flag enabled, that repeater is the machine used for all TMR to TMR profile distributions. This setting is the WAN entry point, not the WAN exit point. It determines what machine receives distributions into that TMR, not what machine is used to exit that TMR.
Noalways     Turns off the always flag.
Nowan        Turns off the WAN flag.
Always (a)   When a repeater is set to always, it is always used during a distribution that goes to one of its targets. This flag is used in cases such as firewalls and WAN connections or slow links, where the repeater that has the always flag has tuning parameters that are designed to reduce network load.
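To check flags across many repeaters, a line of wrpt output can be split into its parts with a short script. The sketch below assumes the line layout shown above (host name, dispatcher number in brackets, a three-character flag field, then a bracketed range) and decodes the flag letters from Table 11-8. The function name parse_wrpt_line is hypothetical.

```python
import re

# Layout assumed from the sample line "win-tmr01a [1] wd- []".
WRPT_LINE = re.compile(r"^(\S+)\s+\[(\d+)\]\s+(\S{3})\s+\[(.*)\]\s*$")

# Flag letters as listed in Table 11-8.
FLAG_NAMES = {"w": "wan", "d": "default", "a": "always"}


def parse_wrpt_line(line):
    """Split one wrpt output line into host, dispatcher number and flags."""
    match = WRPT_LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized wrpt line: " + repr(line))
    host, dispatcher, flags, _ranges = match.groups()
    return {
        "host": host,
        "dispatcher": int(dispatcher),
        "flags": sorted(FLAG_NAMES[c] for c in flags if c in FLAG_NAMES),
    }
```

For the sample line, the result reports host win-tmr01a, dispatcher 1, and the default and wan flags; a line such as aix-inv01b [23] --- [] yields an empty flag list.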

Using Always and Wan options for interconnected TMRs

As other repeaters are created, a hierarchical tree is constructed in which repeaters service other repeaters. This is done through the configuration of ranges. Figure 11-20 on page 385 shows the wrpt command output with the Always option.


Figure 11-20 Always flag setting example

Figure 11-21 on page 386 shows an example for repeater configuration for interconnected TMRs where the WAN option is implemented.


Figure 11-21 Using WAN option for interconnected TMRs

Example 11-25 shows the repeater configuration we used to interconnect the Windows 2000 and AIX TMRs in our scenarios. The wlsconn command shows that we have a two-way interconnection between the AIX and Windows 2000 TMRs.

Example 11-25 Repeater configuration for interconnected TMRs

# wlsconn
MODE NAME SERVER REGION
     win-tmr02a-region win-tmr02a 1831081883
# wrpt
aix-tmr1b [1] wd- []
aix-inv01b [23] --- []
itsodev1 [26] --- []
win-tmr02a [1] wd- []
3c056 [6] --- []
# wrpt -q aix-tmr1b aix-inv01b itsodev1 win-tmr02a 3c056
--[RPT:aix-tmr1b [1]]
  |--aix-inv01b [23]
  |--itsodev1 [26]
  |--[RPT:win-tmr02a [1]]
  |  |--win-tmr02a [1]
  |  |--3c056 [6]


For endpoints, there are a couple of commands that can be used to find information. The first is wrpt -ge, which lists the repeaters by their OIDs together with their endpoint ranges. For the configuration above, the output for the endpoints would look like Example 11-26.

Example 11-26 Endpoint ranges with wrpt -ge

# wrpt -ge
1375372617.1.578 wan default ep_range=4-10,12-14,19-22,35,38-40,42,51-54,56,59,62-65,69
1375372617.23.7
1375372617.26.19 ep_range=68,70-71
1831081883.1.577 wan default ep_range=3-4,7,9-10
1831081883.6.21 ep_range=8
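The ep_range values are compact lists of numbers and ranges, so finding which repeater covers a given endpoint by eye is error-prone. The sketch below expands a range string and looks up the covering repeater. It assumes the numbers in ep_range are endpoint dispatcher numbers within the region named by the first part of the repeater OID, which matches the examples in this section but is not stated by the command output itself. The function names are hypothetical.

```python
def parse_ep_range(spec):
    """Expand a string such as '3-4,7,9-10' into a set of integers."""
    numbers = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            numbers.update(range(int(low), int(high) + 1))
        else:
            numbers.add(int(part))
    return numbers


def find_repeater(wrpt_ge_lines, region, number):
    """Return the OID of the repeater in `region` whose ep_range holds `number`."""
    prefix = "ep_range="
    for line in wrpt_ge_lines:
        fields = line.split()
        if not fields or fields[0].split(".")[0] != region:
            continue
        for field in fields[1:]:
            if field.startswith(prefix) and number in parse_ep_range(field[len(prefix):]):
                return fields[0]
    return None
```

The region argument matters because the same number can appear in the ranges of repeaters in different TMRs, as the value 4 does in Example 11-26.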

An alternative view is Example 11-27. In this case, the endpoint ranges are in the angled brackets.

Example 11-27 Endpoint ranges with wrpt -e

# wrpt -e
aix-tmr1b [1] wd- []
aix-inv01b [23] --- []
itsodev1 [26] --- []
win-tmr02a [1] wd- []
3c056 [6] --- []

The route to an endpoint can be found in much the same way as a managed node's route. For example, using the wrpt and wep commands, we can determine the route for distributions for one or more endpoints at a time (see Example 11-28).

Example 11-28 Finding distribution route for endpoints

C:\Tivoli\bin\w32-ix86\bin>wrpt
win-tmr01a [1] wd- []
win-inv01a [2] w-- []
win-arch01a [3] --- []
win-rptr01a [4] --- []
C:\Tivoli\bin\w32-ix86\bin>wep ls
G 1370748664.3.24 win-arch01a-gw
  1370748664.11.522+#TMF_endpoint::endpoint# WIN-OME-A
G 1370748664.4.21 win-rptr01a-gw
  1370748664.12.522+#TMF_endpoint::endpoint# WIN-NTK-A
  1370748664.14.522+#TMF_endpoint::endpoint# WIN-CHRIS-A
  1370748664.17.522+#TMF_endpoint::endpoint# windows2K01
  1370748664.20.522+#TMF_endpoint::endpoint# 3C055
  1370748664.21.522+#TMF_endpoint::endpoint# ibmtiv8
  1370748664.6.522+#TMF_endpoint::endpoint# WIN-ARCH01A
  1370748664.7.522+#TMF_endpoint::endpoint# WIN-RPTR01A
  1370748664.8.522+#TMF_endpoint::endpoint# WIN-TMR01A
  1370748664.9.522+#TMF_endpoint::endpoint# WIN-TINA-A
G 1370748664.1.591 win-tmr01a-gw
  1370748664.19.522+#TMF_endpoint::endpoint# ibmtiv10
  1370748664.22.522+#TMF_endpoint::endpoint# FUTURISM
  1370748664.23.522+#TMF_endpoint::endpoint# shaker
  1370748664.5.522+#TMF_endpoint::endpoint# WIN-INV01A

C:\Tivoli\bin\w32-ix86\bin>wrpt -q @ManagedNode:win-arch01a @endpoint:WIN-ARCH01A @endpoint:WIN-RPTR01A @endpoint:WIN-INV01A @endpoint:WIN-OME-A
--[RPT:win-arch01a [3]]
  |--[RPT:win-arch01a [3]]
  |  |--[RPT:win-tmr01a [1]]
  |  |  |--WIN-INV01A [5]
  |  |
  |  |--[RPT:win-rptr01a [4]]
  |  |  |--WIN-ARCH01A [6]
  |  |  |--WIN-RPTR01A [7]
  |  |
  |  |--WIN-OME-A [11]
  |

We can also exchange the resources between TMRs with wupdate and find the route to an endpoint in a connected TMR (see Example 11-29).

Example 11-29 Finding route to an endpoint in an interconnected TMR

# wlsconn
MODE NAME SERVER REGION
     win-tmr02a-region win-tmr02a 1831081883

# wupdate -r endpoint win-tmr02a-region
TRUNCATED...
win-bkp01b 1375372617.14.522+#TMF_endpoint::endpoint#
win-bkp03b 1375372617.5.522+#TMF_endpoint::endpoint#
win-w200083b 1831081883.7.522+#TMF_endpoint::endpoint#
win-w200200a 1831081883.3.522+#TMF_endpoint::endpoint#
winarch01b 1375372617.7.522+#TMF_endpoint::endpoint#
# wrpt
aix-tmr1b [1] wd- []
aix-inv01b [23] --- []
itsodev1 [26] --- []
win-tmr02a [1] wd- []
3c056 [6] --- []

# wrpt -q aix-tmr1b @endpoint:win-w200083b
--[RPT:aix-tmr1b [1]]
  |--[RPT:win-tmr02a [1]]
  |  |--win-w200083b [7]
  |
#
# wgateway
Object            Name             Status
1375372617.1.578  aix-tmr1b-gw     u
1375372617.26.19  itsodev1-gw      u
1831081883.1.577  win-tmr02a-gw    u
1831081883.6.21   win-tmr02b-gw01  u

The wrpt -q aix-tmr1b @endpoint:win-w200083b command shows us the distribution route from local TMR Server aix-tmr1b to the endpoint win-w200083b in the interconnected TMR.

11.6.2 Repeater tuning in MDist1

There are a number of tuning parameters that control the flow of the data stream from the source to the destination. Each repeater has to be configured correctly to avoid network and resource issues. To view the current settings of a repeater, use the following command:

wrpt -t repeaterName

For example, consider the configuration in Example 11-30.

Example 11-30 Repeater parameters for aix-tmr1b

# wrpt -t aix-tmr1b
mem_max = 10000
disk_max = 50000
disk_hiwat = 50000
disk_time = 1
disk_dir = "/tmp"
net_load = 500
max_conn = 100
stat_intv = 180


The above output also shows the values that are used. These will need to be changed to suit your Tivoli environment.

mem_max (KB)      This is where data is spooled first during a distribution. It uses real memory (RAM) and not swap memory. Use the default value that is supplied and increase it only if the system resources are available.
disk_max (KB)     This is the amount of paging or swap space that data will use once mem_max has been exhausted. The recommendation is that the size be up to 20% larger than the largest distribution that will go through the repeater.
disk_hiwat (KB)   This represents the high-water mark for disk_max. Once the value of disk_hiwat has been reached, writing to disk slows down, with a delay of disk_time seconds between writes.
disk_time (secs)  This is the delay to wait between disk block allocations once disk_hiwat has been reached. The default value of 1 is usually not changed.
disk_dir          This is the directory where the disk_max paging file is written, so it must have more space available than disk_max.

Note: If there is more than one distribution happening simultaneously, then the size of disk_dir must be able to handle such events. That is, if disk_max is set at 200 MB and there are three distributions using this value, then disk_dir must have at least 3 x 200 MB = 600 MB available. This is because MDist1 works per distribution.
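The sizing rules for disk_max and disk_dir reduce to simple arithmetic, sketched below. The functions are hypothetical helpers that only restate the guidance above (disk_max up to 20% larger than the largest distribution, and disk_dir large enough for one disk_max per simultaneous distribution); the input figures are assumptions you would replace with your own.

```python
def suggested_disk_max_kb(largest_distribution_kb, headroom_percent=20):
    """disk_max sized up to `headroom_percent` above the largest distribution."""
    # Integer arithmetic with rounding up, to avoid floating-point surprises.
    return (largest_distribution_kb * (100 + headroom_percent) + 99) // 100


def required_disk_dir_kb(disk_max_kb, simultaneous_distributions):
    """Space disk_dir needs when MDist1 spools each distribution separately."""
    return disk_max_kb * simultaneous_distributions


# The case from the note: disk_max of 200 MB and three distributions.
disk_max_kb = 200 * 1024
print(required_disk_dir_kb(disk_max_kb, 3) // 1024, "MB")
```

The last line prints the requirement in MB for the case in the note, using 1 MB = 1024 KB.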

The data of a distribution needs to be saved because slower targets may be present. By default, disk_dir is %DBDIR%/tmp under NT, while under UNIX it is /tmp, which on many systems is used as swap space by applications and the operating system. A dedicated directory or file system should therefore be used for UNIX-based repeaters. Looking at the gatelog in Example 11-31 on page 391, there are actually going to be two swap files created: one called swap.xxx in the disk_dir directory, where xxx is a number, and the other called pmapx, where x represents the distribution ID. You can define the $TMP variable in the odadmin environment to point to a new location.


Example 11-31 Changing the swap directory for NT managed nodes

C:\Tivoli\db\win-inv01a.db>wrpt -t win-inv01a disk_dir=c:/temp
C:\Tivoli\db\win-inv01a.db>wrpt -t win-inv01a
mem_max = 10000
disk_max = 50000
disk_hiwat = 50000
disk_time = 1
disk_dir = "c:/temp"
net_load = 500
max_conn = 100
stat_intv = 180

If you reexec the oserv on the gateway and repeat the distribution, then the swap file will be written to the c:\temp directory. The parameters in Example 11-31 are as follows:

net_spacing (milliseconds)  This is the time to wait between each 16 KB write to the network. By default, it is 0, but you may want to change it if the network is slow. Once it has been set, it becomes visible to the wrpt command.
stat_intv (secs)            This value is a timeout for sending each Tivoli packet (normally 16 KB). If your network is too slow or saturated, stat_intv needs to be higher. If you are getting high-level TCP timeout errors, increase stat_intv, reduce max_conn, or set SLOW_LINK. stat_intv is not used when distributing from a gateway to a Tivoli Managed Agent; the gateway's session_timeout is used instead. More information on timeouts and their configuration can be found in a separate section further on in this document.
max_conn                    This parameter sets the number of targets that a repeater distributes to at once.
net_load (kbps)             This is the amount of network bandwidth used in a distribution. This should be tuned for the LAN and WAN links. This parameter is the reason you should have a repeater at the central site for each WAN type.


Timeouts and parameters for repeaters

You can use the wrpt command to change the repeater manager parameters.

Global Repeater Manager timeout

This is set to prevent a hanging condition if there is a loss of contact with a target during a distribution. By default, it is set to 0, which means it never times out. Using the wrpt -T command gives the output in Example 11-32.

Example 11-32 Global Timeout parameter

C:\Tivoli\db\win-inv01a.db>wrpt -T
repeater manager timeout is infinite.

This time-out setting only applies to managed nodes, not endpoints, which are serviced by the set_session_timeout.

Gateway session timeout

This is used exclusively for endpoint timeouts, and is equivalent to the Global Repeater Manager timeout explained earlier. It is set via the wgateway command and the default is 300 seconds (5 minutes). It represents the maximum time a method downcall can take before an error is generated. Applications like IBM Tivoli Configuration Manager can adjust the timeout. In the case of the Software Distribution component, this is done through the progs_timeout variable in the filepackage definition file. The value is set for BARC (Before, After, Removal, and Commit) scripts to prevent a hanging condition. For a distribution to an endpoint, the gateway session timeout is set to the value of progs_timeout plus 10 seconds. The default value of progs_timeout is 0, which is interpreted as 1 hour.

For the Inventory component, there is a profile setting that determines how long a scan may take on the target. The time-out value is set by using the wsetiprf command (see Example 11-33). The default values are as follows:

• Scanner timeout period: 2700 seconds (45 minutes)
• Endpoint timeout period: 1800 seconds (30 minutes)

Example 11-33 Setting timeout parameter for Inventory profiles

C:\>wsetiprf -t 50 @InventoryProfile:"HardwareScan"
Scanner timeout period: 2700 seconds
endpoint timeout period: 50 seconds

stat_intv timeout

This timeout is used exclusively between managed nodes and represents the time allowed to send each packet. The error message associated with a failure is a High level TCP timeout.


Table 11-9 shows the relationship between distribution types and the timeouts.

Table 11-9 Timeout for distributions

Target timeout for distributions

Function  managed node to managed node        gateway to endpoint
Before    final_timeout (wrpt -T), stat_intv  progs_timeout, final timeout 1 hour
Send      stat_intv                           gateway session timeout
After     Final Timeout (wrpt -T)             progs_timeout, final timeout 1 hour

repeater to repeater: Infinite

net_load parameter

The amount of bandwidth used is one of the parameters that requires careful planning, so as not to overload the network with data where there are restrictions in place, namely shared or slow links. It is suggested that 25% of the bandwidth be used, so as not to overload it. A 10 Mbps Ethernet LAN carries about 1250 kilobytes per second, so 25% of it is about 312 and 50% would be 625. These figures are therefore in kilobytes per second, which this section writes as kbps.

There are three types of net_load setting: a positive net_load (the default type), a negative net_load, and slow links, which change the packet size. With a positive net_load, the distribution through the repeater uses a set amount of bandwidth, with each connection using part of the value. The amount per connection depends upon the number of connections, which is capped by the max_conn setting. For example, if net_load is 500, there are only four connections, and max_conn is set to 10, then each connection gets 500/4 = 125 kbps. If there are 10 connections and max_conn is set to 10, then each connection gets 500/10 = 50 kbps. If 20 connections are to be made, the speed will still be 50 kbps, as the max_conn value prevents more than 10 connections being made at once.
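The arithmetic above can be captured in two small functions. This is a sketch of the calculation only, assuming the 312 and 625 figures are kilobytes per second (10 Mbit/s divided by 8) and that a positive net_load is shared evenly between the connections that are active at once; the function names are hypothetical.

```python
def bandwidth_budget_kb_per_s(link_mbps, fraction):
    """Share of a link, in kilobytes per second, for a given fraction (0 to 1)."""
    link_kb_per_s = link_mbps * 1000 / 8  # megabits/s -> kilobytes/s
    return link_kb_per_s * fraction


def per_connection_share(net_load, max_conn, targets):
    """Bandwidth each active connection gets under a positive net_load."""
    if targets <= 0 or max_conn <= 0:
        raise ValueError("targets and max_conn must be positive")
    active = min(targets, max_conn)  # max_conn caps simultaneous connections
    return net_load / active


print(bandwidth_budget_kb_per_s(10, 0.25))   # 25% of a 10 Mbps LAN
print(per_connection_share(500, 10, 4))      # four targets, max_conn of 10
print(per_connection_share(500, 10, 20))     # twenty targets, capped at 10
```

The three calls reproduce the cases worked through in the text: the 25% budget, four connections, and twenty targets capped by max_conn.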


Figure 11-22 Net_load is distributed between connections of a single distribution

Important: If there is more than one distribution going through a repeater, then the bandwidth will be multiplied by the number of distributions. That is, if there are four connections each of 125 kbps, but with two distributions taking place, then 2 x 125 kbps = 250 kbps of bandwidth will be used.

For a negative net_load, the bandwidth is per connection and not per distribution, as described for a positive net_load. Thus, if net_load is specified as:

net_load = -500

then each endpoint will have a connection with a bandwidth of 500 kbps. Note, however, that for four connections the repeater itself will use 4 x 500 kbps of bandwidth.

A slow link environment is used when only limited bandwidth is available. Set this variable by using the odadmin command, as shown in Example 11-34.

Example 11-34 Changing the distribution packet size

odadmin env get > env.txt
Edit env.txt and add a line with SLOW_LINK=TRUE
odadmin environ set < env.txt
Reexec the oserv of the Managed Node

The effect is to have packet size reduced from 16 KB to 512 bytes and the net_load setting from kbps to bytes/sec. The consequence is that the net_load value must be much higher.


11.6.3 Active distributions

There are three options to pass to the wrpt command to manage active distributions. They are:

wrpt -L
wrpt -R -k id
wrpt -A [-f] -k id

Where:

-L   Lists all active distributions in a four-column format. The first column is the unique active distribution number, the second column is the distribution name (a label chosen by the application), the third column is the distribution's start time, and the fourth column gives statistics for the distribution in the following format: in/est_size [out_min-out_max].
     in: Amount of data transmitted at that time
     est_size: File package size
     out_min: Smallest amount of data transferred
     out_max: Largest amount of data transferred
-R   Shows the repeater route.
-k   When used as an active distribution option (with the -R and -A options), -k specifies the target active process. When used as a tuning option, -k causes configuration options to affect only the active distributions.
id   Specifies the unique process number of an active distribution, obtained with the -L argument and used with the -k argument.
-A   Aborts a distribution. The user is prompted Are you sure? unless the force argument (-f) is also given.
-f   Forces an operation (suppresses any confirmation prompt).

Example 11-35 on page 396 shows two active distributions and the update status.


Example 11-35 wrpt

c:\>wrpt -L
2 fp_distribute 01 02 22:46:48 5136/0 [32-32]
3 fp_distribute 01 02 22:46:49 5136/0 [32-32]

c:\>wrpt -L
2 fp_distribute 01 02 22:46:48 5136/0 [4424-4424]
3 fp_distribute 01 02 22:46:49 5136/0 [4224-4224]

c:\>wrpt -L
2 fp_distribute 01 02 22:46:48 5136/0 [5136-5136]
3 fp_distribute 01 02 22:46:49 5136/0 [5136-5136]
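To follow progress over repeated wrpt -L runs, each line can be split into the four columns described earlier. The sketch below assumes the row layout shown in Example 11-35, with the statistics written as in/est_size [out_min-out_max]; the function names are hypothetical.

```python
import re

# Row layout assumed from Example 11-35:
# <id> <name> <start time> <in>/<est_size> [<out_min>-<out_max>]
WRPT_L_ROW = re.compile(
    r"^(\d+)\s+(\S+)\s+(.+?)\s+(\d+)/(\d+)\s+\[(\d+)-(\d+)\]$")


def parse_wrpt_l_row(line):
    """Split one wrpt -L row into its fields."""
    match = WRPT_L_ROW.match(line.strip())
    if match is None:
        raise ValueError("unrecognized wrpt -L row: " + repr(line))
    dist_id, name, start, data_in, est_size, out_min, out_max = match.groups()
    return {
        "id": int(dist_id),
        "name": name,
        "start": start,
        "in": int(data_in),
        "est_size": int(est_size),
        "out_min": int(out_min),
        "out_max": int(out_max),
    }


def slowest_target_caught_up(row):
    """True when the slowest target has received everything taken in so far."""
    return row["out_min"] >= row["in"]
```

Applied to the three snapshots in Example 11-35, the second function reports False for the first two and True for the last, where out_min has reached the amount taken in.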

11.6.4 MDist2 components and functionalities

Tivoli Management Framework Version 3.7 and above provides a new, improved distribution mechanism called MDist2. All the IBM Tivoli Configuration Manager components use the functionalities provided by MDist2. The following sections discuss MDist2 in detail, emphasizing the differences between MDist2 and MDist (also referred to as MDist1).

What is MDist2

MDist2 is a Tivoli Management Framework service that provides Tivoli applications with the functionalities required to perform data transfers through a hierarchy of repeaters. It provides utilities to fully control and automate the application profile distributions. Tivoli Management Framework Version 3.7 and above still supports MDist1 repeaters for backward compatibility. MDist2, however, extends MDist1 capabilities to handle the large-scale distribution needs of Tivoli applications. The following section describes the primary MDist2 components and how they work together.


Figure 11-23 MDist2 components

MDist2 components

Figure 11-23 illustrates the primary MDist2 components:

Repeater manager      The Tivoli object that maintains configuration data for all repeaters in the TMR. It also determines the distribution path. There is one repeater manager per TMR.
Repeater site         The intermediate client that receives a single copy of data and sends it to another repeater site or target clients.
Repeater depot        The storage site for MDist2 distributions. Every repeater has a depot. Thus, data can be stored on any repeater in the Tivoli environment. This storage mechanism helps reduce network traffic for frequently distributed data sets.
Repeater queue        The queuing mechanism for MDist2 distributions. Every repeater has a queue. The distribution is queued and its persistent information is kept as a local file. This queuing mechanism includes a retry function that enhances support for unreachable targets.
Distribution manager  The Tivoli object that updates status in the database. There is one distribution manager per TMR. Thus, each TMR keeps track of all distributions it launches.
GUI                   The Java interface used to view status and control distributions.
RIM                   Stands for the RDBMS Interface Module. It is a common interface Tivoli applications can use to store and retrieve information from a relational database, and is used to store MDist2 distribution data.

It is also important to note that there are two repeater types in MDist2:

• Gateway repeater (TMF/LCF/gateway)
• Managed node repeater (TAS/MANAGED_NODE/rpt2)

A gateway repeater is linked into the gateway, whereas a managed node repeater is a stand-alone binary that runs on a managed node. Although both types of repeaters use the same code base and their functionality is very similar, there are some differences between them. Table 11-10 summarizes these differences.

Table 11-10 Differences between gateway and managed node repeaters

Repeater type          Log file        Targets                  Lifetime
Gateway repeater       $DBDIR/gatelog  Repeaters and endpoints  Always up as part of a gateway
Managed node repeater  $DBDIR/rptlog   Repeaters only           Exits if queue is empty after 20 minutes of inactivity

11.6.5 What is new in MDist2

MDist2 provides the following new functions for distributions in large scale environments:

• All new repeater with the following functions:
  – Asynchronous operation
  – Priority queues
  – Total resource limits (memory, disk, and bandwidth)
  – Data depots
  – Checkpoint restart
  – Persistent queues for assured delivery
  – Disconnected endpoint support
  – Mobile computing support
  – Queue data for disconnected endpoints
• GUI and CLIs
  – Real time status
  – Distribution Control (cancel and pause/resume)

We will cover all these functions in detail. Table 11-11 provides a quick comparison of the differences between MDist2 and MDist1.

Table 11-11 MDist1 and MDist2: Comparison

                           MDist2 repeater          MDist1 repeater
To application             Asynchronous             Synchronous
Distribution queue         Prioritized              Non-prioritized
Resource limit             Per repeater             Per distribution
Distribution source        Source host/depot        Source host
Data flow                  Store and forward        Pipeline
Interrupted distribution   Kept in queue/Retry      Aborted
Restart from interruption  Restart from checkpoint  Manual restart back to original configuration
Disconnected endpoint      Supported by queue       Aborted
Distribution status        Available                Result only
Distribution control       Available                Limited ability to cancel

Asynchronous delivery

MDist2 provides an asynchronous interface to applications. Figure 11-24 explains this concept.

Figure 11-24 Asynchronous delivery concept

1. The application submits a delivery request, and immediately gets a return.
2. The delivery operation is in progress. It may take a long time.
3. The final exit status of each distribution is returned through a callback function.

Unlike with MDist1, which uses a synchronous interface, the application does not have to wait until the delivery is completed for all destinations. Figure 11-25 explains the synchronous delivery concept in MDist1.

Figure 11-25 Synchronous delivery concept

1. The application requests a delivery.
2. The delivery operation is in progress. It may take a long time.
3. After all targets receive the distribution, the final exit status of each distribution is returned.
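The two delivery models can be contrasted in a short Python sketch. It is only an illustration of the calling pattern, not MDist code: deliver, submit_async, submit_sync, and record are hypothetical names, and the thread pool merely stands in for the repeater doing work in the background.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def deliver(target):
    """Hypothetical stand-in for one delivery; returns (target, exit status)."""
    time.sleep(0.01)  # pretend the transfer takes a while
    return target, "successful"

def submit_async(targets, on_result):
    """Asynchronous style: submit everything and return immediately.
    The final status of each target arrives later through on_result."""
    pool = ThreadPoolExecutor(max_workers=4)
    for target in targets:
        future = pool.submit(deliver, target)
        future.add_done_callback(lambda f: on_result(*f.result()))
    return pool

def submit_sync(targets):
    """Synchronous style: the caller is blocked until every target is done."""
    return [deliver(target) for target in targets]

final_status = {}

def record(target, status):
    # Callback invoked once per target when its delivery has finished.
    final_status[target] = status

pool = submit_async(["ausres42", "ausres45"], record)
# The application is free to do other work here.
pool.shutdown(wait=True)  # waiting only so that the demo can inspect results
```

In the asynchronous variant a slow or unreachable target delays only its own callback, while in the synchronous variant it delays the return for every target.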


With synchronous delivery, any targets that encounter errors or are unreachable affect the throughput of the whole distribution process. Asynchronous delivery allows many independent delivery operations to be in progress. It enables efficient software package distributions to large numbers (thousands or more) of targets without being slowed down by errors or unreachable targets. Also, the distribution log file, as shown in Example 11-36, has two parts:

• Submission
• Final results

The Telephone_Directory^1.2 package was sent to targets named ausres42, ausres43 and ausres45. The distribution was successful for ausres42 and ausres45, but unsuccessful for ausres43, because the package was already installed on that target.

Example 11-36 Distribution log file

Software Package: "Telephone_Directory^1.2"
Operation: install
Mode: not-transactional,not-undoable
Time: 2000-04-04 19:59:54
=================
ausres45:
ausres43:
ausres42:
Operation successfully submitted.
Distribution ID is 1757544609954
=================
Software Package: "Telephone_Directory^1.2"
Operation: install
Mode: not-transactional,not-undoable
Time: 2000-04-04 20:00:50
=================
ausres42: Distribution ID: `1757544609954896390'
Operation successful.
*****************
ausres43: Distribution ID: `1757544609954896390'
Operation unsuccessful.
Current software package status is 'IC---'.
The requested operation is not allowed for the software package Telephone_Directory^1.2
*****************
ausres45: Distribution ID: `1757544609954896390'
Operation successful.

Priority queues

Queues in MDist2 have the following functionalities:

• Handle a large number of concurrently active distributions
• Distribution to disconnected endpoints
• Persistent with automatic retry
• Distributions can be prioritized (high, medium (default), and low)
• Distributions may have a deadline that specifies when they expire

Figure 11-26 shows the structure of a queue in a repeater.

Figure 11-26 MDist2 repeater queue

To handle a large number of distributions, MDist2 repeaters use a queue mechanism. MDist2 distributions can have three priority levels: high, medium, and low. Priority levels designate the order in which distributions are handled by repeaters, affecting the queue placement for each package. Distributions with higher priority levels are handled before ones with lower priority. Repeaters handle distributions with the same priority level in the order in which they are received.

MDist2 allows the maximum number of concurrent connections to be specified for each priority level and for each repeater. A distribution with a given priority level can use the number of connections reserved for its priority level plus any connections allocated for lower priority levels.
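The ordering rule described here (higher priority first, arrival order within the same priority) can be sketched with a small Python heap. This is an illustration only, not the repeater's actual data structure; DistributionQueue and the package names are hypothetical.

```python
import heapq
import itertools

# Lower rank is served first.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}

class DistributionQueue:
    """Toy queue: orders by priority, then by arrival sequence."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker that preserves FIFO order

    def submit(self, name, priority="medium"):
        # Medium is the default priority, as in the text.
        entry = (PRIORITY_RANK[priority], next(self._arrival), name)
        heapq.heappush(self._heap, entry)

    def next_distribution(self):
        return heapq.heappop(self._heap)[2]

queue = DistributionQueue()
queue.submit("patch-a", "low")
queue.submit("patch-b")           # medium by default
queue.submit("patch-c", "high")
queue.submit("patch-d")           # medium, arrived after patch-b
order = [queue.next_distribution() for _ in range(4)]
```

The arrival counter is what keeps two medium priority distributions in the order in which they were received.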

Figure 11-27 Available connections for each priority

For example, suppose the queues are configured as follows:

• max_sessions_high = 5
• max_sessions_medium = 10
• max_sessions_low = 40

In this example, high priority distributions can use up to 55 (5+10+40) sessions, medium priority distributions up to 50 (10+40) sessions, and low priority distributions up to 40 sessions.

In the above example, we used the parameters max_sessions_high, max_sessions_medium, and max_sessions_low to represent the maximum numbers of concurrent connections for high, medium, and low priority distributions. Use the wmdist command with the -s argument to view or configure these MDist2 options. In Example 11-37, we changed the maximum number of concurrent high priority distributions allowed on repeater chatham to 10.

Example 11-37 Changing priority

#wmdist -s chatham max_sessions_high=10
#
#wmdist -s chatham
repeater_id: 1978508757.1.604
rpt_dir: /usr/local/Tivoli/db/chatham.db/tmp/
permanent_storage: TRUE
max_sessions_high: 10
max_sessions_medium: 10
max_sessions_low: 40
disk_max: 512000
mem_max: 65536
send_timeout: 300
execute_timeout: 600
notify_interval: 30
conn_retry_interval: 900
retry_ep_cutoff: 7200
net_load: 500
packet_size: 16384
target_netload: 0
debug_level: 3
debug_delay: 0
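The session arithmetic described earlier can be checked with a short Python sketch. It assumes that the wmdist -s listing has been captured as plain text in key: value form; parse_settings and usable_sessions are hypothetical helper names, and the sample below uses the default values from the text rather than real command output.

```python
SAMPLE = """\
max_sessions_high: 5
max_sessions_medium: 10
max_sessions_low: 40
"""

def parse_settings(text):
    """Turn 'key: value' lines into a dictionary of strings."""
    settings = {}
    for line in text.splitlines():
        # Split at the first colon only, so values such as d:/Tivoli/... survive.
        key, separator, value = line.partition(":")
        if separator:
            settings[key.strip()] = value.strip()
    return settings

def usable_sessions(settings):
    """Sessions a distribution may use: its own level plus all lower levels."""
    high = int(settings["max_sessions_high"])
    medium = int(settings["max_sessions_medium"])
    low = int(settings["max_sessions_low"])
    return {"high": high + medium + low, "medium": medium + low, "low": low}

limits = usable_sessions(parse_settings(SAMPLE))
```

With the defaults this reproduces the 55, 50, and 40 sessions quoted for high, medium, and low priority distributions.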

With IBM Tivoli Configuration Manager, the priority level is assigned at distribution time. This applies to the series of change management operations: install, remove, commit, accept, undo, and verify. The default priority value is Medium. Figure 11-28 shows an example of the Install Software Package dialog. Note the Priority Level group box for setting the priority to Low, Medium, or High.


Figure 11-28 Software Distribution GUI - Install Software Package: Setting priority

To do the same operation from the command line, the change management commands (winstsp for install, and so forth) provide the -l argument to specify MDist2-related options. Set the priority level using the priority option, with a value of h (for high priority), m (for medium priority), or l (for low priority). For example, -l priority=h sets the priority level to high.

Note: Priorities cannot be changed once a distribution has been submitted.

Retrying broken connections

• Between gateway and TMAs:
  – Gateway repeater can intercept endpoint logins (for example, after IBM Tivoli Configuration Manager forced a reboot of the endpoint).
  – Gateway repeater will also retry the connection every conn_retry_interval seconds until retry_ep_cutoff seconds have elapsed.
• Between repeaters:
  – Repeater will retry the connection to the interrupted repeater every conn_retry_interval seconds until the distribution expires. This could be a long time.
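As a rough illustration of the endpoint retry window, the following Python sketch lists the retry times implied by conn_retry_interval and retry_ep_cutoff. It rests on two assumptions that the text does not state: the first retry happens one full interval after the failure, and a retry that falls exactly on the cutoff is still attempted. The function name is hypothetical.

```python
def endpoint_retry_offsets(conn_retry_interval, retry_ep_cutoff):
    """Seconds after the failure at which a retry would be attempted.

    Assumes the first retry is one interval after the failure and that a
    retry exactly at the cutoff still takes place.
    """
    return list(range(conn_retry_interval, retry_ep_cutoff + 1, conn_retry_interval))

# Values taken from the wmdist -s listing (900 and 7200 seconds).
offsets = endpoint_retry_offsets(900, 7200)
```

Under these assumptions, a 900-second interval and a 7200-second cutoff give eight attempts spread over two hours.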

Total resource limits per repeater

When a repeater fans out a distribution to multiple connections, it does not use one connection at a time, but instead several connections at the same time. These distribution processes consume system and network resources. It is very important to manage these resources in order to improve performance and throughput not only for Tivoli Software Distribution, but also for the system itself when other applications are running on the repeater node. MDist2 repeaters provide some parameters that specify the total amount of resources a repeater can use.

MDist1 repeaters also provide performance options. However, these configurations apply for each distribution. If multiple distributions occur, the resources allotted by the repeater settings are multiplied by the number of distributions initiated. This can lead to unexpected resource utilization, affecting system performance and throughput adversely and potentially causing a system hang. When using MDist1 repeaters, we do not recommend concurrent distributions in large-scale environments.

MDist2 settings apply per repeater, improving manageability of system and network resources, such as memory, disk space, and network bandwidth. Better system performance and throughput allow for faster and more efficient distributions. The following three sections cover MDist2 resource configuration options.

Distribution connections

MDist2 repeaters provide three parameters that specify the number of concurrent connections allowed within a priority level:

• max_sessions_high (five connections by default)
• max_sessions_medium (10 connections by default)
• max_sessions_low (40 connections by default)


Note: Compare these parameters with the MDist1 repeater parameter max_conn. The max_conn parameter defines the total number of sessions (without priority) per distribution. With MDist2, however, the number of available connections is the sum of the high, medium, and low connections. Connections never exceed this number, because the limits are applied per repeater, not per distribution as with max_conn. These connections are shared among all active distributions.

Normally, these parameters affect the distribution process between the gateway repeater and the target endpoint. Figure 11-29 explains how these parameters work.

Figure 11-29 MDist2 maximum concurrent connections

1. When a distribution request is submitted, it is sent to the gateway repeater first.
2. The MDist2 repeater processes the request, getting the priority and number of targets, and puts them into the designated queue.
3. The repeater then fans out the data to the subscribers. The MDist2 repeater attempts to establish connections for each priority until the max_sessions for each priority is reached. The order is determined by priority. If there are no connections for a given priority, the repeater will try to borrow a connection from a lower priority. If the number of connections reaches the limit, the rest of the requests remain in the queue.
4. When the distribution to any target is completed (succeeded or failed), the connection is released.
5. MDist2 attempts to establish the connection for another request waiting in the queue according to the priority level. In this example the connection for a low priority distribution is waiting to be dispatched, but there is a high priority request waiting that will take precedence.

The MDist2 repeater repeats this process until there are no remaining distribution requests.

Note: The MDist1 repeater also manages concurrent connections within a distribution the same way except without a priority queue. When the distribution to any target is finished (succeeded or failed), MDist1 also attempts to establish a connection to another target immediately.

Because the MDist1 max_conn parameter is applied per distribution, the number of connections established between a repeater and its targets is multiplied by the number of concurrent distributions. Figure 11-30 shows a distribution scenario with MDist1.

Figure 11-30 MDist1 max_conn configuration: Multiple distribution scenario

1. A distribution request is sent to the MDist1 repeater.
2. The MDist1 gateway repeater fans out the data to the subscribers. In this case, the repeater attempts to open 10 TCP connections at the same time and distributes the data to 10 machines at a time in parallel.
3. Another distribution request is sent to the same MDist1 repeater.
4. The MDist1 repeater fans out the data to another 10 subscribers. As a result, there are 20 TCP connections, not 10 connections.

Therefore, if multiple distributions occur in an MDist1 environment, the number of concurrent connections can quickly result in poor or even unmanageable performance. With MDist2, the connections are shared among all active distributions and never exceed the given number.
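The difference in connection counts can be expressed as a small Python sketch. Both functions are hypothetical simplifications: the MDist2 version treats the repeater as having a single overall session limit and ignores how that limit is split across priorities.

```python
def mdist1_connections(max_conn, targets_per_distribution):
    """MDist1: max_conn applies to each distribution separately."""
    return sum(min(max_conn, targets) for targets in targets_per_distribution)

def mdist2_connections(session_limit, targets_per_distribution):
    """MDist2: one limit per repeater, shared by all active distributions."""
    return min(session_limit, sum(targets_per_distribution))

# Two concurrent distributions, each with more than 10 targets.
mdist1_open = mdist1_connections(10, [25, 25])
mdist2_open = mdist2_connections(10, [25, 25])
```

With max_conn=10 the MDist1 repeater ends up with 20 open connections for two distributions, whereas a per-repeater limit of 10 stays at 10 however many distributions are active.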

Distribution speed configuration

The MDist2 repeater provides two parameters that stipulate the network bandwidth used by a software distribution. They are specified in kbps. These parameters are:

• net_load (500 KB by default)
• target_net_load (disabled (0) by default)

The net_load parameter of MDist2 specifies the maximum amount of network bandwidth that the repeater is allowed to use. The target_net_load parameter of MDist2 specifies the maximum amount of network bandwidth that can be used on each connection to the target. These limits will be enforced regardless of how many distributions are active at the same time. Figure 11-31 shows how this parameter works.

Figure 11-31 MDist2 net_load/target_netload concepts

To explain how the net_load parameter differs between MDist1 and MDist2, we apply a multiple distribution scenario to both environments. The number of concurrent connections is strictly managed by the MDist2 priority queues. If multiple distributions occur, the number of concurrent sessions is not multiplied. Table 11-12 shows examples of the relationship between the net_load and the number of connections (where the concurrent sessions are limited to 10, and target_net_load is disabled).

Table 11-12 Relationship between the MDist2 net_load and the connections

net_load  target_net_load  Targets  Speed / Connection  Total Speed / repeater
500       -                4        125 kbps            500 kbps
500       -                8        62.5 kbps           500 kbps
500       -                10       50 kbps             500 kbps
500       -                20       50 kbps             500 kbps

In this example, 10 or 20 targets have the same speed per connection value. That is because the number of concurrent sessions is limited to 10. Table 11-13 shows examples of the relationship amongst net_load, target_net_load, and the number of the connections (where the concurrent sessions are limited to 10).

Table 11-13 Relationship of MDist2 net_load, target_net_load, and the connection

net_load  target_net_load  Targets  Speed / Connection  Speed of repeater
500       50               4        50 kbps             500 kbps
500       50               8        50 kbps             500 kbps
500       50               10       50 kbps             500 kbps
500       50               20       50 kbps             500 kbps

In this example, four or eight targets are limited to 50 kbps, because the target_net_load is limited to 50 kbps. The net_load and target_net_load parameters enable strict management of the network bandwidth the distribution consumes, both per repeater and per connection.
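The per-connection figures in Tables 11-12 and 11-13 follow from a simple rule, sketched below in Python. The function is a hypothetical model inferred from the tables, not the repeater's scheduling code: net_load is divided evenly among the active connections, and target_net_load, when set, caps each share.

```python
def speed_per_connection(net_load, targets, max_sessions, target_net_load=0):
    """Bandwidth available to each active connection.

    Active connections are limited by max_sessions; a target_net_load of 0
    means the per-connection cap is disabled.
    """
    if targets < 1 or max_sessions < 1:
        raise ValueError("need at least one target and one session")
    active = min(targets, max_sessions)
    share = net_load / active
    if target_net_load > 0:
        share = min(share, target_net_load)
    return share

# Rows of Table 11-12 (target_net_load disabled, sessions limited to 10)
table_11_12 = [speed_per_connection(500, n, 10) for n in (4, 8, 10, 20)]
# Rows of Table 11-13 (target_net_load = 50)
table_11_13 = [speed_per_connection(500, n, 10, 50) for n in (4, 8, 10, 20)]
```

The sketch reproduces only the Speed / Connection column; the 500 in the last column of both tables is the repeater-wide limit set by net_load.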

Note:
• The MDist2 net_load parameter corresponds to the MDist1 repeater parameter net_load, but its behavior is different. With MDist2, it is applied per repeater, not per distribution. This bandwidth is shared among all active distributions.
• An MDist1 repeater and an MDist2 repeater can coexist on the same node (for example, an endpoint gateway), but in that case, both should be configured individually as follows:

  MDist1 net_load: wrpt -t net_load=500 (MDist1)

The net_load parameter of an MDist1 repeater is independently applied per distribution. The MDist1 net_load parameter provides two different options: The positive net_load and negative net_load. A positive net_load parameter is specified with a positive number. It means that each distribution will use a set amount of bandwidth. Each connection will use the amount of the net_load divided by the number of connections. Figure 11-32 explains how an MDist1 repeater manages the network bandwidth using a positive net_load.

Figure 11-32 MDist1 positive net_load: Multiple distribution scenario

In this example, the net_load is set to 500 KB/s, and there are two distributions, both of which distribute the data to four distribution targets concurrently (the MDist1 max_conn is set to more than four). Each distribution connection can use up to 125 KB/s bandwidth. But since there are two distributions to be executed, the total bandwidth can be up to 1,000 KB/s.

The negative net_load is specified with a negative number. The negative sign is simply a flag that tells the repeater that the net_load setting is applied per connection, not per distribution. Therefore, each connection uses the specified network bandwidth. Figure 11-33 explains how an MDist1 repeater manages network bandwidth using a negative net_load parameter.

Figure 11-33 MDist1 negative net_load: Multiple distribution scenario

In this example, the net_load is set to -500 (KB/s), and there are two distributions, both of which distribute the data to four distribution targets concurrently (the max_conn is set to more than four). Each distribution connection can use up to 500 KB/s bandwidth. Because there are two distributions to be executed, the total amount of network bandwidth used is 4,000 KB/s.

Tip: The MDist2 target_net_load parameter corresponds to the MDist1 negative net_load parameter. If a value is specified for target_net_load, the total amount of network bandwidth can never exceed the MDist2 net_load value. The MDist2 net_load parameter does not accept a negative number.
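The positive and negative MDist1 net_load cases can be worked through with a short Python sketch. The function name and its parameters are hypothetical; it simply encodes the two rules stated in the text (a positive value is shared within a distribution, a negative value applies to each connection).

```python
def mdist1_bandwidth(net_load, distributions, connections_per_distribution):
    """Return (bandwidth per connection, total bandwidth) in KB/s."""
    if net_load >= 0:
        # Positive: each distribution gets net_load, split among its connections.
        per_connection = net_load / connections_per_distribution
    else:
        # Negative: the sign is only a flag; every connection gets the full value.
        per_connection = -net_load
    total = per_connection * connections_per_distribution * distributions
    return per_connection, total

positive_case = mdist1_bandwidth(500, 2, 4)
negative_case = mdist1_bandwidth(-500, 2, 4)
```

For two distributions of four targets each, this gives 125 KB/s per connection and 1,000 KB/s in total for net_load=500, and 500 KB/s per connection and 4,000 KB/s in total for net_load=-500, matching the two scenarios.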

With MDist1, there is another option to control the network bandwidth. Setting the oserv environment variable SLOW_LINK to true causes the net_load parameter to be specified in bytes/second instead of kbps. Data transmission is performed in 1024 byte packets instead of the usual 16 KB packets. SLOW_LINK does not exist in MDist2. It has been replaced by the MDist2 configuration parameter packet_size.

To summarize, MDist1 settings apply per distribution (with the exception of the negative net_load) and can result in unexpected bandwidth utilization. With MDist2, the network bandwidth is shared among all active distributions and never exceeds the net_load.

Memory and disk for spooling data

The MDist2 repeater provides two parameters that specify values for maximum amounts of system memory and disk space that can be allocated to the repeater to spool data during distributions. The following are these two parameters:

• mem_max (65 MB by default)
• disk_max (512 MB by default)

Figure 11-34 explains the interaction between mem_max and disk_max during a distribution.

Figure 11-34 MDist2 configuration: mem_max and disk_max

1. The data is distributed to the MDist2 repeater.
2. The data is initially stored on the disk (depot) of the repeater. The maximum amount of disk space that can be used for the data is a value specified by disk_max.
3. The memory is used as a buffer to distribute the data to the target. The maximum amount of memory that can be used for the data is a value specified by mem_max.

Note: Both MDist1 and MDist2 provide mem_max and disk_max parameters, but there are two important differences:
• With the MDist2 repeater, these parameters are applied per repeater, not per distribution, as with MDist1. The maximum allotted memory and disk space are shared among all active distributions.
• The MDist2 repeater adopts a “store-and-forward” mechanism using depots. Each repeater receives the entire distribution and stores it at least temporarily on disk before sending it on to its endpoints or the next level of repeaters.

When setting the disk_max value, be sure that the file system has enough space to accommodate the amount required by any intended concurrently distributed software packages for temporary storage or the total amount used for software packages loaded permanently on the depot. Figure 11-35 shows how the MDist1 repeater manages mem_max and disk_max using the multiple distribution scenario.

Figure 11-35 MDist1 configuration: mem_max and disk_max in multiple distribution

1. The data of the first distribution is distributed to the MDist1 repeater.
2. The data is initially spooled to real memory on the repeater. The maximum amount of memory used for spooling data is specified by the value for mem_max.
3. After the mem_max is reached, the data spools to the disk on the repeater. The directory or file system to which data spools is specified by the disk_dir parameter. The maximum amount of disk space that can be used for spooling data is specified by the value for disk_max.
4. The data of the second distribution is distributed to the MDist1 repeater.
5. The repeater allocates another area in real memory.
6. After mem_max is reached again, the data spools to another space on the disk (under the same specified disk_dir directory).

As long as multiple distributions occur in the MDist1 environment, memory allocation multiplies. Unplanned resource utilization may affect system performance or cause the system to hang.
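A small Python sketch makes the worst-case spool usage concrete. It is a hypothetical model based on the statements above: with MDist1 the mem_max and disk_max limits apply to each distribution, while with MDist2 they apply once per repeater. The example values are the 65536 and 512000 (KB) settings that appear in the wmdist -s listings.

```python
def worst_case_spool(mem_max, disk_max, active_distributions, per_repeater):
    """Upper bound on (memory, disk) used for spooling, in the units given.

    per_repeater=True models MDist2 (limits shared by all distributions);
    per_repeater=False models MDist1 (limits apply to each distribution).
    """
    factor = 1 if per_repeater else active_distributions
    return mem_max * factor, disk_max * factor

mdist1_bound = worst_case_spool(65536, 512000, 3, per_repeater=False)
mdist2_bound = worst_case_spool(65536, 512000, 3, per_repeater=True)
```

Three concurrent distributions can therefore claim up to three times the configured memory and disk under the MDist1 model, but never more than the configured values under the MDist2 model.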

Data depots

MDist2 provides gateways with the ability to store transferred data in a local repository. This is called a depot, and its data entry is called a segment. A depot can store data (segments) temporarily or permanently.

Basically, the multiplexing functionality is the same between MDist1 and MDist2; that is, the repeater site receives a single copy of data and distributes it to the next tier of clients. However, there is a difference in whether the data can be stored in the distribution hierarchy or not. For MDist1, the data is cached only while distributions are executed. When the distributions are completed, whether succeeded or failed, the data will be deleted. This can be described as a data pipeline.

With MDist2, on the other hand, each repeater receives the entire distribution before sending it on to the next level of repeaters or to its endpoints. The data can be stored on the depot either temporarily or permanently. This type of distribution uses a store-and-forward mechanism (see Figure 11-36).

Figure 11-36 Depot concepts

Figure 11-37 shows the structure of depots and usage between repeaters.

Figure 11-37 Depots between repeaters


The depots have the following distribution functionalities:

• Repeaters can depot distribution data
• Checkpoint restart between repeaters and gateways and endpoints
• Reduce network traffic for frequently distributed data sets
• Data can be stored temporarily or permanently
• Data is transferred via store and forward

Segments

A depot stores each distribution segment as a separate entity, which consists of two files with the extensions .toc and .dat, where both files share the same name (an 8-digit serial number). Example 11-38 shows the .toc and .dat files for the depot called halibut.

Example 11-38 .toc and .dat files

>set DBDIR
DBDIR=c:\Tivoli\db\halibut.db
>dir c:\Tivoli\db\halibut.db\tmp\depot
00000001.dat
00000001.toc
00000002.dat
00000002.toc
00000003.dat
00000003.toc

The .toc (table of contents) file maintains the information for each segment as shown in the list below, and the .dat files are the actual data to be distributed. When the repeater loads segments into memory, the .toc file is scanned.

Segment           Segment description.
Creation time     The time the segment first arrived at the depot.
Last access time  The last time the segment was referenced by a distribution.
Update count      The number of times the segment data was modified.
Access count      The total number of destinations to which the segment was distributed.
In use count      The number of connections using the segment.
Is stored         Indicates whether the segment's data is stored in the depot.
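When checking a depot directory by hand, it can help to confirm that every segment has both of its files. The following Python sketch groups the files by serial number and reports which segments are complete. It is an illustrative helper only: pair_segments is a hypothetical name, the .toc contents are not parsed because their format is not described here, and the demo runs against a temporary directory rather than a real depot.

```python
from pathlib import Path
import tempfile

def pair_segments(depot_dir):
    """Group depot files by serial number; return (complete, incomplete).

    A segment counts as complete when both its .toc and .dat files exist.
    """
    by_serial = {}
    for path in Path(depot_dir).iterdir():
        if path.suffix in (".toc", ".dat"):
            by_serial.setdefault(path.stem, set()).add(path.suffix)
    both = {".toc", ".dat"}
    complete = sorted(s for s, found in by_serial.items() if found == both)
    incomplete = sorted(s for s, found in by_serial.items() if found != both)
    return complete, incomplete

# Demo with made-up files; segment 00000002 is missing its .toc file.
with tempfile.TemporaryDirectory() as depot:
    for name in ("00000001.dat", "00000001.toc", "00000002.dat"):
        (Path(depot) / name).touch()
    complete, incomplete = pair_segments(depot)
```

On a real repeater the function would be pointed at the depot subdirectory under rpt_dir; treat any segment reported as incomplete as a hint for further investigation rather than proof of corruption.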


IBM Tivoli Configuration Manager uses this function to store software packages in depots. In this case, it is called a software depot. This provides the ability to store mission critical software on servers that are closer to the ultimate destinations. Also, the same software package can be stored in more than one depot. We will now introduce how the depot is used in the IBM Tivoli Configuration Manager context.

Enable the depot to store software packages permanently

A depot can store data (segments) temporarily or permanently, as defined by the MDist2 configuration option permanent_storage. Its value is either TRUE (allow segments to be stored in the depot permanently) or FALSE (segments are deleted after distribution is completed). The default value is TRUE.

In large-scale environments, you may have a requirement to keep software packages on local LAN sites to avoid transferring them over a WAN every time a distribution occurs. In this case, be sure that this option is set to TRUE by checking it with the wmdist -s command. If it is not TRUE, enter the following command:

wmdist -s central permanent_storage=TRUE

Depot configuration

To see the current configuration of a depot, you may use the wdepot describe command. Example 11-39 shows the output of this command.

Example 11-39 Depot configuration

#wdepot central describe
depot Location = d:/Tivoli/db/central.db/tmp/depot/
depot Size = 512000 (KB)
Temporary Storage = 4294967295 (KB)
Permanent Storage = 0 (KB)
Total Storage = 4294967295 (KB)
Free Space = 512001 (KB)
#
#wmdist -s central
repeater_id: 1978508757.5.21
rpt_dir: d:/Tivoli/db/central.db/tmp/
permanent_storage: TRUE
max_sessions_high: 5
max_sessions_medium: 10
max_sessions_low: 40
disk_max: 512000
mem_max: 65536
send_timeout: 300
execute_timeout: 600
notify_interval: 30
conn_retry_interval: 900
retry_ep_cutoff: 7200
net_load: 500
packet_size: 16384
target_netload: 0
debug_level: 3
debug_delay: 0

Define the depot directory with the MDist2 configuration option rpt_dir. When this option is set to a directory path, a .../depot/ subdirectory is created. The default directory of the rpt_dir option is $DBDIR/tmp/.

Note: In the previous example, the repeater central runs on Windows NT. For a UNIX node, the default directory would be /usr/local/Tivoli/db/central.db/tmp/depot/.

Recall that the depot size is handled by the MDist2 configuration option disk_max (512 MB by default). The wmdist command with the -s argument can display or modify its value.

Loading software packages on depots

Software packages can be loaded or unloaded on a depot prior to distribution. Tivoli Software Distribution 4.0 provides the wldsp command, which loads a software package on one or more depots. Similarly, the wuldsp command unloads a software package from one or more depots. The syntax of the commands is:

Usage: wldsp @[SoftwarePackage:]spobj_name [repeater ...]
Usage: wuldsp @[SoftwarePackage:]spobj_name [repeater ...]

Where:

spobj_name  Name (label) of the software package object
repeater    Name of one or more repeater depots

Consider Example 11-40.

Example 11-40 wldsp command

#wldsp @example^01 central
Operation successfully submitted.
Distribution ID is 1978508757953260009.
#
#wuldsp @example^01 central
Operation successfully submitted.
Distribution ID is 1978508757953260753.

The wldsp command submits a distribution that sends a software package to the repeater as a target. Similarly, wuldsp submits a distribution that deletes a software package from the repeater.

Using the Tivoli Management Framework command wdepot, you can also control the segments (software packages) on a depot, such as adding, deleting, listing, or getting information about each segment. The syntax of the command is:

wdepot repeater_name list [id^version] [-l]
wdepot repeater_name delete id^version
wdepot repeater_name purge

Where:

list        Lists all segment entries in the depot
delete      Deletes specified segment entries in the depot
purge       Deletes all entries in the depot
id^version  ID and the version of the segment
-l          Lists all information for each entry

Example 11-41 gives an example of the command.

Example 11-41 wdepot

#wldsp @example^1 halibut
Operation successfully submitted.
Distribution ID is 1757544609954288623.
#wdepot halibut list
Name     Version  Status  Size(bytes)     Update time
Test     1        P       12181684(100%)  2000/03/01 07:40:17
First    1        P       320669(100%)    2000/03/08 10:25:26
example  1        P       12181684(100%)  2000/03/0
#wdepot halibut list -i example^1 -l
Entry #1:
Id: example
Version: 1
Bytes received: 12181684(100%)
Location: c:/Tivoli/db/halibut.db/tmp/depot/0000004.dat
Creation time: 2000/03/28 18:19:42
Last modification time: 2000/03/28 18:19:42
Receive time: 2000/03/28 18:19:42
Last access time: 2000/03/28 18:19:42
Update time: 2000/03/28 18:19:42
Access count: 0
Modification count: 1
Reference count: 0
Storage status: Permanent

As shown above, the software package example^1 is converted to a file under the depot directory (0000004.dat).

Note: To load a software package on a depot, it first must be built as a software package block (.spb).

Software package distribution

One of the distribution options you can set is From depot. When it is selected, the software package loaded on the depot is distributed instead of the original package on the source host. Tivoli Software Distribution provides both GUI and command line interfaces for installing software packages as well as setting distribution options. Figure 11-38 shows the Install Software Package dialog with the distribution option From depot selected.


Figure 11-38 Software Distribution GUI: Install Software Package with depot

Set the same option from the command line with winstsp using the -l argument, which is used for all MDist2-related options. To tell the system to distribute data from the depot, type winstsp -l from_depot=yes.

Note: The From depot option requires that software packages be loaded with the wldsp command prior to the distribution. Otherwise, the distribution will fail.

Another distribution option, Disposable, causes data to be stored on the depot only temporarily. When this option is used, the software package associated with the distribution is deleted from the depot once the distribution is finished (either all endpoints have completed or the distribution has expired). You can set this option both from the GUI (see the Disposable check box in Figure 11-38) and from the command line. To set this option from the command line, use winstsp with the -l argument (winstsp -l disposable=y). Do not use this option if you want to keep software packages on the depot.


Note: When using the Disposable option, you do not need to load software packages prior to the distribution. Nor do you need to unload software packages explicitly with wuldsp because they are automatically deleted from the depot after distribution.

If the MDist2 repeater is configured to use the depot temporarily (with the permanent_storage=FALSE statement), the software package is deleted after distribution regardless, even if the disposable option is not set. Table 11-14 explains the relationship between disposable and MDist2 permanent_storage configuration options.

Table 11-14 Relation of disposable option and permanent_storage configuration

Option          permanent_storage=FALSE   permanent_storage=TRUE
disposable=Y    deleted (unloaded)        deleted (unloaded)
disposable=N    deleted (unloaded)        remains (loaded)
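The rule in Table 11-14 reduces to a single condition, which the following minimal Python sketch illustrates. The function name is hypothetical and is not part of any Tivoli interface; the sketch only restates the table.

```python
def package_remains_on_depot(disposable, permanent_storage):
    """Return True if the package stays loaded on the depot after distribution.

    Hypothetical helper that restates Table 11-14: the package survives only
    when the distribution is not disposable AND the repeater is configured
    with permanent_storage=TRUE.
    """
    return (not disposable) and permanent_storage


# Print the four combinations in the same order as the table rows.
for disposable in (True, False):
    for permanent_storage in (False, True):
        outcome = "remains (loaded)" if package_remains_on_depot(
            disposable, permanent_storage) else "deleted (unloaded)"
        print(f"disposable={disposable!s:5} permanent_storage={permanent_storage!s:5} {outcome}")
```

In other words, either setting on its own is enough to have the package removed from the depot.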

Figure 11-39 shows the default depot directory on a repeater.

[Figure: the MDist II repeater directory $rpt_dir ($DBDIR/tmp/ by default) contains the depot in /depot/ and the persistent queue in /states/.]

Figure 11-39 Depot directory

Consider a typical three-tiered management structure environment over a WAN, as illustrated in Figure 11-40 on page 425.


[Figure: an Operation Center (TMR Server, TEC Server, repeater, and so on) connected over a Wide Area Network to Sites A, B, C, and D, each with one or more endpoint (EP) gateways serving the local endpoints.]

Figure 11-40 An example of a large-scale distribution environment

In this environment, one or more endpoint gateways are located on each site, serving as repeater sites for the distribution. Note: Gateways are repeaters by default.

When distributing to a target on the remote side of a slow network connection, such as a WAN, the low bandwidth may affect the performance and throughput of the distribution and sometimes of the entire network. Therefore, distributions need to be fast and efficient to reduce system and network load.

To meet this requirement, MDist1 uses a fan-out mechanism for distributions to large numbers of targets and distributions that must cross a WAN. By configuring a repeater as an entry point for each LAN, the source repeater, distributing across a WAN, only needs to distribute to the individual LAN entry point repeaters. The individual LAN repeaters then “fan out” the distribution to other repeaters within their local hierarchy if enabled, or directly to the end targets, depending on how they are configured. Therefore, with repeaters enabled in a proper hierarchy throughout a WAN, a distribution to 1000 targets, for example, does not require 1000 data transfers across a slow network, as it would with only one point of distribution. Using the fan-out mechanism minimizes how often data crosses a slow network or WAN by using LAN repeaters to distribute within their own local network.

Software package blocks, as introduced in Tivoli Software Distribution 4.0, also provide a possible solution in slow network or WAN scenarios. A file package block is a static file containing a snapshot of a file package. This snapshot includes the file package definition, file package attributes, source files and directories, and configuration programs for a specific file package. A file package block of a specific file package is created on each remote LAN repeater across a WAN. This enables each LAN entry point repeater to distribute the file package locally to its targets.

In addition to this, the IBM Tivoli Configuration Manager Software Distribution component provides a depot feature that performs the same function more simply and efficiently. With the MDist2 depot, you can store (load) software packages on depots on any endpoint gateway (MDist2 repeater), and distribute them from the depot. Figure 11-41 explains how the software package is created and distributed to each target.
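The saving that the fan-out mechanism gives for the 1000-target case can be illustrated with a little arithmetic. The following Python sketch is only a model: the site names, the even split of 250 targets per site, and the function name are assumptions made for illustration.

```python
def wan_transfers(targets_per_site, use_lan_repeaters):
    """Count how many copies of a package must cross the WAN.

    targets_per_site maps a site name to its number of targets.
    With a LAN entry point repeater at each site, one copy per site
    crosses the WAN; without one, every target needs its own copy.
    """
    if use_lan_repeaters:
        return sum(1 for count in targets_per_site.values() if count > 0)
    return sum(targets_per_site.values())


# Illustrative layout only: four sites with 250 targets each.
sites = {"Site A": 250, "Site B": 250, "Site C": 250, "Site D": 250}
print("Single point of distribution:", wan_transfers(sites, False))
print("One entry repeater per LAN:  ", wan_transfers(sites, True))
```

Under these assumptions the number of WAN transfers depends on the number of sites rather than on the number of targets.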

[Figure: the software package (SP) is created on the source host in the Operation Center (1), loaded across the Wide Area Network to the depots on the endpoint (EP) gateways at Sites A, B, C, and D (2), and distributed from each depot to the local endpoints (3).]

Figure 11-41 Software Distribution scenario using a depot

1. Create the software package. The software package is created on the source host.
2. To load a software package to a depot, execute the wldsp command with the appropriate options. The software package with its definitions is then copied to each MDist2 repeater (endpoint gateway) that you specified.
3. To distribute the software package to each target (endpoint), execute the winstsp or other appropriate change management command with the From depot option. The endpoint gateway distributes the software package from the depot to each target.

For example, suppose an MDist2 distribution across a slow WAN link to a remote depot repeater fails when the remote depot repeater tries distributing to its endpoints. The remote depot repeater (local to the target endpoints) contains the software package, and retries sending the distribution to those targets over a specified period of time. In MDist1, if the same scenario occurred, the retry function would redistribute the software package from the original source host through the same route over the WAN, because MDist1 does not dynamically store distribution information. With MDist2, however, if the data is loaded on the depot, the distribution retry does not require the file to be transmitted from the source host over a slow link (WAN).

The advantage of using a depot becomes more apparent for larger software packages. Software packages can be stored on local repeaters for fast and efficient distribution to systems on the same LAN when the source repeater is on the other end of a WAN connection. The depot also provides checkpoint restart functionality, as discussed in the next section.

Checkpoint restart

When a distribution is interrupted due to a network failure, machine reboot, or power failure, it is automatically resumed from the last checkpoint on the receiver. This means that if an interruption occurs, entire files and configurations do not need to be retransmitted, but only the portions that did not arrive before the break in connection. This maximizes the use of valuable bandwidth by preventing the entire file package from having to be re-sent to the endpoint.

Once the connection is re-established, the sending repeater (depot) contacts the receiving repeater and sends it the list of segments for the software package. The receiving repeater checks its depot for segments received against the list. If it has a partial segment, the receiver sends the incomplete segment back to the sending repeater. It tells the sending repeater not to send the segments that it received in full. A handshake occurs at the start of each connection (for both repeaters and endpoints). Endpoint checkpoint restart requires the participation of the application. Figure 11-42 on page 428 shows how checkpoint restart works.


Figure 11-42 Implementation of checkpoint restart

For example, suppose a distribution is running as a background process to a ThinkPad connected to a campus network, and the user disconnects the system in the midst of the transmission to use it remotely. The MDist2 repeater regards the distribution to this system as interrupted and sets a checkpoint. Once the user reconnects and the endpoint logs back in to the gateway, the distribution resumes from the interrupted point to completion.
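The segment handshake described in this section can be modeled with a few lines of Python. This is a sketch of the idea only, not the MDist2 wire protocol: the function name and the data layout are hypothetical, and resuming a partial segment from a byte offset is our assumption about how the incomplete segment is handled.

```python
def segments_to_send(segment_sizes, received_bytes):
    """Work out what the sender still has to transmit after a reconnect.

    segment_sizes:  {segment_id: total size in bytes}, the sender's list.
    received_bytes: {segment_id: bytes already held in the receiver's depot}.
    Returns a list of (segment_id, resume_offset) in the sender's order.
    """
    plan = []
    for segment_id, size in segment_sizes.items():
        have = received_bytes.get(segment_id, 0)
        if have >= size:
            continue  # received in full: the sender is told not to resend it
        plan.append((segment_id, have))  # missing or partial segment
    return plan


sender_list = {"seg1": 100, "seg2": 100, "seg3": 100}
receiver_depot = {"seg1": 100, "seg2": 40}  # the connection broke during seg2
print(segments_to_send(sender_list, receiver_depot))
```

Only the partial segment and the segment that never arrived are scheduled again; the complete segment is skipped.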

How MDist2 handles network and power failures differently

If the distribution of a software package is interrupted due to a network failure, machine reboot, or power failure, the gateway recognizes the failure and retries the distribution when the timeout expires. The distribution is automatically resumed when the cause of failure has been corrected. However, checkpoints are saved differently for network failures and for power failures, and the two cases are customized in different ways.

For network failures, the distribution resumes from the last completed block. The amount of data that must be retransmitted depends on the MDist2 configuration parameter packet_size, which sets the checkpoint memory buffer size. The default value is 16 KB, which is recommended for the best network distribution performance. You can modify the packet_size value with the wmdist -s command (for example, wmdist -s packet_size=16). However, 16 KB is typically optimal and changing this value is not advised.

For power failures, the distribution resumes from the last successful checkpoint, as detailed in the description that follows. A handshake occurs between repeaters, and between repeaters and endpoints, that allows a data stream to be resumed at the point of interruption. This handshake occurs for each connection, so that if there is a failure within the hierarchy during the distribution of a large package, the distribution need not be restarted from the beginning.

For transmissions between repeaters and endpoints, the buffer size is set dynamically. Normally, the size of the buffer is set to a tenth of the size of the package, giving ten checkpoint saves during the transmission, because the number of bytes transmitted between each checkpoint save depends on the size of the buffer. If there is a failure, the transmission restarts from the last checkpoint save. The buffer has a minimum size of 1 MB and a maximum of 2 GB. Therefore, the number of checkpoints differs from ten only for files smaller than 10 MB or larger than 20 GB.

If you prefer to set a fixed size for the buffer, you can define a buffer size by assigning a value to the checkpoint_buffer_size attribute in the swdis.ini file. This attribute controls checkpoint buffers in the following way:
- If the attribute is not specified, dynamic buffer evaluation is used.
- If the attribute is set to 0, checkpointing is disabled.
- If the attribute is set to a value between 1 MB and 2 GB, the buffer size is set to that value.
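The dynamic buffer rule and the three cases of checkpoint_buffer_size can be checked with a short Python sketch. The function names are hypothetical, 1 MB is assumed to mean 1024 x 1024 bytes, and the behavior for a configured value outside the 1 MB to 2 GB range is not stated in the text, so the sketch simply rejects it.

```python
MB = 1024 * 1024          # assumption: binary megabytes
GB = 1024 * MB
MIN_BUFFER = 1 * MB
MAX_BUFFER = 2 * GB


def checkpoint_buffer(package_size, configured=None):
    """Return the checkpoint buffer size in bytes, or None if checkpointing is off.

    configured models checkpoint_buffer_size: None means the attribute is
    not specified, 0 disables checkpointing, and any other value is used
    as the fixed buffer size.
    """
    if configured is None:
        # Dynamic evaluation: a tenth of the package, clamped to 1 MB..2 GB.
        return min(max(package_size // 10, MIN_BUFFER), MAX_BUFFER)
    if configured == 0:
        return None
    if MIN_BUFFER <= configured <= MAX_BUFFER:
        return configured
    raise ValueError("value outside 1 MB..2 GB: behavior not documented here")


def checkpoint_count(package_size, configured=None):
    """Number of checkpoint saves needed to cover the whole package."""
    buffer_size = checkpoint_buffer(package_size, configured)
    if buffer_size is None:
        return 0
    return (package_size + buffer_size - 1) // buffer_size  # ceiling division


for size in (5 * MB, 100 * MB, 40 * GB):
    print(size // MB, "MB package:", checkpoint_count(size), "checkpoints")
```

Under these assumptions, a 100 MB package gives the normal ten checkpoints, while a 5 MB package gives five and a 40 GB package gives twenty, which matches the statement that the count differs from ten only below 10 MB or above 20 GB.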

As we will explain in the next section, the administrator can control the distribution process with pause, resume, or cancel options. The pause and resume actions are treated as any distribution interruption and use checkpoint restart. Canceled distributions, however, must restart from the beginning.

Assured delivery

To handle a large number of distributions, MDist2 repeaters use a queue mechanism. MDist2 repeaters keep persistent information about distributions in a local file (MDist2.bdb). This information is loaded into memory every time the repeater starts up. You can configure the MDist2 directory, including the depot location, using the rpt_dir configuration option. In this directory, a subdirectory named /states maintains the persistent information of the MDist2 repeater queue (MDist2.bdb) as well as the log file (MDist2.log).

If a connection to a receiver (a repeater or a target endpoint) cannot be established or is broken, the repeater keeps an initiated distribution in its queue, and waits for the connection to be (re)established. The repeater automatically retries the connection at a predefined interval until the connection is re-established or until the distribution automatically aborts after reaching the deadline specified by the calling administrator (through the application configuration, not MDist2). The interval between retries is defined by the MDist2 configuration option conn_retry_interval (900 seconds by default).

For endpoints, MDist2 provides an additional deadline configuration option, retry_ep_cutoff (7200 seconds by default), that is not application specific. When a distribution to an endpoint is in progress and the connection is broken, the gateway repeater tries to reconnect to the endpoint at the defined interval. Once retry_ep_cutoff is reached, the gateway repeater stops trying to connect to the endpoint and keeps the distribution in the queue until the application-specified distribution deadline. Figure 11-43 illustrates the retry process after an interruption between an endpoint gateway (MDist2 repeater) and an endpoint.

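[Figure: a distribution from the TMR server or source host reaches the MDist II repeater (endpoint gateway). The endpoint is interrupted or unavailable (1), the distribution is kept in the queue (2), the repeater retries every conn_retry_interval (3), and the distribution expires at the application deadline or at retry_ep_cutoff (4).]

Figure 11-43 Retry option: Gateway repeater and endpoint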

Where:
1. The connection to the endpoint cannot be established or breaks.
2. The MDist2 repeater keeps this distribution in the repeater queue.
3. The MDist2 repeater attempts to establish or re-establish a connection to the designated target at the interval set by the MDist2 configuration option conn_retry_interval.
4. The distribution expires when the deadline defined by the application (for example, Software Distribution) is reached or the MDist2 configuration option retry_ep_cutoff is reached.
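To see how conn_retry_interval and retry_ep_cutoff interact, the retry times can be listed with a small Python sketch. The function name is hypothetical, and two details are assumptions: the first retry happens one full interval after the break, and an attempt that falls exactly on the cutoff still takes place.

```python
def endpoint_retry_times(conn_retry_interval=900, retry_ep_cutoff=7200, deadline=None):
    """Return the times, in seconds after the break, of each reconnect attempt.

    deadline models the application-defined distribution deadline, also
    measured in seconds after the break. None means no deadline applies
    before retry_ep_cutoff.
    """
    limit = retry_ep_cutoff if deadline is None else min(retry_ep_cutoff, deadline)
    return list(range(conn_retry_interval, limit + 1, conn_retry_interval))


print(endpoint_retry_times())               # defaults from the text
print(endpoint_retry_times(deadline=2000))  # a short application deadline
```

With the default values this gives eight attempts over two hours; a shorter application deadline cuts the sequence off earlier.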


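Figure 11-44 illustrates an interruption between an endpoint gateway (MDist2 repeater) and another repeater.

[Figure: a distribution from the TMR server or source host reaches the MDist II repeater. The receiving repeater is interrupted or unavailable (1), the distribution is kept in the queue (2), the sending repeater retries every conn_retry_interval (3), and the distribution expires at the application deadline (4).]

Figure 11-44 Retry option - endpoint gateway and another repeater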

Where:
1. The connection to the receiving repeater cannot be established or breaks.
2. The MDist2 repeater keeps the distribution in the repeater queue.
3. The MDist2 repeater attempts to establish or re-establish a connection to this repeater at an interval defined by the MDist2 configuration option conn_retry_interval.
4. Because MDist2 does not have a deadline configuration option for repeaters, the repeater continues to retry until the application deadline is reached.

If an undelivered target is a repeater, the distribution aborts for all targets it serves beneath it. The sending repeater resumes the distribution to the receiving repeater upon reconnect, and in turn the receiving repeater distributes to the targets beneath it.

Note: MDist1, unlike MDist2, stores no persistent state information and therefore cannot resume a timed-out distribution. The administrator must redistribute as a new operation.

As described in the previous scenario, the application can define the deadline (timeout) of a distribution. The Software Distribution component provides this setting as part of its change management options, such as install and remove. The administrator sets the deadline when setting up a software package distribution (see Figure 11-45). To reach the dialog shown in Figure 11-45, select Advanced Options -> Time-out Settings from the menu bar.

Figure 11-45 Software Distribution GUI: Time-out Settings

As shown, there are four options:

Deadline
    The date on which a distribution expires, that is, when it fails for unavailable targets.

Notification Interval
    The length of time before MDist2 sends distribution results notification back to the sending application. See “Reporting results” on page 437.

Send Timeout
    The length of time a repeater will wait for a target to receive a block of data. This timeout is used to detect network or endpoint failures. The default length of time is 300 seconds (five minutes). This option overrides the MDist2 send_timeout option.

Execution Timeout
    The length of time a repeater will wait for Tivoli Software Distribution to return the result of a distribution after all the data has been sent. This timeout is used to detect network, endpoint, or script failures, such as a script running in an infinite loop. The default length of time is 300 seconds (five minutes). This option overrides the MDist2 execute_timeout option.

Note: MDist2 also provides send_timeout and execute_timeout configuration options. Their roles are the same as those of the Send Timeout and Execution Timeout options described above:
- send_timeout (300 seconds by default)
- execute_timeout (600 seconds by default)

Applications that provide the same time-out options as MDist2 override the MDist2 setting.
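The override rule can be expressed as a simple lookup order, sketched below in Python. The function and dictionary names are hypothetical; only the two default values come from the note above.

```python
# MDist2 defaults quoted in the note above, in seconds.
MDIST2_DEFAULTS = {"send_timeout": 300, "execute_timeout": 600}


def effective_timeout(option, application_value=None, repeater_config=None):
    """Return the timeout that applies to a distribution, in seconds.

    application_value models a value set by the application (for example,
    in the Time-out Settings dialog); when present it takes precedence.
    repeater_config models values set on the repeater itself.
    """
    if application_value is not None:
        return application_value
    config = {**MDIST2_DEFAULTS, **(repeater_config or {})}
    return config[option]


print(effective_timeout("execute_timeout"))                         # MDist2 default
print(effective_timeout("execute_timeout", application_value=300))  # application wins
print(effective_timeout("send_timeout", repeater_config={"send_timeout": 120}))
```

When troubleshooting an unexpected timeout, this order suggests checking the application setting first and the repeater configuration second.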

Disconnected endpoint support

Using the persistent queue data maintained by the MDist2 repeater, MDist2 enables support for disconnected endpoints. If an MDist1 repeater cannot establish a connection to a target endpoint, the distribution to the target aborts, and the administrator has to re-submit the distribution to this target. With MDist2, if the gateway repeater cannot establish a connection, the repeater keeps the distribution in its queue and waits for the endpoint to log in. It retries connecting to the endpoint at an interval defined by the MDist2 configuration option conn_retry_interval, up to the deadline defined by the MDist2 configuration option conn_retry_cutoff. This functionality enables the solution to automate the distribution when the target system starts up. Figure 11-46 on page 434 shows the sequence of events for disconnected support.


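[Figure: the TMR server or source host sends the request to the MDist II repeater (1); the repeater queues the distribution and tries to reach the powered-off target (2); the target starts up and logs in (3); the repeater starts the queued distribution (4).]

Figure 11-46 Automated Software Distribution scenario at power-on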

Where:
1. The initial distribution request is sent to the MDist2 repeater.
2. The MDist2 repeater (endpoint gateway) attempts to establish a connection to the designated target. In this case, the target is unreachable, because it is powered off. The distribution stays in the repeater queue and tries again to establish a connection to the endpoint at a given interval until the distribution reaches the deadline.
3. When the system starts up, the endpoint tries to log into its endpoint gateway.
4. The gateway repeater of MDist2 intercepts the endpoint login and determines if there are any distribution requests in its queue for the endpoint. If there are, the distribution starts.
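Steps 2 to 4 amount to a per-endpoint queue that is consulted at login. The following Python sketch models that behavior in memory. The class and method names are hypothetical, and dropping distributions whose deadline has passed at login time is our simplification of the expiry handling.

```python
from collections import defaultdict


class GatewayQueue:
    """Toy model of the repeater queue consulted when an endpoint logs in."""

    def __init__(self):
        # endpoint label -> list of (distribution_id, deadline)
        self._pending = defaultdict(list)

    def enqueue(self, endpoint, distribution_id, deadline):
        """Keep a distribution for an endpoint that cannot be reached."""
        self._pending[endpoint].append((distribution_id, deadline))

    def on_login(self, endpoint, now):
        """Return the queued distributions to start for this endpoint.

        Entries whose deadline has already passed are dropped (expired).
        """
        queued = self._pending.pop(endpoint, [])
        return [dist_id for dist_id, deadline in queued if now <= deadline]


queue = GatewayQueue()
queue.enqueue("laptop1", "dist-1", deadline=100)
queue.enqueue("laptop1", "dist-2", deadline=50)
print(queue.on_login("laptop1", now=60))  # dist-2 expired while the laptop was off
```

A second login with nothing queued returns an empty list, so the login intercept costs nothing for endpoints without pending distributions.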

Mobile computing support

The disconnected endpoint support function, provided by the distribution queue with persistent information, helps assure successful deployment to intermittently connected systems. If a mobile user is not connected at the time of the distribution, the information will remain in the queue for this user until the workstation reconnects to the network, or until the distribution reaches the expiration time set by the administrator.


Distribution control and status

Once distributions are submitted to MDist2, the Distribution Manager manages them. The Distribution Manager keeps distribution status and performs control operations, such as pause, resume, and cancel. It assigns each distribution an ID and uses it to identify and track the distribution. The Distribution Manager maintains the list of all completed and pending distributions and their status for each destination in a relational database accessed by RIM. The stored distribution information updates dynamically as the status changes.

Note: MDist2 repeaters can still perform basic repeater functions without a RIM database; however, you must configure a RIM database to control or receive the status of distributions.

MDist2 provides an option to automatically remove completed distributions from the database to preserve database space. To configure this option, use the wmdist -T command. The database_purge_interval value is the interval in seconds from the completion of distribution until its entry is deleted. The default is -1, which means the option is disabled. You can also delete distributions using the GUI or the wmdist -d command. MDist1 does not provide status information to the application until the distribution is finished. For example, when Tivoli Software Distribution initiates a distribution, the user cannot see which machines are encountering problems, or estimate when the distribution will finish. In large-scale environments, where there are hundreds or thousands of targets, the need for real-time distribution status information increases greatly, and MDist2 should therefore be used.
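The effect of the database_purge_interval option described above can be modeled as follows. This Python sketch works on a plain list of records rather than on the RIM database; the function name and record layout are hypothetical, and treating every negative value like -1 (disabled) is an assumption.

```python
def purge_completed(entries, now, database_purge_interval=-1):
    """Return the entries that survive a purge pass.

    entries: list of dicts with 'id' and 'finish_time' (seconds, or None
    while the distribution is still pending).
    database_purge_interval: seconds from completion until deletion;
    -1 means the option is disabled.
    """
    if database_purge_interval < 0:
        return list(entries)  # purging disabled: keep everything
    kept = []
    for entry in entries:
        finished = entry["finish_time"]
        if finished is not None and now - finished >= database_purge_interval:
            continue  # completed long enough ago: delete the entry
        kept.append(entry)
    return kept


records = [
    {"id": "d1", "finish_time": 1000},  # completed long ago
    {"id": "d2", "finish_time": 9500},  # completed recently
    {"id": "d3", "finish_time": None},  # still pending
]
print([e["id"] for e in purge_completed(records, now=10000, database_purge_interval=3600)])
```

Pending distributions are never purged in this model, because the interval is counted from completion.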

Distribution status

A distribution can be in any of the following states for each destination:

- Status by Severity

OK
    The distribution processed normally or was placed in a state by user intervention.

Warning
    A recoverable error occurred in the distribution. States include Interrupted and Unavailable.

Error
    An error occurred in the distribution. States include Failed and Expired.

- Status by Progress

Pending
    The distribution is still in progress. States include Waiting, Receiving, Sending, Paused, Interrupted, and Unavailable.

Completed
    The distribution is completed. States include Successful and Unsuccessful. Unsuccessful includes Failed and Canceled.

Distribution and target states

Canceled
    An administrator canceled the distribution.

Expired
    The distribution exceeded the time-out deadline.

Failed
    The distribution failed. Check the application log for more information.

Interrupted
    The distribution was interrupted, such as by a connection break. The repeater depot continually tries to resume the distribution over specified intervals until either the MDist2 or application distribution timeout is reached.

Paused
    An administrator halted the distribution for an indefinite period of time.

Receiving
    The target is in the midst of receiving the distribution.

Rejected
    The endpoint user rejected the distribution. (This state is not in the current release, but is reserved for future use.)

Sending
    The repeater is sending data and the distribution can still be paused or canceled.

Successful
    The distribution is successfully completed.

Unavailable
    The repeater cannot communicate with the target. The depot attempts the distribution again when the endpoint logs back into the gateway. The state changes to Expired once the distribution timeout is reached.

Waiting
    The target is waiting to receive the distribution. There is not yet communication between the depot and target.

These states mostly refer to the target endpoint; however, they include repeater statuses, such as sending, used in the distribution topology view or node tables.
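When scripting around distribution status, it can help to map each state to its severity and progress group. The following Python sketch encodes the groupings listed above. The function names are hypothetical; placing Expired in the completed group is an assumption, because the text names it only under the Error severity, and the reserved Rejected state is left out.

```python
WARNING_STATES = {"Interrupted", "Unavailable"}
ERROR_STATES = {"Failed", "Expired"}
PENDING_STATES = {"Waiting", "Receiving", "Sending", "Paused",
                  "Interrupted", "Unavailable"}
# Assumption: Expired is treated as completed (unsuccessful).
COMPLETED_STATES = {"Successful", "Failed", "Canceled", "Expired"}


def severity(state):
    """Classify a state as OK, Warning, or Error."""
    if state in ERROR_STATES:
        return "Error"
    if state in WARNING_STATES:
        return "Warning"
    if state in PENDING_STATES or state in COMPLETED_STATES:
        return "OK"  # normal processing or user intervention
    raise ValueError(f"unknown state: {state}")


def progress(state):
    """Classify a state as Pending or Completed."""
    if state in PENDING_STATES:
        return "Pending"
    if state in COMPLETED_STATES:
        return "Completed"
    raise ValueError(f"unknown state: {state}")


for name in ("Paused", "Unavailable", "Expired", "Canceled"):
    print(f"{name:12} severity={severity(name):8} progress={progress(name)}")
```

Note that a state such as Unavailable is both Pending and Warning: the two groupings are independent views of the same state.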


Reporting results

We have already mentioned that MDist2 uses an asynchronous delivery method. Results are sent back to the application that called the MDist2 service. The following sequence of events occurs when reporting the status:

1. The MDist2 repeater buffers the results of completed distributions.
2. The repeater sends the results back to the application and the Distribution Manager, and the MDist2 database is updated with the new target information, when either of the following occurs:
   - The distributions for all targets complete.
   - The interval (in minutes) specified by the MDist2 configuration option notify_interval elapses. This option is configured using the wmdist -s repeater_name notify_interval= command. The default value is 30 minutes.

In large-scale environments, MDist2 can group success or failure notifications into single messages to reduce the network traffic.
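The two conditions that trigger a results report can be written as one predicate. The Python sketch below is a model only: the function and parameter names are hypothetical, and it assumes that either condition on its own is enough to send the buffered results.

```python
def should_send_results(targets_total, targets_completed,
                        minutes_since_last_report, notify_interval=30):
    """Decide whether the repeater should flush its buffered results.

    Results are sent when every target has completed, or when
    notify_interval minutes have elapsed since the last report.
    """
    all_done = targets_completed >= targets_total
    interval_elapsed = minutes_since_last_report >= notify_interval
    return all_done or interval_elapsed


print(should_send_results(100, 40, minutes_since_last_report=10))   # keep buffering
print(should_send_results(100, 40, minutes_since_last_report=30))   # interval elapsed
print(should_send_results(100, 100, minutes_since_last_report=5))   # all targets done
```

This is why status for a slow distribution may lag by up to notify_interval minutes even though individual targets finished earlier.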

Interfaces: GUI and command line

MDist2 provides both GUI and command line interfaces to manage distributions. The GUI displays distribution status both by distribution and by endpoint, and can present the information in different views. Different statuses can be differentiated by color or organized in charts. This information updates automatically. The GUI also provides distribution management functions, such as pause, resume, and cancel. Once you select a distribution or a node associated with the distribution, you can use one of the operation icons to manage the distribution.

To manage distributions from the command line interface, use wmdist. In addition to repeater configuration tasks, wmdist is used to retrieve distribution status in both summary and detail format, or to perform an operation directly from the command line or from shell scripts.

Pausing and resuming a distribution

When a distribution is paused, the connection for the distribution closes. It is re-established once the distribution is resumed. The Distribution Manager uses the distribution ID to track it. Upon initiating the pause, MDist2 sends a pause signal to the targets. This signal is routed along the same path as the distribution. The endpoint state does not change until the pause command catches up to the distribution and the Distribution Manager receives confirmation status. As a result, it is possible that the distribution to an endpoint will complete before the pause command can reach the distribution.


Canceling a distribution

When a distribution is canceled, the Distribution Manager uses the distribution ID to identify the distribution being canceled. To perform the cancel operation, MDist2 sends a cancel signal to the targets (this signal is routed along the same path as the distribution being canceled). The state of an endpoint does not change until the cancel signal catches up to the distribution and the Distribution Manager receives confirmation status. As a result, it is possible for a distribution to an endpoint to complete before the cancel command can reach the distribution.

Summary of distribution control and status

Below is a summary of control and status of distributions in IBM Tivoli Configuration Manager.

Distribution status
- Asynchronous notification
- Final and some intermediate states (paused) stored in RIM RDBMS
- Viewable with GUI
- Database entries persist after distribution finishes

Distribution control
- Abort and Pause/Resume available for:
  – Entire distributions
  – Individual endpoints for a distribution
- Use with scheduling to set distribution windows.
- Pause closes connections. Resume will re-establish the connection. Application must handle checkpoint restart.

Distribution Manager
- Separate distinguished object: Distribution Manager.
- Stores final status in a RIM database:
  – Distribution states: Table containing list of active and completed distributions. Columns: Dist ID, User, Label, Size, Source application, Source node, Start time, Finish time, Last update time, Expire time, Target count, No. of targets in each state, Min, Max, and Average time in each state.
  – Target States: Table containing an entry for every endpoint of every distribution. Columns: Dist ID, Node OID, Parent node, State, Start time, Finish time, Last update time, Time in each state.
- To reduce network traffic, only final states are stored in database. Possible exceptions are Pause and Unavailable.
- Results are returned from repeaters to the source host through the same repeater path used to distribute the data. The source host passes the statuses to the distribution manager.
- Can be configured to periodically purge completed distributions.
- Provides interface for pause/resume and cancel for entire distributions or selected endpoints of a particular distribution.

11.6.6 Troubleshooting MDist2

We will cover MDist2 troubleshooting within the context of IBM Tivoli Configuration Manager troubleshooting. Please refer to 19.4.5, “Check for MDist2 problems” on page 886.


Chapter 12. RDBMS Interface Module (RIM)

The RDBMS Interface Module (RIM) is designed to allow Tivoli applications that collect or generate large amounts of data to store that data in third-party databases. The goal of RIM is to give applications a common set of APIs to get and store data. RIM's job is to convert the data provided through those APIs to the format used by the various database vendors.

The following topics are discussed in this chapter:
- Section 12.1, “Overview of RIM” on page 442
- Section 12.2, “Understanding RIM” on page 442
- Section 12.3, “Installing a RIM” on page 445
- Section 12.4, “Troubleshooting example: Failure to connect with a RDBMS” on page 454
- Section 12.5, “Designing your Tivoli environment for a RIM” on page 461


12.1 Overview of RIM

The Tivoli applications that store and use large amounts of data are being migrated to use external databases. Many Tivoli applications, such as IBM Tivoli Configuration Manager, IBM Tivoli Monitoring, Tivoli Enterprise Console, and NetView, use RIM to store their data in RDBMS databases.

RIM objects are created on managed nodes. During the installation, you are asked to provide RIM configuration options. This information is used to create the RIM object and register it in the Tivoli object database. You can delete and recreate the RIM objects created by the installation. You can also create additional RIM objects using the wcrtrim command or move a RIM object from one managed node to another using the wmvrim command. The managed node where you create the RIM object is called the RIM host. There are two requirements for RIM hosts:
- The managed node must be local to the Tivoli region.
- The managed node must be preconfigured with the RDBMS client or server software.

The password for each RIM object must be the same as the password of the RDBMS database that it accesses, so you must change the password of each RIM object to match that of its repository. When you change the RDBMS database password, you must also change the password for the RIM object with the wsetrimpw command. The user name for the RIM object and the user name for the RDBMS must also match.

12.2 Understanding RIM

From the Tivoli application user’s perspective, RIM is invisible. An application using RIM will gather and store data in the RDBMS, but there is no interaction or involvement by the user.

12.2.1 RIM behind the scenes

Several components work together to make communication through the RIM possible. The client application uses RIM APIs to make a request to gather and retrieve data. The RDBMS_Interface translation layer receives the request from the client, looks up the RIM host, and sends the request to the RIM host. The third component is the vendor adaptor layer, which sends vendor-specific requests to the database. An overview of these components is given in Figure 12-1.

[Figure: the RIM client application consists of the client application and the RIM link library (RIM APIs); the RIM host managed node contains the RIM IDL interface (translation layer) and the RIM VAL (vendor adaptor layer), which makes the vendor-specific library calls.]

Figure 12-1 RIM components

12.2.2 RIM APIs

There are several Tivoli APIs that an application can use to store and retrieve information from an external database. Using API calls allows the application to be database vendor independent. These APIs can be seen as IOM commands in a wrimtrace output on the RIM host when an application has been using RIM. A list of RIM calls that can be seen using wrimtrace includes:

connect iom_session release database commit rollback execute_sql quote_value insert insert2 update update2 retrieve retrieve2 delete delete2

Refer to Tivoli Framework 3.7.1 Reference Manual, SC31-8434 for more information about using wrimtrace. Note that odadmin environ get RIM_DB_LOG shows you where the log will be written, and that you have two tracing options besides TRACE_OFF, which give database errors (ERROR) or the contents of the IOM packets (INFORMATION). When you change the trace level, be sure to kill the RIM__prog and RIM__Agent processes described in the manual.

Important: Neither process should be killed if there is more than one RIM object in the TMR.

12.2.3 RDBMS_Interface translation layer

The translation layer acts as the engine behind the RDBMS_Interface API. It performs two major tasks:
• Looks up the correct server with which to connect.
• Translates data passed through the API as any into an SQL string involving operations on the data model tables, rows, and columns (database records and fields).

12.2.4 Vendor adaptor layer

The lowest layer of the RDBMS_Interface is the set of stub functions that make calls to an actual vendor-specific library adapted for a given combination of RDBMS vendor and platform. This layer consists of hooks to vendor-specific library calls. Each vendor instance of this layer is implemented as a separate dedicated C program.

12.2.5 Client application communication

RIM performs the following actions using the values provided (see also Figure 12-2):
1. Application 1 asks TMR A for the object ID of the required RIM host. The TMR looks in the name registry and finds that Application 1’s RIM host is RIM host X.
2. Application 1 requests an action of RIM host X. This request is routed through TMR A.
3. RIM host X uses the directory value for Database Home and looks at the appropriate configuration file.
4. RIM host X looks up how to contact the RDBMS server by matching the name of the Server ID to an entry in the configuration file.
5. RIM host X contacts the RDBMS server and performs the action for the database specified in the Database ID field. It accesses this database using the user ID and password that have been set for Application 1’s RIM object.
6. Once the request is completed by the RDBMS server, RIM host X passes the data through an IOM channel directly to the managed node that runs Application 1.

Figure 12-2 How an application uses RIM (diagram: Application 1 on a managed node, TMR A, RIM host X, the RDBMS server, and the IOM channel between RIM host X and the managed node)

12.3 Installing a RIM

RIM is a component of the Tivoli Management Framework. RIM is installed with the Tivoli Management Framework, and then each application that uses it creates the appropriate RIM objects. The application installation usually consists of two steps:
• Creating the user ID and user tables in the RDBMS server.
• Creating the RIM component for the application to connect to the RDBMS server.


12.3.1 Creating application database tables

Tivoli applications supply scripts to create the necessary tables, views, and users on the RDBMS server. Although RIM means that the application does not need to know the specifics of each database implementation, the scripts used to build the tables do need to be vendor specific. Each time RIM expands to support another database, a product update is therefore needed to provide new scripts for that database. The tables and views that are created make up the databases for the applications that use RIM.

Note:
• It is recommended that you work with an experienced DBA to set up your databases.
• The scripts that define your application database and tables are created when you install your application, such as TEC or Inventory.
• Check that you are defining enough space for your environment. For example, check the number of machines that will be scanned by Inventory or the number of events that you expect with TEC.
• The Tivoli user ID provided during installation is used as the database and table creator. This means, in most implementations, that the database should be created using the same account information as supplied during initial installation. If your installation needs to use a different user ID, use the wsetrim command to change the user ID. This command is used to change any of the RIM object's settings and is explained in detail later in this chapter.

12.3.2 Creating RIM objects

RIM for Tivoli Management Framework Versions 3.7/4.1 includes support for these database products:
• DB2
• Sybase
• Oracle
• Informix
• MS SQL (NT only)

These databases can be specified in the GUI or by using the wcrtrim command. The syntax is:

wcrtrim [-i] -v vendor { -o host_oid | -h host_name } -d database -u user -H db_home -s server_id [-I instance_home] [-a application_label] [-m max_connections] rim_name

Where:
-v vendor               Either DB2, Sybase, Informix, Oracle, or MS_SQL.
-h host_name            The host name of the managed node where the RIM object will reside.
-d database             The name of the database in the RDBMS server.
-u user                 The name of the user that can operate on the database.
-H db_home              The directory where the RDBMS vendor-specific configuration file resides.
-s server_id            The name of the RDBMS server.
[-I instance_home]      The directory where the DB2 instance was created.
[-a application_label]  The new application label.
[-m max_connections]    The maximum number of connections that the RIM object has.
rim_name                The name of the RIM object you are creating.
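The syntax above can be turned into a small helper that assembles the argument list before the command is run. This is only a sketch: the function name build_wcrtrim_args and its keyword names are our own, the -i option is left out, and the helper builds the list without executing anything. The values in the test are those used for invdh_1 in Example 12-1.

```python
# Vendors accepted by -v, as listed in the option descriptions.
VENDORS = {"DB2", "Sybase", "Informix", "Oracle", "MS_SQL"}


def build_wcrtrim_args(vendor, database, user, db_home, server_id, rim_name,
                       host_name=None, host_oid=None, instance_home=None,
                       application_label=None, max_connections=None):
    """Assemble a wcrtrim argument list following the documented syntax.

    Hypothetical helper: it only builds the list, it does not run wcrtrim.
    """
    if vendor not in VENDORS:
        raise ValueError("unsupported vendor: %s" % vendor)
    # The syntax offers -o host_oid or -h host_name; require exactly one.
    if (host_name is None) == (host_oid is None):
        raise ValueError("give exactly one of host_name or host_oid")
    args = ["wcrtrim", "-v", vendor]
    if host_name is not None:
        args += ["-h", host_name]
    else:
        args += ["-o", host_oid]
    args += ["-d", database, "-u", user, "-H", db_home, "-s", server_id]
    # Optional settings are appended only when a value was supplied.
    for flag, value in (("-I", instance_home),
                        ("-a", application_label),
                        ("-m", max_connections)):
        if value is not None:
            args += [flag, str(value)]
    args.append(rim_name)
    return args
```

Building the list first makes it easy to review the exact options before creating the object, since wcrtrim still prompts for the RDBMS password when it runs.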

The application_label and max_connections arguments are not enforced by RIM. They are provided for application use. Refer to the Tivoli Management Framework Reference Manual Version 4.1, SC32-0806 for specific settings for the databases supported by Tivoli when creating the RIM object. Now let us walk through the steps of creating the RIM object and modifying some of its settings.

12.3.3 RIM scenario: Inventory RIM objects

In this scenario, we have the environment shown in Figure 12-3.


Figure 12-3 Tivoli environment used in the scenarios

We have installed IBM Tivoli Configuration Manager Version 4.2 in this environment and will show you the steps we performed to configure RIM objects. The following are the servers and roles used in this scenario:
• TMR Server: win-tmr01a
• Gateway Collector: win-rptr01a
• RIM host and Oracle Client: win-arch01a
• RIM host and Oracle Server: win-inv01a

We will first create a RIM object using wcrtrim on win-inv01a, change some of its default settings using wsetrim, then test the RIM connection with wrimtest. Finally, we will create another RIM object for the same application (Inventory component of IBM Tivoli Configuration Manager Version 4.2) on win-arch01a. We will associate both RIM objects with the Inventory component of IBM Tivoli Configuration Manager.


We use an Oracle database on win-inv01a for the Inventory configuration repository. We also have the Oracle RDBMS client on win-arch01a, because a managed node that acts as a RIM host needs the client software to talk to the actual configuration repository on the managed node win-inv01a.

Note: This example was taken from an IBM Tivoli Configuration Manager Version 4.2 configuration, where you can define more than one RIM object for the Inventory component. We will create invdh_1 and invdh_2 as two different RIM objects for the Inventory component to optimize the performance of writing scan results into the RDBMS. At the time of the writing of this redbook, the Inventory component was the only application with this capability (the ability to work with more than one RIM object).

First, let us create the invdh_1 RIM object on win-inv01a. The RIM object will use an Oracle RDBMS on win-inv01a. We create the RIM object with the wcrtrim command, as shown in Example 12-1.

Example 12-1 Creating invdh_1 RIM object
C:\Program Files\Tivoli>wcrtrim -v Oracle -h win-inv01a -d wininv -u invtiv -H c:\oracle\ora90 -s wininv invdh_1
RDBMS password:

Next, we associate the RIM object with the Data Handler by using the wsetrim -a command. The Data Handler is the component of Inventory that uncompresses and decodes the scan data and sends it to the RIM. We also use the -m option of the wsetrim command to set the maximum number of connections from the RIM object to the database.


Note: Here are all the options you can use with wsetrim:
[-n new_name]           The new name of the RIM object.
[-a application_label]  The new application label.
[-m max_connections]    The maximum number of connections that the RIM object has.
[-d database]           The name of the database in the RDBMS server.
[-u user]               The name of the user that can operate on the database.
[-H rdbms_home]         The directory where the RDBMS vendor-specific configuration file resides.
[-s server_id]          The name of the RDBMS server.
[-I instance_home]      The directory where the DB2 instance was created.
rim_name                The current name of the RIM object you are changing. This parameter is always required.

To change the managed node or vendor for a RIM object, you must delete the object using the wdel command and re-create it using the wcrtrim command. To change the label, you can either delete and re-create the RIM object, or you can use the _set_label method if Tivoli Application Development Environment is installed.

The options used for wsetrim in this scenario (Example 12-2) are:
-a invdh   Specifies that the application label for the RIM object is invdh. For RIM objects that the Inventory Data Handler uses to connect to the RDBMS, you must set this value to invdh.
-m 2       Specifies the maximum number of connections that the RIM object has to the database. Set this option to a positive value from 1 to 16. The default value is 16. It is recommended that the total number of RDBMS connections set for all RIM objects used by the Inventory Data Handler match the number of output threads set for the Inventory Data Handler.
invdh_1    Specifies the name of the RIM object.

Example 12-2 wsetrim -a invdh -m 2 invdh_1 command
C:\Program Files\Tivoli>wsetrim -a invdh -m 2 invdh_1


Tips: The IBM Tivoli Configuration Manager Version 4.2 Inventory component has additional RIM settings: the application type and the maximum number of connections used to connect to the RDBMS. Here we set these using wsetrim. These settings can also be set using IDL calls. For example, we could use the following commands to get and set them for the RIM object invdh_1:
1. Run wlookup -ar RIM to get the object ID, such as:
   invdh_1 1370748664.2.36#RIM::RDBMS_Interface#
2. Run the following command to get the application type:
   # idlacall 1370748664.2.36 _get_application_type
   which would show "invdh" in this case.
3. To set the application type, you can use the _set_application_type parameter, for example:
   # idlacall 1370748664.2.36 _set_application_type “”
4. Similarly, you can get the maximum number of connections using the _get_max_conn parameter, for example:
   # idlacall 1370748664.2.36 _get_max_conn
   which would show 16 in this case.
5. To set the maximum number of connections, you can use the _set_max_conn parameter, for example:
   # idlacall 1370748664.2.36 _set_max_conn max_connections

We then check the connection of the RIM object to the RDBMS with wrimtest (see Example 12-3).

Example 12-3 Testing the connection
C:\Program Files\Tivoli>wrimtest -l invdh_1
Resource Type : RIM
Resource Label : invdh_1
Host Name : win-inv01a
User Name : invtiv
Vendor : Oracle
Database : wininv
Database Home : c:\oracle\ora90
Server ID : wininv
Instance Home :
Instance Name :
Opening Regular Session...Session Opened
RIM : Enter Option >x (Type x + Enter to exit)
Releasing session
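When wrimtest output is captured to a file, the settings block and the session result can be checked with a few lines of script. The following is a sketch only: parse_wrimtest is a hypothetical helper name, and it assumes one "label : value" pair per line, as in Example 12-3.

```python
def parse_wrimtest(output):
    """Return (settings, session_opened) from captured wrimtest -l output.

    Assumes one 'label : value' pair per line; the interactive
    'RIM : Enter Option' prompt is skipped because it is not a setting.
    """
    settings = {}
    session_opened = False
    for line in output.splitlines():
        line = line.strip()
        if "Session Opened" in line:
            session_opened = True
            continue
        if line.startswith("RIM :"):
            continue
        # Split on the first colon only, so Windows paths such as
        # c:\oracle\ora90 stay intact in the value.
        label, sep, value = line.partition(":")
        if sep:
            settings[label.strip()] = value.strip()
    return settings, session_opened
```

A False session flag together with the parsed Database Home and Server ID values gives a quick starting point for the troubleshooting steps in 12.4.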

Tip: With Tivoli Management Framework Version 4.1, there is another way to test RIM connectivity and troubleshoot RIM configuration problems: you can run the RIM database agent manually using the RIM_DB_Agent command located in $BINDIR/bin. It is in $PATH after you set the Tivoli environment variables. To run the RIM database agent manually, do the following:

1. Set the shared library path. Note the following considerations:
   - If your RIM host is on a Windows NT system, make sure that the PATH system environment variable includes the path to the database vendor DLL files.
   - If your RIM host is on a UNIX system, you need to set the shared library path only if you are using DB2 or Informix.
   - For DB2, the shared library path must include $INSTHOME/sqllib/lib.
   - For Informix, the shared library path must include $INFORMIXDIR/lib:$INFORMIXDIR/lib/cli:$INFORMIXDIR/lib/esql.
   The name of the environment variable to use as the shared library path for each operating system type is as follows:
   - For AIX: LIBPATH
   - For HP-UX: SHLIB_PATH
   - For Solaris: LD_LIBRARY_PATH
   - For Windows: PATH
2. Run the wgetrim command to obtain the options for the RIM object.
3. Run RIM_DB_Agent using the values of the RIM object options that you obtained in step 2. The syntax of RIM_DB_Agent is as follows:
   RIM_DB_Agent -d database -u user [-p password] -H rdbms_home -s server_id [-I instance_home]
   If the connection is successful, RIM_DB_Agent displays the message Connection Successful. If the connection fails, the appropriate errors are displayed.

Next, we create, modify, and test the second RIM object invdh_2 on win-arch01a and associate it with the Data Handler on win-inv01a, as in Example 12-4.


Example 12-4 Creating second RIM object
C:\oracle\ora90>wcrtrim -v Oracle -h win-arch01a -d wininv -u invtiv -H c:\oracle\ora90 -s wininv invdh_2
RDBMS password:
C:\Program Files\Tivoli>wsetrim -a invdh -m 2 invdh_2
C:\oracle\ora90>wrimtest -l invdh_2
Resource Type : RIM
Resource Label : invdh_2
Host Name : win-arch01a
User Name : invtiv
Vendor : Oracle
Database : wininv
Database Home : c:\oracle\ora90
Server ID : wininv
Instance Home :
Instance Name :
Opening Regular Session...Session Opened
RIM : Enter Option >x
Releasing session

Example 12-5 shows the RIM objects in the Tivoli object database. RIM objects invdh_1 and invdh_2 are used by the Inventory Data Handler to write scan data to the RDBMS on win-inv01a. The example also shows another RIM object, inv_query, which is used to perform queries against the RDBMS. The winvfilter, winvpackage, winvrmnode, winvsig, and winvupdatesid commands and the Inventory administrative interface use this RIM object.

Example 12-5 RIM objects in the Tivoli object database
C:\sources>wlookup -ar RIM
ccm 1370748664.2.39#RIM::RDBMS_Interface#
inv_query 1370748664.2.38#RIM::RDBMS_Interface#
invdh_1 1370748664.2.36#RIM::RDBMS_Interface#
invdh_2 1370748664.3.32#RIM::RDBMS_Interface#
mdist2 1370748664.2.41#RIM::RDBMS_Interface#
planner 1370748664.2.40#RIM::RDBMS_Interface#
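The wlookup listing is easy to post-process when you need the object IDs for later idlacall or troubleshooting steps. The sketch below assumes one "name OID#class#" pair per line, as in Example 12-5, and reads the object ID as region.dispatcher.object, which is the usual Tivoli convention; the helper names parse_wlookup and dispatcher_of are our own.

```python
def parse_wlookup(output):
    """Map each RIM object name to its object ID (text before the first '#')."""
    objects = {}
    for line in output.splitlines():
        parts = line.split()
        # Skip prompts and anything that is not a 'name OID#class#' pair.
        if len(parts) != 2 or "#" not in parts[1]:
            continue
        objects[parts[0]] = parts[1].split("#", 1)[0]
    return objects


def dispatcher_of(oid):
    """Return the dispatcher number, assuming region.dispatcher.object IDs."""
    region, dispatcher, obj = oid.split(".")
    return int(dispatcher)
```

In the listing above, invdh_2 carries dispatcher number 3 while the other RIM objects carry 2, which is consistent with invdh_2 having been created on a different managed node (win-arch01a).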

At this point, we have completed our RIM configuration. The next step in this scenario is to set the maximum number of RDBMS connections for the Data Handler to match the total number of RDBMS connections set for all RIM objects used by the Inventory Data Handler. This is recommended for performance purposes and can be set by running the following command:

wcollect -o 4 @InvDataHandler:inv_data_handler
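The value 4 follows from the two RIM objects in this scenario, each set to -m 2. A minimal sketch of that arithmetic, with the 1 to 16 range check from the wsetrim description, is shown here; check_rim_connections is a hypothetical helper and the dictionary is filled in by hand, not read from Tivoli.

```python
def check_rim_connections(max_connections_by_rim, output_threads):
    """Return (total, matches) for the Data Handler's RIM objects.

    max_connections_by_rim maps a RIM object name to its -m value;
    output_threads is the value given to wcollect -o.
    """
    for rim_name, count in max_connections_by_rim.items():
        # wsetrim -m accepts a positive value from 1 to 16.
        if not 1 <= count <= 16:
            raise ValueError("%s: -m must be between 1 and 16, got %d"
                             % (rim_name, count))
    total = sum(max_connections_by_rim.values())
    return total, total == output_threads
```

For this scenario, check_rim_connections({"invdh_1": 2, "invdh_2": 2}, 4) reports a total of 4 that matches the wcollect setting.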


We will not go into more details on Data Handler configuration, since it is not our primary focus in this scenario.

12.4 Troubleshooting example: Failure to connect with an RDBMS

Now we will cover a RIM troubleshooting example involving wrimtrace. A distribution failure was indicated to the administrator with the dialog shown in Figure 12-4.

Figure 12-4 RIM connection failure message in the desktop

This problem was created by changing the settings of the Inventory RIM object. The wgetrim output (Example 12-6) shows that the Database Home directory has been changed to /tmp. The interfaces file for Sybase will not be found there.

Example 12-6 Example of wgetrim
#wgetrim inventory
RIM host: k124a
RDBMS User: tivoli
RDBMS Vendor: Sybase
Database ID: inventory
Database Home: /tmp
Server ID: rh0255e
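The check made by eye here can be scripted when it is run on the RIM host. The sketch below parses "label: value" lines as in Example 12-6 and tests whether a Sybase Database Home actually contains an interfaces file. The helper names are our own, and the file name interfaces is an assumption based on the usual UNIX Sybase layout and the wording above; adjust it for your installation.

```python
import os


def parse_wgetrim(output):
    """Return the wgetrim settings as a dict, one 'label: value' per line."""
    settings = {}
    for line in output.splitlines():
        label, sep, value = line.partition(":")
        if sep:
            settings[label.strip()] = value.strip()
    return settings


def sybase_interfaces_missing(settings, exists=os.path.isfile):
    """True if the vendor is Sybase and Database Home has no interfaces file.

    'exists' can be replaced for testing; by default the real file system
    of the machine running the check (the RIM host) is consulted.
    """
    if settings.get("RDBMS Vendor") != "Sybase":
        return False
    candidate = os.path.join(settings["Database Home"], "interfaces")
    return not exists(candidate)
```

With the settings from Example 12-6, the check looks for an interfaces file under /tmp, which is the misconfiguration this example was built around.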

To check whether or not a RIM object can connect to the database, use the wrimtest command. If you get a Session Opened message, then RIM connected to your database. You can then execute a retrieve or any SQL command against the database to view data. Example 12-7 is an example of using wrimtest.

Example 12-7 wrimtest
[root@itso2]/> wrimtest -l tec
Resource Type : RIM
Resource Label : tec
Host Name : itso2
User Name : tec
Vendor : Sybase
Database : tec
Database Home : /data/sybase
Server ID : ITSO2
Instance Home :
Opening Regular Session...Session Opened
RIM : Enter Option >g
Table Name > tec_t_evt_rep
Enter [/n] [/s /l /f /d []Editor? [y/n] [Default n] >
1 > msg
2 > class
3 > origin
4 >
Where Clause >
Retrieve) Num of Rows [0] >
Row 0
msg : (0) Distributed Monitoring TACF_Monitors/TACF Files on host hptmp9-ep Wed Dec 2 10:12:00 CST 1998 CST Status: >>> critical

Example 12-7 showed using wrimtest to list data from the TEC Event Repository table (tec_t_evt_rep), showing columns msg, class, and origin. Refer to the Tivoli Framework 3.7.1 Reference Manual, SC31-8434 or the wrimtest man page for details on using the wrimtest command.

Example 12-8 shows odstat output from TEC, which did not start up because the RIM object is not configured correctly. Since this is a very large output, many lines have been deleted to show just sample messages. The thread in error is 13166, near the bottom.

Example 12-8 TEC not starting up
n_active = 14 n_free = 168
tid type ptid State StdO StdE Start Err Method
1 SYS
7 O+bhdoq run 0 0 Mon14:16 1998892590.1.158#TMF_Scheduler::scheduler# start
11 O+bhdoq run 0 0 Mon14:16 1998892590.1.616#SentryEngine::engine# run_engine
86 SYS
105 O+hdoqs run 0 0 Mon14:17 1998892590.1.530#TMF_UI::Extd_Desktop# uiserver
---- history ----
13141 O+bhdoq done 6 0 11:18:27 1998892590.1.885#Tec::Server# start_server
13142 O+hdq 1-13141 done 6 0 11:18:27 1998892590.1.885#Tec::Server# _set_state
13147 O+hdq 1-13142 done 6 0 11:18:27 1998892590.1.881#Tec::InstanceManager# update_state
13148 O+hdq 1-13141 done 290 0 11:18:28 1998892590.1.885#Tec::Server# get_backrefs
13151 O+hdoq 1-13141 done 6 0 11:18:28 1998892590.1.179#TMF_Administrator::Configuration_GUI# refresh_member
13152 O+hdq 1-105 done 5100 0 11:18:28 1998892590.1.865#TMF_UI::Presentation# get_icon_info
13153 O+hdq 1-4509 done 5100 0 11:18:28 1998892590.1.865#TMF_UI::Presentation# get_icon_info
*13154 O 1-13141 done 0 0 11:18:28 NO_METHOD 1998892590.1.881#Tec::InstanceManager# refresh_member
13155 O+ 1-13141 done 47 0 11:18:28 0.0.0 get_host_location
13156 O+ 1-13141 done 47 0 11:18:28 0.0.0 get_host_location
13157 O+hdoq 1-13141 done 43 0 11:18:28 1998892590.1.348#TMF_ManagedNode::Managed_Node# install_directory
13158 O+ 1-13157 done 15 0 11:18:28 0.0.0 get_oserv
13159 O+ 1-13157 done 26 0 11:18:28 1998892590.1.2 query install_dir
13160 O+hdoq 1-13141 done 24 0 11:18:29 1998892590.1.348#TMF_ManagedNode::Managed_Node# interpreter
13161 O+ 1-13160 done 15 0 11:18:29 0.0.0 get_oserv
13162 O+ 1-13160 done 7 0 11:18:29 1998892590.1.2 query interp
*13163 O+ho 1-13141 done 352 0 11:18:29 e=12 1998892590.1.885#Tec::Server# is_dataserver_running
13164 O+ 1-13163 done 15 0 11:18:29 0.0.0 get_name_registry
13165 O+hdoq 1-13163 done 105 0 11:18:29 1998892590.1.26 lookup
*13166 O+hdoq 1-13163 done 352 0 11:18:29 e=12 998892590.1.905#RIM::RDBMS_Interface# RIM_iom_session
13167 O+hdq 1-13141 done 6 0 11:18:29 1998892590.1.885#Tec::Server# _set_state
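In a long odstat listing, the rows of interest can be filtered out mechanically. The following sketch assumes one thread per line and that a leading asterisk marks a thread that ended in error, as the e=12 and NO_METHOD rows in Example 12-8 suggest. The function name failed_threads and the field handling are our own simplifications, not part of odstat.

```python
import re

# A row starts with an optional '*', the thread ID, then the other columns.
ROW = re.compile(r"^(\*?)(\d+)\s+(.*)$")
# The ptid column has the form <number>-<parent thread ID>, for example 1-13163.
PTID = re.compile(r"\d+-\d+")


def failed_threads(odstat_output):
    """Return one dict per odstat row that is flagged with a leading '*'."""
    failures = []
    for line in odstat_output.splitlines():
        match = ROW.match(line.strip())
        if not match or not match.group(1):
            continue
        fields = match.group(3).split()
        parent = None
        error = None
        for field in fields:
            if parent is None and PTID.fullmatch(field):
                parent = int(field.split("-", 1)[1])
            elif field.startswith("e="):
                error = field
        failures.append({
            "tid": int(match.group(2)),
            "parent": parent,
            "error": error,
            "method": fields[-1],  # the method name is the last column
        })
    return failures
```

Following the parent IDs upward from thread 13166 leads to 13163 (is_dataserver_running) and then to 13141 (start_server), which is the chain that explains why TEC did not start.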

Example 12-9 shows the wtrace output for the same error. The complete output is too large, so we show only the thread that actually produced the error, 13166. The RIM_iom_session method failed because RIM could not make a connection to the RDBMS.

Example 12-9 Example of a RIM error in wtrace output
loc-ec 13166 M-hdoq 1-13163 54 e=12
Time run: [Tue 01-Dec 11:18:29]
Object ID: 1998892590.1.905#RIM::RDBMS_Interface#
Method: RIM_iom_session
Principal: [email protected] (-2/-2)
Helper pid: 50532
Path: /data/usr/local/Tivoli/bin/aix4-r1/TAS/RIM/RIM_Sybase_prog
Input Data: (encoded):
{
40
"0x32 0x34 0x35 0x34 0x39 0x38 0x36 0x38 0x33 0x30 0x49 0x33 0x35 0x31 0x30 0x30 0x49 0x32 0x30 0x30 0x30 0x34 0x62 0x34 0x38 0x32 0x30 0x30 0x30 0x34 0x62 0x34 0x38 0x20 0x69 0x74 0x73 0x6f 0x32 0x00 "
}
Results: (encoded):
"Exception:UserException:SysAdminException::ExException:RIM::ExRIMError:RIM::ExRIMConnectFail"
{
"Exception:UserException:SysAdminException::ExException:RIM::ExRIMError:RIM::ExRIMConnectFail"
"rim_errors"
1
"Could not connect to RDBMS server to access database %7$s."
912532709
{
0
}
"tec"
0
}

If we now look at the RIM trace log from the same exception, we will see a more detailed description of the problem (Example 12-10).

Example 12-10 Example output of a RIM tracing with wrimtrace
00050532 [Tue Dec 1 11:44:50 1998] Connection ID: 0, Operation: val_connect, DB Call: val_connect Library Error: Could not open interface file.DB-Library Reports OS Error No such file or directory
00050534 [Tue Dec 1 11:45:33 1998] - Connection ID:: Trace Message Connecting to IOM Channel
00050534 [Tue Dec 1 11:45:33 1998] - Connection ID:: Trace Message Beginning IOM Loop
00050534 [Tue Dec 1 11:45:38 1998] - Connection ID:: Trace Message IOM Command: RELEASE row_param: rows: number1: 0 number2: 0 string1: string2:
00050534 [Tue Dec 1 11:45:38 1998] - Connection ID:: Trace Message REPLY IOM Command : RELEASE Result : Success rows:
00050534 [Tue Dec 1 11:45:38 1998] - Connection ID:: Trace Message Ending IOM Loop

Here we can see that the connect operation failed because it could not find a file or directory. This was because the Database Home field in the tec RIM object definition was incorrectly typed. After correcting the RIM object using wsetrim, the database connected successfully, as shown in the second half of Example 12-10.
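On a busy RIM host the trace log grows quickly, so it helps to pull out only the records that carry a database library error. This is a sketch under the assumption that each record sits on one line and starts with a process ID and a bracketed timestamp, as in Example 12-10; the name library_errors is hypothetical.

```python
import re

# Process ID, bracketed timestamp, then the rest of the record.
RECORD = re.compile(r"^(\d+) \[([^\]]+)\]\s*(.*)$")


def library_errors(log_text):
    """Return (pid, timestamp, message) for each 'Library Error:' record."""
    errors = []
    for line in log_text.splitlines():
        match = RECORD.match(line.strip())
        if not match:
            continue
        pid, timestamp, rest = match.groups()
        if "Library Error:" in rest:
            message = rest.split("Library Error:", 1)[1].strip()
            errors.append((int(pid), timestamp, message))
    return errors
```

The process ID of the failing record (50532) is the same as the Helper pid shown in the wtrace output of Example 12-9, which is a convenient way to tie the two logs together.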

12.4.1 RIM specifics

When an exception is thrown, any RDBMS-specific error messages are included in the exception. Error messages are also logged to a file. Exceptions contain:
• The RIM error message
• The database function that caused the error
• The return code from the database function
• The database error message

Before the exception occurs, a message is written to the RIM log. The default error log file is /tmp/rim_db_log, but the location can be changed by creating a RIM_DB_LOG variable in the oserv environment by using the following command:

odadmin environ set RIM_DB_LOG “pathname”

You can locate and change the default location of the RIM log file by completing the following steps:
1. Run the following command:
   odadmin environ get > env.out
2. Edit the env.out file and add the following:
   RIM_DB_LOG=/tivoli/rim/rim_db_log
3. Run the following command:
   odadmin environ set < env.out

[...]

Settings -> Control panel -> Services to access the box. With ntprocinfo, TRIP will show up in the process listing, if it is running. If you use the Windows GUI, you should see a Tivoli endpoint entry with a Started status. If the entry is missing, then the lcfd service setup did not complete. If the entry exists but does not reflect a Started status, then setup is complete but the service is not running.

Use ps -ef | grep lcfd on UNIX machines to determine if the lcfd process is running. If lcfd is running, then you know you have a login problem. If lcfd is not running, look in the LCF_DATDIR directory for an lcfd.log file. Existence of the log file indicates a previous startup attempt. (There may also be a core file in that case.) Review the log file to see what it has to tell you. Check the last.cfg file (in the same directory) to verify appropriate startup options (you may want to edit log_threshold to 3) and then try to restart the lcfd.

If the directory contains only the environment setup scripts and a startup script, then no previous startup attempt was made. Change directories to LCF_BINDIR, and attempt a manual start of the lcfd, using appropriate options. If it starts, monitor for completion of the initial login. If startup fails, continue troubleshooting the installation. It may be easiest to delete the installation and re-install.

Login problems

Once TMA installation/startup has been confirmed, if the new endpoint still fails to show up in wep ls output after a reasonable period of time, you know you have a login problem. A login typically fails to complete either because no gateway is reachable or because of a problem with the endpoint manager. Your troubleshooting approach should therefore be to obtain as much log data as possible and eliminate as many variables as you can. The recommended approach is to log in directly to a specified gateway that has previously been checked for proper operation and configured to adequately log endpoint communications. To accomplish this, do the following:
1. Run wgateway on the appropriate TMR Server to verify that a gateway exists, and to verify its name.
2. Run wgateway describe. This will verify proper gateway operation and the gateway port value.
3. Run wgateway set_debug_level 5 on the chosen gateway. This is the log level required to capture endpoint login traffic.
4. Restart the gateway.
5. Obtain the IP address of the gateway host.
6. Stop and restart the lcfd using the -g ip_address+port option to force login to the selected gateway, and the -d 3 option to record all login traffic.

Now monitor the login process. Unless there is a problem with the endpoint manager, you should see login complete fairly soon. If not, review the lcfd.log file on the endpoint and the gatelog file on the gateway for further information.

Use of the endpoint lcfd.log

For a failed login, you need to know:
• Did the lcfd process remain alive?
• Did the node_login method get run?

If both these things are true, but no line indicating receipt of a dispatcher number appears, then the problem most likely lies elsewhere. Otherwise, the problem is definitely on the endpoint itself.

Endpoint logging operates at various levels from 0 through 4, and is set with the -d option of the lcfd command. The default log level is 1, which is minimal logging. Example 13-1 is an extract from an endpoint log level 1 lcfd.log file that shows the methods and messages associated with a successful initial login attempt.


Example 13-1 lcfd.log file for machine nollie
Sep 10 15:52:04 1 lcfd node_login: listener addr '0.0.0.0+7503'
Sep 10 15:52:07 1 lcfd nollie is dispatcher 2 in region 1326392407
Sep 10 15:52:07 1 lcfd write login file 'lcf.dat' complete
Sep 10 15:52:07 1 lcfd Logging into new gateway...
Sep 10 15:52:07 1 lcfd nollie is dispatcher 2 in region 1326392407
Sep 10 15:52:07 1 lcfd write login file 'lcf.dat' complete
Sep 10 15:52:07 1 lcfd final pid: 7136
Sep 10 15:52:07 1 lcfd Login to gateway 146.84.113.64+7502 complete.
Sep 10 15:52:07 1 lcfd Ready. Waiting for requests (0.0.0.0+7503). Timeout 120.

In this example, the Ready line indicates that downcalls/methods are now allowed. The start of login is the node_login method. If the next line shows receipt of a dispatcher number, then the endpoint manager considered this a successful login and returned this information. The line showing login to a specific gateway as complete indicates completion of the final, normal login. Higher log levels will provide more detailed information.

The evidence of a failed initial login with the log level at 1 is the absence of the line showing a dispatcher and region number. To get more information, restart the lcfd using log level 3 and recheck the lcfd.log file. Example 13-2 is from a log level 3 endpoint lcfd.log file, and shows the methods and messages associated with an initial login attempt. Note the much more verbose messages. This extract reflects a non-default login attempt to a specific gateway rather than a broadcast attempt. What you want to know at this point is: did the lcfd attempt a login, and if so, to what IP address?

Example 13-2 Log level 3 endpoint lcfd.log-1
Apr 15 09:05:12 1 lcfd node_login: listener addr '0.0.0.0+9494'
Apr 15 09:05:12 2 lcfd No known gateways.
Apr 15 09:05:12 2 lcfd Trying other login listeners...
Apr 15 09:05:12 Q lcfd send_login_dgram: interval=300 attempts=6
Apr 15 09:05:12 Q lcfd net_usend of 410 bytes to 146.84.113.31+9495. Bcast=0
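The three questions asked so far (did node_login run, where was the login datagram sent, and was a dispatcher number received) can be answered by scanning the log. The sketch below assumes the line layout of Examples 13-1 and 13-2, with month, day, time, log level, and the literal lcfd before the message; summarize_login is a hypothetical helper name.

```python
import re

# month, day, time, log level, then the message after the literal 'lcfd'.
LINE = re.compile(r"^(\w{3}) +(\d+) (\d\d:\d\d:\d\d) (\S) lcfd (.*)$")
SEND = re.compile(r"net_usend of \d+ bytes to ([\d.]+\+\d+)")
DISPATCHER = re.compile(r"(\S+) is dispatcher (\d+) in region (\d+)")


def summarize_login(log_text):
    """Summarize an endpoint login attempt from lcfd.log text."""
    summary = {"node_login": False, "sent_to": [],
               "dispatcher": None, "region": None}
    for raw in log_text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        message = match.group(5)
        if message.startswith("node_login:"):
            summary["node_login"] = True
        sent = SEND.search(message)
        if sent and sent.group(1) not in summary["sent_to"]:
            summary["sent_to"].append(sent.group(1))
        found = DISPATCHER.search(message)
        if found:
            summary["dispatcher"] = int(found.group(2))
            summary["region"] = int(found.group(3))
    return summary
```

A summary with node_login set, at least one address in sent_to, and no dispatcher number corresponds to the case described above where the endpoint code is working and the problem lies elsewhere.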

Following the actual sending of the login data, the next two log entries should appear (Example 13-3).

Example 13-3 Log level 3 endpoint lcfd.log-2
Apr 15 09:05:12 Q lcfd send_login_dgram: waiting for reply. attempt 1 of 6
Apr 15 09:05:12 Q lcfd net_accept, handle=0x306188


If the preceding sequence appears, the endpoint has completed its attempt to contact a gateway. One of two things will happen next: either the endpoint will receive a reply, or it will not. If it does not, upon expiration of the udp_interval, you will see the lines in Example 13-4.

Example 13-4 Log level 3 endpoint lcfd.log-3
Apr 15 11:11:41 Q lcfd send_login_dgram: recv 1 timed out
Apr 15 11:11:41 Q lcfd net_usend of 410 bytes to 146.84.113.31+9495. Bcast=0

This will repeat for as many attempts as have been specified in udp_attempts, unless a response is received. What you know at this point is that the endpoint code itself is functioning. The problem lies elsewhere. If no response is ever heard from any gateway, nothing further can be learned from the lcfd.log. In that case, check the intercepting gateway's gatelog. When the endpoint does receive a response from an intercepting gateway, the entries in Example 13-5 are seen in the lcfd.log.

Example 13-5 Log level 3 endpoint lcfd.log-4
Apr 15 09:06:35 Q lcfd New connection from 146.84.113.31+4576
Apr 15 09:06:35 Q lcfd Entering net_recv, receive a message

This message will indicate either a successful initial login, or a failure. A typical failure might look like Example 13-6.

Example 13-6 Log level 3 endpoint lcfd.log-5
Apr 15 09:06:36 Q lcfd Leaving net_recv: bytes=2554, (type=15 session=0)
Apr 15 09:06:36 Q lcfd recv: len='2554' (code='15', session='0')
Apr 15 09:06:36 0 lcfd A failure was detected by the oserv daemon: resource still in use

The failure message was generated on the TMR Server and passed back down through the initial login gateway to the endpoint. A successful initial login will produce results similar to Example 13-7.

Example 13-7 Log level 3 endpoint lcfd.log-6
Apr 15 11:40:05 Q lcfd Leaving net_recv: bytes=778, (type=14 session=0)
Apr 15 11:40:05 Q lcfd recv: len='778' (code='14', session='0')
Apr 15 11:40:05 2 lcfd Writing GCS file: c:\Tivoli\lcf\dat\1\last.cfg
Apr 15 11:40:05 1 lcfd nollie is dispatcher 4272 in region 1979620092

Use of the intercepting gateway gatelog

This discussion assumes that a login problem has been previously diagnosed and that the steps listed under “Login problems” have been performed. In that case, the intercepting gateway will be running at log level 5. The reason for this assumption is that a gateway operating at the default log level will have no record of any endpoint login activity. An initial login request will look like Example 13-8.

Example 13-8 An initial login request
Win NT ep freedom
1999/04/15 09:03:49 +05: dgram in: 410 bytes
1999/04/15 09:03:49 +05: udp server: waiting for connection on 0.0.0.0+9495...
1999/04/15 09:03:49 +05: process_node_login: Endpoint (0) is speaking ECP protocol version 2
1999/04/15 09:03:49 +05: processing login request from 146.84.113.68+1050 (freedom,w32-ix86, BXB11N8T4X+ZV52GL9YV00000561,reg=0,od=0)
1999/04/15 09:03:49 +05: eplogin (0): forwarding initial login to epmgr

The receiving gateway saw a login attempt from dispatcher number 0 ("od=0") and passed this login attempt to the endpoint manager (epmgr). Note the IP address plus port number following the string "processing login request from". When searching the log file to find the endpoint manager response to this request, search for the IP address + port string. There could be multiple login attempts arriving from this endpoint, and the port number is needed to determine which response belongs to which login attempt. Example 13-9 shows the gatelog entry corresponding to the initial login attempt seen in the previous example. The login attempt failed due to a resource conflict on the TMR Server. As seen in the lcfd.log example, the failure message gets passed back to the endpoint.

Example 13-9 Gatelog entry
1999/04/15 09:08:31 +05: failure during login for 146.84.113.68+1050 (freedom,w32-ix86,BXB11N8T4X+ZV52GL9YV00000561,reg=0,od=0): A failure was detected by the oserv daemon: resource still in use
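The IP address + port search can be sketched as a small shell helper. This is only an illustration: the function name gatelog_for_login is invented here and is not a Tivoli command, and the sketch assumes that the address string is surrounded by spaces, as it is in Examples 13-8 and 13-9.

```sh
#!/bin/sh
# Hypothetical helper (not a Tivoli command): print every gatelog line that
# belongs to one login attempt, identified by its "IP+port" string.
#   $1 = path to a copy of the gatelog
#   $2 = IP+port string, for example 146.84.113.68+1050
gatelog_for_login() {
    # -F treats the search string literally, so "." and "+" are not
    # interpreted as regular-expression operators. The surrounding spaces
    # keep 146.84.113.68+1050 from also matching 146.84.113.68+10501.
    grep -F -- " $2 " "$1"
}
```

Running it once per port number seen in the "processing login request" lines separates the responses that belong to different login attempts from the same endpoint.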

A successful login attempt, on the other hand, looks like Example 13-10.

Example 13-10 Successful login attempt
1999/04/15 11:42:02 +05: login succeeded for 146.84.113.68+1068 (freedom,w32-ix86,BXB11N8T4X+ZV52GL9YV00000561,reg=1979620092,od=4272)

Note that the return message from the endpoint manager contains the region number and dispatcher number for the newly created endpoint. At this time, the endpoint has already been added to the endpoint list and the Name Registry.

Once this information is passed back to the endpoint (this does not appear in the gatelog, only the lcfd.log), the new endpoint attempts contact with its permanent gateway and, if successful, receives an initial method download. To continue investigation at this point, you must determine what primary gateway was assigned. Use wep to get the OID of the assigned primary gateway. This should work even if the endpoint has failed to complete its login. If the permanent gateway is configured to log endpoint login traffic (log level 5), the initial method download will appear as a rather verbose set of entries, which terminates with this line:

1999/04/15 11:42:16 +05: run_login_policy: Running login policy on endpoint freedom.

Your clue that everything necessary happened is the log entry generated by the run_login_policy method. If that entry exists in the gatelog, you can accept that the entire initial login sequence succeeded and the endpoint got configured. If that entry does not exist, the endpoint login failed after the endpoint received its dispatcher number and interface list back from the endpoint manager. There are two likely scenarios:
• The endpoint cannot reach its assigned primary gateway due to network/routing problems. In that case, the only evidence will be in the endpoint's lcfd.log file. Eventually, the endpoint might log in to one of its alternate gateways, but this might take an extended time. When it does occur, the login might be treated as an isolation login, and the endpoint will (most likely) get the same interface list and primary gateway as before. If the network problem has not been corrected, the whole sequence will start over and potentially repeat endlessly. The only way to solve this is to fix the network problem.
• A network condition exists such that an endpoint can connect to an intercepting gateway, but the gateway cannot connect back to the endpoint. In such cases, the endpoint will continue to perform initial logins to the gateway. Aside from the potential for duplicate endpoints this can create, the endpoint itself may never complete the login sequence if no secondary gateway or broadcast fallback option was provided in its startup configuration. This condition is not easy to identify, because gateway log files do not reflect the failed communication down to the endpoint. The clue to look for is continued receipt of initial logins from the endpoint after the login succeeded entry in the gatelog. To verify the condition, attempt to ping the endpoint from the intercepting gateway machine. If the ping fails to contact the endpoint, this is the problem. The fix is to either restart the endpoint (lcfd) using a different intercepting gateway, or to fix the access problem.

If any response at all is received back from the endpoint manager, then that part of the communication chain worked, and in most failed initial logins, the message received by the gateway is a good indication of the reason for the failure. If no response is received back, you will have to go to the epmgr.log to determine where the communication chain broke down.

13.2.5 Multiple (duplicate) endpoints

A common problem in large TMRs has been the creation of duplicate endpoints during initial endpoint login. Identification of the problem is easy; a wep ls output will show multiple entries for the same host name, with all but one having a dot (.) and a dispatcher number appended to the host name.

Note: This only happens if duplicates are allowed.

First, let us elaborate what we mean by duplicate endpoints. This term does not refer to multiple copies of endpoint code installed on one host, even if all are running and logged into a gateway. Rather, it refers to a situation in which one endpoint code installation exists on a host, but multiple entries for that host exist in the endpoint list on the server, as evidenced by the wep ls output. In this case, only one entry is the real one and there is no easy way to tell which is the functional endpoint.
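The naming pattern described above (host name, dot, dispatcher number) lends itself to a quick check. The following is a minimal sketch, not a Tivoli tool: the function name is invented, and it assumes that the endpoint labels have already been extracted from the wep ls output into a plain list, one label per line. The exact layout of the wep ls output is not assumed here.

```sh
#!/bin/sh
# Hypothetical helper: read endpoint labels, one per line, on standard input
# and print each base host name that occurs more than once.
# Assumption: a duplicate label is the host name followed by "." and a
# dispatcher number, for example "freedom.4273".
find_duplicate_labels() {
    # Strip a trailing ".<digits>" suffix, then report repeated base names.
    sed 's/\.[0-9][0-9]*$//' | sort | uniq -d
}
```

A label that legitimately ends in a dot and digits (for example, an IP address used as a label) would be misread by this sketch, so treat its output as a list of candidates to inspect rather than as a verdict.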

Why duplicate endpoints occur

The following are the two primary causes for duplicate endpoints on installations:
• Re-installation of a machine that uses the same host name and IP address as the first installation.
• Excessive delays in login processing on the TMR Server.

The reason for the first one is clear, but the second one needs more explanation. This issue results mainly from two characteristics of the Tivoli Management environment:
• The resource locking mechanism of the TNR
• The indeterminate sequencing of thread processing

The locking mechanism permits no queuing of lock requests, but read lock requests always take priority over write lock requests. Also, a write request must have an exclusive lock, whereas read locks can be granted on a non-exclusive basis. This makes a write lock difficult to obtain on a busy TMR, since many command-line interface (CLI) commands (and their GUI counterpart actions) obtain read locks on the TNR.

It is possible to observe the sensitivity of endpoint initial logins to TNR locking issues by reviewing the following example. First, look at an extract from a command output of odstat, showing the threads and methods executed to complete an initial login sequence (Example 13-11).

Example 13-11 odstat showing initial login sequence
648 O+hdoqs 1-6 done 444 0 15:12:06 1399474381.1.517#TMF_LCF::EpMgr# endpoint_login
649 O+ 1-648 done 34 0 15:12:07 1399474381.1.517#TMF_LCF::EpMgr# allow_install_policy
650 O+ 1-648 done 34 0 15:12:07 1399474381.1.517#TMF_LCF::EpMgr# select_gateway_policy
651 O+hdoq 1-648 done 39 0 15:12:08 1399474381.1.26 region_get_all
652 O+hdoqs 1-648 done 390 0 15:12:08 1399474381.1.530#TMF_Gateway::Gateway# new_endpoint
653 O+hdoq 1-648 done 155 0 15:12:08 1399474381.1.26 region_get_all
654 O+hdoqs 1-648 done 78 0 15:12:08 1399474381.1.530#TMF_Gateway::Gateway# get_net_aliases
655 O+hdoq 1-648 done 6 0 15:12:08 1399474381.1.26 add_value
656 O+ 1-655 done 15 0 15:12:08 0.0.0 get_name_registry
657 O+hdoq 1-655 done 97 0 15:12:08 1399474381.1.26 lookup
658 O+hdq 1-655 done 57 0 15:12:08 1399474381.1.4 lookup_id
659 O+hdq 1-655 done 296 0 15:12:08 1399474381.1.4##6@LCFData::ep_tnr_info_s describe
660 O+ 1-648 done 34 0 15:12:09 1399474381.1.517#TMF_LCF::EpMgr# after_install_policy

Thirteen separate threads are generated to complete one standard login sequence on a single-gateway TMR. In environments with multiple gateways, add up to five more threads to allow for additional get_net_aliases methods. This login sequence uses the default endpoint policies, which make no use of Tivoli CLIs. Use of the CLIs by the endpoint policies can even result in additional threads and locks. By tracing objcalls and services during this sequence, it is possible to identify the locks generated. The lock lines in Example 13-12 on page 489 were extracted from a command output of wtrace of the same example login.

Example 13-12 Lock lines
loc-is 651 getattr 0 lock_timeout
loc-is 651 lock 21
loc-os 651 lock 0
loc-is 653 getattr 0 lock_timeout
loc-is 653 lock 20
loc-os 653 lock 0
loc-is 655 getattr 0 lock_timeout
loc-is 655 lock 21
loc-os 655 lock 0

Note that three threads generate locks against the TNR for each initial login attempt. The first two threads are read locks, but the third thread requires a write lock to add a value to the TNR. This is typically where endpoint logins fail. For an example of this failure, look at the trace file extract in Example 13-13.

Example 13-13 Trace file extract
loc-ic 17861 M-hdoq 1-17848 211
   Time run: [Fri 09-Apr 17:01:04]
   Object ID: 1979620092.1.26
   Method: add_value
   Principal: tomu@ttdsmsu8 (60001/60001)
   Path: /solaris2/TMF/BASESVCS/TNR_prog1
   Trans Id: { 1979620092:1,1979620092:1,469:15930 } #4
   Input Data: (encoded): "Endpoint" { "1979620092.1638.2181+#TMF_Endpoint::Endpoint#" "wdntgcn1-211 009" { "LCFData::ep_tnr_info_s" 15 false { "1979620092.75.326#TMF_Gateway::Gateway#" } } }
loc-is 17861 getattr 0 lock_timeout
   Time run: [Fri 09-Apr 17:01:04]
   Object ID: 1979620092.1.26
   Method: add_value
loc-os 17861 getattr 4
   Time run: [Fri 09-Apr 17:01:04]
   Object ID: 1979620092.1.26
   Method: add_value
   Results: (binary) 00000000
loc-is 17861 lock 21
   Time run: [Fri 09-Apr 17:01:04]
   Object ID: 1979620092.1.26
   Method: add_value
   Input Data: (ascii): NameRegistry!Endpoint
loc-os 17861 lock 0 IN_USE
   Time run: [Fri 09-Apr 17:02:04]
   Object ID: 1979620092.1.26
   Method: add_value
loc-oc 17861 e=12 90
   Results: (encoded): "Exception:StExcep::SystemException:StExcep::OBJ_ADAPTER" { 31 1 }

Large numbers of initial logins occurring simultaneously increase the problem. This by itself can result in no more than a delay in initial login, because resource-in-use type errors are passed back to the originating endpoint, which tries again after expiration of the login_interval. But processing delays on busy TMRs can exceed 30 minutes. This has little to do with processor load on the server; rather, it is a direct result of the resource locking mechanism used by the oserv process. When the endpoint receives no response, it assumes that its login attempt was unheard, and retries the login at a default interval of five minutes. These attempts are processed as initial logins, creating a series of duplicate endpoints. A good way to prevent duplicate endpoints is to deploy endpoints in advance of products and to phase in the endpoint deployments gradually.

Note: Endpoint duplication can be a potential issue during an in-line migration of an existing environment. Deployment of a limited-size test environment cannot expose this problem.

In the case of busy TMR Servers, especially when large numbers of endpoints are trying to accomplish initial logins, you have to be careful when deploying TMA code through Software Distribution, by using a UNIX or Windows NT login script or by various machine imaging methods. If you choose any of these methods, try to limit the number of endpoints that need to perform the initial login using a phased deployment approach.

“Using allow_install_policy to prevent multiple endpoints” on page 500 gives you some recommendations on how to prevent duplicate endpoint problems using allow_install_policy.

Correcting duplicate endpoints

In a situation in which multiple endpoint entries occur due to repeated initial login attempts, only the last endpoint entry created is the valid entry. However, because the order of processing is indeterminate, it is difficult to find out which entry that is. It might be the first, the last, or any one of the dispatcher numbers in between. After you identify the problem, perform the following four tasks:
1. Stop further initial logins (duplicate or otherwise) until remedial steps can be taken.
2. Install patch 3.6-TMR-0020 or any superseding patch. (This patch is already included in Tivoli Management Framework Version 3.7 and above.)
3. Put an endpoint policy in place to prohibit duplicate logins in the future.
4. Clean up existing duplicate endpoints.

Step one requires replacing any existing allow_install_policy with a policy that always returns a non-zero exit code. You can do so with the following three-line script:

#!/bin/sh
# script shutoff.sh
exit 1

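Before installing this script, obtain a backup of the existing policy script with wgeteppol, redirecting the output to a file (backup_file here stands for a file name of your choice):

# wgeteppol allow_install_policy > backup_file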

At this point, no more duplicates are allowed to log in. If the system is not already patched to stop login storms, do so now. Then move to step three. The most reliable strategy to prevent duplicate endpoints is to use the allow_install_policy. If the allow_install_policy checks for a host name or IP address entry in a text file and disallows login if an entry exists for the endpoint, no duplicate endpoints can ever be created. (If the endpoint does not exist, the allow_install_policy can add it to the list before moving on.) But with this scheme, the first initial login attempt is effectively the only login attempt, and if it fails for any reason, the endpoint will never be allowed to complete initial login.
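The text-file check described above might look like the following. This is a minimal sketch under stated assumptions, not the policy shipped with the product: the function name and the registry file are invented for illustration, and the endpoint label is assumed to arrive as the first policy argument, as listed in 13.3.2.

```sh
#!/bin/sh
# Sketch of a text-file based check for use inside an allow_install_policy.
#   $1 = endpoint label (first policy argument)
#   $2 = path to a plain text registry file, one label per line (hypothetical)
# Returns 0 (allow) for a label not yet listed and records it in the file;
# returns 1 (reject) for a label that is already listed.
check_and_register() {
    label=$1
    registry=$2
    # -x matches whole lines only and -F disables pattern matching, so a
    # label "host1" does not match an existing "host10" entry.
    if [ -f "$registry" ] && grep -qxF -- "$label" "$registry"; then
        return 1
    fi
    # Record the label; reject the login if the file cannot be written.
    printf '%s\n' "$label" >> "$registry" || return 1
    return 0
}
```

A policy script built on this sketch would end with the function's return value as its exit status. Because the label is recorded on the first attempt, a failed first login leaves the entry behind, which is exactly the drawback noted above: the entry has to be removed from the file by hand before the endpoint can try again.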

Whichever approach you choose, prepare either a shell or Perl script and test it thoroughly using simulated login strings. Make sure it behaves appropriately for both existing and non-existent endpoint host names. Then install it using wputeppol.

Next, clean up the existing duplicate endpoints. The only cleanup procedure is to delete all entries for this host from the endpoint list (using the wdelep command), shut down the endpoint process lcfd, delete the lcf.dat file on the endpoint (in the .../Tivoli/lcf/dat/1 directory), and restart the Tivoli endpoint process lcfd. With your new policy script in place, the new endpoint login can succeed without creating a duplicate entry.

13.2.6 Endpoint isolation/migration

Once an endpoint completes the initial login sequence and becomes associated with a gateway, all its communications are handled by that gateway. However, it is possible that the gateway may fail to respond to the endpoint for a variety of reasons. If this happens, the endpoint becomes isolated. There are three commonly seen isolation scenarios that can cause problems:
• Gateway process or host machine failure followed by an upcall from the endpoint
• Endpoint migration, followed by gateway failure, followed by an upcall
• Improper login interval settings

In the first scenario, if an upcall-generating endpoint finds that its primary gateway is unreachable, it will attempt contact with one of its alternate gateways and perform an isolation login, as described earlier. However, for customers not utilizing upcall-generating products, there is a problem. A downcall will not cause a reconnection. If the TMR Server does a downcall (say a distribution), when it finds that the gateway is down, it reports that the downcall failed and takes no further action. If completion of the downcall task at this time is essential, this behavior can be worked around by issuing a wep migrate command to assign the endpoint to a functional alternate gateway. Then the downcall will complete; the endpoint will obtain new primary gateway information, and shift to the new primary via a migratory login process. One suggestion to automate this workaround is to monitor the gateway periodically via a scheduled task using the wgateway command. If the gateway goes down and cannot be recovered, then issue wep migrate commands for all of the endpoints belonging to that gateway. When the primary gateway comes back on line, you may want to re-migrate these endpoints back where they belong. (It will not occur automatically.)

The second scenario involves an incomplete change in gateway assignment via wep migrate. If the former primary gateway fails before the endpoint learns of the change in assignment, the migration may occur in an unanticipated manner, as described below. A downcall will complete the migration normally, because it will use the still-functional newly assigned gateway, and in the process the endpoint learns of the reassignment and logs in to its new gateway. On the other hand, an upcall will cause the endpoint to attempt communication with its last known gateway, and when that login fails, the endpoint will try to connect to an alternate gateway. The intercepting gateway then deals with the situation as an isolation login. This has the effect of nullifying the gateway selection specified in the wep migrate call, because the select_gateway_policy is run for isolation logins. The endpoint will still migrate, but perhaps not to the anticipated gateway.

One other isolation scenario sometimes occurs when endpoint and gateway login timing settings are incorrectly coordinated. Recall that the endpoint will re-attempt login after udp_interval seconds if no response is heard from the gateway. The default setting is 300 seconds, but it can be set to any desired value. The gateway, if brought up to current patch levels, enforces a login_interval restriction of (default) 270 seconds; that is, any login attempts from the same endpoint that arrive less than 270 seconds apart are simply ignored.

Important: The default udp_interval setting on the endpoint is greater than the default login_interval setting on the gateway; this is intentional.

If the login_interval setting on the gateway is set to a value greater than the udp_interval on the endpoint, the endpoint will never get logged in if the first attempt fails for any reason. This is because the timer gets reset to zero after each login attempt. The lesson to be learned is, do not configure the gateway login_interval unless you also make appropriate adjustments to endpoint udp_interval settings.
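The relationship between the two settings can be expressed as a simple check. The sketch below only models the rule stated in the text (a retry arriving less than login_interval seconds after the previous attempt is ignored, and the timer restarts at every attempt); the function name is invented and it does not query any Tivoli component.

```sh
#!/bin/sh
# Model of the timing rule described in the text.
#   $1 = udp_interval on the endpoint (seconds between login retries)
#   $2 = login_interval on the gateway (minimum spacing it will accept)
# Returns 0 if a retry would be accepted, 1 if it would be ignored.
# Because the gateway timer restarts at every attempt, the gap it sees is
# always udp_interval, so an ignored retry stays ignored on every later try.
retry_is_accepted() {
    udp_interval=$1
    login_interval=$2
    [ "$udp_interval" -ge "$login_interval" ]
}
```

With the defaults quoted above, retry_is_accepted 300 270 succeeds, whereas raising the gateway value to 400 while leaving the endpoint at 300 makes it fail, which corresponds to the endpoint that never gets logged in.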

Orphaned endpoints

This problem occurs when the endpoint manager loses track of an existing endpoint, due to a restore from backup or an inadvertent wdelep. The endpoint tries to log in, but gets rejected every time. The only way to fix the endpoint is to physically go to it and delete the lcf.dat file. It is a good practice to capture wep ls output in a file after endpoint deployments, and use this file for comparison purposes.

13.2.7 Best practices for endpoint management

There are a number of things that should be done at the outset to improve the reliability of TMA deployment and management:
• Develop and use the endpoint policy scripts for your environment. This could be called the most important step for improving the reliability of TMA deployment and management.
• Set the log levels on all gateways to the appropriate levels before starting extensive endpoint deployments. As a routine practice, gateways should be set to log level 5 before mass endpoint deployment occurs, new endpoints should be configured to start at log level 3, and these settings should be retained until the TMR has stabilized.
• Use directed login of new endpoints to specified gateways, and disable broadcast logins, at least during initial deployment.
• Use some mechanism to restrict the number of endpoints attempting initial login to groups of a hundred or less. This is not a hardcoded number; it simply reflects experience with endpoint deployment in a busy TMR.
• Choose the endpoint and endpoint manager timing parameters very carefully with consideration to your particular environment. When customizing the parameters, remember the following points:
   – Always set the login_interval on the endpoint manager lower than the intervals on the endpoints. You filter out valid logon attempts if you set the login_interval higher. This is the most common mistake made when configuring the timing parameters.
   – Do not set the udp_interval too short. You can have duplicate endpoints when your Tivoli management region does not respond quickly enough.
   – When in doubt about choosing settings for any endpoint timing parameters, choose longer time intervals for your endpoints. Shorter time intervals for endpoints often cause more problems than longer time intervals. Longer time intervals usually just cause frustration because things happen more slowly, but they do not create duplicate endpoints or flood the gateways with logon requests.
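One possible way to prepare the phased deployment recommended above is to split the list of target hosts into numbered batches before any endpoint code is distributed. The helper below is a hypothetical illustration only: its name, and the idea of keeping the targets in a plain list with one host per line, are assumptions and not part of the Tivoli tooling.

```sh
#!/bin/sh
# Hypothetical helper: read host names, one per line, on standard input and
# prefix each with a batch number so that no batch exceeds the given size.
#   $1 = maximum batch size (the text suggests a hundred or less)
assign_batches() {
    awk -v max="$1" '{ printf "%d %s\n", int((NR - 1) / max) + 1, $0 }'
}
```

Each batch can then be deployed and allowed to finish its initial logins before the next one is started.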

13.3 Endpoint policies and endpoint policy scripting

In this section, we will describe the endpoint policies. The information in this section is largely derived from the Tivoli Field Guide: The Endpoint Policy (see 1.3.1, “Tivoli Field Guides” on page 7 for more information on how to find this publication).

You can configure endpoint login behavior and communication patterns by developing scripts that execute at various times in the process. There are four such hooks. Of these, the login_policy runs on the endpoint gateway; the other three run at the endpoint manager. The four endpoint policies are (see Figure 13-3 on page 496):

allow_install_policy
   This policy gets executed for initial logins only and determines whether the endpoint will be allowed to log into this TMR.

select_gateway_policy
   This policy gets executed each time a login packet gets forwarded to the endpoint manager, so it applies to initial logins, isolation logins, and orphaned endpoint logins. For pre-3.6.2 versions, it unfortunately also applies to migratory logins in certain circumstances. It determines the endpoint's list of alternative gateways.

after_install_policy
   This policy is run only for initial logins and orphaned endpoint logins and runs immediately after creation of the endpoint object. Since the endpoint is not yet fully configured, it cannot be used to perform downcalls.

login_policy
   This policy is executed at each normal endpoint login and is used for actions that are appropriate each time an endpoint logs in.

We will first cover general rules for policy scripting and then describe these endpoint policies in the order in which they execute.

Figure 13-3 Tivoli endpoint policies (figure labels: EP Manager; 1 allow_install_policy; 2 select_gateway_policy; 3 after_install_policy; 4 login_policy; EP Gateway; EP)

13.3.1 General rules for policy scripting

First of all, you have to have endpoint policy scripts. An out-of-the-box TMR does come with an initial set of scripts. These scripts fulfill a mandatory requirement for all policy scripts: generation of a predictable exit status. But you might need to customize these default policy scripts for your specific environment. The following are things that you need to consider when customizing these scripts:
• Test your scripts thoroughly. A script that encounters input that it cannot handle and therefore exits abnormally will cause indeterminate (and probably wrong) behavior by the endpoint manager or gateway.
• Confine policy script activity to the minimum consistent with the policy's stated purpose. Although, in theory, you could take a wide variety of actions in policy scripts (virtually any system or Tivoli CLI command is available to the script writer), there is a finite limit to the number of policy scripts that can be running at any one time, so they should be as compact as possible.
• Simply fail a policy script if a resource conflict occurs, rather than attempt to wait (sleep) for the contention to resolve itself.
• Minimize use of Tivoli CLI commands to obtain TMR resource information in very busy TMR environments. For best performance and reliability, it is better to create and use plain text files containing the needed information, especially when doing mass deployments of new endpoints. The reason for this is that the endpoint manager makes extensive use of Tivoli databases while setting up new endpoints, and use of other "w" commands at the same time exacerbates the potential for resource conflicts.
• Hardcode non-volatile information directly into the policy script, for efficiency. In other words, if it is not going to change often and unpredictably, do not keep looking it up. An example is the latest endpoint version. This will change infrequently, and only when a patch/upgrade is installed. So if you want the login policy to inspect and/or auto-upgrade your endpoint code version, it is more efficient to hardcode the current version into the policy script.
• If possible, endpoint policy scripts should be written in Perl for efficiency and portability. However, any shell program available on the TMR Server will work for all but login policy, which is installed on the TMR Server, but gets run on each gateway. Depending on the TMR architecture, login policy scripting may need to be cross-platform portable, making Perl the most viable option. However, to benefit fully from Perl, keep these points in mind:
   – If using external command output in a Perl script, avoid constructing a series of pipes. For example, it is common in shell scripting to run a command, pipe the output to grep to capture the desired line(s) of output, then pipe that result to cut to grab the needed field of information. This has been known to create problems on Windows systems, and is inefficient. Take the output of the initial command and process it using Perl pattern-matching constructs.
   – Perl code is compiled before it is run, in contrast to shell scripts, which are interpreted and run one line at a time. This makes Perl significantly faster, if used properly. Do not use shell commands if there is a Perl function that will do the job. For example, do not use the UNIX rm command to delete a file. Use Perl's unlink function.

13.3.2 Information available to policy scripts

In order to design policy scripting, you must know what information you have available. The endpoint manager/gateway calls all policy scripts with ten arguments:
1. The label (hostname by default) of the endpoint machine.
2. The object reference of the endpoint machine (OBJECT_NIL for initial logins).
3. The architecture type of the endpoint machine.
4. The object reference of the gateway that the endpoint logged into.
5. The IP address of the endpoint logging in.
6. Region.
7. Dispatcher ("0" (zero) for initial logins).
8. Version (of the endpoint lcfd code).
9. The inventory ID of the endpoint.
10. The protocol of the endpoint logging in, TCPIP or IPX.

In addition, the endpoint manager sets the environment variable LCF_LOGIN_STATUS, based on login type:
• 1 for initial login
• 2 for isolation login
• 3 for migration login

And, for post-3.6.1-TMF-0035 environments:
• 4, 5, and 6 (not used currently)
• 7 for an orphaned endpoint login
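As an illustration of how a Bourne shell policy script might pick up this information, the sketch below gives the ten positional arguments readable names and translates LCF_LOGIN_STATUS into a login type. The function and variable names are choices made for this example only; they are not defined by Tivoli.

```sh
#!/bin/sh
# Sketch: name the ten positional policy arguments listed above.
# All variable names here are illustrative, not Tivoli-defined.
parse_policy_args() {
    ep_label=$1          # endpoint label (host name by default)
    ep_object=$2         # endpoint object reference (OBJECT_NIL on initial login)
    ep_arch=$3           # architecture type
    gw_object=$4         # object reference of the gateway logged into
    ep_ip=$5             # IP address of the endpoint
    ep_region=$6         # region
    ep_dispatcher=$7     # dispatcher (0 on initial login)
    ep_version=$8        # version of the endpoint lcfd code
    ep_inventory_id=$9   # inventory ID
    ep_protocol=${10}    # braces are required for arguments beyond $9
}

# Translate an LCF_LOGIN_STATUS value into the login type names used above.
login_type() {
    case "$1" in
        1) echo initial ;;
        2) echo isolation ;;
        3) echo migration ;;
        7) echo orphaned ;;
        *) echo unknown ;;
    esac
}
```

A policy script would call parse_policy_args "$@" near the top and login_type "$LCF_LOGIN_STATUS" where it needs to branch on the kind of login. Values 4, 5, and 6 fall into the unknown branch because the text lists them as currently unused.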

Under certain circumstances, you may need additional volatile information about the TMR. In such cases, the full range of Tivoli CLIs is available, subject to the same considerations that were covered in 13.3.1, “General rules for policy scripting” on page 496.

Tip: You cannot depend on the usual environment variables, for example, BINDIR and DBDIR, to be available when scripts are executed by the endpoint manager. If a script that seemed to work fine in testing fails (or gives unexpected results) when issued by the endpoint manager as part of a login, that is the first thing to look for. For best efficiency, hardcode needed environment variables into the Perl script. (The next thing to look at in such cases is permissions on any directories/files the script is trying to read or write.)

13.3.3 Viewing, modifying, and installing policy scripts

The system comes pre-configured with default Bourne shell (sh) endpoint policy scripts. These scripts provide a basic framework on which more robust scripts can be built. To obtain text file copies of the current endpoint policies, do the following on the TMR Server:

wgeteppol allow_install_policy > allow.sh
wgeteppol select_gateway_policy > select.sh
wgeteppol after_install_policy > after.sh
wgeteppol login_policy > login.sh

Tip: The first two characters of the policy name, for example, "al", "se", and so on, can be used with wgeteppol and wputeppol as a sort of shorthand, for example: wgeteppol al > allow.sh.

These scripts can then be modified with a text editor to create the policy you need. Once the new script is complete, install it by doing the following on the TMR Server for each policy:

wputeppol allow_install_policy < allow.sh

13.3.4 allow_install_policy

The allow_install_policy, called install policy for short, allows you to terminate the login immediately when the endpoint manager receives the login request from the intercepting gateway. For example, you can decide to refuse login requests based on the endpoint's IP address. To do this, simply exit the script with a non-zero value. You can also use this policy to perform any pre-login actions you might need.

In general, you need to deploy an allow_install_policy rather than take the default behavior of allowing any endpoint to log in. This is obviously important in environments having multiple, interconnected TMRs, to prevent an endpoint from associating itself with the wrong TMR. A properly designed allow_install_policy is useful in any reasonably large environment.

There is only one action this policy has to perform, and that is to generate an exit status of zero (0) to allow the login to proceed, or non-zero to cause rejection of the login. A good design will log each login attempt to a text file and record all meaningful data, for example, the intercepting gateway OID, endpoint name, interpreter type, IP address, and script exit status. This provides a ready reference for troubleshooting for very little increase in overhead.

Notes:
• The endpoint manager log does not log such information.
• Be careful not to use w-commands in allow_install_policy and, in general, in any policy script.

Example 13-14 on page 500 is an example of an allow_install_policy.

Example 13-14 Log writer function
# log writer function for allow_install_policy
# script exit status (0 to allow, 1 to disallow login) should be
# determined and put in "$status" variable before calling function
sub log_login {
    # map LCF_LOGIN_STATUS values to readable login types
    %log_type = ("1", "initial", "2", "isolation", "3", "migratory",
                 "7", "orphaned");
    @timestamp = localtime(time);
    # hours:minutes:seconds
    $time = "$timestamp[2]:$timestamp[1]:$timestamp[0]";
    # open logfile to append entry, return failure code if unsuccessful
    # NOTE: DBDIR must be set elsewhere in script
    open(LOGFILE, ">>$ENV{DBDIR}/ep_login.log") || return(1);
    print LOGFILE "$time:$epname:$ip_addr:$log_type{$ENV{LCF_LOGIN_STATUS}}:$arch:$log_gwy:$status\n";
    close(LOGFILE);
    return(0);
}

In case of problems, you might want to reject all initial logins until the problem has been solved. The following script does that:

#!/bin/sh
# shutoff.sh
exit 1

This can quickly be put in place of the normal allow_install_policy if an unanticipated problem occurs. This is important, because it is far simpler to prevent incorrect endpoint creation than to repair them once they exist. Remember, the endpoint will keep trying to log in regardless of how many times it gets rejected. By rejecting initial endpoint logins, the administrator buys time to fix the problem, be it an incorrect policy script behavior or a gateway that is unexpectedly down.

Using allow_install_policy to prevent multiple endpoints

The allow_install_policy can also serve to stop potential duplicate initial logins. There are currently two main causes for this:
• An overloaded TMR Server failing to respond before the endpoint's udp_interval expires
• A rebuilt system trying to log in using its old IP address

The latter situation happens when a customer has to re-install a host system from an image that does not contain the final endpoint configuration. This is commonly done in the real world.


One approach to work around both these problems uses CLI commands in the allow_install_policy; specifically, have the allow_install_policy run wep or wlookup, and if the entry exists, run wdelep to get rid of it.

Attention: In some rare cases, such mechanisms might have side effects. These CLIs require locks on the Name Registry, so if endpoint logins are being delayed due to resource contention, this just makes the problem worse by generating more locks. In addition, it is not fail-safe; if a preceding initial login attempt is hung due to lock contention, a subsequent attempt might try to log in before the endpoint is registered, and create a duplicate.

Finally, this approach goes against the recommendation not to use “w” commands in policy scripts.

In such cases, the allow_install_policy should exit with a non-zero status immediately after removing the pre-existing endpoint. Also, the use of an exit status of 6 requires attention. This has special meaning to the endpoint manager, which will interpret this code as a signal to abort login processing for this endpoint. What this means is that an exit 6 will not return a failure message to the endpoint. Consequently, the endpoint will re-attempt login at the end of its udp_interval (default five minutes) rather than its login_interval (default 30 minutes). This gives time for the wdelep to completely purge the existing endpoint entry without greatly delaying the login. Example 13-15 shows how to code this task.

Example 13-15 Use of exit code 6
    `wep $ARGV[0]`;              # checks if endpoint already exists
    if ($? == 0){
        # if the endpoint exists, we delete it and exit
        `wdelep $ARGV[0]`;       # deletes obsolete endpoint
        if ($? != 0){
            exit 1;              # Reject login if wdelep fails
        }
        `wep sync_gateways`;     # refresh the gateway memory DB
        exit 6;                  # abort the login (to allow cleanup to complete)
    }
    # Code to process a non-existent endpoint follows

A better workaround approach during mass endpoint deployments uses a plain text file that is examined and updated by the allow_install_policy. This avoids making the lock contention problem worse, but is also not fail-safe. If the first initial login attempt fails after passing the allow_install_policy, no subsequent ones are ever allowed. However, if the allow_install_policy is logging these attempts (as previously recommended), you can detect this condition and take remedial action.

Attention: If this mechanism is used, the policy script must lock the reference file in order to update it. Otherwise, multiple updates might be attempted almost simultaneously, with consequent loss of data.

The Perl subroutine in Example 13-16 checks an external text file to see if the endpoint's IP address appears, and allows the login if it does. It also updates the file so no more logins from that IP address will be allowed.

Example 13-16 Preventing multiple endpoint updates
    #!/etc/Tivoli/bin/perl
    # Set up
    $status = 1;    # script return code, preset to indicate failure
    $in_use = "/var/spool/tivoli/tmr_host_name.db/in_use";          # lock file
    $allow_list = "/var/spool/tivoli/tmr_host_name.db/allow_list";  # ep list
    &lock_file;
    exit($status) if &check_list;   # check allow list, exit on non-zero
                                    # return
    #
    # Put customer-specified allow checks here, reset $status to proper
    # exit status code
    #
    exit($status);
    # Subroutines----------------------------------------------------------
    sub lock_file{
        # Since multiple logins could be occurring,
        # create a file to "lock" things
        if ( -e $in_use ){
            exit($status);   # another login has the list in use, bail out
        }else{
            # the next two commands work like unix "touch" command
            # without the shell overhead
            open(IN_USE,">$in_use");
            close(IN_USE);
        }
    }
    sub check_list{
        open(ALLOWED,"$allow_list");
        @is_approved_to_login = <ALLOWED>;   # get local copy of list
        close(ALLOWED);
        foreach(@is_approved_to_login){
            if( /$ARGV[4]/ ){   # pattern match endpoint IP with those in list
                # If we get a match, reset status
                $status = 0;
            # otherwise add to revised list
            }else{
                push(@new_approved_list,$_);
            }
        }
        if ($status == 0){
            # if we found ep in list, overwrite existing file
            open(ALLOWED,">$allow_list");
            print ALLOWED @new_approved_list;
            close(ALLOWED);
        }
        unlink("$in_use");   # remove lock file
        return($status);
    }
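The lock in Example 13-16 first tests for the lock file and then creates it, which leaves a short window in which two logins can both pass the test. The following shell sketch shows one possible alternative that uses mkdir, which either creates the directory or fails in a single step. It is an illustration under stated assumptions, not a replacement supplied by Tivoli: the function name consume_allowed_ip and the ".lock" and ".new" suffixes are hypothetical, and the allow list is assumed to hold one IP address per line.

```sh
#!/bin/sh
# consume_allowed_ip LIST IP
# Return 0 and remove IP from LIST if IP is listed (one address per line).
# Return 1 if IP is not listed or if another login holds the lock.
# All names here are illustrative assumptions.
consume_allowed_ip() {
    list=$1
    ip=$2
    lock="$list.lock"

    # mkdir fails if the directory already exists, so only one caller
    # at a time gets past this line.
    mkdir "$lock" 2>/dev/null || return 1

    status=1
    # -F: fixed string, -x: whole line, so 10.1.1.1 does not match 10.1.1.10
    if grep -Fxq -- "$ip" "$list" 2>/dev/null; then
        # rewrite the list without the address that was just used
        grep -Fxv -- "$ip" "$list" > "$list.new" || :
        mv "$list.new" "$list"
        status=0
    fi

    rmdir "$lock"
    return $status
}
```

A policy script could call this with the allow list path and the endpoint IP address and use the return value as its exit status. Whole-line matching is a deliberate difference from the pattern match in Example 13-16, which also accepts addresses that merely contain the listed string.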

The allow_install_policy can also be used as a throttle to control the number of initial logins being processed. This is another way to avoid duplicate endpoint creation during mass deployments of endpoints. One way of doing this is to run a daemon program that monitors the number of initial logins in process, and updates a text file that is read by the allow_install_policy. The sample code in Example 13-17 is the daemon program.

Example 13-17 Use of a throttle to control the number of initial logins
    #!/etc/Tivoli/bin/perl
    # MRH 2000/01/14 FirstWeatherStormAlert.pl
    # Monitor endpoint new/migration logins, and
    # instrument allow_policy to deny new requests
    # during endpoint storms
    #
    # This is run on TMR Server, as a user that can both
    # issue odstat and write to the $allow_exit file
    #
    # Presets:
    # Number at which throttles occur
    $max_num=18;
    # File which will contain exit code
    $allow_exit="/tmp/ep/allow_exit";
    # How often odstat is executed
    $pause=5;
    # Kill file
    $kill_file="/tmp/ep/stop_odwatch";
    #
    # Begin main routine
    # Note: as written, the loop runs only while the kill file exists
    while ( -f $kill_file ) {
        sleep $pause;
        $count = 0;
        open(ODSTAT, "odstat -c |") || die "Failed to get odstat\n";
        foreach $line (<ODSTAT>) {
            if ( $line =~ /login_encrypted/ ) {
                $count++;
            }
        }
        close(ODSTAT);
        # Print Statistics
        print "ENDPOINT LOGINS: $count\n";
        # First, check to see if the $Check was set
        if ( $Check eq "Y" && $max_num > $count ) {
            open(ALLOW,">$allow_exit") || warn "Unable to open file\n";
            print ALLOW "0";
            $Check="N";
            print "ALLOW EXIT NOW 0\n";
            close(ALLOW);
        }
        if ($max_num < $count ) {
            open(ALLOW,">$allow_exit") || warn "Unable to open file\n";
            print ALLOW "6";
            $Check="Y";
            print "ALLOW EXIT NOW 6\n";
            close(ALLOW);
        }
    }
    exit 0;
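Example 13-17 shows only the daemon. The policy side, which reads the file that the daemon writes, might look like the following shell sketch. The function name read_throttle_status is hypothetical, and the choice to allow logins when the file is missing or contains anything other than 0 or 6 is an assumption made for this illustration; a site could equally decide to reject in that case.

```sh
#!/bin/sh
# read_throttle_status FILE
# Print the exit code stored in FILE by the throttle daemon (0 or 6).
# Anything else, including a missing file, is treated as 0 (allow);
# that default is an assumption of this sketch.
read_throttle_status() {
    code=0
    if [ -r "$1" ]; then
        # The daemon writes the digit without a trailing newline, so
        # read reports end-of-file even though it has set the variable.
        read -r code < "$1" || :
    fi
    case "$code" in
        0|6) echo "$code" ;;
        *)   echo 0 ;;
    esac
}

# In an allow_install_policy, this could be used as:
#   exit "$(read_throttle_status /tmp/ep/allow_exit)"
```

Reading one small file keeps the policy free of CLI commands, which is the point of moving the odstat call into a separate daemon.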

13.3.5 select_gateway_policy

Use of select_gateway policy is likewise essential for almost all deployments, due to the design of the default selection process. The default process uses the intercepting gateway as primary, followed by the first five reachable gateways in the TMR's gateway list.

A problem with the default gateway selection process occurs when a primary gateway becomes unavailable and endpoints begin to migrate. Without a select_gateway policy, all these endpoints will end up on the gateway that picked up their isolation login. This may not be the desired behavior, especially since all these endpoints might have the same secondary gateway (another characteristic of environments using the default select_gateway policy).


A successful select_gateway policy must deal with two entirely different scenarios:
• Initial endpoint login
• Isolation login

If an endpoint is isolated, it is probable that its preferred gateway is down, in which case all the other endpoints on that gateway will also be isolated. Putting all these endpoints on the same secondary gateway may not be desirable from a performance/loading perspective. It could result in a sequence of cascading gateway failures as one gateway after another receives an insupportable load of isolation logins. Keep in mind that different outcomes may be desired for isolation logins than for initial logins.

The select_gateway policy should generate (as standard output) a list of gateway OIDs, and must issue an exit status. Normally, an exit status of zero (0) should be issued. A non-zero exit status will terminate the login process with a failure message.

The recommended length of the gateway list is five gateways, although fewer is acceptable. (The endpoint manager will increase the list up to five, if at least five reachable gateways exist.) Lists significantly longer than five might run into packet length problems when the interface list is passed to the endpoint. The ordering of gateways in the list is significant in that it determines the sequence in which the endpoint will attempt to use the gateways.

Typical gateway selection schemes key off of the endpoint IP address, this being the one unique bit of information contained in the login packet, but this does not work for everyone. The Perl code in Example 13-18 assigns endpoints using the least-used-gateway-first criterion, with no other considerations.
Example 13-18 Code that uses least-used gateway criteria for endpoint assignments
    #!/etc/Tivoli/bin/perl
    #
    # Script requires a "gatelist" text file consisting of a gateway OID
    # followed by a "tab" character, the gateway name as output by
    # "wgateway", another tab, and the number of endpoints assigned to that
    # gateway on each line, e.g.:
    #
    # 1399474381.1.530<tab>sample-gateway<tab>236
    #
    # The method of generation of this file is left to the user. Script
    # will update the file to reflect endpoint additions resulting from use
    # of this policy script.
    #
    # subroutine to sort in numerical order
    sub by_number {
        return $GATEWAYS{$a} <=> $GATEWAYS{$b};
    }
    # subroutine to balance load on running gateways
    sub load_balance {
        open(GATELIST,"gatelist") || die "Cannot open gateway list.\n";
        while (<GATELIST>){
            chop $_;
            @tmp = split(/\s+/);
            $GATEWAYS{$tmp[0]} = $tmp[2];
            $GATENAMES{$tmp[0]} = $tmp[1];
        }
        close (GATELIST);
        # Sort gateway list in order of increasing endpoint count
        @sorted_keys = sort by_number keys(%GATEWAYS);
        open(GATELIST,">gatelist");
        $gatecount = 0;   # initialize count of good gateways
        # Process gateway list, adding to output until five good gateways
        # are found.
        foreach (@sorted_keys) {
            if ($gatecount < 5){   # check gateways until 5 good ones found
                if (! system("wgateway $GATENAMES{$_} describe > /dev/null 2>&1")){
                    print "$_\n";
                    $gatecount++;
                    $GATEWAYS{$_}++ if $gatecount == 1;
                }
            }
            # re-write gateway list file
            print GATELIST "$_\t$GATENAMES{$_}\t$GATEWAYS{$_}\n";
        }
        close(GATELIST);
    }
    #######################################################################
    # START OF MAIN PROCESSING ROUTINE
    #######################################################################
    if ($ENV{LCF_LOGIN_STATUS} == 2) {   # isolation login, use load_balance
        &load_balance();
        exit 0;
    }
    exit 0;   # use system defaults for initial logins
    # Add scripting for initial login policy as required

Another alternative is to determine preferred gateways based on the Windows domain name. This is not contained in the login packet, and not readily associated with anything else in the login packet. In this case, you can maintain a text cross-reference file that associates the endpoint name or IP address with the Windows domain name.
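The cross-reference lookup described above could be sketched in shell as follows. Everything about the file layout is an assumption made for illustration: the first file is taken to hold an endpoint label and a Windows domain name per line, and the second a domain name followed by the gateway OIDs preferred for that domain. The function name gateways_for_endpoint is also hypothetical.

```sh
#!/bin/sh
# gateways_for_endpoint LABEL XREF_FILE DOMAIN_GW_FILE
# XREF_FILE lines:      <endpoint label> <Windows domain>
# DOMAIN_GW_FILE lines: <Windows domain> <gateway OID> [<gateway OID> ...]
# Prints the gateway OIDs for the endpoint's domain, one per line.
# Returns 1 if the label has no entry in XREF_FILE.
# Both file layouts are assumptions of this sketch.
gateways_for_endpoint() {
    domain=$(awk -v key="$1" '$1 == key { print $2; exit }' "$2")
    [ -n "$domain" ] || return 1
    awk -v dom="$domain" '
        $1 == dom { for (i = 2; i <= NF; i++) print $i; exit }
    ' "$3"
}
```

Because the output is already one OID per line, a select_gateway policy could print it directly and exit with zero. When the function returns 1, the policy would need its own fallback, for example the list produced by Example 13-18.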


13.3.6 after_install policy

You can, for most cases, ignore the after_install policy during an initial deployment. All you can do with it is assign a newly created endpoint to a policy region, or subscribe it to a profile. No configuration of the endpoint itself can be done via this policy. Avoid running wep set_label in this policy script.

If you elect to use this policy to make Tivoli database changes via CLIs, make sure to use the allow_install_policy to reduce the number of initial logins being allowed. Otherwise, you run the risk of command timeouts due to resource contention issues.

13.3.7 login_policy

Login policy is unique in that it gets run every time an endpoint logs in; in other words, every time someone powers up their computer. So it is especially important that login policy scripts be as efficient as possible.

The only thing login policy has to do is generate an exit status. But in this case, the difference between zero and non-zero is more subtle. In either case, the login will succeed, but in the case of a non-zero exit code, the boot_method for the endpoint will not be run.

One possible use of login policy is to initiate auto-upgrade of endpoint code via the wadminep upgrade command. The arguments for doing so are:
• The processing overhead of wadminep is borne by the managed node on which the command is issued. So running it from one node (especially the TMR Server) is poor resource utilization.
• The files to be copied to the endpoint come from the same managed node where the command is run, by default.

But you have to be careful about the following issue: In a well-designed TMR, the most efficient route from source host to endpoint will be the endpoint's assigned gateway. Even in a fairly static environment, an individual endpoint upgrade takes 30-40 seconds. When things get busy, the interval stretches. Under certain combinations of gateway/network load, it is easily possible to exceed the max_concurrent_jobs limit of the gateway (default is 200 jobs, of which each login is a job) and essentially halt all further processing. So we strongly recommend that you log all endpoint information in a separate file and perform the upgrade as a separate task, not in the login_policy.


Tip: In any case, if the decision is made to use login policy for endpoint upgrades, we recommend at least hardcoding the latest available endpoint version into the login policy script. That way, the overhead associated with external file lookups or CLI commands is avoided. The following (Perl) code snippet illustrates this:
    $curr_version = 41014;
    if($ARGV[7] != $curr_version){
        system("wadminep $ARGV[0] upgrade");
    }

This is efficient because no external references or lookups are required. All needed information is provided (by the gateway process) as arguments to the script. On the other hand, as we have already recommended, a still better approach is to simply log the endpoint label into a text file, and run the upgrade as a completely separate task, not in the login_policy.

In a large TMR, it is inevitable that endpoints will migrate to alternate gateways. Once migrated, they do not automatically return to their primary gateway. Consequently, it might be possible to write a login policy that confirms that the endpoint is logging into its preferred gateway, and if not, migrates it if the preferred gateway is functional. The processing overhead is likely to be significant, especially since wgateway must be used to verify the operational status of the preferred gateway. We recommend doing this only if local experience shows it to be needed. Alternatively, run wep migrate_to_pref occasionally. Refer to Tivoli Management Framework Reference Manual Version 4.1, SC32-0806 for more information on the wep migrate_to_pref command.

Another use of login policy is to keep a record of login attempts. This log file can then be used for endpoint upgrades and deletions of inactive endpoints from the TMR.

Note: The same login_policy script is run on all of the endpoint gateways in the TMR. This policy does not support the use of binaries.
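The recommendation to log the endpoint label and upgrade it in a separate task could be sketched in shell as follows. The function name queue_for_upgrade and the queue file are hypothetical; the version value 41014 and the positions of the label and version arguments (first and eighth) are taken from the Perl snippet in this section.

```sh
#!/bin/sh
CURRENT_VERSION=41014   # latest endpoint version, hardcoded as in the tip

# queue_for_upgrade LABEL VERSION QUEUE_FILE
# Append LABEL to QUEUE_FILE when VERSION differs from CURRENT_VERSION.
# The function and the queue file are illustrative assumptions.
queue_for_upgrade() {
    if [ "$2" != "$CURRENT_VERSION" ]; then
        printf '%s\n' "$1" >> "$3"
    fi
}

# In a login_policy this could be called as (the path is an example only):
#   queue_for_upgrade "$1" "$8" /tmp/ep/upgrade_queue
#   exit 0
#
# A separate, scheduled job could then work through the queue:
#   while read -r label; do
#       wadminep "$label" upgrade
#   done < /tmp/ep/upgrade_queue
```

The login policy then does nothing more than a string comparison and, at most, one short append, so its cost stays roughly constant no matter how many endpoints need an upgrade.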

13.4 Endpoint manager and gateway internals

In this section, we take a closer look at the endpoint manager and gateway internals. The information in this section is largely derived from the Tivoli Field Guide The TME10 Endpoint Gateway, A technical discussion - a look at the internals (see 1.3.1, “Tivoli Field Guides” on page 7 for more information on how to find this publication).


13.4.1 Why it is so important

An unstable endpoint manager or endpoint gateway can have a catastrophic ripple effect on the reliability and effectiveness of the Tivoli Management environment. This is primarily because of the number of management units that are affected by their instability. An unreliable gateway with 1,000 endpoints logged into it affects the ability of the administrator to effectively manage those 1,000 machines. In an environment of 10,000 endpoints, that represents a 10% failure rate, which is unacceptable in any 24x7 operation. An unreliable endpoint manager has an even more crippling effect, because management of the aforementioned 10,000 endpoints becomes frustrating if the endpoint manager process (ep_mgr) dies unexpectedly during critical operations.

Analyzing this from a product perspective, we can better understand the problems customers encounter:
• Software distribution has become a time-sensitive operation, with Service Level Agreements (SLAs) requiring high first-time hit ratios, because of small windows of opportunity to deliver business-critical software.
• Inventory data requires a high level of accuracy, as companies rely on this data to manage their asset portfolio and to plan application rollouts, hardware upgrades, and deployments. If the data is not available, these activities are stymied.
• System availability of mission-critical servers is the foundation of any corporate organization that hopes to compete. The importance and relevance of the availability data depends on how quickly this information is delivered to the administrative IT team.

All these functions are dependent on a stable and functional endpoint manager and endpoint gateways. With this in mind, it is important to understand how to manage these two processes.

13.4.2 Gateway thread usage

Every gateway method is initially spawned via an oserv thread. The gateway method request is handled by a gateway RPC thread that is spawned by the aforementioned oserv thread. When the gateway RPC thread is spawned, gateway_method_in() is called. This function determines what type of method call is being made. There is a max_gateway_rpc_threads parameter, which allows flexibility when managing gateway processes.


Also, each gateway RPC thread uses an oserv thread (and also an operating system thread). A good rule of thumb when using this parameter is: max_gateway_rpc_threads

wlsinst -avh
    *-----------------------------------------------------------------------------*
    Product List
    *-----------------------------------------------------------------------------*
    Tivoli Enterprise Console Server 3.7
        chatham aix4-r1
            DB /var/spool/Tivoli/chatham.db
            BIN /usr/local/Tivoli/bin/aix4-r1
            GBIN /usr/local/Tivoli/bin/generic_unix
            CAT /usr/local/Tivoli/msg_cat
    Tivoli Enterprise Console User Interface Server 3.7
        chatham aix4-r1
            DB /var/spool/Tivoli/chatham.db
            BIN /usr/local/Tivoli/bin/aix4-r1
    Tivoli Enterprise Console Console 3.7
        eastham aix4-r1
            BIN /usr/local/Tivoli/bin/aix4-r1
            GBIN /usr/local/Tivoli/bin/generic_unix
    Tivoli Enterprise Console Sample Event Information 3.7
        eastham aix4-r1
            BIN /usr/local/Tivoli/bin/aix4-r1
            GBIN /usr/local/Tivoli/bin/generic_unix

The installation process creates temporary files to log all actions. Depending on the current installation step, the temporary log file’s name is created as follows:
• During an installation step the installation process copies files or performs tasks on a .
  Example: A Tivoli Enterprise Console Server 3.7 installation copies binaries to $BINDIR.
• During these actions, two temporary log files are created on the target machine in /tmp (UNIX) or %DBDIR%\tmp (NT):
  – _after.output, which includes output related to the performed actions and the scripts that are run.
  – _after.error, which includes errors related to a failed installation step.
  Example:
  – TEC_SERVER_3.7.0_BIN_after.output
  – TEC_SERVER_3.7.0_BIN_after.error
  It is recommended to watch the files during installation using the tail -f command on UNIX systems.
• If the installation step succeeds, both files are removed and an installation tag /.installed/ is created.
  Example: $BINDIR/.installed/TEC_SERVER_BIN
  Then all actions are repeated for the next installation step.
• If an installation step fails, the entire installation process stops and the temporary log files are kept for debugging. No installation tag is set.

In case of an installation error, there are some basic tasks for debugging a problem. To force Tivoli to recopy the binaries, remove the appropriate installation tags and install the product again. The binaries will be copied only to the directories where the installation tag of this product has been removed.

Normally, Tivoli does not put any file in the database directory ($DBDIR). Instead, the installation process runs a script to update the Tivoli database. This script runs Tivoli object calls and system commands to add new objects to the Tivoli database. It also registers the product in the installed products list. Do not remove the installation tag from the database directory in order to retry an installation. This causes an installation failure, because the installation process will try to add objects that already exist in the database. To avoid this situation, the product needs to be uninstalled before trying to reinstall it. Restoring a Tivoli database backup can be helpful as well.

After an installation error, the temporary log files of the failed installation step may include helpful error messages to track down the problem. In case of a failed Tivoli database update, the commands of the after script are included in the temporary installation log files. For actions referring to the Tivoli database, an installation tag is placed in $DBDIR/.installed, with the suffix _ALIDB appended. As most of the errors occur during this stage, the log files to look at will be *_ALIDB_after.output/error for the failing product. The last commands that were run before the error occurred can be found in these files, and may give an indication of where to look.

If the error refers to a special file package (*.PKG), check for the file system space or a corrupted installation image. Also, a look at the installation image’s contents could give some hints. Another step would be to look at the index file (*.IND) of the product to be installed that is parsed during the installation.
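Because the temporary log files are removed after a successful step and kept after a failed one, a quick way to find the failing step is to look for leftover *_after.error files that are not empty. The following shell sketch does this; the function name list_failed_steps is hypothetical, and the directory argument is whichever temporary directory applies on the target machine (/tmp on UNIX according to the description above).

```sh
#!/bin/sh
# list_failed_steps DIR
# Print the names of non-empty *_after.error files left in DIR.
# The function name is an illustrative assumption.
list_failed_steps() {
    for f in "$1"/*_after.error; do
        # -s is true only for existing files larger than zero bytes,
        # which also skips the unexpanded pattern when nothing matches
        if [ -s "$f" ]; then
            basename "$f"
        fi
    done
    return 0
}

# Example use on a UNIX target machine:
#   list_failed_steps /tmp
```

The matching *_after.output file for each name printed is then the next place to look for the last commands that ran before the failure.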

14.3 Tivoli Enterprise Console 3.7 tracing and logging

This section includes a collection of tracing and logging options that can be used during run time or, in case of failure of a certain product, to retrieve more details about the background processing.

14.3.1 Tivoli Enterprise Console Server 3.7 diagnostic logging

The following sections describe how to set up diagnostic logging for the Tivoli Enterprise Console Server 3.7 processes. The tracing procedure itself is the same as it was for Tivoli Enterprise Console Server 3.6 and is not documented in the manuals. For this reason, the following information may be useful for debugging issues. All changes for the new version of Tivoli Enterprise Console 3.7 are mentioned where necessary.

General information and recommendations

When the Tivoli Enterprise Console Server is running, five processes are present:
• tec_server
• tec_reception
• tec_rule
• tec_dispatch
• tec_task

Each of these processes can be individually traced. Four tracing levels are available:
• error: This is the default option that is enabled after startup of the Tivoli Enterprise Console Server. Only errors that may point to a condition different from the normal state of the Tivoli Enterprise Console processes are logged. Logging entries with this level are indicated as ERR in the log files.
• trace0: This option provides brief information on the actions of the Tivoli Enterprise Console processes. Logging entries with this level are indicated as TR0 in the log files.
• trace1: This option provides more detailed information on the actions of the Tivoli Enterprise Console processes. Logging entries with this level are indicated as TR1 in the log files.
• trace2: This is the most powerful tracing option used for debugging issues. It provides very detailed information on all aspects of what each of the Tivoli Enterprise Console processes is doing. Logging entries with this level are indicated as TR2 in the log files.

If no trace level higher than error has been specified, the diagnostic log files usually have a size of 0 bytes. Only when an error occurs will they contain information. This is an easy way to detect failures in one of the Tivoli Enterprise Console processes.

When the Tivoli Enterprise Console Server is running in a normal condition, which means no errors are observed, the default tracing option of error is sufficient. Only if errors show up in the log files should a higher trace level be used. To debug a problem, always use trace2. This produces a large amount of output to help track down the problem. After successful debugging with trace2, reset the tracing to error to avoid loss of performance caused by the tracing.

If errors are observed while tracing, note the time of occurrence (if possible) to locate the relevant section of the trace files. This could establish a pattern for an error, which could be connected to other system actions. The best use of tracing is for reproducible problems where you can set up the trace2 logging, then reproduce the error.

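Diagnostic logging file format

Each entry in a diagnostic log file contains a timestamp (with microseconds), the process name followed by its PID in brackets, the trace level, the source file and line number, and the message text.

Example:
    Aug 1 14:12:01.209544 tec_dispatch[18406] TR1 tec_ipc_connect.c:154: Connect to TEC Master succeeded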
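Since every entry carries its level (ERR, TR0, TR1, or TR2), a log file can be summarized by level with a short awk call. The shell sketch below assumes that the level is always the fifth whitespace-separated field, which is inferred only from the single example entry shown above; the function name count_level is hypothetical.

```sh
#!/bin/sh
# count_level FILE LEVEL
# Print how many entries in FILE carry LEVEL (ERR, TR0, TR1, or TR2).
# Assumes the level is the fifth whitespace-separated field, as in the
# example entry; this position is an inference, not a documented format.
count_level() {
    awk -v lvl="$2" '$5 == lvl { n++ } END { print n + 0 }' "$1"
}

# Example use:
#   count_level /tmp/tec_dispatch ERR
```

A non-zero ERR count for one of the processes is a reasonable trigger for raising the trace level on that process only.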

Diagnostic logging configuration file format

Diagnostic logging is set up using the configuration file $BINDIR/TME/TEC/.tec_diag_config, which includes trace options for all five processes and all levels of tracing mentioned in the previous section. Example 14-2 on page 540 shows the main structure of this file.

Example 14-2 tec_diag_config file
    "$Id: @(#)53 1.17 src/bim_server/tec_utils/.tec_diag_config, tec, tec_3.7.0 5/30/00 20:13:46 $"
    #
    # Sample Bim Error/Trace Diagnostic Messages configuration file
    #
    # format:
    # Highest_level
    # Truncate_on_restart [true|false]
    # Highest_level
    #
    #
    Highest_level                   trace2
    Truncate_on_restart             true

    # tec_master
    #############
    tec_master Highest_level        error
    tec_master Master               error   /tmp/tec_master
    ...
    # tec_reception
    ################
    tec_reception Highest_level     error
    tec_reception Tec_Reception     error   /tmp/tec_reception
    ...
    # tec_rule
    ###########
    tec_rule Highest_level          error
    tec_rule Tec_Rule               error   /tmp/tec_rule
    ...
    # tec_dispatch
    ###############
    tec_dispatch Highest_level      error
    tec_dispatch Tec_Dispatch       error   /tmp/tec_dispatch
    ...
    # tec_task
    #####################
    tec_task Highest_level          error
    tec_task Task                   error   /tmp/tec_task
    ...
    # database utilities
    #####################
    wtdbclear Clear_db              error   /dev/tty
    wtdbclear DB_Utils              error   /dev/tty

The following text describes the format of the diagnostic logging configuration file. The first valid entry in the file is shown in Example 14-3.

Example 14-3 Format of logging information file-1
    Highest_level           trace2

This entry sets a global highest level of tracing that will be available for all the following sections, but does not enable the tracing itself. The default entry of trace2 enables all possible trace levels to be used. This line does not need to be edited. The second entry is shown in Example 14-4.

Example 14-4 Format of logging information file-2
    Truncate_on_restart     true

The usage is:
    Truncate_on_restart [true|false]

The default setting of Truncate_on_restart true causes the Tivoli Enterprise Console log files to be rewritten on each Tivoli Enterprise Console Server (re-)start. All previous information will be lost. To avoid truncating the log files when the Tivoli Enterprise Console server is (re-)started, change the entry to Truncate_on_restart false. This allows information to be appended to the diagnostic log files without truncating them at (re-)start. However, set this option to false only while debugging a problem, because if the diagnostic files are never truncated, you may run out of disk space.

The next five sub-sections of the .tec_diag_config file represent the Tivoli Enterprise Console processes. For example, for the tec_master (tec_server) process, the full sub-section is similar to Example 14-5 on page 542.


Example 14-5 tec-master process section
    # tec_master
    #############
    tec_master Highest_level    error
    tec_master Master           error   /tmp/tec_master
    tec_master Master_Msg       error   /tmp/tec_master
    tec_master Master_Synchro   error   /tmp/tec_master
    tec_master Master_Exec      error   /tmp/tec_master
    tec_master Tec_Methods      error   /tmp/tec_master

    # low level modules
    tec_master Exit_Msg         error   /tmp/tec_master
    tec_master Tec_Baroc        error   /tmp/tec_master
    tec_master Tec_Methods      error   /tmp/tec_master
    tec_master Timer            error   /tmp/tec_master

    # IPC modules
    tec_master Ipc              error   /tmp/tec_master
    tec_master Ipc_Accept       error   /tmp/tec_master
    tec_master Ipc_Dsend        error   /tmp/tec_master
    tec_master Ipc_Alive        error   /tmp/tec_master
    tec_master Ipc_Server       error   /tmp/tec_master

    # Pool modules
    tec_master Pool             error   /tmp/tec_master
    tec_master Pool_Master      error   /tmp/tec_master

    # Msg modules
    tec_master Msg              error   /tmp/tec_master
    tec_master Msg_CA           error   /tmp/tec_master
    tec_master Msg_CC           error   /tmp/tec_master
    tec_master Msg_DP           error   /tmp/tec_master
    tec_master Msg_EP           error   /tmp/tec_master
    tec_master Msg_GO           error   /tmp/tec_master
    tec_master Msg_HI           error   /tmp/tec_master
    tec_master Msg_NE           error   /tmp/tec_master
    tec_master Msg_OK           error   /tmp/tec_master
    tec_master Msg_RR           error   /tmp/tec_master
    tec_master Msg_TP           error   /tmp/tec_master

The first line of a sub-section is the Highest_level entry for that process (for example, tec_master Highest_level error). It sets the highest level of tracing that will be available for that sub-section. The default level is error. This entry supersedes the global Highest_level entry mentioned above. Only when the entry is not present or commented out does the global entry take effect.

All following entries in this sub-section consist of four fields (for example, tec_master Master error /tmp/tec_master). In order, the fields are:
• The Tivoli Enterprise Console process on which the tracing level is to be configured.
• A selected module of this Tivoli Enterprise Console process on which the tracing level is to be configured.
• The tracing level of the current module. The Highest_level entry limits the module’s logging level, if set to a lower level.
• The location of the log file.

The handling of the sub-sections regarding all other Tivoli Enterprise Console processes works the same way. The last sub-section stands for a collection of database utilities. The commands wtdbclear, wtdumper, wlsemsg, and wsetemsg are supported. This section has been expanded for the Tivoli Enterprise Console 3.7 version. The global setting of the Highest_level applies. The entry shows /dev/tty by default. For more information about this, see “Trace file locations” on page 545.

Setting up diagnostic logging

To enable the diagnostic logging, the $BINDIR/TME/TEC/.tec_diag_config file needs to be edited. The format of the file has been explained in the previous section. Before editing the file, it is recommended that you keep a copy of the original version. This makes it easy to revert the tracing back to the normal level.

To enable full logging with trace2 for all Tivoli Enterprise Console processes and modules, the diagnostic logging configuration file needs to be changed, as shown in Example 14-6 on page 544 (the levels of the tec_* entries are changed from error to trace2).


Example 14-6 Full logging with trace2

"$Id: @(#)53 1.17 src/bim_server/tec_utils/.tec_diag_config, tec, tec_3.7.0 5/30/00 20:13:46 $"
#
# Sample Bim Error/Trace Diagnostic Messages configuration file
#
# format:
# Highest_level
# Truncate_on_restart [true|false]
# Highest_level
#
#
Highest_level         trace2
Truncate_on_restart   true
# tec_master
#############
tec_master     Highest_level   trace2
tec_master     Master          trace2   /tmp/tec_master
...
# tec_reception
################
tec_reception  Highest_level   trace2
tec_reception  Tec_Reception   trace2   /tmp/tec_reception
...
# tec_rule
###########
tec_rule       Highest_level   trace2
tec_rule       Tec_Rule        trace2   /tmp/tec_rule
...
# tec_dispatch
###############
tec_dispatch   Highest_level   trace2
tec_dispatch   Tec_Dispatch    trace2   /tmp/tec_dispatch
...
# tec_task
#####################
tec_task       Highest_level   trace2
tec_task       Task            trace2   /tmp/tec_task
...
# database utilities
#####################
wtdbclear      Clear_db        error    /dev/tty
wtdbclear      DB_Utils        error    /dev/tty
...

We recommend enabling all entries for logging with trace level trace2 at first. This provides a full set of log files containing all available information. If this shows an error that only affects one of the Tivoli Enterprise Console processes, the diagnostic logging may be reduced to this process or selected modules thereof. All entries for the log file location can be left as default. For more information on this topic, see the next section.
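Raising every entry to trace2 by hand is repetitive, so the edit can be scripted. The following Python function is a minimal sketch under stated assumptions: raise_to_trace2 and its keep_processes parameter are hypothetical names, fields are assumed to be separated by white space, and the rewritten lines lose their original column alignment.

```python
def raise_to_trace2(config_text, keep_processes=()):
    """Return config_text with every tracing level replaced by trace2.

    Entries whose first field is listed in keep_processes are left
    untouched, for example the database utilities.
    """
    out = []
    for raw in config_text.splitlines():
        fields = raw.split()
        if (not fields or fields[0].startswith("#")
                or fields[0] in keep_processes):
            out.append(raw)                       # blank, comment, or kept
        elif len(fields) == 2 and fields[0] == "Highest_level":
            out.append("Highest_level trace2")    # global limit
        elif len(fields) == 3 and fields[1] == "Highest_level":
            out.append(f"{fields[0]} Highest_level trace2")
        elif len(fields) == 4:
            process, module, _level, log_file = fields
            out.append(f"{process} {module} trace2 {log_file}")
        else:
            out.append(raw)                       # anything unrecognized
    return "\n".join(out) + "\n"
```

Passing keep_processes=("wtdbclear",) leaves the wtdbclear entries at their default level, as the listing in Example 14-6 does. The function only transforms text, so the result can be reviewed before it is written back to the configuration file.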

Trace file locations

The location of the log files can be specified in the $BINDIR/TME/TEC/.tec_diag_config file. By default, the file names and their locations are:

- /tmp/tec_master
- /tmp/tec_dispatch
- /tmp/tec_rule
- /tmp/tec_reception
- /tmp/tec_task

Note: On UNIX systems, the /tmp directory should always be present by default.

On NT systems, this directory may not initially exist. The log files will only be created when their target directory exists. The notation of the target directories in the file .tec_diag_config can be /tmp/tec_master or <drive>:/tmp/tec_master. If no drive is specified, the drive where the TEC Server is installed is used.

When the log files are deleted, they are recreated when the next message is logged or on restart of the TEC Server.

When diagnostic logging is enabled with level trace2, a large amount of data is written to the log files, which can exhaust the available file system space. Therefore, do not run trace2 for a long time, or ensure that enough space is available in your diagnostic logging target directories.
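Because log files are only created when their target directory exists, and trace2 can fill a file system, both conditions are worth checking before tracing is enabled. The following Python helpers are a minimal sketch; the function names are hypothetical, and entries under /dev/ are assumed to be devices rather than log files.

```python
import os
import shutil


def missing_log_directories(log_paths):
    """Return the target directories that do not exist yet."""
    missing = []
    for path in log_paths:
        if path.startswith("/dev/"):
            continue  # /dev/null and /dev/tty are not log files
        directory = os.path.dirname(path) or "."
        if not os.path.isdir(directory) and directory not in missing:
            missing.append(directory)
    return missing


def free_bytes(directory):
    """Return the free space, in bytes, of the file system holding directory."""
    return shutil.disk_usage(directory).free
```

How much free space is enough depends on the event load and on how long trace2 stays enabled, so no threshold is suggested here.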


In addition to logging into a file, there are two more options to use as the diagnostic logging target:

- An entry of /dev/null instead of a file name discards all output.
- For the database utilities, an entry of /dev/tty is the default. This means the diagnostic logging output can be redirected to the console. For example, the /dev/tty entry needs to be replaced with /dev/lft0 to dump out diagnostic messages to the screen. This has been tested successfully on AIX using the Command Line Login instead of CDE.

Starting and stopping tracing

Perform the following actions to start diagnostic logging:

1. Save the original diagnostic logging configuration file as $BINDIR/TME/TEC/.tec_diag_config.original.
2. Apply all changes as needed to the $BINDIR/TME/TEC/.tec_diag_config file to enable diagnostic logging and save the file.
3. Stop the TEC Server (run wstopesvr).
4. Restart the TEC Server (run wstartesvr). The new settings will now take effect.

If the file .tec_diag_config does not exist at the right location or with the right file name, the Tivoli Enterprise Console Server will not start. An error similar to Example 14-7 will be displayed.

Example 14-7 Error message when Tivoli Enterprise Console will not start

chatham> wstartesvr
The Tivoli Enterprise Console Server is initializing...
Error::TasExCat:0017 Mon Jul 31 19:08:39 CDT 2000 (17): system problem: `TasExCat:0002 Mon Jul 31 19:08:37 CDT 2000 (2): operation `' failed'

Perform the following actions to stop diagnostic logging:

1. Restore the original diagnostic logging configuration file from $BINDIR/TME/TEC/.tec_diag_config.original to $BINDIR/TME/TEC/.tec_diag_config.
2. Stop the Tivoli Enterprise Console Server (run wstopesvr).
3. Restart the Tivoli Enterprise Console Server (run wstartesvr). The new settings will now take effect.
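The save, restore, and restart steps above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the commands wstopesvr and wstartesvr are assumed to be on the PATH of a shell in which the Tivoli environment has already been set up.

```python
import os
import shutil
import subprocess


def save_original(config_path):
    """Copy the configuration file to <config_path>.original."""
    backup = config_path + ".original"
    if os.path.exists(backup):
        # Refuse to overwrite: the existing copy may be the only clean one.
        raise FileExistsError(backup)
    shutil.copy2(config_path, backup)
    return backup


def restore_original(config_path):
    """Copy <config_path>.original back over the configuration file."""
    shutil.copy2(config_path + ".original", config_path)


def restart_tec_server(run=subprocess.run):
    """Stop, then start, the TEC Server with the commands named above."""
    for command in (["wstopesvr"], ["wstartesvr"]):
        run(command, check=True)
```

Refusing to overwrite an existing .original file is a deliberate choice: if the procedure is started twice, the second run would otherwise save the already modified file as the original.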


On NT systems, the Tivoli Enterprise Console Server must be restarted for the new settings to take effect. On UNIX systems, the Tivoli Enterprise Console Server does not have to be shut down and restarted to pick up the trace options. Instead, it is sufficient to send the hangup signal to the Tivoli Enterprise Console processes with the kill -1 command. This option can be useful if a production Tivoli Enterprise Console Server cannot be shut down easily. Enter the commands shown in Example 14-8.

Example 14-8 Killing Tivoli Enterprise Console processes

chatham> ps -ef|grep tec_server
root 20324 17510 0 19:50:13 - 0:00 tec_server
chatham> ps -ef |grep 20324
root 20324 17510 0 19:50:13 - 0:00 tec_server
root 13238 20324 0 19:50:15 - 0:00 tec_task -config /usr/local/Tivoli/bin/aix4-r1/TME/TEC/.tec_config
root 15554 20324 0 19:50:15 - 0:00 tec_reception -config /usr/local/Tivoli/bin/aix4-r1/TME/TEC/.tec_config
root 21272 20324 0 19:50:15 - 0:01 tec_rule -config /usr/local/Tivoli/bin/aix4-r1/TME/TEC/.tec_config
root 32712 20324 0 19:50:15 - 0:05 tec_dispatch -config /usr/local/Tivoli/bin/aix4-r1/TME/TEC/.tec_config
chatham> kill -1 20324
chatham> kill -1 13238
chatham> kill -1 15554
chatham> kill -1 21272
chatham> kill -1 32712

After performing these commands, all new settings in the .tec_diag_config file will take effect.
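The manual lookup of process IDs can also be scripted. The following Python sketch extracts the PID of tec_server and of its child processes from ps -ef output and sends them signal 1 (SIGHUP). The function names are hypothetical, and the column layout (UID, PID, PPID, C, STIME, TTY, TIME, CMD) is assumed to match the listing in Example 14-8; a start time printed as two words, such as a date, would shift the columns.

```python
import os
import signal


def tec_pids_from_ps(ps_output):
    """Return the PID of tec_server followed by the PIDs of its children."""
    rows = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Header lines and short lines fail the digit checks and are skipped.
        if len(fields) >= 8 and fields[1].isdigit() and fields[2].isdigit():
            command = os.path.basename(fields[7])
            rows.append((int(fields[1]), int(fields[2]), command))
    servers = [pid for pid, _ppid, cmd in rows if cmd == "tec_server"]
    children = [pid for pid, ppid, _cmd in rows if ppid in servers]
    return servers + children


def send_hangup(pids):
    """Send SIGHUP (the signal behind kill -1) to every PID. UNIX only."""
    for pid in pids:
        os.kill(pid, signal.SIGHUP)
```

Keeping the parsing separate from the signalling makes it possible to print and review the PID list before any signal is sent.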

Example of diagnostic logging for TEC Server startup

The following example provides (shortened) output listings for the Tivoli Enterprise Console Server startup with diagnostic logging turned on. It shows the increasing levels of information and gives an idea of what the diagnostic log files look like. The listings can also serve as an error-free reference when investigating a failing Tivoli Enterprise Console Server startup. The action that has been traced is shown in Example 14-9 on page 548.


Example 14-9 Tivoli Enterprise Console startup

chatham> wstartesvr
The Tivoli Enterprise Console Server is initializing...
The Tivoli Enterprise Console Server is running.
chatham>

Startup of TEC Server with diagnostic logging level error

In case of an error-free Tivoli Enterprise Console Server startup, all diagnostic log files are empty when using trace level error.

Example 14-10 Log files after Tivoli Enterprise Console startup

chatham> ls -l /tmp/tec_*
-rw-r--r--   1 root   nobody   0 Jul 31 19:19 tec_dispatch
-rw-r--r--   1 root   nobody   0 Jul 31 19:19 tec_master
-rw-r--r--   1 root   nobody   0 Jul 31 19:19 tec_reception
-rw-r--r--   1 root   nobody   0 Jul 31 19:19 tec_rule
-rw-r--r--   1 root   nobody   0 Jul 31 19:19 tec_task

Startup of TEC Server with diagnostic logging level trace2

All logging entries shown in this section also apply to the diagnostic logging levels trace0 (TR0) and trace1 (TR1). For each trace level, the entries of all higher levels will not appear in the files. Every trace level lower than trace2 reflects only a subset, and does not add additional data. The contents of the tec_master log file are shown in Example 14-11.

Example 14-11 Contents of the log file tec_master

Aug 1 14:12:01.109417 tec_master[4064] TR0 tec_master_exec.c:338: Starting Dispatch: /usr/local/Tivoli/bin/aix4-r1/TME/TEC/tec_dispatch -config /usr/local/Tivoli/bin/aix4-r1/TME/TEC/.tec_config
Aug 1 14:12:01.116781 tec_master[4064] TR1 tec_master_exec.c:352: Dispatch Engine Started
...
Aug 1 14:12:01.127097 tec_master[4064] TR0 tec_master_exec.c:338: Starting Reception: /usr/local/Tivoli/bin/aix4-r1/TME/TEC/tec_reception -config /usr/local/Tivoli/bin/aix4-r1/TME/TEC/.tec_config
Aug 1 14:12:01.130227 tec_master[4064] TR1 tec_master_exec.c:352: Reception Engine Started
...
Aug 1 14:12:01.138865 tec_master[4064] TR0 tec_master_exec.c:338: Starting Rule: /usr/local/Tivoli/bin/aix4-r1/TME/TEC/tec_rule -config /usr/local/Tivoli/bin/aix4-r1/TME/TEC/.tec_config
Aug 1 14:12:01.146775 tec_master[4064] TR1 tec_master_exec.c:352: Rule Engine Started
...
Aug 1 14:12:01.159075 tec_master[4064] TR0 tec_master_exec.c:338: Starting Task: /usr/local/Tivoli/bin/aix4-r1/TME/TEC/tec_task -config /usr/local/Tivoli/bin/aix4-r1/TME/TEC/.tec_config
Aug 1 14:12:01.166792 tec_master[4064] TR1 tec_master_exec.c:352: Task Engine Started
...
Aug 1 14:12:01.213561 tec_master[4064] TR2 tec_msg_hi.c:89: Received message HI : DISP
...
Aug 1 14:12:01.324200 tec_master[4064] TR2 tec_msg_hi.c:89: Received message HI : RECV
...
Aug 1 14:12:04.522377 tec_master[4064] TR2 tec_msg_hi.c:89: Received message HI : TASK
...
Aug 1 14:12:09.188205 tec_master[4064] TR2 tec_msg_hi.c:89: Received message HI : RULE
...
Aug 1 14:12:21.175239 tec_master[4064] TR2 tec_master_synchro.c:429: **** SYNCHRO TABLE ****
Aug 1 14:12:21.175462 tec_master[4064] TR2 tec_master_synchro.c:430: proc TEC_SYNCHRO_check_and_go
Aug 1 14:12:21.175612 tec_master[4064] TR2 tec_master_synchro.c:433: position = 0 origin=DISP port_num=51204 ready=1
Aug 1 14:12:21.175823 tec_master[4064] TR2 tec_master_synchro.c:433: position = 1 origin=RECV port_num=51197 ready=1
Aug 1 14:12:21.175985 tec_master[4064] TR2 tec_master_synchro.c:433: position = 2 origin=TASK port_num=51201 ready=1
Aug 1 14:12:21.176131 tec_master[4064] TR2 tec_master_synchro.c:433: position = 3 origin=RULE port_num=0 ready=1
Aug 1 14:12:21.176278 tec_master[4064] TR2 tec_master_synchro.c:438: ***********************
...
Aug 1 14:12:21.176930 tec_master[4064] TR2 tec_msg_go.c:74: Writing message GO
Aug 1 14:12:21.177266 tec_master[4064] TR1 tec_ipc_dsend.c:131: Sending packet to Dispatch (5 bytes)
Aug 1 14:12:21.177422 tec_master[4064] TR2 tec_ipc.c:1401: Writing 5 bytes to Dispatch : GO~|
Aug 1 14:12:21.177881 tec_master[4064] TR2 tec_ipc.c:1049: Send succeeded
Aug 1 14:12:21.178225 tec_master[4064] TR2 tec_msg_go.c:74: Writing message GO
Aug 1 14:12:21.178519 tec_master[4064] TR1 tec_ipc_dsend.c:87: Sending packet to Reception (5 bytes)
Aug 1 14:12:21.178674 tec_master[4064] TR2 tec_ipc.c:1401: Writing 5 bytes to Reception : GO~|
Aug 1 14:12:21.179056 tec_master[4064] TR2 tec_ipc.c:1049: Send succeeded
Aug 1 14:12:21.179397 tec_master[4064] TR2 tec_msg_go.c:74: Writing message GO
Aug 1 14:12:21.179691 tec_master[4064] TR1 tec_ipc_dsend.c:109: Sending packet to Rule (5 bytes)
Aug 1 14:12:21.179845 tec_master[4064] TR2 tec_ipc.c:1401: Writing 5 bytes to Rule : GO~|
Aug 1 14:12:21.180185 tec_master[4064] TR2 tec_ipc.c:1049: Send succeeded
...
Aug 1 14:12:23.050173 tec_master[4064] TR2 tec_methods.c:478: Refreshing State server_up
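The entries in the tec_master listing follow a regular layout (time stamp, process name with PID, level tag, source file and line, message), which makes them easy to filter. The following Python fragment is a minimal sketch derived only from the lines shown in this section: the names LogEntry, parse_entry, and visible_at are hypothetical, and the tag used for entries of level error does not appear in these listings, so such entries are simply never hidden.

```python
import re
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str
    process: str
    pid: int
    level: str
    source_file: str
    source_line: int
    message: str


_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<proc>\w+)\[(?P<pid>\d+)\] "
    r"(?P<level>\w+) "
    r"(?P<file>[\w.]+):(?P<line>\d+): "
    r"(?P<msg>.*)$"
)

# Level tags that a given configured level suppresses.
_HIDDEN = {
    "error": {"TR0", "TR1", "TR2"},
    "trace0": {"TR1", "TR2"},
    "trace1": {"TR2"},
    "trace2": set(),
}


def parse_entry(line) -> Optional[LogEntry]:
    """Parse one log line; return None for elisions or damaged lines."""
    m = _LINE.match(line.strip())
    if m is None:
        return None
    return LogEntry(m["ts"], m["proc"], int(m["pid"]), m["level"],
                    m["file"], int(m["line"]), m["msg"])


def visible_at(entries, configured_level):
    """Return the entries that would still be logged at configured_level."""
    hidden = _HIDDEN[configured_level]
    return [entry for entry in entries if entry.level not in hidden]
```

Applying visible_at to a trace2 log with the argument trace1 shows what a trace1 run would have recorded, which reflects the statement that lower levels only produce a subset of the trace2 output.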

On startup, the tec_server (tec_master) process initiates the spawning of all four TEC Server engines (dispatch engine, reception engine, rule engine, and task engine). Each of those engines answers by sending a feedback message (HI) that the server receives. This is finished with a time stamp of 14:12:09. After this task, the tec_server process loops, waiting for the sub-processes to finish their initialization. This is tracked using a SYNCHRO TABLE that includes a ready flag for each engine. Only when all ready flags are set to 1 are the processes considered to be alive. The SYNCHRO TABLE at time stamp 14:12:21 fulfills this condition, which leads to a GO message being sent to the dispatch engine, reception engine, and rule engine, informing them to start operation. The task engine gets no feedback. At time stamp 14:12:23, the tec_server process is at state server_up.

The dispatch engine is the first one to be spawned by the server engine. The contents of the log file tec_dispatch are shown in Example 14-12.

Example 14-12 Contents of the log file tec_dispatch

0671 tec_dispatch[18406] TR2 tec_dispatch.c:340: Dispatch diagnostics initialized
...
Aug 1 14:12:01.192143 tec_dispatch[18406] TR2 tec_dispatch.c:354: Dispatch BAROC initialized
...
Aug 1 14:12:01.209544 tec_dispatch[18406] TR1 tec_ipc_connect.c:154: Connect to TEC Master succeeded
Aug 1 14:12:01.209696 tec_dispatch[18406] TR0 tec_dispatch.c:376: Dispatch connected to Master on host chatham
...
Aug 1 14:12:01.209844 tec_dispatch[18406] TR0 tec_msg.c:102: Writing packet
Aug 1 14:12:01.209997 tec_dispatch[18406] TR2 tec_msg_hi.c:108: Writing message HI : DISP
Aug 1 14:12:01.210143 tec_dispatch[18406] TR1 tec_msg.c:134: Wrote packet
...
Aug 1 14:12:01.210971 tec_dispatch[18406] TR0 db_utils.c:822: RIM connect
Aug 1 14:12:01.822965 tec_dispatch[18406] TR2 db_utils.c:843: connect: DB vendor is RIM_objects::RIM_Oracle
Aug 1 14:12:01.823269 tec_dispatch[18406] TR1 db_utils.c:867: RIM connect succeeded
...
Aug 1 14:12:01.826055 tec_dispatch[18406] TR0 tec_baroc.c:151: Load BAROC classes ...
...
Aug 1 14:12:01.844662 tec_dispatch[18406] TR1 tec_baroc.c:193: File tec/rb_dir/TEC_CLASSES/root.baroc parsed
...
Aug 1 14:12:07.535884 tec_dispatch[18406] TR0 tec_baroc.c:205: Load BAROC classes succeeded
...
Aug 1 14:12:15.837002 tec_dispatch[18406] TR2 db_utils.c:637: Deferred Insert (tec_t_enumerations): Inserting rows
Aug 1 14:12:15.837157 tec_dispatch[18406] TR0 db_utils.c:669: Insert (tec_t_enumerations)
Aug 1 14:12:15.867454 tec_dispatch[18406] TR1 db_utils.c:690: Insert (tec_t_enumerations) succeeded
...
Aug 1 14:12:15.901609 tec_dispatch[18406] TR0 tec_disp_process.c:64: Initialize Dispatch process succeeded
...
Aug 1 14:12:15.953901 tec_dispatch[18406] TR0 tec_ipc_connect.c:119: tec_ipc_connect: Connect to TEC Reception
...
Aug 1 14:12:15.957281 tec_dispatch[18406] TR1 tec_ipc_connect.c:154: Connect to TEC Reception succeeded
...
Aug 1 14:12:15.958344 tec_dispatch[18406] TR0 tec_ipc_connect.c:119: tec_ipc_connect: Connect to TEC Task
...
Aug 1 14:12:15.961794 tec_dispatch[18406] TR1 tec_ipc_connect.c:154: Connect to TEC Task succeeded
...
Aug 1 14:12:21.182855 tec_dispatch[18406] TR2 tec_msg_go.c:60: Received message GO

One of the first actions of the dispatch engine after initializing is connecting to the TEC Master and sending the HI message. Then a connection to RIM is established. All BAROC classes are parsed and the database tables are populated with static information, such as enumeration definitions (table tec_t_enumerations). After this task, the dispatch engine has finished its initialization phase, connects to the reception engine and the task engine, and starts operating after receiving the GO message from the Tivoli Enterprise Console Server. The reception engine is the second process spawned by the server engine. The contents of the tec_reception log file are shown in Example 14-13 on page 552.


Example 14-13 Contents of the log file tec_reception

.319779 tec_reception[26016] TR1 tec_ipc_connect.c:154: Connect to TEC Master succeeded
Aug 1 14:12:01.319933 tec_reception[26016] TR0 tec_reception.c:268: Reception connected to Master on host chatham
...
Aug 1 14:12:01.320080 tec_reception[26016] TR0 tec_msg.c:102: Writing packet
Aug 1 14:12:01.320232 tec_reception[26016] TR2 tec_msg_hi.c:108: Writing message HI : RECV
Aug 1 14:12:01.320376 tec_reception[26016] TR1 tec_msg.c:134: Wrote packet
...
Aug 1 14:12:01.321630 tec_reception[26016] TR0 db_utils.c:822: RIM connect
Aug 1 14:12:02.070517 tec_reception[26016] TR2 db_utils.c:843: connect: DB vendor is RIM_objects::RIM_Oracle
Aug 1 14:12:02.070811 tec_reception[26016] TR1 db_utils.c:867: RIM connect succeeded
...
Aug 1 14:12:21.188716 tec_reception[26016] TR2 tec_msg_go.c:60: Received message GO
...
Aug 1 14:12:21.277356 tec_reception[26016] TR2 db_utils.c:297: short_message TEC_Start;source=TEC;msg="TEC Event Server initialized";hostname=chatham;END

The reception engine connects to the Tivoli Enterprise Console Server and sends the HI message. Then it establishes its RIM connection. After receiving the GO message from the Tivoli Enterprise Console Server, it starts operating, and the TEC_Start event is processed.

The rule engine is started as the third process. The contents of the tec_rule log file are shown in Example 14-14.

Example 14-14 Contents of the log file tec_rule

.566226 tec_rule[30000] TR1 tec_rule_init.c:84: Init engine with kb in tec/rb_dir
...
Aug 1 14:12:01.816191 tec_rule[30000] TR2 tec_rule_init.c:90: Rule Engine cache initialized for 1000 events
...
Aug 1 14:12:01.987432 tec_rule[30000] TR0 tec_baroc.c:151: Load BAROC classes ...
...
Aug 1 14:12:02.022938 tec_rule[30000] TR1 tec_baroc.c:193: File tec/rb_dir/TEC_CLASSES/root.baroc parsed
...
Aug 1 14:12:09.165957 tec_rule[30000] TR0 tec_baroc.c:205: Load BAROC classes succeeded
...
Aug 1 14:12:09.183977 tec_rule[30000] TR1 tec_ipc_connect.c:154: Connect to TEC Master succeeded
Aug 1 14:12:09.184134 tec_rule[30000] TR0 tec_rule.c:267: Rule connected to Master on host chatham
Aug 1 14:12:09.184283 tec_rule[30000] TR0 tec_msg.c:102: Writing packet
Aug 1 14:12:09.184441 tec_rule[30000] TR2 tec_msg_hi.c:108: Writing message HI : RULE
Aug 1 14:12:09.184586 tec_rule[30000] TR1 tec_msg.c:134: Wrote packet
...
Aug 1 14:12:15.935423 tec_rule[30000] TR1 tec_ipc_connect.c:154: Connect to TEC Reception succeeded
Aug 1 14:12:15.935583 tec_rule[30000] TR0 tec_pool_r_master.c:157: Rule connected to Reception on host chatham
...
Aug 1 14:12:15.940504 tec_rule[30000] TR1 tec_ipc_connect.c:154: Connect to TEC Dispatch succeeded
Aug 1 14:12:15.940660 tec_rule[30000] TR0 tec_pool_r_master.c:197: Rule connected to Dispatch on host chatham
...
Aug 1 14:12:21.195699 tec_rule[30000] TR2 tec_msg_go.c:60: Received message GO

First, the rule engine initializes and sets the Rule Engine cache. Then all BAROC classes are loaded and parsed. A connection to the TEC Master is established and the HI message is sent. Then the rule engine connects with the reception engine and dispatch engine and, after receiving the GO message from the Tivoli Enterprise Console Server, it starts operating.

Finally, the task engine is requested to start up by the Tivoli Enterprise Console Server. The contents of the tec_task log file are shown in Example 14-15.

Example 14-15 Contents of the log file tec_task

92826 tec_task[16506] TR0 launch.c:200: forward_init
Aug 1 14:12:04.500819 tec_task[16506] TR1 launch.c:225: forward_init succeeded
Aug 1 14:12:04.502258 tec_task[16506] TR0 launch.c:658: connect_to_master
Aug 1 14:12:04.519492 tec_task[16506] TR2 launch.c:672: Opened IPC connection to master
Aug 1 14:12:04.519913 tec_task[16506] TR2 launch.c:682: Sent HI to master
Aug 1 14:12:04.520394 tec_task[16506] TR0 launch.c:756: open_task_server_port
Aug 1 14:12:04.520587 tec_task[16506] TR0 launch.c:713: open_server_port
Aug 1 14:12:04.520853 tec_task[16506] TR0 launch.c:610: poll_master
Aug 1 14:12:04.521026 tec_task[16506] TR2 launch.c:618: poll_master waiting for message
Aug 1 14:12:04.525902 tec_task[16506] TR1 launch.c:742: open_server_port succeeded (536976952)
Aug 1 14:12:04.526251 tec_task[16506] TR2 launch.c:776: Sent TP to master
Aug 1 14:12:04.526419 tec_task[16506] TR1 launch.c:779: open_task_server_port succeeded (536976952)
Aug 1 14:12:15.963702 tec_task[16506] TR0 launch.c:524: poll_client
Aug 1 14:12:15.963945 tec_task[16506] TR0 launch.c:75: new_refcnt
Aug 1 14:12:15.964092 tec_task[16506] TR1 launch.c:80: new_refcnt succeeded (536972408)
Aug 1 14:12:15.964234 tec_task[16506] TR2 launch.c:536: poll_client waiting for message

The task engine also connects to the TEC Master and sends a HI message. As the Tivoli Enterprise Console Server does not respond by sending a GO message, the task engine immediately starts operating and listens for incoming messages.
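When a startup fails, it helps to see at a glance how far the handshake described in this section progressed. The following Python function is a minimal sketch that scans tec_master log lines for the HI messages, the ready flags of the SYNCHRO TABLE, and the server_up state. The name startup_summary and the returned dictionary keys are hypothetical, and the message texts are assumed to match those shown in Example 14-11.

```python
import re

_HI = re.compile(r"Received message HI : (\w+)")
_ROW = re.compile(r"origin=(\w+) port_num=\d+ ready=(\d)")


def startup_summary(master_log_lines):
    """Summarize the startup handshake recorded in a tec_master trace2 log."""
    hi_from = []        # engines that reported in, in order of arrival
    ready = {}          # last ready flag seen for each engine
    server_up = False
    for line in master_log_lines:
        match = _HI.search(line)
        if match:
            hi_from.append(match.group(1))
        match = _ROW.search(line)
        if match:
            ready[match.group(1)] = match.group(2) == "1"
        if "Refreshing State server_up" in line:
            server_up = True
    return {
        "hi_from": hi_from,
        "ready": ready,
        "all_ready": bool(ready) and all(ready.values()),
        "server_up": server_up,
    }
```

For an error-free startup such as the one traced above, hi_from would list DISP, RECV, TASK, and RULE, and both all_ready and server_up would be true. An engine missing from hi_from, or a ready flag that stays at 0, points to the engine whose own log file should be examined next.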

Startup of TEC Server - odstat/wtrace output

The following information was gathered using the Tivoli Management Framework commands odstat and wtrace. For the Tivoli Enterprise Console Server startup, the main parts from odstat are shown for different phases of the startup procedure. More detailed information is visible in the wtrace output, which is appended in some examples. During the start and run time of the Tivoli Enterprise Console Server, the method start_server is always running:

run 0 0 14:11:59 1692742425.2.32#Tec::Server# start_server

The first action when starting up the Tivoli Enterprise Console is to locate the EventServer (see Example 14-16).

Example 14-16 Tivoli Enterprise Console startup-1

7114 O+ done 15 0 14:11:59 0.0.0 get_name_registry
7115 O done 103 0 14:11:59 1692742425.1.26 lookup

rem-ic 7115 M-H Extern 44
  Time run: [Tue 01-Aug 14:11:59]
  Object ID: 1692742425.1.26
  Method: lookup
  Principal: [email protected] (0/0)
  Path: /aix4-r1/TMF/BASESVCS/TNR_prog1
  Input Data: (encoded): "EventServer" "EventServer"
rem-oc 7115 103
  Results: (encoded): { "1692742425.2.32#Tec::Server#" "EventServer" { "null" 0 false }}


The next step is setting and updating the Tivoli Enterprise Console Server state with server_initing (Example 14-17). Also, the Tivoli Desktop’s icon is changed to reflect the currently starting Tivoli Enterprise Console Server state.

Example 14-17 Tivoli Enterprise Console startup-2

7117 O+hdq 2-7116 done 6 0 14:11:59 1692742425.2.32#Tec::Server# _set_state
7118 O 2-7117 done 0 0 14:11:59 update_state
7119 O 2-7117 done 6 0 14:11:59 1692742425.1.179#TMF_Administrator::Configuration_GUI# update_state
7120 O 2-7117 done 6 0 14:11:59 1692742425.1.871#Tec::InstanceManager# update_state
7121 O+hdq 2-7116 done 159 0 14:11:59 1692742425.2.32#Tec::Server# get_backrefs
7122 O 2-7116 done 6 0 14:11:59 1692742425.1.179#TMF_Administrator::Configuration_GUI# refresh_member

loc-is 7117 setattr 25 state
  Time run: [Tue 01-Aug 14:11:59]
  Object ID: 1692742425.2.32#Tec::Server#
  Method: _set_state
  Input Data: (encoded): "server_initing"

After finishing these steps, the Tivoli environment’s settings are collected (see Example 14-18). These include, for example, host locations (if the oserv is running), interpreter type, file and binary locations, rule base location and settings, reception log buffer size, and other Tivoli Enterprise Console configuration data. The details about what data is collected can be found in the wtrace output, starting the search with the get_host_location method.

Example 14-18 Tivoli Enterprise Console startup-3

7124 O+ 2-7116 done 45 0 14:12:00 0.0.0 get_host_location
7125 O+ 2-7116 done 45 0 14:12:00 0.0.0 get_host_location
7126 O+hdoq 2-7116 done 38 0 14:12:00 1692742425.2.7#TMF_ManagedNode::Managed_Node# install_directory
7127 O+ 2-7126 done 15 0 14:12:00 0.0.0 get_oserv
7128 O+ 2-7126 done 21 0 14:12:00 1692742425.2.2 query install_dir
7129 O+hdoq 2-7116 done 24 0 14:12:00 1692742425.2.7#TMF_ManagedNode::Managed_Node# interpreter
7130 O+ 2-7129 done 15 0 14:12:00 0.0.0 get_oserv
7131 O+ 2-7129 done 7 0 14:12:00 1692742425.2.2 query interp

loc-is 7116 getattr 0 tec_home
loc-is 7116 getattr 0 rb_dir
loc-is 7116 getattr 0 is_tec_server
loc-is 7116 getattr 0 rule_trace
loc-is 7116 getattr 0 rule_cache_size
loc-is 7116 getattr 0 recv_log
loc-is 7116 getattr 0 recv_bufsize
loc-is 7116 getattr 0 rule_cache_clean_freq
loc-is 7116 getattr 0 rule_cache_non_closed_history
loc-is 7116 getattr 0 rule_cache_full_history
loc-is 7116 getattr 0 master_start_timeout
loc-ic 7124 M-H 2-7116 0
  Method: get_host_location
  Results: (ascii): 1692742425.2.7#TMF_ManagedNode::Managed_Node#
loc-ic 7126 M-hdoq 2-7116 0
  Method: install_directory
  Results: (encoded): "/usr/local/Tivoli/bin"
loc-ic 7127 M-H 2-7126 0
  Method: get_oserv
  Results: (ascii): 1692742425.2.2
loc-ic 7128 M-H 2-7126 0
  Method: query
  Method Args: install_dir
  Results: (ascii): /usr/local/Tivoli/bin
loc-ic 7129 M-hdoq 2-7116 0
  Method: interpreter
  Results: (encoded): "aix4-r1"

Next, a check is made to see whether the RDBMS is up. This is done via several RIM connects (see Example 14-19).

Example 14-19 Tivoli Enterprise Console startup-4

7132 O+ho 2-7116 done 9 0 14:12:00 1692742425.2.32#Tec::Server# is_dataserver_running is_dataserver_running
7134 O+ 2-7132 done 15 0 14:12:00 0.0.0 get_name_registry
7136 O 2-7132 done 104 0 14:12:00 1692742425.1.26 lookup
7137 O+hdoq 2-7132 done 18 0 14:12:00 1692742425.2.38#RIM::RDBMS_Interface#

After this task is completed, the Tivoli Enterprise Console Server is nearly finished with starting up. Again, the Tivoli Enterprise Console Server state is updated with server_up and the Tivoli Desktop’s icon is changed to reflect the now running Tivoli Enterprise Console Server state (showing the “red arrow around the globe”) (see Example 14-20 on page 557).


Example 14-20 Tivoli Enterprise Console startup-5

7155 O+hdq 2-7116 done 6 0 14:12:22 1692742425.2.32#Tec::Server# _set_state
7156 O 2-7155 done 0 0 14:12:22 update_state
7157 O 2-7155 done 6 0 14:12:22 1692742425.1.179#TMF_Administrator::Configuration_GUI# update_state
7158 O 2-7155 done 6 0 14:12:22 1692742425.1.871#Tec::InstanceManager# update_state
7159 O+hdq 2-7116 done 159 0 14:12:22 1692742425.2.32#Tec::Server# get_backrefs
7160 O 2-7116 done 6 0 14:12:22 1692742425.1.179#TMF_Administrator::Configuration_GUI# refresh_member

loc-is 7155 setattr 20 state
  Time run: [Tue 01-Aug 14:12:22]
  Object ID: 1692742425.2.32#Tec::Server#
  Method: _set_state
  Input Data: (encoded): "server_up"

Now the Tivoli Enterprise Console Server is running and the server_up state is also logged in the tec_master diagnostic log file (only with trace level trace2 (TR2)), as already mentioned in “Startup of TEC Server with diagnostic logging level trace2” on page 548.

14.3.2 TEC User Interface Server 3.7 diagnostic logging

The following sections describe how to set up diagnostic logging for the Tivoli Enterprise Console User Interface Server 3.7 process. All considerations cited in 14.3.1, “Tivoli Enterprise Console Server 3.7 diagnostic logging” on page 538 apply here. Only the differences are mentioned below.

General information and recommendations

The User Interface Server runs as one process: tec_ui_server. Four tracing levels are available: error, trace0, trace1, and trace2. If no trace level higher than error has been specified, the diagnostic log file usually has a size of 0 bytes. Only when an error occurs will it contain information. This is an easy way to detect failures of the UI Server process. For further information, refer to “General information and recommendations” on page 538.
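Because a log file at level error stays empty until something goes wrong, a simple size check is enough to spot failures. The following Python function is a minimal sketch; the name logs_with_content is hypothetical, and files that do not exist are assumed to mean that nothing has been logged yet.

```python
import os


def logs_with_content(paths):
    """Return (path, size) for every log file that exists and is not empty."""
    found = []
    for path in paths:
        try:
            size = os.path.getsize(path)
        except OSError:
            continue  # missing or unreadable: nothing to report
        if size > 0:
            found.append((path, size))
    return found
```

The check is only meaningful while every entry is configured with level error. With any trace level enabled, the files fill up during normal operation.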


Diagnostic logging file format

The format of the log file is the same as for the Tivoli Enterprise Console Server tracing. Refer to “Diagnostic logging file format” on page 539 for more details.

Diagnostic logging configuration file format

The diagnostic logging is set up using the $BINDIR/TME/TEC/.ui_server_diag_config configuration file. Example 14-21 shows its contents.

Example 14-21 Diagnostic logging configuration file

# Author: Rod Dunsmore
#
# Sample Bim Error/Trace Diagnostic Messages configuration file
#
# format:
# Highest_level
# Truncate_on_restart [true|false]
# Highest_level
#
#
Highest_level         trace2
Truncate_on_restart   true
# ui_server
#############
ui_server   Highest_level         error
ui_server   TecServerProxy        error   /tmp/ui_server
ui_server   ConsoleProxy          error   /tmp/ui_server
ui_server   ConsoleProxyMgr       error   /tmp/ui_server
ui_server   EventTransactionMgr   error   /tmp/ui_server
ui_server   EventTransaction      error   /tmp/ui_server
ui_server   CacheEvent            error   /tmp/ui_server
ui_server   Event                 error   /tmp/ui_server
ui_server   DoubleLinkedList      error   /tmp/ui_server
ui_server   TimedHashTable        error   /tmp/ui_server
ui_server   ui_meth               error   /tmp/ui_server
ui_server   uiserver_init         error   /tmp/ui_server
ui_server   SerObj                error   /tmp/ui_server
ui_server   DB_Utils              error   /tmp/ui_server
ui_server   LookUp                error   /tmp/ui_server
ui_server   Evt_Rep               error   /tmp/ui_server

The format of the diagnostic logging configuration file has been explained in detail in “Diagnostic logging configuration file format” on page 539.


Setting up diagnostic logging

To enable diagnostic logging, edit the file $BINDIR/TME/TEC/.ui_server_diag_config. The format of the file is explained in the previous section. Before editing the file, make a copy of the original version, which makes it easy to revert the tracing to its normal level. To enable full logging with trace2 for the UI Server process and its modules, change the diagnostic logging configuration file in the following way (in the printed listing, all changed strings are marked in bold):

# Author: Rod Dunsmore
#
# Sample Bim Error/Trace Diagnostic Messages configuration file
#
# format:
# Highest_level
# Truncate_on_restart [true|false]
# Highest_level
#
#
Highest_level         trace2
Truncate_on_restart   true
# ui_server
#############
ui_server   Highest_level         trace2
ui_server   TecServerProxy        trace2   /tmp/ui_server
ui_server   ConsoleProxy          trace2   /tmp/ui_server
ui_server   ConsoleProxyMgr       trace2   /tmp/ui_server
ui_server   EventTransactionMgr   trace2   /tmp/ui_server
ui_server   EventTransaction      trace2   /tmp/ui_server
ui_server   CacheEvent            trace2   /tmp/ui_server
ui_server   Event                 trace2   /tmp/ui_server
ui_server   DoubleLinkedList      trace2   /tmp/ui_server
ui_server   TimedHashTable        trace2   /tmp/ui_server
ui_server   ui_meth               trace2   /tmp/ui_server
ui_server   uiserver_init         trace2   /tmp/ui_server
ui_server   SerObj                trace2   /tmp/ui_server
ui_server   DB_Utils              trace2   /tmp/ui_server
ui_server   LookUp                trace2   /tmp/ui_server
ui_server   Evt_Rep               trace2   /tmp/ui_server

In case of an error, enable all entries for logging with trace level trace2. This provides a log file containing the most detailed information to help track down a problem.


All entries for the log file location can be left as default. For more information on this topic, see the next section.

Trace file locations

The location of the UI Server log file can be specified in the file $BINDIR/TME/TEC/.ui_server_diag_config. By default, the file and its location is /tmp/ui_server. All considerations from “Trace file locations” on page 545 also apply to the UI Server.

Starting and stopping tracing

Perform the following actions to start diagnostic logging:

1. Save the original diagnostic logging configuration file as $BINDIR/TME/TEC/.ui_server_diag_config.original.
2. Apply all changes as needed to the $BINDIR/TME/TEC/.ui_server_diag_config file to enable diagnostic logging and save the file.
3. Stop the UI Server. The new settings will take effect after the (automated) restart of the UI Server.

Perform the following actions to stop diagnostic logging:

1. Restore the original diagnostic logging configuration file from $BINDIR/TME/TEC/.ui_server_diag_config.original to $BINDIR/TME/TEC/.ui_server_diag_config.
2. Stop the UI Server. The new settings will take effect after the (automated) restart of the UI Server.

Tip: There is no command to start or stop the UI Server.

To stop the UI Server, its process (tec_ui_server) needs to be killed. Alternatively, the UI Server kills itself if no requests are received for a certain time (approximately two minutes). Also, the UI Server cannot be started manually. It will start automatically with the next connection of a Tivoli Enterprise Console Java Console being launched. On Windows NT systems, the only way to activate the diagnostic logging is to kill the UI Server process, for example, using the NT Task Manager facility (End Process button), as shown in Figure 14-5 on page 561.


Figure 14-5 Windows NT Task Manager

On UNIX systems, the UI Server does not necessarily have to be shut down and restarted to pick up the trace options. It is sufficient to use the kill -1 command on the UI Server process. This option can be useful if a production system cannot be shut down easily. Enter, for example, the commands shown in Example 14-22.

Example 14-22 Kill UI Server process
chatham> ps -ef|grep tec_ui_server
nobody 3948 17510 0 16:13:18 - 0:00 tec_ui_server
chatham> kill -1 3948

After performing these commands, all new settings that have been made previously in the .ui_server_diag_config file will take effect.
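The save, restore, and kill -1 steps described above can be sketched as a few shell functions. This is an illustration only and not part of the product: the function names are our own, the directory argument stands in for $BINDIR/TME/TEC, and the process lookup assumes that ps -ef shows the command name exactly as tec_ui_server in the last column, as it does in Example 14-22.

```sh
#!/bin/sh
# Sketch only: helpers for switching UI Server diagnostic logging on and off.
# All function names are hypothetical; pass $BINDIR/TME/TEC as the directory.

CONFIG_NAME=.ui_server_diag_config

# Save the current configuration as <name>.original, unless a saved copy
# already exists (so that a second run does not overwrite the real original).
save_diag_config() {
    dir=$1
    [ -f "$dir/$CONFIG_NAME.original" ] ||
        cp "$dir/$CONFIG_NAME" "$dir/$CONFIG_NAME.original"
}

# Restore the saved configuration, which switches diagnostic logging back off
# once the UI Server has reread or restarted.
restore_diag_config() {
    dir=$1
    cp "$dir/$CONFIG_NAME.original" "$dir/$CONFIG_NAME"
}

# Read "ps -ef" output on stdin and print the PID (second column) of every
# tec_ui_server process, skipping the grep command that searches for it.
ui_server_pids() {
    awk '$NF == "tec_ui_server" && $(NF-1) != "grep" { print $2 }'
}

# UNIX only: send signal 1 to each running UI Server so that it picks up
# the trace options without a restart.
reload_ui_server() {
    for pid in $(ps -ef | ui_server_pids); do
        kill -1 "$pid"
    done
}
```

A typical sequence would be save_diag_config "$BINDIR/TME/TEC", an edit of the configuration file, and then reload_ui_server. On Windows NT the last step does not apply; the process has to be ended as described above.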

Examples of User Interface Server diagnostic logging
The following examples provide output listings for the UI Server with diagnostic logging level trace2 (TR2) turned on. The entries shown also apply to the diagnostic logging levels trace0 (TR0) and trace1 (TR1), except that, for each trace level, the entries of all higher levels do not appear in the files. Every trace level lower than trace2 reflects only a subset of the trace2 output and does not add additional data.

Automated startup of UI Server after launching a TEC Console
Initially, the UI Server is not running. It cannot be started manually, and it does not start automatically when the Tivoli Enterprise Console Server starts. Instead, the UI Server starts only when it is needed and then connects to the Tivoli Enterprise Console Server. This happens, for example, when a Tivoli Enterprise Console Java Console is launched and wants to connect to the UI Server. The contents of the ui_server log file for this scenario are shown in Example 14-23.

Example 14-23 Contents of the log file ui_server
.102590 ui_server[3948] TR1 TecServerProxy.C:816: rc = 0
Aug 3 16:13:31.102869 ui_server[3948] TR1 TecServerProxy.C:817: connectToServer() - exiting
Aug 3 16:13:31.103019 ui_server[3948] TR1 TecServerProxy.C:871: initServerConnection() - connectToServer() == 0
Aug 3 16:13:31.103170 ui_server[3948] TR1 TecServerProxy.C:882: initServerConnection() - Attempting to connect from client side...
Aug 3 16:13:31.103962 ui_server[3948] TR1 TecServerProxy.C:887: initServerConnection() - After iom_timed_open()
Aug 3 16:13:31.104169 ui_server[3948] TR1 TecServerProxy.C:918: initServerConnection() - Unlocking connect_mutex
Aug 3 16:13:31.104327 ui_server[3948] TR1 TecServerProxy.C:921: initServerConnection() - exiting
Aug 3 16:13:31.104480 ui_server[3948] TR1 TecServerProxy.C:267: listenToServer() - Connected to server!
Aug 3 16:13:31.104646 ui_server[3948] TR2 TecServerProxy.C:276: listenToServer() - blocking on iom_receive...
Aug 3 16:13:31.111173 ui_server[3948] TR0 TecServerProxy.C:379: processServerData() - entering

Automated shutting down of UI Server after launching a TEC Console If the UI Server is idle, it shuts itself off and restarts automatically when necessary. The output in Example 14-24 on page 563 reflects the UI Server shutting down itself.


Example 14-24 UI Server shutting down itself
Aug 3 16:06:31.682437 ui_server[28034] TR1 TimedHashTable.C:92: writelocking mlock...
Aug 3 16:06:31.682876 ui_server[28034] TR1 TimedHashTable.C:94: about to purge...
Aug 3 16:06:31.683053 ui_server[28034] TR1 TimedHashTable.C:117: unlocking mlock...
Aug 3 16:06:31.683206 ui_server[28034] TR1 TimedHashTable.C:76: going to sleep...
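When a ui_server log grows large, a quick count of entries per trace level shows whether the requested level is actually being written. The following function is a sketch of our own, not a product tool; it assumes one log entry per line with the level token (TR0, TR1, or TR2) as a separate field, as in the examples above.

```sh
#!/bin/sh
# Sketch only: count UI Server diagnostic log entries per trace level.
# Reads the named files, or standard input if no file is given.
count_trace_levels() {
    awk '{
        # Find the first field that is exactly TR0, TR1, or TR2.
        for (i = 1; i <= NF; i++)
            if ($i ~ /^TR[0-2]$/) { n[$i]++; break }
    }
    END {
        # Always print all three levels, using 0 for levels never seen.
        for (lvl = 0; lvl <= 2; lvl++)
            printf "TR%d %d\n", lvl, n["TR" lvl] + 0
    }' "$@"
}
```

For example, count_trace_levels /tmp/ui_server prints one line per level. If TR2 stays at 0 after trace2 has been configured, the new settings have probably not yet been picked up by the UI Server.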

14.4 RIM tracing
The wrimtrace utility is the primary tool for debugging RIM problems. It is used to enable RIM tracing in order to debug RIM objects while they are running. It prints two kinds of information to the RIM log file:
▪ The contents of the IOM (Inter Object Message) packets being passed between the RIM object and the client program
▪ Errors produced by SQL statements to the database server

Three tracing options are available:
▪ ERROR: Enables RIM tracing and prints SQL errors to the log file
▪ INFORMATION: Enables RIM tracing and prints the contents of IOM packets to the log file
▪ TRACE_OFF: Disables RIM tracing

To start/stop RIM tracing, proceed as follows:
▪ Kill all RIM__prog and RIM__Agent processes.
  Note: After killing the RIM processes, all open Tivoli Enterprise Console Consoles need to be restarted.
▪ Use the wrimtrace command.

To enable RIM tracing, for example, the following commands apply:
▪ wrimtrace tec ERROR
▪ wrimtrace tec INFORMATION
▪ wrimtrace tec "ERROR|INFORMATION" (enables both options simultaneously)


To disable RIM tracing, use the following command: wrimtrace tec TRACE_OFF
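Because the option names must be spelled exactly and combined with a vertical bar inside quotes, it can help to build the command line with a small wrapper before running it. The following sketch is our own and only prints the command; it does not call wrimtrace. The function name is hypothetical, and the only option names it accepts are the three listed above.

```sh
#!/bin/sh
# Sketch only: print the wrimtrace command line for a RIM object and one or
# more trace options. Nothing is executed; the caller decides whether to run it.
rim_trace_command() {
    rim=$1
    shift
    opts=
    for o in "$@"; do
        case $o in
            ERROR|INFORMATION|TRACE_OFF) ;;
            *) echo "unknown trace option: $o" >&2; return 1 ;;
        esac
        # Join the options with "|", as in "ERROR|INFORMATION".
        opts="${opts:+$opts|}$o"
    done
    [ -n "$opts" ] || { echo "no trace option given" >&2; return 1; }
    printf 'wrimtrace %s "%s"\n' "$rim" "$opts"
}
```

For example, rim_trace_command tec ERROR INFORMATION prints the combined form shown above. The sketch does not check whether a combination is sensible, so pass TRACE_OFF on its own.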

The RIM processes are spawned automatically on request. Then the new tracing options take effect.

The trace file location on UNIX systems is /tmp/rim_db_log by default. To change this location, an oserv environment variable needs to be set up as follows:
1. Run odadmin environ get > environ.txt.
2. Append RIM_DB_LOG= in the environ.txt.
3. Run odadmin environ set < environ.txt.
4. Restart the oserv to activate the new environment variable.

On Windows NT, the oserv environment variable needs to be set to enable the RIM trace logging. An example of RIM tracing is shown in Example 14-25.

Example 14-25 RIM tracing
00029374 [Mon Aug 7 11:44:22 2000] Trace Message - Connection ID:: Connecting to IOM Channel
00029374 [Mon Aug 7 11:44:22 2000] Trace Message - Connection ID:: Beginning IOM Loop
00029374 [Mon Aug 7 11:44:23 2000] Trace Message - Connection ID:: IOM Command:RETRIEVE2 row_param:Table Name :tec_t_evt_rec_logColumns: server_hndl(L):0 reception_id(L):0 agent_id(L):0 date_entry(L):0 status(L):0 commnt(S): short_message(S): long_message(S): rows: where_clause:server_hndl = 1 order by date_entry desc, reception_id desc number1:100number2:0string1: string2:
00029374 [Mon Aug 7 11:44:23 2000] Trace Message - Connection ID:: REPLY IOM COMMAND :RETRIEVE2 Result : Success Result : Success rows:No. of Rows 100 First Row :Table Name :tec_t_evt_rec_logColumns: server_hndl(L):1


reception_id(L):7445 agent_id(L):65537 date_entry(L):965666339 status(L):2 commnt(S):[N] short_message(S):[N] long_message(S):nt_CpuPrcCpuTime; source='SENTRY'; sub_source='NT_PRF'; severity='HARMLESS'; origin='9.3.187.239'; sub_origin='itsovas4'; hostname='itsovas4'; adapter_host='itsovas4'; distrib_admin='itsovas4'; response_level='warning'; probe='PrcCpuTime'; probe_arg='0'; tmr='1692742425'; dispatcher='21'; prev_value='90.7174'; value='41.3919'; effective_value='41.3919'; collection='NT_Processor'; info=''; monitor='Percent Processor Time'; units='(percent)'; relation='Decreases below'; relation_delta=' ()'; msg='Distributed Monitoring NT_PRF/Percent Processor Time on host itsovas4 08/07/00 11:46:05 CDT Status: >>> warning

SOFTWARE -> Tivoli -> Tmw -> DisplayThreshold and HKEY_LOCAL_MACHINE -> SOFTWARE -> Tivoli -> Tmw -> MaxFileLogSize.

Each line in the log contains the following columns:
▪ Date
▪ Trace level
▪ Component/classes
  – ActionManager: Executes corrective actions (built-in tasks).
  – Adapters: Sends events to TEC and TBSM.
  – Analyzer: Collects and analyzes the performance data.
  – ComponentManager: Provides the default implementation for creation/profile push/start/stop/delete.
  – Dispatcher: Responsible for forwarding the events or operations to the subscribed components (this is because the upcall is not thread safe; therefore, events or operations that need to perform upcall need to pass through the dispatcher).
  – EventAggregator: Receives indications from decision trees and processes those based on the occurrences and holes.
  – EventCorrelator: Consolidates and correlates events generated between different resource models.
  – InstallSharedDll: Installs the shared dll.
  – Logger: Stores the data locally on the endpoint.
  – MsgQueueManager.
  – MofCompiler.
  – TemporarySink: This is a pure virtual utility that provides a wrapping to WMI for facilitating the implementation of WMI event listeners. It also provides support for checking the correct behavior, ensuring that WMI does not unload the listeners.
  – TMNT_Upcall.
  – TMWMethodProv: The WMI method provider; responsible for implementing the CIM class methods.
  – TMWService: An object that allows the scripts implementing the best practices to configure and set the default values for parameters and thresholds to specify the data sources that will be used; also useful to use some IBM Tivoli Monitoring services like event sending, tracing, running monitors and scripts, and data logging; basically, a way the scripts interface the engine and the underlying CIM implementation.
  – WbemConnector.
▪ Thread ID
▪ Message

After changing the trace settings, the endpoint monitoring engine should be restarted to reflect the changes.


The monitoring engine should be stopped using the command: wdmcmd -stop -e endpoint_label

The monitoring engine should be started using the command: wdmcmd -restart -e endpoint_label

Tip: When issuing the wdmtrceng command on a Windows target, you must use the forward slash (/) in the directory path. For example:
wdmtrceng -e itsotiv1 c:/progra~1/tivoli/lcf/dat/1/LCFNEW/Tmw2k/Tmw2k.log 3 800000
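If the log path has been copied from a Windows system, it will normally contain backslashes. A one-line helper, shown here as a sketch with a function name of our own choosing, converts such a path before it is passed to wdmtrceng.

```sh
#!/bin/sh
# Sketch only: convert the backslashes in a Windows path to forward slashes.
to_forward_slashes() {
    # printf '%s' prints the argument literally, so backslashes are not
    # treated as escape sequences before tr replaces them.
    printf '%s\n' "$1" | tr '\\' '/'
}
```

For example, to_forward_slashes 'c:\progra~1\tivoli\lcf\dat\1\LCFNEW\Tmw2k\Tmw2k.log' prints the path in the form used in the tip. Keep the single quotes around the argument so that the shell itself does not interpret the backslashes.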

WMI
You can use the Workbench to see the functionality of the Windows resource model, and then use the CIM Studio to browse the CIM repository. CIM Studio provides the wbemdump command, which is a CLI tool that can query the CIM repository. These kinds of tools can be useful to determine the efficiency of the probes. The WMI log files record the activities of WMI in collecting the data required by the resource models. The WMI log files are located in the directory %SystemRoot%/system32/wbem/logs. For details, see the WMI documentation available in Windows Help, or see the WMI Tutorial available at: http://www.microsoft.com/downloads/release.asp?releaseid=12570

Non-Windows endpoints logs
The product maintains four logs at the endpoint:
▪ Endpoint engine update log
▪ Endpoint engine log and trace
▪ Endpoint native trace
▪ Endpoint JMX log


Note: The endpoint engine update log, endpoint engine log, endpoint native trace, and endpoint JMX log use the same configuration file for tracing. So changing one trace level of one log affects the other logs’ settings.

The wdmtrceng command changes the files $LCF_DATDIR/LCFNEW/Tmw2k/Unix/data/log_level and $LCF_DATDIR/LCFNEW/Tmw2k/Unix/data/log_size. In our environment, to change all trace levels to 3 and the trace size to 8000000, we issued the following command: wdmtrceng -e itsodev3 /Tivoli/lcf/dat/1/LCFNEW/Tmw2k/Unix/data/log_level 3 8000000

Endpoint engine update log
This log maintains details of the activities of the engine update process, which is the process that launches and controls the endpoint engine, as follows:

Process name: tmw2k_ep; this process starts and finishes very quickly, so it is not possible to see the running process.

Log name: trace_dmxeu.log (when the log is full, it is renamed as dmxeu.old, deleting any existing file with that name, and a new log file is created).

Location: $LCF_DATDIR/LCFNEW/AMW/logs.

Configuration: To configure the trace, issue the wdmtrceng command from the server or ManagedNode, identifying the endpoint at which you want to configure the log. You can set any of the following parameters:
  Trace: trace_dmxeu.log
  Trace level: Default is 0; you can set it to the following modes:
    0: Errors
    1: Warnings and errors: Trace level MID
    2: All steps of the monitoring: Trace level MAX
    3: Verbose output, all operations of the monitoring process: Trace level MAX and OTHER


Endpoint engine log and trace
This log maintains details of the activities of the engine, which is the process that runs the resource models and sends events and indications to the gateway, as follows:

Process name: java com.tivoli.dmunix.ep.agent.Main.

Log name: msg_dmxengine.log (when the log is full, it is renamed as dmxengine.old, deleting any existing file with that name, and a new log file is created).

Trace name: trace_dmxengine.log (when the log is full, it is renamed as dmxengine.old, deleting any existing file with that name, and a new log file is created).

Location: $LCF_DATDIR/LCFNEW/AMW/logs.

Configuration: To configure the trace, issue the wdmtrceng command from the server or ManagedNode, identifying the endpoint at which you want to configure the log. You can set any of the following parameters:
  Trace: trace_dmxeu.log
  Trace level: Default is 0; you can set it to the following modes:
    0: Errors
    1: Warnings and errors: Trace level MID
    2: All steps of the monitoring: Trace level MAX
    3: Verbose output, all operations of the monitoring process: Trace level MAX and OTHER

Endpoint native trace
This log maintains details of the activities of the native processes, which obtain the resource information required by the resource models, as follows:

Process name: java com.tivoli.dmunix.ep.agent.Main.

Log name: trace_dmxntv.log (when the log is full, it is renamed as dmxntv.old, deleting any existing file with that name, and a new log file is created).

Location: $LCF_DATDIR/LCFNEW/AMW/logs.

Configuration: To configure the trace, issue the wdmtrceng command from the server or ManagedNode, identifying the endpoint at which you want to configure the log. You can set any of the following parameters:
  Trace: trace_dmxeu.log
  Trace level: Default is 0; you can set it to the following modes:
    2: All steps of the monitoring: Trace level MAX
    3: Verbose output, all operations of the monitoring process: Trace level MAX and OTHER

Endpoint JMX log
This log maintains details of the activities of the JMX process, which is a Tivoli implementation of Java Management Extension. It is only written when the trace level is set to 3. The details are as follows:

Process name: Tmx4j.

Log name: Tmx4j_1.log (when the log is full, it is renamed as Tmx4j_2.log, deleting any existing file with that name, and a new log file is created).

Location: $LCF_DATDIR/LCFNEW/Tmw2k/UNIX.

Configuration: To configure the log, issue the wdmtrceng command from the server or ManagedNode, identifying the endpoint at which you want to configure the log. You should note that this command maintains a common configuration for all logs at a non-Windows endpoint. You can set any of the following parameters:
  Trace level: 2 (verbose)

Note: For the endpoint native trace and endpoint JMX logs, we only noticed information being written to the log after setting the log_level to a minimum value of 2.

15.4.5 Web Health Console logs and traces The Web Health Console has a facility for both standard message logging and advanced debug tracing. Message logging and minimum level debug tracing are always on and writing to their own files. These files can be found under /usr/Tivoli/AMW/logs.


Modifying Web Health Console tracing parameters
Tracing can be adjusted by modifying the tracing parameters for the Web Health Console application. Edit the WHC_INSTALL_DIR\installedApps\dm.ear\dm.war\WEB-INF\classes\com\ibm\dm\web\util\PDLog.properties file. You can set the tmeLogger.trc.level line to one of the following values:
tmeLogger.trc.level=DEBUG_MIN
tmeLogger.trc.level=DEBUG_MID
tmeLogger.trc.level=DEBUG_MAX

depending on how much tracing you want: MIN, MID, or MAX. MID provides a good amount of detail about Web Health Console operation, while MAX provides a great deal of detailed internal operation. You can also adjust the following lines to change the number of trace files written and the maximum size of each file before it rolls over to a new file:
file.maxFiles=3
file.maxFileSize=1024

Once these changes are made, you should stop and start the WebSphere Application Server to enable the changes, with the following commands:
▪ UNIX:
  export DISPLAY=machineName:0.0
  WHC_INSTALL_DIR/bin/stopServer.sh
  WHC_INSTALL_DIR/bin/startServer.sh
▪ Windows:
  WHC_INSTALL_DIR/bin/stopServer.bat
  WHC_INSTALL_DIR/bin/startServer.bat

Note: The Web Health Console will run slower while in MID or MAX tracing. This should be turned back to MIN as soon as possible.
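Because the trace level has to be raised for the investigation and lowered again afterwards, the edit of PDLog.properties is a good candidate for a small script. The following function is a sketch of our own for a UNIX installation; the function name is hypothetical, and it assumes that the file contains a single line starting with tmeLogger.trc.level=, as described above. It does not restart the WebSphere Application Server.

```sh
#!/bin/sh
# Sketch only: set the Web Health Console trace level in a PDLog.properties
# file. Usage: set_whc_trace_level <path to PDLog.properties> MIN|MID|MAX
set_whc_trace_level() {
    file=$1
    level=$2
    case $level in
        MIN|MID|MAX) ;;
        *) echo "level must be MIN, MID or MAX" >&2; return 1 ;;
    esac
    tmp=$file.tmp.$$
    # Rewrite only the tmeLogger.trc.level line; all other lines are copied
    # unchanged. The original file is replaced only if sed succeeds.
    sed "s/^tmeLogger\.trc\.level=.*/tmeLogger.trc.level=DEBUG_$level/" "$file" > "$tmp" &&
        mv "$tmp" "$file"
}
```

After running, for example, set_whc_trace_level with MAX, stop and start the application server as shown above, and remember to set the level back to MIN when the problem has been captured.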

WebSphere tracing
You can also enable WebSphere tracing. You will then be able to see all the requests and operations in the ApplicationServer perspective. To enable WebSphere tracing, run the following commands:
$WHC_INSTALL_DIR/bin/DrAdmin.sh -serverPort 7000 -setRingBufferSize 2048
$WHC_INSTALL_DIR/bin/DrAdmin.sh -serverPort 7000 -setTrace "com.ibm.*=all=enabled"

After these commands, the trace log can be located under $WHC_HOME/logs/default_server_stdout.log and $WHC_HOME/logs/default_server_stderr.log.


To disable the WebSphere tracing, run the command: $WHC_INSTALL_DIR/bin/DrAdmin.sh -serverPort 7000 -setTrace "com.ibm.*=all=disabled"

For further reference to WebSphere troubleshooting, please refer to IBM WebSphere Version 4.0: Advanced Edition Handbook, SG24-6176.

15.5 Tools In this section, we describe some tools provided with the product and other available tools.

15.5.1 Tool to generate XML file
The formatter program creates an XML-based file from the log or trace generated by IBM Tivoli Monitoring. It is located on the IBM Tivoli Monitoring Tools CD in the LogToXML directory. It accepts three parameters:
▪ The first parameter defines whether the product is dealing with a log file or a message file.
▪ The second parameter is the name of the source file (either a log or a message file).
▪ The third parameter is the name of the file to be created in XML.

Here is an example: prepareLog LOG trace_x.log trace_x.xml

Note: Before running the prepare log program, the Java Virtual Machine 1.3.0 path must be set.

15.5.2 Autotrace
Autotrace is process-tracing software from The Kernel Group Inc. (TKG) and is available on Solaris, HP-UX, Windows, and AIX platforms. It is used to collect information at an endpoint, which is stored in a configurable memory buffer. You choose when to write a snapshot of the buffer to a file, and you then send the file to Tivoli Customer Support for analysis. The information written to the trace file consists of the input and output parameters for each process call.


Autotrace consists of two elements:
▪ A trace collector enabled and controlled by you at the endpoint and Tivoli Management Region server
▪ A trace analyzer operated by the Tivoli Customer Support staff

For installation and use of Autotrace, see Example 15-5 on page 659, and refer to the IBM Tivoli Monitoring User’s Guide Version 5.1, SH19-4569.

15.5.3 Serviceability tasks IBM Tivoli Monitoring provides three serviceability tasks, as described in the following sections.

DMCollectEpLog
This task collects (in a tar file created at the endpoint) all the endpoint logs and information about the size and dates of the binaries, as well as the current and universal time the logs were created. The task accepts the name of the tar file as an argument. The tar file can be found at the $LCF_DATDIR endpoint directory.

For UNIX/Linux platforms, the following files are collected:
▪ $LCF_DATDIR/lcfd.log
▪ $LCF_DATDIR/lcfd.bk
▪ $LCF_DATDIR/last.cfg
▪ $LCF_DATDIR/LCFNEW/Tmw2k/Unix/Tmx4j1.log
▪ $LCF_DATDIR/LCFNEW/Tmw2k/Unix/Tmx4j2.log
▪ $LCF_DATDIR/LCFNEW/AMW/logs/trace_xxxxx.log
▪ $LCF_DATDIR/LCFNEW/AMW/logs/msg_xxxxx.log
▪ $LCF_DATDIR/LCFNEW/Tmw2k/Unix/data/dmxout.log (this is a file that traces errors at Java engine startup)

For Windows platforms, the following files are collected:
▪ %LCF_DATDIR%/lcfd.log
▪ %LCF_DATDIR%/lcfd.bk
▪ %LCF_DATDIR%/last.cfg
▪ %LCF_DATDIR%/LCFNEW/Tmw2k/Unix/Tmw2k.log


Any core dumps from the engine are not included in the tar file to avoid impacting the task performance. Core dumps can be found in the $LCF_DATDIR\LCFNEW\Tmw2k\Unix directory.

DMCollectMnLog
This task collects (in a tar file created at the ManagedNode in the $DBDIR directory) all the ManagedNode logs and traces, including event logs for Windows platforms. The task accepts the name of the tar file as an argument. The tar file can be found in the ManagedNode directory $DBDIR.

For UNIX/Linux platforms, the following files are collected:
▪ $DBDIR/oservlog
▪ $DBDIR/gatelog
▪ /tmp/traces/trace_tnmt_gtw_engn.log
▪ $DBDIR/AMW/logs/trace_xxxx.log
▪ $DBDIR/
▪ $DBDIR/

For Windows platforms, the following files are collected:
▪ %DBDIR%/oservlog
▪ %DBDIR%/gatelog
▪ %DBDIR%/tmp/traces/trace_tnmt_gtw_engn.log
▪ %DBDIR%/AMW/logs/trace_xxxx.log
▪ %DBDIR%/
▪ %DBDIR%/

DMCollectEpEnv
This task collects information about the environment at the endpoint. The data collected is written to a file using the Execute Task dialog (Save to File option). This task does not accept arguments.

For UNIX/Linux platforms, the following information is collected:
▪ Operating system version
▪ Disk space statistics and file system installation at the endpoint
▪ Memory statistics (available and used)
▪ Environment variable settings
▪ List of system patches installed

For Windows platforms, the output from the winmsd command is collected.
▪ For Windows 2000 the report is created in %LCF_DATDIR%/winmsdreport.txt.
▪ For Windows NT the report is created in %LCF_DATDIR%/.txt.


The output of this task is shown in Example 15-1.

Example 15-1 DMCollectEpEnv task output
tividc11:/>wruntask -t DMCollectEpEnv -l "Tivoli Distributed Monitoring (Advanced Edition) Tasks" -h vmlinux5
############################################################################
Task Name: DMCollectEpEnv
Task Endpoint: vmlinux5 (Endpoint)
Return Code: 0
------Standard Output-----
---> Platfom Linux
---> Operating System level 2.2.16
---> Disk space and inode information for file system /opt/tivoli/dat/1
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/dasdc1 708636 146452 526188 22% /opt
---> Statistics for memory
Amount of idle memory (kB): 8272
Amount of memory used as buffers (kB): 35736
---> Environment variables
BASH=/bin/sh
BASH_VERSINFO=([0]="2" [1]="04" [2]="0" [3]="1" [4]="release" [5]="s390-suse-linux")
BASH_VERSION='2.04.0(1)-release'
COLORTERM=1
DIRSTACK=()
ENDPOINT='vmlinux5 (Endpoint)'
ENDPOINT_OID=1931340152.16.517+#TMF_Endpoint::Endpoint#
EUID=0
FROM_HEADER=vmlinux5.itso.ibm.com
GNOMEDIR=/opt/gnome
GROUPS=()
HISTCONTROL=ignoredups
HOME=/root
HOSTNAME=vmlinux5
HOSTTYPE=s390
IFS=''
INFODIR=/usr/local/info:/usr/share/info:/usr/info
INFOPATH=/usr/local/info:/usr/share/info:/usr/info
INTERP=linux-s390
KDEDIR=/opt/kde
LCFROOT=/opt/tivoli
LCF_BINDIR=/opt/tivoli/bin/linux-s390/mrt
LCF_CACHEDIR=/opt/tivoli/dat/1/cache
LCF_DATDIR=/opt/tivoli/dat/1
LCF_LIBDIR=/opt/tivoli/lib/linux-s390
LCF_TEMPDIR=/tmp/
LC_CTYPE=en_US
LD_LIBRARY_PATH=/opt/tivoli/dat/1/cache/lib/linux-s390:/opt/tivoli/dat/1:/opt/tivoli/lib/linux-s390:/usr/lib
LESS='-M -S -I'
LESSCHARSET=latin1
LESSKEY=/etc/lesskey.bin
LESSOPEN='|lesspipe.sh %s'
LOGNAME=root
LS_COLORS='no=00:fi=00:di=01;34:ln=01:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:ex=01;31:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.tar=00;31:*.tgz=00;31:*.rpm=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.png=01;35:'
LS_OPTIONS='-a -N --color=tty -T 0'
MACHTYPE=s390-suse-linux
MAIL=/var/spool/mail/root
MANPATH=/usr/local/man:/usr/share/man:/usr/man:/usr/X11R6/man:/usr/openwin/man
MINICOM='-c on'
NLSPATH=/opt/tivoli/generic/msg_cat/%L/%N.cat:/opt/tivoli/generic/msg_cat/%l/%N.cat:/opt/tivoli/generic/msg_cat/C/%N.cat
NNTPSERVER=news
OPTERR=1
OPTIND=1
OSTYPE=linux
PAGER=less
PATH=/opt/tivoli/bin/linux-s390/tools:/bin:/usr/bin:/usr/ucb:/usr/sbin:/sbin:/usr/sbin:/usr/local/sbin:/root/bin:/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games/bin:/usr/games:/opt/gnome/bin:/opt/kde/bin
PIPESTATUS=([0]="0")
POVRAYOPT=-l/usr/lib/povray/include
PPID=23900
[email protected]
PRINTER=lp
PS4='+ '
PWD=/opt/tivoli/dat/1
QTDIR=/usr/lib/qt
RC_LANG=en_US
RC_LC_COLLATE=POSIX
REMOTEHOST=tividc11
SHELL=/bin/bash
SHELLOPTS=braceexpand:hashall:interactive-comments
SHLVL=2
SUSE_DOC_HOST=localhost
TERM=vt100
TEXINPUTS=':~/.TeX:/usr/share/doc/.TeX:/usr/doc/.TeX'
TISDIR=/opt/tivoli/dat/1
UID=0
USER=root
WINDOWMANAGER=/usr/X11R6/bin/kde
XKEYSYMDB=/usr/X11R6/lib/X11/XKeysymDB
XNLSPATH=/usr/X11R6/lib/X11/nls
_='---> Environment variables'
curdir=/opt/tivoli/dat/1
freemem=8272
ignoreeof=0
no_proxy=localhost
os=Linux
usedmem=35736
---> List of the patches installed
------Standard Error Output-----
############################################################################

15.6 Known problems and their resolutions IBM Tivoli Monitoring has some known issues that are reported in Appendix C of the IBM Tivoli Monitoring User’s Guide Version 5.1, SH19-4569.

15.7 Problem determination In this section, we describe, with examples, the necessary steps to troubleshoot a profile distribution and running engine on a Windows/UNIX endpoint.

Windows engine
In this example, we use the following components:
▪ Endpoint: itsotiv1
▪ Tmw2kProfile: tmw2k.windows.pf
▪ Resource model: TMW_ParamServices
  – Parameter: Schedule
  – Cycle Time = 60s
  – Indication Services Stopped Service
    • Occurrences=6, holes=0
    • Send TEC event
    • ActionList = dm_mn_send_notice, Restart Service


Windows endpoint profile distribution
Before starting the distribution, we changed the debug level of the following components:
TMR
▪ MDist 2
  wmdist -D 9
▪ Gateway
  wgateway gateway set_debug_level 6
▪ Endpoint
  last.cfg - log_threshold=3

The first profile distribution compiles the MOF files to create the associations in the CIM repository. Example 15-2 shows an example of an ITM profile distribution on a Windows platform.

Example 15-2 Profile distribution in Windows environment
# Distributing the Tmw2kProfile
tividc11:/>wdmdistrib -p tmw2k.windows.pf itsotiv1
AMW0162I - Operation successfully submitted. Distribution ID is 1931340152.249

# Checking distribution status
tividc11:/>wmdist -l -i 1931340152.249
Name Distribution ID Targets Completed Successful Failed
tmw2k.windows.pf(install) 1931340152.249 1 1(100%) 1(100%)

# Checking log TMR - $DBDIR/distmgr.log. Verifying distribution ID
# 1931340152.249
2002/05/09 11:24:18 +06: Registering new distribution, ID = 1931340152.249
2002/05/09 11:24:20 +06: DBStatusUpdate started for ID=
2002/05/09 11:24:20 +06: retrieving repeater nodes via nodeElements() for distribution
2002/05/09 11:24:21 +06: DBStatusUpdate completed for ID=
2002/05/09 11:24:22 +06: DBStatusUpdate started for ID=
2002/05/09 11:24:23 +06: retrieving repeater nodes via nodeElements() for distribution


2002/05/09 11:24:23 +06: retrieving repeater nodes via nodeElements() for distribution
2002/05/09 11:24:24 +06: DBStatusUpdate completed for ID=
2002/05/09 11:24:25 +06: DBStatusUpdate started for ID=
2002/05/09 11:24:25 +06: retrieving repeater nodes via nodeElements() for distribution
2002/05/09 11:24:26 +06: DBStatusUpdate completed for ID=
2002/05/09 11:24:33 +06: DBStatusUpdate started for ID=
2002/05/09 11:24:33 +06: retrieving target nodes via nodeElements() for distribution
2002/05/09 11:24:34 +06: retrieving repeater nodes via nodeElements() for distribution
2002/05/09 11:24:35 +06: DBStatusUpdate completed for ID=

# Checking log TMR - $DBDIR/AMW/logs/msg_tmw2k.windows.pf.log
1020961460000Thu May 9 11:24:20 2002GMTAMWcoremanaged_node21 028AMW --> 1931340152.249 - tmw2k.windows.pf(install) - AMW0162I Operation successfully submitted. Distribution ID is '1931340152.249'.
1020961475000Thu May 9 11:24:35 2002 GMTAMW coreitsotiv1 12526AMW MAX../../../../src/objects/TMNTUpcall/platform/NTTask_Engine_meth_imp.cpp< F>t_imp_DMMiddleLayer_Processor_TMNT_Task_start536992792MAX../../../../src/objects/TMNTUpcall/platform/NTTask_Engine_meth_imp.cpp< F>t_imp_DMMiddleLayer_Processor_TMNT_Task_terminate537003944return data = 0None.
1020963214000Thu May 9 16:53:34 2002GMTAMWtasktividc1111878< F>MAX../../../../src/objects/TMNTUpcall/platform/NTTask_Engine_meth_imp.cpp< F>t_imp_DMMiddleLayer_Processor_TMNT_Task_terminate537003944

wdmdistrib -p tmw2k.unix.pf -J /backup/Tools/Jre itsodev3
AMW0162I - Operation successfully submitted. Distribution ID is 1931340152.263
tividc11:/>wmdist -l -i 1931340152.263
Name Distribution ID Targets Completed Successful Failed
tmw2k.unix.pf(install) 1931340152.263 1 1(100%) 1(100%) 0( 0%).

# Checking log TMR - $DBDIR/distmgr.log. Verifying distribution ID
# 1931340152.263
2002/05/14 11:40:30 : Registering new distribution, ID = 1931340152.263
2002/05/14 11:40:32 : DBStatusUpdate started for ID=
2002/05/14 11:40:32 : retrieving repeater nodes via nodeElements() for distribution
2002/05/14 11:40:33 : DBStatusUpdate completed for ID=
2002/05/14 11:43:01 : DBStatusUpdate started for ID=
2002/05/14 11:43:01 : retrieving target nodes via nodeElements() for distribution
2002/05/14 11:43:02 : retrieving repeater nodes via nodeElements() for distribution
2002/05/14 11:43:02 : DBStatusUpdate completed for ID=

# Checking log TMR - $DBDIR/AMW/logs/msg_tmw2k.unix.pf.log
1021376431000Tue May 14 11:40:31 2002 GMTAMWcoremanaged_node6510AMW --> 1931340152.263 tmw2k.unix.pf(install) - AMW0162I - Operation successfully submitted. Distribution ID is '1931340152.263'.
1021376583000Tue May 14 11:43:03 2002 GMTAMW core itsodev3 12744AMW >>> ENTRY
May 14 11:31:41 Q mdist2_operation_receiver init_passing_data >>>> ENTRY
May 14 11:31:41 Q mdist2_operation_receiver load_status_file >>>> ENTRY
May 14 11:31:41 Q mdist2_operation_receiver Status file '/Tivoli/lcf/dat/1/LCFNEW/Tmw2k/dist.st' does not exist. New distribution!!!
May 14 11:31:41 Q mdist2_operation_receiver return data = 0
May 14 11:31:41 Q mdist2_operation_receiver load_status_file

qid=1
DBG|ASIEPSServer::LookupQueueSettings: Path="D:\TivoliManager\Data\Queues\ROOT-0001.que" MaxEntries=65536 CellSize=2048
DBG|ASIEPSServer::Enqueue: qid=00000001 length=896 returned 1
DBG|ASIEPSServer::Enqueue: qid=00000001 length=896 returned 1
. . .
NOT|Shutting down due to service control stop command
INF|Terminating worker thread
NOT|Service shut down
INF|Cleaning up

Figure 16-14 Enqueue proxy server log

Chapter 16. Tivoli Business Systems Manager


As shown in Figure 16-14 on page 693:
1  Indicates that the enqueue proxy server is ready.
2  The server connects to the queue ROOT-0001.que, a queue file for the propagation agent for ROOT object. The queue ID (qid) is 1.
3  When a message is enqueued, the server indicates the qid that stored the message and the length of the message.
4  Shutdown sequence of the enqueue proxy server.
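The Tivoli Business Systems Manager service logs shown in this chapter prefix each entry with a three-letter code and a vertical bar (DBG, INF, and NOT appear in the figures). A quick tally of these prefixes gives a first impression of a log before reading it in detail. The following function is a sketch of our own, not a product tool, and it assumes one entry per line.

```sh
#!/bin/sh
# Sketch only: count log entries per three-letter prefix (for example DBG,
# INF, NOT). Reads the named files, or standard input if no file is given.
count_by_severity() {
    # Split each line at "|"; lines without a prefix (such as ". . .")
    # have only one field and are skipped.
    awk -F'|' 'NF > 1 && $1 ~ /^[A-Z][A-Z][A-Z]$/ { n[$1]++ }
               END { for (s in n) print s, n[s] }' "$@" | sort
}
```

The output is sorted by prefix, one line per prefix followed by its count. Any prefix that is not DBG, INF, or NOT is counted in the same way, so unexpected codes stand out immediately.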

Remote execution server The ASIRemoteExecutionServer or Tivoli BSM Remote Execution Server runs in the propagation server to facilitate starting and stopping the propagation agent. The propagation agent itself is not started as a service, but as an ordinary process. Figure 16-15 on page 695 shows a sample remote execution server log file.


INF|ASIRExecSrvr: 2.1.7.11
DBG|Adding an interface to ASIRExecServer Server
DBG|RpcServerRegisterIf returned: 0
DBG|Adding the all protocols, dynamic endpoints to ASIRExecServer Server
DBG|RpcServerUseProtseqEp returned: 0
DBG|Adding an interface to ASIRExecServer Server
DBG|RpcServerRegisterIf returned: 0
DBG|Initializing ASIRExecServer Server
DBG|Obtaining the binding vector
DBG|Printing the binding vector for ASIRExecServer Server
DBG|0: ncacn_np:\\\\IBMTIV5[\\pipe\\0000010D.001]
DBG|1: ncalrpc:IBMTIV5[WMSG0000010D.00000001]
DBG|2: ncacn_ip_tcp:9.3.4.55[1040]
DBG|3: ncadg_ip_udp:9.3.4.55[1041]
DBG|4: ncacn_nb_tcp:IBMTIV5[107]
DBG|5: ncacn_http:9.3.4.55[1053]
DBG|Exporting the ASIRExecServer Server as /.:/ASIRExecServer_IBMTIV5 to the name service database for 2 interfaces
DBG|Registering endpoints for the ASIRExecServer Server in the endpoint map for 2 interfaces
DBG|RpcEpUnregister returned: 1753
DBG|RpcEpUnregister returned: 1753
DBG|ASIRExecServer Server Started. Ready and Waiting to process Client Requests...
. . .
DBG|ASIRExecServer::ExecSvc: asipagent 1
INF|ASIRExecServer::ExecSvc: asipagent 1 --> 406
. . .
DBG|ASIRExecServer::KillSvc: 406
INF|ASIRExecServer::KillSvc: 406 --> 259
. . .
NOT|Shutting down due to service control stop command
INF|Terminating worker thread
NOT|Service shut down
INF|Cleaning up

Figure 16-15 Remote execution server log file

In Figure 16-15, the indicated portions of the log are:

1. Initialization of the Remote execution server.

2. Processing a remote execution request; the command line is asipagent 1. The return code is the process ID (406).

3. Processing a remote kill request for process ID 406.

4. Shutting down the Remote execution server.
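The execution and kill requests described above can be pulled out of a remote execution server log with a short script. The following Python sketch assumes the entry layout seen in Figure 16-15 (`INF|ASIRExecServer::ExecSvc: <argument> --> <number>`); the function name `remote_exec_requests` and the sample text are illustrative only and are not part of the product.

```python
import re

# Matches INF entries such as "INF|ASIRExecServer::ExecSvc: asipagent 1 --> 406".
# The layout is an assumption taken from Figure 16-15; other releases may differ.
ENTRY = re.compile(r"INF\|ASIRExecServer::(ExecSvc|KillSvc): (.+?) --> (\d+)")


def remote_exec_requests(log_text):
    """Return (request type, argument, logged result) for each request."""
    return [(kind, arg.strip(), int(result))
            for kind, arg, result in ENTRY.findall(log_text)]


# Excerpt of Figure 16-15, kept on one line as in a flattened log.
sample = (
    "DBG|ASIRExecServer::ExecSvc: asipagent 1 "
    "INF|ASIRExecServer::ExecSvc: asipagent 1 --> 406 . . . "
    "DBG|ASIRExecServer::KillSvc: 406 "
    "INF|ASIRExecServer::KillSvc: 406 --> 259"
)

for request in remote_exec_requests(sample):
    print(request)
```

Matching the process ID returned by an ExecSvc entry against the argument of a later KillSvc entry is one way to confirm that the propagation agent that was started is the one that was stopped.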


Propagation agent

The propagation agent is an executable, ASIPAgent.exe. The argument is the instance ID of the PA object. The propagation agent log file is given in Figure 16-16 and Figure 16-17 on page 697.

DBG|Connecting to database with connection [Provider=MSDASQL;DRIVER=SQL Server;SERVER=ibmtiv5;APP=Object-1], integrated security=0...
DBG|Registering for ADO Connection events...
DBG|Opening database with connection [Provider=MSDASQL;DRIVER=SQL Server;SERVER=ibmtiv5;APP=Object-1], integrated security=0...
DBG|ADO Connection Open.
DBG|Successfully connected to database as [Object-1].
DBG|Database Client Process ID (SPID) = 30
DBG|ANSI_WARNINGS setting is OFF
DBG|QUOTED_IDENTIFIER setting is OFF
DBG|adCmdStoredProc(4) - { ? = call pa_setVersionInfo(?, ?) }( /*RETURN_VALUE=(R)*/@pa_id=1/*(I)*/, @version=N'02.01.07.12'/*(I)*/)
DBG|[Microsoft][ODBC SQL Server Driver][SQL Server]pa_setVersionInfo @pa_id = 1, @version = '02.01.07.12'
DBG|adCmdStoredProc(4) - { ? = call pa_setVersionInfo(?, ?) }( /*RETURN_VALUE=0(R)*/@pa_id=1/*(I)*/, @version=N'02.01.07.12'/*(I)*/)
DBG|Closing database connection [PA ROOT]...
DBG|ADO Connection Closed.
DBG|Unregistering for ADO Connection events...
DBG|Releasing database connection [PA ROOT]...
DBG|Successfully closed database connection [PA ROOT].
DBG|Connecting to database with connection [Provider=SQLOLEDB;DRIVER=SQL Server;SERVER=ibmtiv5;APP=PA ROOT;], integrated security=0...
DBG|Registering for ADO Connection events...
DBG|Opening database with connection [Provider=SQLOLEDB;DRIVER=SQL Server;SERVER=ibmtiv5;APP=PA ROOT;], integrated security=0...
DBG|ADO Connection Open.
DBG|Successfully connected to database as [PA ROOT].
DBG|Database Client Process ID (SPID) = 30
DBG|ANSI_WARNINGS setting is OFF
DBG|QUOTED_IDENTIFIER setting is OFF
DBG|adCmdStoredProc(4) - { ? = call asisp_checkForDuplicateProcess(?) }( /*RETURN_VALUE=(R)*/@OldSPID=30/*(I)*/)
DBG|adCmdStoredProc(4) - { ? = call asisp_checkForDuplicateProcess(?) }( /*RETURN_VALUE=0(R)*/@OldSPID=30/*(I)*/)
INF|EPSLookupQueueSettings: begin
INF|EPSLookupQueueSettings: return=23769480
INF|Initializing object class A1SC(Acc1Subcomponent)
INF|Initializing object class ACC1(Access1)
. . .
INF|Initializing object class XTRM(NetworkXTerminal)
DBG|Synchronizing with registry
. . .

Figure 16-16 Propagation agent log-1/2


INF|code=ExceptionDeleted eventno=1882 src_cno=836 src_id=2 dst_id=6 _DATE=2002/08/19 _TIME=16:06:51 _MSEC=0 _SEQ_NO=" " _OBJ_TYPE_ID="G02D " _OBJECT_NAME=" " _NATIVE_KEY=" " _EXCP_OBJ_TYPE=" " _EXCP_OBJ_NAME=" " _EXCP_NAME="WebSphere_MQ_Que" _EXCP_CD="e_MQ_QueueManagerUnavailable" _nEXCH=2 _AlertState=3 _Priority=2 _NumOwned=0
DBG|adCmdStoredProc(4) - { ? = call pa_prefetchPendingEventObjects(?, ?, ?) }( /*RETURN_VALUE=(R)*/@pa_id=1/*(I)*/, @last_eventno=1881/*(I)*/, @max_events=1000/*(I)*/)
NOT|Prefetched to eventno 1882. Last dispatched eventno is 1882
DBG|adCmdStoredProc(4) - { ? = call pa_prefetchPendingEventObjects(?, ?, ?) }( /*RETURN_VALUE=0(R)*/@pa_id=1/*(I)*/, @last_eventno=1881/*(I)*/, @max_events=1000/*(I)*/)
DBG|adCmdStoredProc(4) - { ? = call asisp_loadpropmatrices(?, ?, ?, ?) }( /*RETURN_VALUE=(R)*/@cid=N'G02D'/*(I)*/, @id=2/*(I)*/ /*@fix=(I)*/ /*@include_thresholds=(I)*/)
DBG|adCmdStoredProc(4) - { ? = call asisp_loadpropmatrices(?, ?, ?, ?) }( /*RETURN_VALUE=0(R)*/@cid=N'G02D'/*(I)*/, @id=2/*(I)*/ /*@fix=(I)*/ /*@include_thresholds=(I)*/)
DBG|PRMX             PRST  PRBA   Critical High Medium Low Ignore PRMX_ID PRST_ID PRBA_ID
DBG|ChildEventMatrix Count Yellow        0    0      0   0      0       0       0  120185
DBG|ChildEventMatrix Count Red           0    0      0   0      0       0       0  120186
DBG|ChildEventMatrix Max   Yellow        0    0      0   0      0       0   60569  120101
DBG|ChildEventMatrix Max   Red           0    0      0   0      0       0   60569  120102
DBG|ExceptionMatrix  Count Yellow        0    0      0   0      0       1       0  120187
DBG|ExceptionMatrix  Count Red           0    1      0   0      0       1       0  120188
DBG|ExceptionMatrix  Max   Yellow        0    0      0   0      0       1   60570  120103
DBG|ExceptionMatrix  Max   Red           0    0      0   0      0       1   60570  120104
NOT|ACTION 1 6 836 2: Setting AlertState to 1
NOT|ACTION 1 6 836 2: Generated Red/High ChildEvent with EventCount=2, Direction=-1
DBG|UPDATE G02Dcname_V SET _AlertStateID = 1 WHERE id = 2
DBG|asisp_notifyObjAttrUpdated @cid='G02D', @id=2, @value='GENC/STCR-1/ALRS-1/T/Desc', @attrname='AlertState', @attrid=0
DBG|asisp_notifyObjAttrUpdated @cid='G02D', @id=2, @value='1', @attrname='AlertStateID', @attrid=0
DBG|asisp_createCHEV 'UNIX-1', 'UNIX', 1
DBG|delete_CHEV 19

Figure 16-17 Propagation agent log-2/2


As indicated in Figure 16-16 on page 696 and Figure 16-17 on page 697, the following constitutes the propagation agent processing:

1. The queue information is retrieved from the enqueue proxy server.

2. At startup, the agent loads the propagation information for all the available classes. When a class is added, or the propagation properties of a class change, the propagation agent must be restarted.

3. An event arrives for an object of class G02D (class number 836) with instance ID 2. The alert state is 3 (Red) and the priority is 2 (High).

4. The propagation agent loads the propagation matrices for the G02D class, including both the Child Event and Exception matrices.

5. The agent runs the SQL query to set the alert state of the object to 1.

6. The notification is invoked by the change of alert state.

7. A child event is generated for its parent, an object of class UNIX with the instance ID of 1.
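The event entry at the top of Figure 16-17 is a sequence of name=value pairs, some of them quoted and padded with blanks. A minimal Python sketch for turning such an entry into a dictionary follows; it assumes only the layout visible in the figure, and the function name `parse_event` is hypothetical.

```python
import re

# A field is name=value, where the value is either a quoted string
# (possibly blank-padded) or a run of non-blank characters.
FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_event(entry):
    """Split one propagation agent event entry into a dict of strings."""
    fields = {}
    for name, raw in FIELD.findall(entry):
        # Drop the surrounding quotes, then the blank padding inside them.
        fields[name] = raw.strip('"').strip()
    return fields


# Shortened copy of the event entry in Figure 16-17.
entry = (
    'INF|code=ExceptionDeleted eventno=1882 src_cno=836 src_id=2 dst_id=6 '
    '_OBJ_TYPE_ID="G02D " _OBJECT_NAME=" " _EXCP_NAME="WebSphere_MQ_Que" '
    '_AlertState=3 _Priority=2 _NumOwned=0'
)

event = parse_event(entry)
print(event["code"], event["_OBJ_TYPE_ID"], event["src_id"],
      event["_AlertState"], event["_Priority"])
```

With the fields separated, the class (`_OBJ_TYPE_ID`), instance (`src_id`), alert state, and priority of each event can be compared directly with the ACTION and UPDATE entries that follow it in the log.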

Notification services for workstation

Your enterprise is managed from the Tivoli Business Systems Manager workstation. The workstation is connected to the system through the Tivoli BSM Application server, or ASIApplicationSvc process. Figure 16-18 shows the workstation processing component.

[Diagram components: MSSQL Database, ASINotificationSvc, ASIApplicationSvc]

Figure 16-18 Workstation processing component


The Application server provides the workstations with the database access function. The workstations never access the database directly. Whenever an object is changed, created, or deleted, a database trigger ensures that the PendingNotification table is updated. This table is used by the Tivoli BSM Notification service, or ASINotificationSvc, to notify all the workstations that are connected to it of the changes.

Notification service processing

The startup log of the notification server is shown in Figure 16-19 on page 700.


INF|Entering ASIServiceApp::loadAppHookDll()
INF|Attempting to dynamically load [D:\TivoliManager\bin\ASINotificationSvc.dll]...
INF|Obtaining .DLL entry point with GetProcAddress().
INF|Successfully loaded Service Library (D:\TivoliManager\bin\ASINotificationSvc.dll)
DBG|Calling ASIServiceDLLLoadFunc(@1) function...
DBG|Call to ASIServiceDLLLoadFunc(@1) function succeeded.
DBG|Exiting ASIServiceApp::initialize().
DBG|Calling ASIServiceApp::start().
DBG|Entering ASIServiceApp::start()
DBG|Setting DesiredState registry to RUNNING.
DBG|Calling start() on the dynamically loaded .DLL
INF|Service Status Report: CurrentState=STARTING WaitHint=15s ExitCode=0 CheckPoint=0
NOT|ASINotificationSvc is STARTING...
INF|Service Status Report: CurrentState=STARTING WaitHint=15s ExitCode=0 CheckPoint=0
DBG|Connecting to database with connection [Provider=MSDASQL;DRIVER=SQL Server;SERVER=ibmtiv5;APP=ASINotificationSvc], integrated security=0...
DBG|Registering for ADO Connection events...
DBG|Opening database with connection [Provider=MSDASQL;DRIVER=SQL Server;SERVER=ibmtiv5;APP=ASINotificationSvc], integrated security=0...
DBG|ADO Connection Open.
DBG|Successfully connected to database as [ASINotificationSvc].
DBG|Database Client Process ID (SPID) = 28
DBG|ANSI_WARNINGS setting is OFF
DBG|QUOTED_IDENTIFIER setting is OFF
INF|Service Status Report: CurrentState=STARTING WaitHint=5s ExitCode=0 CheckPoint=1
INF|Service Status Report: CurrentState=STARTING WaitHint=5s ExitCode=0 CheckPoint=2
INF|Service Status Report: CurrentState=STARTING WaitHint=5s ExitCode=0 CheckPoint=3
INF|Service Status Report: CurrentState=RUNNING WaitHint=0s ExitCode=0 CheckPoint=4
NOT|Notification dequeuing thread starting
NOT|RPC server thread starting
DBG|ASINotificationSvc is RUNNING...
INF|Adding an interface to ASINotification Server
DBG|Service Status Report: CurrentState=RUNNING WaitHint=15s ExitCode=0 CheckPoint=0
DBG|RpcServerRegisterIf returned: 0
DBG|Adding the all protocols, dynamic endpoints to ASINotification Server
CRT|CoInitialize has not been called.

Figure 16-19 Notification services: Startup

As seen in the log, all the currently active workstation sessions are acquired and stored. These sessions are identified by:
• TCP/IP address, such as ncacn_ip_tcp:9.3.240.132[1546]
• User ID that is logged on to the machine
• Windows networking OLE number, such as ncalrpc:AUSRES10[OLE4f]
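The session identifiers listed above follow the general shape of an RPC string binding: a protocol sequence (for example, ncacn_ip_tcp for connection-oriented TCP/IP, or ncalrpc for local RPC), a network address, and an endpoint in square brackets. The following Python sketch splits such a value into its parts. It handles only the simple form seen in these logs, and the function name `parse_binding` is an illustrative assumption.

```python
import re

# protocol_sequence:network_address[endpoint]
# Only the simple form seen in the TBSM logs is handled here; bindings with
# an object UUID prefix or endpoint options are outside this sketch.
BINDING = re.compile(
    r"^(?P<protseq>[a-z_]+):(?P<address>[^\[]*)\[(?P<endpoint>[^\]]*)\]$"
)


def parse_binding(binding):
    """Return the protocol sequence, address, and endpoint of a binding."""
    match = BINDING.match(binding)
    if match is None:
        raise ValueError(f"unrecognized binding: {binding!r}")
    return match.groupdict()


for text in ("ncacn_ip_tcp:9.3.240.132[1546]", "ncalrpc:AUSRES10[OLE4f]"):
    print(parse_binding(text))
```

For a TCP/IP binding, the address and endpoint give the workstation IP address and port, which can then be checked with ordinary network tools when a workstation is not receiving notifications.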


When a notification event is received, as shown in Figure 16-20, the notification service notifies all the connected workstations. The notification is usually sent using the TCP/IP interface. Each workstation is notified using a separate notification thread.

DBG|ASINotificationServer::registerInstance -- Registering this server instance
DBG|adCmdStoredProc(4) - { ? = call asisp_addappinstance(?, ?, ?, ?) }( /*RETURN_VALUE=(R)*/@app=N'ASINotificationSvc_IBMTIV5'/*(I)*/, @host=N'IBMTIV5'/*(I)*/, @binding=N'ncacn_ip_tcp:9.3.4.55[3031]'/*(I)*/, @pid=267/*(I)*/)
DBG|[Microsoft][ODBC SQL Server Driver][SQL Server]added entry to application_session (application: ASINotificationSvc_IBMTIV5, host: IBMTIV5, binding handle:ncacn_ip_tcp:9.3.4.55[3031], pid: 267)
DBG|adCmdStoredProc(4) - { ? = call asisp_addappinstance(?, ?, ?, ?) }( /*RETURN_VALUE=0(R)*/@app=N'ASINotificationSvc_IBMTIV5'/*(I)*/, @host=N'IBMTIV5'/*(I)*/, @binding=N'ncacn_ip_tcp:9.3.4.55[3031]'/*(I)*/, @pid=267/*(I)*/)
DBG|ASINotificationServer::reloadActiveSessions begin
DBG|adCmdStoredProc(4) - { ? = call asisp_activesessions }( /*RETURN_VALUE=(R)*/)
DBG|adCmdStoredProc(4) - { ? = call asisp_activesessions }( /*RETURN_VALUE=0(R)*/)
DBG|ASINotificationServer::reloadActiveSessions waiting on ServerNotify::m_cs
DBG|ASINotificationServer::reloadActiveSessions acquired ServerNotify::m_cs
DBG|ASINotificationServer::reloadActiveSessions: adding client[0]: 'ncacn_ip_tcp:9.3.4.55[3183]', 'vbudi', 'ncalrpc:IBMTIV5[OLE33]'
INF|Adding client context for { 'ncacn_ip_tcp:9.3.4.55[3183]' 'vbudi' 'ncalrpc:IBMTIV5[OLE33]' }
DBG|ASINotificationServer::reloadActiveSessions return
DBG|ASINotification Server Started. Ready and Waiting to process Client Requests...
DBG|ASINotificationServer::unregisterInstance -- Unregistering this server instance
DBG|adCmdStoredProc(4) - { ? = call asisp_removeappinstance(?) }( /*RETURN_VALUE=(R)*/@app=N'ASINotificationSvc_IBMTIV5'/*(I)*/)
DBG|adCmdStoredProc(4) - { ? = call asisp_removeappinstance(?) }( /*RETURN_VALUE=0(R)*/@app=N'ASINotificationSvc_IBMTIV5'/*(I)*/)
DBG|RpcServerListen returned: 0

Figure 16-20 Notification services: Processing

Figure 16-21 on page 702 shows the shutdown process of the notification service. It first tries to terminate all the notification threads servicing the workstations.


DBG|-------------------------------------
DBG|ASINotification Server Thread has Exited.
DBG|-------------------------------------
DBG|ASINotificationServer::unregisterInstance -- Unregistering this server instance
DBG|adCmdStoredProc(4) - { ? = call asisp_removeappinstance(?) }( /*RETURN_VALUE=(R)*/@app=N'ASINotificationSvc_IBMTIV5'/*(I)*/)
DBG|adCmdStoredProc(4) - { ? = call asisp_removeappinstance(?) }( /*RETURN_VALUE=0(R)*/@app=N'ASINotificationSvc_IBMTIV5'/*(I)*/)
NOT|RPC server thread exiting
NOT|Dequeue, Forwarding, and RPCServer threads have shut down
INF|Service Status Report: CurrentState=STOPPING WaitHint=15s ExitCode=0 CheckPoint=0
DBG|Closing database connection [ASINotificationSvc]...
DBG|ADO Connection Closed.
DBG|Unregistering for ADO Connection events...
DBG|Releasing database connection [ASINotificationSvc]...
DBG|Successfully closed database connection [ASINotificationSvc].
NOT|---- All support threads returned ---
INF|Service Status Report: CurrentState=STOPPED WaitHint=15s ExitCode=0 CheckPoint=0
NOT|ASINotificationSvc is STOPPED...
INF|Service Status Report: CurrentState=STOPPING WaitHint=15s ExitCode=0 CheckPoint=0
NOT|ASINotificationSvc is STOPPING...
DBG|Exiting ASIServiceApp::svcMain(). Stopping service...
DBG|Entering ASIServiceApp::stop()
INF|Service Status Report: CurrentState=STOPPING WaitHint=15s ExitCode=0 CheckPoint=0
INF|Service Status Report: CurrentState=STOPPED WaitHint=15s ExitCode=0 CheckPoint=0

Figure 16-21 Notification services: Shutdown
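When reading service logs such as those in Figure 16-19 and Figure 16-21, it can help to extract only the Service Status Report entries and follow the state changes. The Python sketch below assumes the entry layout shown in those figures; the function name `status_reports` and the dictionary keys are illustrative choices, not product terms.

```python
import re

# Layout assumed from the figures:
# Service Status Report: CurrentState=X WaitHint=Ns ExitCode=N CheckPoint=N
# \s+ between fields tolerates entries that were wrapped across lines.
STATUS = re.compile(
    r"Service Status Report: CurrentState=(\w+)\s+WaitHint=(\d+)s\s+"
    r"ExitCode=(-?\d+)\s+CheckPoint=(\d+)"
)


def status_reports(log_text):
    """Return every status report in the log, in order of appearance."""
    return [
        {"state": state, "wait_hint_s": int(hint),
         "exit_code": int(code), "checkpoint": int(checkpoint)}
        for state, hint, code, checkpoint in STATUS.findall(log_text)
    ]


# Excerpt in the style of Figure 16-21.
sample = (
    "INF|Service Status Report: CurrentState=STOPPING WaitHint=15s "
    "ExitCode=0 CheckPoint=0 "
    "NOT|ASINotificationSvc is STOPPING... "
    "INF|Service Status Report: CurrentState=STOPPED WaitHint=15s "
    "ExitCode=0 CheckPoint=0"
)

for report in status_reports(sample):
    print(report["state"], report["exit_code"], report["checkpoint"])
```

A non-zero exit code, or a service that reports STARTING or STOPPING without ever reaching RUNNING or STOPPED, is a reasonable place to begin looking at the surrounding entries.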

Application server processing

The application server is the main workstation controller for Tivoli Business Systems Manager. Figure 16-22 on page 703 and Figure 16-23 on page 704 show the startup of the application server. During the startup phase, it does the following:
• Loads the active sessions, identified by:
  – TCP/IP address, such as ncacn_ip_tcp:9.3.240.132[1546]
  – User ID that is logged on to the machine
  – Windows networking OLE number, such as ncalrpc:AUSRES10[OLE4f]
• Initializes application server interfaces. The most useful interface that we used is the TCP/IP interface, which in Figure 16-23 is bound with the handle ncacn_ip_tcp:9.3.4.55[3026].


INF|Entering ASIServiceApp::loadAppHookDll()
INF|Attempting to dynamically load [D:\TivoliManager\bin\ASIApplicationSvc.dll]...
INF|Obtaining .DLL entry point with GetProcAddress().
INF|Successfully loaded Service Library (D:\TivoliManager\bin\ASIApplicationSvc.dll)
DBG|Calling ASIServiceDLLLoadFunc(@1) function...
DBG|Call to ASIServiceDLLLoadFunc(@1) function succeeded.
DBG|Exiting ASIServiceApp::initialize().
DBG|Calling ASIServiceApp::start().
DBG|Entering ASIServiceApp::start()
DBG|Setting DesiredState registry to RUNNING.
DBG|Calling start() on the dynamically loaded .DLL
INF|Service Status Report: CurrentState=STARTING WaitHint=15s ExitCode=0 CheckPoint=0
NOT|ASIApplicationSvc is STARTING...
INF|Service Status Report: CurrentState=STARTING WaitHint=20s ExitCode=0 CheckPoint=1
DBG|Connecting to database with connection [Provider=MSDASQL;DRIVER=SQL Server;SERVER=ibmtiv5;APP=ASIApplicationSvc-000], integrated security=0...
DBG|Registering for ADO Connection events...
DBG|Opening database with connection [Provider=MSDASQL;DRIVER=SQL Server;SERVER=ibmtiv5;APP=ASIApplicationSvc-000], integrated security=0...
DBG|ADO Connection Open.
DBG|Successfully connected to database as [ASIApplicationSvc-000].
DBG|Database Client Process ID (SPID) = 10
DBG|ANSI_WARNINGS setting is OFF
DBG|QUOTED_IDENTIFIER setting is OFF
. . .
DBG|Successfully connected to database as [ASIApplicationSvc-015].
DBG|Database Client Process ID (SPID) = 26
DBG|ANSI_WARNINGS setting is OFF
DBG|QUOTED_IDENTIFIER setting is OFF
DBG|Allocated DB connection instance ASIApplicationSvc-000: #Allocated=1
DBG|ASIRequestServer::reloadActiveSessions begin
DBG|adCmdStoredProc(4) - { ? = call asisp_languageFromLCID(?, ?) }( /*RETURN_VALUE=(R)*/@lcid=1033/*(I)*/, @language=NULL/*(IO)*/)
DBG|[Microsoft][ODBC SQL Server Driver][SQL Server]asisp_languageFromLCID: @lcid=1033
DBG|Stored Procedure Output Params = {[RETURN_VALUE-Long(0)], [@language-String(255)]}
DBG|Stored Procedure Output Params = {0, English}
DBG|adCmdStoredProc(4) - { ? = call asisp_languageFromLCID(?, ?) }( /*RETURN_VALUE=0(R)*/@lcid=1033/*(I)*/, @language=N'English'/*(IO)*/)

Figure 16-22 Application server: Startup (1 of 2)


DBG|Setting language to [English] from LocaleID=0x409
DBG|[Microsoft][ODBC SQL Server Driver][SQL Server]Changed language setting to us_english.
DBG|ASIRequestServer::reloadActiveSessions waiting on ServerNotify::m_cs
DBG|ASIRequestServer::reloadActiveSessions acquired ServerNotify::m_cs
DBG|ASIRequestServer::reloadActiveSessions: adding client[0]: 'ncacn_ip_tcp:9.3.4.55[3183]', 'vbudi', 'ncalrpc:IBMTIV5[OLE33]'
INF|Adding client context for { 'ncacn_ip_tcp:9.3.4.55[3183]' 'vbudi' 'ncalrpc:IBMTIV5[OLE33]' }
DBG|ASIRequestServer::reloadActiveSessions return
INF|Creating the client notify map
INF|Starting notification thread
INF|Started notification thread 1e0
DBG|Adding an interface to ASIApplicationSvc_V2_2 Server
DBG|RpcServerRegisterIf returned: 0
DBG|Adding the all protocols, dynamic endpoints to ASIApplicationSvc_V2_2 Server
DBG|Waiting for notification: #entries=0
DBG|RpcServerUseProtseqEp returned: 0
DBG|Adding an interface to ASIApplicationSvc_V2_2 Server
DBG|RpcServerRegisterIf returned: 0
DBG|Printing the binding vector for ASIApplicationSvc_V2_2 Server
DBG|0: ncalrpc:IBMTIV5[OLE37]
DBG|1: ncacn_np:\\\\IBMTIV5[\\pipe\\000000FC.001]
DBG|2: ncacn_ip_tcp:9.3.4.55[3026]
DBG|3: ncadg_ip_udp:9.3.4.55[3027]
DBG|4: ncacn_nb_tcp:IBMTIV5[108]
DBG|5: ncacn_http:9.3.4.55[3029]
DBG|Exporting the ASIApplicationSvc_V2_2 Server as /.:/ASIApplicationSvc_V2_2_IBMTIV5 to the name service database for 7 interfaces
DBG|Registering endpoints for the ASIApplicationSvc_V2_2 Server in the endpoint map for 7 interfaces
DBG|RpcEpUnregister returned: 1753
DBG|ASIRequestServer::appServerRegisterInstance -- Registering this Application Server instance
DBG|[Microsoft][ODBC SQL Server Driver][SQL Server]added entry to application_session (application: ASIApplicationSvc_IBMTIV5, host: IBMTIV5, binding handle:ncacn_ip_tcp:9.3.4.55[3026], pid: 252)
DBG|Deallocated DB connection instance ASIApplicationSvc-000: #Allocated=0
INF|Service Status Report: CurrentState=RUNNING WaitHint=30s ExitCode=0 CheckPoint=2
NOT|ASIApplicationSvc is RUNNING...

Figure 16-23 Application server: Startup (2 of 2)


DBG|ASILocalApp::initialize()
DBG|server:
DBG|username:
DBG|application
DBG|Connecting to database with connection [Provider=MSDASQL;DRIVER=SQL Server;SERVER=ibmtiv5;APP=ASILocalApp], integrated security=0...
DBG|Registering for ADO Connection events...
DBG|Opening database with connection [Provider=MSDASQL;DRIVER=SQL Server;SERVER=ibmtiv5;APP=ASILocalApp], integrated security=0...
DBG|ADO Connection Open.
DBG|Successfully connected to database as [ASILocalApp].
DBG|Database Client Process ID (SPID) = 27
DBG|ANSI_WARNINGS setting is OFF
DBG|QUOTED_IDENTIFIER setting is OFF
NOT|Updating service status
INF|Service Status Report: CurrentState=RUNNING WaitHint=0s ExitCode=0 CheckPoint=0
DBG|ASIServerApp::insertObj: inserted 0000000000
INF|Caching meta data table xmit structures...
DBG| AttrTypeTable...13 rows
DBG| LinkTypeTable...406 rows
DBG| MethodTable...968 rows
DBG| MethodParamTable...2046 rows
DBG| MethodParamTypeTable...7 rows
DBG| ObjClassTable...415 rows
DBG| ObjLinkTable...767 rows
DBG| ObjAttrTable...1137 rows
DBG| ObjMethodTable...1916 rows
DBG| ObjStaticTable...154 rows
DBG| EnumerationTable...253 rows
DBG| EnumValueTable...1555 rows
DBG| MethodParamRangeTable...247 rows
DBG| ObjAttrRangeTable...245 rows
DBG| ObjStaticRangeTable...0 rows
DBG| IsATable...523 rows
DBG| IsAChainTable...2493 rows
INF|Done caching meta data table xmit structures.
INF|Service Status Report: CurrentState=RUNNING WaitHint=15s ExitCode=0 CheckPoint=0
INF|Starting status update thread...
INF|Starting the debug break thread...
NOT|ASIApplicationSvc_V2_2 Server Started. Ready and Waiting to process Client Requests...

Figure 16-24 Application server: Startup (2 of 2)

In Figure 16-24, the next phase of the application server initialization is entered. It loads the metadata and logs the number of rows cached from each metadata table.
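The row counts logged during this phase can be collected into a table for comparison between two startups, for example, before and after a change to the object database. The Python sketch below assumes the `DBG| <table>...<n> rows` layout shown in Figure 16-24; the function name `cached_table_rows` is hypothetical.

```python
import re

# Matches entries such as "DBG| AttrTypeTable...13 rows".
# The layout is an assumption taken from Figure 16-24.
ROWS = re.compile(r"DBG\|\s*(\w+)\.\.\.(\d+) rows")


def cached_table_rows(log_text):
    """Map each cached metadata table name to its logged row count."""
    return {name: int(count) for name, count in ROWS.findall(log_text)}


# Excerpt of Figure 16-24.
sample = (
    "INF|Caching meta data table xmit structures... "
    "DBG| AttrTypeTable...13 rows "
    "DBG| LinkTypeTable...406 rows "
    "DBG| ObjStaticRangeTable...0 rows "
    "INF|Done caching meta data table xmit structures."
)

rows = cached_table_rows(sample)
for table, count in rows.items():
    print(f"{table}: {count}")
```

Whether a particular count is a problem depends on the installation; the sketch only makes differences between two logs easy to see.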


The operation of the application server, shown in Figure 16-25, includes the following:
• Getting icon information for an object
• Periodically reloading its sessions to verify the activity of the user
• Other display-related work

NOT|ASIRequestServer updateThreadProc starting
INF|Started the debug break thread.
DBG|ASIStringToASIID() called from: [binding=[ncacn_ip_tcp:ibmtiv5], hostname=[IBMTIV5], lcid=0x409, clientVersion=[], tbsmUser=[], ntUser=[Administrator]]
INF|StringToASIID "Data"
DBG|Allocated DB connection instance ASIApplicationSvc-000: #Allocated=1
DBG|Calling ASISQLDBC::cancel()...
DBG|adCmdStoredProc(4) - { ? = call string_to_id(?) }( /*RETURN_VALUE=(R)*/@string=N'Data'/*(I)*/)
DBG|adCmdStoredProc(4) - { ? = call string_to_id(?) }( /*RETURN_VALUE=0(R)*/@string=N'Data'/*(I)*/)
DBG|Deallocated DB connection instance ASIApplicationSvc-000: #Allocated=0
INF|StringToASIID "Data": -> 317

Figure 16-25 Application server: Processing

Figure 16-26 on page 707 shows a sample shutdown sequence for the application server.


DBG|Entering ASIServiceApp::stop()
INF|Service Status Report: CurrentState=STOPPING WaitHint=15s ExitCode=0 CheckPoint=0
NOT|ASIApplicationSvc is STOPPING...
INF|Service Status Report: CurrentState=STOPPING WaitHint=15s ExitCode=0 CheckPoint=0
INF|Service Status Report: CurrentState=STOPPING WaitHint=30s ExitCode=0 CheckPoint=0
INF|Terminated status update thread
INF|Terminated debug break thread
DBG|Allocated DB connection instance ASIApplicationSvc-000: #Allocated=1
DBG|ASIRequestServer::appServerUnregisterInstance -- Unregistering this Application Server instance
DBG|Deallocated DB connection instance ASIApplicationSvc-000: #Allocated=0
INF|Terminated notification thread
INF|No database contexts currently allocated, shutting down
DBG|Closing database connection [ASIApplicationSvc-000]...
DBG|ADO Connection Closed.
DBG|Unregistering for ADO Connection events...
DBG|Releasing database connection [ASIApplicationSvc-000]...
DBG|Successfully closed database connection [ASIApplicationSvc-000].
DBG|Closing database connection [ASIApplicationSvc-001]...
DBG|ADO Connection Closed.
. . .
DBG|-------------------------------------
DBG|ASIApplicationSvc_V2_2 Server Thread has Exited.
DBG|-------------------------------------
NOT|RPC server thread shut down, exiting
NOT|---- All support threads returned ---
INF|Service Status Report: CurrentState=STOPPING WaitHint=15s ExitCode=0 CheckPoint=0
DBG|Exiting ASIServiceApp::svcMain(). Stopping service...
ERR|ERROR: RpcEpUnregister returned: 1753
INF|Service Status Report: CurrentState=STOPPING WaitHint=10s ExitCode=0 CheckPoint=1
DBG|Closing database connection [ASILocalApp]...
DBG|ADO Connection Closed.
DBG|Unregistering for ADO Connection events...
DBG|Releasing database connection [ASILocalApp]...
DBG|Successfully closed database connection [ASILocalApp].
INF|Service Status Report: CurrentState=STOPPED WaitHint=15s ExitCode=0 CheckPoint=0
NOT|ASIApplicationSvc is STOPPED...
ERR|Caught unknown exception while attempting to stop the service
DBG|Exiting ASIServiceApp::stop()

Figure 16-26 Application server: Shutdown

Rule processing

As an event-based management solution, Tivoli Business Systems Manager has the ability to process different events differently. This behavior is governed by a set of rules recorded in clip files (files with the .clp extension).


These rule files typically reside in TivoliManager\Data\Rules and are loaded when the processor initializes. After you modify a rule, you must restart the respective process for the change to take effect. Most of these rules reside on the database server, except for the MVS upload rule, which resides on the event server.

16.1.3 Enterprise Edition

In this section, we discuss the structure and operation of the TBSM Enterprise Edition components. The Enterprise Edition consists of a component on the OS/390 mainframe and an input component on the Windows NT servers.

The Source/390

Figure 16-27 shows the block diagram of the Source/390 component of Tivoli Business Systems Manager, which runs as the monitoring agent on OS/390 systems.

[Diagram components: Source/390, MVS Console, TM/390 Object Pump, Traps, TM/390 Dataspace, Performance Products, External data interface, CICS TDQ, NetView PPI, TM/390 Object Server, VTAM]

Figure 16-27 Source/390 system

As shown in Figure 16-27, the Source/390 component consists of three started tasks:
• The dataspace acts as an intermediary for inter-process communication between the object pump and the object server. The communication involves the exchange of the event messages and object status.


• The object pump traps all messages and exceptions from various sources, such as the system console, the performance monitoring subsystems, RODM notification, and so on.
• The object server handles the communication and LU 6.2 connection to the NT systems.

To better illustrate the processing of Source/390, the startup sequence for its address spaces is as follows:

1. First, the dataspace is started. The dataspace must show the GTM5010I message before we can proceed. The startup log for the data server is provided in Figure 16-28.

GTM5000I TM/390 DATASPACE INITIALIZATION STARTING
GTM4110I GTMDSPC , USING ID 01
GTM5002I PROCESSING PARMLIB MEMBER: DSPCSC66
GTM5101I TM/390 DATASPACE CREATED, SIZE= 10000000, ORIGIN=00000000
GTM5161I IPL SENSED
GTM5030I MODIFY INTERFACE ESTABLISHED, CONSOLE COMMUNICATION AVAILABLE
GTM5010I TM/390 DATASPACE INITIALIZATION COMPLETE

Figure 16-28 Tivoli Business Systems Manager data server startup log

2. The object server is started and connected to the data server, as seen in message GTM4041I in Figure 16-29 on page 710. It then allocates LU 6.2 sessions, as indicated by message GTM7406I in Figure 16-29 on page 710. The transaction programs are initialized as indicated by the GTM7424I messages.


GTM4000I TM/390 INITIALIZATION STARTING
GTM4600I EXTENDED RECOVERY ENVIRONMENT ESTABLISHED
GTM4033I TM/390 SERVER SENSED, SERVER=GTMDSPC
GTM4110I GTMSRVR , USING ID 01
GTM4002I PROCESSING PARMLIB MEMBER: SRVRSC66
GTM4002I PROCESSING PARMLIB MEMBER: PARMFP
GTM4003I FOCAL POINT SET TO: SC03000I
GTM4041I CONNECTION TO DATASPACE SUCCESSFUL, SERVER=GTMDSPC , START DATE=2000.073, START TIME=12:56:27.359634
GTM7400I TM/390 APPC MGR : INITIALIZATION IN PROGRESS
GTM7404I TM/390 APPC MGR : OPENING VTAM APPLID TM39066
GTM4200I ALLOCATION SUCCESSFUL, DDNAME=ACC1LOG , S99ERROR=0000, S99INFO=0000, DSNAME=TIVUSER.SC66.SRVR.LOG1
GTM7424I TM/390 APPC MGR : INITIALIZING PROGRAM ACC1SEND
GTM4030I MODIFY INTERFACE ESTABLISHED, CONSOLE COMMUNICATION AVAILABLE
GTM4010I TM/390 INITIALIZATION COMPLETED
GTM7406I TM/390 APPC MGR : SESSIONS. TOTAL=00004, WINNERS=00001, LOSERS=00001
GTM7424I TM/390 APPC MGR : INITIALIZING PROGRAM ACC1RECV

Figure 16-29 Tivoli Business Systems Manager object server startup log

3. The object pump is started and, as indicated by the log in Figure 16-30 on page 711, it performs the following:
– Issues the MONITOR command
– Allocates an MCS console, as indicated by IEA630I
– Connects to the object server, as indicated by GTM1620I
– Starts the registration process, as indicated by GTM1770I and GTM1780I


GTM7500I INITIALIZATION IN PROGRESS
GTM4600I EXTENDED RECOVERY ENVIRONMENT ESTABLISHED
GTM4110I GTMPUMP , USING ID 01
GTM7508I TM/390 SERVER DETECTED, JOBNAME=GTMDSPC
GTM7801I STORAGE ALLOCATED FOR 10,016 TRAPS
MN JOBNAMES,T
MONITOR SESS,T
IEA630I OPERATOR TM39001 NOW ACTIVE, SYSTEM=SC66 , LU=TM39001
GTM7545I T390 : SUBSYSTEM INITIALIZED
GTM7890I PPI RECEIVER IS ACTIVE
GTM7501I RUNNING INITIAL REXX EXEC : $ACCINIT
GTM0001I TM/390 INITIALIZATION STARTED - 13 Mar 2000 12:58:11
GTM0002I SYSTEM WAS IPL'D ON 03/13/2000 (031300) AT 12:43:12
GTM0003I TM/390 IS RUNNING ON SYSTEM SC66
GTM2101I LOG PROCESSING IS AVAILABLE. DDNAME = ACC1LOG
GTM2102I DSNAME = SYSOUT(X)
GTM2104I THE LOG WILL BE CLOSED AND OPENED ON THE FOLLOWING INTERVAL: 04:00:00
GTM1620I OBJECT PUMP TO OBJECT SERVER HANDSHAKE STARTED
GTM0220I TM/390 LOGON PROCESSING INITIALIZATION STARTED ...
GTM1001I TM/390 EVENT MANAGER INITIALIZATION STARTED ...
GTM9520I TM/390 COMMAND PROCESSOR INSTALLED
GTM0990I TM/390 INITIALIZATION COMPLETED
GTM1770I ALL REQUIRED SHARED VARIABLES HAVE BEEN REGISTERED, PROCESSING CONTINUES
GTM1780I OBJECT PUMP/OBJECT SERVER IS REQUESTING OBJECTS

Figure 16-30 Tivoli Business Systems Manager object pump startup log
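The three startup steps above each name the messages that show the step has succeeded. As a rough aid, the Python sketch below checks a captured job log for those message IDs. The mapping of started task to message IDs is taken from the text of this section; the task labels, the function name `missing_messages`, and the idea of scanning a saved log file are our own assumptions.

```python
import re

# Message IDs that this section names as evidence of a successful startup.
EXPECTED = {
    "dataspace": ["GTM5010I"],
    "object server": ["GTM4041I", "GTM7406I", "GTM7424I"],
    "object pump": ["IEA630I", "GTM1620I", "GTM1770I", "GTM1780I"],
}

# GTM messages carry four digits and IEA messages three in these logs.
MESSAGE_ID = re.compile(r"\b(?:GTM|IEA)\d{3,4}I\b")


def missing_messages(task, log_text):
    """Return the expected message IDs that do not appear in the log."""
    seen = set(MESSAGE_ID.findall(log_text))
    return [msg for msg in EXPECTED[task] if msg not in seen]


# An object pump log that stops before the registration messages.
sample = (
    "IEA630I OPERATOR TM39001 NOW ACTIVE, SYSTEM=SC66 , LU=TM39001\n"
    "GTM1620I OBJECT PUMP TO OBJECT SERVER HANDSHAKE STARTED\n"
    "GTM0990I TM/390 INITIALIZATION COMPLETED\n"
)

print(missing_messages("object pump", sample))
```

An empty list means every message named in the text was found; it does not prove that the address space is healthy, only that the documented checkpoints were reached.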

The object registration handshake is explained from the NT side in the next section, and the overall data exchange is discussed in “Objects registration process” on page 714.

The TBSM Enterprise Edition is supplied as an SMP/E installable image on OS/390. The following datasets are the default datasets of the TBSM Enterprise Edition:
• GTM.SGTMEXEC contains various REXX programs for Tivoli Business Systems Manager.
• GTM.SGTMINST contains the SMP/E installation JCL for Tivoli Business Systems Manager.
• GTM.SGTMMODS contains the APF authorized module for Tivoli Business Systems Manager.
• GTM.SGTMMSGS contains the message library and macros for Tivoli Business Systems Manager.


• GTM.SGTMSAMP contains sample jobs, parameters, and initialization for Tivoli Business Systems Manager.

During the course of the installation of the TBSM Enterprise Edition interfaces, you may need to concatenate these datasets or copy some of their members.

Attention: We have found that the SGTMMODS dataset supplies its own REXX environment, IRXANCHR. It is not advisable to override the REXX environment provided in SYS1.LINKLIB. Therefore, when you need to concatenate SGTMMODS to the STEPLIB of another job or started task that runs REXX (for example, NetView), create your own library and copy into it all SGTMMODS members except those starting with ‘IRX’.

OS/390 input component

The OS/390 input component receives and replies to events from the Source/390. Figure 16-31 shows the diagram of this component.

[Diagram components: Source/390 component, SNA server, SNA client, TPSTART.EXE, ASIMVSListenerSvc (ACC1RCV), SYSn.que, ASIMVSEventHandlerSvc, ASIMVSSenderSvc (ACC1RECV), SYSn-Upload.que, ASIMVSIPListenerSvc, ASIEnqueueProxyServer, BCP load files, ASIMVSUploadRuleSvc, MSSQL database]

Figure 16-31 OS/390 input component

Tivoli BSM MVS Listener, or ASIMVSListenerSvc, receives messages or exceptions from the OS/390 system. Once the connection to the OS/390 system is established by the object pump, the SNA server and the Source/390 object server attempt to allocate a conversation. When the Source/390 object pump is started, the ACC1RCV transaction program (TP) is automatically started. ACC1RCV is the TP name for the MVS Listener.

The Tivoli BSM MVS Listener is an auto-start transaction program. It is automatically started by the TPSTART program, and starts running when a message is sent by the object pump after the connection between the Tivoli Business Systems Manager and the SNA server is established. It stops when the connection is lost.

After the program is initialized, the MVS listener begins to receive data from the OS/390 system and looks for a queue file that has the same identifier as the OS/390 system being monitored. For example, the queue file for SC66 is SC66.que. If the queue file exists, the listener begins to insert data into it. If the file is not present, the MVS listener initializes a message queue file and then begins to insert data into it.

The Tivoli BSM MVS Event handler service (ASIMVSEventHandlerSvc) periodically checks the message queue file for data. If data is present, it reads the message from the queue and inserts it into the database.

Some commands are executed automatically during the startup of Source/390 following a system IPL or Source/390 restart. These commands perform such tasks as initializing Source/390, registering objects, and requesting file status. You can also issue Source/390 commands from a Tivoli Business Systems Manager workstation using the context menus of the operating system object.

The automatic execution of these commands results in Source/390 sending state information in the form of messages to the Tivoli Business Systems Manager NT servers. Upon receipt of these state messages, the Tivoli BSM MVS Upload Rule Server service (ASIMVSUploadRuleSvc) evaluates the information, formulates the proper commands to send, and finally uploads the proper command or command set to Source/390, where they are executed.
The MVS upload rule server service runs on any one of the NT Servers within the Tivoli Business Systems Manager Server suite. It is typically configured on the same host as the MVS listeners and MVS event handlers, because they are also used for the processing of data from Source/390. The upload rule processing is triggered by the event handler upon inserting the event into the database. In addition to processing messages regarding the initialization of Source/390, the MVS upload rule server evaluates other conditions that are of concern to the proper execution of the Source/390 environment. When the OS/390 upload is enabled, the reply message is sent back to the event server machine through the Tivoli BSM Enqueue Proxy Server or


ASIEnqueueProxyServer, which puts the events in an upload queue file. An example of an upload queue file in our example environment is SC66-Upload.que. The Tivoli BSM MVS Sender service (ASIMVSSenderSvc) checks the queue files and sends the message back to Source/390 using the ACC1RECV transaction program, which invokes the ACC1RECV program in the object server address space. See the second message, GTM7424I, in Figure 16-30 on page 711.

Another communication method between OS/390 and the event server is the TCP/IP connection. The Tivoli BSM MVS IP listener receives pre-discovery files from OS/390 and prepares them to be loaded and discovered into the Object database.

Objects registration process

In this section, we discuss the object registration process in more detail. The object registration process is a conversational mechanism from the object pump to the Tivoli Business Systems Manager input component for getting an automated status of any OS/390 object. Figure 16-32 shows a flow chart of the initial connection of Source/390 to the Tivoli Business Systems Manager input component.

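[Flow chart of the message exchange between OS/390 and WinNT, with arrows in both directions (OS/390 -> WinNT and WinNT -> OS/390). Steps labeled in the chart:]

01/01  OS identification
02/12  Variable creation: ENT - COMP - MACH - LPAR - OS
02/04  Request objects
02/05  Trap creation: all object traps under the OS (STC, batch, DB2, IMS, CICS, etc.)
02/09  Omegamon: for each Omegamon object

Other message codes shown in the chart: 02/20, 02/10, 02/15.
Other labels shown in the chart: Batch registration; TDQ registration; RMF registration, including each metric in the RMF profile.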

Figure 16-32 Initial connection for Tivoli Business Systems Manager connection

The message exchange is conducted in an internal form. You can peek at the messages in the queue files that are used by the Listener and Sender services. These queue files are stored in D:\TivoliManager\Data\Queues. These queue files are named after the OS/390 system that they belong to. For example, SC66


has a listener queue called SC66.que and a sender queue called SC66-Upload.que. The message is separated into fields with the backslash character (\, ASCII x’5C’, EBCDIC x’E0’) and ended with a tilde character (~, ASCII x’7E’, EBCDIC x’A1’). The first two fields are the format type and action type fields. The format type and action type fields uniquely differentiate the messages for usage and field contents. A sample message is given in Figure 16-33.

02\05\02STC\03NPM\040004E4000C\45ACTIVE\46FNM025~

Annotations in the figure:
  02\05         Format type and action type fields (02 = functional command, 05 = object monitoring)
  040004E4000C  A typical field: 04 is the field type (object identification); the rest is data
  ~             Tilde, end of message

Figure 16-33 Sample message
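The layout described above (backslash-separated fields, a closing tilde, format type and action type first, then fields that each begin with a two-digit field type) can be sketched in a few lines of Python. This is only an illustration for reading queue file contents by hand; the function name parse_message is hypothetical, and the sketch assumes that field data never contains a backslash or a tilde.

```python
def parse_message(raw):
    """Split one queue message into its format type, action type, and fields."""
    text = raw.strip()
    if not text.endswith("~"):
        raise ValueError("message is not terminated by a tilde")
    parts = text[:-1].split("\\")
    if len(parts) < 2:
        raise ValueError("message has no format type and action type")
    # Each remaining field starts with a two-digit field type.
    # Fields are kept as an ordered list because a field type can repeat.
    fields = [(part[:2], part[2:]) for part in parts[2:] if part]
    return {"format": parts[0], "action": parts[1], "fields": fields}


# The sample message from Figure 16-33.
message = parse_message(r"02\05\02STC\03NPM\040004E4000C\45ACTIVE\46FNM025~")
print(message["format"], message["action"])
for field_type, data in message["fields"]:
    print(field_type, data)
```

The fields are returned as a list of pairs rather than a dictionary because some messages carry the same field type more than once (for example, two state and message ID pairs in one object monitoring message).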

In the case of an STC resource called NPM in our SC66 operating system, we can see the data exchanged, as shown in Figure 16-34.

Data sent from OS/390:

1  01\01\002000022013:09:46.710538\01SC668\02OS\03SC66~
3  02\04\002000022013:12:05.635942\01SC6613\02OS\03SC66\040000250006~
5  02\01\002000022013:14:02.920700\01SC6620\02STC\46IEF403I\03NPM\040004E6000C\45ACTIVE\49NPM - STARTED - TIME=13.14.02 - ASID=0089.~

SC66-upload.que:

2  02\12\02ENT\03ITSO\040000040001~
   02\12\02COMP\03POUGHKEEPSIE\040000090002~
   02\12\02MACH\03SC66\0400001B0003~
   02\12\02LPAR\03PRIMARY\040000200005~
   02\12\02OS\03SC66\040000240006~
4  02\05\02STC\03NPM\040004E4000C\45ACTIVE\46IEF403I\45INACTIVE\46IEF404I~

Figure 16-34 Queue file contents

The object registration event occurs in the following sequence:
1. When the object pump is connected to the listener, it sends the identification noting the time stamp of the event and that the operating system name is SC66. The event is received by the MVS listener and stored to the database through the MVS event handler.
2. The variable registration messages are sent by the MVS sender services.


3. Upon receiving the variables, the object pump indicates that it is ready to receive the list of objects to be monitored by sending a 02\04 record that contains the object ID of the SC66 operating system.
4. The ASIMVSUploadRuleSvc evaluates the message received by the listener and puts the appropriate reply message in the upload queue. The ASIMVSSenderSvc reads the SC66-Upload.que file and sends it to the object pump.
5. When the event that is trapped occurs, the object pump initiates the 02\01 message to announce the status change of the affected object.

The following is a list of commonly used field identifications:

00  Time stamp in the format of YYYYMMDDHH:MM:SS.UUUUUU.
02  Object type ID, similar to the content of the cid column in the obj_class table.
03  Object name.
04  Native key. A unique 10 character object identification that is constructed of the hexadecimal value of the object ID and the class ID.
45  State. The value that will go to the State attribute of an object.
46  Message ID that is trapped. This must also exist in the MessageDescription table.
49  Description text.

By evaluating the queue files and the Tivoli Business Systems Manager log files, you may be able to determine the problem in the object registration process.
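When reading queue files, it can help to decode the time stamp (field 00) and the native key (field 04) into more readable values. The following Python sketch does that under one stated assumption: the native key is split into six hexadecimal digits for the object ID followed by four for the class ID. That split is inferred from the sample logs in this chapter (native key 0000070006 appears together with object ID 7), not from product documentation, and the function names are hypothetical.

```python
from datetime import datetime


def parse_timestamp(value):
    """Decode a field 00 time stamp (YYYYMMDDHH:MM:SS.UUUUUU)."""
    return datetime.strptime(value, "%Y%m%d%H:%M:%S.%f")


def split_native_key(key):
    """Split a 10-character native key into (object ID, class ID).

    Assumption: 6 hex digits of object ID, then 4 hex digits of class ID.
    """
    if len(key) != 10:
        raise ValueError("native key must be 10 characters long")
    return int(key[:6], 16), int(key[6:], 16)


stamp = parse_timestamp("2000022013:09:46.710538")
object_id, class_id = split_native_key("0000070006")
print(stamp.isoformat(), object_id, class_id)
```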

Processing flow for OS/390 related components

This section provides a detailed discussion on each component processing as indicated by its log files. The log files shown here are created with log level of 0. The processes are:
- MVS listener
- MVS event handler
- MVS upload rule services
- MVS enqueue proxy server
- MVS sender service


MVS listener

The listener process writes to two log files. The first file has the prefix of LS; it contains the listener initialization before it knows that the OS/390 image has contacted it. The second has the prefix of MVSL_ and contains the SMF ID of the OS/390 that triggers the listener process. Most of the initialization errors are found in the LS log file. A sample LS log file is shown in Figure 16-35.

INF|Successfully loaded Service Library (c:\TivoliManager\bin\ASIMVSListenerSvc.dll)
INF|Service Status Report: CurrentState=STARTING WaitHint=15s ExitCode=0 CheckPoint=0
NOT|ASIMVSListenerSvc is STARTING...
NOT|Accepted conversation for ID=0x6054078C
NOT|Looking for PartnerLU [USIBMSC.TM39066] value in subkeys under [Components\ASIMVSListenerSvc\Instances].
NOT|Found instance [SC66] by PartnerLU.
NOT|Program is running as a service, State Control thread will not be created.
NOT|Unable to retrieve RetryWait value from registry. Using default of 15000
DBG|Setting RetryWait=[15000]
NOT|Unable to retrieve MaxRetries value from registry. Using default of 20
DBG|Setting MaxRetries=[20]
DBG|Setting RemoteCodePage=[37]
DBG|Setting LocalCodePage=[1252]
DBG|Setting QueuePath=[C:\TivoliManager\Data\Queues\SC66.que]
DBG|Setting MaxEntries=[32000]
DBG|Setting CellSize=[258]

Figure 16-35 Sample LS log file

Some of the important information in this log file is:

1  When you see that this file is created, it means TPSTART has successfully launched the listener process.
2  The accepted conversation indicates that all SNA setup is fine and the communication is started. A failure to accept a communication is most likely caused by a missing SNA client license. Use the Control Panel applet Licensing and add the client license in the SNA server machine.
3  The listener will then match the partner LU with the PartnerLU key under the operating system instances; when a match is found, it initializes itself to that instance and creates the MVSL log.

The sample MVSL log file is shown in Figure 16-36 on page 718.


NOT|Conversation established with LU [USIBMSC.TM39066], Conversation ID: 0x6054078C
NOT|Initialized queue C:\TivoliManager\Data\Queues\SC66.que, Max Entries: 32000, Cell Size: 258)
DBG|Setting OSID=[0000070006]
DBG|Clearing MVSListener_Down on the OS to signify that there is a LU6.2 connection and the listener is up and running...
DBG|02\01\002000120117:40:29.000000\01SC66000\02OS\03SC66\040000070006\59MVSListener_Down\60OK~
DBG|Enqueuing the MVSListener_Down exception CLEAR event
DBG|Enqueue of MVSListener_Down exception CLEAR completed.
INF|Service Status Report: CurrentState=RUNNING WaitHint=15s ExitCode=0 CheckPoint=0
NOT|ASIMVSListenerSvc is RUNNING...
DBG|status received is CM_CONFIRM_RECEIVED
DBG|Received Data: 01\01\002000120116:31:55.272064\01SC661\02OS\03SC66\7311/28/2000\7408:46:51\759.12.2.27\76wtsc66oe\77USIBMSC.SC66M~
DBG|Received Data: 02\04\002000120116:41:31.340521\01SC662\02OS\03SC66\040000070006~
DBG|status received is CM_CONFIRM_RECEIVED
. . .
FAT|cmrcv returned abnormal return code (27)
DBG|MVSListener_Down: LU6.2 connection and MVSListener are down. Possible causes include stoppage or failure of: MVS System, VTAM Communiciations, TBSM Source/390 Server, SNA Server, or SNA Client.
DBG|02\01\002000120118:56:48.000000\01SC66000\02OS\03SC66\040000070006\59MVSListener_Down\60MVSListener_Down: LU6.2 connection and MVSListener are down. Possible causes include stoppage or failure of: MVS System, VTAM Communiciations, TBSM Source/390 Server, SNA Server, or SNA Client.~
DBG|Enqueuing the MVSListener_Down exception SET event
DBG|Enqueue of MVSListener_Down exception SET completed.
NOT|MVS dequeuer: start function returning
INF|Service Status Report: CurrentState=STOPPING WaitHint=15s ExitCode=0 CheckPoint=0
NOT|ASIMVSListenerSvc is STOPPING...

Figure 16-36 Sample MVSL log file

The important parts of the MVSL log file entries shown in Figure 16-36 are:

1  The unique object ID for the requesting operating system; the 10 digit hexadecimal number contains the ID and class ID of the object.
2  Clearing the MVS listener down exception on the OS object.
3  Receiving the 01/01 request, which requests variable information.
4  Receiving the 02/04 request, which requests the objects.
5  Termination sequence started; when the object server is stopped, the MVS listener down exception is generated to the OS object.
6  Termination of the listener service.
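The LS and MVSL samples share a simple layout: a three-letter code, a vertical bar, then the message text. A short Python sketch such as the following can count entries per code and pull out the FAT entries first, which is often the quickest way into a long log. Treating FAT as the most severe code is an assumption based on the abbreviation, and the function names are hypothetical.

```python
from collections import Counter


def split_entry(line):
    """Return (code, text) for a 'CODE|text' log line, or None if it does not match."""
    code, separator, text = line.partition("|")
    if not separator or len(code) != 3 or not code.isupper():
        return None
    return code, text


def summarize(lines):
    """Count entries per code and collect the text of FAT entries."""
    counts = Counter()
    fatal = []
    for line in lines:
        entry = split_entry(line.strip())
        if entry is None:
            continue  # elision markers such as ". . ." or wrapped text
        code, text = entry
        counts[code] += 1
        if code == "FAT":
            fatal.append(text)
    return counts, fatal


sample = [
    "NOT|ASIMVSListenerSvc is RUNNING...",
    "DBG|status received is CM_CONFIRM_RECEIVED",
    ". . .",
    "FAT|cmrcv returned abnormal return code (27)",
    "NOT|ASIMVSListenerSvc is STOPPING...",
]
counts, fatal = summarize(sample)
print(dict(counts))
print(fatal)
```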

MVS event handler

The MVS event handler gets the queue messages from the MVS listener and puts them into the Tivoli Business Systems Manager database. It also notifies the MVS upload rule services to process the message. The first event that an event handler usually processes is the MVS listener up event; the processing sequence is shown in Figure 16-37. The MVS Listener up event is processed by invoking the stored procedure Process_FunctionalCommand_Update. The Processed record: message indicates that the processing is completed.

DBG|5433: 02\01\002000120117:40:29.000000\01SC66000\02OS\03SC66\040000070006\59MVSListener_Down\60OK~
DBG|Process_FunctionalCommand_Update @DATE_TIME=2000120117:40:29.000000, @NATIVE_KEY=0000070006, @OBJECT_NAME=SC66, @CHILD_OBJ_TYPE=(null), @DISCOVERY_IND=(null), @EXCP_NAME=MVSListener_Down
DBG|dbnextresult -> 1
DBG|Process_FunctionalCommand_Update returned 2
INF|Processed record:

Figure 16-37 MVS event handler: Processing MVS listener up

The identification message (01/01) that is received from the listener, which requests the variable information, is processed as shown in Figure 16-38 on page 720.


DBG|5434: 01\01\002000120116:31:55.272064\01SC661\02OS\03SC66\7311/28/2000\7408:46:51\759.12.2.27\76wtsc66oe\77USIBMSC.SC66M~
DBG|Process_Identification_Pump: @src_cid=OS , @src_id=7, @FormatType=1, @ActionType=1, @DATE_TIME=2000120116:31:55.272064, @SEQ_NO=SC661, @OBJ_TYPE_ID=OS, @OBJECT_NAME=SC66, @IPL_DATE=11/28/2000, @IPL_TIME=08:46:51, @IP_ADDRESS=9.12.2.27, @IP_DNS_NAME=wtsc66oe, @QUALIFIED_NETNAME=USIBMSC.SC66M
DBG|Process_Identification_Pump: system-closing 0 outdated WTOR exceptions as occured before IPL date/time [Dec 1 2000 4:31PM]
DBG|Process_Identification_Pump: inserted=0 records into StagedEXCP
DBG|asisp_createObjPumpIdentMsg: @OBJECT_NAME=SC66, DATETIME='Dec 1 2000 4:31PM', SEQ_NO='SC661'
DBG|asisp_createObjPumpIdentMsg: creating an audit message (ASI_01\01) on the target OS-7(SC66)
DBG|asisp_createObjPumpIdentMsg: enqueuing Variable Registration (02\12) MNRE request via SendMVSVariableReg
DBG|dbnextresult -> 1
DBG|Process_Identification_Pump returned 2
INF|Processed record:

Figure 16-38 MVS event handler: Variable identification

The following sequence is observed in Figure 16-38:

1  The 01/01 message is received.
2  The event handler invokes the Process_Identification_Pump stored procedure.
3  The asisp_createObjPumpIdentMsg stored procedure is invoked.
4  The MVS Upload rule server is invoked with SendMVSVariableReg and the operating system object ID as the parameters.

Figure 16-39 on page 721 shows the events for the request object registration (02/04) message.


DBG|5435: 02\04\002000120116:41:31.340521\01SC662\02OS\03SC66\040000070006~
DBG|asisp_createReqObjsMsg: @OBJECT_NAME=SC66, DATETIME='Dec 1 2000 4:41PM', SEQ_NO='SC662'
DBG|asisp_createReqObjsMsg: creating an audit message (ASI_02\04) on the target OS-7(SC66)
DBG|asisp_createReqObjsMsg: enqueuing Object Registration (02\05) MNRE request via SendMVSRegisterObjects
DBG|asisp_createReqObjsMsg: enqueuing TDQ Registration (02\15) MNRE request via SendMVSTDQMsgCapture
DBG|asisp_createReqObjsMsg: enqueuing OmegamonLogon (02\09) MNRE request via SendMVSOmegamonConnect
DBG|asisp_createReqObjsMsg: enqueuing Action Traps (02\10) MNRE request via SendMVSActionTraps
DBG|asisp_createReqObjsMsg: enqueuing RMF Registration (02\20) MNRE request via SendMVSRMFRegistration
DBG|dbnextresult -> 1
DBG|Process_FunctionalCommand_ReqObjs returned 2
INF|Processed record:

Figure 16-39 MVS Event handler: Object registration

In Figure 16-39, the following is observed:

1  The 02/04 message is received and the stored procedure Process_FunctionalCommand_ReqObjs is invoked.
2  The stored procedure asisp_createReqObjsMsg is invoked and calls the MVS upload rule service to process the various registration processes, such as SendMVSRegisterObjects, SendMVSTDQMsgCapture, SendMVSOmegamonConnect, SendMVSActionTraps, and SendMVSRMFRegistration.
3  The Process_FunctionalCommand_ReqObjs is completed.
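The three event handler samples suggest a fixed relationship between a record's format type and action type, the stored procedure that processes it, and the upload rule methods that are enqueued afterwards. The lookup table below is a reading aid that restates only what those sample logs show; it is not a complete list of what the event handler supports, and the names HANDLERS and describe are hypothetical.

```python
# (format type, action type) -> (stored procedure, upload rule methods enqueued)
# Taken from the sample logs in Figures 16-37 to 16-39 only.
HANDLERS = {
    ("02", "01"): ("Process_FunctionalCommand_Update", []),
    ("01", "01"): ("Process_Identification_Pump", ["SendMVSVariableReg"]),
    ("02", "04"): (
        "Process_FunctionalCommand_ReqObjs",
        [
            "SendMVSRegisterObjects",
            "SendMVSTDQMsgCapture",
            "SendMVSOmegamonConnect",
            "SendMVSActionTraps",
            "SendMVSRMFRegistration",
        ],
    ),
}


def describe(record):
    """Look up a queue record by its first two backslash-separated fields."""
    parts = record.split("\\")
    if len(parts) < 2:
        return None
    return HANDLERS.get((parts[0], parts[1]))


procedure, methods = describe(r"02\04\002000120116:41:31.340521\01SC662\02OS\03SC66\040000070006~")
print(procedure, methods)
```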

When the object pump and object server are stopped, the MVS listener code is stopped. As shown in Figure 16-36 on page 718, the MVS listener down event is triggered when the MVS listener process is stopped. Figure 16-40 on page 722 shows the event handler process for that event.


DBG|5436: 02\01\002000120118:56:48.000000\01SC66000\02OS\03SC66\040000070006\59MVSListener_Down\60MVSListener_Down: LU6.2 connection and MVSListener are down. Possible causes include stoppage or failure of: MVS System, VTAM Communiciations, TBSM Source/390 Server, SNA Server, or SNA Client.~
DBG|Process_FunctionalCommand_Update @DATE_TIME=2000120118:56:48.000000, @NATIVE_KEY=0000070006, @OBJECT_NAME=SC66, @CHILD_OBJ_TYPE=(null), @DISCOVERY_IND=(null), @EXCP_NAME=MVSListener_Down
DBG|dbnextresult -> 1
DBG|Process_FunctionalCommand_Update returned 2
INF|Processed record:

Figure 16-40 MVS event handler: MVS listener down event

MVS upload rule services

Figure 16-41 shows the first part of the initialization of the MVS upload rule services.

INF|286|Initializing Rule Engine.
DBG|288|Aquiring registry keys.
DBG|1366|Initializing database connection #1.
DBG|1372|Database Settings: SERVER=[itsovas3], USER=[sa], DATBASE=[Object], APPLICATION=[ASIMVSUploadRuleObject]
DBG|249|Changed database context to 'Object'.
DBG|1427|Initializing database connection #2.
DBG|1433|Database Settings: SERVER=[itsovas3], USER=[sa], DATBASE=[ASIRuleSvc], APPLICATION=[ASIMVSUploadRuleBase]
DBG|249|Changed database context to 'ASIRuleSvc'.
DBG|314|Instantiating COM ASIRuleEngine object.
DBG|140|Starting router thread...
DBG|147|Starting command loop thread...
DBG|257|GetNextEvent: waiting for message
DBG|321|Instantiating Rule Engine Event Handler.
DBG|697|Initializing the RuleBase.
INF|1736|Loading rule initialization file: [C:/TivoliManager//Data/Rules/ASIRuleSvc.CLP]
INF|1736|Loading rule initialization file: [C:/TivoliManager//Data/Rules/ASIMVSUploadRuleSvc.CLP]
DBG|249|Changed database context to 'ASIRuleSvc'.
INF|1601|Trying to load the sync record from the rulebase.
DBG|249|GetSync @RuleBaseName=MVSUpload

Figure 16-41 MVS upload rule service: Initialization


The following events are reflected in the log file in Figure 16-41 on page 722:

1  It loads two CLP files: ASIRuleSvc.CLP and ASIMVSUploadRuleSvc.CLP.
2  It synchronizes with ASIRuleSvc; the table that contains the synchronization information is ObjectSync. When you create a new OS/390 object, you need to clear this table.

The second part of the initialization loads all the objects that it needs to work on, which are the OS/390 related objects. The sample entries in the log file for the operating system objects for SC42 and SC66 are shown in Figure 16-42 on page 724. Note that the messages may be interleaved in the actual log because the processing is asynchronous.


INF|1655|Loading instances from the RuleBase.
DBG|249|MVSUpload_LoadInstances
. . .
DBG|791|(make-instance OS-3 of OS (RuleBaseID 2) (cid OS) (id 3) (Name SC42))
DBG|1824|Stdin|(make-instance OS-3 of OS (RuleBaseID 2) (cid OS) (id 3) (Name SC42))
DBG|791|(make-instance OS-7 of OS (RuleBaseID 2) (cid OS) (id 7) (Name SC66))
DBG|1824|Stdin|(make-instance OS-7 of OS (RuleBaseID 2) (cid OS) (id 7) (Name SC66))
. . .
INF|1669|Loading facts from the RuleBase.
DBG|249|LoadFacts @RuleBaseName=MVSUpload
DBG|791|(assert (RuleBaseInitialized))
DBG|1824|Stdin|(assert (RuleBaseInitialized))
DBG|791|(run)
DBG|1824|Stdin|(run)
DBG|1688|Waiting for all of the initialization files and facts to be processed...
. . .
DBG|257|GetNextEvent: waiting for message
DBG|296|GetNextEvent: got message "(make-instance OS-3 of OS (RuleBaseID 2) (cid OS) (id 3) (Name SC42))"
DBG|1829|Stdout|[OS-3]
DBG|257|GetNextEvent: waiting for message
DBG|296|GetNextEvent: got message "(make-instance OS-7 of OS (RuleBaseID 2) (cid OS) (id 7) (Name SC66))"
DBG|1829|Stdout|[OS-7]
DBG|257|GetNextEvent: waiting for message
DBG|296|GetNextEvent: got message "(assert (RuleBaseInitialized))"
DBG|1829|Stdout|
DBG|257|GetNextEvent: waiting for message
DBG|296|GetNextEvent: got message "(run)"
DBG|1690|All of the initialization files and facts have been processed. Continuing with initialization...
INF|733|Creating the event dequeue thread.
INF|443|Service started. Entering Windows Message Loop.

Figure 16-42 MVS upload rule service: Object initialization

In Figure 16-42, the following events are observed:

1  Generating internal messages to create the operating system instances.
2  Initiating the RuleBaseInitialized message.
3  Creating the OS instances based on the messages generated in (1).
4  Initialization is completed as a response to the RuleBaseInitialized message.

Processing the initial messages for object registration from OS/390 is triggered by the MVS event handler. Figure 16-43 shows the sample processing for SC66.

DBG|257|GetNextEvent: waiting for message
DBG|296|GetNextEvent: got message "(send [OS-7] SendMVSRegisterObjects)"
DBG|1829|Stdout|("01 Dec 2000 17:47:00:000" "Dec 1 2000 5:48PM")
DBG|249|_SendMVSRegisterObjects: OS , 7
DBG|249|sh SendMVSRegisterObjects.ksh -cOS -i7
DBG|257|GetNextEvent: waiting for message
DBG|296|GetNextEvent: got message "(send [OS-7] SendMVSTDQMsgCapture)"
DBG|1829|Stdout|0
DBG|249|_SendMVSTDQMsgCapture: OS , 7
DBG|249|sh SendMVSTDQMsgCapture.ksh -cOS -i7
DBG|257|GetNextEvent: waiting for message
DBG|296|GetNextEvent: got message "(send [OS-7] SendMVSOmegamonConnect)"
DBG|1829|Stdout|0
DBG|249|_SendMVSOmegamonConnect: OS , 7, action=1, bConnectAllOSDescendents=1
DBG|249|sh SendMVSOmegamonConnect.ksh -cOS -i7 -a1 -x
DBG|257|GetNextEvent: waiting for message
DBG|296|GetNextEvent: got message "(send [OS-7] SendMVSActionTraps)"
DBG|1829|Stdout|0
DBG|791|(run)
DBG|1824|Stdin|(run)
DBG|249|_SendMVSActionTraps: OS , 7
DBG|249|sh SendMVSActionTraps.ksh -cOS -i7
DBG|257|GetNextEvent: waiting for message
DBG|296|GetNextEvent: got message "(send [OS-7] SendMVSRMFRegistration)"
DBG|1829|Stdout|0
DBG|249|_SendMVSRMFRegistration: OS , 7
DBG|249|sh SendMVSRMFRegistrations.ksh -cOS -i7
DBG|257|GetNextEvent: waiting for message

Figure 16-43 MVS upload rule services: Message processing

The messages in Figure 16-43 correspond to the methods invoked by the MVS event handler in Figure 16-38 on page 720, as indicated by (1), and the methods invoked by the MVS event handler in Figure 16-39 on page 721, as indicated by (2). These methods are implemented as Microsoft SQL Server stored procedures, which invoke shell scripts on the database server.


Restriction: The Upload rule server uses a cache in the ASIRuleSvc database. Objects that are created after the rule server is initialized are not known to the rule server. Running the SQL command delete from ASIRuleSvc..ObjectSync forces the rule server to resynchronize with the Object database.

Also note that the MVS Upload Rule does not support an MVS system ID that starts with a number (such as 7030).

MVS enqueue proxy server

The enqueue proxy server receives messages meant to be sent through the MVS sender service. Figure 16-44 on page 727 shows an excerpt of the enqueue proxy server log file.


INF|ASIEPSSrvr: 2.1.6.5
DBG|Adding an interface to ASIEPSServer Server
DBG|RpcServerRegisterIf returned: 0
DBG|Adding the all protocols, dynamic endpoints to ASIEPSServer Server
DBG|RpcServerUseProtseqEp returned: 0
DBG|Adding an interface to ASIEPSServer Server
DBG|RpcServerRegisterIf returned: 0
DBG|Initializing ASIEPSServer Server
DBG|Obtaining the binding vector
DBG|Printing the binding vector for ASIEPSServer Server
DBG|0: ncacn_np:\\\\ITSOVAS2[\\pipe\\000003FB.001]
DBG|1: ncalrpc:ITSOVAS2[WMSG000003FB.00000001]
DBG|2: ncacn_ip_tcp:9.3.187.194[2589]
DBG|3: ncadg_ip_udp:9.3.187.194[2590]
DBG|4: ncacn_nb_tcp:ITSOVAS2[107]
DBG|5: ncacn_http:9.3.187.194[2592]
DBG|Exporting the ASIEPSServer Server as /.:/ASIEPSServer_ITSOVAS2 to the name service database for 5 interfaces
DBG|Registering endpoints for the ASIEPSServer Server in the endpoint map for 5 interfaces
DBG|ASIEPSServer Server Started. Ready and Waiting to process Client Requests...
DBG|ASIEPSServer::ConnectToQueue: QueueName=SC66_Upload.que -> qid=1
DBG|ASIEPSServer::Enqueue: qid=0x00000001 length=38 returned 1
DBG|ASIEPSServer::Enqueue: qid=0x00000001 length=47 returned 1
DBG|ASIEPSServer::Enqueue: qid=0x00000001 length=39 returned 1
DBG|ASIEPSServer::Enqueue: qid=0x00000001 length=42 returned 1
DBG|ASIEPSServer::Enqueue: qid=0x00000001 length=37 returned 1
DBG|ASIEPSServer::Enqueue: qid=0x00000001 length=3 returned 1
. . .
NOT|Shutting down due to service control stop command
INF|Terminating worker thread
NOT|Service shut down
INF|Cleaning up

Figure 16-44 Enqueue proxy server

Similar to the enqueue proxy server for the propagation agent, the log in Figure 16-44 presents the following items:

1  Indicates that the enqueue proxy server is ready.
2  The server connects to the queue SC66-Upload.que, which is a queue file for ASIMVSSenderSvc-SC66. The queue ID (qid) is 1.
3  When a message is enqueued, it indicates which qid stores the message and the length of the message.
4  Shutdown sequence of the enqueue proxy server.
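To check whether every message that the enqueue proxy server accepted was later sent, it can be useful to extract the queue ID and message length from each Enqueue entry and compare the lengths with the lLen values that the sender service logs. The following Python sketch collects the lengths per queue ID. The regular expression is modeled on the sample entries only, and the function name enqueued_lengths is hypothetical.

```python
import re
from collections import defaultdict

# Modeled on entries such as:
# DBG|ASIEPSServer::Enqueue: qid=0x00000001 length=38 returned 1
ENQUEUE = re.compile(r"ASIEPSServer::Enqueue: qid=0x([0-9A-Fa-f]+) length=(\d+)")


def enqueued_lengths(lines):
    """Return {queue ID: [message lengths in log order]}."""
    lengths = defaultdict(list)
    for line in lines:
        match = ENQUEUE.search(line)
        if match:
            queue_id = int(match.group(1), 16)
            lengths[queue_id].append(int(match.group(2)))
    return dict(lengths)


sample = [
    "DBG|ASIEPSServer::ConnectToQueue: QueueName=SC66_Upload.que -> qid=1",
    "DBG|ASIEPSServer::Enqueue: qid=0x00000001 length=38 returned 1",
    "DBG|ASIEPSServer::Enqueue: qid=0x00000001 length=47 returned 1",
    "DBG|ASIEPSServer::Enqueue: qid=0x00000001 length=3 returned 1",
]
per_queue = enqueued_lengths(sample)
print(per_queue)
```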


MVS sender service

The MVS sender service is responsible for extracting any messages that have been entered to the upload queue by the enqueue proxy server and sending them through the APPC session to the OS/390 system using TP ACC1RECV. Figure 16-45 shows the sender service initialization log.

INF|Successfully loaded Service Library (C:\TivoliManager\bin\ASIMVSSenderSvc.dll)
INF|Service Status Report: CurrentState=STARTING WaitHint=15s ExitCode=0 CheckPoint=0
NOT|ASIMVSSenderSvc-SC66 is STARTING...
DBG|Settings: SendWithConfirmWait[0x00007530]
DBG|Settings: RemoteCodePage[0x00000025]
DBG|Settings: LocalCodePage[0x000004E4]
DBG|Settings: Luname[TM39066]
DBG|Settings: Modename[LU62PS]
DBG|Settings: Tpname[ACC1RECV]
DBG|Settings: SymbolicName[ACC1RECV]
DBG|Settings: NumMaxRetries[0xFFFFFFFF]
DBG|Settings: RetryWait[0x00007530]
DBG|Settings: WaitTimeOut[0x0000EA60]
NOT|Initialized queue (Queue File: C:\TivoliManager\Data\Queues\SC66_Upload.que, Max Entries: 65536, Entry Size: 1024)
INF|Service Status Report: CurrentState=RUNNING WaitHint=15s ExitCode=0 CheckPoint=0
NOT|ASIMVSSenderSvc-SC66 is RUNNING...
DBG|WinCPICStartup
DBG|cminit: symbolic_name=[ACC1RECV]
DBG|cminit returned conversation_id=[6057a1e6]
DBG|cmspm: Set_Processing_Mode=[1]
DBG|cmsmn: modename=[LU62PS]
DBG|cmspln: partnerlu=[TM39066]
DBG|cmstpn: tpname=[ACC1RECV]
DBG|cmssl: syncLevel=[1]
DBG|cmallc: Allocate Session
DBG|Wait for cmallc to complete
NOT|Session allocated: conversation_id=6057a1e6
NOT|Session reconnected

Figure 16-45 MVS sender service: Initialization

In Figure 16-45, the following items are observed:

1  The sender service is starting.
2  The APPC settings read from the registry for this sender service instance.
3  The session allocation handshake with the object server on OS/390.
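The sender service prints its numeric settings in hexadecimal, whereas the listener log prints the same kind of values in decimal (for example, RemoteCodePage=[37]). A small Python sketch like the following converts the Settings entries so that the two logs can be compared directly. The regular expression is modeled on the sample entries only, the function name read_settings is hypothetical, and no assumption is made about the units of the wait values.

```python
import re

# Modeled on entries such as: DBG|Settings: RemoteCodePage[0x00000025]
SETTING = re.compile(r"Settings: (\w+)\[([^\]]*)\]")


def read_settings(lines):
    """Return {setting name: value}, with 0x... values converted to integers."""
    settings = {}
    for line in lines:
        match = SETTING.search(line)
        if not match:
            continue
        name, value = match.groups()
        if value.lower().startswith("0x"):
            settings[name] = int(value, 16)
        else:
            settings[name] = value
    return settings


sample = [
    "DBG|Settings: RemoteCodePage[0x00000025]",
    "DBG|Settings: LocalCodePage[0x000004E4]",
    "DBG|Settings: Luname[TM39066]",
    "DBG|Settings: RetryWait[0x00007530]",
    "DBG|Settings: WaitTimeOut[0x0000EA60]",
]
settings = read_settings(sample)
print(settings)
```

With the sample entries, the code pages decode to 37 and 1252, which match the RemoteCodePage and LocalCodePage values shown in the listener log.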


Figure 16-46 shows the sequence of sending the variable registration messages to OS/390 for SC66.

DBG|cmsst: sendType=[0]
DBG|cmsend: lLen=[38], qEntry=[ 02\12\02ENT\03ITSO\040000040001~]
DBG|CHAR DATA[ 0 2 \ 1 2 \ 0 2 E N T \ 0 3 I T S O \ 0 4 0 0 0 0
DBG|ASCII DATA[20202020202030325C31325C3032454E545C30334954534F5C30343030
DBG|EBCDIC DATA[404040404040F0F2E0F1F2E0F0F2C5D5E3E0F0F3C9E3E2D6E0F0F4F0F0
DBG|Wait for cmsend to complete
DBG|cmsend: lLen=[47], qEntry=[ 02\12\02COMP\03POUGHKEEPSIE\04000003000
DBG|CHAR DATA[ 0 2 \ 1 2 \ 0 2 C O M P \ 0 3 P O U G H K E E P S
DBG|ASCII DATA[20202020202030325C31325C3032434F4D505C3033504F5547484B454550
DBG|EBCDIC DATA[404040404040F0F2E0F1F2E0F0F2C3D6D4D7E0F0F3D7D6E4C7C8D2C5C5D
DBG|Wait for cmsend to complete
DBG|cmsend: lLen=[39], qEntry=[ 02\12\02MACH\03SC66\040000050003~]
DBG|CHAR DATA[ 0 2 \ 1 2 \ 0 2 M A C H \ 0 3 S C 6 6 \ 0 4 0 0 0
DBG|ASCII DATA[20202020202030325C31325C30324D4143485C3033534336365C303430
DBG|EBCDIC DATA[404040404040F0F2E0F1F2E0F0F2D4C1C3C8E0F0F3E2C3F6F6E0F0F4F0
DBG|Wait for cmsend to complete
DBG|cmsend: lLen=[42], qEntry=[ 02\12\02LPAR\03PRIMARY\040000050005~]
DBG|CHAR DATA[ 0 2 \ 1 2 \ 0 2 L P A R \ 0 3 P R I M A R Y \ 0 4
DBG|ASCII DATA[20202020202030325C31325C30324C5041525C30335052494D4152595C
DBG|EBCDIC DATA[404040404040F0F2E0F1F2E0F0F2D3D7C1D9E0F0F3D7D9C9D4C1D9E8E0
DBG|Wait for cmsend to complete
DBG|cmsend: lLen=[37], qEntry=[ 02\12\02OS\03SC66\040000070006~]
DBG|CHAR DATA[ 0 2 \ 1 2 \ 0 2 O S \ 0 3 S C 6 6 \ 0 4 0 0 0 0 0
DBG|ASCII DATA[20202020202030325C31325C30324F535C3033534336365C3034303030
DBG|EBCDIC DATA[404040404040F0F2E0F1F2E0F0F2D6E2E0F0F3E2C3F6F6E0F0F4F0F0F0
DBG|Wait for cmsend to complete
DBG|cmsst: sendType=[2]
DBG|cmsend: lLen=[3], szEntry=[99~]
DBG|DISREGARD next 3 lines, 99~ to be sent as "\xF9\xF9\xA1" EBCDIC
DBG|CHAR DATA[9 9 ~ ]
DBG|ASCII DATA[39397E]
DBG|EBCDIC DATA[F9F9A1]
DBG|Wait for cmsend with confirm to complete
DBG|Waiting on queue for more data to send

Figure 16-46 MVS sender service: Variable registration

In Figure 16-46, the following items are observed:

1  Sending ENTERPRISE object
2  Sending COMPLEX object
3  Sending MACHINE object
4  Sending LPAR object
5  Sending OS object
6  Transmission completed indication
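The ASCII DATA and EBCDIC DATA lines in the sender log can be reproduced offline to confirm that a message survives the code page translation. The sketch below uses Python's built-in cp1252 and cp037 codecs, on the assumption that they correspond to the LocalCodePage (1252) and RemoteCodePage (37) values shown in the logs; the function name hex_views is hypothetical.

```python
def hex_views(text):
    """Return (local hex, remote hex) for a message string.

    Assumes code page 1252 locally and code page 37 (EBCDIC) remotely,
    as in the sample logs.
    """
    local_hex = text.encode("cp1252").hex().upper()
    remote_hex = text.encode("cp037").hex().upper()
    return local_hex, remote_hex


# The transmission completed indication from the end of the sample log.
end_ascii, end_ebcdic = hex_views("99~")
# The start of a variable registration message (without leading blanks).
start_ascii, start_ebcdic = hex_views("02\\12\\02ENT")

print(end_ascii, end_ebcdic)
print(start_ascii, start_ebcdic)
```

In this translation the backslash becomes x'E0' and the tilde becomes x'A1', which agrees with the field separator and end-of-message values given earlier in this section.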

16.1.4 Distributed Edition

The TBSM Distributed Edition CD-ROM contains software in Tivoli installable format. You need to use the Tivoli Desktop or the Tivoli winstall command to install it. The 1.1-BSM-0001 patch provides an update to the Tivoli Business Systems Manager Event Enablement to Version 1.1.0.1. The diagram in Figure 16-47 illustrates the TBSM Distributed Edition components.

[Component diagram. Labels shown in the diagram: Endpoint; TMR/Gateway; OS/390; DM Profiles; Distributed Monitoring Gateway; Universal Collection, Unix Collection, NT Collection; NetView; Event Automation Services; TEC server; Tivoli Instrumentation, TIS Base, Tivoli Managers, Quickstart; Enterprise Console; Event Enablement; TBSM/D servers; TBSM Client; agent listener queue; SQL Database; Application Server; Agent Listener; Task Server; Database Server Machine.]

Figure 16-47 Tivoli Business Systems Manager Distributed Edition environment

As shown in Figure 16-47, the managed application component can reside at:
- The Tivoli endpoint. The application is monitored using Tivoli Distributed Monitoring profiles. These profiles can be plain DM profiles, a Tivoli instrumentation profile, or an APM profile created by the Tivoli Module Builder or other tools. A plain DM profile can generate Tivoli Enterprise Console events and send them to the Tivoli Enterprise Console via the DM gateway, while APM profiles generate the Tivoli Enterprise Console events directly.
- The OS/390 system. The application is monitored and managed using the NetView Application Management Interface (AMI). NetView APM messages are generated and forwarded to Tivoli Enterprise Console through the Event Automation Service.

Tivoli Enterprise Console is responsible for all the TBSM Distributed Edition events. The event enablement provides several Tivoli Enterprise Console exits that receive certain event types and forward the events to the Event Enablement subsystem. Table 16-1 shows the available Tivoli Enterprise Console exits.

Table 16-1 Tivoli Enterprise Console exits for event forwarding

Exit       Input event type                   Tivoli Business Systems Manager event type
ihstetec   APM Heartbeat                      APM Heartbeat
ihstmtec   APM Threshold                      APM Threshold
ihstztec   DM Events                          APM Threshold
ihstttec   TBSM Generic events                APM Threshold
ihstctec   APM Connection Change (GEM only)   N/A
ihststec   APM System layer (GEM only)        N/A

Tivoli Business Systems Manager only processes two types of events: APM Heartbeat and APM Threshold. These events are received by the Agent Listener service, which runs on the database server. The appropriate Tivoli Enterprise Console ruleset, for example, interapp.rls, invokes the Tivoli Enterprise Console exit to send APM events to the Event Enablement subsystem. Event Enablement sends the event to all its subscribers, which include the Agent Listener from Tivoli Business Systems Manager.

Tivoli Business Systems Manager does not support the following APM events. If Tivoli Business Systems Manager receives events of these types, they are discarded:
- Connection change (APM_CONNECTION_CHANGE)
- Component information change (APM_CI_CHANGE)

NetView commands and Tivoli Management Framework tasks are invoked by the Task Server. NetView commands are forwarded to NetView using the NETCONV session. Tivoli tasks are executed directly by the Task Server using the Tivoli Management Framework facility.


TBSM Distributed Edition component details

This section describes the detailed processing of each component in the TBSM Distributed Edition. The components discussed here are:
- Endpoint monitoring
- Event enablement
- Agent Listener

Endpoint monitoring The endpoint runs the lcfd as the main endpoint executable and the dm_ep_engine as the Tivoli Distributed Monitoring engine. TBSM Distributed Edition uses Distributed Monitoring to schedule AMS monitors. TBSM Distributed Edition can also receive events from a basic DM monitoring profile. The following lists the information sources for endpoint operation: 򐂰 lcfd.log

This file provides general information about the TMA endpoint. It is located in %Installdir%\dat\1. 򐂰 DM36.log

This file contains basic information about the execution of monitors. If some monitor fails to run, information is recorded here. It is located at %Installdir%\dat\1. 򐂰 wlseng -l

This command can be issued from the TMR Server or managed node. It shows the installed monitors and their characteristics. It is the main source of information about installed and running monitors. 򐂰 APM log files

These log files contains information about specific APM monitoring profiles. These APM monitors do not use normal DM mechanism because they do not return information to DM. DM is used only for scheduling. The monitor usually runs a shell script or a Java program. Event information will be passed directly to Tivoli Enterprise Console using wpostemsg or idlcall. In our example, we have log files for every GEM enabled product, such as Tivoli Manager for Domino and Tivoli instrumentation service: – %Installdir%\dat\1\Domino_APMHeartbeatMonitor.log – %Installdir%\dat\1\Domino_Query_State.log – %Installdir%\dat\1\tmp\Tivoli_Instrumentation_Service_APMHeartbeatMonit or.log – %Installdir%\dat\1\tmp\Tivoli_Instrumentation_Service_State.log


When other management products are installed, additional log files are available. Use a file search to locate them. Usually, they are located within the Tivoli directory tree under %Installdir%.
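One way to perform such a search is a short script that walks the installation directory and lists every log file. This is a sketch under our own assumptions: the function name is hypothetical, and the directory passed in stands for whatever %Installdir% resolves to on the endpoint.

```python
import os


def find_logs(install_dir, suffix=".log"):
    """Return the sorted paths of all files under install_dir ending in suffix."""
    matches = []
    for root, _dirs, files in os.walk(install_dir):
        for name in files:
            if name.lower().endswith(suffix):
                matches.append(os.path.join(root, name))
    return sorted(matches)


if __name__ == "__main__":
    import sys

    # Pass the endpoint installation directory as the first argument.
    for path in find_logs(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

The search is case-insensitive on the suffix, so files such as DM36.LOG are also reported.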

Event enablement and Tivoli Enterprise Console
Events are forwarded to Tivoli Enterprise Console through Distributed Monitoring or directly from APM monitors using wpostemsg. You can see the event in the appropriate Tivoli Enterprise Console event group, or use an event group where no filtering is defined. When the event does not show up, use the wtdumprl command to find out whether it has arrived at the Tivoli Enterprise Console. When you can see the event on your display, you can check the action status for forwarding the event to event enablement by clicking View action status. This panel shows which event enablement exit was taken and whether it was successful. See also wtdumptr for a listing of the actions taken for all events.
• wtdumprl
  This command displays all received events and their disposition: whether they were processed or failed parsing. A parsing failure means that the class is not defined, or that the event contains an unknown slot (field) or an unparsable slot.
• wtdumptr
  This command displays the completion status of the actions invoked from the Tivoli Enterprise Console rules for an event, for example, the completion code for invoking the Tivoli Business Systems Manager event enablement exits.

Event enablement is the interface between Tivoli Enterprise Console and Tivoli Business Systems Manager. Events are received from the Tivoli Enterprise Console and are transferred to the Tivoli Business Systems Manager Agent Listener. When an event has passed the Tivoli Enterprise Console but did not arrive at Tivoli Business Systems Manager, we have to check whether it has been transformed and forwarded by event enablement. Two types of log files are available for the event enablement and task server, and they are stored in $BINDIR/TDS/EventService/log:
• Message logs
  Message logs are normal ASCII text files and can be viewed with a standard editor. The message log files are:
  – ihseemsg.log
  – ihstsmsg.log


• Error logs
  Error logs are used for logging internal errors and debugging information. They contain binary data, so they are not readable with a standard editor. Use the ihszfmt program to format and print the error logs. The debugging information in the error logs is controlled by the tserver command. The ihszfmt and tserver commands reside in $BINDIR/TDS/EventService/bin. The following error logs are available:
  – ihseeerr.log
  – ihstserr.log

  To start recording diagnostic information, use the command:
    tserver ee_utility -t

  To turn off recording diagnostic information, use the command:
    tserver ee_utility -n

Agent Listener
When an event has passed event enablement, it is sent to the input queue on the database server machine. The Agent Listener gets the event information from the queue. The following items are sources of information for the Agent Listener processing:
• GEM event enablement configuration
  The gemeeconfig command can be executed to verify that the Agent Listener is connected to the correct Tivoli Enterprise Console machine.
• Queue status
  The queue status can be seen using the dumpfqueue command. Most of the time, the queue should be empty. Sometimes, when a lot of messages are in the queue, the dequeue mechanism fails and we have to dequeue them manually using the dequeue command.
• Agent Listener log file
  The Agent Listener operation is stored in a log file with a prefix of AL*. Figure 16-49 on page 736 shows the initialization of Agent Listener.


INF|Initializing Rule Engine.
DBG|Aquiring registry keys.
DBG|Initializing database connection.
DBG|Database Settings: SERVER=[ibmtiv5], USER=[sa], APPLICATION=[ASIAgentListenerSvc]
DBG|Connecting to database with connection [Provider=MSDASQL;DRIVER=SQL Server;SERVER=ibmtiv5;APP=ASIAgentListenerSvc], integrated security=0...
DBG|Registering for ADO Connection events...
DBG|Opening database with connection [Provider=MSDASQL;DRIVER=SQL Server;SERVER=ibmtiv5;APP=ASIAgentListenerSvc], integrated security=0...
DBG|ADO Connection Open.
DBG|Successfully connected to database as [ASIAgentListenerSvc].
DBG|Changing database to [Object]...
DBG|Successfully changed database to [Object].
DBG|Database Client Process ID (SPID) = 36
DBG|ANSI_WARNINGS setting is OFF
DBG|QUOTED_IDENTIFIER setting is OFF
DBG|Changing database to [Object]...
DBG|Successfully changed database to [Object].
DBG|Initializing queue. FILE=[D:\TivoliManager\Data\Queues\AgentListener.que], MAXENTRIES=[70000], CELLSIZE=[2048]
DBG|Instantiating COM ASIRuleEngine object.
DBG|Starting router thread...
DBG|Starting command loop thread...
DBG|GetNextEvent: waiting for message
DBG|Instantiating Rule Engine Event Handler.
DBG|Loading CLIPS initialization file(s).
DBG|GetNextEvent: got message "(load D:\TivoliManager\Data\Rules\persistence.deffunction.dummies.rule.clp)"
DBG|GetNextEvent: waiting for message
DBG|GetNextEvent: got message "(load D:\TivoliManager\Data\Rules\persistence.output_all_quiet.rule.clp)"
DBG|GetNextEvent: waiting for message
DBG|Waiting for all of the initialization files to be processed...
DBG|GetNextEvent: got message "(load-facts D:\TivoliManager\Data\Rules\persistence.off.fact.clp)"

Figure 16-48 Initialization of Agent Listener

The following are shown in Figure 16-49 on page 736:
1  The Agent Listener is actually a rule engine.
2  It loaded the rule files, including APM based rule files.


DBG|GetNextEvent: waiting for message
DBG|GetNextEvent: got message "(load D:\TivoliManager\Data\Rules\AllEvent.clp)"
DBG|GetNextEvent: waiting for message
DBG|GetNextEvent: got message "(load D:\TivoliManager\Data\Rules\APMConnection.clp)"
DBG|GetNextEvent: waiting for message
DBG|GetNextEvent: got message "(load D:\TivoliManager\Data\Rules\APMGeneric.clp)"
DBG|GetNextEvent: waiting for message
DBG|GetNextEvent: got message "(load D:\TivoliManager\Data\Rules\APMHeartBeat.clp)"
DBG|GetNextEvent: waiting for message
DBG|GetNextEvent: got message "(load D:\TivoliManager\Data\Rules\APMThreshold.clp)"
DBG|GetNextEvent: waiting for message
DBG|GetNextEvent: got message "(load D:\TivoliManager\Data\Rules\asisystemtime.output_all_quiet.rule.clp)"
DBG|GetNextEvent: waiting for message
DBG|GetNextEvent: got message "(load D:\TivoliManager\Data\Rules\rule_base_initialized.rule.clp)"
DBG|GetNextEvent: waiting for message
DBG|GetNextEvent: got message "(load-facts D:\TivoliManager\Data\Rules\rule_base_initialized.fact.clp)"
DBG|GetNextEvent: waiting for message
DBG|GetNextEvent: got message "(run)"
DBG|GetNextEvent: waiting for message
DBG|All of the initialization files have been processed. Continuing with initialization...
DBG|Initializing the Equeuers.
DBG|Instantiating the Enqueue Site.
DBG|Opening the Enqueuers registry key.
DBG|GetNextEvent: got message "( assert ( ASISystemTime (SYSTEMTIME_VALUE 1030397874) (SYSTEMTIME_STRING "08/26/2002 15:37:54") ) )"
DBG|GetNextEvent: waiting for message
DBG|Creating the event dequeue thread.
INF|Service started. Entering Windows Message Loop.

Figure 16-49 Agent Listener: Initialization items
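When an Agent Listener log is long, it can help to extract only the rule files that were loaded and confirm that the APM rule files are among them. The following is a minimal sketch, assuming log lines in the LEVEL|message form shown in Figure 16-48 and Figure 16-49; real log files may carry additional leading fields, and the function name is our own.

```python
import re

# Matches the CLIPS load commands as they appear in the figures, for example:
#   got message "(load D:\TivoliManager\Data\Rules\APMHeartBeat.clp)"
LOAD_RE = re.compile(r'got message "\((load|load-facts) (.+?)\)"')


def loaded_rule_files(log_lines):
    """Return (command, path) pairs for every rule or fact file loaded."""
    loaded = []
    for line in log_lines:
        level, _sep, message = line.partition("|")
        if level != "DBG":
            continue  # load messages are written at debug level in the figures
        match = LOAD_RE.search(message)
        if match:
            loaded.append((match.group(1), match.group(2)))
    return loaded


if __name__ == "__main__":
    sample = [
        'DBG|GetNextEvent: waiting for message',
        r'DBG|GetNextEvent: got message "(load D:\TivoliManager\Data\Rules\APMHeartBeat.clp)"',
        r'DBG|GetNextEvent: got message "(load D:\TivoliManager\Data\Rules\APMThreshold.clp)"',
        'DBG|GetNextEvent: got message "(run)"',
    ]
    for command, path in loaded_rule_files(sample):
        print(command, path)
```

If APMHeartBeat.clp or APMThreshold.clp is missing from the result, the Agent Listener did not load the rules it needs for the two event types it processes.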

TBSM Distributed Edition database extension
This section discusses the extension of the Tivoli Business Systems Manager database for AMS resources. The discussion covers AMS object types, AMS class implementation, and AMS tables.

AMS object types
The Application Management Specification (AMS) functionality is implemented in Tivoli Business Systems Manager by extending its database data structure. The Tivoli Business Systems Manager abstract object is created as the base class of the GEM objects. These classes have the hierarchy shown in Figure 16-50 on page 737.


Figure 16-50 GEM object classes in Tivoli Business Systems Manager (classes shown: GEM Software Component GMSC, GEM AMS object GMGM, GEM Generic GMGN, Distributed Monitoring GMDM, GEM Mainframe GMMF, GEM components Gxxxx)

The following sections discuss each object class shown in Figure 16-50.

Application Policy Management (APM) resources
APM resources are defined through AMS definitions, such as the components created with the Tivoli Module Builder and Designer. Many GEM enabled software products are instrumented with AMS. These products, if not predefined in Tivoli Business Systems Manager, can be manually defined in Tivoli Business Systems Manager using the (xdf)parser utility. This utility interprets the AMS definitions and creates SQL definitions that extend the Tivoli Business Systems Manager data model. The (xdf)parser utility is implemented in Java and has its own JRE. See Table 16-2 on page 738 for the definitions resulting from each AMS definition file type.


Table 16-2 AMS types

AMS file type   Tivoli Business Systems Manager definitions
bsdf            Line of Business (LOB) Definitions, depending on the type, Business System,
                Application, or Middleware.
bssdf           Line of Business Definitions.
bmdf            Ties the component to its LOB and provides instance filtering for the LOB.
bcdf            No correlated Tivoli Business Systems Manager definitions are created.
gdf             Defines and provides the name of the default task library.
cdf             Tivoli Business Systems Manager object type for the component name and version.
                Associates the icon used at the console. Tasks are added to the menu items for
                the instance and the names of alternate task libraries are specified. A LOB is
                created for the manufacturer of the component.

Monitoring definitions are not extracted from the AMS definitions, because monitors are mapped to the component by their product name. Events resulting from distributed monitors appear as exceptions within the Tivoli Business Systems Manager object instance. To add monitors to a Tivoli Business Systems Manager object, you can create them within the monitoring profile where the APM Heartbeat monitor is defined.

Distributed Monitoring resources
Distributed Monitoring resources are not defined with AMS description files. To define monitored resources to Tivoli Business Systems Manager, we use the gemdmmap command. With this command, we create a new class for a software component and a monitoring collection that is associated with it. More than one monitoring collection can be associated with one software component, but a monitoring profile cannot be associated with more than one software component.

Distributed Monitoring profiles can also be added to APM defined software components. For example, to associate a monitoring profile for Domino monitors with the APM defined instrumentation of Tivoli Manager for Domino, we use the gemdmmap command to create an association between the monitoring profile and the APM-based Domino object class. The sentry events that are sent by the profile are processed as APM Threshold exceptions. The instance of the software component must be created with the APMHeartbeat event. Distributed Monitoring events correlate to APM instances when the monitoring profiles are distributed to the same managed node or endpoint where the APM resources are located.
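The association rule described above (several profiles may map to one software component, but a profile may map to only one component) can be illustrated with a small model. This is only a sketch of the constraint: the class, its methods, and the profile and component names are hypothetical and do not reflect how gemdmmap or the Tivoli Business Systems Manager database is implemented.

```python
class ProfileMap:
    """Toy model of the profile-to-component association rule."""

    def __init__(self):
        self._component_of = {}  # profile name -> software component class

    def associate(self, profile, component):
        """Associate a profile with a component; reject a second component."""
        current = self._component_of.get(profile)
        if current is not None and current != component:
            raise ValueError(
                "profile %r is already associated with %r" % (profile, current)
            )
        self._component_of[profile] = component

    def profiles_for(self, component):
        """Return every profile associated with the given component."""
        return sorted(p for p, c in self._component_of.items() if c == component)


if __name__ == "__main__":
    mapping = ProfileMap()
    mapping.associate("DominoMonitors", "Domino")    # hypothetical names
    mapping.associate("DominoDiskSpace", "Domino")   # a second profile is fine
    print(mapping.profiles_for("Domino"))
```

Attempting to associate DominoMonitors with a second component raises an error in this model, which mirrors the restriction stated in the text.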


Tivoli Rule Engine API
The Tivoli Rule Engine API provides the ability to integrate a generic Tivoli Enterprise Console event into Tivoli Business Systems Manager objects as a generic GEM object class. Events that are not triggered by Distributed Monitoring can also be forwarded to Tivoli Business Systems Manager and defined as a separate software component using this API. You define a generic GEM object using gemgenprod.sh. The Tivoli Enterprise Console event forwarding should be performed using the ihstttec exit.

Mainframe objects
This class represents APM objects that are instrumented using NetView for OS/390 AMI. The objects in this class are created under the Operating System object under the Complex - Machine - LPAR hierarchy based on the content of the hostname field.

AMS class implementation
The Tivoli Business Systems Manager database has been preinitialized with several GEM object types supplied by Tivoli products. The GEM components in Tivoli Business Systems Manager are defined with a class ID of Gxxx, where xxx is any combination of alphanumeric characters. The class name is represented as Gxxxcname. They are implemented in a set of tables, as with an ordinary Tivoli Business Systems Manager object. Figure 16-51 on page 740 shows the tables for GEM object G02H.


Figure 16-51 Tables for CID G02H

Some of the important tables shown in Figure 16-51 are:

G02H_ID
  This table contains a single number that represents the highest instance ID number in the class table. Whenever an instance is created for a class, the content of this table must be incremented.

G02Hcname_C
  The class table that contains the instances of this class.

G02Hcname_S
  The setting table that contains a single row representing the class-wide attributes. Some examples in this category are icon definition, message tables, and propagation matrix limits.
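The naming pattern of these tables can be sketched as a small helper that derives the three table names from a class ID. This is an illustration under assumptions: we treat cname as a placeholder that is passed in as a parameter, and the function names are our own. Check the actual table names in the database before relying on the pattern.

```python
def gem_class_tables(cid, cname):
    """Derive the ID, class, and setting table names for a GEM class.

    cid   -- class ID such as "G02H"
    cname -- class name portion (assumed to be a placeholder in the text)
    """
    return {
        "id": "%s_ID" % cid,                  # highest instance ID for the class
        "class": "%s%s_C" % (cid, cname),     # one row per instance
        "setting": "%s%s_S" % (cid, cname),   # single row of class-wide attributes
    }


def next_instance_id(highest_id):
    """Return the value the ID table must hold after creating one instance."""
    return highest_id + 1


if __name__ == "__main__":
    print(gem_class_tables("G02H", "cname"))
    print(next_instance_id(41))
```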

AMS tables
For the GEM based components, there are several additional tables that define important information on the classes and instances. These tables can be categorized as being used for:
• Finding and locating GEM classes and instances
• Placement of object instances
• Automatic creation of Line of Business views


These additional tables are useful for finding and locating these GEM objects:
• GEMLookupCID
  This table contains information about the GEM based classes in Tivoli Business Systems Manager. These classes are named Gxxxx, with a long name of Gxxxxcname. Figure 16-52 shows the content of a GEMLookupCID table.

  Figure 16-52 GEMLookupCID

  The Manufacturer, Product, and Version columns contain unique information that identifies the component. Tasklib contains the name of the Task Library where a task for this component can be invoked. Comptype indicates what type of GEM component it is: gem, os390, gen, or dm.
• GEM_IDlookup
  This table shows the argument that matches an event attribute with the GEM object and its parent. Figure 16-53 on page 742 shows sample content of this table.


Figure 16-53 GEM_IDlookup

  Some instances can be located from their TCP/IP host names, the sub-source field, or from their endpoint IDs in the TMR.
• GEM_DMtoCID
  This table maps the DM profile to the GEM object class ID. When an event from a DM profile is received, as indicated in the sub-source slot, it is applied to the GEM object in the specified class. Figure 16-54 shows sample content of the GEM_DMtoCID table.

Figure 16-54 GEM_DMtoCID


The Enterprise outliner view for a GEM object has the following hierarchy: BUSC - Enterprise - Network Region - Network Location. These tables are used to get the necessary information for placing the GEM object in the Enterprise Outliner:

GEM_EEHostToEnterprise
  Maps the event enablement host name to the Enterprise object. We map our machine brewster to ITSO enterprise.

GEM_LocationToRegion
  The Network Region object name is derived from the location. The default derivation is to take the second qualifier of the location name.

GEM_HostnameToLocation
  The TCP/IP host name is used to get the Network Location parameter. The default location is derived from the second and third part of the TCP/IP host name.
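The default derivations described for these tables can be worked through for the host shark.itsc.austin.ibm.com. The sketch below only models the default rules as stated in the text (location from the second and third parts of the host name, region from the second qualifier of the location); the function names are our own, and the enterprise name is passed in because it comes from the GEM_EEHostToEnterprise mapping rather than from the host name.

```python
def location_from_hostname(hostname):
    """Default Network Location: second and third parts of the host name."""
    parts = hostname.split(".")
    return ".".join(parts[1:3])


def region_from_location(location):
    """Default Network Region: second qualifier of the location name."""
    qualifiers = location.split(".")
    return qualifiers[1] if len(qualifiers) > 1 else qualifiers[0]


def outliner_path(hostname, enterprise):
    """Return the placement path BUSC - Enterprise - Region - Location - host."""
    location = location_from_hostname(hostname)
    return ["BUSC", enterprise, region_from_location(location), location, hostname]


if __name__ == "__main__":
    print(" - ".join(outliner_path("shark.itsc.austin.ibm.com", "ITSO")))
```

For this host, the location is itsc.austin and the region is austin, so resources are placed under BUSC - ITSO - austin - itsc.austin - shark.itsc.austin.ibm.com.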

For example, for resources residing in shark.itsc.austin.ibm.com, Tivoli Business Systems Manager creates the following hierarchy:

BUSC - ITSO - austin - itsc.austin - shark.itsc.austin.ibm.com - resources

The following tables control the automatic grouping of GEM objects into Line of Business views:
• GEM_InstFiltering
  This table provides the instance filtering definition for LOB grouping. When an instance is created, all rows in this table that apply to the instance are evaluated. Each row evaluates the value in memPattern based on the value in memExpr, and decides whether or not to include the object in an LOB view. Figure 16-55 on page 744 shows part of the GEM_InstFiltering table.


Figure 16-55 GEM_InstFiltering

  As shown in Figure 16-55, for the cid of G00H (Domino object):
  – For the BSDF level, the Domino object will always be defined in the TIVOLI Domino 1.0.0 LOB.
  – For the BCDF level, when the HB_DATA1 slot contains the text Mail, the object will also be grouped under the TIVOLI Lotus 1.0.0 E-MailTIVOLIDomino1.0.0 LOB view.
  – For the BCDF level, when the HB_DATA2 slot contains the text Rep, the object will also be grouped under the TIVOLI Lotus 1.0.0 ReplicationTIVOLIDomino1.0.0 LOB view.
• GEM_LOB_BCDF
  This table lists the BCDF LOB View that can be created for each BCDF triplet that comes in an APMheartbeat message. Figure 16-56 on page 745 shows some sample content of this table.


Figure 16-56 GEM LOB BCDF

• GEM_LOB_Lookup
  This table lists all the possible LOB objects that can be created with the GEM components and what their parents should be. The LOBID and parentLOBID columns can contain 0 or the LOB instance ID referring to the LOB_C table. A value of 0 means that the LOB is not yet created and needs to be created in the object discovery process. Figure 16-57 shows some sample content of our GEM_LOB_Lookup table.

Figure 16-57 GEM LOB Lookup


16.2 Configuration of Tivoli Business Systems Manager
Tivoli Business Systems Manager is available as an ordinary program product from the IBM software ordering process with an order number of 5698-BSM. The TBSM base services come as a CD-ROM for the NT component. There are six server types for TBSM base services:

SNA server
  This machine manages the SNA connection to the Tivoli Business Systems Manager Enterprise Edition.

Event server
  This machine receives, processes, and reacts to OS/390 events. It also runs SNA client software.

Database server
  This machine provides the heart of the processing of Tivoli Business Systems Manager in the Microsoft SQL Server database.

Application server
  This machine manages the client workstation connections.

Propagation server
  This machine processes events and calculates the necessary propagation action to be taken.

History server
  This machine replicates the operational Tivoli Business Systems Manager database in the database server for reporting and analysis.

For the Enterprise Edition implementation, all six servers are required. For the Distributed Edition, only two servers are required: the application server with the propagation function, and the database server.

16.2.1 Prerequisites
The complete set of prerequisites is in the IBM Tivoli Business Systems Manager Release Notes Version 2.1, SC23-4841. Table 16-3 on page 747 lists the hardware and software configuration that we used.


Table 16-3 Software configuration

Software                                  Database  Application  Propagation  Event   SNA
                                          server    server       server       server  server
NT Server 4.0 Service Pack 6a             x         x            x            x       x
NT 4.0 Server Resource Kit Supplement 3   x         x            x            x
MS SNA Server 4.0 Service Pack 2                                                      x
MKS Toolkit 6.2                           x         x            x            x

The remaining table cells list the following software:
• MS SQL Server Version 7.0 SP #2 (client only, for health monitor)
• MS SQL Server Version 7.0 SP #2 (client only)
• Microsoft Internet Explorer Version 5
• MS SQL Server Version 7.0 SP #2, Microsoft IIS 4.0, Microsoft Internet Explorer Version 5
• Microsoft Internet Explorer Version 5
• MS SNA Client 4.0 Service Pack 2, Microsoft Internet Explorer Version 5

16.2.2 NT Servers installation
This section provides a summary of the installation and configuration requirements for the NT servers for Tivoli Business Systems Manager. Refer to the section “Tivoli Business Systems Manager Configurations” in IBM Tivoli Business Systems Manager Installation and Configuration Version 2.1, GC32-0800 for the detailed installation process of each server. We recommend that you install the Tivoli Business Systems Manager NT servers:
• With a dedicated IP address, not DHCP
• In a single NT domain or workgroup
• With matching computer name and host name

Table 16-4 on page 748 shows the summary of TBSM base services components that are installed.


Table 16-4 List of TBSM base services components installed

Component                   Database  Application  Propagation  Event
                            server    server       server       server
Workstation program files   x         x            x            x
Help files                  x         x            x            x
Tools and Utilities         x         x            x            x

The remaining table cells list the following server-specific components:
• SQL extension files
• Propagation Agent Component
• Staged Event Loader
• Application Server files
• NT Agent Listener
• Additional Management Files
• Mainframe monitoring component
• TBSM xdf parser
• Mainframe file receiver component
• TBSM TDS files
• Message sender

The Microsoft Internet Information Server (IIS) is a prerequisite of the Tivoli Business Systems Manager Reporting system and Help system. The reporting system accesses all data stored in the history database server. Because we did not install the History server, we installed IIS on the SQL Database server instead.

In general, the installation of Tivoli Business Systems Manager components on the NT servers uses the setup.exe file. We always used the custom installation option because we wanted to select the appropriate components, as indicated by Table 16-4. Should you want to combine some of the server functions, install the combined list of components.

Most of the Windows NT components of Tivoli Business Systems Manager are installed as Windows NT services. You can remotely start or stop those NT services using the SC command that is supplied with the Windows NT 4.0 Resource Kit. See the IBM Tivoli Business Systems Manager Administrator's Guide Version 2.1, GC32-0799 for details about remote administration of Tivoli Business Systems Manager NT services.

Be aware that no Tivoli Business Systems Manager component is installed on the SNA server. When the installation is complete, the services or programs related to Tivoli Business Systems Manager shown in Table 16-5 on page 749 should be found on the servers.
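The remote start and stop of NT services with the SC command can be wrapped in a short script when several servers have to be handled. The following is a minimal sketch: it builds the usual `sc \\server action service` command line and runs it with the standard subprocess module. The function names are our own, the server and service names are examples taken from this chapter, and the script assumes that the SC command is available on the path and that the caller has administrative rights on the target server.

```python
import subprocess

ALLOWED_ACTIONS = ("start", "stop", "query")


def sc_command(server, action, service):
    """Build the SC command line for a remote NT service operation."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError("unsupported action: %r" % action)
    return ["sc", "\\\\" + server, action, service]


def run_sc(server, action, service):
    """Run the SC command and return the completed process (Windows only)."""
    return subprocess.run(
        sc_command(server, action, service),
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    # Show the command that would be issued; call run_sc() to execute it.
    print(" ".join(sc_command("ITSOVAS2", "query", "ASIAgentListenerSvc")))
```

Building the command separately from running it makes it possible to review the exact command line before anything is stopped on a production server.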


Table 16-5 Tivoli Business Systems Manager related programs and services Database server

Application server

Propagation Server1

Event server4

SNA server4

-

-

SnaBase

SnaBase

Pre-requisite programs MSSQLServer

TPSTART.EXE TBSM base services ASIDBValidater

ASIDBValidater

ASIDBValidater

ASIDBValidater

ASIRuleSvc

ASIApplicationSvc

ASIRemote ExecutionServer

ASIEnqueueProxy Server

ASIPADispatcher

ASIEnqueueProxy Server

ASIMVSIPListener Svc

ASIStagedEvent Loader

ASIHealthMonitor3 ASIMonitorEvent Handlers3 ASIMonitorPA Blocking3

ASIMVSUpload RuleSvc

PAgent.exe

ASIServiceApp. EXE -n ASIMVSListener.dll

ASIAgentListener Svc 2

-

ASIMVSSenderSvc -5

ASITaskServer ASIEvent Enablement ASITSDEvent HandlerSvc Additional program -

-

-

5

1. For Distributed Edition, the Remote Execution Server and Enqueue Proxy Server reside in the Application Server.
2. Agent Listener services are only used for the Distributed Edition.
3. These three services are for Health monitoring; they are installed with the Additional Management component.
4. The event server and SNA server are only used for the Enterprise Edition.
5. MVS Sender Services and MVS Listener will not be available until the MakeMVScomponent.sh is run for a specific OS/390 image.


16.3 Problem determination of Tivoli Business Systems Manager
In this section, we discuss general problem determination methods and TBSM base services log files.

16.3.1 General problem determination techniques
In general, most of the problems related to Tivoli Business Systems Manager can be analyzed by using the following tools:
• Health monitoring system. The health monitoring system monitors the general status of Tivoli Business Systems Manager components and returns an easy to use color coded display.
• For further problem determination, refining the log files by changing the logging level may be necessary.
• Database calls may be trapped by tracing the SQL calls made by a certain process. Use the tracing tools from Microsoft SQL Server.
• Communication problems may need further debugging using the Microsoft SNA server trace and the queue files for Listener and Sender.

16.3.2 TBSM base services logging
To debug problems in the Tivoli Business Systems Manager processes, you can change the logging level for each process. The log file is always stored in the Logs subdirectory of the Tivoli Business Systems Manager installation directory. Each process has its own prefix for the log file name, followed by the time stamp of when the process was started. Table 16-6 on page 751 lists the file name associated with each process.


Table 16-6 Tivoli Business Systems Manager services

Process name                               Log prefix      Function
ASIDBValidater                             -               Regularly checks database availability, suspends other
Tivoli BSM Database Validater                              services when the database becomes unavailable.
ASIPADispatcher                            PD              Starts and stops propagation agents, sends event to be
Tivoli BSM Propagation Agent Dispatcher                    propagated to the agents.
ASIEnqueueProxyServer                      EPS             Provides remote file queueing capability for TBSM,
Tivoli BSM Enqueue Proxy Server                            resides in the event server for the Sender services and
                                                           the propagation server for Propagation agents.
ASIRemoteExecutionServer                   RX              Providing remote execution and remote kill function for
Tivoli BSM Remote Execution Server                         propagation agent.
PAgent.exe                                 PA              Propagation agent that calculates the propagation of
                                                           events, exceptions, and child events.
ASIApplicationSvc                          AS              Manages client workstation display and connection.
Tivoli BSM Application Server
ASINotificationSvc                         NS              Sends notification to workstation on object addition,
Tivoli BSM Notification Server                             deletion, and state changes.
ASIMVSListenerSvc                          LS MVSL_nnnn_   The listener process that is invoked by the TPSTART
                                                           utility.
ASIMVSEventHandlerSvc-nnnn                 MVSE_nnnn_      Storing events from OS/390 to TBSM database using the
Tivoli BSM MVS EventHandlerSvc-nnnn                        staged event loader. Events are received from the MVS
                                                           listener queues.
ASIMVSUploadRuleSvc                        MVSURS          Evaluates an OS/390 message and invokes the appropriate
Tivoli BSM MVSUpload Rule Server                           operation based on the message to generate a reply.
ASIMVSSenderSvc-nnnn                       MVSS_nnnn_      Sending reply messages back to the Source/390
Tivoli BSM MVSSenderSvc-nnnn                               subsystems.
ASIStagedEventLoader                       SEL             Loads events to TBSM database; they can be OS/390 events
Tivoli BSM Staged Event Loader                             sent from the event handler or Tivoli Enterprise Console
                                                           events from the Agent Listener.
ASIMVSIPListener                           IPL             Receives files from OS/390 and executes a command to
Tivoli BSM MVSIPListener                                   process the file.
ASIAgentListener                           AL              Receiving and processing Tivoli Enterprise Console events
Tivoli BSM Agent Listener                                  for APM_Heartbeat and APM_Threshold class.
ASIRuleSvc                                 RLS             Batch rule processing.
Tivoli BSM Rule Server
ASITaskServer                              ihstsmsg.log    Receiving and executing NetView commands or Tivoli
Tivoli BSM Task Server                     ihstserr.log    Management Framework tasks.
ASIEventEnablement                         ihseemsg.log    Forwarding from Tivoli Enterprise Console to Agent
Tivoli BSM Event Enablement                ihseeerr.log    Listener.
TSDEventHandlerSvc                         TBSMTSD.log     TSD event support.
Tivoli BSM TSD Event Handler

Note: When the full file name is shown, the process does not conform to the Tivoli Business Systems Manager logging mechanism described in this section.

The log setting is stored in the NT registry. The logging level is the value of LogLevel in the Log sub-key. There are six levels of logging in Tivoli Business Systems Manager. Level 0 is the most complete and is only recommended for debugging. The default logging level is 2, which logs only major events, such as start, stop, and errors. Figure 16-58 on page 753 shows the registry tree for the Tivoli Business Systems Manager log setting.


Figure 16-58 Log setting for a service

If the LogLevel parameter for a component does not exist, it gets the setting from the upper level setting. For example, the ASIMVSListenerSvc has a log setting under the ITSOVAS2 host, as indicated by (1) in Figure 16-58. If it does not contain the LogLevel value, the LogLevel will be retrieved from (2) as the setting for all ASIMVSListenerSvc or, if needed, from the global setting for all Tivoli Business Systems Manager processes in (3). We recommend that instead of setting the global LogLevel value, you add the LogLevel value on each of the components you want to trace so that the setting is in (2). Add the value using the Edit -> Add Value menu and put in the appropriate value, as shown in Figure 16-59 on page 754.
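The lookup order for LogLevel (host-specific key, then the key for the service, then the global key) can be expressed as a simple fallback chain. The sketch below models each registry key as a dictionary of its values rather than reading the real registry; the function name is our own, and falling back to the documented default of 2 when no key defines LogLevel is an assumption.

```python
DEFAULT_LOG_LEVEL = 2  # documented default; used here when no key sets LogLevel


def effective_log_level(host_key, service_key, global_key):
    """Return the LogLevel that applies, searching from most to least specific.

    Each argument is a dict of the values stored under one registry key:
    (1) the host-specific key, (2) the key for the service, (3) the global key.
    """
    for key in (host_key, service_key, global_key):
        if "LogLevel" in key:
            return key["LogLevel"]
    return DEFAULT_LOG_LEVEL


if __name__ == "__main__":
    # LogLevel set only at the service level, as recommended in the text.
    print(effective_log_level({}, {"LogLevel": 0}, {"LogLevel": 2}))
```

Setting the value at level (2) affects only that service, which is why it is preferred over changing the global value when tracing a single component.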


Figure 16-59 Adding LogLevel value
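The LogLevel lookup order described above (host-specific key, then service-wide key, then global key) can be sketched as follows. This is only an illustration that uses a plain dictionary as a stand-in for the registry; the dictionary layout and the function name are assumptions and do not reflect the real registry key names.

```python
from typing import Optional

# Hypothetical in-memory stand-in for the registry tree. The layout is
# illustrative only; real key names must be taken from your installation.
REGISTRY = {
    "global": {"LogLevel": 2},                      # setting (3)
    "services": {
        "ASIMVSListenerSvc": {
            "LogLevel": 1,                          # setting (2)
            "hosts": {"ITSOVAS2": {}},              # setting (1), no LogLevel
        },
    },
}


def resolve_log_level(registry: dict, service: str, host: str) -> Optional[int]:
    """Return the first LogLevel found: host (1), service (2), then global (3)."""
    svc = registry.get("services", {}).get(service, {})
    host_entry = svc.get("hosts", {}).get(host, {})
    for scope in (host_entry, svc, registry.get("global", {})):
        if "LogLevel" in scope:
            return scope["LogLevel"]
    return None


print(resolve_log_level(REGISTRY, "ASIMVSListenerSvc", "ITSOVAS2"))
```

Because the host entry has no LogLevel value, the lookup falls back to the service-wide value, which is the placement the text recommends for tracing.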

The log file format is also stored in the registry as the value of LogHeading. An example is the LogHeading for Notification Services:

%Y/%m/%d %H:%M:%S|%t|%O|%o|NS|%i|%F|%L|

A sample log message is:

2000/02/17 18:14:44|767|INF|1|NS|e2|B:\Services\NT\ASIServiceApp\ASIServiceAppDLL\ASIServiceApp.cpp|781|Successfully loaded Service Library (d:\TivoliManager\bin\ASINotificationSvc.dll)

Some of the format strings are:

%Y    Year
%m    Month
%d    Date
%H    Hour
%M    Minute
%S    Second
%t    Microsecond
%O    Record types:
        DBG    debug
        INF    informational
        NOT    notice
        ERR    error
        WRN    warning
        CRT    critical
%o    Priority, which indicates what logging level will show this type of record
%i    Thread ID
%F    Source program name
%L    Line number in the source program
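As an illustration of the heading format, the following sketch splits a log record that uses the Notification Services LogHeading into named fields. The field names are our own labels for the format strings, and we assume that the literal NS position holds a component abbreviation; other services may use a different LogHeading, in which case the field list must be adjusted.

```python
# Field labels chosen for this sketch; they follow the order of the
# Notification Services LogHeading: %Y/%m/%d %H:%M:%S|%t|%O|%o|NS|%i|%F|%L|
FIELDS = ["timestamp", "microsecond", "record_type", "priority",
          "component", "thread_id", "source_file", "line", "message"]

SAMPLE = (r"2000/02/17 18:14:44|767|INF|1|NS|e2|"
          r"B:\Services\NT\ASIServiceApp\ASIServiceAppDLL\ASIServiceApp.cpp|781|"
          r"Successfully loaded Service Library "
          r"(d:\TivoliManager\bin\ASINotificationSvc.dll)")


def parse_log_line(line: str) -> dict:
    """Split one log record into a dictionary keyed by FIELDS."""
    # The message comes last and might contain '|', so limit the split.
    parts = line.rstrip("\n").split("|", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError("line does not match the expected heading format")
    return dict(zip(FIELDS, parts))


record = parse_log_line(SAMPLE)
print(record["record_type"], record["source_file"], record["line"])
```

A parser like this makes it easier to filter a large log for ERR or CRT records only.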

16.3.3 Remote access for Tivoli Business Systems Manager logs

In Tivoli Business Systems Manager, all log directories are shared as Logs$. In our installation, we connect to the log directories of all our Tivoli Business Systems Manager servers using the command:

NET USE * \\server\Logs$ /user:Administrator passwd

This allows us to see all the log files locally. For our four servers, we ran the following commands:

NET USE * \\ITSOVAS1\Logs$
NET USE * \\ITSOVAS2\Logs$
NET USE * \\ITSOVAS3\Logs$
NET USE * \\ITSOVAS4\Logs$

Figure 16-60 shows the NT Explorer window after issuing the commands.

Figure 16-60 Tivoli Business Systems Manager log directories
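If you manage many servers, the NET USE commands can be generated rather than typed. The sketch below only builds the command strings for a list of server names; the function name is hypothetical, and running the commands (for example, from a batch file) is left to you.

```python
def net_use_commands(servers, share="Logs$"):
    """Build one NET USE command per server for the shared log directory."""
    return [f"NET USE * \\\\{name}\\{share}" for name in servers]


for command in net_use_commands(["ITSOVAS1", "ITSOVAS2", "ITSOVAS3", "ITSOVAS4"]):
    print(command)
```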


Chapter 17. Tivoli Enterprise Data Warehouse

In this chapter, we address Tivoli Enterprise Data Warehouse troubleshooting. The following topics are covered in this chapter:
• Section 17.1, “Tivoli Enterprise Data Warehouse introduction”
• Section 17.2, “Troubleshooting Tivoli Enterprise Data Warehouse”
• Section 17.3, “Troubleshooting ETLs”
• Section 17.4, “Maintenance and backup”
• Section 17.5, “Un-install components”
• Section 17.6, “Troubleshooting IBM Tivoli Monitoring Version 5.1.1 TEDW Support”


17.1 Tivoli Enterprise Data Warehouse introduction

The Tivoli Enterprise Data Warehouse (TEDW) is an application used to collect and manage data from various Tivoli and non-Tivoli system management applications. The data is imported from the source applications, stored centrally, and further processed to fit the needs of the end users. Here we describe the basic components of the Tivoli Enterprise Data Warehouse in the logical order of the data flow in Figure 17-1.

Figure 17-1 Components of the Tivoli Enterprise Data Warehouse (diagram labels: source applications such as DM, Inventory, and TEC; ETL; central data warehouse; data marts; Control server with IBM DB2 DWC and warehouse metadata; Tivoli Reporting Interface; business intelligence tools such as IBM, Cognos, Brio, and Business Objects)

The first step in introducing the Tivoli Enterprise Data Warehouse is to enable the source applications. This means providing all the tools and customizations necessary to import the source operational data into the central data warehouse. All components needed for that task are collected in so-called warehouse packs for each source application. Future releases of all Tivoli applications will be Tivoli Enterprise Data Warehouse ready and shipped with their warehouse packs.

One important part of the warehouse packs is the ETL programs. ETL stands for Extract, Transform, and Load. ETL programs process data in three steps. First, they extract the data from a data source. Then the data is validated, transformed, aggregated, and/or cleansed so that it fits the format and needs of the data target. Finally, the data is loaded into the target database.

In Tivoli Enterprise Data Warehouse, there are two types of ETLs. The central data warehouse ETL pulls the data from the source applications and loads it into the central data warehouse (see Figure 17-1 on page 758). The central data warehouse ETL is also known as source ETL or ETL1. The second type of ETL is the data mart ETL, which is discussed later.

The central data warehouse (CDW) is the database that contains all enterprise-wide historical data (an hour is the lowest granularity). This data store is optimized for the efficient storage of large amounts of data and has a documented format that makes the data accessible to many analysis solutions. The database is organized in a very flexible way, so you can store data from new applications without adding or changing tables.

The data mart ETL extracts a subset of historical data from the central data warehouse that contains data tailored to and optimized for a specific reporting or analysis task. This subset of data is used to create data marts. Data mart ETL is also known as target ETL or ETL2.

A data mart is a subset of the historical data that satisfies the needs of a specific department, team, or customer. A data mart is optimized for interactive reporting and data analysis. The format of a data mart is specific to the reporting or analysis tool you plan to use. Each application that provides a data mart ETL creates its data marts in the appropriate format.

Tivoli Enterprise Data Warehouse provides a Report Interface (RI) that creates static two-dimensional reports of your data using the data marts. The RI is a role-based Web interface that can be accessed with a simple Web browser without any additional software installed on the client. You can also use other tools to perform OLAP analysis, business intelligence reporting, or data mining.

The Control server is the system that contains the control database, which holds the metadata for Tivoli Enterprise Data Warehouse, and from which you manage your data warehouse. The Control server controls communication between the Control server, the central data warehouse, the data marts, and the Report Interface. The Control server uses the Data Warehouse Center to define the ETL processes and the star schemas used by the data marts. You use the Data Warehouse Center to schedule, maintain, and monitor these processes.
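To make the three ETL steps concrete, here is a minimal sketch of an extract, transform, and load cycle that aggregates raw measurements to hourly granularity. It is not how the warehouse packs are implemented; the row layout, the function names, and the in-memory target are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def extract(source_rows):
    """Extract: read (host, timestamp, value) rows from the source."""
    return list(source_rows)


def transform(rows):
    """Transform: drop incomplete rows and average the values per host and hour."""
    buckets = defaultdict(list)
    for host, stamp, value in rows:
        if value is None:          # cleanse: skip rows without a measurement
            continue
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        buckets[(host, hour)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


def load(target, aggregated):
    """Load: write the aggregated rows into the target store."""
    target.update(aggregated)
    return target


source = [
    ("itsovas1", datetime(2003, 3, 1, 10, 5), 10.0),
    ("itsovas1", datetime(2003, 3, 1, 10, 35), 20.0),
    ("itsovas1", datetime(2003, 3, 1, 11, 5), None),
]
warehouse = load({}, transform(extract(source)))
print(warehouse)
```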


17.1.1 How Tivoli Enterprise Data Warehouse is packaged

When installing Tivoli Enterprise Data Warehouse support for Tivoli software, you receive and install two logical parts:
• The Tivoli Enterprise Data Warehouse core application, which provides the warehouse infrastructure
• One or more warehouse packs, which are applications that make use of the infrastructure

Tivoli Enterprise Data Warehouse

The Tivoli Enterprise Data Warehouse core application is packaged as a collection of CDs that are provided with each Tivoli software product that uses its infrastructure. You receive a different set of CDs depending on whether you order support for single byte character set (SBCS) languages or double byte character set (DBCS) languages. The Tivoli Enterprise Data Warehouse CD set consists of the following CDs:
• Tivoli Enterprise Data Warehouse: The installation media for the Tivoli Enterprise Data Warehouse application.
• Tivoli Enterprise Data Warehouse Language Support: The files necessary to use Tivoli Enterprise Data Warehouse in non-English languages. This CD contains both SBCS and DBCS language support.
• Tivoli Enterprise Data Warehouse Documentation: The Tivoli Enterprise Data Warehouse documentation library.
• A collection of DB2 CDs, which varies depending on whether you order the SBCS or DBCS version of Tivoli Enterprise Data Warehouse.

Warehouse packs

A warehouse pack is the part of a Tivoli software product that provides warehouse functionality. It can be provided on the installation media for the product, on a separate CD, or in a collection of warehouse packs. When not provided on a CD containing only one or more warehouse packs, a warehouse pack is located in a subdirectory named tedw_apps_etl.

17.2 Troubleshooting Tivoli Enterprise Data Warehouse

Next, we give some useful tips for troubleshooting Tivoli Enterprise Data Warehouse. In the first subsection, we cover problems with installation. In the second subsection, we give some hints for working with the Report Interface. In the last subsection, we cover working with the Data Warehouse Center and installing your own ETLs and data marts.


17.2.1 Troubleshooting core installation

Before you start the installation of Tivoli Enterprise Data Warehouse, you should do the following:
• Check the Tivoli Enterprise Data Warehouse Release Notes Version 1.1, GI11-0857, for the required prerequisites and patches for the operating systems and databases.
• If you are performing a silent installation of Tivoli Enterprise Data Warehouse on a UNIX system without a local X11 server, you must set and export the DISPLAY environment variable to a valid X11 server. The X11 server can be on a different system.
• For a distributed installation, the Domain Name Service (DNS) must be able to resolve host names from short names.

When all prerequisites are met, you can start the installation. The Tivoli Enterprise Data Warehouse installer generates a log file named \TWH.log. Look into this file if the installation aborts with an error message or hangs. If the Tivoli Enterprise Data Warehouse installer fails, then all changes should be rolled back automatically. Thus, the reason for the install failure is usually not found in the last lines of the log file.

Additionally, there is a log file, \twh_ibm_db2_runlog.log, which contains output and errors from any DB2 commands. In this log file, you can search for MARKCORE, which marks the start of the core installation, and for MARK followed by the three-letter product code of the product for which you install the warehouse pack, which marks the start of a warehouse pack installation. Note that these markers are created by an attempt to connect to a non-existing database with the name of the marker, so do not worry about the error messages containing the markers.

Here are some common installation problems:
• A common install error, especially in a single machine installation, is insufficient disk space. If you install all parts of Tivoli Enterprise Data Warehouse to the same drive and your TEMP directory is on the same drive, you should have 2 GB of free space. So much disk space is needed because the CD image is copied to TEMP to allow for CD swapping.
• The installation of Tivoli Enterprise Data Warehouse might fail with the following message in the TWH.log file:

  ==>Testing DB2 exec path(F)
  CDWIC0024E Could not execute/locate DB2 command!!!

  This is because the PATH environment variable has become too long. The PATH environment variable is limited to 2075 characters in length.
• The installation of Presentation Services (PS) locks if ports are already in use. You must specify unused port numbers when you install Tivoli Enterprise Data Warehouse. In particular, if there is already a Web server on the system that you plan to install the Report server on, you must un-install it, disable it, or specify a different port number for the HTTP Server Port for Tivoli Presentation Services.
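Two of the problems above, the PATH length and the free disk space, can be checked before you start the installer. The following sketch uses the limits quoted in the text (2075 characters and 2 GB); the function names are our own, and we assume that the system temporary directory reported by Python is the TEMP directory the installer will use.

```python
import os
import shutil
import tempfile

PATH_LIMIT = 2075                    # PATH length limit quoted in the text
REQUIRED_FREE_BYTES = 2 * 1024 ** 3  # 2 GB of free space recommended in the text


def path_too_long(path_value: str, limit: int = PATH_LIMIT) -> bool:
    """Return True if the PATH value exceeds the limit."""
    return len(path_value) > limit


def enough_free_space(directory: str, required: int = REQUIRED_FREE_BYTES) -> bool:
    """Return True if the drive holding 'directory' has enough free space."""
    return shutil.disk_usage(directory).free >= required


if __name__ == "__main__":
    print("PATH too long:", path_too_long(os.environ.get("PATH", "")))
    print("Enough space in TEMP:", enough_free_space(tempfile.gettempdir()))
```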

17.2.2 Troubleshooting Warehouse Enablement Pack installations

Before you start the installation of a Warehouse Enablement Pack (WEP), you should do the following:
• Check the Warehouse Enablement Pack’s Release Notes for required prerequisites and patches for the operating systems, databases, and Tivoli Enterprise Data Warehouse components.
• For a distributed environment, ensure that user temporary tablespace is available on the TWH_CDW and TWH_MART databases for the database user assigned during the WEP installation.
• Perform backups of the TWH_CDW, TWH_MART, and TWH_MD databases.

When these prerequisites are met, the installation may begin. The Tivoli Enterprise Data Warehouse installer generates a log file named \Apps\APPDIR\TWHAPP.log, where APPDIR represents the three-letter product code for the WEP. Look into this file if the installation aborts with an error message or hangs.

The following list describes some common installation problems:
• The installation of the warehouse pack fails when the Tivoli Enterprise Data Warehouse e-fixes have not been installed (1.1-TDW-0002 and 1.1-TDW-0005). In this situation, do the following:
  – Install the e-fixes on the Control server (refer to each e-fix’s readme.txt file).
  – Un-install the warehouse pack that failed (see 17.5.2, “Un-install the warehouse packs” on page 783).
  – Restore the TWH_MD, TWH_CDW, and TWH_MART databases from the backups taken before the installation.
  – Reinstall the warehouse pack that failed.
• The installation of an ETL2 warehouse pack fails when temporary user space is not available. In a stand-alone environment, this occurs when one user installs the Tivoli Enterprise Data Warehouse core product (for example, db2admin) and a different user installs the ETL1 WEP (for example, db2inst1). During a distributed installation, the user temporary tablespace is not created on the remote database server. In either case, the user temporary tablespace must be created manually.

  To do this, use the following commands from a DB2 command prompt on the server that hosts TWH_CDW and TWH_MART, replacing <user> and <password> with the database user ID and its password:

  db2 "connect to TWH_CDW user <user> using <password>"
  db2 "create user temporary tablespace usertmp2 managed by system using ('usertmp2')"
  db2 "connect to TWH_MART user <user> using <password>"
  db2 "create user temporary tablespace usertmp3 managed by system using ('usertmp3')"

17.2.3 Troubleshooting the IBM Console and the Report Interface

In this section, we discuss problems with the Tivoli Enterprise Data Warehouse Report Interface and the IBM Console. We give useful tips on where to start troubleshooting. We also mention problems and known defects.

No connection to the IBM Console

If you have problems opening the IBM Console in your Web browser with the URL http://hostname:port/IBMConsole, check the following:
1. See if the name of the Report server is correct. Try the fully qualified host name. Check the port of your Web server. The default value is port 80, if not changed during installation.
2. If everything is correct and you still have no connection to the IBM Console, use the IP address of the Report server instead of its host name. If this works, you probably have a problem with your name resolution. Check the NIS and DNS settings (check whether you can resolve the host name using the nslookup hostname command). Check the /etc/hosts file on UNIX or the C:\WINNT\system32\drivers\etc\hosts file on Windows machines.
3. If name resolution is OK, check your network connection to the Report server (ping hostname).
4. Check if the Web server is running. Use the above URL without /IBMConsole. You should see a page displaying Welcome to the IBM HTTP Server. If not, check if the service Tivoli Presentation Services HTTP Server is started on the Report server. If not, try to start it manually.
5. If it is not possible to start this service, you can try to connect to the administration server (http://hostname:8008) and check the Web server configuration. You will probably have to create a user ID for the administration server first. Follow the instructions that are displayed after the login to the administration server has failed several times.


Tip: In our testing environment, we experienced the problem that the IP was changed by DHCP but the original IP was still in the Web server configuration. Do not use DHCP on your Report server.

6. If you can connect to the Web server (with the URL but without /IBMConsole) but not to the IBM Console (URL with IBMConsole), check if the following services are started on the Report server:
   – Server for IBMConsole
   – Web Services for IBMConsole
   See also the log files of these two services, which are in the directories PS_install_dir/log/fwp_wc and PS_install_dir/log/fwp_mcr, respectively.
7. If you can connect to the IBM Console from a Web browser running locally on the Report server, but not from Web browsers running on (some) remote machines, check the following file on your Report server:

   PS_install_dir/ibmhttpd/conf/httpd.conf

This file contains redirects for your IBM Console login window. If these redirects use the short host name for your Report server, you will have problems if your client cannot correctly resolve this short host name. This problem is not solved when you use the fully qualified host name in your Web browser:

# Allow simpler sign-on URL.
RedirectPermanent /IBMConsole/ http://host:80/servlet/com.tivoli.pf.wc...
RedirectPermanent /IBMConsole http://host:80/servlet/com.tivoli.pf.wc...

To solve this problem, change the host name to the fully qualified host name in the httpd.conf file and restart the service IBM Presentation Services HTTP Server.
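To find such redirects quickly, you can scan httpd.conf for RedirectPermanent targets whose host name contains no dot. The sketch below works on the file contents as a string; the sample configuration, the host names, and the shortened servlet paths in it are invented for illustration, and the "no dot means short name" test is a simplification.

```python
from urllib.parse import urlsplit


def short_host_redirects(conf_text: str):
    """Return (line_number, host) for redirect targets that use a short host name."""
    findings = []
    for number, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # Apache directive names are not case sensitive.
        if len(parts) >= 3 and parts[0].lower() == "redirectpermanent":
            host = urlsplit(parts[2]).hostname
            if host and "." not in host:
                findings.append((number, host))
    return findings


# Hypothetical sample; real redirect targets are longer servlet URLs.
SAMPLE_CONF = """# Allow simpler sign-on URL.
RedirectPermanent /IBMConsole/ http://reportsrv:80/servlet/example
RedirectPermanent /IBMConsole http://reportsrv.example.com:80/servlet/example
"""

print(short_host_redirects(SAMPLE_CONF))
```

Every line reported by the function is a candidate for the change to the fully qualified host name described above.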

Troubleshooting tips when using the Report Interface

The following are troubleshooting tips for the Report Interface:
1. If you work with the Report Interface, JavaScript and style sheets must be enabled in the Web browser.
2. In the upper-right corner of each task panel you find the help button (?), which provides detailed information for the task.
3. If you have created an object (for example, a report or a user) and you do not find it in the appropriate list, you can try the following:
   – Click Refresh, if available.
   – If you see no objects or old objects only, check if the database service DB2-DB2 on the Control server is running. If you stopped and restarted the Control server database and you see error messages, as in Figure 17-2, restart the following services on the Report server:
     – Server for IBMConsole
     – Web Services for IBMConsole

Figure 17-2 Error messages in the Report Interface after database restart

See also the log files of these two services, which are in the directories PS_install_dir/log/fwp_wc and PS_install_dir/log/fwp_mcr, respectively.

Problems creating data marts from customized star schemas

If you do not find your star schema in the Add Star Schemas to a Data Mart dialog, check if the star schema has been created in the Data Warehouse Center. Start the Data Warehouse Center from your DB2 Control Center using Tools -> Data Warehouse Center. Expand the tree Warehouse Schemas in the left-hand side panel. You will see all available star schemas there. You can create new star schemas in the Data Warehouse Center by right-clicking Warehouse Schemas.

Problems creating the first report from new data marts

The following is a list of items to check when you run into problems creating the first report from new data marts:
1. Check whether your star schema contains all the necessary tables (fact table, metric table, and dimension tables).
2. The Report Interface assumes certain column names in these tables. Check the naming conventions (see “Naming Conventions” section in Tivoli Enterprise Data Warehouse Enabling an Application Version 1.1, GC32-0745).


3. The connections between the tables have to be set up correctly in the star schema. The Report Interface uses these connections. They are written to the rpi.strings table in the Control server database TWH_MD. They are updated by a trigger when you save the star schema in the Data Warehouse Center. The connections set up in the star schema will result in the where clauses in the SQL statements of your reports.
4. Check the SQL output from the reports pop-up menu (Show SQL...) for further hints in locating a problem. This might be helpful when no data is found while running the report. Note that you must have sufficient roles to see the SQL output. If you do not have sufficient roles, you will not see the Show SQL... entry in the pop-up menu of the reports.

17.2.4 Troubleshooting the customization

In this section, we discuss some troubleshooting techniques for customizing. By customizing, we mean the process of creating ETLs and integrating them into Tivoli Enterprise Data Warehouse using the Data Warehouse Center. We give some tips on how to set up the Data Warehouse Center correctly for your own source application. The information given here can also be helpful for troubleshooting errors that occur during and after the installation of data warehouse packs.

Troubleshooting ETLs

The following are ideas for troubleshooting ETLs:
1. The Data Warehouse Center generates log files in the path defined by the DB2 environment variable %VWS_LOGGING%. This variable usually points to \sqllib\logging. In this directory, you find the Warehouse Agent log file Agnt.log and the Warehouse Agent environment Agnt.set. Look for the most recent files.
2. When you run processes in the Data Warehouse Center, you can see their status in the Work in Progress dialog (you can open this dialog using the Warehouse menu in the Data Warehouse Center). When you encounter errors in the process status, you can gain more information by right-clicking the failed step and selecting Show Log. Look for the first entry with the message Type Run Time Error. Right-click this message and select Show Details.
3. If you have connection problems to remote databases, try to connect to the source database using the CLI tools of the database, for example, DB2 CLP, sqlplus for Oracle, or dbaccess for Informix. You can also use the ODBC Data Source panels in Windows to test your database connection.


   You can also use the execsql command provided by Tivoli Enterprise Data Warehouse to test the database connection:

   execsql dummy dummy.out user pwd

   If the name of the RDBMS vendor appears in dummy.out, then the connect was successful.
4. It is recommended that you use the SQL execution engine (execsql) and its wrapper script (sqlscript.sh) provided by Tivoli Enterprise Data Warehouse, in your own ETLs. You can get helpful troubleshooting information from the log file written by the execsql command. You find these log files under %VWS_LOGGING%\.log (for example, apf_c05_s010_init.log). These log files show the following information (see also Example 17-1):
   – ODBC data sources used
   – SQL statements executed
   – Rows affected per SQL statement
   – Elapsed time per SQL statement

Example 17-1 An execsql log file example
======== Began 2001.12.21 18:58:47.818 ========
========================
= Source Datasource : oracle816b
= Source User Name : scott
= DB Vendor : Oracle 8 08.01.0006 Oracle 8.1.6.0.0
= DB Server Name :
= Target Datasource : oracle816b
= Target User Name : scott
= DB Vendor : Oracle 8 08.01.0006 Oracle 8.1.6.0.0
= DB Server Name :
= Input File : e:/TWH/apps/apf/v1/etl/sql/apf_c05_s010_test.oracle
========================
= SOURCE SQL Statement: "insert into tab2 values ('a')"
= Elapsed Time : 00:00:00.1000
= Rows Modified : 1
= Successful Execution: No Errors
========================
= SOURCE SQL Statement: "insert into tab2 values ('b')"
= Elapsed Time : 00:00:00.1000
= Rows Modified : 1
= Successful Execution: No Errors
========================
======== Completed 2001.12.21 18:58:48.138 ========


5. Before attempting to run your custom ETL script from the Data Warehouse Center, you can run the script from the command line to validate the script. Start the bash program, which is installed with Tivoli Enterprise Data Warehouse, and enter the following command:

   sqlscript.sh product_code script_name source_db source_uid source_pwd target_db target_uid target_pw

   Where script_name is the name of your custom script.
6. If you get any errors when logging into the Data Warehouse Center, check that the control database is set to TWH_MD. To do this, click Advanced on the Data Warehouse Center logon panel. Also, make sure that you set up the control database correctly in the Warehouse Control Database Management in the Start -> Programs -> DB2 menu. Check the ODBC Data Source of the control database.
7. If you see errors in the Data Warehouse Center after a database restart, restart the vwserver and vwlogger services.
8. For Windows NT and Windows 2000, the vwserver and vwlogger services do not log on as the DB2 user, which causes ETL processes to fail. There are workarounds for this problem:
   – Workaround for Windows NT:
     i. Open the Services window.
     ii. Select Warehouse logger.
     iii. Select the Startup button.
     iv. Click This Account.
     v. Type the DB2 user ID.
     vi. Type the DB2 password in the Password field.
     vii. Type the DB2 password in the Confirm Password field.
     viii. Click OK.
     ix. Repeat step i through step viii for the Warehouse Server.
     x. Stop and then restart the vwserver and vwlogger services.
   – Workaround for Windows 2000:
     i. Open the Services window.
     ii. Select Warehouse logger -> Action -> Properties.
     iii. Click the Log On tab.
     iv. Click This account.
     v. Type the DB2 user ID.
     vi. Type the DB2 password in the Password field.
     vii. Type the DB2 password in the Confirm Password field.
     viii. Click OK.
     ix. Repeat step i through step viii for the Warehouse Server.
     x. Stop and then restart the vwserver and vwlogger services.
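When an execsql log grows long, a small script can summarize the statements, elapsed times, and row counts that the log records. The sketch below parses the layout shown in Example 17-1; it assumes that every statement block starts with a line ending in "SQL Statement" and it has not been checked against logs that contain errors, whose layout is not shown in this chapter.

```python
def summarize_execsql_log(text: str):
    """Return one dictionary per SQL statement found in an execsql log."""
    statements = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("= "):
            continue                      # separators and Began/Completed lines
        key, _, value = line[2:].partition(":")
        key, value = key.strip(), value.strip()
        if key.endswith("SQL Statement"):
            current = {"sql": value.strip('"'), "elapsed": None, "rows": None}
            statements.append(current)
        elif current is not None and key == "Elapsed Time":
            current["elapsed"] = value
        elif current is not None and key == "Rows Modified":
            current["rows"] = int(value)
    return statements


# Excerpt of Example 17-1
SAMPLE_LOG = """======== Began 2001.12.21 18:58:47.818 ========
========================
= Source Datasource : oracle816b
========================
= SOURCE SQL Statement: "insert into tab2 values ('a')"
= Elapsed Time : 00:00:00.1000
= Rows Modified : 1
= Successful Execution: No Errors
========================
= SOURCE SQL Statement: "insert into tab2 values ('b')"
= Elapsed Time : 00:00:00.1000
= Rows Modified : 1
= Successful Execution: No Errors
========================
======== Completed 2001.12.21 18:58:48.138 ========
"""

for entry in summarize_execsql_log(SAMPLE_LOG):
    print(entry["elapsed"], entry["rows"], entry["sql"])
```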

17.3 Troubleshooting ETLs

The following are some issues to consider when troubleshooting ETLs.

17.3.1 Running ETLs

This section discusses some common problems encountered when running ETLs.

ETL logs

In general, a good starting point when troubleshooting ETLs is looking at their logs. With the Data Warehouse Center open, do the following:
1. Select Warehouse -> Work In Progress. This will bring up the Work in Progress window.
2. From the Work In Progress window, right-click the failed process. Select Show Log, as seen in Figure 17-3.

Figure 17-3 Accessing ETL process logs in Work in Progress

3. The Log window appears. Expand the window until the column Message Type can be fully read. Right-click any line containing ‘Run Time Error’. Select Show Details (Figure 17-4 on page 770).


Figure 17-4 Accessing the log details

4. The next window to appear is the Log Details window. This window will show you what error the process encountered (Figure 17-5 on page 771).


Figure 17-5 Log details

ETL1 fails without ETL2 installed

Do not attempt to run the processes for ETL1 before ETL2 is installed. In our testing, we found that the process AMX_c05_s040_Rim_Extract from the IBM Tivoli Monitoring Version 5.1.1 Generic Warehouse Enablement pack fails when run before an ETL2 is installed. If this happens by mistake, apply the following steps:
1. Install an ETL2 WEP that relates to the previously installed ETL1 WEP. In our case study, we installed the IBM Tivoli Monitoring Version 5.1.1 Warehouse Enablement Pack for Operating Systems (AMY).
2. If using the IBM Tivoli Monitoring Version 5.1.1 Generic Warehouse Enablement pack, the Extract_Control window must be reset. Otherwise, no data will be collected when the processes for this ETL run, even though they complete successfully. Run AMX_Reset_Extract_Window.bat to reset the Extract_Control window. This and other tools are located under the \apps\amx\v511\misc\tools directory.


Note: This tool should be used only to restart the Extract Control window for the AMX_c05_ETL1_Process from the beginning. If you want to reset the window to the last extract, use the extract_log to get the last values of each extract function used by Generic ETL1. Extract_Win.bat shows the TWG.Extract_Control and TWG.Extract_Log windows based in the integer sequence.

3. Run the processes for ETL1 and ETL2.

ETL processing fails due to incorrect configuration

In some cases, ETL processing fails due to incorrect configuration of the source and target database in the Data Warehouse Center. Ensure that all user IDs and passwords are correctly configured for each source and target database. Specifically, when dealing with ETL1s that extract data from application databases, ensure that the user ID specified in the Data Warehouse source database matches the schema owner in the application database.

Another common mistake with ETL1s is that the tables used in the ETL have the incorrect schema owner. In our case study, the IBM Tivoli Monitoring Version 5.1.1 Generic Warehouse Enablement pack uses the endpoint table from the ITM_DB application database. By default, our installation set the ETL to access DB2ADMIN.ENDPOINT, but the actual table from ITM_DB was DB2INST1.ENDPOINT. The ETL processes failed until this was updated.

ETLs fail because the transaction log is full

Another common error is that an ETL process fails because the transaction log for the database is full (see Figure 17-5 on page 771). The recommended solution is to increase either the size of the transaction log or the number of secondary transaction logs for the database on which the ETL fails. This is most likely to happen when an ETL is trying to prune data from a source application database (in our case, ITM_DB). We do not recommend increasing the number of primary transaction logs, because they are pre-allocated.

You can use the DB2 Control Center to resolve this problem:
1. Start the IBM DB2 Control Center utility by selecting Start -> Programs -> IBM DB2 -> Control Center.
2. Expand the Systems folder until you reach the database in which the transaction log is full. Right-click the database, and select Configure (Figure 17-6 on page 773).


Figure 17-6 Configuring a database’s transaction log file

3. A Configure Database window appears. Select the Logs tab (Figure 17-7 on page 774).
4. The two parameters that you can change are Log File Size and Number of Secondary Log Files. Change one or both to fit your environment’s needs.


Figure 17-7 Configuring the log file parameters
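To judge how much log space a change gives you, it helps to compute the total before editing the values. The sketch below assumes that the Log File Size and Number of Secondary Log Files fields correspond to the DB2 database configuration parameters LOGFILSIZ (expressed in 4 KB pages) and LOGSECOND, with LOGPRIMARY as the number of primary log files. The numbers used are hypothetical, and the generated command line is shown only as an alternative to the Control Center panels.

```python
PAGE_BYTES = 4096  # DB2 expresses LOGFILSIZ in 4 KB pages


def total_log_bytes(logfilsiz_pages: int, logprimary: int, logsecond: int) -> int:
    """Maximum log space: all primary and secondary log files at full size."""
    return (logprimary + logsecond) * logfilsiz_pages * PAGE_BYTES


def update_cfg_command(database: str, logfilsiz_pages: int, logsecond: int) -> str:
    """Build a DB2 command line that sets both parameters for a database."""
    return (f"db2 update db cfg for {database} "
            f"using LOGFILSIZ {logfilsiz_pages} LOGSECOND {logsecond}")


# Hypothetical values: compare the log space before and after a change.
before = total_log_bytes(logfilsiz_pages=1000, logprimary=3, logsecond=2)
after = total_log_bytes(logfilsiz_pages=2000, logprimary=3, logsecond=10)
print(before // 1024 ** 2, "MB ->", after // 1024 ** 2, "MB")
print(update_cfg_command("ITM_DB", 2000, 10))
```

Because primary log files are pre-allocated, the sketch leaves LOGPRIMARY unchanged, in line with the recommendation above.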

17.3.2 The Data Warehouse Center fails to open

These are the most common reasons why the Data Warehouse Center fails to open:
• Incorrect user ID or password
• The Data Warehouse Center control database is not set.

  The following instructions describe how to specify the control database:
  a. On the Windows taskbar, select Start -> Programs -> IBM DB2 -> Control Center. The Control Center window is displayed.
  b. From the DB2 Control Center, start the DB2 Data Warehouse Center by selecting Tools -> Data Warehouse Center. The Data Warehouse Center Logon window is displayed.
  c. In the Data Warehouse Center Logon window, click Advanced.
  d. Type TWH_MD for the control database and click OK (Figure 17-8 on page 775).


e. The Configure Data Warehouse Center window appears.

Figure 17-8 How to specify the control database

f. Click Cancel to close the logon panel.
g. Open the Control Database Management window. On the Windows taskbar, select Start -> Programs -> IBM DB2 -> Warehouse Control Database Management.
h. Type TWH_MD in the New control database field and type the DB2 user name and password, then click OK (Figure 17-9 on page 776).


Figure 17-9 Control Database Management window

i. Click OK to configure the Warehouse Control Database Management window.
j. When the Processing has completed message appears, click Cancel.
k. The setup and connection are now complete.
- The services for the Data Warehouse Center need to be restarted. From a command prompt, issue the following commands:

net stop vwkernel
net stop vwlogger
net start vwlogger
net start vwkernel

17.3.3 Data marts show old data

In some situations, reports run, but they do not contain any new data. A common misconception is that something is broken within the Tivoli Enterprise Data Warehouse. Although there could be an error within the Tivoli Enterprise Data Warehouse, there are many cases in which malfunctions in the source application prevent the application database from being updated. It is therefore critical to determine at which point the data collection stops.


Generally, we follow these steps while troubleshooting this issue:
1. Query the source application database to determine when the last collection occurred. For a detailed example, refer to 17.6, “Troubleshooting IBM Tivoli Monitoring Version 5.1.1 TEDW Support” on page 783.
2. If the application database has current data, query TWH_CDW to ensure it possesses current data. Use the following SQL statement to show the last date the MSMT table in TWH_CDW was updated (you might have to alter the query if you have multiple ETL1s running):

db2 select max(msmt_strt_dt) from twg.msmt

If the data is not current, check the ETL logs for error messages (see 17.3.1, “Running ETLs” on page 769).
3. If data in TWH_CDW is current, query your WEP’s data mart. In our case study, we installed the IBM Tivoli Monitoring Version 5.1.1 Warehouse Enablement Pack for Operating Systems. We ran the following command to determine the date of the last data insertion into the data mart:

db2 select max(meas_hour) from amy.f_os_hour

If, after applying these steps, it is determined that the data in the data mart is current, most likely there is an error with the reports. Please refer to 17.2.3, “Troubleshooting the IBM Console and the Report Interface” on page 763 for information on troubleshooting the Tivoli Enterprise Data Warehouse Report Interface. If using a third party business intelligence product, please refer to that product’s documentation.
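The three checks above amount to walking the data flow from the source application to the data mart and stopping at the first stage that holds no recent data. The following Python fragment is a minimal sketch of that decision only. The function name, the stage labels, and the one-day freshness threshold are our own assumptions for illustration; the three timestamps are the values you obtained from the queries above.

```python
from datetime import datetime, timedelta


def find_stale_stage(now, last_source, last_cdw, last_mart,
                     max_age=timedelta(days=1)):
    """Return the first stage whose newest record is older than max_age.

    The stages are checked in the order the data flows. If all three are
    current, the problem is most likely in the reports themselves.
    """
    stages = [
        ("source application database", last_source),
        ("central data warehouse (TWH_CDW)", last_cdw),
        ("data mart", last_mart),
    ]
    for name, last_update in stages:
        if now - last_update > max_age:
            return name
    return "reports"


if __name__ == "__main__":
    now = datetime(2002, 10, 22, 12, 0)
    # Source is current, but TWH_CDW was last loaded two days ago.
    print(find_stale_stage(now,
                           datetime(2002, 10, 22, 1, 0),
                           datetime(2002, 10, 20, 1, 0),
                           datetime(2002, 10, 20, 2, 0)))
```

A result that names the central data warehouse points to the ETL logs, while a result of reports points to the Report Interface, as described in the text.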

17.4 Maintenance and backup

This section describes how to use warehouse programs to maintain your warehouse database. We will also talk about how to back up and restore Tivoli Data Warehouse databases.

17.4.1 Removing old data from the Data Warehouse Center logs

You should regularly delete information in the Data Warehouse Center log files, IWH*.log, located in the directory specified by the %VWS_LOGGING% environment variable. These log files grow rapidly. You can only delete information from these files when the Data Warehouse Center services Warehouse Server and Warehouse Logger are stopped. You can do this in a script, as in Example 17-2 on page 778.


Example 17-2 Purging script
net stop vwkernel
net stop vwlogger

net start vwlogger
net start vwkernel
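The log clean-up itself takes place between stopping and restarting the two services. As an illustration only, the following Python sketch empties every file that matches IWH*.log in a given directory. The function name is hypothetical, and truncating the files rather than deleting them is our assumption; the sketch also assumes that the Warehouse Server and Warehouse Logger services are already stopped when it runs.

```python
import glob
import os


def truncate_warehouse_logs(log_dir, pattern="IWH*.log"):
    """Empty every file in log_dir that matches pattern.

    Returns the list of paths that were emptied. Run this only while the
    Data Warehouse Center services are stopped.
    """
    emptied = []
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        with open(path, "w"):
            pass  # opening in write mode truncates the file to zero bytes
        emptied.append(path)
    return emptied


if __name__ == "__main__":
    # VWS_LOGGING is the environment variable named in the text.
    log_dir = os.environ.get("VWS_LOGGING")
    if log_dir:
        for path in truncate_warehouse_logs(log_dir):
            print("emptied", path)
```

Files in the same directory that do not match the pattern are left untouched.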

17.4.2 Removing old data from the central data warehouse

You can control how often data is removed, or pruned, from the central data warehouse using a combination of triggers and warehouse processes. This is done by completing these tasks:
1. Scheduling the pruning process
2. Specifying the data to be pruned

For more information on data pruning processes and the database tables used, see Tivoli Enterprise Data Warehouse Enabling an Application Version 1.1, GC32-0745. Pruning processes for warehouse packs are defined by each application. See the documentation provided with each warehouse pack for information about pruning the data for that application.

17.4.3 Reorganizing the data

You can use the DB2 reorganize warehouse program to rearrange a table in physical storage. This eliminates fragmentation and ensures that the table is stored efficiently in the database. You can also use reorganization to control the order in which the rows of a table are stored, usually according to an index. You can define reorganization steps in a Data Warehouse process (see Figure 17-10 on page 779).


Figure 17-10 Create a reorganization step

You can use a warehouse source or target as a source for this step subtype. The REORG program writes to the source table. To define values for a step that runs a DB2 UDB REORG warehouse program:
1. Open the Steps Property notebook.
2. Specify general information about the program.
3. Optional: On the Parameters page, specify information for the REORG step:
   - In the Using temporary table space field, type the name of the temporary table space that should be used during the REORG step.
   - In the Using index field, type the name of the index that should be used during the REORG step.
4. On the Processing Options page, provide information about how your step processes.
5. Click OK to save your changes and close the Step Property notebook.


17.4.4 Updating system catalog statistics

You can use the DB2 RUNSTATS warehouse program to gather statistics about the physical and logical characteristics of a table and its indexes. DB2 Universal Database uses these statistics to determine the best way to access your data. You can use the DB2 UDB RUNSTATS warehouse program to create a step that can be used to update system catalog statistics on the data in a table, the data in the table indexes, or the data in both the table and its indexes. The optimizer uses these statistics to choose which path will be used to access the data. In general, you need to update statistics if there are extensive changes to the data in the table.

Create a RUNSTATS step in a process (see Figure 17-10 on page 779). The RUNSTATS program uses a warehouse target as a source and a target. Link a warehouse target to the step in the Process Model window before you define the values for the step. To define values for a step that runs a DB2 UDB RUNSTATS warehouse program:
1. Open the Steps Property notebook.
2. Specify the general information about the warehouse program.
3. Optional: On the Parameters page, specify information for the RUNSTATS warehouse program:
   a. Specify the level of statistics you want to gather for the table by clicking a radio button under Statistics for the table.
   b. Specify the level of statistics you want to gather for the table’s indexes by selecting a radio button under Statistics for the indexes.
   c. Use the Share level radio buttons to specify the type of access you want other users to have to the table while the statistics are being gathered.
4. On the Processing Options page, provide information about how your step processes.
5. Click OK to save your changes and close the Step Properties notebook.

17.4.5 Backup

This section provides information about backing up and restoring the Tivoli Enterprise Data Warehouse databases. When planning back-up operations or performing restore operations, you must consider the relationships between the data in Tivoli Enterprise Data Warehouse databases. Some examples follow:
- If an older version of the control database is restored with a newer version of the central data warehouse database, log messages and ETL run status generated by the Data Warehouse Center are lost and will not match the state of the central data warehouse database.
- If a data mart database is completely lost and no backup exists, you might be able to recreate the data mart database from data in the central data warehouse. To recreate the data mart database, you must adjust the extract control information for the star schemas in the data mart database located in the central data warehouse, and then run the data mart ETL processes for each star schema in the lost data mart database. It is not possible to recreate a data mart database from data that has been pruned from the central data warehouse. Note that the extract control parameters in the database must be manually changed to ensure that all data is restored.
- If an old copy of the central data warehouse database is restored along with newer copies of the control database and newer copies of data mart databases, manual adjustment of extract control tables might be required to pull additional data from source applications to bring the central data warehouse database up-to-date. In some cases, source data might have been pruned after data was populated into the central data warehouse database, making it impossible to recover all of the data. Some data mart ETL processes might encounter problems as they attempt to reinsert records from the recovered central data warehouse database into more recent copies of data mart databases. In these cases, manual intervention by a database administrator might be required to fully recover the system.
- User definitions for the Report Interface are stored in the Tivoli Presentation Services directory. When the users are assigned to user groups in the Report Interface, a subset of user information is stored in the Tivoli Enterprise Data Warehouse control database.

17.5 Un-install components

Here we give the basic steps for the un-installation of the Tivoli Enterprise Data Warehouse core product and the Tivoli Enterprise Data Warehouse application packs. You can find more detailed information and troubleshooting hints for un-installation in Tivoli Enterprise Data Warehouse Installing and Configuring Version 1.1, GC32-0744.

17.5.1 Un-install Tivoli Enterprise Data Warehouse core product

Tivoli Enterprise Data Warehouse might be installed on one machine or distributed on up to four machines. However, if you start the un-install process on one machine, you have to un-install all components on this machine. There is no option to select components to un-install.


If there is one component per machine, removing a component will allow a reinstall of the same component on that machine. However, if you remove the Control server, then all other components on all other machines must be removed. If you un-install a distributed environment, make sure that you un-install the Control server last. Otherwise, you might not be able to un-install the remaining Tivoli Enterprise Data Warehouse components.

During the Tivoli Enterprise Data Warehouse un-installation process, all Tivoli Enterprise Data Warehouse-related databases will be dropped. This will fail if databases are locked by other processes. To make sure that the databases can be dropped successfully, you should stop and start the database manager before starting the un-installation:

db2stop force
db2start

To start the Tivoli Enterprise Data Warehouse un-installation process on a Windows machine, type:

%TWH_TOPDIR%\uninstall\uninstall.exe

Or on a UNIX machine:

$TWH_TOPDIR/uninstall/uninstall.bin

See %TWH_TOPDIR%/TWHUninstall.log for troubleshooting. The Tivoli Enterprise Data Warehouse uninstaller does not un-install Presentation Services (PS), but merely removes the Tivoli Enterprise Data Warehouse components from PS. This should allow you to reinstall Tivoli Enterprise Data Warehouse. If it does not, un-install PS with the PS uninstaller %PS_TOPDIR%\uninstall.bat. If this fails, you can un-install PS manually. Use regedt32 to remove the following NT services from HKLM\System\CurrentControlSet\Services:
- ps_mcr
- ps_wc
- TivoliPresentationServicesHTTPAdministration
- TivoliPresentationServicesHTTPServer

After this step you have to reboot your machine. Then remove the PS and TWH installation directories, if they exist. You have to edit the vpd.properties file, which is located in the %WINDIR% directory on Windows machines, in /usr/lib/objrepos on AIX machines, and in /root on Linux machines. This file tracks all products installed using InstallShield. Remove all lines beginning with Tivoli Enterprise Data Warehouse or with Tivoli_Enterprise_Data_Warehouse and entries from the Presentation Service.

The next step is to drop the DB2 databases TWH_MD, TWH_CDW, and TWH_MART:

db2stop force
db2 drop db .

If you get the error database not found, then catalog the databases and then drop them:

db2 catalog db
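The vpd.properties clean-up described earlier in this section is a simple filter on the start of each line. The following Python sketch illustrates it under stated assumptions: the function name is hypothetical, only the two prefixes named in the text are used by default, and the prefix that identifies Presentation Services entries is not given here, so it would have to be supplied by you. Back up the file before changing it.

```python
# Prefixes named in the text. Any further prefix, for example for the
# Presentation Services entries, is an assumption the caller must supply.
TEDW_PREFIXES = (
    "Tivoli Enterprise Data Warehouse",
    "Tivoli_Enterprise_Data_Warehouse",
)


def drop_product_lines(lines, prefixes=TEDW_PREFIXES):
    """Return the lines that do not begin with any of the given prefixes."""
    prefixes = tuple(prefixes)
    return [line for line in lines if not line.startswith(prefixes)]


def clean_vpd_file(path, prefixes=TEDW_PREFIXES):
    """Rewrite the file at path without the matching lines.

    Returns the number of lines that were removed.
    """
    with open(path, "r") as handle:
        original = handle.readlines()
    kept = drop_product_lines(original, prefixes)
    with open(path, "w") as handle:
        handle.writelines(kept)
    return len(original) - len(kept)
```

Calling clean_vpd_file with the path to your copy of vpd.properties returns the number of lines removed, which you can compare against what you expected before continuing with the database drop.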

17.5.2 Un-install the warehouse packs

Before you start the un-installation, you should do the same backups as for the installation of warehouse packs. Warehouse packs can be un-installed with the following steps:
1. List the Warehouse Enablement Packs installed using the following command:

\install\bin\twh_app_deinstall.sh -c twh_app_deinstall.cfg --listapps

2. Edit the twh_app_deinstall.cfg configuration file in the same directory. Follow the instructions in this file. You have to insert the product code and the DB2 password for each database.
3. When ready, run:

twh_app_deinstall.sh -c twh_app_deinstall.cfg

If you are un-installing a warehouse pack to recover from a failed installation, restore the backups taken before the failed installation. You must do this restore before trying to run the install again.

17.6 Troubleshooting IBM Tivoli Monitoring Version 5.1.1 TEDW Support

In this section, we will cover troubleshooting of IBM Tivoli Monitoring Version 5.1.1 TEDW Support with a sample warehouse pack for troubleshooting. We will provide information on troubleshooting commands and techniques that can be used whenever the expected data is not collected into the IBM Tivoli Monitoring database, which is one of the data sources for the Tivoli Enterprise Data Warehouse.


The steps required for checking the data collection process are the following:
1. Retrieving the date of the last data upload into the ITM database
2. Testing the connection between the RIM host and ITM database
3. Checking the status of distributed resource models
4. Reviewing data collection parameters
5. Checking trace files

17.6.1 Retrieving the date of last data upload into ITM database

Connect to ITM_DB:

db2 => connect to ITM_DB user db2inst1 using

   Database Connection Information

 Database server        = DB2/NT 7.2.0
 SQL authorization ID   = DB2INST1
 Local database alias   = ITM_DB

then select the date of last data upload:

db2 => select max(timekey_dttm) from metricsdata

1
--------------------------
2002-10-21-22.00.39.000000

  1 record(s) selected.

The date of the last data upload lets you estimate the time frame in which the data collection process stopped. This estimate is very useful when you examine the monitoring trace files to understand why the data upload failed. You can also check the host names of all endpoints that provided data:

db2 => select host_name from endpoints

HOST_NAME
----------------------------------------------------------
tedw1.itsc.austin.ibm.com
tedw2.itsc.austin.ibm.com
yarmouth.itsc.austin.ibm.com
chatham.itsc.austin.ibm.com

  4 record(s) selected.


This information can be used to track which monitored endpoints are not providing data.
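Both query results in this section are easy to post-process. The following Python sketch shows one possible way to turn the DB2 timestamp into an age in hours and to list the endpoints that are expected to report but do not appear in the endpoints table. The function names and the list of expected endpoints are assumptions made for illustration; the timestamp layout is the one shown in the query output above.

```python
from datetime import datetime

# Layout of the timestamp as printed by the DB2 command line processor above,
# for example 2002-10-21-22.00.39.000000
DB2_TS_FORMAT = "%Y-%m-%d-%H.%M.%S.%f"


def parse_db2_timestamp(text):
    """Convert a DB2 timestamp string into a datetime object."""
    return datetime.strptime(text.strip(), DB2_TS_FORMAT)


def hours_since(last_upload, now):
    """Return the number of hours between the last upload and now."""
    delta = now - parse_db2_timestamp(last_upload)
    return delta.total_seconds() / 3600.0


def silent_endpoints(expected, reporting):
    """Return the expected endpoints that are missing from the query result."""
    return sorted(set(expected) - set(reporting))


if __name__ == "__main__":
    print(hours_since("2002-10-21-22.00.39.000000",
                      datetime(2002, 10, 22, 10, 0, 39)))
```

The age in hours narrows down which part of the trace files to read, and the list of silent endpoints tells you where to check the resource models first.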

17.6.2 Testing the connection between RIM host and ITM database

Once you have ascertained that ITM_DB does not receive data from monitored endpoints, you should check the connection between the RIM host and the database. From any managed node of your Tivoli region that has the Tivoli environment configured, run the wrimtest -l command, as in Example 17-3. If this test fails, you should review all the parameters specified during RIM object creation.

Example 17-3 wrimtest
eastham:/>wrimtest -l itm_rim_chatham
Resource Type : RIM
Resource Label : itm_rim_chatham
Hostname : chatham
User Name : db2inst1
Vendor : DB2
Database : itm_db
Database Home : /usr/lpp/db2_07_01
Server ID : tcpip
Instance Home : /home/db2inst1
Opening Regular Session...Session Opened
RIM : Enter Option >

Often, a connection failure is simply caused by a change of the database password that was not followed by the corresponding update of the RIM object. To change the RIM password, use the command:

wsetrimpw rim_name

17.6.3 Checking the status of distributed resource models

The wdmlseng -e command can be used to list the status of all resource models distributed to the specified endpoint. Only the resource models with status Running are collecting data on the endpoint.


Example 17-4 Status of resource models distributed on an endpoint
eastham:/>wdmlseng -e tedw1
Forwarding the request to the engine...

The following profiles are running:
ITM_NT_monitors#eastham-region
TMW_LogicalDisk :Running
TMW_Process :Running
TMW_MemoryModel :Running
TMW_Processor :Running

To change the status of resource models, modify the original monitoring profile using the Tivoli Desktop and redistribute it to the endpoint. If the monitoring engine is not running correctly on the endpoint, you can try to restart it using the command:

wdmcmd -restart -e
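When many endpoints have to be checked, the output of wdmlseng -e can be scanned for resource models that are not in the Running state. The following Python sketch works on captured output in the layout of Example 17-4. The function names are hypothetical, and the sketch assumes that every resource model appears on its own line in the form name :status.

```python
import re

# One resource model per line, for example "TMW_LogicalDisk :Running".
MODEL_LINE = re.compile(r"^\s*(\w+)\s*:\s*(\w+)\s*$")


def parse_model_status(output):
    """Return a dictionary that maps each resource model name to its status."""
    status = {}
    for line in output.splitlines():
        match = MODEL_LINE.match(line)
        if match:
            status[match.group(1)] = match.group(2)
    return status


def models_not_running(output):
    """Return the names of resource models whose status is not Running."""
    return sorted(name for name, state in parse_model_status(output).items()
                  if state != "Running")


if __name__ == "__main__":
    captured = """eastham:/>wdmlseng -e tedw1
Forwarding the request to the engine...

The following profiles are running:
ITM_NT_monitors#eastham-region
TMW_LogicalDisk :Running
TMW_Process :Running
"""
    print(parse_model_status(captured))
```

The prompt line, the profile name, and the header line do not match the pattern and are ignored, so only the resource model lines contribute to the result.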

17.6.4 Reviewing data collection parameters

The data collector process populates the ITM_DB database with the metric values monitored on the endpoints at the interval you specified in the wdmcollect command. The collected metrics are temporarily cached on the managed node in the $DBDIR/dmml/tedw/ directory. The cache file is a zip file that contains the metrics information in XML format. Once you have issued the wdmcollect command and the interval has expired, you can check whether the file was created.

If your ITM database does not receive data even though your resource models are running correctly on the monitored endpoints and the connection between the RIM host and the database is working, you should verify that:
1. The Enable Data Logging and TEDW Data in Logging option of your monitoring profiles is checked.
2. The data collection parameters are correct (use the wdmconfig -m -G datacollector* command).
3. The gateway data collection process was started with the wdmcollect command (use wdmcollect -m all -q to list the active collection processes for all managed nodes).


17.6.5 Checking trace files

If you went through all the previous steps, but your ITM database still does not receive data, you can examine the log files. There are several trace and log files related to the data collector process under the $DBDIR/AMW/logs directory on the managed nodes. They are:
- msg_DataCollector.log
- trace_tmnt_datacollector_eng1.log
- trace_tmnt_hb_eng1.log
- trace_tmnt_profile_core1.log
- trace_tmnt_rimh_eng1.log
- trace_tmnt_rm_eng1.log
- trace_tmnt_task_eng1.log

The message file contains operational messages, while the trace files contain error messages. Each function of the data collector has its own trace file. These functions are the main data collector engine, heartbeat engine, RIM interface, resource model engine, and task execution engine. For the TEDW interface, the important files are msg_DataCollector.log and trace_tmnt_rimh_eng1.log.

In Example 17-5, we show the output of msg_DataCollector.log for a successful data upload from endpoint tedw1; you can easily follow all the steps from the distribution to the upload to the database.

Example 17-5 msg_DataCollector.log
1035246605000Tue Oct 22 00:30:05 2002 GMTAMWDataCollectoreastham12488AMW - AMW0181I The distribution with ID '19431921771035246602' succeeded on the endpoint 'tedw1'.
1035247476000Tue Oct 22 00:44:36 2002 GMTAMWDataCollectoreastham12488AMW - AMW0189I The data related to the request '520207df' has been successfully received
1035247480000Tue Oct 22 00:44:40 2002 GMTAMWDataCollectoreastham19074AMW - AMW0202I The file '/var/spool/Tivoli/eastham.db/dmml/tedw/tedw1/1035246605.zip' is going to be processed for upload.
1035247481000Tue Oct 22 00:44:41 2002 GMTAMWDataCollectoreastham19074AMW - AMW0198I - The data related to file: '/var/spool/Tivoli/eastham.db/dmml/tedw/tedw1/1035246605.zip.dir/ITM_WH@2002#10 #22#0 #20.xml' have been successfully loaded into the DataBase
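To follow an upload through a long msg_DataCollector.log, it can help to reduce each entry to its time and message ID. The following Python sketch does this for entries in the layout printed in Example 17-5. It assumes that each entry starts with a 13-digit time stamp in milliseconds since 1 January 1970 GMT, which agrees with the readable date that follows it in the example, and that message IDs have the form AMW, four digits, and one letter. The function names are hypothetical, and reading the final letter as a severity code is an assumption based on the I and E suffixes seen in the examples.

```python
import re
from datetime import datetime, timezone

# Leading 13-digit value: milliseconds since 1 January 1970 GMT.
MILLIS_RE = re.compile(r"^(\d{13})")
# Message IDs such as AMW0181I.
MESSAGE_ID_RE = re.compile(r"\b(AMW\d{4}[A-Z])\b")


def parse_entry(line):
    """Return (time, message ID) for one log entry, or None if it does not fit."""
    millis = MILLIS_RE.match(line)
    message = MESSAGE_ID_RE.search(line)
    if not millis or not message:
        return None
    when = datetime.fromtimestamp(int(millis.group(1)) / 1000, tz=timezone.utc)
    return when, message.group(1)


def summarize(lines):
    """Return the parsed entries of all lines that look like log entries."""
    parsed = (parse_entry(line) for line in lines)
    return [entry for entry in parsed if entry is not None]


if __name__ == "__main__":
    entry = ("1035246605000Tue Oct 22 00:30:05 2002 GMTAMWDataCollector"
             "eastham12488AMW - AMW0181I The distribution with ID "
             "'19431921771035246602' succeeded on the endpoint 'tedw1'.")
    print(parse_entry(entry))
```

For a successful upload you would expect to see the same sequence of message IDs as in Example 17-5; a sequence that stops early shows at which step the upload was interrupted.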

In Example 17-6, we show the occurrence of a failure in the data collection process. In this case, the problem was caused by a stop of Tivoli Management Framework processes on the RIM host.

Example 17-6 trace_tmnt_rimh_eng1.log
1037723936000Tue Nov 19 16:38:56 2002 GMTAMWdatacollectoreastham18332MIN../../../../../src/objects/DataCollector/platform/StoreData/RIMConnectionHandler.cxxRIMConnectionHandler::connect()537 026664'FRWSL0005E A communications failure occurred: FRWOG0014E destination dispatcher unavailable Please refer to the TME 10 Framework Planning and Installation Guide, "TME Maintenance and Troubleshooting" for details on diagnosing communication errors or contact your Tivoli support provider.

IBM Tivoli Monitoring also keeps important log files on the endpoints:
- For the Windows platform, most information is stored in the Tmw2k.log file under the $LCF_DATDIR/LCFNEW/Tmw2k directory.
- For the UNIX platform, the logs are under the $LCF_DATDIR/LCFNEW/AMW/logs directory. These log files are:
  - msg_dmxengine.log
  - trace_dmxengine.log
  - trace_dmxeu.log
  - trace_dmxntv.log

As for the managed node logs, the message file contains operational messages, while the trace files contain error messages. In Example 17-7 on page 789, we show how trace_dmxengine.log reports a problem with a distributed resource model (in this case, the Network Interface resource model in the ITM_Unix_monitor profile).


Example 17-7 trace_dmxengine.log
1036710042642Thu Nov 07 17:00:42 CST 2002AMWEngineyarmouth1224 0MINReferenceModel profile=ITM_Unix_monitor#eastham-region model=DMXNetworkInterfaceThread[TmrSrvAction_RMTimer,5,main]No NICs found!None

For further information about IBM Tivoli Monitoring troubleshooting, refer to Appendix C of IBM Tivoli Monitoring User’s Guide Version 5.1.1, SH19-4569.


Part 3. Configuration and operation applications


Chapter 18. Tivoli Workload Scheduler

In this chapter, we help you become familiar with identifying and isolating the most common problems encountered in a Tivoli Workload Scheduler environment. We cover troubleshooting tips and techniques for Tivoli Workload Scheduler for z/OS (formerly known as Operations, Planning, and Control, or OPC), Tivoli Workload Scheduler (formerly known as Maestro), and the end-to-end solution, which uses both products together in an integrated mainframe and distributed scheduling environment.

Note: This chapter is based on Tivoli Workload Scheduler and Tivoli Workload Scheduler for z/OS Version 8.1.

The following topics are discussed in this chapter:
- Section 18.1, “Tivoli Workload Scheduler” on page 794
- Section 18.2, “Tivoli Workload Scheduler for z/OS” on page 799
- Section 18.3, “End-to-end scheduling architecture” on page 805
- Section 18.4, “Troubleshooting for Tivoli Workload Scheduler for z/OS” on page 819
- Section 18.5, “Troubleshooting end-to-end solution” on page 839
- Section 18.6, “Troubleshooting the Job Scheduling Console” on page 854
- Section 18.7, “Troubleshooting Tivoli Workload Scheduler” on page 859


18.1 Tivoli Workload Scheduler

Tivoli Workload Scheduler’s scheduling features help you plan every phase of production. During the processing day, the Tivoli Workload Scheduler production control programs manage the production environment and automate most operator activities. Tivoli Workload Scheduler prepares jobs for execution, resolves interdependencies, and launches and tracks each job. Because jobs start running as soon as their dependencies are satisfied, idle time is minimized, and throughput improves significantly. Jobs never run out of sequence, and, if a job fails, Tivoli Workload Scheduler handles the recovery process with little or no operator intervention.

Tivoli Workload Scheduler is composed of three major parts:
- Tivoli Workload Scheduler engine

  Also called the scheduling engine. This is installed on every computer that should participate in a Tivoli Workload Scheduler network. The engine is a complete Tivoli Workload Scheduler installation, which means all Tivoli Workload Scheduler services and components are installed on the computer. When doing the installation, the engine is configured for the role that the computer with the engine is going to play within the Tivoli Workload Scheduler scheduling network, such as master domain manager, domain manager, or fault tolerant agent. The configuration of the engine role is done in two places: in parameter files (localopts and globalopts), and in the database definition for the Tivoli Workload Scheduler workstation that represents the engine on the physical computer.
- Tivoli Workload Scheduler connector

  Maps Job Scheduling Console commands to the Tivoli Workload Scheduler engine. The Tivoli Workload Scheduler connector runs on the master and on any of the fault tolerant agents (FTA) that you will use as backup machines for the master workstation. The connector requires the Tivoli Management Framework configured for a Tivoli Server or Tivoli managed node.
- Job Scheduling Console (JSC)

  A Java-based graphical user interface (GUI) for the Tivoli Workload Scheduler suite. The Job Scheduling Console runs on any machine from which you want to manage Tivoli Workload Scheduler plan and database objects. It provides, through the Tivoli Workload Scheduler connector, conman and composer functionality. The Job Scheduling Console is not required to be installed on the same machine with the Tivoli Workload Scheduler engine or connector. You can use the Job Scheduling Console from any machine as long as it has a TCP/IP link with the machine running the Tivoli Workload Scheduler connector.


In the next sections, we provide an overview of the Tivoli Workload Scheduler network and workstations, the topology used to describe the architecture in Tivoli Workload Scheduler, the two basic aspects of job scheduling in Tivoli Workload Scheduler (the databases and the plan), and the terminology used in Tivoli Workload Scheduler.

18.1.1 The Tivoli Workload Scheduler network

A Tivoli Workload Scheduler network is made up of the workstations, or CPUs, on which jobs and job streams are run. A Tivoli Workload Scheduler network contains at least one Tivoli Workload Scheduler domain, the master domain, in which the master domain manager is the management hub. It is the master domain manager that manages the databases and it is from the master domain manager that you define new objects in the databases. Additional domains can be used to divide a widely distributed network into smaller, locally managed groups.

In a single domain configuration, the master domain manager maintains communications with all of the workstations (fault tolerant agents) in the Tivoli Workload Scheduler network (see Figure 18-1).

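[Figure: the single domain MASTERDM, with a master domain manager on AIX and the fault tolerant agents FTA1 through FTA4; the agent platforms shown are AIX, OS/400, Windows 2000, and Solaris]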

Figure 18-1 Tivoli Workload Scheduler network with only one domain


Using multiple domains reduces the amount of network traffic by reducing the communications between the master domain manager and the other computers in the network. In Figure 18-2, we have a Tivoli Workload Scheduler network with three domains.

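[Figure: the master domain MASTERDM, with a master domain manager on AIX, and the subordinate domains DomainA and DomainB, managed by the domain managers DMA and DMB; the fault tolerant agents FTA1 through FTA4 are shown, and the platforms shown include AIX, HPUX, OS/400, Windows 2000, and Solaris]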

Figure 18-2 Tivoli Workload Scheduler network with three domains

In a multi-domain configuration, the master domain manager communicates with the workstations in its domain and with the subordinate domain managers. The subordinate domain managers, in turn, communicate with the workstations in their domains and subordinate domain managers. Multiple domains also provide fault-tolerance by limiting the problems caused by losing a domain manager to a single domain. To limit the effects further, you can designate backup domain managers to take over if their domain managers fail. Before the start of each new day, the master domain manager creates a plan for the next 24 hours. This plan is placed in a production control file named Symphony. Tivoli Workload Scheduler is then restarted in the network, and the master domain manager sends a copy of the Symphony file to each of its automatically linked agents and subordinate domain managers. The domain managers, in turn, send copies of the Symphony file to their automatically linked agents and subordinate domain managers.


Once the network is started, scheduling messages like job starts and completions are passed from the agents to their domain managers through the parent domain managers to the master domain manager. The master domain manager then broadcasts the messages throughout the hierarchical tree to update the Symphony files of domain managers and fault tolerant agents running in full status mode. It is important to remember that Tivoli Workload Scheduler does not limit the number of domains or levels (the hierarchy) of your Tivoli Workload Scheduler network. The number of domains or levels in your Tivoli Workload Scheduler network should be based on the topology of the physical network where you want to implement the Tivoli Workload Scheduler network.
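The two flows described above, the Symphony file travelling down the domain hierarchy and the scheduling messages travelling up to the master domain manager, can be pictured with a small model. The following Python sketch is only an illustration of the topology. The workstation names are borrowed from Figure 18-2, but the assignment of agents to domains is our assumption, and the sketch does not reproduce any Tivoli Workload Scheduler interface.

```python
from collections import deque

# Hypothetical topology: each workstation maps to the manager it reports to.
PARENT_OF = {
    "DMA": "MDM", "DMB": "MDM",
    "FTA1": "DMA", "FTA2": "DMA",
    "FTA3": "DMB", "FTA4": "DMB",
}


def path_to_master(workstation, parent_of):
    """Return the hops a status message takes from a workstation to the master."""
    path = [workstation]
    while path[-1] in parent_of:
        next_hop = parent_of[path[-1]]
        if next_hop in path:
            raise ValueError("cycle in domain hierarchy at " + next_hop)
        path.append(next_hop)
    return path


def distribution_order(master, parent_of):
    """Return the workstations in the order the plan reaches them, level by level."""
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    order, queue = [], deque([master])
    while queue:
        current = queue.popleft()
        order.append(current)
        queue.extend(sorted(children.get(current, [])))
    return order


if __name__ == "__main__":
    print(path_to_master("FTA3", PARENT_OF))
    print(distribution_order("MDM", PARENT_OF))
```

When a workstation stops receiving the plan or its job status stops appearing on the master, the path returned for that workstation lists the links that are worth checking first.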

18.1.2 Tivoli Workload Scheduler workstation types

For most cases, workstation definitions refer to physical workstations. However, in the case of extended and network agents, the workstations are logical definitions that must be hosted by a physical Tivoli Workload Scheduler workstation. Tivoli Workload Scheduler workstations can be of the following types:
- Master domain manager (MDM)

  The domain manager in the topmost domain of a Tivoli Workload Scheduler network. It contains the centralized database files used to document scheduling objects. It creates the plan at the start of each day, and performs all logging and reporting for the network. The plan is distributed to all subordinate domain managers and fault tolerant agents.
- Backup master

  A fault tolerant agent or domain manager capable of assuming the responsibilities of the master domain manager for automatic workload recovery. The copy of the plan on the backup master is updated with the same reporting and logging as the master domain manager plan.
- Domain manager

  The management hub in a domain. All communications to and from the agents in a domain are routed through the domain manager. The domain manager can resolve dependencies between jobs in its subordinate agents. The copy of the plan on the domain manager is updated with reporting and logging from the subordinate agents.
- Backup domain manager

  A fault tolerant agent capable of assuming the responsibilities of its domain manager. The copy of the plan on the backup domain manager is updated with the same reporting and logging information as the domain manager plan.
- Fault tolerant agent (FTA)

  A workstation capable of resolving local dependencies and launching its jobs in the absence of a domain manager. It has a local copy of the plan generated in the master domain manager. It is also called a workstation tolerant agent.
- Standard agent

  A workstation that launches jobs only under the direction of its domain manager.
- Extended agent

  A logical workstation definition that enables you to launch and control jobs on other systems and applications, such as PeopleSoft, Oracle Applications, SAP, and MVS JES2 and JES3.
- Network agent

  A logical workstation definition for creating dependencies between jobs and job streams in separate Tivoli Workload Scheduler networks.
- Job Scheduling Console client

  Any workstation running the graphical user interface from which schedulers and operators can manage Tivoli Workload Scheduler plan and database objects. Strictly speaking, this is not a workstation in the Tivoli Workload Scheduler network; the Job Scheduling Console client is where you work with the Tivoli Workload Scheduler database and plan.

Figure 18-3 on page 799 shows a Tivoli Workload Scheduler network with some of the different workstation types.

[Figure: the MASTERDM domain, whose master domain manager runs on AIX, above the subordinate domains DomainA through DomainE (domain managers DMA, DMB, DMC, DMD, and DME). Fault tolerant agents FTA1 through FTA9 run on AIX, HPUX, Solaris, OS/400, Windows NT, and Windows 2000. A Job Scheduling Console is also shown.]

Figure 18-3 Tivoli Workload Scheduler network with different manager and agents
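The routing rule in the workstation descriptions (all communication to and from the agents in a domain goes through the domain manager) can be illustrated with a small model. The following Python sketch is only an illustration: the `Domain` class, the `route_from_master` function, and the three-level topology are hypothetical and are not read from Figure 18-3 or from any Tivoli Workload Scheduler interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Domain:
    name: str
    manager: str                   # workstation acting as domain manager
    parent: Optional[str] = None   # None marks the master domain


def route_from_master(agent: str, agent_domain: Dict[str, str],
                      domains: Dict[str, Domain]) -> List[str]:
    """List the workstations a message crosses from the master domain
    manager down to `agent`, one domain manager per level."""
    hops = [agent]
    domain: Optional[Domain] = domains[agent_domain[agent]]
    while domain is not None:
        # A domain manager asking for its own route is not listed twice.
        if domain.manager != hops[-1]:
            hops.append(domain.manager)
        domain = domains[domain.parent] if domain.parent else None
    hops.reverse()
    return hops


# Hypothetical three-level topology (an assumption, not Figure 18-3).
domains = {
    "MASTERDM": Domain("MASTERDM", manager="MDM"),
    "DomainA": Domain("DomainA", manager="DMA", parent="MASTERDM"),
    "DomainC": Domain("DomainC", manager="DMC", parent="DomainA"),
}
agent_domain = {"FTA1": "DomainA", "FTA4": "DomainC", "DMC": "DomainC"}

print(route_from_master("FTA4", agent_domain, domains))
# ['MDM', 'DMA', 'DMC', 'FTA4']
```

When an agent appears unreachable, a route like this one suggests which domain managers to check first, starting from the top of the list.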

18.2 Tivoli Workload Scheduler for z/OS

Tivoli Workload Scheduler for z/OS expands the scope for automating your data processing operations. It plans and automatically schedules the production workload. From a single point of control, it drives and controls the workload processing at both local and remote sites. By using Tivoli Workload Scheduler for z/OS to increase automation, you use your data processing resources more efficiently, have more control over your data processing assets, and manage your production workload processing better.

Tivoli Workload Scheduler for z/OS is composed of three major features:

• The Tivoli Workload Scheduler for z/OS agent feature

The agent is the base product in Tivoli Workload Scheduler for z/OS. The agent is also called a tracker. It must run on every operating system in your z/OS complex on which Tivoli Workload Scheduler for z/OS controlled work runs. The agent records details of job starts and passes that information to the engine, which updates the plan with statuses.

• The Tivoli Workload Scheduler for z/OS engine feature

One z/OS operating system in your complex is designated the controlling system and it runs the engine. The engine is also called the controller. Only one engine feature is required, even when you want to establish standby engines on other z/OS systems in a sysplex. The engine manages the databases and the plans and causes the work to be submitted at the appropriate time and at the appropriate system in your z/OS sysplex or on another system in a connected z/OS sysplex or z/OS system.

• The Tivoli Workload Scheduler for z/OS end-to-end feature

This feature makes it possible for the Tivoli Workload Scheduler for z/OS engine to manage production workload in a Tivoli Workload Scheduler distributed environment. You can schedule, control, and monitor jobs in Tivoli Workload Scheduler from the Tivoli Workload Scheduler for z/OS engine with this feature. The end-to-end feature is covered in 18.3, “End-to-end scheduling architecture” on page 805.

The workload on other operating environments can also be controlled with the open interfaces provided with Tivoli Workload Scheduler for z/OS. Sample programs using TCP/IP or a Network Job Entry/Remote Spooling Communication Subsystem (NJE/RSCS) combination show how you can control the workload on environments that at present have no scheduling feature.

Besides these major parts, the Tivoli Workload Scheduler for z/OS product also contains the Tivoli Workload Scheduler for z/OS connector and the Job Scheduling Console (JSC).

• Tivoli Workload Scheduler for z/OS Connector

Maps the Job Scheduling Console commands to the Tivoli Workload Scheduler for z/OS engine. The Tivoli Workload Scheduler for z/OS connector requires the Tivoli Management Framework configured for a Tivoli Server or Tivoli managed node.

• Job Scheduling Console (JSC)

A Java based graphical user interface (GUI) for the Tivoli Workload Scheduler suite. The Job Scheduling Console runs on any machine from which you want to manage the Tivoli Workload Scheduler for z/OS engine plan and database objects. It provides, through the Tivoli Workload Scheduler for z/OS connector, functionality similar to the Tivoli Workload Scheduler for z/OS legacy ISPF interface. You can use the Job Scheduling Console from any machine as long as it has a TCP/IP link with the machine running the Tivoli Workload Scheduler for z/OS connector. The same Job Scheduling Console can be used for Tivoli Workload Scheduler and Tivoli Workload Scheduler for z/OS.

18.2.1 Tivoli Workload Scheduler for z/OS configuration

Tivoli Workload Scheduler for z/OS supports many configuration options using a variety of communication methods:

• The controlling system (the controller or engine)
• Controlled z/OS systems
• Remote panels and program interface applications
• Job Scheduling Console
• Scheduling jobs that are in a distributed environment using Tivoli Workload Scheduler (described in 18.3, “End-to-end scheduling architecture” on page 805)

The controlling system

The controlling system requires both the agent and the engine. One controlling system can manage the production workload across all your operating environments. The engine is the focal point of control and information. It contains the controlling functions, the dialogs, the databases, the plans, the scheduler’s own batch programs for housekeeping, and so on. Only one engine is required to control the entire installation, including local and remote systems.

Since Tivoli Workload Scheduler for z/OS provides a single point of control for your Tivoli Workload Scheduler for z/OS production workload, it is important to make this system fail-safe. This way, you minimize the risk of having any outages in your production workload in case the engine or the system with the engine fails. To make your engine fail-safe, you can start backup engines (hot standby engines) on other systems in the same sysplex as the active engine. If the active engine or the controlling system fails, Tivoli Workload Scheduler for z/OS can automatically transfer the controlling functions to a backup system within a Parallel Sysplex. Through the cross-system coupling facility (XCF), Tivoli Workload Scheduler for z/OS can automatically maintain production workload processing during system failures. The standby engine can be started on several z/OS systems in the sysplex.

Figure 18-4 on page 802 shows an active engine with two standby engines running in one sysplex. When an engine is started on a system in the sysplex, it checks whether there is already an active engine in the sysplex. If there is no active engine, it becomes the active engine. If there is an active engine, it becomes a standby engine. The engine in the example has connections to eight agents: three in the sysplex, two remote, and three in another sysplex. The agents on the remote systems and in the other sysplex are connected to the active engine via ACF/VTAM connections.

[Figure: a z/OS sysplex containing the active engine, two standby engines, and three agents. Two remote agents and three remote agents in a second z/OS sysplex connect to the active engine through VTAM.]

Figure 18-4 TWS for z/OS configuration with two sysplex environments
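The startup check described for the engines (become active if no active engine exists, otherwise become a standby) can be sketched as follows. This is a toy model under stated assumptions, not the product's implementation: the `Sysplex` class and its method names are hypothetical, and promoting the first-started standby after a failure is an assumption made only to keep the example deterministic. The text says only that the controlling functions are transferred to a backup system.

```python
from typing import Dict, Optional

ACTIVE = "active"
STANDBY = "standby"


class Sysplex:
    """Toy registry of the engines started in one sysplex."""

    def __init__(self) -> None:
        self.roles: Dict[str, str] = {}   # system name -> role

    def active_engine(self) -> Optional[str]:
        for system, role in self.roles.items():
            if role == ACTIVE:
                return system
        return None

    def start_engine(self, system: str) -> str:
        # An engine becomes active only if no active engine exists yet.
        role = STANDBY if self.active_engine() else ACTIVE
        self.roles[system] = role
        return role

    def fail(self, system: str) -> Optional[str]:
        """Drop a failed engine and return the active engine afterwards.
        Assumption: the first-started standby takes over."""
        was_active = self.roles.pop(system, None) == ACTIVE
        if was_active:
            for candidate, role in self.roles.items():
                if role == STANDBY:
                    self.roles[candidate] = ACTIVE
                    return candidate
        return self.active_engine()


plex = Sysplex()
print(plex.start_engine("SYS1"), plex.start_engine("SYS2"))
```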

Controlled z/OS systems

An agent is required for every controlled z/OS system in a configuration. This includes, for example, local controlled systems within shared DASD or sysplex configurations. The agent runs as a z/OS subsystem and interfaces with the operating system (through JES2 or JES3, and SMF), using the subsystem interface and the operating system exits. The agent monitors and logs the status of work, and passes the status information to the engine via shared DASD, XCF, or ACF/VTAM.

You can exploit z/OS and the cross-system coupling facility (XCF) to connect your local z/OS systems. Rather than being passed to the controlling system via shared DASD, work status information is passed directly via XCF connections.

XCF lets you exploit all of the production-workload-restart facilities and the hot standby function in Tivoli Workload Scheduler for z/OS.

Remote systems

The agent on a remote z/OS system passes status information about the production work in progress to the engine on the controlling system. All communication between Tivoli Workload Scheduler for z/OS subsystems on the controlling and remote systems is done via ACF/VTAM. Tivoli Workload Scheduler for z/OS lets you link remote systems using ACF/VTAM networks. Remote systems are frequently used locally “on premises” to reduce the complexity of the data processing installation.

Remote panels and program interface applications

ISPF panels and program interface (PIF) applications can run in a different z/OS system than the one where the active engine is running. Dialogs and PIF applications send requests to and receive data from a Tivoli Workload Scheduler for z/OS server that is running on the same z/OS system where the target engine is running, via advanced program-to-program communications (APPC). The APPC server will communicate with the active engine to perform the requested actions.

Using an APPC server for ISPF panels and PIF gives a user the freedom to run ISPF panels and PIF on any system in a z/OS enterprise as long as this system has an APPC connection with the system where the active engine is started. This also means that you do not have to make sure that your PIF jobs always run on the z/OS system where the active engine is started. Furthermore, using the APPC server makes it seamless for panel users and PIF programs if the engine is moved to its backup engine.

The APPC server is a separate address space, started and stopped either automatically by the engine or by the user via the z/OS start command. There can be more than one server for an engine. If the dialogs or the PIF applications run on the same z/OS system where the target engine is running, the server may not be involved.

As shown in Figure 18-5 on page 804, it is possible to run the Tivoli Workload Scheduler for z/OS dialogs and PIF applications from any system as long as the system has an ACF/VTAM connection to the APPC server.

[Figure: a z/OS sysplex running the active engine and the APPC server. ISPF panels and PIF programs, including those on three remote systems, reach the APPC server through VTAM.]

Figure 18-5 Using APPC server for remote panels to TWS for z/OS

Job Scheduling Console

The Job Scheduling Console (JSC or JS Console) is another way to work with Tivoli Workload Scheduler for z/OS databases and the current plan. Using the Job Scheduling Console, you have a graphical user interface. The Job Scheduling Console connects to the Tivoli Workload Scheduler for z/OS engine via a Tivoli Workload Scheduler for z/OS TCP/IP server task. The TCP/IP server is a separate address space, started and stopped either automatically by the engine or by the user via the z/OS start and stop commands. There can be more than one TCP/IP server for an engine.

[Figure: two Job Scheduling Consoles connected to the TCP/IP server, which runs alongside the active engine in a z/OS sysplex.]

Figure 18-6 JSC connection to Tivoli Workload Scheduler for z/OS

18.3 End-to-end scheduling architecture

In the two previous sections (18.1, “Tivoli Workload Scheduler” on page 794 and 18.2, “Tivoli Workload Scheduler for z/OS” on page 799), we discussed the architecture for Tivoli Workload Scheduler and the architecture for Tivoli Workload Scheduler for z/OS. In this section, we bring the two products together and describe the Tivoli Workload Scheduler for z/OS end-to-end architecture.

18.3.1 How end-to-end scheduling works

End-to-end scheduling is based on the ability to directly connect a Tivoli Workload Scheduler domain manager, and its underlying agents and domains, to the Tivoli Workload Scheduler for z/OS engine. The engine is seen by the distributed network as the master domain manager. Tivoli Workload Scheduler for z/OS also creates the plan for the distributed network and sends it to the domain manager. The domain manager sends a copy of the plan to each of its linked agents and subordinate domain managers for execution.

The Tivoli Workload Scheduler domain manager acts as the broker for the distributed network by resolving all dependencies for the subordinate managers and agents. It sends its updates (in the form of events) to Tivoli Workload Scheduler for z/OS so that it can update the plan accordingly. Tivoli Workload Scheduler for z/OS handles its own jobs and notifies the domain manager of all the status changes of the Tivoli Workload Scheduler for z/OS jobs that involve the Tivoli Workload Scheduler plan.

In this configuration, the domain manager and all the distributed agents recognize Tivoli Workload Scheduler for z/OS as the master domain manager and notify it of all the changes occurring in their own plans. At the same time, the agents are not permitted to interfere with the Tivoli Workload Scheduler for z/OS jobs, because they are viewed as running on the master, which is the only node in charge of them. With this version of Tivoli Workload Scheduler for z/OS, the fault tolerant agents replace the Tivoli OPC tracker agents and make scheduling possible on the distributed platform with more reliable, fault tolerant, and scalable agents.

In Figure 18-7 on page 807, you can see a Tivoli Workload Scheduler network managed by a Tivoli Workload Scheduler for z/OS engine. This is accomplished by connecting a Tivoli Workload Scheduler domain manager directly to the Tivoli Workload Scheduler for z/OS engine.

Actually, if you compare Figure 18-2 on page 796 with Figure 18-7 on page 807, you will see that the Tivoli Workload Scheduler network that is connected to Tivoli Workload Scheduler for z/OS was a Tivoli Workload Scheduler network managed by a Tivoli Workload Scheduler master domain manager. When connecting this Tivoli Workload Scheduler network to the Tivoli Workload Scheduler for z/OS engine, the former Tivoli Workload Scheduler master domain manager is changed to domain manager for DomainZ (Z was chosen because this domain manager is the intermediary between the Tivoli Workload Scheduler distributed network and the Tivoli Workload Scheduler for z/OS engine). The new master domain manager is the Tivoli Workload Scheduler for z/OS engine.

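[Figure: the MASTERDM domain on z/OS, where the TWS for z/OS engine and the TWS for z/OS server act as master domain manager. Below it, DomainZ (domain manager DMZ on AIX) sits above DomainA (domain manager DMA) and DomainB (domain manager DMB), with fault tolerant agents FTA1 through FTA4. Platforms shown include AIX, HPUX, OS/400, Windows 2000, and Solaris.]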

Figure 18-7 Tivoli Workload Scheduler for z/OS end-to-end scheduling

Tivoli Workload Scheduler for z/OS also allows you to access job streams (schedules in Tivoli Workload Scheduler) and add them to the current plan in Tivoli Workload Scheduler for z/OS. In addition, you can build dependencies among Tivoli Workload Scheduler for z/OS job streams and Tivoli Workload Scheduler jobs. From Tivoli Workload Scheduler for z/OS, you can monitor and control the distributed agents.

In the Tivoli Workload Scheduler for z/OS current plan, you can specify jobs to run on workstations in the Tivoli Workload Scheduler network. The Tivoli Workload Scheduler for z/OS engine passes the job information to the Symphony file in the Tivoli Workload Scheduler for z/OS server, which in turn passes the Symphony file to the Tivoli Workload Scheduler domain manager (DMZ) to distribute and process. In turn, Tivoli Workload Scheduler reports the status of running and completed jobs back to the current plan for monitoring in the Tivoli Workload Scheduler for z/OS engine.

18.3.2 Tivoli Workload Scheduler for z/OS end-to-end components

To run Tivoli Workload Scheduler for z/OS end-to-end scheduling, you must have a Tivoli Workload Scheduler for z/OS server started task dedicated to end-to-end scheduling. It is possible to use the same server to communicate with the Job Scheduling Console. Tivoli Workload Scheduler for z/OS uses TCP/IP for communication.

The Tivoli Workload Scheduler for z/OS engine uses the end-to-end server to communicate events to the distributed agents. The end-to-end server will start multiple tasks and processes using the z/OS UNIX System Services (USS).

Tivoli Workload Scheduler for z/OS end-to-end scheduling is composed of three major components:

• The Tivoli Workload Scheduler for z/OS controller. Manages database objects, creates plans with the workload, and executes and monitors workload in the plan.
• The Tivoli Workload Scheduler for z/OS server. Acts as the Tivoli Workload Scheduler master domain manager. It receives a part of the current plan from the Tivoli Workload Scheduler for z/OS engine, which contains jobs and job streams to be executed in the Tivoli Workload Scheduler network. The server is the focal point for all communication to and from the Tivoli Workload Scheduler network.
• The Tivoli Workload Scheduler primary domain manager. Serves as the communication hub between the Tivoli Workload Scheduler for z/OS server and the distributed Tivoli Workload Scheduler network. The domain manager is connected directly to the Tivoli Workload Scheduler for z/OS master domain manager in USS.

In Tivoli Workload Scheduler for z/OS 8.1.0, it is only possible to connect one focal point domain manager directly to the Tivoli Workload Scheduler for z/OS server (this domain manager is also called the primary domain manager). It is possible to designate a backup domain manager for the focal point Tivoli Workload Scheduler domain manager.

Detailed description of the communication

The communication between the Tivoli Workload Scheduler for z/OS controller and the Tivoli Workload Scheduler for z/OS server is shown in Figure 18-8 on page 809.

[Figure: in the TWS for z/OS controller, the GS, NMM, EM, and WA tasks work with the end-to-end task, whose sender subtask feeds the outbound queue and whose receiver subtask reads the inbound queue. In the TWS for z/OS server (in USS), the output translator and input translator connect these queues to the TWS processes netman, writer, mailman, and batchman through the message files NetReq.msg, Mailbox.msg, Intercom.msg, and tomaster.msg. Netman spawns writer, job log retrievers are spawned as necessary, writer receives from the remote mailman, and mailman sends to the remote writer.]

Figure 18-8 Tivoli Workload Scheduler for z/OS inter-process communication

Tivoli Workload Scheduler for z/OS server processes and tasks

The Tivoli Workload Scheduler for z/OS server uses the following processes and tasks for end-to-end scheduling (see Figure 18-8):

Netman

Replicates the Tivoli Workload Scheduler process. It starts at system startup. It monitors the NetReq.msg queue and the Tivoli Workload Scheduler TCP/IP port (default 31111). When it receives a request, it starts the writer or mailman processes. The request to start or stop mailman will come from the output translator via the NetReq.msg queue. The request to start or stop writer will come from mailman on the Tivoli Workload Scheduler domain manager via the TCP/IP port.

Writer

Replicates the Tivoli Workload Scheduler process. It is started by netman on request from the mailman of the connected Tivoli Workload Scheduler domain manager. Writer has the task of writing the events that it receives from the remote mailman in Mailbox.msg.

Mailman

Replicates the Tivoli Workload Scheduler process. Its main tasks are:
– Routing events. It reads the events stored in the Mailbox.msg queue and sends them either to the controller, writing them in tomaster.msg, or to the remote writer on the Tivoli Workload Scheduler domain manager.
– Establishing the connection with the domain manager by calling the remote netman to start writer.
– Sending the Symphony file to the directly connected primary Tivoli Workload Scheduler domain manager when a new Symphony file is created.

Batchman

Updates the Symphony file and resolves dependencies at the master level. It replicates the functionality of Tivoli Workload Scheduler’s batchman process to a limited extent.

Job log retriever

Receives from each distributed agent the log of a job executed on the agent. After the job log retriever has received the log, it sizes the log according to Tivoli Workload Scheduler for z/OS specifications, translates it from UTF-8 to the Tivoli Workload Scheduler for z/OS EBCDIC codepage, and enqueues it in the inbound queue of the controller. The retrieval of a job log is a lengthy operation and users may request several logs at the same time. For this reason, a subtask is started for each job log retrieval. The subtasks are temporary and terminate after the logs are enqueued in the inbound queue. A user of the Tivoli Workload Scheduler for z/OS panel interface is notified by a message when the job log is received.
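The two transformations the job log retriever applies (sizing the log, then translating it from UTF-8 to an EBCDIC codepage) can be illustrated with a minimal Python sketch. Everything specific here is an assumption: the function name `prepare_job_log` is hypothetical, the 100-line limit is an arbitrary stand-in for the product's sizing rules, and `cp037` is used only because Python ships that EBCDIC codec. The codepage actually used depends on the installation.

```python
def prepare_job_log(raw_utf8: bytes, max_lines: int = 100,
                    codepage: str = "cp037") -> bytes:
    """Trim a UTF-8 job log to max_lines and re-encode it as EBCDIC.

    max_lines and codepage are illustrative assumptions, not product values.
    """
    # Undecodable bytes become U+FFFD instead of aborting the retrieval.
    text = raw_utf8.decode("utf-8", errors="replace")
    lines = text.splitlines()[:max_lines]
    # Characters missing from the target codepage are replaced with '?'.
    return "\n".join(lines).encode(codepage, errors="replace")


ebcdic = prepare_job_log("step 1 ok\nstep 2 ok\nstep 3 ok".encode("utf-8"),
                         max_lines=2)
print(ebcdic.decode("cp037"))
```

A log that looks garbled on the z/OS side is often a sign that the two ends disagree about the codepage, which is why the sketch keeps it as an explicit parameter.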

• Events from server to controller

Input translator

Translates the events read from the tomaster.msg file to the Tivoli Workload Scheduler for z/OS format (including UTF-8 to EBCDIC translation), and writes them in the inbound queue.

Receiver subtask

A subtask of the end-to-end task run in the Tivoli Workload Scheduler for z/OS controller. It receives events from the inbound queue and enqueues them to the Event Manager (EM) task. The events have already been filtered and processed by the input translator.

• Events from controller to server

Sender subtask

A subtask of the end-to-end task in the Tivoli Workload Scheduler for z/OS controller. It receives events for changes in the engine plan, related to Tivoli Workload Scheduler agents. The Tivoli Workload Scheduler for z/OS tasks that can change the current plan are: General Service (GS), Normal Mode Manager (NMM), Event Manager (EM), and Workstation Analyzer (WA). The NMM sends events to the sender task when the plan is extended or replanned for synchronization purposes.

Output translator

Receives the events in Tivoli Workload Scheduler for z/OS format from the outbound queue and processes them to activate the correct Tivoli Workload Scheduler function. It also translates event names from the Tivoli Workload Scheduler for z/OS EBCDIC codepage to UTF-8. The output translator interacts with three different components, depending on the type of the event:
– Starts a job log retriever subtask if the event is to retrieve the log of a job from a Tivoli Workload Scheduler agent.
– Enqueues an event in NetReq.msg if the event is to start or stop mailman.
– Enqueues events in Mailbox.msg for the other events that are sent to update the Symphony file on the Tivoli Workload Scheduler agents (for example, events for a job that has changed status, events for manual changes on jobs or workstations, or events to link/unlink workstations).
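The three-way routing performed by the output translator can be summarized as a small dispatch function. This Python sketch is a model only: the `Event` and `ServerQueues` classes and the event kind strings are hypothetical names introduced for illustration, and in-memory queues stand in for the NetReq.msg and Mailbox.msg files.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List

# Hypothetical event kinds; the real event names are not given in the text.
RETRIEVE_JOB_LOG = "retrieve_job_log"
START_MAILMAN = "start_mailman"
STOP_MAILMAN = "stop_mailman"


@dataclass
class Event:
    kind: str
    payload: str = ""


@dataclass
class ServerQueues:
    netreq: Deque[Event] = field(default_factory=deque)    # stands in for NetReq.msg
    mailbox: Deque[Event] = field(default_factory=deque)   # stands in for Mailbox.msg
    retriever_subtasks: List[str] = field(default_factory=list)


def dispatch(event: Event, queues: ServerQueues) -> str:
    """Route one outbound event and return the name of its destination."""
    if event.kind == RETRIEVE_JOB_LOG:
        queues.retriever_subtasks.append(event.payload)   # one subtask per request
        return "job log retriever"
    if event.kind in (START_MAILMAN, STOP_MAILMAN):
        queues.netreq.append(event)
        return "NetReq.msg"
    queues.mailbox.append(event)   # every other event updates the Symphony file
    return "Mailbox.msg"


queues = ServerQueues()
print(dispatch(Event(STOP_MAILMAN), queues))
```

When events seem to go missing, this mapping indicates which message file to inspect for a given type of event.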

Tivoli Workload Scheduler for z/OS data sets and files used for end-to-end

The Tivoli Workload Scheduler for z/OS server and controller use the following data sets and files for end-to-end scheduling:

EQQTWSIN

Sequential data set used to queue events sent by the server to the controller (the inbound queue). Must be defined in the Tivoli Workload Scheduler for z/OS controller and server started task procedure.

EQQTWSOU

Sequential data set used to queue events sent by the controller to the server (the outbound queue).

Symphony

HFS file containing the active copy of the plan used by the distributed Tivoli Workload Scheduler agents. This file is not shown in Figure 18-8 on page 809.

Sinfonia

HFS file containing the distribution copy of the plan used by the distributed Tivoli Workload Scheduler agents. This file is not shown in Figure 18-8 on page 809.

NetReq.msg

HFS file used to queue requests for the netman process.

Mailbox.msg

HFS file used to queue events sent to the mailman process.

Intercom.msg

HFS file used to queue events sent to the batchman process.

tomaster.msg

HFS file used to queue events sent to the input translator process.

EQQSCLIB

Partitioned data set used as a repository for the definitions of the jobs running on distributed agents. This data set is not shown in Figure 18-8 on page 809. The EQQSCLIB data set is described in 18.3.4, “Tivoli Workload Scheduler for z/OS end-to-end database objects” on page 813.

EQQSCPDS

VSAM data set containing a copy of the current plan used by the daily plan batch programs to create the Symphony file. This data set is not shown in Figure 18-8 on page 809. The end-to-end plan creating process is described in 18.3.5, “Tivoli Workload Scheduler for z/OS end-to-end plans” on page 815.

18.3.3 Tivoli Workload Scheduler for z/OS end-to-end configuration

The topology of the distributed Tivoli Workload Scheduler network that is connected to the Tivoli Workload Scheduler for z/OS engine is described in parameter statements for the Tivoli Workload Scheduler for z/OS server and for the Tivoli Workload Scheduler for z/OS programs that handle the long-term plan and the current plan. Parameter statements are also used to activate the end-to-end subtasks in the Tivoli Workload Scheduler for z/OS controller.

Please refer to Chapter 3, “Planning, installation, and configuration of the TWS 8.1”, in End-to-End Scheduling with Tivoli Workload Scheduler 8.1, SG24-6022 for a detailed explanation of these parameters. There you will also find an example of how to reflect a specific Tivoli Workload Scheduler network topology in the Tivoli Workload Scheduler for z/OS server and plan programs using the Tivoli Workload Scheduler for z/OS topology parameter statements.

18.3.4 Tivoli Workload Scheduler for z/OS end-to-end database objects

To be able to run any workload in the Tivoli Workload Scheduler distributed network, you must define some database objects related to the Tivoli Workload Scheduler workload in the Tivoli Workload Scheduler for z/OS engine databases. The Tivoli Workload Scheduler for z/OS end-to-end related database objects are:

• Tivoli Workload Scheduler for z/OS fault tolerant workstations

A fault tolerant workstation is a computer workstation configured to schedule jobs on distributed agents. The workstation must also be defined in the server CPUREC initialization statement.

• Tivoli Workload Scheduler for z/OS job streams, jobs, and dependencies

Job streams and jobs to run on Tivoli Workload Scheduler distributed agents are defined like other job streams and jobs in Tivoli Workload Scheduler for z/OS. To run a job on a Tivoli Workload Scheduler distributed agent, the job is simply defined on a fault tolerant workstation. Dependencies between Tivoli Workload Scheduler distributed jobs are created exactly the same way as other job dependencies in the Tivoli Workload Scheduler for z/OS engine. This is also the case when creating dependencies between Tivoli Workload Scheduler distributed jobs and Tivoli Workload Scheduler for z/OS mainframe jobs. Some of the Tivoli Workload Scheduler for z/OS mainframe specific options will not be available for Tivoli Workload Scheduler distributed jobs.

• Tivoli Workload Scheduler for z/OS resources

Only global resources are supported and can be used for Tivoli Workload Scheduler distributed jobs. That means the resource dependency is resolved by the Tivoli Workload Scheduler for z/OS engine (controller) and not locally on the distributed agent. For a job running on a distributed agent, the usage of resources causes the loss of fault tolerance. Only the engine determines the availability of a resource and consequently lets the distributed agent start the job. Thus, if a job running on a distributed agent uses a resource, the following occurs:
– When the resource is available, the engine sets the state of the job to started and the extended status to waiting for submission.
– The engine sends a release-dependency event to the distributed agent.
– The distributed agent starts the job.

If the connection between the engine and the distributed agent breaks, the operation does not start on the distributed agent even if the resource becomes available.
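The loss of fault tolerance described above follows from the order of the three steps, which the following Python sketch makes explicit. It is a simplified model: the `DistributedJob` class, the `try_release` function, and the initial status strings are hypothetical. The text does not say whether the engine still updates the job state while the link is down, so the sketch assumes that it does.

```python
from dataclasses import dataclass


@dataclass
class DistributedJob:
    name: str
    state: str = "waiting"                          # hypothetical initial state
    extended_status: str = "waiting for resource"   # hypothetical label
    started_on_agent: bool = False


def try_release(job: DistributedJob, resource_available: bool,
                link_up: bool) -> bool:
    """Return True only if the agent actually starts the job."""
    if not resource_available:
        return False          # the engine alone decides availability
    # Step 1: the engine marks the job (assumed to happen even if the link is down).
    job.state = "started"
    job.extended_status = "waiting for submission"
    # Step 2: the release-dependency event must reach the agent.
    if not link_up:
        return False          # no event delivered, so the agent cannot start the job
    # Step 3: the agent starts the job.
    job.started_on_agent = True
    return True


job = DistributedJob("PAYROLL")
print(try_release(job, resource_available=True, link_up=False))
```

A job that shows the extended status waiting for submission for a long time is therefore worth checking against the link status of its workstation.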

Note: When you monitor a job on a fault tolerant agent by means of the Tivoli Workload Scheduler interfaces, you will not be able to see the resource used by the job. Instead, you see the job, OPCMASTER#GLOBAL.SPECIAL_RESOURCES. The dependency on OPCMASTER#GLOBAL.SPECIAL_RESOURCES is set by the engine. When monitoring by means of the Tivoli Workload Scheduler for z/OS interfaces, you can see the resources as expected.

Every job that has special resource dependencies has a dependency on this job. When the engine allocates the resource for the job, the dependency is released (the engine sends a release event for the specific resource to the agent through the distributed network).

• The task associated with the distributed agent job, defined in Tivoli Workload Scheduler for z/OS

A special partitioned data set, EQQSCLIB, allocated in the Tivoli Workload Scheduler for z/OS engine started task procedure, is used to store the job or task definitions for distributed agent jobs. For every distributed agent job definition in Tivoli Workload Scheduler for z/OS, there must be a corresponding member in the EQQSCLIB data set. The members of EQQSCLIB contain a JOBREC statement that describes the path to the job or the command to be executed and the user to be used when the job or command is executed.

Example for a UNIX script:

JOBREC JOBSCR(/Tivoli/tws/scripts/script001_accounting) JOBUSR(userid01)

Example for a UNIX command:

JOBREC JOBCMD(ls) JOBUSR(userid01)

It is not possible to use Tivoli Workload Scheduler for z/OS JCL variables or automatic recovery statements in the task definition for distributed agent jobs, because the task definition is placed in a separate library and does not contain the actual script (JCL), only its location (the path).

The JOBREC definitions are read by the Tivoli Workload Scheduler for z/OS plan programs, when producing the new current plan, and placed as part of the job definition in the Symphony file. If a Tivoli Workload Scheduler distributed job stream is added to the plan in Tivoli Workload Scheduler for z/OS, the JOBREC definition will be read by Tivoli Workload Scheduler for z/OS, copied to the Symphony file on the Tivoli Workload Scheduler for z/OS server, and sent (as events) by the server to the Tivoli Workload Scheduler agent Symphony files via the directly connected Tivoli Workload Scheduler domain manager.

It is important to remember that the EQQSCLIB members only have a pointer (the path) to the job that is going to be executed. The actual job (the “JCL”) is placed locally on the distributed agent or workstation in the directory defined by the JOBREC JOBSCR definition.
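When checking EQQSCLIB members by hand, it can help to split a JOBREC statement into its keywords. The following Python sketch handles only the simple form used in the two examples in this section, that is, keywords written as KEYWORD(value) with no parentheses inside the value. The function name `parse_jobrec` is hypothetical, and the rule that exactly one of JOBSCR or JOBCMD must be present is an assumption inferred from those examples, not a statement of the product's syntax rules.

```python
import re
from typing import Dict

# KEYWORD(value); a value containing ')' is not supported by this sketch.
_KEYWORD = re.compile(r"(\w+)\(([^)]*)\)")


def parse_jobrec(member_text: str) -> Dict[str, str]:
    """Split a simple JOBREC statement into a keyword -> value mapping."""
    tokens = member_text.split(None, 1)
    if not tokens or tokens[0].upper() != "JOBREC":
        raise ValueError("member does not start with a JOBREC statement")
    rest = tokens[1] if len(tokens) > 1 else ""
    fields = {key.upper(): value.strip()
              for key, value in _KEYWORD.findall(rest)}
    # Assumption: a member names either a script or a command, not both.
    if ("JOBSCR" in fields) == ("JOBCMD" in fields):
        raise ValueError("expected exactly one of JOBSCR or JOBCMD")
    return fields


print(parse_jobrec("JOBREC JOBCMD(ls) JOBUSR(userid01)"))
# {'JOBCMD': 'ls', 'JOBUSR': 'userid01'}
```

Because the member holds only the path, a successful parse says nothing about whether the script exists on the agent; that still has to be checked on the workstation itself.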

18.3.5 Tivoli Workload Scheduler for z/OS end-to-end plans

When scheduling jobs in the Tivoli Workload Scheduler environment, current plan processing also includes the automatic generation of the Symphony file that goes to the Tivoli Workload Scheduler for z/OS server and Tivoli Workload Scheduler subordinate domain managers, as well as fault tolerant agents.

If the end-to-end feature is activated in Tivoli Workload Scheduler for z/OS, the current plan program will read the topology definitions described in the TOPOLOGY, DOMREC, CPUREC, and USRREC initialization statements (see 18.3.3, “Tivoli Workload Scheduler for z/OS end-to-end configuration” on page 812) and the script library (EQQSCLIB) as part of the planning process. Information from the initialization statements and the script library will be used to create a Symphony file for the Tivoli Workload Scheduler distributed agents (see Figure 18-9).

[Figure: the current plan extension and replan programs take the old current plan and the databases (resources, workstations, job streams), remove completed job streams, and add detail for the next day to produce the new current plan. To produce the new Symphony file, they (1) extract the TWS plan from the current plan, (2) add topology (domain, workstation) from the topology definitions, and (3) add the task definition (path and user) for distributed TWS jobs from the script library.]

Figure 18-9 Creation of Symphony file in TWS for z/OS plan programs

Note that creating the plan, extracting the plan objects related to the distributed agents, and building the related Symphony file do not involve Jnextday or any of the Jnextday processes used by Tivoli Workload Scheduler. The process is handled by the Tivoli Workload Scheduler for z/OS planning programs.

Detailed description of the Symphony creation

See Figure 18-8 on page 809 for a description of the tasks and processes involved in the Symphony creation.
• The Tivoli Workload Scheduler for z/OS Normal Mode Manager (NMM) sends an event to the output translator to stop the Tivoli Workload Scheduler network. Meanwhile, the plan program has started producing the Tivoli Workload Scheduler for z/OS plan and the Symphony file with workstation information.
• The output translator stops the Tivoli Workload Scheduler for z/OS server in USS and stops processing incoming events. An End Sync event is added to the inbound queue. The output translator then begins stopping all the Tivoli Workload Scheduler agents.
• The Event Manager (EM) processes all the events on the inbound queue until the Sync Stop event is found, then notifies NMM that the Tivoli Workload Scheduler network has been stopped.
• When the plan program has produced the new plan, NMM waits, if necessary, for EM to finish processing events. NMM then applies to the new plan the job tracking events that were received while the new plan was being produced. It then makes a backup of the new current plan in the Tivoli Workload Scheduler for z/OS current plan data set (CP1, CP2) and in the Symphony Current Plan (SCP) data set. NMM sends a CP Ready Sync event to the output translator to separate events from the old plan and events from the new plan.
• The Tivoli Workload Scheduler for z/OS mainframe schedule is resumed.
• The plan program starts producing the Symphony file, starting from the SCP.
• When the Symphony file has been created, the plan program ends, and NMM notifies the output translator that the new Symphony file is ready.
• The output translator copies the new Symphony (the Symnew file) into the Symphony and Sinfonia files, and a Symphony OK (or NOT OK) Sync event is sent to the Tivoli Workload Scheduler for z/OS engine, which logs a message in the engine message log indicating whether the Symphony has been switched.
• The Tivoli Workload Scheduler for z/OS server master is started in USS and the input translator starts to process new events. As in Tivoli Workload Scheduler, the distributed Mailman and Batchman processes handle the events left in local event files and start distributing the new Symphony file to the whole Tivoli Workload Scheduler network.

When the Symphony file is created by the Tivoli Workload Scheduler for z/OS plan programs, it (or, more precisely, the Sinfonia file) is distributed to the Tivoli Workload Scheduler for z/OS subordinate domain manager, which in turn distributes the Symphony (Sinfonia) file to its subordinate domain managers and fault tolerant agents (see Figure 18-10).

[Figure 18-10 shows the domain hierarchy. In MASTERDM, the master domain manager on z/OS extracts the TWS plan from the TWS for z/OS plan. The TWS plan is then distributed to domain manager DMZ (DomainZ, AIX) and from there to the subordinate domain managers DMA (DomainA) and DMB (DomainB) and the fault tolerant agents FTA1 through FTA4. The platforms shown in the lower domains are AIX, HPUX, OS/400, Windows 2000, and Solaris.]

Figure 18-10 Symphony file distribution from TWS for z/OS server to TWS agents

The Symphony file is generated:
• Every time the Tivoli Workload Scheduler for z/OS plan is extended or replanned
• When a Symphony Renew batch job is submitted (from the Tivoli Workload Scheduler for z/OS legacy ISPF panels, option 3.5)

The Symphony file contains:
• Jobs to be executed on Tivoli Workload Scheduler distributed agents
• z/OS (mainframe) jobs that are predecessors to Tivoli Workload Scheduler distributed jobs
• Job streams that have at least one job in the Symphony file


After the Symphony file is created and distributed to the Tivoli Workload Scheduler distributed agents, the Symphony file is updated by events:
• When job status changes
• When jobs or job streams are modified
• When jobs or job streams for the Tivoli Workload Scheduler distributed agents are added to the plan in the Tivoli Workload Scheduler for z/OS engine

If you look at the Symphony file locally on a Tivoli Workload Scheduler distributed agent, from the Job Scheduling Console or by using the Tivoli Workload Scheduler command line interface to the plan (Conman), you will see that:
• The Tivoli Workload Scheduler workstation has the same name as the related workstation. OPCMASTER is the hardcoded name for the master domain manager workstation for the Tivoli Workload Scheduler for z/OS engine.
• The name of the job stream (or schedule) is the hexadecimal representation of the occurrence (job stream instance) token, which is a unique and invariant internal identifier for occurrences. The job streams are always defined on the OPCMASTER workstation (because they have no dependencies, this does not reduce fault tolerance).
• Using this hexadecimal representation for the job stream instances makes it possible to have several instances of the same job stream, since each has a unique job stream name. Therefore, it is possible to have a plan in the Tivoli Workload Scheduler for z/OS engine and a distributed Symphony file that span more than 24 hours.
• The job name has the form <type>_<operation number>_<occurrence name> or <type>_<operation number>_<counter>_<occurrence name>

Where:
– <type> is “J” for normal jobs or “P” for jobs that are representing pending predecessors.
– <operation number> is the operation number for the job in the job stream.
– <counter> is incremented when the same operation is re-created; if 0, it is omitted.
– <occurrence name> is the occurrence (job stream) name.
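The naming rule above can be sketched as a small parser. This is only an illustration, not a Tivoli utility: it assumes the fields are joined by underscores in the order listed (type, operation number, optional counter, occurrence name), and the names SymphonyJobName and parse_job_name, as well as the sample job names, are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SymphonyJobName(NamedTuple):
    job_type: str    # "J" (normal job) or "P" (pending predecessor)
    operation: int   # operation number of the job in the job stream
    counter: int     # re-creation counter; 0 when it is omitted from the name
    occurrence: str  # occurrence (job stream) name


# Assumed layout: TYPE_OPNUM_OCCURRENCE or TYPE_OPNUM_COUNTER_OCCURRENCE.
# An occurrence name that itself starts with digits and an underscore
# would be ambiguous under this assumption.
_JOB_NAME = re.compile(r"^([JP])_(\d+)_(?:(\d+)_)?(.+)$")


def parse_job_name(name: str) -> Optional[SymphonyJobName]:
    """Split a Symphony job name into its parts, or return None if it does not fit."""
    match = _JOB_NAME.match(name)
    if match is None:
        return None
    job_type, opnum, counter, occurrence = match.groups()
    return SymphonyJobName(job_type, int(opnum), int(counter or 0), occurrence)
```

A name that does not start with J or P is rejected rather than guessed at, which keeps the sketch from silently misreading names that follow some other convention.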


In normal situations, the Symphony file is automatically generated as part of the Tivoli Workload Scheduler for z/OS plan process. Because the topology definitions are read and built into the Symphony file by the Tivoli Workload Scheduler for z/OS plan programs, situations can occur during regular operation in which you need to renew (or rebuild) the Symphony file from the Tivoli Workload Scheduler for z/OS plan:
• When you make changes to the script library or to the definitions of the TOPOLOGY statement
• When you add or change information in the plan, such as workstation definitions

To have the Symphony file rebuilt or renewed, you can use the SYMPHONY RENEW option of the DAILY PLANNING menu (option 3.5 in the legacy Tivoli Workload Scheduler for z/OS ISPF panels). This renew function can also be used to recover from error situations, such as:
• There is a non-valid job definition in the script library.
• The workstation definitions are incorrect.
• An incorrect Windows NT user name or password is specified.
• You make changes to the script library or to the definitions of the TOPOLOGY statement.

18.4 Troubleshooting for Tivoli Workload Scheduler for z/OS

This section shows you how to identify the distinguishing features of a problem that will help you obtain a solution. The answer might then be found in the manuals, but often it is not possible to get a solution or circumvention without involving the Tivoli support structure. Even so, this information can be helpful for a first analysis or for providing the right documentation when you are facing a problem. A good guideline for this chapter is the Tivoli Workload Scheduler for z/OS V8R1 Diagnosis Guide and Reference, LY19-6410.

To identify an error, you must first gather information related to the problem, such as abend codes and dumps. You can then determine whether the problem is in Tivoli Workload Scheduler for z/OS. If it is, this chapter helps you classify and describe the problem. The external symptoms of several problems are described to help you identify which problem type to investigate. Each problem type requires a different procedure when you describe the problem. Use these procedures to build a string of keywords and to obtain documentation relevant to the problem. This combination of a keyword string and associated documentation helps you describe the problem accurately to the Tivoli service personnel.

18.4.1 Using keywords to describe a problem

A keyword is a word or abbreviation that describes a single aspect of a program failure to the Tivoli Support Center. You use keywords to describe all aspects of a problem, from the Tivoli Workload Scheduler for z/OS component ID to the area of failure. You then use the problem analysis procedures to build a keyword string. For example, if your program failure is due to the abnormal termination of a task, the keyword is ABEND. Other keywords are also formed to describe particular aspects of abnormal termination, such as the name of the module where the abend occurred. These keywords are then combined to form a keyword string. Let us look at the following example:

5697WSZ01 ABEND0C4 EQQYVARG

In this example, 5697WSZ01 is the Tivoli Workload Scheduler for z/OS component ID, ABEND is the problem type, and 0C4 is the abend code. EQQYVARG is the module containing the abend.
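The way such a string is assembled can be sketched in a few lines. This is a minimal illustration under stated assumptions, not an official tool: the function name build_keyword_string is hypothetical, the modifier (here the abend code) is assumed to be appended directly to the problem-type keyword as in the example, and the component ID is assumed to be written without a hyphen.

```python
# Problem-type keywords described in this chapter.
PROBLEM_TYPES = {"ABEND", "ABENDU", "DOC", "INCORROUT", "LOOP", "MSG", "PERFM", "WAIT"}


def build_keyword_string(component_id, problem_type, modifier="", *extra):
    """Join the component ID, problem-type keyword, and modifiers with blanks."""
    if problem_type not in PROBLEM_TYPES:
        raise ValueError(f"unknown problem-type keyword: {problem_type}")
    # Assumption: the hyphen in the component ID is dropped, as in the example string.
    component = component_id.replace("-", "")
    return " ".join([component, f"{problem_type}{modifier}", *extra])


# Reproduces the keyword string used in the example above.
print(build_keyword_string("5697-WSZ01", "ABEND", "0C4", "EQQYVARG"))
```

Varying the extra keywords (for example, leaving out the module name) is one way to widen or narrow a search with the same base string.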

18.4.2 Searching the software-support database

To determine whether the problem has been noted before, you can use the keyword string that you create to search the software-support database. If a problem similar to yours is described in the database, a solution is probably available. To widen or narrow the database search, you can vary the keyword string you develop. If you have access to the Tivoli support database, you can reach it at the following URL:

http://www.tivoli.com/support/

18.4.3 Problem-type keywords

The problem-type keywords are used to identify the failure that occurred. Table 18-1 on page 821 lists the keywords and the problem types they identify.


Table 18-1 Keywords

Keyword      Meaning
ABEND        Abnormal end
ABENDU       Abnormal end with user abend code
DOC          Documentation
LOOP         Loop
WAIT         Wait
MSG          Message
PERFM        Performance
INCORROUT    Incorrect output

ABEND

Choose the ABEND keyword when the Tivoli Workload Scheduler for z/OS program comes to an abnormal end with a system abend code. You should also use ABEND when any program that services Tivoli OPC (for example, VTAM) terminates it, and one of the following symptoms appears:
• An abend message at an operator console. The abend message contains the abend code and is found in the system console log.
• A dump is created in a dump dataset.

ABENDU

Choose the ABENDU keyword when the Tivoli Workload Scheduler for z/OS program comes to an abnormal end with a user abend code and the explanation of the abend code states that it is a program error. Also choose this keyword when a user abend that is not supposed to signify a program error occurs in a situation where, according to its explanation, it should not occur. If a message was issued, use the MSG keyword to document it.

DOC

Choose the DOC keyword when one or more of the following symptoms appears:
• There is incomplete or inaccurate information in a Tivoli OPC publication.
• The published description of Tivoli OPC does not agree with its actual operation.


INCORROUT

Choose the INCORROUT keyword when one or more of these symptoms appears:
• You received unexpected output, and the problem does not appear to be a loop.
• The output appears to be incorrect or incomplete.
• The output is formatted incorrectly.
• The output comes from damaged files or from files that are not set up or updated correctly.

LOOP

Choose the LOOP keyword when one or more of the following symptoms exists:
• Part of the program (other than a message) is repeating itself.
• A Tivoli Workload Scheduler for z/OS command has not completed after an expected period of time, and the processor usage is at higher-than-normal levels.
• The processor is used at higher-than-normal levels, a workstation operator experiences terminal lockout, or there is high channel activity to a Tivoli Workload Scheduler for z/OS database.

MSG

Choose the MSG keyword to specify a message failure. Use this keyword when a Tivoli Workload Scheduler for z/OS problem causes an error message. The message might appear at the system console or in the message log, or both. The messages issued by Tivoli Workload Scheduler for z/OS appear in the following formats:
• EQQFnnnC
• EQQFFnnC
• EQQnnnnC

The message identifier is followed by the message text. The variable components represent:
• F or FF: The Tivoli Workload Scheduler for z/OS component that issued the message.
• nn, nnn, or nnnn: The message number.
• C: The severity code of I (information), W (warning), or E (error).


The following are message number examples:
• EQQN008E
• EQQW110W
• EQQF008I

If the message that is associated with your problem does not have the EQQ prefix, your problem is probably not associated with Tivoli Workload Scheduler for z/OS, and you should not use the MSG keyword.
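The three identifier formats can be recognized mechanically. The following is a minimal sketch that assumes the component letters are uppercase and that only the severity codes I, W, and E listed above occur; the names EqqMessageId and parse_message_id are hypothetical and not part of any Tivoli interface.

```python
import re
from typing import NamedTuple, Optional


class EqqMessageId(NamedTuple):
    component: str  # "F" or "FF" part; empty for the EQQnnnnC format
    number: str     # message number, kept as text to preserve leading zeros
    severity: str   # "I", "W", or "E"


# One alternative per documented format: EQQFnnnC, EQQFFnnC, EQQnnnnC.
_MESSAGE_ID = re.compile(
    r"^EQQ(?:([A-Z])(\d{3})|([A-Z]{2})(\d{2})|(\d{4}))([IWE])$"
)

SEVERITY_TEXT = {"I": "information", "W": "warning", "E": "error"}


def parse_message_id(message_id: str) -> Optional[EqqMessageId]:
    """Return the parts of an EQQ message identifier, or None if it is not one."""
    match = _MESSAGE_ID.match(message_id.strip())
    if match is None:
        return None
    one, num3, two, num2, num4, severity = match.groups()
    return EqqMessageId(one or two or "", num3 or num2 or num4, severity)
```

A result of None corresponds to the case described above: an identifier without the EQQ prefix (or with an unexpected shape) should not be documented with the MSG keyword on the strength of this check alone.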

PERFM

Choose the PERFM keyword when one or more of the following symptoms appears:
• Tivoli Workload Scheduler for z/OS event processing or commands, including commands entered from a terminal in session, take an excessive amount of time to complete.
• Tivoli Workload Scheduler for z/OS performance characteristics do not meet explicitly stated expectations. Describe the actual and expected performance and the explicit source of the performance expectation.

WAIT

Choose the WAIT keyword when one or more of the following symptoms appears:
• The Tivoli Workload Scheduler for z/OS program, or any program that services this program, has suspended activity while waiting for a condition to be satisfied, without issuing a message to indicate why it is waiting.
• The console operator cannot enter commands or otherwise communicate with the subsystem, and Tivoli Workload Scheduler for z/OS does not appear to be in a loop.

18.4.4 Problem analysis procedures

This section details the procedures that you use to further describe a problem. When you have chosen a problem-type keyword, you need to collect problem documentation and create a keyword string to describe the problem. To do this, follow the procedure for the specific problem type:
• System or user abnormal termination procedure (ABEND or ABENDU)
• Documentation procedure (DOC)
• Incorrect output procedure (INCORROUT)
• Loop procedure (LOOP)
• Message procedure (MSG)
• Performance procedure (PERFM)
• Wait procedure (WAIT)

18.4.5 Abnormal termination (ABEND or ABENDU) procedure

A malfunction in the system can cause an abnormal termination (ABEND). Abend categories are:
• User ABEND
• System ABEND

User abends originate in the application program. Abend codes are documented in Appendix A, “Abend Codes” of the Tivoli Workload Scheduler for z/OS V8R1 Diagnosis Guide and Reference, LY19-6410, and in Tivoli Workload Scheduler for z/OS V8R1 Messages and Codes, SH19-4548. A common user abend is 3999, as shown in Example 18-1.

Example 18-1 User abend 3999
Explanation: An internal validity checking has discovered an error condition (internal Tivoli OPC error). A message that contains the reason for the abend, as well as other debugging information, is written to the Tivoli OPC diagnostic file, EQQDUMP.
Problem determination: None.
System programmer response: Call your IBM representative.

In the system log, you may find that the Data Router task abended while processing a queue element, with a symptom dump like the one in Example 18-2.

Example 18-2 Symptom dump output
IEA995I SYMPTOM DUMP OUTPUT
USER COMPLETION CODE=3999
TIME=15.46.40 SEQ=00456 CPU=0000 ASID=0031
PSW AT TIME OF ERROR 078D1000 800618CE ILC 2 INTC 0D
ACTIVE LOAD MODULE ADDRESS=00054DF8 OFFSET=0000CAD6
NAME=EQQBEX
DATA AT PSW 000618C8 - 00181610 0A0D1812 0A0D47F0
GPR 0-3 80000000 80000F9F 00000F9F 000844E8
GPR 4-7 C5D8D8C2 C4C54040 C5E7C9E3 40404040
GPR 8-11 00000000 00000001 00000F9F 00061728
GPR 12-15 00000000 001DA4C0 800579D2 00000000
END OF SYMPTOM DUMP


In addition, Tivoli Workload Scheduler for z/OS writes diagnostic information in its EQQDUMP dataset:

EQQ0000T MODULE: EQQDXQPR, REASON: INVDEST

System abends can occur, for example, when a program instruction refers to a storage area that does not exist anymore.

18.4.6 The diagnostic file (EQQDUMP)

When Tivoli Workload Scheduler for z/OS internal validity checking discovers error conditions within the network communication function, debugging information is written to the diagnostic file (defined by ddname EQQDUMP). For serious error conditions, Tivoli Workload Scheduler for z/OS abends with user code 3999 as well. The diagnostic information consists of the message EQQ0000T, which gives the name of the module in error and the reason for the error in two 8-byte character strings. Tivoli Workload Scheduler for z/OS also writes a formatted version of the trace table to the diagnostic file. In most situations, Tivoli Workload Scheduler for z/OS will also snap the data that it considers to be in error.

18.4.7 Trace information

Tivoli Workload Scheduler for z/OS maintains an internal trace that makes it possible to see the order in which its modules were invoked before an abend. The trace wraps around, with an end mark after the last trace entry added. Each entry consists of two 8-byte character fields: the Module Name field and the Reason field. The end mark consists of a string of 16 asterisks (X'5C'). For most abnormal terminations, a trace table is written in the diagnostic file (EQQDUMP). These trace entries are intended to be used by Tivoli support when they are diagnosing Tivoli Workload Scheduler for z/OS problems. A trace entry with the reason PROLOG is added upon entry to the module. Similarly, an entry with EPILOG is added at the exit from the module. When trace entries are added for other reasons, the reason is provided in the Reason field.
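The entry layout described above (two 8-byte character fields, wraparound, and an end mark of 16 asterisks) can be sketched as follows. This is an illustration only, not a Tivoli tool: it assumes the trace table is available as raw EBCDIC bytes (code page 037), that unused slots contain binary zeros, and that the entries following the end mark are the oldest ones. The names make_entry and read_trace are hypothetical.

```python
ENTRY_LEN = 16   # one trace entry: Module Name (8 bytes) + Reason (8 bytes)
FIELD_LEN = 8
CODEC = "cp037"  # assumed EBCDIC code page

# Sixteen asterisks; each asterisk is X'5C' in EBCDIC.
END_MARK = ("*" * ENTRY_LEN).encode(CODEC)


def make_entry(module: str, reason: str) -> bytes:
    """Build one blank-padded 16-byte entry (used here only to create sample data)."""
    return (module.ljust(FIELD_LEN) + reason.ljust(FIELD_LEN)).encode(CODEC)


def read_trace(raw: bytes):
    """Return (module, reason) pairs, oldest first, from a wraparound trace table."""
    slots = [raw[i:i + ENTRY_LEN] for i in range(0, len(raw) - ENTRY_LEN + 1, ENTRY_LEN)]
    if END_MARK in slots:
        mark = slots.index(END_MARK)
        # Entries after the end mark were written before the table wrapped.
        slots = slots[mark + 1:] + slots[:mark]
    entries = []
    for slot in slots:
        if not slot.strip(b"\x00"):
            continue  # skip slots that were never used (assumed to be zeros)
        module = slot[:FIELD_LEN].decode(CODEC).rstrip()
        reason = slot[FIELD_LEN:].decode(CODEC).rstrip()
        entries.append((module, reason))
    return entries
```

For example, a table that holds an INVDEST entry, then the end mark, then an older PROLOG entry is returned with the PROLOG entry first, which matches the order in which the entries were added.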

18.4.8 System dump dataset

An abnormal end (abend) of major tasks may affect the entire Tivoli Workload Scheduler for z/OS PLEX (or SYSPLEX) and can jeopardize the whole production workload. The Recovery Termination Manager (RTM) of the operating system produces valuable information for diagnostic purposes. Therefore, it is extremely important to make sure that this information is kept in a dataset, called the system dump dataset, for further analysis.


The sample JCL procedure for a Tivoli Workload Scheduler for z/OS address space includes a SYSMDUMP DD statement, and a dump dataset is allocated by the EQQPCS02 JCL created by EQQJOBS. SYSMDUMP is the dump format preferred by the service organization. Ensure that the dump options for SYSMDUMP include RGN, LSQA, TRT, CSA, and GRSQ on systems where a Tivoli Workload Scheduler for z/OS address space will execute. To display the current SYSMDUMP options, issue the z/OS command DISPLAY DUMP,OPTIONS. You can use the CHNGDUMP command to alter the SYSMDUMP options. Note that this changes the parameters only until the next IPL.

Do not forget to insert a SYSMDUMP DD statement into the JCL of your PIF programs. It is very important to use the right disposition for the dump dataset, because you must be sure that the dump written to the dataset will not be overwritten by main task or recursive abends. Therefore, we recommend that you use DISP=MOD. The disadvantage of this disposition is that the dump dataset can fill up when multiple dumps are written, so make sure that you save the dumps and clear the dataset afterwards. The following is a SYSMDUMP example:

//SYSMDUMP DD DISP=MOD,DSN=OPC.V2R3M0.DMP

Note that coding SYSOUT=* on the SYSMDUMP DD statement destroys the internal format of the dump and renders it useless. When you experience an abend and find no dumps in your dump datasets, look at your dump analysis and elimination (DAE) setup. DAE can be used to suppress the creation of certain kinds of dumps. See the z/OS V1R3.0 MVS Initialization and Tuning Guide, SA22-7591, for more information.

18.4.9 LOOP procedure

If your problem type is LOOP, you should take the following steps:
• Use the Tivoli Workload Scheduler for z/OS message log or system console log to help you identify what happened just before the program loop occurred.
• Obtain a dump using the z/OS DUMP command. The internal system trace is very helpful for the Tivoli support representative when analyzing a loop. The default trace table is 64 KB for each processor, which may not be enough when the loop is widespread. We recommend increasing the trace table to 120 KB before obtaining the dump, with the following console command:

/Trace ST,120K

• To become familiar with obtaining a console dump, see 18.4.13, “Preparing a console dump” on page 829.
• Document instruction addresses from within the loop, if possible.
• Provide a description of the situation leading up to the problem.


18.4.10 Message (MSG) procedure

If your Tivoli Workload Scheduler for z/OS problem type is MSG, you should take the following steps:
• Look up the message in Tivoli Workload Scheduler for z/OS V8R1 Messages and Codes, SH19-4548, for an explanation. This manual includes information on what action Tivoli Workload Scheduler for z/OS takes and what action the operator should take in response to a message. If you plan to report the problem, gather the documentation before you take action.
• Copy the message identifier and the message text. The Tivoli Support Center representative needs the exact message text.
• Supplement the MSG keyword with the message identifier. You use the supplemented keyword in your keyword string when searching the software support database.

With OS/390 V2R5, Tivoli Workload Scheduler for z/OS introduced a new task, EQQTTOP, which handles the communication with the TCP/IP server that now runs on UNIX System Services (USS) in full function mode. EQQTTOP uses C coding in order to use the new C socket interface. New messages are implemented, some of them pointing to other z/OS manuals. For example:

EQQTT20E THE RECEIVE SOCKET CALL FAILED WITH ERROR CODE 1036

• Explanation: An error was encountered when the TCP/IP communication task attempted to issue a receive socket call to TCP/IP. The ERRNO value is the error code returned by the failing socket call (see Table 18-2).
• System action: Depending on the failing call, either the TCP/IP communication task is terminated or the specific socket connection is closed. Whenever possible, the task is automatically restarted. If the socket connection was closed, it is reestablished.
• System programmer response: Check the error code in the z/OS CS IP and SNA manual and take any possible corrective action. If the error reoccurs, save the message log (EQQMLOG) and contact your Tivoli representative.

To find the cause, look in Chapter 11, “SNMP pe_error messages”, of the z/OS V1R2 Communications Server: IP and SNA Codes, SC31-8791.

Table 18-2 Socket error codes

Error number   Message name      Error description
1036           EIBMNOACTIVETCP   TCP/IP is not active
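Looking up the error code from such a message can be sketched as below. This is a minimal illustration: the lookup table contains only the single code documented in Table 18-2, the message wording is assumed to end with "ERROR CODE" followed by the number as in the EQQTT20E example, and the name explain_socket_error is hypothetical.

```python
import re

# Only the code documented in Table 18-2; other codes must be looked up
# in the IP and SNA codes manual.
SOCKET_ERRORS = {
    1036: ("EIBMNOACTIVETCP", "TCP/IP is not active"),
}

_ERROR_CODE = re.compile(r"\bERROR CODE (\d+)\b")


def explain_socket_error(message: str):
    """Return (code, name, description) for a socket error message, or None."""
    match = _ERROR_CODE.search(message)
    if match is None:
        return None
    code = int(match.group(1))
    name, description = SOCKET_ERRORS.get(code, ("unknown", "see the IP and SNA codes manual"))
    return code, name, description


print(explain_socket_error("EQQTT20E THE RECEIVE SOCKET CALL FAILED WITH ERROR CODE 1036"))
```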

New modify commands in Tivoli Workload Scheduler for z/OS are a handy way to get important information very quickly. When you want to find out which Tivoli Workload Scheduler for z/OS task is active or inactive (other than by looking into MLOG for related messages), enter the command in SDSF shown in Example 18-3.

Example 18-3 Modify command
/F procname,status,subtask
/* where procname is the subsystem name of engine or agent */

This will show you the tasks shown in Example 18-4.

Example 18-4 Task display
F TWSC,STATUS,SUBTASK
EQQZ207I NORMAL MODE MGR   IS ACTIVE
EQQZ207I JOB SUBMIT TASK   IS ACTIVE
EQQZ207I DATA ROUTER TASK  IS ACTIVE
EQQZ207I TCP/IP TASK       IS ACTIVE
EQQZ207I EVENT MANAGER     IS ACTIVE
EQQZ207I GENERAL SERVICE   IS INACTIVE
EQQZ207I JT LOG ARCHIVER   IS ACTIVE
EQQZ207I EXTERNAL ROUTER   IS ACTIVE
EQQZ207I WS ANALYZER       IS ACTIVE

Example 18-4 shows that the general service task has an inactive status. To find more details, have a look into MLOG. The modify commands are described in the Tivoli Workload Scheduler for z/OS V8R1 Quick Reference, GH19-4541.
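When the task display is captured to a file or a log, the inactive subtasks can be picked out automatically. The following is a minimal sketch that assumes each status line has the form "EQQZ207I task name IS ACTIVE" or "EQQZ207I task name IS INACTIVE", as in Example 18-4; the function names are hypothetical.

```python
import re

_STATUS_LINE = re.compile(r"^EQQZ207I\s+(.+?)\s+IS\s+(ACTIVE|INACTIVE)\s*$")


def subtask_status(output: str) -> dict:
    """Map each subtask name in the command output to ACTIVE or INACTIVE."""
    status = {}
    for line in output.splitlines():
        match = _STATUS_LINE.match(line.strip())
        if match:
            status[match.group(1)] = match.group(2)
    return status


def inactive_subtasks(output: str) -> list:
    """Return the names of the subtasks reported as INACTIVE."""
    return [task for task, state in subtask_status(output).items() if state == "INACTIVE"]


SAMPLE = """\
F TWSC,STATUS,SUBTASK
EQQZ207I NORMAL MODE MGR   IS ACTIVE
EQQZ207I GENERAL SERVICE   IS INACTIVE
EQQZ207I WS ANALYZER       IS ACTIVE
"""

print(inactive_subtasks(SAMPLE))
```

Lines that do not carry the EQQZ207I identifier, such as the echoed command, are ignored.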

18.4.11 Performance (PERFM) procedure

If your problem concerns performance, you should:
• Document the actual performance, the expected performance, and the source of information for the expected performance. If a document is the source, note the order number and page number of the document.
• Document the information about your operating environment, such as the number of active initiators, the number of TSO users, and the number of Tivoli Workload Scheduler for z/OS users connected. Any user modifications to the program exits, REXX programs, and command lists can affect performance. You should consider whether the user-installed code, REXX programs, or CLISTs are contributing to the problem.
• Document any modifications to your system. Performance problems can be related to various system limitations. Your market division representative might be able to identify possible causes of a performance problem.


18.4.12 WAIT procedure

If your problem type is WAIT, you should take the following steps:
• Research the activity before system activity was suspended, identifying which operation is in the wait state.
• Specify any messages that were sent to the message log or to the system console.
• Obtain a dump using the z/OS DUMP command. Check whether the dump options include RGN and GRSQ.
• A wait state in the system is similar to a hang; however, the processing is suspended. Usually, it is recognized by the system not being able to submit jobs. A probable cause is that one task holds a resource while other tasks are waiting until the owning task releases the resource. Such resource contentions can happen frequently but are not serious if they are resolved in a short time. If you experience a long wait or hang, you can display any resource contention by entering the command shown in Example 18-5 in SDSF.

Example 18-5 Display resource contention
COMMAND INPUT ===> /D grs,c
ISG343I 23.23.19 GRS STATUS 043
S=SYSTEMS SYSZDRK OPCATURN2
SYSNAME   JOBNAME   ASID   TCBADDR    EXC/SHR     STATUS
MCEVS4    OPCA      003F   007DE070   EXCLUSIVE   OWN

In a contention such as this one, two tasks are trying to get exclusive access to (a lock on) one resource. Exclusive means that no other task can get the lock at the same time. An exclusive lock is usually an update access. The second task has to wait until the first, which is currently the owner, releases the resource. Message ISG343I returns two fields, called the major name and the minor name. In our example, SYSZDRK is the major name, and OPCATURN2 is the minor name. SYSZDRK represents the active current plan, while the first four characters of the minor name represent your Tivoli Workload Scheduler for z/OS subsystem name. With this information, you can search for known problems in the software database. If you find no hint, your Tivoli support representative may ask you for a console dump.
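Extracting the major name, minor name, and subsystem name from the resource line can be sketched as follows. This is only an illustration: it assumes the resource line consists of three blank-separated fields (scope, major name, minor name) as it appears in Example 18-5, and the function name split_resource is hypothetical.

```python
def split_resource(line: str) -> dict:
    """Split a resource line such as 'S=SYSTEMS SYSZDRK OPCATURN2' into its parts."""
    scope_field, major, minor = line.split()
    return {
        "scope": scope_field.partition("=")[2],
        "major": major,
        "minor": minor,
        # The first four characters of the minor name are the subsystem name.
        "subsystem": minor[:4],
    }


print(split_resource("S=SYSTEMS SYSZDRK OPCATURN2"))
```

The major name and the subsystem name obtained this way are the natural candidates for search terms when looking for known problems.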

18.4.13 Preparing a console dump

Like a system dump, the console dump contains a snapshot of virtual storage areas. The major difference is that a system dump is created by the operating system when an abnormal end happens, whereas the console dump has to be created by you via z/OS commands from the system console. The dump options are very important because they determine which parts of storage are dumped. For waits or hangs, the GRSQ option must be turned on. Example 18-6 shows the display of the current dump options.

Example 18-6 Display dump option
COMMAND INPUT ===>/d d,o
RESPONSE=MCEVS4
IEE857I 18.22.00 DUMP OPTION 371
SDUMP- ADD OPTIONS (ALLPSA,SQA,LSQA,RGN,LPA,TRT,CSA,SWA,SUMDUMP,
ALLNUC,Q=YES,GRSQ),BUFFERS=00000000K,
MAXSPACE=00001200M,MSGTIME=99999 MINUTES

SDUMP indicates the options for the SYSMDUMP, which is the preferred type of dump. The options shown are sufficient for almost every dump in Tivoli Workload Scheduler for z/OS. For a detailed explanation, refer to the z/OS system commands and options for SDUMP types. If one of these options is missing, you can change it with the change dump command (CD). For GRSQ, as an example:

CD SET,SDUMP=(GRSQ)
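Checking a captured options display for missing options can be sketched as below. This is a minimal illustration that assumes the display has the layout shown in Example 18-6 (an "SDUMP- ADD OPTIONS" line followed by a parenthesized, comma-separated list that may wrap across lines). The required set defaults to RGN and GRSQ, the two options named in the WAIT procedure; the function names are hypothetical.

```python
import re

# Options the WAIT procedure asks you to check before taking a console dump.
REQUIRED = ("RGN", "GRSQ")

_SDUMP_OPTIONS = re.compile(r"SDUMP-\s*ADD OPTIONS\s*\(([^)]*)\)")


def sdump_options(display_output: str) -> set:
    """Return the set of SDUMP options listed in the display output."""
    match = _SDUMP_OPTIONS.search(display_output)
    if match is None:
        return set()
    return {opt.strip() for opt in match.group(1).split(",") if opt.strip()}


def missing_options(display_output: str, required=REQUIRED) -> set:
    """Return the required options that are not present in the display output."""
    return set(required) - sdump_options(display_output)


SAMPLE = """\
IEE857I 18.22.00 DUMP OPTION 371
SDUMP- ADD OPTIONS (ALLPSA,SQA,LSQA,RGN,LPA,TRT,CSA,SWA,SUMDUMP,
ALLNUC,Q=YES,GRSQ),BUFFERS=00000000K,
MAXSPACE=00001200M,MSGTIME=99999 MINUTES
"""

print(missing_options(SAMPLE))
```

Any option reported as missing would then be added with the change dump command before the dump is taken.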

You need to be sure that the dump datasets, which have been provided by the z/OS installation, are free to be taken. Run the command in Example 18-7 to see the dump datasets.

Example 18-7 Display dump datasets
COMMAND INPUT ===>/d d,t
RESPONSE=MCEVS4
IEE853I 18.43.40 SYS1.DUMP TITLES 385
SYS1.DUMP DATA SETS AVAILABLE=003 AND FULL=000
CAPTURED DUMPS=0000, SPACE USED=00000000M, SPACE FREE=00001200M

Example 18-7 shows that all three dump datasets can be used for console dumps. If none is free, you can clear one after making sure that nobody needs it anymore. To clear a dump dataset, issue the following command:

//dd clear,dsn=00

In this case, SYS1.DUMP00 becomes eligible for a new dump.
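The availability check can also be done on captured output. The following is a minimal sketch that assumes the display contains the "AVAILABLE=nnn AND FULL=nnn" text shown in Example 18-7; the function name dump_dataset_counts is hypothetical.

```python
import re

_COUNTS = re.compile(r"AVAILABLE=(\d+)\s+AND\s+FULL=(\d+)")


def dump_dataset_counts(display_output: str):
    """Return (available, full) dump dataset counts, or None if not found."""
    match = _COUNTS.search(display_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


SAMPLE = """\
IEE853I 18.43.40 SYS1.DUMP TITLES 385
SYS1.DUMP DATA SETS AVAILABLE=003 AND FULL=000
CAPTURED DUMPS=0000, SPACE USED=00000000M, SPACE FREE=00001200M
"""

available, full = dump_dataset_counts(SAMPLE)
print(f"{available} dump dataset(s) available, {full} full")
```

An available count of zero is the situation in which a dump dataset has to be cleared before the console dump is taken.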


18.4.14 Dump the failing system

Now, you are ready to obtain the console dump for further analysis. Run the dump command in Example 18-8.

Example 18-8 Dump command
COMMAND INPUT ===> dump comm=(demo)
19:13:27.27 SFRA4 00000290 DUMP COMM=(DEMO)
19:13:27.30 SFRA4 00000090 *17 IEE094D SPECIFY OPERAND For DUMP COMMAND

Enter the outstanding reply number, 17, as in Example 18-9.

Example 18-9 Dump the address space
COMMAND INPUT ===>/17,tsoname=(opca)
SFRA4 00000290 R 17,TSONAME=(OPCA)
SFRA4 00000090 IEE600I REPLY TO 17 IS;TSONAME=(OPCA)
      00000090 IEA794I SVC DUMP HAS CAPTURED: 482
      00000090 DUMPID=049 REQUESTED BY JOB (*MASTER*)
      00000090 DUMP TITLE=DEMO
      00000290 IEF196I IGD100I 40AA ALLOCATED TO DDNAME SYS00273
      00000290 IEF196I IEF285I SYS1.DUMP01
      00000290 IEF196I IEF285I VOL SER NOS= O260C1.
      00000090 IEA611I COMPLETE DUMP ON SYS1.DUMP01

Do not be confused about the TSONAME parameter. It specifies the name of an address space to be dumped. Alternatively, you can use ASID(hex) as well. Dump processing finished successfully as indicated by the message IEA611I. Please verify the existence of this message when you provide the dump to your local Tivoli support.
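Verifying that message IEA611I is present can be sketched as follows. This is a minimal illustration that assumes the message text has the form "IEA611I COMPLETE DUMP ON dataset" as in Example 18-9; the function name completed_dump_dataset is hypothetical.

```python
import re

_COMPLETE = re.compile(r"\bIEA611I COMPLETE DUMP ON (\S+)")


def completed_dump_dataset(log_text: str):
    """Return the dataset named in message IEA611I, or None if the message is absent."""
    match = _COMPLETE.search(log_text)
    return match.group(1) if match else None


SAMPLE = """\
00000090 IEA794I SVC DUMP HAS CAPTURED: 482
00000090 DUMP TITLE=DEMO
00000090 IEA611I COMPLETE DUMP ON SYS1.DUMP01
"""

print(completed_dump_dataset(SAMPLE))
```

A result of None means only that this particular message was not found in the text supplied; the system log should then be examined directly.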

18.4.15 Information needed for all problems

Even when you are unable to identify a problem type, you should gather the following information for any problem you have. Begin your initial problem analysis by examining the contents of the message log dataset with the following steps:
1. Obtain a copy of the Tivoli Workload Scheduler for z/OS message log. This is a sequential dataset defined by the EQQMLOG ddname.
2. Record the Tivoli Workload Scheduler for z/OS component ID: 5697-WSZ01. The component ID should be the first keyword in the string, preceding the problem type and other modifier keywords.
3. Record the maintenance level for all operating environments, particularly those for z/OS, JES, ISPF, and RACF.
4. Document any additional program temporary fixes (PTFs) or APARs that have been applied to your level of Tivoli Workload Scheduler for z/OS.
5. If the problem is within the network communication function, obtain copies of the EQQDUMP file.
6. Obtain copies of the Tivoli Workload Scheduler for z/OS diagnostic files defined to the user address space and to the subsystem address space by SYSMDUMP.
7. Obtain a copy of the system log.
8. Reconstruct the sequence of events leading to the problem. Include any commands entered just before the problem occurred. Write down the exact events that led to the problem:
– What was the first indication of the problem?
– What were you trying to do?
– What should have happened?
– What did happen?
– Can you recreate the problem?
9. Specify any unique information about the problem or about your system:
– Indicate any other applications that were running when the problem occurred.
– Describe how Tivoli Workload Scheduler for z/OS was started.
– Describe all user modifications to active Tivoli Workload Scheduler for z/OS programs.

If more information is needed, a Tivoli Support Center representative will guide you concerning any additional diagnostic traces that you can run.

18.4.16 Performing problem determination for tracking events

Successful tracking of jobs in Tivoli Workload Scheduler for z/OS relies on the creation of different events that are written in the agent address space and processed by the engine. The engine waits for the complete arrival of these events and updates the current plan accordingly. The different tracking events are listed in Table 18-3 on page 833.


Table 18-3 Tracking events

Event number   Event name              Meaning
1              Reader                  A job has entered the system.
2              Start event             A job has started to execute.
3S             Step end event          A step has finished execution.
3J             Job end event           A job has finished execution.
3P             Job termination event   A job has been added to the output queue.
4              Print event             An output group has been printed.
5              Purge event             A job output has been purged from JES spool.

The events are prefixed with either A (for JES2) or B (for JES3). At least the set of type 1, 2, 3J, and 3P events is needed to correctly track the several stages of a job’s life.

The creation of step-end events (3S) depends on the value you specify in the STEPEVENTS keyword of the EWTROPTS statement. The default is to create a step-end event only for abending steps in a job or started task. The creation of print events depends on the value you specify in the PRINTEVENTS keyword of the EWTROPTS statement. By default, print events are created.

If you find that the current plan status of a job does not reflect the status in JES, you may have missing events. A good starting point is to run the Tivoli Workload Scheduler for z/OS AUDIT package for the affected occurrence to see which events were processed by the engine and which are missing. Alternatively, you can browse your event datasets for the job name and job number to determine which events were not written.

Problem determination depends on which event is missing and whether the events are created on a JES2 or JES3 system. In Table 18-4 on page 834, the first column refers to the event type that is missing, and the second column tells you what action to perform. The first entry in the table applies when all event types are missing (when the event dataset does not contain any tracking events).


Table 18-4 Problem determination of tracking events

Type      Problem determination actions

ALL       1. In the EQQMLOG dataset, verify that the event writer has started successfully.
          2. Verify that the definition of the EQQEVDS ddname in the Tivoli Workload Scheduler for z/OS started-task procedure is correct, that is, the events are written to the correct dataset.
          3. Verify that the required exits have been installed.
          4. Verify that the IEFSSNnn member of SYS1.PARMLIB has been updated correctly and that an IPL of the MVS system has been performed since the update.

A1        If both A3P and A5 events are also missing:
          1. Verify that the Tivoli OPC version of the JES2 exit 7 routine has been correctly installed. Use the $T EXIT(7) JES command.
          2. Verify that the JES2 initialization dataset contains a LOAD statement and an EXIT7 statement for the Tivoli OPC version of JES2 exit 7 (OPCAXIT7).
          3. Verify that the exit has been added to a load module library reachable by JES2 and that JES2 has been restarted since this was done.
          If either A3P or A5 events are present in the event dataset, call a Tivoli service representative for programming assistance.

B1        1. Verify that the Tivoli Workload Scheduler for z/OS version of the JES3 exit IATUX29 routine has been correctly installed.
          2. Verify that the exit has been added to a load-module library that JES3 can access.
          3. Verify that JES3 has been restarted.

A2/B2     1. Verify that the job for which no type 2 event was created has started to execute. A type 2 event will not be created for a job that is flushed from the system because of JCL errors.
          2. Verify that the IEFUJI exit has been correctly installed:
             a. Verify that the System Management Facility (SMF) parameter member SMFPRMnn in the SYS1.PARMLIB dataset specifies that the IEFUJI exit should be called.
             b. Verify that the IEFUJI exit has not been disabled by an operator command.
             c. Verify that the correct version of IEFUJI is active. If SYS1.PARMLIB defines LPALIB as a concatenation of several libraries, z/OS uses the first IEFUJI module found.
             d. Verify that the library containing this module was updated by the Tivoli OPC version of IEFUJI and that z/OS has been IPLd since the change was made.

A3S/B3S   If type 3J events are also missing:
          1. Verify that the IEFACTRT exit has been correctly installed.
          2. Verify that the SMF parameter member SMFPRMnn in the SYS1.PARMLIB dataset specifies that the IEFACTRT exit should be called.
          3. Verify that the IEFACTRT exit has not been disabled by an operator command.
          4. Verify that the correct version of IEFACTRT is active. If SYS1.PARMLIB defines LPALIB as a concatenation of several libraries, z/OS uses the first IEFACTRT module found.
          5. Verify that this library was updated by the Tivoli Workload Scheduler for z/OS version of IEFACTRT and that z/OS has been IPLd since the change was made.
          If type 3J events are not missing, verify in the EQQMLOG dataset that the event writer has been requested to generate step-end events. Step-end events are only created if the EWTROPTS statement specifies STEPEVENTS(ALL) or STEPEVENTS(NZERO) or if the job step abended.

A3J/B3J   If type 3S events are also missing, follow the procedures described for type 3S events. If type 3S events are not missing, call a Tivoli service representative for programming assistance.

A3P       If A1 events are also missing, follow the procedures described for A1 events. If A1 events are not missing, call a Tivoli service representative for programming assistance.

B3P       1. Verify that the Tivoli Workload Scheduler for z/OS version of the JES3 exit IATUX19 routine has been correctly installed.
          2. Verify that the exit has been added to a load-module library that JES3 can access.
          3. Verify that JES3 has been restarted.

A4/B4     1. If you have specified PRINTEVENTS(NO) on the EWTROPTS initialization statement, no type 4 events are created.
          2. Verify that JES has printed the job for which no type 4 event was created. Type 4 events will not be created for a job that creates only held SYSOUT datasets.
          3. Verify that the IEFU83 exit has been correctly installed:
             a. Verify that the SMF parameter member SMFPRMnn in the SYS1.PARMLIB dataset specifies that the IEFU83 exit should be called.
             b. Verify that the IEFU83 exit has not been disabled by an operator command.
             c. Verify that the correct version of IEFU83 is active. If SYS1.PARMLIB defines LPALIB as a concatenation of several libraries, z/OS uses the first IEFU83 module found.
             d. Verify that the library containing this module was updated by the Tivoli Workload Scheduler for z/OS version of IEFU83 and that MVS has been IPLd since the change was made.
             e. For JES2 users (A4 event), ensure that you have not specified TYPE6=NO on the JOBCLASS and STCCLASS statements of the JES2 initialization parameters.

A5        1. Verify that JES2 has purged the job for which no A5 event was created.
          2. Ensure that you have not specified TYPE26=NO on the JOBCLASS and STCCLASS statements of the JES2 initialization parameters.
          3. If A1 events are also missing, follow the procedures described for A1 events.
          4. If A1 events are not missing, call a Tivoli service representative for programming assistance.

B5        1. Verify that JES3 has purged the job for which no B5 event was created.
          2. If B4 events are also missing, follow the procedures described for B4 events.
          3. If B4 events are not missing, call a Tivoli service representative for programming assistance.
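The first step in using Table 18-4, working out which event types are missing for a job, can be sketched in a few lines of Python. This is only an illustration: the function name and the idea of passing in a plain list of event-type strings (such as "A1" or "A3J") that you noted while browsing the event dataset are assumptions, not a product interface.

```python
# Event types in the order a job normally produces them (Table 18-3).
EVENT_ORDER = ("1", "2", "3S", "3J", "3P", "4", "5")
# Minimum set needed to track a job, as stated in this section.
REQUIRED = ("1", "2", "3J", "3P")


def missing_events(observed, jes_prefix, also_expect=()):
    """Return the expected event types that were not observed.

    observed    -- event-type strings noted for one job, e.g. ["A1", "A2"]
    jes_prefix  -- "A" for JES2 or "B" for JES3
    also_expect -- optional extra types (e.g. "3S", "4", "5") that your
                   STEPEVENTS/PRINTEVENTS settings lead you to expect
    """
    if jes_prefix not in ("A", "B"):
        raise ValueError("jes_prefix must be 'A' (JES2) or 'B' (JES3)")
    expected = set(REQUIRED) | set(also_expect)
    seen = {event[1:] for event in observed if event.startswith(jes_prefix)}
    return [jes_prefix + t for t in EVENT_ORDER if t in expected and t not in seen]


if __name__ == "__main__":
    print(missing_events(["A1", "A2", "A3J"], "A"))
```

Each entry in the returned list corresponds to a row of Table 18-4. An empty event dataset yields all required types, which matches the ALL row.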

18.5 Troubleshooting end-to-end solution

This section describes troubleshooting possibilities for the end-to-end components. It helps you determine in which parts of the product to look for useful diagnostic information and how to solve common error scenarios.

18.5.1 End-to-end working directory

The working directory is important for troubleshooting because it contains, for example, the messages produced by several end-to-end processes and tracking events from the distributed environment. To look at the files in the directory, you need a UNIX System Services UID and the proper authorization. There are several ways to navigate through the directory:

• Using the ishell
  – This kind of dialog uses ISPF services and is useful if you are not familiar with the native shell environment.
  – The ishell can be run from the TSO command processor by entering ish on the command line.
• Using the native shell
  – The native shell is similar to the UNIX shell, except that not all commands are supported.
  – To use the native shell, enter omvs on the TSO command processor command line.

We recommend using the ishell, because it illustrates the directory layout better. Run ish to list the contents of your working directory. The directory is the same as the one you defined in the wrkdir parameter of the topology member (in our installation, /tws/twsctpwrk). Example 18-10 shows the directory.

Example 18-10 Listing the work directory

  Directory List
  Select one or more files with / or action codes.
  EUID=0 /tws/
    Type Filename
  _ Dir  .
  _ Dir  ..
  l Dir  twsctpwrk

The list of files shown in Example 18-11 appears.

Example 18-11 Working directory layout

    Type Filename
  _ Dir  .
  _ Dir  ..
  _ File Intercom.msg
  _ File localopts
  _ File Mailbox.msg
  _ Dir  mozart
  _ File NetConf
  _ File NetReq.msg
  _ Dir  pobox
  _ File Sinfold
  _ File Sinfonia
  _ Dir  stdlist
  _ File Symold
  _ File Symphony
  _ File Translator.chk
  _ File Translator.wjl

We will explain each file in more detail in Table 18-5 on page 841.


Table 18-5 Files and directory structure of UNIX System Services

File name           Explanation
Intercom.msg        Inter-process communication messages between batchman and mailman process
localopts           Customization-related parameter applying for this workstation
Mailbox.msg         Messages from other distributed workstations
Mozart directory    Contains database objects
NetConf             Contains network tuning options for distributed workstations
NetReq.msg          Message file read by the netman
Pobox directory     Message queue files for inter-workstation communication
Sinfold             Old production plan file (Symphony)
Sinfonia            Copy of the Symphony file
Stdlist directory   Contains batchman, mailman, writer, netman logs, and translator traces
Symold              Copy of the Sinfold
Symphony            Symphony file created by Tivoli Workload Scheduler for z/OS
Translator.chk      Translator files
Translator.wjl      Translator files

Recommendation: Please do not modify or manipulate these files without contacting your Tivoli support.

18.5.2 The standard list directory

The standard list (stdlist) directory (Example 18-12 on page 842) contains important files and directories used to find error messages related to end-to-end processing. It is subdivided into several directories whose names correspond to the dates on which the messages were issued. At midnight, the netman process creates a new directory and switches to it.


Example 18-12 Stdlist directory

    Dir  ..
  _ File stderr
  _ File stdout
  _ Dir  2002.02.21
  _ Dir  2002.02.22
  _ Dir  2002.02.23
  l Dir  2002.02.25

If you list the directory for a specific date, you will see three files, as listed in Example 18-13.

Example 18-13 Stdlist files

    Type Filename
  _ Dir  .
  _ Dir  ..
  _ File NETMAN
  _ File STC
  _ File TRANSLATOR

The NETMAN file holds all messages related to the netman process, while batchman, writer, and mailman write to the STC file. The name of this file can vary with the type of installation. The STC file plays the same role as the file named after the Tivoli Workload Scheduler user ID in the Tivoli Workload Scheduler stdlist directory. The TRANSLATOR file is used by the translator process. Example 18-14 shows sample output from the STC file mentioned in Example 18-13.

Example 18-14 Batchman messages

  BATCHMAN:01:20/Received Bl:
  BATCHMAN:01:20/OPCMASTER#B73EA5E0519ECF25.J_010_F202DWTESTSTREAM #J1086
  BATCHMAN:01:20/Jobman streamed
  BATCHMAN:01:20/OPCMASTER#B73EA5E0519ECF25.J_010_F202DWTESTSTREAM (#J1086)
  BATCHMAN:01:20/AWS22010075I Changing schedule B73EA5E0519ECF25 status to
  BATCHMAN:01:20/EXEC
  BATCHMAN:01:20/Received Us: F202
  BATCHMAN:01:20/+ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  BATCHMAN:01:20/+ AWS22010001E Unable to stream job
  BATCHMAN:01:20/+ OPCMASTER#B73EA5E0519ECF25.J_010_F202DWTESTSTREAM in file
  BATCHMAN:01:20/+ DIR: Error launching Invalid argument:
  BATCHMAN:01:20/+ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
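Because the stdlist subdirectories are named after the date, the paths of the log files for a given day can be derived mechanically. The following Python sketch builds them for the layout shown in Examples 18-12 and 18-13. The function name is hypothetical, and the default file name STC is taken from our installation and will differ elsewhere.

```python
from datetime import date
from pathlib import PurePosixPath


def stdlist_files(workdir, day, process_file="STC"):
    """Return the expected stdlist file paths for one day.

    workdir      -- end-to-end working directory (wrkdir of the topology member)
    day          -- a datetime.date; the directory name uses the form YYYY.MM.DD
    process_file -- name of the batchman/writer/mailman file (installation dependent)
    """
    day_dir = PurePosixPath(workdir) / "stdlist" / day.strftime("%Y.%m.%d")
    names = ("NETMAN", process_file, "TRANSLATOR")
    return {name: str(day_dir / name) for name in names}


if __name__ == "__main__":
    for name, path in stdlist_files("/tws/twsctpwrk", date(2002, 2, 25)).items():
        print(name, path)
```

The paths can then be browsed from the ishell or the native shell. The sketch only builds the names and does not check that the files exist.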


18.5.3 The standard list messages

The messages you find within the files are all prefixed with AWS and an 8-digit number, ending with either I for informational messages or E for error messages. Error messages are prefixed with a plus sign (+) as well. The process that issues the message is shown in the first column. Tivoli Workload Scheduler 8.1 has a new manual, Tivoli Workload Scheduler 8.1 Error Messages, SH19-4557, that gives you a detailed explanation of the messages and the corresponding operator responses, as shown in Example 18-15.

Example 18-15 Message explanation

  22010069E Schedule (schedule name) is stuck, operator intervention required

  Explanation: A schedule is in the STUCK state. Schedules will go stuck for different reasons:
  When jobs in the schedule cannot launch because they depend on a job that has abended. This is the most common reason.
  When operator intervention is required for a reply to a prompt on one of the jobs in the schedule.
  If a job on which another depends cannot launch because its priority is zero.
  Schedule Name is the name of the schedule that is stuck.

  Operator Response: Use internal policies to determine how to handle the abended job. User can either cancel the job to satisfy the other job dependencies or rerun it again successfully. For prompts, replying to the active prompt will cause a change in status. For priorities, altering the value to any number greater than 0 will cause a change in state as well.

You can use the message number as a search argument in the knowledge database, which can be accessed from the following site: http://www.tivoli.com/support

You might need to investigate the batch job output and the end-to-end server log in Tivoli Workload Scheduler for z/OS to get a complete picture of the error you are facing. The controller message logs and batch output contain information about the creation of the Symphony file and its switch. The server log includes Starter and Translator log information.
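The message layout described in this section (issuing process in the first column, an optional plus sign for errors, and an AWS number ending in I or E) is regular enough to be picked apart with a regular expression. The following Python sketch does this for lines in the form shown in Example 18-14. The function and field names are hypothetical, and the two-part value after the process name is kept as an opaque stamp because this section does not define it.

```python
import re

# PROCESS:nn:nn/ followed by an optional "+ " marker and the message text.
LINE = re.compile(r"^(?P<process>[A-Z]+):(?P<stamp>\d{2}:\d{2})/(?P<plus>\+ ?)?(?P<text>.*)$")
# AWS, eight digits, then I (informational) or E (error).
MSG_ID = re.compile(r"\bAWS(?P<number>\d{8})(?P<severity>[IE])\b")


def parse_stdlist_line(line):
    """Split one stdlist line into its parts; return None if it does not match."""
    match = LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    record = {
        "process": match.group("process"),
        "stamp": match.group("stamp"),
        "flagged": match.group("plus") is not None,  # True for "+" lines
        "text": match.group("text"),
        "number": None,
        "severity": None,
    }
    msg = MSG_ID.search(record["text"])
    if msg:
        record["number"] = msg.group("number")
        record["severity"] = msg.group("severity")
    return record


if __name__ == "__main__":
    print(parse_stdlist_line("BATCHMAN:01:20/+ AWS22010001E Unable to stream job"))
```

The extracted number (for example, 22010001) is the value to look up in the error messages manual or to use as a search argument in the knowledge database.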

18.5.4 Diagnose and fix problems with unlinked workstations

If the workstation is not linked as it should be, the cause of the problem could be that the writer process has not been initiated correctly, or that the run number for the Symphony file on the fault-tolerant workstation is not the same as the run number on the Tivoli Workload Scheduler master. If you select the unlinked workstations and right-click, you get a pop-up menu, as shown in Figure 18-11. Then click Link to try to link the workstation.

Figure 18-11 Link the workstation

You can check the Symphony run number and Symphony status in the legacy ISPF using option 6.6 (see Figure 18-12 on page 845).


  Command ===>

  Current plan created              : 02/02/28 11.02
  Planning period end               : 02/03/01 13.30

  Backup information:
    Last CP backup                  : 02/02/28 11.45
    First logged event after backup : 02/02/28 11.46

  Daily planning status:
    Under production                : No
    NCP ready                       : No

  Symphony status:
    Symphony run number             : 118
    Under production                : No
    New Symphony ready              : No

  In use ddname of:
    Current plan                    : EQQCP2DS

  Time stamp: 0102059F 16460266

Figure 18-12 Displaying the Symphony run number

If the workstation is Not Available/Offline, the reason could be that the mailman, batchman, and jobman processes are not started on the fault-tolerant workstation. You can right-click the workstation to get the pop-up menu, and then click Set Status.... This gives you a new panel where you can try to activate the workstation by clicking the Active radio button (Figure 18-13 on page 846). This action tries to start the mailman, batchman, and jobman processes on the fault-tolerant workstation (by issuing a conman start command on the agent).


Figure 18-13 Setting status to active

If you still encounter link problems, you may have to take a closer look at the following definitions:

• Check whether the TPLGYPRM parameter in the servopts of the end-to-end server and in the batchopt point to the same topology member.
• Verify the host name and port number definitions of the topology member.
  – Ensure that the port numbers are equal within the entire end-to-end environment.
  – Check your DOMREC and CPUREC definitions, especially the cpunode and cputcpip keywords.
• If you modified the member, make sure that you run either a replan, a daily plan extend, or a Symphony renew job.
• Investigate the end-to-end server logs to see whether the Symphony file has been successfully created and switched.
• Use the netstat command from the TSO command processor to check whether the connection between the end-to-end server and the distributed domain manager is established.
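Two of the checks in this section, the Symphony run number and the port numbers, come down to comparing values that you have already collected by hand. The following Python sketch records that comparison. It assumes hypothetical names throughout (`link_findings` and its parameters are not part of the product) and that you read the master values from ISPF option 6.6 and the topology member, and the workstation values on the fault-tolerant workstation itself.

```python
def link_findings(master_run_number, ws_run_number, master_port, ws_port):
    """Compare hand-collected values and return a list of findings.

    An empty list means that neither of the two checks explains the
    unlinked workstation, so the remaining checks in this section apply.
    """
    findings = []
    if ws_run_number != master_run_number:
        findings.append(
            "Symphony run number differs: master %s, workstation %s"
            % (master_run_number, ws_run_number)
        )
    if ws_port != master_port:
        findings.append(
            "Port number differs: master %s, workstation %s"
            % (master_port, ws_port)
        )
    return findings


if __name__ == "__main__":
    # The run number 118 is the value shown in Figure 18-12; the other
    # values are made up for illustration.
    for finding in link_findings(118, 117, 31111, 31111):
        print(finding)
```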


18.5.5 Symphony renew option

In normal situations, the Symphony file is automatically generated during daily plan processing. Some examples of error situations in which it is not generated include the following:

• There is a non-valid job definition in the script library.
• The workstation definitions are incorrect.
• An incorrect Windows user name or password is specified.

During regular operation, you might also need to renew the Symphony file, such as when:

• You make changes to the script library or to the definitions of the TOPOLOGY statement.
• You add or change information in the current plan, such as workstation definitions.

If a problem occurs during the building of the Symphony file, the Symphony file will not be built. To create the Symphony file, you must perform a Symphony renew after correcting the errors. Check the following logs to see whether the Symphony file has been created successfully:

• The return code of the batch job may be the first place to look. The messages produced by the job should include the messages shown in Example 18-16.

  Example 18-16 Symphony creation message in batch output

  EQQ3101I 0000048 JOBS ADDED TO THE SYMPHONY FILE FROM THE CURRENT Plan
  EQQ3087I THE SYMPHONY FILE HAS BEEN SUCCESSFULLY CREATED

  Note: In a certain situation, when the scriptlib contains syntax errors indicated by the message EQQZ086E, the Symphony renew job ended with return code 0. We have already addressed this issue.

• The end-to-end server log must contain the messages stating that the input translator finished waiting for batchman (see Example 18-17).

  Example 18-17 Creation messages in end-to-end server log

  EQQPT30I Starting switching Symphony
  EQQPT22I Input Translator thread stopped until new Symphony will be available
  EQQPT31I Symphony successfully switched
  EQQPT20I Input Translator waiting for Batchman is started
  EQQPT21I Input Translator finished waiting for Batchman
  EQQPT23I Input Translator thread is running

• The log of the Tivoli Workload Scheduler for z/OS engine should contain the messages shown in Example 18-18.

  Example 18-18 Creation message in the z/OS engine log

  EQQN111I A NEW SYMPHONY FILE HAS BEEN CREATED
  EQQW090I THE NEW SYMPHONY FILE HAS BEEN SUCCESSFULLY SWITCHED

If the Symphony file is not created successfully, you need to investigate the logs for any error messages. Look in the Tivoli Workload Scheduler for z/OS V8R1 Messages and Codes, SH19-4548, for an explanation and system programmer response, or contact your Tivoli customer support.

Note: Recovering the current plan from an error situation may also imply recovering the Symphony file. If the Symphony file is not up-to-date with the Current Plan, submit the Symphony renew or the daily plan batch job.

See also the Disaster recovery planning chapter in Tivoli Workload Scheduler for z/OS V8R1 Customization and Tuning, SH19-4544.
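The log checks for Symphony creation described in this section amount to looking for a handful of message identifiers in three places. The following Python sketch does this for log text that you have already copied out of the batch output, the end-to-end server log, and the engine log. The function name, the dictionary keys, and the choice of which identifiers count as success markers are our assumptions, based on Examples 18-16 through 18-18.

```python
# Message identifiers treated as success markers, per log source.
EXPECTED = {
    "batch output": ("EQQ3087I",),
    "end-to-end server log": ("EQQPT31I", "EQQPT21I"),
    "engine log": ("EQQN111I", "EQQW090I"),
}


def missing_messages(logs):
    """Return {source: [identifiers not found]} for the given log texts.

    logs -- dict mapping the source names used in EXPECTED to log text;
            a source that is absent is treated as an empty log.
    """
    missing = {}
    for source, identifiers in EXPECTED.items():
        text = logs.get(source, "")
        absent = [ident for ident in identifiers if ident not in text]
        if absent:
            missing[source] = absent
    return missing


if __name__ == "__main__":
    sample = {
        "batch output": "EQQ3087I THE SYMPHONY FILE HAS BEEN SUCCESSFULLY CREATED",
        "end-to-end server log": "EQQPT31I Symphony successfully switched",
    }
    print(missing_messages(sample))
```

An empty result suggests that the Symphony file was created and switched. Any entry that remains points to the log in which to look for error messages.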

18.5.6 UNIX System Services diagnostics

UNIX System Services provides powerful display commands. They can help you verify whether the required processes are still running, and they return valuable information about the HFS dataset.

Example 1: To display z/OS UNIX System Services process information for all z/OS UNIX System Services address spaces owned by our end-to-end server user TWSRES1, enter in SDSF:

/DISPLAY OMVS,U=TWSRES1

The output is shown in Example 18-19.

Example 18-19 End-to-end process

  BPXO040I 13.14.12 DISPLAY OMVS 009
  OMVS     000F ACTIVE          OMVS=(3A)
  USER     JOBNAME  ASID        PID       PPID STATE   START     CT_SECS
  TWSRES1  TWSCJSC  0050      65614          1 1FI---  10.16.33     1.93
    LATCHWAITPID=         0 CMD=EQQPHTOP
  TWSRES1  TWSCTP   004F      65615          1 MW----  10.16.33   102.46
    LATCHWAITPID=         0 CMD=EQQPHTOP
  TWSRES1  TWSCTP   004F      65616      65615 1S----  10.16.33   102.46
    LATCHWAITPID=         0 CMD=/usr/lpp/TWS/TWS810/bin/starter /usr/lpp
  TWSRES1  TWSCTP   004F      65617      65616 HS----  10.16.33   102.46
    LATCHWAITPID=         0 CMD=/usr/lpp/TWS/TWS810/bin/translator SYSZD
  TWSRES1  TWSCTP   004F      65618      65616 1F----  10.16.34   102.46
    LATCHWAITPID=         0 CMD=/usr/lpp/TWS/TWS810/bin/netman -port 312
  TWSRES1  TWSC     004E   50397284          1 1RI---  13.54.26    28.82
    LATCHWAITPID=         0 CMD=EQQTTTOP
  TWSRES1  TWSCTP   004F   33620089      65618 1F----  12.34.18   102.46
    LATCHWAITPID=         0 CMD=EQQTTTOP
  TWSRES1  TWSCTP   004F   33620089      65618 1F----  12.34.18   102.46
    LATCHWAITPID=         0 CMD=/usr/lpp/TWS/TWS810/bin/writer -- 2001
  TWSRES1  TWSCTP   004F   50397311      65618 1F----  12.34.11   102.46
    LATCHWAITPID=         0 CMD=/usr/lpp/TWS/TWS810/bin/mailman -parm 32
  TWSRES1  TWSRES1  001F   33620135          1 MRI---  12.00.09     1.83
    LATCHWAITPID=         0 CMD=EXEC
  TWSRES1  TWSRES1  001F   67174569   33620135 1CI---  12.01.40     1.83
    LATCHWAITPID=         0 CMD=sh -L
  TWSRES1  TWSCTP   004F   33620140   50397311 1F----  12.34.12   102.46
    LATCHWAITPID=         0 CMD=/usr/lpp/TWS/TWS810/bin/batchman -parm

The display output shows that job name TWSCTP (our end-to-end server started task), running in address space ID x’4F’, started several end-to-end processes, such as translator, netman, writer, mailman, and batchman, at a specific time. Every process has a process ID (PID) assigned. Looking at the parent process ID (PPID), you can see that mailman and writer have been spawned by the netman process.

Example 2: To display detailed file system information on currently mounted files, enter:

/DISPLAY OMVS,FILE

The output is as shown in Example 18-20.

Example 18-20 File system status

  BPXO045I 13.39.04 DISPLAY OMVS 234
  OMVS     000F ACTIVE          OMVS=(3A)
  TYPENAME   DEVICE ----------STATUS----------- MODE
  AUTOMNT        27 ACTIVE                      RDWR
    NAME=*AMD/u
    PATH=/u
    OWNER=SC65     AUTOMOVE=Y  CLIENT=N
  TFS           238 ACTIVE                      RDWR
    NAME=/SC63/TMP
    PATH=/SC63/tmp
    MOUNT PARM=-s 500
    OWNER=SC63     AUTOMOVE=N  CLIENT=N
  TFS           223 ACTIVE                      RDWR
    NAME=/SC65/TMP
    PATH=/SC65/tmp
    MOUNT PARM=-s 500
    OWNER=SC65     AUTOMOVE=N  CLIENT=Y
  HFS            65 ACTIVE                      RDWR
    NAME=WTSCPLX2.SC65.SYSTEM.HFS
    PATH=/SC65
    OWNER=SC65     AUTOMOVE=N  CLIENT=Y
  HFS            50 ACTIVE                      RDWR
    NAME=OMVS.TWS810.TWSCTP.HFS
    PATH=/tws/twsctpwrk
    OWNER=SC65     AUTOMOVE=Y  CLIENT=Y

The display command returns the name of the HFS dataset, its mount point, owner, and mode once it has been mounted.

Example 3: To display information about current system-wide parmlib limits, enter:

/DISPLAY OMVS,L

The output is as shown in Example 18-21.

Example 18-21 Displaying limits

  BPXO051I 14.52.40 DISPLAY OMVS 073
  OMVS     000F ACTIVE          OMVS=(3A)
  SYSTEM WIDE LIMITS:         LIMMSG=NONE
                    CURRENT  HIGHWATER     SYSTEM
                      USAGE      USAGE      LIMIT
  MAXPROCSYS             47         61        300
  MAXUIDS                 1          2         50
  MAXPTYS                 0          1        256
  MAXMMAPAREA             0          0       4096
  MAXSHAREPAGES           0          0   32768000
  IPCMSGNIDS             10         10      20000
  IPCSEMNIDS              0          0      20000
  IPCSHMNIDS              0          0      20000
  IPCSHMSPAGES            0          0    2621440
  IPCMSGQBYTES          ---        108     262144
  IPCMSGQMNUM           ---          9      10000
  IPCSHMMPAGES          ---          0      25600
  SHRLIBRGNSIZE           0          0   67108864
  SHRLIBMAXPAGES          0          0       4096

Displaying the limits is useful to determine if you hit any system-specific limitations. For more display examples, you can also look in z/OS V1R4.0 MVS System Commands, SA22-7627.
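To judge whether a limit is close to being hit, you can compare the highwater usage with the system limit for each row of the display. The following Python sketch does this for display text in the column layout of Example 18-21 (name, current usage, highwater usage, system limit). The function name and the 80 percent threshold are assumptions chosen for illustration, not recommended values.

```python
def limits_near_exhaustion(display_text, threshold=0.8):
    """Return the names of limits whose highwater usage reaches the threshold.

    display_text -- text of a DISPLAY OMVS,L response, one limit per line
    threshold    -- fraction of the system limit regarded as "near"
    """
    names = []
    for line in display_text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # header and title lines do not have four columns
        name, _current, highwater, limit = fields
        if not (highwater.isdigit() and limit.isdigit()):
            continue  # skip lines whose columns are not numeric
        if int(limit) > 0 and int(highwater) / int(limit) >= threshold:
            names.append(name)
    return names


if __name__ == "__main__":
    sample = "MAXPROCSYS 47 61 300\nMAXUIDS 1 45 50\nIPCMSGQBYTES --- 108 262144"
    print(limits_near_exhaustion(sample))
```

Rows that show --- in the current usage column are still evaluated, because only the highwater and limit columns are used.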


Example 4: Verify the current utilization of the working directory (see Example 18-22). This is not possible with the DISPLAY OMVS command. Instead, you can use the df -k shell command. In the ishell, type the action code u in front of the working directory.

Example 18-22 File system utilization

  File System Attributes

  File system name: OMVS.TWS810.TWSCTP.HFS
  Mount point: /tws/twsctpwrk

  Status . . . . . . : Available
  File system type . : HFS
  Mount mode . . . . : R/W
  Device number  . . : 50
  Type number  . . . : 1
  DD name  . . . . . :
  Block size . . . . : 4096
  Total blocks . . . : 256008
  Available blocks . : 251383
  Blocks in use  . . : 4595
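The block counts in Example 18-22 translate into a utilization figure with simple arithmetic. The following Python sketch shows the calculation with the values from the example. The function name is hypothetical.

```python
def utilization(total_blocks, blocks_in_use, block_size=4096):
    """Return (bytes in use, percentage of blocks in use)."""
    used_bytes = blocks_in_use * block_size
    percent = 100.0 * blocks_in_use / total_blocks
    return used_bytes, percent


if __name__ == "__main__":
    used, pct = utilization(total_blocks=256008, blocks_in_use=4595)
    print("%d bytes in use, %.2f%% of the file system" % (used, pct))
```

With the values shown, less than 2 percent of the file system is in use, so space is not a concern for this working directory.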

18.5.7 TCP/IP server

If an internal trace is necessary to pin down problems on the TCP/IP server, you can insert diagnose flags in the server parameter file to trace certain types of data. Perform this task only if a Tivoli support engineer asks you to, because it can generate a huge amount of trace output. The trace information is written to the EQQMLOG of the server. To activate the trace, define the DIAGNOSE statement in the following way:

DIAGNOSE SERVERFLA(.............)
DIAGNOSE TPLGYFLAGS(...........)

Where SERVERFLA produces general trace output and TPLGYFLAGS produces output related to end-to-end processing. Ask your IBM support center for the right settings related to your problem.


18.5.8 Tivoli Workload Scheduler for z/OS connector

The wopcconn utility provides a connector trace that can be useful for debugging by the Tivoli support engineers. The trace can be activated either from the command line with wopcconn or in interactive mode. To check the current settings, issue the following command:

wopcconn -view -e engine_name | -o object_id

To set a trace level, issue the following command:

wopcconn -set -e engine_name | -o object_id [-t trace_level] [-l trace_length]

Or, use interactive mode as follows:

1. Log on to the managed node where the connector is installed and type in wopcconn. The screen in Example 18-23 should display.

   Example 18-23 Connector display

     ******** OPC Connector manage program ********
     Main menu

     1. Create new OPC Connector
     2. Manage an existing OPC Connector
     0. Exit

2. Select option 2. The screen in Example 18-24 should display.

   Example 18-24 Instance selection

     ******** OPC Connector manage program ********
     Select instance menu

     1. OPC
     0. Exit

3. Choose the instance where you need to activate the trace level. The screen in Example 18-25 on page 853 should display.

   Example 18-25 Changing connector attributes

     ******** OPC Connector manage program ********
     Manage menu

     Name         : OPC
     Object id    : 1929225022.1.1771#OPC::Engine#
     Managed node : itso7
     Status       : Active

     OPC version  : 2.3.0

     1. Stop the OPC Connector
     2. Start the OPC Connector
     3. Restart the OPC Connector
     4. View/Change attributes
     5. Remove instance
     0. Exit

4. Select option 4 to change the connector attributes.

5. Change the trace level with option 6 to your value (Example 18-26).

   Example 18-26 Setting the new trace level

     ******** OPC Connector manage program ********
     View/Change attributes menu

     Name         : OPC
     Object id    : 1929225022.1.1771#OPC::Engine#
     Managed node : itso7
     Status       : Active

     OPC version  : 2.3.0

     2. Name                  : OPC
     3. IP Address or Hostname: 9.39.62.19
     4. IP portnumber         : 3111
     5. Trace Length          : 524288
     6. Trace Level           : 1

     0. Undo changes
     1. Commit changes

6. Commit your changes and restart the connector to activate it.


The default length of the trace is 512 KB for each instance. When the length is exceeded, the trace is wrapped around. If an unexpected error occurs, the trace must be copied as soon as possible. You will find the trace in the $DBDIR/OPC/engine_name.log directory.

The trace levels, listed in Table 18-6, are available for a Tivoli Workload Scheduler for z/OS connector instance.

Table 18-6 Trace levels

Level   Trace data
0       Errors
1       Called methods Connections IDs
2       Filters PIF requests Numbers of elements returned in queries
3       Flow the connection details Main functions in/out values
4       Service functions in/out Main functions in/out values
5       Frequently called functions in/out
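If you set the trace level from a script rather than in interactive mode, it helps to validate the arguments before calling wopcconn. The following Python sketch only builds the argument list for the `wopcconn -set` syntax shown in this section. It does not run the command. The function name is hypothetical, and the range check on the trace level follows Table 18-6.

```python
def wopcconn_set_args(engine_name=None, object_id=None,
                      trace_level=None, trace_length=None):
    """Build the argument list for 'wopcconn -set'.

    Exactly one of engine_name (-e) or object_id (-o) must be given.
    trace_level must be one of the levels 0 to 5 listed in Table 18-6.
    """
    if (engine_name is None) == (object_id is None):
        raise ValueError("give exactly one of engine_name or object_id")
    args = ["wopcconn", "-set"]
    if engine_name is not None:
        args += ["-e", engine_name]
    else:
        args += ["-o", object_id]
    if trace_level is not None:
        if trace_level not in range(6):
            raise ValueError("trace_level must be between 0 and 5")
        args += ["-t", str(trace_level)]
    if trace_length is not None:
        args += ["-l", str(trace_length)]
    return args


if __name__ == "__main__":
    # Engine name OPC and trace length 524288 are the values shown in Example 18-26.
    print(" ".join(wopcconn_set_args(engine_name="OPC", trace_level=1, trace_length=524288)))
```

The resulting list can be passed to whatever mechanism you use to run commands on the managed node. As in interactive mode, restart the connector afterward to activate the change.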

18.6 Troubleshooting the Job Scheduling Console

In case of errors when using the Job Scheduling Console, check the message identifier to understand the source of the error:

GJS0xxx   JSC error
GJSQxxx   Tivoli Workload Scheduler for z/OS specific error
GJSWxxx   Tivoli Workload Scheduler specific error

Read the error details for explanations and suggested actions. The console and error logs can be found in the \Jsconsole\dat\.tmeconsole directory. Consult the trace file; remember that error tracing is active by default. Also, check the file bin\java\error.log for untraced errors.
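The mapping from message identifier to error source can be expressed as a small lookup. The following Python sketch classifies an identifier by its first four characters. The function name is hypothetical, and the sample identifier in the usage line is made up for illustration because this section does not show complete identifiers.

```python
import re

# Fourth character of the identifier and the source it indicates.
SOURCES = {
    "0": "JSC error",
    "Q": "Tivoli Workload Scheduler for z/OS specific error",
    "W": "Tivoli Workload Scheduler specific error",
}


def error_source(message_id):
    """Return the error source for a GJS0/GJSQ/GJSW identifier, else 'unknown'."""
    match = re.match(r"GJS([0QW])", message_id)
    return SOURCES[match.group(1)] if match else "unknown"


if __name__ == "__main__":
    print(error_source("GJSQ042"))  # hypothetical identifier
```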


JSC error examples

Here we show you common JSC error situations and the possible causes.

• Error description: Error while logging into the TMR host (see Figure 18-14).

  Figure 18-14 JSC log on error message

  Possible cause: Check login/password, TCP/IP connection to Tivoli host, and Tivoli authorizations.

• Error description: Connector error message (see Figure 18-15)

  Figure 18-15 Connector link failure

  Possible cause: The connector is probably not correctly installed or instances have not been created. Check the connector installation. Check also if JSC installation is corrupted or incomplete.

• Error description: One of the connector instances is not enabled because of errors during loading (see Figure 18-16 on page 856).

  Figure 18-16 Disabled instance

  Possible cause: Check if the version of that engine is supported by the JSC. For a complete list of the compatible versions, refer to the Tivoli Workload Scheduler Release Notes Version 8.1.

• Error description: There is a problem with Tivoli Management Framework (for example oserv down, marshal error) (Figure 18-17).

  Figure 18-17 Tivoli Management Framework failure

  Possible cause: Check Tivoli Management Framework status. Also compare the JSC and connector trace files.

• Error description: There is a problem with TCP/IP communication between the connector and the Tivoli Workload Scheduler for z/OS host (Figure 18-18).

  Figure 18-18 Allocation error message

  Possible cause: Check network integrity and Tivoli Workload Scheduler for z/OS connector parameters.

18.6.1 Trace for the Job Scheduling Console

The trace utility provides a powerful mechanism to find and diagnose JSC problems. It produces a log file that records all JSC activities. Tracing can work at different detail levels to filter the data of interest, and the trace output can be customized. Open the console file using:

• ...\bin\java\console.bat on Windows
• .../bin/java/SUNconsole.sh on Solaris
• .../bin/java/AIXconsole.sh on AIX

Find the section where the user can customize variable values. Locate the two variables, TRACELEVEL and TRACEDATA (Example 18-27 on page 858). They should be set to 0 by default.


Example 18-27 Console.bat file

REM ---------- Section to be customized ---------
REM change the following lines to adjust trace settings
set TRACELEVEL=0
set TRACEDATA=0
REM ------ End of section to be customized -------

Change the value of the variable TRACELEVEL to activate the control flow trace at different levels. Change the value of the variable TRACEDATA to activate the data flow trace at different levels. Acceptable values range from 0 to 3. TRACELEVEL also allows the value -1, which completely disables the trace, as shown in Table 18-7.

Table 18-7 Tracelevel values

Trace level value   Trace types
-1                  No trace at all (to be used only in particular cases).
0                   Only errors and warnings are recorded.
1                   Errors, warnings, and info/debug lines are recorded.
2                   Errors, warnings, and methods entry/exit are recorded.
3                   Errors, warnings, info/debug lines, and method entry/exit are recorded.

Table 18-8 lists trace data values and their corresponding trace types.

Table 18-8 Tracedata

Tracedata value   Trace type
0                 No data is traced.
1                 Data structures from/to the connector are traced.
2                 The internal values of the JSC beans are recorded.
3                 Both data structures and bean internal values are recorded.

Note: Tracing can adversely affect the performance of JSC. Use higher values of TRACELEVEL and TRACEDATA only when necessary.

Trace files can become huge. Use advanced customization to optimize disk space allocation. Move or delete log files left over from previous executions.
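The trace settings described above can also be changed by a small script rather than by hand. The following is a minimal sketch, not part of the product: it assumes the launch file contains assignments of the form `set TRACELEVEL=0` (as in Example 18-27) or a bare `TRACELEVEL=0`, which may not hold for the shell versions of the file. The function name `set_trace_values` is hypothetical; the value ranges are those given in Table 18-7 and Table 18-8.

```python
import re

# Ranges taken from the text: TRACELEVEL accepts -1..3, TRACEDATA accepts 0..3.
ALLOWED = {"TRACELEVEL": range(-1, 4), "TRACEDATA": range(0, 4)}


def set_trace_values(script_text, tracelevel, tracedata):
    """Return script_text with the TRACELEVEL and TRACEDATA assignments replaced.

    Assumes one assignment per line, written as "set NAME=<int>" or "NAME=<int>".
    """
    wanted = {"TRACELEVEL": tracelevel, "TRACEDATA": tracedata}
    for name, value in wanted.items():
        if value not in ALLOWED[name]:
            raise ValueError(f"{name}={value} is outside the documented range")
        # The lookahead keeps an optional carriage return, so CRLF files stay intact.
        pattern = re.compile(
            rf"^([ \t]*(?:set[ \t]+)?{name}=)-?\d+[ \t]*(?=\r?$)",
            re.MULTILINE | re.IGNORECASE,
        )
        script_text, count = pattern.subn(rf"\g<1>{value}", script_text)
        if count == 0:
            raise ValueError(f"no {name} assignment found")
    return script_text


# Usage: switch on full control flow tracing and connector data tracing.
SAMPLE = "REM ---\nset TRACELEVEL=0\nset TRACEDATA=0\nREM end\n"
updated = set_trace_values(SAMPLE, tracelevel=3, tracedata=1)
```

Rejecting out-of-range values before touching the text keeps a typing mistake from leaving the console with an unsupported trace level. Remember to set both values back to 0 once the problem has been captured.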

18.7 Troubleshooting Tivoli Workload Scheduler

Below are some common Tivoli Workload Scheduler problems and how to solve them.

18.7.1 FTAs not linking to the master

▪ If netman is not running on the FTA:
  – If netman has not been started, start it from the command line with the StartUp command. Note that this starts only netman, not any other Tivoli Workload Scheduler processes.
  – If netman was started as root and not as the Tivoli Workload Scheduler user, bring Tivoli Workload Scheduler down normally, and then start it up as the Tivoli Workload Scheduler user through the conman command line on the master or FTA:
      unlink
      stop ; wait
      shut ; wait
      StartUp
  – If netman could not create a standard list directory:
    • If the file system is full, free some space in the file system.
    • If a file with the same name as the directory already exists, delete that file. The directory name is in yyyy.mm.dd format.
    • If the directory or the netman standard list is owned by root and not by the Tivoli Workload Scheduler user, change the ownership of the standard list directory from the UNIX command line with the chown <TWS user> yyyy.mm.dd command. Note that this must be done as root.

▪ If the hosts file or DNS has changed, check whether one of the following has happened:
  – The hosts file on the FTA or master has been changed.
  – The DNS entry for the FTA or master has been changed.
  – The host name on the FTA or master has been changed.


▪ If the communication processes are hung:
  – Mailman process down or hung on the FTA:
    • Tivoli Workload Scheduler was not brought down properly. Always try to bring Tivoli Workload Scheduler down properly through the conman command line on the master or FTA, using the following commands:
        unlink
        stop ; wait
        shut ; wait
    • If mailman read corrupted data, try to bring Tivoli Workload Scheduler down normally. If this is not successful, kill the mailman process with the following steps.
      UNIX: Run ps -ef | grep maestro to find the process ID. Run kill -9 <process ID> to kill the mailman process.
      Windows (commands in the TWShome\unsupported directory): Run listproc to find the process ID. Run killproc <process ID> to kill the mailman process.
    • If batchman is hung: Try to bring Tivoli Workload Scheduler down normally. If this is not successful, kill the mailman process, as explained in the previous bullet.

▪ If the writer process for the FTA is down or hung on the master, it means that:
  – The FTA was not properly unlinked from the master.
  – The writer read corrupted data.
  – Multiple writers are running for the same FTA. Use ps -ef | grep maestro to check that the writer processes are running. If there is more than one process for the same FTA, perform the following steps:
    • Shut down Tivoli Workload Scheduler normally.
    • Check the processes for multiple writers again.
    • If there are multiple writers, kill them.


▪ If the netman process is hung:
  – If multiple netman processes are running, try shutting down netman properly first. If this is not successful, kill netman using the following commands:
    UNIX: Use ps -ef | grep maestro to find the running processes. Issue kill -9 <process ID> to kill the netman process.
    Windows (commands in the unsupported directory): Use listproc to find the process ID. Run killproc <process ID> to kill the netman process.
  – Hung port/socket; FIN_WAIT2 on the netman port:
    • Use netstat -a | grep <port> on both UNIX and NT systems to check whether netman is listening.
    • Look for FIN_WAIT2 on the Tivoli Workload Scheduler port.
    • If FIN_WAIT2 does not time out (approximately 10 minutes), reboot.

▪ Network problems to look for outside of Tivoli Workload Scheduler include:
  – The router is down in a WAN environment.
  – The switch or network hub is down on an FTA segment.
  – There has been a power outage.
  – There are physical defects in the network card/wiring.
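Two of the checks in this section, finding the process ID of a hung process and looking for FIN_WAIT2 on the netman port, can be scripted once the command output has been captured. The sketch below works on text that looks like `ps -ef` and `netstat -a` output. The sample output, the paths, and the port number 31111 are assumptions made for illustration; substitute your own user name, process name, and netman port. Column layouts differ between platforms (for example, when the start time is printed as a date with a space in it), so treat the parsing as a starting point only.

```python
def find_pids(ps_output, user, process_name):
    """Return the PIDs of processes owned by `user` whose command mentions `process_name`."""
    pids = []
    for line in ps_output.splitlines()[1:]:   # skip the header row
        fields = line.split(None, 7)          # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8:
            continue
        owner, pid, command = fields[0], fields[1], fields[7]
        if owner == user and process_name in command:
            pids.append(int(pid))
    return pids


def stuck_sockets(netstat_output, port):
    """Return the netstat lines that mention `port` and are in a FIN_WAIT2 state."""
    hits = []
    for line in netstat_output.splitlines():
        # Some platforms print FIN_WAIT2, others FIN_WAIT_2; normalize both.
        state = line.upper().replace("_", "")
        # Addresses are printed as host:port on some systems and host.port on others.
        if "FINWAIT2" in state and (f":{port} " in line or f".{port} " in line):
            hits.append(line.strip())
    return hits


# Hypothetical captured output, for illustration only.
PS_SAMPLE = """\
UID        PID  PPID  C STIME TTY          TIME CMD
maestro   4101     1  0 09:12 ?        00:00:01 /opt/maestro/bin/netman
maestro   4120  4101  0 09:12 ?        00:00:03 /opt/maestro/bin/mailman
root      4200     1  0 09:13 ?        00:00:00 /usr/sbin/sshd
"""

NETSTAT_SAMPLE = """\
tcp        0      0 0.0.0.0:31111     0.0.0.0:*          LISTEN
tcp        0      0 10.0.0.5:31111    10.0.0.9:40122     FIN_WAIT2
tcp        0      0 10.0.0.5:22       10.0.0.9:40200     ESTABLISHED
"""

mailman_pids = find_pids(PS_SAMPLE, "maestro", "mailman")
hung = stuck_sockets(NETSTAT_SAMPLE, 31111)
```

The functions only report what they find. Killing a process or rebooting remains a manual decision, taken after a normal shutdown has been tried as described above.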

18.7.2 Batchman not up or will not stay up (batchman down)

▪ If the message file has reached 1 MB:
  Check the size of the message files (files whose names end with .msg) in the Tivoli Workload Scheduler home directory and the pobox subdirectory. The minimum size of these files is 48 bytes.
  – Use the evtsize command to expand the file temporarily, and then try to start Tivoli Workload Scheduler:
      evtsize <file name> <size in bytes>
    For example:
      evtsize Mailbox.msg 2000000
  – If necessary, remove the message file (only after expanding it with evtsize and restarting has failed). Message files contain important messages being sent between Tivoli Workload Scheduler processes and between Tivoli Workload Scheduler agents. Remove a message file only as a last resort; all data in the message file will be lost.
▪ Jobman not owned by root.
  If jobman (in the bin subdirectory of the Tivoli Workload Scheduler home directory) is not owned by root, correct this problem by logging in as root and running the chown root jobman command.
▪ Read bad record in Symphony file.
  This can happen for the following reasons:
  – There is a byte order problem between UNIX and Intel platforms (requires patches).
  – The initialization process was interrupted or failed.
  – The Jobtable cannot be created.
  – There is corrupt data in the Jobtable.
▪ Message file corruption.
  This can happen for the following reasons:
  – Bad data
  – File system full
  – Power outage
  – CPU hardware crash
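The message-file size check described in this section is easy to automate. The sketch below lists the .msg files in the Tivoli Workload Scheduler home directory and its pobox subdirectory that have reached a size limit. It is an illustration only: the function name is hypothetical, the home directory path is whatever you pass in, and "1 MB" is taken to mean 1,000,000 bytes, which is an assumption rather than something stated in the text.

```python
from pathlib import Path

MIN_SIZE = 48              # minimum size of a message file, per the text
DEFAULT_LIMIT = 1_000_000  # assumption: "1 MB" interpreted as 1,000,000 bytes


def oversized_message_files(tws_home, limit=DEFAULT_LIMIT):
    """Return (path, size) pairs for .msg files at or above `limit` bytes.

    Looks in tws_home and in tws_home/pobox, the two locations named in the text.
    """
    home = Path(tws_home)
    found = []
    for directory in (home, home / "pobox"):
        if not directory.is_dir():
            continue
        for path in sorted(directory.glob("*.msg")):
            size = path.stat().st_size
            if size >= limit:
                found.append((path, size))
    return found
```

A file reported here is a candidate for the evtsize command. Removing the file remains the last resort, because all data in it is lost.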

18.7.3 Jobs not running

▪ Jobs not running on NT
  – If the NT authorizations for the Tivoli Workload Scheduler users are not in place, check that the user has been granted the following user rights:
    • Act as part of the operating system.
    • Increase quotas.
    • Log on as a batch job.
    • Log on as a service.
    • Log on locally.
    • Replace a process level token.


  – Valid NT or domain user for the FTA not in the Tivoli Workload Scheduler user database.
    Add the Tivoli Workload Scheduler user for the FTA to the Tivoli Workload Scheduler user database. Do not fill in the CPU Name field if the Tivoli Workload Scheduler user is a domain account.
  – Password for the NT user has been changed. Do one of the following:
    • Change the password on NT to match the one in the Tivoli Workload Scheduler user database.
    • Change the password in the Tivoli Workload Scheduler user database to the new password.
    Note that changes to the Tivoli Workload Scheduler user database will not take effect until Jnextday. If the user definition existed previously, you can use the altpass command to change the password for the production day.
▪ Jobs not running on NT or UNIX

  – Batchman down. See 18.7.2, “Batchman not up or will not stay up (batchman down)” on page 861.
  – Limit set to 0. Change the limit to 10 via the conman command line:
    For a single FTA:
      lc <FTA name>;10
    For all FTAs:
      lc @;10;noask
  – Fence set above the limit. Change the fence to 10 via the conman command line:
    For all FTAs:
      f @;10;noask
  – If dependencies are not met, it could be for the following reasons:
    • Start time not reached yet or UNTIL time has passed.
    • OPENS file not present yet.
    • Job FOLLOW not complete.
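When the limit or fence has to be reset on many workstations, it can help to generate the conman commands rather than type them. The following sketch only builds the command strings in the form shown in this section (lc and f, with @ meaning all FTAs and ;noask suppressing the prompt); it does not run conman. The function names and the workstation name FTA1 are hypothetical.

```python
def limit_command(workstation, limit, noask=False):
    """Build a conman 'lc' (limit) command string for one workstation or '@' for all."""
    command = f"lc {workstation};{limit}"
    return command + ";noask" if noask else command


def fence_command(workstation, fence, noask=False):
    """Build a conman 'f' (fence) command string for one workstation or '@' for all."""
    command = f"f {workstation};{fence}"
    return command + ";noask" if noask else command


# Usage: the commands from the text, plus one for a hypothetical workstation FTA1.
commands = [
    limit_command("FTA1", 10),
    limit_command("@", 10, noask=True),
    fence_command("@", 10, noask=True),
]
```

Printing the generated strings for review before pasting them into conman keeps a wildcard change from being applied by accident.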


18.7.4 Jnextday is hung or still in EXEC state

▪ Stageman cannot get exclusive access to Symphony.
▪ Batchman and/or mailman was not stopped before running Jnextday from the command line.
▪ Jnextday not able to stop all FTAs.
  – Network segment down and cannot reach all FTAs.
  – One or more of the FTAs has crashed.
  – Netman not running on all FTAs.
▪ Jnextday not able to start or initialize all FTAs.
  – The master or FTA was manually started before Jnextday completed stageman. Reissue a link from the master to the FTA.
  – The master was not able to start batchman after stageman completed. See 18.7.2, “Batchman not up or will not stay up (batchman down)” on page 861.
  – The master was not able to link to the FTA. See 18.7.1, “FTAs not linking to the master” on page 859.

18.7.5 Jnextday in ABEND state

▪ Jnextday not completing compiler processes.
  This may be due to bad or missing data in the schedule or job. You can perform the following actions:
  – Check for missing calendars.
  – Check for missing resources.
  – Check for missing parameters.
▪ Jnextday not completing stageman process.
  This may be due to bad or missing data in the CARRYFORWARD schedule. You can perform the following actions:
  – Run show jobs or show schedules to find the bad schedule.
  – Add missing data and rerun Jnextday.
  – Cancel the schedule and rerun Jnextday.


▪ Jnextday not completing logman process.
  The reason may be one of the following:
  – Negative run-time error (requires patch).
  – The master was manually started before logman completed.

18.7.6 FTA still not linked after Jnextday

▪ Symphony file corruption
  Corruption during transfer of the Sinfonia file:
  – Byte order problem between UNIX and Intel
    Apply patches that correct the byte order problem. Recent versions of Tivoli Workload Scheduler are unaffected by this problem.
▪ Symphony file, but no new run number, date, or time stamp
  You can perform the following actions:
  – Try to link the FTA. See 18.7.1, “FTAs not linking to the master” on page 859.
  – Remove the Symphony and message files (on the FTA only) and link from the master again.
▪ Run number, Symphony file, but no date or time stamp
  You can perform the following actions:
  – Try to link the FTA. See 18.7.1, “FTAs not linking to the master” on page 859.
  – Remove the Symphony and message files (on the FTA only) and link from the master again.

18.7.7 Introduction to the Tivoli Workload Scheduler 8.1 tracing facility

Tivoli Workload Scheduler 8.1 uses a new tracing facility called Autotrace. (Autotrace is also available for the Tivoli Management Framework components starting with Tivoli Management Framework Version 4.1. Please refer to Chapter 5, “Autotrace” on page 131 for detailed information about Autotrace.)

Before the incorporation of Autotrace into Tivoli Workload Scheduler, support personnel used a comparatively primitive tracing method: if a problem appeared that could not be explained without a trace, the program developers would compile a debug version of the affected program, and the customer would put this debug program in place until the problem happened again. With the new Autotrace tracing facility, potentially valuable trace data is collected all the time. This data can be exported to a binary file and sent to IBM support in the event of a program failure.


Normally, it should not be necessary to capture a trace. If IBM support requests a trace, you can capture one using the command shown in Example 18-28.

Example 18-28 Capturing a trace using the atctl snap command

atctl snap 1 snapfile.at

The snapfile.at file is the captured trace file. This file can be named anything you like, but it is customary to end the file name with .at to identify it as an AutoTrace snap file. The file is not readable without special tools and library files.
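If snapshots are taken from a script, the command line can be assembled in one place so that the .at naming convention is always followed. The sketch below only builds the argument list for the command shown in Example 18-28; it does not run atctl. The function name is hypothetical, and the meaning of the numeric argument (here called `channel`) is an assumption, because this section does not explain it; see the Autotrace chapter for the actual syntax.

```python
def snap_command(snapfile, channel=1):
    """Build the argument list for 'atctl snap', appending .at to the file name if missing.

    `channel` is the numeric argument from Example 18-28; its name here is an assumption.
    """
    if not snapfile.endswith(".at"):
        snapfile += ".at"
    return ["atctl", "snap", str(channel), snapfile]


# Usage: reproduces the command from Example 18-28.
argv = snap_command("snapfile")
```

The resulting list can be handed to a process launcher such as Python's `subprocess.run` on a machine where atctl is installed.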


Chapter 19. IBM Tivoli Configuration Manager

This chapter covers IBM Tivoli Configuration Manager 4.2 troubleshooting, together with a discussion of its new features, components, and architecture. Although the chapter is based on IBM Tivoli Configuration Manager 4.2, many of the troubleshooting techniques and the log and trace files described here also apply to Software Distribution 4.0 or 4.1 and Inventory 4.0, so the chapter is also useful if you are using one of those products.

In this chapter, the following topics are discussed:

▪ Section 19.1, “IBM Tivoli Configuration Manager 4.2 overview” on page 869
▪ Section 19.2, “IBM Tivoli Configuration Manager 4.2 components” on page 869
▪ Section 19.3, “IBM Tivoli Configuration Manager 4.2 new features” on page 871
▪ Section 19.4, “Troubleshooting Software Distribution” on page 882
▪ Section 19.5, “Troubleshooting Activity Planner” on page 902
▪ Section 19.6, “Troubleshooting Change Manager” on page 908
▪ Section 19.7, “Troubleshooting Web Gateway and device management” on page 910


▪ Section 19.8, “Troubleshooting Web User Interface” on page 923
▪ Section 19.9, “Troubleshooting Enterprise Directory Integration” on page 927
▪ Section 19.10, “Troubleshooting Inventory” on page 928


19.1 IBM Tivoli Configuration Manager 4.2 overview

IBM Tivoli Configuration Manager 4.2 integrates Software Distribution, Inventory, and Tivoli Management Framework into a powerful distribution, change management, and asset management tool in the enterprise environment. With the release of IBM Tivoli Configuration Manager 4.2, new features as well as enhancements have been added to further extend the management of the customer’s environment. IBM Tivoli Configuration Manager 4.2 is a single package that includes:

▪ Tivoli Management Framework
▪ Software Distribution
▪ Inventory
▪ Activity Planner
▪ Change Manager
▪ Web User Interface
▪ Resource Manager
▪ Web Gateway

Some of the features listed above are new in the IBM Tivoli Configuration Manager 4.2 packaging.

19.2 IBM Tivoli Configuration Manager 4.2 components

IBM Tivoli Configuration Manager 4.2 can be installed using the IBM Tivoli Configuration Manager install program, the Tivoli Desktop installation provided by Tivoli Management Framework, the Tivoli command line, the Tivoli Software Installation Service, or any combination of these options. Please refer to the IBM Tivoli Configuration Manager Planning and Installation Version 4.2, GC23-4702 for detailed information and prerequisites.

The following are the typical components to be installed on the Tivoli Server:

▪ IBM Tivoli Software Distribution Version 4.2
▪ IBM Tivoli Inventory Version 4.2
▪ IBM Tivoli Activity Planner Version 4.2
▪ IBM Tivoli Change Manager Version 4.2
▪ IBM Tivoli Software Package Editor Version 4.2
▪ IBM Tivoli Resource Manager Version 4.2
▪ IBM Tivoli Web Infrastructure Version 4.2
▪ IBM Enterprise Directory Query Facility Version 4.2

Note: Before installing IBM Enterprise Directory Query Facility Version 4.2, please make sure there is already an installed and configured LDAP directory server. For a list of supported LDAP directory servers, see the IBM Tivoli Configuration Manager Version 4.2 Release Notes, GI11-0926.

The following are the typical components to be installed on the Tivoli managed nodes:

▪ IBM Tivoli Software Distribution Version 4.2
▪ IBM Tivoli Software Distribution Gateway Version 4.2
▪ IBM Tivoli Inventory Version 4.2
▪ IBM Tivoli Inventory Gateway Version 4.2
▪ IBM Tivoli Activity Planner Version 4.2
▪ IBM Tivoli Change Manager Version 4.2
▪ IBM Tivoli Software Package Editor Version 4.2
▪ IBM Tivoli Resource Manager Gateway Version 4.2
▪ IBM Tivoli Enterprise Directory Query Facility Version 4.2

Note: Install gateway products on managed nodes designated as gateways.

The following are components that can be installed on Tivoli Management Agents (endpoints). To add these components to the endpoint, use the IBM Tivoli Configuration Manager install program, a profile manager to distribute the respective software package blocks (SPBs), or a disconnected install on the local machine:

▪ IBM Tivoli Desktop Version 4.2
▪ IBM Tivoli Software Package Editor Version 4.2
▪ IBM Tivoli Software Distribution Pristine Tool Version 4.2
▪ IBM Tivoli Web Gateway Version 4.2
▪ IBM Tivoli Web Interfaces Version 4.2


Note: The IBM Tivoli Web Gateway component allows access to Web objects and enables device management. This component must be installed on the endpoint that will serve as the Web gateway for Web and device access. When installing the Web Gateway component, the Web Gateway database must be installed prior to the Web Gateway Server.

19.3 IBM Tivoli Configuration Manager 4.2 new features

New features have been added to IBM Tivoli Configuration Manager 4.2 to leverage the diverse distributed environment while using the same robust features of Software Distribution, Inventory, and Tivoli Management Framework. All Software Distribution and Inventory tools that were previously used are packaged into the IBM Tivoli Configuration Manager 4.2 product, with additional features for tighter integration and more extensive options. Deployment, change, and asset management can now extend beyond traditional desktops. At the same time, the functions of the products work together seamlessly for added value to the enterprise.

The major features added to IBM Tivoli Configuration Manager 4.2 are introduced in the following sections:

▪ Section 19.3.1, “New Web UI” on page 871
▪ Section 19.3.2, “Resource Manager” on page 876
▪ Section 19.3.3, “Device management” on page 877
▪ Section 19.3.4, “Integration with Enterprise Directories” on page 880
▪ Section 19.3.5, “Native packaging support” on page 881
▪ Section 19.3.6, “Multicast distribution” on page 881

19.3.1 New Web UI

Whether configuration management operations must be performed over the Web, or security constraints such as firewalls apply (for example, when your Web server is in the demilitarized zone, or DMZ), the Configuration Manager Web Interface allows you to manage a number of Web objects through a single URL. This support provides the ability to install software on a device that does not have the Tivoli Management Agent (TMA) installed. This is achieved by downloading the appropriate software installation support to the device via a signed applet.

The new Web User Interface (WEB UI, or Web Interface) allows Software Distribution and Inventory to manage endpoints in the DMZ through the proxy support, or to publish packages and profiles to the DMZ so that endpoints on the Internet can access them through a browser. This is important as network environments become more secure: endpoints can still be managed while in the DMZ, and if such an endpoint needs to download an application, the packages and profiles can be published to the DMZ so that the endpoint can download the application through a Web browser.

A good example is a group of company executives who need reporting software while traveling among business units around the globe to present reporting data. A software package can be created for download through a Web browser. Once the software package is downloaded, it checks the hardware and software prerequisites before installing the reporting application.

The WEB UI in Software Distribution 4.1 had some limitations:

▪ HTTPS protocol was not supported.
▪ Authentication was based on Tivoli Management Framework roles.
▪ Passwords were not encrypted.
▪ The Web server had to be installed on a managed node. This managed node was not allowed to be in the DMZ.

In order to remove these limitations, IBM Tivoli Configuration Manager is leveraging existing technologies, such as IBM Tivoli Access Manager (formerly known as Policy Director) and IBM WebSphere, along with the Endpoint/Gateway proxy support. The Endpoint/Gateway proxy support is an add-on component that enables the core Tivoli Management Framework-based applications (such as Software Distribution and Inventory) to manage devices across firewalls. This solution addresses the support of firewalls between Tivoli Management Framework-based gateways and Tivoli Management Framework-based endpoints only. Software Distribution and Inventory can manage devices that sit on the other side of a firewall from the Tivoli Management Framework-based gateway (see Figure 19-1 on page 873).


Figure 19-1 WEB UI and firewalls
[Diagram. Labels: Internet, DMZ, Intranet, Browser, Signed Applet, WebSeal, Web Server, Policy Director, LDAP, Auth. Servlet, TMR WEB Server, TWG WebUI Servlets + Applic. Servlets, SWD, INV, MCollect, EP, GW, Application Server Exploitation, Proxy tech.]

Main components of WEB UI

Here are the main components of the WEB UI.

Tivoli Management Agent (TMA) on every Application Server

The TMA must be on every Application Server. The function of these TMAs is to distribute SPBs and Inventory profiles to the application server so that clients can download them from the Web. The SPBs and Inventory profiles are not kept on Web servers, because application servers provide a higher level of security than a Web server.

Tivoli Management Agent (TMA) on one machine in the Access Manager Framework

This TMA is used to automatically create and delete resources (software packages, Inventory profiles, and reference models) in the Access Manager database and to configure access rights on these resources. The configuration is done by performing a downcall on the TMA, which resides on a machine where the PDRTE is installed.


Access Manager API (aznAPI)

The Access Manager API resides on the Application Server. This API allows the application server to access Access Manager in order to perform authentication and authorization tasks.

Software Distribution/Inventory API

The SWD/Inventory API sits on the Application Server. This API submits HTTP(S) requests to a Web Server (Application Level Gateway) in the secure zone that performs all the required tasks. Tasks can include retrieving properties for software packages, reference models, and inventory profiles, generating action lists for the synchronization with reference models, and updating the Inventory database.

Client Platform-Specific plug-ins

A Java servlet plug-in is provided to generate the display of resources available to the user for that platform once user authentication has taken place. The default view displays only those resources available to the user for the platform they are currently using. A secondary view, based on policy, allows the end user to view all resources to which they have access. The same plug-in is then used to download and execute the selected management action. The plug-in is responsible for determining or generating the machine ID, such as a GUID, of the machine on which the user is currently running.

Application Level Gateway

This is basically a Web Server with a Java servlet engine. A Java servlet is installed on it for each task that is needed.

This example shows a user flow when using the Web User Interface from a browser:

1. The browser contacts WebSEAL from the Internet through a secure connection.
2. The applet asks the user to grant all the special privileges it needs.
3. The applet checks the local configuration and, if necessary, installs the Software Distribution installation engine. It also creates a GUID to identify the particular machine.
4. A Java servlet or a JavaServer Page (JSP) creates a page with a list of public software packages and models, based on the operating system of the machine.
5. When the user chooses to access a restricted area, the user ID and password are sent to WebSEAL using a form through the HTTPS protocol.
6. WebSEAL authenticates the user by contacting the Policy Director Framework and creates the credential for the particular session.
7. The application server, through the SWD/INV API, asks Software Distribution for the list of published software packages and models, depending on the operating system of the machine.
8. The application server, using the aznAPI provided by the Access Manager, accesses authorization information maintained by the Access Manager to filter the software packages and models to which the user has access, and finally the HTML pages are generated.
9. When the user chooses to install a software package, the software package block is downloaded to the client and the installation engine does the rest. A catalog containing the CM Status of the client is maintained on the client itself.
10. When the user chooses to synchronize with a model, the applet sends all the information about the CM Status of the machine to the server. The application server asks the CCM to generate the list of actions needed to take the machine from the actual state to the desired state. The list of actions is presented on an HTML page (preview), and if the user accepts, the applet executes all the actions and downloads all required software packages and inventory profiles.

Web objects must be published by the Tivoli administrator. For example, a package must be pushed to the TMA machine with the Web Gateway installed. This package is then stored in a depot on the TMA. The administrator then grants access to the Web Interface client users. The Web Interface client users with access are then able to pull the package from the Web Interface by connecting to the gateway and then downloading it to the client’s machine.

Web objects consist of:

▪ Software packages
▪ Inventory profiles
▪ Reference models

We can manage Web objects by:

▪ Installing software packages
▪ Verifying software packages
▪ Uninstalling software packages
▪ Viewing reference models
▪ Synchronizing reference models
▪ Running inventory scans
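The filtering described in steps 4, 7, and 8 of the user flow (select the published Web objects for the client's operating system, then keep those that are public or that the user is authorized to see) can be pictured with a small model. This is an illustrative sketch only: the class, the field names, and the sample objects are hypothetical, and in the product the authorization decision is made by Access Manager rather than by a set lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WebObject:
    name: str
    kind: str        # "software package", "inventory profile", or "reference model"
    platform: str    # operating system the object applies to
    public: bool     # True if visible before the user logs in


def visible_objects(published, platform, granted_names):
    """Return the objects for `platform` that are public or granted to the user."""
    return [
        obj for obj in published
        if obj.platform == platform and (obj.public or obj.name in granted_names)
    ]


# Hypothetical published objects, for illustration only.
PUBLISHED = [
    WebObject("reporting-tool", "software package", "windows", public=True),
    WebObject("sales-model", "reference model", "windows", public=False),
    WebObject("hw-scan", "inventory profile", "windows", public=False),
    WebObject("reporting-tool-aix", "software package", "aix", public=True),
]

anonymous_view = visible_objects(PUBLISHED, "windows", granted_names=set())
sales_view = visible_objects(PUBLISHED, "windows", granted_names={"sales-model"})
```

Filtering by platform first mirrors the order of the flow: the public list is built before the user authenticates, and the restricted objects are added only after WebSEAL has created the session credential.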


To configure this on the Tivoli Server, register the Software Distribution and Inventory plug-ins with the Web application. If you install a component (Software Distribution or Inventory) on the Tivoli Server after you install the Web Interface, the respective Web Interface plug-in is registered automatically. However, if you install a component before installing the Web Interface, you must register the plug-in manually.

To configure this on the Application Server (Web Gateway), an SPB is available. Before installing the Web Gateway component for Web object access using the Web Interfaces component, make sure to install and configure the following software:

▪ IBM DB2
▪ IBM WebSphere Application Server
▪ IBM Tivoli Access Manager
▪ IBM Tivoli Access Manager WebSEAL

19.3.2 Resource Manager

Tivoli Resource Manager (TRM) is used to manage pervasive devices and users as resources. TRM’s main roles are to:

▪ Create an association between each device and its assigned endpoint
▪ Retrieve a user’s information and the associated endpoint
▪ Determine where resources, whether pervasive devices or users, are associated

To enable TRM, IBM Tivoli Resource Manager Version 4.2 must be installed on the Tivoli Server, and it should also be installed on the managed nodes from which Resource Manager commands will be run. Tivoli Resource Manager Gateway Version 4.2 should be installed on gateways that communicate with endpoints that host the Web Gateway component (also referred to as resource gateways). Tivoli Web Gateway Version 4.2 must be installed on the Tivoli endpoints that connect to pervasive devices. Before installing the Web Gateway component for resource management of devices, you must install and configure the following software:

▪ IBM DB2
▪ IBM WebSphere Application Server

To optionally protect the enrollment URLs, you can use IBM Tivoli Access Manager WebSEAL.

Resource groups must contain resources of the same type: a group is either a device group, containing only pervasive devices, or a user group, containing only enterprise directory users. The members of a resource group can be static or dynamic. The resource group shields applications, such as Software Distribution or Inventory, from having to know device or user concepts by taking care of the association between each device or user and its assigned endpoint. Figure 19-2 shows the infrastructure of Resource Manager.

Figure 19-2 Resource Manager infrastructure
[Diagram. Labels: Resource Manager, DataBase Interface, Device Directory, Group, LDAP, dSA, Table1]

A component of Resource Manager resides on the Tivoli Server. A Resource Manager gateway component connects the server with the endpoints to which the pervasive devices in the region connect. A resource gateway on the endpoint enables you to manage the devices that connect to it. In this release, the only resource gateway supported is Web Gateway. The Tivoli Server connects with a RIM host that interacts with an RDBMS server. This relational database keeps track of devices and the endpoints that manage them.

Resource Manager also enables you to work with the user resources that are defined in an Enterprise Directory server, for example, a Lightweight Directory Access Protocol (LDAP) server. Users are associated with endpoints in a one-to-one relationship, and the mapping is stored in the LDAP server. Resource Manager enables you to view the association between a user and an endpoint.

Resource tasks are carried out by Resource Manager. It uses a database interface to address the Device Directory (which is a storage system) and to pull information from the Enterprise Directory server via LDAP (see Figure 19-2). The database interface implementation is resource-type specific.
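The two rules described for resource groups, that all members share one resource type and that each resource is associated with an assigned endpoint, can be pictured with a small in-memory model. The sketch below is purely illustrative and is not the Resource Manager API: the class, method, and resource names are hypothetical, and the real associations are kept in the Device Directory or the LDAP server.

```python
class ResourceGroup:
    """A group whose members all share one resource type ("device" or "user")."""

    def __init__(self, name, resource_type):
        self.name = name
        self.resource_type = resource_type
        self._endpoint_by_resource = {}

    def add(self, resource_id, resource_type, endpoint):
        # Resource groups must contain resources of the same type.
        if resource_type != self.resource_type:
            raise ValueError(
                f"{self.name} holds {self.resource_type} resources, not {resource_type}"
            )
        self._endpoint_by_resource[resource_id] = endpoint

    def endpoint_of(self, resource_id):
        """Return the endpoint assigned to one resource."""
        return self._endpoint_by_resource[resource_id]

    def target_endpoints(self):
        """Endpoints an application would address to reach every member."""
        return sorted(set(self._endpoint_by_resource.values()))


# Usage with hypothetical device and endpoint names.
palms = ResourceGroup("marketing-palms", "device")
palms.add("palm-001", "device", "ep-web-gw-1")
palms.add("palm-002", "device", "ep-web-gw-1")
palms.add("palm-003", "device", "ep-web-gw-2")
```

An application that addresses the group needs only the result of `target_endpoints()`, which is the sense in which the group hides device and user details from Software Distribution or Inventory.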

19.3.3 Device management

IBM Tivoli Configuration Manager 4.2 has a new feature that extends management to pervasive devices. Distributions and Inventory scans can now be run against devices. Imagine sending a weekly price list to the Palm devices of 20,000 business partners. In another scenario, all the pervasive devices become part of a reference model. You can have a reference model for Sales, Marketing, Executives, Accounting, and so on, such that when a user changes role in the organization or group, the software on the pervasive device changes to reflect the new role. In addition to being able to send a profile to a group that contains pervasive devices, Activity Planner extends targets and Change Manager extends subscribers to pervasive devices.

The Tivoli Web Gateway (TWG) is the Device Management Server (DMS) extended to allow management actions (Inventory, Software Distribution, and Device Configuration) to be controlled from a TME cloud.

In the Tivoli environment, the devices are managed using the Tivoli Device Manager (TDM) service. Using this application, the administrator can define devices, link them to the TMAs that can directly or indirectly manage them, and create device groups. Device groups are known to the Tivoli Management Framework (a device group is a specialized profile manager) and can be used by Tivoli applications to address devices. Figure 19-3 shows an example of a flow in Software Distribution.

[Figure 19-3 diagram. Components shown: Administrator, Configuration Change Manager, Inventory DB, SWDistManager Object, Activity Planner Manager, Tivoli Device Manager, Device Directory, TMR Server, Software Distribution Engine, Software Distribution Agent (on the Endpoint) with its Subagent, CT Abstraction Layer, Result Collector, WebSphere Device Gateway, and Devices. The numbered arrows 1 to 11 correspond to the steps listed after the caption.]

Figure 19-3 Data flow using Software Distribution to push to devices


Where:
1. The administrator defines a reference model for the marketing people who have a device of type Palm OS assigned. The default configuration should have an e-mail client, a browser, and a list of contacts for the main customers installed. The software to be installed on the devices is packaged in a Software Distribution package. Suppose that some new people join the marketing division of the company. To install the right software on the new Palm Pilots, the administrator adds them to the device group containing all Palm OS devices for marketing people and, using CCM, synchronizes the reference model of marketing people to the new devices.
2. CCM, using information in the Inventory database, determines the state of the package on the devices and prepares an Activity Planner (APM) plan to install it on the devices.
3. CCM submits the plan to APM.
4. Before starting an activity of the plan, APM interacts with TDM to define a temporary group to contain the list of devices to be addressed by the operation.
5. APM submits the request to the Software Distribution engine. The request addresses the new temporary group generated.
6. Software Distribution, once it has received the device group, interacts with TDM to obtain the list of the TMAs that control the target devices and submits the request to the endpoints. The diagram shows a single endpoint, but a distribution could actually span several endpoints.
7. When each endpoint receives the distribution, the Software Distribution Agent decodes the software package and executes the actions on the objects, as described in the software package. In this case, the built-in actions are specific to Palm.
8. The built-in action for the Palm device (subagent) converts the software package into a group of TWG packages and submits a job, addressing all packages, to the Web Gateway.
9. When a target device connects to the TWG, the TWG executes the requested actions on the device.
10. TWG sends the result of the job execution to the Result Collector.
11. The Result Collector collects results, groups multiple results based on how the administrator has configured it, and sends them to the SWD Manager. The SWD Manager is responsible for the report management for Software Distribution. After these operations, the report is sent to APM to allow the update of the state of the plan on devices.

The arrow between the Result Collector and the SWDistManager is dotted because the way reports flow from the endpoint to the collector is an open issue at this time.


Device types currently supported are:
• Palm
• WinCE
• Nokia 9200 Series

Note: Inventory is supported only for Palm and WinCE.

With the exception of the Nokia device, the agent is installed on the device via the device synchronization program. TWG supplies a device agent and plug-in for the Nokia 9200 Communicator series devices. The plug-in resides on the TWG server. The device agent resides on a host PC. For management tasks, a Nokia 9200 series device is connected to a host PC through a serial or infrared connection. Acting as a client, the device agent communicates with the plug-in on the TWG server. The device agent is referred to as a proxy agent because it does not reside on the device; however, it does act on behalf of the device.

The following actions can be done against mobile devices:
• Software Distribution
  Distribute software to mobile devices.
• Inventory
  Gather hardware, software, and configuration data about mobile devices. Currently, Nokia is not supported.
• Device Configuration
  Set device parameters.

19.3.4 Integration with Enterprise Directories

IBM Tivoli Configuration Manager 4.2 can now use enterprise directory information to address user-based operations inside the Tivoli management environment. LDAP is the protocol used to access the enterprise directory. Using the Enterprise Directory integration, IBM Tivoli Configuration Manager can:
• Add, delete, or modify the association between a user and an endpoint
• Group different users in object containers
• Query users using search filtering

Note: Only one user can be associated with one endpoint.


Both Configuration Manager and Activity Planner exploit the enterprise directories by expanding the subscribers to:
• Users
• Users Groups
• Directory Query

19.3.5 Native packaging support

In addition to Microsoft Setup, MSI, InstallShield, and OS/2 CID supported packaging, IBM Tivoli Configuration Manager 4.2 supports three new native packages for the most common UNIX flavors:
• Installp (for AIX)
• Pkgadd and Patchadd (for Solaris)
• RPM (for Linux)

This allows IBM Tivoli Configuration Manager 4.2 to:
• Provide built-in actions
• Use redirect and bundled installation
• Provide a wizard to wrap UNIX packages in a software package (SP)
• Make native packages manageable at an enterprise scale by leveraging Tivoli Software Distribution functionality
• Support all native actions where available

19.3.6 Multicast distribution

Multicast is a new feature of Tivoli Management Framework Version 4.1. It is an add-on to MDist2. Traditionally, MDist2 has a one-to-one TCP connection with each target. Therefore, if there are 50 targets, MDist2 must send the distribution data 50 times. Using multicast, the distribution data is sent only once, regardless of the number of receivers, because multicast uses UDP multicast packets and all the targets read from the same data stream. The benefit of using multicast is the decrease in distribution time and network traffic. This is very useful when sending data to multiple targets over satellite or slow network links. A good example of this would be MDist2 distributing a large application of 100 MB to 100 endpoints, which would take about 5.6 hours (assuming the default bandwidth of 500 KB per second). Using multicast, this same distribution could be done in less than 3.5 minutes.
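The figures in this example can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that the default bandwidth limit means 500 kilobytes per second, that 1 MB is counted as 1000 KB, and that protocol overhead and retransmissions are ignored. The function and variable names are ours, for illustration only.

```python
# Worked arithmetic for the 100 MB / 100 endpoint example.
# Assumptions (ours): 500 KB per second bandwidth limit, 1 MB = 1000 KB,
# no protocol overhead and no retransmissions.

PACKAGE_MB = 100
TARGETS = 100
BANDWIDTH_KB_PER_S = 500


def transfer_seconds(total_mb: float, kb_per_s: float) -> float:
    """Seconds needed to push total_mb megabytes at kb_per_s kilobytes per second."""
    return total_mb * 1000 / kb_per_s


# Unicast: the full package is sent once per target.
unicast_s = transfer_seconds(PACKAGE_MB * TARGETS, BANDWIDTH_KB_PER_S)
# Multicast: the package is sent once, whatever the number of targets.
multicast_s = transfer_seconds(PACKAGE_MB, BANDWIDTH_KB_PER_S)

print(f"unicast:   {unicast_s / 3600:.1f} hours")
print(f"multicast: {multicast_s / 60:.1f} minutes")
```

Under these assumptions the unicast case comes to 20,000 seconds (about 5.6 hours) and the multicast case to 200 seconds (about 3.3 minutes), which agrees with the numbers quoted in the text.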


The Tivoli multicast option supports the Reliable Multicast Transport Protocol (RMTP). When a client has finished receiving, it sends the server a list of the packets it dropped. The server receives this list from every target, takes the union of the dropped packets, and sends them again. This causes some overhead, and the amount of overhead depends on the number of receivers.

Multicast may be used between managed nodes to load MDist2 depots or between a gateway and its agents. Distributions are multicast from repeaters that have been enabled for multicast. The default behavior is to not use multicast. For other enhancements that are available with IBM Tivoli Configuration Manager 4.2, please refer to IBM Tivoli Configuration Manager Introduction Version 4.2, GC23-4703.
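The retransmission step described for RMTP (take the union of the dropped-packet lists and send those packets again) can be sketched as follows. This is an illustration of the idea only, not the Framework's implementation. The endpoint labels and packet sequence numbers are made up.

```python
# Illustrative sketch of the retransmission step: every target reports the
# packets it dropped, and the sender resends the union of those reports.
# Endpoint labels and sequence numbers are hypothetical.


def packets_to_resend(dropped_reports: dict[str, set[int]]) -> list[int]:
    """Return the sorted union of the sequence numbers reported as dropped."""
    missing: set[int] = set()
    for dropped in dropped_reports.values():
        missing |= dropped
    return sorted(missing)


reports = {
    "ep-a": {3, 7},
    "ep-b": {7, 12},
    "ep-c": set(),  # this target received everything
}
resend = packets_to_resend(reports)
print(resend)  # [3, 7, 12]
```

Packet 7 was dropped by two targets but is resent only once, which is why the overhead depends on how the losses overlap across receivers rather than simply on their total.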

19.4 Troubleshooting Software Distribution

In this section, we address troubleshooting techniques for Software Distribution components. We provide a general approach for diagnosing distribution problems. The internals, architecture, and diagnostic techniques for all major components of Software Distribution are also reviewed, because understanding the process flow of each component is very useful when troubleshooting a problem.

19.4.1 General troubleshooting

The problem determination steps should be based on the process flow of the components of the products so that the point of failure can be determined and rectified. For Software Distribution, there are three main components in the process flow: the Tivoli Management Framework infrastructure, the endpoint, and the software package profile. The Tivoli Management Framework infrastructure needs to be reviewed first, since distribution of any profile will not work without it. Then the endpoint needs to be checked to see if it is in working order. The last thing to check is the Software Distribution profile and its associated SPD or SPB.

The Tivoli Management Framework environment is the infrastructure used by Software Distribution, so the first thing to check is that the framework environment is functioning correctly. You must check that all required Software Distribution components and prerequisites have been correctly installed and configured on the Tivoli Server and gateways. You need to check that the Tivoli Server, all gateways, and managed nodes are functioning correctly. A wping of the managed node may indicate that the


managed node is up and running, but that does not necessarily mean that it is functioning correctly. You can test this by pushing out other types of profiles, such as an Inventory or Remote Control profile, to endpoints other than the suspect endpoints. A failure here would indicate a problem with the Tivoli Management Framework environment. The gatelog of the gateway, the oservlog, and the MDist log would need to be reviewed at an increased trace level. Refer to Chapter 3, “Problem determination” on page 57 to further determine the cause of the problem.

To check the condition of the endpoint, besides checking that it is running, you should push out other types of profiles, such as an Inventory or Remote Control profile or even other software package profiles, to check that it is functioning correctly. If this fails, the problem could be with the endpoint. If more than one endpoint is encountering this problem, check for any set pattern. For example, all the problematic endpoints may be serviced by the same gateway, or the same profile may be failing on all these endpoints. The problem could then be with the gateway or the profile. For problems with the gateway and endpoint, refer to Chapter 3, “Problem determination” on page 57 to further determine the cause of the problem.

With the Tivoli Management Framework infrastructure and the endpoint in working order, the problem could be the software package profile or the software package. Test the software package profile by distributing it to a known working endpoint. A failure here indicates that the problem could be with the profile or the software package. The best sources of information about the distribution are the software package log file and the Software Distribution trace files. Review these to determine the cause of the problem. The settings of the software package profiles should be checked, as these settings or options can affect the operations on the endpoints.

One of the things to watch out for is nested software packages. A software distribution to a group of endpoints failing with an error like “failed cm_status check” could be due to one of the nested packages already being installed on one of the targeted endpoints. Using the force or ignore options should allow the distribution to complete. Refer to the IBM Tivoli Configuration Manager Software Distribution manuals for the requirements and implications of using these options.

There can be times when the installation of the software package is successful, but the status in the log does not correctly indicate it. This can occur with a user_program defined as the final action that has an indefinite timeout, or when a manual reboot of the target is required. This is due to the records of the states of the software packages in the Inventory database being out of sync with the endpoints’ catalogs, which hold the states of the software package. You will need to run wsyncsp to reconcile the information.
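The pattern check suggested in this section (do the failing endpoints share a gateway or a profile?) is easy to do by hand for a few endpoints, but a small script helps with longer lists. The following is a minimal sketch that assumes you have already collected the failures as (endpoint, gateway, profile) tuples. All labels shown are hypothetical, and the script does not call any Tivoli command.

```python
from collections import Counter

# Hypothetical input: one (endpoint, gateway, profile) tuple per failed
# distribution, collected by hand from the logs or the status console.


def failure_patterns(failures):
    """Count failures per gateway and per profile to expose a common factor."""
    by_gateway = Counter(gateway for _, gateway, _ in failures)
    by_profile = Counter(profile for _, _, profile in failures)
    return by_gateway, by_profile


failures = [
    ("ep1", "gw-east", "office^1.0"),
    ("ep2", "gw-east", "office^1.0"),
    ("ep3", "gw-east", "antivirus^2.1"),
    ("ep4", "gw-west", "office^1.0"),
]
by_gateway, by_profile = failure_patterns(failures)
print(by_gateway.most_common(1))  # [('gw-east', 3)]
print(by_profile.most_common(1))  # [('office^1.0', 3)]
```

If one gateway or one profile accounts for nearly all failures, it is the natural place to start. A flat distribution points back to the individual endpoints.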


The other sections in this chapter detail how to enable tracing and which log files need to be reviewed for the different components.

Important: There is a manual called IBM Tivoli Configuration Manager Messages and Codes, SC23-4706, that comes with IBM Tivoli Configuration Manager. You can find messages and error codes from all IBM Tivoli Configuration Manager components in this manual. Messages (either from the GUI or command line) have a standard message format, for example:

AMNxxxxY

Where:
• AMN: Prefix for AP Editor and Monitor (GUIs and CLIs)
• xxxx: Message number
• Y: Severity code
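When searching logs for messages in this format, it can help to split a message ID into its three parts. The sketch below assumes that the component prefix is three uppercase letters, that xxxx is four digits, and that the severity code is a single uppercase letter. These are our assumptions about the pattern, and the sample ID AMN0123E is invented. Consult the Messages and Codes manual for the real prefixes and severity codes.

```python
import re

# Assumed shape of a message ID: 3-letter component prefix, 4-digit message
# number, 1-letter severity code. This pattern is our reading of "AMNxxxxY".
MESSAGE_ID = re.compile(
    r"^(?P<component>[A-Z]{3})(?P<number>\d{4})(?P<severity>[A-Z])$"
)


def parse_message_id(message_id: str) -> dict[str, str]:
    """Split a message ID into component prefix, number, and severity code."""
    match = MESSAGE_ID.match(message_id)
    if match is None:
        raise ValueError(f"not a recognized message ID: {message_id!r}")
    return match.groupdict()


# "AMN0123E" is an invented ID used only to exercise the parser.
print(parse_message_id("AMN0123E"))
```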

19.4.2 Check the log

The log is the first place to check for error information. It can be located on the TMR Server, a managed node, or the target. The log is the preferred method for checking for errors because:
• E-mail and notice groups are not as detailed
• There is an option to send standard out and standard error of a user-defined program to the log

In general, the first step in troubleshooting is to consult the log file, which contains more information than a Software Distribution notice group entry or an e-mail sent about the software package operation. Log files provide a detailed list of successful or failed attempts to distribute software packages for each endpoint. The append_log keyword is set in the software package definition file. Check this file to verify that the append_log keyword is set. If it is, the log file will contain information about software package operations.

The default location of the log file is on the TMR Server under $BINDIR/../swdis/work. However, it is possible to generate a log file on a specific managed node with the log_host_name attribute in the SPD file, which specifies the label of the managed node (typically the host name) where the log file is generated. The default name for the log is package-name^package-version.log. The software package name supports the use of the period (.) delimiter or the caret (^) symbol. You can specify the log file’s location on any managed node or


target in the Package Properties dialog of the Software Package Editor. To change the location of the log file using a software package definition file, update the log_file_path attribute.

We recommend that you generate a detailed log on the endpoint that records each action in the software package and the results of the change management operations on the package. The target log file is set with the log_object_list stanza in the SPD file, and the location keyword identifies the path name or subdirectory. If the directory does not already exist, it will be created. The log file is also named SPname.SPversion.log or SPname^SPversion.log, the same as the log file on the TMR Server or designated managed node. IBM Tivoli Configuration Manager, by default, overwrites the log file with each new distribution of the software package.
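To find the log quickly, it helps to compute where the default file should be. The sketch below builds the default path from the rules given in this section ($BINDIR/../swdis/work and package-name^package-version.log). The BINDIR value and the package name are placeholders of ours, and the helper is illustrative rather than part of any Tivoli tooling.

```python
import posixpath

# Sketch of the default server-side log location described in this section.
# The BINDIR value and the package name/version below are placeholders.


def default_log_path(bindir: str, name: str, version: str) -> str:
    """Build $BINDIR/../swdis/work/<name>^<version>.log as a normalized path."""
    file_name = f"{name}^{version}.log"
    return posixpath.normpath(
        posixpath.join(bindir, "..", "swdis", "work", file_name)
    )


print(default_log_path("/usr/local/Tivoli/bin/aix4-r1", "office", "1.0"))
# /usr/local/Tivoli/bin/swdis/work/office^1.0.log
```

If the SPD file sets log_host_name or log_file_path, the log will be elsewhere, so check those attributes before concluding that no log was written.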

19.4.3 Check the Distribution Status Console

You need to do the following tasks to check the Console:
• Determine which targets are failing
• Determine if repeaters are failing
• Determine the status of different distributions:
  – Waiting
  – Interrupted
  – Receiving

Consulting the Distribution Status Console is a good way to get a graphical representation of the status of all the systems involved in a distribution. By using the Distribution Status Console, you can determine which targets or repeaters did not receive the distribution. A PAUSED status for the distribution could be for an endpoint operating in mobile mode. Check the login_mode of the endpoint by running the wep command. If the target endpoint is not running in mobile mode and you did not pause the distribution, continue troubleshooting by checking the software package log file, the MDist log, the gatelog, and the lcfd.log to determine the point of failure.


Note: You can also cancel a distribution from the Distribution Status Console. To cancel (or pause or delete) a distribution, highlight the distribution and click the appropriate button. To cancel the distribution from the command line, enter the following command:

wmdist -c xxxxxxx...

Where xxxxxxx... is the distribution ID number.

19.4.4 Make sure that Tivoli Management Framework is functional

To make sure that the Framework is functional, consider the following:
• Suspect this problem if the endpoint was previously able to receive distributions and suddenly cannot
• Make sure the oserv is running on gateways
• Verify the setup of endpoints

Of course, a distribution will fail if gateways are not receiving the distribution or if endpoints are not connected to gateways. If a particular endpoint suddenly can no longer receive distributions, then Tivoli Management Framework is a good place to check for problems. Run the Tivoli Management Framework command odadmin odlist to confirm all systems are connected. Use the wping command to confirm the oserv is running on a particular gateway.

19.4.5 Check for MDist2 problems

Software Distribution uses MDist2 for distributions, and the distribution information can be found in the MDist2 Distribution Manager’s log, distmgr.log, which is located in $DBDIR. You can set the trace level to the maximum by running wmdist -D 9. Review this log for any possible causes and rectify them. Review the MDist2 settings to ensure that they are within range and that the time-out settings have not been exceeded.

Many distribution problems are caused by problems transmitting the package along the chain of repeaters to the endpoints that are the targets of a distribution. Failures occur, for example, because a connection is lost between an endpoint and its gateway, a distribution times out because of performance problems, or a user program fails or does not complete.


Common MDist2 problems

The following list provides some examples of distribution failures that are symptoms of problems with the distribution network of repeaters, gateways, and endpoints:
• You can successfully distribute a software package to one endpoint but a distribution to multiple endpoints fails.
  Ensure that the repeaters are optimized and configured correctly.
• You have network storms.
  Use the wmdist command to examine the MDist2 parameters and to change them if necessary. In particular, check the values of the max_sessions_high, max_sessions_medium, and max_sessions_low parameters, which set the number of allowable high-priority (default: 5), medium-priority (default: 10), and low-priority (default: 40) connections, respectively.
• You can no longer distribute to an endpoint to which you previously were able to send distributions.
  Ensure that the endpoint is connected to its gateway and is active.
• Distribution times out.
  If the distribution times out, check the values for the send and execute timeouts set by using the wmdist command.
• Distribution takes longer than expected.
  You can use the wmdist -I command to monitor the progress of loading the software package at each repeater (it gives the number of bytes transferred and the percentage complete). If you decide that performance is bad, you may decide to change the way in which your network is configured (netload). The alternative wdepot command checks for the existence of a package at a depot, and thus may be useful if the level of completion is of no interest to you. You may also consider configuring a machine that is continually used as a source host as a repeater. By configuring the source host as a repeater, you can tune the communication parameters of the machine so that the software package is routed directly from the source host to its gateway. This saves time and network load.


19.4.6 Verify the setup of endpoints

Do the following to verify the setup of endpoints:
• Verify the endpoint connection
• Verify endpoint configuration settings
• Verify the gateway log

The wep command can provide a list of all endpoints in a TMR and their assigned gateways, retrieve and set endpoint information, migrate an endpoint from one gateway to another, and update any endpoint data changes within a TMR. This command can also list information in the endpoint list that is maintained by the endpoint manager. The wadminep command with the view_config_info option lists the configuration settings for a particular endpoint.

After configuring a gateway, you can use the set_debug_level option of the wgateway command to track information about the gateway. The wgateway command lists gateway object identification numbers, names, and status within a TMR. Also, wmdist -s debug_level must be set to track information (MDist2 uses this level).

19.4.7 Check lost-n-found

In some cases, the problem is that the log host no longer exists or the software package no longer exists. To check this, run wchkdb:
• wchkdb moves the orphaned software package to lost-n-found.
• To check lost-n-found, run wls /lost-n-found.
• Use wmvspobj to move the software package to a profile manager.

Tivoli Management Framework provides a lost-n-found collection on the server to store database objects that are orphaned due to broken links or lost data. A software package is transferred to the lost-n-found collection when either its log host or its source host no longer exists. The wchkdb command moves the software package to the lost-n-found collection, and the wls /lost-n-found command lists the contents of this collection. Check the notice groups to determine why the software package was moved; a notice is generated any time a software package is moved to the lost-n-found collection. The wmvspobj command moves the software package from the lost-n-found collection to a profile manager. Finally, correct the problem that initially caused the software package to be moved before distributing it.


19.4.8 Troubleshooting the software package

Check the software package definition (SPD) file:
• Check it if you suspect that the software package itself is the problem, for example, if other software package distributions succeed.
• The SPD file provides details of the software package at a glance.
• Use the wgetspat command to get the attributes of the software package.

The SPD file allows for setting all possible properties and options in a readable text format, including those only available using an import or export. The SPD file can be considered the instruction or control file that defines which actions are performed and how. By convention, the software package definition file is given a suffix of .spd, for example, SPname.SPversion.spd. There are three ways to obtain it:
• In the Java Endpoint Software Package Editor, select File -> Save as and select the .spd suffix.
• Use the wexpspo command, which exports the content of a software package to either a file or standard out.
• On the Tivoli Desktop, right-click the SP profile and select Export.

The wgetspat command extracts the attributes of the SP object, which may be quite useful in debugging a problem. Some of the relevant attributes to review for diagnosing a problem are the settings for:
  – stop_on_error
    Specifies whether to stop (fail) a distribution to an endpoint when any error (fatal or non-fatal) occurs.
  – backup_fmt
    Specifies whether and where to back up any files being overwritten by the files distributed in the software package.
  – list_path
    Specifies where to write a list of files distributed to an endpoint.
  – prog_env
    Sets the environment for the configuration programs on an endpoint. (This keyword applies to UNIX and Windows NT/W2000 platforms only.)
  – log_file
    Specifies the file to which log information is written.
  – log_host
    Specifies the machine on which the log file resides.

19.4.9 Software Distribution traces

Traces provide more detailed information about packaging or distributions. They are enabled for the specific component related to the failed Software Distribution operation. Therefore, traces may be taken on the server, source host, endpoint, preparation site, or disconnected CLI. On endpoints, the trace level is set in the Software Distribution base configuration file, swdis.ini, located in the system directory on the target system for the respective OS platform:
• Windows: \winnt\ or \winnt40\
• OS/2: \os2\
• NetWare: sys:\System
• UNIX: /etc/Tivoli/ (global, for the root user) or $HOME/.swdis/ (local or private, for a non-root user)

Important: Starting with IBM Tivoli Configuration Manager Version 4.2, setting the trace level using swdis.ini works only for endpoints. A new command in IBM Tivoli Configuration Manager Version 4.2, wswdcfg, sets trace information on the Software Distribution servers and managed nodes. The syntax is as follows:

wswdcfg -s trace_level=n (where n is 0, 1, ... 6)
wswdcfg -h hostname

This command is not applicable to endpoints, where swdis.ini should be used.

There is also another troubleshooting command that is new with IBM Tivoli Configuration Manager Version 4.2: wmsgbrowse. It is used for investigating the Notification Manager queue (browsing the message queue, filtering messages, finding undelivered messages, and so on) in order to understand the problem. For details on both of these troubleshooting commands, please refer to the IBM Tivoli Configuration Manager Reference Manual for Software Distribution Version 4.2, SC23-4712.


The default trace level is 0 (none), as seen in Example 19-1, which means that tracing is disabled. A new trace level takes effect on the next distribution or execution.

Example 19-1 swdis.ini
[#SERVER]
product_dir=/usr/local/Tivoli/bin/swdis
working_dir=/usr/local/Tivoli/bin/swdis/work
backup_dir=/usr/local/Tivoli/bin/swdis/backup
trace_level=0
trace_size=1000000
report_threads_limit=10
inventory_rim_name=inv_query
autopack_dir=/usr/local/Tivoli/bin/swdis/autopack
staging_dir=usr/local/Tivoli/bin/swdis/service
user_file_variables=/usr/local/Tivoli/bin/swdis/swdis.var
import_libraries=spd,libscimp
[aix-tmr01b]
product_dir=/opt/Tivoli/swdis/1
working_dir=/opt/Tivoli/swdis/1/work
backup_dir=/opt/Tivoli/swdis/1/backup
trace_level=0
trace_size=1000000
send_timeout=300
autopack_dir=/opt/Tivoli/swdis/1/autopack
staging_dir=opt/Tivoli/swdis/1/service
user_file_variables=/opt/Tivoli/swdis/1/swdis.var
import_libraries=spd,libecimp
inventory_scan_file=/opt/Tivoli/lcf/inv/SCANNER/sd_scan.nfo

Note: It is possible to specify a new keyword in the swdis.ini file called trace_style, which is used to generate more than 10 trace files (the default). This keyword must be activated only on an L3/Development request.

There is no fixed maximum size for a trace file; the size is set by trace_size, and the default size per type is 1,000,000 bytes. When the specified trace_size is reached, writing moves to the next trace file, and eventually the first trace file is overwritten. For example, the trace files can be written from spo1.trc up to spo9.trc (spo1.trc, spo2.trc, ...), and when spo9.trc reaches the specified maximum size, spo1.trc is overwritten (unless the trace_style keyword is activated). Which trace files are written depends on the machine role for the installed component. With trace_level=0, the trace files are created initially as zero-byte files and stay empty until the trace_level is increased. Example 19-2 on page 892 shows the swdis.ini file with the trace level set to 5.
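The wrap-around of trace file names described above can be sketched as a simple modular counter. This is an illustration only: the limit of nine files is taken from the spo1.trc to spo9.trc example in the text, and the actual number of files kept by the product may differ (the note about trace_style mentions 10). The function name is ours.

```python
# Sketch of the trace file wrap-around: when file N fills up, writing moves
# to file N+1, and after the last file it wraps back to file 1.
# max_files=9 follows the spo1.trc ... spo9.trc example and is an assumption.


def next_trace_index(current: int, max_files: int = 9) -> int:
    """Return the 1-based index of the trace file used after `current` is full."""
    if not 1 <= current <= max_files:
        raise ValueError("current index out of range")
    return current % max_files + 1


sequence = [1]
for _ in range(10):
    sequence.append(next_trace_index(sequence[-1]))
print([f"spo{i}.trc" for i in sequence])
```

The practical consequence is that the highest-numbered file is not necessarily the newest one, so sort by modification time when collecting traces for diagnosis.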


Example 19-2 Listing in swdis.ini on endpoint
C:\WINNT>type swdis.ini |more
[3B-053]
speditor_dir=C:\Tivoli\swdis\1\speditor
product_dir=C:\Tivoli\swdis\1
working_dir=C:\Tivoli\swdis\1\work
backup_dir=C:\Tivoli\swdis\1\backup
profile_dir=C:\Tivoli\swdis\1\work\profiles
trace_level=5
trace_size=1000000
send_timeout=300
autopack_dir=C:\Tivoli\swdis\1\autopack
staging_dir=Tivoli\swdis\1\service
user_file_variables=C:\Tivoli\swdis\1\swdis.var
inventory_scan_file=C:\Tivoli\lcf\inv\SCANNER\sd_scan.nfo
[#MOBILE]
speditor_dir=C:\Tivoli\swdis\1\speditor
product_dir=C:\Tivoli\swdis\1
working_dir=C:\Tivoli\swdis\1\work
backup_dir=C:\Tivoli\swdis\1\backup
profile_dir=C:\Tivoli\swdis\1\work\profiles
trace_level=5
trace_size=1000000
send_timeout=300
autopack_dir=C:\Tivoli\swdis\1\autopack
staging_dir=Tivoli\swdis\1\service
user_file_variables=C:\Tivoli\swdis\1\swdis.var
inventory_scan_file=C:\Tivoli\lcf\inv\SCANNER\sd_scan.nfo
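Because swdis.ini can hold several sections, it is easy to raise the trace level in one section and overlook another. The following sketch reads the trace settings of every section with Python's standard configparser module. The sample text is an abbreviated copy of Example 19-2, and the function name is ours. Interpolation is switched off so that a percent sign in a path would not be misread.

```python
import configparser

# Abbreviated copy of Example 19-2; a real swdis.ini has more keys.
SAMPLE = r"""[3B-053]
product_dir=C:\Tivoli\swdis\1
trace_level=5
trace_size=1000000

[#MOBILE]
product_dir=C:\Tivoli\swdis\1
trace_level=5
trace_size=1000000
"""


def trace_settings(ini_text: str) -> dict[str, tuple[int, int]]:
    """Return {section: (trace_level, trace_size)} for every section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return {
        section: (
            parser.getint(section, "trace_level", fallback=0),
            parser.getint(section, "trace_size", fallback=1000000),
        )
        for section in parser.sections()
    }


for section, (level, size) in trace_settings(SAMPLE).items():
    print(f"[{section}] trace_level={level} trace_size={size}")
```

The fallback values mirror the defaults stated in the text (trace level 0, trace size 1,000,000 bytes). To check a real file, read its contents and pass the text to trace_settings.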

Example 19-3 shows the trace files on a Windows NT machine.

Example 19-3 Windows trace files
Directory of C:\Tivoli\swdis\1

02/22/2001  01:38a      <DIR>          .
02/22/2001  01:38a      <DIR>          ..
02/22/2001  01:38a                   0 runsped1.trc
02/02/2001  04:46p                   0 spde1.trc
02/08/2001  03:56p      <DIR>          speditor
03/06/2001  12:45p              61,658 tmesdis1.trc
02/02/2001  07:01p                   0 wdusrpr1.trc
02/02/2001  04:46p                 186 wdusrprf.bat
03/05/2001  05:03p      <DIR>          work

Example 19-4 on page 893 shows trace files on a UNIX machine.


Example 19-4 UNIX trace files
eastham> pwd
/usr/local/Tivoli/bin/swdis
eastham> ls -al
total 40
drwxr-xr-x   5 root  system  512 Feb 05 14:57 .
drwxr-xr-x  10 root  system  512 Feb 26 11:19 ..
drwxr-xr-x   2 root  system  512 Mar 06 15:15 apm
drwxr-xr-x   2 root  system  512 Mar 05 17:17 ccm
-rw-------   1 root  system    0 Feb 01 11:05 runsped1.trc
-rw-------   1 root  system    0 Jan 31 15:39 spde1.trc
-rw-------   1 root  system    0 Jan 31 20:02 spo1.trc

It may be worthwhile to erase any existing trace files to ensure a good capture for recreation or diagnosis.
• Software Distribution trace levels
  – 0: None (default)
  – 1: Fatal
  – 2: Error
  – 3: Warning
  – 4: Info
  – 5: Verbose
  – 6: On L3/Development request only
• Software Distribution trace flags
  – [F]: Fatal Failure
  – [E]: Error
  – [W]: Warning
  – [I]: Information

Here is a summary of the log files at the different locations:
• Server (spo_core.exe)
  – tmesdis*.trc
    • CLI
  – spo*.trc
    • Import/Export
    • Requests to source host
• Source Host (spd_eng.exe)
  – spde*.trc
    • Import/Export
    • Build
  – mdist*.trc
    • MDist2 interfaces
• Endpoint/PrepSite (spd_eng.exe)
  – tmesdis*.trc
    • CLI
  – spde*.trc
    • Build (prep site)
    • Execution
  – autopck*.trc
    • autopack (prep site)
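When a directory contains a mix of trace files, the summary above can be turned into a small lookup that tells you which role a file belongs to and what it records. The sketch below encodes the summary as a table of file name patterns. The data comes from the list above, while the function name and the lowercase role labels are ours.

```python
from fnmatch import fnmatchcase

# File name patterns per machine role, taken from the summary list above.
TRACE_FILES = {
    "server": {
        "tmesdis*.trc": "CLI",
        "spo*.trc": "Import/Export, requests to source host",
    },
    "source host": {
        "spde*.trc": "Import/Export, Build",
        "mdist*.trc": "MDist2 interfaces",
    },
    "endpoint/prep site": {
        "tmesdis*.trc": "CLI",
        "spde*.trc": "Build (prep site), Execution",
        "autopck*.trc": "autopack (prep site)",
    },
}


def describe_trace(file_name: str) -> list[tuple[str, str]]:
    """Return (role, content) pairs for every pattern the file name matches."""
    matches = []
    for role, patterns in TRACE_FILES.items():
        for pattern, content in patterns.items():
            if fnmatchcase(file_name, pattern):
                matches.append((role, content))
    return matches


print(describe_trace("spde1.trc"))
print(describe_trace("spo1.trc"))
```

A name such as spde1.trc matches more than one role, so the machine on which the file was found decides which description applies.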

19.4.10 Troubleshooting Data Moving

The Data Moving process flow may help you diagnose problems associated with Data Moving. We also cover logging and tracing for Data Moving.

Data Moving process flow

Figure 19-4 on page 895 shows the Data Moving process flow.


Figure 19-4 Data Moving process flow

Where:
1. The Data Moving command is submitted at the TMR Server side from the CLI or APM.
2. Checks are made at the TMR Server to verify that the DataMovingRequest.1 software package is a valid object.
3. The command is then forwarded to the source host.
4. Checks are made to ensure that the file is present. The package is then built at the source side before the scripts (if any) are run at the source host and the package is transmitted through MDist2.
5. The Distribution ID is assigned and transmitted to the TMR Server.
6. The Distribution ID is forwarded to the operator’s workstation from the TMR Server.
7. The package is distributed to the repeater from the source host.
8. The package is then transmitted to the endpoints and all the package actions are completed.
9. Results or data (for a Retrieve operation) are forwarded to the source host.
10. The after script (if any) is run at the source host.
11. Results are forwarded to the TMR Server.


Data Moving log file

The log and trace settings for Data Moving are the same as for Software Distribution software packages.
• DataMovingRequestxxx.log
  The DataMovingRequestxxx.log file is located under the working_dir designation in the swdis.ini file on the TMR Server, for example, on an AIX TMR Server under the /usr/local/Tivoli/bin/swdis/work/ path. The log file records information regarding Data Moving operations and distributions.

Data Moving trace files

The Data Moving trace files are:
• tmesdisxx.trc, swdmgrxx.trc, and spoxx.trc
  Located on the TMR Server. They report all the traces associated with the wspmvdata command. These files are unique in the case of interconnected TMRs when the Tivoli Software Distribution Server component is installed after the interconnection.
• spde*.trc
  Resides on the source host and endpoint. Records diagnostic information about the Import, Export, Build, and Execution processes.

19.4.11 Troubleshooting Mobile Computing

We will cover the Mobile Computing process flow and the Mobile Computing log and trace files to help you diagnose problems associated with software distribution to mobile workstations.

Tivoli Mobile Computing process flow

Figure 19-5 shows the Mobile Computing process flow.


Figure 19-5 Process flow for Mobile Computing

There is a distribution for the mobile endpoint in the gateway’s queue. When the endpoint logs into the gateway, the gateway sends the job queue to the mobile endpoint. The mobile endpoint disconnects from the network. The process flow in Figure 19-5 is as follows:

1. The user starts the mobile GUI in disconnect mode.
2. The mobile agent checks the local queue to see if any package needs to be shown in the mobile GUI.
3. A response is received from the local queue and the packages (if any) are displayed in the mobile GUI.
4. The user selects an action on a package in the mobile GUI. Control is given to the SWD Service.
5. The SWD Service handles the request and updates the queue.

When the mobile endpoint is reconnected to the network, results are sent back to the gateway to be updated in the RDBMS.


Mobile Computing configuration, log, and trace files

For troubleshooting the server side, set the trace level of the MDist2 Distribution Manager with wmdist -D 7 (the range is 0-9). If you are using a UNIX TMR Server, you can watch the trace in real time with tail -f $DBDIR/distmgr.log.

To watch what is happening on the gateways, set the trace level of the gateway with wgateway $gateway set_debug_level 7 (the range is 0-9). You can watch the trace in real time with tail -f $DBDIR/gatelog on the UNIX gateway concerned.

For tracing the Mobile Computing environment on the endpoint, you have two options:

• Setting logLevel=4 (the range is 0-4) in Mobile.cfg generates trace information for the Mobile Agent.
• Setting guiLogLevel=4 (the range is 0-4) generates trace information for the Mobile Console.

In both cases, the trace information is written to $LCF_DATDIR/Mobile/Mobile.log.

Setting the trace level of the endpoint can also be informative. Add log_threshold=5 to $LCF_DATDIR/last.cfg and restart the endpoint. The trace information is written to $LCF_DATDIR/lcfd.log.
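The Mobile.cfg settings described above can be applied by hand, but a small helper may be convenient when many endpoints need the same change. The following Python sketch is illustrative only: it assumes that Mobile.cfg is a flat file of key=value lines, which this chapter does not confirm, and the function name set_cfg_key is hypothetical. It works on text rather than on a file, so nothing is written to disk.

```python
def set_cfg_key(text, key, value, lo=0, hi=4):
    """Return the cfg text with key=value set.

    Assumption: the file is a flat list of key=value lines.
    An existing line for the key is replaced; otherwise one is appended.
    The 0-4 range matches the logLevel and guiLogLevel ranges given above.
    """
    if not lo <= value <= hi:
        raise ValueError(f"{key} must be between {lo} and {hi}, got {value}")
    lines = text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        name, sep, _old = line.partition("=")
        if sep and name.strip() == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Hypothetical file content, for illustration only.
sample = "logLevel=0\nguiLogLevel=0\n"
print(set_cfg_key(sample, "logLevel", 4))
```

Matching on the exact key name keeps guiLogLevel untouched when only logLevel is changed.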

19.4.12 Troubleshooting a pristine installation

For troubleshooting a pristine installation, we will cover the pristine process flow and pristine log files.

Process flow for pristine installations

Figure 19-6 shows the process flow of the pristine installation.


Figure 19-6 Pristine tool process flow

Windows configuration

Follow these steps:

1. Code Server is created.
2. Configuration of the Response File for the OS and Tivoli endpoint is completed.
3. Boot diskette configuration is performed. It is based on the Network Administrator Client for Windows.
4. Disk partition configuration is completed.
5. Pristine workstation is booted with the boot diskette.
6. When the operating system and TMA are installed, TMA connects to the dedicated gateway.
7. Login_Policy is run and workstation software is updated according to the Reference Model.


OS/2 configuration

Follow these steps:

1. Code Server is created.
2. External commands execution is performed; sedisk creates three bootable diskettes. The diskette has three other CID commands for network support.
3. Boot diskettes configuration is performed. It is based on the LAN CID Utility for OS/2.
4. Disk partition configuration is completed.
5. Pristine workstation is booted with the boot diskette.
6. When the operating system and TMA are installed, TMA connects to the dedicated gateway.
7. Login_Policy is run and workstation software is updated according to the Reference Model.

Pristine log files

The pristine tool is a utility that helps customers customize a native OS installation. The log information for each installed operating system component is located on the Code Server under the ImageSharingDrive\log\ path, because it is created by the operating system. No log file is generated at the server for the TMA installation. However, when the Login_Policy is run, a log file named pristine.log is generated.

19.4.13 Troubleshooting discovering and synchronization

Log information for the discovering and synchronization features can be collected through the following processes:

• Discovering

  The log file associated with the software package.

• Synchronization

  The wsyncsp.log file created in the working directory, reported in the swdis.ini file.


19.4.14 Change Management Status summary

Change Management (CM) Status is a handy way to understand the current status of the package. Table 19-1 summarizes the Change Management Status information.

Table 19-1 Change management status summary

Operation     State           Undo state     Reboot state           Flag
I (Install)   P (Prepared)    P (Prepared)   B (reBoot requested)   C (Changing)
R (Remove)    C (Committed)   U (Undoable)   D (Discovered)         E (in Error)
                              R (Restored)   H (Hidden)             -

Where: Pos 1: Operation

Indicates the last operation that was performed on the software package, either I (install) or R (remove).

Pos 2: State

Indicates the state of the software package, either P (prepared) or C (committed).

Pos 3: Undo state

If the SP is in an undo state, there will be a letter in the third position of the five character sequence, which can be P (prepared), U (undoable), or R (restored), or a dash (-), if the undo state does not apply.

Pos 4: Reboot

A B indicates a reboot was requested. A dash (-) indicates a reboot was not requested.

Pos 5: Flag

An E indicates the software package is in error and may not work properly.

ICU--

Install has been committed and can be undone.

IP-BC

Install has been prepared and will be committed during the next reboot.

RCU--

Remove has been committed but can be undone.

IC--E

Install has been committed, but the SP is in error (the application may not work properly).

IC-D-

The software package has been added with the use of the wdsetsps command.

IC-H-

The software package has been superseded by a later version of a package installed in undoable mode.
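
The position-by-position reading of the status string can be sketched in a few lines of Python. This is an illustration only, not part of the product: the function name decode_status and the dictionary keys are hypothetical, and the letter tables are taken from Table 19-1 and the examples above.

```python
# Letter tables taken from Table 19-1; a dash means "does not apply".
OPERATION = {"I": "install", "R": "remove"}
STATE = {"P": "prepared", "C": "committed"}
UNDO = {"P": "prepared", "U": "undoable", "R": "restored", "-": None}
POSITION4 = {"B": "reboot requested", "D": "discovered", "H": "hidden", "-": None}
FLAG = {"C": "changing", "E": "in error", "-": None}

TABLES = [
    ("operation", OPERATION),
    ("state", STATE),
    ("undo", UNDO),
    ("position4", POSITION4),
    ("flag", FLAG),
]


def decode_status(code):
    """Decode a five-character change management status such as 'IC--E'."""
    if len(code) != 5:
        raise ValueError(f"expected five characters, got {code!r}")
    result = {}
    for pos, (ch, (name, table)) in enumerate(zip(code, TABLES), start=1):
        if ch not in table:
            raise ValueError(f"unexpected {ch!r} at position {pos}")
        result[name] = table[ch]
    return result


for status in ("ICU--", "IP-BC", "IC--E", "IC-H-"):
    print(status, decode_status(status))
```

Rejecting unknown letters, rather than passing them through, makes a truncated or mistyped status string stand out immediately.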


In summary, the overall state of a software package is represented by a sequence of five letters.

19.5 Troubleshooting Activity Planner

The Activity Planner (AP) processes are described in detail in this section, which may help you diagnose AP problems. AP also has various log and trace files for its components, which are also covered.

19.5.1 Activity Planner processes

The internals of the APM processes are:

APM_core

All the APM threads are generated by a single process called APM_core.

Executer

The operations submitted through the Task Library and SWD plug-ins are managed by the Executer thread.

APMHandler

The APMHandler thread manages all the APM work, together with the Executer. The APMHandler also determines if an activity in an Activity Plan is eligible to start.

APMain

The APMain thread initializes APM, starts the Executer and APMHandler, and then waits for new requests. Requests are then queued to the APMHandler.

19.5.2 Activity Planner configuration file

The APM configuration file, apm.ini, sets the size, location, and debug level of the AP log and trace files. All AP log and trace files are created on the TMR Server; on a managed node, they are created only for the GUI component. The apm.ini file is created at installation time. Table 19-2 shows the location of apm.ini for each operating system.

Table 19-2 Location of apm.ini, APM configuration file

File name   Operating system   Path
apm.ini     UNIX               /etc/Tivoli
apm.ini     NT/W2000           $SystemDrive\WINNT

A sample apm.ini file is shown in Example 19-5.

Example 19-5 Sample apm.ini
;APM configuration file
[DEFAULT]
trace_level=0
working_dir=C:\Tivoli\bin\w32-ix86\..\w32-ix86\..\apm
trace_size=1000000
log_max_file=100000
log_level=5
plugin_download=enabled
log_file=apmlog
TME_Host=morbidelli
TME_User=tivapm
[MAIN]
trace_level=0
working_dir=C:\Tivoli\bin\w32-ix86\..\w32-ix86\..\apm
trace_size=1000000
[HANDLER]
trace_level=0
working_dir=C:\Tivoli\bin\w32-ix86\..\w32-ix86\..\apm
trace_size=1000000
[EXECUTER]
trace_level=0
working_dir=C:\Tivoli\bin\w32-ix86\..\w32-ix86\..\apm
trace_size=1000000
[APMCLI]
trace_level=0
working_dir=C:\Tivoli\bin\w32-ix86\..\w32-ix86\..\apm
trace_size=1000000
[APMEDITOR]
trace_level=0
working_dir=C:\Tivoli\bin\w32-ix86\..\w32-ix86\..\apm
trace_size=1000000
plugin_download=enabled
[MONITOR]
enable_auto_update=true
auto_update_interval=180
trace_level=0
working_dir=C:\Tivoli\bin\w32-ix86\..\w32-ix86\..\apm
trace_size=1000000
plugin_download=enabled


Tip: A new command, wtrcapm, can be used to change or view the current log and trace settings for the Activity Plan engine components. For example:

wtrcapm -H -s trace_level=3

changes the value of the trace_level key to 3 in the [HANDLER] section of the apm.ini file.
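
When several sections of apm.ini have been changed over time, it can help to list the trace level that each section currently carries. The following Python sketch is illustrative only and is not a replacement for wtrcapm. It assumes that apm.ini can be read by a standard INI parser, as the sample in Example 19-5 suggests; the function name trace_levels is hypothetical, and the embedded text is a shortened copy of the sample with the [HANDLER] level set to 3.

```python
import configparser

# Level names as listed for trace and log levels later in this chapter.
LEVEL_NAMES = {0: "none", 1: "fatal", 2: "error", 3: "warning",
               4: "information", 5: "verbose"}

# Shortened, illustrative copy of Example 19-5 (working_dir lines omitted).
SAMPLE = """\
;APM configuration file
[DEFAULT]
trace_level=0
log_level=5
log_file=apmlog
[HANDLER]
trace_level=3
trace_size=1000000
[MONITOR]
enable_auto_update=true
trace_level=0
"""


def trace_levels(ini_text):
    """Return {section: trace_level} for every section that sets trace_level."""
    # Python treats a section named DEFAULT as inherited defaults; renaming the
    # default section makes [DEFAULT] behave like any other section here.
    parser = configparser.ConfigParser(default_section="__unused__",
                                       interpolation=None)
    parser.read_string(ini_text)
    return {
        section: parser.getint(section, "trace_level")
        for section in parser.sections()
        if parser.has_option(section, "trace_level")
    }


for section, level in trace_levels(SAMPLE).items():
    print(f"[{section}] trace_level={level} ({LEVEL_NAMES.get(level, 'unknown')})")
```

To inspect a real file, the text returned by reading /etc/Tivoli/apm.ini (or the Windows location in Table 19-2) could be passed to trace_levels in place of SAMPLE.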

19.5.3 Activity Planner log files

Table 19-3 summarizes the log files for AP.

Table 19-3 Location of APM log files

Log type       File name   Operating system   Path
AP Monitor     apmon.log   UNIX               /tmp/
AP Monitor     apmon.log   NT/W2000           $SystemDrive\WINNT\
AP Editor      apmed.log   UNIX               /tmp/
AP Editor      apmed.log   NT/W2000           $SystemDrive\WINNT\
APM General    apmlog*     UNIX               working_dir in apm.ini
APM General    apmlog*     NT/W2000           working_dir in apm.ini
APM Internal   apm.log     UNIX               /tmp/
APM Internal   apm.log     NT/W2000           $SystemDrive\

Example 19-6 shows the APM log files on our UNIX TMR Server.

Example 19-6 APM log files
eastham> pwd
/tmp
eastham> ls -al ap*.*
-rw-------   1 root  system  735094 Mar 07 06:10 apm.log
-rw-r--r--   1 root  nobody     204 Mar 01 15:43 apm_uninst.log
-rw-r--r--   1 root  system   22334 Mar 06 18:51 apmed.log
-rw-r--r--   1 root  nobody      37 Mar 01 15:37 apmmn_uninst.log
-rw-r--r--   1 root  system   22839 Mar 06 17:05 apmon.log
eastham>

• APM general log: apmlog

  Records operational level messages for APM, including plan submission and completion. It is created by default.

• AP Monitor log: apmon.log

  Specific to the Activity Planner GUI.

• AP Editor log: apmed.log

  Contains log information specific to the AP Editor.

• APM internal log: apm.log

  Contains all information related to the APM_core functionality. It records all APM calls made to its IDL interface, as well as all the JVM initialization and completion messages. It is not generated by default. To generate the APM internal log, apm.log, the APM environment variable __APM_DEBUG__ must be enabled, either with the Tivoli Management Framework command odadmin environ set or by setting a system environment variable. Example 19-7 shows how the odadmin environ get and odadmin environ set commands are used to enable __APM_DEBUG__.

Example 19-7 odadmin environ example
eastham> odadmin environ get >/tmp/environ
eastham> echo __APM_DEBUG__=1>>/tmp/environ
eastham> odadmin environ set </tmp/environ
eastham> kill 15382
15382 26368 28916

19.5.4 Activity Planner trace files

All APM trace files are located in the /apm/ subdirectory under the product_dir designation in swdis.ini on the TMR Server. Example 19-8 shows the APM trace files on our UNIX TMR Server.

Example 19-8 APM trace files
eastham> pwd
/usr/local/Tivoli/bin/swdis/apm
eastham> ls -al
total 10880
drwxr-xr-x   2 root  system      512 Mar 07 06:00 .
drwxr-xr-x   5 root  system      512 Feb 05 14:57 ..
-rw-r--r--   1 root  system  1000042 Feb 27 16:35 APDefault0.trc
-rw-r--r--   1 root  system      259 Mar 07 06:05 APDefault1.trc
-rw-r--r--   1 root  system  1000004 Feb 27 16:33 APDefault2.trc
-rw-r--r--   1 root  system  1000083 Mar 04 15:03 APExecuter0.tr
-rw-r--r--   1 root  system     2635 Mar 07 06:00 APExecuter1.tr
-rw-r--r--   1 root  system  1000021 Mar 05 16:43 APMCli0.trc
-rw-r--r--   1 root  system     5187 Mar 07 06:00 APMCli1.trc
-rw-r--r--   1 root  system  1000071 Feb 05 13:16 APMHandler0.tr
-rw-r--r--   1 root  system    34647 Mar 07 06:00 APMHandler1.tr
-rw-r--r--   1 root  system   292027 Mar 07 06:00 APMMain0.trc
-rw-r--r--   1 root  system   100024 Feb 10 19:00 apmlog0
-rw-r--r--   1 root  system      416 Mar 07 06:00 apmlog1
-rw-r--r--   1 root  system    84063 Feb 01 12:25 logs.tar.Z
-rw-r--r--   1 root  system      170 Mar 06 21:01 repqueue.dat

Where:


APDefault0.trc

Contains all trace messages related to threads not tracked in the other files. It is considered a general trace file.

APExecuter.trc

Reports all traces associated with the Executer thread that manages the operations submitted at the Task Library and SWD plug-ins.

APHandler.trc

Contains all the traces associated with the APMHandler thread.

APMain.trc

Records the Main thread traces, which involves initialization of APM, starting of the Executer, and the APMHandler.

APMCli.trc

Contains the traces related to the CLI execution.

APMonitor.trc

This trace file records traces associated with the use of the Activity Plan Monitor.

APEditor.trc

This trace file records traces associated with the use of the Activity Plan Editor.


Setting the GUI trace level

To set the trace level for the Activity Planner GUI, press the F2 key to display the Update trace level dialog and type the new value in the Insert new trace level text box. Possible values are numbers between 0 and 5; the default level is 0.

Note: If you do not have write access to the folder where the GUI traces are written, the trace information is written to the user’s home directory.

Figure 19-7 shows a sample APM executer trace file, APExecuter(0..).trc.

Figure 19-7 APM executer trace file

Valid values for trace and log levels are:

• 0 (none)
• 1 (fatal)
• 2 (error)
• 3 (warning)
• 4 (information)
• 5 (verbose)


The default value for trace_level is 0; no trace is generated unless the trace level is modified. The default value for log_level is 5; all messages are logged unless the log_level is modified.

19.6 Troubleshooting Change Manager

In this section, we will cover the Change Manager (CM), or Configuration Change Manager (CCM), log and trace files, and how to customize them.

19.6.1 Change Manager configuration file

The location of the CM configuration file, confccm.xml, for each operating system is shown in Table 19-4.

Table 19-4 Location of CCM configuration file

File name     Operating system   Path
confccm.xml   UNIX               /etc/Tivoli/
confccm.xml   NT/W2000           $SystemDrive\WINNT\

The CM configuration file is organized in stanzas that define CM implementations, such as elements, dependencies, and security. It can be customized to add new Java classes that change the current implementation, for example, when you want to add elements other than those currently supported. It can also be customized to set the debug level for the CM traces. All log and trace files are created on the TMR Server.

19.6.2 Change Manager log files

The CM log file, ccm.log, records the same information as the ccm_apm*.trc file, but only the entries generated by the C code of the CCM_core executable. It is not generated by default. The log file is created when the CM environment variable __CCM_DEBUG__ is enabled, either with the Tivoli Management Framework command odadmin environ set or as a system variable. However, it may be necessary to kill or recycle CM for the environment variable setting to take effect.

The location of ccm.log is shown in Table 19-5.

Table 19-5 Location of CM log file

File name   Operating system   Path
ccm.log     UNIX               /tmp/
ccm.log     NT/W2000           $TMPDIR

19.6.3 Change Manager trace files

CM trace files are located under the directory specified in the working_dir parameter of the swdis.ini file on the TMR Server.

• ccm*.trc

  The ccm*.trc trace file contains all of the actions performed using the CM GUI. For example:

  – Creation of a reference model
  – Import of a reference model
  – Export of a reference model
  – Preview operation

  It is located on the TMR Server and on those managed nodes where the CM GUI is installed, under the $(working_dir) directory.

• ccm_apm*.trc

  The ccm_apm*.trc trace file contains all the actions performed by CCM_core when other applications, such as APM, SD WEB UI, and Pristine, interact with CM. All of the traces tracked by the Java code executed by the APM Java Virtual Machine are reported in this file. Examples of what is recorded include:

  – Submit operations for APM
  – CM answer to a WEB UI request
  – CM synchronization operation for pristine machines

  The ccm_apm*.trc trace file is located only on the TMR Server.

• ccm_webxxx.trc

  Contains the history of all operations performed by the Change Manager engine when interacting with the new WEB UI 4.2 on the Application server.

• ccm_clixxx.trc

  Contains the history of all the operations performed using the CM command line, as well as the operations performed when Change Manager interacts with the Activity Planner to download the plug-in information.

You can set the trace level to determine the level of detail recorded for each operation. The trace file is only created if the trace_level keyword is set to a value greater than zero (the default is zero). Trace values are as follows:

– 0 (none)
– 1 (fatal)
– 2 (error)
– 3 (warning)
– 4 (info)
– 5 (verbose)

You can set the trace level using the wtrcccm command. Alternatively, you can use the Change Manager GUI and press the F2 key. When you press the F2 key, the Update trace dialog box displays, and you can enter a new value in the New trace level text box. Also, trace_size determines the maximum number of records that can be written to the file (the default is 1000000).

19.7 Troubleshooting Web Gateway and device management

In this section, we will cover troubleshooting the Web Gateway and device management, starting with installation troubleshooting.

19.7.1 Troubleshooting Web Gateway installation

Review the error message shown in the failed installation and review the log file cmsummary.log. The example error message in Figure 19-8 indicates that the installation program is failing to install the Web Gateway database.


Figure 19-8 Failed TWG installation dialog

You can check the following in this case:

• Ensure that the dmsadmin and dmsuser user IDs were successfully created on the Web Gateway database server.

• Verify that the passwords provided to the Web Gateway database installation are correct. Verify the passwords by connecting to DB2 with the user name and password specified. From a DB2 environment, type:

  db2 connect to dms user dmsadmin using password

  Note: This command only works if the Web Gateway database was created during the database installation.

• Ensure that the directories specified during the Web Gateway database installation have sufficient disk space. These directories are Database home and Database container home.

• Ensure that the DB2 instance specified during the Web Gateway database installation is correct. To list the valid DB2 instances, type db2ilist from a DB2 command environment.

• Ensure the DB2 port is correct. Open the services file and locate the following line (for readability, the line below appears on two lines):

  db2c /tcp #Connection port for DB2 instance

  On UNIX, the services file is /etc/services. On Windows, it is drive:\WINNT\system32\drivers\etc\services.

• You can review the log files for more information. The log files are located in the /tmp/dms_top/logs/pid/ directory on the Web Gateway database server.

For Web Gateway installation problems, you can also check for the existence of the TWGinst_stdout.log and TWGinst_stderr.log log files on the Web Gateway server. Review the log files to determine where the install is failing. If the files do not exist, run the TWG_inst_driver.bat file from the TivTwg\tmp_inst directory and redirect the output to a file. Review the output file to determine the point of failure.
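
The DB2 port check in the list above can be sketched in Python. This is an illustration under stated assumptions, not a Tivoli or DB2 tool: it relies only on the standard layout of a services file (name, port/protocol, optional comment after #), the function name find_service_ports is hypothetical, and the service name db2c_db2inst1 and port 50000 in the sample are invented for the example.

```python
def find_service_ports(services_text, prefix="db2c"):
    """Return {service_name: (port, protocol)} for names starting with prefix."""
    found = {}
    for raw in services_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop any trailing comment
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue
        name = fields[0]
        port, _slash, protocol = fields[1].partition("/")
        if name.startswith(prefix) and port.isdigit():
            found[name] = (int(port), protocol)
    return found


# Hypothetical services entries, for illustration only.
sample = (
    "ssh            22/tcp\n"
    "db2c_db2inst1  50000/tcp   # Connection port for DB2 instance db2inst1\n"
)
print(find_service_ports(sample))
```

On a real system, the text of /etc/services (or the Windows services file named above) would be read and passed in, and the result compared with the port given to the Web Gateway database installation.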

19.7.2 Common Web Gateway and device management problems

Here are some typical problems encountered when using the Web Gateway and device management.

Problems with starting the Web Gateway

• Problem: The following message appears in the DMS_stdout.log file when Web Gateway is starting in WebSphere Application Server:

  java.lang.ClassCastException

  Solution: The wrong JDBC driver is being used. Web Gateway requires the JDBC 2.0 driver. You must configure DB2 to use the JDBC 2.0 driver and reinstall Web Gateway with the JDBC driver home installation parameter set to the JDBC 2.0 driver.

• Problem: The following message appears in the DMS_stdout.log file when Web Gateway is starting in WebSphere Application Server:

  DYM2794E: Failed to create the database connection pool. COM.ibm.db2.jdbc.DB2Exception: [IBM][JDBC Driver] CLI0616E Error opening socket. SQLSTATE=08S01

  Solution: Ensure that DB2 is started and that the DB2 client is configured correctly.

• Problem: When starting Web Gateway in WebSphere Application Server, the following message appears in the DMS_stdout.log file:

  DYM2718E: An error occurred while trying to initialize the Policy Director environment.

  Solution: This message occurs when the IBM Tivoli Access Manager Java Runtime Environment is not installed and configured correctly on the Web Gateway server. Verify that the IBM Tivoli Access Manager Java Runtime Environment is installed on the Web Gateway server.

• Problem: When starting Web Gateway in WebSphere Application Server, the following message appears in the DMS_stdout.log file:

  DYM2719E: An error occurred while trying to create a Policy Director context.

  Solution: The Web Gateway server is not configured correctly. Open the twgConfig.properties file to verify that the PD_ADMIN_USERID and PD_ADMIN_PW values are correct. To verify these values, log on to the pdadmin utility on the IBM Tivoli Access Manager Server by typing the following:

  pdadmin -a sec_master -p password

  This message also occurs when the IBM Tivoli Access Manager Java Runtime Environment is not installed and configured correctly on the Web Gateway server.

• Problem: When starting Web Gateway in WebSphere Application Server, the following message appears in the DMS_stdout.log file:

  com.tivoli.pd.jutil.PDExceptionjava.io.FileNotFoundException: pd_config_file (No such file or directory)

  Solution: The Web Gateway server is not configured correctly. Open the twgConfig.properties file to verify that the file named by the PD_CONFIG_FILE value exists on the Web Gateway server.

• Problem: Unable to log in to Web Gateway Server.

  Solution: Do the following:

  – Use the IP address instead of the host name for the Web Gateway server to check whether it is a DNS issue.
  – For a Palm OS device, check the settings in the Config.INI used to create the Config.PDB file. You can regenerate a corrected Config.PDB and install it on the Palm device or, alternatively, modify the settings on the device.
  – If you are using IBM Access Manager WebSEAL Server, make sure to include the WebSEAL_hostname and junction_name in the URL for the server.
  – If an HTTP 400 error occurs when connecting, check the name resolution. Make sure the host PC can contact the Web Gateway server.
  – If the conduit returns an error or HTTP error code 500, make sure the IBM WebSphere AdminServer 4.0 service is started.
  – If you cannot connect to the server, check the proxy setting and port number. The port number should be 80.
  – If HTTP error 404 occurs, check the servlet name.
  – If a Palm OS device uses the network/modem connection when the device is attached to the host PC with a cradle, use AttachmentOption=2 to specify that the Palm device should always use the cradle connection. A new Config.PDB file will need to be generated and copied to the Palm device.

Problems with using the Web Gateway

• Problem: The Web Gateway server started without errors. Then, the following message appears in the DMS_stdout.log file:

  SQL0973N Not enough storage is available in the "APP_CTL_HEAP" heap to process the statement.

  Solution: To address this problem, refer to the IBM Tivoli Configuration Manager User's Guide for Deployment Services Version 4.2, SC23-4710.

• Problem: The Web Gateway server started without errors. Then, DB2 creates messages saying that the ISPB_DATA or ISPB_INDEX table space is full.

  Solution: To address this problem, refer to Chapter 6, “Maintaining and troubleshooting a configuration management environment”, in IBM Tivoli Configuration Manager Planning and Installation Version 4.2, GC23-4702. You also need to reorganize the database tables; refer to the IBM Tivoli Configuration Manager Release Notes for information.

• Problem: On AIX, the Web Gateway server started without errors. Then the following message appears in the DMS_stdout.log file:

  Could not fork process

  Solution: Increase the maximum number of file descriptors in AIX. Setting this value to 5000 should be sufficient. Use ulimit -a to display the current limits. Use the following command to set the value to 5000 in the terminal in which WebSphere Application Server is started:

  ulimit -n 5000

• Problem: The Web Gateway server started without errors. Then, the following message appears in the DMS_stdout.log file:

  java.lang.OutOfMemory

  Solution: This message indicates that the maximum heap size for the DMS_AppServer application server process has been reached. The default heap size is 256 MB. Use the WebSphere Application Server Administrative Console to increase the maximum value of the heap to a number larger than the default, such as 512 MB.
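
For the file descriptor case in the list above, the limit in effect for a process can also be checked programmatically. The following Python sketch is illustrative only: it uses the standard resource module, which is available on UNIX-like systems, and it reports the limit of the process that runs it, so it is only meaningful when started from the same shell that starts WebSphere Application Server. The function name nofile_limit_ok is hypothetical, and the value 5000 is the one suggested in this section.

```python
import resource


def nofile_limit_ok(required=5000):
    """Return True if the soft open-file limit is at least `required`."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no limit" and must be tested separately,
    # because its numeric value is platform dependent.
    return soft == resource.RLIM_INFINITY or soft >= required


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft} hard={hard} ok_for_5000={nofile_limit_ok()}")
```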

Problems with registering device classes and job classes

• Problem: When installing Web Gateway on AIX, the device classes and job types are not registered.

  Solution: This is a known problem. It occurs with versions of WebSphere Application Server earlier than Version 4.0.3. Web Gateway requires Version 4.0.3. Verify that WebSphere Application Server is at the required level and reinstall Web Gateway.

Problems with enrolling a device

• Problem: When trying to automatically enroll a device in Web Gateway, the following message appears in the DMS_stdout.log file:

  DYM2043E: A device entry was not inserted into the database because the server setting indicates AUTO_ENROLL is set to false.

  Solution: You must register Web Gateway with the Tivoli Server and enable auto-enrollment for that Web Gateway. To fix the problem, do the following:

  a. Set up the Tivoli command prompt environment on the Tivoli Server.
  b. Run this command on the Tivoli Server:

     wresgw add endpoint -C TWG

  c. Run this command on the Tivoli Server:

     wresgw autoenroll enable endpoint

Problems with connecting the agent to the Web Gateway

• Problem: The Nokia 9200 Communicator Series agent will not connect to the Web Gateway server.

  Solution: To try enrolling or processing a job, disconnect and reconnect the Nokia 9200 Communicator Series device to the host PC. If there is a RS_NO_JOBS_TO_RUN or RS_JOB_COMPLETED message near the end (last ten or so lines) of the JavaAgentLog.txt file, the device agent has successfully connected. If the connection failed, the log file contains a Connection failed or Unable to connect string near the end of the file. The trace contains the Web addresses that the device agent tried to connect to for the plug-in and the enrollment servlet. If the Web addresses are incorrect, the connection fails. Verify that the Web addresses are correct.


Note: Whether logging is enabled or disabled, if there is a TNIERROR.txt file in the installation directory, there have been some serious startup problems. If the TNIERROR.txt file is present, it contains information about the problem. 򐂰 Problem: The device agent cannot connect to the Web Gateway server.

Solution: The device agent must be able to resolve and reach the following server addresses: – Initial connection Web address or Server URL – Server redirect host name – Enrollment server Web address If any of these Web addresses are set up with host names instead of the IP address and you do not have DNS set up on the device (or if there is some other TCP/IP connection issue with reaching the Web address from the device), the agent is unable to connect to the management server. For PalmOS and Windows CE agents, if the host name or address cannot be resolved or reached, the host name or address is displayed. To change the initial connection Web address or Server URL, do the following: – For PalmOS and Windows CE devices, this address is configured with the device agent configuration user interface. – The Nokia 9200 Communicator Series agent stores this address in the NokiaInterfaceSettings.cfg file, which is located in the default installation directory on the host PC. 򐂰 Problem: A return code was received when attempting to connect a device to Web Gateway.

Solution: There are several return codes displayed on the device screen or written to log files when a connection between the device and Web Gateway is not working properly. Generally, the PalmOS agent displays the HTTP return codes on the device screen. The Windows CE and Nokia 9200 Communicator Series agents only indicate a connection failure message. For any type of agent to server communication, the access log file on the HTTP server, which is being connected to, also tracks these return codes in the second to last field in each log file entry. The last field in each log file entry is the number of bytes being sent in the body of the response.

916

Troubleshooting Tivoli Using the Latest Features

The following are some common HTTP return codes used during Web Gateway device agent to server communications:

– 200
In general, a 200 return code indicates a successful connection to the particular URL. However, this return code is also used when the HTTP server has returned an HTML content page with error messages in the body of the response. The device agents do not show HTML content pages.

– 401: Access to URL is not authorized
If IBM Tivoli Access Manager or some other HTTP authentication front-end is used, this return code occurs if the user ID or password configured in the device agent is incorrect.

– 403: Access to URL is forbidden
This return code occurs if there is a problem with the security configuration of the HTTP server or client.

– 404: URL not found
This return code occurs if the path portion of the servlet name that was configured on the client or in the enrollment server Web address is incorrect. This return code also identifies when the Web Gateway application server is not running within WebSphere. Use the WebSphere Administration Console to verify the status of the DMS_AppServer application server.

– 405: Method not allowed
This return code occurs if the client connection URL path or enrollment server Web address is configured to an incorrect Web Gateway servlet path, such as if the client was configured to connect to an HTML Web page.

– 500: Internal server error
This return code indicates that WebSphere Application Server is not running. This return code also occurs if there is an error within the processing servlets. Use the DMS_stdout.log and DMS_stderr.log files to obtain more details. For additional details, enable tracing for the plug-in and dmserver components.

– 502
If this return code occurs when connecting to the DeviceEnrollmentServlet, it usually indicates incorrect or missing parameters. To obtain more details, use the DMS_stdout.log and DMS_stderr.log files.

– 925
Refer to “Receiving return codes from the C Language APIs” on page 920.
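Because the HTTP server access log records the return code in the second to last field of each entry and the byte count in the last field, both values can be pulled out of an entry with a few lines of script. The following Python sketch is only an illustration: the sample log entry and the servlet path in it are invented, and the hint texts are paraphrased from the list above rather than taken from any product table.

```python
# Hints paraphrased from the return code list above; not an official mapping.
HINTS = {
    "200": "connection succeeded (the body may still be an HTML error page)",
    "401": "user ID or password configured in the device agent is incorrect",
    "403": "security configuration problem on the HTTP server or client",
    "404": "wrong servlet path, or the Web Gateway application server is not running",
    "405": "URL points at something other than a Web Gateway servlet",
    "500": "WebSphere Application Server not running, or a servlet processing error",
    "502": "incorrect or missing parameters (DeviceEnrollmentServlet)",
}


def status_and_bytes(entry):
    """Return (return_code, bytes_sent) from one access log entry.

    The return code is the second to last whitespace-separated field and
    the byte count is the last field.
    """
    fields = entry.split()
    if len(fields) < 2:
        raise ValueError("entry has fewer than two fields")
    return fields[-2], fields[-1]


def explain(entry):
    """Return (return_code, hint) for one access log entry."""
    code, _ = status_and_bytes(entry)
    return code, HINTS.get(code, "not covered by the list above")


# Hypothetical entry, invented for illustration only.
sample = ('9.3.4.45 - - [12/Aug/2002:17:18:16 -0600] '
          '"POST /dmserver/SomeServlet HTTP/1.0" 404 285')
print(explain(sample))
```

Splitting on whitespace and counting from the end avoids having to parse the quoted request string, which can itself contain spaces.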

Problems with publishing and downloading a package

• Problem: When publishing a package using the wweb command, the following message appears in the DMS_stdout.log file:
DYM2725E: Received a Policy Director error while assigning users to a package:

Solution: The Web Gateway server is not configured correctly. Open the twgConfig.properties file to verify that the WEBSEAL_MOUNT_POINT value is correct. To verify this value, start the pdadmin utility and type in the following command:
object list /WebSEAL

Using the host name of the WebSEAL server returned by the previous command, type the following command to find the junction point:
object list /WebSEAL/hostname

Use the exact output, both format and case, to specify the appropriate junction point. The format is as follows:
/WebSEAL/hostname/junction_point

• Problem: Unable to download packages for another user when using the Web Interface.

Solution: The Web Gateway server is not configured correctly. Open the twgConfig.properties file to verify that the WEBSEAL_ENABLED parameter is set to true.

• Problem: Unable to download a package published to a user using the wweb command when using the Web Interface.

Solution: The Web Gateway server is not configured correctly. Open the twgConfig.properties file to verify that the WEBSEAL_PROTOCOL, WEBSEAL_HOST_NAME, and WEBSEAL_PORT parameters have the correct values.
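The three solutions above all come down to checking a handful of WEBSEAL_* entries in twgConfig.properties, so a quick scripted check can save time. The Python sketch below is a rough aid under stated assumptions: it assumes simple key=value lines (it does not implement the full Java properties syntax), it assumes that WEBSEAL_MOUNT_POINT holds a value in the /WebSEAL/hostname/junction_point format, and the sample values are invented. It cannot tell whether a host name or port is actually correct, only whether a value is present.

```python
REQUIRED = ("WEBSEAL_ENABLED", "WEBSEAL_MOUNT_POINT", "WEBSEAL_PROTOCOL",
            "WEBSEAL_HOST_NAME", "WEBSEAL_PORT")


def parse_properties(text):
    """Parse simple key=value lines; comments and blank lines are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_webseal(props):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = [name + " is missing or empty"
                for name in REQUIRED if not props.get(name)]
    enabled = props.get("WEBSEAL_ENABLED", "")
    if enabled and enabled.lower() != "true":
        problems.append("WEBSEAL_ENABLED is not set to true")
    mount = props.get("WEBSEAL_MOUNT_POINT", "")
    parts = mount.split("/")
    # Case matters: the text says to copy the pdadmin output exactly.
    well_formed = (len(parts) >= 4 and parts[0] == ""
                   and parts[1] == "WebSEAL" and all(parts[2:]))
    if mount and not well_formed:
        problems.append("WEBSEAL_MOUNT_POINT does not look like "
                        "/WebSEAL/hostname/junction_point")
    return problems


# Hypothetical file content, invented for illustration only.
sample = """\
# twgConfig.properties (excerpt)
WEBSEAL_ENABLED=false
WEBSEAL_MOUNT_POINT=/webseal/host1
WEBSEAL_PROTOCOL=https
WEBSEAL_HOST_NAME=host1
"""
for problem in check_webseal(parse_properties(sample)):
    print(problem)
```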


Problems with running jobs for devices

• Problem: A job ran on a device successfully, but the results do not appear on the Tivoli Server.

Solution: Verify that the endpoint on Web Gateway is successfully communicating with the Tivoli Server. To verify this, type the following on the Tivoli Server:
wep endpoint status

• Problem: A job was submitted to a device. When the device connects to Web Gateway, the following message is displayed:
No job is submitted for your device

Solution: Verify that the target devices for the distribution included that device. To list the devices for the distribution, type the following command from the Tivoli Server:
wwebgw -d dist_id @Endpoint:web_gw_target

If the device is not listed, resubmit the job to your device and then rerun the wwebgw command. If the device is listed, verify that the job types are properly registered. Type the following command to list the registered device classes and their job types:
TWG_HOME/bin/deviceclass.sh –list

• Problem: When trying to run a job on devices in a clustered Web Gateway environment, the job fails because the software package or inventory profile cannot be accessed.

Solution: Verify that the IBM HTTP Server on the primary server in the cluster is running. Software packages and inventory profiles reside on the primary server.

• Problem: The distribution was successful (profiles successfully distributed), but no inventory scan or Software Distribution operation was performed on the device.

Solution:
– Check the DB2 database of the Web Gateway to confirm that jobs have been created in it. Open a DB2 command line and run:
db2 connect to dms user dmsadmin using db2 select * from submitted_job

If there are jobs in the database, you should get an output similar to Figure 19-9 on page 920.


Figure 19-9 Inventory scan job in Web Gateway database

– Check to make sure that the device is a member of the resource group to which you have distributed the profile. A dynamic resource group only defines its members at runtime.
– Check to make sure that the conduit is installed on the host PC.
– Do not use resource groups with names that begin with _INTERNAL_RESGRP. These groups are automatically created by Resource Manager during its operation and are automatically deleted when they are no longer required.

• Question: The Web Gateway server was configured incorrectly. Before I fixed the configuration in the twgConfig.properties file, I submitted jobs to devices. Will those jobs still run on the devices?

Answer: No. You must resubmit the jobs to the devices.

Receiving return codes from the C Language APIs

• Problem: A return code of 925 was received when attempting to create or delete a device, publish or unpublish a package, or submit a job. What does this mean and how is it debugged?

Solution: A 925 return code means there is a problem contacting Web Gateway. Verify that Web Gateway is started in WebSphere Application Server.

• Problem: A return code was received when attempting to create or delete a device, publish or unpublish a package, or submit a job. The return code value was not 925.

Solution: Verify that Web Gateway is started in WebSphere Application Server. You need to enable the twgapi component trace to obtain debugging information.


Using a non-standard port number

• Question: If the Web Gateway server is running on a non-standard HTTP port, are there any post-installation steps that I need to follow?

Answer: Yes. Refer to Chapter 6, “Maintaining and troubleshooting a configuration management environment”, in IBM Tivoli Configuration Manager Planning and Installation, Version 4.2, GC23-4702.

Using Web Gateway in a cluster environment

• Problem: When starting Web Gateway in WebSphere Application Server in a cluster environment, the following message appears in the DMS_stdout.log file:
DYM9038W: Machines {0} and {1} might be running results collector servlets that are accessing the same Web Gateway database. This could cause data corruption or loss. Please check your machines' configuration.

Solution: This message is caused by not setting the cluster environment to true for all non-primary servers in the Web Gateway cluster.

Inventory problems

The inventory scan completed successfully on the devices, but there is no data in the database. The scanned data is stored on the Web Gateway, and the Web Gateway component makes an upcall to the gateway to request data collection. The data is collected in the same way as for inventory scans of PCs and UNIX boxes. Check the mcollect.log on the gateway. Refer to 19.10, “Troubleshooting Inventory” on page 928 for more detail on troubleshooting the inventory data collection. Enable tracing of the traceEnabled.resultscollector component as detailed above and review the output log file.

Software Distribution problems

Because several components are involved in a distribution to devices, the first step is to understand where the distribution has failed. When a package is distributed, it arrives on the endpoint where the Web Gateway is installed, and there it is converted to TWG jobs. If jobs are not created, the problem is in the Software Distribution code (for example, the path specified as the destination was too long and the file was not created at the endpoint). If jobs are generated, but there were errors executing them, the problem can be at the TWG or device level. For the reporting flow, reports are generated by TWG code and sent to the SWD notification manager, which is another place to check for the problem. Problem determination is different for each of these steps.


A good starting point is to check the swd_profile_name.log for the details of the failure. Refer to 19.4, “Troubleshooting Software Distribution” on page 882 for more details on tracing failed distributions.

Resource Manager problems

A general failure that occurs when trying to register the resource type could be due to a communication failure with the Web Gateway, or the Web Gateway may not be functioning. These errors should show up in TRMRDBMS.log and TRMResourceManager.log in the $DBDIR directory. There are also other TRM*.log files for the various components of Resource Manager on the TMR Server under the $DBDIR directory. Review the log that relates to the problem you are encountering to further determine its cause. The logs for the various components of Resource Manager are:

• TRMDGMAppMgr.log
• TRMDGMAppMgrUI.log
• TRMDGMDowncalls.log
• TRMDGMRegistry.log
• TRMGroup.log
• TRMGroupUI.log
• TRMRDBMS.log
• TRMResourceManager.log
• TRMResourceManagerUI.log
• TRMUserDB.log
• TRMUserUI.log

Logging can be changed by setting the following variables in the Tivoli environment (odadmin environ get/set):

TRM_DEBUG_LEVEL = (LEVEL_DBG_MIN/LEVEL_DBG_MID/LEVEL_DBG_MAX)
TRM_MAX_LOG_SIZE = log files max size
TRM_LOG_PATH = path to store log files

19.7.3 Tracing the Web Gateway

On the Web Gateway, locate the traceConfig.properties file in the app_server_dir/installedApps/dmsserver_hostname_DMS_WebApp.ear/dmserver.war/WEB-INF/classes directory. To turn on tracing, change EnableTrace=false to EnableTrace=true.


The other components that need to be turned on (true) are traceEnable.dmserver and traceEnabled.twgapi. Depending on the situation, your Support Representative may request turning on tracing for the other components. If the servlets are not running, start them to put the new trace settings into effect. If the servlets are running, do one of the following to put the new trace settings into effect without restarting the servlets:

• On any Tivoli Web Gateway (TWG) machine, run the following command:
server -app dmserver -trace set -host dmserver_hostname

• On any TWG UNIX machine, run the following command:
./server.sh -app dmserver -trace set -host dmserver_hostname

• From any machine with a browser, go to the following URL:
http://dmserver_hostname/dmserver/TraceServlet?trace=set

The output files of the tracing are DMS_stdout.log, DMS_stderr.log, and DMSMsg1.log, which are located in the app_server_dir/log directory. The default for a Windows installation is C:\WebSphere\AppServer\log. You should also provide the ApiServlet.log in the /tmp directory to your Support Representative.

19.8 Troubleshooting Web User Interface

In this section, we cover troubleshooting of the Web User Interface. First, let us look at some common problems associated with the Web User Interface.

19.8.1 Common Web User Interface problems

These are the common Web User Interface problems:

• Web Interface login problems

– A login to the WEB UI was unsuccessful. This could be due to the security level of the browser being set too high. Reduce the level of security and test again.
– A successful login occurred, but a Java or ActiveX error message was encountered and the Operations Console does not show it. The supported Java plug-in for the browser may not have been installed.


– The login is successful, but no pop-up window appears to download the two sets of files for the Web Interface. The Java plug-in for the browser has not been installed.

• Unable to publish Web objects

Before publishing any Web objects, you must make sure the following are up and running:
– Gateways servicing the endpoints that have Web Gateway components installed.
– Endpoints on the Web Gateway servers.
– Primary and secondary Web Gateway servers.
– DB2 server.
– DB2 client.
– Web Server.
– WebSphere Application Server.
– Access Manager.
– WebSEAL server.

Check the DMS_stdout.log log file and make sure that the log has the following entry, which indicates that the Web Gateway server has started successfully:
WSVR0023I: Server DMS_AppServer open for e-business

An insufficient authorization error from using the wweb command normally indicates that the Tivoli Administrator does not have the WEB UI_Admin role.

A Profile not found error that appears after running wweb under Windows could be due to the absence of an extra caret (^) when specifying a profile name that contains a caret. For example, profile mysoftware^1.0 needs to be specified as @mysoftware^^1.0 when running wweb under Windows.

A general oserv error could indicate a problem with the gateway that services the endpoint that has the Web Gateway component installed. Restart the gateway with the wgateway command and test again.

The wweb command may complete with a distribution ID when publishing a software package, yet the package does not show up on the Web Interface. Check the file package_name.log in the ../swdis/work directory of the Tivoli Server for the result of publishing the software package. The users file in ../swdis/work should contain the list of the users for which the Web objects are published.

On the endpoint where the Web Gateway server is located, the outcome of publishing the software packages is recorded in the file called results, located in the ..\swdis\1\ directory. Check this file for any error messages. The file called users contains the list of users that have access to the published Web objects.

Enable tracing and review the DMS_stdout.log for errors. You may have to enable more components for tracing depending on the situation. Publishing to an invalid user ID will fail, and the log should reflect that. Refer to 19.8.2, “Tracing the Web User Interface” on page 925 for details on enabling tracing.

• Problems with software package installation

– Error in downloading attachment in the Operations Console. This error is not seen in the software_package.log. The Web Gateway server could be down or the host name of the Web Gateway server cannot be resolved. Make sure that you can resolve both the short name and the fully qualified domain name of the Web Gateway server and perform the install again.
– DISSE0082E error decoding software package object appears. It could be corrupted, or not a valid object. This error can be seen in both the Operations Console and in the software_package.log located on the Tivoli Server in the ../swdis/work directory. You will encounter this corrupt file error if you use an IP address instead of the host name of the WebSEAL server in the URL to get to the Web Interface. Make sure the host name and short name of both the WebSEAL server and the Web Gateway server can be resolved, use the host name of the WebSEAL server in the URL for the WEB UI Interface, log in again, and redo the install of the software package.

• Problems with the inventory scan

The Inventory scan completed successfully, but there is no data in the RDBMS database. Check the mcollect.log on the gateways and the trace file for the TWG-MCollect. Refer to 19.10, “Troubleshooting Inventory” on page 928 to debug the failed collection.
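The caret rule described earlier in this section for wweb under Windows (mysoftware^1.0 is specified as @mysoftware^^1.0) is easy to get wrong by hand when profile names are generated by a script. As a small illustration only, the following Python sketch doubles every caret and adds the leading @; the function name is our own and is not part of any Tivoli tool.

```python
def wweb_profile_arg_for_windows(profile_name):
    """Return the profile argument with each caret doubled and a leading @.

    Hypothetical helper: it only mirrors the example given in the text.
    """
    return "@" + profile_name.replace("^", "^^")


print(wweb_profile_arg_for_windows("mysoftware^1.0"))
```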

19.8.2 Tracing the Web User Interface

You can use wwebcfg to set the tracing parameters for the Web User Interface. The output trace files (webui*.trc) are located in the $DBDIR/webui directory. The available parameters for wwebcfg are:

• product_dir
• working_dir
• trace_size
• trace_level


The default for product_dir is $DBDIR/webui, and the default for working_dir is $DBDIR/webui/work. The default trace_size is 1000000; when the trace file reaches this size, a new file is created. The trace_level can be set from 0 to 6, as shown in Table 19-6. Your support personnel may request a higher level depending on the situation.

Table 19-6 Settings for trace_level

trace_level   Specifies
0             No traces
1             Level fatal
2             Level error
3             Level warning
4             Level info
5             Level verbose
6             Maximum level

Tracing Software Distribution WEB UI plug-in

Set the trace_level parameter with wswdcfg by running:
wswdcfg -s trace_level=9

The traces (*.trc) are located on the TMR Server (by default) in the $product_dir directory, which is specified in the [#SERVER] section of the swdis.ini file. The swdis.ini file is located in C:/WINNT for a Windows Tivoli Server and in /etc/Tivoli for a UNIX TMR Server. In product_dir, spo*.trc and spde*.trc should have traces of the publishing of software packages to TWG, and swdmgr*.trc should have the results of the publishing. The swd traces directory can be changed using the product_dir parameter:
wswdcfg -s product_dir=trace_dir

Tracing for the Web Gateway

See 19.7.3, “Tracing the Web Gateway” on page 922.


Trace files of the endpoint on the Web Gateway server

Locate swdis.ini, which is in C:/WINNT on Windows and /etc/Tivoli on UNIX. Set the trace_level to level 6 (trace_level=6) in the endpoint section of the swdis.ini file. The trace files spde*.trc will be located in $product_dir, as specified in the swdis.ini file. When software packages are published to the TWG, these files should have the traces in them.

Trace files of the servlets on the Web Gateway server

The trace files of the servlets are:
• LCF_ROOT/debuglog
• LCF_ROOT/wctrace
• LCF_ROOT/wclog

WEB UI client trace files

When you connect to the Web Interface for the first time, a pop-up window asks for an installation directory for the WEB UI applet. The trace files webui.trc and webui.log will be located in that directory.

Trace files for TWG-MCOLLECT

The trace files for the Tivoli Web Gateway Mcollect are on the application server:
• WAS_HOME_DIR/bin/EndpointMCollect3.trace
• WAS_HOME_DIR/bin/JavaEndpointMCollect.trace

19.9 Troubleshooting Enterprise Directory Integration

Most of the issues involved in troubleshooting revolve around making sure that the connection to the LDAP server is working and that the access to the context is correct. You can check this by setting a trace on the TMR Server. For example:
odadmin environ set DQ_TRACE=max_size (MB)

The trace files are written to $DBDIR on the TMR Server. DirQueryCli0 contains the CLI trace and DirQueryEngine0.trc contains the engine trace.

Sometimes the problem is a port issue. For Directory Integration, the communication takes place through a socket listening on port 9090. If this port is reserved for another application, we suggest changing it by setting the variable DQ_PORT to a different value. The command to do this is odadmin environ get/set.
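Before changing DQ_PORT, it can help to confirm whether anything is already accepting connections on port 9090 of the TMR Server. The following Python sketch is a generic TCP probe, not a Tivoli tool. It assumes it is run on the machine being checked (hence the 127.0.0.1 default), and a positive result only shows that some process is listening on the port, not which one.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success and an error number otherwise.
        return probe.connect_ex((host, port)) == 0


# 9090 is the Directory Integration port named in the text above.
print("port 9090 in use:", port_in_use(9090))
```

If the probe reports the port as in use while Directory Integration is stopped, another application is holding it, which is the case where setting DQ_PORT to a different value applies.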

19.10 Troubleshooting Inventory

This section covers important information that is required to troubleshoot Inventory scans. It is important that you understand the basic tasks and the log files used in order to troubleshoot effectively. Table 19-7 on page 929 contains a summary of the log files that we discuss in this section.


Table 19-7 Log file information

Component                                             Path                                                       Log file name     Default log level   Debug level
Endpoint: Upcall and downcall information             $LCFDIR/dat/1                                              lcfd.log          1                   3
Endpoint scan engine: Inventory profile information   Directory where wepscan was run (created by wepscan -d)    sa_config.log     N/A                 3
Endpoint scan engine: Scan data                       Directory where wepscan was run (created by wepscan -d)    sa_results.log    N/A                 3
Endpoint scan engine: Debug information               Directory where wepscan was run (created by wepscan -d)    INV_SA.LOG        N/A                 3
Hardware scan library: Debug information              $LCGFDIR\inv\Scan                                          libInvHW.log      N/A                 N/A
Gateway: Upcall and downcall information              $DBDIR                                                     gatelog           3                   6
RIM object: SQL statements and DB return codes        $RIM_DB_LOG (created by wrimtrace)                         invdh_1_rim.log   N/A                 INFO|ERROR
Data Handler: Debug information                       $DBDIR/inventory/data_handler                              mcollect.log      1                   3
Collector: Debug information                          $runtime_location/                                         mcollect.log      1                   3

19.10.1 Enabling logging and tracing

The lcfd.log at debug level 3 contains important endpoint activity information. You can see the upcalls made by the endpoint, as well as downcalls made by the gateway to the endpoint. Depending on the type of problem you are troubleshooting, you may elect to only have certain traces enabled. For the purpose of this exercise, we will enable all the traces.


Setting endpoint debug level

To enable endpoint debugging level 3, do the following from the endpoint:

1. Change the log_threshold line in the $LCFDIR/dat/1/last.cfg file to log_threshold=3.
2. Save the updated last.cfg.
3. Restart the lcfd service. Open a command line on the endpoint and run the commands in Example 19-9.

Example 19-9 Restart the lcfd service
net stop lcfd
The Tivoli endpoint service is stopping.
The Tivoli endpoint service was stopped successfully.
net start lcfd
The Tivoli endpoint service is starting.
The Tivoli endpoint service was started successfully.

Setting Collector logging

Mcollect.log debug level 3 is the highest log level and is the most widely used for troubleshooting. It can, however, generate a very large log file in a short amount of time, and the log wraps every time it reaches the maximum debug_log_size. You should only set the debug level to 3 when troubleshooting. The Collector must be stopped and started, as illustrated in Example 19-10, when changing the debug level. Once these commands are executed, simply view or tail -f the SCS log file. Remember to set logging back to level 1 after you finish.

To enable Collector debugging, run the commands in Example 19-10. The wcollect lines are the commands; the other lines are their output.

Example 19-10 Enabling collector debug level
wcollect -d 3 @Gateway:win-rptr01a-gw
wcollect -h graceful @Gateway:win-rptr01a-gw
Performed 'graceful' halt of collector: @Gateway:win-rptr01a-gw.
wcollect -s @Gateway:win-rptr01a-gw
Started collector: @Gateway:win-rptr01a-gw.

Setting Data Handler logging

Setting the Data Handler debug level is similar to setting the Collector debug level, except that instead of specifying a Collector, you specify the Data Handler object, as illustrated in Example 19-11.

Example 19-11 Enabling Data Handler debug level
wcollect -d 3 @InvDataHandler:inv_data_handler
wcollect -h graceful @InvDataHandler:inv_data_handler
Performed 'graceful' halt of collector: @InvDataHandler:inv_data_handler.
wcollect -s @InvDataHandler:inv_data_handler
Started collector: @InvDataHandler:inv_data_handler.

Setting the gateway debug level

A number of interesting messages can also be viewed in the gatelog file. If you are tracing SCS, you should set the gateway debug level to 9. Be sure to set it back to the default when you are done tracing. The gateway log file is called gatelog and is located in $DBDIR. This file contains information on downcalls, upcalls, and cache misses, and thus can be very useful when troubleshooting Inventory.

Use the wgateway command to set the debug level of the gatelog file. Since you must restart the gateway, make sure there are no active distributions in progress. The commands in Example 19-12 set the gateway debug level to 9 on a gateway named win-rptr01a-gw.

Example 19-12 Setting gatelog to debug level 9
C:\Tivoli>wgateway win-rptr01a-gw set_debug_level 9
C:\Tivoli>wgateway win-rptr01a-gw restart

Setting the RIM trace

The RIM log file contains very useful information for troubleshooting database related problems. From the RIM log, you can see what database calls are being made from the Data Handler to the RIM interface and what is returned by the database. To enable RIM trace, do the following:

1. From the RIM host managed node, run the following command:
odadmin environ get>c:\environ_rim.txt

2. Edit the c:\environ_rim.txt file by adding the following line:
RIM_DB_LOG=c:/temp/invdh_1_rim.log

3. Set the RIM_DB_LOG environment variable:
odadmin environ set < c:\environ_rim.txt

4. Enable tracing on the Data Handler RIM object:
wrimtrace invdh_1 "INFORMATION|ERROR"

5. Kill all RIM_vendor_Agent processes that are running on the RIM host managed node (see Example 19-13).

Example 19-13 Killing the RIM agent process
ntprocinfo|grep -i agent
1980 RIM_Oracle_Agent WIN-INV01A\tmersrvd 08/06/2002 10:02:09
ntprocinfo -k 1980

RIM tracing is now enabled and will trace all invdh_1 RIM object calls.

Note: The RIM log can grow and fill up disk space very quickly at “INFORMATION|ERROR”. Be sure to set it back to no tracing when you are done by running the following command:
wrimtrace invdh_1 "TRACE_OFF"
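Steps 1 to 3 of the RIM trace procedure are a get, edit, set cycle on the environment file, and the edit step is where mistakes such as a duplicated RIM_DB_LOG line tend to creep in. The following Python sketch shows one way to do the edit step on the saved text. It assumes that the file written by odadmin environ get contains one NAME=value entry per line, and the sample content is invented for illustration.

```python
def set_environ_var(environ_text, name, value):
    """Return environ_text with exactly one NAME=value line for name.

    Any existing line for the same variable is dropped, and the new
    entry is appended at the end.
    """
    prefix = name + "="
    kept = [line for line in environ_text.splitlines()
            if not line.startswith(prefix)]
    kept.append(prefix + value)
    return "\n".join(kept) + "\n"


# Hypothetical content of c:\environ_rim.txt, invented for illustration.
before = "PATH=/usr/bin\nRIM_DB_LOG=/old/path.log\n"
after = set_environ_var(before, "RIM_DB_LOG", "c:/temp/invdh_1_rim.log")
print(after, end="")
```

The edited text would then be written back to the file and passed to odadmin environ set, as in step 3. The same helper could be used for other variables that this chapter sets through odadmin environ, such as DQ_PORT or TRM_DEBUG_LEVEL.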

Tracing the inventory scan process

At this point, you have enabled debugging on the following components:

• Endpoint
• Gateway
• RIM object
• Data Handler
• Collector

Distributing an existing inventory profile to a small number of targets

Tip: When troubleshooting, keep the target list to a minimum.

For the purposes of explaining the troubleshooting steps, we have separated the Inventory discovery process into multiple steps, as illustrated in Figure 19-10 on page 933.


Figure 19-10 Inventory discovery process

Steps 1 and 2

Steps 1 and 2 are concerned with the distribution of the Inventory profile through the MDist2 hierarchy. MDist2 distributions can be monitored through the Distribution Status console or by using the wmdist command. This also applies to the Inventory Profile distribution. The status console can be launched from the Tivoli Desktop by double-clicking the Distribution Status icon (see Figure 19-11 on page 934).


Figure 19-11 Distribution Status icon

From the Distribution Status Console, you must select the Inventory distribution you wish to monitor, as shown in Figure 19-12 on page 935.


Figure 19-12 Distribution status console

Refer to the Tivoli Management Framework Reference Manual Version 4.1, SC32-0806 for the wmdist command options.

Steps 3 and 4

In steps 3 and 4, the endpoint receives the profile, scans the endpoint, and creates and parses the MIF files. After the MDist2 distribution is successful, the SCS takes over and the data is sent to the RDBMS. To trace these steps, you need to analyze three log files: lcfd.log, gatelog, and mcollect.log.


Example 19-14 gives an example of the lcfd.log at level 3 during steps 3 and 4.

Example 19-14 Example of lcfd.log file
Aug 12 17:18:16 1 lcfd Spawning: C:\Program Files\Tivoli\lcf\dat\1\cache\bin\w32-ix86\TME\INVENTORY\inv_config_ep_meths.exe, ses: 27810234
Aug 12 17:18:16 3 lcfd full_exec_path = C:\Program Files\Tivoli\lcf\dat\1\cache\bin\w32-ix86\TME\INVENTORY\inv_config_ep_meths.exe
Aug 12 17:18:17 3 lcfd command_line = "C:\Program Files\Tivoli\lcf\dat\1\cache\bin\w32-ix86\TME\INVENTORY\inv_config_ep_meths.exe" 27810234 C:\PROGRA~1\Tivoli\lcf\dat\1\last.cfg 336 99999999ld lcfd foo1 foo2 foo3 196608 9HoBAX8VAQE= WIN-ARCH01A w32-ix86 6 1370748664 9.3.4.45+9494 \\.\pipe\lcfd9495
Aug 12 17:18:17 Q lcfd Entering Listener (running).
. . .
Aug 12 17:18:18 Q ic_discover calling method.
Aug 12 17:18:18 Q ic_discover mdist: Local depot directory is not configured.
Aug 12 17:18:18 Q ic_discover Entering send_methstat
Aug 12 17:18:18 Q ic_discover Entering send_struct
Aug 12 17:18:18 Q ic_discover net_send of 1730 bytes, session 662766132
Aug 12 17:18:18 Q ic_discover Leaving send_struct
Aug 12 17:18:18 Q ic_discover Leaving send_methstat
Aug 12 17:18:18 Q ic_discover Entering net_recv, receive a message
Aug 12 17:18:18 Q ic_discover Leaving net_recv: bytes=16, (type=10 session=662766132)
. . . .
Aug 12 17:18:19 Q ic_discover Entering net_recv, receive a message
Aug 12 17:18:19 Q ic_discover Leaving net_recv: bytes=10, (type=11 session=662766132)
Aug 12 17:18:19 Q ic_discover net_send of 52 bytes, session 662766132
Aug 12 17:18:26 3 ic_discover full_exec_path = C:\Program Files\Tivoli\lcf\inv\SCAN\wscanner.exe
Aug 12 17:18:26 3 ic_discover command_line = "C:\Program Files\Tivoli\lcf\inv\SCAN\wscanner.exe" -i wscanner.cfg
Aug 12 17:20:10 Q ic_discover Entering t_generic_stub ...
. . .
Aug 12 17:20:22 Q mcollect-downcall net_send of 26 bytes, session 662766134
Aug 12 17:20:22 Q mcollect-downcall Entering net_recv, receive a message
Aug 12 17:20:22 Q mcollect-downcall Leaving net_recv: bytes=52, (type=5 session=662766134)
Aug 12 17:20:22 Q mcollect-downcall net_send of 1672 bytes, session 662766134
Aug 12 17:20:22 Q mcollect-downcall Entering net_recv, receive a message
Aug 12 17:20:22 Q mcollect-downcall Leaving net_recv: bytes=52, (type=5 session=662766134)
Aug 12 17:20:22 Q mcollect-downcall net_send of 10 bytes, session 662766134
Aug 12 17:20:22 Q MethInit method returned.
Aug 12 17:20:22 Q MethInit send_results (max/len) 80/6
Aug 12 17:20:22 Q MethInit Entering send_methstat
Aug 12 17:20:22 Q MethInit Entering send_struct
Aug 12 17:20:22 Q MethInit net_send of 60 bytes, session 662766134
Aug 12 17:20:22 Q MethInit Leaving send_struct
Aug 12 17:20:22 Q MethInit Leaving send_methstat
Aug 12 17:20:22 2 MethInit Clean Shutdown mc_get_data.
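The entries in Example 19-14 share a simple layout: a timestamp, a one-character level (a digit or Q), the name of the component or method that wrote the entry, and the message. When a level 3 lcfd.log is long, listing the order in which those names appear gives a quick view of how far a scan progressed. The Python sketch below does that. The field names are our own reading of the example and are not taken from product documentation; the sample lines are copied from Example 19-14.

```python
import re

# timestamp, one-character level, source name, message (our own field names)
ENTRY = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S) (\S+) (.*)$")


def parse_lcfd_line(line):
    """Return a dict for one lcfd.log entry, or None if it does not match."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    stamp, level, source, message = match.groups()
    return {"time": stamp, "level": level, "source": source,
            "message": message}


def sources_in_order(lines):
    """Return the source names in order of appearance, collapsing repeats."""
    seen = []
    for line in lines:
        entry = parse_lcfd_line(line)
        if entry and (not seen or seen[-1] != entry["source"]):
            seen.append(entry["source"])
    return seen


sample = [
    "Aug 12 17:18:17 Q lcfd Entering Listener (running).",
    "Aug 12 17:18:18 Q ic_discover calling method.",
    "Aug 12 17:18:18 Q ic_discover Entering send_methstat",
    "Aug 12 17:20:22 Q mcollect-downcall net_send of 26 bytes, session 662766134",
    "Aug 12 17:20:22 Q MethInit method returned.",
    "Aug 12 17:20:22 2 MethInit Clean Shutdown mc_get_data.",
]
print(sources_in_order(sample))
```

A log in which ic_discover appears but mcollect-downcall never does would suggest that the scan started but the collection downcall did not reach the endpoint.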

Log file summary

1. When an Inventory distribution/scan starts, you will see the net_recv message, followed by a series of cache checks. The only file that is not in the permanent cache is inv_config_ep_meths(.exe); this file is located in the method cache.
2. After the cache checks, you will see the execution of the method ic_discover. You will see that LCFD knows where the executable inv_config_ep_meths is and initializes the method ic_discover. The method always starts up in the $LCFROOT/dat/ directory and then changes to $LCFROOT/inv/SCAN.
3. You will see more net_recv messages as the files config.dmp, bios.ini, swsigs.ini, and so on, are downloaded.
4. If a software or hardware scan has been configured, you will see the ic_discover method start wscanner. This should be wscanner -i wscanner.cfg. The ic_discover method creates the wscanner.cfg file; it is not downloaded.
5. ic_discover will shut down.
6. You should then see the upcall by ic_discover to Mcollect. Then there should be a series of cache checks for the method mc_get_data. The executable associated with this method is Endpoint_prog1(.exe).
7. You will see the mcollect method looking for the DAT file and a series of net_send and net_recv messages.
8. The mcollect method will shut down.

Example 19-15 on page 938 gives an example of the gatelog.


Example 19-15 Example of gatelog
2002/08/12 17:31:58 +06 011499A8: downcall: looking for c:/Tivoli/bin/lcf_bundle.40/bin/w32-ix86/TME/INVENTORY/inv_config_ep_meths
2002/08/12 17:31:58 +06 011499A8: downcall: c:/Tivoli/bin/lcf_bundle.40/bin/w32-ix86/TME/INVENTORY/inv_config_ep_meths found.
2002/08/12 17:31:58 +06 011499A8: downcall: Method body /bin/w32-ix86/TME/INVENTORY/inv_config_ep_meths found.
2002/08/12 17:31:58 +06 011499A8: calling impl_to_gw_path /lib/w32-ix86/libccms60_lcf.dll

The gatelog will not have any messages that originated from Inventory source code. However, it will have messages that are a result of an Inventory distribution.

• You will first see the MDist ID of the Inventory distribution (this is the MDist ID, not the Inventory scan ID).
• You will see all of the gateway checks for ic_discover methods.
• Then you will see the processing of the job, followed by the downcalls for the dependencies.
• MDist will then receive the segments from the distribution. For Inventory, this will be config.dmp, bios.ini, and possibly swsigs.ini.
• You will be notified if the job is successful.
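In Example 19-15, each downcall lookup ("looking for ...") is followed by a line that reports the same path as found. A simple way to spot a missing method file in a long gatelog is to list the lookups that have no matching "found" line. The Python sketch below does only that. It assumes the entry layout seen in Example 19-15 (date, time, offset, an eight-digit hexadecimal tag, then the message); what the gateway writes when a file is not found is not shown in this chapter, so the sketch does not try to match any failure message.

```python
import re

# date time, offset, hexadecimal tag, message (layout as in Example 19-15)
GATE = re.compile(
    r"^(\d{4}/\d\d/\d\d \d\d:\d\d:\d\d) (\S+) ([0-9A-Fa-f]+): (.*)$")
LOOKING = "downcall: looking for "


def unmatched_lookups(lines):
    """Return the paths that were looked for but never reported as found."""
    looked, found = [], set()
    for line in lines:
        match = GATE.match(line.strip())
        if match is None:
            continue
        message = match.group(4)
        if message.startswith(LOOKING):
            looked.append(message[len(LOOKING):])
        elif message.startswith("downcall: ") and message.endswith(" found."):
            found.add(message[len("downcall: "):-len(" found.")])
    return [path for path in looked if path not in found]


# Lines copied from Example 19-15.
sample = [
    "2002/08/12 17:31:58 +06 011499A8: downcall: looking for "
    "c:/Tivoli/bin/lcf_bundle.40/bin/w32-ix86/TME/INVENTORY/inv_config_ep_meths",
    "2002/08/12 17:31:58 +06 011499A8: downcall: "
    "c:/Tivoli/bin/lcf_bundle.40/bin/w32-ix86/TME/INVENTORY/inv_config_ep_meths found.",
    "2002/08/12 17:31:58 +06 011499A8: downcall: Method body "
    "/bin/w32-ix86/TME/INVENTORY/inv_config_ep_meths found.",
]
print(unmatched_lookups(sample))
```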

Steps 5 through 7

In steps 5 through 7, the Collector sends collections to the Data Handler. The Data Handler decompresses and decodes the collection before it writes data into the repository through one or multiple RIM interfaces. You can trace these steps by analyzing the Data Handler mcollect.log and invdh_1_rim.log logs. The RIM interface log file is especially useful in investigating database related problems. From the RIM log, you can get the following information:

• Calls made by the Data Handler to the RIM object. This can range from database connection information to SQL statements.
• Database responses to the calls made. If there is an error while performing a database operation, the error code is usually reflected in this log file.

Example 19-16 on page 939 gives an example of a RIM trace log file.


Example 19-16 Example of RIM trace

00002780 [Mon Aug 12 12:43:45 2002] Trace Message - Connection ID::
IOM Command:RETRIEVE2
row_param:
Table Name :LAST_SIG_UPDATE
Columns:
UPDATE_TABLE(S):
LAST_UPDATE(L): 0
rows:
where_clause: UPDATE_TABLE = '2'
number1: 0number2: 0string1: string2:
00002780 [Mon Aug 12 12:43:45 2002] Trace Message - Connection ID::
REPLY IOM COMMAND : RETRIEVE2
Result : Success Result : Success
rows: No. of Rows 1
First Row :
Table Name :LAST_SIG_UPDATE
Columns:
UPDATE_TABLE(S): 2
LAST_UPDATE(L): 1
00002780 [Mon Aug 12 12:43:45 2002] Trace Message - Connection ID::
IOM Command:RETRIEVE2
row_param:
Table Name :LAST_SIG_UPDATE
Columns:
UPDATE_TABLE(S):
LAST_UPDATE(L): 0
rows:
where_clause: UPDATE_TABLE = '3'
number1: 0number2: 0string1: string2:
00002780 [Mon Aug 12 12:43:46 2002] Trace Message - Connection ID::
REPLY IOM COMMAND : RETRIEVE2
Result : Success Result : Success
rows: No. of Rows 1
First Row :
Table Name :LAST_SIG_UPDATE
Columns:
UPDATE_TABLE(S): 3
LAST_UPDATE(L): 0
00002780 [Mon Aug 12 12:43:48 2002] Trace Message - Connection ID::
IOM Command:DELETE
row_param:
rows:
where_clause: COMPUTER_SYS_ID = 'WIN-INV01A-XXX0001' and SWARE_SIG_ID = '0eaaaf792e22141e070f6e3506caa8a9'
number1: 0number2: 0string1: MATCHED_SWARE string2:
00002780 [Mon Aug 12 12:43:48 2002] Trace Message - Connection ID::
REPLY IOM COMMAND : DELETE
Result : Success Result : Success
rows:
00002780 [Mon Aug 12 12:43:48 2002] Trace Message - Connection ID::
IOM Comm

Log file summary In the RIM trace log file, you will see the SQL statements that are executed when the RIM interface inserts or updates the Inventory repository. If data was successfully inserted into the repository, you will see the success message reflected here. In case of failures, this log may contain important return code information, which is very useful when troubleshooting.
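A quick way to apply this summary is to list the commands sent to the RIM object and flag any reply whose result is not a success. The following Python sketch does that with two regular expressions modelled on the text visible in Example 19-16. The function name is hypothetical, and because the example shows only successful replies, the way failures are worded in a real trace is an assumption left open here.

```python
import re
from collections import Counter

# Patterns modelled on the text visible in Example 19-16. Only "Success" is
# shown there; any other result value is simply reported as found.
COMMAND_RE = re.compile(r"IOM Command:\s*(\w+)")
RESULT_RE = re.compile(r"Result\s*:\s*(\w+)")


def summarize_rim_trace(text):
    """Return (commands issued, result values other than 'Success')."""
    commands = Counter(COMMAND_RE.findall(text))
    problems = [r for r in RESULT_RE.findall(text) if r != "Success"]
    return commands, problems
```

An empty problem list is only a first indication; the return code information mentioned above still has to be read from the log itself.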

19.10.2 Troubleshooting on the endpoint

The wepscan command has two debug options, -c and -d. These options provide effective troubleshooting tools when diagnosing endpoint-specific problems.

- The -c option reads the profile configuration file (config.dmp) and writes the results into a text file, sa_config.log. The config.dmp file is created on the endpoint when the profile is distributed to it. Example 19-17 is an extract from the sa_config.log file.

Example 19-17 Extract from sa_config.log

cat sa_config.log
=====================Configuration Parameters=====================
Configuration Version = {34323030}
Endpoint Label = {WIN-RPTR01A}
Endpoint OID = {1370748664.7.522+#TMF_Endpoint::Endpoint#}
Computer System ID = {ZY8VMYX7KHFPVJ8PFCK6000005D1}
Receiver OID = {1370748664.2.35#InvDataHandler#}
Update or Replace = {Update with differences}

=====================Hardware Parameters=====================
Hardware granularity:
Processor - ON
Memory - ON
Memory Modules - ON
Operating System - ON
Storage - ON
IP Address - ON
Modem - ON
Network Adapter - ON
Partition - ON
PC System Params - ON
PCI Device - ON
Pointing Device - ON
SMBIOS - ON
Keyboard - ON
IPX Address - ON
Video - ON
Printer - ON
USB Device - ON

=====================Software Parameters=====================
Software scan = {Native Installer Scanning} {Scan for matching software signatures}
Software outfile = {Native Installer Scanning} {Scan for matching software signatures}
Exclude Directories: */TMP/ */TEMP/ */TEMPORARY INTERNET FILES/ /RECYCLED/ /RECYCLER/
Include Extensions: *.EXE *.DLL *.COM *.NLM *.SIG
Custom Before Script -

Custom Script -

- The -d option has three levels, 1 through 3:

  1  Logs error messages.
  2  Logs error and warning messages.
  3  Logs error and warning messages and debugging information. (Debugging information is not available from NetWare or OS/2 endpoints.)


Depending on the log level used, the following files will be created. All these logs will be saved in the same directory from which you ran wepscan.

- sa_results.log: Contains the scan data in ASCII format (see Example 19-18).

Example 19-18 Extract from sa_results.log

cat sa_results.log
======> Inventory 4.2 Results Structure {34323030}
Time Stamp ===> {2002-08-19-08.16.46.000000}
hardware_system_id ===> {ZY8VMYX7KHFPVJ8PFCK6000005D1}
scan_id ===> {0}
======> Delete Rows Insert Rows {COMPUTER}
Primary Key[0] {COMPUTER_SYS_ID}
Column Name [0] {COMPUTER_SYS_ID} ===> {ZY8VMYX7KHFPVJ8PFCK6000005D1} [1]
Column Name [1] {COMPUTER_SCANTIME} ===> {2002-08-19-08.16.46.000000} [2]
Column Name [2] {TME_OBJECT_ID} ===> {1370748664.7.522+#TMF_Endpoint::Endpoint#} [1]
Column Name [3] {TME_OBJECT_LABEL} ===> {WIN-RPTR01A} [1]
Table Name ==> {COMPUTER}
Primary Key[0] {COMPUTER_SYS_ID}
Column Name [0] {COMPUTER_SYS_ID} ===> {ZY8VMYX7KHFPVJ8PFCK6000005D1} [1]
Column Name [1] {COMPUTER_SCANTIME} ===> {2002-08-19-08.15.55.000000} [2]
Column Name [2] {TME_OBJECT_ID} ===> {1370748664.7.522+#TMF_Endpoint::Endpoint#} [1]
Column Name [3] {TME_OBJECT_LABEL} ===> {WIN-RPTR01A} [1]
Table Name ==> {COMPUTER_SYS_MEM}
Primary Key[0] {COMPUTER_SYS_ID}
Column Name [0] {COMPUTER_SYS_ID} ===> {ZY8VMYX7KHFPVJ8PFCK6000005D1} [1]
Column Name [1] {PHYSICAL_TOTAL_KB} ===> {260848} [0]
Column Name [2] {PHYSICAL_FREE_KB} ===> {16124} [0]
Column Name [3] {TOTAL_PAGES} ===> {65212} [0]
Column Name [4] {FREE_PAGES} ===> {4031} [0]
Column Name [5] {PAGE_SIZE} ===> {4096} [0]
Column Name [6] {VIRT_TOTAL_KB} ===> {1024468} [0]
Column Name [7] {VIRT_FREE_KB} ===> {790560} [0]
Table Name ==> {INST_PARTITION}
Primary Key[0] {COMPUTER_SYS_ID}
Primary Key[1] {FS_ACCESS_POINT}
Column Name [0] {COMPUTER_SYS_ID} ===> {ZY8VMYX7KHFPVJ8PFCK6000005D1} [1]
Column Name [1] {FS_ACCESS_POINT} ===> {C:\} [1]
Column Name [2] {DEV_NAME} ===> {C_Drive} [1]
Column Name [3] {PARTITION_TYPE} ===> {Logical Drive} [1]
Column Name [4] {MEDIA_TYPE} ===> {Local Disk} [1]
Column Name [5] {PHYSICAL_SIZE_KB} ===> {39078080} [0]
Column Name [6] {FS_TYPE} ===> {OS/2 HPFS | Win NTFS | QNX Ver 2} [1]
Column Name [7] {FS_MOUNT_POINT} ===> {C:\} [1]
Column Name [8] {FS_TOTAL_SIZE_KB} ===> {39078080} [0]
Column Name [9] {FS_FREE_SIZE_KB} ===> {33007304} [0]
Table Name ==> {NET_ADAPTER}
Primary Key[0] {COMPUTER_SYS_ID}
Primary Key[1] {PERM_MAC_ADDR}
Column Name [0] {COMPUTER_SYS_ID} ===> {ZY8VMYX7KHFPVJ8PFCK6000005D1} [1]
Column Name [1] {PERM_MAC_ADDR} ===> {00:02:55:BF:AC:F2} [1]
Column Name [2] {CURRENT_ADDR} ===> {00:02:55:BF:AC:F2} [1]
Column Name [3] {ADAPTER_TYPE} ===> {} [1]
Column Name [4] {ADAPTER_MODEL} ===> {Intel(R) PRO/100 VE Desktop Connection} [1]
Column Name [5] {MANUFACTURER} ===> {Intel} [1]
Column Name [6] {INST_DATE} ===> {} [1]

======> History Rows {DUMMY}
Table Name ==> {NET_ADAPTER}
Primary Key[0] {COMPUTER_SYS_ID}
Primary Key[1] {PERM_MAC_ADDR}
Column Name [0] {COMPUTER_SYS_ID} ===> {ZY8VMYX7KHFPVJ8PFCK6000005D1} [1]
Column Name [1] {PERM_MAC_ADDR} ===> {00:60:94:89:42:39} [1]
Column Name [2] {CURRENT_ADDR} ===> {00:60:94:89:42:39} [1]
Column Name [3] {ADAPTER_TYPE} ===> {} [1]
Column Name [4] {ADAPTER_MODEL} ===> {IBM 16/4 Token-Ring PCI Management Adapter} [1]
Column Name [5] {MANUFACTURER} ===> {IBM} [1]
Column Name [6] {INST_DATE} ===> {} [1]

- sa_config.log: The same log file that is created using the -c option.

- INV_SA.LOG: Contains debugging information. It is identical to the log file that is created using the wdistinv command and the inv_ep_debug keyword. If you used the -s option, this log file is sent to the inventory Data Handler.

Example 19-19 is an extract from the INV_SA.LOG file.

Example 19-19 Extract from INV_SA.LOG

cat INV_SA.LOG
*************************************************************
Aug 19 08:15:53 0 [pid:00002140] Entering initialize log file: debug level = 3
Aug 19 08:15:53 0 [pid:00002140] wepscan is starting
Aug 19 08:15:53 3 [pid:00002140] INFO: config.dmp has been found and opened
Aug 19 08:15:53 3 [pid:00002140] INFO: call sa_discover
Aug 19 08:15:53 3 [pid:00002140] call sa_clean_files
Aug 19 08:15:53 3 [pid:00002140] INFO: get scan configuration
Aug 19 08:15:53 3 [pid:00002140] INFO: BIOS scan is starting
Aug 19 08:15:54 3 [pid:00002140] INFO: BIOS scan finished
Aug 19 08:15:54 3 [pid:00002140] INFO: DMI scan is starting
Aug 19 08:15:55 3 [pid:00002140] INFO: DMI scan finished
Aug 19 08:15:55 3 [pid:00002140] INFO: clean old hardware scan log file
Aug 19 08:15:55 3 [pid:00002140] INFO: inventory scan is starting
Aug 19 08:15:55 3 [pid:00002140] INFO: call write_scanner_configuration
Aug 19 08:15:55 3 [pid:00002140] INFO: call run_wscanner
Aug 19 08:16:46 0 [pid:00002140] The logging information from hardware scan

/-----------------------------------------------------------------------------\
Start Time: Mon Aug 19 08:15:55 2002
Initializing SMBIOS Tables...
/-----------------------------------------------------------------------------\
. . .
Aug 19 08:16:46 3 [pid:00002140] INFO: Parsing MIF file:dmiscan.mif
Aug 19 08:16:46 3 [pid:00002140] INFO: call mif_par_init_global
Aug 19 08:16:46 3 [pid:00002140] INFO: call mif_yyparse
Aug 19 08:16:46 3 [pid:00002140] INFO: call save_component
Aug 19 08:16:46 3 [pid:00002140] INFO: finish Parsing MIF file
Aug 19 08:16:46 3 [pid:00002140] INFO: The data option for this scan is Update with differents
Aug 19 08:16:46 3 [pid:00002140] INFO: call mif_par_init_global
Aug 19 08:16:46 3 [pid:00002140] INFO: call mif_yyparse
Aug 19 08:16:46 3 [pid:00002140] INFO: call save_component
Aug 19 08:16:46 3 [pid:00002140] INFO: call diff_mif_component
Aug 19 08:16:46 3 [pid:00002140] INFO: process custom scan result


Aug 19 08:16:46 3 [pid:00002140] INFO: call process_custom_component, process delete rows
Aug 19 08:16:46 3 [pid:00002140] INFO: call process_custom_component, process insert rows
Aug 19 08:16:46 3 [pid:00002140] INFO: call mif_dispose_component
Aug 19 08:16:46 3 [pid:00002140] INFO: call mif_dispose_component
Aug 19 08:16:46 3 [pid:00002140] INFO: call process_mif_results for registry scan
Aug 19 08:16:46 3 [pid:00002140] INFO: Parsing MIF file:tivrscan.mif
Aug 19 08:16:46 3 [pid:00002140] INFO: call mif_par_init_global
Aug 19 08:16:46 3 [pid:00002140] INFO: call mif_yyparse
Aug 19 08:16:46 3 [pid:00002140] INFO: call save_component
Aug 19 08:16:46 3 [pid:00002140] INFO: finish Parsing MIF file
Aug 19 08:16:46 3 [pid:00002140] INFO: The data option for this scan is Update with differents
Aug 19 08:16:46 3 [pid:00002140] INFO: call mif_par_init_global
Aug 19 08:16:46 3 [pid:00002140] INFO: call mif_yyparse
Aug 19 08:16:46 3 [pid:00002140] INFO: call save_component
Aug 19 08:16:46 3 [pid:00002140] INFO: call diff_mif_component
Aug 19 08:16:46 3 [pid:00002140] INFO: process Registry scan result
Aug 19 08:16:46 3 [pid:00002140] INFO: call process_tables, process insert tables
Aug 19 08:16:46 3 [pid:00002140] INFO: call process_tables, process delete tables
Aug 19 08:16:46 3 [pid:00002140] INFO: call mif_dispose_component
Aug 19 08:16:46 3 [pid:00002140] INFO: call mif_dispose_component
Aug 19 08:16:46 3 [pid:00002140] INFO: call process_mif_results for Signature match scan
Aug 19 08:16:46 3 [pid:00002140] INFO: Parsing MIF file:tivsscan.mif
Aug 19 08:16:46 3 [pid:00002140] INFO: call mif_par_init_global
Aug 19 08:16:46 3 [pid:00002140] INFO: call mif_yyparse
Aug 19 08:16:46 3 [pid:00002140] INFO: call save_component
Aug 19 08:16:47 3 [pid:00002140] INFO: finish Parsing MIF file
Aug 19 08:16:47 3 [pid:00002140] INFO: The data option for this scan is Update with differents
Aug 19 08:16:47 3 [pid:00002140] INFO: call mif_par_init_global
Aug 19 08:16:47 3 [pid:00002140] INFO: call mif_yyparse
Aug 19 08:16:47 3 [pid:00002140] INFO: call save_component
Aug 19 08:16:47 3 [pid:00002140] INFO: call diff_mif_component
Aug 19 08:16:47 3 [pid:00002140] INFO: process Signature Match scan result
Aug 19 08:16:47 3 [pid:00002140] INFO: call process_tables, process insert tables
Aug 19 08:16:47 3 [pid:00002140] INFO: call process_tables, process delete tables
Aug 19 08:16:47 3 [pid:00002140] INFO: call mif_dispose_component
Aug 19 08:16:47 3 [pid:00002140] INFO: call mif_dispose_component
Aug 19 08:16:47 3 [pid:00002140] INFO: run custom script
Aug 19 08:16:47 3 [pid:00002140] INFO: finished mif parsing and xlation, create a DAT file


Aug 19 08:16:47 3 [pid:00002140] INFO: sa_discover finished
Aug 19 08:16:47 0 [pid:00002140] wepscan finished
Aug 19 08:16:47 0 [pid:00002140] log file is closed
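Because every INV_SA.LOG entry starts with a time stamp, the gaps between consecutive entries show where a scan spends its time. The following Python sketch parses lines in the layout seen in Example 19-19 and reports the entry that is followed by the longest pause. The function names are hypothetical, and the year is an assumed parameter because the log lines themselves do not carry one.

```python
import re
from datetime import datetime

# Line layout as seen in Example 19-19: timestamp, level, pid, message.
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(\d)\s+\[pid:(\d+)\]\s+(.*)$"
)


def parse_entries(lines, year=2002):
    """Parse log lines into (timestamp, level, message) tuples.

    Lines that do not match the layout (banners, dumps) are skipped.
    """
    entries = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(
                f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S"
            )
            entries.append((stamp, int(match.group(2)), match.group(4)))
    return entries


def slowest_step(entries):
    """Return (seconds, message) for the entry followed by the longest gap."""
    longest = (0.0, None)
    for (start, _, message), (end, _, _) in zip(entries, entries[1:]):
        gap = (end - start).total_seconds()
        if gap > longest[0]:
            longest = (gap, message)
    return longest
```

Applied to the extract above, the longest gap follows the call run_wscanner entry, which matches the time taken by the hardware scan.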

You can also create these log files by creating an environment variable on your system named WEPSCAN_DEBUG. Set this environment variable to a value of 1, 2, or 3. These values correspond to the levels you specify with the -d option.

When you use the -d option, the libInvHW.log file is also created. This file contains more detailed debug information about the hardware scan library and is very useful when troubleshooting hardware scan problems. At this time, there is no similar log file for the software scan library. Example 19-20 is an example of the log file.

Example 19-20 Extract from libInvHW.log file

cat libInvHW.log
/-----------------------------------------------------------------------------\
Start Time: Mon Aug 19 08:15:55 2002
Initializing SMBIOS Tables...
/-----------------------------------------------------------------------------\
Begin Group SMBIOS getTable()
Read f0000 segment
Found _DMI_ at 000f:5F60
tablelen=1922 tableaddr=000EFA50 seg=000E off=FA50 numTables=54 revision=0000 maj=0 min=0
Copy from e segment
Split across e segment and f segment
0000 00 14 00 00 01 02 2C E5 03 07 80 DE 08 00 00 00 --- >......,.........<
0010 00 00 B7 01 49 42 4D 00 32 30 4B 54 33 32 41 55 --- >....IBM.20KT32AU<
0020 53 00 30 33 2F 32 31 2F 32 30 30 32 00 00 01 19 --- >S.03/21/2002....<
. . .
\-----------------------------------------------------------------------------/
Finished group #4
/-----------------------------------------------------------------------------\
Begin Group Storage getTable()
WMI Instance Enumeration done - DiskDrive.
Grabbed a WMI data set.
Input - Manufac: [(Standard disk drives)]
Model: [MAXTOR 6L040J2]
Description: [Disk drive]
Caption: [MAXTOR 6L040J2]
Output - Manufac: [MAXTOR]
Model: [MAXTOR 6L040J2]
PNPid: IDE\DISKMAXTOR_6L040J2__________________________A93.0500\3636323235343334303136382020202020202
Serial: [662245431086]
Cyl: 4866 Sec: 63 Heads: 255 Size(MB): 38170
WMI Instance Enumeration done - CD-ROM drive.
Grabbed a WMI data set.
Input - Manufac: [(Standard CD-ROM drives)]
Model: [SAMSUNG CDRW/DVD SM-308B]
Description: [CD-ROM Drive]
Caption: [SAMSUNG CDRW/DVD SM-308B]
Output - Manufac: [SAMSUNG]
Model: [SAMSUNG CDRW/DVD SM-308B]
PNPid: IDE\CDROMSAMSUNG_CDRW/DVD_SM-308B________________T103____\5&23E77B36&0&0.0.0
Serial: []
WMI Instance Enumeration done - floppy drive.
Grabbed a WMI data set.
Input - Manufac: [(Standard floppy disk drives)]
Model: [Floppy disk drive]
Description: [Floppy disk drive]
Caption: [Floppy disk drive]
Output - Manufac: []
Model: []
End Group Storage getTable()
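When wepscan is started from a script, the WEPSCAN_DEBUG environment variable described earlier in this section can be prepared in code. The following Python sketch only builds the environment; the helper name is hypothetical, and how wepscan itself is launched is left to the caller.

```python
import os


def wepscan_debug_env(level, base_env=None):
    """Return a copy of the environment with WEPSCAN_DEBUG set to 1, 2, or 3."""
    if level not in (1, 2, 3):
        raise ValueError("debug level must be 1, 2, or 3")
    # Copy so that the caller's (or the process's) environment is untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["WEPSCAN_DEBUG"] = str(level)
    return env
```

The returned dictionary could then be passed as the env argument of subprocess.run when the scan is started.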


Chapter 20. Tivoli Remote Control

A help desk or system administrator is often required to manage or correct problems on PCs that are in a different location from the administrator. Tivoli Remote Control is used to take control of any Intel-based system in the Tivoli environment from another Intel-based machine in the Tivoli environment.

Note: This chapter is based on Tivoli Remote Control Version 3.7.

The following topics are discussed in this chapter:

- Section 20.1, “Tivoli Remote Control components”
- Section 20.2, “Troubleshooting Tivoli Remote Control”


20.1 Tivoli Remote Control components

There are three components of Tivoli Remote Control:

- Tivoli Remote Control Server
- Tivoli Remote Control Target
- Tivoli Remote Control Controller

Together, these components add Tivoli Remote Control capabilities to support most configurations. Installing a component on a machine in the Tivoli environment assigns one or more specific roles to that machine. Tivoli Remote Control operates with the following roles:

Target       A machine that can be controlled from a Controller.
Controller   A machine that has the capabilities to take control of targets.
Gateway      A relay station used to control the TCP/IP flow and optionally do protocol conversion between TCP/IP and SPX/IPX protocols. Note that this is distinct from the gateway that manages TMA endpoints.
Server       The system that controls the Tivoli Remote Control environment.

The minimum usable configuration involves at least one server, one controller, and one target.

The mechanism that ensures that the user at the controller is authorized to take control of the target is implemented using the Tivoli Management Framework. The same applies to the initialization of sessions between the controller and the target. This implies that some sort of Tivoli Management Framework stub (endpoint or Tivoli Management Framework) has to be present on all nodes in the Tivoli Remote Control environment.

From a controller, the user can issue remote commands on a target, much like the TCP/IP rexec service, or activate a remote control session. To do the latter, the user at the controller uses the Tivoli Desktop to select the target for the session, set the parameters that control the session initialization, and ask the Remote Control server to initialize the session.


20.1.1 Remote Control trace

A number of facilities are available for finding out what is really happening behind the scenes of Tivoli Remote Control. As with any other Tivoli application, the oserv command gives a first indication of any problems. To investigate problems further, Tivoli Remote Control can generate a unique trace file for each session. This option is available only for Windows 2000, Windows NT, Windows 95/98, and OS/2 controllers and targets.

On Windows 2000, Windows NT, and Windows 95/98 platforms, tracing is controlled using a registry key. The registry paths are:

- Target: HKEY_LOCAL_MACHINE\SOFTWARE\Tivoli\Remote Control Target\trace length
- Controller: HKEY_LOCAL_MACHINE\SOFTWARE\Tivoli\Remote Control Controller\trace length

The trace length value specifies how many bytes are traced for each communication buffer. The default value, 0, means no trace. Any other value starts tracing and dumps that number of bytes from the start of each communication buffer to the trace file. The recommended value is 40.

The trace files are created under the directory where the Tivoli Management Agent has been installed, in the PCREMOTE subdirectory, then a platform-specific subdirectory, then CTL. The platform-specific subdirectory can have one of the following names:

- w32-ix86 for Windows NT or Windows 2000
- w95 for Windows 95 or Windows 98
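One way to apply the trace length setting consistently on several machines is to generate a registry import file. The following Python sketch builds the text of such a file for the registry paths listed earlier in this section. It is a sketch under stated assumptions: the function name is hypothetical, and storing trace length as a DWORD value is an assumption, so check the type of the existing value on a test machine before importing anything.

```python
def trace_length_reg(role, trace_length=40):
    """Build .reg file text that sets the 'trace length' value for a role."""
    if role not in ("Target", "Controller"):
        raise ValueError("role must be 'Target' or 'Controller'")
    if trace_length < 0:
        raise ValueError("trace length must not be negative")
    key = "HKEY_LOCAL_MACHINE\\SOFTWARE\\Tivoli\\Remote Control " + role
    lines = [
        "REGEDIT4",
        "",
        "[" + key + "]",
        # Assumption: the value is stored as a DWORD (written in hex).
        '"trace length"=dword:%08x' % trace_length,
        "",
    ]
    return "\r\n".join(lines)
```

Calling trace_length_reg("Target") produces a file body that sets the recommended value of 40, and passing 0 produces one that switches tracing off again.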

The names of the trace files are:

- Chat: CHCTLTRC.TXT for Controller, CHTGTTRC.TXT for Target
- File Transfer: FTCTLTRC.TXT for Controller, FTTGTTRC.TXT for Target
- Remote Control: TRACE\mmddhhss.TRC for both Controller and Target

Where mm is the month, dd is the day, hh is the hour, and ss is the minutes.

Tracing is also available for the OS/2 platform. To set the trace on an OS/2 controller or target, add the following statement to the CONFIG.SYS file:

SET RCDEBUG=xxyy


Where xxyy can have the following values:

- 0100 for Tivoli Remote Control. It enables the events to be logged in a file.
- 02yy for TCP/IP. yy represents the number of bytes for each send or receive to be dumped in the trace file. You can specify a maximum value of 99. The recommended value is 40.
- 04yy for IPX (only for the OS/2 controller). yy represents the number of bytes for each send or receive to be dumped in the trace file. You can specify a maximum value of 99. The recommended value is 40.
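The xxyy encoding above is easy to get wrong by hand, so the following Python sketch assembles the CONFIG.SYS statement from a trace kind and a byte count. The function name and the kind labels are hypothetical, and writing yy as a two-digit decimal number is an assumption based on the stated maximum of 99.

```python
# Prefix codes (xx) as listed above; the dictionary keys are labels chosen
# for this sketch only.
RCDEBUG_CODES = {"events": "01", "tcpip": "02", "ipx": "04"}


def rcdebug_statement(kind, nbytes=40):
    """Return the SET RCDEBUG=xxyy statement for the given trace kind."""
    if kind not in RCDEBUG_CODES:
        raise ValueError("kind must be one of: " + ", ".join(RCDEBUG_CODES))
    if kind == "events":
        yy = "00"  # 0100 only enables event logging; no byte count applies
    else:
        if not 0 <= nbytes <= 99:
            raise ValueError("byte count must be between 0 and 99")
        yy = "%02d" % nbytes  # assumption: two decimal digits
    return "SET RCDEBUG=" + RCDEBUG_CODES[kind] + yy
```

With the recommended value, rcdebug_statement("tcpip") yields SET RCDEBUG=0240.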

20.1.2 Tivoli Remote Control logging

Besides the trace files, Tivoli Remote Control offers the option of logging the start and stop of sessions. This option is available only for Windows 2000, Windows NT, and Windows 95/98, and like the tracing options, it is controlled by settings in the local registry of the individual machine. The paths are:

- Target: HKEY_LOCAL_MACHINE\SOFTWARE\Tivoli\Remote Control Target\logging
- Controller: HKEY_LOCAL_MACHINE\SOFTWARE\Tivoli\Remote Control Controller\logging

To enable event logging, set the logging registry value to 1. Allowable values are:

- 0 for no logging (default)
- 1 for log events

There are five events that are logged:

- Taking over a target
- Closing session with target
- Closing session with target, with error
- Could not establish a session with target
- Startup error

20.2 Troubleshooting Tivoli Remote Control

Troubleshooting the Tivoli Remote Control environment involves the usual Tivoli Management Framework troubleshooting procedures, such as examining the output from odstat, wtrace, or both. In addition, on Windows 2000, Windows NT, and Windows 95/98, information can be gathered from the eventlog and the trace files. Tracing is also an option on OS/2.

20.2.1 Tivoli Management Framework troubleshooting

From the TMR Server, or the Tivoli Remote Control gateway, the odstat output will contain information with respect to starting and stopping the various components. Look for the following methods:

- start_gateway
- start_target
- start_controller
- close_gateway

There are no references to the stop of controller-target sessions.
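When the odstat output is long, the relevant entries can be picked out with a simple filter. The following Python sketch keeps only the lines that mention one of the methods listed above. It assumes nothing about the column layout of odstat and treats each line as plain text; the function name is hypothetical.

```python
# Method names taken from the list above.
RC_METHODS = ("start_gateway", "start_target", "start_controller", "close_gateway")


def remote_control_lines(odstat_output):
    """Return the lines of odstat output that mention a Remote Control method."""
    return [
        line
        for line in odstat_output.splitlines()
        if any(method in line for method in RC_METHODS)
    ]
```

The text passed in would typically be odstat output that has been saved to a file first.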

20.2.2 Windows eventlog If logging is enabled, the eventlog on controllers will show the events that are discussed in 20.1.2, “Tivoli Remote Control logging” on page 952. The information is stored locally and includes the type of event and the TCP/IP address of the target, as shown in Figure 20-1.

Figure 20-1 Remote Control Controller event


20.2.3 Trace files

The information in the trace files is intended for internal use and is not documented. Interpreting it requires a deep knowledge of the protocol that Tivoli Remote Control uses to send and receive data. For practical purposes, the trace file is useful for determining when and how much data is passing between the controller and the target. An example of a trace file from a Controller is shown in Example 20-1.

Example 20-1 Extract from a Tivoli Remote Control trace file

Open succeeded. --> rc = 0000000000
Send starting. --> rc = 0000000000
Send succeeded. --> rc = 0000000000
buffer lenght = 11 dumping = 11
0B 00 00 41 67 6F 50 61 6F 6C 6F
Send starting. --> rc = 0000000000
Send succeeded. --> rc = 0000000000
buffer lenght = 26 dumping = 26
1A 00 00 12 00 01 02 00 02 01 6F 02 00 45 51 4E
4B 42 45 4E 55 2E 44 41 54 00
First Receive. --> rc = 0000000002
buffer lenght = 2 dumping = 2
0C 03
Receive succeeded. --> rc = 0000000000
buffer lenght = 780 dumping = 200
0C 03 00 12 00 24 00 00 04 00 03 03 00 01 00 00
00 00 00 80 02 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 FF FF FF 00 00
80 00 00 00 80 80 00 80 00 00 00 80 00 80 00 80
80 00 00 80 80 80 00 C0 C0 C0 00 00 00 FF 00 00
FF 00 00 00 FF FF 00 FF 00 00 00 FF 00 FF 00 FF
FF 00 00 FF FF FF 00 0A AA AA AA 3B 00 43 00 C8
05 0F 00 00 10 49 00 B8 00 14 00 B8 00 14 00 43
00 3A 00 5C 00 57 00 49 00 4E 00 4E 00 54 00 5C
00 73 00 79 00 73 00 74 00 65 00 6D 00 33 00 32
00 3B 00 43 00 3A 00 5C 00 57 00 49 00 4E 00 5C
00 3F 00 3F 00 5C 00 43 00 3A 00 5C 00 57 00 49
00 4E 00 4E 00 54 00 5C
First Receive. --> rc = 0000000002
buffer lenght = 2 dumping = 2
10 00
Receive succeeded. --> rc = 0000000000
buffer lenght = 16 dumping = 16
10 00 00 12 00 02 14 00 05 73 00 02 00 03 03 00
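To see how much data passes in each direction without reading the hex dumps, the buffer lengths can be totalled. The following Python sketch sums the buffer length reported after each "Send succeeded" and "Receive succeeded" entry, using the wording (including the spelling "lenght") that appears in Example 20-1. The function name is hypothetical, and skipping the "First Receive" entries is a reading of the example rather than documented behavior: the bytes they dump reappear at the start of the buffer that follows.

```python
import re

# Matches the completed transfers in the layout shown in Example 20-1.
# "lenght" is spelled as it appears in the trace file.
EVENT_RE = re.compile(
    r"(Send|Receive) succeeded\.\s*-->\s*rc = \d+\s+buffer lenght = (\d+)"
)


def bytes_by_direction(trace_text):
    """Return the total buffer length per direction for completed transfers."""
    totals = {"Send": 0, "Receive": 0}
    for direction, length in EVENT_RE.findall(trace_text):
        totals[direction] += int(length)
    return totals
```

For the extract above this gives 37 bytes sent (11 + 26) and 796 bytes received (780 + 16).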


Part 4. Security applications


Chapter 21. IBM Tivoli Access Manager for Operating Systems

This chapter provides a brief overview of IBM Tivoli Access Manager for Operating Systems, including auditing and problem determination techniques that will help the implementer establish a repository of problem determination information.

Note: This chapter is based on IBM Tivoli Access Manager for Operating Systems Version 4.1.

The following topics are covered in this chapter:

- Section 21.1, “Introduction”
- Section 21.2, “Components and architecture”
- Section 21.3, “Auditing”
- Section 21.4, “Troubleshooting”


21.1 Introduction

IBM Tivoli Access Manager for Operating Systems acts as an extra layer of security over the standard 'permission bits' provided by UNIX. It hooks into the operating system to make access control decisions on security-sensitive operations (such as opening a file, logging in, or using a TCP/IP port). Because it hooks into system calls, we can audit what goes on and track who is trying to access which resources. Because IBM Tivoli Access Manager for Operating Systems can secure files, we can increase the security of audit files and application logs by preventing unauthorized access.

All of our controls apply to root as well as to any other user. This allows us to partition the capabilities of an administrator and continue to protect critical resources against accidental or deliberate damage.

IBM Tivoli Access Manager for Operating Systems provides the same capabilities, managed in the same way, across all the major UNIX platforms. This becomes more significant with Linux, as even single-vendor UNIX implementations are quite likely to add Linux systems, increasing the complexity due to the management of multiple system types.

IBM Tivoli Access Manager for Operating Systems can share the management infrastructure with the rest of Tivoli Policy Director (although all the components are included in IBM Tivoli Access Manager for Operating Systems; there are no prerequisites).

Because IBM Tivoli Access Manager for Operating Systems used to be a part of Tivoli Security Manager, and Security Manager is now part of the provisioning piece in IBM Tivoli Identity Manager, there is very good integration between IBM Tivoli Access Manager for Operating Systems and IBM Tivoli Identity Manager. IBM Tivoli Access Manager for Operating Systems extends the provisioning in IBM Tivoli Identity Manager to become a part of the role-based access control management that IBM Tivoli Identity Manager provides.

21.2 Components and architecture Figure 21-1 on page 961 explains the components and architecture of IBM Tivoli Access Manager for Operating Systems.


Figure 21-1 IBM Tivoli Access Manager architecture and components

This diagram is best explained from the bottom up:

- Bottom layer

  Each computer near the bottom of the chart represents a UNIX system with a specific security configuration. Multiple systems can be grouped together into a branch so that every system gets the same security policy (this helps with the manageability of policy). Each system includes some IBM Tivoli Access Manager for Operating Systems code that intercepts system calls and determines access based on policy. Each system caches data locally on disk so that it can continue to operate if it is separated from the Policy Director management server (this data is also used for performance reasons). IBM Tivoli Access Manager for Operating Systems does not go to the Tivoli Policy Director server for access control decisions; the decision is made locally. IBM Tivoli Access Manager for Operating Systems can send security events to IBM Tivoli Enterprise Console and IBM Tivoli Risk Manager.

- Middle layer

  Administration of IBM Tivoli Access Manager for Operating Systems is performed through one or more Policy Director servers, using the pdadmin command interface and the Web Portal Manager web interface. This provides a single view of all access policies, as well as a consolidated view of all users.

- Top layer

  The IBM Tivoli Access Manager for Operating Systems environment can be managed as a part of a larger provisioning implementation by adding IBM Tivoli Identity Manager. This allows IBM Tivoli Access Manager for Operating Systems to be integrated into the same administration model as other operating systems and provides a mechanism for centralizing the creation of user accounts (which must exist both in Tivoli Policy Director and on the individual UNIX systems).

21.3 Auditing

IBM Tivoli Access Manager for Operating Systems provides extensive auditing capabilities that permit you to track authorization access decisions made on protected resources, as well as to monitor activity of an administrative nature, such as the starting and stopping of the daemons. This section provides information about the types of events that can be audited, the format of the resulting log entries, and how to view the log entries.

21.3.1 Auditing authorization decisions

You can audit authorization access decisions for specific resources by enabling resource-based auditing. Use Protected Object Policy (POP) access controls to enable resource-based auditing, as follows:

1. Create a POP.
2. Set the audit level attribute to permit, deny, or both.
3. Attach the POP to the resources you want audited.

Audit records for authorization access decisions are also generated if the permit or deny level is set in the global audit level. The auditing levels for the global audit level and the resource audit level are cumulative. For example, if the global audit level is set to deny, and a resource has a POP attached to it with an audit level of permit, every authorization decision for access to that resource will be audited.

You can audit authorization decisions that are specific to login by setting the global loginpermit and logindeny audit levels. Setting the loginpermit global audit level results in the generation of audit records for all login authorization decisions that permit the login action. Setting the logindeny global audit level results in the generation of audit records for all login authorization decisions that deny the login action. Authorization decisions that are specific to login are also audited if the global permit and deny audit levels are set. The loginpermit and logindeny audit levels allow you to globally audit login separately from other authorization decisions.
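The cumulative rule can be made concrete with a small model. The following Python sketch is an illustration of the rules described in this section, not the product's implementation: the function name and parameters are hypothetical, and composite levels such as all and verbose are deliberately left out.

```python
def is_audited(outcome, global_levels, pop_levels=(), is_login=False):
    """Decide whether an authorization decision produces an audit record.

    outcome       -- "permit" or "deny"
    global_levels -- levels set in the global audit level
    pop_levels    -- levels set on the POP attached to the resource, if any
    is_login      -- True when the decision is a login authorization
    """
    if outcome not in ("permit", "deny"):
        raise ValueError("outcome must be 'permit' or 'deny'")
    # Global and resource (POP) levels are cumulative, so take their union.
    effective = set(global_levels) | set(pop_levels)
    if outcome in effective:
        return True
    # loginpermit and logindeny are global levels that cover logins only.
    return is_login and ("login" + outcome) in set(global_levels)
```

With a global level of deny and a POP level of permit, both is_audited("permit", {"deny"}, {"permit"}) and is_audited("deny", {"deny"}, {"permit"}) are true, which matches the example in the text.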

21.3.2 Auditing administrative activity You can audit administrative activity by setting the admin audit level in the global audit level. The admin audit level causes IBM Tivoli Access Manager for Operating Systems to generate audit records for events such as starting and stopping the IBM Tivoli Access Manager for Operating Systems daemons, loss of connectivity with the Tivoli Policy Director User Registry, TCB-related activity, such as a file being marked untrusted by the TCB monitoring function, and the detection of invalid policy. The admin audit level also causes the generation of audit records for events related to a user login account being enabled or disabled when a login activity policy is being enforced.

21.3.3 Auditing trace events

IBM Tivoli Access Manager for Operating Systems supports the auditing of trace_exec and trace_file audit events. Trace style audit events are generated by setting the trace_exec and trace_file levels in the global audit level.

Setting the trace_exec global audit level results in the generation of an audit record for each exec() system call. These records are generated regardless of whether the program being executed is protected by the IBM Tivoli Access Manager for Operating Systems policy or not. Setting the trace_file global audit level results in the generation of an audit record for each access to a file system resource that is protected by IBM Tivoli Access Manager for Operating Systems policy.

Note that trace_exec and trace_file audit records are generated only for processes that descend from a login event that was detected by IBM Tivoli Access Manager for Operating Systems. The following processes do not generate trace_exec or trace_file audit records:

- Processes that are started or descended from the UNIX init process during system boot.
- Processes that are active before IBM Tivoli Access Manager for Operating Systems and its processes are started.
- Processes that are running programs that are registered as Immune-Programs in the Trusted Computing Base (TCB).

21.3.4 Global audit levels

Table 21-1 shows the global audit levels for IBM Tivoli Access Manager for Operating Systems.

Table 21-1 Global audit levels

Audit level    Description
none           Turns off all auditing. This is the default value.
permit         Tracks all authorization decisions that permit access to a protected resource.
deny           Tracks all authorization decisions that deny access to a protected resource.
loginpermit    Tracks all login authorization decisions that permit the login.
logindeny      Tracks all login authorization decisions that deny the login.
admin          Tracks activity of an administrative nature. For example, if the global audit level has the admin level set, a log entry is created each time one of the IBM Tivoli Access Manager for Operating Systems daemons is started or stopped.
trace_exec     Tracks program invocations initiated by exec() that occur in processes that descend from a login event that was detected by IBM Tivoli Access Manager for Operating Systems.
trace_file     Tracks all accesses to protected files.
all            Enables all audit levels.
info           Tracks actions that are done automatically, such as receiving valid policy updates.
verbose        Enables all of the following audit levels: permit, deny, loginpermit, logindeny, admin, info.
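When reading a configuration that combines several level names, it can help to expand the shorthand levels into the basic ones. The following is a minimal sketch based only on Table 21-1; the function name expand_levels is hypothetical, and treating "all" as every basic level is an assumption, not something confirmed by the product documentation.

```python
# Level names come from Table 21-1; this helper is illustrative only.
BASIC_LEVELS = frozenset({
    "permit", "deny", "loginpermit", "logindeny",
    "admin", "trace_exec", "trace_file", "info",
})

# "verbose" is listed in the table as shorthand for these six levels.
VERBOSE_LEVELS = frozenset({
    "permit", "deny", "loginpermit", "logindeny", "admin", "info",
})


def expand_levels(levels):
    """Return the set of basic audit levels implied by the given names."""
    effective = set()
    for level in levels:
        if level == "none":
            continue  # "none" contributes no auditing
        elif level == "verbose":
            effective |= VERBOSE_LEVELS
        elif level == "all":
            effective |= BASIC_LEVELS  # assumption: every basic level
        elif level in BASIC_LEVELS:
            effective.add(level)
        else:
            raise ValueError(f"unknown audit level: {level}")
    return effective
```

For example, expanding verbose shows at a glance that the two trace levels are not part of it and must be set separately.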

21.3.5 Using warning mode to verify policy

You can check the effects of an authorization policy on a system without enabling enforcement of the policy by enabling warning mode. If warning mode is enabled, an audit record is generated for accesses to resources that would normally be denied due to policy but are granted because of warning mode. View the audit log to determine if the current authorization policy is having the desired effect. You can enable warning mode globally for all policy or for specific protected resources.

Note: If you enable global warning, you have no enforcement in effect. Make sure you enable enforcement again when required.

Enabling, disabling, and querying global warning mode

To enable global warning mode immediately, enter:

    pdosctl -w on

To disable global warning mode immediately, enter:

    pdosctl -w off

To enable global warning mode the next time IBM Tivoli Access Manager for Operating Systems is restarted, enter:

    pdoscfg -warning on

To disable global warning mode at the next restart of IBM Tivoli Access Manager for Operating Systems, enter:

    pdoscfg -warning off

To query the current global warning mode setting, specify the -w option with no arguments:

    pdosctl -w

The output is:

    The global warning mode setting is off
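If the query is run from a health-check script, the reply can be reduced to a simple flag. This is a sketch only: the function name is hypothetical, and the wording of the "on" reply is assumed to mirror the "off" message shown above.

```python
import re

# Matches the message shown above; the "on" variant is an assumption.
_WARNING_MODE = re.compile(r"global warning mode setting is (on|off)",
                           re.IGNORECASE)


def parse_global_warning(output):
    """Return True for on, False for off, None if the text is not recognised."""
    match = _WARNING_MODE.search(output)
    if match is None:
        return None
    return match.group(1).lower() == "on"
```

A result of True is worth flagging loudly, because global warning mode means that no policy is being enforced.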

21.4 Troubleshooting

This section covers troubleshooting information for IBM Tivoli Access Manager for Operating Systems.


21.4.1 IBM Tivoli Access Manager for Operating Systems log files

Each IBM Tivoli Access Manager for Operating Systems daemon maintains a log file that records significant events and error conditions. The records written to the log files contain a UTC time stamp, information identifying the IBM Tivoli Access Manager for Operating Systems code logging the message, the message classification, and the message text. The message classification indicates the severity of the message and is NOTIFY, WARNING, ERROR, or FATAL.

These log files are in the /var/pdos/log directory and are named pdos_daemon_name.log. The log files can be useful for diagnostic purposes. Figure 21-2 shows the log files used by IBM Tivoli Access Manager for Operating Systems.

Figure 21-2 IBM Tivoli Access Manager for Operating Systems log files
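A quick way to triage a daemon log is to count records by classification. The sketch below assumes only that the classification appears as a literal word (NOTIFY, WARNING, ERROR, or FATAL) somewhere in each record; the exact record layout is not given here, so no field positions are assumed, and the function name is hypothetical.

```python
import re
from collections import Counter

SEVERITIES = ("NOTIFY", "WARNING", "ERROR", "FATAL")
_SEVERITY_WORD = re.compile(r"\b(" + "|".join(SEVERITIES) + r")\b")


def count_severities(lines):
    """Count log records by the first classification word found on each line."""
    counts = Counter()
    for line in lines:
        match = _SEVERITY_WORD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It could be applied to a file with something like count_severities(open(path)), where path points at one of the logs in /var/pdos/log.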

21.4.2 Installation problems

How do you determine if IBM Tivoli Access Manager for Operating Systems is installed on a UNIX managed node or endpoint (IBM Tivoli Access Manager for Operating Systems machine)? Run the following commands on the IBM Tivoli Access Manager for Operating Systems machine:

• Generic check: Has IBM Tivoli Access Manager for Operating Systems been installed?

    ls -l /opt/pdos/bin
    ls -l /opt/pdos/kernel
    ls -l /var/pdos

• Platform-specific check: Was the IBM Tivoli Access Manager for Operating Systems installation successful?

    AIX:            lslpp -l | grep -i pdos
    Solaris:        pkginfo -i | grep -i pdos
    HP-UX:          swlist -l fileset | grep -i pdos
    Red Hat Linux:  rpm -qa | grep pdos

If IBM Tivoli Access Manager for Operating Systems is not installed, make sure all prerequisite products are installed (see the IBM Tivoli Access Manager for Operating Systems Release Notes, GI11-0951) and then reinstall IBM Tivoli Access Manager for Operating Systems (see the TSSM Supplement for Policy Director, GC32-0473). If that fails, check the installation process. Also, check the disk space on the IBM Tivoli Access Manager for Operating Systems machine.
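The platform-specific checks above can be selected automatically in a wrapper script. This is a sketch under one assumption: the dictionary keys are the system names that Python's platform.system() (that is, uname -s) reports on each platform. The function name is hypothetical; the command strings are the ones listed in the text.

```python
# Keys are uname -s style system names; values are the checks from the text.
INSTALL_CHECKS = {
    "AIX": "lslpp -l | grep -i pdos",
    "SunOS": "pkginfo -i | grep -i pdos",
    "HP-UX": "swlist -l fileset | grep -i pdos",
    "Linux": "rpm -qa | grep pdos",
}


def install_check_command(system_name):
    """Return the package query for the given system name."""
    try:
        return INSTALL_CHECKS[system_name]
    except KeyError:
        raise ValueError(f"no install check known for {system_name!r}") from None
```

A caller would typically pass platform.system() and run the returned string through a shell.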

21.4.3 Configuration problems

Check that IBM Tivoli Access Manager for Operating Systems has been successfully configured. The output from the pdoscfg command will show either:

    "The configuration process completed successfully"

or:

    "The configuration process did not complete successfully. See /var/pdos/log/pdoscfg.log for details."

If configuration is not successful, search /var/pdos/log/pdoscfg.log for the string ERROR, and then look at the surrounding lines for more information. Typical configuration problems are caused by:

• Wrong certificate files used.
• No connectivity to the Policy Director management server or LDAP server.
• Policy Director management server or LDAP server set up incorrectly.
• Not enough disk space in /var/pdos.
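Looking at the lines around each ERROR entry can be automated. The following sketch works on a list of lines already read from the log; the function name and the default radius of two lines are arbitrary choices for illustration.

```python
def error_context(lines, radius=2, marker="ERROR"):
    """Return (line_number, text) pairs for marker lines and their neighbours."""
    wanted = set()
    for index, line in enumerate(lines):
        if marker in line:
            start = max(0, index - radius)
            stop = min(len(lines), index + radius + 1)
            wanted.update(range(start, stop))
    # Line numbers are 1-based, as an editor would show them.
    return [(i + 1, lines[i]) for i in sorted(wanted)]
```

Overlapping windows are merged, so two errors close together produce one contiguous excerpt rather than duplicated lines.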

Check connectivity to Tivoli Policy Director server

Enter the following command on the IBM Tivoli Access Manager for Operating Systems machine (as root):

    pdadmin -a sec_master -p

If it works, try:

    pdadmin> acl list


If that works, you have basic connectivity to the Tivoli Policy Director server. If you do not have connectivity, try the following:

• Make sure you can ping the Tivoli Policy Director server machine.
• Make sure the Policy Director Runtime configuration parameters were entered correctly (look at /opt/PolicyDirector/etc/pd.conf).
• Make sure the Tivoli Policy Director certificate files on the IBM Tivoli Access Manager for Operating Systems machine and the Tivoli Policy Director server machine match (use the cksum UNIX command).
• On the Tivoli Policy Director server machine, make sure the pdmgrd process is running.

Check connectivity to LDAP server

Enter the following command on the IBM Tivoli Access Manager for Operating Systems machine (as root):

    ldapsearch -h -p -D cn=root -w -b "" -s base -v "objectclass=*"

Note: The LDAP server name and port number are in the /opt/PolicyDirector/etc/ldap.conf file.

If that works, you have basic connectivity to the LDAP server. If you get the error message "Can't contact LDAP server", try the following:

• Make sure you can ping the LDAP server machine.
• Make sure the LDAP certificate file used during configuration of IBM Tivoli Access Manager for Operating Systems matches the certificate used by the LDAP server (use the cksum UNIX command).
• Check the Policy Director Runtime configuration information. Look at /opt/PolicyDirector/etc/ldap.conf.
• On the LDAP server machine, make sure the slapd process is running.

Check SSL connectivity to LDAP server

Enter the following command on the IBM Tivoli Access Manager for Operating Systems machine (as root):

    ldapsearch -h -p -Z -K /var/pdos/certs/pdos.kdb -b " " -s base -v "objectclass=*"

Note: The LDAP server name and SSL port number are in the /opt/PolicyDirector/etc/ldap.conf file. The default SSL port number is 636.


If that works, you have SSL connectivity to the LDAP server. If it fails, try the following:

• Make sure the LDAP certificate file used during configuration of IBM Tivoli Access Manager for Operating Systems matches the certificate used by the LDAP server (use the cksum UNIX command).
• Check network connectivity using ping, if you have not already done so.

Check certificate files

IBM Tivoli Access Manager for Operating Systems configuration uses two certificate files:

• The Policy Director server certificate file (for example, pdcacert.b64) is created when the Tivoli Policy Director server is configured (/var/PolicyDirector/keytabs/pdcacert.b64). PDMgr can be configured to let the certificate be downloaded automatically.
• The LDAP server SSL certificate file (for example, ldapcacert.b64) is created during LDAP server configuration.

These files have to be placed on the IBM Tivoli Access Manager for Operating Systems machine before the IBM Tivoli Access Manager for Operating Systems installation/configuration. Use the IBM Tivoli Access Manager for Operating Systems provided Certificate Transfer Utility or ftp (binary mode) to transfer the file. If the wrong certificate files are placed on the IBM Tivoli Access Manager for Operating Systems machine, the IBM Tivoli Access Manager for Operating Systems configuration will fail.
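The text recommends cksum for confirming that a transferred certificate matches the original. Where cksum output is awkward to compare, a cryptographic digest serves the same purpose. The sketch below swaps in SHA-256 from Python's standard library; it is not the cksum algorithm, and the function names are hypothetical.

```python
import hashlib


def digest_bytes(data):
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in binary mode."""
    with open(path, "rb") as handle:
        return digest_bytes(handle.read())


def files_match(path_a, path_b):
    """True if both files have identical contents (by digest)."""
    return file_digest(path_a) == file_digest(path_b)
```

When the two copies live on different machines, file_digest can be run on each machine and the resulting strings compared by eye. A mismatch after an ftp transfer often points to a transfer that was not done in binary mode.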

Check disk space on IBM Tivoli Access Manager for Operating Systems machine

Use the following command to check the disk space on UNIX platforms:

    df -k

The -k option causes disk space to be displayed in KB. The command displays the amount of free space for each mounted file system. If the root file system (mounted on "/") is out of space or very low on space, there are likely to be problems with the OS itself, not just IBM Tivoli Access Manager for Operating Systems.

The IBM Tivoli Access Manager for Operating Systems Release Notes recommend that there be separate file systems mounted on /var/pdos, /var/pdos/log, and /var/pdos/audit. These are key to IBM Tivoli Access Manager for Operating Systems. If one is very low on space, this may adversely affect IBM Tivoli Access Manager for Operating Systems. If there is no file system mounted on /var/pdos, then check /var.

Note: HP-UX and Solaris systems by default only allow non-root writes to a file system if it is less than 90% full. This applies to file systems of type hfs and ufs. Most of the IBM Tivoli Access Manager for Operating Systems daemons and commands run as the osseal user, so they are subject to this limit.
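The 90% figure in the note suggests a simple threshold check on the key directories. The sketch below uses shutil.disk_usage from the Python standard library; the function names and the default threshold parameter are illustrative, and the calculation ignores any space reserved for root, so df output remains the authoritative figure.

```python
import shutil


def percent_used(total, free):
    """Return the percentage of a file system that is in use."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (total - free) / total


def low_on_space(path, threshold=90.0):
    """True if the file system holding path is at or above the threshold."""
    usage = shutil.disk_usage(path)
    return percent_used(usage.total, usage.free) >= threshold
```

Candidate paths to check are /var/pdos, /var/pdos/log, and /var/pdos/audit, falling back to /var when no separate file system is mounted.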

21.4.4 Run-time problems

Check the following for IBM Tivoli Access Manager for Operating Systems run-time problems:

• Make sure all IBM Tivoli Access Manager for Operating Systems daemons are running.
• Check for core files (/var/pdos//core).
• Check the log files in /var/pdos/log (and /var/pdos/pdostecd).
• Check the policy with pdadmin or Web Portal Manager.
  – pdosd.log and admin auditing show whether replica updates were handled.
  – Use audit as a verification tool.
• Verify user credentials.
  – pdoswhoami displays access info associated with the invoking user.
  – pdoswhois displays access info associated with a specific process ID.
  – The pdosrefresh command refreshes credentials.
  – Use the pdadmin command to verify user info in Policy Director.
• Check the disk space on the IBM Tivoli Access Manager for Operating Systems machine.
• Use debug tracing if the problem is reproducible.
• Use pdosexempt to exempt a shell window for debug purposes.

Verification of policy

Use warning mode to verify the policy:

• Check the effect of authorization policy without enforcement. Warning mode is disabled by default. An audit record is generated for accesses to resources that would normally be denied.
• Enable or disable global warning mode with the following commands (resource warning mode is set through the warning attribute of a POP):
  – pdosctl -w on
  – pdosctl -w off
  – pdoscfg -warning on
  – pdoscfg -warning off
• Global warning mode on means no policy enforcement. Make sure you enable enforcement again when required.

You can also use auditing to verify policy:

• Set the audit level, globally or resource specific.
• Monitor the effects of the authorization policy.

Note that this might not be feasible in environments where the customer uses auditing extensively, as they might not want to change their predefined levels. In such cases, use pdadmin to view the policy, and refer to "Checking the health of the policy database" to compare the sequence number of the replica database with that of the Master Policy Database.

Auditing to verify the policy

To enable warning mode for accesses to a protected object name, issue the following commands:

    pdadmin> object create /OSSEAL/Default/NetIncoming/tcp/telnet/*.company.com "company.com" 0 ispolicyattachable yes
    pdadmin> pop create sample_pop
    pdadmin> pop modify sample_pop set warning yes
    pdadmin> pop attach /OSSEAL/Default/NetIncoming/tcp/telnet/*.company.com sample_pop

To disable:

    pdadmin> pop modify sample_pop set warning no

To query:

    pdadmin> pop show sample_pop

Note: Wait for policy to be downloaded after changes have been made.

NetIncoming accesses using telnet from systems with host names that match the pattern *.company.com that would normally be denied are now permitted. An audit record is generated that shows that access was permitted due to resource warning mode. To disable warning mode, set the warning attribute to no or detach the POP from the protected object name.
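When the same resource warning mode setup has to be repeated for several protected objects, the pdadmin lines can be generated rather than typed. This is a sketch only: the function name is hypothetical, it emits just the pop subcommands used in this section, and it assumes the protected object already exists.

```python
def resource_warning_commands(object_name, pop_name, enable=True):
    """Return pdadmin command lines that switch resource warning mode."""
    if enable:
        return [
            f"pop create {pop_name}",
            f"pop modify {pop_name} set warning yes",
            # pop attach takes the protected object name, then the POP name
            f"pop attach {object_name} {pop_name}",
        ]
    return [f"pop modify {pop_name} set warning no"]
```

The returned lines are meant to be pasted at the pdadmin> prompt. If the POP already exists, the pop create line should be dropped.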


Verify credentials

You can use the following commands to verify credentials:

• The pdoswhoami command displays access info associated with the invoking user.
• The pdoswhois command displays access info associated with a specific process ID.
• The pdosrefresh command refreshes credentials.
• The pdadmin command verifies user info in Policy Director.

Checking the health of the policy database

IBM Tivoli Access Manager for Operating Systems stores a replica policy database that is an exact copy of the master policy database on the Access Manager Policy Server. Inconsistencies between the two databases may cause problems. To check for this, you can use the pdacld_dump utility that comes with the Access Manager Policy Server (PDMgr). The file is located in the /opt/PolicyDirector/sbin directory. The utility can be downloaded to the IBM Tivoli Access Manager for Operating Systems machines and run against the replica database. Compare the sequence number reported by the tool with the sequence number reported in the pdosd.log file. The following is an example usage:

    /opt/PolicyDirector/sbin/pdacld_dump -f /var/PolicyDirector/db/master_authzn.db -s

Example 21-1 shows a summary for master_authzn.db.

Example 21-1 Summary for master_authzn.db

    Dumped 4620 of 4620 objects.
    DB Sequence number :33121
    DB SSL Sequence number :1062
    FrequencyCount vs ObjectType vs BasePrefix summary
    971:1281:/auth/pobject-map
    0 invalid objects were encountered.
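Comparing sequence numbers by eye is error prone when several replicas are involved. The following sketch extracts the two sequence numbers from summary text in the format shown in Example 21-1; the function name is hypothetical, and the patterns assume the labels appear exactly as in that example.

```python
import re

_DB_SEQ = re.compile(r"DB Sequence number\s*:\s*(\d+)")
_SSL_SEQ = re.compile(r"DB SSL Sequence number\s*:\s*(\d+)")


def parse_sequence_numbers(dump_output):
    """Return (db_sequence, ssl_sequence); each is None if not found."""
    db = _DB_SEQ.search(dump_output)
    ssl = _SSL_SEQ.search(dump_output)
    return (
        int(db.group(1)) if db else None,
        int(ssl.group(1)) if ssl else None,
    )
```

Running the parser on the master summary and on each replica summary gives values that can be compared directly; a replica whose DB sequence number lags behind the master has not yet received the latest policy.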


Appendix A. Tivoli/Windows whitepaper

This appendix provides details on Tivoli implementation on Windows platforms. This chapter is intended for those responsible for planning, implementing, and supporting Tivoli Management Framework in an environment with the Microsoft Windows workstation and server family. We will focus only on the basic services provided by Tivoli Management Framework (TMF) and Tivoli Management Agent (TMA). As these basic services are responsible for the spawning of processes and enforcing of security, this chapter will provide a good foundation in how Tivoli applications interact with the Windows NT 4.0, 2000, and XP operating systems.


Scope

This chapter is written to cover Tivoli Management Framework Versions 3.7.x and 4.1. It addresses both the Tivoli Management Framework (TMF) and the Tivoli Management Agent (TMA). Windows NT 4.0, 2000, and XP environments are covered in this chapter.

Conventions

This chapter will address both the classic Tivoli Management Framework and the Tivoli Management Agent. The following abbreviations are used:

TMF    Tivoli Management Framework (oserv service).
TMA    Tivoli Management Agent (lcfd service).
NT     Unless otherwise noted, NT will describe NT 4.0, Windows 2000, and XP.

Prior to the General Availability release of Version 3.6, the TMA endpoint was referred to as the LCFD or LCF. This chapter will use lcfd to describe the process running on the NT system; otherwise, TMA will be used to describe the general implementation of Tivoli's endpoint technology.

Tivoli Authentication Package (TAP)

One of the fundamental differences between the UNIX and NT implementations of Tivoli Management Framework is the Tivoli Authentication Package. This section will explore the purpose and implementation of TAP.

Why TAP

A requirement of the Tivoli Object Request Broker (oserv) and the Tivoli Management Agent (lcfd) is that they be able to run methods in the context of a given user associated with the method. That is, the resources accessible to the method are those accessible to the given user. Such methods are known as setuid methods. The Tivoli Authentication Package (TAP) is installed and loaded by the Local Security Authority (LSA) subsystem of the NT system, allowing setuid methods to work on NT.


Understanding TAP

While Tivoli utilizes APIs provided by Microsoft, security reviews often flag Tivoli as a risk because of its use of the Authentication Package. This perceived risk often impacts deployment schedules. In most cases, we advise you to review this chapter with the Security team, and to highlight this section when discussing the level of risk associated with using Tivoli.

The Tivoli Authentication Package (TAP) functions as an extension to the Local Security Authority (LSA) subsystem in Windows NT/2000/XP. TAP itself does not create tokens, and there is no way to invoke TAP functions directly. Rather, a process calls the LSA to request a token and specifies which authentication package to use. The %SYSTEMROOT%\system32\TivoliAP.dll file is Tivoli's implementation of this authentication package. It is registered with the LSA, under the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\AuthenticationPackages. When LSASS.EXE (the LSA Service) starts, it loads TivoliAP.dll and holds the file open. This is the reason a reboot is necessary after an initial installation or a patch that revises this DLL file.

The calling process provides required data in the format expected by the selected package, and the LSA passes it to the authentication package for verification. If the data is OK, the authentication package returns a SID and other information back to the LSA, which then creates a token and returns it to the original caller.

Processes that call the LSA and select TAP as the authentication package must meet the following requirements to prevent unauthorized use:

1. The calling process must know the TAP protocol. Various structures and data are provided to the LSA and subsequently passed to TAP, as described above. Knowledge of this protocol requires access to the source code.
2. The calling process must know how to pass specific TME credentials. The caller must retrieve this value and pass it to TAP, as described by item 1.
3. The calling process must have SeTcbPrivilege ("Act as part of the operating system"). By default, only the SYSTEM account has this privilege, but a member of the Administrators group can assign it.

Impersonating a user account requires a valid process token. The oserv and lcfd processes request this token from the LSA, specifying TAP as the desired authentication package. If the LSA grants the request, it generates a process token that contains the same information as if the impersonated user had actually logged on to the local system, except that the token does not contain credentials for accessing remote resources (for example, network shares) as the requested user. If a Tivoli Remote Access Account (TRAA) has been set, the token will contain credentials for accessing remote resources as the TRAA (TRAA is explained in detail in the next section).


Tivoli ID maps have no effect on this procedure. They simply provide a convenient way to map a single label to specific accounts on different platforms. For example, a task configured to run as $root_user may run as Administrator on Windows NT or as root on Solaris. Once the ID map has been resolved to the actual account name, the impersonation procedure is the same as described above. Tivoli ID maps contain no actual account information.

The safeguard that prevents the unauthorized use of TAP is the Windows NT/2000/XP security system itself. The LSA requires the calling process to have the SeTcbPrivilege before it will grant any requests. This applies to all authentication packages, not just TAP. Only the SYSTEM account has SeTcbPrivilege by default, and only a member of the Administrators group can grant this privilege.

When the Tivoli Management Framework is installed, a Tivoli_Admin_Privileges group is created and the SeTcbPrivilege is granted to this group. The member of the Administrators group who is installing TMF and the built-in Administrator account are made members of this group. The account mapped to $root_user (by default, the account that installs TMF or the Administrator account) must be a member of this group so that TMF tasks may be run as the requested user.

Any process that is already running as SYSTEM or Administrator has unrestricted access to the system. Even if a rogue process were installed and running as SYSTEM or Administrator, it would still have to know the TAP password and protocol before it could use the LSA and TAP to generate impersonation tokens for accounts without knowing their individual passwords. Any attempts to discover the TAP password can be monitored by configuring Windows to log all failed logons. The TAP password on a machine may be changed to a new random value by shutting down the oserv or lcfd process, running wsettap -a or wlcftap -a, and restarting the oserv or lcfd process.
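The ID map idea, one label resolved to a different account per platform, can be pictured as a two-level lookup. The sketch below is a conceptual model only and does not reflect how Tivoli stores ID maps; the dictionary contents, the platform keys, and the function name are all illustrative assumptions.

```python
# Illustrative data: one label, resolved differently per platform type.
ID_MAP = {
    "$root_user": {"w32-ix86": "Administrator", "solaris2": "root"},
}


def resolve_id(label, platform_type, id_map=ID_MAP):
    """Resolve an ID map label to an account name for a platform.

    Names that are not map labels are returned unchanged (an assumption
    made for this sketch).
    """
    entries = id_map.get(label)
    if entries is None:
        return label
    return entries[platform_type]
```

Only the resolved account name is handed to the impersonation procedure; the map itself carries no passwords or other account information.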

Understanding the Tivoli Remote Access Account (TRAA)

TRAA serves the following functions:

• It is used to access remote resources in NT/2000 (as if doing 'net use').
• It is necessary because processes spawned by TAP do not have credentials (a password).
• It is used to share binaries for managed nodes (so that they did not have to be installed on each system); this is rarely used.

TRAA is not:

• Related to TAP.
• Required by Tivoli Management Framework; it can be left unset unless a Tivoli application specifically requires it.

The purpose of the Tivoli Remote Access Account (TRAA) is to allow access to remote NT/2000 resources, like network shares, through the oserv/lcfd. Although the installation and the documentation refer to the TRAA account being used to remotely share binaries, the TRAA account has much wider consequences.

When a program runs in NT, a logon token is associated with that program. Logon tokens are created when a user logs in, when the LogonUser() function is called, or by authentication packages such as TAP. Logon tokens contain a user account name used to access local resources and zero or more credentials used to access remote resources.

When an NT/2000 program accesses a local resource, it uses the user account referenced in the program's logon token to determine if it can access the local resource. When an NT/2000 program accesses a remote resource, such as a network share, the LAN Manager authenticates that the logon token associated with the requesting program has the correct set of credentials that allows it to access that remote resource.

A credential contains an account name and that account's encrypted password. A user logging in, or a call to the LsaLogonUser() function, creates a logon token that initially has one credential. This one credential contains the user and password for the account used to create the logon token. Additional credentials may be added to a logon token. When you specify an account and password to connect to a remote disk drive, a credential containing that account name and password is added to your logon token.

Logon tokens created by TAP do not have any credentials that contain the user and password for the account used to create the logon token, because the password is not used to create the token. Instead, if a TRAA has been specified using the -r option of wsettap.exe (TMF) or wlcftap.exe (TMA), TAP creates a logon token and adds the credentials of the TRAA to that token. Therefore, a program with a logon token created by TAP accesses local resources accessible to the user specified to TAP when the token was created (the setuid user), and will access remote resources using the TRAA (if any).

The -r option of the wsettap.exe (TMF) and wlcftap.exe (TMA) commands is used to set and change the TRAA. The account information is stored in the registry key HKEY_LOCAL_MACHINE\Security\Policy\Secrets\Tivoli Remote Access Account Credential\. By default, HKEY_LOCAL_MACHINE\Security is unreadable.
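The difference between a normal logon token and one created by TAP can be modelled in a few lines. This is a conceptual sketch, not Windows API code: the class, its fields, and the helper functions are hypothetical and only mirror the description above (one user for local access, zero or more credentials for remote access).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LogonToken:
    user: str                                   # identity for local resources
    credentials: List[str] = field(default_factory=list)  # for remote access

    def remote_identity(self) -> Optional[str]:
        """Account presented to remote resources, or None if there is none."""
        return self.credentials[0] if self.credentials else None


def interactive_logon(user):
    """A password-based logon: the token carries the user's own credential."""
    return LogonToken(user, [user])


def tap_logon(user, traa=None):
    """A TAP logon: no password is used, so only the TRAA (if set) is added."""
    return LogonToken(user, [traa] if traa else [])
```

In this model, tap_logon("fred") yields a token that works locally as fred but has no remote identity, while tap_logon("fred", traa="DOMAIN\\svc") reaches remote shares as the TRAA rather than as fred.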


The TRAA does not have any impact on the local managed node or endpoint, and does not need to be set for the standard Tivoli Management Framework functions to work. If design requirements of the Tivoli Management Region require that some application have access to a remote resource, the TRAA account only needs the access necessary for those remote resources.

An example of using the TRAA account would be a FilePackage's afterscript that copies the install log to a remote share. Another example would be doing an admin install of an MSI from a remote share. The TRAA account could be set to \ and the share would allow write access to this share by the \ account. Another example would be tasks created to start a particular service remotely using the sc.exe or netsvc.exe command (part of the NT Resource Kit) if a TEC event arrived saying that the service was down. One design of the task could be to run the task locally on the machine as a local or domain NT administrator. Another design would be to set the TRAA account on one of the TMF or TMA nodes to a domain level administrator, and execute the task on this node.

While TAP has no dependencies on password changes, the nodes will be impacted if the TRAA account's password is changed. Because each NT managed node or endpoint stores its TRAA account, including the password, locally, a change to the account requires that wsettap.exe (TMF) or wlcftap.exe (TMA) be run on each node to update the password for the TRAA. If the account or password has changed in the domain but the TRAA account has not been updated, oserv will not start, and lcfd will start but may come up with TAP disabled. If the TRAA account is to be used in the environment, it is best that the account and password remain static.

Order of account selection

The account selection process starts by attempting to resolve an unqualified user account using the local SAM database; if unsuccessful, it resolves the name in the domain, and then in trusted domains. An example of an unqualified account would be where a user is specified as fred, instead of ROCK\fred. In this case, when the task is invoked on an NT endpoint, first there is a search for MACHINE\fred. If this local account does not exist, then the domain of which the system is a member is searched; finally, any trusted domains are searched in an attempt to resolve the account. Especially in complex domains with various trusts, it is best to fully qualify the user ID.
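The search order described above can be written down as a short function that lists the candidates in the order they would be tried. This is an illustrative sketch; the function name and parameters are hypothetical, and it models only the ordering, not the actual lookups.

```python
def resolution_order(account, machine, domain, trusted_domains=()):
    """List the qualified names tried for an account, in search order."""
    if "\\" in account:
        # Already fully qualified: no search is needed.
        return [account]
    scopes = [machine, domain, *trusted_domains]
    return [f"{scope}\\{account}" for scope in scopes]
```

For the example in the text, resolution_order("fred", "MACHINE", "ROCK", ["TRUSTED"]) lists MACHINE\fred first, then ROCK\fred, then TRUSTED\fred. A local fred account therefore shadows the domain account of the same name, which is one reason to qualify the user ID.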


wsettap.exe/wlcftap.exe

The wsettap.exe and wlcftap.exe programs manage the TMF and TMA nodes, respectively, in several ways:

• Enable (-a) and disable (-d) the Tivoli Authentication Package in the LSA list of authentication packages.
• Enable and modify (-r and -k) the TRAA account.
• Introduced in Tivoli Management Framework Version 3.7.1: select which Domain Controllers are targeted for authentication requests.
  – -P: Targets only the Primary Domain Controller (PDC); this is the default setting.
  – -B: Targets any Domain Controller (Primary or Backup); this is the suggested setting.

It is often said that the TAP account has been modified using wsettap/wlcftap. This is not technically correct, as there is no such thing as a TAP account. When wsettap -r or wlcftap -r is issued, the TRAA account is modified. Because the TRAA account is configured using the same command that also configures TAP behavior, these two concepts frequently get confused.

Tivoli accounts

This section will discuss the accounts created by the Tivoli installation process, and how certain enterprise policies affect TMF and TMA on NT.

Accounts created

Tivoli introduces two accounts at installation time: the user tmersrvd and the group Tivoli_Admin_Privileges. These accounts are created locally in the NT/2000 target's SAM database and are configured the same for TMF and TMA.

On Windows NT 4.0 only, you can view User Rights changes in User Manager under the Policies menu option. On Windows 2000 and Windows XP, you can view User Rights within the appropriate Security Policy editor (Local Security Settings on a member server and Domain Controller Security Policy on Active Directory Domain Controllers). User rights can also be changed individually using the ntrights.exe application in the resource kit (for example, by running ntrights +r SeChangeNotifyPrivilege -u tmersrvd).

tmersrvd

The tmersrvd account is an unprivileged account. A password is randomly generated at installation and the account can be disabled without affecting the framework. Many of the Tivoli methods will run in the context of tmersrvd. The password can also be changed with no adverse effect on Tivoli. The user rights required are:

• Bypass Traverse Checking (SeChangeNotifyPrivilege)

  The tmersrvd account does not get assigned to this User Right directly. A new install of NT will assign the special group 'Everyone' to Bypass Traverse Checking (in non-US versions of NT, this group will be referred to by the local language equivalent). This allows a user to traverse to a specific directory tree even if the user has no access to the parent directories. If security policies in the corporate enterprise disallow Bypass Traverse Checking for Everyone, the tmersrvd account needs to be added directly to this user right.

• Log on Locally (SeInteractiveLogonRight)

  This is assigned to the tmersrvd account during installation of the Tivoli Management Framework.

• Read/Execute file permissions on %Systemroot%\system32\ (in order to utilize TAP)

  – Under Windows NT, the default file permissions were typically Everyone:Read/Write/Execute.
  – Under Windows 2000/XP, these permissions were tightened and Everyone was removed. It is important that tmersrvd explicitly has permissions to this directory.

• Read/Write/Execute file permissions on %TEMP% (typically %Systemroot%\TEMP)

  – Under Windows NT, the default file permissions were typically Everyone:Read/Write/Execute.
  – Under Windows 2000/XP, these permissions were tightened and Everyone was removed. It is important that tmersrvd explicitly has permissions to this directory.

Tivoli_Admin_Privileges

By default, the built-in Administrator account is made a member of this group; on an NT TMR Server, the account used to install the TMR Server is made a member of this group. The group has three required advanced User Rights:

• Act as Part of the Operating System (SeTcbPrivilege)

  The user can act as a trusted part of the operating system.

• Increase Quotas (SeIncreaseQuotaPrivilege)

  The user can increase object quotas. Each object has a quota assigned to it.

• Replace a Process Level Token (SeAssignPrimaryTokenPrivilege)

  The user can modify a process' access token.

Note: Windows 2000 does not always honor the inheritance of these privileges from the Tivoli_Admin_Privileges group. See "Common problems, troubleshooting, and FAQs" for more information.

The Act as Part of the Operating System privilege is required when running the wsettap/wlcftap command with no options specified, because the command communicates with the LSA to retrieve the current configuration of TAP. Other operations of the wsettap/wlcftap command communicate with the registry and not with the LSA, so they do not require a special privilege (although any invocation of the wsettap/wlcftap command must be made by a member of the Administrators group).

The Increase Quotas and Replace a Process Level Token privileges assigned to the Tivoli_Admin_Privileges group are the privileges required to start a process as a different user. The run_task and sentry_engine methods, for example, require these privileges. These methods run as the $root_user. This means that if you change the value of the $root_user idmap, you must ensure that the new account is a member of the Tivoli_Admin_Privileges group. If the account is not part of the Tivoli_Admin_Privileges group, TMF/TMA nodes will receive the error tap_call_init failed, error 38.

Accounts used by Tivoli Management Framework

Tivoli Management Framework can use the following types of accounts in the environment: the account used for the initial installation of TMF/TMA, system, privileged, unprivileged, idmap, TRAA, and any user-defined accounts.

Installation account

The installation account is used at the time that the TMF or TMA files are being installed. For TMF, this is when the initial framework is being installed through the classic install or SIS. For TMA, this would be when the endpoint software is being installed through SIS or winstlcf. This installation account can be a local account or a domain account that is resolvable by this node. This account has two requirements:

• Part of the Administrators group
• Log on Locally user right

Once the Tivoli Management Framework (TMF) or Lightweight client (TMA) has been installed, this installation account is not used. Therefore, the account can be revoked or disabled.

Important: The account used for installation of an NT TMR Server does have a special role. When installation completes, the $root_user idmap will map that installation account to the w32-ix86 interp. Therefore, prior to removing/disabling this account, set the $root_user idmap to another Administrator and map the new login to the TivoliRoot Administrator (see also the wauthadmin command). Also, be sure to include this new Administrator in the Tivoli_Admin_Privileges group.

For example, a company creates a domain administrator account, MASTER\tivinstall. The MASTER domain is the top-level domain, and there are several resource domains that have a 2-way trust relationship with MASTER. When TMF/TMA nodes are being installed, an NT Administrator in the MASTER domain enables the account and provides a password to the Tivoli Administrator responsible for installation. This Tivoli Administrator logs in to the TMR with their normal account and specifies the MASTER\tivinstall user and password while installing the nodes. After the installations are complete, the NT Administrator disables MASTER\tivinstall until additional installations are required.

System account

Tivoli Management Framework runs the oserv (TMF) and lcfd (TMA) services as NT AUTHORITY\SYSTEM. In addition, TMF installations have the spider.exe process (HTTP server) and possibly the gateway.exe process (used to communicate with TMA endpoints) running as NT AUTHORITY\SYSTEM.

Privileged account

Tivoli Management Framework uses the privileged account when a management function of the Tivoli Management Framework needs access to privileged resources on the NT system. Tivoli Management Framework will request that a certain program be started as this privileged account to access this resource.

Unprivileged account

The unprivileged account is the tmersrvd account; it is analogous to the nobody account on UNIX. Tivoli Management Framework will attempt to run many executables as this user to minimize any possible security compromises. If the tmersrvd account is not found in the local NT system's SAM database, Tivoli Management Framework will look to the domain for this account. The tmersrvd account must exist in order for Tivoli to function properly.

As noted earlier, this account has no real access. However, if security concerns require that all local accounts must be disabled, the tmersrvd account can be disabled, if that account is in the local account database. Depending on your Tivoli Management Framework patch level, you may disable the tmersrvd account on the domain/Active Directory. If you are experiencing problems, an easy test is to enable the domain tmersrvd account. One consideration regarding using a domain-based tmersrvd account: due to the load placed on the Primary Domain Controller, it is possible to saturate the PDC/BDCs with authentication calls.

ID map account

Because Tivoli Management Framework spans a heterogeneous environment, the idmap was introduced to provide a means of mapping a special ID (referred to as an idmap) to an OS-specific user account. On NT, the idmap may contain a reference to w32-ix86, which is the definition within Tivoli to describe an NT node. The idmap $root_user is pre-configured to resolve to the account Administrator on NT/2000. This map is used for various processes on NT. When the oserv/lcfd service is asked to resolve the method that is to run as $root_user, it will look for the string Administrator. Beginning with NT 4.0, the idmap $root_group or another group ID must exist for an administrator.

Idmaps can be modified to reflect a naming convention within an enterprise using the widmap command:

  # widmap list_maps
  root_group
  root_user
  tme_sec
  # widmap list_entries root_user
  default root
  nw3 supervisor
  nw4 Admin
  os400-v3r2 QTIVROOT
  os400-v3r7 QTIVROOT
  w32-ix86 Administrator
  # widmap rm_entry root_user w32-ix86

(The rm_entry step is optional; if you add_entry to an existing entry, it will be replaced.)

  # widmap add_entry root_user w32-ix86 TivAdmin
  # widmap list_entries root_user
  w32-ix86 TivAdmin
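The replacement behavior of add_entry described above can be pictured with a small Python model. This is only an illustrative sketch, not the Tivoli implementation: the dictionary layout and the function names add_entry, rm_entry, and resolve are hypothetical, and the fallback to the default entry when an interp has no entry of its own is an assumption made for the example.

```python
# Hypothetical in-memory model of the root_user idmap listed above.
# Keys are interp names; values are OS-specific account names.
root_user = {
    "default": "root",
    "nw3": "supervisor",
    "nw4": "Admin",
    "os400-v3r2": "QTIVROOT",
    "os400-v3r7": "QTIVROOT",
    "w32-ix86": "Administrator",
}


def add_entry(idmap, interp, value):
    """Add an entry; an existing entry for the same interp is replaced."""
    idmap[interp] = value


def rm_entry(idmap, interp):
    """Remove the entry for an interp, if one exists."""
    idmap.pop(interp, None)


def resolve(idmap, interp):
    """Return the account for an interp (assumed fallback: the default entry)."""
    return idmap.get(interp, idmap.get("default"))


# No rm_entry is needed first: add_entry overwrites the old value.
add_entry(root_user, "w32-ix86", "TivAdmin")
print(resolve(root_user, "w32-ix86"))
```

The model shows why the rm_entry step is optional: adding an entry for an interp that already has one simply overwrites the previous value.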

The root_group idmap for NT is not actually used when a process starts. However, it is important that the root_group map has a group listed for NT, although it does not need to be a privileged group.

TRAA account

As described in “Understanding the Tivoli Remote Access Account (TRAA)” on page 976, the TRAA account is used when a Tivoli process must access a remote resource. By default, no Tivoli process requires that the TRAA account be defined.

When defining the TRAA account, it is important to identify the reason for this account (for example, to enable a task to run a domain command, or to have a software distribution package afterscript write a file to a remote share). With these needs identified, create this domain account with the necessary rights. Again, the TRAA account needs access only to the resource defined. If a password change has occurred without updating the node(s) where the TRAA is set, oserv will fail on startup, lcfd nodes will restart but TAP may be disabled, and TRAA will be unusable.

An example would be a company that creates the domain account MASTER\tivuser. MASTER\tivuser is granted write access to a net share called \\SERVER\tivfiles. This share allows only the user MASTER\tivuser write access, and the Administrators group has full access. MASTER\tivuser's password is set to not expire, and only the Tivoli Administrator responsible for installation knows the password. Because MASTER\tivuser has only this limited set of rights, there is a low risk of this account being compromised and used to attack other resources.

User defined accounts

Tivoli Management Framework can be configured to use other user accounts as design needs warrant. There are several areas that allow a Tivoli administrator to define a certain action to run as a certain user. For example, a task created to manage MSSQL can be defined to run as a given MSSQL Administrator.

Changes to NT accounts used by Tivoli Management Framework

Because the TMF/TMA design is based on the CreateProcessAsUser() system call, any process spawned to access local resources will not be affected by password changes. The exception is the TRAA account. Changing its password requires resetting the password on every TMF/TMA node whose TRAA is affected by the change. One method of automating this change is to create a task that executes as the Administrator and issues the wsettap/wlcftap command with the new account.

With the exception of the tmersrvd account, accounts that have been disabled, expired, or locked out will fail when the oserv attempts to start a process as this user. This is due to TAP checking the status of this account prior to passing SID information back to LSA.

Privileged account comparison between Framework versions

Note: This section is extremely important, as there have been changes in the use of the privileged accounts between Framework versions 3.6 and 3.6.3. For specific details on how to determine the 'SET_USER' value of TME methods, see “Options for SET_USER” on page 1040.

TMF/TMA when SET_USER=root

Prior to Version 3.6, the built-in administrator account was used for most processes needing to run in the context of a privileged user.

• SET_USER=root maps to the local, built-in NT Administrator.
• This is a special account reserved by NT, and has full rights to the system.
• The account is defined as SID 500. Tivoli calls this special SID rather than the name 'Administrator'.
• The account can be renamed and oserv/lcfd will run as the renamed account, since the SID is the same.
• The account cannot be demoted in privilege; therefore, all Tivoli privileged processes will have access to the local resources.

TMF/TMA when SET_USER=$root_user

Versions 3.6 through 3.6.2 run most privileged methods as the user name obtained from the $root_user idmap. This gives the Tivoli Administrator the ability to define the user for all Tivoli privileged processes. Several points need to be understood, however:

• SET_USER=$root_user does not use SID 500, so the account name in the $root_user idmap must map correctly to a user account in the local or domain SAM. Failure to map the account name will cause the tap_get_sid_logon_token failed error.
• The account must be part of the NT Administrators and Tivoli_Admin_Privileges groups.
• The account must have the Log on Locally user right.
• If the MACHINE\Administrator account is not renamed, no modifications to the $root_user idmap are necessary unless there is a need to run the Tivoli Management Framework privileged programs as another local or domain account.
• If the MACHINE\Administrator account is renamed, or the design of the TMR dictates using a domain account for privileged accounts:

  – The renaming of MACHINE\Administrator must be consistent on all TMF/TMA nodes, or a local Administrator account must be created on all TMF/TMA nodes. The account does not have to be called Administrator. It can be called anything, but the name needs to be consistent on all TMF/TMA endpoints, and the root_user idmap can be updated to reflect the new account name.
  – If a domain account is used and there is a failure in communicating with the primary domain controller, TMF/TMA could be adversely affected, as the account's SID could not be obtained to place in the token structure. This would leave oserv/lcfd unable to spawn the requested process.
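The account requirements listed above lend themselves to a simple pre-flight check before the $root_user idmap is changed. The following Python sketch is illustrative only: the Account class, the missing_requirements function, and the sample accounts are hypothetical, and a real check would have to query the local or domain SAM rather than an in-memory object.

```python
from dataclasses import dataclass, field

# Requirements taken from the list above for an account named in $root_user.
REQUIRED_GROUPS = {"Administrators", "Tivoli_Admin_Privileges"}
REQUIRED_RIGHTS = {"SeInteractiveLogonRight"}  # Log on Locally


@dataclass
class Account:
    """Hypothetical snapshot of an NT account's groups and user rights."""
    name: str
    groups: set = field(default_factory=set)
    rights: set = field(default_factory=set)


def missing_requirements(account):
    """Return a sorted list of requirements the account does not meet."""
    problems = [f"not a member of {g}" for g in REQUIRED_GROUPS - account.groups]
    problems += [f"missing user right {r}" for r in REQUIRED_RIGHTS - account.rights]
    return sorted(problems)


# Sample data for illustration only.
nigel = Account("Nigel", {"Administrators"}, {"SeInteractiveLogonRight"})
for problem in missing_requirements(nigel):
    print(f"{nigel.name}: {problem}")
```

In this sample, the account is reported as not being a member of Tivoli_Admin_Privileges, which is the condition the text associates with the tap_call_init failed, error 38 message.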

The TMF/TMA in Framework version 3.6.3 and higher

One of the limitations of the $root_user idmap design introduced in 3.6 was dealing with decentralized control of user accounts on NT machines, as well as the dilemma of managing accounts in a multi-domain environment. Version 3.6.3 addresses this limitation by reintroducing the ability to run all privileged methods as the SID 500 Administrator. In order for this to be enabled, the $root_user idmap must have a special user keyword for w32-ix86. To enable this behavior, set the $root_user idmap to BuiltinNTAdministrator (this is case-sensitive). Once set, all privileged methods will run as the SID 500 Administrator.

Important: From Tivoli Management Framework Version 3.7b through Version 3.7.1-TMF-0005, this functionality was broken. If you are upgrading from 3.6.x to 3.7.1+ and are using BuiltinNTAdministrator, this will impact you! It is necessary to set your $root_user idmap to something other than BuiltinNTAdministrator if you are within this range of product versions. After you are beyond 3.7.1-TMF-0005 (endpoint version 93), you can use BuiltinNTAdministrator.

Examples of Account Management using different TME versions

• Example of Tivoli Management Framework Version 3.6 using a local SAM account

  Spinal Tap Incorporated has several NT domains. Each of these NT domains is located in a given geographical location and is linked to the corporate office via a 128 KB Frame Relay. In the corporate office, the master domain has 2-way trusts with each of the geography domains. Each geography manages its domain separately, and there is no way to enforce consistent naming conventions in all the domains. The company does not have a consistent naming convention for the built-in Administrator account (SID 500), other than a mandate to rename the account.

  One solution is to use a domain account in the master domain for the $root_user idmap. However, because of the slow links, using a domain account could impact the performance and reliability of Tivoli Management Framework. Instead, the company creates a new local account named Nigel on each NT system. This account is then added to the Administrators and Tivoli_Admin_Privileges groups. The $root_user idmap is then changed to reflect this name for w32-ix86. In this case, because the account is local, all authentication for the privileged account will occur locally.

• Example of Tivoli Management Framework Version 3.6 using a domain SAM account

  Acme Sprockets' NT domain design consists of a Multiple Master Domain with 2-way trusts between the various master and resource domains. Each domain is part of the campus network and is centrally managed. The corporate policy is to rename the built-in Administrator account on each NT workstation and server, along with the domain SAMs. Acme creates a domain account on the master domain called TivPriv. The idmap is then set to the new user:

  – widmap rm_entry root_user w32-ix86 (Remove the Administrator reference.)
  – widmap add_entry root_user w32-ix86 MASTER\\TivPriv (Note the double '\'; this is to escape the '\'.)

  In this case, every process running as the privileged account authenticates to the MASTER domain to get the SID value for TivPriv.
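When such commands are generated by a script, the backslash doubling noted above is easy to get wrong. The following Python sketch builds the command string for a domain account. It is a minimal illustration under stated assumptions: the helper name widmap_add_entry_command is hypothetical, and the sketch only formats text; it does not call widmap.

```python
def widmap_add_entry_command(map_name, interp, account):
    """Format a widmap add_entry command line for the given account.

    Each backslash in the account name is doubled so that the shell
    passes a single backslash through to widmap, as in the example above.
    """
    escaped = account.replace("\\", "\\\\")
    return f"widmap add_entry {map_name} {interp} {escaped}"


# The Python literal "MASTER\\TivPriv" holds one backslash: MASTER\TivPriv
print(widmap_add_entry_command("root_user", "w32-ix86", "MASTER\\TivPriv"))
```

A local account name such as Nigel contains no backslash and passes through unchanged.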

Security

This section addresses concerns regarding how Tivoli is configured for security on TMF/TMA nodes.

TME functions dependent on NT/Active Directory environment

This section lists the TME functions that depend on the NT/Active Directory environment.

Functional areas

During the initial design phase or the actual implementation, there are several areas within TME that are tied closely to the underlying authentication and user access design provided by the Microsoft NT environment. If it is determined that these functions will be used, it is important to understand the ongoing changes within the NT environment, and to be sure you are made aware of any changes to that environment.

• TRAA account (access to remote objects, like shares)
• Use of tasks
• Automated responses in IBM Tivoli Monitoring
• Tivoli Administrator's access to the Tivoli environment (both CLI and GUI)
• Installation
• Log file adapters not installed as a service
• Scheduled jobs

Considerations when renaming accounts used by TME

Changes to accounts, such as the $root_user, can have an impact on the overall stability and reliability of your Tivoli environment. When accounts must change, the following planning is recommended:

• Create a new account rather than rename the existing account.
• Change the idmap to reflect the new account:

  – Most changes take effect immediately for short-lived processes, such as tasks, jobs, and CLI access.
  – Long-running processes, such as adapters, require a restart to reflect the change.

• The TRAA account requires special care, as it must be changed on each node to reflect the change.
• Once the changes have taken place, disable/remove the previous account.

Security and TME: PDC/Windows NT or mixed NT/Windows 2000 environment

This section covers security considerations in a PDC/Windows NT or mixed NT/Windows 2000 environment.

Authentication to the Primary and Backup Domain Controller

TAP's design prior to TMF 3.7.1 was to request all domain user authentications from the Primary Domain Controller and bypass any local Backup Domain Controllers. The reasoning for this design focused on the premise that the PDC is the only true source of valid user information and status, while Backup Domain Controllers could "fall behind" or fail to replicate user changes. This design, however, does have some shortcomings:

• If network outages exist, user authentication will fail due to a lack of access to the PDC.
• Domain controllers managing large numbers of users could potentially become overloaded and unable to respond to requests.

These situations introduced instability to the Tivoli Management Framework: a given task could not be executed because TAP was unable to create the correct token. It was also realized that changes in user status within an NT Domain were rare, and any changes were typically updated across all BDCs well before any Tivoli task was called on to run as that user.

In 3.7.1, TAP was enhanced to be capable of authenticating to a PDC or a BDC. This feature is enabled using the wsettap/wlcftap command with the parameter '-B|-P':

• -B enables authentication to any Domain Controller. This is the suggested setting, though not the default, since prior releases of Tivoli Management Framework behaved like the -P flag.
• -P enables authentication to only the Primary Domain Controller. This is the default when TME is installed.
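The practical difference between the two flags can be pictured with a short Python model. This is a sketch under stated assumptions and not how TAP is implemented: the function names and host names are hypothetical, and the choice to try the PDC first in -B mode is an assumption made for the example.

```python
def candidate_controllers(mode, pdc, bdcs):
    """Return the domain controllers that may be asked to authenticate."""
    if mode == "-P":
        return [pdc]  # Primary Domain Controller only
    if mode == "-B":
        return [pdc] + list(bdcs)  # any controller (ordering is an assumption)
    raise ValueError("mode must be '-P' or '-B'")


def pick_controller(mode, pdc, bdcs, reachable):
    """Return the first reachable candidate, or None if authentication fails."""
    for dc in candidate_controllers(mode, pdc, bdcs):
        if dc in reachable:
            return dc
    return None


# Hypothetical outage: the PDC is unreachable, but one BDC still answers.
reachable = {"bdc1"}
print(pick_controller("-P", "pdc", ["bdc1", "bdc2"], reachable))
print(pick_controller("-B", "pdc", ["bdc1", "bdc2"], reachable))
```

With the PDC unreachable, the -P case finds no controller, which corresponds to the first shortcoming listed above, while the -B case can still authenticate against a BDC.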

Account creation on PDC/BDC

Another issue pertaining to Primary and Backup Domain Controllers is the creation of the Tivoli accounts at installation. It is recommended to install TMF/TMA on the Primary Domain Controller first, and then synchronize the backup domain controllers to allow the newly created accounts to propagate. If an installation is attempted first on a Backup Domain Controller, the installation will fail because the accounts have not been updated on the Primary Domain Controller. Either wait 15 minutes for the domain servers to resynchronize and attempt the installation again, or force the synchronization.

It is only necessary to install on the PDC once during this cycle. In the past, there have been misunderstandings regarding this task, which led some customers to believe the Tivoli code on the PDC had to be re-installed prior to every BDC installation. This is not the case. Once the PDC has been successfully installed, the accounts will be visible from the BDCs and installation can be conducted on any BDC in that domain.

Authentication on BDC

While the BDC does have a copy of the domain SAM database in place, TAP will still authenticate to the PDC unless the -B option of wsettap/wlcftap is used in 3.7.1.

Account Management with PDC/BDC

Many NT environments use several NT domains to manage the environment. One common design is the use of a Master domain and then resource domains that are two-way trusted with the Master domain. If design requirements demand the use of a domain account for Tivoli Management Framework and the NT domains are configured similarly to the model described above, one could create an account in each of the domains with the same name:

  MASTER\TivAdmin
  US\TivAdmin
  EUROPE\TivAdmin
  JAPAN\TivAdmin

The root_user idmap would map w32-ix86 to TivAdmin. When a TMF/TMA node runs a privileged process, the account selection will occur in the following order:

1. Is TivAdmin in the local SAM database? If so, use it. If not, continue.
2. Is TivAdmin in the domain of which this system is a part? If so, use it. If not, continue.
3. Is TivAdmin in any trusted domain? If so, use it. If not, a failure occurs, because the account cannot be found.

Using the same model, assume a given task must run as MASTER\TivAdmin, since this is part of the MSSQL login list. One would need to specifically define MASTER\TivAdmin in the UID field of the task, or create a new idmap that resolves to this account. Had MASTER not been part of the specified account, a node in the JAPAN domain would get the SID for JAPAN\TivAdmin rather than MASTER\TivAdmin.
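The selection order described above can be sketched in Python. This is an illustrative model only: the function name, the data structures, the "(local)" label, and the order in which trusted domains are searched are assumptions made for the example, not documented Tivoli behavior.

```python
def resolve_account(name, local_sam, own_domain, domain_accounts, trusted_domains):
    """Return the qualified account a node would use, or None if not found.

    local_sam:       set of account names in the node's local SAM
    own_domain:      name of the domain the node belongs to
    domain_accounts: dict mapping domain name -> set of account names
    trusted_domains: list of trusted domains (search order is an assumption)
    """
    if "\\" in name:
        # An explicit DOMAIN\user bypasses the search order entirely.
        domain, user = name.split("\\", 1)
        return name if user in domain_accounts.get(domain, set()) else None
    if name in local_sam:                                  # step 1: local SAM
        return "(local)\\" + name
    if name in domain_accounts.get(own_domain, set()):     # step 2: own domain
        return own_domain + "\\" + name
    for domain in trusted_domains:                         # step 3: trusted domains
        if name in domain_accounts.get(domain, set()):
            return domain + "\\" + name
    return None                                            # account cannot be found


# Hypothetical data matching the four-domain model above.
domain_accounts = {d: {"TivAdmin"} for d in ("MASTER", "US", "EUROPE", "JAPAN")}
print(resolve_account("TivAdmin", set(), "JAPAN", domain_accounts, ["MASTER"]))
print(resolve_account("MASTER\\TivAdmin", set(), "JAPAN", domain_accounts, ["MASTER"]))
```

For a node in the JAPAN domain, the unqualified name resolves to JAPAN\TivAdmin, while the fully qualified name resolves to MASTER\TivAdmin, which is why the task in the MSSQL example must specify the domain explicitly.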

Security and TME: Active Directory and Windows 2000/XP

This section covers security considerations for Active Directory and Windows 2000/XP.

Tivoli Authentication Package integration with Active Directory

With Windows 2000, Microsoft introduced several changes to the security subsystem. These included:

• Kerberos 5
• NTLM: NT LAN Manager can be disabled
• Flexible Single Master Operation (FSMO) role: PDC Emulator

The Tivoli Authentication Package was enhanced in 3.6.2 and later to use the Kerberos style credential instead of the NTLM style credential.

Necessary changes for TME to properly function in Active Directory environments

In order for Tivoli Management Framework (and supporting applications) to function in an Active Directory environment, "Compatibility mode" or a "Compatibility mode work-around" must be implemented. "Compatibility mode" involves placing the "Everyone" group within the "Pre-Windows 2000 Compatible Access" group. Effectively, this gives Active Directory permission to answer null/anonymous queries for user/group information in a more NT-like fashion. Having this setting active is considered a security weakness that many customers have protested implementing. As a result, a more granular approach to providing these capabilities with Active Directory has been investigated by Tivoli Support and is now offered as a supported alternative.

Gaining a better understanding of the "Pre-Windows 2000 Compatible Access" group

The intended usage of the "Pre-Windows 2000 Compatible Access" group is often misunderstood. This section provides more detail on the group to clarify its intent and to illustrate how Tivoli's proposed work-around accomplishes what is needed. The best way found to describe the behavior of the "Pre-Windows 2000 Compatible Access" group is to put it into a sentence. While the sentence does not fully describe all uses of the group, it is sufficient for this discussion:

  [Members of the "Pre-Windows 2000 Compatible Access" group] are able to make queries regarding any user in AD.

• If "Everyone" were added to the "Pre-Windows 2000 Compatible Access" group, the sentence would read:

  EVERYONE is able to make queries regarding any user in AD.

  Note: Everyone includes users we do not even know, including unauthenticated users such as null/anonymous, which is the intent of adding "Everyone". "Everyone" is the only container that contains "anonymous" in Windows 2000. In Windows XP and the not-yet-released Windows .NET (formerly Windows 2002 Server), an "Anonymous" account does exist and can be added to the "Pre-Windows 2000 Compatible Access" group to accomplish the desired behavior.

• If the "Pre-Windows 2000 Compatible Access" group were empty, the sentence would read:

  NO ONE is able to make queries regarding any user in AD.

The two examples above are the only viable configurations for the "Pre-Windows 2000 Compatible Access" group. To see why, consider what happens if a specific user (such as TMERSRVD) were added to the group. The sentence would read:

  TMERSRVD is able to make queries regarding any user in AD.

The problem here is that it is not TMERSRVD that makes the query; it is "anonymous". So, adding TMERSRVD to "Pre-Windows 2000 Compatible Access" has no effect whatsoever. This holds true for any account added to the "Pre-Windows 2000 Compatible Access" group.

Here is how this factors into the work-around: the intention of the work-around is to make the end of the sentence, "any user in AD", more granular. Tivoli does not need to query any user in the environment, only an extremely small subset of users, which are detailed later. Utilizing the DSACLS commands (also detailed later), we are able to mimic the behavior of the "Pre-Windows 2000 Compatible Access" group, but for a specific user, not "any user in AD". As a result, the sentence we want to construct is:

  EVERYONE is able to make queries regarding [specific user] in AD.

Remember that "Everyone" includes anonymous, so that is the only thing we can use.

After issuing the DSACLS commands described later, our sentence reads: EVERYONE is able to make queries regarding TMERSRVD in AD.

This is exactly what we are looking for.
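The reasoning above can be condensed into a small Python model of when an anonymous query succeeds. This is a deliberately simplified sketch and not how Active Directory evaluates access: the function name and the two data structures are hypothetical, and the model captures only the two mechanisms discussed here (group membership of "Everyone" and a per-user grant to "Everyone").

```python
def anonymous_query_allowed(target_user, compat_group_members, per_user_grants):
    """Decide whether an anonymous query about target_user is answered.

    compat_group_members: members of "Pre-Windows 2000 Compatible Access"
    per_user_grants:      dict mapping a user -> principals granted read access
                          on that user object (as set up by the work-around)
    Anonymous callers are only ever covered by "Everyone".
    """
    if "Everyone" in compat_group_members:
        return True  # EVERYONE may query any user in AD
    return "Everyone" in per_user_grants.get(target_user, set())


# Adding a named account to the group does not help the anonymous caller.
print(anonymous_query_allowed("tmersrvd", {"tmersrvd"}, {}))
# A per-user grant to Everyone opens only that one account.
grants = {"tmersrvd": {"Everyone"}}
print(anonymous_query_allowed("tmersrvd", set(), grants))
print(anonymous_query_allowed("someone_else", set(), grants))
```

The last two calls illustrate the goal of the work-around: queries about tmersrvd are answered, while every other account stays closed to anonymous callers.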

Available options for Tivoli to function in an Active Directory environment

• Option #1: Activate full "Compatibility Mode"

  The first option is the one originally proposed by Tivoli. It involves opening the entire environment to anonymous queries. To accomplish this on a Windows 2000 Domain Controller, issue the following command:

    net localgroup "Pre-Windows 2000 Compatible Access" everyone /add

  On Windows XP and Windows .NET Server, an "Anonymous" account actually exists. Also, it is possible to configure your policy settings such that "Anonymous" is not part of "Everyone". As a result, the best way to enable compatibility is to add "Anonymous" to the "Pre-Windows 2000 Compatible Access" group:

    net localgroup "Pre-Windows 2000 Compatible Access" Anonymous /add

  While this is by far the simplest approach, it is also the least secure. Depending on your environment, this may or may not be an acceptable practice.

• Option #2: Allowing anonymous queries for specific users

  In order to utilize this option, several requirements must first be fulfilled:

  a. The DSACLS.EXE command from the Windows 2000 Support Tools is needed. This file can be found after installing the Support Tools from the \SUPPORT\TOOLS folder on any Windows 2000 installation media.
  b. The domain-level accounts that Tivoli utilizes must be identified. These accounts may include:
     i. The tmersrvd account (Tivoli's "nobody"-equivalent account on Windows).
     ii. The "root_user" account for the w32-ix86 interpreter. To check this, issue widmap list_entries root_user on your TMR Server(s).
     iii. Any account defined in Tivoli Tasks/Jobs.
     iv. Any account defined in IBM Tivoli Monitoring profiles.

  In order to permit anonymous queries regarding specific accounts, the following commands need to be issued (after modification for your environment) on a Domain Controller.

The example commands in Example A-1 would be issued for the tmersrvd account located in the domain foo.com (tmersrvd@foo.com).

Example A-1   Allowing anonymous queries for specific users

  dsacls "CN=tmersrvd,CN=users,DC=foo,DC=com" /G "\Everyone:GRRCRPLOLC" /I:T
  dsacls "CN=tmersrvd,CN=users,DC=foo,DC=com" /G "\Everyone:RP;Remote Access Information"
  dsacls "CN=tmersrvd,CN=users,DC=foo,DC=com" /G "\Everyone:RP;General Information"
  dsacls "CN=tmersrvd,CN=users,DC=foo,DC=com" /G "\Everyone:RP;Group Membership"
  dsacls "CN=tmersrvd,CN=users,DC=foo,DC=com" /G "\Everyone:RP;Logon Information"
  dsacls "CN=tmersrvd,CN=users,DC=foo,DC=com" /G "\Everyone:RP;Account Restrictions"

The series of commands above needs to be repeated for every identified user in your Tivoli environment.
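Because the same six commands must be issued for each identified account, it can be convenient to generate them from a list of distinguished names. The Python sketch below only prints command lines that mirror Example A-1; it does not run dsacls. The function name and the second distinguished name (CN=TivAdmin) are hypothetical and used purely for illustration.

```python
# Property sets granted read access (RP) in Example A-1.
PROPERTY_SETS = [
    "Remote Access Information",
    "General Information",
    "Group Membership",
    "Logon Information",
    "Account Restrictions",
]


def dsacls_commands(dn):
    """Return the Example A-1 command lines for one distinguished name."""
    commands = ['dsacls "' + dn + '" /G "\\Everyone:GRRCRPLOLC" /I:T']
    for prop in PROPERTY_SETS:
        commands.append('dsacls "' + dn + '" /G "\\Everyone:RP;' + prop + '"')
    return commands


# The first DN is from Example A-1; the second is a hypothetical extra account.
accounts = [
    "CN=tmersrvd,CN=users,DC=foo,DC=com",
    "CN=TivAdmin,CN=users,DC=foo,DC=com",
]
for dn in accounts:
    for command in dsacls_commands(dn):
        print(command)
```

Reviewing the printed commands before issuing them on a Domain Controller keeps the distinguished names and domain components under the administrator's control.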

Things to consider:

• The commands in Example A-1 will likely need to be issued on one Domain Controller in every Domain within your environment. This would not be the case if all of your accounts were defined exclusively at a top-level domain, but user accounts often do not reside in the "root" Domain.
• Accounts that are local to every machine in your environment (Member Servers, Workstations, and Domain Controllers) do not need to be opened to anonymous queries. This would be the case if, for example, your root_user account is set to BuiltinNTAdministrator, since that indicates to always use the local (SID 500) Administrator account. In general, the requirement for anonymous queries only comes into play when Tivoli is attempting to utilize a non-local account.

Use of Active Directory's Group Policy and TME

Some customers use Group Policy to manage the mapping of accounts on a large group of nodes in the enterprise. This method of account management within Active Directory is useful for managing policies across an entire enterprise, rather than having to manage individual systems. In environments where Group Policy is in use, account management is also done within these policies.

Two conditions are required to ensure the TME environment functions properly. First, it is critical that the root_user idmap points to a privileged account that is part of the Group Policy. Additionally, that account must also be present in the Tivoli_Admin_Privileges group.

Second, the use of the Group Policy eliminates the need for the local tmersrvd account and Tivoli_Admin_Privileges group on the local system. Once the Group Policy is deployed, the local accounts can be removed. The account selection described in “Account Management with PDC/BDC” on page 990 is altered in that the 'local' accounts are in reality the accounts that are members of the Group Policy.

File system considerations

TMF requires that it be installed on an NTFS file system. During the installation, Tivoli will check that the target drive is an NTFS drive. TRIP does not require NTFS. It is installed on the C drive by way of the remote installation with the designated CurrentNtRepeat. TMA does not require installation on an NTFS file system. However, due to the insecure nature of the FAT file system, it is advisable to install the TMA files on an NTFS file system.

Permissions on Installation Directories

Base Tivoli/LCF directory

The base directory for Tivoli Management Framework's TMF/TMA install will be installed with the following directory permissions:

• Administrators:Full Control
• System:Full Control
• Creator Owner:Full Control
• Everyone:Change
• Server Operators:Change

%TEMP%

Depending on how you are currently logged in to a system, a request for the path to %TEMP% can return different results. With an interactive (GUI/Explorer) login to a Windows 2000 system, %TEMP% maps to a directory under C:\Documents and Settings\\Local Settings\TEMP. If you telnet into a Windows 2000 machine, %TEMP% maps to %Systemroot%\TEMP\ (this is the path we are concerned with). The tmersrvd account needs read/write/execute file permissions to %TEMP% (typically %systemroot%\temp).
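A quick way to see which directory %TEMP% resolves to in a given login context, and whether the current account can write there, is a short Python check such as the one below. It is a sketch with clear limits: it reports on the account running the script, not on tmersrvd, so it is only meaningful for tmersrvd if it is run in that account's context. The function names are hypothetical.

```python
import os
import tempfile


def temp_directory(env=None):
    """Return the TEMP (or TMP) value from the environment, or None."""
    env = os.environ if env is None else env
    return env.get("TEMP") or env.get("TMP")


def can_write(directory):
    """Return True if the current account can create a file in directory."""
    try:
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        return False


path = temp_directory()
if path is None:
    print("TEMP is not set in this login context")
else:
    print(path, "writable" if can_write(path) else "NOT writable")
```

Running the same check from an interactive login and from a telnet session makes the difference in %TEMP% described above directly visible.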

%SYSTEMROOT%\system32

The tmersrvd account requires read/execute access to %SYSTEMROOT%\system32. If this directory is restricted to only Administrators, and the Everyone group is not part of the access control list for %SYSTEMROOT%\system32, TMF/TMA will fail. To correct this, grant the “Bypass Traverse Checking” user right to the tmersrvd account. “Bypass Traverse Checking” has a different name on non-US versions of NT. For example, the French equivalent is “Outrepasser le contrôle de parcours”.

Registry access

By default, there is no Tivoli process that manipulates the NT Registry. The commands wsettap and wlcftap will add or remove a value in LSA's AuthenticationPackage key. wsettap, wlcftap, wmailhost, wlocalhost, and Desktop For Windows will add and modify Tivoli specific keys as well. However, applications like IBM Tivoli Configuration Manager and IBM Tivoli Monitoring can use other accounts to edit/view the registry, depending on modifications done by the Tivoli administrator when configuring the profile.

Tivoli Management Framework install and removal

This section discusses the specifics of installing and uninstalling TMF/TMA on NT. “Preparing NT for a Tivoli installation” on page 1002 provides information on how to prepare an NT system in advance for a Tivoli installation.

Installation of the Tivoli Remote Installation Package (TRIP)

The Tivoli TMF (and TMA through SIS) remote installation requires that the target is running either rexec or rsh. As NT provides neither, Tivoli introduced the Tivoli Remote Installation Package (TRIP) as part of the installation sequence to remotely install an rexec process on the target NT system. When the Tivoli administrator creating the TMF/TMA node selects Install, the installation process takes these steps:
• The TMR Server looks up the CurrentNtRepeat machine, which is an NT managed node that already has TRIP installed (TRIP does not need to be running on the CurrentNtRepeat to remotely install the rexec service).
• Using the NT API “OpenService()”, the CurrentNtRepeat machine checks whether TRIP is already running on the target node. If so, it proceeds to the creation of the directories.
• The CurrentNtRepeat node attempts to map the \\\c$ drive using Server Message Block requests.
• The CurrentNtRepeat attempts the mapping of the drive as the Default Access Account user specified in the Client Install window.
• Once the drive is mapped, the CurrentNtRepeat node copies the necessary files to c:\Tivoli\TRIP. The service is then created with the trip -machine command, which is executed from the CurrentNtRepeat machine. At this point, the target NT system is running rexec as SYSTEM.

Knowing the domain of the CurrentNtRepeat machine is important when creating new NT TMF and TMA nodes. If the CurrentNtRepeat node is in a domain that is not trusted by the target node's domain, the installation will fail.

Conflicts on the target NT system will occur if there is already an rexec package, or if another process is using port 512. NT 3.51 SP5 introduced an enhanced spooler process that uses port 512. In this scenario, the conflicting service must be shut down until the installation is complete. Other software packages, such as XSM, also use their own rexec process; keep this in mind when looking for the reason the service fails to start (Service Specific Error 8).

REXEC's assigned port is 512. %SYSTEMROOT%\system32\drivers\etc\services is an NT file that contains, among other services, the entry for rexec (exec 512/tcp). If this entry is changed to another port, the installation will fail, because the installation expects port 512.
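The services entry can be checked before an installation is attempted. The sketch below is a hypothetical helper written in Python, not a Tivoli tool: it assumes the usual services file layout of a service name followed by port/protocol, and the function name find_service_port and the sample text are made up for illustration.

```python
def find_service_port(services_text, name="exec", protocol="tcp"):
    """Return the port listed for name/protocol in services-file text, or None."""
    for line in services_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        service, port_proto = fields[0], fields[1]
        port, _, proto = port_proto.partition("/")
        if service == name and proto == protocol and port.isdigit():
            return int(port)
    return None


# Made-up sample content in the usual services-file layout.
SAMPLE = """\
# sample entries
exec     512/tcp
login    513/tcp
"""

if __name__ == "__main__":
    port = find_service_port(SAMPLE)
    print("exec entry is on the expected port:", port == 512)
```

In practice, the text would be read from %SYSTEMROOT%\system32\drivers\etc\services on the target. A result other than 512 matches the failure condition described above.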

Creation of a Tivoli managed node (TMF)

Once TRIP is installed, the installation at the TMR will next send a series of scripts to the target to create the directories defined by the Tivoli administrator. The main check done here is that the file system that the install will be placed in is NTFS and that there is enough disk space available. This sequence is the same using the classic installation method or the new Software Installation Service (SIS). With SIS, the Tivoli administrator is prompted for the TRAA account along with the installation directories.

Creation of Tivoli-specific accounts on target node

The last pre-installation step is the creation of the Tivoli-specific accounts on the target NT system. An executable (ntconfig.exe) is run on the target as the user defined in the Default Access Account. It is responsible for creating the tmersrvd account and the Tivoli_Admin_Privileges group, and for assigning the required user rights.

Setting the TRAA account

If this is the classic installation method, the Tivoli administrator is prompted to enter the Tivoli Remote Access Account (TRAA) after selecting Continue in the preliminary portion of the install. As noted above, this account is used when a Tivoli Management Framework process accesses a remote NT object or if you install the binaries and libraries on a remote drive. By default, no Framework processes access remote objects, but if there is a user-customized script or if a task is executed on the NT system, the TRAA account is necessary for the process to have access. There are three options:
• None
  This provides no TRAA account. Be aware that if this is a reinstall of an NT managed node, and a TRAA account was set previously, selecting None keeps the old TRAA account intact.
• Use Default
  This uses the same account and password that was specified in the Default Access Account. This is not an ideal choice, as it grants the TRAA account the full rights that the Default Access Account has, and is a potential security risk.
• User defined
  Defines an account other than the Default Access Account.

Installation of the Tivoli Management Framework files

The Framework files are installed by a process on the target node called sapack. This process is rexec'ed to the target, and runs from the Tivoli database directory. This process runs as the user defined in the Default Access Account. The Tivoli installation groups the files by their type, such as binaries, message catalogs, and OS-independent files. During this installation, sapack will lay down the files directly from the incoming network stream. There is no staging done on any of the drives.

Creation of the client database

The creation of the client database is broken up into two portions. The first deals with just starting the oserv on the target, and the second is the creation of the various objects on the target's database to make it a TMF node.

Starting the oserv for the first time

The start of the oserv on the target is managed with the $BINDIR/TAS/INSTALL/install2.cfg command, which does the following:
• Checks for any dispatchers on port 94
• Determines the database directory
• Creates the oserv service using the command:
    oinstall -install
• Copies $BINDIR/bin/TivoliAP.dll to %SYSTEMROOT%/SYSTEM32/TivoliAP.dll
  Note: In addition to copying the file, it registers it in the Windows Registry as well.
• Checks to see if TAP is available. If not, it starts the oserv with a '-u' flag
• If the host name does not match the label that Tivoli assigns to the target, it adds the name to the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Tivoli\Platform\wlocalhost
• Starts the oserv and attempts to contact the TMR Server. It uses the following switches:
    i    Initializes the client database.
    h    TMR Server label.
    k    Path to database directory.
    b    Binary directory.
    l    Library directory.
    u    Used for NT. Bypasses TAP for the initial install.

At this point, the oserv service is running as SYSTEM. If the -u flag was used, all other Tivoli processes will also run as SYSTEM.

Configuring the client database

Once the oserv service is running, the next step is to create the objects locally to populate the database. These processes are started from the TMR Server using $BINDIR/TMF/BASESRVC/client.cfg and will start as the built-in Administrator (Framework version 3.6.2 or earlier) or the $root_user idmap, if the -u option is passed to the lcfd/oserv (Framework version 3.7 or later). If this is a reinstalled NT system, and TAP is enabled, there will be several processes running as tmersrvd.

Installation of the Tivoli Management Agent (TMA)

This section discusses the installation of the Tivoli Management Agent for NT. The two methods discussed are SIS and a command line style installation similar to TRIP using winstlcf.

Installation of TMA using SIS

The SIS installation is similar to the creation of a TMF node. It requires that TRIP be installed on the endpoint. If TRIP does not exist, it will utilize the CurrentNtRepeat method like the TMF installation does.

Interrogating the target

With TRIP functioning, SIS first pushes several executables into c:\temp\deploy. The first is sis_sh.exe, which is a Bourne shell for NT. The second is worldname.exe, which identifies the local language equivalent of Everyone. Next, SIS checks whether %SYSTEMROOT%\system32\kbdus.dll exists. Finally, a remote execution of ping verifies that the node is able to resolve the TMR Server's host name.

Creation of Tivoli-specific accounts

The executable ntconfig.exe (found in the TRIP directory) will be executed, creating the tmersrvd account and the Tivoli_Admin_Privileges group.

Seed file created

Using an executable called sapack.exe, a file called w32-ix86_seed is sent to c:\temp\deploy. This file is used for the initial start of the lcfd service.

lcfd installed

Once the seed file is in place, lcfd.exe is executed with the -I option, which enables it as a service. Once completed, it places several files outside of the base TMA directory:
    %SYSTEMROOT%\system32\TivoliAP.dll
    %SYSTEMROOT%\Tivoli\lcf\1\lcf_env.cmd

Finally, it registers the Tivoli Authentication Package with LSA using wlcftap.exe -a.

lcfd started and logged in to the gateway

Once TAP has been enabled, lcfd is started with the parameters passed to it by the seed file as well as any user-defined parameters. TMA then attempts to log into a gateway. Once this is completed, the NT system must be rebooted for TAP to be loaded by the LSA. If this was a previous installation of TMA, or is a managed node being migrated to TMA, the NT system does not need to be rebooted, since TAP has already been recognized by the LSA.


Installation using winstlcf

winstlcf is a Perl script that can be issued from the TMR Server or any Gateway/TMF node. This method is similar to a TRIP install, as it uses the Server Message Block (SMB) protocol to install and enable TMA on NT endpoints.

Sequence of events

Issue the following command:
    winstlcf -l 3656 -g 1.1.1.2+3646 -N binkley serene

Where:
• binkley is the system that copies the files to the target. It is called the proxy node. This proxy node must already be defined as an endpoint.
• serene is the target NT system.
• The -g option identifies a gateway. If -g is not used, the lcfd will broadcast via UDP to attempt to contact a gateway.
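When many endpoints are installed this way, the command line is often assembled by a script. The Python sketch below only rebuilds the argument list of the example command. The function build_winstlcf_args and its parameter names are hypothetical; the value passed with -l is not explained in the text above, so it is passed through unchanged, and the address+port form of the -g value is taken from the example rather than from a command reference.

```python
def build_winstlcf_args(proxy, target, gateway=None, gateway_port=None,
                        l_value=None):
    """Build an argument list shaped like the winstlcf example in the text."""
    args = ["winstlcf"]
    if l_value is not None:
        # Meaning of -l is not described in the text; passed through as given.
        args += ["-l", str(l_value)]
    if gateway is not None:
        # The example writes the gateway as address+port.
        spec = gateway if gateway_port is None else "%s+%d" % (gateway, gateway_port)
        args += ["-g", spec]
    # Proxy node first, then the target NT system, as in the example.
    args += ["-N", proxy, target]
    return args


if __name__ == "__main__":
    print(" ".join(build_winstlcf_args("binkley", "serene",
                                       gateway="1.1.1.2", gateway_port=3646,
                                       l_value=3656)))
```

Keeping the arguments as a list avoids quoting problems if the list is later handed to a process launcher.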

The steps for the installation are as follows:
1. The gateway that manages the proxy endpoint responsible for copying the files is contacted.
2. The installing system contacts the target node using the account passed to the winstlcf script.
3. The drive is mapped to the proxy and the Tivoli directories are created on the target node. The executables are copied over to the lcf directory:
     bin\w32-ix86\mrt\lcfd.exe
     bin\w32-ix86\mrt\lcfep.exe
     bin\w32-ix86\mrt\libcpl.dll
     bin\w32-ix86\mrt\libdes.dll
     bin\w32-ix86\mrt\libmrt.dll
     bin\w32-ix86\mrt\msvcrt40.dll
4. Via Server Message Block (SMB), the lcfd is installed as a service and is started as the user used for the installation.
5. At this point, the installing endpoint is complete and logs off.
6. Once the lcfd on the target is started, it contacts the gateway and the login process starts. Once checked in, the gateway downloads the following files:
     bin\w32-ix86\endpoint\NTLCFInst.exe
     bin\w32-ix86\endpoint\wlcftap.exe
     bin\w32-ix86\endpoint\TivoliAP.dll
     bin\w32-ix86\endpoint\ntconfig.exe
     bin\w32-ix86\endpoint\libacct.dll
     bin\w32-ix86\endpoint\reboot.exe
7. The lcfd executes ntconfig.exe to create the accounts. The lcfd executes wlcftap to enable TAP and set the TRAA account, if required. If TAP was previously installed, the TRAA account may still exist from the previous installation.
8. The lcfd reboots the system if it was instructed to do so.
9. Only a limited amount of logging is available with this command. If there are failures, check lcfhost.err, which is located in the working directory from which winstlcf was issued. If the lcfd process does not start, set the trace level to 3 on the TMA node. To do this, either start lcfd manually with the option -d 3, or set log_threshold=3 in last.cfg.
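The log_threshold change in step 9 can be scripted when several endpoints need the higher trace level. The Python sketch below is a minimal illustration, assuming last.cfg is a plain text file of key=value lines (which the log_threshold=3 notation suggests). The function set_config_value and the sample key lcfd_port are made up for the example.

```python
def set_config_value(text, key, value):
    """Return config text with key=value set, replacing the key or appending it."""
    lines = text.splitlines()
    replaced = False
    for index, line in enumerate(lines):
        name, sep, _ = line.partition("=")
        if sep and name.strip() == key:
            lines[index] = "%s=%s" % (key, value)
            replaced = True
    if not replaced:
        lines.append("%s=%s" % (key, value))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up sample content; a real last.cfg will contain other keys.
    sample = "lcfd_port=9494\nlog_threshold=1\n"
    print(set_config_value(sample, "log_threshold", 3), end="")
```

The function works on text rather than on the file itself, so the original file can be backed up before the result is written. The lcfd would still need to be restarted for a changed value to be read.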

Preparing NT for a Tivoli installation

Where possible, several steps can be taken while the NT workstations/servers are being built that reduce the time needed to create a TMF node and remove the need to reboot the NT system when it becomes a TMF or TMA node.

Loading TAP

A newly created TMF/TMA node must reboot so that TivoliAP.dll is loaded by LSASS.exe (the LSA subsystem). Once loaded, the Tivoli Authentication Package is available to generate login tokens for the oserv and lcfd services. If a gold build of NT is developed to provide a consistent level of software to all NT servers and workstations being deployed, it is possible to have TivoliAP.dll loaded and enabled in that build as well. This allows the NT system to become a TMF/TMA node later on without a reboot after the installation, since TAP is already available. To do this, copy TivoliAP.dll to %SYSTEMROOT%\system32 and issue wsettap -a (or create a script that edits the registry directly). The wsettap -a command does not require an oserv or lcfd, so this can be done at the time the NT system image is being installed.

Loading Tivoli files prior to using SIS/Classic install

In environments where slow links exist, creating managed nodes can be problematic due to line speed and time. In anticipation of creating a managed node, the NT node being shipped to the remote site can have the entire Tivoli directory copied from an existing managed node. Once this is done, the database directory is deleted. When the NT system is later created as a managed node, and the directory paths specified match the already-installed files on the target, the only portion that the install needs to do is the database portion. This can significantly decrease the time required for installing the managed node.


Uninstalling TMF

In the event that the TMF node installation failed, or a user needs to remove a fully populated TMF node, it is important to follow the steps below to clean up all references to the TMF node.

Removing the node from the Tivoli database

First, before removing the node, identify the dispatcher number assigned to the NT system with the odadmin odlist command. Then, issue the command:
    wrmnode

This command cleans up all references in the database, including the various subscriptions to profile managers.
1. If this command completed successfully, first run wchkdb -ux, then issue the following command:
     wchknode -ncsxvu dispatcher (number noted above)
   This verifies that no references to the removed node remain in the Tivoli Object Repository.
2. If the above wrmnode command fails, it is likely that not all references to the TMF node have been removed. Issue the following command:
     wrmnode -d
3. If this fails, then try:
     odadmin odlist objects dispatch number
   – If three or fewer objects remain, issue the command:
       odadmin odlist rm_od disp number
   – If more than three objects remain, it may still be necessary to remove the node with the odadmin odlist rm_od command. More than seven objects, however, is a good indication of other problems, and it would be best to contact your support provider.
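The object-count guidance in step 3 can be written down as a small decision helper. This Python sketch is one reading of the thresholds above, not a documented rule: the function removal_advice and its return labels are hypothetical, and the count itself would have to be taken by hand from the odadmin output.

```python
def removal_advice(remaining_objects):
    """Map the number of objects still registered to a suggested action."""
    if remaining_objects < 0:
        raise ValueError("object count cannot be negative")
    if remaining_objects <= 3:
        return "remove"               # three or fewer: run rm_od
    if remaining_objects <= 7:
        return "remove-with-caution"  # rm_od may still be necessary
    return "contact-support"          # more than seven: likely other problems


if __name__ == "__main__":
    for count in (2, 5, 9):
        print(count, removal_advice(count))
```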

Verification of a removed managed node in the Tivoli database

There are three important locations where a TMF node is referenced. After removing a TMF node, a Tivoli administrator can verify that the removed managed node does not exist in the three locations by running the following commands:
• wls -l /Library/ManagedNode
• wlookup -ar ManagedNode
• odadmin odlist

This does not take into account subscriptions to profile managers, and so on.

Removing files from the NT system

Once the Tivoli database has been cleaned up, the client oserv should be stopped. If it is not, stop the process. Once the oserv is stopped, clean up the registry entries made by Tivoli:
1. Remove the oserv from the service list. The key still exists, but the values are nulled:
     oinstall -remove
2. Remove the entry in LSA for the Tivoli authentication package:
     wsettap -d
3. Remove TRIP as a service:
     trip -remove
   You can keep TRIP running (and skip this step), as this eliminates one step in a reinstall.
4. Issue the wlocalhost command. If it returns a host name that is not valid, reset the value to the host name that applies to this machine:
     wlocalhost new name
   or remove it from the registry:
     HKEY_LOCAL_MACHINE\SOFTWARE\Tivoli\Platform\localhost
5. Remove the following from the NT system:
   – %SYSTEMROOT%\system32\drivers\etc\Tivoli
   – Path to base directory of Program Files\Tivoli (if you are leaving TRIP in place, remove the other directories inside the Tivoli\ directory structure and leave the Tivoli\trip directory).

Removing TMA

In the event that TMA needs to be removed, follow these steps to remove TMA from the NT node.

Removing the endpoint from the Framework database

1. First, remove the endpoint from the Tivoli Management Framework database by issuing the following command:
     wdelep
2. Once the endpoint has been removed from the Tivoli database, issue the following commands to remove the TMA files and modifications:
   a. If TMA was installed from the CD (InstallShield), run the following from the %LCFROOT% directory:
        uninst.bat
   b. If TMA was installed remotely, there is no uninst.bat file. Follow these steps to remove the TMA installation manually:
      i. Remove TAP from the LSA:
           wlcftap -d
         Note: If TMA and TMF are installed on the same machine, do not remove TAP; this will affect the TMF installation on the NT system.
      ii. Remove lcfd from the services list:
           lcfd -r "lcfd"
      iii. Remove the icon from the TaskBar:
           lcfep -s
      iv. Finally, remove the files from the NT system located in:
           %SYSTEMROOT%\system32\drivers\etc\Tivoli
           %SYSTEMROOT%\Tivoli\lcf
           Directory to lcf base\lcf

Tivoli files placed under %SYSTEMROOT%

This section lists the various Tivoli files that are placed under %SYSTEMROOT%.

Note: These files depend on the version of the products and might be different in later versions.

Tivoli Remote Installation Package (TRIP)

There are no files placed under %SYSTEMROOT%.

Tivoli Management Agent (TMA)

Files for Tivoli Management Agent are:
• %SYSTEMROOT%\system32
  – sis_sh.exe
  – TivoliAP.dll
  – worldname.exe
• %SYSTEMROOT%\Tivoli\lcf\1
  – lcf_env.cmd
  – lcf_env.sh


Tivoli Management Framework

Files for Tivoli Management Framework are:
• %SYSTEMROOT%\system32
  – sis_sh.exe
  – TivoliAP.dll
  – worldname.exe
• %SYSTEMROOT%\system32\drivers\etc\Tivoli
  – setup_env.cmd
  – setup_env.sh
  – \tll.conf\arg
  – \tll.conf\layout
  – \tll.conf\library
  – \tll.conf\task

Remote Control

Files for Remote Control are:
• Remote Control Server
  There are no files placed under %SYSTEMROOT%.
• Remote Control Controller
  – %SYSTEMROOT%
      46,080  RCSERV.EXE
  – %SYSTEMROOT%\system32
       4,096  EQNMSG.DLL
• Remote Control Target
  – %SYSTEMROOT%
      26,624  eqnhook.dll
      46,080  RCSERV.EXE
  – %SYSTEMROOT%\system32
      Command Prompt.lnk (modified)
      17,408  VDD.DLL
       7,168  VDDFIFO.DLL
      12,288  VDDHOOK.DLL
  – %SYSTEMROOT%\system32\drivers
      21,024  KEYEX.SYS
      20,288  MOUEX.SYS
       8,288  TGRAB.SYS
• Security
  There are no files placed under %SYSTEMROOT%.
• UserAdmin
  There are no files placed under %SYSTEMROOT%.
• IBM Tivoli Configuration Manager
  swdis.ini and swdis.bak.
• IBM Tivoli Monitoring
  There are no files placed under %SYSTEMROOT%.
• TEC
  There are no files placed under %SYSTEMROOT%.
• Inventory
  There are no files placed under %SYSTEMROOT%.

DLL conflicts

There are incompatibilities between the different versions of MSVCRT40.DLL shipped by Microsoft (the version seems to vary from software package to software package). Prior to Framework version 3.7, Tivoli installations could be affected by changes to the version placed in %SYSTEMROOT%\system32. Tivoli installs Version 4.0.0.5270 of the DLL. If Tivoli processes start to fail, or processes such as ntprocinfo.exe use 100% of the CPU, this suggests that a version of MSVCRT40.DLL that is not backward compatible has been introduced into the %SYSTEMROOT%\system32 directory.

Important: This is an issue only with the TMF nodes. TMA nodes have MSVCRT40.DLL installed in the TMA directories and thereby avoid possible conflicts. This is also addressed in Version 3.6.1 for the managed nodes.

To correct this on NT versions that have this issue, follow these steps:

In Framework version 3.1.3 or 3.1.4, 3.2 SuperPatch, or 3.6 and 3.6.1, run:
    copy %BINDIR%\mslib\msvcrt40.dll %DBDIR%
    copy %BINDIR%\mslib\msvcrt40.dll %BINDIR%\tools
    copy %BINDIR%\mslib\msvcrt40.dll %BINDIR%\bin

At this point, Tivoli processes are insulated from the changed MSVCRT40.DLL in %SYSTEMROOT%\system32 (after a restart of the oserv). DLL loading works differently in Windows NT than in previous versions of Windows. NT loads a DLL separately for each process, because each application has its own address space in Windows NT; in 16-bit Windows, the address space is shared. For more information, see the Microsoft Developer Network or Article ID Q100635, found at:
http://support.microsoft.com/default.aspx?scid=fh;EN-US;kbhowto&sd=GN&ln=EN-US&FR=0

LoadLibrary() works as follows on NT. When no path is specified, the function searches for the file in the following sequence:
1. The directory from which the application loaded.
2. The current directory.
3. The 32-bit Windows system directory. Use the GetSystemDirectory function to get the path of this directory. The name of this directory is SYSTEM32.
4. The 16-bit Windows system directory. There is no Win32 function that obtains the path of this directory, but it is searched. The name of this directory is SYSTEM.
5. The Windows directory. Use the GetWindowsDirectory function to get the path of this directory.
6. The directories that are listed in the PATH environment variable.

To confirm this behavior, use a tool like listdlls.exe from http://www.sysinternals.com. Processes like lsass will use %SYSTEMROOT%\system32\msvcrt40.dll and Tivoli processes like oserv will use %DBDIR%\msvcrt40.dll. Another method is to use tlist and locate the version of the DLL in question. Tivoli ships Version 4.0.0.5270.
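The effect of this search order on the msvcrt40.dll conflict can be illustrated with a small simulation. The Python sketch below does not call LoadLibrary(); it only walks a list of directories in the order given above and returns the first match. The function resolve_dll and all of its parameters are hypothetical names chosen for the illustration.

```python
import os


def resolve_dll(name, app_dir, current_dir, system32_dir, system_dir,
                windows_dir, path_dirs):
    """Return the first file named name found in the documented search order."""
    search_order = [app_dir, current_dir, system32_dir, system_dir,
                    windows_dir] + list(path_dirs)
    for directory in search_order:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None  # not found anywhere in the search order


if __name__ == "__main__":
    # Directories taken from the environment where available; illustrative only.
    root = os.environ.get("SYSTEMROOT", "")
    print(resolve_dll("msvcrt40.dll",
                      app_dir=os.getcwd(),
                      current_dir=os.getcwd(),
                      system32_dir=os.path.join(root, "system32"),
                      system_dir=os.path.join(root, "system"),
                      windows_dir=root,
                      path_dirs=os.environ.get("PATH", "").split(os.pathsep)))
```

Under this order, a copy of the DLL placed in the directory the application was loaded from is found before the one in SYSTEM32. Note that file name matching in this sketch follows the rules of the file system it runs on, which may be case-sensitive outside Windows.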

Microsoft platform-specific topics regarding TME

In this section, we will cover platform-specific topics regarding TME.


Windows NT (Version 4.0)

The current versions of TME products are fully supported with NT Version 4 Service Pack 6.

Windows 2000

The current versions of TME products are fully supported with Windows 2000.

Windows XP

• Only Windows XP Professional is supported. Windows XP Home Edition, Windows XP 64-bit Edition, and the Windows Whistler Server family are not supported.
• Only endpoints can be installed on Windows XP Professional systems. Tivoli Management Region servers and managed nodes are not supported as of Tivoli Management Framework Version 3.7.1.
• At least patch 3.7.1-TMF-0005 must be installed (includes endpoint version 93). This endpoint version is the first to support Windows XP.
• At least patch 3.7.1-TMF-0008 must be installed (includes winstlcf support for Windows XP). This command must be invoked from a Windows NT or Windows 2000 managed node.

Installation on Windows XP

Here are some installation specifics for Tivoli on Windows XP.

New endpoint on Windows XP

Use the InstallShield image to install an endpoint version 93 (or newer), or use the winstlcf command. Using the winstlcf command requires an existing endpoint version 93 on another Windows NT, Windows 2000, or Windows XP Professional system.

Upgrade Windows NT or 2000 endpoint to Windows XP

Follow these steps to upgrade a Windows NT or 2000 endpoint to Windows XP:
1. Upgrade the existing endpoint to version 93 (or newer).
2. Upgrade the operating system.

If the Windows XP Professional endpoint is older than version 93, the configuration is unsupported. This configuration can lead to endpoint failures, including an inability to upgrade the endpoint due to known technical problems.


Upgrade Windows 95, 98, or ME endpoint to Windows XP

Do these steps:
1. Uninstall the existing endpoint.
2. Upgrade the operating system.
3. Install an endpoint version 93 (or newer).

The Windows 95/98/ME endpoint has a completely different set of binaries and is a different Tivoli interpreter type (win95 versus w32-ix86) from Windows NT/2000/XP endpoints. Changing from a win95 interp to a w32-ix86 operating system requires a reinstallation of the TMA code.
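The two upgrade paths can be summarized as a lookup that returns the ordered steps for a given starting point. The Python sketch below restates the steps listed above. The function upgrade_steps, the operating system labels, and the step strings are hypothetical and serve only to make the ordering explicit.

```python
MIN_XP_ENDPOINT_VERSION = 93  # first endpoint version that supports Windows XP


def upgrade_steps(current_os, endpoint_version):
    """Return the ordered steps for moving an endpoint to Windows XP Professional."""
    if current_os in ("Windows 95", "Windows 98", "Windows ME"):
        # Different interpreter type (win95), so the endpoint is reinstalled.
        return ["uninstall endpoint",
                "upgrade operating system",
                "install endpoint version 93 or newer"]
    if current_os in ("Windows NT", "Windows 2000"):
        steps = []
        if endpoint_version < MIN_XP_ENDPOINT_VERSION:
            # The endpoint must be upgraded before the operating system.
            steps.append("upgrade endpoint to version 93 or newer")
        steps.append("upgrade operating system")
        return steps
    raise ValueError("no upgrade path described for %r" % current_os)


if __name__ == "__main__":
    print(upgrade_steps("Windows 2000", 85))
    print(upgrade_steps("Windows 98", 85))
```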

Technical issues on XP/2000 resolved by endpoint V 93+

The following technical issues are solved starting with endpoint version 93. If a system is in an unsupported configuration (for example, a pre-version 93 LCF endpoint on Windows XP), it might be possible to resolve the problem long enough to upgrade the endpoint to a supported configuration.
• Stopping the lcfd service using the net stop lcfd command causes a system error dialog to be displayed.
• Windows 2000 and XP systems have more restrictive default security on system directories, which can prevent read/execute permission to temporary scripts and binaries required by spawned tasks and methods.
  Workaround: Grant read/execute permissions for group Everyone to the %TEMP% directory.
• Membership in the Administrators group was determined by checking the owner of the current process security descriptor. Windows XP changed the default owner to be the individual user rather than the group.
  Workaround: Change local security policy named "System Objects: Default owner for objects created by members of the administrators group" from "Object creator" to "Administrators group".
• Granting read/execute permission to %SystemRoot%\System32 for group Everyone conflicts with more restrictive default security settings on Windows 2000 and Windows XP systems. Grant permission to the tmersrvd account instead.

Technical issues on XP not resolved by LCF V 93

Here we discuss some XP technical issues that are not resolved by LCF V 93.
• "ForceGuest" (simple file sharing)
  By default, Windows XP systems that are not a member of a domain force all incoming network connections to "Guest" level access, even when a user name and password have been specified. Because of this new "ForceGuest" behavior, either the "Guest" user account or the "Everyone" group (the only group to which Guest belongs) must have permissions on the share (and the shared directories and files). "ForceGuest" causes remote installations that rely on the default administrative share (for example, C$) to fail because "Guest" and "Everyone" do not have sufficient access.
  Solution:
  – Make the Windows XP system a member of a domain prior to remote installation. Joining a domain changes the default behavior of "ForceGuest" to be compatible with Windows NT and 2000 authentication.
  OR
  – Manually disable the "ForceGuest" behavior of the Windows XP system prior to remote installation. This can be done in one of the following ways:
      • From Windows Explorer, select Tools -> Folder Options, select the View tab, and uncheck Use simple file sharing (recommended), which is found at the bottom of the Advanced Settings list.
      • From the Local Security Policy tool, open the Security Settings tree and select the Local Policies -> Security Options folder. Change the Network access: Sharing and security model for local accounts policy from Guest only - local users authenticate as Guest to Classic - local users authenticate as themselves.
  For related information, see Microsoft Knowledge Base article #Q290403, found at:
  http://support.microsoft.com/default.aspx?scid=fh;EN-US;kbhowto&sd=GN&ln=EN-US&FR=0
• Remote reboot
  Windows XP systems that are not a member of a domain cannot be rebooted remotely when no one is currently logged into the system.
  Solution: Choose one of the following three solutions:
  – Make the Windows XP system a member of a domain prior to remote installation.
  – Log into the Windows XP desktop before starting installation.
  – Choose to not perform a reboot of Windows XP during remote installation, then reboot manually when installation is complete.

• Administrator account
  By default, Windows XP systems that are not a member of a domain hide the built-in Administrator account. While hidden, the Administrator account cannot be selected to log onto the XP system interactively. This can cause problems or confusion when running Tivoli commands from XP managed nodes.
  Solution: Choose one of the following three solutions:
  – From the Tivoli Server, add the Administrator-equivalent accounts to the logins for the Tivoli administrator. See the wsetadmin command for more information.
  – Make the Windows XP system a member of a domain. This forces the logon process to behave like Windows NT/2000 systems and the Administrator account may be used for interactive logons.
  – Remove all accounts from the XP system that have administrator privileges except Administrator. This forces the built-in Administrator account to become visible and available for interactive XP logons.
• Display issues
  GUI tasks and methods executed on a Windows XP endpoint do not display properly. This is either a change or a bug in the Windows security system involving the access control lists (ACLs) for the default Windows workstation and Desktop.

Environmental considerations

There are several environmental concerns pertaining to NT/Windows 2000 that may affect TMF/TMA.

How shell and Perl scripts work on NT

Shell and Perl scripts within Tivoli start with the line #!/bin/sh or #!/etc/Tivoli/bin/perl. Although these paths do not exist on NT, Tivoli Management Framework catches these references and directs them to the correct executable. On TMF, the oserv process reads the line and, if it sees either 'sh' or 'perl', uses the perl and sh found in %BINDIR%\tools. TMA relies on dependencies when sh or perl is encountered. Because Tivoli uses #!/bin/sh, it is important not to replace Tivoli's versions of bash.exe and sh.exe. If your enterprise needs dictate another version of sh.exe, place it in a separate location and adjust the NT PATH environment to include that version. Tivoli's implementation will continue to use the Tivoli-supplied bash.exe and sh.exe.
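The redirection of the first script line can be pictured with a short sketch. The Python code below is not the oserv's actual logic, which is not shown in this paper. It is a hypothetical illustration that assumes only the final path component of the interpreter line matters and that the executables are named sh.exe and perl.exe in the tools directory; pick_interpreter is a made-up name.

```python
import ntpath  # Windows path rules, usable on any platform


def pick_interpreter(first_line, tools_dir):
    """Map a '#!' line to an executable in tools_dir, or None if not handled."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].strip().split()
    if not parts:
        return None
    # Keep only the last path component, for example 'sh' from '/bin/sh'.
    base = parts[0].replace("\\", "/").rsplit("/", 1)[-1]
    if base in ("sh", "perl"):
        return ntpath.join(tools_dir, base + ".exe")
    return None


if __name__ == "__main__":
    tools = r"C:\Tivoli\bin\w32-ix86\tools"  # example location only
    print(pick_interpreter("#!/bin/sh", tools))
    print(pick_interpreter("#!/etc/Tivoli/bin/perl", tools))
```

A script whose first line names any other interpreter returns None in this sketch, which corresponds to the case where a dependency or a PATH entry would have to supply the tool.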

Dependencies and TMA

Because TMA endpoints do not contain the various tools included with TMF (%BINDIR%\tools), it will be necessary to create dependencies to provide support to scripts using commands not found with the standard NT/TMA release. The Tivoli Management Framework Release Notes Version 4.1, GI11-0890, describe how to create dependencies on TMA. For example, the run_task method needs to have a dependency on bash (sh) in order for tasks to execute a script using #!/bin/sh. To set this up, run:

wdepset -c task-library-tools -a w32-ix86 bin/w32-ix86/tools/sh.exe +a +p %TOOLS%
wdepset -e @DependencyMgr:task-library-tools -a \
w32-ix86 bin/w32-ix86/tools/win32gnu.dll +a +p %TOOLS%
wchdep @Classes:TaskEndpoint @DependencyMgr:task-library-tools run_task
wgateway dbcheck

You can issue the wdepset -e command for each tool you want to be downloaded when a task is run, like sed and awk, or for perl. You must issue a wgateway dbcheck against all gateways whenever you update dependencies so that the new method headers are cached.

Tip: If your enterprise requirements dictate that Tivoli product tasks will use a set of tools on the TMA nodes, it may be best to create a file package that is then distributed to each node after its initial creation. This may prove to be an easier method of providing tools to the nodes than using dependencies.

Name resolution/WINS

The NT system offers several means to resolve host names. Tivoli uses the standard gethostbyname() and gethostbyaddr() system calls to resolve names and IP addresses. When such a call is passed to the NT system, it uses not only DNS (if configured), but also WINS, hosts, and LMHOSTS. If a failure occurs, be sure that the NT system is properly configured for name resolution, for example that Enable DNS for Windows Resolution is set in the TCP/IP properties. One note about WINS: Although WINS is a valid resolver, the WINS database is not static by default and should not be relied on as the primary resolver when the NT system is a managed node.


There are several reports of problems with using Fully Qualified Domain Names. For example, a post on NTBUGTRAQ's listserv (http://www.ntbugtraq.com) reports that a 15 character FQDN will fail to resolve on an NT system due to NetBIOS naming conflicts. Another issue with the NT system's implementation of name resolution is that Microsoft allows DNS host names on clients to contain underscores (_). This is not RFC compliant; underscores are illegal in DNS host names. By default, an underscore in a WINS name is converted to a dash in the MS TCP/IP DNS implementation. Host names that contain an underscore are therefore converted to use a dash, which causes problems in determining the real host name.
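A quick pre-check of host names can catch these cases before a managed node is installed. The following is a minimal sketch, assuming the RFC 952/1123 host name rules (letters, digits, and hyphens per label) and the 15 character limit on NetBIOS computer names; the function name hostname_problems is hypothetical, and the length check is only a heuristic, not a reproduction of the NTBUGTRAQ report.

```python
import re

# One DNS label under the RFC 952/1123 host name rules: letters, digits and
# hyphens only, not starting or ending with a hyphen, at most 63 characters.
LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")

def hostname_problems(fqdn):
    """List reasons a host name may resolve inconsistently on NT."""
    problems = []
    labels = fqdn.rstrip(".").split(".")
    for label in labels:
        if "_" in label:
            problems.append("underscore in label %r" % label)
        elif not LABEL.match(label):
            problems.append("invalid label %r" % label)
    # NetBIOS computer names are limited to 15 characters, so a longer
    # first label cannot map one-to-one onto a NetBIOS name.
    if len(labels[0]) > 15:
        problems.append("first label longer than 15 characters")
    return problems
```

An empty list means that none of the checks fired; it does not prove that resolution works on a given network.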

Sourcing the Tivoli environment

If a user wishes to source the Tivoli environment, there are setup files located on the node:
򐂰 TMF: %SYSTEMROOT%\system32\drivers\etc\Tivoli\setup_env.cmd
򐂰 TMA: %SYSTEMROOT%\Tivoli\lcf\1\lcf_env.cmd

To have a command shell that sources the Tivoli environment when invoked, create a shortcut to the cmd.exe executable and select Properties. Under Shortcut, in the Target field, append /k followed by the path of the setup file to cmd.exe. For example, a TMF file being sourced would use:
cmd /k %SYSTEMROOT%\system32\drivers\etc\Tivoli\setup_env.cmd

Tivoli Desktop for TMF

The Tivoli Desktop is not installed with the NT TMF or TMA. To install the Windows version of the desktop, initiate \PC\Desktop\Disk1\setup.exe. This will install the Tivoli Desktop application. It is still possible to point an existing Windows Tivoli Desktop at an NT managed node even if the Desktop For Windows is not installed on that NT system. The managed node is capable of supporting remote desktops. TMA nodes do not support remote desktops.


Basic performance tuning considerations

Although not important to TMA, there are several NT tuning parameters useful to TMF nodes serving as repeaters, gateways, a TMR, or a TEC server:
򐂰 NT 4.0: Select Control Panel -> Network -> Services -> Server -> Properties. This section allows the tuning of system-wide caches. The ideal setting is Maximize Throughput for Network Applications.
򐂰 NT 4.0: Select Control Panel -> System -> Performance. Set the Performance Boost to the low setting if possible.
򐂰 Windows 2000: Select Control Panel -> Network and Dial-up Connections. Right-click your primary NIC and choose Properties. Highlight File and Printer Sharing for Microsoft Networks and click Properties. Under Optimization, choose Maximize data throughput for network applications.
򐂰 Windows 2000: In the Control Panel, select System -> Advanced -> Performance Options -> Background Services (default).
򐂰 Screensavers

Screensavers affect the performance of the NT system. A screensaver can appear to consume 100% of the CPU, as seen when perfmon is run.

Non-US Keyboard issue

If an NT TMF/TMA install fails with the error:
Creating separate WindowStation and DeskTop. Create WindowStation failed. The specified module could not be found.
this is due to the input locales. Copy keybus.dll to %SYSTEMROOT%\system32. This can also cause net_recv errors with TMA nodes when issuing wadminep view_version.

Port restriction causes TIME_WAIT to last 169 seconds

Tivoli provides a means to restrict port usage to a given range (odadmin will reveal whether port restrictions are in use). On an NT system, the restrictions cause closed TCP connections to persist for 169 seconds in the TIME_WAIT state.


To minimize the delay, edit the NT system's TCP settings as follows:
1. Locate the key: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters
2. Click Edit -> Add Value.
3. Enter the value name TcpTimedWaitDelay.
4. Change the data type from its default (REG_SZ) to REG_DWORD.
5. When you click OK, you are prompted for Data. Enter 60 (decimal) for 60 seconds.
6. Reboot. Ports in the TIME_WAIT state will then be released after 60 seconds.
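The same change can be prepared as a registry import file instead of being typed into the registry editor. The sketch below only builds the text of such a file; the function name timed_wait_reg is hypothetical, the REGEDIT4 header is used because it is the older import format, and the 30 to 300 second bounds are an assumption based on the range Microsoft documents for this value, to be verified for your Windows version.

```python
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

def timed_wait_reg(seconds=60):
    """Build .reg file text that sets TcpTimedWaitDelay to `seconds`."""
    # Assumed valid range; check the documentation for your Windows version.
    if not 30 <= seconds <= 300:
        raise ValueError("seconds outside the assumed 30..300 range")
    lines = [
        "REGEDIT4",
        "",
        "[%s]" % KEY,
        # REG_DWORD values are written as eight hexadecimal digits.
        '"TcpTimedWaitDelay"=dword:%08x' % seconds,
        "",
    ]
    return "\r\n".join(lines)
```

Writing the returned text to a file and importing it has the same effect as steps 1 through 5; the reboot in step 6 is still required.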

Service Pack versions supported with Tivoli

Tivoli Management Framework is tested against all Service Packs for NT, Service Pack 4 for Windows 2000, and Service Pack 1 for XP.

TCP/IP speed tweaks for Windows NT, 2000 and XP

The Web site SpeedGuide.net (http://www.speedguide.net) provides some useful information on fine-tuning TCP/IP to perform faster over broadband Internet connections, such as a cable modem or DSL. These speed tips can also improve performance on LAN-based networks. In general, the tuning involves increasing the TCP Receive Window and TCP Window size, and setting the Maximum Transmission Unit (MTU) to correspond with the maximum Ethernet packet size. More information, as well as downloadable registry updates for Windows 2000 and Windows XP systems, can be found at:
http://www.speedguide.net/Cable_modems/cable_reg_win2k.shtml

Settings for Windows NT systems can be found at: http://www.speedguide.net/Cable_modems/cable_registry.shtml

Please note that the settings for NT and for 2000 or XP systems are not interchangeable. For example, NT's maximum receive window is limited to 64 KB, while on Windows 2000 it can be as high as 256 KB. Note also that these options may not be optimal for some network environments, and the changes involved should be tested thoroughly before implementing them in production or on a wide-scale basis.


Example A-2 shows an example “speed tweak” file for Windows 2000.

Example: A-2 Contents of file: Speed_tweak_for_Windows_2000.REG
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"SackOpts"=dword:00000001
"TcpWindowSize"=dword:0003ebc0
"Tcp1323Opts"=dword:00000001
"DefaultTTL"=dword:00000040
"EnablePMTUBHDetect"=dword:00000000
"EnablePMTUDiscovery"=dword:00000001
"GlobalMaxTcpWindowSize"=dword:0003ebc0
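The dword values in Example A-2 are hexadecimal, which makes them hard to compare with the window sizes quoted in the text. A minimal sketch for decoding them follows; the function name decode_dwords and the SAMPLE constant are illustrative, and SAMPLE simply repeats values from the example file.

```python
import re

# Matches lines such as "TcpWindowSize"=dword:0003ebc0
DWORD = re.compile(r'^"([^"]+)"=dword:([0-9a-fA-F]{8})$')

def decode_dwords(reg_text):
    """Return {value name: decimal value} for every dword line in reg_text."""
    values = {}
    for line in reg_text.splitlines():
        match = DWORD.match(line.strip())
        if match:
            values[match.group(1)] = int(match.group(2), 16)
    return values

# Two entries copied from Example A-2.
SAMPLE = '''[HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\Tcpip\\Parameters]
"TcpWindowSize"=dword:0003ebc0
"DefaultTTL"=dword:00000040
'''
```

Decoded this way, TcpWindowSize is 256960 bytes and DefaultTTL is 64.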

Tivoli, Microsoft, and third party utilities for NT

Below is a list of commands and concepts that are unique to the NT environment.

NT-specific commands provided by TME

򐂰 wrunui.exe (TMF only)

Executable that allows a script, sentry monitor, or task to execute a graphic-based application on the NT console. For example:
wrunui notepad

򐂰 bash.exe

Bourne Again Shell. Provides full Bourne shell facilities. This is located in %BINDIR%\tools. There is also sh.exe, which is the renamed bash executable.

򐂰 wsettap.exe (TMF) or wlcftap.exe (TMA)

wsettap and wlcftap have three roles:

– Sets the TRAA account for remote access of NT objects:
wsettap -r DOMAIN\fred    sets TRAA to that domain account
wlcftap -r ""             nulls TRAA account

– Activates and disables the Tivoli Authentication Package:
wsettap -d    Disables TAP
wlcftap -a    Activates TAP

– Configures TAP to query only the PDC or any PDC/BDC for authentication:
wsettap -B    for ANY Domain Controller
wlcftap -P    for Primary Domain Controller

򐂰 smtp_client.exe (TMF only)

Provides a mail client for NT. Useful for scripts that need to send a mail message. For example:
smtp_client [email protected] < c:\temp\file

If smtp_client is issued with no e-mail address, it will look for the value in HKEY_LOCAL_MACHINE\SOFTWARE\Tivoli\Platform\mailhost. This key can be set using wmailhost.exe.

򐂰 TRAA (Tivoli Remote Access Account)

An account and password stored locally on NT that is used when a Tivoli process accesses a remote object or if the Tivoli binaries and libraries are installed on a remote share. A TRAA account is not needed unless customer design dictates access to these remote NT objects (like network shares).

򐂰 TRIP (Tivoli Remote Installation Package)

Provides rexec functionality for installation of a managed node or TMA using SIS or the classic install method. Once installed, it provides remote startup of the oserv (odadmin start ) and remote interconnects of TMRs. TRIP can be disabled after installation if remote startup of the oserv service is not needed.

򐂰 wlocalhost.exe (TMF only)

Command that sets the wlocalhost key and value in the registry:
HKEY_LOCAL_MACHINE\SOFTWARE\Tivoli\Platform\wlocalhost

The key is used when NT is a fail-over machine or when the host name differs from the Tivoli label of the managed node.

򐂰 wmailhost.exe (TMF only)

Command to specify an SMTP server. TMF nodes use this reference when the wsupport or smtp_client command is used, as do applications like Distributed Monitoring. It stores the SMTP server in HKEY_LOCAL_MACHINE\SOFTWARE\Tivoli\Platform\mailhost. When attempting to connect to an SMTP server, it will first try the host referenced with wmailhost, then the localhost, and then the host that the given e-mail address specifies (for example, [email protected]).

򐂰 ntprocinfo.exe

Provides a list of processes running on the NT system.


Windows commands for working with TME products

򐂰 ipconfig /all

Provides network information related to TCP/IP.

򐂰 netstat

Provides routing and port information.

򐂰 nbtstat

Displays protocol statistics and current TCP/IP connections using NBT (NetBIOS over TCP/IP).

򐂰 perfmon

Useful for obtaining overall system resource usage.

򐂰 Some useful commands from the NTResKit

– tlist.exe: Process list
– telnetd.exe: Telnet service (Windows 2000 has telnet built-in)
– sc.exe: NT Service Controller
– netsvc.exe: NT Service Controller
– ntrights.exe: Grants/revokes any advanced rights for a user on a local or remote computer

Other utilities

Some useful commands from http://www.sysinternals.com:

򐂰 listdll.exe

Shows a list of processes and the DLLs each process has loaded

򐂰 nthandleex.exe

Shows the processes running, the user the process is running as, and open handles

򐂰 ntregmon.exe

Shows what processes are accessing the registry

Concerns regarding perceived security vulnerabilities

While Tivoli utilizes APIs provided by Microsoft, the current security focus often raises Tivoli as a risk due to the use of the Authentication Package. This perceived risk often impacts deployment schedules. In most cases, it is advised that you review this chapter with the Security team, and highlight this section in particular, so that the level of risk associated with using Tivoli is understood.

Utilizing the LSA/Authentication package implementation

The Tivoli Authentication Package (TAP) functions as an extension to the Local Security Authority (LSA) subsystem in Windows NT/2000/XP. TAP itself does not create tokens, and there is no way to invoke TAP functions directly. Rather, a process calls the LSA to request a token and specifies which authentication package to use. This calling process provides required data in the format expected by the selected package, and the LSA passes it to the authentication package for verification. If the data is OK, the authentication package returns a SID and other information back to the LSA, which then creates a token and returns it to the original caller.

Processes that call the LSA and select TAP as the authentication package must meet the following requirements to prevent unauthorized use:
򐂰 The calling process must know the TAP protocol. Various structures and data are provided to the LSA and subsequently passed to TAP. Knowledge of this protocol requires access to the source code.
򐂰 The calling process must know how to pass specific TME credentials. The caller process must pass specific credentials to TAP.
򐂰 The calling process must have SeTcbPrivilege ("Act as part of the operating system") privileges. By default, only the SYSTEM account has this privilege, but it can be assigned by a member of the Administrators group.

Vulnerability analysis

Impersonating a user account requires a valid process token. The oserv and lcfd processes request this token from the LSA, specifying TAP as the desired authentication package. If the LSA grants the request, it generates a process token that contains the same information as if the impersonated user had actually logged on to the local system, except that the token does not contain credentials for accessing remote resources (shares) as the requested user. If a Tivoli Remote Access Account (TRAA) has been set, the token will contain credentials for accessing remote resources as the TRAA. Setting the TRAA account is not a requirement of the Tivoli Management Framework, and most customers leave it unset. Tivoli ID maps have no effect on this procedure. They simply provide a convenient way to map a single label to specific accounts on different platforms.


For example, a task configured to run as $root_user may run as Administrator on Windows NT or as root on Solaris. Once the ID map has been resolved to the actual account name, the impersonation procedure is the same as in a normal Tivoli ID map operation. Tivoli ID maps contain no actual account information.

The safeguard that prevents the unauthorized use of TAP is the Windows NT/2000/XP security system itself. As noted in the first section, the LSA requires the calling process to have the SeTcbPrivilege before it will grant any requests. This applies to all authentication packages, not just TAP. Only the SYSTEM account has SeTcbPrivilege by default, and only a member of the Administrators group can grant this privilege.

When TMF is installed, a Tivoli_Admin_Privileges group is created and the SeTcbPrivilege is granted to this group. The member of the Administrators group who is installing TMF and the built-in Administrator account are made members of this group. The account mapped to $root_user (by default the account that installs TMF or the Administrator account) must be a member of this group so that TMF tasks may be run as the requested user.

Any process that is already running as SYSTEM or Administrator has unrestricted access to the system. Even if a rogue process were installed and running as SYSTEM or Administrator, it would still have to know the TAP password and protocol before it could use the LSA and TAP to generate impersonation tokens for accounts without knowing their individual passwords. Any attempts to discover the TAP password can be monitored by configuring Windows to log all failed logons. Also, the TAP password on a machine may be changed to a new random value by shutting down the oserv or lcfd process, running wsettap/wlcftap -a, and restarting the oserv or lcfd process.

Common problems, troubleshooting, and FAQs

Below is a general list of Tivoli issues encountered on NT/2000/XP.

Issues related to the OS

򐂰 Inheritance of user privileges from group membership is not honored by the operating system.

This is seen almost exclusively on Windows 2000 systems, most frequently on those that participate in Active Directory. Tivoli requires the privileges:
– Act as part of operating system
– Increase Quotas
– Replace a process level token

To make administration of the privileges simpler, Tivoli creates the Tivoli_Admin_Privileges group and assigns these privileges to this group; however, users who are members of Tivoli_Admin_Privileges may not inherit those privileges as would be expected. The work-around is to assign those three privileges to users individually.

򐂰 Security Audit event 529 contains "TivoliAP" when NTLM is set to "Send NTLMv2 response only/refuse LM & NTLM".

Microsoft has confirmed this is a bug (Q312827) in Windows 2000 systems that are configured to refuse LanManager (LM) and NT LAN Manager (NTLM) requests and only accept NTLMv2. The error lies in the event message itself. More information on this bug can be found at:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;q312827

򐂰 Issues with TAP

TAP issues typically fall into two areas: initial startup of the oserv, and spawning processes as invalid accounts. The wsettap/wlcftap commands can be issued regardless of the oserv's state (up or down). To issue wsettap/wlcftap, the caller must be a member of both the Administrators group and the Tivoli_Admin_Privileges group. If a user gets the error Access is denied, this is a result of the user not being part of the two groups.

򐂰 Failure to start oserv

Failures to start the oserv generally return error 1067. In all cases, review the %DBDIR%\oservlog on the failing NT system for a better explanation. Below are several errors and the solutions to restart the oserv.

– Tivoli Authentication Package is not properly installed or loaded by LSA. The error is:
!tap_init_failed A specified authentication package is unknown.
TAP is not known to the LSA subsystem.

• TAP is not listed in the Authentication Package key: Issue the command:
wsettap -a or wlcftap -a
Then reboot the machine.

• Another Authentication Package is listed before the Tivoli Authentication Package. If this is the case, move TAP to the second position (after msv1_0) in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\AuthenticationPackages.


– !tap_call_init failed, Error 2221

The TRAA account no longer exists. Reset the TRAA account to a valid account by running:
wsettap -r \
or
wsettap -r ""
(This will nullify the TRAA account.)

– tap_init_failed Error 38

The privileged account is not part of the Tivoli_Admin_Privileges group.

– !tap_call_init failed. The user must change his password before he logs on the first time.

The TRAA account is being forced to reset the password. Disable this feature.

– tap_get_sid_logon_token failed: No mapping between account names and security IDs was done.

The tmersrvd account has been removed.

– tap_get_sid_logon_token failed: Logon failure: the user has not been granted the requested logon type at this computer.

The tmersrvd account does not have Log on Locally as a UserRight.

– CreateWindowStation failed: Access is denied

tmersrvd is denied access to the %SYSTEMROOT% directory. Verify that tmersrvd has Bypass Traverse Checking and that the account is not specifically denied access.

򐂰 Executing processes using invalid account

If the Tivoli Management Framework attempts to spawn a process as an invalid user, such as when invoking a task or a sentry monitor, the following errors can occur:

– user_string_to_sid failed with code '77'

The account used does not exist, or the idmap does not have the correct entry for the w32-ix86 interp (widmap). This is common with tasks, where the task is specified to run within the context of a given UID.

– tap_make_sid_logon_token failed with code '77'

Either:
• The account is not allowed to log on locally. Add this User Right so that the account in question can run successfully.
or
• The account is disabled. Be sure the account has Password Never Expires set.
or
• The account has User Must Change Password set. Disable this.

Note: Be sure that the account is not only set to never expire, but also is no longer disabled.

– Logon failure: unknown user name or bad password.

Reset the account on the NT system, or synchronize the domain.

– Logon failure: the user has not been granted the requested logon type at this computer.

Either the TRAA user is not allowed to access a share, or the TRAA user is not able to access the computer from the network. Check the share permissions and/or the UserRights. This can also occur if a task's UID does not have the Logon Locally right set.

– LookUpAccountFailed

Account does not exist on the local NT system or in the domain.

Startup of oserv

The following errors may appear when oserv is started. Suggested solutions are given with each error.

򐂰 @odinit: Unable to establish connection to ALI !oserv: odlist init failed. IPC Shutdown (67)

There is already a connection from the NT system to the TMR Server that is currently in a TIME_WAIT state. Issue netstat -a and verify the TIME_WAIT state with the TMR Server's port 94.

򐂰 !oserv: odlist init failed. requested resource not found (30)

This can be several possible problems, including:
– TAP has the TRAA account incorrectly set. Reset the TRAA account.
– Name resolution for the NT system is incorrect. The NT system is unable to look up its IP address and host name correctly. Add a line to %SYSTEMROOT%\system32\drivers\etc\hosts with the correct host name and IP address for NT.

򐂰 bind failed winsock_comm.c. No such File or directory

The wlocalhost registry key is incorrectly set. Issue the command:
wlocalhost

򐂰 Application failed to initialize

This is a DLL conflict with MSVCRT40.DLL. See “DLL conflicts” on page 1008.
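For the first error in this list, the netstat check can be scripted once the output has been captured to text. The following is a minimal sketch, assuming numeric output such as that produced by netstat -an with the usual Proto, Local Address, Foreign Address, and State columns; the function name and the sample addresses are hypothetical.

```python
def time_wait_to_port(netstat_output, port=94):
    """Return netstat lines for TCP connections in TIME_WAIT to `port`."""
    hits = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto, Local Address, Foreign Address, State
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue
        foreign, state = fields[2], fields[3]
        if state == "TIME_WAIT" and foreign.rsplit(":", 1)[-1] == str(port):
            hits.append(line.strip())
    return hits

# Hypothetical capture; the addresses are made up for illustration.
SAMPLE = """
  TCP    10.0.0.5:1043    10.0.0.1:94    TIME_WAIT
  TCP    10.0.0.5:1044    10.0.0.1:80    ESTABLISHED
"""
```

Any line returned points at a lingering connection to the object dispatcher port. Without the -n flag, netstat may print a service name instead of the port number, in which case this comparison will not match.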

Using TRAA with tasks

The following errors may appear when using TRAA with tasks. Suggested solutions are given with each error.

򐂰 Access is denied executing task

The TRAA account does not have permissions to access a remote share.

򐂰 User sees a phantom mapped drive.

If a task is run on NT, and the script maps a drive using the net use DRIVE command but does not delete the mapping, a user will see the drive mapped and will be unable to delete the mapped drive. Although mapping and then deleting the drive within a task is acceptable, the alternative that avoids problems with ghost drive letters is to use the \\\ (UNC) format.

General Tivoli Management Framework

The following errors may appear when using Tivoli Management Framework. Suggested solutions are given with each error.

򐂰 Error in oservlog regarding 'hurl'

This problem can occur if NT serves as a TMR Server and a network intensive task or a wchkdb -ux was invoked. Increase the number of rpc_max_threads to the maximum supported by the OS (typically 2038) by running:
odadmin set_rpc_max_threads 2038

Any number higher than the supported limit sets the value to the maximum; you could use 10000 (higher than the supported limit) in the set_rpc_max_threads command and the result will most likely be 2038. For example:
odadmin get_rpc_max_threads

The result will be 2038.

򐂰 Unexplained failures of processes to communicate

One possible issue to consider during deployment is whether the NT system has been “hardened” to reduce the number of ephemeral ports available to an application. To investigate this, search on MaxUserPort.


The description of this key is:
MaxUserPort REG_DWORD
Range: Port number in hexadecimal (0x1388 - 0xFFFE)
Default: 0x1388

Determines the maximum port number used when an application requests an available user port from the system. Typically, ephemeral ports are those that are short-lived and are allocated to port numbers 1025 through 5000 (0x1388).
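To judge how far a hardened system has been restricted, it helps to turn the MaxUserPort value into a count of available ports. This is a small arithmetic sketch based only on the numbers quoted above (first ephemeral port 1025, range 0x1388 to 0xFFFE); the function name is hypothetical.

```python
FIRST_EPHEMERAL = 1025  # lowest ephemeral port named in the text

def ephemeral_port_count(max_user_port=0x1388):
    """Number of ports from 1025 through max_user_port, inclusive."""
    if not 0x1388 <= max_user_port <= 0xFFFE:
        raise ValueError("MaxUserPort outside the documented range")
    return max_user_port - FIRST_EPHEMERAL + 1
```

With the default of 0x1388 (5000) this gives 3976 ports; with the maximum of 0xFFFE (65534) it gives 64510.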

Install issues

The following errors may appear during an install. Suggested solutions are given with each error.

򐂰 There is some difficulty installing a managed node on NT/2000. The odadmin odlist objects command only shows three objects.

Though there may be several reasons for this failure, one particularly notable one has to do with Bash and NT/2000's Path variable. Normally, the environment variable "Path" is in mixed case, as noted; if, however, the variable's case changes to anything other than "Path", the installation of a managed node on that system will fail. This is due to Bash's inability to find the proper path to commonly used executables used in the installation routine.

– How to check your Path variable: The quickest way is to issue set in a Command Prompt window and look through the output for the Path= line (it will be in alphabetical order).

– How to fix your Path variable:

• Windows NT:
i. Select Start -> Settings -> Control Panel -> System. Select the Environment tab.
ii. Scroll through the System Variables list and highlight the Path variable.
iii. At the bottom of the screen, modify the Variable field to read (exactly) Path.
iv. No reboot is necessary.

• Windows 2000:
i. Select Start -> Settings -> Control Panel -> System. Select the Advanced tab. Press the Environment Variables button.
ii. Scroll through the System Variables list and highlight the Path variable.
iii. Click the Edit button.
iv. Modify the Variable Name field to read (exactly) Path.
v. No reboot is necessary, but to see that the fix has worked, you have to reopen the Command Prompt.

򐂰 Problem when removing a managed node.

If wrmnode -f -d fails with "FRWSL0007E An authorization error of type "FRWOG0022E insufficient authorization" occurred":
a. Stop and remove the oserv service on the faulty node:
%BINDIR%\bin\oinstall -remove
b. Delete the %DBDIR% on the faulty node.
c. Back on the TMR, do a forceful removal of the node:
odadmin odlist rm_od
d. Do a wchkdb -u (or -ux).
e. Ensure the Path variable is fixed on the node.
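The Path spelling check described for managed node installs can be automated against captured output of the set command. The sketch below is an illustration only, with a hypothetical function name; it works on the text of the output rather than on Python's os.environ, because on Windows os.environ converts variable names to uppercase and would hide the spelling.

```python
def path_name_from_set_output(set_output):
    """Return the exact spelling of the PATH variable name, or None."""
    for line in set_output.splitlines():
        name, sep, _value = line.partition("=")
        # Compare case-insensitively, but return the spelling as found.
        if sep and name.lower() == "path":
            return name
    return None
```

A return value other than "Path" (for example "PATH") indicates the condition that the text says causes the managed node installation to fail.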

Local responses fail on Windows 2000 Domain Controllers

This problem is related to IBM Tivoli Monitoring. It appears that the Effective Setting shown in the Local Policy Setting does not accurately reflect the actual setting when applied directly to a Windows 2000 Active Directory Domain Controller (native mode) locally. The problem was highlighted by IBM Tivoli Monitoring, where the dm_ep_engine process was unable to spawn any sub-processes, and so was unable to run any non-wntmon.dll monitor. The Security event log would show a successful logon/logoff entry, but no entries at all for tmersrvd, which are normally seen when a monitor runs as a sub-process. Also, all local responses fail. This is seen even if the widmap is set to either Administrator or BuiltinNTAdministrator.

Basically, each of the four key advanced user rights needs to be explicitly set to apply locally, even though they appear in the dialogs as grey-checked under Effective Policy Setting. This works on non-ADS servers, but not locally on an ADS server in native mode.

To better understand the settings, consider that:
򐂰 Local Policy Setting: Represents the Local Security Policy setting that can be modified by selecting Control Panel -> Administrative Tools -> Local Security Policy -> Local Policy -> User Rights Assignment.


򐂰 Effective Policy Setting: Represents the Domain Controller Security Policy that can be modified by following this path: Control Panel -> Administrative Tools -> Domain Controller Security Policy -> Security Setting -> Local Policy -> User Rights Assignment.

The sequence to change the User Right, as reported below, must be:
򐂰 Domain Controller Security policy

secedit /refreshpolicy machine_policy must be run to propagate the modification at the domain level. It may be necessary to wait a few minutes before proceeding with the Local Security policy.

򐂰 Local Security policy

The following changes need to be made: a. Modify the Bypass traverse checking options from the defaults (shown in Figure A-1) to the new values shown in Figure A-2 on page 1031.

Figure A-1 Bypass traverse checking options: Before the change


Figure A-2 Bypass traverse checking options: After the change

b. Modify the Act as part of the operating system options from the defaults (shown in Figure A-3 on page 1032) to the new values shown in Figure A-4 on page 1033.


Figure A-3 Act as part of the operating system options: Before the change


Figure A-4 Act as part of the operating system options: After the change

c. Modify the Increase quotas options from the defaults (shown in Figure A-5 on page 1034) to the new values shown in Figure A-6 on page 1035.


Figure A-5 Increase quotas options: Before the change


Figure A-6 Increase quotas options: After the change

d. Modify the Replace a process level token options from the defaults (shown in Figure A-7 on page 1036) to the new values shown in Figure A-8 on page 1037.


Figure A-7 Replace a process level token options: Before the change


Figure A-8 Replace a process level token options: After the change

Identifying which user a given process will run as

There will be times when it is necessary to identify which user a process runs as, either to check permissions or to troubleshoot a failed action. Below are several methods to identify the user a given method will run as.

TMR/managed node/gateway using odstat/wtrace This outlines how one identifies the user of a process that runs on a TMF node. For this example, the Tivoli action was to do the following: root#wrunquery -h freedom -f c:/tmp/test

GetNode

where wrunquery command is run against managed node freedom Follow these steps: 1. Identify a method in an odstat and its associated method:

Appendix A. Tivoli/Windows whitepaper

1037

334 M hdoq 1-984 done 6 0 15:14:24 1721656771.4.7#TMF_ManagedNode::Managed_Node# write_to_file

2. Issue the command resolve OID method:

root#resolve 1721656771.4.7 write_to_file
1721656771.1.345

At this point, you have identified the behavior object where the method is defined.

3. Issue the command objcall om_stat method:

root#objcall $TMR.1.345 om_stat write_to_file
CATALOG= SET_USER=* SET_GROUP=* EXPORT=TRUE EXECUTE=FALSE default

At this point, the SET_USER flag shows which user the method will run as. In this case, it is '*'.

4. Issue the command objcall om_get_definition method default:

root#objcall $TMR.1.345 om_get_definition write_to_file default
STORAGE=/TAS/MANAGED_NODE/man_node_skel1 MODEL=queued-obj-daemon

At this point, the man_node_skel1 process has been identified as the implementation of the write_to_file method.

5. Because SET_USER is '*', we must identify what the user will be. To do this, we identify the Tivoli Administrator that would start this process, and reference one of their logins:

root#objcall $TMR.0.0 get_principal_id [email protected]
$root_user $root_group

At this point, we know that the write_to_file method will run as the root_user idmap.

6. The file was written to an NT machine, so we must resolve the root_user idmap:

root#widmap resolve_entry root_user w32-ix86
Administrator

Conclusion: The process man_node_skel1 will run as the user Administrator.
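When many odstat lines have to be checked, the first step of this procedure can be scripted. The following Python sketch is illustrative only: it assumes that each line of interest ends with the object ID, interface, and method in the same layout as the sample odstat line in step 1, and the name parse_odstat_line is hypothetical rather than part of any Tivoli tool.

```python
import re

# Assumed layout, inferred only from the sample odstat line in step 1:
#   ... <object-id>#<interface># <method>
ODSTAT_TAIL = re.compile(
    r"(?P<oid>\d+\.\d+\.\d+)#(?P<interface>[^#]*)#\s*(?P<method>\S+)\s*$"
)


def parse_odstat_line(line):
    """Return (object_id, interface, method) for one odstat line, or None."""
    match = ODSTAT_TAIL.search(line)
    if match is None:
        return None
    return match.group("oid"), match.group("interface"), match.group("method")


sample = (
    "334 M hdoq 1-984 done 6 0 15:14:24 "
    "1721656771.4.7#TMF_ManagedNode::Managed_Node# write_to_file"
)
print(parse_odstat_line(sample))
```

The object ID and method returned by the sketch are the two values passed to resolve in step 2.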


TMA endpoint

Identifying the user a process runs as on an NT TMA node differs from the procedure for TMF nodes. With the advent of method caching on gateways, it is difficult to identify methods acting on the endpoints using odstat. The following alternative identifies the method that is running against the endpoint, and then the user and process. This example looks at the distribution of a FilePackage to a TMA node:

1. Set the debug level to 6 on the gateway:

wgateway set_debug_level 6

2. Restart the gateway:

wgateway restart

3. Once the gateway is up, execute the action on an endpoint that is assigned to that gateway:

wdistfp -a -d @FilePackage:StdConfig @Endpoint:binkley

where StdConfig is the file package and binkley is the endpoint.

4. Once completed, locate %DBDIR%\gatelog ($DBDIR/gatelog on UNIX) on the gateway. In the gatelog file, locate the methods that the endpoint verified it had already cached or needed, or a line where an mdist flag is referenced, as follows:

1998/11/24 11:53:40 +06: mdist: distribution ID = 27,method = fps_install,size = 0

Note: On a busy gateway, this will be next to impossible, as there could be multiple actions occurring. It is recommended that this be done on a test TMR.

5. Once the method has been identified, locate the method in the Tivoli Object Repository:

odbls -M fps_install -k $DBDIR
721656771.1.330 method: fps_install

Note: A method may be overloaded, in which case odbls returns several possible instances. This is beyond the scope of this discussion, but if the above command lists duplicate methods, the best thing to do is to check all instances to see whether they return the same value.

6. Now that the behavior object has been identified, execute the command objcall Behavior Object om_stat method:

objcall $TMR.1.330 om_stat fps_install
SET_USER=$root_user
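Step 4 of this procedure means scanning a potentially large gatelog for mdist lines. The following Python sketch shows one way to pull the method names out of such a log. It assumes that every relevant line has the same shape as the single sample line shown in step 4; the function name mdist_methods is hypothetical.

```python
import re

# Pattern derived only from the sample gatelog line in step 4.
MDIST = re.compile(
    r"mdist: distribution ID = (?P<dist_id>\d+),\s*"
    r"method = (?P<method>\w+),\s*size = (?P<size>\d+)"
)


def mdist_methods(gatelog_lines):
    """Return the distinct method names seen on mdist lines, in order."""
    methods = []
    for line in gatelog_lines:
        match = MDIST.search(line)
        if match and match.group("method") not in methods:
            methods.append(match.group("method"))
    return methods


log = [
    "1998/11/24 11:53:40 +06: mdist: distribution ID = 27,"
    "method = fps_install,size = 0",
    "a line that does not mention a distribution",
]
print(mdist_methods(log))
```

Each method name found this way can then be looked up with odbls as described in step 5.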

Identifying the user using ADE *.ist files

Another method of identifying a user is to install the ADE files that are part of the Tivoli Management Framework CD and match the method (from an odstat/wtrace) to the appropriate *.ist file. This is useful for the basic framework methods: the *.ist files exist only for Tivoli Management Framework, not for the applications. As an example, a Tivoli Administrator creates a policy region as follows:

1. wtrace reveals this information:

Object ID: 1721656771.1.195#SharedPolicyRegions::Engine#
Method: create_policy_region
Principal: CRITSIT-LAB\mhahn (36458574/0)

2. Search the *.ist files for the method create_policy_region:

grep -i create_policy_region *.ist
PolicyGUI.ist

3. Identify the user that the method will run as:

TMF_imp_PolicyRegion::GUI::create_policy_region } = {"default", "/TAS/PRDO/Policy_GUI";}; };

Here the user is default, which is how the *.ist files refer to the unprivileged user.

Options for SET_USER

The options for SET_USER are:

• Privileged: SET_USER=root

On NT, this maps to the built-in Administrator account. Note that the built-in Administrator account can be renamed without affecting any privileged methods.

• Unprivileged: SET_USER=

(If viewing *.ist files from ADE, this is referred to as default.) On NT, this maps to the tmersrvd account.

• Idmap: SET_USER=$value

The $ has a special meaning to the oserv/lcfd: it refers to an idmap. To view the idmap, use the widmap command. For example:

SET_USER=$root_user
#widmap resolve_entry root_user w32-ix86
Administrator

Idmaps are managed at a TMR level and are designed to provide a means of mapping certain accounts based on the OS.

Important: When an idmap references an account name, such as Administrator, TAP must locate that name first in the local SAM database or in the domain SAM. If an account 'Administrator' is not found, the process will not start. This behavior differs from SET_USER=root, because root is mapped directly to a SID, which resolves to whatever the built-in Administrator account has been renamed to.

• User-defined: SET_USER=* or Tivoli applications that support setting a UID, such as tasks and SentryProfiles

As Tivoli Management Framework is deployed, many areas in the products allow customization. Tasks and SentryProfiles are two of the areas that allow Tivoli Administrators to define which user these customized programs will run as.
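The four SET_USER cases can be summarized as a small decision function. The Python sketch below is not part of Tivoli; it only restates the rules listed above, and the function name and the returned labels are hypothetical.

```python
def classify_set_user(value):
    """Classify a SET_USER value according to the options listed above."""
    if value == "root":
        return "privileged"      # built-in Administrator account on NT
    if value in ("", "default"):
        return "unprivileged"    # tmersrvd on NT; "default" in *.ist files
    if value == "*":
        return "user-defined"    # depends on the administrator's login
    if value.startswith("$"):
        return "idmap"           # resolve the entry with widmap
    return "unknown"             # not covered by the options above


for value in ("root", "", "$root_user", "*"):
    print(repr(value), "->", classify_set_user(value))
```

For the idmap case, the name after the $ (for example, root_user) is the entry that widmap resolves for the target operating system.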


Appendix B. Tivoli/NetWare whitepaper

This appendix contains information specific to installing endpoints and gateways on NetWare systems. A NetWare gateway running on a NetWare server can manage endpoint connections using the TCP/IP or IPX/SPX protocol. Additionally, this appendix contains information about migrating NetWare clients running PC agent software to endpoints, and migrating NetWare managed sites to gateways.

The content of this appendix has been compiled by Shantilal C. Lodhia of IBM US. The content was taken from various sources, including the Tivoli Management Framework Planning for Deployment Guide Version 4.1, GC32-0803, Tivoli Management Framework Reference Manual Version 4.1, SC32-0806, IBM Tivoli Enterprise Console Installation Guide Version 3.8, GC32-0823, Tivoli Management Framework User's Guide Version 4.1, GC32-0805, and the Novell NetWare Web site, found at http://www.novell.com/products/netware/. Additional contributions were provided by Kevin Alexander of IBM US.


NetWare considerations

The endpoint authenticates methods on a NetWare server by using Novell Directory Services (NDS). When you install a gateway or endpoint on a NetWare server, you must specify the Organizational Context in which the installation application creates the tmersrvd account (gateway) or the lcfrsrvd account (endpoint). These accounts are very limited (similar to the nobody account used by the Tivoli object dispatcher) and are used to authorize unprivileged methods. An unprivileged method cannot write, remove, copy, move, or affect any system function.

NetWare accounts

Tivoli Management Framework requires special accounts to manage NetWare gateways and endpoints. The following accounts are needed when you use Novell Directory Services (NDS):

• For endpoints, Admin (for root) and lcfrsrvd (for nobody)
• For gateways, Admin (for root) and tmersrvd (for nobody)

If you use NetWare Bindery emulation instead of NDS, the supervisor account is needed instead of the Admin account.

These accounts are created during the installation when it is done using Novell Requester. In bindery, the accounts are created on the local machine. In NDS, you choose whether to have a single lcfrsrvd or tmersrvd account in the NDS tree where the machine is located or to have one in each NDS subcontext.

When you install a NetWare gateway or endpoint using the InstallShield image, you are prompted to provide the NDS context in which the NetWare server works. This is the context in which the account is created or, if already present, updated with the information on the new gateway or endpoint. To have a single user in the NDS tree, provide the name of the higher-level context. To have a user in the subcontext, specify the name of the subcontext where the machine is located.

For example, you have an NDS tree called IBM and two NetWare systems named Server1 and Server2. Server1 is located in NDS context USA and Server2 is located in subcontext Texas.USA. You want to install an endpoint on Server2. An endpoint was already installed on Server1 and its installation created the lcfrsrvd account in the context USA. In this situation, you can update this account or create a new account in the subcontext (Texas.USA). To update the existing account, specify USA as the context. To create a new account, specify Texas.USA as the context.


If the Windows system does not have Novell Requester, the account is not created. To create the account, use the addadmin utility provided by Tivoli Management Framework.

Important: Never use the Novell NWADMIN utility to create the account.

To create a Tivoli unprivileged account for use with Tivoli Management Framework, perform the following steps:

1. Load the appropriate library:

– For gateways, enter the following command:

load SYS:tivoli\bin\nwr-ix86\bin\libnds5 NDS_context

– For endpoints, enter the following command:

load SYS:tivoli\lcf\bin\nw4\mrt\lcfutil5 NDS_context

2. Load the ADDADMIN.NLM file:

– For gateways, enter the following command:

load SYS:tivoli\bin\nwr-ix86\tmf\lcf\addadmin

– For endpoints, enter the following command:

load SYS:tivoli\lcf\bin\mrt\addadmin

3. Use the addadmin command to create the accounts in your context. With this utility, you can create the account in the subcontext or update the one in the tree. When logging in from a subcontext, log in to the tree as .Admin.context, where context is the context of the Admin account. When creating or updating the account in the tree, log in as Admin. When you are prompted to specify the account (tmersrvd or lcfrsrvd) and the account is to be created in a subcontext, use dotted notation. For example, to create the tmersrvd account in subcontext agodina, type .tmersrvd.agodina as the account name.
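The naming rules in step 3 are easy to get wrong when many servers are involved. The following Python sketch simply builds the strings described above. The helper names are hypothetical, and the behavior for an account created directly in the tree (returning the bare account name) is an assumption, because the text only gives the dotted form for subcontexts.

```python
UNPRIVILEGED_ACCOUNTS = ("tmersrvd", "lcfrsrvd")


def addadmin_account_name(account, subcontext=None):
    """Return the account name to type at the addadmin prompt."""
    if account not in UNPRIVILEGED_ACCOUNTS:
        raise ValueError("expected tmersrvd or lcfrsrvd, got %r" % account)
    if subcontext:
        # Dotted notation, as in the .tmersrvd.agodina example above.
        return ".%s.%s" % (account, subcontext)
    # Assumption: the bare name is used when no subcontext is involved.
    return account


def admin_login_name(admin_context=None):
    """Return the login name: .Admin.context from a subcontext, else Admin."""
    if admin_context:
        return ".Admin.%s" % admin_context
    return "Admin"


print(addadmin_account_name("tmersrvd", "agodina"))
print(admin_login_name("USA"))
```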

Installing NetWare gateways

You can install the gateway on a NetWare system from any supported Windows system that can access the NetWare server. You must log in to the NetWare system as Admin (NDS) or supervisor (bindery).


Note:

• It is preferred that you log in to NetWare as Admin. You can install the NetWare gateway with an Admin-equivalent account, but you will then need to delete the tmersrvd accounts and re-create them with the addadmin utility.
• If you are logged in to the NDS tree where the NetWare system is, but this tree is not set as primary for the Windows system, the appropriate user accounts cannot be created or updated correctly. The installation gives you an error message saying that it was unable to create the appropriate accounts.

Installing a gateway on a NetWare system consists of the following steps:

1. Install the NetWare binaries for the gateway.
2. Register the NetWare managed node.
3. Create the gateway.

Installing the NetWare binaries

To create a NetWare gateway, first run /PC/NWGW/SETUP.EXE from the Tivoli Management Framework CD. A welcome dialog is displayed, stating that you are creating the NetWare gateway. Follow the instructions displayed in the dialogs. When prompted for the name, specify the host name of the Tivoli Management Region server to which the gateway will connect. When prompted, specify the Tivoli Management Region server installation password, if previously defined.

Note: Because this is an InstallShield program, you can create a response file to install the image in unattended mode.

When you install the NetWare binaries, the installation adds the following NetWare configuration files (NCF) to the NDS context:

• oserv1st.ncf
• oservrun.ncf

After setup.exe runs, the object dispatcher is started.


Additionally, if you installed the NetWare gateway from a system not running Novell Requester to map the NetWare system, run the following command from the NetWare server console to start the object dispatcher the first time:

SYS:tivoli\bin\nwr-ix86\oserv1st -s Install_password

The -s option specifies the installation password, if defined.

Registering the NetWare managed node

Registering the NetWare managed node consists of the following steps:

1. Starting the object dispatcher
2. Running the registration script

Starting and stopping the object dispatcher

To restart the object dispatcher, run the following command:

oservrun

To stop the object dispatcher, run the following command:

oservend

Running the registration script

After starting the object dispatcher on the NetWare system, run the nw_TMF_Install.sh script from the Tivoli management region server to register the managed node with the Tivoli name registry:

bash nw_TMF_Install.sh host_name

where host_name is the TCP/IP host name of the NetWare system.

Creating the NetWare gateway

After registering the NetWare managed node, you can create the gateway using any of the available installation mechanisms. For detailed information, refer to the Tivoli Framework 3.7.1 Installation Guide, GC32-0395.

Installing endpoints on NetWare

The endpoint is supported on Novell NetWare 5.x and 6.x running on Intel 486 or Pentium systems. Tivoli Management Framework is supported as a gateway only on NetWare 5.0 with Service Pack 4.


Note: Before installing Tivoli Management Framework, enable long name support on all volumes by loading the LONGNAME.NLM module.

You can install the endpoint on a NetWare system from any supported Windows system that can access the NetWare server. For NDS, log in as Admin. For bindery, log in as supervisor.

Note: If you are logged in to the tree where the NetWare system is, but this tree is not set as primary for the Windows system, the appropriate user accounts cannot be created or updated correctly. The installation gives you an error message saying that it was unable to create the appropriate accounts.

The endpoint must be installed on the SYS volume of the server. The method cache, however, can reside on any volume by setting the cache_loc option on the lcfd command. You can override default settings using the Advanced Configuration dialog of the installation process.

To remotely install to a NetWare system, the Windows system must be running Novell Requester. After a remote installation, you must manually start the endpoint from the local machine: at the NetWare console, enter lcf to start the endpoint. The lcf command runs the LCF.NCF file, which was created during the remote installation. The LCF.NCF file contains the following:

• Load statements for LCFUTIL5.NLM and LCFD.NLM
• Any configuration changes specified on the Advanced Configuration dialog

Note: The LCFRUNBASE001 entry and the -C option in the LCF.NCF file contain the path to the endpoint run directory. This directory is created during installation and is the working directory for the endpoint.

Endpoints in Novell Directory Services (NDS)

An endpoint on a NetWare server can run in NDS with bindery either on or off. If bindery is on, no adjustments to the endpoint are required after installation. If bindery is off, or if you have switched bindery modes after installing the endpoint, use the following procedure to run the endpoint:

1. If you have an endpoint running, stop the endpoint daemon with the following command:

lcfstop

2. Choose the context in which you want the lcfrsrvd account to reside.


3. If you installed an endpoint in bindery, remove the previously installed lcfrsrvd account.

4. Load the addadmin utility with the following commands:

LOAD SYS:tivoli\lcf\bin\mrt\nw4\lcfutil5 NDS_context
LOAD SYS:tivoli\lcf\bin\mrt\addadmin NDS_context

5. Use the addadmin utility to create the lcfrsrvd account in the context that you choose. For detailed instructions, refer to Appendix D, "Operating System Considerations", in the Tivoli Framework 3.7.1 Installation Guide, GC32-0395.

6. Edit the SYS:system\lcf.sys file.

a. Add the following line:

NWDS_CONTEXT= your_context

b. Remove the following line:

BINDERY_EMULATION=yes

7. Enter lcf at the console to start the endpoint.
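The edit in step 6 can be previewed on a copy of the file contents before it is made on the server. The Python sketch below works on text only and touches no NetWare system. It assumes that lcf.sys is a plain list of KEY=value lines; the function name switch_to_nds and the sample setting some_other_setting are hypothetical.

```python
def switch_to_nds(lcf_sys_text, context):
    """Return lcf.sys text with bindery emulation removed and NDS context set."""
    kept = []
    for line in lcf_sys_text.splitlines():
        key = line.split("=", 1)[0].strip().upper()
        if key == "BINDERY_EMULATION":
            continue  # step 6b: remove the bindery line
        if key == "NWDS_CONTEXT":
            continue  # assumption: drop an old entry to avoid a duplicate
        kept.append(line)
    kept.append("NWDS_CONTEXT=" + context)  # step 6a: add the context line
    return "\n".join(kept) + "\n"


# Hypothetical file contents, for illustration only.
before = "BINDERY_EMULATION=yes\nsome_other_setting=1\n"
print(switch_to_nds(before, "Texas.USA"))
```

Other lines in the file are passed through unchanged, so the result can be compared with the original before the real file is edited.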

Migration of NetWare clients and managed sites

The following sections describe migration considerations for NetWare clients and NetWare managed sites (that is, a NetWare server running Tivoli NetWare Repeater or the clients of the NetWare server).

NetWare clients operating as PC managed nodes

NetWare clients running the NetWare PC agent through a PC managed node (that is, they are clients of a NetWare server but are not included in a NetWare managed site) should be converted to endpoints. Install an endpoint on these systems.

NetWare managed sites with IPX/SPX clients

If a NetWare managed site contains a client running the IPX PC agent, create a NetWare gateway on the NetWare managed site and create endpoints on the IPX PC agents. Until the migration is completed, the NetWare server must remain a Tivoli NetWare Repeater.


NetWare managed sites with TCP/IP clients

If a NetWare managed site contains clients running the TCP/IP PC agent, create endpoints on the TCP/IP PC agents. When creating these endpoints, assign them to an existing gateway, or create a NetWare gateway on the existing NetWare managed site. After assigning the endpoints to gateways, you cannot use the NetWare server to distribute files; use the gateway to distribute files to the endpoints.

Migrating NetWare clients to endpoints

Changing the Tivoli environment to use an endpoint instead of a NetWare managed site and a NetWare server involves the following steps:

1. Installing an endpoint on each client of the NetWare server. If you want to manage the NetWare server using Tivoli Enterprise products, install an endpoint on this system as well.
2. Finding all profile managers that have the NetWare managed site as a subscriber. If your Tivoli environment uses interconnected regions, check profile managers in all regions.
3. Replacing the individual NetWare managed site subscriber in each profile manager with the set of endpoints representing the selected clients of the NetWare managed site. Refer to “Replacing a NetWare managed site subscriber in a profile manager” for more information.
4. Finding jobs in task libraries and queries in query libraries that reference the NetWare managed site subscriber. Replace the individual NetWare managed site subscribers with the set of endpoints representing the selected clients of the NetWare managed site.

After you successfully change from distributing through a NetWare managed site to distributing through the gateway, optional cleanup includes:

• Removing the PC agent code from each modified system.
• Deleting the NetWareManagedSite object from the Tivoli Management Region.
• If the managed node that hosted the NetWare managed site is not needed for another purpose (for example, as a host for another NetWare managed site or as a gateway), you can delete the managed node from the Tivoli management region and uninstall Tivoli Management Framework from that system.
• If the Tivoli NetWare Repeater referenced by the NetWare managed site is not used by another NetWare managed site, you can remove the Tivoli NetWare Repeater code from the NetWare server.


There is no Tivoli Enterprise command or task for uninstalling the Tivoli NetWare Repeater. You need to remove the Tivoli binaries and other files from the system. If you need assistance, contact your Tivoli service provider.

Replacing a NetWare managed site subscriber in a profile manager

Multiple NetWare managed sites can distribute through a single NetWare server. This type of deployment is often used when the clients of the NetWare server run more than one type of operating system. For example, consider the NetWare server rainbow and clients in Figure B-1. Two of the clients of the NetWare server (red and orange) run Windows NT. The remaining clients (yellow and green) run Windows 95. An administrator might need to be able to easily distribute some files to all clients, other files to only the Windows NT clients, and another set of files to only the Windows 95 clients.

To do this, create three NetWare managed sites that distribute through the same NetWare server, selecting a different set of clients from rainbow's available clients list for each. For NetWare managed site rainbow-all, select all available clients. For rainbow-nt, select only Windows NT clients. For rainbow-w95, select only Windows 95 clients. This Tivoli environment is illustrated in Figure B-1.

Figure B-1 NetWare scenario (TMR server; NetWare server rainbow; three profile managers with subscribers rainbow-all, rainbow-NT, and rainbow-95; clients red and orange on Windows NT, yellow and green on Windows 95)

If red, orange, yellow, and green are running the TCP/IP PC agent, you have the option of managing them as endpoints through a gateway, rather than as NetWare clients through NetWare managed sites. To do this, change any profile managers that have rainbow-all, rainbow-nt, or rainbow-w95 as subscribers so they distribute to the appropriate set of selected NetWare clients:

• Subscriber rainbow-all must be replaced with red, orange, yellow, and green.
• Subscriber rainbow-nt must be replaced with red and orange.
• Subscriber rainbow-w95 must be replaced with yellow and green.


Two different approaches accomplish this goal:

• Replace the single NetWare managed site subscriber with the set of endpoints that corresponds to the selected clients in the NetWare managed site. An example of this approach is described in “Replacing a NetWare managed site with endpoint subscribers”.
• Create a new profile manager whose subscribers are the endpoints that correspond to the selected clients in the NetWare managed site. Subscribe the new profile manager to any profile manager that had the original NetWare managed site as a subscriber. An example of this approach is described in “Replacing a NetWare managed site with a profile manager”. An advantage of this approach is that it identifies the endpoints as a specific subset of clients of the NetWare server, which may be important in your Tivoli environment.

The other subscribers in the existing profile managers may influence which approach you choose. If the existing profile manager already contains other endpoint subscribers (which means it is in dataless mode), you cannot subscribe a profile manager to it. If the existing profile manager contains other profile managers as subscribers (which means it is in database mode), you cannot subscribe the endpoints directly.

Note: Do not distribute the same profile to a single system using both the NetWare managed site and the endpoint. This can happen if another NetWare managed site distributes to the same Tivoli NetWare Repeater as the original NetWare managed site.
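The choice between the two approaches follows mechanically from the kinds of subscribers a profile manager already has. The Python sketch below restates that rule; it does not query a Tivoli region. The function name and the kind labels ("endpoint", "profile_manager", "managed_node") are hypothetical names used only for this illustration.

```python
def replacement_approaches(other_subscriber_kinds):
    """Return the approaches allowed by the existing subscribers.

    other_subscriber_kinds lists the kinds of the subscribers that stay in
    the profile manager, excluding the NetWare managed site being replaced.
    """
    kinds = set(other_subscriber_kinds)
    if "endpoint" in kinds:
        # Dataless mode: another profile manager cannot be subscribed.
        return ["endpoints"]
    if "profile_manager" in kinds:
        # Database mode: endpoints cannot be subscribed directly.
        return ["profile_manager"]
    # Neither restriction applies, so either approach can be used.
    return ["endpoints", "profile_manager"]


print(replacement_approaches(["profile_manager"]))
print(replacement_approaches(["managed_node", "managed_node"]))
```

When both approaches are possible, the remaining question is whether the identity of the endpoints as clients of one NetWare server matters to your management practices.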

Replacing a NetWare managed site with endpoint subscribers

Suppose you have the following profile manager (Figure B-2).

Figure B-2 Replacing a NetWare managed site with endpoint subscribers 1/2 (profile manager Deploy; profile File Pack1; subscribers NT1, NT2, and rainbow-NT)

Profile manager Deploy contains a Tivoli Software Distribution file package that is distributed to Windows NT servers NT1 and NT2, as well as to the NetWare managed site rainbow-nt (whose selected NetWare clients are red and orange). Deploy operates in database mode. If the identity of red and orange as clients of the NetWare server rainbow is not critical to your management practices, you can replace the subscriber rainbow-nt with the endpoints corresponding to the selected NetWare clients. After installing an endpoint on the NetWare clients red and orange, enter these commands to correct the subscribers of Deploy:

wunsub @ProfileManager:Deploy @NetWareManagedSite:rainbow-nt
wsetpm -d @ProfileManager:Deploy
wsub @ProfileManager:Deploy @Endpoint:red @Endpoint:orange

Profile manager Deploy must be converted to dataless mode to contain endpoint subscribers. The resulting profile manager is illustrated by Figure B-3.

Figure B-3 Replacing a NetWare managed site with endpoint subscribers 2/2 (profile manager Deploy; profile File Pack1; subscribers NT1, NT2, red, and orange)

Replacing a NetWare managed site with a profile manager

Suppose you have the following profile manager hierarchy in Figure B-4.

Figure B-4 Replacing a NetWare managed site with a profile manager 1/2 (profile manager SoftDist; profile FixPack2; subscribers: NetWare managed site rainbow-w95 with NetWare clients yellow and green, and dataless profile manager W95-eps with endpoints Bonaire and Aruba)


Profile manager SoftDist contains a Tivoli Software Distribution file package that is distributed to the NetWare managed site rainbow-w95 (whose selected NetWare clients are yellow and green) and to profile manager W95-eps, which contains the Windows 95 endpoints Bonaire and Aruba. SoftDist operates in database mode; W95-eps is in dataless mode.

In this example, you cannot replace SoftDist subscriber rainbow-w95 with endpoints yellow and green, because SoftDist is in database mode. Also, suppose you prefer to preserve the identity of yellow and green as clients of the NetWare server rainbow (even though you will not use rainbow to stage your distributions). To implement this, create the dataless profile manager rainbow-w95-eps (assume it belongs in policy region Development) with endpoints yellow and green as subscribers. After installing an endpoint on the NetWare clients yellow and green, enter these commands to correct the subscribers of SoftDist:

wcrtprfmgr @PolicyRegion:Development rainbow-w95-eps
wsetpm -d @ProfileManager:rainbow-w95-eps
wsub @ProfileManager:rainbow-w95-eps @Endpoint:yellow \
@Endpoint:green
wunsub @ProfileManager:SoftDist @NetWareManagedSite:rainbow-w95
wsub @ProfileManager:SoftDist @ProfileManager:rainbow-w95-eps
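When several NetWare managed sites have to be migrated this way, the command sequence can be generated rather than typed. The Python sketch below only builds the command strings for review; it does not run them. It uses the same five commands, in the same order, as the sequence in this example, and the function and parameter names are hypothetical.

```python
def profile_manager_commands(policy_region, new_pm, endpoints,
                             parent_pm, managed_site):
    """Build the commands that replace a managed site with a profile manager."""
    subscribers = " ".join("@Endpoint:" + name for name in endpoints)
    return [
        # Create the new profile manager and switch it to dataless mode.
        "wcrtprfmgr @PolicyRegion:%s %s" % (policy_region, new_pm),
        "wsetpm -d @ProfileManager:%s" % new_pm,
        # Subscribe the endpoints to the new profile manager.
        "wsub @ProfileManager:%s %s" % (new_pm, subscribers),
        # Swap the managed site for the new profile manager in the parent.
        "wunsub @ProfileManager:%s @NetWareManagedSite:%s"
        % (parent_pm, managed_site),
        "wsub @ProfileManager:%s @ProfileManager:%s" % (parent_pm, new_pm),
    ]


commands = profile_manager_commands(
    "Development", "rainbow-w95-eps", ["yellow", "green"],
    "SoftDist", "rainbow-w95",
)
print("\n".join(commands))
```

Reviewing the generated list before running it makes it easier to confirm that the same profile manager name is used in every command.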

The resulting profile manager hierarchy is illustrated by Figure B-5.

Figure B-5 Replacing a NetWare managed site with a profile manager 2/2 (profile manager SoftDist; profile FixPack2; subscribers: dataless profile manager rainbow-w95-eps with endpoints yellow and green, and dataless profile manager W95-eps with endpoints Bonaire and Aruba)


Special considerations for NetWare managed site subscribers

If administrators for your installation change the list of selected clients for a NetWare managed site immediately prior to distribution, converting these NetWare managed sites to the gateway and endpoint distribution method can be problematic, because you will not know which endpoints to subscribe in place of the NetWare managed site.

A related problem can occur if a NetWare managed site has more than 100 selected clients. A NetWare managed site distributes to a maximum of 100 clients; if more than 100 clients are selected, a subset of 100 receives the distribution. You must decide whether to subscribe the endpoints corresponding to all of the selected clients, or only those that receive the distribution.
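Before a migration, it can help to list the NetWare managed sites that exceed the 100-client limit, because those are the sites where this decision has to be made. The Python sketch below assumes that you have already collected the selected clients for each site into a dictionary by some other means; the function name and the site names in the usage example are hypothetical.

```python
MAX_CLIENTS_PER_SITE = 100  # distribution limit stated above


def sites_over_limit(selected_clients_by_site):
    """Map each oversized site to the number of clients beyond the limit."""
    return {
        site: len(clients) - MAX_CLIENTS_PER_SITE
        for site, clients in selected_clients_by_site.items()
        if len(clients) > MAX_CLIENTS_PER_SITE
    }


inventory = {
    "site-small": ["pc%03d" % n for n in range(40)],
    "site-large": ["pc%03d" % n for n in range(130)],
}
print(sites_over_limit(inventory))
```

The sketch cannot tell which 100 clients actually receive a distribution; it only identifies the sites where more clients are selected than can be served.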

Running the NetWare PC Agent and lcfd concurrently

An endpoint and a PC agent can coexist on a single system. In other words, you can add an endpoint to a system that uses the PC agent to communicate with the Tivoli NetWare Repeater on a NetWare server, without changing existing distributions through a NetWare managed site. Do not distribute the same profile to a system using both a NetWare managed site and an endpoint.

Network limitations

In general, Tivoli Management Framework uses Transmission Control Protocol/Internet Protocol (TCP/IP) for communication, although it does support Internetwork Packet Exchange/Sequenced Packet Exchange (IPX/SPX) communication on PCs connected to NetWare gateways. The NetWare gateways communicate with the Tivoli management region server using TCP/IP. Thus, the amount of network traffic that can be sustained by the networking interfaces on a Tivoli server represents a practical limit to the number of Tivoli clients that can be reasonably serviced by a single Tivoli server. Consider the following ways to improve the Tivoli server networking performance:

• Introduce high-speed network adapters on the Tivoli server.
• Design the network topology so that unnecessary traffic does not flow on the Tivoli server network segment.
• Use local file system storage for Tivoli binaries and libraries.
• Limit the network traffic from the Tivoli server to other clients and vice versa.


Considerations for NetWare gateways

Tivoli Management Framework provides stand-alone gateways, that is, gateways that cannot be considered managed nodes. These gateways are available for NetWare operating systems. In general, they provide the same functionality as other gateways, such as:

• Endpoint login and communication
• Invocation of methods to be run on the endpoint and gateway

This section describes some of the special considerations of these gateways.

NetWare gateways

The following subsections describe the gateway proxy for NetWare and IPX support.

Gateway proxies for NetWare

NetWare gateways require a gateway proxy to run endpoint policy scripts. Any managed node can act as a gateway proxy. Using the wgateway command, you define a list of managed nodes that will act as a gateway proxy. The NetWare gateway contacts each of the managed nodes in the list, one at a time, until it finds a managed node available to run the script. If all the managed nodes in the list are down or unavailable, the NetWare gateway is unable to run the endpoint policy scripts. By default, no gateway proxies are defined. See the wgateway command in the Tivoli Framework 3.7.1 Reference Manual, SC31-8434 for information about modifying gateway proxies.
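The proxy selection described above is a simple ordered failover. The Python sketch below models that behavior to make the consequences of an empty or fully unavailable list explicit. It is a model of the described logic, not gateway code; the function name, the is_available callback, and the node names are hypothetical.

```python
def pick_gateway_proxy(proxy_list, is_available):
    """Return the first available managed node in list order, or None."""
    for node in proxy_list:
        if is_available(node):
            return node
    # Empty list (the default) or every node down: no script can be run.
    return None


# Hypothetical node names and availability, for illustration only.
up_nodes = {"node-b", "node-c"}
print(pick_gateway_proxy(["node-a", "node-b", "node-c"], up_nodes.__contains__))
print(pick_gateway_proxy([], up_nodes.__contains__))
```

Because the list is tried in order, placing the most reliable managed node first shortens the time the gateway spends searching for a proxy.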

IPX support

The NetWare gateway allows you to connect to endpoints with Transmission Control Protocol/Internet Protocol (TCP/IP) or IPX. To connect to an endpoint in IPX, you can use the IPX address, name resolution, or IPX broadcast. IPX/SPX name resolution allows login by the server name of the NetWare gateway. Thus, the endpoint does not have to know the IPX address of the NetWare gateway for login. It is important to note that this name resolution uses RIP packets. Refer to the Novell documentation for more information about routing RIP packets.

Additional considerations for NetWare gateways

The following are additional considerations specific to gateways on NetWare:

• Enable long name support on the NetWare system.
• Install the gateway on any volume (SYS, VOL1, and so on). The gateway process uses SYS:public\Tivoli as the working directory.
• Ensure that the NetWare system is able to ping the Tivoli server.

Endpoint considerations
Endpoints do not maintain the Tivoli object database. This keeps the amount of resources required on the endpoint to a minimum, while simplifying the management of the database in the managed environment. To select computer systems for endpoints, consider the following:
򐂰 Ensure that enough disk space is available on the endpoint for the Tivoli method cache. The endpoint has a method cache on its disk. By default, the method cache size is up to 20 MB, and the system drive must have sufficient space to accommodate this. The space actually used depends on which Tivoli applications the endpoint runs.
򐂰 Ensure that the computer system has at least 1 MB of memory (resident set size). The endpoint uses approximately 1 MB of memory.
򐂰 Select any administrator to install the endpoint. Installing the endpoint does not require the Tivoli administrator authorization role that is required for the installation of other Tivoli Management Framework components.
򐂰 Specify the login interfaces list as the installation option. If multiple gateways are on the same subnet and the endpoint is allowed to broadcast for the initial login, several of the gateways may receive the login request and process it. The login interfaces list directs the endpoint to contact a specific set of gateways instead of broadcasting to find one. The broadcast operation also has an impact on the performance of the network.
򐂰 Use the endpoint's default port (9495) for all endpoint installations. Gateways by default are configured to use port 9494. This allows both gateway software services and endpoint software to run on the same system.
򐂰 As with Windows NT and Windows 2000 managed nodes, format the system drive with NTFS. If the system has no NTFS drives, create an NTFS drive for the installation of the endpoint.
򐂰 Install the Tivoli applications that will be used on endpoints on the gateway that the endpoint logs in to. Install the same applications on all the gateways. This prevents problems that could occur if an endpoint logs in to a gateway not supporting all the applications.
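The disk space consideration above can be checked before an endpoint is installed. The following Python sketch is illustrative only and is not part of the Tivoli tooling: the function name, the path argument, and the use of 20 MB as the threshold (the default upper bound for the method cache quoted above) are assumptions.

```python
import shutil

# Assumption: 20 MB is the default upper bound of the method cache,
# as stated in the endpoint considerations above.
METHOD_CACHE_BYTES = 20 * 1024 * 1024


def has_room_for_method_cache(path, required_bytes=METHOD_CACHE_BYTES):
    """Return True if the file system holding `path` has at least
    `required_bytes` of free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes


if __name__ == "__main__":
    # "." is a placeholder; point it at the drive that will hold the endpoint.
    print(has_room_for_method_cache("."))
```

The check only looks at free space at the moment it runs, so it is a planning aid rather than a guarantee.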

Communication protocols
Tivoli Management Framework-based applications operate using primarily TCP/IP. As a result, all Tivoli management region servers, managed nodes, and gateways in the managed environment must have TCP/IP installed, configured, and operational on their systems. Endpoints can use either TCP/IP or Internetwork Packet Exchange/Sequenced Packet Exchange (IPX/SPX). TCP/IP is supported for all endpoint platforms; IPX/SPX is supported only for endpoints on Windows 2000, Windows NT, Windows 98, Windows 95, and NetWare.

NetWare gateways directory structure
When you install a gateway on a NetWare system, the installation process creates several directories that contain the following files:
򐂰 Binaries
򐂰 Message catalogs
򐂰 Databases

These files are installed in the directories shown in the following directory structure:
򐂰 \Tivoli
򐂰 \Tivoli\db\host.db
򐂰 \Tivoli\bin\nwr-ix86
򐂰 \Tivoli\bin\generic
򐂰 \Tivoli\bin\lcf_bundle
򐂰 \Tivoli\bin\lcf_bundle.40
򐂰 \Tivoli\msg_cat

Where:

\Tivoli\db
   Contains the database files for the managed nodes.

\Tivoli\bin
   Contains the following Tivoli binary directories:

   nwr-ix86
      Contains the Tivoli binaries used by NetWare managed nodes. The content of this directory depends on the Tivoli Enterprise products that you have installed.

   generic
      Contains Web pages and language code sets.

   lcf_bundle
      Contains the endpoint binaries required for each supported platform running Tivoli Enterprise software released prior to Tivoli Management Framework Version 3.6. These binaries support product compatibility with Tivoli Management Framework Version 3.7.1.

   lcf_bundle.40
      Contains the endpoint binaries required for each supported platform running Tivoli Management Framework Version 3.7.1.

\Tivoli\msg_cat
   Contains the Tivoli message catalogs.

Appendix B. Tivoli/NetWare whitepaper


Troubleshooting
Abend recovery enhancements in NetWare 4.11 and above include the following features and capabilities:
򐂰 Additional abend information is displayed on the server console.
򐂰 When an abend occurs, the header Additional Information: is displayed on the server console, followed by the probable cause of the abend.

When an abend occurs, the information about the abend is automatically written to a text file called Abend.LOG. The abend information is formatted and written to appear like a mini core dump. The Abend.LOG file is initially created on the DOS partition of the server. However, the next time the SYS: volume is mounted, the information is written to the SYS:\SYSTEM directory and the Abend.LOG file on the DOS partition is removed.

The SET AUTO RESTART AFTER ABEND parameter enables the NetWare server to automatically recover from an abend. According to the selected value, the server will recover in a variety of ways. The three values you can set are 0, 1, or 2. The default value is 1.
򐂰 If the parameter is set to 0, the server will not try to recover from the abend.
򐂰 If the parameter is set to 1, the server will attempt to recover from the abend.
򐂰 If the parameter is set to 2, the server will attempt to recover from the abend.

Questions related to Tivoli/NetWare troubleshooting
򐂰 What is a core dump?
   "Core dump," "image dump," "memory dump," or just "image" or "dump" are all terms used to describe the process of copying the contents of your server's memory to a file at a given point in time. The dump is then used by Novell Technical Support to help troubleshoot various problems that may be exhibited at the server.
򐂰 Who can read the core dump?
   Novell Technical Support engineers.
򐂰 A core dump is a lot of trouble. What should I do before I go to this extent?
   Yes, it is a lot of trouble to take a core dump, and usually it is a lot more trouble to read one. To avoid troubleshooting problems that are already fixed, Novell Technical Support requires that your server be running the current released OS patches and have current LAN and disk drivers loaded. Current patches and drivers fix many issues. If you are running current patches and drivers and still have a problem, it may be time to make a core dump.


   The file TABNDx.EXE (where x indicates the current revision of the file) can be downloaded from the following Web site:
   http://support.Novell.com
   This file contains troubleshooting tools and documents that may be helpful in troubleshooting your server abend or hang condition.
򐂰 How do I get a core dump from my NetWare server?
   There are several methods that can be used to obtain a core dump (or memory image) from a NetWare server. The following is an explanation of the methods available:

   – Method # 1: Dump to Floppy
     This is the simplest method, but it may not be practical, depending on how much server RAM you have. When the server abends, you are asked Send coredump to HardDisk (Answer no to send coredump to Floppy) (y/n)? Respond with n and you will be prompted to insert a formatted disk in drive A:\. A core dump cannot be sent to any floppy drive other than A:\. You will be prompted for floppies until the entire contents of your server memory have been copied. You will not see any files on the floppy disk after the copy; however, the "bytes free" will indicate that there is data on the disk.

   – Method # 2: Dump to Hard Drive
     This method is much faster than dumping the image to floppy. In this case, the core dump is copied to the C:\coredump.img file. When you are prompted with the message Send coredump to HardDisk (Answer no to send coredump to Floppy) (y/n)?, y is the default. Just press Enter and the image will begin copying. You must have enough free space on the DOS partition to hold the entire server memory. If you have not planned this extra space into your C:\ partition, you may be able to add an inexpensive IDE drive and use the entire drive as a DOS partition. After the image is copied to the drive, the server can be brought back up and users can log in. You will then need imgcopy.nlm. This NLM is contained in the download file TABNDx.EXE (where x is the current revision of the file). When imgcopy.nlm is loaded, it allows you to copy the coredump.img file from the C:\ partition to the NetWare partition. After the file is on the NetWare partition, it can be sent to Novell by FTP, or it can be backed up on tape and the tape can be sent to Novell. Before you do this, make sure that Novell has the software needed to restore the file.


     To force a core dump manually, launch the debugger by holding down the left Shift key, the right Shift key, and the Alt key simultaneously, and then pressing the Esc key. Here are a few commands to be used while in the debugger:
     .c   Force a core dump.
     q    Quit to DOS.
     g    Go back to the point where you came into the debugger.
     h    Help.

     When you enter .c at the # prompt, you will be prompted with the message Send coredump to HardDisk (Answer no to send coredump to Floppy) (y/n)? You may then follow the instructions given for either method #1 or method #2.

   – Method # 3: Dump to Server
     This is the fastest way to write the dump. It is called the network method. The problem server must have an additional LAN card installed (it must be Ethernet). This card enables the server to have a client connection to a second server. This second server becomes the destination for the image file. When the problem server abends, the core dump is sent across the LAN connection to the destination server. This is how to set up for this type of core dump:
     i. Shut down the problem server and power it off.
     ii. Put in an additional LAN card (Ethernet only).
     iii. Bring the server back up to a DOS prompt.
     iv. Set up an nwclient directory, and load the client drivers.
     v. Log in to the server that will be the destination server, and map a drive to the place you want the dump to be sent to (for example, f:\dumpdir).
     vi. Go to the destination server and write down your connection number (get this from monitor). Also, write down the drive mapping (that is, F:\dumpdir).
     vii. Bring up the problem server.
     viii. Test the connection by going into the debugger and initiating a core dump. Press the following keys all at once: Alt, left Shift, right Shift, and Esc.
     ix. At the # prompt, type .c .
     x. Follow the prompts with a Y to dump to a hard drive. When you are prompted for a path, enter f:\dumpdir\coredump.img (or the mapping you wrote down, if it is different). Press Enter. The image should start copying.


     xi. If this works, go on to the next step. If it does not, here are some possible reasons:
         Hardware configuration problem on the additional LAN card.
         Using Netx with some versions of LAN drivers produces bad packets.
         Other normal client-to-server troubleshooting issues, for example, connection to the second server, routing, or network.
     xii. If you are able to duplicate the abend, take a core dump. Most likely, however, you will have to wait until the next time it abends on its own. Your default connection time to the destination server will be 15 minutes. To hold the connection longer, you have two choices: increase the watchdog time-out parameters, or load netalive.nlm (included in the TABND2A.EXE file). You load netalive with two parameters: the name of the problem server and the name of the destination server. For more information, see the Readme document for netalive at the Novell support Web site:
         http://support.Novell.com
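The choice between the dump methods above comes down to simple arithmetic on the amount of server memory. The following Python sketch is a planning aid under stated assumptions, not a Novell or Tivoli tool: it assumes that the dump is as large as the installed server RAM, that floppy dumps go to 1.44 MB diskettes (1,474,560 bytes each) with the full capacity usable, and all function names are hypothetical.

```python
# Assumption: a 1.44 MB 3.5-inch diskette holds 1,474,560 bytes.
FLOPPY_BYTES = 1_474_560


def floppies_needed(server_ram_bytes):
    """Number of diskettes needed if the dump is as large as server RAM."""
    # Integer ceiling division, so no floating-point rounding is involved.
    return -(-server_ram_bytes // FLOPPY_BYTES)


def dump_fits_on_dos_partition(server_ram_bytes, dos_free_bytes):
    """True if the DOS partition can hold an image of the entire memory."""
    return dos_free_bytes >= server_ram_bytes


if __name__ == "__main__":
    ram = 256 * 1024 * 1024  # example value: a server with 256 MB of RAM
    print(floppies_needed(ram))
    print(dump_fits_on_dos_partition(ram, 100 * 1024 * 1024))
```

For a server with 256 MB of RAM, the first function returns 183 diskettes, which shows why method #1 is rarely practical and why method #2 needs the DOS partition to be sized with the server memory in mind.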

Before sending the dump to IBM Support, do the following:
1. For troubleshooting purposes, it is highly recommended that you collect a NetWare Server Configuration report (config.txt) before reporting a problem to IBM Support. To generate a config.txt report, enter the following command at the server console:
   Load config /ads
   Config.txt will be generated on the SYS volume of the NetWare server.
2. If you believe that Tivoli software is causing the NetWare server to reboot (after an abend), turn off the automatic restart after an abend on the server so that a memory/core dump can be collected. By default, the automatic restart after an abend feature is turned ON.
3. To display the NetWare Loadable Modules (NLMs) currently running on the NetWare server, type Modules at the console (a list, one full screen at a time, will be displayed). To display information about only a particular module currently loaded at the server, type Module followed by the module name.
4. Use Novell's Conlog utility to capture the console messages. To capture the NetWare server console messages, type the following command on the server console:
   Load conlog
   If it is already running, you will get a confirmation message.
5. Recreate the problem, wait for the console message to appear, and then type:
   Unload conlog
   This will stop capturing the messages. Look for the console.log file in the ETC directory located on the SYS volume of the server.
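Once console.log has been copied off the server, the lines of interest can be pulled out before the file is sent on. The following Python sketch is only an illustration: the function name is hypothetical, the sample text is invented for the example and is not real NetWare console output, and searching for the word "abend" is an assumption about what is worth looking for.

```python
def find_console_lines(log_text, keyword="abend"):
    """Return (line_number, line) pairs for lines that contain `keyword`,
    compared without regard to case."""
    needle = keyword.lower()
    matches = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if needle in line.lower():
            matches.append((number, line))
    return matches


# Invented sample text, used only to exercise the function.
SAMPLE_LOG = (
    "Loading module EXAMPLE.NLM\n"
    "Abend: sample text for illustration\n"
    "Module EXAMPLE.NLM unloaded\n"
)

if __name__ == "__main__":
    for number, line in find_console_lines(SAMPLE_LOG):
        print(number, line)
```

To use it on a real capture, read the copied console.log into a string and pass it as log_text.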


Appendix C. Scripts
This appendix contains scripts written and/or collected by the project team, their colleagues, or other world-wide Tivoli users, who wish to share their experience and knowledge with the readers of this redbook. The scripts are mostly Korn shell scripts, but some are Perl scripts as well. They are provided on an as-is basis, with no guarantee or support.

© Copyright IBM Corp. 2003. All rights reserved.


log_it.pm
This script is used as a common subroutine to be called by various Perl scripts to provide a common logging standard:

#########################################################
#                                                       #
# &logit (Severity, Data)                               #
# sub to write to the log file with a time stamp...     #
#                                                       #
# This is a common subroutine to be called by various   #
# perl scripts to provide a common logging standard.    #
# It is passed two variables, Severity and Data. Where  #
# the severity is either: fatal, critical, minor,       #
# warning, harmless, unknown, info, or debug.           #
# Data is the verbiage to write to the log.             #
# This sub also relies on variables to be set in the    #
# main script. They are as follows:                     #
#                                                       #
# %LOG hash with keys matching the severity levels:     #
# fatal, critical, minor, warning, harmless, etc...     #
# The value for each key determines whether or not      #
# logging is enabled for that level.                    #
# i.e. $LOG{fatal}=1                                    #
#                                                       #
# $LOGFILE scalar variable which is the location of     #
# the log file.                                         #
#                                                       #
# $LOGSOURCE scalar is the Primary source (i.e. gw_pol  #
# for gateway policy.)                                  #
#                                                       #
# $LOGSUBSOURCE scalar is the Secondary source (i.e.    #
# after for after_install_policy)                       #
#                                                       #
# $TMR scalar is the TMR Short Name (i.e. prrm)         #
#                                                       #
# $LOGUNQID is a unique identifier for the particular   #
# instance of the script. This can be either a PID      #
# or something easier to read like the endpoint label.  #
#                                                       #
#########################################################
sub Logit {
    local($SEV,$DATA) = @_;
    if ($LOG{$SEV}) {
        ($MIN,$HRS,$DAY,$MONTH,$YEAR)=(localtime)[1..6];
        $MONTH+=1;
        if ($MONTH
[...]
OID:LABEL:DESCRIPTION:REVISION
    $OID = $_[0];
    $OidInfo = "";
    $Revision = `idlcall $OID _get_revision`;
    if ($Revision =~ m/SYSTEM/) { $Revision = "" ; };
    $OidInfo = join(",", $OID, `idlcall $OID _get_label`,
                    `idlcall $OID _get_description`, $Revision );
    $OidInfo =~ tr/\"//d;    # Remove quotes
    return $OidInfo
}

sub PrintProduct {
    # Print Product Information
    # Input STRING in format: OID:LABEL:DESCRIPTION:REVISION
    ($OID, $LABEL, $DESCRIPTION, $REVISION) = split (/,/,$_[0]);
    ($OID,$garbage) = split(/#/,$OID);    # remove Object Information...
    print "$LABEL -- $OID \n";
    print " $DESCRIPTION, $REVISION\n" if $opt_v;
}

sub PrintPatch {
    # Print Patch Information
    # INPUT STRING in format: OID:LABEL:DESCRIPTION:
    ($OID,$LABEL,$DESCRIPTION) = split (/,/, $_[0]);
    ($OID, $garbage) = split(/#/,$OID);
    print " --> $LABEL -- $OID \n";
    print " $DESCRIPTION\n" if $opt_v;
}

sub PrintAliases {
    # Gets and Prints Aliases of PATCH/PRODUCT OID's
    # INPUT: OID
    ($OID,$LABEL,$remainder) = split(/,/, $_[0],3);
    $TNR = `wlookup NameRegistry`;
    chop($TNR);
    if ($OID =~ /Engine/) { $type = "distinguished" }
    elsif ($OID =~ /Product/) { $type = "ProductInfo" }
    elsif ($OID =~ /Patch/ ) { $type = "PatchInfo" }
    $aliases = ` idlcall $TNR lookup '"$type" "$LABEL"' | idlarg 3 | idlarg 2 `;
    $aliases =~ tr/\}\{//d;
    @Alias = split(/\s/, $aliases );
    shift @Alias;
    if (shift @Alias != 0) {
        print " ***************** Aliases *************** \n";
        $i = 0;
        for ( $i = 0; $i < $#Alias; ) {
            printf " %-30s %-30s\n", $Alias[$i++], $Alias[$i++];
        }
    }
}

sub Set_Env {
    # Source the Tivoli environment and copy it into %ENV
    $TivEnv = "/etc/Tivoli/setup_env.sh";
    @EnvVal = `. $TivEnv; env`;
    foreach $EnvValu (@EnvVal) {
        chop($EnvValu);
        # Limit of 2 keeps values that themselves contain "=" intact
        ($Key, $Val) = split(/=/, $EnvValu, 2);
        $ENV{$Key} = $Val;
    }
}

sub print_help {
    print "usage: $0 [-va] -n MNname \n";
    print "\n";
    print " Options: \n";
    print " -v Verbose mode, show Product Descriptions \n";
    print " -a Show product aliases from NameRegistry \n";
    print " -n REQUIRED to pass ManagedNode name that \n";
    print " you want to see what products/patches \n";
    print " are installed \n";
}


installed.pl
This script queries the TMR for all installed products and returns a selection list. For the product selected, it displays all installed nodes and the patch level for each:

#!/usr/bin/perl
use Getopt::Std;
getopts('ah');
&Set_Env();

# check for usage...
if( $opt_h ) {
    &print_help;
    exit(0);
}

$TNR=`wlookup NameRegistry`;
chomp $TNR;

# Build a hash of product description -> product OID
%Patches=();
for (`wls -l /Library/ProductInfo`) {
    ($OID)=split(/\s/,$_);
    $PatchDesc = `idlcall $OID _get_description`;
    $PatchDesc =~ s/\"//g;
    $Patches{$PatchDesc} = $OID;
}

# Present the numbered selection list
$cnt=1;
foreach $Patch (sort keys %Patches) {
    push(@ReturnList,$Patch);
    print"$cnt. \t$Patch\n";
    $cnt++;
}
print "0. \tQuit\n";
print "\nSelect The product number you wish to query: ";
$Select=<STDIN>;
print"\n\n";
if($Select == 0){exit 0;}

# Look up the OID of the selected product before printing it
$RetOID=$Patches{$ReturnList[$Select-1]};
print "$ReturnList[$Select-1]\t$RetOID\n";
print"-------------------------------------------\n";

# Generate a list of all Patches installed on the TMR for the given product
# @PatchOIDS is the array of Patch OIDS generated.
@PatchOIDS=();
$InstalledPatches = `idlcall $RetOID _get_patches`;
$InstalledPatches =~ tr/\{\}//d;
$InstalledPatches =~ s/^\s+//;    # strip leading white space
@PatchOIDS = split(/\s/, $InstalledPatches) ;
shift @PatchOIDS;

for $ManagedNode (`wlookup -Lar ManagedNode`) {
    $ManagedNode =~ s/\s//g;
    if (&Check_Install($ManagedNode, $RetOID)) {
        print"MN : $ManagedNode\n";
        for $PatchOID (@PatchOIDS) {
            $PatchLabel=`idlcall $PatchOID _get_label`;
            $PatchLabel =~ s/\"//g;
            if(&Check_Install($ManagedNode, $PatchOID)){
                print"\t$PatchLabel\n";
                if($opt_a){
                    for(&GetAliases($PatchOID)){
                        print"\t-\t$_\n";
                    }
                }
            }
        }
    }
}

sub Check_Install {
    my ($ManagedNode, $OID)= @_;
    $InstallReturn=`idlcall $OID get_host_locations \'{ 1 \"$ManagedNode\" }\' | idlarg 1`;
    chomp($InstallReturn);    # drop a trailing newline before comparing
    if($InstallReturn ne "0") {
        return 1;
    } else {
        return 0;
    }
}

sub GetAliases {
    my ($OID) = @_;
    $Label=`idlcall $OID _get_label`;
    $Aliases = ` idlcall $TNR lookup '"PatchInfo" $Label' | idlarg 3 | idlarg 2 `;
    $Aliases =~ tr/\}\{//d;
    $Aliases =~ s/\s//g;
    @Alias = split(/[\"]+/,$Aliases);
    shift @Alias;
    return @Alias;
}

sub Set_Env {
    # Source the Tivoli environment and copy it into %ENV
    $TivEnv = "/etc/Tivoli/setup_env.sh";
    @EnvVal = `. $TivEnv; env`;
    foreach $EnvValu (@EnvVal) {
        chop($EnvValu);
        # Limit of 2 keeps values that themselves contain "=" intact
        ($Key, $Val) = split(/=/, $EnvValu, 2);
        $ENV{$Key} = $Val;
    }
}

sub print_help {
    print " usage: $0 [-ah] \n";
    print " \n";
    print " Options: \n";
    print " -a Show product aliases from NameRegistry \n";
    print " -h Show this help message \n";
    print " \n";
    print " Description: \n";
    print " This script will query the TMR for all installed products and\n";
    print " return a selection list. For the product selected, it will\n";
    print " display all installed nodes and the patch level for each.\n";
}
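The Set_Env subroutine used by these scripts sources setup_env.sh and copies the resulting env output into the environment of the script. For readers who want to see that parsing step in isolation, the following Python sketch shows one way to turn KEY=VALUE lines into a dictionary. It is an illustration, not a translation supplied by the script authors: the function name is hypothetical, the sample lines are invented, and it assumes one variable per line (values that span several lines are not handled).

```python
def parse_env_lines(lines):
    """Build a dict from lines of the form KEY=VALUE.

    Only the first "=" separates the key from the value, so values that
    themselves contain "=" are kept whole. Lines without "=" are skipped.
    """
    env = {}
    for raw in lines:
        line = raw.rstrip("\n")
        key, separator, value = line.partition("=")
        if not separator or not key:
            continue
        env[key] = value
    return env


if __name__ == "__main__":
    # Invented sample input, in the shape printed by the env command.
    sample = ["EXAMPLE_DIR=/example/path\n", "EXAMPLE_OPTS=a=1,b=2\n"]
    print(parse_env_lines(sample))
```

Splitting on the first "=" only is the design choice that matters here, because environment values can legitimately contain further "=" characters.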


Appendix D. Additional material
This redbook refers to additional material that can be downloaded from the Internet as described below.

Locating the Web material
The Web material associated with this redbook is available in softcopy on the Internet from the IBM Redbooks Web server. Point your Web browser to:
ftp://www.redbooks.ibm.com/redbooks/SG246614

Alternatively, you can go to the IBM Redbooks Web site at: ibm.com/redbooks

Select the Additional materials and open the directory that corresponds with the redbook form number, SG246614.

Using the Web material
The additional Web material that accompanies this redbook includes the following files:

File name        Description
SG246614.zip     Zipped Code Samples

System requirements for downloading the Web material
The following system configuration is recommended:

Hard disk space:     1 MB minimum
Operating System:    Windows/UNIX

How to use the Web material
Create a subdirectory (folder) on your workstation, and unzip the contents of the Web material zip file into this folder.


Abbreviations and acronyms

ACF       Adapter Configuration Facility
ACL       Access Control List
ADE       Advanced Development Environment
AEF       Application Extension Facility
ALI       Authentication, Location and Inheritance
ANSI      American National Standards Institute
AP        Activity Planner
APE       Activity Plan Editor
API       Application Programming Interface
APM       Activity Plan Monitor
AS/400    Application System/400
BARC      Before, After, Remove & Configuration (script)
BDC       Backup Domain Controller
BDT       Bulk Data Transfer
BO        Behavior Object
BOA       Basic Object Adapter
CCMS      Configuration and Change Management System
CIM       Common Information Model
CLI       Command Line Interface
CM        Change Manager
CORBA     Common Object Request Broker Architecture
CTOC      Collection Table of Contents
DBA       Database Administrator
DBDIR     DataBase Directory
DHCP      Dynamic Host Configuration Protocol
DII       Dynamic Invocation Interface
DLL       Dynamic Link Library
DM        Domain Manager
DMI       Desktop Management Interface
DMTF      Distributed Management Task Force
DO        Display Object
DPO       Default Policy Object
EP        Endpoint
FAT       File Allocation Table
FFDC      First Failure Data Capture
FQHN      Fully Qualified Host Name
FSFI      Free Space Fragmentation Index
FTA       Fault Tolerant Agent
GEM       Global Enterprise Manager
GUI       Graphical User Interface
HDB       Historical Database
IANA      Internet Assigned Numbers Authority
IBM       International Business Machines Corporation
IDL       Interface Definition Language
IOM       Inter-Object Messaging
IPC       Inter-Process Communication
IPX/SPX   Internetwork Packet Exchange/Sequenced Packet Exchange
ISMP      InstallShield MultiPlatform
ITCM      IBM Tivoli Configuration Manager
ITM       IBM Tivoli Monitoring
ITSO      International Technical Support Organization
JRE       Java Runtime Environment
JRIM      Java RIM
LCF       Lightweight Client Framework
LDAP      Lightweight Directory Access Protocol
LQM       Local Queue Manager
LSA       Local Security Authority
MCSL      Monitoring Capabilities Subscription Language
MDist1    Multiplexed Distribution1
MDist2    Multiplexed Distribution2
MDM       Master Domain Manager
MIF       Management Information File
NAT       Network Address Translation
NIS       Network Information Service
NLS       National Language Support
NQM       Network Queue Manager
NTFS      NT File System
OID       Object ID
OMG       Object Management Group
ORB       Object Request Broker
PD/PSI    Problem Determination/Problem Source Isolation
PDC       Primary Domain Controller
PO        Prototype Object
RACF      Resource Access Control Facility
RCS       Revision Control System
RDBMS     Relational DataBase Management System/Server
RIM       RDBMS Interface Module
RPC       Remote Procedure Call
RPM       Red Hat Package Manager
SAM       Security Account Manager
SCS       Scalable Collection Service
SeOS      Security Operating System
SIS       Software Installation Service
SMB       Server Message Block
SP        Software Package
SPB       Software Package Block
SPD       Software Package Definition
SSL       Secure Sockets Layer
TAP       Tivoli Authentication Package
TCP/IP    Transmission Control Protocol/Internet Protocol
TEC       Tivoli Enterprise Console
TEIDL     Tivoli-Extended IDL
TLL       Task Library Language
TMA       Tivoli Management Agent
TME       Tivoli Management Environment
TMF       Tivoli Management Framework
TMR       Tivoli Management Region or TME 10 Management Region
TNR       Tivoli Name Registry
TRAA      Tivoli Remote Access Account
TRIP      Used to refer to the Tivoli Remote Execution Service
TTL       Time To Live
TWS       Tivoli Workload Scheduler
UID       User Identifier
VPO       Validation Policy Object
WINS      Windows Internet Naming Service
XBO       Extended Behavior Object
XML       Extensible Markup Language


Related publications
The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this redbook.

IBM Redbooks
For information on ordering these publications, see “How to get IBM Redbooks” on page 1092.
򐂰 End-to-End Scheduling with Tivoli Workload Scheduler 8.1, SG24-6022
򐂰 IBM Tivoli Monitoring Version 5.1: Advanced Resource Monitoring, SG24-5519
򐂰 IBM WebSphere Version 4.0 Advanced Edition Handbook, SG24-6176
򐂰 Maintaining Your Tivoli Environment, SG24-5013
򐂰 Tivoli Enterprise Internals and Problem Determination, SG24-2034

Other resources
These publications are also relevant as further information sources:
򐂰 IBM Tivoli Access Manager for Operating Systems Release Notes, GI11-0951
򐂰 IBM Tivoli Business Systems Manager Administrator's Guide Version 2.1, GC32-0799
򐂰 IBM Tivoli Business Systems Manager Installation and Configuration Version 2.1, GC32-0800
򐂰 IBM Tivoli Business Systems Manager Release Notes v2.1, SC23-4841
򐂰 IBM Tivoli Configuration Manager Introduction Version 4.2, GC23-4703
򐂰 IBM Tivoli Configuration Manager Planning and Installation Version 4.2, GC23-4702
򐂰 IBM Tivoli Configuration Manager Reference Manual for Software Distribution Version 4.2, SC23-4712
򐂰 IBM Tivoli Configuration Manager Version 4.2 Release Notes, GI11-0926
򐂰 IBM Tivoli Configuration Manager User's Guide for Deployment Services Version 4.2, SC23-4710
򐂰 IBM Tivoli Enterprise Console Adapters Guide Version 3.8, GC32-0668
򐂰 IBM Tivoli Enterprise Console Installation Guide Version 3.8, GC32-0823
򐂰 IBM Tivoli Enterprise Console Rule Builder's Guide Version 3.8, GC32-0669
򐂰 IBM Tivoli Enterprise Console User's Guide Version 3.8, GC32-0667
򐂰 IBM Tivoli Monitoring User's Guide Version 5.1.1, SH19-4569
򐂰 Tivoli Application Services Manual Volume 2 3.6, SC31-8350
򐂰 Tivoli Enterprise Console Reference Manual Version 3.7, GC32-0666
򐂰 Tivoli Enterprise Data Warehouse Enabling an Application Version 1.1, GC32-0745
򐂰 Tivoli Enterprise Data Warehouse Installing and Configuring Version 1.1, GC32-0744
򐂰 Tivoli Enterprise Data Warehouse Release Notes Version 1.1, GI11-0857
򐂰 Tivoli Framework 3.7.1 Installation Guide, GC32-0395
򐂰 Tivoli Framework 3.7.1 Reference Manual, SC31-8434
򐂰 Tivoli Framework 3.7.1 User's Guide, GC31-8433
򐂰 Tivoli Management Framework Enterprise Installation Guide Version 4.1, GC32-0804
򐂰 Tivoli Management Framework Maintenance and Troubleshooting Guide, GC32-0807
򐂰 Tivoli Management Framework Planning for Deployment Guide Version 4.1, GC32-0803
򐂰 Tivoli Management Framework Reference Manual Version 4.1, SC32-0806
򐂰 Tivoli Management Framework Release Notes Version 4.1, GI11-0890
򐂰 Tivoli Management Framework User's Guide Version 4.1, GC32-0805
򐂰 Tivoli Workload Scheduler 8.1 Error Messages, SH19-4557
򐂰 Tivoli Workload Scheduler Release Notes Version 8.1 (no number)
򐂰 Tivoli Workload Scheduler for z/OS V8R1 Customization and Tuning, SH19-4544
򐂰 Tivoli Workload Scheduler for z/OS V8R1 Diagnosis Guide and Reference, LY19-6410
򐂰 Tivoli Workload Scheduler for z/OS V8R1 Messages and Codes, SH19-4548
򐂰 Tivoli Workload Scheduler for z/OS V8R1 Quick Reference, GH19-4541
򐂰 TME 10 ADE Framework Services, GC31-8348
򐂰 TSSM Supplement for Policy Director, GC32-0473
򐂰 z/OS V1R3.0 MVS Initialization and Tuning Guide, SA22-7591
򐂰 z/OS V1R4.0 MVS System Commands, SA22-7627

Referenced Web sites
These Web sites are also relevant as further information sources:
򐂰 Deconetix Web site http://www.deconetix.com

򐂰 Distributed Management Task Force Web site http://dmtf.org

򐂰 IBM Redbooks TME 10 archive ftp://www.redbooks.ibm.com/redbooks/tme10_archive

򐂰 IBM Tivoli Support Migration Web site http://www.tivoli.com/support

򐂰 Java Search Page http://search.java.sun.com

򐂰 Microsoft Knowledge Base http://support.microsoft.com/default.aspx?scid=fh;EN-US;kbhowto&sd=GN&ln=

򐂰 Novell NetWare Web site http://www.novell.com/products/netware/

򐂰 Novell support Web site http://support.Novell.com

򐂰 NTBugtraq Web site http://www.ntbugtraq.com

򐂰 Orb Data Web site http://www.orb-data.com

򐂰 Orb Data Technical Exchange http://www.orb-data.com/TechExchange.html

򐂰 Speedguide Web site (information about TCP/IP "Speed tweaks" for Windows NT and 2000) http://www.speedguide.net

򐂰 SpeedGuide Cable Modems & xDSL Registry Tweaks Web site http://www.speedguide.net/Cable_modems/cable_registry.shtml


򐂰 SpeedGuide Windows 2000 & Windows XP Broadband Registry Tweaks Web site http://www.speedguide.net/Cable_modems/cable_reg_win2k.shtml

򐂰 Sysinternals.com Web site (useful utilities for Windows) http://www.sysinternals.com

򐂰 Tivoli Field Guides http://www-3.ibm.com/software/sysmgmt/products/support/Field_Guides.html

򐂰 Tivoli support ftp site ftp://ftp.software.ibm.com/software/tivoli_support/

򐂰 Tivoli support Web site http://www-3.ibm.com/software/sysmgmt/products/support/

򐂰 TME10 Tivoli Technical Mailing List Web site http://publib-b.boulder.ibm.com/redbooks.nsf/Portals/TivoliCustom1

򐂰 Windows Management Instrumentation Tutorial Web site http://www.microsoft.com/downloads/release.asp?releaseid=12570

How to get IBM Redbooks
You can order hardcopy Redbooks, as well as view, download, or search for Redbooks at the following Web site:
ibm.com/redbooks

You can also download additional materials (code samples or diskette/CD-ROM images) from that site.

IBM Redbooks collections
Redbooks are also available on CD-ROMs. Click the CD-ROMs button on the Redbooks Web site for information about all the CD-ROMs offered, as well as updates and formats.


Index Symbols $LOG 123 $LOGFILE 123, 126 $LOGSOURCE 123, 126 $LOGSUBSOURCE 123, 126 $LOGUNQID 123 $TMR 123, 126 ( 248 .out 122 /states 429 \SERVERivfiles 984 _INTERNAL_RESGRP 920

Numerics 3.7.1-TMF-0005 1010

A abbreviations 1085 ABEND 821 ABENDU 821 aborted 399 Abstract Class 25 access control list 1013 access count 418 Access Manager API 874 acronyms 1085 active distributions 407 active RuleBase 531 Activity Planner 879 configuration file 902 log files 904 processes 902 trace files 905 troubleshooting 902 addadmin utility 1049 ADE See Tivoli Advanced Development Environment ADE files 1040 administrator 306 actions in remote TMRs 380 creating 308 group name 310


ID mapping 316 remove or delete from Tivoli 320 roles 323 Tivoli name 310 Tivoli roles 323 UNAUTHORIZED 325 user login name 310 Adupd.ldf 235 Advanced Configuration dialog 1048 aggregation 618 ALI object 29, 31 All_Profiles_PM 54 allow_install 491 allow_install_policy.pl 1067 altpass 863 AMS 736 Analyzer 622 anatomy of a Tivoli CD-ROM 146 ApiServlet.log 923 APM 879 applet 874 Application Extension Facility 33 Application Level Gateway 874 Application Management Specification See AMS 736 Application server 927 apply maintenance 212 architecture 618 asynchronous interface 399 attributes 33, 680 authentication 530 Authentication Package 1023 authentication packages 977 authorization roles 306 Autotrace 132 atctl 134 channel number assignment 137 channel operations 140 reset a channel 140 resume tracing 140 snap a channel 140 suspend tracing 140 configuration files 134 control files 133


customer environment 138 database 132 determine active products 139 determine configuration channels 139 dynamic control 132 enable tracing 139 existing config file 139 First Failure Data Capture 132 initialization 138 run-time components 133 shared library 133 Snap Viewer 142 output 142 starting 143 terminology 133 process ID 133 product ID 133 trace ID 133 trace buffers 132 trace ID 132 Trace Profiler 132 Autotrace Snap Viewer 142 azhAPI 874–875

B back out 272 backup 278–280, 282, 287, 289, 291, 777, 780 backupdb.log 286 backups directory 286 default directory 300 object 302 Tivoli object database 277 BackupClient 301 base object 29 bash 768 bash program 768 bash.exe 197, 1018 basic collection 27 Basic Object Adaptor 19 BBGUI 328, 330, 332 bdb file 74 bdbe 58, 96 bdbx 58, 67, 74–76, 96–97 behavior object 33 behavior of an object 681 binary backup 289 bleshooting 839 BOA


See Basic Object Adapter broken connection 431 browser 874, 923 built-in NT Administrator 985 BuiltinNTAdministrator 986, 994 bulletin board 328, 332 BUSC See Business System Container 681 business intelligence reporting 759 Business System Container 681 Bypass Traverse Checking 980

C cache 399 cache_loc option 1048 callback 400 cancel 429 canceling a distribution 438 caret 924 CARRYFORWARD schedule 864 catalog statistics 780 ccm RIM object 233 CCMS 42, 45, 47, 52 CCMS components 42 CCMS configuration profiles 52 CCMS Distribution 45 CCMS profile databases 52 CCMS profile endpoints 52 CCMS profile managers 52 CCMS profile organizers 52 CD 158 CD image 761 CD swapping 761 CDW See central data warehouse chain 149 Change Management Status summary 901 Change Manager configuration file 908 trace files 909 check_db 102–103 checkpoint memory buffer size 428 checkpoint restart 427 chown 859 CIM 619–620, 622 architecture 619 Object Manager 620 Schema 620

Specification 620 Static classes 620 CIMOM 620 class 33 class objects 25, 28, 32–33 classes 41 Client Platform-Specific plug-ins 874 client.cfg.error 122 client.cfg.output 122 CLOSEDIR.lck 190 CM See Change Manager CM Status 875 CM Status summary 901 CNOTICE 329 Code Server 900 collection 24, 27, 358 collection checking 104 collection interface 105 collection member 106 collection objects 24 collector debugging 930 command line utility 783 Common Information Model 619 common model 620 Common Object Request Broker Architecture 19 communications between workstations 795 Compatibility Mode 993 concurrent connections 403, 406 Conduit 914 Configuration and Change Management System See CCMS 42 conn_retry_cutoff 433 conn_retry_interval 405, 430–431, 433 connecting TMRs 356 connections 407, 409 connector 794 consistency 97 consistent install 212 CONTENTS.LST 147 control database 768 Control server 759 CORBA See Common Object Request Broker Architecture core dump 1060 core model 620 correlation 618 corrupted data 860

corruption 97 create_instance 32 credential 991 CurrentNTRepeat 154 CurrentNtRepeat 995, 997 custom backup script 288–289 custom binary backup script 291 custom notice 282, 287 custom notice group 329 custom script 768

D data analysis 759 data flow 399 Data Handler debug information 929 troubleshooting 929 data mart 759 data mining 759 Data Moving log file 896 process flow 894 trace files 896 troubleshooting 894 data stream 881 database 278, 282, 289 database backup 273, 282 database directory 294 database profile managers 43, 45 dataless 43 dataless profile manager 45, 47 datatypes 41, 72 DB2 CLP 766 DB2 e-fix 762 DB2 log file 761 db2stop 783 dbaccess for Informix 766 debug 929 debug level 930 debug level 3 929 decision tree file 622 default 884 default policy 33, 306 task library 339, 341 default priority 404 delivery operation 400 Demilitarized Zone See DMZ

depot 397, 416, 419, 875 depot commands delete 421 list 421 purge 421 desired state 875 Desktop for Windows 309 Device Directory 877 device groups 878 Device Management Server 878 device management troubleshooting 910 DHCP 764 DII See Dynamic Invocation Interface disconnected install 870 disconnected support 433 disk_max 414 DISPLAY variable 761 distinguished resources 29 distmgr.log 886 distribute over_all_no_merge option 51 distributed environment 871 Distributed Management Task Force 619 distribution 49 distribution control 435 Distribution ID 895 distribution maintain option 50 distribution over_all option 51 distribution over_opts option 50 distribution status 435 Canceled 436 Expired 436 Failed 436 Interrupted 436 Paused 436 Receiving 435 Rejected 436 Sending 436 Successful 436 Unavailable 436 Waiting 435 Distribution Status Console 934 DLL conflicts 1008 DMTF 619, 621 DMZ 872 DNS 913 documentation library 760 double byte character set 760 downcall 465

download an application 872 drop databases 782 dswin.log file 216 dummy.out 767 duplicate endpoints 491 dynamic buffer evaluation 429 Dynamic Invocation Interface 21

E Effective setting 1029 emergency fix 261 endpoint considerations for NetWare 1058 endpoint gateway 427, 434 endpoint management 494 endpoint manager 888 endpoint upgrade 271 endpoints anticipated gateway 493 common misconceptions 469 common problems 469 configuration/startup problems 480 correcting duplicate endpoints 491 distribution 622 duplicate endpoints 487 gateway failure 492 ghost installation images 469 host machine failure 492 improper login interval settings 492 initial login 470 installation problems 479 isolation login 473 lcfd.log 481 login failure 478 login problems 481 login process 470 login_interval 493 migratory login 473 multiple endpoints 487 normal login 473 orphaned endpoint login 474 orphaned endpoints 469 re-attempt login 493 select_gateway policy 493 udp_interval setting 493 upcall 492 engine 618 Enterprise Directories 880 Enterprise Directory Integration

trace 927 troubleshooting 927 Enterprise Directory server 877 environment 155 EtcTivoli 155 o_dispatch 155 RIM_DB_LOG 460 wlocalhost 155 environment variable 761 epmgrlog 118–119 EQQDUMP 825 establish connection 408 EtcTivoli 155 ETL programs 758 Event Activity Report 575 event processing 823 event repository 529 Event Server 370 ExBadClass 110 ExBadClassName 110 ExBadMember 106 ExBadPRName 107, 111 ExClassUnreadable 111 ExCollectionUnreadable 105, 107 ExDbCheckNotFound 104 exec_program 529 exec_task 529 execsql 767 execute_timeout 433 ExEntryNotFound 104 ExInvalidMemberData 106 ExInvalidPDO 110 ExInvalidPRO 108 ExInvalidPVO 110 exit status 400 ExMemberBadBackref 107 ExMemberNoBackref 106 ExMemberUnreadable 105, 107 ExMemberWriteError 108 ExNestedResourceNotFound 104 ExNoMember 108 ExRegionUnreadable 110 extension schemas 620 Extract 758 Extract, Transform and Load See ETL programs ExUnsupportedClass 107, 111

F failed installation 761 failed login 471 fan out 409, 425 FAT file system 154 file package block 426 file system 415 file system backup 272–273, 293 file_versions 298 filtered collection 27 firewall 872 fix_db 102 ForceGuest 1012 FrameRelay 986 free space 761 fresh server install 212 FRESH subdirectory 214 From depot 422, 427 FRWSL0007E 1029 Fully Qualified Domain Name 1015 future releases 758

G gatelog 118–119, 931 gateway 425, 626–629 gateway database 96 gateway log file 931 gateway method 465 gateway proxies for NetWare 1057 gateway repeater 398, 409 gateway session timeout 392 GEM 685 general availability patch 261 general troubleshooting 882 get_net_aliases 488 get_principal_roles 31 gethostbyaddr() 1014 gethostbyname() 1014 getpwname failed 347 GetSystemDirectory 1009 Global Enterprise Manager See GEM Global Repeater Manager timeout 392 gold-image 269–270, 272 gold-image binaries 269 grep 58, 860 GUID 874

H hexadecimal representation 680 HTML page 875 HTTPS protocol 872 Hub TMR 368–369 hub-spoke architecture 367

I IBM 763 IBM DB2 876 IBM Enterprise Directory Query Facility 870 IBM support center 851 IBM Tivoli Access Manager for Operating Systems auditing trace events 963 certificate files 969 Certificate Transfer Utility 969 check disk space 969 configuration problems 967 connectivity to LDAP server 968 connectivity to Tivoli Policy Director server 967 default SSL port number 968 global audit levels 964 Immune-Programs 964 log files 966 master_authzn.db 972 osseal 970 policy database 972 prerequisite products 967 run-time problems 970 slapd process 968 SSL connectivity 968 verification of policy 970 IBM Tivoli Access Manager WebSEAL 876 IBM Tivoli Configuration Manager append_log keyword 884 backup_fmt 889 broken links 888 cm_status 883 components 869 list_path 889 log_file 889 log_file_path 885 log_host 890 log_host_name 884 log_object_list 885 lost and found 888 lost data 888 multicast distribution 881

new features 871 notice group entry 884 object identification number 888 packaging 869 prog_env 889 set_debug_level option 888 software package 889 stop_on_error 889 trace_size 891 user_program 883 wadminep command 888 wep command 888 wexpspo command 889 wgateway command 888 wgetspat command 889 wls command 888 wping command 886 IBM Tivoli Monitoring additional response actions 624 Apache 623 architecture 618 Autotrace 640 base classes 622 basics 617–618 CIM 620 CIM-based resource monitoring 620 components 618 DMCollectEpEnv 642 DMCollectMnLog 642 endpoint logs 632 endpoint upcall traces 630 gateway logs and traces 626 generate XML file 640 heartbeat engine traces 627 log and traces format 624 MDist2 support 623 MOF files 622 new features 623 profile 622 profile core trace 631 profile distribution process 626 request manager trace 631 resource data 622 resource model 622 serviceability tasks 641 Tivoli Business Systems Manager adapter trace 629 Tivoli Business Systems Manager engine traces 629

Tivoli Business Systems Manager events 622 Tivoli Business Systems Manager transport trace 630 Tivoli Enterprise Console events 622 Tivoli Enterprise Data Warehouse support 624 TMR server logs and traces 626 tools 640 Web Health Console 618 Web Health Console logs and traces 638 Web-based Health Console 623 WebSphere tracing 639 winmsd command 642 IBM WebSphere Application Server 876 IBMupd.ldf 235 ic_discover method 937 id 148 ID mapping 316 root_group 317 root_user 317 IDL 34 idlarg 71, 274 idlattr 24, 54, 60, 66, 73, 76, 331, 333 idlcall 24, 43, 53, 59, 66, 68, 99, 102, 275–276, 295, 329–330, 332 idmap 981 image dump 1060 image_report 199 imdb.bdb 26–27 In use count 418 INCORROUT 821 IND file 192 index 780 index keyword 146 inherit 25 inheritance 27, 33 initial login 490–491 initial login sequence 488 install logs 118 install media image 266 install repository 186 directories 191 install2.cfg.error 122 install2.cfg.output 122 installation automatic startup 155 classic install logs 120 client installation behind the scenes 159 CONTENTS.LST 146 core installation process 145

Desktop Install 248 environment variables 155 files 122 installation dialog 146 installation object 160 Java Virtual Machine 214 JRE installation 580 large installations 113 logs 120 mass installation 5 methods 5 NFS mount considerations 154 overview 151 PATCHES.LST 146 pre-install checks 152 process 5 reinstall 156 remote startup 155 RIM 445 server installation behind the scenes 156 Software Installation Service 186 TEC Java Console 578, 580 TEC server 534 Tivoli applications 152 Tivoli Business Systems Manager 712 Tivoli clients 151 Tivoli endpoints 152 Tivoli Management Framework 151 Tivoli patches 152 Tivoli Workload Scheduler 794 TMR server 151, 212 TRIP 162 troubleshooting 237, 761 installation methodology 270 installation repository 192 installing NetWare gateways 1045 InstallShield 578, 782 InstallShield image 1044 InstallShield Multi-Platform See ISMP instance manager 25, 329 instances 33 Instantiable Class 25 instantiate 25 Integrated Desktop Install Change Manager GUI 248 Components to install Activity Planner GUI 248 Distribution Status Console 248

Inventory GUI 248 Software Package Editor 248 Tivoli Desktop for Windows 248 Tivoli Java components 248 Integrated Install APM plug-in 216 benefits 212 CM plug-in 216 cmismp.log 240 cmsummary.log 237 create custom tablespaces 228 create default tablespaces 228 creation of gateway 215 creation of Query Libraries 216 creation of RIM objects 216 installation types 212 Integrated Desktop Install 248 Integrated Endpoint Install 253 Java components 217 Java Virtual Machine 214 No configuration 228 overview 212 Run schema scripts only 228 Server Install 212 authorization roles 213 behind the scenes 215 Custom Install 224 database requirements 213 installation programs 214 Inventory requirement 226 Typical Install 217 Silent Install 216 traditional logs 242 troubleshooting 237, 251 wcrtgate 215 winstall 216 wpatch 216 Intel 865 interactive reporting 759 interconnected TMRs 355, 364 managing TMR 381 remote connection 356 secure connection 357 troubleshooting 375 Interface 34 Interface Definition Language See IDL 34 Interface Repository 35, 57, 72 interfaces 41

intermediate client 397 internal methods 29 Internet 871, 874 inter-object message timeouts 301 interrogating the target 1000 interruption 427 intrinsic 29 inv_query RIM 216 invdh_1 216 Inventory component Collector logging 930 log files 928 RIM trace 931 troubleshooting 928 troubleshooting on the endpoint 940 Inventory scan ID 938 IOM channel 565 IOM connection 565 ipconfig 1020 IPX support 1057 IPX/SPX clients 1049 IR See install repository ir.lck 190 ir.loc 208 irview 37–38, 54, 62, 67, 72–73, 76 ISMP 5, 212

J Java 874 Java 1.3 217 Java Client Framework See JCF 248 Java Client Framework for Tivoli 217 Java front end 215 Java interface 398, 681 Java RDBMS Interface Module 217 Java RIM See JRIM Java Runtime Environment See JRE Java Script 622 Java Server Pages See JSP Java servlet 874 Java servlet plug-in 874 Java Virtual Machine

See JVM JavaHelp 248 JCF 248 JCF based GUIs 248 JCF spawning 248 job 338 internals 339 Job FOLLOW 863 Job Scheduling Console connector trace 852 error examples 855 severity code 822 trace table 825 traces 857 troubleshooting 854 TWS for z/OS connector troubleshooting 852 WAIT keyword 823 journals 11 JRE 248, 579–580 JRIM 248 JSP 874 JVM 580

K kbdus.dll 197 Kerberos 991 keybus.dll 1016 kill 561, 860 killproc 860 Korn shell scripts 1065

L LAN 426–427 LAN repeaters 425 large software packages 427 lastfplog 121 launch_sis 204 LCF.NCF 1048 lcfd daemon 464 lcfd service 930 lcfd.log 929 lcfrsrvd account 1044 LDAP 877 server 235, 877 troubleshooting 927 LDAP Data Interchange Format 235 license_tag 147 Lightweight Directory Access Protocol 877

See LDAP limited-availability patch 261 Line of Business See LOB 681 Linux machines 782 listdll.exe 1020 listproc 860 LMHOSTS 1014 Load 758 load a software package 422 load depot 420 LoadLibrary 1009 LOB 681 LOB objects 681 Local Area Network See LAN local hierarchy 425 localhost 1019 locking mechanism 487 log file 117–118, 761–762 log file adapter 121 Log on Locally 980 log rotation 128 Log to File 282 log_it.pm 128 login sequence 488 login_interval 490 logs endpoints 632 profile distribution 626 server 626 Tmw2k_srv.log 626 LOOP 821 lost data mart database 781 lost-n-found 888 lowest granularity 759 LPAR 739 LSASS.EXE 975

M machine ID 874 machine reboot 428 mailman 860 Maintenance rearrange a table 778 updating system catalog statistics 780 maintenance backup 777

gather statistics 780 removing old data 778 maintenance release level 265 malformed ASCII exception 301 managed node 248, 442 Managed Node Repeater 398 managed resources 307 Managed_NodePD 36–37 ManagedNode_get_create_dialog 38, 72 management hub 795 marker files 270 markers 761 MASTERivuser. 984 max_conn parameter MDist 1 408 MDist 2 407 max_sessions_high 403 max_sessions_low 403 max_sessions_medium 403 Mcollect 930 mcollect method 937 MDist1 415 MDist2 268, 403, 412 assured delivery 429 asynchronous delivery 399 checkpoint restart 427 data depots 416 depot configuration 419 disconnected endpoint support 433 distribution connections 406 distribution control and status 438 Distribution Manager 438 distribution speed configuration 409 Loading software packages on depots 420 MDist1 comparison 399 priority queues 402 remote depot repeater 427 resource limits per repeater 406 retransmit 427 retrying broken connections 405 segments 418 what is new 398 MDist2 components distribution manager 398 GUI 398 repeater depot 397 repeater manager 397 repeater queue 397 Repeater site 397

MDist2 problems 886 mdist2 RIM object 229 MDist2.bdb 429 MDist2.log 429 mem_max 414 Member_Lost 108 memory dump 1060 message catalog 622 Meta database 678 meta-schema 620 MethInit 48 method cache 464, 937 method fork failed errors 347 methods 25–27 _get_client_files 285, 300 _get_default_host 300 _get_server_files 285, 300 RIM_iom_session 457 run-task 340 snapshot 286 Microsoft 1008 Microsoft Knowledge Base 1012 Microsoft SNA server trace 750 Microsoft SQL server 676 miniprod.sav 190 minitmr.sav 191 Mobile Computing configuration files 898 log files 898 process flow 896 trace files 898 mobile devices 880 MOF file 622 base 622 resources 622 MSG 822 MSSQL Administrator 984 MSVCRT40.DLL 1008, 1027 multicast 881 multiple connections 406 multiple distributions 410 multiplex distribution 121

N Name Registry 28, 103, 330–331, 355, 364–365, 370–371, 472 name resolution 913, 1014 naming standards 371

native mode 1029 native OS installation 900 nbtstat 1020 NDS See Netware Directory Services NDS tree 1044 NDSupd.ldf 235 negative net_load 412 negative number 412 nested collection 27 net_load 410 net_load parameter MDist 1 412 MDist 2 412 NetBIOS naming conflicts 1015 NetBIOS over TCP/IP 1020 netstat 846, 861, 1020, 1025 netsvc.exe 1020 NetView 712 NetView APM 731 NetView Application Management Interface 731 Netware Bindery emulation 1044 NetWare clients 1049 NetWare considerations 1044 Netware Directory Services 1044 NetWare gateways 1057 NetWare managed sites 1049 NetWare PC agent 1049 NetWare server 1044 NetWare Server Configuration report 1063 NetWareManagedSite object 1050 network bandwidth 410, 413–414 network connection 763 network failures 428 network intensive task 1027 network resources 406 network traffic 796 next tier repeater 431 NIS 763 no connection 763 nobody 982 Nokia 9200 Series 880 non-service binaries 597 non-Tivoli applications 758 notice database repair 332 notice expiration 328 notice group objects 331 notice group OID 331 notice groups 327–331

notice manager object 328, 330–331 notice.bdb 298, 327, 330–331 notice.log 327, 330–331 notices 25 notices database 329 notices database corruption 329 notification 282 Notification Manager 890 notification system 328, 330 notify_interval 437 Novell NetWare 5.x. 1047 Novell NetWare 6.x 1047 Novell NetWare Modules 1063 Novell NWADMIN utility 1045 Novell Requester 1044 nslookup 102 NT 3.51 SP5 997 NT LAN Manager 991 NT registry 752 NT Server Message Block protocol 164 NT/Windows Event Adapter adapter defaults 596 debugging 595 diagnostic logging 603 running from command line 598 running in debug mode 601 running in test mode 600 tecad_nt.exe 597 tecad_win.exe 597 TECADHOME environment variable 602 NTFS 154 convert FAT to 154 NtfServer 331 NtfServer process 328, 330–331 nthandleex.exe 1020 ntprocinfo 480 ntprocinfo.exe 1019 ntregmon.exe 1020 NTResKit 1020 ntrights.exe 1020

O o_dispatch 155 o_get_groups 31 objcall 24, 43, 54, 58, 66, 73, 76, 331 object 25, 161–162 object database 25, 96, 100, 279 object dispatcher 29, 59, 61, 63

object hierarchy 33 object ID 322 plus and minus signs 465 object identifier 24 Object Management Group 19 Object Repository 25, 57 Object Repository Architecture 26 Object Request Broker 19 odadmin 6, 58, 65, 101, 114, 374, 931 environ 6, 287, 460 odlist 6, 183 objects 183 rm_od 183 region 375, 377 trace 7 odadmin environ 931 odbls 61, 67, 76–77 odstat 7, 63, 76, 101–102, 488, 554 OLAP 759 analysis 759 old copy of central data warehouse 781 older version of control database 780 om_get_acl 31 OMG See Object Management Group OPENS file 863 Operations Console 925 Oracle 449 ORB See Object Request Broker Orb Data Limited 11 original IP 764 orphaned Software Package 888 OS/390 713–714 oserv 25, 27, 29, 58, 119, 268, 294, 374, 886 port 94 and o_dispatch 155 See also object dispatcher oserv object 29–30 oserv process 490 oservend 1047 oservlog 118–119 sample output 157 oservrun 1047 otherpages 58, 296

P package_description 147 Palm 880

Palm device 879 parent directories 980 parsing failed 529 patch factory 260 patch objects 276 patch_for 146–147 Patchadd 881 patches 260–262, 264–266, 268, 273 PATCHES.LST 146–147 pause 429 pausing a distribution 437 PC managed node 1049 PDC Emulator 991 per distribution 399 per repeater 399 perfmon 1016 Perl scripts 123, 1065 permanent cache 937 permanent_storage 419, 424 persistent information 429 ping 763 Pkgadd 881 planner RIM object 231 Policy Director 872 policy region 306, 369 POP See Protected Object Policy 962 positive net_load 412 Post Status Dialog on Desktop 282 Post Tivoli Notice 282 power failures 428 Presentation Services 762 priority level 402 Priority Level group box 404 pristine process flow 898 troubleshooting 898 pristine.log 900 process id 860 product.sav 190–191 profile endpoint objects 43 profile manager 42–44 profiles 42 Protected Object Policy 962 prototype object 33 proxy agent 880 proxy setting 914 proxy support 871 pruning processes 778

ps 860 publish packages 871–872

Q query 213 queuing mechanism 397

R random port number 594 rapid deployment 269 RDBMS client 442 RDBMS Interface Module See RIM RDBMS server 442 read-locks 370 rearrange a table 778 reboot 782 recovery 278 Redbooks Web site 1092 Contact us xxxiv regedt32 782 region check_db 108 registration script 1047 release level 265 Reliable Multicast Transport Protocol 882 remote share 1027 remotely install to a NetWare system 1048 remove client from a TMR 183 oserv 182 reorganize warehouse program 778 reorganizing the data 778 repeater 409, 425 repeater depot 397 repeater queue 431 repeater sites 425 reporting application 872 Repository 34–35, 213 rescue 297 resolve 76 resolve hostnames 1014 Resource Interface Instrumentation Library Type 621 resource limits 406 resource locking 487, 490 Resource Manager 877 Resource Manager Gateway 217 resource model 618, 622

components 622 resource visibility 358 resources 28 Response File byNode 194 byProduct 194 export 193 IND 193 restore 294, 780 items not restored 298 restore endpoints 273 re-submit 433 Results Collector 879 resume 429 resuming a distribution 437 retry function 397 retry_ep_cutoff 430 revision 146–147 REXEC 997 rexec 996–997 REXX environment 712 RI See Report Interface RIM 441, 527, 530 API 443 calls 530 database 435 database tables 446 installing 445 problems 563 RDBMS_Interface Translation Layer 444 Vendor Adaptor Layer 444 RIM agent process 932 RIM connectivity 452 RIM host 370, 448 RIM host managed node 932 RIM interface log file 938 RIM log 932 RIM object 442 RIM tracing 563–564 starting 563 stopping 563 RIM_DB_LOG 931 Rim_vendor_Agent 932 RIP packets 1057 rmattr 30 rmobj 30 rolling back 272 root 306

root cause analysis 622 root_group ID map 317 root_user ID map 317 rotate_logs.pl 1072 router 861 rpc_max_threads 1027 rpi.strings table 766 RPM 881 rpt_dir 420 rsh 996 rule profiling 571 rule tracing 565 rules.trace 121 run_task 981 RUNSTATS 780

S safeguard 1022 SAM 987 same code base 398 sapack 269, 998 SC 681 sc.exe 1020 scheduled job 282 Scheduler 282, 349 common errors 351 security_db 44 security_update 48 seed file 1000 segment 416 Send E-mail to 282 send_timeout 433 sentry_engine 981 serial number 418 Server Message Block 1001 setattr 30 SeTcbPrivilege 976 setting Collector logging 930 setuid 974 Setup.exe 215 shared connections 409 shared library 133 Show SQL 766 SID 500 985 signed applet 871 silent install 216 silent installation 761 simplified maintenance 212

Sinfonia 865 single byte character set 760 single install media image 266 single server installation 217 SIS Installation AS/400 197 client prerequisites 196 disk space probe 198 endpoint prerequisites 197 file package pushed 199 install of a managed node 200 install of a TMA endpoint 200 prerequisite checks 196 product and product prerequisites 198 SIS installer 270 sis.ini 201 sisclean.log 203 sisgui 189 sisguisub.sh 204 slow WAN link 427 SLOW_LINK parameter 413 SMTP server 1019 SNA server 713 snapshot shell script 286 SNMP 620 Software Depot 419 Software Distribution Agent 879 Software Distribution engine 879 Software Distribution/Inventory API 874 Software Installation Service 185 locks 190 log files 203 Response File 186 response file 193 synchronize with TMR 201 software package 872, 875 Software Package Editor 885 source host 399 SP See software package speed tweak 1017 Spider daemon 592 starting 594 stopping 594 spoke TMR 368–369 Spool Thread 382 spooler 997 spooling 414

SQL execution engine 767 sql output 766 SQL statements 938 sqlplus for Oracle 766 sqlscript.sh 767–768 star schema 766 StartUp 859 stat_intv timeout 392 storage mechanism 397 store permanently 416 store temporarily 423 store-and-forward 399, 416 stored procedure 678 submission 401 subscribers 45, 408 substrate objects 29 supervisor account 1044 swdis.ini 429, 927 Swing 248 synchronize with a model 875 synchronous transfer 396 SYS volume 1048 system hang 416 system performance 406

T table of contents 418 tag files 270 target_net_load 410 task 337 allow root to run 344 executables distribution 340 task library 338 commands 342 common errors 347 default and validation policies 339 TBSM Distributed Edition 730 TCP connection 881 TCP connections 409 TDM 879 TEC 3.7 diagnostic logging 538 diagnostic logging level 548 installation debugging 534 reporting 565 restart 545 rule profiling 571 rule tracing 565

starting trace 546 stopping trace 546 tracing 538 troubleshooting 525 TEC Consoles 527 TEC Java Console Debug Window 590 debugging 577 history logging 591 installation 578 run-time logging 586 uninstallation 578 TEC processes tec_dispatch 529 tec_reception 528 tec_rule 528 tec_server 528 tec_task 529 tec_ui_server 529 TEC Server restarting 546 start tracing 560 starting 546 stop tracing 560 TEC UI Server diagnostic logging 557 shutting down 562 tec_diag_config file 121 tec_dispatch 121 tec_reception 121 tec_rule 121 tec_task 121 tecad_logfile.boot.log 121 Technical Exchange 11 TEDW installer 761–762 TEIDL 34 telnet 154, 995 telnetd.exe 1020 test database connection 767 Testing patches 267 throughput 401, 425 time stamp 123, 865 time stamps 362 timeout 431 Time-out Settings 432 Deadline 432 Execution Timeout 432 Notification Interval 432 Send Timeout 432

Tivoli 527 Tivoli Administrator 326 Tivoli administrator 875 Tivoli binaries 289 Tivoli Business Systems Manager 678 _C table 680 AgentListener.que 684 allocate object instance 680 ASIDBValidater 676 asisp_createBATC 680 attribute table 679 base services 676 base services logging 750 Batch Job object 679 BATCH_A 679 BATCH_C 679 BATCH_V 680 behavior of an object 681 BSM Database Validater 676 Business System Container 681 child event 683 class implementation 679 class number 680 class table 680 classic interface 681 cno column 680 communication problems 750 Complex - Machine - LPAR hierarchy 739 configuration 746 connection to the database 676 containment hierarchy 681 create object instance 680 Database 677 ASIRuleSvc 677 extended stored procedures 677 Meta 677 Meta tables 677 ObjectQueues 677 queue support 677 SQL Enterprise Manager 678 database calls 750 Database Master 677 Database Microsoft SQL Server 677 Database Object 677 database overview 677 database server machine 734 database triggers 683 database view 680 DBRetryInterval 676

dedicated IP address 747 dequeue 684 Distributed Edition 730 Agent Listener 732 database extension 736 DM36.log 732 endpoint monitoring 732 Event Enablement 732 lcfd.log 732 dumpfqueue 684 Dynamic, Filtered LOB Child 683 Enterprise Outliner view 681 event enablement 734 GEM 685 generate TEC events 731 Health Monitoring system 750 hexadecimal representation 680 higher-level object 685 highest instance ID number 680 ID column 680 ihseeerr.log 734 ihstserr.log 734 inheritance hierarchy 681 instance ID 680 instance table 679 internal functions 681 Java interface 681 Line of Business Containment Link 683 Line of Business Logical Link 683 Line of Business Static Link 683 Line-of-Business view 681 link_chain table 681 list of object classes 678 Listener service 714 LOB objects 681 LOB relationship types 683 log files 716 LogLevel parameter 753 lost connection 713 managed application component 730 message exchange 714 Meta Database link_chain 679 link_type_table 678 method_table 679 obj_class_table 678 metadata 677 Microsoft SNA server trace 750 morphable 681

MVS Upload Rule Server 713 obj_class table 680 object attributes 678 object connection 682 object database structure 678 object hierarchy 681 object link types 678 object pump 713 object registration event 715 object relationship 680 objects 683 original objects 681 OS/390 input component 712 OS/390 related components 716 MVS enqueue proxy server 716 MVS event handler 716 MVS listener 716 MVS upload rule services 716 Paused mode 676 PHYC connection 683 prerequisites 746 propagation matrices 681 propagation matrix relationship 683 propagation mechanism 686 agent on different machine 689 Alert State 686–687 ASIPADispatcher 689 attribute of the object 687 child event 687 critical priority 687 detailed process 689 exception 687 Ignore priority 687 log files 689 Object priority 686 Object type 687 propagation agent 689 propagation matrix 686 PropagationBucketArray 689 PropagationMatrix 689 PropagationSetting 689 state 686 status 688 Tivoli BSM Propagation Agent Dispatcher 689 Windows NT service 689 queue processing 683 database 683 file-based 683

remote access for logs 755 remove object instance 680 REXX environment 712 rule engine 735 rule files 735 SC66-Upload.que 715 Sender service 714 server types application server 746 database server 746 event server 746 history server 746 Notification Services 754 problem determination 750 propagation server 746 SNA server 746 setting table 680 SGTMMODS dataset 712 SNA server 713 Source/390 713 status propagation 685 Stored procedures asisp_create 680 stored procedures 680 alloc_ 680 asisp_view 680 delete_ 680 TEC 733 TestLoopInterval 676 Tivoli Distributed Monitoring profiles 730 tserver 734 unique object ID 680 Tivoli database 97, 272, 278, 289 Tivoli database backup 295 Tivoli Desktop 279, 326–327, 870 Tivoli Device Manager 878 Tivoli endpoint 464 Tivoli Enterprise Data Warehouse core application 760 data mart 759 documentation library 760 ETL processes 759 ETL programs 758 installation media 760 non-English languages 760 packaging 760 patches 761 source applications 758 warehouse packs 758

Tivoli Extended IDL 34 Tivoli Field Guides 7 Tivoli Global Enterprise Manager 685 Tivoli Mailing List 8 Tivoli managed node 870 Tivoli Management Agent 464 See TMA Tivoli Management Framework 24, 47, 217, 878 Tivoli Management Framework 3.7 382 Tivoli Management Framework-based applications 872 Tivoli Management Region See TMR Tivoli Name Registry 24, 28, 353, 355, 369 Tivoli NetWare Repeater 1049 Tivoli NetWare repeater 1052 Tivoli notice 282 Tivoli object database 100, 279 Tivoli Object Repository 57 Tivoli patches 262 Tivoli region 442 Tivoli Remote Access Account 156 Tivoli Remote Control Chat 951 components 950 Controller 950 enabling logging for Windows 952 enabling traces for OS/2 951 enabling traces for Windows 951 events that are logged 952 File Transfer 951 gateway 950 logging 952 minimum usable configuration 950 protocol conversion 950 registry paths 951 server 950 SPX/IPX 950 Target 950 TCP/IP 950 Tivoli Management Framework troubleshooting 953 trace 951 trace files 954 troubleshooting 952 Windows eventlog 953 Tivoli Remote Execution Service See Tivoli Remote Installation Package Tivoli Remote Installation Package 154

Tivoli Resource Manager 876 Tivoli Web Gateway 878 Tivoli Workload Scheduler ABEND 821 ABENDU 821 Abnormal termination 824 AutoTrace 865 batchman down 861 byte order problem 865 compiler processes 864 end-to-end working directory 839 evtsize 861 FTA not linked 865 FTAs not linking to the master 859 INCORROUT 822 information needed to identify a problem 831 internal trace 851 Jnextday in ABEND 864 Jnextday is hung 864 jobs not running 862 JSC error examples 855 LOOP procedure 826 missing calendars 864 missing resources 864 MSG 821 multi-domain configuration 796 multiple netman processes 861 negative run-time error 865 PERFM keyword 823 preparing a console dump 829 problem analysis 823 problem-type keywords 820 single domain configuration 795 software-support database 820 standard list directory 841 standard list messages 843 Starter log information 843 Symphony renew 847 system dump dataset 825 TCP/IP server 851 Trace information 825 TRACEDATA 858 TRACELEVEL 858 tracing facility 865 tracking events 832 Translator log information 843 troubleshooting 859 troubleshooting checklist 859 TWS port 861

unlinked workstations 843 using keywords 820 Wait 821 writer process down 860 Tivoli Workload Scheduler for z/OS troubleshooting 819 Tivoli Workload Scheduler network 794 Tivoli Workload Scheduler troubleshooting 76 tivoli.cinstall 203 tivoli.sinstall 122 Tivoli_Admin_Privileges 976 Tivoli-Specific Accounts 1000 TivPriv 987 tlist.exe 1020 TMA See Tivoli Management Agent TME 10 Advanced Development Environment 21 tmersrvd 982, 1024 tmersvrd 1029 TMPDIR 287 TMR 398 TMRSync.sh 192, 202 tmstat 62 TOPOLOGY statement 847 total number of sessions 407 TouchPoint 621 Touchpoint Service Layer 621 TRAA account 984 trace 890 IBM Tivoli Access Manager for Operating Systems 963 IBM Tivoli Monitoring 624 Inventory 929 Job Scheduling Console 857 RIM 444 Software Distribution 890 Tivoli Business Systems Manager 753 Tivoli Enterprise Console 538 Tivoli Remote Control 951 Tivoli Workload Scheduler 865 Tivoli Workload Scheduler for z/OS 825 wtrace 78 Trace ID 132 trace_style 891 transaction locking error 370 Transform 758 trigger 678, 766 TRIP 480 TRM 876

troubleshooting Activity Planner 902 administrators 321 APM 902 base configuration file 890 Change Manager 908 Collector 930 common installation problems 761–762 creating the first report 765 customization of TEDW 766 Data Handler 930 Data Moving 894 DB2 errors 761 diagnostic file 825 distribution ID 886 endpoints 469 End-to-End Tivoli Workload Scheduler 839 Enterprise Directory Integration 927 existing Web server 762 free space. 761 gateway log 888 IBM Console 763 IBM Tivoli Access Manager for Operating Systems 965 IBM Tivoli Monitoring 624 installation 761 installation logs 761–762 insufficient disk space 761 Integrated Install 237, 251 interconnected TMRs 375 Inventory 928 Job Scheduling Console 854 MDist2 887 Mobile Computing 896 NT/Windows Event Adapter 595 odadmin odlist command 886 ping hostname 763 pristine 898 pristine installation 898 reinstall the warehouse pack 762 Report Interface 763 Resource Manager problems 922 RIM 563 Scheduler 351 Software Distribution traces 890 software package 889 synchronization 900 tasks and jobs 344 TEC 3.7 installation 534

TEC Java Console 577 TEC Server 3.7 538 TEC UI Server 557 Tivoli Business Systems Manager 750 Tivoli Remote Control 952 Tivoli Software Distribution 882 Tivoli Workload Scheduler connector 852 Tivoli Workload Scheduler for z/OS 819 UNIX System Services 848 Web Gateway and device management 910 Web Gateway installation 910 Web User Interface 923 Trusted Computing Base 964 tuning SIS 200 TWH.log 761–762 TWS See Tivoli Workload Scheduler types of classes 25 Types of ETLs Central Data Warehouse 759 data mart 759

U UDP 155 UDP broadcast 881 uname 155 Uninstall TEDW 781 UNIX 561, 621 unload depot 420 upcall 467 update count 418 upgrade Tivoli environment 212 User Interface Server 3.7 527 user login name 310 user_db 44

V validation policy 33, 306 task library 339, 341 allow root to run tasks 344 VB script 622 vdisp 101 view_config_info 888 vulnerability analysis 1021 vwlogger 768 vwserver 768

W wadminep 271, 273 WAN 425 warehouse pack 758, 761 warehouse target 780 wauthadmin 321, 982 WBEM 619 wbkupdb 95, 286–287, 289, 300 wchkdb 94–96, 98–99, 101–102, 111, 113, 183, 279, 377, 888 wchkdb -u 585 wchknode 94, 98, 111, 113–115 wclient 152 wconnect 364, 371 wcpcdrom 158, 266 wcrtadmin 309 wcrtgate 215 wcrtjob 343 wcrtrim 216, 447 wcrttask 343 wcrttask command 346 wcrttlib 343 wdel 331 wdelep 1004 wdeljob 343 wdelsched 349 wdeltask 343 wdepot describe 419 wdisconn 374 wdistrib 49–50 wdisttask 341, 343 wdmcmd 786 wdmcollect 786 wdmlseng 785 Web browser 872 Web Gateway installation troubleshooting 910 Web Gateway troubleshooting 910 Web objects 875 Web server configuration 764 WEB UI limitations for Software Distribution 4.1 872 Web User Interface client trace files 927 DISSE0082E error message 925 inventory scan problems 925 login problems 923 main components 873 Profile not found error 924 software package installation problems 925 tracing 925

tracing WEB UI plug-in 926 troubleshooting 923 TWG-MCOLLECT 927 unable to publish Web objects 924 Web-based Enterprise Management 619 WebSEAL 874, 876 WebSphere 872 webui.log 927 webui.trc 927 wedsched 349 wenblsched 349 wepupgrd 273 wgateway 96, 931 wgetadmin 101, 321 wgetallinst 355 wgetjob 343 wgetpolm 342 wgetsched 349–350 wgetsub 53 wgettask 339, 343 Wide Area Network See WAN widmap 317 WinCE 880 Windows .NET 992 Windows 95 1059 Windows 98 1059 Windows Management Instrumentation 621 see WMI 621 Windows NT 564 Administrator 306 Windows NT 4.0 Resource Kit 748 Windows Whistler Server 1010 Windows XP 992 Windows XP Professional systems 1010 WINS 1014 WINS database 1014 winstall 152, 216 winstlcf 152, 981, 1001 winstsp 427 winvfilter 453 winvpackage 453 winvrmnode 453 Wizard 881 wlcftap 1006 wldsp 420, 427 wln 321 wlocalhost 155, 996, 1004 wlocalhost.exe 1019

wlookup 28, 54, 73, 329, 331, 355, 359 Administrator 321 InterRegion 358 ManagedNode 301 wls 67, 331, 355, 359 /Library/BackupClient 285, 302 wlsconn 363, 373–375 wlsemsg 543 wlsinst 273 wlsnotif 327 wlspol 342 wlspolm 342 wlstlib 343 wmailhost 996 wmailhost.exe 1019 wmdist 419, 428, 437 wmsgbrowse 890 wmvspobj 888 Workaround for Windows 2000 768 Workaround for Windows NT 768 wpatch 152, 269, 271 wputmeth 33 wputpolm 342 wregister 190, 207, 332, 366 wrimtest 455 wrimtrace 443, 460, 563, 931 write-locks 370 wrm 321 wrmnode 98, 101, 113–114, 183 wrpt 412 wruninvquery 360–361 wrunjob 343 wruntask 343 wrunui.exe 1018 wscanner 937 wscanner.cfg 937 wschedjob 349 wserver 151 wsetadmin 321 wsetemsg 543 wsetesvrcfg 121 wsetjob 343 wsetrim 446 wsettap 979 wsettap.exe 1018 wsettask 343 wsis 193 wsndnotif 327 wstarthttpd 592

wstartsched 349 wstophttpd 592 wsupport 1019 wswdcfg 890 wsyncsp 883 wsyncsp.log 900 wtailnotif 327 wtaskabort 343 wtdbclear 543 wtdumper 543 wtll 343, 346 export file 346 wtrace 7, 65, 554 wuldsp 420 wuninst.log 203 wupdate 95, 332, 363–365, 375 wxpnotif 327 wxterm 344

X X11 server 761

Back cover

Troubleshooting Tivoli Using the Latest Features

Insider’s guide to Tivoli troubleshooting

Updated for post 3.6 Framework and applications

New troubleshooting functions included

This IBM Redbook is an update of the existing Tivoli Enterprise Internals and Problem Determination, SG24-2034 redbook. The material is revised and updated for Tivoli Management Framework and applications post Version 3.6. Some of the applications that are covered from the troubleshooting point of view in this redbook are:

Tivoli Management Framework and related concepts
Tivoli Enterprise Console
IBM Tivoli Monitoring
Tivoli Business Systems Manager
Tivoli Enterprise Data Warehouse
Tivoli Workload Scheduler
IBM Tivoli Configuration Manager
Tivoli Remote Control
IBM Tivoli Access Manager for Operating Systems

Another subject that is associated with troubleshooting is proper maintenance of your Tivoli environment, because proper Tivoli maintenance procedures eliminate many potential Tivoli problems. In addition to the troubleshooting information, this redbook briefly touches on some best practices information for maintaining your Tivoli environment, mostly from the troubleshooting perspective.

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.

For more information: ibm.com/redbooks

SG24-6614-00

ISBN 0738426911