Demonstrations, Exercises, Exercise Solutions - IBM Information Analyzer Essentials v11.5 (Course code KM803 ERC 2.0)

English Pages 164 Year 2016

Table of contents:
• Preface
• Unit 1: Information analysis overview
• Unit 2: Information Server overview
• Unit 3: Information Analyzer overview
• Unit 4: Information Analyzer setup
• Unit 5: Data Classes
• Unit 6: Column Analysis
• Unit 7: Data profiling techniques
• Unit 8: Table analysis
• Unit 9: Cross table analysis
• Unit 10: Baseline analysis
• Unit 11: Reporting and publishing results
• Unit 12: Data rules and metrics

Demonstrations, Exercises, Exercise Solutions

IBM Information Analyzer Essentials v11.5 Course code KM803 ERC 2.0

IBM Training

Preface

August, 2016 NOTICES This information was developed for products and services offered in the USA. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing IBM Corporation North Castle Drive, MD-NC119 Armonk, NY 10504-1785 United States of America The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. 
Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. TRADEMARKS IBM, the IBM logo, ibm.com and InfoSphere are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml. Adobe, the Adobe logo, are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. 
UNIX is a registered trademark of The Open Group in the United States and other countries. © Copyright International Business Machines Corporation 2016. This document may not be reproduced in whole or in part without the prior written permission of IBM. US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

© Copyright IBM Corp. 2007, 2016 Course materials may not be reproduced in whole or in part without the prior written permission of IBM.

Information analysis overview

Information Analyzer v11.5 © Copyright IBM Corporation 2016 Course materials may not be reproduced in whole or in part without the written permission of IBM.

Unit 1 Information analysis overview

Demonstration 1: Read case study
Purpose: Introduce the ChemCo data warehouse case study and describe the business requirements for the ChemCo Data Warehouse course project.

Task 1. Read case study.

Executive Summary
ChemCo Corporation is a leader in the wholesale chemical supply marketplace, providing its customers with a wide range of chemical intermediate manufacturing products, such as hexchloride, propanol, and ammonia. ChemCo Corporation has made the strategic decision to build a decision support system consisting of a central data warehouse that will in turn feed several analysis databases. A comprehensive understanding of the data that will source this data warehouse is critical for estimating the data cleansing and ETL programming effort required.

Company Stats:
• Name of Business: ChemCo Corporation
• Type: Chemical supply
• Organizational structure: 12 regional warehouses with corporate headquarters in Denver, Colorado

The Business Challenge
ChemCo wants to build a global, unified view of its product and customer data. To select a trusted system of record, ChemCo must first investigate data quality issues.

Source Systems and Issues
ChemCo Corporation has identified multiple data sources as feeds to the data warehouse. The potential source systems vary in data quality and use different methods for identifying customers. These issues are a serious concern to management, which would like to see a comprehensive plan for addressing them. The challenge is to identify rules for cleansing the data so that it provides consolidated views across all sources. Existing systems are:
• Customer Sales
• Inventory
• Finance

Data requirements:
• Customer name information is spread across free-form text fields. Business users would like to see it organized into specific fields.
• Remove all duplicate customer records.
• Establish a unique customer profile.
• Blank entries exist in some fields. Blanks and nulls (no value whatsoever) should be treated as invalid entries (the current systems do not treat them this way).
• Sales information must be accurate and conform to documented business rules, especially all computed data fields.

You have been assigned to the project in the role of Data Analyst and are charged with performing a Data Quality Assessment on the Sales data.

Results: You have been introduced to the ChemCo data warehouse case study. You have read the business requirements for the ChemCo Data Warehouse course project.
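
The data requirements about blanks, nulls, and duplicate customer records can be made concrete with a short Python sketch. It is illustrative only and does not describe how Information Analyzer works internally: the field names (CUSTID, NAME, CITY), the sample records, and the helper functions are all hypothetical. Duplicates are matched here on an exact key, whereas real customer matching usually needs fuzzier comparison of names and addresses.

```python
# Hypothetical customer records; field names and values are illustrative only.
CUSTOMERS = [
    {"CUSTID": "1001", "NAME": "Alpha Chemical", "CITY": "Denver"},
    {"CUSTID": "1002", "NAME": "   ", "CITY": None},
    {"CUSTID": "1001", "NAME": "Alpha Chemical", "CITY": "Denver"},
]


def is_invalid(value):
    """Treat nulls (None) and blank or whitespace-only strings as invalid."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def find_invalid_fields(record):
    """Return the names of the fields in one record that hold invalid values."""
    return [field for field, value in record.items() if is_invalid(value)]


def remove_duplicates(records, key_fields):
    """Keep only the first record seen for each combination of key_fields."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


print(find_invalid_fields(CUSTOMERS[1]))              # ['NAME', 'CITY']
print(len(remove_duplicates(CUSTOMERS, ["CUSTID"])))  # 2
```

Treating whitespace-only strings as blanks is a design choice in this sketch; the case study only states that blanks and nulls are invalid.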

Demonstration 2: Read project scenario
Purpose: Understand and describe the business and project requirements for the ChemCo Data Warehouse project.

Task 1. Read the ChemCo project approach.

1. A project team has been assembled to perform a Data Quality Assessment of the ChemCo data. Review the ChemCo Data Warehouse project plan and staff assignments. This reading demonstration describes the makeup of the project team and explains how the project is configured and staffed to support data analysis for the business case, simulating a real project.

2. The following ChemCo project definition establishes business requirements and identifies candidate source data.

ChemCo management has decided to use a project methodology consisting of five phases, in this order: Analysis, Design, Construction, Testing, and Implementation.

During the analysis phase, the project manager wants project roles assigned, user IDs created and given access to the software, potential source data identified and assessed, and a data warehouse data model created. You have two roles:
• InfoSphere Software Administrator (for this demonstration only)
• Data Analyst (for all remaining demonstrations)

Your project role is that of a Data Analyst. You have been asked to participate in source system assessment, test data design, and end-user acceptance testing; consequently, you will participate in all project phases. Your first task is to understand the project business requirements and then to perform a data assessment of the potential source data. Document the problems you discover and report them to the full project team, because your results will be used to assess data cleansing requirements. The source data has already been extracted with DataStage and stored in sequential flat files.

Results: You have read the business and project requirements for the ChemCo Data Warehouse project.

Demonstration 3: Review ChemCo data
Purpose: Become familiar with the ChemCo source data.

Task 1. Locate the ChemCo sequential files on the virtual machine.

1. The data files to be analyzed in this course are in the C:\CourseData\KM803Files\Chemco\Seq folder on your Windows VM. Open this folder and verify that 15 files are present: 11 have a .txt extension, 3 have an .rpt extension, and 1 has an .INI extension.

2. Using Notepad, open the CUSTOMER.txt file. Note that the first record is not data; it contains the column names for the CUSTOMER.txt file. The QETXT.INI file compensates for this with the FLN=1 parameter setting, which directs the ODBC driver to skip the first record when presenting source data to Information Analyzer.

3. Open the QETXT.INI file. QETXT.INI is an ODBC configuration file that describes the files in the sequential database directory. If you open it in a text editor, you can find the entry for the CUSTOMER.txt file described earlier. Note the file name, the first data line number switch, the delimiter, and the column definitions.
[Figure: A portion of QETXT.INI]
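
The effect of the FLN=1 setting described in the steps above (the header record is not presented as data) can be sketched in plain Python with the standard csv module. This is only an analogy for what the ODBC driver does: the sample text, its column names, and the comma delimiter are assumptions for illustration, not the actual layout of CUSTOMER.txt.

```python
import csv
import io

# Hypothetical sample in the shape described for CUSTOMER.txt: the first
# record holds column names rather than data. The column names, values, and
# comma delimiter are assumptions for illustration only.
SAMPLE_TEXT = "CUSTID,NAME,CITY\n1001,Alpha Chemical,Denver\n1002,Beta Solvents,Boulder\n"


def read_with_header_line(text, delimiter=","):
    """Split delimited text into (column_names, data_rows).

    The first record is consumed as column names and is not returned as
    data, which is the effect the course attributes to FLN=1.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    column_names = next(reader)  # first record: column names, not data
    data_rows = list(reader)     # every remaining record is data
    return column_names, data_rows


names, rows = read_with_header_line(SAMPLE_TEXT)
print(names)      # ['CUSTID', 'NAME', 'CITY']
print(len(rows))  # 2
```

In the course environment the driver setting does this for you, so no code is needed; the sketch only shows why a header record must not be profiled as if it were a data value.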

Results: You have become familiar with the ChemCo source data.

Information Server overview

Unit 2 Information Server overview

Demonstration 1: Information Server setup
Purpose: Use administrative functions within Information Server to add users, change reporting defaults, and modify report preferences. Describe the steps needed to log on to Information Server administration and view user IDs and their roles.

Before a user can log on to Information Analyzer, the Information Server administrator must set up a user ID and link it to the appropriate roles. This is the top level of the Information Server security architecture. This demonstration shows the background security infrastructure that controls user access to Information Server products.

Based on the project business requirements, the following people were assigned the role of Data Analyst:
• James Harris - user ID jharris
• Bob LeClair - user ID bleclair
• Joyce Weir - user ID jweir

Task 1. Information Server logon.

1. Log on to Information Server: double-click the IIS Server LaunchPad icon on the Windows desktop.

If the page does not open, you may need to restart the operating system (log on as student/student if prompted).

2. Click the Administration Console icon.

3. Enter your user name and password. The demonstrations in this course use student as both the user ID and the password.

Task 2. View users.

1. Click the Administration tab.
2. Expand the drop-down window labeled Users and Groups.
3. Click the option labeled Groups.
4. Verify that the IT group is present. If it is not, click the New Group link on the right side and add a group with a Principal ID and Name of IT. In the Roles section, under Suite and Suite Component, select the Roles check box to select all the roles. Then click the Save and Close button in the bottom right corner.
5. In the left pane, click Users.
6. Verify that the following users are present:
• jharris
• bleclair
• jweir
If these three users are not present, click the New User link on the right side to add them, and specify these credentials for each new user:
• jharris: User Name, Password, and Confirm Password are "jharris", First Name is "James", and Last Name is "Harris"
• bleclair: User Name, Password, and Confirm Password are "bleclair", First Name is "Bob", and Last Name is "LeClair"
• jweir: User Name, Password, and Confirm Password are "jweir", First Name is "Joyce", and Last Name is "Weir"
After you add each user, click the Save and Close button in the bottom right corner.

The role assignments give each person access to functions within the Information Server product suite but are not specific to any particular project. You will assign roles to these users in more detail in a later demonstration, when you create projects.

Task 3. Modify reporting.
Reports are used to communicate your data analysis findings to the entire project team. You will normally use the reporting functions in the Information Server client rather than in the Administration Console. However, some reporting controls are found only in the Administration Console, so the next steps show how to find and modify some report settings.

1. Click the Reporting tab.
2. Click the Preferences option.
3. Change the default expiration to expire after 2 days.
4. Click the Save button in the lower right portion of the window.
5. Click the Log Out button in the upper right portion of the window.

Results: You logged on to the server and viewed the users and groups defined in the system. You changed the reporting preferences.

Information Analyzer overview

Unit 3 Information Analyzer overview

Demonstration 1: Information Analyzer tour
Purpose: Take a guided tour through the Information Analyzer GUI. Navigate through Information Analyzer and locate the primary functions. The Information Analyzer GUI contains standard file menus as well as custom pillar menus.

Task 1. Log on to Information Analyzer.

1. Launch IBM InfoSphere Information Server Console from the desktop.

2. Log on with the user ID and password used in this course, student/student.
Note: If a red flag appears next to the Server text box, either you entered the wrong server name or Information Server is not running.

Task 2. Explore the user interface. The five pillar menus are located in the upper left portion of your screen.

1. Click each pillar menu. Some menus have options that are grayed out; most of these options can be performed only in the context of an open project.
I. Home pillar menu: used for product configuration. Note that all options are available even though no project has been selected.
II. Overview pillar menu: contains project-level properties and the dashboard; valid in a project context only.
III. Investigate pillar menu: used to start each investigation type; valid in a project context only.
IV. Develop pillar menu: Data Quality functions can be started here. Note: If you do not see the Data Quality entry, your user ID needs to have the Rules role assigned in the Information Server Administration Console.
V. Operate pillar menu: log and scheduling views that help with troubleshooting analysis jobs. A project context is not necessary; these functions can also be performed from the Information Server Web Console.

In addition to the pillar menus, Information Analyzer has file menus.

2. Click the Edit menu and then click Preferences.
3. Select the Web conferencing compatibility check box. This option controls the appearance of the Information Analyzer user interface during Internet presentations.
4. Select Show Analysis tab on Dashboard in the Information Analysis folder (if it is not already selected). Enabling this option influences your starting page when you open a project.
5. Click the Status Bar option under Select View, and then clear the Show activity animation in status bar check box. This removes a progress bar that normally appears during job execution.
6. Click OK to close the Preferences dialog.
7. Click the View menu and select the Palettes option. Note the four objects that should be checked.
8. If any palettes are unchecked, click each one in turn until every palette has a check mark. The History palette lets you go back to previous workspaces within a user session. Note the Palette tabs now visible in the left portion of the window (under the HOME menu). These tabs are handy when switching from one workspace to another.
9. Click the File menu. Note that you can create and delete projects.
10. Click the Help menu and then the Help option to view documentation.

11. Click the InfoSphere Information Analyzer link. If this link is not visible, click the Search link in the top right corner, type Information Analyzer in the search box, press Enter, and then click the IBM InfoSphere Information Analyzer link in the search results. The Information Analyzer documentation is further divided into various topics of interest. More documentation sources will be explored in a later demonstration.
12. Close IBM InfoSphere Information Server Console and all open windows.

Results: You navigated through Information Analyzer and located the primary functions. The Information Analyzer GUI contains standard file menus as well as custom pillar menus.

Information Analyzer setup

Unit 4 Information Analyzer setup

Demonstration 1: Configuring Information Analyzer
Purpose: This demonstration shows the product-level configuration settings for Information Analyzer. You will create an ODBC data source, set Information Analyzer configuration options, add the data store, import the ChemCo metadata, create the project, add users, and register interest in source data.

Task 1. Create an ODBC data source.

1. From the desktop, open the 32-bit ODBC manager by double-clicking the odbc admin 32 icon.
2. Click the System DSN tab.
3. Click the Add button.

4. In the Create New Data Source window, click the IBM TextFile driver.

5. Click Finish.
6. In the Data Source Name box, type Chemcoseq. Be sure to type Chemcoseq and not just Chemco.
7. In the Database Directory box, type the path to the sequential files (the ChemCo Seq folder you located in Unit 1).

8. Select the Column Names in First Line check box.

9. Click Test Connect. The test should report a successful connection.

10. Click OK to close the Test Connect dialog, and then click OK again. You are returned to the System DSN tab, where the new Chemcoseq data source is listed.

11. Click OK.

Task 2. Set Information Analyzer configuration options to enable data profiling jobs.

1. Double-click the IBM InfoSphere Information Server Console icon on the Windows desktop.
2. Log in to Information Server using student/student.
3. Click the Home pillar menu, open the Configuration branch, and then click the Analysis Settings option.

4. Click the Analysis Database tab. This is the database that will contain the results of your data analysis.

The analysis database, commonly referred to as the IADB, contains tables of column value histogram data. The IADB grows as more data is analyzed. You can update most options on this screen. However, the product requires that this database be accessible through both ODBC and JDBC, and the connections must be defined on the server, not the client. These ODBC and JDBC connections have already been created for you.

5. Click the Analysis Engine tab.

The analysis engine is the DataStage parallel engine. If a DataStage user name and password are used on this screen, they must correspond to a user name and password with proper DataStage credentials as defined in the Information Server Web Console. Do not change any settings; static credentials will work for these demonstrations. The DataStage Project entry is the name of the DataStage project in which all Information Analyzer analysis jobs are executed; by default, this is ANALYZERPROJECT. The Retain Scripts option determines whether job execution scripts are kept in the DataStage project directory after a job completes. Because you normally want the script deleted when the job runs successfully, this option is usually set to No. It can be overridden when an individual job is submitted for execution.

6. Click the Analysis Settings tab. These values are threshold settings that tell Information Analyzer how to handle various situations in data analysis. They can be overridden during data profiling review, and you will encounter them in later demonstrations.

7. Minimize Information Server Console.
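
Step 4 of Task 2 describes the IADB as holding column value histogram data, that is, a frequency distribution of the distinct values found in an analyzed column. The Python sketch below shows what such a distribution looks like for one column, using collections.Counter. It is not how Information Analyzer computes or stores its results, and the STATE column and its sample values are hypothetical.

```python
from collections import Counter


def column_frequency_distribution(values):
    """Return (value, count, percent) tuples for one column, most frequent first.

    None stands in for a null and "" for a blank, so both appear as
    distinct entries in the distribution.
    """
    total = len(values)
    if total == 0:
        return []
    counts = Counter(values)
    return [
        (value, count, round(100.0 * count / total, 1))
        for value, count in counts.most_common()
    ]


# Hypothetical values from a STATE column, for illustration only.
STATE_VALUES = ["CO", "CO", "TX", "", None, "CO", "TX"]

for value, count, percent in column_frequency_distribution(STATE_VALUES):
    print(repr(value), count, percent)
```

Keeping nulls and blanks as separate entries, rather than dropping them, makes them visible in the distribution, which matters for requirements such as ChemCo's rule that blanks and nulls are invalid.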

Task 3. Connecting Information Analyzer to the Source Data.

1. Double-click the Metadata Asset Manager icon on the Windows desktop.
2. Log in to Metadata Asset Manager using student/student.

3. Click the Import tab.
4. Click the New Import Area button.
5. Type Chemcoseq in the Import area name box.
6. Scroll down in the Select a Bridge or Connector box to the ODBC connector and select it.
7. Click Next.

8. Beside the Data connection box, click the Select data connection button.
9. In the Select a Data Connection window, click the New Data Connection button.
10. Enter Chemcoseq as the name.
11. Choose Chemcoseq in the Data source drop-down list, enter student/student in the Username and Password boxes, select the Save 'Password' check box, and then click OK.

The new connection is tested, and you are returned to the Create New Import Area window.

12. Click the Next button.
13. In the Create New Import Area window, click the Select existing asset button at the end of the Host system name box, and then choose IBMCLASS.
14. Click OK.
15. Click Next.
16. In the next window, type Chemcoseq in the Import Description box.
17. Ensure that Managed Import is selected, and then click Import.

Task 4. Importing metadata.
After you click Import at the end of the previous task, a window shows that the import is being processed and then displays its completion messages.

1. Click OK. You are returned to the Staged Imports tab.
2. Click the Analyze button, and then expand the Host folder to display the data files.

The statistics section shows the status of the assets in the import. Check that there are no Invalid Identities.

3. Click the Preview button. The new window also has a statistics section, but here some cell values are underlined.
4. Click one of the underlined cells to drill down into the details behind the cell's value.
5. When you have reviewed the details, click Close to return to this window.
6. Once you are satisfied that the import was successful and there are no errors, click the Share to Repository button, and then click Yes to confirm. This imports the assets into the repository.
7. Close Metadata Asset Manager.

Task 5. Creating projects.

1. Maximize Information Server Console.

2. Several methods can be used to create a new project. Click the drop-down arrow to the right of the pillar icons.


3. Click the New Project option.

4. Enter Chemco into the Name box, and choose Information Analyzer for the type.

5. Click OK. A project properties screen appears. Note its tabs.

6. Take a moment to visit each of the other tabs, and then return to the Details tab. If desired, you can assign Owner and Primary Contact information by clicking the associated icon, which browses the Information Server user list.


7. Click the Enable drill down security checkbox.

8. Click the Data Sources tab. This tab is used to register interest in a data source that already exists in the repository. Recall that you imported the Chemcoseq metadata into the repository in an earlier task.

9. Click the Add button in the lower-right portion of the screen.


10. Successively click the arrow buttons to reveal the Seq data source tables.

11. To select all tables in the Seq source, click the Seq object, and then click OK. You are returned to the project's Data Sources tab.

12. Verify that you have the following tables:


13. Click the Users tab, select student, and then select all the project roles.

14. Click the Browse button located in the lower portion of the screen.

15. Add user jharris to your project, and assign the Data Operator, Business Analyst, and DrillDown User roles.

16. Click the Save All button located in the lower-right portion of the screen.

17. Click the Analysis Settings tab. The parameters shown on this screen are used throughout the profiling analysis but can also be restricted in your project. Note the Select View panel on the left-hand portion of the screen. It defaults to Project view.


18. Click the Data Sources view in the Select View panel. Default values for various thresholds are displayed. These values determine when Information Analyzer will suggest certain analysis decisions.

19. Select the Vendor table and then click the Modify button in the lower-right portion of the screen. You will now see the Analysis Settings, but note you are placed on the Options view located in the upper-left portion of the window.


20. Click the 'Where clause' view, and enter a condition for the VENDORCODE column: VENDORCODE = ASCO.

Note: You can do this by clicking the Add Condition button in the lower-right portion of the screen and then double-clicking in the column cell and the value cell.

By completing the Where clause for the VENDOR table, you limit the IA analyses to only the data that the Where clause qualifies. This restriction applies only to the current project. By using the Where clause, you can enforce security by value. Threshold parameters can be set at the database, table, column, or even column value (using the Where clause) level.

21. Click OK, and notice that a red flag now appears next to the VENDOR table. This flag means that the analysis settings for the VENDOR table differ from the analysis settings for the project.

22. Since you do not actually want to restrict the records in the VENDOR table, repeat the process you used to create the condition, but remove the condition instead. Make no further changes.

23. Close Information Server Console.

Results: This demonstration showed you the configuration settings for Information Analyzer at the product level. You created the ODBC data source, set Information Analyzer configuration options, added the data store, imported the Chemco-defined metadata, created the project, added users, and registered interest in source data.
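Step 20 describes the Where clause as a row filter applied before profiling: the analysis sees only the rows that the condition qualifies. The minimal Python sketch below illustrates that idea on simple in-memory rows. It is a conceptual illustration only, not how Information Analyzer evaluates Where clauses; the vendor codes other than ASCO, the VENDORNAME column, and the helper functions are hypothetical.

```python
# Hypothetical VENDOR rows for illustration; only ASCO comes from the exercise.
vendor_rows = [
    {"VENDORCODE": "ASCO", "VENDORNAME": "Vendor A"},
    {"VENDORCODE": "BXTR", "VENDORNAME": "Vendor B"},
    {"VENDORCODE": "CMLX", "VENDORNAME": "Vendor C"},
]


def apply_where(rows, column, value):
    """Keep only the rows that satisfy the equality condition column = value."""
    return [row for row in rows if row.get(column) == value]


def distinct_values(rows, column):
    """Sorted distinct values that a column analysis would see for one column."""
    return sorted({row[column] for row in rows})


# Profiling runs only on the rows that pass the (hypothetical) Where clause.
analyzed = apply_where(vendor_rows, "VENDORCODE", "ASCO")
print(len(vendor_rows), len(analyzed))          # 3 1
print(distinct_values(analyzed, "VENDORCODE"))  # ['ASCO']
```

In this made-up data only one row qualifies, so any statistics computed afterward describe that single row rather than the whole table.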


Unit 5

Data Classes


Information Analyzer v11.5 © Copyright IBM Corporation 2016 Course materials may not be reproduced in whole or in part without the written permission of IBM.

Unit 5 Data Classes

Demonstration 1 IGC data classes

• Use IGC to examine the installed data classes

Data classes

© Copyright IBM Corporation 2016



Demonstration 1: IGC data classes

Purpose: This demonstration shows how to use IGC to examine data classes. A number of default data classes are installed automatically in IGC.

Task 1. Examine the installed data classes in IGC.

1. Log on to Information Governance Catalog from the IIS Server Launchpad, using student/student.

2. Select the Information Governance Catalog login, and log in as student/student.

3. From the drop-down menu, choose Information Assets > Data Classes.


4. The default installed data classes are listed in the left pane.

5. Select Country Code. The View Details pane on the right is populated with the details of the Country Code data class, including its type (in this case, Valid Values).


6. Click the twisty in the Definition box to see details about Country Code, including all the valid values.

7. Examine the other data classes until you have found examples of all three types of data classes: Valid Values, Regex, and Java class.

Results: This demonstration showed you how to use IGC to examine data classes. A number of default data classes are installed automatically in IGC.
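The three data class types differ in how they decide whether a value belongs to the class: a fixed list of valid values, a regular expression, or custom Java logic. The Python sketch below illustrates only that distinction; it is not the IGC or Information Analyzer API. The class names, the short country-code list, the ZIP-code pattern, and the Luhn-checksum function (standing in for a Java class) are all hypothetical examples.

```python
import re


def luhn_valid(value: str) -> bool:
    """Stand-in for custom Java logic: Luhn checksum over an all-digit string."""
    if not value.isdigit() or len(value) < 12:
        return False
    total = 0
    for position, ch in enumerate(reversed(value)):
        digit = int(ch)
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


# Hypothetical data classes, one of each type.
DATA_CLASSES = {
    "Country Code (sample)": frozenset({"US", "CA", "GB", "DE"}),  # Valid Values
    "US ZIP Code": re.compile(r"\d{5}(-\d{4})?"),                   # Regex
    "Credit Card Number": luhn_valid,                               # custom code
}


def matches(matcher, value: str) -> bool:
    """Dispatch on the kind of data class definition."""
    if isinstance(matcher, frozenset):
        return value in matcher
    if isinstance(matcher, re.Pattern):
        return matcher.fullmatch(value) is not None
    return matcher(value)


def classify(value: str) -> list:
    """Return the names of every data class the value satisfies."""
    return [name for name, m in DATA_CLASSES.items() if matches(m, value)]


print(classify("US"))                # ['Country Code (sample)']
print(classify("12345-6789"))        # ['US ZIP Code']
print(classify("4111111111111111"))  # ['Credit Card Number']
```

A valid-values list is the simplest to maintain, a regex captures a shape without enumerating values, and custom code is needed when validity depends on a computation such as a checksum.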


Demonstration 2 Familiarization with IA thin client

• Work with the IA thin client features




Demonstration 2: Familiarization with IA thin client

Purpose: Work with the new Information Analyzer thin client.

Task 1. Explore IA thin client.

1. Log on to Information Server using the IIS Server Launchpad.

2. Select Information Analyzer, and log in as student/student.

The thin client shows all existing Information Analyzer thick client projects:

3. Use Ctrl+ and Ctrl- to resize the cards. When done, press Ctrl+0 to reset the zoom to 100%.


4. Select the Find data tab at the top of the screen. You see all the metadata that was imported through IMAM and is used in any current Information Analyzer project:

5. Click the Sort icon to see the ways you can sort the data sets.

6. Click the Search icon to see the ways you can search and filter the data sets.

7. Examine the list of search options.


8. Search for the keyword 'ord' (without the quotes) by typing it in the Type text box.

9. Look at the names of the data sets returned. Do they all have 'ord' in the file name? The search matches file names, descriptions, and column names, so a data set can be returned even when its name does not contain the keyword. The upper left of the screen indicates that you are looking at a subset of your data sets.

10. Clear the search by clicking the red x.

A filter shows only the data sets that have any of the selected data classes (for example, 'email address'). To use filters, open the search pane as before.


11. Under Filters, expand Selected data class and uncheck Select all to clear all check boxes.

12. Check Code and then click Apply Filter.

13. You should see 4 data sets now that the filter has been applied:


14. Now apply an additional filter for 'Found data class' of Date. Apply this filter the same way you applied the 'Selected data class' filter.

15. How many data sets do you see now? Multiple filters are combined as an 'and' condition: a data set must satisfy every applied filter.

The search pane may cover the right-hand side of the data sets. Close it by clicking the x in the search pane.


16. Clear the filters by clicking the red x or selecting clear.

17. Close the search pane (if necessary).

Results: You worked with the new Information Analyzer thin client.
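This task shows two behaviors: a keyword search that matches data set names, descriptions, and column names, and filters that keep data sets containing any checked data class, with multiple filters combined as an 'and' condition. The Python sketch below models that logic on made-up data sets. The `DataSet` fields, the sample file names, and the helper functions are hypothetical and do not reflect how the thin client stores or queries metadata.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    # Hypothetical, simplified stand-in for a data set card in the thin client.
    name: str
    description: str
    columns: list
    selected_classes: set = field(default_factory=set)
    found_classes: set = field(default_factory=set)


def keyword_search(data_sets, keyword):
    """Case-insensitive match on the name, the description, or any column name."""
    kw = keyword.lower()
    return [
        ds for ds in data_sets
        if kw in ds.name.lower()
        or kw in ds.description.lower()
        or any(kw in col.lower() for col in ds.columns)
    ]


def apply_filters(data_sets, selected_any=None, found_any=None):
    """Within one filter any checked class matches (OR); across filters all must match (AND)."""
    result = []
    for ds in data_sets:
        if selected_any and not (ds.selected_classes & selected_any):
            continue
        if found_any and not (ds.found_classes & found_any):
            continue
        result.append(ds)
    return result


# Made-up catalog for illustration only (not the Chemco data).
catalog = [
    DataSet("orders_2016.csv", "Customer orders", ["ORDERNUM", "CUSTID", "ORDDATE"],
            {"Code"}, {"Code", "Date"}),
    DataSet("customers.csv", "Customer master", ["CUSTID", "NAME"], {"Code"}, {"Code"}),
    DataSet("suppliers.csv", "Supplier list", ["SUPPCODE", "NAME"], {"Code"}, set()),
    DataSet("items.csv", "Items sold per order", ["ITEMNUM", "DESCR"], set(), set()),
]

print([ds.name for ds in keyword_search(catalog, "ord")])
# ['orders_2016.csv', 'items.csv']
print([ds.name for ds in apply_filters(catalog, selected_any={"Code"})])
# ['orders_2016.csv', 'customers.csv', 'suppliers.csv']
print([ds.name for ds in apply_filters(catalog, selected_any={"Code"}, found_any={"Date"})])
# ['orders_2016.csv']
```

Here items.csv is returned by the 'ord' search only because its description contains "order", and adding a second filter can only shrink the result set.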


Column Analysis


Unit 6 Column Analysis

Demonstration 1 Column Analysis

• Run Column Analysis on tables
• Review results

Column analysis


Demonstration 1: Column Analysis

Purpose: This demonstration shows how to perform Information Analyzer column analysis. Column analysis examines data content at the column level within a record. This analysis is the first step in understanding your source data and will frequently reveal problems with data quality.

Task 1. Run Column Analysis for Customer and Vendor tables.

1. Log on to Information Analyzer through the Information Server Console, using student/student.

2. Select the ChemCo project from the Projects list. Recall that analysis functions are performed in the context of a project.

3. Double-click the ChemCo project to open it.

4. Notice the tabs. The Dashboard tab is on the top, with Details, Analysis, and Quality tabs underneath.

5. Click the Analysis tab. This tab lists the data that is registered to your project and summarizes the progress of your data profiling effort. (There is not much to show yet.)

6. From the pillar menu bar, click Investigate > Column Analysis.

7. Expand the Seq data source down to the files, and then select Customer.txt.

8. Click the Run Column Analysis option in the Tasks list located in the upper-right portion of the window.


9. On the right-hand portion of the screen, verify that the Run Now radio button is selected. (Do not click the Sample tab; you will learn more about this option later.)

10. Near the bottom-right portion of the window, use the drop-down menu to click Submit, and then click Submit again.

11. Hover the cursor near the bottom of the window until a pop-up screen appears.

12. Click the Details button to view job run statistics. An ActivityStatus panel appears.

If an error occurs, you are notified in the Status column. You would then research the source of the error, fix the problem, and rerun the job. When the job completes, the Status column displays Schedule Complete. When you select the job, a Summary panel on the right-hand side displays the details of that job run.


13. Click the Close button on the Summary panel. Note that the column statuses in the CUSTOMER.txt table are now set to Analyzed.

14. Run Column Analysis on ALL the remaining tables using the same steps as the Customer table.
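Column analysis computes per-column statistics such as those shown on the View Analysis Summary screen: row totals, cardinality, nullability, length, and format. The Python sketch below shows roughly how a few of these can be derived from raw values. It is a hypothetical illustration, not Information Analyzer's algorithm. In particular, the format notation (A for an uppercase letter, a for a lowercase letter, 9 for a digit, other characters kept as-is), counting cardinality over non-null values only, and the declared-length comparison are assumptions made for this example.

```python
from collections import Counter


def format_pattern(value: str) -> str:
    """Map each character to an assumed format symbol: A, a, 9, or the character itself."""
    symbols = []
    for ch in value:
        if ch.isdigit():
            symbols.append("9")
        elif ch.isalpha():
            symbols.append("A" if ch.isupper() else "a")
        else:
            symbols.append(ch)
    return "".join(symbols)


def profile_column(values):
    """Compute a few summary statistics for one column; None counts as null."""
    non_null = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "cardinality": len(set(non_null)),  # distinct non-null values (assumption)
        "nulls": len(values) - len(non_null),
        "nullable": len(non_null) < len(values),
        "max_length": max((len(v) for v in non_null), default=0),
        "formats": Counter(format_pattern(v) for v in non_null),
    }


def length_differs(profile, declared_length):
    """Hypothetical check: does the inferred length differ from the declared length?"""
    return profile["max_length"] != declared_length


# Hypothetical column values, not taken from the Chemco files.
codes = ["ASCO", "BRK1", "ASCO", None, "cm-22"]
summary = profile_column(codes)
print(summary["cardinality"], summary["nulls"], summary["max_length"])  # 3 1 5
print(summary["formats"].most_common(1))  # [('AAAA', 2)]
print(length_differs(summary, declared_length=10))  # True
```

A mismatch like the one flagged by `length_differs` is the kind of difference between inferred and declared properties that the review step examines.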

Task 2. Review Column Analysis for the Vendor table.

1. Click Investigate > Column Analysis.

2. In the Column Analysis tab, right-click the VENDOR table and click the Open Column Analysis option, or click the Open Column Analysis option in the Tasks list.

3. Take a few moments to review the information displayed on the View Analysis Summary panel. Note the red flags in the first of the detail columns. These flags indicate that the inferred properties for a column, as determined by Information Analyzer, differ from the formally declared column definitions (the metadata from the metadata import).

Note: If the View Analysis panel indicates that only 1 record was read from your vendor file, recall that in a previous demonstration you set a Where clause with the condition VENDORCODE = ASCO. If you forgot to remove that condition, this screen shows only one record. In that case, go back to the previous demonstration's instructions, remove the Where clause from the analysis settings, and then rerun column analysis for the VENDOR table.


For each column, the View Analysis Summary screen shows:

• Totals: rows, columns
• Cardinality
• Data Class
• Data Type
• Length
• Precision
• Scale
• Nullability
• Cardinality Type
• Format
• Review Status

4.

5.

>> and > and