Actian Matrix (Formerly ParAccel) - Architecture and SQL [1 ed.] 9781940540313

One of the most exciting new technologies is Columnar and one of the premier pioneers of this technology is Actian’s Mat


English Pages 697 Year 2015






The Tera-Tom Video Series

Lessons with Tera-Tom: Teradata Architecture and SQL Video Series. These exciting videos make learning and certification much easier.

Three ways to view them: 1. Safari (look up Coffing Studios) 2. CoffingDW.com (sign-up on our website) 3. Your company can buy them all for everyone to see (contact [email protected])

The Tera-Tom Genius Series

The Tera-Tom Genius Series consists of ten books. Each book is designed for a specific audience, and Teradata is explained to the level best suited for that audience. The books take a building-block approach: they start out simple, and each page builds upon the previous point. Order them all at www.CoffingDW.com.

Tera-Tom - Author of over 50 Books

Tera-Tom books have been the primary source of Teradata learning for over 20 years. They have helped to teach millions of people all aspects of Teradata. What people love the most about the Tera-Tom books is how easy they are to understand. They are so easy that a seven-year-old boy (raised by wolves) can understand them!

The Best Query Tool Works on all Systems

When you possess a tool like Nexus, you have access to every system in your enterprise! The Nexus Query Chameleon is the only tool that works on all systems. Its Super Join Builder allows for the ERwin Logical Model to be loaded, and then Nexus shows tables and views visually. It then guides users to show what joins to what. As users choose the tables and columns they want in their report, Nexus builds the SQL for them with each click of the mouse. Nexus was designed for Teradata and Hadoop, but works on all platforms. Nexus even converts table structures between vendors, so querying and managing multi-vendor platforms is transparent. Even if you only work with one system, you will find that the Nexus is the best query tool you have ever used. If you work with multiple systems, you will be even more amazed. Download a free trial at www.CoffingDW.com.

Trademarks and Copyrights

Matrix is a trademark of Actian. Microsoft Windows, Windows 2003 Server, SQL Server 2012, SQL Server Compact Edition, .NET, PDW, SQL Server, T-SQL, Azure SQL Data Warehouse and Azure Cloud are trademarks of Microsoft. Teradata, NCR, BYNET and SQL Assistant are registered trademarks of Teradata Corporation, Dayton, Ohio, U.S.A. IBM, DB2 and Netezza are registered trademarks of IBM Corporation. ANSI is a registered trademark of the American National Standards Institute. Ethernet is a trademark of Xerox. UNIX is a trademark of The Open Group. Linux is a trademark of Linus Torvalds. Java and Oracle are trademarks of Oracle. ParAccel is a trademark of ParAccel. Kognitio is a trademark of Kognitio. Greenplum is a trademark of EMC Corporation. Nexus Query Chameleon is a trademark of Coffing Data Warehousing.

Coffing Data Warehousing shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book or from the use of programs or program segments that are included. The manual is not a publication of Actian Corporation, nor was it produced in conjunction with Actian Corporation.

Copyright © October 2015 by Coffing Publishing. ISBN 978-1-940540-31-3. All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the publisher. No patent liability is assumed with respect to the use of information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, neither is any liability assumed for damages resulting from the use of information contained herein.

About Tom Coffing

Tom Coffing, better known as Tera-Tom, is the founder of Coffing Data Warehousing where he has been CEO for the past 20 years. Tom has written over 50 books on all aspects of Teradata, Netezza, Kognitio, Redshift, ParAccel, Vertica, SQL Server, and Greenplum. Tom has taught over 1,000 Teradata classes in places such as India, Africa, Europe, China, Malaysia, and throughout North America.

Tom is also the owner and designer of the Nexus Query Chameleon, the most sophisticated enterprise query tool in the industry. The Nexus works on all platforms, including Hadoop, converts table structures between all systems, and allows companies to load their ERwin logical model inside Nexus. The Nexus guides users like a GPS system. Users point and click on any table or view from any system, and they are guided to what joins to what. As users choose the columns they want on their report, the SQL is built automatically.

In high school, Tom was the first athlete from his school to ever place at state. He was selected by his school to represent them at Buckeye Boys State, and Tom was inducted into the first class of the Lakota High School Hall of Fame. At the University of Arizona and University of Nevada Las Vegas, Tom was a two-time All-American wrestler, Sophomore Athlete of the Year, and a two-time winner of the 1980 Olympic wrestling trials. Tom graduated with a Bachelor’s degree in Speech Communications. After college, Tom became a state and national champion speech winner for Toastmasters and won two orchid awards as an actor.

Tom is the proud father of three wonderful children and has been married for the past 32 years. You can contact Tom at 513 300-0341 or at [email protected].

About Ed Bernier

Ed Bernier has over 20 years’ experience working with databases, data warehousing and analytics. Most recently Ed spent 8 years at Netezza, inventor of the data warehouse appliance. Ed was a Netezza Systems Engineer and he was teamed up with a salesperson to provide pre- and post-sales support in the New England, Upstate NY and Eastern Canada Sales Territory. The Netezza appliance was one of the most exciting products Ed had ever sold at that point in his career. Ed was a pivotal part of IT history as Netezza continued to pioneer the computer industry with the first true data warehouse appliance.

Ed then spent two years as a Sr. Systems Engineer at ParAccel, which was eventually purchased by Actian. ParAccel was founded by one of the original co-founders of Netezza, Barry Zane. Ed once again played a key part in IT history as ParAccel became one of the first MPP columnar databases to run on a cluster of commodity x86 nodes. Amazon’s Redshift database is based on the ParAccel Columnar PADB database.

Ed currently works at WebAction, which has pioneered a real-time analytics platform that allows clients to run SQL and analytics against streaming data. His responsibilities include Proof of Concepts (POCs), presentations, whiteboard sessions, product training, software installations and upgrades, sales meetings, RFI and RFP responses, proposals and general sales and customer support activities. His academic credentials include: BS, Computer Science, Math at the University of Hartford; AAS, Electronics at the University of Hartford.

Ed is an evangelist of next generation analytical solutions for structured, semi-structured and unstructured data sources. Having spent over 25 years working with databases, data warehousing, business intelligence and analytical appliance technology and solutions, he understands the power of analytics to gain insight from the captured data. Ed believes that the answers for many of the struggles of society will come through data analysis.

Analyzing clinical, claim, and behavioral data should provide greater insights into health. Analyzing data generated by web browsing, Twitter, phone traffic and law enforcement databases will improve public safety and reduce crime and terrorism. Analyzing business data and combining this with external sources will provide companies ever greater insight into their customers, suppliers and internal processes, giving them the tools they need to compete and improve efficiencies. Promoting next generation analytical solutions is certainly an exciting and worthwhile pursuit, especially as new sources of data generated by sensors become more readily available.

Table of Contents

Chapter 1 – What is Columnar?
What is Parallel Processing?
The Basics of a Single Computer
Data in Memory is fast as Lightning
Parallel Processing Of Data
A Table has Columns and Rows
Rows are Placed Inside a Data Block
Moving Data Blocks is Like Checking In Luggage
Facts That Are Disturbing
Why Columnar?
Row Based Blocks vs. Columnar Based Blocks
As Row-Based Tables Get Bigger, the Blocks Split
Data Blocks Are Processed One at a Time Per Unit
Columnar Tables Store Each Column in Separate Blocks
Visualize the Data – Rows vs. Columns
The Architecture of Actian Matrix
Matrix has Linear Scalability
Distribution Styles
Distribution Key Where the Data is Unique
Another Way to Create a Table
Distribution Key Where the Data is Non-Unique
Even Distribution Key
Matching Distribution Keys for Co-Location of Joins
Big Table / Small Table Joins
Fact and Dimension Table Distribution Key Designs

Improving Performance By Defining a Sort Key
Sort Keys Help Group By, Order By and Window Functions
Each Block Comes With Metadata
How Data Might Look On A Slice
Question – How Many Blocks Move Into Memory?
Answer – How Many Blocks Move Into Memory?
Quiz – Master that Query With the Metadata
Answer to Quiz – Master that Query With the Metadata
The ANALYZE Command Collects Statistics
Matrix Automatically ANALYZES Some Create Statements
What is a Vacuum?
When is a Good Time to Vacuum?
The VACUUM Command Grooms a Table
The Matrix database catalog also needs periodic vacuuming and indexing
Database Limits
Creating a Database
Creating a User
Dropping a User
Inserting Into a Table
Renaming a Table or a Column
Adding and Dropping a Column to a Table

Chapter 2 – Best Practices for Table Design
Converting Table Structures to Actian Matrix
Converting Table Structures to Actian Matrix Finale
Best Practices for Designing Tables
Choose the Best Sort Key
Each Block Comes With Metadata
Creating a Sort Key

Sort Keys Help Group By, Order By and Window Functions
Choose a Great Distribution Key
Distribution Key Where the Data is Unique
Matching Distribution Keys for Co-Location of Joins
Big Table / Small Table Joins
Define Primary Key and Foreign Key Constraints
Primary Key and Foreign Key Examples
Use the Smallest Column Size When Creating Tables
Use Date/Time Data Types for Date Columns
Specify Redundant Predicates on the Sort Column
Setting the statement_timeout to Abort Long Queries

Chapter 3 – Systems Tables
Actian Matrix System Tables
Trouble Shooting Catalog Table pg_table_def
Seeing the System Tables in your Nexus Tree
Catalog Table pg_table_def
Checking Tables for Skew (Poor Distribution)
Checking All Statements That Used the Analyze Command
Checking Tables for Skew (Poor Distribution)
Checking for Details about the Last Copy Operation
Checking When a Table Has Last Been Analyzed
Checking For Column Information on a Table
System tables for troubleshooting data loads
Determining Whether a Query is Writing to Disk

Chapter 4 – Compression
Compression Types
Byte Dictionary Compression

Delta Encoding
Deflate Encoding - Lempel–Ziv–Oberhumer (LZO)
Mostly Encoding
Runlength encoding
Text255 and Text32k Encodings
Analyze Compression using xpx ‘complyze’
Analyze Results from xpx ‘complyze’
Copy

Chapter 5 - Temporary Tables
Create Table Syntax
Basic Temporary Table Examples
More Advanced Temporary Table Examples
Advanced Temporary Table Examples
Table Limits and CTAS
Performing a Deep Copy
Deep Copy Using the Original DDL
Deep Copy Using A CTAS
Deep Copy Using A Create Table LIKE
Deep Copy by Creating a Temp Table and Truncating Original
CREATING A Derived Table
The Three Components of a Derived Table
Naming the Derived Table
Aliasing the Column Names in the Derived Table
Visualize This Derived Table
Most Derived Tables Are Used To Join To Other Tables
Multiple Ways to Alias the Columns in a Derived Table
Our Join Example with a Different Column Aliasing Style
Column Aliasing Can Default for Normal Columns

CREATING a Derived Table using the WITH Command
Our Join Example With the WITH Syntax
WITH Statement That Uses a SELECT *
A WITH Clause That Produces Two Tables
The Same Derived Query shown Three Different Ways
Quiz - Answer the Questions
Answer to Quiz - Answer the Questions
Clever Tricks on Aliasing Columns in a Derived Table
A Derived Table lives only for the lifetime of a single query
An Example of Two Derived Tables in a Single Query
Connecting To Matrix Via Nexus
Connecting To Matrix Via Nexus
Connecting To Matrix Via Nexus
Connecting To Matrix Via Nexus

Chapter 6 – Explain
Three Ways to Run an EXPLAIN
EXPLAIN – Steps, Segments and Streams
EXPLAIN Terms for Scans and Joins
EXPLAIN Terms for Aggregation and Sorts
EXPLAIN Terms for Set Operators and Miscellaneous Terms
EXPLAIN Terms for Set Operators and Miscellaneous Terms
EXPLAIN Example and the Cost
EXPLAIN Example and the Rows
EXPLAIN Example and the Width
Simple EXPLAIN Example and the Costs
EXPLAIN Join Example Using DS_BCAST_INNER
EXPLAIN Join Example Using DS_DIST_NONE
EXPLAIN Showing DS_DIST_NONE Visually

EXPLAIN With a Warning
EXPLAIN for Ordered Analytics Such as CSUM
EXPLAIN for Scalar Aggregate Functions
EXPLAIN for HashAggregate Functions
EXPLAIN Using Limit, Merge and Sort
EXPLAIN Using a WHERE Clause Filter
EXPLAIN Using the Keyword Distinct
EXPLAIN for Subqueries

Chapter 7 - Basic SQL Functions
Finding the Current Schema on the Leader Node
Getting Things Setup in Your Search Path
Five Details You Need To Know About the Search_Path
Introduction
SELECT * (All Columns) in a Table
SELECT Specific Columns in a Table
Commas in the Front or Back?
Place your Commas in front for better Debugging Capabilities
Sort the Data with the ORDER BY Keyword
ORDER BY Defaults to Ascending
Use the Name or the Number in your ORDER BY Statement
Two Examples of ORDER BY using Different Techniques
Changing the ORDER BY to Descending Order
NULL Values sort First in Ascending Mode (Default)
NULL Values sort Last in Descending Mode (DESC)
Major Sort vs. Minor Sorts
Multiple Sort Keys using Names vs. Numbers
Sorts are Alphabetical, NOT Logical
Using A CASE Statement to Sort Logically

How to ALIAS a Column Name
A Missing Comma can by Mistake become an Alias
Comments using Double Dashes are Single Line Comments
Comments for Multi-Lines
Comments for Multi-Lines as Double Dashes Per Line
A Great Technique for Comments to Look for SQL Errors

Chapter 8 – The WHERE Clause
Using Limit to bring back a Sample
Using Limit with an Order By Statement
The WHERE Clause limits Returning Rows
Using a Column ALIAS throughout the SQL
Double Quoted Aliases are for Reserved Words and Spaces
Character Data needs Single Quotes in the WHERE Clause
Character Data needs Single Quotes, but Numbers Don’t
NULL means UNKNOWN DATA so Equal (=) won’t Work
Use IS NULL or IS NOT NULL when dealing with NULLs
NULL is UNKNOWN DATA so NOT Equal won’t Work
Use IS NULL or IS NOT NULL when dealing with NULLs
Using Greater Than or Equal To (>=)
AND in the WHERE Clause
Troubleshooting AND
OR in the WHERE Clause
Troubleshooting Or
Troubleshooting Character Data
Using Different Columns in an AND Statement
Quiz – How many rows will return?
Answer to Quiz – How many rows will return?
What is the Order of Precedence?

Using Parentheses to change the Order of Precedence
Using an IN List in place of OR
The IN List is an Excellent Technique
IN List vs. OR brings the same Results
Using a NOT IN List
A Technique for Handling Nulls with a NOT IN List
Another Technique for Handling Nulls with a NOT IN List
BETWEEN is Inclusive
NOT BETWEEN is Also Inclusive
LIKE command Underscore is Wildcard for one Character
LIKE Command Works Differently on Char Vs Varchar
The Ilike Command Is NOT Case Sensitive
Troubleshooting LIKE Command on Character Data
Introducing the TRIM Command
Quiz – What Data is Left Justified and what is Right?
Numbers are Right Justified and Character Data is Left
Answer – What Data is Left Justified and what is Right?
An Example of Data with Left and Right Justification
A Visual of CHARACTER Data vs. VARCHAR Data
Use the TRIM command to remove spaces on CHAR Data
Like and Your Escape Character of Choice
Like and the Default Escape Character
Similar To Operators
Similar To Operators
Similar To Example with Lower Case Letters
Similar To Example with Lower and Upper Case Letters
Similar To Example with Multiple Occurrences
Multiple Occurrences Must Be Consecutive

Chapter 9 – Distinct Vs Group By AND TOP
The Distinct Command
Distinct vs. GROUP BY
Quiz – How many rows come back from the Distinct?
Answer – How many rows come back from the Distinct?
TOP Command
TOP Command is brilliant when ORDER BY is Used!
What is the Difference between TOP and LIMIT?

Chapter 10 - Aggregation
Quiz – You calculate the Answer Set in your own Mind
Answer – You calculate the Answer Set in your own Mind
The 3 Rules of Aggregation
There are Five Aggregates
Quiz – How many rows come back?
Answer – How many rows come back?
Troubleshooting Aggregates
GROUP BY when Aggregates and Normal Columns Mix
GROUP BY delivers one row per Group
GROUP BY Dept_No or GROUP BY 1 the same thing
Limiting Rows and Improving Performance with WHERE
WHERE Clause in Aggregation limits unneeded Calculations
Keyword HAVING tests Aggregates after they are Totaled
Keyword HAVING is like an Extra WHERE Clause for Totals

Chapter 11 – Join Functions
A Two-Table Join Using Traditional Syntax
A two-table join using Non-ANSI Syntax with Table Alias
You Can Fully Qualify All Columns

A two-table join using ANSI Syntax
Both Queries have the same Results and Performance
Quiz – Can You Finish the Join Syntax?
Answer to Quiz – Can You Finish the Join Syntax?
Quiz – Can You Find the Error?
Answer to Quiz – Can You Find the Error?
Super Quiz – Can You Find the Difficult Error?
Answer to Super Quiz – Can You Find the Difficult Error?
Quiz – Which rows from both tables won’t Return?
Answer to Quiz – Which rows from both tables Won’t Return?
LEFT OUTER JOIN
LEFT OUTER JOIN Results
Left Outer Joins Compatible with Oracle
RIGHT OUTER JOIN
RIGHT OUTER JOIN Example and Results
Right Outer Joins Compatible with Oracle
FULL OUTER JOIN
FULL OUTER JOIN Results
Which Tables are the Left and which are the Right?
Answer - Which Tables are the Left and which are the Right?
INNER JOIN with Additional AND Clause
ANSI INNER JOIN with Additional AND Clause
ANSI INNER JOIN with Additional WHERE Clause
OUTER JOIN with Additional WHERE Clause
OUTER JOIN with Additional AND Clause
OUTER JOIN with Additional AND Clause Results
Quiz – Why is this considered an INNER JOIN?
The DREADED Product Join
The DREADED Product Join Results

The Horrifying Cartesian Product Join
The ANSI Cartesian Join will ERROR
Quiz – Do these Joins Return the Same Answer Set?
Answer – Do these Joins Return the Same Answer Set?
The CROSS JOIN
The CROSS JOIN Answer Set
The Self Join
The Self Join with ANSI Syntax
Quiz – Will both queries bring back the same Answer Set?
Answer – Will both queries bring back the same Answer Set?
Quiz – Will both queries bring back the same Answer Set?
Answer – Will both queries bring back the same Answer Set?
How would you Join these two tables?
An Associative Table is a Bridge that Joins Two Tables
Quiz – Can you Write the 3-Table Join?
Answer to Quiz – Can you Write the 3-Table Join?
Quiz – Can you Write the 3-Table Join to ANSI Syntax?
Answer – Can you Write the 3-Table Join to ANSI Syntax?
Quiz – Can you Place the ON Clauses at the End?
Answer – Can you Place the ON Clauses at the End?
The 5-Table Join – Logical Insurance Model
Quiz - Write a Five Table Join Using ANSI Syntax
Answer - Write a Five Table Join Using ANSI Syntax
Quiz - Write a Five Table Join Using Non-ANSI Syntax
Answer - Write a Five Table Join Using Non-ANSI Syntax
Quiz – Re-Write this putting the ON clauses at the END
Answer – Re-Write this putting the ON clauses at the END

Chapter 12 – Date Functions
Current_Date
TIMEOFDAY()
SYSDATE Returns a Timestamp with Microseconds
GETDATE Returns a Timestamp without Microseconds
Add or Subtract Days from a date
The ADD_MONTHS Command Returns a Timestamp
The ADD_MONTHS Command with Trunc Removes Time
ADD_MONTHS Command to Add 1-Year or 5-Years
Dateadd Function and Add_Months Function are Different
The EXTRACT Command
EXTRACT from DATES and TIME
EXTRACT with DATE and TIME Literals
EXTRACT of the Month on Aggregate Queries
The Datediff command
The Datediff Function on Column Data
The Date_Part Function Using a Date
The Date_Part Function Using a Time
Date_Part Abbreviations
The to_char command
Conversion Functions
Conversion Function Templates
Conversion Function Templates Continued
Formatting a Date
A Summary of Math Operations on Dates
Using a Math Operation to find your Age in Years
Date Related Functions
A Side Title example with Reserved Words as an Alias
Implied Extract of Day, Month and Year

DATE_PART Function
DATE_PART Function using an ALIAS
DATE_TRUNC Function
DATE_TRUNC Function using TIME
MONTHS_BETWEEN Function
MONTHS_BETWEEN Function in Action
ANSI TIME
ANSI TIMESTAMP
Matrix TIMESTAMP Function
Matrix TO_TIMESTAMP Function
Matrix NOW() Function
Matrix TIMEOFDAY Function
Matrix AGE Function
Time Zones
Setting Time Zones
Using Time Zones
Intervals for Date, Time and Timestamp
Using Intervals
Troubleshooting the Basics of a Simple Interval
Interval Arithmetic Results
A Date Interval Example
A Time Interval Example
A DATE Interval Example
A Complex Time Interval Example using CAST
A Complex Time Interval Example using CAST
The OVERLAPS Command
An OVERLAPS Example that Returns No Rows
The OVERLAPS Command using TIME
The OVERLAPS Command using a NULL Value

Chapter 13 – OLAP Functions ..... 374
CSUM ..... 375
CSUM – The Sort Explained ..... 376
CSUM – Rows Unbounded Preceding Explained ..... 377
CSUM – Making Sense of the Data ..... 378
CSUM – Making Even More Sense of the Data ..... 379
CSUM – The Major and Minor Sort Key(s) ..... 380
Reset with a PARTITION BY Statement ..... 381
PARTITION BY only Resets a Single OLAP not ALL of them ..... 382
ANSI Moving Window is Current Row and Preceding n Rows ..... 383
How ANSI Moving SUM Handles the Sort ..... 384
Quiz – How is that Total Calculated? ..... 385
Answer to Quiz – How is that Total Calculated? ..... 386
Moving SUM every 3-rows Vs a Continuous Average ..... 387
Partition by Resets an ANSI OLAP ..... 388
Moving Average ..... 389
The Moving Window is Current Row and Preceding ..... 390
How Moving Average Handles the Sort ..... 391
Quiz – How is that Total Calculated? ..... 392
Answer to Quiz – How is that Total Calculated? ..... 393
Quiz – How is that 4th Row Calculated? ..... 394
Answer to Quiz – How is that 4th Row Calculated? ..... 395
Moving Average every 3-rows Vs a Continuous Average ..... 396
Partition By Resets an ANSI OLAP ..... 397
RANK Defaults to Ascending Order ..... 398
Getting RANK to Sort in DESC Order ..... 399
RANK() OVER and PARTITION BY ..... 400
RANK() OVER And LIMIT ..... 401
PERCENT_RANK() OVER ..... 402

PERCENT_RANK() OVER with 14 rows in Calculation ..... 403
PERCENT_RANK() OVER with 21 rows in Calculation ..... 404
Quiz – What Causes the Product_ID to Reset? ..... 405
Answer to Quiz – What Causes the Product_ID to Reset? ..... 406
COUNT OVER for a Sequential Number ..... 407
Quiz – What caused the COUNT OVER to Reset? ..... 408
Answer to Quiz – What caused the COUNT OVER to Reset? ..... 409
The MAX OVER Command ..... 410
MAX OVER with PARTITION BY Reset ..... 411
The MIN OVER Command ..... 412
Quiz – Fill in the Blank ..... 413
Answer – Fill in the Blank ..... 414
The Row_Number Command ..... 415
Quiz – How did the Row_Number Reset? ..... 416
Quiz – How did the Row_Number Reset? ..... 417
Standard Deviation Functions Using STDDEV / OVER ..... 418
Standard Deviation Functions and STDDEV / OVER Syntax ..... 419
STDDEV / OVER Example ..... 420
VARIANCE / OVER Syntax ..... 421
Variance Functions Using VARIANCE / OVER ..... 422
Using VARIANCE with PARTITION BY Example ..... 423
Using FIRST_VALUE and LAST_VALUE ..... 424
Using FIRST_VALUE ..... 425
Using LAST_VALUE ..... 426
Using LAG and LEAD ..... 427
Using LEAD ..... 428
Using LEAD with an Offset of 2 ..... 429
Using LAG ..... 430
Using LAG with an Offset of 2 ..... 431

Chapter 14 – Temporary Tables ..... 433
CREATING A Derived Table ..... 434
The Three Components of a Derived Table ..... 435
Naming the Derived Table ..... 436
Aliasing the Column Names in The Derived Table ..... 437
Visualize This Derived Table ..... 438
Most Derived Tables Are Used To Join To Other Tables ..... 439
Multiple Ways to Alias the Columns in a Derived Table ..... 440
Our Join Example with a Different Column Aliasing Style ..... 441
Column Aliasing Can Default for Normal Columns ..... 442
CREATING a Derived Table using the WITH Command ..... 443
Our Join Example With the WITH Syntax ..... 444
WITH ..... 445
A WITH Clause That Produces Two Tables ..... 446
The Same Derived Query shown Three Different Ways ..... 447
Quiz - Answer the Questions ..... 448
Answer to Quiz - Answer the Questions ..... 449
Clever Tricks on Aliasing Columns in a Derived Table ..... 450
A Derived Table lives only for the lifetime of a single query ..... 451
An Example of Two Derived Tables in a Single Query ..... 452
Create Table Syntax ..... 453
Basic Temporary Table Examples ..... 454
More Advanced Temporary Table Examples ..... 455
Advanced Temporary Table Examples ..... 456
Performing a Deep Copy ..... 457
Deep Copy Using the Original DDL ..... 458
Deep Copy Using A CTAS ..... 459
Deep Copy Using A Create Table LIKE ..... 460
Deep Copy by Creating a Temp Table and Truncating Original ..... 461

Chapter 15 – Sub-query Functions ..... 463
An IN List is much like a Subquery ..... 464
An IN List Never has Duplicates – Just like a Subquery ..... 465
An IN List Ignores Duplicates ..... 466
The Subquery ..... 467
The Three Steps of How a Basic Subquery Works ..... 468
These are Equivalent Queries ..... 469
The Final Answer Set from the Subquery ..... 470
Quiz- Answer the Difficult Question ..... 471
Answer to Quiz- Answer the Difficult Question ..... 472
Should you use a Subquery or a Join? ..... 473
Quiz- Write the Subquery ..... 474
Answer to Quiz- Write the Subquery ..... 475
Quiz- Write the More Difficult Subquery ..... 476
Answer to Quiz- Write the More Difficult Subquery ..... 477
Quiz- Write the Subquery with an Aggregate ..... 478
Answer to Quiz- Write the Subquery with an Aggregate ..... 479
Quiz- Write the Correlated Subquery ..... 480
Answer to Quiz- Write the Correlated Subquery ..... 481
The Basics of a Correlated Subquery ..... 482
The Top Query always runs first in a Correlated Subquery ..... 483
Correlated Subquery Example vs. a Join with a Derived Table ..... 484
Quiz- A Second Chance to Write a Correlated Subquery ..... 485
Answer - A Second Chance to Write a Correlated Subquery ..... 486
Quiz- A Third Chance to Write a Correlated Subquery ..... 487
Answer - A Third Chance to Write a Correlated Subquery ..... 488
Quiz- Last Chance to Write a Correlated Subquery ..... 489
Answer – Last Chance to Write a Correlated Subquery ..... 490
Quiz- Write the NOT Subquery ..... 491

Answer to Quiz- Write the NOT Subquery ..... 492
Quiz- Write the Subquery using a WHERE Clause ..... 493
Answer - Write the Subquery using a WHERE Clause ..... 494
Quiz- Write the Subquery with Two Parameters ..... 495
Answer to Quiz- Write the Subquery with Two Parameters ..... 496
How the Double Parameter Subquery Works ..... 497
More on how the Double Parameter Subquery Works ..... 498
Quiz – Write the Triple Subquery ..... 499
Answer to Quiz – Write the Triple Subquery ..... 500
Quiz – How many rows return on a NOT IN with a NULL? ..... 501
Answer – How many rows return on a NOT IN with a NULL? ..... 502
How to handle a NOT IN with Potential NULL Values ..... 503
Using a Correlated Exists ..... 504
How a Correlated Exists matches up ..... 505
The Correlated NOT Exists ..... 506
The Correlated NOT Exists Answer Set ..... 507
Quiz – How many rows come back from this NOT Exists? ..... 508
Answer – How many rows come back from this NOT Exists? ..... 509

Chapter 16 – Substrings and Positioning Functions ..... 511
The TRIM Command trims both Leading and Trailing Spaces ..... 512
A Visual of the TRIM Command Using Concatenation ..... 513
Trim and Trailing is Case Sensitive ..... 514
How to TRIM Trailing Letters ..... 515
The SUBSTRING Command ..... 516
How SUBSTRING Works with NO ENDING POSITION ..... 517
Using SUBSTRING to move Backwards ..... 518
How SUBSTRING Works with a Starting Position of -1 ..... 519
How SUBSTRING Works with an Ending Position of 0 ..... 520

The POSITION Command finds a Letter's Position ..... 521
Quiz – Find that SUBSTRING Starting Position ..... 522
Answer to Quiz – Find that SUBSTRING Starting Position ..... 523
Using the SUBSTRING to Find the Second Word On ..... 524
Quiz – Why did only one Row Return? ..... 525
Answer to Quiz – Why Did only one Row Return? ..... 526
Concatenation ..... 527
Concatenation and SUBSTRING ..... 528
Four Concatenations Together ..... 529
Troubleshooting Concatenation ..... 530
Declaring a Cursor ..... 531

Chapter 17 – Interrogating the Data ..... 533
Quiz – What would the Answer be? ..... 534
Answer to Quiz – What would the Answer be? ..... 535
The NULLIF Command ..... 536
Quiz – Fill in the Blank Values in the Answer Set ..... 537
Answer to Quiz – Fill in the Blank Values in the Answer Set ..... 538
Quiz – Fill in the Answers for the NULLIF Command ..... 539
Quiz – Fill in the Answers for the NULLIF Command ..... 540
Quiz – Fill in the Answers for the NULLIF Command ..... 541
Quiz – Fill in the Answers for the NULLIF Command ..... 542
The ISNULL, NVL and COALESCE Commands ..... 543
The ISNULL, NVL and COALESCE Commands ..... 544
The ISNULL, NVL and COALESCE more examples ..... 545
The COALESCE Answer Set ..... 546
The Coalesce Quiz ..... 547
Answer – The Coalesce Quiz ..... 548
The Basics of CAST (Convert And STore) ..... 549

Some Great CAST (Convert And STore) Examples ..... 550
Some Great CAST (Convert And STore) Examples ..... 551
Some Great CAST (Convert And STore) Examples ..... 552
The Basics of the CASE Statements ..... 553
The Basics of the CASE Statement ..... 554
Valued Case Vs. A Searched Case ..... 555
Quiz - Valued Case Statement ..... 556
Answer - Valued Case Statement ..... 557
Quiz - Searched CASE Statement ..... 558
Answer - Searched CASE Statement ..... 559
Quiz - When NO ELSE is present in CASE Statement ..... 560
Answer - When NO ELSE is present in CASE Statement ..... 561
When an ELSE is present in CASE Statement ..... 562
Answer - When an ELSE is present in CASE Statement ..... 563
When an Alias is NOT used in a CASE Statement ..... 564
Answer - When an Alias is NOT used in a CASE Statement ..... 565
Combining Searched Case and Valued Case ..... 566
Nested Case ..... 567
Put a CASE in the ORDER BY ..... 568

Chapter 18 – View Functions ..... 570
Creating a Simple View to Restrict Sensitive Columns ..... 571
Creating a Simple View to Restrict Rows ..... 572
Creating a View to Join Tables Together ..... 573
You Select From a View ..... 574
Basic Rules for Views ..... 575
An ORDER BY Example Inside of a View ..... 576
An ORDER BY Inside of a View that is Queried Differently ..... 577
Creating a View with Ordered Analytics ..... 578

Creating a View with the TOP Command ..... 579
Creating a View with the LIMIT Command ..... 580
Altering a Table ..... 581
Altering a Table after a View has been Created ..... 582
A View that Errors after An ALTER ..... 583
Troubleshooting a View ..... 584
Updating Data in a Table through a View ..... 585

Chapter 19 – Set Operators Functions ..... 587
Rules of Set Operators ..... 588
INTERSECT Explained Logically ..... 589
INTERSECT Explained Logically ..... 590
UNION Explained Logically ..... 591
UNION Explained Logically ..... 592
UNION ALL Explained Logically ..... 593
UNION Explained Logically ..... 594
EXCEPT Explained Logically ..... 595
EXCEPT Explained Logically ..... 596
Minus Explained Logically ..... 597
Minus Explained Logically ..... 598
Testing Your Knowledge ..... 599
Answer - Testing Your Knowledge ..... 600
Testing Your Knowledge ..... 601
Answer - Testing Your Knowledge ..... 602
An Equal Amount of Columns in both SELECT List ..... 603
Columns in the SELECT list should be from the same Domain ..... 604
The Top Query handles all Aliases ..... 605
The Bottom Query does the ORDER BY (a Number) ..... 606
Great Trick: Place your Set Operator in a Derived Table ..... 607

Table of Contents UNION Vs UNION ALL ....................................................................................................................................... 608 A Great Example of how EXCEPT works ............................................................................................................ 609 Chapter 20 – Statistical Aggregate Functions........................................................................................................... 611 The Stats Table ....................................................................................................................................................... 612 STDDEV ................................................................................................................................................................ 613 Casting STDDEV_SAMP and SQRT (VAR_SAMP)........................................................................................... 614 The STDDEV_POP Function ................................................................................................................................ 615 A STDDEV_POP Example ................................................................................................................................... 616 The STDDEV_SAMP Function............................................................................................................................. 617 A STDDEV_SAMP Example ................................................................................................................................ 618 The VAR_POP Function ....................................................................................................................................... 619 A VAR_POP Example ........................................................................................................................................... 
620 The VAR_SAMP Function .................................................................................................................................... 621 A VAR_SAMP Function ....................................................................................................................................... 622 Chapter 21 – Nexus ................................................................................................................................................... 624 Nexus is Now Available on the Microsoft Azure Cloud ....................................................................................... 625 Nexus Queries Every Major System ...................................................................................................................... 626 Setup of Nexus is as easy as pie ............................................................................................................................. 627 Setup of Nexus is a Easy as 1, 2, 3 ........................................................................................................................ 628 Nexus Data Visualization ....................................................................................................................................... 629 Nexus Data Visualization ....................................................................................................................................... 630 Nexus Data Visualization Shows What Tables Can Be Joined ............................................................................. 631 Nexus is doing a Five-Table Join ........................................................................................................................... 632 Nexus Generates the SQL Automatically .............................................................................................................. 
633 Nexus Delivers the Report ..................................................................................................................................... 634 Cross-System Joins from Teradata, Oracle and SQL Server ................................................................................. 635 The Tab of the Super Join Builder ......................................................................................................................... 636

Table of Contents The 9 Tabs of the Super Join Builder – Objects Tab 1 .......................................................................................... 637 Selecting Columns in the Objects Tab ................................................................................................................... 638 The 9 Tabs of the Super Join Builder – Columns Tab 2........................................................................................ 639 Removing Columns from the Report in the Columns Tab .................................................................................... 640 The 9 Tabs of the Super Join Builder – Sorting Tab 3 .......................................................................................... 641 The 9 Tabs of the Super Join Builder – Joins Tab 4 .............................................................................................. 642 The 9 Tabs of the Super Join Builder – Where Tab 5 ........................................................................................... 643 Using the WHERE Tab For Additional WHERE or AND .................................................................................... 644 The 9 Tabs of the Super Join Builder – SQL Tab 6............................................................................................... 645 The 9 Tabs of the Super Join Builder – Answer Set Tab 7 ................................................................................... 646 The 9 Tabs of the Super Join Builder – Analytics Tab 9 ....................................................................................... 647 Analytics Tab ......................................................................................................................................................... 648 Analytics Tab – OLAP Example ........................................................................................................................... 
649 Analytics Tab – OLAP Example of SQL Generated ............................................................................................. 650 Analytics Tab – Grouping Sets Example ............................................................................................................... 651 Analytics Tab – Grouping Sets Answer Set .......................................................................................................... 652 Nexus Data Movement ........................................................................................................................................... 653 Moving a Single Table To a Different System ...................................................................................................... 654 The Single Table Data Movement Screen ............................................................................................................. 655 Moving an Entire Database To a Different System ............................................................................................... 656 The Database Mover Screen .................................................................................................................................. 657 The Database Mover Options Tab ......................................................................................................................... 658 Converting DDL Table Structures ......................................................................................................................... 659 Converting DDL Table Structures ......................................................................................................................... 660 Converting DDL Table Structures ......................................................................................................................... 
661 Hound Dog Compression ....................................................................................................................................... 662 Hound Dog Compression on Teradata ................................................................................................................... 663 Hound Dog Compression on Teradata ................................................................................................................... 664

Chapter 1

What is Columnar?


“When you go into court, you are putting your fate into the hands of twelve people who weren’t smart enough to get out of jury duty.” – Norm Crosby

What is Parallel Processing?

"After enlightenment, the laundry" - Zen Proverb

Tera-Tom's Parallel Processing Wash and Dry

"After parallel processing the laundry, enlightenment!" -Matrix Zen Proverb

Two guys were having fun on a Saturday night when one said, “I’ve got to go and do my laundry.” The other said, “What!?” The first man explained that if he went to the laundromat the next morning, he would be lucky to get one machine and would be there all day. But if he went on Saturday night, he could get all the machines. Then he could do all his wash and dry in two hours. Now that's parallel processing mixed in with a little dry humor!

The Basics of a Single Computer

[Figure: a single computer with a CPU, memory, and a disk that holds the Orders table. The question asked is "How are we doing on orders today?" The disk replies, "How would I know? I'm just a disk. I need to transfer the block of data to the memory, and that is a slow process."]

Orders

Order_No  Customer_No  Order_Date  Order_Total
100       21345679     01/01/2013  12347.53
200       32456733     01/01/2013  8005.91
300       31323134     01/01/2013  5111.47
400       87323456     01/01/2013  15231.62

“When you are courting a nice girl, an hour seems like a second. When you sit on a red-hot cinder, a second seems like an hour. That’s relativity.” – Albert Einstein

Data on disk does absolutely nothing. When data is requested, the computer moves the data one block at a time from disk into memory. Once the data is in memory, it is processed by the CPU at lightning speed. All computers work this way. The "Achilles Heel" of every computer is the slow process of moving data from disk to memory. The real theory of relativity is to find out how to get blocks of data from the disk into memory faster!

Data in Memory is Fast as Lightning

[Figure: the four-row Orders block (Order_No 100 through 400) appears twice, once on disk and once in memory, where the CPU processes it.]

“You can observe a lot by watching.” – Yogi Berra

Once the data block is moved off of the disk and into memory, the processing of that block happens as fast as lightning. It is the movement of the block from disk into memory that slows down every computer. Data being processed in memory is so fast that even Yogi Berra couldn't catch it!

Parallel Processing of Data

[Figure: four parallel processes, each with its own memory and its own disk. Each disk holds a four-row portion of the Orders table, and each portion is also shown loaded into that process's memory.]

Parallel process  Cust_No   Order_Date  Order_Total
1                 21345679  01/01/2013  12347.53
1                 32456733  01/01/2013  8005.91
1                 31323134  01/01/2013  5111.47
1                 87323456  01/01/2013  15231.62
2                 34345699  01/01/2013  13347.51
2                 41456543  01/01/2013  13005.91
2                 51323154  01/01/2013  7611.57
2                 67823486  01/01/2013  11671.92
3                 87945679  01/01/2013  8347.53
3                 98756733  01/01/2013  17005.91
3                 35623134  01/01/2013  3451.47
3                 97873456  01/01/2013  19871.62
4                 44445679  01/01/2013  12447.53
4                 32547733  01/01/2013  8055.66
4                 57497134  01/01/2013  5651.47
4                 87768956  01/01/2013  231.62

"If the facts don't fit the theory, change the facts."

-Albert Einstein

Big Data is all about parallel processing. Parallel processing is all about taking the rows of a table and spreading them among many parallel processing units. Above, we can see a table called Orders. There are 16 rows in the table, and each of the four parallel processes holds four rows. Now they can process the data in parallel and be up to four times as fast. What Albert Einstein meant to say was, “If the theory doesn't fit the dimension table, change it to a fact.”
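The idea of spreading rows among parallel units can be sketched in a few lines of Python. This is only an illustration, not Matrix code: the helper names `spread_rows` and `unit_total` and the round-robin placement are assumptions made for the example, while the sixteen Order_Total values come from the figure.

```python
from concurrent.futures import ThreadPoolExecutor

# Order_Total values for the 16 rows of the Orders table in the figure.
ORDER_TOTALS = [
    12347.53, 8005.91, 5111.47, 15231.62,
    13347.51, 13005.91, 7611.57, 11671.92,
    8347.53, 17005.91, 3451.47, 19871.62,
    12447.53, 8055.66, 5651.47, 231.62,
]

def spread_rows(rows, units):
    """Deal rows out to `units` parallel units (hypothetical round-robin rule)."""
    portions = [[] for _ in range(units)]
    for i, row in enumerate(rows):
        portions[i % units].append(row)
    return portions

def unit_total(portion):
    """The work one unit does, looking only at the rows it holds."""
    return sum(portion)

portions = spread_rows(ORDER_TOTALS, units=4)

# Each unit totals its own four rows at the same time as the others.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_totals = list(pool.map(unit_total, portions))

# The four partial answers are combined into the final answer.
grand_total = sum(partial_totals)
print([len(p) for p in portions])
print(round(grand_total, 2))
```

Each unit touches only a quarter of the rows, which is where the speedup in the text comes from.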

A Table has Columns and Rows

Emp_No  Dept_No  First_Name  Last_Name  Salary
1001    100      Rafael      Minal      90000
1002    200      Maria       Gomez      80000
1003    300      Charl       Kertzel    70000
1004    400      Kyle        Stover     60000
1005    400      Rob         Rivers     50000
1006    300      Inna        Kinski     50000
1007    200      Sushma      Davis      50000
1008    100      Mo          Khan       60000
1009    300      Mo          Swartz     70000

[Figure: three parallel processes, each holding three rows of Employee_Table: Emp_No 1001 to 1003, Emp_No 1004 to 1006, and Emp_No 1007 to 1009.]

The table above has 9 rows. Our small system above has three parallel processing units. Each unit holds three rows.

Rows are Placed Inside a Data Block

[Figure: three parallel processes, each with its own memory and a disk holding one data block of Employee_Table rows.]

The rows of a table are stored on disk in a data block. Above, you can see we have four rows in each data block. Think of the data block as a suitcase you might take to the airport (without the $50 fee).

Moving Data Blocks is Like Checking In Luggage

[Figure: three parallel processes, each with its own memory and an Employee_Table data block on disk. Callout: "Please put your data block on the scale (inside memory)."]

To a computer, the data block on disk is as heavy as a large suitcase. It is difficult and cumbersome to lift.

Facts That Are Disturbing

“Life is not the candle or the wick, it's the burning.” – David Joseph Schwartz

Emp_No  Dept_No  First_Name  Last_Name  Salary
1001    100      Rafael      Minal      90000
1002    200      Maria       Gomez      80000
1003    300      Charl       Kertzel    70000
1004    400      Kyle        Stover     60000
1005    400      Rob         Rivers     50000
1006    300      Inna        Kinski     50000
1007    200      Sushma      Davis      50000
1008    100      Mo          Khan       60000
1009    300      Mo          Swartz     70000

The data block above has 9 rows and five columns. If someone requested to see Rob Rivers’ salary, the entire data block would still have to move into memory. Then, a salary of 50000 would be returned. That is a lot of heavy lifting just to analyze one row and return one column. It is just like burning an entire candle just because you need a flicker of light!

Why Columnar?

“Everyone is kneaded out of the same dough but not baked in the same oven.” – Yiddish Proverb

[Figure: the same nine rows stored as five data blocks, one block per column.]

Emp_No block:      1001 1002 1003 1004 1005 1006 1007 1008 1009
Dept_No block:     100 200 300 400 400 300 200 100 300
First_Name block:  Rafael Maria Charl Kyle Rob Inna Sushma Mo Mo
Last_Name block:   Minal Gomez Kertzel Stover Rivers Kinski Davis Khan Swartz
Salary block:      90000 80000 70000 60000 50000 50000 50000 60000 70000

Each data block holds a single column. The row can be rebuilt because everything is aligned perfectly. If someone runs a query that would return the average salary, then only one small data block is moved into memory. The salary block moves into memory where it is processed as fast as lightning. Because only one of the five column blocks moves, we just cut down on moving large blocks by 80%! Why columnar? Because, like our Yiddish Proverb says, "All data is not kneaded on every query, so that is why it costs so much dough."

Row Based Blocks vs. Columnar Based Blocks

“Two roads diverged in a wood and I took the one less traveled by, and that has made all the difference.” – Robert Frost

[Figure: the same table shown twice, once as a row based block and once in the columnar design.]

Both designs have the same amount of data. Both take up just as much space. In this example, both have 9 rows and five columns. If a query needs to analyze all of the rows or return most of the columns, then the row based design is faster and more efficient. However, if the query only needs to analyze a few rows or merely a few columns, then the columnar design is much lighter because not all of the data is moved into memory. Just one or two columns move. Take the road less traveled.

As Row-Based Tables Get Bigger, the Blocks Split

[Figure: three parallel processes, each with its own memory and Employee_Table data blocks on disk that have split as the table grew.]

When you go on vacation for two weeks, you might pack a lot of clothes. As a result, you may need to take two suitcases. A data block can only get so big before it is forced to split; otherwise it might not fit into memory.

Data Blocks Are Processed One at a Time Per Unit

[Figure: three units, each with its own memory and Employee_Table data blocks on disk. On each unit, the waiting block says, "We’re next."]

At the airport luggage counter, each bag needs to be weighed. You put bag one on first, and then after it is processed, you put on bag two. That is how the processing of data blocks happens: one data block at a time.

Columnar Tables Store Each Column in Separate Blocks

[Figure: three parallel processes, each with its own memory and a disk that stores every column in its own block. Each unit is labeled "AVG Salary".]

This is the same data you saw on the previous page! The difference is that the above is a columnar design. I have color coded this for you. There are 8 rows in the table and five columns. Notice that the entire row stays on the same disk, but each column is a separate block. This is a brilliant design for ad hoc queries and analytics because when only a few columns are needed, columnar can move just the columns it needs to. Columnar is hard to beat for queries that touch only a few columns because the blocks are so much smaller, and what isn't needed isn't moved.

Visualize the Data – Rows vs. Columns

[Figure, top: 24 rows (five columns) stored in 6 blocks in this row-based system]

[Figure, bottom: 24 rows (five columns) stored in 15 blocks (each column is its own block)]

Both examples above have the same data and the same amount of data. If your applications tend to need to analyze the majority of columns or read the entire table, then a row-based system (top example) can move more data into memory. Columnar tables are advantageous when only a few columns need to be read. This is just one of the reasons that analytics goes with columnar like bread goes with butter. A row-based system must move the entire block into memory even if it only needs to read one row or even a single column. If a user above needed to analyze the Salary, the columnar system would move 80% less block mass.
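The 80% figure can be checked with a little arithmetic. The sketch below assumes, for simplicity, that every column takes up the same amount of space and that the 15 columnar blocks are split evenly among the five columns; the function name is made up for this example.

```python
def fraction_moved_columnar(column_blocks_total, columns, columns_needed):
    """Share of the table's blocks a columnar system must move into memory.

    Assumes equal-sized columns, so each column owns the same number of blocks.
    """
    blocks_per_column = column_blocks_total // columns
    return (blocks_per_column * columns_needed) / column_blocks_total

# Columnar example from the figure: 15 blocks, five columns, only Salary needed.
moved = fraction_moved_columnar(15, 5, 1)

# A row-based system moves every block, so its share is 1.0 by definition.
saved = 1.0 - moved
print(f"Columnar moves {moved:.0%} of the block mass, saving {saved:.0%}")
```

Asking for two columns instead of one raises the share moved to 6 of 15 blocks, so the saving shrinks as queries touch more columns.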

The Architecture of Actian Matrix

The Leader Node manages the distribution of data and builds the plan for the compute nodes to follow.

[Figure: a Leader Node above Compute Node 1 through Compute Node n. Each compute node is divided into slices.]

“Be the change that you want to see in the world.” - Mahatma Gandhi

The leader node is the brains behind the entire operation. The user logs into the leader node, and for each SQL query, the leader node will come up with a plan to retrieve the data. It passes that compiled plan to each compute node, and each slice processes its portion of the data. If the data is spread evenly, parallel processing works perfectly. This technology is relatively inexpensive. It might not "be the change", but it will help your company "keep the change" because costs are low.
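As a rough mental model of that division of labor, here is a short Python sketch of an AVG(Salary) request. It is not how Matrix is implemented: `slice_step` and `leader_combine` are hypothetical names, and the three lists simply reuse the salaries from the nine-row employee table, three rows per slice.

```python
def slice_step(salaries):
    """Run by every slice on its own rows only: return a partial (sum, count)."""
    return sum(salaries), len(salaries)

def leader_combine(partials):
    """Run once by the leader: merge the partial results into the final answer."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

# Salary values held by three slices (three rows each).
slices = [
    [90000, 80000, 70000],
    [60000, 50000, 50000],
    [50000, 60000, 70000],
]

# The same step is handed to every slice; each returns only a tiny partial result.
partials = [slice_step(rows) for rows in slices]
average_salary = leader_combine(partials)
print(partials)
print(round(average_salary, 2))
```

Note that each slice sends back a sum and a count rather than its own average, because averaging the averages would only be correct when every slice holds the same number of rows.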

Matrix has Linear Scalability

[Figure: a Leader Node connected over the network to compute nodes, each made up of slices.]

"A journey of a thousand miles begins with a single step." - Lao Tzu

Actian Matrix was born to be parallel. With each query, a single step is performed in parallel by each slice. A Matrix system consists of a series of slices that work in parallel to store and process your data. This design allows you to start small and grow infinitely. If your Matrix system provides you with an excellent Return On Investment (ROI), then continue to invest by purchasing more nodes, which adds more slices. Most companies start small, but after seeing what Matrix can do, they continue to grow their ROI from the single step of implementing a Matrix system to millions of dollars in profits. Double your slices and double your speeds... forever. Matrix actually provides a journey of a thousand smiles!

Distribution Styles

1. KEY distribution - The rows are distributed according to the values in one column. The leader node places matching values on the same node slice. If you distribute a pair of tables on the joining keys, the leader node co-locates the rows on the slices according to the values in the joining columns. Now, matching values from the common columns are physically stored together. This is extremely important for table joins.

2. EVEN distribution - The rows are distributed across the slices in a round-robin fashion, regardless of the values in any particular column. EVEN distribution is appropriate when a table does not participate in joins or when there is not a clear choice between KEY distribution and ALL distribution. EVEN distribution is the default distribution style.

Matrix gives you two great choices to distribute your tables. If you have two tables that are being joined together a lot and they are about the same size, then you want to give them both the same distribution key as the join key. This co-locates the matching rows on the same slice. Two rows being joined together must be on the same slice (or Matrix will move one or both of the rows temporarily to satisfy the join requirement). If you join two tables a lot, but one table is really big and the other is small, then you want to have the small table distributed by EVEN. Use your distribution key to ensure joins happen faster, but also use it to spread the data as evenly among the slices as possible.
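The two styles can be compared with a small Python sketch. It assumes a three-slice system and uses a simple modulo as a stand-in for the real hash function, which the text does not describe; `key_slice`, `distribute_key`, and `distribute_even` are hypothetical names. The (Emp_No, Dept_No) pairs are the nine employee rows used throughout this chapter.

```python
from collections import Counter
from itertools import cycle

SLICES = 3  # assumed size of the small example system

# (Emp_No, Dept_No) pairs for the nine employee rows.
EMPLOYEES = [
    (1001, 100), (1002, 200), (1003, 300),
    (1004, 400), (1005, 400), (1006, 300),
    (1007, 200), (1008, 100), (1009, 300),
]

def key_slice(value, slices=SLICES):
    """Stand-in for the real hash: equal values always map to the same slice."""
    return value % slices

def distribute_key(rows, key_index, slices=SLICES):
    """KEY style: the value in one column decides the slice for each row."""
    return [key_slice(row[key_index], slices) for row in rows]

def distribute_even(rows, slices=SLICES):
    """EVEN style: round-robin placement that ignores every column value."""
    return [s for s, _ in zip(cycle(range(slices)), rows)]

by_emp_no = Counter(distribute_key(EMPLOYEES, 0))   # unique key
by_dept_no = Counter(distribute_key(EMPLOYEES, 1))  # non-unique key
even = Counter(distribute_even(EMPLOYEES))

print("KEY on Emp_No :", sorted(by_emp_no.items()))
print("KEY on Dept_No:", sorted(by_dept_no.items()))
print("EVEN          :", sorted(even.items()))
```

With this stand-in hash, the unique key and EVEN both put three rows on every slice, while the non-unique key leaves one slice with four rows and another with two. The exact placement on a real system will differ, but the trade-off is the same: KEY keeps like values together, EVEN guarantees balance.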

Distribution Key Where the Data is Unique

CREATE TABLE Employee_table
(Emp_No      INTEGER NOT NULL,
 Dept_No     SMALLINT NULL,
 Last_name   CHAR(20) NULL,
 First_name  VARCHAR(12) NULL)
DISTKEY(Emp_No);

Emp_No  Dept_No  First_Name  Last_Name
1001    100      Rafael      Minal
1002    200      Maria       Gomez
1003    300      Charl       Kertzel
1004    400      Kyle        Stover
1005    400      Rob         Rivers
1006    300      Inna        Kinski
1007    200      Sushma      Davis
1008    100      Mo          Khan
1009    300      Mo          Swartz

[Figure: three slices, each labeled DISTKEY, holding three rows apiece.]

Slice  Emp_No  Dept_No  First_Name  Last_Name
1      1001    100      Rafael      Minal
1      1008    100      Mo          Khan
1      1009    300      Mo          Swartz
2      1002    200      Maria       Gomez
2      1007    200      Sushma      Davis
2      1005    400      Rob         Rivers
3      1003    300      Charl       Kertzel
3      1006    300      Inna        Kinski
3      1004    400      Kyle        Stover

The entire row of a table is on a slice, but each column in the row is in a separate container (block). A Unique Distribution Key spreads the rows of a table evenly across the slices. A good Distribution Key is the key to good distribution!

Another Way to Create a Table

CREATE TABLE Employee_table
(Emp_No      integer not null distkey sortkey,
 Dept_No     smallint null,
 Last_name   char(20) null,
 First_name  varchar(12) null);

[Figure: the same nine Employee_table rows on three slices. Each slice is labeled distkey and sortkey, and the rows on each slice are in Emp_No order.]

Slice  Emp_No  Dept_No  First_Name  Last_Name
1      1001    100      Rafael      Minal
1      1008    100      Mo          Khan
1      1009    300      Mo          Swartz
2      1002    200      Maria       Gomez
2      1005    400      Rob         Rivers
2      1007    200      Sushma      Davis
3      1003    300      Charl       Kertzel
3      1004    400      Kyle        Stover
3      1006    300      Inna        Kinski

We have chosen the Emp_No column as both the distribution key and the sort key. We can control both!

Distribution Key Where the Data is Non-Unique

CREATE TABLE Employee_table
(Emp_No      INTEGER NULL,
 Dept_No     SMALLINT NULL,
 Last_name   CHAR(20) NULL,
 First_name  VARCHAR(12) NULL)
DISTKEY(Dept_No);

[Figure: the nine Employee_table rows on three slices, each labeled DISTKEY. Rows with the same Dept_No sit on the same slice.]

Slice  Emp_No  Dept_No  First_Name  Last_Name
1      1001    100      Rafael      Minal
1      1008    100      Mo          Khan
2      1002    200      Maria       Gomez
2      1007    200      Sushma      Davis
2      1004    400      Kyle        Stover
2      1005    400      Rob         Rivers
3      1003    300      Charl       Kertzel
3      1006    300      Inna        Kinski
3      1009    300      Mo          Swartz

The data did not spread evenly among the slices for this table. Do you know why? The Distribution Key is Dept_No. All like values went to the same slice. This distribution isn't perfect, but it is reasonable, so it is an acceptable practice.
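One way to put a number on "isn't perfect, but it is reasonable" is a simple skew ratio: the busiest slice's row count divided by the average. The measure and the function name `skew` are illustrative assumptions, not a Matrix feature; the row counts are read off the figure (two, four, and three rows).

```python
def skew(rows_per_slice):
    """Busiest slice divided by the average slice; 1.0 means perfectly even."""
    average = sum(rows_per_slice) / len(rows_per_slice)
    return max(rows_per_slice) / average

# Row counts per slice when the table is distributed on Dept_No (non-unique).
dept_no_skew = skew([2, 4, 3])

# Row counts per slice when the table is distributed on Emp_No (unique).
emp_no_skew = skew([3, 3, 3])

print(round(dept_no_skew, 2), round(emp_no_skew, 2))
```

Since the slices work in parallel, the slice holding the most rows tends to finish last, so a ratio close to 1.0 is the goal.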

Even Distribution Key

CREATE TABLE Employee_table
(Emp_No      INTEGER NULL,
 Dept_No     SMALLINT NULL,
 Last_name   CHAR(20) NULL,
 First_name  VARCHAR(12) NULL)
diststyle even;

[Figure: the nine Employee_table rows on three slices, three rows per slice.]

Slice  Emp_No  Dept_No  First_Name  Last_Name
1      1001    100      Rafael      Minal
1      1008    100      Mo          Khan
1      1009    300      Mo          Swartz
2      1002    200      Maria       Gomez
2      1007    200      Sushma      Davis
2      1004    400      Kyle        Stover
3      1003    300      Charl       Kertzel
3      1006    300      Inna        Kinski
3      1005    400      Rob         Rivers

The data spread evenly among the slices for this table. Do you know why? The table was created with diststyle even, so the rows were dealt out to the slices in round-robin fashion, regardless of the values in any particular column. Each slice holds three rows.

Matching Distribution Keys for Co-Location of Joins

CREATE TABLE Employee_table
(Emp_No      INTEGER NULL,
 Dept_No     INTEGER NULL,
 Last_name   CHAR(20) NULL,
 First_name  VARCHAR(12) NULL)
DISTKEY(Dept_No);

CREATE TABLE Department_table
(Dept_No    INTEGER NULL,
 Dept_Name  CHAR(20) NULL,
 Mgr_No     INTEGER,
 Budget     DECIMAL(10,2))
DISTKEY(Dept_No);

[Figure: three slices. Each slice holds the Employee_Table rows and the Department_Table rows that share the same Dept_No values.]

Employee_Table

Slice  Emp_No  Dept_No  First_Name  Last_Name
1      1001    100      Rafael      Minal
1      1008    100      Mo          Khan
2      1002    200      Maria       Gomez
2      1007    200      Sushma      Davis
2      1004    400      Kyle        Stover
2      1005    400      Rob         Rivers
3      1003    300      Charl       Kertzel
3      1006    300      Inna        Kinski
3      1009    300      Mo          Swartz

Department_Table

Slice  Dept_No  Dept_Name  Mgr_No  Budget
1      100      Fin        1008    90000
2      200      HR         1002    500000
2      400      IT         1005    600000
3      300      Mrkt       1006    500000

Notice that both tables are distributed on Dept_No. When these two tables are joined on Employee_table.Dept_No = Department_table.Dept_No, the rows with matching department numbers are on the same slice. This is called co-location. It makes joins efficient and fast.
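Co-location can be demonstrated with a short Python sketch. The modulo in `place` is again only a stand-in for the real hash, and the helper names are hypothetical. The point is that when both tables are placed by Dept_No, every slice can join its own rows without looking at any other slice.

```python
SLICES = 3  # assumed size of the small example system

# (Emp_No, Dept_No, Last_name) and (Dept_No, Dept_Name) rows from the figure.
EMPLOYEES = [
    (1001, 100, "Minal"), (1002, 200, "Gomez"), (1003, 300, "Kertzel"),
    (1004, 400, "Stover"), (1005, 400, "Rivers"), (1006, 300, "Kinski"),
    (1007, 200, "Davis"), (1008, 100, "Khan"), (1009, 300, "Swartz"),
]
DEPARTMENTS = [(100, "Fin"), (200, "HR"), (300, "Mrkt"), (400, "IT")]

def place(rows, key_index):
    """Put each row on the slice chosen by its distribution key (stand-in hash)."""
    slices = [[] for _ in range(SLICES)]
    for row in rows:
        slices[row[key_index] % SLICES].append(row)
    return slices

def local_join(emps, depts):
    """Join employees to departments using only the rows handed in."""
    names = dict(depts)
    return [(emp_no, last, names[dept_no])
            for emp_no, dept_no, last in emps if dept_no in names]

emp_slices = place(EMPLOYEES, 1)      # Employee_table DISTKEY(Dept_No)
dept_slices = place(DEPARTMENTS, 0)   # Department_table DISTKEY(Dept_No)

# Every slice joins only its own rows; nothing moves between slices.
joined = [row for s in range(SLICES)
          for row in local_join(emp_slices[s], dept_slices[s])]
print(len(joined))
```

All nine employees find their department locally. If the two tables were placed on different columns, some matches would sit on different slices and rows would have to move before the join could finish.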

Big Table / Small Table Joins

[Figure: three slices. Each slice holds a portion of Employee_Table, labeled DISTKEY, and a portion of Department_Table, labeled EVEN.]

Employee_Table (DISTKEY)

Slice  Emp_No  Dept_No  First_Name  Last_Name
1      1001    100      Rafael      Minal
1      1008    100      Mo          Khan
1      1009    300      Mo          Swartz
2      1002    200      Maria       Gomez
2      1007    200      Sushma      Davis
2      1005    400      Rob         Rivers
3      1003    300      Charl       Kertzel
3      1006    300      Inna        Kinski
3      1004    400      Kyle        Stover

Department_Table (EVEN)

Slice  Dept_No  Dept_Name  Mgr_No  Budget
1      100      Fin        1008    90000
1      400      IT         1005    600000
2      200      HR         1002    500000
2      500      Sales      1004    700000
3      300      Mrkt       1006    500000
3      600      Mfg        1007    800000

Notice that the Department_Table has only six rows. Those six rows are evenly distributed across every slice. This is distributed by EVEN. When two joining tables have one large table (fact table) and one small table (dimension table), then use the EVEN keyword to distribute the smaller table. This will force the smaller table to be redistributed rather than the large table.
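A back-of-the-envelope count shows why it is better for the small table to be the one that moves. The sketch below uses a deliberately simple cost model that is an assumption of this example, not a description of the Matrix optimizer: it counts the row copies needed to make one table available on every slice, given that each row starts on exactly one slice.

```python
def copies_to_reach_every_slice(row_count, slices):
    """Row copies needed so that every slice can see every row of the table.

    Assumes each row starts on exactly one slice, so it must be copied to
    the other (slices - 1) slices.
    """
    return row_count * (slices - 1)

SLICES = 3

# Moving the six-row Department_Table versus the nine-row Employee_Table.
move_small = copies_to_reach_every_slice(6, SLICES)
move_large = copies_to_reach_every_slice(9, SLICES)

# The gap grows with the size of the big table (a hypothetical million-row fact table).
move_fact = copies_to_reach_every_slice(1_000_000, SLICES)

print(move_small, move_large, move_fact)
```

With only nine and six rows the difference is small, but against a fact table with millions of rows, moving the dimension table is far cheaper.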

Fact and Dimension Table Distribution Key Designs

[Figure: a star schema with Line_Order_Fact_Table in the center, joined to four dimension tables. Callout: "Make the Part_Key the Distribution Key for the two largest tables."]

Line_Order_Fact_Table columns: LO_Order_Key, LO_Line_Number, LO_Cust_Key, LO_Part_Key, LO_Ship_Priority, LO_Quantity, LO_Extended_Price, LO_Supp_Key, LO_Order_Total_Price, LO_Discount, LO_Tax, LO_Order_Date, LO_Supply_Cost, LO_Revenue, LO_Ship_Mode

Table                  Key column   Distribution
Line_Order_Fact_Table  LO_Part_Key  Distribution key
Part_Table             P_Part_Key   Distribution key
Customer_Table         C_Cust_Key   EVEN
Supplier_Table         S_Supp_Key   EVEN
Date_Table             D_Date_Key   EVEN

The fact table (Line_Order_Fact_Table) is the largest table, but the Part_Table is the largest dimension table. That is why you make Part_Key the distribution key for both tables. Now, when these two tables are joined together, the matching Part_Key rows are on the same slice. You can then distribute by EVEN on the other dimension tables, which will ensure that the data in these smaller tables is spread evenly across all the nodes in the cluster.

Improving Performance By Defining a Sort Key

CREATE TABLE Order_Table
(Order_Number     INTEGER NULL,
 Customer_Number  INTEGER NULL,
 Order_Date       DATE NULL sortkey,
 Order_Total      DECIMAL(10,2) NULL)
DISTKEY(Order_Number);

[Figure: Slice 1, Slice 2, and Slice 3 each hold Order_Table blocks sorted by Order_Date: JAN FEB, then MAR APR, then MAY JUN.]

There are three basic reasons to use the sortkey keyword when creating a table.

1) If recent data is queried most frequently, specify the timestamp or date column as the leading column for the sort key.
2) If you do frequent range filtering or equality filtering on one column, specify that column as the sort key.
3) If you frequently join a (dimension) table, specify the join column as the sort key.

Above, you can see we have made our sortkey the Order_Date column. Look how the data is sorted!

Sort Keys Help Group By, Order By and Window Functions

CREATE TABLE Order_Table
(Order_Number     INTEGER NULL,
 Customer_Number  INTEGER NULL sortkey,
 Order_Date       DATE NULL,
 Order_Total      DECIMAL(10,2) NULL)
DISTKEY(Order_Number);

SELECT Customer_Number
      ,SUM(Order_Total) as "Order Sum"
      ,AVG(Order_Total) as "Avg Order"
FROM Order_Table
GROUP BY Customer_Number
ORDER BY Customer_Number ;

SELECT Customer_Number
      ,Order_Date
      ,Order_Total
      ,SUM(Order_Total) OVER (Partition By Customer_Number
                              Order By Customer_Number, Order_Date
                              Rows Unbounded Preceding) as "Cumulative Sum"
FROM Order_Table ;

When data is sorted on a strategic column, it improves GROUP BY and ORDER BY operations and window functions (PARTITION BY and ORDER BY operations), and it can even help optimize compression. But as new rows are incrementally loaded, these new rows are sorted among themselves yet reside temporarily in a separate region on disk. In order to maintain a fully sorted table, you need to run the VACUUM command at regular intervals. You will also need to run ANALYZE.


Each Block Comes With Metadata

Query: SELECT * FROM Order_Table WHERE Order_Total < 300.00

Metadata in slice memory, checked against Order_Total < 300.00:
Order_Total   Min Value 231.62   Max Value 12447.53

Order_Table
Order_No  Cust_No   Order_Date  Order_Total
________  ________  __________  ___________
103       44445679  01/01/2014  12447.53
107       32547733  01/15/2014   8055.66
111       57497134  02/12/2014   5651.47
115       87768956  03/17/2014    231.62

Actian Matrix stores columnar data in 1 MB disk blocks by default. The size can be reduced if queries typically include many columns. The min and max values for each block are stored as part of the metadata. If a range-restricted column is a sort key, the query processor is able to use the min and max values to rapidly skip over large numbers of blocks during table scanning. Where most databases use indexes to determine where data is, Matrix uses the block's metadata to determine where data is NOT! Our query above is looking for data WHERE Order_Total < 300. The block's minimum Order_Total (231.62) is below 300, so the metadata shows this block may contain qualifying rows, and therefore it will be moved into memory for processing. Each slice has metadata for each of the blocks it owns.


How Data Might Look On A Slice

Block 1 metadata: Min 1/1/2014, Max 3/13/2014
Block 2 metadata: Min 3/14/2014, Max 5/24/2014

Block 1 holds one Order_Date value per day, in load order, from 1/1/2014 through 3/13/2014.
Block 2 continues with one Order_Date value per day from 3/14/2014 through 5/24/2014.

Matrix allocates 1 MB per block when a table begins loading. When a block is filled, another is allocated. I want you to imagine that we created a table that had only one column, and that column was Order_Date. On January 1st, data was loaded. Notice in the examples that as data is loaded, it continues to fill until the block reaches 1 MB. The Order_Date is ordered (because as each day is loaded, it fills up the next slot). Then, notice how the metadata has the min and max Order_Date. The metadata is designed to inform Matrix whether this block should be read when this table is queried. If a query is looking for data in April, then there is no reason to read block 1 because April falls outside of its min/max range.
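The block-filling and block-skipping idea described here can be sketched in a few lines of Python. This is only a toy model: the function names are hypothetical, and the 72-rows-per-block limit is an assumption standing in for "until the block reaches 1 MB". It is not how Matrix is implemented internally.

```python
from datetime import date, timedelta

def build_blocks(values, rows_per_block):
    """Fill blocks in load order and record each block's min/max metadata."""
    blocks = []
    for start in range(0, len(values), rows_per_block):
        rows = values[start:start + rows_per_block]
        blocks.append({"min": min(rows), "max": max(rows), "rows": rows})
    return blocks

def blocks_to_read(blocks, low, high):
    """Return indexes of blocks whose min/max range overlaps [low, high]."""
    return [i for i, b in enumerate(blocks)
            if b["max"] >= low and b["min"] <= high]

# One Order_Date per day from 1/1/2014 through 5/24/2014, loaded in date order.
first_day = date(2014, 1, 1)
order_dates = [first_day + timedelta(days=n) for n in range(144)]

# Assumption: 72 rows fill one block (a stand-in for the 1 MB limit).
blocks = build_blocks(order_dates, rows_per_block=72)

# A query for April only needs the second block (index 1).
april = blocks_to_read(blocks, date(2014, 4, 1), date(2014, 4, 30))
print(april)
```

With these assumptions the first block covers 1/1/2014 to 3/13/2014 and the second covers 3/14/2014 to 5/24/2014, so the April query skips the first block entirely.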


Question – How Many Blocks Move Into Memory?

Block metadata held in each slice's memory (min-max per column):

Slice  Order_No  Cust_No            Order_Date             Order_Total
1      100-112   21345679-87323456  01/01/2013-01/01/2013  5111.47-15231.62
2      101-113   34345699-67823486  01/01/2013-01/01/2013  7611.57-13347.51
3      102-114   35623134-98756733  01/01/2013-01/01/2013  3451.47-19871.62
4      103-115   32547733-87768956  01/01/2013-01/01/2013  231.62-12447.53

Orders rows stored on each slice:

Slice 1
Order_No  Cust_No   Order_Date  Order_Total
100       21345679  01/01/2013  12347.53
104       32456733  01/01/2013   8005.91
108       31323134  01/01/2013   5111.47
112       87323456  01/01/2013  15231.62

Slice 2
Order_No  Cust_No   Order_Date  Order_Total
101       34345699  01/01/2013  13347.51
105       41456543  01/01/2013  13005.91
109       51323154  01/01/2013   7611.57
113       67823486  01/01/2013  11671.92

Slice 3
Order_No  Cust_No   Order_Date  Order_Total
102       87945679  01/01/2013   8347.53
106       98756733  01/01/2013  17005.91
110       35623134  01/01/2013   3451.47
114       97873456  01/01/2013  19871.62

Slice 4
Order_No  Cust_No   Order_Date  Order_Total
103       44445679  01/01/2013  12447.53
107       32547733  01/01/2013   8055.66
111       57497134  01/01/2013   5651.47
115       87768956  01/01/2013    231.62

SELECT * FROM Orders WHERE Order_Total < 250.00

Looking at the SQL and the metadata, how many blocks will need to be moved into memory?


Answer – How Many Blocks Move Into Memory?

SELECT * FROM Orders WHERE Order_Total < 250.00

Block metadata held in each slice's memory (min-max per column):

Slice  Order_No  Cust_No            Order_Date             Order_Total       Block moved into memory?
1      100-112   21345679-87323456  01/01/2013-01/01/2013  5111.47-15231.62  No
2      101-113   34345699-67823486  01/01/2013-01/01/2013  7611.57-13347.51  No
3      102-114   35623134-98756733  01/01/2013-01/01/2013  3451.47-19871.62  No
4      103-115   32547733-87768956  01/01/2013-01/01/2013  231.62-12447.53   Yes

Block moved into memory on Slice 4:

Order_No  Cust_No   Order_Date  Order_Total
103       44445679  01/01/2013  12447.53
107       32547733  01/01/2013   8055.66
111       57497134  01/01/2013   5651.47
115       87768956  01/01/2013    231.62

Only one block moves into memory. The metadata shows that only the last slice has an Order_Total range (231.62 to 12447.53) that can contain a value below 250.00. Only that slice moves its block into memory.


Quiz – Master that Query With the Metadata

Block metadata held in each slice's memory (min-max per column):

Slice  Order_No  Cust_No            Order_Date             Order_Total
1      100-112   21345679-87323456  01/01/2013-01/01/2013  5111.47-15231.62
2      101-113   34345699-67823486  01/01/2013-01/01/2013  7611.57-13347.51
3      102-114   35623134-98756733  01/01/2013-01/01/2013  3451.47-19871.62
4      103-115   32547733-87768956  01/01/2013-01/01/2013  231.62-12447.53

Each slice holds the same four Orders rows as in the previous example.

SELECT * FROM Orders WHERE Order_Total < 800.00 ;

SELECT * FROM Orders WHERE Order_No =115 ;

SELECT * FROM Orders WHERE Cust_No = 51330045;

Looking at the SQL and the metadata, how many blocks will need to be moved into memory for each query?


Answer to Quiz – Master that Query With the Metadata

Block metadata held in each slice's memory (min-max per column):

Slice  Order_No  Cust_No            Order_Date             Order_Total
1      100-112   21345679-87323456  01/01/2013-01/01/2013  5111.47-15231.62
2      101-113   34345699-67823486  01/01/2013-01/01/2013  7611.57-13347.51
3      102-114   35623134-98756733  01/01/2013-01/01/2013  3451.47-19871.62
4      103-115   32547733-87768956  01/01/2013-01/01/2013  231.62-12447.53

Query                                               Blocks moved into memory
SELECT * FROM Orders WHERE Order_Total < 800.00 ;   1
SELECT * FROM Orders WHERE Order_No =115 ;          1
SELECT * FROM Orders WHERE Cust_No = 51330045;      4

Above are your answers. Only the last slice has an Order_Total minimum below 800.00, and only its Order_No range reaches 115, but 51330045 falls inside the Cust_No range of all four slices.
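The quiz answers can be checked mechanically. The sketch below is a toy model in Python, not Matrix code: the dictionary simply copies the per-slice min and max values from the quiz, and the helper names are hypothetical.

```python
# Block metadata per slice, copied from the quiz: column -> (min, max).
METADATA = {
    1: {"Order_No": (100, 112), "Cust_No": (21345679, 87323456),
        "Order_Total": (5111.47, 15231.62)},
    2: {"Order_No": (101, 113), "Cust_No": (34345699, 67823486),
        "Order_Total": (7611.57, 13347.51)},
    3: {"Order_No": (102, 114), "Cust_No": (35623134, 98756733),
        "Order_Total": (3451.47, 19871.62)},
    4: {"Order_No": (103, 115), "Cust_No": (32547733, 87768956),
        "Order_Total": (231.62, 12447.53)},
}

def may_contain(min_max, operator, value):
    """True when the block's min/max range cannot rule the predicate out."""
    low, high = min_max
    if operator == "<":
        return low < value
    if operator == "=":
        return low <= value <= high
    raise ValueError(f"unsupported operator: {operator}")

def slices_that_read(column, operator, value):
    """List the slices that must move their block into memory."""
    return [slice_no for slice_no, columns in METADATA.items()
            if may_contain(columns[column], operator, value)]

total_lt_800 = slices_that_read("Order_Total", "<", 800.00)
order_115 = slices_that_read("Order_No", "=", 115)
cust_51330045 = slices_that_read("Cust_No", "=", 51330045)
print(len(total_lt_800), len(order_115), len(cust_51330045))  # prints: 1 1 4
```

Note that the third query reads four blocks even though no row in the sample data has Cust_No 51330045. The metadata can only prove where data is NOT; it cannot prove that a value is present.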


The ANALYZE Command Collects Statistics

analyze ;
This command will analyze every table in the current database.

analyze verbose;
Will analyze every table in the current database and report progress.

analyze Order_Table;
Will analyze only the table named Order_Table.

analyze Order_Table (Customer_Number, Order_Date) ;
Will analyze the columns Customer_Number and Order_Date in the table named Order_Table.

The ANALYZE command updates table statistics for use by the query planner. You can analyze all the tables in an entire database, or you can analyze specific tables, including temporary tables. You can name only one table in a single ANALYZE table_name statement. If you do not specify a table_name, all of the tables in the currently connected database are analyzed, including the persistent tables in the system catalog.
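Because a single ANALYZE statement names only one table, a maintenance script has to emit one statement per table. The following is a minimal sketch of such a generator; the function name and the identifier check are assumptions for illustration, and the statement shapes are the ones shown above.

```python
import re

# Assumption: only plain identifiers are accepted, to keep generated SQL safe.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def analyze_statements(tables):
    """Build one ANALYZE statement per table.

    tables maps a table name to a list of column names;
    an empty list means "analyze the whole table".
    """
    statements = []
    for table, columns in tables.items():
        for name in [table, *columns]:
            if not IDENTIFIER.match(name):
                raise ValueError(f"unsafe identifier: {name!r}")
        if columns:
            statements.append(f"analyze {table} ({', '.join(columns)}) ;")
        else:
            statements.append(f"analyze {table} ;")
    return statements

stmts = analyze_statements({
    "Order_Table": ["Customer_Number", "Order_Date"],
    "Customer_Table": [],
})
for stmt in stmts:
    print(stmt)
```

The generated statements would then be sent to the database one at a time with whatever client you already use.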


Matrix Automatically ANALYZES Some Create Statements

Matrix automatically analyzes tables that you create with the following commands:

1. CREATE TABLE AS
2. CREATE TEMP TABLE AS
3. SELECT INTO

You do not need to run the ANALYZE command on these tables when they are first created. If you modify them with additional inserts, updates, or deletes, you should analyze them in the same way as other tables.

1  CREATE TABLE Order_Backup AS
   SELECT * FROM Order_Table ;

2  CREATE TEMP TABLE Ord_Temp
   distkey (Order_Number) Sortkey (Order_Date) AS
   SELECT * FROM Order_Table ;

3  SELECT * INTO New_Order_Table from Order_Table;

The above examples won't need the ANALYZE statement because the analysis is done automatically. Yet, if you modify these tables later, you will need to run the ANALYZE command.


What is a Vacuum?

Actian Matrix doesn't automatically reclaim and reuse space that is freed when you delete or update rows. These rows are logically deleted but not physically deleted (until you run a vacuum). The vacuum will reclaim the space.

A Data Block that has NOT been Vacuumed

200  Research and Develop  1000234  550000.00
100  Marketing             1256349  500000.00   (an update was performed)
100  Marketing             1256349  800000.00
400  Customer Support      1256349  500000.00
300  Sales                 1333454  650000.00
500  Human Resources       1121334  450000.00
600  IT                    2000000  500000.00   (a delete was performed)

A Data Block that has been Vacuumed

200  Research and Develop  1000234  550000.00
100  Marketing             1256349  800000.00
400  Customer Support      1256349  500000.00
300  Sales                 1333454  650000.00
500  Human Resources       1121334  450000.00

To perform an update, Actian Matrix deletes the original row and appends the updated row, so every update is effectively a delete followed by an insert. When you perform a delete, the rows are marked for deletion but not removed. Either way, the old rows stay on disk until a vacuum physically removes them.
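The delete-then-append behavior can be pictured with a small toy model. The Python class below is purely illustrative: the class and method names are hypothetical, the first column is assumed to be both the lookup key and the sort key, and nothing here reflects Matrix's real storage code.

```python
class ToyTable:
    """Toy model of logical deletes; not Matrix's actual storage format."""

    def __init__(self, sort_column=0):
        self.sort_column = sort_column
        self.physical = []  # each entry is [row, is_deleted]

    def insert(self, row):
        self.physical.append([row, False])

    def delete(self, key):
        # Mark matching rows as deleted; the space is NOT reclaimed yet.
        for entry in self.physical:
            if entry[0][0] == key and not entry[1]:
                entry[1] = True

    def update(self, key, new_row):
        # An update is a logical delete followed by an append.
        self.delete(key)
        self.insert(new_row)

    def visible_rows(self):
        return [row for row, is_deleted in self.physical if not is_deleted]

    def vacuum(self):
        # Drop logically deleted rows and restore the sort order.
        live = [entry for entry in self.physical if not entry[1]]
        live.sort(key=lambda entry: entry[0][self.sort_column])
        self.physical = live

table = ToyTable()
for row in [(200, "Research and Develop", 1000234, 550000.00),
            (100, "Marketing", 1256349, 500000.00),
            (400, "Customer Support", 1256349, 500000.00),
            (300, "Sales", 1333454, 650000.00),
            (500, "Human Resources", 1121334, 450000.00),
            (600, "IT", 2000000, 500000.00)]:
    table.insert(row)

table.update(100, (100, "Marketing", 1256349, 800000.00))
table.delete(600)
before = len(table.physical)  # 7 physical rows, only 5 visible
table.vacuum()
after = len(table.physical)   # 5 physical rows
print(before, after)
```

Before the vacuum, queries see five rows but the block still stores seven. After the vacuum, the two logically deleted rows are gone and the remaining rows are back in sort order.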


When is a Good Time to Vacuum?

Run VACUUM during maintenance, batch windows, or time periods when you expect minimal activity on the cluster. A large unsorted region results in longer vacuum times. If you delay vacuuming, the vacuum will take longer because more data has to be reorganized. Keep the vacuum regular enough to properly maintain the table. VACUUM is an I/O intensive operation, so the longer it takes for your vacuum to complete, the more impact it will have on concurrent queries and other database operations running on your cluster. Concurrent queries and write operations are allowed during vacuum operations. DDL operations are not allowed during vacuum.

“Time flies like an arrow. Fruit flies like a banana.” - Groucho Marx

A vacuum can be time consuming, and it is very I/O intensive. That is why the above advice is needed. Vacuum wisely. You can run the vacuum command to get rid of the logically deleted rows and re-sort the table 100% perfectly. When about 10% of the table has changed over time, it is a good practice to run both the Vacuum and Analyze commands. As Groucho Marx has basically stated, "If data processing slows down and users get groucho, hit your marks and make it fly after a vacuum."


The VACUUM Command Grooms a Table

vacuum ;
Reclaim space (from updates and deletes) and re-sort rows for all tables in the current database.

vacuum Order_Table ;
Reclaim space (from updates and deletes) and re-sort rows for the table named Order_Table.

vacuum sort only Order_Table ;
Re-sort rows for the table named Order_Table.

vacuum delete only Order_Table ;
Reclaim space (from updates and deletes) for the table named Order_Table.

When tables are originally created and loaded, the rows are in perfect order, either naturally or because a sort key was specified. As additional inserts, updates, and deletes are performed over time, two things happen. First, rows are modified logically, so old versions of rows are still physically there even though they have been logically deleted. Second, new rows that are inserted are stored on a different part of the disk, so the sort is no longer 100% accurate.


The Matrix database catalog also needs periodic vacuuming and indexing

vacuumcat ;
Reclaim space in the database catalog for the current database.

Indexcat ;
Reindex the database catalog for the current database.

When tables and views are dropped from a database, they leave logically deleted records in the database catalog, which is stored in a Postgres database on the leader node. To physically remove wasted space and reindex the database catalog, Matrix uses the following commands: VACUUMCAT and INDEXCAT. If performance slows down when accessing the database catalog, this indicates that it is time to free up wasted space and reindex.


Database Limits

Actian Matrix enforces these limits for databases.

1. Maximum of 60 user-defined databases per cluster.
2. Maximum of 127 characters for a database name.
3. A database name cannot be a reserved word.

CREATE DATABASE SQL_Class2 WITH OWNER TeraTom ;

“Where there is no patrol car, there is no speed limit.” - Al Capone

The example above creates a database named SQL_Class2 and gives ownership to the user TeraTom. You can only create a maximum of 60 different databases per cluster, so get yours created before the mob!


Creating a Database

create database sql_class ;

Slice        Slice        Slice
sql_class    sql_class    sql_class

“The best way to predict the future is to create it.” - Sophia Bedford-Pierce

A Matrix cluster can have many databases. Above is the syntax to create a database. The database is named sql_class. The data in a database can help you predict the future, and Matrix makes it so easy to create it. I think Sophia Bedford-Pierce must be a DBA!


Creating a User

create user teratom password 'TLc123123' ;

Password must:
• be between 8 and 64 characters
• have at least one uppercase letter
• have at least one lowercase letter
• have at least one number

To create a new user, specify the name of the new user and create a password. The password is required, and it must be reasonably secure. It must have between 8 and 64 characters, and it must include at least one uppercase letter, one lowercase letter, and one number. LDAP/AD is supported if you want to synchronize and manage user passwords that way. To do this, users must be created in Matrix with the same names as used in the LDAP/AD namespace.
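If you script user creation, it can help to check a candidate password against these rules before sending the CREATE USER statement. The function below is a hypothetical client-side helper written for illustration; it mirrors the four rules listed above and is not part of Matrix.

```python
def password_problems(password):
    """Return the list of password rules that the candidate violates."""
    problems = []
    if not 8 <= len(password) <= 64:
        problems.append("must be between 8 and 64 characters")
    if not any(ch.isupper() for ch in password):
        problems.append("must have at least one uppercase letter")
    if not any(ch.islower() for ch in password):
        problems.append("must have at least one lowercase letter")
    if not any(ch.isdigit() for ch in password):
        problems.append("must have at least one number")
    return problems

good = password_problems("TLc123123")  # the password used in the example
bad = password_problems("teratom")     # too short, no uppercase, no number
print(good)
print(bad)
```

An empty list means the password satisfies all four rules; the server remains the final authority on what it accepts.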


Dropping a User

Drop user teratom;

“All glory comes from daring to begin.” – Anonymous

If you delete a database user account, the user will no longer be able to access any of the cluster databases. The quote above is the opposite of the DBA credo which states, "All glory comes from daring to drop a user."


Inserting Into a Table

INSERT INTO Customer_Table
VALUES (121346543, 'Lawn Drivers', '555-1234') ;

The INSERT command inserts individual rows into a database table.


Renaming a Table or a Column

ALTER TABLE Employee_Table rename to Employee_Table_Backup ;

ALTER TABLE Student_Table RENAME COLUMN Grade_Pt to Grade_Point;

The first command renames the Employee_Table to Employee_Table_Backup. The second command renames the column Grade_Pt to Grade_Point.


Adding and Dropping a Column to a Table

ALTER TABLE Employee_Table ADD COLUMN Mgr int default NULL;

ALTER TABLE Employee_Table DROP COLUMN Mgr ;

In our first example we have added a new column called Mgr to the table Employee_Table. The second example drops that column.

Chapter 2

Best Practices for Table Design

Chapter 2 – Best Practices for Table Design

“Beware of the young doctor and the old barber.” - Benjamin Franklin


Converting Table Structures to Actian Matrix

Above, we are converting all of the tables in a Teradata database to Matrix table structures. We went to our Teradata system, right-clicked on the database SQL_Class, and chose "Convert Table Structures". We selected all of the tables and hit the blue arrow. We then chose to convert to Matrix. Watch in amazement what happens next!


Converting Table Structures to Actian Matrix Finale

All 20 Teradata tables have now been converted to Matrix. Just cut and paste to your Matrix system, and you have converted the tables.


Best Practices for Designing Tables

1. Choose the best sort key
2. Choose a great distribution key
3. Consider defining primary key and foreign key constraints
4. Use the smallest possible column size
5. Use date/time data types for date columns
6. Specify redundant predicates on the sort column

“I have found the best way to give advice to your children is to find out what they want and then advise them to do it.” --Harry S. Truman

As you design your database, there are important decisions you must make that will heavily influence overall query performance. These design choices also have a significant effect on how data is stored, which in turn affects query performance by reducing the number of I/O operations and minimizing the memory required to process certain queries. Harry S. Truman was right. "If you want your Matrix system to run brilliantly, take advice from your users, and use best practices to deliver what they asked for".


Choose the Best Sort Key

When you give an Actian Matrix table a sort key, it stores your data on disk in sorted order. The sort order is used by the optimizer to determine optimal query plans.

If recent data is queried most frequently, specify the timestamp column as the leading column for the sort key.

If you do range filtering or equality filtering on one column, specify that column as the sort key.

If you frequently join a table, specify the join column as both the sort key and the distribution key.

Data sorted correctly helps eliminate unneeded blocks. This is because Matrix has metadata on each block showing column min and max values.

Specifying the join column as both the sort key and the distribution key enables the query optimizer to choose a sort merge join instead of a slower hash join. Because the data is already sorted on the join key, the query optimizer can bypass the sort phase of the sort merge join.
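To see why pre-sorted data lets a sort merge join skip its sort phase, consider the merge step on its own. The Python sketch below is a generic textbook merge join, not Matrix's implementation; the function name and the tiny sample rows are made up for illustration. Both inputs must already be sorted on the join key.

```python
def merge_join(left, right, key=lambda row: row[0]):
    """Join two inputs that are ALREADY sorted on the join key, in one pass."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = key(left[i]), key(right[j])
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of right rows sharing this key, then pair it
            # with every left row that has the same key.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == left_key:
                j_end += 1
            while i < len(left) and key(left[i]) == left_key:
                for right_row in right[j:j_end]:
                    result.append(left[i] + right_row)
                i += 1
            j = j_end
    return result

# Hypothetical rows: (Customer_Number, Customer_Name), sorted by Customer_Number.
customers = [(100, "A"), (200, "B"), (300, "C")]
# Hypothetical rows: (Customer_Number, Order_Number), sorted by Customer_Number.
orders = [(100, 1), (100, 2), (300, 3), (400, 4)]

joined = merge_join(customers, orders)
print(joined)
```

Each input is read once from top to bottom, which is only correct because both are sorted on the join key. Without a sort key, the join would first have to sort both inputs or fall back to hashing.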


Each Block Comes With Metadata

Query: SELECT * FROM Order_Table WHERE Order_Total < 300.00

Metadata in slice memory, checked against Order_Total < 300.00:
Order_Total   Min Value 231.62   Max Value 12447.53

Order_Table
Order_No  Cust_No   Order_Date  Order_Total
________  ________  __________  ___________
103       44445679  01/01/2014  12447.53
107       32547733  01/15/2014   8055.66
111       57497134  02/12/2014   5651.47
115       87768956  03/17/2014    231.62

Actian Matrix stores columnar data in 1 MB disk blocks. The min and max values for each block are stored as part of the metadata. If a range-restricted column is a sort key, the query processor is able to use the min and max values to rapidly skip over large numbers of blocks during table scanning. Where most databases use indexes to determine where data is, Matrix uses the block's metadata to determine where data is NOT! Our query above is looking for data WHERE Order_Total < 300. The block's minimum Order_Total (231.62) is below 300, so the metadata shows this block may contain qualifying rows, and therefore it will be moved into memory for processing. Each slice has metadata for each of the blocks it owns.


Creating a Sort Key

CREATE TABLE Order_Table
(Order_Number    INTEGER NULL,
 Customer_Number INTEGER NULL,
 Order_Date      DATE NULL sortkey,
 Order_Total     DECIMAL(10,2) NULL)
DISTKEY(Order_Number);

Slice 1        Slice 2        Slice 3
Order_Table    Order_Table    Order_Table
JAN FEB        JAN FEB        JAN FEB
MAR APR        MAR APR        MAR APR
MAY JUN        MAY JUN        MAY JUN

There are three basic reasons to use the sortkey keyword when creating a table.

1) If recent data is queried most frequently, specify the timestamp or date column as the leading column for the sort key.
2) If you do frequent range filtering or equality filtering on one column, specify that column as the sort key.
3) If you frequently join a (dimension) table, specify the join column as the sort key.

Above, you can see we have made our sortkey the Order_Date column. Look how the data is sorted!


Sort Keys Help Group By, Order By and Window Functions

CREATE TABLE Order_Table
(Order_Number    INTEGER NULL,
 Customer_Number INTEGER NULL sortkey,
 Order_Date      DATE NULL,
 Order_Total     DECIMAL(10,2) NULL)
DISTKEY(Order_Number);

SELECT Customer_Number
      ,SUM(Order_Total) as "Order Sum"
      ,AVG(Order_Total) as "Avg Order"
FROM Order_Table
GROUP BY Customer_Number
ORDER BY Customer_Number ;

SELECT Customer_Number
      ,Order_Date
      ,Order_Total
      ,SUM(Order_Total) OVER (Partition By Customer_Number
                              Order By Customer_Number, Order_Date
                              Rows Unbounded Preceding) as "Cumulative Sum"
FROM Order_Table

When data is sorted on a strategic column, it improves GROUP BY and ORDER BY operations and window functions (PARTITION BY and ORDER BY operations), and it can even help optimize compression. But as new rows are incrementally loaded, these new rows are sorted among themselves yet reside temporarily in a separate region on disk. In order to maintain a fully sorted table, you need to run the VACUUM command at regular intervals. You will also need to run ANALYZE.


Choose a Great Distribution Key

Good data distribution has two goals:

1. To distribute data evenly among the nodes and slices in a cluster.
2. To collocate data for joins and aggregations.

Slice (DISTKEY)                  Slice (DISTKEY)
1001  100  Rafael  Minal         1002  200  Maria   Gomez
1008  100  Mo      Khan          1007  200  Sushma  Davis
1009  300  Mo      Swartz        1005  400  Rob     Rivers

Uneven distribution, or data skew, forces some nodes to do more work than others, which slows down the entire process. With parallel processing, a query is only as fast as the slowest node. Even distribution is a key concept because each node processes the information it owns simultaneously with its node peers. When rows that participate in joins or aggregations are located on different nodes, more data has to be moved among nodes. This is because Actian Matrix must ensure that two rows being joined are on the same node in the same memory. If this is not the case, then Matrix will either copy the smaller table to all nodes temporarily or redistribute one or both tables.
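Skew is easy to measure once you know which slice each row lands on. The sketch below counts rows per slice for two candidate distribution keys. It is a toy model: taking the key value modulo the number of slices is an assumption standing in for the real hash function, which is not described here, and the function name is hypothetical. The rows are the (Emp_No, Dept_No) pairs of the nine-row Employee_Table used in this chapter.

```python
def rows_per_slice(rows, key_index, n_slices):
    """Count how many rows land on each slice for a given distribution key."""
    counts = [0] * n_slices
    for row in rows:
        # Assumption: "key modulo slice count" stands in for the real hash.
        counts[row[key_index] % n_slices] += 1
    return counts

# (Emp_No, Dept_No) pairs from the Employee_Table sample data.
EMPLOYEES = [(1001, 100), (1002, 200), (1003, 300), (1004, 400), (1005, 400),
             (1006, 300), (1007, 200), (1008, 100), (1009, 300)]

by_emp_no = rows_per_slice(EMPLOYEES, key_index=0, n_slices=3)
by_dept_no = rows_per_slice(EMPLOYEES, key_index=1, n_slices=3)
print(by_emp_no)   # unique key: an even spread
print(by_dept_no)  # few distinct values: a skewed spread
```

In this model the unique Emp_No spreads the rows 3/3/3, while Dept_No produces 3/4/2, so the slice holding four rows would finish last and set the pace for the whole query.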


Distribution Key Where the Data is Unique

CREATE TABLE Employee_table
(Emp_No     INTEGER NULL,
 Dept_No    SMALLINT NULL,
 Last_name  CHAR(20) NULL,
 First_name VARCHAR(12) NULL)
DISTKEY(Emp_No);

Emp_No  Dept_No  First_Name  Last_Name
______  _______  __________  _________
1001    100      Rafael      Minal
1002    200      Maria       Gomez
1003    300      Charl       Kertzel
1004    400      Kyle        Stover
1005    400      Rob         Rivers
1006    300      Inna        Kinski
1007    200      Sushma      Davis
1008    100      Mo          Khan
1009    300      Mo          Swartz

Slice 1 (DISTKEY)
1001  100  Rafael  Minal
1008  100  Mo      Khan
1009  300  Mo      Swartz

Slice 2 (DISTKEY)
1002  200  Maria   Gomez
1007  200  Sushma  Davis
1005  400  Rob     Rivers

Slice 3 (DISTKEY)
1003  300  Charl   Kertzel
1006  300  Inna    Kinski
1004  400  Kyle    Stover

The entire row of a table is on a slice, but each column in the row is in a separate container (block). A Unique Distribution Key spreads the rows of a table evenly across the slices. A good Distribution Key is the key to good distribution!


Matching Distribution Keys for Co-Location of Joins

CREATE TABLE Employee_table
(Emp_No     INTEGER NULL,
 Dept_No    INTEGER NULL,
 Last_name  CHAR(20) NULL,
 First_name VARCHAR(12) NULL)
DISTKEY(Dept_No);

CREATE TABLE Department_table
(Dept_No   INTEGER NULL,
 Dept_Name CHAR(20) NULL,
 Mgr_No    INTEGER,
 Budget    DECIMAL(10,2))
DISTKEY(Dept_No);

Slice 1
Employee_Table
1001  100  Rafael  Minal
1008  100  Mo      Khan
Department_Table
100  Fin   1008   90000

Slice 2
Employee_Table
1002  200  Maria   Gomez
1007  200  Sushma  Davis
1004  400  Kyle    Stover
1005  400  Rob     Rivers
Department_Table
200  HR    1002  500000
400  IT    1005  600000

Slice 3
Employee_Table
1003  300  Charl   Kertzel
1006  300  Inna    Kinski
1009  300  Mo      Swartz
Department_Table
300  Mrkt  1006  500000

Notice that both tables are distributed on Dept_No. When these two tables are joined WHERE Dept_No = Dept_No, the rows with matching department numbers are on the same Slice. This is called Co-Location. This makes joins efficient and fast.
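Co-location can be demonstrated with a small simulation. In the sketch below, both tables are placed on slices by the same function of Dept_No, and each slice then joins only its own rows. The modulo placement and all function names are assumptions made for illustration; the sample rows come from the figure above.

```python
N_SLICES = 3

def slice_for(dept_no):
    # Assumption: a simple modulo stands in for the real distribution hash.
    return dept_no % N_SLICES

def distribute(rows, key_index):
    """Place each row on the slice chosen by its distribution key."""
    slices = [[] for _ in range(N_SLICES)]
    for row in rows:
        slices[slice_for(row[key_index])].append(row)
    return slices

# (Emp_No, Dept_No, First_Name, Last_Name)
employees = [(1001, 100, "Rafael", "Minal"), (1002, 200, "Maria", "Gomez"),
             (1003, 300, "Charl", "Kertzel"), (1004, 400, "Kyle", "Stover"),
             (1005, 400, "Rob", "Rivers"), (1006, 300, "Inna", "Kinski"),
             (1007, 200, "Sushma", "Davis"), (1008, 100, "Mo", "Khan"),
             (1009, 300, "Mo", "Swartz")]
# (Dept_No, Dept_Name, Mgr_No, Budget)
departments = [(100, "Fin", 1008, 90000), (200, "HR", 1002, 500000),
               (300, "Mrkt", 1006, 500000), (400, "IT", 1005, 600000)]

emp_slices = distribute(employees, key_index=1)    # DISTKEY(Dept_No)
dept_slices = distribute(departments, key_index=0)  # DISTKEY(Dept_No)

# Each slice joins only the rows it already holds; nothing moves between slices.
joined = []
for emp_rows, dept_rows in zip(emp_slices, dept_slices):
    local_departments = {dept[0]: dept for dept in dept_rows}
    for emp in emp_rows:
        if emp[1] in local_departments:
            joined.append((emp[0], local_departments[emp[1]][1]))

print(len(joined))
```

All nine employees find their department on their own slice, so the join completes without redistributing or copying either table. The exact slice numbers in this model differ from the figure, but the co-location property is the same.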


Big Table / Small Table Joins

Slice 1
Employee_Table (DISTKEY)
1001  100  Rafael  Minal
1008  100  Mo      Khan
1009  300  Mo      Swartz
Department_Table (ALL)
100  Fin   1008   90000
200  HR    1002  500000
300  Mrkt  1006  500000
400  IT    1005  600000

Slice 2
Employee_Table (DISTKEY)
1002  200  Maria   Gomez
1007  200  Sushma  Davis
1005  400  Rob     Rivers
Department_Table (ALL)
100  Fin   1008   90000
200  HR    1002  500000
300  Mrkt  1006  500000
400  IT    1005  600000

Slice 3
Employee_Table (DISTKEY)
1003  300  Charl   Kertzel
1006  300  Inna    Kinski
1004  400  Kyle    Stover
Department_Table (ALL)
100  Fin   1008   90000
200  HR    1002  500000
300  Mrkt  1006  500000
400  IT    1005  600000

Notice that the Department_Table has only four rows. Those four rows are copied to every slice. This is distribution by ALL. Now, the Department_Table can be joined to the Employee_Table with a guarantee that matching rows are co-located. They are co-located because the smaller table has copied ALL of its rows to each slice. When a large table (fact table) is joined to a small table (dimension table), use the ALL keyword to distribute the smaller table.


Define Primary Key and Foreign Key Constraints

1. Define primary key and foreign key constraints between tables wherever appropriate.
2. Primary key and foreign key constraints are informational only.
3. Actian Matrix does not enforce unique, primary key, and foreign key constraints.
4. The query planner uses these keys in certain statistical computations, to infer uniqueness and referential relationships that affect subquery decorrelation techniques, to order large numbers of joins, and to eliminate redundant joins.
5. Actian Matrix does enforce NOT NULL column constraints.

Actian Matrix does not enforce unique, primary-key, and foreign-key constraints. Your application is responsible for ensuring uniqueness and managing the DML operations. The query planner will use primary and foreign keys in certain statistical computations to infer uniqueness and referential relationships that affect subquery decorrelation techniques, to order large numbers of joins, and to eliminate redundant joins. The planner leverages these key relationships, but it assumes that all keys in Actian Matrix tables are valid as loaded. If your application allows invalid foreign keys or primary keys, some queries could return incorrect results. For example, a SELECT DISTINCT query might return duplicate rows if the primary key is not unique. Do not define key constraints for your tables if you doubt their validity. On the other hand, you should always declare primary and foreign keys and uniqueness constraints when you know that they are valid.
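Since the application is responsible for key validity, one option is to check a batch of rows before loading it. The sketch below shows two such checks in Python. The function names and the small sample batches are hypothetical, and a real load process would more likely run equivalent checks in SQL or in the ETL tool.

```python
from collections import Counter

def duplicate_keys(rows, key_index=0):
    """Return primary key values that appear more than once."""
    counts = Counter(row[key_index] for row in rows)
    return sorted(key for key, count in counts.items() if count > 1)

def orphan_foreign_keys(child_rows, fk_index, parent_rows, pk_index=0):
    """Return foreign key values that have no matching parent row."""
    parent_keys = {row[pk_index] for row in parent_rows}
    return sorted({row[fk_index] for row in child_rows
                   if row[fk_index] is not None
                   and row[fk_index] not in parent_keys})

# Hypothetical batches: (Customer_Number, Customer_Name)
customers = [(1, "Ace"), (2, "Best"), (2, "Best again")]
# Hypothetical batches: (Order_Number, Customer_Number)
orders = [(10, 1), (11, 3), (12, None)]

dupes = duplicate_keys(customers)
orphans = orphan_foreign_keys(orders, fk_index=1, parent_rows=customers)
print(dupes, orphans)
```

If either list is non-empty, the declared constraint would be invalid for this data, and the advice above applies: fix the data or do not declare the constraint.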


Primary Key and Foreign Key Examples

CREATE TABLE Customer_Table(
 Customer_Number integer not null sortkey,
 Customer_Name char(20),
 Phone_Number varchar(16),
 PRIMARY KEY(Customer_Number))
diststyle all ;

Primary key and foreign key constraints are informational only.

CREATE TABLE Order_Table(
 Order_Number integer not null distkey,
 Customer_Number integer sortkey,
 Order_Date Date,
 Order_Total Decimal(10,2),
 PRIMARY KEY (Order_Number),
 FOREIGN KEY (Customer_Number) references Customer_Table(Customer_Number)) ;

Matrix enforces NOT NULL column constraints. Actian Matrix does NOT enforce unique, primary key, and foreign key constraints.

The query planner uses referential integrity in certain situations, to infer uniqueness and for referential relationships that affect subquery techniques, to order large numbers of joins, and to eliminate redundant joins.

Actian Matrix does not enforce primary key and foreign key constraints. The only reason to apply them is so the query optimizer can generate a better query plan.


Use the Smallest Column Size When Creating Tables

State_Code Char(20)        State_Code Char(13)
______________________     _______________
'Arizona             '     'Arizona'
'California          '     'California'
'Louisiana           '     'Louisiana'
'New Mexico          '     'New Mexico'
'New Hampshire       '     'New Hampshire'
'Massachusetts       '     'Massachusetts'

You will improve query performance by reducing columns to the minimum possible size.

Table size is not impacted, but query processing is affected when the column is processed in a temporary table that holds intermediate results.

Actian Matrix compresses column data very effectively, so creating columns much larger than necessary has minimal impact on the size of data tables. It is in the processing of queries that the size can hurt you. During the processing of complex queries, intermediate query results might need to be stored in temporary tables. Because temporary tables are not compressed, unnecessarily large columns consume excessive memory and temporary disk space, which can affect query performance. Don't go overboard here! Don't make columns so small that they can't contain your largest values!
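One way to pick the column size is to measure the longest value you actually expect to store. The sketch below is only an illustration in Python: the helper name smallest_char_width is hypothetical, and it assumes one byte per character, as in the State_Code example.

```python
def smallest_char_width(values):
    """Return the length of the longest value, i.e. the smallest CHAR(n) that fits them all."""
    return max(len(v) for v in values)

states = ["Arizona", "California", "Louisiana",
          "New Mexico", "New Hampshire", "Massachusetts"]

width = smallest_char_width(states)
# Extra bytes every uncompressed (temporary-table) row carries with CHAR(20).
wasted_per_row = 20 - width

print(f"CHAR({width}) fits every value; CHAR(20) carries {wasted_per_row} extra bytes per row")
```

In practice you would leave headroom for values that have not arrived yet, which is the "don't go overboard" caution above.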


Use Date/Time Data Types for Date Columns

This table uses DATE as a Data Type

CREATE TABLE Order_table
(Order_Number      INTEGER NULL,
 Customer_Number   INTEGER NULL,
 Order_Date        DATE NULL,
 Order_Total       DECIMAL(10,2) NULL)
DISTKEY(Order_Number);

Yes! Good form

This table stores the date as a CHAR(12)

CREATE TABLE Order_table2
(Order_Number      INTEGER NULL,
 Customer_Number   INTEGER NULL,
 Order_Date        Char(12) NULL,
 Order_Total       DECIMAL(10,2) NULL)
DISTKEY(Order_Number);

No! Bad form

Use the DATE or TIMESTAMP data type rather than a character type when storing date/time information. Actian Matrix stores DATE and TIMESTAMP data more efficiently than CHAR or VARCHAR, which results in better query performance. Let Actian Matrix handle the DATE or TIMESTAMP conversions internally instead of trying to do so in your applications. Users most often fall back on CHAR or VARCHAR during the ETL process of moving data. There is no need to do that, because Matrix handles any conversions necessary.
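Beyond storage efficiency, character columns also compare dates as text, which can give the wrong order. The short Python sketch below illustrates that general point with two hypothetical order dates; it is not Matrix-specific behavior, only a demonstration of text comparison versus date comparison.

```python
from datetime import date

# The same two order dates, once as character strings and once as real dates.
as_text = ["10/15/2013", "1/1/2014"]
as_dates = [date(2013, 10, 15), date(2014, 1, 1)]

# Text is compared character by character, so "10/15/2013" sorts after "1/1/2014".
latest_as_text = max(as_text)

# Real dates are compared chronologically.
latest_as_date = max(as_dates)

print(latest_as_text)
print(latest_as_date)
```

The text comparison picks the 2013 date as the "latest", while the date comparison correctly picks January 1, 2014.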


Specify Redundant Predicates on the Sort Column

We want to join Table1 with Table2. Both have a distribution key on Customer_Number and both have a sort key on Order_Date.

SELECT T1.*, T2.Order_Number
FROM Table1 as T1
INNER JOIN Table2 as T2
ON T1.Customer_Number = T2.Customer_Number
WHERE T1.Order_Date > '1/1/2014';

SELECT T1.*, T2.Order_Number
FROM Table1 as T1
INNER JOIN Table2 as T2
ON T1.Customer_Number = T2.Customer_Number
WHERE T1.Order_Date > '1/1/2014'
AND T2.Order_Date > '1/1/2014';

The predicate is redundant, but it allows Table2 to skip blocks that do not have dates > '1/1/2014'.

You should consider using a predicate on the leading sort column of the fact table, or the largest table, in a join. You can also add predicates to filter other tables that participate in the join, even when the predicates are redundant. These predicates are the conditions in WHERE or AND clauses. Because Matrix keeps the min and max value of each column for every block, you get better performance when you choose a good sortkey. Matrix checks the min and max values to see if a block needs to be read at all, which allows it to skip blocks that cannot contain matching rows. The second example above adds a redundant AND clause in the hope that the entire Table2 won't have to be read.
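The block-skipping idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Matrix internals: the block list, its min/max metadata, and the helper blocks_to_read are all hypothetical, and the table is assumed to be sorted on Order_Date so that each block covers a narrow date range.

```python
from datetime import date

# Hypothetical blocks of a table sorted on Order_Date; each keeps min/max metadata.
blocks = [
    {"min": date(2013, 1, 1), "max": date(2013, 6, 30), "rows": 1000},
    {"min": date(2013, 7, 1), "max": date(2013, 12, 31), "rows": 1000},
    {"min": date(2014, 1, 1), "max": date(2014, 6, 30), "rows": 1000},
]

def blocks_to_read(blocks, after):
    """Keep only blocks whose max value could satisfy Order_Date > after."""
    return [b for b in blocks if b["max"] > after]

needed = blocks_to_read(blocks, date(2014, 1, 1))
print(f"read {len(needed)} of {len(blocks)} blocks")
```

Without a predicate on the table's own sort column there is nothing to compare against the min/max values, so every block has to be read. That is why the redundant predicate on T2.Order_Date can pay off.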


Setting the statement_timeout to Abort Long Queries

0 records returned.
SET Command Complete
ERROR [57014] ERROR: Query cancelled on user's request;
Error while executing the query
SELECT Command Failed.

The above query aborts because it took longer than 10 milliseconds. The statement_timeout is designed to abort any statement that takes over the milliseconds specified. If the system setting WLM timeout (max_execution_time) is also specified as part of a WLM configuration, the lower of statement_timeout and max_execution_time is used.


Chapter 3 – Systems Tables

“He who asks a question may be a fool for five minutes, but he who never asks a question remains a fool forever.” - Unknown


Actian Matrix System Tables

Actian Matrix provides access to the following types of system tables:

• STL tables for logging - These system tables are generated from Actian Matrix log files to provide a history of the system. Logging tables have an STL prefix.
• STV tables for snapshot data - These tables are virtual system tables that contain snapshots of the current system data. Snapshot tables have an STV prefix.
• System views - System views contain a subset of data found in several of the STL and STV system tables. System views have an SVV or SVL prefix.
• System catalog tables - The system catalog tables store schema metadata, such as information about tables and columns. System catalog tables have a PG prefix.

Every Matrix system automatically contains a number of system tables. These system tables contain information about the installation and about the various queries and processes that are running on the system. You can query these system tables to collect information about the Matrix database that is installed.


Troubleshooting Catalog Table pg_table_def

select * from pg_table_def where tablename = 'employee_table' ;

No rows returned

show search_path ;

$user, public

set search_path to '$user', 'public', 'sql_class';

The queries above reference the system catalog table named pg_table_def, which runs exclusively on the leader node. PG_TABLE_DEF only returns information for tables in schemas that are included in the search path. The first query returned no rows because the schema that holds the employee_table was not in the search_path. We then added sql_class to our path. The first query will work now because the sql_class schema has been placed in our search path, and that is where the employee_table resides.


Seeing the System Tables in your Nexus Tree

The Matrix catalog is in the pg_catalog database. You can query these tables with SQL or merely do a "Quick Select" by right clicking on any table in the tree. We just did a "Quick Select" on the pg_aggregate table.


Catalog Table pg_table_def

The above query references the system catalog table named pg_table_def, and it runs exclusively on the leader node. PG_TABLE_DEF only returns information for tables in schemas that are included in the search path. The query we ran earlier returned no rows because the 'employee_table' could not be reached through the search_path. The employee_table resides in the sql_class schema. Once we added sql_class to our search path, the query ran perfectly!


Checking Tables for Skew (Poor Distribution)

SELECT TRIM(name) as Table_Name
      ,slice
      ,sum(num_values) as rows
from svv_diskusage
where name in ('Order_Table', 'Customer_Table')
and col = 0
group by name, slice
order by name, slice;

System views - System views contain a subset of data found in several of the STL and STV system tables. System views have an SVV or SVL prefix.

Table_Name       Slice   Rows
______________   _____   _____
Customer_Table   0       200
Customer_Table   1       200
Customer_Table   2       200
Customer_Table   3       200
Order_Table      0       2500
Order_Table      1       2501
Order_Table      2       2498
Order_Table      3       2497

Uneven distribution, or data distribution skew, forces some nodes or slices to do more work than others, which inhibits query performance. To check for distribution skew, you can query the SVV_DISKUSAGE system view. Each row in SVV_DISKUSAGE records the statistics for one disk block. The num_values column gives the number of values in that disk block, so when you sum(num_values) for a single column (col = 0), it returns the number of rows on each slice.
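Once you have the row counts per slice, a single number makes the skew easier to judge. The following Python sketch is one simple way to do that; the skew_ratio helper and the choice of max/min as the measure are assumptions for illustration, not a Matrix-defined metric. It also assumes every slice holds at least one row.

```python
def skew_ratio(rows_per_slice):
    """Ratio of the fullest slice to the emptiest; 1.0 means perfectly even."""
    return max(rows_per_slice) / min(rows_per_slice)

# Rows per slice, taken from the result set above.
order_table = [2500, 2501, 2498, 2497]
customer_table = [200, 200, 200, 200]

print(round(skew_ratio(order_table), 4))
print(skew_ratio(customer_table))
```

Both tables here come out at or very close to 1.0, which indicates even distribution. A ratio well above 1.0 would point to a poor distribution key.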


Checking All Statements That Used the Analyze Command

SELECT xid
      ,to_char(starttime, 'HH24:MI:SS.MS') as starttime
      ,date_diff('sec',starttime,endtime ) as secs
      ,substring(text, 1, 40) as ActualText
FROM svl_statementtext
WHERE sequence = 0
AND xid in (select xid from svl_statementtext s
            where s.text like 'matrix_fetch_sample%' )
order by xid desc, starttime;

xid    starttime      secs   ActualText
____   ____________   ____   _________________________________________
1340   12:04:28.511   4      Analyze date
1340   12:04:28.511   1      matrix_fetch_sample: select count(*) from
1340   12:04:29.443   2      matrix_fetch_sample: select * from date
1340   12:04:31.456   1      matrix_fetch_sample: select * from date
1339   12:04:24.388   1      matrix_fetch_sample: select count(*) from
1339   12:04:24.388   4      Analyze sales
1339   12:04:25.322   2      matrix_fetch_sample: select * from sales
1339   12:04:27.363   1      matrix_fetch_sample: select * from sales

The query above returns all the statements that ran in every completed transaction that included an ANALYZE command.


Checking Tables for Skew (Poor Distribution)

select P.name as "Table"
      ,count(*) as "1 MB blocks"
from stv_blocklist as B
INNER JOIN stv_tbl_perm as P
ON B.tbl = P.id
AND B.slice = P.slice
WHERE P.name in ('Customer_Table', 'Order_Table')
GROUP BY P.name
ORDER BY 1 asc;

STV tables for snapshot data - These tables are virtual system tables that contain snapshots of the current system data. Snapshot tables have an STV prefix.

Table            1 MB blocks
______________   ___________
Customer_Table   200
Order_Table      500

You can easily check how many 1 MB blocks of disk space are used for each table by querying the STV_BLOCKLIST table. This will give you measurements on table sizes.


Checking for Details about the Last Copy Operation

SELECT query as Query
      ,TRIM(filename) as File
      ,curtime as Updated
from stl_load_commits
where query = pg_last_copy_id() ;

STL tables for logging - These system tables are generated from Actian Matrix log files to provide a history of the system. Logging tables have an STL prefix.

Query   File                                Updated
_____   _________________________________   __________________________
28555   s3://dw-sql_class/Sales_Table.txt   2013-11-01 17:13:50.611486

The above example returns details for the last COPY operation.


Checking When a Table Has Last Been Analyzed

SELECT query
      ,rtrim(querytxt)
      ,starttime
FROM stl_query
WHERE querytxt like 'matrix_fetch_sample%'
AND querytxt like '%Sales_Table%'
ORDER BY 1 desc;

Query   querytxt                                          starttime
_____   _______________________________________________   _________________
81      matrix_fetch_sample: select * from sales          2014-05-18 12:...
80      matrix_fetch_sample: select * from sales          2014-05-18 12:...
79      matrix_fetch_sample: select count(*) from sales   2014-05-18 12:...

To find out when ANALYZE commands were run, you can query STL_QUERY. For example, to find out when the Sales_Table was last analyzed, run the query above.


Checking For Column Information on a Table

SELECT Schemaname as "Schema"
      ,Tablename
      ,"Column"
      ,Type
      ,Distkey
FROM pg_table_def
WHERE tablename = 'Department_Table';

System catalog tables - The system catalog tables store schema metadata, such as information about tables and columns. System catalog tables have a PG prefix.

Schema   Tablename          Column            Type            Distkey
______   ________________   _______________   _____________   _______
Public   Department_Table   Dept_No           Integer         T
Public   Department_Table   Department_Name   Char(20)        F
Public   Department_Table   Mgr_No            Integer         F
Public   Department_Table   Budget            Decimal(10,2)   F

The above example returns information for the Department_Table.


System tables for troubleshooting data loads

We have created a view that shows details about load errors. This view joins the tables STL_LOAD_ERRORS to STV_TBL_PERM to match table IDs with actual table names.

CREATE VIEW ch_loadview as
(SELECT DISTINCT tbl
       ,TRIM(name) as table_name
       ,query
       ,starttime
       ,trim(filename) as input
       ,line_number
       ,colname
       ,trim(err_reason) as reason
       ,err_code
 FROM stl_load_errors sl
 INNER JOIN stv_tbl_perm sp
 ON sl.tbl = sp.id);

SELECT * FROM ch_loadview WHERE table_name='Employee_Table';

The example above is helpful in troubleshooting data load issues.


Determining Whether a Query is Writing to Disk

SELECT query, elapsed, substring
FROM svl_qlog
ORDER BY query DESC
LIMIT 3 ;

This shows the last three queries. Use the Query ID.

Query   Elapsed   Substring
_____   _______   _______________________
1040    9574270   Select * from Claims
1039    87645     Select * from Services
1038    786544    Select * from Providers

Now run this query using the query id

SELECT query, step, rows, workmem, label, is_diskbased
FROM svl_query_summary
WHERE query = 1040
ORDER BY workmem desc;

query   step   rows       workmem    label          is_diskbased
_____   ____   ________   ________   ____________   ____________
1040    0      16000000   43205240   scan tbl=9     f
1040    2      16000000   43205240   hash tbl=142   t

If IS_DISKBASED is true ("t") for any step, then that step wrote data to disk.


The t means this ran on disk


Chapter 4 – Compression

“Speak in a moment of anger and you’ll deliver the greatest speech you’ll ever regret.” – Anonymous


Compression Types

Encoding Type          Encoding Keyword   Data Types
____________________   ________________   _______________________________________________
Raw (no compression)   RAW                All
Byte dictionary        BYTEDICT           All except BOOLEAN
Delta                  DELTA              SMALLINT, INT, BIGINT, DATE, TIMESTAMP, DECIMAL
Delta                  DELTA32K           INT, BIGINT, DATE, TIMESTAMP, DECIMAL
LZO                    DEFLATE            All except BOOLEAN, REAL, and DOUBLE PRECISION
Mostlyn                MOSTLY8            SMALLINT, INT, BIGINT, DECIMAL
Mostlyn                MOSTLY16           INT, BIGINT, DECIMAL
Mostlyn                MOSTLY32           BIGINT, DECIMAL
Run-length             RUNLENGTH          All
Text                   TEXT255            VARCHAR only
Text                   TEXT32K            VARCHAR only

The table above identifies the supported compression encodings and the data types that support each encoding. Compression reduces the size of data when it is stored, and it is a column-level operation. Compression conserves storage space and reduces the size of data that is read from storage, which reduces the amount of disk I/O and therefore improves query performance. By default, Actian Matrix stores data in its raw, uncompressed format, but you can apply a compression type, or encoding, to the columns in a table manually when the table is created. Alternatively, you can use the COPY command to analyze and apply compression automatically. Either way, it is important to compress your data.


Byte Dictionary Compression

Encoding Type     Encoding Keyword   Data Types
_______________   ________________   __________________
Byte dictionary   BYTEDICT           All except BOOLEAN

Uncompressed Data   Compressed Data
_________________   _______________
Ohio                1
California          2
Minnesota           3
Alaska              4
Oregon              5
Ohio                1
California          2
Minnesota           3
Alaska              4

Dictionary
1 - Ohio
2 - California
3 - Minnesota
4 - Alaska
5 - Oregon

Byte dictionary encoding utilizes a separate dictionary of unique values for each block of column values on disk. Remember, each Actian Matrix disk block occupies 1 MB. The dictionary contains up to 256 one-byte values that are stored as indexes to the original data values. If more than 256 values are stored in a single block, the extra values are written into the block in raw, uncompressed form. The process repeats for each disk block. This encoding is very effective when a column contains a limited number of unique values, and it is especially effective when there are fewer than 256 unique values.
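The idea described above can be sketched in Python. This is a simplified model of one block, not the on-disk format: the function name bytedict_encode is hypothetical, the indexes are 0-based here (the illustration above numbers them from 1), and overflow values are simply kept as-is to stand for "stored raw".

```python
def bytedict_encode(values, max_entries=256):
    """Encode one block: known values become a small index, overflow values stay raw."""
    dictionary = {}          # value -> index
    encoded = []
    for v in values:
        if v not in dictionary and len(dictionary) < max_entries:
            dictionary[v] = len(dictionary)
        # Use the index if the value made it into the dictionary, else keep it raw.
        encoded.append(dictionary.get(v, v))
    return dictionary, encoded

block = ["Ohio", "California", "Minnesota", "Alaska", "Oregon",
         "Ohio", "California", "Minnesota", "Alaska"]

dictionary, encoded = bytedict_encode(block)
print(dictionary)
print(encoded)
```

Each state name is stored once in the dictionary, and every occurrence in the column shrinks to a one-byte index. Lowering max_entries shows the overflow case, where later unique values are left uncompressed.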


Delta Encoding

Encoding Type   Encoding Keyword   Data Types
_____________   ________________   _______________________________________________
Delta           DELTA              SMALLINT, INT, BIGINT, DATE, TIMESTAMP, DECIMAL
Delta           DELTA32K           INT, BIGINT, DATE, TIMESTAMP, DECIMAL

Uncompressed Data (4-byte integers):   1      2   3   4   5   6   7   8
Delta Encoding Compression:            0001   1   1   1   1   1   1   1

The first row is a 4-byte integer (plus one flag byte).

Each following value is stored as one byte with the number 1, because each is 1 greater than the previous value.

Delta encodings are very useful for date and time columns. Delta encoding compresses data by recording the difference between values that follow each other in the column. These differences are recorded in a separate dictionary for each block of column values on disk. If the column contains 10 integers in sequence from 1 to 10, the first will be stored as a 4-byte integer (plus a 1-byte flag), and the next 9 will each be stored as a byte with the value 1, indicating that it is one greater than the previous value. Delta encoding comes in two variations: DELTA records the differences as 1-byte values (8-bit integers), and DELTA32K records the differences as 2-byte values (16-bit integers).
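The 1-to-10 example above can be worked through in Python. This is a rough sketch under stated assumptions rather than the real storage format: delta_encode is a hypothetical helper, the delta range is modeled as a signed 8-bit or 16-bit integer, and a difference that does not fit is assumed to be stored as a full raw value.

```python
def delta_encode(values, delta_bytes=1):
    """Return (kind, number) pairs: 'raw' for full values, 'delta' for small differences."""
    limit = 2 ** (8 * delta_bytes - 1)       # 128 for DELTA, 32768 for DELTA32K
    out = [("raw", values[0])]
    for prev, cur in zip(values, values[1:]):
        diff = cur - prev
        if -limit <= diff < limit:
            out.append(("delta", diff))
        else:
            out.append(("raw", cur))         # difference too large for the delta width
    return out

encoded = delta_encode(list(range(1, 11)))

# A raw value costs a 4-byte integer plus a 1-byte flag; a delta costs 1 byte.
size = sum(5 if kind == "raw" else 1 for kind, _ in encoded)
print(encoded)
print(size, "bytes instead of", 4 * 10)
```

The ten integers shrink from 40 bytes to 14 in this model: five bytes for the first value and one byte for each of the nine differences.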


Deflate Encoding - Lempel–Ziv–Oberhumer (LZO)

Encoding Type   Encoding Keyword   Data Types
_____________   ________________   ______________________________________________
LZO             DEFLATE            All except BOOLEAN, REAL, and DOUBLE PRECISION

• Designed to work best with Char and Varchar data that store long character strings
• Is a portable lossless data compression library written in ANSI C
• Offers fast compression and extremely fast decompression
• Includes slower compression levels achieving a quite competitive compression ratio while still decompressing at this very high speed
• Often implemented with a tool called LZOP

Lempel–Ziv–Oberhumer (LZO) is a lossless data compression algorithm that is focused on decompression speed. LZO encoding provides a high compression ratio with good performance. LZO encoding is designed to work well with character data. It is especially good for CHAR and VARCHAR columns that store very long character strings, especially free-form text such as product descriptions, user comments, or JSON strings.


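Mostly Encoding

Encoding Type   Encoding Keyword   Data Types
_____________   ________________   ______________________________
Mostlyn         MOSTLY8            SMALLINT, INT, BIGINT, DECIMAL
Mostlyn         MOSTLY16           INT, BIGINT, DECIMAL
Mostlyn         MOSTLY32           BIGINT, DECIMAL

Encoding Type   Compressed Storage Size   Range of Values
_____________   _______________________   _________________________
Mostly8         1 byte (8 bits)           -128 to 127
Mostly16        2 bytes (16 bits)         -32768 to 32767
Mostly32        4 bytes (32 bits)         -2147483648 to 2147483647

Uncompressed (Smallint, 2 bytes)   Mostly8 compressed
________________________________   __________________
120                                120 (1 byte)
100                                100 (1 byte)
5000                               5000 (raw)
110                                110 (1 byte)
45                                 45 (1 byte)

Every value except 5000 fits in the range -128 to 127 and is stored as 1 byte. The value 5000 is stored in its raw form.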

Mostly encodings are useful when the data type for a column is larger than the majority of the stored values require. By specifying a mostly encoding for this type of column, you can compress the majority of the values in the column to a smaller standard storage size. The remaining values that cannot be compressed are stored in their raw form.
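A quick way to see the saving is to count bytes for the SMALLINT example. The Python sketch below is a simplification: mostly8_sizes is a hypothetical helper, and it ignores any per-value or per-block overhead the real format may add.

```python
def mostly8_sizes(values, raw_bytes=2):
    """Bytes used per value: 1 if it fits in a signed byte, otherwise the raw size."""
    return [1 if -128 <= v <= 127 else raw_bytes for v in values]

column = [120, 100, 5000, 110, 45]          # SMALLINT column from the example
sizes = mostly8_sizes(column)

print(sizes)
print(sum(sizes), "bytes instead of", 2 * len(column))
```

Four of the five values drop to one byte, so the column goes from 10 bytes to 6 in this model. If most values fell outside the range, there would be little to gain, which is why the encoding suits columns where the large values are the exception.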


Runlength encoding

Encoding Type   Encoding Keyword   Data Types
_____________   ________________   __________
Run-length      RUNLENGTH          All

Original Data   Original Size (bytes)   Compressed Value   Compressed Size (bytes)
_____________   _____________________   ________________   _______________________
Ohio            4                       4, Ohio            5
Ohio            4                                          0
Ohio            4                                          0
Ohio            4                                          0
California      10                      3, California      11
California      10                                         0
California      10                                         0
Michigan        8                       2, Michigan        9
Michigan        8                                          0

Runlength encoding replaces a value that is repeated consecutively with a token that consists of the value and a count of the number of consecutive occurrences (the length of the run). This is where the name Runlength comes into play. A separate dictionary of unique values is created for each block of column values on disk. This encoding is best suited to a table in which data values are often repeated consecutively, for example, when the table is sorted by those values.
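The token idea maps directly onto a few lines of Python. This sketch uses itertools.groupby from the standard library; the function name runlength_encode is hypothetical, and the size estimate assumes one byte per character plus one byte for the count, which matches the sizes in the illustration above.

```python
from itertools import groupby

def runlength_encode(values):
    """Collapse consecutive repeats into (count, value) tokens."""
    return [(len(list(run)), value) for value, run in groupby(values)]

column = ["Ohio"] * 4 + ["California"] * 3 + ["Michigan"] * 2
tokens = runlength_encode(column)

original_size = sum(len(v) for v in column)
compressed_size = sum(1 + len(value) for _, value in tokens)

print(tokens)
print(compressed_size, "bytes instead of", original_size)
```

Because only consecutive repeats collapse, the same nine values in a shuffled order would produce far more tokens. That is the reason the encoding works best on sorted data.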


Text255 and Text32k Encodings

Encoding Type   Encoding Keyword   Data Types
_____________   ________________   ____________
Text            TEXT255            VARCHAR only
Text            TEXT32K            VARCHAR only

Uncompressed Data       Compressed Data
_____________________   _______________
The beginning of time   1234
The beginning of time   1234
The beginning of time   1234
The beginning of time   1234

Separate Dictionary
1) The
2) beginning
3) of
4) time

Text255 and text32k encodings are useful for compressing VARCHAR columns only. Both compression techniques work best when the same words recur often. A separate dictionary of unique words is created for each block of column values on disk. Text255 has a dictionary that contains the first 245 unique words in the column. Those words are replaced on disk by a one-byte index value representing one of the 245 values, and any words that are not represented in the dictionary are stored uncompressed. This process is repeated for each block. For the text32k encoding, the principle is the same, but the dictionary for each block does not capture a specific number of words. Instead, the dictionary indexes each unique word it finds until the combined entries reach a length of 32K, minus some overhead. The index values are stored in two bytes.
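The word-dictionary approach can be modeled in Python as follows. This is an illustrative sketch only: text255_encode is a hypothetical name, words are split on whitespace as an assumption, and words that do not make it into the dictionary are kept as plain text to stand for "stored uncompressed".

```python
def text255_encode(strings, max_words=245):
    """Replace each word with a dictionary index; words past the limit stay as text."""
    dictionary = {}
    encoded = []
    for s in strings:
        row = []
        for word in s.split():
            if word not in dictionary and len(dictionary) < max_words:
                dictionary[word] = len(dictionary) + 1   # 1-based, as in the illustration
            row.append(dictionary.get(word, word))
        encoded.append(row)
    return dictionary, encoded

rows = ["The beginning of time"] * 4
dictionary, encoded = text255_encode(rows)

print(dictionary)
print(encoded[0])
```

Four words are stored once, and each of the four rows becomes four small index values, mirroring the 1234 pattern shown above.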


Analyze Compression using xpx ‘complyze’

psql -c “xpx ‘complyze

NOTE: The above example can only be executed using the Actian Matrix command line interface, psql.

Table_Name – You must specify a table_name. You can also analyze compression for temporary tables.

stl_complyze - Results will be written to a system table (stl_complyze) that summarizes the results of various compression options on a per-column basis. Another table is also created with the count of nulls in the table (stl_nullyze). You can query these tables to derive the compression methods that will work best for you.

Unsupported compression methods: Note that there are two types of compression not analyzed by complyze:
• GLOBALDICT encoding
• DEFLATE (a.k.a. LZ) encoding

There is a parameter that can be set to allow complyze to also analyze DEFLATE encodings as part of the analysis. Set the following parameter prior to running complyze in order to enable this behavior:

set complyze_uses_deflate to on;

The xpx ‘complyze’ command performs compression analysis and produces a report with the suggested column encoding schemes for the tables analyzed. The complyze command does not modify the column encodings of the table; it merely makes suggestions. To implement the suggestions, you must recreate the table, or create a new table with the same schema. Complyze does not consider RUNLENGTH encoding on any column that is designated as a SORTKEY, because range-restricted scans might perform poorly when SORTKEY columns are compressed much more highly than other columns. Complyze acquires an exclusive table lock, which prevents concurrent reads and writes against the table. Only run the xpx complyze command when the table is idle.


Analyze Results from xpx ‘complyze’

CREATE A VIEW TO REVIEW RESULTS

You can query stl_complyze and manually include the groupings and sort, or create a simple view to make it easier to analyze the output from xpx ‘complyze’:

CREATE or REPLACE VIEW stl_complyze_v
   ( measuretime, tbl, tbl_name, col, col_name, encoding, size )
as
SELECT TO_DATE(measuretime, 'YYYY-MM-DD HH24:MI:SS'),
       tbl,
       BTRIM(tbl_name),
       col,
       BTRIM(col_name),
       format_encoding(encoding),
       SUM(size)
from stl_complyze
group by 1,2,3,4,5,6
order by 1,2,3,4,5,7;

Compression Best Practices:

1. “none” and “raw” are the same thing. If “none” matches or is very close to the best compressor, use “none”, as it incurs no decompression costs at runtime.
2. If an alwaysN encoding type has almost the same block count as another type (i.e. within 10%), choose the alwaysN type; there is a numerical bias against alwaysN.
3. If always8, always16, and always32 have almost the same block count, use the lowest N; i.e. if always8 is only slightly more than always16, choose always8.
4. LZ is not enabled in the analyzer by default. You’ll need to make your own evaluation whether it is beneficial. If you have very wide varchar and char columns that are not included in your query predicates, you may find it to be very effective. To have complyze also consider DEFLATE: set complyze_uses_deflate to on;
5. The analyze compression doesn’t report on globaldictXX. You’ll need to make your own evaluation whether it is beneficial based on a count(distinct

XN Seq Scan on employee_table (cost=0.00..0.09 rows=9 width=12)

The keyword Aggregate in the EXPLAIN is used for aggregation scalar functions. A scalar function means that only one row and one column are returned in the answer set for an aggregation function. Notice that the above query produces a scalar result. The AVG(Salary) in the Employee_Table is $46782.15. That result is only one column and one row! It is scalar!


Chapter 6

Explain

EXPLAIN for HashAggregate Functions

XN HashAggregate (cost=0.18..0.22 rows=5 width=14)
  -> XN Seq Scan on employee_table (cost=0.00..0.09 rows=9 width=14)

The keyword HashAggregate in the EXPLAIN is used for unsorted grouped aggregate functions. Notice there is no sort!


EXPLAIN Using Limit, Merge and Sort

Limit – This example returns 5 rows.

XN Limit (cost=1000000000000.23..1000000000000.25 rows=5 width=46)
  -> XN Merge (cost=1000000000000.23..1000000000000.26 rows=9 width=46)
       Merge Key: salary
       -> XN Network (cost=1000000000000.23..1000000000000.26 rows=9 width=46)
            Send to leader
            -> XN Sort (cost=1000000000000.23..1000000000000.26 rows=9 width=46)
                 Sort Key: salary
                 -> XN Seq Scan on employee_table (cost=0.00..0.09 rows=9 width=46)

The keyword Limit is used to evaluate the LIMIT clause. The keyword Sort is used to evaluate the ORDER BY clause. The keyword Merge is used when producing the final sorted results, which are derived from the intermediate sorted results that each slice processed in parallel. Remember, each slice must perform its work. The data is sorted on each slice and then passed to the leader node, where a Merge operation is performed.
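The sort-on-each-slice, merge-on-the-leader pattern can be imitated in Python with the standard heapq.merge function. This is a conceptual sketch only: the three lists standing in for slices and the salary values in them are hypothetical, and real slices work in parallel rather than one after another.

```python
import heapq
from itertools import islice

# Hypothetical salaries held on three slices.
slices = [
    [54500.00, 32800.50, 48850.00],
    [36000.00, 41888.88, 64300.00],
    [29450.00, 57700.00, 40200.00],
]

# XN Sort: every slice sorts its own rows.
sorted_runs = [sorted(rows) for rows in slices]

# XN Network + XN Merge: the leader merges the already-sorted runs.
merged = heapq.merge(*sorted_runs)

# XN Limit: stop after five rows.
top_five = list(islice(merged, 5))
print(top_five)
```

The leader never re-sorts the data. It only interleaves runs that are already in order, which is much cheaper than sorting everything in one place.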


EXPLAIN Using a WHERE Clause Filter

XN Seq Scan on student_table (cost=0.00..0.12 rows=4 width=53)
  Filter: (class_code = 'FR'::bpchar)

The keyword Filter is used to evaluate the WHERE clause. In the above example, we are filtering the returning rows by only looking for freshmen, who have a class_code of 'FR'. Our EXPLAIN shows the keyword Filter looking for 'FR'.


EXPLAIN Using the Keyword Distinct

XN Merge (cost=1000000000000.17..1000000000000.18 rows=4 width=6)
  Merge Key: class_code
  -> XN Network (cost=1000000000000.17..1000000000000.18 rows=4 width=6)
       Send to leader
       -> XN Sort (cost=1000000000000.17..1000000000000.18 rows=4 width=6)
            Sort Key: class_code
            -> XN Unique (cost=0.00..0.12 rows=4 width=6)
                 -> XN Seq Scan on student_table (cost=0.00..0.10 rows=10 width=6)

The keyword Unique in the EXPLAIN is used to evaluate the Distinct clause. In the above example, we do a sequential scan of the entire student_table. Then, you see the keyword UNIQUE in the EXPLAIN plan. This ensures that no duplicate values will be returned. The data is then sorted on each slice and sent to the leader node for a final merge among all slices.


EXPLAIN for Subqueries

The department_table is scanned to deliver the column Dept_No.

The Dept_No columns returning are hashed to slices.

The Employee_Table is scanned and hashed by Dept_No.

The result set is then returned.

A subquery involves at least two queries, a top and bottom query. In the above example, the bottom query is run first on the Department_Table. The result set consists of the column Dept_No. The Employee_Table (top query) is scanned next. Then, the results of both the Department_Table and the Employee_Table scans are hashed by Dept_No. This places all matches on the same slice. The rows can then be joined using a Hash Join in memory.
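The build-then-probe shape of a hash join can be sketched in Python. This is a single-process illustration under stated assumptions: the rows are hypothetical, the subquery is assumed to be an IN-style match on Dept_No, and the step that redistributes rows across slices is left out.

```python
# Hypothetical rows; only the join column Dept_No matters for the sketch.
department_rows = [{"Dept_No": 100}, {"Dept_No": 200}, {"Dept_No": 400}]
employee_rows = [
    {"Employee_No": 1, "Dept_No": 100},
    {"Employee_No": 2, "Dept_No": 300},
    {"Employee_No": 3, "Dept_No": 400},
    {"Employee_No": 4, "Dept_No": None},
]

# Build phase: hash the bottom query's result (the Dept_No values).
dept_hash = {row["Dept_No"] for row in department_rows}

# Probe phase: scan the top query's table and keep rows whose Dept_No is in the hash.
matches = [e for e in employee_rows if e["Dept_No"] in dept_hash]

print([e["Employee_No"] for e in matches])
```

Hashing both sides on Dept_No is what lets each slice do this lookup locally: rows with the same Dept_No hash to the same place, so a match never has to be searched for on another slice.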


Chapter 7 - Basic SQL Functions

“When I was 14, I thought my parents were the stupidest people in the world. When I was 21, I was amazed at how much they learned in seven years.” - Mark Twain


Finding the Current Schema on the Leader Node

SELECT current_schema() ;

This command is a leader node only function

current_schema
______________
public

The CURRENT_SCHEMA function is a leader-node only function. In this example, the query does not reference a table, so it runs exclusively on the leader node in order to show the schema. Our Current_Schema is the public schema.


Getting Things Setup in Your Search Path

show search_path ;

$user, public

We will be querying the schemas sql_class and sql_views. This is how you can add them to your search_path. You won't have to fully qualify queries.

set search_path to '$user', 'public', 'sql_class', 'sql_views' ;

We ran two queries above. The first query showed us our search_path, and it contained $user and public. Since we will be querying sql_class and sql_views in our example labs, we need to place them in our search_path. The second query does this through the "set search_path" command.


Five Details You Need To Know About the Search_Path

1. When an object is created without a schema, it is placed in the first schema listed in the search path. If the search path is empty, the system will return an error.

2. Objects that are not in any search path schemas can only be referenced by using a fully qualified name that also includes the schema.

3. When objects with identical names exist in different schemas, the one found first in the search path is the one that will be used.

4. The system catalog schema that is named pg_catalog is always searched. If it is mentioned in the path, it is searched in the specified order. If not mentioned, it is searched before any of the path items.

5. The current session's temporary-table schema, pg_temp_nnn, is searched (if it exists). It can be explicitly listed in the path by using the alias pg_temp. If not listed in the path, it will be searched first (even before pg_catalog). Remember, the temporary schema is only searched for table and view names. It is not searched for any function names.

Above are the five things you need to know about how the Search_Path works.
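The lookup order described by these details can be modeled in a short Python sketch. Everything here is hypothetical and simplified: the functions effective_search_order and resolve, the catalog dictionary, and the schema contents are made up for illustration, the sketch covers table and view lookups only, and the '$user' entry is left out.

```python
def effective_search_order(search_path, temp_schema=None):
    """Return the schemas in the order they are searched for a table or view name."""
    order = []
    # The temporary schema goes first unless the path lists it under the alias pg_temp.
    if temp_schema and "pg_temp" not in search_path:
        order.append(temp_schema)
    # pg_catalog comes next unless the path places it explicitly.
    if "pg_catalog" not in search_path:
        order.append("pg_catalog")
    for name in search_path:
        order.append(temp_schema if name == "pg_temp" and temp_schema else name)
    return order

def resolve(table, search_path, catalog, temp_schema=None):
    """Return the first schema in the effective order that contains the table, or None."""
    for schema in effective_search_order(search_path, temp_schema):
        if table in catalog.get(schema, set()):
            return schema
    return None

# Hypothetical catalog: schema name -> set of table names.
catalog = {
    "public": {"department_table"},
    "sql_class": {"employee_table", "department_table"},
}

print(resolve("employee_table", ["public"], catalog))
print(resolve("employee_table", ["public", "sql_class"], catalog))
print(resolve("department_table", ["public", "sql_class"], catalog))
```

The first lookup finds nothing because sql_class is not in the path, the second succeeds once it is added, and the third shows that when two schemas hold the same table name, the schema listed first wins.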


Introduction

Student_Table

Student_ID   Last_Name   First_Name   Class_Code   Grade_Pt
__________   _________   __________   __________   ________
423400       Larkins     Michael      FR           0.00
231222       Wilson      Susie        SO           3.80
280023       McRoberts   Richard      JR           1.90
322133       Bond        Jimmy        JR           3.95
125634       Hanson      Henry        FR           2.88
333450       Smith       Andy         SO           2.00
324652       Delaney     Danny        SR           3.35
260000       Johnson     Stanley      ?            ?
234121       Thomas      Wendy        FR           4.00
123250       Phillips    Martin       SR           3.00

The Student_Table above will be used in our early SQL Examples

This is a pictorial of the Student_Table which we will use to present some basic examples of SQL and get some hands-on experience with querying this table. This book attempts to show you the table, show you the query, and show you the result set.


SELECT * (All Columns) in a Table

SELECT *
FROM Student_Table ;

An asterisk (*) means you want to see ALL columns in the table on your report

Student_ID   Last_Name   First_Name   Class_Code   Grade_Pt
__________   _________   __________   __________   ________
423400       Larkins     Michael      FR           0.00
125634       Hanson      Henry        FR           2.88
280023       McRoberts   Richard      JR           1.90
260000       Johnson     Stanley      ?            ?
231222       Wilson      Susie        SO           3.80
234121       Thomas      Wendy        FR           4.00
324652       Delaney     Danny        SR           3.35
123250       Phillips    Martin       SR           3.00
322133       Bond        Jimmy        JR           3.95
333450       Smith       Andy         SO           2.00

Mostly every SQL statement will consist of a SELECT and a FROM. You SELECT the columns you want to see on your report and an Asterisk (*) means you want to see all columns in the table on the returning answer set!

Page 156

SELECT Specific Columns in a Table

SELECT First_Name
      ,Last_Name
      ,Class_Code
      ,Grade_Pt
FROM Student_Table ;

First_Name  Last_Name   Class_Code  Grade_Pt
__________  __________  __________  ________
Michael     Larkins     FR          0.00
Henry       Hanson      FR          2.88
Richard     McRoberts   JR          1.90
Stanley     Johnson     ?           ?
Susie       Wilson      SO          3.80
Wendy       Thomas      FR          4.00
Danny       Delaney     SR          3.35
Martin      Phillips    SR          3.00
Jimmy       Bond        JR          3.95
Andy        Smith       SO          2.00

This is a great way to show the columns you are selecting from the table. Only the columns you list come back, in the order you list them.

Commas in the Front or Back?

Example 1 (commas in front):

SELECT First_Name
      ,Last_Name
      ,Class_Code
      ,Grade_Pt
FROM Student_Table ;

Example 2 (commas at the end):

SELECT First_Name,
       Last_Name,
       Class_Code,
       Grade_Pt
FROM Student_Table ;

First_Name  Last_Name   Class_Code  Grade_Pt
__________  __________  __________  ________
Michael     Larkins     FR          0.00
Henry       Hanson      FR          2.88
Richard     McRoberts   JR          1.90
Stanley     Johnson     ?           ?
Susie       Wilson      SO          3.80
Wendy       Thomas      FR          4.00
Danny       Delaney     SR          3.35
Martin      Phillips    SR          3.00
Jimmy       Bond        JR          3.95
Andy        Smith       SO          2.00

Why is Example 1 better even though the two are functionally equivalent? Errors are easier to spot, and commenting out a column is less likely to cause an error.

Place your Commas in front for better Debugging Capabilities

Error! (an ending comma was left in front of FROM)

SELECT First_Name,
       Last_Name,
       Class_Code,
       Grade_Pt,
FROM Student_Table ;

Successful

SELECT First_Name
      ,Last_Name
      ,Class_Code
      ,Grade_Pt
FROM Student_Table ;

Sometimes if you add or remove a column you can overlook an ending comma!

"A life filled with love may have some thorns, but a life empty of love will have no roses." - Anonymous

Having commas in front to separate column names makes it easier to debug. Remember our quote above. "A query filled with commas at the end just might fill you with thorns, but a query filled with commas in the front will allow you to always come up smelling like roses."

Sort the Data with the ORDER BY Keyword

SELECT * FROM Student_Table ORDER BY Last_Name ;

Sorts the answer set in ascending order by default.

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
322133      Bond        Jimmy       JR          3.95
324652      Delaney     Danny       SR          3.35
125634      Hanson      Henry       FR          2.88
260000      Johnson     Stanley     ?           ?
423400      Larkins     Michael     FR          0.00
280023      McRoberts   Richard     JR          1.90
123250      Phillips    Martin      SR          3.00
333450      Smith       Andy        SO          2.00
234121      Thomas      Wendy       FR          4.00
231222      Wilson      Susie       SO          3.80

Rows typically come back to the report in no particular order. To order the result set, you must use an ORDER BY. When you order by a column, it will sort in ASCENDING order unless you say otherwise. This is called the Major Sort!

ORDER BY Defaults to Ascending

SELECT * FROM Student_Table ORDER BY Last_Name ;

Sorts the answer set in ascending order by Last_Name.

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
322133      Bond        Jimmy       JR          3.95
324652      Delaney     Danny       SR          3.35
125634      Hanson      Henry       FR          2.88
260000      Johnson     Stanley     ?           ?
423400      Larkins     Michael     FR          0.00
280023      McRoberts   Richard     JR          1.90
123250      Phillips    Martin      SR          3.00
333450      Smith       Andy        SO          2.00
234121      Thomas      Wendy       FR          4.00
231222      Wilson      Susie       SO          3.80

Rows typically come back to the report in random order, but we decided to use the ORDER BY statement. Now, the data comes back ordered by Last_Name.

Use the Name or the Number in your ORDER BY Statement

SELECT * FROM Student_Table ORDER BY 2 ;

Sorts the answer set by column 2, which is Last_Name (the 2nd column coming back on the report).

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
322133      Bond        Jimmy       JR          3.95
324652      Delaney     Danny       SR          3.35
125634      Hanson      Henry       FR          2.88
260000      Johnson     Stanley     ?           ?
423400      Larkins     Michael     FR          0.00
280023      McRoberts   Richard     JR          1.90
123250      Phillips    Martin      SR          3.00
333450      Smith       Andy        SO          2.00
234121      Thomas      Wendy       FR          4.00
231222      Wilson      Susie       SO          3.80

The ORDER BY can use a number to represent the sort column. The number 2 represents the second column on the report.
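Because the number refers to a position on the report and not to a position in the table, it follows the SELECT list. A small sketch using the same Student_Table:

```sql
-- Position 2 is Grade_Pt here, because Grade_Pt is the
-- second column in this SELECT list.
SELECT Last_Name
      ,Grade_Pt
FROM Student_Table
ORDER BY 2 ;

-- The same sort written by name. It keeps sorting by Grade_Pt
-- even if the SELECT list is later rearranged.
SELECT Last_Name
      ,Grade_Pt
FROM Student_Table
ORDER BY Grade_Pt ;
```

Sorting by name is the safer habit for queries that will be edited later, since adding or moving a column silently changes what a number points to.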

Two Examples of ORDER BY using Different Techniques

SELECT * FROM Student_Table ORDER BY 5 ;

Same query:

SELECT * FROM Student_Table ORDER BY Grade_Pt ;

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
260000      Johnson     Stanley     ?           ?
423400      Larkins     Michael     FR          0.00
280023      McRoberts   Richard     JR          1.90
333450      Smith       Andy        SO          2.00
125634      Hanson      Henry       FR          2.88
123250      Phillips    Martin      SR          3.00
324652      Delaney     Danny       SR          3.35
231222      Wilson      Susie       SO          3.80
322133      Bond        Jimmy       JR          3.95
234121      Thomas      Wendy       FR          4.00

Notice that the answer set is sorted in ascending order based on the column Grade_Pt. Also, notice that Grade_Pt is the fifth column coming back on the report. That is why the SQL in both statements is ordering by Grade_Pt. Did you notice that the null value came back first? Nulls sort first in ascending order and last in descending order.

Changing the ORDER BY to Descending Order

SELECT * FROM Student_Table ORDER BY Last_Name DESC ;

Sorts the answer set in DESC order by Last_Name.

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
231222      Wilson      Susie       SO          3.80
234121      Thomas      Wendy       FR          4.00
333450      Smith       Andy        SO          2.00
123250      Phillips    Martin      SR          3.00
280023      McRoberts   Richard     JR          1.90
423400      Larkins     Michael     FR          0.00
260000      Johnson     Stanley     ?           ?
125634      Hanson      Henry       FR          2.88
324652      Delaney     Danny       SR          3.35
322133      Bond        Jimmy       JR          3.95

Notice that the answer set is sorted in descending order based on the column Last_Name. Also, notice that Last_Name is the second column coming back on the report. We could have done an ORDER BY 2 DESC. If you spell out the word DESCENDING the query will fail, so you must remember to just use DESC.

NULL Values sort First in Ascending Mode (Default)

SELECT * FROM Student_Table ORDER BY 5 ;

SELECT * FROM Student_Table ORDER BY Grade_Pt ;

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
260000      Johnson     Stanley     ?           ?          <- Nulls sort first in ASC order
423400      Larkins     Michael     FR          0.00
280023      McRoberts   Richard     JR          1.90
333450      Smith       Andy        SO          2.00
125634      Hanson      Henry       FR          2.88
123250      Phillips    Martin      SR          3.00
324652      Delaney     Danny       SR          3.35
231222      Wilson      Susie       SO          3.80
322133      Bond        Jimmy       JR          3.95
234121      Thomas      Wendy       FR          4.00

Did you notice that the null value came back first? Nulls sort first in ascending order and last in descending order.

NULL Values sort Last in Descending Mode (DESC)

SELECT * FROM Student_Table ORDER BY 5 DESC ;

SELECT * FROM Student_Table ORDER BY Grade_Pt DESC ;

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
234121      Thomas      Wendy       FR          4.00
322133      Bond        Jimmy       JR          3.95
231222      Wilson      Susie       SO          3.80
324652      Delaney     Danny       SR          3.35
123250      Phillips    Martin      SR          3.00
125634      Hanson      Henry       FR          2.88
333450      Smith       Andy        SO          2.00
280023      McRoberts   Richard     JR          1.90
423400      Larkins     Michael     FR          0.00
260000      Johnson     Stanley     ?           ?          <- Nulls sort last in DESC order

You can ORDER BY in descending order by putting a DESC after the column name or its corresponding number. Null values will sort last in DESC order.
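Where nulls land in a sort differs from one database system to another, so it is worth checking on your own system. Standard SQL defines the NULLS FIRST and NULLS LAST options for stating the placement explicitly. The sketch below assumes your system accepts those options; that is an assumption, not something the examples above confirm.

```sql
-- Assumption: the system accepts the standard NULLS FIRST / NULLS LAST options.

-- Ascending sort with the null Grade_Pt forced to the top.
SELECT * FROM Student_Table ORDER BY Grade_Pt ASC NULLS FIRST ;

-- Descending sort with the null Grade_Pt forced to the bottom.
SELECT * FROM Student_Table ORDER BY Grade_Pt DESC NULLS LAST ;
```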

Major Sort vs. Minor Sorts

SELECT * FROM Student_Table
ORDER BY Class_Code DESC, Grade_Pt ASC ;

Major sort on Class_Code descending, minor sort on Grade_Pt ascending.

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
123250      Phillips    Martin      SR          3.00
324652      Delaney     Danny       SR          3.35
333450      Smith       Andy        SO          2.00
231222      Wilson      Susie       SO          3.80
280023      McRoberts   Richard     JR          1.90
322133      Bond        Jimmy       JR          3.95
423400      Larkins     Michael     FR          0.00
125634      Hanson      Henry       FR          2.88
234121      Thomas      Wendy       FR          4.00
260000      Johnson     Stanley     ?           ?

The major sort (Class_Code) sorts first. The minor sort (Grade_Pt) sorts on ties.

Major sort is the first sort. There can only be one major sort. A minor sort kicks in if there are major sort ties. There can be zero or more minor sorts.

Multiple Sort Keys using Names vs. Numbers

SELECT * FROM Employee_Table
ORDER BY Dept_No DESC
        ,Salary ASC
        ,Last_Name ASC ;

SELECT * FROM Employee_Table
ORDER BY 2 DESC, 5, 3 ASC ;

These queries sort identically.

Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00
1121334      400      Strickling  Cletus      54500.00
2312225      300      Larkins     Loraine     40200.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
1232578      100      Chambers    Mandee      48850.00
1000234      10       Smythe      Richard     64300.00
2000000      ?        Jones       Squiggy     32800.50

In the example above, Dept_No is the major sort and we have two minor sorts. The minor sorts are on the Salary and the Last_Name columns. In the second query, 2 is Dept_No, 5 is Salary and 3 is Last_Name, so both queries have an equivalent ORDER BY and sort exactly the same.

Sorts are Alphabetical, NOT Logical

SELECT * FROM Student_Table ORDER BY Class_Code ;

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
260000      Johnson     Stanley     ?           ?
234121      Thomas      Wendy       FR          4.00
125634      Hanson      Henry       FR          2.88
423400      Larkins     Michael     FR          0.00
322133      Bond        Jimmy       JR          3.95
280023      McRoberts   Richard     JR          1.90
231222      Wilson      Susie       SO          3.80
333450      Smith       Andy        SO          2.00
324652      Delaney     Danny       SR          3.35
123250      Phillips    Martin      SR          3.00

This sorts alphabetically. Can you change the query to ORDER BY Class_Code logically (FR, SO, JR, SR, ?), so that the Freshmen come first, followed by the Sophomores, Juniors, Seniors and then the Null?

Using A CASE Statement to Sort Logically

SELECT * FROM Student_Table
ORDER BY CASE Class_Code
            WHEN 'FR' THEN 1
            WHEN 'SO' THEN 2
            WHEN 'JR' THEN 3
            WHEN 'SR' THEN 4
            ELSE 5
         END ;

CASE in the ORDER BY statement.

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
234121      Thomas      Wendy       FR          4.00
125634      Hanson      Henry       FR          2.88
423400      Larkins     Michael     FR          0.00
333450      Smith       Andy        SO          2.00
231222      Wilson      Susie       SO          3.80
280023      McRoberts   Richard     JR          1.90
322133      Bond        Jimmy       JR          3.95
123250      Phillips    Martin      SR          3.00
324652      Delaney     Danny       SR          3.35
260000      Johnson     Stanley     ?           ?

This is the way the pros do it.
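The CASE expression acts as the major sort, so a minor sort can follow it just like any other sort key. A sketch, written with the searched form of CASE (each WHEN carries its own comparison), that lists each class in logical order and, within a class, the highest Grade_Pt first:

```sql
SELECT *
FROM Student_Table
ORDER BY CASE
            WHEN Class_Code = 'FR' THEN 1
            WHEN Class_Code = 'SO' THEN 2
            WHEN Class_Code = 'JR' THEN 3
            WHEN Class_Code = 'SR' THEN 4
            ELSE 5                    -- anything else, including the null Class_Code
         END
        ,Grade_Pt DESC ;              -- minor sort breaks ties within a class
```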

How to ALIAS a Column Name

SELECT First_Name AS Fname
      ,Last_Name Lname
      ,Class_Code "Class Code"
      ,Grade_Pt AS "AVG"
      ,Student_ID AS STU_ID
FROM Student_Table
WHERE Class_Code = 'FR' ;

Different techniques for aliasing. Notice the column headers.

Fname    Lname    Class Code  AVG   STU_ID
_______  _______  __________  ____  ______
Michael  Larkins  FR          0.00  423400
Henry    Hanson   FR          2.88  125634
Wendy    Thomas   FR          4.00  234121

ALIAS Rules!
1) AS is optional
2) Use Double Quotes when Spaces are in the Alias name
3) Use Double Quotes when the Alias is a reserved word

When you ALIAS a column, you give it a new name for the report header. You should always reference the column using the ALIAS everywhere else in the query. In these examples, the only place you need double quotes is around an alias.

A Missing Comma can by Mistake become an Alias

SELECT First_Name, Last_Name, Class_Code Grade_Pt
FROM Student_Table ;

Missing a comma between Class_Code and Grade_Pt, so Class_Code is aliased as Grade_Pt.

First_Name  Last_Name   Grade_Pt
__________  __________  ________
Michael     Larkins     FR
Susie       Wilson      SO
Richard     McRoberts   JR
Jimmy       Bond        JR
Henry       Hanson      FR
Andy        Smith       SO
Danny       Delaney     SR
Stanley     Johnson     ?
Wendy       Thomas      FR
Martin      Phillips    SR

Column names must be separated by commas. Notice in this example, there is a comma missing between Class_Code and Grade_Pt. The result is that only three columns appear on your report, with one of them aliased wrong.

Comments using Double Dashes are Single Line Comments

-- Double Dashes provide a single line comment
SELECT * FROM Student_Table ORDER BY Grade_Pt ;

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
260000      Johnson     Stanley     ?           ?
423400      Larkins     Michael     FR          0.00
280023      McRoberts   Richard     JR          1.90
333450      Smith       Andy        SO          2.00
125634      Hanson      Henry       FR          2.88
123250      Phillips    Martin      SR          3.00
324652      Delaney     Danny       SR          3.35
231222      Wilson      Susie       SO          3.80
322133      Bond        Jimmy       JR          3.95
234121      Thomas      Wendy       FR          4.00

Double dashes make a single line comment that will be ignored by the system.

Comments for Multi-Lines

/* This is how you can make multi-line comments
   to express what is going on in the code. */
SELECT * FROM Student_Table ORDER BY Grade_Pt ;

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
260000      Johnson     Stanley     ?           ?
423400      Larkins     Michael     FR          0.00
280023      McRoberts   Richard     JR          1.90
333450      Smith       Andy        SO          2.00
125634      Hanson      Henry       FR          2.88
123250      Phillips    Martin      SR          3.00
324652      Delaney     Danny       SR          3.35
231222      Wilson      Susie       SO          3.80
322133      Bond        Jimmy       JR          3.95
234121      Thomas      Wendy       FR          4.00

Slash Asterisk starts a multi-line comment and Asterisk Slash ends the comment.

Comments for Multi-Lines as Double Dashes Per Line

-- This is how you can make multi-line comments
-- also to express what is going on in the code.
SELECT * FROM Student_Table ORDER BY Grade_Pt ;

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
260000      Johnson     Stanley     ?           ?
423400      Larkins     Michael     FR          0.00
280023      McRoberts   Richard     JR          1.90
333450      Smith       Andy        SO          2.00
125634      Hanson      Henry       FR          2.88
123250      Phillips    Martin      SR          3.00
324652      Delaney     Danny       SR          3.35
231222      Wilson      Susie       SO          3.80
322133      Bond        Jimmy       JR          3.95
234121      Thomas      Wendy       FR          4.00

Double dashes in front of both lines comment both lines out, and they're ignored.

A Great Technique for Comments to Look for SQL Errors

ERROR:

SELECT Student_ID
      ,Last_Name
      ,First_Name
      ,Class_Code as Sum
      ,Grade_Pt
FROM Student_Table
WHERE Grade_Pt > 3.6

Same query with the suspect line commented out:

SELECT Student_ID
      ,Last_Name
      ,First_Name
--    ,Class_Code as Sum
      ,Grade_Pt
FROM Student_Table
WHERE Grade_Pt > 3.6

Student_ID  Last_Name   First_Name  Grade_Pt
__________  __________  __________  ________
234121      Thomas      Wendy       4.00
231222      Wilson      Susie       3.80
322133      Bond        Jimmy       3.95

The first query had an error because the keyword Sum is reserved. We can test if this is the problem by commenting out that line in our SQL (the second query). Now, our query works. We know the problem is on the line that we commented out. Once we put "Sum" (double quotes around the alias) it works. Use comments to help you debug.
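For completeness, here is the repaired version of that query, with the reserved word placed in double quotes as the alias rules call for:

```sql
SELECT Student_ID
      ,Last_Name
      ,First_Name
      ,Class_Code AS "Sum"   -- double quotes make the reserved word usable as an alias
      ,Grade_Pt
FROM Student_Table
WHERE Grade_Pt > 3.6 ;
```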

Chapter 8 – The WHERE Clause

"I saw the angel in the marble and carved until I set him free." - Michelangelo

Using Limit to bring back a Sample

LIMIT { ALL | }

The following is an example using LIMIT:

SELECT * FROM Employee_table LIMIT 3 ;

Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
1121334      400      Strickling  Cletus      54500.00
1324657      200      Coffing     Billy       41888.88
2341218      400      Reilly      William     36000.00

Matrix offers a capability in its SQL to limit the number of rows returned from the table's data. It is the LIMIT clause, and it is normally added at the end of a valid SELECT statement, as in the example and syntax above. LIMIT only caps how many rows come back. Deciding which rows qualify is the job of the WHERE clause.

Using Limit with an Order By Statement

SELECT * FROM Employee_table
ORDER BY Salary DESC
LIMIT 5 ;

The result set returns the top 5 salaried employees.

Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
1000234      10       Smythe      Richard     64300.00
1256349      400      Harrison    Herbert     54500.00
1121334      400      Strickling  Cletus      54500.00
1232578      100      Chambers    Mandee      48850.00
1333454      200      Smith       John        48000.00

The brilliance of the example above is that we have sorted the data using an ORDER BY statement. Since we are sorting by Salary DESC, and we have a limit of 5 rows, this will bring back the top 5 salaried employees.
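LIMIT combines naturally with the WHERE clause covered in the rest of this chapter. The sketch below asks for the two highest-paid employees in department 400. The extra Last_Name sort key is our own addition, there so that employees with equal salaries come back in a predictable order.

```sql
SELECT *
FROM Employee_Table
WHERE Dept_No = 400          -- WHERE decides which rows qualify
ORDER BY Salary DESC         -- highest salaries first
        ,Last_Name ASC       -- tie-breaker for equal salaries
LIMIT 2 ;                    -- LIMIT caps how many qualifying rows return
```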

The WHERE Clause limits Returning Rows

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
231222      Wilson      Susie       SO          3.80
280023      McRoberts   Richard     JR          1.90
322133      Bond        Jimmy       JR          3.95
125634      Hanson      Henry       FR          2.88
333450      Smith       Andy        SO          2.00
324652      Delaney     Danny       SR          3.35
260000      Johnson     Stanley     ?           ?
234121      Thomas      Wendy       FR          4.00
123250      Phillips    Martin      SR          3.00

SELECT First_Name, Last_Name, Class_Code, Grade_Pt
FROM Student_Table
WHERE First_Name = 'Henry' ;

First_Name  Last_Name   Class_Code  Grade_Pt
__________  __________  __________  ________
Henry       Hanson      FR          2.88

The WHERE clause here filters which ROWS are coming back. In this example, I am asking for the report to show only rows WHERE the first name is Henry.

Using a Column ALIAS throughout the SQL

SELECT First_Name AS Fname
      ,Last_Name Lname
      ,Class_Code "Class Code"
      ,Grade_Pt AS "AVG"
      ,Student_ID
FROM Student_Table
WHERE Fname = 'Henry' ;

First_Name is aliased as Fname in the SELECT list. Use the ALIAS again in your WHERE clause!

Fname  Lname   Class Code  AVG   Student_ID
_____  ______  __________  ____  __________
Henry  Hanson  FR          2.88  125634

When you ALIAS a column, you give it a new name for the report header, but a good rule of thumb is to refer to the column by the alias throughout the query.

Double Quoted Aliases are for Reserved Words and Spaces

SELECT First_Name AS Fname
      ,Last_Name Lname
      ,Class_Code "Class Code"
      ,Grade_Pt AS "AVG"
      ,Student_ID
FROM Student_Table
WHERE Fname = 'Henry'
ORDER BY "AVG" ;

The AS keyword is always optional. If spaces are in the alias, you must use double quotes. If double quotes are used, then use the double quotes throughout the SQL.

"Write a wise saying and your name will live forever." - Anonymous

When you ALIAS a column, you give it a new name for the report header, but a good rule of thumb is to refer to the column by the alias throughout the query. Whoever wrote the above quote was way off. "Write a wise alias and it will live until the query ends, bummer."

Character Data needs Single Quotes in the WHERE Clause

SELECT * FROM Student_table
WHERE First_Name = 'Henry' ;

Character data needs single quotes.

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
125634      Hanson      Henry       FR          2.88

In the WHERE clause, if you search for character data such as a first name, you need single quotes around it. You don't single-quote numbers.

Character Data needs Single Quotes, but Numbers Don't

SELECT * FROM Student_table
WHERE Grade_Pt = 0.00 ;

Numeric data never needs quotes.

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00

Character data (letters) needs single quotes, but you need NO single quotes for numbers. Remember, double quotes are for aliases, never for data values.

NULL means UNKNOWN DATA so Equal (=) won't Work

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
231222      Wilson      Susie       SO          3.80
280023      McRoberts   Richard     JR          1.90
322133      Bond        Jimmy       JR          3.95
125634      Hanson      Henry       FR          2.88
333450      Smith       Andy        SO          2.00
324652      Delaney     Danny       SR          3.35
260000      Johnson     Stanley     ?           ?
234121      Thomas      Wendy       FR          4.00
123250      Phillips    Martin      SR          3.00

SELECT * FROM Student_Table
WHERE Class_Code = NULL ;

Error

This query errors because of the equal = sign. Null is equal to nothing.

The first thing you need to know about a NULL is that it is unknown data. It is NOT a zero. It is missing data. Since we don't know what is in a NULL, you can't use an = sign. You must use IS NULL or IS NOT NULL.

Use IS NULL or IS NOT NULL when dealing with NULLs

SELECT * FROM Student_table
WHERE Class_Code IS NULL ;

The only keywords to interrogate null values are IS NULL or IS NOT NULL.

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
260000      Johnson     Stanley     ?           ?

It is the query tool that places the question marks in for null values.

If you are looking for a row that holds a NULL value, you need to put IS NULL in the WHERE clause. This will only bring back the rows with a NULL value in that column.

NULL is UNKNOWN DATA so NOT Equal won't Work

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
231222      Wilson      Susie       SO          3.80
280023      McRoberts   Richard     JR          1.90
322133      Bond        Jimmy       JR          3.95
125634      Hanson      Henry       FR          2.88
333450      Smith       Andy        SO          2.00
324652      Delaney     Danny       SR          3.35
260000      Johnson     Stanley     ?           ?
234121      Thomas      Wendy       FR          4.00
123250      Phillips    Martin      SR          3.00

SELECT * FROM Student_Table
WHERE Class_Code = NOT NULL ;

Error

This query errors because of the equal = sign. Null means no value, so it can't be equal to anything!

The same goes with = NOT NULL. We can't compare a NULL with any equal sign. We can only deal with NULL values with IS NULL and IS NOT NULL.

Use IS NULL or IS NOT NULL when dealing with NULLs

SELECT * FROM Student_Table
WHERE Class_Code IS NOT NULL ;

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
125634      Hanson      Henry       FR          2.88
280023      McRoberts   Richard     JR          1.90
231222      Wilson      Susie       SO          3.80
234121      Thomas      Wendy       FR          4.00
324652      Delaney     Danny       SR          3.35
123250      Phillips    Martin      SR          3.00
322133      Bond        Jimmy       JR          3.95
333450      Smith       Andy        SO          2.00

Much like before, when you want to bring back the rows that do not have NULLs in them, you put an IS NOT NULL in the WHERE clause.
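IS NULL and IS NOT NULL are ordinary conditions, so they can be combined with other conditions in the same WHERE clause. Two sketches against the Student_Table:

```sql
-- Freshmen, plus any student whose Class_Code is unknown.
SELECT *
FROM Student_Table
WHERE Class_Code = 'FR'
   OR Class_Code IS NULL ;

-- Only students for whom both Class_Code and Grade_Pt are known.
SELECT *
FROM Student_Table
WHERE Class_Code IS NOT NULL
  AND Grade_Pt   IS NOT NULL ;
```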

Using Greater Than or Equal To (>=)

SELECT * FROM Student_table
WHERE Grade_Pt >= 3.0 ;

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
123250      Phillips    Martin      SR          3.00
231222      Wilson      Susie       SO          3.80
234121      Thomas      Wendy       FR          4.00
322133      Bond        Jimmy       JR          3.95
324652      Delaney     Danny       SR          3.35

The WHERE clause doesn't just deal with equals. You can look for values that are greater than or less than a value, as well as values that are greater than or equal to, or less than or equal to, a value.
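The other comparison operators follow the same pattern as >=. A quick sketch of each against the Student_Table:

```sql
SELECT * FROM Student_Table WHERE Grade_Pt >  3.0 ;   -- greater than
SELECT * FROM Student_Table WHERE Grade_Pt <  3.0 ;   -- less than
SELECT * FROM Student_Table WHERE Grade_Pt <= 3.0 ;   -- less than or equal to
SELECT * FROM Student_Table WHERE Grade_Pt <> 3.0 ;   -- not equal to
```

In every case, the student whose Grade_Pt is NULL is left out. A comparison against unknown data is never true, which is why IS NULL exists.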

AND in the WHERE Clause

SELECT * FROM Student_table
WHERE Class_Code = 'FR'
AND First_Name = 'Henry' ;

Both conditions must be met.

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
125634      Hanson      Henry       FR          2.88

Notice the WHERE clause and the word AND. In this example, qualifying rows must have a Class_Code = 'FR' and also must have a First_Name of 'Henry'. Notice how the WHERE and the AND are each on their own line. Good practice!

Troubleshooting AND

SELECT *
FROM Student_Table
WHERE Grade_Pt = 3.0
AND Grade_Pt = 4.0 ;

Both conditions must be met.

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________

No rows are returned because no student can have two different Grade_Pt values.

What is going wrong here? You are using an AND to check the same column. What you are basically asking with this syntax is to see the rows that have BOTH a Grade_Pt of 3.0 and a 4.0. That is impossible, so no rows will be returned.
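Using AND twice on the same column is not always a mistake. It makes sense when the two conditions describe a range rather than two different exact values. A sketch:

```sql
-- Students whose Grade_Pt is from 3.0 through 4.0.
-- A single value can satisfy both conditions at once,
-- unlike Grade_Pt = 3.0 AND Grade_Pt = 4.0.
SELECT *
FROM Student_Table
WHERE Grade_Pt >= 3.0
  AND Grade_Pt <= 4.0 ;
```

Against the Student_Table, this qualifies the five students with grade points of 3.00, 3.35, 3.80, 3.95 and 4.00.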

OR in the WHERE Clause

SELECT *
FROM Student_Table
WHERE Grade_Pt = 3.0
OR Grade_Pt = 4.0 ;

Either condition can be met.

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
123250      Phillips    Martin      SR          3.00
234121      Thomas      Wendy       FR          4.00

Notice above in the WHERE clause we use OR. OR allows either of the conditions to be TRUE in order for the row to qualify and return.

Troubleshooting OR

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
231222      Wilson      Susie       SO          3.80
280023      McRoberts   Richard     JR          1.90
322133      Bond        Jimmy       JR          3.95
125634      Hanson      Henry       FR          2.88
333450      Smith       Andy        SO          2.00
324652      Delaney     Danny       SR          3.35
260000      Johnson     Stanley     ?           ?
234121      Thomas      Wendy       FR          4.00
123250      Phillips    Martin      SR          3.00

Error:

SELECT * FROM Student_Table
WHERE Grade_Pt = 3.0 OR 4.0 ;

Perfect:

SELECT * FROM Student_Table
WHERE Grade_Pt = 3.0 OR Grade_Pt = 4.0 ;

Notice above in the WHERE clause we use OR. OR allows either of the conditions to be TRUE in order for the row to qualify and return. The first example errors and is a common mistake: the column name must be repeated on each side of the OR. The second example is perfect.

Troubleshooting Character Data

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
231222      Wilson      Susie       SO          3.80
280023      McRoberts   Richard     JR          1.90
322133      Bond        Jimmy       JR          3.95
125634      Hanson      Henry       FR          2.88
333450      Smith       Andy        SO          2.00
324652      Delaney     Danny       SR          3.35
260000      Johnson     Stanley     ?           ?
234121      Thomas      Wendy       FR          4.00
123250      Phillips    Martin      SR          3.00

SELECT * FROM Student_Table
WHERE Grade_Pt = 3.0
AND Class_Code = SR ;

Error!!! Why?

This query errors! What is WRONG with this syntax? No single quotes around SR.

Using Different Columns in an AND Statement

SELECT * FROM Student_table
WHERE Grade_Pt = 3.0
AND Class_Code = 'SR' ;

Character data needs single quotes.

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
123250      Phillips    Martin      SR          3.00

Notice that AND separates two different columns, and the data will come back if both are TRUE.

Quiz – How many rows will return?

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
231222      Wilson      Susie       SO          3.80
280023      McRoberts   Richard     JR          1.90
322133      Bond        Jimmy       JR          3.95
125634      Hanson      Henry       FR          2.88
333450      Smith       Andy        SO          2.00
324652      Delaney     Danny       SR          3.35
260000      Johnson     Stanley     ?           ?
234121      Thomas      Wendy       FR          4.00
123250      Phillips    Martin      SR          3.00

SELECT * FROM Student_Table
WHERE Grade_Pt = 4.0 OR Grade_Pt = 3.0
AND Class_Code = 'SR' ;

Which Seniors have a 3.0 or a 4.0 Grade_Pt average? How many rows will return?

A) 2
B) 1
C) Error
D) 3

Answer to Quiz – How many rows will return?

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
231222      Wilson      Susie       SO          3.80
280023      McRoberts   Richard     JR          1.90
322133      Bond        Jimmy       JR          3.95
125634      Hanson      Henry       FR          2.88
333450      Smith       Andy        SO          2.00
324652      Delaney     Danny       SR          3.35
260000      Johnson     Stanley     ?           ?
234121      Thomas      Wendy       FR          4.00
123250      Phillips    Martin      SR          3.00

SELECT * FROM Student_Table
WHERE Grade_Pt = 4.0 OR Grade_Pt = 3.0
AND Class_Code = 'SR' ;

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
234121      Thomas      Wendy       FR          4.00
123250      Phillips    Martin      SR          3.00

We had two rows return! Isn't that a mystery? Why?

What is the Order of Precedence?

1  ( )
2  NOT
3  AND
4  OR

SELECT * FROM Student_Table
WHERE Grade_Pt = 4.0 OR Grade_Pt = 3.0
AND Class_Code = 'SR' ;

SQL has an ORDER OF PRECEDENCE. It evaluates anything with parentheses around it first. Then, it evaluates all the NOT conditions. Next, the AND conditions. FINALLY, the OR conditions. This is why the last query came out odd: the AND was applied before the OR. Let's fix it and bring back the right answer set.
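To see why two rows came back, it can help to write in the parentheses that the order of precedence implies. The query below is equivalent to the quiz query, because AND is evaluated before OR:

```sql
-- How the quiz query is actually read:
-- "any student with a 4.0, or any Senior with a 3.0".
SELECT *
FROM Student_Table
WHERE Grade_Pt = 4.0
   OR (Grade_Pt = 3.0 AND Class_Code = 'SR') ;
```

Wendy Thomas qualifies on the 4.0 alone, even though she is a Freshman, and Martin Phillips qualifies as a Senior with a 3.0.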

Using Parentheses to change the Order of Precedence

SELECT *
FROM Student_Table
WHERE (Grade_Pt = 3.0 OR Grade_Pt = 4.0)
AND Class_Code = 'SR' ;

Parentheses are evaluated first.

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
123250      Phillips    Martin      SR              3.00

This is the proper way of looking for rows that have a Grade_Pt of either 3.0 or 4.0 AND also have a Class_Code of 'SR'. Only ONE row comes back. Parentheses are evaluated first, so they allow you to direct exactly what you want to work first.

Using an IN List in place of OR

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR              0.00
231222      Wilson      Susie       SO              3.80
280023      McRoberts   Richard     JR              1.90
322133      Bond        Jimmy       JR              3.95
125634      Hanson      Henry       FR              2.88
333450      Smith       Andy        SO              2.00
324652      Delaney     Danny       SR              3.35
260000      Johnson     Stanley     ?                  ?
234121      Thomas      Wendy       FR              4.00
123250      Phillips    Martin      SR              3.00

The IN List:

SELECT *
FROM Student_Table
WHERE Grade_Pt IN (3.0, 4.0)
AND Class_Code = 'SR' ;

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
123250      Phillips    Martin      SR              3.00

Using an IN List is a great way of looking for rows that have a Grade_Pt of either 3.0 or 4.0 AND also have a Class_Code of 'SR'. Only ONE row comes back.

The IN List is an Excellent Technique

SELECT *
FROM Student_Table
WHERE Grade_Pt IN (2.0, 3.0, 4.0) ;

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
123250      Phillips    Martin      SR              3.00
234121      Thomas      Wendy       FR              4.00
333450      Smith       Andy        SO              2.00

The IN Statement avoids retyping the same column name separated by an OR. The IN allows you to search the same column for a list of values. It is a nice way to keep things easy and organized.

IN List vs. OR brings the same Results

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR              0.00
231222      Wilson      Susie       SO              3.80
280023      McRoberts   Richard     JR              1.90
322133      Bond        Jimmy       JR              3.95
125634      Hanson      Henry       FR              2.88
333450      Smith       Andy        SO              2.00
324652      Delaney     Danny       SR              3.35
260000      Johnson     Stanley     ?                  ?
234121      Thomas      Wendy       FR              4.00
123250      Phillips    Martin      SR              3.00

An IN list is a better technique:

SELECT *
FROM Student_Table
WHERE Grade_Pt IN (2.0, 3.0, 4.0) ;

Both examples produce the same results.

SELECT *
FROM Student_Table
WHERE Grade_Pt = 2.0
OR Grade_Pt = 3.0
OR Grade_Pt = 4.0 ;

The IN Statement avoids retyping the same column name separated by an OR. The IN allows you to search the same column for a list of values. Both queries above are equal, but the IN list is a nice way to keep things easy and organized. Also, the IN List has the added benefit that the optimizer can process IN lists more efficiently than OR statements, so it may run faster!
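To confirm for yourself that the IN list and the chain of ORs are interchangeable, here is a small sketch. It assumes Python's built-in sqlite3 module in place of Matrix (an assumption of this sketch), and the helper name last_names is hypothetical. It says nothing about speed, only about the rows returned.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Matrix.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student_Table (Last_Name TEXT, Grade_Pt REAL)")
conn.executemany(
    "INSERT INTO Student_Table VALUES (?, ?)",
    [
        ("Larkins", 0.00), ("Wilson", 3.80), ("McRoberts", 1.90),
        ("Bond", 3.95), ("Hanson", 2.88), ("Smith", 2.00),
        ("Delaney", 3.35), ("Johnson", None), ("Thomas", 4.00),
        ("Phillips", 3.00),
    ],
)


def last_names(where_clause):
    """Hypothetical helper: sorted Last_Name values matching a WHERE clause."""
    sql = (
        "SELECT Last_Name FROM Student_Table "
        f"WHERE {where_clause} ORDER BY Last_Name"
    )
    return [row[0] for row in conn.execute(sql)]


in_list = last_names("Grade_Pt IN (2.0, 3.0, 4.0)")
or_chain = last_names("Grade_Pt = 2.0 OR Grade_Pt = 3.0 OR Grade_Pt = 4.0")

print(in_list)              # ['Phillips', 'Smith', 'Thomas']
print(in_list == or_chain)  # True
```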

Using a NOT IN List

SELECT *
FROM Student_Table
WHERE Grade_Pt NOT IN (2.0, 3.0, 4.0) ;

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
125634      Hanson      Henry       FR              2.88
231222      Wilson      Susie       SO              3.80
280023      McRoberts   Richard     JR              1.90
322133      Bond        Jimmy       JR              3.95
324652      Delaney     Danny       SR              3.35
423400      Larkins     Michael     FR              0.00

You can also ask to see the results that ARE NOT IN your parameter list. That requires the column name and a NOT IN. Neither the IN nor the NOT IN can search for NULLs, which is why Stanley Johnson (whose Grade_Pt is NULL) is missing above!

A Technique for Handling Nulls with a NOT IN List

SELECT *
FROM Student_Table
WHERE Grade_Pt NOT IN (2.0, 3.0, 4.0)
OR Grade_Pt IS NULL ;

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR              0.00
231222      Wilson      Susie       SO              3.80
280023      McRoberts   Richard     JR              1.90
322133      Bond        Jimmy       JR              3.95
125634      Hanson      Henry       FR              2.88
324652      Delaney     Danny       SR              3.35
260000      Johnson     Stanley     ?                  ?

The null row now comes back.

This is a great technique to look for a NULL when using a NOT IN List.

Another Technique for Handling Nulls with a NOT IN List

SELECT *
FROM Student_Table
WHERE Grade_Pt NOT IN (2.0, 3.0, 4.0)
AND Grade_Pt IS NOT NULL ;

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR              0.00
231222      Wilson      Susie       SO              3.80
280023      McRoberts   Richard     JR              1.90
322133      Bond        Jimmy       JR              3.95
125634      Hanson      Henry       FR              2.88
324652      Delaney     Danny       SR              3.35

Null rows do NOT come back.

This is a great technique to eliminate any NULL values when using a NOT IN List.
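The three NOT IN variations can be compared side by side with a short sketch. It assumes Python's built-in sqlite3 module as a stand-in for Matrix (an assumption of this sketch), and the helper name last_names is hypothetical. Python's None plays the role of the NULL shown as ? in the answer sets.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Matrix.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student_Table (Last_Name TEXT, Grade_Pt REAL)")
conn.executemany(
    "INSERT INTO Student_Table VALUES (?, ?)",
    [
        ("Larkins", 0.00), ("Wilson", 3.80), ("McRoberts", 1.90),
        ("Bond", 3.95), ("Hanson", 2.88), ("Smith", 2.00),
        ("Delaney", 3.35), ("Johnson", None), ("Thomas", 4.00),
        ("Phillips", 3.00),
    ],
)


def last_names(where_clause):
    """Hypothetical helper: sorted Last_Name values matching a WHERE clause."""
    sql = (
        "SELECT Last_Name FROM Student_Table "
        f"WHERE {where_clause} ORDER BY Last_Name"
    )
    return [row[0] for row in conn.execute(sql)]


# Plain NOT IN: the NULL row (Johnson) silently drops out.
not_in = last_names("Grade_Pt NOT IN (2.0, 3.0, 4.0)")

# Adding OR ... IS NULL brings the NULL row back.
not_in_or_null = last_names(
    "Grade_Pt NOT IN (2.0, 3.0, 4.0) OR Grade_Pt IS NULL"
)

# Adding AND ... IS NOT NULL states the exclusion of NULLs explicitly.
not_in_not_null = last_names(
    "Grade_Pt NOT IN (2.0, 3.0, 4.0) AND Grade_Pt IS NOT NULL"
)

print(len(not_in), len(not_in_or_null), len(not_in_not_null))  # 6 7 6
print("Johnson" in not_in_or_null)                             # True
```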

BETWEEN is Inclusive

SELECT *
FROM Student_Table
WHERE Grade_Pt BETWEEN 2.0 AND 4.0 ;

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
125634      Hanson      Henry       FR              2.88
231222      Wilson      Susie       SO              3.80
324652      Delaney     Danny       SR              3.35
322133      Bond        Jimmy       JR              3.95
234121      Thomas      Wendy       FR              4.00
333450      Smith       Andy        SO              2.00
123250      Phillips    Martin      SR              3.00

2.0 and 4.0 come back in the answer set. The BETWEEN statement is therefore inclusive.

This is a BETWEEN. It allows you to see if a column falls in a range. It is inclusive, meaning that in our example we also get the rows that have exactly a 2.0 or a 4.0 in their Grade_Pt column!

NOT BETWEEN is Also Inclusive

SELECT *
FROM Student_Table
WHERE Grade_Pt NOT BETWEEN 2.0 AND 4.0 ;

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
280023      McRoberts   Richard     JR              1.90
423400      Larkins     Michael     FR              0.00

NOT BETWEEN is also inclusive.

This is a NOT BETWEEN example. It allows you to see if a column does not fall in a range. It is inclusive, meaning that in our example no rows come back where the Grade_Pt is anywhere from 2.0 through 4.0. The 2.0 and the 4.0 rows will also not return.
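Here is a quick sketch for checking the end points of BETWEEN and NOT BETWEEN. It assumes Python's built-in sqlite3 module in place of Matrix (an assumption of this sketch), and the helper name last_names is hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Matrix.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student_Table (Last_Name TEXT, Grade_Pt REAL)")
conn.executemany(
    "INSERT INTO Student_Table VALUES (?, ?)",
    [
        ("Larkins", 0.00), ("Wilson", 3.80), ("McRoberts", 1.90),
        ("Bond", 3.95), ("Hanson", 2.88), ("Smith", 2.00),
        ("Delaney", 3.35), ("Johnson", None), ("Thomas", 4.00),
        ("Phillips", 3.00),
    ],
)


def last_names(where_clause):
    """Hypothetical helper: sorted Last_Name values matching a WHERE clause."""
    sql = (
        "SELECT Last_Name FROM Student_Table "
        f"WHERE {where_clause} ORDER BY Last_Name"
    )
    return [row[0] for row in conn.execute(sql)]


between = last_names("Grade_Pt BETWEEN 2.0 AND 4.0")
not_between = last_names("Grade_Pt NOT BETWEEN 2.0 AND 4.0")

# Smith (2.00) and Thomas (4.00) sit exactly on the end points.
print("Smith" in between, "Thomas" in between)  # True True
print(not_between)                              # ['Larkins', 'McRoberts']
```

Notice that Johnson appears in neither list. His Grade_Pt is NULL, so he is neither inside the range nor outside of it.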

LIKE command Underscore is Wildcard for one Character

SELECT *
FROM Student_Table
WHERE Last_Name LIKE '_a%' ;

Any Last_Name that has an 'a' in the second character qualifies. An Underscore _ is a wildcard for a single character when used with the LIKE command.

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
125634      Hanson      Henry       FR              2.88
423400      Larkins     Michael     FR              0.00

The _ underscore sign is a wildcard for any single character. We are looking for anyone who has an 'a' as the second letter of their last name.

LIKE Command Works Differently on Char Vs Varchar

SELECT *
FROM Student_Table
WHERE Last_Name LIKE '%n' ;

Any Last_Name that ends in 'n'

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
125634      Hanson      Henry       FR              2.88
231222      Wilson      Susie       SO              3.80
260000      Johnson     Stanley     ?                  ?

It is important that you know the data type of the column you are using with your LIKE command. VARCHAR and CHAR data differ slightly.

The Ilike Command Is NOT Case Sensitive

With Matrix, the ilike command is NOT case sensitive, but the like command is case sensitive. These rows came back because they have an 'AR' in positions 2 and 3 of their last_name. The 'AR' is not really capitalized in the data, but that is why you use the ilike command. It doesn't care about case!

Troubleshooting LIKE Command on Character Data

SELECT *
FROM Student_Table
WHERE Last_Name LIKE '%n' ;

Any Last_Name that ends in 'n'

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________

No rows returned because Last_Name has a data type of Char(20), so technically it ends with padded spaces.

This is a CHAR(20) data type. That means that any value under 20 characters is padded with trailing spaces until it reaches 20 characters. You will not get any rows back from this example because technically, no row ends in an 'n', but instead ends in a space.

Introducing the TRIM Command

Student_Table (Last_Name has a Data Type of CHAR (20))

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR              0.00
231222      Wilson      Susie       SO              3.80
280023      McRoberts   Richard     JR              1.90
322133      Bond        Jimmy       JR              3.95
125634      Hanson      Henry       FR              2.88
333450      Smith       Andy        SO              2.00
324652      Delaney     Danny       SR              3.35
260000      Johnson     Stanley     ?                  ?
234121      Thomas      Wendy       FR              4.00
123250      Phillips    Martin      SR              3.00

SELECT Last_Name
FROM Student_Table
WHERE TRIM (Last_Name) LIKE '%n' ;

Last_Name
__________
Hanson
Wilson
Johnson

This is a CHAR(20) data type. That means that every Last_Name is going to be 20 characters long. Most names are not really 20 characters long, so spaces are padded at the end to ensure filling up all 20 characters. We need to do the TRIM command to remove the leading and trailing spaces. Once the spaces are trimmed, we can find out whose name ends in 'n'.

Quiz – What Data is Left Justified and what is Right?

SELECT *
FROM Sample_Table
WHERE Column1 IS NULL
AND Column2 IS NULL ;

Answer Set

Column1  Column2
_______  _______
      ?  ?

Column1 is Right Justified. Integers are Right Justified!
Column2 is Left Justified. Character Data is Left Justified!

Which Column from the Answer Set could have a DATA TYPE of INTEGER, and which could have Character Data?

Numbers are Right Justified and Character Data is Left

SELECT *
FROM Sample_Table
WHERE Column1 IS NULL
AND Column2 IS NULL ;

Answer Set

Column1  Column2
_______  _______
      ?  ?

Column1 is Right Justified. Integers are Right Justified!
Column2 is Left Justified. Character Data is Left Justified!

All Integers will start from the right and move left. Thus, Column1 was defined during the table create statement to hold an INTEGER. The next page shows a clear example.

Answer – What Data is Left Justified and what is Right?

SELECT Employee_No, First_Name
FROM Employee_Table
WHERE Employee_No = 2000000 ;

Answer Set

Employee_No  First_Name
___________  __________
    2000000  Squiggy

Integers are Right justified! Characters are Left justified!

All Integers will start from the right and move left. All Character data will start from the left and move to the right.

An Example of Data with Left and Right Justification

SELECT Student_ID, Last_Name
FROM Student_Table ;

Student_ID  Last_Name
__________  _________
    423400  Larkins
    125634  Hanson
    280023  McRoberts
    260000  Johnson
    231222  Wilson
    234121  Thomas
    324652  Delaney
    123250  Phillips
    322133  Bond
    333450  Smith

Integers are Right justified! Characters are Left justified!

This is how a standard result set will look. Notice that the integer type in Student_ID starts from the right and goes left. Character data type in Last_Name moves left to right like we are used to seeing while reading English.

A Visual of CHARACTER Data vs. VARCHAR Data

Character Data on Disk: Last_Name as a Char(20) (spaces padded at the end)

Jones _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Hanson _ _ _ _ _ _ _ _ _ _ _ _ _ _
McRoberts _ _ _ _ _ _ _ _ _ _ _
Johnson _ _ _ _ _ _ _ _ _ _ _ _ _

Varchar Data on Disk: Last_Name as a Varchar(20) with a 2-byte VLI (Variable Length Indicator) and no spaces

0 5 Jones
0 6 Hanson
0 9 McRoberts
0 7 Johnson

Character data pads spaces to the right and Varchar uses a 2-byte VLI instead.

Use the TRIM command to remove spaces on CHAR Data

Character Data on Disk: Last_Name as a Char(20) (spaces padded at the end)

Jones _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Hanson _ _ _ _ _ _ _ _ _ _ _ _ _ _
Wilson _ _ _ _ _ _ _ _ _ _ _ _ _ _
Johnson _ _ _ _ _ _ _ _ _ _ _ _ _

Last_Name has a Data Type of CHAR (20)

SELECT Last_Name
FROM Student_Table
WHERE TRIM (Last_Name) LIKE '%n' ;

Trim removes spaces at the front and back.

Last_Name
__________
Hanson
Wilson
Johnson

By using the TRIM command on the Last_Name column, you are able to trim off any spaces from the end. Once we use the TRIM on Last_Name, we have eliminated any spaces at the end, so now we are set to bring back anyone with a Last_Name that truly ends in 'n'!
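The padding problem can be imitated with a short sketch. It assumes Python's built-in sqlite3 module as a stand-in for Matrix. SQLite does not pad CHAR columns on its own, so the sketch pads each name to 20 characters by hand to mimic CHAR(20). Both of those are assumptions of this sketch.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Matrix.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student_Table (Last_Name TEXT)")

names = ["Larkins", "Wilson", "McRoberts", "Bond", "Hanson",
         "Smith", "Delaney", "Johnson", "Thomas", "Phillips"]

# Pad every name with trailing spaces to 20 characters to mimic CHAR(20).
conn.executemany(
    "INSERT INTO Student_Table VALUES (?)",
    [(name.ljust(20),) for name in names],
)

# Without TRIM, every value ends in a space, so nothing ends in 'n'.
padded = [row[0] for row in conn.execute(
    "SELECT Last_Name FROM Student_Table WHERE Last_Name LIKE '%n'"
)]

# With TRIM, the trailing spaces are gone before LIKE looks at the value.
trimmed = [row[0] for row in conn.execute(
    "SELECT TRIM(Last_Name) FROM Student_Table "
    "WHERE TRIM(Last_Name) LIKE '%n' ORDER BY 1"
)]

print(padded)   # []
print(trimmed)  # ['Hanson', 'Johnson', 'Wilson']
```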

Like and Your Escape Character of Choice

We set the Escape Character to an @ sign. The underscore just after the @ sign is no longer a wildcard.

Sometimes you want to use the LIKE command, but you also want to search for the literal values of percent (%) or Underscore (_). You can turn off these wildcards by using an escape character. This example uses the escape character @ to search for strings that include "_" just after the word "start". The @ sign just in front of the underscore (_) means that the underscore is no longer a wildcard, but an actual literal underscore.
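Here is a minimal sketch of an escape character of choice. It assumes Python's built-in sqlite3 module as a stand-in for Matrix, and the table Sample_Columns with its four values is made up purely for illustration. Neither comes from the original example.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Matrix.
# Sample_Columns and its values are hypothetical illustration data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sample_Columns (Column_Name TEXT)")
conn.executemany(
    "INSERT INTO Sample_Columns VALUES (?)",
    [("start_date",), ("starttime",), ("restart_count",), ("startup",)],
)

# With ESCAPE '@', the underscore after @ is a literal underscore.
escaped = [row[0] for row in conn.execute(
    "SELECT Column_Name FROM Sample_Columns "
    "WHERE Column_Name LIKE '%start@_%' ESCAPE '@' ORDER BY 1"
)]

# Without an escape character, the underscore matches any one character.
wildcard = [row[0] for row in conn.execute(
    "SELECT Column_Name FROM Sample_Columns "
    "WHERE Column_Name LIKE '%start_%' ORDER BY 1"
)]

print(escaped)        # ['restart_count', 'start_date']
print(len(wildcard))  # 4
```

Only the two values with a real underscore after "start" survive the escaped search. The unescaped search lets all four through.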

Like and the Default Escape Character

The \\ are the default escape characters.

Sometimes you want to use the LIKE command, but you also want to search for the literal values of percent (%) or Underscore (_). You can turn off these wildcards by using an escape character. This example uses the default escape characters \\ to search for strings that include an underscore "_" just after the word "start". The \\ just in front of the underscore means that the underscore is no longer a wildcard, but a literal underscore.

Similar To Operators

POSIX pattern matching supports the following metacharacters:

.       Matches any single character.
*       Matches zero or more occurrences.
+       Matches one or more occurrences.
?       Matches zero or one occurrence.
|       Specifies alternative matches; for example, E | H means E or H.
^       Matches the beginning-of-line character.
$       Matches the end-of-line character.
$       Matches the end of the string.
[]      Brackets specify a matching list, that should match one expression in the list. A caret (^) precedes a nonmatching list, which matches any character except for the expressions represented in the list.
()      Parentheses group items into a single logical item.
{m}     Repeat the previous item exactly m times.
{m,}    Repeat the previous item m or more times.
{m,n}   Repeat the previous item at least m and not more than n times.
[: :]   Matches any character within a POSIX character class. In the following character classes, Actian Matrix supports only ASCII characters: [:alnum:], [:alpha:], [:lower:], [:upper:]

Similar To Operators

%       Matches any sequence of zero or more characters.
_       Matches any single character.
|       Denotes alternation (either of two alternatives).
*       Repeat the previous item zero or more times.
+       Repeat the previous item one or more times.
?       Repeat the previous item zero or one time.
{m}     Repeat the previous item exactly m times.
{m,}    Repeat the previous item m or more times.
{m,n}   Repeat the previous item at least m and not more than n times.
()      Parentheses group items into a single logical item.
[…]     A bracket expression specifies a character class, just as in POSIX regular expressions.

SELECT First_Name
,Last_Name
FROM Employee_Table
WHERE First_Name similar to '%e%|%h%'
ORDER BY First_Name ;

The example above finds all employees with a First_Name that contains "e" or "h". Regular expression matching using SIMILAR TO is computationally expensive. We recommend using LIKE whenever possible, especially when processing a very large number of rows. When two queries are functionally identical, the one that uses LIKE can execute several times faster than the one that uses a regular expression. The next page shows the answer set.

Similar To Example with Lower Case Letters

The example above finds all employees with a First_Name that contains an "e" or an "h". Herbert did not return because of an 'h' (his 'H' is a capital), but he returned because he does have an 'e' in his First_Name.


Similar To Example with Lower and Upper Case Letters

The example above finds all employees with a First_Name that contains an "i" or a capital "H". Notice that "John" is no longer in the answer set (like he was in the previous example). John has an "h" in it, but not a capital "H".

Similar To Example with Multiple Occurrences

The example above finds all employees with a First_Name that contains two l's. Both Billy and William contain two 'l's. Notice that both names have the letter 'l' back to back. Notice that the name William also has the letter 'i' in it twice, but if I had changed the query to look for 'i' instead of 'l', then William would not have come back. The occurrences must follow consecutively. I will show the same query on the next page, but use the 'i' instead of 'l'.

Multiple Occurrences Must Be Consecutive

The name William has the letter 'i' in it twice, but no rows came back. This is because the occurrences must follow each other consecutively. There needs to be a name with two occurrences of the letter 'i' back to back!


Chapter 9 – Distinct Vs Group By AND TOP

“A bird does not sing because it has the answers, it sings because it has a song.” - Anonymous


The Distinct Command

SELECT DISTINCT Class_Code
FROM Student_Table
ORDER BY 1 ;

The keyword DISTINCT won't allow duplicate Class_Code values to return.

Class_Code
__________
FR
JR
SO
SR
?

DISTINCT eliminates duplicates from returning in the Answer Set.

Distinct vs. GROUP BY

SELECT Class_Code
FROM Student_Table
GROUP BY Class_Code
ORDER BY 1 ;

SELECT Distinct Class_Code
FROM Student_Table
ORDER BY 1 ;

Both examples produce the exact same result.

Class_Code
__________
FR
JR
SO
SR
?

Rules for Distinct Vs. GROUP BY
(1) Many Duplicates – use GROUP BY
(2) Few Duplicates – use DISTINCT
(3) Space Exceeded – use GROUP BY

Distinct and GROUP BY in the two examples return the same answer set.

Quiz – How many rows come back from the Distinct?

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR              0.00
231222      Wilson      Susie       SO              3.80
280023      McRoberts   Richard     JR              1.90
322133      Bond        Jimmy       JR              3.95
125634      Hanson      Henry       FR              2.88
333450      Smith       Andy        SO              2.00
324652      Delaney     Danny       SR              3.35
260000      Johnson     Stanley     ?                  ?
234121      Thomas      Wendy       FR              4.00
123250      Phillips    Martin      SR              3.00

SELECT Distinct Class_Code, Grade_Pt
FROM Student_Table
ORDER BY Class_Code, Grade_Pt ;

How many rows will come back from the above SQL?

Answer – How many rows come back from the Distinct?

SELECT Distinct Class_Code, Grade_Pt
FROM Student_Table
ORDER BY Class_Code, Grade_Pt ;

Class_Code  Grade_Pt
__________  ________
FR              0.00
FR              2.88
FR              4.00
JR              1.90
JR              3.95
SO              2.00
SO              3.80
SR              3.00
SR              3.35
?                  ?

No Rows have the exact same values for both the Class_Code and Grade_Pt. Each row is Distinct!

How many rows will come back from the above SQL? 10. All rows came back. Why? Because no two rows share the same combination of Class_Code and Grade_Pt. DISTINCT applies to the whole SELECT list, and each combination in it is distinct.
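The DISTINCT and GROUP BY examples can be tried together in a short sketch. It assumes Python's built-in sqlite3 module in place of Matrix (an assumption of this sketch). One difference to expect: SQLite sorts the NULL first, while the answer sets in this chapter show it last.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Matrix.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student_Table (Class_Code TEXT, Grade_Pt REAL)")
conn.executemany(
    "INSERT INTO Student_Table VALUES (?, ?)",
    [
        ("FR", 0.00), ("SO", 3.80), ("JR", 1.90), ("JR", 3.95),
        ("FR", 2.88), ("SO", 2.00), ("SR", 3.35), (None, None),
        ("FR", 4.00), ("SR", 3.00),
    ],
)

# One column: DISTINCT and GROUP BY collapse the same duplicates.
distinct_codes = [row[0] for row in conn.execute(
    "SELECT DISTINCT Class_Code FROM Student_Table ORDER BY 1"
)]
grouped_codes = [row[0] for row in conn.execute(
    "SELECT Class_Code FROM Student_Table GROUP BY Class_Code ORDER BY 1"
)]

# Two columns: DISTINCT looks at the combination, and no pair repeats.
distinct_pairs = conn.execute(
    "SELECT DISTINCT Class_Code, Grade_Pt FROM Student_Table"
).fetchall()

print(distinct_codes)                   # [None, 'FR', 'JR', 'SO', 'SR']
print(distinct_codes == grouped_codes)  # True
print(len(distinct_pairs))              # 10
```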

TOP Command

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR              0.00
231222      Wilson      Susie       SO              3.80
280023      McRoberts   Richard     JR              1.90
322133      Bond        Jimmy       JR              3.95
125634      Hanson      Henry       FR              2.88
333450      Smith       Andy        SO              2.00
324652      Delaney     Danny       SR              3.35
260000      Johnson     Stanley     ?                  ?
234121      Thomas      Wendy       FR              4.00
123250      Phillips    Martin      SR              3.00

SELECT TOP 3 Last_Name, Class_Code, Grade_Pt
FROM sql_class.Student_Table ;

Last_Name   Class_Code  Grade_Pt
__________  __________  ________
Hanson      FR              2.88
Bond        JR              3.95
Smith       SO              2.00

In the above example, we brought back 3 rows only. This is because of the TOP 3 statement, which means to get an answer set and then bring back the first 3 rows in that answer set. Because this example does not have an ORDER BY statement, you can consider this example as merely bringing back 3 random rows.

TOP Command is brilliant when ORDER BY is Used!

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR              0.00
231222      Wilson      Susie       SO              3.80
280023      McRoberts   Richard     JR              1.90
322133      Bond        Jimmy       JR              3.95
125634      Hanson      Henry       FR              2.88
333450      Smith       Andy        SO              2.00
324652      Delaney     Danny       SR              3.35
260000      Johnson     Stanley     ?                  ?
234121      Thomas      Wendy       FR              4.00
123250      Phillips    Martin      SR              3.00

SELECT TOP 3 Last_Name, Class_Code, Grade_Pt
FROM sql_class.Student_Table
WHERE grade_pt is not null
ORDER BY Grade_Pt DESC ;

Last_Name   Class_Code  Grade_Pt
__________  __________  ________
Thomas      FR              4.00
Bond        JR              3.95
Wilson      SO              3.80

In the above example, we brought back 3 rows only. This is because of the TOP 3 statement, which means to get an answer set and then bring back the first 3 rows. Because this example uses an ORDER BY statement, the data brought back is from the top 3 students with the highest Grade_Pt. This is the real power of the TOP command. Use it with an ORDER BY!

What is the Difference between TOP and LIMIT?

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR              0.00
231222      Wilson      Susie       SO              3.80
280023      McRoberts   Richard     JR              1.90
322133      Bond        Jimmy       JR              3.95
125634      Hanson      Henry       FR              2.88
333450      Smith       Andy        SO              2.00
324652      Delaney     Danny       SR              3.35
260000      Johnson     Stanley     ?                  ?
234121      Thomas      Wendy       FR              4.00
123250      Phillips    Martin      SR              3.00

SELECT TOP 3 Last_Name, Class_Code, Grade_Pt
FROM sql_class.Student_Table
WHERE grade_pt is not null
ORDER BY Grade_Pt DESC ;

SELECT Last_Name, Class_Code, Grade_Pt
FROM sql_class.Student_Table
WHERE grade_pt is not null
ORDER BY Grade_Pt DESC
LIMIT 3 ;

Both queries above bring back the top 3 students with the highest grade_pt. The TOP command is designed to bring back the top n rows. The LIMIT clause is used more often if you merely want to see a quick sample, but both techniques will work with an ORDER BY statement and both can utilize an ORDER BY statement in the creation of a view.
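The LIMIT form of the query can be tried with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for Matrix (an assumption of this sketch). SQLite has no TOP keyword, so only the LIMIT version is shown here.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Matrix.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student_Table (Last_Name TEXT, Class_Code TEXT, Grade_Pt REAL)"
)
conn.executemany(
    "INSERT INTO Student_Table VALUES (?, ?, ?)",
    [
        ("Larkins", "FR", 0.00), ("Wilson", "SO", 3.80),
        ("McRoberts", "JR", 1.90), ("Bond", "JR", 3.95),
        ("Hanson", "FR", 2.88), ("Smith", "SO", 2.00),
        ("Delaney", "SR", 3.35), ("Johnson", None, None),
        ("Thomas", "FR", 4.00), ("Phillips", "SR", 3.00),
    ],
)

# The ORDER BY decides which rows are "first"; LIMIT then keeps 3 of them.
top_three = conn.execute(
    "SELECT Last_Name, Class_Code, Grade_Pt "
    "FROM Student_Table "
    "WHERE Grade_Pt IS NOT NULL "
    "ORDER BY Grade_Pt DESC "
    "LIMIT 3"
).fetchall()

for last_name, class_code, grade_pt in top_three:
    print(last_name, class_code, grade_pt)
# Thomas FR 4.0
# Bond JR 3.95
# Wilson SO 3.8
```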


Chapter 10 - Aggregation

“Matrix climbed Aggregate Mountain and delivered a better way to Sum It.” - Tera-Tom Coffing


Quiz – You calculate the Answer Set in your own Mind

Aggregation_Table

Employee_No  Salary
___________  _________
423400       100000.00
423401       100000.00
423402       NULL

SELECT AVG(Salary) as "AVG"
,Count(Salary) as SalCnt
,Count(*) as RowCnt
FROM Aggregation_Table ;

AVG        SalCnt  RowCnt
_________  ______  ______

Please fill in the values you think will be in the Answer.

What would the result set be from the above query? The next slide shows answers!

Answer – You calculate the Answer Set in your own Mind

Aggregation_Table

Employee_No  Salary
___________  _________
423400       100000.00
423401       100000.00
423402       NULL

Aggregates ignore Null values.

SELECT AVG(Salary) as "AVG"
,Count(Salary) as SalCnt
,Count(*) as RowCnt
FROM Aggregation_Table ;

AVG        SalCnt  RowCnt
_________  ______  ______
100000.00       2       3

Here are the correct answers!

The 3 Rules of Aggregation

Aggregation_Table

Employee_No  Salary
___________  _________
423400       100000.00
423401       100000.00
423402       NULL

SELECT AVG(Salary) as "AVG"
,Count(Salary) as SalCnt
,Count(*) as RowCnt
FROM Aggregation_Table ;

1) Aggregates Ignore Null Values.
2) Aggregates WANT to come back in one row.
3) You CAN’T mix Aggregates with normal columns unless you use a GROUP BY.

AVG(Salary) = $100000.00
Count(Salary) = 2
Count(*) = 3
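The first two rules can be checked with a few lines. This sketch assumes Python's built-in sqlite3 module in place of Matrix (an assumption of this sketch), with Python's None standing in for the NULL salary.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Matrix.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Aggregation_Table (Employee_No INTEGER, Salary REAL)")
conn.executemany(
    "INSERT INTO Aggregation_Table VALUES (?, ?)",
    [(423400, 100000.00), (423401, 100000.00), (423402, None)],
)

# Rule 2: with no GROUP BY, the aggregates come back in a single row.
rows = conn.execute(
    "SELECT AVG(Salary), COUNT(Salary), COUNT(*) FROM Aggregation_Table"
).fetchall()
avg_salary, sal_cnt, row_cnt = rows[0]

# Rule 1: AVG and COUNT(Salary) skip the NULL; COUNT(*) counts every row.
print(len(rows))    # 1
print(avg_salary)   # 100000.0
print(sal_cnt)      # 2
print(row_cnt)      # 3
```

The average is 100000.00 and not 66666.67 because the NULL salary is left out of both the sum and the count.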

There are Five Aggregates

SELECT MIN (Salary)
,MAX (Salary)
,SUM (Salary)
,AVG (Salary)
,COUNT(*)
FROM Employee_Table ;

These are the five aggregates. Aggregates are designed to return a single row.

MIN       MAX       SUM        AVG       COUNT
________  ________  _________  ________  _____
32800.50  64300.00  421039.38  46782.15      9

The five aggregates are listed above.

Quiz – How many rows come back?

Employee_Table

Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

SELECT MIN (Salary)
,MAX (Salary)
,SUM (Salary)
,AVG (Salary)
,Count(*)
FROM Employee_Table ;

How many rows will the above query produce in the result set?

Answer – How many rows come back?

SELECT MIN (Salary)
,MAX (Salary)
,SUM (Salary)
,AVG (Salary)
,COUNT(*)
FROM Employee_Table ;

These are the five aggregates. Aggregates are designed to return a single row.

MIN       MAX       SUM        AVG       COUNT
________  ________  _________  ________  _____
32800.50  64300.00  421039.38  46782.15      9

How many rows will the above query produce in the result set? The answer is one.

Troubleshooting Aggregates

Employee_Table

Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

SELECT Dept_No
,MIN (Salary)
,MAX (Salary)
,SUM (Salary)
,AVG (Salary)
,Count(*)
FROM Employee_Table ;

Dept_No is a NON-Aggregate.

If you have a normal column (non aggregate) in your query, you must have a corresponding GROUP BY statement.

GROUP BY when Aggregates and Normal Columns Mix

NON-Aggregate: Group By Needed

If you have a normal column (non aggregate) in your query, you must have a corresponding GROUP BY statement.

GROUP BY delivers one row per Group

SELECT Dept_No
,MIN (Salary)
,MAX (Salary)
,SUM (Salary)
,AVG (Salary)
,Count(*)
FROM Employee_Table
GROUP BY Dept_No
ORDER BY Dept_No ;

Dept_No is a NON-Aggregate, so a Group By is needed.

Dept_No  Min(Salary)  Max(Salary)  Sum(Salary)  AVG(Salary)  Count(*)
_______  ___________  ___________  ___________  ___________  ________
10          64300.00     64300.00     64300.00     64300.00         1
100         48850.00     48850.00     48850.00     48850.00         1
200         41888.88     48000.00     89888.88     44944.44         2
300         40200.00     40200.00     40200.00     40200.00         1
400         36000.00     54500.00    145000.00     48333.33         3
?           32800.50     32800.50     32800.50     32800.50         1

The GROUP BY Dept_No command allows the Aggregates to be calculated per Dept_No. The data has also been sorted with the ORDER BY statement.
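Here is a sketch of the one-row-per-group behavior. It assumes Python's built-in sqlite3 module as a stand-in for Matrix (an assumption of this sketch), and it loads only the Dept_No and Salary columns. SQLite sorts the NULL Dept_No first, where the answer set above shows it last.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Matrix.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee_Table (Dept_No INTEGER, Salary REAL)")
conn.executemany(
    "INSERT INTO Employee_Table VALUES (?, ?)",
    [
        (None, 32800.50), (10, 64300.00), (100, 48850.00),
        (200, 41888.88), (200, 48000.00), (300, 40200.00),
        (400, 54500.00), (400, 36000.00), (400, 54500.00),
    ],
)

# Nine employee rows go in; one row per distinct Dept_No comes out.
per_dept = conn.execute(
    "SELECT Dept_No, MIN(Salary), MAX(Salary), COUNT(*) "
    "FROM Employee_Table "
    "GROUP BY Dept_No "
    "ORDER BY Dept_No"
).fetchall()

print(len(per_dept))  # 6
for dept_no, min_sal, max_sal, cnt in per_dept:
    print(dept_no, min_sal, max_sal, cnt)
```

The NULL Dept_No forms a group of its own, which is why six rows come back and not five.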

GROUP BY Dept_No or GROUP BY 1 are the same thing

SELECT Dept_No
,MIN (Salary)
,MAX (Salary)
,SUM (Salary)
,AVG (Salary)
,Count(*)
FROM Employee_Table
GROUP BY Dept_No
ORDER BY Dept_No ;

SELECT Dept_No
,MIN (Salary)
,MAX (Salary)
,SUM (Salary)
,AVG (Salary)
,Count(*)
FROM Employee_Table
GROUP BY 1
ORDER BY 1 ;

Both Queries are exactly the same.

Dept_No  Min(Salary)  Max(Salary)  Sum(Salary)  AVG(Salary)  Count(*)
_______  ___________  ___________  ___________  ___________  ________
?           32800.50     32800.50     32800.50     32800.50         1
10          64300.00     64300.00     64300.00     64300.00         1
100         48850.00     48850.00     48850.00     48850.00         1
200         41888.88     48000.00     89888.88     44944.44         2
300         40200.00     40200.00     40200.00     40200.00         1
400         36000.00     54500.00    145000.00     48333.33         3

Both queries above produce the same result. The GROUP BY allows you to either name the column or use its position number in the SELECT list, just like the ORDER BY.

Limiting Rows and Improving Performance with WHERE

Employee_Table

Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

SELECT Dept_No, MIN (Salary), MAX (Salary), SUM (Salary), AVG (Salary), COUNT(*)
FROM Employee_Table
WHERE Dept_No IN (200, 400)
GROUP BY Dept_No
ORDER BY 1 ;

The WHERE Clause acts as a filter before any Calculations are done.

Will Dept_No 300 be calculated? Of course you know it will…NOT!

WHERE Clause in Aggregation Limits Unneeded Calculations

SELECT Dept_No, MIN(Salary), MAX(Salary), SUM(Salary), AVG(Salary), COUNT(*)
FROM Employee_Table
WHERE Dept_No IN (200, 400)
GROUP BY Dept_No
ORDER BY 1 ;

The WHERE clause acts as a filter before any calculations are done.

Dept_No  Min(Salary)  Max(Salary)  Sum(Salary)  AVG(Salary)  Count(*)
_______  ___________  ___________  ___________  ___________  ________
200      41888.88     48000.00     89888.88     44944.44     2
400      36000.00     54500.00     145000.00    48333.33     3

The system skips every Dept_No other than 200 and 400, so only rows for those two departments come off the disk to be calculated.

Keyword HAVING Tests Aggregates after they are Totaled

SELECT Dept_No, MIN(Salary), MAX(Salary), SUM(Salary), AVG(Salary), COUNT(*)
FROM Employee_Table
WHERE Dept_No IN (200, 400)
GROUP BY Dept_No
HAVING COUNT(*) > 2 ;

The HAVING clause acts as a filter on all aggregates after they are totaled.

Previous Answer Set

Dept_No  Min(Salary)  Max(Salary)  Sum(Salary)  AVG(Salary)  Count(*)
_______  ___________  ___________  ___________  ___________  ________
200      41888.88     48000.00     89888.88     44944.44     2
400      36000.00     54500.00     145000.00    48333.33     3

NEW Answer Set

??????????????

Can you calculate what the new Answer Set will be after the HAVING clause is implemented?

The HAVING clause only works on aggregate totals. The WHERE clause filters rows out before the calculation, while HAVING filters the aggregate totals after the calculation, eliminating the totals that fail its test.

Keyword HAVING is like an Extra WHERE Clause for Totals

SELECT Dept_No, MIN(Salary), MAX(Salary), SUM(Salary), AVG(Salary), COUNT(*)
FROM Employee_Table
WHERE Dept_No IN (200, 400)
GROUP BY Dept_No
HAVING COUNT(*) > 2 ;

The HAVING clause acts as a filter on all aggregates after they are totaled.

Previous Answer Set (without the HAVING clause)

Dept_No  Min(Salary)  Max(Salary)  Sum(Salary)  AVG(Salary)  Count(*)
_______  ___________  ___________  ___________  ___________  ________
200      41888.88     48000.00     89888.88     44944.44     2
400      36000.00     54500.00     145000.00    48333.33     3

New Answer Set using the HAVING clause

Dept_No  Min(Salary)  Max(Salary)  Sum(Salary)  AVG(Salary)  Count(*)
_______  ___________  ___________  ___________  ___________  ________
400      36000.00     54500.00     145000.00    48333.33     3

The HAVING clause only works on aggregate totals. In the above example, only groups with Count(*) > 2 are returned.
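The difference between WHERE and HAVING can be checked with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Matrix (an assumption made purely so the example can be run anywhere), loads the Employee_Table rows shown earlier, and runs the query with and without HAVING. The variable names are illustrative.

```python
import sqlite3

# SQLite is used here only as a convenient stand-in for Matrix (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee_Table (Employee_No INTEGER, Dept_No INTEGER, "
    "Last_Name TEXT, First_Name TEXT, Salary REAL)"
)
conn.executemany(
    "INSERT INTO Employee_Table VALUES (?, ?, ?, ?, ?)",
    [
        (2000000, None, "Jones", "Squiggy", 32800.50),
        (1000234, 10, "Smythe", "Richard", 64300.00),
        (1232578, 100, "Chambers", "Mandee", 48850.00),
        (1324657, 200, "Coffing", "Billy", 41888.88),
        (1333454, 200, "Smith", "John", 48000.00),
        (2312225, 300, "Larkins", "Loraine", 40200.00),
        (1121334, 400, "Strickling", "Cletus", 54500.00),
        (2341218, 400, "Reilly", "William", 36000.00),
        (1256349, 400, "Harrison", "Herbert", 54500.00),
    ],
)

# WHERE removes rows before any aggregate is calculated.
where_only = conn.execute(
    """
    SELECT Dept_No, COUNT(*), SUM(Salary)
    FROM Employee_Table
    WHERE Dept_No IN (200, 400)
    GROUP BY Dept_No
    ORDER BY 1
    """
).fetchall()

# HAVING removes whole groups after the aggregates are totaled.
with_having = conn.execute(
    """
    SELECT Dept_No, COUNT(*), SUM(Salary)
    FROM Employee_Table
    WHERE Dept_No IN (200, 400)
    GROUP BY Dept_No
    HAVING COUNT(*) > 2
    ORDER BY 1
    """
).fetchall()

print("WHERE only:", where_only)    # two groups: Dept_No 200 and 400
print("With HAVING:", with_having)  # one group: Dept_No 400
```

Dept_No 200 survives the WHERE clause but has only two rows, so the HAVING test drops its total.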


Chapter 11 – Join Functions

“When spider webs unite they can tie up a lion.” - African Proverb

A Two-Table Join Using Traditional Syntax

Customer_Table

Customer_Number  Customer_Name
_______________  ___________________
11111111         Billy’s Best Choice
31313131         Acme Products
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Order_Table

Order_Number  Customer_Number  Order_Total
____________  _______________  ___________
123456        11111111         12347.53
123512        11111111         8005.91
123552        31323134         5111.47
123585        87323456         15231.62
123777        57896883         23454.84

SELECT Customer_Table.Customer_Number
      ,Customer_Name
      ,Order_Number
      ,Order_Total
FROM Customer_Table, Order_Table
WHERE Customer_Table.Customer_Number = Order_Table.Customer_Number ;

The column Customer_Number is in both tables. It must be fully qualified with the table name or it errors.

Customer_Number is the column that has matching data in both tables. This is called the "Join Condition".

A Join combines columns on the report from more than one table. The example above joins the Customer_Table and the Order_Table together. The most complicated part of any join is the JOIN CONDITION. The JOIN CONDITION states which column from each table is a match. In this case, Customer_Number is a match that establishes the relationship, so this join will happen on matching Customer_Number columns.

A Two-Table Join Using Non-ANSI Syntax with Table Alias

SELECT Cust.Customer_Number
      ,Customer_Name
      ,Order_Number
      ,Order_Total
FROM Customer_Table as Cust, Order_Table as Ord
WHERE Cust.Customer_Number = Ord.Customer_Number ;

The column Customer_Number is in both tables. It must be fully qualified or it errors.

We alias the table names to shorten the typing when fully qualifying a column.

A Join combines columns on the report from more than one table. The example above joins the Customer_Table and the Order_Table together. The most complicated part of any join is the JOIN CONDITION. The JOIN CONDITION states which column from each table is a match. In this case, Customer_Number is a match that establishes the relationship.

You Can Fully Qualify All Columns

SELECT Cust.Customer_Number
      ,Cust.Customer_Name
      ,Ord.Order_Number
      ,Ord.Order_Total
FROM Customer_Table as Cust, Order_Table as Ord
WHERE Cust.Customer_Number = Ord.Customer_Number ;

The column Customer_Number is in both tables. It must be fully qualified or it errors.

A good practice is to fully qualify all columns in the SELECT list for clarity to other users.

Whenever a column is in both tables, you must fully qualify it when doing a join. You don't have to fully qualify columns that are in only one of the tables, because the system knows which table that particular column is in. You can choose to fully qualify every column if you like. This is a good practice because it makes it more apparent which columns belong to which tables for anyone else looking at your SQL.

A Two-Table Join Using ANSI Syntax

SELECT Cust.Customer_Number, Customer_Name, Order_Number, Order_Total
FROM Customer_Table as Cust
INNER JOIN Order_Table as Ord
ON Cust.Customer_Number = Ord.Customer_Number ;

The INNER JOIN keyword replaces the comma.

The ON keyword is used instead of WHERE.

This is the same join as the previous slide except it is using ANSI syntax. Both will return the same rows with the same performance. Rows are joined when the Customer_Number matches on both tables, but non-matches won’t return.

Both Queries have the same Results and Performance

Traditional Syntax

SELECT Cust.Customer_Number, Customer_Name, Order_Number, Order_Total
FROM Customer_Table as Cust, Order_Table as Ord
WHERE Cust.Customer_Number = Ord.Customer_Number ;

ANSI Syntax

SELECT Cust.Customer_Number, Customer_Name, Order_Number, Order_Total
FROM Customer_Table as Cust
INNER JOIN Order_Table as Ord
ON Cust.Customer_Number = Ord.Customer_Number ;

Both of these syntax techniques bring back the same result set and have the same performance. The INNER JOIN is considered ANSI. Which one does Outer Joins?
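The claim that both syntaxes return the same rows can be checked with a short script. This is a sketch only: Python's built-in sqlite3 module stands in for Matrix (an assumption for illustration), the data is the Customer_Table and Order_Table shown earlier, and the variable names are made up for this example. It says nothing about performance on Matrix.

```python
import sqlite3

# SQLite is used here only as a convenient stand-in for Matrix (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer_Table (Customer_Number INTEGER, Customer_Name TEXT)")
conn.execute(
    "CREATE TABLE Order_Table (Order_Number INTEGER, Customer_Number INTEGER, Order_Total REAL)"
)
conn.executemany(
    "INSERT INTO Customer_Table VALUES (?, ?)",
    [
        (11111111, "Billy's Best Choice"),
        (31313131, "Acme Products"),
        (31323134, "ACE Consulting"),
        (57896883, "XYZ Plumbing"),
        (87323456, "Databases N-U"),
    ],
)
conn.executemany(
    "INSERT INTO Order_Table VALUES (?, ?, ?)",
    [
        (123456, 11111111, 12347.53),
        (123512, 11111111, 8005.91),
        (123552, 31323134, 5111.47),
        (123585, 87323456, 15231.62),
        (123777, 57896883, 23454.84),
    ],
)

# Traditional syntax: comma in FROM, join condition in WHERE.
traditional = sorted(conn.execute(
    """
    SELECT Cust.Customer_Number, Customer_Name, Order_Number, Order_Total
    FROM Customer_Table AS Cust, Order_Table AS Ord
    WHERE Cust.Customer_Number = Ord.Customer_Number
    """
).fetchall())

# ANSI syntax: INNER JOIN replaces the comma, ON replaces WHERE.
ansi = sorted(conn.execute(
    """
    SELECT Cust.Customer_Number, Customer_Name, Order_Number, Order_Total
    FROM Customer_Table AS Cust
    INNER JOIN Order_Table AS Ord
    ON Cust.Customer_Number = Ord.Customer_Number
    """
).fetchall())

print(traditional == ansi)  # True
print(len(ansi))            # 5 (Acme Products has no orders, so it does not return)
```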

Quiz – Can You Finish the Join Syntax?

Employee_Table

Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Department_Table

Dept_No  Department_Name
_______  ________________
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources

SELECT First_Name, Last_Name, Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON ____________________   (Finish the Join)

Finish this join by placing the missing SQL in the proper place!

Answer to Quiz – Can You Finish the Join Syntax?

SELECT First_Name, Last_Name, Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

This query is ready to run.

Dept_No is the column that both tables have in common. This is called a Primary Key/Foreign Key relationship.

Quiz – Can You Find the Error?

SELECT First_Name
      ,Last_Name
      ,Dept_No
      ,Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

This query has an error! Can you find it?

Answer to Quiz – Can You Find the Error?

SELECT First_Name
      ,Last_Name
      ,E.Dept_No
      ,Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

The column Dept_No is in both tables. It needs to be fully qualified as E.Dept_No or D.Dept_No.

If a column in the SELECT list is in both tables, you must fully qualify it.

Super Quiz – Can You Find the Difficult Error?

SELECT First_Name
      ,Last_Name
      ,E.Dept_No
      ,Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON Employee_Table.Dept_No = D.Dept_No ;

This query has an error! Can you find it?

Answer to Super Quiz – Can You Find the Difficult Error?

SELECT First_Name, Last_Name, E.Dept_No, Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON Employee_Table.Dept_No = D.Dept_No ;

Once you alias a table (as E), you must fully qualify with E.Dept_No, not Employee_Table.Dept_No. This query thinks there are three tables: E, D, and Employee_Table.

If a column in the SELECT list is in both tables, you must fully qualify it.

Quiz – Which rows from both tables won’t Return?

SELECT E.First_Name
      ,E.Last_Name
      ,D.Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

This inner join will return all rows that have a matching Dept_No in both tables. Which rows won't return?

An Inner Join returns matching rows, but did you know an Outer Join returns both matching rows and nonmatching rows? You will understand soon!

Answer to Quiz – Which rows from both tables Won’t Return?

SELECT E.First_Name
      ,E.Last_Name
      ,D.Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

1. Squiggy Jones has a NULL Dept_No.
2. Richard Smythe has an invalid Dept_No 10.
3. No Employees work in Department 500.

The bottom line is that the three rows excluded did not have a matching Dept_No.

LEFT OUTER JOIN

SELECT E.First_Name
      ,D.Department_Name
FROM Employee_Table as E
LEFT OUTER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

The 1st table after FROM is always the LEFT table.

Since we are doing a Left Outer Join, the Employee_Table is referred to as the outer table.

This is a LEFT OUTER JOIN. That means that all rows from the LEFT table will appear in the report, regardless of whether they find a match in the right table.

LEFT OUTER JOIN Results

SELECT E.First_Name
      ,D.Department_Name
FROM Employee_Table as E
LEFT OUTER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

First_Name  Department_Name
__________  ________________
Mandee      Marketing
Herbert     Customer Support
William     Customer Support
Loraine     Sales
Squiggy     ?
Richard     ?
Cletus      Customer Support
Billy       Research and Dev
John        Research and Dev

Nulls show mismatches.

The matching rows return just like an inner join, but orphaned rows from the Left table also return.

A LEFT Outer Join returns all rows from the LEFT table, including all matches. If a LEFT row can’t find a match, a NULL is placed in the right-table columns not found!
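To see the orphaned rows for yourself, the sketch below runs the inner join and the left outer join side by side. It assumes Python's built-in sqlite3 module as a stand-in for Matrix (chosen only so the example is runnable), loads the Employee_Table and Department_Table data from the slides, and uses illustrative variable names.

```python
import sqlite3

# SQLite is used here only as a convenient stand-in for Matrix (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee_Table (Employee_No INTEGER, Dept_No INTEGER, "
    "Last_Name TEXT, First_Name TEXT, Salary REAL)"
)
conn.execute("CREATE TABLE Department_Table (Dept_No INTEGER, Department_Name TEXT)")
conn.executemany(
    "INSERT INTO Employee_Table VALUES (?, ?, ?, ?, ?)",
    [
        (2000000, None, "Jones", "Squiggy", 32800.50),
        (1000234, 10, "Smythe", "Richard", 64300.00),
        (1232578, 100, "Chambers", "Mandee", 48850.00),
        (1324657, 200, "Coffing", "Billy", 41888.88),
        (1333454, 200, "Smith", "John", 48000.00),
        (2312225, 300, "Larkins", "Loraine", 40200.00),
        (1121334, 400, "Strickling", "Cletus", 54500.00),
        (2341218, 400, "Reilly", "William", 36000.00),
        (1256349, 400, "Harrison", "Herbert", 54500.00),
    ],
)
conn.executemany(
    "INSERT INTO Department_Table VALUES (?, ?)",
    [
        (100, "Marketing"),
        (200, "Research and Dev"),
        (300, "Sales"),
        (400, "Customer Support"),
        (500, "Human Resources"),
    ],
)

# Inner join: only employees whose Dept_No matches a department.
inner_rows = conn.execute(
    """
    SELECT E.First_Name, D.Department_Name
    FROM Employee_Table AS E
    INNER JOIN Department_Table AS D
    ON E.Dept_No = D.Dept_No
    """
).fetchall()

# Left outer join: every employee, with NULL (None) where no department matches.
left_rows = conn.execute(
    """
    SELECT E.First_Name, D.Department_Name
    FROM Employee_Table AS E
    LEFT OUTER JOIN Department_Table AS D
    ON E.Dept_No = D.Dept_No
    """
).fetchall()

unmatched = sorted(name for name, dept in left_rows if dept is None)
print(len(inner_rows), len(left_rows), unmatched)  # 7 9 ['Richard', 'Squiggy']
```

Squiggy (NULL Dept_No) and Richard (Dept_No 10) are the two Employee_Table rows that only the outer join keeps.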

Left Outer Joins Compatible with Oracle

ANSI Syntax

SELECT Customer_Name, Order_Date, Order_Total
FROM Customer_Table as C
LEFT OUTER JOIN Order_Table as O
ON C.Customer_Number = O.Customer_Number ;

Oracle Syntax

SELECT Customer_Name, Order_Date, Order_Total
FROM Customer_Table as C, Order_Table as O
WHERE C.Customer_Number = O.Customer_Number (+) ;

“Can't died when Could was born.” -- Author Unknown

Matrix supports outer joins using both the ANSI syntax and the Oracle syntax. I think Oracle joins are a real plus!

RIGHT OUTER JOIN

SELECT E.First_Name
      ,D.Department_Name
FROM Employee_Table as E
RIGHT OUTER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

The 2nd table after FROM is always the RIGHT table.

Since we are doing a Right Outer Join, the Department_Table is referred to as the outer table.

This is a RIGHT OUTER JOIN. That means that all rows from the RIGHT table will appear in the report, regardless of whether they find a match in the LEFT table.

RIGHT OUTER JOIN Example and Results

SELECT E.First_Name
      ,D.Department_Name
FROM Employee_Table as E
RIGHT OUTER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

First_Name  Department_Name
__________  ________________
Mandee      Marketing
Herbert     Customer Support
William     Customer Support
Loraine     Sales
Cletus      Customer Support
Billy       Research and Dev
John        Research and Dev
?           Human Resources

Nulls show mismatches.

The matching rows return just like an inner join, but orphaned rows from the Right table also return.

All rows from the Right table were returned with their matches, but since Dept_No 500 didn’t have a match, the system put a NULL value in the Left-table columns.

Right Outer Joins Compatible with Oracle

ANSI Syntax

SELECT Customer_Name, Order_Date, Order_Total
FROM Customer_Table as C
RIGHT OUTER JOIN Order_Table as O
ON C.Customer_Number = O.Customer_Number ;

Oracle Syntax

SELECT Customer_Name, Order_Date, Order_Total
FROM Customer_Table as C, Order_Table as O
WHERE C.Customer_Number (+) = O.Customer_Number ;

Matrix supports outer joins using both the ANSI syntax and the Oracle syntax.

FULL OUTER JOIN

SELECT E.First_Name
      ,D.Department_Name
FROM Employee_Table as E
FULL OUTER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

Since we are doing a Full Outer Join, both tables are referred to as outer tables.

This is a FULL OUTER JOIN. That means that all rows from both the RIGHT and LEFT tables will appear in the report, regardless of whether they find a match.

FULL OUTER JOIN Results

SELECT E.First_Name
      ,D.Department_Name
FROM Employee_Table as E
FULL OUTER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

First_Name  Department_Name
__________  ________________
Mandee      Marketing
Herbert     Customer Support
William     Customer Support
Loraine     Sales
Squiggy     ?
Richard     ?
Cletus      Customer Support
Billy       Research and Dev
John        Research and Dev
?           Human Resources

All rows return from both tables on a Full Outer Join.

The FULL Outer Join returns all rows from both tables. NULLs show the flaws!

Which Tables are the Left and which are the Right?

SELECT Cla.Claim_Id,
       Cla.Claim_Date,
       SUB.Last_Name,
       SUB.First_Name,
       "ADD".Phone,
       SER.Service_Pay,
       PRO.Provider_Code,
       PRO.Provider_Name
FROM CLAIMS Cla
LEFT OUTER JOIN PROVIDERS PRO
  ON Cla.Provider_No = PRO.Provider_Code
LEFT OUTER JOIN SERVICES SER
  ON Cla.Claim_Service = SER.Service_Code
LEFT OUTER JOIN SUBSCRIBERS SUB
  ON Cla.Subscriber_No = SUB.Subscriber_No
  AND Cla.Member_No = SUB.Member_No
LEFT OUTER JOIN ADDRESSES "ADD"
  ON SUB.Subscriber_No = "ADD".Subscriber_No;

Fill in the blank. Is the table a Left Table or a Right Table?

Claims       __________
Providers    __________
Services     __________
Subscribers  __________
Addresses    __________

Can you list which tables above are left tables and which tables are right tables?

Answer - Which Tables are the Left and which are the Right?

SELECT Cla.Claim_Id,
       Cla.Claim_Date,
       SUB.Last_Name,
       SUB.First_Name,
       "ADD".Phone,
       SER.Service_Pay,
       PRO.Provider_Code,
       PRO.Provider_Name
FROM CLAIMS Cla
LEFT OUTER JOIN PROVIDERS PRO
  ON Cla.Provider_No = PRO.Provider_Code
LEFT OUTER JOIN SERVICES SER
  ON Cla.Claim_Service = SER.Service_Code
LEFT OUTER JOIN SUBSCRIBERS SUB
  ON Cla.Subscriber_No = SUB.Subscriber_No
  AND Cla.Member_No = SUB.Member_No
LEFT OUTER JOIN ADDRESSES "ADD"
  ON SUB.Subscriber_No = "ADD".Subscriber_No;

Is the table a Left Table or a Right Table?

Claims       Left
Providers    Right
Services     Right
Subscribers  Right
Addresses    Right

There is always only one Left table (the first table after the FROM clause). All tables after the first table are each Right tables.

Tables are joined two at a time. The result from the first two tables being joined becomes the Left table for the next join, and the result from each later join remains the Left table.

INNER JOIN with Additional AND Clause

SELECT First_Name
      ,Last_Name
      ,Department_Name
FROM Employee_Table as E, Department_Table as D
WHERE E.Dept_No = D.Dept_No
AND Department_Name like 'Marke%' ;

The additional AND is performed first in order to eliminate unwanted data, so the join is less intensive than joining everything first and then eliminating rows that don't qualify.

ANSI INNER JOIN with Additional AND Clause

SELECT First_Name
      ,Last_Name
      ,Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No
AND Department_Name like 'Marke%' ;

The additional AND is performed first in order to eliminate unwanted data, so the join is less intensive than joining everything first and then eliminating rows afterward.

ANSI INNER JOIN with Additional WHERE Clause

SELECT First_Name
      ,Last_Name
      ,Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No
WHERE Department_Name like 'Marke%' ;

The additional WHERE is performed first in order to eliminate unwanted data, so the join is less intensive than joining everything first and then eliminating rows afterward.

OUTER JOIN with Additional WHERE Clause

SELECT First_Name, Last_Name, Department_Name
FROM Employee_Table as E
LEFT OUTER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No
WHERE E.Dept_No = 100 ;

First_Name  Last_Name  Department_Name
__________  _________  _______________
Mandee      Chambers   Marketing

Only Mandee Chambers is in Dept_No 100.

The additional WHERE is performed last on Outer Joins. All rows will be joined first, and then the additional WHERE clause filters after the join takes place.

OUTER JOIN with Additional AND Clause

SELECT First_Name
      ,Department_Name AS Dname
FROM Employee_Table as E
LEFT OUTER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No
AND E.Dept_No = 100 ;

The additional AND is performed in conjunction with the ON clause on Outer Joins. All rows will be evaluated with the ON clause and the AND combined.

Chapter 11

Join Functions

OUTER JOIN with Additional AND Clause Results Employee_Table Employee_No ________ Dept_No ____________ 2000000 ? 1000234 10 1232578 100 1324657 200 1333454 200 2312225 300 1121334 400 2341218 400 1256349 400

Last_Name __________ First_Name _______ Salary __________ 32800.50 Jones Squiggy 64300.00 Smythe Richard 48850.00 Chambers Mandee 41888.88 Coffing Billy 48000.00 Smith John 40200.00 Larkins Loraine 54500.00 Strickling Cletus 36000.00 Reilly William 54500.00 Harrison Herbert

OUTER Join with additional AND Clause SELECT First_Name ,Department_Name AS Dname FROM Employee_Table as E LEFT OUTER JOIN Department_Table as D ON E.Dept_No = D.Dept_No AND E.Dept_No = 100 ;

Department_Table Dept_No ________________ Department_Name ________ 100 200 300 400 500

Marketing Research and Dev Sales Customer Support Human Resources

First_Name __________ Mandee Herbert William Loraine Squiggy Richard Cletus Billy John

Dname ________ Marketing ? ? ? ? ? ? ? ?

The additional AND is performed in conjunction with the ON statement on Outer Joins. This can surprise you. Only Mandee is in Dept_No 100, so she showed up like expected, but an outer join returns non-matches also. Ouch!!! Page 283
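To see the difference between the two placements side by side, here is a minimal sketch that loads the sample rows into an in-memory SQLite database (a stand-in chosen for convenience, not Matrix itself) and runs the same LEFT OUTER JOIN with the filter in the ON clause and then in the WHERE clause. Only the columns the query needs are created.

```python
import sqlite3

# SQLite in memory is only a stand-in for the sample tables, not Matrix.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Employee_Table (Dept_No INTEGER, First_Name TEXT);
CREATE TABLE Department_Table (Dept_No INTEGER, Department_Name TEXT);
INSERT INTO Employee_Table VALUES (NULL,'Squiggy'), (10,'Richard'), (100,'Mandee'),
  (200,'Billy'), (200,'John'), (300,'Loraine'),
  (400,'Cletus'), (400,'William'), (400,'Herbert');
INSERT INTO Department_Table VALUES (100,'Marketing'), (200,'Research and Dev'),
  (300,'Sales'), (400,'Customer Support'), (500,'Human Resources');
""")

base = """SELECT First_Name, Department_Name AS Dname
          FROM Employee_Table AS E
          LEFT OUTER JOIN Department_Table AS D
          ON E.Dept_No = D.Dept_No """

# Filter inside ON: every employee survives, only Dept_No 100 finds a match.
on_rows = con.execute(base + "AND E.Dept_No = 100").fetchall()
# Filter in WHERE: applied after the join, so only Mandee remains.
where_rows = con.execute(base + "WHERE E.Dept_No = 100").fetchall()

matched = [name for name, dname in on_rows if dname is not None]
print(len(on_rows), matched)   # 9 ['Mandee']
print(where_rows)              # [('Mandee', 'Marketing')]
```

The row counts (nine versus one) are the whole story: the ON version keeps the outer rows, the WHERE version throws them away after the join.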

Quiz – Why is this considered an INNER JOIN?

Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Department_Table
Dept_No  Department_Name
_______  ________________
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources

SELECT First_Name, Department_Name
FROM Employee_Table as E
LEFT OUTER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No
AND D.Dept_No = 400 ;

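A LEFT OUTER JOIN behaves like an INNER JOIN when a column from the right table is filtered after the join, in a WHERE clause: unmatched Employee_Table rows carry a null D.Dept_No, so WHERE D.Dept_No = 400 discards them and only matched rows remain. Note that the query above places the filter in the ON clause with AND. In that position the join is still an outer join: all nine employees come back, and only the three in Dept_No 400 show a Department_Name.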
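A quick way to check how a right-table filter behaves is to run it three ways and count the rows. The sketch below uses an in-memory SQLite database as a stand-in (an assumption for illustration, not Matrix) with only the columns the query touches.

```python
import sqlite3

# SQLite in memory is only a stand-in for the sample tables, not Matrix.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Employee_Table (Dept_No INTEGER, First_Name TEXT);
CREATE TABLE Department_Table (Dept_No INTEGER, Department_Name TEXT);
INSERT INTO Employee_Table VALUES (NULL,'Squiggy'), (10,'Richard'), (100,'Mandee'),
  (200,'Billy'), (200,'John'), (300,'Loraine'),
  (400,'Cletus'), (400,'William'), (400,'Herbert');
INSERT INTO Department_Table VALUES (100,'Marketing'), (200,'Research and Dev'),
  (300,'Sales'), (400,'Customer Support'), (500,'Human Resources');
""")

def run(join_type, keyword):
    """Run the quiz query with the D.Dept_No filter attached via AND or WHERE."""
    sql = f"""SELECT First_Name, Department_Name
              FROM Employee_Table AS E
              {join_type} JOIN Department_Table AS D
              ON E.Dept_No = D.Dept_No
              {keyword} D.Dept_No = 400"""
    return sorted(con.execute(sql).fetchall(), key=lambda r: r[0])

outer_on = run("LEFT OUTER", "AND")       # filter is part of the join condition
outer_where = run("LEFT OUTER", "WHERE")  # filter runs after the join
inner = run("INNER", "AND")

print(len(outer_on), len(outer_where), len(inner))  # 9 3 3
print(outer_where == inner)                         # True
```

With the filter in WHERE the outer join collapses to the inner join result; with the filter in ON the unmatched employees are still returned.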

The DREADED Product Join

Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Department_Table
Dept_No  Department_Name
_______  ________________
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources

No Join Condition Linking the Two Tables!

SELECT First_Name
      ,Last_Name
      ,Department_Name
FROM Employee_Table as E, Department_Table as D
WHERE Department_Name like '%m%'
Order by 1, 2, 3;

This query becomes a Product Join because it does not possess any JOIN Conditions (Join Keys). Every row from one table is compared to every row of the other table, and quite often, the data is not what you intended to get back.

The DREADED Product Join Results

SELECT First_Name
      ,Last_Name
      ,Department_Name
FROM Employee_Table as E, Department_Table as D
WHERE Department_Name like '%m%'
Order by 1, 2, 3;

No Join Condition Linking the Two Tables!

First_Name  Last_Name   Department_Name
__________  __________  ________________
Billy       Coffing     Customer Support
Billy       Coffing     Human Resources
Billy       Coffing     Marketing
Cletus      Strickling  Customer Support
Cletus      Strickling  Human Resources
Cletus      Strickling  Marketing
Herbert     Harrison    Customer Support
Herbert     Harrison    Human Resources
Herbert     Harrison    Marketing

Not all rows are displayed

27 rows came back: nine employees, each shown as working in three different departments. This data is WRONG!

How can Billy Coffing work in 3 different departments?

A Product Join is often a mistake! 3 Department rows had an ‘m’ in their name, so these were joined to every employee, and the information is worthless.
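The arithmetic is easy to confirm with a small sketch. It uses an in-memory SQLite database as a stand-in (not Matrix), and it relies on SQLite's LIKE being case-insensitive for ASCII by default, which is why 'Marketing' counts as containing an 'm' here just as it does in the answer set shown. The second query, which adds a join key, is an addition for contrast and is not from the text.

```python
import sqlite3

# SQLite in memory is only a stand-in for the sample tables, not Matrix.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Employee_Table (Dept_No INTEGER, First_Name TEXT);
CREATE TABLE Department_Table (Dept_No INTEGER, Department_Name TEXT);
INSERT INTO Employee_Table VALUES (NULL,'Squiggy'), (10,'Richard'), (100,'Mandee'),
  (200,'Billy'), (200,'John'), (300,'Loraine'),
  (400,'Cletus'), (400,'William'), (400,'Herbert');
INSERT INTO Department_Table VALUES (100,'Marketing'), (200,'Research and Dev'),
  (300,'Sales'), (400,'Customer Support'), (500,'Human Resources');
""")

# No join key: every employee is paired with every department that has an 'm'.
product = con.execute("""
    SELECT First_Name, Department_Name
    FROM Employee_Table AS E, Department_Table AS D
    WHERE Department_Name LIKE '%m%'""").fetchall()

# Same filter plus a join key: only genuine employee/department pairs remain.
joined = con.execute("""
    SELECT First_Name, Department_Name
    FROM Employee_Table AS E, Department_Table AS D
    WHERE E.Dept_No = D.Dept_No
      AND Department_Name LIKE '%m%'""").fetchall()

print(len(product), len(joined))  # 27 4
```

Nine employees times three matching departments gives the 27 rows; adding the join key cuts that to the four employees who really belong to Marketing or Customer Support.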

The Horrifying Cartesian Product Join

Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Department_Table
Dept_No  Department_Name
_______  ________________
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources

No WHERE Clause in the join!

SELECT First_Name
      ,Last_Name
      ,Department_Name
FROM Employee_Table as E, Department_Table as D ;

A Cartesian Product Join is usually a big mistake.

This joins every row from one table to every row of another table. 9 rows multiplied by 5 rows = 45 rows of complete nonsense!

The ANSI Cartesian Join will ERROR

Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Department_Table
Dept_No  Department_Name
_______  ________________
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources

No ON Clause in the join!

SELECT First_Name
      ,Last_Name
      ,Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D ;

This query errors because ANSI join syntax forbids an INNER JOIN without an ON clause. It will not run until a join condition is present.

Quiz – Do these Joins Return the Same Answer Set?

Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Department_Table
Dept_No  Department_Name
_______  ________________
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources

Query 1
SELECT First_Name, Department_Name
FROM Employee_Table
INNER JOIN Department_Table ;

Query 2
SELECT First_Name, Department_Name
FROM Employee_Table, Department_Table ;

Do these two queries produce the same result?

Answer – Do these Joins Return the Same Answer Set?

Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Department_Table
Dept_No  Department_Name
_______  ________________
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources

Query 1 (this query errors)
SELECT First_Name, Department_Name
FROM Employee_Table
INNER JOIN Department_Table ;

Query 2 (a Cartesian product join occurs)
SELECT First_Name, Department_Name
FROM Employee_Table, Department_Table ;

Do these two queries produce the same result? No. Query 1 errors because ANSI join syntax requires an ON clause, while Query 2 runs as a Product Join and brings back junk!

The CROSS JOIN

Customer_Table
Customer_Number  Customer_Name
_______________  ___________________
11111111         Billy’s Best Choice
31313131         Acme Products
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Order_Table
Order_Number  Customer_Number  Order_Total
____________  _______________  ___________
123456        11111111         12347.53
123512        11111111         8005.91
123552        31323134         5111.47
123585        87323456         15231.62
123777        57896883         23454.84

A Cross Join is the ANSI equivalent to a Product Join.

Only a WHERE will work. ON Will NOT!

SELECT Customer_Name, Order_Number
FROM Customer_Table
CROSS JOIN Order_Table
WHERE Order_Number = 123456
ORDER BY 1 ;

This query becomes a Product Join because a Cross Join is an ANSI Product Join. It will compare every row from the Customer_Table to Order_Number 123456 in the Order_Table. Check out the Answer Set on the next page.

The CROSS JOIN Answer Set

Customer_Table
Customer_Number  Customer_Name
_______________  ___________________
11111111         Billy’s Best Choice
31313131         Acme Products
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Order_Table
Order_Number  Customer_Number  Order_Total
____________  _______________  ___________
123456        11111111         12347.53
123512        11111111         8005.91
123552        31323134         5111.47
123585        87323456         15231.62
123777        57896883         23454.84

SELECT Customer_Name, Order_Number
FROM Customer_Table
CROSS JOIN Order_Table
WHERE Order_Number = 123456
ORDER BY 1 ;

Answer Set

Customer_Name        Order_Number
___________________  ____________
Billy’s Best Choice  123456
Acme Products        123456
ACE Consulting       123456
XYZ Plumbing         123456
Databases N-U        123456

Quite often, a Cross Join like this produces information that just isn’t worth anything! Every customer is paired with Order_Number 123456, even though only one customer placed that order.
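For a quick sanity check of those counts, the sketch below runs the Cross Join in an in-memory SQLite database (a stand-in for illustration, not Matrix). The third query, which joins on Customer_Number, is an addition for contrast and is not part of the slide.

```python
import sqlite3

# SQLite in memory is only a stand-in for the sample tables, not Matrix.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Customer_Table (Customer_Number INTEGER, Customer_Name TEXT)")
con.execute("CREATE TABLE Order_Table (Order_Number INTEGER, Customer_Number INTEGER)")
con.executemany("INSERT INTO Customer_Table VALUES (?, ?)", [
    (11111111, "Billy's Best Choice"), (31313131, "Acme Products"),
    (31323134, "ACE Consulting"), (57896883, "XYZ Plumbing"),
    (87323456, "Databases N-U")])
con.executemany("INSERT INTO Order_Table VALUES (?, ?)", [
    (123456, 11111111), (123512, 11111111), (123552, 31323134),
    (123585, 87323456), (123777, 57896883)])

# Unfiltered cross join: 5 customers x 5 orders.
all_pairs = con.execute(
    "SELECT COUNT(*) FROM Customer_Table CROSS JOIN Order_Table").fetchone()[0]

# The slide's query: every customer paired with order 123456.
cross = con.execute("""
    SELECT Customer_Name, Order_Number
    FROM Customer_Table CROSS JOIN Order_Table
    WHERE Order_Number = 123456
    ORDER BY 1""").fetchall()

# With a join key, only the customer who placed the order is returned.
keyed = con.execute("""
    SELECT Customer_Name, Order_Number
    FROM Customer_Table AS C INNER JOIN Order_Table AS O
    ON C.Customer_Number = O.Customer_Number
    WHERE Order_Number = 123456""").fetchall()

print(all_pairs, len(cross))  # 25 5
print(keyed)                  # [("Billy's Best Choice", 123456)]
```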

The Self Join

Employee_Table2
Employee_No  Dept_No  Last_Name   First_Name  Salary    Mgr
___________  _______  __________  __________  ________  ___
1232578      100      Chambers    Mandee      48850.00  Y
1256349      400      Harrison    Herbert     54500.00  N
2341218      400      Reilly      William     36000.00  Y
1121334      400      Strickling  Cletus      54500.00  N
2312225      300      Larkins     Loraine     40200.00  Y
2000000      ?        Jones       Squiggy     32800.50  N
1000234      10       Smythe      Richard     32800.00  N
1324657      200      Coffing     Billy       41888.88  N
1333454      200      Smith       John        48000.00  Y

Which Workers make a bigger Salary than their Manager?

SELECT Mgrs.Dept_No
      ,Mgrs.Last_Name as MgrName
      ,Mgrs.Salary as MgrSal
      ,Emps.Last_Name as EmpName
      ,Emps.Salary as Empsal
FROM Employee_Table2 as Emps, Employee_Table2 as Mgrs
WHERE Emps.Dept_No = Mgrs.Dept_No
AND Mgrs.Mgr = 'Y'
AND Emps.Salary > Mgrs.Salary ;

A Self Join references the same table twice under two different aliases, so the query treats it as two different tables.

The Self Join with ANSI Syntax

Employee_Table2
Employee_No  Dept_No  Last_Name   First_Name  Salary    Mgr
___________  _______  __________  __________  ________  ___
1232578      100      Chambers    Mandee      48850.00  Y
1256349      400      Harrison    Herbert     54500.00  N
2341218      400      Reilly      William     36000.00  Y
1121334      400      Strickling  Cletus      54500.00  N
2312225      300      Larkins     Loraine     40200.00  Y
2000000      ?        Jones       Squiggy     32800.50  N
1000234      10       Smythe      Richard     32800.00  N
1324657      200      Coffing     Billy       41888.88  N
1333454      200      Smith       John        48000.00  Y

Which Workers make a bigger Salary than their Manager?

SELECT Mgrs.Dept_No
      ,Mgrs.Last_Name as MgrName
      ,Mgrs.Salary as MgrSal
      ,Emps.Last_Name as EmpName
      ,Emps.Salary as Empsal
FROM Employee_Table2 as Emps
INNER JOIN Employee_Table2 as Mgrs
ON Emps.Dept_No = Mgrs.Dept_No
WHERE Mgrs.Mgr = 'Y'
AND Emps.Salary > Mgrs.Salary ;

A Self Join references the same table twice under two different aliases, so the query treats it as two different tables.
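As a rough check of the self join, the sketch below loads Employee_Table2 into an in-memory SQLite database (a stand-in, not Matrix) and runs both the traditional and the ANSI form. The pairing of Salary and Mgr values with each employee is read off the printed table row by row, so treat the data as an assumption. The ORDER BY is added here only to make the output order predictable.

```python
import sqlite3

# SQLite in memory is only a stand-in; row values are as read from the printed table.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE Employee_Table2
               (Dept_No INTEGER, Last_Name TEXT, Salary REAL, Mgr TEXT)""")
con.executemany("INSERT INTO Employee_Table2 VALUES (?, ?, ?, ?)", [
    (100, "Chambers", 48850.00, "Y"), (400, "Harrison", 54500.00, "N"),
    (400, "Reilly", 36000.00, "Y"), (400, "Strickling", 54500.00, "N"),
    (300, "Larkins", 40200.00, "Y"), (None, "Jones", 32800.50, "N"),
    (10, "Smythe", 32800.00, "N"), (200, "Coffing", 41888.88, "N"),
    (200, "Smith", 48000.00, "Y")])

select = """SELECT Mgrs.Last_Name AS MgrName, Mgrs.Salary AS MgrSal,
                   Emps.Last_Name AS EmpName, Emps.Salary AS EmpSal """

traditional = con.execute(select + """
    FROM Employee_Table2 AS Emps, Employee_Table2 AS Mgrs
    WHERE Emps.Dept_No = Mgrs.Dept_No
      AND Mgrs.Mgr = 'Y'
      AND Emps.Salary > Mgrs.Salary
    ORDER BY EmpName""").fetchall()

ansi = con.execute(select + """
    FROM Employee_Table2 AS Emps
    INNER JOIN Employee_Table2 AS Mgrs
    ON Emps.Dept_No = Mgrs.Dept_No
    WHERE Mgrs.Mgr = 'Y'
      AND Emps.Salary > Mgrs.Salary
    ORDER BY EmpName""").fetchall()

print(traditional == ansi)  # True
for row in ansi:
    print(row)
# ('Reilly', 36000.0, 'Harrison', 54500.0)
# ('Reilly', 36000.0, 'Strickling', 54500.0)
```

Under these assumed rows, the two workers in Dept_No 400 who earn 54500.00 are the only ones paid more than their manager.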

Quiz – Will both queries bring back the same Answer Set?

Customer_Table
Customer_Number  Customer_Name
_______________  ___________________
11111111         Billy’s Best Choice
31313131         Acme Products
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Order_Table
Order_Number  Customer_Number  Order_Total
____________  _______________  ___________
123456        11111111         12347.53
123512        11111111         8005.91
123552        31323134         5111.47
123585        87323456         15231.62
123777        57896883         23454.84

SELECT *
FROM Customer_Table as Cust
INNER JOIN Order_Table as ORD
ON Cust.Customer_Number = Ord.Customer_Number
WHERE Customer_Name like 'Billy%'
ORDER BY 1;

SELECT *
FROM Customer_Table as Cust
INNER JOIN Order_Table as ORD
ON Cust.Customer_Number = Ord.Customer_Number
AND Customer_Name like 'Billy%'
ORDER BY 1;

Will both queries bring back the same result set?

Answer – Will both queries bring back the same Answer Set?

Customer_Table
Customer_Number  Customer_Name
_______________  ___________________
11111111         Billy’s Best Choice
31313131         Acme Products
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Order_Table
Order_Number  Customer_Number  Order_Total
____________  _______________  ___________
123456        11111111         12347.53
123512        11111111         8005.91
123552        31323134         5111.47
123585        87323456         15231.62
123777        57896883         23454.84

SELECT *
FROM Customer_Table as Cust
INNER JOIN Order_Table as ORD
ON Cust.Customer_Number = Ord.Customer_Number
WHERE Customer_Name like 'Billy%'
ORDER BY 1;

SELECT *
FROM Customer_Table as Cust
INNER JOIN Order_Table as ORD
ON Cust.Customer_Number = Ord.Customer_Number
AND Customer_Name like 'Billy%'
ORDER BY 1;

Will both queries bring back the same result set? Yes! They are both inner joins, so the filter removes the same rows whether it sits in the WHERE clause or in the ON clause.

Quiz – Will both queries bring back the same Answer Set?

Customer_Table
Customer_Number  Customer_Name
_______________  ___________________
11111111         Billy’s Best Choice
31313131         Acme Products
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Order_Table
Order_Number  Customer_Number  Order_Total
____________  _______________  ___________
123456        11111111         12347.53
123512        11111111         8005.91
123552        31323134         5111.47
123585        87323456         15231.62
123777        57896883         23454.84

SELECT *
FROM Customer_Table as Cust
LEFT OUTER JOIN Order_Table as ORD
ON Cust.Customer_Number = Ord.Customer_Number
WHERE Customer_Name like 'Billy%'
ORDER BY 1;

SELECT *
FROM Customer_Table as Cust
LEFT OUTER JOIN Order_Table as ORD
ON Cust.Customer_Number = Ord.Customer_Number
AND Customer_Name like 'Billy%'
ORDER BY 1;

Will both queries bring back the same result set?

Answer – Will both queries bring back the same Answer Set?

Customer_Table
Customer_Number  Customer_Name
_______________  ___________________
11111111         Billy’s Best Choice
31313131         Acme Products
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Order_Table
Order_Number  Customer_Number  Order_Total
____________  _______________  ___________
123456        11111111         12347.53
123512        11111111         8005.91
123552        31323134         5111.47
123585        87323456         15231.62
123777        57896883         23454.84

SELECT *
FROM Customer_Table as Cust
LEFT OUTER JOIN Order_Table as ORD
ON Cust.Customer_Number = Ord.Customer_Number
WHERE Customer_Name like 'Billy%'
ORDER BY 1;

SELECT *
FROM Customer_Table as Cust
LEFT OUTER JOIN Order_Table as ORD
ON Cust.Customer_Number = Ord.Customer_Number
AND Customer_Name like 'Billy%'
ORDER BY 1;

Will both queries bring back the same result set? NO! The WHERE is performed last, after the outer join, so the first query returns only the rows for Billy’s Best Choice. In the second query the AND is part of the ON clause, so every customer is still returned and only Billy’s Best Choice is matched to its orders.
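Counting rows makes the difference between the inner and outer versions of this quiz concrete. The sketch below runs all four combinations in an in-memory SQLite database (a stand-in for illustration, not Matrix). The helper `count_rows` is a hypothetical name introduced just for this example.

```python
import sqlite3

# SQLite in memory is only a stand-in for the sample tables, not Matrix.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Customer_Table (Customer_Number INTEGER, Customer_Name TEXT)")
con.execute("""CREATE TABLE Order_Table
               (Order_Number INTEGER, Customer_Number INTEGER, Order_Total REAL)""")
con.executemany("INSERT INTO Customer_Table VALUES (?, ?)", [
    (11111111, "Billy's Best Choice"), (31313131, "Acme Products"),
    (31323134, "ACE Consulting"), (57896883, "XYZ Plumbing"),
    (87323456, "Databases N-U")])
con.executemany("INSERT INTO Order_Table VALUES (?, ?, ?)", [
    (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
    (123552, 31323134, 5111.47), (123585, 87323456, 15231.62),
    (123777, 57896883, 23454.84)])

def count_rows(join_type, keyword):
    """Row count with the name filter attached through WHERE or AND."""
    sql = f"""SELECT * FROM Customer_Table AS Cust
              {join_type} JOIN Order_Table AS Ord
              ON Cust.Customer_Number = Ord.Customer_Number
              {keyword} Customer_Name LIKE 'Billy%'"""
    return len(con.execute(sql).fetchall())

inner_where = count_rows("INNER", "WHERE")
inner_and = count_rows("INNER", "AND")
outer_where = count_rows("LEFT OUTER", "WHERE")
outer_and = count_rows("LEFT OUTER", "AND")

print(inner_where, inner_and)  # 2 2
print(outer_where, outer_and)  # 2 6
```

The outer join with AND returns six rows: two matched orders for Billy's Best Choice plus one null-extended row for each of the other four customers.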

How would you Join these two tables?

Course_Table
Course_ID  Course_Name               Credits  Seats
_________  ________________________  _______  _____
100        Database Concepts         3        50
200        Introduction to SQL       3        20
210        Advanced SQL              3        22
220        V2R3 SQL Features         2        25
300        Physical Database Design  4        20
400        Database Administration   4        16

Student_Table
Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
__________  _________  __________  __________  ________
423400      Larkins    Michael     FR          0.00
231222      Wilson     Susie       SO          3.80
280023      McRoberts  Richard     JR          1.90
322133      Bond       Jimmy       JR          3.95
125634      Hanson     Henry       FR          2.88
333450      Smith      Andy        SO          2.00
324652      Delaney    Danny       SR          3.35
260000      Johnson    Stanley     ?           ?
234121      Thomas     Wendy       FR          4.00
123250      Phillips   Martin      SR          3.00

How would you join these two tables together? You can't do it. There is no matching column with like data. There is no Primary Key/Foreign Key relationship between these two tables. That is why you are about to be introduced to a bridge table. It is formally called an Associative table or a Lookup table.

An Associative Table is a Bridge that Joins Two Tables

Course_Table
Course_ID  Course_Name               Credits  Seats
_________  ________________________  _______  _____
100        Database Concepts         3        50
200        Introduction to SQL       3        20
210        Advanced SQL              3        22
220        V2R3 SQL Features         2        25
300        Physical Database Design  4        20
400        Database Administration   4        16

Student_Course_Table (Associative Table)
Student_ID  Course_ID
__________  _________
280023      210
231222      210
125634      100
231222      220
125634      200
322133      220
125634      220
322133      300
324652      200
333450      500
260000      400
333450      400
234121      100
123250      100

Student_Table
Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
__________  _________  __________  __________  ________
423400      Larkins    Michael     FR          0.00
231222      Wilson     Susie       SO          3.80
280023      McRoberts  Richard     JR          1.90
322133      Bond       Jimmy       JR          3.95
125634      Hanson     Henry       FR          2.88
333450      Smith      Andy        SO          2.00
324652      Delaney    Danny       SR          3.35
260000      Johnson    Stanley     ?           ?
234121      Thomas     Wendy       FR          4.00
123250      Phillips   Martin      SR          3.00

The Associative Table is a bridge between the Course_Table and Student_Table.

Quiz – Can you Write the 3-Table Join?

Course_Table
Course_ID  Course_Name               Credits  Seats
_________  ________________________  _______  _____
100        Database Concepts         3        50
200        Introduction to SQL       3        20
210        Advanced SQL              3        22
220        V2R3 SQL Features         2        25
300        Physical Database Design  4        20
400        Database Administration   4        16

Student_Course_Table (Associative Table)
Student_ID  Course_ID
__________  _________
280023      210
231222      210
125634      100
231222      220
125634      200
322133      220
125634      220
322133      300
324652      200
333450      500
260000      400
333450      400
234121      100
123250      100

Student_Table
Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
__________  _________  __________  __________  ________
423400      Larkins    Michael     FR          0.00
231222      Wilson     Susie       SO          3.80
280023      McRoberts  Richard     JR          1.90
322133      Bond       Jimmy       JR          3.95
125634      Hanson     Henry       FR          2.88
333450      Smith      Andy        SO          2.00
324652      Delaney    Danny       SR          3.35
260000      Johnson    Stanley     ?           ?
234121      Thomas     Wendy       FR          4.00
123250      Phillips   Martin      SR          3.00

SELECT ALL Columns from the Course_Table and Student_Table and Join them.

Answer to Quiz – Can you Write the 3-Table Join?

Student_Table: Student_ID, Last_Name, First_Name, Class_Code, Grade_Pt
Student_Course_Table: Student_ID, Course_ID
Course_Table: Course_ID, Course_Name, Credits, Seats

SELECT S.*, C.*
FROM Student_Table as S,
     Course_Table as C,
     Student_Course_Table as SC
Where S.Student_ID = SC.Student_ID
AND C.Course_ID = SC.Course_ID ;

Notice the * technique of getting ALL columns from both tables!

The Associative Table is a bridge between the Course_Table and Student_Table, and its sole purpose is to join these two tables together.

Quiz – Can you Write the 3-Table Join to ANSI Syntax?

Student_Table: Student_ID, Last_Name, First_Name, Class_Code, Grade_Pt
Student_Course_Table: Student_ID, Course_ID
Course_Table: Course_ID, Course_Name, Credits, Seats

SELECT S.*, C.*
FROM Student_Table as S,
     Course_Table as C,
     Student_Course_Table as SC
Where S.Student_ID = SC.Student_ID
AND C.Course_ID = SC.Course_ID ;

Please re-write the above query using ANSI Syntax.

Answer – Can you Write the 3-Table Join to ANSI Syntax?

Student_Table: Student_ID, Last_Name, First_Name, Class_Code, Grade_Pt
Student_Course_Table: Student_ID, Course_ID
Course_Table: Course_ID, Course_Name, Credits, Seats

Traditional Syntax

SELECT S.*, C.*
FROM Student_Table as S,
     Course_Table as C,
     Student_Course_Table as SC
Where S.Student_ID = SC.Student_ID
AND C.Course_ID = SC.Course_ID ;

ANSI Syntax

Select S.*, C.*
From Student_Table as S
INNER JOIN Student_Course_Table as SC
ON S.Student_ID = SC.Student_ID
INNER JOIN Course_Table as C
ON C.Course_ID = SC.Course_ID;

The above queries show both traditional and ANSI form for this three table join.
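To confirm that the two forms really are interchangeable, the sketch below runs both against the sample rows in an in-memory SQLite database (a stand-in, not Matrix). Only the key column and one descriptive column per table are loaded, which is a simplification for brevity.

```python
import sqlite3

# SQLite in memory is only a stand-in for the sample tables, not Matrix.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student_Table (Student_ID INTEGER, Last_Name TEXT)")
con.execute("CREATE TABLE Course_Table (Course_ID INTEGER, Course_Name TEXT)")
con.execute("CREATE TABLE Student_Course_Table (Student_ID INTEGER, Course_ID INTEGER)")
con.executemany("INSERT INTO Student_Table VALUES (?, ?)", [
    (423400, "Larkins"), (231222, "Wilson"), (280023, "McRoberts"),
    (322133, "Bond"), (125634, "Hanson"), (333450, "Smith"),
    (324652, "Delaney"), (260000, "Johnson"), (234121, "Thomas"),
    (123250, "Phillips")])
con.executemany("INSERT INTO Course_Table VALUES (?, ?)", [
    (100, "Database Concepts"), (200, "Introduction to SQL"),
    (210, "Advanced SQL"), (220, "V2R3 SQL Features"),
    (300, "Physical Database Design"), (400, "Database Administration")])
con.executemany("INSERT INTO Student_Course_Table VALUES (?, ?)", [
    (280023, 210), (231222, 210), (125634, 100), (231222, 220),
    (125634, 200), (322133, 220), (125634, 220), (322133, 300),
    (324652, 200), (333450, 500), (260000, 400), (333450, 400),
    (234121, 100), (123250, 100)])

traditional = con.execute("""
    SELECT S.*, C.*
    FROM Student_Table AS S, Course_Table AS C, Student_Course_Table AS SC
    WHERE S.Student_ID = SC.Student_ID
      AND C.Course_ID = SC.Course_ID""").fetchall()

ansi = con.execute("""
    SELECT S.*, C.*
    FROM Student_Table AS S
    INNER JOIN Student_Course_Table AS SC ON S.Student_ID = SC.Student_ID
    INNER JOIN Course_Table AS C ON C.Course_ID = SC.Course_ID""").fetchall()

print(len(traditional), sorted(traditional) == sorted(ansi))  # 13 True
```

The bridge table holds 14 rows, but one of them points at Course_ID 500, which has no row in Course_Table, so the inner join returns 13.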

Quiz – Can you Place the ON Clauses at the End?

Student_Table: Student_ID, Last_Name, First_Name, Class_Code, Grade_Pt
Student_Course_Table: Student_ID, Course_ID
Course_Table: Course_ID, Course_Name, Credits, Seats

ANSI Syntax

Select S.*, C.*
From Student_Table as S
INNER JOIN Student_Course_Table as SC
ON S.Student_ID = SC.Student_ID
INNER JOIN Course_Table as C
ON C.Course_ID = SC.Course_ID;

Please re-write the above query and place both ON Clauses at the end.

Answer – Can you Place the ON Clauses at the End?

Student_Table: Student_ID, Last_Name, First_Name, Class_Code, Grade_Pt
Student_Course_Table: Student_ID, Course_ID
Course_Table: Course_ID, Course_Name, Credits, Seats

Select S.*, C.*
From Student_Table as S
INNER JOIN Student_Course_Table as SC
INNER JOIN Course_Table as C
ON C.Course_ID = SC.Course_ID
ON SC.Student_ID = S.Student_ID;

The trick is to put the first ON clause for the last join and go backwards.

This is tricky. The only way it works is to place the ON clauses backwards. The first ON Clause represents the last INNER JOIN and then moves backwards.

The 5-Table Join – Logical Insurance Model

[Figure: logical model of the five insurance tables and their key columns]
Addresses: Subscriber_No
Subscribers: Subscriber_No, Member_No
Claims: Subscriber_No, Member_No, Claim_Service, Provider_No
Services: Service_Code
Providers: Provider_Code

Above is the logical model for the insurance tables showing the Primary Key and Foreign Key relationships (PK/FK).

Quiz - Write a Five Table Join Using ANSI Syntax

[Figure: logical model of the five insurance tables and their key columns]
Addresses: Subscriber_No
Subscribers: Subscriber_No, Member_No
Claims: Subscriber_No, Member_No, Claim_Service, Provider_No
Services: Service_Code
Providers: Provider_Code

Your mission is to write a five table join selecting all columns using ANSI syntax.

Answer - Write a Five Table Join Using ANSI Syntax

SELECT cla1.*, sub1.*, add1.* ,pro1.*, ser1.*
FROM CLAIMS AS cla1
INNER JOIN SUBSCRIBERS AS sub1
ON cla1.Subscriber_No = sub1.Subscriber_No
AND cla1.Member_No = sub1.Member_No
INNER JOIN ADDRESSES AS add1
ON sub1.Subscriber_No = add1.Subscriber_No
INNER JOIN PROVIDERS AS pro1
ON cla1.Provider_No = pro1.Provider_Code
INNER JOIN SERVICES AS ser1
ON cla1.Claim_Service = ser1.Service_Code ;

Above is an example of writing a five table join using ANSI syntax.

Quiz - Write a Five Table Join Using Non-ANSI Syntax

[Figure: logical model of the five insurance tables and their key columns]
Addresses: Subscriber_No
Subscribers: Subscriber_No, Member_No
Claims: Subscriber_No, Member_No, Claim_Service, Provider_No
Services: Service_Code
Providers: Provider_Code

Your mission is to write a five table join selecting all columns using Non-ANSI syntax.

Answer - Write a Five Table Join Using Non-ANSI Syntax

SELECT cla1.*, sub1.*, add1.* ,pro1.*, ser1.*
FROM CLAIMS AS cla1,
     SUBSCRIBERS AS sub1,
     ADDRESSES AS add1,
     PROVIDERS AS pro1,
     SERVICES AS ser1
WHERE cla1.Subscriber_No = sub1.Subscriber_No
AND cla1.Member_No = sub1.Member_No
AND sub1.Subscriber_No = add1.Subscriber_No
AND cla1.Provider_No = pro1.Provider_Code
AND cla1.Claim_Service = ser1.Service_Code ;

Above is the example writing this five table join using Non-ANSI syntax.

Quiz – Re-Write this putting the ON clauses at the END

SELECT cla1.*, sub1.*, add1.* ,pro1.*, ser1.*
FROM CLAIMS AS cla1
INNER JOIN SUBSCRIBERS AS sub1
ON cla1.Subscriber_No = sub1.Subscriber_No
AND cla1.Member_No = sub1.Member_No
INNER JOIN ADDRESSES AS add1
ON sub1.Subscriber_No = add1.Subscriber_No
INNER JOIN PROVIDERS AS pro1
ON cla1.Provider_No = pro1.Provider_Code
INNER JOIN SERVICES AS ser1
ON cla1.Claim_Service = ser1.Service_Code ;

Above is the five table join written in ANSI syntax. Your mission is to re-write it so that all of the ON clauses come at the end.

Answer – Re-Write this putting the ON clauses at the END

SELECT cla1.*, sub1.*, add1.* ,pro1.*, ser1.*
FROM PROVIDERS AS pro1
INNER JOIN ADDRESSES AS add1
INNER JOIN SUBSCRIBERS AS sub1
INNER JOIN SERVICES AS ser1
INNER JOIN CLAIMS as cla1
ON cla1.Claim_Service = ser1.Service_Code
ON cla1.Subscriber_No = sub1.Subscriber_No
AND cla1.Member_No = sub1.Member_No
ON sub1.Subscriber_No = add1.Subscriber_No
ON cla1.Provider_No = pro1.Provider_Code ;

Above is the example writing this five table join using ANSI syntax with the ON clauses at the end. We had to move the tables around also to make this happen. Notice that the first ON clause represents the last two tables being joined, and then it works backwards.


Chapter 12 – Date Functions

"An inch of time cannot be bought with an inch of gold." - Chinese Proverb

Current_Date

[Screenshot: Nexus Chameleon query window, System: Matrix, Database: SQL Class]

SELECT Current_Date AS ANSI_Date ;

ANSI_Date
----------
10/11/2015

The Current_Date will return today's date.

TIMEOFDAY()

TIMEOFDAY() returns a VARCHAR data type and specifies the weekday, date, and time.

SELECT TIMEOFDAY() ;

timeofday
-----------------------------------
Mon Oct 6 22:53:50.333525 2014 UTC

“Always remember that you are unique just like everyone else.” – Anonymous

The TIMEOFDAY function returns the weekday, date and the time.

SYSDATE Returns a Timestamp with Microseconds

This example uses the SYSDATE function to return the full timestamp for the current date.

SELECT SYSDATE ;

timestamp
--------------------------
2014-10-04 16:10:43.978655

Six digits of microseconds are included.

This example uses the SYSDATE function inside the TRUNC function to return the current date without the time included.

SELECT TRUNC(SYSDATE) ;

trunc
----------
2014-10-04

The SYSDATE function returns the current date and time according to the system clock on the leader node. The functions CURRENT_DATE and TRUNC(SYSDATE) produce the same results.

GETDATE Returns a Timestamp without Microseconds

This example uses the GETDATE() function to return the full timestamp for the current date.

SELECT GETDATE();

timestamp
-------------------
2014-10-04 16:10:43

No Microseconds included

This example uses the GETDATE() function inside the TRUNC function to return the current date without the time included.

SELECT TRUNC(GETDATE());

trunc
----------
2014-10-04

GETDATE returns a TIMESTAMP. The parentheses are required.

Add or Subtract Days from a date

SELECT Order_Date + 60 AS "Due Date"
      ,Order_Date
      ,to_char(Order_total,'$99,999.99') as Total_Due
FROM Order_Table
ORDER BY 1 ;

Due Date    Order_Date  Total_Due
__________  __________  ___________
07/03/1998  05/04/1998  $ 12,347.53
03/02/1999  01/01/1999  $  8,005.91
11/08/1999  09/09/1999  $ 23,454.84
11/30/1999  10/01/1999  $  5,111.47
12/09/1999  10/10/1999  $ 15,231.62

When you add or subtract from a Date, you are adding/subtracting Days. Because Dates are stored internally on disk as integers, it makes it easy to add days to the calendar. In the query above, we are adding 60 days to the Order_Date. Also, notice the to_char command which will format the amount.
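The day arithmetic can be double-checked outside the database. The sketch below uses Python's `datetime` as a stand-in for `Order_Date + 60` (it is not how Matrix stores or computes dates) and reproduces the Due Date column from the five order dates.

```python
from datetime import date, timedelta

# Order dates taken from the answer set; timedelta stands in for "+ 60" on a DATE.
order_dates = [date(1998, 5, 4), date(1999, 1, 1), date(1999, 9, 9),
               date(1999, 10, 1), date(1999, 10, 10)]

due_dates = [d + timedelta(days=60) for d in order_dates]

for ordered, due in zip(order_dates, due_dates):
    print(ordered.strftime("%m/%d/%Y"), "->", due.strftime("%m/%d/%Y"))
# 05/04/1998 -> 07/03/1998
# 01/01/1999 -> 03/02/1999
# 09/09/1999 -> 11/08/1999
# 10/01/1999 -> 11/30/1999
# 10/10/1999 -> 12/09/1999
```

Note that 60 days is not the same as two months: 05/04/1998 plus 60 days lands on 07/03/1998, one day short of the same day number two months later.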

The ADD_MONTHS Command Returns a Timestamp

SELECT Order_Date
      ,Add_Months (Order_Date,2) as "Due Date"
      ,Order_Total
FROM Order_Table
ORDER BY 1 ;

Order_Date  Due Date             Order_Total
__________  ___________________  ___________
05/04/1998  07/04/1998 00:00:00  12347.53
01/01/1999  03/01/1999 00:00:00  8005.91
09/09/1999  11/09/1999 00:00:00  23454.84
10/01/1999  12/01/1999 00:00:00  5111.47
10/10/1999  12/10/1999 00:00:00  15231.62

The ADD_MONTHS function adds a specified number of months to a date or timestamp value. If the date is the last day of the month, or if the resulting month is shorter, the function returns the last day of the month in the result. For other dates, the result contains the same day number as the date expression. The number of months can be a positive or negative integer, or any value that implicitly converts to an integer, so you can use a negative number to subtract months from dates. The DATEADD function provides similar functionality.

The ADD_MONTHS Command with Trunc Removes Time

SELECT Order_Date
      ,TRUNC(Add_Months (Order_Date,2)) as "Due Date"
      ,Order_Total
FROM Order_Table
ORDER BY 1 ;

Order_Date  Due Date    Order_Total
__________  __________  ___________
05/04/1998  07/04/1998  12347.53
01/01/1999  03/01/1999  8005.91
09/09/1999  11/09/1999  23454.84
10/01/1999  12/01/1999  5111.47
10/10/1999  12/10/1999  15231.62

The TRUNC command eliminated the 00:00:00 time

Above, we used the TRUNC command to get rid of the time (00:00:00) on the returning answer set. The ADD_MONTHS function adds a specified number of months to a date or timestamp value. If the date is the last day of the month, or if the resulting month is shorter, the function returns the last day of the month in the result. For other dates, the result contains the same day number as the date expression. The number of months can be a positive or negative integer, or any value that implicitly converts to an integer, so you can use a negative number to subtract months from dates. The DATEADD function provides similar functionality.

ADD_MONTHS Command to Add 1-Year or 5-Years

There is no Add_Year command, so put in 12 months for 1-year.

SELECT Order_Date
      ,TRUNC(Add_Months (Order_Date,12)) as "Due Date"
      ,Order_Total
FROM Order_Table
ORDER BY 1 ;

In this example, we multiplied 12 months times 5 for a total of 5 years!

SELECT Order_Date
      ,TRUNC(Add_Months (Order_Date,12 * 5)) as "Due Date"
      ,Order_Total
FROM Order_Table
ORDER BY 1 ;

The Add_Months command adds months to any date. Above, we used a great technique that would give us 1-year. We then showed an even better technique to get 5-years.

Dateadd Function and Add_Months Function are Different

DATEADD: If there are fewer days in the date you are adding to than in the result month, the result will be the corresponding day of the result month, not the last day of that month. For example, April 30th + 1 month is May 30th:

SELECT DATEADD (month,1,'2014-04-30');

DATEADD
-------------------
2014-05-30 00:00:00

ADD_MONTHS: If the date you are adding to is the last day of the month, the result is always the last day of the result month, regardless of the length of the month. For example, April 30th + 1 month is May 31st:

SELECT ADD_Months ('2014-04-30',1);

ADD_Months
-------------------
2014-05-31 00:00:00

The DATEADD and ADD_MONTHS functions handle dates that fall at the ends of months differently.
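The two month-end rules can be written out as a small emulation. The sketch below is plain Python, not Matrix code: `dateadd_month` and `add_months` are hypothetical helper names that mimic the behavior described for DATEADD and ADD_MONTHS, including the assumption that a day number too large for the target month is clamped to that month's last day.

```python
import calendar
from datetime import date

def shift_month(d, months):
    """Return the (year, month) reached by moving d forward or back by whole months."""
    index = d.year * 12 + (d.month - 1) + months
    return index // 12, index % 12 + 1

def dateadd_month(d, months):
    """DATEADD-style rule: keep the day number, clamped to the target month's length."""
    year, month = shift_month(d, months)
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))

def add_months(d, months):
    """ADD_MONTHS-style rule: a month-end input always gives a month-end result."""
    year, month = shift_month(d, months)
    last_day = calendar.monthrange(year, month)[1]
    if d.day == calendar.monthrange(d.year, d.month)[1]:
        return date(year, month, last_day)
    return date(year, month, min(d.day, last_day))

print(dateadd_month(date(2014, 4, 30), 1))  # 2014-05-30
print(add_months(date(2014, 4, 30), 1))     # 2014-05-31
print(add_months(date(1998, 5, 4), 2))      # 1998-07-04
```

For dates that are not at a month end, such as 05/04/1998, both rules agree; they only part ways when the starting date is the last day of its month.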

The EXTRACT Command

The EXTRACT command extracts portions of Date, Time, and Timestamp.

SELECT Order_Date
      ,Add_Months (Order_Date,12 * 5) as "Due Date"
      ,Order_Total
FROM Order_Table
WHERE EXTRACT(Month from Order_Date) = 9
ORDER BY 1 ;

“You miss 100 percent of the shots you never take.” – Wayne Gretzky

This is the Extract command. It returns a date part, such as a day, month, or year, from a timestamp value or expression.

EXTRACT from DATES and TIME

SELECT Current_Date
      ,EXTRACT(Year from Current_Date) as Yr
      ,EXTRACT(Month from Current_Date) as Mo
      ,EXTRACT(Day from Current_Date) as Da
      ,Current_Time
      ,EXTRACT(Hour from Current_Time) as Hr
      ,EXTRACT(Minute from Current_Time) as Mn
      ,EXTRACT(Second from Current_Time) as Sc ;

Answer Set

DATE        YR    MO  DA  TIME      HR  MN  SC
__________  ____  __  __  ________  __  __  __
08/02/2012  2012  8   2   10:17:25  10  17  25

Just like the Add_Months, the EXTRACT Command is a Temporal Function or a Time-Based Function.

EXTRACT with DATE and TIME Literals

SELECT EXTRACT(YEAR FROM DATE '2000-10-01') AS "Yr"
      ,EXTRACT(MONTH FROM DATE '2000-10-01') AS "Mth"
      ,EXTRACT(DAY FROM DATE '2000-10-01') AS "Day"
      ,EXTRACT(HOUR FROM TIME '10:01:30') AS "Hr"
      ,EXTRACT(MINUTE FROM TIME '10:01:30') AS "Min"
      ,EXTRACT(SECOND FROM TIME '10:01:30') AS "Sec"
      ,EXTRACT(MONTH FROM current_timestamp) AS ts_Mth
      ,EXTRACT(SECOND FROM current_timestamp) AS ts_Part ;

Answer Set

YR    MTH  DAY  HR  MIN  SEC  TS_MTH  TS_SEC
____  ___  ___  __  ___  ___  ______  ______
2000  10   1    10  1    30   8       1

Just like the Add_Months, the EXTRACT Command is a Temporal Function or a Time-Based Function, and the above is designed to show how to use it with literal values.


EXTRACT of the Month on Aggregate Queries

SELECT EXTRACT(Month FROM Order_date)
      ,COUNT(*) AS Nbr_of_rows
      ,AVG(Order_Total)
FROM Order_Table
GROUP BY 1
ORDER BY 1 ;

Answer Set

DATE_PART  NBR_OF_ROWS  AVG
1          1            8005.910000
5          1            12347.530000
9          1            23454.840000
10         2            10171.545000

The above SELECT uses the EXTRACT to display only the month and also to control the grouping, so there is one aggregate row per month. Notice the Answer Set headers: the columns that were not aliased default to DATE_PART and AVG.
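If you would rather choose the headers yourself, one option is to alias every column. The sketch below is the same query with aliases added; the alias names Order_Month and Avg_Order_Total are made up for illustration.

```sql
-- Same aggregate as above, with illustrative aliases so the
-- answer set headers no longer fall back to default names.
SELECT EXTRACT(Month FROM Order_Date) AS Order_Month
      ,COUNT(*)                       AS Nbr_of_rows
      ,AVG(Order_Total)               AS Avg_Order_Total
FROM Order_Table
GROUP BY 1
ORDER BY 1 ;
```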


The Datediff command

How many weeks are there between two dates

SELECT DATEDIFF (week, '2014-01-01', '2014-12-31') as Number_of_Weeks ;

Number_Of_Weeks
---------------
52

How many quarters are there between two dates

SELECT DATEDIFF(qtr, '1998-07-01', current_date) AS Number_of_Quarters ;

Number_Of_Quarters
------------------
65

This function uses a datepart (day, week, month etc.) and two target expressions. This function returns the difference between the two expressions. The expressions must be date or timestamp expressions and they must both contain the specified datepart. If the second date is later than the first date, the result is positive. If the second date is earlier than the first date, the result is negative.
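The sign rule is easy to check with two literals. The sketch below simply swaps the order of the same two dates; the column aliases are illustrative.

```sql
-- January 1 to January 31 is 30 days apart.
SELECT DATEDIFF (day, '2014-01-01', '2014-01-31') AS forward_days   -- second date later: positive (30)
      ,DATEDIFF (day, '2014-01-31', '2014-01-01') AS backward_days  -- second date earlier: negative (-30)
;
```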


The Datediff Function on Column Data

How many days are there between two dates

SELECT Order_Number
      ,Order_Date
      ,DATEDIFF (Day, Order_Date, Current_Date) as Number_of_Days
FROM Order_Table ;

Order_Number  Order_Date  Number_of_Days
____________  __________  ______________
123585        10/10/1999  5474
123777        09/09/1999  5505
123512        01/01/1999  5756
123456        05/04/1998  5998
123552        10/01/1999  5483

This function uses a datepart (day, week, month etc.) and two target expressions. This function returns the difference between the two expressions. The expressions must be date or timestamp expressions, and they must both contain the specified datepart. If the second date is later than the first date, the result is positive. If the second date is earlier than the first date, the result is negative.


The Date_Part Function Using a Date

What orders came in on a Friday?

SELECT *
FROM Order_Table
WHERE Date_Part (dow, Order_Date) = 5 ;

dow = Day of Week
0 – Sunday
1 – Monday
2 – Tuesday
3 – Wednesday
4 – Thursday
5 – Friday
6 – Saturday

Order_Number  Customer_Number  Order_Date  Order_Total
____________  _______________  __________  ___________
123512        11111111         01/01/1999  8005.91
123552        31323134         10/01/1999  5111.47

The first argument names the specific part of the date value (year, month, or day, for example) that the DATE_PART function operates on. The second argument must be a date or timestamp expression that contains the specified datepart.


The Date_Part Function Using a Time

Extract the minute from the timestamp

SELECT DATE_PART (minute, '2014-10-20 02:08:01') ;

pgdate_part
___________
8

“Speak in a moment of anger and you’ll deliver the greatest speech you’ll ever regret.” – Anonymous

The first argument names the specific part of the date value (year, month, or day, for example) that the DATE_PART function operates on. The second argument must be a date or timestamp expression that contains the specified datepart. Notice that the default column name for the DATE_PART function is PGDATE_PART.
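Since the default header PGDATE_PART says little about what was extracted, an alias usually reads better. A minimal sketch follows; the alias names are illustrative, and the second query assumes the Order_Table used earlier in this chapter.

```sql
-- Alias the result so the answer set header describes the value.
SELECT DATE_PART (minute, '2014-10-20 02:08:01') AS Minute_Part ;

-- The same idea on column data: show each order with its day of week
-- (0 = Sunday through 6 = Saturday, per the dow list shown earlier).
SELECT Order_Number
      ,Order_Date
      ,DATE_PART (dow, Order_Date) AS Day_Of_Week
FROM Order_Table ;
```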


Date_Part Abbreviations

Below are dateparts for date or timestamp functions. The following table identifies the datepart and timepart names and abbreviations that are accepted as arguments to the following functions:

• DATEADD
• DATEDIFF
• DATEPART
• DATE_PART
• DATE_TRUNC
• EXTRACT

Datepart or timepart        Abbreviations
__________________________  ______________________________________________________________
millennium, millennia       mil, mils
century, centuries          c, cent, cents
decade, decades             dec, decs
year, years                 y, yr, yrs
quarter, quarters           qtr, qtrs
month, months               mon, mons
week, weeks                 w
day of week                 dayofweek, dow, dw, weekday
day of year                 dayofyear, doy, dy, yearday
day, days                   d
hour, hours                 h, hr, hrs
minute, minutes             m, min, mins
second, seconds             s, sec, secs
millisecond, milliseconds   ms, msec, msecs, mseconds, millisec, millisecs, millisecon
microsecond, microseconds   microsec, microsecs, microsecond, usecond, useconds, us, usec, usecs

Above are the functions for datepart or timepart, their parts, and the acceptable abbreviations.
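Because the full names and the abbreviations are interchangeable, a quick way to convince yourself is to run the same function both ways. The sketch below uses literal dates so it does not depend on any table; both columns are expected to return the same number.

```sql
-- 'quarter' and its abbreviation 'qtr' name the same datepart.
SELECT DATEDIFF (quarter, '1998-07-01', '1999-07-01') AS full_name
      ,DATEDIFF (qtr,     '1998-07-01', '1999-07-01') AS abbreviated
;
```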


The to_char command

SELECT Order_Date
      ,Order_Date + 60 as "Due Date"
      ,to_char(Order_Total, '$99,999.99') As Order_Total
      ,Order_Date + 50 as "Discount Date"
      ,to_char(Order_Total *.98, '$99,999.99') as Discounted
FROM Order_Table
ORDER BY 1 ;

Order_Date  Due Date    Order_Total  Discount Date  Discounted
__________  __________  ___________  _____________  __________
05/04/1998  07/03/1998  $12,347.53   06/23/1998     $12,100.58
01/01/1999  03/02/1999  $8,005.91    02/20/1999     $7,845.79
09/09/1999  11/08/1999  $23,454.84   10/29/1999     $22,985.74
10/01/1999  11/30/1999  $5,111.47    11/20/1999     $5,009.24
10/10/1999  12/09/1999  $15,231.62   11/29/1999     $14,926.99

The to_char command will take a value and convert it to a character string.


Conversion Functions

Function Name    Conversion Operation
_______________  ________________________________________
to_number()      Character to numeric
to_date()        Character or timestamp to date
to_timestamp()   Character to timestamp
to_char()        Numeric, date or timestamp to character

The following shows the syntax for using these functions:

,to_number(,'')
,to_date(,'')
,to_timestamp(,'')
,to_char()
,to_date(,'') ;

The NPS database provides some functions that assist in the conversion of data from one type to another.


Conversion Function Templates

Template                                   Description
_________________________________________  ______________________________________________
HH, HH12                                   Hour of day (01:12).
HH24                                       Hour of day (00:23).
MI                                         Minute (00:59).
SS                                         Second (00:59).
SSSS                                       Seconds past midnight (0:86399).
AM, am, A.M., a.m. or PM, pm, P.M., p.m.   Meridian indicator (uppercase and lowercase).
Y,YYY                                      Year (4 and more digits) with a comma.
YYYY                                       Year (4 and more digits).
YYY                                        Last 3 digits of the year.
YY                                         Last 2 digits of the year.
Y                                          Last digit of the year.
MONTH, Month, month                        Full month name (blank-padded to 9 chars).
MON, Mon, mon                              Abbreviated uppercase month name (3 chars).
MM                                         Month number (01:12).
DAY, Day, day                              Full day name (blank-padded to 9 chars).
DY, Dy, dy                                 Abbreviated uppercase day name (3 chars).
DDD                                        Day of the year (001:366).
DD                                         Day of the month (01:31).
D                                          Day of the week (1:7; SUN=1).
BC, bc, B.C., b.c or AD, ad, A.D., a.d.    Era indicator (uppercase and lowercase).


Conversion Function Templates Continued

Template        Description
______________  ______________________________________________________________________________
W               Week of the month (1:5) where the first week starts on the first day of the month.
WW              Week number of the year (1:53) where the first week starts on the first day of the year.
IW              ISO week number of the year (The first Thursday of the new year is in week 1.)
CC              Century (2 digits).
J               Julian Day (days since January 1, 4712 BC).
Q               Quarter
RM              Month in Roman Numerals (I-XII; I=January), uppercase.
rm              Month in Roman Numerals (i-xii; i=January), lowercase.
FM prefix       Fill mode (suppresses padding blanks and zeroes).
TH, th suffix   Add uppercase ordinal number suffix.
FX prefix       Fixed format global option.
9               Value with the specified number of digits.
0               Value with leading zeros.
. (period)      Decimal point.
, (comma)       Group (thousand) separator.
PR              Negative value in angle brackets.
S               Negative value with minus sign (uses locale).
L               Currency symbol (uses locale).
D               Decimal point (uses locale).
G               Group separator (uses locale).
MI              Minus sign in the specified position (if number < 0).
RN              Roman numeral (input between 1 and 3999).
V               Shift n digits (see notes).


Formatting a Date

The to_char command will take a value and convert it to a character string. This includes formatting a date.
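As a rough illustration of date formatting, the sketch below applies a few templates from the Conversion Function Templates tables to Order_Date. It assumes the Order_Table used throughout this chapter; the alias names are made up, and the exact padding of the output may differ on your system.

```sql
-- Each to_char call pairs the same date with a different template string.
SELECT Order_Date
      ,to_char(Order_Date, 'YYYY-MM-DD')        AS iso_style     -- 4-digit year, month number, day of month
      ,to_char(Order_Date, 'Mon DD, YYYY')      AS short_style   -- abbreviated month name
      ,to_char(Order_Date, 'Month DD, YYYY')    AS long_style    -- full month name, blank-padded to 9 chars
      ,to_char(Order_Date, 'FMMonth DD, YYYY')  AS trimmed_style -- FM prefix suppresses the padding
FROM Order_Table
ORDER BY 1 ;
```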


A Summary of Math Operations on Dates

1) DATE – DATE = Interval (days between dates)
2) DATE + or - Integer = Date

Let's find the number of days Tera-Tom had been alive as of his last birthday.

SELECT (date '2012-01-10') - (date '1959-01-10') AS "Tera-Tom's Age In Days";

Tera-Tom's Age In Days
______________________
19358

A DATE – DATE is an interval of days between dates. A DATE + or – Integer = Date. The query above uses the dates the traditional way to deliver the Interval.


Using a Math Operation to find your Age in Years

1) DATE – DATE = Interval (days between dates)
2) DATE + or - Integer = Date

Let's find the number of days Tera-Tom had been alive as of his last birthday.

SELECT (date '2012-01-10') - (date '1959-01-10') AS "Tera-Tom's Age In Days";

Tera-Tom's Age In Days
______________________
19358

Let's find the number of years Tera-Tom had been alive as of his last birthday.

SELECT ((date '2012-01-10') - (date '1959-01-10'))/365 "Tera-Tom's Age In Years";

Tera-Tom's Age In Years
_______________________
53

A DATE – DATE is an interval of days between dates. A DATE + or – Integer = Date. Both queries above use the same date subtraction; the top query returns "Days", and the bottom query divides those days by 365 to find "Years". Because dividing by 365 ignores leap days, treat the years figure as an approximation.
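The second rule, DATE + or - Integer = Date, has no example on this page, so here is a small sketch that shows both rules with the same literal date. The alias names are illustrative.

```sql
SELECT (date '2012-01-10') - (date '1959-01-10') AS days_between    -- DATE - DATE: a number of days (19358 above)
      ,(date '2012-01-10') + 30                  AS thirty_days_on  -- DATE + Integer: a date 30 days later (February 9, 2012)
      ,(date '2012-01-10') - 30                  AS thirty_days_ago -- DATE - Integer: a date 30 days earlier
;
```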


Date Related Functions

TO_CHAR - Receives a date and based on the template characters, displays portions of the date.
TO_DATE - Receives a character string and converts it to a date based on the template provided.

SELECT to_char(Order_Date, 'Day – dddd, Mon yy')
      ,Order_Date -365 "Year Later Date"
      ,to_char(Order_Total,'$99,999.99') Order_Total
      ,to_date('Dec 31, 2005','mon dd, yyyy') as "Due Date"
FROM Order_Table
ORDER BY 2 ;

Answer Set

TO_CHAR                  Year Later Date  ORDER_TOTAL   Due Date
_______________________  _______________  ____________  __________
Monday – 1242, May 98    05/04/1997       $ 12,347.53   12/31/2005
Friday – 0016, Jan 99    01/01/1998       $ 8,005.91    12/31/2005
Thursday – 2525, Sep 99  09/09/1998       $ 23,454.84   12/31/2005
Friday – 2746, Oct 99    10/01/1998       $ 5,111.47    12/31/2005
Sunday – 2831, Oct 99    10/10/1998       $ 15,231.62   12/31/2005


A Side Title example with Reserved Words as an Alias

SELECT 'Due Date:' AS "''" /* title as 2 single quotes for no title */
      ,EXTRACT(Month FROM Order_date+64) AS "Month"
      ,EXTRACT(Day FROM Order_date+64) AS "Day"
      ,EXTRACT(Year FROM Order_date+64) AS "Year"
      ,to_char(Order_Date, 'Mon-dd, yyyy')
      ,Order_Total
FROM Order_Table
ORDER BY 2,3 ;

''         Month  Day  Year  TO_CHAR       ORDER_TOTAL
Due Date:  3      6    1999  Jan-01, 1999  8005.91
Due Date:  7      7    1998  May-04, 1998  12347.53
Due Date:  11     12   1999  Sep-09, 1999  23454.84
Due Date:  12     4    1999  Oct-01, 1999  5111.47
Due Date:  12     13   1999  Oct-10, 1999  15231.62

The SELECT above uses entirely ANSI compliant code to show the month, day, and year of a payment due date 64 days (roughly 2 months and 4 days) after the order date. Notice it uses double quotes to allow reserved words as alias names.


Implied Extract of Day, Month and Year

Compatibility: Matrix Extension.

The syntax for implied extract:

SELECT to_char(,'DD')   /* extracts the day */
      ,to_char(,'MM')   /* extracts the month */
      ,to_char(,'YYYY') /* extracts the year */
FROM ;

--The following SELECT uses to_char to extract the three portions of Tom's literal birthday

SELECT to_char(date '2012-01-10','DD') AS Day_portion
      ,to_char(date '2012-01-10','MM') AS Month_portion
      ,to_char(date '2012-01-10','YYYY') AS Year_portion ;

DAY_PORTION  MONTH_PORTION  YEAR_PORTION
10           01             2012

It was mentioned earlier that Matrix stores a date as an integer and therefore allows math operations to be performed on a date. Although the EXTRACT works great and it is ANSI compliant, it is a function. Therefore, it must be executed and the parameters passed to it to identify the desired portion as data. Then, it must pass back the answer. As a result, there is additional overhead processing required to use it.


DATE_PART Function

Compatibility: Matrix Extension.

Syntax of DATE_PART: DATE_PART(' 1

VARIANCE = (SUM(expr^2) - ((SUM(expr))^2 / COUNT(expr))) / (COUNT(expr) - 1)

VAR_POP Function

Returns the population variance of a set of numbers after discarding the nulls in this set. If you apply this function to an empty set, it returns null.

VAR_POP = (SUM(expr^2) - ((SUM(expr))^2 / COUNT(expr))) / COUNT(expr)

VAR_SAMP Function

Returns the sample variance of a set of numbers after discarding the nulls in this set. If you apply this function to an empty set, it returns null.

The above information is an introduction to the Variance functions used in conjunction with an OVER statement.
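One way to get comfortable with the variance formulas is to compute them by hand next to the built-in functions. The sketch below does that with plain aggregates: the sum of the squared values, minus the squared sum divided by the count, is divided by the count for the population variance and by the count minus one for the sample variance. It assumes a Sales_Table with Daily_Sales and Sale_Date columns like the one used in this chapter; the alias names are made up, and the by-hand columns may differ from the built-ins in the trailing decimal places.

```sql
SELECT VAR_POP(Daily_Sales)  AS var_pop_builtin
      ,(SUM(Daily_Sales * Daily_Sales)
        - (SUM(Daily_Sales) * SUM(Daily_Sales)) / COUNT(Daily_Sales))
       / COUNT(Daily_Sales)  AS var_pop_by_hand      -- divide by N
      ,VAR_SAMP(Daily_Sales) AS var_samp_builtin
      ,(SUM(Daily_Sales * Daily_Sales)
        - (SUM(Daily_Sales) * SUM(Daily_Sales)) / COUNT(Daily_Sales))
       / (COUNT(Daily_Sales) - 1) AS var_samp_by_hand -- divide by N - 1; needs at least two rows
FROM Sales_Table
WHERE EXTRACT(MONTH FROM Sale_Date) = 9 ;
```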


Chapter 13

OLAP Functions

Using VARIANCE with PARTITION BY Example

SELECT Product_ID AS "Prod", Sale_Date, Daily_Sales AS "Sales"
      ,VARIANCE(Daily_Sales) OVER (ORDER BY sale_date
                                   ROWS unbounded PRECEDING) AS VAR_S
      ,VARIANCE(Daily_Sales) OVER (PARTITION BY sale_date
                                   ORDER BY sale_date
                                   ROWS unbounded PRECEDING) VAR_P
      ,VAR_POP(Daily_Sales) OVER (ORDER BY sale_date
                                  ROWS unbounded PRECEDING) AS VAR_POP
      ,VAR_SAMP(Daily_Sales) OVER (ORDER BY sale_date
                                   ROWS unbounded PRECEDING) AS VAR_SMP
FROM sql_class..sales_table
WHERE EXTRACT(MONTH FROM Sale_Date) = 9 ;

Prod  SALE_DATE   Sales     VAR_S            VAR_P             VAR_POP          VAR_SMP
____  __________  ________  _______________  ________________  _______________  _______________
1000  09/28/2000  48850.40  0.000000         0.000000          0.000000         ?
2000  09/28/2000  41888.88  24231380.355200  24231380.355200   12115690.177600  24231380.355200
3000  09/28/2000  61301.77  96726612.289900  96726612.289900   64484408.193267  96726612.289900
1000  09/29/2000  54500.22  68132259.897492  0.000000          51099194.923119  68132259.897492
2000  09/29/2000  48000.00  53742301.588280  21126430.024200   42993841.270624  53742301.588280
3000  09/29/2000  34509.13  87815719.265187  103983352.132234  73179766.054322  87815719.265187
1000  09/30/2000  36000.07  94355558.030514  0.000000          80876192.597584  94355558.030514
2000  09/30/2000  49850.03  82333329.261021  95910696.000800   72041663.103394  82333329.261021
3000  09/30/2000  43868.86  73037469.176561  48252273.772434   64922194.823610  73037469.176561

Wow! Another amazing example. The above example uses all three variance functions (VARIANCE, VAR_POP, and VAR_SAMP) to produce output ordered by the sale date for the dates in September.


Using FIRST_VALUE and LAST_VALUE

The FIRST_VALUE and LAST_VALUE functions allow you to specify sorted aggregate groups and return the first or last value of each group. The function needs to know the length of the data at run time and does not allow a decimal value.

Syntax for FIRST_VALUE and LAST_VALUE:

{FIRST_VALUE | LAST_VALUE} ({ | | *}) OVER
([PARTITION BY [,...]]
 ORDER BY { [ASC | DESC] } [,...]]
 [ ROWS | RANGE {{ CURRENT ROW | UNBOUNDED | PRECEDING}
   | BETWEEN {CURRENT ROW | UNBOUNDED PRECEDING | PRECEDING}
     AND {CURRENT ROW | UNBOUNDED FOLLOWING | FOLLOWING}}]
 [ EXCLUDE CURRENT ROW | EXCLUDE GROUP | EXCLUDE TIES | EXCLUDE NO OTHERS ] ) ;

The above provides the syntax for FIRST_VALUE and LAST_VALUE.


Using FIRST_VALUE

SELECT Last_name, first_name, dept_no
      ,FIRST_VALUE(first_name) OVER (ORDER BY dept_no, last_name desc
                                     rows unbounded preceding) AS "First All"
      ,FIRST_VALUE(first_name) OVER (PARTITION BY dept_no
                                     ORDER BY dept_no, last_name desc
                                     rows unbounded preceding) AS "First Partition"
FROM SQL_Class..Employee_Table;

Last_Name   First_Name  Dept_No  First All  First Partition
__________  __________  _______  _________  _______________
Jones       Squiggy     ?        Squiggy    Squiggy
Smythe      Richard     10       Squiggy    Richard
Chambers    Mandee      100      Squiggy    Mandee
Smith       John        200      Squiggy    John
Coffing     Billy       200      Squiggy    John
Larkins     Loraine     300      Squiggy    Loraine
Strickling  Cletus      400      Squiggy    Cletus
Reilly      William     400      Squiggy    Cletus
Harrison    Herbert     400      Squiggy    Cletus

The above example uses FIRST_VALUE to show you the very first first_name returned. It also uses the keyword Partition to show you the very first first_name returned in each department.


Using LAST_VALUE

SELECT Last_name, first_name, dept_no
      ,LAST_VALUE(first_name) OVER (ORDER BY dept_no, last_name desc
                                    rows unbounded preceding) AS "Last All"
      ,LAST_VALUE(first_name) OVER (PARTITION BY dept_no
                                    ORDER BY dept_no, last_name desc
                                    rows unbounded preceding) AS "Last Partition"
FROM sql_class.Employee_Table;

Last_Name   First_Name  Dept_No  Last All  Last Partition
__________  __________  _______  ________  ______________
Jones       Squiggy     ?        Squiggy   Squiggy
Smythe      Richard     10       Richard   Richard
Chambers    Mandee      100      Mandee    Mandee
Smith       John        200      John      John
Coffing     Billy       200      Billy     Billy
Larkins     Loraine     300      Loraine   Loraine
Strickling  Cletus      400      Cletus    Cletus
Reilly      William     400      William   William
Harrison    Herbert     400      Herbert   Herbert

The FIRST_VALUE and LAST_VALUE are good to use anytime you need to propagate a value from one row to all or multiple rows based on a sorted sequence. However, the output from the LAST_VALUE function appears to be incorrect or incomplete until you understand a few concepts. The SQL request specifies "rows unbounded preceding", so the window for each row runs from the first row through the current row. LAST_VALUE looks at the last row of that window, which is always the current row, and therefore each row shows its own first_name.
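If what you want is the true last value of each department on every row, one approach is to widen the window so it reaches the end of the partition. The sketch below uses the BETWEEN ... AND UNBOUNDED FOLLOWING form listed in the syntax for these functions; treat it as an illustration rather than tested Matrix output. With the sort order used above, every row in department 400 would be expected to show Herbert.

```sql
SELECT Last_name, first_name, dept_no
      ,LAST_VALUE(first_name) OVER (PARTITION BY dept_no
                                    ORDER BY last_name desc
                                    ROWS BETWEEN UNBOUNDED PRECEDING
                                             AND UNBOUNDED FOLLOWING) AS "Last Partition"
FROM Employee_Table ;
```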


Using LAG and LEAD

The LAG and LEAD functions allow you to compare different rows of a table by specifying an offset from the current row. You can use these functions to analyze change and variation.

Syntax for LAG and LEAD:

{LAG | LEAD} (, [ [, ]]) OVER
([PARTITION BY [,...]]
 ORDER BY [ASC | DESC] [,...] );

“Only he who attempts the ridiculous may achieve the impossible.” – Don Quixote

The above provides information and the syntax for LAG and LEAD.


Using LEAD

SELECT last_name, dept_no
      ,lead(dept_no) over (order by dept_no, last_name) as "Lead All"
      ,lead(dept_no) over (Partition by dept_no
                           order by dept_no, last_name) as "Lead Partition"
FROM employee_table;

Last_Name   Dept_No  Lead All  Lead Partition
__________  _______  ________  ______________
Jones       ?        10        ?
Smythe      10       100       ?
Chambers    100      200       ?
Coffing     200      200       200
Smith       200      300       ?
Larkins     300      400       ?
Harrison    400      400       400
Reilly      400      400       400
Strickling  400      ?         ?

As you can see, the first LEAD brings back the value from the next row except for the last which has no row following it. The offset value was not specified in this example, so it defaulted to a value of 1 row.


Using LEAD With an Offset of 2

SELECT last_name, dept_no
      ,lead(dept_no,2) over (order by dept_no, last_name) as "Lead All"
      ,lead(dept_no,2) over (Partition by dept_no
                             order by dept_no, last_name) as "Lead Partition"
FROM employee_table;

Last_Name   Dept_No  Lead All  Lead Partition
__________  _______  ________  ______________
Jones       ?        100       ?
Smythe      10       200       ?
Chambers    100      200       ?
Coffing     200      300       ?
Smith       200      400       ?
Larkins     300      400       ?
Harrison    400      400       400
Reilly      400      ?         ?
Strickling  400      ?         ?

Above, each value in the first LEAD comes from the row 2 rows away. The partitioned LEAD returns a value only when a partition contains at least one more row than the offset value. Here that means at least three rows, which is true only for department 400.


Using LAG

SELECT last_name, dept_no
      ,lag(dept_no) over (order by dept_no, last_name) as "Lag All"
      ,lag(dept_no) over (Partition by dept_no
                          order by dept_no, last_name) as "Lag Partition"
FROM employee_table;

Last_Name   Dept_No  Lag All  Lag Partition
__________  _______  _______  _____________
Jones       ?        ?        ?
Smythe      10       ?        ?
Chambers    100      10       ?
Coffing     200      100      ?
Smith       200      200      200
Larkins     300      200      ?
Harrison    400      300      ?
Reilly      400      400      400
Strickling  400      400      400

From the example above, you see that LAG uses the value from a previous row and makes it available in the next row. For LAG, the first row(s) will contain a null based on the value in the offset, which defaults to 1 above. The first null comes from the function because no row precedes Jones, whereas the second row gets its null from the first row, where Jones's Dept_No is null.


Using LAG with an Offset of 2

SELECT last_name, dept_no
      ,lag(dept_no,2) over (order by dept_no, last_name) as "Lag All"
      ,lag(dept_no,2) over (Partition by dept_no
                            order by dept_no, last_name) as "Lag Partition"
FROM employee_table;

Last_Name   Dept_No  Lag All  Lag Partition
__________  _______  _______  _____________
Jones       ?        ?        ?
Smythe      10       ?        ?
Chambers    100      ?        ?
Coffing     200      10       ?
Smith       200      100      ?
Larkins     300      200      ?
Harrison    400      200      ?
Reilly      400      300      ?
Strickling  400      400      400

For this example, the first two rows of "Lag All" have a null because there is not a row two rows before them. The number of nulls produced this way will always be the same as the offset value. There is a third null because Jones's Dept_No is null.
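LAG becomes more interesting when you subtract the previous value from the current one to analyze change. The sketch below compares each product's daily sales with its previous sale date. It assumes a Sales_Table with Product_ID, Sale_Date, and Daily_Sales columns like the one used earlier in this chapter; the aliases Prev_Sales and Sales_Change are made up for illustration.

```sql
SELECT Product_ID, Sale_Date, Daily_Sales
      ,LAG(Daily_Sales) OVER (PARTITION BY Product_ID
                              ORDER BY Sale_Date) AS Prev_Sales
       -- The first row of each product has no previous row,
       -- so Prev_Sales and Sales_Change are null there.
      ,Daily_Sales - LAG(Daily_Sales) OVER (PARTITION BY Product_ID
                                            ORDER BY Sale_Date) AS Sales_Change
FROM Sales_Table
ORDER BY Product_ID, Sale_Date ;
```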


Chapter 14 – Temporary Tables

“I cannot imagine any condition which would cause this ship to founder. Modern shipbuilding has gone beyond that.” - E. I. Smith, Captain of the Titanic


CREATING A Derived Table

• Exists only within a query
• Materialized by a SELECT Statement inside a query
• Space comes from the User’s Spool space
• Deleted when the query ends

SELECT *
FROM (SELECT AVG(salary)
      FROM Employee_Table) AS TeraTom(AVGSAL) ;

A query within a query.

Answer Set

AVGSAL
________
46782.15

The SELECT Statement that creates and populates the Derived table is always inside Parentheses.


The Three Components of a Derived Table

SELECT E.*, Salary - AVGSAL as PlusMinAvg
FROM Employee_Table as E
INNER JOIN
     (SELECT Dept_No, AVG(salary) as AVGSAL
      FROM Employee_Table
      GROUP BY Dept_No) AS TeraTom
ON E.Dept_No = TeraTom.Dept_No
ORDER BY E.Dept_No ;

TeraTom (the derived table lives in memory)

Dept_No  AVGSAL
_______  ________
?        32800.50
10       64300.00
100      48850.00
200      44944.44
300      40200.00
400      48333.33

1) A derived table will always have a SELECT query to materialize the derived table with data. The SELECT query always starts with an open parenthesis and ends with a close parenthesis.

2) The derived table must be given a name. Above we called our derived table TeraTom.

3) You will need to define (alias) the columns in the derived table. Above we allowed Dept_No to default to Dept_No, but we had to specifically alias AVG(Salary) as AVGSAL.

Every derived table must have the three components listed above.


Naming the Derived Table

SELECT *
FROM (SELECT AVG(salary)
      FROM Employee_Table) AS TeraTom(AVGSAL) ;

The name of the Derived Table is TeraTom

Answer Set

AVGSAL
________
46782.15

In the example above, TeraTom is the name we gave the Derived Table. You must always name the derived table; if you do not, the query fails with an error.


Aliasing the Column Names in The Derived Table

SELECT *
FROM (SELECT AVG(salary)
      FROM Employee_Table) AS TeraTom(AVGSAL) ;

AVGSAL is the Column Name in the derived table named TeraTom

Answer Set

AVGSAL
________
46782.15

AVGSAL is the name we gave to the column in our Derived Table that we call TeraTom. Our SELECT (which builds the columns) shows we are only going to have one column in our derived table, and we have named that column AVGSAL.


Visualize This Derived Table

SELECT E.*, Salary - AVGSAL as PlusMinAvg
FROM Employee_Table as E
INNER JOIN
     (SELECT Dept_No, AVG(salary) as AVGSAL
      FROM Employee_Table
      GROUP BY Dept_No) AS TeraTom
ON E.Dept_No = TeraTom.Dept_No
ORDER BY E.Dept_No ;

TeraTom (the derived table is built first)

Dept_No  AVGSAL
_______  ________
?        32800.50
10       64300.00
100      48850.00
200      44944.44
300      40200.00
400      48333.33

Answer Set

Employee_No  Dept_No  Last_Name   First_Name  Salary    PlusMinAvg
___________  _______  __________  __________  ________  __________
1000234      10       Smythe      Richard     64300.00  0.00
1232578      100      Chambers    Mandee      48850.00  0.00
1324657      200      Coffing     Billy       41888.88  -3055.56
1333454      200      Smith       John        48000.00  3055.56
2312225      300      Larkins     Loraine     40200.00  0.00
1121334      400      Strickling  Cletus      54500.00  6166.67
1256349      400      Harrison    Herbert     54500.00  6166.67
2341218      400      Reilly      William     36000.00  -12333.33

Our example above shows the data in the derived table named TeraTom. This query allows us to see each employee and how far their salary is above or below the average salary of their department.


Most Derived Tables Are Used To Join To Other Tables

SELECT E.*, AVGSAL
FROM Employee_Table as E
INNER JOIN
     (SELECT Dept_No, AVG(salary)
      FROM Employee_Table
      GROUP BY Dept_No) AS TeraTom (Dept_No, AVGSAL)
ON E.Dept_No = TeraTom.Dept_No
ORDER BY E.Dept_No ;

The SELECT materializes the Derived Table
The derived table name is TeraTom
The columns are aliased

Employee_No  Dept_No  Last_Name   First_Name  Salary    AVGSAL
___________  _______  __________  __________  ________  ________
1000234      10       Smythe      Richard     64300.00  64300.00
1232578      100      Chambers    Mandee      48850.00  48850.00
1324657      200      Coffing     Billy       41888.88  44944.44
1333454      200      Smith       John        48000.00  44944.44
2312225      300      Larkins     Loraine     40200.00  40200.00
1121334      400      Strickling  Cletus      54500.00  48333.33
1256349      400      Harrison    Herbert     54500.00  48333.33
2341218      400      Reilly      William     36000.00  48333.33

The first five columns in the Answer Set came from the Employee_Table. AVGSAL came from the derived table named TeraTom.


Multiple Ways to Alias the Columns in a Derived Table

1) Aliasing the column(s) can be done after the derived table name. The derived table must always be named.

SELECT *
FROM (SELECT AVG(salary)
      FROM Employee_Table) AS TeraTom(AVGSAL) ;

2) Alias can be done inside the derived SELECT statement. The derived table must always be named.

SELECT *
FROM (SELECT AVG(salary) AS AVGSAL
      FROM Employee_Table) AS TeraTom ;


Our Join Example with a Different Column Aliasing Style

SELECT E.*, AVGSAL
FROM Employee_Table as E
INNER JOIN
     (SELECT Dept_No as Dept_No, AVG(salary) as AVGSAL
      FROM Employee_Table
      GROUP BY Dept_No) AS TeraTom
ON E.Dept_No = TeraTom.Dept_No
ORDER BY E.Dept_No ;

Dept_No: I don't need to alias this because it can default to its current name
AVG(salary): I must alias this column because it is an aggregate

Employee_No  Dept_No  Last_Name   First_Name  Salary    AVGSAL
___________  _______  __________  __________  ________  ________
1000234      10       Smythe      Richard     64300.00  64300.00
1232578      100      Chambers    Mandee      48850.00  48850.00
1324657      200      Coffing     Billy       41888.88  44944.44
1333454      200      Smith       John        48000.00  44944.44
2312225      300      Larkins     Loraine     40200.00  40200.00
1121334      400      Strickling  Cletus      54500.00  48333.33
1256349      400      Harrison    Herbert     54500.00  48333.33
2341218      400      Reilly      William     36000.00  48333.33


Column Aliasing Can Default for Normal Columns

SELECT E.*, AVGSAL
FROM Employee_Table as E
INNER JOIN
     (SELECT Dept_No, AVG(salary) as AVGSAL
      FROM Employee_Table
      GROUP BY Dept_No) AS TeraTom
ON E.Dept_No = TeraTom.Dept_No
ORDER BY E.Dept_No ;

Dept_No: I don't need to alias this because it can default to its current name

TeraTom (the derived table is built first)

Dept_No  AVGSAL
_______  ________
?        32800.50
10       64300.00
100      48850.00
200      44944.44
300      40200.00
400      48333.33


CREATING a Derived Table using the WITH Command

Create the Derived Table before we run the query!

WITH TeraTom(AVGSAL) AS
     (SELECT AVG(salary) FROM Employee_Table)
SELECT *
FROM TeraTom ;

Answer Set

AVGSAL
________
46782.15

When using the WITH Command, we can CREATE our Derived table before running the main query. The only issue here is that you can only have one WITH keyword per query, although a single WITH clause can define more than one derived table.


Our Join Example With the WITH Syntax

WITH TeraTom (Dept_No, AVGSAL) AS
     (SELECT Dept_No, AVG(Salary)
      FROM Employee_Table
      GROUP BY Dept_No)
SELECT E.*, AVGSAL
FROM Employee_Table as E
INNER JOIN TeraTom
ON E.Dept_No = TeraTom.Dept_No
ORDER BY E.Dept_No ;

Now, the lower portion of the query refers to TeraTom almost like it is a permanent table, but it is not!

TeraTom (the derived table is built first)

Dept_No  AVGSAL
_______  ________
?        32800.50
10       64300.00
100      48850.00
200      44944.44
300      40200.00
400      48333.33


WITH

with TeraTom as (select * from Student_Table)
SELECT *
FROM TeraTom
ORDER BY 1
LIMIT 5;

“We're going to have the best-educated American people in the world.” – Dan Quayle

The example above shows the simplest possible case of a query that contains a WITH clause. The WITH query named TeraTom selects all of the rows from the Student_Table. The main query, in turn, selects all of the rows from TeraTom. The TeraTom table exists only for the life of the query.


A WITH Clause That Produces Two Tables

with Budget_Derived as (SELECT Max(Budget) as Max_Budget
                        FROM Department_Table),
     Emp_Derived as (SELECT Dept_No, AVG(Salary) as Avg_Sal
                     FROM Employee_Table
                     GROUP BY Dept_No)
select E.*, Max_Budget - Budget as Under_Max, Avg_Sal
FROM Employee_Table as E
INNER JOIN Emp_Derived
ON E.Dept_No = Emp_Derived.Dept_No
INNER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No
CROSS JOIN Budget_Derived ;

The example above shows two derived tables created from a single WITH statement.


The Same Derived Query shown Three Different Ways

1) Alias CAN be done after the derived table name

SELECT *
FROM (SELECT AVG(salary)
      FROM Employee_Table) TeraTom (AVGSAL) ;

2) Alias CAN be done inside the derived SELECT

SELECT *
FROM (SELECT AVG(salary) as AVGSAL
      FROM Employee_Table) TeraTom ;

3) Using the WITH command

WITH TeraTom(AVGSAL) AS
     (SELECT AVG(salary) FROM Employee_Table)
SELECT *
FROM TeraTom ;


Quiz - Answer the Questions

SELECT Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table
INNER JOIN
     (SELECT Dept_No, AVG(Salary)
      FROM Employee_Table
      GROUP BY Dept_No) as TeraTom (Depty, AVGSAL)
ON Dept_No = Depty ;

1) What is the name of the derived table? __________
2) How many columns are in the derived table? _______
3) What are the names of the derived table columns? ______
4) Is there more than one row in the derived table? _______
5) What common keys join the Employee and Derived? _______
6) Why were the join keys named differently? ______________


Answer to Quiz - Answer the Questions

SELECT Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table
INNER JOIN
     (SELECT Dept_No, AVG(Salary)
      FROM Employee_Table
      GROUP BY Dept_No) as TeraTom (Depty, AVGSAL)
ON Dept_No = Depty ;

1) What is the name of the derived table? TeraTom
2) How many columns are in the derived table? 2
3) What are the names of the derived table columns? Depty and AVGSAL
4) Is there more than one row in the derived table? Yes
5) What keys join the tables? Dept_No and Depty
6) Why were the join keys named differently? If both were named Dept_No, we would error unless we fully qualified.


Clever Tricks on Aliasing Columns in a Derived Table

1) Alias the join key inside the derived SELECT

SELECT Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table
INNER JOIN
     (SELECT Dept_No as Depty, AVG(Salary) as AVGSAL   /* Alias Here */
      FROM Employee_Table
      GROUP BY Dept_No) as TeraTom
ON Dept_No = Depty ;

2) Alias the outer table and fully qualify the join key

SELECT E.Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table as E                               /* Alias Here */
INNER JOIN
     (SELECT Dept_No, AVG(Salary) as AVGSAL
      FROM Employee_Table
      GROUP BY Dept_No) as TeraTom
ON E.Dept_No = TeraTom.Dept_No ;


A Derived Table lives only for the lifetime of a single query

BT ;   /* Begin Transaction */

/* 1) First query */
WITH T (Dept_No, AVGSAL) AS
     (SELECT Dept_No, AVG(Salary)
      FROM Employee_Table
      GROUP BY Dept_No)
SELECT T.Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table as E
INNER JOIN T
ON E.Dept_No = T.Dept_No ;

/* 2) Second query */
SELECT * FROM T ;

ET ;   /* End Transaction */

Error – Query Fails…. T does Not exist.
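When a second query really does need the same intermediate result, one alternative is to materialize it in a temporary table, which this chapter covers next. The sketch below is a minimal illustration rather than a tested Matrix script; the table name Dept_Avg is made up.

```sql
-- Materialize the averages once; the temporary table lasts for the session,
-- not just for one query.
CREATE TEMP TABLE Dept_Avg AS
SELECT Dept_No, AVG(Salary) AS AVGSAL
FROM Employee_Table
GROUP BY Dept_No ;

-- First query: join to the temporary table like any other table.
SELECT E.Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table AS E
INNER JOIN Dept_Avg AS T
ON E.Dept_No = T.Dept_No ;

-- Second query: Dept_Avg still exists, so this one does not fail.
SELECT * FROM Dept_Avg ;
```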


An Example of Two Derived Tables in a Single Query

WITH T (Dept_No, AVGSAL) AS
     (SELECT Dept_No, AVG(Salary)
      FROM Employee_Table
      GROUP BY Dept_No)
SELECT T.Dept_No, First_Name, Last_Name, AVGSAL, Counter
FROM Employee_Table as E
INNER JOIN T
ON E.Dept_No = T.Dept_No
INNER JOIN
     (SELECT Employee_No
            ,SUM(1) OVER(PARTITION BY Dept_No
                         ORDER BY Dept_No, Last_Name
                         Rows Unbounded Preceding)
      FROM Employee_Table) as S (Employee_No, Counter)
ON E.Employee_No = S.Employee_No
ORDER BY T.Dept_No;


Create Table Syntax

CREATE [ [ LOCAL ] { TEMPORARY | TEMP } ] TABLE table_name
(
  { column_name data_type [ column_attributes ] [ column_constraints ]
    | table_constraints
    | LIKE parent_table [ { INCLUDING | EXCLUDING } DEFAULTS ] }
  [, ... ]
)
[ table_attribute ]

where column_attributes are:
  [ DEFAULT default_expr ]
  [ IDENTITY ( seed, step ) ]
  [ ENCODE encoding ]
  [ DISTKEY ]
  [ SORTKEY ]

and column_constraints are:
  [ { NOT NULL | NULL } ]
  [ { UNIQUE | PRIMARY KEY } ]
  [ REFERENCES reftable [ ( refcolumn ) ] ]

and table_constraints are:
  [ UNIQUE ( column_name [, ... ] ) ]
  [ PRIMARY KEY ( column_name [, ... ] ) ]
  [ FOREIGN KEY ( column_name [, ... ] ) REFERENCES reftable [ ( refcolumn ) ] ]

and table_attributes are:
  [ DISTSTYLE { EVEN | KEY } ]
  [ DISTKEY ( column_name ) ]
  [ SORTKEY ( column_name [, ...] ) ]

CREATE TABLE creates a new table in the current database. The owner of the table is the user who issues the CREATE TABLE command.


Basic Temporary Table Examples

If you begin a table name with a #, the table is automatically created as a temporary table.

CREATE TABLE #Employee_Temp
(Emp_No      INTEGER      NULL,
 Dept_No     SMALLINT     NULL,
 Last_name   CHAR(20)     NULL,
 First_name  VARCHAR(12)  NULL)
DISTKEY(Emp_No);

We now materialize our table with an Insert/Select statement. The temporary table has four columns, so the SELECT lists the four matching columns of the Employee_Table.

INSERT INTO #Employee_Temp
SELECT Employee_No, Dept_No, Last_Name, First_Name
FROM Employee_Table ;

When you create a temporary table, it is visible only within the current session. The table is automatically dropped at the end of the session. Above, we use the pound sign (#) at the front of the table name to automatically make the table a temporary table. We then populate the table with an Insert/Select.


More Advanced Temporary Table Examples

1) Each of the two statements below creates a temporary table named Emp_Temp. It inherits its columns, Distkey and Sortkey from the Employee_Table. Run one or the other; the second statement fails if Emp_Temp already exists in the session.

CREATE Temp Table Emp_Temp (like Employee_Table);

CREATE Temp Table Emp_Temp as
SELECT * FROM Employee_Table ;

2) This creates a temp table that has only the First_Name and the Last_Name columns in it.

CREATE Temp Table Employ_T as
SELECT First_Name, Last_Name
FROM Employee_Table ;

When you create a temporary table, it is visible only within the current session. The table is automatically dropped at the end of the session. A derived table only lasts for the life of a single query, but a temporary table lasts for the entire session. This allows a user to run hundreds of queries against the temporary table. A temporary table can have the same name as a permanent table, but I don't recommend this. You don't give a temporary table a schema because it is automatically associated with the user's session. Once the session is over, the table and its data are dropped. If the user tries to query the table in another session, the system won't recognize the table. In other words, the table doesn't exist outside of the session it was created in.
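The session scoping described above can be sketched with two connections to the same database. This uses Python's sqlite3 module as a stand-in for Matrix (an assumption), where each connection plays the role of a session. The # prefix, DISTKEY and SORTKEY are Matrix features and are left out; the file name demo.db and the variable names are hypothetical.

```python
import os
import sqlite3
import tempfile

# Two connections to one database file stand in for two user sessions.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
session_1 = sqlite3.connect(path)
session_2 = sqlite3.connect(path)

# A permanent table, committed so that both sessions can see it.
session_1.execute("CREATE TABLE Employee_Table (Employee_No INTEGER, Last_Name TEXT)")
session_1.execute("INSERT INTO Employee_Table VALUES (1232578, 'Chambers')")
session_1.commit()

# A temporary table created and queried in session 1.
session_1.execute("CREATE TEMP TABLE Emp_Temp AS SELECT * FROM Employee_Table")
rows_in_session_1 = session_1.execute("SELECT * FROM Emp_Temp").fetchall()

# Session 2 does not recognize the temporary table.
try:
    session_2.execute("SELECT * FROM Emp_Temp")
    visible_in_session_2 = True
except sqlite3.OperationalError:
    visible_in_session_2 = False

# The permanent table is visible in session 2.
shared = session_2.execute("SELECT COUNT(*) FROM Employee_Table").fetchone()[0]

session_1.close()  # closing the session also drops Emp_Temp
session_2.close()
print(rows_in_session_1, visible_in_session_2, shared)
```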


Advanced Temporary Table Examples

1) This example distributes the new table on a different key column (Customer_Number) than the incoming data, which is sorted on the Order_Date column. It defines no SORTKEY column; therefore the table is not sorted.

CREATE Temp Table Order_Temp distkey(Customer_Number) AS
SELECT * FROM Order_Table ;

2) This example applies even distribution to the table but does not define an explicit sort key. The new table has no sort key and no distribution key.

CREATE Temp Table Sales_Temp diststyle even as
SELECT Product_Id, Sale_Date, Daily_Sales
FROM Sales_Table ;

3) This example applies even distribution and defines a sort key. The resulting table has a sort key but no distribution key.

CREATE Temp Table Sales_Sorted diststyle even sortkey (Sale_Date) as
SELECT Product_ID, Sale_Date, Daily_Sales
FROM Sales_Table ;

When you create a temporary table, it is visible only within the current session. The table is automatically dropped at the end of the session. Above are some examples that allow you to define a different distkey, diststyle and sortkey. By default, users are granted permission to create temporary tables through their automatic membership in the PUBLIC group. To prevent users from creating temporary tables, revoke the TEMP permission from the PUBLIC group, and then explicitly grant the permission to create temporary tables to specific users or groups of users.


Performing a Deep Copy

A deep copy recreates and repopulates a table by using a bulk insert, which automatically sorts the table. If a table has a large unsorted region, a deep copy is much faster than a vacuum. The trade-off is that you cannot make concurrent updates during a deep copy operation, which you can do during a vacuum.

You can choose one of four methods to create a copy of the original table:

1) Use the original table DDL. This is the best method for perfect reproduction.

2) Use CREATE TABLE AS (CTAS). If the original DDL is not available, you can use CREATE TABLE AS to create a copy of the current table, then rename the copy. The new table will not inherit the encoding, distkey, sortkey, not null, primary key, and foreign key attributes of the parent table.

3) Use CREATE TABLE LIKE. If the original DDL is not available, you can use CREATE TABLE LIKE to recreate the original table. The new table will not inherit the primary key and foreign key attributes of the parent table. The new table does, though, inherit the encoding, distkey, sortkey, and not null attributes of the parent table.

4) Create a temporary table and truncate the original table. If you need to retain the primary key and foreign key attributes of the parent table, you can use CTAS to create a temporary table, then truncate the original table and populate it from the temporary table. This method is slower than CREATE TABLE LIKE because it requires two insert statements.

The next four slides will show each technique with an example.


Deep Copy Using the Original DDL

1) Use the original table DDL. This is the best method for perfect reproduction.

1. Create a copy of the table using the original CREATE TABLE DDL.
2. Use an INSERT INTO … SELECT statement to populate the copy with data from the original table.
3. Drop the original table.
4. Use an ALTER TABLE statement to rename the copy to the original table name.

The following example performs a deep copy on the Sales_Table using a duplicate of the Sales_Table named Sales_Table_Copy.

CREATE TABLE Sales_Table_Copy ( … );     (using the original DDL statement of the Sales_Table)
INSERT INTO Sales_Table_Copy (select * from Sales_Table);
DROP TABLE Sales_Table;
ALTER TABLE Sales_Table_Copy rename to Sales_Table;


Deep Copy Using A CTAS

2) Use CREATE TABLE AS (CTAS). If the original DDL is not available, you can use CREATE TABLE AS to create a copy of the current table, then rename the copy. The new table will not inherit the encoding, distkey, sortkey, not null, primary key, and foreign key attributes of the parent table.

1. Create a copy of the original table by using CREATE TABLE AS to select the rows from the original table.
2. Drop the original table.
3. Use an ALTER TABLE statement to rename the new table to the original table.

The following example performs a deep copy on the Sales_Table using a duplicate of the Sales_Table named Sales_Table_Copy.

CREATE TABLE Sales_Table_Copy as (select * from Sales_Table) ;
DROP TABLE Sales_Table ;
ALTER TABLE Sales_Table_Copy rename to Sales_Table ;
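The loss of the primary key under CTAS can be seen in a small experiment. This is a sketch that uses Python's sqlite3 module as a stand-in engine (an assumption); SQLite happens to behave the same way for constraints, but it has no encoding, distkey or sortkey, so those attributes are not modeled. The primary key on (Product_ID, Sale_Date) is a hypothetical one added for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical original table with a primary key.
    CREATE TABLE Sales_Table (
        Product_ID INTEGER, Sale_Date TEXT, Daily_Sales REAL,
        PRIMARY KEY (Product_ID, Sale_Date));
    INSERT INTO Sales_Table VALUES
        (1000, '09/30/2000', 36000.07), (1000, '10/01/2000', 40200.43);

    -- The three steps: CTAS, drop the original, rename the copy.
    CREATE TABLE Sales_Table_Copy AS SELECT * FROM Sales_Table;
    DROP TABLE Sales_Table;
    ALTER TABLE Sales_Table_Copy RENAME TO Sales_Table;
""")

row_count = conn.execute("SELECT COUNT(*) FROM Sales_Table").fetchone()[0]

# The primary key did not survive the copy, so a duplicate key is accepted.
conn.execute("INSERT INTO Sales_Table VALUES (1000, '09/30/2000', 1.00)")
duplicates = conn.execute(
    "SELECT COUNT(*) FROM Sales_Table "
    "WHERE Product_ID = 1000 AND Sale_Date = '09/30/2000'"
).fetchone()[0]

print(row_count, duplicates)
```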


Deep Copy Using A Create Table LIKE

3) Use CREATE TABLE LIKE. If the original DDL is not available, you can use CREATE TABLE LIKE to recreate the original table. The new table will not inherit the primary key and foreign key attributes of the parent table. The new table does, though, inherit the encoding, distkey, sortkey, and not null attributes of the parent table.

1. Create a new table using CREATE TABLE LIKE.
2. Use an INSERT INTO … SELECT statement to copy the rows from the current table to the new table.
3. Drop the current table.
4. Use an ALTER TABLE statement to rename the new table to the original table.

The following example performs a deep copy on the Sales_Table using a duplicate of the Sales_Table named Sales_Table_Copy.

CREATE TABLE Sales_Table_Copy (like Sales_Table);
INSERT INTO Sales_Table_Copy (select * from Sales_Table);
DROP TABLE Sales_Table;
ALTER TABLE Sales_Table_Copy RENAME to Sales_Table;


Deep Copy by Creating a Temp Table and Truncating Original

4) Create a temporary table and truncate the original table. If you need to retain the primary key and foreign key attributes of the parent table, you can use CTAS to create a temporary table, then truncate the original table and populate it from the temporary table. This method is slower than CREATE TABLE LIKE because it requires two insert statements.

1. Use CREATE TABLE AS to create a temporary table with the rows from the original table.
2. Truncate the current table.
3. Use an INSERT INTO … SELECT statement to copy the rows from the temporary table to the original table.
4. Drop the temporary table.

The following example performs a deep copy on the Sales_Table using a temporary duplicate of the Sales_Table named Sales_Table_Copy.

CREATE Temp Table Sales_Table_Copy as select * from Sales_Table ;
TRUNCATE Sales_Table ;
Insert Into Sales_Table (select * from Sales_Table_Copy);
DROP Table Sales_Table_Copy;
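Because the original table is never dropped in this method, its constraints stay in place. The sketch below illustrates that with Python's sqlite3 module as a stand-in for Matrix (an assumption). SQLite has no TRUNCATE statement, so DELETE FROM is swapped in for it, and the primary key on (Product_ID, Sale_Date) is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical original table with a primary key.
    CREATE TABLE Sales_Table (
        Product_ID INTEGER, Sale_Date TEXT, Daily_Sales REAL,
        PRIMARY KEY (Product_ID, Sale_Date));
    INSERT INTO Sales_Table VALUES
        (1000, '09/30/2000', 36000.07), (1000, '10/01/2000', 40200.43);

    -- The four steps; DELETE stands in for TRUNCATE.
    CREATE TEMP TABLE Sales_Table_Copy AS SELECT * FROM Sales_Table;
    DELETE FROM Sales_Table;
    INSERT INTO Sales_Table SELECT * FROM Sales_Table_Copy;
    DROP TABLE Sales_Table_Copy;
""")

row_count = conn.execute("SELECT COUNT(*) FROM Sales_Table").fetchone()[0]

# The original table definition is untouched, so the key is still enforced.
try:
    conn.execute("INSERT INTO Sales_Table VALUES (1000, '09/30/2000', 1.00)")
    key_enforced = False
except sqlite3.IntegrityError:
    key_enforced = True

print(row_count, key_enforced)
```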


Chapter 15 – Sub-query Functions

“An invasion of Armies can be resisted, but not an idea whose time has come.” - Victor Hugo


An IN List is much like a Subquery

SELECT *
FROM Employee_Table
WHERE Dept_No IN (100, 200) ;

Employee_No  Dept_No  Last_Name  First_Name  Salary
1232578      100      Chambers   Mandee      48850.00
1324657      200      Coffing    Billy       41888.88
1333454      200      Smith      John        48000.00

This query is very simple and easy to understand. It uses an IN List to find all Employees who are in Dept_No 100 or Dept_No 200.


An IN List Never has Duplicates – Just like a Subquery

SELECT *
FROM Employee_Table
WHERE Dept_No IN (100, 100, 200, 200) ;

Duplicates in an IN-List are silly.

What is going on with this IN List? Why in the world are there duplicates in there? Will this query even work? What will the result set look like? Turn the page!


An IN List Ignores Duplicates

SELECT *
FROM Employee_Table
WHERE Dept_No IN (100, 100, 200, 200) ;

Duplicates in an IN-List are silly.

Employee_No  Dept_No  Last_Name  First_Name  Salary
1232578      100      Chambers   Mandee      48850.00
1324657      200      Coffing    Billy       41888.88
1333454      200      Smith      John        48000.00

The answer still only produced three rows.

Duplicate values are ignored here. We got the same rows back as before, and it is as if the system ignored the duplicate values in the IN List. That is exactly what happened.
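A quick way to convince yourself is to run both IN lists and compare the answers. The sketch below does that with Python's sqlite3 module as a stand-in engine (an assumption) and a cut-down Employee_Table that keeps only three columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee_Table (Employee_No INTEGER, Dept_No INTEGER, Last_Name TEXT);
    INSERT INTO Employee_Table VALUES
        (2000000, NULL, 'Jones'),  (1000234, 10, 'Smythe'),
        (1232578, 100, 'Chambers'), (1324657, 200, 'Coffing'),
        (1333454, 200, 'Smith'),    (2312225, 300, 'Larkins');
""")

# The same query with a plain IN list and with repeated values.
plain = conn.execute(
    "SELECT Employee_No FROM Employee_Table "
    "WHERE Dept_No IN (100, 200) ORDER BY Employee_No").fetchall()
repeated = conn.execute(
    "SELECT Employee_No FROM Employee_Table "
    "WHERE Dept_No IN (100, 100, 200, 200) ORDER BY Employee_No").fetchall()

print(plain == repeated, plain)
```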


The Subquery

Employee_Table

Employee_No  Dept_No  Last_Name   First_Name  Salary
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Department_Table

Dept_No  Department_Name
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources

There is a Top Query and a Bottom Query!

SELECT *
FROM Employee_Table
WHERE Dept_No IN (
   SELECT Dept_No
   FROM Department_Table) ;

Which Query Runs First?

The query above is a Subquery, which means there are multiple queries in the same SQL. The bottom query runs first, and its purpose in life is to build a distinct list of values that it passes to the top query. The top query then returns the result set. This query solves the problem: Show all Employees in Valid Departments!


The Three Steps of How a Basic Subquery Works

SELECT *
FROM Employee_Table
WHERE Dept_No IN (
   SELECT Dept_No
   FROM Department_Table) ;

1) The bottom query runs first! It returns the Dept_No values 100, 200, 300, 400 and 500 from the Department_Table.

2) The result is passed to the top query!

3) The top query runs using the bottom query answer set:

SELECT *
FROM Employee_Table
WHERE Dept_No IN (100, 200, 300, 400, 500) ;

The bottom query runs first and builds a distinct IN list. Then the top query runs using the list.


These are Equivalent Queries

Query 1:

SELECT *
FROM Employee_Table
WHERE Dept_No IN (
   SELECT Dept_No
   FROM Department_Table) ;

Query 2:

SELECT *
FROM Employee_Table
WHERE Dept_No IN (100, 200, 300, 400, 500) ;

Both queries above are the same. Query 2 has values in an IN list. Query 1 runs a subquery to build the values in the IN list.
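The equivalence can be checked by running the bottom query by hand, pasting its answer into an IN list, and comparing that with the one-statement subquery. The sketch below uses Python's sqlite3 module as a stand-in engine (an assumption); it mirrors the book's mental model of the steps, not how any particular optimizer actually executes the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee_Table (Employee_No INTEGER, Dept_No INTEGER, Last_Name TEXT);
    INSERT INTO Employee_Table VALUES
        (2000000, NULL, 'Jones'),    (1000234, 10, 'Smythe'),
        (1232578, 100, 'Chambers'),  (1324657, 200, 'Coffing'),
        (1333454, 200, 'Smith'),     (2312225, 300, 'Larkins'),
        (1121334, 400, 'Strickling'), (2341218, 400, 'Reilly'),
        (1256349, 400, 'Harrison');
    CREATE TABLE Department_Table (Dept_No INTEGER, Department_Name TEXT);
    INSERT INTO Department_Table VALUES
        (100, 'Marketing'), (200, 'Research and Dev'), (300, 'Sales'),
        (400, 'Customer Support'), (500, 'Human Resources');
""")

# Step 1: run the bottom query on its own to build the distinct list.
in_list = [row[0] for row in conn.execute(
    "SELECT DISTINCT Dept_No FROM Department_Table ORDER BY Dept_No")]

# Steps 2 and 3: pass the list to the top query as a literal IN list.
placeholders = ", ".join("?" * len(in_list))
with_in_list = conn.execute(
    "SELECT Employee_No FROM Employee_Table "
    f"WHERE Dept_No IN ({placeholders}) ORDER BY Employee_No", in_list).fetchall()

# The single-statement subquery form.
with_subquery = conn.execute(
    "SELECT Employee_No FROM Employee_Table "
    "WHERE Dept_No IN (SELECT Dept_No FROM Department_Table) "
    "ORDER BY Employee_No").fetchall()

print(in_list, with_in_list == with_subquery, len(with_subquery))
```

Jones (no department) and Smythe (Dept_No 10) drop out of both answers, which leaves seven employees.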


The Final Answer Set from the Subquery

SELECT *
FROM Employee_Table
WHERE Dept_No IN (
   SELECT Dept_No
   FROM Department_Table) ;

Employee_No  Dept_No  Last_Name   First_Name  Salary
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1256349      400      Harrison    Herbert     54500.00
2341218      400      Reilly      William     36000.00
1121334      400      Strickling  Cletus      54500.00

Remember that columns from the subquery never appear in the final answer set.

Notice that no employees are in Dept 500.


Quiz- Answer the Difficult Question

How are Subqueries similar to Joins between two tables?

A great question was asked above. Do you know the key to answering? Turn the page!


Answer to Quiz- Answer the Difficult Question

How are Subqueries similar to Joins between two tables?

A Subquery between two tables or a Join between two tables will each need a common key that represents the relationship. This is called a Primary Key/Foreign Key relationship. Here, the common key is Dept_No, which appears in both the Employee_Table and the Department_Table.

A Subquery will use a common key linking the two tables together, very similar to a join! When subquerying between two tables, look for the common link between the two tables. Most of the time they both have a column with the same name, but not always.


Should you use a Subquery or a Join?

When do I Subquery?

SELECT *
FROM Employee_Table
WHERE Dept_No IN (
   SELECT Dept_No
   FROM Department_Table) ;

When do I perform a Join?

SELECT E.*, Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

If the final result set needs columns from only one table, use a Subquery. If the final result set needs columns from both tables, you have to do a Join.


Quiz- Write the Subquery

Customer_Table

Customer_Number  Customer_Name
11111111         Billy's Best Choice
31313131         Acme Products
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Order_Table

Order_Number  Customer_Number  Order_Total
123456        11111111         12347.53
123512        11111111         8005.91
123552        31323134         5111.47
123585        87323456         15231.62
123777        57896883         23454.84

Write the Subquery

Select all columns in the Customer_Table if the customer has placed an order!

Here is your opportunity to show how smart you are. Write a Subquery that will bring back everything from the Customer_Table if the customer has placed an order in the Order_Table. Good luck! Advice: Look for the common key among both tables!


Answer to Quiz- Write the Subquery

SELECT *
FROM Customer_Table
WHERE Customer_Number IN
   (SELECT Customer_Number
    FROM Order_Table) ;

Customer_Number  Customer_Name        Phone_Number
11111111         Billy's Best Choice  555-1234
31323134         ACE Consulting       555-1212
57896883         XYZ Plumbing         347-8954
87323456         Databases N-U        322-1012

The common key among both tables is Customer_Number. The bottom query runs first and delivers a distinct list of Customer_Numbers which the top query uses in the IN List!


Quiz- Write the More Difficult Subquery

Write the Subquery

Select all columns in the Customer_Table if the customer has placed an order over $10,000.00!

Here is your opportunity to show how smart you are. Using the same Customer_Table and Order_Table, write a Subquery that will bring back everything from the Customer_Table if the customer has placed an order in the Order_Table that is greater than $10,000.00.


Answer to Quiz- Write the More Difficult Subquery

Here is your answer!

SELECT *
FROM Customer_Table
WHERE Customer_Number IN (
   SELECT Customer_Number
   FROM Order_Table
   WHERE Order_Total > 10000.00) ;

Customer_Number  Customer_Name        Phone_Number
11111111         Billy's Best Choice  555-1234
57896883         XYZ Plumbing         347-8954
87323456         Databases N-U        322-1012
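To check the answer set against the quiz data, the query can be run on the rows shown in the Customer_Table and Order_Table slides. This sketch uses Python's sqlite3 module as a stand-in engine (an assumption) and leaves out the Phone_Number column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer_Table (Customer_Number INTEGER, Customer_Name TEXT);
    INSERT INTO Customer_Table VALUES
        (11111111, 'Billy''s Best Choice'), (31313131, 'Acme Products'),
        (31323134, 'ACE Consulting'), (57896883, 'XYZ Plumbing'),
        (87323456, 'Databases N-U');
    CREATE TABLE Order_Table
        (Order_Number INTEGER, Customer_Number INTEGER, Order_Total REAL);
    INSERT INTO Order_Table VALUES
        (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
        (123552, 31323134, 5111.47),  (123585, 87323456, 15231.62),
        (123777, 57896883, 23454.84);
""")

# Customers with at least one order over 10000.00.
big_spenders = [row[0] for row in conn.execute("""
    SELECT Customer_Number
    FROM Customer_Table
    WHERE Customer_Number IN (
        SELECT Customer_Number
        FROM Order_Table
        WHERE Order_Total > 10000.00)
    ORDER BY Customer_Number
""")]

print(big_spenders)
```

ACE Consulting has an order, but only for 5111.47, so it drops out once the bottom query filters on Order_Total.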


Quiz- Write the Subquery with an Aggregate

Write the Subquery

Select all columns in the Employee_Table if the employee makes a greater Salary than the AVERAGE Salary.

Another opportunity knocking! Would someone please answer the query door?


Answer to Quiz- Write the Subquery with an Aggregate

SELECT *
FROM Employee_Table
WHERE Salary > (
   SELECT AVG(Salary)
   FROM Employee_Table) ;

Employee_No  Dept_No  Last_Name   First_Name  Salary
1000234      10       Smythe      Richard     64300.00
1121334      400      Strickling  Cletus      54500.00
1232578      100      Chambers    Mandee      48850.00
1333454      200      Smith       John        48000.00
1256349      400      Harrison    Herbert     54500.00
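The answer set can be reproduced from the nine Employee_Table rows. This sketch uses Python's sqlite3 module as a stand-in engine (an assumption) and keeps only the columns the query needs. Here the bottom query returns a single value, the overall average salary, and the top query compares every row against it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee_Table (Employee_No INTEGER, Dept_No INTEGER, Salary REAL);
    INSERT INTO Employee_Table VALUES
        (2000000, NULL, 32800.50), (1000234, 10, 64300.00), (1232578, 100, 48850.00),
        (1324657, 200, 41888.88),  (1333454, 200, 48000.00), (2312225, 300, 40200.00),
        (1121334, 400, 54500.00),  (2341218, 400, 36000.00), (1256349, 400, 54500.00);
""")

# The bottom query on its own: one row, one column.
overall_avg = conn.execute("SELECT AVG(Salary) FROM Employee_Table").fetchone()[0]

# The full query: employees paid more than the overall average.
above_avg = sorted(row[0] for row in conn.execute("""
    SELECT Employee_No
    FROM Employee_Table
    WHERE Salary > (SELECT AVG(Salary) FROM Employee_Table)
"""))

print(round(overall_avg, 2), above_avg)
```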


Quiz- Write the Correlated Subquery

Write the Correlated Subquery

Select all columns in the Employee_Table if the employee makes a greater Salary than the AVERAGE Salary (within their own Department).

Another opportunity knocking! This is a tough one, and only the best get this written correctly.


Answer to Quiz- Write the Correlated Subquery

SELECT *
FROM Employee_Table as EE
WHERE Salary > (
   SELECT AVG(Salary)
   FROM Employee_Table as EEEE
   WHERE EEEE.Dept_No = EE.Dept_No) ;

The WHERE clause in the bottom query co-relates (correlates) the top query to the bottom query.

Employee_No  Dept_No  Last_Name   First_Name  Salary
1121334      400      Strickling  Cletus      54500.00
1333454      200      Smith       John        48000.00
1256349      400      Harrison    Herbert     54500.00


The Basics of a Correlated Subquery

The Top Query is Co-Related (Correlated) with the Bottom Query. The table name from the top query and the table name from the bottom query are given different aliases.

The bottom query WHERE clause co-relates Dept_No from Top and Bottom. The top query is run first. The bottom query is run one time for each distinct value delivered from the top query.

SELECT *
FROM Employee_Table as EE
WHERE Salary > (
   SELECT AVG(Salary)
   FROM Employee_Table as EEEE
   WHERE EE.Dept_No = EEEE.Dept_No) ;

A correlated subquery breaks all the rules. It is the top query that runs first. Then, the bottom query is run one time for each distinct value of the column referenced in the bottom WHERE clause. In our example, that column is Dept_No. After the top query runs and brings back its rows, the bottom query will run one time for each distinct Dept_No. If this is confusing, it is not you. These take a little time to understand, but I have a plan to make you an expert. Keep reading!
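The "top first, then bottom once per distinct Dept_No" description can be acted out by hand and compared with the real correlated subquery. The sketch below does this with Python's sqlite3 module as a stand-in engine (an assumption). The manual loop follows the book's mental model only; an optimizer is free to get the same answer another way. The names top_rows, avg_by_dept and manual are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee_Table (Employee_No INTEGER, Dept_No INTEGER, Salary REAL);
    INSERT INTO Employee_Table VALUES
        (2000000, NULL, 32800.50), (1000234, 10, 64300.00), (1232578, 100, 48850.00),
        (1324657, 200, 41888.88),  (1333454, 200, 48000.00), (2312225, 300, 40200.00),
        (1121334, 400, 54500.00),  (2341218, 400, 36000.00), (1256349, 400, 54500.00);
""")

# Top query first: fetch every employee row.
top_rows = conn.execute(
    "SELECT Employee_No, Dept_No, Salary FROM Employee_Table").fetchall()

# Bottom query once per distinct, non-NULL Dept_No delivered by the top query.
distinct_depts = sorted({dept for _, dept, _ in top_rows if dept is not None})
avg_by_dept = {
    dept: conn.execute(
        "SELECT AVG(Salary) FROM Employee_Table WHERE Dept_No = ?", (dept,)
    ).fetchone()[0]
    for dept in distinct_depts
}

# Keep the rows whose Salary beats their own department's average.
manual = sorted(emp for emp, dept, sal in top_rows
                if dept in avg_by_dept and sal > avg_by_dept[dept])

# The correlated subquery in a single statement.
correlated = sorted(row[0] for row in conn.execute("""
    SELECT Employee_No
    FROM Employee_Table AS EE
    WHERE Salary > (
        SELECT AVG(Salary)
        FROM Employee_Table AS EEEE
        WHERE EE.Dept_No = EEEE.Dept_No)
"""))

print(distinct_depts, manual == correlated, correlated)
```

Employees who are alone in their department (Dept_No 10, 100 and 300) earn exactly their department average, so the greater-than test excludes them.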


The Top Query always runs first in a Correlated Subquery

SELECT *
FROM Employee_Table as EE
WHERE Salary > (
   SELECT AVG(Salary)
   FROM Employee_Table as EEEE
   WHERE EE.Dept_No = EEEE.Dept_No)

1) The top query runs first:

SELECT * FROM Employee_Table as EE

Employee_No  Dept_No  Last_Name   First_Name  Salary
2000000      ?        Jones       Squiggy     32800.50    (Null is skipped)
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

2) The bottom query runs 1 time for each distinct Dept_No, matching on EE.Dept_No = EEEE.Dept_No:

Dept_No  AVGSAL
10       64300.00
100      48850.00
200      44944.44
300      40200.00
400      48333.33

3) Only these three employees make more than the AVG salary within their own department:

Employee_No  Dept_No  Last_Name   First_Name  Salary
1333454      200      Smith       John        48000.00
1256349      400      Harrison    Herbert     54500.00
1121334      400      Strickling  Cletus      54500.00


Correlated Subquery Example vs. a Join with a Derived Table

Correlated Subquery:

SELECT Last_Name, Dept_No, Salary
FROM Employee_Table as EE
WHERE Salary > (
   SELECT AVG(Salary)
   FROM Employee_Table as EEEE
   WHERE EE.Dept_No = EEEE.Dept_No) ;

Last_Name   Dept_No  Salary
Smith       200      48000.00
Harrison    400      54500.00
Strickling  400      54500.00

Join with a Derived Table:

SELECT E.*, AVGSAL
FROM Employee_Table as E
INNER JOIN
   (SELECT Dept_No, AVG(Salary)
    FROM Employee_Table
    GROUP BY Dept_No) as TeraTom (Depty, AVGSAL)
ON Dept_No = Depty
AND Salary > AVGSAL ;

(Only the Last_Name, Dept_No, Salary and AVGSAL columns are shown.)

Last_Name   Dept_No  Salary    AVGSAL
Smith       200      48000.00  44944.44
Harrison    400      54500.00  48333.33
Strickling  400      54500.00  48333.33

Both queries above will bring back all employees making a salary that is greater than the average salary in their department. The biggest difference is that the Join with the Derived Table also shows the Average Salary in the result set.
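The two forms can be run side by side to confirm that they pick the same employees. This sketch uses Python's sqlite3 module as a stand-in engine (an assumption). SQLite does not accept a column list after a derived table's alias, so the columns are aliased inside the derived SELECT instead, as in the earlier aliasing slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee_Table
        (Employee_No INTEGER, Dept_No INTEGER, Last_Name TEXT, Salary REAL);
    INSERT INTO Employee_Table VALUES
        (2000000, NULL, 'Jones', 32800.50),     (1000234, 10, 'Smythe', 64300.00),
        (1232578, 100, 'Chambers', 48850.00),   (1324657, 200, 'Coffing', 41888.88),
        (1333454, 200, 'Smith', 48000.00),      (2312225, 300, 'Larkins', 40200.00),
        (1121334, 400, 'Strickling', 54500.00), (2341218, 400, 'Reilly', 36000.00),
        (1256349, 400, 'Harrison', 54500.00);
""")

# Form 1: the correlated subquery.
correlated = conn.execute("""
    SELECT Last_Name, Dept_No, Salary
    FROM Employee_Table AS EE
    WHERE Salary > (
        SELECT AVG(Salary)
        FROM Employee_Table AS EEEE
        WHERE EE.Dept_No = EEEE.Dept_No)
    ORDER BY Last_Name
""").fetchall()

# Form 2: the join with a derived table, which can also return AVGSAL.
derived = conn.execute("""
    SELECT Last_Name, Dept_No, Salary, AVGSAL
    FROM Employee_Table AS E
    INNER JOIN
        (SELECT Dept_No AS Depty, AVG(Salary) AS AVGSAL
         FROM Employee_Table
         GROUP BY Dept_No) AS TeraTom
    ON E.Dept_No = Depty AND Salary > AVGSAL
    ORDER BY Last_Name
""").fetchall()

same_employees = [row[:3] for row in derived] == correlated
avgsal_by_name = {row[0]: round(row[3], 2) for row in derived}
print(same_employees, avgsal_by_name)
```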


Quiz- A Second Chance to Write a Correlated Subquery

Sales_Table (not all rows are displayed)

Product_ID  Sale_Date   Daily_Sales
1000        10/02/2000  32800.50
1000        09/30/2000  36000.07
1000        10/01/2000  40200.43
2000        10/04/2000  32800.50
2000        10/02/2000  36021.93
2000        09/28/2000  41888.88
3000        10/04/2000  15675.33
3000        10/02/2000  19678.94
3000        10/03/2000  21553.79

Write the Correlated Subquery

Select all columns in the Sales_Table if the Daily_Sales column is greater than the Average Daily_Sales within its own Product_ID.

Another opportunity knocking! This is your second chance. I will even give you a third chance.


Answer - A Second Chance to Write a Correlated Subquery

Select all columns in the Sales_Table if the Daily_Sales column is greater than the Average Daily_Sales within its own Product_ID.

SELECT *
FROM Sales_Table as TopS
WHERE Daily_Sales > (
   SELECT AVG(Daily_Sales)
   FROM Sales_Table as BotS
   WHERE TopS.Product_ID = BotS.Product_ID)
ORDER BY Product_ID, Sale_Date ;

Answer Set

Product_ID  Sale_Date   Daily_Sales
1000        09/28/2000  48850.40
1000        09/29/2000  54500.22
1000        10/03/2000  64300.00
1000        10/04/2000  54553.10
2000        09/29/2000  48000.00
2000        09/30/2000  49850.03
2000        10/01/2000  54850.29
3000        09/28/2000  61301.77
3000        09/29/2000  34509.13
3000        09/30/2000  43868.86


Quiz- A Third Chance to Write a Correlated Subquery

Write the Correlated Subquery

Using the same Sales_Table, select all columns in the Sales_Table if the Daily_Sales column is greater than the Average Daily_Sales within its own Sale_Date.

Another opportunity knocking! There is just one minor adjustment and you are home free.

Page 487

Answer - A Third Chance to Write a Correlated Subquery

Select all columns in the Sales_Table if the Daily_Sales column is greater than the Average Daily_Sales within its own Sale_Date.

SELECT *
FROM Sales_Table as TopS
WHERE Daily_Sales > (
    SELECT AVG(Daily_Sales)
    FROM Sales_Table as BotS
    WHERE TopS.Sale_Date = BotS.Sale_Date)
ORDER BY Sale_Date ;

Answer Set

Product_ID  Sale_Date   Daily_Sales
__________  __________  ___________
3000        09/28/2000  61301.77
2000        09/29/2000  48000.00
1000        09/29/2000  54500.22
3000        09/30/2000  43868.86
2000        09/30/2000  49850.03
2000        10/01/2000  54850.29
2000        10/02/2000  36021.93
1000        10/02/2000  32800.50
2000        10/03/2000  43200.18
1000        10/03/2000  64300.00
1000        10/04/2000  54553.10

Quiz - Last Chance to Write a Correlated Subquery

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
231222      Wilson      Susie       SO          3.80
280023      McRoberts   Richard     JR          1.90
322133      Bond        Jimmy       JR          3.95
125634      Hanson      Henry       FR          2.88
333450      Smith       Andy        SO          2.00
324652      Delaney     Danny       SR          3.35
260000      Johnson     Stanley     ?           ?
234121      Thomas      Wendy       FR          4.00
123250      Phillips    Martin      SR          3.00

Write the Correlated Subquery

Select all columns in the Student_Table if the Grade_Pt column is greater than the Average Grade_Pt within its own Class_Code.

Another opportunity knocking! There is just one minor adjustment and you are home free.

Answer - Last Chance to Write a Correlated Subquery

Select all columns in the Student_Table if the Grade_Pt column is greater than the Average Grade_Pt within its own Class_Code.

SELECT *
FROM Student_Table as TopS
WHERE Grade_Pt > (
    SELECT AVG(Grade_Pt)
    FROM Student_Table as BotS
    WHERE TopS.Class_Code = BotS.Class_Code)
ORDER BY Class_Code ;

Answer Set

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
234121      Thomas      Wendy       FR          4.00
125634      Hanson      Henry       FR          2.88
322133      Bond        Jimmy       JR          3.95
231222      Wilson      Susie       SO          3.80
324652      Delaney     Danny       SR          3.35

Quiz - Write the NOT Subquery

Customer_Table

Customer_Number  Customer_Name
_______________  ___________________
11111111         Billy’s Best Choice
31313131         Acme Products
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Order_Table

Order_Number  Customer_Number  Order_Total
____________  _______________  ___________
123456        11111111         12347.53
123512        11111111         8005.91
123552        31323134         5111.47
123585        87323456         15231.62
123777        57896883         23454.84

Write the Subquery

Select all columns in the Customer_Table if the Customer has NOT placed an order.

Another opportunity knocking! Write the above query!

Answer to Quiz - Write the NOT Subquery

Nexus Chameleon query window (System: Matrix, Database: SQL Class)

SELECT *
FROM Customer_Table
WHERE Customer_Number NOT IN
    (SELECT Customer_Number
     FROM Order_Table
     WHERE Customer_Number IS NOT NULL) ;

Use this technique (the IS NOT NULL test in the bottom query) to get rid of Nulls.

Result 1

Customer_Number  Customer_Name  Phone_Number
_______________  _____________  ____________
31313131         Acme Products  555-1111

Quiz - Write the Subquery using a WHERE Clause

Write the Subquery

Select all columns in the Order_Table that were placed by a customer with ‘Bill’ anywhere in their name.

Another opportunity to show your brilliance is ready for you. Make it happen!

Answer - Write the Subquery using a WHERE Clause

Nexus Chameleon query window (System: Matrix, Database: SQL Class)

SELECT *
FROM Order_Table
WHERE Customer_Number IN
    (SELECT Customer_Number
     FROM Customer_Table
     WHERE Customer_Name ilike '%Bill%') ;

Result 1

Order_Number  Customer_Number  Order_Date  Order_Total
____________  _______________  __________  ___________
123456        11111111         05/04/1998  12347.53
123512        11111111         01/01/1999  8005.91

Great job on writing your query just like the one above!

Quiz - Write the Subquery with Two Parameters

Write the Subquery

What is the highest dollar order for each Customer? This Subquery will involve two parameters!

Get ready to be amazed at either yourself or the Answer on the next page!

Answer to Quiz - Write the Subquery with Two Parameters

Write the Subquery

What is the highest dollar order for each Customer? This Subquery will involve two parameters!

SELECT Customer_Number, Order_Number, Order_Total
FROM Order_Table
WHERE (Customer_Number, Order_Total) IN
    (SELECT Customer_Number, MAX(Order_Total)
     FROM Order_Table
     GROUP BY 1) ;

Notice two parameters in the top query and two in the bottom.

This is how you utilize multiple parameters in a Subquery! Turn the page for more.

How the Double Parameter Subquery Works

SELECT Customer_Number, Order_Number, Order_Total
FROM Order_Table
WHERE (Customer_Number, Order_Total) IN
    (SELECT Customer_Number, MAX(Order_Total)
     FROM Order_Table
     GROUP BY 1) ;

The bottom query runs first, returning two columns:

Customer_Number  Max(Order_Total)
_______________  ________________
11111111         12347.53
31323134         5111.47
87323456         15231.62
57896883         23454.84

These 4 rows are sent to the top query. Turn to the next page for more info!

More on How the Double Parameter Subquery Works

SELECT Customer_Number, Order_Number, Order_Total
FROM Order_Table
WHERE (Customer_Number, Order_Total) IN
    ( (11111111, 12347.53)
     ,(31323134,  5111.47)
     ,(87323456, 15231.62)
     ,(57896883, 23454.84) ) ;

The top query now uses the in-list: each pair holds a Customer_Number and that customer's highest Order_Total. The IN list is built and the top query can now process for the final Answer Set.
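The two steps described here can be tried out with a small sketch. It is written in Python with the built-in sqlite3 module standing in for Matrix (an assumption for illustration only; SQLite accepts the two-column IN from version 3.15 onward). The first query runs the bottom query by itself so you can see the pairs, and the second runs the full statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Order_Table "
    "(Order_Number INTEGER, Customer_Number INTEGER, Order_Total REAL)"
)
conn.executemany("INSERT INTO Order_Table VALUES (?, ?, ?)", [
    (123456, 11111111, 12347.53),
    (123512, 11111111, 8005.91),
    (123552, 31323134, 5111.47),
    (123585, 87323456, 15231.62),
    (123777, 57896883, 23454.84),
])

# Step 1: the bottom query alone returns one (customer, highest total) pair
# per customer.
pairs = conn.execute("""
    SELECT Customer_Number, MAX(Order_Total)
    FROM Order_Table
    GROUP BY Customer_Number
    ORDER BY Customer_Number
""").fetchall()

# Step 2: the top query keeps only the orders whose (customer, total) pair
# appears in that list.
top_orders = conn.execute("""
    SELECT Customer_Number, Order_Number, Order_Total
    FROM Order_Table
    WHERE (Customer_Number, Order_Total) IN
          (SELECT Customer_Number, MAX(Order_Total)
           FROM Order_Table
           GROUP BY Customer_Number)
    ORDER BY Customer_Number
""").fetchall()

print(pairs)
print(top_orders)
```

Customer 11111111 has two orders, but only order 123456 survives because only its total matches the pair built by the bottom query.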

Quiz - Write the Triple Subquery

Write the Subquery

What is the Customer_Name who has the highest dollar order among all customers? This query will have multiple Subqueries!

Good luck in writing this. Remember that this will involve multiple Subqueries.

Answer to Quiz - Write the Triple Subquery

Write the Subquery

What is the Customer_Name who has the highest dollar order among all customers? This query will have multiple Subqueries!

SELECT Customer_Name
FROM Customer_Table
WHERE Customer_Number IN
    (SELECT Customer_Number
     FROM Order_Table
     WHERE Order_Total IN
         (SELECT Max(Order_Total)
          FROM Order_Table)) ;

1. The innermost query runs first and returns 23454.84.
2. The middle query runs second and returns Customer_Number 57896883.
3. The top query runs third and returns XYZ Plumbing.

The query is above and, of course, the answer is XYZ Plumbing.

Quiz - How many rows return on a NOT IN with a NULL?

Order_Table

Order_Number  Customer_Number  Order_Total
____________  _______________  ___________
123456        11111111         12347.53
123512        11111111         8005.91
123552        31323134         5111.47
123585        87323456         15231.62
123777        57896883         23454.84
000099        NULL             9999.99

We added a Null Value to the Order_Table.

SELECT Customer_Name
FROM Customer_Table
WHERE Customer_Number NOT IN
    (SELECT Customer_Number
     FROM Order_Table) ;

How many rows return from the query now that a NULL value is in a Customer_Number?

We really didn’t place a new row inside the Order_Table with a NULL value for the Customer_Number column, but in theory, if we had, how many rows would return?

Answer - How many rows return on a NOT IN with a NULL?

SELECT Customer_Name
FROM Customer_Table
WHERE Customer_Number NOT IN
    (SELECT Customer_Number
     FROM Order_Table) ;

How many rows return from the query now that a NULL value is in a Customer_Number? ZERO rows will return.

The answer is no rows come back. When the NOT IN list contains a NULL, comparing any Customer_Number with that NULL is unknown rather than True or False. Because the system doesn’t know the value of NULL, NOT IN can never evaluate to True, so it returns nothing.

How to handle a NOT IN with Potential NULL Values

SELECT Customer_Name
FROM Customer_Table
WHERE Customer_Number NOT IN
    (SELECT Customer_Number
     FROM Order_Table
     WHERE Customer_Number IS NOT NULL) ;

How many rows return NOW from the query? 1 (Acme Products)

You can utilize a WHERE clause that tests to make sure Customer_Number IS NOT NULL. This should be used when a NOT IN could encounter a NULL.
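Here is a minimal sketch of both behaviors side by side: the NOT IN that returns nothing, and the same NOT IN once the NULL is filtered out. It uses Python's built-in sqlite3 module as a stand-in for Matrix, which is an assumption made only so the example can run anywhere. The extra order row with the NULL Customer_Number is the hypothetical one from the quiz.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer_Table (Customer_Number INTEGER, Customer_Name TEXT)")
conn.execute(
    "CREATE TABLE Order_Table "
    "(Order_Number INTEGER, Customer_Number INTEGER, Order_Total REAL)"
)
conn.executemany("INSERT INTO Customer_Table VALUES (?, ?)", [
    (11111111, "Billy's Best Choice"),
    (31313131, "Acme Products"),
    (31323134, "ACE Consulting"),
    (57896883, "XYZ Plumbing"),
    (87323456, "Databases N-U"),
])
conn.executemany("INSERT INTO Order_Table VALUES (?, ?, ?)", [
    (123456, 11111111, 12347.53),
    (123512, 11111111, 8005.91),
    (123552, 31323134, 5111.47),
    (123585, 87323456, 15231.62),
    (123777, 57896883, 23454.84),
    (99, None, 9999.99),  # the added row: Customer_Number is NULL
])

# The NULL in the list makes NOT IN unknown for every non-matching customer.
unfiltered = conn.execute("""
    SELECT Customer_Name
    FROM Customer_Table
    WHERE Customer_Number NOT IN (SELECT Customer_Number FROM Order_Table)
""").fetchall()

# Filtering the NULL out of the bottom query restores the expected answer.
filtered = conn.execute("""
    SELECT Customer_Name
    FROM Customer_Table
    WHERE Customer_Number NOT IN (SELECT Customer_Number
                                  FROM Order_Table
                                  WHERE Customer_Number IS NOT NULL)
""").fetchall()

print(unfiltered)  # no rows
print(filtered)    # only Acme Products
```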

Using a Correlated Exists

Use EXISTS to find which Customers have placed an Order.

SELECT Customer_Number, Customer_Name
FROM Customer_Table as Top1
WHERE EXISTS
    (SELECT *
     FROM Order_Table as Bot1
     Where Top1.Customer_Number = Bot1.Customer_Number ) ;

The EXISTS command is a Boolean test: it is True if the bottom query returns at least one row and False if it returns none. If a customer placed an order, a matching row EXISTS in the Order_Table, so with the Correlated EXISTS statement only customers who have placed an order will return in the answer set.

How a Correlated Exists matches up

SELECT Customer_Number, Customer_Name
FROM Customer_Table as Top1
WHERE EXISTS
    (SELECT *
     FROM Order_Table as Bot1
     Where Top1.Customer_Number = Bot1.Customer_Number ) ;

Customer_Number  Customer_Name
_______________  ___________________
11111111         Billy’s Best Choice
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Acme Products (31313131) does not exist in the Order_Table. Only customers who placed an order return with the above Correlated EXISTS.

The Correlated NOT Exists

Use NOT EXISTS to find which Customers have NOT placed an Order.

SELECT Customer_Number, Customer_Name
FROM Customer_Table as Top1
WHERE NOT EXISTS
    (SELECT *
     FROM Order_Table as Bot1
     Where Top1.Customer_Number = Bot1.Customer_Number ) ;

The EXISTS command will determine via a Boolean if something is True or False, and NOT EXISTS reverses the test: it is True only when the bottom query returns no rows. If a customer has not placed an order, no matching row exists in the Order_Table, so with the Correlated NOT EXISTS statement only customers who have NOT placed an order will return in the answer set. EXISTS is different than IN as it is less restrictive, as you will soon understand.

The Correlated NOT Exists Answer Set

Use NOT EXISTS to find which Customers have NOT placed an Order.

SELECT Customer_Number, Customer_Name
FROM Customer_Table as Top1
WHERE NOT EXISTS
    (SELECT *
     FROM Order_Table as Bot1
     Where Top1.Customer_Number = Bot1.Customer_Number ) ;

Customer_Number  Customer_Name
_______________  _____________
31313131         Acme Products

The only customer who did NOT place an order was Acme Products.

Quiz - How many rows come back from this NOT Exists?

Order_Table

Order_Number  Customer_Number  Order_Total
____________  _______________  ___________
123456        11111111         12347.53
123512        11111111         8005.91
123552        31323134         5111.47
123585        87323456         15231.62
123777        57896883         23454.84
000099        NULL             9999.99

We added a Null Value to the Order_Table.

SELECT Customer_Number, Customer_Name
FROM Customer_Table as Top1
WHERE NOT EXISTS
    (SELECT *
     FROM Order_Table as Bot1
     Where Top1.Customer_Number = Bot1.Customer_Number ) ;

How many rows return from the query?

A NULL value in a list for queries with NOT IN returned nothing, but you must now decide if that is also true for the NOT EXISTS. How many rows will return?

Answer - How many rows come back from this NOT Exists?

SELECT Customer_Number, Customer_Name
FROM Customer_Table as Top1
WHERE NOT EXISTS
    (SELECT *
     FROM Order_Table as Bot1
     Where Top1.Customer_Number = Bot1.Customer_Number ) ;

How many rows return from the query? One row (Acme Products).

NOT EXISTS is unaffected by a NULL in the Order_Table, and that’s why it is more flexible! The row with the NULL Customer_Number never matches any customer, so it is simply never counted as an existing order.
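To check this for yourself, here is a minimal sketch that runs the Correlated EXISTS and the Correlated NOT EXISTS against the Order_Table with the NULL row added. As before, Python's built-in sqlite3 module is only a stand-in for Matrix, an assumption made so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer_Table (Customer_Number INTEGER, Customer_Name TEXT)")
conn.execute(
    "CREATE TABLE Order_Table "
    "(Order_Number INTEGER, Customer_Number INTEGER, Order_Total REAL)"
)
conn.executemany("INSERT INTO Customer_Table VALUES (?, ?)", [
    (11111111, "Billy's Best Choice"),
    (31313131, "Acme Products"),
    (31323134, "ACE Consulting"),
    (57896883, "XYZ Plumbing"),
    (87323456, "Databases N-U"),
])
conn.executemany("INSERT INTO Order_Table VALUES (?, ?, ?)", [
    (123456, 11111111, 12347.53),
    (123512, 11111111, 8005.91),
    (123552, 31323134, 5111.47),
    (123585, 87323456, 15231.62),
    (123777, 57896883, 23454.84),
    (99, None, 9999.99),  # the added row: Customer_Number is NULL
])

# The same correlated bottom query is used for both tests.
correlated = """
    SELECT Customer_Number, Customer_Name
    FROM Customer_Table AS Top1
    WHERE {test} (SELECT *
                  FROM Order_Table AS Bot1
                  WHERE Top1.Customer_Number = Bot1.Customer_Number)
    ORDER BY Customer_Number
"""

placed_order = conn.execute(correlated.format(test="EXISTS")).fetchall()
no_order = conn.execute(correlated.format(test="NOT EXISTS")).fetchall()

print(placed_order)  # four customers
print(no_order)      # only Acme Products, despite the NULL row
```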

Chapter 16

Substrings and Positioning Functions

Chapter 16 – Substrings and Positioning Functions

“It’s always been and always will be the same in the world: the horse does the work and the coachman is tipped.” - Anonymous

The TRIM Command trims both Leading and Trailing Spaces

Query 1

SELECT Last_Name
      ,Trim(Last_Name) AS No_Spaces
FROM Employee_Table ;

Query 2

SELECT Last_Name
      ,Trim(Both from Last_Name) AS No_Spaces
FROM Employee_Table ;

Both queries above do the exact same thing. They remove both the leading and the trailing spaces from the column Last_Name.

A Visual of the TRIM Command Using Concatenation

Concatenation without Trim and with Trim

SELECT Last_Name
      ,First_Name
      ,Last_Name || First_Name as NameBackwards
      ,TRIM(Last_Name) || First_Name as TrimNameBackwards
FROM Employee_Table ;

Last_Name   First_Name  NameBackwards                TrimNameBackwards
__________  __________  ___________________________  _________________
Jones       Squiggy     Jones               Squiggy  JonesSquiggy
Smith       John        Smith               John     SmithJohn
Smythe      Richard     Smythe              Richard  SmytheRichard
Harrison    Herbert     Harrison            Herbert  HarrisonHerbert
Chambers    Mandee      Chambers            Mandee   ChambersMandee
Strickling  Cletus      Strickling          Cletus   StricklingCletus
Reilly      William     Reilly              William  ReillyWilliam
Coffing     Billy       Coffing             Billy    CoffingBilly
Larkins     Loraine     Larkins             Loraine  LarkinsLoraine

Last_Name is a CHAR(20) column, so without TRIM its trailing spaces appear in front of the First_Name. When you use the TRIM command on a column, that column will have all beginning and ending spaces removed.

Trim and Trailing is Case Sensitive

SELECT First_Name
      ,Trim(trailing 'Y' from First_Name) AS No_Y
      ,Trim(trailing 'y' from First_Name) AS Success
FROM Employee_Table
ORDER BY 1 ;

First_Name is a VARCHAR column. The No_Y column uses a capital 'Y' and the Success column uses a lower case 'y'.

First_Name  No_Y      Success
__________  ________  __________
Billy       Billy     Bill
Cletus      Cletus    Cletus
Herbert     Herbert   Herbert
John        John      John
Loraine     Loraine   Loraine
Mandee      Mandee    Mandee
Richard     Richard   Richard
Squiggy     Squiggy   Squigg
William     William   William

For LEADING and TRAILING TRIM commands, the character you trim is case sensitive: the capital 'Y' matched nothing, while the lower case 'y' succeeded.

How to TRIM Trailing Letters

SELECT First_Name
      ,Trim(trailing 'y' from First_Name) AS No_Y
      ,Last_Name
      ,Trim(trailing 'g' from (TRIM (Last_Name))) AS No_G
FROM Employee_Table ;

First_Name is a VARCHAR column and Last_Name is a CHAR(20) column.

First_Name  No_Y      Last_Name   No_G
__________  ________  __________  __________
Squiggy     Squigg    Jones       Jones
John        John      Smith       Smith
Richard     Richard   Smythe      Smythe
Herbert     Herbert   Harrison    Harrison
Mandee      Mandee    Chambers    Chambers
Cletus      Cletus    Strickling  Stricklin
William     William   Reilly      Reilly
Billy       Bill      Coffing     Coffin
Loraine     Loraine   Larkins     Larkins

The above example removed the trailing ‘y’ from the First_Name and the trailing ‘g’ from the Last_Name. Remember that this is case sensitive.
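Why is Last_Name wrapped in an inner TRIM before the trailing 'g' is removed? The sketch below imitates the idea in plain Python rather than in Matrix itself, so treat it as an analogy: ljust(20) plays the part of the CHAR(20) padding, and rstrip plays the part of TRIM(TRAILING ...).

```python
# Imitate a CHAR(20) column: the value is padded with trailing spaces.
last_name = "Coffing".ljust(20)

# The last character is a space, not 'g', so nothing is removed.
without_inner_trim = last_name.rstrip("g")

# Remove the padding first (the inner TRIM), then the trailing 'g'.
with_inner_trim = last_name.strip(" ").rstrip("g")

# Case matters: 'Y' does not match the lower case 'y' that ends 'Billy'.
capital_y = "Billy".rstrip("Y")
lower_y = "Billy".rstrip("y")

print(repr(without_inner_trim))
print(repr(with_inner_trim))
print(capital_y, lower_y)
```

The padded value comes back unchanged, which is why the query trims the spaces first.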

The SUBSTRING Command

SELECT First_Name
      ,SUBSTRING (First_Name FROM 2 for 3) AS Quiz
FROM Employee_Table ;

Start in position 2 and go for 3 positions.

First_Name  Quiz
__________  ______
Squiggy     qui
John        ohn
Richard     ich
Herbert     erb
Mandee      and
Cletus      let
William     ill
Billy       ill
Loraine     ora

This is a SUBSTRING. The substring is passed two parameters, and they are the starting position of the string and the number of positions to return (from the starting position). The above example will start in position 2 and go for 3 positions!

How SUBSTRING Works with NO ENDING POSITION

SELECT First_Name
      ,SUBSTRING (First_Name FROM 2) AS GoToEnd
FROM Employee_Table ;

Start in Position 2.

First_Name  GoToEnd
__________  _________
Squiggy     quiggy
John        ohn
Richard     ichard
Herbert     erbert
Mandee      andee
Cletus      letus
William     illiam
Billy       illy
Loraine     oraine

If you don’t tell the Substring the end position, it will go all the way to the end.

Using SUBSTRING to move Backwards

SELECT First_Name
      ,SUBSTRING (First_Name FROM 0 For 6) AS Before1
FROM Employee_Table ;

Start in Position 0, one space before the beginning.

First_Name  Before1
__________  ________
Squiggy     Squig
John        John
Richard     Richa
Herbert     Herbe
Mandee      Mande
Cletus      Cletu
William     Willi
Billy       Billy
Loraine     Lorai

A starting position of zero moves one space in front of the beginning. Notice that our FOR Length is 6, so ‘Squiggy’ turns into ‘Squig’: position 0 counts toward the length but holds no character, so only five letters come back. The point being made here is that both the starting position and ending positions can move backwards, which will come in handy as you see other examples.

How SUBSTRING Works with a Starting Position of -1

SELECT First_Name
      ,SUBSTRING (First_Name FROM -1 For 3) AS Before2
FROM Employee_Table ;

Start in Position -1. This is two spaces before the beginning.

First_Name  Before2
__________  ________
Squiggy     S
John        J
Richard     R
Herbert     H
Mandee      M
Cletus      C
William     W
Billy       B
Loraine     L

A starting position of -1 moves two spaces in front of the beginning. Notice that our FOR Length is 3, so each name delivers only the first initial. The point being made here is that both the starting position and ending positions can move backwards, which will come in handy as you see other examples.

How SUBSTRING Works with an Ending Position of 0

SELECT First_Name
      ,SUBSTRING (First_Name FROM 3 For 0) AS WhatsUp
FROM Employee_Table ;

Go for 0 positions.

First_Name  WhatsUp
__________  ________
Squiggy
John
Richard
Herbert
Mandee
Cletus
William
Billy
Loraine

In our example above, we start in position 3, but we go for zero positions, so nothing is delivered in the column. That is what’s up!
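The position arithmetic in the last few SUBSTRING examples can be sketched in a few lines of Python. The helper sql_substring below is hypothetical and written only for illustration. It follows the usual SQL rule that positions are counted from 1 and that positions before the beginning hold no characters, and it is not a copy of how Matrix implements SUBSTRING.

```python
from typing import Optional


def sql_substring(text: str, start: int, length: Optional[int] = None) -> str:
    """Hypothetical helper imitating SUBSTRING(text FROM start FOR length)."""
    if length is not None and length < 0:
        raise ValueError("negative substring length not allowed")
    # First position after the requested window; with no FOR, run to the end.
    end = len(text) + 1 if length is None else start + length
    # Positions below 1 hold no characters, so clamp the window to the string.
    first = max(start, 1)
    return text[first - 1:max(end, first) - 1]


print(sql_substring("Squiggy", 2, 3))   # FROM 2 FOR 3
print(sql_substring("Squiggy", 2))      # FROM 2, no FOR
print(sql_substring("Squiggy", 0, 6))   # FROM 0 FOR 6
print(sql_substring("Squiggy", -1, 3))  # FROM -1 FOR 3
print(sql_substring("Squiggy", 3, 0))   # FROM 3 FOR 0
```

A window of FROM -1 FOR 3 covers positions -1, 0, and 1, and only position 1 holds a letter, which is why just the first initial returns.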

The POSITION Command finds a Letter's Position

SELECT Last_Name
      ,Position ('e' in Last_Name) AS Find_The_E
      ,Position ('f' in Last_Name) AS Find_The_F
FROM Employee_Table ;

Last_Name   Find_The_E  Find_The_F
__________  __________  __________
Jones       4           0
Smith       0           0
Smythe      6           0
Harrison    0           0
Chambers    6           0
Strickling  0           0
Reilly      2           0
Coffing     0           3
Larkins     0           0

In Jones, the e is in the 4th position. In Reilly, the e is in the 2nd position. In Coffing, the 1st f is in the 3rd position. A 0 means that no such letter is in the name.

This is the position counter. What it will do is tell you what position a letter is on. Why did Jones have a 4 in the result set? The ‘e’ was in the 4th position. Why did Smith get a zero for both columns? There is no ‘e’ in Smith and no ‘f’ in Smith. If there are two ‘f’s, only the first occurrence is reported.
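As a quick way to check the position counter by hand, here is a tiny Python sketch. The helper sql_position is hypothetical and only imitates the rule described here: count from 1, report the first occurrence, and return 0 when the letter is not found.

```python
def sql_position(needle: str, haystack: str) -> int:
    """Hypothetical helper imitating POSITION(needle IN haystack)."""
    # str.find is 0-based and returns -1 when absent, so adding 1 gives
    # a 1-based position, or 0 for "not found".
    return haystack.find(needle) + 1


for last_name in ["Jones", "Smith", "Reilly", "Coffing"]:
    print(last_name, sql_position("e", last_name), sql_position("f", last_name))
```

Coffing has two f's, and only the first one, in position 3, is reported.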

Quiz - Find that SUBSTRING Starting Position

SELECT DISTINCT Department_Name as Dept_Name
      ,SUBSTRING(Department_Name FROM
                 POSITION(' ' IN Department_Name) +1) as Word2
FROM Department_Table
WHERE POSITION(' ' IN trim(Department_Name)) >0 ;

Dept_Name             Word2
____________________  ___________
Customer Support      Support
Human Resources       Resources
Research and Develop  and Develop

What is the Starting Position here?

What is the Starting position of the Substring in the above query? Hint: This only looks for a Dept_Name that has two words or more.

Answer to Quiz - Find that SUBSTRING Starting Position

SELECT DISTINCT Department_Name as Dept_Name
      ,SUBSTRING(Department_Name FROM
                 POSITION(' ' IN Department_Name) +1) as Word2
FROM Department_Table
WHERE POSITION(' ' IN trim(Department_Name)) >0 ;

Dept_Name             Word2
____________________  ___________
Customer Support      Support
Human Resources       Resources
Research and Develop  and Develop

What is the Starting Position here?

The Starting Position is calculated by finding the position of the first SPACE and then adding 1.

Customer Support      (FROM 10)
Human Resources       (FROM 7)
Research and Develop  (FROM 10)

What is the Starting position of the Substring in the above query? See above!

Using the SUBSTRING to Find the Second Word On

SELECT DISTINCT Department_Name as Dept_Name
      ,SUBSTRING(Department_Name FROM
                 POSITION(' ' IN Department_Name) +1) as Word2
FROM Department_Table
WHERE POSITION(' ' IN trim(Department_Name)) >0 ;

Dept_Name             Word2
____________________  ____________
Customer Support      Support
Human Resources       Resources
Research and Develop  and Develop

Notice we only had three rows come back. That is because our WHERE looks for only a Department_Name that has multiple words. Then, notice that the starting position of the Substring is a nested POSITION function that looks for the first space. Then, it adds 1 to that position, and we have a starting position for the 2nd word. We don’t give a FOR length parameter, so it goes to the end.
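The same walk can be followed step by step in a short Python sketch. The function word2_onward is hypothetical and only mirrors the logic of the query: find the first space, add 1, and take everything from there to the end. The one-word name 'Sales' is made up to show a row that the WHERE clause would filter out.

```python
from typing import Optional


def word2_onward(department_name: str) -> Optional[str]:
    """Hypothetical helper mirroring the Word2 query for a single value."""
    # WHERE POSITION(' ' IN trim(Department_Name)) > 0
    if " " not in department_name.strip():
        return None  # a one-word name is filtered out by the WHERE clause
    # POSITION(' ' IN Department_Name): 1-based position of the first space
    space_position = department_name.find(" ") + 1
    # SUBSTRING(... FROM space_position + 1) with no FOR goes to the end;
    # a 1-based start of space_position + 1 is the 0-based index space_position.
    return department_name[space_position:]


for name in ["Customer Support", "Human Resources", "Research and Develop", "Sales"]:
    print(name, "->", word2_onward(name))
```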

Quiz - Why did only one Row Return?

SELECT Department_Name
      ,SUBSTRING(Department_Name
                 from POSITION(' ' IN Department_Name) + 1
                    + POSITION(' ' IN SUBSTRING(Department_Name
                                     FROM POSITION(' ' IN Department_Name) + 1))) as Third_Word
FROM Department_Table
WHERE POSITION(' ' IN TRIM(Substring(Department_Name
                           from POSITION(' ' in Department_Name) + 1))) > 0 ;

Dept_Name             Third_Word
____________________  __________
Research and Develop  Develop

Why did only one row come back?

Answer to Quiz - Why Did only one Row Return?

SELECT Department_Name
      ,SUBSTRING(Department_Name
                 from POSITION(' ' IN Department_Name) + 1
                    + POSITION(' ' IN SUBSTRING(Department_Name
                                     FROM POSITION(' ' IN Department_Name) + 1))) as Third_Word
FROM Department_Table
WHERE POSITION(' ' IN TRIM(Substring(Department_Name
                           from POSITION(' ' in Department_Name) + 1))) > 0 ;

Dept_Name             Third_Word
____________________  __________
Research and Develop  Develop

It has 3 words.

Why did only one row come back? It’s the only Department Name with three words. The SUBSTRING and the WHERE clause both look for the first space, and if they find it, they look for the second space. If they find that, they add 1 to it, and the Starting Position is the third word. There is no FOR length, so it defaults to “go to the end”.
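Nested POSITION and SUBSTRING calls are easier to follow one step at a time. The Python sketch below uses a hypothetical function third_word_onward that mirrors the arithmetic of the query for a single value. It is an illustration of the logic, not Matrix code.

```python
from typing import Optional


def third_word_onward(department_name: str) -> Optional[str]:
    """Hypothetical helper mirroring the Third_Word query for one value."""
    # POSITION(' ' IN Department_Name): 1-based, 0 when there is no space
    first_space = department_name.find(" ") + 1
    # SUBSTRING(Department_Name FROM first_space + 1): second word onward
    rest = department_name[first_space:]
    # WHERE POSITION(' ' IN TRIM(rest)) > 0: is there a second space?
    if " " not in rest.strip():
        return None  # fewer than three words, so the row is filtered out
    # POSITION(' ' IN rest): where the second space sits inside rest
    second_space_in_rest = rest.find(" ") + 1
    # Starting position of the third word, counted from 1
    start = first_space + 1 + second_space_in_rest
    return department_name[start - 1:]


print(third_word_onward("Research and Develop"))  # start is 9 + 1 + 4 = 14
print(third_word_onward("Customer Support"))
```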

Concatenation

Two Pipe Symbols together (no space) mean concatenate.

SELECT First_Name
      ,Last_Name
      ,First_Name || ' ' || Last_Name as Full_Name
FROM Employee_Table
WHERE First_Name = 'Squiggy' ;

First_Name  Last_Name  Full_Name
__________  _________  _____________
Squiggy     Jones      Squiggy Jones

See those || ? Those represent concatenation. That allows you to combine multiple columns into one column. The | (Pipe Symbol) on your keyboard is just above the ENTER key. Don’t put a space in between, just put two Pipe Symbols together. In this example, we have combined the first name, then a single space, and then the last name to get a new column called Full_Name, like Squiggy Jones.

Concatenation and SUBSTRING

SELECT First_Name
      ,Last_Name
      ,Substring(First_Name, 1, 1) || '. ' || Last_Name as Full_Name
FROM Employee_Table
WHERE First_Name = 'Squiggy' ;

The literal in the middle is a period (.) and a space.

First_Name  Last_Name  Full_Name
__________  _________  _________
Squiggy     Jones      S. Jones

Of the three items being concatenated together, what is the first item of concatenation in the example above? The first initial of the First_Name. Then, we concatenated a literal period and a space. Then, we concatenated the Last_Name.

Four Concatenations Together

SELECT First_Name
      ,Last_Name
      ,TRIM(Last_Name) ||' ' || Substring(First_Name, 1, 1) || '.' AS Last_Name_1st
FROM Employee_Table
WHERE First_Name = 'Squiggy' ;

First_Name is VARCHAR(12) and Last_Name is CHAR(20).

First_Name  Last_Name  Last_Name_1st
__________  _________  _____________
Squiggy     Jones      Jones S.

Why did we TRIM the Last_Name? To get rid of the spaces or the output would have looked odd. How many items are being concatenated in the example above? There are 4 items concatenated. We start with the Last_Name (after we trim it), followed by a single space, then we have the First Initial of the First Name, and at the end we have a period.

Troubleshooting Concatenation

ERROR: There should never be spaces between the pipe symbols.

SELECT First_Name
      ,Last_Name
      ,TRIM (Last_Name) | | First_Name AS LastFirst
FROM Employee_Table
WHERE First_Name = 'Squiggy' ;

This is now perfect:

SELECT First_Name
      ,Last_Name
      ,TRIM (Last_Name) || First_Name AS LastFirst
FROM Employee_Table
WHERE First_Name = 'Squiggy' ;

First_Name  Last_Name  LastFirst
__________  _________  ____________
Squiggy     Jones      JonesSquiggy

What happened above to cause the error? Can you see it? The Pipe Symbols || have a space between them like | |, when it should be ||. It is a tough one to spot, so be careful.


Declaring a Cursor

The above example declares a cursor named TeraTom to select sales information from the Sales_Table and then fetch rows from the result set using the cursor.


Chapter 17 – Interrogating the Data

"The difference between genius and stupidity is that genius has its limits" - Albert Einstein


Quiz – What would the Answer be?

Sample_Table

Class_Code  Grade_Pt
__________  ________
Fr          0

SELECT Class_Code
      ,Grade_Pt / (Grade_Pt * 2) AS Math1
FROM Sample_Table
ORDER BY 1,2 ;

Can you guess what would return in the Answer Set?

Use the fake table above called Sample_Table, and try to predict what the Answer would be if this query were run on the system.


Answer to Quiz – What would the Answer be?

Sample_Table

Class_Code  Grade_Pt
__________  ________
Fr          0

SELECT Class_Code
      ,Grade_Pt / (Grade_Pt * 2) AS Math1
FROM Sample_Table
ORDER BY 1,2 ;

Can you guess what would return in the Answer Set?

Error – Division by zero

You get an error when you DIVIDE by ZERO! Let’s turn the page and fix it!


The NULLIF Command

Sample_Table

Class_Code  Grade_Pt
__________  ________
Fr          0

Notice that in our sample table, the value is 0.

SELECT Class_Code
      ,Grade_Pt / ( NULLIF (Grade_Pt, 0) * 2 ) AS Math1
FROM Sample_Table ;

Answer set

Class_Code  Math1
__________  _____
Fr          ?

What the NULLIF does is take two values and, if they match, return a NULL. In the above example, the NULLIF compares the first argument, Grade_Pt, with the second argument, 0, and replaces any zeros with NULL. Any arithmetic on a NULL yields NULL, so the answer set you’d get from this is a simple 'Fr', and then a NULL value, usually represented by a '?'. If you have a calculation where a ZERO could kill the operation, and you don’t want that, you can use the NULLIF command to convert any zero value to a NULL value. The NULLIF command can also be used to return NULL if any two values match.
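If a report should show a number rather than a NULL when the divisor is zero, one common pattern is to wrap the guarded division in COALESCE. The sketch below uses hypothetical inline rows in place of Sample_Table, and the choice of 0 as the fallback value is an assumption made for illustration.

```sql
-- Hypothetical inline rows standing in for Sample_Table.
SELECT Class_Code
      ,Grade_Pt
      -- NULL whenever Grade_Pt is 0, because NULLIF turns the divisor into NULL
      ,Grade_Pt / (NULLIF(Grade_Pt, 0) * 2)              AS Math1
      -- Same calculation, but a 0 is shown in place of the NULL
      ,COALESCE(Grade_Pt / (NULLIF(Grade_Pt, 0) * 2), 0) AS Math1_Or_Zero
FROM (SELECT 'Fr' AS Class_Code, CAST(0    AS DECIMAL(3,2)) AS Grade_Pt
      UNION ALL
      SELECT 'So',               CAST(2.00 AS DECIMAL(3,2))) AS s
ORDER BY 1 ;
```

Whether a fallback of 0 is appropriate depends on the report; leaving the NULL in place is often the more honest answer for an undefined division.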


Quiz – Fill in the Blank Values in the Answer Set

Sample_Table

Cust_No  Acc_Balance  Location
_______  ___________  ________
0        ?            3

SELECT NULLIF (Cust_No, 0)     AS Cust_No
      ,NULLIF (Acc_Balance, 0) AS Acc_Balance
      ,NULLIF (Location, 0)    AS Location
FROM Sample_Table ;

Cust_No  Acc_Balance  Location
_______  ___________  ________

Fill in the Answer Set above after looking at the table and the query.

Okay! Time to show me your brilliance! What would the Answer Set produce?


Answer to Quiz – Fill in the Blank Values in the Answer Set

Sample_Table

Cust_No  Acc_Balance  Location
_______  ___________  ________
0        ?            3

SELECT NULLIF (Cust_No, 0)     AS Cust_No
      ,NULLIF (Acc_Balance, 0) AS Acc_Balance
      ,NULLIF (Location, 0)    AS Location
FROM Sample_Table ;

Cust_No  Acc_Balance  Location
_______  ___________  ________
?        ?            3

Here is the answer set! How’d you do? The NULLIF command found a zero in Cust_No, so it made it NULL. The others were not zero (Acc_Balance was already NULL and Location was 3), so they retained their values. The only time NULLIF changes data is if the 1st argument and the 2nd argument are equal, and then it changes it to NULL.
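Another way to read NULLIF is as shorthand for a CASE expression (CASE is covered later in this chapter). The sketch below places the two forms side by side; the inline rows and the column aliases are hypothetical.

```sql
-- Hypothetical inline rows; both result columns should always agree.
SELECT Cust_No
      ,NULLIF(Cust_No, 0)                                AS Via_Nullif
      ,CASE WHEN Cust_No = 0 THEN NULL ELSE Cust_No END  AS Via_Case
FROM (SELECT 0 AS Cust_No
      UNION ALL
      SELECT 1003) AS s ;
```

The CASE form makes it plain that a value is returned unchanged unless it equals the second argument.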


Quiz – Fill in the Answers for the NULLIF Command

Sample_Table

Cust_No  Acc_Balance  Location
_______  ___________  ________
0        ?            3

SELECT NULLIF(Cust_No, 0)     AS Cust1
      ,NULLIF(Cust_No, 3)     AS Cust2
      ,NULLIF(Acc_Balance, 0) AS Acc1
      ,NULLIF(Acc_Balance, 3) AS Acc2
      ,NULLIF(Location, 0)    AS Loc1
      ,NULLIF(Location, 3)    AS Loc2
FROM Sample_Table ;

Cust1  Cust2  Acc1   Acc2   Loc1   Loc2
_____  _____  _____  _____  _____  _____

Fill in the Answer Set above after looking at the table and the query.

As mentioned previously, you can also use NULLIF() to ask Matrix to NULL the answer if the COLUMN matches the number in the parentheses (it doesn’t have to be 0). What would the above Answer Set produce from your analysis?


Answer to Quiz – Fill in the Answers for the NULLIF Command

Sample_Table

Cust_No  Acc_Balance  Location
_______  ___________  ________
0        ?            3

SELECT NULLIF(Cust_No, 0)     AS Cust1
      ,NULLIF(Cust_No, 3)     AS Cust2
      ,NULLIF(Acc_Balance, 0) AS Acc1
      ,NULLIF(Acc_Balance, 3) AS Acc2
      ,NULLIF(Location, 0)    AS Loc1
      ,NULLIF(Location, 3)    AS Loc2
FROM Sample_Table ;

Cust1  Cust2  Acc1   Acc2   Loc1   Loc2
_____  _____  _____  _____  _____  _____
?      0      ?      ?      3      ?

Look at the answers above, and if they don’t make sense, go over them again until they do.


Quiz – Fill in the Answers for the NULLIF Command

Sample_Table

Cust_No  Origin_State  Current_State
_______  ____________  _____________
1003     CT            NY
1004     NY            NY
1005     MA            CT
1006     MA            MA

SELECT Cust_No
      ,NULLIF(Current_State, Origin_State) AS "Mobile Worker State"
FROM Sample_Table ;

Cust_No  Mobile Worker State
_______  ___________________
1003
1004
1005
1006

Fill in the Answer Set above after looking at the table and the query.

Finally, you can also use NULLIF() to ask Matrix to NULL the answer if the COLUMN matches another COLUMN in the parentheses (it doesn’t have to be a literal number or string). What would the above Answer Set produce from your analysis?


Answer to Quiz – Fill in the Answers for the NULLIF Command

Sample_Table

Cust_No  Origin_State  Current_State
_______  ____________  _____________
1003     CT            NY
1004     NY            NY
1005     MA            CT
1006     MA            MA

SELECT Cust_No
      ,NULLIF(Current_State, Origin_State) AS "Mobile Worker State"
FROM Sample_Table ;

Cust_No  Mobile Worker State
_______  ___________________
1003     NY
1004     ?
1005     CT
1006     ?

Look at the answers above, and if they don’t make sense, go over them again until they do.
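The same column-to-column comparison can also drive a filter. The sketch below inlines the four quiz rows and keeps only the customers whose current state differs from their origin state; using NULLIF in the WHERE clause this way is an illustration, not something shown on the original slide.

```sql
-- The four rows from the quiz table, inlined so the query stands alone.
SELECT Cust_No
      ,Origin_State
      ,Current_State
FROM (SELECT 1003 AS Cust_No, 'CT' AS Origin_State, 'NY' AS Current_State
      UNION ALL SELECT 1004, 'NY', 'NY'
      UNION ALL SELECT 1005, 'MA', 'CT'
      UNION ALL SELECT 1006, 'MA', 'MA') AS s
-- NULLIF returns NULL when the two states match, so those rows are filtered out
WHERE NULLIF(Current_State, Origin_State) IS NOT NULL
ORDER BY Cust_No ;
```

This should keep only customers 1003 and 1005. Note that a row whose Current_State is itself NULL would also be dropped by this filter.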


The ISNULL, NVL and COALESCE Commands

Sample_Table

Cust_No  Acc_Balance  Location
_______  ___________  ________
0        ?            3

Notice the Null! We’re turning it into a 0 shortly!

SELECT ISNULL (Cust_No, 0)    AS Cust
      ,NVL (Acc_Balance, 0)   AS Balance
      ,COALESCE (Location, 0) AS Location
FROM Sample_Table ;

Cust   Balance  Location
_____  _______  ________

Fill in the Answer Set above after looking at the table and the query.

The ISNULL, NVL and COALESCE commands are all synonyms for the same function. They take any number of expressions as input and return the value of the first expression in the list that is not null. If all expressions are null, the result is null. When a non-null value is found, the remaining expressions in the list are not evaluated.


The ISNULL, NVL and COALESCE Commands

Sample_Table

Cust_No  Acc_Balance  Location
_______  ___________  ________
0        ?            3

Notice the Null! We’re turning it into a 0 shortly!

SELECT ISNULL (Cust_No, 0)    AS Cust
      ,NVL (Acc_Balance, 0)   AS Balance
      ,COALESCE (Location, 0) AS Location
FROM Sample_Table ;

Cust   Balance  Location
_____  _______  ________
0      0        3

The answer set placed a zero in the place of the NULL Acc_Balance, but the other values didn’t change because they were NOT Null. As you probably noticed, these powerful functions can be used in a number of other ways. We’ll show another example in the next slide. Hold onto your seat, this is pretty exciting stuff!


The ISNULL, NVL and COALESCE more examples

Sample_Table

Last_Name  Home_Phone  Work_Phone  Cell_Phone
_________  __________  __________  __________
Jones      555-1234    444-1234    ?
Patel      ?           456-7890    454-6789
Gonzales   ?           ?           354-0987
Nguyen     ?           ?           ?

SELECT Last_Name
      ,COALESCE (Home_Phone, Work_Phone, Cell_Phone) AS Phone
FROM Sample_Table ;

Last_Name  Phone
_________  ________

Fill in the Answer Set above after looking at the table and the query.

Coalesce returns the first non-Null value in a list, and if all values are Null, returns Null. Coalesce is the ANSI compliant name for this function, but many legacy databases also support the synonyms NVL and ISNULL for this function.


The COALESCE Answer Set

Sample_Table

Last_Name  Home_Phone  Work_Phone  Cell_Phone
_________  __________  __________  __________
Jones      555-1234    444-1234    ?
Patel      ?           456-7890    454-6789
Gonzales   ?           ?           354-0987
Nguyen     ?           ?           ?

SELECT Last_Name
      ,COALESCE (Home_Phone, Work_Phone, Cell_Phone) AS Phone
FROM Sample_Table ;

Last_Name  Phone
_________  ________
Jones      555-1234
Patel      456-7890
Gonzales   354-0987
Nguyen     ?

Coalesce returns the first non-Null value in a list, and if all values are Null, returns Null.
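Because COALESCE scans its arguments from left to right, the order of the list sets the priority. As a minimal sketch, the query below inlines the four rows from the slide and reverses the argument order so that a cell phone is preferred over a work phone, and a work phone over a home phone. The column alias Preferred_Phone is made up for illustration.

```sql
-- The four rows from the slide, inlined so the query stands alone.
SELECT Last_Name
      ,COALESCE(Cell_Phone, Work_Phone, Home_Phone) AS Preferred_Phone
FROM (SELECT CAST('Jones' AS VARCHAR(20))   AS Last_Name
            ,CAST('555-1234' AS VARCHAR(8)) AS Home_Phone
            ,CAST('444-1234' AS VARCHAR(8)) AS Work_Phone
            ,CAST(NULL AS VARCHAR(8))       AS Cell_Phone
      UNION ALL SELECT 'Patel',    NULL, '456-7890', '454-6789'
      UNION ALL SELECT 'Gonzales', NULL, NULL,       '354-0987'
      UNION ALL SELECT 'Nguyen',   NULL, NULL,       NULL) AS s ;
```

With this ordering, Jones would show the work phone and Patel the cell phone, while Nguyen still comes back NULL because every argument is NULL.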


The Coalesce Quiz

Sample_Table

Last_Name  Home_Phone  Work_Phone  Cell_Phone
_________  __________  __________  __________
Jones      555-1234    444-1234    ?
Patel      ?           456-7890    454-6789
Gonzales   ?           ?           354-0987
Nguyen     ?           ?           ?

SELECT Last_Name
      ,COALESCE (Home_Phone, Work_Phone, Cell_Phone, 'No Phone') AS Phone
FROM Sample_Table ;

Last_Name  Phone
_________  ________

Fill in the Answer Set above after looking at the table and the query.

Coalesce returns the first non-Null value in a list, and if all values are Null, returns Null. Since we decided in the above query we don’t want NULLs, notice we have placed a literal 'No Phone' in the list. How will this affect the Answer Set?


Answer – The Coalesce Quiz

Sample_Table

Last_Name  Home_Phone  Work_Phone  Cell_Phone
_________  __________  __________  __________
Jones      555-1234    444-1234    ?
Patel      ?           456-7890    454-6789
Gonzales   ?           ?           354-0987
Nguyen     ?           ?           ?

SELECT Last_Name
      ,COALESCE (Home_Phone, Work_Phone, Cell_Phone, 'No Phone') AS Phone
FROM Sample_Table ;

Last_Name  Phone
_________  ________
Jones      555-1234
Patel      456-7890
Gonzales   354-0987
Nguyen     No Phone

Answers are above! We put a literal in the list so there’s no chance of NULL returning.
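The 'No Phone' literal works in that query because the phone columns are character data. When the column is numeric, a text fallback generally needs an explicit conversion first, since the arguments to COALESCE must be of compatible types. The sketch below assumes a hypothetical DECIMAL balance column and the label 'No Balance'; neither comes from the slides.

```sql
-- Hypothetical inline rows: one customer with a balance, one without.
SELECT Cust_No
      -- CAST the number to text first so both COALESCE arguments are character data
      ,COALESCE(CAST(Acc_Balance AS VARCHAR(12)), 'No Balance') AS Balance_Text
FROM (SELECT 1003 AS Cust_No, CAST(250.75 AS DECIMAL(10,2)) AS Acc_Balance
      UNION ALL
      SELECT 1004,            CAST(NULL   AS DECIMAL(10,2))) AS s ;
```

The trade-off is that Balance_Text is now a character column, so it sorts and compares as text rather than as a number.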


The Basics of CAST (Convert And STore)

CAST will convert a column or value’s data type temporarily into another data type. Below is the syntax:

SELECT CAST(<column-name> AS <data-type>[(<length>)]) FROM <table-name> ;

Examples using CAST:

CAST ( <smallint-data> AS CHAR(5) )     /* convert smallint to character */
CAST ( <decimal-data>  AS INTEGER )     /* truncates decimals */
CAST ( <binary-data>   AS SMALLINT )    /* convert binary to smallint */
CAST ( <char-data>     AS BYTE (128) )  /* convert character to binary */
CAST ( <byteint-data>  AS VARCHAR(5) )  /* convert byteint to character */
CAST ( <integer-data>  AS FLOAT )       /* convert integer to floating point */

Data can be converted from one type to another by using the CAST function. As long as the data involved does not break any data rules (e.g., placing alphabetic or special characters into a numeric data type), the conversion works. The name of the CAST function comes from the Convert And STore operation that it performs.
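To make the syntax concrete, here is a minimal sketch that casts literals rather than table columns, so it needs no table. The column aliases are made up for illustration, and the exact display of each result depends on the query tool.

```sql
-- Each CAST converts a literal to a different data type.
SELECT CAST(12345   AS CHAR(5))      AS Num_To_Char    -- number to fixed-length character
      ,CAST('2015'  AS INTEGER)      AS Char_To_Int    -- digits-only string to integer
      ,CAST(7       AS FLOAT)        AS Int_To_Float   -- integer to floating point
      ,CAST('19.95' AS DECIMAL(5,2)) AS Char_To_Dec ;  -- numeric string to decimal

-- A conversion that breaks the data rules is expected to fail, for example:
-- SELECT CAST('ABC' AS INTEGER) ;
```

The commented-out statement is the kind of conversion the paragraph above warns about: alphabetic characters cannot be placed into a numeric data type.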


Some Great CAST (Convert And STore) Examples

SELECT CAST('ABCDE' AS CHAR(1) ) AS Trunc
      ,CAST(128 AS CHAR(3) )     AS OK
      ,CAST(127 AS INTEGER )     AS Bigger ;

Trunc  OK   Bigger
_____  ___  ___________
A      128          127

The first CAST truncates the five characters (left to right) to form the single character 'A'. In the second CAST, the integer 128 is converted to three characters and left justified in the output. The 127 was initially stored in a SMALLINT (5 digits, up to 32767) and then converted to an INTEGER. Hence, it uses 11 character positions for its display, ten numeric digits and a sign (positive assumed), and is right justified as numeric.


Some Great CAST (Convert And STore) Examples

SELECT CAST(121.53 AS SMALLINT)     AS Whole
      ,CAST(121.53 AS DECIMAL(3,0)) AS Rounder ;

Whole  Rounder
_____  _______
121    122

The value of 121.53 was initially stored as a DECIMAL with 5 total digits, 2 of them to the right of the decimal point. Then, it is converted to a SMALLINT using CAST to remove the decimal positions. Therefore, it truncates data by stripping off the decimal portion. It does not round data using this data type. On the other hand, the CAST in the second column, called Rounder, converts the value to a DECIMAL with 3 digits and no digits to the right of the decimal (3,0), so it will round data values instead of truncating. Since .53 is greater than .5, it is rounded up to 122.


Some Great CAST (Convert And STore) Examples

SELECT Order_Number    AS OrdNo
      ,Customer_Number AS CustNo
      ,Order_Date
      ,Order_Total
      ,CAST(Order_Total AS INTEGER)      AS Chopped
      ,CAST(Order_Total AS DECIMAL(5,0)) AS Rounded
FROM Order_Table ;

OrdNo   CustNo    Order_Date  Order_Total  Chopped  Rounded
______  ________  __________  ___________  _______  _______
123585  87323456  10/10/1999     15231.62    15231    15232
123777  57896883  09/09/1999     23454.84    23454    23455
123512  11111111  01/01/1999      8005.91     8005     8006
123456  11111111  05/04/1998     12347.53    12347    12348
123552  31323134  10/01/1999      5111.47     5111     5111

The column Chopped takes Order_Total (a Decimal (10,2)) and CASTs it as an integer, which chops off the decimals. Rounded CASTs Order_Total as a Decimal (5,0), which takes the decimals and rounds up if the decimal is .50 or above.
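When the distinction between chopping and rounding matters to a report, it can be stated explicitly instead of being left to the target data type of a CAST. The sketch below assumes the TRUNC and ROUND numeric functions are available, as they are in PostgreSQL-derived systems; confirm this on your own system. The inline rows reuse two Order_Total values from the answer set above.

```sql
-- Two Order_Total values from the slide, inlined so the query stands alone.
SELECT Order_Total
      ,TRUNC(Order_Total)    AS Chopped   -- drops the decimal portion
      ,ROUND(Order_Total, 0) AS Rounded   -- rounds to the nearest whole number
FROM (SELECT CAST(15231.62 AS DECIMAL(10,2)) AS Order_Total
      UNION ALL
      SELECT CAST(5111.47  AS DECIMAL(10,2))) AS o ;
```

Naming the operation this way documents the intent for the next person who reads the query.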


The Basics of the CASE Statement

Sample_Table

Course_Name      Credits
_______________  _______
Tera-Tom on SQL  1

SELECT Course_Name
      ,CASE Credits
            WHEN 1 THEN 'One Credit'
            WHEN 2 THEN 'Two Credits'
            WHEN 3 THEN 'Three Credits'
       END AS CreditAlias
FROM Sample_Table ;

Course_Name      CreditAlias
_______________  ___________

Fill in the Answer Set above after looking at the table and the query.

This is a CASE STATEMENT, which allows you to evaluate a column in your table, and from that, come up with a new answer for your report. Every CASE begins with a CASE, and they all must end with a corresponding END. What would the answer be?


The Basics of the CASE Statement

Sample_Table

Course_Name      Credits
_______________  _______
Tera-Tom on SQL  1

SELECT Course_Name
      ,CASE Credits
            WHEN 1 THEN 'One Credit'
            WHEN 2 THEN 'Two Credits'
            WHEN 3 THEN 'Three Credits'
       END AS CreditAlias
FROM Sample_Table ;

Course_Name      CreditAlias
_______________  ___________
Tera-Tom on SQL  One Credit

This is a CASE STATEMENT, which allows you to evaluate a column in your table, and from that, come up with a new answer for your report. Every CASE begins with a CASE, and they all must end with a corresponding END. Here, Credits is 1, so the answer is 'One Credit'.
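One detail worth seeing in action: when no WHEN branch matches and there is no ELSE, a CASE returns NULL. The sketch below adds a made-up four-credit course (labelled hypothetical in the code) to the row from the slide.

```sql
-- First row is from the slide; the second row is hypothetical.
SELECT Course_Name
      ,Credits
      ,CASE Credits
            WHEN 1 THEN 'One Credit'
            WHEN 2 THEN 'Two Credits'
            WHEN 3 THEN 'Three Credits'
       END AS CreditAlias   -- NULL for the 4-credit row: no WHEN matches and there is no ELSE
FROM (SELECT 'Tera-Tom on SQL' AS Course_Name, 1 AS Credits
      UNION ALL
      SELECT 'Hypothetical Seminar', 4) AS c ;
```

Adding an ELSE branch is the usual way to supply a value for rows that match none of the WHEN conditions.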


Valued Case Vs. A Searched Case

SELECT Course_Name
      ,CASE Credits
            WHEN 1 THEN 'One Credit'
            WHEN 2 THEN 'Two Credits'
            WHEN 3 THEN 'Three Credits'
            ELSE 'Credits not found'
       END AS CreditAlias
FROM Course_Table ;

The column Credits follows the word CASE. This is a Valued CASE statement.

Rules for a Valued CASE:
1. You can only check for equality
2. You can only check the value Credits

NO value follows the word CASE. This is a Searched CASE!

SELECT Course_Name ,CASE WHEN Credits