Greenplum - Architecture and SQL [1 ed.] 9781940540337

Greenplum is the first open source data warehouse. Purchased and improved by EMC, sold to Dell, makes Greenplum one of t

English Pages 940 Year 2015

The Tera-Tom Video Series

Lessons with Tera-Tom: Teradata Architecture and SQL Video Series

These exciting videos make learning and certification much easier.

Three ways to view them:
1. Safari (look up Coffing Studios)
2. CoffingDW.com (sign-up on our website)
3. Your company can buy them all for everyone to see (contact [email protected])

Current Books in the Tera-Tom Genius Series

Our Recommended Book In The Tera-Tom Genius Series

Tera-Tom- Author of over 75 Books

Tera-Tom books have been the primary source of Teradata learning for over 20 years. They have helped to teach millions of people all aspects of Teradata. What people love the most about the Tera-Tom books is how easy they are to understand. They are so easy that a seven-year-old boy (raised by wolves) can understand them!

The Best Query Tool Works on all Systems

When you possess a tool like Nexus, you have access to every system in your enterprise! The Nexus Query Chameleon is the only tool that works on all systems. Its Super Join Builder allows for the ERwin Logical Model to be loaded, and then Nexus shows tables and views visually. It then guides users to show what joins to what. As users choose the tables and columns they want in their report, Nexus builds the SQL for them with each click of the mouse. Nexus was designed for Teradata and Hadoop, but works on all platforms. Nexus even converts table structures between vendors, so querying and managing multi-vendor platforms is transparent. Even if you only work with one system, you will find that the Nexus is the best query tool you have ever used. If you work with multiple systems, you will be even more amazed. Download a free trial at www.CoffingDW.com.

Trademarks and Copyrights

Microsoft Windows, Windows 2003 Server, SQL Server 2012, SQL Server Compact Edition, .NET, PDW, SQL Server, T-SQL, Azure SQL Data Warehouse and Azure Cloud are trademarks of Microsoft. Teradata, NCR, BYNET and SQL Assistant are registered trademarks of Teradata Corporation, Dayton, Ohio, U.S.A. IBM, DB2 and Netezza are registered trademarks of IBM Corporation. ANSI is a registered trademark of the American National Standards Institute. Ethernet is a trademark of Xerox. UNIX is a trademark of The Open Group. Linux is a trademark of Linus Torvalds. Java and Oracle are trademarks of Oracle. ParAccel is a trademark of ParAccel. Kognitio is a trademark of Kognitio. Greenplum is a trademark of EMC Corporation/Dell Corporation. Nexus Query Chameleon is a trademark of Coffing Data Warehousing.

Coffing Data Warehousing shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book or from the use of programs or program segments that are included. The manual is not a publication of EMC or Dell Corporation, nor was it produced in conjunction with EMC or Dell Corporation.

Copyright © November 2015 by Coffing Publishing. ISBN 978-1-940540-33-7. All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the publisher. No patent liability is assumed with respect to the use of information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, neither is any liability assumed for damages resulting from the use of information contained herein.

About Tom Coffing

Tom Coffing, better known as Tera-Tom, is the founder of Coffing Data Warehousing where he has been CEO for the past 20 years. Tom has written over 50 books on all aspects of Teradata, Netezza, Kognitio, Redshift, ParAccel, Vertica, SQL Server, and Greenplum. Tom has taught over 1,000 Teradata classes in places such as India, Africa, Europe, China, Malaysia, and throughout North America. Tom is also the owner and designer of the Nexus Query Chameleon, the most sophisticated enterprise query tool in the industry. The Nexus works on all platforms, including Hadoop, converts table structures between all systems, and allows companies to load their ERwin logical model inside Nexus. The Nexus guides users like a GPS system. Users point and click on any table or view from any system, and they are guided to what joins to what. As users choose the columns they want on their report, the SQL is built automatically.

In high school, Tom was the first athlete from his school to ever place at state. He was selected by his school to represent them at Buckeye Boys State, and Tom was inducted into the first class of the Lakota High School Hall of Fame. At the University of Arizona and University of Nevada Las Vegas, Tom was a two-time All-American wrestler, Sophomore Athlete of the Year, and a two-time winner of the 1980 Olympic wrestling trials. Tom graduated with a Bachelor’s degree in Speech Communications. After college, Tom became a state and national champion speech winner for Toastmasters and won two orchid awards as an actor. Tom is the proud father of three wonderful children and has been married for the past 32 years. You can contact Tom at 513 300-0341 or at [email protected].

About Leona Coffing

Leona Coffing is the Co-founder and Chief Financial Officer of Coffing Data Warehousing. She has co-authored several books on data warehousing and has produced more than 75 CoffingDW books. Leona has always been a driving force in the success and growth of Coffing Data Warehousing. She has successfully been able to keep Coffing Data Warehousing as an independent company without utilizing venture capital. Leona is also responsible for managing all of CoffingDW's Nexus programmers and has managed and mentored over 30 Coffing Data Warehousing employees. She has also been responsible for part of the design of Nexus. She is credited with the idea of implementing Microsoft products inside Nexus. Leona is also a professional golf caddie on the women’s professional tour, a mother of three children, and a proud grandmother. Leona is a Phoenix, AZ native but now resides in Cincinnati, OH with her husband of 35 years, Tom.

Table of Contents

Contents Chapter 1 – Introduction to the Greenplum Architecture ............................................................................................. 2 What is Parallel Processing? ...................................................................................................................................... 3 The Basics of a Single Computer ............................................................................................................................... 4 Data in Memory is fast as Lightning .......................................................................................................................... 5 Parallel Processing Of Data ....................................................................................................................................... 6 Symmetric Multi-Processing (SMP) Server .............................................................................................................. 7 Commodity Hardware Servers are configured for Greenplum .................................................................................. 8 Commodity Hardware Allows For One Segment per CPU ....................................................................................... 9 The Master Host ....................................................................................................................................................... 10 The Segment's Responsibilities ................................................................................................................................ 11 The Host's Plan is Either All Segments or a Single Segment .................................................................................. 12 A Table has Columns and Rows............................................................................................................................... 
13 Greenplum has Linear Scalability ............................................................................................................................ 14 The Architecture of A Greenplum Data Warehouse................................................................................................. 15 Nexus is Now Available for Greenplum .................................................................................................................. 16 Chapter 2 – Greenplum Table Structures ................................................................................................................... 18 The Concepts of Greenplum Tables......................................................................................................................... 19 Tables are Either Distributed by Hash or Random .................................................................................................. 20 A Hash Distributed Table has A Distribution Key .................................................................................................. 21 Picking A Distribution Key That Is Not Very Unique ............................................................................................ 22 Random Distribution Uses a Round Robin Technique ............................................................................................ 23 Tables Will Be Distributed Among All Segments ................................................................................................... 24 The Default For Distribution Chooses the First Column ......................................................................................... 25 Table are Either a Heap or Append-Only ................................................................................................................ 26 Tables are Stored in Either Row or Columnar Format ............................................................................................ 27

Table of Contents Creating a Column Oriented Table .......................................................................................................................... 28 Comparing Normal Table vs. Columnar Tables ...................................................................................................... 29 Columnar can move just One Column Block Into Memory .................................................................................... 30 Segments on Distributions are aligned to Rebuild a Row ....................................................................................... 31 Columnar Tables Store Each Column in Separate Blocks ...................................................................................... 32 Visualize the Data – Rows vs. Columns .................................................................................................................. 33 Table Rows are Either Sorted or Unsorted .............................................................................................................. 34 Creating a Clustered Index in Order to Physically Sort Rows ................................................................................ 35 Physically Ordered Tables Are Faster on Certain Queries ...................................................................................... 36 Another Way to Create a Clustered Table ............................................................................................................... 37 Creating a B-Tree Index and then Running Analyze ............................................................................................... 38 Creating a Bitmap Index .......................................................................................................................................... 39 Why Create a Bitmap Index? ................................................................................................................................... 
40 Tables Can Be Partitioned........................................................................................................................................ 41 A Table Partitioned By Range (Per Month) ............................................................................................................ 42 A Visual of a Partitioned Table by Range (Month) ................................................................................................. 43 Tables Can Be Partitioned by Day ........................................................................................................................... 44 Visualize a Partitioned Table by Day ...................................................................................................................... 45 Creating a Partitioned Table Using a List ................................................................................................................ 46 Creating a Multi-Level Partitioned Table ................................................................................................................ 47 Changing a Table to a Partitioned Table.................................................................................................................. 48 Not Null Constraints ................................................................................................................................................ 49 Unique Constraints ................................................................................................................................................... 50 Unique Constraints That Fail ................................................................................................................................... 51 Primary Key Constraints .......................................................................................................................................... 
52 A Primary Key Automatically Creates a Unique Index .......................................................................................... 53 Check Constraints .................................................................................................................................................... 54 Creating an Automatic Number Called a Sequence ................................................................................................ 55 Multiple INSERT example using a Sequence ......................................................................................................... 56

Table of Contents

Chapter 3 – Hashing and Data Distribution ................................................................................................................ 58 Distribution Keys Hashed on Unique Values Spread Evenly.................................................................................. 59 Distribution Keys with Non-Unique Values Spread Unevenly ............................................................................... 60 Best Practices for Choosing a Distribution Key ...................................................................................................... 61 The Hash Map Determines which Segment owns the Row..................................................................................... 62 The Hash Map Determines which Node will own the Row .................................................................................... 63 The Hash Map Determines which Node will own the Row .................................................................................... 64 The Hash Map Determines which Node will own the Row .................................................................................... 65 Hash Map Determines which Node will own the Row ........................................................................................... 66 A Review of the Hashing Process ............................................................................................................................ 67 Non-Unique Distribution Keys have Skewed Data ................................................................................................. 68 Non-Unique Distribution Keys have Skewed Data ................................................................................................. 69 Chapter 4 – The Technical Details.............................................................................................................................. 
71 Greenplum Limitations ............................................................................................................................................ 72 Every Segment has the Exact Same Tables ............................................................................................................. 73 Tables are Distributed across All Segments ............................................................................................................ 74 The Table Header and the Data Rows are Stored Separately .................................................................................. 75 Segments Store Rows inside a Data Block Called a Page ....................................................................................... 76 To Read a Data Block a Node Moves the Block into Memory ............................................................................... 77 A Full Table Scan Means All Nodes Must Read All Rows..................................................................................... 78 Rows are Organized inside a Page ........................................................................................................................... 79 Moving Data Blocks is Like Checking In Luggage................................................................................................. 80 As Row-Based Tables Get Bigger, the Page Splits ................................................................................................. 81 Data Pages are Processed One at a Time per Unit ................................................................................................... 82 Creating a Table that is a Heap ................................................................................................................................ 83 Heap Page ................................................................................................................................................................. 
84 Creating a Table that has a Clustered Index ............................................................................................................ 85

Table of Contents Clustered Index Page................................................................................................................................................ 86 The Row Offset Array is the Guidance System for Every Row .............................................................................. 87 The Row Offset Array Provides Two Search Options (1 of 2) ............................................................................... 88 The Row Offset Array Provides Two Search Options (2 of 2) ............................................................................... 89 The Row Offset Array Helps With Inserts .............................................................................................................. 90 B-Trees ..................................................................................................................................................................... 91 The Building of a B-Tree for a Clustered Index (1 of 3) ......................................................................................... 92 The Building of a B-Tree for a Clustered Index (2 of 3) ......................................................................................... 93 The Building of a B-Tree for a Clustered Index (3 of 3) ......................................................................................... 94 When Do I Create a Clustered Index? ..................................................................................................................... 95 When Do I Create a Non Clustered Index? ............................................................................................................. 96 B-Tree for Non Clustered Index on a Clustered Table (1 of 2) ............................................................................... 97 B-Tree for Non Clustered Index on a Clustered Table (2 of 2) ............................................................................... 
98 Adding a Non Clustered Index To A ....................................................................................................................... 99 B-Tree for Non Clustered Index on a Heap Table (1 of 2) .................................................................................... 100 B-Tree for Non Clustered Index on a Heap Table (2 of 2) .................................................................................... 101 Chapter 5 – Physical Database Design .................................................................................................................... 103 The Four Stages of Modeling for Greenplum ........................................................................................................ 104 The Logical Model ................................................................................................................................................. 105 The Logical Model can be loaded inside Nexus .................................................................................................... 106 First, Second and Third Normal Form ................................................................................................................... 107 Quiz – Choose that Normalization Technique ....................................................................................................... 108 Answer to Quiz – Choose that Normalization Technique ..................................................................................... 109 Quiz – What Normalization is it now?................................................................................................................... 110 Answer to Quiz – What Normalization is it now? ................................................................................................. 111 The Employee_Table and Department_Table can be joined ................................................................................. 
112 The Employee_Table and Department_Table Join SQL ....................................................................................... 113 The Extended Logical Model Template................................................................................................................. 114

Table of Contents User Access is of Great Importance ....................................................................................................................... 115 User Access in Layman’s Terms ........................................................................................................................... 116 User Access for Joins in Layman’s Terms ............................................................................................................ 117 The Nexus Shows Users the Table’s Distribution Key ......................................................................................... 118 Data Demographics Tell Us if the Column is Worthy........................................................................................... 119 Data Demographics – Distinct Rows ..................................................................................................................... 120 Data Demographics – Distinct Rows Query .......................................................................................................... 121 Data Demographics – Max Rows Null .................................................................................................................. 122 Data Demographics – Max Rows Null Query ....................................................................................................... 123 Data Demographics – Max Rows Per Value ......................................................................................................... 124 Data Demographics – Max Rows Per Value ......................................................................................................... 125 Data Demographics – Typical Rows Per Value .................................................................................................... 126 Typical Rows Per Value Query For Greenplum Systems ..................................................................................... 
127 SQL to Get the Average Rows Per Value for a Column (Mean) .......................................................................... 128 Data Demographics – Change Rating .................................................................................................................... 129 Factors When Choosing Greenplum Indexes ........................................................................................................ 130 Distribution Key Data Demographics Candidate Guidelines ................................................................................ 131 Distribution key Access Considerations ................................................................................................................ 132 Answer -Three Important distribution key Considerations ................................................................................... 133 Step 1 is to Pick All Potential Distribution Key Columns ..................................................................................... 134 Step 1 is to Pick All Potential Distribution Key Columns ..................................................................................... 135 Step 2 is to Pick All Potential Secondary Indexes ................................................................................................. 136 Answer to 2nd Step to Picking Potential Secondary Indexes ................................................................................ 137 Choose the Distribution Key and Secondary Indexes............................................................................................ 138 3rd Step is to picking your Indexes ......................................................................................................................... 139 Our Index Picks ...................................................................................................................................................... 140

Table of Contents Chapter 6 – Denormalization ................................................................................................................................... 142 Denormalization ..................................................................................................................................................... 143 Derived Data .......................................................................................................................................................... 144 Repeating Groups ................................................................................................................................................... 145 Pre-Joining Tables .................................................................................................................................................. 146 Storing Summary Data with a Trigger ................................................................................................................... 147 Summary Tables or Data Marts the Old Way ........................................................................................................ 148 Horizontal Partitioning the Old Way ..................................................................................................................... 149 Horizontal Partitioning the New Way .................................................................................................................... 150 Vertical Partitioning the Old Way ......................................................................................................................... 151 Columnar Tables Are the New Vertical Partitioning ............................................................................................. 152 Chapter 7 - Nexus ..................................................................................................................................................... 
154 Nexus is Available on the Cloud............................................................................................................................ 155 Nexus Queries Every Major System ...................................................................................................................... 156 How to Use Nexus ................................................................................................................................................. 157 Why is Nexus Special? Visualization and Automatic SQL ................................................................................... 158 Why is Nexus Special? Cross-System Joins .......................................................................................................... 159 Why is Nexus Special? The Amazing Hub System ............................................................................................... 160 Why is Nexus Special? Save Answer Sets as Tables ............................................................................................ 161 Why is Nexus Special? Automated Data Movement ............................................................................................. 162 Why is Nexus Special? Nexus makes the Servers Talk Directly .......................................................................... 163 What Makes Nexus Special? The Garden of Analysis .......................................................................................... 164 The Garden of Analysis Grouping Sets Tab .......................................................................................................... 165 The Garden of Analysis - Grouping Sets Answer Sets .......................................................................................... 166 The Garden of Analysis – Join Tab (1 of 4) .......................................................................................................... 
167 The Garden of Analysis – Join Tab (2 of 4) .......................................................................................................... 168 The Garden of Analysis – Join Tab (3 of 4) .......................................................................................................... 169 The Garden of Analysis – Join Tab (4 of 4) .......................................................................................................... 170

The Garden of Analysis – Charts/Graphs Tab (1 of 4)
The Garden of Analysis – Charts/Graphs Tab (2 of 4)
The Garden of Analysis – Charts/Graphs Tab (3 of 4)
The Garden of Analysis – Charts/Graphs Tab (4 of 4)
The Garden of Analysis – Dynamic Charts Tab (1 of 4)
The Garden of Analysis – Dynamic Charts Tab (2 of 4)
The Garden of Analysis – Dynamic Charts Tab (3 of 4)
The Garden of Analysis – Dynamic Charts Tab (4 of 4)
The Garden of Analysis – Dashboard Tab (1 of 5)
The Garden of Analysis – Dynamic Charts Tab (2 of 5)
The Garden of Analysis – Dynamic Charts Tab (3 of 5)
The Garden of Analysis – Dynamic Charts Tab (4 of 5)
The Garden of Analysis – Dynamic Charts Tab (5 of 5)
Getting to the Super Join Builder
The Super Join Builder is the First Entry in the Menu
The Super Join Builder Shows Tables Visually
Using the Add Join Button
What to Do When No Tables are Joinable?
Drag a Joinable Object into the Super Join Builder
You Will See the Add Custom Join Window
Defining the Join Columns
Your Tables Will Appear Together
Select the Columns You Want on the Report
Check out the SQL Tab to See the SQL that has been built
SQL Tab
Hit Execute to get the Report inside the Super Join Builder
The Report is delivered inside the Super Join Builder
Let's Join Two Tables Again (1 of 6)
Let's Join Two Tables Again (2 of 6)

Let's Join Two Tables Again (3 of 6)
Let's Join Two Tables Again (4 of 6)
Let's Join Two Tables Again (5 of 6)
Let's Join Two Tables Again (6 of 6)
The Tabs of the Super Join Builder Philosophy – One Query
The Tabs of the Super Join Builder – Objects Tab
The Tabs of the Super Join Builder – Columns Tab
The Tabs of the Super Join Builder – Sorting Tab
The Tabs of the Super Join Builder – Joins Tab
The Tabs of the Super Join Builder – SQL Tab
The Tabs of the Super Join Builder – Metadata Tab
The Tabs of the Super Join Builder – Analytics Tab
The Tabs of the SJB – Analytics Tab – OLAP Screen
Getting a Simple CSUM in the Analytics Tab – OLAP
Getting a Simple CSUM – The SQL Automatically Generated
The Answer Set of the CSUM
Getting all of the OLAP functions in the Analytics Tab
A Five Table Join Using the Menu
The First Table is placed in the Super Join Builder
Using the Add Join Cascading Menu
All Five Tables Are In the Super Join Builder
A Five Table Join Two Steps (Cube)
Choose Cube with Columns from the Left Top of the Table
All Tables are Cubed (Joined Together Instantly)
Choose Cube and then Choose Your Columns
Create Cube – Tables Are Joined Without Columns Selected
Create Cube – Select the Columns You Want on the Report
How to join Greenplum, Oracle and SQL Server Tables
The Greenplum Table is now in the Super Join Builder

Drag the Joining Oracle Table to the Super Join Builder
Defining the Join Columns
Choose the Columns You Want on Your Report
Let's Add a SQL Server Table to our Teradata and Oracle Join
Defining the Join Columns
All Three Tables are now in the Super Join Builder
Change the Hub and Run the Join on Oracle
Change the Hub and Run the Join on SQL Server
Simply Amazing - Change the Hub to the Garden of Analysis
Have the Answer Set Saved Automatically to Any System
Saving the Answer Set to an Oracle or SQL Server System
Saving the Answer Set to a Greenplum System
Saving the Answer Set to a Teradata System

Chapter 8 – The Basics of SQL
Introduction
SELECT * (All Columns) in a Table
Fully Qualifying a Database, Schema and Table
SELECT Specific Columns in a Table
Commas in the Front or Back?
Place your Commas in front for better Debugging Capabilities
Sort the Data with the ORDER BY Keyword
ORDER BY Defaults to Ascending
Use the Name or the Number in your ORDER BY Statement
Two Examples of ORDER BY using Different Techniques
Changing the ORDER BY to Descending Order
NULL Values sort First in Ascending Mode (Default)
NULL Values sort Last in Descending Mode (DESC)
Major Sort vs. Minor Sorts

Multiple Sort Keys using Names vs. Numbers
Sorts are Alphabetical, NOT Logical
Using A CASE Statement to Sort Logically
How to ALIAS a Column Name
A Missing Comma can by Mistake become an Alias
Comments using Double Dashes are Single Line Comments
Comments for Multi-Lines
Comments for Multi-Lines as Double Dashes Per Line
A Great Technique for Comments to Look for SQL Errors

Chapter 9 – The WHERE Clause
The WHERE Clause limits Returning Rows
Double Quoted Aliases are for Reserved Words and Spaces
Character Data needs Single Quotes in the WHERE Clause
Character Data needs Single Quotes, but Numbers Don’t
Comparisons against a Null Value
NULL means UNKNOWN DATA so Equal (=) won’t Work
Use IS NULL or IS NOT NULL when dealing with NULLs
NULL is UNKNOWN DATA so NOT Equal won’t Work
Use IS NULL or IS NOT NULL when dealing with NULLs
Using Greater Than or Equal To (>=)
AND in the WHERE Clause
Troubleshooting AND
OR in the WHERE Clause
Troubleshooting Or
Troubleshooting Character Data
Using Different Columns in an AND Statement
Quiz – How many rows will return?
Answer to Quiz – How many rows will return?

What is the Order of Precedence?
Using Parentheses to change the Order of Precedence
Using an IN List in place of OR
The IN List is an Excellent Technique
IN List vs. OR brings the same Results
The IN List Can Use Character Data
Using a NOT IN List
Null Values in a NOT IN List Bring Back No Rows
A Technique for Handling Nulls with a NOT IN List
BETWEEN is Inclusive
NOT BETWEEN is Also Inclusive
LIKE uses Wildcards Percent ‘%’ and Underscore ‘_’
LIKE command Underscore is Wildcard for one Character
The ilike Command
LIKE Command Works Differently on Char Vs Varchar
Troubleshooting LIKE Command on Character Data
Introducing the TRIM Command
Introducing the RTRIM Command
Quiz – What Data is Left Justified and what is Right?
Numbers are Right Justified and Character Data is Left
Answer – What Data is Left Justified and what is Right?
An example of Data with Left and Right Justification
A Visual of CHARACTER Data vs. VARCHAR Data
Use the TRIM command to remove spaces on CHAR Data
Escape Character in the LIKE Command changes Wildcards
Escape Characters Turn off Wildcards in the LIKE Command
Quiz – Turn off that Wildcard
ANSWER – To Find that Wildcard
Introducing the RTRIM Command

Quiz – What Data is Left Justified and What is Right?
Numbers are Right Justified and Character Data is Left
Answer – What Data is Left Justified and what is Right?
An example of Data with Left and Right Justification
A Visual of CHARACTER Data vs. VARCHAR Data
RTRIM command Removes Trailing spaces on CHAR Data
Using Like with an AND Clause to Find Multiple Letters
Using Like with an OR Clause to Find Either Letters

Chapter 10 – Distinct vs. Group By
The Distinct Command
Distinct vs. GROUP BY
Quiz – How many rows come back from the Distinct?
Answer – How many rows come back from the Distinct?

Chapter 11 – Aggregation
Quiz – You calculate the Answer Set in your own Mind
Answer – You calculate the Answer Set in your own Mind
Quiz – You calculate the Answer Set in your own Mind
Answer – You calculate the Answer Set in your own Mind
The 3 Rules of Aggregation
There are Five Aggregates
Quiz – How many rows come back?
Answer – How many rows come back?
Troubleshooting Aggregates
GROUP BY when Aggregates and Normal Columns Mix
GROUP BY delivers one row per Group
GROUP BY Dept_No or GROUP BY 1 the same thing
Limiting Rows and Improving Performance with WHERE

WHERE Clause in Aggregation limits unneeded Calculations
Keyword HAVING tests Aggregates after they are totaled
Aggregates Return Null on Empty Tables
Keyword HAVING is like an Extra WHERE Clause for Totals
Keyword HAVING tests Aggregates after they are totaled
Getting the Average Values per Column
Three types of Advanced Grouping
Group By Grouping Sets
Group By Rollup
GROUP BY Rollup Result Set
GROUP BY Cube
GROUP BY CUBE Result Set
GROUP BY CUBE Result Set
Quiz – GROUP BY GROUPING SETS Challenge
Answer To Quiz – GROUP BY GROUPING SETS Challenge

Chapter 12 – Join Functions
Greenplum Join Quiz
Greenplum Join Quiz Answer
Redistribution
Big Table Small Table Join Strategy
Duplication of the Smaller Table across All-Distributions
If the Join Condition is the Distribution Key no Movement
Matching Rows That Are On The Same Node Naturally
What if the Join Condition Columns are Not Distribution Keys
Strategy 1 of 4 – The Merge Join
Quiz – Redistribute the Employees by their Dept_No
Quiz – Employees' Dept_No landed on segment with Matches
Quiz – Redistribute the Orders to the Proper segment

Answer to Redistribute the Employees by their Dept_No Quiz
Strategy 2 of 4 – The Hash Join
Strategy 3 of 4 – The Nested Join
Strategy 4 of 4 – The Product Join
A Two-Table Join Using Traditional Syntax
A two-table join using Non-ANSI Syntax with Table Alias
You Can Fully Qualify All Columns
A two-table join using ANSI Syntax
Both Queries have the same Results and Performance
Quiz – Can You Finish the Join Syntax?
Answer to Quiz – Can You Finish the Join Syntax?
Quiz – Can You Find the Error?
Answer to Quiz – Can You Find the Error?
Super Quiz – Can You Find the Difficult Error?
Answer to Super Quiz – Can You Find the Difficult Error?
Quiz – Which rows from both tables won’t return?
Answer to Quiz – Which rows from both tables won’t return?
LEFT OUTER JOIN
LEFT OUTER JOIN Results
RIGHT OUTER JOIN
RIGHT OUTER JOIN Example and Results
FULL OUTER JOIN
FULL OUTER JOIN Results
Which Tables are the Left and which Tables are Right?
Answer – Which Tables are the Left and Which are the Right?
INNER JOIN with Additional AND Clause
ANSI INNER JOIN with Additional AND Clause
ANSI INNER JOIN with Additional WHERE Clause
OUTER JOIN with Additional WHERE Clause

OUTER JOIN with Additional AND Clause ..... 403
OUTER JOIN with Additional AND Clause Results ..... 404
Quiz – Why is this considered an INNER JOIN? ..... 405
Evaluation Order for Outer Queries ..... 406
The DREADED Product Join ..... 407
The DREADED Product Join Results ..... 408
The Horrifying Cartesian Product Join ..... 409
The ANSI Cartesian Join will ERROR ..... 410
Quiz – Do these Joins Return the Same Answer Set? ..... 411
Answer – Do these Joins Return the Same Answer Set? ..... 412
The CROSS JOIN ..... 413
The CROSS JOIN Answer Set ..... 414
The Self Join ..... 415
The Self Join with ANSI Syntax ..... 416
Quiz – Will both queries bring back the same Answer Set? ..... 417
Answer – Will both queries bring back the same Answer Set? ..... 418
Quiz – Will both queries bring back the same Answer Set? ..... 419
Answer – Will both queries bring back the same Answer Set? ..... 420
How would you Join these two tables? ..... 421
An Associative Table is a Bridge that Joins Two Tables ..... 422
Quiz – Can you write the 3-Table Join? ..... 423
Answer to Quiz – Can you Write the 3-Table Join? ..... 424
Quiz – Can you write the 3-Table Join to ANSI Syntax? ..... 425
Answer – Can you Write the 3-Table Join to ANSI Syntax? ..... 426
Quiz – Can you Place the ON Clauses at the End? ..... 427
Answer – Can you Place the ON Clauses at the End? ..... 428
The 5-Table Join – Logical Insurance Model ..... 429
Quiz - Write a Five Table Join Using ANSI Syntax ..... 430
Answer - Write a Five Table Join Using ANSI Syntax ..... 431

Quiz - Write a Five Table Join Using Non-ANSI Syntax ..... 432
Answer - Write a Five Table Join Using Non-ANSI Syntax ..... 433
Quiz – Re-Write this putting the ON clauses at the END ..... 434
Answer – Re-Write this putting the ON clauses at the END ..... 435
Chapter 13 – Date Functions ..... 437
Current_Date ..... 438
Current_Date and Current_Time ..... 439
Current_Date and Current_Timestamp ..... 440
The Many Different Ways to Look at a Timestamp ..... 441
Current_Time vs. LocalTime with Precision ..... 442
Local_Time and Local_Timestamp With Precision ..... 443
Now () and Timeofday () Functions ..... 444
Adding A Week to a Date ..... 445
Add or Subtract Days from a date ..... 446
Formatting Dates and Dollar Amounts ..... 447
The EXTRACT Command ..... 448
EXTRACT from DATES and TIME ..... 449
EXTRACT Command on the Century ..... 450
EXTRACT Command for the Decade, DOW and DOY ..... 451
EXTRACT Microseconds, Milliseconds and Millennium ..... 452
EXTRACT of the Month on Aggregate Queries ..... 453
Date_part Command ..... 454
Date_Trunc Command with Time ..... 455
Date_Trunc Command with Dates ..... 456
The AGE Command ..... 457
AGE Challenge ..... 458
AGE Challenge Results ..... 459
Epoch ..... 460

Using Intervals ..... 461
More Interval Examples ..... 462
Interval Arithmetic Results ..... 463
A Complex Time Interval example using CAST ..... 464
The OVERLAPS Command ..... 465
An OVERLAPS example that Returns No Rows ..... 466
The OVERLAPS Command using TIME ..... 467
Using both CAST and CONVERT in Literal Values ..... 468
A Better Technique for YEAR, MONTH, and DAY Functions ..... 469
Chapter 14 – Conversions and Formatting ..... 471
Postgres Conversion Functions ..... 472
Postgres Conversion Function Templates ..... 473
Postgres Conversion Function Templates Continued ..... 474
To_Char command Examples ..... 475
Formatting A Date with To_Char ..... 476
Formatting A Date With To_Char Continued ..... 477
To_Number ..... 478
To_Number Examples ..... 479
To_Date ..... 480
To_Timestamp ..... 481
Numeric Manipulation Functions ..... 482
Finding the Cube Root ..... 483
Ceiling Gets the Smallest Integer Not Smaller Than X ..... 484
Floor Finds the Largest Integer Not Greater Than X ..... 485
The Round Function and Precision ..... 486
Chapter 15 – Sub-query Functions ..... 488
An IN List is much like a Subquery ..... 489

An IN List Never has Duplicates – Just like a Subquery ..... 490
An IN List Ignores Duplicates ..... 491
The Subquery ..... 492
The Three Steps of How a Basic Subquery Works ..... 493
These are Equivalent Queries ..... 494
The Final Answer Set from the Subquery ..... 495
Quiz- Answer the Difficult Question ..... 496
Answer to Quiz- Answer the Difficult Question ..... 497
Should you use a Subquery or a Join? ..... 498
Quiz- Write the Subquery ..... 499
Answer to Quiz- Write the Subquery ..... 500
Quiz- Write the More Difficult Subquery ..... 501
Answer to Quiz- Write the More Difficult Subquery ..... 502
Quiz – Write the Extreme Subquery ..... 503
Answer to Quiz – Write the Extreme Subquery ..... 504
Quiz- Write the Subquery with an Aggregate ..... 505
Answer to Quiz- Write the Subquery with an Aggregate ..... 506
Quiz- Write the Correlated Subquery ..... 507
Answer to Quiz- Write the Correlated Subquery ..... 508
The Basics of a Correlated Subquery ..... 509
The Top Query always runs first in a Correlated Subquery ..... 510
Correlated Subquery Example vs. a Join with a Derived Table ..... 511
Quiz- A Second Chance to Write a Correlated Subquery ..... 512
Answer - A Second Chance to Write a Correlated Subquery ..... 513
Quiz- A Third Chance to Write a Correlated Subquery ..... 514
Answer - A Third Chance to Write a Correlated Subquery ..... 515
Quiz- Last Chance To Write a Correlated Subquery ..... 516
Answer – Last Chance to Write a Correlated Subquery ..... 517
Quiz – Write the Extreme Correlated Subquery ..... 518

Answer To Quiz – Write the Extreme Correlated Subquery ..... 519
Quiz- Write the NOT Subquery ..... 520
Answer to Quiz- Write the NOT Subquery ..... 521
Quiz- Write the Subquery using a WHERE Clause ..... 522
Answer - Write the Subquery using a WHERE Clause ..... 523
Quiz- Write the Subquery with Two Parameters ..... 524
Answer to Quiz- Write the Subquery with Two Parameters ..... 525
How the Double Parameter Subquery Works ..... 526
More on how the Double Parameter Subquery Works ..... 527
Quiz – Write the Triple Subquery ..... 528
Answer to Quiz – Write the Triple Subquery ..... 529
Quiz – How many rows return on a NOT IN with a NULL? ..... 530
Answer – How many rows return on a NOT IN with a NULL? ..... 531
How to handle a NOT IN with Potential NULL Values ..... 532
IN is equivalent to =ANY ..... 533
Using a Correlated Exists ..... 534
How a Correlated Exists matches up ..... 535
The Correlated NOT Exists ..... 536
Quiz – How many rows come back from this NOT Exists? ..... 537
Answer – How many rows come back from this NOT Exists? ..... 538
Chapter 16 – OLAP Functions ..... 540
CSUM ..... 541
CSUM – The Sort Explained ..... 542
CSUM – Rows Unbounded Preceding Explained ..... 543
CSUM – Making Sense of the Data ..... 544
CSUM – Making Even More Sense of the Data ..... 545
CSUM – The Major and Minor Sort Key(s) ..... 546
The ANSI CSUM – Getting a Sequential Number ..... 547

Troubleshooting The ANSI OLAP on a GROUP BY ..... 548
Reset with a PARTITION BY Statement ..... 549
PARTITION BY only Resets a Single OLAP not ALL of them ..... 550
Moving SUM ..... 551
ANSI Moving Window is Current Row and Preceding n Rows ..... 552
How ANSI Moving SUM Handles the Sort ..... 553
Quiz – How is that Total Calculated? ..... 554
Answer to Quiz – How is that Total Calculated? ..... 555
Moving SUM every 3-rows Vs a Continuous Average ..... 556
Partition By Resets an ANSI OLAP ..... 557
Both the Greenplum Moving Average and ANSI Version ..... 558
Moving Average ..... 559
The Moving Window is Current Row and Preceding ..... 560
How Moving Average Handles the Sort ..... 561
Quiz – How is that Total Calculated? ..... 562
Answer to Quiz – How is that Total Calculated? ..... 563
Quiz – How is that 4th Row Calculated? ..... 564
Answer to Quiz – How is that 4th Row Calculated? ..... 565
Moving Average every 3-rows Vs a Continuous Average ..... 566
Partition By Resets an ANSI OLAP ..... 567
Moving Difference using ANSI Syntax with Partition By ..... 568
RANK Defaults to Ascending Order ..... 569
Getting RANK to Sort in DESC Order ..... 570
RANK OVER and PARTITION BY ..... 571
RANK and DENSE RANK ..... 572
PERCENT_RANK OVER ..... 573
PERCENT_RANK OVER with 14 rows in Calculation ..... 574
PERCENT_RANK OVER with 21 rows in Calculation ..... 575
Quiz – What Causes the Product_ID to Reset? ..... 576

Answer to Quiz – What Causes the Product_ID to Reset? ..... 577
COUNT OVER for a Sequential Number ..... 578
Troubleshooting COUNT OVER ..... 579
Quiz – What caused the COUNT OVER to Reset? ..... 580
Answer to Quiz – What caused the COUNT OVER to Reset? ..... 581
The MAX OVER Command ..... 582
MAX OVER with PARTITION BY Reset ..... 583
Troubleshooting MAX OVER ..... 584
The MIN OVER Command ..... 585
Troubleshooting MIN OVER ..... 586
Finding a Value of a Column in the Next Row with MIN ..... 587
Quiz – Fill in the Blank ..... 588
Answer – Fill in the Blank ..... 589
The Row_Number Command ..... 590
Using a Derived Table and Row_Number ..... 591
Quiz – How did the Row_Number Reset? ..... 592
Answer – How did the Row_Number Reset? ..... 593
Ordered Analytics OVER ..... 594
CURRENT ROW AND UNBOUNDED FOLLOWING ..... 595
Different Windowing Options ..... 596
The CSUM for Each Product_Id and the Next Start Date ..... 597
How Ntile Works ..... 598
Ntile ..... 599
Ntile Continued ..... 600
Ntile Percentile ..... 601
Another Ntile example ..... 602
Using Tertiles (Partitions of Four) ..... 603
NTILE ..... 604
NTILE Using a Value of 10 ..... 605

Table of Contents NTILE With a Partition.......................................................................................................................................... 606 Using FIRST_VALUE ........................................................................................................................................... 607 FIRST_VALUE ..................................................................................................................................................... 608 FIRST_VALUE after Sorting by the Highest Value ............................................................................................. 609 FIRST_VALUE with Partitioning ......................................................................................................................... 610 Using LAST_VALUE ............................................................................................................................................ 611 LAST_VALUE ...................................................................................................................................................... 612 Using LEAD........................................................................................................................................................... 613 Using LEAD With and Offset of 2 ........................................................................................................................ 614 LEAD ..................................................................................................................................................................... 615 LEAD With Partitioning ........................................................................................................................................ 616 Using LAG ............................................................................................................................................................. 
Using LAG with an Offset of 2
LAG
LAG with Partitioning
CUME_DIST
CUME_DIST with a Partition
SUM (SUM(n))

Chapter 17 – Temporary Tables
There are Two Types of Temporary Tables
CREATING A Derived Table
Naming the Derived Table
Aliasing the Column Names in the Derived Table
Multiple Ways to Alias the Columns in a Derived Table
CREATING a Derived Table using the WITH Command
The Same Derived Query shown Three Different Ways
Most Derived Tables Are Used To Join To Other Tables
The Three Components of a Derived Table

Visualize This Derived Table
A Derived Table and CAST Statements
A Derived example Using the WITH Syntax
Quiz - Answer the Questions
Answer to Quiz - Answer the Questions
Clever Tricks on Aliasing Columns in a Derived Table
An example of Two Derived Tables in a Single Query
MULTIPLE Derived Tables using the WITH Command
Finding the First Occurrence
Finding the Last Occurrence
Three Steps to Creating a Temporary Table
Three Versions of Creating a Temporary Table
ON COMMIT PRESERVE ROWS is the Greenplum Default
ON COMMIT DELETE ROWS
How to Use the ON COMMIT DELETE ROWS Option
ON COMMIT DROP
How to Use the ON COMMIT DROP Option
Create Table AS
Creating a Temporary Table Using a CTAS that Joins Multiple Tables
Create Table LIKE
Creating a Clustered Index on a Temporary Table

Chapter 18 – Character Strings
The LENGTH Command Counts Characters
The LENGTH Command – Spaces can Count too
The LENGTH Command Doesn't Count Trailing Spaces
UPPER and LOWER Commands
Using the LOWER Command
A LOWER Command Example

Using the UPPER Command
An UPPER Command Example
Non-Letters are Unaffected by UPPER and LOWER
The CHARACTERS Command Counts Characters
The CHARACTERS Command and Character Data
CHARACTER_LENGTH and OCTET_LENGTH
The TRIM Command trims both Leading and Trailing Spaces
Trim Combined with the CHARACTERS Command
How to TRIM only the Trailing Spaces
REGEXP_REPLACE
Concatenation
A Visual of the TRIM Command Using Concatenation
Trim and Trailing is Case Sensitive
How to TRIM Trailing Letters
The SUBSTRING Command
SUBSTRING and SUBSTR are equal, but use different syntax
How SUBSTRING Works with NO ENDING POSITION
Using SUBSTRING to move backwards
How SUBSTRING Works with a Starting Position of -1
How SUBSTRING Works with an Ending Position of 0
An example using SUBSTRING, TRIM and CHAR Together
The POSITION Command finds a Letter's Position
Concatenation
Concatenation and SUBSTRING
Four Concatenations Together
Troubleshooting Concatenation

Chapter 19 – Interrogating the Data
Quiz – What would the Answer be?
Answer to Quiz – What would the Answer be?
The NULLIF Command
Quiz – Fill in the Answers for the NULLIF Command
Answer– Fill in the Answers for the NULLIF Command
The COALESCE Command – Fill In the Answers
The COALESCE Answer Set
COALESCE is Equivalent to This CASE Statement
The COALESCE Command
The COALESCE Answer Set
The COALESCE Quiz
Answer - The COALESCE Quiz
The Basics of CAST (Convert and Store)
Some Great CAST (Convert and Store) Examples
Some Great CAST (Convert and Store) Examples
Some Great CAST (Convert and Store) example
Quiz - The Basics of the CASE Statements
Answer to Quiz - The Basics of the CASE Statements
Using an ELSE in the Case Statement
Using an ELSE as a Safety Net
Rules for a Valued Case Statement
Rules for a Searched Case Statement
Valued Case Vs. A Searched Case
Quiz - Valued Case Statement
Answer - Valued Case Statement
Quiz - Searched Case Statement
Answer - Searched Case Statement
The CASE Challenge

The CASE Challenge Answer
Combining Searched Case and Valued Case
A Trick for getting a Horizontal Case
Nested Case

Chapter 20 – Set Operators Functions
Rules of Set Operators
Rules of Set Operators
INTERSECT Explained Logically
INTERSECT Explained Logically
UNION Explained Logically
UNION Explained Logically
UNION ALL Explained Logically
UNION ALL Explained Logically
EXCEPT Explained Logically
EXCEPT Explained Logically
An Equal Amount of Columns in both SELECT List
Columns in the SELECT list should be from the same Domain
The Top Query handles all Aliases
The Bottom Query does the ORDER BY (a Number)
Great Trick: Place your Set Operator in a Derived Table
UNION Vs UNION ALL
Using UNION ALL and Literals
A Great example of how EXCEPT works
Quiz – Build that Query
Answer To Quiz – Build that Query
USING Multiple SET Operators in a Single Request
Changing the Order of Precedence with Parentheses
Using UNION ALL for speed in Merging Data Sets

Chapter 21 – View Functions
The Fundamentals of Views
Creating a Simple View to Restrict Sensitive Columns
Creating a Simple View to Restrict Rows
Basic Rules for Views
Exception to the ORDER BY Rule inside a View
Views sometimes CREATED for Formatting
Creating a View to Join Tables Together
Another Way to Alias Columns in a View CREATE
The Standard Way Most Aliasing is done
What Happens When Both Aliasing Options Are Present
Resolving Aliasing Problems in a View CREATE
Answer to Resolving Aliasing Problems in a View CREATE
Aggregates on View Aggregates
Altering A Table
Altering a Table after a View has been Created
A View that Errors after an ALTER

Chapter 22 – Table Create and Data Types
Greenplum Has Only Two Distribution Policies
Creating a Table with a Single Column Distribution Key
The Default Table Storage is a Heap
Creating a Table With a Multi-Column Distribution Key
Creating a Table with Random Distribution
Creating a Table with No Distribution Key
Guidelines for Partitioning a Table
Creating a Partitioned Table Using a Range
A Visual of One Year of Data with Range Partitioning
Creating a Partitioned Table Using a Range Per Day

A Visual of One Year of Data with Range per Day
Creating a Partitioned Table Using a List
Creating a Multi-Level Partitioned Table
Changing a Table to a Partitioned Table
Not Null Constraints
Unique Constraints
Primary Key Constraints
Check Constraints
Append Only Tables
Storage is Either Row, Column, or a Combination of Both
Column-Orientated Tables
CREATE INDEX Syntax
CREATE INDEX Syntax
Create Table LIKE
Greenplum Data Types
Greenplum Data Types Continued
Greenplum Data Types Continued
Greenplum Data Types Continued
Greenplum Data Types Continued

Chapter 23 – Data Manipulation Language (DML)
INSERT Syntax # 1
INSERT example with Syntax 1
INSERT Syntax # 2
INSERT example with Syntax 2
INSERT example with Syntax 3
INSERT/SELECT Command
INSERT/SELECT example using All Columns (*)
INSERT/SELECT example with Less Columns

The UPDATE Command Basic Syntax
Two UPDATE Examples
Subquery UPDATE Command Syntax
Example of Subquery UPDATE Command
Join UPDATE Command Syntax
Example of an UPDATE Join Command
Fast UPDATE
The DELETE Command Basic Syntax
DELETE and TRUNCATE Examples
To DELETE or to TRUNCATE
Subquery and Join DELETE Command Syntax
Example of Subquery DELETE Command

Chapter 24 – ANALYZE and VACUUM
ANALYZE
ANALYZE Options
What Columns Should You Analyze?
Why Analyze?
VACUUM
VACUUM Options

Chapter 25 – Greenplum Explain
How to See an EXPLAIN Plan
The Eight Rules to Reading an EXPLAIN Plan
Interpreting Keywords in an EXPLAIN Plan
Interpreting an EXPLAIN Plan
A Single Segment Retrieve – The Fastest Query
835 EXPLAIN With an ORDER BY Statement........................................................................................................... 836 EXPLAIN ANALYZE ........................................................................................................................................... 837

Table of Contents EXPLAIN With a Range Query on a Table Partitioned By Day........................................................................... 838 EXPLAIN That Uses a B-Tree Index Scan ........................................................................................................... 839 EXPLAIN That Uses a Bitmap Scan ..................................................................................................................... 840 EXPLAIN With a Simple Subquery ...................................................................................................................... 841 EXPLAIN With a Columnar Query ....................................................................................................................... 842 EXPLAIN With a Clustered Index ........................................................................................................................ 843 The Most Important Concept for Joins is the Distribution Key ............................................................................ 844 EXPLAIN With Join that has to Move Data ......................................................................................................... 845 EXPLAIN With Join that has to Move Data ......................................................................................................... 846 Changing the Join Query Changes the EXPLAIN Plan ........................................................................................ 847 Analyzing the Tables Structures For a 3-Table Join.............................................................................................. 848 An EXPLAIN For a 3-Table Join .......................................................................................................................... 849 Explain of a Derived Table vs. a Correlated Subquery ......................................................................................... 
850 Explain of the Correlated Subquery ....................................................................................................................... 851 Explain of the Derived Table ................................................................................................................................. 852 Chapter 26 – Statistical Aggregate Functions........................................................................................................... 854 The Stats Table ....................................................................................................................................................... 855 The STDDEV_POP Function ................................................................................................................................ 856 A STDDEV_POP Example ................................................................................................................................... 857 The STDDEV_SAMP Function............................................................................................................................. 858 A STDDEV_SAMP Example ................................................................................................................................ 859 The VAR_POP Function ....................................................................................................................................... 860 A VAR_POP Example ........................................................................................................................................... 861 The VAR_SAMP Function .................................................................................................................................... 862 A VAR_SAMP Example ....................................................................................................................................... 
863 The VARIANCE Function..................................................................................................................................... 864 A VARIANCE Example ........................................................................................................................................ 865 The CORR Function .............................................................................................................................................. 866

Table of Contents A CORR Example .................................................................................................................................................. 867 Another CORR Example so you can Compare...................................................................................................... 868 The COVAR_POP Function .................................................................................................................................. 869 A COVAR_POP Example ..................................................................................................................................... 870 Another COVAR_POP Example so you can Compare ......................................................................................... 871 The COVAR_SAMP Function .............................................................................................................................. 872 A COVAR_SAMP Example .................................................................................................................................. 873 Another COVAR_SAMP Example so you can Compare ..................................................................................... 874 The REGR_INTERCEPT Function ....................................................................................................................... 875 A REGR_INTERCEPT Example .......................................................................................................................... 876 Another REGR_INTERCEPT Example so you can Compare .............................................................................. 877 The REGR_SLOPE Function ................................................................................................................................ 878 A REGR_SLOPE Example .................................................................................................................................. 
879 Another REGR_SLOPE Example so you can Compare ....................................................................................... 880 The REGR_AVGX Function ................................................................................................................................. 881 A REGR_AVGX Example .................................................................................................................................. 882 Another REGR_AVGX Example so you can Compare ........................................................................................ 883 The REGR_AVGY Function ................................................................................................................................. 884 A REGR_AVGY Example .................................................................................................................................... 885 Another COVAR_POP Example so you can Compare ......................................................................................... 886 The REGR_COUNT Function ............................................................................................................................... 887 A REGR_COUNT Example .................................................................................................................................. 888 The REGR_R2 Function ........................................................................................................................................ 889 A REGR_R2 Example ........................................................................................................................................... 890 The REGR_SXX Function..................................................................................................................................... 891 A REGR_SXX Example ........................................................................................................................................ 
892 The REGR_SXY Function..................................................................................................................................... 893 A REGR_SXY Example ........................................................................................................................................ 894 The REGR_SYY Function..................................................................................................................................... 895

Table of Contents A REGR_SYY Example ........................................................................................................................................ 896 Using GROUP BY ................................................................................................................................................. 897

Chapter 1 – Introduction to the Greenplum Architecture

“The man who has no imagination has no wings.” – Muhammad Ali

What is Parallel Processing?

“After enlightenment, the laundry” - Zen Proverb

[Figure: Tera-Tom’s Parallel Processing Wash and Dry]

“After parallel processing the laundry, enlightenment!” - Greenplum Zen Proverb

Two guys were having fun on a Saturday night when one said, “I’ve got to go and do my laundry.” The other said, “What?!” The man explained that if he went to the laundromat the next morning, he would be lucky to get one machine and be there all day. But, if he went on Saturday night he could get all the machines. Then, he could do all his wash and dry in two hours. Now that’s parallel processing mixed in with a little dry humor!

The Basics of a Single Computer

[Figure: a single computer with a CPU, Memory, and a disk holding the Orders table]

"How are we doing on orders today?"

"How would I know? I'm just a disk. I need to transfer the block of data to the memory, and that is a slow process."

Orders
Order_No  Customer_No  Order_Date  Order_Total
100       21345679     01/01/2013  12347.53
200       32456733     01/01/2013   8005.91
300       31323134     01/01/2013   5111.47
400       87323456     01/01/2013  15231.62

“When you are courting a nice girl, an hour seems like a second. When you sit on a red-hot cinder, a second seems like an hour. That’s relativity.” – Albert Einstein

Data on disk does absolutely nothing. When data is requested, the computer moves the data one block at a time from disk into memory. Once the data is in memory, it is processed by the CPU at lightning speed. All computers work this way. The "Achilles Heel" of every computer is the slow process of moving data from disk to memory. The real theory of relativity is to find out how to get blocks of data from the disk into memory faster!

Data in Memory is fast as Lightning

[Figure: the block holding the Orders rows has been copied from disk into Memory, next to the CPU; the same four rows appear in both places]

Orders
Order_No  Customer_No  Order_Date  Order_Total
100       21345679     01/01/2013  12347.53
200       32456733     01/01/2013   8005.91
300       31323134     01/01/2013   5111.47
400       87323456     01/01/2013  15231.62

“You can observe a lot by watching.” – Yogi Berra

Once the data block is moved off of the disk and into memory, the processing of that block happens as fast as lightning. It is the movement of the block from disk into memory that slows down every computer. Data being processed in memory is so fast that even Yogi Berra couldn't catch it!

Parallel Processing Of Data

[Figure: four Segments, each with its own Memory and its own four rows of the Orders table, shown both on disk and in memory]

Segment (Orders)
Cust_No   Order_Date  Order_Total
21345679  01/01/2013  12347.53
32456733  01/01/2013   8005.91
31323134  01/01/2013   5111.47
87323456  01/01/2013  15231.62

Segment (Orders)
Cust_No   Order_Date  Order_Total
34345699  01/01/2013  13347.51
41456543  01/01/2013  13005.91
51323154  01/01/2013   7611.57
67823486  01/01/2013  11671.92

Segment (Orders)
Cust_No   Order_Date  Order_Total
87945679  01/01/2013   8347.53
98756733  01/01/2013  17005.91
35623134  01/01/2013   3451.47
97873456  01/01/2013  19871.62

Segment (Orders)
Cust_No   Order_Date  Order_Total
44445679  01/01/2013  12447.53
32547733  01/01/2013   8055.66
57497134  01/01/2013   5651.47
87768956  01/01/2013    231.62

"If the facts don't fit the theory, change the facts." -Albert Einstein

Big Data is all about parallel processing. Parallel processing is all about taking the rows of a table and spreading them among many parallel processing units. In Greenplum, these parallel processing units are referred to as Segments. Above, we can see a table called Orders. There are 16 rows in the table. Each segment holds four rows. Now they can process the data in parallel and be four times as fast. What Albert Einstein meant to say was, “If the theory doesn't fit the dimension table, change it to a fact.” Each Segment shares nothing and holds a portion of every table.
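One way to see this spread on a running system is to count the rows each segment holds. The sketch below assumes an Orders table like the one pictured already exists; gp_segment_id is a system column that Greenplum exposes on table rows.

```sql
-- Count how many rows of Orders each segment holds.
-- Assumes a table named Orders exists (taken from the figure above).
-- gp_segment_id identifies the segment that stores each row.
SELECT gp_segment_id
      ,COUNT(*) AS row_count
FROM Orders
GROUP BY gp_segment_id
ORDER BY gp_segment_id ;
```

With the 16 rows spread as pictured, each of the four segments would report a count of 4.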

Symmetric Multi-Processing (SMP) Server

[Figure: four CPUs, each with its own Cache, connected by a Bus to Shared Memory and Disk I/O]

A Symmetric Multi-Processing system has multiple processors for extra power, but these processors share a single operating system, a single memory pool, and access to the disks. This is a great architecture for speed, similar to a restaurant that is quick and organized, but it lacks the ability for unlimited expansion. When there are too many cooks in the kitchen, you need an MPP system that scales many SMP systems together as one parallel processing data warehouse.

A Symmetric Multi-Processing (SMP) system is a single server that is sometimes referred to as a node. Watch next how Greenplum uses these servers (called Segment Servers) to create a Massively Parallel Processing (MPP) system using commodity hardware.

Commodity Hardware Servers are configured for Greenplum

[Figure: Segment Host 1, Segment Host 2, through Segment Host n, each with three CPUs, Memory, and three Segments]

Greenplum allows you to utilize commodity hardware servers, each called a Segment Host. Greenplum also allows you to configure your parallel processes, called segments. The number of segments per Segment Host is usually defined by the number of CPUs the Segment Host contains. The Segment Hosts are connected together to create a Massively Parallel Processing (MPP) system out of the individual SMP servers.

Greenplum allows commodity hardware to be utilized to create one giant Greenplum cluster.
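To see how segments are laid out across the Segment Hosts of an existing cluster, the catalog can be queried. This is a minimal sketch that assumes you have access to the gp_segment_configuration catalog table; the column names used here (hostname, content, role) may differ between Greenplum releases.

```sql
-- List how many primary segments run on each Segment Host.
-- content = -1 identifies the master, so it is filtered out;
-- role = 'p' keeps only primary segment instances.
SELECT hostname
      ,COUNT(*) AS primary_segments
FROM gp_segment_configuration
WHERE content >= 0
  AND role = 'p'
GROUP BY hostname
ORDER BY hostname ;
```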

Commodity Hardware Allows For One Segment per CPU

[Figure: SMP Segment Node 1 through SMP Segment Node n, each with two Dual-Core CPUs, Memory, and four Segments]

Greenplum provides incredible speeds at low cost by allowing customers to purchase commodity hardware. The rule of thumb is to create one segment per CPU core. If you have two dual-core CPUs in a server, you should build four segments. By connecting multiple SMP servers together you can scale your Greenplum cluster in a linear fashion. Double the segments and double your speeds forever!

The Master Host

[Figure: the Master Host with two Dual-Core CPUs, Memory, and the System Catalog]

When a user logs into Greenplum, the host will log them in and be responsible for their session.
The host checks the SQL syntax, creates the EXPLAIN plan, checks the security, and builds a plan for the segments to follow.
The host uses system statistics and statistics from the ANALYZE command to build the best plan.
The host doesn't hold user data, but instead holds the Global System Catalog.
The host always delivers the final answer set to the user.

The host is the boss and the segments are the workers. Who doesn't love their boss? Users log in to the host and never communicate directly with the segments. The host builds a plan for the segments to follow that is delivered in plan slices. Each slice instructs the segments what to do. When the segments have done their work they return it to the host.
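The host's planning steps can be observed from any SQL session. The sketch below is illustrative only: it assumes an Employee_Table with Dept_No and Salary columns, as used in later examples in this chapter.

```sql
-- Refresh the statistics the host uses to build its plan.
ANALYZE Employee_Table ;

-- Ask the host for the plan without running the query.
EXPLAIN
SELECT Dept_No
      ,SUM(Salary) AS total_salary
FROM Employee_Table
GROUP BY Dept_No ;
```

The plan text is divided into slices, and the top step normally shows the segments returning their rows to the host. The exact wording of the plan varies by Greenplum version.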

The Segment's Responsibilities

[Figure: a row of Segments]

Segments are responsible for storing and retrieving rows from their assigned disk (Virtual disk).
Segments lock the tables and rows.
Segments sort rows and do all aggregation.
Segments handle all the join processing.
Segments handle all space management and space accounting.

Greenplum segments have the responsibilities listed above.

The Host's Plan is Either All Segments or a Single Segment

SQL Statement:
SELECT * FROM Employee_Table
WHERE Employee_No = 2000000 ;

Use the Distribution Key in the WHERE Clause with equality and only one segment is contacted.

[Figure: the Master Host sends the plan across the Interconnect (Gigabit Ethernet) to the Segment Hosts; the column in the WHERE clause is labeled as the Distribution Key]

On most queries the Master Host will broadcast the plan to each segment simultaneously, but if you use the distribution key in the WHERE clause of your SQL with an equality statement, then only a single segment will be contacted to return the row.
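The two cases can be compared side by side. This sketch assumes Employee_Table is distributed by Employee_No and has a Last_Name column; the value 'Jones' is only a placeholder.

```sql
-- Equality on the distribution key: the host can hash the value
-- 2000000 and send the plan to the one segment that owns that row.
EXPLAIN
SELECT * FROM Employee_Table
WHERE Employee_No = 2000000 ;

-- No distribution key in the WHERE clause: every segment must
-- check its own rows, so the plan goes to all segments.
EXPLAIN
SELECT * FROM Employee_Table
WHERE Last_Name = 'Jones' ;
```

When reading the two plans, compare how many segments the final motion step reports. The exact plan text depends on the Greenplum version.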

A Table has Columns and Rows

Employee_Table
Emp_No  Dept_No  First_Name  Last_Name  Salary
1001    100      Rafael      Minal      90000
1002    200      Maria       Gomez      80000
1003    300      Charl       Kertzel    70000
1004    400      Kyle        Stover     60000
1005    400      Rob         Rivers     50000
1006    300      Inna        Kinski     50000
1007    200      Sushma      Davis      50000
1008    100      Mo          Khan       60000
1009    300      Mo          Swartz     70000

[Figure: the nine rows of Employee_Table spread across three Segments]

Segment (Employee_Table)
1001  100  Rafael  Minal    90000
1004  400  Kyle    Stover   60000
1007  200  Sushma  Davis    50000

Segment (Employee_Table)
1002  200  Maria   Gomez    80000
1005  400  Rob     Rivers   50000
1008  100  Mo      Khan     60000

Segment (Employee_Table)
1003  300  Charl   Kertzel  70000
1006  300  Inna    Kinski   50000
1009  300  Mo      Swartz   70000

The table above has 9 rows. Our small system above has three parallel processing units called segments. Each segment holds three rows. Double your segments and double your speed and power. The idea of parallel processing is to take the rows of a table and spread them across the segments so each segment can process its portion of the data in parallel.

Greenplum has Linear Scalability

[Figure: the Host connected through the Interconnect to a row of Segments]

"A Journey of a thousand miles begins with a single step." - Lao Tzu

Greenplum was born to be parallel. With each query, a single step is performed in parallel by each segment. A Greenplum system consists of a series of segments that will work in parallel to store and process your data. This design allows you to start small and grow infinitely. If your Greenplum system provides you with an excellent Return on Investment (ROI), then continue to invest by purchasing more segment nodes. Most companies start small, but after seeing what Greenplum can do, they continue to grow their ROI from the single step of implementing a Greenplum system to millions of dollars in profits. Double your segment nodes and double your speeds... forever. The Greenplum Data Warehouse actually provides a journey of a thousand smiles!

The Architecture of A Greenplum Data Warehouse

The Host manages the distribution of data and builds the plan for the segments to follow.

[Figure: the Host above Segment Node 1 through Segment Node n, with eight Segments per segment node]

“Be the change that you want to see in the world.” - Mahatma Gandhi

The Host is the brains behind the entire operation. The user logs into the host, and for each SQL query, the host will come up with a plan to retrieve the data. It passes that plan to each segment node, and each of the segments processes its portion of the data. If the data is spread evenly, parallel processing works perfectly. This technology is relatively inexpensive. It might not "be the change", but it will help your company "keep the change" because costs are low. Greenplum uses both SMP and MPP technology. Each segment node is an SMP, but then many segment nodes are linked together to become one big MPP system. The commodity hardware being used and the number of processors determine the number of segments per segment node. Above, we can see 8 segments per segment node.


Chapter 2 – Greenplum Table Structures

“Let me once again explain the rules. The Greenplum Data Warehouse Rules!” - Tera-Tom Coffing

The Concepts of Greenplum Tables

1. Tables are either Distributed by Hash or Distributed Randomly.
2. Tables are either stored in a heap or are append-only tables.
3. The rows of a table by default are unsorted in a heap or they can be physically sorted with a clustered index.
4. Tables are stored physically on disk in either a row or columnar format.
5. Tables can be partitioned.
6. Tables are either permanent, temporary or external Tables.
7. Tables can have Primary and Foreign Key constraints although Foreign Key constraints are not enforced.
8. Tables can have Unique constraints and other Boolean constraints.
9. Compression techniques are supported at the table or column level.

Above are some basic concepts for Greenplum tables. The next several pages will cover each point one at a time, so you can see exactly what is going on.

Tables are Either Distributed by Hash or Random

[Figure: three Segments, each with its own Memory, holding a hashed table (top) and a random table (bottom)]

Hashed: Each Segment holds different rows. Each row is hashed by the values in a certain column, such as Employee_No.

Segment          Segment          Segment
1  Joel Davis    2  Mary Lewis    3  Tony Brady
4  Rick Jahns    5  John Miller   6  Lana Payne
7  Lynn Meyer    8  Rich Jones    9  Lorie Stewart
11 Seth Rogers   12 Kyle Watson   13 Dawn Daily

Random: Each segment gets rows in a round robin fashion to ensure even distribution.

Segment          Segment          Segment
100 Sales        300 Finance      400 HR
200 Marketing    500 Research     600 IT

The Greenplum database gives you two choices for table distribution. These choices are either distributed by Hash or randomly distributed. Large fact tables are usually hashed and smaller tables are often random. When a table is hashed, one or more columns are chosen as the distribution key. In our example above, the Employee_Table (top) is hashed by the Employee_No. The Random table (bottom) only has six rows in it and they are evenly distributed.

A Hash Distributed Table has A Distribution Key

CREATE TABLE Emp_Intl
(
 Employee_No INTEGER
,Dept_No     SMALLINT
,First_Name  VARCHAR(12)
,Last_Name   CHAR(20)
,Salary      DECIMAL(8,2)
)
DISTRIBUTED BY (Employee_No) ;

Hashed: Each Segment holds different rows. Each row is hashed by the values in a certain column, such as Employee_No.

Segment          Segment          Segment
1  Joel Davis    2  Mary Lewis    3  Tony Brady
4  Rick Jahns    5  John Miller   6  Lana Payne
7  Lynn Meyer    8  Rich Jones    9  Lorie Stewart
11 Seth Rogers   12 Kyle Watson   13 Dawn Daily

Above is a basic CREATE TABLE statement for a table with a Distribution Key. You can use one or more columns as the Distribution Key on Greenplum. The values in this column (or columns) will be hashed with a hashing formula and used to distribute the rows of the table across the Segments. Picking a good key is essential. An excellent Distribution Key will allow for even distribution among the many segments.
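Since the Distribution Key can be more than one column, here is a hedged sketch of a two-column key. The table name Emp_Intl_2col is hypothetical and chosen only for illustration; the columns are copied from Emp_Intl above.

```sql
-- Hypothetical variation of Emp_Intl that hashes each row on the
-- combination of Employee_No and Dept_No.
CREATE TABLE Emp_Intl_2col
(
 Employee_No INTEGER
,Dept_No     SMALLINT
,First_Name  VARCHAR(12)
,Last_Name   CHAR(20)
,Salary      DECIMAL(8,2)
)
DISTRIBUTED BY (Employee_No, Dept_No) ;
```

Because the hash is computed on the pair of values, the system can only work out which segment owns a row when values for both columns are known.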

Picking A Distribution Key That Is Not Very Unique

CREATE TABLE Emp_Intl
(
 Employee_No INTEGER
,Dept_No     SMALLINT
,First_Name  VARCHAR(12)
,Last_Name   CHAR(20)
,Salary      DECIMAL(8,2)
)
DISTRIBUTED BY (Last_Name) ;

[Figure: four Segments, each with its own Memory, holding the Last_Name values below]

Segment 1   Segment 2   Segment 3   Segment 4
Jones       Davis       Luellener   Valentine
Jones       Davis       Grayson     Gonzales
Jones       Davis
Jones       Patel
Miller      Patel

The hash formula will distribute like values to the same segment. This can result in skewed data. Pick a good distribution key or you could get uneven data. Notice that like values went to the same segment and the data is unevenly spread.
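A quick way to check for this kind of skew is to summarize the per-segment row counts. The sketch below assumes the Emp_Intl table above has been loaded; redistributing on Employee_No is one possible remedy, not the only one.

```sql
-- Summarize how evenly Emp_Intl is spread across segments.
-- A large gap between max_rows and avg_rows suggests a skewed key.
SELECT MIN(row_count) AS min_rows
      ,MAX(row_count) AS max_rows
      ,AVG(row_count) AS avg_rows
FROM (
      SELECT gp_segment_id
            ,COUNT(*) AS row_count
      FROM Emp_Intl
      GROUP BY gp_segment_id
     ) AS per_segment ;

-- One possible remedy: redistribute the rows on a more unique column.
ALTER TABLE Emp_Intl SET DISTRIBUTED BY (Employee_No) ;
```

Note that a segment holding no rows at all does not appear in the inner query, so min_rows only reflects segments that have at least one row.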

Random Distribution Uses a Round Robin Technique

CREATE TABLE Emp_Intl
(
 Employee_No INTEGER
,Dept_No     SMALLINT
,First_Name  VARCHAR(12)
,Last_Name   CHAR(20)
,Salary      DECIMAL(8,2)
)
DISTRIBUTED RANDOMLY ;

[Figure: four Segments, each with its own Memory, holding the Last_Name values below]

Segment 1   Segment 2   Segment 3   Segment 4
Davis       Luellener   Valentine   Jones
Davis       Jones       Jones       Grayson
Davis       Miller      Patel       Gonzales
Patel                               Jones

Above is a basic CREATE TABLE statement for a table that is Distributed Randomly across all segments. That means that the rows are distributed in a round robin fashion to ensure even distribution. This should be done for relatively small tables, or for tables that don't have a column or a column combination that will provide reasonably even distribution.

Tables Will Be Distributed Among All Segments

[Figure: Segment 1 through Segment 12, each with its own Memory and a portion of each of five tables]

Above we see 12 segments and five tables. Each table is spread across all 12 segments. All five tables above are row based tables. Some are hash distributed and some are randomly distributed. Just see the data and understand that tables are spread across all segments in order to take full advantage of parallel processing. Greenplum was born to be parallel. Also understand that all segments have the exact same table structures, but each segment is responsible for different rows.

The Default For Distribution Chooses the First Column

Since no distribution has been defined the system defaults to choosing the first column as the Distribution Key.

CREATE TABLE Emp_Intl
(
 Employee_No INTEGER
,Dept_No     SMALLINT
,First_Name  VARCHAR(12)
,Last_Name   CHAR(20)
,Salary      DECIMAL(8,2)
) ;

[Figure: three Segments, each with its own Memory, each holding a portion of Emp_Intl]

When no distribution is defined and the table is created, the default is a hash distribution policy. Greenplum will use either the PRIMARY KEY (if the table has one) or the first column of the table as the distribution key.
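To illustrate the PRIMARY KEY case, here is a sketch of a table whose primary key is not its first column. The table name Emp_PK is hypothetical. Defaults can also depend on the Greenplum version and server settings, so stating the DISTRIBUTED clause explicitly is the safer habit.

```sql
-- Hypothetical table with no DISTRIBUTED clause.
-- Per the rule described above, the PRIMARY KEY column Employee_No,
-- not the first column Dept_No, would become the distribution key.
CREATE TABLE Emp_PK
(
 Dept_No     SMALLINT
,Employee_No INTEGER PRIMARY KEY
,First_Name  VARCHAR(12)
,Last_Name   CHAR(20)
,Salary      DECIMAL(8,2)
) ;
```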

Tables are Either a Heap or Append-Only

This table is stored as a Heap. UPDATE and DELETE statements can be performed on this table.

CREATE TABLE Emp_Table
(
 Employee_No INTEGER
,Dept_No     SMALLINT
,First_Name  VARCHAR(12)
,Last_Name   CHAR(20)
,Salary      DECIMAL(8,2)
)
DISTRIBUTED BY (Employee_No) ;

This table is stored as Append-Only. NO UPDATE and DELETE statements allowed. This saves about 20 bytes per row.

CREATE TABLE Emp_Append
(
 Employee_No INTEGER
,Dept_No     SMALLINT
,First_Name  VARCHAR(12)
,Last_Name   CHAR(20)
,Salary      DECIMAL(8,2)
)
WITH (appendonly=true)
DISTRIBUTED BY (Employee_No) ;

By default, Greenplum Database uses storage in an unsorted heap. Heap tables allow data to be deleted or updated after it is initially loaded. Append-only table storage works best with denormalized fact tables in a data warehouse environment, where the data is not updated after it is loaded. Append-only tables eliminate about 20 bytes per row because there is no storage overhead for the per-row update visibility information. Append-only tables do not allow UPDATE and DELETE operations, and single row INSERT statements are not recommended because they are slow.
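The difference shows up in day-to-day statements. The sketch below reuses the Emp_Table and Emp_Append tables defined above; the department number, employee number, and raise percentage are placeholders. It follows the behavior described in this section, which may differ in newer Greenplum releases.

```sql
-- Heap table: rows can be changed after loading.
UPDATE Emp_Table
SET Salary = Salary * 1.05
WHERE Dept_No = 400 ;

DELETE FROM Emp_Table
WHERE Employee_No = 1009 ;

-- Append-only table: load in bulk with one INSERT ... SELECT
-- rather than many slow single-row INSERT statements.
INSERT INTO Emp_Append
SELECT Employee_No, Dept_No, First_Name, Last_Name, Salary
FROM Emp_Table ;
```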

Tables are Stored in Either Row or Columnar Format

[Figure: three Segments, each with its own Memory, each holding a portion of Employee_Row_Based and Employee_Columnar]

A table is stored in either a row format or a columnar format. Traditionally, most systems have always stored the rows of a table in a row format (row store). When a query is run on the table the entire block of rows must be moved from disk into memory, where they are processed. This works well when all columns (or most columns) are needed to satisfy the query. Modern designs of computer systems will often now include a column format (column store). This works extremely well on queries that don't need all columns (or most columns) to satisfy the query, such as analytics, aggregations, etc. Only the columns needed will then be transferred from disk into memory. Greenplum gives you a choice of row, column or both.

Creating a Column Oriented Table

Column-oriented tables must be Append-Only. This table is stored in a Columnar format:

CREATE TABLE Emp_Column
(
 Employee_No INTEGER
,Dept_No     SMALLINT
,First_Name  VARCHAR(12)
,Last_Name   CHAR(20)
,Salary      DECIMAL(8,2)
)
WITH (appendonly=true, orientation=column)
DISTRIBUTED BY (Last_Name) ;

Column-oriented tables must use append-only storage.
If rows are frequently inserted into the table, a row-oriented table is better optimized for write operations.

A column-oriented table stores each column in separate blocks. A segment still gets the entire row, but only needs to move the column(s) required by the current query into memory. This can save a lot of time and data movement for queries that do not need all of the columns to build the answer set. It works quite well for aggregations, ordered analytics, etc. The only major restriction is that column-oriented tables must be append-only. In return, there are substantial storage savings, because column-oriented tables compress much better than row-oriented tables.
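To illustrate the compression point, here is a hedged sketch of a column-oriented table that also asks for compression. The table name Emp_Column_Zip is hypothetical, and the zlib setting and level 5 are illustrative choices rather than recommendations from the text.

```sql
-- Sketch only: Emp_Column_Zip is a hypothetical table name.
CREATE TABLE Emp_Column_Zip
( Employee_No  INTEGER
 ,Dept_No      SMALLINT
 ,First_Name   VARCHAR(12)
 ,Last_Name    CHAR(20)
 ,Salary       DECIMAL(8,2)
)
WITH (appendonly=true, orientation=column,
      compresstype=zlib, compresslevel=5)   -- illustrative compression settings
DISTRIBUTED BY (Employee_No) ;

-- An aggregation that touches only two of the five columns,
-- so only the Dept_No and Salary blocks need to be read.
SELECT Dept_No, AVG(Salary)
FROM   Emp_Column_Zip
GROUP BY Dept_No ;
```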


Comparing Normal Table vs. Columnar Tables

Segment

Employee_Normal

Emp_No  Dept_No  First_Name  Last_Name  Salary
1001    100      Rafael      Minal      90000.00
1004    400      Kyle        Stover     60000.00
1007    200      Sushma      Davis      50000.00

Employee_Columnar

Emp_No  Dept_No  First_Name  Last_Name  Salary
1001    100      Rafael      Minal      90000.00
1004    400      Kyle        Stover     60000.00
1007    200      Sushma      Davis      50000.00

Above is a picture of the same table stored in a row-based design (top) and a column-based design (bottom). Notice that either way the segment gets the entire row, but Greenplum has the option of storing it in either a row-based or column-based design. The row-based data is stored in one giant block, so whenever the table is queried the entire block must move from disk into memory. The column-based design allows individual columns to move from disk to memory.


Columnar can move just One Column Block Into Memory

Query

SELECT Emp_No FROM Employee_Columnar ;

[Figure: the segment holds Employee_Columnar on disk with each column (Emp_No, Dept_No, First_Name, Last_Name, Salary) in its own block. Only the Emp_No block (1001, 1004, 1007) is moved into memory.]

Columnar is brilliant when a query only needs a small portion of the columns from a table to satisfy the query. This is also considered vertical partitioning. Why eat the whole cake when you can take just a piece?


Segments on Distributions are aligned to Rebuild a Row

What if the query needed two columns?

SELECT Emp_No, Salary FROM Employee_Columnar ;

[Figure: the segment moves only the Emp_No block (1001, 1004, 1007) and the Salary block (90000.00, 60000.00, 50000.00) of Employee_Columnar into memory, where the values line up by position to rebuild each row.]

Columnar is brilliant when a query only needs a small portion of the columns from a table to satisfy the query. Instead of moving an entire block containing all columns and throwing out the ones you don't need, you can use a columnar design to only retrieve the columns needed to satisfy the query.


Columnar Tables Store Each Column in Separate Blocks

[Figure: three segments, each with its own memory, each showing an AVG Salary calculation against its own column blocks.]

This is the same data you saw on the previous page! The difference is that the above is a columnar design. There are 8 rows in the table and five columns. Notice that the entire row stays on the same segment, but each column is a separate block. This is a brilliant design for ad hoc queries and analytics because when only a few columns are needed, columnar moves just those columns. Columnar is hard to beat for such queries because the blocks are so much smaller, and what isn't needed isn't moved.


Visualize the Data – Rows vs. Columns

[Figure, top: 24 rows (five columns) stored in 6 blocks in this row-based system]

[Figure, bottom: 24 rows (five columns) stored in 15 blocks (each column is its own block)]

Both examples above hold the same data and the same amount of data. If your applications tend to analyze the majority of columns or read the entire table, then a row-based system (top example) works well, because every block it moves into memory carries all of the columns. Columnar tables are advantageous when only a few columns need to be read. This is just one of the reasons that analytics goes with columnar like bread goes with butter. A row-based system must move the entire page into memory even if it only needs to read one row or even a single column. If a user above needed to analyze the Salary, the row-based system would move all of its blocks, while the columnar system would move only the Salary blocks. Salary is one of five columns and fills 3 of the 15 blocks, so the columnar system would move roughly 80% less block mass.


Table Rows are Either Sorted or Unsorted

Sorted: this table is sorted because it was created with a Clustered Index on Employee_No

Employee_No  Dept_No  First_Name  Last_Name  Salary
1001         100      Rafael      Minal      90000
1004         400      Kyle        Stover     60000
1007         200      Sushma      Davis      50000
1020         200      May         Jones      60000

Not Sorted: this table is unsorted (heap) because it was NOT created with a Clustered Index

Employee_No  Dept_No  First_Name  Last_Name  Salary
1001         100      Rafael      Minal      90000
1007         200      Sushma      Davis      50000
1020         200      May         Jones      60000
1004         400      Kyle        Stover     60000

The rows of a table are either sorted or unsorted. If the table has a clustered index it is sorted, but if it does not have a clustered index then it is unsorted, which is referred to as a heap. You can only have one clustered index per table because you can only sort a table one way. Sorting is independent of how the table is distributed (by a distribution key or randomly): once the rows are placed on a segment, they are either sorted (clustered index) or unsorted (heap).


Creating a Clustered Index in Order to Physically Sort Rows

Create the Table

CREATE TABLE Order_Cluster
(Order_Number     INTEGER
,Customer_Number  INTEGER
,Order_Date       DATE
,Order_Total      DECIMAL(8,2)
) DISTRIBUTED BY (Order_Number) ;

Create an Index (Ord_Date_idx is the index name, Order_Date is the indexed column)

CREATE INDEX Ord_Date_idx ON Order_Cluster (Order_Date) ;

Cluster the Index

CLUSTER Ord_Date_idx ON Order_Cluster ;

Above, we have sorted the Order_Cluster table on each segment by Order_Date. You can have only one clustered index on a table because you can only sort the rows one specific way. Having a Clustered Index on a Date column will help with range queries, because the data on each segment is sorted by date. CLUSTER reorders the rows at the time the command runs; rows loaded afterward are not kept in sorted order until the table is clustered again. CLUSTER is not supported with append-only tables or column-oriented tables.


Physically Ordered Tables Are Faster on Certain Queries

Create the Table

CREATE TABLE Order_Cluster
(Order_Number     INTEGER
,Customer_Number  INTEGER
,Order_Date       DATE
,Order_Total      DECIMAL(8,2)
) DISTRIBUTED BY (Order_Number) ;

Create an Index

CREATE INDEX Ord_Date_idx ON Order_Cluster (Order_Date) ;

Cluster the Index

CLUSTER Ord_Date_idx ON Order_Cluster ;

SELECT * FROM Order_Cluster
WHERE Order_Date BETWEEN '2015-05-01' AND '2015-05-31' ;

Range queries on date columns can benefit greatly from a clustered index. The above table is physically sorted on each segment by the column Order_Date. The query above does not need to do a Full Table Scan (FTS); it can use the index to read only the rows that fall within the sorted range.
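One way to check whether the range query actually uses the index is to put EXPLAIN in front of it. This is only a sketch: the plan you see depends on the data volume and the statistics, so no output is shown here.

```sql
-- Refresh statistics so the planner knows about the clustered data.
ANALYZE Order_Cluster ;

-- Show the plan for the range query instead of running it.
EXPLAIN
SELECT * FROM Order_Cluster
WHERE Order_Date BETWEEN '2015-05-01' AND '2015-05-31' ;
```

If the plan shows a sequential scan on a small test table, that is not necessarily a problem; the planner may simply judge a scan to be cheaper at that size.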


Another Way to Create a Clustered Table

Create the new table using an ORDER BY statement

CREATE TABLE Order_Cluster_New AS
SELECT * FROM Order_Table
ORDER BY Order_Date ;

Drop the old table

DROP TABLE Order_Table ;

Rename the new table to the original table

ALTER TABLE Order_Cluster_New RENAME TO Order_Table ;

Create an index on the column you did the ORDER BY on

CREATE INDEX O_Date_Idx ON Order_Table (Order_Date) ;

VACUUM and ANALYZE your newly created table

VACUUM ANALYZE Order_Table ;

Above is a different way to create a table that is sorted. We built a new table from Order_Table with an ORDER BY on Order_Date, dropped the old table, renamed the new table to the original name, created an index on Order_Date, and then ran VACUUM ANALYZE. The data on each segment is sorted by date, which helps with range queries. CLUSTER is not supported with append-only tables or column-oriented tables. Warning: You cannot drop a table that has dependencies such as views.


Creating a B-Tree Index and then Running Analyze

CREATE TABLE Emp_2000
(Employee_No  INTEGER
,Dept_No      INTEGER
,Last_Name    VARCHAR(1000)
) DISTRIBUTED BY (Dept_No) ;

CREATE INDEX Emp_Idx ON Emp_2000 (Employee_No) ;

ANALYZE Emp_2000 ;

EXPLAIN SELECT * FROM Emp_2000 WHERE Employee_No = 1000020 ;

Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..200.32 rows=1 width=64)
  -> Index Scan using emp_idx on emp_2000 (cost=0.00..200.32 rows=1 width=64)
       Index Cond: employee_no = 1000020

Greenplum provides the index methods B-tree, bitmap, and GiST. The default is a B-tree. Above, we have created a table and loaded it with over 72,000 rows. We then created a B-tree index (non-unique). We ran statistics on the table using the Analyze command. We then typed the keyword EXPLAIN in front of our query to see what type of scan would take place. An Index Scan was utilized. We now know that the index on Employee_No is being used by the system.
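The load step is not shown on this page. As a hedged sketch, test rows could be generated with generate_series before the index is created and ANALYZE is run. The value range, the Dept_No formula and the 'Name' prefix are all assumptions made up for the illustration, not the data the author used.

```sql
-- Sketch only: generates 73,000 made-up rows for Emp_2000.
INSERT INTO Emp_2000 (Employee_No, Dept_No, Last_Name)
SELECT i                    -- Employee_No 1000001 .. 1073000
      ,i % 10               -- ten hypothetical department numbers
      ,'Name' || i::text    -- placeholder last name
FROM generate_series(1000001, 1073000) AS i ;
```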


Creating a Bitmap Index

CREATE TABLE Emp_75000
(Employee_No  INTEGER
,Dept_No      INTEGER
,Last_Name    VARCHAR(1000)
) DISTRIBUTED BY (Employee_No) ;

CREATE INDEX Dept_bmp_Idx ON Emp_75000 USING bitmap (Dept_No) ;

ANALYZE Emp_75000 ;

EXPLAIN SELECT * FROM Emp_75000 WHERE Dept_No = 1000021 ;

Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..201.40 rows=1 width=64)
  -> Index Scan using dept_bmp_idx on emp_75000 (cost=0.00..201.40 rows=1 width=64)
       Index Cond: dept_no = 1000021

Greenplum provides the index methods B-tree, bitmap, and GiST. The default is a B-tree. Above, we have created a table and loaded it with over 75,000 rows. We then created a Bitmap Index. We ran statistics on the table using the Analyze command. We then typed the keyword EXPLAIN in front of our query to see what type of scan would take place. The plan shows a scan using dept_bmp_idx, so we now know that the bitmap index on Dept_No is being used by the system.


Why Create a Bitmap Index?

Bitmap indexes are most effective for queries that contain multiple conditions in the WHERE clause on large data warehouse tables that have few UPDATE and DELETE modifications.

Each bit in the bitmap corresponds to a possible tuple ID. If the bit is set, the row with the corresponding tuple ID contains the key value. A mapping function converts the bit position to a tuple ID.

Bitmap indexes are usually smaller than normal indexes because the bitmaps are compressed for storage. The size of a bitmap index is proportional to the number of rows in the table times the number of distinct values in the indexed column.

SELECT * FROM Employee_Table WHERE Dept_No = 100 AND Last_Name = 'Jones' ;

Queries where multiple columns are ANDed together can be a Bitmap candidate.

Greenplum provides the index methods B-tree, bitmap, and GiST. The default is a B-tree. Bitmap indexes are best used when a query ANDs together multiple columns that each have a bitmap index.
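A minimal sketch of that setup follows: one bitmap index per ANDed column, then EXPLAIN to see what the planner chooses. The index names Emp_Dept_Bmp and Emp_Name_Bmp are hypothetical, and whether the planner combines the two bitmaps depends on the data and statistics.

```sql
-- Sketch only: index names are hypothetical.
CREATE INDEX Emp_Dept_Bmp ON Employee_Table USING bitmap (Dept_No) ;
CREATE INDEX Emp_Name_Bmp ON Employee_Table USING bitmap (Last_Name) ;

ANALYZE Employee_Table ;

-- Both WHERE conditions have a bitmap index available.
EXPLAIN
SELECT * FROM Employee_Table
WHERE Dept_No = 100
AND   Last_Name = 'Jones' ;
```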


Tables Can Be Partitioned

Greenplum Database supports both range and list partitioning.

Range partitioning is based on a numerical range, such as a date range or price range. List partitioning is based on a list of values, such as regions, state codes or products. Greenplum supports multi-level partitioning, so a combination of both types is allowed.

Table partitioning logically divides large tables, such as fact tables, into smaller, more manageable tables. Partitioned tables improve query performance through partition elimination: instead of performing a Full Table Scan (FTS), only the partitions needed to satisfy the query are scanned.

Partitioning does not change the physical distribution of table data across segments. It changes how each segment stores its rows, keeping each partition separately.

A partitioned table can also help with data maintenance tasks, such as rolling old data out of the data warehouse or loading new data into the data warehouse.
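Range partitioning is shown on the following pages; for list partitioning, a hedged sketch might look like the one below. The table Cust_Tbl_List, its columns, and the region codes 'E' and 'W' are hypothetical.

```sql
-- Sketch only: Cust_Tbl_List and the region codes are hypothetical.
CREATE TABLE Cust_Tbl_List
( Customer_Number  integer
 ,Customer_Name    varchar(30)
 ,Region           char(1))
DISTRIBUTED BY (Customer_Number)
PARTITION BY LIST (Region)
( PARTITION east VALUES ('E')
 ,PARTITION west VALUES ('W')
 ,DEFAULT PARTITION other );      -- catches any region not listed above
```

The default partition is a design choice: without it, a row whose Region matches no listed value has no partition to land in.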


A Table Partitioned By Range (Per Month)

CREATE TABLE Ord_Tbl_Part
( Order_Number     integer
 ,Customer_Number  integer
 ,Order_Date       date
 ,Order_Total      decimal(10,2))
DISTRIBUTED BY (Order_Number)
PARTITION BY RANGE (Order_Date)
( START (date '2015-01-01') INCLUSIVE
  END   (date '2016-01-01') EXCLUSIVE
  EVERY (INTERVAL '1 month'));

[Figure: Segments 1 through 4 each hold a portion of Ord_Tbl_Part, divided into twelve monthly partitions, 01 (JAN) through 12 (DEC).]

Above is the CREATE statement for the Ord_Tbl_Part table. The table is distributed by hash on the column Order_Number, which determines the segment each row is placed on, and it is partitioned by Order_Date, which divides the data on each segment by month. This physical partitioning allows for faster loads and faster maintenance (Inserts, Updates, Deletes). This is the design you want when users are performing range queries on dates.


A Visual of a Partitioned Table by Range (Month)

[Figure: Segments 1 through 4 each hold a portion of Ord_Tbl_Part, divided into twelve monthly partitions, 01 (JAN) through 12 (DEC).]

SELECT * FROM Ord_Tbl_Part
WHERE Order_Date BETWEEN '2015-05-01' AND '2015-05-31' ;

Each segment holds rows that were hash distributed by Order_Number, but once the rows for the table arrive on their respective segments they are placed into monthly partitions by Order_Date. Each month is stored in separate blocks. The above range query will not do a Full Table Scan (FTS). Each segment merely needs to read its May partition.
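Partition elimination can be checked the same way index usage is checked, by putting EXPLAIN in front of the query. This is a sketch; the exact plan text varies by Greenplum version, so no output is shown.

```sql
-- With partition elimination, the plan should reference only the
-- partition that covers May 2015 rather than all twelve.
EXPLAIN
SELECT * FROM Ord_Tbl_Part
WHERE Order_Date BETWEEN '2015-05-01' AND '2015-05-31' ;
```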


Tables Can Be Partitioned by Day

CREATE TABLE Ord_Tbl_Day
( Order_Number     integer
 ,Customer_Number  integer
 ,Order_Date       date
 ,Order_Total      decimal(10,2))
DISTRIBUTED BY (Order_Number)
PARTITION BY RANGE (Order_Date)
( START (date '2015-01-01') INCLUSIVE
  END   (date '2016-01-01') EXCLUSIVE
  EVERY (INTERVAL '1 Day'));

SELECT * FROM Ord_Tbl_Day
WHERE Order_Date = '2015-05-31' ;

Each segment reads a single partition.

SELECT * FROM Ord_Tbl_Day
WHERE Order_Date BETWEEN '2015-05-01' AND '2015-05-07' ;

SELECT * FROM Ord_Tbl_Day
WHERE Order_Date >= '2015-05-01' AND Order_Date ...

... = 1030 is in page 3.

Chapter 4

The Technical Details

The Building of a B-Tree for a Clustered Index (3 of 3)

[Figure: a three-level B-tree on Employee_No.
Root Node: 1001, 3000, 6000
Intermediate Node 1001: leaf pages starting at 1001, 1030, 2000
Intermediate Node 3000: leaf pages starting at 3000, 4000, 5000
Intermediate Node 6000: leaf pages starting at 6000, 7000, 8000
Each leaf page has a header and contains the actual data rows.]

Let's look at this B-Tree starting at the leaf level. Each leaf is a 32 K page that contains data rows. Each data row has a RowID containing the FileID:PageNo:RowNum, which takes up 32 bytes. The rows are sorted in each page by Employee_No. Each Intermediate node has a pointer to the first RowID and Employee_No for every leaf it is responsible for. The Root node has a pointer to the first RowID and Employee_No for each Intermediate node. As a leaf adds rows and expands past 32 K it splits. As an Intermediate node adds leaves and expands past 32 K it splits into two Intermediate nodes. As the Root node continues to add more Intermediate node pointers and expands past 32 K it also splits in two, and a new Root node is created above the two halves, which adds a level to the tree. The reason they call it a B-Tree (Balanced Tree) is that every leaf is the same number of levels from the Root, so every row is retrieved by reading the same number of pages.


When Do I Create a Clustered Index?

1. OLTP-type applications where very fast single row lookup is required, typically by means of the primary key. Creating a clustered index on the primary key is ideal.

2. Clustered indexes are great for range queries that use operators such as BETWEEN, >, >=, ...

... AVGSAL

Result 1

   dept_no  last_name   first_name  salary    avgsal
1  200      Smith       John        48000.00  44944.44
2  400      Strickling  Cletus      54500.00  48333.33
3  400      Harrison    Herbert     54500.00  48333.33

Most derived tables involve calculations, aggregations or ordered analytics. This allows tables and derived columns to mix well on the final report. Above, we are finding all employees who make a salary that is greater than the average salary within their own department. We created a derived table that holds all departments and the average salary within the department. We then join the derived table (named TeraTom) to the employee_table where we can check the salary vs. the avg (salary).
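The query this paragraph describes can be sketched as follows. This is a reconstruction from the description and from the column list in the answer set, not necessarily the author's exact statement; the alias E is an assumption added to keep Dept_No unambiguous.

```sql
-- Sketch: employees who earn more than their department's average salary.
SELECT E.Dept_No, Last_Name, First_Name, Salary, AVGSAL
FROM Employee_Table AS E
INNER JOIN
     (SELECT Dept_No, AVG(Salary)
      FROM Employee_Table
      GROUP BY Dept_No) AS TeraTom (Depty, AVGSAL)   -- the derived table
ON E.Dept_No = Depty
WHERE Salary > AVGSAL ;
```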


Chapter 17

Temporary Tables

Quiz - Answer the Questions

SELECT Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table
INNER JOIN
(SELECT Dept_No, AVG(Salary)
 FROM Employee_Table
 GROUP BY Dept_No) as TeraTom (Depty, AVGSAL)
ON Dept_No = Depty ;

1) What is the name of the derived table? __________
2) How many columns are in the derived table? _______
3) What are the names of the derived table columns? ______
4) Is there more than one row in the derived table? _______
5) What common keys join the Employee and Derived? _______
6) Why were the join keys named differently? ______________


Answer to Quiz - Answer the Questions

SELECT Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table
INNER JOIN
(SELECT Dept_No, AVG(Salary)
 FROM Employee_Table
 GROUP BY Dept_No) as TeraTom (Depty, AVGSAL)
ON Dept_No = Depty ;

1) What is the name of the derived table? TeraTom
2) How many columns are in the derived table? 2
3) What are the names of the derived columns? Depty and AVGSAL
4) Is there more than one row in the derived table? Yes
5) What keys join the tables? Dept_No and Depty
6) Why were the join keys named differently? If both were named Dept_No, the query would fail with an ambiguous column error unless we fully qualified the names.


Clever Tricks on Aliasing Columns in a Derived Table

1. Alias the columns inside the derived table's SELECT list

SELECT Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table
INNER JOIN
(SELECT Dept_No as Depty, AVG(Salary) as AVGSAL
 FROM Employee_Table
 GROUP BY Dept_No) as TeraTom
ON Dept_No = Depty ;

2. Alias the tables and qualify the shared Dept_No column

SELECT E.Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table as E
INNER JOIN
(SELECT Dept_No, AVG(Salary) as AVGSAL
 FROM Employee_Table
 GROUP BY Dept_No) as TeraTom
ON E.Dept_No = TeraTom.Dept_No ;


An example of Two Derived Tables in a Single Query

WITH T (Dept_No, AVGSAL) AS
(SELECT Dept_No, AVG(Salary)
 FROM Employee_Table
 GROUP BY Dept_No)
SELECT T.Dept_No, First_Name, Last_Name, AVGSAL, Counter
FROM Employee_Table as E
INNER JOIN T
ON E.Dept_No = T.Dept_No
INNER JOIN
(SELECT Employee_No
       ,SUM(1) OVER(PARTITION BY Dept_No
                    ORDER BY Dept_No, Last_Name
                    Rows Unbounded Preceding)
 FROM Employee_Table) as S (Employee_No, Counter)
ON E.Employee_No = S.Employee_No
ORDER BY T.Dept_No;

We have two derived tables in our example. One is used in a WITH statement and the other is a derived table within the query itself.


MULTIPLE Derived Tables using the WITH Command

WITH E AS (SELECT Dept_No, Last_Name, Salary
           FROM Employee_Table)
    ,D AS (SELECT Dept_No, Department_Name
           FROM Department_Table)
SELECT E.*, department_name
FROM E INNER JOIN D
ON E.Dept_No = D.Dept_No
WHERE E.Dept_No = 100

Separate multiple derived tables in a WITH by using a comma.

Result 1

   e.dept_no  e.last_name  e.salary  department_name
1  100        Chambers     48850.00  Marketing

Using the WITH Command, we can CREATE multiple Derived tables that can be referenced elsewhere in the query.


Finding the First Occurrence

WITH Derived_Tbl AS
(select Product_ID as Prod, Sale_Date, Daily_Sales,
        Row_Number() over (PARTITION BY product_id
                           ORDER BY Sale_Date ASC) AS Row_Num
 from sales_table)
Select * from Derived_Tbl
Where Row_Num = 1 ;

Result 1

   Prod  Sale_Date   Daily_Sales  Row_Num
1  1000  09/28/2000  48850.40     1
2  2000  09/28/2000  41888.88     1
3  3000  09/28/2000  61301.77     1

By using the Row_Number ordered analytic, partitioning by Product_ID and sorting by Sale_Date ASC, we bring back only the first occurrence of each product, which is the row with the earliest Sale_Date. This can be done because we are placing our query in a derived table and then selecting from that derived table using a WHERE clause.


Finding the Last Occurrence

WITH Derived_Tbl AS
(select Product_ID as Prod, Sale_Date, Daily_Sales,
        Row_Number() over (PARTITION BY product_id
                           ORDER BY Sale_Date Desc) AS Row_Num
 from sales_table)
Select * from Derived_Tbl
Where Row_Num = 1 ;

Result 1

   Prod  Sale_Date   Daily_Sales  Row_Num
1  1000  10/04/2000  54553.10     1
2  2000  10/04/2000  32800.50     1
3  3000  10/04/2000  15675.33     1

By using the Row_Number ordered analytic, partitioning by Product_ID and sorting by Sale_Date DESC, we bring back only the last occurrence of each product, which is the row with the latest Sale_Date. This can be done because we are placing our query in a derived table and then selecting from that derived table using a WHERE clause.
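The first and last occurrences can also be fetched in one pass by numbering the rows in both directions. This is a sketch against the same sales_table; the aliases First_Num and Last_Num are hypothetical.

```sql
-- Sketch only: number each product's rows ascending and descending by date.
WITH Derived_Tbl AS
(SELECT Product_ID AS Prod, Sale_Date, Daily_Sales
       ,ROW_NUMBER() OVER (PARTITION BY Product_ID
                           ORDER BY Sale_Date ASC)  AS First_Num
       ,ROW_NUMBER() OVER (PARTITION BY Product_ID
                           ORDER BY Sale_Date DESC) AS Last_Num
 FROM sales_table)
SELECT Prod, Sale_Date, Daily_Sales
FROM   Derived_Tbl
WHERE  First_Num = 1      -- earliest Sale_Date per product
OR     Last_Num  = 1      -- latest Sale_Date per product
ORDER BY Prod, Sale_Date ;
```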


Three Steps to Creating a Temporary Table

Create the Temporary Table

CREATE Temporary TABLE Dept_Sum
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ON COMMIT PRESERVE ROWS ;

Populate the Temporary Table With an INSERT/SELECT

INSERT INTO Dept_Sum
SELECT Dept_No
      ,SUM(Salary)
FROM Employee_Table
GROUP BY 1 ;

Query the table all session long

SELECT * FROM Dept_Sum ORDER BY Dept_No;

SELECT * FROM Dept_Sum ORDER BY 2 DESC;

When you use the phrase ON COMMIT PRESERVE ROWS, the data stays in the table for the whole session. The normal ANSI default is ON COMMIT DELETE ROWS, which deletes the rows at the end of each transaction. However, Greenplum defaults to ON COMMIT PRESERVE ROWS.


Three Versions of Creating a Temporary Table

CREATE Temporary TABLE Dept_Agg1
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ON COMMIT PRESERVE ROWS ;

CREATE Temporary TABLE Dept_Agg2
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ON COMMIT DELETE ROWS ;

CREATE Temporary TABLE Dept_Agg3
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ON COMMIT DROP ;

I will explain how to use these different techniques in the next few pages.


ON COMMIT PRESERVE ROWS is the Greenplum Default

CREATE Temporary TABLE Dept_Agg5
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ON COMMIT PRESERVE ROWS ;

CREATE Temporary TABLE Dept_Agg6
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ;

This will default to ON COMMIT PRESERVE ROWS

ANSI defaults to ON COMMIT DELETE ROWS, but Greenplum has cleverly made their default ON COMMIT PRESERVE ROWS. Both examples above accomplish the same thing.


ON COMMIT DELETE ROWS

CREATE Temporary TABLE Dept_Agg8
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ON COMMIT DELETE ROWS ;

INSERT INTO Dept_Agg8
SELECT Dept_No, SUM(Salary)
FROM Employee_Table
GROUP BY 1;

SELECT * FROM Dept_Agg8 Order by 1;

dept_no  sum_salary
_______  __________

No rows returned because the table is empty

ON COMMIT DELETE ROWS empties the table at the end of every transaction. Because the INSERT/SELECT above ran as its own transaction, the table's rows were deleted as soon as it committed. This seems stupid at first, but it is actually smart. The next page will show you how to take advantage of this and why it is used.


How to Use the ON COMMIT DELETE ROWS Option

CREATE Temporary TABLE Dept_Agg7
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ON COMMIT DELETE ROWS ;

Begin Transaction;

INSERT INTO Dept_Agg7
SELECT Dept_No, SUM(Salary)
FROM Employee_Table
GROUP BY 1;

SELECT * FROM Dept_Agg7 Order by 1;

Answer Set Returns

dept_no  sum_salary
_______  __________
10         64300.00
100        48850.00
200        89888.88
300        40200.00
400       145000.00
?          32800.50

End Transaction;

The ON COMMIT DELETE ROWS option keeps the rows only until the current transaction ends, but you can embed the INSERT/SELECT and the SELECT that produces the report you need inside a Begin Transaction/End Transaction block. This option should be used if you only need the temporary table to produce a single report.


ON COMMIT DROP

CREATE Temporary TABLE Dept_Aggb
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ON COMMIT DROP ;

INSERT INTO Dept_Aggb
SELECT Dept_No, SUM(Salary)
FROM Employee_Table
GROUP BY 1;

SELECT * FROM Dept_Aggb Order by 1;

Error – The table Dept_Aggb does not exist!

ON COMMIT DROP drops the temporary table at the end of the transaction in which it was created. That is why the error above occurred: outside an explicit transaction the CREATE statement is its own transaction, so the table was dropped as soon as the CREATE committed and the statements that followed could not find it. This seems stupid at first, but it is actually smart. The next page will show you how to take advantage of this and why it is used.


How to Use the ON COMMIT DROP Option

Begin Transaction;

CREATE Temporary TABLE Dept_AggA
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ON COMMIT DROP ;

INSERT INTO Dept_AggA
SELECT Dept_No, SUM(Salary)
FROM Employee_Table
GROUP BY 1;

SELECT * FROM Dept_AggA Order by 1;

Answer Set Returns

dept_no  sum_salary
_______  __________
10         64300.00
100        48850.00
200        89888.88
300        40200.00
400       145000.00
?          32800.50

End Transaction;

The ON COMMIT DROP option drops the temporary table at the end of the transaction that contains the CREATE statement. However, you can embed the CREATE statement, the INSERT/SELECT and the SELECT that produces the report you need inside a Begin Transaction/End Transaction block. The great news is that once the transaction ends, the table no longer exists!


Create Table AS

This table is exactly like the Order_Table

CREATE TABLE New_Order AS
SELECT * FROM Order_Table ;

This table uses only some columns

CREATE TABLE New_Employee AS
SELECT First_Name
      ,Last_Name
      ,Salary
FROM Employee_Table ;

This table is a temporary table

CREATE Temporary TABLE temp_order AS
SELECT * FROM Order_Table ;

Above are some great examples of how to quickly CREATE a table from another table.


Creating a Temporary Table Using a CTAS that Joins Multiple Tables

CREATE Temporary Table Emp_Dept_Temp AS
SELECT E.*, Department_Name, Budget
FROM Employee_Table E
INNER JOIN Department_Table D
ON E.Dept_No = D.Dept_No;

A CTAS statement can create and populate a Temporary table. The table goes away after the session is over.

Result 1

0 rows processed. CREATE TABLE Command Complete

Only the user who created the table can see it, and only in the session it was created in.

You can create a temporary table using a CTAS (Create Table AS) statement, as in the above example.
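In Greenplum a CTAS can also name the distribution key for the new table, which is placed after the query. The sketch below assumes Employee_Table has the Employee_No, Dept_No, Last_Name and Salary columns used elsewhere in this chapter; the table name Emp_Dept_Temp2 is hypothetical.

```sql
-- Sketch only: Emp_Dept_Temp2 is a hypothetical table name.
CREATE TEMPORARY TABLE Emp_Dept_Temp2 AS
SELECT E.Employee_No, E.Dept_No, E.Last_Name, E.Salary
      ,D.Department_Name, D.Budget
FROM Employee_Table E
INNER JOIN Department_Table D
ON E.Dept_No = D.Dept_No
DISTRIBUTED BY (Employee_No) ;   -- choose the distribution key explicitly
```

Listing the columns instead of using E.* makes it clear which column the DISTRIBUTED BY clause refers to.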


Create Table LIKE

This example uses an INSERT/SELECT

CREATE TABLE Sales3 (LIKE Sales_Table);

INSERT INTO Sales3
SELECT * FROM Sales_Table;

SELECT * FROM Sales3;

This example creates a temporary table

CREATE Temporary TABLE Sales4 (LIKE Sales_Table);

INSERT INTO Sales4
SELECT * FROM Sales_Table;

SELECT * FROM Sales4;

The first example above creates a table using the LIKE statement. It then loads the data with an INSERT/SELECT. You are now ready to query the new table. Notice that you can use the same technique to create a temporary table.


Creating a Clustered Index on a Temporary Table

Create the Table

CREATE Temporary TABLE Dept_Agg_Vol
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ON COMMIT PRESERVE ROWS ;

Create an Index (Temp_Idx is the index name, Dept_No is the indexed column)

CREATE INDEX Temp_Idx ON Dept_Agg_Vol (Dept_No) ;

Cluster the Index

CLUSTER Temp_Idx ON Dept_Agg_Vol ;

Above, we have sorted the temporary table Dept_Agg_Vol on each segment by Dept_No. You can have only one clustered index on a table because you can only sort the rows one specific way. Having a Clustered Index on the Dept_No column will help with range queries, because the data on each segment is sorted by Dept_No. CLUSTER is not supported with append-only tables or column-oriented tables.


Chapter 18 – Character Strings

“It’s always been and always will be the same in the world: the horse does the work and the coachman is tipped.” - Anonymous


The LENGTH Command Counts Characters

SELECT First_Name
      ,LENGTH (First_Name) AS Lnth
FROM Employee_Table
WHERE LENGTH (First_Name) < 7
ORDER BY 1;

Result 1

   first_name  Lnth
1  Billy       5
2  Cletus      6
3  John        4
4  Mandee      6

The LENGTH command counts the number of characters. If 'Tom' was in the Employee_Table, his length would be three.


The LENGTH Command – Spaces can Count too

SELECT 'T o m' AS First_Name
      ,LENGTH('T o m') AS Lnth

There are spaces in between each letter.

Result 1

   First_Name  Lnth
1  T o m       5

Spaces in between count.

If ‘T o m’ was in the Employee_Table, his length would be 5. Yes, spaces in between do count as characters.


The LENGTH Command Doesn't Count Trailing Spaces

Last_Name is a CHAR (20) column.

SELECT Last_Name
      ,LENGTH (Last_Name) AS Lnth
FROM Employee_Table
ORDER BY 1;

Last_Name   Lnth
__________  _____
Chambers    8
Coffing     7
Harrison    8
Jones       5
Larkins     7
Reilly      6
Smith       5
Smythe      6
Strickling  10

The LENGTH command counts characters, but it auto-trims the trailing spaces at the end of each last name.
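The trimming applies to fixed-length CHAR values. A small sketch that contrasts CHAR with VARCHAR on the same literal (the aliases Char_Lnth and Varchar_Lnth are hypothetical):

```sql
-- 'Tom' followed by three spaces, cast two ways.
SELECT LENGTH(CAST('Tom   ' AS CHAR(20)))    AS Char_Lnth     -- 3: trailing spaces are ignored
      ,LENGTH(CAST('Tom   ' AS VARCHAR(20))) AS Varchar_Lnth  -- 6: trailing spaces are counted
;
```

The counts in the comments reflect PostgreSQL behavior, which Greenplum inherits; they are worth confirming on your own release.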


UPPER and LOWER Commands Nexus Chameleon File Edit View Query Tools Help Web Windows System: Oracle

Systems + + + + + + + + + + + + + + +

Aster Data Azure Cloud DB2 Excel Greenplum Hadoop Kognitio Netezza Oracle Matrix Redshift SQL Server Sybase Teradata Vertica

Database: SQL Class

History EXECUTE

?

Query 1 Query 2 Query 3 SELECT Last_Name AS "Name_Normal" ,UPPER (Last_Name) AS "Name_Upper" ,LOWER (Last_name) AS "Name_Lower" FROM Employee_Table WHERE Last_Name LIKE 'S%' ; Messages

Garden of Analysis

Result 1

Name_Normal Name_Upper Name_Lower smythe SMYTHE Smythe 1 STRICKLING strickling Strickling 2 smith SMITH Smith 3

Upper () converts text to uppercase and Lower () converts text to lowercase.

Using the LOWER Command

SELECT LOWER('AbCdE') AS "Go Low"
FROM   Order_Table
LIMIT  1 ;

Go Low
______
abcde

The LOWER function converts all letters in a specified string to lowercase letters. If there are characters in the string that are not letters, they are not affected by the LOWER command.

A LOWER Command Example

SELECT 'They match' AS "Do They Match?"
FROM   Order_Table
WHERE  LOWER('ABCDE') = 'abcde'
LIMIT  1 ;

Do They Match?
______________
They match

The LOWER function converts all letters in a specified string to lowercase letters. If there are characters in the string that are not letters, they are not affected by the LOWER command. Above, we compare LOWER('ABCDE') = 'abcde' and they are now equivalent because we have lowercased the 'ABCDE'.

Using the UPPER Command

SELECT UPPER('AbCdE') AS "Go upper"
FROM   Order_Table
LIMIT  1 ;

Go upper
________
ABCDE

The UPPER function converts all letters in a specified string to uppercase letters. If there are characters in the string that are not letters, they are not affected by the UPPER command.

An UPPER Command Example

SELECT 'They match' AS "Do They Match?"
FROM   Order_Table
WHERE  'ABCDE' = UPPER('abcde')
LIMIT  1 ;

Do They Match?
______________
They match

The UPPER function converts all letters in a specified string to uppercase letters. If there are characters in the string that are not letters, they are not affected by the UPPER command. Above, we compare 'ABCDE' = UPPER('abcde') and they are now equivalent because we have uppercased the 'abcde'.

Non-Letters are Unaffected by UPPER and LOWER

SELECT LOWER('ABCDE1') AS "Number Stays"
      ,UPPER('abCdE2') AS "Numbers Hold"
FROM   Order_Table
LIMIT  1 ;

Number Stays  Numbers Hold
____________  ____________
abcde1        ABCDE2

The UPPER and LOWER functions convert all letters in a specified string to either upper or lower case letters. If there are characters in the string that are not letters, they are not affected by the UPPER or LOWER commands. Notice in our example that the numbers 1 and 2 were unaffected by the LOWER and UPPER commands.
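Where this really pays off is in a WHERE clause. The sketch below is one way to search Employee_Table without caring how the name was typed; the search value 'SMITH' is just an illustration.

```sql
-- Fold both sides to the same case so 'smith', 'SMITH' and 'Smith' all match.
SELECT First_Name
      ,Last_Name
FROM   Employee_Table
WHERE  LOWER(Last_Name) = LOWER('SMITH') ;
```

With the Employee_Table used in this chapter, this should bring back the row for John Smith.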

The CHARACTER_LENGTH Command Counts Characters

Employee_Table

Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

SELECT First_Name
      ,CHARACTER_LENGTH(First_Name) AS Lnth
FROM   Employee_Table
WHERE  CHARACTER_LENGTH(First_Name) < 7
ORDER BY 1;

First_Name is a VARCHAR column.

Answer Set

First_Name  Lnth
__________  ____
Billy       5
Cletus      6
John        4
Mandee      6

The CHARACTER_LENGTH command counts the number of characters. If ‘Tom’ was in the Employee_Table, his length would be three.

The CHARACTER_LENGTH Command and Character Data

SELECT Last_Name
      ,CHARACTER_LENGTH(Last_Name) AS Lnth
FROM   Employee_Table
ORDER BY 1;

Last_Name is a CHAR(20) column.

Last_Name   Lnth
__________  ____
Chambers    8
Coffing     7
Harrison    8
Jones       5
Larkins     7
Reilly      6
Smith       5
Smythe      6
Strickling  10

The CHARACTER_LENGTH command brings back the length of the name itself, even for a CHAR(20) data type.

CHARACTER_LENGTH and OCTET_LENGTH

Query 1

SELECT First_Name
      ,CHARACTER_LENGTH(First_Name) AS C_Length
FROM   Employee_Table ;

Query 2

SELECT First_Name
      ,OCTET_LENGTH(First_Name) AS C_Length
FROM   Employee_Table ;

You can also use the OCTET_LENGTH command. OCTET_LENGTH counts bytes rather than characters, so these two queries get the same exact answer sets as long as every character in First_Name is stored in a single byte.
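The two functions only part ways when a character needs more than one byte. Here is a minimal sketch, assuming the database uses UTF-8 encoding (where the accented é takes two bytes); the name 'José' is not in Employee_Table and is only there for illustration.

```sql
SELECT CHARACTER_LENGTH('José') AS C_Length  -- expected: 4 characters
      ,OCTET_LENGTH('José')     AS O_Length  -- expected: 5 bytes under UTF-8
;
```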

The TRIM Command trims both Leading and Trailing Spaces

Query 1

SELECT Last_Name
      ,TRIM(Last_Name) AS No_Spaces
FROM   Employee_Table ;

Query 2

SELECT Last_Name
      ,TRIM(BOTH FROM Last_Name) AS No_Spaces
FROM   Employee_Table ;

Both queries above do the exact same thing. They remove the leading and trailing spaces from the column Last_Name.

Trim Combined with the CHARACTER_LENGTH Command

SELECT '  Rodriquez  '
      ,CHARACTER_LENGTH(TRIM('  Rodriquez  ')) AS No_Spaces ;

The literal has 2 front spaces and 2 back spaces.

'  Rodriquez  '  No_Spaces
_______________  _________
  Rodriquez      9

This will allow for the character count to only be 9 because both the leading and trailing spaces have been cut.

How to TRIM only the Trailing Spaces

SELECT '  Rodriquez  '
      ,CHARACTER_LENGTH(TRIM(TRAILING FROM '  Rodriquez  ')) AS Front_Spaces ;

The literal has 2 front spaces and 2 back spaces.

'  Rodriquez  '  Front_Spaces
_______________  ____________
  Rodriquez      11

The TRAILING FROM option allows you to TRIM only the spaces behind the name. We get a character count of 11 because we are only cutting off the two trailing spaces and not the two beginning spaces.
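Here are all three TRIM options side by side on the same literal, as a quick sketch. The literal has 2 front spaces and 2 back spaces, so it starts at 13 characters; the aliases are made up for the illustration.

```sql
SELECT CHARACTER_LENGTH('  Rodriquez  ')                      AS Untouched      -- expected: 13
      ,CHARACTER_LENGTH(TRIM(LEADING  FROM '  Rodriquez  '))  AS No_Front       -- expected: 11
      ,CHARACTER_LENGTH(TRIM(TRAILING FROM '  Rodriquez  '))  AS No_Back        -- expected: 11
      ,CHARACTER_LENGTH(TRIM(BOTH     FROM '  Rodriquez  '))  AS No_Front_Back  -- expected: 9
;
```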

REGEXP_REPLACE

SELECT Dept_No
      ,REGEXP_REPLACE(Dept_No, 0, 1) AS Zero_to_1
FROM   Employee_Table
WHERE  Dept_No IN (100, 200) ;

Replace 0 with 1 for Dept_No for the first occurrence.

Dept_No  Zero_to_1
_______  _________
200      210
100      110
200      210

The query above replaces the first occurrence of a zero with a one for the column Dept_No.
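REGEXP_REPLACE works on character strings, so here is a hedged sketch with string arguments. It assumes the PostgreSQL-style function that Greenplum provides, where an optional fourth argument of 'g' (global) replaces every match instead of only the first. The alias names are made up.

```sql
SELECT REGEXP_REPLACE('200', '0', '1')       AS First_Only  -- expected: 210
      ,REGEXP_REPLACE('200', '0', '1', 'g')  AS All_Zeros   -- expected: 211
;

-- Against the table, an explicit CAST hands the function a character string.
SELECT Dept_No
      ,REGEXP_REPLACE(CAST(Dept_No AS VARCHAR(10)), '0', '1', 'g') AS All_Zeros
FROM   Employee_Table
WHERE  Dept_No IN (100, 200) ;
```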

Concatenation

SELECT First_Name
      ,Last_Name
      ,First_Name || ' ' || Last_Name AS Full_Name
FROM   Employee_Table
WHERE  First_Name = 'Squiggy'

Two pipe symbols || mean concatenation.

first_name  last_name  Full_Name
__________  _________  _____________
Squiggy     Jones      Squiggy Jones

Concatenation allows you to combine multiple columns into one column. The || (Pipe Symbol) on your keyboard is just above the ENTER key. Don’t put a space in between, but just put two Pipe Symbols together. In this example, we have combined the first name, then a single space, and then the last name to get a new column called Full_Name.

A Visual of the TRIM Command Using Concatenation

Concatenation without Trim and with Trim

SELECT Last_Name
      ,First_Name
      ,Last_Name || First_Name       AS NameBackwards
      ,TRIM(Last_Name) || First_Name AS TrimNameBackwards
FROM   Employee_Table

Last_Name   First_Name
__________  __________
Jones       Squiggy
Smith       John
Smythe      Richard
Harrison    Herbert
Chambers    Mandee
Strickling  Cletus
Reilly      William
Coffing     Billy
Larkins     Loraine

NameBackwards                TrimNameBackwards
___________________________  _________________
Jones               Squiggy  JonesSquiggy
Smith               John     SmithJohn
Smythe              Richard  SmytheRichard
Harrison            Herbert  HarrisonHerbert
Chambers            Mandee   ChambersMandee
Strickling          Cletus   StricklingCletus
Reilly              William  ReillyWilliam
Coffing             Billy    CoffingBilly
Larkins             Loraine  LarkinsLoraine

When you use the TRIM command on a column, that column will have all beginning and ending spaces removed.

Trim and Trailing is Case Sensitive

SELECT First_Name
      ,TRIM(TRAILING 'Y' FROM First_Name) AS No_Y
      ,TRIM(TRAILING 'y' FROM First_Name) AS Success
FROM   Employee_Table
ORDER BY 1;

First_Name is a VARCHAR column. No_Y looks for a capital 'Y' and Success looks for a lower case 'y'.

First_Name  No_Y     Success
__________  _______  _______
Billy       Billy    Bill
Cletus      Cletus   Cletus
Herbert     Herbert  Herbert
John        John     John
Loraine     Loraine  Loraine
Mandee      Mandee   Mandee
Richard     Richard  Richard
Squiggy     Squiggy  Squigg
William     William  William

For LEADING and TRAILING TRIM commands, the character you ask for must match the case of the data.

How to TRIM Trailing Letters

SELECT First_Name
      ,TRIM(TRAILING 'y' FROM First_Name)         AS No_Y
      ,Last_Name
      ,TRIM(TRAILING 'g' FROM (TRIM(Last_Name)))  AS No_G
FROM   Employee_Table ;

First_Name is a VARCHAR column and Last_Name is a CHAR(20) column.

First_Name  No_Y     Last_Name   No_G
__________  _______  __________  _________
Squiggy     Squigg   Jones       Jones
John        John     Smith       Smith
Richard     Richard  Smythe      Smythe
Herbert     Herbert  Harrison    Harrison
Mandee      Mandee   Chambers    Chambers
Cletus      Cletus   Strickling  Stricklin
William     William  Reilly      Reilly
Billy       Bill     Coffing     Coffin
Loraine     Loraine  Larkins     Larkins

The above example removed the trailing ‘y’ from the First_Name and the trailing ‘g’ from the Last_Name. Remember that this is case sensitive.
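Two details are easy to miss here, so this sketch uses literals to show them. It assumes PostgreSQL-style TRIM, which Greenplum inherits: TRIM keeps cutting while the end of the string is still one of the listed characters, and the list can hold more than one character. The literal values are only illustrations.

```sql
SELECT TRIM(TRAILING 'g'  FROM 'Squigg')  AS Both_Gs_Gone  -- expected: Squi
      ,TRIM(TRAILING 'yY' FROM 'BillY')   AS Either_Case   -- expected: Bill
      ,TRIM(LEADING  'S'  FROM 'Smith')   AS No_Lead_S     -- expected: mith
;
```

Listing both 'y' and 'Y' is one way around the case sensitivity when you do not know how the data was typed.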

The SUBSTRING Command

SELECT First_Name
      ,SUBSTRING(First_Name FROM 2 FOR 3) AS Quiz
FROM   Employee_Table ;

Start in position 2 and go for 3 positions.

First_Name  Quiz
__________  ____
Squiggy     qui
John        ohn
Richard     ich
Herbert     erb
Mandee      and
Cletus      let
William     ill
Billy       ill
Loraine     ora

This is a SUBSTRING. The substring is passed two parameters, and they are the starting position of the string and the number of positions to return (from the starting position). The above example will start in position 2 and go for 3 positions!

SUBSTRING and SUBSTR are equal, but use different syntax

Query 1 with Substring

SELECT First_Name
      ,SUBSTRING(First_Name FROM 2 FOR 3) AS Quiz
FROM   Employee_Table ;

Query 2 with Substr

SELECT First_Name
      ,SUBSTR(First_Name, 2, 3) AS Quiz2
FROM   Employee_Table ;

Both queries above are going to yield the same results! SUBSTR is just a different way of doing a substring. Both take two parameters: the starting position and the number of characters to return.
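Here is a quick way to convince yourself, using a literal so no table is needed. The third column uses the comma form of SUBSTRING, which also appears later in this chapter; the aliases are made up.

```sql
SELECT SUBSTRING('Squiggy' FROM 2 FOR 3)  AS Ansi_Form    -- expected: qui
      ,SUBSTR('Squiggy', 2, 3)            AS Substr_Form  -- expected: qui
      ,SUBSTRING('Squiggy', 2, 3)         AS Comma_Form   -- expected: qui
;
```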

How SUBSTRING Works with NO ENDING POSITION

SELECT First_Name
      ,SUBSTRING(First_Name FROM 2) AS GoToEnd
FROM   Employee_Table ;

Start in position 2.

First_Name  GoToEnd
__________  _______
Squiggy     quiggy
John        ohn
Richard     ichard
Herbert     erbert
Mandee      andee
Cletus      letus
William     illiam
Billy       illy
Loraine     oraine

If you don’t tell the Substring the end position, it will go all the way to the end.

Using SUBSTRING to move backwards

SELECT First_Name
      ,SUBSTRING(First_Name FROM 0 FOR 6) AS Before1
FROM   Employee_Table ;

Start in position 0 (one space before).

First_Name  Before1
__________  _______
Squiggy     Squig
John        John
Richard     Richa
Herbert     Herbe
Mandee      Mande
Cletus      Cletu
William     Willi
Billy       Billy
Loraine     Lorai

A starting position of zero moves one space in front of the beginning. Notice that our FOR length is 6, yet ‘Squiggy’ turns into ‘Squig’. Position 0 holds no character, so only five of the six positions deliver a letter. The point being made here is that the starting position can move in front of the string, which will come in handy as you see other examples.

How SUBSTRING Works with a Starting Position of -1

SELECT First_Name
      ,SUBSTRING(First_Name FROM -1 FOR 3) AS Before2
FROM   Employee_Table ;

Start in position -1. This is two spaces before.

First_Name  Before2
__________  _______
Squiggy     S
John        J
Richard     R
Herbert     H
Mandee      M
Cletus      C
William     W
Billy       B
Loraine     L

A starting position of -1 moves two spaces in front of the beginning. Notice that our FOR length is 3, so the three positions are -1, 0 and 1, and each name delivers only the first initial. The point being made here is that the starting position can move in front of the string, which will come in handy as you see other examples.

How SUBSTRING Works with an Ending Position of 0

SELECT First_Name
      ,SUBSTRING(First_Name FROM 3 FOR 0) AS WhatsUp
FROM   Employee_Table ;

Go for 0 positions.

First_Name  WhatsUp
__________  _______
Squiggy
John
Richard
Herbert
Mandee
Cletus
William
Billy
Loraine

In our example above, we start in position 3, but we go for zero positions, so nothing is delivered in the column. That is what’s up!
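One way to keep these odd cases straight: SUBSTRING covers the positions from the start through start + length - 1, and only the positions that really exist in the string (1 and up) deliver a character. The sketch below walks 'Squiggy' through the starting positions 0, -1 and 3 with literals, assuming PostgreSQL-style behavior.

```sql
SELECT SUBSTRING('Squiggy' FROM  0 FOR 6) AS Before1  -- positions 0 to 5,  expected: Squig
      ,SUBSTRING('Squiggy' FROM -1 FOR 3) AS Before2  -- positions -1 to 1, expected: S
      ,SUBSTRING('Squiggy' FROM  3 FOR 0) AS WhatsUp  -- no positions,      expected: empty string
;
```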

An example using SUBSTRING, TRIM and CHAR Together

SELECT Last_Name
      ,SUBSTRING(Last_Name
                 FROM CHARACTER_LENGTH(TRIM(TRAILING FROM Last_Name)) - 1
                 FOR 2) AS Letters
FROM   Employee_Table ;

Last_Name is a CHAR(20) column.

Last_Name   Letters
__________  _______
Jones       es
Smith       th
Smythe      he
Harrison    on
Chambers    rs
Strickling  ng
Reilly      ly
Coffing     ng
Larkins     ns

The SQL above brings back the last two letters of each Last_Name even though the last names are of different length. We first trimmed the trailing spaces off of Last_Name. Then we counted the characters in the Last_Name. Then we subtracted one from the Last_Name character length and passed it to our substring as the starting position. The FOR 2 then delivers that character and the one after it, which are the last two letters.
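If the nesting is hard to follow, it can help to run the pieces one at a time. This is a sketch for the single name 'Coffing', cast to CHAR(20) to mimic the Last_Name column; the alias names are made up.

```sql
SELECT CHARACTER_LENGTH(TRIM(TRAILING FROM CAST('Coffing' AS CHAR(20))))      AS Trimmed_Len  -- expected: 7
      ,CHARACTER_LENGTH(TRIM(TRAILING FROM CAST('Coffing' AS CHAR(20)))) - 1  AS Start_Pos    -- expected: 6
      ,SUBSTRING('Coffing' FROM 6 FOR 2)                                      AS Letters      -- expected: ng
;
```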

The POSITION Command finds a Letter's Position

SELECT Last_Name
      ,POSITION('e' IN Last_Name) AS Find_The_E
      ,POSITION('f' IN Last_Name) AS Find_The_F
FROM   Employee_Table ;

Last_Name   Find_The_E  Find_The_F
__________  __________  __________
Jones       4           0
Smith       0           0
Smythe      6           0
Harrison    0           0
Chambers    6           0
Strickling  0           0
Reilly      2           0
Coffing     0           3
Larkins     0           0

In Jones, the e is in the 4th position. In Reilly, the e is in the 2nd position. A 0 means the letter is not in the name. In Coffing, the 1st f is in the 3rd position.

This is the position counter. It tells you what position a letter is in. Why did Jones have a 4 in the result set? The ‘e’ was in the 4th position. Why did Smith get a zero for both columns? There is no ‘e’ in Smith and no ‘f’ in Smith. If there are two ‘f’s, only the first occurrence is reported.
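A short sketch with literals shows the first-occurrence rule, plus two things the table does not: POSITION is case sensitive (assuming PostgreSQL-style behavior, which Greenplum inherits), and its answer can be fed straight into SUBSTRING. The aliases are made up.

```sql
SELECT POSITION('f' IN 'Coffing')  AS First_F  -- expected: 3 (the second f in position 4 is ignored)
      ,POSITION('F' IN 'Coffing')  AS Upper_F  -- expected: 0 (capital F is not in the name)
      ,SUBSTRING('Reilly' FROM 1 FOR POSITION('e' IN 'Reilly')) AS Up_To_E  -- expected: Re
;
```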

Concatenation

Two Pipe Symbols together (no space) mean concatenate.

SELECT First_Name
      ,Last_Name
      ,First_Name || ' ' || Last_Name AS Full_Name
FROM   Employee_Table
WHERE  First_Name = 'Squiggy'

The ' ' in the middle is a space.

First_Name  Last_Name  Full_Name
__________  _________  _____________
Squiggy     Jones      Squiggy Jones

See those || symbols? Those represent concatenation. That allows you to combine multiple columns into one column. The || (Pipe Symbol) on your keyboard is just above the ENTER key. Don’t put a space in between, but just put two Pipe Symbols together. In this example, we have combined the first name, then a single space and then the last name to get a new column called Full_Name, like Squiggy Jones.

Concatenation and SUBSTRING

SELECT First_Name
      ,Last_Name
      ,SUBSTRING(First_Name, 1, 1) || '. ' || Last_Name AS Full_Name
FROM   Employee_Table
WHERE  First_Name = 'Squiggy'

The '. ' in the middle is a period (.) and a space.

First_Name  Last_Name  Full_Name
__________  _________  _________
Squiggy     Jones      S. Jones

Of the three items being concatenated together, what is the first item of concatenation in the example above? The first initial of the First_Name. Then, we concatenated a literal period and a space. Then, we concatenated the Last_Name.

Four Concatenations Together

SELECT First_Name
      ,Last_Name
      ,TRIM(Last_Name) || ' ' || SUBSTRING(First_Name, 1, 1) || '.' AS Last_Name_1st
FROM   Employee_Table
WHERE  First_Name = 'Squiggy' ;

Last_Name is a CHAR(20) column and First_Name is a VARCHAR(12) column.

First_Name  Last_Name  Last_Name_1st
__________  _________  _____________
Squiggy     Jones      Jones S.

Why did we TRIM the Last_Name? To get rid of the spaces or the output would have looked odd. How many items are being concatenated in the example above? There are 4 items concatenated. We start by trimming the Last_Name. Then we concatenate a single space. Then, we concatenate the first initial of the first name. And finally we concatenate a period.

Troubleshooting Concatenation

ERROR: There should never be spaces between the pipe symbols.

SELECT First_Name
      ,Last_Name
      ,TRIM(Last_Name) | | First_Name AS LastFirst
FROM   Employee_Table
WHERE  First_Name = 'Squiggy' ;

This is now perfect:

SELECT First_Name
      ,Last_Name
      ,TRIM(Last_Name) || First_Name AS LastFirst
FROM   Employee_Table
WHERE  First_Name = 'Squiggy' ;

First_Name  Last_Name  LastFirst
__________  _________  ____________
Squiggy     Jones      JonesSquiggy

What happened above to cause the error? Can you see it? The Pipe Symbols || have a space between them like | |, when it should be ||. It is a tough one to spot, so be careful.
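Another concatenation surprise worth checking for is a NULL. In PostgreSQL-style SQL, which Greenplum follows, a NULL anywhere in a || chain makes the whole result NULL. The sketch below uses a typed NULL literal to stand in for a missing column value, and COALESCE (covered in the next chapter) as one possible fix; the aliases are made up.

```sql
SELECT 'Squiggy' || ' ' || CAST(NULL AS VARCHAR(20))                AS Null_Wins  -- expected: null
      ,'Squiggy' || ' ' || COALESCE(CAST(NULL AS VARCHAR(20)), '')  AS Null_Safe  -- expected: 'Squiggy '
;
```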

Chapter 19 – Interrogating the Data

"The difference between genius and stupidity is that genius has its limits" - Albert Einstein

Quiz – What would the Answer be?

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
231222      Wilson      Susie       SO          3.80
280023      McRoberts   Richard     JR          1.90
322133      Bond        Jimmy       JR          3.95
125634      Hanson      Henry       FR          2.88
333450      Smith       Andy        SO          2.00
324652      Delaney     Danny       SR          3.35
260000      Johnson     Stanley     ?           ?
234121      Thomas      Wendy       FR          4.00
123250      Phillips    Martin      SR          3.00

SELECT Class_Code
      ,Grade_Pt / (Grade_Pt * 2) AS Math1
FROM   Student_Table
ORDER BY 1,2 ;

Can you guess what would return in the Answer Set?

Using the Student_Table above, try and predict what the answer will be if this query was running on the system.

Answer to Quiz – What would the Answer be?

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
231222      Wilson      Susie       SO          3.80
280023      McRoberts   Richard     JR          1.90
322133      Bond        Jimmy       JR          3.95
125634      Hanson      Henry       FR          2.88
333450      Smith       Andy        SO          2.00
324652      Delaney     Danny       SR          3.35
260000      Johnson     Stanley     ?           ?
234121      Thomas      Wendy       FR          4.00
123250      Phillips    Martin      SR          3.00

SELECT Class_Code
      ,Grade_Pt / (Grade_Pt * 2) AS Math1
FROM   Student_Table
ORDER BY 1,2 ;

Error – Division by zero

You get an error when you DIVIDE by ZERO! Larkins has a Grade_Pt of 0.00, so the divisor (Grade_Pt * 2) is zero for that row. Let’s turn the page and fix it!

The NULLIF Command

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
231222      Wilson      Susie       SO          3.80
280023      McRoberts   Richard     JR          1.90
322133      Bond        Jimmy       JR          3.95
125634      Hanson      Henry       FR          2.88
333450      Smith       Andy        SO          2.00
324652      Delaney     Danny       SR          3.35
260000      Johnson     Stanley     ?           ?
234121      Thomas      Wendy       FR          4.00
123250      Phillips    Martin      SR          3.00

SELECT Class_Code
      ,Grade_Pt / (NULLIF(Grade_Pt, 0) * 2) AS Math1
FROM   Student_Table ;

If you have a calculation where a ZERO could kill the operation, and you don’t want that, you can use the NULLIF command to convert any zero value to a null value. Dividing by a null returns a null instead of an error.
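To watch the fix work on a single value, here is a sketch with literals standing in for Grade_Pt. The third column is an optional extra, not part of the query above: it wraps the calculation in COALESCE (covered later in this chapter) in case you would rather report a zero than a null. The aliases are made up.

```sql
SELECT 0.00 / (NULLIF(0.00, 0) * 2)               AS Zero_Row      -- expected: null, and no error
      ,3.00 / (NULLIF(3.00, 0) * 2)               AS Normal_Row    -- expected: 0.5
      ,COALESCE(0.00 / (NULLIF(0.00, 0) * 2), 0)  AS Zero_Instead  -- expected: 0
;
```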

Quiz – Fill in the Answers for the NULLIF Command

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
123250      Phillips    Martin      SR          3.00
234121      Thomas      Wendy       FR          4.00

SELECT Last_Name
      ,NULLIF(Grade_Pt, 0)   AS GP1
      ,NULLIF(Grade_Pt, 3.0) AS GP2
      ,NULLIF(Grade_Pt, 4.0) AS GP3
FROM   Student_Table
WHERE  Student_ID IN (423400, 123250, 234121)
ORDER BY Last_Name ;

Fill in the Answer Set below after looking at the table and the query.

Last_Name   GP1   GP2   GP3
__________  ____  ____  ____
Larkins
Phillips
Thomas

What would the above Answer Set produce from your analysis?

Answer – Fill in the Answers for the NULLIF Command

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
123250      Phillips    Martin      SR          3.00
234121      Thomas      Wendy       FR          4.00

SELECT Last_Name
      ,NULLIF(Grade_Pt, 0)   AS GP1
      ,NULLIF(Grade_Pt, 3.0) AS GP2
      ,NULLIF(Grade_Pt, 4.0) AS GP3
FROM   Student_Table
WHERE  Student_ID IN (423400, 123250, 234121)
ORDER BY Last_Name ;

Last_Name   GP1   GP2   GP3
__________  ____  ____  ____
Larkins     ?     0.00  0.00
Phillips    3.00  ?     3.00
Thomas      4.00  4.00  ?

Look at the answers above. If it doesn’t make sense, go over it again until it does.
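If the answers still look strange, it may help to see NULLIF spelled out as a CASE expression: when the two values are equal you get a null, otherwise you get the first value back. The sketch below puts both forms side by side for GP2; the alias GP2_As_Case is made up.

```sql
SELECT Last_Name
      ,NULLIF(Grade_Pt, 3.0) AS GP2
      ,CASE WHEN Grade_Pt = 3.0 THEN NULL
            ELSE Grade_Pt
       END                   AS GP2_As_Case
FROM   Student_Table
WHERE  Student_ID IN (423400, 123250, 234121)
ORDER BY Last_Name ;
```

The two columns should match on every row, with a null only for Phillips.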

The COALESCE Command – Fill In the Answers

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
260000      Johnson     Stanley     ?           ?
234121      Thomas      Wendy       FR          4.00

SELECT Last_Name
      ,Grade_Pt
      ,Class_Code
      ,COALESCE(Grade_Pt, Student_ID) AS ValidStudents
FROM   Student_Table
WHERE  Last_Name IN ('Johnson', 'Larkins', 'Thomas')
ORDER BY 1 ;

Fill in the Answer Set below after looking at the table and the query.

Last_Name   Grade_Pt  Class_Code  ValidStudents
__________  ________  __________  _____________
Johnson     ?         ?
Larkins     0.00      FR
Thomas      4.00      FR

Coalesce returns the first non-Null value in a list, and if all values are Null, returns Null.

The COALESCE Answer Set

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
260000      Johnson     Stanley     ?           ?
234121      Thomas      Wendy       FR          4.00

SELECT Last_Name
      ,Grade_Pt
      ,Class_Code
      ,COALESCE(Grade_Pt, Student_ID) AS ValidStudents
FROM   Student_Table
WHERE  Last_Name IN ('Johnson', 'Larkins', 'Thomas')
ORDER BY 1 ;

Last_Name   Grade_Pt  Class_Code  ValidStudents
__________  ________  __________  _____________
Johnson     ?         ?           260000
Larkins     0.00      FR          0.00
Thomas      4.00      FR          4.00

Coalesce returns the first non-Null value in a list, and if all values are Null, returns Null.

COALESCE is Equivalent to This CASE Statement

SELECT Last_Name
      ,Grade_Pt
      ,Class_Code
      ,COALESCE(Grade_Pt, Student_ID) AS ValidStudents
FROM   Student_Table ;

SELECT Last_Name
      ,Grade_Pt
      ,Class_Code
      ,CASE WHEN Grade_Pt   IS NOT NULL THEN Grade_Pt
            WHEN Student_ID IS NOT NULL THEN Student_ID
            ELSE NULL
       END AS ValidStudents
FROM   Student_Table ;

Coalesce returns the first non-Null value in a list, and if all values are Null, returns Null. Above are two queries that return the exact same answer set. These examples are designed to give you a better idea of how Coalesce works.

The COALESCE Command

Sample_Table

Last_Name  Home_Phone  Work_Phone  Cell_Phone
_________  __________  __________  __________
Jones      555-1234    444-1234    ?
Patel      ?           456-7890    454-6789
Gonzales   ?           ?           354-0987
Nguyen     ?           ?           ?

SELECT Last_Name
      ,COALESCE(Home_Phone, Work_Phone, Cell_Phone) AS Phone
FROM   Sample_Table ;

Last_Name  Phone
_________  ________

Fill in the Answer Set above after looking at the table and the query.

Coalesce returns the first non-Null value in a list, and if all values are Null, returns Null.

The COALESCE Answer Set

Sample_Table

Last_Name  Home_Phone  Work_Phone  Cell_Phone
_________  __________  __________  __________
Jones      555-1234    444-1234    ?
Patel      ?           456-7890    454-6789
Gonzales   ?           ?           354-0987
Nguyen     ?           ?           ?

SELECT Last_Name
      ,COALESCE(Home_Phone, Work_Phone, Cell_Phone) AS Phone
FROM   Sample_Table ;

Last_Name  Phone
_________  ________
Jones      555-1234
Patel      456-7890
Gonzales   354-0987
Nguyen     ?

Coalesce returns the first non-Null value in a list, and if all values are Null, returns Null.

The COALESCE Quiz

Sample_Table

Last_Name  Home_Phone  Work_Phone  Cell_Phone
_________  __________  __________  __________
Jones      555-1234    444-1234    ?
Patel      ?           456-7890    454-6789
Gonzales   ?           ?           354-0987
Nguyen     ?           ?           ?

SELECT Last_Name
      ,COALESCE(Home_Phone, Work_Phone, Cell_Phone, 'No Phone') AS Phone
FROM   Sample_Table ;

Last_Name  Phone
_________  ________

Fill in the answer set above after looking at the table and the query.

Coalesce returns the first non-Null value in a list, and if all values are Null, returns Null. Since we decided in the above query we don’t want NULLs, notice we have placed a literal ‘No Phone’ in the list. How will this affect the Answer Set?

Answer - The COALESCE Quiz

Sample_Table

Last_Name  Home_Phone  Work_Phone  Cell_Phone
_________  __________  __________  __________
Jones      555-1234    444-1234    ?
Patel      ?           456-7890    454-6789
Gonzales   ?           ?           354-0987
Nguyen     ?           ?           ?

SELECT Last_Name
      ,COALESCE(Home_Phone, Work_Phone, Cell_Phone, 'No Phone') AS Phone
FROM   Sample_Table ;

Last_Name  Phone
_________  ________
Jones      555-1234
Patel      456-7890
Gonzales   354-0987
Nguyen     No Phone

Answers are above! We put a literal in the list so there’s no chance of NULL returning.
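You can try the same idea without Sample_Table by lining up the values for one person as literals. In this sketch a typed NULL stands in for each missing phone number; the aliases simply name the row being imitated.

```sql
SELECT COALESCE(CAST(NULL AS VARCHAR(8)), '456-7890', '454-6789', 'No Phone') AS Patel   -- expected: 456-7890
      ,COALESCE(CAST(NULL AS VARCHAR(8)), NULL, NULL, 'No Phone')             AS Nguyen  -- expected: No Phone
;
```

COALESCE stops at the first value that is not null, so Patel's cell phone is never looked at and the literal is only reached for Nguyen.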

The Basics of CAST (Convert and Store)

CAST will convert a column or value’s data type temporarily into another data type. Below is the syntax:

SELECT CAST(<column-or-value> AS <data-type>[(<length>)])
FROM   <table-name> ;

Examples using CAST:

CAST(<column-or-value> AS CHAR(5))       Convert smallint to character
CAST(<column-or-value> AS INTEGER)       Removes the decimal places
CAST(<column-or-value> AS SMALLINT)
CAST(<column-or-value> AS BYTE(128))
CAST(<column-or-value> AS VARCHAR(5))
CAST(<column-or-value> AS FLOAT)

Data can be converted from one type to another by using the CAST function. As long as the data involved does not break any data rules (i.e. placing alphabetic or special characters into a numeric data type), the conversion works. The name of the CAST function comes from the Convert and Store operation that it performs.

Some Great CAST (Convert and Store) Examples

SELECT CAST('ABCDE' AS CHAR(1)) AS Trunc
      ,CAST(128 AS CHAR(3))     AS OK
      ,CAST(127 AS INTEGER)     AS Bigger
FROM   Dual

TRUNC  OK   BIGGER
_____  ___  ______
A      128  127

The first CAST truncates the five characters (left to right) to form the single character ‘A’. In the second CAST, the integer 128 is converted to three characters and left justified in the output. The 127 was initially stored in a SMALLINT (5 digits - up to 32767) and then converted to an INTEGER. Hence, it uses 11 character positions for its display, ten numeric digits and a sign (positive assumed) and right justified as numeric.

Some Great CAST (Convert and Store) Examples

SELECT CAST(121.53 AS SMALLINT)     AS Whole
      ,CAST(121.53 AS DECIMAL(3,0)) AS Rounder ;

Whole  Rounder
_____  __________
122    122.000000

The value of 121.53 was initially stored as a DECIMAL as 5 total digits with 2 of them to the right of the decimal point. Then, it is converted to a SMALLINT using CAST to remove the decimal positions, but notice it rounded up. On the other hand, the CAST in the column called Rounder is converted to a DECIMAL as 3 digits with no digits (3,0) to the right of the decimal, so it will round data values instead of truncating. Since .53 is greater than .5, it is rounded up to 122.

Some Great CAST (Convert and Store) Examples

SELECT Order_Number    AS OrdNo
      ,Customer_Number AS CustNo
      ,Order_Date
      ,Order_Total
      ,CAST(Order_Total AS INTEGER)      AS Chopped
      ,CAST(Order_Total AS DECIMAL(5,0)) AS Rounded
FROM   Order_Table ;

OrdNo   CustNo    Order_Date  Order_Total  Chopped  Rounded
______  ________  __________  ___________  _______  _______
123585  87323456  10/10/1999  15231.62     15232    15232
123777  57896883  09/09/1999  23454.84     23455    23455
123512  11111111  01/01/1999  8005.91      8006     8006
123456  11111111  05/04/1998  12347.53     12348    12348
123552  31323134  10/01/1999  5111.47      5111     5111

The column Chopped takes Order_Total (a Decimal (10,2)) and CASTs it as an integer, which removes the decimals, but notice it still rounds up or down. Rounded CASTs Order_Total as a Decimal (5,0), which takes the decimals and rounds up if the decimal is .50 or above.
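If you really do want the decimals chopped off rather than rounded, a CAST to INTEGER is not the tool. The sketch below compares it with TRUNC and FLOOR on two of the order totals; it assumes the PostgreSQL-style numeric functions that Greenplum provides, and the aliases are made up.

```sql
SELECT CAST(5111.47 AS INTEGER)  AS Cast_Down  -- expected: 5111 (.47 rounds down)
      ,CAST(8005.91 AS INTEGER)  AS Cast_Up    -- expected: 8006 (.91 rounds up)
      ,TRUNC(8005.91)            AS Truncated  -- expected: 8005
      ,FLOOR(8005.91)            AS Floored    -- expected: 8005
;
```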

Quiz - The Basics of the CASE Statements

Course_Table

Course_ID  Course_Name               Credits  Seats
_________  ________________________  _______  _____
100        Database Concepts         3        50
200        Introduction to SQL       3        20
210        Advanced SQL              3        22
220        SQL Features              2        25
300        Physical Database Design  4        20
400        Database Administration   4        16

SELECT Course_Name
      ,CASE Credits
          WHEN 1 THEN 'One Credit'
          WHEN 2 THEN 'Two Credits'
          WHEN 3 THEN 'Three Credits'
       END AS CreditAlias
FROM   Course_Table
WHERE  Course_ID IN (220, 300) ;

Course_Name               CreditAlias
________________________  ___________
Physical Database Design
SQL Features

This is a CASE STATEMENT which allows you to evaluate a column in your table, and from that, come up with a new answer for your report. Every CASE begins with a CASE, and they all must end with a corresponding END. What would the answer be?

Answer to Quiz - The Basics of the CASE Statements

Course_Name                CreditAlias
________________________   ___________
Physical Database Design   ?
SQL Features               Two Credits

The answer for the Physical Database Design class is null. Its Credits value of 4 matched none of the WHEN conditions, so the row fell through the CASE statement. The answer for the SQL Features course is Two Credits. Once a CASE statement gets a match, it leaves the statement and gets the next row.
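The fall-through can be reproduced without Course_Table. This sketch feeds the same two courses to the CASE through an inline VALUES list, which is an assumption made only to keep the example self-contained; the fell_through column is also added purely for illustration.

```sql
-- Two inline rows stand in for Course_ID 220 and 300.
SELECT Course_Name
      ,CASE Credits
          WHEN 1 THEN 'One Credit'
          WHEN 2 THEN 'Two Credits'
          WHEN 3 THEN 'Three Credits'
       END AS CreditAlias                      -- NULL when no WHEN matches
      ,CASE Credits
          WHEN 1 THEN 'One Credit'
          WHEN 2 THEN 'Two Credits'
          WHEN 3 THEN 'Three Credits'
       END IS NULL AS fell_through             -- true only for the 4-credit row
FROM (VALUES ('SQL Features', 2)
            ,('Physical Database Design', 4)) AS t (Course_Name, Credits) ;
```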


Using an ELSE in the Case Statement

SELECT Course_Name
      ,CASE Credits
          WHEN 1 THEN 'One Credit'
          WHEN 2 THEN 'Two Credits'
          WHEN 3 THEN 'Three Credits'
          ELSE 'Four Credits'
       END AS CreditAlias
FROM Course_Table
WHERE Course_ID IN (220, 300) ;

Course_Name                CreditAlias
________________________   ____________
Physical Database Design   Four Credits
SQL Features               Two Credits

Now that we have an ELSE in our case statement we are guaranteed that nothing will fall through.


Using an ELSE as a Safety Net

SELECT Course_Name
      ,CASE Credits
          WHEN 1 THEN 'One Credit'
          WHEN 2 THEN 'Two Credits'
          WHEN 3 THEN 'Three Credits'
          WHEN 4 THEN 'Four Credits'
          ELSE 'Do not know'
       END AS CreditAlias
FROM Course_Table ;

Now that we have an ELSE in our case statement we are guaranteed that nothing will fall through. An ELSE should be used in case you forgot a possibility and there was no match.


Rules for a Valued Case Statement

SELECT Course_Name
      ,CASE Credits
          WHEN 1 THEN 'One Credit'
          WHEN 2 THEN 'Two Credits'
          WHEN 3 THEN 'Three Credits'
          ELSE 'Credits not found'
       END AS CreditAlias
FROM Course_Table ;

The column Credits follows the word CASE. This is a valued case statement. The value is the column Credits.

Rules for a Valued CASE:
1. You can only check for equality
2. You can only check the value of the column Credits

There are two types of CASE statements. There is the Valued CASE and the Searched CASE. Above are the rules for the Valued CASE statement.
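To make the two rules concrete, the sketch below puts a Valued CASE next to a Searched CASE over the same rows. The inline VALUES rows and the labels such as 'Heavy' are illustrative assumptions, not part of the original example.

```sql
-- Inline rows copy three courses from Course_Table.
SELECT Course_Name
      -- Valued CASE: one column after CASE, equality tests only.
      ,CASE Credits
          WHEN 4 THEN 'Four Credits'
          ELSE 'Other'
       END AS ValuedCase
      -- Searched CASE: nothing after CASE, so each WHEN carries a full
      -- condition and may use ranges or more than one column.
      ,CASE
          WHEN Credits >= 4 THEN 'Heavy'
          WHEN Credits = 3 AND Seats < 25 THEN 'Standard, small class'
          ELSE 'Other'
       END AS SearchedCase
FROM (VALUES ('Advanced SQL', 3, 22)
            ,('SQL Features', 2, 25)
            ,('Physical Database Design', 4, 20)) AS t (Course_Name, Credits, Seats) ;
```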


Rules for a Searched Case Statement

SELECT Course_Name
      ,CASE
          WHEN Credits

No value follows the word CASE. This is

  -> Hash Join (cost=4.62..7.20 rows=7 width=50)
       Hash Cond: sc.course_id = c.course_id
       -> Hash Join (cost=2.23..4.58 rows=8 width=29)
            Hash Cond: sc.student_id = s.student_id
            -> Seq Scan on student_course_table sc (cost=0.00..2.14 rows=7 width=6)
            -> Hash (cost=2.10..2.10 rows=5 width=31)
                 -> Seq Scan on student_table s (cost=0.00..2.10 rows=5 width=31)
       -> Hash (cost=2.24..2.24 rows=6 width=25)
            -> Broadcast Motion 2:2 (slice1; segments: 2) (cost=0.00..2.24 rows=6 width=25)
                 -> Seq Scan on course_table c (cost=0.00..2.06 rows=3 width=25)

What few people in the world understand about joins is that two rows being joined need to be on the same segment. Because the Student_Course_Table and the Student_Table are joined on Student_ID, and both have a Distribution Key of Student_ID, the matching rows naturally reside on the same segment. These tables are joined first. After they produce an intermediate answer set, the course_table is broadcast to both segments for the final join.
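A rough way to try this out is to build three small tables with the same distribution keys and run EXPLAIN on the join. The table names with the _demo suffix and their column lists are hypothetical; only the DISTRIBUTED BY clauses mirror the text. The plan you get depends on the optimizer, the statistics, and the number of segments, so it may not match the one shown above line for line.

```sql
-- Hypothetical scratch tables; run in a test schema.
CREATE TABLE student_demo        (student_id integer, last_name varchar(20))
DISTRIBUTED BY (student_id);

CREATE TABLE student_course_demo (student_id integer, course_id integer)
DISTRIBUTED BY (student_id);      -- same key as student_demo: matching rows share a segment

CREATE TABLE course_demo         (course_id integer, course_name varchar(30))
DISTRIBUTED BY (course_id);       -- different key: rows must move before the final join

EXPLAIN
SELECT s.last_name, c.course_name
FROM student_demo AS s
INNER JOIN student_course_demo AS sc ON s.student_id = sc.student_id
INNER JOIN course_demo AS c          ON sc.course_id = c.course_id ;
```

Look for a Broadcast Motion or Redistribute Motion node in the output; a join between two tables distributed on the join column should need neither.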


Chapter 25

Greenplum Explain

Explain of a Derived Table vs. a Correlated Subquery

Both queries will return all columns from the Employee_Table if the employee makes a salary > Avg(Salary) within their own department.

Correlated Subquery

SELECT *
FROM Employee_Table as E
WHERE Salary > (SELECT AVG(Salary)
                FROM Employee_Table as EE
                WHERE E.Dept_No = EE.Dept_No) ;

Derived Table

SELECT E.*
FROM Employee_Table as E
INNER JOIN
     (SELECT Dept_No, AVG(Salary) as AVGSAL
      FROM Employee_Table
      GROUP BY Dept_No) AS TeraTom
ON E.Dept_No = TeraTom.Dept_No
AND Salary > AVGSAL ;

Both queries return the exact same answer set

Employee_No   Dept_No   Last_Name    First_Name   Salary
___________   _______   __________   __________   ________
1333454       200       Smith        John         48000.00
1256349       400       Harrison     Herbert      54500.00
1121334       400       Strickling   Cletus       54500.00

The three rows in the answer set are employees making a greater salary than the average salary within their dept_no. We were able to do this through a correlated subquery and a derived table. Now, we can compare the EXPLAIN plans.

Explain of the Correlated Subquery

SELECT *
FROM Employee_Table as E
WHERE Salary > (SELECT AVG(Salary)
                FROM Employee_Table as EE
                WHERE E.Dept_No = EE.Dept_No) ;

Gather Motion 2:1 (slice3; segments: 2) (cost=2.56..4.92 rows=4 width=43)
  -> Hash Join (cost=2.56..4.92 rows=2 width=43)
       Hash Cond: e.dept_no = "Expr_SUBQUERY".csq_c0
       Join Filter: e.salary > "Expr_SUBQUERY".csq_c1
       -> Redistribute Motion 2:2 (slice1; segments: 2) (cost=0.00..2.27 rows=5 width=43)
            Hash Key: e.dept_no
            -> Seq Scan on employee_table e (cost=0.00..2.09 rows=5 width=43)
       -> Hash (cost=2.48..2.48 rows=3 width=34)
            -> HashAggregate (cost=2.35..2.42 rows=3 width=34)
                 Group By: ee.dept_no
                 -> Redistribute Motion 2:2 (slice2; segments: 2) (cost=2.13..2.25 rows=3 width=34)
                      Hash Key: ee.dept_no
                      -> HashAggregate (cost=2.13..2.13 rows=3 width=34)
                           Group By: ee.dept_no
                           -> Seq Scan on employee_table ee (cost=0.00..2.09 rows=5 width=11)

The EXPLAIN plan of the Derived Table follows. The two plans are nearly identical.

Explain of the Derived Table

SELECT E.*
FROM Employee_Table as E
INNER JOIN
     (SELECT Dept_No, AVG(Salary) as AVGSAL
      FROM Employee_Table
      GROUP BY Dept_No) AS TeraTom
ON E.Dept_No = TeraTom.Dept_No
AND Salary > AVGSAL ;

Gather Motion 2:1 (slice3; segments: 2) (cost=2.56..4.92 rows=4 width=43)
  -> Hash Join (cost=2.56..4.92 rows=2 width=43)
       Hash Cond: e.dept_no = teratom.dept_no
       Join Filter: e.salary > teratom.avgsal
       -> Redistribute Motion 2:2 (slice1; segments: 2) (cost=0.00..2.27 rows=5 width=43)
            Hash Key: e.dept_no
            -> Seq Scan on employee_table e (cost=0.00..2.09 rows=5 width=43)
       -> Hash (cost=2.48..2.48 rows=3 width=34)
            -> HashAggregate (cost=2.35..2.42 rows=3 width=34)
                 Group By: employee_table.dept_no
                 -> Redistribute Motion 2:2 (slice2; segments: 2) (cost=2.13..2.25 rows=3 width=34)
                      Hash Key: employee_table.dept_no
                      -> HashAggregate (cost=2.13..2.13 rows=3 width=34)
                           Group By: employee_table.dept_no
                           -> Seq Scan on employee_table (cost=0.00..2.09 rows=5 width=11)

The EXPLAIN plan of the Correlated Subquery appears above. The two plans are nearly identical: the operators, costs and row estimates match, and only the relation names differ.

Chapter 26 – Statistical Aggregate Functions

"You can make more friends in two months by becoming interested in other people than you will in two years by trying to get other people interested in you." - Dale Carnegie

The Stats Table

Col1   Col2   Col3   Col4   Col5   Col6
____   ____   ____   ____   ____   ____
   1      1      1     30      1      0
   2      1      1     29      2      5
   3      3     10     28      3     10
   4      3     10     27      4     15
   5      3     10     26      5     20
   6      4     10     25      6     30
   7      5     10     24      7     30
   8      5     10     23      8     30
   9      5     10     22      9     35
  10      5     20     21     10     35
  11      7     20     20     22     40
  12      7     20     19     12     40
  13      9     20     18     13     45
  14      9     20     17     14     45
  15      9     20     16     15     50
  16      9     20     15     14     55
  17     10     20     14     13     55
  18     10     20     13     12     60
  19     10     20     12     11     60
  20     10     20     11      9     65
  21     10     20     10      8     65
  22     10     20      9      7     65
  23     13     20      8      6     70
  24     13     30      7      5     70
  25     13     30      6      4     80
  26     14     40      5      3     85
  27     15     40      4      2     90
  28     15     50      3      1     90
  29     16     50      2      1     95
  30     16     60      1      1    100

Above is the Stats table. This will be used for our statistical examples.

The STDDEV_POP Function

Syntax for using STDDEV_POP:

STDDEV_POP(expression)

SELECT STDDEV_POP(col1) AS SDPCol1
FROM Stats_Table;

SDPCol1
__________________
8.6554414483991899

The standard deviation function is a statistical measure of spread or dispersion of values. It is the square root of the average squared difference from the mean (average), so it measures the amount by which a set of values differs from the arithmetic mean. The STDDEV_POP function is one of two that calculate the standard deviation. It treats all the rows that pass the comparison in the WHERE clause as the population.
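The definition can be checked by hand. This sketch uses generate_series(1, 30) as a stand-in for col1, which holds the numbers 1 through 30, and compares the built-in function with the same quantity written out from averages. The alias names are made up for the illustration.

```sql
-- AVG(n * n) - AVG(n) * AVG(n) is an algebraic shortcut for the
-- average squared difference from the mean (the population variance).
SELECT STDDEV_POP(n)                      AS built_in
      ,SQRT(AVG(n * n) - AVG(n) * AVG(n)) AS by_hand
FROM generate_series(1, 30) AS g(n) ;
```

Both columns should come out near 8.6554, the SDPCol1 value shown above.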

A STDDEV_POP Example

The STDDEV_POP function is one of two that calculates the standard deviation. The standard deviation function is a statistical measure of spread or dispersion of values.

SELECT STDDEV_POP(col1) AS Col1
      ,STDDEV_POP(col2) AS Col2
      ,STDDEV_POP(col3) AS Col3
      ,STDDEV_POP(col4) AS Col4
      ,STDDEV_POP(col5) AS Col5
      ,STDDEV_POP(col6) AS Col6
FROM Stats_Table;

Col1   Col2   Col3    Col4   Col5   Col6
____   ____   _____   ____   ____   _____
8.66   4.39   13.82   8.66   4.42   26.89

The STDDEV_SAMP Function

Syntax for using STDDEV_SAMP:

STDDEV_SAMP(expression)

SELECT STDDEV_SAMP(col1) AS SDSCol1
FROM Stats_Table;

SDSCol1
__________________
8.8034084308295046

The standard deviation function is a statistical measure of spread or dispersion of values. It is the square root of the average squared difference from the mean (average), so it measures the amount by which a set of values differs from the arithmetic mean. The STDDEV_SAMP function is one of two that calculate the standard deviation. It treats the rows returned by the WHERE clause as a sample drawn from a larger population and divides by N - 1 instead of N. STDDEV_POP treats those same rows as the whole population.
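The only difference between the two functions is the divisor, so one can be derived from the other. A minimal sketch, again assuming generate_series(1, 30) as a stand-in for col1:

```sql
-- Scaling the population figure by SQRT(N / (N - 1)) gives the sample figure.
SELECT STDDEV_SAMP(n)                                    AS built_in
      ,STDDEV_POP(n) * SQRT(COUNT(*) / (COUNT(*) - 1.0)) AS from_pop
FROM generate_series(1, 30) AS g(n) ;
```

Both columns should land near 8.8034, the SDSCol1 value above. The gap between the sample and population figures shrinks as the row count grows.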

A STDDEV_SAMP Example

The STDDEV_SAMP function is one of two that calculates the standard deviation.

SELECT STDDEV_SAMP(col1) AS Col1
      ,STDDEV_SAMP(col2) AS Col2
      ,STDDEV_SAMP(col3) AS Col3
      ,STDDEV_SAMP(col4) AS Col4
      ,STDDEV_SAMP(col5) AS Col5
      ,STDDEV_SAMP(col6) AS Col6
FROM Stats_Table;

Col1   Col2   Col3    Col4   Col5   Col6
____   ____   _____   ____   ____   _____
8.80   4.47   14.06   8.80   4.50   27.34

The VAR_POP Function

Syntax for using VAR_POP:

VAR_POP(expression)

SELECT VAR_POP(col1) AS VPCol1
FROM Stats_Table;

VPCol1
___________________
74.9166666666666667

A VAR_POP Example

The Variance function is a measure of dispersion (spread of the distribution) and is the square of the standard deviation. There are two forms of Variance in Greenplum. VAR_POP is for the entire population of data rows allowed by the WHERE clause.

SELECT VAR_POP(col1) AS Col1
      ,VAR_POP(col2) AS Col2
      ,VAR_POP(col3) AS Col3
      ,VAR_POP(col4) AS Col4
      ,VAR_POP(col5) AS Col5
      ,VAR_POP(col6) AS Col6
FROM Stats_Table;

Col1    Col2    Col3     Col4    Col5    Col6
_____   _____   ______   _____   _____   ______
74.92   19.29   191.06   74.92   19.58   722.81

The VAR_SAMP Function

Syntax for using VAR_SAMP:

VAR_SAMP(expression)

SELECT VAR_SAMP(col1) AS VSCol1
FROM Stats_Table;

VSCol1
_______
77.50

A VAR_SAMP Example

The Variance function is a measure of dispersion (spread of the distribution) and is the square of the standard deviation. There are two forms of Variance in Greenplum. VAR_SAMP is used when the data rows allowed through by the WHERE clause are a sample of a larger population.

SELECT VAR_SAMP(col1) AS Col1
      ,VAR_SAMP(col2) AS Col2
      ,VAR_SAMP(col3) AS Col3
      ,VAR_SAMP(col4) AS Col4
      ,VAR_SAMP(col5) AS Col5
      ,VAR_SAMP(col6) AS Col6
FROM Stats_Table ;

Col1    Col2    Col3     Col4    Col5    Col6
_____   _____   ______   _____   _____   ______
77.50   19.95   197.65   77.50   20.25   747.73
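The sample and population variances differ only by a factor of N / (N - 1). The sketch below checks that relationship on the numbers 1 through 30 (a stand-in for col1) and also calls VARIANCE, which in PostgreSQL-derived systems is documented as an alias for VAR_SAMP; treat that equivalence as an assumption to verify on your own Greenplum release.

```sql
SELECT VAR_POP(n)                               AS var_pop        -- about 74.92
      ,VAR_SAMP(n)                              AS var_samp       -- 77.50
      ,VAR_POP(n) * COUNT(*) / (COUNT(*) - 1)   AS pop_rescaled   -- matches var_samp
      ,VARIANCE(n)                              AS variance_fn    -- expected to match var_samp
FROM generate_series(1, 30) AS g(n) ;
```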

The VARIANCE Function

Syntax for using VARIANCE:

VARIANCE(expression)

SELECT VARIANCE(col1) AS VSCol1
FROM Stats_Table;

VSCol1
_______
77.50

The Variance function is a measure of dispersion (spread of the distribution) and is the square of the standard deviation.

A VARIANCE Example

The Variance function is a measure of dispersion (spread of the distribution) and is the square of the standard deviation. There are two forms of Variance in Greenplum. VARIANCE returns the sample variance, the same value as VAR_SAMP, as the 77.50 for col1 shows.

SELECT VARIANCE(col1) AS Col1
      ,VARIANCE(col2) AS Col2
      ,VARIANCE(col3) AS Col3
      ,VARIANCE(col4) AS Col4
      ,VARIANCE(col5) AS Col5
      ,VARIANCE(col6) AS Col6
FROM Stats_Table ;

Col1    Col2    Col3     Col4    Col5    Col6
_____   _____   ______   _____   _____   ______
77.50   19.95   197.65   77.50   20.25   747.73

The CORR Function

Syntax for using CORR:

CORR(expression1, expression2)

SELECT CORR(col1, col2) AS CCol1and2
FROM Stats_Table;

CCol1and2
_________
0.99

The correlation coefficient is a number between -1 and 1. It is calculated from a number of pairs of observations or linear points (X,Y), where:
 1 = perfect positive correlation
 0 = no correlation
-1 = perfect negative correlation

The CORR function is a binary function, meaning that two variables are used as input to it. It measures the association between 2 random variables. If the variables are such that when one changes the other does so in a related manner, they are correlated. Independent variables are not correlated because the change in one does not necessarily cause the other to change.
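One way to see where the number comes from is to rebuild it from covariance and standard deviation. In this sketch x and y are hypothetical columns generated on the fly (y is the square of x), not columns of Stats_Table, and everything is cast to double precision so the arithmetic stays in one type.

```sql
-- Correlation = covariance divided by the product of the two standard deviations.
SELECT CORR(y, x)                                        AS built_in
      ,COVAR_POP(y, x) / (STDDEV_POP(y) * STDDEV_POP(x)) AS by_hand
FROM (SELECT CAST(n AS double precision)     AS x
            ,CAST(n * n AS double precision) AS y
      FROM generate_series(1, 30) AS g(n)) AS t ;
```

The two columns should agree up to floating-point rounding, and they stay below 1 because y grows faster than a straight line.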

A CORR Example

Where:
 1 = perfect positive correlation
 0 = no correlation
-1 = perfect negative correlation

SELECT CORR(col1, col2) AS C1_2
      ,CORR(col1, col3) AS C1_3
      ,CORR(col1, col4) AS C1_4
      ,CORR(col1, col5) AS C1_5
      ,CORR(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2   C1_3   C1_4    C1_5    C1_6
____   ____   _____   _____   ____
0.99   0.89   -1.00   -0.15   0.99

Another CORR Example so you can Compare

SELECT CORR(col1, col2) AS C1_2
      ,CORR(col1, col3) AS C1_3
      ,CORR(col1, col4) AS C1_4
      ,CORR(col1, col5) AS C1_5
      ,CORR(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2   C1_3   C1_4    C1_5    C1_6
____   ____   _____   _____   ____
0.99   0.89   -1.00   -0.15   0.99

SELECT CORR(col4, col2) AS C4_2
      ,CORR(col4, col3) AS C4_3
      ,CORR(col4, col1) AS C4_1
      ,CORR(col4, col5) AS C4_5
      ,CORR(col4, col6) AS C4_6
FROM Stats_Table ;

C4_2    C4_3    C4_1    C4_5   C4_6
_____   _____   _____   ____   _____
-0.99   -0.89   -1.00   0.15   -0.99

The COVAR_POP Function

Syntax:

COVAR_POP(expression1, expression2)

SELECT COVAR_POP(col1, col2) AS CCol1_2
FROM Stats_Table;

CCol1_2
_______
37.5

The covariance is a statistical measure of the tendency of two variables to change in conjunction with each other. It is equal to the product of their standard deviations and their correlation coefficient. The covariance is a statistic used for bivariate samples or bivariate distributions. It is used for working out the equations for regression lines and the product-moment correlation coefficient.
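The population covariance can also be written as the average of the products minus the product of the averages. The sketch below checks that on generated data shaped like col1 and col4 (x counts up from 1 to 30 while y counts down from 30 to 1); the column names x and y are assumptions for the illustration.

```sql
SELECT COVAR_POP(y, x)                AS built_in
      ,AVG(x * y) - AVG(x) * AVG(y)   AS by_hand
FROM (SELECT CAST(n AS double precision)      AS x
            ,CAST(31 - n AS double precision) AS y
      FROM generate_series(1, 30) AS g(n)) AS t ;
```

Both columns should be close to -74.92, the same figure reported for COVAR_POP(col1, col4) in the examples.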

A COVAR_POP Example

The covariance is a statistical measure of the tendency of two variables to change in conjunction with each other. It is equal to the product of their standard deviations and their correlation coefficient.

SELECT COVAR_POP(col1, col2) AS C1_2
      ,COVAR_POP(col1, col3) AS C1_3
      ,COVAR_POP(col1, col4) AS C1_4
      ,COVAR_POP(col1, col5) AS C1_5
      ,COVAR_POP(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2    C1_3     C1_4     C1_5    C1_6
_____   ______   ______   _____   ______
37.50   105.90   -74.92   -5.82   230.75

Another COVAR_POP Example so you can Compare

SELECT COVAR_POP(col1, col2) AS C1_2
      ,COVAR_POP(col1, col3) AS C1_3
      ,COVAR_POP(col1, col4) AS C1_4
      ,COVAR_POP(col1, col5) AS C1_5
      ,COVAR_POP(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2    C1_3     C1_4     C1_5    C1_6
_____   ______   ______   _____   ______
37.50   105.90   -74.92   -5.82   230.75

SELECT COVAR_POP(col4, col2) AS C4_2
      ,COVAR_POP(col4, col3) AS C4_3
      ,COVAR_POP(col4, col1) AS C4_1
      ,COVAR_POP(col4, col5) AS C4_5
      ,COVAR_POP(col4, col6) AS C4_6
FROM Stats_Table ;

C4_2     C4_3      C4_1     C4_5   C4_6
______   _______   ______   ____   _______
-37.50   -105.90   -74.92   5.82   -230.75

The COVAR_SAMP Function

Syntax:

COVAR_SAMP(expression1, expression2)

SELECT COVAR_SAMP(col1, col2) AS CCol1_2
FROM Stats_Table;

CCol1_2
_______
38.79

The COVAR_SAMP function is the sample covariance.

A COVAR_SAMP Example

The function eliminates all expression pairs where either expression in the pair is NULL.

SELECT COVAR_SAMP(col1, col2) AS C1_2
      ,COVAR_SAMP(col1, col3) AS C1_3
      ,COVAR_SAMP(col1, col4) AS C1_4
      ,COVAR_SAMP(col1, col5) AS C1_5
      ,COVAR_SAMP(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2    C1_3     C1_4     C1_5    C1_6
_____   ______   ______   _____   ______
38.79   109.55   -77.50   -6.02   238.71
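Stats_Table has no NULLs, so the pair-elimination rule never shows up in its results. The sketch below uses five made-up (x, y) pairs, two of them incomplete, and brings in REGR_COUNT (another two-argument aggregate, introduced here only as a counting aid) to show how many pairs survive.

```sql
-- Five rows go in, but only the three complete pairs are used.
SELECT COUNT(*)         AS all_rows         -- 5
      ,REGR_COUNT(y, x) AS complete_pairs   -- 3
      ,COVAR_SAMP(y, x) AS sample_covar     -- computed from the 3 complete pairs only
FROM (VALUES (1.0,  2.0)
            ,(2.0,  4.0)
            ,(3.0,  NULL)
            ,(NULL, 8.0)
            ,(5.0, 10.0)) AS t (x, y) ;
```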

Another COVAR_SAMP Example so you can Compare

SELECT COVAR_SAMP(col1, col2) AS C1_2
      ,COVAR_SAMP(col1, col3) AS C1_3
      ,COVAR_SAMP(col1, col4) AS C1_4
      ,COVAR_SAMP(col1, col5) AS C1_5
      ,COVAR_SAMP(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2    C1_3     C1_4     C1_5    C1_6
_____   ______   ______   _____   ______
38.79   109.55   -77.50   -6.02   238.71

SELECT COVAR_SAMP(col4, col2) AS C4_2
      ,COVAR_SAMP(col4, col3) AS C4_3
      ,COVAR_SAMP(col4, col1) AS C4_1
      ,COVAR_SAMP(col4, col5) AS C4_5
      ,COVAR_SAMP(col4, col6) AS C4_6
FROM Stats_Table ;

C4_2     C4_3      C4_1     C4_5   C4_6
______   _______   ______   ____   _______
-38.79   -109.55   -77.50   6.02   -238.71

The REGR_INTERCEPT Function

Syntax for using REGR_INTERCEPT:

REGR_INTERCEPT(dependent-expression, independent-expression)

SELECT REGR_INTERCEPT(col1, col2) AS RIofCol1_2
FROM Stats_Table;

RIofCol1_2
__________
-1.35

A REGR_INTERCEPT Example

A regression line is a line of best fit, drawn through a set of points on a graph of X and Y coordinates. It uses the Y coordinate as the Dependent Variable and the X value as the Independent Variable. REGR_INTERCEPT returns the point where that line crosses the Y axis, which is the value of Y when X is 0. The two regression lines (Y on X and X on Y) always meet at the mean of the data points (x,y), where x=AVG(x) and y=AVG(y). This point is not usually one of the original data points.

SELECT REGR_INTERCEPT(col1, col2) AS C1_2
      ,REGR_INTERCEPT(col1, col3) AS C1_3
      ,REGR_INTERCEPT(col1, col4) AS C1_4
      ,REGR_INTERCEPT(col1, col5) AS C1_5
      ,REGR_INTERCEPT(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2    C1_3   C1_4    C1_5    C1_6
_____   ____   _____   _____   _____
-1.35   3.45   31.00   17.65   -0.83
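Because the fitted line passes through the point (AVG(x), AVG(y)), the intercept can be recovered from the slope and the two averages. This sketch assumes generated data shaped like col1 and col4 (y counts up from 1 to 30 while x counts down from 30 to 1); x and y are illustrative names.

```sql
-- intercept = AVG(y) - slope * AVG(x)
SELECT REGR_INTERCEPT(y, x)                                   AS built_in
      ,REGR_AVGY(y, x) - REGR_SLOPE(y, x) * REGR_AVGX(y, x)   AS by_hand
FROM (SELECT CAST(n AS double precision)      AS y
            ,CAST(31 - n AS double precision) AS x
      FROM generate_series(1, 30) AS g(n)) AS t ;
```

Both columns should be close to 31, in line with the 31.00 reported for C1_4.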

Another REGR_INTERCEPT Example so you can Compare

SELECT REGR_INTERCEPT(col1, col2) AS C1_2
      ,REGR_INTERCEPT(col1, col3) AS C1_3
      ,REGR_INTERCEPT(col1, col4) AS C1_4
      ,REGR_INTERCEPT(col1, col5) AS C1_5
      ,REGR_INTERCEPT(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2    C1_3   C1_4    C1_5    C1_6
_____   ____   _____   _____   _____
-1.35   3.45   31.00   17.65   -0.83

SELECT REGR_INTERCEPT(col4, col2) AS C4_2
      ,REGR_INTERCEPT(col4, col3) AS C4_3
      ,REGR_INTERCEPT(col4, col1) AS C4_1
      ,REGR_INTERCEPT(col4, col5) AS C4_5
      ,REGR_INTERCEPT(col4, col6) AS C4_6
FROM Stats_Table ;

C4_2    C4_3    C4_1    C4_5    C4_6
_____   _____   _____   _____   _____
32.35   27.55   31.00   13.35   31.83

The REGR_SLOPE Function

Syntax for using REGR_SLOPE:

REGR_SLOPE(dependent-expression, independent-expression)

SELECT REGR_SLOPE(col1, col2) AS RSCol1_2
FROM Stats_Table;

RSCol1_2
_________
1.94

A regression line is a line of best fit, drawn through a set of points on a graph of X and Y coordinates. It uses the Y coordinate as the Dependent Variable, and the X value as the Independent Variable. The slope of the line is the amount by which Y changes for each one-unit change in X. The vertical slope is Y on X and the horizontal slope is X on Y.
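For a least-squares line the slope equals the covariance of the two variables divided by the variance of the independent one. A minimal sketch on generated data shaped like col1 and col4 (y rises from 1 to 30 as x falls from 30 to 1), with x and y as assumed names:

```sql
-- slope = COVAR_POP(y, x) / VAR_POP(x)
SELECT REGR_SLOPE(y, x)               AS built_in
      ,COVAR_POP(y, x) / VAR_POP(x)   AS by_hand
FROM (SELECT CAST(n AS double precision)      AS y
            ,CAST(31 - n AS double precision) AS x
      FROM generate_series(1, 30) AS g(n)) AS t ;
```

Both columns should be close to -1, matching the -1.00 reported for REGR_SLOPE(col1, col4) in the examples.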

A REGR_SLOPE Example

A regression line is a line of best fit, drawn through a set of points on a graph of X and Y coordinates. It uses the Y coordinate as the Dependent Variable, and the X value as the Independent Variable. The slope of the line is the amount by which Y changes for each one-unit change in X. The vertical slope is Y on X and the horizontal slope is X on Y.

SELECT REGR_SLOPE(col1, col2) AS C1_2
      ,REGR_SLOPE(col1, col3) AS C1_3
      ,REGR_SLOPE(col1, col4) AS C1_4
      ,REGR_SLOPE(col1, col5) AS C1_5
      ,REGR_SLOPE(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2   C1_3   C1_4    C1_5    C1_6
____   ____   _____   _____   ____
1.94   0.55   -1.00   -0.30   0.32

Another REGR_SLOPE Example so you can Compare

SELECT REGR_SLOPE(col1, col2) AS C1_2
      ,REGR_SLOPE(col1, col3) AS C1_3
      ,REGR_SLOPE(col1, col4) AS C1_4
      ,REGR_SLOPE(col1, col5) AS C1_5
      ,REGR_SLOPE(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2   C1_3   C1_4    C1_5    C1_6
____   ____   _____   _____   ____
1.94   0.55   -1.00   -0.30   0.32

SELECT REGR_SLOPE(col4, col2) AS C4_2
      ,REGR_SLOPE(col4, col3) AS C4_3
      ,REGR_SLOPE(col4, col1) AS C4_1
      ,REGR_SLOPE(col4, col5) AS C4_5
      ,REGR_SLOPE(col4, col6) AS C4_6
FROM Stats_Table ;

C4_2    C4_3    C4_1    C4_5   C4_6
_____   _____   _____   ____   _____
-1.94   -0.55   -1.00   0.30   -0.32

The REGR_AVGX Function

Syntax for using REGR_AVGX:

REGR_AVGX(dependent-expression, independent-expression)

SELECT REGR_AVGX(col1, col2) AS RSCol1_2
FROM Stats_Table;

RSCol1_2
_________
8.67

The REGR_AVGX function is the average of the independent variable (sum(X)/N).
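On Stats_Table, REGR_AVGX gives the same answer as a plain AVG of the second argument because no value is NULL. The two can differ once a pair is incomplete, since the regression aggregates only use rows where both arguments are present. The three rows below are made up to show the difference.

```sql
-- The third row has no y, so REGR_AVGX ignores its x value of 30.
SELECT REGR_AVGX(y, x) AS regr_avgx   -- (10 + 20) / 2 = 15
      ,AVG(x)          AS plain_avg   -- (10 + 20 + 30) / 3 = 20
FROM (VALUES (10, 1)
            ,(20, 2)
            ,(30, NULL)) AS t (x, y) ;
```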

A REGR_AVGX Example

The REGR_AVGX function is the average of the independent variable (sum(X)/N).

SELECT REGR_AVGX(col1, col2) AS C1_2
      ,REGR_AVGX(col1, col3) AS C1_3
      ,REGR_AVGX(col1, col4) AS C1_4
      ,REGR_AVGX(col1, col5) AS C1_5
      ,REGR_AVGX(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2   C1_3    C1_4   C1_5   C1_6
____   _____   ____   ____   _____
8.67   21.73   15.5   7.23   51.17

Page 882

Chapter 26

Statistical Aggregate Functions

Another REGR_AVGX Example so you can Compare 1 2 3 4 5 6

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 1 1 3 3 3 4 5 5 5 5 7 7 9 9 9 9 10 10 10 10 10 10 13 13 13 14 15 15 16 16 1 1 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 30 40 40 50 50 60 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 1 2 3 4 5 6 7 8 9 10 22 12 13 14 15 14 13 12 11 9 8 7 6 5 4 3 2 1 1 1 0 5 10 15 20 30 30 30 35 35 40 40 45 45 50 55 55 60 60 65 65 65 70 70 80 85 90 90 95 100

Col

SELECT REGR_AVGX(col1, col2) ,REGR_AVGX(col1, col3) ,REGR_AVGX(col1, col4) ,REGR_AVGX(col1, col5) ,REGR_AVGX(col1, col6) FROM Stats_Table ;

AS C1_2 AS C1_3 AS C1_4 AS C1_5 AS C1_6

C1_2 C1_3 _____ C1_4 _____ C1_5 _____ C1_6 _____ _____ 8.67 21.73 15.5 7.23 51.17

Page 883

SELECT REGR_AVGX(col4, col2) ,REGR_AVGX(col4, col3) ,REGR_AVGX(col4, col1) ,REGR_AVGX(col4, col5) ,REGR_AVGX(col4, col6) FROM Stats_Table ;

AS C4_2 AS C4_3 AS C4_1 AS C4_5 AS C4_6

C4_2 C4_3 _____ C4_1 _____ C4_5 _____ C4_6 _____ _____ 8.67 21.73 15.5 7.23 51.17


The REGR_AVGY Function

Syntax for using REGR_AVGY: REGR_AVGY(dependent-expression, independent-expression)

SELECT REGR_AVGY(col1, col2) AS RSCol1_2 FROM Stats_Table;

RSCol1_2
_________
15.5

The REGR_AVGY function is the average of the dependent variable (sum(Y)/N).


A REGR_AVGY Example


The REGR_AVGY function is the average of the dependent variable (sum(Y)/N).

SELECT REGR_AVGY(col1, col2) AS C1_2
      ,REGR_AVGY(col1, col3) AS C1_3
      ,REGR_AVGY(col1, col4) AS C1_4
      ,REGR_AVGY(col1, col5) AS C1_5
      ,REGR_AVGY(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2    C1_3    C1_4    C1_5    C1_6
_____   _____   _____   _____   _____
 15.5    15.5    15.5    15.5    15.5


Another REGR_AVGY Example so you can Compare

SELECT REGR_AVGY(col1, col2) AS C1_2
      ,REGR_AVGY(col1, col3) AS C1_3
      ,REGR_AVGY(col1, col4) AS C1_4
      ,REGR_AVGY(col1, col5) AS C1_5
      ,REGR_AVGY(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2    C1_3    C1_4    C1_5    C1_6
_____   _____   _____   _____   _____
 15.5    15.5    15.5    15.5    15.5

SELECT REGR_AVGY(col4, col2) AS C4_2
      ,REGR_AVGY(col4, col3) AS C4_3
      ,REGR_AVGY(col4, col1) AS C4_1
      ,REGR_AVGY(col4, col5) AS C4_5
      ,REGR_AVGY(col4, col6) AS C4_6
FROM Stats_Table ;

C4_2    C4_3    C4_1    C4_5    C4_6
_____   _____   _____   _____   _____
 15.5    15.5    15.5    15.5    15.5
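REGR_AVGY and REGR_AVGX are mirror images: one averages the first (dependent) expression and the other averages the second (independent) expression. The sketch below, assuming the same Stats_Table, swaps the arguments to make that visible. The alias names are made up for illustration.

```sql
-- Sketch only: REGR_AVGY(Y, X) versus REGR_AVGX with the arguments swapped.
-- Assumes Stats_Table from this chapter; the aliases are illustrative.
SELECT REGR_AVGY(col1, col2) AS AvgY_1_2   -- average of col1 (dependent)
      ,REGR_AVGX(col2, col1) AS AvgX_2_1   -- also the average of col1
      ,REGR_AVGX(col1, col2) AS AvgX_1_2   -- average of col2 (independent)
FROM Stats_Table ;
```

The first two columns should match each other because both average col1. The third column averages col2 instead.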


The REGR_COUNT Function

Syntax for using REGR_COUNT: REGR_COUNT(dependent-expression, independent-expression)

SELECT REGR_COUNT(col1, col2) AS RSCol1_2 FROM Stats_Table;

RSCol1_2
_________
30

The REGR_COUNT is the number of input rows in which both expressions are non-null.


A REGR_COUNT Example


The REGR_COUNT function is the number of input rows in which both expressions are non-null.

SELECT REGR_COUNT(col1, col2) AS C1_2
      ,REGR_COUNT(col1, col3) AS C1_3
      ,REGR_COUNT(col1, col4) AS C1_4
      ,REGR_COUNT(col1, col5) AS C1_5
      ,REGR_COUNT(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2    C1_3    C1_4    C1_5    C1_6
_____   _____   _____   _____   _____
   30      30      30      30      30
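Every pair returns 30 here because Stats_Table has no NULLs in these columns, so the "both expressions are non-null" rule never comes into play. The sketch below shows the rule on four hypothetical inline rows (they are not part of Stats_Table). It assumes a PostgreSQL-compatible VALUES list is available in the FROM clause. The names t, y, x, Regr_Cnt and All_Rows are made up for illustration.

```sql
-- Sketch only: four made-up rows, two of which contain a NULL.
SELECT REGR_COUNT(y, x) AS Regr_Cnt   -- counts rows where y AND x are non-null
      ,COUNT(*)         AS All_Rows   -- counts every row
FROM (VALUES (1, 10)
            ,(2, NULL)
            ,(NULL, 30)
            ,(4, 40)) AS t(y, x) ;
```

Regr_Cnt should be 2 (only the first and last rows qualify), while All_Rows should be 4.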


The REGR_R2 Function

Syntax for using REGR_R2: REGR_R2(Y, X)

SELECT REGR_R2(col1, col2) AS RSCol1_2 FROM Stats_Table;

RSCol1_2
_________
0.97

The REGR_R2 is the square of the correlation coefficient.


A REGR_R2 Example

The REGR_R2 is the square of the correlation coefficient.

SELECT REGR_R2(col1, col2) AS C1_2
      ,REGR_R2(col1, col3) AS C1_3
      ,REGR_R2(col1, col4) AS C1_4
      ,REGR_R2(col1, col5) AS C1_5
      ,REGR_R2(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2    C1_3    C1_4    C1_5    C1_6
_____   _____   _____   _____   _____
 0.97    0.78       1    0.02    0.98
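Since REGR_R2 is the square of the correlation coefficient, it can be cross-checked two ways: by squaring CORR, and by building it from the sums of squares and products. The sketch below assumes the same Stats_Table and that CORR, REGR_SXY, REGR_SXX and REGR_SYY are available, as they are in PostgreSQL-based systems. The alias names are made up for illustration.

```sql
-- Sketch only: three routes to the same R-squared value.
-- Assumes Stats_Table from this chapter; the aliases are illustrative.
SELECT REGR_R2(col1, col2)                             AS R2
      ,POWER(CORR(col1, col2), 2)                      AS Corr_Squared
      ,POWER(REGR_SXY(col1, col2), 2)
       / (REGR_SXX(col1, col2) * REGR_SYY(col1, col2)) AS Manual_R2
FROM Stats_Table ;
```

All three columns should agree up to rounding, at about 0.97 for col1 and col2.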


The REGR_SXX Function

Syntax for using REGR_SXX: REGR_SXX(Y, X)

SELECT REGR_SXX(col1, col2) AS RSCol1_2 FROM Stats_Table;

RSCol1_2
_________
578.67

The REGR_SXX is the sum(X^2) - sum(X)^2/N ("sum of squares" of the independent variable).


A REGR_SXX Example

The REGR_SXX is the sum(X^2) - sum(X)^2/N ("sum of squares" of the independent variable).

SELECT REGR_SXX(col1, col2) AS C1_2
      ,REGR_SXX(col1, col3) AS C1_3
      ,REGR_SXX(col1, col4) AS C1_4
      ,REGR_SXX(col1, col5) AS C1_5
      ,REGR_SXX(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2      C1_3       C1_4      C1_5      C1_6
______    _______    ______    ______    ________
578.67    5731.87    2247.5    587.37    21684.17
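The formula sum(X^2) - sum(X)^2/N can be spelled out with ordinary SUM and COUNT. The sketch below, assuming the same Stats_Table, does that for col2 and sets it beside REGR_SXX. The alias names are made up for illustration, and the CAST is there only to avoid integer division if the columns happen to be integers.

```sql
-- Sketch only: REGR_SXX(Y, X) versus the written-out formula for X = col2.
-- Assumes Stats_Table from this chapter; the aliases are illustrative.
SELECT REGR_SXX(col1, col2)                            AS Regr_Sxx
      ,SUM(col2 * col2)
       - CAST(SUM(col2) * SUM(col2) AS DOUBLE PRECISION)
         / COUNT(*)                                    AS Manual_Sxx
FROM Stats_Table
WHERE col1 IS NOT NULL    -- use the same rows REGR_SXX uses
  AND col2 IS NOT NULL ;
```

For the sample data this works out to 2832 - 260^2/30, so both columns should come back as roughly 578.67.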


The REGR_SXY Function

Syntax for using REGR_SXY: REGR_SXY(Y, X)

SELECT REGR_SXY(col1, col2) AS RSCol1_2 FROM Stats_Table;

RSCol1_2
_________
1125

The REGR_SXY is the sum(X*Y) - sum(X) * sum(Y)/N ("sum of products" of independent times dependent variable).


A REGR_SXY Example


The REGR_SXY is the sum(X*Y) - sum(X) * sum(Y)/N ("sum of products" of independent times dependent variable).

SELECT REGR_SXY(col1, col2) AS C1_2
      ,REGR_SXY(col1, col3) AS C1_3
      ,REGR_SXY(col1, col4) AS C1_4
      ,REGR_SXY(col1, col5) AS C1_5
      ,REGR_SXY(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2    C1_3    C1_4       C1_5      C1_6
____    ____    _______    ______    ______
1125    3177    -2247.5    -174.5    6922.5
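The "sum of products" is closely tied to the population covariance: dividing sum(X*Y) - sum(X) * sum(Y)/N by N gives COVAR_POP, so REGR_SXY should equal N times COVAR_POP. The sketch below, assuming the same Stats_Table, checks that relationship. The alias names are made up for illustration.

```sql
-- Sketch only: REGR_SXY(Y, X) versus N * COVAR_POP(Y, X).
-- Assumes Stats_Table from this chapter; the aliases are illustrative.
SELECT REGR_SXY(col1, col2)                            AS Regr_Sxy
      ,REGR_COUNT(col1, col2) * COVAR_POP(col1, col2)  AS N_Times_CovarPop
FROM Stats_Table ;
```

For col1 and col2 both columns should come back as about 1125, which is 30 rows times a population covariance of 37.5.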


The REGR_SYY Function

Syntax for using REGR_SYY: REGR_SYY(Y, X)

SELECT REGR_SYY(col1, col2) AS RSCol1_2 FROM Stats_Table;

RSCol1_2
_________
2247.5

The REGR_SYY is the sum(Y^2) - sum(Y)^2/N ("sum of squares" of the dependent variable).


A REGR_SYY Example

The REGR_SYY is the sum(Y^2) - sum(Y)^2/N ("sum of squares" of the dependent variable).

SELECT REGR_SYY(col1, col2) AS C1_2
      ,REGR_SYY(col1, col3) AS C1_3
      ,REGR_SYY(col1, col4) AS C1_4
      ,REGR_SYY(col1, col5) AS C1_5
      ,REGR_SYY(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2      C1_3      C1_4      C1_5      C1_6
______    ______    ______    ______    ______
2247.5    2247.5    2247.5    2247.5    2247.5
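Every column shows the same value because REGR_SYY only looks at the dependent variable, which is col1 in all five calls. It is also N times the population variance of that variable. The sketch below, assuming the same Stats_Table with no NULLs in these columns, checks both points. The alias names are made up for illustration.

```sql
-- Sketch only: REGR_SYY ignores which X is paired with Y,
-- and matches N * VAR_POP(Y) when no rows are dropped for NULLs.
-- Assumes Stats_Table from this chapter; the aliases are illustrative.
SELECT REGR_SYY(col1, col2)          AS Syy_1_2
      ,REGR_SYY(col1, col6)          AS Syy_1_6
      ,COUNT(col1) * VAR_POP(col1)   AS N_Times_VarPop
FROM Stats_Table ;
```

All three columns should agree up to rounding, at about 2247.5.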


Using GROUP BY

SELECT col3
      ,count(*)         AS Cnt
      ,avg(col1)        AS Avg1
      ,stddev_pop(col1) AS SD1
      ,var_pop(col1)    AS VP1
      ,avg(col4)        AS Avg4
      ,stddev_pop(col4) AS SD4
      ,var_pop(col4)    AS VP4
      ,avg(col6)        AS Avg6
      ,stddev_pop(col6) AS SD6
      ,var_pop(col6)    AS VP6
FROM Stats_Table
GROUP BY 1
ORDER BY 1;

Col3  Cnt   Avg1   SD1    VP1   Avg4   SD4    VP4    Avg6    SD6     VP6
____  ___  _____  ____  _____  _____  ____  _____  ______  _____  ______
   1    2   1.50  0.50   0.25  29.50  0.50   0.25    2.50   2.50    6.25
  10    7   6.00  2.00   4.00  25.00  2.00   4.00   24.29   8.63   74.49
  20   14  16.50  4.03  16.25  14.50  4.03  16.25   53.57  10.76  115.82
  30    2  24.50  0.50   0.25   6.50  0.50   0.25   75.00   5.00   25.00
  40    2  26.50  0.50   0.25   4.50  0.50   0.25   87.50   2.50    6.25
  50    2  28.50  0.50   0.25   2.50  0.50   0.25   92.50   2.50    6.25
  60    1  30.00  0.00   0.00   1.00  0.00   0.00  100.00   0.00    0.00
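The regression aggregates from this chapter are ordinary aggregates, so they can be grouped the same way. The sketch below, assuming the same Stats_Table, repeats the grouping on col3 using REGR_COUNT, REGR_AVGY and REGR_AVGX with col1 as the dependent and col6 as the independent variable. The alias names are reused from the query above purely for comparison.

```sql
-- Sketch only: regression aggregates computed once per col3 group.
-- Assumes Stats_Table from this chapter.
SELECT col3
      ,REGR_COUNT(col1, col6) AS Cnt    -- rows per group with both columns non-null
      ,REGR_AVGY(col1, col6)  AS Avg1   -- average of col1 within the group
      ,REGR_AVGX(col1, col6)  AS Avg6   -- average of col6 within the group
FROM Stats_Table
GROUP BY col3
ORDER BY col3 ;
```

Because the sample data has no NULLs, Cnt, Avg1 and Avg6 should line up with the same-named columns in the answer set above.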