Greenplum is the first open source data warehouse. Purchased and improved by EMC, sold to Dell, makes Greenplum one of t
English Pages 940 Year 2015
The Tera-Tom Video Series
Lessons with Tera-Tom: Teradata Architecture and SQL Video Series
These exciting videos make learning and certification much easier.
Three ways to view them:
1. Safari (look up Coffing Studios)
2. CoffingDW.com (sign up on our website)
3. Your company can buy them all for everyone to see (contact [email protected])
Current Books in the Tera-Tom Genius Series
Our Recommended Book In The Tera-Tom Genius Series
Tera-Tom: Author of over 75 Books
Tera-Tom books have been the primary source of Teradata learning for over 20 years. They have helped to teach millions of people all aspects of Teradata. What people love most about the Tera-Tom books is how easy they are to understand. They are so easy that a seven-year-old boy (raised by wolves) can understand them!
The Best Query Tool Works on all Systems
When you possess a tool like Nexus, you have access to every system in your enterprise! The Nexus Query Chameleon is the only tool that works on all systems. Its Super Join Builder allows for the ERwin Logical Model to be loaded, and then Nexus shows tables and views visually. It then guides users to show what joins to what. As users choose the tables and columns they want in their report, Nexus builds the SQL for them with each click of the mouse. Nexus was designed for Teradata and Hadoop, but works on all platforms. Nexus even converts table structures between vendors, so querying and managing multi-vendor platforms is transparent. Even if you only work with one system, you will find that the Nexus is the best query tool you have ever used. If you work with multiple systems, you will be even more amazed. Download a free trial at www.CoffingDW.com.
Trademarks and Copyrights
Microsoft Windows, Windows 2003 Server, SQL Server 2012, SQL Server Compact Edition, .NET, PDW, SQL Server, T-SQL, Azure SQL Data Warehouse and Azure Cloud are trademarks of Microsoft. Teradata, NCR, BYNET and SQL Assistant are registered trademarks of Teradata Corporation, Dayton, Ohio, U.S.A. IBM, DB2 and Netezza are registered trademarks of IBM Corporation. ANSI is a registered trademark of the American National Standards Institute. Ethernet is a trademark of Xerox. UNIX is a trademark of The Open Group. Linux is a trademark of Linus Torvalds. Java and Oracle are trademarks of Oracle. ParAccel is a trademark of ParAccel. Kognitio is a trademark of Kognitio. Greenplum is a trademark of EMC Corporation/Dell Corporation. Nexus Query Chameleon is a trademark of Coffing Data Warehousing.
Coffing Data Warehousing shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book or from the use of programs or program segments that are included. The manual is not a publication of EMC or Dell Corporation, nor was it produced in conjunction with EMC or Dell Corporation.
Copyright © November 2015 by Coffing Publishing
ISBN 978-1-940540-33-7
All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the publisher. No patent liability is assumed with respect to the use of information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, nor is any liability assumed for damages resulting from the use of information contained herein.
About Tom Coffing
Tom Coffing, better known as Tera-Tom, is the founder of Coffing Data Warehousing, where he has been CEO for the past 20 years. Tom has written over 50 books on all aspects of Teradata, Netezza, Kognitio, Redshift, ParAccel, Vertica, SQL Server, and Greenplum. Tom has taught over 1,000 Teradata classes in places such as India, Africa, Europe, China, Malaysia, and throughout North America.
Tom is also the owner and designer of the Nexus Query Chameleon, the most sophisticated enterprise query tool in the industry. The Nexus works on all platforms, including Hadoop, converts table structures between all systems, and allows companies to load their ERwin logical model inside Nexus. The Nexus guides users like a GPS system. Users point and click on any table or view from any system, and they are guided to what joins to what. As users choose the columns they want on their report, the SQL is built automatically.
In high school, Tom was the first athlete from his school to ever place at state. He was selected by his school to represent them at Buckeye Boys State, and Tom was inducted into the first class of the Lakota High School Hall of Fame. At the University of Arizona and University of Nevada Las Vegas, Tom was a two-time All-American wrestler, Sophomore Athlete of the Year, and a two-time winner of the 1980 Olympic wrestling trials. Tom graduated with a Bachelor’s degree in Speech Communications. After college, Tom became a state and national champion speech winner for Toastmasters and won two orchid awards as an actor.
Tom is the proud father of three wonderful children and has been married for the past 32 years. You can contact Tom at 513 300-0341 or at [email protected].
About Leona Coffing
Leona Coffing is the Co-founder and Chief Financial Officer of Coffing Data Warehousing. She has co-authored several books on data warehousing and has produced more than 75 CoffingDW books. Leona has always been a driving force in the success and growth of Coffing Data Warehousing. She has kept Coffing Data Warehousing an independent company without utilizing venture capital. Leona is also responsible for managing all of the CoffingDW Nexus programmers and has managed and mentored over 30 Coffing Data Warehousing employees. She has also been responsible for part of the design of Nexus and is credited with the idea of implementing Microsoft products inside Nexus. Leona is also a professional golf caddie on the women’s professional tour, a mother of three children, and a proud grandmother. Leona is a Phoenix, AZ native but now resides in Cincinnati, OH with her husband of 35 years, Tom.
Table of Contents
Contents Chapter 1 – Introduction to the Greenplum Architecture ............................................................................................. 2 What is Parallel Processing? ...................................................................................................................................... 3 The Basics of a Single Computer ............................................................................................................................... 4 Data in Memory is fast as Lightning .......................................................................................................................... 5 Parallel Processing Of Data ....................................................................................................................................... 6 Symmetric Multi-Processing (SMP) Server .............................................................................................................. 7 Commodity Hardware Servers are configured for Greenplum .................................................................................. 8 Commodity Hardware Allows For One Segment per CPU ....................................................................................... 9 The Master Host ....................................................................................................................................................... 10 The Segment's Responsibilities ................................................................................................................................ 11 The Host's Plan is Either All Segments or a Single Segment .................................................................................. 12 A Table has Columns and Rows............................................................................................................................... 
13 Greenplum has Linear Scalability ............................................................................................................................ 14 The Architecture of A Greenplum Data Warehouse................................................................................................. 15 Nexus is Now Available for Greenplum .................................................................................................................. 16 Chapter 2 – Greenplum Table Structures ................................................................................................................... 18 The Concepts of Greenplum Tables......................................................................................................................... 19 Tables are Either Distributed by Hash or Random .................................................................................................. 20 A Hash Distributed Table has A Distribution Key .................................................................................................. 21 Picking A Distribution Key That Is Not Very Unique ............................................................................................ 22 Random Distribution Uses a Round Robin Technique ............................................................................................ 23 Tables Will Be Distributed Among All Segments ................................................................................................... 24 The Default For Distribution Chooses the First Column ......................................................................................... 25 Table are Either a Heap or Append-Only ................................................................................................................ 26 Tables are Stored in Either Row or Columnar Format ............................................................................................ 27
Table of Contents Creating a Column Oriented Table .......................................................................................................................... 28 Comparing Normal Table vs. Columnar Tables ...................................................................................................... 29 Columnar can move just One Column Block Into Memory .................................................................................... 30 Segments on Distributions are aligned to Rebuild a Row ....................................................................................... 31 Columnar Tables Store Each Column in Separate Blocks ...................................................................................... 32 Visualize the Data – Rows vs. Columns .................................................................................................................. 33 Table Rows are Either Sorted or Unsorted .............................................................................................................. 34 Creating a Clustered Index in Order to Physically Sort Rows ................................................................................ 35 Physically Ordered Tables Are Faster on Certain Queries ...................................................................................... 36 Another Way to Create a Clustered Table ............................................................................................................... 37 Creating a B-Tree Index and then Running Analyze ............................................................................................... 38 Creating a Bitmap Index .......................................................................................................................................... 39 Why Create a Bitmap Index? ................................................................................................................................... 
40 Tables Can Be Partitioned........................................................................................................................................ 41 A Table Partitioned By Range (Per Month) ............................................................................................................ 42 A Visual of a Partitioned Table by Range (Month) ................................................................................................. 43 Tables Can Be Partitioned by Day ........................................................................................................................... 44 Visualize a Partitioned Table by Day ...................................................................................................................... 45 Creating a Partitioned Table Using a List ................................................................................................................ 46 Creating a Multi-Level Partitioned Table ................................................................................................................ 47 Changing a Table to a Partitioned Table.................................................................................................................. 48 Not Null Constraints ................................................................................................................................................ 49 Unique Constraints ................................................................................................................................................... 50 Unique Constraints That Fail ................................................................................................................................... 51 Primary Key Constraints .......................................................................................................................................... 
52 A Primary Key Automatically Creates a Unique Index .......................................................................................... 53 Check Constraints .................................................................................................................................................... 54 Creating an Automatic Number Called a Sequence ................................................................................................ 55 Multiple INSERT example using a Sequence ......................................................................................................... 56
Table of Contents
Chapter 3 – Hashing and Data Distribution ................................................................................................................ 58 Distribution Keys Hashed on Unique Values Spread Evenly.................................................................................. 59 Distribution Keys with Non-Unique Values Spread Unevenly ............................................................................... 60 Best Practices for Choosing a Distribution Key ...................................................................................................... 61 The Hash Map Determines which Segment owns the Row..................................................................................... 62 The Hash Map Determines which Node will own the Row .................................................................................... 63 The Hash Map Determines which Node will own the Row .................................................................................... 64 The Hash Map Determines which Node will own the Row .................................................................................... 65 Hash Map Determines which Node will own the Row ........................................................................................... 66 A Review of the Hashing Process ............................................................................................................................ 67 Non-Unique Distribution Keys have Skewed Data ................................................................................................. 68 Non-Unique Distribution Keys have Skewed Data ................................................................................................. 69 Chapter 4 – The Technical Details.............................................................................................................................. 
71 Greenplum Limitations ............................................................................................................................................ 72 Every Segment has the Exact Same Tables ............................................................................................................. 73 Tables are Distributed across All Segments ............................................................................................................ 74 The Table Header and the Data Rows are Stored Separately .................................................................................. 75 Segments Store Rows inside a Data Block Called a Page ....................................................................................... 76 To Read a Data Block a Node Moves the Block into Memory ............................................................................... 77 A Full Table Scan Means All Nodes Must Read All Rows..................................................................................... 78 Rows are Organized inside a Page ........................................................................................................................... 79 Moving Data Blocks is Like Checking In Luggage................................................................................................. 80 As Row-Based Tables Get Bigger, the Page Splits ................................................................................................. 81 Data Pages are Processed One at a Time per Unit ................................................................................................... 82 Creating a Table that is a Heap ................................................................................................................................ 83 Heap Page ................................................................................................................................................................. 
84 Creating a Table that has a Clustered Index ............................................................................................................ 85
Table of Contents Clustered Index Page................................................................................................................................................ 86 The Row Offset Array is the Guidance System for Every Row .............................................................................. 87 The Row Offset Array Provides Two Search Options (1 of 2) ............................................................................... 88 The Row Offset Array Provides Two Search Options (2 of 2) ............................................................................... 89 The Row Offset Array Helps With Inserts .............................................................................................................. 90 B-Trees ..................................................................................................................................................................... 91 The Building of a B-Tree for a Clustered Index (1 of 3) ......................................................................................... 92 The Building of a B-Tree for a Clustered Index (2 of 3) ......................................................................................... 93 The Building of a B-Tree for a Clustered Index (3 of 3) ......................................................................................... 94 When Do I Create a Clustered Index? ..................................................................................................................... 95 When Do I Create a Non Clustered Index? ............................................................................................................. 96 B-Tree for Non Clustered Index on a Clustered Table (1 of 2) ............................................................................... 97 B-Tree for Non Clustered Index on a Clustered Table (2 of 2) ............................................................................... 
98 Adding a Non Clustered Index To A ....................................................................................................................... 99 B-Tree for Non Clustered Index on a Heap Table (1 of 2) .................................................................................... 100 B-Tree for Non Clustered Index on a Heap Table (2 of 2) .................................................................................... 101 Chapter 5 – Physical Database Design .................................................................................................................... 103 The Four Stages of Modeling for Greenplum ........................................................................................................ 104 The Logical Model ................................................................................................................................................. 105 The Logical Model can be loaded inside Nexus .................................................................................................... 106 First, Second and Third Normal Form ................................................................................................................... 107 Quiz – Choose that Normalization Technique ....................................................................................................... 108 Answer to Quiz – Choose that Normalization Technique ..................................................................................... 109 Quiz – What Normalization is it now?................................................................................................................... 110 Answer to Quiz – What Normalization is it now? ................................................................................................. 111 The Employee_Table and Department_Table can be joined ................................................................................. 
112 The Employee_Table and Department_Table Join SQL ....................................................................................... 113 The Extended Logical Model Template................................................................................................................. 114
Table of Contents User Access is of Great Importance ....................................................................................................................... 115 User Access in Layman’s Terms ........................................................................................................................... 116 User Access for Joins in Layman’s Terms ............................................................................................................ 117 The Nexus Shows Users the Table’s Distribution Key ......................................................................................... 118 Data Demographics Tell Us if the Column is Worthy........................................................................................... 119 Data Demographics – Distinct Rows ..................................................................................................................... 120 Data Demographics – Distinct Rows Query .......................................................................................................... 121 Data Demographics – Max Rows Null .................................................................................................................. 122 Data Demographics – Max Rows Null Query ....................................................................................................... 123 Data Demographics – Max Rows Per Value ......................................................................................................... 124 Data Demographics – Max Rows Per Value ......................................................................................................... 125 Data Demographics – Typical Rows Per Value .................................................................................................... 126 Typical Rows Per Value Query For Greenplum Systems ..................................................................................... 
127 SQL to Get the Average Rows Per Value for a Column (Mean) .......................................................................... 128 Data Demographics – Change Rating .................................................................................................................... 129 Factors When Choosing Greenplum Indexes ........................................................................................................ 130 Distribution Key Data Demographics Candidate Guidelines ................................................................................ 131 Distribution key Access Considerations ................................................................................................................ 132 Answer -Three Important distribution key Considerations ................................................................................... 133 Step 1 is to Pick All Potential Distribution Key Columns ..................................................................................... 134 Step 1 is to Pick All Potential Distribution Key Columns ..................................................................................... 135 Step 2 is to Pick All Potential Secondary Indexes ................................................................................................. 136 Answer to 2nd Step to Picking Potential Secondary Indexes ................................................................................ 137 Choose the Distribution Key and Secondary Indexes............................................................................................ 138 3rd Step is to picking your Indexes ......................................................................................................................... 139 Our Index Picks ...................................................................................................................................................... 140
Table of Contents Chapter 6 – Denormalization ................................................................................................................................... 142 Denormalization ..................................................................................................................................................... 143 Derived Data .......................................................................................................................................................... 144 Repeating Groups ................................................................................................................................................... 145 Pre-Joining Tables .................................................................................................................................................. 146 Storing Summary Data with a Trigger ................................................................................................................... 147 Summary Tables or Data Marts the Old Way ........................................................................................................ 148 Horizontal Partitioning the Old Way ..................................................................................................................... 149 Horizontal Partitioning the New Way .................................................................................................................... 150 Vertical Partitioning the Old Way ......................................................................................................................... 151 Columnar Tables Are the New Vertical Partitioning ............................................................................................. 152 Chapter 7 - Nexus ..................................................................................................................................................... 
154 Nexus is Available on the Cloud............................................................................................................................ 155 Nexus Queries Every Major System ...................................................................................................................... 156 How to Use Nexus ................................................................................................................................................. 157 Why is Nexus Special? Visualization and Automatic SQL ................................................................................... 158 Why is Nexus Special? Cross-System Joins .......................................................................................................... 159 Why is Nexus Special? The Amazing Hub System ............................................................................................... 160 Why is Nexus Special? Save Answer Sets as Tables ............................................................................................ 161 Why is Nexus Special? Automated Data Movement ............................................................................................. 162 Why is Nexus Special? Nexus makes the Servers Talk Directly .......................................................................... 163 What Makes Nexus Special? The Garden of Analysis .......................................................................................... 164 The Garden of Analysis Grouping Sets Tab .......................................................................................................... 165 The Garden of Analysis - Grouping Sets Answer Sets .......................................................................................... 166 The Garden of Analysis – Join Tab (1 of 4) .......................................................................................................... 
167 The Garden of Analysis – Join Tab (2 of 4) .......................................................................................................... 168 The Garden of Analysis – Join Tab (3 of 4) .......................................................................................................... 169 The Garden of Analysis – Join Tab (4 of 4) .......................................................................................................... 170
The Garden of Analysis – Charts/Graphs Tab (1 of 4) .... 171
The Garden of Analysis – Charts/Graphs Tab (2 of 4) .... 172
The Garden of Analysis – Charts/Graphs Tab (3 of 4) .... 173
The Garden of Analysis – Charts/Graphs Tab (4 of 4) .... 174
The Garden of Analysis – Dynamic Charts Tab (1 of 4) .... 175
The Garden of Analysis – Dynamic Charts Tab (2 of 4) .... 176
The Garden of Analysis – Dynamic Charts Tab (3 of 4) .... 177
The Garden of Analysis – Dynamic Charts Tab (4 of 4) .... 178
The Garden of Analysis – Dashboard Tab (1 of 5) .... 179
The Garden of Analysis – Dynamic Charts Tab (2 of 5) .... 180
The Garden of Analysis – Dynamic Charts Tab (3 of 5) .... 181
The Garden of Analysis – Dynamic Charts Tab (4 of 5) .... 182
The Garden of Analysis – Dynamic Charts Tab (5 of 5) .... 183
Getting to the Super Join Builder .... 184
The Super Join Builder is the First Entry in the Menu .... 185
The Super Join Builder Shows Tables Visually .... 186
Using the Add Join Button .... 187
What to Do When No Tables are Joinable? .... 188
Drag a Joinable Object into the Super Join Builder .... 189
You Will See the Add Custom Join Window .... 190
Defining the Join Columns .... 191
Your Tables Will Appear Together .... 192
Select the Columns You Want on the Report .... 193
Check out the SQL Tab to See the SQL that has been built .... 194
SQL Tab .... 195
Hit Execute to get the Report inside the Super Join Builder .... 196
The Report is delivered inside the Super Join Builder .... 197
Let's Join Two Tables Again (1 of 6) .... 198
Let's Join Two Tables Again (2 of 6) .... 199
Let's Join Two Tables Again (3 of 6) .... 200
Let's Join Two Tables Again (4 of 6) .... 201
Let's Join Two Tables Again (5 of 6) .... 202
Let's Join Two Tables Again (6 of 6) .... 203
The Tabs of the Super Join Builder Philosophy – One Query .... 204
The Tabs of the Super Join Builder – Objects Tab .... 205
The Tabs of the Super Join Builder – Columns Tab .... 206
The Tabs of the Super Join Builder – Sorting Tab .... 207
The Tabs of the Super Join Builder – Joins Tab .... 208
The Tabs of the Super Join Builder – SQL Tab .... 209
The Tabs of the Super Join Builder – Metadata Tab .... 210
The Tabs of the Super Join Builder – Analytics Tab .... 211
The Tabs of the SJB – Analytics Tab – OLAP Screen .... 212
Getting a Simple CSUM in the Analytics Tab – OLAP .... 213
Getting a Simple CSUM – The SQL Automatically Generated .... 214
The Answer Set of the CSUM .... 215
Getting all of the OLAP functions in the Analytics Tab .... 216
A Five Table Join Using the Menu .... 217
The First Table is placed in the Super Join Builder .... 218
Using the Add Join Cascading Menu .... 219
All Five Tables Are In the Super Join Builder .... 220
A Five Table Join Two Steps (Cube) .... 221
Choose Cube with Columns from the Left Top of the Table .... 222
All Tables are Cubed (Joined Together Instantly) .... 223
Choose Cube and then Choose Your Columns .... 224
Create Cube - Tables Are Joined Without Columns Selected .... 225
Create Cube – Select the Columns You Want on the Report .... 226
How to join Greenplum, Oracle and SQL Server Tables .... 227
The Greenplum Table is now in the Super Join Builder .... 228
Drag the Joining Oracle Table to the Super Join Builder .... 229
Defining the Join Columns .... 230
Choose the Columns You Want on Your Report .... 231
Let's Add a SQL Server Table to our Teradata and Oracle Join .... 232
Defining the Join Columns .... 233
All Three Tables are now in the Super Join Builder .... 234
Change the Hub and Run the Join on Oracle .... 235
Change the Hub and Run the Join on SQL Server .... 236
Simply Amazing - Change the Hub to the Garden of Analysis .... 237
Have the Answer Set Saved Automatically to Any System .... 238
Saving the Answer Set to an Oracle or SQL Server System .... 239
Saving the Answer Set to a Greenplum System .... 240
Saving the Answer Set to a Teradata System .... 241
Chapter 8 – The Basics of SQL .... 243
Introduction .... 244
SELECT * (All Columns) in a Table .... 245
Fully Qualifying a Database, Schema and Table .... 246
SELECT Specific Columns in a Table .... 247
Commas in the Front or Back? .... 248
Place your Commas in front for better Debugging Capabilities .... 249
Sort the Data with the ORDER BY Keyword .... 250
ORDER BY Defaults to Ascending .... 251
Use the Name or the Number in your ORDER BY Statement .... 252
Two Examples of ORDER BY using Different Techniques .... 253
Changing the ORDER BY to Descending Order .... 254
NULL Values sort First in Ascending Mode (Default) .... 255
NULL Values sort Last in Descending Mode (DESC) .... 256
Major Sort vs. Minor Sorts .... 257
Multiple Sort Keys using Names vs. Numbers .... 258
Sorts are Alphabetical, NOT Logical .... 259
Using A CASE Statement to Sort Logically .... 260
How to ALIAS a Column Name .... 261
A Missing Comma can by Mistake become an Alias .... 262
Comments using Double Dashes are Single Line Comments .... 263
Comments for Multi-Lines .... 264
Comments for Multi-Lines as Double Dashes Per Line .... 265
A Great Technique for Comments to Look for SQL Errors .... 266
Chapter 9 – The WHERE Clause .... 268
The WHERE Clause limits Returning Rows .... 269
Double Quoted Aliases are for Reserved Words and Spaces .... 270
Character Data needs Single Quotes in the WHERE Clause .... 271
Character Data needs Single Quotes, but Numbers Don’t .... 272
Comparisons against a Null Value .... 273
NULL means UNKNOWN DATA so Equal (=) won’t Work .... 274
Use IS NULL or IS NOT NULL when dealing with NULLs .... 275
NULL is UNKNOWN DATA so NOT Equal won’t Work .... 276
Use IS NULL or IS NOT NULL when dealing with NULLs .... 277
Using Greater Than or Equal To (>=) .... 278
AND in the WHERE Clause .... 279
Troubleshooting AND .... 280
OR in the WHERE Clause .... 281
Troubleshooting Or .... 282
Troubleshooting Character Data .... 283
Using Different Columns in an AND Statement .... 284
Quiz – How many rows will return? .... 285
Answer to Quiz – How many rows will return? .... 286
What is the Order of Precedence? .... 287
Using Parentheses to change the Order of Precedence .... 288
Using an IN List in place of OR .... 289
The IN List is an Excellent Technique .... 290
IN List vs. OR brings the same Results .... 291
The IN List Can Use Character Data .... 292
Using a NOT IN List .... 293
Null Values in a NOT IN List Bring Back No Rows .... 294
A Technique for Handling Nulls with a NOT IN List .... 295
BETWEEN is Inclusive .... 296
NOT BETWEEN is Also Inclusive .... 297
LIKE uses Wildcards Percent ‘%’ and Underscore ‘_’ .... 298
LIKE command Underscore is Wildcard for one Character .... 299
The ilike Command .... 300
LIKE Command Works Differently on Char Vs Varchar .... 301
Troubleshooting LIKE Command on Character Data .... 302
Introducing the TRIM Command .... 303
Introducing the RTRIM Command .... 304
Quiz – What Data is Left Justified and what is Right? .... 305
Numbers are Right Justified and Character Data is Left .... 306
Answer – What Data is Left Justified and what is Right? .... 307
An example of Data with Left and Right Justification .... 308
A Visual of CHARACTER Data vs. VARCHAR Data .... 309
Use the TRIM command to remove spaces on CHAR Data .... 310
Escape Character in the LIKE Command changes Wildcards .... 311
Escape Characters Turn off Wildcards in the LIKE Command .... 312
Quiz – Turn off that Wildcard .... 313
ANSWER – To Find that Wildcard .... 314
Introducing the RTRIM Command .... 315
Quiz – What Data is Left Justified and What is Right? .... 316
Numbers are Right Justified and Character Data is Left .... 317
Answer – What Data is Left Justified and what is Right? .... 318
An example of Data with Left and Right Justification .... 319
A Visual of CHARACTER Data vs. VARCHAR Data .... 320
RTRIM command Removes Trailing spaces on CHAR Data .... 321
Using Like with an AND Clause to Find Multiple Letters .... 322
Using Like with an OR Clause to Find Either Letters .... 323
Chapter 10 – Distinct vs. Group By .... 325
The Distinct Command .... 326
Distinct vs. GROUP BY .... 327
Quiz – How many rows come back from the Distinct? .... 328
Answer – How many rows come back from the Distinct? .... 329
Chapter 11 – Aggregation .... 331
Quiz – You calculate the Answer Set in your own Mind .... 332
Answer – You calculate the Answer Set in your own Mind .... 333
Quiz – You calculate the Answer Set in your own Mind .... 334
Answer – You calculate the Answer Set in your own Mind .... 335
The 3 Rules of Aggregation .... 336
There are Five Aggregates .... 337
Quiz – How many rows come back? .... 338
Answer – How many rows come back? .... 339
Troubleshooting Aggregates .... 340
GROUP BY when Aggregates and Normal Columns Mix .... 341
GROUP BY delivers one row per Group .... 342
GROUP BY Dept_No or GROUP BY 1 the same thing .... 343
Limiting Rows and Improving Performance with WHERE .... 344
WHERE Clause in Aggregation limits unneeded Calculations .... 345
Keyword HAVING tests Aggregates after they are totaled .... 346
Aggregates Return Null on Empty Tables .... 347
Keyword HAVING is like an Extra WHERE Clause for Totals .... 348
Keyword HAVING tests Aggregates after they are totaled .... 349
Getting the Average Values per Column .... 350
Three types of Advanced Grouping .... 351
Group By Grouping Sets .... 352
Group By Rollup .... 353
GROUP BY Rollup Result Set .... 354
GROUP BY Cube .... 355
GROUP BY CUBE Result Set .... 356
GROUP BY CUBE Result Set .... 357
Quiz - GROUP BY GROUPING SETS Challenge .... 358
Answer To Quiz - GROUP BY GROUPING SETS Challenge .... 359
Chapter 12 – Join Functions .... 361
Greenplum Join Quiz .... 362
Greenplum Join Quiz Answer .... 363
Redistribution .... 364
Big Table Small Table Join Strategy .... 365
Duplication of the Smaller Table across All-Distributions .... 366
If the Join Condition is the Distribution Key no Movement .... 367
Matching Rows That Are On The Same Node Naturally .... 368
What if the Join Condition Columns are Not distribution keys .... 369
Strategy 1 of 4 – The Merge Join .... 370
Quiz – Redistribute the Employees by their Dept_No .... 371
Quiz – Employees' Dept_No landed on segment with Matches .... 372
Quiz – Redistribute the Orders to the Proper segment .... 373
Answer to Redistribute the Employees by their Dept_No Quiz .... 374
Strategy 2 of 4 – The Hash Join .... 375
Strategy 3 of 4 – The Nested Join .... 376
Strategy 4 of 4 – The Product Join .... 377
A Two-Table Join Using Traditional Syntax .... 378
A two-table join using Non-ANSI Syntax with Table Alias .... 379
You Can Fully Qualify All Columns .... 380
A two-table join using ANSI Syntax .... 381
Both Queries have the same Results and Performance .... 382
Quiz – Can You Finish the Join Syntax? .... 383
Answer to Quiz – Can You Finish the Join Syntax? .... 384
Quiz – Can You Find the Error? .... 385
Answer to Quiz – Can You Find the Error? .... 386
Super Quiz – Can You Find the Difficult Error? .... 387
Answer to Super Quiz – Can You Find the Difficult Error? .... 388
Quiz – Which rows from both tables won’t return? .... 389
Answer to Quiz – Which rows from both tables won’t return? .... 390
LEFT OUTER JOIN .... 391
LEFT OUTER JOIN Results .... 392
RIGHT OUTER JOIN .... 393
RIGHT OUTER JOIN Example and Results .... 394
FULL OUTER JOIN .... 395
FULL OUTER JOIN Results .... 396
Which Tables are the Left and which Tables are Right? .... 397
Answer - Which Tables are the Left and Which are the Right? .... 398
INNER JOIN with Additional AND Clause .... 399
ANSI INNER JOIN with Additional AND Clause .... 400
ANSI INNER JOIN with Additional WHERE Clause .... 401
OUTER JOIN with Additional WHERE Clause .... 402
OUTER JOIN with Additional AND Clause
OUTER JOIN with Additional AND Clause Results
Quiz – Why is this considered an INNER JOIN?
Evaluation Order for Outer Queries
The DREADED Product Join
The DREADED Product Join Results
The Horrifying Cartesian Product Join
The ANSI Cartesian Join will ERROR
Quiz – Do these Joins Return the Same Answer Set?
Answer – Do these Joins Return the Same Answer Set?
The CROSS JOIN
The CROSS JOIN Answer Set
The Self Join
The Self Join with ANSI Syntax
Quiz – Will both queries bring back the same Answer Set?
Answer – Will both queries bring back the same Answer Set?
Quiz – Will both queries bring back the same Answer Set?
Answer – Will both queries bring back the same Answer Set?
How would you Join these two tables?
An Associative Table is a Bridge that Joins Two Tables
Quiz – Can you write the 3-Table Join?
Answer to Quiz – Can you Write the 3-Table Join?
Quiz – Can you write the 3-Table Join to ANSI Syntax?
Answer – Can you Write the 3-Table Join to ANSI Syntax?
Quiz – Can you Place the ON Clauses at the End?
Answer – Can you Place the ON Clauses at the End?
The 5-Table Join – Logical Insurance Model
Quiz - Write a Five Table Join Using ANSI Syntax
Answer - Write a Five Table Join Using ANSI Syntax
Quiz - Write a Five Table Join Using Non-ANSI Syntax
Answer - Write a Five Table Join Using Non-ANSI Syntax
Quiz – Re-Write this putting the ON clauses at the END
Answer – Re-Write this putting the ON clauses at the END
Chapter 13 – Date Functions
Current_Date
Current_Date and Current_Time
Current_Date and Current_Timestamp
The Many Different Ways to Look at a Timestamp
Current_Time vs. LocalTime with Precision
Local_Time and Local_Timestamp With Precision
Now () and Timeofday () Functions
Adding A Week to a Date
Add or Subtract Days from a date
Formatting Dates and Dollar Amounts
The EXTRACT Command
EXTRACT from DATES and TIME
EXTRACT Command on the Century
EXTRACT Command for the Decade, DOW and DOY
EXTRACT Microseconds, Milliseconds and Millennium
EXTRACT of the Month on Aggregate Queries
Date_part Command
Date_Trunc Command with Time
Date_Trunc Command with Dates
The AGE Command
AGE Challenge
AGE Challenge Results
Epoch
Using Intervals
More Interval Examples
Interval Arithmetic Results
A Complex Time Interval example using CAST
The OVERLAPS Command
An OVERLAPS example that Returns No Rows
The OVERLAPS Command using TIME
Using both CAST and CONVERT in Literal Values
A Better Technique for YEAR, MONTH, and DAY Functions
Chapter 14 – Conversions and Formatting
Postgres Conversion Functions
Postgres Conversion Function Templates
Postgres Conversion Function Templates Continued
To_Char command Examples
Formatting A Date with To_Char
Formatting A Date With To_Char Continued
To_Number
To_Number Examples
To_Date
To_Timestamp
Numeric Manipulation Functions
Finding the Cube Root
Ceiling Gets the Smallest Integer Not Smaller Than X
Floor Finds the Largest Integer Not Greater Than X
The Round Function and Precision
Chapter 15 – Sub-query Functions
An IN List is much like a Subquery
An IN List Never has Duplicates – Just like a Subquery
An IN List Ignores Duplicates
The Subquery
The Three Steps of How a Basic Subquery Works
These are Equivalent Queries
The Final Answer Set from the Subquery
Quiz- Answer the Difficult Question
Answer to Quiz- Answer the Difficult Question
Should you use a Subquery or a Join?
Quiz- Write the Subquery
Answer to Quiz- Write the Subquery
Quiz- Write the More Difficult Subquery
Answer to Quiz- Write the More Difficult Subquery
Quiz – Write the Extreme Subquery
Answer to Quiz – Write the Extreme Subquery
Quiz- Write the Subquery with an Aggregate
Answer to Quiz- Write the Subquery with an Aggregate
Quiz- Write the Correlated Subquery
Answer to Quiz- Write the Correlated Subquery
The Basics of a Correlated Subquery
The Top Query always runs first in a Correlated Subquery
Correlated Subquery Example vs. a Join with a Derived Table
Quiz- A Second Chance to Write a Correlated Subquery
Answer - A Second Chance to Write a Correlated Subquery
Quiz- A Third Chance to Write a Correlated Subquery
Answer - A Third Chance to Write a Correlated Subquery
Quiz- Last Chance To Write a Correlated Subquery
Answer – Last Chance to Write a Correlated Subquery
Quiz – Write the Extreme Correlated Subquery
Answer To Quiz – Write the Extreme Correlated Subquery
Quiz- Write the NOT Subquery
Answer to Quiz- Write the NOT Subquery
Quiz- Write the Subquery using a WHERE Clause
Answer - Write the Subquery using a WHERE Clause
Quiz- Write the Subquery with Two Parameters
Answer to Quiz- Write the Subquery with Two Parameters
How the Double Parameter Subquery Works
More on how the Double Parameter Subquery Works
Quiz – Write the Triple Subquery
Answer to Quiz – Write the Triple Subquery
Quiz – How many rows return on a NOT IN with a NULL?
Answer – How many rows return on a NOT IN with a NULL?
How to handle a NOT IN with Potential NULL Values
IN is equivalent to =ANY
Using a Correlated Exists
How a Correlated Exists matches up
The Correlated NOT Exists
Quiz – How many rows come back from this NOT Exists?
Answer – How many rows come back from this NOT Exists?
Chapter 16 – OLAP Functions
CSUM
CSUM – The Sort Explained
CSUM – Rows Unbounded Preceding Explained
CSUM – Making Sense of the Data
CSUM – Making Even More Sense of the Data
CSUM – The Major and Minor Sort Key(s)
The ANSI CSUM – Getting a Sequential Number
Troubleshooting The ANSI OLAP on a GROUP BY
Reset with a PARTITION BY Statement
PARTITION BY only Resets a Single OLAP not ALL of them
Moving SUM
ANSI Moving Window is Current Row and Preceding n Rows
How ANSI Moving SUM Handles the Sort
Quiz – How is that Total Calculated?
Answer to Quiz – How is that Total Calculated?
Moving SUM every 3-rows Vs a Continuous Average
Partition By Resets an ANSI OLAP
Both the Greenplum Moving Average and ANSI Version
Moving Average
The Moving Window is Current Row and Preceding
How Moving Average Handles the Sort
Quiz – How is that Total Calculated?
Answer to Quiz – How is that Total Calculated?
Quiz – How is that 4th Row Calculated?
Answer to Quiz – How is that 4th Row Calculated?
Moving Average every 3-rows Vs a Continuous Average
Partition By Resets an ANSI OLAP
Moving Difference using ANSI Syntax with Partition By
RANK Defaults to Ascending Order
Getting RANK to Sort in DESC Order
RANK OVER and PARTITION BY
RANK and DENSE RANK
PERCENT_RANK OVER
PERCENT_RANK OVER with 14 rows in Calculation
PERCENT_RANK OVER with 21 rows in Calculation
Quiz – What Causes the Product_ID to Reset?
Answer to Quiz – What Causes the Product_ID to Reset?
COUNT OVER for a Sequential Number
Troubleshooting COUNT OVER
Quiz – What caused the COUNT OVER to Reset?
Answer to Quiz – What caused the COUNT OVER to Reset?
The MAX OVER Command
MAX OVER with PARTITION BY Reset
Troubleshooting MAX OVER
The MIN OVER Command
Troubleshooting MIN OVER
Finding a Value of a Column in the Next Row with MIN
Quiz – Fill in the Blank
Answer – Fill in the Blank
The Row_Number Command
Using a Derived Table and Row_Number
Quiz – How did the Row_Number Reset?
Answer – How did the Row_Number Reset?
Ordered Analytics OVER
CURRENT ROW AND UNBOUNDED FOLLOWING
Different Windowing Options
The CSUM for Each Product_Id and the Next Start Date
How Ntile Works
Ntile
Ntile Continued
Ntile Percentile
Another Ntile example
Using Quartiles (Partitions of Four)
NTILE
NTILE Using a Value of 10
NTILE With a Partition
Using FIRST_VALUE
FIRST_VALUE
FIRST_VALUE after Sorting by the Highest Value
FIRST_VALUE with Partitioning
Using LAST_VALUE
LAST_VALUE
Using LEAD
Using LEAD With an Offset of 2
LEAD
LEAD With Partitioning
Using LAG
617 Using LAG with an Offset of 2 .............................................................................................................................. 618 LAG ........................................................................................................................................................................ 619 LAG with Partitioning............................................................................................................................................ 620 CUME_DIST ......................................................................................................................................................... 621 CUME_DIST with a Partition................................................................................................................................ 622 SUM (SUM(n)) ...................................................................................................................................................... 623 Chapter 17 – Temporary Tables ............................................................................................................................... 625 There are Two Types of Temporary Tables .......................................................................................................... 626 CREATING A Derived Table................................................................................................................................ 627 Naming the Derived Table ..................................................................................................................................... 628 Aliasing the Column Names in the Derived Table ................................................................................................ 629 Multiple Ways to Alias the Columns in a Derived Table ....................................................................................... 
630 CREATING a Derived Table using the WITH Command ..................................................................................... 631 The Same Derived Query shown Three Different Ways ....................................................................................... 632 Most Derived Tables Are Used To Join To Other Tables ..................................................................................... 633 The Three Components of a Derived Table ........................................................................................................... 634
Visualize This Derived Table ... 635
A Derived Table and CAST Statements ... 636
A Derived example Using the WITH Syntax ... 637
Quiz - Answer the Questions ... 638
Answer to Quiz - Answer the Questions ... 639
Clever Tricks on Aliasing Columns in a Derived Table ... 640
An example of Two Derived Tables in a Single Query ... 641
MULTIPLE Derived Tables using the WITH Command ... 642
Finding the First Occurrence ... 643
Finding the Last Occurrence ... 644
Three Steps to Creating a Temporary Table ... 645
Three Versions of Creating a Temporary Table ... 646
ON COMMIT PRESERVE ROWS is the Greenplum Default ... 647
ON COMMIT DELETE ROWS ... 648
How to Use the ON COMMIT DELETE ROWS Option ... 649
ON COMMIT DROP ... 650
How to Use the ON COMMIT DROP Option ... 651
Create Table AS ... 652
Creating a Temporary Table Using a CTAS that Joins Multiple Tables ... 653
Create Table LIKE ... 654
Creating a Clustered Index on a Temporary Table ... 655

Chapter 18 – Character Strings ... 657
The LENGTH Command Counts Characters ... 658
The LENGTH Command – Spaces can Count too ... 659
The LENGTH Command Doesn't Count Trailing Spaces ... 660
UPPER and LOWER Commands ... 661
Using the LOWER Command ... 662
A LOWER Command Example ... 663
Using the UPPER Command ... 664
An UPPER Command Example ... 665
Non-Letters are Unaffected by UPPER and LOWER ... 666
The CHARACTERS Command Counts Characters ... 667
The CHARACTERS Command and Character Data ... 668
CHARACTER_LENGTH and OCTET_LENGTH ... 669
The TRIM Command trims both Leading and Trailing Spaces ... 670
Trim Combined with the CHARACTERS Command ... 671
How to TRIM only the Trailing Spaces ... 672
REGEXP_REPLACE ... 673
Concatenation ... 674
A Visual of the TRIM Command Using Concatenation ... 675
Trim and Trailing is Case Sensitive ... 676
How to TRIM Trailing Letters ... 677
The SUBSTRING Command ... 678
SUBSTRING and SUBSTR are equal, but use different syntax ... 679
How SUBSTRING Works with NO ENDING POSITION ... 680
Using SUBSTRING to move backwards ... 681
How SUBSTRING Works with a Starting Position of -1 ... 682
How SUBSTRING Works with an Ending Position of 0 ... 683
An example using SUBSTRING, TRIM and CHAR Together ... 684
The POSITION Command finds a Letter's Position ... 685
Concatenation ... 686
Concatenation and SUBSTRING ... 687
Four Concatenations Together ... 688
Troubleshooting Concatenation ... 689
Chapter 19 – Interrogating the Data ... 691
Quiz – What would the Answer be? ... 692
Answer to Quiz – What would the Answer be? ... 693
The NULLIF Command ... 694
Quiz – Fill in the Answers for the NULLIF Command ... 695
Answer – Fill in the Answers for the NULLIF Command ... 696
The COALESCE Command – Fill In the Answers ... 697
The COALESCE Answer Set ... 698
COALESCE is Equivalent to This CASE Statement ... 699
The COALESCE Command ... 700
The COALESCE Answer Set ... 701
The COALESCE Quiz ... 702
Answer - The COALESCE Quiz ... 703
The Basics of CAST (Convert and Store) ... 704
Some Great CAST (Convert and Store) Examples ... 705
Some Great CAST (Convert and Store) Examples ... 706
Some Great CAST (Convert and Store) example ... 707
Quiz - The Basics of the CASE Statements ... 708
Answer to Quiz - The Basics of the CASE Statements ... 709
Using an ELSE in the Case Statement ... 710
Using an ELSE as a Safety Net ... 711
Rules for a Valued Case Statement ... 712
Rules for a Searched Case Statement ... 713
Valued Case Vs. A Searched Case ... 714
Quiz - Valued Case Statement ... 715
Answer - Valued Case Statement ... 716
Quiz - Searched Case Statement ... 717
Answer - Searched Case Statement ... 718
The CASE Challenge ... 720
The CASE Challenge Answer ... 721
Combining Searched Case and Valued Case ... 722
A Trick for getting a Horizontal Case ... 723
Nested Case ... 724

Chapter 20 – Set Operators Functions ... 726
Rules of Set Operators ... 727
Rules of Set Operators ... 728
INTERSECT Explained Logically ... 729
INTERSECT Explained Logically ... 730
UNION Explained Logically ... 731
UNION Explained Logically ... 732
UNION ALL Explained Logically ... 733
UNION ALL Explained Logically ... 734
EXCEPT Explained Logically ... 735
EXCEPT Explained Logically ... 736
An Equal Amount of Columns in both SELECT List ... 737
Columns in the SELECT list should be from the same Domain ... 738
The Top Query handles all Aliases ... 739
The Bottom Query does the ORDER BY (a Number) ... 740
Great Trick: Place your Set Operator in a Derived Table ... 741
UNION Vs UNION ALL ... 742
Using UNION ALL and Literals ... 743
A Great example of how EXCEPT works ... 744
Quiz – Build that Query ... 745
Answer To Quiz – Build that Query ... 746
USING Multiple SET Operators in a Single Request ... 747
Changing the Order of Precedence with Parentheses ... 748
Using UNION ALL for speed in Merging Data Sets ... 749
Chapter 21 – View Functions ... 751
The Fundamentals of Views ... 752
Creating a Simple View to Restrict Sensitive Columns ... 753
Creating a Simple View to Restrict Rows ... 754
Basic Rules for Views ... 755
Exception to the ORDER BY Rule inside a View ... 756
Views sometimes CREATED for Formatting ... 757
Creating a View to Join Tables Together ... 758
Another Way to Alias Columns in a View CREATE ... 759
The Standard Way Most Aliasing is done ... 760
What Happens When Both Aliasing Options Are Present ... 761
Resolving Aliasing Problems in a View CREATE ... 762
Answer to Resolving Aliasing Problems in a View CREATE ... 763
Aggregates on View Aggregates ... 764
Altering A Table ... 765
Altering a Table after a View has been Created ... 766
A View that Errors after an ALTER ... 767

Chapter 22 – Table Create and Data Types ... 769
Greenplum Has Only Two Distribution Policies ... 770
Creating a Table with a Single Column Distribution Key ... 771
The Default Table Storage is a Heap ... 772
Creating a Table With a Multi-Column Distribution Key ... 773
Creating a Table with Random Distribution ... 774
Creating a Table with No Distribution Key ... 775
Guidelines for Partitioning a Table ... 776
Creating a Partitioned Table Using a Range ... 777
A Visual of One Year of Data with Range Partitioning ... 778
Creating a Partitioned Table Using a Range Per Day ... 779
A Visual of One Year of Data with Range per Day ... 780
Creating a Partitioned Table Using a List ... 781
Creating a Multi-Level Partitioned Table ... 782
Changing a Table to a Partitioned Table ... 783
Not Null Constraints ... 784
Unique Constraints ... 785
Primary Key Constraints ... 786
Check Constraints ... 787
Append Only Tables ... 788
Storage is Either Row, Column, or a Combination of Both ... 789
Column-Orientated Tables ... 790
CREATE INDEX Syntax ... 791
CREATE INDEX Syntax ... 792
Create Table LIKE ... 793
Greenplum Data Types ... 794
Greenplum Data Types Continued ... 795
Greenplum Data Types Continued ... 796
Greenplum Data Types Continued ... 797
Greenplum Data Types Continued ... 798

Chapter 23 – Data Manipulation Language (DML) ... 800
INSERT Syntax # 1 ... 801
INSERT example with Syntax 1 ... 802
INSERT Syntax # 2 ... 803
INSERT example with Syntax 2 ... 804
INSERT example with Syntax 3 ... 805
INSERT/SELECT Command ... 806
INSERT/SELECT example using All Columns (*) ... 807
INSERT/SELECT example with Less Columns ... 808
The UPDATE Command Basic Syntax ... 809
Two UPDATE Examples ... 810
Subquery UPDATE Command Syntax ... 811
Example of Subquery UPDATE Command ... 812
Join UPDATE Command Syntax ... 813
Example of an UPDATE Join Command ... 814
Fast UPDATE ... 815
The DELETE Command Basic Syntax ... 816
DELETE and TRUNCATE Examples ... 817
To DELETE or to TRUNCATE ... 818
Subquery and Join DELETE Command Syntax ... 819
Example of Subquery DELETE Command ... 820

Chapter 24 – ANALYZE and VACUUM ... 822
ANALYZE ... 823
ANALYZE Options ... 824
What Columns Should You Analyze? ... 825
Why Analyze? ... 826
VACUUM ... 827
VACUUM Options ... 828

Chapter 25 – Greenplum Explain ... 830
How to See an EXPLAIN Plan ... 831
The Eight Rules to Reading an EXPLAIN Plan ... 832
Interpreting Keywords in an EXPLAIN Plan ... 833
Interpreting an EXPLAIN Plan ... 834
A Single Segment Retrieve – The Fastest Query ... 835
EXPLAIN With an ORDER BY Statement ... 836
EXPLAIN ANALYZE ... 837
Table of Contents EXPLAIN With a Range Query on a Table Partitioned By Day........................................................................... 838 EXPLAIN That Uses a B-Tree Index Scan ........................................................................................................... 839 EXPLAIN That Uses a Bitmap Scan ..................................................................................................................... 840 EXPLAIN With a Simple Subquery ...................................................................................................................... 841 EXPLAIN With a Columnar Query ....................................................................................................................... 842 EXPLAIN With a Clustered Index ........................................................................................................................ 843 The Most Important Concept for Joins is the Distribution Key ............................................................................ 844 EXPLAIN With Join that has to Move Data ......................................................................................................... 845 EXPLAIN With Join that has to Move Data ......................................................................................................... 846 Changing the Join Query Changes the EXPLAIN Plan ........................................................................................ 847 Analyzing the Tables Structures For a 3-Table Join.............................................................................................. 848 An EXPLAIN For a 3-Table Join .......................................................................................................................... 849 Explain of a Derived Table vs. a Correlated Subquery ......................................................................................... 
850 Explain of the Correlated Subquery ....................................................................................................................... 851 Explain of the Derived Table ................................................................................................................................. 852 Chapter 26 – Statistical Aggregate Functions........................................................................................................... 854 The Stats Table ....................................................................................................................................................... 855 The STDDEV_POP Function ................................................................................................................................ 856 A STDDEV_POP Example ................................................................................................................................... 857 The STDDEV_SAMP Function............................................................................................................................. 858 A STDDEV_SAMP Example ................................................................................................................................ 859 The VAR_POP Function ....................................................................................................................................... 860 A VAR_POP Example ........................................................................................................................................... 861 The VAR_SAMP Function .................................................................................................................................... 862 A VAR_SAMP Example ....................................................................................................................................... 
863 The VARIANCE Function..................................................................................................................................... 864 A VARIANCE Example ........................................................................................................................................ 865 The CORR Function .............................................................................................................................................. 866
Table of Contents A CORR Example .................................................................................................................................................. 867 Another CORR Example so you can Compare...................................................................................................... 868 The COVAR_POP Function .................................................................................................................................. 869 A COVAR_POP Example ..................................................................................................................................... 870 Another COVAR_POP Example so you can Compare ......................................................................................... 871 The COVAR_SAMP Function .............................................................................................................................. 872 A COVAR_SAMP Example .................................................................................................................................. 873 Another COVAR_SAMP Example so you can Compare ..................................................................................... 874 The REGR_INTERCEPT Function ....................................................................................................................... 875 A REGR_INTERCEPT Example .......................................................................................................................... 876 Another REGR_INTERCEPT Example so you can Compare .............................................................................. 877 The REGR_SLOPE Function ................................................................................................................................ 878 A REGR_SLOPE Example .................................................................................................................................. 
879 Another REGR_SLOPE Example so you can Compare ....................................................................................... 880 The REGR_AVGX Function ................................................................................................................................. 881 A REGR_AVGX Example .................................................................................................................................. 882 Another REGR_AVGX Example so you can Compare ........................................................................................ 883 The REGR_AVGY Function ................................................................................................................................. 884 A REGR_AVGY Example .................................................................................................................................... 885 Another COVAR_POP Example so you can Compare ......................................................................................... 886 The REGR_COUNT Function ............................................................................................................................... 887 A REGR_COUNT Example .................................................................................................................................. 888 The REGR_R2 Function ........................................................................................................................................ 889 A REGR_R2 Example ........................................................................................................................................... 890 The REGR_SXX Function..................................................................................................................................... 891 A REGR_SXX Example ........................................................................................................................................ 
892 The REGR_SXY Function..................................................................................................................................... 893 A REGR_SXY Example ........................................................................................................................................ 894 The REGR_SYY Function..................................................................................................................................... 895
Table of Contents A REGR_SYY Example ........................................................................................................................................ 896 Using GROUP BY ................................................................................................................................................. 897
Chapter 1 – Introduction to the Greenplum Architecture
“The man who has no imagination has no wings.” – Muhammad Ali
What is Parallel Processing?
“After enlightenment, the laundry” - Zen Proverb
[Figure: Tera-Tom’s Parallel Processing Wash and Dry]
“After parallel processing the laundry, enlightenment!” - Greenplum Zen Proverb
Two guys were having fun on a Saturday night when one said, “I’ve got to go and do my laundry.” The other said, “What?!” The man explained that if he went to the laundromat the next morning, he would be lucky to get one machine and would be there all day. But if he went on Saturday night, he could get all the machines. Then he could do all his wash and dry in two hours. Now that’s parallel processing mixed in with a little dry humor!
The Basics of a Single Computer
[Figure: a single computer with a CPU, Memory, and a disk that holds the Orders table. The question is "How are we doing on orders today?" The disk answers: "How would I know? I'm just a disk. I need to transfer the block of data to the memory, and that is a slow process."]

Orders
Order_No  Customer_No  Order_Date  Order_Total
100       21345679     01/01/2013  12347.53
200       32456733     01/01/2013   8005.91
300       31323134     01/01/2013   5111.47
400       87323456     01/01/2013  15231.62

“When you are courting a nice girl, an hour seems like a second. When you sit on a red-hot cinder, a second seems like an hour. That’s relativity.” – Albert Einstein
Data on disk does absolutely nothing. When data is requested, the computer moves the data one block at a time from disk into memory. Once the data is in memory, it is processed by the CPU at lightning speed. All computers work this way. The "Achilles Heel" of every computer is the slow process of moving data from disk to memory. The real theory of relativity is to find out how to get blocks of data from the disk into memory faster!
Data in Memory is fast as Lightning
[Figure: the Orders block has been copied from disk into Memory, next to the CPU. The copy in Memory and the copy on disk hold the same four rows.]

Orders
Order_No  Customer_No  Order_Date  Order_Total
100       21345679     01/01/2013  12347.53
200       32456733     01/01/2013   8005.91
300       31323134     01/01/2013   5111.47
400       87323456     01/01/2013  15231.62
“You can observe a lot by watching.” – Yogi Berra
Once the data block is moved off of the disk and into memory, the processing of that block happens as fast as lightning. It is the movement of the block from disk into memory that slows down every computer. Data being processed in memory is so fast that even Yogi Berra couldn't catch it!
Parallel Processing Of Data
[Figure: four Segments, each with its own Memory and its own four rows of the Orders table on disk. Each Segment has copied its rows from disk into its Memory.]

Segment (Orders)
Cust_No   Order_Date  Order_Total
21345679  01/01/2013  12347.53
32456733  01/01/2013   8005.91
31323134  01/01/2013   5111.47
87323456  01/01/2013  15231.62

Segment (Orders)
Cust_No   Order_Date  Order_Total
34345699  01/01/2013  13347.51
41456543  01/01/2013  13005.91
51323154  01/01/2013   7611.57
67823486  01/01/2013  11671.92

Segment (Orders)
Cust_No   Order_Date  Order_Total
87945679  01/01/2013   8347.53
98756733  01/01/2013  17005.91
35623134  01/01/2013   3451.47
97873456  01/01/2013  19871.62

Segment (Orders)
Cust_No   Order_Date  Order_Total
44445679  01/01/2013  12447.53
32547733  01/01/2013   8055.66
57497134  01/01/2013   5651.47
87768956  01/01/2013    231.62
"If the facts don't fit the theory, change the facts." -Albert Einstein
Big Data is all about parallel processing. Parallel processing is all about taking the rows of a table and spreading them among many parallel processing units. In Greenplum, these parallel processing units are referred to as Segments. Above, we can see a table called Orders. There are 16 rows in the table. Each segment holds four rows. Now they can process the data in parallel and be four times as fast. What Albert Einstein meant to say was, “If the theory doesn't fit the dimension table, change it to a fact." Each Segment shares nothing and holds a portion of every table.
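The idea above can be sketched in a few lines of Python. This is only an illustration, not how Greenplum is implemented: the four lists stand in for the four segments in the figure, the names `segment_sum` and `parallel_total` are made up for this example, and Python threads merely model the "everyone works at once" pattern.

```python
from concurrent.futures import ThreadPoolExecutor

# Order_Total values from the figure, grouped by the segment that stores them.
SEGMENTS = [
    [12347.53, 8005.91, 5111.47, 15231.62],
    [13347.51, 13005.91, 7611.57, 11671.92],
    [8347.53, 17005.91, 3451.47, 19871.62],
    [12447.53, 8055.66, 5651.47, 231.62],
]


def segment_sum(rows):
    """Work done by one segment: aggregate only the rows it owns."""
    return sum(rows)


def parallel_total(segments):
    """Hand every segment its own work, then combine the partial answers."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(segment_sum, segments))
    return partials, sum(partials)


partials, total = parallel_total(SEGMENTS)
print(len(partials), "partial sums, grand total", round(total, 2))
```

Each worker touches only its own four rows, which is the "shared nothing" point made above: no segment needs another segment's data to do its part.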
Symmetric Multi-Processing (SMP) Server
[Figure: four CPUs, each with its own Cache, connected by a Bus to Shared Memory and Disk I/O]
A Symmetric Multi-Processing system has multiple processors for extra power, but these processors share a single operating system, a single memory pool, and access to the same disks. This is a great architecture for speed, similar to a restaurant that is quick and organized, but it cannot expand without limit. When there are too many cooks in the kitchen, you need an MPP system that scales many SMP systems together as one parallel processing data warehouse.
A Symmetric Multi-Processing (SMP) system is a single server that is sometimes referred to as a node. Watch next how Greenplum uses these servers (called Segment Servers) to create a Massively Parallel Processing (MPP) system using commodity hardware.
Commodity Hardware Servers are configured for Greenplum
[Figure: Segment Host 1, Segment Host 2, through Segment Host n. Each Segment Host has three CPUs, Memory, and three Segments.]
Greenplum allows you to utilize commodity hardware servers, each called a Segment Host. Greenplum also allows you to configure your parallel processes, called segments. The number of segments per Segment Host is usually defined by the number of CPUs the Segment Host contains. The Segment Hosts (each one an SMP server) are connected together to create a Massively Parallel Processing (MPP) system.
Greenplum allows commodity hardware to be utilized to create one giant Greenplum cluster.
Commodity Hardware Allows For One Segment per CPU
[Figure: SMP Segment Node 1 through SMP Segment Node n. Each node has two Dual-Core CPUs, Memory, and four Segments.]
Greenplum provides incredible speeds at low cost by allowing customers to purchase commodity hardware. The rule of thumb is to create one segment per CPU core. If you have two dual-core CPUs in a server, you should build four segments. By connecting multiple SMP servers together, you can scale your Greenplum cluster in a linear fashion. Double the segments and double your speeds forever!
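The arithmetic behind this rule of thumb is simple enough to write down. The sketch below assumes the sizing described above (two dual-core CPUs giving four segments, so one segment per core); the function names are invented for the example and are not part of any Greenplum tool.

```python
def segments_per_host(cpu_sockets, cores_per_cpu):
    """Rule of thumb from the text: one segment for every CPU core."""
    return cpu_sockets * cores_per_cpu


def cluster_segments(hosts, cpu_sockets, cores_per_cpu):
    """Total segments when every host in the cluster is sized the same way."""
    return hosts * segments_per_host(cpu_sockets, cores_per_cpu)


# Two dual-core CPUs per server, as in the example above.
print(segments_per_host(2, 2))
# Doubling the number of servers doubles the number of segments.
print(cluster_segments(4, 2, 2), cluster_segments(8, 2, 2))
```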
The Master Host
[Figure: the Master Host with two Dual-Core CPUs, Memory, and the System Catalog]
• When a user logs into Greenplum, the host will log them in and be responsible for their session.
• The host checks the SQL syntax, creates the EXPLAIN plan, checks the security, and builds a plan for the segments to follow.
• The host uses system statistics and statistics from the ANALYZE command to build the best plan.
• The host doesn't hold user data, but instead holds the Global System Catalog.
• The host always delivers the final answer set to the user.
The host is the boss and the segments are the workers. Who doesn't love their boss? Users log in to the host and never communicate directly with the segments. The host builds a plan for the segments to follow that is delivered in plan slices. Each slice instructs the segments what to do. When the segments have done their work, they return it to the host.
The Segment's Responsibilities
[Figure: a row of Segments]
• Segments are responsible for storing and retrieving rows from their assigned disk (Virtual disk).
• Segments lock the tables and rows.
• Segments sort rows and do all aggregation.
• Segments handle all the join processing.
• Segments handle all space management and space accounting.
Greenplum segments have the responsibilities listed above.
The Host's Plan is Either All Segments or a Single Segment

SQL Statement
    SELECT * FROM Employee_Table WHERE Employee_No = 2000000 ;

[Figure: the Master Host connected through the INTERCONNECT (Gigabit Ethernet) to the Segment Hosts. A callout on the Distribution Key reads: "Use the Distribution Key in the WHERE Clause with equality and only one segment is contacted."]
On most queries the Master Host will broadcast the plan to each segment simultaneously, but if you use the distribution key in the WHERE clause of your SQL with an equality statement, then only a single segment will be contacted to return the row.
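The routing decision described above can be modeled as a small function. This is a sketch under stated assumptions: `zlib.crc32` stands in for Greenplum's internal hash (which is not shown in this book), the four-segment cluster is hypothetical, and `segments_to_contact` is a made-up name rather than anything the Master Host exposes.

```python
import zlib

NUM_SEGMENTS = 4  # hypothetical cluster size for the example


def segment_for(key_value, num_segments=NUM_SEGMENTS):
    """Stand-in hash: map a distribution key value to exactly one segment id."""
    return zlib.crc32(str(key_value).encode("utf-8")) % num_segments


def segments_to_contact(where_column, operator, value, distribution_key,
                        num_segments=NUM_SEGMENTS):
    """Return the segment ids that would receive the plan."""
    if where_column == distribution_key and operator == "=":
        # Equality on the distribution key: the row can live in only one place.
        return [segment_for(value, num_segments)]
    # Anything else: broadcast the plan to every segment.
    return list(range(num_segments))


single = segments_to_contact("Employee_No", "=", 2000000, "Employee_No")
everyone = segments_to_contact("Dept_No", "=", 400, "Employee_No")
print(single, everyone)
```

The first call models the query shown above and yields one segment; the second filters on a column that is not the distribution key, so all segments must look.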
A Table has Columns and Rows

Employee_Table
Emp_No  Dept_No  First_Name  Last_Name  Salary
1001    100      Rafael      Minal      90000
1002    200      Maria       Gomez      80000
1003    300      Charl       Kertzel    70000
1004    400      Kyle        Stover     60000
1005    400      Rob         Rivers     50000
1006    300      Inna        Kinski     50000
1007    200      Sushma      Davis      50000
1008    100      Mo          Khan       60000
1009    300      Mo          Swartz     70000

[Figure: the nine rows of Employee_Table spread across three Segments]
Segment: rows 1001, 1004, 1007
Segment: rows 1002, 1005, 1008
Segment: rows 1003, 1006, 1009
The table above has 9 rows. Our small system above has three parallel processing units called segments. Each segment holds three rows. Double your segments and double your speed and power. The idea of parallel processing is to take the rows of a table and spread them across the segments so each segment can process its portion of the data in parallel.
Greenplum has Linear Scalability
[Figure: the Host connected through the Interconnect to a long row of Segments]
"A journey of a thousand miles begins with a single step." - Lao Tzu
Greenplum was born to be parallel. With each query, a single step is performed in parallel by each segment. A Greenplum system consists of a series of segments that work in parallel to store and process your data. This design allows you to start small and keep growing. If your Greenplum system provides you with an excellent Return on Investment (ROI), then continue to invest by purchasing more segment nodes. Most companies start small, but after seeing what Greenplum can do, they continue to grow their ROI from the single step of implementing a Greenplum system to millions of dollars in profits. Double your segment nodes and double your speeds... forever. The Greenplum Data Warehouse actually provides a journey of a thousand smiles!
The Architecture of A Greenplum Data Warehouse
[Figure: the Host above Segment Node 1 through Segment Node n, with eight Segments per Segment Node. Caption on the Host: "The Host manages the distribution of data and builds the plan for the segments to follow."]
“Be the change that you want to see in the world.” - Mahatma Gandhi
The Host is the brains behind the entire operation. The user logs into the host, and for each SQL query, the host will come up with a plan to retrieve the data. It passes that plan to each segment node, and each of the segments processes its portion of the data. If the data is spread evenly, parallel processing works perfectly. This technology is relatively inexpensive. It might not "be the change", but it will help your company "keep the change" because costs are low. Greenplum uses both SMP and MPP technology. Each segment node is an SMP, but many segment nodes are linked together to become one big MPP system. The number of segments per segment node depends on the commodity hardware being used and its number of processors. Above, we can see 8 segments per segment node.
Nexus is Now Available for Greenplum
Why the Nexus Chameleon should be your query tool of choice:
1) Queries every major system
2) Provides visualization and automatically writes the SQL
3) Can perform cross-system joins with a few clicks of the mouse
4) Converts table structures and moves the table and data between systems
5) Compares and synchronizes databases
6) Can move an entire database of tables or views between systems
7) Has the "Garden of Analysis" to re-query answer sets inside your PC
8) Provides a dashboard of graphs and charts for answer sets
Download the Nexus for a free trial at www.CoffingDW.com and use Nexus in-house or on the cloud.
Chapter 2 – Greenplum Table Structures
“Let me once again explain the rules. The Greenplum Data Warehouse Rules!” - Tera-Tom Coffing
The Concepts of Greenplum Tables
1. Tables are either Distributed by Hash or Distributed Randomly.
2. Tables are either stored in a heap or are append-only tables.
3. The rows of a table by default are unsorted in a heap or they can be physically sorted with a clustered index.
4. Tables are stored physically on disk in either a row or columnar format.
5. Tables can be partitioned.
6. Tables are either permanent, temporary or external tables.
7. Tables can have Primary and Foreign Key constraints, although Foreign Key constraints are not enforced.
8. Tables can have Unique constraints and other Boolean constraints.
9. Compression techniques are supported at the table or column level.
Above are the basic concepts behind Greenplum tables. The next several pages cover each point one at a time so you can see exactly what is going on.
Tables are Either Distributed by Hash or Random
[Figure: three Segments, each with its own Memory, holding a hashed table (top) and a random table (bottom)]

Hashed: Each Segment holds different rows. Each row is hashed by the values in a certain column, such as Employee_No.
Segment: 1 Joel Davis, 4 Rick Jahns, 7 Lynn Meyer, 11 Seth Rogers
Segment: 2 Mary Lewis, 5 John Miller, 8 Rich Jones, 12 Kyle Watson
Segment: 3 Tony Brady, 6 Lana Payne, 9 Lorie Stewart, 13 Dawn Daily

Random: Each segment gets rows in a round robin fashion to ensure even distribution.
Segment: 100 Sales, 200 Marketing
Segment: 300 Finance, 500 Research
Segment: 400 HR, 600 IT
The Greenplum database gives you two choices for table distribution. These choices are either distributed by Hash or randomly distributed. Large fact tables are usually hashed and smaller tables are often random. When a table is hashed, one or more columns are chosen as the distribution key. In our example above, the Employee_Table (top) is hashed by the Employee_No. The Random table (bottom) only has six rows in it and they are evenly distributed.
A Hash Distributed Table has A Distribution Key

CREATE TABLE Emp_Intl
(
 Employee_No   INTEGER
,Dept_No       SMALLINT
,First_Name    VARCHAR(12)
,Last_Name     CHAR(20)
,Salary        DECIMAL(8,2)
) DISTRIBUTED BY (Employee_No) ;

[Figure: three Segments, each with its own Memory]
Hashed: Each Segment holds different rows. Each row is hashed by the values in a certain column, such as Employee_No.
Segment: 1 Joel Davis, 4 Rick Jahns, 7 Lynn Meyer, 11 Seth Rogers
Segment: 2 Mary Lewis, 5 John Miller, 8 Rich Jones, 12 Kyle Watson
Segment: 3 Tony Brady, 6 Lana Payne, 9 Lorie Stewart, 13 Dawn Daily
Above is a basic TABLE CREATE STATEMENT for a table with a Distribution Key. You can use one or more columns as the Distribution Key on Greenplum. The values in this column will be hashed with a hashing formula and used to distribute the rows of the table across the Segments. Picking a good key is essential. An excellent Distribution Key will allow for even distribution among the many segments.
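To make the hashing step concrete, here is a small Python model of hash distribution. It is a sketch, not Greenplum's algorithm: `zlib.crc32` is an assumed stand-in for the real hashing formula, so the rows will not necessarily land on the same segments as in the figure, and `distribute_by_hash` is a name invented for this example.

```python
import zlib
from collections import defaultdict

# (Employee_No, First_Name, Last_Name) rows from the figure above.
EMPLOYEES = [
    (1, "Joel", "Davis"), (2, "Mary", "Lewis"), (3, "Tony", "Brady"),
    (4, "Rick", "Jahns"), (5, "John", "Miller"), (6, "Lana", "Payne"),
    (7, "Lynn", "Meyer"), (8, "Rich", "Jones"), (9, "Lorie", "Stewart"),
    (11, "Seth", "Rogers"), (12, "Kyle", "Watson"), (13, "Dawn", "Daily"),
]


def segment_for(value, num_segments):
    """Stand-in hash: the same key value always maps to the same segment."""
    return zlib.crc32(str(value).encode("utf-8")) % num_segments


def distribute_by_hash(rows, key_index, num_segments):
    """Place every row on the segment chosen by hashing its distribution key."""
    placement = defaultdict(list)
    for row in rows:
        placement[segment_for(row[key_index], num_segments)].append(row)
    return dict(placement)


placement = distribute_by_hash(EMPLOYEES, key_index=0, num_segments=3)
for segment_id in sorted(placement):
    print(segment_id, [row[0] for row in placement[segment_id]])
```

Because the placement depends only on the key value, the system can later find a row by hashing the value in the WHERE clause instead of searching every segment.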
Picking A Distribution Key That Is Not Very Unique

CREATE TABLE Emp_Intl
(
 Employee_No   INTEGER
,Dept_No       SMALLINT
,First_Name    VARCHAR(12)
,Last_Name     CHAR(20)
,Salary        DECIMAL(8,2)
) DISTRIBUTED BY (Last_Name) ;

[Figure: four Segments, each with its own Memory, holding the Last_Name values below]
Segment 1: Jones, Jones, Jones, Jones, Miller
Segment 2: Davis, Davis, Davis, Patel, Patel
Segment 3: Luellener, Grayson
Segment 4: Valentine, Gonzales
The hash formula will distribute like values to the same segment. This can result in skewed data. Pick a good distribution key or you could get uneven data. Notice that like values went to the same segment and the data is unevenly spread.
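A quick way to see the skew is to count rows per segment. The sketch below uses the 14 last names from the figure and, as an assumption, `zlib.crc32` in place of the real hash formula, so the exact segment numbers will differ from the picture. The helper names are invented for the example.

```python
import zlib

# The 14 Last_Name values shown in the figure above.
LAST_NAMES = (["Jones"] * 4 + ["Miller"] + ["Davis"] * 3 + ["Patel"] * 2
              + ["Luellener", "Grayson", "Valentine", "Gonzales"])


def segment_for(value, num_segments):
    """Stand-in hash: like values always go to the same segment."""
    return zlib.crc32(value.encode("utf-8")) % num_segments


def rows_per_segment(values, num_segments):
    """Count how many rows each segment would receive."""
    counts = [0] * num_segments
    for value in values:
        counts[segment_for(value, num_segments)] += 1
    return counts


def skew_ratio(counts):
    """Busiest segment divided by the average; 1.0 means perfectly even."""
    return max(counts) / (sum(counts) / len(counts))


by_name = rows_per_segment(LAST_NAMES, 4)
print(by_name, round(skew_ratio(by_name), 2))
```

All four Jones rows must share a segment, so at least one segment holds four or more of the 14 rows while the average is 3.5. The busiest segment sets the pace for the whole query.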
Random Distribution Uses a Round Robin Technique

CREATE TABLE Emp_Intl
(
 Employee_No   INTEGER
,Dept_No       SMALLINT
,First_Name    VARCHAR(12)
,Last_Name     CHAR(20)
,Salary        DECIMAL(8,2)
) DISTRIBUTED RANDOMLY ;

[Figure: four Segments, each with its own Memory, holding the Last_Name values below]
Segment 1: Davis, Davis, Davis, Patel
Segment 2: Luellener, Jones, Miller
Segment 3: Valentine, Jones, Patel
Segment 4: Jones, Grayson, Gonzales, Jones
Above is a basic TABLE CREATE STATEMENT for a table that is Distributed Randomly across all segments. That means that the rows are distributed in a round robin fashion to ensure even distribution. This should be done for relatively small tables, or for tables that don't have a column or column combination that will provide reasonably even distribution.
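The round robin technique described above is easy to model. This sketch simply deals rows out like cards; it illustrates the behaviour the text describes and makes no claim about Greenplum's internals. `distribute_round_robin` is a made-up name.

```python
from itertools import cycle

# The same 14 Last_Name values used in the skewed example.
LAST_NAMES = (["Jones"] * 4 + ["Miller"] + ["Davis"] * 3 + ["Patel"] * 2
              + ["Luellener", "Grayson", "Valentine", "Gonzales"])


def distribute_round_robin(rows, num_segments):
    """Deal rows to segments in turn, ignoring the values inside the rows."""
    segments = [[] for _ in range(num_segments)]
    for segment_id, row in zip(cycle(range(num_segments)), rows):
        segments[segment_id].append(row)
    return segments


segments = distribute_round_robin(LAST_NAMES, 4)
sizes = [len(rows) for rows in segments]
print(sizes)
```

No segment ever holds more than one row above any other, no matter how many duplicate values the table contains. The price is that the system can no longer use a key value to find the one segment that owns a row.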
Tables Will Be Distributed Among All Segments
[Figure: Segment 1 through Segment 12, each with its own Memory and a portion of every table]
Above we see 12 segments and five tables. Each table is spread across all 12 segments. All five tables above are row based tables. Some are hash distributed and some are randomly distributed. Just see the data and understand that tables are spread across all segments in order to take full advantage of parallel processing. Greenplum was born to be parallel. Also understand that all segments have the exact same table structures, but each segment is responsible for different rows.
The Default For Distribution Chooses the First Column
Since no distribution has been defined, the system defaults to choosing the first column as the Distribution Key.

CREATE TABLE Emp_Intl
(
 Employee_No   INTEGER
,Dept_No       SMALLINT
,First_Name    VARCHAR(12)
,Last_Name     CHAR(20)
,Salary        DECIMAL(8,2)
);

[Figure: three Segments, each with its own Memory and its own portion of Emp_Intl]
When no distribution is defined and the table is created, the default is a hash distribution policy. Greenplum will use either the PRIMARY KEY (if the table has one) or the first column of the table as the distribution key.
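The default rule described above can be written as a two-line decision. This is only a model of the rule as stated in this chapter (PRIMARY KEY if there is one, otherwise the first column); the function name and arguments are hypothetical.

```python
def default_distribution_key(columns, primary_key=None):
    """Pick the distribution key when the CREATE TABLE names none."""
    if primary_key:
        # A declared PRIMARY KEY wins.
        return list(primary_key)
    # Otherwise fall back to the first column of the table.
    return [columns[0]]


EMP_INTL = ["Employee_No", "Dept_No", "First_Name", "Last_Name", "Salary"]

print(default_distribution_key(EMP_INTL))
print(default_distribution_key(EMP_INTL, primary_key=["Dept_No", "Last_Name"]))
```

The fallback is convenient but blind: if the first column happens to have few distinct values, the table will be skewed, so naming the key yourself is the safer habit.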
Tables are Either a Heap or Append-Only

This table is stored as a Heap. UPDATE and DELETE statements can be performed on this table.

CREATE TABLE Emp_Table
(
 Employee_No   INTEGER
,Dept_No       SMALLINT
,First_Name    VARCHAR(12)
,Last_Name     CHAR(20)
,Salary        DECIMAL(8,2)
) DISTRIBUTED BY (Employee_No) ;

This table is stored as Append-Only. No UPDATE and DELETE statements are allowed. This saves about 20 bytes per row.

CREATE TABLE Emp_Append
(
 Employee_No   INTEGER
,Dept_No       SMALLINT
,First_Name    VARCHAR(12)
,Last_Name     CHAR(20)
,Salary        DECIMAL(8,2)
) WITH (appendonly=true)
DISTRIBUTED BY (Employee_No) ;

By default, Greenplum Database stores tables in an unsorted heap. Heap tables allow data to be deleted or updated after it is initially loaded. Append-only table storage works best with denormalized fact tables in a data warehouse environment, where the data is not updated after it is loaded. Append-only tables save about 20 bytes per row because they do not carry the storage overhead of the per-row update visibility information. Append-only tables do not allow UPDATE and DELETE operations, and single row INSERT statements are not recommended because they are slow.
Tables are Stored in Either Row or Columnar Format
[Figure: three Segments, each with its own Memory. Every Segment holds a portion of Employee_Row_Based and a portion of Employee_Columnar.]
A table is stored in either a row format or a columnar format. Traditionally, most systems have always stored the rows of a table in a row format (row store). When a query is run on the table the entire block of rows must be moved from disk into memory, where they are processed. This works well when all columns (or most columns) are needed to satisfy the query. Modern designs of computer systems will often now include a column format (column store). This works extremely well on queries that don't need all columns (or most columns) to satisfy the query, such as analytics, aggregations, etc. Only the columns needed will then be transferred from disk into memory. Greenplum gives you a choice of row, column or both.
Creating a Column Oriented Table
Column-oriented tables must be Append-Only. This table is stored in a Columnar format:

CREATE TABLE Emp_Column
(
 Employee_No   INTEGER
,Dept_No       SMALLINT
,First_Name    VARCHAR(12)
,Last_Name     CHAR(20)
,Salary        DECIMAL(8,2)
) WITH (appendonly=true, orientation=column)
DISTRIBUTED BY (Last_Name) ;

• Column-oriented tables must be append-only tables.
• If rows are frequently inserted into the table, a row-oriented table is better optimized for write operations.
A column-oriented table stores the columns in different blocks. A segment still gets the entire row, but only needs to move the column(s) into memory that are required to satisfy the current query. This can save a lot of time and data movement for queries that do not need all of the columns to satisfy the answer set. This works quite well for aggregation of data, ordered analytics, etc. The only major issue is that column-oriented tables must be append-only. In return, there are pretty substantial savings with column-oriented tables because the compression rates are so much better than with row-oriented tables.
Comparing Normal Table vs. Columnar Tables

Segment: Employee_Normal
Emp_No  Dept_No  First_Name  Last_Name  Salary
1001    100      Rafael      Minal      90000.00
1004    400      Kyle        Stover     60000.00
1007    200      Sushma      Davis      50000.00

Segment: Employee_Columnar
Emp_No  Dept_No  First_Name  Last_Name  Salary
1001    100      Rafael      Minal      90000.00
1004    400      Kyle        Stover     60000.00
1007    200      Sushma      Davis      50000.00
Above is a picture of the same table stored in a row-based design (top) and a column-based design (bottom). Notice that either way the segment gets the entire row, but Greenplum has the option of storing it in either a row-based or a column-based design. The row-based data is stored in one giant block, so whenever the table is queried the entire block must move from disk into memory. The column-based design allows individual columns to move from disk to memory.
Columnar can move just One Column Block Into Memory

Query: SELECT Emp_No FROM Employee_Columnar ;

Segment Memory:
Emp_No
1001
1004
1007

Employee_Columnar (on disk):
Emp_No  Dept_No  First_Name  Last_Name  Salary
1001    100      Rafael      Minal      90000.00
1004    400      Kyle        Stover     60000.00
1007    200      Sushma      Davis      50000.00
Columnar is brilliant when a query only needs a small portion of the columns from a table to satisfy the query. This is also considered vertical partitioning. Why eat the whole cake when you can take just a piece?
Segments on Distributions are aligned to Rebuild a Row

What if the query needed two columns?
SELECT Emp_No, Salary FROM Employee_Columnar ;

Segment Memory:
Emp_No  Salary
1001    90000.00
1004    60000.00
1007    50000.00

Employee_Columnar (on disk):
Emp_No  Dept_No  First_Name  Last_Name  Salary
1001    100      Rafael      Minal      90000.00
1004    400      Kyle        Stover     60000.00
1007    200      Sushma      Davis      50000.00
Columnar is brilliant when a query only needs a small portion of the columns from a table to satisfy the query. Instead of moving an entire block containing all columns and throwing out the ones you don't need, you can use a columnar design to only retrieve the columns needed to satisfy the query.
Columnar Tables Store Each Column in Separate Blocks
[Figure: three segments, each moving only its Salary column block into Memory to compute AVG Salary]
This is the same data you saw on the previous page! The difference is that the above is a columnar design. I have color coded this for you. There are 8 rows in the table and five columns. Notice that the entire row stays on the same segment, but each column is a separate block. This is a brilliant design for Ad Hoc queries and analytics because when only a few columns are needed, columnar can move just the columns it needs to move. Columnar is hard to beat for these queries because the blocks are so much smaller, and what isn't needed isn't moved.
Visualize the Data – Rows vs. Columns
Top: 24 rows (five columns) stored in 6 blocks in this row-based system
Bottom: 24 rows (five columns) stored in 15 blocks (each column is stored in its own blocks)
Both examples above have the same data and the same amount of data. If your applications tend to analyze the majority of columns or read the entire table, then a row-based system (top example) can move more of the needed data into memory with each block. Columnar tables are advantageous when only a few columns need to be read. This is just one of the reasons that analytics goes with columnar like bread goes with butter. A row-based system must move the entire page into memory even if it only needs to read one row or even a single column. If a user above needed to analyze only the Salary column, the columnar system would move 80% less block mass, because Salary is one of five columns.
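The 80% figure follows from simple arithmetic. Here is a minimal sketch, assuming all five columns take up the same amount of space (an assumption made only for this illustration; the helper name `columnar_savings_percent` is hypothetical):

```python
from fractions import Fraction


def columnar_savings_percent(columns_needed, total_columns):
    """Percentage of block mass a column store skips, assuming equal-width columns."""
    fraction_read = Fraction(columns_needed, total_columns)
    return (1 - fraction_read) * 100


total_columns = 5
columnar_blocks = 15

# 15 blocks spread over 5 columns means 3 blocks per column.
blocks_per_column = columnar_blocks // total_columns

salary_only = columnar_savings_percent(1, total_columns)   # query needs Salary only
all_columns = columnar_savings_percent(5, total_columns)   # query needs every column
print(blocks_per_column, salary_only, all_columns)
```

Reading only Salary means reading 3 of the 15 blocks, which is one fifth of the data. When every column is needed, the saving drops to zero, which is why a row store suits full-row reads.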
Table Rows are Either Sorted or Unsorted

Sorted: this table is sorted because it was created with a Clustered Index on Employee_No
Employee_No  Dept_No  First_Name  Last_Name  Salary
1001         100      Rafael      Minal      90000
1004         400      Kyle        Stover     60000
1007         200      Sushma      Davis      50000
1020         200      May         Jones      60000

Not Sorted: this table is unsorted (heap) because it was NOT created with a Clustered Index
Employee_No  Dept_No  First_Name  Last_Name  Salary
1001         100      Rafael      Minal      90000
1007         200      Sushma      Davis      50000
1020         200      May         Jones      60000
1004         400      Kyle        Stover     60000
The rows of a table are either sorted or unsorted. If the table has a clustered index it is sorted, but if it does not have a clustered index then it is unsorted, which is referred to as a heap. You can only have one clustered index per table because you can only sort a table one way. Sorting has nothing to do with whether the table has a distribution key or is randomly distributed. Distribution decides which segment a row lands on; once the rows are placed on a segment they are either sorted (clustered index) or unsorted (heap).
Creating a Clustered Index in Order to Physically Sort Rows

Create the Table:
CREATE TABLE Order_Cluster
( Order_Number     INTEGER
 ,Customer_Number  INTEGER
 ,Order_Date       DATE
 ,Order_Total      DECIMAL(8,2)
) DISTRIBUTED BY (Order_Number) ;

Create an Index (Ord_Date_idx is the index name, Order_Date is the indexed column):
CREATE INDEX Ord_Date_idx ON Order_Cluster (Order_Date) ;

Cluster the Index:
CLUSTER Ord_Date_idx ON Order_Cluster ;
Above, we have sorted the Order_Cluster table on each segment by Order_Date. You can have one clustered index on a table because you can only sort the rows one specific way. Having a Clustered Index on a Date column will help with range queries, because the data on each segment is sorted by date. CLUSTER is not supported with append-only tables or column-oriented tables.
Physically Ordered Tables Are Faster on Certain Queries

Create the Table:
CREATE TABLE Order_Cluster
( Order_Number     INTEGER
 ,Customer_Number  INTEGER
 ,Order_Date       DATE
 ,Order_Total      DECIMAL(8,2)
) DISTRIBUTED BY (Order_Number) ;

Create an Index:
CREATE INDEX Ord_Date_idx ON Order_Cluster (Order_Date) ;

Cluster the Index:
CLUSTER Ord_Date_idx ON Order_Cluster ;

SELECT * FROM Order_Cluster
WHERE Order_Date BETWEEN '2015-05-01' AND '2015-05-31' ;
Range queries on date columns can benefit greatly from a clustered index. The above table is physically sorted on each segment by the column Order_Date. The query above won't have to do a Full Table Scan (FTS). Instead, it can read only the rows that fall within the sorted range.
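Why sorted rows help a range query can be sketched with Python's `bisect` module. This is only a model under stated assumptions: the dates are made-up sample data for one segment, and "rows touched" ignores the handful of probes the binary search itself makes.

```python
import bisect
from datetime import date

# Hypothetical order dates on one segment, in sorted order.
sorted_dates = [date(2015, m, d) for m in range(1, 13) for d in (5, 15, 25)]

# The same dates in an unsorted (heap) arrangement.
heap_dates = sorted_dates[::2] + sorted_dates[1::2]


def range_scan_sorted(dates, low, high):
    """Binary-search both ends of the range, then read only the rows in between."""
    start = bisect.bisect_left(dates, low)
    stop = bisect.bisect_right(dates, high)
    return dates[start:stop], stop - start


def range_scan_heap(dates, low, high):
    """With no ordering, every row must be checked (a full table scan)."""
    hits = [d for d in dates if low <= d <= high]
    return hits, len(dates)


low, high = date(2015, 5, 1), date(2015, 5, 31)
may_sorted, touched_sorted = range_scan_sorted(sorted_dates, low, high)
may_heap, touched_heap = range_scan_heap(heap_dates, low, high)
print(touched_sorted, touched_heap)
```

Both scans return the same three May rows, but the sorted scan reads only those three while the heap scan checks all thirty-six.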
Another Way to Create a Clustered Table

Create the new table using an ORDER BY statement:
CREATE TABLE Order_Cluster_New AS
SELECT * FROM Order_Table
ORDER BY Order_Date ;

Drop the old table:
DROP TABLE Order_Table ;

Rename the new table to the original table:
ALTER TABLE Order_Cluster_New RENAME TO Order_Table ;

Create an index on the column you did the ORDER BY on:
CREATE INDEX O_Date_Idx ON Order_Table (Order_Date) ;

VACUUM and ANALYZE your newly created table:
VACUUM ANALYZE Order_Table ;
This is a different way to create a table that is sorted. Above, we have rebuilt Order_Table so that its rows are sorted on each segment by Order_Date. Having the data sorted on a Date column will help with range queries, because the data on each segment is in date order. CLUSTER is not supported with append-only tables or column-oriented tables. Warning: You cannot drop a table that has dependencies such as views.
Creating a B-Tree Index and then Running Analyze

CREATE TABLE Emp_2000
( Employee_No  INTEGER
 ,Dept_No      INTEGER
 ,Last_Name    VARCHAR(1000)
) DISTRIBUTED BY (Dept_No) ;

CREATE INDEX Emp_Idx ON Emp_2000 (Employee_No) ;

ANALYZE Emp_2000 ;

EXPLAIN SELECT * FROM Emp_2000 WHERE Employee_No = 1000020 ;

Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..200.32 rows=1 width=64)
  -> Index Scan using emp_idx on emp_2000 (cost=0.00..200.32 rows=1 width=64)
       Index Cond: employee_no = 1000020
Greenplum provides the index methods B-tree, bitmap, and GiST. The default is a B-tree. Above, we have created a table and loaded it with over 72,000 rows. We then created a B-tree index (non-unique). We ran statistics on the table using the Analyze command. We then typed the keyword EXPLAIN in front of our query to see what type of scan would take place. An Index Scan was utilized. We now know that the index on Employee_No is being used by the system.
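The same create, index, analyze and explain workflow can be tried without a Greenplum system. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in, which is an assumption of this example: SQLite has no DISTRIBUTED BY clause, its command is EXPLAIN QUERY PLAN, and its plan text looks different from the Greenplum output above. The 100 rows are made-up sample data.

```python
import sqlite3

# SQLite stands in for Greenplum here; only the workflow is the same.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp_2000 (employee_no INTEGER, dept_no INTEGER, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO emp_2000 VALUES (?, ?, ?)",
    [(1000000 + i, i % 10, f"Name{i}") for i in range(100)],
)
conn.execute("CREATE INDEX emp_idx ON emp_2000 (employee_no)")
conn.execute("ANALYZE")  # gather statistics, as in the example above

query = "SELECT * FROM emp_2000 WHERE employee_no = 1000020"
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# The last field of each plan row is the human-readable description.
plan_text = " ".join(str(step[-1]) for step in plan)
uses_index = "emp_idx" in plan_text
found = conn.execute(query).fetchone()
print(plan_text)
```

If the index name shows up in the plan text, the optimizer chose the index rather than a full scan, which is the same check the EXPLAIN above performs.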
Creating a Bitmap Index

CREATE TABLE Emp_75000
( Employee_No  INTEGER
 ,Dept_No      INTEGER
 ,Last_Name    VARCHAR(1000)
) DISTRIBUTED BY (Employee_No) ;

CREATE INDEX Dept_bmp_Idx ON Emp_75000 USING bitmap (Dept_No) ;

ANALYZE Emp_75000 ;

EXPLAIN SELECT * FROM Emp_75000 WHERE Dept_No = 1000021 ;

Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..201.40 rows=1 width=64)
  -> Index Scan using dept_bmp_idx on emp_75000 (cost=0.00..201.40 rows=1 width=64)
       Index Cond: dept_no = 1000021
Greenplum provides the index methods B-tree, bitmap, and GiST. The default is a B-tree. Above, we have created a table and loaded it with over 75,000 rows. We then created a Bitmap Index. We ran statistics on the table using the Analyze command. We then typed the keyword EXPLAIN in front of our query to see what type of scan would take place. The plan shows an Index Scan using the bitmap index dept_bmp_idx. We now know that the index on Dept_No is being used by the system.
Why Create a Bitmap Index?
Bitmap indexes are most effective for queries that contain multiple conditions in the WHERE clause on large data warehouse tables that have few UPDATE and DELETE modifications. Each bit in the bitmap corresponds to a possible tuple ID. If the bit is set, the row with the corresponding tuple ID contains the key value. A mapping function converts the bit position to a tuple ID. Bitmaps reduce normal index size because they are compressed for storage. The size of a bitmap index is proportional to the number of rows in the table times the number of distinct values in the bitmap indexed column.

SELECT * FROM Employee_Table WHERE Dept_No = 100 AND Last_Name = 'Jones'

Queries where multiple columns are ANDed together can be a Bitmap candidate.

Greenplum provides the index methods B-tree, bitmap, and GiST. The default is a B-tree. Bitmap indexes are best used when a query ANDs together multiple columns that each have a bitmap index.
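The idea of one bit per tuple ID, and of ANDing two bitmaps together, can be sketched with plain Python integers. This is a teaching model under stated assumptions, not Greenplum's compressed bitmap format: the five rows are made-up, the list position plays the role of the tuple ID, and `build_bitmaps` is a hypothetical helper.

```python
# Hypothetical rows (Dept_No, Last_Name); the list position acts as the tuple ID.
employees = [
    (100, "Jones"),
    (200, "Smith"),
    (100, "Smith"),
    (400, "Jones"),
    (100, "Jones"),
]


def build_bitmaps(rows, column):
    """Build one bitmap per distinct value; bit N is set when row N holds that value."""
    bitmaps = {}
    for tuple_id, row in enumerate(rows):
        key = row[column]
        bitmaps[key] = bitmaps.get(key, 0) | (1 << tuple_id)
    return bitmaps


dept_bitmaps = build_bitmaps(employees, 0)
name_bitmaps = build_bitmaps(employees, 1)

# WHERE Dept_No = 100 AND Last_Name = 'Jones' becomes a single bitwise AND.
matches = dept_bitmaps[100] & name_bitmaps["Jones"]

# The mapping function: convert set bit positions back to tuple IDs.
matching_ids = [i for i in range(len(employees)) if (matches >> i) & 1]
print(matching_ids)
```

Only the rows whose bit is set in both bitmaps survive the AND, so the table itself is visited just for the final matching tuple IDs.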
Tables Can Be Partitioned
Greenplum Database supports both range and list partitioning.
Range partitioning is based on a range of values, such as a date range or price range. List partitioning is based on a list of values, such as regions, state codes or products. Greenplum supports multi-level partitioning, so a combination of both types is allowed. Table partitioning logically divides large tables, such as Fact tables, into smaller, more manageable tables. Partitioned tables improve query performance through partition elimination. Instead of performing a Full Table Scan (FTS), only the data partitions needed to satisfy the query are scanned. Partitioning does not change the physical distribution of table data across segments. It changes the way each segment organizes the rows it holds. A partitioned table can also help with data maintenance tasks, such as rolling old data out of the data warehouse or loading new data into the data warehouse.
A Table Partitioned By Range (Per Month)

CREATE TABLE Ord_Tbl_Part
( Order_Number     integer
 ,Customer_Number  integer
 ,Order_Date       date
 ,Order_Total      decimal(10,2))
DISTRIBUTED BY (Order_Number)
PARTITION BY RANGE (Order_Date)
( START (date '2015-01-01') INCLUSIVE
  END (date '2016-01-01') EXCLUSIVE
  EVERY (INTERVAL '1 month'));

[Figure: Segments 1 through 4 each hold their rows of Ord_Tbl_Part divided into twelve monthly partitions, 01 JAN through 12 DEC]
Above is the CREATE statement for the Ord_Tbl_Part table. This table is distributed by hash on the column Order_Number, and that is how the rows are placed on the proper segments, but the table is partitioned by Order_Date. This partitions the data on each segment by month. This physical partitioning allows for faster loads and faster maintenance (Inserts, Updates, Deletes). This is the design you want when users are performing range queries on dates.
A Visual of a Partitioned Table by Range (Month)

[Figure: Segments 1 through 4 each hold their rows of Ord_Tbl_Part in twelve monthly partitions: 01 JAN, 02 FEB, 03 MAR, 04 APR, 05 MAY, 06 JUN, 07 JUL, 08 AUG, 09 SEP, 10 OCT, 11 NOV, 12 DEC]

SELECT * FROM Ord_Tbl_Part
WHERE Order_Date BETWEEN '2015-05-01' AND '2015-05-31'
Each segment holds rows that were hash distributed by Order_Number, but once the rows for the table arrive on their respective segments they are partitioned by the month of Order_Date. Each month is stored in separate blocks. The above range query will not do a Full Table Scan (FTS). Each segment merely needs to read its May blocks.
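Partition elimination on a single segment can be modeled with a dictionary keyed by month. This is a minimal sketch under stated assumptions: the 120 orders are made-up sample data, and the `partitions` dictionary is a stand-in for the monthly partitions, not how Greenplum stores them.

```python
from collections import defaultdict
from datetime import date

# Hypothetical orders on one segment: (Order_Number, Order_Date).
orders = [(n, date(2015, 1 + n % 12, 1 + n % 28)) for n in range(1, 121)]

# PARTITION BY RANGE (Order_Date) ... EVERY 1 month, modeled as month -> rows.
partitions = defaultdict(list)
for order in orders:
    partitions[order[1].month].append(order)

low, high = date(2015, 5, 1), date(2015, 5, 31)


def may_orders_partitioned(parts):
    """Partition elimination: only the May partition is scanned."""
    scanned = parts[5]
    return [o for o in scanned if low <= o[1] <= high], len(scanned)


def may_orders_full_scan(all_orders):
    """Without partitions, every row on the segment is scanned."""
    return [o for o in all_orders if low <= o[1] <= high], len(all_orders)


part_result, part_scanned = may_orders_partitioned(partitions)
full_result, full_scanned = may_orders_full_scan(orders)
print(part_scanned, full_scanned)
```

Both approaches return the same May orders, but the partitioned version scans 10 rows instead of 120.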
Tables Can Be Partitioned by Day

CREATE TABLE Ord_Tbl_Day
( Order_Number     integer
 ,Customer_Number  integer
 ,Order_Date       date
 ,Order_Total      decimal(10,2))
DISTRIBUTED BY (Order_Number)
PARTITION BY RANGE (Order_Date)
( START (date '2015-01-01') INCLUSIVE
  END (date '2016-01-01') EXCLUSIVE
  EVERY (INTERVAL '1 Day'));

SELECT * FROM Ord_Tbl_Day WHERE Order_Date = '2015-05-31' ;
Each segment reads a single partition.

SELECT * FROM Ord_Tbl_Day WHERE Order_Date BETWEEN '2015-05-01' AND '2015-05-07' ;
SELECT * FROM Ord_Tbl_Day WHERE Order_Date >= '2015-05-01' AND Order_Date = 1030 is in page 3.
Chapter 4
The Technical Details
The Building of a B-Tree for a Clustered Index (3 of 3)

[Figure: a three-level B-Tree. The Root Node holds the keys 1001, 3000 and 6000. Intermediate Node 1001 points to the leaf page headers 1001, 1030 and 2000. Intermediate Node 3000 points to the leaf page headers 3000, 4000 and 5000. Intermediate Node 6000 points to the leaf page headers 6000, 7000 and 8000. The Leaf Pages contain the actual data rows.]
Let's look at this B-Tree starting at the leaf level. Each leaf is a 32 K page that contains data rows. Each data row has a RowID containing the FileID:PageNo:RowNum, which takes up 32 bytes. The rows are sorted in each page by Employee_No. Each Intermediate node has a pointer to the first RowID and Employee_No for every leaf it is responsible for. The Root node has a pointer to the first RowID and Employee_No for each Intermediate node. As a leaf adds rows and expands past 32 K it splits. As an Intermediate node adds leaves and expands past 32 K it splits into two Intermediate nodes. As a Root node continues to add more Intermediate node pointers and expands past 32 K it splits as well, and a new Root node is created above the two halves. The reason they call it a B-Tree (Balanced Tree) is that every leaf sits at the same depth, so every row is reached in the same number of steps.
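The walk from Root node to Intermediate node to leaf can be sketched with Python's `bisect` module. The key values below are read from the figure residue on this page, so the exact layout is an assumption, and `child_for` and `find_leaf` are hypothetical helper names.

```python
import bisect

# Keys as read from the figure (assumed layout): the first Employee_No under each pointer.
root = [1001, 3000, 6000]
intermediate = {
    1001: [1001, 1030, 2000],
    3000: [3000, 4000, 5000],
    6000: [6000, 7000, 8000],
}


def child_for(keys, target):
    """Return the last key <= target, i.e. the child whose range covers target."""
    position = bisect.bisect_right(keys, target) - 1
    if position < 0:
        raise KeyError(f"{target} is below the smallest key {keys[0]}")
    return keys[position]


def find_leaf(employee_no):
    """Two steps for every lookup: Root -> Intermediate node -> leaf page."""
    node = child_for(root, employee_no)
    leaf = child_for(intermediate[node], employee_no)
    return node, leaf


print(find_leaf(1500))
print(find_leaf(4000))
print(find_leaf(9999))
```

Whatever Employee_No is searched for, the lookup takes the same two steps before it reaches a leaf page, which is the balanced property described above.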
When Do I Create a Clustered Index?
1. OLTP-type applications where very fast single row lookup is required, typically by means of the primary key. Creating a clustered index on the primary key is ideal.
2. Clustered indexes are great for range queries that use operators such as BETWEEN, > and >=.

Result 1
dept_no  last_name   first_name  salary    avgsal
200      Smith       John        48000.00  44944.44
400      Strickling  Cletus      54500.00  48333.33
400      Harrison    Herbert     54500.00  48333.33
Most derived tables involve calculations, aggregations or ordered analytics. This allows tables and derived columns to mix well on the final report. Above, we are finding all employees who make a salary that is greater than the average salary within their own department. We created a derived table that holds all departments and the average salary within the department. We then join the derived table (named TeraTom) to the employee_table where we can check the salary vs. the avg (salary).
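The pattern of joining a table to a derived table of department averages can be run with Python's built-in `sqlite3` module. SQLite is only a stand-in for Greenplum here, and the five employees are made-up sample data. Because SQLite does not accept a column list after the derived table name, the columns are aliased inside the derived table instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee_Table (Dept_No INTEGER, Last_Name TEXT, Salary REAL)")
conn.executemany(
    "INSERT INTO Employee_Table VALUES (?, ?, ?)",
    [
        (100, "Adams", 40000.0),
        (100, "Baker", 60000.0),
        (200, "Clark", 30000.0),
        (200, "Davis", 50000.0),
        (200, "Evans", 70000.0),
    ],
)

# The derived table TeraTom holds one row per department with its average salary.
query = """
SELECT E.Dept_No, E.Last_Name, E.Salary, TeraTom.AVGSAL
FROM Employee_Table AS E
INNER JOIN (SELECT Dept_No AS Depty, AVG(Salary) AS AVGSAL
            FROM Employee_Table
            GROUP BY Dept_No) AS TeraTom
ON E.Dept_No = TeraTom.Depty
WHERE E.Salary > TeraTom.AVGSAL
ORDER BY E.Dept_No, E.Last_Name
"""
above_average = conn.execute(query).fetchall()
print(above_average)
```

Both departments average 50000.00 in this sample, so only Baker and Evans come back. Davis earns exactly the average and is excluded by the greater-than test.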
Chapter 17
Temporary Tables
Quiz - Answer the Questions

SELECT Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table
INNER JOIN
(SELECT Dept_No, AVG(Salary)
 FROM Employee_Table
 GROUP BY Dept_No) as TeraTom (Depty, AVGSAL)
ON Dept_No = Depty ;

1) What is the name of the derived table? __________
2) How many columns are in the derived table? _______
3) What are the names of the derived table columns? ______
4) Is there more than one row in the derived table? _______
5) What common keys join the Employee and Derived tables? _______
6) Why were the join keys named differently? ______________
Answer to Quiz - Answer the Questions

SELECT Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table
INNER JOIN
(SELECT Dept_No, AVG(Salary)
 FROM Employee_Table
 GROUP BY Dept_No) as TeraTom (Depty, AVGSAL)
ON Dept_No = Depty ;

1) What is the name of the derived table? TeraTom
2) How many columns are in the derived table? 2
3) What are the names of the derived table columns? Depty and AVGSAL
4) Is there more than one row in the derived table? Yes
5) What keys join the tables? Dept_No and Depty
6) Why were the join keys named differently? If both were named Dept_No, we would get an error unless we fully qualified the column names.
Clever Tricks on Aliasing Columns in a Derived Table

1) Alias here: the columns are aliased inside the derived table.
SELECT Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table
INNER JOIN
(SELECT Dept_No as Depty, AVG(Salary) as AVGSAL
 FROM Employee_Table
 GROUP BY Dept_No) as TeraTom
ON Dept_No = Depty ;

2) Alias here: the Employee_Table is aliased as E and the join columns are fully qualified.
SELECT E.Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table as E
INNER JOIN
(SELECT Dept_No, AVG(Salary) as AVGSAL
 FROM Employee_Table
 GROUP BY Dept_No) as TeraTom
ON E.Dept_No = TeraTom.Dept_No ;
An example of Two Derived Tables in a Single Query

WITH T (Dept_No, AVGSAL) AS
(SELECT Dept_No, AVG(Salary)
 FROM Employee_Table
 GROUP BY Dept_No)
SELECT T.Dept_No, First_Name, Last_Name, AVGSAL, Counter
FROM Employee_Table as E
INNER JOIN T
ON E.Dept_No = T.Dept_No
INNER JOIN
(SELECT Employee_No,
        SUM(1) OVER (PARTITION BY Dept_No
                     ORDER BY Dept_No, Last_Name
                     ROWS UNBOUNDED PRECEDING)
 FROM Employee_Table) as S (Employee_No, Counter)
ON E.Employee_No = S.Employee_No
ORDER BY T.Dept_No ;

We have two derived tables in our example. One is used in a WITH statement and the other is a derived table within the query itself.
MULTIPLE Derived Tables using the WITH Command

WITH E AS (SELECT Dept_No, Last_Name, Salary
           FROM Employee_Table)
    ,D AS (SELECT Dept_No, Department_Name
           FROM Department_Table)
SELECT E.*, department_name
FROM E INNER JOIN D
ON E.Dept_No = D.Dept_No
WHERE E.Dept_No = 100

Separate multiple derived tables in a WITH by using a comma.

Result 1
e.dept_no  e.last_name  e.salary  department_name
100        Chambers     48850.00  Marketing
Using the WITH Command, we can CREATE multiple Derived tables that can be referenced elsewhere in the query.
Finding the First Occurrence

WITH Derived_Tbl AS
(SELECT Product_ID as Prod, Sale_Date, Daily_Sales,
        ROW_NUMBER() OVER (PARTITION BY Product_ID
                           ORDER BY Sale_Date ASC) AS Row_Num
 FROM Sales_Table)
SELECT * FROM Derived_Tbl
WHERE Row_Num = 1 ;

Result 1
Prod  Sale_Date   Daily_Sales  Row_Num
1000  09/28/2000  48850.40     1
2000  09/28/2000  41888.88     1
3000  09/28/2000  61301.77     1
By using the Row_Number ordered analytic, partitioning by Product_ID and sorting by Sale_Date ASC, we bring back only the first occurrence of each product, which is the row with the earliest Sale_Date. This can be done because we are placing our query in a derived table and then selecting from that derived table using a WHERE clause.
Finding the Last Occurrence

WITH Derived_Tbl AS
(SELECT Product_ID as Prod, Sale_Date, Daily_Sales,
        ROW_NUMBER() OVER (PARTITION BY Product_ID
                           ORDER BY Sale_Date DESC) AS Row_Num
 FROM Sales_Table)
SELECT * FROM Derived_Tbl
WHERE Row_Num = 1 ;

Result 1
Prod  Sale_Date   Daily_Sales  Row_Num
1000  10/04/2000  54553.10     1
2000  10/04/2000  32800.50     1
3000  10/04/2000  15675.33     1
By using the Row_Number ordered analytic, partitioning by Product_ID and sorting by Sale_Date DESC, we bring back only the last occurrence of each product, which is the row with the latest Sale_Date. Because of the descending sort, that row is numbered 1. This can be done because we are placing our query in a derived table and then selecting from that derived table using a WHERE clause.
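What ROW_NUMBER() does with PARTITION BY and ORDER BY can be sketched in plain Python. This is a model of the logic only, not how Greenplum executes it: `number_rows` is a hypothetical helper, and the four sample rows reuse values from the two result sets shown for products 1000 and 2000.

```python
from datetime import date
from itertools import groupby

# (Product_ID, Sale_Date, Daily_Sales), values borrowed from the result sets.
sales = [
    (1000, date(2000, 10, 4), 54553.10),
    (2000, date(2000, 9, 28), 41888.88),
    (1000, date(2000, 9, 28), 48850.40),
    (2000, date(2000, 10, 4), 32800.50),
]


def number_rows(rows, descending=False):
    """Mimic ROW_NUMBER() OVER (PARTITION BY Product_ID ORDER BY Sale_Date)."""
    # ORDER BY Sale_Date (ASC or DESC).
    ordered = sorted(rows, key=lambda r: r[1], reverse=descending)
    # A stable sort by product groups the partitions and keeps the date order inside each.
    ordered.sort(key=lambda r: r[0])
    numbered = []
    for _, group in groupby(ordered, key=lambda r: r[0]):
        for row_num, row in enumerate(group, start=1):
            numbered.append(row + (row_num,))
    return numbered


# WHERE Row_Num = 1 keeps one row per product.
first_rows = [r for r in number_rows(sales) if r[3] == 1]
last_rows = [r for r in number_rows(sales, descending=True) if r[3] == 1]
print(first_rows)
print(last_rows)
```

Switching the sort direction is the only difference between the two results: ascending puts the earliest sale in position 1, descending puts the latest sale there.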
Three Steps to Creating a Temporary Table

1. Create the Temporary Table:
CREATE Temporary TABLE Dept_Sum
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ON COMMIT PRESERVE ROWS ;

2. Populate the Temporary Table with an INSERT/SELECT:
INSERT INTO Dept_Sum
SELECT Dept_No, SUM(Salary)
FROM Employee_Table
GROUP BY 1 ;

3. Query the table all session long:
SELECT * FROM Dept_Sum ORDER BY Dept_No ;
SELECT * FROM Dept_Sum ORDER BY 2 DESC ;
When you use the phrase ON COMMIT PRESERVE ROWS the data will stay in the table all session long. The normal ANSI default is ON COMMIT DELETE ROWS, which will delete the rows after a single transaction. However, Greenplum defaults to ON COMMIT PRESERVE ROWS.
Three Versions of Creating a Temporary Table

CREATE Temporary TABLE Dept_Agg1
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ON COMMIT PRESERVE ROWS ;

CREATE Temporary TABLE Dept_Agg2
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ON COMMIT DELETE ROWS ;

CREATE Temporary TABLE Dept_Agg3
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ON COMMIT DROP ;

I will explain how to use these different techniques in the next few pages.
ON COMMIT PRESERVE ROWS is the Greenplum Default

CREATE Temporary TABLE Dept_Agg5
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ON COMMIT PRESERVE ROWS ;

CREATE Temporary TABLE Dept_Agg6
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ;
This will default to ON COMMIT PRESERVE ROWS.
ANSI defaults to ON COMMIT DELETE ROWS, but Greenplum has cleverly made their default ON COMMIT PRESERVE ROWS. Both examples above accomplish the same thing.
ON COMMIT DELETE ROWS

CREATE Temporary TABLE Dept_Agg8
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ON COMMIT DELETE ROWS ;

INSERT INTO Dept_Agg8
SELECT Dept_No, SUM(Salary)
FROM Employee_Table
GROUP BY 1 ;

SELECT * FROM Dept_Agg8 ORDER BY 1 ;

dept_no  sum_salary
_______  __________
No rows returned because the table is empty
ON COMMIT DELETE ROWS empties the table at the end of every transaction. A statement that runs on its own is its own transaction, so the rows were deleted as soon as the INSERT/SELECT committed. This seems stupid at first, but it is actually smart. The next page will show you how to take advantage of this and why it is used.
How to Use the ON COMMIT DELETE ROWS Option

CREATE Temporary TABLE Dept_Agg7
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ON COMMIT DELETE ROWS ;

Begin Transaction;
INSERT INTO Dept_Agg7
SELECT Dept_No, SUM(Salary)
FROM Employee_Table
GROUP BY 1 ;
SELECT * FROM Dept_Agg7 ORDER BY 1 ;
End Transaction;

Answer Set Returns
dept_no  sum_salary
_______  __________
10       64300.00
100      48850.00
200      89888.88
300      40200.00
400      145000.00
?        32800.50
The ON COMMIT DELETE ROWS option allows you only one transaction after creating the temporary table, but you can embed the INSERT/SELECT and the SELECT that produces the report you need inside a Begin Transaction/End Transaction statement. This option should be used if you only need the temporary table to produce a single report.
ON COMMIT DROP

CREATE Temporary TABLE Dept_Aggb
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ON COMMIT DROP ;

INSERT INTO Dept_Aggb
SELECT Dept_No, SUM(Salary)
FROM Employee_Table
GROUP BY 1 ;

SELECT * FROM Dept_Aggb ORDER BY 1 ;

Error – The table Dept_Aggb does not exist!
ON COMMIT DROP will drop the temporary table at the end of a single transaction. That is why the error above occurred. The CREATE statement ran as its own transaction, so the temporary table was dropped as soon as that transaction committed. This seems stupid at first, but it is actually smart. The next page will show you how to take advantage of this and why it is used.
How to Use the ON COMMIT DROP Option

Begin Transaction;
CREATE Temporary TABLE Dept_AggA
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ON COMMIT DROP ;
INSERT INTO Dept_AggA
SELECT Dept_No, SUM(Salary)
FROM Employee_Table
GROUP BY 1 ;
SELECT * FROM Dept_AggA ORDER BY 1 ;
End Transaction;

Answer Set Returns
dept_no  sum_salary
_______  __________
10       64300.00
100      48850.00
200      89888.88
300      40200.00
400      145000.00
?        32800.50
The ON COMMIT DROP option will drop the temporary table after a single transaction, which includes the CREATE statement. However, you can embed the CREATE Statement, the INSERT/SELECT and the SELECT to get the report you need inside a Begin Transaction/End Transaction statement. The great news is that the table no longer exists!
Create Table AS

This table is exactly like the Order_Table:
CREATE TABLE New_Order AS
SELECT * FROM Order_Table ;

This table uses only some columns:
CREATE TABLE New_Employee AS
SELECT First_Name, Last_Name, Salary
FROM Employee_Table ;

This table is a temporary table:
CREATE Temporary TABLE temp_order AS
SELECT * FROM Order_Table ;

Above are some great examples of how to quickly CREATE a table from another table.
Creating a Temporary Table Using a CTAS that Joins Multiple Tables

CREATE Temporary TABLE Emp_Dept_Temp AS
SELECT E.*, Department_Name, Budget
FROM Employee_Table E
INNER JOIN Department_Table D
ON E.Dept_No = D.Dept_No ;

A CTAS statement can create and populate a temporary table. The table goes away after the session is over.

Result 1
0 rows processed. CREATE TABLE Command Complete

Only the user who created the table can see it, and only in the session it was created in.
You can create a temporary table using a CTAS (Create Table AS) statement, as in the above example.
Create Table LIKE

This example uses an INSERT/SELECT:
CREATE TABLE Sales3 (LIKE Sales_Table) ;
INSERT INTO Sales3 SELECT * FROM Sales_Table ;
SELECT * FROM Sales3 ;

This example creates a temporary table:
CREATE Temporary TABLE Sales4 (LIKE Sales_Table) ;
INSERT INTO Sales4 SELECT * FROM Sales_Table ;
SELECT * FROM Sales4 ;
The example above creates a table using the LIKE statement. It then loads the data with an INSERT/SELECT. You are now ready to query the new table. Notice that you can use the same technique to create a temporary table.
Creating a Clustered Index on a Temporary Table

Create the Table:
CREATE Temporary TABLE Dept_Agg_Vol
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) ON COMMIT PRESERVE ROWS ;

Create an Index (Temp_Idx is the index name, Dept_No is the indexed column):
CREATE INDEX Temp_Idx ON Dept_Agg_Vol (Dept_No) ;

Cluster the Index:
CLUSTER Temp_Idx ON Dept_Agg_Vol ;
Above we have sorted the temporary table Dept_Agg_Vol on each segment by Dept_No. You can have one clustered index on a table because you can only sort the rows one specific way. Having a Clustered Index on a Dept_No column will help with range queries, because the data on each segment is sorted by Dept_No. CLUSTER is not supported with append-only tables or column-oriented tables.
Chapter 18
Character Strings
Chapter 18 – Character Strings
“It’s always been and always will be the same in the world: the horse does the work and the coachman is tipped.” - Anonymous
The LENGTH Command Counts Characters

SELECT First_Name
      ,LENGTH (First_Name) AS Lnth
FROM Employee_Table
WHERE LENGTH (First_Name) < 7
ORDER BY 1 ;

Result 1
first_name  Lnth
Billy       5
Cletus      6
John        4
Mandee      6
The LENGTH command counts the number of characters. If 'Tom' was in the Employee_Table, his length would be three.
The LENGTH Command – Spaces can Count too

SELECT 'T o m' AS First_Name
      ,LENGTH('T o m') AS Lnth

There are spaces in between each letter.

Result 1
First_Name  Lnth
T o m       5

Spaces in between count.
If ‘T o m’ was in the Employee_Table, his length would be 5. Yes, spaces in between do count as characters.
The LENGTH Command Doesn't Count Trailing Spaces

Last_Name is a CHAR (20) column.

SELECT Last_Name
      ,LENGTH (Last_Name) AS Lnth
FROM Employee_Table
ORDER BY 1 ;

Last_Name   Lnth
__________  _____
Chambers    8
Coffing     7
Harrison    8
Jones       5
Larkins     7
Reilly      6
Smith       5
Smythe      6
Strickling  10

The LENGTH command counts characters, but it auto-trims the trailing spaces at the end of each last name.
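What "auto-trims the trailing spaces" means can be modeled in plain Python. This is only an illustration of the behavior described above, not Greenplum code: `char_pad` and `length_like_char` are hypothetical helpers, and the 20-character width matches the CHAR (20) column.

```python
def char_pad(value, width=20):
    """Pad a value with trailing spaces the way a fixed-width CHAR column stores it."""
    return value.ljust(width)


def length_like_char(stored):
    """Count characters while ignoring trailing spaces; spaces in between still count."""
    return len(stored.rstrip(" "))


stored_jones = char_pad("Jones")
stored_tom = char_pad("T o m")

print(len(stored_jones), length_like_char(stored_jones))
print(len(stored_tom), length_like_char(stored_tom))
```

The stored value is always 20 characters wide, but the trimmed count for Jones is 5. The value 'T o m' also counts as 5, because the spaces between the letters are not trailing spaces.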
UPPER and LOWER Commands

SELECT Last_Name AS "Name_Normal"
      ,UPPER (Last_Name) AS "Name_Upper"
      ,LOWER (Last_Name) AS "Name_Lower"
FROM Employee_Table
WHERE Last_Name LIKE 'S%' ;

Result 1
Name_Normal  Name_Upper  Name_Lower
Smythe       SMYTHE      smythe
Strickling   STRICKLING  strickling
Smith        SMITH       smith
Upper () converts text to uppercase and Lower () converts text to lowercase.
Using the LOWER Command

SELECT LOWER('AbCdE') as "Go Low"
FROM Order_Table
LIMIT 1 ;

Result 1
Go Low
abcde
The LOWER function converts all letters in a specified string to lowercase letters. If there are characters in the string that are not letters, they are not affected by the LOWER command.
A LOWER Command Example

SELECT 'They match' AS "Do They Match?"
FROM Order_Table
WHERE LOWER('ABCDE') = 'abcde'
LIMIT 1;

Do They Match?
______________
They match

The LOWER function converts all letters in a specified string to lowercase letters. If there are characters in the string that are not letters, they are not affected by the LOWER command. Above, we compare LOWER('ABCDE') with 'abcde', and they are equivalent because LOWER has lowercased the 'ABCDE'.
Using the UPPER Command

SELECT UPPER('AbCdE') AS "Go upper"
FROM Order_Table
LIMIT 1;

Go upper
________
ABCDE
The UPPER function converts all letters in a specified string to uppercase letters. If there are characters in the string that are not letters, they are not affected by the UPPER command.
An UPPER Command Example

SELECT 'They match' AS "Do They Match?"
FROM Order_Table
WHERE 'ABCDE' = UPPER('abcde')
LIMIT 1;

Do They Match?
______________
They match

The UPPER function converts all letters in a specified string to uppercase letters. If there are characters in the string that are not letters, they are not affected by the UPPER command. Above, we compare 'ABCDE' with UPPER('abcde'), and they are equivalent because UPPER has uppercased the 'abcde'.
Non-Letters are Unaffected by UPPER and LOWER

SELECT LOWER('ABCDE1') AS "Number Stays"
      ,UPPER('abCdE2') AS "Numbers Hold"
FROM Order_Table
LIMIT 1;

Number Stays  Numbers Hold
____________  ____________
abcde1        ABCDE2
The UPPER and LOWER functions convert all letters in a specified string to either upper or lower case letters. If there are characters in the string that are not letters, they are not affected by the UPPER or LOWER commands. Notice in our example that the numbers 1 and 2 were unaffected by the LOWER and UPPER commands.
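A common use of these functions is a comparison that ignores case. The sketch below assumes the Employee_Table used throughout this chapter; the search value 'sMiTh' is a made-up input chosen for illustration.

```sql
-- Uppercase both sides so the match no longer depends on how the name was typed.
-- 'sMiTh' is a hypothetical search value.
SELECT Last_Name
FROM Employee_Table
WHERE UPPER(Last_Name) = UPPER('sMiTh');
```

Applying the same function to both sides keeps the comparison symmetric, so it should find Smith no matter which letters the user capitalized.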
The CHARACTER_LENGTH Command Counts Characters

Employee_Table

Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

First_Name is a VARCHAR column.

SELECT First_Name
      ,CHARACTER_LENGTH(First_Name) AS Lnth
FROM Employee_Table
WHERE CHARACTER_LENGTH(First_Name) < 7
ORDER BY 1;

Answer Set

First_Name  Lnth
__________  ____
Billy       5
Cletus      6
John        4
Mandee      6

The CHARACTER_LENGTH command counts the number of characters. If ‘Tom’ were in the Employee_Table, his length would be three.
The CHARACTER_LENGTH Command and Character Data

Last_Name is a CHAR(20) column.

SELECT Last_Name
      ,CHARACTER_LENGTH(Last_Name) AS Lnth
FROM Employee_Table
ORDER BY 1;

Last_Name   Lnth
__________  ____
Chambers    8
Coffing     7
Harrison    8
Jones       5
Larkins     7
Reilly      6
Smith       5
Smythe      6
Strickling  10

The CHARACTER_LENGTH command brings back the length of the name itself, even for a CHAR(20) data type.
CHARACTER_LENGTH and OCTET_LENGTH

Query 1

SELECT First_Name
      ,CHARACTER_LENGTH(First_Name) AS C_Length
FROM Employee_Table;

Query 2

SELECT First_Name
      ,OCTET_LENGTH(First_Name) AS C_Length
FROM Employee_Table;

You can also use the OCTET_LENGTH command, which counts bytes instead of characters. These two queries get the same exact answer sets here because every character in First_Name occupies a single byte.
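The two functions part ways once a value contains a multi-byte character. This is a small sketch that assumes the database encoding is UTF8; the name 'José' is an invented value and is not in the Employee_Table.

```sql
-- Assumption: UTF8 database encoding, where the accented letter takes two bytes.
SELECT CHARACTER_LENGTH('José') AS c_length   -- 4 characters
      ,OCTET_LENGTH('José')     AS o_length;  -- 5 bytes under UTF8
```

Use CHARACTER_LENGTH when you care about what a person would count, and OCTET_LENGTH when you care about storage.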
The TRIM Command trims both Leading and Trailing Spaces

Query 1

SELECT Last_Name
      ,TRIM(Last_Name) AS No_Spaces
FROM Employee_Table;

Query 2

SELECT Last_Name
      ,TRIM(BOTH FROM Last_Name) AS No_Spaces
FROM Employee_Table;

Both queries above do the exact same thing. They remove spaces from the beginning and the end of the column Last_Name, because BOTH is what TRIM does when no keyword is given.
Trim Combined with the CHARACTER_LENGTH Command

SELECT '  Rodriquez  '
      ,CHARACTER_LENGTH(TRIM('  Rodriquez  ')) AS No_Spaces;

The literal '  Rodriquez  ' has 2 front spaces and 2 back spaces.

'  Rodriquez  '  No_Spaces
_______________  _________
  Rodriquez      9

This will allow for the character count to only be 9 because both the leading and trailing spaces have been cut.
How to TRIM only the Trailing Spaces

SELECT '  Rodriquez  '
      ,CHARACTER_LENGTH(TRIM(TRAILING FROM '  Rodriquez  ')) AS Front_Spaces;

The literal '  Rodriquez  ' has 2 front spaces and 2 back spaces.

'  Rodriquez  '  Front_Spaces
_______________  ____________
  Rodriquez      11

TRIM with TRAILING FROM allows you to trim only the spaces behind the name. Now, we will still get a character count of 11 because we are only cutting off the two trailing spaces and not the two beginning spaces.
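The three TRIM keywords can be compared side by side. The sketch below reuses the same literal with two spaces on each side of the nine-letter name; the column aliases are illustrative.

```sql
-- '  Rodriquez  ' is 2 spaces + 9 letters + 2 spaces = 13 characters.
SELECT CHARACTER_LENGTH(TRIM(LEADING  FROM '  Rodriquez  ')) AS back_spaces_kept   -- 11
      ,CHARACTER_LENGTH(TRIM(TRAILING FROM '  Rodriquez  ')) AS front_spaces_kept  -- 11
      ,CHARACTER_LENGTH(TRIM(BOTH     FROM '  Rodriquez  ')) AS no_spaces;         -- 9
```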
REGEXP_REPLACE

SELECT Dept_No
      ,REGEXP_REPLACE(Dept_No, 0, 1) AS Zero_to_1
FROM Employee_Table
WHERE Dept_No IN (100, 200);

Replace 0 with 1 for Dept_No for the first occurrence.

Dept_No  Zero_to_1
_______  _________
200      210
100      110
200      210
The query above replaces the first occurrence of a zero with a one for the column Dept_No.
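The query above passes integers straight into REGEXP_REPLACE. Newer PostgreSQL-based releases are stricter about implicit conversions to text, so a more portable sketch casts the column and quotes the pattern and replacement. It assumes Dept_No is an integer column; the aliases are illustrative. The optional 'g' flag is the PostgreSQL way to replace every match instead of only the first.

```sql
-- Assumption: Dept_No is an integer column, so it is cast to text explicitly.
SELECT Dept_No
      ,REGEXP_REPLACE(CAST(Dept_No AS VARCHAR), '0', '1')      AS first_zero   -- 200 -> 210
      ,REGEXP_REPLACE(CAST(Dept_No AS VARCHAR), '0', '1', 'g') AS every_zero   -- 200 -> 211
FROM Employee_Table
WHERE Dept_No IN (100, 200);
```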
Concatenation

SELECT First_Name
      ,Last_Name
      ,First_Name || ' ' || Last_Name AS Full_Name
FROM Employee_Table
WHERE First_Name = 'Squiggy';

Two pipe symbols || mean concatenation.

first_name  last_name  Full_Name
__________  _________  _____________
Squiggy     Jones      Squiggy Jones
Concatenation allows you to combine multiple columns into one column. The || (Pipe Symbol) on your keyboard is just above the ENTER key. Don’t put a space in between, but just put two Pipe Symbols together. In this example, we have combined the first name, then a single space, and then the last name to get a new column called Full_Name.
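One thing to watch with || is a NULL operand: in PostgreSQL-compatible systems, concatenating anything with NULL gives NULL. The sketch below uses a literal NULL to stand in for a missing last name, and 'Unknown' is a hypothetical fallback value.

```sql
-- CAST(NULL AS VARCHAR) stands in for a Last_Name that has no value.
SELECT 'Squiggy' || ' ' || CAST(NULL AS VARCHAR)                       AS lost_name   -- NULL
      ,'Squiggy' || ' ' || COALESCE(CAST(NULL AS VARCHAR), 'Unknown')  AS kept_name;  -- Squiggy Unknown
```

Wrapping a nullable column in COALESCE before concatenating keeps one missing piece from wiping out the whole result.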
A Visual of the TRIM Command Using Concatenation

Concatenation without Trim and with Trim

SELECT Last_Name
      ,First_Name
      ,Last_Name || First_Name AS NameBackwards            -- concatenate
      ,TRIM(Last_Name) || First_Name AS TrimNameBackwards
FROM Employee_Table

Last_Name   First_Name  NameBackwards           TrimNameBackwards
__________  __________  ______________________  __________________
Jones       Squiggy     Jones       Squiggy     JonesSquiggy
Smith       John        Smith       John        SmithJohn
Smythe      Richard     Smythe      Richard     SmytheRichard
Harrison    Herbert     Harrison    Herbert     HarrisonHerbert
Chambers    Mandee      Chambers    Mandee      ChambersMandee
Strickling  Cletus      Strickling  Cletus      StricklingCletus
Reilly      William     Reilly      William     ReillyWilliam
Coffing     Billy       Coffing     Billy       CoffingBilly
Larkins     Loraine     Larkins     Loraine     LarkinsLoraine

When you use the TRIM command on a column, that column will have all beginning and ending spaces removed.
Trim and Trailing is Case Sensitive

First_Name is a VARCHAR column.

SELECT First_Name
      ,TRIM(TRAILING 'Y' FROM First_Name) AS No_Y      -- capital 'Y'
      ,TRIM(TRAILING 'y' FROM First_Name) AS Success   -- lowercase 'y'
FROM Employee_Table
ORDER BY 1;

First_Name  No_Y     Success
__________  _______  _______
Billy       Billy    Bill
Cletus      Cletus   Cletus
Herbert     Herbert  Herbert
John        John     John
Loraine     Loraine  Loraine
Mandee      Mandee   Mandee
Richard     Richard  Richard
Squiggy     Squiggy  Squigg
William     William  William

For LEADING and TRAILING TRIM commands, the character you ask to trim is case sensitive.
How to TRIM Trailing Letters

First_Name is a VARCHAR column and Last_Name is a CHAR(20) column.

SELECT First_Name
      ,TRIM(TRAILING 'y' FROM First_Name) AS No_Y
      ,Last_Name
      ,TRIM(TRAILING 'g' FROM (TRIM(Last_Name))) AS No_G
FROM Employee_Table;

First_Name  No_Y     Last_Name   No_G
__________  _______  __________  _________
Squiggy     Squigg   Jones       Jones
John        John     Smith       Smith
Richard     Richard  Smythe      Smythe
Herbert     Herbert  Harrison    Harrison
Mandee      Mandee   Chambers    Chambers
Cletus      Cletus   Strickling  Stricklin
William     William  Reilly      Reilly
Billy       Bill     Coffing     Coffin
Loraine     Loraine  Larkins     Larkins
The above example removed the trailing ‘y’ from the First_Name and the trailing ‘g’ from the Last_Name. Remember that this is case sensitive.
The SUBSTRING Command

SELECT First_Name
      ,SUBSTRING(First_Name FROM 2 FOR 3) AS Quiz    -- start in position 2, go for 3 positions
FROM Employee_Table;

First_Name  Quiz
__________  ____
Squiggy     qui
John        ohn
Richard     ich
Herbert     erb
Mandee      and
Cletus      let
William     ill
Billy       ill
Loraine     ora

This is a SUBSTRING. The substring is passed two parameters, and they are the starting position of the string and the number of positions to return (from the starting position). The above example will start in position 2 and go for 3 positions!
SUBSTRING and SUBSTR are equal, but use different syntax

Query 1 with Substring

SELECT First_Name
      ,SUBSTRING(First_Name FROM 2 FOR 3) AS Quiz
FROM Employee_Table;

Query 2 with Substr

SELECT First_Name
      ,SUBSTR(First_Name, 2, 3) AS Quiz2
FROM Employee_Table;

Both queries above are going to yield the same results! SUBSTR is just a different way of doing a substring. Both take the same two parameters: the starting position and the number of characters to return.
How SUBSTRING Works with NO ENDING POSITION

SELECT First_Name
      ,SUBSTRING(First_Name FROM 2) AS GoToEnd    -- start in position 2
FROM Employee_Table;

First_Name  GoToEnd
__________  _______
Squiggy     quiggy
John        ohn
Richard     ichard
Herbert     erbert
Mandee      andee
Cletus      letus
William     illiam
Billy       illy
Loraine     oraine

If you don’t give the SUBSTRING a FOR length, it will go all the way to the end.
Using SUBSTRING to move backwards

SELECT First_Name
      ,SUBSTRING(First_Name FROM 0 FOR 6) AS Before1    -- start in position 0 (one space before)
FROM Employee_Table;

First_Name  Before1
__________  _______
Squiggy     Squig
John        John
Richard     Richa
Herbert     Herbe
Mandee      Mande
Cletus      Cletu
William     Willi
Billy       Billy
Loraine     Lorai

A starting position of zero moves one space in front of the beginning. Notice that our FOR length is 6, yet ‘Squiggy’ turns into ‘Squig’, which is only five characters, because position zero holds no character. The point being made here is that both the starting position and the ending position can move backwards, which will come in handy as you see other examples.
How SUBSTRING Works with a Starting Position of -1

SELECT First_Name
      ,SUBSTRING(First_Name FROM -1 FOR 3) AS Before2    -- start in position -1, two spaces before
FROM Employee_Table;

First_Name  Before2
__________  _______
Squiggy     S
John        J
Richard     R
Herbert     H
Mandee      M
Cletus      C
William     W
Billy       B
Loraine     L

A starting position of -1 moves two spaces in front of the beginning. Notice that our FOR length is 3, so each name delivers only the first initial. The point being made here is that both the starting position and the ending position can move backwards, which will come in handy as you see other examples.
How SUBSTRING Works with an Ending Position of 0

SELECT First_Name
      ,SUBSTRING(First_Name FROM 3 FOR 0) AS WhatsUp    -- go for 0 positions
FROM Employee_Table;

First_Name  WhatsUp
__________  _______
Squiggy
John
Richard
Herbert
Mandee
Cletus
William
Billy
Loraine
In our example above, we start in position 3, but we go for zero positions, so nothing is delivered in the column. That is what’s up!
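The last four SUBSTRING examples all follow one rule: the window covers positions start through start + length - 1, and any position before 1 simply holds no character. A compact sketch on the literal 'Squiggy' (the aliases are illustrative) shows the rule in one row:

```sql
-- Window = positions start .. start + length - 1; positions below 1 are empty.
SELECT SUBSTRING('Squiggy' FROM  2 FOR 3) AS from_2     -- positions 2..4  -> 'qui'
      ,SUBSTRING('Squiggy' FROM  0 FOR 6) AS from_0     -- positions 0..5  -> 'Squig'
      ,SUBSTRING('Squiggy' FROM -1 FOR 3) AS from_neg1  -- positions -1..1 -> 'S'
      ,SUBSTRING('Squiggy' FROM  3 FOR 0) AS for_0;     -- no positions    -> empty string
```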
An example using SUBSTRING, TRIM and CHARACTER_LENGTH Together

Last_Name is a CHAR(20) column.

SELECT Last_Name
      ,SUBSTRING(Last_Name
                 FROM CHARACTER_LENGTH(TRIM(TRAILING FROM Last_Name)) - 1
                 FOR 2) AS Letters
FROM Employee_Table;

Last_Name   Letters
__________  _______
Jones       es
Smith       th
Smythe      he
Harrison    on
Chambers    rs
Strickling  ng
Reilly      ly
Coffing     ng
Larkins     ns

The SQL above brings back the last two letters of each Last_Name even though the last names are of different length. We first trimmed the trailing spaces off of Last_Name. Then we counted the characters in the Last_Name. Then we subtracted one from the Last_Name character length and passed it to our substring as the starting position. Finally, the FOR 2 tells the substring to return two characters from that position, which are the last two letters.
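The arithmetic is easier to follow on a single value. This sketch walks through the steps for 'Strickling', cast to CHAR(20) to imitate the Last_Name column; the aliases are illustrative.

```sql
-- 'Strickling' has 10 letters, so the last two sit in positions 9 and 10.
SELECT CHARACTER_LENGTH(TRIM(TRAILING FROM CAST('Strickling' AS CHAR(20))))     AS trimmed_len  -- 10
      ,CHARACTER_LENGTH(TRIM(TRAILING FROM CAST('Strickling' AS CHAR(20)))) - 1 AS start_pos    -- 9
      ,SUBSTRING('Strickling' FROM 9 FOR 2)                                     AS letters;     -- 'ng'
```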
The POSITION Command finds a Letter's Position

SELECT Last_Name
      ,POSITION('e' IN Last_Name) AS Find_The_E
      ,POSITION('f' IN Last_Name) AS Find_The_F
FROM Employee_Table;

Last_Name   Find_The_E  Find_The_F
__________  __________  __________
Jones       4           0           (e is in the 4th position)
Smith       0           0
Smythe      6           0
Harrison    0           0           (no f is in the name)
Chambers    6           0
Strickling  0           0
Reilly      2           0           (e is in the 2nd position in the name)
Coffing     0           3           (1st f is in the 3rd position)
Larkins     0           0

This is the position counter. It tells you the position of a letter within the string. Why did Jones have a 4 in the result set? The ‘e’ was in the 4th position. Why did Smith get a zero for both columns? There is no ‘e’ in Smith and no ‘f’ in Smith. If there are two ‘f’s, only the first occurrence is reported.
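POSITION becomes more useful when its result feeds a SUBSTRING. As a rough sketch, the literal 'Squiggy Jones' can be split at its space; the aliases and the search for 'z' are illustrative only.

```sql
-- The space is the 8th character, so the first name ends at position 7.
SELECT POSITION(' ' IN 'Squiggy Jones')                                            AS space_at     -- 8
      ,SUBSTRING('Squiggy Jones' FROM 1 FOR POSITION(' ' IN 'Squiggy Jones') - 1)  AS first_part   -- 'Squiggy'
      ,SUBSTRING('Squiggy Jones' FROM POSITION(' ' IN 'Squiggy Jones') + 1)        AS second_part  -- 'Jones'
      ,POSITION('z' IN 'Squiggy Jones')                                            AS not_found;   -- 0
```

Because a missing character returns 0 rather than NULL, check for 0 before relying on the result as a split point.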
Concatenation

Two Pipe Symbols together (no space) mean concatenate.

SELECT First_Name
      ,Last_Name
      ,First_Name || ' ' || Last_Name AS Full_Name    -- ' ' is a space
FROM Employee_Table
WHERE First_Name = 'Squiggy'

First_Name  Last_Name  Full_Name
__________  _________  _____________
Squiggy     Jones      Squiggy Jones
See those || symbols? Those represent concatenation. That allows you to combine multiple columns into one column. The || (Pipe Symbol) on your keyboard is just above the ENTER key. Don’t put a space in between, but just put two Pipe Symbols together. In this example, we have combined the first name, then a single space and then the last name to get a new column called ‘Full name’ like Squiggy Jones.
Concatenation and SUBSTRING

SELECT First_Name
      ,Last_Name
      ,SUBSTRING(First_Name, 1, 1) || '. ' || Last_Name AS Full_Name    -- '. ' is a period (.) and a space
FROM Employee_Table
WHERE First_Name = 'Squiggy'

First_Name  Last_Name  Full_Name
__________  _________  _________
Squiggy     Jones      S. Jones

Of the three items being concatenated together, what is the first item of concatenation in the example above? The first initial of the First_Name. Then, we concatenated a literal period and a space. Then, we concatenated the Last_Name.
Four Concatenations Together

Last_Name is CHAR(20) and First_Name is VARCHAR(12).

SELECT First_Name
      ,Last_Name
      ,TRIM(Last_Name) || ' ' || SUBSTRING(First_Name, 1, 1) || '.' AS Last_Name_1st
FROM Employee_Table
WHERE First_Name = 'Squiggy';

First_Name  Last_Name  Last_Name_1st
__________  _________  _____________
Squiggy     Jones      Jones S.
Why did we TRIM the Last_Name? To get rid of the spaces or the output would have looked odd. How many items are being concatenated in the example above? There are 4 items concatenated. We start by trimming the Last_Name. Then we concatenate a single space. Then, we concatenate the first initial of the first name. And finally we concatenate a period.
Troubleshooting Concatenation

ERROR: There should never be spaces between the pipe symbols

SELECT First_Name
      ,Last_Name
      ,TRIM(Last_Name) | | First_Name AS LastFirst
FROM Employee_Table
WHERE First_Name = 'Squiggy';

This is now perfect

SELECT First_Name
      ,Last_Name
      ,TRIM(Last_Name) || First_Name AS LastFirst
FROM Employee_Table
WHERE First_Name = 'Squiggy';

First_Name  Last_Name  LastFirst
__________  _________  ____________
Squiggy     Jones      JonesSquiggy
What happened above to cause the error? Can you see it? The Pipe Symbols || have a space between them like | |, when it should be ||. It is a tough one to spot, so be careful.
Chapter 19 – Interrogating the Data
"The difference between genius and stupidity is that genius has its limits" - Albert Einstein
Quiz – What would the Answer be?

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
231222      Wilson      Susie       SO          3.80
280023      McRoberts   Richard     JR          1.90
322133      Bond        Jimmy       JR          3.95
125634      Hanson      Henry       FR          2.88
333450      Smith       Andy        SO          2.00
324652      Delaney     Danny       SR          3.35
260000      Johnson     Stanley     ?           ?
234121      Thomas      Wendy       FR          4.00
123250      Phillips    Martin      SR          3.00

SELECT Class_Code
      ,Grade_Pt / (Grade_Pt * 2) AS Math1
FROM Student_Table
ORDER BY 1, 2;

Can you guess what would return in the Answer Set?

Using the Student_Table above, try to predict what the answer will be if this query runs on the system.
Answer to Quiz – What would the Answer be?

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
231222      Wilson      Susie       SO          3.80
280023      McRoberts   Richard     JR          1.90
322133      Bond        Jimmy       JR          3.95
125634      Hanson      Henry       FR          2.88
333450      Smith       Andy        SO          2.00
324652      Delaney     Danny       SR          3.35
260000      Johnson     Stanley     ?           ?
234121      Thomas      Wendy       FR          4.00
123250      Phillips    Martin      SR          3.00

SELECT Class_Code
      ,Grade_Pt / (Grade_Pt * 2) AS Math1
FROM Student_Table
ORDER BY 1, 2;

Error – Division by zero

You get an error when you DIVIDE by ZERO! Let’s turn the page and fix it!
The NULLIF Command

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
231222      Wilson      Susie       SO          3.80
280023      McRoberts   Richard     JR          1.90
322133      Bond        Jimmy       JR          3.95
125634      Hanson      Henry       FR          2.88
333450      Smith       Andy        SO          2.00
324652      Delaney     Danny       SR          3.35
260000      Johnson     Stanley     ?           ?
234121      Thomas      Wendy       FR          4.00
123250      Phillips    Martin      SR          3.00

SELECT Class_Code
      ,Grade_Pt / (NULLIF(Grade_Pt, 0) * 2) AS Math1
FROM Student_Table;

If you have a calculation where a ZERO could kill the operation, and you don’t want that, you can use the NULLIF command with a second argument of 0 to convert any zero value to a null value. Dividing by a null returns a null instead of an error.
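NULLIF(Grade_Pt, 0) is shorthand for a small CASE expression. The sketch below puts the two forms next to each other, using three literal grade points (0.00, 3.00 and a NULL) in a VALUES list instead of the real Student_Table; the alias Math1_Case is illustrative.

```sql
-- Three literal rows stand in for Student_Table: a zero, a normal value and a NULL.
SELECT Grade_Pt
      ,Grade_Pt / (NULLIF(Grade_Pt, 0) * 2)                                 AS Math1
      ,Grade_Pt / (CASE WHEN Grade_Pt = 0 THEN NULL ELSE Grade_Pt END * 2)  AS Math1_Case
FROM (VALUES (0.00), (3.00), (CAST(NULL AS DECIMAL(3,2)))) AS t (Grade_Pt);
```

Both calculated columns should agree on every row: the zero row and the NULL row return NULL rather than raising a division error, and the 3.00 row returns one half.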
Quiz – Fill in the Answers for the NULLIF Command

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
123250      Phillips    Martin      SR          3.00
234121      Thomas      Wendy       FR          4.00

SELECT Last_Name
      ,NULLIF(Grade_Pt, 0)   AS GP1
      ,NULLIF(Grade_Pt, 3.0) AS GP2
      ,NULLIF(Grade_Pt, 4.0) AS GP3
FROM Student_Table
WHERE Student_ID IN (423400, 123250, 234121)
ORDER BY Last_Name;

Fill in the Answer Set below after looking at the table and the query.

Last_Name   GP1   GP2   GP3
__________  ____  ____  ____
Larkins
Phillips
Thomas

What would the above Answer Set produce from your analysis?
Answer – Fill in the Answers for the NULLIF Command

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
123250      Phillips    Martin      SR          3.00
234121      Thomas      Wendy       FR          4.00

SELECT Last_Name
      ,NULLIF(Grade_Pt, 0)   AS GP1
      ,NULLIF(Grade_Pt, 3.0) AS GP2
      ,NULLIF(Grade_Pt, 4.0) AS GP3
FROM Student_Table
WHERE Student_ID IN (423400, 123250, 234121)
ORDER BY Last_Name;

Last_Name   GP1   GP2   GP3
__________  ____  ____  ____
Larkins     ?     0.00  0.00
Phillips    3.00  ?     3.00
Thomas      4.00  4.00  ?
Look at the answers above. If it doesn’t make sense, go over it again until it does.
The COALESCE Command – Fill In the Answers

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
260000      Johnson     Stanley     ?           ?
234121      Thomas      Wendy       FR          4.00

SELECT Last_Name
      ,Grade_Pt
      ,Class_Code
      ,COALESCE(Grade_Pt, Student_ID) AS ValidStudents
FROM Student_Table
WHERE Last_Name IN ('Johnson', 'Larkins', 'Thomas')
ORDER BY 1;

Fill in the Answer Set below after looking at the table and the query.

Last_Name   Grade_Pt  Class_Code  ValidStudents
__________  ________  __________  _____________
Johnson     ?         ?
Larkins     0.00      FR
Thomas      4.00      FR
Coalesce returns the first non-Null value in a list, and if all values are Null, returns Null.
The COALESCE Answer Set

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
260000      Johnson     Stanley     ?           ?
234121      Thomas      Wendy       FR          4.00

SELECT Last_Name
      ,Grade_Pt
      ,Class_Code
      ,COALESCE(Grade_Pt, Student_ID) AS ValidStudents
FROM Student_Table
WHERE Last_Name IN ('Johnson', 'Larkins', 'Thomas')
ORDER BY 1;

Last_Name   Grade_Pt  Class_Code  ValidStudents
__________  ________  __________  _____________
Johnson     ?         ?           260000
Larkins     0.00      FR          0.00
Thomas      4.00      FR          4.00
Coalesce returns the first non-Null value in a list, and if all values are Null, returns Null.
COALESCE is Equivalent to This CASE Statement

SELECT Last_Name
      ,Grade_Pt
      ,Class_Code
      ,COALESCE(Grade_Pt, Student_ID) AS ValidStudents
FROM Student_Table;

SELECT Last_Name
      ,Grade_Pt
      ,Class_Code
      ,CASE
            WHEN Grade_Pt IS NOT NULL THEN Grade_Pt
            WHEN Student_ID IS NOT NULL THEN Student_ID
            ELSE NULL
       END AS ValidStudents
FROM Student_Table;

Coalesce returns the first non-Null value in a list, and if all values are Null, returns Null. Above are two queries that return the exact same answer set. These examples are designed to give you a better idea of how Coalesce works.
The COALESCE Command

Sample_Table

Last_Name   Home_Phone  Work_Phone  Cell_Phone
__________  __________  __________  __________
Jones       555-1234    444-1234    ?
Patel       ?           456-7890    454-6789
Gonzales    ?           ?           354-0987
Nguyen      ?           ?           ?

SELECT Last_Name
      ,COALESCE(Home_Phone, Work_Phone, Cell_Phone) AS Phone
FROM Sample_Table;

Last_Name   Phone
__________  ________

Fill in the Answer Set above after looking at the table and the query.
Coalesce returns the first non-Null value in a list, and if all values are Null, returns Null.
The COALESCE Answer Set

Sample_Table

Last_Name   Home_Phone  Work_Phone  Cell_Phone
__________  __________  __________  __________
Jones       555-1234    444-1234    ?
Patel       ?           456-7890    454-6789
Gonzales    ?           ?           354-0987
Nguyen      ?           ?           ?

SELECT Last_Name
      ,COALESCE(Home_Phone, Work_Phone, Cell_Phone) AS Phone
FROM Sample_Table;

Last_Name   Phone
__________  ________
Jones       555-1234
Patel       456-7890
Gonzales    354-0987
Nguyen      ?
Coalesce returns the first non-Null value in a list, and if all values are Null, returns Null.
The COALESCE Quiz

Sample_Table

Last_Name   Home_Phone  Work_Phone  Cell_Phone
__________  __________  __________  __________
Jones       555-1234    444-1234    ?
Patel       ?           456-7890    454-6789
Gonzales    ?           ?           354-0987
Nguyen      ?           ?           ?

SELECT Last_Name
      ,COALESCE(Home_Phone, Work_Phone, Cell_Phone, 'No Phone') AS Phone
FROM Sample_Table;

Last_Name   Phone
__________  ________

Fill in the answer set above after looking at the table and the query.
Coalesce returns the first non-Null value in a list, and if all values are Null, returns Null. Since we decided in the above query we don’t want NULLs, notice we have placed a literal ‘No Phone’ in the list. How will this affect the Answer Set?
Answer - The COALESCE Quiz

Sample_Table

Last_Name   Home_Phone  Work_Phone  Cell_Phone
__________  __________  __________  __________
Jones       555-1234    444-1234    ?
Patel       ?           456-7890    454-6789
Gonzales    ?           ?           354-0987
Nguyen      ?           ?           ?

SELECT Last_Name
      ,COALESCE(Home_Phone, Work_Phone, Cell_Phone, 'No Phone') AS Phone
FROM Sample_Table;

Last_Name   Phone
__________  ________
Jones       555-1234
Patel       456-7890
Gonzales    354-0987
Nguyen      No Phone
Answers are above! We put a literal in the list so there’s no chance of NULL returning.
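The literal trick works smoothly here because the phone columns are character data. When the column is numeric, every item in the COALESCE list still has to resolve to one data type, so a text fallback needs a CAST. The sketch below assumes Grade_Pt in the Student_Table is a numeric column; 'No grade' and the alias Grade_Text are hypothetical.

```sql
-- COALESCE(Grade_Pt, 'No grade') would fail because 'No grade' is not a number,
-- so the numeric column is cast to text first.
SELECT Last_Name
      ,COALESCE(CAST(Grade_Pt AS VARCHAR(10)), 'No grade') AS Grade_Text
FROM Student_Table;
```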
The Basics of CAST (Convert and Store)

CAST will convert a column or value’s data type temporarily into another data type. Below is the syntax:

SELECT CAST(<column-name> AS <data-type>[(<length>)])
FROM <table-name>;

Examples using CAST:

CAST ( <value> AS CHAR(5) )       Convert smallint to character
CAST ( <value> AS INTEGER )       Drops the decimals (Greenplum rounds rather than truncates)
CAST ( <value> AS SMALLINT )
CAST ( <value> AS BYTE (128) )
CAST ( <value> AS VARCHAR(5) )
CAST ( <value> AS FLOAT )

Data can be converted from one type to another by using the CAST function. As long as the data involved does not break any data rules (i.e. placing alphabetic or special characters into a numeric data type), the conversion works. The name of the CAST function comes from the Convert and Store operation that it performs.
Some Great CAST (Convert and Store) Examples

SELECT CAST('ABCDE' AS CHAR(1)) AS Trunc
      ,CAST(128 AS CHAR(3))     AS OK
      ,CAST(127 AS INTEGER)     AS Bigger
FROM Dual

TRUNC  OK   BIGGER
_____  ___  ______
A      128  127
The first CAST truncates the five characters (left to right) to form the single character ‘A’. In the second CAST, the integer 128 is converted to three characters and left justified in the output. The 127 was initially stored in a SMALLINT (5 digits - up to 32767) and then converted to an INTEGER. Hence, it uses 11 character positions for its display, ten numeric digits and a sign (positive assumed) and right justified as numeric.
Some Great CAST (Convert and Store) Examples

SELECT CAST(121.53 AS SMALLINT)     AS Whole
      ,CAST(121.53 AS DECIMAL(3,0)) AS Rounder;

Whole  Rounder
_____  __________
122    122.000000
The value of 121.53 was initially stored as a DECIMAL as 5 total digits with 2 of them to the right of the decimal point. Then, it is converted to a SMALLINT using CAST to remove the decimal positions, but notice it rounded up. On the other hand, the CAST in the column called Rounder is converted to a DECIMAL as 3 digits with no digits (3,0) to the right of the decimal, so it will round data values instead of truncating. Since .53 is greater than .5, it is rounded up to 122.
Some Great CAST (Convert and Store) Examples

SELECT Order_Number    AS OrdNo
      ,Customer_Number AS CustNo
      ,Order_Date
      ,Order_Total
      ,CAST(Order_Total AS INTEGER)      AS Chopped
      ,CAST(Order_Total AS DECIMAL(5,0)) AS Rounded
FROM Order_Table;

OrdNo   CustNo    Order_Date  Order_Total  Chopped  Rounded
______  ________  __________  ___________  _______  _______
123585  87323456  10/10/1999  15231.62     15232    15232
123777  57896883  09/09/1999  23454.84     23455    23455
123512  11111111  01/01/1999  8005.91      8006     8006
123456  11111111  05/04/1998  12347.53     12348    12348
123552  31323134  10/01/1999  5111.47      5111     5111

The column Chopped takes Order_Total (a Decimal (10,2)) and CASTs it as an integer, which drops the decimals, but notice it still rounds up or down. Rounded CASTs Order_Total as a Decimal (5,0), which takes the decimals and rounds up if the decimal is .50 or above.
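Since CAST to an integer rounds, a query that really needs to chop the decimals has to say so. A minimal sketch using the first Order_Total value as a literal (the aliases are illustrative) and the standard PostgreSQL numeric functions:

```sql
-- CAST rounds; TRUNC and FLOOR discard the fraction.
SELECT CAST(15231.62 AS INTEGER) AS cast_rounds   -- 15232
      ,TRUNC(15231.62)           AS truncated     -- 15231
      ,FLOOR(15231.62)           AS floored       -- 15231
      ,ROUND(15231.62, 1)        AS one_decimal;  -- 15231.6
```

TRUNC and FLOOR agree for positive numbers only; for a negative value TRUNC moves toward zero while FLOOR moves to the next lower whole number.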
Quiz - The Basics of the CASE Statements

Course_Table

Course_ID  Course_Name               Credits  Seats
_________  ________________________  _______  _____
100        Database Concepts         3        50
200        Introduction to SQL       3        20
210        Advanced SQL              3        22
220        SQL Features              2        25
300        Physical Database Design  4        20
400        Database Administration   4        16

SELECT Course_Name
      ,CASE Credits
            WHEN 1 THEN 'One Credit'
            WHEN 2 THEN 'Two Credits'
            WHEN 3 THEN 'Three Credits'
       END AS CreditAlias
FROM Course_Table
WHERE Course_ID IN (220, 300);

Course_Name               CreditAlias
________________________  ___________
Physical Database Design
SQL Features

This is a CASE STATEMENT which allows you to evaluate a column in your table, and from that, come up with a new answer for your report. Every CASE begins with a CASE, and they all must end with a corresponding END. What would the answer be?
Answer to Quiz - The Basics of the CASE Statements

Course_Table

Course_ID  Course_Name               Credits  Seats
_________  ________________________  _______  _____
100        Database Concepts         3        50
200        Introduction to SQL       3        20
210        Advanced SQL              3        22
220        SQL Features              2        25
300        Physical Database Design  4        20
400        Database Administration   4        16

SELECT Course_Name
      ,CASE Credits
            WHEN 1 THEN 'One Credit'
            WHEN 2 THEN 'Two Credits'
            WHEN 3 THEN 'Three Credits'
       END AS CreditAlias
FROM Course_Table
WHERE Course_ID IN (220, 300);

Course_Name               CreditAlias
________________________  ___________
Physical Database Design  ?
SQL Features              Two Credits

The answer for the Physical Database Design class is null. This is because it fell through the case statement. The answer for the SQL Features course is Two Credits. Once a case statement gets a match, it leaves the statement and gets the next row.
Using an ELSE in the Case Statement

Course_Table

Course_ID  Course_Name               Credits  Seats
_________  ________________________  _______  _____
100        Database Concepts         3        50
200        Introduction to SQL       3        20
210        Advanced SQL              3        22
220        SQL Features              2        25
300        Physical Database Design  4        20
400        Database Administration   4        16

SELECT Course_Name
      ,CASE Credits
            WHEN 1 THEN 'One Credit'
            WHEN 2 THEN 'Two Credits'
            WHEN 3 THEN 'Three Credits'
            ELSE 'Four Credits'
       END AS CreditAlias
FROM Course_Table
WHERE Course_ID IN (220, 300);

Course_Name               CreditAlias
________________________  ____________
Physical Database Design  Four Credits
SQL Features              Two Credits
Now that we have an ELSE in our case statement we are guaranteed that nothing will fall through.
Using an ELSE as a Safety Net

Course_Table

Course_ID  Course_Name               Credits  Seats
_________  ________________________  _______  _____
100        Database Concepts         3        50
200        Introduction to SQL       3        20
210        Advanced SQL              3        22
220        SQL Features              2        25
300        Physical Database Design  4        20
400        Database Administration   4        16

SELECT Course_Name
      ,CASE Credits
            WHEN 1 THEN 'One Credit'
            WHEN 2 THEN 'Two Credits'
            WHEN 3 THEN 'Three Credits'
            WHEN 4 THEN 'Four Credits'
            ELSE 'Do not know'
       END AS CreditAlias
FROM Course_Table;

Now that we have an ELSE in our case statement we are guaranteed that nothing will fall through. An ELSE should be used in case you forgot a possibility and there was no match.
Rules for a Valued Case Statement

SELECT Course_Name
      ,CASE Credits
          WHEN 1 THEN 'One Credit'
          WHEN 2 THEN 'Two Credits'
          WHEN 3 THEN 'Three Credits'
          ELSE 'Credits not found'
       END AS CreditAlias
FROM Course_Table ;

The column Credits follows the word CASE. This is a valued case statement. The value is the column Credits.

Rules for a Valued CASE:
1. You can only check for equality
2. You can only check the value of the column Credits
There are two types of CASE statements. There is the Valued CASE and the Searched CASE. Above are the rules for the Valued CASE statement.
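To make the difference between the two types concrete, here is a small sketch that puts a Valued CASE next to a Searched CASE. The inline Course_Sample rows and the labels 'Heavy', 'Standard' and 'Light' are assumptions chosen for illustration. The Searched CASE places a full condition in each WHEN, so it is not limited to equality tests on a single column.

```sql
-- Course_Sample is a hypothetical inline stand-in for Course_Table.
WITH Course_Sample (Course_Name, Credits) AS (
    VALUES ('Introduction to SQL', 3)
          ,('SQL Features', 2)
          ,('Database Administration', 4)
)
SELECT Course_Name
       -- Valued CASE: a value follows CASE, and each WHEN is an equality test.
      ,CASE Credits
          WHEN 3 THEN 'Three Credits'
          ELSE 'Other'
       END AS Valued_Case
       -- Searched CASE: nothing follows CASE, and each WHEN is a full condition.
      ,CASE
          WHEN Credits >= 4 THEN 'Heavy'
          WHEN Credits >= 3 THEN 'Standard'
          ELSE 'Light'
       END AS Searched_Case
FROM Course_Sample ;
```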
Rules for a Searched Case Statement

SELECT Course_Name
      ,CASE
          WHEN Credits [...]

No value follows the word CASE. This is [...]

[...]
  -> Hash Join (cost=4.62..7.20 rows=7 width=50)
       Hash Cond: sc.course_id = c.course_id
       -> Hash Join (cost=2.23..4.58 rows=8 width=29)
            Hash Cond: sc.student_id = s.student_id
            -> Seq Scan on student_course_table sc (cost=0.00..2.14 rows=7 width=6)
            -> Hash (cost=2.10..2.10 rows=5 width=31)
                 -> Seq Scan on student_table s (cost=0.00..2.10 rows=5 width=31)
       -> Hash (cost=2.24..2.24 rows=6 width=25)
            -> Broadcast Motion 2:2 (slice1; segments: 2) (cost=0.00..2.24 rows=6 width=25)
                 -> Seq Scan on course_table c (cost=0.00..2.06 rows=3 width=25)
What few people understand about joins is that two rows being joined need to be on the same segment. The Student_Course_Table and the Student_Table are joined on Student_ID, and both have a Distribution Key of Student_ID, so the matching rows naturally reside on the same segment. These tables are joined first. After they produce an intermediate answer set, the course_table is broadcast to both segments for the final join.
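The co-location described above comes from the DISTRIBUTED BY clause chosen when the tables are created. The sketch below uses hypothetical _Demo table names and simplified column lists (assumptions for illustration only, not the book's actual definitions). Only the distribution keys matter here.

```sql
-- Hypothetical demo tables; only the DISTRIBUTED BY clauses are the point.
CREATE TABLE Student_Table_Demo
( Student_ID  INTEGER
 ,Last_Name   VARCHAR(20)
) DISTRIBUTED BY (Student_ID) ;

CREATE TABLE Student_Course_Table_Demo
( Student_ID  INTEGER
 ,Course_ID   INTEGER
) DISTRIBUTED BY (Student_ID) ;   -- same key as the student table, so matching rows share a segment

CREATE TABLE Course_Table_Demo
( Course_ID   INTEGER
 ,Course_Name VARCHAR(30)
) DISTRIBUTED BY (Course_ID) ;    -- different key, so its rows must be moved before the final join

EXPLAIN
SELECT s.Last_Name, c.Course_Name
FROM Student_Table_Demo AS s
INNER JOIN Student_Course_Table_Demo AS sc
        ON s.Student_ID = sc.Student_ID
INNER JOIN Course_Table_Demo AS c
        ON sc.Course_ID = c.Course_ID ;
```

Whether the optimizer broadcasts or redistributes the third table depends on table sizes and statistics, so the plan on a given system may differ from the one shown above.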
Chapter 25
Greenplum Explain
Explain of a Derived Table vs. a Correlated Subquery

Both queries will return all columns from the Employee_Table if the employee makes a salary > Avg(Salary) within their own department.

Correlated Subquery

SELECT *
FROM Employee_Table as E
WHERE Salary > (SELECT AVG(Salary)
                FROM Employee_Table as EE
                WHERE E.Dept_No = EE.Dept_No) ;

Derived Table

SELECT E.*
FROM Employee_Table as E
INNER JOIN (SELECT Dept_No, AVG(Salary) as AVGSAL
            FROM Employee_Table
            GROUP BY Dept_No) AS TeraTom
ON E.Dept_No = TeraTom.Dept_No
AND Salary > AVGSAL

Both queries return the exact same answer set.

Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
1333454      200      Smith       John        48000.00
1256349      400      Harrison    Herbert     54500.00
1121334      400      Strickling  Cletus      54500.00

The three rows in the answer set are employees making a greater salary than the average salary within their dept_no. We were able to do this through a correlated subquery and a derived table. Now, we can compare the EXPLAIN plans.
Explain of the Correlated Subquery

SELECT *
FROM Employee_Table as E
WHERE Salary > (SELECT AVG(Salary)
                FROM Employee_Table as EE
                WHERE E.Dept_No = EE.Dept_No) ;

Gather Motion 2:1 (slice3; segments: 2) (cost=2.56..4.92 rows=4 width=43)
  -> Hash Join (cost=2.56..4.92 rows=2 width=43)
       Hash Cond: e.dept_no = "Expr_SUBQUERY".csq_c0
       Join Filter: e.salary > "Expr_SUBQUERY".csq_c1
       -> Redistribute Motion 2:2 (slice1; segments: 2) (cost=0.00..2.27 rows=5 width=43)
            Hash Key: e.dept_no
            -> Seq Scan on employee_table e (cost=0.00..2.09 rows=5 width=43)
       -> Hash (cost=2.48..2.48 rows=3 width=34)
            -> HashAggregate (cost=2.35..2.42 rows=3 width=34)
                 Group By: ee.dept_no
                 -> Redistribute Motion 2:2 (slice2; segments: 2) (cost=2.13..2.25 rows=3 width=34)
                      Hash Key: ee.dept_no
                      -> HashAggregate (cost=2.13..2.13 rows=3 width=34)
                           Group By: ee.dept_no
                           -> Seq Scan on employee_table ee (cost=0.00..2.09 rows=5 width=11)

The EXPLAIN plan of the Derived Table comes next. The two plans are nearly identical.
Explain of the Derived Table

SELECT E.*
FROM Employee_Table as E
INNER JOIN (SELECT Dept_No, AVG(Salary) as AVGSAL
            FROM Employee_Table
            GROUP BY Dept_No) AS TeraTom
ON E.Dept_No = TeraTom.Dept_No
AND Salary > AVGSAL

Gather Motion 2:1 (slice3; segments: 2) (cost=2.56..4.92 rows=4 width=43)
  -> Hash Join (cost=2.56..4.92 rows=2 width=43)
       Hash Cond: e.dept_no = teratom.dept_no
       Join Filter: e.salary > teratom.avgsal
       -> Redistribute Motion 2:2 (slice1; segments: 2) (cost=0.00..2.27 rows=5 width=43)
            Hash Key: e.dept_no
            -> Seq Scan on employee_table e (cost=0.00..2.09 rows=5 width=43)
       -> Hash (cost=2.48..2.48 rows=3 width=34)
            -> HashAggregate (cost=2.35..2.42 rows=3 width=34)
                 Group By: employee_table.dept_no
                 -> Redistribute Motion 2:2 (slice2; segments: 2) (cost=2.13..2.25 rows=3 width=34)
                      Hash Key: employee_table.dept_no
                      -> HashAggregate (cost=2.13..2.13 rows=3 width=34)
                           Group By: employee_table.dept_no
                           -> Seq Scan on employee_table (cost=0.00..2.09 rows=5 width=11)

This plan is nearly identical to the EXPLAIN plan of the Correlated Subquery. Only the names of the intermediate columns differ.
Chapter 26 – Statistical Aggregate Functions
"You can make more friends in two months by becoming interested in other people than you will in two years by trying to get other people interested in you." - Dale Carnegie
The Stats Table

Col1  Col2  Col3  Col4  Col5  Col6
____  ____  ____  ____  ____  ____
1     1     1     30    1     0
2     1     1     29    2     5
3     3     10    28    3     10
4     3     10    27    4     15
5     3     10    26    5     20
6     4     10    25    6     30
7     5     10    24    7     30
8     5     10    23    8     30
9     5     10    22    9     35
10    5     20    21    10    35
11    7     20    20    11    40
12    7     20    19    12    40
13    9     20    18    13    45
14    9     20    17    14    45
15    9     20    16    15    50
16    9     20    15    14    55
17    10    20    14    13    55
18    10    20    13    12    60
19    10    20    12    11    60
20    10    20    11    9     65
21    10    20    10    8     65
22    10    20    9     7     65
23    13    20    8     6     70
24    13    30    7     5     70
25    13    30    6     4     80
26    14    40    5     3     85
27    15    40    4     2     90
28    15    50    3     1     90
29    16    50    2     1     95
30    16    60    1     1     100

Above is the Stats table. This will be used for our statistical examples.
The STDDEV_POP Function

Col1 holds the numbers 1 through 30.

Syntax for using STDDEV_POP:
STDDEV_POP(expression)

SELECT STDDEV_POP(col1) AS SDPCol1
FROM Stats_Table;

SDPCol1
__________________
8.6554414483991899

The standard deviation function is a statistical measure of spread or dispersion of values. It is the square root of the variance, which is the average of the squared differences from the mean (average). It measures the amount by which a set of values differs from the arithmetic mean. The STDDEV_POP function is one of two that calculate the standard deviation. The population is all of the rows included based on the comparison in the WHERE clause.
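The definition above can be checked by computing the population standard deviation by hand. This is a minimal sketch: generate_series(1, 30) is used as a stand-in for Col1 of Stats_Table (an assumption so the query runs without the table), and the aliases Built_In and By_Hand are illustrative names.

```sql
-- generate_series(1, 30) stands in for Col1 of Stats_Table (assumption).
SELECT STDDEV_POP(col1)                                AS Built_In
       -- variance = mean of the squares minus the square of the mean
      ,SQRT(AVG(col1 * col1) - AVG(col1) * AVG(col1))  AS By_Hand
FROM generate_series(1, 30) AS g(col1) ;
```

Both columns should agree with the SDPCol1 value shown above, apart from rounding in the last digits.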
A STDDEV_POP Example

The STDDEV_POP function is one of two that calculates the standard deviation.

SELECT STDDEV_POP(col1) AS Col1
      ,STDDEV_POP(col2) AS Col2
      ,STDDEV_POP(col3) AS Col3
      ,STDDEV_POP(col4) AS Col4
      ,STDDEV_POP(col5) AS Col5
      ,STDDEV_POP(col6) AS Col6
FROM Stats_Table;

Col1   Col2   Col3   Col4   Col5   Col6
_____  _____  _____  _____  _____  _____
8.66   4.39   13.82  8.66   4.42   26.89

The standard deviation function is a statistical measure of spread or dispersion of values.
The STDDEV_SAMP Function

Col1 holds the numbers 1 through 30.

Syntax for using STDDEV_SAMP:
STDDEV_SAMP(expression)

SELECT STDDEV_SAMP(col1) AS SDSCol1
FROM Stats_Table;

SDSCol1
__________________
8.8034084308295046

The standard deviation function is a statistical measure of spread or dispersion of values. It is the square root of the variance, which is based on the squared differences from the mean (average). It measures the amount by which a set of values differs from the arithmetic mean. The STDDEV_SAMP function is one of two that calculate the standard deviation. It treats the rows returned by the WHERE clause as a random sample of a larger population, so it divides by N - 1 instead of N. The population version treats those rows as the entire population.
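The sample and population versions differ only in the divisor (N - 1 versus N), so one can be derived from the other. The query below is a sketch under the same assumption as before, with generate_series(1, 30) standing in for Col1 of Stats_Table. The alias names are illustrative.

```sql
-- generate_series(1, 30) stands in for Col1 of Stats_Table (assumption).
SELECT STDDEV_POP(col1)   AS Pop
      ,STDDEV_SAMP(col1)  AS Samp
       -- sample stddev = population stddev * sqrt(N / (N - 1))
      ,STDDEV_POP(col1) * SQRT(COUNT(*)::numeric / (COUNT(*) - 1)) AS Samp_From_Pop
FROM generate_series(1, 30) AS g(col1) ;
```

Samp and Samp_From_Pop should match each other up to rounding, and both should be slightly larger than Pop.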
A STDDEV_SAMP Example

The STDDEV_SAMP function is one of two that calculates the standard deviation.

SELECT STDDEV_SAMP(col1) AS Col1
      ,STDDEV_SAMP(col2) AS Col2
      ,STDDEV_SAMP(col3) AS Col3
      ,STDDEV_SAMP(col4) AS Col4
      ,STDDEV_SAMP(col5) AS Col5
      ,STDDEV_SAMP(col6) AS Col6
FROM Stats_Table;

Col1   Col2   Col3   Col4   Col5   Col6
_____  _____  _____  _____  _____  _____
8.80   4.47   14.06  8.80   4.50   27.34
The VAR_POP Function

Col1 holds the numbers 1 through 30.

Syntax for using VAR_POP:
VAR_POP(expression)

SELECT VAR_POP(col1) AS VPCol1
FROM Stats_Table;

VPCol1
___________________
74.9166666666666667
A VAR_POP Example

The Variance function is a measure of dispersion (spread of the distribution) as the square of the standard deviation. There are two forms of Variance in Greenplum. VAR_POP is for the entire population of data rows allowed by the WHERE clause.

SELECT VAR_POP(col1) AS Col1
      ,VAR_POP(col2) AS Col2
      ,VAR_POP(col3) AS Col3
      ,VAR_POP(col4) AS Col4
      ,VAR_POP(col5) AS Col5
      ,VAR_POP(col6) AS Col6
FROM Stats_Table;

Col1   Col2   Col3    Col4   Col5   Col6
_____  _____  ______  _____  _____  ______
74.92  19.29  191.06  74.92  19.58  722.81
The VAR_SAMP Function

Col1 holds the numbers 1 through 30.

Syntax for using VAR_SAMP:
VAR_SAMP(expression)

SELECT VAR_SAMP(col1) AS VSCol1
FROM Stats_Table;

VSCol1
______
77.50
A VAR_SAMP Example

The Variance function is a measure of dispersion (spread of the distribution) as the square of the standard deviation. There are two forms of Variance in Greenplum. VAR_SAMP is used for a random sampling of the data rows allowed through by the WHERE clause.

SELECT VAR_SAMP(col1) AS Col1
      ,VAR_SAMP(col2) AS Col2
      ,VAR_SAMP(col3) AS Col3
      ,VAR_SAMP(col4) AS Col4
      ,VAR_SAMP(col5) AS Col5
      ,VAR_SAMP(col6) AS Col6
FROM Stats_Table ;

Col1   Col2   Col3    Col4   Col5   Col6
_____  _____  ______  _____  _____  ______
77.50  19.95  197.65  77.50  20.25  747.73
The VARIANCE Function

Col1 holds the numbers 1 through 30.

Syntax for using VARIANCE:
VARIANCE(expression)

SELECT VARIANCE(col1) AS VSCol1
FROM Stats_Table;

VSCol1
______
77.50

The Variance function is a measure of dispersion (spread of the distribution) as the square of the standard deviation. VARIANCE returns the sample variance, the same 77.50 that VAR_SAMP returns for col1.
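The three variance functions can be compared in one query. This is a sketch that again assumes generate_series(1, 30) as a stand-in for Col1 of Stats_Table. In PostgreSQL, on which Greenplum is based, VARIANCE is documented as an alias for VAR_SAMP. The alias names below are illustrative.

```sql
-- generate_series(1, 30) stands in for Col1 of Stats_Table (assumption).
SELECT VAR_POP(col1)                        AS Var_Pop
      ,VAR_SAMP(col1)                       AS Var_Samp
      ,VARIANCE(col1)                       AS Variance_Alias  -- expected to equal Var_Samp
       -- variance is the square of the standard deviation
      ,STDDEV_POP(col1) * STDDEV_POP(col1)  AS StdDev_Squared  -- expected to equal Var_Pop
FROM generate_series(1, 30) AS g(col1) ;
```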
A VARIANCE Example

The Variance function is a measure of dispersion (spread of the distribution) as the square of the standard deviation. There are two forms of Variance in Greenplum. VARIANCE, like VAR_SAMP, is used for a random sampling of the data rows allowed through by the WHERE clause.

SELECT VARIANCE(col1) AS Col1
      ,VARIANCE(col2) AS Col2
      ,VARIANCE(col3) AS Col3
      ,VARIANCE(col4) AS Col4
      ,VARIANCE(col5) AS Col5
      ,VARIANCE(col6) AS Col6
FROM Stats_Table ;

Col1   Col2   Col3    Col4   Col5   Col6
_____  _____  ______  _____  _____  ______
77.50  19.95  197.65  77.50  20.25  747.73
The CORR Function

Col1 holds the numbers 1 through 30.

Syntax for using CORR:
CORR(expression1, expression2)

SELECT CORR(col1, col2) AS CCol1and2
FROM Stats_Table;

CCol1and2
_________
0.99

The correlation coefficient is a number between -1 and 1. It is calculated from a number of pairs of observations or linear points (X,Y).

Where:
 1 = perfect positive correlation
 0 = no correlation
-1 = perfect negative correlation

The CORR function is a binary function, meaning that two variables are used as input to it. It measures the association between 2 random variables. If the variables are such that when one changes the other does so in a related manner, they are correlated. Independent variables are not correlated because the change in one does not necessarily cause the other to change.
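The correlation coefficient can also be built from the covariance and the two standard deviations. The sketch below assumes a hypothetical inline Pairs set in which col1 runs from 1 to 30 and col4 runs from 30 down to 1, mirroring Col1 and Col4 of the Stats table.

```sql
-- Pairs is a hypothetical stand-in for Col1 and Col4 of Stats_Table.
WITH Pairs AS (
    SELECT n::double precision        AS col1
          ,(31 - n)::double precision AS col4
    FROM generate_series(1, 30) AS g(n)
)
SELECT CORR(col1, col4)  AS Built_In
       -- correlation = covariance / (stddev of one * stddev of the other)
      ,COVAR_POP(col1, col4) / (STDDEV_POP(col1) * STDDEV_POP(col4)) AS By_Hand
FROM Pairs ;
```

Because col4 falls by exactly one each time col1 rises by one, both columns should show a perfect negative correlation, apart from floating-point rounding.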
A CORR Example

Where:
 1 = perfect positive correlation
 0 = no correlation
-1 = perfect negative correlation

SELECT CORR(col1, col2) AS C1_2
      ,CORR(col1, col3) AS C1_3
      ,CORR(col1, col4) AS C1_4
      ,CORR(col1, col5) AS C1_5
      ,CORR(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2   C1_3   C1_4   C1_5   C1_6
_____  _____  _____  _____  _____
0.99   0.89   -1.00  -0.15  0.99
Another CORR Example so you can Compare

SELECT CORR(col4, col2) AS C4_2
      ,CORR(col4, col3) AS C4_3
      ,CORR(col4, col1) AS C4_1
      ,CORR(col4, col5) AS C4_5
      ,CORR(col4, col6) AS C4_6
FROM Stats_Table ;

C4_2   C4_3   C4_1   C4_5   C4_6
_____  _____  _____  _____  _____
-0.99  -0.89  -1.00  0.15   -0.99

SELECT CORR(col1, col2) AS C1_2
      ,CORR(col1, col3) AS C1_3
      ,CORR(col1, col4) AS C1_4
      ,CORR(col1, col5) AS C1_5
      ,CORR(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2   C1_3   C1_4   C1_5   C1_6
_____  _____  _____  _____  _____
0.99   0.89   -1.00  -0.15  0.99
The COVAR_POP Function

Col1 holds the numbers 1 through 30.

Syntax:
COVAR_POP(expression1, expression2)

SELECT COVAR_POP(col1, col2) AS CCol1_2
FROM Stats_Table;

CCol1_2
_______
37.5

The covariance is a statistical measure of the tendency of two variables to change in conjunction with each other. It is equal to the product of their standard deviations and their correlation coefficient. The covariance is a statistic used for bivariate samples or bivariate distributions. It is used for working out the equations for regression lines and the product-moment correlation coefficient.
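The population covariance can be reproduced from plain averages: it is the mean of the products minus the product of the means. The following sketch assumes the same hypothetical Pairs set as a stand-in for Col1 and Col4 of the Stats table.

```sql
-- Pairs is a hypothetical stand-in for Col1 and Col4 of Stats_Table.
WITH Pairs AS (
    SELECT n::double precision        AS col1
          ,(31 - n)::double precision AS col4
    FROM generate_series(1, 30) AS g(n)
)
SELECT COVAR_POP(col1, col4)                     AS Built_In
       -- covariance = mean of the products - product of the means
      ,AVG(col1 * col4) - AVG(col1) * AVG(col4)  AS By_Hand
       -- covariance = correlation * stddev of one * stddev of the other
      ,CORR(col1, col4) * STDDEV_POP(col1) * STDDEV_POP(col4) AS From_Corr
FROM Pairs ;
```

All three columns should agree up to floating-point rounding.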
A COVAR_POP Example

The covariance is a statistical measure of the tendency of two variables to change in conjunction with each other. It is equal to the product of their standard deviations and their correlation coefficient.

SELECT COVAR_POP(col1, col2) AS C1_2
      ,COVAR_POP(col1, col3) AS C1_3
      ,COVAR_POP(col1, col4) AS C1_4
      ,COVAR_POP(col1, col5) AS C1_5
      ,COVAR_POP(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2    C1_3    C1_4    C1_5    C1_6
______  ______  ______  ______  ______
37.50   105.90  -74.92  -5.82   230.75
Another COVAR_POP Example so you can Compare

SELECT COVAR_POP(col4, col2) AS C4_2
      ,COVAR_POP(col4, col3) AS C4_3
      ,COVAR_POP(col4, col1) AS C4_1
      ,COVAR_POP(col4, col5) AS C4_5
      ,COVAR_POP(col4, col6) AS C4_6
FROM Stats_Table ;

C4_2    C4_3     C4_1    C4_5    C4_6
______  _______  ______  ______  _______
-37.50  -105.90  -74.92  5.82    -230.75

SELECT COVAR_POP(col1, col2) AS C1_2
      ,COVAR_POP(col1, col3) AS C1_3
      ,COVAR_POP(col1, col4) AS C1_4
      ,COVAR_POP(col1, col5) AS C1_5
      ,COVAR_POP(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2    C1_3    C1_4    C1_5    C1_6
______  ______  ______  ______  ______
37.50   105.90  -74.92  -5.82   230.75
The COVAR_SAMP Function

Col1 holds the numbers 1 through 30.

Syntax:
COVAR_SAMP(expression1, expression2)

SELECT COVAR_SAMP(col1, col2) AS CCol1_2
FROM Stats_Table;

CCol1_2
_______
38.79

The COVAR_SAMP function is sample covariance.
A COVAR_SAMP Example

The function eliminates all expression pairs where either expression in the pair is NULL.

SELECT COVAR_SAMP(col1, col2) AS C1_2
      ,COVAR_SAMP(col1, col3) AS C1_3
      ,COVAR_SAMP(col1, col4) AS C1_4
      ,COVAR_SAMP(col1, col5) AS C1_5
      ,COVAR_SAMP(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2    C1_3    C1_4    C1_5    C1_6
______  ______  ______  ______  ______
38.79   109.55  -77.50  -6.02   238.71
Another COVAR_SAMP Example so you can Compare

SELECT COVAR_SAMP(col1, col2) AS C1_2
      ,COVAR_SAMP(col1, col3) AS C1_3
      ,COVAR_SAMP(col1, col4) AS C1_4
      ,COVAR_SAMP(col1, col5) AS C1_5
      ,COVAR_SAMP(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2    C1_3    C1_4    C1_5    C1_6
______  ______  ______  ______  ______
38.79   109.55  -77.50  -6.02   238.71

SELECT COVAR_SAMP(col4, col2) AS C4_2
      ,COVAR_SAMP(col4, col3) AS C4_3
      ,COVAR_SAMP(col4, col1) AS C4_1
      ,COVAR_SAMP(col4, col5) AS C4_5
      ,COVAR_SAMP(col4, col6) AS C4_6
FROM Stats_Table ;

C4_2    C4_3     C4_1    C4_5    C4_6
______  _______  ______  ______  _______
-38.79  -109.55  -77.50  6.02    -238.71
The REGR_INTERCEPT Function

Syntax for using REGR_INTERCEPT:
REGR_INTERCEPT(dependent-expression, independent-expression)

SELECT REGR_INTERCEPT(col1, col2) AS RIofCol1_2
FROM Stats_Table;

RIofCol1_2
__________
-1.35
A REGR_INTERCEPT Example

A regression line is a line of best fit, drawn through a set of points on a graph of X and Y coordinates. It uses the Y coordinate as the Dependent Variable and the X value as the Independent Variable. REGR_INTERCEPT returns the Y value at which this line crosses the Y axis (where X = 0). The two regression lines (Y on X and X on Y) always meet at the mean of the data points (x,y), where x=AVG(x) and y=AVG(y). This point is not usually one of the original data points.

SELECT REGR_INTERCEPT(col1, col2) AS C1_2
      ,REGR_INTERCEPT(col1, col3) AS C1_3
      ,REGR_INTERCEPT(col1, col4) AS C1_4
      ,REGR_INTERCEPT(col1, col5) AS C1_5
      ,REGR_INTERCEPT(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2   C1_3   C1_4   C1_5   C1_6
_____  _____  _____  _____  _____
-1.35  3.45   31.00  17.65  -0.83
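Because the regression line passes through the point (AVG(x), AVG(y)), the intercept can be recovered from the slope and the two averages. The sketch below assumes a hypothetical inline Pairs set that mirrors Col1 (as Y) and Col4 (as X) of the Stats table.

```sql
-- Pairs is a hypothetical stand-in for Col1 (y) and Col4 (x) of Stats_Table.
WITH Pairs AS (
    SELECT n::double precision        AS y
          ,(31 - n)::double precision AS x
    FROM generate_series(1, 30) AS g(n)
)
SELECT REGR_INTERCEPT(y, x)                 AS Built_In
       -- intercept = mean of y - slope * mean of x
      ,AVG(y) - REGR_SLOPE(y, x) * AVG(x)   AS By_Hand
FROM Pairs ;
```

For this data y is always 31 - x, so both columns should come out at 31 up to rounding, in line with the C1_4 value of 31.00 in the example.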
Another REGR_INTERCEPT Example so you can Compare

SELECT REGR_INTERCEPT(col1, col2) AS C1_2
      ,REGR_INTERCEPT(col1, col3) AS C1_3
      ,REGR_INTERCEPT(col1, col4) AS C1_4
      ,REGR_INTERCEPT(col1, col5) AS C1_5
      ,REGR_INTERCEPT(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2   C1_3   C1_4   C1_5   C1_6
_____  _____  _____  _____  _____
-1.35  3.45   31.00  17.65  -0.83

SELECT REGR_INTERCEPT(col4, col2) AS C4_2
      ,REGR_INTERCEPT(col4, col3) AS C4_3
      ,REGR_INTERCEPT(col4, col1) AS C4_1
      ,REGR_INTERCEPT(col4, col5) AS C4_5
      ,REGR_INTERCEPT(col4, col6) AS C4_6
FROM Stats_Table ;

C4_2   C4_3   C4_1   C4_5   C4_6
_____  _____  _____  _____  _____
32.35  27.55  31.00  13.35  31.83
The REGR_SLOPE Function

Syntax for using REGR_SLOPE:
REGR_SLOPE(dependent-expression, independent-expression)

SELECT REGR_SLOPE(col1, col2) AS RSCol1_2
FROM Stats_Table;

RSCol1_2
________
1.94

A regression line is a line of best fit, drawn through a set of points on a graph of X and Y coordinates. It uses the Y coordinate as the Dependent Variable, and the X value as the Independent Variable. The slope of the line is the amount by which Y changes for each one-unit change in X. The vertical slope is Y on X and the horizontal slope is X on Y.
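The slope of the least-squares line equals the covariance of the two variables divided by the variance of the independent variable. The sketch below checks this against REGR_SLOPE, using a hypothetical inline Pairs set that mirrors Col1 (as Y) and Col4 (as X) of the Stats table.

```sql
-- Pairs is a hypothetical stand-in for Col1 (y) and Col4 (x) of Stats_Table.
WITH Pairs AS (
    SELECT n::double precision        AS y
          ,(31 - n)::double precision AS x
    FROM generate_series(1, 30) AS g(n)
)
SELECT REGR_SLOPE(y, x)               AS Built_In
       -- slope = covariance(y, x) / variance(x)
      ,COVAR_POP(y, x) / VAR_POP(x)   AS By_Hand
FROM Pairs ;
```

Here y drops by one for every one-unit rise in x, so both columns should show a slope of -1 up to rounding, in line with the C1_4 value of -1.00 in the examples.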
A REGR_SLOPE Example

A regression line is a line of best fit, drawn through a set of points on a graph of X and Y coordinates. It uses the Y coordinate as the Dependent Variable, and the X value as the Independent Variable. The slope of the line is the amount by which Y changes for each one-unit change in X. The vertical slope is Y on X and the horizontal slope is X on Y.

SELECT REGR_SLOPE(col1, col2) AS C1_2
      ,REGR_SLOPE(col1, col3) AS C1_3
      ,REGR_SLOPE(col1, col4) AS C1_4
      ,REGR_SLOPE(col1, col5) AS C1_5
      ,REGR_SLOPE(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2   C1_3   C1_4   C1_5   C1_6
_____  _____  _____  _____  _____
1.94   0.55   -1.00  -0.30  0.32
Another REGR_SLOPE Example so you can Compare

SELECT REGR_SLOPE(col1, col2) AS C1_2
      ,REGR_SLOPE(col1, col3) AS C1_3
      ,REGR_SLOPE(col1, col4) AS C1_4
      ,REGR_SLOPE(col1, col5) AS C1_5
      ,REGR_SLOPE(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2   C1_3   C1_4   C1_5   C1_6
_____  _____  _____  _____  _____
1.94   0.55   -1.00  -0.30  0.32

SELECT REGR_SLOPE(col4, col2) AS C4_2
      ,REGR_SLOPE(col4, col3) AS C4_3
      ,REGR_SLOPE(col4, col1) AS C4_1
      ,REGR_SLOPE(col4, col5) AS C4_5
      ,REGR_SLOPE(col4, col6) AS C4_6
FROM Stats_Table ;

C4_2   C4_3   C4_1   C4_5   C4_6
_____  _____  _____  _____  _____
-1.94  -0.55  -1.00  0.30   -0.32
The REGR_AVGX Function

Syntax for using REGR_AVGX:
REGR_AVGX(dependent-expression, independent-expression)

SELECT REGR_AVGX(col1, col2) AS RSCol1_2
FROM Stats_Table;

RSCol1_2
________
8.67

The REGR_AVGX function is the average of the independent variable (sum(X)/N).
A REGR_AVGX Example

The REGR_AVGX function is the average of the independent variable (sum(X)/N).

SELECT REGR_AVGX(col1, col2) AS C1_2
      ,REGR_AVGX(col1, col3) AS C1_3
      ,REGR_AVGX(col1, col4) AS C1_4
      ,REGR_AVGX(col1, col5) AS C1_5
      ,REGR_AVGX(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2   C1_3   C1_4   C1_5   C1_6
_____  _____  _____  _____  _____
8.67   21.73  15.5   7.23   51.17
Another REGR_AVGX Example so you can Compare

SELECT REGR_AVGX(col1, col2) AS C1_2
      ,REGR_AVGX(col1, col3) AS C1_3
      ,REGR_AVGX(col1, col4) AS C1_4
      ,REGR_AVGX(col1, col5) AS C1_5
      ,REGR_AVGX(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2   C1_3   C1_4   C1_5   C1_6
_____  _____  _____  _____  _____
8.67   21.73  15.5   7.23   51.17

SELECT REGR_AVGX(col4, col2) AS C4_2
      ,REGR_AVGX(col4, col3) AS C4_3
      ,REGR_AVGX(col4, col1) AS C4_1
      ,REGR_AVGX(col4, col5) AS C4_5
      ,REGR_AVGX(col4, col6) AS C4_6
FROM Stats_Table ;

C4_2   C4_3   C4_1   C4_5   C4_6
_____  _____  _____  _____  _____
8.67   21.73  15.5   7.23   51.17
The REGR_AVGY Function

Syntax for using REGR_AVGY:
REGR_AVGY(dependent-expression, independent-expression)

SELECT REGR_AVGY(col1, col2) AS RSCol1_2
FROM Stats_Table;

RSCol1_2
________
15.5

The REGR_AVGY function is the average of the dependent variable (sum(Y)/N).
A REGR_AVGY Example

The REGR_AVGY function is the average of the dependent variable (sum(Y)/N). The dependent variable is col1 in every expression below, and no pair contains a NULL, so each result is the average of col1.

SELECT REGR_AVGY(col1, col2) AS C1_2
      ,REGR_AVGY(col1, col3) AS C1_3
      ,REGR_AVGY(col1, col4) AS C1_4
      ,REGR_AVGY(col1, col5) AS C1_5
      ,REGR_AVGY(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2   C1_3   C1_4   C1_5   C1_6
_____  _____  _____  _____  _____
15.5   15.5   15.5   15.5   15.5
Another REGR_AVGY Example so you can Compare

SELECT REGR_AVGY(col1, col2) AS C1_2
      ,REGR_AVGY(col1, col3) AS C1_3
      ,REGR_AVGY(col1, col4) AS C1_4
      ,REGR_AVGY(col1, col5) AS C1_5
      ,REGR_AVGY(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2   C1_3   C1_4   C1_5   C1_6
_____  _____  _____  _____  _____
15.5   15.5   15.5   15.5   15.5

SELECT REGR_AVGY(col4, col2) AS C4_2
      ,REGR_AVGY(col4, col3) AS C4_3
      ,REGR_AVGY(col4, col1) AS C4_1
      ,REGR_AVGY(col4, col5) AS C4_5
      ,REGR_AVGY(col4, col6) AS C4_6
FROM Stats_Table ;

C4_2   C4_3   C4_1   C4_5   C4_6
_____  _____  _____  _____  _____
15.5   15.5   15.5   15.5   15.5
The REGR_COUNT Function

Syntax for using REGR_COUNT:
REGR_COUNT(dependent-expression, independent-expression)

SELECT REGR_COUNT(col1, col2) AS RSCol1_2
FROM Stats_Table;

RSCol1_2
_________
       30
The REGR_COUNT is the number of input rows in which both expressions are non-null.
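To see what "both expressions are non-null" means in practice, the count can be reproduced with an ordinary conditional COUNT. This is a minimal sketch, not part of the original example; the aliases Regr_Cnt and Manual_Cnt are made up for the illustration, and it assumes Stats_Table is loaded with the rows used in this chapter.

```sql
-- Sketch only: Regr_Cnt and Manual_Cnt are hypothetical aliases.
-- COUNT skips NULLs, so the CASE yields 1 only for rows that
-- REGR_COUNT would also count (both col1 and col2 present).
SELECT REGR_COUNT(col1, col2) AS Regr_Cnt
      ,COUNT(CASE WHEN col1 IS NOT NULL
                   AND col2 IS NOT NULL
                  THEN 1 END)  AS Manual_Cnt
FROM Stats_Table;
```

With no NULLs in either column, both values should equal the table's row count, which is 30 for the data used here.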
A REGR_COUNT Example
The REGR_COUNT function is the number of input rows in which both expressions are non-null.
SELECT REGR_COUNT(col1, col2) AS C1_2
      ,REGR_COUNT(col1, col3) AS C1_3
      ,REGR_COUNT(col1, col4) AS C1_4
      ,REGR_COUNT(col1, col5) AS C1_5
      ,REGR_COUNT(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2   C1_3   C1_4   C1_5   C1_6
_____  _____  _____  _____  _____
   30     30     30     30     30
The REGR_R2 Function

Syntax for using REGR_R2:
REGR_R2(Y, X)

SELECT REGR_R2(col1, col2) AS RSCol1_2
FROM Stats_Table;

RSCol1_2
_________
     0.97
The REGR_R2 is the square of the correlation coefficient.
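Since REGR_R2 is described as the square of the correlation coefficient, one way to check that is to square the result of CORR directly. The query below is a hedged sketch, assuming the CORR and POWER functions are available (they are in PostgreSQL-based systems); the aliases Regr_Result and Corr_Squared are invented for this illustration.

```sql
-- Sketch only: Regr_Result and Corr_Squared are hypothetical aliases.
-- CORR(Y, X) returns the correlation coefficient; squaring it
-- should reproduce REGR_R2(Y, X) for the same pair of columns.
SELECT REGR_R2(col1, col2)        AS Regr_Result
      ,POWER(CORR(col1, col2), 2) AS Corr_Squared
FROM Stats_Table;
```

For col1 and col2 in Stats_Table, both columns should round to the 0.97 shown above.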
A REGR_R2 Example
The REGR_R2 is the square of the correlation coefficient.
SELECT REGR_R2(col1, col2) AS C1_2
      ,REGR_R2(col1, col3) AS C1_3
      ,REGR_R2(col1, col4) AS C1_4
      ,REGR_R2(col1, col5) AS C1_5
      ,REGR_R2(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2   C1_3   C1_4   C1_5   C1_6
_____  _____  _____  _____  _____
 0.97   0.78      1   0.02   0.98
The REGR_SXX Function

Syntax for using REGR_SXX:
REGR_SXX(Y, X)

SELECT REGR_SXX(col1, col2) AS RSCol1_2
FROM Stats_Table;

RSCol1_2
_________
   578.67
The REGR_SXX is the sum(X^2) - sum(X)^2/N ("sum of squares" of the independent variable).
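The formula can be written out with plain SUM and COUNT to confirm what REGR_SXX computes. This is only a sketch: the aliases Regr_Result and Manual_Result are invented, and the cast to numeric is a precaution in case the columns are integers (the column types are not stated here), so that the division is not truncated.

```sql
-- Sketch only: Regr_Result and Manual_Result are hypothetical aliases.
-- X is the second argument (col2). The WHERE clause mirrors the rule
-- that only rows with both values present take part.
SELECT REGR_SXX(col1, col2) AS Regr_Result
      ,SUM(col2::numeric * col2)
         - SUM(col2::numeric) * SUM(col2::numeric) / COUNT(*)
                            AS Manual_Result
FROM Stats_Table
WHERE col1 IS NOT NULL
  AND col2 IS NOT NULL;
```

For the Stats_Table data, sum(X^2) is 2832 and sum(X) is 260, so the manual column works out to 2832 - 67600/30, which rounds to the 578.67 shown above.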
A REGR_SXX Example
The REGR_SXX is the sum(X^2) - sum(X)^2/N ("sum of squares" of the independent variable).
SELECT REGR_SXX(col1, col2) AS C1_2
      ,REGR_SXX(col1, col3) AS C1_3
      ,REGR_SXX(col1, col4) AS C1_4
      ,REGR_SXX(col1, col5) AS C1_5
      ,REGR_SXX(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2     C1_3      C1_4     C1_5     C1_6
______   _______   ______   ______   ________
578.67   5731.87   2247.5   587.37   21684.17
The REGR_SXY Function

Syntax for using REGR_SXY:
REGR_SXY(Y, X)

SELECT REGR_SXY(col1, col2) AS RSCol1_2
FROM Stats_Table;

RSCol1_2
_________
     1125
The REGR_SXY is the sum(X*Y) - sum(X) * sum(Y)/N ("sum of products" of independent times dependent variable).
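The same kind of hand calculation works for the sum of products. The query below is a sketch under the same assumptions as before: the aliases Regr_Result and Manual_Result are invented, and the numeric cast guards against integer division if the columns are integers.

```sql
-- Sketch only: Regr_Result and Manual_Result are hypothetical aliases.
-- Y is the first argument (col1) and X is the second (col2).
SELECT REGR_SXY(col1, col2) AS Regr_Result
      ,SUM(col1::numeric * col2)
         - SUM(col1::numeric) * SUM(col2::numeric) / COUNT(*)
                            AS Manual_Result
FROM Stats_Table
WHERE col1 IS NOT NULL
  AND col2 IS NOT NULL;
```

For the Stats_Table data, sum(X*Y) is 5155, sum(X) is 260 and sum(Y) is 465, so the manual column works out to 5155 - 4030 = 1125, matching the result above.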
A REGR_SXY Example
The REGR_SXY is the sum(X*Y) - sum(X) * sum(Y)/N ("sum of products" of independent times dependent variable).
SELECT REGR_SXY(col1, col2) AS C1_2
      ,REGR_SXY(col1, col3) AS C1_3
      ,REGR_SXY(col1, col4) AS C1_4
      ,REGR_SXY(col1, col5) AS C1_5
      ,REGR_SXY(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2     C1_3     C1_4      C1_5     C1_6
______   ______   _______   ______   ______
  1125     3177   -2247.5   -174.5   6922.5
The REGR_SYY Function

Syntax for using REGR_SYY:
REGR_SYY(Y, X)

SELECT REGR_SYY(col1, col2) AS RSCol1_2
FROM Stats_Table;

RSCol1_2
_________
   2247.5
The REGR_SYY is the sum(Y^2) - sum(Y)^2/N ("sum of squares" of the dependent variable).
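Here the formula uses only the dependent variable, so the hand calculation touches only the first argument. As with the earlier sketches, the aliases Regr_Result and Manual_Result are invented for the illustration and the numeric cast is a precaution against integer division.

```sql
-- Sketch only: Regr_Result and Manual_Result are hypothetical aliases.
-- Y is the first argument (col1); col2 matters only for deciding
-- which rows qualify (both values must be non-null).
SELECT REGR_SYY(col1, col2) AS Regr_Result
      ,SUM(col1::numeric * col1)
         - SUM(col1::numeric) * SUM(col1::numeric) / COUNT(*)
                            AS Manual_Result
FROM Stats_Table
WHERE col1 IS NOT NULL
  AND col2 IS NOT NULL;
```

With col1 holding 1 through 30, sum(Y^2) is 9455 and sum(Y) is 465, so the manual column works out to 9455 - 7207.5 = 2247.5.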
A REGR_SYY Example

The REGR_SYY is the sum(Y^2) - sum(Y)^2/N ("sum of squares" of the dependent variable). Because col1 is the dependent variable (Y) in every call below, all five results are the same.
SELECT REGR_SYY(col1, col2) AS C1_2
      ,REGR_SYY(col1, col3) AS C1_3
      ,REGR_SYY(col1, col4) AS C1_4
      ,REGR_SYY(col1, col5) AS C1_5
      ,REGR_SYY(col1, col6) AS C1_6
FROM Stats_Table ;

C1_2     C1_3     C1_4     C1_5     C1_6
______   ______   ______   ______   ______
2247.5   2247.5   2247.5   2247.5   2247.5
Using GROUP BY

SELECT col3
      ,count(*)         AS Cnt
      ,avg(col1)        AS Avg1
      ,stddev_pop(col1) AS SD1
      ,var_pop(col1)    AS VP1
      ,avg(col4)        AS Avg4
      ,stddev_pop(col4) AS SD4
      ,var_pop(col4)    AS VP4
      ,avg(col6)        AS Avg6
      ,stddev_pop(col6) AS SD6
      ,var_pop(col6)    AS VP6
FROM Stats_Table
GROUP BY 1
ORDER BY 1;

Col3  Cnt   Avg1   SD1    VP1    Avg4   SD4    VP4     Avg6    SD6     VP6
____  ___  _____  ____  _____  _____  ____  _____  ______  _____  ______
   1    2   1.50  0.50   0.25  29.50  0.50   0.25    2.50   2.50    6.25
  10    7   6.00  2.00   4.00  25.00  2.00   4.00   24.29   8.63   74.49
  20   14  16.50  4.03  16.25  14.50  4.03  16.25   53.57  10.76  115.82
  30    2  24.50  0.50   0.25   6.50  0.50   0.25   75.00   5.00   25.00
  40    2  26.50  0.50   0.25   4.50  0.50   0.25   87.50   2.50    6.25
  50    2  28.50  0.50   0.25   2.50  0.50   0.25   92.50   2.50    6.25
  60    1  30.00  0.00   0.00   1.00  0.00   0.00  100.00   0.00    0.00