SQL Server Analytical Toolkit: Using Windowing, Analytical, Ranking, and Aggregate Functions for Data and Statistical Analysis 9781484286661, 9781484286678

Learn window function foundational concepts through a cookbook-style approach, beginning with an introduction to the OVE

213 67 85MB

English Pages 1069 Year 2023

Report DMCA / Copyright

DOWNLOAD FILE

SQL Server Analytical Toolkit: Using Windowing, Analytical, Ranking, and Aggregate Functions for Data and Statistical Analysis
 9781484286661, 9781484286678

Table of contents :
Table of Contents
About the Author
About the Technical Reviewer
Introduction
Chapter 1: Partitions, Frames, and the OVER() Clause
What Are Partitions and Window Frames?
What Is an OVER() Clause?
History of the OVER() Clause and Window Functions
The Window Functions
The OVER() Clause
Syntax
Partitions and Frames
ROWS Frame Definition
RANGE Frame Definition
Example 1
ROWS and RANGE Default Behavior
Scenario 1
Scenario 2
ROWS and RANGE Window Frame Examples
Data Set
Example 2
Example 3
Example 4
Example 5
Summary
Chapter 2: Sales DW Use Case: Aggregate Functions
Sales Data Warehouse
Sales Data Warehouse Conceptual Model
A Word About Performance Tuning
Aggregate Functions
COUNT(), MAX(), MIN(), AVG(), and SUM() Functions
With OVER()
GROUPING() Function
GROUPING: Performance Tuning Considerations
STRING_AGG Function
STDEV() and STDEVP() Functions
STDEV: Performance Tuning Considerations
VAR() and VARP() Functions
SQL Server 2022: Named Window Example
Summary
Chapter 3: Sales Use Case: Analytical Functions
Analytical Functions
CUME_DIST() Function
Performance Considerations
PERCENT_RANK() Function
Performance Considerations
High-Performance Strategy
LAST_VALUE() and FIRST_VALUE()
Performance Considerations
LAG() and LEAD()
Performance Considerations
PERCENTILE_CONT() and PERCENTILE_DISC()
Performance Considerations
Using a Report Table
Summary
Chapter 4: Sales Use Case: Ranking/Window Functions
Ranking/Window Functions
RANK() vs. PERCENT_RANK()
Performance Considerations
RANK() vs. DENSE_RANK()
Performance Considerations
NTILE() Function Revisited
Performance Considerations
ROW_NUMBER() Function
Performance Considerations
Islands and Gaps Example
Summary
Chapter 5: Finance Use Case: Aggregate Functions
Aggregate Functions
COUNT() and SUM() Functions
Performance Considerations
SUM() Function
Performance Considerations
MIN() and MAX() Functions
Performance Considerations
AVG() Function
Performance Considerations
GROUPING Function
Performance Considerations
STRING_AGG() Function
STDEV() and STDEVP() Functions
Performance Considerations
VAR() and VARP() Functions
Ticker Analysis
More Non-statistical Variance
Even More Statistical Variance
Summary
Chapter 6: Finance Use Case: Ranking Functions
Ranking Functions
RANK() Function
Example 1
Performance Considerations
Example 2
Performance Considerations
DENSE_RANK() Function
Example 1
Performance Considerations
Example 2
Performance Considerations
NTILE() Function
Example 1
Performance Considerations
Example 2
Performance Considerations
ROW_NUMBER() Function
Performance Considerations
The Data Gaps and Islands Problem
Step 1: Create the First CTE
Step 2: Set Up the Second CTE to Label Gaps
Step 3: Set Up the Third CTE and Identify Start/Stop Dates of Gaps
Step 4: Generate the Report
Performance Considerations
Islands Next
Step 1: Create the First CTE Using LAG() and LEAD()
Step 2: Create the Second CTE That Labels Islands and Gaps
Step 3: Identify Island Start/Stop Dates
Step 4: Create the Final Report
Summary
Chapter 7: Finance Use Case: Analytical Functions
Analytical Functions
CUME_DIST() Function
Performance Considerations
FIRST_VALUE() and LAST_VALUE() Functions
Performance Considerations
LAG() and LEAD() Functions
LAG() Function
Performance Considerations
LEAD() Function
Performance Considerations
Memory-Optimized Strategy
PERCENT_RANK() Function
Performance Considerations
PERCENTILE_CONT() and PERCENTILE_DISC()
PERCENTILE_CONT()
Performance Considerations
PERCENTILE_DISC Function
Performance Considerations
Multi-memory-enhanced-table Strategy
Performance Considerations
Summary
Chapter 8: Plant Use Case: Aggregate Functions
Aggregate Functions
Data Model
Data Dictionaries
Entity Data Dictionary
Entity Attribute Data Dictionary
Entity Relationship Data Dictionary: Equipment Failure Subject Area
Entity Relationship Data Dictionary: Equipment Status History
Entity Relationship Data Dictionary: Plant Expense
COUNT() Function
AVG() Function
MIN() and MAX() Functions
GROUPING() Function
STRING_AGG() Function
STDEV() and STDEVP() Functions
VAR() and VARP() Functions
Example 1: Rolling Variance
Example 2: Variance by Quarter
Example 3: Variance by Year
Performance Considerations
Memory-Optimized Table Approach
Create a File and File Group
Create the Memory-Optimized Table
Load the Memory-Optimized Table
Estimated Query Plan
Seven-Million-Row Query: Performance Tuning
Summary
Chapter 9: Plant Use Case: Ranking Functions
Ranking Functions
RANK() Function
Performance Considerations
Performance Considerations
Performance Considerations
DENSE_RANK() Function
Performance Considerations
NTILE Function
Performance Considerations
ROW_NUMBER() Function
Performance Considerations
Summary
Chapter 10: Plant Use Case: Analytical Functions
Analytical Functions
CUME_DIST() Function
Performance Considerations
FIRST_VALUE() and LAST_VALUE() Functions
Performance Considerations
LAG() Function
Performance Considerations
LEAD() Function
Performance Considerations
PERCENT_RANK() Function
Performance Considerations
PERCENTILE_CONT Function
Performance Considerations
PERCENTILE_DISC() Function
Performance Considerations
Our Usual Report Table Solution
SQL Server Analysis Services
Summary
Chapter 11: Inventory Use Case: Aggregate Functions
The Inventory Database
The Inventory Data Warehouse
Loading the Inventory Data Warehouse
Aggregate Functions
COUNT(), SUM(), MAX(), MIN(), and AVG() Functions
Performance Considerations
AVG() Function
Performance Considerations
Data Warehouse Query
Performance Considerations
STDEV() Function
Performance Considerations
Data Warehouse Query
Performance Considerations
VAR() Function
Performance Considerations
Enhancing the SSIS Package
Summary
Chapter 12: Inventory Use Case: Ranking Functions
Ranking Functions
RANK() Function
Performance Considerations
Querying the Data Warehouse
DENSE_RANK() Function
Performance Considerations
NTILE() Function
Performance Considerations
ROW_NUMBER() Function
Performance Considerations
Create an SSRS Report
Report Builder Mini Tutorial
Create a Power BI Report
Summary
Chapter 13: Inventory Use Case: Analytical Functions
Analytical Functions
CUME_DIST() Function
Performance Considerations
FIRST_VALUE() and LAST_VALUE() Functions
Performance Considerations
LAG() Function
Performance Considerations
LEAD() Function
Performance Considerations
PERCENT_RANK() Function
Performance Considerations
PERCENTILE_CONT() Function
Performance Considerations
PERCENTILE_DISC() Function
Performance Considerations
Overall Performance Considerations
Report Builder Examples
Summary
Chapter 14: Summary, Conclusions, and Next Steps
Summary
Our Journey
About the Code
About the Database Folders
Data Used in the Examples
The Toolkit
SQL Server
SSMS
The Window Functions
The Visual Studio Community License
SSAS Projects
SSIS Projects
Power BI Web Scorecards, Dashboards, and Reports
Microsoft Excel Spreadsheets
SSAS Server
SSRS Server and Website
Report Builder
Performance Analysis Tools
Estimated Query Plans
Live Query Plans
DBCC
IO and TIME Statistics
STATISTICS PROFILE
Where to Get the Tools
SQL Server Developer
Visual Studio Community
SQL Server Data Tools
SQL Server SSAS Project Support
SQL Server SSIS Project Support
SQL Server SSRS Project Support
Report Builder
Power BI Desktop
Power BI Server
Microsoft Excel
SSMS
Next Steps
Thank You!
Appendix A: Function Syntax, Descriptions
The Window Frame Specifications
ROWS and RANGE Default Behavior
Scenario 1
Scenario 2
The Aggregate Functions
COUNT()
COUNT_BIG()
SUM()
MAX()
MIN()
AVG()
GROUPING()
STRING_AGG()
STDEV()
STDEVP()
VAR()
VARP()
The Analytical Functions
CUME_DIST()
FIRST_VALUE()
LAST_VALUE()
LAG()
LEAD()
PERCENT_RANK()
PERCENTILE_CONT()
PERCENTILE_DISC()
The Window/Ranking Functions
RANK()
DENSE_RANK()
NTILE()
ROW_NUMBER()
Appendix B: Statistical Functions
Standard Deviation
Variance
Normal Distribution
Mean (Average)
Median
Mode
Geometric Mean
Harmonic Mean
Weighted Mean (Average)
Summary
Index
df-Capture.PNG

Polecaj historie