Welcome to the online class!
Slides, scripts, videos: BrentOzar.com/go/engine
Want to follow along? Download the small Stack Overflow 2010
database: BrentOzar.com/go/querystack
We’ll start at 5 minutes after the hour to give folks time to get
GoToWebinar working.
To chat with me & students: https://BrentOzar.com/slack in the
#BrentOzarUnlimited room. (Not GoToWebinar Q&A.)
How to Think Like
the SQL Server Engine
Brent Ozar, 2019/01/25
The best-in-class performance
monitoring tool, SQL Sentry is now
available in an edition that’s right-sized
for smaller environments.
SQL Sentry Essentials includes the core
features of SentryOne’s flagship
monitoring product and is perfect for
environments of up to five targets.
BrentOzar.com/go/sentryone
We’re using Stack Overflow data.
Open source, licensed with Creative Commons
SQL Server: BrentOzar.com/go/querystack
XML dump: archive.org/details/stackexchange
I’m using SQL Server 2019,
compatibility level 150/2019.
My cost threshold for parallelism is 5.
Page Header
Index OR
Data Rows
Slot Array
8KB
You: SQL Server.
Me: end user.
First query:
SELECT Id
FROM dbo.Users
Your execution plan:
1. Shuffle through all of the pages,
saying the Id of each record out loud.
SQL Server’s execution plan
SET STATISTICS IO ON
Logical reads: the number of 8K pages we read.
(7,405 x 8KB = 59MB)
That’s 15 reams.
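To try that measurement yourself, here is a minimal sketch. It assumes the Stack Overflow database from the links above is your current database; your read count will probably differ from 7,405 depending on which copy you downloaded. The second statement is just the slide's arithmetic.

```sql
-- Report page reads for each statement in the Messages tab.
SET STATISTICS IO ON;

SELECT Id
FROM dbo.Users;

SET STATISTICS IO OFF;

-- The slide's arithmetic: pages read times 8KB per page.
SELECT 7405 * 8          AS KBRead,        -- 59,240 KB, the slide's "59MB" at 1,000 KB per MB
       7405 * 8 / 1024.0 AS MBReadBinary,  -- a little under 58 at 1,024 KB per MB
       7405 / 500.0      AS ReamsOfPaper;  -- roughly 15, at 500 sheets per ream
```

The number to watch in the Messages tab is "logical reads" on the line for table 'Users'.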
Let’s add a filter.
SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
Your execution plan:
1. Shuffle through all of the pages,
saying the Id of each record out loud,
if their LastAccessDate > ‘2014/07/01’.
SQL Server’s execution plan
Lesson:
Using WHERE
without a matching index
means scanning all the data.
(And there are some extra reads when queries go parallel, but more on that in
our more advanced classes.)
Lesson:
Estimated Subtree Cost is a rough measure
of CPU and IO work required for a query.
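One way to see that cost without running the query is to ask for the estimated plan as XML. This is a sketch, assuming you are in SSMS or another client that understands GO as a batch separator:

```sql
-- Return the estimated plan instead of executing the query.
-- SET SHOWPLAN_XML has to be the only statement in its batch, hence the GO lines.
SET SHOWPLAN_XML ON;
GO

SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01';
GO

-- Go back to actually executing queries.
SET SHOWPLAN_XML OFF;
GO
```

In the returned XML, the StatementSubTreeCost attribute holds the same number you see when you hover over the leftmost operator of the graphical plan.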
Let’s add a sort.
SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate
Your execution plan
1. Shuffle through all of the pages,
writing down fields __________ for each record,
if their LastAccessDate > ‘2014/07/01’.
2. Sort the matching records by LastAccessDate.
SQL Server’s execution plan
Order By:
Cost is up about 2x
We needed space to
write down our results,
so we got a memory grant
You can see more in Properties
You can’t always get what you want.
Memory is set when the query
starts, and not revised.
SQL Server has to assume
other people will run queries at
the same time as you.
Your memory grant can change
with each time that you run a
query.
* - This screenshot is from a different query to show variances.
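Besides the Properties window, you can watch memory grants from a second session while a query runs. This is a sketch rather than the method used in the screenshots; on a small table the query may finish before you catch it, in which case this returns no rows.

```sql
-- Run this from a second session while the ORDER BY query is still executing.
SELECT session_id,
       requested_memory_kb,  -- what the query asked for
       granted_memory_kb,    -- what it was actually given
       used_memory_kb,       -- what it is using right now
       max_used_memory_kb    -- the most it has used so far
FROM sys.dm_exec_query_memory_grants;
```

Comparing granted_memory_kb to max_used_memory_kb shows how far the up-front grant was from what the query really needed.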
And if you run out of memory…
Let’s get all the fields.
SELECT *
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate
Your execution plan
1. Shuffle through all of the pages,
writing down fields __________ for each record,
if their LastAccessDate > ‘2014/07/01’.
2. Sort the matching records by LastAccessDate.
That query sucks.
But why does it suck?
Do we work harder to read the data?
Do we work harder to write the data?
Do we work harder to sort the data?
Do we work harder to output the data?
The sort cost is
now 97%...
Of a MUCH larger
overall cost.
            SELECT ID   SELECT *
No order    6           6
ORDER BY    13          871
Lesson:
Sorting data is expensive, and more fields
make it worse.
Let’s run it a few times.
SELECT *
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;
GO 100
Your execution plan
1. Shuffle through all of the pages,
writing down all the fields for each record,
if their LastAccessDate > ‘2014/07/01’.
2. Sort the matching records by LastAccessDate.
3. Keep the output so you could reuse it the next time
you saw this same query?
Oracle can.
(One of the reasons it costs $47,000 per core.)
Another reason
SQL Server reads & sorts 100 times.
Lesson:
SQL Server caches data pages, not query
output.
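You can check this for yourself with a rough sketch like the one below. It assumes the server has enough memory to keep the Users table cached between the two runs, and that you have permission to read the buffer pool DMV.

```sql
SET STATISTICS IO ON;

-- Run the same query twice in a row.
SELECT *
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;

SELECT *
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;

SET STATISTICS IO OFF;

-- Count the 8KB pages of the current database that are sitting in memory right now.
SELECT COUNT(*)            AS CachedPages,
       COUNT(*) * 8 / 1024 AS CachedMB
FROM sys.dm_os_buffer_descriptors
WHERE database_id = DB_ID();
```

The second run should report the same logical reads as the first: the pages came from memory, but SQL Server still read and sorted them all over again.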
So how do we
make this fast?
Nonclustered indexes: copies.
Stored in order we want, include the fields we want
CREATE INDEX
IX_LastAccessDate_Id
ON dbo.Users(LastAccessDate, Id)
Leaf pages
(we’re focusing on these)
“Index” pages
(but exist for both clustered and
nonclustered indexes)
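To see how many pages sit at each level of each index, here is a sketch using a built-in DMV. It assumes the index above has been created; note that DETAILED mode reads every page of every index on the table, so save it for small tables or quiet servers.

```sql
SELECT i.name AS IndexName,
       ps.index_level,   -- 0 = leaf level; higher numbers are the upper "index" pages
       ps.page_count,
       ps.record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.Users'), NULL, NULL, 'DETAILED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
ORDER BY i.index_id, ps.index_level;
```

Nearly all of the pages should be at level 0, which is why the leaf pages are the ones to focus on.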
Let’s go simple again.
SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;
Your execution plan
1. Grab IX_LastAccessDate_Id and seek to 2014/07/01.
2. Read the Id’s out in order.
The new plan uses the index
And it’s CHEAP.
                       SELECT ID   SELECT *
No order               6           6
ORDER BY               13          871
ORDER BY, with index   <1          48
Why cheaper?
For starters, it does less
logical reads…
And less CPU, too.
SET STATISTICS TIME shows you
how much CPU time each query
burned up.
The index eliminates the sort, which
burned up our CPUs.
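A sketch of that comparison is below. It assumes IX_LastAccessDate_Id exists and that dbo.Users has a clustered index (the white pages); the INDEX(1) hint in the second query forces SQL Server back onto the clustered index so you can see the sort again.

```sql
SET STATISTICS TIME ON;

-- With the nonclustered index available: no sort needed.
SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;

-- Forced onto the clustered index: scan everything, then sort.
SELECT Id
FROM dbo.Users WITH (INDEX(1))
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;

SET STATISTICS TIME OFF;
```

Look for the "SQL Server Execution Times" lines in the Messages tab and compare the CPU time for the two queries.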
The index covers the fields
needed by the query,
so we call it a covering index.*
*But covering isn’t really a special kind of index;
it’s only covering when we’re talking about a query.
So nonclustered index seeks are
great, right?
“Seek” sounds small, right?
But that’s a lot of data.
You probably think “seek” means,
“I’m going to jump to a row and read that one row.”
You probably think “scan” means,
“I’m going to read the whole thing.”
Note that date
“Seek” = read all rows
SQL Server doesn’t know.
You and I know this means the whole table:
But SQL Server doesn’t, and can’t guarantee it
unless you tell it more about the data in the table,
like add a constraint.
(More on that in other classes.)
Seek means,
“I’m going to jump to a row and start reading.”
Scan means,
“I’m going to start at either end of the object
(might be either the start, or the end)
and start reading.”
Neither term defines
how many rows will be read.
A scan that reads a few rows:
Seeks vs scans
A seek can start at the first row,
and read the entire table.
A scan can start at one end of the table,
and only read a few pages.
We can’t just say, “All index seeks! We’re done.”
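Here is a sketch of both cases. The 1800 date and the TOP 10 are my own picks for illustration, not necessarily the queries in the screenshots, and the plan shapes you get depend on your indexes and statistics.

```sql
SET STATISTICS IO ON;

-- A "seek" that reads everything: if every LastAccessDate is later than 1800,
-- the seek lands on the first row of the index and reads to the end.
SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '1800/01/01'
ORDER BY LastAccessDate;

-- A "scan" that reads very little: start at one end of an index
-- and stop as soon as 10 rows have been returned.
SELECT TOP 10 Id
FROM dbo.Users;

SET STATISTICS IO OFF;
```

Compare the operator names in the two plans with the logical reads in the Messages tab. The plan tells you where the reading starts; the reads tell you how much work was done.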
Recap (so far)
Lessons we learned
SET STATISTICS IO ON: shows # of 8KB pages read
SET STATISTICS TIME ON: shows CPU work done
WHERE without a supporting index: table scan
ORDER BY without a supporting index: CPU work
Indexes reduce page reads and sorts
Seek != awesome, and scan != terribad
Key Lookups and
Cardinality Estimation
Let’s add a couple of fields.
SELECT Id, DisplayName, Age
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;
One execution plan
1. Grab IX_LastAccessDate_Id, seek to 2014/07/01.
2. Write down the Id and LastAccessDate of matching
records.
3. Grab the clustered index (white pages), and look up
each matching row by their Id to get DisplayName and
Age.
Sometimes that
happens, yes.
That’s why SQL includes the key
For simplicity, I said I created this index with the Id.
SQL Server always includes your clustering keys whether
you ask for ‘em or not because it has to join indexes
together.
Classic index
tuning sign
Key lookup is required when the
index doesn’t have all the fields we
need.
Hover your mouse over the key
lookup, look for the OUTPUT.
Small fields? Frequently used?
Add ‘em to the index.
Sometimes.
But sometimes this happens.
Lesson:
Even with indexes,
there’s a tipping point where it’s more efficient
for SQL to just scan the table once and get
out.
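To see the tipping point for yourself, one approach is to run the query once as-is and once with an index hint that forces the seek plus key lookup. This is a sketch assuming IX_LastAccessDate_Id is in place; which plan SQL Server picks on its own depends on your data and statistics.

```sql
SET STATISTICS IO ON;

-- Let SQL Server choose between scanning the table and seek + key lookup.
SELECT Id, DisplayName, Age
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;

-- Force the nonclustered index, and with it a key lookup for every matching row.
SELECT Id, DisplayName, Age
FROM dbo.Users WITH (INDEX = IX_LastAccessDate_Id)
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;

SET STATISTICS IO OFF;
```

If the forced version reports far more logical reads than the table has pages, you are past the tipping point: each lookup costs a few reads, and they add up fast.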
Enter statistics.
Statistics help SQL Server:
Decide which index to use
What order to process tables/indexes in
Whether to do seeks or scans
Guess how many rows will match your query
How much memory to allocate for the query
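To look at the statistics SQL Server keeps for our index, here is a minimal sketch. It assumes the IX_LastAccessDate_Id index from earlier exists; creating an index creates a statistic with the same name.

```sql
-- Header, density vector, and histogram for the index's statistics.
DBCC SHOW_STATISTICS ('dbo.Users', 'IX_LastAccessDate_Id');

-- Just the histogram: up to 200 steps describing how the dates are distributed.
DBCC SHOW_STATISTICS ('dbo.Users', 'IX_LastAccessDate_Id') WITH HISTOGRAM;
```

In the histogram, RANGE_HI_KEY is the top value of each step, EQ_ROWS is how many rows equal that value, and RANGE_ROWS is how many rows fall between it and the previous step. Row estimates come from adding up the steps that match your WHERE clause.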
WHERE LastAccessDate
> '2014/07/01'
Add it up
Examples of varchar & int stats
Two ways you can help
1. Keep your stats updated at least weekly.
Automatic stats updates aren’t enough. Consider Ola
Hallengren’s free scripts: Ola.Hallengren.com
2. Learn which T-SQL elements will cause cardinality
estimation problems, ignoring statistics
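For the first point, here is a sketch of how to check how stale your stats are and refresh them by hand. It is not a replacement for a scheduled maintenance job, and FULLSCAN reads the whole table, so be careful on big ones.

```sql
-- When was each statistic on dbo.Users last updated, and how much has changed since?
SELECT s.name AS StatName,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter  -- changes to the leading column since the last update
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.Users');

-- Refresh every statistic on the table by reading all of its rows.
UPDATE STATISTICS dbo.Users WITH FULLSCAN;
```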
One idea, written differently
Estimated 2,076 rows
Estimated 2 rows
Both produce the same 2,443 rows, but they
use 2 different ways to retrieve those rows due
to their different estimates.
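As an illustration only (this pair is my own example, not necessarily the two queries in the screenshots), here is one idea written two ways. It assumes LastAccessDate is a datetime column; your estimates will not match the numbers above.

```sql
-- Version 1: compare the column directly to a date range.
SELECT Id, DisplayName, Age
FROM dbo.Users
WHERE LastAccessDate >= '2014/07/01'
  AND LastAccessDate <  '2014/07/02';

-- Version 2: the same day, but with the column wrapped in a function.
SELECT Id, DisplayName, Age
FROM dbo.Users
WHERE CAST(LastAccessDate AS DATE) = '2014/07/01';
```

Both return the same rows. Hover over the first operator in each plan and compare Estimated Number of Rows to the actual count: when the two versions get different estimates, they can end up with different plans.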
The classic problem
SQL Server has to decide between:
• Scanning the entire table,
which is great for big data, or
• An index seek + key lookup,
which is better for small data
It bases this decision on
cardinality estimation – and it’s not perfect.
We can avoid this problem by
widening our nonclustered index.
CREATE INDEX IX_LastAccessDate_Id_DisplayName_Age
ON dbo.Users (LastAccessDate, Id, DisplayName, Age)
Or:
CREATE INDEX IX_LastAccessDate_Id_Includes
ON dbo.Users (LastAccessDate, Id)
INCLUDE (DisplayName, Age)
Same query again
SELECT Id, DisplayName, Age
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate
Yay! Back to a single operator.
Recap (so far)
Lessons we learned
Index seek + key lookup = we may need wider indexes
Statistics help SQL Server pick indexes, methods
Cardinality estimation isn’t perfect (especially with
real-world T-SQL and joins to multiple tables)
You can help by understanding SQL’s limitations
and crafting your T-SQL to avoid them
Your next steps
Full How to Think Like the Engine: free videos
Fundamentals of Index Tuning: 1-day online class
Mastering Index Tuning: 3-day online class
Learn more:
BrentOzar.com/go/engine

Editor's Notes

  1. 463x
  2. 463x
  3. I’ve changed two things about the query – I’m only selecting Id, but I want only the users who have accessed the site since July 1st. So who in the room can describe to me how you’re going to deliver this data to me?