How to Think Like the SQL Server Engine
- 1. Welcome to the online class!
Slides, scripts, videos: BrentOzar.com/go/engine
Want to follow along? Download the small Stack Overflow 2010
database: BrentOzar.com/go/querystack
We’ll start at 5 minutes after the hour to give folks time to get
GoToWebinar working.
To chat with me & students: https://BrentOzar.com/slack in the
#BrentOzarUnlimited room. (Not GoToWebinar Q&A.)
- 2. How to Think Like
the SQL Server Engine
Brent Ozar, 2019/01/25
- 3. The best-in-class performance
monitoring tool, SQL Sentry is now
available in an edition that’s right-sized
for smaller environments.
SQL Sentry Essentials includes the core
features of SentryOne’s flagship
monitoring product and is perfect for
environments of up to five targets.
BrentOzar.com/go/sentryone
- 4. We’re using Stack Overflow data.
Open source, licensed with Creative Commons
SQL Server: BrentOzar.com/go/querystack
XML dump: archive.org/details/stackexchange
I’m using SQL Server 2019,
compatibility level 150/2019.
My cost threshold for parallelism is 5.
- 12. SET STATISTICS IO ON
Logical reads: the number of 8K pages we read.
(7,405 x 8KB = 59MB)
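A minimal sketch of taking this measurement yourself, assuming the small Stack Overflow database from this class. The baseline query here (a plain SELECT of Id from dbo.Users) is an assumption, and your page count will probably differ from 7,405.

```sql
-- Report logical reads in the Messages tab for every statement that follows.
SET STATISTICS IO ON;

-- Assumed baseline query: read the Id of every user.
SELECT Id
FROM dbo.Users;

-- Convert a logical read count into megabytes (each page is 8KB).
-- 7,405 is the figure from the slide; substitute your own number.
SELECT 7405 * 8 / 1024.0 AS ApproxMB;
```

Dividing by 1,024 gives roughly 58MB; the slide's 59MB comes from dividing by 1,000 instead. Either way, the point is that page counts translate directly into data volume.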
- 14. Let’s add a filter.
SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
- 15. Your execution plan:
1. Shuffle through all of the pages,
saying the Id of each record out loud,
if their LastAccessDate > ‘2014/07/01’.
- 18. Lesson:
Using WHERE
without a matching index
means scanning all the data.
(And there are some extra reads when queries go parallel, but more on that in
our more advanced classes.)
- 22. Let’s add a sort.
SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate
- 23. Your execution plan
1. Shuffle through all of the pages,
writing down fields __________ for each record,
if their LastAccessDate > ‘2014/07/01’.
2. Sort the matching records by LastAccessDate.
- 26. Order By:
Cost is up about 2x
We needed space to
write down our results,
so we got a memory grant
- 28. You can’t always get what you want.
Memory is set when the query
starts, and not revised.
SQL Server has to assume
other people will run queries at
the same time as you.
Your memory grant can change
with each time that you run a
query.
* - This screenshot is from a different query to show variances.
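One way to watch memory grants as they happen is the sys.dm_exec_query_memory_grants view. This is a sketch, assuming you have VIEW SERVER STATE permission and run it from a second session while the sorting query is still executing. On the small database the query may finish too quickly to catch, in which case the Memory Grant property on the actual execution plan shows the same information.

```sql
-- Run from a second session while the sorting query is executing.
SELECT session_id,
       requested_memory_kb,  -- what the query asked for
       granted_memory_kb,    -- what it actually received
       used_memory_kb,       -- what it is using right now
       max_used_memory_kb    -- the most it has used so far
FROM sys.dm_exec_query_memory_grants;
```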
- 30. Let’s get all the fields.
SELECT *
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate
- 31. Your execution plan
1. Shuffle through all of the pages,
writing down fields __________ for each record,
if their LastAccessDate > ‘2014/07/01’.
2. Sort the matching records by LastAccessDate.
- 33. But why does it suck?
Do we work harder to read the data?
Do we work harder to write the data?
Do we work harder to sort the data?
Do we work harder to output the data?
- 38. Let’s run it a few times.
SELECT *
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;
GO 100
- 39. Your execution plan
1. Shuffle through all of the pages,
writing down all the fields for each record,
if their LastAccessDate > ‘2014/07/01’.
2. Sort the matching records by LastAccessDate.
3. Keep the output so you could reuse it the next time
you saw this same query?
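SQL Server caches execution plans and data pages, not query output, so every one of the 100 runs reads and sorts the data again. A rough way to check, assuming VIEW SERVER STATE permission and that the plan is still in the cache:

```sql
-- execution_count climbs with each run, and total_logical_reads climbs with it,
-- which shows the work is repeated rather than served from a saved result.
SELECT TOP (10)
       qs.execution_count,
       qs.total_logical_reads,
       qs.total_worker_time,   -- CPU time, in microseconds
       st.text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE N'%LastAccessDate%'
ORDER BY qs.total_logical_reads DESC;
```

The LIKE filter is only a convenience for finding the query by its text. It will also match this diagnostic query itself once that has run.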
- 46. Let’s go simple again.
SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;
- 52. And less CPU, too.
SET STATISTICS TIME shows you
how much CPU time each query
burned up.
The index eliminates the sort, which
burned up our CPUs.
- 53. The index covers the fields
needed by the query,
so we call it a covering index.*
*But covering isn’t really a special kind of index –
it’s only covering when we’re talking about a query.
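A sketch of the index being described. The name IX_LastAccessDate_Id is the one the execution plan slides in this deck refer to; the exact definition is not shown here, so the key column order below is an assumption.

```sql
-- Assumed definition: LastAccessDate first so rows are stored in the order
-- the WHERE and ORDER BY need, then Id so the query never touches the table.
CREATE INDEX IX_LastAccessDate_Id
    ON dbo.Users (LastAccessDate, Id);

-- Measure both page reads and CPU time for the query below.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;
```

With the index in place, the plan should no longer need a sort operator, and both logical reads and CPU time should drop compared with the run before the index existed.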
- 56. You probably think “seek” means,
“I’m going to jump to a row and read that one row.”
You probably think “scan” means,
“I’m going to read the whole thing.”
- 58. SQL Server doesn’t know.
You and I know this means the whole table:
But SQL Server doesn’t, and can’t guarantee it
unless you tell it more about the data in the table,
like add a constraint.
(More on that in other classes.)
- 59. Seek means,
“I’m going to jump to a row and start reading.”
Scan means,
“I’m going to start at either end of the object
(might be either the start, or the end)
and start reading.”
Neither term defines
how many rows will be read.
- 61. Seeks vs scans
A seek can start at the first row,
and read the entire table.
A scan can start at one end of the table,
and only read a few pages.
We can’t just say, “All index seeks! We’re done.”
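Two queries that illustrate the point, assuming the IX_LastAccessDate_Id index exists on dbo.Users. The plan shapes described in the comments are what you would typically expect, not a guarantee.

```sql
SET STATISTICS IO ON;

-- Likely an index seek that reads every row:
-- every user has accessed the site since 1800.
SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '1800/01/01';

-- Likely a clustered index scan that reads only a few pages:
-- it can stop as soon as it has 10 rows.
SELECT TOP (10) *
FROM dbo.Users;
```

Compare the logical reads for each: the seek should report far more pages than the scan.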
- 63. Lessons we learned
SET STATISTICS IO ON: shows # of 8KB pages read
SET STATISTICS TIME ON: shows CPU work done
WHERE without a supporting index: table scan
ORDER BY without a supporting index: CPU work
Indexes reduce page reads and sorts
Seek != awesome, and scan != terribad
- 66. Let’s add a couple of fields.
SELECT Id, DisplayName, Age
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;
- 67. One execution plan
1. Grab IX_LastAccessDate_Id, seek to 2014/07/01.
2. Write down the Id and LastAccessDate of matching
records.
3. Grab the clustered index (white pages), and look up
each matching row by their Id to get DisplayName and
Age.
- 69. That’s why SQL includes the key
For simplicity, I said I created this index with the Id.
SQL Server always includes your clustering keys whether
you ask for ‘em or not because it has to join indexes
together.
- 70. Classic index
tuning sign
Key lookup is required when the
index doesn’t have all the fields we
need.
Hover your mouse over the key
lookup, look for the OUTPUT.
Small fields? Frequently used?
Add ‘em to the index.
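A sketch of what adding the fields to the index could look like for this query. The index name is hypothetical, and using INCLUDE rather than adding the fields as key columns is one option among several.

```sql
-- Hypothetical name. DisplayName and Age are stored at the leaf level only,
-- so the index can answer the query without a key lookup.
CREATE INDEX IX_LastAccessDate_Id_Includes
    ON dbo.Users (LastAccessDate, Id)
    INCLUDE (DisplayName, Age);
```

Included columns are not part of the sort order, which keeps the index narrower in its upper levels than it would be with four key columns.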
- 75. Statistics help SQL Server:
Decide which index to use
What order to process tables/indexes in
Whether to do seeks or scans
Guess how many rows will match your query
How much memory to allocate for the query
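To see the statistics SQL Server is working from, something like the following works, assuming the IX_LastAccessDate_Id index exists (creating an index also creates a statistic with the same name):

```sql
-- Header, density vector, and histogram for the statistic behind the index.
DBCC SHOW_STATISTICS ('dbo.Users', 'IX_LastAccessDate_Id');

-- When each statistic on the table was last updated,
-- and how many modifications have happened since.
SELECT s.name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.Users');
```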
- 80. Two ways you can help
1. Keep your stats updated at least weekly.
Automatic stats updates aren’t enough. Consider Ola
Hallengren’s free scripts: Ola.Hallengren.com
2. Learn which T-SQL elements will cause cardinality
estimation problems, ignoring statistics
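Both points can be sketched in plain T-SQL. The manual UPDATE STATISTICS here is a stand-in for a scheduled maintenance job, not a description of what Ola Hallengren's scripts do internally, and the YEAR() query is an illustration chosen for this example rather than one taken from the class.

```sql
-- 1. Refresh every statistic on the table by reading all of its rows.
UPDATE STATISTICS dbo.Users WITH FULLSCAN;

-- 2. Wrapping the column in a function tends to hide it from the histogram,
--    so the row estimate may be a guess...
SELECT Id
FROM dbo.Users
WHERE YEAR(LastAccessDate) = 2014;

-- ...while an equivalent range on the bare column can use the histogram.
SELECT Id
FROM dbo.Users
WHERE LastAccessDate >= '2014/01/01'
  AND LastAccessDate <  '2015/01/01';
```

Compare the estimated and actual row counts in the two execution plans to see how far apart the estimates land.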
- 82. Estimated 2,076 rows
Estimated 2 rows
Both produce the same 2,443 rows, but they
use 2 different ways to retrieve those rows due
to their different estimates.
- 83. The classic problem
SQL Server has to decide between:
• Scanning the entire table,
which is great for big data, or
• An index seek + key lookup,
which is better for small data
It bases this decision on
cardinality estimation – and it’s not perfect.
We can avoid this problem by
widening our nonclustered index.
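To see both sides of this decision on your own data, you can force each plan with an index hint and compare logical reads. This is an experiment only, assuming IX_LastAccessDate_Id exists; hints like these are not a recommendation for production code.

```sql
SET STATISTICS IO ON;

-- Force the index seek + key lookup plan.
SELECT Id, DisplayName, Age
FROM dbo.Users WITH (INDEX (IX_LastAccessDate_Id))
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;

-- Force the clustered index scan (index id 1 is the clustered index).
SELECT Id, DisplayName, Age
FROM dbo.Users WITH (INDEX (1))
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;
```

Moving the date earlier or later changes how many rows match, and at some point the cheaper plan flips from one to the other.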
- 85. Same query again
SELECT Id, DisplayName, Age
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate
- 88. Lessons we learned
Index seek + key lookup = we may need wider indexes
Statistics help SQL Server pick indexes, methods
Cardinality estimation isn’t perfect (especially with real-
world T-SQL and joins to multiple tables)
You can help by understanding SQL’s limitations
and crafting your T-SQL to avoid them
- 89. Your next steps
Full How to Think Like the Engine: free videos
Fundamentals of Index Tuning: 1-day online class
Mastering Index Tuning: 3-day online class
Learn more:
BrentOzar.com/go/engine
Editor's Notes
- 463x
- 463x
- I’ve changed two things about the query – I’m only selecting Id, but I want only the users who have accessed the site since July 1st.
So who in the room can describe to me how you’re going to deliver this data to me?