Why optimiser choose Clustered Index + Sort instead of Non-Clustered Index
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty{ margin-bottom:0;
}
up vote
4
down vote
favorite
Given the next example:
IF OBJECT_ID('dbo.my_table') IS NOT NULL
DROP TABLE [dbo].[my_table];
GO
CREATE TABLE [dbo].[my_table]
(
[id] int IDENTITY (1,1) NOT NULL PRIMARY KEY,
[foo] int NULL,
[bar] int NULL,
[nki] int NOT NULL
);
GO
/* Insert some random data */
INSERT INTO [dbo].[my_table] (foo, bar, nki)
SELECT TOP (100000)
ABS(CHECKSUM(NewId())) % 14,
ABS(CHECKSUM(NewId())) % 20,
n = CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id]))
FROM
sys.all_objects AS s1
CROSS JOIN
sys.all_objects AS s2
GO
CREATE UNIQUE NONCLUSTERED INDEX [IX_my_table]
ON [dbo].[my_table] ([nki] ASC);
GO
If I fetch all records ordered by [nki]
(Non-clustered index):
SET STATISTICS TIME ON;
SELECT id, foo, bar, nki FROM my_table ORDER BY nki;
SET STATISTICS TIME OFF;
SQL Server Execution Times: CPU time = 266 ms, elapsed time = 493 ms
Optimiser choose the clustered index and then applies a Sort algorithm.
Execution plan
But if I force it to use the non-clustered index:
SET STATISTICS TIME ON;
SELECT id, foo, bar, nki FROM my_table WITH(INDEX(IX_my_TABLE));
SET STATISTICS TIME OFF;
SQL Server Execution Times: CPU time = 311 ms, elapsed time = 188 ms
Then it uses non-clustered index with a Key Lookup:
Execution plan
Obviously if the non-clustered index is transformed into a covering index:
CREATE UNIQUE NONCLUSTERED INDEX [IX_my_table]
ON [dbo].[my_table] ([nki] ASC)
INCLUDE (id, foo, bar);
GO
Then it uses only this index:
SET STATISTICS TIME ON;
SELECT id, foo, bar, nki FROM my_table ORDER BY nki;
SET STATISTICS TIME OFF;
SQL Server Execution Times: CPU time = 32 ms, elapsed time = 106 ms
Execution plan
Question
- Why SQL Server use the clustered index plus a sort algorithm instead of non-clustered index even if execution time is 38% faster?
sql-server sql-server-2012 nonclustered-index
add a comment |
up vote
4
down vote
favorite
Given the next example:
IF OBJECT_ID('dbo.my_table') IS NOT NULL
DROP TABLE [dbo].[my_table];
GO
CREATE TABLE [dbo].[my_table]
(
[id] int IDENTITY (1,1) NOT NULL PRIMARY KEY,
[foo] int NULL,
[bar] int NULL,
[nki] int NOT NULL
);
GO
/* Insert some random data */
INSERT INTO [dbo].[my_table] (foo, bar, nki)
SELECT TOP (100000)
ABS(CHECKSUM(NewId())) % 14,
ABS(CHECKSUM(NewId())) % 20,
n = CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id]))
FROM
sys.all_objects AS s1
CROSS JOIN
sys.all_objects AS s2
GO
CREATE UNIQUE NONCLUSTERED INDEX [IX_my_table]
ON [dbo].[my_table] ([nki] ASC);
GO
If I fetch all records ordered by [nki]
(Non-clustered index):
SET STATISTICS TIME ON;
SELECT id, foo, bar, nki FROM my_table ORDER BY nki;
SET STATISTICS TIME OFF;
SQL Server Execution Times: CPU time = 266 ms, elapsed time = 493 ms
Optimiser choose the clustered index and then applies a Sort algorithm.
Execution plan
But if I force it to use the non-clustered index:
SET STATISTICS TIME ON;
SELECT id, foo, bar, nki FROM my_table WITH(INDEX(IX_my_TABLE));
SET STATISTICS TIME OFF;
SQL Server Execution Times: CPU time = 311 ms, elapsed time = 188 ms
Then it uses non-clustered index with a Key Lookup:
Execution plan
Obviously if the non-clustered index is transformed into a covering index:
CREATE UNIQUE NONCLUSTERED INDEX [IX_my_table]
ON [dbo].[my_table] ([nki] ASC)
INCLUDE (id, foo, bar);
GO
Then it uses only this index:
SET STATISTICS TIME ON;
SELECT id, foo, bar, nki FROM my_table ORDER BY nki;
SET STATISTICS TIME OFF;
SQL Server Execution Times: CPU time = 32 ms, elapsed time = 106 ms
Execution plan
Question
- Why SQL Server use the clustered index plus a sort algorithm instead of non-clustered index even if execution time is 38% faster?
sql-server sql-server-2012 nonclustered-index
1
Sounds like you should review Kimberly Tripp's article about the tipping point: sqlskills.com/blogs/kimberly/the-tipping-point-query-answers More specifically the section titled "When does the tipping point occur?" I think she does a good job of covering when/why this can happen.
– John Eisbrener
Nov 20 at 15:47
1
The optimizer draws up the execution plan based on the information it has at compile time - it's not possible at this point to make decisions based on execution time (SQL Server is adding features that learn, like adaptive joins and adaptive memory grant feedback, but choosing between a seek and a scan based on duration is probably a long way off).
– Aaron Bertrand♦
Nov 20 at 19:36
1
Did you mean to leave off the ORDER BY in your forced-index query?
– Forrest
Nov 20 at 20:46
add a comment |
up vote
4
down vote
favorite
up vote
4
down vote
favorite
Given the next example:
IF OBJECT_ID('dbo.my_table') IS NOT NULL
DROP TABLE [dbo].[my_table];
GO
CREATE TABLE [dbo].[my_table]
(
[id] int IDENTITY (1,1) NOT NULL PRIMARY KEY,
[foo] int NULL,
[bar] int NULL,
[nki] int NOT NULL
);
GO
/* Insert some random data */
INSERT INTO [dbo].[my_table] (foo, bar, nki)
SELECT TOP (100000)
ABS(CHECKSUM(NewId())) % 14,
ABS(CHECKSUM(NewId())) % 20,
n = CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id]))
FROM
sys.all_objects AS s1
CROSS JOIN
sys.all_objects AS s2
GO
CREATE UNIQUE NONCLUSTERED INDEX [IX_my_table]
ON [dbo].[my_table] ([nki] ASC);
GO
If I fetch all records ordered by [nki]
(Non-clustered index):
SET STATISTICS TIME ON;
SELECT id, foo, bar, nki FROM my_table ORDER BY nki;
SET STATISTICS TIME OFF;
SQL Server Execution Times: CPU time = 266 ms, elapsed time = 493 ms
Optimiser choose the clustered index and then applies a Sort algorithm.
Execution plan
But if I force it to use the non-clustered index:
SET STATISTICS TIME ON;
SELECT id, foo, bar, nki FROM my_table WITH(INDEX(IX_my_TABLE));
SET STATISTICS TIME OFF;
SQL Server Execution Times: CPU time = 311 ms, elapsed time = 188 ms
Then it uses non-clustered index with a Key Lookup:
Execution plan
Obviously if the non-clustered index is transformed into a covering index:
CREATE UNIQUE NONCLUSTERED INDEX [IX_my_table]
ON [dbo].[my_table] ([nki] ASC)
INCLUDE (id, foo, bar);
GO
Then it uses only this index:
SET STATISTICS TIME ON;
SELECT id, foo, bar, nki FROM my_table ORDER BY nki;
SET STATISTICS TIME OFF;
SQL Server Execution Times: CPU time = 32 ms, elapsed time = 106 ms
Execution plan
Question
- Why SQL Server use the clustered index plus a sort algorithm instead of non-clustered index even if execution time is 38% faster?
sql-server sql-server-2012 nonclustered-index
Given the next example:
IF OBJECT_ID('dbo.my_table') IS NOT NULL
DROP TABLE [dbo].[my_table];
GO
CREATE TABLE [dbo].[my_table]
(
[id] int IDENTITY (1,1) NOT NULL PRIMARY KEY,
[foo] int NULL,
[bar] int NULL,
[nki] int NOT NULL
);
GO
/* Insert some random data */
INSERT INTO [dbo].[my_table] (foo, bar, nki)
SELECT TOP (100000)
ABS(CHECKSUM(NewId())) % 14,
ABS(CHECKSUM(NewId())) % 20,
n = CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id]))
FROM
sys.all_objects AS s1
CROSS JOIN
sys.all_objects AS s2
GO
CREATE UNIQUE NONCLUSTERED INDEX [IX_my_table]
ON [dbo].[my_table] ([nki] ASC);
GO
If I fetch all records ordered by [nki]
(Non-clustered index):
SET STATISTICS TIME ON;
SELECT id, foo, bar, nki FROM my_table ORDER BY nki;
SET STATISTICS TIME OFF;
SQL Server Execution Times: CPU time = 266 ms, elapsed time = 493 ms
Optimiser choose the clustered index and then applies a Sort algorithm.
Execution plan
But if I force it to use the non-clustered index:
SET STATISTICS TIME ON;
SELECT id, foo, bar, nki FROM my_table WITH(INDEX(IX_my_TABLE));
SET STATISTICS TIME OFF;
SQL Server Execution Times: CPU time = 311 ms, elapsed time = 188 ms
Then it uses non-clustered index with a Key Lookup:
Execution plan
Obviously if the non-clustered index is transformed into a covering index:
CREATE UNIQUE NONCLUSTERED INDEX [IX_my_table]
ON [dbo].[my_table] ([nki] ASC)
INCLUDE (id, foo, bar);
GO
Then it uses only this index:
SET STATISTICS TIME ON;
SELECT id, foo, bar, nki FROM my_table ORDER BY nki;
SET STATISTICS TIME OFF;
SQL Server Execution Times: CPU time = 32 ms, elapsed time = 106 ms
Execution plan
Question
- Why SQL Server use the clustered index plus a sort algorithm instead of non-clustered index even if execution time is 38% faster?
sql-server sql-server-2012 nonclustered-index
sql-server sql-server-2012 nonclustered-index
asked Nov 20 at 15:42
McNets
13.7k41753
13.7k41753
1
Sounds like you should review Kimberly Tripp's article about the tipping point: sqlskills.com/blogs/kimberly/the-tipping-point-query-answers More specifically the section titled "When does the tipping point occur?" I think she does a good job of covering when/why this can happen.
– John Eisbrener
Nov 20 at 15:47
1
The optimizer draws up the execution plan based on the information it has at compile time - it's not possible at this point to make decisions based on execution time (SQL Server is adding features that learn, like adaptive joins and adaptive memory grant feedback, but choosing between a seek and a scan based on duration is probably a long way off).
– Aaron Bertrand♦
Nov 20 at 19:36
1
Did you mean to leave off the ORDER BY in your forced-index query?
– Forrest
Nov 20 at 20:46
add a comment |
1
Sounds like you should review Kimberly Tripp's article about the tipping point: sqlskills.com/blogs/kimberly/the-tipping-point-query-answers More specifically the section titled "When does the tipping point occur?" I think she does a good job of covering when/why this can happen.
– John Eisbrener
Nov 20 at 15:47
1
The optimizer draws up the execution plan based on the information it has at compile time - it's not possible at this point to make decisions based on execution time (SQL Server is adding features that learn, like adaptive joins and adaptive memory grant feedback, but choosing between a seek and a scan based on duration is probably a long way off).
– Aaron Bertrand♦
Nov 20 at 19:36
1
Did you mean to leave off the ORDER BY in your forced-index query?
– Forrest
Nov 20 at 20:46
1
1
Sounds like you should review Kimberly Tripp's article about the tipping point: sqlskills.com/blogs/kimberly/the-tipping-point-query-answers More specifically the section titled "When does the tipping point occur?" I think she does a good job of covering when/why this can happen.
– John Eisbrener
Nov 20 at 15:47
Sounds like you should review Kimberly Tripp's article about the tipping point: sqlskills.com/blogs/kimberly/the-tipping-point-query-answers More specifically the section titled "When does the tipping point occur?" I think she does a good job of covering when/why this can happen.
– John Eisbrener
Nov 20 at 15:47
1
1
The optimizer draws up the execution plan based on the information it has at compile time - it's not possible at this point to make decisions based on execution time (SQL Server is adding features that learn, like adaptive joins and adaptive memory grant feedback, but choosing between a seek and a scan based on duration is probably a long way off).
– Aaron Bertrand♦
Nov 20 at 19:36
The optimizer draws up the execution plan based on the information it has at compile time - it's not possible at this point to make decisions based on execution time (SQL Server is adding features that learn, like adaptive joins and adaptive memory grant feedback, but choosing between a seek and a scan based on duration is probably a long way off).
– Aaron Bertrand♦
Nov 20 at 19:36
1
1
Did you mean to leave off the ORDER BY in your forced-index query?
– Forrest
Nov 20 at 20:46
Did you mean to leave off the ORDER BY in your forced-index query?
– Forrest
Nov 20 at 20:46
add a comment |
2 Answers
2
active
oldest
votes
up vote
4
down vote
If you were to compare the number of reads required in 100,000 lookups with what’s involved in doing a sort, you might quickly get an idea about why the Query Optimizer figures that the CIX+Sort would be the best choice.
The Lookup execution ends up being quicker because the pages being read are in memory (even if you clear the cache, you have a lot of rows per page, so you’re reading the same pages over and over, but with different amounts of fragmentation or different memory pressure from other activity, this might not be the case). It really wouldn’t take all that much to have CIX+Sort go quicker, but what you’re seeing is because the cost of a read doesn’t take into consideration the relative cheapness of hitting the same pages repeatedly.
add a comment |
up vote
3
down vote
Why SQL Server use the clustered index plus a sort algorithm instead
of non-clustered index even if execution time is 38% faster?
Because SQL Server uses a cost-based optimizer based on statistics, not runtime info.
During the cost estimation process for this query, it does actually evaluate the lookup plan, but estimates it will take more effort. (Note the "Estimated Subtree Cost" when hovering over SELECT in the execution plan). That's not necessarily a bad assumption either - on my test machine, the lookup plan takes 6X the CPU of the sort/scan.
Look to Rob Farley's answer as to why SQL Server might cost the lookup plan higher.
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
4
down vote
If you were to compare the number of reads required in 100,000 lookups with what’s involved in doing a sort, you might quickly get an idea about why the Query Optimizer figures that the CIX+Sort would be the best choice.
The Lookup execution ends up being quicker because the pages being read are in memory (even if you clear the cache, you have a lot of rows per page, so you’re reading the same pages over and over, but with different amounts of fragmentation or different memory pressure from other activity, this might not be the case). It really wouldn’t take all that much to have CIX+Sort go quicker, but what you’re seeing is because the cost of a read doesn’t take into consideration the relative cheapness of hitting the same pages repeatedly.
add a comment |
up vote
4
down vote
If you were to compare the number of reads required in 100,000 lookups with what’s involved in doing a sort, you might quickly get an idea about why the Query Optimizer figures that the CIX+Sort would be the best choice.
The Lookup execution ends up being quicker because the pages being read are in memory (even if you clear the cache, you have a lot of rows per page, so you’re reading the same pages over and over, but with different amounts of fragmentation or different memory pressure from other activity, this might not be the case). It really wouldn’t take all that much to have CIX+Sort go quicker, but what you’re seeing is because the cost of a read doesn’t take into consideration the relative cheapness of hitting the same pages repeatedly.
add a comment |
up vote
4
down vote
up vote
4
down vote
If you were to compare the number of reads required in 100,000 lookups with what’s involved in doing a sort, you might quickly get an idea about why the Query Optimizer figures that the CIX+Sort would be the best choice.
The Lookup execution ends up being quicker because the pages being read are in memory (even if you clear the cache, you have a lot of rows per page, so you’re reading the same pages over and over, but with different amounts of fragmentation or different memory pressure from other activity, this might not be the case). It really wouldn’t take all that much to have CIX+Sort go quicker, but what you’re seeing is because the cost of a read doesn’t take into consideration the relative cheapness of hitting the same pages repeatedly.
If you were to compare the number of reads required in 100,000 lookups with what’s involved in doing a sort, you might quickly get an idea about why the Query Optimizer figures that the CIX+Sort would be the best choice.
The Lookup execution ends up being quicker because the pages being read are in memory (even if you clear the cache, you have a lot of rows per page, so you’re reading the same pages over and over, but with different amounts of fragmentation or different memory pressure from other activity, this might not be the case). It really wouldn’t take all that much to have CIX+Sort go quicker, but what you’re seeing is because the cost of a read doesn’t take into consideration the relative cheapness of hitting the same pages repeatedly.
answered Nov 20 at 20:21
Rob Farley
13.3k12446
13.3k12446
add a comment |
add a comment |
up vote
3
down vote
Why SQL Server use the clustered index plus a sort algorithm instead
of non-clustered index even if execution time is 38% faster?
Because SQL Server uses a cost-based optimizer based on statistics, not runtime info.
During the cost estimation process for this query, it does actually evaluate the lookup plan, but estimates it will take more effort. (Note the "Estimated Subtree Cost" when hovering over SELECT in the execution plan). That's not necessarily a bad assumption either - on my test machine, the lookup plan takes 6X the CPU of the sort/scan.
Look to Rob Farley's answer as to why SQL Server might cost the lookup plan higher.
add a comment |
up vote
3
down vote
Why SQL Server use the clustered index plus a sort algorithm instead
of non-clustered index even if execution time is 38% faster?
Because SQL Server uses a cost-based optimizer based on statistics, not runtime info.
During the cost estimation process for this query, it does actually evaluate the lookup plan, but estimates it will take more effort. (Note the "Estimated Subtree Cost" when hovering over SELECT in the execution plan). That's not necessarily a bad assumption either - on my test machine, the lookup plan takes 6X the CPU of the sort/scan.
Look to Rob Farley's answer as to why SQL Server might cost the lookup plan higher.
add a comment |
up vote
3
down vote
up vote
3
down vote
Why SQL Server use the clustered index plus a sort algorithm instead
of non-clustered index even if execution time is 38% faster?
Because SQL Server uses a cost-based optimizer based on statistics, not runtime info.
During the cost estimation process for this query, it does actually evaluate the lookup plan, but estimates it will take more effort. (Note the "Estimated Subtree Cost" when hovering over SELECT in the execution plan). That's not necessarily a bad assumption either - on my test machine, the lookup plan takes 6X the CPU of the sort/scan.
Look to Rob Farley's answer as to why SQL Server might cost the lookup plan higher.
Why SQL Server use the clustered index plus a sort algorithm instead
of non-clustered index even if execution time is 38% faster?
Because SQL Server uses a cost-based optimizer based on statistics, not runtime info.
During the cost estimation process for this query, it does actually evaluate the lookup plan, but estimates it will take more effort. (Note the "Estimated Subtree Cost" when hovering over SELECT in the execution plan). That's not necessarily a bad assumption either - on my test machine, the lookup plan takes 6X the CPU of the sort/scan.
Look to Rob Farley's answer as to why SQL Server might cost the lookup plan higher.
answered Nov 20 at 21:01
Forrest
1,655516
1,655516
add a comment |
add a comment |
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f222987%2fwhy-optimiser-choose-clustered-index-sort-instead-of-non-clustered-index%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Sounds like you should review Kimberly Tripp's article about the tipping point: sqlskills.com/blogs/kimberly/the-tipping-point-query-answers More specifically the section titled "When does the tipping point occur?" I think she does a good job of covering when/why this can happen.
– John Eisbrener
Nov 20 at 15:47
1
The optimizer draws up the execution plan based on the information it has at compile time - it's not possible at this point to make decisions based on execution time (SQL Server is adding features that learn, like adaptive joins and adaptive memory grant feedback, but choosing between a seek and a scan based on duration is probably a long way off).
– Aaron Bertrand♦
Nov 20 at 19:36
1
Did you mean to leave off the ORDER BY in your forced-index query?
– Forrest
Nov 20 at 20:46