Why is this column order in my non-clustered index better for my query?





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ margin-bottom:0;
}







3















I am working on a query for a table that contains movie tickets. The database holds 380k rows. A rows represents a showing of the movie (which cinema, when, how many tickets and at what price among other things).



I need to compute a few totals for every row: Admissions Paid, Admissions Revenue, Admissions Free and Total Admissions.



For a given row Admissions Paid is the sum of all tickets for that movie up until that point where price>0. The other 3 columns are computed similarly.



I wrote a query and created an index:



 SELECT [ID]
,[cinema_name]
,[movie_title]
,[price]
,[quantity]
,[start_date_time]
,* --I need all the columns for reporting
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price=0) as [Admissions Free]
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price>0) as [Admissions Paid]
,(select SUM(quantity*price)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price>0) as [Admissions Revenue]
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time) as [Total Admissions]
FROM [movies] o


I created the following index which brought the query time down to 5 minutes:



CREATE NONCLUSTERED INDEX [startdatetime_movietitle_price] ON [dbo].[movies]
(
[movie_title] ASC,
[start_date_time] ASC,
[price] DESC
)
INCLUDE ( [quantity]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


But this index brought the query time down to 1:30:



CREATE NONCLUSTERED INDEX [startdatetime_movietitle_price] ON [dbo].[movies]
(
[start_date_time] ASC,
[movie_title] ASC,
[price] DESC
)
INCLUDE ( [quantity]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


So my question is: why? From my understanding, it makes more sense to first gather all the movie titles and then look at the start times because there are more start times then there are movies. Distinct movies: 51, distinct start_date_times: 8786



Doesn't the underling B-Tree not cut off more branches if it eliminates the unnecessary start_date_times first?



Here are the execution plans:



desc



enter image description here



The first picture shows the execution plan for the index with movie_title first, the other picture shows start_date_time first.










share|improve this question




















  • 1





    Could you provide execution plans for both indexing options?

    – vonPryz
    Feb 25 at 8:15






  • 1





    Subqueries are too strange... It seems they can be converted to window-type SUM().

    – Akina
    Feb 25 at 8:21











  • @SabinBio o is the outer table. I edited the question.

    – dakes
    Feb 25 at 8:23






  • 1





    The key is here Distinct movies: 51, distinct start_date_times: 8786; some hints could be at the logical reads and in the execution plan(s). For an index, it's very important the first column and how selective it is

    – Sabin Bio
    Feb 25 at 8:29








  • 2





    Could you post a link to the individual plans via Paste The Plan? You can anonymise information using Sentry One's Plan Explorer if need be.

    – hot2use
    Feb 25 at 13:26




















3















I am working on a query for a table that contains movie tickets. The database holds 380k rows. A rows represents a showing of the movie (which cinema, when, how many tickets and at what price among other things).



I need to compute a few totals for every row: Admissions Paid, Admissions Revenue, Admissions Free and Total Admissions.



For a given row Admissions Paid is the sum of all tickets for that movie up until that point where price>0. The other 3 columns are computed similarly.



I wrote a query and created an index:



 SELECT [ID]
,[cinema_name]
,[movie_title]
,[price]
,[quantity]
,[start_date_time]
,* --I need all the columns for reporting
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price=0) as [Admissions Free]
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price>0) as [Admissions Paid]
,(select SUM(quantity*price)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price>0) as [Admissions Revenue]
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time) as [Total Admissions]
FROM [movies] o


I created the following index which brought the query time down to 5 minutes:



CREATE NONCLUSTERED INDEX [startdatetime_movietitle_price] ON [dbo].[movies]
(
[movie_title] ASC,
[start_date_time] ASC,
[price] DESC
)
INCLUDE ( [quantity]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


But this index brought the query time down to 1:30:



CREATE NONCLUSTERED INDEX [startdatetime_movietitle_price] ON [dbo].[movies]
(
[start_date_time] ASC,
[movie_title] ASC,
[price] DESC
)
INCLUDE ( [quantity]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


So my question is: why? From my understanding, it makes more sense to first gather all the movie titles and then look at the start times because there are more start times then there are movies. Distinct movies: 51, distinct start_date_times: 8786



Doesn't the underling B-Tree not cut off more branches if it eliminates the unnecessary start_date_times first?



Here are the execution plans:



desc



enter image description here



The first picture shows the execution plan for the index with movie_title first, the other picture shows start_date_time first.










share|improve this question




















  • 1





    Could you provide execution plans for both indexing options?

    – vonPryz
    Feb 25 at 8:15






  • 1





    Subqueries are too strange... It seems they can be converted to window-type SUM().

    – Akina
    Feb 25 at 8:21











  • @SabinBio o is the outer table. I edited the question.

    – dakes
    Feb 25 at 8:23






  • 1





    The key is here Distinct movies: 51, distinct start_date_times: 8786; some hints could be at the logical reads and in the execution plan(s). For an index, it's very important the first column and how selective it is

    – Sabin Bio
    Feb 25 at 8:29








  • 2





    Could you post a link to the individual plans via Paste The Plan? You can anonymise information using Sentry One's Plan Explorer if need be.

    – hot2use
    Feb 25 at 13:26
















3












3








3








I am working on a query for a table that contains movie tickets. The database holds 380k rows. A rows represents a showing of the movie (which cinema, when, how many tickets and at what price among other things).



I need to compute a few totals for every row: Admissions Paid, Admissions Revenue, Admissions Free and Total Admissions.



For a given row Admissions Paid is the sum of all tickets for that movie up until that point where price>0. The other 3 columns are computed similarly.



I wrote a query and created an index:



 SELECT [ID]
,[cinema_name]
,[movie_title]
,[price]
,[quantity]
,[start_date_time]
,* --I need all the columns for reporting
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price=0) as [Admissions Free]
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price>0) as [Admissions Paid]
,(select SUM(quantity*price)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price>0) as [Admissions Revenue]
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time) as [Total Admissions]
FROM [movies] o


I created the following index which brought the query time down to 5 minutes:



CREATE NONCLUSTERED INDEX [startdatetime_movietitle_price] ON [dbo].[movies]
(
[movie_title] ASC,
[start_date_time] ASC,
[price] DESC
)
INCLUDE ( [quantity]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


But this index brought the query time down to 1:30:



CREATE NONCLUSTERED INDEX [startdatetime_movietitle_price] ON [dbo].[movies]
(
[start_date_time] ASC,
[movie_title] ASC,
[price] DESC
)
INCLUDE ( [quantity]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


So my question is: why? From my understanding, it makes more sense to first gather all the movie titles and then look at the start times because there are more start times then there are movies. Distinct movies: 51, distinct start_date_times: 8786



Doesn't the underling B-Tree not cut off more branches if it eliminates the unnecessary start_date_times first?



Here are the execution plans:



desc



enter image description here



The first picture shows the execution plan for the index with movie_title first, the other picture shows start_date_time first.










share|improve this question
















I am working on a query for a table that contains movie tickets. The database holds 380k rows. A rows represents a showing of the movie (which cinema, when, how many tickets and at what price among other things).



I need to compute a few totals for every row: Admissions Paid, Admissions Revenue, Admissions Free and Total Admissions.



For a given row Admissions Paid is the sum of all tickets for that movie up until that point where price>0. The other 3 columns are computed similarly.



I wrote a query and created an index:



 SELECT [ID]
,[cinema_name]
,[movie_title]
,[price]
,[quantity]
,[start_date_time]
,* --I need all the columns for reporting
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price=0) as [Admissions Free]
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price>0) as [Admissions Paid]
,(select SUM(quantity*price)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time
and price>0) as [Admissions Revenue]
,(select SUM(quantity)
from [movies] i
where i.movie_title=o.movie_title
and i.start_date_time<=o.start_date_time) as [Total Admissions]
FROM [movies] o


I created the following index which brought the query time down to 5 minutes:



CREATE NONCLUSTERED INDEX [startdatetime_movietitle_price] ON [dbo].[movies]
(
[movie_title] ASC,
[start_date_time] ASC,
[price] DESC
)
INCLUDE ( [quantity]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


But this index brought the query time down to 1:30:



CREATE NONCLUSTERED INDEX [startdatetime_movietitle_price] ON [dbo].[movies]
(
[start_date_time] ASC,
[movie_title] ASC,
[price] DESC
)
INCLUDE ( [quantity]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


So my question is: why? From my understanding, it makes more sense to first gather all the movie titles and then look at the start times because there are more start times then there are movies. Distinct movies: 51, distinct start_date_times: 8786



Doesn't the underling B-Tree not cut off more branches if it eliminates the unnecessary start_date_times first?



Here are the execution plans:



desc



enter image description here



The first picture shows the execution plan for the index with movie_title first, the other picture shows start_date_time first.







sql-server sql-server-2014






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Feb 25 at 17:35









MDCCL

6,88331845




6,88331845










asked Feb 25 at 7:56









dakesdakes

183




183








  • 1





    Could you provide execution plans for both indexing options?

    – vonPryz
    Feb 25 at 8:15






  • 1





    Subqueries are too strange... It seems they can be converted to window-type SUM().

    – Akina
    Feb 25 at 8:21











  • @SabinBio o is the outer table. I edited the question.

    – dakes
    Feb 25 at 8:23






  • 1





    The key is here Distinct movies: 51, distinct start_date_times: 8786; some hints could be at the logical reads and in the execution plan(s). For an index, it's very important the first column and how selective it is

    – Sabin Bio
    Feb 25 at 8:29








  • 2





    Could you post a link to the individual plans via Paste The Plan? You can anonymise information using Sentry One's Plan Explorer if need be.

    – hot2use
    Feb 25 at 13:26
















  • 1





    Could you provide execution plans for both indexing options?

    – vonPryz
    Feb 25 at 8:15






  • 1





    Subqueries are too strange... It seems they can be converted to window-type SUM().

    – Akina
    Feb 25 at 8:21











  • @SabinBio o is the outer table. I edited the question.

    – dakes
    Feb 25 at 8:23






  • 1





    The key is here Distinct movies: 51, distinct start_date_times: 8786; some hints could be at the logical reads and in the execution plan(s). For an index, it's very important the first column and how selective it is

    – Sabin Bio
    Feb 25 at 8:29








  • 2





    Could you post a link to the individual plans via Paste The Plan? You can anonymise information using Sentry One's Plan Explorer if need be.

    – hot2use
    Feb 25 at 13:26










1




1





Could you provide execution plans for both indexing options?

– vonPryz
Feb 25 at 8:15





Could you provide execution plans for both indexing options?

– vonPryz
Feb 25 at 8:15




1




1





Subqueries are too strange... It seems they can be converted to window-type SUM().

– Akina
Feb 25 at 8:21





Subqueries are too strange... It seems they can be converted to window-type SUM().

– Akina
Feb 25 at 8:21













@SabinBio o is the outer table. I edited the question.

– dakes
Feb 25 at 8:23





@SabinBio o is the outer table. I edited the question.

– dakes
Feb 25 at 8:23




1




1





The key is here Distinct movies: 51, distinct start_date_times: 8786; some hints could be at the logical reads and in the execution plan(s). For an index, it's very important the first column and how selective it is

– Sabin Bio
Feb 25 at 8:29







The key is here Distinct movies: 51, distinct start_date_times: 8786; some hints could be at the logical reads and in the execution plan(s). For an index, it's very important the first column and how selective it is

– Sabin Bio
Feb 25 at 8:29






2




2





Could you post a link to the individual plans via Paste The Plan? You can anonymise information using Sentry One's Plan Explorer if need be.

– hot2use
Feb 25 at 13:26







Could you post a link to the individual plans via Paste The Plan? You can anonymise information using Sentry One's Plan Explorer if need be.

– hot2use
Feb 25 at 13:26












2 Answers
2






active

oldest

votes


















5














The first index does look like a better fit for the query. Please provide the actual execution plans.



I would try using window functions instead of the four correlated subqueries. Or a single correlated subquery (with OUTER APPLY) and see which of the two indexes is used.

Both ideas are to coerce the optimizer to use a single index scan to gather the rolling sums instead of 4 (that both your plans do).



It would also be worth checking and comparing the two execution plans, when asking for all the columns and when asking for only the columns in the index:



using window functions:



-- window functions
SELECT
-- m.*,
movie_title, start_date_time,
price, quantity,

SUM(CASE WHEN price = 0 THEN quantity ELSE 0 END)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Admissions Free],
SUM(CASE WHEN price > 0 THEN quantity ELSE 0 END)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Admissions Paid],
SUM(CASE WHEN price > 0 THEN quantity * price ELSE 0 END)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Admissions Revenue],
SUM(quantity)
OVER
(PARTITION BY movie_title
ORDER BY start_date_time
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS [Total Admissions]
FROM
[movies] AS m ;


*: If there is a UNIQUE constraint on (movie_title, start_date_time), then you could use ROWS instead of RANGE for the window frames (it's usually more efficient). From the comments, there is no such constraint and there could be many rows with same title and datetime, so RANGE is required above.



using OUTER APPLY:



-- using OUTER APPLY
SELECT
-- m.*,
m.movie_title, m.start_date_time,
m.price, m.quantity,

c.[Admissions Free],
c.[Admissions Paid],
c.[Admissions Revenue],
c.[Total Admissions]
FROM
[movies] AS m
OUTER APPLY
( SELECT
SUM(CASE WHEN i.price = 0 THEN i.quantity ELSE 0 END)
AS [Admissions Free],
SUM(CASE WHEN i.price > 0 THEN i.quantity ELSE 0 END)
AS [Admissions Paid],
SUM(CASE WHEN i.price > 0 THEN i.quantity * i.price ELSE 0 END)
AS [Admissions Revenue],
SUM(i.quantity)
AS [Total Admissions]
FROM [movies] AS i
WHERE i.movie_title = o.movie_title
AND i.start_date_time <= o.start_date_time
) AS c ;




This index may be a little better than the first one:



(
movie_title ASC,
start_date_time ASC
)
INCLUDE (price, quantity)





share|improve this answer


























  • Thank you very much, this is indeed much much faster. Execution time for the first index is 3 sec while it's 4 sec with the second index. I added the execution plans to the post, btw. Can you maybe explain why this is so much faster?

    – dakes
    Feb 25 at 9:32













  • I tested the query and while it is faster it's not what I need to show. I'll use Paid Admissions as an example. Assume there are three shows starting at the same time. Your query computes different Paid Admissions for all three columns, but I need them to be the same, namely the sum of all paid admissions at that time. I assume this is because it looks at rows in the OVER part one after the other.

    – dakes
    Feb 25 at 11:42











  • @dakes oh, I intended to add a note about that. Se the edit. (in short, use RANGE not ROWS, for the window fames)

    – ypercubeᵀᴹ
    Feb 25 at 11:45













  • I was torn which answer I should accept because @sepupic s answer actually answers my question, but this one was much more helpful in the long run. SO says "The bottom line is that you should accept the answer that you found to be the most helpful to you, personally.", so I accepted this one. I hope that's correct.

    – dakes
    Feb 25 at 12:48



















0














I agree with ypercubeᵀᴹ answer, the query should be rewritten.




Can you maybe explain why this is so much faster?




The query that use the second index is faster only because it's executing in parallel.
Try to add option(maxdop 1) and the use of the first index will be faster.






share|improve this answer
























    Your Answer








    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "182"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f230639%2fwhy-is-this-column-order-in-my-non-clustered-index-better-for-my-query%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    5














    The first index does look like a better fit for the query. Please provide the actual execution plans.



    I would try using window functions instead of the four correlated subqueries. Or a single correlated subquery (with OUTER APPLY) and see which of the two indexes is used.

    Both ideas are to coerce the optimizer to use a single index scan to gather the rolling sums instead of 4 (that both your plans do).



    It would also be worth checking and comparing the two execution plans, when asking for all the columns and when asking for only the columns in the index:



    using window functions:



    -- window functions
    SELECT
    -- m.*,
    movie_title, start_date_time,
    price, quantity,

    SUM(CASE WHEN price = 0 THEN quantity ELSE 0 END)
    OVER
    (PARTITION BY movie_title
    ORDER BY start_date_time
    RANGE BETWEEN UNBOUNDED PRECEDING
    AND CURRENT ROW
    ) AS [Admissions Free],
    SUM(CASE WHEN price > 0 THEN quantity ELSE 0 END)
    OVER
    (PARTITION BY movie_title
    ORDER BY start_date_time
    RANGE BETWEEN UNBOUNDED PRECEDING
    AND CURRENT ROW
    ) AS [Admissions Paid],
    SUM(CASE WHEN price > 0 THEN quantity * price ELSE 0 END)
    OVER
    (PARTITION BY movie_title
    ORDER BY start_date_time
    RANGE BETWEEN UNBOUNDED PRECEDING
    AND CURRENT ROW
    ) AS [Admissions Revenue],
    SUM(quantity)
    OVER
    (PARTITION BY movie_title
    ORDER BY start_date_time
    RANGE BETWEEN UNBOUNDED PRECEDING
    AND CURRENT ROW
    ) AS [Total Admissions]
    FROM
    [movies] AS m ;


    *: If there is a UNIQUE constraint on (movie_title, start_date_time), then you could use ROWS instead of RANGE for the window frames (it's usually more efficient). From the comments, there is no such constraint and there could be many rows with same title and datetime, so RANGE is required above.



    using OUTER APPLY:



    -- using OUTER APPLY
    SELECT
    -- m.*,
    m.movie_title, m.start_date_time,
    m.price, m.quantity,

    c.[Admissions Free],
    c.[Admissions Paid],
    c.[Admissions Revenue],
    c.[Total Admissions]
    FROM
    [movies] AS m
    OUTER APPLY
    ( SELECT
    SUM(CASE WHEN i.price = 0 THEN i.quantity ELSE 0 END)
    AS [Admissions Free],
    SUM(CASE WHEN i.price > 0 THEN i.quantity ELSE 0 END)
    AS [Admissions Paid],
    SUM(CASE WHEN i.price > 0 THEN i.quantity * i.price ELSE 0 END)
    AS [Admissions Revenue],
    SUM(i.quantity)
    AS [Total Admissions]
    FROM [movies] AS i
    WHERE i.movie_title = o.movie_title
    AND i.start_date_time <= o.start_date_time
    ) AS c ;




    This index may be a little better than the first one:



    (
    movie_title ASC,
    start_date_time ASC
    )
    INCLUDE (price, quantity)





    share|improve this answer


























    • Thank you very much, this is indeed much much faster. Execution time for the first index is 3 sec while it's 4 sec with the second index. I added the execution plans to the post, btw. Can you maybe explain why this is so much faster?

      – dakes
      Feb 25 at 9:32













    • I tested the query and while it is faster it's not what I need to show. I'll use Paid Admissions as an example. Assume there are three shows starting at the same time. Your query computes different Paid Admissions for all three columns, but I need them to be the same, namely the sum of all paid admissions at that time. I assume this is because it looks at rows in the OVER part one after the other.

      – dakes
      Feb 25 at 11:42











    • @dakes oh, I intended to add a note about that. Se the edit. (in short, use RANGE not ROWS, for the window fames)

      – ypercubeᵀᴹ
      Feb 25 at 11:45













    • I was torn which answer I should accept because @sepupic s answer actually answers my question, but this one was much more helpful in the long run. SO says "The bottom line is that you should accept the answer that you found to be the most helpful to you, personally.", so I accepted this one. I hope that's correct.

      – dakes
      Feb 25 at 12:48
















    5














    The first index does look like a better fit for the query. Please provide the actual execution plans.



    I would try using window functions instead of the four correlated subqueries. Or a single correlated subquery (with OUTER APPLY) and see which of the two indexes is used.

    Both ideas are to coerce the optimizer to use a single index scan to gather the rolling sums instead of 4 (that both your plans do).



    It would also be worth checking and comparing the two execution plans, when asking for all the columns and when asking for only the columns in the index:



    using window functions:



    -- window functions
    SELECT
    -- m.*,
    movie_title, start_date_time,
    price, quantity,

    SUM(CASE WHEN price = 0 THEN quantity ELSE 0 END)
    OVER
    (PARTITION BY movie_title
    ORDER BY start_date_time
    RANGE BETWEEN UNBOUNDED PRECEDING
    AND CURRENT ROW
    ) AS [Admissions Free],
    SUM(CASE WHEN price > 0 THEN quantity ELSE 0 END)
    OVER
    (PARTITION BY movie_title
    ORDER BY start_date_time
    RANGE BETWEEN UNBOUNDED PRECEDING
    AND CURRENT ROW
    ) AS [Admissions Paid],
    SUM(CASE WHEN price > 0 THEN quantity * price ELSE 0 END)
    OVER
    (PARTITION BY movie_title
    ORDER BY start_date_time
    RANGE BETWEEN UNBOUNDED PRECEDING
    AND CURRENT ROW
    ) AS [Admissions Revenue],
    SUM(quantity)
    OVER
    (PARTITION BY movie_title
    ORDER BY start_date_time
    RANGE BETWEEN UNBOUNDED PRECEDING
    AND CURRENT ROW
    ) AS [Total Admissions]
    FROM
    [movies] AS m ;


    *: If there is a UNIQUE constraint on (movie_title, start_date_time), then you could use ROWS instead of RANGE for the window frames (it's usually more efficient). From the comments, there is no such constraint and there could be many rows with same title and datetime, so RANGE is required above.



    using OUTER APPLY:



    -- using OUTER APPLY
    SELECT
    -- m.*,
    m.movie_title, m.start_date_time,
    m.price, m.quantity,

    c.[Admissions Free],
    c.[Admissions Paid],
    c.[Admissions Revenue],
    c.[Total Admissions]
    FROM
    [movies] AS m
    OUTER APPLY
    ( SELECT
    SUM(CASE WHEN i.price = 0 THEN i.quantity ELSE 0 END)
    AS [Admissions Free],
    SUM(CASE WHEN i.price > 0 THEN i.quantity ELSE 0 END)
    AS [Admissions Paid],
    SUM(CASE WHEN i.price > 0 THEN i.quantity * i.price ELSE 0 END)
    AS [Admissions Revenue],
    SUM(i.quantity)
    AS [Total Admissions]
    FROM [movies] AS i
    WHERE i.movie_title = o.movie_title
    AND i.start_date_time <= o.start_date_time
    ) AS c ;




    This index may be a little better than the first one:



    (
    movie_title ASC,
    start_date_time ASC
    )
    INCLUDE (price, quantity)





    share|improve this answer


























    • Thank you very much, this is indeed much much faster. Execution time for the first index is 3 sec while it's 4 sec with the second index. I added the execution plans to the post, btw. Can you maybe explain why this is so much faster?

      – dakes
      Feb 25 at 9:32













    • I tested the query and while it is faster it's not what I need to show. I'll use Paid Admissions as an example. Assume there are three shows starting at the same time. Your query computes different Paid Admissions for all three columns, but I need them to be the same, namely the sum of all paid admissions at that time. I assume this is because it looks at rows in the OVER part one after the other.

      – dakes
      Feb 25 at 11:42











    • @dakes oh, I intended to add a note about that. Se the edit. (in short, use RANGE not ROWS, for the window fames)

      – ypercubeᵀᴹ
      Feb 25 at 11:45













    • I was torn which answer I should accept because @sepupic s answer actually answers my question, but this one was much more helpful in the long run. SO says "The bottom line is that you should accept the answer that you found to be the most helpful to you, personally.", so I accepted this one. I hope that's correct.

      – dakes
      Feb 25 at 12:48














    5












    5








    5







    The first index does look like a better fit for the query. Please provide the actual execution plans.



    I would try using window functions instead of the four correlated subqueries. Or a single correlated subquery (with OUTER APPLY) and see which of the two indexes is used.

    Both ideas are to coerce the optimizer to use a single index scan to gather the rolling sums instead of 4 (that both your plans do).



    It would also be worth checking and comparing the two execution plans, when asking for all the columns and when asking for only the columns in the index:



    using window functions:



    -- window functions
    SELECT
    -- m.*,
    movie_title, start_date_time,
    price, quantity,

    SUM(CASE WHEN price = 0 THEN quantity ELSE 0 END)
    OVER
    (PARTITION BY movie_title
    ORDER BY start_date_time
    RANGE BETWEEN UNBOUNDED PRECEDING
    AND CURRENT ROW
    ) AS [Admissions Free],
    SUM(CASE WHEN price > 0 THEN quantity ELSE 0 END)
    OVER
    (PARTITION BY movie_title
    ORDER BY start_date_time
    RANGE BETWEEN UNBOUNDED PRECEDING
    AND CURRENT ROW
    ) AS [Admissions Paid],
    SUM(CASE WHEN price > 0 THEN quantity * price ELSE 0 END)
    OVER
    (PARTITION BY movie_title
    ORDER BY start_date_time
    RANGE BETWEEN UNBOUNDED PRECEDING
    AND CURRENT ROW
    ) AS [Admissions Revenue],
    SUM(quantity)
    OVER
    (PARTITION BY movie_title
    ORDER BY start_date_time
    RANGE BETWEEN UNBOUNDED PRECEDING
    AND CURRENT ROW
    ) AS [Total Admissions]
    FROM
    [movies] AS m ;


    *: If there is a UNIQUE constraint on (movie_title, start_date_time), then you could use ROWS instead of RANGE for the window frames (it's usually more efficient). From the comments, there is no such constraint and there could be many rows with same title and datetime, so RANGE is required above.



    using OUTER APPLY:



    -- using OUTER APPLY
    SELECT
    -- m.*,
    m.movie_title, m.start_date_time,
    m.price, m.quantity,

    c.[Admissions Free],
    c.[Admissions Paid],
    c.[Admissions Revenue],
    c.[Total Admissions]
    FROM
    [movies] AS m
    OUTER APPLY
    ( SELECT
    SUM(CASE WHEN i.price = 0 THEN i.quantity ELSE 0 END)
    AS [Admissions Free],
    SUM(CASE WHEN i.price > 0 THEN i.quantity ELSE 0 END)
    AS [Admissions Paid],
    SUM(CASE WHEN i.price > 0 THEN i.quantity * i.price ELSE 0 END)
    AS [Admissions Revenue],
    SUM(i.quantity)
    AS [Total Admissions]
    FROM [movies] AS i
    WHERE i.movie_title = o.movie_title
    AND i.start_date_time <= o.start_date_time
    ) AS c ;




    This index may be a little better than the first one:



    (
    movie_title ASC,
    start_date_time ASC
    )
    INCLUDE (price, quantity)





    share|improve this answer















    The first index does look like a better fit for the query. Please provide the actual execution plans.



    I would try using window functions instead of the four correlated subqueries. Or a single correlated subquery (with OUTER APPLY) and see which of the two indexes is used.

    Both ideas are to coerce the optimizer to use a single index scan to gather the rolling sums instead of 4 (that both your plans do).



    It would also be worth checking and comparing the two execution plans, when asking for all the columns and when asking for only the columns in the index:



    using window functions:



    -- window functions
    SELECT
    -- m.*,
    movie_title, start_date_time,
    price, quantity,

    SUM(CASE WHEN price = 0 THEN quantity ELSE 0 END)
    OVER
    (PARTITION BY movie_title
    ORDER BY start_date_time
    RANGE BETWEEN UNBOUNDED PRECEDING
    AND CURRENT ROW
    ) AS [Admissions Free],
    SUM(CASE WHEN price > 0 THEN quantity ELSE 0 END)
    OVER
    (PARTITION BY movie_title
    ORDER BY start_date_time
    RANGE BETWEEN UNBOUNDED PRECEDING
    AND CURRENT ROW
    ) AS [Admissions Paid],
    SUM(CASE WHEN price > 0 THEN quantity * price ELSE 0 END)
    OVER
    (PARTITION BY movie_title
    ORDER BY start_date_time
    RANGE BETWEEN UNBOUNDED PRECEDING
    AND CURRENT ROW
    ) AS [Admissions Revenue],
    SUM(quantity)
    OVER
    (PARTITION BY movie_title
    ORDER BY start_date_time
    RANGE BETWEEN UNBOUNDED PRECEDING
    AND CURRENT ROW
    ) AS [Total Admissions]
    FROM
    [movies] AS m ;


    *: If there is a UNIQUE constraint on (movie_title, start_date_time), then you could use ROWS instead of RANGE for the window frames (it's usually more efficient). From the comments, there is no such constraint and there could be many rows with same title and datetime, so RANGE is required above.



    using OUTER APPLY:



    -- using OUTER APPLY
    SELECT
    -- m.*,
    m.movie_title, m.start_date_time,
    m.price, m.quantity,

    c.[Admissions Free],
    c.[Admissions Paid],
    c.[Admissions Revenue],
    c.[Total Admissions]
    FROM
    [movies] AS m
    OUTER APPLY
    ( SELECT
    SUM(CASE WHEN i.price = 0 THEN i.quantity ELSE 0 END)
    AS [Admissions Free],
    SUM(CASE WHEN i.price > 0 THEN i.quantity ELSE 0 END)
    AS [Admissions Paid],
    SUM(CASE WHEN i.price > 0 THEN i.quantity * i.price ELSE 0 END)
    AS [Admissions Revenue],
    SUM(i.quantity)
    AS [Total Admissions]
    FROM [movies] AS i
    WHERE i.movie_title = o.movie_title
    AND i.start_date_time <= o.start_date_time
    ) AS c ;




    This index may be a little better than the first one:



    (
    movie_title ASC,
    start_date_time ASC
    )
    INCLUDE (price, quantity)






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Feb 25 at 11:48

























    answered Feb 25 at 9:17









    ypercubeᵀᴹypercubeᵀᴹ

    78.7k11137222




    78.7k11137222













    • Thank you very much, this is indeed much much faster. Execution time for the first index is 3 sec while it's 4 sec with the second index. I added the execution plans to the post, btw. Can you maybe explain why this is so much faster?

      – dakes
      Feb 25 at 9:32













    • I tested the query and while it is faster it's not what I need to show. I'll use Paid Admissions as an example. Assume there are three shows starting at the same time. Your query computes different Paid Admissions for all three columns, but I need them to be the same, namely the sum of all paid admissions at that time. I assume this is because it looks at rows in the OVER part one after the other.

      – dakes
      Feb 25 at 11:42











    • @dakes oh, I intended to add a note about that. Se the edit. (in short, use RANGE not ROWS, for the window fames)

      – ypercubeᵀᴹ
      Feb 25 at 11:45













    • I was torn which answer I should accept because @sepupic s answer actually answers my question, but this one was much more helpful in the long run. SO says "The bottom line is that you should accept the answer that you found to be the most helpful to you, personally.", so I accepted this one. I hope that's correct.

      – dakes
      Feb 25 at 12:48



















    • Thank you very much, this is indeed much much faster. Execution time for the first index is 3 sec while it's 4 sec with the second index. I added the execution plans to the post, btw. Can you maybe explain why this is so much faster?

      – dakes
      Feb 25 at 9:32













    • I tested the query and while it is faster it's not what I need to show. I'll use Paid Admissions as an example. Assume there are three shows starting at the same time. Your query computes different Paid Admissions for all three columns, but I need them to be the same, namely the sum of all paid admissions at that time. I assume this is because it looks at rows in the OVER part one after the other.

      – dakes
      Feb 25 at 11:42











    • @dakes oh, I intended to add a note about that. Se the edit. (in short, use RANGE not ROWS, for the window fames)

      – ypercubeᵀᴹ
      Feb 25 at 11:45













    • I was torn which answer I should accept because @sepupic s answer actually answers my question, but this one was much more helpful in the long run. SO says "The bottom line is that you should accept the answer that you found to be the most helpful to you, personally.", so I accepted this one. I hope that's correct.

      – dakes
      Feb 25 at 12:48

















    Thank you very much, this is indeed much much faster. Execution time for the first index is 3 sec while it's 4 sec with the second index. I added the execution plans to the post, btw. Can you maybe explain why this is so much faster?

    – dakes
    Feb 25 at 9:32







    Thank you very much, this is indeed much much faster. Execution time for the first index is 3 sec while it's 4 sec with the second index. I added the execution plans to the post, btw. Can you maybe explain why this is so much faster?

    – dakes
    Feb 25 at 9:32















    I tested the query and while it is faster it's not what I need to show. I'll use Paid Admissions as an example. Assume there are three shows starting at the same time. Your query computes different Paid Admissions for all three columns, but I need them to be the same, namely the sum of all paid admissions at that time. I assume this is because it looks at rows in the OVER part one after the other.

    – dakes
    Feb 25 at 11:42





    I tested the query and while it is faster it's not what I need to show. I'll use Paid Admissions as an example. Assume there are three shows starting at the same time. Your query computes different Paid Admissions for all three columns, but I need them to be the same, namely the sum of all paid admissions at that time. I assume this is because it looks at rows in the OVER part one after the other.

    – dakes
    Feb 25 at 11:42













    @dakes oh, I intended to add a note about that. Se the edit. (in short, use RANGE not ROWS, for the window fames)

    – ypercubeᵀᴹ
    Feb 25 at 11:45







    @dakes oh, I intended to add a note about that. Se the edit. (in short, use RANGE not ROWS, for the window fames)

    – ypercubeᵀᴹ
    Feb 25 at 11:45















    I was torn which answer I should accept because @sepupic s answer actually answers my question, but this one was much more helpful in the long run. SO says "The bottom line is that you should accept the answer that you found to be the most helpful to you, personally.", so I accepted this one. I hope that's correct.

    – dakes
    Feb 25 at 12:48





    I was torn which answer I should accept because @sepupic s answer actually answers my question, but this one was much more helpful in the long run. SO says "The bottom line is that you should accept the answer that you found to be the most helpful to you, personally.", so I accepted this one. I hope that's correct.

    – dakes
    Feb 25 at 12:48













    0














    I agree with ypercubeᵀᴹ answer, the query should be rewritten.




    Can you maybe explain why this is so much faster?




    The query that use the second index is faster only because it's executing in parallel.
    Try to add option(maxdop 1) and the use of the first index will be faster.






    share|improve this answer




























      0














      I agree with ypercubeᵀᴹ answer, the query should be rewritten.




      Can you maybe explain why this is so much faster?




      The query that use the second index is faster only because it's executing in parallel.
      Try to add option(maxdop 1) and the use of the first index will be faster.






      share|improve this answer


























        0












        0








        0







        I agree with ypercubeᵀᴹ answer, the query should be rewritten.




        Can you maybe explain why this is so much faster?




        The query that use the second index is faster only because it's executing in parallel.
        Try to add option(maxdop 1) and the use of the first index will be faster.






        share|improve this answer













        I agree with ypercubeᵀᴹ answer, the query should be rewritten.




        Can you maybe explain why this is so much faster?




        The query that use the second index is faster only because it's executing in parallel.
        Try to add option(maxdop 1) and the use of the first index will be faster.







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Feb 25 at 10:45









        sepupicsepupic

        7,841820




        7,841820






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Database Administrators Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f230639%2fwhy-is-this-column-order-in-my-non-clustered-index-better-for-my-query%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Biblatex bibliography style without URLs when DOI exists (in Overleaf with Zotero bibliography)

            ComboBox Display Member on multiple fields

            Is it possible to collect Nectar points via Trainline?