Redshift SQL - Skipped sequence











up vote
2
down vote

favorite












I'm working on applicant pipeline data and need to get a count of applicants who made it to each phase of the pipeline/funnel. If an applicant skips a phase, I need to need to count them in the phase anyway. Here's an example of how that data might look for one applicant:



Stage name | Entered on
Application Review | 9/7/2018
Recruiter Screen | 9/10/2018
Phone Interview | blank
Interview | 9/17/2018
Interview 2 | 9/20/2018
Offer | blank


this is what the table looks like:



CREATE TABLE application_stages (
application_id bigint,
stage_id bigint,
entered_on timestamp without time zone,
exited_on timestamp without time zone,
stage_name character varying
);


In this example, I want to count Application Review through Interview 2 (including the skipped/blank Phone Interview phase), but not the Offer. How would I write the above in SQL? (Data is stored in Amazon Redshift. Using SQL workbench to query.)



Also, please let me know if there is anything else I can add to my question to make the issue/solution clearer.










share|improve this question
























  • Welcome to stackoverflow. You should be providing text data rather than screen shot. Also it would be good if you add scheme of table.
    – Red Boy
    Nov 14 at 22:59












  • Appreciate the guidance! I'll update.
    – WilsonS
    Nov 14 at 23:40















up vote
2
down vote

favorite












I'm working on applicant pipeline data and need to get a count of applicants who made it to each phase of the pipeline/funnel. If an applicant skips a phase, I need to need to count them in the phase anyway. Here's an example of how that data might look for one applicant:



Stage name | Entered on
Application Review | 9/7/2018
Recruiter Screen | 9/10/2018
Phone Interview | blank
Interview | 9/17/2018
Interview 2 | 9/20/2018
Offer | blank


this is what the table looks like:



CREATE TABLE application_stages (
application_id bigint,
stage_id bigint,
entered_on timestamp without time zone,
exited_on timestamp without time zone,
stage_name character varying
);


In this example, I want to count Application Review through Interview 2 (including the skipped/blank Phone Interview phase), but not the Offer. How would I write the above in SQL? (Data is stored in Amazon Redshift. Using SQL workbench to query.)



Also, please let me know if there is anything else I can add to my question to make the issue/solution clearer.










share|improve this question
























  • Welcome to stackoverflow. You should be providing text data rather than screen shot. Also it would be good if you add scheme of table.
    – Red Boy
    Nov 14 at 22:59












  • Appreciate the guidance! I'll update.
    – WilsonS
    Nov 14 at 23:40













up vote
2
down vote

favorite









up vote
2
down vote

favorite











I'm working on applicant pipeline data and need to get a count of applicants who made it to each phase of the pipeline/funnel. If an applicant skips a phase, I need to need to count them in the phase anyway. Here's an example of how that data might look for one applicant:



Stage name | Entered on
Application Review | 9/7/2018
Recruiter Screen | 9/10/2018
Phone Interview | blank
Interview | 9/17/2018
Interview 2 | 9/20/2018
Offer | blank


this is what the table looks like:



CREATE TABLE application_stages (
application_id bigint,
stage_id bigint,
entered_on timestamp without time zone,
exited_on timestamp without time zone,
stage_name character varying
);


In this example, I want to count Application Review through Interview 2 (including the skipped/blank Phone Interview phase), but not the Offer. How would I write the above in SQL? (Data is stored in Amazon Redshift. Using SQL workbench to query.)



Also, please let me know if there is anything else I can add to my question to make the issue/solution clearer.










share|improve this question















I'm working on applicant pipeline data and need to get a count of applicants who made it to each phase of the pipeline/funnel. If an applicant skips a phase, I need to need to count them in the phase anyway. Here's an example of how that data might look for one applicant:



Stage name | Entered on
Application Review | 9/7/2018
Recruiter Screen | 9/10/2018
Phone Interview | blank
Interview | 9/17/2018
Interview 2 | 9/20/2018
Offer | blank


this is what the table looks like:



CREATE TABLE application_stages (
application_id bigint,
stage_id bigint,
entered_on timestamp without time zone,
exited_on timestamp without time zone,
stage_name character varying
);


In this example, I want to count Application Review through Interview 2 (including the skipped/blank Phone Interview phase), but not the Offer. How would I write the above in SQL? (Data is stored in Amazon Redshift. Using SQL workbench to query.)



Also, please let me know if there is anything else I can add to my question to make the issue/solution clearer.







sql amazon-redshift






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 14 at 23:50

























asked Nov 14 at 22:46









WilsonS

115




115












  • Welcome to stackoverflow. You should be providing text data rather than screen shot. Also it would be good if you add scheme of table.
    – Red Boy
    Nov 14 at 22:59












  • Appreciate the guidance! I'll update.
    – WilsonS
    Nov 14 at 23:40


















  • Welcome to stackoverflow. You should be providing text data rather than screen shot. Also it would be good if you add scheme of table.
    – Red Boy
    Nov 14 at 22:59












  • Appreciate the guidance! I'll update.
    – WilsonS
    Nov 14 at 23:40
















Welcome to stackoverflow. You should be providing text data rather than screen shot. Also it would be good if you add scheme of table.
– Red Boy
Nov 14 at 22:59






Welcome to stackoverflow. You should be providing text data rather than screen shot. Also it would be good if you add scheme of table.
– Red Boy
Nov 14 at 22:59














Appreciate the guidance! I'll update.
– WilsonS
Nov 14 at 23:40




Appreciate the guidance! I'll update.
– WilsonS
Nov 14 at 23:40












2 Answers
2






active

oldest

votes

















up vote
0
down vote













You can hardcode the stages of the pipeline in event_list table like this:



id | stage_name
1 | first stage
2 | second stage
3 | third stage
4 | fourth stage


UPD: The deeper is the stage of the funnel, the higher is its ID. This way, you can compare them, i.e. third stage is deeper than second stage because 3>2. Thus, if you need to find people that reached the 2nd stage it includes people that have events with id=2 OR events with id>2, i.e. events deeper in the funnel.



If the second stage is missed and the third stage is recorded for some person you can still count that person as "reached second stage" by joining your event data to this table by stage_name and counting the number of records with id>=2, like



select count(distinct user_id)
from event_data t1
join event_list t2
using (stage_name)
where t2.id>=2


Alternatively, you can left join your event table to event_list and fill the gaps using lag function that returns the value of the previous row (i.e. assigning the timestamp of first stage to the second stage in the case above)






share|improve this answer























  • Say you were using this for counting phone interview, wouldn't this count twice, once each for interview1 and 2 for the above example? This is a good start, but you then need to count the distinct application_ids.
    – DancingFool
    Nov 15 at 0:03










  • @AlexYes sometimes the blanks are correct and should not be counted. Like in this example, Offer is blank and should not be counted because it is the final phase. Does your suggestion still work? Also, the blank can be in any phase, not always id 2. It's possible that there would be multiple blanks.
    – WilsonS
    Nov 15 at 0:15












  • Here is a csv of the table. pastebin.com/eqJepqF6
    – WilsonS
    Nov 15 at 0:36










  • @WilsonS I've added more detailed explanation to the post, hope it will make sense. If you want to count the Offer you will still be able to do it correctly because this is the deepest events and only true non-blanks will be counted.
    – AlexYes
    Nov 15 at 0:47










  • @DancingFool sure! I've edited the query. This is obvious, I was more concentrated on the method of filling the gaps.
    – AlexYes
    Nov 15 at 0:48


















up vote
0
down vote



accepted










Here is the SQL I ended up with. Thanks for the ideas, @AlexYes!



select stage_name,  
application_stages.application_id, entered_on,
case when entered_on is NULL then lead(entered_on,1)
ignore nulls
over
(PARTITION BY application_stages.application_id order by case stage_name
when 'Application Review' then 1
when 'Recruiter Screen' then 2
when 'Phone Interview' then 3
when 'Interview' then 4
when 'Interview 2' then 5
when 'Offer' then 6
when 'Hired' then 7 end) else entered_on end as for_count, exited_on
from application_stages


I realize that the above SQL doesn't give me the counts but I am doing the counts in Tableau. Happy to have the format above in case I need to do other calculations on the new "for_count" field.






share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53309866%2fredshift-sql-skipped-sequence%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    0
    down vote













    You can hardcode the stages of the pipeline in event_list table like this:



    id | stage_name
    1 | first stage
    2 | second stage
    3 | third stage
    4 | fourth stage


    UPD: The deeper is the stage of the funnel, the higher is its ID. This way, you can compare them, i.e. third stage is deeper than second stage because 3>2. Thus, if you need to find people that reached the 2nd stage it includes people that have events with id=2 OR events with id>2, i.e. events deeper in the funnel.



    If the second stage is missed and the third stage is recorded for some person you can still count that person as "reached second stage" by joining your event data to this table by stage_name and counting the number of records with id>=2, like



    select count(distinct user_id)
    from event_data t1
    join event_list t2
    using (stage_name)
    where t2.id>=2


    Alternatively, you can left join your event table to event_list and fill the gaps using lag function that returns the value of the previous row (i.e. assigning the timestamp of first stage to the second stage in the case above)






    share|improve this answer























    • Say you were using this for counting phone interview, wouldn't this count twice, once each for interview1 and 2 for the above example? This is a good start, but you then need to count the distinct application_ids.
      – DancingFool
      Nov 15 at 0:03










    • @AlexYes sometimes the blanks are correct and should not be counted. Like in this example, Offer is blank and should not be counted because it is the final phase. Does your suggestion still work? Also, the blank can be in any phase, not always id 2. It's possible that there would be multiple blanks.
      – WilsonS
      Nov 15 at 0:15












    • Here is a csv of the table. pastebin.com/eqJepqF6
      – WilsonS
      Nov 15 at 0:36










    • @WilsonS I've added more detailed explanation to the post, hope it will make sense. If you want to count the Offer you will still be able to do it correctly because this is the deepest events and only true non-blanks will be counted.
      – AlexYes
      Nov 15 at 0:47










    • @DancingFool sure! I've edited the query. This is obvious, I was more concentrated on the method of filling the gaps.
      – AlexYes
      Nov 15 at 0:48















    up vote
    0
    down vote













    You can hardcode the stages of the pipeline in event_list table like this:



    id | stage_name
    1 | first stage
    2 | second stage
    3 | third stage
    4 | fourth stage


    UPD: The deeper is the stage of the funnel, the higher is its ID. This way, you can compare them, i.e. third stage is deeper than second stage because 3>2. Thus, if you need to find people that reached the 2nd stage it includes people that have events with id=2 OR events with id>2, i.e. events deeper in the funnel.



    If the second stage is missed and the third stage is recorded for some person you can still count that person as "reached second stage" by joining your event data to this table by stage_name and counting the number of records with id>=2, like



    select count(distinct user_id)
    from event_data t1
    join event_list t2
    using (stage_name)
    where t2.id>=2


    Alternatively, you can left join your event table to event_list and fill the gaps using lag function that returns the value of the previous row (i.e. assigning the timestamp of first stage to the second stage in the case above)






    share|improve this answer























    • Say you were using this for counting phone interview, wouldn't this count twice, once each for interview1 and 2 for the above example? This is a good start, but you then need to count the distinct application_ids.
      – DancingFool
      Nov 15 at 0:03










    • @AlexYes sometimes the blanks are correct and should not be counted. Like in this example, Offer is blank and should not be counted because it is the final phase. Does your suggestion still work? Also, the blank can be in any phase, not always id 2. It's possible that there would be multiple blanks.
      – WilsonS
      Nov 15 at 0:15












    • Here is a csv of the table. pastebin.com/eqJepqF6
      – WilsonS
      Nov 15 at 0:36










    • @WilsonS I've added more detailed explanation to the post, hope it will make sense. If you want to count the Offer you will still be able to do it correctly because this is the deepest events and only true non-blanks will be counted.
      – AlexYes
      Nov 15 at 0:47










    • @DancingFool sure! I've edited the query. This is obvious, I was more concentrated on the method of filling the gaps.
      – AlexYes
      Nov 15 at 0:48













    up vote
    0
    down vote










    up vote
    0
    down vote









    You can hardcode the stages of the pipeline in event_list table like this:



    id | stage_name
    1 | first stage
    2 | second stage
    3 | third stage
    4 | fourth stage


    UPD: The deeper is the stage of the funnel, the higher is its ID. This way, you can compare them, i.e. third stage is deeper than second stage because 3>2. Thus, if you need to find people that reached the 2nd stage it includes people that have events with id=2 OR events with id>2, i.e. events deeper in the funnel.



    If the second stage is missed and the third stage is recorded for some person you can still count that person as "reached second stage" by joining your event data to this table by stage_name and counting the number of records with id>=2, like



    select count(distinct user_id)
    from event_data t1
    join event_list t2
    using (stage_name)
    where t2.id>=2


    Alternatively, you can left join your event table to event_list and fill the gaps using lag function that returns the value of the previous row (i.e. assigning the timestamp of first stage to the second stage in the case above)






    share|improve this answer














    You can hardcode the stages of the pipeline in event_list table like this:



    id | stage_name
    1 | first stage
    2 | second stage
    3 | third stage
    4 | fourth stage


    UPD: The deeper is the stage of the funnel, the higher is its ID. This way, you can compare them, i.e. third stage is deeper than second stage because 3>2. Thus, if you need to find people that reached the 2nd stage it includes people that have events with id=2 OR events with id>2, i.e. events deeper in the funnel.



    If the second stage is missed and the third stage is recorded for some person you can still count that person as "reached second stage" by joining your event data to this table by stage_name and counting the number of records with id>=2, like



    select count(distinct user_id)
    from event_data t1
    join event_list t2
    using (stage_name)
    where t2.id>=2


    Alternatively, you can left join your event table to event_list and fill the gaps using lag function that returns the value of the previous row (i.e. assigning the timestamp of first stage to the second stage in the case above)







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Nov 15 at 0:41

























    answered Nov 14 at 23:52









    AlexYes

    2,1142715




    2,1142715












    • Say you were using this for counting phone interview, wouldn't this count twice, once each for interview1 and 2 for the above example? This is a good start, but you then need to count the distinct application_ids.
      – DancingFool
      Nov 15 at 0:03










    • @AlexYes sometimes the blanks are correct and should not be counted. Like in this example, Offer is blank and should not be counted because it is the final phase. Does your suggestion still work? Also, the blank can be in any phase, not always id 2. It's possible that there would be multiple blanks.
      – WilsonS
      Nov 15 at 0:15












    • Here is a csv of the table. pastebin.com/eqJepqF6
      – WilsonS
      Nov 15 at 0:36










    • @WilsonS I've added more detailed explanation to the post, hope it will make sense. If you want to count the Offer you will still be able to do it correctly because this is the deepest events and only true non-blanks will be counted.
      – AlexYes
      Nov 15 at 0:47










    • @DancingFool sure! I've edited the query. This is obvious, I was more concentrated on the method of filling the gaps.
      – AlexYes
      Nov 15 at 0:48


















    • Say you were using this for counting phone interview, wouldn't this count twice, once each for interview1 and 2 for the above example? This is a good start, but you then need to count the distinct application_ids.
      – DancingFool
      Nov 15 at 0:03










    • @AlexYes sometimes the blanks are correct and should not be counted. Like in this example, Offer is blank and should not be counted because it is the final phase. Does your suggestion still work? Also, the blank can be in any phase, not always id 2. It's possible that there would be multiple blanks.
      – WilsonS
      Nov 15 at 0:15












    • Here is a csv of the table. pastebin.com/eqJepqF6
      – WilsonS
      Nov 15 at 0:36










    • @WilsonS I've added more detailed explanation to the post, hope it will make sense. If you want to count the Offer you will still be able to do it correctly because this is the deepest events and only true non-blanks will be counted.
      – AlexYes
      Nov 15 at 0:47










    • @DancingFool sure! I've edited the query. This is obvious, I was more concentrated on the method of filling the gaps.
      – AlexYes
      Nov 15 at 0:48
















    Say you were using this for counting phone interview, wouldn't this count twice, once each for interview1 and 2 for the above example? This is a good start, but you then need to count the distinct application_ids.
    – DancingFool
    Nov 15 at 0:03




    Say you were using this for counting phone interview, wouldn't this count twice, once each for interview1 and 2 for the above example? This is a good start, but you then need to count the distinct application_ids.
    – DancingFool
    Nov 15 at 0:03












    @AlexYes sometimes the blanks are correct and should not be counted. Like in this example, Offer is blank and should not be counted because it is the final phase. Does your suggestion still work? Also, the blank can be in any phase, not always id 2. It's possible that there would be multiple blanks.
    – WilsonS
    Nov 15 at 0:15






    @AlexYes sometimes the blanks are correct and should not be counted. Like in this example, Offer is blank and should not be counted because it is the final phase. Does your suggestion still work? Also, the blank can be in any phase, not always id 2. It's possible that there would be multiple blanks.
    – WilsonS
    Nov 15 at 0:15














    Here is a csv of the table. pastebin.com/eqJepqF6
    – WilsonS
    Nov 15 at 0:36




    Here is a csv of the table. pastebin.com/eqJepqF6
    – WilsonS
    Nov 15 at 0:36












    @WilsonS I've added more detailed explanation to the post, hope it will make sense. If you want to count the Offer you will still be able to do it correctly because this is the deepest events and only true non-blanks will be counted.
    – AlexYes
    Nov 15 at 0:47




    @WilsonS I've added more detailed explanation to the post, hope it will make sense. If you want to count the Offer you will still be able to do it correctly because this is the deepest events and only true non-blanks will be counted.
    – AlexYes
    Nov 15 at 0:47












    @DancingFool sure! I've edited the query. This is obvious, I was more concentrated on the method of filling the gaps.
    – AlexYes
    Nov 15 at 0:48




    @DancingFool sure! I've edited the query. This is obvious, I was more concentrated on the method of filling the gaps.
    – AlexYes
    Nov 15 at 0:48












    up vote
    0
    down vote



    accepted










    Here is the SQL I ended up with. Thanks for the ideas, @AlexYes!



    select stage_name,  
    application_stages.application_id, entered_on,
    case when entered_on is NULL then lead(entered_on,1)
    ignore nulls
    over
    (PARTITION BY application_stages.application_id order by case stage_name
    when 'Application Review' then 1
    when 'Recruiter Screen' then 2
    when 'Phone Interview' then 3
    when 'Interview' then 4
    when 'Interview 2' then 5
    when 'Offer' then 6
    when 'Hired' then 7 end) else entered_on end as for_count, exited_on
    from application_stages


    I realize that the above SQL doesn't give me the counts but I am doing the counts in Tableau. Happy to have the format above in case I need to do other calculations on the new "for_count" field.






    share|improve this answer



























      up vote
      0
      down vote



      accepted










      Here is the SQL I ended up with. Thanks for the ideas, @AlexYes!



      select stage_name,  
      application_stages.application_id, entered_on,
      case when entered_on is NULL then lead(entered_on,1)
      ignore nulls
      over
      (PARTITION BY application_stages.application_id order by case stage_name
      when 'Application Review' then 1
      when 'Recruiter Screen' then 2
      when 'Phone Interview' then 3
      when 'Interview' then 4
      when 'Interview 2' then 5
      when 'Offer' then 6
      when 'Hired' then 7 end) else entered_on end as for_count, exited_on
      from application_stages


      I realize that the above SQL doesn't give me the counts but I am doing the counts in Tableau. Happy to have the format above in case I need to do other calculations on the new "for_count" field.






      share|improve this answer

























        up vote
        0
        down vote



        accepted







        up vote
        0
        down vote



        accepted






        Here is the SQL I ended up with. Thanks for the ideas, @AlexYes!



        select stage_name,  
        application_stages.application_id, entered_on,
        case when entered_on is NULL then lead(entered_on,1)
        ignore nulls
        over
        (PARTITION BY application_stages.application_id order by case stage_name
        when 'Application Review' then 1
        when 'Recruiter Screen' then 2
        when 'Phone Interview' then 3
        when 'Interview' then 4
        when 'Interview 2' then 5
        when 'Offer' then 6
        when 'Hired' then 7 end) else entered_on end as for_count, exited_on
        from application_stages


        I realize that the above SQL doesn't give me the counts but I am doing the counts in Tableau. Happy to have the format above in case I need to do other calculations on the new "for_count" field.






        share|improve this answer














        Here is the SQL I ended up with. Thanks for the ideas, @AlexYes!



        select stage_name,  
        application_stages.application_id, entered_on,
        case when entered_on is NULL then lead(entered_on,1)
        ignore nulls
        over
        (PARTITION BY application_stages.application_id order by case stage_name
        when 'Application Review' then 1
        when 'Recruiter Screen' then 2
        when 'Phone Interview' then 3
        when 'Interview' then 4
        when 'Interview 2' then 5
        when 'Offer' then 6
        when 'Hired' then 7 end) else entered_on end as for_count, exited_on
        from application_stages


        I realize that the above SQL doesn't give me the counts but I am doing the counts in Tableau. Happy to have the format above in case I need to do other calculations on the new "for_count" field.







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Nov 21 at 17:46

























        answered Nov 20 at 19:29









        WilsonS

        115




        115






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.





            Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


            Please pay close attention to the following guidance:


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53309866%2fredshift-sql-skipped-sequence%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            mysqli_query(): Empty query in /home/lucindabrummitt/public_html/blog/wp-includes/wp-db.php on line 1924

            How to change which sound is reproduced for terminal bell?

            Can I use Tabulator js library in my java Spring + Thymeleaf project?