How to efficiently unpivot MULTIPLE columns in Hive?












1















My data is structured like in the below table:



| Name | Foo_A | Foo_B | Foo_C | Bar_A | Bar_B | Bar_C |
--------------------------------------------------------
| abcd | 16 | 32 | 14 | 52 | 41 | 17 |
| ... | ... | ... | ... | ... | ... | ... |


I am looking to query the data in Hive in a way such that it looks like this:



| Name | Class | FooVal | BarVal |
----------------------------------
| abcd | A | 16 | 52 |
| abcd | B | 32 | 41 |
| abcd | C | 14 | 17 |
| ... | ... | ... | ... |


I am already aware of and am using a UNION ALL, but what would be a more efficient way of doing this using "LATERAL VIEW explode" a map data type?










share|improve this question





























    1















    My data is structured like in the below table:



    | Name | Foo_A | Foo_B | Foo_C | Bar_A | Bar_B | Bar_C |
    --------------------------------------------------------
    | abcd | 16 | 32 | 14 | 52 | 41 | 17 |
    | ... | ... | ... | ... | ... | ... | ... |


    I am looking to query the data in Hive in a way such that it looks like this:



    | Name | Class | FooVal | BarVal |
    ----------------------------------
    | abcd | A | 16 | 52 |
    | abcd | B | 32 | 41 |
    | abcd | C | 14 | 17 |
    | ... | ... | ... | ... |


    I am already aware of and am using a UNION ALL, but what would be a more efficient way of doing this using "LATERAL VIEW explode" a map data type?










    share|improve this question



























      1












      1








      1


      1






      My data is structured like in the below table:



      | Name | Foo_A | Foo_B | Foo_C | Bar_A | Bar_B | Bar_C |
      --------------------------------------------------------
      | abcd | 16 | 32 | 14 | 52 | 41 | 17 |
      | ... | ... | ... | ... | ... | ... | ... |


      I am looking to query the data in Hive in a way such that it looks like this:



      | Name | Class | FooVal | BarVal |
      ----------------------------------
      | abcd | A | 16 | 52 |
      | abcd | B | 32 | 41 |
      | abcd | C | 14 | 17 |
      | ... | ... | ... | ... |


      I am already aware of and am using a UNION ALL, but what would be a more efficient way of doing this using "LATERAL VIEW explode" a map data type?










      share|improve this question
















      My data is structured like in the below table:



      | Name | Foo_A | Foo_B | Foo_C | Bar_A | Bar_B | Bar_C |
      --------------------------------------------------------
      | abcd | 16 | 32 | 14 | 52 | 41 | 17 |
      | ... | ... | ... | ... | ... | ... | ... |


      I am looking to query the data in Hive in a way such that it looks like this:



      | Name | Class | FooVal | BarVal |
      ----------------------------------
      | abcd | A | 16 | 52 |
      | abcd | B | 32 | 41 |
      | abcd | C | 14 | 17 |
      | ... | ... | ... | ... |


      I am already aware of and am using a UNION ALL, but what would be a more efficient way of doing this using "LATERAL VIEW explode" a map data type?







      hive query-optimization hiveql unpivot






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 20 '18 at 17:50









      leftjoin

      8,92522151




      8,92522151










      asked Nov 20 '18 at 16:27









      lizliz

      112




      112
























          2 Answers
          2






          active

          oldest

          votes


















          0














          CROSS JOIN with class stack (see code example) will multiply main table rows x3, one row per class, then use case statements to derive your columns depending on class value. CROSS JOIN with small dataset (3 rows) should be transformed to map join and will execute very fast on mappers.



          set hive.auto.convert.join=true; --this enables map-join

          select t.Name,
          s.class,
          case s.class when 'A' then t.Foo_A
          when 'B' then t.foo_B
          when 'C' then t.foo_C
          end as FooVal,
          case s.class when 'A' then t.Bar_A
          when 'B' then t.Bar_B
          when 'C' then t.Bar_C
          end as BarVal
          from table t
          cross join (select stack(3,'A','B','C') as class) s
          ;


          It will scan the table only once and perform much better than UNION ALL approach.






          share|improve this answer

































            0














            Thanks for reply! Please find below another way of doing it which is faster than CROSS JOIN.



                select t1.ID, t2.key_1 as class, t2.FooVal, t3.BarVal
            from table t1
            LATERAL VIEW explode (map(
            'A', Foo_A,
            'B', Foo_B,
            'C', Foo_C
            )) t2 as key_1, FooVal
            LATERAL VIEW explode (map(
            'A', Bar_A,
            'B', Bar_B,
            'C', Bar_C
            )) t3 as key_2, BarVal
            where t2.key_1 = t3.key_2;





            share|improve this answer
























            • I'm sorry, but this solution is not faster. Because cross join normally will execute as map-join. And your query does the same twice, multiplying rows x9, after that you filtering rows.

              – leftjoin
              Feb 5 at 13:34











            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53397354%2fhow-to-efficiently-unpivot-multiple-columns-in-hive%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            0














            CROSS JOIN with class stack (see code example) will multiply main table rows x3, one row per class, then use case statements to derive your columns depending on class value. CROSS JOIN with small dataset (3 rows) should be transformed to map join and will execute very fast on mappers.



            set hive.auto.convert.join=true; --this enables map-join

            select t.Name,
            s.class,
            case s.class when 'A' then t.Foo_A
            when 'B' then t.foo_B
            when 'C' then t.foo_C
            end as FooVal,
            case s.class when 'A' then t.Bar_A
            when 'B' then t.Bar_B
            when 'C' then t.Bar_C
            end as BarVal
            from table t
            cross join (select stack(3,'A','B','C') as class) s
            ;


            It will scan the table only once and perform much better than UNION ALL approach.






            share|improve this answer






























              0














              CROSS JOIN with class stack (see code example) will multiply main table rows x3, one row per class, then use case statements to derive your columns depending on class value. CROSS JOIN with small dataset (3 rows) should be transformed to map join and will execute very fast on mappers.



              set hive.auto.convert.join=true; --this enables map-join

              select t.Name,
              s.class,
              case s.class when 'A' then t.Foo_A
              when 'B' then t.foo_B
              when 'C' then t.foo_C
              end as FooVal,
              case s.class when 'A' then t.Bar_A
              when 'B' then t.Bar_B
              when 'C' then t.Bar_C
              end as BarVal
              from table t
              cross join (select stack(3,'A','B','C') as class) s
              ;


              It will scan the table only once and perform much better than UNION ALL approach.






              share|improve this answer




























                0












                0








                0







                CROSS JOIN with class stack (see code example) will multiply main table rows x3, one row per class, then use case statements to derive your columns depending on class value. CROSS JOIN with small dataset (3 rows) should be transformed to map join and will execute very fast on mappers.



                set hive.auto.convert.join=true; --this enables map-join

                select t.Name,
                s.class,
                case s.class when 'A' then t.Foo_A
                when 'B' then t.foo_B
                when 'C' then t.foo_C
                end as FooVal,
                case s.class when 'A' then t.Bar_A
                when 'B' then t.Bar_B
                when 'C' then t.Bar_C
                end as BarVal
                from table t
                cross join (select stack(3,'A','B','C') as class) s
                ;


                It will scan the table only once and perform much better than UNION ALL approach.






                share|improve this answer















                CROSS JOIN with class stack (see code example) will multiply main table rows x3, one row per class, then use case statements to derive your columns depending on class value. CROSS JOIN with small dataset (3 rows) should be transformed to map join and will execute very fast on mappers.



                set hive.auto.convert.join=true; --this enables map-join

                select t.Name,
                s.class,
                case s.class when 'A' then t.Foo_A
                when 'B' then t.foo_B
                when 'C' then t.foo_C
                end as FooVal,
                case s.class when 'A' then t.Bar_A
                when 'B' then t.Bar_B
                when 'C' then t.Bar_C
                end as BarVal
                from table t
                cross join (select stack(3,'A','B','C') as class) s
                ;


                It will scan the table only once and perform much better than UNION ALL approach.







                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Nov 20 '18 at 18:14

























                answered Nov 20 '18 at 17:36









                leftjoinleftjoin

                8,92522151




                8,92522151

























                    0














                    Thanks for reply! Please find below another way of doing it which is faster than CROSS JOIN.



                        select t1.ID, t2.key_1 as class, t2.FooVal, t3.BarVal
                    from table t1
                    LATERAL VIEW explode (map(
                    'A', Foo_A,
                    'B', Foo_B,
                    'C', Foo_C
                    )) t2 as key_1, FooVal
                    LATERAL VIEW explode (map(
                    'A', Bar_A,
                    'B', Bar_B,
                    'C', Bar_C
                    )) t3 as key_2, BarVal
                    where t2.key_1 = t3.key_2;





                    share|improve this answer
























                    • I'm sorry, but this solution is not faster. Because cross join normally will execute as map-join. And your query does the same twice, multiplying rows x9, after that you filtering rows.

                      – leftjoin
                      Feb 5 at 13:34
















                    0














                    Thanks for reply! Please find below another way of doing it which is faster than CROSS JOIN.



                        select t1.ID, t2.key_1 as class, t2.FooVal, t3.BarVal
                    from table t1
                    LATERAL VIEW explode (map(
                    'A', Foo_A,
                    'B', Foo_B,
                    'C', Foo_C
                    )) t2 as key_1, FooVal
                    LATERAL VIEW explode (map(
                    'A', Bar_A,
                    'B', Bar_B,
                    'C', Bar_C
                    )) t3 as key_2, BarVal
                    where t2.key_1 = t3.key_2;





                    share|improve this answer
























                    • I'm sorry, but this solution is not faster. Because cross join normally will execute as map-join. And your query does the same twice, multiplying rows x9, after that you filtering rows.

                      – leftjoin
                      Feb 5 at 13:34














                    0












                    0








                    0







                    Thanks for reply! Please find below another way of doing it which is faster than CROSS JOIN.



                        select t1.ID, t2.key_1 as class, t2.FooVal, t3.BarVal
                    from table t1
                    LATERAL VIEW explode (map(
                    'A', Foo_A,
                    'B', Foo_B,
                    'C', Foo_C
                    )) t2 as key_1, FooVal
                    LATERAL VIEW explode (map(
                    'A', Bar_A,
                    'B', Bar_B,
                    'C', Bar_C
                    )) t3 as key_2, BarVal
                    where t2.key_1 = t3.key_2;





                    share|improve this answer













                    Thanks for reply! Please find below another way of doing it which is faster than CROSS JOIN.



                        select t1.ID, t2.key_1 as class, t2.FooVal, t3.BarVal
                    from table t1
                    LATERAL VIEW explode (map(
                    'A', Foo_A,
                    'B', Foo_B,
                    'C', Foo_C
                    )) t2 as key_1, FooVal
                    LATERAL VIEW explode (map(
                    'A', Bar_A,
                    'B', Bar_B,
                    'C', Bar_C
                    )) t3 as key_2, BarVal
                    where t2.key_1 = t3.key_2;






                    share|improve this answer












                    share|improve this answer



                    share|improve this answer










                    answered Dec 20 '18 at 16:15









                    lizliz

                    112




                    112













                    • I'm sorry, but this solution is not faster. Because cross join normally will execute as map-join. And your query does the same twice, multiplying rows x9, after that you filtering rows.

                      – leftjoin
                      Feb 5 at 13:34



















                    • I'm sorry, but this solution is not faster. Because cross join normally will execute as map-join. And your query does the same twice, multiplying rows x9, after that you filtering rows.

                      – leftjoin
                      Feb 5 at 13:34

















                    I'm sorry, but this solution is not faster. Because cross join normally will execute as map-join. And your query does the same twice, multiplying rows x9, after that you filtering rows.

                    – leftjoin
                    Feb 5 at 13:34





                    I'm sorry, but this solution is not faster. Because cross join normally will execute as map-join. And your query does the same twice, multiplying rows x9, after that you filtering rows.

                    – leftjoin
                    Feb 5 at 13:34


















                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53397354%2fhow-to-efficiently-unpivot-multiple-columns-in-hive%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Biblatex bibliography style without URLs when DOI exists (in Overleaf with Zotero bibliography)

                    ComboBox Display Member on multiple fields

                    Is it possible to collect Nectar points via Trainline?