How to perform groupby + transform + nunique in pandas?
I would like to count the unique observations per group in a pandas DataFrame and store that count in a new column. Importantly, I do not want to reduce the number of rows in the DataFrame; effectively, I want something similar to a window function in SQL.



import pandas as pd

df = pd.DataFrame({
    'uID': ['James', 'Henry', 'Abe', 'James', 'Henry', 'Brian', 'Claude', 'James'],
    'mID': ['A', 'B', 'A', 'B', 'A', 'A', 'A', 'C']
})

df.groupby('mID')['uID'].nunique()


This gets the unique count per group, but it summarises the data (reduces the rows). I would effectively like to do something along the lines of:



df['ncount'] = df.groupby('mID')['uID'].transform('nunique')


(this obviously does not work)



It is possible to accomplish the desired outcome by computing the summarised unique counts and joining them back to the original dataframe, but I am wondering if there is a more minimal solution.
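For reference, the join-based approach mentioned above could look roughly like this (a sketch only, reusing the df defined above; the column name ncount is just illustrative):

# Summarise the unique counts per group, then join them back onto every row.
counts = df.groupby('mID')['uID'].nunique().rename('ncount').reset_index()
df = df.merge(counts, on='mID', how='left')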



Thanks

Tags: pandas, unique, pandas-groupby
asked Nov 12 at 23:35 by ZeroStack

            2 Answers

GroupBy.transform('nunique')

On v0.23.4, your solution works for me.

df['ncount'] = df.groupby('mID')['uID'].transform('nunique')
df

      uID mID  ncount
0   James   A       5
1   Henry   B       2
2     Abe   A       5
3   James   B       2
4   Henry   A       5
5   Brian   A       5
6  Claude   A       5
7   James   C       1

GroupBy.nunique + pd.Series.map

Additionally, with your existing solution, you could map the series back to mID. This works because df.groupby('mID')['uID'].nunique() returns a Series indexed by mID, so mapping the mID column against it broadcasts each group's count onto every row:

df['ncount'] = df.mID.map(df.groupby('mID')['uID'].nunique())
df

      uID mID  ncount
0   James   A       5
1   Henry   B       2
2     Abe   A       5
3   James   B       2
4   Henry   A       5
5   Brian   A       5
6  Claude   A       5
7   James   C       1

answered Nov 13 at 0:57 by coldspeed (accepted answer)

You are very close!

df['ncount'] = df.groupby('mID')['uID'].transform(pd.Series.nunique)

      uID mID  ncount
0   James   A       5
1   Henry   B       2
2     Abe   A       5
3   James   B       2
4   Henry   A       5
5   Brian   A       5
6  Claude   A       5
7   James   C       1

answered Nov 12 at 23:43 by Peter Leimbigler












• Thanks Peter, on my original data I get a ValueError: Length mismatch: Expected axis has 29101 elements, new values have 29457 elements, and I'm not even creating a new column, just assigning to a new variable. Your solution does answer the question; any ideas on this error? EDIT: NA values were the culprit here.
  – ZeroStack, Nov 12 at 23:47

• @ZeroStack, that might be this bug: github.com/pandas-dev/pandas/issues/17093 I would try df.fillna(0).groupby(...), and if that works, investigate further how to fill any missing values in the columns mID and/or uID.
  – Peter Leimbigler, Nov 12 at 23:53
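
A minimal sketch of the workaround suggested in the comment above, assuming the missing values are in the grouping column mID (the fill value 'missing' is purely illustrative):

# groupby excludes rows whose group key is NaN, which can leave the
# transformed result misaligned with the original frame; filling the
# key column first keeps every row assigned to a group.
df['mID'] = df['mID'].fillna('missing')
df['ncount'] = df.groupby('mID')['uID'].transform('nunique')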