How do I create a sum row and sum column in pandas?











up vote
8
down vote

favorite
1












I'm going through the Khan Academy course on Statistics as a bit of a refresher from my college days, and as a way to get me up to speed on pandas & other scientific Python.



I've got a table that looks like this from Khan Academy:



             | Undergraduate | Graduate | Total
-------------+---------------+----------+------
Straight A's | 240 | 60 | 300
-------------+---------------+----------+------
Not | 3,760 | 440 | 4,200
-------------+---------------+----------+------
Total | 4,000 | 500 | 4,500


I would like to recreate this table using pandas. Of course I could create a DataFrame using something like



"Graduate": {...},
"Undergraduate": {...},
"Total": {...},


But that seems like a naive approach that would both fall over quickly and just not really be extensible.



I've got the non-totals part of the table like this:



df = pd.DataFrame(
{
"Undergraduate": {"Straight A's": 240, "Not": 3_760},
"Graduate": {"Straight A's": 60, "Not": 440},
}
)
df


I've been looking and found a couple of promising things, like:



df['Total'] = df.sum(axis=1)


But I didn't find anything terribly elegant.



I did find the crosstab function that looks like it should do what I want, but it seems like in order to do that I'd have to create a dataframe consisting of 1/0 for all of these values, which seems silly because I've already got an aggregate.



I have found some approaches that seem to manually build a new totals row, but it seems like there should be a better way, something like:



totals(df, rows=True, columns=True)


or something.



Does this exist in pandas, or do I have to just cobble together my own approach?










share|improve this question


























    up vote
    8
    down vote

    favorite
    1












    I'm going through the Khan Academy course on Statistics as a bit of a refresher from my college days, and as a way to get me up to speed on pandas & other scientific Python.



    I've got a table that looks like this from Khan Academy:



                 | Undergraduate | Graduate | Total
    -------------+---------------+----------+------
    Straight A's | 240 | 60 | 300
    -------------+---------------+----------+------
    Not | 3,760 | 440 | 4,200
    -------------+---------------+----------+------
    Total | 4,000 | 500 | 4,500


    I would like to recreate this table using pandas. Of course I could create a DataFrame using something like



    "Graduate": {...},
    "Undergraduate": {...},
    "Total": {...},


    But that seems like a naive approach that would both fall over quickly and just not really be extensible.



    I've got the non-totals part of the table like this:



    df = pd.DataFrame(
    {
    "Undergraduate": {"Straight A's": 240, "Not": 3_760},
    "Graduate": {"Straight A's": 60, "Not": 440},
    }
    )
    df


    I've been looking and found a couple of promising things, like:



    df['Total'] = df.sum(axis=1)


    But I didn't find anything terribly elegant.



    I did find the crosstab function that looks like it should do what I want, but it seems like in order to do that I'd have to create a dataframe consisting of 1/0 for all of these values, which seems silly because I've already got an aggregate.



    I have found some approaches that seem to manually build a new totals row, but it seems like there should be a better way, something like:



    totals(df, rows=True, columns=True)


    or something.



    Does this exist in pandas, or do I have to just cobble together my own approach?










    share|improve this question
























      up vote
      8
      down vote

      favorite
      1









      up vote
      8
      down vote

      favorite
      1






      1





      I'm going through the Khan Academy course on Statistics as a bit of a refresher from my college days, and as a way to get me up to speed on pandas & other scientific Python.



      I've got a table that looks like this from Khan Academy:



                   | Undergraduate | Graduate | Total
      -------------+---------------+----------+------
      Straight A's | 240 | 60 | 300
      -------------+---------------+----------+------
      Not | 3,760 | 440 | 4,200
      -------------+---------------+----------+------
      Total | 4,000 | 500 | 4,500


      I would like to recreate this table using pandas. Of course I could create a DataFrame using something like



      "Graduate": {...},
      "Undergraduate": {...},
      "Total": {...},


      But that seems like a naive approach that would both fall over quickly and just not really be extensible.



      I've got the non-totals part of the table like this:



      df = pd.DataFrame(
      {
      "Undergraduate": {"Straight A's": 240, "Not": 3_760},
      "Graduate": {"Straight A's": 60, "Not": 440},
      }
      )
      df


      I've been looking and found a couple of promising things, like:



      df['Total'] = df.sum(axis=1)


      But I didn't find anything terribly elegant.



      I did find the crosstab function that looks like it should do what I want, but it seems like in order to do that I'd have to create a dataframe consisting of 1/0 for all of these values, which seems silly because I've already got an aggregate.



      I have found some approaches that seem to manually build a new totals row, but it seems like there should be a better way, something like:



      totals(df, rows=True, columns=True)


      or something.



      Does this exist in pandas, or do I have to just cobble together my own approach?










      share|improve this question













      I'm going through the Khan Academy course on Statistics as a bit of a refresher from my college days, and as a way to get me up to speed on pandas & other scientific Python.



      I've got a table that looks like this from Khan Academy:



                   | Undergraduate | Graduate | Total
      -------------+---------------+----------+------
      Straight A's | 240 | 60 | 300
      -------------+---------------+----------+------
      Not | 3,760 | 440 | 4,200
      -------------+---------------+----------+------
      Total | 4,000 | 500 | 4,500


      I would like to recreate this table using pandas. Of course I could create a DataFrame using something like



      "Graduate": {...},
      "Undergraduate": {...},
      "Total": {...},


      But that seems like a naive approach that would both fall over quickly and just not really be extensible.



      I've got the non-totals part of the table like this:



      df = pd.DataFrame(
      {
      "Undergraduate": {"Straight A's": 240, "Not": 3_760},
      "Graduate": {"Straight A's": 60, "Not": 440},
      }
      )
      df


      I've been looking and found a couple of promising things, like:



      df['Total'] = df.sum(axis=1)


      But I didn't find anything terribly elegant.



      I did find the crosstab function that looks like it should do what I want, but it seems like in order to do that I'd have to create a dataframe consisting of 1/0 for all of these values, which seems silly because I've already got an aggregate.



      I have found some approaches that seem to manually build a new totals row, but it seems like there should be a better way, something like:



      totals(df, rows=True, columns=True)


      or something.



      Does this exist in pandas, or do I have to just cobble together my own approach?







      python pandas






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 21 at 15:07









      Wayne Werner

      26.5k13109191




      26.5k13109191
























          4 Answers
          4






          active

          oldest

          votes

















          up vote
          12
          down vote













          Or in two steps, using the .sum() function as you suggested (which might be a bit more readable as well):



          import pandas as pd

          df = pd.DataFrame( {"Undergraduate": {"Straight A's": 240, "Not": 3_760},"Graduate": {"Straight A's": 60, "Not": 440},})


          df.loc['Total',:]= df.sum(axis=0)
          df.loc[:,'Total'] = df.sum(axis=1)


          Output:



                        Graduate  Undergraduate  Total
          Not 440 3760 4200
          Straight A's 60 240 300
          Total 500 4000 4500





          share|improve this answer























          • Huh... this is giving me some weird output though - 3760+440 isn't 8400, but that's what it's showing??
            – Wayne Werner
            Nov 21 at 15:20










          • That's weird, I get 4200 as it is supposed to? Maybe a typo?
            – Archie
            Nov 21 at 15:22






          • 5




            @WayneWerner that is because this is an in place operation. It seems you've run it twice
            – piRSquared
            Nov 21 at 15:23










          • Ah, I must have accidentally hit ctrl+enter in my notebook. This time I made a copy to operate on :)
            – Wayne Werner
            Nov 21 at 15:27


















          up vote
          7
          down vote














          append and assign



          The point of this answer is to provide an in line and not an in place solution.



          append



          I use append to stack a Series or DataFrame vertically. It also creates a copy so that I can continue to chain.



          assign



          I use assign to add a column. However, the DataFrame I'm working on is in the in between nether space. So I use a lambda in the assign argument which tells Pandas to apply it to the calling DataFrame.





          df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))

          Graduate Undergraduate Total
          Not 440 3760 4200
          Straight A's 60 240 300
          Total 500 4000 4500




          Fun alternative



          Uses drop with errors='ignore' to get rid of potentially pre-existing Total rows and columns.



          Also, still in line.



          def tc(d):
          return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(1))

          df.pipe(tc).T.pipe(tc).T

          Graduate Undergraduate Total
          Not 440 3760 4200
          Straight A's 60 240 300
          Total 500 4000 4500





          share|improve this answer






























            up vote
            4
            down vote













            From the original data using crosstab, if just base on your input, you just need melt before crosstab



            s=df.reset_index().melt('index')
            pd.crosstab(index=s['index'],columns=s.variable,values=s.value,aggfunc='sum',margins=True)
            Out[33]:
            variable Graduate Undergraduate All
            index
            Not 440 3760 4200
            Straight A's 60 240 300
            All 500 4000 4500




            Toy data



            df=pd.DataFrame({'c1':[1,2,2,3,4],'c2':[2,2,3,3,3],'c3':[1,2,3,4,5]}) 
            # before `agg`, I think your input is the result after `groupby`
            df
            Out[37]:
            c1 c2 c3
            0 1 2 1
            1 2 2 2
            2 2 3 3
            3 3 3 4
            4 4 3 5


            pd.crosstab(df.c1,df.c2,df.c3,aggfunc='sum',margins
            =True)
            Out[38]:
            c2 2 3 All
            c1
            1 1.0 NaN 1
            2 2.0 3.0 5
            3 NaN 4.0 4
            4 NaN 5.0 5
            All 3.0 12.0 15





            share|improve this answer






























              up vote
              0
              down vote













              The original data is:



              >>> df = pd.DataFrame(dict(Undergraduate=[240, 3760], Graduate=[60, 440]), index=["Straight A's", "Not"])
              >>> df
              Out:
              Graduate Undergraduate
              Straight A's 60 240
              Not 440 3760


              You can only use df.T to achieve recreating this table:



              >>> df_new = df.T
              >>> df_new
              Out:
              Straight A's Not
              Graduate 60 440
              Undergraduate 240 3760


              After computing the Total by row and columns:



              >>> df_new.loc['Total',:]= df_new.sum(axis=0)
              >>> df_new.loc[:,'Total'] = df_new.sum(axis=1)
              >>> df_new
              Out:
              Straight A's Not Total
              Graduate 60.0 440.0 500.0
              Undergraduate 240.0 3760.0 4000.0
              Total 300.0 4200.0 4500.0





              share|improve this answer





















                Your Answer






                StackExchange.ifUsing("editor", function () {
                StackExchange.using("externalEditor", function () {
                StackExchange.using("snippets", function () {
                StackExchange.snippets.init();
                });
                });
                }, "code-snippets");

                StackExchange.ready(function() {
                var channelOptions = {
                tags: "".split(" "),
                id: "1"
                };
                initTagRenderer("".split(" "), "".split(" "), channelOptions);

                StackExchange.using("externalEditor", function() {
                // Have to fire editor after snippets, if snippets enabled
                if (StackExchange.settings.snippets.snippetsEnabled) {
                StackExchange.using("snippets", function() {
                createEditor();
                });
                }
                else {
                createEditor();
                }
                });

                function createEditor() {
                StackExchange.prepareEditor({
                heartbeatType: 'answer',
                convertImagesToLinks: true,
                noModals: true,
                showLowRepImageUploadWarning: true,
                reputationToPostImages: 10,
                bindNavPrevention: true,
                postfix: "",
                imageUploader: {
                brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
                contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
                allowUrls: true
                },
                onDemand: true,
                discardSelector: ".discard-answer"
                ,immediatelyShowMarkdownHelp:true
                });


                }
                });














                draft saved

                draft discarded


















                StackExchange.ready(
                function () {
                StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53414960%2fhow-do-i-create-a-sum-row-and-sum-column-in-pandas%23new-answer', 'question_page');
                }
                );

                Post as a guest















                Required, but never shown

























                4 Answers
                4






                active

                oldest

                votes








                4 Answers
                4






                active

                oldest

                votes









                active

                oldest

                votes






                active

                oldest

                votes








                up vote
                12
                down vote













                Or in two steps, using the .sum() function as you suggested (which might be a bit more readable as well):



                import pandas as pd

                df = pd.DataFrame( {"Undergraduate": {"Straight A's": 240, "Not": 3_760},"Graduate": {"Straight A's": 60, "Not": 440},})


                df.loc['Total',:]= df.sum(axis=0)
                df.loc[:,'Total'] = df.sum(axis=1)


                Output:



                              Graduate  Undergraduate  Total
                Not 440 3760 4200
                Straight A's 60 240 300
                Total 500 4000 4500





                share|improve this answer























                • Huh... this is giving me some weird output though - 3760+440 isn't 8400, but that's what it's showing??
                  – Wayne Werner
                  Nov 21 at 15:20










                • That's weird, I get 4200 as it is supposed to? Maybe a typo?
                  – Archie
                  Nov 21 at 15:22






                • 5




                  @WayneWerner that is because this is an in place operation. It seems you've run it twice
                  – piRSquared
                  Nov 21 at 15:23










                • Ah, I must have accidentally hit ctrl+enter in my notebook. This time I made a copy to operate on :)
                  – Wayne Werner
                  Nov 21 at 15:27















                up vote
                12
                down vote













                Or in two steps, using the .sum() function as you suggested (which might be a bit more readable as well):



                import pandas as pd

                df = pd.DataFrame( {"Undergraduate": {"Straight A's": 240, "Not": 3_760},"Graduate": {"Straight A's": 60, "Not": 440},})


                df.loc['Total',:]= df.sum(axis=0)
                df.loc[:,'Total'] = df.sum(axis=1)


                Output:



                              Graduate  Undergraduate  Total
                Not 440 3760 4200
                Straight A's 60 240 300
                Total 500 4000 4500





                share|improve this answer























                • Huh... this is giving me some weird output though - 3760+440 isn't 8400, but that's what it's showing??
                  – Wayne Werner
                  Nov 21 at 15:20










                • That's weird, I get 4200 as it is supposed to? Maybe a typo?
                  – Archie
                  Nov 21 at 15:22






                • 5




                  @WayneWerner that is because this is an in place operation. It seems you've run it twice
                  – piRSquared
                  Nov 21 at 15:23










                • Ah, I must have accidentally hit ctrl+enter in my notebook. This time I made a copy to operate on :)
                  – Wayne Werner
                  Nov 21 at 15:27













                up vote
                12
                down vote










                up vote
                12
                down vote









                Or in two steps, using the .sum() function as you suggested (which might be a bit more readable as well):



                import pandas as pd

                df = pd.DataFrame( {"Undergraduate": {"Straight A's": 240, "Not": 3_760},"Graduate": {"Straight A's": 60, "Not": 440},})


                df.loc['Total',:]= df.sum(axis=0)
                df.loc[:,'Total'] = df.sum(axis=1)


                Output:



                              Graduate  Undergraduate  Total
                Not 440 3760 4200
                Straight A's 60 240 300
                Total 500 4000 4500





                share|improve this answer














                Or in two steps, using the .sum() function as you suggested (which might be a bit more readable as well):



                import pandas as pd

                df = pd.DataFrame( {"Undergraduate": {"Straight A's": 240, "Not": 3_760},"Graduate": {"Straight A's": 60, "Not": 440},})


                df.loc['Total',:]= df.sum(axis=0)
                df.loc[:,'Total'] = df.sum(axis=1)


                Output:



                              Graduate  Undergraduate  Total
                Not 440 3760 4200
                Straight A's 60 240 300
                Total 500 4000 4500






                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Nov 21 at 15:25

























                answered Nov 21 at 15:12









                Archie

                556722




                556722












                • Huh... this is giving me some weird output though - 3760+440 isn't 8400, but that's what it's showing??
                  – Wayne Werner
                  Nov 21 at 15:20










                • That's weird, I get 4200 as it is supposed to? Maybe a typo?
                  – Archie
                  Nov 21 at 15:22






                • 5




                  @WayneWerner that is because this is an in place operation. It seems you've run it twice
                  – piRSquared
                  Nov 21 at 15:23










                • Ah, I must have accidentally hit ctrl+enter in my notebook. This time I made a copy to operate on :)
                  – Wayne Werner
                  Nov 21 at 15:27


















                • Huh... this is giving me some weird output though - 3760+440 isn't 8400, but that's what it's showing??
                  – Wayne Werner
                  Nov 21 at 15:20










                • That's weird, I get 4200 as it is supposed to? Maybe a typo?
                  – Archie
                  Nov 21 at 15:22






                • 5




                  @WayneWerner that is because this is an in place operation. It seems you've run it twice
                  – piRSquared
                  Nov 21 at 15:23










                • Ah, I must have accidentally hit ctrl+enter in my notebook. This time I made a copy to operate on :)
                  – Wayne Werner
                  Nov 21 at 15:27
















                Huh... this is giving me some weird output though - 3760+440 isn't 8400, but that's what it's showing??
                – Wayne Werner
                Nov 21 at 15:20




                Huh... this is giving me some weird output though - 3760+440 isn't 8400, but that's what it's showing??
                – Wayne Werner
                Nov 21 at 15:20












                That's weird, I get 4200 as it is supposed to? Maybe a typo?
                – Archie
                Nov 21 at 15:22




                That's weird, I get 4200 as it is supposed to? Maybe a typo?
                – Archie
                Nov 21 at 15:22




                5




                5




                @WayneWerner that is because this is an in place operation. It seems you've run it twice
                – piRSquared
                Nov 21 at 15:23




                @WayneWerner that is because this is an in place operation. It seems you've run it twice
                – piRSquared
                Nov 21 at 15:23












                Ah, I must have accidentally hit ctrl+enter in my notebook. This time I made a copy to operate on :)
                – Wayne Werner
                Nov 21 at 15:27




                Ah, I must have accidentally hit ctrl+enter in my notebook. This time I made a copy to operate on :)
                – Wayne Werner
                Nov 21 at 15:27












                up vote
                7
                down vote














                append and assign



                The point of this answer is to provide an in line and not an in place solution.



                append



                I use append to stack a Series or DataFrame vertically. It also creates a copy so that I can continue to chain.



                assign



                I use assign to add a column. However, the DataFrame I'm working on is in the in between nether space. So I use a lambda in the assign argument which tells Pandas to apply it to the calling DataFrame.





                df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))

                Graduate Undergraduate Total
                Not 440 3760 4200
                Straight A's 60 240 300
                Total 500 4000 4500




                Fun alternative



                Uses drop with errors='ignore' to get rid of potentially pre-existing Total rows and columns.



                Also, still in line.



                def tc(d):
                return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(1))

                df.pipe(tc).T.pipe(tc).T

                Graduate Undergraduate Total
                Not 440 3760 4200
                Straight A's 60 240 300
                Total 500 4000 4500





                share|improve this answer



























                  up vote
                  7
                  down vote














                  append and assign



                  The point of this answer is to provide an in line and not an in place solution.



                  append



                  I use append to stack a Series or DataFrame vertically. It also creates a copy so that I can continue to chain.



                  assign



                  I use assign to add a column. However, the DataFrame I'm working on is in the in between nether space. So I use a lambda in the assign argument which tells Pandas to apply it to the calling DataFrame.





                  df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))

                  Graduate Undergraduate Total
                  Not 440 3760 4200
                  Straight A's 60 240 300
                  Total 500 4000 4500




                  Fun alternative



                  Uses drop with errors='ignore' to get rid of potentially pre-existing Total rows and columns.



                  Also, still in line.



                  def tc(d):
                  return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(1))

                  df.pipe(tc).T.pipe(tc).T

                  Graduate Undergraduate Total
                  Not 440 3760 4200
                  Straight A's 60 240 300
                  Total 500 4000 4500





                  share|improve this answer

























                    up vote
                    7
                    down vote










                    up vote
                    7
                    down vote










                    append and assign



                    The point of this answer is to provide an in line and not an in place solution.



                    append



                    I use append to stack a Series or DataFrame vertically. It also creates a copy so that I can continue to chain.



                    assign



                    I use assign to add a column. However, the DataFrame I'm working on is in the in between nether space. So I use a lambda in the assign argument which tells Pandas to apply it to the calling DataFrame.





                    df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))

                    Graduate Undergraduate Total
                    Not 440 3760 4200
                    Straight A's 60 240 300
                    Total 500 4000 4500




                    Fun alternative



                    Uses drop with errors='ignore' to get rid of potentially pre-existing Total rows and columns.



                    Also, still in line.



                    def tc(d):
                    return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(1))

                    df.pipe(tc).T.pipe(tc).T

                    Graduate Undergraduate Total
                    Not 440 3760 4200
                    Straight A's 60 240 300
                    Total 500 4000 4500





                    share|improve this answer















                    append and assign



                    The point of this answer is to provide an in line and not an in place solution.



                    append



                    I use append to stack a Series or DataFrame vertically. It also creates a copy so that I can continue to chain.



                    assign



                    I use assign to add a column. However, the DataFrame I'm working on is in the in between nether space. So I use a lambda in the assign argument which tells Pandas to apply it to the calling DataFrame.





                    df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))

                    Graduate Undergraduate Total
                    Not 440 3760 4200
                    Straight A's 60 240 300
                    Total 500 4000 4500




                    Fun alternative



                    Uses drop with errors='ignore' to get rid of potentially pre-existing Total rows and columns.



                    Also, still in line.



                    def tc(d):
                    return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(1))

                    df.pipe(tc).T.pipe(tc).T

                    Graduate Undergraduate Total
                    Not 440 3760 4200
                    Straight A's 60 240 300
                    Total 500 4000 4500






                    share|improve this answer














                    share|improve this answer



                    share|improve this answer








                    edited Nov 21 at 15:27

























                    answered Nov 21 at 15:09









                    piRSquared

                    150k21135277




                    150k21135277






















                        up vote
                        4
                        down vote













                        From the original data using crosstab, if just base on your input, you just need melt before crosstab



                        s=df.reset_index().melt('index')
                        pd.crosstab(index=s['index'],columns=s.variable,values=s.value,aggfunc='sum',margins=True)
                        Out[33]:
                        variable Graduate Undergraduate All
                        index
                        Not 440 3760 4200
                        Straight A's 60 240 300
                        All 500 4000 4500




                        Toy data



                        df=pd.DataFrame({'c1':[1,2,2,3,4],'c2':[2,2,3,3,3],'c3':[1,2,3,4,5]}) 
                        # before `agg`, I think your input is the result after `groupby`
                        df
                        Out[37]:
                        c1 c2 c3
                        0 1 2 1
                        1 2 2 2
                        2 2 3 3
                        3 3 3 4
                        4 4 3 5


                        pd.crosstab(df.c1,df.c2,df.c3,aggfunc='sum',margins
                        =True)
                        Out[38]:
                        c2 2 3 All
                        c1
                        1 1.0 NaN 1
                        2 2.0 3.0 5
                        3 NaN 4.0 4
                        4 NaN 5.0 5
                        All 3.0 12.0 15





                        share|improve this answer



























                          up vote
                          4
                          down vote













                          From the original data using crosstab, if just base on your input, you just need melt before crosstab



                          s=df.reset_index().melt('index')
                          pd.crosstab(index=s['index'],columns=s.variable,values=s.value,aggfunc='sum',margins=True)
                          Out[33]:
                          variable Graduate Undergraduate All
                          index
                          Not 440 3760 4200
                          Straight A's 60 240 300
                          All 500 4000 4500




                          Toy data



                          df=pd.DataFrame({'c1':[1,2,2,3,4],'c2':[2,2,3,3,3],'c3':[1,2,3,4,5]}) 
                          # before `agg`, I think your input is the result after `groupby`
                          df
                          Out[37]:
                          c1 c2 c3
                          0 1 2 1
                          1 2 2 2
                          2 2 3 3
                          3 3 3 4
                          4 4 3 5


                          pd.crosstab(df.c1,df.c2,df.c3,aggfunc='sum',margins
                          =True)
                          Out[38]:
                          c2 2 3 All
                          c1
                          1 1.0 NaN 1
                          2 2.0 3.0 5
                          3 NaN 4.0 4
                          4 NaN 5.0 5
                          All 3.0 12.0 15





                          share|improve this answer

























                            up vote
                            4
                            down vote










                            up vote
                            4
                            down vote









                            From the original data using crosstab, if just base on your input, you just need melt before crosstab



                            s=df.reset_index().melt('index')
                            pd.crosstab(index=s['index'],columns=s.variable,values=s.value,aggfunc='sum',margins=True)
                            Out[33]:
                            variable Graduate Undergraduate All
                            index
                            Not 440 3760 4200
                            Straight A's 60 240 300
                            All 500 4000 4500




                            Toy data



                            df=pd.DataFrame({'c1':[1,2,2,3,4],'c2':[2,2,3,3,3],'c3':[1,2,3,4,5]}) 
                            # before `agg`, I think your input is the result after `groupby`
                            df
                            Out[37]:
                            c1 c2 c3
                            0 1 2 1
                            1 2 2 2
                            2 2 3 3
                            3 3 3 4
                            4 4 3 5


                            pd.crosstab(df.c1,df.c2,df.c3,aggfunc='sum',margins
                            =True)
                            Out[38]:
                            c2 2 3 All
                            c1
                            1 1.0 NaN 1
                            2 2.0 3.0 5
                            3 NaN 4.0 4
                            4 NaN 5.0 5
                            All 3.0 12.0 15





                            share|improve this answer














                            From the original data using crosstab, if just base on your input, you just need melt before crosstab



                            s=df.reset_index().melt('index')
                            pd.crosstab(index=s['index'],columns=s.variable,values=s.value,aggfunc='sum',margins=True)
                            Out[33]:
                            variable Graduate Undergraduate All
                            index
                            Not 440 3760 4200
                            Straight A's 60 240 300
                            All 500 4000 4500




                            Toy data



                            df=pd.DataFrame({'c1':[1,2,2,3,4],'c2':[2,2,3,3,3],'c3':[1,2,3,4,5]}) 
                            # before `agg`, I think your input is the result after `groupby`
                            df
                            Out[37]:
                            c1 c2 c3
                            0 1 2 1
                            1 2 2 2
                            2 2 3 3
                            3 3 3 4
                            4 4 3 5


                            pd.crosstab(df.c1,df.c2,df.c3,aggfunc='sum',margins
                            =True)
                            Out[38]:
                            c2 2 3 All
                            c1
                            1 1.0 NaN 1
                            2 2.0 3.0 5
                            3 NaN 4.0 4
                            4 NaN 5.0 5
                            All 3.0 12.0 15






                            share|improve this answer














                            share|improve this answer



                            share|improve this answer








                            edited Nov 21 at 15:21

























                            answered Nov 21 at 15:16









                            W-B

                            95.2k72860




                            95.2k72860






















                                up vote
                                0
                                down vote













                                The original data is:



                                >>> df = pd.DataFrame(dict(Undergraduate=[240, 3760], Graduate=[60, 440]), index=["Straight A's", "Not"])
                                >>> df
                                Out:
                                Graduate Undergraduate
                                Straight A's 60 240
                                Not 440 3760


                                You can only use df.T to achieve recreating this table:



                                >>> df_new = df.T
                                >>> df_new
                                Out:
                                Straight A's Not
                                Graduate 60 440
                                Undergraduate 240 3760


                                After computing the Total by row and columns:



                                >>> df_new.loc['Total',:]= df_new.sum(axis=0)
                                >>> df_new.loc[:,'Total'] = df_new.sum(axis=1)
                                >>> df_new
                                Out:
                                Straight A's Not Total
                                Graduate 60.0 440.0 500.0
                                Undergraduate 240.0 3760.0 4000.0
                                Total 300.0 4200.0 4500.0





                                share|improve this answer

























                                  up vote
                                  0
                                  down vote













                                  The original data is:



                                  >>> df = pd.DataFrame(dict(Undergraduate=[240, 3760], Graduate=[60, 440]), index=["Straight A's", "Not"])
                                  >>> df
                                  Out:
                                  Graduate Undergraduate
                                  Straight A's 60 240
                                  Not 440 3760


                                  You can only use df.T to achieve recreating this table:



                                  >>> df_new = df.T
                                  >>> df_new
                                  Out:
                                  Straight A's Not
                                  Graduate 60 440
                                  Undergraduate 240 3760


                                  After computing the Total by row and columns:



                                  >>> df_new.loc['Total',:]= df_new.sum(axis=0)
                                  >>> df_new.loc[:,'Total'] = df_new.sum(axis=1)
                                  >>> df_new
                                  Out:
                                  Straight A's Not Total
                                  Graduate 60.0 440.0 500.0
                                  Undergraduate 240.0 3760.0 4000.0
                                  Total 300.0 4200.0 4500.0





                                  share|improve this answer























                                    up vote
                                    0
                                    down vote










                                    up vote
                                    0
                                    down vote









                                    The original data is:



                                    >>> df = pd.DataFrame(dict(Undergraduate=[240, 3760], Graduate=[60, 440]), index=["Straight A's", "Not"])
                                    >>> df
                                    Out:
                                    Graduate Undergraduate
                                    Straight A's 60 240
                                    Not 440 3760


                                    You can only use df.T to achieve recreating this table:



                                    >>> df_new = df.T
                                    >>> df_new
                                    Out:
                                    Straight A's Not
                                    Graduate 60 440
                                    Undergraduate 240 3760


                                    After computing the Total by row and columns:



                                    >>> df_new.loc['Total',:]= df_new.sum(axis=0)
                                    >>> df_new.loc[:,'Total'] = df_new.sum(axis=1)
                                    >>> df_new
                                    Out:
                                    Straight A's Not Total
                                    Graduate 60.0 440.0 500.0
                                    Undergraduate 240.0 3760.0 4000.0
                                    Total 300.0 4200.0 4500.0





                                    share|improve this answer












                                    The original data is:



                                    >>> df = pd.DataFrame(dict(Undergraduate=[240, 3760], Graduate=[60, 440]), index=["Straight A's", "Not"])
                                    >>> df
                                    Out:
                                    Graduate Undergraduate
                                    Straight A's 60 240
                                    Not 440 3760


                                    You can only use df.T to achieve recreating this table:



                                    >>> df_new = df.T
                                    >>> df_new
                                    Out:
                                    Straight A's Not
                                    Graduate 60 440
                                    Undergraduate 240 3760


                                    After computing the Total by row and columns:



                                    >>> df_new.loc['Total',:]= df_new.sum(axis=0)
                                    >>> df_new.loc[:,'Total'] = df_new.sum(axis=1)
                                    >>> df_new
                                    Out:
                                    Straight A's Not Total
                                    Graduate 60.0 440.0 500.0
                                    Undergraduate 240.0 3760.0 4000.0
                                    Total 300.0 4200.0 4500.0






                                    share|improve this answer












                                    share|improve this answer



                                    share|improve this answer










                                    answered 2 days ago









                                    TimeSeam

                                    1515




                                    1515






























                                        draft saved

                                        draft discarded




















































                                        Thanks for contributing an answer to Stack Overflow!


                                        • Please be sure to answer the question. Provide details and share your research!

                                        But avoid



                                        • Asking for help, clarification, or responding to other answers.

                                        • Making statements based on opinion; back them up with references or personal experience.


                                        To learn more, see our tips on writing great answers.





                                        Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                                        Please pay close attention to the following guidance:


                                        • Please be sure to answer the question. Provide details and share your research!

                                        But avoid



                                        • Asking for help, clarification, or responding to other answers.

                                        • Making statements based on opinion; back them up with references or personal experience.


                                        To learn more, see our tips on writing great answers.




                                        draft saved


                                        draft discarded














                                        StackExchange.ready(
                                        function () {
                                        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53414960%2fhow-do-i-create-a-sum-row-and-sum-column-in-pandas%23new-answer', 'question_page');
                                        }
                                        );

                                        Post as a guest















                                        Required, but never shown





















































                                        Required, but never shown














                                        Required, but never shown












                                        Required, but never shown







                                        Required, but never shown

































                                        Required, but never shown














                                        Required, but never shown












                                        Required, but never shown







                                        Required, but never shown







                                        Popular posts from this blog

                                        How to change which sound is reproduced for terminal bell?

                                        Can I use Tabulator js library in my java Spring + Thymeleaf project?

                                        Title Spacing in Bjornstrup Chapter, Removing Chapter Number From Contents