Condition on a dataframe to create a new dataframe - Python












I have a dataframe as below.



    id type  value     Date name
0  111    a    100  2018/11   x1
1  112    b    200  2018/12   x2
2  113    a    300  2018/08   x3
3  113    a    200  2018/08   x4
4  114    a    300  2017/12   x4
5  114    a    500  2018/12   x5
6  114    b    500  2018/12   x5
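
For anyone who wants to reproduce this, the table above could be built roughly as follows. This is only a sketch: it assumes `Date` is stored as a plain string and that the column names are exactly as printed.

```python
import pandas as pd

# Sample data from the table above; Date is kept as a plain string (an assumption).
df = pd.DataFrame({
    "id":    [111, 112, 113, 113, 114, 114, 114],
    "type":  ["a", "b", "a", "a", "a", "a", "b"],
    "value": [100, 200, 300, 200, 300, 500, 500],
    "Date":  ["2018/11", "2018/12", "2018/08", "2018/08",
              "2017/12", "2018/12", "2018/12"],
    "name":  ["x1", "x2", "x3", "x4", "x4", "x5", "x5"],
})
print(df)
```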


I want to create a new dataframe based on 4 conditions.




  1. If the id is unique and type != b, take the row and add a column with the value case1.

  2. If the id is unique and type = b, take the row if name is unique and add a column with the value case2.

  3. If the id is not unique and type != b,
    aggregate the rows with the same date, summing value, and add a column with the value case3.

  4. If the id is not unique and type = b,
    aggregate the rows with the same date, summing value while ignoring rows with type b, and add a column with the value case4.


The new dataframe should look as follows:



    id type  value     Date   case
0  111    a    100  2018/11  case1
1  112    b    200  2018/12  case2
2  113    a    500  2018/08  case3
3  114    a    300  2017/12  case4
4  114    b    500  2018/12  case4


I have tried to create the column 'case' as my first step :




for i in df.id.unique():
    if 'b' in df.Type:
        df['Case'] = 'case 1'
    else:
        df['Case'] = 'case 2'
else:
    if 'b' in df.Type:
        df['Case'] = 'case 3'
    else:
        df['Case'] = 'case 4'



I'm new to pandas manipulation, so any advice will be appreciated.
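
Two details of the attempt above are worth checking in isolation. The sketch below uses a cut-down, hypothetical frame with the lowercase column name `type` from the sample table: `'b' in series` tests the index labels rather than the values, and assigning a scalar to `df['Case']` writes that value to every row at once.

```python
import pandas as pd

# Cut-down, hypothetical frame with the same column names as the sample data.
df = pd.DataFrame({"id": [111, 112, 114], "type": ["a", "b", "b"]})

# `in` on a Series looks at the index labels (0, 1, 2), not at the values.
in_index = "b" in df["type"]

# Testing the values needs an element-wise comparison plus a reduction.
in_values = df["type"].eq("b").any()

# Assigning a scalar to a column writes that value to every row.
df["Case"] = "case 1"

print(in_index, in_values, df["Case"].unique())
```

This is why a row-by-row label needs either boolean masks or a per-row function rather than a single `if` on the whole column.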










      python pandas dataframe condition data-manipulation






      asked Nov 15 at 18:27 by John Doe
      edited Nov 16 at 2:55 by Alex
























          1 Answer
          accepted










          You can use this:



          import numpy as np  # needed for np.select; df is the dataframe from the question

          # groupby and add group sizes
          df['id_count'] = df.groupby('id')['id'].transform('size')

          # conditions for np.select
          conditions = [
              (df['id_count'].eq(1) & df['type'].ne('b')),
              (df['id_count'].eq(1) & df['type'].eq('b')),
              (df['id_count'].ne(1) & df['type'].ne('b')),
              (df['id_count'].ne(1) & df['type'].eq('b')),
          ]
          # choices for np.select
          choices = ['case1', 'case2', 'case3', 'case4']
          # add case column
          df['case'] = np.select(conditions, choices, default=None)

          # next grouping
          grouping = ['id', 'type', 'Date', 'case']
          # replace value column with the per-group sum
          df['value'] = df.groupby(grouping)['value'].transform('sum')

          # drop duplicate rows
          df = df.drop_duplicates(subset=grouping, keep='first')
          # remove extra columns
          df = df.drop(['name', 'id_count'], axis='columns')




          Step by step



          First of all you can create a groupby of the id column, like so:



          gb = df.groupby('id')


          Then you can use this to count how many times an id occurs:



          df['id_count'] = gb['id'].transform('size')


          df now looks like this:



              id type  value     Date name  id_count
          0  111    a    100  2018/11   x1         1
          1  112    b    200  2018/12   x2         1
          2  113    a    300  2018/08   x3         2
          3  113    a    200  2018/08   x4         2
          4  114    a    300  2017/12   x4         3
          5  114    a    500  2018/12   x5         3
          6  114    b    500  2018/12   x5         3
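
`transform('size')` broadcasts each group's row count back to every row of that group, so the result lines up with the original index. A small sketch of the idea on a hypothetical `id` column, shown next to `duplicated(keep=False)`, which is another way to get the same unique / not-unique flag (not something the answer uses):

```python
import pandas as pd

# Hypothetical ids mirroring the sample data.
df = pd.DataFrame({"id": [111, 112, 113, 113, 114, 114, 114]})

# One count per row, aligned with the original index.
id_count = df.groupby("id")["id"].transform("size")

# Alternative flag: True for every row whose id occurs more than once.
is_repeated = df["id"].duplicated(keep=False)

print(id_count.tolist())     # [1, 1, 2, 2, 3, 3, 3]
print(is_repeated.tolist())  # [False, False, True, True, True, True, True]
```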


          Now you can use np.select to make your conditions:



          conditions = [
              (df['id_count'].eq(1) & df['type'].ne('b')),
              (df['id_count'].eq(1) & df['type'].eq('b')),
              (df['id_count'].ne(1) & df['type'].ne('b')),
              (df['id_count'].ne(1) & df['type'].eq('b')),
          ]
          choices = ['case1', 'case2', 'case3', 'case4']
          df['case'] = np.select(conditions, choices, default=None)


          Resulting in:



              id type  value     Date name  id_count   case
          0  111    a    100  2018/11   x1         1  case1
          1  112    b    200  2018/12   x2         1  case2
          2  113    a    300  2018/08   x3         2  case3
          3  113    a    200  2018/08   x4         2  case3
          4  114    a    300  2017/12   x4         3  case3
          5  114    a    500  2018/12   x5         3  case3
          6  114    b    500  2018/12   x5         3  case4
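
`np.select` walks through the conditions in order and, for each row, takes the choice belonging to the first condition that is True; rows that match none of them get `default`. A minimal sketch with hypothetical data (the string default `'unmatched'` is my own choice for the illustration, not part of the answer):

```python
import numpy as np
import pandas as pd

# Hypothetical values, just to show the first-match rule.
s = pd.Series([1, 5, 10, 50])

# The value 1 satisfies both conditions; the first one wins.
conditions = [s.lt(5), s.lt(20)]
choices = ["small", "medium"]

labels = np.select(conditions, choices, default="unmatched")
print(labels.tolist())  # ['small', 'medium', 'medium', 'unmatched']
```

In the answer the four conditions are mutually exclusive and cover every row, so the order and the default do not affect the result there.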


          Create another groupby using grouping (a list of columns); then sum the value column in these groups, replacing the value column.



          grouping = ['id', 'type', 'Date', 'case']
          df['value'] = df.groupby(grouping)['value'].transform('sum')


          Resulting in:



              id type  value     Date name  id_count   case
          0  111    a    100  2018/11   x1         1  case1
          1  112    b    200  2018/12   x2         1  case2
          2  113    a    500  2018/08   x3         2  case3
          3  113    a    500  2018/08   x4         2  case3
          4  114    a    300  2017/12   x4         3  case3
          5  114    a    500  2018/12   x5         3  case3
          6  114    b    500  2018/12   x5         3  case4
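
`transform('sum')` keeps every row and repeats the group total on each of them, which is why duplicates still have to be dropped afterwards. As a sketch of an alternative (not what the answer does), an aggregating `groupby(..., as_index=False)` collapses each group to one row directly. The frame below is a hypothetical cut-down of the sample data:

```python
import pandas as pd

# Hypothetical cut-down of the sample data after the case column was added.
df = pd.DataFrame({
    "id":    [113, 113, 114],
    "type":  ["a", "a", "a"],
    "Date":  ["2018/08", "2018/08", "2017/12"],
    "case":  ["case3", "case3", "case3"],
    "value": [300, 200, 300],
})
grouping = ["id", "type", "Date", "case"]

# transform: same number of rows as df, each holding its group's total.
per_row = df.groupby(grouping)["value"].transform("sum")

# aggregation: one row per group, so no drop_duplicates step is needed.
collapsed = df.groupby(grouping, as_index=False, sort=False)["value"].sum()

print(per_row.tolist())             # [500, 500, 300]
print(collapsed["value"].tolist())  # [500, 300]
```

The aggregating form drops columns that are not in `grouping` (such as `name`), which happens to be what the final result needs anyway.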


          Finally, drop duplicates using the grouping list from before:



          df = df.drop_duplicates(subset=grouping, keep='first')


          Giving:



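              id type  value     Date name  id_count   case
          0  111    a    100  2018/11   x1         1  case1
          1  112    b    200  2018/12   x2         1  case2
          2  113    a    500  2018/08   x3         2  case3
          4  114    a    300  2017/12   x4         3  case3
          5  114    a    500  2018/12   x5         3  case3
          6  114    b    500  2018/12   x5         3  case4

          Rows 4 and 5 have different Date values, so both remain; only row 3, which matches row 2 on every grouping column, is dropped.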
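
`drop_duplicates` compares only the columns listed in `subset`; other columns (here `name`) are ignored when deciding what counts as a duplicate, and `keep='first'` retains the earliest row of each duplicate set. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows: the first two agree on id and Date but differ in name.
df = pd.DataFrame({
    "id":   [113, 113, 114, 114],
    "Date": ["2018/08", "2018/08", "2017/12", "2018/12"],
    "name": ["x3", "x4", "x4", "x5"],
})

deduped = df.drop_duplicates(subset=["id", "Date"], keep="first")

print(deduped.index.tolist())    # [0, 2, 3]
print(deduped["name"].tolist())  # ['x3', 'x4', 'x5']
```

The original index labels are kept, so the result can still be traced back to the source rows; `reset_index(drop=True)` would renumber them if needed.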


          You can remove the extra columns using drop:



          df = df.drop(['name', 'id_count'], axis='columns')
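
The expected output in the question labels both remaining rows of id 114 as case4 and keeps a single 2018/12 row for that id, which suggests the case may be meant per id rather than per row. One possible reading of the rules is sketched below. The assumptions are mine, not the asker's or the answer's: the case is decided per id (an id "has type b" if any of its rows does), rows are grouped by `id` and `Date` only, type-b rows of a repeated id add nothing to the sum, and a group containing a b row is reported as type b. Rule 2's "name unique" check is not covered.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id":    [111, 112, 113, 113, 114, 114, 114],
    "type":  ["a", "b", "a", "a", "a", "a", "b"],
    "value": [100, 200, 300, 200, 300, 500, 500],
    "Date":  ["2018/11", "2018/12", "2018/08", "2018/08",
              "2017/12", "2018/12", "2018/12"],
    "name":  ["x1", "x2", "x3", "x4", "x4", "x5", "x5"],
})

# Assumption: the case is decided per id, not per row.
repeated = df["id"].duplicated(keep=False)
has_b = df["type"].eq("b").groupby(df["id"]).transform("any")
df["case"] = np.where(
    repeated,
    np.where(has_b, "case4", "case3"),
    np.where(has_b, "case2", "case1"),
)

# Assumption: for repeated ids, type-b rows contribute nothing to the sum.
df["summed"] = df["value"].where(~(repeated & df["type"].eq("b")), 0)

result = (
    df.groupby(["id", "Date"], sort=False, as_index=False)
      .agg(
          # Assumption: a group that contains a b row is reported as type b.
          type=("type", lambda s: "b" if s.eq("b").any() else s.iloc[0]),
          value=("summed", "sum"),
          case=("case", "first"),
      )
)[["id", "type", "value", "Date", "case"]]
print(result)
```

On the sample data this yields the five rows listed in the question; whether it generalises depends on what the rules are meant to do for inputs the sample does not cover.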





          • Many thanks, I learned a lot, that's very good.
            – John Doe
            Nov 16 at 22:34











          answered Nov 16 at 1:52 by Alex











