Condition on a dataframe to create a new dataframe



I have a dataframe as below.

    id type  value     Date name
0  111    a    100  2018/11   x1
1  112    b    200  2018/12   x2
2  113    a    300  2018/08   x3
3  113    a    200  2018/08   x4
4  114    a    300  2017/12   x4
5  114    a    500  2018/12   x5
6  114    b    500  2018/12   x5
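To make the example reproducible, here is one way to rebuild the sample data as a DataFrame. This is a sketch: it assumes the column names exactly as printed above and keeps Date as a plain string rather than converting it to a datetime.

```python
import pandas as pd

# Sample data from the question's table; Date is kept as a string.
df = pd.DataFrame({
    'id':    [111, 112, 113, 113, 114, 114, 114],
    'type':  ['a', 'b', 'a', 'a', 'a', 'a', 'b'],
    'value': [100, 200, 300, 200, 300, 500, 500],
    'Date':  ['2018/11', '2018/12', '2018/08', '2018/08', '2017/12', '2018/12', '2018/12'],
    'name':  ['x1', 'x2', 'x3', 'x4', 'x4', 'x5', 'x5'],
})
print(df)
```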

I want to create a new dataframe based on 4 conditions:

if id is unique and type != b, keep the row and set a new case column to case1

if id is unique and type = b, keep the row only if name is unique, and set case to case2

if id is not unique and type != b, aggregate rows with the same Date by summing value, and set case to case3

if id is not unique and type = b, aggregate rows with the same Date by summing value, ignoring rows with type b, and set case to case4
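The "id is unique" test can also be expressed without a helper column. A minimal sketch, assuming the column names id and type from the table above, using Series.duplicated(keep=False), which flags every occurrence of a repeated id (so its negation is True only for ids that appear once):

```python
import pandas as pd

df = pd.DataFrame({
    'id':   [111, 112, 113, 113, 114, 114, 114],
    'type': ['a', 'b', 'a', 'a', 'a', 'a', 'b'],
})

# keep=False marks *every* occurrence of a repeated id as a duplicate,
# so the negation is True only for ids that occur exactly once.
id_unique = ~df['id'].duplicated(keep=False)
is_b = df['type'].eq('b')

# The four (id unique?, type == 'b'?) combinations from the conditions.
case1 = id_unique & ~is_b
case2 = id_unique & is_b
case3 = ~id_unique & ~is_b
case4 = ~id_unique & is_b

print(id_unique.tolist())
```

Each row falls into exactly one of the four masks, which is what makes a per-row case label well defined.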

The new dataframe should look like this:

    id type  value     Date   case
0  111    a    100  2018/11  case1
1  112    b    200  2018/12  case2
2  113    a    500  2018/08  case3
3  114    a    300  2017/12  case4
4  114    b    500  2018/12  case4

As a first step, I tried to create the case column:

for i in df.id.unique():
    if 'b' in df.Type:
        df['Case'] = 'case 1'
    else:
        df['Case'] = 'case 2'
else:
    if 'b' in df.Type:
        df['Case'] = 'case 3'
    else:
        df['Case'] = 'case 4'
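One reason an attempt like this does not work: in pandas, 'b' in some_series tests the Series index, not its values. Two further issues: the column is named type (lowercase), so df.Type raises AttributeError, and df['Case'] = ... assigns the same label to every row rather than to selected rows. A small sketch of the membership behaviour, on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical three-row frame; the column is lowercase 'type', as in the question's table.
df = pd.DataFrame({'type': ['a', 'b', 'a']})

# `in` on a Series looks at the index labels (0, 1, 2), not the values,
# so this is False even though 'b' occurs in the column.
print('b' in df['type'])          # False
print(1 in df['type'])            # True, because 1 is an index label

# To test the values instead, compare element-wise and reduce, or search .values.
print(df['type'].eq('b').any())   # True
print('b' in df['type'].values)   # True
```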

I'm new to pandas, so any advice would be appreciated.

edited Nov 16 at 2:55 by Alex

asked Nov 15 at 18:27 by John Doe

Tags: python, pandas, dataframe, condition, data-manipulation

1 Answer (accepted)

You can use this:

import numpy as np
import pandas as pd

# count how many rows share each id (broadcast back to every row)
df['id_count'] = df.groupby('id')['id'].transform('size')

# conditions for np.select: (id unique?) x (type == 'b'?)
conditions = [
    (df['id_count'].eq(1) & df['type'].ne('b')),
    (df['id_count'].eq(1) & df['type'].eq('b')),
    (df['id_count'].ne(1) & df['type'].ne('b')),
    (df['id_count'].ne(1) & df['type'].eq('b'))]

# choices for np.select, one per condition
choices = ['case1', 'case2', 'case3', 'case4']

# add case column
df['case'] = np.select(conditions, choices, default=None)

# columns that define the groups to aggregate
grouping = ['id', 'type', 'Date', 'case']

# replace value with the sum of value within each group
df['value'] = df.groupby(grouping)['value'].transform('sum')

# keep one row per group
df = df.drop_duplicates(subset=grouping, keep='first')

# remove helper columns
df = df.drop(['name', 'id_count'], axis='columns')

Step by step

First, group the dataframe by the id column:

gb = df.groupby('id')

Then use it to count how many times each id occurs:

df['id_count'] = gb['id'].transform('size')

df now looks like this:

    id type  value     Date name  id_count
0  111    a    100  2018/11   x1         1
1  112    b    200  2018/12   x2         1
2  113    a    300  2018/08   x3         2
3  113    a    200  2018/08   x4         2
4  114    a    300  2017/12   x4         3
5  114    a    500  2018/12   x5         3
6  114    b    500  2018/12   x5         3
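transform('size') is what lets the count sit alongside the original rows. A short sketch contrasting it with the aggregating size(), plus an equivalent value_counts()/map formulation (shown as an alternative, not what the answer uses):

```python
import pandas as pd

# Only the id column from the question's sample data is needed here.
df = pd.DataFrame({'id': [111, 112, 113, 113, 114, 114, 114]})

# size() aggregates: one count per distinct id (111, 112, 113, 114).
per_id = df.groupby('id').size()
print(per_id.tolist())          # [1, 1, 2, 3]

# transform('size') broadcasts each group's size back to every row,
# so the result is aligned with df's index and can be stored as a column.
df['id_count'] = df.groupby('id')['id'].transform('size')
print(df['id_count'].tolist())  # [1, 1, 2, 2, 3, 3, 3]

# Equivalent alternative: look up each id's frequency with value_counts().
alt = df['id'].map(df['id'].value_counts())
print(alt.tolist())             # [1, 1, 2, 2, 3, 3, 3]
```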

Now use np.select to turn your four conditions into a case column:

conditions = [
    (df['id_count'].eq(1) & df['type'].ne('b')),
    (df['id_count'].eq(1) & df['type'].eq('b')),
    (df['id_count'].ne(1) & df['type'].ne('b')),
    (df['id_count'].ne(1) & df['type'].eq('b'))]

choices = ['case1', 'case2', 'case3', 'case4']

df['case'] = np.select(conditions, choices, default=None)

Resulting in:

    id type  value     Date name  id_count   case
0  111    a    100  2018/11   x1         1  case1
1  112    b    200  2018/12   x2         1  case2
2  113    a    300  2018/08   x3         2  case3
3  113    a    200  2018/08   x4         2  case3
4  114    a    300  2017/12   x4         3  case3
5  114    a    500  2018/12   x5         3  case3
6  114    b    500  2018/12   x5         3  case4
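A note on np.select: it checks the conditions in order and, for each element, returns the choice of the first condition that is True; elements that match none get default. In the answer the four conditions are mutually exclusive and together cover every row, so order does not matter and default=None is never used. A toy sketch with overlapping conditions (the values and labels are hypothetical) shows the first-match rule:

```python
import numpy as np

x = np.array([1, 5, 10, 20])

# Overlapping conditions: 1 satisfies both, so the FIRST one wins.
conditions = [x < 5, x < 15]
choices = ['small', 'medium']

# Elements matching no condition fall back to `default`.
labels = np.select(conditions, choices, default='large')
print(labels.tolist())   # ['small', 'medium', 'medium', 'large']
```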

Next, group by the columns in grouping and replace the value column with the sum of value within each group:

grouping = ['id', 'type', 'Date', 'case']

df['value'] = df.groupby(grouping)['value'].transform('sum')

Resulting in:

    id type  value     Date name  id_count   case
0  111    a    100  2018/11   x1         1  case1
1  112    b    200  2018/12   x2         1  case2
2  113    a    500  2018/08   x3         2  case3
3  113    a    500  2018/08   x4         2  case3
4  114    a    300  2017/12   x4         3  case3
5  114    a    500  2018/12   x5         3  case3
6  114    b    500  2018/12   x5         3  case4

Finally, drop duplicate rows, using the same grouping list:

df = df.drop_duplicates(subset=grouping, keep='first')

Giving:

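    id type  value     Date name  id_count   case
0  111    a    100  2018/11   x1         1  case1
1  112    b    200  2018/12   x2         1  case2
2  113    a    500  2018/08   x3         2  case3
4  114    a    300  2017/12   x4         3  case3
5  114    a    500  2018/12   x5         3  case3
6  114    b    500  2018/12   x5         3  case4

Row 5 is kept: its Date (2018/12) differs from row 4's (2017/12), so it forms its own group.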

You can remove the extra columns using drop:

df = df.drop(['name', 'id_count'], axis='columns')
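The transform('sum') plus drop_duplicates pair can also be collapsed into a single aggregation. A sketch on the same sample data, using groupby(..., as_index=False) with sum(); unlike the result above, the index is renumbered 0..n-1 and name is not carried along (it is dropped at the end anyway):

```python
import numpy as np
import pandas as pd

# Question's sample data (Date kept as a string).
df = pd.DataFrame({
    'id':    [111, 112, 113, 113, 114, 114, 114],
    'type':  ['a', 'b', 'a', 'a', 'a', 'a', 'b'],
    'value': [100, 200, 300, 200, 300, 500, 500],
    'Date':  ['2018/11', '2018/12', '2018/08', '2018/08', '2017/12', '2018/12', '2018/12'],
    'name':  ['x1', 'x2', 'x3', 'x4', 'x4', 'x5', 'x5'],
})

# Same case labelling as in the answer.
df['id_count'] = df.groupby('id')['id'].transform('size')
unique_id = df['id_count'].eq(1)
is_b = df['type'].eq('b')
df['case'] = np.select(
    [unique_id & ~is_b, unique_id & is_b, ~unique_id & ~is_b, ~unique_id & is_b],
    ['case1', 'case2', 'case3', 'case4'],
    default=None,
)

# One aggregation instead of transform('sum') + drop_duplicates:
# one output row per (id, type, Date, case) group, with value summed.
grouping = ['id', 'type', 'Date', 'case']
result = df.groupby(grouping, as_index=False, sort=False)['value'].sum()
print(result)
```

sort=False keeps the groups in order of first appearance, which here matches the row order of the drop_duplicates version.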

answered Nov 16 at 1:52 by Alex

Many thanks, I learned a lot, that's very good.
– John Doe
Nov 16 at 22:34
