How to perform groupby + transform + nunique in pandas?
I would like to count the unique observations per group in a pandas DataFrame and store that count in a new column. Importantly, I do not want to reduce the number of rows in the DataFrame; effectively, I want something similar to a window function in SQL.



import pandas as pd

df = pd.DataFrame({
    'uID': ['James', 'Henry', 'Abe', 'James', 'Henry', 'Brian', 'Claude', 'James'],
    'mID': ['A', 'B', 'A', 'B', 'A', 'A', 'A', 'C']
})

df.groupby('mID')['uID'].nunique()


This gets the unique count per group, but it summarises the data (reduces the rows). I would effectively like to do something along the lines of:



df['ncount'] = df.groupby('mID')['uID'].transform('nunique')


(this obviously does not work)



It is possible to accomplish the desired outcome by computing the summarised unique counts and joining them back to the original dataframe, but I am wondering if there is a more minimal solution.
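For reference, the join-based approach mentioned above could look roughly like this (a sketch only, reusing the df defined above; the column name ncount is just illustrative):

# Summarise the unique counts per group, then join them back onto every row.
counts = df.groupby('mID')['uID'].nunique().rename('ncount').reset_index()
df = df.merge(counts, on='mID', how='left')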



Thanks

Tags: pandas, unique, pandas-groupby
asked Nov 12 at 23:35 by ZeroStack

            2 Answers

GroupBy.transform('nunique')

On v0.23.4, your solution works for me.

df['ncount'] = df.groupby('mID')['uID'].transform('nunique')
df

      uID mID  ncount
0   James   A       5
1   Henry   B       2
2     Abe   A       5
3   James   B       2
4   Henry   A       5
5   Brian   A       5
6  Claude   A       5
7   James   C       1

GroupBy.nunique + pd.Series.map

Additionally, with your existing solution, you could map the series back to mID. This works because df.groupby('mID')['uID'].nunique() returns a Series indexed by mID, so mapping the mID column against it broadcasts each group's count onto every row:

df['ncount'] = df.mID.map(df.groupby('mID')['uID'].nunique())
df

      uID mID  ncount
0   James   A       5
1   Henry   B       2
2     Abe   A       5
3   James   B       2
4   Henry   A       5
5   Brian   A       5
6  Claude   A       5
7   James   C       1

answered Nov 13 at 0:57 by coldspeed (accepted answer)

You are very close!

df['ncount'] = df.groupby('mID')['uID'].transform(pd.Series.nunique)

      uID mID  ncount
0   James   A       5
1   Henry   B       2
2     Abe   A       5
3   James   B       2
4   Henry   A       5
5   Brian   A       5
6  Claude   A       5
7   James   C       1

answered Nov 12 at 23:43 by Peter Leimbigler












• Thanks Peter, on my original data I get a ValueError: Length mismatch: Expected axis has 29101 elements, new values have 29457 elements, and I'm not even creating a new column, just assigning to a new variable. Your solution does answer the question; any ideas on this error? EDIT: NA values were the culprit here.
  – ZeroStack, Nov 12 at 23:47

• @ZeroStack, that might be this bug: github.com/pandas-dev/pandas/issues/17093 I would try df.fillna(0).groupby(...), and if that works, investigate further how to fill any missing values in the columns mID and/or uID.
  – Peter Leimbigler, Nov 12 at 23:53
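
A minimal sketch of the workaround suggested in the comment above, assuming the missing values are in the grouping column mID (the fill value 'missing' is purely illustrative):

# groupby excludes rows whose group key is NaN, which can leave the
# transformed result misaligned with the original frame; filling the
# key column first keeps every row assigned to a group.
df['mID'] = df['mID'].fillna('missing')
df['ncount'] = df.groupby('mID')['uID'].transform('nunique')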