How to perform groupby + transform + nunique in pandas?
I would like to count the unique observations by group in a pandas dataframe and create a new column that holds that count. Importantly, I would not like to reduce the rows in the dataframe; effectively, I want something similar to a window function in SQL.
df = pd.DataFrame({
'uID': ['James', 'Henry', 'Abe', 'James', 'Henry', 'Brian', 'Claude', 'James'],
'mID': ['A', 'B', 'A', 'B', 'A', 'A', 'A', 'C']
})
df.groupby('mID')['uID'].nunique()
This gets the unique count per group, but it summarises (reduces the rows). I would effectively like to do something along the lines of:
df['ncount'] = df.groupby('mID')['uID'].transform('nunique')
(this obviously does not work)
It is possible to accomplish the desired outcome by taking the unique summarised dataframe and joining it to the original dataframe but I am wondering if there is a more minimal solution.
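For reference, the join-based workaround mentioned above might be sketched like this (a minimal sketch; the `ncount` name and the `how='left'` merge are illustrative choices, not from the original):

```python
import pandas as pd

df = pd.DataFrame({
    'uID': ['James', 'Henry', 'Abe', 'James', 'Henry', 'Brian', 'Claude', 'James'],
    'mID': ['A', 'B', 'A', 'B', 'A', 'A', 'A', 'C'],
})

# Summarise the unique counts per group, then merge them back
# onto every original row (a left merge preserves row order)
counts = df.groupby('mID')['uID'].nunique().rename('ncount').reset_index()
out = df.merge(counts, on='mID', how='left')
```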
Thanks
pandas unique pandas-groupby
asked Nov 12 at 23:35 – ZeroStack
2 Answers
(accepted)
GroupBy.transform('nunique')
On v0.23.4, your solution works for me.
df['ncount'] = df.groupby('mID')['uID'].transform('nunique')
df
uID mID ncount
0 James A 5
1 Henry B 2
2 Abe A 5
3 James B 2
4 Henry A 5
5 Brian A 5
6 Claude A 5
7 James C 1
GroupBy.nunique + pd.Series.map
Additionally, with your existing solution, you could map the series back to mID:
df['ncount'] = df.mID.map(df.groupby('mID')['uID'].nunique())
df
uID mID ncount
0 James A 5
1 Henry B 2
2 Abe A 5
3 James B 2
4 Henry A 5
5 Brian A 5
6 Claude A 5
7 James C 1
answered Nov 13 at 0:57 – coldspeed
You are very close!
df['ncount'] = df.groupby('mID')['uID'].transform(pd.Series.nunique)
uID mID ncount
0 James A 5
1 Henry B 2
2 Abe A 5
3 James B 2
4 Henry A 5
5 Brian A 5
6 Claude A 5
7 James C 1
answered Nov 12 at 23:43 – Peter Leimbigler
Thanks Peter, on my original data I get a ValueError: Length mismatch: Expected axis has 29101 elements, new values have 29457 elements. I'm not even creating a new column, just assigning to a new variable. Your solution does answer the question; any ideas on this error? EDIT: NA values were the culprit here.
– ZeroStack
Nov 12 at 23:47
@ZeroStack, that might be this bug: github.com/pandas-dev/pandas/issues/17093 I would try df.fillna(0).groupby(...), and if that works, investigate further how to fill any missing values in the columns mID and/or uID.
– Peter Leimbigler
Nov 12 at 23:53
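To illustrate the NA issue discussed in these comments: groupby drops rows whose group key is missing, which is what can throw the row counts out of alignment. A minimal sketch (made-up data; the 'missing' placeholder label is my own choice):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'uID': ['James', 'Henry', 'Abe', 'James'],
    'mID': ['A', np.nan, 'A', 'B'],
})

# Fill the missing group key first so every row lands in some group,
# keeping the transform result the same length as the frame
filled = df.assign(mID=df['mID'].fillna('missing'))
df['ncount'] = filled.groupby('mID')['uID'].transform('nunique')
```

On newer pandas (1.1+), `df.groupby('mID', dropna=False)` keeps the NaN key as its own group and should avoid the fillna step entirely.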