Union of two pandas DataFrames
Say I have two data frames:
df1:
A
0 a
1 b
df2:
A
0 a
1 c
I want the result to be the union of the two frames with an extra column showing the source data frame that the row belongs to. In case of duplicates, duplicates should be removed and the respective extra column should show both sources:
A B
0 a df1, df2
1 b df1
2 c df2
I can get the concatenated data frame (df3) without duplicates as follows:
import pandas as pd
df3=pd.concat([df1,df2],ignore_index=True).drop_duplicates().reset_index(drop=True)
I can't think of/find a method to have control over what element goes where. How can I add the extra column?
Thank you very much for any tips.
python pandas dataframe merge concatenation
add a comment |
Say I have two data frames:
df1:
A
0 a
1 b
df2:
A
0 a
1 c
I want the result to be the union of the two frames with an extra column showing the source data frame that the row belongs to. In case of duplicates, duplicates should be removed and the respective extra column should show both sources:
A B
0 a df1, df2
1 b df1
2 c df2
I can get the concatenated data frame (df3) without duplicates as follows:
import pandas as pd
df3=pd.concat([df1,df2],ignore_index=True).drop_duplicates().reset_index(drop=True)
I can't think of/find a method to have control over what element goes where. How can I add the extra column?
Thank you very much for any tips.
python pandas dataframe merge concatenation
add a comment |
Say I have two data frames:
df1:
A
0 a
1 b
df2:
A
0 a
1 c
I want the result to be the union of the two frames with an extra column showing the source data frame that the row belongs to. In case of duplicates, duplicates should be removed and the respective extra column should show both sources:
A B
0 a df1, df2
1 b df1
2 c df2
I can get the concatenated data frame (df3) without duplicates as follows:
import pandas as pd
df3=pd.concat([df1,df2],ignore_index=True).drop_duplicates().reset_index(drop=True)
I can't think of/find a method to have control over what element goes where. How can I add the extra column?
Thank you very much for any tips.
python pandas dataframe merge concatenation
Say I have two data frames:
df1:
A
0 a
1 b
df2:
A
0 a
1 c
I want the result to be the union of the two frames with an extra column showing the source data frame that the row belongs to. In case of duplicates, duplicates should be removed and the respective extra column should show both sources:
A B
0 a df1, df2
1 b df1
2 c df2
I can get the concatenated data frame (df3) without duplicates as follows:
import pandas as pd
df3=pd.concat([df1,df2],ignore_index=True).drop_duplicates().reset_index(drop=True)
I can't think of/find a method to have control over what element goes where. How can I add the extra column?
Thank you very much for any tips.
python pandas dataframe merge concatenation
python pandas dataframe merge concatenation
edited Jan 22 at 23:44
coldspeed
127k23129216
127k23129216
asked Jan 22 at 19:32
Leon RaiLeon Rai
557
557
add a comment |
add a comment |
3 Answers
3
active
oldest
votes
Merge with an indicator
argument, and remap the result:
m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}
result = df1.merge(df2, on=['A'], how='outer', indicator='B')
result['B'] = result['B'].map(m)
result
A B
0 a df1, df2
1 b df1
2 c df2
Nice and succinct!
– cph_sto
Jan 22 at 19:53
@cph_sto Thank you! Upvoted back.
– coldspeed
Jan 22 at 19:55
1
I have learnt a lot from you.
– cph_sto
Jan 22 at 19:56
Excellent! Could you add how to do the same for intersection? outer->inner?
– Leon Rai
Jan 22 at 20:28
1
@LeonRaidf1.merge(df2, on=['A'], how='inner').assign(B='df1, df2')
(since intersection implies membership in both)
– coldspeed
Jan 22 at 20:29
|
show 1 more comment
We use outer join to solve this -
df1 = pd.DataFrame({'A':['a','b']})
df2 = pd.DataFrame({'A':['a','c']})
df1['col1']='df1'
df2['col2']='df2'
df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')
df['B']=df['col1']+','+df['col2']
df['B'] = df['B'].str.strip(',')
df=df[['A','B']]
df
A B
0 a df1,df2
1 b df1
2 c df2
thank you for the answer!
– Leon Rai
Jan 22 at 20:31
pleasure Leon :)
– cph_sto
Jan 22 at 20:38
add a comment |
Use the command below:
df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')])
.groupby('A')
.aggregate(list)
.reset_index()
The result will be:
A source
0 a [df1, df2]
1 b [df1]
2 c [df2]
The assign
will add a column named source
with value df1
and df2
to your dataframes. groupby
command groups rows with same A
value to single row. aggregate
command describes how to aggregate other columns (source
) for each group of rows with same A
. I have used list
aggregate function so that the source
column be the list of values with same A
.
thank you for the answer!
– Leon Rai
Jan 22 at 20:31
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54315222%2funion-of-two-pandas-dataframes%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
Merge with an indicator
argument, and remap the result:
m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}
result = df1.merge(df2, on=['A'], how='outer', indicator='B')
result['B'] = result['B'].map(m)
result
A B
0 a df1, df2
1 b df1
2 c df2
Nice and succinct!
– cph_sto
Jan 22 at 19:53
@cph_sto Thank you! Upvoted back.
– coldspeed
Jan 22 at 19:55
1
I have learnt a lot from you.
– cph_sto
Jan 22 at 19:56
Excellent! Could you add how to do the same for intersection? outer->inner?
– Leon Rai
Jan 22 at 20:28
1
@LeonRaidf1.merge(df2, on=['A'], how='inner').assign(B='df1, df2')
(since intersection implies membership in both)
– coldspeed
Jan 22 at 20:29
|
show 1 more comment
Merge with an indicator
argument, and remap the result:
m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}
result = df1.merge(df2, on=['A'], how='outer', indicator='B')
result['B'] = result['B'].map(m)
result
A B
0 a df1, df2
1 b df1
2 c df2
Nice and succinct!
– cph_sto
Jan 22 at 19:53
@cph_sto Thank you! Upvoted back.
– coldspeed
Jan 22 at 19:55
1
I have learnt a lot from you.
– cph_sto
Jan 22 at 19:56
Excellent! Could you add how to do the same for intersection? outer->inner?
– Leon Rai
Jan 22 at 20:28
1
@LeonRaidf1.merge(df2, on=['A'], how='inner').assign(B='df1, df2')
(since intersection implies membership in both)
– coldspeed
Jan 22 at 20:29
|
show 1 more comment
Merge with an indicator
argument, and remap the result:
m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}
result = df1.merge(df2, on=['A'], how='outer', indicator='B')
result['B'] = result['B'].map(m)
result
A B
0 a df1, df2
1 b df1
2 c df2
Merge with an indicator
argument, and remap the result:
m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}
result = df1.merge(df2, on=['A'], how='outer', indicator='B')
result['B'] = result['B'].map(m)
result
A B
0 a df1, df2
1 b df1
2 c df2
answered Jan 22 at 19:37
coldspeedcoldspeed
127k23129216
127k23129216
Nice and succinct!
– cph_sto
Jan 22 at 19:53
@cph_sto Thank you! Upvoted back.
– coldspeed
Jan 22 at 19:55
1
I have learnt a lot from you.
– cph_sto
Jan 22 at 19:56
Excellent! Could you add how to do the same for intersection? outer->inner?
– Leon Rai
Jan 22 at 20:28
1
@LeonRaidf1.merge(df2, on=['A'], how='inner').assign(B='df1, df2')
(since intersection implies membership in both)
– coldspeed
Jan 22 at 20:29
|
show 1 more comment
Nice and succinct!
– cph_sto
Jan 22 at 19:53
@cph_sto Thank you! Upvoted back.
– coldspeed
Jan 22 at 19:55
1
I have learnt a lot from you.
– cph_sto
Jan 22 at 19:56
Excellent! Could you add how to do the same for intersection? outer->inner?
– Leon Rai
Jan 22 at 20:28
1
@LeonRaidf1.merge(df2, on=['A'], how='inner').assign(B='df1, df2')
(since intersection implies membership in both)
– coldspeed
Jan 22 at 20:29
Nice and succinct!
– cph_sto
Jan 22 at 19:53
Nice and succinct!
– cph_sto
Jan 22 at 19:53
@cph_sto Thank you! Upvoted back.
– coldspeed
Jan 22 at 19:55
@cph_sto Thank you! Upvoted back.
– coldspeed
Jan 22 at 19:55
1
1
I have learnt a lot from you.
– cph_sto
Jan 22 at 19:56
I have learnt a lot from you.
– cph_sto
Jan 22 at 19:56
Excellent! Could you add how to do the same for intersection? outer->inner?
– Leon Rai
Jan 22 at 20:28
Excellent! Could you add how to do the same for intersection? outer->inner?
– Leon Rai
Jan 22 at 20:28
1
1
@LeonRai
df1.merge(df2, on=['A'], how='inner').assign(B='df1, df2')
(since intersection implies membership in both)– coldspeed
Jan 22 at 20:29
@LeonRai
df1.merge(df2, on=['A'], how='inner').assign(B='df1, df2')
(since intersection implies membership in both)– coldspeed
Jan 22 at 20:29
|
show 1 more comment
We use outer join to solve this -
df1 = pd.DataFrame({'A':['a','b']})
df2 = pd.DataFrame({'A':['a','c']})
df1['col1']='df1'
df2['col2']='df2'
df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')
df['B']=df['col1']+','+df['col2']
df['B'] = df['B'].str.strip(',')
df=df[['A','B']]
df
A B
0 a df1,df2
1 b df1
2 c df2
thank you for the answer!
– Leon Rai
Jan 22 at 20:31
pleasure Leon :)
– cph_sto
Jan 22 at 20:38
add a comment |
We use outer join to solve this -
df1 = pd.DataFrame({'A':['a','b']})
df2 = pd.DataFrame({'A':['a','c']})
df1['col1']='df1'
df2['col2']='df2'
df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')
df['B']=df['col1']+','+df['col2']
df['B'] = df['B'].str.strip(',')
df=df[['A','B']]
df
A B
0 a df1,df2
1 b df1
2 c df2
thank you for the answer!
– Leon Rai
Jan 22 at 20:31
pleasure Leon :)
– cph_sto
Jan 22 at 20:38
add a comment |
We use outer join to solve this -
df1 = pd.DataFrame({'A':['a','b']})
df2 = pd.DataFrame({'A':['a','c']})
df1['col1']='df1'
df2['col2']='df2'
df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')
df['B']=df['col1']+','+df['col2']
df['B'] = df['B'].str.strip(',')
df=df[['A','B']]
df
A B
0 a df1,df2
1 b df1
2 c df2
We use outer join to solve this -
df1 = pd.DataFrame({'A':['a','b']})
df2 = pd.DataFrame({'A':['a','c']})
df1['col1']='df1'
df2['col2']='df2'
df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')
df['B']=df['col1']+','+df['col2']
df['B'] = df['B'].str.strip(',')
df=df[['A','B']]
df
A B
0 a df1,df2
1 b df1
2 c df2
answered Jan 22 at 19:52
cph_stocph_sto
1,691320
1,691320
thank you for the answer!
– Leon Rai
Jan 22 at 20:31
pleasure Leon :)
– cph_sto
Jan 22 at 20:38
add a comment |
thank you for the answer!
– Leon Rai
Jan 22 at 20:31
pleasure Leon :)
– cph_sto
Jan 22 at 20:38
thank you for the answer!
– Leon Rai
Jan 22 at 20:31
thank you for the answer!
– Leon Rai
Jan 22 at 20:31
pleasure Leon :)
– cph_sto
Jan 22 at 20:38
pleasure Leon :)
– cph_sto
Jan 22 at 20:38
add a comment |
Use the command below:
df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')])
.groupby('A')
.aggregate(list)
.reset_index()
The result will be:
A source
0 a [df1, df2]
1 b [df1]
2 c [df2]
The assign
will add a column named source
with value df1
and df2
to your dataframes. groupby
command groups rows with same A
value to single row. aggregate
command describes how to aggregate other columns (source
) for each group of rows with same A
. I have used list
aggregate function so that the source
column be the list of values with same A
.
thank you for the answer!
– Leon Rai
Jan 22 at 20:31
add a comment |
Use the command below:
df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')])
.groupby('A')
.aggregate(list)
.reset_index()
The result will be:
A source
0 a [df1, df2]
1 b [df1]
2 c [df2]
The assign
will add a column named source
with value df1
and df2
to your dataframes. groupby
command groups rows with same A
value to single row. aggregate
command describes how to aggregate other columns (source
) for each group of rows with same A
. I have used list
aggregate function so that the source
column be the list of values with same A
.
thank you for the answer!
– Leon Rai
Jan 22 at 20:31
add a comment |
Use the command below:
df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')])
.groupby('A')
.aggregate(list)
.reset_index()
The result will be:
A source
0 a [df1, df2]
1 b [df1]
2 c [df2]
The assign
will add a column named source
with value df1
and df2
to your dataframes. groupby
command groups rows with same A
value to single row. aggregate
command describes how to aggregate other columns (source
) for each group of rows with same A
. I have used list
aggregate function so that the source
column be the list of values with same A
.
Use the command below:
df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')])
.groupby('A')
.aggregate(list)
.reset_index()
The result will be:
A source
0 a [df1, df2]
1 b [df1]
2 c [df2]
The assign
will add a column named source
with value df1
and df2
to your dataframes. groupby
command groups rows with same A
value to single row. aggregate
command describes how to aggregate other columns (source
) for each group of rows with same A
. I have used list
aggregate function so that the source
column be the list of values with same A
.
edited Jan 22 at 20:05
answered Jan 22 at 19:46
Narges AyoubiNarges Ayoubi
564
564
thank you for the answer!
– Leon Rai
Jan 22 at 20:31
add a comment |
thank you for the answer!
– Leon Rai
Jan 22 at 20:31
thank you for the answer!
– Leon Rai
Jan 22 at 20:31
thank you for the answer!
– Leon Rai
Jan 22 at 20:31
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54315222%2funion-of-two-pandas-dataframes%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown