Union of two pandas DataFrames

Say I have two data frames:

df1:

  A

0 a

1 b

df2:

  A

0 a

1 c

I want the result to be the union of the two frames with an extra column showing the source data frame that the row belongs to. In case of duplicates, duplicates should be removed and the respective extra column should show both sources:

  A  B

0 a  df1, df2

1 b  df1

2 c  df2

I can get the concatenated data frame (df3) without duplicates as follows:

import pandas as pd

df3=pd.concat([df1,df2],ignore_index=True).drop_duplicates().reset_index(drop=True)

I can't think of/find a method to have control over what element goes where. How can I add the extra column?

Thank you very much for any tips.

edited Jan 22 at 23:44

coldspeed

127k23129216

asked Jan 22 at 19:32

Leon Rai

557

add a comment |

Say I have two data frames:

df1:

  A

0 a

1 b

df2:

  A

0 a

1 c

  A  B

0 a  df1, df2

1 b  df1

2 c  df2

I can get the concatenated data frame (df3) without duplicates as follows:

import pandas as pd

df3=pd.concat([df1,df2],ignore_index=True).drop_duplicates().reset_index(drop=True)

I can't think of/find a method to have control over what element goes where. How can I add the extra column?

Thank you very much for any tips.

edited Jan 22 at 23:44

coldspeed

127k23129216

asked Jan 22 at 19:32

Leon Rai

557

add a comment |

Say I have two data frames:

df1:

  A

0 a

1 b

df2:

  A

0 a

1 c

  A  B

0 a  df1, df2

1 b  df1

2 c  df2

I can get the concatenated data frame (df3) without duplicates as follows:

import pandas as pd

df3=pd.concat([df1,df2],ignore_index=True).drop_duplicates().reset_index(drop=True)

I can't think of/find a method to have control over what element goes where. How can I add the extra column?

Thank you very much for any tips.

edited Jan 22 at 23:44

coldspeed

127k23129216

asked Jan 22 at 19:32

Leon Rai

557

Say I have two data frames:

df1:

  A

0 a

1 b

df2:

  A

0 a

1 c

  A  B

0 a  df1, df2

1 b  df1

2 c  df2

I can get the concatenated data frame (df3) without duplicates as follows:

import pandas as pd

df3=pd.concat([df1,df2],ignore_index=True).drop_duplicates().reset_index(drop=True)

I can't think of/find a method to have control over what element goes where. How can I add the extra column?

Thank you very much for any tips.

python pandas dataframe merge concatenation

edited Jan 22 at 23:44

coldspeed

127k23129216

asked Jan 22 at 19:32

Leon Rai

557

edited Jan 22 at 23:44

coldspeed

127k23129216

asked Jan 22 at 19:32

Leon Rai

557

edited Jan 22 at 23:44

coldspeed

127k23129216

edited Jan 22 at 23:44

coldspeed

127k23129216

edited Jan 22 at 23:44

coldspeed

127k23129216

asked Jan 22 at 19:32

Leon Rai

557

asked Jan 22 at 19:32

Leon Rai

557

asked Jan 22 at 19:32

Leon Rai

557

add a comment |

3 Answers
3

active

oldest

votes

Merge with an indicator argument, and remap the result:

m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}



result = df1.merge(df2, on=['A'], how='outer', indicator='B')

result['B'] = result['B'].map(m)



result

   A         B

0  a  df1, df2

1  b       df1

2  c       df2

answered Jan 22 at 19:37

coldspeed

127k23129216

Nice and succinct!

– cph_sto
Jan 22 at 19:53

@cph_sto Thank you! Upvoted back.

– coldspeed
Jan 22 at 19:55

1

I have learnt a lot from you.

– cph_sto
Jan 22 at 19:56

Excellent! Could you add how to do the same for intersection? outer->inner?

– Leon Rai
Jan 22 at 20:28

1

@LeonRai df1.merge(df2, on=['A'], how='inner').assign(B='df1, df2') (since intersection implies membership in both)

– coldspeed
Jan 22 at 20:29

|
show 1 more comment

We use outer join to solve this -

df1 = pd.DataFrame({'A':['a','b']})

df2 = pd.DataFrame({'A':['a','c']})

df1['col1']='df1'

df2['col2']='df2'

df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')

df['B']=df['col1']+','+df['col2']

df['B'] = df['B'].str.strip(',')

df=df[['A','B']]

df



   A        B

0  a  df1,df2

1  b      df1

2  c      df2

answered Jan 22 at 19:52

cph_sto

1,691320

thank you for the answer!

– Leon Rai
Jan 22 at 20:31

pleasure Leon :)

– cph_sto
Jan 22 at 20:38

add a comment |

Use the command below:

df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')]) 

    .groupby('A') 

    .aggregate(list) 

    .reset_index()

The result will be:

   A      source

0  a  [df1, df2]

1  b       [df1]

2  c       [df2]

The assign will add a column named source with value df1 and df2 to your dataframes. groupby command groups rows with same A value to single row. aggregate command describes how to aggregate other columns (source) for each group of rows with same A. I have used list aggregate function so that the source column be the list of values with same A.

edited Jan 22 at 20:05

answered Jan 22 at 19:46

Narges Ayoubi

564

thank you for the answer!

– Leon Rai
Jan 22 at 20:31

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54315222%2funion-of-two-pandas-dataframes%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

Merge with an indicator argument, and remap the result:

m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}



result = df1.merge(df2, on=['A'], how='outer', indicator='B')

result['B'] = result['B'].map(m)



result

   A         B

0  a  df1, df2

1  b       df1

2  c       df2

answered Jan 22 at 19:37

coldspeed

127k23129216

Nice and succinct!

– cph_sto
Jan 22 at 19:53

@cph_sto Thank you! Upvoted back.

– coldspeed
Jan 22 at 19:55

1

I have learnt a lot from you.

– cph_sto
Jan 22 at 19:56

Excellent! Could you add how to do the same for intersection? outer->inner?

– Leon Rai
Jan 22 at 20:28

1

@LeonRai df1.merge(df2, on=['A'], how='inner').assign(B='df1, df2') (since intersection implies membership in both)

– coldspeed
Jan 22 at 20:29

|
show 1 more comment

Merge with an indicator argument, and remap the result:

m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}



result = df1.merge(df2, on=['A'], how='outer', indicator='B')

result['B'] = result['B'].map(m)



result

   A         B

0  a  df1, df2

1  b       df1

2  c       df2

answered Jan 22 at 19:37

coldspeed

127k23129216

Nice and succinct!

– cph_sto
Jan 22 at 19:53

@cph_sto Thank you! Upvoted back.

– coldspeed
Jan 22 at 19:55

1

I have learnt a lot from you.

– cph_sto
Jan 22 at 19:56

Excellent! Could you add how to do the same for intersection? outer->inner?

– Leon Rai
Jan 22 at 20:28

1

@LeonRai df1.merge(df2, on=['A'], how='inner').assign(B='df1, df2') (since intersection implies membership in both)

– coldspeed
Jan 22 at 20:29

|
show 1 more comment

Merge with an indicator argument, and remap the result:

m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}



result = df1.merge(df2, on=['A'], how='outer', indicator='B')

result['B'] = result['B'].map(m)



result

   A         B

0  a  df1, df2

1  b       df1

2  c       df2

answered Jan 22 at 19:37

coldspeed

127k23129216

Merge with an indicator argument, and remap the result:

m = {'left_only': 'df1', 'right_only': 'df2', 'both': 'df1, df2'}



result = df1.merge(df2, on=['A'], how='outer', indicator='B')

result['B'] = result['B'].map(m)



result

   A         B

0  a  df1, df2

1  b       df1

2  c       df2

answered Jan 22 at 19:37

coldspeed

127k23129216

answered Jan 22 at 19:37

coldspeed

127k23129216

answered Jan 22 at 19:37

coldspeed

127k23129216

answered Jan 22 at 19:37

coldspeed

127k23129216

Nice and succinct!

– cph_sto
Jan 22 at 19:53

@cph_sto Thank you! Upvoted back.

– coldspeed
Jan 22 at 19:55

1

I have learnt a lot from you.

– cph_sto
Jan 22 at 19:56

Excellent! Could you add how to do the same for intersection? outer->inner?

– Leon Rai
Jan 22 at 20:28

1

@LeonRai df1.merge(df2, on=['A'], how='inner').assign(B='df1, df2') (since intersection implies membership in both)

– coldspeed
Jan 22 at 20:29

|
show 1 more comment

Nice and succinct!

– cph_sto
Jan 22 at 19:53

@cph_sto Thank you! Upvoted back.

– coldspeed
Jan 22 at 19:55

1

I have learnt a lot from you.

– cph_sto
Jan 22 at 19:56

Excellent! Could you add how to do the same for intersection? outer->inner?

– Leon Rai
Jan 22 at 20:28

1

@LeonRai df1.merge(df2, on=['A'], how='inner').assign(B='df1, df2') (since intersection implies membership in both)

– coldspeed
Jan 22 at 20:29

Nice and succinct!

– cph_sto
Jan 22 at 19:53

@cph_sto Thank you! Upvoted back.

– coldspeed
Jan 22 at 19:55

I have learnt a lot from you.

– cph_sto
Jan 22 at 19:56

Excellent! Could you add how to do the same for intersection? outer->inner?

– Leon Rai
Jan 22 at 20:28

@LeonRai df1.merge(df2, on=['A'], how='inner').assign(B='df1, df2') (since intersection implies membership in both)

– coldspeed
Jan 22 at 20:29

|
show 1 more comment

We use outer join to solve this -

df1 = pd.DataFrame({'A':['a','b']})

df2 = pd.DataFrame({'A':['a','c']})

df1['col1']='df1'

df2['col2']='df2'

df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')

df['B']=df['col1']+','+df['col2']

df['B'] = df['B'].str.strip(',')

df=df[['A','B']]

df



   A        B

0  a  df1,df2

1  b      df1

2  c      df2

answered Jan 22 at 19:52

cph_sto

1,691320

thank you for the answer!

– Leon Rai
Jan 22 at 20:31

pleasure Leon :)

– cph_sto
Jan 22 at 20:38

add a comment |

We use outer join to solve this -

df1 = pd.DataFrame({'A':['a','b']})

df2 = pd.DataFrame({'A':['a','c']})

df1['col1']='df1'

df2['col2']='df2'

df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')

df['B']=df['col1']+','+df['col2']

df['B'] = df['B'].str.strip(',')

df=df[['A','B']]

df



   A        B

0  a  df1,df2

1  b      df1

2  c      df2

answered Jan 22 at 19:52

cph_sto

1,691320

thank you for the answer!

– Leon Rai
Jan 22 at 20:31

pleasure Leon :)

– cph_sto
Jan 22 at 20:38

add a comment |

We use outer join to solve this -

df1 = pd.DataFrame({'A':['a','b']})

df2 = pd.DataFrame({'A':['a','c']})

df1['col1']='df1'

df2['col2']='df2'

df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')

df['B']=df['col1']+','+df['col2']

df['B'] = df['B'].str.strip(',')

df=df[['A','B']]

df



   A        B

0  a  df1,df2

1  b      df1

2  c      df2

answered Jan 22 at 19:52

cph_sto

1,691320

We use outer join to solve this -

df1 = pd.DataFrame({'A':['a','b']})

df2 = pd.DataFrame({'A':['a','c']})

df1['col1']='df1'

df2['col2']='df2'

df=pd.merge(df1, df2, on=['A'], how="outer").fillna('')

df['B']=df['col1']+','+df['col2']

df['B'] = df['B'].str.strip(',')

df=df[['A','B']]

df



   A        B

0  a  df1,df2

1  b      df1

2  c      df2

answered Jan 22 at 19:52

cph_sto

1,691320

answered Jan 22 at 19:52

cph_sto

1,691320

answered Jan 22 at 19:52

cph_sto

1,691320

answered Jan 22 at 19:52

cph_sto

1,691320

thank you for the answer!

– Leon Rai
Jan 22 at 20:31

pleasure Leon :)

– cph_sto
Jan 22 at 20:38

add a comment |

thank you for the answer!

– Leon Rai
Jan 22 at 20:31

pleasure Leon :)

– cph_sto
Jan 22 at 20:38

thank you for the answer!

– Leon Rai
Jan 22 at 20:31

pleasure Leon :)

– cph_sto
Jan 22 at 20:38

add a comment |

Use the command below:

df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')]) 

    .groupby('A') 

    .aggregate(list) 

    .reset_index()

The result will be:

   A      source

0  a  [df1, df2]

1  b       [df1]

2  c       [df2]

edited Jan 22 at 20:05

answered Jan 22 at 19:46

Narges Ayoubi

564

thank you for the answer!

– Leon Rai
Jan 22 at 20:31

add a comment |

Use the command below:

df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')]) 

    .groupby('A') 

    .aggregate(list) 

    .reset_index()

The result will be:

   A      source

0  a  [df1, df2]

1  b       [df1]

2  c       [df2]

edited Jan 22 at 20:05

answered Jan 22 at 19:46

Narges Ayoubi

564

thank you for the answer!

– Leon Rai
Jan 22 at 20:31

add a comment |

Use the command below:

df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')]) 

    .groupby('A') 

    .aggregate(list) 

    .reset_index()

The result will be:

   A      source

0  a  [df1, df2]

1  b       [df1]

2  c       [df2]

edited Jan 22 at 20:05

answered Jan 22 at 19:46

Narges Ayoubi

564

Use the command below:

df3 = pd.concat([df1.assign(source='df1'), df2.assign(source='df2')]) 

    .groupby('A') 

    .aggregate(list) 

    .reset_index()

The result will be:

   A      source

0  a  [df1, df2]

1  b       [df1]

2  c       [df2]

edited Jan 22 at 20:05

answered Jan 22 at 19:46

Narges Ayoubi

564

edited Jan 22 at 20:05

answered Jan 22 at 19:46

Narges Ayoubi

564

answered Jan 22 at 19:46

Narges Ayoubi

564

answered Jan 22 at 19:46

Narges Ayoubi

564

thank you for the answer!

– Leon Rai
Jan 22 at 20:31

add a comment |

thank you for the answer!

– Leon Rai
Jan 22 at 20:31

thank you for the answer!

– Leon Rai
Jan 22 at 20:31

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Cfrgtkky