Iterating re.split() on a DataFrame

I am trying to use re.split() to split a single column of a pandas DataFrame into two new columns.

My data looks like:

    xg
    0.05+0.43
    0.93+0.05
    0.00
    0.11+0.11
    0.00
    3.94-2.06

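I want to create two new columns, 'e' and 'a', holding the number before and the number after the + or - sign. Rows such as 0.00 have no second number, so only 'e' is filled for them.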

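I can do this with a for loop and indexing:

    import re

    df['e'] = None  # the output columns must exist before the loop
    df['a'] = None
    for i in range(len(df)):
        if len(df.loc[i, 'xg']) < 5:  # e.g. '0.00' has no sign to split on
            df.loc[i, 'e'] = df.loc[i, 'xg']
        else:
            df.loc[i, 'e'], df.loc[i, 'a'] = re.split("[+ -]", df.loc[i, 'xg'])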

However, this is slow, I don't think it is a good way of doing it, and I am trying to improve my code and my understanding of Python.

I have made various attempts with np.where, a list comprehension, or apply with a lambda, but I can't get any of them to run. I think all my problems come from applying the functions to the whole Series rather than to each individual value.
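As a sketch of the per-value idea, assuming the six example strings from the question: a plain list comprehension calls re.split on one string at a time, which sidesteps the whole-Series problem. The helper name split_xg is hypothetical, and, like the loop above, it drops the sign between the two numbers.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({'xg': ['0.05+0.43', '0.93+0.05', '0.00', '0.11+0.11', '0.00', '3.94-2.06']})


def split_xg(value):
    """Hypothetical helper: split one string like '3.94-2.06' into (e, a).

    The sign itself is discarded; a is NaN when the string has no sign.
    """
    parts = re.split(r'[+-]', value)
    return (parts[0], parts[1]) if len(parts) > 1 else (parts[0], np.nan)


# The comprehension works on each individual string, not on the whole Series
pairs = [split_xg(v) for v in df['xg']]
df['e'] = [e for e, _ in pairs]
df['a'] = [a for _, a in pairs]
```

This is still Python-level iteration, so it mainly fixes the "can't get it to run" part; the vectorized str.split approach is usually the tidier choice.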

If anyone has an idea for a better method than my ugly for loop, I would be very interested.

edited Nov 21 '18 at 0:25 by U9-Forward

asked Nov 20 '18 at 21:58 by oldlizard

Possible duplicate of how to split column of tuples in pandas dataframe?

– Matthieu Brucher
Nov 20 '18 at 22:04

python regex python-3.x pandas loops

2 Answers

Borrowed from this answer, which uses the str.split method with the expand argument:
https://stackoverflow.com/a/14745484/3084939

    import pandas as pd

    df = pd.DataFrame({'col': ['1+2', '3+4', '20', '0.6-1.6']})
    # A character class matches either sign; '|' is not needed inside [...]
    df[['left', 'right']] = df['col'].str.split('[+-]', expand=True)

    df.head()

           col left right
    0      1+2    1     2
    1      3+4    3     4
    2       20   20  None
    3  0.6-1.6  0.6   1.6
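Applied to the question's xg column, the same str.split call can fill 'e' and 'a' directly. This is a sketch that assumes the values should end up as floats: pd.to_numeric turns the missing second piece into NaN. As in the question's loop, the sign between the numbers is dropped, so 3.94-2.06 gives a = 2.06.

```python
import pandas as pd

df = pd.DataFrame({'xg': ['0.05+0.43', '0.93+0.05', '0.00', '0.11+0.11', '0.00', '3.94-2.06']})

# A pattern longer than one character is treated as a regular expression,
# so this splits on either '+' or '-'. expand=True returns one column per
# piece; rows without a sign get a missing value in the second column.
parts = df['xg'].str.split('[+-]', expand=True)

# Assign the two pieces positionally and convert the strings to floats
df[['e', 'a']] = parts.apply(pd.to_numeric)
```

This assumes at least one row contains a sign: if none did, expand=True would return a single column and the two-column assignment would fail.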

answered Nov 20 '18 at 22:31 by wonderstruck80

This is a much better method than the loop; I thought you could only split on a single delimiter. Thanks!

– oldlizard
Nov 20 '18 at 22:37
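To illustrate the point in this comment, here is a minimal demo on the question's last value. Python's built-in str.split takes one literal separator, while re.split accepts a regular expression, so a character class can list several delimiters. Where the hyphen sits inside the class matters: between two characters it defines a range, so [ -+] means "space through +" and does not match '-' at all.

```python
import re

s = '3.94-2.06'

# Built-in str.split: one literal separator, so splitting on '+' leaves s whole
plain = s.split('+')

# re.split with a character class: splits on either '+' or '-'
by_class = re.split(r'[+-]', s)

# In '[ -+]' the hyphen forms the range ' '..'+', which does not include '-'
by_range = re.split(r'[ -+]', s)

print(plain)     # ['3.94-2.06']
print(by_class)  # ['3.94', '2.06']
print(by_range)  # ['3.94-2.06']
```

Putting the hyphen first or last in the class (for example [+-] or [ +-]) makes it a literal character.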

This may be what you want. I'm not sure it's elegant, but it should be faster than a Python loop.

    import pandas as pd
    import numpy as np

    data = ['0.05+0.43', '0.93+0.05', '0.00', '0.11+0.11', '0.00', '3.94-2.06']
    df = pd.DataFrame(data, columns=['xg'])

    # Solution
    # Keep '-' last in the class: in '[ -+]' it would form a range from ' ' to '+'
    # and '-' itself would not be matched
    tmp = df['xg'].str.split(r'[ +-]')
    df['e'] = tmp.apply(lambda x: x[0])
    df['a'] = tmp.apply(lambda x: x[1] if len(x) > 1 else np.nan)
    del tmp
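A possible simplification, sketched on the same example data: pandas' .str.get(i) accessor also works on a Series of lists, returning element i of each list and NaN where a list is too short, so it can stand in for both lambdas. The values stay strings here, as in the code above.

```python
import pandas as pd

data = ['0.05+0.43', '0.93+0.05', '0.00', '0.11+0.11', '0.00', '3.94-2.06']
df = pd.DataFrame(data, columns=['xg'])

# Each element of tmp is a Python list such as ['3.94', '2.06'] or ['0.00']
tmp = df['xg'].str.split(r'[ +-]')

# .str.get(i) picks element i of each list and yields NaN when it is missing
df['e'] = tmp.str.get(0)
df['a'] = tmp.str.get(1)
```

Wrapping either column in pd.to_numeric would convert the strings to floats if numeric values are needed.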

answered Nov 20 '18 at 22:43 by AResem
