iterating re.split() on a dataframe
I am trying to use re.split() to split a single variable in a pandas dataframe into two other variables.
My data looks like:
xg
0.05+0.43
0.93+0.05
0.00
0.11+0.11
0.00
3.94-2.06
I want to create
e     a
0.05  0.43
0.93  0.05
0.00
0.11  0.11
0.00
3.94  2.06
I can do this using a for loop and indexing.
for i in range(len(df)):
    if df['xg'].str.len()[i] < 5:
        df['e'][i] = df['xg'][i]
    else:
        df['e'][i], df['a'][i] = re.split("[+ -]", df['xg'][i])
However, this is slow, and I do not believe it is a good way of doing this. I am trying to improve my code and my understanding of Python.
I have made various attempts to write it using np.where, a list comprehension, or apply with a lambda, but I can't get any of them to run. I think all the issues I have are because I am trying to apply the functions to the whole series rather than to the value at each position.
If anyone has an idea of a better method than my ugly for loop I would be very interested.
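For reference, the element-wise idea described above can be sketched with a list comprehension, since re.split works on one string at a time rather than on a whole Series. This is only a minimal sketch: it assumes the values are non-negative (a leading minus sign would also be treated as a delimiter) and that the pattern r"[+-]" covers every separator in the data.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {"xg": ["0.05+0.43", "0.93+0.05", "0.00", "0.11+0.11", "0.00", "3.94-2.06"]}
)

# Call re.split once per string instead of passing the whole Series.
parts = [re.split(r"[+-]", value) for value in df["xg"]]

# The first piece always exists; the second only when a delimiter was found.
df["e"] = [p[0] for p in parts]
df["a"] = [p[1] if len(p) > 1 else None for p in parts]
```

Rows without a delimiter end up with a missing value in column a.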
python regex python-3.x pandas loops
edited Nov 21 '18 at 0:25
U9-Forward
asked Nov 20 '18 at 21:58
oldlizard
1
Possible duplicate of how to split column of tuples in pandas dataframe?
– Matthieu Brucher
Nov 20 '18 at 22:04
2 Answers
Borrowed from this answer, using the str.split method with the expand argument:
https://stackoverflow.com/a/14745484/3084939
import pandas as pd

df = pd.DataFrame({'col': ['1+2', '3+4', '20', '0.6-1.6']})
# '[+-]' is a regex character class: split on either '+' or '-'
df[['left', 'right']] = df['col'].str.split('[+-]', expand=True)
df.head()
       col left right
0      1+2    1     2
1      3+4    3     4
2       20   20  None
3  0.6-1.6  0.6   1.6
answered Nov 20 '18 at 22:31
wonderstruck80
This is a much better method than the loop, I thought you could only split on a single delimiter. Thanks!
– oldlizard
Nov 20 '18 at 22:37
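Applied to the data from the question, the same idea might look like the sketch below. The pattern r"[+-]", the n=1 limit, and the conversion with pd.to_numeric are assumptions made for illustration and are not part of the answer above; the sketch also assumes no value carries a leading sign.

```python
import pandas as pd

df = pd.DataFrame(
    {"xg": ["0.05+0.43", "0.93+0.05", "0.00", "0.11+0.11", "0.00", "3.94-2.06"]}
)

# Split on the first '+' or '-' only; expand=True returns one column per piece.
parts = df["xg"].str.split(r"[+-]", n=1, expand=True)
parts.columns = ["e", "a"]

# Convert the string pieces to floats; rows with no second piece become NaN.
parts = parts.apply(pd.to_numeric)

df = df.join(parts)
```

Converting to numbers is optional; without it both new columns hold strings, as in the answer.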
This may be what you want. Not sure it's elegant, but it should be faster than a Python loop.
import pandas as pd
import numpy as np

data = ['0.05+0.43', '0.93+0.05', '0.00', '0.11+0.11', '0.00', '3.94-2.06']
df = pd.DataFrame(data, columns=['xg'])

# Solution
# Split on ' ', '+' or '-'; the hyphen sits last in the class so it is taken literally.
tmp = df['xg'].str.split(r'[ +-]')
df['e'] = tmp.apply(lambda x: x[0])
df['a'] = tmp.apply(lambda x: x[1] if len(x) > 1 else np.nan)
del tmp
answered Nov 20 '18 at 22:43
AResem
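A note on the character class used for splitting: inside square brackets, a hyphen placed between two characters denotes a range, while a hyphen placed first or last is a literal '-'. The short check below, using only the standard re module and a sample value from the question, illustrates the difference; the variable names are just for illustration.

```python
import re

value = "3.94-2.06"

# Hyphen last: a literal '-', so ' ', '+' and '-' all act as delimiters.
literal_hyphen = re.split(r"[ +-]", value)
print(literal_hyphen)  # ['3.94', '2.06']

# Hyphen in the middle: a range from ' ' (0x20) to '+' (0x2B),
# which does not contain '-' (0x2D), so the string is left unsplit.
range_hyphen = re.split(r"[ -+]", value)
print(range_hyphen)  # ['3.94-2.06']
```

Values joined by '+' split under both patterns, because '+' is the upper end of the range, so the difference only appears on rows that use '-'.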