iterating re.split() on a dataframe



























I am trying to use re.split() to split a single variable in a pandas dataframe into two other variables.



My data looks like:



xg
0.05+0.43
0.93+0.05
0.00
0.11+0.11
0.00
3.94-2.06


I want to create



e     a
0.05  0.43
0.93  0.05
0.00
0.11  0.11
0.00
3.94  2.06


I can do this using a for loop and indexing:

import re

for i in range(len(df)):
    if df['xg'].str.len()[i] < 5:
        # short values such as "0.00" contain no delimiter
        df['e'][i] = df['xg'][i]
    else:
        df['e'][i], df['a'][i] = re.split("[+ -]", df['xg'][i])


However, this is slow and I do not believe it is a good way of doing this. I am trying to improve my code and my understanding of Python.

I have made various attempts to write it using np.where, a list comprehension, or apply with a lambda, but I can't get any of them to run. I think all the issues I have are because I am trying to apply the functions to the whole Series rather than to the value at each position.

If anyone has an idea of a better method than my ugly for loop, I would be very interested.
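The whole-Series versus per-value distinction can be illustrated with a minimal sketch, assuming the sample xg values shown above. re.split() accepts a single string, so a list comprehension has to call it once per element. The names parts and value are illustrative only and do not come from the original code.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"xg": ["0.05+0.43", "0.93+0.05", "0.00", "0.11+0.11", "0.00", "3.94-2.06"]})

# re.split() works on one string at a time, so call it per element,
# not on the Series itself.
parts = [re.split(r"[+ -]", value) for value in df["xg"]]

# Every row has a first piece; rows without a delimiter have no second piece.
df["e"] = [p[0] for p in parts]
df["a"] = [p[1] if len(p) > 1 else None for p in parts]
```

This still loops in Python, but it avoids the repeated df['xg'].str.len() call and the element-wise assignment into the DataFrame.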















    Possible duplicate of how to split column of tuples in pandas dataframe?

    – Matthieu Brucher
    Nov 20 '18 at 22:04





































python regex python-3.x pandas loops














edited Nov 21 '18 at 0:25









U9-Forward











asked Nov 20 '18 at 21:58









oldlizard




















2 Answers





























































Borrowed from this answer, which uses the str.split method with the expand argument:
https://stackoverflow.com/a/14745484/3084939

import pandas as pd

df = pd.DataFrame({'col': ['1+2','3+4','20','0.6-1.6']})
# '[+-]' matches either delimiter; inside a character class '|' is a literal pipe, so it is dropped
df[['left','right']] = df['col'].str.split('[+-]', expand=True)

df.head()
       col left right
0      1+2    1     2
1      3+4    3     4
2       20   20  None
3  0.6-1.6  0.6   1.6
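As a minimal sketch of the same technique applied to the question's xg column (the column names e and a come from the question; the numeric conversion step is an assumption about what is wanted downstream, not part of the answer above):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"xg": ["0.05+0.43", "0.93+0.05", "0.00", "0.11+0.11", "0.00", "3.94-2.06"]})

# n=1 splits at most once, so the result has at most two columns (0 and 1).
split = df["xg"].str.split(r"[+-]", n=1, expand=True)

# Rows without a delimiter have a missing value in column 1,
# which pd.to_numeric turns into NaN.
df["e"] = pd.to_numeric(split[0])
df["a"] = pd.to_numeric(split[1])
```

Converting to numbers is optional; without it, e and a stay as strings, as in the output shown above.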
















answered Nov 20 '18 at 22:31

wonderstruck80













    This is a much better method than the loop. I thought you could only split on a single delimiter. Thanks!

    – oldlizard
    Nov 20 '18 at 22:37
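The single-delimiter versus multi-delimiter behaviour mentioned in this comment can be sketched as follows, assuming a small illustrative Series s that is not part of the original thread. A one-character pattern is treated as a literal string, while a longer pattern such as a character class is treated as a regular expression.

```python
import pandas as pd

# Hypothetical two-row Series, used only for illustration.
s = pd.Series(["1+2", "3-4"])

# A single character is a literal delimiter: only '+' splits.
literal = s.str.split("+")

# A character class is a regular expression: '+' or '-' splits.
either = s.str.split(r"[+-]")
```

Here literal leaves "3-4" unsplit, while either splits both rows into two pieces.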


































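This may be what you want. I'm not sure it's elegant, but it should be faster than a Python loop.

import pandas as pd
import numpy as np

data = ['0.05+0.43','0.93+0.05','0.00','0.11+0.11','0.00','3.94-2.06']
df = pd.DataFrame(data, columns=['xg'])

# Solution: split on '+' or '-'. The hyphen goes last in the class so it is
# literal; r'[ -+]' would be a range from space to '+' that excludes '-'.
tmp = df['xg'].str.split(r'[+-]')
df['e'] = tmp.apply(lambda x: x[0])                            # first piece is always present
df['a'] = tmp.apply(lambda x: x[1] if len(x) > 1 else np.nan)  # second piece, NaN if absent
del tmp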
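An alternative sketch that avoids the intermediate column of lists is Series.str.extract with named groups. This is a different technique from the split-and-apply approach above, and the pattern is an assumption: it expects each value to be one unsigned decimal, optionally followed by '+' or '-' and a second unsigned decimal.

```python
import pandas as pd

# Sample data copied from the question.
data = ["0.05+0.43", "0.93+0.05", "0.00", "0.11+0.11", "0.00", "3.94-2.06"]
df = pd.DataFrame(data, columns=["xg"])

# Named groups become the column names of the extracted frame.
# The second number sits in an optional non-capturing group.
pattern = r"^(?P<e>\d+\.\d+)(?:[+-](?P<a>\d+\.\d+))?$"

# join() aligns on the index and adds columns 'e' and 'a';
# 'a' is missing where there is no second number.
df = df.join(df["xg"].str.extract(pattern))
```

Values that do not fit the assumed pattern would come out as missing in both columns, so this form is stricter than a plain split.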
















answered Nov 20 '18 at 22:43

AResem





























