Dataframe iterate rows eliminate when condition met



























I have a big Dataframe, here is the sample data:



df['length']
353.216
353.514
273.559
274.199
353.813
354.116


I want to iterate over the rows and compare row i+1 with row i: if the difference is less than 2, the value should stay; otherwise the whole row should be filtered out. I tried Boolean indexing: diff = abs(df['length']).diff() < 2 and then df_clean = df[diff]

I want to get rid of all 'abnormal' rows. I know that every row i+1 should be within a ±2 range.
The problem is that there can be more than one abnormal row in a sequence. Here I want to get rid of both 273.559 and 274.199, but because the difference between those two is less than 2, I would need to iterate over all the rows twice. Running a for loop over and over again doesn't seem like the best approach to me. Are there any good solutions?
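
To make the problem concrete, here is a minimal sketch of why a single diff() pass is not enough on the sample data. It assumes pandas is installed; the names mask_as_written and mask_abs_after are only illustrative and are not part of the original attempt.

```python
import pandas as pd

df = pd.DataFrame({'length': [353.216, 353.514, 273.559, 274.199, 353.813, 354.116]})

# The attempt as written: abs() runs before diff(), so a large negative
# jump (-79.955) still counts as "less than 2".
mask_as_written = abs(df['length']).diff() < 2
kept_as_written = df[mask_as_written]['length'].tolist()

# Taking the absolute value after diff() catches the first outlier, but the
# second outlier is close to the first one and slips through.
mask_abs_after = df['length'].diff().abs() < 2
kept_abs_after = df[mask_abs_after]['length'].tolist()

print(kept_as_written)
print(kept_abs_after)
```

Neither list matches the expected output: the first row is lost because diff() yields NaN there, and 274.199 survives in both cases.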



Edit: My Output should look as follows:



df_clean_data ['length']
353.216
353.514
353.813
354.116


Thank you in advance
Ziga






























  • Can you explain better exactly what you want as output?

    – Matina G
    Nov 21 '18 at 13:21

  • Why only 273.559 and 274.199? There are more contiguous elements that are less than 2 apart from their neighbours, like 353.216 and 353.514.

    – yatu
    Nov 21 '18 at 13:27

  • 273.559 should be eliminated (diff = 273.559 - 353.514 = -79.955), and 274.199 should also be eliminated because it differs by more than 2 from the other 'normal' values (diff = 274.199 - 353.514 = -79.315).

    – Ziga
    Nov 21 '18 at 13:31

  • Please reformulate your question if you want any help; what you are trying to do seems quite unclear.

    – yatu
    Nov 21 '18 at 13:36

























python pandas














asked Nov 21 '18 at 13:18 by Ziga
edited Nov 21 '18 at 13:45 by Ziga






















3 Answers
































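The key is a function that works almost like diff():

    def mark(x):
        global prevX
        # Distance from the last kept value, not from the previous row
        difr = abs(x - prevX)
        result = difr >= 2  # True means "delete this row"
        if not result:
            # Only a kept value becomes the new reference
            prevX = x
        return result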


It differs from diff() in two ways:

  1. It uses a global variable prevX ("previous x"), which must
    initially hold the first length (the program has to set it).

  2. prevX is replaced with the current x only if the difference
    is less than 2. In this way, rows to be deleted are "skipped"
    and the next row is still compared with the last kept value.


The initial step is to set prevX to the first length (this assumes the default integer index, so that label 0 is the first row):

    prevX = df.loc[0, 'length']

The actual processing is then performed with a single instruction:

    df.drop(df[df['length'].apply(mark)].index, inplace=True)


A bit of explanation:

  • df['length'].apply(mark) generates a boolean Series. True means "this row
    is to be deleted". To see how it works, execute this command alone
    (before dropping). Each run changes prevX, so reset prevX to the
    first length before running it again.

  • df[...].index gives the index values of these rows.

  • df.drop deletes the rows with the given index values (in place).
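
The three steps above can be tried in isolation. The sketch below uses a hand-written boolean mask (an assumption for illustration, standing in for the apply(mark) result) so that the indexing and dropping steps can be seen without any global state.

```python
import pandas as pd

df = pd.DataFrame({'length': [353.216, 353.514, 273.559, 274.199, 353.813, 354.116]})

# Hand-written mask for illustration: True means "this row is to be deleted".
to_delete = pd.Series([False, False, True, True, False, False], index=df.index)

# Boolean indexing selects the marked rows; .index gives their labels.
labels = df[to_delete].index

# drop() without inplace=True returns a new DataFrame and leaves df untouched.
cleaned = df.drop(labels)

print(list(labels))
print(cleaned)
```

The same rows can also be kept directly with df[~to_delete], which skips the intermediate index lookup.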


So the whole script looks like this:

    import pandas as pd

    def mark(x):
        global prevX
        difr = abs(x - prevX)
        result = difr >= 2
        if not result:
            prevX = x
        return result

    data = {'length': [353.216, 353.514, 273.559, 274.199, 353.813, 354.116]}
    df = pd.DataFrame(data)
    prevX = df.loc[0, 'length']
    df.drop(df[df['length'].apply(mark)].index, inplace=True)

The result is:

        length
    0  353.216
    1  353.514
    4  353.813
    5  354.116


Alternative: if you want the result in another DataFrame, remove
inplace=True and assign the result to the target variable.
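
As a variation on this alternative, the global variable can be avoided by keeping the reference value inside a closure. This is only a sketch: make_marker and df_clean are hypothetical names that do not appear in the answer, and the threshold of 2 is taken from the question.

```python
import pandas as pd

def make_marker(first_value, threshold=2):
    """Return a function that marks values too far from the last kept value."""
    prev = first_value

    def mark(x):
        nonlocal prev
        too_far = abs(x - prev) >= threshold
        if not too_far:
            prev = x  # only kept values become the new reference
        return too_far

    return mark

df = pd.DataFrame({'length': [353.216, 353.514, 273.559, 274.199, 353.813, 354.116]})

# iloc[0] takes the first row by position, whatever the index labels are.
to_delete = df['length'].apply(make_marker(df['length'].iloc[0]))

# Without inplace=True the result is a new DataFrame; df itself is unchanged.
df_clean = df.drop(df[to_delete].index)
print(df_clean)
```

Because every call to make_marker starts with a fresh reference value, the mask can be recomputed any number of times without resetting anything by hand.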
































  • That's an amazing solution, thank you. One more question: what exactly does 'inplace=True' do?

    – Ziga
    Nov 22 '18 at 6:52

  • Without inplace, the DataFrame with the dropped rows is only returned as the result of the function, and the df involved is not changed. With inplace=True, the result is saved in this df.

    – Valdi_Bo
    Nov 22 '18 at 8:29
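
The difference described in this comment can be checked with a small sketch. The frame and the dropped label are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'length': [353.216, 273.559, 353.514]})

# Without inplace: drop() returns a new DataFrame, df keeps all rows.
result = df.drop([1])
rows_after_plain_drop = len(df)

# With inplace=True: df itself is modified and drop() returns None.
returned = df.drop([1], inplace=True)
rows_after_inplace_drop = len(df)

print(rows_after_plain_drop, rows_after_inplace_drop, returned)
```

So a call with inplace=True should not be assigned to a variable, since the value that comes back is None.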

































You have to iterate over your DataFrame's rows like this, because there can be several consecutive rows to filter out between two valid values:

    ref_row = df.iloc[0]  # First row, or the first value you want to use as the reference
    valid_rows_indexes = []  # Indexes of the valid rows
    for index, row in df.iterrows():  # Iterate over the rows
        if abs(ref_row['length'] - row['length']) < 2:
            valid_rows_indexes.append(index)  # Keep this row's index
            ref_row = row  # This row becomes the new reference
    df_clean_data = df.loc[valid_rows_indexes]  # Filter the DataFrame

Hope this is helpful.



















































    Your question is not crystal clear, but based on what I understood, here are some suggestions.

    1. Sort the DataFrame on that column (length).

    2. Using a for loop, check the difference.

    3. If you want to keep a record, add it to a new DataFrame.

    4. Use the new DataFrame.

    Another way, because you have a big DataFrame:

    1. Sort the DataFrame on that column (length).

    2. Create a new column.

    3. Using a for loop, check the difference.

    4. If you don't want a record, write np.nan in the new column.

    5. Remove all the records that contain np.nan in the new column.
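
The second outline could be sketched as follows. This is one possible reading and not the answerer's code: the column name keep_flag is hypothetical, and the sorting step is left out on the assumption that the original row order must be preserved, as in the expected output in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'length': [353.216, 353.514, 273.559, 274.199, 353.813, 354.116]})

# Step 2: new helper column (hypothetical name), 1.0 meaning "keep".
df['keep_flag'] = 1.0

# Steps 3 and 4: compare each value with the last kept one and write
# np.nan into the helper column for records that are not wanted.
reference = df['length'].iloc[0]
for label, value in df['length'].items():
    if abs(value - reference) < 2:
        reference = value
    else:
        df.loc[label, 'keep_flag'] = np.nan

# Step 5: remove the flagged records, then drop the helper column.
df_new = df.dropna(subset=['keep_flag']).drop(columns='keep_flag')
print(df_new)
```

If the column were sorted first, the two low values would come first and become the reference, so on this sample the sorted variant would keep a different set of rows.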






































Answer by Valdi_Bo (the mark function approach): answered Nov 21 '18 at 18:15, edited Nov 21 '18 at 18:20























Answer by Clem G. (the iterrows loop): answered Nov 21 '18 at 13:56

































Answer by Anuprita (the sort-and-loop outline): answered Nov 21 '18 at 13:59





























