Split dates into time ranges in pandas











14  [2018-03-14, 2018-03-13, 2017-03-06, 2017-02-13]
15 [2017-07-26, 2017-06-09, 2017-02-24]
16 [2018-09-06, 2018-07-06, 2018-07-04, 2017-10-20]
17 [2018-10-03, 2018-09-13, 2018-09-12, 2018-08-3]
18 [2017-02-08]


This is my data. Every ID has its own dates, which range between 2017-02-05 and 2018-06-30. I need to split the dates into 5 time ranges of 4 months each, so that for the first 4 months every ID has dates only in that time range (from 2017-02-05 to 2017-06-05), like this:



14  [2017-03-06, 2017-02-13]
15 [2017-02-24]
16 [null] # or delete empty rows, it doesn't matter
17 [null]
18 [2017-02-08]


Then the same for 2017-06-05 to 2017-10-05, and so on for every 4-month range. Also, I can't use nested for loops because the data is too big. This is what I have tried so far:



months_4 = individual_dates.copy()

for _ in months_4['Date']:
    _ = np.where(pd.to_datetime(_) <= pd.to_datetime('2017-9-02'), _, np.datetime64('NaT'))


and



months_8 = individual_dates.copy()
range_8 = pd.date_range(start='2017-9-02', end='2017-11-02')

for _ in months_8['Date']:
    _ = _[np.isin(_, range_8)]


Neither attempt had any effect: the data stays the same no matter what.



Update: I did what you suggested:



individual_dates['Date'] = individual_dates['Date'].str.strip('').str.split(', ')


df = pd.DataFrame({
    'Date' : list(chain.from_iterable(individual_dates['Date'].tolist())),
    'ID' : individual_dates['ClientId'].repeat(individual_dates['Date'].str.len())
})

df


and here is the result



Date    ID
0 '2018-06-30T00:00:00.000000000' '2018-06-29T00... 14
1 '2017-03-28T00:00:00.000000000' '2017-03-27T00... 15
2 '2018-03-14T00:00:00.000000000' '2018-03-13T00... 16
3 '2017-12-14T00:00:00.000000000' '2017-03-28T00... 17
4 '2017-05-30T00:00:00.000000000' '2017-05-22T00... 18
5 '2017-03-28T00:00:00.000000000' '2017-03-27T00... 19
6 '2017-03-27T00:00:00.000000000' '2017-03-26T00... 20
7 '2017-12-15T00:00:00.000000000' '2017-11-20T00... 21
8 '2017-07-05T00:00:00.000000000' '2017-07-04T00... 22
9 '2017-12-12T00:00:00.000000000' '2017-04-06T00... 23
10 '2017-05-21T00:00:00.000000000' '2017-05-07T00... 24









      python-3.x pandas numpy datetime






      edited Nov 14 at 12:18

























      asked Nov 14 at 5:33









      Mels Hakobyan

          1 Answer
          accepted










          For better performance, I suggest flattening the lists into a single column and then filtering with isin and boolean indexing:



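          import pandas as pd
          from itertools import chain

          # Flatten the lists: one row per date, with each ID repeated once per date
          df = pd.DataFrame({
              'Date' : list(chain.from_iterable(individual_dates['Date'].tolist())),
              'ID' : individual_dates['ID'].repeat(individual_dates['Date'].str.len())
          })

          # Every calendar day in the first 4-month range
          range_8 = pd.date_range(start='2017-02-05', end='2017-06-05')

          df['Date'] = pd.to_datetime(df['Date'])

          # Boolean indexing: keep only the rows whose date falls in the range
          df = df[df['Date'].isin(range_8)]
          print (df)

                  Date  ID
          0 2017-03-06  14
          0 2017-02-13  14
          1 2017-02-24  15
          4 2017-02-08  18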
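To cover every 4-month range in one pass instead of filtering one range at a time, the flattened dates can be binned with pd.cut. What follows is a minimal sketch and not part of the original answer: the sample frame, the names flat, edges and per_range, and the choice of half-open ranges (start included, end excluded) are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample mirroring the question's layout: one list of dates per ID.
individual_dates = pd.DataFrame({
    "ID": [14, 15, 16, 17, 18],
    "Date": [
        ["2018-03-14", "2018-03-13", "2017-03-06", "2017-02-13"],
        ["2017-07-26", "2017-06-09", "2017-02-24"],
        ["2018-09-06", "2018-07-06", "2018-07-04", "2017-10-20"],
        ["2018-10-03", "2018-09-13", "2018-09-12", "2018-08-03"],
        ["2017-02-08"],
    ],
})

# One row per (ID, date); DataFrame.explode needs pandas 0.25 or later.
flat = individual_dates.explode("Date").reset_index(drop=True)
flat["Date"] = pd.to_datetime(flat["Date"])

# Six edges, 4 months apart, delimit five consecutive ranges from 2017-02-05.
edges = pd.date_range(start="2017-02-05", periods=6,
                      freq=pd.DateOffset(months=4))

# labels=False returns the 0-based range number for each date.
# right=False makes each range [start, end) (an assumption about boundaries);
# dates outside all ranges get NaN and are dropped below.
flat["Range"] = pd.cut(flat["Date"], bins=edges, right=False, labels=False)
flat = flat.dropna(subset=["Range"]).astype({"Range": int})

# Collect the dates back into one list per (range, ID) pair.
per_range = flat.groupby(["Range", "ID"])["Date"].apply(list)

# Dates of every ID that has at least one date in the first range.
first_range = per_range.loc[0]
print(first_range)
```

IDs without any date in a given range simply do not appear under that range number, which matches the "or delete empty rows" option in the question.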





          • Use df['Date'] = df['Date'].str.strip('').str.split(', ')
            – jezrael
            Nov 14 at 11:50










          • I updated my question, you can see the results after I did what you said
            – Mels Hakobyan
            Nov 14 at 12:04










          • @MelsHakobyan - check comment above, under my question
            – jezrael
            Nov 14 at 12:08










          • yeah, I added that as well, still the same
            – Mels Hakobyan
            Nov 14 at 12:12










          • @MelsHakobyan - how does df['Date'] = df['Date'].str.strip('').str.split() work?
            – jezrael
            Nov 14 at 12:13
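The result shown in the question's update suggests that each cell may hold a single string in which the dates are quoted and separated by whitespace rather than by ", ", which would explain why splitting on ', ' leaves one long entry per row. This is only a guess from the printed output. Under that assumption, a regular expression can pull the dates out regardless of the separator. The sample strings and the names tokens and first below are hypothetical.

```python
import pandas as pd

# Hypothetical input: each cell is the text form of an array of timestamps,
# with the values quoted and separated by spaces (assumed from the question).
individual_dates = pd.DataFrame({
    "ClientId": [14, 15],
    "Date": [
        "['2018-06-30T00:00:00.000000000' '2017-03-06T00:00:00.000000000']",
        "['2017-03-28T00:00:00.000000000']",
    ],
})

# Extract every YYYY-MM-DD token from each string: one list of dates per row.
tokens = individual_dates["Date"].str.findall(r"\d{4}-\d{2}-\d{2}")

# One row per (ClientId, date), then convert the text to real datetimes.
df = (individual_dates.assign(Date=tokens)
      .explode("Date")
      .reset_index(drop=True))
df["Date"] = pd.to_datetime(df["Date"])

# Keep only the first 4-month range (both bounds inclusive, as with isin above).
first = df[df["Date"].between(pd.Timestamp("2017-02-05"),
                              pd.Timestamp("2017-06-05"))]
print(first)
```

If the column already holds real lists or arrays rather than strings, the findall step is unnecessary and explode can be applied directly.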











          answered Nov 14 at 6:45









          jezrael
