Split dates into time ranges in pandas
14 [2018-03-14, 2018-03-13, 2017-03-06, 2017-02-13]
15 [2017-07-26, 2017-06-09, 2017-02-24]
16 [2018-09-06, 2018-07-06, 2018-07-04, 2017-10-20]
17 [2018-10-03, 2018-09-13, 2018-09-12, 2018-08-3]
18 [2017-02-08]
This is my data: every ID has its own dates, which range between 2017-02-05 and 2018-06-30. I need to split the dates into 5 time ranges of 4 months each, so that for the first 4 months every ID has dates only in that time range (from 2017-02-05 to 2017-06-05), like this:
14 [2017-03-06, 2017-02-13]
15 [2017-02-24]
16 [null] # or delete empty rows, it doesn't matter
17 [null]
18 [2017-02-08]
Then the same for 2017-06-05 to 2017-10-05, and so on for every 4-month range. I can't use nested for loops because the data is too big. This is what I have tried so far:
months_4 = individual_dates.copy()
for _ in months_4['Date']:
    _ = np.where(pd.to_datetime(_) <= pd.to_datetime('2017-9-02'), _, np.datetime64('NaT'))
and
months_8 = individual_dates.copy()
range_8 = pd.date_range(start='2017-9-02', end='2017-11-02')
for _ in months_8['Date']:
    _ = _[np.isin(_, range_8)]
Neither attempt had any effect; the data stays the same no matter what.
Update: I did what you suggested:
individual_dates['Date'] = individual_dates['Date'].str.strip('').str.split(', ')
df = pd.DataFrame({
    'Date' : list(chain.from_iterable(individual_dates['Date'].tolist())),
    'ID' : individual_dates['ClientId'].repeat(individual_dates['Date'].str.len())
})
df
And here is the result:
Date ID
0 '2018-06-30T00:00:00.000000000' '2018-06-29T00... 14
1 '2017-03-28T00:00:00.000000000' '2017-03-27T00... 15
2 '2018-03-14T00:00:00.000000000' '2018-03-13T00... 16
3 '2017-12-14T00:00:00.000000000' '2017-03-28T00... 17
4 '2017-05-30T00:00:00.000000000' '2017-05-22T00... 18
5 '2017-03-28T00:00:00.000000000' '2017-03-27T00... 19
6 '2017-03-27T00:00:00.000000000' '2017-03-26T00... 20
7 '2017-12-15T00:00:00.000000000' '2017-11-20T00... 21
8 '2017-07-05T00:00:00.000000000' '2017-07-04T00... 22
9 '2017-12-12T00:00:00.000000000' '2017-04-06T00... 23
10 '2017-05-21T00:00:00.000000000' '2017-05-07T00... 24
python-3.x pandas numpy datetime
edited Nov 14 at 12:18
asked Nov 14 at 5:33
Mels Hakobyan
1 Answer
accepted
For better performance, I suggest converting the lists to a column (flattening them) and then filtering by isin with boolean indexing:
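from itertools import chain

import pandas as pd

# Flatten the lists: one row per date, with the ID repeated once per date
df = pd.DataFrame({
    'Date' : list(chain.from_iterable(individual_dates['Date'].tolist())),
    'ID' : individual_dates['ID'].repeat(individual_dates['Date'].str.len())
})
# Every day in the first 4-month range
range_8 = pd.date_range(start='2017-02-05', end='2017-06-05')
df['Date'] = pd.to_datetime(df['Date'])
# Keep only the rows whose date falls inside the range
df = df[df['Date'].isin(range_8)]
print (df)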
Date ID
0 2017-03-06 14
0 2017-02-13 14
1 2017-02-24 15
4 2017-02-08 18
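The filter above handles a single range. To cover every 4-month range in one pass, one possible approach (not part of the original answer) is to label each date with the index of the range it falls into using pd.cut, and then group by that label. The sketch below assumes the data is already flattened to one row per date; the sample frame, the column name Range and the variable names edges and per_range are made up for illustration.

```python
import pandas as pd

# Hypothetical flattened data: one row per (ID, Date), built from the question's sample
df = pd.DataFrame({
    "ID": [14, 14, 14, 14, 15, 15, 15, 18],
    "Date": pd.to_datetime([
        "2018-03-14", "2018-03-13", "2017-03-06", "2017-02-13",
        "2017-07-26", "2017-06-09", "2017-02-24", "2017-02-08",
    ]),
})

# Six edges, 4 months apart, starting at 2017-02-05 -> five ranges
start = pd.Timestamp("2017-02-05")
edges = pd.DatetimeIndex([start + pd.DateOffset(months=4 * i) for i in range(6)])

# labels=False returns the integer position of the range (0 = first 4 months).
# right=False makes each range [start, end), so an edge date belongs to the later range.
df["Range"] = pd.cut(df["Date"], bins=edges, right=False, labels=False)

# Collect the dates of every ID within every range, without Python-level nested loops
per_range = df.groupby(["Range", "ID"])["Date"].apply(list)
print(per_range)
```

Dates outside the edges get NaN as their range label, so they drop out of the groupby. Selecting per_range.loc[0] would give the lists for the first 4 months only.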
Use df['Date'] = df['Date'].str.strip('').str.split(', ')
– jezrael
Nov 14 at 11:50
I updated my question, you can see the results after I did what you said
– Mels Hakobyan
Nov 14 at 12:04
@MelsHakobyan - check comment above, under my question
– jezrael
Nov 14 at 12:08
yeah, I added that as well, still the same
– Mels Hakobyan
Nov 14 at 12:12
@MelsHakobyan - how does df['Date'] = df['Date'].str.strip('').str.split() work?
– jezrael
Nov 14 at 12:13
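Judging by the output in the question's update, each Date cell appears to be a single string holding quoted timestamps separated by spaces rather than by ', ', which would explain why splitting on ', ' left every row in one piece. A minimal sketch under that assumption is to pull the date tokens out with a regular expression and then explode them into rows. The sample strings below are hypothetical, since the real values are truncated in the output, and the approach was not confirmed in the thread.

```python
import pandas as pd

# Hypothetical sample: each Date cell is one string that looks like a printed array
individual_dates = pd.DataFrame({
    "ClientId": [14, 18],
    "Date": [
        "['2018-06-30T00:00:00.000000000' '2017-03-06T00:00:00.000000000']",
        "['2017-02-08T00:00:00.000000000']",
    ],
})

# Extract every YYYY-MM-DD token, regardless of the separator or quoting around it
dates = individual_dates["Date"].str.findall(r"\d{4}-\d{1,2}-\d{1,2}")

# One row per date; explode repeats the ClientId for each extracted token
df = (
    individual_dates.assign(Date=dates)
    .explode("Date")
    .rename(columns={"ClientId": "ID"})
    .reset_index(drop=True)
)
df["Date"] = pd.to_datetime(df["Date"])
print(df)
```

Once the column holds real datetimes, the isin filter from the answer can be applied unchanged.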
answered Nov 14 at 6:45
jezrael