Filling in past and future data from partial data in Python
up vote
0
down vote
favorite
I have take a cumulative sum for data that goes up from 198x to 2016 and is now in the form:
State Year Month Value
TN 1987 1 24410.0
TN 1987 2 24410.0
TN 1987 3 24410.0
TN 1987 4 24410.0
.
.
TN 1996 1 24410.0
TN 1996 2 24410.0
TN 1996 3 24410.0
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
TN 1996 7 37109.0
TN 1996 8 37109.0
TN 1996 9 37109.0
TN 1996 10 37109.0
TN 1996 11 37109.0
TN 1996 12 37109.0
TN 2016 1 49808.0
TN 2016 2 49808.0
The data actually skips from 1996 to 2016 (for the case of TN but varies on State to State). I need to find a method to generally fill all the missing holes in the data because some years just don't exist (2010-2015) and I want to fill them so that the output goes all the way to 2018.
I want the missing values to be filled with values preceding from the values before to get an output that looks like:
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
.
.
TN 2010 1 37109.0
TN 2010 2 37109.0
TN 2010 3 37109.0
.
.
TN 2016 1 37109.0
TN 2016 2 37109.0
.
.
TN 2016 11 49808.0
TN 2016 12 49808.0
.
.
TN 2017 1 49808.0
TN 2017 2 49808.0
TN 2017 3 49808.0
TN 2017 4 49808.0
.
.
TN 2018 1 49808.0
TN 2018 2 49808.0
python python-3.x pandas dataframe missing-data
add a comment |
up vote
0
down vote
favorite
I have take a cumulative sum for data that goes up from 198x to 2016 and is now in the form:
State Year Month Value
TN 1987 1 24410.0
TN 1987 2 24410.0
TN 1987 3 24410.0
TN 1987 4 24410.0
.
.
TN 1996 1 24410.0
TN 1996 2 24410.0
TN 1996 3 24410.0
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
TN 1996 7 37109.0
TN 1996 8 37109.0
TN 1996 9 37109.0
TN 1996 10 37109.0
TN 1996 11 37109.0
TN 1996 12 37109.0
TN 2016 1 49808.0
TN 2016 2 49808.0
The data actually skips from 1996 to 2016 (for the case of TN but varies on State to State). I need to find a method to generally fill all the missing holes in the data because some years just don't exist (2010-2015) and I want to fill them so that the output goes all the way to 2018.
I want the missing values to be filled with values preceding from the values before to get an output that looks like:
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
.
.
TN 2010 1 37109.0
TN 2010 2 37109.0
TN 2010 3 37109.0
.
.
TN 2016 1 37109.0
TN 2016 2 37109.0
.
.
TN 2016 11 49808.0
TN 2016 12 49808.0
.
.
TN 2017 1 49808.0
TN 2017 2 49808.0
TN 2017 3 49808.0
TN 2017 4 49808.0
.
.
TN 2018 1 49808.0
TN 2018 2 49808.0
python python-3.x pandas dataframe missing-data
1
Have you tried any methods yet? Look atfillna()
(pandas.pydata.org/pandas-docs/stable/generated/…)
– soundstripe
Nov 13 at 17:36
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
I have take a cumulative sum for data that goes up from 198x to 2016 and is now in the form:
State Year Month Value
TN 1987 1 24410.0
TN 1987 2 24410.0
TN 1987 3 24410.0
TN 1987 4 24410.0
.
.
TN 1996 1 24410.0
TN 1996 2 24410.0
TN 1996 3 24410.0
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
TN 1996 7 37109.0
TN 1996 8 37109.0
TN 1996 9 37109.0
TN 1996 10 37109.0
TN 1996 11 37109.0
TN 1996 12 37109.0
TN 2016 1 49808.0
TN 2016 2 49808.0
The data actually skips from 1996 to 2016 (for the case of TN but varies on State to State). I need to find a method to generally fill all the missing holes in the data because some years just don't exist (2010-2015) and I want to fill them so that the output goes all the way to 2018.
I want the missing values to be filled with values preceding from the values before to get an output that looks like:
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
.
.
TN 2010 1 37109.0
TN 2010 2 37109.0
TN 2010 3 37109.0
.
.
TN 2016 1 37109.0
TN 2016 2 37109.0
.
.
TN 2016 11 49808.0
TN 2016 12 49808.0
.
.
TN 2017 1 49808.0
TN 2017 2 49808.0
TN 2017 3 49808.0
TN 2017 4 49808.0
.
.
TN 2018 1 49808.0
TN 2018 2 49808.0
python python-3.x pandas dataframe missing-data
I have take a cumulative sum for data that goes up from 198x to 2016 and is now in the form:
State Year Month Value
TN 1987 1 24410.0
TN 1987 2 24410.0
TN 1987 3 24410.0
TN 1987 4 24410.0
.
.
TN 1996 1 24410.0
TN 1996 2 24410.0
TN 1996 3 24410.0
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
TN 1996 7 37109.0
TN 1996 8 37109.0
TN 1996 9 37109.0
TN 1996 10 37109.0
TN 1996 11 37109.0
TN 1996 12 37109.0
TN 2016 1 49808.0
TN 2016 2 49808.0
The data actually skips from 1996 to 2016 (for the case of TN but varies on State to State). I need to find a method to generally fill all the missing holes in the data because some years just don't exist (2010-2015) and I want to fill them so that the output goes all the way to 2018.
I want the missing values to be filled with values preceding from the values before to get an output that looks like:
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
.
.
TN 2010 1 37109.0
TN 2010 2 37109.0
TN 2010 3 37109.0
.
.
TN 2016 1 37109.0
TN 2016 2 37109.0
.
.
TN 2016 11 49808.0
TN 2016 12 49808.0
.
.
TN 2017 1 49808.0
TN 2017 2 49808.0
TN 2017 3 49808.0
TN 2017 4 49808.0
.
.
TN 2018 1 49808.0
TN 2018 2 49808.0
python python-3.x pandas dataframe missing-data
python python-3.x pandas dataframe missing-data
asked Nov 13 at 17:32
HelloToEarth
346210
346210
1
Have you tried any methods yet? Look atfillna()
(pandas.pydata.org/pandas-docs/stable/generated/…)
– soundstripe
Nov 13 at 17:36
add a comment |
1
Have you tried any methods yet? Look atfillna()
(pandas.pydata.org/pandas-docs/stable/generated/…)
– soundstripe
Nov 13 at 17:36
1
1
Have you tried any methods yet? Look at
fillna()
(pandas.pydata.org/pandas-docs/stable/generated/…)– soundstripe
Nov 13 at 17:36
Have you tried any methods yet? Look at
fillna()
(pandas.pydata.org/pandas-docs/stable/generated/…)– soundstripe
Nov 13 at 17:36
add a comment |
2 Answers
2
active
oldest
votes
up vote
0
down vote
How about pandas.interpolate
?: interpolate values according to different methods
See section 'interpolate' here: https://pandas.pydata.org/pandas-docs/stable/missing_data.html
And some existing example previously posted: Pandas interpolate() backwards in dataframe
add a comment |
up vote
0
down vote
You can create a dataframe with the missing months and them merge your result with it:
dates = pd.date_range(start='1/1/%d' %df['Year'].min(),
end='1/08/%d' %df['Year'].max(),
freq='MS', closed='left')
>> dates
DatetimeIndex(['1987-02-01', '1987-03-01', '1987-04-01', '1987-05-01',
'1987-06-01', '1987-07-01', '1987-08-01', '1987-09-01',
'1987-10-01', '1987-11-01',
...
'2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
'2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
'2015-12-01', '2016-01-01'],
dtype='datetime64[ns]', length=348, freq='MS')
Then you can create a dataframe with all the months:
all_months = pd.DataFrame.from_records((dates.year, dates.month),
index=['Year', 'Month']).T.sort_values(by=['Year', 'Month'])
And then merge it with the original dataframe and forward-fill it:
df.merge(all_months, how='right').ffill()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1996.0 1.0 24410.0
5 TN 1996.0 2.0 24410.0
6 TN 1996.0 3.0 24410.0
7 TN 1996.0 4.0 24410.0
8 TN 1996.0 5.0 37109.0
9 TN 1996.0 6.0 37109.0
10 TN 1996.0 7.0 37109.0
11 TN 1996.0 8.0 37109.0
12 TN 1996.0 9.0 37109.0
13 TN 1996.0 10.0 37109.0
14 TN 1996.0 11.0 37109.0
15 TN 1996.0 12.0 37109.0
16 TN 2016.0 1.0 49808.0
17 TN 1987.0 5.0 49808.0
18 TN 1987.0 6.0 49808.0
19 TN 1987.0 7.0 49808.0
20 TN 1987.0 8.0 49808.0
21 TN 1987.0 9.0 49808.0
22 TN 1987.0 10.0 49808.0
23 TN 1987.0 11.0 49808.0
24 TN 1987.0 12.0 49808.0
25 TN 1988.0 1.0 49808.0
26 TN 1988.0 2.0 49808.0
27 TN 1988.0 3.0 49808.0
28 TN 1988.0 4.0 49808.0
29 TN 1988.0 5.0 49808.0
.. ... ... ... ...
319 TN 2013.0 7.0 49808.0
320 TN 2013.0 8.0 49808.0
321 TN 2013.0 9.0 49808.0
322 TN 2013.0 10.0 49808.0
323 TN 2013.0 11.0 49808.0
324 TN 2013.0 12.0 49808.0
325 TN 2014.0 1.0 49808.0
326 TN 2014.0 2.0 49808.0
327 TN 2014.0 3.0 49808.0
328 TN 2014.0 4.0 49808.0
329 TN 2014.0 5.0 49808.0
330 TN 2014.0 6.0 49808.0
331 TN 2014.0 7.0 49808.0
332 TN 2014.0 8.0 49808.0
333 TN 2014.0 9.0 49808.0
334 TN 2014.0 10.0 49808.0
335 TN 2014.0 11.0 49808.0
336 TN 2014.0 12.0 49808.0
337 TN 2015.0 1.0 49808.0
338 TN 2015.0 2.0 49808.0
339 TN 2015.0 3.0 49808.0
340 TN 2015.0 4.0 49808.0
341 TN 2015.0 5.0 49808.0
342 TN 2015.0 6.0 49808.0
343 TN 2015.0 7.0 49808.0
344 TN 2015.0 8.0 49808.0
345 TN 2015.0 9.0 49808.0
346 TN 2015.0 10.0 49808.0
347 TN 2015.0 11.0 49808.0
348 TN 2015.0 12.0 49808.0
Using pandas.resample
Another solution is index by date and then resample there:
df['Day'] = 1
df1 = df.assign(date= lambda x:pd.to_datetime(x[['Year', 'Month', 'Day']])).set_index('date')
>> df1
State Year Month Value Day
date
1987-01-01 TN 1987.0 1.0 24410.0 1
1987-02-01 TN 1987.0 2.0 24410.0 1
1987-03-01 TN 1987.0 3.0 24410.0 1
1987-04-01 TN 1987.0 4.0 24410.0 1
1996-01-01 TN 1996.0 1.0 24410.0 1
1996-02-01 TN 1996.0 2.0 24410.0 1
1996-03-01 TN 1996.0 3.0 24410.0 1
1996-04-01 TN 1996.0 4.0 24410.0 1
1996-05-01 TN 1996.0 5.0 37109.0 1
1996-06-01 TN 1996.0 6.0 37109.0 1
1996-07-01 TN 1996.0 7.0 37109.0 1
1996-08-01 TN 1996.0 8.0 37109.0 1
1996-09-01 TN 1996.0 9.0 37109.0 1
1996-10-01 TN 1996.0 10.0 37109.0 1
1996-11-01 TN 1996.0 11.0 37109.0 1
1996-12-01 TN 1996.0 12.0 37109.0 1
2016-01-01 TN 2016.0 1.0 49808.0 1
2016-02-01 TN 2016.0 2.0 49808.0 1
Then you can resample it by month by doing:
res = df1.resample('M').first().ffill()
>> res
State Year Month Value Day
date
1987-01-31 TN 1987.0 1.0 24410.0 1.0
1987-02-28 TN 1987.0 2.0 24410.0 1.0
1987-03-31 TN 1987.0 3.0 24410.0 1.0
1987-04-30 TN 1987.0 4.0 24410.0 1.0
1987-05-31 TN 1987.0 4.0 24410.0 1.0
1987-06-30 TN 1987.0 4.0 24410.0 1.0
1987-07-31 TN 1987.0 4.0 24410.0 1.0
1987-08-31 TN 1987.0 4.0 24410.0 1.0
1987-09-30 TN 1987.0 4.0 24410.0 1.0
1987-10-31 TN 1987.0 4.0 24410.0 1.0
1987-11-30 TN 1987.0 4.0 24410.0 1.0
1987-12-31 TN 1987.0 4.0 24410.0 1.0
1988-01-31 TN 1987.0 4.0 24410.0 1.0
1988-02-29 TN 1987.0 4.0 24410.0 1.0
1988-03-31 TN 1987.0 4.0 24410.0 1.0
1988-04-30 TN 1987.0 4.0 24410.0 1.0
1988-05-31 TN 1987.0 4.0 24410.0 1.0
1988-06-30 TN 1987.0 4.0 24410.0 1.0
1988-07-31 TN 1987.0 4.0 24410.0 1.0
1988-08-31 TN 1987.0 4.0 24410.0 1.0
1988-09-30 TN 1987.0 4.0 24410.0 1.0
1988-10-31 TN 1987.0 4.0 24410.0 1.0
1988-11-30 TN 1987.0 4.0 24410.0 1.0
1988-12-31 TN 1987.0 4.0 24410.0 1.0
1989-01-31 TN 1987.0 4.0 24410.0 1.0
1989-02-28 TN 1987.0 4.0 24410.0 1.0
1989-03-31 TN 1987.0 4.0 24410.0 1.0
1989-04-30 TN 1987.0 4.0 24410.0 1.0
1989-05-31 TN 1987.0 4.0 24410.0 1.0
1989-06-30 TN 1987.0 4.0 24410.0 1.0
... ... ... ... ... ...
2013-09-30 TN 1996.0 12.0 37109.0 1.0
2013-10-31 TN 1996.0 12.0 37109.0 1.0
2013-11-30 TN 1996.0 12.0 37109.0 1.0
2013-12-31 TN 1996.0 12.0 37109.0 1.0
2014-01-31 TN 1996.0 12.0 37109.0 1.0
2014-02-28 TN 1996.0 12.0 37109.0 1.0
2014-03-31 TN 1996.0 12.0 37109.0 1.0
2014-04-30 TN 1996.0 12.0 37109.0 1.0
2014-05-31 TN 1996.0 12.0 37109.0 1.0
2014-06-30 TN 1996.0 12.0 37109.0 1.0
2014-07-31 TN 1996.0 12.0 37109.0 1.0
2014-08-31 TN 1996.0 12.0 37109.0 1.0
2014-09-30 TN 1996.0 12.0 37109.0 1.0
2014-10-31 TN 1996.0 12.0 37109.0 1.0
2014-11-30 TN 1996.0 12.0 37109.0 1.0
2014-12-31 TN 1996.0 12.0 37109.0 1.0
2015-01-31 TN 1996.0 12.0 37109.0 1.0
2015-02-28 TN 1996.0 12.0 37109.0 1.0
2015-03-31 TN 1996.0 12.0 37109.0 1.0
2015-04-30 TN 1996.0 12.0 37109.0 1.0
2015-05-31 TN 1996.0 12.0 37109.0 1.0
2015-06-30 TN 1996.0 12.0 37109.0 1.0
2015-07-31 TN 1996.0 12.0 37109.0 1.0
2015-08-31 TN 1996.0 12.0 37109.0 1.0
2015-09-30 TN 1996.0 12.0 37109.0 1.0
2015-10-31 TN 1996.0 12.0 37109.0 1.0
2015-11-30 TN 1996.0 12.0 37109.0 1.0
2015-12-31 TN 1996.0 12.0 37109.0 1.0
2016-01-31 TN 2016.0 1.0 49808.0 1.0
2016-02-29 TN 2016.0 2.0 49808.0 1.0
You can get the original structure by doing:
>> res.reset_index(drop=True).drop(['Day'], axis=1).head()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1987.0 4.0 24410.0
5 TN 1987.0 4.0 24410.0
6 TN 1987.0 4.0 24410.0
7 TN 1987.0 4.0 24410.0
8 TN 1987.0 4.0 24410.0
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
0
down vote
How about pandas.interpolate
?: interpolate values according to different methods
See section 'interpolate' here: https://pandas.pydata.org/pandas-docs/stable/missing_data.html
And some existing example previously posted: Pandas interpolate() backwards in dataframe
add a comment |
up vote
0
down vote
How about pandas.interpolate
?: interpolate values according to different methods
See section 'interpolate' here: https://pandas.pydata.org/pandas-docs/stable/missing_data.html
And some existing example previously posted: Pandas interpolate() backwards in dataframe
add a comment |
up vote
0
down vote
up vote
0
down vote
How about pandas.interpolate
?: interpolate values according to different methods
See section 'interpolate' here: https://pandas.pydata.org/pandas-docs/stable/missing_data.html
And some existing example previously posted: Pandas interpolate() backwards in dataframe
How about pandas.interpolate
?: interpolate values according to different methods
See section 'interpolate' here: https://pandas.pydata.org/pandas-docs/stable/missing_data.html
And some existing example previously posted: Pandas interpolate() backwards in dataframe
answered Nov 13 at 18:00
kon_u
1616
1616
add a comment |
add a comment |
up vote
0
down vote
You can create a dataframe with the missing months and them merge your result with it:
dates = pd.date_range(start='1/1/%d' %df['Year'].min(),
end='1/08/%d' %df['Year'].max(),
freq='MS', closed='left')
>> dates
DatetimeIndex(['1987-02-01', '1987-03-01', '1987-04-01', '1987-05-01',
'1987-06-01', '1987-07-01', '1987-08-01', '1987-09-01',
'1987-10-01', '1987-11-01',
...
'2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
'2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
'2015-12-01', '2016-01-01'],
dtype='datetime64[ns]', length=348, freq='MS')
Then you can create a dataframe with all the months:
all_months = pd.DataFrame.from_records((dates.year, dates.month),
index=['Year', 'Month']).T.sort_values(by=['Year', 'Month'])
And then merge it with the original dataframe and forward-fill it:
df.merge(all_months, how='right').ffill()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1996.0 1.0 24410.0
5 TN 1996.0 2.0 24410.0
6 TN 1996.0 3.0 24410.0
7 TN 1996.0 4.0 24410.0
8 TN 1996.0 5.0 37109.0
9 TN 1996.0 6.0 37109.0
10 TN 1996.0 7.0 37109.0
11 TN 1996.0 8.0 37109.0
12 TN 1996.0 9.0 37109.0
13 TN 1996.0 10.0 37109.0
14 TN 1996.0 11.0 37109.0
15 TN 1996.0 12.0 37109.0
16 TN 2016.0 1.0 49808.0
17 TN 1987.0 5.0 49808.0
18 TN 1987.0 6.0 49808.0
19 TN 1987.0 7.0 49808.0
20 TN 1987.0 8.0 49808.0
21 TN 1987.0 9.0 49808.0
22 TN 1987.0 10.0 49808.0
23 TN 1987.0 11.0 49808.0
24 TN 1987.0 12.0 49808.0
25 TN 1988.0 1.0 49808.0
26 TN 1988.0 2.0 49808.0
27 TN 1988.0 3.0 49808.0
28 TN 1988.0 4.0 49808.0
29 TN 1988.0 5.0 49808.0
.. ... ... ... ...
319 TN 2013.0 7.0 49808.0
320 TN 2013.0 8.0 49808.0
321 TN 2013.0 9.0 49808.0
322 TN 2013.0 10.0 49808.0
323 TN 2013.0 11.0 49808.0
324 TN 2013.0 12.0 49808.0
325 TN 2014.0 1.0 49808.0
326 TN 2014.0 2.0 49808.0
327 TN 2014.0 3.0 49808.0
328 TN 2014.0 4.0 49808.0
329 TN 2014.0 5.0 49808.0
330 TN 2014.0 6.0 49808.0
331 TN 2014.0 7.0 49808.0
332 TN 2014.0 8.0 49808.0
333 TN 2014.0 9.0 49808.0
334 TN 2014.0 10.0 49808.0
335 TN 2014.0 11.0 49808.0
336 TN 2014.0 12.0 49808.0
337 TN 2015.0 1.0 49808.0
338 TN 2015.0 2.0 49808.0
339 TN 2015.0 3.0 49808.0
340 TN 2015.0 4.0 49808.0
341 TN 2015.0 5.0 49808.0
342 TN 2015.0 6.0 49808.0
343 TN 2015.0 7.0 49808.0
344 TN 2015.0 8.0 49808.0
345 TN 2015.0 9.0 49808.0
346 TN 2015.0 10.0 49808.0
347 TN 2015.0 11.0 49808.0
348 TN 2015.0 12.0 49808.0
Using pandas.resample
Another solution is index by date and then resample there:
df['Day'] = 1
df1 = df.assign(date= lambda x:pd.to_datetime(x[['Year', 'Month', 'Day']])).set_index('date')
>> df1
State Year Month Value Day
date
1987-01-01 TN 1987.0 1.0 24410.0 1
1987-02-01 TN 1987.0 2.0 24410.0 1
1987-03-01 TN 1987.0 3.0 24410.0 1
1987-04-01 TN 1987.0 4.0 24410.0 1
1996-01-01 TN 1996.0 1.0 24410.0 1
1996-02-01 TN 1996.0 2.0 24410.0 1
1996-03-01 TN 1996.0 3.0 24410.0 1
1996-04-01 TN 1996.0 4.0 24410.0 1
1996-05-01 TN 1996.0 5.0 37109.0 1
1996-06-01 TN 1996.0 6.0 37109.0 1
1996-07-01 TN 1996.0 7.0 37109.0 1
1996-08-01 TN 1996.0 8.0 37109.0 1
1996-09-01 TN 1996.0 9.0 37109.0 1
1996-10-01 TN 1996.0 10.0 37109.0 1
1996-11-01 TN 1996.0 11.0 37109.0 1
1996-12-01 TN 1996.0 12.0 37109.0 1
2016-01-01 TN 2016.0 1.0 49808.0 1
2016-02-01 TN 2016.0 2.0 49808.0 1
Then you can resample it by month by doing:
res = df1.resample('M').first().ffill()
>> res
State Year Month Value Day
date
1987-01-31 TN 1987.0 1.0 24410.0 1.0
1987-02-28 TN 1987.0 2.0 24410.0 1.0
1987-03-31 TN 1987.0 3.0 24410.0 1.0
1987-04-30 TN 1987.0 4.0 24410.0 1.0
1987-05-31 TN 1987.0 4.0 24410.0 1.0
1987-06-30 TN 1987.0 4.0 24410.0 1.0
1987-07-31 TN 1987.0 4.0 24410.0 1.0
1987-08-31 TN 1987.0 4.0 24410.0 1.0
1987-09-30 TN 1987.0 4.0 24410.0 1.0
1987-10-31 TN 1987.0 4.0 24410.0 1.0
1987-11-30 TN 1987.0 4.0 24410.0 1.0
1987-12-31 TN 1987.0 4.0 24410.0 1.0
1988-01-31 TN 1987.0 4.0 24410.0 1.0
1988-02-29 TN 1987.0 4.0 24410.0 1.0
1988-03-31 TN 1987.0 4.0 24410.0 1.0
1988-04-30 TN 1987.0 4.0 24410.0 1.0
1988-05-31 TN 1987.0 4.0 24410.0 1.0
1988-06-30 TN 1987.0 4.0 24410.0 1.0
1988-07-31 TN 1987.0 4.0 24410.0 1.0
1988-08-31 TN 1987.0 4.0 24410.0 1.0
1988-09-30 TN 1987.0 4.0 24410.0 1.0
1988-10-31 TN 1987.0 4.0 24410.0 1.0
1988-11-30 TN 1987.0 4.0 24410.0 1.0
1988-12-31 TN 1987.0 4.0 24410.0 1.0
1989-01-31 TN 1987.0 4.0 24410.0 1.0
1989-02-28 TN 1987.0 4.0 24410.0 1.0
1989-03-31 TN 1987.0 4.0 24410.0 1.0
1989-04-30 TN 1987.0 4.0 24410.0 1.0
1989-05-31 TN 1987.0 4.0 24410.0 1.0
1989-06-30 TN 1987.0 4.0 24410.0 1.0
... ... ... ... ... ...
2013-09-30 TN 1996.0 12.0 37109.0 1.0
2013-10-31 TN 1996.0 12.0 37109.0 1.0
2013-11-30 TN 1996.0 12.0 37109.0 1.0
2013-12-31 TN 1996.0 12.0 37109.0 1.0
2014-01-31 TN 1996.0 12.0 37109.0 1.0
2014-02-28 TN 1996.0 12.0 37109.0 1.0
2014-03-31 TN 1996.0 12.0 37109.0 1.0
2014-04-30 TN 1996.0 12.0 37109.0 1.0
2014-05-31 TN 1996.0 12.0 37109.0 1.0
2014-06-30 TN 1996.0 12.0 37109.0 1.0
2014-07-31 TN 1996.0 12.0 37109.0 1.0
2014-08-31 TN 1996.0 12.0 37109.0 1.0
2014-09-30 TN 1996.0 12.0 37109.0 1.0
2014-10-31 TN 1996.0 12.0 37109.0 1.0
2014-11-30 TN 1996.0 12.0 37109.0 1.0
2014-12-31 TN 1996.0 12.0 37109.0 1.0
2015-01-31 TN 1996.0 12.0 37109.0 1.0
2015-02-28 TN 1996.0 12.0 37109.0 1.0
2015-03-31 TN 1996.0 12.0 37109.0 1.0
2015-04-30 TN 1996.0 12.0 37109.0 1.0
2015-05-31 TN 1996.0 12.0 37109.0 1.0
2015-06-30 TN 1996.0 12.0 37109.0 1.0
2015-07-31 TN 1996.0 12.0 37109.0 1.0
2015-08-31 TN 1996.0 12.0 37109.0 1.0
2015-09-30 TN 1996.0 12.0 37109.0 1.0
2015-10-31 TN 1996.0 12.0 37109.0 1.0
2015-11-30 TN 1996.0 12.0 37109.0 1.0
2015-12-31 TN 1996.0 12.0 37109.0 1.0
2016-01-31 TN 2016.0 1.0 49808.0 1.0
2016-02-29 TN 2016.0 2.0 49808.0 1.0
You can get the original structure by doing:
>> res.reset_index(drop=True).drop(['Day'], axis=1).head()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1987.0 4.0 24410.0
5 TN 1987.0 4.0 24410.0
6 TN 1987.0 4.0 24410.0
7 TN 1987.0 4.0 24410.0
8 TN 1987.0 4.0 24410.0
add a comment |
up vote
0
down vote
You can create a dataframe with the missing months and them merge your result with it:
dates = pd.date_range(start='1/1/%d' %df['Year'].min(),
end='1/08/%d' %df['Year'].max(),
freq='MS', closed='left')
>> dates
DatetimeIndex(['1987-02-01', '1987-03-01', '1987-04-01', '1987-05-01',
'1987-06-01', '1987-07-01', '1987-08-01', '1987-09-01',
'1987-10-01', '1987-11-01',
...
'2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
'2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
'2015-12-01', '2016-01-01'],
dtype='datetime64[ns]', length=348, freq='MS')
Then you can create a dataframe with all the months:
all_months = pd.DataFrame.from_records((dates.year, dates.month),
index=['Year', 'Month']).T.sort_values(by=['Year', 'Month'])
And then merge it with the original dataframe and forward-fill it:
df.merge(all_months, how='right').ffill()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1996.0 1.0 24410.0
5 TN 1996.0 2.0 24410.0
6 TN 1996.0 3.0 24410.0
7 TN 1996.0 4.0 24410.0
8 TN 1996.0 5.0 37109.0
9 TN 1996.0 6.0 37109.0
10 TN 1996.0 7.0 37109.0
11 TN 1996.0 8.0 37109.0
12 TN 1996.0 9.0 37109.0
13 TN 1996.0 10.0 37109.0
14 TN 1996.0 11.0 37109.0
15 TN 1996.0 12.0 37109.0
16 TN 2016.0 1.0 49808.0
17 TN 1987.0 5.0 49808.0
18 TN 1987.0 6.0 49808.0
19 TN 1987.0 7.0 49808.0
20 TN 1987.0 8.0 49808.0
21 TN 1987.0 9.0 49808.0
22 TN 1987.0 10.0 49808.0
23 TN 1987.0 11.0 49808.0
24 TN 1987.0 12.0 49808.0
25 TN 1988.0 1.0 49808.0
26 TN 1988.0 2.0 49808.0
27 TN 1988.0 3.0 49808.0
28 TN 1988.0 4.0 49808.0
29 TN 1988.0 5.0 49808.0
.. ... ... ... ...
319 TN 2013.0 7.0 49808.0
320 TN 2013.0 8.0 49808.0
321 TN 2013.0 9.0 49808.0
322 TN 2013.0 10.0 49808.0
323 TN 2013.0 11.0 49808.0
324 TN 2013.0 12.0 49808.0
325 TN 2014.0 1.0 49808.0
326 TN 2014.0 2.0 49808.0
327 TN 2014.0 3.0 49808.0
328 TN 2014.0 4.0 49808.0
329 TN 2014.0 5.0 49808.0
330 TN 2014.0 6.0 49808.0
331 TN 2014.0 7.0 49808.0
332 TN 2014.0 8.0 49808.0
333 TN 2014.0 9.0 49808.0
334 TN 2014.0 10.0 49808.0
335 TN 2014.0 11.0 49808.0
336 TN 2014.0 12.0 49808.0
337 TN 2015.0 1.0 49808.0
338 TN 2015.0 2.0 49808.0
339 TN 2015.0 3.0 49808.0
340 TN 2015.0 4.0 49808.0
341 TN 2015.0 5.0 49808.0
342 TN 2015.0 6.0 49808.0
343 TN 2015.0 7.0 49808.0
344 TN 2015.0 8.0 49808.0
345 TN 2015.0 9.0 49808.0
346 TN 2015.0 10.0 49808.0
347 TN 2015.0 11.0 49808.0
348 TN 2015.0 12.0 49808.0
Using pandas.resample
Another solution is index by date and then resample there:
df['Day'] = 1
df1 = df.assign(date= lambda x:pd.to_datetime(x[['Year', 'Month', 'Day']])).set_index('date')
>> df1
State Year Month Value Day
date
1987-01-01 TN 1987.0 1.0 24410.0 1
1987-02-01 TN 1987.0 2.0 24410.0 1
1987-03-01 TN 1987.0 3.0 24410.0 1
1987-04-01 TN 1987.0 4.0 24410.0 1
1996-01-01 TN 1996.0 1.0 24410.0 1
1996-02-01 TN 1996.0 2.0 24410.0 1
1996-03-01 TN 1996.0 3.0 24410.0 1
1996-04-01 TN 1996.0 4.0 24410.0 1
1996-05-01 TN 1996.0 5.0 37109.0 1
1996-06-01 TN 1996.0 6.0 37109.0 1
1996-07-01 TN 1996.0 7.0 37109.0 1
1996-08-01 TN 1996.0 8.0 37109.0 1
1996-09-01 TN 1996.0 9.0 37109.0 1
1996-10-01 TN 1996.0 10.0 37109.0 1
1996-11-01 TN 1996.0 11.0 37109.0 1
1996-12-01 TN 1996.0 12.0 37109.0 1
2016-01-01 TN 2016.0 1.0 49808.0 1
2016-02-01 TN 2016.0 2.0 49808.0 1
Then you can resample it by month by doing:
res = df1.resample('M').first().ffill()
>> res
State Year Month Value Day
date
1987-01-31 TN 1987.0 1.0 24410.0 1.0
1987-02-28 TN 1987.0 2.0 24410.0 1.0
1987-03-31 TN 1987.0 3.0 24410.0 1.0
1987-04-30 TN 1987.0 4.0 24410.0 1.0
1987-05-31 TN 1987.0 4.0 24410.0 1.0
1987-06-30 TN 1987.0 4.0 24410.0 1.0
1987-07-31 TN 1987.0 4.0 24410.0 1.0
1987-08-31 TN 1987.0 4.0 24410.0 1.0
1987-09-30 TN 1987.0 4.0 24410.0 1.0
1987-10-31 TN 1987.0 4.0 24410.0 1.0
1987-11-30 TN 1987.0 4.0 24410.0 1.0
1987-12-31 TN 1987.0 4.0 24410.0 1.0
1988-01-31 TN 1987.0 4.0 24410.0 1.0
1988-02-29 TN 1987.0 4.0 24410.0 1.0
1988-03-31 TN 1987.0 4.0 24410.0 1.0
1988-04-30 TN 1987.0 4.0 24410.0 1.0
1988-05-31 TN 1987.0 4.0 24410.0 1.0
1988-06-30 TN 1987.0 4.0 24410.0 1.0
1988-07-31 TN 1987.0 4.0 24410.0 1.0
1988-08-31 TN 1987.0 4.0 24410.0 1.0
1988-09-30 TN 1987.0 4.0 24410.0 1.0
1988-10-31 TN 1987.0 4.0 24410.0 1.0
1988-11-30 TN 1987.0 4.0 24410.0 1.0
1988-12-31 TN 1987.0 4.0 24410.0 1.0
1989-01-31 TN 1987.0 4.0 24410.0 1.0
1989-02-28 TN 1987.0 4.0 24410.0 1.0
1989-03-31 TN 1987.0 4.0 24410.0 1.0
1989-04-30 TN 1987.0 4.0 24410.0 1.0
1989-05-31 TN 1987.0 4.0 24410.0 1.0
1989-06-30 TN 1987.0 4.0 24410.0 1.0
... ... ... ... ... ...
2013-09-30 TN 1996.0 12.0 37109.0 1.0
2013-10-31 TN 1996.0 12.0 37109.0 1.0
2013-11-30 TN 1996.0 12.0 37109.0 1.0
2013-12-31 TN 1996.0 12.0 37109.0 1.0
2014-01-31 TN 1996.0 12.0 37109.0 1.0
2014-02-28 TN 1996.0 12.0 37109.0 1.0
2014-03-31 TN 1996.0 12.0 37109.0 1.0
2014-04-30 TN 1996.0 12.0 37109.0 1.0
2014-05-31 TN 1996.0 12.0 37109.0 1.0
2014-06-30 TN 1996.0 12.0 37109.0 1.0
2014-07-31 TN 1996.0 12.0 37109.0 1.0
2014-08-31 TN 1996.0 12.0 37109.0 1.0
2014-09-30 TN 1996.0 12.0 37109.0 1.0
2014-10-31 TN 1996.0 12.0 37109.0 1.0
2014-11-30 TN 1996.0 12.0 37109.0 1.0
2014-12-31 TN 1996.0 12.0 37109.0 1.0
2015-01-31 TN 1996.0 12.0 37109.0 1.0
2015-02-28 TN 1996.0 12.0 37109.0 1.0
2015-03-31 TN 1996.0 12.0 37109.0 1.0
2015-04-30 TN 1996.0 12.0 37109.0 1.0
2015-05-31 TN 1996.0 12.0 37109.0 1.0
2015-06-30 TN 1996.0 12.0 37109.0 1.0
2015-07-31 TN 1996.0 12.0 37109.0 1.0
2015-08-31 TN 1996.0 12.0 37109.0 1.0
2015-09-30 TN 1996.0 12.0 37109.0 1.0
2015-10-31 TN 1996.0 12.0 37109.0 1.0
2015-11-30 TN 1996.0 12.0 37109.0 1.0
2015-12-31 TN 1996.0 12.0 37109.0 1.0
2016-01-31 TN 2016.0 1.0 49808.0 1.0
2016-02-29 TN 2016.0 2.0 49808.0 1.0
You can get the original structure by doing:
>> res.reset_index(drop=True).drop(['Day'], axis=1).head()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1987.0 4.0 24410.0
5 TN 1987.0 4.0 24410.0
6 TN 1987.0 4.0 24410.0
7 TN 1987.0 4.0 24410.0
8 TN 1987.0 4.0 24410.0
add a comment |
up vote
0
down vote
up vote
0
down vote
You can create a dataframe with the missing months and them merge your result with it:
dates = pd.date_range(start='1/1/%d' %df['Year'].min(),
end='1/08/%d' %df['Year'].max(),
freq='MS', closed='left')
>> dates
DatetimeIndex(['1987-02-01', '1987-03-01', '1987-04-01', '1987-05-01',
'1987-06-01', '1987-07-01', '1987-08-01', '1987-09-01',
'1987-10-01', '1987-11-01',
...
'2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
'2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
'2015-12-01', '2016-01-01'],
dtype='datetime64[ns]', length=348, freq='MS')
Then you can create a dataframe with all the months:
all_months = pd.DataFrame.from_records((dates.year, dates.month),
index=['Year', 'Month']).T.sort_values(by=['Year', 'Month'])
And then merge it with the original dataframe and forward-fill it:
df.merge(all_months, how='right').ffill()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1996.0 1.0 24410.0
5 TN 1996.0 2.0 24410.0
6 TN 1996.0 3.0 24410.0
7 TN 1996.0 4.0 24410.0
8 TN 1996.0 5.0 37109.0
9 TN 1996.0 6.0 37109.0
10 TN 1996.0 7.0 37109.0
11 TN 1996.0 8.0 37109.0
12 TN 1996.0 9.0 37109.0
13 TN 1996.0 10.0 37109.0
14 TN 1996.0 11.0 37109.0
15 TN 1996.0 12.0 37109.0
16 TN 2016.0 1.0 49808.0
17 TN 1987.0 5.0 49808.0
18 TN 1987.0 6.0 49808.0
19 TN 1987.0 7.0 49808.0
20 TN 1987.0 8.0 49808.0
21 TN 1987.0 9.0 49808.0
22 TN 1987.0 10.0 49808.0
23 TN 1987.0 11.0 49808.0
24 TN 1987.0 12.0 49808.0
25 TN 1988.0 1.0 49808.0
26 TN 1988.0 2.0 49808.0
27 TN 1988.0 3.0 49808.0
28 TN 1988.0 4.0 49808.0
29 TN 1988.0 5.0 49808.0
.. ... ... ... ...
319 TN 2013.0 7.0 49808.0
320 TN 2013.0 8.0 49808.0
321 TN 2013.0 9.0 49808.0
322 TN 2013.0 10.0 49808.0
323 TN 2013.0 11.0 49808.0
324 TN 2013.0 12.0 49808.0
325 TN 2014.0 1.0 49808.0
326 TN 2014.0 2.0 49808.0
327 TN 2014.0 3.0 49808.0
328 TN 2014.0 4.0 49808.0
329 TN 2014.0 5.0 49808.0
330 TN 2014.0 6.0 49808.0
331 TN 2014.0 7.0 49808.0
332 TN 2014.0 8.0 49808.0
333 TN 2014.0 9.0 49808.0
334 TN 2014.0 10.0 49808.0
335 TN 2014.0 11.0 49808.0
336 TN 2014.0 12.0 49808.0
337 TN 2015.0 1.0 49808.0
338 TN 2015.0 2.0 49808.0
339 TN 2015.0 3.0 49808.0
340 TN 2015.0 4.0 49808.0
341 TN 2015.0 5.0 49808.0
342 TN 2015.0 6.0 49808.0
343 TN 2015.0 7.0 49808.0
344 TN 2015.0 8.0 49808.0
345 TN 2015.0 9.0 49808.0
346 TN 2015.0 10.0 49808.0
347 TN 2015.0 11.0 49808.0
348 TN 2015.0 12.0 49808.0
Using pandas.resample
Another solution is index by date and then resample there:
df['Day'] = 1
df1 = df.assign(date= lambda x:pd.to_datetime(x[['Year', 'Month', 'Day']])).set_index('date')
>> df1
State Year Month Value Day
date
1987-01-01 TN 1987.0 1.0 24410.0 1
1987-02-01 TN 1987.0 2.0 24410.0 1
1987-03-01 TN 1987.0 3.0 24410.0 1
1987-04-01 TN 1987.0 4.0 24410.0 1
1996-01-01 TN 1996.0 1.0 24410.0 1
1996-02-01 TN 1996.0 2.0 24410.0 1
1996-03-01 TN 1996.0 3.0 24410.0 1
1996-04-01 TN 1996.0 4.0 24410.0 1
1996-05-01 TN 1996.0 5.0 37109.0 1
1996-06-01 TN 1996.0 6.0 37109.0 1
1996-07-01 TN 1996.0 7.0 37109.0 1
1996-08-01 TN 1996.0 8.0 37109.0 1
1996-09-01 TN 1996.0 9.0 37109.0 1
1996-10-01 TN 1996.0 10.0 37109.0 1
1996-11-01 TN 1996.0 11.0 37109.0 1
1996-12-01 TN 1996.0 12.0 37109.0 1
2016-01-01 TN 2016.0 1.0 49808.0 1
2016-02-01 TN 2016.0 2.0 49808.0 1
Then you can resample it by month by doing:
res = df1.resample('M').first().ffill()
>> res
State Year Month Value Day
date
1987-01-31 TN 1987.0 1.0 24410.0 1.0
1987-02-28 TN 1987.0 2.0 24410.0 1.0
1987-03-31 TN 1987.0 3.0 24410.0 1.0
1987-04-30 TN 1987.0 4.0 24410.0 1.0
1987-05-31 TN 1987.0 4.0 24410.0 1.0
1987-06-30 TN 1987.0 4.0 24410.0 1.0
1987-07-31 TN 1987.0 4.0 24410.0 1.0
1987-08-31 TN 1987.0 4.0 24410.0 1.0
1987-09-30 TN 1987.0 4.0 24410.0 1.0
1987-10-31 TN 1987.0 4.0 24410.0 1.0
1987-11-30 TN 1987.0 4.0 24410.0 1.0
1987-12-31 TN 1987.0 4.0 24410.0 1.0
1988-01-31 TN 1987.0 4.0 24410.0 1.0
1988-02-29 TN 1987.0 4.0 24410.0 1.0
1988-03-31 TN 1987.0 4.0 24410.0 1.0
1988-04-30 TN 1987.0 4.0 24410.0 1.0
1988-05-31 TN 1987.0 4.0 24410.0 1.0
1988-06-30 TN 1987.0 4.0 24410.0 1.0
1988-07-31 TN 1987.0 4.0 24410.0 1.0
1988-08-31 TN 1987.0 4.0 24410.0 1.0
1988-09-30 TN 1987.0 4.0 24410.0 1.0
1988-10-31 TN 1987.0 4.0 24410.0 1.0
1988-11-30 TN 1987.0 4.0 24410.0 1.0
1988-12-31 TN 1987.0 4.0 24410.0 1.0
1989-01-31 TN 1987.0 4.0 24410.0 1.0
1989-02-28 TN 1987.0 4.0 24410.0 1.0
1989-03-31 TN 1987.0 4.0 24410.0 1.0
1989-04-30 TN 1987.0 4.0 24410.0 1.0
1989-05-31 TN 1987.0 4.0 24410.0 1.0
1989-06-30 TN 1987.0 4.0 24410.0 1.0
... ... ... ... ... ...
2013-09-30 TN 1996.0 12.0 37109.0 1.0
2013-10-31 TN 1996.0 12.0 37109.0 1.0
2013-11-30 TN 1996.0 12.0 37109.0 1.0
2013-12-31 TN 1996.0 12.0 37109.0 1.0
2014-01-31 TN 1996.0 12.0 37109.0 1.0
2014-02-28 TN 1996.0 12.0 37109.0 1.0
2014-03-31 TN 1996.0 12.0 37109.0 1.0
2014-04-30 TN 1996.0 12.0 37109.0 1.0
2014-05-31 TN 1996.0 12.0 37109.0 1.0
2014-06-30 TN 1996.0 12.0 37109.0 1.0
2014-07-31 TN 1996.0 12.0 37109.0 1.0
2014-08-31 TN 1996.0 12.0 37109.0 1.0
2014-09-30 TN 1996.0 12.0 37109.0 1.0
2014-10-31 TN 1996.0 12.0 37109.0 1.0
2014-11-30 TN 1996.0 12.0 37109.0 1.0
2014-12-31 TN 1996.0 12.0 37109.0 1.0
2015-01-31 TN 1996.0 12.0 37109.0 1.0
2015-02-28 TN 1996.0 12.0 37109.0 1.0
2015-03-31 TN 1996.0 12.0 37109.0 1.0
2015-04-30 TN 1996.0 12.0 37109.0 1.0
2015-05-31 TN 1996.0 12.0 37109.0 1.0
2015-06-30 TN 1996.0 12.0 37109.0 1.0
2015-07-31 TN 1996.0 12.0 37109.0 1.0
2015-08-31 TN 1996.0 12.0 37109.0 1.0
2015-09-30 TN 1996.0 12.0 37109.0 1.0
2015-10-31 TN 1996.0 12.0 37109.0 1.0
2015-11-30 TN 1996.0 12.0 37109.0 1.0
2015-12-31 TN 1996.0 12.0 37109.0 1.0
2016-01-31 TN 2016.0 1.0 49808.0 1.0
2016-02-29 TN 2016.0 2.0 49808.0 1.0
You can get the original structure by doing:
>> res.reset_index(drop=True).drop(['Day'], axis=1).head()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1987.0 4.0 24410.0
5 TN 1987.0 4.0 24410.0
6 TN 1987.0 4.0 24410.0
7 TN 1987.0 4.0 24410.0
8 TN 1987.0 4.0 24410.0
You can create a dataframe with the missing months and them merge your result with it:
dates = pd.date_range(start='1/1/%d' %df['Year'].min(),
end='1/08/%d' %df['Year'].max(),
freq='MS', closed='left')
>> dates
DatetimeIndex(['1987-02-01', '1987-03-01', '1987-04-01', '1987-05-01',
'1987-06-01', '1987-07-01', '1987-08-01', '1987-09-01',
'1987-10-01', '1987-11-01',
...
'2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
'2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
'2015-12-01', '2016-01-01'],
dtype='datetime64[ns]', length=348, freq='MS')
Then you can create a dataframe with all the months:
all_months = pd.DataFrame.from_records((dates.year, dates.month),
index=['Year', 'Month']).T.sort_values(by=['Year', 'Month'])
And then merge it with the original dataframe and forward-fill it:
df.merge(all_months, how='right').ffill()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1996.0 1.0 24410.0
5 TN 1996.0 2.0 24410.0
6 TN 1996.0 3.0 24410.0
7 TN 1996.0 4.0 24410.0
8 TN 1996.0 5.0 37109.0
9 TN 1996.0 6.0 37109.0
10 TN 1996.0 7.0 37109.0
11 TN 1996.0 8.0 37109.0
12 TN 1996.0 9.0 37109.0
13 TN 1996.0 10.0 37109.0
14 TN 1996.0 11.0 37109.0
15 TN 1996.0 12.0 37109.0
16 TN 2016.0 1.0 49808.0
17 TN 1987.0 5.0 49808.0
18 TN 1987.0 6.0 49808.0
19 TN 1987.0 7.0 49808.0
20 TN 1987.0 8.0 49808.0
21 TN 1987.0 9.0 49808.0
22 TN 1987.0 10.0 49808.0
23 TN 1987.0 11.0 49808.0
24 TN 1987.0 12.0 49808.0
25 TN 1988.0 1.0 49808.0
26 TN 1988.0 2.0 49808.0
27 TN 1988.0 3.0 49808.0
28 TN 1988.0 4.0 49808.0
29 TN 1988.0 5.0 49808.0
.. ... ... ... ...
319 TN 2013.0 7.0 49808.0
320 TN 2013.0 8.0 49808.0
321 TN 2013.0 9.0 49808.0
322 TN 2013.0 10.0 49808.0
323 TN 2013.0 11.0 49808.0
324 TN 2013.0 12.0 49808.0
325 TN 2014.0 1.0 49808.0
326 TN 2014.0 2.0 49808.0
327 TN 2014.0 3.0 49808.0
328 TN 2014.0 4.0 49808.0
329 TN 2014.0 5.0 49808.0
330 TN 2014.0 6.0 49808.0
331 TN 2014.0 7.0 49808.0
332 TN 2014.0 8.0 49808.0
333 TN 2014.0 9.0 49808.0
334 TN 2014.0 10.0 49808.0
335 TN 2014.0 11.0 49808.0
336 TN 2014.0 12.0 49808.0
337 TN 2015.0 1.0 49808.0
338 TN 2015.0 2.0 49808.0
339 TN 2015.0 3.0 49808.0
340 TN 2015.0 4.0 49808.0
341 TN 2015.0 5.0 49808.0
342 TN 2015.0 6.0 49808.0
343 TN 2015.0 7.0 49808.0
344 TN 2015.0 8.0 49808.0
345 TN 2015.0 9.0 49808.0
346 TN 2015.0 10.0 49808.0
347 TN 2015.0 11.0 49808.0
348 TN 2015.0 12.0 49808.0
Using pandas.resample
Another solution is index by date and then resample there:
df['Day'] = 1
df1 = df.assign(date= lambda x:pd.to_datetime(x[['Year', 'Month', 'Day']])).set_index('date')
>> df1
State Year Month Value Day
date
1987-01-01 TN 1987.0 1.0 24410.0 1
1987-02-01 TN 1987.0 2.0 24410.0 1
1987-03-01 TN 1987.0 3.0 24410.0 1
1987-04-01 TN 1987.0 4.0 24410.0 1
1996-01-01 TN 1996.0 1.0 24410.0 1
1996-02-01 TN 1996.0 2.0 24410.0 1
1996-03-01 TN 1996.0 3.0 24410.0 1
1996-04-01 TN 1996.0 4.0 24410.0 1
1996-05-01 TN 1996.0 5.0 37109.0 1
1996-06-01 TN 1996.0 6.0 37109.0 1
1996-07-01 TN 1996.0 7.0 37109.0 1
1996-08-01 TN 1996.0 8.0 37109.0 1
1996-09-01 TN 1996.0 9.0 37109.0 1
1996-10-01 TN 1996.0 10.0 37109.0 1
1996-11-01 TN 1996.0 11.0 37109.0 1
1996-12-01 TN 1996.0 12.0 37109.0 1
2016-01-01 TN 2016.0 1.0 49808.0 1
2016-02-01 TN 2016.0 2.0 49808.0 1
Then you can resample it by month by doing:
res = df1.resample('M').first().ffill()
>> res
State Year Month Value Day
date
1987-01-31 TN 1987.0 1.0 24410.0 1.0
1987-02-28 TN 1987.0 2.0 24410.0 1.0
1987-03-31 TN 1987.0 3.0 24410.0 1.0
1987-04-30 TN 1987.0 4.0 24410.0 1.0
1987-05-31 TN 1987.0 4.0 24410.0 1.0
1987-06-30 TN 1987.0 4.0 24410.0 1.0
1987-07-31 TN 1987.0 4.0 24410.0 1.0
1987-08-31 TN 1987.0 4.0 24410.0 1.0
1987-09-30 TN 1987.0 4.0 24410.0 1.0
1987-10-31 TN 1987.0 4.0 24410.0 1.0
1987-11-30 TN 1987.0 4.0 24410.0 1.0
1987-12-31 TN 1987.0 4.0 24410.0 1.0
1988-01-31 TN 1987.0 4.0 24410.0 1.0
1988-02-29 TN 1987.0 4.0 24410.0 1.0
1988-03-31 TN 1987.0 4.0 24410.0 1.0
1988-04-30 TN 1987.0 4.0 24410.0 1.0
1988-05-31 TN 1987.0 4.0 24410.0 1.0
1988-06-30 TN 1987.0 4.0 24410.0 1.0
1988-07-31 TN 1987.0 4.0 24410.0 1.0
1988-08-31 TN 1987.0 4.0 24410.0 1.0
1988-09-30 TN 1987.0 4.0 24410.0 1.0
1988-10-31 TN 1987.0 4.0 24410.0 1.0
1988-11-30 TN 1987.0 4.0 24410.0 1.0
1988-12-31 TN 1987.0 4.0 24410.0 1.0
1989-01-31 TN 1987.0 4.0 24410.0 1.0
1989-02-28 TN 1987.0 4.0 24410.0 1.0
1989-03-31 TN 1987.0 4.0 24410.0 1.0
1989-04-30 TN 1987.0 4.0 24410.0 1.0
1989-05-31 TN 1987.0 4.0 24410.0 1.0
1989-06-30 TN 1987.0 4.0 24410.0 1.0
... ... ... ... ... ...
2013-09-30 TN 1996.0 12.0 37109.0 1.0
2013-10-31 TN 1996.0 12.0 37109.0 1.0
2013-11-30 TN 1996.0 12.0 37109.0 1.0
2013-12-31 TN 1996.0 12.0 37109.0 1.0
2014-01-31 TN 1996.0 12.0 37109.0 1.0
2014-02-28 TN 1996.0 12.0 37109.0 1.0
2014-03-31 TN 1996.0 12.0 37109.0 1.0
2014-04-30 TN 1996.0 12.0 37109.0 1.0
2014-05-31 TN 1996.0 12.0 37109.0 1.0
2014-06-30 TN 1996.0 12.0 37109.0 1.0
2014-07-31 TN 1996.0 12.0 37109.0 1.0
2014-08-31 TN 1996.0 12.0 37109.0 1.0
2014-09-30 TN 1996.0 12.0 37109.0 1.0
2014-10-31 TN 1996.0 12.0 37109.0 1.0
2014-11-30 TN 1996.0 12.0 37109.0 1.0
2014-12-31 TN 1996.0 12.0 37109.0 1.0
2015-01-31 TN 1996.0 12.0 37109.0 1.0
2015-02-28 TN 1996.0 12.0 37109.0 1.0
2015-03-31 TN 1996.0 12.0 37109.0 1.0
2015-04-30 TN 1996.0 12.0 37109.0 1.0
2015-05-31 TN 1996.0 12.0 37109.0 1.0
2015-06-30 TN 1996.0 12.0 37109.0 1.0
2015-07-31 TN 1996.0 12.0 37109.0 1.0
2015-08-31 TN 1996.0 12.0 37109.0 1.0
2015-09-30 TN 1996.0 12.0 37109.0 1.0
2015-10-31 TN 1996.0 12.0 37109.0 1.0
2015-11-30 TN 1996.0 12.0 37109.0 1.0
2015-12-31 TN 1996.0 12.0 37109.0 1.0
2016-01-31 TN 2016.0 1.0 49808.0 1.0
2016-02-29 TN 2016.0 2.0 49808.0 1.0
You can get the original structure by doing:
>> res.reset_index(drop=True).drop(['Day'], axis=1).head()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1987.0 4.0 24410.0
5 TN 1987.0 4.0 24410.0
6 TN 1987.0 4.0 24410.0
7 TN 1987.0 4.0 24410.0
8 TN 1987.0 4.0 24410.0
edited Nov 13 at 18:04
answered Nov 13 at 17:51
Mabel Villalba
1,447114
1,447114
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53286584%2ffilling-in-past-and-future-data-from-partial-data-in-python%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Have you tried any methods yet? Look at
fillna()
(pandas.pydata.org/pandas-docs/stable/generated/…)– soundstripe
Nov 13 at 17:36