Filling in past and future data from partial data in Python











up vote
0
down vote

favorite












I have take a cumulative sum for data that goes up from 198x to 2016 and is now in the form:



State   Year    Month   Value
TN 1987 1 24410.0
TN 1987 2 24410.0
TN 1987 3 24410.0
TN 1987 4 24410.0
.
.
TN 1996 1 24410.0
TN 1996 2 24410.0
TN 1996 3 24410.0
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
TN 1996 7 37109.0
TN 1996 8 37109.0
TN 1996 9 37109.0
TN 1996 10 37109.0
TN 1996 11 37109.0
TN 1996 12 37109.0
TN 2016 1 49808.0
TN 2016 2 49808.0


The data actually skips from 1996 to 2016 (for the case of TN but varies on State to State). I need to find a method to generally fill all the missing holes in the data because some years just don't exist (2010-2015) and I want to fill them so that the output goes all the way to 2018.



I want the missing values to be filled with values preceding from the values before to get an output that looks like:



TN      1996    4       24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
.
.
TN 2010 1 37109.0
TN 2010 2 37109.0
TN 2010 3 37109.0
.
.
TN 2016 1 37109.0
TN 2016 2 37109.0
.
.
TN 2016 11 49808.0
TN 2016 12 49808.0
.
.
TN 2017 1 49808.0
TN 2017 2 49808.0
TN 2017 3 49808.0
TN 2017 4 49808.0
.
.
TN 2018 1 49808.0
TN 2018 2 49808.0









share|improve this question


















  • 1




    Have you tried any methods yet? Look at fillna() (pandas.pydata.org/pandas-docs/stable/generated/…)
    – soundstripe
    Nov 13 at 17:36

















up vote
0
down vote

favorite












I have take a cumulative sum for data that goes up from 198x to 2016 and is now in the form:



State   Year    Month   Value
TN 1987 1 24410.0
TN 1987 2 24410.0
TN 1987 3 24410.0
TN 1987 4 24410.0
.
.
TN 1996 1 24410.0
TN 1996 2 24410.0
TN 1996 3 24410.0
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
TN 1996 7 37109.0
TN 1996 8 37109.0
TN 1996 9 37109.0
TN 1996 10 37109.0
TN 1996 11 37109.0
TN 1996 12 37109.0
TN 2016 1 49808.0
TN 2016 2 49808.0


The data actually skips from 1996 to 2016 (for the case of TN but varies on State to State). I need to find a method to generally fill all the missing holes in the data because some years just don't exist (2010-2015) and I want to fill them so that the output goes all the way to 2018.



I want the missing values to be filled with values preceding from the values before to get an output that looks like:



TN      1996    4       24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
.
.
TN 2010 1 37109.0
TN 2010 2 37109.0
TN 2010 3 37109.0
.
.
TN 2016 1 37109.0
TN 2016 2 37109.0
.
.
TN 2016 11 49808.0
TN 2016 12 49808.0
.
.
TN 2017 1 49808.0
TN 2017 2 49808.0
TN 2017 3 49808.0
TN 2017 4 49808.0
.
.
TN 2018 1 49808.0
TN 2018 2 49808.0









share|improve this question


















  • 1




    Have you tried any methods yet? Look at fillna() (pandas.pydata.org/pandas-docs/stable/generated/…)
    – soundstripe
    Nov 13 at 17:36















up vote
0
down vote

favorite









up vote
0
down vote

favorite











I have take a cumulative sum for data that goes up from 198x to 2016 and is now in the form:



State   Year    Month   Value
TN 1987 1 24410.0
TN 1987 2 24410.0
TN 1987 3 24410.0
TN 1987 4 24410.0
.
.
TN 1996 1 24410.0
TN 1996 2 24410.0
TN 1996 3 24410.0
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
TN 1996 7 37109.0
TN 1996 8 37109.0
TN 1996 9 37109.0
TN 1996 10 37109.0
TN 1996 11 37109.0
TN 1996 12 37109.0
TN 2016 1 49808.0
TN 2016 2 49808.0


The data actually skips from 1996 to 2016 (for the case of TN but varies on State to State). I need to find a method to generally fill all the missing holes in the data because some years just don't exist (2010-2015) and I want to fill them so that the output goes all the way to 2018.



I want the missing values to be filled with values preceding from the values before to get an output that looks like:



TN      1996    4       24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
.
.
TN 2010 1 37109.0
TN 2010 2 37109.0
TN 2010 3 37109.0
.
.
TN 2016 1 37109.0
TN 2016 2 37109.0
.
.
TN 2016 11 49808.0
TN 2016 12 49808.0
.
.
TN 2017 1 49808.0
TN 2017 2 49808.0
TN 2017 3 49808.0
TN 2017 4 49808.0
.
.
TN 2018 1 49808.0
TN 2018 2 49808.0









share|improve this question













I have take a cumulative sum for data that goes up from 198x to 2016 and is now in the form:



State   Year    Month   Value
TN 1987 1 24410.0
TN 1987 2 24410.0
TN 1987 3 24410.0
TN 1987 4 24410.0
.
.
TN 1996 1 24410.0
TN 1996 2 24410.0
TN 1996 3 24410.0
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
TN 1996 7 37109.0
TN 1996 8 37109.0
TN 1996 9 37109.0
TN 1996 10 37109.0
TN 1996 11 37109.0
TN 1996 12 37109.0
TN 2016 1 49808.0
TN 2016 2 49808.0


The data actually skips from 1996 to 2016 (for the case of TN but varies on State to State). I need to find a method to generally fill all the missing holes in the data because some years just don't exist (2010-2015) and I want to fill them so that the output goes all the way to 2018.



I want the missing values to be filled with values preceding from the values before to get an output that looks like:



TN      1996    4       24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
.
.
TN 2010 1 37109.0
TN 2010 2 37109.0
TN 2010 3 37109.0
.
.
TN 2016 1 37109.0
TN 2016 2 37109.0
.
.
TN 2016 11 49808.0
TN 2016 12 49808.0
.
.
TN 2017 1 49808.0
TN 2017 2 49808.0
TN 2017 3 49808.0
TN 2017 4 49808.0
.
.
TN 2018 1 49808.0
TN 2018 2 49808.0






python python-3.x pandas dataframe missing-data






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 13 at 17:32









HelloToEarth

346210




346210








  • 1




    Have you tried any methods yet? Look at fillna() (pandas.pydata.org/pandas-docs/stable/generated/…)
    – soundstripe
    Nov 13 at 17:36
















  • 1




    Have you tried any methods yet? Look at fillna() (pandas.pydata.org/pandas-docs/stable/generated/…)
    – soundstripe
    Nov 13 at 17:36










1




1




Have you tried any methods yet? Look at fillna() (pandas.pydata.org/pandas-docs/stable/generated/…)
– soundstripe
Nov 13 at 17:36






Have you tried any methods yet? Look at fillna() (pandas.pydata.org/pandas-docs/stable/generated/…)
– soundstripe
Nov 13 at 17:36














2 Answers
2






active

oldest

votes

















up vote
0
down vote













How about pandas.interpolate?: interpolate values according to different methods



See section 'interpolate' here: https://pandas.pydata.org/pandas-docs/stable/missing_data.html



And some existing example previously posted: Pandas interpolate() backwards in dataframe






share|improve this answer




























    up vote
    0
    down vote













    You can create a dataframe with the missing months and them merge your result with it:



    dates = pd.date_range(start='1/1/%d' %df['Year'].min(),
    end='1/08/%d' %df['Year'].max(),
    freq='MS', closed='left')

    >> dates

    DatetimeIndex(['1987-02-01', '1987-03-01', '1987-04-01', '1987-05-01',
    '1987-06-01', '1987-07-01', '1987-08-01', '1987-09-01',
    '1987-10-01', '1987-11-01',
    ...
    '2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
    '2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
    '2015-12-01', '2016-01-01'],
    dtype='datetime64[ns]', length=348, freq='MS')


    Then you can create a dataframe with all the months:



    all_months = pd.DataFrame.from_records((dates.year, dates.month),
    index=['Year', 'Month']).T.sort_values(by=['Year', 'Month'])


    And then merge it with the original dataframe and forward-fill it:



    df.merge(all_months, how='right').ffill()

    State Year Month Value
    0 TN 1987.0 1.0 24410.0
    1 TN 1987.0 2.0 24410.0
    2 TN 1987.0 3.0 24410.0
    3 TN 1987.0 4.0 24410.0
    4 TN 1996.0 1.0 24410.0
    5 TN 1996.0 2.0 24410.0
    6 TN 1996.0 3.0 24410.0
    7 TN 1996.0 4.0 24410.0
    8 TN 1996.0 5.0 37109.0
    9 TN 1996.0 6.0 37109.0
    10 TN 1996.0 7.0 37109.0
    11 TN 1996.0 8.0 37109.0
    12 TN 1996.0 9.0 37109.0
    13 TN 1996.0 10.0 37109.0
    14 TN 1996.0 11.0 37109.0
    15 TN 1996.0 12.0 37109.0
    16 TN 2016.0 1.0 49808.0
    17 TN 1987.0 5.0 49808.0
    18 TN 1987.0 6.0 49808.0
    19 TN 1987.0 7.0 49808.0
    20 TN 1987.0 8.0 49808.0
    21 TN 1987.0 9.0 49808.0
    22 TN 1987.0 10.0 49808.0
    23 TN 1987.0 11.0 49808.0
    24 TN 1987.0 12.0 49808.0
    25 TN 1988.0 1.0 49808.0
    26 TN 1988.0 2.0 49808.0
    27 TN 1988.0 3.0 49808.0
    28 TN 1988.0 4.0 49808.0
    29 TN 1988.0 5.0 49808.0
    .. ... ... ... ...
    319 TN 2013.0 7.0 49808.0
    320 TN 2013.0 8.0 49808.0
    321 TN 2013.0 9.0 49808.0
    322 TN 2013.0 10.0 49808.0
    323 TN 2013.0 11.0 49808.0
    324 TN 2013.0 12.0 49808.0
    325 TN 2014.0 1.0 49808.0
    326 TN 2014.0 2.0 49808.0
    327 TN 2014.0 3.0 49808.0
    328 TN 2014.0 4.0 49808.0
    329 TN 2014.0 5.0 49808.0
    330 TN 2014.0 6.0 49808.0
    331 TN 2014.0 7.0 49808.0
    332 TN 2014.0 8.0 49808.0
    333 TN 2014.0 9.0 49808.0
    334 TN 2014.0 10.0 49808.0
    335 TN 2014.0 11.0 49808.0
    336 TN 2014.0 12.0 49808.0
    337 TN 2015.0 1.0 49808.0
    338 TN 2015.0 2.0 49808.0
    339 TN 2015.0 3.0 49808.0
    340 TN 2015.0 4.0 49808.0
    341 TN 2015.0 5.0 49808.0
    342 TN 2015.0 6.0 49808.0
    343 TN 2015.0 7.0 49808.0
    344 TN 2015.0 8.0 49808.0
    345 TN 2015.0 9.0 49808.0
    346 TN 2015.0 10.0 49808.0
    347 TN 2015.0 11.0 49808.0
    348 TN 2015.0 12.0 49808.0


    Using pandas.resample



    Another solution is index by date and then resample there:



    df['Day'] = 1

    df1 = df.assign(date= lambda x:pd.to_datetime(x[['Year', 'Month', 'Day']])).set_index('date')

    >> df1

    State Year Month Value Day
    date
    1987-01-01 TN 1987.0 1.0 24410.0 1
    1987-02-01 TN 1987.0 2.0 24410.0 1
    1987-03-01 TN 1987.0 3.0 24410.0 1
    1987-04-01 TN 1987.0 4.0 24410.0 1
    1996-01-01 TN 1996.0 1.0 24410.0 1
    1996-02-01 TN 1996.0 2.0 24410.0 1
    1996-03-01 TN 1996.0 3.0 24410.0 1
    1996-04-01 TN 1996.0 4.0 24410.0 1
    1996-05-01 TN 1996.0 5.0 37109.0 1
    1996-06-01 TN 1996.0 6.0 37109.0 1
    1996-07-01 TN 1996.0 7.0 37109.0 1
    1996-08-01 TN 1996.0 8.0 37109.0 1
    1996-09-01 TN 1996.0 9.0 37109.0 1
    1996-10-01 TN 1996.0 10.0 37109.0 1
    1996-11-01 TN 1996.0 11.0 37109.0 1
    1996-12-01 TN 1996.0 12.0 37109.0 1
    2016-01-01 TN 2016.0 1.0 49808.0 1
    2016-02-01 TN 2016.0 2.0 49808.0 1


    Then you can resample it by month by doing:



        res = df1.resample('M').first().ffill()

    >> res

    State Year Month Value Day
    date
    1987-01-31 TN 1987.0 1.0 24410.0 1.0
    1987-02-28 TN 1987.0 2.0 24410.0 1.0
    1987-03-31 TN 1987.0 3.0 24410.0 1.0
    1987-04-30 TN 1987.0 4.0 24410.0 1.0
    1987-05-31 TN 1987.0 4.0 24410.0 1.0
    1987-06-30 TN 1987.0 4.0 24410.0 1.0
    1987-07-31 TN 1987.0 4.0 24410.0 1.0
    1987-08-31 TN 1987.0 4.0 24410.0 1.0
    1987-09-30 TN 1987.0 4.0 24410.0 1.0
    1987-10-31 TN 1987.0 4.0 24410.0 1.0
    1987-11-30 TN 1987.0 4.0 24410.0 1.0
    1987-12-31 TN 1987.0 4.0 24410.0 1.0
    1988-01-31 TN 1987.0 4.0 24410.0 1.0
    1988-02-29 TN 1987.0 4.0 24410.0 1.0
    1988-03-31 TN 1987.0 4.0 24410.0 1.0
    1988-04-30 TN 1987.0 4.0 24410.0 1.0
    1988-05-31 TN 1987.0 4.0 24410.0 1.0
    1988-06-30 TN 1987.0 4.0 24410.0 1.0
    1988-07-31 TN 1987.0 4.0 24410.0 1.0
    1988-08-31 TN 1987.0 4.0 24410.0 1.0
    1988-09-30 TN 1987.0 4.0 24410.0 1.0
    1988-10-31 TN 1987.0 4.0 24410.0 1.0
    1988-11-30 TN 1987.0 4.0 24410.0 1.0
    1988-12-31 TN 1987.0 4.0 24410.0 1.0
    1989-01-31 TN 1987.0 4.0 24410.0 1.0
    1989-02-28 TN 1987.0 4.0 24410.0 1.0
    1989-03-31 TN 1987.0 4.0 24410.0 1.0
    1989-04-30 TN 1987.0 4.0 24410.0 1.0
    1989-05-31 TN 1987.0 4.0 24410.0 1.0
    1989-06-30 TN 1987.0 4.0 24410.0 1.0
    ... ... ... ... ... ...
    2013-09-30 TN 1996.0 12.0 37109.0 1.0
    2013-10-31 TN 1996.0 12.0 37109.0 1.0
    2013-11-30 TN 1996.0 12.0 37109.0 1.0
    2013-12-31 TN 1996.0 12.0 37109.0 1.0
    2014-01-31 TN 1996.0 12.0 37109.0 1.0
    2014-02-28 TN 1996.0 12.0 37109.0 1.0
    2014-03-31 TN 1996.0 12.0 37109.0 1.0
    2014-04-30 TN 1996.0 12.0 37109.0 1.0
    2014-05-31 TN 1996.0 12.0 37109.0 1.0
    2014-06-30 TN 1996.0 12.0 37109.0 1.0
    2014-07-31 TN 1996.0 12.0 37109.0 1.0
    2014-08-31 TN 1996.0 12.0 37109.0 1.0
    2014-09-30 TN 1996.0 12.0 37109.0 1.0
    2014-10-31 TN 1996.0 12.0 37109.0 1.0
    2014-11-30 TN 1996.0 12.0 37109.0 1.0
    2014-12-31 TN 1996.0 12.0 37109.0 1.0
    2015-01-31 TN 1996.0 12.0 37109.0 1.0
    2015-02-28 TN 1996.0 12.0 37109.0 1.0
    2015-03-31 TN 1996.0 12.0 37109.0 1.0
    2015-04-30 TN 1996.0 12.0 37109.0 1.0
    2015-05-31 TN 1996.0 12.0 37109.0 1.0
    2015-06-30 TN 1996.0 12.0 37109.0 1.0
    2015-07-31 TN 1996.0 12.0 37109.0 1.0
    2015-08-31 TN 1996.0 12.0 37109.0 1.0
    2015-09-30 TN 1996.0 12.0 37109.0 1.0
    2015-10-31 TN 1996.0 12.0 37109.0 1.0
    2015-11-30 TN 1996.0 12.0 37109.0 1.0
    2015-12-31 TN 1996.0 12.0 37109.0 1.0
    2016-01-31 TN 2016.0 1.0 49808.0 1.0
    2016-02-29 TN 2016.0 2.0 49808.0 1.0


    You can get the original structure by doing:



    >> res.reset_index(drop=True).drop(['Day'], axis=1).head()

    State Year Month Value
    0 TN 1987.0 1.0 24410.0
    1 TN 1987.0 2.0 24410.0
    2 TN 1987.0 3.0 24410.0
    3 TN 1987.0 4.0 24410.0
    4 TN 1987.0 4.0 24410.0
    5 TN 1987.0 4.0 24410.0
    6 TN 1987.0 4.0 24410.0
    7 TN 1987.0 4.0 24410.0
    8 TN 1987.0 4.0 24410.0





    share|improve this answer























      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53286584%2ffilling-in-past-and-future-data-from-partial-data-in-python%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      2 Answers
      2






      active

      oldest

      votes








      2 Answers
      2






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      0
      down vote













      How about pandas.interpolate?: interpolate values according to different methods



      See section 'interpolate' here: https://pandas.pydata.org/pandas-docs/stable/missing_data.html



      And some existing example previously posted: Pandas interpolate() backwards in dataframe






      share|improve this answer

























        up vote
        0
        down vote













        How about pandas.interpolate?: interpolate values according to different methods



        See section 'interpolate' here: https://pandas.pydata.org/pandas-docs/stable/missing_data.html



        And some existing example previously posted: Pandas interpolate() backwards in dataframe






        share|improve this answer























          up vote
          0
          down vote










          up vote
          0
          down vote









          How about pandas.interpolate?: interpolate values according to different methods



          See section 'interpolate' here: https://pandas.pydata.org/pandas-docs/stable/missing_data.html



          And some existing example previously posted: Pandas interpolate() backwards in dataframe






          share|improve this answer












          How about pandas.interpolate?: interpolate values according to different methods



          See section 'interpolate' here: https://pandas.pydata.org/pandas-docs/stable/missing_data.html



          And some existing example previously posted: Pandas interpolate() backwards in dataframe







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 13 at 18:00









          kon_u

          1616




          1616
























              up vote
              0
              down vote













              You can create a dataframe with the missing months and them merge your result with it:



              dates = pd.date_range(start='1/1/%d' %df['Year'].min(),
              end='1/08/%d' %df['Year'].max(),
              freq='MS', closed='left')

              >> dates

              DatetimeIndex(['1987-02-01', '1987-03-01', '1987-04-01', '1987-05-01',
              '1987-06-01', '1987-07-01', '1987-08-01', '1987-09-01',
              '1987-10-01', '1987-11-01',
              ...
              '2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
              '2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
              '2015-12-01', '2016-01-01'],
              dtype='datetime64[ns]', length=348, freq='MS')


              Then you can create a dataframe with all the months:



              all_months = pd.DataFrame.from_records((dates.year, dates.month),
              index=['Year', 'Month']).T.sort_values(by=['Year', 'Month'])


              And then merge it with the original dataframe and forward-fill it:



              df.merge(all_months, how='right').ffill()

              State Year Month Value
              0 TN 1987.0 1.0 24410.0
              1 TN 1987.0 2.0 24410.0
              2 TN 1987.0 3.0 24410.0
              3 TN 1987.0 4.0 24410.0
              4 TN 1996.0 1.0 24410.0
              5 TN 1996.0 2.0 24410.0
              6 TN 1996.0 3.0 24410.0
              7 TN 1996.0 4.0 24410.0
              8 TN 1996.0 5.0 37109.0
              9 TN 1996.0 6.0 37109.0
              10 TN 1996.0 7.0 37109.0
              11 TN 1996.0 8.0 37109.0
              12 TN 1996.0 9.0 37109.0
              13 TN 1996.0 10.0 37109.0
              14 TN 1996.0 11.0 37109.0
              15 TN 1996.0 12.0 37109.0
              16 TN 2016.0 1.0 49808.0
              17 TN 1987.0 5.0 49808.0
              18 TN 1987.0 6.0 49808.0
              19 TN 1987.0 7.0 49808.0
              20 TN 1987.0 8.0 49808.0
              21 TN 1987.0 9.0 49808.0
              22 TN 1987.0 10.0 49808.0
              23 TN 1987.0 11.0 49808.0
              24 TN 1987.0 12.0 49808.0
              25 TN 1988.0 1.0 49808.0
              26 TN 1988.0 2.0 49808.0
              27 TN 1988.0 3.0 49808.0
              28 TN 1988.0 4.0 49808.0
              29 TN 1988.0 5.0 49808.0
              .. ... ... ... ...
              319 TN 2013.0 7.0 49808.0
              320 TN 2013.0 8.0 49808.0
              321 TN 2013.0 9.0 49808.0
              322 TN 2013.0 10.0 49808.0
              323 TN 2013.0 11.0 49808.0
              324 TN 2013.0 12.0 49808.0
              325 TN 2014.0 1.0 49808.0
              326 TN 2014.0 2.0 49808.0
              327 TN 2014.0 3.0 49808.0
              328 TN 2014.0 4.0 49808.0
              329 TN 2014.0 5.0 49808.0
              330 TN 2014.0 6.0 49808.0
              331 TN 2014.0 7.0 49808.0
              332 TN 2014.0 8.0 49808.0
              333 TN 2014.0 9.0 49808.0
              334 TN 2014.0 10.0 49808.0
              335 TN 2014.0 11.0 49808.0
              336 TN 2014.0 12.0 49808.0
              337 TN 2015.0 1.0 49808.0
              338 TN 2015.0 2.0 49808.0
              339 TN 2015.0 3.0 49808.0
              340 TN 2015.0 4.0 49808.0
              341 TN 2015.0 5.0 49808.0
              342 TN 2015.0 6.0 49808.0
              343 TN 2015.0 7.0 49808.0
              344 TN 2015.0 8.0 49808.0
              345 TN 2015.0 9.0 49808.0
              346 TN 2015.0 10.0 49808.0
              347 TN 2015.0 11.0 49808.0
              348 TN 2015.0 12.0 49808.0


              Using pandas.resample



              Another solution is index by date and then resample there:



              df['Day'] = 1

              df1 = df.assign(date= lambda x:pd.to_datetime(x[['Year', 'Month', 'Day']])).set_index('date')

              >> df1

              State Year Month Value Day
              date
              1987-01-01 TN 1987.0 1.0 24410.0 1
              1987-02-01 TN 1987.0 2.0 24410.0 1
              1987-03-01 TN 1987.0 3.0 24410.0 1
              1987-04-01 TN 1987.0 4.0 24410.0 1
              1996-01-01 TN 1996.0 1.0 24410.0 1
              1996-02-01 TN 1996.0 2.0 24410.0 1
              1996-03-01 TN 1996.0 3.0 24410.0 1
              1996-04-01 TN 1996.0 4.0 24410.0 1
              1996-05-01 TN 1996.0 5.0 37109.0 1
              1996-06-01 TN 1996.0 6.0 37109.0 1
              1996-07-01 TN 1996.0 7.0 37109.0 1
              1996-08-01 TN 1996.0 8.0 37109.0 1
              1996-09-01 TN 1996.0 9.0 37109.0 1
              1996-10-01 TN 1996.0 10.0 37109.0 1
              1996-11-01 TN 1996.0 11.0 37109.0 1
              1996-12-01 TN 1996.0 12.0 37109.0 1
              2016-01-01 TN 2016.0 1.0 49808.0 1
              2016-02-01 TN 2016.0 2.0 49808.0 1


              Then you can resample it by month by doing:



                  res = df1.resample('M').first().ffill()

              >> res

              State Year Month Value Day
              date
              1987-01-31 TN 1987.0 1.0 24410.0 1.0
              1987-02-28 TN 1987.0 2.0 24410.0 1.0
              1987-03-31 TN 1987.0 3.0 24410.0 1.0
              1987-04-30 TN 1987.0 4.0 24410.0 1.0
              1987-05-31 TN 1987.0 4.0 24410.0 1.0
              1987-06-30 TN 1987.0 4.0 24410.0 1.0
              1987-07-31 TN 1987.0 4.0 24410.0 1.0
              1987-08-31 TN 1987.0 4.0 24410.0 1.0
              1987-09-30 TN 1987.0 4.0 24410.0 1.0
              1987-10-31 TN 1987.0 4.0 24410.0 1.0
              1987-11-30 TN 1987.0 4.0 24410.0 1.0
              1987-12-31 TN 1987.0 4.0 24410.0 1.0
              1988-01-31 TN 1987.0 4.0 24410.0 1.0
              1988-02-29 TN 1987.0 4.0 24410.0 1.0
              1988-03-31 TN 1987.0 4.0 24410.0 1.0
              1988-04-30 TN 1987.0 4.0 24410.0 1.0
              1988-05-31 TN 1987.0 4.0 24410.0 1.0
              1988-06-30 TN 1987.0 4.0 24410.0 1.0
              1988-07-31 TN 1987.0 4.0 24410.0 1.0
              1988-08-31 TN 1987.0 4.0 24410.0 1.0
              1988-09-30 TN 1987.0 4.0 24410.0 1.0
              1988-10-31 TN 1987.0 4.0 24410.0 1.0
              1988-11-30 TN 1987.0 4.0 24410.0 1.0
              1988-12-31 TN 1987.0 4.0 24410.0 1.0
              1989-01-31 TN 1987.0 4.0 24410.0 1.0
              1989-02-28 TN 1987.0 4.0 24410.0 1.0
              1989-03-31 TN 1987.0 4.0 24410.0 1.0
              1989-04-30 TN 1987.0 4.0 24410.0 1.0
              1989-05-31 TN 1987.0 4.0 24410.0 1.0
              1989-06-30 TN 1987.0 4.0 24410.0 1.0
              ... ... ... ... ... ...
              2013-09-30 TN 1996.0 12.0 37109.0 1.0
              2013-10-31 TN 1996.0 12.0 37109.0 1.0
              2013-11-30 TN 1996.0 12.0 37109.0 1.0
              2013-12-31 TN 1996.0 12.0 37109.0 1.0
              2014-01-31 TN 1996.0 12.0 37109.0 1.0
              2014-02-28 TN 1996.0 12.0 37109.0 1.0
              2014-03-31 TN 1996.0 12.0 37109.0 1.0
              2014-04-30 TN 1996.0 12.0 37109.0 1.0
              2014-05-31 TN 1996.0 12.0 37109.0 1.0
              2014-06-30 TN 1996.0 12.0 37109.0 1.0
              2014-07-31 TN 1996.0 12.0 37109.0 1.0
              2014-08-31 TN 1996.0 12.0 37109.0 1.0
              2014-09-30 TN 1996.0 12.0 37109.0 1.0
              2014-10-31 TN 1996.0 12.0 37109.0 1.0
              2014-11-30 TN 1996.0 12.0 37109.0 1.0
              2014-12-31 TN 1996.0 12.0 37109.0 1.0
              2015-01-31 TN 1996.0 12.0 37109.0 1.0
              2015-02-28 TN 1996.0 12.0 37109.0 1.0
              2015-03-31 TN 1996.0 12.0 37109.0 1.0
              2015-04-30 TN 1996.0 12.0 37109.0 1.0
              2015-05-31 TN 1996.0 12.0 37109.0 1.0
              2015-06-30 TN 1996.0 12.0 37109.0 1.0
              2015-07-31 TN 1996.0 12.0 37109.0 1.0
              2015-08-31 TN 1996.0 12.0 37109.0 1.0
              2015-09-30 TN 1996.0 12.0 37109.0 1.0
              2015-10-31 TN 1996.0 12.0 37109.0 1.0
              2015-11-30 TN 1996.0 12.0 37109.0 1.0
              2015-12-31 TN 1996.0 12.0 37109.0 1.0
              2016-01-31 TN 2016.0 1.0 49808.0 1.0
              2016-02-29 TN 2016.0 2.0 49808.0 1.0


              You can get the original structure by doing:



              >> res.reset_index(drop=True).drop(['Day'], axis=1).head()

              State Year Month Value
              0 TN 1987.0 1.0 24410.0
              1 TN 1987.0 2.0 24410.0
              2 TN 1987.0 3.0 24410.0
              3 TN 1987.0 4.0 24410.0
              4 TN 1987.0 4.0 24410.0
              5 TN 1987.0 4.0 24410.0
              6 TN 1987.0 4.0 24410.0
              7 TN 1987.0 4.0 24410.0
              8 TN 1987.0 4.0 24410.0





              share|improve this answer



























                up vote
                0
                down vote













                You can create a dataframe with the missing months and them merge your result with it:



                dates = pd.date_range(start='1/1/%d' %df['Year'].min(),
                end='1/08/%d' %df['Year'].max(),
                freq='MS', closed='left')

                >> dates

                DatetimeIndex(['1987-02-01', '1987-03-01', '1987-04-01', '1987-05-01',
                '1987-06-01', '1987-07-01', '1987-08-01', '1987-09-01',
                '1987-10-01', '1987-11-01',
                ...
                '2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
                '2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
                '2015-12-01', '2016-01-01'],
                dtype='datetime64[ns]', length=348, freq='MS')


                Then you can create a dataframe with all the months:



                all_months = pd.DataFrame.from_records((dates.year, dates.month),
                index=['Year', 'Month']).T.sort_values(by=['Year', 'Month'])


                And then merge it with the original dataframe and forward-fill it:



                df.merge(all_months, how='right').ffill()

                State Year Month Value
                0 TN 1987.0 1.0 24410.0
                1 TN 1987.0 2.0 24410.0
                2 TN 1987.0 3.0 24410.0
                3 TN 1987.0 4.0 24410.0
                4 TN 1996.0 1.0 24410.0
                5 TN 1996.0 2.0 24410.0
                6 TN 1996.0 3.0 24410.0
                7 TN 1996.0 4.0 24410.0
                8 TN 1996.0 5.0 37109.0
                9 TN 1996.0 6.0 37109.0
                10 TN 1996.0 7.0 37109.0
                11 TN 1996.0 8.0 37109.0
                12 TN 1996.0 9.0 37109.0
                13 TN 1996.0 10.0 37109.0
                14 TN 1996.0 11.0 37109.0
                15 TN 1996.0 12.0 37109.0
                16 TN 2016.0 1.0 49808.0
                17 TN 1987.0 5.0 49808.0
                18 TN 1987.0 6.0 49808.0
                19 TN 1987.0 7.0 49808.0
                20 TN 1987.0 8.0 49808.0
                21 TN 1987.0 9.0 49808.0
                22 TN 1987.0 10.0 49808.0
                23 TN 1987.0 11.0 49808.0
                24 TN 1987.0 12.0 49808.0
                25 TN 1988.0 1.0 49808.0
                26 TN 1988.0 2.0 49808.0
                27 TN 1988.0 3.0 49808.0
                28 TN 1988.0 4.0 49808.0
                29 TN 1988.0 5.0 49808.0
                .. ... ... ... ...
                319 TN 2013.0 7.0 49808.0
                320 TN 2013.0 8.0 49808.0
                321 TN 2013.0 9.0 49808.0
                322 TN 2013.0 10.0 49808.0
                323 TN 2013.0 11.0 49808.0
                324 TN 2013.0 12.0 49808.0
                325 TN 2014.0 1.0 49808.0
                326 TN 2014.0 2.0 49808.0
                327 TN 2014.0 3.0 49808.0
                328 TN 2014.0 4.0 49808.0
                329 TN 2014.0 5.0 49808.0
                330 TN 2014.0 6.0 49808.0
                331 TN 2014.0 7.0 49808.0
                332 TN 2014.0 8.0 49808.0
                333 TN 2014.0 9.0 49808.0
                334 TN 2014.0 10.0 49808.0
                335 TN 2014.0 11.0 49808.0
                336 TN 2014.0 12.0 49808.0
                337 TN 2015.0 1.0 49808.0
                338 TN 2015.0 2.0 49808.0
                339 TN 2015.0 3.0 49808.0
                340 TN 2015.0 4.0 49808.0
                341 TN 2015.0 5.0 49808.0
                342 TN 2015.0 6.0 49808.0
                343 TN 2015.0 7.0 49808.0
                344 TN 2015.0 8.0 49808.0
                345 TN 2015.0 9.0 49808.0
                346 TN 2015.0 10.0 49808.0
                347 TN 2015.0 11.0 49808.0
                348 TN 2015.0 12.0 49808.0


                Using pandas.resample



                Another solution is index by date and then resample there:



                df['Day'] = 1

                df1 = df.assign(date= lambda x:pd.to_datetime(x[['Year', 'Month', 'Day']])).set_index('date')

                >> df1

                State Year Month Value Day
                date
                1987-01-01 TN 1987.0 1.0 24410.0 1
                1987-02-01 TN 1987.0 2.0 24410.0 1
                1987-03-01 TN 1987.0 3.0 24410.0 1
                1987-04-01 TN 1987.0 4.0 24410.0 1
                1996-01-01 TN 1996.0 1.0 24410.0 1
                1996-02-01 TN 1996.0 2.0 24410.0 1
                1996-03-01 TN 1996.0 3.0 24410.0 1
                1996-04-01 TN 1996.0 4.0 24410.0 1
                1996-05-01 TN 1996.0 5.0 37109.0 1
                1996-06-01 TN 1996.0 6.0 37109.0 1
                1996-07-01 TN 1996.0 7.0 37109.0 1
                1996-08-01 TN 1996.0 8.0 37109.0 1
                1996-09-01 TN 1996.0 9.0 37109.0 1
                1996-10-01 TN 1996.0 10.0 37109.0 1
                1996-11-01 TN 1996.0 11.0 37109.0 1
                1996-12-01 TN 1996.0 12.0 37109.0 1
                2016-01-01 TN 2016.0 1.0 49808.0 1
                2016-02-01 TN 2016.0 2.0 49808.0 1


                Then you can resample it by month by doing:



                    res = df1.resample('M').first().ffill()

                >> res

                State Year Month Value Day
                date
                1987-01-31 TN 1987.0 1.0 24410.0 1.0
                1987-02-28 TN 1987.0 2.0 24410.0 1.0
                1987-03-31 TN 1987.0 3.0 24410.0 1.0
                1987-04-30 TN 1987.0 4.0 24410.0 1.0
                1987-05-31 TN 1987.0 4.0 24410.0 1.0
                1987-06-30 TN 1987.0 4.0 24410.0 1.0
                1987-07-31 TN 1987.0 4.0 24410.0 1.0
                1987-08-31 TN 1987.0 4.0 24410.0 1.0
                1987-09-30 TN 1987.0 4.0 24410.0 1.0
                1987-10-31 TN 1987.0 4.0 24410.0 1.0
                1987-11-30 TN 1987.0 4.0 24410.0 1.0
                1987-12-31 TN 1987.0 4.0 24410.0 1.0
                1988-01-31 TN 1987.0 4.0 24410.0 1.0
                1988-02-29 TN 1987.0 4.0 24410.0 1.0
                1988-03-31 TN 1987.0 4.0 24410.0 1.0
                1988-04-30 TN 1987.0 4.0 24410.0 1.0
                1988-05-31 TN 1987.0 4.0 24410.0 1.0
                1988-06-30 TN 1987.0 4.0 24410.0 1.0
                1988-07-31 TN 1987.0 4.0 24410.0 1.0
                1988-08-31 TN 1987.0 4.0 24410.0 1.0
                1988-09-30 TN 1987.0 4.0 24410.0 1.0
                1988-10-31 TN 1987.0 4.0 24410.0 1.0
                1988-11-30 TN 1987.0 4.0 24410.0 1.0
                1988-12-31 TN 1987.0 4.0 24410.0 1.0
                1989-01-31 TN 1987.0 4.0 24410.0 1.0
                1989-02-28 TN 1987.0 4.0 24410.0 1.0
                1989-03-31 TN 1987.0 4.0 24410.0 1.0
                1989-04-30 TN 1987.0 4.0 24410.0 1.0
                1989-05-31 TN 1987.0 4.0 24410.0 1.0
                1989-06-30 TN 1987.0 4.0 24410.0 1.0
                ... ... ... ... ... ...
                2013-09-30 TN 1996.0 12.0 37109.0 1.0
                2013-10-31 TN 1996.0 12.0 37109.0 1.0
                2013-11-30 TN 1996.0 12.0 37109.0 1.0
                2013-12-31 TN 1996.0 12.0 37109.0 1.0
                2014-01-31 TN 1996.0 12.0 37109.0 1.0
                2014-02-28 TN 1996.0 12.0 37109.0 1.0
                2014-03-31 TN 1996.0 12.0 37109.0 1.0
                2014-04-30 TN 1996.0 12.0 37109.0 1.0
                2014-05-31 TN 1996.0 12.0 37109.0 1.0
                2014-06-30 TN 1996.0 12.0 37109.0 1.0
                2014-07-31 TN 1996.0 12.0 37109.0 1.0
                2014-08-31 TN 1996.0 12.0 37109.0 1.0
                2014-09-30 TN 1996.0 12.0 37109.0 1.0
                2014-10-31 TN 1996.0 12.0 37109.0 1.0
                2014-11-30 TN 1996.0 12.0 37109.0 1.0
                2014-12-31 TN 1996.0 12.0 37109.0 1.0
                2015-01-31 TN 1996.0 12.0 37109.0 1.0
                2015-02-28 TN 1996.0 12.0 37109.0 1.0
                2015-03-31 TN 1996.0 12.0 37109.0 1.0
                2015-04-30 TN 1996.0 12.0 37109.0 1.0
                2015-05-31 TN 1996.0 12.0 37109.0 1.0
                2015-06-30 TN 1996.0 12.0 37109.0 1.0
                2015-07-31 TN 1996.0 12.0 37109.0 1.0
                2015-08-31 TN 1996.0 12.0 37109.0 1.0
                2015-09-30 TN 1996.0 12.0 37109.0 1.0
                2015-10-31 TN 1996.0 12.0 37109.0 1.0
                2015-11-30 TN 1996.0 12.0 37109.0 1.0
                2015-12-31 TN 1996.0 12.0 37109.0 1.0
                2016-01-31 TN 2016.0 1.0 49808.0 1.0
                2016-02-29 TN 2016.0 2.0 49808.0 1.0


                You can get the original structure by doing:



                >> res.reset_index(drop=True).drop(['Day'], axis=1).head()

                State Year Month Value
                0 TN 1987.0 1.0 24410.0
                1 TN 1987.0 2.0 24410.0
                2 TN 1987.0 3.0 24410.0
                3 TN 1987.0 4.0 24410.0
                4 TN 1987.0 4.0 24410.0
                5 TN 1987.0 4.0 24410.0
                6 TN 1987.0 4.0 24410.0
                7 TN 1987.0 4.0 24410.0
                8 TN 1987.0 4.0 24410.0





                share|improve this answer

























                  up vote
                  0
                  down vote










                  up vote
                  0
                  down vote









                  You can create a dataframe with the missing months and them merge your result with it:



                  dates = pd.date_range(start='1/1/%d' %df['Year'].min(),
                  end='1/08/%d' %df['Year'].max(),
                  freq='MS', closed='left')

                  >> dates

                  DatetimeIndex(['1987-02-01', '1987-03-01', '1987-04-01', '1987-05-01',
                  '1987-06-01', '1987-07-01', '1987-08-01', '1987-09-01',
                  '1987-10-01', '1987-11-01',
                  ...
                  '2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
                  '2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
                  '2015-12-01', '2016-01-01'],
                  dtype='datetime64[ns]', length=348, freq='MS')


                  Then you can create a dataframe with all the months:



                  all_months = pd.DataFrame.from_records((dates.year, dates.month),
                  index=['Year', 'Month']).T.sort_values(by=['Year', 'Month'])


                  And then merge it with the original dataframe and forward-fill it:



                  df.merge(all_months, how='right').ffill()

                  State Year Month Value
                  0 TN 1987.0 1.0 24410.0
                  1 TN 1987.0 2.0 24410.0
                  2 TN 1987.0 3.0 24410.0
                  3 TN 1987.0 4.0 24410.0
                  4 TN 1996.0 1.0 24410.0
                  5 TN 1996.0 2.0 24410.0
                  6 TN 1996.0 3.0 24410.0
                  7 TN 1996.0 4.0 24410.0
                  8 TN 1996.0 5.0 37109.0
                  9 TN 1996.0 6.0 37109.0
                  10 TN 1996.0 7.0 37109.0
                  11 TN 1996.0 8.0 37109.0
                  12 TN 1996.0 9.0 37109.0
                  13 TN 1996.0 10.0 37109.0
                  14 TN 1996.0 11.0 37109.0
                  15 TN 1996.0 12.0 37109.0
                  16 TN 2016.0 1.0 49808.0
                  17 TN 1987.0 5.0 49808.0
                  18 TN 1987.0 6.0 49808.0
                  19 TN 1987.0 7.0 49808.0
                  20 TN 1987.0 8.0 49808.0
                  21 TN 1987.0 9.0 49808.0
                  22 TN 1987.0 10.0 49808.0
                  23 TN 1987.0 11.0 49808.0
                  24 TN 1987.0 12.0 49808.0
                  25 TN 1988.0 1.0 49808.0
                  26 TN 1988.0 2.0 49808.0
                  27 TN 1988.0 3.0 49808.0
                  28 TN 1988.0 4.0 49808.0
                  29 TN 1988.0 5.0 49808.0
                  .. ... ... ... ...
                  319 TN 2013.0 7.0 49808.0
                  320 TN 2013.0 8.0 49808.0
                  321 TN 2013.0 9.0 49808.0
                  322 TN 2013.0 10.0 49808.0
                  323 TN 2013.0 11.0 49808.0
                  324 TN 2013.0 12.0 49808.0
                  325 TN 2014.0 1.0 49808.0
                  326 TN 2014.0 2.0 49808.0
                  327 TN 2014.0 3.0 49808.0
                  328 TN 2014.0 4.0 49808.0
                  329 TN 2014.0 5.0 49808.0
                  330 TN 2014.0 6.0 49808.0
                  331 TN 2014.0 7.0 49808.0
                  332 TN 2014.0 8.0 49808.0
                  333 TN 2014.0 9.0 49808.0
                  334 TN 2014.0 10.0 49808.0
                  335 TN 2014.0 11.0 49808.0
                  336 TN 2014.0 12.0 49808.0
                  337 TN 2015.0 1.0 49808.0
                  338 TN 2015.0 2.0 49808.0
                  339 TN 2015.0 3.0 49808.0
                  340 TN 2015.0 4.0 49808.0
                  341 TN 2015.0 5.0 49808.0
                  342 TN 2015.0 6.0 49808.0
                  343 TN 2015.0 7.0 49808.0
                  344 TN 2015.0 8.0 49808.0
                  345 TN 2015.0 9.0 49808.0
                  346 TN 2015.0 10.0 49808.0
                  347 TN 2015.0 11.0 49808.0
                  348 TN 2015.0 12.0 49808.0


                  Using pandas.resample



                  Another solution is index by date and then resample there:



                  df['Day'] = 1

                  df1 = df.assign(date= lambda x:pd.to_datetime(x[['Year', 'Month', 'Day']])).set_index('date')

                  >> df1

                  State Year Month Value Day
                  date
                  1987-01-01 TN 1987.0 1.0 24410.0 1
                  1987-02-01 TN 1987.0 2.0 24410.0 1
                  1987-03-01 TN 1987.0 3.0 24410.0 1
                  1987-04-01 TN 1987.0 4.0 24410.0 1
                  1996-01-01 TN 1996.0 1.0 24410.0 1
                  1996-02-01 TN 1996.0 2.0 24410.0 1
                  1996-03-01 TN 1996.0 3.0 24410.0 1
                  1996-04-01 TN 1996.0 4.0 24410.0 1
                  1996-05-01 TN 1996.0 5.0 37109.0 1
                  1996-06-01 TN 1996.0 6.0 37109.0 1
                  1996-07-01 TN 1996.0 7.0 37109.0 1
                  1996-08-01 TN 1996.0 8.0 37109.0 1
                  1996-09-01 TN 1996.0 9.0 37109.0 1
                  1996-10-01 TN 1996.0 10.0 37109.0 1
                  1996-11-01 TN 1996.0 11.0 37109.0 1
                  1996-12-01 TN 1996.0 12.0 37109.0 1
                  2016-01-01 TN 2016.0 1.0 49808.0 1
                  2016-02-01 TN 2016.0 2.0 49808.0 1


                  Then you can resample it by month by doing:



                      res = df1.resample('M').first().ffill()

                  >> res

                  State Year Month Value Day
                  date
                  1987-01-31 TN 1987.0 1.0 24410.0 1.0
                  1987-02-28 TN 1987.0 2.0 24410.0 1.0
                  1987-03-31 TN 1987.0 3.0 24410.0 1.0
                  1987-04-30 TN 1987.0 4.0 24410.0 1.0
                  1987-05-31 TN 1987.0 4.0 24410.0 1.0
                  1987-06-30 TN 1987.0 4.0 24410.0 1.0
                  1987-07-31 TN 1987.0 4.0 24410.0 1.0
                  1987-08-31 TN 1987.0 4.0 24410.0 1.0
                  1987-09-30 TN 1987.0 4.0 24410.0 1.0
                  1987-10-31 TN 1987.0 4.0 24410.0 1.0
                  1987-11-30 TN 1987.0 4.0 24410.0 1.0
                  1987-12-31 TN 1987.0 4.0 24410.0 1.0
                  1988-01-31 TN 1987.0 4.0 24410.0 1.0
                  1988-02-29 TN 1987.0 4.0 24410.0 1.0
                  1988-03-31 TN 1987.0 4.0 24410.0 1.0
                  1988-04-30 TN 1987.0 4.0 24410.0 1.0
                  1988-05-31 TN 1987.0 4.0 24410.0 1.0
                  1988-06-30 TN 1987.0 4.0 24410.0 1.0
                  1988-07-31 TN 1987.0 4.0 24410.0 1.0
                  1988-08-31 TN 1987.0 4.0 24410.0 1.0
                  1988-09-30 TN 1987.0 4.0 24410.0 1.0
                  1988-10-31 TN 1987.0 4.0 24410.0 1.0
                  1988-11-30 TN 1987.0 4.0 24410.0 1.0
                  1988-12-31 TN 1987.0 4.0 24410.0 1.0
                  1989-01-31 TN 1987.0 4.0 24410.0 1.0
                  1989-02-28 TN 1987.0 4.0 24410.0 1.0
                  1989-03-31 TN 1987.0 4.0 24410.0 1.0
                  1989-04-30 TN 1987.0 4.0 24410.0 1.0
                  1989-05-31 TN 1987.0 4.0 24410.0 1.0
                  1989-06-30 TN 1987.0 4.0 24410.0 1.0
                  ... ... ... ... ... ...
                  2013-09-30 TN 1996.0 12.0 37109.0 1.0
                  2013-10-31 TN 1996.0 12.0 37109.0 1.0
                  2013-11-30 TN 1996.0 12.0 37109.0 1.0
                  2013-12-31 TN 1996.0 12.0 37109.0 1.0
                  2014-01-31 TN 1996.0 12.0 37109.0 1.0
                  2014-02-28 TN 1996.0 12.0 37109.0 1.0
                  2014-03-31 TN 1996.0 12.0 37109.0 1.0
                  2014-04-30 TN 1996.0 12.0 37109.0 1.0
                  2014-05-31 TN 1996.0 12.0 37109.0 1.0
                  2014-06-30 TN 1996.0 12.0 37109.0 1.0
                  2014-07-31 TN 1996.0 12.0 37109.0 1.0
                  2014-08-31 TN 1996.0 12.0 37109.0 1.0
                  2014-09-30 TN 1996.0 12.0 37109.0 1.0
                  2014-10-31 TN 1996.0 12.0 37109.0 1.0
                  2014-11-30 TN 1996.0 12.0 37109.0 1.0
                  2014-12-31 TN 1996.0 12.0 37109.0 1.0
                  2015-01-31 TN 1996.0 12.0 37109.0 1.0
                  2015-02-28 TN 1996.0 12.0 37109.0 1.0
                  2015-03-31 TN 1996.0 12.0 37109.0 1.0
                  2015-04-30 TN 1996.0 12.0 37109.0 1.0
                  2015-05-31 TN 1996.0 12.0 37109.0 1.0
                  2015-06-30 TN 1996.0 12.0 37109.0 1.0
                  2015-07-31 TN 1996.0 12.0 37109.0 1.0
                  2015-08-31 TN 1996.0 12.0 37109.0 1.0
                  2015-09-30 TN 1996.0 12.0 37109.0 1.0
                  2015-10-31 TN 1996.0 12.0 37109.0 1.0
                  2015-11-30 TN 1996.0 12.0 37109.0 1.0
                  2015-12-31 TN 1996.0 12.0 37109.0 1.0
                  2016-01-31 TN 2016.0 1.0 49808.0 1.0
                  2016-02-29 TN 2016.0 2.0 49808.0 1.0


                  You can get the original structure by doing:



                  >> res.reset_index(drop=True).drop(['Day'], axis=1).head()

                  State Year Month Value
                  0 TN 1987.0 1.0 24410.0
                  1 TN 1987.0 2.0 24410.0
                  2 TN 1987.0 3.0 24410.0
                  3 TN 1987.0 4.0 24410.0
                  4 TN 1987.0 4.0 24410.0
                  5 TN 1987.0 4.0 24410.0
                  6 TN 1987.0 4.0 24410.0
                  7 TN 1987.0 4.0 24410.0
                  8 TN 1987.0 4.0 24410.0





                  share|improve this answer














                  You can create a dataframe with the missing months and them merge your result with it:



                  dates = pd.date_range(start='1/1/%d' %df['Year'].min(),
                  end='1/08/%d' %df['Year'].max(),
                  freq='MS', closed='left')

                  >> dates

                  DatetimeIndex(['1987-02-01', '1987-03-01', '1987-04-01', '1987-05-01',
                  '1987-06-01', '1987-07-01', '1987-08-01', '1987-09-01',
                  '1987-10-01', '1987-11-01',
                  ...
                  '2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
                  '2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
                  '2015-12-01', '2016-01-01'],
                  dtype='datetime64[ns]', length=348, freq='MS')


                  Then you can create a dataframe with all the months:



                  all_months = pd.DataFrame.from_records((dates.year, dates.month),
                  index=['Year', 'Month']).T.sort_values(by=['Year', 'Month'])


                  And then merge it with the original dataframe and forward-fill it:



                  df.merge(all_months, how='right').ffill()

                  State Year Month Value
                  0 TN 1987.0 1.0 24410.0
                  1 TN 1987.0 2.0 24410.0
                  2 TN 1987.0 3.0 24410.0
                  3 TN 1987.0 4.0 24410.0
                  4 TN 1996.0 1.0 24410.0
                  5 TN 1996.0 2.0 24410.0
                  6 TN 1996.0 3.0 24410.0
                  7 TN 1996.0 4.0 24410.0
                  8 TN 1996.0 5.0 37109.0
                  9 TN 1996.0 6.0 37109.0
                  10 TN 1996.0 7.0 37109.0
                  11 TN 1996.0 8.0 37109.0
                  12 TN 1996.0 9.0 37109.0
                  13 TN 1996.0 10.0 37109.0
                  14 TN 1996.0 11.0 37109.0
                  15 TN 1996.0 12.0 37109.0
                  16 TN 2016.0 1.0 49808.0
                  17 TN 1987.0 5.0 49808.0
                  18 TN 1987.0 6.0 49808.0
                  19 TN 1987.0 7.0 49808.0
                  20 TN 1987.0 8.0 49808.0
                  21 TN 1987.0 9.0 49808.0
                  22 TN 1987.0 10.0 49808.0
                  23 TN 1987.0 11.0 49808.0
                  24 TN 1987.0 12.0 49808.0
                  25 TN 1988.0 1.0 49808.0
                  26 TN 1988.0 2.0 49808.0
                  27 TN 1988.0 3.0 49808.0
                  28 TN 1988.0 4.0 49808.0
                  29 TN 1988.0 5.0 49808.0
                  .. ... ... ... ...
                  319 TN 2013.0 7.0 49808.0
                  320 TN 2013.0 8.0 49808.0
                  321 TN 2013.0 9.0 49808.0
                  322 TN 2013.0 10.0 49808.0
                  323 TN 2013.0 11.0 49808.0
                  324 TN 2013.0 12.0 49808.0
                  325 TN 2014.0 1.0 49808.0
                  326 TN 2014.0 2.0 49808.0
                  327 TN 2014.0 3.0 49808.0
                  328 TN 2014.0 4.0 49808.0
                  329 TN 2014.0 5.0 49808.0
                  330 TN 2014.0 6.0 49808.0
                  331 TN 2014.0 7.0 49808.0
                  332 TN 2014.0 8.0 49808.0
                  333 TN 2014.0 9.0 49808.0
                  334 TN 2014.0 10.0 49808.0
                  335 TN 2014.0 11.0 49808.0
                  336 TN 2014.0 12.0 49808.0
                  337 TN 2015.0 1.0 49808.0
                  338 TN 2015.0 2.0 49808.0
                  339 TN 2015.0 3.0 49808.0
                  340 TN 2015.0 4.0 49808.0
                  341 TN 2015.0 5.0 49808.0
                  342 TN 2015.0 6.0 49808.0
                  343 TN 2015.0 7.0 49808.0
                  344 TN 2015.0 8.0 49808.0
                  345 TN 2015.0 9.0 49808.0
                  346 TN 2015.0 10.0 49808.0
                  347 TN 2015.0 11.0 49808.0
                  348 TN 2015.0 12.0 49808.0


                  Using pandas.resample



                  Another solution is index by date and then resample there:



                  df['Day'] = 1

                  df1 = df.assign(date= lambda x:pd.to_datetime(x[['Year', 'Month', 'Day']])).set_index('date')

                  >> df1

                  State Year Month Value Day
                  date
                  1987-01-01 TN 1987.0 1.0 24410.0 1
                  1987-02-01 TN 1987.0 2.0 24410.0 1
                  1987-03-01 TN 1987.0 3.0 24410.0 1
                  1987-04-01 TN 1987.0 4.0 24410.0 1
                  1996-01-01 TN 1996.0 1.0 24410.0 1
                  1996-02-01 TN 1996.0 2.0 24410.0 1
                  1996-03-01 TN 1996.0 3.0 24410.0 1
                  1996-04-01 TN 1996.0 4.0 24410.0 1
                  1996-05-01 TN 1996.0 5.0 37109.0 1
                  1996-06-01 TN 1996.0 6.0 37109.0 1
                  1996-07-01 TN 1996.0 7.0 37109.0 1
                  1996-08-01 TN 1996.0 8.0 37109.0 1
                  1996-09-01 TN 1996.0 9.0 37109.0 1
                  1996-10-01 TN 1996.0 10.0 37109.0 1
                  1996-11-01 TN 1996.0 11.0 37109.0 1
                  1996-12-01 TN 1996.0 12.0 37109.0 1
                  2016-01-01 TN 2016.0 1.0 49808.0 1
                  2016-02-01 TN 2016.0 2.0 49808.0 1


                  Then you can resample it by month by doing:



                      res = df1.resample('M').first().ffill()

                  >> res

                  State Year Month Value Day
                  date
                  1987-01-31 TN 1987.0 1.0 24410.0 1.0
                  1987-02-28 TN 1987.0 2.0 24410.0 1.0
                  1987-03-31 TN 1987.0 3.0 24410.0 1.0
                  1987-04-30 TN 1987.0 4.0 24410.0 1.0
                  1987-05-31 TN 1987.0 4.0 24410.0 1.0
                  1987-06-30 TN 1987.0 4.0 24410.0 1.0
                  1987-07-31 TN 1987.0 4.0 24410.0 1.0
                  1987-08-31 TN 1987.0 4.0 24410.0 1.0
                  1987-09-30 TN 1987.0 4.0 24410.0 1.0
                  1987-10-31 TN 1987.0 4.0 24410.0 1.0
                  1987-11-30 TN 1987.0 4.0 24410.0 1.0
                  1987-12-31 TN 1987.0 4.0 24410.0 1.0
                  1988-01-31 TN 1987.0 4.0 24410.0 1.0
                  1988-02-29 TN 1987.0 4.0 24410.0 1.0
                  1988-03-31 TN 1987.0 4.0 24410.0 1.0
                  1988-04-30 TN 1987.0 4.0 24410.0 1.0
                  1988-05-31 TN 1987.0 4.0 24410.0 1.0
                  1988-06-30 TN 1987.0 4.0 24410.0 1.0
                  1988-07-31 TN 1987.0 4.0 24410.0 1.0
                  1988-08-31 TN 1987.0 4.0 24410.0 1.0
                  1988-09-30 TN 1987.0 4.0 24410.0 1.0
                  1988-10-31 TN 1987.0 4.0 24410.0 1.0
                  1988-11-30 TN 1987.0 4.0 24410.0 1.0
                  1988-12-31 TN 1987.0 4.0 24410.0 1.0
                  1989-01-31 TN 1987.0 4.0 24410.0 1.0
                  1989-02-28 TN 1987.0 4.0 24410.0 1.0
                  1989-03-31 TN 1987.0 4.0 24410.0 1.0
                  1989-04-30 TN 1987.0 4.0 24410.0 1.0
                  1989-05-31 TN 1987.0 4.0 24410.0 1.0
                  1989-06-30 TN 1987.0 4.0 24410.0 1.0
                  ... ... ... ... ... ...
                  2013-09-30 TN 1996.0 12.0 37109.0 1.0
                  2013-10-31 TN 1996.0 12.0 37109.0 1.0
                  2013-11-30 TN 1996.0 12.0 37109.0 1.0
                  2013-12-31 TN 1996.0 12.0 37109.0 1.0
                  2014-01-31 TN 1996.0 12.0 37109.0 1.0
                  2014-02-28 TN 1996.0 12.0 37109.0 1.0
                  2014-03-31 TN 1996.0 12.0 37109.0 1.0
                  2014-04-30 TN 1996.0 12.0 37109.0 1.0
                  2014-05-31 TN 1996.0 12.0 37109.0 1.0
                  2014-06-30 TN 1996.0 12.0 37109.0 1.0
                  2014-07-31 TN 1996.0 12.0 37109.0 1.0
                  2014-08-31 TN 1996.0 12.0 37109.0 1.0
                  2014-09-30 TN 1996.0 12.0 37109.0 1.0
                  2014-10-31 TN 1996.0 12.0 37109.0 1.0
                  2014-11-30 TN 1996.0 12.0 37109.0 1.0
                  2014-12-31 TN 1996.0 12.0 37109.0 1.0
                  2015-01-31 TN 1996.0 12.0 37109.0 1.0
                  2015-02-28 TN 1996.0 12.0 37109.0 1.0
                  2015-03-31 TN 1996.0 12.0 37109.0 1.0
                  2015-04-30 TN 1996.0 12.0 37109.0 1.0
                  2015-05-31 TN 1996.0 12.0 37109.0 1.0
                  2015-06-30 TN 1996.0 12.0 37109.0 1.0
                  2015-07-31 TN 1996.0 12.0 37109.0 1.0
                  2015-08-31 TN 1996.0 12.0 37109.0 1.0
                  2015-09-30 TN 1996.0 12.0 37109.0 1.0
                  2015-10-31 TN 1996.0 12.0 37109.0 1.0
                  2015-11-30 TN 1996.0 12.0 37109.0 1.0
                  2015-12-31 TN 1996.0 12.0 37109.0 1.0
                  2016-01-31 TN 2016.0 1.0 49808.0 1.0
                  2016-02-29 TN 2016.0 2.0 49808.0 1.0


                  You can get the original structure by doing:



                  >> res.reset_index(drop=True).drop(['Day'], axis=1).head()

                  State Year Month Value
                  0 TN 1987.0 1.0 24410.0
                  1 TN 1987.0 2.0 24410.0
                  2 TN 1987.0 3.0 24410.0
                  3 TN 1987.0 4.0 24410.0
                  4 TN 1987.0 4.0 24410.0
                  5 TN 1987.0 4.0 24410.0
                  6 TN 1987.0 4.0 24410.0
                  7 TN 1987.0 4.0 24410.0
                  8 TN 1987.0 4.0 24410.0






                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Nov 13 at 18:04

























                  answered Nov 13 at 17:51









                  Mabel Villalba

                  1,447114




                  1,447114






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.





                      Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                      Please pay close attention to the following guidance:


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53286584%2ffilling-in-past-and-future-data-from-partial-data-in-python%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      Biblatex bibliography style without URLs when DOI exists (in Overleaf with Zotero bibliography)

                      ComboBox Display Member on multiple fields

                      Is it possible to collect Nectar points via Trainline?