Cfrgtkky

Question

How can we use columns 'Yr', 'Mo' and 'Dy' to create a new column with type Datetime and set it as the index of the Pandas DataFrame?

score 0 · Accepted Answer · 2018-11-21 00:15:02Z

First, convert Yr to a four-digit int, e.g. 1961 or 2061. A four-digit year is unambiguous, and the approach below requires one: when you pass a DataFrame or dict, Pandas assembles the year, month, and day columns and parses the result with format='%Y%m%d' in pandas/core/tools/datetimes.py:

# From pandas/core/tools/datetimes.py, if you pass a DataFrame or dict
values = to_datetime(values, format='%Y%m%d', errors=errors)
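To make the consequence of that line concrete, here is a minimal sketch of the same idea done by hand: pack the three columns into one YYYYMMDD number and parse it with '%Y%m%d'. The names parts, packed, manual and builtin are made up for this illustration, and the packing step is a simplified imitation, not the actual Pandas source.

```python
import pandas as pd

# Hypothetical sample with four-digit years, already named the way
# pd.to_datetime() expects.
parts = pd.DataFrame({'year': [1961, 1962], 'month': [1, 2], 'day': [3, 1]})

# Pack each row into a single YYYYMMDD integer, e.g. 1961-01-03 -> 19610103.
packed = parts['year'] * 10000 + parts['month'] * 100 + parts['day']

# Parse the packed values explicitly with the same format string.
manual = pd.to_datetime(packed.astype(str), format='%Y%m%d')

# Let Pandas assemble the columns itself.
builtin = pd.to_datetime(parts)

print(packed.tolist())
print((manual == builtin).all())
```

With a two-digit year the packed number would no longer have eight digits, which is why the year has to be expanded first.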

So, to take an example:

from itertools import product

import numpy as np
import pandas as pd
np.random.seed(444)

datecols = ['Yr', 'Mo', 'Dy']
mapper = dict(zip(datecols, ('year', 'month', 'day')))
df = pd.DataFrame(list(product([61, 62], [1, 2], [1, 2, 3])),
                  columns=datecols)
df['data'] = np.random.randn(len(df))

Here is df:

In [11]: df
Out[11]:
    Yr  Mo  Dy      data
0   61   1   1  0.357440
1   61   1   2  0.377538
2   61   1   3  1.382338
3   61   2   1  1.175549
4   61   2   2 -0.939276
5   61   2   3 -1.143150
6   62   1   1 -0.542440
7   62   1   2 -0.548708
8   62   1   3  0.208520
9   62   2   1  0.212690
10  62   2   2  1.268021
11  62   2   3 -0.807303

Let's assume for the sake of simplicity that the true range is 1921 through 2020, so that values of 20 or less belong to the 2000s and everything else to the 1900s:

In [16]: yr = df['Yr']

In [17]: df['Yr'] = np.where(yr <= 20, 2000 + yr, 1900 + yr)

In [18]: df
Out[18]:
      Yr  Mo  Dy      data
0   1961   1   1  0.357440
1   1961   1   2  0.377538
2   1961   1   3  1.382338
3   1961   2   1  1.175549
4   1961   2   2 -0.939276
5   1961   2   3 -1.143150
6   1962   1   1 -0.542440
7   1962   1   2 -0.548708
8   1962   1   3  0.208520
9   1962   2   1  0.212690
10  1962   2   2  1.268021
11  1962   2   3 -0.807303
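If the cutoff differs for your data, the np.where step can be wrapped in a small helper. This is only a sketch: expand_two_digit_year and its pivot parameter are hypothetical names, and the default of 20 simply repeats the assumption made above.

```python
import numpy as np
import pandas as pd


def expand_two_digit_year(yr, pivot=20):
    """Hypothetical helper: map two-digit years to four digits.

    Values <= pivot are treated as 20xx, everything else as 19xx.
    """
    yr = pd.Series(yr)
    full = np.where(yr <= pivot, 2000 + yr, 1900 + yr)
    return pd.Series(full, index=yr.index)


years = expand_two_digit_year([61, 62, 5, 20, 21])
print(years.tolist())  # [1961, 1962, 2005, 2020, 1921]
```

Keeping the pivot as a parameter makes the century assumption visible at the call site instead of burying it in a format string.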

The second thing you need to do is rename the columns. If you pass a mapping or DataFrame to pd.to_datetime(), Pandas expects columns named year, month, and day, so Yr, Mo, and Dy will not be recognized as they are. Here is that step and the result:

In [21]: df.index = pd.to_datetime(df[datecols].rename(columns=mapper))

In [22]: df
Out[22]:
              Yr  Mo  Dy      data
1961-01-01  1961   1   1  0.357440
1961-01-02  1961   1   2  0.377538
1961-01-03  1961   1   3  1.382338
1961-02-01  1961   2   1  1.175549
1961-02-02  1961   2   2 -0.939276
1961-02-03  1961   2   3 -1.143150
1962-01-01  1962   1   1 -0.542440
1962-01-02  1962   1   2 -0.548708
1962-01-03  1962   1   3  0.208520
1962-02-01  1962   2   1  0.212690
1962-02-02  1962   2   2  1.268021
1962-02-03  1962   2   3 -0.807303
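Both steps can also be combined without overwriting the original Yr column. The following is a sketch under the same century assumption; with_date_index, pivot and the index name 'date' are hypothetical choices for this example, and the small frame stands in for the real data.

```python
import numpy as np
import pandas as pd


def with_date_index(frame, pivot=20):
    """Hypothetical helper: return a copy of frame indexed by date.

    Assumes columns Yr (two-digit), Mo and Dy; years <= pivot are 20xx.
    """
    parts = pd.DataFrame({
        'year': np.where(frame['Yr'] <= pivot,
                         2000 + frame['Yr'], 1900 + frame['Yr']),
        'month': frame['Mo'],
        'day': frame['Dy'],
    })
    out = frame.copy()
    out.index = pd.DatetimeIndex(pd.to_datetime(parts), name='date')
    return out


sample = pd.DataFrame({'Yr': [61, 61, 62], 'Mo': [1, 2, 1],
                       'Dy': [1, 3, 2], 'data': [0.1, 0.2, 0.3]})
indexed = with_date_index(sample)
print(indexed)

# A DatetimeIndex allows partial-string selection, e.g. all rows from 1961.
print(indexed.loc['1961'])
```

Building the year/month/day frame separately leaves the source columns untouched, which helps if they are needed later in their original form.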

Lastly, here's an alternative: concatenate the columns as strings. (The output below comes from the original two-digit Yr values, which is why the format uses %y.)

In [27]: as_str = df[datecols].astype(str)

In [30]: pd.to_datetime(
    ...:     as_str['Yr'] + '-' + as_str['Mo'] + '-' + as_str['Dy'],
    ...:     format='%y-%m-%d'
    ...:    )
Out[30]:
0    2061-01-01
1    2061-01-02
2    2061-01-03
3    2061-02-01
4    2061-02-02
5    2061-02-03
6    2062-01-01
7    2062-01-02
8    2062-01-03
9    2062-02-01
10   2062-02-02
11   2062-02-03
dtype: datetime64[ns]

Notice that %y assumes the century for you: 61 became 2061 here. If you want to be explicit, add the correct century as above before defining as_str, and use %Y in the format instead.
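As a sketch of that explicit variant, the example below expands the year first and then parses the concatenated strings with %Y. The sample frame and the names full_year, date_str and dates are invented for illustration, and the cutoff of 20 is the same assumption as earlier.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with two-digit years.
df = pd.DataFrame({'Yr': [61, 62, 5], 'Mo': [1, 2, 12], 'Dy': [1, 3, 31]})

# Assumption: years <= 20 belong to the 2000s, the rest to the 1900s.
full_year = pd.Series(
    np.where(df['Yr'] <= 20, 2000 + df['Yr'], 1900 + df['Yr']),
    index=df.index,
)

# Zero-pad month and day so every string has the shape YYYY-MM-DD.
date_str = (
    full_year.astype(str)
    + '-' + df['Mo'].astype(str).str.zfill(2)
    + '-' + df['Dy'].astype(str).str.zfill(2)
)
dates = pd.to_datetime(date_str, format='%Y-%m-%d')
print(dates)

# For comparison, %y chooses the century on its own.
implicit = pd.to_datetime('61-01-01', format='%y-%m-%d')
print(implicit)
```

With %Y the parsed century is whatever you put in the string, so the result no longer depends on how %y resolves two-digit years.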

Shivam Sinha 635 · Accepted Answer · 2018-11-21 05:36:14Z

As pointed out by Brad, this is how I fixed it:

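def adjustyear(x):
    # Assumed cutoff, matching the 1921-2020 window used in the answer above:
    # two-digit years above 20 are 19xx, the rest are 20xx.
    if x > 20:
        x = 1900 + x
    else:
        x = 2000 + x
    return x

def parsefunc(x):
    yearmodified = adjustyear(int(x['Yr']))
    # Zero-pad month and day so '%Y%m%d' is unambiguous
    # (otherwise 1961-01-12 and 1961-11-02 both become '1961112').
    datetimestr = '{:04d}{:02d}{:02d}'.format(
        yearmodified, int(x['Mo']), int(x['Dy']))
    # errors='ignore' is deprecated in recent pandas; let bad rows raise.
    return pd.to_datetime(datetimestr, format='%Y%m%d')

# data is the DataFrame holding the Yr, Mo and Dy columns.
data['newindex'] = data.apply(parsefunc, axis=1)
data.index = data['newindex']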

Make Datetime Series from separate year, month, and date columns in Pandas
