52_Pandas: Working with date and time columns (string conversion, date extraction, etc.)

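This article explains how to work with columns of a pandas.DataFrame that represent dates and times: converting between strings and the datetime64[ns] type, extracting parts of a date and time as numbers, and so on.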

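The following topics are covered.

Converting strings to datetime64[ns] (Timestamp type): to_datetime()

Timestamp attributes and methods

Processing an entire column at once with the dt accessor

Extracting the date, day of the week, etc.

Converting datetimes to strings in any format

Converting to arrays of Python datetime or NumPy datetime64[ns] values

Methods not provided by dt

DatetimeIndex

Converting strings to datetime64[ns] when reading from a file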

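For how to set a datetime64[ns] column as the index and work with it as time series data, see the following articles.

26_Pandas.DataFrame: Handling time series data
27_Pandas: Computing totals and averages of time series data by day of the week, month, quarter, and year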

The examples use a pandas.DataFrame loaded from the following csv file.

import pandas as pd
import datetime

df = pd.read_csv('./data/sample_datetime_multi.csv')
print(df)
#                   A                   B
# 0  2017-11-01 12:24   2017年11月1日 12时24分
# 1  2017-11-18 23:00  2017年11月18日 23时00分
# 2   2017-12-05 5:05    2017年12月5日 5时05分
# 3   2017-12-22 8:54   2017年12月22日 8时54分
# 4  2018-01-08 14:20    2018年1月8日 14时20分
# 5  2018-01-19 20:01   2018年1月19日 20时01分

Converting strings to datetime64[ns] (Timestamp type): to_datetime()

The pandas.to_datetime() function converts a pandas.Series of strings representing dates and times to the datetime64[ns] type.

print(pd.to_datetime(df['A']))
# 0   2017-11-01 12:24:00
# 1   2017-11-18 23:00:00
# 2   2017-12-05 05:05:00
# 3   2017-12-22 08:54:00
# 4   2018-01-08 14:20:00
# 5   2018-01-19 20:01:00
# Name: A, dtype: datetime64[ns]

If the format is not a standard one, specify a format string in the format argument.

print(pd.to_datetime(df['B'], format='%Y年%m月%d日 %H时%M分'))
# 0   2017-11-01 12:24:00
# 1   2017-11-18 23:00:00
# 2   2017-12-05 05:05:00
# 3   2017-12-22 08:54:00
# 4   2018-01-08 14:20:00
# 5   2018-01-19 20:01:00
# Name: B, dtype: datetime64[ns]

Even if the original formats differ, the datetime64[ns] values are equal as long as they represent the same date and time.

print(pd.to_datetime(df['A']) == pd.to_datetime(df['B'], format='%Y年%m月%d日 %H时%M分'))
# 0    True
# 1    True
# 2    True
# 3    True
# 4    True
# 5    True
# dtype: bool
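The same comparison can be tried without the csv file. The following is a minimal sketch that assumes a small in-memory DataFrame (df_demo is a hypothetical name, and its two rows are modeled on the sample data) rather than the file used in this article.

```python
import pandas as pd

# Hypothetical in-memory data modeled on two rows of the sample csv.
df_demo = pd.DataFrame({
    'A': ['2017-11-01 12:24', '2017-12-05 05:05'],
    'B': ['2017年11月1日 12时24分', '2017年12月5日 5时05分'],
})

# Column A uses a standard format, so no format string is needed.
a = pd.to_datetime(df_demo['A'])

# Column B uses a non-standard format, so the format argument is given.
b = pd.to_datetime(df_demo['B'], format='%Y年%m月%d日 %H时%M分')

# Both columns now hold the same points in time.
print((a == b).all())
```

Because both conversions describe the same instants, the element-wise comparison is True for every row.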

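To add a column converted to datetime64[ns] to the pandas.DataFrame as a new column, assign it to a new column name. Assigning it to the original column name overwrites that column.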

df['X'] = pd.to_datetime(df['A'])
print(df)
#                   A                   B                   X
# 0  2017-11-01 12:24   2017年11月1日 12时24分 2017-11-01 12:24:00
# 1  2017-11-18 23:00  2017年11月18日 23时00分 2017-11-18 23:00:00
# 2   2017-12-05 5:05    2017年12月5日 5时05分 2017-12-05 05:05:00
# 3   2017-12-22 8:54   2017年12月22日 8时54分 2017-12-22 08:54:00
# 4  2018-01-08 14:20    2018年1月8日 14时20分 2018-01-08 14:20:00
# 5  2018-01-19 20:01   2018年1月19日 20时01分 2018-01-19 20:01:00

Timestamp attributes and methods

A column converted with pandas.to_datetime() has dtype datetime64[ns], and each of its elements is of type Timestamp.

print(df)
#                   A                   B                   X
# 0  2017-11-01 12:24   2017年11月1日 12时24分 2017-11-01 12:24:00
# 1  2017-11-18 23:00  2017年11月18日 23时00分 2017-11-18 23:00:00
# 2   2017-12-05 5:05    2017年12月5日 5时05分 2017-12-05 05:05:00
# 3   2017-12-22 8:54   2017年12月22日 8时54分 2017-12-22 08:54:00
# 4  2018-01-08 14:20    2018年1月8日 14时20分 2018-01-08 14:20:00
# 5  2018-01-19 20:01   2018年1月19日 20时01分 2018-01-19 20:01:00

print(df.dtypes)
# A            object
# B            object
# X    datetime64[ns]
# dtype: object

print(df['X'][0])
# 2017-11-01 12:24:00

# The module path shown in the output depends on the pandas version.
print(type(df['X'][0]))
# <class 'pandas._libs.tslib.Timestamp'>

The Timestamp type inherits from and extends the datetime type of the Python standard library's datetime module.

print(issubclass(pd.Timestamp, datetime.datetime))
# True

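The year, month, day (year, month, day), the hour, minute, second (hour, minute, second), and the day of the week as a number (dayofweek) are available as attributes. The day of the week as a string is returned by the day_name() method; the older weekday_name attribute was removed in pandas 1.0.

print(df['X'][0].year)
# 2017

# day_name() replaces the weekday_name attribute removed in pandas 1.0.
print(df['X'][0].day_name())
# Wednesday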
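As a self-contained illustration of these attributes, the sketch below builds a single Timestamp directly (ts is a hypothetical name) instead of taking it from the DataFrame.

```python
import pandas as pd

# A single Timestamp with the same value as the first row of the sample data.
ts = pd.Timestamp('2017-11-01 12:24')

# Date and time components are plain attributes.
print(ts.year, ts.month, ts.day)
print(ts.hour, ts.minute, ts.second)

# dayofweek counts from Monday = 0, so a Wednesday is 2.
print(ts.dayofweek)

# day_name() is a method and returns the English name by default.
print(ts.day_name())
```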

You can also convert to the datetime type of the Python standard library with to_pydatetime(), and to the NumPy datetime64[ns] type with to_datetime64().

py_dt = df['X'][0].to_pydatetime()
print(type(py_dt))
# <class 'datetime.datetime'>

dt64 = df['X'][0].to_datetime64()
print(type(dt64))
# <class 'numpy.datetime64'>

timestamp() is a method that returns UNIX time (epoch seconds: the number of seconds elapsed since 1970-01-01 00:00:00) as a floating-point float. Use int() if you need an integer.

print(df['X'][0].timestamp())
# 1509539040.0

print(pd.to_datetime('1970-01-01 00:00:00').timestamp())
# 0.0

print(int(df['X'][0].timestamp()))
# 1509539040
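To check that the epoch value really identifies the same instant, it can be converted back. The sketch below is an illustration that goes slightly beyond the article: it assumes the unit='s' argument of pandas.to_datetime() for the reverse conversion, and the variable names are hypothetical.

```python
import pandas as pd

ts = pd.Timestamp('2017-11-01 12:24')

# Timestamp -> epoch seconds (a timezone-naive Timestamp is treated as UTC).
epoch_sec = int(ts.timestamp())
print(epoch_sec)

# Epoch seconds -> Timestamp, interpreting the integer as seconds.
restored = pd.to_datetime(epoch_sec, unit='s')
print(restored)

# The round trip gives back the original value.
print(restored == ts)
```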

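As with the datetime type of the Python standard library, strftime() converts the value to a string in any format. How to apply it to every element of a column is described later in this article.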

print(df['X'][0].strftime('%Y/%m/%d'))
# 2017/11/01

Processing an entire column at once with the dt accessor

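There is a str accessor that applies string operations to an entire pandas.Series.

13_Pandas: Replacing strings, stripping whitespace, and other string methods

In the same way, the dt accessor applies date and time operations to an entire datetime64[ns] column.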

Extracting the date, day of the week, etc.

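As with the Timestamp type, the year, month, day (year, month, day), the hour, minute, second (hour, minute, second), the day of the week (number: dayofweek; string: the day_name() method, which replaces the weekday_name attribute removed in pandas 1.0), and so on are available. Write the attribute name after dt. Every element of the pandas.Series is processed, and a pandas.Series is returned.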

print(df['X'].dt.year)
# 0    2017
# 1    2017
# 2    2017
# 3    2017
# 4    2018
# 5    2018
# Name: X, dtype: int64

print(df['X'].dt.hour)
# 0    12
# 1    23
# 2     5
# 3     8
# 4    14
# 5    20
# Name: X, dtype: int64

You can also use dayofweek (Monday is 0, Sunday is 6) to extract only the rows that fall on a particular day of the week.

print(df['X'].dt.dayofweek)
# 0    2
# 1    5
# 2    1
# 3    4
# 4    0
# 5    4
# Name: X, dtype: int64

# Keep only the rows that fall on a Friday (dayofweek == 4).
print(df[df['X'].dt.dayofweek == 4])
#                   A                  B                   X
# 3   2017-12-22 8:54  2017年12月22日 8时54分 2017-12-22 08:54:00
# 5  2018-01-19 20:01  2018年1月19日 20时01分 2018-01-19 20:01:00
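The same boolean-mask idea works for ranges of days. As a minimal sketch, assuming an in-memory DataFrame with the six timestamps of the sample data (df_demo, weekdays_only, and fridays are hypothetical names), weekdays can be separated from weekend days like this:

```python
import pandas as pd

# The six timestamps from the sample data, built in memory.
df_demo = pd.DataFrame({'X': pd.to_datetime([
    '2017-11-01 12:24', '2017-11-18 23:00', '2017-12-05 05:05',
    '2017-12-22 08:54', '2018-01-08 14:20', '2018-01-19 20:01',
])})

# Monday is 0 and Sunday is 6, so values below 5 are weekdays.
weekdays_only = df_demo[df_demo['X'].dt.dayofweek < 5]
print(weekdays_only)

# Rows that fall on a Friday (4).
fridays = df_demo[df_demo['X'].dt.dayofweek == 4]
print(fridays)
```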

Converting datetimes to strings in any format

Converting a datetime64[ns] column to the string type str with the astype() method produces strings in the standard format.

print(df['X'].astype(str))
# 0    2017-11-01 12:24:00
# 1    2017-11-18 23:00:00
# 2    2017-12-05 05:05:00
# 3    2017-12-22 08:54:00
# 4    2018-01-08 14:20:00
# 5    2018-01-19 20:01:00
# Name: X, dtype: object

dt.strftime() converts a whole column to strings in any format at once. The resulting strings can also contain only the date or only the time.

print(df['X'].dt.strftime('%A, %B %d, %Y'))
# 0    Wednesday, November 01, 2017
# 1     Saturday, November 18, 2017
# 2      Tuesday, December 05, 2017
# 3       Friday, December 22, 2017
# 4        Monday, January 08, 2018
# 5        Friday, January 19, 2018
# Name: X, dtype: object

print(df['X'].dt.strftime('%Y年%m月%d日'))
# 0    2017年11月01日
# 1    2017年11月18日
# 2    2017年12月05日
# 3    2017年12月22日
# 4    2018年01月08日
# 5    2018年01月19日
# Name: X, dtype: object
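To make the "date only" and "time only" cases concrete, here is a short sketch on a hypothetical two-element Series (s, date_only, and time_only are illustrative names). The format codes are the standard strftime ones.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['2017-11-01 12:24', '2017-12-05 05:05']))

# Keep only the date part.
date_only = s.dt.strftime('%Y-%m-%d')
print(date_only.tolist())

# Keep only the time part.
time_only = s.dt.strftime('%H:%M')
print(time_only.tolist())
```

The results are ordinary strings, so they no longer support date arithmetic or the dt accessor.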

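To add a column converted to strings to the pandas.DataFrame as a new column, assign it to a new column name. Assigning it to the original column name overwrites that column.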

df['en'] = df['X'].dt.strftime('%A, %B %d, %Y')
df['cn'] = df['X'].dt.strftime('%Y年%m月%d日')
print(df)
#                   A                   B                   X  \
# 0  2017-11-01 12:24   2017年11月1日 12时24分 2017-11-01 12:24:00
# 1  2017-11-18 23:00  2017年11月18日 23时00分 2017-11-18 23:00:00
# 2   2017-12-05 5:05    2017年12月5日 5时05分 2017-12-05 05:05:00
# 3   2017-12-22 8:54   2017年12月22日 8时54分 2017-12-22 08:54:00
# 4  2018-01-08 14:20    2018年1月8日 14时20分 2018-01-08 14:20:00
# 5  2018-01-19 20:01   2018年1月19日 20时01分 2018-01-19 20:01:00
#
#                              en           cn
# 0  Wednesday, November 01, 2017  2017年11月01日
# 1   Saturday, November 18, 2017  2017年11月18日
# 2    Tuesday, December 05, 2017  2017年12月05日
# 3     Friday, December 22, 2017  2017年12月22日
# 4      Monday, January 08, 2018  2018年01月08日
# 5      Friday, January 19, 2018  2018年01月19日

Converting to arrays of Python datetime or NumPy datetime64[ns] values

dt.to_pydatetime() returns a NumPy array ndarray whose elements are datetime objects of the Python standard library.

print(df['X'].dt.to_pydatetime())
# [datetime.datetime(2017, 11, 1, 12, 24)
#  datetime.datetime(2017, 11, 18, 23, 0)
#  datetime.datetime(2017, 12, 5, 5, 5)
#  datetime.datetime(2017, 12, 22, 8, 54)
#  datetime.datetime(2018, 1, 8, 14, 20)
#  datetime.datetime(2018, 1, 19, 20, 1)]

print(type(df['X'].dt.to_pydatetime()))
# <class 'numpy.ndarray'>

print(type(df['X'].dt.to_pydatetime()[0]))
# <class 'datetime.datetime'>

A NumPy array of datetime64[ns] values is available through values, which is an attribute rather than a method.

print(df['X'].values)
# ['2017-11-01T12:24:00.000000000' '2017-11-18T23:00:00.000000000'
#  '2017-12-05T05:05:00.000000000' '2017-12-22T08:54:00.000000000'
#  '2018-01-08T14:20:00.000000000' '2018-01-19T20:01:00.000000000']

print(type(df['X'].values))
# <class 'numpy.ndarray'>

print(type(df['X'].values[0]))
# <class 'numpy.datetime64'>
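As an alternative sketch of the same two conversions, the example below uses Series.to_numpy() for the NumPy array and a list comprehension over the elements for the Python datetime objects. These are not the calls used in this article, and s, arr, and py_list are hypothetical names.

```python
import datetime

import numpy as np
import pandas as pd

s = pd.Series(pd.to_datetime(['2017-11-01 12:24', '2017-12-05 05:05']))

# NumPy array of datetime64 values.
arr = s.to_numpy()
print(type(arr), arr.dtype)

# Iterating over the Series yields Timestamp objects, which can each be
# converted to a standard-library datetime.
py_list = [ts.to_pydatetime() for ts in s]
print(py_list)
```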

Methods not provided by dt

For example, the Timestamp type has a method (timestamp()) that returns UNIX time in seconds, but the dt accessor does not. In such cases, use map().

06_Pandas: How to use the map(), applymap(), and apply() functions

print(df['X'].map(pd.Timestamp.timestamp))
# 0    1.509539e+09
# 1    1.511046e+09
# 2    1.512450e+09
# 3    1.513933e+09
# 4    1.515421e+09
# 5    1.516392e+09
# Name: X, dtype: float64

To convert the result to the integer type int, use the astype() method.

print(df['X'].map(pd.Timestamp.timestamp).astype(int))
# 0    1509539040
# 1    1511046000
# 2    1512450300
# 3    1513932840
# 4    1515421200
# 5    1516392060
# Name: X, dtype: int64
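For this particular case there is also a vectorized route that avoids map(). The sketch below is a different technique from the one shown in this article: it subtracts the epoch and floor-divides by a one-second Timedelta. The names s, epoch, and via_map are hypothetical.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['2017-11-01 12:24', '2017-12-05 05:05']))

# Subtracting the epoch gives timedeltas; floor-dividing by one second
# turns them into whole seconds as integers.
epoch = (s - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)
print(epoch.tolist())

# The map()-based approach gives the same numbers.
via_map = s.map(pd.Timestamp.timestamp).astype(int)
print(via_map.tolist())
```

map() remains the general tool for any Timestamp method that dt lacks; the arithmetic form only covers the UNIX time case.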

DatetimeIndex

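Using a datetime64[ns] column as the index (a DatetimeIndex) is very useful when handling time series data. See the following articles for details.

26_Pandas.DataFrame: Handling time series data
27_Pandas: Computing totals and averages of time series data by day of the week, month, quarter, and year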

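In the example, set_index() sets an existing column as the index, and the drop() method removes the extra columns for convenience.

12_Pandas.DataFrame: Deleting specified rows and columns (drop)
22_Pandas.DataFrame: Setting a column as the row labels (set_index)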

df_i = df.set_index('X').drop(['en', 'cn'], axis=1)
print(df_i)
#                                     A                   B
# X
# 2017-11-01 12:24:00  2017-11-01 12:24   2017年11月1日 12时24分
# 2017-11-18 23:00:00  2017-11-18 23:00  2017年11月18日 23时00分
# 2017-12-05 05:05:00   2017-12-05 5:05    2017年12月5日 5时05分
# 2017-12-22 08:54:00   2017-12-22 8:54   2017年12月22日 8时54分
# 2018-01-08 14:20:00  2018-01-08 14:20    2018年1月8日 14时20分
# 2018-01-19 20:01:00  2018-01-19 20:01   2018年1月19日 20时01分

print(df_i.index)
# DatetimeIndex(['2017-11-01 12:24:00', '2017-11-18 23:00:00',
#                '2017-12-05 05:05:00', '2017-12-22 08:54:00',
#                '2018-01-08 14:20:00', '2018-01-19 20:01:00'],
#               dtype='datetime64[ns]', name='X', freq=None)

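An index of type DatetimeIndex provides the same attributes, such as the year, month, day (year, month, day), the hour, minute, second (hour, minute, second), and the day of the week (number: dayofweek; string: the day_name() method, which replaces the weekday_name attribute removed in pandas 1.0), as well as methods such as strftime(). All index elements can therefore be processed at once without going through the dt accessor.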

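The return type depends on the attribute or method and is not a pandas.Series, but the result can still be added to the pandas.DataFrame as a new column by assigning it to a new column name.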

print(df_i.index.minute)
# Int64Index([24, 0, 5, 54, 20, 1], dtype='int64', name='X')

print(df_i.index.strftime('%y/%m/%d'))
# ['17/11/01' '17/11/18' '17/12/05' '17/12/22' '18/01/08' '18/01/19']

df_i['min'] = df_i.index.minute
df_i['str'] = df_i.index.strftime('%y/%m/%d')
print(df_i)
#                                     A                   B  min       str
# X
# 2017-11-01 12:24:00  2017-11-01 12:24   2017年11月1日 12时24分   24  17/11/01
# 2017-11-18 23:00:00  2017-11-18 23:00  2017年11月18日 23时00分    0  17/11/18
# 2017-12-05 05:05:00   2017-12-05 5:05    2017年12月5日 5时05分    5  17/12/05
# 2017-12-22 08:54:00   2017-12-22 8:54   2017年12月22日 8时54分   54  17/12/22
# 2018-01-08 14:20:00  2018-01-08 14:20    2018年1月8日 14时20分   20  18/01/08
# 2018-01-19 20:01:00  2018-01-19 20:01   2018年1月19日 20时01分    1  18/01/19
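The index-level attributes can be tried on a small frame built in memory. This is a minimal sketch with hypothetical names (idx, df_demo, and the value column are illustrative), not the DataFrame used in this article.

```python
import pandas as pd

# A small frame whose index is a DatetimeIndex.
idx = pd.DatetimeIndex(['2017-11-01 12:24', '2017-11-18 23:00'])
df_demo = pd.DataFrame({'value': [1, 2]}, index=idx)

# Attributes and methods are called on the index itself, without dt.
df_demo['min'] = df_demo.index.minute
df_demo['str'] = df_demo.index.strftime('%y/%m/%d')
print(df_demo)
```

Assignment works because the results have one entry per index element, in the same order as the rows.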

Converting strings to datetime64[ns] when reading from a file

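When reading data from a file, strings can be converted to datetime64[ns] at read time. For the pandas.read_csv() function, pass a list of the column numbers to convert to datetime64[ns] in the parse_dates argument. Note that the value must be a list even when there is only one column.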

df_csv = pd.read_csv('data/sample_datetime_multi.csv', parse_dates=[0])
print(df_csv)
#                     A                   B
# 0 2017-11-01 12:24:00   2017年11月1日 12时24分
# 1 2017-11-18 23:00:00  2017年11月18日 23时00分
# 2 2017-12-05 05:05:00    2017年12月5日 5时05分
# 3 2017-12-22 08:54:00   2017年12月22日 8时54分
# 4 2018-01-08 14:20:00    2018年1月8日 14时20分
# 5 2018-01-19 20:01:00   2018年1月19日 20时01分

print(df_csv.dtypes)
# A    datetime64[ns]
# B            object
# dtype: object

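If the format is not a standard one, pass a conversion function to the date_parser argument.

# Note: date_parser is deprecated as of pandas 2.0 in favor of date_format,
# so recent versions may need date_format='%Y年%m月%d日 %H时%M分' instead.
df_csv_jp = pd.read_csv('./data/sample_datetime_multi.csv',
                        parse_dates=[1],
                        date_parser=lambda date: pd.to_datetime(date, format='%Y年%m月%d日 %H时%M分'))
print(df_csv_jp)
#                   A                   B
# 0  2017-11-01 12:24 2017-11-01 12:24:00
# 1  2017-11-18 23:00 2017-11-18 23:00:00
# 2   2017-12-05 5:05 2017-12-05 05:05:00
# 3   2017-12-22 8:54 2017-12-22 08:54:00
# 4  2018-01-08 14:20 2018-01-08 14:20:00
# 5  2018-01-19 20:01 2018-01-19 20:01:00

print(df_csv_jp.dtypes)
# A            object
# B    datetime64[ns]
# dtype: object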
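Read-time parsing can be tried without a file on disk. The sketch below assumes csv text held in a string and read through io.StringIO (csv_text and df_demo are hypothetical names). The standard-format column is parsed by parse_dates, while the non-standard column is converted after reading with pandas.to_datetime(), an approach chosen here because it does not depend on which parser argument a given pandas version supports.

```python
import io

import pandas as pd

# Hypothetical csv content modeled on the sample file.
csv_text = (
    "A,B\n"
    "2017-11-01 12:24,2017年11月1日 12时24分\n"
    "2017-12-05 05:05,2017年12月5日 5时05分\n"
)

# Column 0 has a standard format, so parse_dates handles it at read time.
df_demo = pd.read_csv(io.StringIO(csv_text), parse_dates=[0])

# Column B has a non-standard format; convert it after reading.
df_demo['B'] = pd.to_datetime(df_demo['B'], format='%Y年%m月%d日 %H时%M分')

print(df_demo.dtypes)
print((df_demo['A'] == df_demo['B']).all())
```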

The index_col argument specifies the column to use as the index.

03_Pandas: Reading csv/tsv files (read_csv, read_table)

In this case, if parse_dates=True is given, the index column is converted to datetime64[ns].

df_csv_jp_i = pd.read_csv('./data/sample_datetime_multi.csv',
                          index_col=1,
                          parse_dates=True,
                          date_parser=lambda date: pd.to_datetime(date, format='%Y年%m月%d日 %H时%M分'))
print(df_csv_jp_i)
#                                     A
# B
# 2017-11-01 12:24:00  2017-11-01 12:24
# 2017-11-18 23:00:00  2017-11-18 23:00
# 2017-12-05 05:05:00   2017-12-05 5:05
# 2017-12-22 08:54:00   2017-12-22 8:54
# 2018-01-08 14:20:00  2018-01-08 14:20
# 2018-01-19 20:01:00  2018-01-19 20:01

print(df_csv_jp_i.index)
# DatetimeIndex(['2017-11-01 12:24:00', '2017-11-18 23:00:00',
#                '2017-12-05 05:05:00', '2017-12-22 08:54:00',
#                '2018-01-08 14:20:00', '2018-01-19 20:01:00'],
#               dtype='datetime64[ns]', name='B', freq=None)

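The pandas.read_excel() function, which reads Excel files, also has the parse_dates, date_parser, and index_col arguments, so the same kind of conversion can be done at read time. For more on pandas.read_excel(), see the following article.

50_Pandas: Reading Excel files (xlsx, xls)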


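This article is reposted from: https://blog.csdn.net/qq_18351157/article/details/127703926
Copyright belongs to the original author, 饺子大人. If there is any infringement, please contact us for removal.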
