import pandas as pd pd.options.display.max_rows = 7 df = pd.read_csv(r"data/touch_events.csv") df.head()
%a
Mon
%A
Monday
%w
1
%d
30
%-d
%b
Sep
%B
September
%m
09
%-m
9
%y
13
%Y
2013
%H
07
%-H
7
%I
%-I
%p
AM
%M
06
%-M
6
%S
05
%-S
5
%f
000000
%z
%Z
%j
273
%-j
%U
39
%W
%c
Mon Sep 30 07:06:05 2013
%x
09/30/13
%X
07:06:05
%%
%
df["doc_created_utc_milli"].astype('datetime64[ns]')
0 2018-05-22 13:42:34.908000000 1 2018-05-22 13:42:41.664000000 2 2018-05-22 13:42:46.695000064 ... 115 2018-05-23 18:06:12.259000064 116 2018-05-23 18:06:21.420999936 117 2018-05-23 18:06:50.976999936 Name: doc_created_utc_milli, Length: 118, dtype: datetime64[ns]
the format argument is the format pandas is expected to see your data in, if your datetime is not in a conventional format you can pass in the format your datetime is structured using the codes above.
df["date_parsed"] = pd.to_datetime(df["doc_created_utc_milli"])
df.head()
df.dtypes
doc_created_utc_milli object event object class object date_parsed datetime64[ns, pytz.FixedOffset(120)] dtype: object
If you have a Series (Column) which is parsed as Pandas Datetime,TimeStamp you can take advantage of the dt accessor. With dt accessor you can extract sub parts of your date and time,indicator booleans for date tests. for example: Series.dt.is_month_start Indicates whether the date is the first day of the month.
df["date_parsed"].dt.date
0 2018-05-22 1 2018-05-22 2 2018-05-22 ... 115 2018-05-23 116 2018-05-23 117 2018-05-23 Name: date_parsed, Length: 118, dtype: object
df["date_parsed"].dt.time
0 15:42:34.908000 1 15:42:41.664000 2 15:42:46.695000 ... 115 20:06:12.259000 116 20:06:21.420999 117 20:06:50.976999 Name: date_parsed, Length: 118, dtype: object
df["date_parsed"].dt.year
0 2018 1 2018 2 2018 ... 115 2018 116 2018 117 2018 Name: date_parsed, Length: 118, dtype: int64
df["date_parsed"].dt.month
0 5 1 5 2 5 .. 115 5 116 5 117 5 Name: date_parsed, Length: 118, dtype: int64
df["date_parsed"].dt.day
0 22 1 22 2 22 .. 115 23 116 23 117 23 Name: date_parsed, Length: 118, dtype: int64
We can use these value in all kind of ways in our analysis, below is a simple example of grouping our data based on days and counting how many classes are in each days.
df.groupby(df.date_parsed.dt.day)["class"].value_counts()
date_parsed class 22 b 44 r 39 k 31 23 k 2 b 1 r 1 Name: class, dtype: int64
df["date_parsed"].dt.is_month_end
0 False 1 False 2 False ... 115 False 116 False 117 False Name: date_parsed, Length: 118, dtype: bool
You can always change the format of your datetime to your required format using strftime formatter(Full List of codes given above). Example:
df["date_parsed"]
0 2018-05-22 15:42:34.908000+02:00 1 2018-05-22 15:42:41.664000+02:00 2 2018-05-22 15:42:46.695000064+02:00 ... 115 2018-05-23 20:06:12.259000064+02:00 116 2018-05-23 20:06:21.420999936+02:00 117 2018-05-23 20:06:50.976999936+02:00 Name: date_parsed, Length: 118, dtype: datetime64[ns, pytz.FixedOffset(120)]
Let's change this to a custome format of our choice like dd/mm/yyyy.
df["date_parsed"].dt.strftime("%d/%m/%Y")
0 22/05/2018 1 22/05/2018 2 22/05/2018 ... 115 23/05/2018 116 23/05/2018 117 23/05/2018 Name: date_parsed, Length: 118, dtype: object
to carry out math operation such as adding and subtracting your data needs to be in pandas Datetime format.
adding days to your datetime you should use pd.DateOffset() method.
# to add 40 days df["date_1"] = df["date_parsed"] + pd.DateOffset(n = -40) #to remove 40 days df["date_2"] = df["date_parsed"] + pd.DateOffset(n = 40)
df
118 rows × 6 columns
Adding two date columns
doc_created_utc_milli object event object class object date_parsed datetime64[ns, pytz.FixedOffset(120)] date_1 datetime64[ns, pytz.FixedOffset(120)] date_2 datetime64[ns, pytz.FixedOffset(120)] dtype: object
df.date_1 = df.date_1.astype('datetime64[ns]') df.date_2 = df.date_2.astype('datetime64[ns]') df.dtypes
doc_created_utc_milli object event object class object date_parsed datetime64[ns, pytz.FixedOffset(120)] date_1 datetime64[ns] date_2 datetime64[ns] dtype: object
pandas captures 4 general time related concepts:
datetime.datetime
datetime.timedelta
dateutil.relativedelta.relativedelta
dateutil
Timestamp
DatetimeIndex
datetime64[ns]
datetime64[ns, tz]
to_datetime
date_range
Timedelta
TimedeltaIndex
timedelta64[ns]
to_timedelta
timedelta_range
Period
PeriodIndex
period[freq]
period_range
DateOffset
None