In [1]:
import pandas as pd
pd.options.display.max_rows = 7
df = pd.read_csv(r"data/touch_events.csv")

df.head()
Out[1]:
doc_created_utc_milli event class
0 2018-05-22 15:42:34.908000+02:00 [[0.9475761, 0.9558146, 1.004272, 0.9985596, 0... b
1 2018-05-22 15:42:41.664000+02:00 [[0.02438468, 0.9864037, 0.9979962, 0.9987678,... b
2 2018-05-22 15:42:46.695000064+02:00 [[0.9484811, 0.9460094, 0.987562, 0.04781944, ... r
3 2018-05-22 15:42:51.561999872+02:00 [[0.03416152, 0.9763285, 0.9972644, 0.9974135,... b
4 2018-05-22 15:42:56.312000+02:00 [[0.03698828, 0.9631263, 1.002845, 0.9995005, ... b
To Parse a datetime column into Pandas datetime format we will use pandas.to_datetime() method.
  • you can pass in the format argument which takes in strftime format method.
  • Below is a list of complete meaning of strftime formats for your information.
  • Example 01/01/2018 whould have a strftime of "%d/%m/%Y"

Code Meaning Example
%a Weekday as locale’s abbreviated name. Mon
%A Weekday as locale’s full name. Monday
%w Weekday as a decimal number, where 0 is Sunday and 6 is Saturday. 1
%d Day of the month as a zero-padded decimal number. 30
%-d Day of the month as a decimal number. (Platform specific) 30
%b Month as locale’s abbreviated name. Sep
%B Month as locale’s full name. September
%m Month as a zero-padded decimal number. 09
%-m Month as a decimal number. (Platform specific) 9
%y Year without century as a zero-padded decimal number. 13
%Y Year with century as a decimal number. 2013
%H Hour (24-hour clock) as a zero-padded decimal number. 07
%-H Hour (24-hour clock) as a decimal number. (Platform specific) 7
%I Hour (12-hour clock) as a zero-padded decimal number. 07
%-I Hour (12-hour clock) as a decimal number. (Platform specific) 7
%p Locale’s equivalent of either AM or PM. AM
%M Minute as a zero-padded decimal number. 06
%-M Minute as a decimal number. (Platform specific) 6
%S Second as a zero-padded decimal number. 05
%-S Second as a decimal number. (Platform specific) 5
%f Microsecond as a decimal number, zero-padded on the left. 000000
%z UTC offset in the form +HHMM or -HHMM (empty string if the the object is naive).
%Z Time zone name (empty string if the object is naive).
%j Day of the year as a zero-padded decimal number. 273
%-j Day of the year as a decimal number. (Platform specific) 273
%U Week number of the year (Sunday as the first day of the week) as a zero padded decimal number. All days in a new year preceding the first Sunday are considered to be in week 0. 39
%W Week number of the year (Monday as the first day of the week) as a decimal number. All days in a new year preceding the first Monday are considered to be in week 0. 39
%c Locale’s appropriate date and time representation. Mon Sep 30 07:06:05 2013
%x Locale’s appropriate date representation. 09/30/13
%X Locale’s appropriate time representation. 07:06:05
%% A literal '%' character. %

If your date column is a string of the format '2017-01-01' you can use pandas astype to convert it to datetime

In [2]:
df["doc_created_utc_milli"].astype('datetime64[ns]')
Out[2]:
0     2018-05-22 13:42:34.908000000
1     2018-05-22 13:42:41.664000000
2     2018-05-22 13:42:46.695000064
                   ...             
115   2018-05-23 18:06:12.259000064
116   2018-05-23 18:06:21.420999936
117   2018-05-23 18:06:50.976999936
Name: doc_created_utc_milli, Length: 118, dtype: datetime64[ns]

Why is format useful and when wold you use it?

the format argument is the format pandas is expected to see your data in, if your datetime is not in a conventional format you can pass in the format your datetime is structured using the codes above.

In [3]:
df["date_parsed"] = pd.to_datetime(df["doc_created_utc_milli"])
In [4]:
df.head()
Out[4]:
doc_created_utc_milli event class date_parsed
0 2018-05-22 15:42:34.908000+02:00 [[0.9475761, 0.9558146, 1.004272, 0.9985596, 0... b 2018-05-22 15:42:34.908000+02:00
1 2018-05-22 15:42:41.664000+02:00 [[0.02438468, 0.9864037, 0.9979962, 0.9987678,... b 2018-05-22 15:42:41.664000+02:00
2 2018-05-22 15:42:46.695000064+02:00 [[0.9484811, 0.9460094, 0.987562, 0.04781944, ... r 2018-05-22 15:42:46.695000064+02:00
3 2018-05-22 15:42:51.561999872+02:00 [[0.03416152, 0.9763285, 0.9972644, 0.9974135,... b 2018-05-22 15:42:51.561999872+02:00
4 2018-05-22 15:42:56.312000+02:00 [[0.03698828, 0.9631263, 1.002845, 0.9995005, ... b 2018-05-22 15:42:56.312000+02:00
In [5]:
df.dtypes
Out[5]:
doc_created_utc_milli                                   object
event                                                   object
class                                                   object
date_parsed              datetime64[ns, pytz.FixedOffset(120)]
dtype: object
In [ ]:

Series Methods to access sub parts of your date and time.

If you have a Series (Column) which is parsed as Pandas Datetime,TimeStamp you can take advantage of the dt accessor.
With dt accessor you can extract sub parts of your date and time,indicator booleans for date tests. for example:
Series.dt.is_month_start Indicates whether the date is the first day of the month.

Here are some of the more common dt accessors:

  • Series.dt.date :  
    Returns numpy array of python datetime.date objects (namely, the date part of Timestamps without timezone information).

  • Series.dt.time :  
    Returns numpy array of datetime.time.

  • Series.dt.timetz :  
    Returns numpy array of datetime.time also containing timezone information.

  • Series.dt.year :  
    The year of the datetime.

  • Series.dt.month :  
    The month as January=1, December=12.

  • Series.dt.day :  
    The days of the datetime.

  • Series.dt.hour :  
    The hours of the datetime.

  • Series.dt.minute :  
    The minutes of the datetime.

  • Series.dt.second :  
    The seconds of the datetime.

  • Series.dt.microsecond :  
    The microseconds of the datetime.

  • Series.dt.nanosecond :  
    The nanoseconds of the datetime.

  • Series.dt.week :  
    The week ordinal of the year.

  • Series.dt.weekofyear :  
    The week ordinal of the year.

  • Series.dt.dayofweek :  
    The day of the week with Monday=0, Sunday=6.

  • Series.dt.weekday :  
    The day of the week with Monday=0, Sunday=6.

  • Series.dt.dayofyear :  
    The ordinal day of the year.

  • Series.dt.quarter :  
    The quarter of the date.

  • Series.dt.is_month_start :  
    Indicates whether the date is the first day of the month.

  • Series.dt.is_month_end :  
    Indicates whether the date is the last day of the month.

  • Series.dt.is_quarter_start :  
    Indicator for whether the date is the first day of a quarter.

  • Series.dt.is_quarter_end :  
    Indicator for whether the date is the last day of a quarter.

  • Series.dt.is_year_start :  
    Indicate whether the date is the first day of a year.

  • Series.dt.is_year_end :  
    Indicate whether the date is the last day of the year.

  • Series.dt.is_leap_year :  
    Boolean indicator if the date belongs to a leap year.

  • Series.dt.daysinmonth :  
    The number of days in the month.

  • Series.dt.days_in_month :  
    The number of days in the month.

  • Series.dt.tz :  
    Return timezone, if any.
In [6]:
df["date_parsed"].dt.date
Out[6]:
0      2018-05-22
1      2018-05-22
2      2018-05-22
          ...    
115    2018-05-23
116    2018-05-23
117    2018-05-23
Name: date_parsed, Length: 118, dtype: object
In [7]:
df["date_parsed"].dt.time
Out[7]:
0      15:42:34.908000
1      15:42:41.664000
2      15:42:46.695000
            ...       
115    20:06:12.259000
116    20:06:21.420999
117    20:06:50.976999
Name: date_parsed, Length: 118, dtype: object
In [8]:
df["date_parsed"].dt.year
Out[8]:
0      2018
1      2018
2      2018
       ... 
115    2018
116    2018
117    2018
Name: date_parsed, Length: 118, dtype: int64
In [9]:
df["date_parsed"].dt.month
Out[9]:
0      5
1      5
2      5
      ..
115    5
116    5
117    5
Name: date_parsed, Length: 118, dtype: int64
In [10]:
df["date_parsed"].dt.day
Out[10]:
0      22
1      22
2      22
       ..
115    23
116    23
117    23
Name: date_parsed, Length: 118, dtype: int64

We can use these value in all kind of ways in our analysis, below is a simple example of grouping our data based on days and counting how many classes are in each days.

In [11]:
df.groupby(df.date_parsed.dt.day)["class"].value_counts()
Out[11]:
date_parsed  class
22           b        44
             r        39
             k        31
23           k         2
             b         1
             r         1
Name: class, dtype: int64
In [12]:
df["date_parsed"].dt.is_month_end
Out[12]:
0      False
1      False
2      False
       ...  
115    False
116    False
117    False
Name: date_parsed, Length: 118, dtype: bool

Custom format which will convert your datetime back into an object and should be done after your analysis.

You can always change the format of your datetime to your required format using strftime formatter(Full List of codes given above).
Example:

In [13]:
df["date_parsed"]
Out[13]:
0        2018-05-22 15:42:34.908000+02:00
1        2018-05-22 15:42:41.664000+02:00
2     2018-05-22 15:42:46.695000064+02:00
                      ...                
115   2018-05-23 20:06:12.259000064+02:00
116   2018-05-23 20:06:21.420999936+02:00
117   2018-05-23 20:06:50.976999936+02:00
Name: date_parsed, Length: 118, dtype: datetime64[ns, pytz.FixedOffset(120)]

Let's change this to a custome format of our choice like dd/mm/yyyy.

In [14]:
df["date_parsed"].dt.strftime("%d/%m/%Y")
Out[14]:
0      22/05/2018
1      22/05/2018
2      22/05/2018
          ...    
115    23/05/2018
116    23/05/2018
117    23/05/2018
Name: date_parsed, Length: 118, dtype: object

Adding/removing days to your dates

  • to carry out math operation such as adding and subtracting your data needs to be in pandas Datetime format.

  • adding days to your datetime you should use pd.DateOffset() method.

  • to add days , just put a positive integer of the number of days you wish to add.
  • to subtract days , add a negative integer of the days you wish to subtract.
In [15]:
# to add 40 days
df["date_1"] = df["date_parsed"] + pd.DateOffset(n = -40)

#to remove 40 days
df["date_2"] = df["date_parsed"] + pd.DateOffset(n = 40)
In [16]:
df
Out[16]:
doc_created_utc_milli event class date_parsed date_1 date_2
0 2018-05-22 15:42:34.908000+02:00 [[0.9475761, 0.9558146, 1.004272, 0.9985596, 0... b 2018-05-22 15:42:34.908000+02:00 2018-04-12 15:42:34.908000+02:00 2018-07-01 15:42:34.908000+02:00
1 2018-05-22 15:42:41.664000+02:00 [[0.02438468, 0.9864037, 0.9979962, 0.9987678,... b 2018-05-22 15:42:41.664000+02:00 2018-04-12 15:42:41.664000+02:00 2018-07-01 15:42:41.664000+02:00
2 2018-05-22 15:42:46.695000064+02:00 [[0.9484811, 0.9460094, 0.987562, 0.04781944, ... r 2018-05-22 15:42:46.695000064+02:00 2018-04-12 15:42:46.695000064+02:00 2018-07-01 15:42:46.695000064+02:00
... ... ... ... ... ... ...
115 2018-05-23 20:06:12.259000064+02:00 [[1.008965, 1.00946, 1.001291, 0.1155886, 1.00... r 2018-05-23 20:06:12.259000064+02:00 2018-04-13 20:06:12.259000064+02:00 2018-07-02 20:06:12.259000064+02:00
116 2018-05-23 20:06:21.420999936+02:00 [[0.04032795, 0.07729655, 0.04748052, 0.151452... k 2018-05-23 20:06:21.420999936+02:00 2018-04-13 20:06:21.420999936+02:00 2018-07-02 20:06:21.420999936+02:00
117 2018-05-23 20:06:50.976999936+02:00 [[1.005216, 1.005337, 1.001163, 1.000643, 0, 1... k 2018-05-23 20:06:50.976999936+02:00 2018-04-13 20:06:50.976999936+02:00 2018-07-02 20:06:50.976999936+02:00

118 rows × 6 columns

Adding two date columns

In [17]:
df.dtypes
Out[17]:
doc_created_utc_milli                                   object
event                                                   object
class                                                   object
date_parsed              datetime64[ns, pytz.FixedOffset(120)]
date_1                   datetime64[ns, pytz.FixedOffset(120)]
date_2                   datetime64[ns, pytz.FixedOffset(120)]
dtype: object
In [18]:
df.date_1 = df.date_1.astype('datetime64[ns]')
df.date_2 = df.date_2.astype('datetime64[ns]')

df.dtypes
Out[18]:
doc_created_utc_milli                                   object
event                                                   object
class                                                   object
date_parsed              datetime64[ns, pytz.FixedOffset(120)]
date_1                                          datetime64[ns]
date_2                                          datetime64[ns]
dtype: object

Overview

pandas captures 4 general time related concepts:

  1. Date times: A specific date and time with timezone support. Similar to datetime.datetime from the standard library.
  2. Time deltas: An absolute time duration. Similar to datetime.timedelta from the standard library.
  3. Time spans: A span of time defined by a point in time and its associated frequency.
  4. Date offsets: A relative time duration that respects calendar arithmetic. Similar to dateutil.relativedelta.relativedelta from the dateutil package.
  5. </ol>

    Concept Scalar Class Array Class pandas Data Type Primary Creation Method
    Date times Timestamp DatetimeIndex datetime64[ns] or datetime64[ns, tz] to_datetime or date_range
    Time deltas Timedelta TimedeltaIndex timedelta64[ns] to_timedelta or timedelta_range
    Time spans Period PeriodIndex period[freq] Period or period_range
    Date offsets DateOffset None None DateOffset