Time Series and Timedelta in Pandas
Overview
Datasets are a crucial part of any machine learning application, and the Pandas library is one of the most common ways to work with them in Python. Pandas was originally developed to assist with financial modeling, which means it has a rich set of tools for working with dates and times. Python provides various representations of dates, times, deltas, and timespans, and we need a way to ease our work when dealing with large amounts of data carrying many timestamps. In this article, we will talk about time series and Timedelta in Pandas.
Introduction
A timedelta refers to a difference in time expressed in time units such as days, hours, minutes, and seconds. A timedelta can be either positive or negative.
Syntax:-
While implementing, we have three parameters to look at: value, unit, and kwargs. Let's deal with them one by one.
Parameters:-
- value: It can be a Timedelta, datetime.timedelta, np.timedelta64, string (str), or integer (int). It is usually supplied, but it can be omitted when the duration is given entirely through keyword arguments.
- unit: It takes a string value and denotes the unit of the input when the input is an integer. If it is omitted, the integer is read as nanoseconds ('ns'). Possible values include 'days', 'hours', 'minutes', 'seconds', 'milliseconds', 'microseconds', and 'nanoseconds', each accepted in several spellings. It is an optional parameter.
- kwargs: {days, seconds, microseconds, milliseconds, minutes, hours, weeks}. These optional keyword arguments build the duration from named components, in the same way as datetime.timedelta. The .value attribute always reports the duration in nanoseconds. We also need to remember that any precision finer than nanoseconds is truncated to nanoseconds.
Code Example 1: We look into how to initialize the Timedelta object with both value and unit.
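As an illustrative sketch of this kind of initialization (the numbers and variable names below are assumptions chosen for the example), an integer value can be paired with a unit string:

```python
import pandas as pd

# Build Timedelta objects from an integer value plus an explicit unit.
two_days = pd.Timedelta(2, unit="days")
ninety_minutes = pd.Timedelta(90, unit="minutes")

print(two_days)        # 2 days 00:00:00
print(ninety_minutes)  # 0 days 01:30:00
```

Pandas normalizes the result, so 90 minutes is displayed as 1 hour 30 minutes.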
Output:-
Code Example 2:
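One possible illustration, using hypothetical values, is to build a Timedelta from keyword arguments and inspect its .value attribute, which reports the duration in nanoseconds:

```python
import pandas as pd

# Build a duration of 1.5 seconds from keyword arguments.
td = pd.Timedelta(seconds=1, milliseconds=500)

print(td.value)            # 1500000000 (nanoseconds)
print(td.total_seconds())  # 1.5
```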
Output:-
Code Example 3:
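Since a timedelta can also be negative, a short sketch with an assumed value of minus three hours shows how pandas represents and compares such a duration:

```python
import pandas as pd

# A negative duration: three hours "backwards".
behind = pd.Timedelta(-3, unit="hours")

print(behind)                    # -1 days +21:00:00
print(abs(behind))               # 0 days 03:00:00
print(behind < pd.Timedelta(0))  # True
```

The printed form "-1 days +21:00:00" is how pandas normalizes a negative duration: minus one day plus 21 hours equals minus three hours.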
Output:-
How to Create Timedelta Objects in Pandas (using Various Arguments)
1. String:- In order to create a timedelta object using a string argument we pass a string literal.
Code Example:
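A minimal sketch of string-based construction follows; the two strings are assumptions picked to show a worded form and a clock-style form:

```python
import pandas as pd

# Units spelled out in words.
from_words = pd.Timedelta("2 days 3 hours 15 minutes")
# Days followed by an HH:MM:SS clock component.
from_clock = pd.Timedelta("1 days 06:05:01")

print(from_words)  # 2 days 03:15:00
print(from_clock)  # 1 days 06:05:01
```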
Output:-
2. Integer:- Unlike the string case, here we just pass an integer value and the object will be created. If no unit is given, the integer is interpreted as nanoseconds.
Code Example:
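The following sketch, with an arbitrary value of 6, contrasts a bare integer with the same integer accompanied by a unit:

```python
import pandas as pd

# Without a unit, the integer is read as nanoseconds.
bare = pd.Timedelta(6)
# With a unit, the same integer means six hours.
six_hours = pd.Timedelta(6, unit="h")

print(bare)       # 0 days 00:00:00.000000006
print(six_hours)  # 0 days 06:00:00
```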
Output:-
3. Data Offsets:- To create a timedelta object using data offsets, we first need to understand what a data offset actually is. Data offsets are parameters like weeks, days, hours, minutes, seconds, milliseconds, microseconds, and nanoseconds. When passed as keyword arguments, they are combined to create the timedelta object.
Code Example:
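As a sketch under assumed values, several offsets can be combined in a single call, and pandas adds them into one duration:

```python
import pandas as pd

# Combine several offsets; 1 week + 3 days + 12 hours.
span = pd.Timedelta(weeks=1, days=3, hours=12)

print(span)       # 10 days 12:00:00
print(span.days)  # 10
```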
Output:-
Code Example:
Output:-
Getting Familiar with to_timedelta() Function in Pandas
Reading the name of the function, what is the first guess you can make about it? Not that tough, is it? The function name is a definition in itself: the argument passed to this function is converted to a timedelta. So naturally, the function returns a timedelta object or a collection of them. It is written as:-
Syntax:-
While implementing this function, we see three parameters being considered: arg, unit, and errors. We will discuss them in detail one by one.
Parameters:-
- arg:- It takes in values like string, timedelta, list-like, or Series. It contains the data to be converted to timedelta. It is mandatory to mention this parameter while using the function.
- unit:- It takes a string value and denotes the unit of the input when the input is an integer. If it is omitted, nanoseconds ('ns') are assumed. It is an optional parameter.
- errors:- We have three possible values for this parameter: 'ignore', 'raise', and 'coerce'. The default value is 'raise', so it is an optional parameter.
- If the parameter value is set to 'raise', then invalid parsing raises an exception.
- If the parameter value is set to 'coerce', then invalid parsing will be set as NaT.
- If the parameter value is set to 'ignore', then invalid parsing will return the input. Note that this option is deprecated in recent pandas releases.
- Return type: Timedelta, TimedeltaIndex, or Series, depending on the input.
If the parsing is successful, the return type depends on the input. Let us see what kind of output we get for various input types.
- list-like: TimedeltaIndex of timedelta64 dtype
- Series: Series of timedelta64 dtype
- Scalar: Timedelta
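The mapping from input type to return type, along with the 'coerce' error mode, can be sketched as follows. The input values are assumptions made up for illustration:

```python
import pandas as pd

# Scalar input -> Timedelta
scalar = pd.to_timedelta("1 days 06:05:01")
# List-like input -> TimedeltaIndex
index = pd.to_timedelta([1, 2, 3], unit="D")
# Series input -> Series of timedelta64 dtype
series = pd.to_timedelta(pd.Series(["1 days", "2 days 12:00:00"]))
# errors="coerce" turns unparseable entries into NaT instead of raising.
coerced = pd.to_timedelta(["1 days", "not a duration"], errors="coerce")

print(type(scalar).__name__)    # Timedelta
print(type(index).__name__)     # TimedeltaIndex
print(type(series).__name__)    # Series
print(coerced.isna().tolist())  # [False, True]
```

With the default errors='raise', the last call would raise an exception instead of producing NaT.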
Different Operations with Time Series in Pandas (Addition Operations, Subtraction Operations)
When working with timedeltas, operations like addition and subtraction come up constantly. There are no complicated steps to follow. Let's see how these operations can be implemented in one line of code.
Operations on TimeDelta dataframe or series:-
- Addition:-
df['SUM'] = df['TimeDelta_1'] + df['TimeDelta_2']
- Subtraction:-
df['DIFF'] = df['TimeDelta_1'] - df['TimeDelta_2']
Returns: Each operation returns a Series of timedelta values, which is stored here as a new column of the dataframe.
Code Example:
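A self-contained sketch of the two one-liners above follows. The column names come from the snippet above, while the data values are assumptions:

```python
import pandas as pd

# Two timedelta columns with made-up durations.
df = pd.DataFrame({
    "TimeDelta_1": pd.to_timedelta(["1 days", "2 days 06:00:00"]),
    "TimeDelta_2": pd.to_timedelta(["12:00:00", "1 days"]),
})

# Element-wise addition and subtraction, stored as new columns.
df["SUM"] = df["TimeDelta_1"] + df["TimeDelta_2"]
df["DIFF"] = df["TimeDelta_1"] - df["TimeDelta_2"]

print(df)
```

Here the first row sums to 1 day 12 hours and differs by 12 hours, and the second row sums to 3 days 6 hours and differs by 1 day 6 hours.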
Output:-
Code Example:
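The same operators also work between a Timestamp and a Timedelta, which is a common pattern in time series work. The sketch below uses an arbitrary date and shift as assumptions:

```python
import pandas as pd

start = pd.Timestamp("2024-01-31 18:00")
shift = pd.Timedelta(hours=8)

# Timestamp +/- Timedelta gives a new Timestamp.
print(start + shift)  # 2024-02-01 02:00:00
print(start - shift)  # 2024-01-31 10:00:00

# Timestamp - Timestamp gives a Timedelta.
print((start + shift) - start)  # 0 days 08:00:00
```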
Output:-
Code Example:
Output:-
Pandas Timedelta.seconds Property
Timedelta.seconds in pandas returns the seconds component of a Timedelta, that is, the whole seconds left over after full days are removed (a value from 0 to 86399). It is a property, so it takes no parameters, and it returns an integer. To get the full duration in seconds, use Timedelta.total_seconds() instead. Let us look at the code to make this clear.
Code Example:
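A short sketch with an assumed duration of 2 days, 3 hours, 4 minutes, and 5 seconds shows how the seconds property differs from the total number of seconds:

```python
import pandas as pd

td = pd.Timedelta("2 days 03:04:05")

print(td.days)             # 2
print(td.seconds)          # 11045  (3*3600 + 4*60 + 5; days are excluded)
print(td.total_seconds())  # 183845.0  (the whole duration, days included)
```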
Output:-
Code Example:
Output:-
Conclusion
We have explored timedelta objects thoroughly. Here's a quick recap of all the essential points we have learned so far.
- timedelta is essential for expressing and calculating time differences in different units of time.
- While creating a Timedelta in pandas, the three parameters to know are value, unit, and kwargs.
- We read about data offsets, which are crucial in time series analysis, and learned how Timedelta.seconds works.
Analyzing time is crucial, and timedelta is a cornerstone of time series analysis in pandas. You can gain more insights by experimenting with your data. Play with the parameters and observe what the end result is. The better you know your data, the better results you will achieve. Keep experimenting, keep learning.