Understanding Python Data Class
Overview
Everything in Python is an object. Python's class object is used to construct custom objects with their properties and functions. However, building classes in Python might involve writing a lot of boilerplate code to set up the class instance based on the parameters supplied to it or to create common functions like comparison operators. Data classes, introduced in Python 3.7 (and backported to Python 3.6), provide a convenient approach to reducing the verbosity of classes. Many typical tasks in a class, such as instantiating properties from parameters supplied to the class, can be reduced to a few simple commands.
Python Data Class Module
Data classes were introduced in Python 3.7 through the dataclasses module of the standard library. The module is a utility for creating structured classes designed primarily to store data.
Installing the Python Data Class Module
On Python 3.7 and later, the dataclasses module ships with the standard library, so nothing needs to be installed. On Python 3.6, the backport can be installed with pip install dataclasses. In both cases, the decorator is imported with from dataclasses import dataclass.
A data class gives us object initialization, object comparison, and object representation without writing boilerplate code.
A data class can be described as a class whose body lists its attributes (fields) together with their types and, optionally, their default values. The data class itself is a template rather than a piece of data; it is used to create objects from scratch.
Basics of Python Data Class
As discussed earlier, data classes were introduced in Python 3.7 as a utility for creating structured classes for data storage.
These classes have attributes and methods that deal with data and its representation. Although the dataclasses module first shipped with Python 3.7, it can also be used on Python 3.6 once the dataclasses backport is installed.
Let's look at an example to have a better idea of the data class. As an example, consider the Position class, which will represent geographic positions with a name as well as latitude and longitude:
Code:
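The class described above could look like the following minimal sketch. The field order name, lon, lat is taken from the examples later in this article; the docstring and comments are added for illustration.

```python
from dataclasses import dataclass


@dataclass
class Position:
    """A named geographic position."""
    name: str    # place name, for example 'Oslo'
    lon: float   # longitude in degrees
    lat: float   # latitude in degrees
```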
The @dataclass decorator just above the class declaration marks this as a data class. Beneath the class Position: line, simply list the fields you want in your data class. The : notation used for the fields relies on variable annotations, a feature added in Python 3.6.
Those few lines of code are all you need. The new class is now available for use:
Code:
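As a sketch of typical usage, the snippet below creates a Position, prints it, reads one field, and formats the fields into a sentence. The values for Oslo are example data; the expected output is shown in the comments.

```python
from dataclasses import dataclass


@dataclass
class Position:
    name: str
    lon: float
    lat: float


pos = Position('Oslo', 10.8, 59.9)

print(pos)       # Position(name='Oslo', lon=10.8, lat=59.9)
print(pos.lat)   # 59.9
print(f'{pos.name} is at {pos.lat}°N, {pos.lon}°E')   # Oslo is at 59.9°N, 10.8°E
```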
You can also create data classes in a way similar to how named tuples are created. The following is nearly equivalent to the definition of Position above:
Code:
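One way to do this is the make_dataclass() function from the dataclasses module. The sketch below passes the fields as (name, type) pairs; the field names mirror the Position example.

```python
from dataclasses import make_dataclass

# Fields are given as (name, type) pairs, much like namedtuple field names.
Position = make_dataclass(
    'Position',
    [('name', str), ('lon', float), ('lat', float)],
)

pos = Position('Oslo', 10.8, 59.9)
print(pos)   # Position(name='Oslo', lon=10.8, lat=59.9)
```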
A data class is just like any other Python class. The only thing that distinguishes it is that it has basic data model methods like .__init__(), .__repr__(), and .__eq__() implemented for you.
Default Values
It is simple to add default values to the fields of your data class:
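A minimal sketch, assuming 0.0 as the default for both coordinates (the default values and the example places are illustrative):

```python
from dataclasses import dataclass


@dataclass
class Position:
    name: str
    lon: float = 0.0   # used when no longitude is given
    lat: float = 0.0   # used when no latitude is given


print(Position('Null Island'))              # Position(name='Null Island', lon=0.0, lat=0.0)
print(Position('Greenwich', lat=51.8))      # Position(name='Greenwich', lon=0.0, lat=51.8)
print(Position('Vancouver', -123.1, 49.3))  # Position(name='Vancouver', lon=-123.1, lat=49.3)
```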
This works precisely the same as if you had supplied the default settings in the specification of the .__init__() method of a regular class:
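For comparison, a hand-written regular class with the same defaults might look like the sketch below. The name RegularPosition and the 0.0 defaults are assumptions made for illustration.

```python
class RegularPosition:
    """Hand-written counterpart of a data class with default values."""

    def __init__(self, name: str, lon: float = 0.0, lat: float = 0.0):
        self.name = name
        self.lon = lon
        self.lat = lat


null_island = RegularPosition('Null Island')         # both defaults apply
greenwich = RegularPosition('Greenwich', lat=51.8)   # only lon falls back to 0.0

print(null_island.lon, null_island.lat)   # 0.0 0.0
print(greenwich.lon, greenwich.lat)       # 0.0 51.8
```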
Type Hints
So far, we haven't made a big deal about the fact that data classes support typing right out of the box. You may have noticed that we defined the fields with a type hint: name: str specifies that the name should be a text string (str type).
In fact, adding some kind of type hint is mandatory when defining the fields in your data class. A name without a type hint will not become a field of the data class. typing.Any is helpful if you don't want to commit to explicit types in your data class.
When using data classes, you must include type hints in some form, although these types are not enforced at runtime. The following code executes without error:
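The sketch below illustrates this with deliberately mismatched argument types; the values passed in are arbitrary examples.

```python
from dataclasses import dataclass


@dataclass
class Position:
    name: str
    lon: float
    lat: float


# The annotations say str and float, but nothing checks them at runtime.
odd = Position(3.14, 'pi day', 2018)
print(odd)   # Position(name=3.14, lon='pi day', lat=2018)
```

A static type checker would flag the call, but Python itself runs it without complaint.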
Adding Methods
You are already aware that a data class is simply an ordinary class. That is, you are free to add your methods to a data class. As an example, consider calculating the distance between two points on the Earth's surface. One method to do this is by using the haversine formula:
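Written out, with latitudes \phi_1, \phi_2 and longitudes \lambda_1, \lambda_2 given in radians and r denoting the radius of the Earth, the haversine distance d between two points is:

```latex
d = 2r \arcsin\left(
      \sqrt{
        \sin^2\left(\frac{\phi_2 - \phi_1}{2}\right)
        + \cos\phi_1 \cos\phi_2 \,
          \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)
      }
    \right)
```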
Just like with a normal class, you can add a distance_to() method to your data class:
Code
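A minimal sketch of such a method follows. The Earth radius of 6371 km, the 0.0 field defaults, and the local variable names are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Position:
    name: str
    lon: float = 0.0
    lat: float = 0.0

    def distance_to(self, other: "Position") -> float:
        """Great-circle distance to another position in km (haversine)."""
        r = 6371  # mean Earth radius in km (assumed value)
        lam_1, lam_2 = radians(self.lon), radians(other.lon)
        phi_1, phi_2 = radians(self.lat), radians(other.lat)
        h = (sin((phi_2 - phi_1) / 2) ** 2
             + cos(phi_1) * cos(phi_2) * sin((lam_2 - lam_1) / 2) ** 2)
        return 2 * r * asin(sqrt(h))


oslo = Position('Oslo', 10.8, 59.9)
vancouver = Position('Vancouver', -123.1, 49.3)
distance = oslo.distance_to(vancouver)
print(f'{distance:.0f} km')   # roughly 7,180 km
```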
Immutable Data Classes in Python
One of the distinguishing characteristics of the named tuple is that it is immutable. That is, the values of its fields may never change. This is a great concept for a wide range of data classes! When you construct a data class, set frozen=True to make it immutable. For example, here's an immutable version of the Position class that you saw earlier:
You cannot assign values to fields in a frozen data class once they are created:
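A short sketch of a frozen Position and a rejected assignment (the 0.0 defaults and the example values are assumptions):

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Position:
    name: str
    lon: float = 0.0
    lat: float = 0.0


pos = Position('Oslo', 10.8, 59.9)

error_message = None
try:
    pos.name = 'Stockholm'   # assignment to a frozen field is rejected
except FrozenInstanceError as exc:
    error_message = str(exc)

print(error_message)   # cannot assign to field 'name'
print(pos.name)        # Oslo
```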
However, keep in mind that mutable fields in your data class may still change. This is true for all Python nested data structures.
Although ImmutableCard and ImmutableDeck are both immutable, the list of cards is not. As a result, you can still change the cards in the deck:
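The following sketch shows the effect. The field names rank, suit, and cards are assumed here, since the definitions of ImmutableCard and ImmutableDeck are not shown in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ImmutableCard:
    rank: str
    suit: str


@dataclass(frozen=True)
class ImmutableDeck:
    cards: List[ImmutableCard]   # the list object itself stays mutable


queen_of_hearts = ImmutableCard('Q', 'Hearts')
ace_of_spades = ImmutableCard('A', 'Spades')
deck = ImmutableDeck([queen_of_hearts, ace_of_spades])

# Rebinding deck.cards would fail, but mutating the list in place succeeds.
deck.cards[0] = ImmutableCard('7', 'Diamonds')
print(deck.cards[0])   # ImmutableCard(rank='7', suit='Diamonds')
```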
To avoid this, ensure that all fields of an immutable data class are immutable (but remember that types are not enforced at runtime). Instead of a list, a tuple should be used to implement the ImmutableDeck.
Extra Flexibility with Python Data Class
So far, you've seen some of the data class's basic features: it provides some convenience methods, and you may still add default values and other methods. You will now learn about more complex capabilities such as @dataclass decorator parameters and the field() function. They work together to give you more options when constructing a data class.
Let's go back to the playing card example from earlier in the lesson and add a class that contains a deck of cards while we're at it:
A simple deck of only two cards can be made as follows:
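A minimal sketch of the two classes and a two-card deck; the class names PlayingCard and Deck and the field names are assumptions, as the original definitions are not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlayingCard:
    rank: str
    suit: str


@dataclass
class Deck:
    cards: List[PlayingCard]


queen_of_hearts = PlayingCard('Q', 'Hearts')
ace_of_spades = PlayingCard('A', 'Spades')
two_cards = Deck([queen_of_hearts, ace_of_spades])

print(two_cards)
# Deck(cards=[PlayingCard(rank='Q', suit='Hearts'), PlayingCard(rank='A', suit='Spades')])
```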
Inheritance
You can freely subclass data classes. As an example, we will add a country field to our Position example and utilize it to record capitals:
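A sketch of the subclass, assuming the three-field Position from earlier without default values:

```python
from dataclasses import dataclass


@dataclass
class Position:
    name: str
    lon: float
    lat: float


@dataclass
class Capital(Position):
    country: str   # appended after the inherited fields name, lon, lat
```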
Everything works perfectly in this simple example:
Input: Capital('Oslo', 10.8, 59.9, 'Norway')
Output: Capital(name='Oslo', lon=10.8, lat=59.9, country='Norway')
After the three original fields in Position, the country field of the Capital class is added. Things get a little more complex if any of the base class's fields have default values:
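The sketch below reproduces the situation, assuming 0.0 defaults in the base class. The class definition is wrapped in try/except only so that the error can be inspected instead of stopping the script.

```python
from dataclasses import dataclass


@dataclass
class Position:
    name: str
    lon: float = 0.0
    lat: float = 0.0


error_message = None
try:
    @dataclass
    class Capital(Position):
        country: str   # no default, but it follows fields that have one
except TypeError as exc:
    error_message = str(exc)

# The message starts with:
# non-default argument 'country' follows default argument
print(error_message)
```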
This code will immediately crash with a TypeError stating "non-default argument 'country' follows default argument". The issue is that our new country field has no default value, while the lon and lat fields do. The data class will attempt to write an .__init__() method with the following signature:
Code:
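Because such a signature cannot be written directly in a runnable file, the sketch below keeps it in a string and asks Python to compile it. The 0.0 defaults are assumed for illustration.

```python
# The signature the data class would have to generate (0.0 defaults assumed).
signature = (
    "def __init__(self, name: str, lon: float = 0.0, "
    "lat: float = 0.0, country: str): pass"
)

is_valid = True
try:
    compile(signature, '<sketch>', 'exec')   # compiles only; nothing is executed
except SyntaxError:
    is_valid = False

print(is_valid)   # False
```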
The lat field is used for obtaining latitude and the lon field is used for obtaining longitude.
However, this is not valid Python. If a parameter has a default value, all subsequent parameters must have a default value as well. To put it another way, if a field in a base class has a default value, then all new fields introduced in a subclass must also have default values.
Another thing to consider is how fields are organized in a subclass. Beginning with the base class, fields are arranged in the order in which they were first defined. The order of fields in a subclass does not change if they are redefined. For example, consider the following definitions of Position and Capital:
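A sketch of such definitions follows. The 0.0 defaults and the 'Unknown' default for country are assumptions; only the 40.0 default for lat comes from the text.

```python
from dataclasses import dataclass, fields


@dataclass
class Position:
    name: str
    lon: float = 0.0
    lat: float = 0.0


@dataclass
class Capital(Position):
    country: str = 'Unknown'   # new field, so it needs a default
    lat: float = 40.0          # redefined: new default, original position kept


field_names = [f.name for f in fields(Capital)]
print(field_names)   # ['name', 'lon', 'lat', 'country']
```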
The order of the fields in Capital will remain name, lon, lat, and country. However, the default value of lat will be 40.0.
Input: Capital('Madrid', country='Spain')
Output:
Equality of Data Classes
Since the classes store data, determining if two objects have the same data is a typical task with data classes. The == operator is used to do this.
The code for an equivalent class for storing an article without a data class decorator is shown below.
Output of the above code:
For a regular class that does not define .__eq__(), the == operator falls back to comparing object identity, that is, whether both operands are the same object in memory. Because two separately created objects occupy distinct memory locations, the equality check returns False. A data class instead gets a generated .__eq__() method that compares the field values, which is why an equality check between two data class objects that hold the same data returns True.
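The contrast can be sketched with two small article classes. The class names PlainArticle and DataArticle and the fields title and author are hypothetical.

```python
from dataclasses import dataclass


class PlainArticle:
    """Regular class without a custom __eq__()."""

    def __init__(self, title: str, author: str):
        self.title = title
        self.author = author


@dataclass
class DataArticle:
    title: str
    author: str


plain_equal = (PlainArticle('Data Classes', 'A. Writer')
               == PlainArticle('Data Classes', 'A. Writer'))
data_equal = (DataArticle('Data Classes', 'A. Writer')
              == DataArticle('Data Classes', 'A. Writer'))

print(plain_equal)   # False: two distinct objects, compared by identity
print(data_equal)    # True: the generated __eq__() compares field values
```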
Data Class Optimization
Slots can be used to optimize classes so that they consume less memory. Up to Python 3.9 there is no dedicated syntax for working with slots in data classes, but the standard way of defining slots also works for data classes. (They are, in fact, regular classes!) From Python 3.10 onward, the @dataclass decorator also accepts slots=True.
Essentially, slots are defined by using .__slots__ to list the attributes of a class. Attributes not listed in .__slots__ may not be set on an instance. Furthermore, a class with a hand-written .__slots__ may not give those fields default values.
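A minimal sketch of a slots data class written with a manual .__slots__ list; the class name SlotPosition and the example values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class SlotPosition:
    __slots__ = ['name', 'lon', 'lat']   # the only attributes instances may have
    name: str
    lon: float
    lat: float


pos = SlotPosition('London', -0.1, 51.5)

extra_attribute_allowed = True
try:
    pos.country = 'UK'   # not listed in __slots__
except AttributeError:
    extra_attribute_allowed = False

print(extra_attribute_allowed)    # False
print(hasattr(pos, '__dict__'))   # False: no per-instance dictionary
```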
The advantage of imposing such constraints is that certain optimizations can be carried out. For example, slot classes use less memory, as measured by Pympler:
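Pympler is a third-party package, so the sketch below substitutes sys.getsizeof() from the standard library. This is a rougher, shallow measure than Pympler's: it counts the instance plus its attribute dictionary but not the field values themselves. The class names are illustrative.

```python
import sys
from dataclasses import dataclass


@dataclass
class SimplePosition:
    name: str
    lon: float
    lat: float


@dataclass
class SlotPosition:
    __slots__ = ['name', 'lon', 'lat']
    name: str
    lon: float
    lat: float


simple = SimplePosition('London', -0.1, 51.5)
slot = SlotPosition('London', -0.1, 51.5)

# A regular instance also pays for its attribute dictionary;
# a slots instance has no such dictionary.
simple_size = sys.getsizeof(simple) + sys.getsizeof(simple.__dict__)
slot_size = sys.getsizeof(slot)

print(slot_size < simple_size)   # True on CPython
```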
Similarly, slots classes are typically faster to work with. The following example compares the speed of attribute access on a slots data class and a regular data class using timeit from the standard library.
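A sketch of such a comparison is given here. The class names and the repetition count are assumptions, and the measured times depend on the machine and the Python version.

```python
from dataclasses import dataclass
from timeit import timeit


@dataclass
class SimplePosition:
    name: str
    lon: float
    lat: float


@dataclass
class SlotPosition:
    __slots__ = ['name', 'lon', 'lat']
    name: str
    lon: float
    lat: float


simple = SimplePosition('London', -0.1, 51.5)
slot = SlotPosition('London', -0.1, 51.5)

# Time one million attribute reads on each kind of instance.
simple_time = timeit('simple.name', globals={'simple': simple}, number=1_000_000)
slot_time = timeit('slot.name', globals={'slot': slot}, number=1_000_000)

print(f'regular: {simple_time:.3f}s, slots: {slot_time:.3f}s')
```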
In the original measurement, the slots class was approximately 35% faster; exact timings vary with the machine and the Python version.
Alternatives to Python Data Class
You've probably used a tuple or a dict for simple data structures. The queen of hearts card could be represented in either of two ways:
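The two representations could look like this; the variable names and the dictionary keys rank and suit are assumptions.

```python
# Tuple version: meaning is carried by position only.
queen_of_hearts_tuple = ('Q', 'Hearts')

# Dict version: meaning is carried by string keys.
queen_of_hearts_dict = {'rank': 'Q', 'suit': 'Hearts'}
```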
It works. However, it imposes a great deal of responsibility on you as a programmer:
- You need to remember that the variable queen_of_hearts_... denotes a card.
- You need to remember the attribute order for the tuple version. Writing ('Spades', 'A') will break your software but will most likely not result in an understandable error message.
- When using the dict version, you need to make sure the attribute names are consistent. For example, {'value': 'A', 'suit': 'Spades'} will not work as expected.
Moreover, using these structures is not ideal:
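For instance, reading a value back requires an index or a string key rather than a named attribute. The variable names and keys below are assumptions.

```python
queen_of_hearts_tuple = ('Q', 'Hearts')
queen_of_hearts_dict = {'rank': 'Q', 'suit': 'Hearts'}

print(queen_of_hearts_tuple[0])       # Q  (positional index, no named access)
print(queen_of_hearts_dict['suit'])   # Hearts  (string key rather than .suit)
```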
The namedtuple is a better alternative. It's been used for a long time to create readable small data structures. We can rebuild the data class example above using a namedtuple, as seen below:
This NamedTupleCard definition delivers the same outputs as our DataClassCard example did:
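A sketch that puts both versions side by side; the field names rank and suit are assumed, since the DataClassCard definition is not shown in the text.

```python
from collections import namedtuple
from dataclasses import dataclass

NamedTupleCard = namedtuple('NamedTupleCard', ['rank', 'suit'])


@dataclass
class DataClassCard:
    rank: str
    suit: str


nt_queen = NamedTupleCard('Q', 'Hearts')
dc_queen = DataClassCard('Q', 'Hearts')

print(nt_queen)        # NamedTupleCard(rank='Q', suit='Hearts')
print(dc_queen)        # DataClassCard(rank='Q', suit='Hearts')
print(nt_queen.rank)   # Q
print(nt_queen == NamedTupleCard('Q', 'Hearts'))   # True
```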
So why even bother using data classes at all? First off, data classes contain a much wider range of features than what you have seen thus far. Additionally, the namedtuple has several other characteristics that are not always desired. A namedtuple is an ordinary tuple by definition. This can be seen in comparisons, for example:
Even though it would appear advantageous, this ignorance of its type can result in subtle and difficult-to-detect bugs because it will also gladly compare two distinct namedtuple classes:
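Both comparisons can be sketched as follows. Person is a hypothetical, unrelated namedtuple type introduced only for the demonstration.

```python
from collections import namedtuple

NamedTupleCard = namedtuple('NamedTupleCard', ['rank', 'suit'])
Person = namedtuple('Person', ['first_initial', 'last_name'])   # unrelated type

queen_of_hearts = NamedTupleCard('Q', 'Hearts')
ace_of_spades = NamedTupleCard('A', 'Spades')

print(queen_of_hearts == ('Q', 'Hearts'))       # True: equal to a plain tuple
print(ace_of_spades == Person('A', 'Spades'))   # True: the class is ignored
```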
There are also certain limitations with the namedtuple. For example, it can be challenging to set default values to some namedtuple fields. Additionally, a namedtuple is immutable by nature. In other words, a namedtuple value can never change. This feature works great in some situations, but in others, it would be good to have greater flexibility:
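A short sketch of the immutability; the card values are arbitrary examples.

```python
from collections import namedtuple

NamedTupleCard = namedtuple('NamedTupleCard', ['rank', 'suit'])
card = NamedTupleCard('7', 'Diamonds')

assignment_allowed = True
try:
    card.rank = '9'   # namedtuple fields are read-only
except AttributeError:
    assignment_allowed = False

print(assignment_allowed)   # False
print(card)                 # NamedTupleCard(rank='7', suit='Diamonds')
```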
Data classes will not completely replace namedtuple usage. For example, if you need your data structure to behave like a tuple, a named tuple would be an excellent choice!
The attrs project is another alternative and one of the inspirations for data classes. After installing attrs (pip install attrs), you can create a card class as follows:
This can be used in the same way as the previous DataClassCard and NamedTupleCard examples. The attrs project is good, and it has some features that data classes do not, such as converters and validators. Furthermore, attrs has been around for a long time and has historically supported Python 2.7 as well as Python 3.4 and higher. However, because attrs is not included in the standard library, it adds an external dependency to your applications. Data classes make similar functionality available everywhere the standard library is.
In addition to dict, tuple, namedtuple, and attrs, there are many other similar projects, including typing.NamedTuple, plumber, namedlist, attrdict, and fields. While data classes are a great new alternative, there are still some situations in which one of the older alternatives is preferable, for example if you require compatibility with a specific API that expects tuples or need functionality that data classes do not support.
FAQs
In this section, we will go through some frequently asked questions regarding Python data classes.
Q: Which versions of Python support the Python data classes?
A: Python data classes are built into Python 3.7 and later. On Python 3.6, they are available through the dataclasses backport package.
Q: What is the Python data class module used for?
A: The Python data class module is used as a utility tool to create structured classes that have been designed particularly to store data.
Q: How do we use Python data classes in our module?
A: On Python 3.7 and later, import the decorator with from dataclasses import dataclass. On Python 3.6, first install the backport with the pip install dataclasses command.
Q: How are Python data classes distinguished from other Python classes?
A: The only thing that distinguishes Python data classes from other Python classes is that they have basic data model methods like .__init__(), .__repr__(), and .__eq__() implemented for you.
Q: Is it possible to create truly immutable data objects in Python?
A: Creating a completely immutable Python object is not possible. However, passing frozen=True to the dataclass() decorator emulates immutability by raising an error when a field is assigned.
Q: How is the equality operator used with Python data classes?
A: Since the classes store data, determining if two objects have the same data is a typical task with data classes. In such cases, the == operator is used to do the comparison.
Q: How can we optimize the data classes?
A: Python data classes can be optimized with slots, which reduces the memory they consume.
Q: What are some alternatives to Python data classes?
A: Some of the alternatives to Python data classes are tuples, dictionaries, named tuples, attrs, and pydantic.
Conclusion
- Everything in Python is an object.
- The Python class object is used to construct custom objects with their own properties and functions.
- The Python data class was introduced in Python 3.7.
- The Python data class is a utility tool to create structured classes that have been designed particularly to store data.
- Python data classes differ from other Python classes in that basic data model methods like .__init__(), .__repr__(), and .__eq__() are generated for you.
- Python data classes can be optimized with slots, which reduces the memory they consume.
- The == operator is used to determine whether two data class objects hold the same data.