Data classes were introduced in Python 3.7 (PEP 557) to describe data objects with typed attributes.
from dataclasses import dataclass

@dataclass
class Order:
    """
    Order class describing price and quantity.
    """
    price: float
    quantity: int
    active: bool
The declaration is equivalent to a C++-like direct initialisation in the constructor:
def __init__(self, price: float, quantity: int, active: bool):
    self.price = price
    self.quantity = quantity
    self.active = active
The type declarations on the attributes serve as hints for users and developers, but they are not enforced on the input arguments. The main benefit developers get is the brevity of the implicit constructor. Not to mention that dataclasses become complicated as soon as private attributes are needed.
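For example, nothing stops a caller from constructing the model with mismatched types; a minimal illustration with assumed (deliberately wrong) values:

# No error is raised: type hints are not enforced at runtime
order = Order(price="free", quantity="many", active="maybe")
order.price
# Out: 'free'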
Inheritance can be a headache
If you are going to introduce inheritance in data models, it could be a headache with dataclasses.
For example, the base model Order contains an optional attribute active which defaults to True,
@dataclass
class Order:
    """
    Order class describing price and quantity.
    """
    price: float
    quantity: int
    active: bool = True
and a subclass model StopOrder has an additional attribute stop_price:
@dataclass
class StopOrder(Order):
    """
    Stop order with a specified stop price.
    """
    stop_price: float
In Python 3.9 or below, the class definition fails with TypeError: non-default argument 'stop_price' follows default argument. The short answer is that Python attempts to generate a constructor __init__ in which a non-default argument follows a default argument, like below.
def __init__(self, price: float, quantity: int, active: bool = True, stop_price: float):
    ...
The above issue was resolved in Python 3.10+, but it means dataclass inheritance was not well supported until later versions.
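As a minimal sketch of the Python 3.10+ resolution, the kw_only flag makes the generated __init__ accept keyword-only arguments, so default and non-default attributes can mix across the hierarchy:

from dataclasses import dataclass

@dataclass(kw_only=True)  # Python 3.10+
class Order:
    price: float
    quantity: int
    active: bool = True

@dataclass(kw_only=True)
class StopOrder(Order):
    stop_price: float  # no longer conflicts with the inherited default

StopOrder(price=100.0, quantity=1, stop_price=95.0)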
Pydantic
Pydantic is the backbone of FastAPI and is designed to provide a more complete framework around dataclasses. The major feature is that the attribute type hints are enforced, and users can provide custom validators per data attribute.
from pydantic import BaseModel

class Order(BaseModel):
    """
    Order class describing price and quantity.
    """
    price: float
    quantity: int
    active: bool
# Gives an integer 1 rather than "1"
order = Order(price=100.0, quantity="1", active=True)
order.quantity
# Out: 1

# Throws a ValidationError on the string input of 'quantity'
Order(price=100.0, quantity="Peter", active=True)
The object model guarantees the object is validated before it is passed downstream, a huge improvement for users and developers. Pydantic is also immune to the inheritance pain mentioned above.
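As mentioned above, custom validators can be attached per attribute. A minimal sketch with Pydantic v1's validator decorator; the positivity rule here is an assumed example:

from pydantic import BaseModel, validator

class Order(BaseModel):
    price: float
    quantity: int
    active: bool

    @validator("quantity")
    def quantity_must_be_positive(cls, value):
        # Runs after the built-in int validation succeeds
        if value <= 0:
            raise ValueError("quantity must be positive")
        return value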
Model conversion
Same as dataclasses.dataclass, Pydantic can convert models into native dict / json:
order.dict()  # {'price': 100.0, 'quantity': 1, 'active': True}
order.json()  # '{"price": 100.0, "quantity": 1, "active": true}'
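For comparison, a sketch of the standard-library counterpart for the dataclass version of Order defined earlier:

import dataclasses
import json

order = Order(price=100.0, quantity=1, active=True)
dataclasses.asdict(order)              # {'price': 100.0, 'quantity': 1, 'active': True}
json.dumps(dataclasses.asdict(order))  # '{"price": 100.0, "quantity": 1, "active": true}'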
In the meantime, a Pydantic model can be converted from an "arbitrary class instance". Of course it is not entirely "arbitrary", but as long as the class instance contains the same set of attributes, Pydantic attempts to convert it into a Pydantic model.
For example, another order class is defined with namedtuple:
from collections import namedtuple
OrderModel = namedtuple("OrderModel", ["price", "quantity", "active"])
and the Pydantic model has ORM mode activated. The model can then be created by calling the class method from_orm:
from pydantic import BaseModel

class Order(BaseModel):
    """
    Order class describing price and quantity.
    """
    price: float
    quantity: int
    active: bool

    class Config:
        orm_mode = True

Order.from_orm(OrderModel(price=100.0, quantity=1, active=True))
# Out: Order(price=100.0, quantity=1, active=True)
With ORM mode, data objects from a database, e.g. loaded via SQLAlchemy or Django, can be trivially converted into Pydantic models.
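For illustration, a minimal SQLAlchemy sketch; the OrderRow table and its columns are assumed here, not taken from the original:

from sqlalchemy import Boolean, Column, Float, Integer
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class OrderRow(Base):
    # Hypothetical ORM-mapped table mirroring the Order attributes
    __tablename__ = "orders"
    id = Column(Integer, primary_key=True)
    price = Column(Float)
    quantity = Column(Integer)
    active = Column(Boolean)

# Given a row fetched from a session, e.g. row = session.query(OrderRow).first(),
# the Pydantic model is built directly:
# Order.from_orm(row)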
Private attributes
Pydantic provides private attributes to keep confidential data attributes hidden.
from pydantic import BaseModel, PrivateAttr

class Order(BaseModel):
    """
    Order class describing price and quantity.
    """
    price: float
    quantity: int
    active: bool
    _creator: str = PrivateAttr()

    def __init__(self, creator, **kwargs):
        super().__init__(**kwargs)
        self._creator = creator
# Private attribute is hidden from the dump
order = Order(price=100.0, quantity="1", active=True, creator="Peter")
order.dict()
# Out: {'price': 100.0, 'quantity': 1, 'active': True}
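If overriding __init__ feels verbose, later Pydantic v1 releases also offer a config switch that treats every underscore-prefixed attribute as private automatically; a sketch, assuming Pydantic v1.7+:

from pydantic import BaseModel

class Order(BaseModel):
    price: float
    quantity: int
    active: bool
    _creator: str = "unknown"  # treated as a private attribute under this config

    class Config:
        underscore_attrs_are_private = True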
Do you really need validators?
As mentioned, one of the greatest advantages of employing Pydantic is the default validator provided for each supported type. At the same time, it is a crucial question for developers whether the default validator is actually needed.
For example, a data model TBaseModel simply takes a data attribute a, an integer list. Creating a new object with 10 million integers in a may take a few seconds to validate the full list.
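The definition of TBaseModel is not shown in the original; a plausible minimal sketch is:

from typing import List
from pydantic import BaseModel

class TBaseModel(BaseModel):
    a: List[int]  # every element is validated on construction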
a = list(range(10000000))
b = TBaseModel(a=a)
# %timeit: 5.17 s ± 33.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
If your upstream already guarantees the data types of the input, is enforcing validators a bonus or just a painful overhead?
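For cases where the input is already trusted, Pydantic v1 offers BaseModel.construct(), which creates a model without running any validation:

# Skips all validation: safe only if the input types are already guaranteed
b = TBaseModel.construct(a=a)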
Do you really need Pydantic?
Yes, Pydantic is great and powerful. It will definitely be a game changer for enforcing typing, if it is not already. However, there are more considerations, and also a few alternative options, when choosing between them.
A few primary and fundamental characteristics weigh heavily in these considerations. For example, I always prefer sticking with standard libraries in enterprise-level development, as I have little interest in understanding how an application breaks down due to an arbitrary low-level dependency upgrade. If possible, I would stick with dataclasses and wrap them with custom validators. On the other hand, I have better control over the release and deployment of my open-source applications, and I do not bother providing compatibility with legacy Python releases, e.g. Python 3.6. In such a case, I am happy to rely on Pydantic even if it does not serve the greatest scope of audiences, but a sufficient majority.
Now, let me summarise the major features / differences between dataclasses and Pydantic, especially for readers who are deciding on one of them as their model framework.
Standard library: No dependency needed for standard library dataclasses. A big win.
Supported version: Same (Python 3.7+)
Painless inheritance: Pydantic is immune; dataclasses require Python 3.10+ (kw_only)
Type validation: Rigidly enforced in Pydantic, but customisable
Class decorator: dataclass is applied as a decorator, while Pydantic works through BaseModel inheritance
Private attribute: Supported in Pydantic; handy when you have non-primitive attribute types, e.g. a numpy array.
A short table summarising the comparison:
| Feature | dataclass | Pydantic |
|---|---|---|
| Standard library | Y | N |
| Supported version | Python 3.7+ | Python 3.7+ |
| Painless inheritance | Python 3.10+ | Python 3.7+ |
| Type validation | N | Y |
| Class decorator | Y | N |
| Dump to / load from dict | Y | Y |
| Dump to / load from arbitrary class | N | Y |
| Private attribute | N | Y |
Also, you can get a slightly better version of dataclass with the third-party library attrs, and Tin has detailed why attrs is sometimes preferable to Pydantic.
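A minimal sketch of the attrs counterpart, using the classic attr.s API; the instance_of validator shown is an assumed illustration:

import attr

@attr.s(auto_attribs=True)
class Order:
    price: float
    quantity: int = attr.ib(validator=attr.validators.instance_of(int))
    active: bool = True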
Conclusion
Let me summarise it in a "simple" decision tree.
How does that sound?