All Projects → Fatal1ty → Mashumaro

Fatal1ty / Mashumaro

Licence: apache-2.0
Fast and well tested serialization framework on top of dataclasses

Programming Languages

python
139335 projects - #7 most used programming language
python3
1442 projects

Projects that are alternatives of or similar to Mashumaro

Srsly
🦉 Modern high-performance serialization utilities for Python (JSON, MessagePack, Pickle)
Stars: ✭ 189 (-9.13%)
Mutual labels:  json, yaml, msgpack, serialization
Java
jsoniter (json-iterator) is fast and flexible JSON parser available in Java and Go
Stars: ✭ 1,308 (+528.85%)
Mutual labels:  json, serialization, deserialization
Go
A high-performance 100% compatible drop-in replacement of "encoding/json"
Stars: ✭ 10,248 (+4826.92%)
Mutual labels:  json, serialization, deserialization
Yyjson
The fastest JSON library in C
Stars: ✭ 1,894 (+810.58%)
Mutual labels:  json, serialization, deserialization
Aspjson
A fast classic ASP JSON parser and encoder for easy JSON manipulation to work with the new JavaScript MV* libraries and frameworks.
Stars: ✭ 165 (-20.67%)
Mutual labels:  json, serialization, deserialization
Fastjson
A fast JSON parser/generator for Java.
Stars: ✭ 23,997 (+11437.02%)
Mutual labels:  json, serialization, deserialization
Dataclass factory
Modern way to convert python dataclasses or other objects to and from more common types like dicts or json-like structures
Stars: ✭ 116 (-44.23%)
Mutual labels:  json, serialization, deserialization
Handyjson
A handy swift json-object serialization/deserialization library
Stars: ✭ 3,913 (+1781.25%)
Mutual labels:  json, serialization, deserialization
Pyjson tricks
Extra features for Python's JSON: comments, order, numpy, pandas, datetimes, and many more! Simple but customizable.
Stars: ✭ 131 (-37.02%)
Mutual labels:  json, serialization, deserialization
Noproto
Flexible, Fast & Compact Serialization with RPC
Stars: ✭ 138 (-33.65%)
Mutual labels:  json, serialization, deserialization
Rapidyaml
Rapid YAML - a library to parse and emit YAML, and do it fast.
Stars: ✭ 183 (-12.02%)
Mutual labels:  json, yaml, serialization
Dart Json Mapper
Serialize / Deserialize Dart Objects to / from JSON
Stars: ✭ 206 (-0.96%)
Mutual labels:  json, serialization, deserialization
Airframe
Essential Building Blocks for Scala
Stars: ✭ 442 (+112.5%)
Mutual labels:  json, msgpack, serialization
Jackson Module Kotlin
Module that adds support for serialization/deserialization of Kotlin (http://kotlinlang.org) classes and data classes.
Stars: ✭ 830 (+299.04%)
Mutual labels:  json, serialization, deserialization
Remarshal
Convert between CBOR, JSON, MessagePack, TOML, and YAML
Stars: ✭ 421 (+102.4%)
Mutual labels:  json, yaml, msgpack
Datafiles
A file-based ORM for Python dataclasses.
Stars: ✭ 113 (-45.67%)
Mutual labels:  json, yaml, serialization
dataconf
Simple dataclasses configuration management for Python with hocon/json/yaml/properties/env-vars/dict support.
Stars: ✭ 40 (-80.77%)
Mutual labels:  yaml, serialization, deserialization
Bebop
An extremely simple, fast, efficient, cross-platform serialization format
Stars: ✭ 305 (+46.63%)
Mutual labels:  json, serialization, deserialization
Borer
Efficient CBOR and JSON (de)serialization in Scala
Stars: ✭ 131 (-37.02%)
Mutual labels:  json, serialization, deserialization
Orjson
Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy
Stars: ✭ 2,595 (+1147.6%)
Mutual labels:  json, serialization, deserialization

mashumaro (マシュマロ)

mashumaro is a fast and well tested serialization framework on top of dataclasses.

Build Status Coverage Status Latest Version Python Version License

When using dataclasses, you often need to dump and load objects according to the described scheme. This framework not only adds this ability to serialize in different formats, but also makes serialization rapidly.

Table of contents

Installation

Use pip to install:

$ pip install mashumaro

Supported serialization formats

This framework adds methods for dumping to and loading from the following formats:

Plain dict can be useful when you need to pass a dict object to a third-party library, such as a client for MongoDB.

Supported field types

There is support for generic types from the standard typing module:

  • List
  • Tuple
  • Set
  • FrozenSet
  • Deque
  • Dict
  • OrderedDict
  • Mapping
  • MutableMapping
  • Counter
  • ChainMap
  • Sequence

for special primitives from the typing module:

  • Optional
  • Union
  • Any

for enumerations based on classes from the standard enum module:

  • Enum
  • IntEnum
  • Flag
  • IntFlag

for common built-in types:

  • int
  • float
  • bool
  • str
  • bytes
  • bytearray

for built-in datetime oriented types (see more details):

  • datetime
  • date
  • time
  • timedelta
  • timezone

for pathlike types:

  • PurePath
  • Path
  • PurePosixPath
  • PosixPath
  • PureWindowsPath
  • WindowsPath
  • os.PathLike

for other less popular built-in types:

  • uuid.UUID
  • decimal.Decimal
  • fractions.Fraction
  • ipaddress.IPv4Address
  • ipaddress.IPv6Address
  • ipaddress.IPv4Network
  • ipaddress.IPv6Network
  • ipaddress.IPv4Interface
  • ipaddress.IPv6Interface

for specific types like NoneType, nested dataclasses itself and even user defined classes.

Usage example

from enum import Enum
from typing import Set
from dataclasses import dataclass
from mashumaro import DataClassJSONMixin

class PetType(Enum):
    CAT = 'CAT'
    MOUSE = 'MOUSE'

@dataclass(unsafe_hash=True)
class Pet(DataClassJSONMixin):
    name: str
    age: int
    pet_type: PetType

@dataclass
class Person(DataClassJSONMixin):
    first_name: str
    second_name: str
    age: int
    pets: Set[Pet]


tom = Pet(name='Tom', age=5, pet_type=PetType.CAT)
jerry = Pet(name='Jerry', age=3, pet_type=PetType.MOUSE)
john = Person(first_name='John', second_name='Smith', age=18, pets={tom, jerry})

dump = john.to_json()
person = Person.from_json(dump)
# person == john

Pet.from_json('{"name": "Tom", "age": 5, "pet_type": "CAT"}')
# Pet(name='Tom', age=5, pet_type=<PetType.CAT: 'CAT'>)

How does it work?

This framework works by taking the schema of the data and generating a specific parser and builder for exactly that schema. This is much faster than inspection of field types on every call of parsing or building at runtime.

Benchmark

  • macOS 11.1 Big Sur
  • Apple M1
  • 16GB RAM

Load and dump sample data 1.000 times in 5 runs. The following figures show the best overall time in each case.

Framework From dict To dict
Time Slowdown factor Time Slowdown factor
mashumaro 0.04316 1x 0.02751 1x
cattrs 0.06518 1.51x 0.04856 1.77x
pydantic 0.23339 5.41x 0.11464 4.17x
marshmallow 0.24699 5.72x 0.09425 3.43x
dataclasses 0.22689 5.26x
dacite 0.91482 21.2x

To run benchmark in your environment:

git clone [email protected]:Fatal1ty/mashumaro.git
cd mashumaro
python3 -m venv env && source env/bin/activate
pip install -e .
pip install -r requirements-dev.txt
python benchmark/run.py

API

Mashumaro provides a couple of mixins for each format.

DataClassDictMixin.to_dict(use_bytes: bool, use_enum: bool, use_datetime: bool)

Make a dictionary from dataclass object based on the dataclass schema provided. Options include:

use_bytes: False     # False - convert bytes/bytearray objects to base64 encoded string, True - keep untouched
use_enum: False      # False - convert enum objects to enum values, True - keep untouched
use_datetime: False  # False - convert datetime oriented objects to ISO 8601 formatted string, True - keep untouched

DataClassDictMixin.from_dict(data: Mapping, use_bytes: bool, use_enum: bool, use_datetime: bool)

Make a new object from dict object based on the dataclass schema provided. Options include:

use_bytes: False     # False - load bytes/bytearray objects from base64 encoded string, True - keep untouched
use_enum: False      # False - load enum objects from enum values, True - keep untouched
use_datetime: False  # False - load datetime oriented objects from ISO 8601 formatted string, True - keep untouched

DataClassJSONMixin.to_json(encoder: Optional[Encoder], dict_params: Optional[Mapping], **encoder_kwargs)

Make a JSON formatted string from dataclass object based on the dataclass schema provided. Options include:

encoder        # function called for json encoding, defaults to json.dumps
dict_params    # dictionary of parameter values passed underhood to `to_dict` function
encoder_kwargs # keyword arguments for encoder function

DataClassJSONMixin.from_json(data: Union[str, bytes, bytearray], decoder: Optional[Decoder], dict_params: Optional[Mapping], **decoder_kwargs)

Make a new object from JSON formatted string based on the dataclass schema provided. Options include:

decoder        # function called for json decoding, defaults to json.loads
dict_params    # dictionary of parameter values passed underhood to `from_dict` function
decoder_kwargs # keyword arguments for decoder function

DataClassMessagePackMixin.to_msgpack(encoder: Optional[Encoder], dict_params: Optional[Mapping], **encoder_kwargs)

Make a MessagePack formatted bytes object from dataclass object based on the dataclass schema provided. Options include:

encoder        # function called for MessagePack encoding, defaults to msgpack.packb
dict_params    # dictionary of parameter values passed underhood to `to_dict` function
encoder_kwargs # keyword arguments for encoder function

DataClassMessagePackMixin.from_msgpack(data: Union[str, bytes, bytearray], decoder: Optional[Decoder], dict_params: Optional[Mapping], **decoder_kwargs)

Make a new object from MessagePack formatted data based on the dataclass schema provided. Options include:

decoder        # function called for MessagePack decoding, defaults to msgpack.unpackb
dict_params    # dictionary of parameter values passed underhood to `from_dict` function
decoder_kwargs # keyword arguments for decoder function

DataClassYAMLMixin.to_yaml(encoder: Optional[Encoder], dict_params: Optional[Mapping], **encoder_kwargs)

Make an YAML formatted bytes object from dataclass object based on the dataclass schema provided. Options include:

encoder        # function called for YAML encoding, defaults to yaml.dump
dict_params    # dictionary of parameter values passed underhood to `to_dict` function
encoder_kwargs # keyword arguments for encoder function

DataClassYAMLMixin.from_yaml(data: Union[str, bytes], decoder: Optional[Decoder], dict_params: Optional[Mapping], **decoder_kwargs)

Make a new object from YAML formatted data based on the dataclass schema provided. Options include:

decoder        # function called for YAML decoding, defaults to yaml.safe_load
dict_params    # dictionary of parameter values passed underhood to `from_dict` function
decoder_kwargs # keyword arguments for decoder function

Customization

Serializable Interface

If you already have a separate custom class, and you want to serialize instances of it with mashumaro, you can achieve this by implementing SerializableType interface:

from typing import Dict
from datetime import datetime
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.types import SerializableType

class DateTime(datetime, SerializableType):
    def _serialize(self) -> Dict[str, int]:
        return {
            "year": self.year,
            "month": self.month,
            "day": self.day,
            "hour": self.hour,
            "minute": self.minute,
            "second": self.second,
        }

    @classmethod
    def _deserialize(cls, value: Dict[str, int]) -> 'DateTime':
        return DateTime(
            year=value['year'],
            month=value['month'],
            day=value['day'],
            hour=value['hour'],
            minute=value['minute'],
            second=value['second'],
        )


@dataclass
class Holiday(DataClassDictMixin):
    when: DateTime = DateTime.now()


new_year = Holiday(when=DateTime(2019, 1, 1, 12))
dictionary = new_year.to_dict()
# {'x': {'year': 2019, 'month': 1, 'day': 1, 'hour': 0, 'minute': 0, 'second': 0}}
assert Holiday.from_dict(dictionary) == new_year

Field options

In some cases creating a new class just for one little thing could be excessive. Moreover, you may need to deal with third party classes that you are not allowed to change. You can usedataclasses.field function as a default field value to configure some serialization aspects through its metadata parameter. Next section describes all supported options to use in metadata mapping.

serialize option

This option allows you to change the serialization method through a value of type Callable[[Any], Any] that could be any callable object like a function, a class method, a class instance method, an instance of a callable class or even a lambda function.

Example:

@dataclass
class A(DataClassDictMixin):
    dt: datetime = field(
        metadata={
            "serialize": lambda v: v.strftime('%Y-%m-%d %H:%M:%S')
        }
    )

deserialize option

This option allows you to change the deserialization method. When using this option, the deserialization behaviour depends on what type of value the option has. It could be either Callable[[Any], Any] or str.

A value of type Callable[[Any], Any] is a generic way to specify any callable object like a function, a class method, a class instance method, an instance of a callable class or even a lambda function to be called for deserialization.

A value of type str sets a specific engine for deserialization. Keep in mind that all possible engines depend on the field type that this option is used with. At this moment there are next deserialization engines to choose from:

Applicable field types Supported engines Description
datetime, date, time ciso8601, pendulum How to parse datetime string. By default native fromisoformat of corresponding class will be used for datetime, date and time fields. It's the fastest way in most cases, but you can choose an alternative.

Example:

from datetime import datetime
from dataclasses import dataclass, field
from typing import List
from mashumaro import DataClassDictMixin
import ciso8601
import dateutil

@dataclass
class A(DataClassDictMixin):
    x: datetime = field(
        metadata={"deserialize": "pendulum"}
    )

class B(DataClassDictMixin):
    x: datetime = field(
        metadata={"deserialize": ciso8601.parse_datetime_as_naive}
    )

@dataclass
class C(DataClassDictMixin):
    dt: List[datetime] = field(
        metadata={
            "deserialize": lambda l: list(map(dateutil.parser.isoparse, l))
        }
    )

serialization_strategy option

This option is useful when you want to change the serialization behaviour for a class depending on some defined parameters. For this case you can create the special class implementing SerializationStrategy interface:

from dataclasses import dataclass, field
from datetime import datetime
from mashumaro import DataClassDictMixin
from mashumaro.types import SerializationStrategy

class FormattedDateTime(SerializationStrategy):
    def __init__(self, fmt):
        self.fmt = fmt

    def serialize(self, value: datetime) -> str:
        return value.strftime(self.fmt)

    def deserialize(self, value: str) -> datetime:
        return datetime.strptime(value, self.fmt)

@dataclass
class DateTimeFormats(DataClassDictMixin):
    short: datetime = field(
        metadata={
            "serialization_strategy": FormattedDateTime(
                fmt="%d%m%Y%H%M%S",
            )
        }
    )
    verbose: datetime = field(
        metadata={
            "serialization_strategy": FormattedDateTime(
                fmt="%A %B %d, %Y, %H:%M:%S",
            )
        }
    )

formats = DateTimeFormats(
    short=datetime(2019, 1, 1, 12),
    verbose=datetime(2019, 1, 1, 12),
)
dictionary = formats.to_dict()
# {'short': '01012019120000', 'verbose': 'Tuesday January 01, 2019, 12:00:00'}
assert DateTimeFormats.from_dict(dictionary) == formats

If you don't want to remember the names of the options you can use field_options helper function:

from dataclasses import dataclass, field
from mashumaro import DataClassDictMixin, field_options

@dataclass
class A(DataClassDictMixin):
    x: int = field(
        metadata=field_options(
            serialize=str,
            deserialize=int,
            ...
        )
    )

More options are on the way. If you know which option would be useful for many, please don't hesitate to create an issue or pull request.

Config options

If inheritance is not an empty word for you, you'll fall in love with the Config class. You can register serialize and deserialize methods, define code generation options and other things just in one place. Or in some classes in different ways if you need flexibility. Inheritance is always on the first place.

There is a base class BaseConfig that you can inherit for the sake of convenience, but it's not mandatory.

In the following example you can see how the debug flag is changed from class to class: ModelA will have debug mode enabled but ModelB will not.

from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig

class BaseModel(DataClassDictMixin):
    class Config(BaseConfig):
        debug = True

class ModelA(BaseModel):
    a: int

class ModelB(BaseModel):
    b: int

    class Config(BaseConfig):
        debug = False

Next section describes all supported options to use in the config.

debug config option

If you enable the debug option the generated code for your data class will be printed.

omit_none config option

If you want to skip None values on serialization you can add omit_none parameter to to_dict method using the code_generation_options list:

from dataclasses import dataclass
from typing import Optional
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig, TO_DICT_ADD_OMIT_NONE_FLAG

@dataclass
class Inner(DataClassDictMixin):
    x: int = None
    # "x" won't be omitted since there is no TO_DICT_ADD_OMIT_NONE_FLAG here

@dataclass
class Model(DataClassDictMixin):
    x: Inner
    a: int = None
    b: str = None  # will be omitted

    class Config(BaseConfig):
        code_generation_options = [TO_DICT_ADD_OMIT_NONE_FLAG]

Model(x=Inner(), a=1).to_dict(omit_none=True)  # {'x': {'x': None}, 'a': 1}

serialization_strategy config option

You can register custom SerializationStrategy, serialize and deserialize methods for specific types just in one place. It could be configured using a dictionary with types as keys. The value could be either a SerializationStrategy instance or a dictionary with serialize and deserialize values with the same meaning as in the field options.

from dataclasses import dataclass
from datetime import datetime, date
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
from mashumaro.types import SerializationStrategy

class FormattedDateTime(SerializationStrategy):
    def __init__(self, fmt):
        self.fmt = fmt

    def serialize(self, value: datetime) -> str:
        return value.strftime(self.fmt)

    def deserialize(self, value: str) -> datetime:
        return datetime.strptime(value, self.fmt)

@dataclass
class DataClass(DataClassDictMixin):

    datetime: datetime
    date: date

    class Config(BaseConfig):
        serialization_strategy = {
            datetime: FormattedDateTime("%Y"),
            date: {
                # you can use specific str values for datetime here as well
                "deserialize": "pendulum",
                "serialize": date.isoformat,
            },
        }

instance = DataClass.from_dict({"datetime": "2021", "date": "2021"})
# DataClass(datetime=datetime.datetime(2021, 1, 1, 0, 0), date=Date(2021, 1, 1))
dictionary = instance.to_dict()
# {'datetime': '2021', 'date': '2021-01-01'}

Serialization hooks

In some cases you need to prepare input / output data or do some extraordinary actions at different stages of the deserialization / serialization lifecycle. You can do this with different types of hooks.

Before deserialization

For doing something with a dictionary that will be passed to deserialization you can use __pre_deserialize__ class method:

@dataclass
class A(DataClassJSONMixin):
    abc: int

    @classmethod
    def __pre_deserialize__(cls, d: Dict[Any, Any]) -> Dict[Any, Any]:
        return {k.lower(): v for k, v in d.items()}

print(DataClass.from_dict({"ABC": 123}))    # DataClass(abc=123)
print(DataClass.from_json('{"ABC": 123}'))  # DataClass(abc=123)

After deserialization

For doing something with a dataclass instance that was created as a result of deserialization you can use __post_deserialize__ class method:

@dataclass
class A(DataClassJSONMixin):
    abc: int

    @classmethod
    def __post_deserialize__(cls, obj: 'A') -> 'A':
        obj.abc = 456
        return obj

print(DataClass.from_dict({"abc": 123}))    # DataClass(abc=456)
print(DataClass.from_json('{"abc": 123}'))  # DataClass(abc=456)

Before serialization

For doing something before serialization you can use __pre_serialize__ method:

@dataclass
class A(DataClassJSONMixin):
    abc: int
    counter: ClassVar[int] = 0

    def __pre_serialize__(self) -> 'A':
        self.counter += 1
        return self

obj = DataClass(abc=123)
obj.to_dict()
obj.to_json()
print(obj.counter)  # 2

After serialization

For doing something with a dictionary that was created as a result of serialization you can use __post_serialize__ method:

@dataclass
class A(DataClassJSONMixin):
    user: str
    password: str

    def __post_serialize__(self, d: Dict[Any, Any]) -> Dict[Any, Any]:
        d.pop('password')
        return d

obj = DataClass(user="name", password="secret")
print(obj.to_dict())  # {"user": "name"}
print(obj.to_json())  # '{"user": "name"}'

TODO

  • add optional validation
  • write custom useful types such as URL, Email etc
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].