coqui-ai / coqpit

Licence: MIT license
Simple but maybe too simple config management through python data classes. We use it for machine learning.

👩‍✈️ Coqpit

Simple, lightweight, dependency-free config handling through Python dataclasses, with serialization to and deserialization from JSON.

It is currently used by 🐸TTS.

Why I need this

What I need from an ML configuration library...

  1. Fixing a general config schema in Python to guide users about expected values.

    Python is good but not universal. Sometimes you train an ML model and use it on a different platform, so your model configuration file needs to be importable by other programming languages.

  2. Simple dynamic value and type checking with default values.

    If you are a beginner in an ML project, it is hard to guess the right values for your ML experiment. Therefore it is important to have some default values and to know what range and type of input are expected for each field.

  3. Ability to decompose large configs.

    As you define more fields for the training dataset, data preprocessing, model parameters, etc., your config file tends to get quite large. In most cases it can be decomposed into smaller configs, which improves flexibility and readability.

  4. Inheritance and nested configurations.

    Simply helps to keep configurations consistent and easier to maintain.

  5. Ability to override values from the command line when necessary.

    For instance, you might need to define a path for your dataset, and this changes for almost every run. Then the user should be able to override this value easily over the command line.

    It also allows easy hyper-parameter search without changing your original code. Basically, you can run different models with different parameters just using command line arguments.

  6. Defining dynamic or conditional config values.

    Sometimes you need to define certain values depending on other values. Using Python helps to define the underlying logic for such config values.

  7. No dependencies

    You don't want to install a ton of libraries just for configuration management. If you do install one, it is better for it to be plain native Python.
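
As a rough illustration of several of the points above (a fixed schema with defaults, simple checks, nesting, JSON export, and command-line overrides), here is a minimal sketch that uses only the standard library. It is not how Coqpit is implemented; `TrainConfig`, `AudioConfig`, `from_cli`, `to_json` and `from_json` are hypothetical names made up for this example.

```python
import argparse
import json
from dataclasses import asdict, dataclass, field, fields


@dataclass
class AudioConfig:  # hypothetical nested section
    sample_rate: int = 22050


@dataclass
class TrainConfig:  # hypothetical top-level schema with defaults
    batch_size: int = 32
    lr: float = 1e-3
    dataset_path: str = "data/"
    audio: AudioConfig = field(default_factory=AudioConfig)

    def __post_init__(self):
        # simple type and range checks, run whenever a config is constructed
        if not isinstance(self.batch_size, int):
            raise TypeError("batch_size must be an int")
        if not 1 <= self.batch_size <= 1024:
            raise ValueError("batch_size must be in [1, 1024]")


def from_cli(argv):
    """Build a TrainConfig, overriding scalar defaults from command-line flags."""
    parser = argparse.ArgumentParser()
    for f in fields(TrainConfig):
        if f.type in (int, float, str):  # nested sections are skipped in this sketch
            parser.add_argument(f"--{f.name}", type=f.type, default=f.default)
    return TrainConfig(**vars(parser.parse_args(argv)))


def to_json(config):
    """Export the config as JSON so other languages can read it."""
    return json.dumps(asdict(config), indent=2)


def from_json(text):
    """Rebuild a TrainConfig, including its nested section, from JSON text."""
    data = json.loads(text)
    data["audio"] = AudioConfig(**data["audio"])
    return TrainConfig(**data)
```

Calling `from_cli(["--dataset_path", "/tmp/run1", "--lr", "0.01"])` overrides two fields and keeps the remaining defaults, and `from_json(to_json(config))` gives back an equal config.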

🚫 Limitations

  • Union type dataclass fields cannot be parsed from console arguments due to the type ambiguity.
  • JSON is the only supported serialization format, although the others can be easily integrated.
  • List types with multiple item type annotations are not supported (e.g. List[int, str]).
  • dict fields are parsed from console arguments as JSON strings without type checking (e.g. --val_dict '{"a":10, "b":100}').
  • MISSING fields cannot be avoided when parsing console arguments.
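
To make the Union and dict limitations more concrete, here is a small sketch with plain `argparse` from the standard library. It only illustrates the underlying problem and is not Coqpit's own parsing code; the `--val_union` flag is a hypothetical name used for illustration.

```python
import argparse
import json

parser = argparse.ArgumentParser()

# A dict-valued flag: the raw string is decoded as JSON and accepted as-is,
# so nothing checks the types of the individual keys or values.
parser.add_argument("--val_dict", type=json.loads, default={})

# A field annotated Union[str, int] has no single converter: should "10"
# become the int 10 or stay the string "10"? This sketch keeps the raw
# string, which is just one arbitrary choice.
parser.add_argument("--val_union", type=str, default=None)

args = parser.parse_args(["--val_dict", '{"a": 10, "b": 100}', "--val_union", "10"])
```

After parsing, `args.val_dict` is an ordinary `dict` and `args.val_union` is still a string, because nothing on the command line says which member of the union was intended.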

🔍 Examples

👉 Simple Coqpit

import os
from dataclasses import asdict, dataclass, field
from typing import List, Union
from coqpit import MISSING, Coqpit, check_argument


@dataclass
class SimpleConfig(Coqpit):
    val_a: int = 10
    val_b: int = None
    val_d: float = 10.21
    val_c: str = "Coqpit is great!"
    # mandatory field
    # accessing it raises an error until a value is assigned
    val_k: int = MISSING
    # optional field
    val_dict: dict = field(default_factory=lambda: {"val_aa": 10, "val_ss": "This is in a dict."})
    # list of list
    val_listoflist: List[List] = field(default_factory=lambda: [[1, 2], [3, 4]])
    val_listofunion: List[List[Union[str,int]]] = field(default_factory=lambda: [[1, 3], [1, "Hi!"]])

    # you can define explicit constraints on the fields using `check_argument()`
    def check_values(self):
        """Check config fields"""
        c = asdict(self)
        check_argument("val_a", c, restricted=True, min_val=10, max_val=2056)
        check_argument("val_b", c, restricted=True, min_val=128, max_val=4058, allow_none=True)
        check_argument("val_c", c, restricted=True)


if __name__ == "__main__":
    file_path = os.path.dirname(os.path.abspath(__file__))
    config = SimpleConfig()

    # try MISSING class argument
    try:
        k = config.val_k
    except AttributeError:
        print(" val_k needs a different value before accessing it.")
    config.val_k = 1000

    # try serialization and deserialization
    print(config.serialize())
    print(config.to_json())
    config.save_json(os.path.join(file_path, "example_config.json"))
    config.load_json(os.path.join(file_path, "example_config.json"))
    # pprint() prints the config itself, so it does not need to be wrapped in print()
    config.pprint()

    # try `dict` interface
    print(*config)
    print(dict(**config))

    # value assignment by mapping
    config["val_a"] = -999
    print(config["val_a"])
    assert config.val_a == -999

👉 Serialization

import os
from dataclasses import asdict, dataclass, field
from coqpit import Coqpit, check_argument
from typing import List, Union


@dataclass
class SimpleConfig(Coqpit):
    val_a: int = 10
    val_b: int = None
    val_c: str = "Coqpit is great!"

    def check_values(self,):
        '''Check config fields'''
        c = asdict(self)
        check_argument('val_a', c, restricted=True, min_val=10, max_val=2056)
        check_argument('val_b', c, restricted=True, min_val=128, max_val=4058, allow_none=True)
        check_argument('val_c', c, restricted=True)


@dataclass
class NestedConfig(Coqpit):
    val_d: int = 10
    val_e: int = None
    val_f: str = "Coqpit is great!"
    sc_list: List[SimpleConfig] = None
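    # default_factory avoids the mutable-default error that newer Python versions raise for dataclass instances
    sc: SimpleConfig = field(default_factory=lambda: SimpleConfig())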
    union_var: Union[List[SimpleConfig], SimpleConfig] = field(default_factory=lambda: [SimpleConfig(),SimpleConfig()])

    def check_values(self,):
        '''Check config fields'''
        c = asdict(self)
        check_argument('val_d', c, restricted=True, min_val=10, max_val=2056)
        check_argument('val_e', c, restricted=True, min_val=128, max_val=4058, allow_none=True)
        check_argument('val_f', c, restricted=True)
        check_argument('sc_list', c, restricted=True, allow_none=True)
        check_argument('sc', c, restricted=True, allow_none=True)


if __name__ == '__main__':
    file_path = os.path.dirname(os.path.abspath(__file__))
    # init 🐸 dataclass
    config = NestedConfig()

    # save to a json file
    config.save_json(os.path.join(file_path, 'example_config.json'))
    # load a json file
    config2 = NestedConfig(val_d=None, val_e=500, val_f=None, sc_list=None, sc=None, union_var=None)
    # update the config with the json file.
    config2.load_json(os.path.join(file_path, 'example_config.json'))
    # now they should have the same values.
    assert config == config2

    # pretty print the dataclass
    config.pprint()

    # export values to a dict
    config_dict = config.to_dict()
    # create a new config with different values than the defaults
    config2 = NestedConfig(val_d=None, val_e=500, val_f=None, sc_list=None, sc=None, union_var=None)
    # update the config with the exported values from the previous config.
    config2.from_dict(config_dict)
    # now they should have the same values.
    assert config == config2

👉 argparse handling and parsing.

import argparse
import os
from dataclasses import asdict, dataclass, field
from typing import List

from coqpit import Coqpit, check_argument
import sys


@dataclass
class SimplerConfig(Coqpit):
    val_a: int = field(default=None, metadata={'help': 'this is val_a'})


@dataclass
class SimpleConfig(Coqpit):
    val_req: str # required field
    val_a: int = field(default=10,
                       metadata={'help': 'this is val_a of SimpleConfig'})
    val_b: int = field(default=None, metadata={'help': 'this is val_b'})
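    # default_factory avoids the mutable-default error that newer Python versions raise for dataclass instances
    nested_config: SimplerConfig = field(default_factory=lambda: SimplerConfig())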
    mylist_with_default: List[SimplerConfig] = field(
        default_factory=lambda:
        [SimplerConfig(val_a=100),
         SimplerConfig(val_a=999)],
        metadata={'help': 'list of SimplerConfig'})

    # mylist_without_default: List[SimplerConfig] = field(default=None, metadata={'help': 'list of SimplerConfig'})  # NOT SUPPORTED YET!

    def check_values(self, ):
        '''Check config fields'''
        c = asdict(self)
        check_argument('val_a', c, restricted=True, min_val=10, max_val=2056)
        check_argument('val_b',
                       c,
                       restricted=True,
                       min_val=128,
                       max_val=4058,
                       allow_none=True)
        check_argument('val_req', c, restricted=True)


def main():
    # reference config that we like to match with the one parsed from argparse
    config_ref = SimpleConfig(val_req='this is different',
                              val_a=222,
                              val_b=999,
                              nested_config=SimplerConfig(val_a=333),
                              mylist_with_default=[
                                  SimplerConfig(val_a=222),
                                  SimplerConfig(val_a=111)
                              ])

    # create new config object from CLI inputs
    parsed = SimpleConfig.init_from_argparse()
    parsed.pprint()

    # check the parsed config with the reference config
    assert parsed == config_ref


if __name__ == '__main__':
    sys.argv.extend(['--coqpit.val_req', 'this is different'])
    sys.argv.extend(['--coqpit.val_a', '222'])
    sys.argv.extend(['--coqpit.val_b', '999'])
    sys.argv.extend(['--coqpit.nested_config.val_a', '333'])
    sys.argv.extend(['--coqpit.mylist_with_default.0.val_a', '222'])
    sys.argv.extend(['--coqpit.mylist_with_default.1.val_a', '111'])
    main()
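
The dotted flags used above, such as `--coqpit.nested_config.val_a`, address fields inside nested configs. As a rough idea of how such flags could be mapped onto nested dataclasses, here is a standard-library sketch. It is not Coqpit's implementation: `Inner`, `Outer`, `add_flags`, `apply_flags`, `parse` and the `cfg` prefix are hypothetical names, and only leaves with non-None scalar defaults are handled.

```python
import argparse
from dataclasses import dataclass, field, fields, is_dataclass


@dataclass
class Inner:  # hypothetical nested config
    val_a: int = 1


@dataclass
class Outer:  # hypothetical top-level config
    val_b: int = 2
    nested_config: Inner = field(default_factory=Inner)


def add_flags(parser, config, prefix="cfg"):
    """Register one `--prefix.path.to.field` flag per scalar leaf of `config`."""
    for f in fields(config):
        value = getattr(config, f.name)
        name = f"{prefix}.{f.name}"
        if is_dataclass(value):
            # recurse into nested configs, extending the dotted path
            add_flags(parser, value, name)
        else:
            # the current value supplies both the converter and the default
            parser.add_argument(f"--{name}", dest=name, type=type(value), default=value)


def apply_flags(config, args, prefix="cfg"):
    """Copy parsed flag values back onto the nested dataclass instances."""
    for f in fields(config):
        value = getattr(config, f.name)
        name = f"{prefix}.{f.name}"
        if is_dataclass(value):
            apply_flags(value, args, name)
        else:
            setattr(config, f.name, getattr(args, name))
    return config


def parse(argv):
    """Create a default Outer config and override it from `argv`."""
    config = Outer()
    parser = argparse.ArgumentParser()
    add_flags(parser, config)
    return apply_flags(config, parser.parse_args(argv))
```

For example, `parse(["--cfg.nested_config.val_a", "333"])` changes only the nested field and leaves `val_b` at its default.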

🤸‍♀️ Merging coqpits

import os
from dataclasses import dataclass
from coqpit import Coqpit, check_argument


@dataclass
class CoqpitA(Coqpit):
    val_a: int = 10
    val_b: int = None
    val_d: float = 10.21
    val_c: str = "Coqpit is great!"


@dataclass
class CoqpitB(Coqpit):
    val_d: int = 25
    val_e: int = 257
    val_f: float = -10.21
    val_g: str = "Coqpit is really great!"


if __name__ == '__main__':
    file_path = os.path.dirname(os.path.abspath(__file__))
    coqpita = CoqpitA()
    coqpitb = CoqpitB()
    coqpitb.merge(coqpita)
    print(coqpitb.val_a)
    # pprint() prints the config itself, so it does not need to be wrapped in print()
    coqpitb.pprint()

Development

Install the pre-commit hook to automatically check your commits for style and hinting issues:

$ python .pre-commit-2.12.1.pyz install
