
networktocode / Diffsync

Licence: other
A utility library for comparing and synchronizing different datasets.

Programming Languages

python

Projects that are alternatives to or similar to Diffsync

Crossbeam
Tools for concurrent programming in Rust
Stars: ✭ 4,180 (+15976.92%)
Mutual labels:  synchronization
Hive2hive
Java library for secure, distributed, P2P-based file synchronization and sharing.
Stars: ✭ 483 (+1757.69%)
Mutual labels:  synchronization
Monstache
a go daemon that syncs MongoDB to Elasticsearch in realtime
Stars: ✭ 736 (+2730.77%)
Mutual labels:  synchronization
Subsync
Subtitle Speech Synchronizer
Stars: ✭ 379 (+1357.69%)
Mutual labels:  synchronization
Symmetric Ds
SymmetricDS is a database and file synchronization solution that is platform-independent, web-enabled, and database agnostic. SymmetricDS was built to make data replication across two to tens of thousands of databases and file systems fast, easy and resilient. We specialize in near real time, bi-directional data replication across large node networks over the WAN or LAN.
Stars: ✭ 450 (+1630.77%)
Mutual labels:  synchronization
Ftp Deployment
A tool for automated deployment of web applications to an FTP server.
Stars: ✭ 527 (+1926.92%)
Mutual labels:  synchronization
Drwmutex
Distributed RWMutex in Go
Stars: ✭ 308 (+1084.62%)
Mutual labels:  synchronization
Casync
Content-Addressable Data Synchronization Tool
Stars: ✭ 890 (+3323.08%)
Mutual labels:  synchronization
Soundsync
Virtual cables between any audio source and any speaker in your home
Stars: ✭ 453 (+1642.31%)
Mutual labels:  synchronization
Osync
A robust two way (bidirectional) file sync script based on rsync with fault tolerance, POSIX ACL support, time control and near realtime sync
Stars: ✭ 677 (+2503.85%)
Mutual labels:  synchronization
Dsync
IDAPython plugin that synchronizes disassembler and decompiler views
Stars: ✭ 399 (+1434.62%)
Mutual labels:  synchronization
Canal mysql elasticsearch sync
A real-time MySQL-to-Elasticsearch synchronization solution based on Canal, supporting both incremental and full synchronization
Stars: ✭ 410 (+1476.92%)
Mutual labels:  synchronization
Ffsubsync
Automagically synchronize subtitles with video.
Stars: ✭ 5,167 (+19773.08%)
Mutual labels:  synchronization
Bus
Efficient, lock-free, bounded Rust broadcast channel
Stars: ✭ 368 (+1315.38%)
Mutual labels:  synchronization
Server
The Etebase server (so you can run your own)
Stars: ✭ 826 (+3076.92%)
Mutual labels:  synchronization
Telepat Api
This is the Telepat API where HTTP calls are made. CRUD operations are not processed here directly. Messages are sent to the Telepat workers where CRUD operations are being taken care of along with client communication (notifications).
Stars: ✭ 335 (+1188.46%)
Mutual labels:  synchronization
Mt Aws Glacier
Perl Multithreaded Multipart sync to Amazon Glacier
Stars: ✭ 522 (+1907.69%)
Mutual labels:  synchronization
Rsm
distributed computing toolkit in rust
Stars: ✭ 17 (-34.62%)
Mutual labels:  synchronization
Ipfs Sync
Live IPFS directory synchronization.
Stars: ✭ 16 (-38.46%)
Mutual labels:  synchronization
Outlookcaldavsynchronizer
Sync Outlook with Google, SOGo, Nextcloud or any other CalDAV/CardDAV server
Stars: ✭ 560 (+2053.85%)
Mutual labels:  synchronization

DiffSync

DiffSync is a utility library that can be used to compare and synchronize different datasets.

For example, it can be used to compare a list of devices from two inventory systems and, if required, synchronize them in either direction.

A = DiffSyncSystemA()
B = DiffSyncSystemB()

A.load()
B.load()

# Show the difference between both systems, that is, what would change if we applied changes from System B to System A
diff_a_b = A.diff_from(B)
print(diff_a_b.str())

# Update System A to align with the current status of System B
A.sync_from(B)

# Update System B to align with the current status of System A
A.sync_to(B)
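The direction of these calls is easy to mix up. As a minimal pure-Python sketch (not DiffSync's implementation), with each system flattened into a hypothetical dict mapping a unique id to its attributes: `diff_from()` reports what the target would have to create, update, and delete to match the source, `sync_from()` applies those changes to the target, and `sync_to()` is the same operation in the opposite direction.

```python
from typing import Dict, List

# Hypothetical flat view of a system: unique id -> attributes.
Records = Dict[str, Dict[str, str]]


def diff_from(target: Records, source: Records) -> Dict[str, List[str]]:
    """Return the ids that would change in `target` if it were aligned with `source`."""
    return {
        "create": sorted(uid for uid in source if uid not in target),
        "update": sorted(uid for uid in source if uid in target and target[uid] != source[uid]),
        "delete": sorted(uid for uid in target if uid not in source),
    }


def sync_from(target: Records, source: Records) -> None:
    """Apply those changes to `target` in place; `source` is left untouched."""
    changes = diff_from(target, source)
    for uid in changes["create"] + changes["update"]:
        target[uid] = dict(source[uid])
    for uid in changes["delete"]:
        del target[uid]


def sync_to(source: Records, target: Records) -> None:
    """Push `source` onto `target`: the mirror image of sync_from()."""
    sync_from(target, source)


system_a = {"nyc": {"contact_phone": "555-0100"}, "sfo": {"contact_phone": "555-0199"}}
system_b = {"nyc": {"contact_phone": "555-0111"}, "atl": {"contact_phone": "555-0123"}}

print(diff_from(system_a, system_b))
# {'create': ['atl'], 'update': ['nyc'], 'delete': ['sfo']}
sync_from(system_a, system_b)
print(system_a == system_b)  # True
```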

Getting started

To be able to properly compare different datasets, DiffSync relies on a shared data model that both systems must use. Specifically, each system or dataset must provide a DiffSync "adapter" subclass, which in turn represents its dataset as instances of one or more DiffSyncModel data model classes.

When comparing two systems, DiffSync detects the intersection between the two systems (which data models they have in common, and which attributes are shared between each pair of data models) and uses this intersection to compare and/or synchronize the data.
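The notion of an intersection can be pictured with a small pure-Python sketch. The schemas below are hypothetical and this is not DiffSync's internal code: "region", "serial", and the "cable" model exist on only one side, so they would simply be left out of any comparison.

```python
from typing import Dict, Set

# Hypothetical schemas: model name -> attribute names each system can provide.
system_a_schema: Dict[str, Set[str]] = {
    "site": {"contact_phone", "region"},
    "device": {"role", "serial"},
}
system_b_schema: Dict[str, Set[str]] = {
    "site": {"contact_phone"},
    "device": {"role"},
    "cable": {"length"},
}


def common_schema(a: Dict[str, Set[str]], b: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Keep only the models both systems define, restricted to their shared attributes."""
    return {model: a[model] & b[model] for model in a.keys() & b.keys()}


shared = common_schema(system_a_schema, system_b_schema)
for model in sorted(shared):
    print(model, sorted(shared[model]))
# device ['role']
# site ['contact_phone']
```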

Define your model with DiffSyncModel

DiffSyncModel is based on Pydantic and uses Python type hints to define the format of each attribute. Each DiffSyncModel subclass supports the following class-level attributes:

  • _modelname - Defines the type of the model; used to identify common models between different systems (Mandatory)
  • _identifiers - List of instance field names used as primary keys for this object (Mandatory)
  • _shortname - List of instance field names to use for a shorter name (Optional)
  • _attributes - List of non-identifier instance field names for this object; used to identify the fields in common between data models for different systems (Optional)
  • _children - Dict of {<model_name>: <field_name>} indicating which fields store references to child data model instances. (Optional)

DiffSyncModel instances must be uniquely identified by their unique id, which is composed of all fields defined in _identifiers. The unique id must be globally meaningful (such as a unique instance name or slug), as it is used to match corresponding objects across different systems or datasets. It must not be a value that is only locally meaningful, such as a database primary key integer value.

Only fields listed in _identifiers, _attributes, or _children will be potentially included in comparison and synchronization between systems or data sets. Any other fields will be ignored; this allows for a model to additionally contain fields that are only locally relevant (such as database primary key values) and therefore are irrelevant to comparisons.

from typing import List, Optional
from diffsync import DiffSyncModel

class Site(DiffSyncModel):
    _modelname = "site"
    _identifiers = ("name",)
    _shortname = ()
    _attributes = ("contact_phone",)
    _children = {"device": "devices"}

    name: str
    contact_phone: Optional[str] = None
    devices: List = list()
    database_pk: Optional[int] = None  # not listed in _identifiers/_attributes/_children as it's only locally significant
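The two rules above (the id comes only from _identifiers, and only declared fields are compared) can be illustrated without DiffSync itself. In this minimal sketch a plain dict stands in for a Site instance; `unique_id()` and `comparable_fields()` are hypothetical helpers, and the `"__"` separator used when several identifiers are joined is an assumption.

```python
from typing import Any, Dict, Tuple

# Class-level declarations mirrored from the Site model above.
IDENTIFIERS: Tuple[str, ...] = ("name",)
ATTRIBUTES: Tuple[str, ...] = ("contact_phone",)
CHILDREN: Dict[str, str] = {"device": "devices"}


def unique_id(record: Dict[str, Any]) -> str:
    """Join identifier values in declaration order (the "__" separator is an assumption)."""
    return "__".join(str(record[key]) for key in IDENTIFIERS)


def comparable_fields(record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only fields declared in the identifiers, attributes, or children."""
    keep = set(IDENTIFIERS) | set(ATTRIBUTES) | set(CHILDREN.values())
    return {key: value for key, value in record.items() if key in keep}


site_in_a = {"name": "nyc", "contact_phone": "555-0100", "devices": [], "database_pk": 7}
site_in_b = {"name": "nyc", "contact_phone": "555-0100", "devices": [], "database_pk": 1031}

print(unique_id(site_in_a))  # nyc
# The raw records differ only in database_pk, so their comparable views match.
print(site_in_a == site_in_b, comparable_fields(site_in_a) == comparable_fields(site_in_b))  # False True
```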

Relationship between models

Currently the relationships between models are very loose by design. Instead of storing an object, it's recommended to store the unique id of an object and retrieve it from the store as needed. The add_child() API of DiffSyncModel provides this behavior as a default.
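A minimal sketch of this loose coupling, assuming a plain dict as the store: the parent keeps only its children's unique ids, and the child objects are looked up from the store when needed. `SiteRecord`, `DeviceRecord`, the `store` layout, and `devices_of()` are hypothetical stand-ins, not DiffSync classes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class DeviceRecord:
    name: str
    role: str


@dataclass
class SiteRecord:
    name: str
    devices: List[str] = field(default_factory=list)  # child unique ids, not objects


# Hypothetical store keyed by (model name, unique id).
store: Dict[Tuple[str, str], Any] = {}

nyc = SiteRecord(name="nyc")
rtr = DeviceRecord(name="rtr-nyc", role="router")
store[("site", nyc.name)] = nyc
store[("device", rtr.name)] = rtr
nyc.devices.append(rtr.name)  # the relationship is recorded by id only


def devices_of(site: SiteRecord) -> List[DeviceRecord]:
    """Resolve the stored ids back into objects at lookup time."""
    return [store[("device", uid)] for uid in site.devices]


print([device.role for device in devices_of(nyc)])  # ['router']
```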

Define your system adapter with DiffSync

A DiffSync "adapter" subclass must reference each model it uses as a class attribute named after that model's _modelname, and must define a top_level attribute listing the model(s) from which the diff and the synchronization should start. In the example below, "site" is the only top-level model, so the synchronization engine will check all known Site instances and all children of each Site. As shown in the Site model above, Devices are children of Sites, so this is exactly the intended logic.

from diffsync import DiffSync

class BackendA(DiffSync):

    # Each model class is registered under its _modelname
    site = Site
    device = Device

    top_level = ["site"]
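The effect of top_level can be pictured as a depth-first walk: start from every instance of each model named in top_level and recurse into its children. The `models` and `children` dicts and the `walk()` function below are hypothetical stand-ins for the adapter's cache, not DiffSync's internals. The devices are reached only through their site, even though "device" is not listed in top_level.

```python
from typing import Dict, Iterator, List, Tuple

top_level: List[str] = ["site"]

# Hypothetical cache: model name -> unique ids of the stored instances.
models: Dict[str, List[str]] = {"site": ["nyc", "sfo"], "device": ["rtr-nyc", "sw-nyc", "rtr-sfo"]}

# Hypothetical parent -> children links, as recorded by add_child().
children: Dict[Tuple[str, str], List[Tuple[str, str]]] = {
    ("site", "nyc"): [("device", "rtr-nyc"), ("device", "sw-nyc")],
    ("site", "sfo"): [("device", "rtr-sfo")],
}


def walk(model: str, uid: str, depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Yield an object, then recurse into its children (depth-first)."""
    yield depth, model, uid
    for child_model, child_uid in children.get((model, uid), []):
        yield from walk(child_model, child_uid, depth + 1)


for name in top_level:
    for uid in models[name]:
        for depth, model, obj in walk(name, uid):
            print("  " * depth + f"{model} {obj}")
# Prints:
# site nyc
#   device rtr-nyc
#   device sw-nyc
# site sfo
#   device rtr-sfo
```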

It's up to the implementer to populate the adapter's internal cache with the appropriate data. In the example below, we use the load() method to populate the cache, but this is not mandatory; it could be done differently.

Store data in a DiffSync object

To add a site to the local cache/store, pass a valid DiffSyncModel object to the add() method.

class BackendA(DiffSync):
    [...]

    def load(self):
        # Store an individual object
        site = self.site(name="nyc")
        self.add(site)

        # Store an object and define it as a child of another object
        device = self.device(name="rtr-nyc", role="router", site_name="nyc")
        self.add(device)
        site.add_child(device)
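Conceptually, add() files each object under its model name and unique id. The `TinyStore` class below is a hypothetical dict-based stand-in, not DiffSync's store; this sketch rejects a second object with the same model name and unique id, which is one reason the unique id must really be unique per model.

```python
from typing import Dict


class ObjectAlreadyStored(Exception):
    """Raised by this sketch when the same (model, unique id) is added twice."""


class TinyStore:
    """Hypothetical in-memory store keyed by model name, then unique id."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, dict]] = {}

    def add(self, modelname: str, unique_id: str, obj: dict) -> None:
        bucket = self._data.setdefault(modelname, {})
        if unique_id in bucket:
            raise ObjectAlreadyStored(f"{modelname} {unique_id!r} already stored")
        bucket[unique_id] = obj

    def get(self, modelname: str, unique_id: str) -> dict:
        return self._data[modelname][unique_id]


store = TinyStore()
store.add("site", "nyc", {"name": "nyc"})
store.add("device", "rtr-nyc", {"name": "rtr-nyc", "role": "router", "site_name": "nyc"})
try:
    store.add("site", "nyc", {"name": "nyc"})
except ObjectAlreadyStored as exc:
    print(exc)  # site 'nyc' already stored
```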

Update remote system on sync

When data synchronization is performed via sync_from() or sync_to(), DiffSync automatically updates the in-memory DiffSyncModel objects of the receiving adapter. The implementer of this class is responsible for ensuring that any remote system or data store is updated correspondingly. There are two usual ways to do this, depending on whether it's more convenient to manage individual records (as in a database) or modify the entire data store in one pass (as in a file-based data store).

Manage individual records

To update individual records in a remote system, you need to extend your DiffSyncModel class(es) to define your own create, update and/or delete methods for each model. A DiffSyncModel instance stores a reference to its parent DiffSync adapter instance in case you need to use it to look up other model instances from the DiffSync's cache.

class Device(DiffSyncModel):
    [...]

    @classmethod
    def create(cls, diffsync, ids, attrs):
        ## TODO add your own logic here to create the device on the remote system
        # Call the super().create() method to create the in-memory DiffSyncModel instance
        return super().create(ids=ids, diffsync=diffsync, attrs=attrs)

    def update(self, attrs):
        ## TODO add your own logic here to update the device on the remote system
        # Call the super().update() method to update the in-memory DiffSyncModel instance
        return super().update(attrs)

    def delete(self):
        ## TODO add your own logic here to delete the device on the remote system
        # Call the super().delete() method to remove the DiffSyncModel instance from its parent DiffSync adapter
        super().delete()
        return self
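The division of labour in these hooks (change the remote system first, then the in-memory object) can be exercised without DiffSync. In this sketch a module-level dict plays a hypothetical remote system and `RemoteDevice` is a stand-in class, not a DiffSyncModel; its signatures are simplified and the diffsync argument is omitted.

```python
from typing import Dict

# Hypothetical remote system: device name -> attributes.
REMOTE: Dict[str, Dict[str, str]] = {}


class RemoteDevice:
    """Stand-in for a DiffSyncModel subclass that mirrors its changes to REMOTE."""

    def __init__(self, name: str, attrs: Dict[str, str]) -> None:
        self.name = name
        self.attrs = dict(attrs)

    @classmethod
    def create(cls, ids: Dict[str, str], attrs: Dict[str, str]) -> "RemoteDevice":
        REMOTE[ids["name"]] = dict(attrs)  # 1. create the record remotely
        return cls(ids["name"], attrs)  # 2. build the in-memory object

    def update(self, attrs: Dict[str, str]) -> "RemoteDevice":
        REMOTE[self.name].update(attrs)  # merge the given attributes remotely
        self.attrs.update(attrs)  # then locally
        return self

    def delete(self) -> "RemoteDevice":
        del REMOTE[self.name]  # remove remotely; the caller drops the local object
        return self


dev = RemoteDevice.create({"name": "rtr-nyc"}, {"role": "router"})
dev.update({"role": "core-router"})
print(REMOTE)  # {'rtr-nyc': {'role': 'core-router'}}
dev.delete()
print(REMOTE)  # {}
```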

Bulk/batch modifications

If you prefer to update the entire remote system with the final state after performing all individual create/update/delete operations (as might be the case if your "remote system" is a single YAML or JSON file), the easiest place to implement this logic is in the sync_complete() callback method that is automatically invoked by DiffSync upon completion of a sync operation.

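import structlog

from diffsync import DiffSync
from diffsync.diff import Diff
from diffsync.enum import DiffSyncFlags

class BackendA(DiffSync):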
    [...]

    def sync_complete(self, source: DiffSync, diff: Diff, flags: DiffSyncFlags, logger: structlog.BoundLogger):
        ## TODO add your own logic to update the remote system now.
        # The various parameters passed to this method are for your convenience in implementing more complex logic, and
        # can be ignored if you do not need them.
        #
        # The default DiffSync.sync_complete() method does nothing, but it's always a good habit to call super():
        super().sync_complete(source, diff, flags, logger)
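For a file-backed "remote system", the body of sync_complete() often just serializes the adapter's final in-memory state in one pass. A minimal sketch of that step, assuming the state has already been reduced to a plain dict and is written as JSON; `dump_state()` and the `inventory.json` file name are hypothetical. Writing to a temporary file and then calling os.replace() avoids leaving a half-written file behind (os.replace() is atomic on POSIX when both paths are on the same filesystem).

```python
import json
import os
import tempfile
from typing import Dict


def dump_state(state: Dict[str, Dict[str, dict]], path: str) -> None:
    """Write the full final state: temp file in the same directory first, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle, indent=2, sort_keys=True)
    os.replace(tmp_path, path)


final_state = {
    "site": {"nyc": {"name": "nyc", "contact_phone": "555-0100"}},
    "device": {"rtr-nyc": {"name": "rtr-nyc", "role": "router", "site_name": "nyc"}},
}
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "inventory.json")
    dump_state(final_state, target)
    with open(target) as handle:
        assert json.load(handle) == final_state
```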