
LabeliaLabs / distributed-learning-contributivity

License: Apache-2.0
Simulate collaborative ML scenarios, experiment with multi-partner learning approaches, and measure the respective contributions of different datasets to model performance.

Programming Languages

Jupyter Notebook
Python

Projects that are alternatives of or similar to distributed-learning-contributivity

Pysyft
A library for answering questions using data you cannot see
Stars: ✭ 7,811 (+15840.82%)
Mutual labels:  federated-learning
CRFL
CRFL: Certifiably Robust Federated Learning against Backdoor Attacks (ICML 2021)
Stars: ✭ 44 (-10.2%)
Mutual labels:  federated-learning
DBA
DBA: Distributed Backdoor Attacks against Federated Learning (ICLR 2020)
Stars: ✭ 98 (+100%)
Mutual labels:  federated-learning
DeML-Golem
Proof Of Concept of DEcentralised Machine Learning on top of the Golem (https://golem.network/) architecture
Stars: ✭ 35 (-28.57%)
Mutual labels:  federated-learning
FedScale
FedScale is a scalable and extensible open-source federated learning (FL) platform.
Stars: ✭ 274 (+459.18%)
Mutual labels:  federated-learning
Addressing-Class-Imbalance-FL
This is the code for Addressing Class Imbalance in Federated Learning (AAAI-2021).
Stars: ✭ 62 (+26.53%)
Mutual labels:  federated-learning
Fate
An Industrial Grade Federated Learning Framework
Stars: ✭ 3,775 (+7604.08%)
Mutual labels:  federated-learning
ambianic-edge
The core runtime engine for Ambianic Edge devices.
Stars: ✭ 98 (+100%)
Mutual labels:  federated-learning
FedFusion
The implementation of "Towards Faster and Better Federated Learning: A Feature Fusion Approach" (ICIP 2019)
Stars: ✭ 30 (-38.78%)
Mutual labels:  federated-learning
federated pca
Federated Principal Component Analysis Revisited!
Stars: ✭ 30 (-38.78%)
Mutual labels:  federated-learning
FedDANE
FedDANE: A Federated Newton-Type Method (Asilomar Conference on Signals, Systems, and Computers ‘19)
Stars: ✭ 25 (-48.98%)
Mutual labels:  federated-learning
Awesome-Federated-Learning-on-Graph-and-GNN-papers
Federated learning on graph, especially on graph neural networks (GNNs), knowledge graph, and private GNN.
Stars: ✭ 206 (+320.41%)
Mutual labels:  federated-learning
FEDL
FEDL: Federated Learning algorithm using TensorFlow (Transactions on Networking 2021)
Stars: ✭ 41 (-16.33%)
Mutual labels:  federated-learning
FedDA
Source code for 'Dual Attention Based FL for Wireless Traffic Prediction'
Stars: ✭ 41 (-16.33%)
Mutual labels:  federated-learning
federated-learning
tf implementation of federated learning
Stars: ✭ 36 (-26.53%)
Mutual labels:  federated-learning
Awesome Mlops
A curated list of references for MLOps
Stars: ✭ 7,119 (+14428.57%)
Mutual labels:  federated-learning
federated
Bachelor's Thesis in Computer Science: Privacy-Preserving Federated Learning Applied to Decentralized Data
Stars: ✭ 25 (-48.98%)
Mutual labels:  federated-learning
PyAriesFL
Federated Learning on HyperLedger Aries
Stars: ✭ 19 (-61.22%)
Mutual labels:  federated-learning
FedNLP
FedNLP: An Industry and Research Integrated Platform for Federated Learning in Natural Language Processing, Backed by FedML, Inc. The Previous Research Version is Accepted to NAACL 2022
Stars: ✭ 215 (+338.78%)
Mutual labels:  federated-learning
baai-federated-learning-helmet-baseline
Baseline model for the "helmet not worn" behavior object detection track of the Electric Power AI Data Competition
Stars: ✭ 26 (-46.94%)
Mutual labels:  federated-learning

Build Status Code Coverage Open In Colab Discuss on Slack

MPLC: Multi-Partner Learning and Contributivity

In short, MPLC enables you to:

  • simulate collaborative multi-partner ML scenarios
  • experiment and benchmark multi-partner learning approaches
  • experiment and benchmark contributivity measurement approaches



Introduction

This work focuses on collaborative data science projects where multiple partners want to train a model on several datasets contributed by different data-providing partners. Such scenarios are sometimes referred to as cross-silo federated learning (see for example the reference paper Advances and Open Problems in Federated Learning).

In the context of such cross-silo federated learning scenarios, it addresses the following questions:

  • how to train and test a predictive model: what federated learning strategies could be considered? Federated averaging seems to have become the default approach, but others could be explored.
  • how to measure how much each dataset contributed to the performance of the collective model? This question can be of interest in some cases, for example as a basis for agreeing on how to share the rewards of an ML challenge or the future revenues derived from the predictive model, or for detecting corrupted datasets or partners not playing by the rules.

This library aims to help researchers and practitioners explore these questions.

Context of this work

This work is being carried out in the context of collaborative research projects, open science and open source software. It is work in progress.

How to interact with this project?

That depends on the capacity in which you are interested! For example:

  • If you'd like to experiment right away with multi-partner learning approaches and contributivity measurement methods, jump to section Run an experiment.
  • If you'd like to get in touch with active members of the workgroup, jump to section Contacts, contributions, collaborations. If you are a student or a teacher, note that we regularly discuss student projects related to the mplc library.
  • If you are already familiar with this type of project, you can either have a look at section Ongoing work and improvement plan, or head towards issues and PRs to see what's going on these days. We use the help wanted tag to flag issues on which help is particularly wanted, but other open issues also very much welcome contributions. There is also a CONTRIBUTING.md with guidelines and best practices we recommend.

Should you have any question, don't hesitate to reach out; we'll be happy to discuss how we could help.

About this repository

In this repository, we benchmark different distributed learning and contributivity measurement approaches on public datasets artificially partitioned into a number of individual datasets, to mock a collaborative ML project (a cross-silo federated learning project).

The public datasets currently supported are: MNIST, CIFAR10, TITANIC, ESC50 and IMDB. They also act as templates for using the library on custom datasets.

The documentation is available here.

Structure of the library

This library can be schematically broken down into 3 blocks:

  1. Scenarios
  2. Multi-partner learning approaches
  3. Contributivity measurement approaches

Scenarios

A key capability is to easily define and simulate different multi-partner settings in order to experiment with them. To that end, the library lets you configure scenarios by specifying the number of partners, how the dataset is partitioned among them, etc. See the first tutorial, and the related documentation section Definition of collaborative scenarios, for all available parameters.
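For illustration, here is a minimal sketch of a partitioning-focused scenario. It assumes the Python-side keyword samples_split_option mirrors the YAML parameter of the same name used further below; the exact keyword names and accepted values are listed in the documentation.

from mplc.scenario import Scenario

# Minimal sketch of a 3-partner scenario focused on how the dataset is split.
# The samples_split_option keyword mirrors the YAML parameter shown later in
# this README; check the documentation for the full list of accepted values.
skewed_scenario = Scenario(partners_count=3,
                           amounts_per_partner=[0.5, 0.3, 0.2],  # relative share of samples per partner
                           dataset_name='mnist',                 # one of the supported public datasets
                           samples_split_option='random',        # how samples are distributed among partners
                           )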

Multi-partner learning approaches

Once a given scenario is configured, it is then possible to configure the multi-partner learning approach. So far, 3 different approaches are implemented (federated averaging, sequential, sequential averaging). See the related documentation section Configuration of the collaborative and distributed learning for descriptive schemas and additional ML-related parameters.
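To give an idea of how this plugs into a scenario, here is a short sketch. The keyword names multi_partner_learning_approach and aggregation are taken from the YAML example further below; the Python-side API may differ slightly, so refer to the documentation section mentioned above.

from mplc.scenario import Scenario

# Sketch: selecting the multi-partner learning approach and the aggregation
# scheme. 'fedavg' and the aggregation values come from the YAML example below;
# identifiers for the sequential approaches are listed in the documentation.
fedavg_scenario = Scenario(partners_count=3,
                           amounts_per_partner=[0.4, 0.3, 0.3],
                           multi_partner_learning_approach='fedavg',
                           aggregation='uniform',  # or 'data-volume'
                           epoch_count=20,
                           minibatch_count=20,
                           )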

Contributivity measurement approaches

Finally, with a given scenario and selected multi-partner learning approaches, it becomes possible to address contributivity measurement approaches. See the related documentation's sections Configuration of contributivity measurement methods to be tested and Contributivity measurement approaches studied and implemented.
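As a small illustration, the sketch below requests several contributivity methods for a scenario; the method names are those used in the examples of this README, and the full list is given in the documentation sections referenced above.

from mplc.scenario import Scenario

# Sketch: requesting contributivity measurement methods for a scenario.
# The method names below appear elsewhere in this README; see the
# documentation for the complete list of implemented methods.
contrib_scenario = Scenario(partners_count=3,
                            amounts_per_partner=[0.2, 0.3, 0.5],
                            dataset_name='cifar10',
                            contributivity_methods=["Shapley values", "Independent scores", "TMCS"],
                            )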

Installation

Using pip

pip install mplc

This installs the latest packaged version available on PyPI.
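A quick way to check the installation is to import the package in a Python shell (a trivial sanity check, nothing more):

# Sanity check: the import should succeed, and the installed version can be
# read from the package metadata (Python 3.8+).
import mplc
from importlib.metadata import version
print(version("mplc"))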

Build from Source

If you want to install mplc from the repository, make sure you have the latest version of pip.

Then clone the repository and trigger the installation from local sources.

git clone https://github.com/LabeliaLabs/distributed-learning-contributivity.git
cd distributed-learning-contributivity
pip install -e . 

Run an experiment

There are two ways to run an experiment with multi-partner learning approaches and/or contributivity methods.

(Recommended) Defining an Experiment in the code

You can use the mplc library directly in a notebook or a regular Python script, as demonstrated in the tutorials and in the code snippet below.

import mplc

from mplc.experiment import Experiment
from mplc.scenario import Scenario

# Let's configure different multi-partner scenarios
scenario1 = Scenario(partners_count=3,
                     amounts_per_partner=[0.2, 0.3, 0.5],
                     dataset_name='cifar10',
                     epoch_count=10,
                     minibatch_count=3,
                     contributivity_methods=["Shapley values", "S-Model"]
                     )
scenario2 = Scenario(4, [0.25]*4)  # Attributes not set here (e.g. dataset, epochs...) take their default values
scenario3 = Scenario(4, [0.8, 0.1, 0.05, 0.05])

# Now let's instantiate an Experiment and add the above scenarios
my_exp = Experiment(experiment_name='my_first_experiment',
                    nb_repeats=10,
                    scenarios_list=[scenario1, scenario3],
                    )
my_exp.add_scenario(scenario2)

# Everything is now set to run the Experiment
my_exp.run()

(Alternative) Defining an Experiment with a config file

Alternatively, you can use the main.py script provided in the repository together with a .yml config file.

  1. Define your scenario(s) in the config.yml file by changing the values of the suggested parameters of the example scenario (you can browse more available parameters in the documentation). For example:
experiment_name: my_custom_experiment
n_repeats: 5
scenario_params_list:
  - dataset:
    'mnist':
    - 'random_initialization'
    'cifar10':
    - 'random_initialization'
    dataset_proportion:
      - 0.5
    partners_count: 
      - 3
    amounts_per_partner:
      - [0.4, 0.3, 0.3]
    samples_split_option:
      - 'random'
      - ['advanced', [[7, 'shared'], [6, 'shared'], [2, 'specific']]]
    multi_partner_learning_approach:
      - 'fedavg'
    aggregation:
      - 'data-volume'
      - 'uniform'
    contributivity_methods:
      - ["Shapley values", "Independent scores", "TMCS"]
    epoch_count:
      - 20
    minibatch_count:
      - 20
    gradient_updates_per_pass_count:
      - 8

Under scenario_params_list, enter a list of sets of scenarios. Each set starts with - dataset: and must have only one partners_count value. The lengths of amounts_per_partner, corrupted_datasets (and samples_split_option when the advanced definition is used) must match the partners_count value. If multiple values are specified for a given parameter, e.g. aggregation in the example scenario above, all possible combinations of parameters will be assembled as separate scenarios and run.
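To illustrate this combinatorial expansion (this is not the library's internal code, just a sketch of the behavior described above):

from itertools import product

# Illustration only: with 2 values for samples_split_option and 2 values for
# aggregation, 2 x 2 = 4 separate scenarios would be assembled and run.
split_options = ['random', 'advanced']
aggregation_options = ['data-volume', 'uniform']

scenario_grid = [{'samples_split_option': split, 'aggregation': agg}
                 for split, agg in product(split_options, aggregation_options)]
print(len(scenario_grid))  # 4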

  2. Then execute main.py -f config.yml. Add the -v argument if you want a more verbose output.

  3. A results.csv file will be generated in a new folder for your experiment under /experiments/<your_experiment>. You can read this raw results.csv file or use the notebooks in /notebooks.
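For example, a quick way to inspect the raw results is with pandas (assuming pandas is installed; the exact experiment folder name is created at run time, so adjust the path accordingly):

import pandas as pd

# Hypothetical path: replace the folder name with the one actually created
# for your run under /experiments.
results = pd.read_csv('experiments/my_custom_experiment/results.csv')
print(results.head())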

Note: example experiments are stored in the folder /saved_experiments to illustrate the use of the library. The notebooks include graphs, for example the following:

Example graphs

Reference scenarios

Description of the reference scenarios

We defined 5 reference scenarios on which we propose to test and benchmark the different multi-partner learning approaches and contributivity measurement methods.

The 5 reference scenarios are described in the following schema (link to editable version):

Reference scenarios

In brief:

  • Scenarios 1 and 2, with 2 partners only, provide simple baselines with different data splits (each partner holding samples of different classes in the first scenario, and both partners holding samples of all classes in the second), and introduce corrupted data;

  • Scenario 3 proposes a more realistic configuration, where each partner holds both samples of classes shared with the other partners and samples of classes specific to itself;

  • Scenario 4 offers a case with 5 partners and an identical distribution of data samples of all classes, but with 1 partner having its data entirely corrupted (a rough code sketch of this shape is given after this list);

  • Scenario 5 is more complex, with 11 partners and a mix of: an identical distribution of data samples of several classes for a majority of partners, data samples of different classes for certain other partners, and some corrupted data as well.
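As an example, reference scenario 4 above could be approximated with the sketch below. Only parameters already shown in this README are used; corrupting one partner's data would be configured via the corrupted_datasets parameter mentioned earlier, whose accepted values are described in the documentation and not reproduced here.

from mplc.scenario import Scenario

# Rough sketch of the shape of reference scenario 4: 5 partners holding
# identical data volumes. The corruption of one partner's data is not
# configured here; see the documentation for the corrupted_datasets parameter.
scenario4_like = Scenario(partners_count=5,
                          amounts_per_partner=[0.2] * 5,
                          dataset_name='mnist',
                          )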

Results and benchmarks

Results of experiments and benchmarks of multi-partner learning approaches and contributivity methods on the reference scenarios will be summarised in this section in the upcoming months. Associated notebooks and full results will be shared on Open Science Framework.

Ongoing work and improvement plan

The current work focuses on the following 4 priorities:

  1. Design and implement new multi-partner learning approaches
  2. Design and implement new contributivity measurement methods
  3. Perform experiments and gain experience about best-suited multi-partner learning approaches and contributivity measurement methods in different situations
  4. Make the library agnostic/compatible with other datasets and model architectures

There is also a transverse, continuous improvement effort on code quality, readability and optimization.

This work is collaborative; enthusiasts are welcome to comment on open issues and PRs, or to open new ones.

Contacts, contributions, collaborations

Should you be interested in this open effort and would like to share any question, suggestion or input, you can use the following channels:

Labelia Labs
