kamathhrishi / GreyNSights

Licence: MIT License
Privacy-Preserving Data Analysis using Pandas


GreyNSights

The grey area between privacy and utility

Introductory Blogpost

GreyNSights is a framework for privacy-preserving data analysis, currently supporting only Pandas. It lets analysts query a dataset remotely while the data stays at its source and remains hidden from the analyst. Analysts keep their flexibility: they use the same pandas syntax to analyze and transform datasets, but cannot view individual rows. GreyNSights can also query several parties together and return aggregate statistics without revealing the individual counts of each party.

Not for production usage.


The three major principles behind the library:

  • No raw data is exposed, only aggregates

    The analyst can query and transform the dataset however they want, but only aggregate results are returned.

  • The aggregates and analyses do not leak information about individual rows

    The aggregate results are differentially private, which protects individual rows from differencing attacks.

  • Pandas capabilities for transforming and processing datasets are preserved

    The analyst may have to add a few lines of code to initialize the setup with the data owner, but otherwise uses the same pandas syntax, so anybody who already knows pandas can use the library without learning anything new.
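
To make the second principle concrete, here is a minimal sketch of a differencing attack and of how noise blunts it. It is not GreyNSights code: the helper names (`laplace_noise`, `private_sum`), the clamping bounds and the `epsilon` value are assumptions chosen for illustration, and the Laplace mechanism is used here only as a common textbook way of making a sum differentially private.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_sum(values, lower, upper, epsilon, rng):
    """Illustrative epsilon-DP sum: clamp each value, sum, add Laplace noise.

    Adding or removing one clamped row changes the sum by at most
    max(|lower|, |upper|), which is the sensitivity used for the noise scale.
    """
    clamped = [min(max(v, lower), upper) for v in values]
    sensitivity = max(abs(lower), abs(upper))
    return sum(clamped) + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical column; the last row belongs to the person under attack.
carrots_eaten = [12, 85, 40, 73, 5]

# Differencing attack on exact aggregates: two sums that differ by one row
# reveal that row exactly.
leaked = sum(carrots_eaten) - sum(carrots_eaten[:-1])  # == 5

# The same two queries answered with noise: the difference is no longer
# the victim's exact value.
rng = random.Random(0)
noisy_all = private_sum(carrots_eaten, 0, 100, epsilon=1.0, rng=rng)
noisy_rest = private_sum(carrots_eaten[:-1], 0, 100, epsilon=1.0, rng=rng)
noisy_difference = noisy_all - noisy_rest
```

Each noisy answer spends privacy budget, so a real system also has to track how many queries an analyst has issued; that bookkeeping is left out of this sketch.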

Installation

  1. Clone the repository

    git clone https://github.com/kamathhrishi/GreyNSights.git

  2. Install the required packages

    pip install -r requirements.txt

  3. Install the library from source

    python3 setup.py install

Workflow Diagram

Usage

Analysis using GreyNSights hosted remotely.

# Initialization code of GreyNSights
import GreyNsights
from GreyNsights.analyst import DataWorker, DataSource, Pointer, Command, Analyst
from GreyNsights.frameworks import framework

# Identify the analyst and point at the data owner's worker
identity = Analyst("Alice", port=65441, host="127.0.0.1")
worker = DataWorker(port=6544, host="127.0.0.1")
dataset = DataSource(identity, worker, "Sample Data")
config = dataset.get_config()

# Initialize the pointer to the remote dataset
dataset_pt = config.approve().init_pointer()

# Analysis of the dataset.
# `pandas` here is presumably the GreyNSights pandas wrapper obtained through
# the imported `framework`; that setup step is not shown in this snippet.
df = pandas.DataFrame(dataset_pt)
df.columns
df.describe().get()
df['carrots_eaten'].mean().get()
df['carrots_eaten'].sum().get()
(df['carrots_eaten'] > 70).sum().get()
df['carrots_eaten'].max().get()

Analysis using Pandas

import pandas as pd

dataset = pd.read_csv(<PATH>)

df = pd.DataFrame(dataset)
df.columns
df.describe()
df['carrots_eaten'].mean()
df['carrots_eaten'].sum()
(df['carrots_eaten'] > 70).sum()
df['carrots_eaten'].max()
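
The two listings above differ mainly in the setup lines and in the trailing `.get()` calls. One way to picture that difference is a pointer object that records transformations and evaluates them only when an aggregate is requested. The toy below is a rough sketch of that idea, not the GreyNSights implementation: `ToyColumnPointer` and `PendingAggregate` are hypothetical names, everything runs in one process instead of over a network, and no differential-privacy noise is added.

```python
class PendingAggregate:
    """A deferred aggregate; nothing is computed until .get() is called."""

    def __init__(self, compute):
        self._compute = compute

    def get(self):
        return self._compute()


class ToyColumnPointer:
    """Hypothetical stand-in for a pointer to a column held by a data owner.

    It offers transformations and aggregates, but no method that returns rows.
    """

    def __init__(self, rows, transform=None):
        self._rows = rows  # in a real deployment this would stay with the owner
        self._transform = transform or (lambda xs: list(xs))

    def __gt__(self, threshold):
        # Record the comparison instead of evaluating it immediately.
        previous = self._transform
        return ToyColumnPointer(
            self._rows, lambda xs: [x > threshold for x in previous(xs)]
        )

    def _pending(self, aggregate):
        return PendingAggregate(lambda: aggregate(self._transform(self._rows)))

    def sum(self):
        return self._pending(sum)

    def max(self):
        return self._pending(max)

    def mean(self):
        return self._pending(lambda xs: sum(xs) / len(xs))


carrots_eaten = ToyColumnPointer([12, 85, 40, 73, 5])
total = carrots_eaten.sum().get()             # 215
heavy_eaters = (carrots_eaten > 70).sum().get()  # 2
largest = carrots_eaten.max().get()           # 85
average = carrots_eaten.mean().get()          # 43.0
```

The point of the design is that the analyst-facing object exposes only aggregate-returning methods, which is why the pandas-style calls stay unchanged while individual rows never reach the analyst.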

Examples

  1. The Accidents example shows how a range of queries can be performed and how datasets can be transformed using GreyNSights.
  2. The Federated Analytics example shows how to analyze the datasets of several parties together. This is restricted to linear queries such as sum, average, std and counts.
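
The restriction to linear queries has a simple intuition: sum, count, average and std can all be rebuilt from per-party totals, so no party has to share rows. The sketch below shows only that arithmetic. It is an illustration under assumptions, not the GreyNSights protocol: the function names are hypothetical, and the step that hides each party's individual totals from the aggregator (for example secure aggregation or added noise) is deliberately left out.

```python
import math


def local_summary(values):
    """Computed by each party on its own data: count, sum, sum of squares."""
    return (len(values), sum(values), sum(v * v for v in values))


def combine(summaries):
    """Rebuild pooled statistics from per-party summaries alone."""
    count = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    total_sq = sum(s[2] for s in summaries)
    mean = total / count
    # Population variance (divides by N); pandas' default std divides by N - 1.
    variance = max(total_sq / count - mean * mean, 0.0)
    return {"count": count, "sum": total, "mean": mean, "std": math.sqrt(variance)}


# Three hypothetical parties, each holding a private list of values.
parties = [[1, 2, 3], [4, 5], [6]]
pooled = combine([local_summary(p) for p in parties])
# pooled["count"] == 6, pooled["sum"] == 21, pooled["mean"] == 3.5
```

A query such as a median or a maximum cannot be reconstructed from additive totals like these, which is one reason a federated setup of this kind would stick to linear queries.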

Contributing

Read CONTRIBUTING documentation.
