capeprivacy / Cape Python

Licence: apache-2.0

Collaborate on privacy-preserving policy for data science projects in Pandas and Apache Spark

Programming Languages

python

139335 projects - #7 most used programming language

Labels

hacktoberfest data-science privacy spark pandas collaboration policy

Projects that are alternatives of or similar to Cape Python

Datacompy

Pandas and Spark DataFrame comparison for humans

Stars: ✭ 147 (+17.6%)

Mutual labels: data-science, spark, pandas

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+17538.4%)

Mutual labels: data-science, spark, pandas

Koalas

Koalas: pandas API on Apache Spark

Stars: ✭ 3,044 (+2335.2%)

Mutual labels: data-science, spark, pandas

H1st

The AI Application Platform We All Need. Human AND Machine Intelligence. Based on experience building AI solutions at Panasonic: robotics predictive maintenance, cold-chain energy optimization, Gigafactory battery mfg, avionics, automotive cybersecurity, and more.

Stars: ✭ 697 (+457.6%)

Mutual labels: collaboration, hacktoberfest, data-science

Orange3

🍊 📊 💡 Orange: Interactive data analysis

Stars: ✭ 3,152 (+2421.6%)

Mutual labels: hacktoberfest, data-science, pandas

Dvc

🦉Data Version Control | Git for Data & Models | ML Experiments Management

Stars: ✭ 9,004 (+7103.2%)

Mutual labels: collaboration, hacktoberfest, data-science

Nothing Private

Do you think you are safe using private browsing or incognito mode?. 😄 👿 This will prove that you're wrong.

Stars: ✭ 1,375 (+1000%)

Mutual labels: hacktoberfest, privacy

Sigmoidal ai

Tutoriais de Python, Data Science, Machine Learning e Deep Learning - Sigmoidal

Stars: ✭ 103 (-17.6%)

Mutual labels: data-science, pandas

Python Bigdata

Data science and Big Data with Python

Stars: ✭ 112 (-10.4%)

Mutual labels: data-science, spark

Goodwork

Self hosted project management and collaboration tool powered by TALL stack

Stars: ✭ 1,730 (+1284%)

Mutual labels: collaboration, hacktoberfest

Danfojs

danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.

Stars: ✭ 1,304 (+943.2%)

Mutual labels: data-science, pandas

Sweetviz

Visualize and compare datasets, target values and associations, with one line of code.

Stars: ✭ 1,851 (+1380.8%)

Mutual labels: data-science, pandas

Dat8

General Assembly's 2015 Data Science course in Washington, DC

Stars: ✭ 1,516 (+1112.8%)

Mutual labels: data-science, pandas

Awesome Openminds Team

A Repository for students, geeks, programmers, and designers

Stars: ✭ 101 (-19.2%)

Mutual labels: collaboration, hacktoberfest

Sspipe

Simple Smart Pipe: python productivity-tool for rapid data manipulation

Stars: ✭ 96 (-23.2%)

Mutual labels: data-science, pandas

Pyspark Cheatsheet

🐍 Quick reference guide to common patterns & functions in PySpark.

Stars: ✭ 108 (-13.6%)

Mutual labels: data-science, spark

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+970.4%)

Mutual labels: data-science, spark

Seaborn Tutorial

This repository is my attempt to help Data Science aspirants gain necessary Data Visualization skills required to progress in their career. It includes all the types of plot offered by Seaborn, applied on random datasets.

Stars: ✭ 114 (-8.8%)

Mutual labels: data-science, pandas

Ibis

A pandas-like deferred expression system, with first-class SQL support

Stars: ✭ 1,630 (+1204%)

Mutual labels: pandas, spark

Syft.js

The official Syft worker for Web and Node, built in Javascript

Stars: ✭ 118 (-5.6%)

Mutual labels: hacktoberfest, privacy


Cape Python

A Python library supporting data transformations and collaborative privacy policies, for data science projects in Pandas and Apache Spark

See below for instructions on how to get started or visit the documentation.

Getting started

Prerequisites

Python 3.6 or above, and pip
Pandas 1.0+
PySpark 3.0+ (if using Spark)
Make (if installing from source)

Install with pip

Cape Python is available through PyPI.

pip install cape-privacy

Support for Apache Spark is optional. If you plan on using the library together with Apache Spark, we suggest the following instead:

pip install "cape-privacy[spark]"

We recommend running it in a virtual environment, such as venv.

Install from source

It is possible to install the library from source. This installs all dependencies, including Apache Spark:

git clone https://github.com/capeprivacy/cape-python.git
cd cape-python
make bootstrap

Usage example

This example is an abridged version of the project's tutorial.

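import pandas as pd
from cape_privacy.pandas import dtypes
from cape_privacy.pandas.transformations import NumericPerturbation, Tokenizer

df = pd.DataFrame({
    "name": ["alice", "bob"],
    "age": [34, 55],
    "birthdate": [pd.Timestamp(1985, 2, 23), pd.Timestamp(1963, 5, 10)],
})

# Replace each name with a keyed token of at most 10 characters.
tokenize = Tokenizer(max_token_len=10, key=b"my secret")
# Add integer noise between min and max to each age.
perturb_numeric = NumericPerturbation(dtype=dtypes.Integer, min=-10, max=10)

df["name"] = tokenize(df["name"])
df["age"] = perturb_numeric(df["age"])

print(df.head())
# >>
#          name  age  birthdate
# 0  f42c2f1964   34 1985-02-23
# 1  2e586494b2   63 1963-05-10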
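
The perturbation above draws random noise, so the perturbed ages can differ from run to run unless a seed is fixed. The following is a minimal sketch of that idea in plain Python, not Cape Python's implementation: `perturb_integers` is a hypothetical helper, and uniform integer noise is an assumption made for illustration.

```python
import random


def perturb_integers(values, low, high, seed=None):
    """Add uniform integer noise in [low, high] to each value.

    Hypothetical helper for illustration; not part of cape_privacy.
    """
    rng = random.Random(seed)  # a fixed seed makes the noise repeatable
    return [v + rng.randint(low, high) for v in values]


ages = [34, 55]
first = perturb_integers(ages, -10, 10, seed=4984)
second = perturb_integers(ages, -10, 10, seed=4984)

# Same seed, same noise: the two runs agree.
print(first == second)
# Every perturbed value stays within 10 of the original.
print(all(abs(p - a) <= 10 for p, a in zip(first, ages)))
```

Fixing the seed is useful when a team needs the same masked data set on every run; leaving it unset gives fresh noise each time.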

These steps can be saved in policy files so you can share them and collaborate with your team:

# my-policy.yaml
label: my-policy
version: 1
rules:
  - match:
      name: age
    actions:
      - transform:
          type: numeric-perturbation
          dtype: Integer
          min: -10
          max: 10
          seed: 4984
  - match:
      name: name
    actions:
      - transform:
          type: tokenizer
          max_token_len: 10
          key: my secret

You can then load this policy and apply it to your data frame:

import cape_privacy as cape

# df can be a Pandas or Spark data frame
policy = cape.parse_policy("my-policy.yaml")
df = cape.apply_policy(policy, df)

print(df.head())
# >>
#          name  age  birthdate
# 0  f42c2f1964   34 1985-02-23
# 1  2e586494b2   63 1963-05-10
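
To make the policy structure concrete, here is a rough sketch of how rules of this shape could be matched to columns and applied. It is an illustration only and does not reflect how `cape.apply_policy` works internally: `build_transform` and `apply_rules` are hypothetical names, the policy is written as a plain dict rather than parsed from YAML, columns are plain lists instead of a data frame, and both transformations are simple stand-ins.

```python
import hashlib
import hmac
import random

# The rules from my-policy.yaml, written as the dict a YAML parser would yield.
policy = {
    "label": "my-policy",
    "version": 1,
    "rules": [
        {"match": {"name": "age"},
         "actions": [{"transform": {"type": "numeric-perturbation", "dtype": "Integer",
                                    "min": -10, "max": 10, "seed": 4984}}]},
        {"match": {"name": "name"},
         "actions": [{"transform": {"type": "tokenizer",
                                    "max_token_len": 10, "key": "my secret"}}]},
    ],
}


def build_transform(spec):
    """Turn one transform entry into a function over a column (hypothetical stand-ins)."""
    if spec["type"] == "numeric-perturbation":
        rng = random.Random(spec.get("seed"))
        return lambda col: [v + rng.randint(spec["min"], spec["max"]) for v in col]
    if spec["type"] == "tokenizer":
        key = spec["key"].encode()
        length = spec["max_token_len"]
        return lambda col: [
            hmac.new(key, str(v).encode(), hashlib.sha256).hexdigest()[:length]
            for v in col
        ]
    raise ValueError(f"unknown transform type: {spec['type']}")


def apply_rules(policy, columns):
    """Apply each rule to the column it matches by name; other columns pass through."""
    result = dict(columns)
    for rule in policy["rules"]:
        name = rule["match"]["name"]
        if name not in result:
            continue
        for action in rule["actions"]:
            result[name] = build_transform(action["transform"])(result[name])
    return result


columns = {
    "name": ["alice", "bob"],
    "age": [34, 55],
    "birthdate": ["1985-02-23", "1963-05-10"],
}
masked = apply_rules(policy, columns)
print(masked)
```

Because no rule matches `birthdate`, that column is returned unchanged, which mirrors the output shown above.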

You can find more examples and usage in the project repository or in our documentation.

About Cape Privacy and Cape Python

Cape Privacy helps teams share data and make decisions for safer and more powerful data science. Learn more at capeprivacy.com.

Cape Python brings Cape's policy language to Pandas and Apache Spark. The supported techniques include tokenization with linkability as well as perturbation and rounding. You can experiment with these techniques programmatically, in Python or in human-readable policy files.
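
"Linkability" means that the same input always maps to the same token under a given key, so tokenized data sets can still be joined on the tokenized column. The sketch below illustrates the property with a keyed hash (HMAC-SHA256) truncated to 10 characters. This is an assumption chosen for illustration, not a description of Cape Python's `Tokenizer`; the `tokenize` helper and the sample records are hypothetical.

```python
import hashlib
import hmac


def tokenize(value, key, max_token_len=10):
    """Keyed, deterministic token: truncated HMAC-SHA256 (hypothetical helper)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:max_token_len]


key = b"my secret"
customers = [{"name": "alice", "city": "Halifax"}, {"name": "bob", "city": "Toronto"}]
orders = [{"name": "bob", "total": 12}, {"name": "alice", "total": 30}]

# Tokenize the name in both data sets with the same key.
for row in customers + orders:
    row["name"] = tokenize(row["name"], key)

# Join on the token instead of the raw name.
city_by_token = {c["name"]: c["city"] for c in customers}
joined = [(city_by_token[o["name"]], o["total"]) for o in orders]
print(joined)  # [('Toronto', 12), ('Halifax', 30)]
```

The join succeeds without either table exposing raw names. A different key produces different tokens, so data sets tokenized under separate keys cannot be linked this way.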

Cape architecture

Cape comprises multiple services and libraries. You can use Cape Python as a standalone library, or you can integrate it with the Coordinator in Cape Core, which supports user and policy management.

Project status and roadmap

Cape Python 0.1.1 was released on 24 June 2020. It is actively maintained and developed, alongside other elements of the Cape ecosystem.

Upcoming features:

Reversible tokenization: allow tokenization to be reversed to reveal the raw value.
Policy audit logging: create logging hooks to allow audit logs for policy downloads and usage in Cape Python.
Expand pipeline integrations: add support for another pipeline, such as Apache Beam, Apache Flink, Apache Arrow Flight or Dask, either as part of Cape Python or as a separate project.

The goal is a complete data management ecosystem. Cape Privacy provides Cape Coordinator, to manage policy and users. This will interact with the Cape Privacy libraries (such as Cape Python) through a workers interface, and with your own data services through an API.

Help and resources

If you need help using Cape Python, you can:

View the documentation.
Submit an issue.
Talk to us on our community Slack.

Please file feature requests and bug reports as GitHub issues.

Community

Contributing

View our contributing guide for more information.

Code of conduct

Our code of conduct is included on the Cape Privacy website. All community members are expected to follow it. Please refer to that page for information on how to report problems.

License

Licensed under Apache License, Version 2.0 (see LICENSE or http://www.apache.org/licenses/LICENSE-2.0). Copyright as specified in NOTICE.


Stars: ✭ 125
