criteo / Cluster Pack

Licence: apache-2.0

A library on top of either pex or conda-pack to make your Python code easily available on a cluster

Programming Languages

python

139335 projects - #7 most used programming language

Labels

s3 pyspark hdfs

Projects that are alternatives of or similar to Cluster Pack

Rumble

⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Stars: ✭ 58 (+152.17%)

Mutual labels: s3, hdfs

Seaweedfs

SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.

Stars: ✭ 13,380 (+58073.91%)

Mutual labels: s3, hdfs

Tiledb

The Universal Storage Engine

Stars: ✭ 1,072 (+4560.87%)

Mutual labels: s3, hdfs

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (+552.17%)

Mutual labels: pyspark, hdfs

jobAnalytics and search

JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.

Stars: ✭ 25 (+8.7%)

Mutual labels: s3, pyspark

Smart open

Utils for streaming large files (S3, HDFS, gzip, bz2...)

Stars: ✭ 2,306 (+9926.09%)

Mutual labels: s3, hdfs

Tiledb Py

Python interface to the TileDB storage manager

Stars: ✭ 78 (+239.13%)

Mutual labels: s3, hdfs

Kafka Connect Ui

Web tool for Kafka Connect

Stars: ✭ 388 (+1586.96%)

Mutual labels: s3, hdfs

kafka-connect-fs

Kafka Connect FileSystem Connector

Stars: ✭ 107 (+365.22%)

Mutual labels: s3, hdfs

Storagetapper

StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service

Stars: ✭ 232 (+908.7%)

Mutual labels: s3, hdfs

Juicefs

JuiceFS is a distributed POSIX file system built on top of Redis and S3.

Stars: ✭ 4,262 (+18430.43%)

Mutual labels: s3, hdfs

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (+1665.22%)

Mutual labels: pyspark, hdfs

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (+2652.17%)

Mutual labels: pyspark

Hasura Backend Plus

🔑Auth and 📦Storage for Hasura. The quickest way to get Auth and Storage working for your next app based on Hasura.

Stars: ✭ 776 (+3273.91%)

Mutual labels: s3

Stock Analysis Engine

Backtest 1000s of minute-by-minute trading algorithms for training AI with automated pricing data from: IEX, Tradier and FinViz. Datasets and trading performance automatically published to S3 for building AI training datasets for teaching DNNs how to trade. Runs on Kubernetes and docker-compose. >150 million trading history rows generated from +5000 algorithms. Heads up: Yahoo's Finance API was disabled on 2019-01-03 https://developer.yahoo.com/yql/

Stars: ✭ 605 (+2530.43%)

Mutual labels: s3

Kodexplorer

A web-based file manager, web IDE / browser-based code editor

Stars: ✭ 5,490 (+23769.57%)

Mutual labels: s3

Hadoop For Geoevent

ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.

Stars: ✭ 5 (-78.26%)

Mutual labels: hdfs

Pgbackrest

Reliable PostgreSQL Backup & Restore

Stars: ✭ 766 (+3230.43%)

Mutual labels: s3

S3fs Fuse

FUSE-based file system backed by Amazon S3

Stars: ✭ 5,733 (+24826.09%)

Mutual labels: s3

Django S3direct

Directly upload files to S3 compatible services with Django.

Stars: ✭ 570 (+2378.26%)

Mutual labels: s3

cluster-pack

cluster-pack is a library on top of either pex or conda-pack to make your Python code easily available on a cluster.

Its goal is to make your prod/dev Python code and libraries easily available on any cluster. cluster-pack supports HDFS and S3 as distributed storage.

The first examples use Skein (a simple library for deploying applications on Apache YARN) and PySpark with HDFS storage. We intend to add more examples for other applications (like Dask, Ray) and S3 storage.

An introductory blog post can be found here.

Installation

Install with Pip

$ pip install cluster-pack

Install from source

$ git clone https://github.com/criteo/cluster-pack
$ cd cluster-pack
$ pip install .

Prerequisites

cluster-pack supports Python ≥3.6.
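
A script that depends on cluster-pack can guard against unsupported interpreters before doing any work. The check below is a minimal sketch using only the standard library; the names `MIN_PYTHON` and `is_supported` are hypothetical and are not part of cluster-pack.

```python
import sys

# Minimum interpreter version stated in the prerequisites above.
MIN_PYTHON = (3, 6)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given (major, minor, ...) version is at least 3.6."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not is_supported():
        raise SystemExit("Python >= 3.6 is required")
```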

Features

Ships a package with all the dependencies from your current virtual environment or conda environment
Stores metadata for an environment
Supports "under development" mode by taking advantage of pip's editable installs: all editable requirements are uploaded every time, making local changes directly visible on the cluster
Supports interactive (Jupyter notebook) mode
Provides config helpers to use the uploaded zip file directly inside your application
Launches jobs from jobs by propagating all artifacts
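
To make the "stores metadata for an environment" feature more concrete, the sketch below records the interpreter version and the installed distributions of the current environment in a JSON file next to a package archive. It is only an illustration of the idea, written with the standard library (it needs Python 3.8 or later for `importlib.metadata`). The function names, the `.json` file location, and the field names are assumptions, not cluster-pack's actual API or metadata format.

```python
import json
import sys
from importlib import metadata
from pathlib import Path


def collect_env_metadata() -> dict:
    """Describe the running interpreter and its installed distributions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip incomplete installs
            packages[name] = version
    return {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "packages": dict(sorted(packages.items())),
    }


def write_env_metadata(package_path: str) -> Path:
    """Write the metadata as JSON next to a (hypothetical) package archive.

    For "/tmp/myenv.pex" the metadata goes to "/tmp/myenv.json".
    """
    target = Path(package_path).with_suffix(".json")
    target.write_text(json.dumps(collect_env_metadata(), indent=2))
    return target
```

A job running on the cluster could read such a file to check that the interpreter and package versions it sees match the ones that were packaged.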

Basic examples with skein

Basic examples with PySpark

Stars: ✭ 23
