skale-me / Skale

Licence: apache-2.0
High performance distributed data processing engine

Programming Languages

javascript
184084 projects - #8 most used programming language

Projects that are alternatives of or similar to Skale

Udacity Data Engineering Projects
A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Stars: ✭ 458 (+17.44%)
Mutual labels:  aws-s3, cluster
Goofys
a high-performance, POSIX-ish Amazon S3 file system written in Go
Stars: ✭ 3,932 (+908.21%)
Mutual labels:  aws-s3, azure-storage
databricks-notebooks
Collection of Databricks and Jupyter Notebooks
Stars: ✭ 19 (-95.13%)
Mutual labels:  parquet, azure-storage
Storage
💿 Storage abstractions with implementations for .NET/.NET Standard
Stars: ✭ 380 (-2.56%)
Mutual labels:  aws-s3, azure-storage
node-storage
📬 A unified file storage library for storage in cloud or on premise
Stars: ✭ 29 (-92.56%)
Mutual labels:  aws-s3, azure-storage
BlobHelper
BlobHelper is a common, consistent storage interface for Microsoft Azure, Amazon S3, Komodo, Kvpbase, and local filesystem written in C#.
Stars: ✭ 23 (-94.1%)
Mutual labels:  aws-s3, azure-storage
Zenko
Zenko is the open source multi-cloud data controller: own and keep control of your data on any cloud.
Stars: ✭ 353 (-9.49%)
Mutual labels:  aws-s3, azure-storage
Sparklens
Qubole Sparklens tool for performance tuning Apache Spark
Stars: ✭ 345 (-11.54%)
Mutual labels:  cluster
Diplomat
A HTTP Ruby API for Consul
Stars: ✭ 358 (-8.21%)
Mutual labels:  cluster
Ckss Certified Kubernetes Security Specialist
This repository is a collection of resources to prepare for the Certified Kubernetes Security Specialist (CKSS) exam.
Stars: ✭ 333 (-14.62%)
Mutual labels:  cluster
S3mock
A simple mock implementation of the AWS S3 API startable as Docker image, JUnit 4 rule, or JUnit Jupiter extension
Stars: ✭ 332 (-14.87%)
Mutual labels:  aws-s3
Tensorflowonspark
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Stars: ✭ 3,748 (+861.03%)
Mutual labels:  cluster
Nebula
Nebula is a powerful framework for building highly concurrent, distributed, and resilient message-driven applications for C++.
Stars: ✭ 385 (-1.28%)
Mutual labels:  cluster
Dotnext
Next generation API for .NET
Stars: ✭ 379 (-2.82%)
Mutual labels:  cluster
Azure Spring Boot
Spring Boot Starters for Azure services
Stars: ✭ 352 (-9.74%)
Mutual labels:  azure-storage
Parquet Cpp
Apache Parquet
Stars: ✭ 339 (-13.08%)
Mutual labels:  parquet
Kontraktor
distributed Actors for Java 8 / JavaScript
Stars: ✭ 333 (-14.62%)
Mutual labels:  cluster
Nodejsstarterkit
Starter Kit for Node.js v14.x, minimum dependencies 🚀
Stars: ✭ 348 (-10.77%)
Mutual labels:  cluster
Swarmlet
A self-hosted, open-source Platform as a Service that enables easy swarm deployments, load balancing, automatic SSL, metrics, analytics and more.
Stars: ✭ 373 (-4.36%)
Mutual labels:  cluster
Oap
Optimized Analytics Package for Spark* Platform
Stars: ✭ 343 (-12.05%)
Mutual labels:  parquet

High performance distributed data processing and machine learning.

Skale provides a high-level API in JavaScript and an optimized parallel execution engine on top of Node.js.

Features

  • Pure JavaScript implementation of a Spark-like engine
  • Multiple data sources: filesystems, databases, cloud (S3, Azure)
  • Multiple data formats: CSV, JSON, columnar (Parquet)...
  • 50 high-level operators to build parallel apps
  • Machine learning: scalable classification, regression, clustering
  • Run interactively in a Node.js REPL
  • Docker ready, simple local mode or full distributed mode
  • Very fast, see benchmark

Quickstart

npm install skale

Word count example:

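var sc = require('skale').context();   // create a Skale context

sc.textFile('/my/path/*.txt')
  .flatMap(line => line.split(' '))    // split each line into words
  .map(word => [word, 1])              // emit one [word, 1] pair per word
  .reduceByKey((a, b) => a + b, 0)     // sum the values per word, starting from 0
  .count(function (err, result) {      // count the resulting [word, total] pairs
    console.log(result);
    sc.end();                          // end the context once the result is printed
  });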
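
To make the data flow of the pipeline above easier to follow, here is a minimal sketch of the same steps in plain JavaScript, run on an in-memory array instead of files. It does not use Skale at all: `wordCount` is a hypothetical helper, and the mapping of `reduceByKey` and `count` onto a `Map` and its size assumes those operators behave like their Spark namesakes.

```javascript
// Plain-JavaScript stand-in for the pipeline above (no Skale involved).
function wordCount(lines) {
  const pairs = lines
    .flatMap((line) => line.split(' '))   // one entry per word
    .map((word) => [word, 1]);            // [key, value] pairs

  // Stand-in for reduceByKey((a, b) => a + b, 0): fold values that share a key.
  const totals = new Map();
  for (const [word, n] of pairs) {
    totals.set(word, (totals.get(word) || 0) + n);
  }
  return totals;
}

const totals = wordCount(['to be or', 'not to be']);
console.log([...totals]);   // pairs: ['to', 2], ['be', 2], ['or', 1], ['not', 1]
console.log(totals.size);   // 4, the number of distinct words
```

Under these assumptions, the value printed by the Skale example corresponds to `totals.size` here, that is, the number of distinct words rather than the per-word totals.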

Local mode

In local mode, worker processes are forked automatically and communicate with the app through the child process IPC channel. This is the simplest way to operate, and it lets the app use all available cores on the machine.

To run in local mode, just execute your app script:

node my_app.js

or with debug traces:

SKALE_DEBUG=2 node my_app.js
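
The forked-worker arrangement described in this section can be illustrated with Node.js core modules alone. The sketch below is not Skale's implementation: `sumInWorkers` and the inline worker script are hypothetical names, and the example only shows the general pattern of forking child processes and exchanging messages over the IPC channel that `child_process.fork` opens.

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical worker: sums the numbers it receives and replies over IPC.
const workerSource = `
process.on('message', (numbers) => {
  process.send(numbers.reduce((a, b) => a + b, 0));
});
`;

// Fork one worker per chunk and resolve with the partial sums, in chunk order.
function sumInWorkers(chunks) {
  // Write the worker to a temp file so the example is self-contained.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'ipc-sketch-'));
  const workerFile = path.join(dir, 'worker.js');
  fs.writeFileSync(workerFile, workerSource);

  const jobs = chunks.map((chunk) => new Promise((resolve, reject) => {
    const child = fork(workerFile);       // fork() sets up the IPC channel
    child.once('message', (partial) => {
      child.disconnect();                 // close the channel so the worker exits
      resolve(partial);
    });
    child.once('error', reject);
    child.send(chunk);                    // hand the chunk to the worker
  }));

  return Promise.all(jobs)
    .finally(() => fs.rmSync(dir, { recursive: true, force: true }));
}

console.log('available cores:', os.cpus().length);
sumInWorkers([[1, 2, 3], [4, 5]]).then((partials) => {
  console.log(partials);                  // [ 6, 9 ]
});
```

A real engine would size the worker pool from the number of cores and keep workers alive across tasks; this sketch forks one short-lived worker per chunk to keep the message flow visible.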

Distributed mode

In distributed mode, a cluster server process and worker processes must be started before the app. Processes communicate with each other over raw TCP or WebSockets.

To run in distributed cluster mode, first start a cluster server on server_host:

./bin/server.js

On each worker host, start a worker controller process that connects to the server:

./bin/worker.js -H server_host

Then run your app, setting the cluster server host in the environment:

SKALE_HOST=server_host node my_app.js

The same with debug traces:

SKALE_HOST=server_host SKALE_DEBUG=2 node my_app.js
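
The commands in the two sections above differ only in their environment variables. As a rough summary, the sketch below maps `SKALE_HOST` and `SKALE_DEBUG` to a run mode. `describeRun` is a hypothetical helper, not part of Skale, and it assumes that an unset `SKALE_HOST` means local mode, which is how the commands above read but is not confirmed against Skale's source.

```javascript
// Hypothetical helper, not part of Skale: summarizes which mode a given
// environment would select, based on the variables used in the commands above.
function describeRun(env) {
  const host = env.SKALE_HOST;
  const debug = Number(env.SKALE_DEBUG || 0);
  return {
    mode: host ? 'distributed' : 'local',   // assumption: no host means local mode
    serverHost: host || null,
    debugLevel: Number.isNaN(debug) ? 0 : debug,
  };
}

// Inspect the environment of the current process.
console.log(describeRun(process.env));
```

For example, `describeRun({ SKALE_HOST: 'server_host', SKALE_DEBUG: '2' })` reports distributed mode against `server_host` with debug level 2.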

Resources

Authors

The original authors of skale are Cedric Artigue and Marc Vertes.

List of all contributors

License

Apache-2.0

Credits

Logo icon made by Smashicons from www.flaticon.com, licensed under CC 3.0 BY.