Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary size. Can be used in languages other than English.

Stars: ✭ 115 (-58.03%)

Mutual labels: pyspark

mapreduce-examples

A collection of mapreduce problems and solutions

Stars: ✭ 23 (-91.61%)

Mutual labels: mapreduce

Federated-Learning-and-Split-Learning-with-raspberry-pi

SRDS 2020: End-to-End Evaluation of Federated Learning and Split Learning for Internet of Things

Stars: ✭ 54 (-80.29%)

Mutual labels: distributed-computing

interbit

To the end of servers

Stars: ✭ 23 (-91.61%)

Mutual labels: distributed-computing

good-karma-kit

😇 A Docker Compose bundle to run on servers with spare CPU, RAM, disk, and bandwidth to help the world. Includes Tor, ArchiveWarrior, BOINC, and more...

Stars: ✭ 238 (-13.14%)

Mutual labels: distributed-computing

pyspark-cheatsheet

PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster

Stars: ✭ 115 (-58.03%)

Mutual labels: pyspark

incubator-linkis

Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,459 (+797.45%)

Mutual labels: pyspark

connected-component

Map Reduce Implementation of Connected Component on Apache Spark

Stars: ✭ 68 (-75.18%)

Mutual labels: mapreduce

jobAnalytics and search

JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.

Stars: ✭ 25 (-90.88%)

Mutual labels: pyspark

tasq

A simple task queue implementation to enqueue jobs on local or remote processes.

Stars: ✭ 83 (-69.71%)

Mutual labels: distributed-computing

dask-pytorch-ddp

dask-pytorch-ddp is a Python package that makes it easy to train PyTorch models on dask clusters using distributed data parallel.

Stars: ✭ 50 (-81.75%)

Mutual labels: distributed-computing

kar

KAR: A Runtime for the Hybrid Cloud

Stars: ✭ 18 (-93.43%)

Mutual labels: distributed-computing

databricks-notebooks

Collection of Databricks and Jupyter Notebooks

Stars: ✭ 19 (-93.07%)

Mutual labels: pyspark

swarm-learning

A simplified library for decentralized, privacy preserving machine learning

Stars: ✭ 142 (-48.18%)

Mutual labels: distributed-computing

python mozetl

ETL jobs for Firefox Telemetry

Stars: ✭ 25 (-90.88%)

Mutual labels: pyspark

kafka-compose

🎼 Docker compose files for various kafka stacks

Stars: ✭ 32 (-88.32%)

Mutual labels: pyspark

tutorial

Tutorials to help you build your first Swim app

Stars: ✭ 27 (-90.15%)

Mutual labels: distributed-computing

lineage

Generate beautiful documentation for your data pipelines in markdown format

Stars: ✭ 16 (-94.16%)

Mutual labels: pyspark

pycondor

Build and submit workflows to HTCondor in Python

Stars: ✭ 23 (-91.61%)

Mutual labels: distributed-computing

Charm4py

Parallel Programming with Python and Charm++

Stars: ✭ 259 (-5.47%)

Mutual labels: distributed-computing

machinaris

An easy-to-use WebUI for crypto plotting and farming. Offers Plotman, MadMax, Chiadog, Bladebit, Farmr, and Forktools in a Docker container. Supports Chia, MMX, Chives, Flax, HDDCoin, and BPX among others.

Stars: ✭ 324 (+18.25%)

Mutual labels: distributed-computing

GooglePlay-Web-Crawler

MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, Hive

Stars: ✭ 18 (-93.43%)

Mutual labels: mapreduce

optimism-v2

ARCHIVE of monorepo implementing Boba, an L2 Compute solution built on Optimistic Ethereum - active repo is at https://github.com/bobanetwork/boba

Stars: ✭ 34 (-87.59%)

Mutual labels: distributed-computing

interview-refresh-java-bigdata

A one-stop repo for looking up code snippets on core Java concepts, SQL, data structures, and big data. It also includes interview questions asked in real life.

Stars: ✭ 25 (-90.88%)

Mutual labels: mapreduce

raven-distribution-framework

Decentralized Computing Backend for Artificial Intelligence, Web3, Metaverse, and Gaming Application

Stars: ✭ 31 (-88.69%)

Mutual labels: distributed-computing

mapreduce

An in-process MapReduce library to help you optimize service response time or concurrent task processing.

Stars: ✭ 93 (-66.06%)

Mutual labels: mapreduce

rce

Distributed, workflow-driven integration environment

Stars: ✭ 42 (-84.67%)

Mutual labels: distributed-computing

SadlyDistributed

Distributing your code(soul), in almost any language(state), among a cluster of idle browsers(voids)

Stars: ✭ 20 (-92.7%)

Mutual labels: distributed-computing

Sparkora

Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟

Stars: ✭ 51 (-81.39%)

Mutual labels: pyspark

QCFractal

A distributed compute and database platform for quantum chemistry.

Stars: ✭ 107 (-60.95%)

Mutual labels: distributed-computing

oshinko-s2i

This is a place to put s2i images and utilities for spark application builders for openshift

Stars: ✭ 16 (-94.16%)

Mutual labels: pyspark

PFL-Non-IID

The origin of the Non-IID phenomenon is the personalization of users, who generate the Non-IID data. With Non-IID (Not Independent and Identically Distributed) issues existing in the federated learning setting, a myriad of approaches has been proposed to crack this hard nut. In contrast, the personalized federated learning may take the advantage…

Stars: ✭ 58 (-78.83%)

Mutual labels: distributed-computing

protoactor-python

Proto Actor - Ultra fast distributed actors

Stars: ✭ 78 (-71.53%)

Mutual labels: distributed-computing

distex

Distributed process pool for Python

Stars: ✭ 101 (-63.14%)

Mutual labels: distributed-computing

qs-hadoop

Learning the big data ecosystem

Stars: ✭ 18 (-93.43%)

Mutual labels: mapreduce

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (-90.88%)

Mutual labels: pyspark

flask-spark-docker

Just a boilerplate for PySpark and Flask

Stars: ✭ 32 (-88.32%)

Mutual labels: pyspark

IoTPy

Python for streams

Stars: ✭ 24 (-91.24%)

Mutual labels: distributed-computing

microcore

.NET Core framework for inter-service communication

Stars: ✭ 24 (-91.24%)

Mutual labels: distributed-computing

realtimemap-dotnet

A showcase for Proto.Actor - an ultra-fast distributed actors solution for Go, C#, and Java/Kotlin.

Stars: ✭ 47 (-82.85%)

Mutual labels: distributed-computing

python-json-socket

JSON messaging based socket interface with multi-threaded server and client

Stars: ✭ 52 (-81.02%)

Mutual labels: distributed-computing

Prime95

Prime95 source code from GIMPS to find Mersenne Prime.

Stars: ✭ 25 (-90.88%)

Mutual labels: distributed-computing

OSCI

Open Source Contributor Index

Stars: ✭ 107 (-60.95%)

Mutual labels: pyspark

pyspark-asyncactions

Asynchronous actions for PySpark

Stars: ✭ 30 (-89.05%)

Mutual labels: pyspark

blockchain-reading-list

A reading list on blockchain and related technologies, targeted at technical people who want a deep understanding of those topics.

Stars: ✭ 93 (-66.06%)

Mutual labels: distributed-computing

Spark-for-data-engineers

Apache Spark for data engineers

Stars: ✭ 22 (-91.97%)

Mutual labels: pyspark

check-engine

Data validation library for PySpark 3.0.0

Stars: ✭ 29 (-89.42%)

Mutual labels: pyspark

protoactor-go

Proto Actor - Ultra fast distributed actors for Go, C# and Java/Kotlin

Stars: ✭ 4,138 (+1410.22%)

Mutual labels: distributed-computing

Gleam

Fast, efficient, and scalable distributed map/reduce system, DAG execution, in memory or on disk, written in pure Go, runs standalone or distributedly.

Stars: ✭ 2,949 (+976.28%)

Mutual labels: distributed-computing

data-parallelism

juliafolds.github.io/data-parallelism/

Stars: ✭ 22 (-91.97%)

Mutual labels: distributed-computing

mmtf-workshop-2018

Structural Bioinformatics Training Workshop & Hackathon 2018

Stars: ✭ 50 (-81.75%)

Mutual labels: pyspark

61-120 of 308 similar projects