Spark With Python: Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-71.59%)
Hazelcast: Open-source distributed computation and storage platform
Stars: ✭ 4,662 (+782.95%)
pyspark-algorithms: PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (-86.36%)
Selinon: An advanced distributed task flow management on top of Celery
Stars: ✭ 237 (-55.11%)
Moosefs: MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (+94.13%)
Metorikku: A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (-31.63%)
Nakedtensor: Bare bone examples of machine learning in TensorFlow
Stars: ✭ 2,443 (+362.69%)
Geni: A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-71.21%)
dislib: The Distributed Computing library for python implemented using PyCOMPSs programming model for HPC.
Stars: ✭ 39 (-92.61%)
nebula: A distributed block-based data storage and compute engine
Stars: ✭ 127 (-75.95%)
Sylph: Stream computing platform for bigdata
Stars: ✭ 362 (-31.44%)
Protoactor Go: Proto Actor - Ultra fast distributed actors for Go, C# and Java/Kotlin
Stars: ✭ 3,934 (+645.08%)
Circosjs: d3 library to build circular graphs
Stars: ✭ 436 (-17.42%)
Stream Framework: Stream Framework is a Python library, which allows you to build news feed, activity streams and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology:
Stars: ✭ 4,576 (+766.67%)
Bigtop: Mirror of Apache Bigtop
Stars: ✭ 356 (-32.58%)
Devops Roadmap: DevOps methodology & roadmap for a devops developer in 2019. Interesting books to learn new technologies.
Stars: ✭ 349 (-33.9%)
Stroom: Stroom is a highly scalable data storage, processing and analysis platform.
Stars: ✭ 344 (-34.85%)
Opendata.cern.ch: Source code for the CERN Open Data portal
Stars: ✭ 411 (-22.16%)
Ozone: Scalable, redundant, and distributed object store for Apache Hadoop
Stars: ✭ 330 (-37.5%)
Paracel: Distributed training framework with parameter server
Stars: ✭ 335 (-36.55%)
Onlinestats.jl: Single-pass algorithms for statistics
Stars: ✭ 507 (-3.98%)
Easylambda: distributed dataflows with functional list operations for data processing with C++14
Stars: ✭ 475 (-10.04%)
Mockneat: MockNeat is a Java 8+ library that facilitates the generation of arbitrary data for your applications.
Stars: ✭ 410 (-22.35%)
Platon Go: Golang implementation of the PlatON protocol
Stars: ✭ 331 (-37.31%)
Datafuse: Datafuse is a free Cloud-Native Analytics DBMS (inspired by ClickHouse) implemented in Rust
Stars: ✭ 327 (-38.07%)
Couler: Unified Interface for Constructing and Managing Workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.
Stars: ✭ 405 (-23.3%)
Tez: Apache Tez
Stars: ✭ 313 (-40.72%)
Data Science Ipython Notebooks: Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+4075.76%)
Sparkler: Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (-31.44%)
Pgm Index: 🏅 State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes
Stars: ✭ 499 (-5.49%)
Diplomat: An HTTP Ruby API for Consul
Stars: ✭ 358 (-32.2%)
Cortx: CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Stars: ✭ 426 (-19.32%)
Vespa: The open big data serving engine. https://vespa.ai
Stars: ✭ 3,747 (+609.66%)
Datascience Ai Machinelearning Resources: Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.
Stars: ✭ 414 (-21.59%)
Fit Sne: Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)
Stars: ✭ 485 (-8.14%)
Grouparoo: 🦘 The Grouparoo Monorepo - open source customer data sync framework
Stars: ✭ 334 (-36.74%)
Cogcomp Nlp: CogComp's Natural Language Processing libraries and Demos:
Stars: ✭ 410 (-22.35%)
Arkime: Arkime (formerly Moloch) is an open source, large scale, full packet capturing, indexing, and database system.
Stars: ✭ 4,994 (+845.83%)
Sleuth: A Go library for master-less peer-to-peer autodiscovery and RPC between HTTP services
Stars: ✭ 331 (-37.31%)
Decentralized Internet: An SDK/library for decentralized web and distributed computing projects
Stars: ✭ 406 (-23.11%)
Fishnet: Distributed Stockfish analysis for lichess.org
Stars: ✭ 306 (-42.05%)
Redislite: Redis in a python module.
Stars: ✭ 464 (-12.12%)
Awesome Federated Computing: 📚 👓 A collection of research papers, codes, tutorials and blogs on Federated Computing/Learning.
Stars: ✭ 314 (-40.53%)
Uproot3: ROOT I/O in pure Python and NumPy.
Stars: ✭ 312 (-40.91%)
Magellan: Geo Spatial Data Analytics on Spark
Stars: ✭ 507 (-3.98%)
Delta: An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (+639.2%)
Orc: Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
Stars: ✭ 389 (-26.33%)
Mist: Serverless proxy for Spark cluster
Stars: ✭ 309 (-41.48%)
Fluid: Fluid, elastic data abstraction and acceleration for BigData/AI applications in cloud
Stars: ✭ 265 (-49.81%)
Ignite: Apache Ignite
Stars: ✭ 4,027 (+662.69%)
Helix: Mirror of Apache Helix
Stars: ✭ 304 (-42.42%)
Morpheus: Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Stars: ✭ 303 (-42.61%)
Courses: Quiz & Assignment of Coursera
Stars: ✭ 454 (-14.02%)
Bigdl: Building Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+622.16%)