Pythondata: Repo for code published on pythondata.com
Stars: ✭ 113 (-1.74%)
K8s Ingress Claim: An admission control policy that safeguards against accidental duplicate claiming of Hosts/Domains.
Stars: ✭ 14 (-87.83%)
Dremio Oss: Dremio, the missing link in modern data
Stars: ✭ 862 (+649.57%)
Accumulo: Apache Accumulo
Stars: ✭ 857 (+645.22%)
IoT-system-PLC-data-to-InfluxDB: This project aims to provide free software that fetches data from PLCs (Siemens S7-300/400/1200/1500) and stores it. The stack used is completely open source. I used InfluxDB as the data store, so the application follows the Big Data paradigm.
Stars: ✭ 26 (-77.39%)
Dataflowjavasdk: Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (+642.61%)
bftkv: A distributed key-value store that tolerates Byzantine faults.
Stars: ✭ 27 (-76.52%)
nebula: A distributed block-based data storage and compute engine
Stars: ✭ 127 (+10.43%)
Ambari: Mirror of Apache Ambari
Stars: ✭ 1,576 (+1270.43%)
Bandar Log: Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 19 (-83.48%)
spark-connector: A connector for Apache Spark to access Exasol
Stars: ✭ 13 (-88.7%)
Sqoop: Mirror of Apache Sqoop
Stars: ✭ 817 (+610.43%)
Titanoboa: Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for the JVM: distributed, highly scalable, and fault tolerant.
Stars: ✭ 787 (+584.35%)
masc: Microsoft's contributions for Spark with Apache Accumulo
Stars: ✭ 20 (-82.61%)
DaFlow: An Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-79.13%)
Cython: The most widely used Python to C compiler
Stars: ✭ 6,588 (+5628.7%)
Samza: Mirror of Apache Samza
Stars: ✭ 676 (+487.83%)
spark-root: Apache Spark Data Source for ROOT File Format
Stars: ✭ 28 (-75.65%)
Sdc: Intel® Scalable Dataframe Compiler for Pandas*
Stars: ✭ 623 (+441.74%)
H2o 3: H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+4818.26%)
Zeppelin: Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+4693.91%)
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+2546.96%)
Scanner: Efficient video analysis at scale
Stars: ✭ 569 (+394.78%)
nebula: A distributed, fast open-source graph database featuring horizontal scalability and high availability
Stars: ✭ 8,196 (+7026.96%)
Nipype: Workflows and interfaces for neuroimaging packages
Stars: ✭ 557 (+384.35%)
Cboard: An easy-to-use, self-service, open BI reporting and BI dashboard platform.
Stars: ✭ 2,795 (+2330.43%)
ByteSlice: "Byteslice: Pushing the envelop of main memory data processing with a new storage layout" (SIGMOD'15)
Stars: ✭ 24 (-79.13%)
Genie: Distributed Big Data Orchestration Service
Stars: ✭ 1,544 (+1242.61%)
Beam: Apache Beam is a unified programming model for Batch and Streaming
Stars: ✭ 5,149 (+4377.39%)
Hyperspace: An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Stars: ✭ 246 (+113.91%)
Magellan: Geospatial Data Analytics on Spark
Stars: ✭ 507 (+340.87%)
Stream Framework: Stream Framework is a Python library that allows you to build news feeds, activity streams, and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology:
Stars: ✭ 4,576 (+3879.13%)
Trafodion: Apache Trafodion
Stars: ✭ 242 (+110.43%)
Redislite: Redis in a Python module.
Stars: ✭ 464 (+303.48%)
falcon: Mirror of Apache Falcon
Stars: ✭ 95 (-17.39%)
Courses: Quizzes and assignments from Coursera courses
Stars: ✭ 454 (+294.78%)
Selinon: Advanced distributed task flow management on top of Celery
Stars: ✭ 237 (+106.09%)
Data Science Ipython Notebooks: Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+19072.17%)
Cortx: CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Stars: ✭ 426 (+270.43%)
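Books: A curated collection of books covering C & C++, Git, Java, Keras, Linux, NLP, Python, Scala, TensorFlow, big data, recommender systems, databases, data mining, machine learning, deep learning, algorithms, and more.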
Stars: ✭ 222 (+93.04%)
Datascience Ai Machinelearning Resources: Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.
Stars: ✭ 414 (+260%)
Bigdataclass: Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-4.35%)
spark: Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end is now on https://github.com/apache/spark/
Stars: ✭ 609 (+429.57%)
beekeeper: Service for automatically managing and cleaning up unreferenced data
Stars: ✭ 43 (-62.61%)
Spark R Notebooks: R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython/Jupyter notebooks
Stars: ✭ 109 (-5.22%)
Tennis Crystal Ball: Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-6.96%)
merkle-db: High-scalability analytics database built on immutable Merkle trees
Stars: ✭ 44 (-61.74%)