TransmogrifaiTransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (-19.1%)
SetlA simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-96.93%)
GeniA Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-94.1%)
MmlsparkSimple and Distributed Machine Learning
Stars: ✭ 2,899 (+12.54%)
SuccinctEnabling queries on compressed data.
Stars: ✭ 257 (-90.02%)
deepchecksTest Suites for Validating ML Models & Data. Deepchecks is a Python package for comprehensively validating your machine learning models and data with minimal effort.
Stars: ✭ 1,595 (-38.08%)
PolyaxonMachine Learning Platform for Kubernetes (MLOps tools for experimentation and automation)
Stars: ✭ 2,966 (+15.14%)
Spark AlchemyCollection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-95.26%)
Awesome MlopsA curated list of references for MLOps
Stars: ✭ 7,119 (+176.36%)
MagellanGeo Spatial Data Analytics on Spark
Stars: ✭ 507 (-80.32%)
oomstoreLightweight and Fast Feature Store Powered by Go (and Rust).
Stars: ✭ 76 (-97.05%)
SynapseMLSimple and Distributed Machine Learning
Stars: ✭ 3,355 (+30.24%)
beneathBeneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (-97.48%)
vertex-ai-samplesSample code and notebooks for Vertex AI, the end-to-end machine learning platform on Google Cloud
Stars: ✭ 270 (-89.52%)
MetorikkuA simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (-85.99%)
BigdlBuilding Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+48.02%)
Data Science Ipython NotebooksData science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+755.9%)
Spark Movie LensAn on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (-71.08%)
Just Dashboard📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (-41.34%)
AutodlAutomated Deep Learning without ANY human intervention. 1st solution for AutoDL.
Stars: ✭ 854 (-66.85%)
SparkjniA heterogeneous Apache Spark framework.
Stars: ✭ 11 (-99.57%)
WaimakWaimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-97.67%)
RsparklingRSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-97.48%)
incubator-liminalApache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.
Stars: ✭ 117 (-95.46%)
hamiltonA scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (-76.24%)
gallia-coreA schema-aware Scala library for data transformation
Stars: ✭ 44 (-98.29%)
neptune-client📒 Experiment tracking tool and model registry
Stars: ✭ 348 (-86.49%)
awesome-AI-kubernetes❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (-96.31%)
yt-channels-DS-AI-ML-CSA comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep learning, Computer Science, programming, software engineering, etc.
Stars: ✭ 1,038 (-59.7%)
Sk DistDistributed scikit-learn meta-estimators in PySpark
Stars: ✭ 260 (-89.91%)
cliPolyaxon Core Client & CLI to streamline MLOps
Stars: ✭ 18 (-99.3%)
SparklerSpark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (-85.95%)
DeltaAn open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (+51.51%)
HubDataset format for AI. Build, manage, & visualize datasets for deep learning. Stream data real-time to PyTorch/TensorFlow & version-control it. https://activeloop.ai
Stars: ✭ 4,003 (+55.4%)
Metaflow🚀 Build and manage real-life data science projects with ease!
Stars: ✭ 5,108 (+98.29%)
FeatranA Scala feature transformation library for data science and machine learning
Stars: ✭ 420 (-83.7%)
PointblankData validation and organization of metadata for data frames and database tables
Stars: ✭ 480 (-81.37%)
Hyperparameter hunterEasy hyperparameter optimization and automatic result saving across machine learning algorithms and libraries
Stars: ✭ 648 (-74.84%)
Pyspark Example ProjectExample project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (-75.43%)
Goodreads etl pipelineAn end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (-69.22%)
H2o 3H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+119.57%)
Spark TdaSparkTDA is a package for Apache Spark providing Topological Data Analysis Functionalities.
Stars: ✭ 45 (-98.25%)
SparkApache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+1127.41%)
ZeppelinWeb-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+114.01%)
CookbookThe Data Engineering Cookbook
Stars: ✭ 9,829 (+281.56%)
HomeApacheCN open-source organization: announcements, introduction, members, activities, and contact channels
Stars: ✭ 1,199 (-53.45%)
LabsResearch on distributed systems
Stars: ✭ 73 (-97.17%)
Spark Py NotebooksApache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (-48.06%)
VickyBytesSubscribe to this GitHub repo to access the latest tech talks, tech demos, learning materials & modules, and developer community updates!
Stars: ✭ 48 (-98.14%)
leetspeekOpen and collaborative content from leet hackers!
Stars: ✭ 11 (-99.57%)
SparklearningLearning Apache Spark, including code and data. Most parts can run locally.
Stars: ✭ 558 (-78.34%)