Pyspark Example ProjectExample project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+233.16%)
AbrisAvro SerDe for Apache Spark structured APIs.
Stars: ✭ 130 (-31.58%)
Spark Nlp ModelsModels and Pipelines for the Spark NLP library
Stars: ✭ 88 (-53.68%)
Dev SetupmacOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.
Stars: ✭ 5,590 (+2842.11%)
MilvusAn open-source vector database for embedding similarity search and AI applications.
Stars: ✭ 9,015 (+4644.74%)
SmileStatistical Machine Intelligence & Learning Engine
Stars: ✭ 5,412 (+2748.42%)
Mongo SparkThe MongoDB Spark Connector
Stars: ✭ 588 (+209.47%)
OpaqueAn encrypted data analytics platform
Stars: ✭ 129 (-32.11%)
SparklearningLearning Apache spark,including code and data .Most part can run local.
Stars: ✭ 558 (+193.68%)
FlintWebex Bot SDK for Node.js (deprecated in favor of https://github.com/webex/webex-bot-node-framework)
Stars: ✭ 85 (-55.26%)
Spark DariaEssential Spark extensions and helper methods ✨😲
Stars: ✭ 553 (+191.05%)
SparkmonitorMonitor Apache Spark from Jupyter Notebook
Stars: ✭ 154 (-18.95%)
H2o 3H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+2876.84%)
Spark StatesCustom state store providers for Apache Spark
Stars: ✭ 83 (-56.32%)
CdapAn open source framework for building data analytic applications.
Stars: ✭ 509 (+167.89%)
Spark.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+805.79%)
PointblankData validation and organization of metadata for data frames and database tables
Stars: ✭ 480 (+152.63%)
SparkCross-platform real-time collaboration client optimized for business and organizations.
Stars: ✭ 471 (+147.89%)
Bdp Dataplatform大数据生态解决方案数据平台:基于大数据、数据平台、微服务、机器学习、商城、自动化运维、DevOps、容器部署平台、数据平台采集、数据平台存储、数据平台计算、数据平台开发、数据平台应用搭建的大数据解决方案。
Stars: ✭ 456 (+140%)
LeharVisualize data using relative ordering
Stars: ✭ 81 (-57.37%)
Bigdataie大数据博客、笔试题、教程、项目、面经的整理
Stars: ✭ 445 (+134.21%)
Airflow PipelineAn Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Stars: ✭ 128 (-32.63%)
SetlA simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-58.42%)
Dji Firmware ToolsTools for handling firmwares of DJI products, with focus on quadcopters.
Stars: ✭ 424 (+123.16%)
Spark.jlJulia binding for Apache Spark
Stars: ✭ 153 (-19.47%)
FeatranA Scala feature transformation library for data science and machine learning
Stars: ✭ 420 (+121.05%)
HomeApacheCN 开源组织:公告、介绍、成员、活动、交流方式
Stars: ✭ 1,199 (+531.05%)
Spring Boot Quick🌿 基于springboot的快速学习示例,整合自己遇到的开源框架,如:rabbitmq(延迟队列)、Kafka、jpa、redies、oauth2、swagger、jsp、docker、spring-batch、异常处理、日志输出、多模块开发、多环境打包、缓存cache、爬虫、jwt、GraphQL、dubbo、zookeeper和Async等等📌
Stars: ✭ 1,819 (+857.37%)
Agile data code 2Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+117.37%)
Ganngann(go-approximate-nearest-neighbor) is a library for Approximate Nearest Neighbor Search written in Go
Stars: ✭ 75 (-60.53%)
MarmarayGeneric Data Ingestion & Dispersal Library for Hadoop
Stars: ✭ 414 (+117.89%)
XsqlUnified SQL Analytics Engine Based on SparkSQL
Stars: ✭ 176 (-7.37%)
Devops Python Tools80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+113.68%)
Ds CheatsheetsList of Data Science Cheatsheets to rule the world
Stars: ✭ 9,452 (+4874.74%)
LiftThe LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness in large scale machine learning workflows.
Stars: ✭ 127 (-33.16%)
RedashMake Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Stars: ✭ 20,147 (+10503.68%)
Apache Spark Hands OnEducational notes,Hands on problems w/ solutions for hadoop ecosystem
Stars: ✭ 74 (-61.05%)
BigdlBuilding Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+1906.84%)
Spark TsneDistributed t-SNE via Apache Spark
Stars: ✭ 151 (-20.53%)
TensorflowonsparkTensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Stars: ✭ 3,748 (+1872.63%)
HadoopcryptoledgerHadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (-33.68%)
Kamu CliNext generation tool for decentralized exchange and transformation of semi-structured data
Stars: ✭ 69 (-63.68%)
Js SparkRealtime calculation distributed system. AKA distributed lodash
Stars: ✭ 187 (-1.58%)
Kotlin Spark ApiThis projects gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Stars: ✭ 183 (-3.68%)
Sparkstreaming💥 🚀 封装sparkstreaming动态调节batch time(有数据就执行计算);🚀 支持运行过程中增删topic;🚀 封装sparkstreaming 1.6 - kafka 010 用以支持 SSL。
Stars: ✭ 179 (-5.79%)
TransmogrifaiTransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (+996.84%)
Knn MattingSource Code for KNN Matting, CVPR 2012 / TPAMI 2013. MATLAB code ready to run. Simple and robust implementation under 40 lines.
Stars: ✭ 130 (-31.58%)
DatafusionDataFusion has now been donated to the Apache Arrow project
Stars: ✭ 611 (+221.58%)