Lpa Detector
Optimize and improve the label propagation algorithm
Stars: ✭ 75 (-29.91%)
Kontextfrei
Writing application logic for Spark jobs that can be unit-tested without a SparkContext
Stars: ✭ 67 (-37.38%)
Flint
Webex Bot SDK for Node.js (deprecated in favor of https://github.com/webex/webex-bot-node-framework)
Stars: ✭ 85 (-20.56%)
Cleanframes
Type-class based data cleansing library for Apache Spark SQL
Stars: ✭ 75 (-29.91%)
Roffildlibrary
Library for MQL5 (MetaTrader) with Python, Java, Apache Spark, AWS
Stars: ✭ 63 (-41.12%)
Spark Nlp Models
Models and Pipelines for the Spark NLP library
Stars: ✭ 88 (-17.76%)
Luigi Warehouse
A Luigi-powered analytics / warehouse stack
Stars: ✭ 72 (-32.71%)
W2v
Word2Vec models with Twitter data using Spark. Blog:
Stars: ✭ 64 (-40.19%)
Home
ApacheCN open source organization: announcements, introduction, members, activities, and contact channels
Stars: ✭ 1,199 (+1020.56%)
Zemberek Nlp Server
REST Docker server built on the Zemberek Turkish NLP Java library
Stars: ✭ 60 (-43.93%)
Dataspherestudio
DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+1016.82%)
Logisland
Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources, and sinks is available.
Stars: ✭ 97 (-9.35%)
Labs
Research on distributed systems
Stars: ✭ 73 (-31.78%)
Spark Ffm
FFM (Field-Aware Factorization Machine) on Spark
Stars: ✭ 101 (-5.61%)
Rsparkling
RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-39.25%)
Spark States
Custom state store providers for Apache Spark
Stars: ✭ 83 (-22.43%)
Pysparkgeoanalysis
🌐 Interactive Workshop on GeoAnalysis using PySpark
Stars: ✭ 63 (-41.12%)
Repository
Personal study knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-14.02%)
Waimak
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-43.93%)
Lehar
Visualize data using relative ordering
Stars: ✭ 81 (-24.3%)
Docker Spark
🚢 Docker image for Apache Spark
Stars: ✭ 78 (-27.1%)
Pyspark Examples
Code examples on Apache Spark using Python
Stars: ✭ 58 (-45.79%)
Almond
A Scala kernel for Jupyter
Stars: ✭ 1,354 (+1165.42%)
Ds Cheatsheets
List of Data Science Cheatsheets to rule the world
Stars: ✭ 9,452 (+8733.64%)
Ammonite Spark
Run Spark calculations from Ammonite
Stars: ✭ 88 (-17.76%)
Apache Spark Hands On
Educational notes and hands-on problems with solutions for the Hadoop ecosystem
Stars: ✭ 74 (-30.84%)
Kamu Cli
Next-generation tool for decentralized exchange and transformation of semi-structured data
Stars: ✭ 69 (-35.51%)
Schemer
Schema registry for CSV, TSV, JSON, Avro, and Parquet schemas. Supports schema inference and a GraphQL API.
Stars: ✭ 97 (-9.35%)
Cuesheet
A framework for writing Spark 2.x applications in a pretty way
Stars: ✭ 86 (-19.63%)
Fast Mrmr
An improved implementation of the classical feature selection method: minimum Redundancy and Maximum Relevance (mRMR).
Stars: ✭ 67 (-37.38%)
Sparktutorial
Source code for James Lee's Apache Spark with Java course
Stars: ✭ 105 (-1.87%)
Thingsboard
Open-source IoT Platform - Device management, data collection, processing and visualization.
Stars: ✭ 10,526 (+9737.38%)
Hops Examples
Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops
Stars: ✭ 84 (-21.5%)
Spark Bigquery
Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
Stars: ✭ 65 (-39.25%)
Spark Py Notebooks
Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+1150.47%)
Hadoop cookbook
Cookbook to install Hadoop 2.0+ using Chef
Stars: ✭ 82 (-23.36%)
Silex
Something to help you spark
Stars: ✭ 61 (-42.99%)
Mleap
MLeap: Deploy ML Pipelines to Production
Stars: ✭ 1,232 (+1051.4%)
Spark Gbtlr
Hybrid model of Gradient Boosting Trees and Logistic Regression (GBDT+LR) on Spark
Stars: ✭ 81 (-24.3%)
Spark On K8s Operator
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Stars: ✭ 1,780 (+1563.55%)
Splash
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Stars: ✭ 105 (-1.87%)
Big Data
🔧 Use dplyr to analyze Big Data 🐘
Stars: ✭ 93 (-13.08%)
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-26.17%)