DatafusionDataFusion has now been donated to the Apache Arrow project
Stars: ✭ 611 (+198.05%)
CrayonSimple framework agnostic UI router for SPAs
Stars: ✭ 310 (+51.22%)
Movie recommend基于Spark的电影推荐系统,包含爬虫项目、web网站、后台管理系统以及spark推荐系统
Stars: ✭ 2,092 (+920.49%)
SplineData Lineage Tracking And Visualization Solution
Stars: ✭ 306 (+49.27%)
KontextfreiWriting application logic for Spark jobs that can be unit-tested without a SparkContext
Stars: ✭ 67 (-67.32%)
Awesome AdaA curated list of awesome resources related to the Ada and SPARK programming language
Stars: ✭ 299 (+45.85%)
HadoopcryptoledgerHadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (-38.54%)
Spark Hbase ConnectorConnect Spark to HBase for reading and writing data with ease
Stars: ✭ 299 (+45.85%)
RsparklingRSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-68.29%)
Spark Druid OlapSparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform(http://bit.ly/2oBJSpP) an Integrated BI platform on Apache Spark.
Stars: ✭ 282 (+37.56%)
Spark PracticeApache Spark (PySpark) Practice on Real Data
Stars: ✭ 200 (-2.44%)
Hbase RddSpark RDD to read, write and delete from HBase
Stars: ✭ 277 (+35.12%)
W2vWord2Vec models with Twitter data using Spark. Blog:
Stars: ✭ 64 (-68.78%)
HelkThe Hunting ELK
Stars: ✭ 3,097 (+1410.73%)
Scala SamplesThere are pieces of scala code that explain Scala syntax and related things - like what you can do with all this
Stars: ✭ 125 (-39.02%)
Pysparkgeoanalysis🌐 Interactive Workshop on GeoAnalysis using PySpark
Stars: ✭ 63 (-69.27%)
Spark Jupyter AwsA guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support
Stars: ✭ 259 (+26.34%)
Spark.jlJulia binding for Apache Spark
Stars: ✭ 153 (-25.37%)
Big Data Rosetta CodeCode snippets for solving common big data problems in various platforms. Inspired by Rosetta Code
Stars: ✭ 254 (+23.9%)
RoffildlibraryLibrary for MQL5 (MetaTrader) with Python, Java, Apache Spark, AWS
Stars: ✭ 63 (-69.27%)
Spark AlchemyCollection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-40.49%)
Book本项目收藏这些年来看过或者听过的一些不错的书籍,在整理文件时看见这些,发现删掉有点可惜,放着又太浪费空间,本着分享的原则,就把它们共享出来,一方面给需要的读者提供这些书籍,另一方面也是一种像知识库的积累吧
Stars: ✭ 47 (-77.07%)
WaimakWaimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-70.73%)
bandar-logMonitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 20 (-90.24%)
Spark NlpState of the Art Natural Language Processing
Stars: ✭ 2,518 (+1128.29%)
basinBasin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-87.8%)
Zemberek Nlp ServerZemberek Türkçe NLP Java Kütüphanesi üzerine REST Docker Sunucu
Stars: ✭ 60 (-70.73%)
ZparkioBoiler plate framework to use Spark and ZIO together.
Stars: ✭ 121 (-40.98%)
Rumble⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (-71.71%)
prostoProsto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
Stars: ✭ 54 (-73.66%)
StreamlineStreamLine - Streaming Analytics
Stars: ✭ 151 (-26.34%)
confluent-spark-avroSpark UDFs to deserialize Avro messages with schemas stored in Schema Registry.
Stars: ✭ 18 (-91.22%)
blogblog entries
Stars: ✭ 39 (-80.98%)
spark-extensionA library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-87.8%)
Kotlin Spark ApiThis projects gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Stars: ✭ 183 (-10.73%)
CasperA compiler for automatically re-targeting sequential Java code to Apache Spark.
Stars: ✭ 45 (-78.05%)
Docker HadoopA Docker container with a full Hadoop cluster setup with Spark and Zeppelin
Stars: ✭ 54 (-73.66%)
ZeppelinWeb-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+2589.27%)
Mongo SparkThe MongoDB Spark Connector
Stars: ✭ 588 (+186.83%)
autThe Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-45.85%)
Spark Submit UiThis is a based on playframwork for submit spark app
Stars: ✭ 53 (-74.15%)
incubator-linkisLinkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+1099.51%)
AlluxioAlluxio, data orchestration for analytics and machine learning in the cloud
Stars: ✭ 5,379 (+2523.9%)
Spark NkpNatural Korean Processor for Apache Spark
Stars: ✭ 50 (-75.61%)
Spark Knnk-Nearest Neighbors algorithm on Spark
Stars: ✭ 205 (+0%)
MmlsparkSimple and Distributed Machine Learning
Stars: ✭ 2,899 (+1314.15%)
Js SparkRealtime calculation distributed system. AKA distributed lodash
Stars: ✭ 187 (-8.78%)
XsqlUnified SQL Analytics Engine Based on SparkSQL
Stars: ✭ 176 (-14.15%)
Vue Info CardSimple and beautiful card component with an elegant spark line, for VueJS.
Stars: ✭ 159 (-22.44%)
Apache Spark NodeNode.js bindings for Apache Spark DataFrame APIs
Stars: ✭ 136 (-33.66%)