Awesome Pulsar: A curated list of Pulsar tools, integrations and resources.
Stars: ✭ 57 (-73.49%)
Countly Sdk Cordova: Countly Product Analytics SDK for Cordova, Icenium and Phonegap
Stars: ✭ 69 (-67.91%)
Labs: Research on distributed systems
Stars: ✭ 73 (-66.05%)
Presto: The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+5926.51%)
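v6.dooring.public: A large-screen data visualization solution that provides a visual editing engine, helping individuals and companies easily build their own large-screen visualization applications.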
Stars: ✭ 323 (+50.23%)
swordfish: Open-source distributed workflow scheduling tool that also supports streaming tasks.
Stars: ✭ 35 (-83.72%)
fastdata-cluster: Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-90.7%)
Sparkling Graph: SparklingGraph provides an easy-to-use set of features that let you process large-scale graphs using Spark and GraphX.
Stars: ✭ 139 (-35.35%)
Docker Spark: 🚢 Docker image for Apache Spark
Stars: ✭ 78 (-63.72%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-63.26%)
spark-util: Low-level helpers for Apache Spark libraries and tests
Stars: ✭ 16 (-92.56%)
flokkr: Documentation placeholder and utilities for all the other containers.
Stars: ✭ 30 (-86.05%)
Hadoop cookbook: Cookbook to install Hadoop 2.0+ using Chef
Stars: ✭ 82 (-61.86%)
Uproot4: ROOT I/O in pure Python and NumPy.
Stars: ✭ 80 (-62.79%)
Kotlin Spark Api: This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x.
Stars: ✭ 183 (-14.88%)
Succinct: Enabling queries on compressed data.
Stars: ✭ 257 (+19.53%)
Big Data Rosetta Code: Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code.
Stars: ✭ 254 (+18.14%)
Elasticluster: Create clusters of VMs on the cloud and configure them with Ansible.
Stars: ✭ 298 (+38.6%)
Spark Notebook: Interactive and Reactive Data Science using Scala and Spark.
Stars: ✭ 3,081 (+1333.02%)
Uproot3: ROOT I/O in pure Python and NumPy.
Stars: ✭ 312 (+45.12%)
Learningsparkv2: This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Stars: ✭ 307 (+42.79%)
Wirbelsturm: Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Stars: ✭ 332 (+54.42%)
Delta: An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (+1715.35%)
Metorikku: A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+67.91%)
Sparkler: Spark-Crawler, an Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (+68.37%)
SynapseML: Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+1460.47%)
Iceberg: Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (+82.79%)
Cuesheet: A framework for writing Spark 2.x applications in a pretty way
Stars: ✭ 86 (-60%)
Repository: A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-57.21%)
Devops Python Tools: 80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+88.84%)
Orc: Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
Stars: ✭ 389 (+80.93%)
Sparkle: Haskell on Apache Spark.
Stars: ✭ 419 (+94.88%)
Cleanframes: Type-class based data cleansing library for Apache Spark SQL
Stars: ✭ 75 (-65.12%)
Spark States: Custom state store providers for Apache Spark
Stars: ✭ 83 (-61.4%)
Logisland: Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources and sinks is available.
Stars: ✭ 97 (-54.88%)
Magellan: Geo Spatial Data Analytics on Spark
Stars: ✭ 507 (+135.81%)
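Pdf: Programming e-books and programming books, covering C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithm series, computing, design patterns, software testing, refactoring and optimization, and more categories.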
Stars: ✭ 12,009 (+5485.58%)
Drill: Apache Drill is a distributed MPP query layer for self-describing data
Stars: ✭ 1,619 (+653.02%)
Dist Keras: Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Stars: ✭ 613 (+185.12%)
Spark On K8s Operator: Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Stars: ✭ 1,780 (+727.91%)
Sparktutorial: Source code for James Lee's Apache Spark with Java course
Stars: ✭ 105 (-51.16%)
Tennis Crystal Ball: Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-50.23%)
Goodreads etl pipeline: An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+268.84%)
Spark R Notebooks: R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-49.3%)
Bigdataclass: Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-48.84%)
Genie: Distributed Big Data Orchestration Service
Stars: ✭ 1,544 (+618.14%)
Lambda Arch: Applying Lambda Architecture with Spark, Kafka, and Cassandra.
Stars: ✭ 111 (-48.37%)
Xlearning Xdml: Extremely distributed machine learning
Stars: ✭ 113 (-47.44%)
DaFlow: Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-88.84%)
Data Algorithms Book: MapReduce, Spark, Java, and Scala for Data Algorithms Book
Stars: ✭ 949 (+341.4%)
Dataspherestudio: DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+455.81%)