Basehttps://www.researchgate.net/profile/Rajah_Iyer
Stars: ✭ 48 (-62.5%)
SparklintA tool for monitoring and tuning Spark jobs for efficiency.
Stars: ✭ 316 (+146.88%)
ElephasDistributed Deep learning with Keras & Spark
Stars: ✭ 1,521 (+1088.28%)
CookFair job scheduler on Kubernetes and Mesos for batch workloads and Spark
Stars: ✭ 314 (+145.31%)
Hadoop BookExample source code accompanying O'Reilly's "Hadoop: The Definitive Guide" by Tom White
Stars: ✭ 3,317 (+2491.41%)
Ammonite SparkRun spark calculations from Ammonite
Stars: ✭ 88 (-31.25%)
Learningsparkv2This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Stars: ✭ 307 (+139.84%)
MoosefsMooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (+700.78%)
Griffon VmGriffon Data Science Virtual Machine
Stars: ✭ 128 (+0%)
DynamometerA tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Stars: ✭ 122 (-4.69%)
AsakusafwAsakusa Framework
Stars: ✭ 114 (-10.94%)
SrcA light-weight distributed stream computing framework for Golang
Stars: ✭ 67 (-47.66%)
Dev SetupmacOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.
Stars: ✭ 5,590 (+4267.19%)
Awesome AdaA curated list of awesome resources related to the Ada and SPARK programming language
Stars: ✭ 299 (+133.59%)
Spark Hbase ConnectorConnect Spark to HBase for reading and writing data with ease
Stars: ✭ 299 (+133.59%)
WhirlFast iterative local development and testing of Apache Airflow workflows
Stars: ✭ 111 (-13.28%)
BehemothBehemoth is an open source platform for large scale document analysis based on Apache Hadoop.
Stars: ✭ 286 (+123.44%)
PixiedustPython Helper library for Jupyter Notebooks
Stars: ✭ 998 (+679.69%)
Spark Druid OlapSparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform(http://bit.ly/2oBJSpP) an Integrated BI platform on Apache Spark.
Stars: ✭ 282 (+120.31%)
TrinoOfficial repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+3478.91%)
SnappydataProject SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster
Stars: ✭ 995 (+677.34%)
CloudflowCloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Stars: ✭ 278 (+117.19%)
TeddySpark Streaming监控平台,支持任务部署与告警、自启动
Stars: ✭ 120 (-6.25%)
DatavecETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (+112.5%)
Optimus🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+670.31%)
HelkThe Hunting ELK
Stars: ✭ 3,097 (+2319.53%)
FlintWebex Bot SDK for Node.js (deprecated in favor of https://github.com/webex/webex-bot-node-framework)
Stars: ✭ 85 (-33.59%)
Big Data Rosetta CodeCode snippets for solving common big data problems in various platforms. Inspired by Rosetta Code
Stars: ✭ 254 (+98.44%)
Jsr203 HadoopA Java NIO file system provider for HDFS
Stars: ✭ 35 (-72.66%)
Book本项目收藏这些年来看过或者听过的一些不错的书籍,在整理文件时看见这些,发现删掉有点可惜,放着又太浪费空间,本着分享的原则,就把它们共享出来,一方面给需要的读者提供这些书籍,另一方面也是一种像知识库的积累吧
Stars: ✭ 47 (-63.28%)
ObjinsyncContinuously synchronize directories from remote object store to local filesystem
Stars: ✭ 29 (-77.34%)
Spark Bigquery ConnectorBigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Stars: ✭ 126 (-1.56%)
ThingsboardOpen-source IoT Platform - Device management, data collection, processing and visualization.
Stars: ✭ 10,526 (+8123.44%)
Dist KerasDistributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Stars: ✭ 613 (+378.91%)
AkkeeperAn easy way to deploy your Akka services to a distributed environment.
Stars: ✭ 30 (-76.56%)
dllibdllib is a distributed deep learning library running on Apache Spark
Stars: ✭ 32 (-75%)
DatafusionDataFusion has now been donated to the Apache Arrow project
Stars: ✭ 611 (+377.34%)
pulsephData Pulse application log aggregation and monitoring
Stars: ✭ 13 (-89.84%)
SparkmagicJupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+645.31%)
dbt-on-airflowNo description or website provided.
Stars: ✭ 30 (-76.56%)
Spark FfmFFM (Field-Awared Factorization Machine) on Spark
Stars: ✭ 101 (-21.09%)
RsparklingRSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-49.22%)
Incubator DolphinschedulerApache DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful DAG visual interfaces, dedicated to solving complex job dependencies in the data pipeline and providing various types of jobs available out of box.
Stars: ✭ 6,916 (+5303.13%)
ZeppelinWeb-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+4207.03%)
Spark BigqueryGoogle BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
Stars: ✭ 65 (-49.22%)
Mongo SparkThe MongoDB Spark Connector
Stars: ✭ 588 (+359.38%)
Spark LucenerddSpark RDD with Lucene's query and entity linkage capabilities
Stars: ✭ 114 (-10.94%)
JumbuneJumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http://jumbune.com. More details of open source offering are at,
Stars: ✭ 64 (-50%)
Hadoop study定期更新Hadoop生态圈中常用大数据组件文档 重心依次为: Flink Solr Sparksql ES Scala Kafka Hbase/phoenix Redis Kerberos (项目包含hadoop思维导图 印象笔记 Scala版本简单demo 常用工具类 去敏后的train code 持续更新!!!)
Stars: ✭ 567 (+342.97%)