Sparkjni: A heterogeneous Apache Spark framework.
Stars: ✭ 11 (-90.09%)
Hdfs Shell: HDFS Shell is an HDFS manipulation tool for working with functions integrated in Hadoop DFS
Stars: ✭ 117 (+5.41%)
Drill: Apache Drill is a distributed MPP query layer for self-describing data
Stars: ✭ 1,619 (+1358.56%)
fastdata-cluster: Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-81.98%)
Delta: An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (+3416.22%)
Ozone: Scalable, redundant, and distributed object store for Apache Hadoop
Stars: ✭ 330 (+197.3%)
Ignite: Apache Ignite
Stars: ✭ 4,027 (+3527.93%)
Hive: Apache Hive
Stars: ✭ 4,031 (+3531.53%)
Metorikku: A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+225.23%)
Zeppelin: Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+4866.67%)
Hadoop For Geoevent: ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-95.5%)
Spark.jl: Julia binding for Apache Spark
Stars: ✭ 153 (+37.84%)
Moosefs: MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (+823.42%)
Presto: The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+11572.97%)
Geopyspark: GeoTrellis for PySpark
Stars: ✭ 167 (+50.45%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-28.83%)
Bitcoin Value Predictor: [NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (-18.02%)
Spark: Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+28384.68%)
Bigdataclass: Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-0.9%)
Awkward 0.x: Manipulate arrays of complex data structures as easily as NumPy.
Stars: ✭ 216 (+94.59%)
big-data-lite: Samples for the Oracle Big Data Lite VM
Stars: ✭ 41 (-63.06%)
Logisland: Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources and sinks is available.
Stars: ✭ 97 (-12.61%)
spark-acid: ACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (-18.02%)
Hyperspace: An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Stars: ✭ 246 (+121.62%)
spark3D: Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
Stars: ✭ 23 (-79.28%)
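v6.dooring.public: A large-screen data visualization (dashboard) solution that provides a visual editing engine, helping individuals and companies easily build their own custom large-screen visualization applications.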
Stars: ✭ 323 (+190.99%)
Parquetviewer: Simple Windows desktop application for viewing & querying Apache Parquet files
Stars: ✭ 145 (+30.63%)
Calcite: Apache Calcite
Stars: ✭ 2,816 (+2436.94%)
Hydrograph: A visual ETL development and debugging tool for big data
Stars: ✭ 144 (+29.73%)
Eel Sdk: Big Data Toolkit for the JVM
Stars: ✭ 140 (+26.13%)
learning-hadoop-and-spark: Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Stars: ✭ 146 (+31.53%)
awesome-tools: Curated list of awesome tools and libraries for specific domains
Stars: ✭ 31 (-72.07%)
incubator-linkis: Linkis makes it easy to connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+2115.32%)
Eland: Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Stars: ✭ 235 (+111.71%)
mmtf-spark: Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Stars: ✭ 20 (-81.98%)
Sparkora: Powerful, rapid, automatic EDA and feature engineering library with a very easy-to-use API 🌟
Stars: ✭ 51 (-54.05%)
awesome-AI-kubernetes: ❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++, etc.
Stars: ✭ 95 (-14.41%)
gan deeplearning4j: Automatic feature engineering using Generative Adversarial Networks with Deeplearning4j and Apache Spark.
Stars: ✭ 19 (-82.88%)
spark-records: Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (-39.64%)
pyspark-ML-in-Colab: PySpark in Google Colab: A simple machine learning (Linear Regression) model
Stars: ✭ 32 (-71.17%)
rastercube: rastercube is a Python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-86.49%)
arrow-datafusion: Apache Arrow DataFusion SQL Query Engine
Stars: ✭ 2,360 (+2026.13%)
jupyterlab-sparkmonitor: JupyterLab extension for monitoring Apache Spark jobs launched from within a notebook
Stars: ✭ 78 (-29.73%)
iis: Information Inference Service of the OpenAIRE system
Stars: ✭ 16 (-85.59%)
DaFlow: Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types and also supports multiple categories of transformation rules.
Stars: ✭ 24 (-78.38%)
clusterdock: clusterdock is a framework for creating Docker-based container clusters
Stars: ✭ 26 (-76.58%)
check-engine: Data validation library for PySpark 3.0.0
Stars: ✭ 29 (-73.87%)
ODSC India 2018: My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-76.58%)