
651 open source projects that are alternatives to or similar to Big Whale

Docker Hadoop
A Docker container with a full Hadoop cluster setup with Spark and Zeppelin
Stars: ✭ 54 (-66.87%)
Mutual labels:  spark, hadoop
Docker Spark Cluster
A Spark cluster setup running on Docker containers
Stars: ✭ 57 (-65.03%)
Mutual labels:  spark, hadoop
Kamu Cli
Next generation tool for decentralized exchange and transformation of semi-structured data
Stars: ✭ 69 (-57.67%)
Mutual labels:  spark, flink
dpkb
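A roundup of big data topics, including distributed storage engines, distributed compute engines, and data warehouse construction. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse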
Stars: ✭ 123 (-24.54%)
Mutual labels:  hadoop, flink
Javaorbigdata Interview
A collection of interview topics for Java developers and big data developers
Stars: ✭ 203 (+24.54%)
Mutual labels:  spark, hadoop
Docker Spark
🚢 Docker image for Apache Spark
Stars: ✭ 78 (-52.15%)
Mutual labels:  spark, hadoop
Hadoop cookbook
Cookbook to install Hadoop 2.0+ using Chef
Stars: ✭ 82 (-49.69%)
Mutual labels:  spark, hadoop
Featran
A Scala feature transformation library for data science and machine learning
Stars: ✭ 420 (+157.67%)
Mutual labels:  spark, flink
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-31.9%)
Mutual labels:  spark, hadoop
Bigdata docker
Big Data Ecosystem Docker
Stars: ✭ 161 (-1.23%)
Mutual labels:  spark, hadoop
yuzhouwan
Code Library for My Blog
Stars: ✭ 39 (-76.07%)
Mutual labels:  spark, hadoop
Kylo
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Stars: ✭ 916 (+461.96%)
Mutual labels:  spark, hadoop
Apache Spark Hands On
Educational notes and hands-on problems with solutions for the Hadoop ecosystem
Stars: ✭ 74 (-54.6%)
Mutual labels:  spark, hadoop
Hops Examples
Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops
Stars: ✭ 84 (-48.47%)
Mutual labels:  spark, flink
Spark Bigquery Connector
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Stars: ✭ 126 (-22.7%)
Mutual labels:  spark
Spark Authorizer
A Spark SQL extension which provides SQL Standard Authorization for Apache Spark
Stars: ✭ 141 (-13.5%)
Mutual labels:  spark
Pulsar Flink
Elastic data processing with Apache Pulsar and Apache Flink
Stars: ✭ 126 (-22.7%)
Mutual labels:  flink
Scala Samples
Pieces of Scala code that explain Scala syntax and related topics, such as what you can do with them
Stars: ✭ 125 (-23.31%)
Mutual labels:  spark
Hadoop Hdfs
Mirror of Apache Hadoop HDFS
Stars: ✭ 152 (-6.75%)
Mutual labels:  hadoop
Rasterframes
Geospatial Raster support for Spark DataFrames
Stars: ✭ 142 (-12.88%)
Mutual labels:  spark
Spark Infotheoretic Feature Selection
This package contains a generic implementation of greedy Information Theoretic Feature Selection (FS) methods. The implementation is based on the common theoretic framework presented by Gavin Brown. Implementations of mRMR, InfoGain, JMI and other commonly used FS filters are provided.
Stars: ✭ 123 (-24.54%)
Mutual labels:  spark
Dynamometer
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Stars: ✭ 122 (-25.15%)
Mutual labels:  hadoop
Data science blogs
A repository to keep track of all the code that I end up writing for my blog posts.
Stars: ✭ 139 (-14.72%)
Mutual labels:  spark
Spark Alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-25.15%)
Mutual labels:  spark
Deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Stars: ✭ 2,020 (+1139.26%)
Mutual labels:  spark
Presto
The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+7849.08%)
Mutual labels:  hadoop
Powderkeg
Live-coding the cluster!
Stars: ✭ 152 (-6.75%)
Mutual labels:  spark
Azure Event Hubs Spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (-14.11%)
Mutual labels:  spark
Zparkio
Boilerplate framework to use Spark and ZIO together.
Stars: ✭ 121 (-25.77%)
Mutual labels:  spark
Eat pyspark in 10 days
pyspark🍒🥭 is delicious, just eat it!😋😋
Stars: ✭ 116 (-28.83%)
Mutual labels:  spark
Eel Sdk
Big Data Toolkit for the JVM
Stars: ✭ 140 (-14.11%)
Mutual labels:  hadoop
Example Spark Kafka
Apache Spark and Apache Kafka integration example
Stars: ✭ 120 (-26.38%)
Mutual labels:  spark
Streamline
StreamLine - Streaming Analytics
Stars: ✭ 151 (-7.36%)
Mutual labels:  flink
Teddy
A monitoring platform for Spark Streaming that supports job deployment, alerting, and automatic startup
Stars: ✭ 120 (-26.38%)
Mutual labels:  spark
Kinesis Sql
Kinesis Connector for Structured Streaming
Stars: ✭ 120 (-26.38%)
Mutual labels:  spark
Elassandra
Elassandra = Elasticsearch + Apache Cassandra
Stars: ✭ 1,610 (+887.73%)
Mutual labels:  spark
Sparkling Graph
SparklingGraph provides an easy-to-use set of features that let you process large-scale graphs using Spark and GraphX.
Stars: ✭ 139 (-14.72%)
Mutual labels:  spark
Flink Docker
Docker packaging for Apache Flink
Stars: ✭ 118 (-27.61%)
Mutual labels:  flink
Hdfs Shell
HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS
Stars: ✭ 117 (-28.22%)
Mutual labels:  hadoop
Vue Info Card
Simple and beautiful card component with an elegant spark line, for VueJS.
Stars: ✭ 159 (-2.45%)
Mutual labels:  spark
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-6.75%)
Mutual labels:  spark
Spark Tsne
Distributed t-SNE via Apache Spark
Stars: ✭ 151 (-7.36%)
Mutual labels:  spark
Isolation Forest
A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
Stars: ✭ 139 (-14.72%)
Mutual labels:  spark
Drill
Apache Drill is a distributed MPP query layer for self describing data
Stars: ✭ 1,619 (+893.25%)
Mutual labels:  hadoop
Xlearning
AI on Hadoop
Stars: ✭ 1,709 (+948.47%)
Mutual labels:  hadoop
Cube.js
📊 Cube — Open-Source Analytics API for Building Data Apps
Stars: ✭ 11,983 (+7251.53%)
Mutual labels:  spark
Spark Ml Source Analysis
An analysis of the principles behind Spark ML algorithms and of their source code implementations
Stars: ✭ 1,873 (+1049.08%)
Mutual labels:  spark
Datax
DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto(Trino), PostgreSQL, SQL Server
Stars: ✭ 116 (-28.83%)
Mutual labels:  hadoop
Asakusafw
Asakusa Framework
Stars: ✭ 114 (-30.06%)
Mutual labels:  hadoop
Spark Lucenerdd
Spark RDD with Lucene's query and entity linkage capabilities
Stars: ✭ 114 (-30.06%)
Mutual labels:  spark
Spark On Lambda
Apache Spark on AWS Lambda
Stars: ✭ 137 (-15.95%)
Mutual labels:  spark
Tensorflowonyarn
Support TensorFlow on YARN
Stars: ✭ 114 (-30.06%)
Mutual labels:  hadoop
Spring Shiro Spark
Spring-Shiro-Spark is an attempt to integrate Spring-Boot, Hibernate, Spark, Spark-SQL, Shiro, iView, VueJs... ...
Stars: ✭ 114 (-30.06%)
Mutual labels:  spark
Hadoop Common
Mirror of Apache Hadoop common
Stars: ✭ 155 (-4.91%)
Mutual labels:  hadoop
Benchm Ml
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Stars: ✭ 1,835 (+1025.77%)
Mutual labels:  spark
Apache Spark Node
Node.js bindings for Apache Spark DataFrame APIs
Stars: ✭ 136 (-16.56%)
Mutual labels:  spark
Parquet Go
Go package to read and write Parquet files. Parquet is a file format to store nested data structures in a flat columnar data format. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.
Stars: ✭ 114 (-30.06%)
Mutual labels:  hadoop
Hbaseclient
HBase client software for data management
Stars: ✭ 135 (-17.18%)
Mutual labels:  hadoop
Spark Mllib Twitter Sentiment Analysis
🌟 ✨ Analyze and visualize Twitter Sentiment on a world map using Spark MLlib
Stars: ✭ 113 (-30.67%)
Mutual labels:  spark
Python Bigdata
Data science and Big Data with Python
Stars: ✭ 112 (-31.29%)
Mutual labels:  spark