Docker Hadoop: A Docker container with a full Hadoop cluster setup with Spark and Zeppelin
Stars: ✭ 54 (-66.87%)
Kamu Cli: Next-generation tool for decentralized exchange and transformation of semi-structured data
Stars: ✭ 69 (-57.67%)
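dpkb: A roundup of big data topics, including distributed storage engines, distributed compute engines, and data warehouse construction. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse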
Stars: ✭ 123 (-24.54%)
Docker Spark: 🚢 Docker image for Apache Spark
Stars: ✭ 78 (-52.15%)
Hadoop cookbook: Cookbook to install Hadoop 2.0+ using Chef
Stars: ✭ 82 (-49.69%)
Featran: A Scala feature transformation library for data science and machine learning
Stars: ✭ 420 (+157.67%)
aut: The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-31.9%)
yuzhouwan: Code Library for My Blog
Stars: ✭ 39 (-76.07%)
Kylo: Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Stars: ✭ 916 (+461.96%)
Apache Spark Hands On: Educational notes and hands-on problems with solutions for the Hadoop ecosystem
Stars: ✭ 74 (-54.6%)
Hops Examples: Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops
Stars: ✭ 84 (-48.47%)
Spark Bigquery Connector: BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Stars: ✭ 126 (-22.7%)
Spark Authorizer: A Spark SQL extension which provides SQL Standard Authorization for Apache Spark
Stars: ✭ 141 (-13.5%)
Pulsar Flink: Elastic data processing with Apache Pulsar and Apache Flink
Stars: ✭ 126 (-22.7%)
Scala Samples: Pieces of Scala code that explain Scala syntax and related things, such as what you can do with all this
Stars: ✭ 125 (-23.31%)
Hadoop Hdfs: Mirror of Apache Hadoop HDFS
Stars: ✭ 152 (-6.75%)
Rasterframes: Geospatial Raster support for Spark DataFrames
Stars: ✭ 142 (-12.88%)
Spark Infotheoretic Feature Selection: This package contains a generic implementation of greedy Information Theoretic Feature Selection (FS) methods. The implementation is based on the common theoretic framework presented by Gavin Brown. Implementations of mRMR, InfoGain, JMI and other commonly used FS filters are provided.
Stars: ✭ 123 (-24.54%)
Dynamometer: A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Stars: ✭ 122 (-25.15%)
Data science blogs: A repository to keep track of all the code that I end up writing for my blog posts.
Stars: ✭ 139 (-14.72%)
Spark Alchemy: Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-25.15%)
Deequ: Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Stars: ✭ 2,020 (+1139.26%)
Presto: The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+7849.08%)
Powderkeg: Live-coding the cluster!
Stars: ✭ 152 (-6.75%)
Azure Event Hubs Spark: Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (-14.11%)
Zparkio: Boilerplate framework to use Spark and ZIO together.
Stars: ✭ 121 (-25.77%)
Eel Sdk: Big Data Toolkit for the JVM
Stars: ✭ 140 (-14.11%)
Streamline: StreamLine - Streaming Analytics
Stars: ✭ 151 (-7.36%)
Teddy: Spark Streaming monitoring platform with support for job deployment, alerting, and auto-start
Stars: ✭ 120 (-26.38%)
Kinesis Sql: Kinesis Connector for Structured Streaming
Stars: ✭ 120 (-26.38%)
Elassandra: Elassandra = Elasticsearch + Apache Cassandra
Stars: ✭ 1,610 (+887.73%)
Sparkling Graph: SparklingGraph provides an easy-to-use set of features that let you process large-scale graphs using Spark and GraphX.
Stars: ✭ 139 (-14.72%)
Flink Docker: Docker packaging for Apache Flink
Stars: ✭ 118 (-27.61%)
Hdfs Shell: HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS
Stars: ✭ 117 (-28.22%)
Vue Info Card: Simple and beautiful card component with an elegant spark line, for VueJS.
Stars: ✭ 159 (-2.45%)
Geni: A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-6.75%)
Spark Tsne: Distributed t-SNE via Apache Spark
Stars: ✭ 151 (-7.36%)
Isolation Forest: A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
Stars: ✭ 139 (-14.72%)
Drill: Apache Drill is a distributed MPP query layer for self-describing data
Stars: ✭ 1,619 (+893.25%)
Xlearning: AI on Hadoop
Stars: ✭ 1,709 (+948.47%)
Cube.js: 📊 Cube - Open-Source Analytics API for Building Data Apps
Stars: ✭ 11,983 (+7251.53%)
Datax: DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto(Trino), PostgreSQL, SQL Server
Stars: ✭ 116 (-28.83%)
Asakusafw: Asakusa Framework
Stars: ✭ 114 (-30.06%)
Spark Lucenerdd: Spark RDD with Lucene's query and entity linkage capabilities
Stars: ✭ 114 (-30.06%)
Spring Shiro Spark: Spring-Shiro-Spark is an attempt at integrating Spring-Boot, Hibernate, Spark, Spark-SQL, Shiro, iView, VueJs... ...
Stars: ✭ 114 (-30.06%)
Hadoop Common: Mirror of Apache Hadoop common
Stars: ✭ 155 (-4.91%)
Benchm Ml: A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Stars: ✭ 1,835 (+1025.77%)
Apache Spark Node: Node.js bindings for Apache Spark DataFrame APIs
Stars: ✭ 136 (-16.56%)
Parquet Go: Go package to read and write Parquet files. Parquet is a file format to store nested data structures in a flat columnar data format. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.
Stars: ✭ 114 (-30.06%)
Python Bigdata: Data science and Big Data with Python
Stars: ✭ 112 (-31.29%)