Eyerissf
A simulator for the Eyeriss chip (a CNN accelerator researched by MIT) and a new DNN framework, "Hive"
Stars: ✭ 68 (+300%)
Stormtweetssentimentd3viz
Computes and visualizes sentiment analysis of tweets from US states in real time using Storm.
Stars: ✭ 25 (+47.06%)
MLHadoop
This repository contains machine-learning MapReduce code for Hadoop, written from scratch (without using any package or library), e.g. prediction (linear and logistic regression), clustering (K-Means), classification (KNN), etc.
Stars: ✭ 50 (+194.12%)
albis
Albis: High-Performance File Format for Big Data Systems
Stars: ✭ 20 (+17.65%)
phoenix
Apache Phoenix / HBase Spring Boot Microservices
Stars: ✭ 23 (+35.29%)
Kylo
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Stars: ✭ 916 (+5288.24%)
hadoop-crypto
Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.
Stars: ✭ 38 (+123.53%)
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Stars: ✭ 39 (+129.41%)
miniparquet
Library to read a subset of Parquet files
Stars: ✭ 38 (+123.53%)
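Docs4dev
Documentation for commonly used back-end development frameworks, with Chinese translations, covering the Spring family (Spring, Spring Boot, Spring Cloud, Spring Security, Spring Session), big data (Apache Hive, HBase, Apache Flume), logging (Log4j2, Logback), HTTP servers (NGINX, Apache), Python, and databases (OpenTSDB, MySQL, PostgreSQL): the latest official documentation together with the corresponding Chinese translations.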
Stars: ✭ 974 (+5629.41%)
Awkward 0.x
Manipulate arrays of complex data structures as easily as NumPy.
Stars: ✭ 216 (+1170.59%)
lib mysqludf redis
Provides MySQL UDF commands to synchronize data from MySQL to Redis.
Stars: ✭ 20 (+17.65%)
hivemind
Hive API server (offloads most API calls from hived) implemented using Python+SQL
Stars: ✭ 46 (+170.59%)
parquet-extra
A collection of Apache Parquet add-on modules
Stars: ✭ 30 (+76.47%)
Parquet Index
Spark SQL index for Parquet tables
Stars: ✭ 109 (+541.18%)
hiveberg
Demonstration of a Hive Input Format for Iceberg
Stars: ✭ 22 (+29.41%)
docker-hadoop
Docker image for main Apache Hadoop components (YARN/HDFS)
Stars: ✭ 59 (+247.06%)
Hadoop For Geoevent
ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-70.59%)
Bigdata File Viewer
A cross-platform (Windows, macOS, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Stars: ✭ 86 (+405.88%)
columnify
Convert record-oriented data to a columnar format.
Stars: ✭ 28 (+64.71%)
Pyetl
Python ETL framework
Stars: ✭ 33 (+94.12%)
Winutils
winutils.exe, hadoop.dll, and hdfs.dll binaries for Hadoop on Windows
Stars: ✭ 657 (+3764.71%)
Gcs Tools
GCS support for avro-tools, parquet-tools and protobuf
Stars: ✭ 57 (+235.29%)
Quilt
Quilt is a self-organizing data hub for S3
Stars: ✭ 1,007 (+5823.53%)
Hiverunner
An Open Source unit test framework for Hive queries based on JUnit 4 and 5
Stars: ✭ 225 (+1223.53%)
Skale
High performance distributed data processing engine
Stars: ✭ 390 (+2194.12%)
Oap
Optimized Analytics Package for Spark* Platform
Stars: ✭ 343 (+1917.65%)
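DataX-src
DataX is a widely used offline data synchronization tool/platform for heterogeneous data, providing efficient data synchronization between heterogeneous data sources such as MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, OTS, and ODPS.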
Stars: ✭ 21 (+23.53%)
Databook
A Facebook for data
Stars: ✭ 26 (+52.94%)
Tony
TonY is a framework to natively run deep learning frameworks on Apache Hadoop.
Stars: ✭ 626 (+3582.35%)
Pystore
Fast data store for Pandas time-series data
Stars: ✭ 325 (+1811.76%)
Ratatool
A tool for data sampling, data generation, and data diffing
Stars: ✭ 279 (+1541.18%)
learning-spark
Tidied-up Spark and Hadoop tutorials.
Stars: ✭ 28 (+64.71%)
Javapdf
🍣 100 Java e-books: technical books in PDF (proud to download and read, ashamed to like and bookmark)
Stars: ✭ 609 (+3482.35%)
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+33170.59%)
HybridBackend
Efficient training of deep recommenders on cloud.
Stars: ✭ 30 (+76.47%)
Hadoop Attack Library
A collection of pentest tools and resources targeting Hadoop environments
Stars: ✭ 228 (+1241.18%)
meepo
Data migration across heterogeneous storage
Stars: ✭ 29 (+70.59%)
Hive
Lightweight and blazing fast key-value database written in pure Dart.
Stars: ✭ 2,681 (+15670.59%)
graphique
GraphQL service for Arrow tables and Parquet data sets.
Stars: ✭ 28 (+64.71%)
Hadoop Connectors
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
Stars: ✭ 218 (+1182.35%)
parquet-usql
A custom extractor designed to read Parquet for Azure Data Lake Analytics
Stars: ✭ 13 (-23.53%)
logparser
Easy parsing of Apache HTTPD and NGINX access logs with Java, Hadoop, Hive, Pig, Flink, Beam, Storm, Drill, ...
Stars: ✭ 139 (+717.65%)
Calcite
Apache Calcite
Stars: ✭ 2,816 (+16464.71%)
beekeeper
Service for automatically managing and cleaning up unreferenced data
Stars: ✭ 43 (+152.94%)
sparkucx
A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing the UCX communication layer
Stars: ✭ 32 (+88.24%)
databricks-dbapi
DBAPI and SQLAlchemy dialect for Databricks Workspace and SQL Analytics clusters
Stars: ✭ 21 (+23.53%)
skein
A tool and library for easily deploying applications on Apache YARN
Stars: ✭ 128 (+652.94%)
gomrjob
gomrjob - a Go Framework for Hadoop MapReduce Jobs
Stars: ✭ 39 (+129.41%)
Dist Keras
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Stars: ✭ 613 (+3505.88%)