dbddbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
Stars: ✭ 30 (-73.68%)
presto-client-phpA Presto client for the PHP programming language.
Stars: ✭ 24 (-78.95%)
Apache Spark Hands OnEducational notes,Hands on problems w/ solutions for hadoop ecosystem
Stars: ✭ 74 (-35.09%)
bigdata-funA complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-87.72%)
Presto Redispresto-redis is an experimental sql layer for redis
Stars: ✭ 18 (-84.21%)
incubator-linkisLinkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+2057.02%)
Wifi基于wifi抓取信息的大数据查询分析系统
Stars: ✭ 93 (-18.42%)
AddaxAddax is an open source universal ETL tool that supports most of those RDBMS and NoSQLs on the planet, helping you transfer data from any one place to another.
Stars: ✭ 615 (+439.47%)
AtsdAxibase Time Series Database Documentation
Stars: ✭ 68 (-40.35%)
Bigdataguide大数据学习,从零开始学习大数据,包含大数据学习各阶段学习视频、面试资料
Stars: ✭ 817 (+616.67%)
Parquet Dotnet🏐 Apache Parquet for modern .NET
Stars: ✭ 276 (+142.11%)
PyhivePython interface to Hive and Presto. 🐝
Stars: ✭ 1,378 (+1108.77%)
centurionKotlin Bigdata Toolkit
Stars: ✭ 320 (+180.7%)
Winutilswinutils.exe hadoop.dll and hdfs.dll binaries for hadoop windows
Stars: ✭ 657 (+476.32%)
meepo异构存储数据迁移
Stars: ✭ 29 (-74.56%)
JumbuneJumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http://jumbune.com. More details of open source offering are at,
Stars: ✭ 64 (-43.86%)
fastdata-clusterFast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-82.46%)
TonyTonY is a framework to natively run deep learning frameworks on Apache Hadoop.
Stars: ✭ 626 (+449.12%)
big dataA collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-70.18%)
big-data-liteSamples to the Oracle Big Data Lite VM
Stars: ✭ 41 (-64.04%)
H2o 3H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+4861.4%)
PetastormPetastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
Stars: ✭ 1,108 (+871.93%)
SchemerSchema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
Stars: ✭ 97 (-14.91%)
Tf YarnTrain TensorFlow models on YARN in just a few lines of code!
Stars: ✭ 76 (-33.33%)
Bigdata Interview🎯 🌟[大数据面试题]分享自己在网络上收集的大数据相关的面试题以及自己的答案总结.目前包含Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper框架的面试题知识总结
Stars: ✭ 857 (+651.75%)
Hadoop Mini Clustershadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE
Stars: ✭ 265 (+132.46%)
py-hdfs-mountMount HDFS with fuse, works with kerberos!
Stars: ✭ 13 (-88.6%)
WaterdropProduction Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (+1528.07%)
Bigdata💎🔥大数据学习笔记
Stars: ✭ 488 (+328.07%)
cmuxA set of commands for managing CDH clusters using Cloudera Manager REST API.
Stars: ✭ 34 (-70.18%)
Rumble⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (-49.12%)
platys-modern-data-platformSupport for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, ....
Stars: ✭ 35 (-69.3%)
School Of SreAt LinkedIn, we are using this curriculum for onboarding our entry-level talents into the SRE role.
Stars: ✭ 5,141 (+4409.65%)
ros hadoopHadoop splittable InputFormat for ROS. Process rosbag with Hadoop Spark and other HDFS compatible systems.
Stars: ✭ 92 (-19.3%)
Bigdata File ViewerA cross-platform (Windows, MAC, Linux) desktop application to view common bigdata binary format like Parquet, ORC, AVRO, etc. Support local file system, HDFS, AWS S3, Azure Blob Storage ,etc.
Stars: ✭ 86 (-24.56%)
cobra-policytoolManage Apache Atlas and Ranger configuration for your Hadoop environment.
Stars: ✭ 16 (-85.96%)
Presto EthereumPresto Ethereum Connector -- SQL on Ethereum
Stars: ✭ 450 (+294.74%)
parquet-usqlA custom extractor designed to read parquet for Azure Data Lake Analytics
Stars: ✭ 13 (-88.6%)
fsbrowserFast desktop client for Hadoop Distributed File System
Stars: ✭ 27 (-76.32%)
God Of Bigdata专注大数据学习面试,大数据成神之路开启。Flink/Spark/Hadoop/Hbase/Hive...
Stars: ✭ 6,008 (+5170.18%)
MLHadoopThis repository contains Machine-Learning MapReduce codes for Hadoop which are written from scratch (without using any package or library). E.g. Prediction (Linear and Logistic Regression), Clustering (K-Means), Classification (KNN) etc.
Stars: ✭ 50 (-56.14%)
MarmarayGeneric Data Ingestion & Dispersal Library for Hadoop
Stars: ✭ 414 (+263.16%)
RoapiCreate full-fledged APIs for static datasets without writing a single line of code.
Stars: ✭ 253 (+121.93%)
bandar-logMonitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 20 (-82.46%)
basinBasin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-78.07%)
DataspherestudioDataSphereStudio is a one stop data application development& management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+948.25%)
Hadoop PotA scalable Apache Hadoop-based implementation of the Pooled Time Series video similarity algorithm based on M. Ryoo et al paper CVPR 2015.
Stars: ✭ 8 (-92.98%)
pulsephData Pulse application log aggregation and monitoring
Stars: ✭ 13 (-88.6%)
kubesqlA tool based on presto using sql to query the resources of kubernetes, such as pods, nodes and so on.
Stars: ✭ 56 (-50.88%)