HiveApache Hive
Stars: ✭ 4,031 (+6618.33%)
Bigdata💎🔥大数据学习笔记
Stars: ✭ 488 (+713.33%)
spark-extensionA library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-58.33%)
MLHadoopThis repository contains Machine-Learning MapReduce codes for Hadoop which are written from scratch (without using any package or library). E.g. Prediction (Linear and Logistic Regression), Clustering (K-Means), Classification (KNN) etc.
Stars: ✭ 50 (-16.67%)
clickhouse hadoopImport data from clickhouse to hadoop with pure SQL
Stars: ✭ 26 (-56.67%)
Hadoop For GeoeventArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-91.67%)
DaFlowApache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple categories of transformation rules.
Stars: ✭ 24 (-60%)
SidekickHigh Performance HTTP Sidecar Load Balancer
Stars: ✭ 366 (+510%)
ibisIBIS is a workflow creation-engine that abstracts the Hadoop internals of ingesting RDBMS data.
Stars: ✭ 48 (-20%)
darwinAvro Schema Evolution made easy
Stars: ✭ 26 (-56.67%)
MetorikkuA simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+501.67%)
hive-jdbc-driverAn alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC
Stars: ✭ 31 (-48.33%)
Data Science On GcpSource code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Stars: ✭ 864 (+1340%)
Gis Tools For HadoopThe GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.
Stars: ✭ 485 (+708.33%)
beneathBeneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (+8.33%)
hadoop-cryptoLibrary for per-file client-side encyption in Hadoop FileSystems such as HDFS or S3.
Stars: ✭ 38 (-36.67%)
SparklerSpark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (+503.33%)
waspWASP is a framework to build complex real time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging Kafka and Spark. If you need to ingest huge amount of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-68.33%)
AkkeeperAn easy way to deploy your Akka services to a distributed environment.
Stars: ✭ 30 (-50%)
DataformDataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift
Stars: ✭ 342 (+470%)
pyjanitorClean APIs for data cleaning. Python implementation of R package Janitor
Stars: ✭ 970 (+1516.67%)
SparklyrR interface for Apache Spark
Stars: ✭ 775 (+1191.67%)
liquibase-impalaLiquibase extension to add Impala Database support
Stars: ✭ 23 (-61.67%)
SparklensQubole Sparklens tool for performance tuning Apache Spark
Stars: ✭ 345 (+475%)
memex-gateGeneral Architecture for Text Engineering
Stars: ✭ 47 (-21.67%)
hadoopofficeHadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
Stars: ✭ 56 (-6.67%)
IqlAn ad hoc query service based on the spark sql engine.(基于spark sql引擎的即席查询服务)
Stars: ✭ 341 (+468.33%)
Coding Now学习记录的一些笔记,以及所看得一些电子书eBooks、视频资源和平常收纳的一些自己认为比较好的博客、网站、工具。涉及大数据几大组件、Python机器学习和数据分析、Linux、操作系统、算法、网络等
Stars: ✭ 750 (+1150%)
dbt-sugardbt-sugar is a CLI tool that allows users of dbt to have fun and ease performing actions around dbt models
Stars: ✭ 139 (+131.67%)
OzoneScalable, redundant, and distributed object store for Apache Hadoop
Stars: ✭ 330 (+450%)
uptasticsearchAn Elasticsearch client tailored to data science workflows.
Stars: ✭ 47 (-21.67%)
SparkmagicJupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+1490%)
xxhadoopData Analysis Using Hadoop/Spark/Storm/ElasticSearch/MachineLearning etc. This is My Daily Notes/Code/Demo. Don't fork, Just star !
Stars: ✭ 37 (-38.33%)
WirbelsturmWirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Stars: ✭ 332 (+453.33%)
preprocessyPython package for Customizable Data Preprocessing Pipelines
Stars: ✭ 34 (-43.33%)
SparkctrCTR prediction model based on spark(LR, GBDT, DNN)
Stars: ✭ 740 (+1133.33%)
corcAn ORC File Scheme for the Cascading data processing platform.
Stars: ✭ 14 (-76.67%)
CascadingCascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.
Stars: ✭ 318 (+430%)
blockchain-etl-streamingStreaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes
Stars: ✭ 57 (-5%)
Pulsar SparkWhen Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-8.33%)
disk基于hadoop+hbase+springboot实现分布式网盘系统
Stars: ✭ 53 (-11.67%)
TezApache Tez
Stars: ✭ 313 (+421.67%)
Cdhprojecthadoop各组件使用,持续更新
Stars: ✭ 733 (+1121.67%)
polygon-etlETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (-11.67%)
School Of SreAt LinkedIn, we are using this curriculum for onboarding our entry-level talents into the SRE role.
Stars: ✭ 5,141 (+8468.33%)
growthbookOpen Source Feature Flagging and A/B Testing Platform
Stars: ✭ 2,342 (+3803.33%)
Nagios Plugins450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
Stars: ✭ 1,000 (+1566.67%)
CasperA compiler for automatically re-targeting sequential Java code to Apache Spark.
Stars: ✭ 45 (-25%)
smolderHL7 Apache Spark Datasource
Stars: ✭ 33 (-45%)
visionsType System for Data Analysis in Python
Stars: ✭ 136 (+126.67%)
Data Engineering BookAccumulated knowledge and experience in the field of Data Engineering
Stars: ✭ 471 (+685%)