datasqueeze: Hadoop utility to compact small files
Stars: ✭ 18 (-50%)
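awesome-coder-resources: A pit stop on the programming journey! [Continuously updated; stars welcome, and do come back often.] [Contents: programming, learning, and reading resources, open-source projects, interview questions, websites, books, blogs, tutorials, and more]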
Stars: ✭ 54 (+50%)
logparser: Easy parsing of Apache HTTPD and NGINX access logs with Java, Hadoop, Hive, Pig, Flink, Beam, Storm, Drill, ...
Stars: ✭ 139 (+286.11%)
apiary: Apiary provides modules which can be combined to create a federated cloud data lake
Stars: ✭ 30 (-16.67%)
smart-data-lake: Smart Automation Tool for building modern Data Lakes and Data Pipelines
Stars: ✭ 79 (+119.44%)
HiveRunner: An Open Source unit test framework for Hive queries based on JUnit 4 and 5
Stars: ✭ 244 (+577.78%)
flink-learn: Learning Flink: Flink CEP, Flink Core, Flink SQL
Stars: ✭ 70 (+94.44%)
datacatalog-tag-manager: Python package to manage Google Cloud Data Catalog tags, loading metadata from external sources (currently supports the CSV file format)
Stars: ✭ 17 (-52.78%)
radiator: Hive Ruby API Client
Stars: ✭ 49 (+36.11%)
columnify: Convert record-oriented data to columnar format.
Stars: ✭ 28 (-22.22%)
SparkTwitterAnalysis: An Apache Spark standalone application using the Spark API in Scala. The application uses Simple Build Tool (SBT) for building the project.
Stars: ✭ 29 (-19.44%)
Clustering4Ever: C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) clustering.
Stars: ✭ 126 (+250%)
starlake: Starlake is a Spark-based on-premise and cloud ELT/ETL framework for batch and stream processing
Stars: ✭ 16 (-55.56%)
bigquery-data-lineage: Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.
Stars: ✭ 112 (+211.11%)
PersonNotes: A central collection of personal notes, recording technical notes in a quick-and-dirty style 📚☕️⌨️🎧
Stars: ✭ 61 (+69.44%)
cbass: adding "simple" to HBase
Stars: ✭ 25 (-30.56%)
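intersect: Thoughts on an interview question: finding the intersection of 60 million data packets and 3 million data packets within a 50 MB memory budget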
Stars: ✭ 54 (+50%)
Notes: This is a learning note | Java basics, JVM, source code, big data, interview experiences
Stars: ✭ 69 (+91.67%)
Sub-Track: Flutter Application to keep track of Subscriptions
Stars: ✭ 31 (-13.89%)
meetups-archivos: Slides, code, and videos from the meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to disseminate and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops, and Semilleros …
Stars: ✭ 60 (+66.67%)
liquibase-impala: Liquibase extension to add Impala Database support
Stars: ✭ 23 (-36.11%)
ucz-dfs: A distributed file system written in Rust.
Stars: ✭ 25 (-30.56%)
hdocdb: HBase as a JSON Document Database
Stars: ✭ 24 (-33.33%)
disk: A distributed network disk system built with Hadoop, HBase, and Spring Boot
Stars: ✭ 53 (+47.22%)
teraslice: Scalable data processing pipelines in JavaScript
Stars: ✭ 48 (+33.33%)
hadoop-etl-udfs: The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
Stars: ✭ 17 (-52.78%)
awesome-hive: A curated list of awesome Hive resources.
Stars: ✭ 20 (-44.44%)
orion: Management and automation platform for Stateful Distributed Systems
Stars: ✭ 77 (+113.89%)
hayabusa: Hayabusa: Simple and Fast Full-Text Search Engine for Massive System Log Data
Stars: ✭ 43 (+19.44%)
replicator: MySQL Replicator. Replicates MySQL tables to Kafka and HBase, keeping the data changes history in HBase.
Stars: ✭ 41 (+13.89%)
UnROOT.jl: Native Julia I/O package to work with CERN ROOT files
Stars: ✭ 52 (+44.44%)
DaFlow: Apache Spark based Data Flow (ETL) framework which supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-33.33%)
beekeeper: Service for automatically managing and cleaning up unreferenced data
Stars: ✭ 43 (+19.44%)
Spark DB Connector: Use the Scala API to read/write data from different databases, such as HBase, MySQL, etc.
Stars: ✭ 24 (-33.33%)
gan deeplearning4j: Automatic feature engineering with Generative Adversarial Networks, using Deeplearning4j and Apache Spark.
Stars: ✭ 19 (-47.22%)
Lidea: Real-time monitoring platform for large-scale distributed systems
Stars: ✭ 28 (-22.22%)
databricks-dbapi: DBAPI and SQLAlchemy dialect for Databricks Workspace and SQL Analytics clusters
Stars: ✭ 21 (-41.67%)
optimus: 🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+3652.78%)
anovos: Anovos - An Open Source Library for Scalable Feature Engineering Using Apache Spark
Stars: ✭ 77 (+113.89%)
beemos: BEE MOnitoring System: create an infrastructure for monitoring beehives
Stars: ✭ 16 (-55.56%)
fense: Fense is a database proxy written in Java that can connect to databases of different engines at the same time. Key features include authority management, query caching, security auditing, rate limiting and circuit breaking, onesql, and so on
Stars: ✭ 22 (-38.89%)
phoenix: Apache Phoenix / HBase Spring Boot Microservices
Stars: ✭ 23 (-36.11%)
hive compared bq: hive_compared_bq compares/validates 2 (SQL-like) tables, and graphically shows the rows/columns that are different.
Stars: ✭ 27 (-25%)
codefoundry: Examples for gauravbytes.com
Stars: ✭ 57 (+58.33%)
workflUX: An open-source, cloud-ready web application for simplified deployment of big data workflows.
Stars: ✭ 26 (-27.78%)
coolplayflink: Flink: Stateful Computations over Data Streams
Stars: ✭ 14 (-61.11%)