Parquet Index: Spark SQL index for Parquet tables
Stars: ✭ 109 (+581.25%)
Spark: Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+197512.5%)
Devops Python Tools: 80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+2437.5%)
Gaffer: A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+10162.5%)
Schemer: Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
Stars: ✭ 97 (+506.25%)
Roapi: Create full-fledged APIs for static datasets without writing a single line of code.
Stars: ✭ 253 (+1481.25%)
experiments: Code examples for my blog posts
Stars: ✭ 21 (+31.25%)
Scriptis: Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Stars: ✭ 696 (+4250%)
Rumble: ⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (+262.5%)
Quicksql: A Flexible, Fast, Federated (3F) SQL Analysis Middleware for Multiple Data Sources
Stars: ✭ 1,821 (+11281.25%)
Kamu Cli: Next generation tool for decentralized exchange and transformation of semi-structured data
Stars: ✭ 69 (+331.25%)
Pucket: Bucketing and partitioning system for Parquet
Stars: ✭ 29 (+81.25%)
Oap: Optimized Analytics Package for Spark* Platform
Stars: ✭ 343 (+2043.75%)
Linkis: Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+14418.75%)
Metorikku: A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+2156.25%)
Xsql: Unified SQL Analytics Engine Based on SparkSQL
Stars: ✭ 176 (+1000%)
Kyuubi: Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (+2168.75%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+837.5%)
Iceberg: Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (+2356.25%)
Datafusion: DataFusion has now been donated to the Apache Arrow project
Stars: ✭ 611 (+3718.75%)
Privesc: A collection of Windows, Linux and MySQL privilege escalation scripts and exploits.
Stars: ✭ 786 (+4812.5%)
Db Dumper: Dump the contents of a database
Stars: ✭ 744 (+4550%)
Sparkctr: CTR prediction model based on Spark (LR, GBDT, DNN)
Stars: ✭ 740 (+4525%)
Records: SQL for Humans™
Stars: ✭ 6,761 (+42156.25%)
Sql Formatter: A whitespace formatter for different query languages
Stars: ✭ 779 (+4768.75%)
Cdhproject: Usage of the various Hadoop components, continuously updated
Stars: ✭ 733 (+4481.25%)
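Coding Now: Notes from my studies, plus eBooks I have read, video resources, and blogs, websites, and tools I have collected and found useful. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.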
Stars: ✭ 750 (+4587.5%)
Goodreads etl pipeline: An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+4856.25%)
Spark Movie Lens: An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+4556.25%)
Fluentpdo: A PHP SQL query builder using PDO
Stars: ✭ 783 (+4793.75%)
Node Typescript Koa Rest: REST API boilerplate using Node.js, Koa2, and TypeScript. Logging and JWT as middleware. TypeORM with class-validator, SQL CRUD. Docker included. Swagger docs, Actions CI, and a valuable README
Stars: ✭ 739 (+4518.75%)
Efcore.pg: Entity Framework Core provider for PostgreSQL
Stars: ✭ 838 (+5137.5%)
Kafka Storm Starter: Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Stars: ✭ 728 (+4450%)
Qlbridge: A Go expression evaluator and library for building SQL query engine based functionality.
Stars: ✭ 721 (+4406.25%)
Frameless: Expressive types for Spark.
Stars: ✭ 717 (+4381.25%)
Bigdataguide: Big data learning. Learn big data from scratch, including study videos for each stage of learning and interview materials
Stars: ✭ 817 (+5006.25%)
Nano Sql: Universal database layer for the client, server & mobile devices. It's like Lego for databases.
Stars: ✭ 717 (+4381.25%)
Nopcommerce: The most popular open-source eCommerce shopping cart solution based on ASP.NET Core
Stars: ✭ 6,827 (+42568.75%)
Spark Redis: A connector for Spark that allows reading and writing to/from Redis cluster
Stars: ✭ 773 (+4731.25%)
Modin: Speed up your Pandas workflows by changing a single line of code
Stars: ✭ 6,639 (+41393.75%)
Hail: Scalable genomic data analysis.
Stars: ✭ 706 (+4312.5%)
Sparkling Water: Sparkling Water provides H2O functionality inside Spark cluster
Stars: ✭ 887 (+5443.75%)
Nutes: SQL import of USDA nutrient database
Stars: ✭ 6 (-62.5%)
Typeorm: TypeORM module for Nest framework (node.js) 🍇
Stars: ✭ 807 (+4943.75%)
Sparklyr: R interface for Apache Spark
Stars: ✭ 775 (+4743.75%)
Baikaldb: BaikalDB, A Distributed HTAP Database.
Stars: ✭ 707 (+4318.75%)
Smartsql: SmartSql = MyBatis in C# + .NET Core + Cache (Memory | Redis) + R/W Splitting + PropertyChangedTrack + Dynamic Repository + InvokeSync + Diagnostics
Stars: ✭ 775 (+4743.75%)
Leantime: Leantime is a lean project management system for innovators. Designed to help you manage your projects from ideation to delivery.
Stars: ✭ 702 (+4287.5%)
Ezsql: PHP class to make interacting with a database ridiculously easy
Stars: ✭ 804 (+4925%)
Tidb: TiDB is an open source distributed HTAP database compatible with the MySQL protocol
Stars: ✭ 29,871 (+186593.75%)
Sequelize: An easy-to-use, promise-based, multi-dialect SQL ORM tool for Node.js
Stars: ✭ 25,422 (+158787.5%)
Eralchemy: Entity Relation Diagrams generation tool
Stars: ✭ 767 (+4693.75%)