waggle-danceHive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.
Stars: ✭ 194 (+351.16%)
Streamxkafka-connect-s3 : Ingest data from Kafka to Object Stores(s3)
Stars: ✭ 96 (+123.26%)
CortxCORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Stars: ✭ 426 (+890.7%)
MahaA framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.
Stars: ✭ 101 (+134.88%)
TrinoOfficial repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+10553.49%)
spark-acidACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (+111.63%)
HiveApache Hive
Stars: ✭ 4,031 (+9274.42%)
Cloud VolumeRead and write Neuroglancer datasets programmatically.
Stars: ✭ 63 (+46.51%)
DrillApache Drill is a distributed MPP query layer for self-describing data
Stars: ✭ 1,619 (+3665.12%)
Amazon S3 Find And ForgetAmazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Stars: ✭ 115 (+167.44%)
qweryA SQL-like language for performing ETL transformations.
Stars: ✭ 28 (-34.88%)
Eel SdkBig Data Toolkit for the JVM
Stars: ✭ 140 (+225.58%)
Docker Registry PrunerTool to apply retention logic to docker images in a Docker Registry
Stars: ✭ 122 (+183.72%)
OzoneScalable, redundant, and distributed object store for Apache Hadoop
Stars: ✭ 330 (+667.44%)
PrestoThe official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+30032.56%)
apiaryApiary provides modules which can be combined to create a federated cloud data lake
Stars: ✭ 30 (-30.23%)
HelicalinsightHelical Insight is the world's first open source Business Intelligence framework, which helps you make sense of your data and make well-informed decisions.
Stars: ✭ 214 (+397.67%)
nifiDeploy a secured, clustered, auto-scaling NiFi service in AWS.
Stars: ✭ 37 (-13.95%)
spark-recordsBulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (+55.81%)
terraform-aws-sftpThis Terraform module creates an SFTP service on AWS for S3.
Stars: ✭ 20 (-53.49%)
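common-dataxA general-purpose data synchronization microservice based on DataX; a single RESTful interface handles all common data synchronization tasks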
Stars: ✭ 51 (+18.6%)
hiveql-parserHiveQL Parser. Parses HiveQL code and prints the AST in JSON format on success; otherwise prints a well-formed syntax error message.
Stars: ✭ 25 (-41.86%)
go-localstackGo Wrapper for using localstack
Stars: ✭ 56 (+30.23%)
silly-androidAndroid plugins for Java, making core Android APIs easy to use
Stars: ✭ 40 (-6.98%)
commentatorA simple commenting system for your blog.
Stars: ✭ 29 (-32.56%)
CS Book🔥 Latest computer science e-books. Provides downloads of the latest technical e-books. "All I want is to out-compete every one of you, or be out-competed by you!"
Stars: ✭ 40 (-6.98%)
scarfToolkit for highly memory-efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas-scale datasets with millions of cells on a laptop.
Stars: ✭ 54 (+25.58%)
databricks-dbapiDBAPI and SQLAlchemy dialect for Databricks Workspace and SQL Analytics clusters
Stars: ✭ 21 (-51.16%)
mining-campEasy automated configuration and deployment of Minecraft servers on AWS spot instances, featuring automatic backups and restoration using S3.
Stars: ✭ 43 (+0%)
simple-ddl-parserSimple DDL Parser to parse SQL (HQL, TSQL, AWS Redshift, BigQuery, Snowflake and other dialects) DDL files to JSON/Python dict with full information about columns (types, defaults, primary keys, etc.) and table properties, types, domains, etc.
Stars: ✭ 76 (+76.74%)
mlflow-dockerReady-to-run docker-compose configuration for MLflow with MySQL and MinIO S3
Stars: ✭ 146 (+239.53%)
RemoteShuffleServiceCeleborn provides an elastic and high-performance service for shuffle and spilled data.
Stars: ✭ 262 (+509.3%)
IoT-system-PLC-data-to-InfluxDBThis project aims to provide free software to fetch data from PLCs (Siemens S7-300/400/1200/1500) and store it. The stack used is completely open source. I used InfluxDB as data storage, so the application follows the Big Data paradigm.
Stars: ✭ 26 (-39.53%)
awesome-hiveA curated list of awesome Hive resources.
Stars: ✭ 20 (-53.49%)
datajoint-pythonRelational data pipelines for the science lab
Stars: ✭ 140 (+225.58%)
athena-sqliteA SQLite driver for S3 and Amazon Athena 😳
Stars: ✭ 82 (+90.7%)
minio-dartUnofficial MinIO Dart Client SDK that provides simple APIs to access any Amazon S3 compatible object storage server.
Stars: ✭ 42 (-2.33%)
terraform-s3-userA Terraform module that creates a tagged S3 bucket and an IAM user/key with access to the bucket
Stars: ✭ 20 (-53.49%)
s3cr3tA supercharged S3 reverse proxy
Stars: ✭ 55 (+27.91%)
sparkucxA high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer
Stars: ✭ 32 (-25.58%)
hadoopofficeHadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
Stars: ✭ 56 (+30.23%)
data-profilingA set of scripts to pull metadata and data profiling metrics from relational database systems
Stars: ✭ 57 (+32.56%)
django-s3fileA lightweight file upload input for Django and Amazon S3
Stars: ✭ 66 (+53.49%)
spark-rootApache Spark Data Source for ROOT File Format
Stars: ✭ 28 (-34.88%)
dxramA distributed in-memory key-value storage for billions of small objects.
Stars: ✭ 25 (-41.86%)
react-native-appsync-s3React Native app for image uploads to S3 and storing their records in Amazon DynamoDB using AWS Amplify and AppSync SDK
Stars: ✭ 18 (-58.14%)
pysorterA command line utility for organizing files and directories according to regex patterns.
Stars: ✭ 40 (-6.98%)