Linkis helps you easily connect to various back-end computation/storage engines (Spark, Python, TiDB, ...) and exposes various interfaces (REST, JDBC, Java, ...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,323 (+2997.33%)

Mutual labels: sql, spark, jdbc

spark-acid

ACID Data Source for Apache Spark based on Hive ACID

Stars: ✭ 91 (+21.33%)

Mutual labels: big-data, spark

H2database

H2 is an embeddable RDBMS written in Java.

Stars: ✭ 3,078 (+4004%)

Mutual labels: sql, jdbc

Ebean

Ebean ORM

Stars: ✭ 1,172 (+1462.67%)

Mutual labels: sql, jdbc

Labs

Research on distributed systems

Stars: ✭ 73 (-2.67%)

Mutual labels: spark, big-data

Jaydebeapi

JayDeBeApi module allows you to connect from Python code to databases using Java JDBC. It provides a Python DB-API v2.0 to that database.

Stars: ✭ 247 (+229.33%)

Mutual labels: sql, jdbc

Clickhouse

ClickHouse® is a free analytics DBMS for big data

Stars: ✭ 21,089 (+28018.67%)

Mutual labels: sql, big-data

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-81.33%)

Mutual labels: big-data, spark

leaflet heatmap

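A simple visualization of call data from Huzhou. Assuming the data volume is too large to draw the heatmap directly in the browser, the heatmap rendering step is moved offline as a computation and analysis task. After the data is computed in parallel with Apache Spark, Apache Spark is also used to draw the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer to give good interactivity. The rendering is currently implemented with Apache Spark; either Apache Spark is not well suited to this kind of computation or I did not design the algorithm well, because the parallel computation is slower than the single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .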

Stars: ✭ 13 (-82.67%)

Mutual labels: big-data, spark

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (-5.33%)

Mutual labels: spark, big-data

Requery

requery - modern SQL based query & persistence for Java / Kotlin / Android

Stars: ✭ 3,071 (+3994.67%)

Mutual labels: sql, jdbc

Db Util

If you are using JPA and Hibernate, this tool can auto-detect N+1 query issues during testing.

Stars: ✭ 194 (+158.67%)

Mutual labels: sql, jdbc

Sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Stars: ✭ 362 (+382.67%)

Mutual labels: spark, big-data

Doma

DAO oriented database mapping framework for Java 8+

Stars: ✭ 257 (+242.67%)

Mutual labels: sql, jdbc

Clickhouse Native Jdbc

ClickHouse Native Protocol JDBC implementation

Stars: ✭ 310 (+313.33%)

Mutual labels: spark, jdbc

Rsparkling

RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)

Stars: ✭ 65 (-13.33%)

Mutual labels: spark, big-data

Jooq

jOOQ is the best way to write SQL in Java

Stars: ✭ 4,695 (+6160%)

Mutual labels: sql, jdbc

Magellan

Geo Spatial Data Analytics on Spark

Stars: ✭ 507 (+576%)

Mutual labels: spark, big-data

Kamu Cli

Next generation tool for decentralized exchange and transformation of semi-structured data

Stars: ✭ 69 (-8%)

Mutual labels: sql, spark

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+7250.67%)

Mutual labels: spark, big-data

Sqlhelper

SQL tools (Dialect, Pagination, DDL dump, UrlParser, SqlStatementParser, WallFilter, BatchExecutor for Test) based on Java. It is easy to integrate into any ORM framework.

Stars: ✭ 242 (+222.67%)

Mutual labels: sql, jdbc

Quickperf

QuickPerf is a testing library for Java to quickly evaluate and improve some performance-related properties

Stars: ✭ 231 (+208%)

Mutual labels: sql, jdbc

awesome-AI-kubernetes

❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc

Stars: ✭ 95 (+26.67%)

Mutual labels: big-data, spark

Java Persistence Frameworks Comparison

Comparison of non-JPA SQL mapping frameworks for Java (Jooq, Spring JDBCTemplate, MyBatis, EBean, JDBI, Speedment, sql2o)

Stars: ✭ 213 (+184%)

Mutual labels: sql, jdbc

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (+48%)

Mutual labels: big-data, spark

incubator-linkis

Stars: ✭ 2,459 (+3178.67%)

Mutual labels: spark, jdbc

Succinct

Enabling queries on compressed data.

Stars: ✭ 257 (+242.67%)

Mutual labels: spark, big-data

Calcite

Apache Calcite

Stars: ✭ 2,816 (+3654.67%)

Mutual labels: sql, big-data

Crate

CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of data in real-time.

Stars: ✭ 3,254 (+4238.67%)

Mutual labels: sql, big-data

Clojureql

ClojureQL is superior SQL integration for Clojure

Stars: ✭ 281 (+274.67%)

Mutual labels: sql, jdbc

Delta

An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.

Stars: ✭ 3,903 (+5104%)

Mutual labels: spark, big-data

Ragtime

Database-independent migration library

Stars: ✭ 519 (+592%)

Mutual labels: sql, jdbc

Datafusion

DataFusion has now been donated to the Apache Arrow project

Stars: ✭ 611 (+714.67%)

Mutual labels: sql, spark

Scriptis

Scriptis is for interactive data analysis with script development (SQL, Pyspark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.

Stars: ✭ 696 (+828%)

Mutual labels: sql, spark

Sylph

Stream computing platform for big data

Stars: ✭ 362 (+382.67%)

Mutual labels: sql, big-data

Micronaut Data

Ahead of Time Data Repositories

Stars: ✭ 352 (+369.33%)

Mutual labels: sql, jdbc

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+29297.33%)

Mutual labels: spark, big-data

Listenbrainz Server

Server for the ListenBrainz project

Stars: ✭ 420 (+460%)

Mutual labels: spark, big-data

Beam

Apache Beam is a unified programming model for Batch and Streaming

Stars: ✭ 5,149 (+6765.33%)

Mutual labels: sql, big-data

Ignite

Apache Ignite

Stars: ✭ 4,027 (+5269.33%)

Mutual labels: sql, big-data

Hibernate Springboot

Collection of best practices for Java persistence performance in Spring Boot applications

Stars: ✭ 589 (+685.33%)

Mutual labels: sql, jdbc

Jailer

Database Subsetting and Relational Data Browsing Tool.

Stars: ✭ 576 (+668%)

Mutual labels: sql, jdbc

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+7441.33%)

Mutual labels: spark, big-data

Bigdl

Building Large-Scale AI Applications for Distributed Big Data

Stars: ✭ 3,813 (+4984%)

Mutual labels: spark, big-data

Parquet Generator

Parquet file generator

Stars: ✭ 16 (-78.67%)

Mutual labels: sql, spark

Spark Doc Zh

Chinese translation of the official Apache Spark documentation