LinkisLinkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java...), and provides multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (-5.53%)
Mutual labels: spark, presto, hive, storage, jdbc, engine, impala, pyspark, udf, thrift-server, resource-manager, jobserver, application-manager, livy, hive-table, linkis, context-service, scriptis
YanagishimaWeb UI for Trino, Presto, Hive, Elasticsearch, SparkSQL
Stars: ✭ 424 (-82.76%)
TrinoOfficial repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+86.3%)
Sqliorm sql interface, Criteria, CriteriaBuilder, ResultMapBuilder
Stars: ✭ 1,644 (-33.14%)
Cube.js📊 Cube — Open-Source Analytics API for Building Data Apps
Stars: ✭ 11,983 (+387.31%)
GimelBig Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (-91.22%)
KyuubiKyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (-85.24%)
ScriptisScriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, resource management, and intelligent diagnosis.
Stars: ✭ 696 (-71.7%)
Pysparkgeoanalysis🌐 Interactive Workshop on GeoAnalysis using PySpark
Stars: ✭ 63 (-97.44%)
W2vWord2Vec models with Twitter data using Spark. Blog:
Stars: ✭ 64 (-97.4%)
Apache Spark Hands OnEducational notes and hands-on problems with solutions for the Hadoop ecosystem
Stars: ✭ 74 (-96.99%)
Hadoop cookbookCookbook to install Hadoop 2.0+ using Chef
Stars: ✭ 82 (-96.67%)
Hops ExamplesExamples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops
Stars: ✭ 84 (-96.58%)
RepositoryPersonal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-96.26%)
HnswlibJava library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Stars: ✭ 108 (-95.61%)
Pyspark Cheatsheet🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-95.61%)
Spark Py NotebooksApache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (-45.59%)
HadoopcryptoledgerHadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (-94.88%)
Cc PysparkProcess Common Crawl data with Python and Spark
Stars: ✭ 147 (-94.02%)
QuillCompile-time Language Integrated Queries for Scala
Stars: ✭ 1,998 (-18.75%)
Optimus🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (-59.9%)
SparkmagicJupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (-61.2%)
Luigi WarehouseA luigi powered analytics / warehouse stack
Stars: ✭ 72 (-97.07%)
SparkApache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+1185.81%)
DataspherestudioDataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (-51.4%)
Live log analyzer sparkSpark application for analyzing Apache access logs and detecting anomalies, along with a Medium article.
Stars: ✭ 14 (-99.43%)
SplashSplash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Stars: ✭ 105 (-95.73%)
Spark NlpState of the Art Natural Language Processing
Stars: ✭ 2,518 (+2.4%)
spark-acidACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (-96.3%)
Spark AuthorizerA Spark SQL extension which provides SQL Standard Authorization for Apache Spark
Stars: ✭ 141 (-94.27%)
QuicksqlA Flexible, Fast, Federated(3F) SQL Analysis Middleware for Multiple Data Sources
Stars: ✭ 1,821 (-25.95%)
Spark With PythonFundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-93.9%)
Sparkling TitanicTraining models with Apache Spark, PySpark for Titanic Kaggle competition
Stars: ✭ 12 (-99.51%)
kafka-compose🎼 Docker compose files for various kafka stacks
Stars: ✭ 32 (-98.7%)
AddaxAddax is an open source universal ETL tool that supports most RDBMS and NoSQL databases, helping you transfer data from any one place to another.
Stars: ✭ 615 (-74.99%)
XsqlUnified SQL Analytics Engine Based on SparkSQL
Stars: ✭ 176 (-92.84%)
MmlsparkSimple and Distributed Machine Learning
Stars: ✭ 2,899 (+17.89%)
Hadoop DockerA Hadoop development and test environment built with Docker, including Hadoop, Hive, HBase, and Spark
Stars: ✭ 238 (-90.32%)
Spark PracticeApache Spark (PySpark) Practice on Real Data
Stars: ✭ 200 (-91.87%)
HandysparkHandySpark - bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (-93.57%)
hadoop-etl-udfsThe Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
Stars: ✭ 17 (-99.31%)
hive to esA small tool for syncing data from a Hive data warehouse to Elasticsearch
Stars: ✭ 21 (-99.15%)
liquibase-impalaLiquibase extension to add Impala Database support
Stars: ✭ 23 (-99.06%)
hive-jdbc-driverAn alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC
Stars: ✭ 31 (-98.74%)
TiBigDataTiDB connectors for Flink/Hive/Presto
Stars: ✭ 192 (-92.19%)
implyrSQL backend to dplyr for Impala
Stars: ✭ 74 (-96.99%)
ODSC India 2018My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-98.94%)