
dataux / Dataux

Licence: mit
Federated mysql compatible proxy to elasticsearch, mongo, cassandra, big-table, google datastore

SQL Query Proxy to Elasticsearch, Mongo, Kubernetes, BigTable, etc.

Unify disparate data sources and files into a single federated view of your data, and query it with SQL without copying it into a data warehouse.

MySQL-compatible federated query engine for Elasticsearch, Mongo, Google Datastore, Cassandra, Google BigTable, Kubernetes, and file-based sources. The query engine hosts a MySQL protocol listener that rewrites SQL queries into each backend's native form (elasticsearch, mongo, cassandra, kubernetes-rest-api, bigtable). It works by implementing a full relational-algebra distributed execution engine that runs the SQL queries and poly-fills features missing from the underlying sources. A backend key-value store such as Cassandra can therefore get complete WHERE clause support as well as aggregate functions.
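
To make the poly-fill idea concrete, here is a minimal sketch in Go. It is not dataux's actual API: the `Scanner` interface, `kvStore` type, and `CountWhere` function are hypothetical names invented for illustration. It assumes a backend that can only return all of its rows, and shows the proxy layer evaluating the WHERE predicate and the count(*) aggregate on the backend's behalf.

```go
package main

import "fmt"

// Row is one record from a backend, keyed by column name.
type Row map[string]interface{}

// Scanner is a hypothetical minimal backend contract: it can only
// return every row, with no filtering or aggregation of its own.
type Scanner interface {
	Scan() []Row
}

// kvStore stands in for a key-value backend such as Cassandra.
type kvStore struct{ rows []Row }

func (k kvStore) Scan() []Row { return k.rows }

// CountWhere evaluates "SELECT count(*) ... WHERE pred" in the proxy
// layer, because the backend cannot evaluate the predicate itself.
func CountWhere(src Scanner, pred func(Row) bool) int {
	n := 0
	for _, r := range src.Scan() {
		if pred(r) {
			n++
		}
	}
	return n
}

func main() {
	src := kvStore{rows: []Row{
		{"team": "BOS", "games": 31},
		{"team": "NYA", "games": 12},
		{"team": "BOS", "games": 5},
	}}
	// Equivalent to: SELECT count(*) FROM t WHERE team = 'BOS' AND games > 10
	n := CountWhere(src, func(r Row) bool {
		return r["team"] == "BOS" && r["games"].(int) > 10
	})
	fmt.Println(n) // prints 1
}
```

A real engine would push down whatever the backend can evaluate natively and only poly-fill the remainder; this sketch poly-fills everything to keep the idea visible.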

Most similar to prestodb, but written in Go and focused on making it easy to add custom data sources as well as REST API sources.

Storage Sources

Features

  • Distributed: run queries across multiple servers.
  • Hackable Sources: very easy to add a new Source for your custom data, files, json, csv, storage.
  • Hackable Functions: add custom Go functions to extend the SQL language.
  • Joins: get join functionality between heterogeneous sources.
  • Frontends: currently only the MySQL protocol is supported, but RethinkDB (for a real-time API) is planned; frontends are pluggable.
  • Backends: Elasticsearch, Google-Datastore, Mongo, Cassandra, BigTable, and Kubernetes are currently implemented. Csv, Json files, and custom formats (protobuf) are in progress.
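
As a rough illustration of joins between heterogeneous sources, the sketch below performs an inner hash join in Go over rows that have already been fetched from two different backends. This is not how dataux is implemented internally; `Row` and `HashJoin` are hypothetical names, and the "mongo" and "elasticsearch" data are made-up in-memory samples.

```go
package main

import "fmt"

// Row is one record, with every column held as a string for simplicity.
type Row map[string]string

// HashJoin performs an inner equi-join of left and right on the column
// named key. It builds a hash index over right, then probes it with
// each left row. Rows that lack the key column are skipped.
func HashJoin(left, right []Row, key string) []Row {
	index := make(map[string][]Row)
	for _, r := range right {
		if v, ok := r[key]; ok {
			index[v] = append(index[v], r)
		}
	}
	var out []Row
	for _, l := range left {
		v, ok := l[key]
		if !ok {
			continue
		}
		for _, r := range index[v] {
			merged := Row{}
			for k, val := range l {
				merged[k] = val
			}
			for k, val := range r {
				merged[k] = val
			}
			out = append(out, merged)
		}
	}
	return out
}

func main() {
	// Pretend these rows came from a Mongo collection.
	users := []Row{
		{"user_id": "1", "name": "alice"},
		{"user_id": "2", "name": "bob"},
	}
	// Pretend these rows came from an Elasticsearch index.
	orders := []Row{
		{"user_id": "1", "total": "30"},
		{"user_id": "1", "total": "12"},
		{"user_id": "3", "total": "7"},
	}
	for _, r := range HashJoin(users, orders, "user_id") {
		fmt.Println(r["name"], r["total"])
	}
	// Prints:
	// alice 30
	// alice 12
}
```

Because the join runs in the query layer, neither backend needs join support of its own; the cost is that both inputs must be read into the proxy.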

Status

  • NOT Production ready. Currently supporting a few non-critical use-cases (ad-hoc queries, support tool) in production.

Try it Out

There are two examples:

  1. Create a CSV database of baseball data from http://seanlahman.com/baseball-archive/statistics/
  2. Connect to Google BigQuery public datasets (you will need a project, but usage will probably stay within the free quota).

# download files to local /tmp
mkdir -p /tmp/baseball
cd /tmp/baseball
curl -Ls http://seanlahman.com/files/database/baseballdatabank-2017.1.zip > bball.zip
unzip bball.zip

mv baseball*/core/*.csv .
rm bball.zip
rm -rf baseballdatabank-*

# run a docker container locally
docker run -e "LOGGING=debug" --rm -it -p 4000:4000 \
  -v /tmp/baseball:/tmp/baseball \
  gcr.io/dataux-io/dataux:latest


In another console, open the MySQL client:

# connect to the docker container you just started
mysql -h 127.0.0.1 -P4000


-- Now create a new Source
CREATE source baseball WITH {
  "type":"cloudstore", 
  "schema":"baseball", 
  "settings" : {
     "type": "localfs",
     "format": "csv",
     "path": "baseball/",
     "localpath": "/tmp"
  }
};

show databases;

use baseball;

show tables;

describe appearances;

select count(*) from appearances;

select * from appearances limit 10;


Big Query Example

# assuming you are running local, if you are instead in Google Cloud, or Google Container Engine
# you don't need the credentials or volume mount
docker run -e "GOOGLE_APPLICATION_CREDENTIALS=/.config/gcloud/application_default_credentials.json" \
  -e "LOGGING=debug" \
  --rm -it \
  -p 4000:4000 \
  -v ~/.config/gcloud:/.config/gcloud \
  gcr.io/dataux-io/dataux:latest

# now that dataux is running use mysql-client to connect
mysql -h 127.0.0.1 -P 4000

Now run some queries:

-- add a bigquery datasource
CREATE source `datauxtest` WITH {
    "type":"bigquery",
    "schema":"bqsf_bikes",
    "table_aliases" : {
       "bikeshare_stations" : "bigquery-public-data:san_francisco.bikeshare_stations"
    },
    "settings" : {
      "billing_project" : "your-google-cloud-project",
      "data_project" : "bigquery-public-data",
      "dataset" : "san_francisco"
    }
};

use bqsf_bikes;

show tables;

describe film_locations;

select * from film_locations limit 10;

Hacking

For now, the goal is to allow this to be used as a library, so the vendor directory is not checked in. Use docker containers or dep for now.

# run dep ensure
dep ensure -v 


Related Projects, Database Proxies & Multi-Data QL

  • Data-Accessibility: making it easier to query, access, share, and use data. Protocol shifting (for accessibility). Sharing/replication between db types.
  • Scalability/Sharding: implement sharding, connection sharing.

Name | Scaling | Ease Of Access (sql, etc) | Comments
--- | --- | --- | ---
Vitess | Y | | for scaling (sharding), very mature
twemproxy | Y | | for scaling memcache
Couchbase N1QL | Y | Y | sql interface to couchbase k/v (and full-text-index)
prestodb | | Y | query front end to multiple backends, distributed
cratedb | Y | Y | all-in-one db, not a proxy, sql to es
codis | Y | | for scaling redis
MariaDB MaxScale | Y | | for scaling mysql/mariadb (sharding) mature
Netflix Dynomite | Y | | not really sql, just multi-store k/v
redishappy | Y | | for scaling redis, haproxy
mixer | Y | | simple mysql sharding

We use more and more databases, flat files, message queues, etc. For databases, the primary reader/writer is fine, but secondary uses such as investigating ad-hoc issues mean we might be accessing and learning many different query languages.

Credit to mixer: the mysql connection pieces were derived from it (mixer itself was forked from vitess).

Inspiration/Other works

In Internet architectures, data systems are typically categorized into source-of-truth systems that serve as primary stores for the user-generated writes, and derived data stores or indexes which serve reads and other complex queries. The data in these secondary stores is often derived from the primary data through custom transformations, sometimes involving complex processing driven by business logic. Similarly data in caching tiers is derived from reads against the primary data store, but needs to get invalidated or refreshed when the primary data gets mutated. A fundamental requirement emerging from these kinds of data architectures is the need to reliably capture, flow and process primary data changes.

from Databus

Building

I plan on getting the vendor directory checked in soon so the build will work. However, I am currently trying to figure out how to organize packages to allow use as both a library and a daemon. (See how minimal main.go is, to encourage your own builtins and datasources.)

# for just docker

# ensure /vendor has correct versions
dep ensure -update 

# build binary
./.build

# build docker

docker build -t gcr.io/dataux-io/dataux:v0.15.1 .

