
alash3al / xyr

Licence: Apache-2.0
Query any data source using SQL; works with the local filesystem, S3, and more. It aims to be a tiny, lightweight alternative to AWS Athena, Presto, etc.

Programming Languages

go
31211 projects - #10 most used programming language
Dockerfile
14818 projects

Projects that are alternatives of or similar to xyr

Bigdata docker
Big Data Ecosystem Docker
Stars: ✭ 161 (+177.59%)
Mutual labels:  presto
sqlite-createtable-parser
A parser for sqlite create table sql statements.
Stars: ✭ 67 (+15.52%)
Mutual labels:  sqlite3
logstash-output-s3
No description or website provided.
Stars: ✭ 55 (-5.17%)
Mutual labels:  s3
Quix
Quix Notebook Manager
Stars: ✭ 184 (+217.24%)
Mutual labels:  presto
graphchain
⚡️ An efficient cache for the execution of dask graphs.
Stars: ✭ 63 (+8.62%)
Mutual labels:  s3
Luki
[Deprecated] The official repository for Luki the Discord bot
Stars: ✭ 21 (-63.79%)
Mutual labels:  sqlite3
Presto
The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+22239.66%)
Mutual labels:  presto
spring-file-storage-service
The FSS(file storage service) APIs make storing the blob file easy and simple .
Stars: ✭ 33 (-43.1%)
Mutual labels:  s3
pg-bifrost
PostgreSQL Logical Replication tool into Kinesis, S3 and RabbitMQ
Stars: ✭ 31 (-46.55%)
Mutual labels:  s3
s3bundler
ARCHIVED - see https://aws.amazon.com/about-aws/whats-new/2019/04/Amazon-S3-Introduces-S3-Batch-Operations-for-Object-Management/ Amazon S3 Bundler downloads billions of small S3 objects, bundles them into archives, and uploads them back into S3.
Stars: ✭ 26 (-55.17%)
Mutual labels:  s3
sqllex
The most pythonic ORM (for SQLite and PostgreSQL). Seriously, try it out!
Stars: ✭ 80 (+37.93%)
Mutual labels:  sqlite3
amazon-sns-java-extended-client-lib
This AWS SNS client library allows to publish messages to SNS that exceed the 256 KB message size limit.
Stars: ✭ 23 (-60.34%)
Mutual labels:  s3
mediasort
Upload manager using Laravel's built-in Filesystem/Cloud Storage
Stars: ✭ 20 (-65.52%)
Mutual labels:  s3
Presto Go Client
A Presto client for the Go programming language.
Stars: ✭ 183 (+215.52%)
Mutual labels:  presto
s3 asset deploy
Deploy & manage static assets on S3 with rolling deploys & rollbacks in mind.
Stars: ✭ 63 (+8.62%)
Mutual labels:  s3
Linkis
Linkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+3905.17%)
Mutual labels:  presto
aws-pdf-textract-pipeline
🔍 Data pipeline for crawling PDFs from the Web and transforming their contents into structured data using AWS textract. Built with AWS CDK + TypeScript
Stars: ✭ 141 (+143.1%)
Mutual labels:  s3
Swift-FFDB
a Object/Relational Mapping (ORM) support to iOS and MacOS .Since SwiftFFDB is build on top of FMDB.
Stars: ✭ 22 (-62.07%)
Mutual labels:  sqlite3
secure-media
Store private media securely in WordPress.
Stars: ✭ 22 (-62.07%)
Mutual labels:  s3
kafka-connect-fs
Kafka Connect FileSystem Connector
Stars: ✭ 107 (+84.48%)
Mutual labels:  s3

xyr

xyr is a very lightweight, simple, and powerful data ETL platform that helps you query your available data sources using SQL.

Example (Local Filesystem)

Here we define a new table called users which loads all JSON files in a directory (recursively). Any of the following JSON layouts is accepted: a single object or array of objects per file, newline-delimited objects/arrays, or even concatenated objects/arrays with no delimiter at all (like the Kinesis Firehose JSON output format).

Let's imagine we have a directory of JSON files called /tmp/data/users, and here is an example of a JSON file there:

{"id":10,"email":"[email protected]"}{"id":20,"email":"[email protected]"}{"id": 3,"email":"[email protected]"}{"id": 4,"email":"[email protected]"}

Then we can define its schema as follows:

# this file is `./config.xyr.hcl`

# where xyr should store its internal database
data_dir = "./tmp/db/"
table "users" {
    // the driver we want
    driver = "jsondir"

    // the data source directory
    source = "/tmp/data/users"

    // xyr creates a table in its internal storage, so it needs to know
    // at least the required column names of your data,
    // e.g.: {"id": 1, "email": "[email protected]", "age": 20}
    // we only need "id" and "email", so we list both in the columns array below;
    // note that the ordering matches the example above.
    columns = ["id", "email"]

    // what to load:
    // for jsondir, a regex pattern matched against filenames;
    // for SQL drivers, an SQL statement that reads the data
    // from the source database,
    // e.g.: "SELECT * FROM SOME_TABLE"
    filter = ".*"
}
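Because columns lists the keys to keep, in order, any extra keys in a record are dropped and the remaining values are written in the configured order. A minimal sketch of that projection step (the projectColumns helper is hypothetical, not xyr's actual code):

```go
package main

import "fmt"

// projectColumns picks the configured columns, in order, out of a decoded
// JSON record; keys missing from the record come back as nil.
func projectColumns(rec map[string]any, cols []string) []any {
	row := make([]any, len(cols))
	for i, c := range cols {
		row[i] = rec[c]
	}
	return row
}

func main() {
	rec := map[string]any{"id": 10, "email": "user@example", "age": 20}
	fmt.Println(projectColumns(rec, []string{"id", "email"})) // [10 user@example]
}
```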

Now it's time to load it:

$ xyr table:import users

Now let's query it

$ xyr exec "SELECT * FROM users"

All tables you define can easily be joined in the same query. Let's imagine we have the following definition:

# debug mode "affects the log level"
debug = true

# how many workers to use to write into our sqlite db
# 0 means current cpu cores count
workers_count = 0

# where xyr should store its internal database
data_dir = "./tmp/db/"


table "users" {
    driver = "s3jsondir"
    source = "s3://ACCESS_KEY:SECRET_KEY@/BUCKET_NAME?region=&ssl=false&path=true&perpage=1000"

    # which prefix we want to select
    filter = "xyr/users/"

    columns = ["id", "email"]
}

table "user_vists" {
    driver = "postgres"
    source = "postgresql://username:password@server:port/dbname?option1=value1"
    columns = ["user_id", "vists"]
    filter = "SELECT user_id, count(vists) AS vists FROM USERS GROUP BY user_id"
}

Now let's join them

$ xyr exec "SELECT * FROM users LEFT JOIN user_vists ON user_vists.user_id = users.id"

Installation

Use this Docker package.

Supported Drivers

Driver      Source Connection String
jsondir     /PATH/TO/JSON/DATA/DIR
s3jsondir   s3://[access_key_url_encoded]:[secret_key_url_encoded]@[endpoint_url]/bucket_name?region=&ssl=false&path=true&perpage=1000&downloaders_count=8&downloader_concurrency=8
mysql       username:password@tcp(server:port)/dbname?option1=value1&...
postgres    postgresql://username:password@server:port/dbname?option1=value1
sqlite3     /path/to/db.sqlite?option1=value1
sqlserver   sqlserver://username:password@host/instance?param1=value&param2=value
            sqlserver://username:password@host:port?param1=value&param2=value
            sqlserver://sa@localhost/SQLExpress?database=master&connection+timeout=30
hana        hdb://user:password@host:port
clickhouse  tcp://host1:9000?username=user&password=qwerty&database=clicks&read_timeout=10&write_timeout=20&alt_hosts=host2:9000,host3:9000
oracle      oracle://user:pass@server1/service?server=server2&server=server3

Use Cases

  • Simple Presto Alternative.
  • Simple AWS Athena Alternative.
  • Convert your JSON documents into a SQL DB.

How does it work?

Internally, xyr uses SQLite as an embedded SQL datastore (this may change in the future, and multiple datastores may be added). When you define a table in the XYRCONFIG file and run $ xyr table:import, all defined tables are imported; you can then query them via $ xyr exec "SELECT * FROM TABLE_NAME_HERE", which outputs JSON results by default.
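For the import step, this boils down to creating a table from the configured columns and inserting one row per decoded record. A sketch of the SQL an importer could issue (the buildImportSQL helper is hypothetical, not xyr's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// buildImportSQL derives the two statements the import step needs from a
// table definition: a CREATE TABLE over the configured columns, and a
// parameterized INSERT to be executed once per decoded record.
func buildImportSQL(table string, columns []string) (create, insert string) {
	create = fmt.Sprintf("CREATE TABLE IF NOT EXISTS %s (%s)",
		table, strings.Join(columns, ", "))
	placeholders := strings.TrimSuffix(strings.Repeat("?, ", len(columns)), ", ")
	insert = fmt.Sprintf("INSERT INTO %s (%s) VALUES (%s)",
		table, strings.Join(columns, ", "), placeholders)
	return create, insert
}

func main() {
	create, insert := buildImportSQL("users", []string{"id", "email"})
	fmt.Println(create) // CREATE TABLE IF NOT EXISTS users (id, email)
	fmt.Println(insert) // INSERT INTO users (id, email) VALUES (?, ?)
}
```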

Plan

  • Building the initial core.
  • Add the basic import command for importing the tables into xyr.
  • Add the exec command to execute SQL queries.
  • Add well-known SQL drivers
    • mysql
    • postgres
    • sqlite3
    • clickhouse
    • oracle
    • hana
    • sqlserver
  • Add an S3 driver
  • Adding/Improving documentation.
  • Expose another API besides the CLI to enable external apps to query xyr.
    • JSON Endpoint?
    • MySQL Protocol?
    • Redis Protocol?
  • Improving the code base (iteration 1).
  • Add another backend instead of sqlite3 as internal datastore?