All Projects → zhouqingqing → qpmodel

zhouqingqing / qpmodel

Licence: MIT license
A Relational Optimizer and Executor

Programming Languages

C#
18002 projects
ANTLR
299 projects

Projects that are alternatives of or similar to qpmodel

gg
General Golang Code Generator
Stars: ✭ 88 (+62.96%)
Mutual labels:  codegen
butte
No description or website provided.
Stars: ✭ 69 (+27.78%)
Mutual labels:  codegen
graphql-codegen-apollo-next-ssr
Autogenerate apollo code for nextjs ssr
Stars: ✭ 176 (+225.93%)
Mutual labels:  codegen
vox
Vox language compiler. AOT / JIT / Linker. Zero dependencies
Stars: ✭ 288 (+433.33%)
Mutual labels:  codegen
cmake-reflection-template
A template for simple C++ reflection done with CMake and Python (no other external tools)
Stars: ✭ 37 (-31.48%)
Mutual labels:  codegen
graphql-ts-client
Typescript DSL for GraphQL.
Stars: ✭ 124 (+129.63%)
Mutual labels:  codegen
xyr
Query any data source using SQL, works with the local filesystem, s3, and more. It should be a very tiny and lightweight alternative to AWS Athena, Presto ... etc.
Stars: ✭ 58 (+7.41%)
Mutual labels:  distributed-query
gqty
a GraphQL client built for rapid iteration
Stars: ✭ 420 (+677.78%)
Mutual labels:  codegen
startle
Startle C Library
Stars: ✭ 15 (-72.22%)
Mutual labels:  codegen
rosetta
Easy to use Rust i18n library based on code generation
Stars: ✭ 37 (-31.48%)
Mutual labels:  codegen
DotNetJS
Consume C# in JavaScript with comfort: single-file UMD library, auto-generated 2-way bindings and type definitions
Stars: ✭ 551 (+920.37%)
Mutual labels:  codegen
d2a
A translator Django into SQLAlchemy.
Stars: ✭ 23 (-57.41%)
Mutual labels:  codegen
ogen
OpenAPI v3 code generator for go
Stars: ✭ 436 (+707.41%)
Mutual labels:  codegen
evon
Fast and versatile event dispatcher code generator for Golang
Stars: ✭ 15 (-72.22%)
Mutual labels:  codegen
domainrobot-api
Swagger documentation for different APIs powered by InterNetX GmbH.
Stars: ✭ 14 (-74.07%)
Mutual labels:  codegen
slm-code-generation
TensorFlow code for the neural network presented in the paper: "Structural Language Models of Code" (ICML'2020)
Stars: ✭ 75 (+38.89%)
Mutual labels:  codegen
graphql-cli-codegen
apollo-codegen plugin for graphql-cli
Stars: ✭ 22 (-59.26%)
Mutual labels:  codegen
sbt-guardrail
Principled code generation from OpenAPI specifications
Stars: ✭ 24 (-55.56%)
Mutual labels:  codegen
EasyEE-Auto
EasyEE 自动化代码生成器。EasyEE Automated code generator.
Stars: ✭ 39 (-27.78%)
Mutual labels:  codegen
graphql-java-codegen-gradle-plugin
Gradle plugin for graphql-java-codegen
Stars: ✭ 19 (-64.81%)
Mutual labels:  codegen

Build Status Build Status CodeFactor Coverage Status

An Experimental Relational Optimizer and Executor

This project implements a relational optimizer and executor in c#. It is called "experimental" because it does not have all the details carved. The main target is the optimizer, and the purpose is to prepare for a more serious production implementation later. The executor part is needed for plan correctness verification with TPCH/DS end to end runnable.

It is built on top of many database research results, and it has been open-sourced in the hope that others may find it useful and database community can provide feedback and ways to improve it.

Why C#

Optimizer is logic centric, so a high-level language is preferred. After experiments, production may want to turn it into some C/C++ code, so the language must be a close relative of them. C# (.net core) provides some great features like cross-platform, LINQ, dynamic types to make modeling easy, and it is close enough to C++ (and that's why not python).

Optimizer

The optimizer exercises the following constructs:

  • Top down/bottom up structure: the optimizer does utilize a top down cascades style optimizer structure but optionally you can choose to use bottom up join order resolver. It currently use DPccp ("Analysis of Two Existing and One New Dynamic Programming Algorithm") by G. Moerkotte, et al. It also implements some other join order resolver like DPBushy (TDBasic, GOO), mainly for the purpose of correctness verification. A more generic join resolver DPHyper (Dynamic Programming Strikes Back) is in preparation.
  • Subquery decorrelation: it follows the "Unnesting Arbitrary Queries" and "The Complete Story of Joins (in Hyper)" by T. Neumann et al.
  • CTE inline/non-inline: it follows the Optimization of Common Table Expressions in MPP Database Systems by A. El-Helw et al.
  • Cardinality estimation, costing: currently this follows "as-is" textbook implementation. We did not spend much time here because it is a local issue, meaning later improvements shall not impact the architecture. CE also demonstrate upgrade or version management.
  • Distributed plan: follows the remote exchange model (gather, redistributed etc). A naive in-machine parallelism can be modeled with the same scheme.
  • The optimizer also exposes a DataSet like interface which demonstrates language integration. Another example calculating Pi with Monte-Carlo can be found in Unittest/TestDataSet.
	// leverage c#'s sqrt so no reinventing wheels
	double sqroot(double d) => Math.Sqrt(d);
	SQLContext.Register<double, string>("sqroot", sqroot);

	// use sqroot in SQL
	var sql = "SELECT a1, sqroot(b1*a1+2) from a join b on b2=a2 where a1>1";

	// above query in DataSet form
	var a = sqlContext.Read("a");
	var b = sqlContext.Read("b");
	var rows = a.filter("a1>1").join(b, "b2=a2").select("a1", "sqroot(b1*a1+2)").show();
  • Verify the optimizer by some unittests and TPCH/DS. All TPCH queries are runnable. TPCDS we do not support window function and rolling groups. You can find TPCH plan here.

Executor

The executor is implemented for two purpose: (1) verify plan correctness; (2) test codeGen engineering readiness following paper "How to Architect a query compiler, Revisited" by R.Y.Tahboub et al. Executor is not intended to demonstrate implementing specific operator efficiently.

  • It does not use a LMS underneath, but thanks for c#'s interpolated string support, the template is easy to read and construct side by side with interpreted code.
  • To support incremental codeGen improvement, meaning we can ship partial codeGen implementation, it supports a generic expression evaluation function which can fallback to the interpreted implementation if that specific expression is not codeGen ready.
  • It emulates a cross machine execution using parallel threads using remote exchange model. Current setting is 10 machines, so a plan with 1 Gather and 6 redistribution utilizes 70 threads for emulation.
  • The executor utilizes Object and dynamic types to simplify implementation. To port it to C/C++, you can follow the traditional "dispatch" mechanism like a + b => {int4pl(a,b), int2pl(a,b), doublepl(a,b), ...}.

You can see an example generated code for this query here with generic expression evaluation is in this code snippet:

	// projection on PhysicHashAgg112: Output: {a.a2}[0]*2,{count(a.a1)}[1],repeat('a',{a.a2}[0]) 
	Row rproj = new Row(3);
	rproj[0] = ((dynamic)r112[0] * (dynamic)2);
	rproj[1] = r112[1];
	rproj[2] = ExprSearch.Locate("74").Exec(context, r112) /*repeat('a',{a.a2}[0])*/;
	r112 = rproj;

How to Play

The program is based on .net core, so it runs on both Windows and Linux. Load the project with Visual Studio 2019 free community edition or visual studio code. Program.Main() mainly for debugging purpose. There are several builtin tables like 'a', 'b', 'c', 'd' for testing purpose, and you can find their definition in Catalog.createBuildInTestTables(). If you come up with a PR, make sure you have unittest passed.

Unittest comes up with multiple tests covering major functionalities:

  • TPCH end to end power run with small data set
  • TPCDS, JoBench only exercise the optimizer
  • Other grouped unittest stressing different aspects
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].