beader / Tianchi_nl2sql

Third-place solution and code from the finals of Zhuiyi Technology's first Chinese NL2SQL Challenge

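First Chinese NL2SQL Challenge

Competition link

⚠️ Because of possible copyright issues, please contact the competition platform or the organizers yourself to request the competition data. Thank you!

💡 The code runtime environment is described at the end of this document.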

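Results

The approach used in this project ranked 5th on the online leaderboard in the semifinal round and finished 3rd in the final.

The code on the main branch is presented as Jupyter notebooks and is intended for learning and discussion. The original code has been tidied up, so it will not exactly reproduce the online results, but the performance should not be far off.

model1.ipynb and model2.ipynb in the code directory contain the modeling pipeline, and the nl2sql/utils directory contains the basic functions and data structures the task needs.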

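Acknowledgements

Thanks to Sun Ningyuan of Zhuiyi Technology for the detailed coaching before the competition.
Thanks to Su Jianlin, researcher at Zhuiyi Technology and author of the Scientific Spaces (科学空间) blog, for sharing many high-quality posts on NLP. This solution was inspired by his article "A BERT-based NL2SQL model: a concise baseline" (基于Bert的NL2SQL模型：一个简明的Baseline). The RAdam optimizer implementation used in this project comes directly from his open-source keras_radam project.
Thanks to CyberZHG for the open-source project keras-bert, which we used to build our models in this competition.
Thanks to the Joint Laboratory of HIT and iFLYTEK Research for the Chinese-BERT-wwm project. We used their BERT-wwm, Chinese pre-trained model weights in this competition.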

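Background

The first Chinese NL2SQL Challenge uses tabular data from finance and general domains as its data source and provides annotated pairs of natural-language questions and SQL statements built on that data. Participants are expected to use the data to train a model that accurately converts natural language into SQL.

The model takes a Question plus a Table as input and outputs a SQL structure, which corresponds to one SQL statement.

In this structure:

sel is a list of the columns chosen by the SELECT clause
agg is a list aligned one-to-one with sel, giving the aggregation applied to each selected column, such as sum, max, or min
conds is a list of the conditions in the WHERE clause; each condition is a triple of (condition column, condition operator, condition value)
cond_conn_op is an int that gives the relation joining the conditions in conds, which is either AND or OR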
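As a rough illustration of how such a SQL structure maps to a SQL statement, the sketch below renders one structure as a string. The integer-to-name mappings (`AGG_NAMES`, `OP_NAMES`, `CONN_NAMES`), the function name `sql_to_string`, and the example table are assumptions made for this sketch; check the competition data description for the real encodings.

```python
# Illustrative index-to-name mappings (assumptions, not taken from the text above).
AGG_NAMES = {0: "", 1: "AVG", 2: "MAX", 3: "MIN", 4: "COUNT", 5: "SUM"}
OP_NAMES = {0: ">", 1: "<", 2: "==", 3: "!="}
CONN_NAMES = {0: "", 1: "AND", 2: "OR"}


def sql_to_string(sql, headers):
    """Render a SQL structure as a readable SQL string (table name omitted)."""
    select_parts = []
    for col, agg in zip(sql["sel"], sql["agg"]):
        name = headers[col]
        agg_name = AGG_NAMES[agg]
        # Wrap the column in its aggregation only when one is requested.
        select_parts.append(f"{agg_name}({name})" if agg_name else name)
    query = "SELECT " + ", ".join(select_parts)

    cond_parts = [
        f'{headers[col]} {OP_NAMES[op]} "{val}"' for col, op, val in sql["conds"]
    ]
    if cond_parts:
        joiner = f" {CONN_NAMES[sql['cond_conn_op']]} "
        query += " WHERE " + joiner.join(cond_parts)
    return query


# Hypothetical table headers and label, for illustration only.
headers = ["year", "revenue", "volume"]
example = {
    "sel": [1],
    "agg": [5],
    "conds": [[0, 2, "2019"], [2, 0, "100"]],
    "cond_conn_op": 1,
}
print(sql_to_string(example, headers))
```

The structure refers to columns by index, so the header list of the table is needed to turn it back into readable SQL.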

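Approach

We apply a simple transformation to the original labels:

agg and sel are merged: agg is predicted for every column of the table, and a new class NO_OP indicates that the column is not selected
conds is split into two parts, conds_ops and conds_vals, so that prediction can be done in two steps. One model first predicts which columns conds uses and their operators, and a second model then predicts the comparison value for each chosen column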
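The label transformation described above could look roughly like the following sketch. The names `transform_labels`, `NO_OP`, and `AGG_NAMES`, the per-column layout of `conds_ops`, and the output dictionary are all assumptions for illustration; the notebooks may organize the labels differently.

```python
NO_OP = "NO_OP"  # hypothetical marker meaning "this column is not used"

# Assumed aggregation names; "NONE" means selected without aggregation.
AGG_NAMES = {0: "NONE", 1: "AVG", 2: "MAX", 3: "MIN", 4: "COUNT", 5: "SUM"}


def transform_labels(sql, num_columns):
    """Turn a raw SQL label into per-column targets plus condition values."""
    # One aggregation label per column; NO_OP means the column is not selected.
    sel_agg = [NO_OP] * num_columns
    for col, agg in zip(sql["sel"], sql["agg"]):
        sel_agg[col] = AGG_NAMES[agg]

    # One operator label per column; NO_OP means the column has no condition.
    # Simplification: a column with several conditions keeps only the last one.
    conds_ops = [NO_OP] * num_columns
    conds_vals = []
    for col, op, val in sql["conds"]:
        conds_ops[col] = op
        conds_vals.append((col, val))

    return {
        "sel_agg": sel_agg,
        "conds_ops": conds_ops,
        "conds_vals": conds_vals,
        "cond_conn_op": sql["cond_conn_op"],
    }


raw = {"sel": [1], "agg": [5], "conds": [[0, 2, "2019"]], "cond_conn_op": 0}
labels = transform_labels(raw, num_columns=3)
print(labels["sel_agg"])    # per-column aggregation targets for the first model
print(labels["conds_ops"])  # per-column operator targets for the first model
print(labels["conds_vals"]) # (column, value) pairs left for the second model
```

With this layout the first model only solves per-column classification problems, and the free-text condition values are deferred to the second model.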

Model 1

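Model 1 concatenates the Question and the Header in order and inserts a special marker, TEXT or REAL, before each Column. Any two of the untrained tokens that BERT reserves can be used as these two special tokens.

The architecture of Model 1 is as follows: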
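A minimal sketch of this input layout is shown below. The specific reserved tokens (`[unused11]`, `[unused12]`), the `[SEP]` placed after each column, and the function name `build_model1_tokens` are assumptions for illustration, and the tokens are assumed to be already split by a BERT tokenizer.

```python
# Assumed choice of reserved BERT tokens for the two column types.
TYPE_TOKENS = {"text": "[unused11]", "real": "[unused12]"}


def build_model1_tokens(question_tokens, headers):
    """Concatenate the question and typed column headers into one sequence.

    headers is a list of (column_tokens, column_type) pairs.
    Returns the token list and the index of each column's type marker.
    """
    tokens = ["[CLS]"] + list(question_tokens) + ["[SEP]"]
    header_positions = []
    for column_tokens, column_type in headers:
        header_positions.append(len(tokens))  # where the type marker will sit
        tokens.append(TYPE_TOKENS[column_type])
        tokens.extend(column_tokens)
        tokens.append("[SEP]")  # assumed separator between columns
    return tokens, header_positions


question = ["total", "revenue", "in", "2019"]
columns = [(["year"], "real"), (["company"], "text")]
tokens, header_positions = build_model1_tokens(question, columns)
print(tokens)
print(header_positions)
```

One plausible use of `header_positions` is to read the encoder output at each marker as that column's representation for the per-column predictions, though the text above does not spell this out.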

Model 2

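Model 2 is responsible for predicting cond_val. Our idea is to take the cond_col chosen by Model 1, enumerate cond_op and cond_val to generate a series of candidate combinations, and treat these combinations as multiple binary classification problems.

The architecture of Model 2 is as follows:

Finally, Model 2's predictions over the series of candidate combinations are merged.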
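The candidate enumeration could be sketched as follows. The text does not say how candidate values are obtained, so `value_candidates` (a mapping from column index to candidate strings), the operator set `OPS`, and the function name are assumptions for illustration.

```python
from itertools import product

OPS = [">", "<", "==", "!="]  # assumed operator set


def enumerate_candidates(cond_cols, value_candidates):
    """Return every (cond_col, cond_op, cond_val) triple to be scored.

    cond_cols: column indices chosen by the first model.
    value_candidates: dict mapping a column index to candidate value strings
    (how these are extracted is not specified here).
    """
    candidates = []
    for col in cond_cols:
        for op, val in product(OPS, value_candidates.get(col, [])):
            candidates.append((col, op, val))
    return candidates


cands = enumerate_candidates([0, 2], {0: ["2019"], 2: ["100", "200"]})
print(len(cands))
print(cands[:4])
```

Each triple would then be paired with the question and scored by a binary classifier as either a correct or an incorrect condition.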
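One simple way to merge the per-candidate predictions is to keep every candidate whose predicted probability exceeds a threshold. This is only a sketch: the 0.5 threshold, the function name `merge_predictions`, and the example scores are assumptions, and the actual merging rule in the notebooks may differ.

```python
def merge_predictions(candidates, probabilities, threshold=0.5):
    """Keep the candidate conditions that the binary classifier accepts."""
    conds = []
    for candidate, prob in zip(candidates, probabilities):
        # Skip rejected candidates and avoid emitting the same condition twice.
        if prob > threshold and candidate not in conds:
            conds.append(candidate)
    return conds


# Hypothetical candidates and classifier scores, for illustration only.
scored_candidates = [(0, "==", "2019"), (0, ">", "2019"), (2, ">", "100")]
scores = [0.93, 0.08, 0.71]
final_conds = merge_predictions(scored_candidates, scores)
print(final_conds)
```

The surviving triples form the conds part of the final SQL structure.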

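Experiments during model training

For details on the optimizations made during training, and on the ideas that did and did not work, see the slides from our final-round defense.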

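Runtime environment

Deep learning frameworks: tensorflow, keras

See requirements.txt for the exact versions.

A more convenient option is to run everything in Docker. The following Docker image was used during the competition: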

REPOSITORY	TAG	IMAGE ID
tensorflow/tensorflow	nightly-gpu-py3-jupyter	6e60684e9aa4

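Because python3.6 is required, a tensorflow nightly build image was used. I have uploaded the image used during the competition to Docker Hub, and it can be pulled with the following command.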

docker pull beader/tensorflow:nightly-gpu-py3-jupyter
