Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

Created with love in Canada, visit hostnodejs.com today

Feel like to post an Ad? Learn Details

All Projects → FerreroJeremy → Ln2sql

FerreroJeremy / Ln2sql

Licence: gpl-3.0

A tool to query a database in natural language

Programming Languages

python

139335 projects - #7 most used programming language

Labels

database natural-language-processing

Projects that are alternatives of or similar to Ln2sql

Chat

基于自然语言理解与机器学习的聊天机器人，支持多用户并发及自定义多轮对话

Stars: ✭ 516 (+28.04%)

Mutual labels: database, natural-language-processing

Awesome Tensorlayer

A curated list of dedicated resources and applications

Stars: ✭ 248 (-38.46%)

Mutual labels: database, natural-language-processing

Wikisql

A large annotated semantic parsing corpus for developing natural language interfaces.

Stars: ✭ 965 (+139.45%)

Mutual labels: database, natural-language-processing

Fakenewscorpus

A dataset of millions of news articles scraped from a curated list of data sources.

Stars: ✭ 255 (-36.72%)

Mutual labels: database, natural-language-processing

Web Database Analytics

Web scrapping and related analytics using Python tools

Stars: ✭ 175 (-56.58%)

Mutual labels: database, natural-language-processing

Text2sql Data

A collection of datasets that pair questions with SQL queries.

Stars: ✭ 287 (-28.78%)

Mutual labels: database, natural-language-processing

My Cs Degree

A CS degree with a focus on full-stack ML engineering, 2020

Stars: ✭ 391 (-2.98%)

Mutual labels: natural-language-processing

Collective Knowledge framework (CK) helps to organize black-box research software as a database of reusable components and micro-services with common APIs, automation actions and extensible meta descriptions. See real-world use cases from Arm, General Motors, ACM, Raspberry Pi foundation and others:

Stars: ✭ 395 (-1.99%)

Mutual labels: database

Operator

KubeDB Operator

Stars: ✭ 387 (-3.97%)

Mutual labels: database

Samples

Sample projects using Material, Graph, and Algorithm.

Stars: ✭ 386 (-4.22%)

Mutual labels: database

D2l Vn

Một cuốn sách tương tác về học sâu có mã nguồn, toán và thảo luận. Đề cập đến nhiều framework phổ biến (TensorFlow, Pytorch & MXNet) và được sử dụng tại 175 trường Đại học.

Stars: ✭ 402 (-0.25%)

Mutual labels: natural-language-processing

Projects

🪐 End-to-end NLP workflows from prototype to production

Stars: ✭ 397 (-1.49%)

Mutual labels: natural-language-processing

Dl topics

List of DL topics and resources essential for cracking interviews

Stars: ✭ 392 (-2.73%)

Mutual labels: natural-language-processing

Sqlboiler

Generate a Go ORM tailored to your database schema.

Stars: ✭ 4,497 (+1015.88%)

Mutual labels: database

Neuronlp2

Deep neural models for core NLP tasks (Pytorch version)

Stars: ✭ 397 (-1.49%)

Mutual labels: natural-language-processing

Orc

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads

Stars: ✭ 389 (-3.47%)

Mutual labels: database

Anlp19

Course repo for Applied Natural Language Processing (Spring 2019)

Stars: ✭ 402 (-0.25%)

Mutual labels: natural-language-processing

Tf Seq2seq

Sequence to sequence learning using TensorFlow.

Stars: ✭ 387 (-3.97%)

Mutual labels: natural-language-processing

Firebase Instagram

📸 Instagram clone with Firebase Cloud Firestore, Expo, and React Native 😁😍

Stars: ✭ 389 (-3.47%)

Mutual labels: database

Model

Ruby persistence framework with entities and repositories

Stars: ✭ 399 (-0.99%)

Mutual labels: database

View All Similar Projects ➔

ln2sql

ln2sql is a NLP tool to query a database in natural language. The tool takes in input a database model and a sentence and translate the latter in a valid SQL statement able to query the input data model.

Scientific paper

The initial tool is described in the following French paper (which can be found in the docs/ directory):

Benoît Couderc and Jérémy Ferrero. fr2sql : Database Query in French. (fr2sql : Interrogation de bases de données en français [in French]). In Proceedings of the 17th RECITAL (affiliated with the 22th TALN Conference). June 2015. Caen, France. ATALA. pp.1-12

Please cite the paper if you use ln2sql.

Differences between the version of the paper (fr2sql) and this version (ln2sql)

ln2sql is not the state-of-the-art tool for copyright reasons. It's just a quick & dirty Python wrapper but it has some speed optimizations.

A data model is only learned from a parsing of a SQL dump file. Thus, no database connection is needed.
In the paper, TreeTagger is used to filter the words of the input sentence according to its POS tagging. In this way, a mapping between the keywords of the input sentence and the keywords of the data model can be performed. In ln2sql, Treegagger is left in favour of an import of personal configuration files (for languages, stop words and synonyms) to be more generic.

Beware that ln2sql cannot therefore automatically solve the gender and number problem. So if the word "students" is in the input sentence, it does not match with the table "student" in the model of data. To do that, the equivalence "students → student" must be appear in the used thesaurus. If you want a version using TreeTagger, a Python wrapper exists and a documentation can be found here.

In theory, all languages can be supported.
A grammar still parse the input sentence to generate the corresponding query structure, but now this structure is stocked in a Python class able to print a query structure JSON file. Thus, the hash map for the query generation was abandoned. In addition, a multi-threading implementation was adopted.

SQL statements supported

[X] SELECT
- [X] one column
- [X] multiple columns
- [X] all columns
- [X] distinct select
- [X] aggregate functions
  - [X] count-select
  - [X] sum-select
  - [X] avg-select
  - [X] min-select
  - [X] max-select
[X] JOIN
- [X] inner join
- [X] natural join
[X] WHERE
- [X] one condition
- [X] multiple conditions
- [X] junction
- [X] disjunction
- [X] cross-condition
- [X] operators
  - [X] equal operator
  - [X] not equal operator
  - [X] greater-than operator
  - [X] less-than operator
  - [X] like operator
  - [ ] between operator (not 100% efficient)
- [X] aggregate functions
  - [X] sum in condition
  - [X] avg in condition
  - [X] min in condition
  - [X] max in condition
[X] ORDER BY
- [X] ASC
- [X] DESC
[X] GROUP BY
[X] multiple queries
[X] exception and error handling
[ ] detection of values (not 100% efficient)

How to use it?

Supported languages

The tool can deal with any language, so long as it has its configuration file (i.e. a file with the keywords of the language).

Language configuration files can be found in lang/ directory. The files are CSV files. Each line represent a type of keywords. Anything before the colon is ignored. Keywords must be separated by a comma.

You can build your own language configuration file following the English and French templates.

Database input

To be effective ln2sql need to learn the data model of the database that the user want to query. It need to load the corresponding SQL dump file to do that.
A database dump is a file containing a record of the table structure and/or the data of a database.

Usage of the Database class

database = Database()
database.load("database/tal.sql")
database.print_me()

For the following SQL statements loaded, the output in the terminal looks like:

Thesaurus and stop word list

You can improve the keyword filtering using a thesaurus. Thesaurus can be found in thesaurus/ directory. You can build your own thesaurus following the OpenOffice template.

You can improve the stop word filtering using a stop word list. You can build your own stop word list following the template of the lists in stopwords/ directory.

Usage

You can directly use the python wrapper by the following way:

Usage:
	From the cloned source:
	python3 -m ln2sql.main -d <path> -l <path> -i <input-sentence> [-j <path>] [-t <path>] [-s <path>]
Parameters:
	-h					print this help message
	-d <path>				path to sql dump file
	-l <path>				path to language configuration file
	-i <input-sentence>			input sentence to parse
	-j <path>				path to JSON output file
	-t <path>				path to thesaurus file
	-s <path>				path to stopwords file

example of usage:

python3 -m ln2sql.main -d database_store/city.sql -l lang_store/english.csv -j output.json -i "Count how many city there are with the name blob?"

or by graphical interface by typing the following command:

python ln2sql/ln2sql_gui.py

a window like the one below will appear:

JSON output

With the following input:

What is the average age of students whose name is Doe or age over 25?

the output is:

{
	"select": {
		"column": "age",
		"type": "AVG"
	},
	"from": {
		"table": "student"
	},
	"join": {

	},
	"where": {
		"conditions": [
			{ "column": "name",
			  "operator": "=",
			  "value": "Doe"
			},
			{
			  "operator": "OR"
			},
			{ "column": "age",
			  "operator": ">",
			  "value": "25"
			}
		]
	},
	"group_by": {

	},
	"order_by": {

	}
}

Conception

The tool is implemented under the Model-View-Controller pattern. The classes imported from the Python Standard Library do not appear in the diagram except those required for inheritance (e.g. Thread or Exception).

The above diagram was modeled with StarUML.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].

Stars: ✭ 403

Visit Git Page 🔗Visit User Page 🔗Visit Issues Page (15) 🔗