All Projects → FerreroJeremy → Ln2sql

FerreroJeremy / Ln2sql

Licence: gpl-3.0
A tool to query a database in natural language

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Ln2sql

Chat
基于自然语言理解与机器学习的聊天机器人,支持多用户并发及自定义多轮对话
Stars: ✭ 516 (+28.04%)
Mutual labels:  database, natural-language-processing
Awesome Tensorlayer
A curated list of dedicated resources and applications
Stars: ✭ 248 (-38.46%)
Mutual labels:  database, natural-language-processing
Wikisql
A large annotated semantic parsing corpus for developing natural language interfaces.
Stars: ✭ 965 (+139.45%)
Mutual labels:  database, natural-language-processing
Fakenewscorpus
A dataset of millions of news articles scraped from a curated list of data sources.
Stars: ✭ 255 (-36.72%)
Mutual labels:  database, natural-language-processing
Web Database Analytics
Web scrapping and related analytics using Python tools
Stars: ✭ 175 (-56.58%)
Mutual labels:  database, natural-language-processing
Text2sql Data
A collection of datasets that pair questions with SQL queries.
Stars: ✭ 287 (-28.78%)
Mutual labels:  database, natural-language-processing
My Cs Degree
A CS degree with a focus on full-stack ML engineering, 2020
Stars: ✭ 391 (-2.98%)
Mutual labels:  natural-language-processing
Ck
Collective Knowledge framework (CK) helps to organize black-box research software as a database of reusable components and micro-services with common APIs, automation actions and extensible meta descriptions. See real-world use cases from Arm, General Motors, ACM, Raspberry Pi foundation and others:
Stars: ✭ 395 (-1.99%)
Mutual labels:  database
Operator
KubeDB Operator
Stars: ✭ 387 (-3.97%)
Mutual labels:  database
Samples
Sample projects using Material, Graph, and Algorithm.
Stars: ✭ 386 (-4.22%)
Mutual labels:  database
D2l Vn
Một cuốn sách tương tác về học sâu có mã nguồn, toán và thảo luận. Đề cập đến nhiều framework phổ biến (TensorFlow, Pytorch & MXNet) và được sử dụng tại 175 trường Đại học.
Stars: ✭ 402 (-0.25%)
Mutual labels:  natural-language-processing
Projects
🪐 End-to-end NLP workflows from prototype to production
Stars: ✭ 397 (-1.49%)
Mutual labels:  natural-language-processing
Dl topics
List of DL topics and resources essential for cracking interviews
Stars: ✭ 392 (-2.73%)
Mutual labels:  natural-language-processing
Sqlboiler
Generate a Go ORM tailored to your database schema.
Stars: ✭ 4,497 (+1015.88%)
Mutual labels:  database
Neuronlp2
Deep neural models for core NLP tasks (Pytorch version)
Stars: ✭ 397 (-1.49%)
Mutual labels:  natural-language-processing
Orc
Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
Stars: ✭ 389 (-3.47%)
Mutual labels:  database
Anlp19
Course repo for Applied Natural Language Processing (Spring 2019)
Stars: ✭ 402 (-0.25%)
Mutual labels:  natural-language-processing
Tf Seq2seq
Sequence to sequence learning using TensorFlow.
Stars: ✭ 387 (-3.97%)
Mutual labels:  natural-language-processing
Firebase Instagram
📸 Instagram clone with Firebase Cloud Firestore, Expo, and React Native 😁😍
Stars: ✭ 389 (-3.47%)
Mutual labels:  database
Model
Ruby persistence framework with entities and repositories
Stars: ✭ 399 (-0.99%)
Mutual labels:  database

ln2sql

ln2sql is a NLP tool to query a database in natural language. The tool takes in input a database model and a sentence and translate the latter in a valid SQL statement able to query the input data model.

Scientific paper

The initial tool is described in the following French paper (which can be found in the docs/ directory):

Benoît Couderc and Jérémy Ferrero. fr2sql : Database Query in French. (fr2sql : Interrogation de bases de données en français [in French]). In Proceedings of the 17th RECITAL (affiliated with the 22th TALN Conference). June 2015. Caen, France. ATALA. pp.1-12

Please cite the paper if you use ln2sql.

Differences between the version of the paper (fr2sql) and this version (ln2sql)

ln2sql is not the state-of-the-art tool for copyright reasons. It's just a quick & dirty Python wrapper but it has some speed optimizations.

Beware that ln2sql cannot therefore automatically solve the gender and number problem. So if the word "students" is in the input sentence, it does not match with the table "student" in the model of data. To do that, the equivalence "students → student" must be appear in the used thesaurus. If you want a version using TreeTagger, a Python wrapper exists and a documentation can be found here.

  • In theory, all languages can be supported.

  • A grammar still parse the input sentence to generate the corresponding query structure, but now this structure is stocked in a Python class able to print a query structure JSON file. Thus, the hash map for the query generation was abandoned. In addition, a multi-threading implementation was adopted.

SQL statements supported

  • [X] SELECT
    • [X] one column
    • [X] multiple columns
    • [X] all columns
    • [X] distinct select
    • [X] aggregate functions
      • [X] count-select
      • [X] sum-select
      • [X] avg-select
      • [X] min-select
      • [X] max-select
  • [X] JOIN
    • [X] inner join
    • [X] natural join
  • [X] WHERE
    • [X] one condition
    • [X] multiple conditions
    • [X] junction
    • [X] disjunction
    • [X] cross-condition
    • [X] operators
      • [X] equal operator
      • [X] not equal operator
      • [X] greater-than operator
      • [X] less-than operator
      • [X] like operator
      • [ ] between operator (not 100% efficient)
    • [X] aggregate functions
      • [X] sum in condition
      • [X] avg in condition
      • [X] min in condition
      • [X] max in condition
  • [X] ORDER BY
    • [X] ASC
    • [X] DESC
  • [X] GROUP BY
  • [X] multiple queries
  • [X] exception and error handling
  • [ ] detection of values (not 100% efficient)

How to use it?

Supported languages

The tool can deal with any language, so long as it has its configuration file (i.e. a file with the keywords of the language).

Language configuration files can be found in lang/ directory. The files are CSV files. Each line represent a type of keywords. Anything before the colon is ignored. Keywords must be separated by a comma.

You can build your own language configuration file following the English and French templates.

Database input

To be effective ln2sql need to learn the data model of the database that the user want to query. It need to load the corresponding SQL dump file to do that.
A database dump is a file containing a record of the table structure and/or the data of a database.

Usage of the Database class
database = Database()
database.load("database/tal.sql")
database.print_me()

For the following SQL statements loaded, the output in the terminal looks like:

Thesaurus and stop word list

You can improve the keyword filtering using a thesaurus. Thesaurus can be found in thesaurus/ directory. You can build your own thesaurus following the OpenOffice template.

You can improve the stop word filtering using a stop word list. You can build your own stop word list following the template of the lists in stopwords/ directory.

Usage

You can directly use the python wrapper by the following way:

Usage:
	From the cloned source:
	python3 -m ln2sql.main -d <path> -l <path> -i <input-sentence> [-j <path>] [-t <path>] [-s <path>]
Parameters:
	-h					print this help message
	-d <path>				path to sql dump file
	-l <path>				path to language configuration file
	-i <input-sentence>			input sentence to parse
	-j <path>				path to JSON output file
	-t <path>				path to thesaurus file
	-s <path>				path to stopwords file

example of usage:

python3 -m ln2sql.main -d database_store/city.sql -l lang_store/english.csv -j output.json -i "Count how many city there are with the name blob?"

or by graphical interface by typing the following command:

python ln2sql/ln2sql_gui.py

a window like the one below will appear:

JSON output

With the following input:

What is the average age of students whose name is Doe or age over 25?

the output is:

{
	"select": {
		"column": "age",
		"type": "AVG"
	},
	"from": {
		"table": "student"
	},
	"join": {

	},
	"where": {
		"conditions": [
			{ "column": "name",
			  "operator": "=",
			  "value": "Doe"
			},
			{
			  "operator": "OR"
			},
			{ "column": "age",
			  "operator": ">",
			  "value": "25"
			}
		]
	},
	"group_by": {

	},
	"order_by": {

	}
}

Conception

The tool is implemented under the Model-View-Controller pattern. The classes imported from the Python Standard Library do not appear in the diagram except those required for inheritance (e.g. Thread or Exception).

The above diagram was modeled with StarUML.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].