haven-jeon / LegalQA

Licence: other
Korean LegalQA using SentenceKoBART

Programming Languages

python

Projects that are alternatives to or similar to LegalQA

hangul-search-js
🇰🇷 Simple Korean text search module
Stars: ✭ 22 (-71.43%)
Mutual labels: korean-nlp, korean-language
stackoverflow-semantic-search
Word2Vec encodings based search engine for Stackoverflow questions
Stars: ✭ 23 (-70.13%)
Mutual labels: search-engine, semantic-search
Haystack
🔍 Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.
Stars: ✭ 3,409 (+4327.27%)
Mutual labels: search-engine, semantic-search
Acts as indexed
Acts As Indexed is a plugin which provides a pain-free way to add fulltext search to your Ruby on Rails app
Stars: ✭ 211 (+174.03%)
Mutual labels: search-engine
Search Engine Parser
Lightweight package to query popular search engines and scrape for result titles, links and descriptions
Stars: ✭ 216 (+180.52%)
Mutual labels: search-engine
Elasticsearch
Free and Open, Distributed, RESTful Search Engine
Stars: ✭ 57,778 (+74936.36%)
Mutual labels: search-engine
patzilla
PatZilla is a modular patent information research platform and data integration toolkit with a modern user interface and access to multiple data sources.
Stars: ✭ 71 (-7.79%)
Mutual labels: search-engine
Examine
A .NET indexing and search engine powered by Lucene.Net
Stars: ✭ 208 (+170.13%)
Mutual labels: search-engine
FUTURE
A private, free, open-source search engine built on a P2P network
Stars: ✭ 19 (-75.32%)
Mutual labels: search-engine
Algoliasearch Laravel
[Deprecated] We now recommend using Laravel Scout, see =>
Stars: ✭ 242 (+214.29%)
Mutual labels: search-engine
Dweb.page
Your Gateway to the Distributed Web
Stars: ✭ 239 (+210.39%)
Mutual labels: search-engine
Magnetico
Autonomous (self-hosted) BitTorrent DHT search engine suite.
Stars: ✭ 2,626 (+3310.39%)
Mutual labels: search-engine
KoSpacing
Automatic Korean word spacing with R
Stars: ✭ 76 (-1.3%)
Mutual labels: korean-nlp
Scout
RESTful search server written in Python, powered by SQLite.
Stars: ✭ 213 (+176.62%)
Mutual labels: search-engine
KoBERT-Transformers
KoBERT on 🤗 Huggingface Transformers 🤗 (with Bug Fixed)
Stars: ✭ 162 (+110.39%)
Mutual labels: korean-nlp
Alfanous
Alfanous is an Arabic search engine API provides the simple and advanced search in Quran , more features and many interfaces...
Stars: ✭ 209 (+171.43%)
Mutual labels: search-engine
first-contrib-app
A search engine to find good beginner issues across Github and become an open source contributor !
Stars: ✭ 33 (-57.14%)
Mutual labels: search-engine
Magnetissimo
Web application that indexes all popular torrent sites, and saves it to the local database.
Stars: ✭ 2,551 (+3212.99%)
Mutual labels: search-engine
Addok
Search engine for address. Only address.
Stars: ✭ 226 (+193.51%)
Mutual labels: search-engine
Conceptualsearch
Train a Word2Vec model or LSA model, and Implement Conceptual Search\Semantic Search in Solr\Lucene - Simon Hughes Dice.com, Dice Tech Jobs
Stars: ✭ 245 (+218.18%)
Mutual labels: search-engine

LegalQA using SentenceKoBART

An implementation of a legal question-answering (QA) system based on SentenceKoBART.

Setup

# Install Git LFS: https://github.com/git-lfs/git-lfs/wiki/Installation
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt install git-lfs
git clone https://github.com/haven-jeon/LegalQA.git
cd LegalQA
git lfs pull
# If the LFS quota is exceeded, download SentenceKoBART.bin from the link below
# and move it into the model/ directory:
# https://drive.google.com/file/d/1DJFMknxT7OAAWYFV_WGW2UcCxmuf3cp_/view?usp=sharing
# mv SentenceKoBART.bin model/
# Alternative install using pip's legacy resolver:
# pip install --use-deprecated=legacy-resolver -r requirements.txt
pip install -r requirements.txt
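If `git lfs pull` fails because the quota is exceeded, the checkout may still contain a small Git LFS pointer file in place of the real model weights. The following is a minimal sketch for telling the two apart, assuming the standard Git LFS pointer format (a text file whose first line is `version https://git-lfs.github.com/spec/v1`) and the model/SentenceKoBART.bin location implied by the `mv` step above. The helper names are hypothetical and not part of this repository.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this spec line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

# Assumed location, based on the `mv SentenceKoBART.bin model/` step above.
MODEL_PATH = Path("model/SentenceKoBART.bin")


def looks_like_lfs_pointer(head: bytes) -> bool:
    """Return True if the first bytes of a file match an LFS pointer."""
    return head.startswith(LFS_POINTER_PREFIX)


def check_model(path: Path = MODEL_PATH) -> str:
    """Hypothetical helper: report 'missing', 'lfs-pointer', or 'ok'."""
    if not path.is_file():
        return "missing"
    with path.open("rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return "lfs-pointer" if looks_like_lfs_pointer(head) else "ok"
```

Run `check_model()` from the repository root; a result of `"lfs-pointer"` or `"missing"` means the weights still need to be downloaded manually.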

Index

python app.py -t index
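Before indexing, it can help to inspect the crawled QA data. Assuming the index step reads data/legalqa.jsonlines (the file named in the License section) and that it follows the JSON Lines convention of one JSON object per line, a minimal standard-library loader might look like this. The function names are hypothetical, and the sketch deliberately makes no assumption about the field names inside each record.

```python
import json
from typing import Dict, Iterable, Iterator, List


def parse_jsonlines(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one parsed JSON object per non-empty line."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines such as a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as e:
            raise ValueError(f"line {lineno}: invalid JSON") from e


def load_jsonlines(path: str) -> List[Dict]:
    """Read a whole JSON Lines file into memory."""
    with open(path, encoding="utf-8") as f:
        return list(parse_jsonlines(f))
```

For example, `records = load_jsonlines("data/legalqa.jsonlines")` followed by `sorted(records[0])` shows the record count and the actual field names.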

GPU-based indexing is available as an option by setting:

  • device: cuda

Search

With REST API

To start the Jina server with the REST API:

# python app.py -t query_restful --flow flows/query_numpy.yml
python app.py -t query_restful

Then use a client to query:

# Example query: "상속 관련 문의" ("inquiry about inheritance")
curl --request POST -d '{"parameters": {"limit": 1}, "data": ["상속 관련 문의"]}' -H 'Content-Type: application/json' 'http://0.0.0.0:1234/search'
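The same request can be sent from Python. This minimal standard-library sketch mirrors the curl call above: the URL, port 1234, and JSON body are copied from it, while the function names are hypothetical. The structure of the JSON that the Jina server returns is not described in this README, so the sketch simply returns the decoded response.

```python
import json
import urllib.request

# Copied from the curl example; assumes the server from `query_restful` is running.
SEARCH_URL = "http://0.0.0.0:1234/search"


def build_search_request(query: str, limit: int = 1, url: str = SEARCH_URL) -> urllib.request.Request:
    """Build a POST request with the same JSON body as the curl example."""
    body = json.dumps(
        {"parameters": {"limit": limit}, "data": [query]},
        ensure_ascii=False,  # keep Korean text as-is in the UTF-8 payload
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, limit: int = 1, timeout: float = 30.0) -> dict:
    """Send the query and return the decoded JSON response."""
    req = build_search_request(query, limit)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `search("상속 관련 문의", limit=3)` sends the same query as the curl command but asks for three results.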

From the terminal

# python app.py -t query --flow flows/query_numpy.yml
python app.py -t query
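Conceptually, a query is answered by embedding the question with SentenceKoBART and ranking the indexed answers by vector similarity. The following sketch assumes cosine similarity and an exhaustive scan over every indexed vector, which is what a NumPy-based flow such as flows/query_numpy.yml would be expected to do; app.py's internals are not shown here, so treat this as an illustration rather than the project's code. Plain lists of floats stand in for real SentenceKoBART embeddings, and the function names are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def exact_top_k(
    query: Sequence[float], index: Sequence[Sequence[float]], k: int = 10
) -> List[Tuple[int, float]]:
    """Score every indexed vector and return the k best (doc_id, score) pairs."""
    scores = ((i, cosine(query, vec)) for i, vec in enumerate(index))
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])
```

Every query scores all N indexed vectors, so the cost grows linearly with the size of the index. Approximate KNN indexes such as AnnLite avoid that full scan in exchange for possibly missing some true nearest neighbours.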

Approximate KNN Search with AnnLite

python app.py -t index --flow flows/index_annlite.yml

python app.py -t query --flow flows/query_annlite.yml

  • Retrieval time (sec.)
    • AMD Ryzen 5 PRO 4650U, 16 GB memory
    • Average of 100 searches
    • Excluding BertReRanker

top-k   Numpy   AnnLite   Faiss   Annoy
10      1.433   0.101     0.131   0.118
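The relative speedups follow directly from the measured top-10 times. A short calculation using only those numbers:

```python
from typing import Dict

# Mean retrieval time in seconds for top-k = 10, from the benchmark table above.
TIMES: Dict[str, float] = {"Numpy": 1.433, "AnnLite": 0.101, "Faiss": 0.131, "Annoy": 0.118}


def speedups(times: Dict[str, float], baseline: str = "Numpy") -> Dict[str, float]:
    """Return how many times faster each backend is than `baseline`."""
    base = times[baseline]
    return {name: base / t for name, t in times.items()}


for name, factor in speedups(TIMES).items():
    print(f"{name:8s} {TIMES[name]:.3f} s  {factor:5.1f}x")
```

On this machine AnnLite is about 14.2x, Annoy about 12.1x, and Faiss about 10.9x faster than the NumPy search, and the three approximate indexes are within 30 ms of each other. These figures exclude BertReRanker, so end-to-end query latency will be higher.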

Production-Ready Neural Search with HNSWPostgresIndexer

docker run -e POSTGRES_PASSWORD=123456 -p 127.0.0.1:5432:5432/tcp postgres:13.2
python app.py -t index --flow flows/index_psqlhnsw.yml

python app.py -t query --flow flows/query_psqlhnsw.yml
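The index flow needs the Postgres container to be reachable before it starts. Below is a minimal sketch, assuming the container from the docker command above is published on 127.0.0.1:5432, that polls the port until it accepts TCP connections. `wait_for_port` is a hypothetical helper, not part of this repository. An open port only shows that the server accepts connections; PostgreSQL's own `pg_isready` utility is the more precise readiness check.

```python
import socket
import time


def wait_for_port(
    host: str = "127.0.0.1", port: int = 5432, timeout: float = 30.0, interval: float = 0.5
) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Connect and immediately close; success means something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, unreachable, or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port()` before `python app.py -t index --flow flows/index_psqlhnsw.yml` avoids starting the flow while the container is still booting.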

Presentation

Demo

  • Working!

Links

FAQ

Why this dataset?

Legal text is full of technical terms, which makes it hard to search unless you already know them. Because a neural retriever matches questions by meaning rather than by exact terms, I thought this dataset would be a good example for demonstrating the effectiveness of neural IR (information retrieval).

LFS quota is exceeded

You can download SentenceKoBART.bin from one of the two links below.

Citation

Model training, data crawling, and the demo system were all supported by the AWS Hero program.

@misc{heewon2021,
author = {Heewon Jeon},
title = {LegalQA using SentenceKoBART},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/haven-jeon/LegalQA}}
}

License

  • The QA data in data/legalqa.jsonlines was crawled from www.freelawfirm.co.kr in accordance with its robots.txt. It may be used for academic purposes only; commercial use is prohibited.
  • We are not responsible for any legal decisions you make based on the resources provided here.