NTMC-Community / Matchzoo Py

Licence: apache-2.0
Facilitating the design, comparison and sharing of deep text matching models.


MatchZoo-py

PyTorch version of MatchZoo.

Facilitating the design, comparison and sharing of deep text matching models.
MatchZoo is a general-purpose text matching toolkit designed to help you quickly implement, compare, and share the latest deep text matching models.

The goal of MatchZoo is to provide a high-quality codebase for deep text matching research, covering tasks such as document retrieval, question answering, conversational response ranking, and paraphrase identification. With a unified data processing pipeline, simplified model configuration, and automatic hyper-parameter tuning, MatchZoo is flexible and easy to use.

| Tasks | Text 1 | Text 2 | Objective |
| --- | --- | --- | --- |
| Paraphrase Identification | string 1 | string 2 | classification |
| Textual Entailment | text | hypothesis | classification |
| Question Answer | question | answer | classification/ranking |
| Conversation | dialog | response | classification/ranking |
| Information Retrieval | query | document | ranking |

Get Started in 60 Seconds

To train a deep text matching model (ArcI in this example), first define a task with MatchZoo's customized loss functions and evaluation metrics:

import torch
import matchzoo as mz

ranking_task = mz.tasks.Ranking(losses=mz.losses.RankCrossEntropyLoss(num_neg=4))
ranking_task.metrics = [
    mz.metrics.NormalizedDiscountedCumulativeGain(k=3),
    mz.metrics.MeanAveragePrecision()
]
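
To make the `num_neg=4` argument more concrete, here is a minimal pure-Python sketch of a softmax-style ranking cross-entropy over one positive score and its negatives. It assumes the loss treats each positive together with its `num_neg` negatives as one group. The helper name `rank_cross_entropy` is hypothetical, and this is an illustration of the idea rather than MatchZoo's implementation.

```python
import math

def rank_cross_entropy(pos_score, neg_scores):
    """Negative log-probability of the positive under a softmax over the group.

    Hypothetical helper for illustration only, not part of MatchZoo.
    """
    scores = [pos_score] + list(neg_scores)
    # Subtract the maximum score for numerical stability before exponentiating.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    prob_pos = exps[0] / sum(exps)
    return -math.log(prob_pos)

# One positive and four negatives, mirroring num_neg=4.
confident = rank_cross_entropy(5.0, [0.0, 0.0, 0.0, 0.0])
uncertain = rank_cross_entropy(0.0, [0.0, 0.0, 0.0, 0.0])
```

When all five scores are equal, the positive gets probability 1/5 and the loss is log(5). The loss shrinks as the positive's score pulls ahead of the negatives.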

Prepare input data:

train_pack = mz.datasets.wiki_qa.load_data('train', task=ranking_task)
valid_pack = mz.datasets.wiki_qa.load_data('dev', task=ranking_task)

Preprocess your input data in three lines of code, keeping track of the parameters to be passed into the model:

preprocessor = mz.models.ArcI.get_default_preprocessor()
train_processed = preprocessor.fit_transform(train_pack)
valid_processed = preprocessor.transform(valid_pack)
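
The fit/transform split above matters: the vocabulary is learned from the training data only, then reused unchanged on the validation data. The sketch below shows that pattern with a hypothetical `TinyPreprocessor` class. It is a simplified stand-in, not MatchZoo's preprocessor, and its reserved ids for padding and out-of-vocabulary tokens are assumptions made for the example.

```python
from collections import Counter

class TinyPreprocessor:
    """Hypothetical stand-in: learns a vocabulary on fit, maps tokens to ids on transform."""
    PAD, OOV = 0, 1  # assumed reserved ids, chosen for this sketch

    def __init__(self):
        self.context = {}
        self._term_index = {}

    def fit(self, texts):
        counts = Counter(tok for text in texts for tok in text.lower().split())
        # Ids 0 and 1 are reserved, so real tokens start at 2.
        self._term_index = {tok: i + 2 for i, tok in enumerate(sorted(counts))}
        # Vocabulary size, later needed to size the embedding layer.
        self.context['embedding_input_dim'] = len(self._term_index) + 2
        return self

    def transform(self, texts):
        return [[self._term_index.get(tok, self.OOV) for tok in text.lower().split()]
                for text in texts]

    def fit_transform(self, texts):
        return self.fit(texts).transform(texts)

pre = TinyPreprocessor()
train_ids = pre.fit_transform(["how are glaciers formed", "glaciers are ice"])
# "is" was never seen during fitting, so it maps to the OOV id.
valid_ids = pre.transform(["how is ice formed"])
```

Because the validation text is only transformed, unseen tokens fall back to the out-of-vocabulary id instead of growing the vocabulary, which keeps `embedding_input_dim` fixed.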

Generate pair-wise training data on-the-fly:

trainset = mz.dataloader.Dataset(
    data_pack=train_processed,
    mode='pair',
    num_dup=1,
    num_neg=4,
    batch_size=32
)
validset = mz.dataloader.Dataset(
    data_pack=valid_processed,
    mode='point',
    batch_size=32
)
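
As a rough illustration of what `mode='pair'` with `num_dup` and `num_neg` implies, the sketch below builds training groups for a single query: each positive document is repeated `num_dup` times and paired with `num_neg` sampled negatives. The function `make_pairwise_groups` and its sampling details (binary labels, sampling with replacement) are assumptions for this example, not MatchZoo's code.

```python
import random

def make_pairwise_groups(labeled_docs, num_dup=1, num_neg=4, seed=0):
    """Build [positive, neg_1, ..., neg_k] groups for ONE query.

    labeled_docs: list of (doc_id, label) pairs with label > 0 meaning relevant.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    positives = [doc for doc, label in labeled_docs if label > 0]
    negatives = [doc for doc, label in labeled_docs if label == 0]
    groups = []
    if not negatives:
        return groups  # nothing to contrast the positives against
    for pos in positives:
        for _ in range(num_dup):
            # Sample with replacement so small pools still yield num_neg negatives.
            sampled = [rng.choice(negatives) for _ in range(num_neg)]
            groups.append([pos] + sampled)
    return groups

docs = [("d1", 1), ("d2", 0), ("d3", 0), ("d4", 1)]
groups = make_pairwise_groups(docs, num_dup=2, num_neg=4)
```

Each group has `1 + num_neg` entries with the positive first, which is the layout a group-wise ranking loss can consume directly.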

Define the padding callback and create the data loaders:

padding_callback = mz.models.ArcI.get_default_padding_callback()

trainloader = mz.dataloader.DataLoader(
    dataset=trainset,
    stage='train',
    callback=padding_callback
)
validloader = mz.dataloader.DataLoader(
    dataset=validset,
    stage='dev',
    callback=padding_callback
)
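
Padding is needed because texts in a batch have different lengths, while a tensor needs rows of equal length. The following is a minimal sketch of that step using a hypothetical `pad_batch` function. The pad value of 0 and the optional `fixed_length` parameter are assumptions for illustration and do not describe what the ArcI padding callback does internally.

```python
def pad_batch(sequences, pad_value=0, fixed_length=None):
    """Right-pad (or truncate) token-id sequences to a common length.

    Hypothetical helper for illustration only.
    """
    # Default to the longest sequence in the batch.
    target = fixed_length if fixed_length is not None else max(len(s) for s in sequences)
    padded = []
    for seq in sequences:
        seq = list(seq)[:target]  # truncate sequences longer than the target
        padded.append(seq + [pad_value] * (target - len(seq)))
    return padded

batch = pad_batch([[5, 2, 4, 3], [4, 2, 6]])
short = pad_batch([[5, 2, 4, 3], [4, 2, 6]], fixed_length=2)
```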

Initialize the model and set its hyper-parameters:

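model = mz.models.ArcI()
model.params['task'] = ranking_task
# Dimensionality of each word embedding vector.
model.params['embedding_output_dim'] = 100
# Vocabulary size recorded by the preprocessor during fitting.
model.params['embedding_input_dim'] = preprocessor.context['embedding_input_dim']
# Fill in any parameters that are still unset, then build the model.
model.guess_and_fill_missing_params()
model.build()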

The Trainer controls the training flow:

optimizer = torch.optim.Adam(model.parameters())

trainer = mz.trainers.Trainer(
    model=model,
    optimizer=optimizer,
    trainloader=trainloader,
    validloader=validloader,
    epochs=10
)

trainer.run()
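
The `NormalizedDiscountedCumulativeGain(k=3)` and `MeanAveragePrecision()` metrics attached to the task at the start are standard ranking measures. The sketch below computes them by hand for one common formulation (exponential gain, log2 discount). The function names are hypothetical, and MatchZoo's own metric classes may differ in details such as the gain function or relevance threshold.

```python
import math

def dcg_at_k(relevances, k):
    # Gain (2^rel - 1) discounted by log2(rank + 1), with ranks starting at 1.
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(ranked_relevances, k):
    # Normalize by the DCG of the ideal (descending) ordering.
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

def average_precision(ranked_relevances):
    # Mean of precision values taken at each rank where a relevant item appears.
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_relevances, start=1):
        if rel > 0:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0

def mean_average_precision(rankings):
    # MAP is the mean of per-query average precision.
    return sum(average_precision(r) for r in rankings) / len(rankings)

# Relevance labels of the candidates for one query, in predicted order.
ranked = [0, 1, 0, 1]
ndcg3 = ndcg_at_k(ranked, 3)
ap = average_precision(ranked)
```

For the ranking above, the relevant items sit at ranks 2 and 4, so the precision values are 1/2 and 2/4 and the average precision is 0.5.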

References

Tutorials

English Documentation

If you're interested in cutting-edge research progress, please take a look at awesome neural models for semantic match.

Install

MatchZoo-py depends on PyTorch. There are two ways to install it:

Install MatchZoo-py from PyPI:

pip install matchzoo-py

Install MatchZoo-py from the GitHub source:

git clone https://github.com/NTMC-Community/MatchZoo-py.git
cd MatchZoo-py
python setup.py install

Models

Citation

If you use MatchZoo in your research, please cite it with the following BibTeX entry.

@inproceedings{Guo:2019:MLP:3331184.3331403,
 author = {Guo, Jiafeng and Fan, Yixing and Ji, Xiang and Cheng, Xueqi},
 title = {MatchZoo: A Learning, Practicing, and Developing System for Neural Text Matching},
 booktitle = {Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval},
 series = {SIGIR'19},
 year = {2019},
 isbn = {978-1-4503-6172-9},
 location = {Paris, France},
 pages = {1297--1300},
 numpages = {4},
 url = {http://doi.acm.org/10.1145/3331184.3331403},
 doi = {10.1145/3331184.3331403},
 acmid = {3331403},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {matchzoo, neural network, text matching},
} 

Development Team

| Handle | Name | Role | Position |
| --- | --- | --- | --- |
| faneshion | Yixing Fan | Core Dev | ASST PROF, ICT |
| Chriskuei | Jiangui Chen | Core Dev | PhD. ICT |
| caiyinqiong | Yinqiong Cai | Core Dev | M.S. ICT |
| pl8787 | Liang Pang | Core Dev | ASST PROF, ICT |
| lixinsu | Lixin Su | Dev | PhD. ICT |
| ChrisRBXiong | Ruibin Xiong | Dev | M.S. ICT |
| dyuyang | Yuyang Ding | Dev | M.S. ICT |
| rgtjf | Junfeng Tian | Dev | M.S. ECNU |
| wqh17101 | Qinghua Wang | Documentation | B.S. Shandong Univ. |

Contribution

Please make sure to read the Contributing Guide before creating a pull request. If you have a MatchZoo-related paper, project, component, or tool, send a pull request to this awesome list!

Thank you to all the people who already contributed to MatchZoo!

Bo Wang, Zeyi Wang, Liu Yang, Zizhen Wang, Zhou Yang, Jianpeng Hou, Lijuan Chen, Yukun Zheng, Niuguo Cheng, Dai Zhuyun, Aneesh Joshi, Zeno Gantner, Kai Huang, stanpcf, ChangQF, Mike Kellogg

Project Organizers

  • Jiafeng Guo
    • Institute of Computing Technology, Chinese Academy of Sciences
    • Homepage
  • Yanyan Lan
    • Institute of Computing Technology, Chinese Academy of Sciences
    • Homepage
  • Xueqi Cheng
    • Institute of Computing Technology, Chinese Academy of Sciences
    • Homepage

License

Apache-2.0

Copyright (c) 2019-present, Yixing Fan (faneshion)
