NTMC-Community / Matchzoo

Licence: apache-2.0
Facilitating the design, comparison and sharing of deep text matching models.

Programming Languages

python
139335 projects - #7 most used programming language
Jupyter Notebook
11667 projects
Makefile
30231 projects

Projects that are alternatives of or similar to Matchzoo

Matchzoo Py
Facilitating the design, comparison and sharing of deep text matching models.
Stars: ✭ 362 (-89.85%)
Mutual labels:  natural-language-processing, matching, text
ELMO-NLP
Applications of ELMo to NLP tasks such as question answering and text classification
Stars: ✭ 15 (-99.58%)
Mutual labels:  text, matching
Nlp Recipes
Natural Language Processing Best Practices & Examples
Stars: ✭ 5,783 (+62.08%)
Mutual labels:  natural-language-processing, text
Nlp
[UNMAINTAINED] Extract values from strings and fill your structs with nlp.
Stars: ✭ 367 (-89.71%)
Mutual labels:  natural-language-processing, text
Language Modelling
Generating Text using Deep Learning in Python - LSTM, RNN, Keras
Stars: ✭ 38 (-98.93%)
Mutual labels:  natural-language-processing, text
Nlp Pretrained Model
A collection of Natural language processing pre-trained models.
Stars: ✭ 122 (-96.58%)
Mutual labels:  natural-language-processing, text
Stringi
THE String Processing Package for R (with ICU)
Stars: ✭ 204 (-94.28%)
Mutual labels:  natural-language-processing, text
allot
Parse placeholder and wildcard text commands
Stars: ✭ 51 (-98.57%)
Mutual labels:  text, matching
Awesome Arabic
A curated list of awesome projects and dev/design resources for supporting Arabic computational needs.
Stars: ✭ 309 (-91.34%)
Mutual labels:  natural-language-processing
Fitty
✨ Makes text fit perfectly
Stars: ✭ 3,321 (-6.92%)
Mutual labels:  text
Nlp101
NLP 101: a resource repository for Deep Learning and Natural Language Processing
Stars: ✭ 305 (-91.45%)
Mutual labels:  natural-language-processing
Zhihu
This repo contains the source code in my personal column (https://zhuanlan.zhihu.com/zhaoyeyu), implemented using Python 3.6. Including Natural Language Processing and Computer Vision projects, such as text generation, machine translation, deep convolution GAN and other actual combat code.
Stars: ✭ 3,307 (-7.32%)
Mutual labels:  natural-language-processing
Bytenet Tensorflow
ByteNet for character-level language modelling
Stars: ✭ 319 (-91.06%)
Mutual labels:  natural-language-processing
Nlprule
A fast, low-resource Natural Language Processing and Text Correction library written in Rust.
Stars: ✭ 309 (-91.34%)
Mutual labels:  natural-language-processing
Decoro
Android library designed for automatic formatting of text input by custom rules
Stars: ✭ 325 (-90.89%)
Mutual labels:  text
Nlp
Selected Machine Learning algorithms for natural language processing and semantic analysis in Golang
Stars: ✭ 304 (-91.48%)
Mutual labels:  natural-language-processing
Graphbrain
Language, Knowledge, Cognition
Stars: ✭ 294 (-91.76%)
Mutual labels:  natural-language-processing
Nndial
NNDial is an open source toolkit for building end-to-end trainable task-oriented dialogue models. It is released by Tsung-Hsien (Shawn) Wen from Cambridge Dialogue Systems Group under Apache License 2.0.
Stars: ✭ 332 (-90.7%)
Mutual labels:  natural-language-processing
Chakin
Simple downloader for pre-trained word vectors
Stars: ✭ 323 (-90.95%)
Mutual labels:  natural-language-processing
Gcn Over Pruned Trees
Graph Convolution over Pruned Dependency Trees Improves Relation Extraction (authors' PyTorch implementation)
Stars: ✭ 312 (-91.26%)
Mutual labels:  natural-language-processing
MatchZoo

Facilitating the design, comparison and sharing of deep text matching models.
MatchZoo is a general-purpose text matching toolkit designed to help users quickly implement, compare, and share the latest deep text matching models.

🔥News: MatchZoo-py (PyTorch version of MatchZoo) is ready now.

The goal of MatchZoo is to provide a high-quality codebase for deep text matching research, such as document retrieval, question answering, conversational response ranking, and paraphrase identification. With a unified data processing pipeline, simplified model configuration, and automatic hyper-parameter tuning, MatchZoo is flexible and easy to use.

Tasks                      Text 1      Text 2      Objective
Paraphrase Identification  string 1    string 2    classification
Textual Entailment         text        hypothesis  classification
Question Answer            question    answer      classification/ranking
Conversation               dialog      response    classification/ranking
Information Retrieval      query       document    ranking

Get Started in 60 Seconds

To train a Deep Structured Semantic Model (DSSM), import matchzoo and prepare the input data.

import matchzoo as mz

train_pack = mz.datasets.wiki_qa.load_data('train', task='ranking')
valid_pack = mz.datasets.wiki_qa.load_data('dev', task='ranking')

Preprocess your input data in three lines of code. The preprocessor keeps track of the parameters that must later be passed into the model.

preprocessor = mz.preprocessors.DSSMPreprocessor()
train_processed = preprocessor.fit_transform(train_pack)
valid_processed = preprocessor.transform(valid_pack)
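To give a rough idea of what a DSSM-style preprocessor does with each token, the sketch below implements letter-trigram word hashing as described in the DSSM paper, in plain Python. It does not call MatchZoo. The names `letter_trigrams`, `fit_vocab`, and `transform` are hypothetical, and MatchZoo's own `DSSMPreprocessor` may tokenize and filter differently. It also illustrates why the vocabulary is fitted on the training data only and then reused for the validation data.

```python
from collections import Counter


def letter_trigrams(word):
    """Split a word into letter trigrams after adding '#' boundary marks."""
    marked = f"#{word.lower()}#"
    return [marked[i:i + 3] for i in range(len(marked) - 2)]


def fit_vocab(texts):
    """Collect the trigram vocabulary from the training texts only."""
    vocab = {}
    for text in texts:
        for word in text.split():
            for tri in letter_trigrams(word):
                vocab.setdefault(tri, len(vocab))
    return vocab


def transform(text, vocab):
    """Turn one text into a trigram count vector over the fitted vocabulary."""
    counts = Counter(
        tri for word in text.split() for tri in letter_trigrams(word)
    )
    # Trigrams that were not seen while fitting are ignored.
    return [counts.get(tri, 0) for tri in vocab]


# Hypothetical toy data: fit on "training" text, transform "validation" text.
vocab = fit_vocab(["good food"])
vector = transform("good mood", vocab)
print(letter_trigrams("good"))
print(vector)
```

The length of the resulting vector depends on the fitted vocabulary, which is the kind of shape information a model needs to know before it can be built.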

Use MatchZoo's customized loss functions and evaluation metrics:

ranking_task = mz.tasks.Ranking(loss=mz.losses.RankCrossEntropyLoss(num_neg=4))
ranking_task.metrics = [
    mz.metrics.NormalizedDiscountedCumulativeGain(k=3),
    mz.metrics.MeanAveragePrecision()
]
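For readers unfamiliar with these quantities, here is a minimal plain-Python sketch of what a rank cross-entropy loss, NDCG@k, and mean average precision compute. It uses the common textbook formulations and is not MatchZoo's code. All function names are hypothetical, and details such as tie handling or the logarithm base may differ in the library.

```python
import math


def rank_cross_entropy(pos_score, neg_scores):
    """Softmax cross-entropy over one positive and its sampled negatives."""
    scores = [pos_score] + list(neg_scores)
    shift = max(scores)  # subtract the max for numerical stability
    log_sum = shift + math.log(sum(math.exp(s - shift) for s in scores))
    return log_sum - pos_score  # equals -log softmax(pos_score)


def _rank_labels(labels, scores):
    """Return the labels ordered by descending model score."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    return [label for _, label in pairs]


def dcg_at_k(ranked_labels, k):
    """Discounted cumulative gain with gain 2^rel - 1 and a log2 discount."""
    return sum(
        (2 ** rel - 1) / math.log2(i + 2)
        for i, rel in enumerate(ranked_labels[:k])
    )


def ndcg_at_k(labels, scores, k):
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(_rank_labels(labels, scores), k) / ideal


def average_precision(labels, scores):
    """Mean of precision@rank taken at each relevant document."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(_rank_labels(labels, scores), start=1):
        if rel > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(queries):
    """`queries` is a list of (labels, scores) tuples, one per query."""
    return sum(average_precision(l, s) for l, s in queries) / len(queries)


# Hypothetical toy query: four candidates, two of them relevant.
labels = [0, 1, 0, 1]
scores = [0.9, 0.8, 0.3, 0.1]
print(ndcg_at_k(labels, scores, k=3))
print(average_precision(labels, scores))
print(rank_cross_entropy(0.0, [0.0] * 4))
```

With `num_neg=4`, each loss term compares one positive against four negatives, so a model that scores all five equally incurs a loss of log(5).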

Initialize the model and set its hyper-parameters.

model = mz.models.DSSM()
model.params['input_shapes'] = preprocessor.context['input_shapes']
model.params['task'] = ranking_task
model.guess_and_fill_missing_params()
model.build()
model.compile()

Generate pair-wise training data on the fly, and evaluate model performance on the validation data using customized callbacks.

train_generator = mz.PairDataGenerator(train_processed, num_dup=1, num_neg=4, batch_size=64, shuffle=True)
valid_x, valid_y = valid_processed.unpack()
evaluate = mz.callbacks.EvaluateAllMetrics(model, x=valid_x, y=valid_y, batch_size=len(valid_x))
history = model.fit_generator(train_generator, epochs=20, callbacks=[evaluate], workers=5, use_multiprocessing=False)
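The idea behind pair-wise data generation can be sketched in plain Python as follows. This is an illustration under stated assumptions, not MatchZoo's implementation: the function `make_pairwise_groups` and the `(query_id, doc_id, label)` row layout are hypothetical, and the sketch assumes that negatives are drawn, with replacement, from lower-labelled candidates of the same query.

```python
import random


def make_pairwise_groups(rows, num_dup=1, num_neg=4, seed=0):
    """Group each positive with `num_neg` lower-labelled docs of the same query.

    `rows` is a list of (query_id, doc_id, label) tuples.
    """
    rng = random.Random(seed)
    by_query = {}
    for query_id, doc_id, label in rows:
        by_query.setdefault(query_id, []).append((doc_id, label))

    groups = []
    for query_id, docs in by_query.items():
        for doc_id, label in docs:
            negatives = [d for d, other in docs if other < label]
            if not negatives:
                continue  # nothing ranks below this document
            for _ in range(num_dup):
                # Sample with replacement so small candidate pools still work.
                sampled = rng.choices(negatives, k=num_neg)
                groups.append((query_id, doc_id, sampled))
    return groups


# Hypothetical toy data: two queries with one relevant document each.
rows = [
    ("q1", "d1", 1), ("q1", "d2", 0), ("q1", "d3", 0),
    ("q2", "d4", 0), ("q2", "d5", 1),
]
groups = make_pairwise_groups(rows, num_dup=1, num_neg=4)
for group in groups:
    print(group)
```

Each group holds one positive followed by `num_neg` negatives, which is the layout a ranking loss over one positive and several negatives expects. Raising `num_dup` repeats the sampling for each positive with fresh negatives.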

References

Tutorials

English Documentation

Chinese Documentation

If you're interested in cutting-edge research progress, please take a look at awesome neural models for semantic match.

Install

MatchZoo depends on Keras and TensorFlow. There are two ways to install MatchZoo:

Install MatchZoo from PyPI:

pip install matchzoo

Install MatchZoo from the GitHub source:

git clone https://github.com/NTMC-Community/MatchZoo.git
cd MatchZoo
python setup.py install

Models

  1. DRMM: this model is an implementation of A Deep Relevance Matching Model for Ad-hoc Retrieval.

  2. MatchPyramid: this model is an implementation of Text Matching as Image Recognition

  3. ARC-I: this model is an implementation of Convolutional Neural Network Architectures for Matching Natural Language Sentences

  4. DSSM: this model is an implementation of Learning Deep Structured Semantic Models for Web Search using Clickthrough Data

  5. CDSSM: this model is an implementation of Learning Semantic Representations Using Convolutional Neural Networks for Web Search

  6. ARC-II: this model is an implementation of Convolutional Neural Network Architectures for Matching Natural Language Sentences

  7. MV-LSTM: this model is an implementation of A Deep Architecture for Semantic Matching with Multiple Positional Sentence Representations

  8. aNMM: this model is an implementation of aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model

  9. DUET: this model is an implementation of Learning to Match Using Local and Distributed Representations of Text for Web Search

  10. K-NRM: this model is an implementation of End-to-End Neural Ad-hoc Ranking with Kernel Pooling

  11. CONV-KNRM: this model is an implementation of Convolutional neural networks for soft-matching n-grams in ad-hoc search

  12. Models under development: Match-SRNN, DeepRank, BiMPM, ...

Citation

If you use MatchZoo in your research, please use the following BibTeX entry.

@inproceedings{Guo:2019:MLP:3331184.3331403,
 author = {Guo, Jiafeng and Fan, Yixing and Ji, Xiang and Cheng, Xueqi},
 title = {MatchZoo: A Learning, Practicing, and Developing System for Neural Text Matching},
 booktitle = {Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval},
 series = {SIGIR'19},
 year = {2019},
 isbn = {978-1-4503-6172-9},
 location = {Paris, France},
 pages = {1297--1300},
 numpages = {4},
 url = {http://doi.acm.org/10.1145/3331184.3331403},
 doi = {10.1145/3331184.3331403},
 acmid = {3331403},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {matchzoo, neural network, text matching},
} 

Development Team

  • faneshion (Fan Yixing): Core Dev, ASST PROF, ICT
  • bwanglzu (Wang Bo): Core Dev, M.S. TU Delft
  • uduse (Wang Zeyi): Core Dev, B.S. UC Davis
  • pl8787 (Pang Liang): Core Dev, ASST PROF, ICT
  • yangliuy (Yang Liu): Core Dev, PhD. UMASS
  • wqh17101 (Wang Qinghua): Documentation, B.S. Shandong Univ.
  • ZizhenWang (Wang Zizhen): Dev, M.S. UCAS
  • lixinsu (Su Lixin): Dev, PhD. UCAS
  • zhouzhouyang520 (Yang Zhou): Dev, M.S. CQUT
  • rgtjf (Tian Junfeng): Dev, M.S. ECNU

Contribution

Please make sure to read the Contributing Guide before creating a pull request. If you have a MatchZoo-related paper, project, component, or tool, send a pull request to this awesome list!

Thank you to all the people who already contributed to MatchZoo!

Jianpeng Hou, Lijuan Chen, Yukun Zheng, Niuguo Cheng, Dai Zhuyun, Aneesh Joshi, Zeno Gantner, Kai Huang, stanpcf, ChangQF, Mike Kellogg

Project Organizers

  • Jiafeng Guo
    • Institute of Computing Technology, Chinese Academy of Sciences
    • Homepage
  • Yanyan Lan
    • Institute of Computing Technology, Chinese Academy of Sciences
    • Homepage
  • Xueqi Cheng
    • Institute of Computing Technology, Chinese Academy of Sciences
    • Homepage

License

Apache-2.0

Copyright (c) 2015-present, Yixing Fan (faneshion)
