yourh / Attentionxml

Implementation for "AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification"

AttentionXML

AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification

Requirements

  • python==3.7.4
  • click==7.0
  • ruamel.yaml==0.16.5
  • numpy==1.16.2
  • scipy==1.3.1
  • scikit-learn==0.21.2
  • gensim==3.4.0
  • torch==1.0.1
  • nltk==3.4
  • tqdm==4.31.1
  • joblib==0.13.2
  • logzero==1.5.0

Datasets

Download the GloVe embedding (840B,300d) and convert it to gensim format (which can be loaded by gensim.models.KeyedVectors.load).

A converted GloVe embedding is also provided here.
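The conversion step mentioned above can be sketched as follows. This is only an illustration, not the script the authors used: the function name `glove_to_word2vec_text` and the file names are hypothetical. A GloVe text file has one token per line followed by its vector, while the word2vec text format that gensim reads starts with a header line giving the number of vectors and their dimension. The sketch adds that header and skips any line that does not split into exactly `dim + 1` space-separated fields.

```python
def glove_to_word2vec_text(src_path, dst_path, dim=300):
    """Copy GloVe vectors to dst_path, prepending a word2vec-style header.

    Hypothetical helper for illustration; returns the number of rows kept.
    """
    def valid_lines():
        with open(src_path, encoding="utf-8") as src:
            for line in src:
                # A well-formed row is one token followed by `dim` numbers.
                if len(line.rstrip("\n").split(" ")) == dim + 1:
                    yield line

    # First pass: count the rows that will be kept, because the header needs it.
    count = sum(1 for _ in valid_lines())

    # Second pass: write the header, then the kept rows unchanged.
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(f"{count} {dim}\n")
        for line in valid_lines():
            dst.write(line if line.endswith("\n") else line + "\n")
    return count
```

With gensim installed, the resulting file could then be read with gensim.models.KeyedVectors.load_word2vec_format and written out with the save method of the returned object, which produces a file that gensim.models.KeyedVectors.load accepts. gensim 3.x also ships a gensim.scripts.glove2word2vec module that performs the header conversion.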

XML Experiments

The XML experiments in the paper can be run directly, for example:

./scripts/run_eurlex.sh

Preprocess

Run preprocess.py for train and test datasets with tokenized texts as follows:

python preprocess.py \
--text-path data/EUR-Lex/train_texts.txt \
--label-path data/EUR-Lex/train_labels.txt \
--vocab-path data/EUR-Lex/vocab.npy \
--emb-path data/EUR-Lex/emb_init.npy \
--w2v-model data/glove.840B.300d.gensim

python preprocess.py \
--text-path data/EUR-Lex/test_texts.txt \
--label-path data/EUR-Lex/test_labels.txt \
--vocab-path data/EUR-Lex/vocab.npy 

Or run preprocess.py on the raw texts and let it tokenize them with NLTK as follows:

python preprocess.py \
--text-path data/Wiki10-31K/train_raw_texts.txt \
--tokenized-path data/Wiki10-31K/train_texts.txt \
--label-path data/Wiki10-31K/train_labels.txt \
--vocab-path data/Wiki10-31K/vocab.npy \
--emb-path data/Wiki10-31K/emb_init.npy \
--w2v-model data/glove.840B.300d.gensim

python preprocess.py \
--text-path data/Wiki10-31K/test_raw_texts.txt \
--tokenized-path data/Wiki10-31K/test_texts.txt \
--label-path data/Wiki10-31K/test_labels.txt \
--vocab-path data/Wiki10-31K/vocab.npy 
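In the commands above, the test-set invocations reuse the --vocab-path written by the training-set invocations and omit --emb-path and --w2v-model, which suggests that the vocabulary is built once from the training texts and then applied to the test texts. The sketch below illustrates that general idea with the standard library only. It is not the code in preprocess.py: the function names, the special tokens, and the default sizes are all assumptions made for the example.

```python
from collections import Counter

PAD, UNK = "<PAD>", "<UNK>"  # hypothetical special tokens


def build_vocab(tokenized_texts, max_size=500000, min_count=1):
    """Return a token list: special tokens first, then tokens by frequency."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    vocab = [PAD, UNK]
    for tok, freq in counts.most_common():
        # most_common() is sorted by frequency, so we can stop early.
        if freq < min_count or len(vocab) >= max_size:
            break
        vocab.append(tok)
    return vocab


def encode(tokenized_texts, vocab, max_len=500):
    """Map tokens to ids, truncating or padding every text to max_len."""
    index = {tok: i for i, tok in enumerate(vocab)}
    unk, pad = index[UNK], index[PAD]
    rows = []
    for text in tokenized_texts:
        ids = [index.get(tok, unk) for tok in text[:max_len]]
        rows.append(ids + [pad] * (max_len - len(ids)))
    return rows
```

In this sketch, `build_vocab` would be called on the training texts only, and `encode` would be called on both the training and the test texts with the same vocabulary, so that out-of-vocabulary test tokens fall back to the unknown-token id.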

Train and Predict

Train and predict as follows:

python main.py --data-cnf configure/datasets/EUR-Lex.yaml --model-cnf configure/models/AttentionXML-EUR-Lex.yaml 

To run prediction only, add the option "--mode eval".

Ensemble

Train and predict with an ensemble:

python main.py --data-cnf configure/datasets/Wiki-500K.yaml --model-cnf configure/models/FastAttentionXML-Wiki-500K.yaml -t 0
python main.py --data-cnf configure/datasets/Wiki-500K.yaml --model-cnf configure/models/FastAttentionXML-Wiki-500K.yaml -t 1
python main.py --data-cnf configure/datasets/Wiki-500K.yaml --model-cnf configure/models/FastAttentionXML-Wiki-500K.yaml -t 2
python ensemble.py -p results/FastAttentionXML-Wiki-500K -t 3
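The commands above train three models (-t 0, 1, 2) and then combine them. One common way to combine ranked predictions from several models is to sum the scores each label receives and re-rank. The sketch below shows that idea for a single sample. Whether ensemble.py works exactly this way is an assumption, and the function name `ensemble_topk` is hypothetical.

```python
from collections import defaultdict


def ensemble_topk(per_model_labels, per_model_scores, k=5):
    """Merge the ranked predictions of several models for one sample.

    per_model_labels[i] and per_model_scores[i] are the labels and scores
    predicted by model i. Scores of the same label are summed across models.
    """
    total = defaultdict(float)
    for labels, scores in zip(per_model_labels, per_model_scores):
        for label, score in zip(labels, scores):
            total[label] += score
    # Re-rank by summed score and keep the k best labels.
    ranked = sorted(total.items(), key=lambda kv: kv[1], reverse=True)
    return [label for label, _ in ranked[:k]]
```

A label that only one model ranks highly can still be overtaken by a label that every model ranks moderately well, which is the usual motivation for this kind of score-level ensemble.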

Evaluation

python evaluation.py --results results/AttentionXML-EUR-Lex-labels.npy --targets data/EUR-Lex/test_labels.npy

Or compute propensity-scored metrics as well:

python evaluation.py \
--results results/FastAttentionXML-Amazon-670K-labels.npy \
--targets data/Amazon-670K/test_labels.npy \
--train-labels data/Amazon-670K/train_labels.npy \
-a 0.6 \
-b 2.6
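As a rough guide to what these metrics measure, the sketch below computes precision at k and an unnormalized propensity-scored precision at k in plain Python. It assumes that -a and -b correspond to the parameters A and B of the commonly used propensity model of Jain et al. (2016), p_l = 1 / (1 + C (N_l + B)^(-A)) with C = (log N - 1)(B + 1)^A, where N is the number of training samples and N_l the training frequency of label l. The function names are hypothetical, and this is not the code in evaluation.py.

```python
import math
from collections import Counter


def propensity_scores(train_labels, a, b):
    """Estimate a propensity for every label seen in the training set."""
    n = len(train_labels)
    freq = Counter(label for labels in train_labels for label in labels)
    c = (math.log(n) - 1.0) * (b + 1.0) ** a
    return {label: 1.0 / (1.0 + c * (count + b) ** -a)
            for label, count in freq.items()}


def precision_at_k(predicted, targets, k=5):
    """Average fraction of the top-k predictions that are true labels."""
    total = 0.0
    for pred, true in zip(predicted, targets):
        true = set(true)
        total += sum(1 for label in pred[:k] if label in true) / k
    return total / len(predicted)


def psp_at_k(predicted, targets, propensity, k=5):
    """Like precision_at_k, but each hit is weighted by 1 / propensity.

    Assumes every true label also occurs in the training labels.
    """
    total = 0.0
    for pred, true in zip(predicted, targets):
        true = set(true)
        total += sum(1.0 / propensity[label]
                     for label in pred[:k] if label in true) / k
    return total / len(predicted)
```

Rare labels receive small propensities, so a correct prediction of a rare label contributes more to the propensity-scored value than a correct prediction of a frequent one. Evaluation scripts often also divide this value by the best value achievable for each sample; that normalization is left out here.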

Reference

You et al., AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification, NeurIPS 2019

Declaration

This software is free for non-commercial use. For commercial use, please contact Mr. Ronghui You and Prof. Shanfeng Zhu ([email protected]).
