Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

Created with love in Canada, visit hostnodejs.com today

Feel like to post an Ad? Learn Details

All Projects → uhh-lt → Taxi

uhh-lt / Taxi

Licence: apache-2.0

TAXI: a Taxonomy Induction Method based on Lexico-Syntactic Patterns, Substrings and Focused Crawling

Labels

jupyter-notebook

Projects that are alternatives of or similar to Taxi

Notebook playground

Notebooks for playing around with datasets etc.

Stars: ✭ 21 (-22.22%)

Mutual labels: jupyter-notebook

Imagenetmultilabel

Fine-grained ImageNet annotations

Stars: ✭ 22 (-18.52%)

Mutual labels: jupyter-notebook

Stars: ✭ 27 (+0%)

Mutual labels: jupyter-notebook

Movie recommender

MovieLens based recommender system.使用MovieLens数据集训练的电影推荐系统。

Stars: ✭ 914 (+3285.19%)

Mutual labels: jupyter-notebook

Repo for Zillow Web scraper

Stars: ✭ 22 (-18.52%)

Mutual labels: jupyter-notebook

STAT406 @ UBC - "Elements of Statistical Learning"

Stars: ✭ 27 (+0%)

Mutual labels: jupyter-notebook

Computer vision on 🍓

Stars: ✭ 21 (-22.22%)

Mutual labels: jupyter-notebook

Oxford Deepnlp 2017

🚀 🎉 ✨ Oxford Deep NLP 2017 Course Materials and Practicals, Solutions

Stars: ✭ 27 (+0%)

Mutual labels: jupyter-notebook

Facial Landmarking

facial landmarking using dlib

Stars: ✭ 22 (-18.52%)

Mutual labels: jupyter-notebook

Machine Learning Data Science Reuse

Gathers machine learning and data science techniques for problem solving.

Stars: ✭ 27 (+0%)

Mutual labels: jupyter-notebook

Utilities for nolearn.lasagne

Stars: ✭ 21 (-22.22%)

Mutual labels: jupyter-notebook

Contributions to Julia Documentation

Stars: ✭ 21 (-22.22%)

Mutual labels: jupyter-notebook

World Models Sonic Pytorch

Attempt at reinforcement learning with curiosity for Sonic the Hedgehog games. Number 149 on OpenAI retro contest leaderboard, but more work needed

Stars: ✭ 27 (+0%)

Mutual labels: jupyter-notebook

Fast, general, and tested differentiable structured prediction in PyTorch

Stars: ✭ 913 (+3281.48%)

Mutual labels: jupyter-notebook

Ucsandiegox Dse200x Python For Data Science

UCSandDiego Micro Masters Program

Stars: ✭ 27 (+0%)

Mutual labels: jupyter-notebook

Boston Housing Prices

🏠 Predict the selling price of a new home in Boston, Massachusetts area

Stars: ✭ 21 (-22.22%)

Mutual labels: jupyter-notebook

Pythondatasciencehandbook

The book was written and tested with Python 3.5, though other Python versions (including Python 2.7) should work in nearly all cases.

Stars: ✭ 31,995 (+118400%)

Mutual labels: jupyter-notebook

Stars: ✭ 27 (+0%)

Mutual labels: jupyter-notebook

Conjugate Gradient

Painless conjugate gradient notebooks

Stars: ✭ 27 (+0%)

Mutual labels: jupyter-notebook

AI SDTM mapping (R for ML, Python, TensorFlow for DL)

Stars: ✭ 27 (+0%)

Mutual labels: jupyter-notebook

View All Similar Projects ➔

TAXI: a Taxonomy Induction Method based on Lexico-Syntactic Patterns, Substrings and Focused Crawling

This page contains implementation of a method for taxonomy induction that reached the first place in the SemEval 2016 challenge on taxonomy extraction evaluation. The method builds a taxonomy from a domain vocabulary. It extracts hypernyms from substrings and large domain-specific corpora bootstrapped from the input vocabulary. Multiple evaluations based on the SemEval taxonomy extraction datasets of four languages and three domains show state-of-the-art performance of our approach. This page contains implementations of the method including all resources needed to reproduce experiment described in the following paper presented in San Diego at SemEval co-located with the NAACL'2016:

@inproceedings{panchenko2016taxi,
  title={TAXI at SemEval-2016 Task 13: a Taxonomy Induction Method based on Lexico-Syntactic Patterns,  Substrings and Focused Crawling},
  author={Panchenko, Alexander and Faralli, Stefano and  Ruppert, Eugen and Remus, Steffen and  Naets, Hubert and  Fairon, Cedrick and Ponzetto, Simone Paolo and Biemann, Chris},
  booktitle={Proceedings of the 10th International Workshop on Semantic Evaluation},
  year={2016},
  address={San Diego, CA, USA},
  organization={Association for Computational Linguistics}
}

If you would like to refer to the system please use this citation. More information about the approach can be found at the TAXI web site.

System Requirements

The system was tested on Debian/Ubuntu Linux and Mac OS X. To load all resources in memory you need about 64 Gb of RAM.

Installation

Clone repository:

git clone https://github.com/tudarmstadt-lt/taxi.git

Download resources into the repository (4.4G compressed by gzip):

cd taxi && wget http://panchenko.me/data/joint/taxi/res/resources.tgz && tar xzf resources.tgz

Install dependencies for using pygraphviz:

$ sudo apt-get install python-dev graphviz libgraphviz-dev pkg-config

Install project dependencies:

pip install -r requirements.txt

Setup spaCy. Download the language models for English, Dutch, French and Italian

$ python -m spacy download en
$ python -m spacy download nl
$ python -m spacy download fr
$ python -m spacy download it

Setup NLTK

$ python -m nltk.downloader stopwords
$ python -m nltk.downloader wordnet

Induction of SemEval Taxonomies

Run the semeval.py to reproduce experimental results, e.g.:

For a test run (few resources loaded, quick):

python semeval.py vocabularies/science_en.csv en simple --test

For a normal run (all resources are loaded, requires 64Gb of RAM):

python semeval.py vocabularies/science_en.csv en simple

Afterwards a noisy graph is being created. Clean the output by executing(this example uses the inputfile science_en.csv-relations.csv-taxo-knn1.csv):

./run.sh taxi_output/simple_full/science_en.csv-relations.csv-taxo-knn1.csv

The vocabularies directory contains input terms for different domains and languages. The script lets you reproduce results in the SemEval 2016 Task 13 Taxonomy Extraction Evaluation described in the our paper. This script load hypernyms from the downloaded resources and constructs a taxonomy for every input vocabulary of the SemEval datasets, e.g. English Food domain. Generally, the TAXI approach takes as input a vocabulary and outputs a taxonomy for a linked subset of the terms from this vocabulary. Currently the main purpose of this repository is to ensure reproducibility of the SemEval results. The results taxonomies will be generated next to the corresponding input vocabulary file. If you need to adapt the script for your needs and require help do not hesitate to contact us.

Distributional Semantics

Download the required embeddings:

$ wget http://ltdata1.informatik.uni-hamburg.de/taxi/embeddings/embeddings_poincare_wordnet
$ wget http://ltdata1.informatik.uni-hamburg.de/taxi/embeddings/own_embeddings_w2v
$ wget http://ltdata1.informatik.uni-hamburg.de/taxi/embeddings/own_embeddings_w2v.trainables.syn1neg.npy
$ wget http://ltdata1.informatik.uni-hamburg.de/taxi/embeddings/own_embeddings_w2v.wv.vectors.npy

Set the directory path in line 45 of distributional_semantics.py to the directory containing the embeddings download above.
To apply distributional semantics to the generated taxonomy, use the script distributional_semantics.py or the notebook distributional_semantics.py.ipynb

The script can be used with following options:

Option	Alternate	Description	Default	Choices
--taxonomy	-t	Input file containing the taxonomy	-	-
--mode	-m	Mode of the algorithm	ds	ds, root, remove
--domain	-d	Domain of the taxonomy	science	science, science_wordnet, food, environment_eurovoc
--exparent	-ep	Exculde parent while calculating cluster similarity	False	-
--exfamily	-ef	Exculde family while calculating cluster similarity	False	-

Example:
$ python distributional_semantics.py -t taxi_output/simple_full/science_en.csv-relations.csv-taxo-knn1.csv -d food -ep

Visualizing taxonomies

To visualize the taxonomy structures in a .csv file, you must have Networkx and Pygraphviz setup in your environment.

To construct a hierarchical taxonomy structure:
$ python visualize_taxonomy.py --file <csv filename>

The images generated will be very large, so alternatively, the graph can be constructed inside the notebook networkx_graph.ipynb

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].

Stars: ✭ 27

Visit Git Page 🔗Visit User Page 🔗Visit Issues Page (1) 🔗