
Awesome Artificial Intelligence, Machine Learning and Deep Learning as we learn it. Study notes and a curated list of awesome resources of such topics.

Stars: ✭ 831 (+1332.76%)

Mutual labels: artificial-intelligence, natural-language-processing

Ciff

Cornell Instruction Following Framework

Stars: ✭ 23 (-60.34%)

Mutual labels: artificial-intelligence, natural-language-processing

Ml Classify Text Js

Machine learning based text classification in JavaScript using n-grams and cosine similarity

Stars: ✭ 38 (-34.48%)

Mutual labels: artificial-intelligence, natural-language-processing

Texar Pytorch

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

Stars: ✭ 636 (+996.55%)

Mutual labels: natural-language-processing, machine-translation

Stanza

Official Stanford NLP Python Library for Many Human Languages

Stars: ✭ 5,887 (+10050%)

Mutual labels: artificial-intelligence, natural-language-processing

Nlg Eval

Evaluation code for various unsupervised automated metrics for Natural Language Generation.

Stars: ✭ 822 (+1317.24%)

Mutual labels: natural-language-processing, machine-translation

Mycroft Core

Mycroft Core, the Mycroft Artificial Intelligence platform.

Stars: ✭ 5,489 (+9363.79%)

Mutual labels: artificial-intelligence, natural-language-processing

String To Tree Nmt

Source code and data for the paper "Towards String-to-Tree Neural Machine Translation"

Stars: ✭ 16 (-72.41%)

Mutual labels: natural-language-processing, machine-translation

Reading comprehension tf

Machine Reading Comprehension in Tensorflow

Stars: ✭ 37 (-36.21%)

Mutual labels: artificial-intelligence, natural-language-processing

Coursera Natural Language Processing Specialization

Programming assignments from all courses in the Coursera Natural Language Processing Specialization offered by deeplearning.ai.

Stars: ✭ 39 (-32.76%)

Mutual labels: artificial-intelligence, natural-language-processing

Learn Data Science For Free

This repository is a combination of different resources lying scattered all over the internet. The reason for making such a repository is to combine all the valuable resources in a sequential manner, so that it helps every beginner who is in search of a free and structured learning resource for Data Science. For Constant Updates Follow me in …

Stars: ✭ 4,757 (+8101.72%)

Mutual labels: artificial-intelligence, natural-language-processing

Cdqa

⛔ [NOT MAINTAINED] An End-To-End Closed Domain Question Answering System.

Stars: ✭ 500 (+762.07%)

Mutual labels: artificial-intelligence, natural-language-processing


Quick Installation

Detailed usage examples and instructions can be found in the Full Documentation.

Simple installation from PyPI

pip install unbabel-comet

To develop locally, install Poetry and run the following commands:

git clone https://github.com/Unbabel/COMET
cd COMET
poetry install

Scoring MT outputs:

Via Bash:

Examples from WMT20:

echo -e "Dem Feuer konnte Einhalt geboten werden\nSchulen und Kindergärten wurden eröffnet." >> src.de
echo -e "The fire could be stopped\nSchools and kindergartens were open" >> hyp.en
echo -e "They were able to control the fire.\nSchools and kindergartens opened" >> ref.en

comet score -s src.de -h hyp.en -r ref.en

You can export your results to a JSON file using the --to_json flag and select another model/metric with --model.

comet score -s src.de -h hyp.en -r ref.en --model wmt-large-hter-estimator --to_json segments.json
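If you want to drive the same command from a script, the argument list can be assembled programmatically. The following is a minimal sketch that uses only the flags shown above (-s, -h, -r, --model, --to_json). The helper names build_comet_score_command and run_comet_score are hypothetical and not part of COMET, and running the command assumes the comet executable is on your PATH.

```python
import subprocess
from typing import List, Optional


def build_comet_score_command(
    src_path: str,
    hyp_path: str,
    ref_path: str,
    model: Optional[str] = None,
    to_json: Optional[str] = None,
) -> List[str]:
    """Assemble the `comet score` argument list (hypothetical helper)."""
    cmd = ["comet", "score", "-s", src_path, "-h", hyp_path, "-r", ref_path]
    if model is not None:
        # Select a model/metric other than the default.
        cmd += ["--model", model]
    if to_json is not None:
        # Export the results to the given JSON file.
        cmd += ["--to_json", to_json]
    return cmd


def run_comet_score(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)
```

Calling run_comet_score(build_comet_score_command("src.de", "hyp.en", "ref.en", model="wmt-large-hter-estimator", to_json="segments.json")) would then be equivalent to the second command above.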

Via Python:

from comet.models import download_model
model = download_model("wmt-large-da-estimator-1719")
data = [
    {
        "src": "Dem Feuer konnte Einhalt geboten werden",
        "mt": "The fire could be stopped",
        "ref": "They were able to control the fire."
    },
    {
        "src": "Schulen und Kindergärten wurden eröffnet.",
        "mt": "Schools and kindergartens were open",
        "ref": "Schools and kindergartens opened"
    }
]
model.predict(data, cuda=True, show_progress=True)

A simple Pythonic way to convert lists of segments to model inputs:

source = ["Dem Feuer konnte Einhalt geboten werden", "Schulen und Kindergärten wurden eröffnet."]
hypothesis = ["The fire could be stopped", "Schools and kindergartens were open"]
reference = ["They were able to control the fire.", "Schools and kindergartens opened"]

data = {"src": source, "mt": hypothesis, "ref": reference}
data = [dict(zip(data, t)) for t in zip(*data.values())]

model.predict(data, cuda=True, show_progress=True)
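One caveat with the zip-based conversion: zip stops at the shortest list, so a missing hypothesis or reference is silently dropped instead of reported. The sketch below wraps the same conversion in a hypothetical helper, to_samples (not part of COMET), that checks the lengths first and produces the same list of "src"/"mt"/"ref" dictionaries.

```python
from typing import Dict, List


def to_samples(
    source: List[str], hypothesis: List[str], reference: List[str]
) -> List[Dict[str, str]]:
    """Pair up parallel segment lists as dicts with src/mt/ref keys."""
    if not (len(source) == len(hypothesis) == len(reference)):
        raise ValueError(
            "source, hypothesis and reference must have the same length, got "
            f"{len(source)}, {len(hypothesis)} and {len(reference)}"
        )
    return [
        {"src": s, "mt": m, "ref": r}
        for s, m, r in zip(source, hypothesis, reference)
    ]
```

The returned list has the same shape as the data list built above, so it can be passed to model.predict in the same way.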

Note: The Python interface returns a list of segment-level scores. You can obtain the corpus-level score by averaging the segment-level scores.
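The averaging step can be sketched as follows. This assumes you have already extracted the segment-level scores into a flat list of numbers; the exact structure returned by model.predict is not spelled out here, so the extraction itself is left to you. The function name corpus_score is hypothetical, and the scores in the usage line are made-up placeholders, not real COMET outputs.

```python
from typing import Sequence


def corpus_score(segment_scores: Sequence[float]) -> float:
    """Return the arithmetic mean of the segment-level scores."""
    scores = [float(s) for s in segment_scores]
    if not scores:
        raise ValueError("cannot average an empty list of segment scores")
    return sum(scores) / len(scores)


# Placeholder values for illustration only.
print(corpus_score([0.5, 1.0]))
```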

Model Zoo:

Model	Description
↑`wmt-large-da-estimator-1719`	RECOMMENDED: Estimator model built on top of XLM-R (large) trained on DA from WMT17, WMT18 and WMT19.
↑`wmt-base-da-estimator-1719`	Estimator model built on top of XLM-R (base) trained on DA from WMT17, WMT18 and WMT19.
↓`wmt-large-hter-estimator`	Estimator model built on top of XLM-R (large) trained to regress on HTER.
↓`wmt-base-hter-estimator`	Estimator model built on top of XLM-R (base) trained to regress on HTER.
↑`emnlp-base-da-ranker`	Translation ranking model that uses XLM-R to encode sentences. This model was trained with WMT17 and WMT18 Direct Assessments Relative Ranks (DARR).

QE-as-a-metric:

Model	Description
`wmt-large-qe-estimator-1719`	Quality Estimator model built on top of XLM-R (large) trained on DA from WMT17, WMT18 and WMT19.

Train your own Metric:

Instead of using pretrained models, you can train your own model with the following command:

comet train -f {config_file_path}.yaml

Tensorboard:

Launch tensorboard with:

tensorboard --logdir="experiments/"

Download Command:

To download publicly available corpora for training your own models, use the download command. For example, to download the APEQUEST HTER corpus, run the following command:

comet download -d apequest --saving_path data/

unittest:

To run the toolkit tests, run the following commands:

coverage run --source=comet -m unittest discover
coverage report -m

Publications

@inproceedings{rei-etal-2020-comet,
    title = "{COMET}: A Neural Framework for {MT} Evaluation",
    author = "Rei, Ricardo  and
      Stewart, Craig  and
      Farinha, Ana C  and
      Lavie, Alon",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.213",
    pages = "2685--2702",
}

@inproceedings{rei-EtAl:2020:WMT,
  author    = {Rei, Ricardo  and  Stewart, Craig  and  Farinha, Ana C  and  Lavie, Alon},
  title     = {Unbabel's Participation in the WMT20 Metrics Shared Task},
  booktitle      = {Proceedings of the Fifth Conference on Machine Translation},
  month          = {November},
  year           = {2020},
  address        = {Online},
  publisher      = {Association for Computational Linguistics},
  pages     = {909--918},
}

@inproceedings{stewart-etal-2020-comet,
    title = "{COMET} - Deploying a New State-of-the-art {MT} Evaluation Metric in Production",
    author = "Stewart, Craig  and
      Rei, Ricardo  and
      Farinha, Catarina  and
      Lavie, Alon",
    booktitle = "Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)",
    month = oct,
    year = "2020",
    address = "Virtual",
    publisher = "Association for Machine Translation in the Americas",
    url = "https://www.aclweb.org/anthology/2020.amta-user.4",
    pages = "78--109",
}


Stars: ✭ 58
