All Projects → Unbabel → Comet

Unbabel / Comet

Licence: apache-2.0
A Neural Framework for MT Evaluation

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Comet

Nonautoreggenprogress
Tracking the progress in non-autoregressive generation (translation, transcription, etc.)
Stars: ✭ 118 (+103.45%)
Mutual labels:  artificial-intelligence, natural-language-processing, machine-translation
Awesome Ai Services
An overview of the AI-as-a-service landscape
Stars: ✭ 133 (+129.31%)
Mutual labels:  artificial-intelligence, natural-language-processing, machine-translation
Thot
Thot toolkit for statistical machine translation
Stars: ✭ 53 (-8.62%)
Mutual labels:  artificial-intelligence, natural-language-processing, machine-translation
Ciphey
⚡ Automatically decrypt encryptions without knowing the key or cipher, decode encodings, and crack hashes ⚡
Stars: ✭ 9,116 (+15617.24%)
Mutual labels:  artificial-intelligence, natural-language-processing
Ai Job Recommend
国内公司人工智能方向(含机器学习、深度学习、计算机视觉和自然语言处理)岗位的招聘信息(含全职、实习和校招)
Stars: ✭ 679 (+1070.69%)
Mutual labels:  artificial-intelligence, natural-language-processing
Ai Series
📚 [.md & .ipynb] Series of Artificial Intelligence & Deep Learning, including Mathematics Fundamentals, Python Practices, NLP Application, etc. 💫 人工智能与深度学习实战,数理统计篇 | 机器学习篇 | 深度学习篇 | 自然语言处理篇 | 工具实践 Scikit & Tensoflow & PyTorch 篇 | 行业应用 & 课程笔记
Stars: ✭ 702 (+1110.34%)
Mutual labels:  artificial-intelligence, natural-language-processing
Nlp Notebooks
A collection of notebooks for Natural Language Processing from NLP Town
Stars: ✭ 513 (+784.48%)
Mutual labels:  artificial-intelligence, natural-language-processing
Riceteacatpanda
repo with challenge material for riceteacatpanda (2020)
Stars: ✭ 18 (-68.97%)
Mutual labels:  artificial-intelligence, natural-language-processing
Awesome Ai Ml Dl
Awesome Artificial Intelligence, Machine Learning and Deep Learning as we learn it. Study notes and a curated list of awesome resources of such topics.
Stars: ✭ 831 (+1332.76%)
Mutual labels:  artificial-intelligence, natural-language-processing
Ciff
Cornell Instruction Following Framework
Stars: ✭ 23 (-60.34%)
Mutual labels:  artificial-intelligence, natural-language-processing
Ml Classify Text Js
Machine learning based text classification in JavaScript using n-grams and cosine similarity
Stars: ✭ 38 (-34.48%)
Mutual labels:  artificial-intelligence, natural-language-processing
Texar Pytorch
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
Stars: ✭ 636 (+996.55%)
Mutual labels:  natural-language-processing, machine-translation
Stanza
Official Stanford NLP Python Library for Many Human Languages
Stars: ✭ 5,887 (+10050%)
Mutual labels:  artificial-intelligence, natural-language-processing
Nlg Eval
Evaluation code for various unsupervised automated metrics for Natural Language Generation.
Stars: ✭ 822 (+1317.24%)
Mutual labels:  natural-language-processing, machine-translation
Mycroft Core
Mycroft Core, the Mycroft Artificial Intelligence platform.
Stars: ✭ 5,489 (+9363.79%)
Mutual labels:  artificial-intelligence, natural-language-processing
String To Tree Nmt
Source code and data for the paper "Towards String-to-Tree Neural Machine Translation"
Stars: ✭ 16 (-72.41%)
Mutual labels:  natural-language-processing, machine-translation
Reading comprehension tf
Machine Reading Comprehension in Tensorflow
Stars: ✭ 37 (-36.21%)
Mutual labels:  artificial-intelligence, natural-language-processing
Coursera Natural Language Processing Specialization
Programming assignments from all courses in the Coursera Natural Language Processing Specialization offered by deeplearning.ai.
Stars: ✭ 39 (-32.76%)
Mutual labels:  artificial-intelligence, natural-language-processing
Learn Data Science For Free
This repositary is a combination of different resources lying scattered all over the internet. The reason for making such an repositary is to combine all the valuable resources in a sequential manner, so that it helps every beginners who are in a search of free and structured learning resource for Data Science. For Constant Updates Follow me in …
Stars: ✭ 4,757 (+8101.72%)
Mutual labels:  artificial-intelligence, natural-language-processing
Cdqa
⛔ [NOT MAINTAINED] An End-To-End Closed Domain Question Answering System.
Stars: ✭ 500 (+762.07%)
Mutual labels:  artificial-intelligence, natural-language-processing



License GitHub stars PyPI Code Style

Quick Installation

Detailed usage examples and instructions can be found in the Full Documentation.

Simple installation from PyPI

pip install unbabel-comet

To develop locally install Poetry and run the following commands:

git clone https://github.com/Unbabel/COMET
poetry install

Scoring MT outputs:

Via Bash:

Examples from WMT20:

echo -e "Dem Feuer konnte Einhalt geboten werden\nSchulen und Kindergärten wurden eröffnet." >> src.de
echo -e "The fire could be stopped\nSchools and kindergartens were open" >> hyp.en
echo -e "They were able to control the fire.\nSchools and kindergartens opened" >> ref.en
comet score -s src.de -h hyp.en -r ref.en

You can export your results to a JSON file using the --to_json flag and select another model/metric with --model.

comet score -s src.de -h hyp.en -r ref.en --model wmt-large-hter-estimator --to_json segments.json

Via Python:

from comet.models import download_model
model = download_model("wmt-large-da-estimator-1719")
data = [
    {
        "src": "Dem Feuer konnte Einhalt geboten werden",
        "mt": "The fire could be stopped",
        "ref": "They were able to control the fire."
    },
    {
        "src": "Schulen und Kindergärten wurden eröffnet.",
        "mt": "Schools and kindergartens were open",
        "ref": "Schools and kindergartens opened"
    }
]
model.predict(data, cuda=True, show_progress=True)

Simple Pythonic way to convert list or segments to model inputs:

source = ["Dem Feuer konnte Einhalt geboten werden", "Schulen und Kindergärten wurden eröffnet."]
hypothesis = ["The fire could be stopped", "Schools and kindergartens were open"]
reference = ["They were able to control the fire.", "Schools and kindergartens opened"]

data = {"src": source, "mt": hypothesis, "ref": reference}
data = [dict(zip(data, t)) for t in zip(*data.values())]

model.predict(data, cuda=True, show_progress=True)

Note: Using the python interface you will get a list of segment-level scores. You can obtain the corpus-level score by averaging the segment-level scores

Model Zoo:

Model Description
wmt-large-da-estimator-1719 RECOMMENDED: Estimator model build on top of XLM-R (large) trained on DA from WMT17, WMT18 and WMT19
wmt-base-da-estimator-1719 Estimator model build on top of XLM-R (base) trained on DA from WMT17, WMT18 and WMT19
wmt-large-hter-estimator Estimator model build on top of XLM-R (large) trained to regress on HTER.
wmt-base-hter-estimator Estimator model build on top of XLM-R (base) trained to regress on HTER.
emnlp-base-da-ranker Translation ranking model that uses XLM-R to encode sentences. This model was trained with WMT17 and WMT18 Direct Assessments Relative Ranks (DARR).

QE-as-a-metric:

Model Description
wmt-large-qe-estimator-1719 Quality Estimator model build on top of XLM-R (large) trained on DA from WMT17, WMT18 and WMT19.

Train your own Metric:

Instead of using pretrained models your can train your own model with the following command:

comet train -f {config_file_path}.yaml

Tensorboard:

Launch tensorboard with:

tensorboard --logdir="experiments/"

Download Command:

To download public available corpora to train your new models you can use the download command. For example to download the APEQUEST HTER corpus just run the following command:

comet download -d apequest --saving_path data/

unittest:

In order to run the toolkit tests you must run the following command:

coverage run --source=comet -m unittest discover
coverage report -m

Publications

@inproceedings{rei-etal-2020-comet,
    title = "{COMET}: A Neural Framework for {MT} Evaluation",
    author = "Rei, Ricardo  and
      Stewart, Craig  and
      Farinha, Ana C  and
      Lavie, Alon",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.213",
    pages = "2685--2702",
}
@inproceedings{rei-EtAl:2020:WMT,
  author    = {Rei, Ricardo  and  Stewart, Craig  and  Farinha, Ana C  and  Lavie, Alon},
  title     = {Unbabel's Participation in the WMT20 Metrics Shared Task},
  booktitle      = {Proceedings of the Fifth Conference on Machine Translation},
  month          = {November},
  year           = {2020},
  address        = {Online},
  publisher      = {Association for Computational Linguistics},
  pages     = {909--918},
}
@inproceedings{stewart-etal-2020-comet,
    title = "{COMET} - Deploying a New State-of-the-art {MT} Evaluation Metric in Production",
    author = "Stewart, Craig  and
      Rei, Ricardo  and
      Farinha, Catarina  and
      Lavie, Alon",
    booktitle = "Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)",
    month = oct,
    year = "2020",
    address = "Virtual",
    publisher = "Association for Machine Translation in the Americas",
    url = "https://www.aclweb.org/anthology/2020.amta-user.4",
    pages = "78--109",
}
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].