Unitex/GramLab Language Resources

Unitex/GramLab is the open source, cross-platform, multilingual, lexicon- and grammar-based corpus processing suite

This repository contains the Language Resources which are distributed within Unitex/GramLab.

Languages

| Language name | Native name | Language Family | IETF | ISO 639-2 | ISO 639-1 |
|---|---|---|---|---|---|
| Arabic | العربية | Afro-Asiatic | ar | ara | ar |
| Chinese | 汉语/漢語 | Sino-Tibetan | zh | chi/zho | zh |
| English | English | Indo-European | en | eng | en |
| Finnish | Suomi | Uralic | fi | fin | fi |
| French | Français | Indo-European | fr | fra | fr |
| Georgian (Ancient) | ქართული | South Caucasian | oge | | |
| German | Deutsch | Indo-European | de | deu | de |
| Greek (ancient) | Αρχαία Ελληνικα | Indo-European | grc | grc | |
| Greek (modern) | Ελληνικά | Indo-European | el | ell | el |
| Italian | Italiano | Indo-European | it | ita | it |
| Korean | 한국어 | Koreanic | ko | kor | ko |
| Latin | Latine | Indo-European | la | lat | la |
| Malagasy | Malagasy | Austronesian | mg | mlg | mg |
| Norwegian Bokmål | Norsk bokmål | Indo-European | no | nob | nb |
| Norwegian Nynorsk | Norsk nynorsk | Indo-European | nn | nno | nn |
| Polish | Polski | Indo-European | pl | pol | pl |
| Portuguese (Portugal) | Português (Portugal) | Indo-European | pt-PT | | |
| Portuguese (Brazil) | Português (Brasil) | Indo-European | pt-BR | | |
| Russian | Русский | Indo-European | ru | rus | ru |
| Serbian-Cyrillic | Српски | Indo-European | sr-Cyrl | sro | sr |
| Serbian-Latin | Serbian (Latin) | Indo-European | sr-Latn | srm | |
| Spanish | Español | Indo-European | es | spa | es |
| Thai | ไทย | Tai–Kadai | th | tha | th |

Contributing

Everyone is welcome to help improve the Unitex/GramLab Language Resources by forking this repository and sending a pull request with their changes.

How to support a new language in Unitex

To add a new language to Unitex:

  • Copy the folder template zxx-t-Skel and rename it according to the ISO 639-1 code of the new language
  • Use the IETF language tag if the ISO 639-1 code is not available for your language.
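
The two steps above can be sketched in Python as follows. This is only an illustration, not a tool shipped with Unitex/GramLab: the function names `language_folder_name` and `create_language_folder` and the `repo_root` parameter are assumptions made for the example. Only the template folder name `zxx-t-Skel` and the naming rule come from the text above.

```python
import shutil
from pathlib import Path
from typing import Optional


def language_folder_name(iso_639_1: Optional[str], ietf_tag: str) -> str:
    """Prefer the ISO 639-1 code; fall back to the IETF tag when there is none."""
    return iso_639_1 if iso_639_1 else ietf_tag


def create_language_folder(repo_root: Path, iso_639_1: Optional[str], ietf_tag: str) -> Path:
    """Copy the zxx-t-Skel template to a folder named after the new language.

    repo_root is assumed to be a local clone that contains zxx-t-Skel.
    """
    template = repo_root / "zxx-t-Skel"
    target = repo_root / language_folder_name(iso_639_1, ietf_tag)
    if target.exists():
        # Never overwrite an existing language folder.
        raise FileExistsError(f"{target} already exists")
    shutil.copytree(template, target)
    return target
```

For example, Ancient Greek has no ISO 639-1 code in the table above, so `language_folder_name(None, "grc")` falls back to the IETF tag `grc`.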

Your new language must provide at least:

  • An alphabet file (Alphabet.txt) and optionally a sorted alphabet (Alphabet_sort.txt)
  • A sample corpus (Corpus/Corpus.txt). Make sure you have the rights to share this resource, and provide the author information in Corpus/Corpus.info
  • A sample dictionary (Dela/lang-CODE.dic) containing at least the words of the sample text
  • A sentence delimitation graph (Graphs/Preprocessing/Sentence/Sentence.grf)
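
A quick way to confirm that a new language folder contains these minimum resources is a small check like the one below. It is a sketch under stated assumptions: `missing_resources` is a hypothetical helper, and because the dictionary name depends on the language code, the check accepts any `.dic` file under `Dela/` rather than one specific name. The optional `Alphabet_sort.txt` is not checked.

```python
from pathlib import Path
from typing import List

# Fixed paths named in the list above, relative to the language folder.
REQUIRED_FILES = [
    "Alphabet.txt",
    "Corpus/Corpus.txt",
    "Corpus/Corpus.info",
    "Graphs/Preprocessing/Sentence/Sentence.grf",
]


def missing_resources(language_dir: Path) -> List[str]:
    """Return the required resources that are absent from language_dir."""
    missing = [rel for rel in REQUIRED_FILES if not (language_dir / rel).is_file()]
    # Assumption: any .dic file in Dela/ counts as the sample dictionary.
    if not any((language_dir / "Dela").glob("*.dic")):
        missing.append("Dela/*.dic")
    return missing
```

An empty list means every required file was found. The check does not verify the content, for instance that the dictionary covers the words of the sample text.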

Before sharing your contribution, make sure that:

  • File names use only 7-bit ASCII characters.
  • For each compiled graph (.fst2) you are also providing the .grf version.
  • For each dictionary (.dic) you are also providing a .info file describing the dictionary content (codes used in it, number of entries, authors, etc.).
  • You accept the LGPLLR license.
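
The first three points in this checklist can be verified mechanically. The sketch below is one possible way to do so, not an official validation script. `contribution_problems` is a hypothetical name, and the code assumes that a companion `.grf` or `.info` file sits next to its `.fst2` or `.dic` file and has the same base name. Accepting the LGPLLR license remains a manual step.

```python
from pathlib import Path
from typing import List


def contribution_problems(language_dir: Path) -> List[str]:
    """List checklist violations found under language_dir."""
    problems = []
    for path in sorted(language_dir.rglob("*")):
        rel = path.relative_to(language_dir).as_posix()
        # File and folder names must be plain 7-bit ASCII.
        if not path.name.isascii():
            problems.append(f"non-ASCII name: {rel}")
        # Assumption: the .grf source sits next to the compiled .fst2 graph.
        if path.suffix == ".fst2" and not path.with_suffix(".grf").is_file():
            problems.append(f"missing .grf source: {rel}")
        # Assumption: the .info description shares the dictionary's base name.
        if path.suffix == ".dic" and not path.with_suffix(".info").is_file():
            problems.append(f"missing .info description: {rel}")
    return problems
```

Running it on a language folder before opening a pull request returns an empty list when none of these three problems is present.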

RELEX network

Language Resources are mainly built and maintained by the members of the RELEX network, an international network of laboratories specialized in Computational Linguistics that was created by Maurice Gross and his LADL (Laboratoire d'Automatique Documentaire et Linguistique) team.

| Country | Partner |
|---|---|
| Belgium | Catholic University of Leuven |
| Belgium | CENTAL |
| Brazil | Federal University of Goias |
| Brazil | NILC |
| Brazil | Projeto Relex |
| Brazil | PUC RIO |
| Canada | University of Montréal |
| Denmark | University of Copenhagen |
| England | Research and Development Unit for English Studies |
| France | CRISCO |
| France | EHESS |
| France | LDI |
| France | LIGM |
| France | LIMSI |
| France | LIP6 |
| France | LORIA |
| France | UFRL |
| France | Université de Tours |
| France | University Bordeaux 3 |
| France | University Grenoble 3 |
| France | University of Franche-Comté |
| France | University of Paris-Est Marne-la-Vallée |
| France | University of Rouen |
| France | University of Strasbourg |
| France | University Paris 8 |
| France | University Paris-Sorbonne |
| Germany | CIS, University of Munich |
| Germany | University of Heidelberg |
| Greece | ILSP |
| Greece | University of Thessaloniki |
| Hong Kong | City University of Hong Kong |
| Hungary | Research Institute for Linguistics |
| Israel | University of Tel Aviv |
| Italy | University of Bari |
| Italy | University of Salerno |
| Japan | Information Science Research Center |
| Korea | Hankuk University of Foreign Studies |
| Madagascar | University of Antananarivo |
| Norway | University of Bergen |
| Poland | Adam Mickiewicz University |
| Portugal | LabEL |
| Portugal | University of Algarve |
| Serbia | University of Belgrade |
| Slovakia | The Faculty of Economics |
| Spain | Autonomous University of Barcelona |
| Spain | University of Alicante |
| Switzerland | University of Genève |
| Switzerland | University of Zürich |
| United States | Florida International University |
| United States | New York University |
| United States | University of California San Diego |
| United States | University of North Texas |

Documentation

The User's Manual (in PDF format) is available in English and French (more translations are welcome). You can view and print it with Evince, downloadable here. The latest version of the User's Manual is accessible here.

Support

Support questions can be posted in the community support forum. Please feel free to submit any suggestions or requests for new features too. Some general advice about asking technical support questions can be found here.

Reporting Bugs

See the Bug Reporting Guide for information on how to report bugs.

Governance Model

Unitex/GramLab project decision-making is based on a meritocratic community process: anyone with an interest in the project can join the community, contribute to the project design, and participate in decisions. The Unitex/GramLab Governance Model describes how this participation takes place and how to set about earning merit within the project community.

Spelling

Unitex/GramLab is spelled with capital "U", "G" and "L", and with everything else in lower case. Apart from the forward slash, do not put a space or any other character between the words. Only where the forward slash is not allowed may you write "UnitexGramLab" instead.

It's common to refer to the Unitex/GramLab Core as "Unitex", and to the Unitex Project-oriented IDE as "GramLab". If you are mentioning the distribution suite (Core, IDE, Linguistic Resources and other bundled tools), always use "Unitex/GramLab".

License

Language Resources are distributed under the terms of the Lesser General Public License For Linguistic Resources (LGPLLR). Contact [email protected] for further inquiries.


Copyright (C) 2019 Université Paris-Est Marne-la-Vallée
