foliapy: An extensive Python library for working with FoLiA (Format for Linguistic Annotation) documents. FoLiA is a rich XML-based format for linguistic annotation used in Natural Language Processing (NLP). This library was formerly part of PyNLPl.
Stars: ✭ 13 (-76.79%)
frog: Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
Stars: ✭ 70 (+25%)
Colibri Core: Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
Stars: ✭ 112 (+100%)
pylangacq: Language Acquisition Research Tools
Stars: ✭ 33 (-41.07%)
ucto: Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suited for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules …
Stars: ✭ 58 (+3.57%)
wikipron: Massively multilingual pronunciation mining
Stars: ✭ 167 (+198.21%)
proiel-treebank: Official releases of the PROIEL treebank of ancient Indo-European languages
Stars: ✭ 30 (-46.43%)
nytwit: New York Times Word Innovation Types dataset
Stars: ✭ 21 (-62.5%)
cljs-corpus: A greppable archive of ClojureScript code
Stars: ✭ 37 (-33.93%)
pdf-corpus: Python script to quickly create hand-crafted PDF files
Stars: ✭ 17 (-69.64%)
CBLUE: Chinese medical information processing benchmark. CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark
Stars: ✭ 379 (+576.79%)
linguisticsdown: Easy Linguistics Document Writing with R Markdown
Stars: ✭ 24 (-57.14%)
naf: Nucleotide Archival Format - Compressed file format for DNA/RNA/protein sequences
Stars: ✭ 35 (-37.5%)
KWDLC: Kyoto University Web Document Leads Corpus
Stars: ✭ 64 (+14.29%)
lameta: The Metadata Editor for Transparent Archiving of language documentation materials
Stars: ✭ 18 (-67.86%)
TextDatasetCleaner: 🔬 Cleaning junk out of datasets (normalization, preprocessing)
Stars: ✭ 27 (-51.79%)
lingvo--Ner-ru: Named entity recognition (NER) in Russian texts
Stars: ✭ 38 (-32.14%)
js-cfb: 💾 OLE File Container Format
Stars: ✭ 54 (-3.57%)
IntroApp: This Android app adds splash screen slides to make a great intro for an app.
Stars: ✭ 16 (-71.43%)
bible-corpus: A multilingual parallel corpus created from translations of the Bible.
Stars: ✭ 115 (+105.36%)
MP4Parse: C++ library for MP4 file parsing.
Stars: ✭ 55 (-1.79%)
odin: Data-structure definition/validation/traversal, mapping and serialisation toolkit for Python
Stars: ✭ 24 (-57.14%)
mimesniffer: A MIME type sniffer for Go.
Stars: ✭ 22 (-60.71%)
xrechnung-visualization: XSL transformations for web and PDF rendering of German CIUS XRechnung or EN16931-1:2017 [MIRROR OF GitLab]
Stars: ✭ 26 (-53.57%)
VectorDrawable2Svg: Converts Android VectorDrawable .xml files to .svg files
Stars: ✭ 50 (-10.71%)
jrte-corpus: Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020)
Stars: ✭ 66 (+17.86%)
dreamland world: DreamLand MUD: all configuration files and some areas for local development
Stars: ✭ 16 (-71.43%)
datastories-semeval2017-task6: Deep-learning model presented in "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison".
Stars: ✭ 20 (-64.29%)
SentimentAnalysis: Sentiment Analysis: Deep Bi-LSTM+attention model
Stars: ✭ 32 (-42.86%)
kaldi helpers: 🙊 A set of scripts for preparing a corpus for speech-to-text processing with the Kaldi Automatic Speech Recognition Library.
Stars: ✭ 13 (-76.79%)
firehose: Interchange format for the results of static analysis tools
Stars: ✭ 62 (+10.71%)
mystem-scala: Morphological analyzer `mystem` (Russian language) wrapper for JVM languages
Stars: ✭ 21 (-62.5%)
miniply: A fast and easy-to-use PLY parsing library in a single C++11 header and cpp file
Stars: ✭ 29 (-48.21%)
thai-language: Computer tools for the Thai language
Stars: ✭ 20 (-64.29%)
citation-function: Measuring the Evolution of a Scientific Field through Citation Frames
Stars: ✭ 40 (-28.57%)
go-obj: OBJ file loader for golang
Stars: ✭ 16 (-71.43%)
datalinguist: Stanford CoreNLP in idiomatic Clojure.
Stars: ✭ 93 (+66.07%)
Config: PHP library for simple configuration management
Stars: ✭ 39 (-30.36%)
TinyMAT: C/C++ library for writing simple Matlab(r) MAT files
Stars: ✭ 22 (-60.71%)
GbxDump: A Microsoft Windows application that displays the contents of the file header of *.Gbx files used by the Nadeo game engine GameBox.
Stars: ✭ 19 (-66.07%)
TV4Dialog: No description or website provided.
Stars: ✭ 33 (-41.07%)
TinyTIFF: Lightweight TIFF reader/writer library (C/C++)
Stars: ✭ 91 (+62.5%)
LanguageCodes: We present a list of languages with their codes, families, regions, etc. We also present a list of multilingual corpora (with URLs).
Stars: ✭ 70 (+25%)
mmtf: The specification of the MMTF format for biological structures
Stars: ✭ 40 (-28.57%)
utils.js: 👷 🔧 Zero-dependency vanilla JavaScript utils.
Stars: ✭ 14 (-75%)
php-hal: HAL+JSON & HAL+XML API transformer outputting valid (PSR-7) API Responses.
Stars: ✭ 30 (-46.43%)
kanji-frequency: Kanji usage frequency data collected from various sources
Stars: ✭ 92 (+64.29%)