630 open-source projects that are alternatives to or similar to folia

foliapy
An extensive Python library for working with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation used in Natural Language Processing (NLP). This library was formerly part of PyNLPl.
Stars: ✭ 13 (-76.79%)
Mutual labels:  xml, computational-linguistics, folia
frog
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
Stars: ✭ 70 (+25%)
Mutual labels:  computational-linguistics, folia
Colibri Core
Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At its core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
Stars: ✭ 112 (+100%)
Mutual labels:  corpus, linguistics
pylangacq
Language Acquisition Research Tools
Stars: ✭ 33 (-41.07%)
ucto
Unicode tokeniser. Ucto tokenises text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, that you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules …
Stars: ✭ 58 (+3.57%)
Mutual labels:  computational-linguistics, folia
fuzzing-corpus
My fuzzing corpus
Stars: ✭ 120 (+114.29%)
Mutual labels:  corpus, file-format
linguistics problems
Natural language processing in examples and games
Stars: ✭ 23 (-58.93%)
Weixin public corpus
WeChat Official Accounts corpus
Stars: ✭ 465 (+730.36%)
Mutual labels:  corpus, linguistics
wikipron
Massively multilingual pronunciation mining
Stars: ✭ 167 (+198.21%)
proiel-treebank
Official releases of the PROIEL treebank of ancient Indo-European languages
Stars: ✭ 30 (-46.43%)
Mutual labels:  corpus, linguistics
nytwit
New York Times Word Innovation Types dataset
Stars: ✭ 21 (-62.5%)
cljs-corpus
A greppable archive of ClojureScript code
Stars: ✭ 37 (-33.93%)
Mutual labels:  corpus
pdf-corpus
Python script to quickly create hand-crafted PDF files
Stars: ✭ 17 (-69.64%)
Mutual labels:  corpus
CBLUE
CBLUE, a Chinese medical information processing benchmark: A Chinese Biomedical Language Understanding Evaluation Benchmark
Stars: ✭ 379 (+576.79%)
Mutual labels:  corpus
linguisticsdown
Easy Linguistics Document Writing with R Markdown
Stars: ✭ 24 (-57.14%)
Mutual labels:  linguistics
naf
Nucleotide Archival Format - Compressed file format for DNA/RNA/protein sequences
Stars: ✭ 35 (-37.5%)
Mutual labels:  file-format
KWDLC
Kyoto University Web Document Leads Corpus
Stars: ✭ 64 (+14.29%)
Mutual labels:  corpus
lameta
The Metadata Editor for Transparent Archiving of language document materials
Stars: ✭ 18 (-67.86%)
Mutual labels:  linguistics
iOS-Shortcuts-Reference
Reference documentation for the iOS Shortcuts app file structure
Stars: ✭ 89 (+58.93%)
Mutual labels:  file-format
TextDatasetCleaner
🔬 Cleaning junk out of datasets (normalization, preprocessing)
Stars: ✭ 27 (-51.79%)
Mutual labels:  linguistics
lingvo--Ner-ru
Named entity recognition (NER) in Russian texts
Stars: ✭ 38 (-32.14%)
Mutual labels:  linguistics
js-cfb
💾 OLE File Container Format
Stars: ✭ 54 (-3.57%)
Mutual labels:  file-format
IntroApp
This Android app adds splash screen slides to make a great intro for an app.
Stars: ✭ 16 (-71.43%)
Mutual labels:  xml
zwift-workout-file-reference
Reference documentation for the Zwift workout file format
Stars: ✭ 54 (-3.57%)
Mutual labels:  xml
bible-corpus
A multilingual parallel corpus created from translations of the Bible.
Stars: ✭ 115 (+105.36%)
Mutual labels:  corpus
MagicaVoxel File Writer
A dependency-free C++ class for writing MagicaVoxel files
Stars: ✭ 26 (-53.57%)
Mutual labels:  file-format
MP4Parse
C++ library for MP4 file parsing.
Stars: ✭ 55 (-1.79%)
Mutual labels:  file-format
odin
Data-structure definition/validation/traversal, mapping and serialisation toolkit for Python
Stars: ✭ 24 (-57.14%)
Mutual labels:  xml
mimesniffer
A MIME type sniffer for Go.
Stars: ✭ 22 (-60.71%)
Mutual labels:  file-format
xrechnung-visualization
XSL transformations for web and PDF rendering of the German CIUS XRechnung or EN16931-1:2017 [MIRROR OF GitLab]
Stars: ✭ 26 (-53.57%)
Mutual labels:  xml
egret-wenda-corpus
A Public Corpus for Machine Learning
Stars: ✭ 41 (-26.79%)
Mutual labels:  corpus
VectorDrawable2Svg
Converts Android VectorDrawable .xml files to .svg files
Stars: ✭ 50 (-10.71%)
Mutual labels:  xml
jrte-corpus
Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020)
Stars: ✭ 66 (+17.86%)
Mutual labels:  corpus
dreamland world
DreamLand MUD: all configuration files, and some areas for local dev
Stars: ✭ 16 (-71.43%)
Mutual labels:  xml
datastories-semeval2017-task6
Deep-learning model presented in "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison".
Stars: ✭ 20 (-64.29%)
Mutual labels:  computational-linguistics
SentimentAnalysis
Sentiment Analysis: Deep Bi-LSTM+attention model
Stars: ✭ 32 (-42.86%)
Mutual labels:  computational-linguistics
kaldi helpers
🙊 A set of scripts to use in preparing a corpus for speech-to-text processing with the Kaldi Automatic Speech Recognition Library.
Stars: ✭ 13 (-76.79%)
Mutual labels:  computational-linguistics
clinical nlp elastic
Clinical NLP Analysis with Elasticsearch and Kibana
Stars: ✭ 32 (-42.86%)
Mutual labels:  linguistics
firehose
Interchange format for results for static analysis tools
Stars: ✭ 62 (+10.71%)
Mutual labels:  file-format
mystem-scala
Morphological analyzer `mystem` (Russian language) wrapper for JVM languages
Stars: ✭ 21 (-62.5%)
Mutual labels:  computational-linguistics
miniply
A fast and easy-to-use PLY parsing library in a single C++11 header and .cpp file
Stars: ✭ 29 (-48.21%)
Mutual labels:  file-format
thai-language
Computer tools for the Thai language
Stars: ✭ 20 (-64.29%)
Mutual labels:  corpus
sentiment-analysis-of-tweets-in-russian
Sentiment analysis of tweets in Russian using Convolutional Neural Networks (CNN) with Word2Vec embeddings.
Stars: ✭ 51 (-8.93%)
Mutual labels:  computational-linguistics
citation-function
Measuring the Evolution of a Scientific Field through Citation Frames
Stars: ✭ 40 (-28.57%)
Mutual labels:  computational-linguistics
go-obj
OBJ file loader for golang
Stars: ✭ 16 (-71.43%)
Mutual labels:  file-format
datalinguist
Stanford CoreNLP in idiomatic Clojure.
Stars: ✭ 93 (+66.07%)
Mutual labels:  computational-linguistics
Config
PHP library for simple configuration management
Stars: ✭ 39 (-30.36%)
Mutual labels:  xml
TinyMAT
C/C++ library for writing simple MATLAB® MAT files
Stars: ✭ 22 (-60.71%)
Mutual labels:  file-format
GbxDump
A Microsoft Windows application that displays the contents of the file header of *.Gbx files used by the Nadeo game engine GameBox.
Stars: ✭ 19 (-66.07%)
Mutual labels:  file-format
TV4Dialog
No description or website provided.
Stars: ✭ 33 (-41.07%)
Mutual labels:  corpus
TinyTIFF
Lightweight TIFF reader/writer library (C/C++)
Stars: ✭ 91 (+62.5%)
Mutual labels:  file-format
LanguageCodes
We present a list of languages with their codes, families, regions, etc. We also present a list of multilingual corpora (with URLs).
Stars: ✭ 70 (+25%)
Mutual labels:  corpus
mmtf
The specification of the MMTF format for biological structures
Stars: ✭ 40 (-28.57%)
Mutual labels:  file-format
utils.js
👷 🔧 Zero-dependency vanilla JavaScript utilities.
Stars: ✭ 14 (-75%)
Mutual labels:  xml
php-hal
HAL+JSON & HAL+XML API transformer outputting valid (PSR-7) API responses.
Stars: ✭ 30 (-46.43%)
Mutual labels:  xml
neural-net-linguistics
Papers about neural networks and linguistics
Stars: ✭ 14 (-75%)
Mutual labels:  linguistics
text-classification-cn
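Chinese text classification in practice, based on the Sogou News corpus, using traditional machine learning methods as well as pre-trained models and other approaches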
Stars: ✭ 81 (+44.64%)
Mutual labels:  corpus
CLUEmotionAnalysis2020
CLUE Emotion Analysis Dataset: a fine-grained emotion analysis dataset
Stars: ✭ 3 (-94.64%)
Mutual labels:  corpus
kanji-frequency
Kanji usage frequency data collected from various sources
Stars: ✭ 92 (+64.29%)
Mutual labels:  corpus
blogspot-themes
Blogspot (Blogger) Themes Library
Stars: ✭ 32 (-42.86%)
Mutual labels:  xml