630 open-source projects that are alternatives to or similar to folia

An extensive Python library for dealing with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation finding application in Natural Language Processing (NLP). This library was formerly part of PyNLPl.

Stars: ✭ 13 (-76.79%)

Mutual labels: xml, computational-linguistics, folia

frog

Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.

Stars: ✭ 70 (+25%)

Mutual labels: computational-linguistics, folia

Colibri Core

Colibri Core is an NLP tool, as well as a C++ and Python library, for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, of either fixed or dynamic size) in a quick and memory-efficient way. At its core is the tool `colibri-patternmodeller`, which allows you to build, view, manipulate and query pattern models.

Stars: ✭ 112 (+100%)

Mutual labels: corpus, linguistics

pylangacq

Language Acquisition Research Tools

Stars: ✭ 33 (-41.07%)

Mutual labels: linguistics, computational-linguistics

ucto

Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, that you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules …

Stars: ✭ 58 (+3.57%)

Mutual labels: computational-linguistics, folia

fuzzing-corpus

My fuzzing corpus

Stars: ✭ 120 (+114.29%)

Mutual labels: corpus, file-format

linguistics problems

Natural language processing in examples and games

Stars: ✭ 23 (-58.93%)

Mutual labels: linguistics, computational-linguistics

Weixin public corpus

WeChat Official Accounts corpus

Stars: ✭ 465 (+730.36%)

Mutual labels: corpus, linguistics

wikipron

Massively multilingual pronunciation mining

Stars: ✭ 167 (+198.21%)

Mutual labels: linguistics, computational-linguistics

proiel-treebank

Official releases of the PROIEL treebank of ancient Indo-European languages

Stars: ✭ 30 (-46.43%)

Mutual labels: corpus, linguistics

nytwit

New York Times Word Innovation Types dataset

Stars: ✭ 21 (-62.5%)

Mutual labels: corpus, computational-linguistics

cljs-corpus

A greppable archive of ClojureScript code

Stars: ✭ 37 (-33.93%)

Mutual labels: corpus

pdf-corpus

Python script to quickly create hand-crafted PDF files

Stars: ✭ 17 (-69.64%)

Mutual labels: corpus

CBLUE

CBLUE, a benchmark for Chinese medical information processing: A Chinese Biomedical Language Understanding Evaluation Benchmark

Stars: ✭ 379 (+576.79%)

Mutual labels: corpus

linguisticsdown

Easy Linguistics Document Writing with R Markdown

Stars: ✭ 24 (-57.14%)

Mutual labels: linguistics

naf

Nucleotide Archival Format - Compressed file format for DNA/RNA/protein sequences

Stars: ✭ 35 (-37.5%)

Mutual labels: file-format

KWDLC

Kyoto University Web Document Leads Corpus

Stars: ✭ 64 (+14.29%)

Mutual labels: corpus

lameta

The Metadata Editor for Transparent Archiving of language document materials

Stars: ✭ 18 (-67.86%)

Mutual labels: linguistics

iOS-Shortcuts-Reference

Reference documentation for the iOS Shortcuts app file structure

Stars: ✭ 89 (+58.93%)

Mutual labels: file-format

TextDatasetCleaner

🔬 Cleaning junk out of datasets (normalization, preprocessing)

Stars: ✭ 27 (-51.79%)

Mutual labels: linguistics

lingvo--Ner-ru

Named entity recognition (NER) in Russian texts

Stars: ✭ 38 (-32.14%)

Mutual labels: linguistics

js-cfb

💾 OLE File Container Format

Stars: ✭ 54 (-3.57%)

Mutual labels: file-format

IntroApp

This Android app adds splash screen slides to make a great intro for an app.

Stars: ✭ 16 (-71.43%)

Mutual labels: xml

zwift-workout-file-reference

Reference documentation for the Zwift workout file format

Stars: ✭ 54 (-3.57%)

Mutual labels: xml

bible-corpus

A multilingual parallel corpus created from translations of the Bible.

Stars: ✭ 115 (+105.36%)

Mutual labels: corpus

MagicaVoxel File Writer

Dependency-free C++ class for writing MagicaVoxel files

Stars: ✭ 26 (-53.57%)

Mutual labels: file-format

MP4Parse

C++ library for MP4 file parsing.

Stars: ✭ 55 (-1.79%)

Mutual labels: file-format

odin

Data-structure definition/validation/traversal, mapping and serialisation toolkit for Python

Stars: ✭ 24 (-57.14%)

Mutual labels: xml

mimesniffer

A MIME type sniffer for Go.

Stars: ✭ 22 (-60.71%)

Mutual labels: file-format

xrechnung-visualization

XSL transformations for web and PDF rendering of German CIUS XRechnung or EN16931-1:2017 [mirror of GitLab]

Stars: ✭ 26 (-53.57%)

Mutual labels: xml

egret-wenda-corpus

A Public Corpus for Machine Learning

Stars: ✭ 41 (-26.79%)

Mutual labels: corpus

VectorDrawable2Svg

Converts Android VectorDrawable .xml files to .svg files

Stars: ✭ 50 (-10.71%)

Mutual labels: xml

jrte-corpus

Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020)

Stars: ✭ 66 (+17.86%)

Mutual labels: corpus

dreamland world

DreamLand MUD: all configuration files, and some areas for local dev

Stars: ✭ 16 (-71.43%)

Mutual labels: xml

datastories-semeval2017-task6

Deep-learning model presented in "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison".

Stars: ✭ 20 (-64.29%)

Mutual labels: computational-linguistics

SentimentAnalysis

Sentiment Analysis: Deep Bi-LSTM+attention model

Stars: ✭ 32 (-42.86%)

Mutual labels: computational-linguistics

kaldi helpers

🙊 A set of scripts to use in preparing a corpus for speech-to-text processing with the Kaldi Automatic Speech Recognition Library.

Stars: ✭ 13 (-76.79%)

Mutual labels: computational-linguistics

clinical nlp elastic

Clinical NLP Analysis with Elasticsearch and Kibana

Stars: ✭ 32 (-42.86%)

Mutual labels: linguistics

firehose

Interchange format for the results of static analysis tools

Stars: ✭ 62 (+10.71%)

Mutual labels: file-format

mystem-scala

Morphological analyzer `mystem` (Russian language) wrapper for JVM languages

Stars: ✭ 21 (-62.5%)

Mutual labels: computational-linguistics

miniply

A fast and easy-to-use PLY parsing library in a single C++11 header and .cpp file

Stars: ✭ 29 (-48.21%)

Mutual labels: file-format

thai-language

Computer tools for the Thai language

Stars: ✭ 20 (-64.29%)

Mutual labels: corpus

sentiment-analysis-of-tweets-in-russian

Sentiment analysis of tweets in Russian using Convolutional Neural Networks (CNN) with Word2Vec embeddings.

Stars: ✭ 51 (-8.93%)

Mutual labels: computational-linguistics

citation-function

Measuring the Evolution of a Scientific Field through Citation Frames

Stars: ✭ 40 (-28.57%)

Mutual labels: computational-linguistics

go-obj

OBJ file loader for golang

Stars: ✭ 16 (-71.43%)

Mutual labels: file-format

datalinguist

Stanford CoreNLP in idiomatic Clojure.

Stars: ✭ 93 (+66.07%)

Mutual labels: computational-linguistics

Config

PHP library for simple configuration management

Stars: ✭ 39 (-30.36%)

Mutual labels: xml

TinyMAT

C/C++ library for writing simple MATLAB® MAT files

Stars: ✭ 22 (-60.71%)

Mutual labels: file-format

GbxDump

A Microsoft Windows application that displays the contents of the file header of *.Gbx files used by the Nadeo game engine GameBox.

Stars: ✭ 19 (-66.07%)

Mutual labels: file-format

TV4Dialog

No description or website provided.

Stars: ✭ 33 (-41.07%)

Mutual labels: corpus

TinyTIFF

Lightweight TIFF reader/writer library (C/C++)

Stars: ✭ 91 (+62.5%)

Mutual labels: file-format

LanguageCodes

We present a list of languages with their codes, families, regions, etc. We also present a list of multilingual corpora (with URLs).

Stars: ✭ 70 (+25%)

Mutual labels: corpus

mmtf

The specification of the MMTF format for biological structures

Stars: ✭ 40 (-28.57%)

Mutual labels: file-format

utils.js

👷 🔧 Zero-dependency vanilla JavaScript utilities.

Stars: ✭ 14 (-75%)

Mutual labels: xml

php-hal

HAL+JSON & HAL+XML API transformer outputting valid (PSR-7) API Responses.

Stars: ✭ 30 (-46.43%)

Mutual labels: xml

neural-net-linguistics

Papers about neural networks and linguistics

Stars: ✭ 14 (-75%)

Mutual labels: linguistics

text-classification-cn

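Chinese text classification in practice, based on the Sogou News corpus, using traditional machine learning methods as well as pretrained models, among other approaches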

Stars: ✭ 81 (+44.64%)

Mutual labels: corpus

CLUEmotionAnalysis2020

CLUE Emotion Analysis Dataset: a fine-grained sentiment analysis dataset

Stars: ✭ 3 (-94.64%)

Mutual labels: corpus

kanji-frequency

Kanji usage frequency data collected from various sources

Stars: ✭ 92 (+64.29%)

Mutual labels: corpus

blogspot-themes

Blogspot (Blogger) Themes Library

Stars: ✭ 32 (-42.86%)

Mutual labels: xml