cligs / textbox

Licence: other
Text collections made available by the CLiGS group.

The CLiGS textbox

This repository contains several text collections made available by the CLiGS junior research group (see http://cligs.hypotheses.org). As of May 2018, these are:

The text collections can be obtained by cloning the repository, by downloading the entire repository as a ZIP file, or by downloading individual text collections as ZIP files.

All texts are in the public domain. The markup and metadata we have added are provided under a CC-BY license (Creative Commons Attribution, see http://creativecommons.org/licenses/by/4.0/).

Each collection includes a description of the criteria of text selection, the available data formats, a citation suggestion, etc. More information about formal schemas (for example the TEI schema linked to from all the TEI files) can be found in the reference repository of CLiGS available at https://github.com/cligs/reference.

A collection of Python utility scripts is available for several tasks related to the textbox, for example automatically extracting metadata from the TEI files. This set of scripts is called the CLiGS toolbox and is available at https://github.com/cligs/toolbox. The toolbox reflects ongoing work and is not organized into official releases.
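
To give a rough idea of what extracting metadata from a TEI file involves, here is a minimal sketch using only the Python standard library. It is not code from the CLiGS toolbox. The function name `extract_metadata`, the sample document, and the assumption that title and author sit in `teiHeader/fileDesc/titleStmt` are illustrative only; the actual files may record more metadata and in other places.

```python
import xml.etree.ElementTree as ET

# The standard TEI namespace; TEI P5 documents declare it on the root element.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical minimal TEI document, for illustration only.
# It is not taken from any of the collections.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Example Novel</title>
        <author>Example Author</author>
      </titleStmt>
    </fileDesc>
  </teiHeader>
  <text><body><p>Example text.</p></body></text>
</TEI>
"""


def extract_metadata(tei_xml):
    """Return title and author from the titleStmt of a TEI document.

    Assumes both elements are direct children of
    teiHeader/fileDesc/titleStmt; a missing element yields None.
    """
    root = ET.fromstring(tei_xml)
    title_stmt = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt", TEI_NS)

    def text_of(tag):
        if title_stmt is None:
            return None
        element = title_stmt.find(f"tei:{tag}", TEI_NS)
        if element is None or not element.text:
            return None
        return element.text.strip()

    return {"title": text_of("title"), "author": text_of("author")}


if __name__ == "__main__":
    print(extract_metadata(SAMPLE_TEI))
```

To apply the same function to a file from a cloned copy of the repository, read the file contents first (for example with `pathlib.Path(path).read_bytes()`) and pass them to `extract_metadata`.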
