
INL / OpenConvert

Licence: other
Text conversion tool (from e.g. Word, HTML, txt) to the corpus formats TEI or FoLiA

Programming languages: Java, XSLT

OpenConvert

The OpenConvert tools convert a number of input formats to TEI; plain text and FoLiA output are also supported.

Using the command line

The OpenConvert distribution can be accessed at https://github.com/INL/OpenConvert.

The command line can be used as follows:

java -jar OpenConvert.jar -from <input_format> -to <output_format> <input> <output>

Options:

  • -from: input format, one of text, TEI, alto, doc, docx, HTML

  • -to: output format, one of TEI, text, folia

Arguments:

  • input filename, directory name or zip archive name (ending with .zip)

  • output filename, directory name or zip archive name (ending with .zip)
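
As an illustration of how the options and arguments above fit together, here is a minimal Java sketch that assembles the documented command line and rejects format names that are not in the lists above. The class OpenConvertCommand and its build method are hypothetical helpers, not part of OpenConvert, and the file names are placeholders. Whether the tool itself treats format names case-sensitively is not stated here, so the sketch simply matches them exactly as listed.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of OpenConvert): assembles the documented command line.
class OpenConvertCommand {
    // Format names copied verbatim from the -from and -to option lists.
    static final Set<String> INPUT_FORMATS = Set.of("text", "TEI", "alto", "doc", "docx", "HTML");
    static final Set<String> OUTPUT_FORMATS = Set.of("TEI", "text", "folia");

    // Returns the argument vector for:
    //   java -jar OpenConvert.jar -from <from> -to <to> <input> <output>
    static List<String> build(String from, String to, String input, String output) {
        if (!INPUT_FORMATS.contains(from)) {
            throw new IllegalArgumentException("Unsupported input format: " + from);
        }
        if (!OUTPUT_FORMATS.contains(to)) {
            throw new IllegalArgumentException("Unsupported output format: " + to);
        }
        return List.of("java", "-jar", "OpenConvert.jar", "-from", from, "-to", to, input, output);
    }

    public static void main(String[] args) {
        // Placeholder names: a zip archive of .docx files converted to a zip archive of TEI files.
        List<String> cmd = build("docx", "TEI", "letters.zip", "letters-tei.zip");
        System.out.println(String.join(" ", cmd));
        // To actually run it (assumes OpenConvert.jar is in the working directory):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Building the command as a list of separate arguments rather than one string avoids quoting problems when file or directory names contain spaces.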

If the -from and -to flags are omitted, the conversion to apply is guessed from the file name extensions.
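
To make the idea of extension-based guessing concrete, here is a small Java sketch of how such a lookup could work. It is not OpenConvert's actual logic: the class FormatGuesser and the extension table are assumptions made for illustration only. Note that .xml is deliberately left unmapped in this sketch, since TEI, alto and FoLiA files are all XML and an extension alone cannot tell them apart; in such cases passing -from and -to explicitly is the safer choice.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; the extension table OpenConvert really uses is not documented here.
class FormatGuesser {
    // ASSUMED mapping from file extension to a -from format name.
    static final Map<String, String> BY_EXTENSION = Map.of(
        "txt", "text",
        "doc", "doc",
        "docx", "docx",
        "html", "HTML",
        "htm", "HTML");

    // Returns the guessed input format, or empty if the extension is missing or unknown.
    static Optional<String> guessInputFormat(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no extension to go on
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(BY_EXTENSION.get(ext));
    }

    public static void main(String[] args) {
        System.out.println(guessInputFormat("report.docx")); // known extension
        System.out.println(guessInputFormat("corpus.xml"));  // ambiguous, so no guess
    }
}
```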

NOTE: unfortunately, the default server setting points to an INT-internal address. You should be able to do your own test run with the following command line:

java -jar openconvert.client.jar -f text -t tei -a chn-tagger -s https://openconvert.ivdnt.org/openconvert/file test.txt