ai-ku / Morse.jl

Licence: MIT license
Paper: Morphological Analysis Using a Sequence Decoder

Morse

Morse is the morphological analysis model described in:

Akyürek, Ekin, Erenay Dayanık, and Deniz Yuret. "Morphological Analysis Using a Sequence Decoder." Transactions of the Association for Computational Linguistics 7 (2019): 567-579. (TACL, arXiv).

Dependencies

  • Julia 1.x
  • Network connection

Installation

   git clone https://github.com/ai-ku/Morse.jl
   cd Morse.jl

Note: The Setup and Data steps are optional, because running an experiment from the scripts directory automatically sets up the environment and downloads the required data when needed. However, if you are working on a cluster node that has no internet connection, you may need to perform these steps manually. To get the pkg> prompt in Julia for package commands, press the ']' key. Backspace returns to the original julia> prompt.

  • Setup (Optional)

   julia> # Press the `]` key to get the `pkg>` prompt
   (v1.1) pkg> activate .
   (Morse) pkg> instantiate # only needed the first time

  • Data (Optional)

   julia> using Morse
   julia> download(TRDataSet)
   julia> download(UDDataSet)
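
The Setup step can also be run non-interactively, for example from a job script prepared before moving to an offline node. The sketch below uses only the standard `Pkg` API (`Pkg.activate`, `Pkg.instantiate`). The helper name `setup_project` is hypothetical and not part of Morse.jl, and the check assumes the repository keeps its dependencies in a `Project.toml` at its root.

```julia
using Pkg

# Hypothetical helper (not part of Morse.jl): activate the project found at
# `repo_path` and install the dependencies recorded for it.
function setup_project(repo_path::AbstractString)
    # Assumption: the repository root contains a Project.toml.
    isfile(joinpath(repo_path, "Project.toml")) ||
        error("no Project.toml found in $repo_path")
    Pkg.activate(repo_path)   # same effect as `pkg> activate .` inside the repo
    Pkg.instantiate()         # same effect as `pkg> instantiate`
end

# Example call, to be run on a machine with network access:
# setup_project("Morse.jl")
```

The data downloads shown above still have to be run once while a network connection is available.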

Experiments

To verify the results presented in the paper, you can run the scripts to train the models and ablations. During training, logs are written to the logs/ folder.

Detailed information about the experiments can be found in scripts/.

Note: An Nvidia GPU is required to train the models in a reasonable amount of time.

Tagging

Available Pre-Trained Models

   trained(MorseModel, TRDataSet);            # Turkish
   trained(MorseModel, UDDataSet, lang="ru"); # Russian
   trained(MorseModel, UDDataSet, lang="da"); # Danish
   trained(MorseModel, UDDataSet, lang="fi"); # Finnish
   trained(MorseModel, UDDataSet, lang="pt"); # Portuguese
   trained(MorseModel, UDDataSet, lang="es"); # Spanish
   trained(MorseModel, UDDataSet, lang="hu"); # Hungarian
   trained(MorseModel, UDDataSet, lang="bg"); # Bulgarian
   trained(MorseModel, UDDataSet, lang="sv"); # Swedish

How To Use

Note: Please use lowercased and tokenized inputs.
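
As a rough illustration of what "lowercased and tokenized" can mean, the sketch below lowercases a sentence and separates punctuation marks into their own tokens. `normalize_input` is a hypothetical helper, not part of Morse.jl, and the tokenization the pre-trained models actually expect may differ (for example around apostrophes or hyphens, which this sketch also splits off).

```julia
# Hypothetical helper (not part of Morse.jl): lowercase the text and put
# spaces around punctuation so that each mark becomes its own token.
function normalize_input(text::AbstractString)
    # Surround every punctuation character with spaces.
    spaced = replace(text, r"([[:punct:]])" => s" \1 ")
    # Lowercase, then collapse runs of whitespace into single spaces.
    return join(split(lowercase(spaced)), " ")
end

normalize_input("Annem sana yardım edemez.")  # → "annem sana yardım edemez ."
```

Julia's `lowercase` is not locale-aware, so for Turkish text an uppercase "I" becomes "i" rather than "ı"; such cases may need separate handling.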

   julia> using Knet, KnetLayers, Morse
   julia> model, vocabulary, parser = trained(MorseModel, TRDataSet);
   julia> predictions = model("annem sana yardım edemez .", v=vocabulary, p=parser)
   annem anne+Noun+A3sg+P1sg+Nom
   sana sen+Pron+Pers+A2sg+Pnon+Dat
   yardım yardım+Noun+A3sg+Pnon+Nom
   edemez et+Verb^DB+Verb+Able+Neg+Aor+A3sg
   . .+Punct
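
Each output line above pairs a word with an analysis of the form lemma+Tag+Tag+.... If the analyses are needed in structured form, a line can be split as sketched below. `parse_analysis` is a hypothetical helper, not part of Morse.jl; it assumes the line format shown above, and the return type of `model(...)` itself is not documented here.

```julia
# Hypothetical helper (not part of Morse.jl): split one printed line such as
# "annem anne+Noun+A3sg+P1sg+Nom" into the word, its lemma, and its tags.
function parse_analysis(line::AbstractString)
    # The first space separates the surface word from its analysis.
    word, analysis = split(strip(line), ' '; limit=2)
    # The first '+'-separated field is the lemma, the rest are tags.
    parts = split(analysis, '+')
    return (word=String(word), lemma=String(parts[1]), tags=String.(parts[2:end]))
end

r = parse_analysis("sana sen+Pron+Pers+A2sg+Pnon+Dat")
# r.word == "sana", r.lemma == "sen", r.tags == ["Pron", "Pers", "A2sg", "Pnon", "Dat"]
```

In this sketch the `^DB` marker in an analysis such as `et+Verb^DB+Verb+...` stays attached to the preceding tag, and a lemma that itself contains `+` would be split incorrectly.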

Customized Training

Note: An Nvidia GPU is required to train the models in a reasonable amount of time.

   julia> using Knet, KnetLayers, Morse
   julia> config = Morse.intro(split("--logFile nothing --lemma --dataSet TRDataSet")) # you can modify the program arguments
   julia> dataFiles = ["train.txt", "test.txt"] # make sure these files exist at the given paths
   julia> data, vocab, parser = prepareData(dataFiles,TRDataSet) # or UDDataSet
   julia> data = miniBatch(data,vocab) # sentence minibatching is required for processing a sentence correctly
   julia> model = MorseModel(config,vocab)
   julia> setoptim!(model, SGD(;lr=1.6,gclip=60.0))
   julia> trainmodel!(model,data,config,vocab,parser) # can take hours or more, depending on your data
   julia> predictions = model("annem sana yardım edemez .", v=vocab, p=parser)