estnltk / Estnltk

Licence: gpl-2.0
Open source tools for Estonian natural language processing

EstNLTK -- Open source tools for Estonian natural language processing

EstNLTK provides common natural language processing functionality such as paragraph, sentence and word tokenization, morphological analysis, named entity recognition, etc. for the Estonian language.

The project is funded by EKT (Eesti Keeletehnoloogia Riiklik Programm).

Currently, there are two branches of EstNLTK:

  • version 1.6 -- the new branch, which is in beta and under development. Version 1.6.7beta is available from the Anaconda package repository. Because of the beta status, some of the tools are limited or incomplete. Supported Python versions are 3.5, 3.6 and 3.7. The source of the latest release is available at the branch version_1.6, and the development source can be found at devel_1.6.

  • version 1.4.1 -- the old branch, which contains the full functionality of the different analysis tools. It is available from the Anaconda package repository for Python 3.5. PyPI packages are also available for Python 3.4, 3.5 and 2.7. Python 3.6, 3.7 and later are not supported.
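
The version constraints above can be sketched as a small lookup. This is only an illustration: the table is transcribed from the two branch descriptions, and the name compatible_branches is hypothetical, not part of EstNLTK.

```python
import sys

# Python versions listed for each distribution in the branch descriptions above.
SUPPORTED = {
    "1.6": {(3, 5), (3, 6), (3, 7)},
    "1.4.1 (conda)": {(3, 5)},
    "1.4.1 (PyPI)": {(2, 7), (3, 4), (3, 5)},
}


def compatible_branches(version=None):
    """Return sorted names of the distributions that list this Python version.

    `version` is a (major, minor, ...) tuple; it defaults to the running interpreter.
    """
    if version is None:
        version = sys.version_info
    major_minor = (version[0], version[1])
    return sorted(
        name for name, versions in SUPPORTED.items() if major_minor in versions
    )


if __name__ == "__main__":
    # An empty list means no distribution lists the running interpreter.
    print(compatible_branches())
```

An empty result (for example on Python 3.8) suggests creating a separate environment with one of the listed versions before installing.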

Version 1.6

Installation

The recommended way of installing EstNLTK is to use the Anaconda Python distribution with Python 3.5+.

Installable packages have been built for osx, windows-64, and linux-64.

As some of EstNLTK's dependencies are not yet compatible with the newest version of Python (3.8), we recommend installing EstNLTK inside a conda environment that contains Python 3.7:

  1. create a conda environment with python 3.7, for instance:
conda create -n py37 python=3.7
  2. activate the environment, for instance:
conda activate py37
  3. install EstNLTK with the command:
conda install -c estnltk -c conda-forge estnltk=1.6.7b

If you are unable to use the Anaconda distribution, you can install EstNLTK with pip instead:

python -m pip install estnltk

This is slower, more error-prone and requires you to have the appropriate compilers for building the scientific computation packages for your platform.
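
After either installation route, it can help to confirm that the package is visible to the interpreter you are actually running, since a common mistake is installing into one conda environment and starting Python from another. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of EstNLTK.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("estnltk"):
        print("estnltk is available in this environment")
    else:
        print("estnltk was not found; check that the right environment is active")
```

The check only locates the package; it does not verify that its compiled dependencies load correctly.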

Note: some of the tools in EstNLTK also require Java to be installed on your system. We recommend Oracle Java (http://www.oracle.com/technetwork/java/javase/downloads/index.html), although alternatives such as OpenJDK (http://openjdk.java.net/) should also work.
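
To check in advance whether a Java launcher is reachable, a quick look at the PATH is usually enough. This sketch assumes the launcher is named java and is found through PATH; how EstNLTK itself locates Java is not stated here, and the helper name find_java is hypothetical.

```python
import shutil


def find_java(executable="java"):
    """Return the full path of the given launcher on PATH, or None if absent."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_java()
    if path is None:
        print("No java executable found on PATH")
    else:
        print("Java launcher found at {}".format(path))
```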

Using on Google Colab

You can install EstNLTK in a Google Colab environment with the command:

!pip install estnltk==1.6.7b0

Note: the PyPI package installed above has been created specifically for Colab. For other platforms and environments, please use our conda packages.

Documentation

Documentation for 1.6 currently comes in the form of Jupyter notebooks, which are available here: https://github.com/estnltk/estnltk/tree/version_1.6/tutorials

Note: if you have trouble viewing Jupyter notebooks on GitHub (a notebook fails to load with the error message "Sorry, something went wrong. Reload?"), try opening the notebooks through https://nbviewer.jupyter.org
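
As a convenience, a GitHub link can be rewritten into an nbviewer link. The sketch below assumes nbviewer's usual convention of prefixing the GitHub path with /github/; that mapping is an assumption rather than something stated in this document, and the function name to_nbviewer is hypothetical.

```python
GITHUB_PREFIX = "https://github.com/"
# Assumed nbviewer convention: the GitHub path is appended after /github/.
NBVIEWER_PREFIX = "https://nbviewer.jupyter.org/github/"


def to_nbviewer(github_url):
    """Rewrite a github.com URL into the corresponding nbviewer URL."""
    if not github_url.startswith(GITHUB_PREFIX):
        raise ValueError("not a github.com URL: {}".format(github_url))
    return NBVIEWER_PREFIX + github_url[len(GITHUB_PREFIX):]


if __name__ == "__main__":
    print(to_nbviewer("https://github.com/estnltk/estnltk/tree/version_1.6/tutorials"))
```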

Source

The source of the latest release is available at the branch version_1.6, and the development source can be found at devel_1.6.

Version 1.4.1

Installation

The recommended way of installing EstNLTK is to use the Anaconda Python distribution with Python 3.5.

We have installable packages built for osx, windows-64, and linux-64. Installation steps:

  1. create a conda environment with python 3.5, for instance:
conda create -n py35 python=3.5
  2. activate the environment, for instance:
conda activate py35
  3. install estnltk with the command:
conda install -c estnltk -c conda-forge nltk=3.4.4 estnltk=1.4.1

Note: some of the tools in EstNLTK also require Java to be installed on your system. We recommend Oracle Java (http://www.oracle.com/technetwork/java/javase/downloads/index.html), although alternatives such as OpenJDK (http://openjdk.java.net/) should also work.

If you have Jupyter Notebook installed, you can use EstNLTK in an interactive web application. To start it, type the command:

jupyter notebook

To run our tutorials, download them as a zip file, unpack them to a directory and run the command jupyter notebook in that directory.


If you are unable to use the Anaconda distribution, you can install EstNLTK with pip instead:

python -m pip install estnltk==1.4.1.1

This is slower, more error-prone and requires you to have the appropriate compilers for building the scientific computation packages for your platform.

Find more details in the installation tutorial for version 1.4.

Documentation

Release 1.4.1 documentation is available at https://estnltk.github.io/estnltk/1.4.1/index.html. For previous versions refer to https://estnltk.github.io/estnltk. For more tools see https://estnltk.github.io.

Additional educational materials on EstNLTK version 1.4 are available on the web pages of the NLP courses taught at the University of Tartu.

Source

The source of the latest v1.4 release is available at the master branch.

Citation

If you use EstNLTK 1.6 in your work, please cite us as follows:

@InProceedings{laur-EtAl:2020:LREC,
  author    = {Laur, Sven  and  Orasmaa, Siim  and  Särg, Dage  and  Tammo, Paul},
  title     = {EstNLTK 1.6: Remastered Estonian NLP Pipeline},
  booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference},
  month     = {May},
  year      = {2020},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {7154--7162},
  url       = {https://www.aclweb.org/anthology/2020.lrec-1.884}
}

If you use EstNLTK 1.4.1 (or older), please cite:

@InProceedings{ORASMAA16.332,
  author    = {Siim Orasmaa and Timo Petmanson and Alexander Tkachenko and Sven Laur and Heiki-Jaan Kaalep},
  title     = {EstNLTK - NLP Toolkit for Estonian},
  booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
  year      = {2016},
  month     = {may},
  date      = {23-28},
  location  = {Portorož, Slovenia},
  editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  address   = {Paris, France},
  isbn      = {978-2-9517408-9-1},
  language  = {english}
}