LanguageMachines / frog

License: GPL-3.0
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Frog - A Tagger-Lemmatizer-Morphological-Analyzer-Dependency-Parser for Dutch

Copyright 2006-2020
Ko van der Sloot, Maarten van Gompel, Antal van den Bosch, Bertjan Busser

Centre for Language and Speech Technology, Radboud University Nijmegen
Induction of Linguistic Knowledge Research Group, Tilburg University
KNAW Humanities Cluster

Website: https://languagemachines.github.io/frog

Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package. Most modules were created in the 1990s at the ILK Research Group (Tilburg University, the Netherlands) and the CLiPS Research Centre (University of Antwerp, Belgium). Over the years they have been integrated into a single text processing tool, which is currently maintained and developed by the Language Machines Research Group and the Centre for Language and Speech Technology at Radboud University Nijmegen. A dependency parser, a base phrase chunker, and a named-entity recognizer module were added more recently. Where possible, Frog makes use of multi-processor support to run subtasks in parallel. Frog offers a command-line interface (that can also run as a daemon) and a C++ library.

Various (re)programming rounds have been made possible through funding by NWO, the Netherlands Organisation for Scientific Research, particularly under the CGN project, the IMIX programme, the Implicit Linguistics project, the CLARIN-NL programme and the CLARIAH programme.

License

Frog is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version (see the file COPYING).

Frog is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Comments and bug reports are welcome at our issue tracker or by mailing lamasoftware (at) science.ru.nl. Updates and more information can be found at https://languagemachines.github.io/frog .

Installation

To install Frog, first check whether your distribution's package manager has an up-to-date package:

  • Alpine Linux users can do apk add frog.
  • Debian/Ubuntu users can do apt install frog, but this version is likely to be significantly out of date.
  • Arch Linux users can install Frog via the AUR.
  • macOS users with homebrew can do: brew tap fbkarsdorp/homebrew-lamachine && brew install frog
  • An OCI container image is also available and can be used with Docker: docker pull proycon/frog. Alternatively, you can build an OCI container image yourself using the provided Dockerfile in this repository.

To compile and install manually from source instead, do the following:

$ bash bootstrap.sh
$ ./configure
$ make
$ make install

and optionally:

$ make check

If you want to download and install the latest stable versions of the required dependencies automatically, run ./build-deps.sh before the steps above. You can pass a target directory prefix as the first argument, and you may need to prepend sudo to be able to install there. The dependencies are:

You still need to install the following third-party dependencies through your distribution's package manager, as our script does not provide them:

  • icu - A C++ library for Unicode and Globalization support. On Debian/Ubuntu systems, install the package libicu-dev.
  • libxml2 - An XML library. On Debian/Ubuntu systems, install the package libxml2-dev.
  • libexttextcat - A language detection package.
  • A sane build environment with a C++ compiler (e.g. gcc 4.9 or above or clang), make, autotools, libtool, pkg-config
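The manual build steps above can be scripted. The following is a minimal Python sketch, not part of Frog itself: the function names build_steps and run_build and their parameters are hypothetical, and only the commands themselves (build-deps.sh with an optional prefix, bootstrap.sh, configure, make, make install, make check) come from this README. It prints the steps by default and only executes them when asked to.

```python
import subprocess


def build_steps(fetch_deps=False, deps_prefix=None, run_check=False):
    """Return the build commands from this README, in order, as argv lists."""
    steps = []
    if fetch_deps:
        # build-deps.sh accepts an optional target directory prefix
        # as its first argument.
        deps = ["./build-deps.sh"]
        if deps_prefix:
            deps.append(deps_prefix)
        steps.append(deps)
    steps += [
        ["bash", "bootstrap.sh"],
        ["./configure"],
        ["make"],
        ["make", "install"],
    ]
    if run_check:
        # Optional test run, listed after installation as in the README.
        steps.append(["make", "check"])
    return steps


def run_build(steps, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for step in steps:
        print("$", " ".join(step))
        if not dry_run:
            # check=True aborts at the first failing step.
            subprocess.run(step, check=True)


if __name__ == "__main__":
    # Dry run: shows what would be executed from the repository root.
    run_build(build_steps(fetch_deps=True, run_check=True))
```

Call run_build(..., dry_run=False) from the root of the source tree to actually run the build. Installing into a system location may still require elevated privileges, which this sketch does not handle.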

This software has been tested on:

  • Intel platforms running several versions of Linux, including Ubuntu, Debian, Arch Linux, Fedora (both 32 and 64 bits)
  • Apple platforms running macOS

Contents of this distribution:

  • Sources
  • Licensing information ( COPYING )
  • Installation instructions ( INSTALL )
  • Build system based on GNU Autotools
  • Container build file ( Dockerfile )
  • Example data files ( in the demos directory )
  • Documentation ( in the docs directory and on https://frognlp.readthedocs.io )

Usage

Run frog --help for basic usage instructions.

Documentation

The Frog documentation can be found on https://frognlp.readthedocs.io

Container Usage

A pre-made container image can be obtained from Docker Hub as follows:

docker pull proycon/frog

You can also build a container image yourself. Make sure you are in the root of this repository, then run:

docker build -t proycon/frog .

This builds the latest stable release. If you want to use the latest development version from the git repository instead, do:

docker build -t proycon/frog --build-arg VERSION=development .

Run the Frog container interactively as follows. You can pass any additional arguments that frog takes.

docker run -t -i proycon/frog

Add the -v /path/to/your/data:/data parameter if you want to mount your data volume into the container at /data.
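When the container is started from a script, it can help to assemble the command in one place. The following is a minimal Python sketch under stated assumptions: frog_docker_command and its parameters are hypothetical helper names, not part of Frog or Docker. Only the image name proycon/frog, the -t -i flags, the /data mount point and the --help argument are taken from this README.

```python
import os
import shlex


def frog_docker_command(data_dir=None, frog_args=(), image="proycon/frog"):
    """Build the argv list for running Frog in a container.

    data_dir:  optional host directory to mount at /data in the container.
    frog_args: extra arguments passed through to frog unchanged.
    """
    cmd = ["docker", "run", "-t", "-i"]
    if data_dir is not None:
        # Use an absolute host path for the bind mount.
        host_path = os.path.abspath(os.path.expanduser(data_dir))
        cmd += ["-v", f"{host_path}:/data"]
    cmd.append(image)
    # Everything after the image name is handed to frog.
    cmd += list(frog_args)
    return cmd


if __name__ == "__main__":
    # Print the command for inspection instead of running it.
    print(shlex.join(frog_docker_command("/path/to/your/data", ["--help"])))
```

To execute the command rather than print it, pass the returned list to subprocess.run. Files in the mounted directory are then visible to frog under /data inside the container.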

Python Binding

If you want to use Frog from Python, please see https://github.com/proycon/python-frog for the Python binding. It is not included in this repository.

Webservice

If you are looking to run Frog as a webservice yourself, please see https://github.com/proycon/frog_webservice . It is not included in this repository.

Credits

Many thanks go out to the people who made the development of the Frog components possible: Walter Daelemans, Jakub Zavrel, Ko van der Sloot, Sabine Buchholz, Sander Canisius, Gert Durieux, Peter Berck and Maarten van Gompel.

Thanks to Erik Tjong Kim Sang and Lieve Macken for stress-testing the first versions of Tadpole, the predecessor of Frog.