mideind / GreynirPackage

The Greynir NLP parser for Icelandic, packaged for PyPI

A fast, efficient natural language processor for Icelandic

Overview

Greynir is a Python 3 (>= 3.6) package, published by Miðeind ehf., for working with Icelandic natural language text. Greynir can parse text into sentence trees, find lemmas, inflect noun phrases, assign part-of-speech tags and much more.

Greynir's sentence trees can inter alia be used to extract information from text, for instance about people, titles, entities, facts, actions and opinions.

Full documentation for Greynir is available here.

Greynir is the engine of Greynir.is, a natural-language front end for a database of over 10 million sentences parsed from Icelandic news articles, and Embla, a voice-driven virtual assistant app for smart devices such as iOS and Android phones.

Greynir includes a hand-written context-free grammar for the Icelandic language, consisting of over 7,000 lines of grammatical productions in extended Backus-Naur form (EBNF). Its fast C++ parser core is able to cope with long and ambiguous sentences, using an Earley-type parser as enhanced by Scott and Johnstone.
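To give a feel for what an Earley-type parser does, here is a minimal sketch of a textbook Earley recognizer in pure Python. It is not Greynir's C++ implementation and does not include the Scott and Johnstone enhancements. The toy grammar, the `recognize` function and the word-category tokens (`noun`, `verb`, `prep`) are all hypothetical, and the sketch assumes a grammar without empty productions.

```python
from typing import Dict, List, Set, Tuple

# Toy grammar (hypothetical, not Greynir's): nonterminal -> right-hand sides.
# Symbols that are not keys of GRAMMAR are terminals (word categories).
GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("NP", "VP")],
    "NP": [("noun",), ("NP", "PP")],
    "VP": [("verb", "NP"), ("VP", "PP")],
    "PP": [("prep", "NP")],
}

# An Earley item: (left-hand side, right-hand side, dot position, origin)
Item = Tuple[str, Tuple[str, ...], int, int]


def recognize(tokens: List[str], start: str = "S") -> bool:
    """Return True if the token sequence can be derived from `start`.

    Assumes the grammar has no empty productions.
    """
    chart: List[Set[Item]] = [set() for _ in range(len(tokens) + 1)]
    for rhs in GRAMMAR[start]:
        chart[0].add((start, rhs, 0, 0))

    for i in range(len(tokens) + 1):
        worklist = list(chart[i])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot < len(rhs):
                symbol = rhs[dot]
                if symbol in GRAMMAR:
                    # Predict: expand the nonterminal after the dot
                    for production in GRAMMAR[symbol]:
                        new = (symbol, production, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            worklist.append(new)
                elif i < len(tokens) and tokens[i] == symbol:
                    # Scan: the next token matches the terminal after the dot
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance every item that was waiting for `lhs`
                for p_lhs, p_rhs, p_dot, p_origin in list(chart[origin]):
                    if p_dot < len(p_rhs) and p_rhs[p_dot] == lhs:
                        new = (p_lhs, p_rhs, p_dot + 1, p_origin)
                        if new not in chart[i]:
                            chart[i].add(new)
                            worklist.append(new)

    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for lhs, rhs, dot, origin in chart[len(tokens)]
    )


# "Ása sá sól." reduced to word categories: noun, verb, noun
print(recognize(["noun", "verb", "noun"]))
```

The left-recursive rules (`NP -> NP PP`, `VP -> VP PP`) make a sequence such as noun, verb, noun, prep, noun ambiguous; the chart handles both without special treatment, which is the property that makes this family of parsers suitable for natural language.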

Greynir employs the Tokenizer package, by the same authors, to tokenize text, and uses BinPackage as its database of Icelandic vocabulary and morphology.

Examples

Use Greynir to easily inflect noun phrases:

from reynir import NounPhrase as Nl

# Create a NounPhrase ('nafnliður') object
karfa = Nl("þrír lúxus-miðar á Star Wars og tveir brimsaltir pokar af poppi")

# Print the NounPhrase in the correct case for each context
# (þf=þolfall/accusative, þgf=þágufall/dative). Note that
# the NounPhrase class implements __format__(), allowing you
# to use the case as a format specification, for instance in f-strings.

print(f"Þú keyptir {karfa:þf}.")
print(f"Hér er kvittunin þín fyrir {karfa:þgf}.")

The program outputs the following text, correctly inflected:

Þú keyptir þrjá lúxus-miða á Star Wars og tvo brimsalta poka af poppi.
Hér er kvittunin þín fyrir þremur lúxus-miðum á Star Wars og tveimur brimsöltum pokum af poppi.
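The format-specification mechanism mentioned in the comments above is a general Python feature. The following sketch shows how a class can accept a case abbreviation as a format spec. `ToyPhrase` is a hypothetical stand-in with hard-coded forms taken from the example above; it does not inflect anything itself and is not how Greynir's NounPhrase is implemented.

```python
class ToyPhrase:
    """Hypothetical stand-in for illustration; not Greynir's NounPhrase."""

    def __init__(self, forms):
        # Maps a case abbreviation (nf, þf, þgf) to a pre-inflected text
        self._forms = forms

    def __format__(self, spec):
        # An empty spec, as in f"{phrase}", falls back to the nominative
        if not spec:
            return self._forms["nf"]
        try:
            return self._forms[spec]
        except KeyError:
            raise ValueError(f"Unknown case specifier: {spec!r}") from None


poki = ToyPhrase(
    {
        "nf": "tveir brimsaltir pokar af poppi",
        "þf": "tvo brimsalta poka af poppi",
        "þgf": "tveimur brimsöltum pokum af poppi",
    }
)

print(f"Þú keyptir {poki:þf}.")
print(f"Hér er kvittunin þín fyrir {poki:þgf}.")
```

Python passes whatever follows the colon in the replacement field to `__format__()` as a string, so the case abbreviation reaches the object unchanged.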

Use Greynir to parse a sentence:

>>> from reynir import Greynir
>>> g = Greynir()
>>> sent = g.parse_single("Ása sá sól.")
>>> print(sent.tree.view)
P                               # Root
+-S-MAIN                        # Main sentence
  +-IP                          # Inflected phrase
    +-NP-SUBJ                   # Noun phrase, subject
      +-no_et_nf_kvk: 'Ása'     # Noun, singular, nominative, feminine
    +-VP                        # Verb phrase containing arguments
      +-VP                      # Verb phrase containing verb
        +-so_1_þf_et_p3: 'sá'   # Verb, 1 accusative arg, singular, 3rd p
      +-NP-OBJ                  # Noun phrase, object
        +-no_et_þf_kvk: 'sól'   # Noun, singular, accusative, feminine
+-'.'                           # Punctuation
>>> sent.tree.nouns
['Ása', 'sól']
>>> sent.tree.verbs
['sjá']
>>> sent.tree.flat
'P S-MAIN IP NP-SUBJ no_et_nf_kvk /NP-SUBJ VP so_1_þf_et_p3
    NP-OBJ no_et_þf_kvk /NP-OBJ /VP /IP /S-MAIN p /P'
>>> # The subject noun phrase (S.IP.NP also works)
>>> sent.tree.S.IP.NP_SUBJ.lemmas
['Ása']
>>> # The verb phrase
>>> sent.tree.S.IP.VP.lemmas
['sjá', 'sól']
>>> # The object within the verb phrase (S.IP.VP.NP also works)
>>> sent.tree.S.IP.VP.NP_OBJ.lemmas
['sól']
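The flat string shown above can be turned back into a nested structure with a small stack-based reader. This is a sketch that needs only the standard library; `parse_flat` and `terminals` are hypothetical helper names, not part of Greynir's API. It assumes, based on the example output alone, that nonterminals start with an uppercase letter, that `/X` closes nonterminal `X`, and that every other token is a terminal.

```python
def parse_flat(flat):
    """Convert a flat tree string into nested (label, children) tuples.

    Terminals are kept as plain strings. Assumes (from the example above)
    that nonterminal names start with an uppercase letter.
    """
    top_level = []
    stack = [("", top_level)]  # sentinel entry holding the root node
    for token in flat.split():
        if token.startswith("/"):
            # Closing marker: must match the innermost open nonterminal
            if len(stack) < 2 or stack[-1][0] != token[1:]:
                raise ValueError(f"Unexpected closing marker: {token}")
            stack.pop()
        elif token[0].isupper():
            # Opening nonterminal: attach to its parent, then descend
            node = (token, [])
            stack[-1][1].append(node)
            stack.append(node)
        else:
            # Terminal: attach to the innermost open nonterminal
            stack[-1][1].append(token)
    if len(stack) != 1 or len(top_level) != 1:
        raise ValueError("Unbalanced flat tree string")
    return top_level[0]


def terminals(node):
    """Return the terminal names of a parsed tree, left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node[1]:
        result.extend(terminals(child))
    return result


flat = (
    "P S-MAIN IP NP-SUBJ no_et_nf_kvk /NP-SUBJ VP so_1_þf_et_p3 "
    "NP-OBJ no_et_þf_kvk /NP-OBJ /VP /IP /S-MAIN p /P"
)
tree = parse_flat(flat)
print(terminals(tree))
```

When Greynir itself is available, the tree object returned by the parser is the better interface; a reader like this is mainly useful for flat strings that have been stored or logged elsewhere.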

Prerequisites

This package runs on CPython 3.6 or newer, and on PyPy 3.6 or newer.

To find out which version of Python you have, enter:

$ python --version
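The same check can be made from within Python, which is handy when several interpreters are installed and it is unclear which one a script runs under. This is a small sketch; `meets_minimum` is a hypothetical helper name, and the minimum of 3.6 is taken from the requirement stated above.

```python
import sys

# Minimum version stated in the prerequisites above
MINIMUM = (3, 6)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


print("Running Python", ".".join(str(part) for part in sys.version_info[:3]))
print("Meets the 3.6 minimum:", meets_minimum())
```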

If a binary wheel isn't available on PyPI for your system, you may need to install the python3-dev package (or its Windows equivalent) to set up Greynir successfully. This is because installing from a source distribution requires a C++ compiler and linker:

$ # Debian or Ubuntu:
$ sudo apt-get install python3-dev

Depending on your system, you may also need to install libffi-dev:

$ # Debian or Ubuntu
$ sudo apt-get install libffi-dev

Installation

To install this package, assuming Python 3 is your default Python:

$ pip install reynir

If you have git installed and want to be able to edit the source, do the following:

$ git clone https://github.com/mideind/GreynirPackage
$ cd GreynirPackage
$ # [ Activate your virtualenv here if you have one ]
$ pip install -e .

The package source code is now in GreynirPackage/src/reynir.
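After either installation route, one way to confirm that the package is visible to the interpreter you are using is to look it up without importing it. This is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, while `reynir` is the import name used in the examples above.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None


print("reynir importable:", is_installed("reynir"))
```

If this prints False inside a virtualenv, the package was most likely installed into a different environment than the one currently active.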

Tests

To run the built-in tests, install pytest, cd to your GreynirPackage subdirectory (and optionally activate your virtualenv), then run:

$ python -m pytest

Evaluation

A parsing test pipeline for different parsing schemas, including the Greynir schema, has been developed. It is available here.

Documentation

Please consult Greynir's documentation for detailed installation instructions, a quickstart guide, and reference information, as well as important information about copyright and licensing.

Copyright and licensing

Greynir is copyright © 2021 by Miðeind ehf. The original author of this software is Vilhjálmur Þorsteinsson.

This software is licensed under the MIT License:

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

If you would like to use this software in ways that are incompatible with the standard MIT license, contact Miðeind ehf. to negotiate custom arrangements.


GreynirPackage indirectly embeds the Database of Icelandic Morphology (Beygingarlýsing íslensks nútímamáls), abbreviated BÍN. GreynirPackage does not claim any endorsement by the BÍN authors or copyright holders.

The BÍN source data are publicly available under the CC BY-SA 4.0 license, as further detailed here in English and here in Icelandic.

In accordance with the BÍN license terms, credit is hereby given as follows:

Beygingarlýsing íslensks nútímamáls. Stofnun Árna Magnússonar í íslenskum fræðum. Höfundur og ritstjóri Kristín Bjarnadóttir.
