All Projects → yohasebe → Engtagger

yohasebe / Engtagger

Licence: gpl-2.0
English Part-of-Speech Tagger Library; a Ruby port of Lingua::EN::Tagger

Programming Languages

ruby
36898 projects - #4 most used programming language

Projects that are alternatives of or similar to Engtagger

Make Novice
Automation and Make
Stars: ✭ 122 (-43.78%)
Mutual labels:  english
Parse English
English (natural language) parser
Stars: ✭ 137 (-36.87%)
Mutual labels:  english
Vntk
Vietnamese NLP Toolkit for Node
Stars: ✭ 170 (-21.66%)
Mutual labels:  pos-tagging
Dotnetbook
.NET Platform Architecture book (English, Chinese, Russian)
Stars: ✭ 1,763 (+712.44%)
Mutual labels:  english
Pluralize
Pluralize or singularize any word based on a count
Stars: ✭ 1,808 (+733.18%)
Mutual labels:  english
Jptdp
Neural network models for joint POS tagging and dependency parsing (CoNLL 2017-2018)
Stars: ✭ 146 (-32.72%)
Mutual labels:  pos-tagging
Novels.org
Novels.org - Your Novels in Plain Text (Emacs . org-mode)
Stars: ✭ 120 (-44.7%)
Mutual labels:  english
Sudachipy
Python version of Sudachi, a Japanese tokenizer.
Stars: ✭ 207 (-4.61%)
Mutual labels:  pos-tagging
Workshop Template
The Carpentries Workshop Template
Stars: ✭ 137 (-36.87%)
Mutual labels:  english
Udpipe
R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Stars: ✭ 160 (-26.27%)
Mutual labels:  pos-tagging
Rdrpostagger
A fast and accurate POS and morphological tagging toolkit (EACL 2014)
Stars: ✭ 126 (-41.94%)
Mutual labels:  pos-tagging
R Novice Gapminder
R for Reproducible Scientific Analysis
Stars: ✭ 127 (-41.47%)
Mutual labels:  english
3000
Excel版再要你命3000
Stars: ✭ 158 (-27.19%)
Mutual labels:  english
Roenglishre
An unofficial english translation project for Korea Ragnarok Online (kRO).
Stars: ✭ 121 (-44.24%)
Mutual labels:  english
Python Novice Inflammation
Programming with Python
Stars: ✭ 199 (-8.29%)
Mutual labels:  english
Gse
Go efficient multilingual NLP and text segmentation; support english, chinese, japanese and other. Go 高性能多语言 NLP 和分词
Stars: ✭ 1,695 (+681.11%)
Mutual labels:  english
Indonesian Nlp Resources
data resource untuk NLP bahasa indonesia
Stars: ✭ 143 (-34.1%)
Mutual labels:  pos-tagging
Movies2anki
Convert movies with subtitles to watch them with Anki. Inspired by subs2srs
Stars: ✭ 211 (-2.76%)
Mutual labels:  english
Monpa
MONPA 罔拍是一個提供正體中文斷詞、詞性標註以及命名實體辨識的多任務模型
Stars: ✭ 203 (-6.45%)
Mutual labels:  pos-tagging
Postcss Spiffing
PostCSS plugin to use British English
Stars: ✭ 158 (-27.19%)
Mutual labels:  english

EngTagger

English Part-of-Speech Tagger Library; a Ruby port of Lingua::EN::Tagger

Description

A Ruby port of Perl Lingua::EN::Tagger, a probability based, corpus-trained tagger that assigns POS tags to English text based on a lookup dictionary and a set of probability values. The tagger assigns appropriate tags based on conditional probabilities--it examines the preceding tag to determine the appropriate tag for the current word. Unknown words are classified according to word morphology or can be set to be treated as nouns or other parts of speech.
The tagger also extracts as many nouns and noun phrases as it can, using a set of regular expressions.

Features

  • Assigns POS tags to English text
  • Extract noun phrases from tagged text
  • etc.

Synopsis:

require 'rubygems'
require 'engtagger'

# Create a parser object
tgr = EngTagger.new

# Sample text
text = "Alice chased the big fat cat."

# Add part-of-speech tags to text
tagged = tgr.add_tags(text)

#=> "<nnp>Alice</nnp> <vbd>chased</vbd> <det>the</det> <jj>big</jj> <jj>fat</jj><nn>cat</nn> <pp>.</pp>"

# Get a list of all nouns and noun phrases with occurrence counts
word_list = tgr.get_words(text)

#=> {"Alice"=>1, "cat"=>1, "fat cat"=>1, "big fat cat"=>1}

# Get a readable version of the tagged text
readable = tgr.get_readable(text)

#=> "Alice/NNP chased/VBD the/DET big/JJ fat/JJ cat/NN ./PP"

# Get all nouns from a tagged output
nouns = tgr.get_nouns(tagged)

#=> {"cat"=>1, "Alice"=>1}

# Get all proper nouns
proper = tgr.get_proper_nouns(tagged)

#=> {"Alice"=>1}

# Get all past tense verbs
pt_verbs = tgr.get_past_tense_verbs(tagged)

#=> {"chased"=>1}

# Get all the adjectives
adj = tgr.get_adjectives(tagged)

#=> {"big"=>1, "fat"=>1}

# Get all noun phrases of any syntactic level
# (same as word_list but take a tagged input)
nps = tgr.get_noun_phrases(tagged)

#=> {"Alice"=>1, "cat"=>1, "fat cat"=>1, "big fat cat"=>1}

Tag Set

The set of POS tags used here is a modified version of the Penn Treebank tagset. Tags with non-letter characters have been redefined to work better in our data structures. Also, the "Determiner" tag (DET) has been changed from 'DT', in order to avoid confusion with the HTML tag, <DT>.

CC      Conjunction, coordinating               and, or
CD      Adjective, cardinal number              3, fifteen
DET     Determiner                              this, each, some
EX      Pronoun, existential there              there
FW      Foreign words           
IN      Preposition / Conjunction               for, of, although, that
JJ      Adjective                               happy, bad
JJR     Adjective, comparative                  happier, worse
JJS     Adjective, superlative                  happiest, worst
LS      Symbol, list item                       A, A.
MD      Verb, modal                             can, could, 'll
NN      Noun                                    aircraft, data
NNP     Noun, proper                            London, Michael
NNPS    Noun, proper, plural                    Australians, Methodists
NNS     Noun, plural                            women, books
PDT     Determiner, prequalifier                quite, all, half
POS     Possessive                              's, '
PRP     Determiner, possessive second           mine, yours
PRPS    Determiner, possessive                  their, your
RB      Adverb                                  often, not, very, here
RBR     Adverb, comparative                     faster
RBS     Adverb, superlative                     fastest
RP      Adverb, particle                        up, off, out
SYM     Symbol                                  *
TO      Preposition                             to
UH      Interjection                            oh, yes, mmm
VB      Verb, infinitive                        take, live
VBD     Verb, past tense                        took, lived
VBG     Verb, gerund                            taking, living
VBN     Verb, past/passive participle           taken, lived
VBP     Verb, base present form                 take, live
VBZ     Verb, present 3SG -s form               takes, lives
WDT     Determiner, question                    which, whatever
WP      Pronoun, question                       who, whoever
WPS     Determiner, possessive & question       whose
WRB     Adverb, question                        when, how, however

PP      Punctuation, sentence ender             ., !, ?
PPC     Punctuation, comma                      ,
PPD     Punctuation, dollar sign                $
PPL     Punctuation, quotation mark left        ``
PPR     Punctuation, quotation mark right       ''
PPS     Punctuation, colon, semicolon, elipsis  :, ..., -
LRB     Punctuation, left bracket               (, {, [
RRB     Punctuation, right bracket              ), }, ]

Requirements

Install

(sudo) gem install engtagger

Author

of this Ruby library

  • Yoichiro Hasebe (yohasebe [at] gmail.com)

Contributors

  • Carlos Ramirez III
  • Phil London
  • Bazay (Baron Bloomer)

Acknowledgement

This Ruby library is a direct port of Lingua::EN::Tagger available at CPAN. The credit for the crucial part of its algorithm/design therefore goes to Aaron Coburn, the author of the original Perl version.

License

This library is distributed under the GPL. Please see the LICENSE file.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].