zencephalon / Tactful_Tokenizer

Licence: other
Accurate Bayesian sentence tokenizer in Ruby.


TactfulTokenizer

TactfulTokenizer is a Ruby library for high-quality sentence tokenization. It uses a Naive Bayes statistical model and is based on Splitta, but adds support for '?' and '!' as sentence terminators, as well as primitive handling of XHTML markup. Better support for XHTML parsing is planned.

It also supports tokenization of Unicode text.

Usage

require "tactful_tokenizer"

# Build the model once and reuse it for every call.
m = TactfulTokenizer::Model.new

# tokenize_text returns an array with one string per sentence.
m.tokenize_text("Here in the U.S. Senate we prefer to eat our friends. Is it easier that way? <em>Yes.</em> <em>Maybe</em>!")
#=> ["Here in the U.S. Senate we prefer to eat our friends.", "Is it easier that way?", "<em>Yes.</em>", "<em>Maybe</em>!"]

The input text is expected to consist of paragraphs delimited by line breaks.
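
Text that has been hard-wrapped by an editor does not meet this expectation, because its line breaks fall inside sentences rather than between paragraphs. The sketch below shows one way to normalize such text before tokenizing. It assumes that blank lines separate paragraphs in the source text. The helper name unwrap_paragraphs is hypothetical and is not part of tactful_tokenizer.

```ruby
# Hypothetical helper (not part of tactful_tokenizer): rewrites hard-wrapped
# text so that each paragraph occupies exactly one line.
# Assumption: paragraphs in the input are separated by one or more blank lines.
def unwrap_paragraphs(text)
  text
    .split(/\n\s*\n/)                     # split on blank lines (paragraph boundaries)
    .map { |para| para.split.join(" ") }  # collapse inner line breaks and extra whitespace
    .reject(&:empty?)                     # drop empty paragraphs
    .join("\n")                           # one paragraph per line
end

wrapped = "This sentence was wrapped\nby an editor. It continues here.\n\nA second paragraph."
puts unwrap_paragraphs(wrapped)
```

The resulting string can then be passed to tokenize_text as in the Usage section.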

Installation

gem install tactful_tokenizer

Author

Copyright © 2010 Matthew Bunday. All rights reserved. Released under the GNU GPL v3.
