mbejda / Node Opennlp

Licence: MIT
Apache OpenNLP wrapper for Node.js

Projects that are alternatives of or similar to Node Opennlp

Chatbot ner
chatbot_ner: Named Entity Recognition for chatbots.
Stars: ✭ 273 (+396.36%)
Mutual labels:  nlp-library
Sudachi
A Japanese Tokenizer for Business
Stars: ✭ 496 (+801.82%)
Mutual labels:  nlp-library
Atr4s
Toolkit with state-of-the-art Automatic Terms Recognition methods in Scala
Stars: ✭ 23 (-58.18%)
Mutual labels:  nlp-library
Giveme5w1h
Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?
Stars: ✭ 316 (+474.55%)
Mutual labels:  nlp-library
Ekphrasis
Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (english Wikipedia, twitter - 330mil english tweets).
Stars: ✭ 433 (+687.27%)
Mutual labels:  nlp-library
Pythainlp
Thai Natural Language Processing in Python.
Stars: ✭ 582 (+958.18%)
Mutual labels:  nlp-library
NLP-tools
Useful python NLP tools (evaluation, GUI interface, tokenization)
Stars: ✭ 39 (-29.09%)
Mutual labels:  nlp-library
Simplenetnlp
.NET NLP library
Stars: ✭ 38 (-30.91%)
Mutual labels:  nlp-library
Spacy
💫 Industrial-strength Natural Language Processing (NLP) in Python
Stars: ✭ 21,978 (+39860%)
Mutual labels:  nlp-library
Underthesea
Underthesea - Vietnamese NLP Toolkit
Stars: ✭ 823 (+1396.36%)
Mutual labels:  nlp-library
Contextualized Topic Models
A python package to run contextualized topic modeling. CTMs combine BERT with topic models to get coherent topics. Also supports multilingual tasks. Cross-lingual Zero-shot model published at EACL 2021.
Stars: ✭ 318 (+478.18%)
Mutual labels:  nlp-library
Pynlpl
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language model. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Stars: ✭ 426 (+674.55%)
Mutual labels:  nlp-library
Janome
Japanese morphological analysis engine written in pure Python
Stars: ✭ 630 (+1045.45%)
Mutual labels:  nlp-library
Quick Nlp
Pytorch NLP library based on FastAI
Stars: ✭ 279 (+407.27%)
Mutual labels:  nlp-library
Natas
Python 3 library for processing historical English
Stars: ✭ 28 (-49.09%)
Mutual labels:  nlp-library
Nagisa
A Japanese tokenizer based on recurrent neural networks
Stars: ✭ 260 (+372.73%)
Mutual labels:  nlp-library
Kagome
Self-contained Japanese Morphological Analyzer written in pure Go
Stars: ✭ 554 (+907.27%)
Mutual labels:  nlp-library
Tika Python
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Stars: ✭ 997 (+1712.73%)
Mutual labels:  nlp-library
Sentiment Analyser
ML that can extract german and english sentiment
Stars: ✭ 35 (-36.36%)
Mutual labels:  nlp-library
Kuromoji
Kuromoji is a self-contained and very easy to use Japanese morphological analyzer designed for search
Stars: ✭ 745 (+1254.55%)
Mutual labels:  nlp-library

NodeJs OpenNLP

Node OpenNLP - (OpenNLP 1.6.0)

OpenNLP Wrapper For Node.js

Node-OpenNLP depends on Node-Java. Please make sure your environment is properly configured to run Node-Java; see the Node-Java documentation to learn more.

Installation

 npm install opennlp --save

Node-OpenNLP comes with Apache OpenNLP 1.6.0 along with the following trained 1.5 series models:

  • en-chunker.bin
  • en-ner-person.bin
  • en-pos-maxent.bin
  • en-sent.bin
  • en-token.bin

More trained models can be found here: http://opennlp.sourceforge.net/models-1.5

Sentence Detector

The OpenNLP Sentence Detector can detect whether or not a punctuation character marks the end of a sentence. In this sense, a sentence is defined as the longest whitespace-trimmed character sequence between two punctuation marks.

var openNLP = require("opennlp");
var sentence = 'Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .';
var sentenceDetector = new openNLP().sentenceDetector;
sentenceDetector.sentDetect(sentence, function(err, results) {
    // To get probabilities
    sentenceDetector.probs(function(error, probability) {
        console.log(error, probability);
    });
    console.log(results);
});
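
To see why a trained model is used instead of a simple rule, the sketch below splits text after sentence punctuation followed by whitespace. `naiveSentences` is a hypothetical helper written for this comparison only; it is not part of Node-OpenNLP and is not how OpenNLP detects sentences.

```javascript
// Naive rule-based splitter, for comparison only (hypothetical helper,
// not part of Node-OpenNLP): break after ".", "!" or "?" followed by whitespace.
function naiveSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/)
    .map(function (s) { return s.trim(); })
    .filter(function (s) { return s.length > 0; });
}

var text = 'He will join the board Nov. 29. He is chairman.';
console.log(naiveSentences(text));
// The abbreviation "Nov." is wrongly treated as a sentence end:
// [ 'He will join the board Nov.', '29.', 'He is chairman.' ]
```

A model-based detector is meant to handle cases like this abbreviation, which a fixed punctuation rule cannot distinguish from a real sentence boundary.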

Configurations

The following default configurations can be overridden during initialization.

var openNLP = require("opennlp");
var opennlp = new openNLP({
    models: {
        doccat: __dirname + '/models/en-doccat.bin',
        posTagger: __dirname + '/models/en-pos-maxent.bin',
        tokenizer: __dirname + '/models/en-token.bin',
        nameFinder: __dirname + '/models/en-ner-person.bin',
        sentenceDetector: __dirname + '/models/en-sent.bin',
        chunker: __dirname + '/models/en-chunker.bin'
    },
    openNLP: {
        jar: __dirname + "/lib/opennlp-tools-1.6.0.jar"
    }
});
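
If all models live in one directory, the `models` section can be generated instead of typed out. The sketch below is one way to do that; `buildModelConfig` is a hypothetical helper, not part of Node-OpenNLP, and it assumes the model file names used in the configuration example.

```javascript
var path = require('path');

// Hypothetical helper (not part of Node-OpenNLP): build the `models`
// section for a single model directory. The keys and file names are the
// ones shown in the configuration example.
function buildModelConfig(modelDir) {
  var files = {
    doccat: 'en-doccat.bin',
    posTagger: 'en-pos-maxent.bin',
    tokenizer: 'en-token.bin',
    nameFinder: 'en-ner-person.bin',
    sentenceDetector: 'en-sent.bin',
    chunker: 'en-chunker.bin'
  };
  var models = {};
  Object.keys(files).forEach(function (key) {
    models[key] = path.join(modelDir, files[key]);
  });
  return models;
}

// Resolve a "models" directory relative to the current working directory.
var models = buildModelConfig(path.resolve('models'));
console.log(models.tokenizer);
```

The resulting object would then be passed as the `models` option when constructing the wrapper.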

Tokenizer

The OpenNLP Tokenizers segment an input character sequence into tokens. Tokens are usually words, punctuation, numbers, etc.

var openNLP = require("opennlp");
var sentence = 'Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .';
var tokenizer = new openNLP().tokenizer;
tokenizer.tokenize(sentence, function(err, results) {
    console.log(err, results);
    tokenizer.getTokenProbabilities(function(error, response) {
        console.log(error, response);
    });
});
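
Token probabilities are easier to inspect when paired with their tokens. The sketch below assumes that `tokenize` yields an array of strings and that `getTokenProbabilities` yields an array of numbers of the same length; the wrapper's exact return shapes are not documented here, so treat that as an assumption. `pairTokensWithProbs` is a hypothetical helper.

```javascript
// Hypothetical helper (not part of Node-OpenNLP): pair each token with its
// probability. Assumes two parallel arrays of equal length.
function pairTokensWithProbs(tokens, probs) {
  if (tokens.length !== probs.length) {
    throw new Error('tokens and probabilities differ in length');
  }
  return tokens.map(function (token, i) {
    return { token: token, prob: probs[i] };
  });
}

// Sample values made up for illustration; they are not real model output.
var pairs = pairTokensWithProbs(['Nov.', '29', '.'], [0.91, 1.0, 0.99]);
var uncertain = pairs.filter(function (p) { return p.prob < 0.95; });
console.log(uncertain); // [ { token: 'Nov.', prob: 0.91 } ]
```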

Name Finder

The Name Finder can detect named entities and numbers in text. To be able to detect entities the Name Finder needs a model. The model is dependent on the language and entity type it was trained for.

var openNLP = require("opennlp");
var sentence = 'Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .';
var nameFinder = new openNLP().nameFinder;
nameFinder.find(sentence, function(err, tokens_arr) {
    console.log(err, tokens_arr)
    nameFinder.probs(function(error, response) {
        console.log(error, response)
    });
});
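
In the Java API, the name finder reports entities as spans of token indices with an exclusive end index. The sketch below turns such spans back into text. It assumes span-like objects of the form `{ start, end, type }`; whether the wrapper returns exactly this shape is an assumption, and `spansToNames` is a hypothetical helper.

```javascript
// Hypothetical helper (not part of Node-OpenNLP): convert token-index spans
// into readable names. Assumes `end` is exclusive, as in OpenNLP's Java Span.
function spansToNames(tokens, spans) {
  return spans.map(function (span) {
    return {
      type: span.type,
      text: tokens.slice(span.start, span.end).join(' ')
    };
  });
}

var tokens = 'Pierre Vinken , 61 years old , will join the board'.split(' ');
// Hand-written span for illustration, not real model output.
console.log(spansToNames(tokens, [{ start: 0, end: 2, type: 'person' }]));
// [ { type: 'person', text: 'Pierre Vinken' } ]
```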

Document Categorizer

The OpenNLP Document Categorizer can classify text into pre-defined categories. It is based on a maximum entropy framework.

Note: To use the document categorizer, you need to train a model first. The default trained model that is included is for testing purposes only.

var openNLP = require("opennlp");
var doccat = new openNLP().doccat;
doccat.categorize("I enjoyed watching Rocky", function(err, list) {
    doccat.getAllResults(list, function(err, category) {
    });
    doccat.getBestCategory(list, function(err, category) {
    });
});
doccat.scoreMap("I enjoyed watching Rocky", function(err, category) {
});
doccat.sortedScoreMap("I enjoyed watching Rocky", function(err, category) {
});
doccat.getCategory(1, function(err, category) {
});
doccat.getIndex('Happy', function(err, index) {
});
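
Picking the best category from a score map amounts to taking the highest-scoring key. The sketch below shows that step on a plain object. It assumes the score map can be read as `{ category: score }`; the wrapper's actual return shape is not documented here. `bestCategory` is a hypothetical helper and the category `Sad` is made up for the example.

```javascript
// Hypothetical helper (not part of Node-OpenNLP): return the category with
// the highest score, or null for an empty map.
function bestCategory(scores) {
  var best = null;
  Object.keys(scores).forEach(function (category) {
    if (best === null || scores[category] > scores[best]) {
      best = category;
    }
  });
  return best;
}

// Scores made up for illustration; they are not real model output.
console.log(bestCategory({ Happy: 0.8, Sad: 0.2 })); // Happy
```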

Part-of-Speech Tagger

The Part-of-Speech Tagger marks tokens with their corresponding word type based on the token itself and the context of the token. A token might have multiple POS tags depending on the token and the context. The OpenNLP POS Tagger uses a probability model to predict the correct POS tag out of the tag set.

var openNLP = require("opennlp");
var posTagger = new openNLP().posTagger;
var sentence = 'Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .';
posTagger.tag(sentence, function(err, tokens_arr) {
    console.log(err, tokens_arr)
});
posTagger.topKSequences(sentence, function(error, tagger) {
    console.log(tagger.getScore())
    console.log(tagger.getProbs())
    console.log(tagger.getOutcomes())
});
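
Tagged text is often printed in `token_TAG` form. The sketch below produces that layout, assuming the tagger yields one tag per whitespace-separated token. `formatTagged` is a hypothetical helper and the tags are hand-written Penn Treebank tags, not real model output.

```javascript
// Hypothetical helper (not part of Node-OpenNLP): join parallel token and
// tag arrays into a single "token_TAG token_TAG ..." string.
function formatTagged(tokens, tags) {
  return tokens.map(function (token, i) {
    return token + '_' + tags[i];
  }).join(' ');
}

// Hand-written tags for illustration.
var words = ['Pierre', 'Vinken', ',', '61', 'years', 'old'];
var posTags = ['NNP', 'NNP', ',', 'CD', 'NNS', 'JJ'];
console.log(formatTagged(words, posTags));
// Pierre_NNP Vinken_NNP ,_, 61_CD years_NNS old_JJ
```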

Chunker

Text chunking consists of dividing a text into syntactically correlated groups of words, such as noun groups and verb groups, but does not specify their internal structure or their role in the main sentence.

var openNLP = require("opennlp");
var posTagger = new openNLP().posTagger;
var sentence = 'Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .';
var chunker = new openNLP().chunker;
// The chunker needs the POS tags, so tag the sentence first.
posTagger.tag(sentence, function(err, tags) {
    chunker.topKSequences(sentence, tags, function(err, sequences) {
        console.log(err, sequences);
    });
    chunker.chunk(sentence, tags, function(err, chunks) {
        console.log(err, chunks);
        chunker.probs(function(error, prob) {
            console.log(error, prob);
        });
    });
});
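
OpenNLP chunkers label each token with a tag such as `B-NP` (begins a noun phrase), `I-NP` (continues it) or `O` (outside any chunk). The sketch below groups such tags into phrases. It assumes the chunker yields one tag per token as a plain array; `groupChunks` is a hypothetical helper and the tags are hand-written.

```javascript
// Hypothetical helper (not part of Node-OpenNLP): group B-/I-/O chunk tags
// into phrases. Tokens tagged "O" or with a mismatched I- tag are skipped.
function groupChunks(tokens, chunkTags) {
  var chunks = [];
  var current = null;
  tokens.forEach(function (token, i) {
    var tag = chunkTags[i];
    if (tag.indexOf('B-') === 0) {
      current = { type: tag.slice(2), tokens: [token] };
      chunks.push(current);
    } else if (tag.indexOf('I-') === 0 && current !== null &&
               current.type === tag.slice(2)) {
      current.tokens.push(token);
    } else {
      current = null; // closes the current chunk
    }
  });
  return chunks.map(function (c) {
    return { type: c.type, text: c.tokens.join(' ') };
  });
}

// Hand-written tags for illustration, not real model output.
var tokens = ['Pierre', 'Vinken', 'will', 'join', 'the', 'board'];
var tags = ['B-NP', 'I-NP', 'B-VP', 'I-VP', 'B-NP', 'I-NP'];
console.log(groupChunks(tokens, tags));
// [ { type: 'NP', text: 'Pierre Vinken' },
//   { type: 'VP', text: 'will join' },
//   { type: 'NP', text: 'the board' } ]
```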

Please report any bugs. Feel free to send me a tweet if you need any help.
Follow me on Twitter [@notmilobejda](https://twitter.com/notmilobejda)
My Blog [mbejda.com](https://mbejda.com)
