
winkjs / wink-nlp

Licence: MIT license
Developer friendly Natural Language Processing ✨

Programming Languages

javascript
184084 projects - #8 most used programming language
typescript
32286 projects

Projects that are alternatives of or similar to wink-nlp

Malaya
Natural Language Toolkit for bahasa Malaysia, https://malaya.readthedocs.io/
Stars: ✭ 239 (-23.4%)
Mutual labels:  sentiment-analysis, ner, pos-tagging
Xmnlp
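xmnlp: provides Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, text correction, text-to-pinyin conversion, text summarization, radical lookup, and other features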
Stars: ✭ 591 (+89.42%)
Mutual labels:  sentiment-analysis, ner
Rust Bert
Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)
Stars: ✭ 510 (+63.46%)
Mutual labels:  sentiment-analysis, ner
Nlp Papers
Papers and Book to look at when starting NLP 📚
Stars: ✭ 111 (-64.42%)
Mutual labels:  sentiment-analysis, ner
Qutuf
Qutuf (قُطُوْف): An Arabic Morphological analyzer and Part-Of-Speech tagger as an Expert System.
Stars: ✭ 84 (-73.08%)
Mutual labels:  pattern-matching, pos-tagging
wink-sentiment
Accurate and fast sentiment scoring of phrases with #hashtags, emoticons :) & emojis 🎉
Stars: ✭ 51 (-83.65%)
Mutual labels:  sentiment-analysis, wink
Turkish Bert Nlp Pipeline
Bert-base NLP pipeline for Turkish, Ner, Sentiment Analysis, Question Answering etc.
Stars: ✭ 85 (-72.76%)
Mutual labels:  sentiment-analysis, ner
Vncorenlp
A Vietnamese natural language processing toolkit (NAACL 2018)
Stars: ✭ 354 (+13.46%)
Mutual labels:  ner, pos-tagging
Cluedatasetsearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included
Stars: ✭ 2,112 (+576.92%)
Mutual labels:  sentiment-analysis, ner
Indonesian Nlp Resources
Data resources for Indonesian NLP
Stars: ✭ 143 (-54.17%)
Mutual labels:  sentiment-analysis, pos-tagging
Pytorch ner bilstm cnn crf
End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF, implemented in PyTorch
Stars: ✭ 249 (-20.19%)
Mutual labels:  ner, pos-tagging
extractacy
Spacy pipeline object for extracting values that correspond to a named entity (e.g., birth dates, account numbers, laboratory results)
Stars: ✭ 47 (-84.94%)
Mutual labels:  pattern-matching, ner
Monpa
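MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition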
Stars: ✭ 203 (-34.94%)
Mutual labels:  ner, pos-tagging
Bertweet
BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)
Stars: ✭ 282 (-9.62%)
Mutual labels:  sentiment-analysis, ner
Phonlp
PhoNLP: A BERT-based multi-task learning toolkit for part-of-speech tagging, named entity recognition and dependency parsing (NAACL 2021)
Stars: ✭ 56 (-82.05%)
Mutual labels:  ner, pos-tagging
Chatbot cn
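A chatbot for the finance and legal domains (with some chit-chat capability). Its main modules include information extraction, NLU, NLG, and a knowledge graph; the front-end display is integrated using Django, and RESTful interfaces for NLP and KG have already been packaged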
Stars: ✭ 791 (+153.53%)
Mutual labels:  sentiment-analysis, ner
fairseq-tagging
a Fairseq fork for sequence tagging/labeling tasks
Stars: ✭ 26 (-91.67%)
Mutual labels:  ner, pos-tagging
Phobert
PhoBERT: Pre-trained language models for Vietnamese (EMNLP-2020 Findings)
Stars: ✭ 332 (+6.41%)
Mutual labels:  ner, pos-tagging
Dan Jurafsky Chris Manning Nlp
My solution to the Natural Language Processing course made by Dan Jurafsky, Chris Manning in Winter 2012.
Stars: ✭ 124 (-60.26%)
Mutual labels:  sentiment-analysis, ner
Paribhasha
paribhasha.herokuapp.com/
Stars: ✭ 21 (-93.27%)
Mutual labels:  sentiment-analysis, pos-tagging

winkNLP


Developer friendly Natural Language Processing

winkNLP is a JavaScript library for Natural Language Processing (NLP). Designed specifically to make development of NLP solutions easier and faster, winkNLP is optimized for the right balance of performance and accuracy. The package can handle large amounts of raw text at speeds over 525,000 tokens/second. With a test coverage of ~100%, winkNLP is a tool for building production-grade systems with confidence.


Features

winkNLP has a comprehensive natural language processing (NLP) pipeline covering tokenization, sentence boundary detection (SBD), negation handling, sentiment analysis, part-of-speech (POS) tagging, named entity recognition (NER), and custom entity recognition (CER):

Processing pipeline: text, tokenization, SBD, negation, sentiment, NER, POS, CER

At every stage a range of properties become accessible for tokens, sentences, and entities. Read more about the processing pipeline and how to configure it in the winkNLP documentation.
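
As a rough sketch of what configuring the pipeline can look like, the snippet below assumes that the winkNLP factory accepts an optional second argument listing the annotation stages to run, and that the stage names are the lowercase short forms 'sbd', 'negation', 'sentiment', 'pos', 'ner' and 'cer'. Both are assumptions to verify against the winkNLP documentation. `selectStages` and `createNlp` are hypothetical helpers written for this example, not part of winkNLP.

```javascript
// Assumed stage names, listed in the order of the pipeline described above.
const STAGES = [ 'sbd', 'negation', 'sentiment', 'pos', 'ner', 'cer' ];

// Hypothetical helper: validates the requested stages and returns them
// de-duplicated, in pipeline order.
function selectStages( wanted ) {
  const unknown = wanted.filter( ( s ) => !STAGES.includes( s ) );
  if ( unknown.length > 0 ) {
    throw new Error( 'Unknown pipeline stage(s): ' + unknown.join( ', ' ) );
  }
  return STAGES.filter( ( s ) => wanted.includes( s ) );
}

// Hypothetical helper: builds an nlp instance that runs only the selected stages.
// Assumption: winkNLP( model, pipe ) takes an array of stage names as `pipe`.
// The packages are loaded lazily, so they are only needed when this is called.
function createNlp( wanted ) {
  const winkNLP = require( 'wink-nlp' );
  const model = require( 'wink-eng-lite-web-model' );
  return winkNLP( model, selectStages( wanted ) );
}

console.log( selectStages( [ 'pos', 'sbd', 'pos' ] ) );
// -> [ 'sbd', 'pos' ]
```

With both packages installed, `createNlp( [ 'sbd', 'pos' ] )` would then be used in place of a plain `winkNLP( model )` call when only sentence boundaries and POS tags are needed.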

It packs a rich feature set into a small-footprint codebase of under 1500 lines:

  1. Fast, lossless & multilingual tokenizer

  2. Developer friendly and intuitive API

  3. Built-in API to aid text visualization

  4. Extensive text processing features such as bag-of-words, frequency table, stop word removal, readability statistics computation and many more.

  5. Pre-trained language models with sizes starting from <3MB onwards

  6. BM25-based vectorizer

  7. Multiple similarity methods

  8. Word vector integration

  9. No external dependencies

  10. Runs on web browsers

  11. TypeScript support.

Installation

Use npm install:

npm install wink-nlp --save

To use winkNLP after installing it, you also need to install a language model that matches your Node.js version. The following table lists the version-specific installation commands:

Node.js Version    Installation
16 or 18           npm install wink-eng-lite-web-model --save
14 or 12           node -e "require('wink-nlp/models/install')"

The wink-eng-lite-web-model is designed to work with Node.js version 16 or 18. It can also work on browsers as described in the next section.

The second command installs the wink-eng-lite-model, which works with Node.js version 14 or 12.

How to install for Web Browser

If you’re using winkNLP in the browser, use the wink-eng-lite-web-model. Learn about its installation and usage in our guide to using winkNLP in the browser. Explore winkNLP recipes on Observable for live browser-based examples.

Getting Started

The "Hello World!" in winkNLP is given below:

// Load wink-nlp package.
const winkNLP = require( 'wink-nlp' );
// Load English language model.
const model = require( 'wink-eng-lite-web-model' );
// Instantiate winkNLP.
const nlp = winkNLP( model );
// Obtain "its" helper to extract item properties.
const its = nlp.its;
// Obtain "as" reducer helper to reduce a collection.
const as = nlp.as;
 
// NLP Code.
const text = 'Hello   World🌎! How are you?';
const doc = nlp.readDoc( text );
 
console.log( doc.out() );
// -> Hello   World🌎! How are you?
 
console.log( doc.sentences().out() );
// -> [ 'Hello   World🌎!', 'How are you?' ]
 
console.log( doc.entities().out( its.detail ) );
// -> [ { value: '🌎', type: 'EMOJI' } ]
 
console.log( doc.tokens().out() );
// -> [ 'Hello', 'World', '🌎', '!', 'How', 'are', 'you', '?' ]
 
console.log( doc.tokens().out( its.type, as.freqTable ) );
// -> [ [ 'word', 5 ], [ 'punctuation', 2 ], [ 'emoji', 1 ] ]

Experiment with the above code on RunKit.
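
To illustrate what the final `as.freqTable` reduction in the example above computes, here is a plain JavaScript sketch: count each value, then list the pairs from most to least frequent. The `freqTable` function below is a hypothetical stand-in written for this example, not winkNLP's implementation, and the token types are copied by hand from the tokens shown above.

```javascript
// Hypothetical stand-in for a frequency-table reducer.
function freqTable( values ) {
  const counts = new Map();
  for ( const v of values ) {
    counts.set( v, ( counts.get( v ) || 0 ) + 1 );
  }
  // Sort [ value, count ] pairs by descending count.
  return Array.from( counts.entries() ).sort( ( a, b ) => b[ 1 ] - a[ 1 ] );
}

// Types of: 'Hello', 'World', '🌎', '!', 'How', 'are', 'you', '?'
const types = [
  'word', 'word', 'emoji', 'punctuation',
  'word', 'word', 'word', 'punctuation'
];

console.log( freqTable( types ) );
// -> [ [ 'word', 5 ], [ 'punctuation', 2 ], [ 'emoji', 1 ] ]
```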

Explore Further

Dive into winkNLP's concepts or head to winkNLP recipes for common NLP tasks or just explore live showcases to learn:

Wikipedia Timeline

Reads any Wikipedia article and generates a visual timeline of all its events.

NLP Wizard 🧙

Performs tokenization, sentence boundary detection, POS tagging, named entity detection, and sentiment analysis of user input text in real time.

Naive Wikification Tool 🔗

Links entities such as famous persons, locations or objects to the relevant Wikipedia pages.

Speed & Accuracy

winkNLP processes raw text at ~525,000 tokens per second with its default language model, wink-eng-lite-model, when benchmarked using "Ch 13 of Ulysses by James Joyce" on a 2.2 GHz Intel Core i7 machine with 16GB RAM. The processing included the entire NLP pipeline: tokenization, sentence boundary detection, negation handling, sentiment analysis, part-of-speech tagging, and named entity extraction. This speed is well ahead of the prevailing speed benchmarks.

The benchmark was conducted on Node.js versions 14.8.0 and 12.18.3. It delivered similar or better performance on Node.js versions 16 and 18.

winkNLP delivers similar performance on browsers; its performance on a specific machine/browser combination can be measured using the Observable notebook "How to measure winkNLP's speed on browsers?".
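
A tokens-per-second figure like the one above can be estimated on your own machine by timing a processing function over a known text. The sketch below is a generic Node.js harness, not the benchmark script behind the numbers in this section; `measureThroughput` is a hypothetical helper, and the whitespace tokenizer is only a stand-in for a real winkNLP call.

```javascript
// Hypothetical helper: runs processFn( text ) `runs` times and reports the rate.
// processFn must return the number of tokens it processed.
function measureThroughput( processFn, text, runs = 5 ) {
  let tokens = 0;
  const start = process.hrtime.bigint();
  for ( let i = 0; i < runs; i += 1 ) {
    tokens += processFn( text );
  }
  const elapsedNs = Number( process.hrtime.bigint() - start );
  // Guard against a zero reading on very fast runs.
  const seconds = Math.max( elapsedNs, 1 ) / 1e9;
  return { tokens, seconds, tokensPerSecond: tokens / seconds };
}

// Stand-in tokenizer: counts whitespace-separated chunks.
const countTokens = ( text ) => text.split( /\s+/ ).filter( Boolean ).length;

const sample = 'Hello World! How are you? '.repeat( 1000 );
const result = measureThroughput( countTokens, sample );
console.log( result.tokens, Math.round( result.tokensPerSecond ) );
```

To time winkNLP itself, `countTokens` would be replaced by a function that reads the text with `nlp.readDoc()` and returns the size of `doc.tokens()`; results will vary with hardware, Node.js version, and the text used.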

It POS-tags a subset of the WSJ corpus with an accuracy of ~94.7%; this includes tokenization of raw text prior to POS tagging. The current state of the art is at ~97% accuracy, but at lower speeds, and is generally computed using a gold-standard pre-tokenized corpus.

Its general-purpose sentiment analysis delivers an F-score of ~84.5% when validated using the Amazon Product Review Sentiment Labelled Sentences Data Set at the UCI Machine Learning Repository. The current benchmark accuracy for specifically trained models can be around 95%.
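
For readers unfamiliar with the metric, an F-score is commonly the F1 score, the harmonic mean of precision and recall. The sketch below assumes that definition and computes it from confusion counts; the counts are made-up illustrative numbers, not figures from the winkNLP evaluation.

```javascript
// F1 score from confusion counts (assumes the common F1 definition).
function f1Score( truePositives, falsePositives, falseNegatives ) {
  if ( truePositives === 0 ) return 0;
  const precision = truePositives / ( truePositives + falsePositives );
  const recall = truePositives / ( truePositives + falseNegatives );
  return ( 2 * precision * recall ) / ( precision + recall );
}

// Illustrative counts only: precision = 0.8 and recall = 0.8.
console.log( f1Score( 80, 20, 20 ).toFixed( 3 ) );
// -> 0.800
```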

Memory Requirement

winkNLP delivers this performance with minimal load on RAM. For example, it processes the entire History of India Volume I with a total peak memory requirement of under 80MB. The book has around 350 pages, which translates to over 125,000 tokens.
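
A memory figure like this can be roughly checked in Node.js by sampling `process.memoryUsage()` around the processing call. The sketch below uses a hypothetical helper, `measureMemory`, and a placeholder string-splitting task in place of a real winkNLP run. It reports a before/after sample rather than a true peak, so treat the numbers as indicative only.

```javascript
// Hypothetical helper: samples memory before and after running `task`.
function measureMemory( task ) {
  const before = process.memoryUsage();
  const result = task();
  const after = process.memoryUsage();
  const toMB = ( bytes ) => bytes / ( 1024 * 1024 );
  return {
    result,
    // Resident set size after the task, in MB.
    rssMB: toMB( after.rss ),
    // Change in used heap across the task, in MB (may be negative after GC).
    heapDeltaMB: toMB( after.heapUsed - before.heapUsed )
  };
}

// Placeholder workload standing in for nlp.readDoc( bookText ).
const report = measureMemory(
  () => 'lorem ipsum '.repeat( 100000 ).split( ' ' ).length
);
console.log( report.result, report.rssMB.toFixed( 1 ) + ' MB RSS' );
```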

Documentation

  • Concepts — everything you need to know to get started.
  • API Reference — explains usage of APIs with examples.
  • Change log — version history along with the details of breaking changes, if any.
  • Showcases — live examples with code to give you a head start.

Need Help?

Usage query 👩🏽‍💻

Please ask at Stack Overflow or discuss at Wink JS GitHub Discussions or chat with us at Wink JS Gitter Lobby.

Bug report 🐛

If you spot a bug and the same has not yet been reported, raise a new issue or consider fixing it and sending a PR.

New feature

Looking for a new feature? Request it via the new features & ideas discussion forum or consider becoming a contributor.

About wink

Wink is a family of open source packages for Natural Language Processing, Machine Learning, and Statistical Analysis in Node.js. The code is thoroughly documented for easy human comprehension and has a test coverage of ~100%, so it can be relied on when building production-grade solutions.

Copyright & License

Wink NLP is copyright 2017-22 GRAYPE Systems Private Limited.

It is licensed under the terms of the MIT License.
