bastienbot / Nlp Js Tools French

Licence: MIT
POS tagger, lemmatizer and stemmer for the French language in JavaScript

NLP JavaScript tools for the French language

Tokenizer, POS tagger, lemmatizer and stemmer

This package is partly based on the Snowball stemming algorithm and on its JavaScript adaptation by Kasun Gajasinghe, University of Moratuwa.

This package offers four NLP tools in JavaScript for the French language:

  • Tokenizing
  • POS Tagging
  • Lemmatizing
  • Stemming

Install

npm install nlp-js-tools-french

Usage

var NlpjsTFr = require('nlp-js-tools-french');

Corpus to use

var corpus = "Elle semble se nourrir essentiellement de plancton, et de hotdog.";

Configs

var config = {
    tagTypes: ['art', 'ver', 'nom'],
    strictness: false,
    minimumLength: 3,
    debug: true
};

New instance with specific corpus and configs

var nlpToolsFr = new NlpjsTFr(corpus, config);

The available methods are listed below. Note: the sentence passed to the constructor is tokenized automatically.

var tokenizedWords = nlpToolsFr.tokenized;
var posTaggedWords = nlpToolsFr.posTagger();
var lemmatizedWords = nlpToolsFr.lemmatizer();
var stemmedWords = nlpToolsFr.stemmer();
var stemmedWord = nlpToolsFr.wordStemmer("aléatoirement");

Attributes

config

Holds the configuration in use.

tokenized

["semble", "nourrir", "de"]

Method return values

posTagger()

[{
  "id": 1,
  "word": "semble",
  "pos": [
   "VER",
   "VER"
  ]
 },
 {
  "id": 2,
  "word": "nourrir",
  "pos": [
   "VER"
  ]
 },
 {
  "id": 3,
  "word": "de",
  "pos": [
   "NOM",
   "ART:def",
   "PRE"
  ]
 }]
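
The array returned by posTagger() can be filtered with ordinary array methods. The sketch below is only an illustration: it works on sample data copied from the output above rather than calling the package, and wordsWithTag is a hypothetical helper, not part of nlp-js-tools-french. It also assumes that a sub-typed tag such as "ART:def" should count as "ART".

```javascript
// Sample data copied from the posTagger() output shown above.
const posTaggedWords = [
  { id: 1, word: "semble", pos: ["VER", "VER"] },
  { id: 2, word: "nourrir", pos: ["VER"] },
  { id: 3, word: "de", pos: ["NOM", "ART:def", "PRE"] }
];

// Hypothetical helper (not part of the package): return the words whose
// tag list contains `tag`, comparing either the full tag or the part
// before ":" so that "ART" also matches "ART:def".
function wordsWithTag(tagged, tag) {
  return tagged
    .filter(entry => entry.pos.some(p => p === tag || p.split(":")[0] === tag))
    .map(entry => entry.word);
}

const verbs = wordsWithTag(posTaggedWords, "VER");
const articles = wordsWithTag(posTaggedWords, "ART");
console.log(verbs, articles);
```

Because a word can carry several candidate tags, the helper keeps a word as soon as any one of its tags matches.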

lemmatizer()

[{
  "id": 1,
  "word": "semble",
  "lemma": "sembler"
 },
 {
  "id": 2,
  "word": "nourrir",
  "lemma": "nourrir"
 },
 {
  "id": 3,
  "word": "de",
  "lemma": "de"
 }]
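
When many words need to be normalized, it can help to turn the lemmatizer() result into a lookup table. This is a minimal sketch using the sample output above as literal data; lemmaIndex is a hypothetical helper name and is not provided by the package.

```javascript
// Sample data copied from the lemmatizer() output shown above.
const lemmatizedWords = [
  { id: 1, word: "semble", lemma: "sembler" },
  { id: 2, word: "nourrir", lemma: "nourrir" },
  { id: 3, word: "de", lemma: "de" }
];

// Hypothetical helper (not part of the package): build a Map from each
// surface word to its lemma. If a word appears twice, the later entry wins.
function lemmaIndex(lemmatized) {
  const index = new Map();
  for (const { word, lemma } of lemmatized) {
    index.set(word, lemma);
  }
  return index;
}

const lemmas = lemmaIndex(lemmatizedWords);
console.log(lemmas.get("semble"));
```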

stemmer()

[{
  "id": 1,
  "word": "semble",
  "stem": "sembl"
 },
 {
  "id": 3,
  "word": "nourrir",
  "stem": "nourr"
 },
 {
  "id": 5,
  "word": "de",
  "stem": "de"
}]

wordStemmer(word)

{
    word: "aléatoirement",
    stem: "aléatoir"
}

Config

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| tagTypes | Array | ["adj", "adv", "art", "con", "nom", "ono", "pre", "ver", "pro"] | List of dictionaries the package will look in, in case you only need verbs, nouns, both, or anything else. If a word does not belong to any type, it is tagged as "UNK". |
| strictness | Bool | false | When strictness is true, POS tagging the word generalement fails because the word is missing its accents. When strictness is false, accents are not required to match: a word that matches de in the art, pre and nom dictionaries is tagged with those types. |
| minimumLength | Int | 1 | Algorithms ignore words that are shorter than this value. |
| debug | Bool | false | Enables debug output in the console. |
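
The effect of strictness and minimumLength can be illustrated with a small sketch. This is not the package's implementation: stripAccents, lookup, applyMinimumLength and the toy dictionary are all hypothetical, and the sketch assumes that non-strict matching amounts to comparing words with their diacritics removed.

```javascript
// Hypothetical helper: remove diacritics by decomposing characters (NFD)
// and dropping the combining marks, so "généralement" becomes "generalement".
function stripAccents(word) {
  return word.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Hypothetical lookup over a toy dictionary. With strictness on, the word
// must match an entry exactly; with strictness off, accents are ignored.
function lookup(word, dictionary, strictness) {
  if (strictness) {
    return dictionary.includes(word);
  }
  const target = stripAccents(word);
  return dictionary.some(entry => stripAccents(entry) === target);
}

// Hypothetical filter mirroring minimumLength: drop words shorter than it.
function applyMinimumLength(words, minimumLength) {
  return words.filter(word => word.length >= minimumLength);
}

const toyDictionary = ["généralement", "de"];
const strictHit = lookup("generalement", toyDictionary, true);
const looseHit = lookup("generalement", toyDictionary, false);
const kept = applyMinimumLength(["semble", "se", "nourrir", "de"], 3);
console.log(strictHit, looseHit, kept);
```

In this sketch the strict lookup misses the unaccented word while the non-strict lookup finds it, and a minimumLength of 3 drops "se" and "de".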