
londogard / londogard-nlp-toolkit

Licence: GPL-3.0 license
Londogard Natural Language Processing Toolkit written in Kotlin

Programming Languages: Kotlin, Jupyter Notebook, Python

Projects that are alternatives to or similar to londogard-nlp-toolkit

Computer-Science-Learn-Notes
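A CS (Computer Science) career: reading notes integrating the Java knowledge system! (Java fundamentals, JVM, JUC, the Spring series, stock interview questions, LeetCode practice notes, data structures and algorithms, integrating other frameworks with SpringBoot, etc.)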
Stars: ✭ 141 (+187.76%)
Mutual labels:  jvm
styx
Programmable, asynchronous, event-based reverse proxy for JVM.
Stars: ✭ 250 (+410.2%)
Mutual labels:  jvm
play-java-chatroom-example
Example Chatroom with Java API
Stars: ✭ 33 (-32.65%)
Mutual labels:  jvm
gchisto
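A GC log analysis tool whose source code is hard to find online, so a backup copy is kept here. Not sure whether the tool is correct, or whether there will be time to study it.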
Stars: ✭ 32 (-34.69%)
Mutual labels:  jvm
dragome-sdk
Dragome is a tool for creating client side web applications in pure Java (JVM) language.
Stars: ✭ 79 (+61.22%)
Mutual labels:  jvm
play-scala-compile-di-example
Example Play Project using compile time dependency injection and Play WS with ScalaTest
Stars: ✭ 37 (-24.49%)
Mutual labels:  jvm
mleap
R Interface to MLeap
Stars: ✭ 24 (-51.02%)
Mutual labels:  jvm
java-manta
Java Manta Client SDK
Stars: ✭ 16 (-67.35%)
Mutual labels:  jvm
java-perf-workshop
Guided walkthrough to understand the performance aspects of a Java web service
Stars: ✭ 53 (+8.16%)
Mutual labels:  jvm
Cojen
Java bytecode generation and disassembly tools
Stars: ✭ 28 (-42.86%)
Mutual labels:  jvm
sconfig
Scala configuration library supporting HOCON for Scala, Java, Scala.js, and Scala Native
Stars: ✭ 99 (+102.04%)
Mutual labels:  jvm
jstackSeries.sh
Script for capturing a series of thread dumps from a Java process using jstack (on Linux and Windows)
Stars: ✭ 28 (-42.86%)
Mutual labels:  jvm
openwhisk-runtime-java
Apache OpenWhisk Runtime Java supports Apache OpenWhisk functions written in Java and other JVM-hosted languages
Stars: ✭ 43 (-12.24%)
Mutual labels:  jvm
oh-my-jvm
☕️ A JVM written in Go (golang)
Stars: ✭ 16 (-67.35%)
Mutual labels:  jvm
aot
Russian morphology for Java
Stars: ✭ 41 (-16.33%)
Mutual labels:  jvm
LLVM-JVM
[W.I.P] A Just-In-Time Java Virtual Machine written in Haskell
Stars: ✭ 22 (-55.1%)
Mutual labels:  jvm
kotlin-guiced
Convenience Kotlin API over the Google Guice DI Library
Stars: ✭ 17 (-65.31%)
Mutual labels:  jvm
JavaHub
The learning path for Java programmers, continuously updated with original content. Stars welcome!
Stars: ✭ 27 (-44.9%)
Mutual labels:  jvm
jacobin
A more than minimal JVM written in Go and capable of running Java 17 classes.
Stars: ✭ 59 (+20.41%)
Mutual labels:  jvm
SmallVM
TODO: A small and lightweight Java Virtual Machine
Stars: ✭ 23 (-53.06%)
Mutual labels:  jvm


londogard-nlp-toolkit

Londogard Natural Language Processing Toolkit written in Kotlin for the JVM.
This toolkit will be used throughout Londogard libraries/products such as our Summarizer, Text-Generation & more.

The LanguageSupport enum determines which languages each tool (e.g. Embeddings or Stopwords) supports out-of-the-box.

| Tool | Info | Docs | Samples (Kotlin Notebook) |
|---|---|---|---|
| Word Embeddings | Word & subword embeddings available out-of-the-box in 157 languages (fastText.cc) & 275 languages (bpemb) | embeddings | wordembeddings.ipynb |
| Sentence Embeddings | Average & Unsupervised Random Walk sentence embeddings | sentence-embeddings | sentence-embeddings.ipynb |
| Stopwords | Supports 23 languages out-of-the-box through NLTK's stopword lists | stopwords | stopwords.ipynb |
| Word Frequencies | Supports 34 languages out-of-the-box through LuminosoInsight's word frequency tables | wordfrequency | wordfreq.ipynb |
| Stemming | Supports 14 languages out-of-the-box using the Snowball Stemmer under the hood | stemming | stemmer.ipynb |
| Tokenizers | Char, Word, Subword & Sentence tokenizer support! SentencePiece? HuggingFace? It's there! | tokenizers, sentence-tokenizers | tokenizer.ipynb |
| Vectorizers & Encoders | BagOfWords, TF-IDF, BM25 & OneHot | vectorizers (TF-IDF, BM-25, ..), count-vectorizers (Count, Hash, ..), encoders (OneHot), transforms (TF-IDF, BM-25, ..) | TODO |
| Keyword Extractions | CooccurenceKeywords, based on the algorithm proposed in DOI:10.1142/S0218213004001466 | | keywords.ipynb |
| Machine Learning | LogisticRegression classifier (using Gradient Descent), NaïveBayes (binary) & Hidden Markov Model (HMM) as a sequence classifier | classifiers (LogisticRegression, NaïveBayes), regression (LinearRegression), sequence classifier (HiddenMarkovModel) | See e2e-examples |
| Deep Learning (Transformers / HuggingFace) | ClassifierPipeline and TokenClassifierPipeline, which support HuggingFace ONNX model names & PyTorch models from local files | transformers | See e2e-examples |
| spaCy-like API | 🚧WIP🚧 | | |
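To illustrate the TF-IDF weighting listed under Vectorizers & Encoders, here is a minimal, self-contained Kotlin sketch over pre-tokenized documents. It is not the toolkit's API: the function name `tfIdf` is hypothetical, TF is assumed to be the raw term count, and IDF is assumed to be the smoothed formula ln((1 + n) / (1 + df)) + 1 that scikit-learn's TfidfVectorizer uses by default. No vector normalization is applied, and the toolkit may well use a different variant.

```kotlin
import kotlin.math.ln

/**
 * Hypothetical helper (not part of londogard-nlp-toolkit): TF-IDF weights per document.
 * TF  = raw count of the term in the document.
 * IDF = ln((1 + n) / (1 + df)) + 1, with n documents and df = documents containing the term.
 */
fun tfIdf(docs: List<List<String>>): List<Map<String, Double>> {
    val n = docs.size
    // Document frequency: in how many documents each term appears at least once.
    val df = HashMap<String, Int>()
    for (doc in docs) {
        for (term in doc.toSet()) df[term] = (df[term] ?: 0) + 1
    }
    val idf = df.mapValues { (_, count) -> ln((1.0 + n) / (1.0 + count)) + 1.0 }
    // Multiply each term's count in a document by the term's IDF.
    return docs.map { doc ->
        doc.groupingBy { it }.eachCount()
            .mapValues { (term, tf) -> tf * idf.getValue(term) }
    }
}
```

For example, with the two documents `[a, b, a]` and `[b, c]`, the term `b` occurs in every document, so its IDF is ln(3 / 3) + 1 = 1 and it keeps its raw count, while the rarer `a` and `c` are boosted.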
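The "Average" sentence embedding in the table is, in its simplest form, the element-wise mean of the word vectors of a sentence's tokens. The sketch below assumes word vectors are already loaded into a plain `Map<String, FloatArray>` and simply skips out-of-vocabulary tokens; `averageEmbedding` is a hypothetical name, not the toolkit's API, and the Unsupervised Random Walk variant is more involved and not shown here.

```kotlin
/**
 * Hypothetical helper (not the toolkit's API): averages the vectors of all tokens
 * present in `embeddings`, skipping out-of-vocabulary tokens.
 * Returns null when none of the tokens has a vector.
 */
fun averageEmbedding(tokens: List<String>, embeddings: Map<String, FloatArray>): FloatArray? {
    val vectors = tokens.mapNotNull { embeddings[it] }
    if (vectors.isEmpty()) return null
    val dim = vectors.first().size
    val sum = FloatArray(dim)
    for (v in vectors) {
        require(v.size == dim) { "All word vectors must share the same dimension" }
        for (i in 0 until dim) sum[i] += v[i]
    }
    // Divide the element-wise sum by the number of known tokens.
    for (i in 0 until dim) sum[i] /= vectors.size
    return sum
}
```

Returning `null` rather than a zero vector makes the "no known words" case explicit for the caller, which matters when the result is later compared with cosine similarity.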

Installation

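The toolkit is published on Maven Central. Add it as a Gradle dependency, replacing $version with the desired release: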

implementation("com.londogard:nlp:$version")

Guides

Simple end-to-end guides are available as notebooks in docs/samples.

These include:

  1. IMDB Sentiment Analysis using Logistic Regression or Naïve Bayes
  2. IMDB Sentiment Analysis with HuggingFace Transformers via ClassifierPipeline.create(<model-name>)
  3. POS-Tagging using a Hidden Markov Model
  4. POS-Tagging with HuggingFace Transformers via TokenClassifierPipeline.create(<model-name>)

& potentially more.
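The Naïve Bayes route of the IMDB sentiment guide boils down to comparing per-class log-probabilities of the review's tokens. Below is a minimal sketch, not the toolkit's NaïveBayes class: `BinaryNaiveBayes` is a hypothetical name, and it assumes "binary" means two classes (positive/negative) with multinomial token counts and add-one (Laplace) smoothing, which may differ from what the toolkit actually implements.

```kotlin
import kotlin.math.ln

/**
 * Hypothetical sketch (not the toolkit's API): two-class multinomial Naive Bayes
 * over tokenized documents, with add-one smoothing. label = true is the positive class.
 */
class BinaryNaiveBayes(docs: List<List<String>>, labels: List<Boolean>) {
    private val vocab: Set<String> = docs.flatten().toSet()
    private val logPrior = DoubleArray(2)
    private val tokenCounts = Array(2) { HashMap<String, Int>() }
    private val totalTokens = IntArray(2)

    init {
        require(docs.isNotEmpty() && docs.size == labels.size) { "Need one label per document" }
        for ((doc, label) in docs.zip(labels)) {
            val c = if (label) 1 else 0
            logPrior[c] += 1.0 // temporarily holds the number of documents in class c
            for (token in doc) {
                tokenCounts[c][token] = (tokenCounts[c][token] ?: 0) + 1
                totalTokens[c]++
            }
        }
        // Add-one smoothed class prior, so a class without documents does not give ln(0).
        for (c in 0..1) logPrior[c] = ln((logPrior[c] + 1.0) / (docs.size + 2.0))
    }

    /** True if the positive class has the strictly higher log-posterior (ties go negative). */
    fun predict(doc: List<String>): Boolean {
        val known = doc.filter { it in vocab } // tokens never seen in training are ignored
        val scores = DoubleArray(2) { c ->
            logPrior[c] + known.sumOf { token ->
                ln(((tokenCounts[c][token] ?: 0) + 1.0) / (totalTokens[c] + vocab.size))
            }
        }
        return scores[1] > scores[0]
    }
}
```

Working in log-space avoids floating-point underflow when multiplying many small per-token probabilities for long reviews.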
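For the HMM-based POS-tagging guide, the classic way to decode the most likely tag sequence from a trained Hidden Markov Model is the Viterbi algorithm. The sketch below assumes the start, transition and emission log-probability tables have already been estimated and that tags and words are integer-encoded; `viterbi` is a hypothetical function and does not reflect the toolkit's HiddenMarkovModel API.

```kotlin
/**
 * Hypothetical sketch (not the toolkit's API): Viterbi decoding in log-space.
 *   logStart[s]    = log P(tag_0 = s)
 *   logTrans[p][s] = log P(tag_t = s | tag_{t-1} = p)
 *   logEmit[s][o]  = log P(word = o | tag = s)
 * Returns the most likely tag index for each observation in `obs`.
 */
fun viterbi(
    obs: IntArray,
    logStart: DoubleArray,
    logTrans: Array<DoubleArray>,
    logEmit: Array<DoubleArray>,
): IntArray {
    if (obs.isEmpty()) return IntArray(0)
    val nStates = logStart.size
    // best[t][s]: log-probability of the best path that ends in state s at time t.
    val best = Array(obs.size) { DoubleArray(nStates) }
    // back[t][s]: the predecessor state on that best path.
    val back = Array(obs.size) { IntArray(nStates) }
    for (s in 0 until nStates) best[0][s] = logStart[s] + logEmit[s][obs[0]]
    for (t in 1 until obs.size) {
        for (s in 0 until nStates) {
            var argMax = 0
            var max = Double.NEGATIVE_INFINITY
            for (p in 0 until nStates) {
                val score = best[t - 1][p] + logTrans[p][s]
                if (score > max) { max = score; argMax = p }
            }
            best[t][s] = max + logEmit[s][obs[t]]
            back[t][s] = argMax
        }
    }
    // Start from the best final state and follow the back-pointers.
    val path = IntArray(obs.size)
    path[obs.size - 1] = (0 until nStates).maxByOrNull { best[obs.size - 1][it] }!!
    for (t in obs.size - 1 downTo 1) path[t - 1] = back[t][path[t]]
    return path
}
```

The dynamic program runs in O(T · S²) time for T words and S tags, instead of enumerating all S^T possible tag sequences.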
