Sacremoses: Python port of the Moses tokenizer, truecaser and normalizer
Stars: ✭ 293 (+671.05%)
Contextualized Topic Models: A Python package to run contextualized topic modeling. CTMs combine BERT with topic models to get coherent topics. Also supports multilingual tasks. Cross-lingual zero-shot model published at EACL 2021.
Stars: ✭ 318 (+736.84%)
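Customer satisfaction analysis: An opinion-mining project based on online homestay UGC data, covering data mining and NLP processing, including data collection, topic extraction, and sentiment analysis. The goal is to overcome inconsistencies between user ratings and reviews and to evaluate satisfaction with online homestays in real time; it includes online review collection and visual sentiment analysis. It provides a Baidu Maps POI query entry point for automated batch lookup of POI information; builds an LDA automatic topic clustering model on an online homestay corpus, where the topic center words are used to find the corresponding topic attribute dictionary; uses user ratings as labels and then the character-level TextCNN bundled with litNlp for sentiment analysis, treating the sentiment classification probability distribution as the sentiment trend; and finally displays homestay satisfaction across regions as a POI heat map. See the link for software versions.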
Stars: ✭ 262 (+589.47%)
Snl Compiler: SNL (Small Nested Language) Compiler. Maven jUnit Tokenizer Lexer Syntax Parser. Compiler principles, lexical analysis, syntax analysis.
Stars: ✭ 19 (-50%)
cang-jie: Chinese tokenizer for tantivy, based on jieba-rs
Stars: ✭ 48 (+26.32%)
Moo: Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
Stars: ✭ 434 (+1042.11%)
Lingua: 👄 The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike
Stars: ✭ 341 (+797.37%)
sensim: Sentence Similarity Estimator (SenSim)
Stars: ✭ 15 (-60.53%)
Tapas: End-to-end neural table-text understanding models.
Stars: ✭ 583 (+1434.21%)
Sentences: A multilingual command line sentence tokenizer in Golang
Stars: ✭ 293 (+671.05%)
Tokenizer: A small library for converting tokenized PHP source code into XML (and potentially other formats)
Stars: ✭ 4,770 (+12452.63%)
pascal-interpreter: A simple interpreter for a large subset of the Pascal language, written for educational purposes
Stars: ✭ 21 (-44.74%)
Lfuzzer: Fuzzing Parsers with Tokens
Stars: ✭ 28 (-26.32%)
Hebrew-Tokenizer: A very simple Python tokenizer for Hebrew text.
Stars: ✭ 16 (-57.89%)
Smoothnlp: An NLP Toolset With A Focus on Explainable Inference
Stars: ✭ 435 (+1044.74%)
NLPnote: Gitbook Address: https://app.gitbook.com/@nlpgroup/s/nlpnote/
Stars: ✭ 101 (+165.79%)
Natasha: Solves basic Russian NLP tasks, API for lower level Natasha projects
Stars: ✭ 788 (+1973.68%)
text2text: Text2Text: Cross-lingual natural language processing and generation toolkit
Stars: ✭ 188 (+394.74%)
Php Parser: 🌿 NodeJS PHP Parser - extract AST or tokens (PHP5 and PHP7)
Stars: ✭ 400 (+952.63%)
nlp newsletter: Natural language processing (NLP) newsletter right on GitHub
Stars: ✭ 57 (+50%)
Deeppavlov: An open source library for deep learning end-to-end dialog systems and chatbots.
Stars: ✭ 5,525 (+14439.47%)
Lexmachine: Lex machinery for Go.
Stars: ✭ 335 (+781.58%)
Friso: High performance Chinese tokenizer with both GBK and UTF-8 charset support, based on the MMSEG algorithm and developed in ANSI C. Its fully modular implementation can be easily embedded in other programs such as MySQL, PostgreSQL, PHP, etc.
Stars: ✭ 313 (+723.68%)
Kagome: Self-contained Japanese Morphological Analyzer written in pure Go
Stars: ✭ 554 (+1357.89%)
Dab: Data Augmentation by Backtranslation (DAB) ヽ( •_-)ᕗ
Stars: ✭ 294 (+673.68%)
Omnicat Bayes: Naive Bayes text classification implementation as an OmniCat classifier strategy. (#ruby #naivebayes)
Stars: ✭ 30 (-21.05%)
Ner: Named Entity Recognition
Stars: ✭ 288 (+657.89%)
Nlp base: Foundational natural language models
Stars: ✭ 524 (+1278.95%)
Data Science Hacks: Data Science Hacks consists of tips and tricks to help you become a better data scientist. The hacks are for everyone, from beginner to advanced, and cover Python, Jupyter Notebook, pandas and so on.
Stars: ✭ 273 (+618.42%)
Click2analyze Androiddevchallenge: An app that analyzes text and fixes anomalies in messages that deviate from what is standard, normal, or expected. #AndroidDevChallenge
Stars: ✭ 20 (-47.37%)
Jumanpp: Juman++ (a Morphological Analyzer Toolkit)
Stars: ✭ 254 (+568.42%)
Babyai: BabyAI platform. A testbed for training agents to understand and execute language commands.
Stars: ✭ 490 (+1189.47%)
ArabicProcessingCog: A Python package that does stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for the Arabic language.
Stars: ✭ 19 (-50%)
fairseq-tagging: a Fairseq fork for sequence tagging/labeling tasks
Stars: ✭ 26 (-31.58%)
Open Korean Text: Open Korean Text Processor - An Open-source Korean Text Processor
Stars: ✭ 438 (+1052.63%)
Rasa Ui: Rasa UI is a frontend for the Rasa Framework
Stars: ✭ 796 (+1994.74%)
nlp-qrmine: 🔦 Qualitative Research support tools in Python
Stars: ✭ 28 (-26.32%)
Ekphrasis: Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (English Wikipedia, and Twitter with 330 million English tweets).
Stars: ✭ 433 (+1039.47%)
sent2vec: How to encode sentences in a high-dimensional vector space, a.k.a. sentence embedding.
Stars: ✭ 99 (+160.53%)
Sdtm mapper: AI SDTM mapping (R for ML, Python, TensorFlow for DL)
Stars: ✭ 27 (-28.95%)
PaddleTokenizer: A Chinese word segmentation engine based on deep neural networks, implemented with PaddlePaddle
Stars: ✭ 14 (-63.16%)
Natural-Language-Processing: Contains various architectures and novel paper implementations for Natural Language Processing tasks like Sequence Modelling and Neural Machine Translation.
Stars: ✭ 48 (+26.32%)
Mustard: 🌭 Mustard is a Swift library for tokenizing strings when splitting by whitespace doesn't cut it.
Stars: ✭ 689 (+1713.16%)
Jflex: The fast scanner generator for Java™ with full Unicode support
Stars: ✭ 380 (+900%)
Sharpmath: A small .NET math library.
Stars: ✭ 36 (-5.26%)
Nlp Js Tools French: POS tagger, lemmatizer and stemmer for the French language in JavaScript
Stars: ✭ 32 (-15.79%)
Soynlp: A Python library for Korean natural language processing. Provides word extraction, tokenization, part-of-speech tagging, and preprocessing.
Stars: ✭ 613 (+1513.16%)
Text mining resources: Resources for learning about Text Mining and Natural Language Processing
Stars: ✭ 358 (+842.11%)