Nlp Journey: Documents, papers and code related to Natural Language Processing, including Topic Model, Word Embedding, Named Entity Recognition, Text Classification, Text Generation, Text Similarity, Machine Translation, etc. All code is implemented in TensorFlow 2.0.
Stars: ✭ 1,290 (+6689.47%)
Musae: The reference implementation of "Multi-scale Attributed Node Embedding".
Stars: ✭ 75 (+294.74%)
NLP-paper: 🎨 NLP (natural language processing) tutorial 🎨 https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (+21.05%)
pydataberlin-2017: Repo for my talk at the PyData Berlin 2017 conference
Stars: ✭ 63 (+231.58%)
Magnitude: A fast, efficient universal vector embedding utility package.
Stars: ✭ 1,394 (+7236.84%)
Gemsec: The TensorFlow reference implementation of 'GEMSEC: Graph Embedding with Self Clustering' (ASONAM 2019).
Stars: ✭ 210 (+1005.26%)
Nlp In Practice: Starter code to solve real-world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with PySpark, simple text preprocessing, pre-trained embeddings and more.
Stars: ✭ 790 (+4057.89%)
Shallowlearn: An experiment in re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText) with some additional exclusive features and a nice API. Written in Python and fully compatible with Scikit-learn.
Stars: ✭ 196 (+931.58%)
Webvectors: Web-ify your word2vec. A framework to serve distributional semantic models online.
Stars: ✭ 154 (+710.53%)
Aravec: AraVec is a pre-trained distributed word representation (word embedding) open-source project which aims to provide the Arabic NLP research community with free-to-use and powerful word embedding models.
Stars: ✭ 239 (+1157.89%)
Text-Analysis: Explains textual analysis tools in Python, including preprocessing, skip-gram (word2vec), and topic modelling.
Stars: ✭ 48 (+152.63%)
doc2vec-api: Document embedding and machine learning script for beginners
Stars: ✭ 92 (+384.21%)
Log Anomaly Detector: Log Anomaly Detection - Machine learning to detect abnormal event logs
Stars: ✭ 169 (+789.47%)
NMFADMM: A sparsity-aware implementation of "Alternating Direction Method of Multipliers for Non-Negative Matrix Factorization with the Beta-Divergence" (ICASSP 2014).
Stars: ✭ 39 (+105.26%)
Sense2vec: 🦆 Contextually-keyed word vectors
Stars: ✭ 1,184 (+6131.58%)
Word2VecAndTsne: Scripts demo-ing how to train a Word2Vec model and reduce its vector space
Stars: ✭ 45 (+136.84%)
Role2vec: A scalable Gensim implementation of "Learning Role-based Graph Embeddings" (IJCAI 2018).
Stars: ✭ 134 (+605.26%)
Germanwordembeddings: Toolkit to obtain and preprocess German corpora, train models using word2vec (gensim) and evaluate them with generated test sets
Stars: ✭ 189 (+894.74%)
Gensim: Topic Modelling for Humans
Stars: ✭ 12,763 (+67073.68%)
biovec: ProtVec can be used in protein interaction predictions, structure prediction, and protein data visualization.
Stars: ✭ 23 (+21.05%)
RolX: An alternative implementation of Recursive Feature and Role Extraction (KDD11 & KDD12)
Stars: ✭ 52 (+173.68%)
Tadw: An implementation of "Network Representation Learning with Rich Text Information" (IJCAI '15).
Stars: ✭ 43 (+126.32%)
Word2vec: Training Chinese word vectors with Word2vec. Word2vec was created by a team of researchers led by Tomas Mikolov at Google.
Stars: ✭ 48 (+152.63%)
Ml Projects: ML-based projects such as Spam Classification, Time Series Analysis, Text Classification using Random Forest, Deep Learning, Bayesian, XGBoost in Python
Stars: ✭ 127 (+568.42%)
walklets: A lightweight implementation of Walklets from "Don't Walk Skip! Online Learning of Multi-scale Network Embeddings" (ASONAM 2017).
Stars: ✭ 94 (+394.74%)
Splitter: A PyTorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).
Stars: ✭ 177 (+831.58%)
word2vec-pt-br: Implementation and model generated by training (trigram) on the pt-br Wikipedia
Stars: ✭ 34 (+78.95%)
Ja.text8: Japanese text8 corpus for word embedding.
Stars: ✭ 79 (+315.79%)
Nlp chinese corpus: Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+34931.58%)
Lmdb Embeddings: Fast word vectors with little memory usage in Python
Stars: ✭ 404 (+2026.32%)
Russian news corpus: Corpus of stemmed Russian mass media texts / corpus of lemmatized (morphologically normalized) Russian mass media texts
Stars: ✭ 76 (+300%)
word-embeddings-from-scratch: Creating word embeddings from scratch and visualizing them on TensorBoard. Using trained embeddings in Keras.
Stars: ✭ 22 (+15.79%)
lda2vec: Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec, from this paper: https://arxiv.org/abs/1605.02019
Stars: ✭ 27 (+42.11%)
Product-Categorization-NLP: Multi-Class Text Classification for products based on their description with Machine Learning algorithms and Neural Networks (MLP, CNN, DistilBERT).
Stars: ✭ 30 (+57.89%)
spark-word2vec: A parallel implementation of word2vec based on Spark
Stars: ✭ 24 (+26.32%)
tokenizr: String Tokenization Library for JavaScript
Stars: ✭ 70 (+268.42%)
brauzie: Awesome CLI for fetching JWT tokens for OAuth 2.0 clients
Stars: ✭ 14 (-26.32%)
NEMPay: Adaptable Android & iOS Mosaic Wallet for NEM Blockchain
Stars: ✭ 36 (+89.47%)
EL1T3: 🖤 The most powerful and better token stealer.
Stars: ✭ 41 (+115.79%)
sent2vec: How to encode sentences in a high-dimensional vector space, a.k.a. sentence embedding.
Stars: ✭ 99 (+421.05%)
IoT-Technical-Guide: 🐝 IoT Technical Guide - Building a high-performance IoT platform and IoT solutions from scratch, plus ThingsBoard source code analysis ✨ (IoT Platform, SaaS, MQTT, CoAP, HTTP, Modbus, OPC, WebSocket, thing model, Protobuf, PostgreSQL, MongoDB, Spring Security, OAuth2, RuleEngine, Kafka, Docker)
Stars: ✭ 2,565 (+13400%)
DeepSentiPers: Repository for the experiments described in the paper named "DeepSentiPers: Novel Deep Learning Models Trained Over Proposed Augmented Persian Sentiment Corpus"
Stars: ✭ 17 (-10.53%)
OpenDialog: An Open-Source Package for Chinese Open-domain Conversational Chatbot (Chinese chit-chat dialogue system; one-click deployment of a WeChat chit-chat bot)
Stars: ✭ 94 (+394.74%)
node-uid-generator: Generates cryptographically strong pseudo-random UIDs with custom size and base-encoding
Stars: ✭ 21 (+10.53%)
open-discourse: Open Discourse is the first fully comprehensive corpus of the plenary proceedings of the federal German Parliament (Bundestag).
Stars: ✭ 47 (+147.37%)
dialogue-datasets: Collects open dialogue corpora and some useful data processing utilities.
Stars: ✭ 24 (+26.32%)
reach: Load embeddings and featurize your sentences.
Stars: ✭ 17 (-10.53%)
EasyTokenGenerator: This repo aims to dynamically and simply generate tokens in token-based systems.
Stars: ✭ 15 (-21.05%)
jwt-token: JSON Web Token generation and validation.
Stars: ✭ 14 (-26.32%)
Filipino-Text-Benchmarks: Open-source benchmark datasets and pretrained transformer models in the Filipino language.
Stars: ✭ 22 (+15.79%)