cliuxinxin / Tx Word2vec Small

A slimmed-down version of the Tencent word2vec model

TX-WORD2VEC

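The word2vec model open-sourced by Tencent.

The original release is about 15 GB, which is hard for most hobbyists to work with.

So I made some smaller versions that are easier for everyone to use.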

5000-small.txt: 5,000 words; download it to experiment.

45000-small.txt: 45,000 words; already enough to solve many problems.

70000-small.txt: 70,000 words, 133 MB https://pan.baidu.com/s/1DprHD8HwEqkWRBG0ss2y1A

100000-small.txt: 100,000 words, 190 MB https://pan.baidu.com/s/1KqPOwfrw3KoLJqTsCUdriA

500000-small.txt: 500,000 words, 953 MB https://pan.baidu.com/s/1SGwxpGW8HjYw8HdKQUB8Gw

1000000-small.txt: 1,000,000 words, 1.9 GB https://pan.baidu.com/s/1ObstPl7R8o1L98Ag9owGiw

2000000-small.txt: 2,000,000 words, 3.8 GB https://pan.baidu.com/s/1hmCiMandgyedjmP520_Aog

For anything larger, download the original yourself:

https://ai.tencent.com/ailab/nlp/en/data/Tencent_AILab_ChineseEmbedding.tar.gz
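If none of the sizes above fits, a smaller file can be cut from a larger one. The sketch below assumes the source file is in word2vec text format (a header line with the vocabulary size and vector size, then one word per line) and that keeping the first N entries is acceptable. Whether the files in this repo were built this way is an assumption, and `truncate_word2vec_text` is a hypothetical helper name.

```python
from itertools import islice


def truncate_word2vec_text(src_path, dst_path, keep):
    """Copy the header and the first `keep` vectors of a word2vec text file.

    Hypothetical helper for illustration; returns the number of vectors kept.
    """
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        # The first line of the text format is "<vocab_size> <vector_size>".
        total, dim = src.readline().split()
        kept = min(keep, int(total))
        # Write a new header that matches the truncated vocabulary.
        dst.write(f"{kept} {dim}\n")
        # Stream line by line so the large source file is never held in memory.
        for line in islice(src, kept):
            dst.write(line)
    return kept
```

With gensim, a similar effect is available at load time without writing a new file: `KeyedVectors.load_word2vec_format` accepts a `limit` argument that reads only the first N vectors.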

How to use

Load the model

from gensim.models import KeyedVectors

model = KeyedVectors.load_word2vec_format("5000-small.txt")  # or any other file from the list above

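Play with the model

# Analogy: "king" - "man" + "woman"
model.most_similar(positive=['女', '国王'], negative=['男'], topn=1)

# Find the word that does not belong (Shanghai, Chengdu, Guangzhou, Beijing)
model.doesnt_match("上海 成都 广州 北京".split(" "))

# Cosine similarity between "woman" and "man"
model.similarity('女人', '男人')

# The 10 words closest to "Trump"
model.most_similar('特朗普', topn=10)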
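For intuition, the similarity score between two words is the cosine of the angle between their vectors. Here is a minimal sketch in plain Python; the three-dimensional vectors are invented for illustration and are not taken from the Tencent model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors, made up for this example only.
toy = {
    "女人": [0.9, 0.1, 0.3],
    "男人": [0.8, 0.2, 0.3],
    "上海": [0.1, 0.9, 0.5],
}

# Vectors pointing in similar directions score close to 1.
print(cosine_similarity(toy["女人"], toy["男人"]))
print(cosine_similarity(toy["女人"], toy["上海"]))
```

Real vectors from the files above have many more dimensions, but the computation is the same.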

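Deep learning example

Use an LSTM model to predict ratings from Douban reviews.

First, download the Douban data:

Douban review data (149 MB): https://pan.baidu.com/s/1WbqoCKsmrnpf6n5ZTV-fKA

Then download the matching word segmentation package: https://pan.baidu.com/s/19busyY1yysbOgdYWxIaIQA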

Before loading the 70,000-word dictionary: [image]

After loading the 70,000-word dictionary: [image]

For the code, see Use Tencent Word Embeddings with douban datasets.ipynb
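A common way to use pretrained vectors in a model like this is to build an embedding matrix for the dataset's vocabulary and use it to initialize the embedding layer. The sketch below is not taken from the notebook. It assumes `word_index` maps each word to an integer from 1 to N, with 0 reserved for padding (the convention of the Keras Tokenizer), and `build_embedding_matrix` is a hypothetical helper name.

```python
import random


def build_embedding_matrix(vector_path, word_index, seed=0):
    """Return (matrix, found) for a word2vec text file and a word -> index map.

    Row i of `matrix` holds the vector of the word whose index is i;
    `found` counts the words that had a pretrained vector in the file.
    Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    with open(vector_path, encoding="utf-8") as f:
        # Header line: "<vocab_size> <vector_size>".
        _, dim = f.readline().split()
        dim = int(dim)
        # Row 0 is the padding row; words missing from the file keep
        # small random values so they can still be trained.
        matrix = [[0.0] * dim] + [
            [rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in word_index
        ]
        found = 0
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            idx = word_index.get(word)
            # Skip words outside the vocabulary and malformed lines.
            if idx is not None and len(values) == dim:
                matrix[idx] = [float(x) for x in values]
                found += 1
    return matrix, found
```

The resulting list of rows can be converted to an array and passed as the initial weights of an embedding layer. The share of words found also gives a quick check on whether a given file size covers the dataset well enough.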

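This is only a starting point. If anyone has time to contribute other models or datasets, pull requests are welcome.

If you have questions, feel free to open an issue.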
