atnlp / Torchtext Summary
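A summary of using torchtext. It implements the torchtext text preprocessing pipeline step by step from scratch, including truncation and padding, vocabulary construction, use of pre-trained word vectors, and building iterable data that PyTorch can consume. It also implements an LSTM with PyTorch.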
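A summary of using torchtext, combined with an LSTM implemented in PyTorch
Versions
- PyTorch: 0.4.1
- torchtext: 0.2.3
- Python: 3.6
Files
- Test-Dataset.ipynb, Test-Dataset.py: notebook and .py versions of text preprocessing with torchtext.
- Test-Dataset2.ipynb: text preprocessing that builds the dataset with Keras and PyTorch.
- Language-Model.ipynb: loads pre-trained word vectors with gensim and implements a language model in PyTorch.
Usage
- Both a notebook version and a standard .py version are provided.
- The torchtext text preprocessing pipeline is implemented step by step from scratch, including truncation and padding, vocabulary construction, use of pre-trained word vectors, and building iterable data that PyTorch can consume.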
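The preprocessing steps listed above can be illustrated without torchtext. The following is a minimal sketch in plain Python, not the code from the notebooks: the function names, the toy corpus, and the alphabetical ordering of the vocabulary are assumptions made for illustration, and only the `<pad>` and `<unk>` tokens and the `fix_length` and `min_freq` parameter names echo torchtext conventions.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"


def build_vocab(tokenized_texts, min_freq=1):
    """Map every token seen at least min_freq times to an integer id."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    # Special tokens come first; the rest are sorted alphabetically here
    # (a simplification chosen for this sketch).
    itos = [PAD, UNK] + sorted(t for t, c in counts.items() if c >= min_freq)
    return {tok: i for i, tok in enumerate(itos)}


def pad_or_truncate(tokens, fix_length):
    """Cut sequences longer than fix_length and right-pad shorter ones."""
    return tokens[:fix_length] + [PAD] * max(0, fix_length - len(tokens))


def numericalize(tokens, stoi):
    """Replace tokens by ids, falling back to the <unk> id."""
    return [stoi.get(tok, stoi[UNK]) for tok in tokens]


def batches(examples, batch_size):
    """Yield consecutive slices of at most batch_size examples."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


# Hypothetical toy corpus, tokenized on whitespace.
corpus = ["the cat sat", "the dog sat on the mat", "a cat"]
tokenized = [sentence.split() for sentence in corpus]

stoi = build_vocab(tokenized, min_freq=2)
encoded = [numericalize(pad_or_truncate(t, 4), stoi) for t in tokenized]
```

With `min_freq=2`, rare words such as "dog" map to the `<unk>` id, and every encoded sentence has exactly four ids, so `batches(encoded, 2)` yields rectangular lists that could be turned into tensors.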
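For a tutorial, see my personal blog (the first link is the GitHub blog, where images do not display correctly, so prefer the second):
- http://www.nlpuser.com/pytorch/2018/10/30/useTorchText/
- https://blog.csdn.net/nlpuser/article/details/88067167
In the code, the part that uses pre-trained word vectors in the dataset has been commented out as a markdown cell, as shown below. To use pre-trained word vectors, such as the open-source GloVe vectors, create a mycache folder in the current directory as the cache directory and specify the location of the pre-trained vector file. The GloVe vectors can be downloaded from this link: https://pan.baidu.com/s/1i5XmTA9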
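### Example: building the vocabulary from pre-trained word vectors, using glove.6B.300d
import os
from torch.nn import init
from torchtext.vocab import Vectors

cache = 'mycache'
if not os.path.exists(cache):
    os.mkdir(cache)
vectors = Vectors(name='/Users/wyw/Documents/vectors/glove/glove.6B.300d.txt', cache=cache)
# How to initialize vectors for tokens that are missing from the pre-trained file
vectors.unk_init = init.xavier_uniform_
# TEXT (the Field) and train (the Dataset) are assumed to be defined already
TEXT.build_vocab(train, min_freq=5, vectors=vectors)
# Inspect the vectors attached to the vocabulary
TEXT.vocab.vectors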
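To make the role of `unk_init` more concrete, here is a rough sketch of the idea in plain Python: vocabulary tokens found in the pre-trained file keep their vectors, and the rest receive a vector from a fallback initializer. This is not torchtext's implementation. `build_embedding_matrix`, `uniform_init`, and the toy vectors are hypothetical, and the small uniform initializer merely stands in for `init.xavier_uniform_`.

```python
import random


def build_embedding_matrix(itos, pretrained, dim, unk_init):
    """Return one vector per vocabulary entry, in vocabulary order,
    together with the tokens that had no pre-trained vector."""
    matrix, misses = [], []
    for token in itos:
        if token in pretrained:
            matrix.append(list(pretrained[token]))
        else:
            # Token missing from the pre-trained file: use the fallback.
            matrix.append(unk_init(dim))
            misses.append(token)
    return matrix, misses


def uniform_init(dim, bound=0.1):
    """Stand-in initializer: values drawn uniformly from [-bound, bound]."""
    return [random.uniform(-bound, bound) for _ in range(dim)]


# Hypothetical 3-dimensional "pre-trained" vectors and vocabulary.
pretrained = {"cat": [0.1, 0.2, 0.3], "sat": [0.4, 0.5, 0.6]}
itos = ["<unk>", "cat", "sat", "zebra"]

random.seed(0)
matrix, misses = build_embedding_matrix(itos, pretrained, dim=3, unk_init=uniform_init)
```

In this sketch `misses` lists the tokens that fell back to the initializer, which is a quick way to check how much of a vocabulary the pre-trained file actually covers.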