
atnlp / Torchtext Summary

A summary of torchtext usage. It implements the torchtext text preprocessing pipeline step by step from scratch, including truncation and padding, vocabulary construction, use of pretrained word vectors, and building iterable data that PyTorch can consume, and then implements an LSTM with PyTorch.

A summary of torchtext usage, combined with an LSTM implementation in PyTorch

Versions

  • PyTorch: 0.4.1
  • torchtext: 0.2.3
  • Python: 3.6

Files

  • Test-Dataset.ipynb / Test-Dataset.py: notebook and script versions of text preprocessing with torchtext.
  • Test-Dataset2.ipynb: text preprocessing with datasets built using Keras and PyTorch.
  • Language-Model.ipynb: loads pretrained word vectors with gensim and implements a language model in PyTorch (a minimal loading sketch follows this list).
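
As a small illustration of the gensim step mentioned for Language-Model.ipynb, the snippet below loads pretrained vectors with gensim's KeyedVectors. The file name is hypothetical, and the vectors are assumed to already be in word2vec text format (raw GloVe files need converting first).

# Hypothetical example: load pretrained word vectors with gensim
from gensim.models import KeyedVectors

# the file is assumed to be in word2vec text format
wv = KeyedVectors.load_word2vec_format('glove.6B.300d.word2vec.txt', binary=False)
print(wv['language'].shape)                 # (300,) vector for one token
print(wv.most_similar('language', topn=3))  # nearest neighbours in embedding space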

Usage

  • Both Jupyter notebook and standard .py versions are provided.
  • The torchtext text preprocessing pipeline is implemented step by step from scratch, including truncation and padding, vocabulary construction, use of pretrained word vectors, and building data iterators that PyTorch can consume directly (see the sketch after this list).
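
The bullet above describes the pipeline only in words; the following is a minimal, illustrative sketch of those steps using the legacy torchtext 0.x API. The file names, column names, and hyperparameters are assumptions made for the example, not the repository's actual configuration.

# Illustrative pipeline sketch (legacy torchtext 0.x API); file/column names are hypothetical
from torchtext.data import Field, TabularDataset, BucketIterator

# fix_length truncates or pads every example to the same sequence length
TEXT = Field(sequential=True, tokenize=lambda s: s.split(), lower=True, fix_length=100)
LABEL = Field(sequential=False, unk_token=None)

# build Datasets from headerless csv files with a text column and a label column
train, valid = TabularDataset.splits(
    path='data', train='train.csv', validation='valid.csv', format='csv',
    fields=[('text', TEXT), ('label', LABEL)])

# vocabulary construction (pass vectors=... here to attach pretrained embeddings)
TEXT.build_vocab(train, min_freq=5)
LABEL.build_vocab(train)

# BucketIterator batches examples of similar length and yields PyTorch tensors
train_iter, valid_iter = BucketIterator.splits(
    (train, valid), batch_sizes=(32, 32),
    sort_key=lambda ex: len(ex.text), repeat=False)

for batch in train_iter:
    x, y = batch.text, batch.label   # x: LongTensor of shape [seq_len, batch_size]
    break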

    For a usage tutorial, see my personal blog (the first link is my GitHub blog, which has problems displaying images; use the second as the reference):

    In the code, the section that uses pretrained word vectors with the dataset has been converted to markdown cells (commented out), as shown below. To use pretrained word vectors, for example the open-source GloVe vectors, create a mycache folder in the current directory to serve as the cache directory and point to the location of the pretrained vector file. The GloVe vectors can be downloaded from this link: https://pan.baidu.com/s/1i5XmTA9

### Example of building the vocabulary from pretrained word vectors, using glove.6B.300d
import os
from torch.nn import init
from torchtext.vocab import Vectors

cache = 'mycache'
if not os.path.exists(cache):
    os.mkdir(cache)
vectors = Vectors(name='/Users/wyw/Documents/vectors/glove/glove.6B.300d.txt', cache=cache)
# specify how to initialize vectors for tokens missing from the pretrained file (OOV tokens)
vectors.unk_init = init.xavier_uniform_
TEXT.build_vocab(train, min_freq=5, vectors=vectors)
# inspect the vocabulary's vector matrix
TEXT.vocab.vectors
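
Once the vocabulary carries pretrained vectors, they can be copied into an embedding layer and fed to an LSTM, which is the PyTorch side of this project. The sketch below is only an illustration under assumed names (hidden size 128, a train_iter built as in the pipeline sketch above); it is not the repository's exact model.

# Illustrative sketch: pretrained vectors -> nn.Embedding -> nn.LSTM
import torch.nn as nn

vocab_size, embed_dim = TEXT.vocab.vectors.shape
embedding = nn.Embedding(vocab_size, embed_dim)
embedding.weight.data.copy_(TEXT.vocab.vectors)   # initialize weights from the pretrained vectors

lstm = nn.LSTM(input_size=embed_dim, hidden_size=128, num_layers=1)

# batch.text comes from a torchtext iterator such as the BucketIterator sketched earlier
batch = next(iter(train_iter))
emb = embedding(batch.text)             # [seq_len, batch_size, embed_dim]
output, (h_n, c_n) = lstm(emb)          # h_n: [num_layers, batch_size, hidden_size]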