hamelsmu / Ktext

Licence: MIT

Utilities for preprocessing text for deep learning with Keras

Labels

jupyter-notebook deep-learning machine-learning keras nlp-machine-learning

Projects that are alternatives to, or similar to, Ktext

Data Augmentation by Backtranslation (DAB) ヽ( •_-)ᕗ

Stars: ✭ 294 (+61.54%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Coursera Natural Language Processing Specialization

Programming assignments from all courses in the Coursera Natural Language Processing Specialization offered by deeplearning.ai.

Stars: ✭ 39 (-78.57%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Hands On Nltk Tutorial

The hands-on NLTK tutorial for NLP in Python

Stars: ✭ 419 (+130.22%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Natural Language Processing With Tensorflow

Natural Language Processing with TensorFlow, published by Packt

Stars: ✭ 222 (+21.98%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Seq2seq tutorial

Code For Medium Article "How To Create Data Products That Are Magical Using Sequence-to-Sequence Models"

Stars: ✭ 132 (-27.47%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Data Science Hacks

Data Science Hacks consists of tips and tricks to help you become a better data scientist. The hacks are for everyone, from beginner to advanced, and cover Python, Jupyter Notebook, pandas, and so on.

Stars: ✭ 273 (+50%)

Mutual labels: jupyter-notebook, nlp-machine-learning

AI SDTM mapping (R for ML, Python, TensorFlow for DL)

Stars: ✭ 27 (-85.16%)

Mutual labels: jupyter-notebook, nlp-machine-learning

NeMo: a toolkit for conversational AI

Stars: ✭ 3,685 (+1924.73%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Bertqa Attention On Steroids

BertQA - Attention on Steroids

Stars: ✭ 112 (-38.46%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Datasets, tools, and benchmarks for representation learning of code.

Stars: ✭ 1,378 (+657.14%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Melusine is a high-level library for email classification and feature extraction "dedicated to French emails".

Stars: ✭ 222 (+21.98%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Pytorch Question Answering

Important paper implementations for Question Answering using PyTorch

Stars: ✭ 154 (-15.38%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Chinese models for spacy

Models for SpaCy that support Chinese

Stars: ✭ 543 (+198.35%)

Mutual labels: jupyter-notebook, nlp-machine-learning

News push project

Real Time News Scraping and Recommendation System - React | Tensorflow | NLP | News Scrapers

Stars: ✭ 44 (-75.82%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Natural Language Processing Specialization

This repo contains my coursework, assignments, and Slides for Natural Language Processing Specialization by deeplearning.ai on Coursera

Stars: ✭ 151 (-17.03%)

Mutual labels: jupyter-notebook, nlp-machine-learning

A simple NLP library that profiles datasets with one or more text columns. When given a dataset and a column name containing text data, NLP Profiler will return either high-level insights or low-level/granular statistical information about the text in that column.

Stars: ✭ 181 (-0.55%)

Mutual labels: jupyter-notebook, nlp-machine-learning

Academiccontent

Free tech resources for faculty, students, researchers, life-long learners, and academic community builders for use in tech based courses, workshops, and hackathons.

Stars: ✭ 2,196 (+1106.59%)

Mutual labels: jupyter-notebook

Mslearn Aml Labs

Azure Machine Learning Lab Notebooks

Stars: ✭ 182 (+0%)

Mutual labels: jupyter-notebook

subpixel: A subpixel convnet for super resolution with Tensorflow

Stars: ✭ 2,114 (+1061.54%)

Mutual labels: jupyter-notebook

Free learn-to-code series: Python for beginners, data analysis, machine learning, deep learning, and hands-on Kaggle practice

Stars: ✭ 2,309 (+1168.68%)

Mutual labels: jupyter-notebook

Note: This utility is really old and is no longer maintained. You should use keras.layers.TextVectorization instead of this.

Utilities for pre-processing text for deep learning in Keras.

ktext performs common pre-processing steps associated with deep learning (cleaning, tokenization, padding, truncation). Most importantly, ktext allows you to perform these steps in parallel using process-based threading. If you don't think you will benefit from parallelization, consider using the text preprocessing utilities in keras instead.

ktext helps you with the following:

Cleaning: You may want to clean your data to remove items like phone numbers and email addresses and replace them with generic tags, or remove HTML. This step is optional, but can help remove noise in your data.
Tokenization: Take a raw string, e.g. "Hello World!", and tokenize it so it looks like ['Hello', 'World', '!'].
Generating Vocabulary and a {Token -> index} mapping: Map each unique token in your corpus to an integer value. This is usually stored as a dictionary. For example, {'Hello': 2, 'World': 3, '!': 4} might be a valid mapping from tokens to integers. You usually want to reserve an integer for rare or unseen words (ktext uses 1) and another integer for padding (ktext uses 0). You can set a threshold for rare words (see documentation).
Truncating and Padding: While it is not necessary, it can be much easier if all your documents are the same length. You can accomplish this by truncating and padding: documents below the desired length are padded with 0's, and documents above the desired length are truncated. This utility allows you to build a histogram of your document lengths and choose a sensible document length for your corpus.
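The four steps above can be sketched in plain Python. This is only an illustration of the ideas, not ktext's actual implementation: the regular expressions, the _email_ and _phone_ tags, the function names, the choice to pad at the end, and the coverage-based length helper (a stand-in for reading a histogram by eye) are all assumptions made for this example. Only the reserved ids 0 (padding) and 1 (rare or unseen) come from the description above.

```python
import math
import re
from collections import Counter

# Illustrative patterns only; real-world cleaning usually needs more care.
TAG_RE = re.compile(r"<[^>]+>")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

PAD_ID = 0   # reserved for padding
RARE_ID = 1  # reserved for rare or unseen tokens


def clean(text):
    """Strip HTML tags and replace emails and phone numbers with generic tags."""
    text = TAG_RE.sub(" ", text)
    text = EMAIL_RE.sub("_email_", text)
    return PHONE_RE.sub("_phone_", text)


def tokenize(text):
    """Split into runs of word characters and single punctuation marks."""
    return TOKEN_RE.findall(text)


def build_vocab(docs, min_count=1):
    """Map tokens seen at least `min_count` times to ids starting at 2."""
    counts = Counter(tok for doc in docs for tok in doc)
    kept = [tok for tok, n in counts.most_common() if n >= min_count]
    return {tok: i for i, tok in enumerate(kept, start=2)}


def length_at_coverage(docs, coverage=0.95):
    """Smallest length that leaves at least `coverage` of the docs untruncated."""
    lengths = sorted(len(doc) for doc in docs)
    return lengths[max(0, math.ceil(coverage * len(lengths)) - 1)]


def encode(doc, vocab, max_len):
    """Look up ids, truncate to `max_len`, then pad the end with PAD_ID."""
    ids = [vocab.get(tok, RARE_ID) for tok in doc][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


docs = [tokenize(clean(t)) for t in ["Hello World!", "Hello again"]]
vocab = build_vocab(docs)
max_len = length_at_coverage(docs)
encoded = [encode(doc, vocab, max_len) for doc in docs]
```

Raising min_count sends infrequent tokens to the rare id, which is one simple way to realize the rare-word threshold mentioned above.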

This utility accomplishes all of the above using process-based threading for speed. Sklearn-style fit, transform, and fit_transform interfaces are provided (but are not directly compatible with sklearn yet). Pull requests and comments are welcome.
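As a rough picture of what a process-parallel, sklearn-style interface could look like, here is a minimal sketch built on the standard library's multiprocessing.Pool. The class name ParallelTextProcessor, its parameters, and the whitespace tokenizer are hypothetical and chosen for this example; they are not ktext's API, so consult the tutorial notebook for the real one.

```python
from collections import Counter
from multiprocessing import Pool

PAD_ID, RARE_ID = 0, 1  # the ids the text says ktext reserves


def simple_tokenize(text):
    """Defined at module level so worker processes can pickle it."""
    return text.lower().split()


class ParallelTextProcessor:
    """Hypothetical fit/transform wrapper; not ktext's real class."""

    def __init__(self, max_len=10, min_count=1, n_jobs=1):
        self.max_len = max_len
        self.min_count = min_count
        self.n_jobs = n_jobs
        self.vocab_ = {}

    def _tokenize_all(self, texts):
        # With n_jobs == 1 stay in-process; otherwise fan out to worker processes.
        if self.n_jobs == 1:
            return [simple_tokenize(t) for t in texts]
        with Pool(self.n_jobs) as pool:
            return pool.map(simple_tokenize, texts)

    def fit(self, texts):
        """Learn the token -> id mapping from the corpus."""
        counts = Counter(tok for doc in self._tokenize_all(texts) for tok in doc)
        kept = [tok for tok, n in counts.most_common() if n >= self.min_count]
        self.vocab_ = {tok: i for i, tok in enumerate(kept, start=2)}
        return self

    def transform(self, texts):
        """Convert texts to fixed-length lists of ids."""
        rows = []
        for doc in self._tokenize_all(texts):
            ids = [self.vocab_.get(tok, RARE_ID) for tok in doc][: self.max_len]
            rows.append(ids + [PAD_ID] * (self.max_len - len(ids)))
        return rows

    def fit_transform(self, texts):
        # Simple but tokenizes twice; a real implementation would cache tokens.
        return self.fit(texts).transform(texts)
```

When n_jobs is greater than 1, call the class from under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module. For small corpora the cost of starting processes can outweigh the gain, which is consistent with the advice above to use the keras utilities when parallelization is not needed.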

Note: This utility is useful if all of your data can fit into memory on a single node. If your data cannot fit into memory, consider using distributed computing paradigms such as Hive, Spark, or Dask.

Documentation

This notebook contains a tutorial on how to use this library.

Installation

$ pip install ktext

Stars: ✭ 182