FastText

Unofficial implementation of the paper "Bag of Tricks for Efficient Text Classification" by Joulin et al.

Prerequisites

FastText requires Python 3 with Keras installed.

Download the Yelp Dataset and place yelp_academic_dataset_review.json in the project's base directory.
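
To check that the file is in place and readable, a minimal sketch like the following can help. It assumes the review file holds one JSON object per line with text and stars fields, which is how the Yelp review data is commonly distributed. The helper name iter_reviews is hypothetical and is not taken from train.py.

```python
import json


def iter_reviews(lines, limit=None):
    """Yield (text, stars) pairs from line-delimited JSON review records.

    Assumption: each non-empty line is one JSON object with "text" and
    "stars" keys. `lines` can be an open file or any iterable of strings.
    """
    yielded = 0
    for line in lines:
        if limit is not None and yielded >= limit:
            return
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        yield record["text"], int(record["stars"])
        yielded += 1


# Example usage (left as comments so the sketch has no side effects):
# with open("yelp_academic_dataset_review.json", encoding="utf-8") as f:
#     for text, stars in iter_reviews(f, limit=3):
#         print(stars, text[:60])
```

Reading the file lazily, line by line, avoids loading the whole dataset into memory just to inspect a few records.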

Training

Train the model using the following command:

./train.py

Training generates data.csv, which holds the model's embedding of the validation set. The embedding is obtained by removing the last layer of the model and reducing the resulting activations in dimensionality with t-SNE.
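
To make "removing the last layer" concrete, here is a small pure-Python sketch of a fastText-style classifier as described in the paper: word embeddings are averaged into a hidden vector, and a linear softmax layer turns that vector into class probabilities. This is not the code in train.py, which uses Keras. All names, the toy sizes, the choice of five classes, and the random untrained parameters are assumptions made for illustration.

```python
import math
import random


def average_embeddings(token_ids, embeddings):
    """Hidden layer: the mean of the embedding vectors of the given tokens."""
    dim = len(embeddings[0])
    total = [0.0] * dim
    for token_id in token_ids:
        for i, value in enumerate(embeddings[token_id]):
            total[i] += value
    return [value / len(token_ids) for value in total]


def softmax_layer(hidden, weights, biases):
    """Last layer: linear projection to class scores followed by softmax."""
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(weights, biases)]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


# Toy, untrained parameters (assumed sizes, for illustration only).
rng = random.Random(0)
VOCAB_SIZE, EMBED_DIM, NUM_CLASSES = 10, 4, 5
embeddings = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
              for _ in range(VOCAB_SIZE)]
weights = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
           for _ in range(NUM_CLASSES)]
biases = [0.0] * NUM_CLASSES

review = [1, 4, 4, 7]                            # token ids of one review
hidden = average_embeddings(review, embeddings)  # model without its last layer
probs = softmax_layer(hidden, weights, biases)   # output of the full model
```

In this picture, the hidden vectors of all validation reviews are what would be handed to a t-SNE implementation (for example scikit-learn's TSNE class) to obtain the low-dimensional points. Which t-SNE implementation train.py uses is not stated here.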

index.html implements a D3 visualisation of the embedding space. Browsers block pages opened directly from disk from reading other local files, so serve the directory through a local web server:

python -m http.server 8000

Then open localhost:8000 in your browser.
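
If you would rather start the server from a Python script than from the command line, the following sketch is a rough equivalent built on the standard library's http.server module. The function names make_server and serve_in_background are hypothetical. Unlike the command above, this sketch binds to 127.0.0.1 only.

```python
import functools
import http.server
import threading


def make_server(directory=".", port=8000):
    """Create an HTTP server that serves files from `directory` on localhost."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


def serve_in_background(server):
    """Run the server in a daemon thread so the calling script can continue."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread


# Example usage (left as comments so the sketch has no side effects):
# server = make_server(directory=".", port=8000)
# server.serve_forever()  # then open localhost:8000 in the browser
```

Passing port=0 lets the operating system pick a free port, which can be read back from server.server_address.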

License

FastText is licensed under the terms of the Apache v2.0 license.

Authors

  • Ihor Kroosh
  • Tim Nieradzik