olegtarasov / FastText.NetWrapper

License: MIT
.NET Standard wrapper for the fastText library. Now works on Windows, Linux and macOS!


FastText.NetWrapper

This is a cross-platform .NET Standard wrapper for Facebook's fastText library. The wrapper comes with bundled precompiled native binaries for all three platforms: Windows, Linux and macOS.

Just add it to your project and start using it! No additional setup is required. The library will unpack and call the appropriate native binary depending on the target platform.

What's new

1.3.0

  • Native libraries are now explicitly included in target project and copied to output directory. Hopefully, this solves a couple of problems with the previous approach of dynamically extracting libraries from resources.

1.2.5

  • Fixed progress callbacks for unsupervised model training.

1.2.4

  • Added progress callbacks for model training and autotuning.

1.2.3

  • Added supervised model quantization with Quantize method.
  • Stable version released! 🎉

1.2.2-preview

  • Merged #20 with new GetWordVector method.

1.2.1-preview

  • Added model autotuning with quantization support.
  • Fixed a horrible bug with bool marshalling.

1.2.0-preview

Version 1.2.0 introduces a few breaking changes to the library API. If you are not ready to migrate, use v. 1.1.2.

  • ❗️ Breaking change: Removed both deprecated Train() methods.
  • ❗️ Breaking change: Removed the deprecated SupervisedArgs class.
  • ❗️ Breaking change: Removed FastTextArgs.SupervisedDefaults() in favor of the new SupervisedArgs class with a default constructor.
  • ❗️ Breaking change: FastTextArgs can no longer be constructed directly; use the new SupervisedArgs and UnsupervisedArgs classes.
  • Added an Unsupervised() method to train Skipgram or Cbow models.

1.1.2

  • Fixed a horrible bug with bool marshalling on a 1.1.* branch.

1.1.0, 1.1.1

  • Added new Supervised() method as part of streamlining the API.
  • Added new Test() method for testing supervised model.
  • Deprecated both Train() methods. They will be removed in v. 1.2.0.

1.0.38

  • Fixed a horrible bug with bool marshalling on a 1.0.* branch.

Version 1.2.0 migration guide

  • Instead of old Train() methods use Supervised() and Unsupervised() methods.
  • Instead of FastTextArgs.SupervisedDefaults() use SupervisedArgs or Supervised() overload with 2 arguments.
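As a before/after sketch (the old 1.1 Train() signature below is illustrative from memory; the 1.2 calls match the API described in this README):

```csharp
// Before (v. 1.1): deprecated Train() methods with static defaults, e.g.:
// fastText.Train("cooking.train.txt", "cooking", FastTextArgs.SupervisedDefaults());

// After (v. 1.2): dedicated methods per training mode.
var fastText = new FastTextWrapper();

// Supervised training with default parameters:
fastText.Supervised("cooking.train.txt", "cooking");

// Supervised training with customizable arguments:
fastText.Supervised("cooking.train.txt", "cooking", new SupervisedArgs());

// Unsupervised training (Skipgram or Cbow):
fastText.Unsupervised(UnsupervisedModel.SkipGram, "cooking.train.nolabels.txt", "cooking");
```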

Usage

The library API closely follows the fastText command-line interface, so you can jump right in.

Supervised model training

The simplest use case is to train a supervised model with default parameters. We create a FastTextWrapper and call Supervised().

var fastText = new FastTextWrapper();
fastText.Supervised("cooking.train.txt",  "cooking");

Note the arguments:

  1. We specify an input file with one labeled example per line. Here we use the Stack Exchange cooking dataset from Facebook: https://dl.fbaipublicfiles.com/fasttext/data/cooking.stackexchange.tar.gz. You can find the extracted files, split into training and validation sets, in the UnitTests directory of this repository.
  2. Your model will be saved to cooking.bin, and cooking.vec with pretrained vectors will be placed in the same directory.
  3. Here we use the Supervised() overload with 2 arguments, which means training will be done with default parameters. It's a good starting point and is equivalent to calling fastText like this:
./fasttext supervised -input cooking.train.txt -output cooking

Loading models

Call LoadModel() and specify the path to the .bin model file:

var fastText = new FastTextWrapper();
fastText.LoadModel("model.bin");

Using pretrained vectors

To use pretrained vectors for your supervised model, create an instance of SupervisedArgs and customize it:

var fastText = new FastTextWrapper();
            
var args = new SupervisedArgs
{
    PretrainedVectors = "cooking.unsup.300.vec",
    dim = 300
};

fastText.Supervised("cooking.train.txt", "cooking", args);

Here we take the default training arguments, supply a path to the pretrained vectors file, and adjust the vector dimension accordingly.

Important! Always check the dimension of your pretrained vectors! Many vectors on the internet have dimension 300, but the default dimension for fastText supervised model training is 100.

Testing the model

Now you can easily test a supervised model against a validation set. You can also specify different values for k and threshold.

var result = fastText.Test("cooking.valid.txt");

You will get an instance of TestResult where you can find aggregated or per-label metrics:

Console.WriteLine($"Results:\n\tPrecision: {result.GlobalMetrics.GetPrecision()}" +
                  $"\n\tRecall: {result.GlobalMetrics.GetRecall()}" +
                  $"\n\tF1: {result.GlobalMetrics.GetF1()}");

You can even get a precision-recall curve (aggregated or per-label)! Here is an example of exporting an SVG plot with the cross-platform OxyPlot library:

var result = fastText.Test("cooking.valid.txt");
var curve = result.GetPrecisionRecallCurve();

var series = new LineSeries {StrokeThickness = 1};
series.Points.AddRange(curve.Select(x => new DataPoint(x.recall, x.precision)).OrderBy(x => x.X));

var plotModel = new PlotModel
{
    Series = { series },
    Axes =
    {
        new LinearAxis {Position = AxisPosition.Bottom, Title = "Recall"},
        new LinearAxis {Position = AxisPosition.Left, Title = "Precision"}
    }
};

using (var stream = new FileStream("precision-recall.svg", FileMode.Create, FileAccess.Write))
{
    SvgExporter.Export(plotModel, stream, 600, 600, false);   
}

Supervised model quantization

You can train a new supervised model and quantize it immediately by replacing SupervisedArgs with QuantizedSupervisedArgs:

var fastText = new FastTextWrapper();
fastText.Supervised("cooking.train.txt", "cooking", new QuantizedSupervisedArgs());

You can also load an existing model and quantize it:

var fastText = new FastTextWrapper();
fastText.LoadModel("model.bin");
fastText.Quantize();

Training unsupervised models

Use the Unsupervised() method, specifying the model type: Skipgram or Cbow:

var fastText = new FastTextWrapper();
fastText.Unsupervised(UnsupervisedModel.SkipGram, "cooking.train.nolabels.txt",  "cooking");

You can use an optional UnsupervisedArgs argument to customize training.
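For example (a sketch: the property names below are assumptions based on SupervisedArgs, which mirrors fastText's lowercase CLI option names such as dim and verbose; check them against the actual UnsupervisedArgs class):

```csharp
var fastText = new FastTextWrapper();

// Hypothetical customization; property names assumed to follow
// the fastText CLI convention used elsewhere in this wrapper.
var args = new UnsupervisedArgs
{
    dim = 300,  // vector dimension
    epoch = 10  // number of training epochs
};

fastText.Unsupervised(UnsupervisedModel.SkipGram, "cooking.train.nolabels.txt", "cooking", args);
```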

Automatic hyperparameter tuning

You can use fastText autotune to do an automatic hyperparameter search.

Refer to https://github.com/facebookresearch/fastText/blob/master/docs/autotune.md for complete parameter reference.

Use AutotuneArgs to control tuning:

var fastText = new FastTextWrapper();

var autotuneArgs = new AutotuneArgs
{
    Duration = 30, // in seconds
    Metric = "precisionAtRecall:30", // supports custom metrics
    Predictions = 2, // Supports @k predictions
    ModelSize = "10M", // Set this to train a quantized model and do an additional
                       // quantization hyperparameter search. Requires QuantizedSupervisedArgs.
    ValidationFile = "cooking.valid.txt" // REQUIRED: path to a validation file
};

fastText.Supervised("cooking.train.txt",  "cooking", new QuantizedSupervisedArgs(), autotuneArgs);

Progress callbacks

You can get progress callbacks from the native library. To do so, add a handler to (Un)SupervisedArgs.TrainProgressCallback for simple training, or to AutotuneArgs.AutotuneProgressCallback for hyperparameter tuning.

See the ConsoleTest project for an example of using training callbacks with the ShellProgressBar library:

using (var pBar = new ProgressBar(100, "Training"))
{
    var ftArgs = new SupervisedArgs
    {
        // ... Other args
        verbose = 0,
        TrainProgressCallback = (progress, loss, wst, lr, eta) =>
        {
            pBar.Tick((int)Math.Ceiling(progress * 100), $"Loss: {loss}, words/thread/sec: {wst}, LR: {lr}, ETA: {eta}");
        }
    };

    fastText.Supervised("cooking.train.txt", outPath, ftArgs);
}

Stopping stderr output

The native fastText library reports training progress to stderr by default. You can turn this output off by setting (Un)SupervisedArgs.verbose = 0 for simple training, or AutotuneArgs.Verbose = 0 for hyperparameter tuning.
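For example, using the arguments classes shown above:

```csharp
var fastText = new FastTextWrapper();

// Silence native fastText output during simple training.
var args = new SupervisedArgs
{
    verbose = 0
};
fastText.Supervised("cooking.train.txt", "cooking", args);

// And during hyperparameter tuning.
var autotuneArgs = new AutotuneArgs
{
    Verbose = 0,
    ValidationFile = "cooking.valid.txt"
};
fastText.Supervised("cooking.train.txt", "cooking", new QuantizedSupervisedArgs(), autotuneArgs);
```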

Getting logs from the wrapper

FastTextWrapper can produce a small number of log messages, mostly concerning native library management. You can turn logging on by providing an instance of Microsoft.Extensions.Logging.ILoggerFactory. In this example we use Serilog with a console sink.

You can also inject your standard ILoggerFactory through .NET Core DI.

// Add the following NuGet packages to your project:
// * Serilog.Sinks.Console
// * Serilog.Extensions.Logging 

Log.Logger = new LoggerConfiguration()
                .MinimumLevel.Debug()
                .WriteTo.Console(theme: ConsoleTheme.None)
                .CreateLogger();

var fastText = new FastTextWrapper(loggerFactory: new SerilogLoggerFactory());

Handling native exceptions

In version 1.1 I've added much better native error handling. In case of most native errors you will now get a nice NativeLibraryException, which you can inspect for a detailed error description.
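A minimal sketch of handling such an error (the triggering condition and message format are illustrative):

```csharp
var fastText = new FastTextWrapper();

try
{
    // Loading a missing or corrupt model is one case where the
    // native library can report an error.
    fastText.LoadModel("nonexistent-model.bin");
}
catch (NativeLibraryException ex)
{
    // Inspect the detailed error description from the native side.
    Console.WriteLine($"Native error: {ex.Message}");
}
```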

Windows Requirements

Since this wrapper uses native C++ binaries under the hood, you will need the Visual C++ Runtime, version 140, installed when running on Windows. Visit the MS Downloads page (https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads) and select the appropriate redistributable.

FastText C-style API

If you are interested in using fastText with a C-style API, here is my fork of the official library: https://github.com/olegtarasov/fastText.
