curiosity-ai / Catalyst

Licence: MIT
🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the-box support for training word and document embeddings, and flexible entity recognition models.

Projects that are alternatives of or similar to Catalyst

Riceteacatpanda
repo with challenge material for riceteacatpanda (2020)
Stars: ✭ 18 (-91.96%)
Mutual labels:  artificial-intelligence, ai, natural-language-processing
Ml
A high-level machine learning and deep learning library for the PHP language.
Stars: ✭ 1,270 (+466.96%)
Mutual labels:  artificial-intelligence, ai, natural-language-processing
Ciff
Cornell Instruction Following Framework
Stars: ✭ 23 (-89.73%)
Mutual labels:  artificial-intelligence, natural-language-processing, natural-language-understanding
Spacy
💫 Industrial-strength Natural Language Processing (NLP) in Python
Stars: ✭ 21,978 (+9711.61%)
Mutual labels:  artificial-intelligence, ai, natural-language-processing
Cocoaai
🤖 The Cocoa Artificial Intelligence Lab
Stars: ✭ 134 (-40.18%)
Mutual labels:  artificial-intelligence, ai, natural-language-processing
Mycroft Core
Mycroft Core, the Mycroft Artificial Intelligence platform.
Stars: ✭ 5,489 (+2350.45%)
Mutual labels:  artificial-intelligence, ai, natural-language-processing
Coursera Natural Language Processing Specialization
Programming assignments from all courses in the Coursera Natural Language Processing Specialization offered by deeplearning.ai.
Stars: ✭ 39 (-82.59%)
Mutual labels:  artificial-intelligence, natural-language-processing, natural-language-understanding
Olivia
💁‍♀️Your new best friend powered by an artificial neural network
Stars: ✭ 3,114 (+1290.18%)
Mutual labels:  artificial-intelligence, ai, natural-language-processing
Chars2vec
Character-based word embeddings model based on RNN for handling real world texts
Stars: ✭ 130 (-41.96%)
Mutual labels:  natural-language-processing, embeddings, natural-language-understanding
Xlnet extension tf
XLNet Extension in TensorFlow
Stars: ✭ 109 (-51.34%)
Mutual labels:  artificial-intelligence, natural-language-processing, natural-language-understanding
Botlibre
An open platform for artificial intelligence, chat bots, virtual agents, social media automation, and live chat automation.
Stars: ✭ 412 (+83.93%)
Mutual labels:  artificial-intelligence, natural-language-processing, natural-language-understanding
Fixy
Our goal is to build an open-source spelling assistant/checker that tackles many different problems in the Turkish NLP literature together, proposes novel approaches, and fills the gaps in existing work. It corrects spelling mistakes in user-written text with a deep learning approach while also performing semantic analysis of the text, so that errors arising in that context can be detected and corrected as well.
Stars: ✭ 165 (-26.34%)
Mutual labels:  artificial-intelligence, ai, natural-language-processing
Graphbrain
Language, Knowledge, Cognition
Stars: ✭ 294 (+31.25%)
Mutual labels:  artificial-intelligence, natural-language-processing, natural-language-understanding
Awesome Ai Ml Dl
Awesome Artificial Intelligence, Machine Learning and Deep Learning as we learn it. Study notes and a curated list of awesome resources of such topics.
Stars: ✭ 831 (+270.98%)
Mutual labels:  artificial-intelligence, ai, natural-language-processing
Oie Resources
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Stars: ✭ 283 (+26.34%)
Mutual labels:  ai, natural-language-processing, natural-language-understanding
Reading comprehension tf
Machine Reading Comprehension in Tensorflow
Stars: ✭ 37 (-83.48%)
Mutual labels:  artificial-intelligence, natural-language-processing, natural-language-understanding
Articutapi
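API of Articut, a Chinese word segmenter (with semantic part-of-speech tagging): word segmentation is the foundation of Chinese information processing. Articut uses no machine learning and needs no data model; relying only on the grammar rules of modern vernacular Chinese, it achieves an F1-measure above 94% and a recall above 96% on SIGHAN 2005.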
Stars: ✭ 252 (+12.5%)
Mutual labels:  artificial-intelligence, natural-language-processing, natural-language-understanding
Lda
LDA topic modeling for node.js
Stars: ✭ 262 (+16.96%)
Mutual labels:  artificial-intelligence, ai, natural-language-processing
Bert As Service
Mapping a variable-length sentence to a fixed-length vector using BERT model
Stars: ✭ 9,779 (+4265.63%)
Mutual labels:  ai, natural-language-processing, natural-language-understanding
Nlpaug
Data augmentation for NLP
Stars: ✭ 2,761 (+1132.59%)
Mutual labels:  artificial-intelligence, ai, natural-language-processing

catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the-box support for training word and document embeddings, and flexible entity recognition models.

⚡ Features

✨ Getting Started

Using catalyst is as simple as installing its NuGet package and setting the storage to use our online repository. This way, models are lazy-loaded, either from disk or from our online repository. Also check out the sample projects for more examples of how to use catalyst.

Storage.Current = new OnlineRepositoryStorage(new DiskStorage("catalyst-models"));
var nlp = await Pipeline.ForAsync(Language.English);
var doc = new Document("The quick brown fox jumps over the lazy dog", Language.English);
nlp.ProcessSingle(doc);
Console.WriteLine(doc.ToJson());

You can also take advantage of C#'s lazy evaluation and native multi-threading support to process a large number of documents in parallel:

var docs = GetDocuments();
var parsed = nlp.Process(docs);
DoSomething(parsed);

IEnumerable<IDocument> GetDocuments()
{
    //Generates a few documents, to demonstrate multi-threading & lazy evaluation
    for(int i = 0; i < 1000; i++)
    {
        yield return new Document("The quick brown fox jumps over the lazy dog", Language.English);
    }
}

void DoSomething(IEnumerable<IDocument> docs)
{
    foreach(var doc in docs)
    {
        Console.WriteLine(doc.ToJson());
    }
}
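
The lazy-evaluation half of the snippet above can be seen in isolation with a minimal sketch that uses only the .NET base library. `GetTexts` and the `generated` counter are hypothetical stand-ins for `GetDocuments()` and are not part of catalyst; the sketch only illustrates that a `yield return` iterator produces items as the consumer asks for them.

```csharp
using System;
using System.Collections.Generic;

int generated = 0;

// Hypothetical stand-in for GetDocuments(): yields plain strings lazily
// and counts how many items have actually been produced.
IEnumerable<string> GetTexts(int count)
{
    for (int i = 0; i < count; i++)
    {
        generated++;
        yield return $"Document {i}";
    }
}

var texts = GetTexts(1000);   // creates the iterator, runs none of its body
Console.WriteLine(generated); // prints 0

int consumed = 0;
foreach (var text in texts)
{
    consumed++;
    if (consumed == 3) break; // stop early: the remaining 997 are never built
}
Console.WriteLine(generated); // prints 3
```

Because items are only created on demand, a consumer that processes documents as they arrive never needs the whole collection in memory at once.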

Training a new FastText word2vec embedding model is as simple as this:

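var nlp = await Pipeline.ForAsync(Language.English);
// Configure a CBOW model trained with negative sampling
var ft = new FastText(Language.English, 0, "wiki-word2vec");
ft.Data.Type = FastText.ModelType.CBow;
ft.Data.Loss = FastText.LossType.NegativeSampling;
ft.Train(nlp.Process(GetDocs()));
// Await the store operation so the model is saved before continuing
await ft.StoreAsync();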
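
For readers unfamiliar with the `CBow` model type, the following is a rough sketch of the general CBOW idea: each word is predicted from its neighbours inside a fixed window. It uses only the .NET base library; `CbowPairs` and the window size are hypothetical and say nothing about how catalyst builds its training examples internally. Subword features and negative sampling are left out.

```csharp
using System;
using System.Collections.Generic;

// Builds (context, target) pairs: the words within `window` positions
// of each target word form the context used to predict it.
List<(string[] Context, string Target)> CbowPairs(string[] words, int window)
{
    var result = new List<(string[] Context, string Target)>();
    for (int i = 0; i < words.Length; i++)
    {
        var neighbours = new List<string>();
        int start = Math.Max(0, i - window);
        int end = Math.Min(words.Length - 1, i + window);
        for (int j = start; j <= end; j++)
        {
            if (j != i) neighbours.Add(words[j]);
        }
        result.Add((neighbours.ToArray(), words[i]));
    }
    return result;
}

var tokens = "the quick brown fox jumps".Split(' ');
var pairs = CbowPairs(tokens, 1);
foreach (var (context, target) in pairs)
{
    Console.WriteLine($"[{string.Join(", ", context)}] -> {target}");
}
// First two lines printed:
// [quick] -> the
// [the, brown] -> quick
```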

For fast embedding search, we have also released a C# version of the "Hierarchical Navigable Small World" (HNSW) algorithm on NuGet, based on our fork of Microsoft's HNSW.Net. We have also released a C# version of the "Uniform Manifold Approximation and Projection" (UMAP) algorithm for dimensionality reduction on GitHub and on NuGet.
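
To make "embedding search" concrete, here is a minimal sketch of the exact, brute-force baseline that an approximate index such as HNSW is designed to speed up: compare the query with every stored vector by cosine similarity and keep the best match. This is not the HNSW algorithm and does not use the HNSW package; the three-dimensional vectors and the words attached to them are made-up toy data.

```csharp
using System;

// Cosine similarity between two equal-length vectors.
double Cosine(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Toy "embeddings" (hypothetical values, for illustration only).
var index = new (string Word, float[] Vector)[]
{
    ("fox", new float[] { 1.0f, 0.0f, 0.0f }),
    ("dog", new float[] { 0.6f, 0.8f, 0.0f }),
    ("car", new float[] { 0.0f, 0.0f, 1.0f }),
};
var query = new float[] { 1.0f, 0.0f, 0.1f };

// Linear scan: O(n) similarity computations per query.
string best = "";
double bestScore = double.NegativeInfinity;
foreach (var (word, vector) in index)
{
    double score = Cosine(query, vector);
    if (score > bestScore)
    {
        bestScore = score;
        best = word;
    }
}
Console.WriteLine(best); // prints "fox"
```

A linear scan like this is fine for a few thousand vectors; its cost grows with every stored vector, which is the motivation for a graph-based approximate index.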

📖 Documentation (coming soon)

Documentation
Getting Started: How to use catalyst and its features.
API Reference: The detailed reference for catalyst's API.
Contribute: How to contribute to the catalyst codebase.
Samples: Sample projects demonstrating catalyst's capabilities.
Gitter: Join our Gitter channel.