IlyaGusev / Tgcontest

Licence: apache-2.0
Telegram Data Clustering contest solution by Mindful Squirrel

TGNews

Links

Demo

Install

Prerequisites: CMake, Boost

$ sudo apt-get install cmake libboost-all-dev build-essential libjsoncpp-dev uuid-dev protobuf-compiler libprotobuf-dev

On macOS:

$ brew install boost jsoncpp ossp-uuid protobuf
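Before moving on, it can help to confirm that the command-line tools the remaining steps rely on are actually on the PATH. The following is a minimal sketch, not part of the repository: the helper name missing_tools is hypothetical, and the tool list is an assumption drawn from the commands used in this README (protoc is the binary shipped by the protobuf packages).

```shell
#!/bin/sh
# Hypothetical helper (not part of tgcontest): print the name of every
# listed command that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed tool list, taken from the commands used in the install steps.
missing=$(missing_tools git wget unzip cmake make protoc)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
else
    echo "All required tools found."
fi
```

The check only looks for executables; it does not verify that the Boost, jsoncpp, or uuid development headers are installed.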

If you received the code as a zip archive, skip ahead to building the binary.

To download code and models:

$ git clone https://github.com/IlyaGusev/tgcontest
$ cd tgcontest
$ git submodule update --init --recursive
$ bash download_models.sh
$ wget https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.5.0%2Bcpu.zip
$ unzip libtorch-cxx11-abi-shared-with-deps-1.5.0+cpu.zip

On macOS, use https://download.pytorch.org/libtorch/cpu/libtorch-macos-1.5.0.zip instead.
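The choice between the two libtorch archives can be scripted. The sketch below is illustrative only: the function name libtorch_url is hypothetical, both URLs are the ones given above, and mapping `uname -s` output ("Linux", "Darwin") to an archive is an assumption about how one might automate the step.

```shell
#!/bin/sh
# Hypothetical helper (not part of tgcontest): print the libtorch 1.5.0 CPU
# archive URL for an OS name as reported by `uname -s`.
libtorch_url() {
    case "$1" in
        Linux)
            echo "https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.5.0%2Bcpu.zip" ;;
        Darwin)
            echo "https://download.pytorch.org/libtorch/cpu/libtorch-macos-1.5.0.zip" ;;
        *)
            # Other platforms are not covered by this README.
            echo "unsupported OS: $1" >&2
            return 1 ;;
    esac
}

# Show which archive applies to the current machine.
libtorch_url "$(uname -s)" || echo "No libtorch URL known for this platform."
```

The printed URL can then be passed to wget in place of the hard-coded one, for example `wget "$(libtorch_url "$(uname -s)")"`.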

To build the binary (from the "tgcontest" directory):

$ mkdir build && cd build && Torch_DIR="../libtorch" cmake -DCMAKE_BUILD_TYPE=Release .. && make -j4

To download datasets:

$ bash download_data.sh

To run on the sample data (from the "tgcontest" directory):

$ ./build/tgnews top data --ndocs 10000

Training

  • Russian fastText vectors training: VectorsRu.ipynb (Colab)
  • Russian fastText category classifier training: CatTrainRu.ipynb (Colab)
  • Russian text embedder with triplet loss training (v3): Colab notebook
  • English fastText vectors training: VectorsEn.ipynb (Colab)
  • English fastText category classifier training: CatTrainEn.ipynb (Colab)
  • English text embedder with triplet loss training (v3): Colab notebook
  • PageRank rating calculation: PageRankRating.ipynb (Colab)
  • Russian ELMo-based sentence embedder training (not used): Colab notebook
  • XLM-RoBERTa pseudo-labeling for categorization: Colab notebook

Models

Data

Markup

Misc

Other contestants

Contacts