An SVM model that classifies reviews as real or fake. It uses both the review text and the additional features in the data set, and predicts with over 85% accuracy without any deep learning techniques.

Stars: ✭ 42 (+147.06%)

Mutual labels: nlp-machine-learning

Inventus

Inventus is a spider designed to find subdomains of a specific domain by crawling it and any subdomains it discovers.

Stars: ✭ 80 (+370.59%)

Mutual labels: mit-license

anuvada

Interpretable Models for NLP using PyTorch

Stars: ✭ 102 (+500%)

Mutual labels: nlp-machine-learning

fswatch

File/Directory Watcher for Modern C++

Stars: ✭ 56 (+229.41%)

Mutual labels: mit-license

Willow

The Web Interaction Library that eases the burden of creating AJAX-based web applications

Stars: ✭ 41 (+141.18%)

Mutual labels: mit-license

lidtk

lidtk, the language identification toolkit, was written to investigate the current state of language identification performance.

Installation

The recommended way to install lidtk is:

$ pip install lidtk --user

If you want the latest version:

$ git clone https://github.com/MartinThoma/lidtk.git; cd lidtk
$ pip install -e . --user

I recommend getting the WiLI-2018 dataset.

Usage

$ lidtk --help

Usage: lidtk [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  analyze-data           Utility function for the languages...
  analyze-unicode-block  Analyze how important a Unicode block is for...
  char-distrib           Use the character distribution language...
  cld2                   Use the CLD-2 language classifier.
  create-dataset         Create sharable dataset from downloaded...
  download               Download 1000 documents of each language.
  google-cloud           Use the CLD-2 language classifier.
  langdetect             Use the langdetect language classifier.
  langid                 Use the langid language classifier.
  map                    Map predictions to something known by WiLI
  nn                     Use a neural network classifier.
  textcat                Use the CLD-2 language classifier.
  tfidf_nn               Use the TfidfNNClassifier classifier.

For example:

$ lidtk cld2 predict --text 'This is a test.'
eng

The usual order is:

lidtk download: Please use WiLI-2018 instead of downloading the dataset on your own.
lidtk create-dataset: This step can be skipped if you use WiLI-2018
lidtk analyze-unicode-block --start 0 --end 128
lidtk tfidf_nn train vectorizer --config lidtk/classifiers/config/tfidf_nn.yaml
lidtk tfidf_nn wili --config lidtk/classifiers/config/tfidf_nn.yaml
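
To give a feel for what the `analyze-unicode-block --start 0 --end 128` step looks at, here is a minimal Python sketch that measures how many characters of a small corpus fall inside a code point range. It is an illustration only, not lidtk's implementation: the function name `block_share`, the sample sentences, and the half-open interpretation of the range (`start` included, `end` excluded) are all assumptions.

```python
from collections import Counter


def block_share(texts, start, end):
    """Return the fraction of characters whose code point lies in [start, end).

    Hypothetical helper: the half-open range is an assumption, not
    something taken from lidtk.
    """
    counts = Counter()
    for text in texts:
        for char in text:
            in_block = start <= ord(char) < end
            counts[in_block] += 1
    total = counts[True] + counts[False]
    # Avoid dividing by zero for an empty corpus.
    return counts[True] / total if total else 0.0


# Made-up sample sentences; a real run would use the WiLI-2018 data.
samples = ["This is a test.", "Das ist ein Test.", "Это тест."]
share = block_share(samples, 0, 128)
print(f"{share:.2%} of characters fall in code points 0-127")
```

A block that covers almost all characters of one language but few of another is a strong signal for telling the two apart, which is presumably why such an analysis precedes training.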

Or, to use a classifier directly:

$ lidtk cld2 predict --text 'This text is written in some language.'

eng
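
The same call can be made from a Python script by shelling out to the command-line tool. The following is a minimal sketch, assuming `lidtk` is installed and on the `PATH`. The helper names `build_predict_command` and `predict_language` are hypothetical, and only the `cld2 predict --text` form is taken from the example above; whether the other classifiers accept the same arguments is an assumption.

```python
import shutil
import subprocess


def build_predict_command(classifier, text):
    """Build the argument list for `lidtk <classifier> predict --text <text>`."""
    return ["lidtk", classifier, "predict", "--text", text]


def predict_language(classifier, text):
    """Run the lidtk CLI and return its stripped stdout.

    Returns None when the lidtk executable is not found on the PATH.
    """
    if shutil.which("lidtk") is None:
        return None
    result = subprocess.run(
        build_predict_command(classifier, text),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


# Only builds and prints the command; call predict_language() to run it.
command = build_predict_command("cld2", "This is a test.")
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or spaces.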

Development

Run the tests with tox.

MartinThoma / lidtk
