All Projects → mit-nlp → Text.jl

mit-nlp / Text.jl

Licence: Apache-2.0 license
Numerous tools for text processing

Programming Languages

julia
2034 projects

TEXT: Numerous tools for text processing

Build Status

This package is a julia implementation of:

  1. Text classification based on BoW models (e.g. topic/langauge id)
  2. Language ID (training and processing) based on word and character n-grams
  3. Lewis's SMART stop list for English
  4. tfidf/tfllr text feature normalization
  5. ngram feature extractors

Prerequistes

  • Stage - Needed for logging and memoization (Note: requires manual install)
  • Ollam - online learning modules (Note: requires manual install)
  • Devectorize - macro-based devectorization
  • DataStructures - for DefaultDict
  • Devectorize
  • GZip
  • Iterators - for iterator helper functions

Install

This is an experimental package which is not currently registered in the julia central repository. You can install via:

Pkg.clone("https://github.com/saltpork/Stage.jl")
Pkg.clone("https://github.com/mit-nlp/Ollam.jl")
Pkg.clone("https://github.com/mit-nlp/Text.jl")

Usage

See test/runtests.jl for detailed usage.

License

This package was created for the DARPA XDATA and Memex program under an Apache v2 License.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].