chewxy / Lingo

Licence: mit
package lingo provides the data structures and algorithms required for natural language processing

Programming Languages

go
31211 projects - #10 most used programming language
golang
3204 projects

Projects that are alternatives of or similar to Lingo

Transformers
πŸ€— Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+49229.2%)
Mutual labels:  natural-language-processing, language-model, nlp-library
Nlp profiler
A simple NLP library allows profiling datasets with one or more text columns. When given a dataset and a column name containing text data, NLP Profiler will return either high-level insights or low-level/granular statistical information about the text in that column.
Stars: ✭ 181 (+60.18%)
Mutual labels:  natural-language-processing, nlp-machine-learning, nlp-library
Lingua
πŸ‘„ The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike
Stars: ✭ 341 (+201.77%)
Mutual labels:  natural-language-processing, nlp-machine-learning, nlp-library
Gpt2
PyTorch Implementation of OpenAI GPT-2
Stars: ✭ 64 (-43.36%)
Mutual labels:  natural-language-processing, language-model
Textblob Ar
Arabic support for textblob
Stars: ✭ 60 (-46.9%)
Mutual labels:  natural-language-processing, part-of-speech-tagger
How To Mine Newsfeed Data And Extract Interactive Insights In Python
A practical guide to topic mining and interactive visualizations
Stars: ✭ 61 (-46.02%)
Mutual labels:  natural-language-processing, nlp-machine-learning
Coursera Natural Language Processing Specialization
Programming assignments from all courses in the Coursera Natural Language Processing Specialization offered by deeplearning.ai.
Stars: ✭ 39 (-65.49%)
Mutual labels:  natural-language-processing, nlp-machine-learning
Lda Topic Modeling
A PureScript, browser-based implementation of LDA topic modeling.
Stars: ✭ 91 (-19.47%)
Mutual labels:  natural-language-processing, nlp-machine-learning
Intent classifier
Stars: ✭ 67 (-40.71%)
Mutual labels:  natural-language-processing, nlp-machine-learning
Toiro
A comparison tool of Japanese tokenizers
Stars: ✭ 95 (-15.93%)
Mutual labels:  natural-language-processing, nlp-library
Pynlp
A pythonic wrapper for Stanford CoreNLP.
Stars: ✭ 103 (-8.85%)
Mutual labels:  natural-language-processing, part-of-speech-tagger
Vietnamese Electra
Electra pre-trained model using Vietnamese corpus
Stars: ✭ 55 (-51.33%)
Mutual labels:  natural-language-processing, language-model
Python Tutorial Notebooks
Python tutorials as Jupyter Notebooks for NLP, ML, AI
Stars: ✭ 52 (-53.98%)
Mutual labels:  natural-language-processing, part-of-speech-tagger
Tika Python
Tika-Python is a Python binding to the Apache Tikaβ„’ REST services allowing Tika to be called natively in the Python community.
Stars: ✭ 997 (+782.3%)
Mutual labels:  nlp-machine-learning, nlp-library
Codesearchnet
Datasets, tools, and benchmarks for representation learning of code.
Stars: ✭ 1,378 (+1119.47%)
Mutual labels:  natural-language-processing, nlp-machine-learning
Repo 2016
R, Python and Mathematica Codes in Machine Learning, Deep Learning, Artificial Intelligence, NLP and Geolocation
Stars: ✭ 103 (-8.85%)
Mutual labels:  natural-language-processing, nlp-machine-learning
Greek Bert
A Greek edition of BERT pre-trained language model
Stars: ✭ 84 (-25.66%)
Mutual labels:  natural-language-processing, language-model
Easy Bert
A Dead Simple BERT API for Python and Java (https://github.com/google-research/bert)
Stars: ✭ 106 (-6.19%)
Mutual labels:  natural-language-processing, language-model
Spacy Transformers
πŸ›Έ Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy
Stars: ✭ 919 (+713.27%)
Mutual labels:  natural-language-processing, language-model
Spago
Self-contained Machine Learning and Natural Language Processing library in Go
Stars: ✭ 854 (+655.75%)
Mutual labels:  natural-language-processing, language-model

lingo


package lingo provides the data structures and algorithms required for natural language processing.

Specifically, it provides a POS Tagger (lingo/pos), a Dependency Parser (lingo/dep), and a basic tokenizer (lingo/lexer) for English. It also provides data structures for holding corpora (lingo/corpus) and treebanks (lingo/treebank).

The aim of this package is to provide a production-quality pipeline for natural language processing.

Install

The package is go-gettable: go get -u github.com/chewxy/lingo

This package and its subpackages depend on very few external packages. Here they are:

| Package | Used For | Vitality | Notes | Licence |
|---------|----------|----------|-------|---------|
| gorgonia | Machine learning | Vital. It won't be hard to rewrite them, but why? | Same author | Gorgonia Licence (Apache 2.0-like) |
| gographviz | Visualization of annotations, and other graph-related visualizations | Vital for visualizations, which are a nice-to-have feature | API last changed 12th April 2017 | gographviz licence (Apache 2.0) |
| errors | Errors | The package won't die without it, but it's a very nice-to-have | Stable API for the past year | errors licence (MIT/BSD-like) |
| set | Set operations | Can be easily replaced | Stable API for the past year | set licence (MIT/BSD-like) |

Usage

See the individual packages for usage. There are also a number of executables in the cmd directory; they are meant as examples of how a natural language processing pipeline can be set up.

A natural language pipeline with this package is heavily channel-driven. Here is an example for dependency parsing:

package main

import (
	"strings"

	"github.com/chewxy/lingo/dep"
	"github.com/chewxy/lingo/lexer"
	"github.com/chewxy/lingo/pos"
)

// posModel and depModel are the trained models for the POS tagger and the
// dependency parser; they are assumed to have been loaded elsewhere.

func main() {
	inputString := `The cat sat on the mat`
	lx := lexer.New("dummy", strings.NewReader(inputString)) // lexer - required to break a sentence up into words.
	pt := pos.New(pos.WithModel(posModel))                   // POS Tagger - required to tag the words with a part of speech tag.
	dp := dep.New(depModel)                                  // Creates a new dependency parser.

	// set up a pipeline
	pt.Input = lx.Output
	dp.Input = pt.Output

	// run all
	go lx.Run()
	go pt.Run()
	go dp.Run()

	// wait to receive:
	for {
		select {
		case d := <-dp.Output:
			_ = d // do something with the parsed dependency
		case err := <-dp.Error:
			_ = err // handle the error
		}
	}
}

How It Works

For specific tasks (POS tagging, parsing, named entity recognition, etc.), refer to the README of each subpackage. This package on its own mainly provides the data structures that the subpackages use.

Perhaps the most important data structure is the *Annotation structure: it holds a word and the metadata associated with that word.

For dependency parses, the graph takes three forms: *Dependency, *DependencyTree and *Annotation. All three forms are convertible from one to another. TODO: explain rationale behind each data type.
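
To make the description above concrete, here is a purely illustrative sketch of the kind of information an annotation-like type carries. The type and field names below are assumptions made for illustration only, not lingo's actual *Annotation definition; consult the package source for the real fields.

package main

import "fmt"

// annotation is a deliberately simplified, hypothetical mirror of what the
// text above describes: one word plus the metadata attached to it.
// It is NOT lingo's actual *Annotation type.
type annotation struct {
	Word  string // the word as it appeared in the input
	Lemma string // a normalised base form of the word
	POS   string // the part-of-speech tag assigned by the tagger
	Head  int    // index of the word's syntactic head in the sentence
	Rel   string // the dependency relation to that head
}

func main() {
	a := annotation{Word: "cat", Lemma: "cat", POS: "NOUN", Head: 2, Rel: "nsubj"}
	fmt.Printf("%+v\n", a)
}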

Quirks

Very Oddly Specific POS Tags and Dependency Rel Types

A particular quirk you may have noticed is that the POSTag and DependencyType are hard-coded as constants. This package in fact provides two variations of each: one from the Stanford/Penn Treebank and one from Universal Dependencies.

These are hard-coded mainly for performance: knowing ahead of time how much to allocate saves the program a lot of additional work. It also reduces the chances of accidentally mutating a global variable.

Of course, this comes with a tradeoff: programs are limited to these two options. Thankfully, there are only a limited number of POS tag and dependency relation schemes in common use, and two of the most popular (Stanford/PTB and Universal Dependencies) have been implemented.

The following build tags are supported:

  • stanfordtags
  • universaltags
  • stanfordrel
  • universalrel

To use a specific tagset or relset, build your program thusly: go build -tags='stanfordtags'.

The default tag and dependency relation types are the Universal Dependencies versions.

Lexer

You should also note that the tokenizer, lingo/lexer, is not your usual run-of-the-mill NLP tokenizer. It tokenizes by whitespace, with some specific rules for English. It was inspired by Rob Pike's talk on lexers; I thought it'd be cool to write something like that for NLP.

The test cases in package lingo/lexer showcase how it handles Unicode and other pathological English.
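
For a rough idea of how the lexer can be driven on its own, here is a minimal sketch. It assumes, as the pipeline example above suggests, that lx.Output is a channel of lexemes and that it is closed once Run finishes; the exact element type is whatever lingo/lexer actually emits, so the sketch simply prints each value.

package main

import (
	"fmt"
	"strings"

	"github.com/chewxy/lingo/lexer"
)

func main() {
	// Feed the lexer a sentence with some mildly awkward punctuation.
	lx := lexer.New("example", strings.NewReader("Mr. O'Neill didn't pay $3.50!"))
	go lx.Run()

	// Drain the output channel and print each lexeme.
	// Assumption: the channel is closed when the lexer finishes,
	// so this loop terminates.
	for lexeme := range lx.Output {
		fmt.Println(lexeme)
	}
}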

Contributing

See CONTRIBUTING.md for more information.

Licence

This package is licenced under the MIT licence.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].