ATR4S

An open-source library for Automatic Term Recognition written in Scala.

To cite ATR4S:

N. Astrakhantsev. ATR4S: Toolkit with State-of-the-art Automatic Terms Recognition Methods in Scala. arXiv preprint arXiv:1611.07804, 2016.

Implemented algorithms

AvgTermFreq
ResidualIDF
TotalTF-IDF
CValue
Basic
ComboBasic
PostRankDC
Relevance
Weirdness
DomainPertinence
NovelTopicModel
LinkProbability
KeyConceptRelatedness
Voting
PU-ATR

Requirements

Libraries

Scala 2.11

Spark 1.5+ (for Voting and PU-ATR)

Emory nlp4j

(Apache OpenNLP is also supported, but preliminary experiments showed that its quality is no better than that of Emory nlp4j, and it is not thread-safe. If you plan to use OpenNLP, download the models from Apache OpenNLP and place them into src/main/resources.)

(Stanford CoreNLP is also supported through a helper that was moved to a separate GPL-licensed module, because Stanford CoreNLP itself is licensed under the GPL.)

Data

Some algorithms need auxiliary files. Download them and place them into the WORKING_DIRECTORY/data directory, or specify their path in the corresponding configuration/builder class (e.g. Word2VecAdapterConfig of KeyConceptRelatedness). The working directory can be set in gradle.properties; by default it is experiments.

Namely,

for LinkProbability download info_measure.txt;
for Relevance download COHA_term_occurrences.txt;
for KeyConceptRelatedness download w2vConcepts.model.
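
Before running these algorithms, it can help to confirm that the files are where the library will look for them. The following is a minimal sketch, not part of ATR4S: the class name AuxiliaryFilesCheck and its method are hypothetical, and it assumes the default working directory experiments unless another one is passed as the first argument.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ATR4S.
class AuxiliaryFilesCheck {
    // File names taken from the list above.
    static final List<String> REQUIRED = Arrays.asList(
            "info_measure.txt",           // LinkProbability
            "COHA_term_occurrences.txt",  // Relevance
            "w2vConcepts.model");         // KeyConceptRelatedness

    /** Returns the required file names that are absent from workingDir/data. */
    static List<String> missingFiles(Path workingDir) {
        Path dataDir = workingDir.resolve("data");
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Files.isRegularFile(dataDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: "experiments" is the working directory unless overridden.
        Path workingDir = Paths.get(args.length > 0 ? args[0] : "experiments");
        List<String> missing = missingFiles(workingDir);
        if (missing.isEmpty()) {
            System.out.println("All auxiliary files found in " + workingDir.resolve("data"));
        } else {
            System.out.println("Missing from " + workingDir.resolve("data") + ": " + missing);
        }
    }
}
```

Only the files for the algorithms you actually use are needed, so a non-empty result is not necessarily an error.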

Datasets used in the experiments can be downloaded from the Releases page.

OS

The PU-ATR algorithm may not work on Windows because of bugs in Spark (the relevant questions on Stack Overflow may help: 1, 2, 3).

Linking

The library is published to Maven Central and JCenter. Add the following lines, depending on your build system.

Gradle

compile 'ru.ispras:atr4s:1.2.2'

Maven

<dependency>
    <groupId>ru.ispras</groupId>
    <artifactId>atr4s</artifactId>
    <version>1.2.2</version>
</dependency>

SBT

libraryDependencies += "ru.ispras" % "atr4s" % "1.2.2"

Building from Sources

Build the library with Gradle:

./gradlew jar

Usage

Command line example

./gradlew recognize -Pdataset=acl2 -PtopCount=10 -Pconfig=CValue.conf -Poutput=cvalueterms.txt

This command recognizes the top 10 terms in the text files stored in the acl2 directory (which should be a subdirectory of WORKING_DIRECTORY) using the CValue measure (configured in the CValue.conf file) and writes the recognized terms with their weights to cvalueterms.txt.

Note that if the encoding of the input text files is not UTF-8, you should specify the correct encoding in the config of NLPPreprocessor (or convert the input files; there are many tools for that).
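
If you prefer converting the input files over changing the preprocessor config, the conversion can be done with the Java standard library alone. This is a minimal sketch, not part of ATR4S: the class name ConvertToUtf8 is hypothetical, and it assumes you already know the source encoding of your files (for example windows-1251).

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of ATR4S.
class ConvertToUtf8 {
    /**
     * Decodes the bytes with the given source charset and re-encodes them as UTF-8.
     * Bytes that are invalid in the source charset become the replacement character.
     */
    static byte[] reencode(byte[] input, Charset source) {
        return new String(input, source).getBytes(StandardCharsets.UTF_8);
    }

    /** Reads the whole input file into memory and writes a UTF-8 copy to out. */
    static void convertFile(Path in, Path out, Charset source) throws IOException {
        Files.write(out, reencode(Files.readAllBytes(in), source));
    }

    public static void main(String[] args) throws IOException {
        // Arguments: <input file> <output file> <source charset name>
        convertFile(Paths.get(args[0]), Paths.get(args[1]), Charset.forName(args[2]));
    }
}
```

The sketch loads each file fully into memory, which is fine for typical text corpora but not for very large files.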

Program API

See the ATRConfig class, which is a configuration/builder for the facade class AutomaticTermsRecognizer.

See the AutomaticTermsRecognizer object for an example.

Program API (Java)

Usage in Java does not differ significantly, so see the same classes for examples. However, since Java does not support parameters with default values, we provide static helper functions named make() for most classes that have parameters with default values or parameters with Scala collections; see the example below.

Also note that there is a special method that returns the weighted terms as a Java Iterable, so you do not need to convert Scala collections to Java ones.

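class ATRExample {
    public static void main(String[] args) {
        // args[0]: directory with the input text files; args[1]: number of top terms to return
        String datasetDir = args[0];
        int topCount = Integer.parseInt(args[1]);
        // make() helpers stand in for Scala's default parameter values
        ATRConfig atrConfig = new ATRConfig(EmoryNLPPreprocessorConfig.make(),
                TCCConfig.make(),
                new OneFeatureTCWeighterConfig(Weirdness.make()));
        // recognizeAsJavaIterable returns a Java Iterable, so no Scala collection conversion is needed
        Iterable<WeightedTerm> terms = atrConfig.build().recognizeAsJavaIterable(datasetDir, topCount);
        for (WeightedTerm termAndWeight : terms) {
            System.out.println(termAndWeight);
        }
    }
}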
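
The example above ranks term candidates with Weirdness. As a rough illustration of the general idea behind that measure, and not of how ATR4S implements it, the sketch below compares a word's relative frequency in a domain corpus with its relative frequency in a general corpus. The class name WeirdnessSketch, the count maps, and the add-one smoothing of the general-corpus count are all assumptions made for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; this is not the ATR4S implementation.
class WeirdnessSketch {
    /**
     * Ratio of the word's relative frequency in the domain corpus to its
     * relative frequency in the general corpus. The general-corpus count is
     * add-one smoothed (an assumption) to avoid division by zero.
     */
    static double weirdness(String word,
                            Map<String, Integer> domainCounts, long domainTotal,
                            Map<String, Integer> generalCounts, long generalTotal) {
        double domainFreq = domainCounts.getOrDefault(word, 0) / (double) domainTotal;
        double generalFreq = (generalCounts.getOrDefault(word, 0) + 1) / (double) (generalTotal + 1);
        return domainFreq / generalFreq;
    }

    public static void main(String[] args) {
        // Toy counts, invented for the illustration.
        Map<String, Integer> domain = new HashMap<>();
        domain.put("parser", 5);
        domain.put("the", 10);
        Map<String, Integer> general = new HashMap<>();
        general.put("parser", 1);
        general.put("the", 99);
        for (String word : new String[] {"parser", "the"}) {
            System.out.println(word + " " + weirdness(word, domain, 100, general, 999));
        }
    }
}
```

With these toy counts, the domain-specific word "parser" scores far higher than the common word "the", which is the behaviour such a measure is meant to capture.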

License

Apache License Version 2.0.
