Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.

Stars: ✭ 22 (+29.41%)

Mutual labels: text-classification, lda

Text Analytics With Python

Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer.

Stars: ✭ 1,132 (+6558.82%)

Mutual labels: text-classification, clustering

Vectorai

Vector AI — A platform for building vector based applications. Encode, query and analyse data using vectors.

Stars: ✭ 195 (+1047.06%)

Mutual labels: search-engine, clustering

Ml code

A repository for recording the machine learning code

Stars: ✭ 75 (+341.18%)

Mutual labels: clustering, svd

kwx

BERT, LDA, and TFIDF based keyword extraction in Python

Stars: ✭ 33 (+94.12%)

Mutual labels: text-classification, lda

ML2017FALL

Machine Learning (EE 5184) in NTU

Stars: ✭ 66 (+288.24%)

Mutual labels: text-classification, clustering

Introduction

Text analysis cannot start without textual data, so my work began with collecting raw data from the most popular news website, www.delfi.lt. I crawled articles from 5 categories: Criminals (227 articles), Music (120 articles), Movies (167 articles), Sports (136 articles) and Science (204 articles).

Classification

Classification performance is measured with a confusion matrix, where rows are the true categories and columns are the predicted categories. This approach reaches above 90% recall and above 90% precision.
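As a minimal sketch of how such a confusion matrix and the per-category precision and recall can be computed, the snippet below uses only the Python standard library. The function names and the toy labels are assumptions made for illustration; they are not taken from the project's code or data.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true categories, columns are predicted categories."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def precision_recall(matrix, labels):
    """Per-category (precision, recall) computed from a confusion matrix."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum: predicted as label
        actual = sum(matrix[i])                    # row sum: truly label
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores


# Toy labels invented for illustration only (not the project's data).
labels = ["Sports", "Music", "Science"]
y_true = ["Sports", "Sports", "Music", "Music", "Science", "Science"]
y_pred = ["Sports", "Music", "Music", "Music", "Science", "Science"]

matrix = confusion_matrix(y_true, y_pred, labels)
scores = precision_recall(matrix, labels)
for label in labels:
    print(label, scores[label])
```

Reading the matrix row by row shows which true category is confused with which predicted one, which is harder to see from a single accuracy number.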

Topics extraction

The figure shows 6 components with 10 tokens for each component. From these results we can identify the most important words and intuitively guess the topic of each principal component. For example, the 4th principal component stores information about sports and music, whereas the 6th principal component stores information about criminals.
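The idea of listing the strongest tokens per component can be sketched as follows. This is an assumption about the general technique (a plain SVD of a documents x terms matrix with NumPy), not the project's actual implementation; `top_tokens`, the toy vocabulary and the toy counts are hypothetical.

```python
import numpy as np


def top_tokens(matrix, vocabulary, n_components, n_tokens):
    """Return the highest-weighted tokens for each SVD component.

    matrix: documents x terms array of term weights (counts or TF-IDF).
    """
    # Rows of vt are the components expressed in term space,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    topics = []
    for component in vt[:n_components]:
        # Sort by absolute weight, because the sign of a component is arbitrary.
        order = np.argsort(-np.abs(component))[:n_tokens]
        topics.append([vocabulary[i] for i in order])
    return topics


# Toy term counts invented for illustration only (not the project's data).
vocabulary = ["goal", "match", "song", "band"]
doc_term = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [0.0, 0.0, 3.0, 2.0],
])

topics = top_tokens(doc_term, vocabulary, n_components=2, n_tokens=2)
for number, tokens in enumerate(topics, start=1):
    print(number, tokens)
```

With the settings described above, the call would use `n_components=6` and `n_tokens=10` on the real document-term matrix.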

Main results are presented below:

Search query

Search is based on the article at http://webhome.cs.uvic.ca/~thomo/svd.pdf, where LSA (latent semantic analysis) is applied to find related documents using not only exact query matches but also deeper relations between documents.

Example

Query = "švietim apdovanojam"

Result:

['Imasi mokslininkų algų: siūlo kelti iki 50 proc.']
['Įteiktos 6 Mokslo premijos']
['Lietuvoje į susitikimą kviečia Nobelio premijos laureatas']
['100 tūkst. eurų išdalins populiarinantiems mokslą']
['V. Vaičaitis. Konkursinis mokslo finansavimas ar pasityčiojimas iš mokslininkų?']
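A minimal sketch of LSA-style search, assuming a documents x terms matrix and NumPy: the query is folded into a k-dimensional latent space and documents are ranked by cosine similarity. The function `lsa_search`, the toy vocabulary and the toy matrix are hypothetical and only illustrate the technique; the project's own code may differ.

```python
import numpy as np


def lsa_search(matrix, query_vector, k):
    """Rank documents by cosine similarity to the query in a k-dim latent space.

    matrix: documents x terms array; query_vector: term weights of the query.
    Returns (document indices from best to worst, similarity per document).
    """
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    docs_k = u[:, :k]                              # documents in latent space
    query_k = (query_vector @ vt[:k].T) / s[:k]    # fold the query into that space
    norms = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(query_k)
    safe_norms = np.where(norms == 0, 1.0, norms)  # avoid division by zero
    similarity = (docs_k @ query_k) / safe_norms
    return np.argsort(-similarity), similarity


# Toy data invented for illustration only (not the project's data).
vocabulary = ["science", "prize", "award", "football"]
doc_term = np.array([
    [1.0, 1.0, 0.0, 0.0],  # document 0: science, prize
    [0.0, 1.0, 1.0, 0.0],  # document 1: prize, award
    [0.0, 0.0, 0.0, 2.0],  # document 2: football
])
query = np.array([0.0, 0.0, 1.0, 0.0])  # query: "award"

ranking, similarity = lsa_search(doc_term, query, k=2)
print(ranking, similarity)
```

In this toy case document 0 does not contain the query term "award", yet it scores high because it shares "prize" with document 1, while the football document scores near zero. This is the kind of indirect relation that an exact-match search would miss.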

Clustering

In progress


minven / nlp-lt
