
textgain / grasp

Licence: other
Essential NLP & ML, short & fast pure Python code

Projects that are alternatives of or similar to grasp

ml
Minimal implementations of classic machine learning algorithms
Stars: ✭ 130 (+124.14%)
Mutual labels:  naive-bayes, perceptron, decision-tree
Jumanpp
Juman++ (a Morphological Analyzer Toolkit)
Stars: ✭ 254 (+337.93%)
Mutual labels:  tokenizer, part-of-speech-tagger
machine-learning
Python machine learning applications in image processing, recommender system, matrix completion, netflix problem and algorithm implementations including Co-clustering, Funk SVD, SVD++, Non-negative Matrix Factorization, Koren Neighborhood Model, Koren Integrated Model, Dawid-Skene, Platt-Burges, Expectation Maximization, Factor Analysis, ISTA, F…
Stars: ✭ 91 (+56.9%)
Mutual labels:  naive-bayes, k-nearest-neighbors
Albert
This is the project behind my personal website; code contributions are welcome. It aims to cover most of the Java tech stacks used in real-world work. Please Star it if you're interested, thanks. QQ discussion group: 587577705. This project is continuously updated! Production environment:
Stars: ✭ 168 (+189.66%)
Mutual labels:  twitter-api, google-api
Machine-Learning-Models
In This repository I made some simple to complex methods in machine learning. Here I try to build template style code.
Stars: ✭ 30 (-48.28%)
Mutual labels:  naive-bayes, decision-tree
scoruby
Ruby Scoring API for PMML
Stars: ✭ 69 (+18.97%)
Mutual labels:  naive-bayes, decision-tree
GraphiPy
GraphiPy: Universal Social Data Extractor
Stars: ✭ 61 (+5.17%)
Mutual labels:  twitter-api, graph-visualization
Textblob Ar
Arabic support for textblob
Stars: ✭ 60 (+3.45%)
Mutual labels:  sentiment-analysis, part-of-speech-tagger
Omnicat Bayes
Naive Bayes text classification implementation as an OmniCat classifier strategy. (#ruby #naivebayes)
Stars: ✭ 30 (-48.28%)
Mutual labels:  sentiment-analysis, tokenizer
sentiment-analysis-using-python
Large Data Analysis Course Project
Stars: ✭ 23 (-60.34%)
Mutual labels:  sentiment-analysis, naive-bayes
classifier
A general purpose text classifier
Stars: ✭ 31 (-46.55%)
Mutual labels:  naive-bayes, k-nearest-neighbors
Pynlp
A pythonic wrapper for Stanford CoreNLP.
Stars: ✭ 103 (+77.59%)
Mutual labels:  sentiment-analysis, part-of-speech-tagger
Dom
Modern DOM API.
Stars: ✭ 88 (+51.72%)
Mutual labels:  document-object-model, css-selectors
node-social-feed-api
Aggregates social media feeds and outputs them to use in an API
Stars: ✭ 20 (-65.52%)
Mutual labels:  twitter-api, google-api
spark-twitter-sentiment-analysis
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
Stars: ✭ 55 (-5.17%)
Mutual labels:  sentiment-analysis, twitter-api
Tia
Your Advanced Twitter stalking tool
Stars: ✭ 98 (+68.97%)
Mutual labels:  sentiment-analysis, twitter-api
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+4241.38%)
Mutual labels:  sentiment-analysis, part-of-speech-tagger
bert-sentiment
Fine-grained Sentiment Classification Using BERT
Stars: ✭ 49 (-15.52%)
Mutual labels:  sentiment-analysis
go-wikidata
Wikidata API bindings in go.
Stars: ✭ 27 (-53.45%)
Mutual labels:  wikipedia-api
overview-and-benchmark-of-traditional-and-deep-learning-models-in-text-classification
NLP tutorial
Stars: ✭ 41 (-29.31%)
Mutual labels:  sentiment-analysis

Grasp.py – Explainable AI

Grasp is a lightweight AI toolkit for Python, with tools for data mining, natural language processing (NLP), machine learning (ML) and network analysis. It offers 300+ fast, essential algorithms, averaging ~25 lines of code per function, with self-explanatory function names and no dependencies, all bundled into one well-documented file: grasp.py (200KB).

Grasp is developed by Textgain, a language tech company that uses AI for societal good.

Tools for Data Mining

Download stuff with download(url) (or dl), with built-in caching and logging:

src = dl('https://www.textgain.com', cached=True)

Parse HTML with dom(html) into an Element tree and search it with CSS Selectors:

for e in dom(src)('a[href^="http"]'): # external links
    print(e.href)

Strip HTML with plain(Element) to get a plain text string:

for word, count in wc(plain(dom(src))).items():
    print(word, count)
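
The word-count idea can be pictured with a standard-library stand-in (illustrative only; grasp's wc may tokenize and normalize differently):

```python
from collections import Counter

def word_count(s):
    # Naive word count: lowercase and split on whitespace.
    return Counter(s.lower().split())

print(word_count('The cat sat on the mat'))  # Counter({'the': 2, ...})
```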

Find articles with wikipedia(str), in HTML:

for e in dom(wikipedia('cat', language='en'))('p'):
    print(plain(e))

Find opinions with twitter.search(str):

for tweet in first(10, twitter.search('from:textgain')): # latest 10
    print(tweet.id, tweet.text, tweet.date)

Deploy APIs with App. Works with WSGI and Nginx:

app = App()
@app.route('/')
def index(*path, **query):
    return 'Hi! %s %s' % (path, query)
app.run('127.0.0.1', 8080, debug=True)

Once the app is running, visit http://127.0.0.1:8080/app?q=cat.

Tools for Natural Language Processing

Find language with lang(str) for 40+ languages and ~92.5% accuracy:

print(lang('The cat sat on the mat.')) # en

Find words & sentences with tok(str) (tokenize) at ~125K words/sec:

print(tok("Mr. etc. aren't sentence breaks! ;) This is:.", language='en'))

Find word polarity with pov(str) (point-of-view). Is it a positive or negative opinion?

print(pov(tok('Nice!', language='en'))) # +0.6
print(pov(tok('Dumb.', language='en'))) # -0.4
  • For de, en, es, fr, nl, with ~75% accuracy.
  • You'll need the language models in grasp/lm.
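
The intuition behind a polarity score can be sketched with a toy lexicon (toy values, not grasp's actual trained models in grasp/lm):

```python
# Tiny polarity lexicon for illustration:
LEXICON = {'nice': +0.6, 'dumb': -0.4, 'love': +0.8, 'hate': -0.7}

def polarity(words):
    # Average the polarity of known words; 0.0 if none are known.
    scores = [LEXICON[w.lower()] for w in words if w.lower() in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(polarity(['Nice', '!']))  # 0.6
```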

Find word types with tag(str) in 10+ languages using robust ML models from UD:

for word, pos in tag(tok('The cat sat on the mat.'), language='en'):
    print(word, pos)
  • Parts-of-speech include NOUN, VERB, ADJ, ADV, DET, PRON, PREP, ...
  • For ar, da, de, en, es, fr, it, nl, no, pl, pt, ru, sv, tr, with ~95% accuracy.
  • You'll need the language models in grasp/lm.
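
Since tag() yields (word, pos) pairs, downstream filtering is plain Python; for example, keeping only the nouns (sample tagged data hardcoded here, so no language models are needed):

```python
# Sample (word, pos) pairs, as tag() would yield them:
tagged = [('The', 'DET'), ('cat', 'NOUN'), ('sat', 'VERB'),
          ('on', 'PREP'), ('the', 'DET'), ('mat', 'NOUN')]

nouns = [w for w, pos in tagged if pos == 'NOUN']
print(nouns)  # ['cat', 'mat']
```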

Tools for Machine Learning

Machine Learning (ML) algorithms learn by example. If you show them 10K spam and 10K real emails (i.e., train a model), they can predict whether other emails are also spam or not.

Each training example is a {feature: weight} dict with a label. For text, the features could be words, the weights could be word count, and the label might be real or spam.
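
Concretely, a training set is just a list of such pairs; here is a hand-built toy example with word-count features (labels '+' and '-' chosen for illustration):

```python
# Two hand-built training examples, each a ({feature: weight}, label) pair:
examples = [
    ({'i': 1, 'love': 1, 'cats': 1}, '+'),   # positive opinion
    ({'i': 1, 'hate': 1, 'cats': 1}, '-'),   # negative opinion
]
for features, label in examples:
    print(label, features)
```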

Quantify text with vec(str) (vectorize) into a {feature: weight} dict:

v1 = vec('I love cats! 😀', features=['c3', 'w1'])
v2 = vec('I hate cats! 😡', features=['c3', 'w1'])
  • c1, c2, c3 count consecutive characters. For c2, cats → 1x ca, 1x at, 1x ts.
  • w1, w2, w3 count consecutive words.
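
Character n-gram counting itself is simple; a bare-bones sketch of what a feature like c2 or c3 computes (grasp's vec may add normalization on top of this):

```python
def char_ngrams(s, n=3):
    # Count consecutive character n-grams in a string.
    v = {}
    for i in range(len(s) - n + 1):
        g = s[i:i + n]
        v[g] = v.get(g, 0) + 1
    return v

print(char_ngrams('cats', n=2))  # {'ca': 1, 'at': 1, 'ts': 1}
```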

Train models with fit(examples), save as JSON, predict labels:

m = fit([(v1, '+'), (v2, '-')], model=Perceptron) # DecisionTree, KNN, ...
m.save('opinion.json')
m = fit(open('opinion.json'))
print(m.predict(vec('She hates dogs.'))) # {'+': 0.4, '-': 0.6}

Once trained, Model.predict(vector) returns a dict with label probabilities (0.0–1.0).
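
Picking the winning label from that dict is a one-liner (the probabilities below are made up for illustration):

```python
p = {'+': 0.4, '-': 0.6}       # example predict() output
label = max(p, key=p.get)      # label with the highest probability
confidence = p[label]
print(label, confidence)  # - 0.6
```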

Tools for Network Analysis

Map networks with Graph, a {node1: {node2: weight}} dict subclass:

g = Graph(directed=True)
g.add('a', 'b') # a → b
g.add('b', 'c') # b → c
g.add('b', 'd') # b → d
g.add('c', 'd') # c → d
print(g.sp('a', 'd')) # shortest path: a → b → d
print(top(pagerank(g))) # strongest node: d, 0.8
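
For intuition, PageRank on this {node1: {node2: weight}} structure can be sketched with plain power iteration (a simplified version; grasp's pagerank may handle dangling nodes and convergence differently):

```python
def pagerank(g, damping=0.85, iterations=50):
    # Power iteration over a {node1: {node2: weight}} adjacency dict.
    nodes = set(g) | {n for edges in g.values() for n in edges}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n, edges in g.items():
            total = sum(edges.values())
            for n2, w in edges.items():
                new[n2] += damping * rank[n] * w / total  # spread n's rank along its edges
        rank = new
    return rank

g = {'a': {'b': 1}, 'b': {'c': 1, 'd': 1}, 'c': {'d': 1}}
ranks = pagerank(g)
print(max(ranks, key=ranks.get))  # 'd' collects the most rank
```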

See networks with viz(graph):

with open('g.html', 'w') as f:
    f.write(viz(g, src='graph.js'))

You'll need to set src to the grasp/graph.js lib.

Tools for Comfort

Easy date handling with date(v), where v is an int, a str, or another date:

print(date('Mon Jan 31 10:00:00 +0000 2000', format='%Y-%m-%d'))

Easy path handling with cd(...), which always points to the script's folder:

print(cd('kb', 'en-loc.csv'))

Easy CSV handling with csv([path]), a list of lists of values:

for code, country, _, _, _, _, _ in csv(cd('kb', 'en-loc.csv')):
    print(code, country)
data = csv()
data.append(('cat', 'Lizzy'))
data.append(('cat', 'Polly'))
data.save(cd('cats.csv'))

Tools for Good

A big concern in AI is bias introduced by human trainers. Remember the Model trained earlier? Grasp has tools to explain how & why it makes decisions:

print(explain(vec('She hates dogs.'), m)) # why so negative?

In the returned dict, the model's explanation is: “you wrote hat + ate (hate)”.
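
One common way such explanations work can be sketched with made-up weights (a hypothetical illustration, not grasp's internals): multiply each input feature by its learned weight and rank the contributions.

```python
# Made-up learned weights for a few character trigrams (hypothetical):
weights = {'hat': -0.5, 'ate': -0.3, 'dog': +0.1}
v = {'hat': 1, 'ate': 1, 'dog': 1}   # vectorized input features

contributions = {f: w * weights.get(f, 0.0) for f, w in v.items()}
for f, c in sorted(contributions.items(), key=lambda fc: fc[1]):
    print(f, c)  # most negative contributions first
```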

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].