Sotera / watchman

License: GPL-2.0
Watchman: An open-source social-media event-detection system

Programming Languages

JavaScript, Python, HTML, Jupyter Notebook, Shell, Java

Watchman

What is it?

A core set of utilities frequently used in large-scale data processing and ML projects, exposed as REST endpoints. Want to extract text from HTML? We've got it. Need to caption a set of images scraped from the web? This is your place. Extract entities with MITIE or Stanford NER? Yes, please.
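
As a rough sketch of what calling one of these utilities over REST might look like, the snippet below builds (but does not send) a JSON POST request. The host, port, the `/api/extract/text` path, and the `html` payload field are all hypothetical placeholders, not documented Watchman routes; check the server's models for the real ones.

```python
import json
from urllib.request import Request

# Hypothetical values: the base URL and route below are placeholders,
# not documented Watchman endpoints.
BASE_URL = "http://localhost:3000"
EXTRACT_PATH = "/api/extract/text"


def build_extract_request(html, base_url=BASE_URL, path=EXTRACT_PATH):
    """Build (without sending) a JSON POST request carrying raw HTML."""
    body = json.dumps({"html": html}).encode("utf-8")
    return Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_extract_request("<p>Hello</p>")
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req).read()
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing it at a live server.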

Dependencies

  1. Node 6
  2. Strongloop 2
  3. Bower
  4. Docker 1.12
  5. Python 2.7 + 3.5
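
A quick way to see which of these dependencies are already installed is to look for their executables on the PATH. The sketch below does that with `shutil.which`; the executable names are assumptions about a typical install (`slc` for the StrongLoop CLI, `python2.7` and `python3.5` for the interpreters), and it does not check version numbers.

```python
import shutil

# Executable names are assumptions about how each dependency is installed;
# adjust them if your system names the binaries differently.
REQUIRED_TOOLS = ["node", "slc", "bower", "docker", "python2.7", "python3.5"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found.")
```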

Dev bootstrap

# get working copy of .env file from a friend
npm i -g strongloop bower
npm i
# only if models change...
lb-ng server/server.js client/js/lb-services.js

Install with Docker Compose

docker rm $(docker ps -a -q) # optional, remove all un'composed' containers
sudo service docker restart # optional, but should speed things up
cp .env.template .env # add browser API keys, etc.
git clone https://github.com/Sotera/watchman.git app; cd app # optional if in dev env
cp slc-conf.template.json slc-conf.json
sudo script/docker/install-compose.sh
script/deploy/compose up deploy [branch] # branch optional, default: master
# script/deploy/compose up deploy local # deploy local branch, not remote
script/deploy/compose scale image-fetcher=3

# hint: add /docker-compose.override.yml to override services.
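
After copying `.env.template` to `.env`, it is easy to leave a key blank. The sketch below compares the two files and lists template keys that are missing or empty in `.env`. It assumes both files use plain `KEY=value` lines with `#` comments, which is not confirmed by this README; the function names are illustrative only.

```python
def parse_env(text):
    """Parse simple KEY=value lines into a dict, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        values[name.strip()] = value.strip()
    return values


def unfilled_keys(template_text, env_text):
    """Return template keys that are absent or empty in the env text, sorted."""
    template = parse_env(template_text)
    env = parse_env(env_text)
    return sorted(key for key in template if not env.get(key))


if __name__ == "__main__":
    # Run from the directory that holds both files.
    with open(".env.template") as template_file, open(".env") as env_file:
        for key in unfilled_keys(template_file.read(), env_file.read()):
            print("not set:", key)
```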

Misc

# build mitie-server image
git clone https://github.com/lukewendling/mitie-server.git; cd mitie-server # assumes the repo is hosted on GitHub under this name
docker build --no-cache --force-rm -t lukewendling/mitie-server .

docker run -d -p 8888:8888 --name mitie lukewendling/mitie-server
./server/workers/start-extractor.js # start workers
# run a worker standalone
WORKER_SCRIPT=./workers/job-queue npm run dev
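
Before starting the workers, it can help to confirm that the MITIE container is accepting connections on the port published by the `docker run` command above. This is a minimal sketch using only the standard library; `port_open` is an illustrative helper, and a successful TCP connection only shows that something is listening, not that the service is healthy.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8888 is the port published by the docker run command for the mitie container.
    print("mitie reachable:", port_open("localhost", 8888))
```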

Tests

Services

conda env create -f services/environment.yml
source activate watchman
python services/run_tests.py

PySpark Docker container (local or standalone cluster mode)

# watchman services must be running
./script/docker/start-pyspark.sh