17 open source projects by TeamHG-Memex

1. Eli5
A library for debugging/inspecting machine learning classifiers and explaining their predictions
2. Aquarium
Splash + HAProxy + Docker Compose
✭ 160
python
3. Arachnado
Web Crawling UI and HTTP API, based on Scrapy and Tornado
✭ 145
python
4. Deep Deep
Adaptive crawler which uses Reinforcement Learning methods
5. Formasaurus
Formasaurus tells you the type of an HTML form and its fields using machine learning
✭ 88
html
6. Undercrawler
A generic crawler
✭ 63
python
7. Sitehound Frontend
Site Hound (previously THH) is a Domain Discovery Tool
✭ 21
html
8. Tensorboard logger
Log TensorBoard events without touching TensorFlow
✭ 611
python
9. Scrapy Rotating Proxies
Use multiple proxies with Scrapy
10. Sklearn Crfsuite
scikit-learn inspired API for CRFsuite
✭ 363
python
11. scrapy-kafka-export
Scrapy extension which writes crawled items to Kafka
12. autopager
Detect and classify pagination links
✭ 71
html
13. json-lines
Read JSON lines (jl) files, including gzipped and broken
✭ 35
python
14. autologin-middleware
Scrapy middleware for autologin
✭ 36
python
15. agnostic
Agnostic Database Migrations
16. extract-html-diff
Extract the difference between two HTML pages
17. url-summary
Show summary of a large number of URLs in a Jupyter Notebook
✭ 15
python