
simon987 / Od Database

Licence: MIT
Distributed crawler, database and web frontend for indexing public directories

Programming Languages

python

Projects that are alternatives of or similar to Od Database

Databook
A facebook for data
Stars: ✭ 26 (-78.51%)
Mutual labels:  elasticsearch, bootstrap
Milog
Milog is a personal blog website built on Ruby on Rails
Stars: ✭ 24 (-80.17%)
Mutual labels:  elasticsearch, bootstrap
Gopa
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+128.93%)
Mutual labels:  scraping, elasticsearch
Dataengineeringproject
Example end to end data engineering project.
Stars: ✭ 82 (-32.23%)
Mutual labels:  scraping, elasticsearch
Haystack
🔍 Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.
Stars: ✭ 3,409 (+2717.36%)
Mutual labels:  elasticsearch
Seleniumcrawler
An example using Selenium webdrivers for python and Scrapy framework to create a web scraper to crawl an ASP site
Stars: ✭ 117 (-3.31%)
Mutual labels:  scraping
Startbootstrap Clean Blog
Start Bootstrap is an open source library of free Bootstrap templates and themes. All of the free templates and themes on Start Bootstrap are released under the MIT license, which means you can use them for any purpose, even for commercial projects.
Stars: ✭ 1,604 (+1225.62%)
Mutual labels:  bootstrap
Reactstrap
Simple React Bootstrap 5 components
Stars: ✭ 10,207 (+8335.54%)
Mutual labels:  bootstrap
Toast
A Bootstrap 4.2+ jQuery plugin for the toast component
Stars: ✭ 121 (+0%)
Mutual labels:  bootstrap
Bootstrap Grid Css
The grid and responsive utilities classes extracted from the Bootstrap 4 framework, compiled into CSS.
Stars: ✭ 119 (-1.65%)
Mutual labels:  bootstrap
Movie Website
🎬 A movie website built with Node.js + Express + mongoDB + Bootstrap.
Stars: ✭ 118 (-2.48%)
Mutual labels:  bootstrap
Souqscraper
Simple scripts to level up your scraping skills, and source code for the Level UP playlist on YouTube
Stars: ✭ 118 (-2.48%)
Mutual labels:  scraping
Rebirth Ng
rebirth-ng is a ui framework for Angular & bootstrap.
Stars: ✭ 118 (-2.48%)
Mutual labels:  bootstrap
Servicestackvs
ServiceStackVS - Visual Studio extension for ServiceStack
Stars: ✭ 117 (-3.31%)
Mutual labels:  bootstrap
Elassandra
Elassandra = Elasticsearch + Apache Cassandra
Stars: ✭ 1,610 (+1230.58%)
Mutual labels:  elasticsearch
Detectlm
Detecting Lateral Movement with Machine Learning
Stars: ✭ 117 (-3.31%)
Mutual labels:  elasticsearch
Php Sf Flex Webpack Encore Vuejs
A simple app skeleton to try to make every component work together: symfony 4 (latest stable at the date, but works with sf 3.3+ if you just change the versions in composer.json), symfony/flex, webpack-encore, vuejs 2.5.x, bootstrap 4 sass
Stars: ✭ 118 (-2.48%)
Mutual labels:  bootstrap
Elastic Docker
Example setups for Elasticsearch, Kibana, Logstash, and Beats with docker-compose
Stars: ✭ 118 (-2.48%)
Mutual labels:  elasticsearch
Django Crud Ajax Login Register Fileupload
Django Crud, Django Crud Application, Django ajax CRUD, Django Boilerplate application, Django Register, Django Login, Django fileupload, CRUD, Bootstrap, AJAX, sample App
Stars: ✭ 118 (-2.48%)
Mutual labels:  bootstrap
Tabler Angular
Maintained by @arunabhdas Tabler for Angular - Components, demos and documentation
Stars: ✭ 118 (-2.48%)
Mutual labels:  bootstrap

OD-Database

OD-Database is a web-crawling project that aims to index a very large number of file links and their basic metadata from open directories (misconfigured Apache/Nginx/FTP servers, or more often, mirrors of various public services).
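To make "open directory" concrete: such a server returns an auto-generated HTML index page whose links are either files or sub-directories. The crawler actually in use is written in Go, so the following is only a minimal Python sketch of the link-extraction step, assuming an Apache/Nginx autoindex-style page made of plain `<a href="...">` links. The class name `DirListingParser` is hypothetical, and real listings (and the size/date metadata next to each link) vary more than this handles.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DirListingParser(HTMLParser):
    """Hypothetical helper: splits an autoindex page into file and sub-directory URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.files = []  # absolute URLs of file links
        self.dirs = []   # absolute URLs of sub-directories to crawl next

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip anchors without a target and column-sort links such as "?C=N;O=D"
        if not href or href.startswith("?"):
            return
        url = urljoin(self.base_url, href)
        # Only keep links strictly below the current directory; this drops
        # "Parent Directory" links whether they are written as "../" or "/"
        if not url.startswith(self.base_url) or url == self.base_url:
            return
        # Links ending in "/" are sub-directories, everything else is a file
        if href.endswith("/"):
            self.dirs.append(url)
        else:
            self.files.append(url)


page = """<html><body><h1>Index of /pub/</h1>
<a href="?C=N;O=D">Name</a>
<a href="/">Parent Directory</a>
<a href="iso/">iso/</a>
<a href="readme.txt">readme.txt</a>
</body></html>"""

parser = DirListingParser("http://example.com/pub/")
parser.feed(page)
print(parser.dirs)   # ['http://example.com/pub/iso/']
print(parser.files)  # ['http://example.com/pub/readme.txt']
```

A crawler would then queue each entry of `dirs` and repeat the same step, recording `files` as index entries.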

Each crawler instance fetches tasks from the central server and pushes the result once completed. A single instance can crawl hundreds of websites at the same time (both FTP and HTTP(S)), and the central server is capable of ingesting thousands of new documents per second.
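The fetch-task / push-result cycle can be illustrated with an in-process model. This is a sketch under loud assumptions: `CentralServer`, `fetch_task`, `push_result` and `crawl` are hypothetical names, the "server" is a pair of in-memory queues rather than the project's real network API, and a thread pool stands in for whatever concurrency mechanism the real crawler uses to handle many websites at once.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class CentralServer:
    """Hypothetical stand-in for the central server: hands out tasks, collects results."""

    def __init__(self, urls):
        self.tasks = queue.Queue()
        for url in urls:
            self.tasks.put(url)
        self.results = queue.Queue()  # thread-safe sink for completed crawls

    def fetch_task(self):
        """Return the next website to crawl, or None when no task is left."""
        try:
            return self.tasks.get_nowait()
        except queue.Empty:
            return None

    def push_result(self, url, file_links):
        self.results.put((url, file_links))


def crawl(url):
    """Placeholder: a real crawler would walk the HTTP(S)/FTP listing here."""
    return [url + "file.bin"]


def worker(server):
    # Keep pulling tasks until the server has none left, pushing each result
    while (url := server.fetch_task()) is not None:
        server.push_result(url, crawl(url))


def run_crawler(server, concurrency=100):
    # Many websites are crawled at the same time, one per worker thread;
    # leaving the "with" block waits for every worker to finish
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for _ in range(concurrency):
            pool.submit(worker, server)


server = CentralServer([f"http://site{i}.example/" for i in range(250)])
run_crawler(server, concurrency=100)
print(server.results.qsize())  # 250
```

Pulling work (rather than the server pushing it) lets each instance take on only as many websites as its concurrency limit allows.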

The data is indexed into Elasticsearch and made available via the web frontend (currently hosted at https://od-db.the-eye.eu/). There are currently ~1.93 billion files indexed (about 300 GB of raw data in total). The raw data is made available as a CSV file here.
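A quick back-of-the-envelope check puts these figures in proportion. It assumes the raw-data figure means about 300 decimal gigabytes and takes 1,000 documents per second as a conservative reading of "thousands of new documents per second"; both are assumptions, not measured values.

```python
files_indexed = 1.93e9   # "~1.93 billion files indexed"
raw_bytes = 300e9        # assumption: ~300 GB, read as decimal gigabytes
docs_per_second = 1_000  # assumption: lower bound of "thousands of documents per second"

avg_record_bytes = raw_bytes / files_indexed
days_at_1k = files_indexed / docs_per_second / 86_400

print(f"average raw data per file: {avg_record_bytes:.0f} bytes")           # 155 bytes
print(f"time to ingest everything at 1,000 docs/s: {days_at_1k:.1f} days")  # 22.3 days
```

Roughly 155 bytes per entry is consistent with each record being little more than a URL path plus a few metadata fields.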


Contributing

Suggestions, concerns and PRs are welcome.

Installation (Docker)

git clone --recursive https://github.com/simon987/od-database
cd od-database
mkdir oddb_pg_data/ tt_pg_data/ es_data/ wsb_data/
docker-compose up

Architecture

[Architecture diagram]

Running the crawl server

The Python crawler that was part of this project is discontinued; the Go implementation is currently in use.
