Norconex / Collector Http

Licence: apache-2.0
Norconex HTTP Collector is a flexible web crawler for collecting, parsing, and manipulating data from the Internet (or Intranet) to various data repositories such as search engines.

Programming Languages

java

Projects that are alternatives of or similar to Collector Http

Sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (+178.46%)
Mutual labels:  search-engine, web-crawler
Downloadsearch
Search for any kind of file to download
Stars: ✭ 124 (-4.62%)
Mutual labels:  search-engine
Hypertag
Knowledge Management for Humans using Machine Learning & Tags
Stars: ✭ 116 (-10.77%)
Mutual labels:  search-engine
Whoogle Search
A self-hosted, ad-free, privacy-respecting metasearch engine
Stars: ✭ 4,645 (+3473.08%)
Mutual labels:  search-engine
Viewcontroller
📌 A view controller manages a set of views that make up a portion of your app's user interface. It aims to make UI development changes clearer and more flexible. (ViewController is a component-based approach to UI development that lets you break complex UI screens into components.)
Stars: ✭ 117 (-10%)
Mutual labels:  flexible
Crawlab Lite
Lite version of Crawlab, a lightweight crawler management platform.
Stars: ✭ 122 (-6.15%)
Mutual labels:  web-crawler
Search Online
🔍A simple extension for VSCode to search online easily using search engine.
Stars: ✭ 115 (-11.54%)
Mutual labels:  search-engine
Curatedseotools
Best SEO Tools Stash
Stars: ✭ 128 (-1.54%)
Mutual labels:  search-engine
Proxy
A simple tool for fetching usable proxies from several websites.
Stars: ✭ 124 (-4.62%)
Mutual labels:  web-crawler
Pspider
A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
Stars: ✭ 1,611 (+1139.23%)
Mutual labels:  web-crawler
Haystack
🔍 Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.
Stars: ✭ 3,409 (+2522.31%)
Mutual labels:  search-engine
Wikiman
Wikiman is an offline search engine for manual pages, Arch Wiki, Gentoo Wiki and other documentation.
Stars: ✭ 117 (-10%)
Mutual labels:  search-engine
Querqy
Query preprocessor for Java-based search engines (Querqy Core and Solr implementation)
Stars: ✭ 122 (-6.15%)
Mutual labels:  search-engine
Xinahn Client
An open-source, privacy-focused, self-hosted metasearch engine. https://xinahn.com
Stars: ✭ 116 (-10.77%)
Mutual labels:  search-engine
Swift Selection Search
Swift Selection Search (SSS) is a simple Firefox add-on that lets you quickly search for some text in a page using your favorite search engines.
Stars: ✭ 125 (-3.85%)
Mutual labels:  search-engine
Tinysearch
🔍 Tiny, full-text search engine for static websites built with Rust and Wasm
Stars: ✭ 1,705 (+1211.54%)
Mutual labels:  search-engine
Flex.css
flex.css is a declarative layout library that is compatible with WeChat, UC, WebView, and other mainstream mobile browsers, and supports React, Vue, and Angular.
Stars: ✭ 1,537 (+1082.31%)
Mutual labels:  flexible
Dato.rss
The best RSS Search experience you can find
Stars: ✭ 122 (-6.15%)
Mutual labels:  search-engine
Instantsearch Android
A library of widgets and helpers to build instant-search applications on Android.
Stars: ✭ 129 (-0.77%)
Mutual labels:  search-engine
React Modal Box
React Modal Box is a simple, dependency-free, customizable React component for displaying modals in your application. Its simple event system lets you place the modal in the root component of your application and call it with simple mixins, which keeps things flexible. It also includes event-handling mixins, so you can detect when the modal is shown or hidden.
Stars: ✭ 126 (-3.08%)
Mutual labels:  flexible

Norconex HTTP Collector

Norconex HTTP Collector is a full-featured web crawler (or spider) that can manipulate and store collected data in a repository of your choice (e.g. a search engine). It is very flexible, powerful, easy to extend, and portable. It can be run from the command line with file-based configuration on any OS, or embedded into Java applications using well-documented APIs.

Visit the web site for binary downloads and documentation:

https://opensource.norconex.com/collectors/http/
