580 open source projects that are alternatives to or similar to Nutch

Ansible Config encoder filters
Ansible role used to deliver the Config Encoder Filters.
Stars: ✭ 48 (-97.89%)
Mutual labels:  apache
apache-baseline
DevSec Apache Baseline - InSpec Profile
Stars: ✭ 37 (-98.38%)
Mutual labels:  apache
Hbaseclient
HBase client data management software
Stars: ✭ 135 (-94.07%)
Mutual labels:  hadoop
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-95.13%)
Mutual labels:  hadoop
Moosefs
MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (-54.98%)
Mutual labels:  hadoop
Apache-Directory-Listing
A directory listing theme for Apache
Stars: ✭ 138 (-93.94%)
Mutual labels:  apache
Echarts
Apache ECharts is a powerful, interactive charting and data visualization library for browser
Stars: ✭ 49,119 (+2057.18%)
Mutual labels:  apache
qpid-jms
Mirror of Apache Qpid JMS
Stars: ✭ 60 (-97.36%)
Mutual labels:  apache
Crawlab
Distributed web crawler admin platform for spider management, regardless of language or framework.
Stars: ✭ 8,392 (+268.56%)
Mutual labels:  web-crawler
BigData-News
Real-time big data system project for a news site, based on Spark 2.2
Stars: ✭ 36 (-98.42%)
Mutual labels:  hadoop
N2h4
A tool for collecting Naver news
Stars: ✭ 177 (-92.23%)
Mutual labels:  crawling
ap-airflow
Astronomer Core Docker Images
Stars: ✭ 87 (-96.18%)
Mutual labels:  apache
Nagios Plugins
450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
Stars: ✭ 1,000 (-56.08%)
Mutual labels:  hadoop
Superset
Apache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+1772.38%)
Mutual labels:  apache
Spider Flow
A next-generation crawler platform that defines crawl flows graphically, so crawlers can be built without writing code.
Stars: ✭ 365 (-83.97%)
Mutual labels:  web-crawler
Addax
Addax is an open source universal ETL tool that supports most RDBMS and NoSQL databases, helping you transfer data from any one place to another.
Stars: ✭ 615 (-72.99%)
Mutual labels:  hadoop
Weblogsanalysissystem
A big data platform for analyzing web access logs
Stars: ✭ 37 (-98.38%)
Mutual labels:  hadoop
img-cli
An interactive command-line interface built in Node.js for downloading one or more images to disk from a URL
Stars: ✭ 15 (-99.34%)
Mutual labels:  crawling
Beyond Jupyter
🐍💻📊 All material from the PyCon.DE 2018 Talk "Beyond Jupyter Notebooks - Building your own data science platform with Python & Docker" (incl. Slides, Video, Udemy MOOC & other References)
Stars: ✭ 135 (-94.07%)
Mutual labels:  apache
fluent-plugin-webhdfs
Hadoop WebHDFS output plugin for Fluentd
Stars: ✭ 57 (-97.5%)
Mutual labels:  hadoop
Jsr203 Hadoop
A Java NIO file system provider for HDFS
Stars: ✭ 35 (-98.46%)
Mutual labels:  hadoop
popular restaurants from officials
A project for finding genuinely good restaurants by analyzing the business expense records of Seoul city officials
Stars: ✭ 22 (-99.03%)
Mutual labels:  crawling
Pulsar
Turn large Web sites into tables and charts using simple SQLs.
Stars: ✭ 100 (-95.61%)
Mutual labels:  web-crawler
Hive Funnel Udf
Hive UDFs for funnel analysis
Stars: ✭ 72 (-96.84%)
Mutual labels:  hadoop
Webster
a reliable high-level web crawling & scraping framework for Node.js.
Stars: ✭ 364 (-84.01%)
Mutual labels:  crawling
Awesome Cordova Plugins
A curated list of awesome Cordova Apache Plugins https://cordova.apache.org/plugins/
Stars: ✭ 33 (-98.55%)
Mutual labels:  apache
SlackWebhooksGithubCrawler
Search for Slack Webhooks token publicly exposed on Github
Stars: ✭ 21 (-99.08%)
Mutual labels:  crawling
Holiday Cn
📅🇨🇳 Chinese statutory holiday data, automatically scraped daily from State Council announcements
Stars: ✭ 157 (-93.1%)
Mutual labels:  crawling
serverless-instagram-crawler
serverless, instagram hashtag crawler with lambda, dynamoDB
Stars: ✭ 33 (-98.55%)
Mutual labels:  crawling
Akkeeper
An easy way to deploy your Akka services to a distributed environment.
Stars: ✭ 30 (-98.68%)
Mutual labels:  hadoop
SchweizerMesser
🎯 Python 3 web crawler practice and data analysis collection | Dangdang | NetEase Cloud Music | Unsplash | Pizza Hut | Maoyan |
Stars: ✭ 89 (-96.09%)
Mutual labels:  web-crawler
Bigdata Notebook
Stars: ✭ 100 (-95.61%)
Mutual labels:  hadoop
swordfish
Open-source distributed workflow scheduling tool that also supports streaming tasks.
Stars: ✭ 35 (-98.46%)
Mutual labels:  hadoop
Storm Camel Example
Real-time analysis and visualization with Storm-AMQ-Camel-Websockets-Highcharts integration.
Stars: ✭ 28 (-98.77%)
Mutual labels:  hadoop
qpid-dispatch
Mirror of Apache Qpid Dispatch
Stars: ✭ 62 (-97.28%)
Mutual labels:  apache
Mod auth cas
An Apache httpd module for integrating with Apereo CAS Server project.
Stars: ✭ 130 (-94.29%)
Mutual labels:  apache
pomp
Screen scraping and web crawling framework
Stars: ✭ 61 (-97.32%)
Mutual labels:  crawling
Cdc Kafka Hadoop
MySQL to NoSQL real time dataflow
Stars: ✭ 13 (-99.43%)
Mutual labels:  hadoop
jumbo
🐘 A local Hadoop cluster bootstrapper using Vagrant, Ansible, and Ambari.
Stars: ✭ 17 (-99.25%)
Mutual labels:  hadoop
Stormkafkamon
Dumps state of Storm Kafka consumers
Stars: ✭ 99 (-95.65%)
Mutual labels:  apache
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-98.51%)
Mutual labels:  hadoop
Bigdata Interview
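🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from the web, with my own answer summaries. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks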
Stars: ✭ 857 (-62.36%)
Mutual labels:  hadoop
pyCreeper
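An information-gathering (crawler) framework for quickly extracting web page content, with support for dynamically loading and controlling pages.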
Stars: ✭ 25 (-98.9%)
Mutual labels:  web-crawler
Htconvert
Convert .htaccess redirects to nginx.conf redirects
Stars: ✭ 171 (-92.49%)
Mutual labels:  apache
TIL
Today I Learned
Stars: ✭ 43 (-98.11%)
Mutual labels:  hadoop
Dockerfiles
50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu
Stars: ✭ 847 (-62.8%)
Mutual labels:  hadoop
ModSecurityCRS
Implementation of ModSecurity, Core Rule Set (CRS) on Apache server. ModSecurity, sometimes called Modsec, is an open-source web application firewall. ModSecurity was installed and configured on an Ubuntu VM using VirtualBox
Stars: ✭ 24 (-98.95%)
Mutual labels:  apache
Bigdata Notes
A beginner's guide to big data ⭐
Stars: ✭ 10,991 (+382.7%)
Mutual labels:  hadoop
PoC-CVE-2021-41773
No description or website provided.
Stars: ✭ 39 (-98.29%)
Mutual labels:  apache
Akarata
Indonesian stemmer - a JavaScript library for extracting the root word from affixed words in Indonesian.
Stars: ✭ 26 (-98.86%)
Mutual labels:  apache
Calcite Avatica
Mirror of Apache Calcite - Avatica
Stars: ✭ 130 (-94.29%)
Mutual labels:  hadoop
Sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (-84.1%)
Mutual labels:  web-crawler
Awesome Puppeteer
A curated list of awesome puppeteer resources.
Stars: ✭ 1,728 (-24.11%)
Mutual labels:  crawling
Docker Debian Base
More complete Debian environment for Docker
Stars: ✭ 70 (-96.93%)
Mutual labels:  apache
Ansible Role Apache
Ansible Role - Apache 2.x.
Stars: ✭ 341 (-85.02%)
Mutual labels:  apache
Server Error Pages
Easy to use, professional error pages to replace the plaintext error pages that come with any server software like Nginx or Apache
Stars: ✭ 338 (-85.16%)
Mutual labels:  apache
Redirect.rules
Quick and dirty dynamic redirect.rules generator
Stars: ✭ 69 (-96.97%)
Mutual labels:  apache
Ytk Learn
Ytk-learn is a distributed machine learning library which implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Stars: ✭ 337 (-85.2%)
Mutual labels:  hadoop
Ozone
Scalable, redundant, and distributed object store for Apache Hadoop
Stars: ✭ 330 (-85.51%)
Mutual labels:  hadoop
Zhihu Crawler People
A simple distributed crawler for zhihu && data analysis
Stars: ✭ 182 (-92.01%)
Mutual labels:  web-crawler
301-360 of 580 similar projects