580 Open source projects that are alternatives of or similar to Nutch

Geode
Apache Geode
Stars: ✭ 2,016 (-11.46%)
Mutual labels:  apache
Memex Explorer
Viewers for statistics and dashboarding of Domain Search Engine data
Stars: ✭ 115 (-94.95%)
Mutual labels:  apache
Azure Event Hubs Spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (-93.85%)
Mutual labels:  apache
Tensorflowonyarn
Support TensorFlow on YARN
Stars: ✭ 114 (-94.99%)
Mutual labels:  hadoop
Apache exporter
Prometheus exporter for Apache.
Stars: ✭ 172 (-92.45%)
Mutual labels:  apache
Xlearning Xdml
extremely distributed machine learning
Stars: ✭ 113 (-95.04%)
Mutual labels:  hadoop
Instagram Bot
An Instagram bot developed using the Selenium Framework
Stars: ✭ 138 (-93.94%)
Mutual labels:  crawling
Avro Hadoop Starter
Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Stars: ✭ 110 (-95.17%)
Mutual labels:  hadoop
Presto
The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+469.04%)
Mutual labels:  hadoop
Waterdrop
Production Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (-18.49%)
Mutual labels:  hadoop
Hbaseclient
HBase client data management software
Stars: ✭ 135 (-94.07%)
Mutual labels:  hadoop
Echarts
Apache ECharts is a powerful, interactive charting and data visualization library for the browser
Stars: ✭ 49,119 (+2057.18%)
Mutual labels:  apache
N2h4
A tool for collecting Naver News articles
Stars: ✭ 177 (-92.23%)
Mutual labels:  crawling
Superset
Apache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+1772.38%)
Mutual labels:  apache
Beyond Jupyter
🐍💻📊 All material from the PyCon.DE 2018 Talk "Beyond Jupyter Notebooks - Building your own data science platform with Python & Docker" (incl. Slides, Video, Udemy MOOC & other References)
Stars: ✭ 135 (-94.07%)
Mutual labels:  apache
Pulsar
Turn large websites into tables and charts using simple SQL.
Stars: ✭ 100 (-95.61%)
Mutual labels:  web-crawler
Holiday Cn
📅🇨🇳 Chinese statutory holiday data, scraped automatically every day from State Council announcements
Stars: ✭ 157 (-93.1%)
Mutual labels:  crawling
Bigdata Notebook
Stars: ✭ 100 (-95.61%)
Mutual labels:  hadoop
Mod auth cas
An Apache httpd module for integrating with Apereo CAS Server project.
Stars: ✭ 130 (-94.29%)
Mutual labels:  apache
Stormkafkamon
Dumps state of Storm Kafka consumers
Stars: ✭ 99 (-95.65%)
Mutual labels:  apache
Htconvert
Convert .htaccess redirects to nginx.conf redirects
Stars: ✭ 171 (-92.49%)
Mutual labels:  apache
Bigdata Notes
A beginner's guide to big data ⭐
Stars: ✭ 10,991 (+382.7%)
Mutual labels:  hadoop
Calcite Avatica
Mirror of Apache Calcite - Avatica
Stars: ✭ 130 (-94.29%)
Mutual labels:  hadoop
Grawler
Grawler is a tool written in PHP which comes with a web interface that automates the task of using google dorks, scrapes the results, and stores them in a file.
Stars: ✭ 98 (-95.7%)
Mutual labels:  crawling
Hadoop Common
Mirror of Apache Hadoop common
Stars: ✭ 155 (-93.19%)
Mutual labels:  hadoop
Incubator Hop
Hop Orchestration Platform
Stars: ✭ 94 (-95.87%)
Mutual labels:  apache
Airflow Pipeline
An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Stars: ✭ 128 (-94.38%)
Mutual labels:  hadoop
Wifi
A big data query and analysis system based on information captured over Wi-Fi
Stars: ✭ 93 (-95.92%)
Mutual labels:  hadoop
Goaccess
GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
Stars: ✭ 14,096 (+519.06%)
Mutual labels:  apache
Tinkerpop
Apache TinkerPop - a graph computing framework
Stars: ✭ 1,309 (-42.51%)
Mutual labels:  apache
Spydra
Ephemeral Hadoop clusters using Google Compute Platform
Stars: ✭ 128 (-94.38%)
Mutual labels:  hadoop
Dockerweb
A docker-powered bash script for shared web hosting management. The ultimate Docker LAMP/LEMP Stack.
Stars: ✭ 89 (-96.09%)
Mutual labels:  apache
Correios
A client library for Brazilian Correios APIs and services (SIGEP & SRO).
Stars: ✭ 153 (-93.28%)
Mutual labels:  apache
Hadoop Mapreduce
Mirror of Apache Hadoop MapReduce
Stars: ✭ 88 (-96.14%)
Mutual labels:  hadoop
Bhban rpa
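Example code for the book "Work Automation: Finishing Six Months of Work in One Day" (생능출판사, 2020). The examples are written for readers who have never learned Python and cover a range of work-automation areas, from Excel to design, macros, and crawling.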
Stars: ✭ 124 (-94.55%)
Mutual labels:  crawling
Docker Superset
Repository for Docker Image of Apache-Superset. [Docker Image: https://hub.docker.com/r/abhioncbr/docker-superset]
Stars: ✭ 86 (-96.22%)
Mutual labels:  apache
Qpid Proton
Mirror of Apache Qpid Proton
Stars: ✭ 164 (-92.8%)
Mutual labels:  apache
Spark States
Custom state store providers for Apache Spark
Stars: ✭ 83 (-96.35%)
Mutual labels:  apache
Corpuscrawler
Crawler for linguistic corpora
Stars: ✭ 127 (-94.42%)
Mutual labels:  crawling
Docker Hadoop Cluster
Multiple node cluster on Docker for self development.
Stars: ✭ 82 (-96.4%)
Mutual labels:  hadoop
Learn machine learning
Road to Machine Learning
Stars: ✭ 81 (-96.44%)
Mutual labels:  hadoop
Dig Etl Engine
Download DIG to run on your laptop or server.
Stars: ✭ 81 (-96.44%)
Mutual labels:  crawling
Parquet4s
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Stars: ✭ 125 (-94.51%)
Mutual labels:  hadoop
Guacamole Install Rhel 7
Apache Guacamole installation bash script for RHEL 7 and CentOS 7 including options for Nginx, HTTPS, SSL, LDAP, Let's Encrypt certificates and more
Stars: ✭ 174 (-92.36%)
Mutual labels:  apache
Hudi Resources
A collection of Apache Hudi resources
Stars: ✭ 79 (-96.53%)
Mutual labels:  apache
Big Whale
Scheduling of offline tasks such as Spark and Flink, and monitoring of real-time tasks
Stars: ✭ 163 (-92.84%)
Mutual labels:  hadoop
Awesome Web Scraper
A collection of awesome web scrapers and crawlers.
Stars: ✭ 147 (-93.54%)
Mutual labels:  web-crawler
Squidwarc
Squidwarc is a high-fidelity, user-scriptable archival crawler that uses Chrome or Chromium in headful or headless mode
Stars: ✭ 125 (-94.51%)
Mutual labels:  crawling
Ultimate Dork
Web Crawler
Stars: ✭ 79 (-96.53%)
Mutual labels:  web-crawler
Mod auth gssapi
GSSAPI Negotiate module for Apache
Stars: ✭ 78 (-96.57%)
Mutual labels:  apache
Lucenenet
Apache Lucene.NET
Stars: ✭ 1,704 (-25.16%)
Mutual labels:  apache
Php Apache Tika
Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
Stars: ✭ 76 (-96.66%)
Mutual labels:  apache
Docker Spark
🚢 Docker image for Apache Spark
Stars: ✭ 78 (-96.57%)
Mutual labels:  hadoop
Crawler
Go process used to crawl websites
Stars: ✭ 147 (-93.54%)
Mutual labels:  crawling
Proxy
A simple tool for fetching usable proxies from several websites.
Stars: ✭ 124 (-94.55%)
Mutual labels:  web-crawler
Chukwa
Mirror of Apache Chukwa
Stars: ✭ 77 (-96.62%)
Mutual labels:  hadoop
Poi Android
📈 Apache POI for Android
Stars: ✭ 77 (-96.62%)
Mutual labels:  apache
Azure Event Hubs For Kafka
Azure Event Hubs for Apache Kafka Ecosystems
Stars: ✭ 124 (-94.55%)
Mutual labels:  apache
Tf Yarn
Train TensorFlow models on YARN in just a few lines of code!
Stars: ✭ 76 (-96.66%)
Mutual labels:  hadoop
Rare
Fast, realtime regex-extraction, and aggregation into common formats such as histograms, numerical summaries, tables, and more!
Stars: ✭ 76 (-96.66%)
Mutual labels:  apache
61-120 of 580 similar projects