Mimo-Crawler: A web crawler that uses Firefox and JavaScript injection to interact with webpages and crawl their content, written in Node.js.
Stars: ✭ 22 (-99.03%)
Hive: Apache Hive
Stars: ✭ 4,031 (+77.03%)
Antch: A fast, powerful and extensible web crawling and scraping framework for Go
Stars: ✭ 198 (-91.3%)
implyr: SQL backend to dplyr for Impala
Stars: ✭ 74 (-96.75%)
flink-crawler: Continuous, scalable web crawler built on top of Flink and crawler-commons
Stars: ✭ 48 (-97.89%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), with code examples
Stars: ✭ 150 (-93.41%)
Spidy: The simple, easy-to-use command-line web crawler.
Stars: ✭ 257 (-88.71%)
Gopa: [WIP] GOPA, a spider for Elasticsearch written in Golang. Demo: http://index.elasticsearch.cn
Stars: ✭ 277 (-87.83%)
hive-jdbc-driver: An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC
Stars: ✭ 31 (-98.64%)
Tez: Apache Tez
Stars: ✭ 313 (-86.25%)
Hive Jdbc Uber Jar: Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version
Stars: ✭ 188 (-91.74%)
Owasp Mth3l3m3nt Framework: OWASP Mth3l3m3nt Framework is a penetration-testing aid and exploitation framework. It follows the principle of attacking the web using the web, and supports pentesting on the go through its responsive interface.
Stars: ✭ 139 (-93.9%)
Gobblin: A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
Stars: ✭ 2,006 (-11.9%)
Eel Sdk: Big data toolkit for the JVM
Stars: ✭ 140 (-93.85%)
Xlearning: AI on Hadoop
Stars: ✭ 1,709 (-24.95%)
Linkedin Profile Scraper: 🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (-92.49%)
Htaccess: ✂ A collection of useful .htaccess snippets.
Stars: ✭ 11,830 (+419.54%)
Newspaper: News, full-text, and article metadata extraction in Python 3.
Stars: ✭ 11,545 (+407.03%)
Abot: Cross-platform C# web crawler framework built for speed and flexibility.
Stars: ✭ 1,961 (-13.88%)
Collector Http: Norconex HTTP Collector is a flexible web crawler for collecting, parsing, and manipulating data from the Internet (or an intranet) and storing it in various data repositories such as search engines.
Stars: ✭ 130 (-94.29%)
Massivedl: Download a large list of files concurrently
Stars: ✭ 141 (-93.81%)
Geode: Apache Geode
Stars: ✭ 2,016 (-11.46%)
Azure Event Hubs Spark: Enabling continuous data processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (-93.85%)
Instagram Bot: An Instagram bot developed using the Selenium framework
Stars: ✭ 138 (-93.94%)
Presto: The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+469.04%)
N2h4: A tool for collecting Naver News articles
Stars: ✭ 177 (-92.23%)
Beyond Jupyter: 🐍💻📊 All material from the PyCon.DE 2018 talk "Beyond Jupyter Notebooks - Building your own data science platform with Python & Docker" (incl. slides, video, Udemy MOOC and other references)
Stars: ✭ 135 (-94.07%)
Holiday Cn: 📅🇨🇳 China's statutory public holiday data, automatically scraped daily from State Council announcements
Stars: ✭ 157 (-93.1%)
Mod auth cas: An Apache httpd module for integrating with the Apereo CAS Server project.
Stars: ✭ 130 (-94.29%)
Htconvert: Convert .htaccess redirects to nginx.conf redirects
Stars: ✭ 171 (-92.49%)
Hadoop Common: Mirror of Apache Hadoop Common
Stars: ✭ 155 (-93.19%)
Gaffer: A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (-27.89%)
Airflow Pipeline: An Airflow Docker image preconfigured to work well with Spark and Hadoop/EMR
Stars: ✭ 128 (-94.38%)
Goaccess: GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal on *nix systems or through your browser.
Stars: ✭ 14,096 (+519.06%)
Bigdata Playground: A complete example of a big data application using Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular and GraphQL
Stars: ✭ 177 (-92.23%)
Deeplearning4j: Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learni…
Stars: ✭ 12,277 (+439.17%)
Movie recommend: A Spark-based movie recommendation system, including a crawler project, a website, a back-end admin system, and a Spark recommendation engine
Stars: ✭ 2,092 (-8.12%)
Serverpilot Letsencrypt: Automate the installation of Let's Encrypt SSL on the free plan of ServerPilot
Stars: ✭ 129 (-94.33%)
Spydra: Ephemeral Hadoop clusters using Google Compute Platform
Stars: ✭ 128 (-94.38%)
Correios: A client library for Brazilian Correios APIs and services (SIGEP & SRO).
Stars: ✭ 153 (-93.28%)
Griffon Vm: Griffon Data Science Virtual Machine
Stars: ✭ 128 (-94.38%)
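Bhban rpa: Example code for the book "Work Automation That Finishes Six Months of Work in a Single Day" (Saengneung Publishing, 2020). The examples are written for readers who have never learned Python and cover a wide range of work-automation topics, from Excel to design, macros, and web crawling.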
Stars: ✭ 124 (-94.55%)
Qpid Proton: Mirror of Apache Qpid Proton
Stars: ✭ 164 (-92.8%)
Hadoop Hdfs: Mirror of Apache Hadoop HDFS
Stars: ✭ 152 (-93.32%)
Newznab Tmux: Laravel-based Usenet indexer
Stars: ✭ 127 (-94.42%)
Corpuscrawler: Crawler for linguistic corpora
Stars: ✭ 127 (-94.42%)
Hadoopcryptoledger: Hadoop Crypto Ledger, for analyzing crypto ledgers such as the Bitcoin blockchain on big data platforms such as Hadoop, Spark, Flink and Hive
Stars: ✭ 126 (-94.47%)
Parquet4s: Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Stars: ✭ 125 (-94.51%)
Guacamole Install Rhel 7: Apache Guacamole installation Bash script for RHEL 7 and CentOS 7, including options for Nginx, HTTPS, SSL, LDAP, Let's Encrypt certificates and more
Stars: ✭ 174 (-92.36%)
Big Whale: Scheduling of offline (batch) jobs such as Spark and Flink jobs, and monitoring of real-time jobs
Stars: ✭ 163 (-92.84%)