
cmiles74 / scraper

License: EPL-1.0
A simple web scraper built around the JavaFX WebEngine

Programming language: Clojure

Projects that are similar to, or alternatives to, scraper

Youtube Comment Suite
Download YouTube comments from numerous videos, playlists, and channels for archiving, general search, and showing activity.
Stars: ✭ 120 (+823.08%)
Mutual labels:  scraper, javafx
wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (+300%)
Mutual labels:  scraper
JxBrowser-Examples
JxBrowser Examples & Tutorials
Stars: ✭ 49 (+276.92%)
Mutual labels:  javafx
orson-charts
A 3D chart library for Java applications (JavaFX, Swing or server-side).
Stars: ✭ 94 (+623.08%)
Mutual labels:  javafx
diosts
A Go scraper that validates security.txt files and outputs them in the disclose.io JSON format.
Stars: ✭ 18 (+38.46%)
Mutual labels:  scraper
scraped-tvtime-api
A free TVTime API based on scraping TVTime website. No API key required
Stars: ✭ 23 (+76.92%)
Mutual labels:  scraper
JMarkPad
Minimalistic markdown editor with real-time preview
Stars: ✭ 32 (+146.15%)
Mutual labels:  javafx
Brutal-wordlist-Generator
Brutal Wordlist Generator is a Java-based application for generating wordlists, with a user-friendly interface
Stars: ✭ 24 (+84.62%)
Mutual labels:  javafx
ScrapeM
A monadic web scraping library
Stars: ✭ 17 (+30.77%)
Mutual labels:  scraper
spring-javafx-material-design-admin
Desktop application for inventory and sales management with Spring Boot, JavaFX and Material Design
Stars: ✭ 56 (+330.77%)
Mutual labels:  javafx
ElectronicStoreToolFX
Management tool for an electronics store JavaFx + MySQL
Stars: ✭ 18 (+38.46%)
Mutual labels:  javafx
ModLoaderInstaller
JavaFX application that installs a mod loader for the game The Long Dark
Stars: ✭ 52 (+300%)
Mutual labels:  javafx
VK-Scraper
Scrapes VK user's photos
Stars: ✭ 42 (+223.08%)
Mutual labels:  scraper
esaj
Scrapers for many e-SAJ systems
Stars: ✭ 35 (+169.23%)
Mutual labels:  scraper
InvMan
Open source JavaFX inventory management application
Stars: ✭ 40 (+207.69%)
Mutual labels:  javafx
angel.co-companies-list-scraping
No description or website provided.
Stars: ✭ 54 (+315.38%)
Mutual labels:  scraper
MiniMetro Game
Java educational project with JavaFX
Stars: ✭ 20 (+53.85%)
Mutual labels:  javafx
PiHoleWidgets
PiHole Widgets using JavaFX
Stars: ✭ 67 (+415.38%)
Mutual labels:  javafx
tqrespec
TQRespec - The respec tool for Titan Quest game
Stars: ✭ 59 (+353.85%)
Mutual labels:  javafx
MangaReaderScraper
Search and download mangas from the command line
Stars: ✭ 23 (+76.92%)
Mutual labels:  scraper

Scraper

This project provides a web scraping library built around the JavaFX WebEngine, which is itself built on top of WebKit. The goal is a robust, easy-to-use web scraper that doesn't require an external binary to function. With the introduction of Java 8, this is finally beginning to seem feasible.

If you find this code useful in any way, please feel free to buy me a coffee.

Usage

It's still early days: this project hasn't reached the point where we're releasing builds of the library. You can, however, check out the project and build it yourself, then depend on it with the following coordinate:

[com.nervestaple/scraper "0.1.0-SNAPSHOT"]
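To use a locally built copy from another Leiningen project, one approach is to run `lein install` in your checkout (which installs the snapshot into your local Maven repository) and then list the coordinate under that project's `:dependencies`. The sketch below assumes Leiningen; the project name `my-scraper-app` and the Clojure version are placeholders, not values taken from this project.

```clojure
;; project.clj for a hypothetical application that uses the scraper library.
;; ASSUMPTION: `lein install` has already been run inside your checkout of
;; scraper, so 0.1.0-SNAPSHOT can be resolved from the local Maven repository.
(defproject my-scraper-app "0.1.0-SNAPSHOT"              ; placeholder name
  :description "Hypothetical app depending on a locally built scraper"
  :dependencies [[org.clojure/clojure "1.8.0"]           ; placeholder version
                 [com.nervestaple/scraper "0.1.0-SNAPSHOT"]])
```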

It's probably more fun to check out the project and interact with it directly from the REPL.

$ cd scraper
$ lein repl

From there it's easy to get a handle on a WebEngine instance and scrape out some content.

user> (def we (scraper/get-web-engine))

#'user/we

user> (scraper/load-url we "http://twitch.nervestaple.com")
{:state :ready}

user> (scraper/load-artoo we)
{:state :ready}

user> (scraper/scrape we "h1" {:title "text"})

(({"title" "Bishop: Makes Your Web Service Shiny"}
  {"title" "Why Is My Web Service API Crappy?"}
  {"title" "All Your HBase Are Belong to Clojure"})
 ({"title" "Work In Progress"}
  {"title" "Linux Is All About Choices"}
  {"title" "Real Life Web App Integration Testing (IT) with Spring"}
  {"title" "Bishop: Makes Your Web Service Shiny"}
  {"title" "Why Is My Web Service API Crappy?"}
  {"title" "All Your HBase Are Belong to Clojure"}))
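The keys in the maps returned by `scrape` are strings ("title"), not keywords, so `(:title m)` returns nil on them. As a minimal sketch, assuming the result is a sequence of sequences of maps as the grouping in the REPL output suggests, the plain Clojure below collects each title once. The function name `distinct-titles` is hypothetical and not part of this library; the data is copied from the output above.

```clojure
;; Sample data copied from the REPL output above.
;; ASSUMPTION: scrape returns a sequence of sequences of maps with string keys.
(def results
  '(({"title" "Bishop: Makes Your Web Service Shiny"}
     {"title" "Why Is My Web Service API Crappy?"}
     {"title" "All Your HBase Are Belong to Clojure"})
    ({"title" "Work In Progress"}
     {"title" "Linux Is All About Choices"}
     {"title" "Real Life Web App Integration Testing (IT) with Spring"}
     {"title" "Bishop: Makes Your Web Service Shiny"}
     {"title" "Why Is My Web Service API Crappy?"}
     {"title" "All Your HBase Are Belong to Clojure"})))

(defn distinct-titles
  "Hypothetical helper: flattens a (possibly nested) scrape result and
  returns each title once, in first-seen order. Keys are strings, so
  (get m \"title\") is used; (:title m) would return nil."
  [scraped]
  (->> scraped
       flatten                  ; maps are not sequential, so they stay intact
       (map #(get % "title"))   ; string key, not keyword
       distinct))               ; drop repeats, keep first occurrence order

(distinct-titles results)
```

Because `flatten` only descends into sequential collections, the same function also works if `scrape` returns a flat sequence of maps.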

As the example above shows, the call to load-artoo injects the Artoo.js JavaScript scraping library into the loaded page to make your scraping easier. You are welcome! ;-)

If you want to see the content your WebEngine instance is loading, get a handle on a WebView instead. This opens a new window displaying the WebView, and its WebEngine is available under the :web-engine key.

user> (def wv (scraper/get-web-view))

#'user/wv

user> (def we (:web-engine wv))

#'user/we

Work on the project continues, but this should be enough to get you started.

