
imanhodjaev / dust

License: Apache-2.0
Archive web pages with all relevant assets, or save them as a single HTML file.

Programming Languages

elixir

Projects that are alternatives of or similar to dust

feedsearch-crawler
Crawl sites for RSS, Atom, and JSON feeds.
Stars: ✭ 23 (+21.05%)
Mutual labels:  scraping
kuwala
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition we provide third-party data into data sc…
Stars: ✭ 474 (+2394.74%)
Mutual labels:  scraping
shup
A POSIX shell script to parse HTML
Stars: ✭ 28 (+47.37%)
Mutual labels:  scraping
http interceptor
A lightweight, simple plugin that allows you to intercept request and response objects and modify them if desired.
Stars: ✭ 74 (+289.47%)
Mutual labels:  http-requests
AngleParse
HTML parsing and processing tool for PowerShell.
Stars: ✭ 35 (+84.21%)
Mutual labels:  scraping
naos
📉 Uptime and error monitoring CLI
Stars: ✭ 30 (+57.89%)
Mutual labels:  scraping
ferenda
Transform unstructured document collections to structured Linked Data
Stars: ✭ 22 (+15.79%)
Mutual labels:  scraping
scrapy facebooker
Collection of scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.
Stars: ✭ 22 (+15.79%)
Mutual labels:  scraping
requestify
Parse a raw HTTP request and generate request code in different languages
Stars: ✭ 25 (+31.58%)
Mutual labels:  http-requests
chirps
Twitter bot powering @arichduvet
Stars: ✭ 35 (+84.21%)
Mutual labels:  scraping
subscene scraper
Library to download subtitles from subscene.com
Stars: ✭ 14 (-26.32%)
Mutual labels:  scraping
centra
Core Node.js HTTP client
Stars: ✭ 52 (+173.68%)
Mutual labels:  http-requests
Scraper-Projects
🕸 List of mini projects that involve web scraping 🕸
Stars: ✭ 25 (+31.58%)
Mutual labels:  scraping
node-fetch-har
Generate HAR entries for requests made with node-fetch
Stars: ✭ 23 (+21.05%)
Mutual labels:  http-requests
dmi-instascraper
A GUI for Instaloader to scrape users and hashtags on Instagram
Stars: ✭ 21 (+10.53%)
Mutual labels:  scraping
Captcha-Tools
All-in-one Python (and now Go!) module to help solve captchas with the Capmonster, 2captcha, and Anticaptcha APIs!
Stars: ✭ 23 (+21.05%)
Mutual labels:  scraping
web-clipper
Easily download the main content of a web page in html, markdown, and/or epub format from command line.
Stars: ✭ 15 (-21.05%)
Mutual labels:  scraping
pomp
Screen scraping and web crawling framework
Stars: ✭ 61 (+221.05%)
Mutual labels:  scraping
nativescript-http
The best way to do HTTP requests in NativeScript: a drop-in replacement for the core HTTP module, with important improvements and additions such as proper connection pooling, form data support, and certificate pinning.
Stars: ✭ 32 (+68.42%)
Mutual labels:  http-requests
image-collector
Download images from Google Image Search
Stars: ✭ 38 (+100%)
Mutual labels:  scraping


Dust

NOTE: This project is still under development, so you might experience issues.

Installation 💾

If available in Hex, the package can be installed by adding dust to your list of dependencies in mix.exs:

def deps do
  [
    {:dust, "~> 0.0.2-dev"}
  ]
end
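After adding the dependency, fetching and compiling it follows the usual Mix workflow (assuming a standard Mix project):

```shell
# download dust and its dependencies, then compile the project
mix deps.get
mix compile
```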

Usage 🧠

"https://github.com"
|> Dust.get()
|> Dust.persist("AWESOME/PAGE.HTML")

# headers is a list of request header tuples, for example:
headers = [{"user-agent", "dust"}]

"https://times.com"
|> Dust.get(
  headers: headers,
  # proxy accepts either a %Proxy{} struct or a proxy URL string,
  # e.g. "socks5://user:password@host:port"
  proxy: "socks5://user:password@host:port",
  max_retries: 3
)
|> Dust.persist("AWESOME/PAGE.HTML")
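The pipelines above assume the happy path. If Dust.get/2 follows the common {:ok, result} / {:error, reason} convention (an assumption for illustration, not confirmed by this README), failures can be handled explicitly:

```elixir
case Dust.get("https://github.com") do
  {:ok, result} ->
    # persist only after a successful fetch (hypothetical return shape)
    Dust.persist(result, "AWESOME/PAGE.HTML")

  {:error, reason} ->
    # e.g. log the failure once max_retries is exhausted
    {:error, reason}
end
```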

Documentation

Documentation can be generated with ExDoc and published on HexDocs. Once published, the docs can be found at https://hexdocs.pm/dust.
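Concretely, with ex_doc added as a dev dependency, the docs are built locally with the standard Mix task (a sketch of the usual ExDoc workflow; the version constraint is illustrative, not project-specific):

```shell
# in mix.exs deps: {:ex_doc, "~> 0.24", only: :dev, runtime: false}
mix deps.get
mix docs
# output lands in doc/; open doc/index.html in a browser
```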

Assets 💄

https://www.flaticon.com/free-icon/dust_867847
