DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to suit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

Stars: ✭ 100 (-45.65%)

Mutual labels: webscraping

Anirip

🎬 A Crunchyroll show/season ripper

Stars: ✭ 127 (-30.98%)

Mutual labels: webscraping

Covid 19 jhu data web scrap and cleaning

This repository contains data and code used to get and clean data from https://github.com/CSSEGISandData/COVID-19 and https://www.worldometers.info/coronavirus/

Stars: ✭ 80 (-56.52%)

Mutual labels: webscraping

Stardox

Github stargazers information gathering tool

Stars: ✭ 130 (-29.35%)

Mutual labels: webscraping

Keeper Core Api

Nunux Keeper core API

Stars: ✭ 55 (-70.11%)

Mutual labels: webscraping

Geeksforgeeksscrapper

Scrapes g4g and creates PDF

Stars: ✭ 124 (-32.61%)

Mutual labels: webscraping

Hq bot

📲 Bot to help solve HQ trivia

Stars: ✭ 167 (-9.24%)

Mutual labels: webscraping

Youtube Projects

This repository contains all the code I use in my YouTube tutorials.

Stars: ✭ 144 (-21.74%)

Mutual labels: webscraping

Operating Systems Three Easy Pieces

operating systems three easy pieces by Rezmi

Stars: ✭ 128 (-30.43%)

Mutual labels: webscraping

Falkor

A web service for turning HTML pages into traversable JSON documents

Very early stage of development. If you have any feature requests, just create an issue on the project.

Getting started

Running the server locally

lein uberjar
docker build -t falkor .
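# -p publishes the container port on the host (port 5000 is assumed from the URL below)
docker run -t -p 5000:5000 falkor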

# Visit http://localhost:5000

Coming soon

Better error handling
CORS
Query filtering (return only certain attributes)
Fetching multiple elements in a single request (e.g. [h1 > a, .subtitle])

Usage

Get all the title links from the Reddit.com home page

https://falkor-api.herokuapp.com/api/query?url=http://reddit.com&query=a.title

Grab all the news stories from Digg.com

https://falkor-api.herokuapp.com/api/query?url=http://digg.com&query=.story-title%20a

Extract all the images from Digg.com

https://falkor-api.herokuapp.com/api/query?url=http://digg.com&query=img[src]
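The URLs above pass the target page and a CSS selector as the url and query parameters. Selectors often contain spaces or brackets, so it can help to percent-encode them when building requests in code. A minimal Python sketch, assuming the hosted endpoint shown above accepts standard percent-encoded parameters (the helper name build_query_url is hypothetical and not part of Falkor):

```python
from urllib.parse import quote, urlencode

# Base endpoint taken from the usage examples above.
FALKOR_ENDPOINT = "https://falkor-api.herokuapp.com/api/query"


def build_query_url(page_url: str, selector: str) -> str:
    """Build a Falkor query URL with both parameters percent-encoded.

    Hypothetical helper for illustration; it only constructs the URL
    and does not send a request.
    """
    params = {"url": page_url, "query": selector}
    # quote_via=quote encodes spaces as %20, as in the Digg example above.
    return FALKOR_ENDPOINT + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_query_url("http://reddit.com", "a.title"))
    print(build_query_url("http://digg.com", ".story-title a"))
    print(build_query_url("http://digg.com", "img[src]"))
```

The resulting string can be passed to any HTTP client to fetch the JSON document.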

TODO

Filters to remove some of the attribute cruft

For example, if we just want to extract the text of an element and ignore its other attributes:

&filter=[text]
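Until a server-side filter exists, a similar effect can be approximated on the client. The Python sketch below is only an illustration: the filter_attributes helper and the shape of the sample data (a list of objects with text, href, and class keys) are assumptions, not Falkor's documented response format.

```python
from typing import Any


def filter_attributes(
    elements: list[dict[str, Any]], keep: list[str]
) -> list[dict[str, Any]]:
    """Return copies of the element dicts containing only the keys in `keep`.

    Keys that an element does not have are skipped rather than raising.
    """
    return [{key: el[key] for key in keep if key in el} for el in elements]


# Hypothetical response data, for illustration only.
sample = [
    {"text": "First story", "href": "/a", "class": "title"},
    {"text": "Second story", "href": "/b", "class": "title"},
]

if __name__ == "__main__":
    # Client-side equivalent of the proposed &filter=[text]
    print(filter_attributes(sample, ["text"]))
```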

License

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.

Stars: ✭ 184
