
owainlewis / Falkor

Licence: epl-1.0
Open-source web scraping API. Falkor turns web pages into queryable JSON.

Programming Languages

clojure

Projects that are alternatives of or similar to Falkor

Instago
Download/access photos, videos, stories, story highlights, postlives, following and followers of Instagram
Stars: ✭ 59 (-67.93%)
Mutual labels:  webscraping
Php Crawler
A php crawler that finds emails on the internets
Stars: ✭ 119 (-35.33%)
Mutual labels:  webscraping
Tiktokbot
A TikTokBot that downloads trending tiktok videos and compiles them using FFmpeg
Stars: ✭ 126 (-31.52%)
Mutual labels:  webscraping
Clock
A visual task scheduling system, stripped down to a single binary file (a web-based visual task scheduler; one binary solves all the problems)
Stars: ✭ 86 (-53.26%)
Mutual labels:  webscraping
Wswp
Code for the second edition Web Scraping with Python book by Packt Publications
Stars: ✭ 112 (-39.13%)
Mutual labels:  webscraping
Soup
Web Scraper in Go, similar to BeautifulSoup
Stars: ✭ 1,685 (+815.76%)
Mutual labels:  webscraping
Fifa Fut Data
Web-scraping script that writes the data of all players from FutHead and FutBin to a CSV file or a DB
Stars: ✭ 55 (-70.11%)
Mutual labels:  webscraping
Decryptr
An extensible API for breaking captchas
Stars: ✭ 154 (-16.3%)
Mutual labels:  webscraping
Nytcrossword
An exploration of New York Times crossword answers from 1994-2017, i.e. the Will Shortz era.
Stars: ✭ 117 (-36.41%)
Mutual labels:  webscraping
Ralger
ralger makes it easy to scrape a website. Built on the shoulders of titans: rvest, xml2.
Stars: ✭ 130 (-29.35%)
Mutual labels:  webscraping
Udemy bot
An automation bot for free Udemy courses
Stars: ✭ 91 (-50.54%)
Mutual labels:  webscraping
Dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output, based on dotnet core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but can be extended to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (-45.65%)
Mutual labels:  webscraping
Anirip
🎬 A Crunchyroll show/season ripper
Stars: ✭ 127 (-30.98%)
Mutual labels:  webscraping
Covid 19 jhu data web scrap and cleaning
This repository contains data and code used to get and clean data from https://github.com/CSSEGISandData/COVID-19 and https://www.worldometers.info/coronavirus/
Stars: ✭ 80 (-56.52%)
Mutual labels:  webscraping
Stardox
Github stargazers information gathering tool
Stars: ✭ 130 (-29.35%)
Mutual labels:  webscraping
Keeper Core Api
Nunux Keeper core API
Stars: ✭ 55 (-70.11%)
Mutual labels:  webscraping
Geeksforgeeksscrapper
Scrapes g4g and creates PDF
Stars: ✭ 124 (-32.61%)
Mutual labels:  webscraping
Hq bot
📲 Bot to help solve HQ trivia
Stars: ✭ 167 (-9.24%)
Mutual labels:  webscraping
Youtube Projects
This repository contains all the code I use in my YouTube tutorials.
Stars: ✭ 144 (-21.74%)
Mutual labels:  webscraping
Operating Systems Three Easy Pieces
operating systems three easy pieces by Rezmi
Stars: ✭ 128 (-30.43%)
Mutual labels:  webscraping

Falkor

A web service for turning HTML pages into traversable JSON documents

Falkor is at a very early stage of development. If you have any feature requests, create an issue on the project.

Getting started

Running the server locally

lein uberjar
docker build -t falkor .
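# Publish port 5000 to the host (assumes the server listens on 5000 inside the container)
docker run -t -p 5000:5000 falkor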

# Visit http://localhost:5000

Coming soon

  • Better error handling
  • CORS
  • Query filtering (return only certain attributes)
  • Fetching multiple elements in a single request (e.g. [h1 > a, .subtitle])

Usage

Get all the title links from the Reddit.com home page

https://falkor-api.herokuapp.com/api/query?url=http://reddit.com&query=a.title

Grab all the news stories from Digg.com

https://falkor-api.herokuapp.com/api/query?url=http://digg.com&query=.story-title%20a

Extract all the images from Digg.com

https://falkor-api.herokuapp.com/api/query?url=http://digg.com&query=img[src]
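The three example URLs share one shape: the /api/query endpoint plus a url parameter (the page to fetch) and a query parameter (a CSS selector). The following is a minimal Python sketch of building such a URL, assuming only those two parameters as shown above. The helper name build_query_url is hypothetical and not part of Falkor. It percent-encodes both values so that spaces and brackets in a selector are passed through safely.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the usage examples above.
FALKOR_ENDPOINT = "https://falkor-api.herokuapp.com/api/query"


def build_query_url(page_url, selector, endpoint=FALKOR_ENDPOINT):
    """Build a query URL for a page and a CSS selector (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20 rather than '+'.
    params = urlencode({"url": page_url, "query": selector}, quote_via=quote)
    return endpoint + "?" + params


if __name__ == "__main__":
    print(build_query_url("http://digg.com", ".story-title a"))
```

The resulting string can be passed to any HTTP client. Whether the hosted instance is still reachable is not something this sketch checks.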

TODO

Add filters to strip unwanted attributes from the response.

For example, to extract only the text of an element and ignore its other attributes:

&filter=[text]
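Until a server-side filter exists, the same trimming can be done on the client. The sketch below assumes each returned element is a JSON object with a "text" key alongside other keys. The actual response shape is not documented here, so the sample data and the apply_filter name are hypothetical.

```python
def apply_filter(elements, keep):
    """Return copies of the element dicts containing only the keys in `keep`."""
    return [{key: el[key] for key in keep if key in el} for el in elements]


# Hypothetical sample data; Falkor's real output may be shaped differently.
sample = [
    {"tag": "a", "text": "First story", "attrs": {"href": "/1", "class": "title"}},
    {"tag": "a", "text": "Second story", "attrs": {"href": "/2"}},
]

if __name__ == "__main__":
    print(apply_filter(sample, ["text"]))
    # prints [{'text': 'First story'}, {'text': 'Second story'}]
```

Keys missing from an element are skipped rather than raising an error, so a filter naming an absent attribute yields an empty object for that element.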

License

Copyright © 2015 Forward Digital Limited

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.
