hxseven / Htmlsql

htmlSQL is an experimental PHP library that lets you access HTML values using an SQL-like syntax.

htmlSQL - Version 0.5

htmlSQL is an experimental PHP library that lets you access HTML values using an SQL-like syntax. This means you don't have to write complex functions or regular expressions to extract specific values.

htmlSQL queries look like this:

SELECT href,title FROM a WHERE $class == "list"
       ^ Attributes    ^       ^ search query (can be empty)
         to return     ^
                       ^ HTML tag to search in
                         "*" is possible = all tags

This query should return an array containing the href and title attributes of every link (a tag) whose class attribute equals "list".
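To make the expected result concrete, here is a minimal sketch that produces the same kind of array with PHP's bundled DOM extension. It is not htmlSQL's implementation and does not use its API: the helper name select_attributes and the sample HTML are assumptions made for illustration, and the type declarations assume PHP 7 or later.

```php
<?php
// Illustration only: reproduces the result of the example query with the
// DOM extension. This is NOT how htmlSQL works internally.

/**
 * Hypothetical helper: return the requested attributes of every <$tag>
 * element whose $attr attribute is exactly equal to $value.
 */
function select_attributes(string $html, string $tag, array $columns, string $attr, string $value): array
{
    $doc = new DOMDocument();
    // Suppress the warnings libxml raises for imperfect real-world HTML.
    @$doc->loadHTML($html);

    $rows = [];
    foreach ($doc->getElementsByTagName($tag) as $node) {
        // getAttribute() returns an empty string when the attribute is missing.
        if ($node->getAttribute($attr) !== $value) {
            continue;
        }
        $row = [];
        foreach ($columns as $column) {
            $row[$column] = $node->getAttribute($column);
        }
        $rows[] = $row;
    }
    return $rows;
}

// Made-up sample document with two matching links and one non-matching link.
$html = '<html><body><a class="list" href="/one" title="One">1</a>'
      . '<a href="/two" title="Two">2</a>'
      . '<a class="list" href="/three" title="Three">3</a></body></html>';

// Rough equivalent of: SELECT href,title FROM a WHERE $class == "list"
$rows = select_attributes($html, 'a', ['href', 'title'], 'class', 'list');
print_r($rows);
```

Each row is an associative array keyed by the selected attribute names, one row per matching tag.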

Project Discontinued

htmlSQL was an experiment I did in 2006. I no longer support or extend the library; this repository exists only for historical purposes. Feel free to fork, modify, and study the source code. If you need a reliable library for data scraping, I recommend using other modules.

Requirements

Any flavor of PHP4+ should do
Snoopy PHP class - Version 1.2.3 (optional; required only for web transfers)
You can find all Snoopy-related documents (copyright, readme, etc.) in the snoopy_data/ subdirectory.

Usage

Include the "snoopy.class.php" and "htmlsql.class.php" files in your PHP scripts and look at the examples to get an idea of how to use the htmlSQL library. It should be very simple :-)

Background / idea

I had this idea while extracting some data from a website. When I realized that the algorithms and functions for extracting links and other tags are often the same, I decided to combine them into one universally usable library. While drinking a coffee and thinking about it, I thought it would be cool to access HTML elements using SQL. So I started creating this library...

Warning

The eval() function is used for the WHERE statement. Make sure that all user data is checked and filtered against malicious PHP code. Never trust any user input!
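One way to follow this advice is to accept only a narrow whitelist of characters before a user-supplied value is placed into a query string. The sketch below is an assumption about how a caller might do that; the helper name build_class_query and the 64-character limit are hypothetical and not part of htmlSQL.

```php
<?php
/**
 * Hypothetical helper: build the example query from a user-supplied class
 * name, rejecting anything outside a conservative whitelist.
 */
function build_class_query(string $className): string
{
    // Letters, digits, underscore and hyphen only, so the value cannot carry
    // quotes, semicolons, parentheses or dollar signs that could form PHP code.
    if (!preg_match('/\A[A-Za-z0-9_-]{1,64}\z/', $className)) {
        throw new InvalidArgumentException('Unsafe class name rejected');
    }
    // Single quotes keep $class literal; PHP does not interpolate it here.
    return 'SELECT href,title FROM a WHERE $class == "' . $className . '"';
}

echo build_class_query('list'), "\n";

try {
    // A value that tries to break out of the quoted string is refused.
    build_class_query('x"; system("id"); "');
} catch (InvalidArgumentException $e) {
    echo 'Rejected: ', $e->getMessage(), "\n";
}
```

Whitelisting is deliberately stricter than escaping: because the WHERE clause ends up in eval(), refusing unexpected characters outright is safer than trying to neutralize them.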

Todo

Enhance the HTML parser
Test htmlSQL with invalid and bad HTML files
Replace the ugly eval() call for the WHERE statement with a custom implementation
Add more error checks
Add unit tests
Add a LIMIT function like in SQL
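Two of these items, the eval() replacement and LIMIT, can be sketched in a few lines. The following is only one possible direction, not code from the library: the helper matches_condition is hypothetical, it handles a single comparison of the form $attribute == "value" or $attribute != "value", and the LIMIT step is emulated with array_slice() on made-up sample data.

```php
<?php
/**
 * Hypothetical helper: test one condition of the form
 *   $attribute == "value"   or   $attribute != "value"
 * against an element's attributes without calling eval().
 */
function matches_condition(string $condition, array $attributes): bool
{
    // Capture the attribute name, the operator and the quoted value.
    $pattern = '/\A\s*\$(\w+)\s*(==|!=)\s*"([^"]*)"\s*\z/';
    if (!preg_match($pattern, $condition, $m)) {
        throw new InvalidArgumentException('Unsupported condition: ' . $condition);
    }
    $name = $m[1];
    $operator = $m[2];
    $expected = $m[3];

    // A missing attribute is treated as an empty string.
    $actual = isset($attributes[$name]) ? $attributes[$name] : '';

    return ($operator === '==') ? ($actual === $expected) : ($actual !== $expected);
}

// Made-up attribute sets standing in for parsed <a> tags.
$links = [
    ['class' => 'list', 'href' => '/one'],
    ['class' => 'nav',  'href' => '/two'],
    ['class' => 'list', 'href' => '/three'],
];

// WHERE $class == "list"
$matched = array_values(array_filter($links, function ($attributes) {
    return matches_condition('$class == "list"', $attributes);
}));

// LIMIT 1
$limited = array_slice($matched, 0, 1);
print_r($limited);
```

Because the condition is parsed with a fixed pattern rather than executed, anything that does not look like a simple comparison is rejected instead of being run as PHP.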

Author

Jonas John

License

htmlSQL uses a modified BSD license. You can find the full license text in "htmlsql.class.php".