jurismarches / chopper

Licence: MIT license
Chopper is a tool to extract elements from HTML by preserving ancestors and CSS rules

Programming Languages

python
Makefile

Chopper

Chopper is a tool that extracts elements from HTML while preserving their ancestors and the CSS rules that still apply to them.

Compatible with Python >= 3.6

Installation

pip install chopper

Full documentation

http://chopper.readthedocs.org/en/latest/

Quick start

from chopper.extractor import Extractor

HTML = """
<html>
  <head>
    <title>Test</title>
  </head>
  <body>
    <div id="header"></div>
    <div id="main">
      <div class="iwantthis">
        HELLO WORLD
        <a href="https://github.com/nope">Do not want</a>
      </div>
    </div>
    <div id="footer"></div>
  </body>
</html>
"""

CSS = """
div { border: 1px solid black; }
div#main { color: blue; }
div.iwantthis { background-color: red; }
a { color: green; }
div#footer { border-top: 2px solid red; }
"""

extractor = Extractor.keep('//div[@class="iwantthis"]').discard('//a')
html, css = extractor.extract(HTML, CSS)

The result is:

>>> html
"""
<html>
  <body>
    <div id="main">
      <div class="iwantthis">
        HELLO WORLD
      </div>
    </div>
  </body>
</html>"""

>>> css
"""
div{border:1px solid black;}
div#main{color:blue;}
div.iwantthis{background-color:red;}
"""
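How it works (illustrative sketch)

To make the keep/discard idea from the quick start concrete, here is a minimal sketch that uses only the Python standard library. It is not Chopper's implementation: prune and filter_css are hypothetical helper names introduced for illustration, the input must be well-formed XML, and the CSS filter only understands selectors of the form tag, tag#id and tag.class. It is only meant to show why the ancestors of a kept element survive and why rules that no longer match anything are dropped.

```python
import re
import xml.etree.ElementTree as ET


def prune(root, keep_path, discard_path=None):
    """Keep elements matching keep_path, with their ancestors and descendants."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    kept = set()
    for match in root.findall(keep_path):
        kept.update(match.iter())      # the match and its whole subtree
        node = match
        while node in parent_of:       # every ancestor up to the root
            node = parent_of[node]
            kept.add(node)

    if discard_path is not None:
        for match in root.findall(discard_path):
            kept.difference_update(match.iter())

    # Detach every element that was not marked as kept.
    for parent in list(root.iter()):
        for child in list(parent):
            if child not in kept:
                parent.remove(child)
    return root


def filter_css(css, root):
    """Keep rules whose selector (tag, tag#id or tag.class only) still matches."""
    rules = []
    for selector, body in re.findall(r"([^{}]+)\{([^{}]*)\}", css):
        parsed = re.fullmatch(r"(\w+)(?:([#.])([\w-]+))?", selector.strip())
        if parsed is None:
            continue  # selector outside this toy grammar: dropped
        tag, kind, value = parsed.groups()
        for element in root.iter(tag):
            if (
                kind is None
                or (kind == "#" and element.get("id") == value)
                or (kind == "." and value in element.get("class", "").split())
            ):
                rules.append(f"{selector.strip()}{{{body.strip()}}}")
                break
    return "\n".join(rules)


# Same document and stylesheet as the quick start, written as well-formed XML.
HTML = (
    "<html><head><title>Test</title></head><body>"
    '<div id="header"></div>'
    '<div id="main"><div class="iwantthis">HELLO WORLD'
    '<a href="https://github.com/nope">Do not want</a></div></div>'
    '<div id="footer"></div>'
    "</body></html>"
)
CSS = """
div { border: 1px solid black; }
div#main { color: blue; }
div.iwantthis { background-color: red; }
a { color: green; }
div#footer { border-top: 2px solid red; }
"""

tree = prune(ET.fromstring(HTML), ".//div[@class='iwantthis']", ".//a")
kept_tags = [element.tag for element in tree.iter()]
kept_css = filter_css(CSS, tree)
print(kept_tags)
print(kept_css)
```

The sketch keeps the html, body and div#main ancestors of the selected div, drops the discarded link, and retains only the three CSS rules that still match an element. Real-world HTML is rarely well-formed XML and real stylesheets use far richer selectors, which is the reason to rely on Chopper itself rather than on a toy like this.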