All Projects → rflechner → Scrapysharp

rflechner / Scrapysharp

Licence: mit
reborn of https://bitbucket.org/rflechner/scrapysharp

Programming Languages

csharp
926 projects
fsharp
127 projects

Projects that are alternatives of or similar to Scrapysharp

angel.co-companies-list-scraping
No description or website provided.
Stars: ✭ 54 (-76.11%)
Mutual labels:  scraper, parsing, scraping
Goose Parser
Universal scrapping tool, which allows you to extract data using multiple environments
Stars: ✭ 211 (-6.64%)
Mutual labels:  scraper, parsing, scraping
Email Extractor
The main functionality is to extract all the emails from one or several URLs - La funcionalidad principal es extraer todos los correos electrónicos de una o varias Url
Stars: ✭ 81 (-64.16%)
Mutual labels:  scraper, scraping
Geziyor
Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Stars: ✭ 1,246 (+451.33%)
Mutual labels:  scraper, scraping
Phpscraper
PHP Scraper - an highly opinionated web-interface for PHP
Stars: ✭ 148 (-34.51%)
Mutual labels:  scraper, scraping
Lulu
[Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+249.12%)
Mutual labels:  scraper, scraping
Pypatent
Search for and retrieve US Patent and Trademark Office Patent Data
Stars: ✭ 31 (-86.28%)
Mutual labels:  scraper, scraping
Udemycoursegrabber
Your will to enroll in Udemy course is here, but the money isn't? Search no more! This python program searches for your desired course in more than [insert big number here] websites, compares the last updated date, and gives you the download link of the latest one back, but you also have the choice to see the other ones as well!
Stars: ✭ 137 (-39.38%)
Mutual labels:  scraper, scraping
Ferret
Declarative web scraping
Stars: ✭ 4,837 (+2040.27%)
Mutual labels:  scraper, scraping
Anime Dl
Anime-dl is a command-line program to download anime from CrunchyRoll and Funimation.
Stars: ✭ 190 (-15.93%)
Mutual labels:  scraper, scraping
Linkedin Profile Scraper
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (-24.34%)
Mutual labels:  scraper, scraping
Jsonframe Cheerio
simple multi-level scraper json input/output for Cheerio
Stars: ✭ 196 (-13.27%)
Mutual labels:  scraper, scraping
Imagescraper
✂️ High performance, multi-threaded image scraper
Stars: ✭ 630 (+178.76%)
Mutual labels:  scraper, scraping
Headless Chrome Crawler
Distributed crawler powered by Headless Chrome
Stars: ✭ 5,129 (+2169.47%)
Mutual labels:  scraper, scraping
Django Dynamic Scraper
Creating Scrapy scrapers via the Django admin interface
Stars: ✭ 1,024 (+353.1%)
Mutual labels:  scraper, scraping
Jikan
Unofficial MyAnimeList PHP+REST API which provides functions other than the official API
Stars: ✭ 531 (+134.96%)
Mutual labels:  scraper, parsing
Seleniumcrawler
An example using Selenium webdrivers for python and Scrapy framework to create a web scraper to crawl an ASP site
Stars: ✭ 117 (-48.23%)
Mutual labels:  scraper, scraping
Crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (+94.69%)
Mutual labels:  scraper, scraping
Dataflowkit
Extract structured data from web sites. Web sites scraping.
Stars: ✭ 456 (+101.77%)
Mutual labels:  scraper, scraping
Serpscrap
SEO python scraper to extract data from major searchengine result pages. Extract data like url, title, snippet, richsnippet and the type from searchresults for given keywords. Detect Ads or make automated screenshots. You can also fetch text content of urls provided in searchresults or by your own. It's usefull for SEO and business related research tasks.
Stars: ✭ 153 (-32.3%)
Mutual labels:  scraper, scraping

Getting started

ScrapySharp has a Web Client able to simulate a real Web browser (handle referrer, cookies …)

Html parsing has to be as natural as possible. So I like to use CSS Selectors and Linq.

This framework wraps HtmlAgilityPack.

Basic examples of CssSelect usages


using System.Linq;
using HtmlAgilityPack;
using ScrapySharp.Extensions;

class Example
{
    public void Main()
    {
        var divs = html.CssSelect("div");  //all div elements
        var nodes = html.CssSelect("div.content"); //all div elements with css class ‘content’
        var nodes = html.CssSelect("div.widget.monthlist"); //all div elements with the both css class
        var nodes = html.CssSelect("#postPaging"); //all HTML elements with the id postPaging
        var nodes = html.CssSelect("div#postPaging.testClass"); // all HTML elements with the id postPaging and css class testClass

        var nodes = html.CssSelect("div.content > p.para"); //p elements who are direct children of div elements with css class ‘content’

        var nodes = html.CssSelect("input[type=text].login"); // textbox with css class login
    }
}

Scrapysharp can also simulate a web browser


ScrapingBrowser browser = new ScrapingBrowser();

//set UseDefaultCookiesParser as false if a website returns invalid cookies format
//browser.UseDefaultCookiesParser = false;

WebPage homePage = browser.NavigateToPage(new Uri("http://www.bing.com/"));

PageWebForm form = homePage.FindFormById("sb_form");
form["q"] = "scrapysharp";
form.Method = HttpVerb.Get;
WebPage resultsPage = form.Submit();

HtmlNode[] resultsLinks = resultsPage.Html.CssSelect("div.sb_tlst h3 a").ToArray();

WebPage blogPage = resultsPage.FindLinks(By.Text("romcyber blog | Just another WordPress site")).Single().Click();

Install Scrapysharp in your project

It's easy to use Scrapysharp in your project.

A Nuget package exists on nuget.org and on myget

News

Scrapysharp V3 is a reborn.

Old version under GPL license is still on bitbucket

Version 3 is a conversion to .net standard 2.0 and a relicensing.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].