gsscoder / pickall

Licence: MIT license
.NET agile and extensible web searching API

Programming Languages

C#

Projects that are alternatives to or similar to pickall

Thal
Getting started with Puppeteer and Chrome Headless for Web Scraping
Stars: ✭ 2,345 (+9280%)
Mutual labels:  scraping
Jsoup Annotations
Jsoup Annotations POJO
Stars: ✭ 242 (+868%)
Mutual labels:  scraping
algoexpert
AlgoExpert is an online platform that helps software engineers to prepare for coding and technical interviews.
Stars: ✭ 8 (-68%)
Mutual labels:  searching
Goose Parser
Universal scraping tool, which allows you to extract data using multiple environments
Stars: ✭ 211 (+744%)
Mutual labels:  scraping
Scrape Linkedin Selenium
`scrape_linkedin` is a python package that allows you to scrape personal LinkedIn profiles & company pages - turning the data into structured json.
Stars: ✭ 239 (+856%)
Mutual labels:  scraping
Memorious
Distributed crawling framework for documents and structured data.
Stars: ✭ 248 (+892%)
Mutual labels:  scraping
Panther
A browser testing and web crawling library for PHP and Symfony
Stars: ✭ 2,480 (+9820%)
Mutual labels:  scraping
tvseries
TV Series is a tool that scrapes episode synopses of popular TV series from websites like Wikipedia / IMDb and shows them in one place with a user-friendly navigation UI.
Stars: ✭ 37 (+48%)
Mutual labels:  scraping
Reaper
Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
Stars: ✭ 240 (+860%)
Mutual labels:  scraping
Champ
A Telegram bot combined with python to serve some basic functions like weather, music charts, cricket score and much more.
Stars: ✭ 22 (-12%)
Mutual labels:  scraping
Search Engine Parser
Lightweight package to query popular search engines and scrape for result titles, links and descriptions
Stars: ✭ 216 (+764%)
Mutual labels:  scraping
Scrapysharp
reborn of https://bitbucket.org/rflechner/scrapysharp
Stars: ✭ 226 (+804%)
Mutual labels:  scraping
Musoq
Use SQL on various data sources
Stars: ✭ 252 (+908%)
Mutual labels:  scraping
Colly
Elegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+62040%)
Mutual labels:  scraping
google-scraper
This class can retrieve search results from Google.
Stars: ✭ 33 (+32%)
Mutual labels:  scraping
Transistor
Transistor, a Python web scraping framework for intelligent use cases.
Stars: ✭ 205 (+720%)
Mutual labels:  scraping
Loconotion
📄 Python tool to turn Notion.so pages into lightweight, customizable static websites
Stars: ✭ 237 (+848%)
Mutual labels:  scraping
github-languages
Tiny little Ruby on Rails website that crawls through your public GitHub repos to find out what your favourite languages are.
Stars: ✭ 23 (-8%)
Mutual labels:  scraping
Whatsapp-Net
Generate a network graph of connections from your WhatsApp groups data
Stars: ✭ 75 (+200%)
Mutual labels:  scraping
List Of User Agents
List of major web + mobile browser user agent strings. +1 Bonus script to scrape :)
Stars: ✭ 247 (+888%)
Mutual labels:  scraping

PickAll

.NET agile and extensible web searching API. Built with AngleSharp.

Philosophy

PickAll is primarily designed to collect a limited number of results (ideally the most relevant ones) from different sources and to process them in a chain of steps. Results are essentially URLs and descriptions, but more data can be handled.
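
The chain-of-steps idea can be sketched without the library itself. The following is a minimal, self-contained illustration and not PickAll's implementation: the `Result` record, the `Pipeline` class and the two stand-in sources are hypothetical names invented for this example, and the sources return fixed data instead of querying the web. It assumes C# 9 or later (top-level statements and records).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in sources (hypothetical): each returns a small, fixed set of results, no network access.
Func<string, IEnumerable<Result>> sourceA = query => new[]
{
    new Result("SourceA", 0, "https://example.org/a", $"{query} overview"),
    new Result("SourceA", 1, "https://example.org/b", $"{query} details"),
};
Func<string, IEnumerable<Result>> sourceB = query => new[]
{
    new Result("SourceB", 0, "https://example.org/a", $"{query} overview (duplicate URL)"),
    new Result("SourceB", 1, "https://example.org/c", $"{query} history"),
};

var results = Pipeline.Run(
    "quantum physics",
    new[] { sourceA, sourceB },
    Pipeline.DistinctByUrl,
    Pipeline.OrderByIndex);

foreach (var r in results)
{
    Console.WriteLine($"[{r.Index}] {r.Originator}: \"{r.Description}\": \"{r.Url}\"");
}

// Hypothetical result shape: a URL and a description, plus where the result came from.
public record Result(string Originator, int Index, string Url, string Description);

public static class Pipeline
{
    // Collect results from every source, then feed them through each step in order.
    public static IReadOnlyList<Result> Run(
        string query,
        IEnumerable<Func<string, IEnumerable<Result>>> sources,
        params Func<IEnumerable<Result>, IEnumerable<Result>>[] steps)
    {
        IEnumerable<Result> current = sources.SelectMany(source => source(query));
        foreach (var step in steps)
        {
            current = step(current);
        }
        return current.ToList();
    }

    // Step: keep only the first result seen for each URL.
    public static IEnumerable<Result> DistinctByUrl(IEnumerable<Result> input) =>
        input.GroupBy(r => r.Url).Select(g => g.First());

    // Step: sort by the position each source assigned to the result.
    public static IEnumerable<Result> OrderByIndex(IEnumerable<Result> input) =>
        input.OrderBy(r => r.Index);
}
```

With the fixed data above, the two sources share one URL, so three results remain after the de-duplication step. Modelling each step as a function from a result sequence to a result sequence keeps the steps independent, so they can be added, removed or reordered freely.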

Documentation

Documentation is available in the project wiki.

Targets

  • .NET Standard 2.0
  • .NET Core 3.1
  • .NET 5.0

Install via NuGet

$ dotnet add package PickAll --version 1.3.1
  Determining projects to restore...
  ...

Build and sample

# clone the repository
$ git clone https://github.com/gsscoder/pickall.git

# build the package
$ cd pickall/src/PickAll
$ dotnet build -c release

# execute sample
$ cd ../../samples/PickAll.Sample
$ dotnet build -c release
$ cd ../../artifacts/PickAll.Sample/Release/netcoreapp3.0/PickAll.Sample
$ ./PickAll.Sample "Steve Jobs" -e bing:duckduckgo
Searching 'Steve Jobs' ...
[0] Bing: "Steve Jobs - Wikipedia": "https://it.wikipedia.org/wiki/Steve_Jobs"
[0] DuckDuckGo: "Steve Jobs - Wikipedia": "https://en.wikipedia.org/wiki/Steve_Jobs"
[1] DuckDuckGo: "Steve Jobs - Apple, Family & Death - Biography": "https://www.biography.com/business-figure/steve-jobs"
[2] Bing: "CC-BY-SA licenza": "http://creativecommons.org/licenses/by-sa/3.0/"
[2] DuckDuckGo: "Steve Jobs - IMDb": "https://www.imdb.com/name/nm0423418/"
[3] Bing: "Biografia di Steve Jobs - Biografieonline": "https://biografieonline.it/biografia.htm?BioID=1560&biografia=Steve+Jobs"

Test

# change to tests directory
$ cd pickall/tests/PickAll.Specs

# build with debug configuration
$ dotnet build -c debug
...

# execute tests
$ dotnet test
...

At a glance

CSharp:

using System;
using System.Linq;
using PickAll;

var context = new SearchContext()
    .WithEvents()
    .With<Google>() // search on google.com
    .With<Yahoo>() // search on yahoo.com
    .With<Uniqueness>() // remove duplicates
    .With<Order>() // prioritize results
    // fuzzy match with a maximum Levenshtein distance of 15
    .With<FuzzyMatch>(new FuzzyMatchSettings { Text = "mechanics", MaximumDistance = 15 })
    // repeat the search using the most frequent words of the previous results
    .With<Improve>(new ImproveSettings { WordCount = 2, NoiseLength = 3 })
    // scrape result pages and extract all text
    .With<Textify>(new TextifySettings { IncludeTitle = true, NoiseLength = 3 });
// attach events
context.ResultCreated += (sender, e) => Console.WriteLine($"Result created from {e.Result.Originator}");
// execute services in the order they were added
var results = await context.SearchAsync("quantum physics");
// do anything you need with LINQ
var scientific = results.Where(result => result.Url.Contains("wikipedia"));
foreach (var result in scientific) {
    Console.WriteLine($"{result.Url} {result.Description}");
}
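
The C# sample above configures a fuzzy-match step with a maximum Levenshtein distance. As a rough illustration of that metric only, here is a minimal sketch of the classic dynamic-programming algorithm. It is not PickAll's `FuzzyMatch` implementation: the `Fuzzy` class and its `Distance` method are hypothetical names, and the threshold of 2 and the candidate strings are chosen purely for the example. It assumes C# 9 or later.

```csharp
using System;
using System.Linq;

var candidates = new[] { "mechanics", "mechanic", "quantum mechanics", "botany" };

// Keep only candidates within a maximum edit distance of the target text.
var matches = candidates
    .Where(text => Fuzzy.Distance(text, "mechanics") <= 2)
    .ToList();

Console.WriteLine(string.Join(", ", matches)); // prints: mechanics, mechanic

public static class Fuzzy
{
    // Levenshtein distance: the minimum number of single-character insertions,
    // deletions and substitutions needed to turn a into b. Uses two rolling rows.
    public static int Distance(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (var j = 0; j <= b.Length; j++) previous[j] = j;

        for (var i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (var j = 1; j <= b.Length; j++)
            {
                var cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }
            // The row just computed becomes the previous row for the next character.
            (previous, current) = (current, previous);
        }
        return previous[b.Length];
    }
}
```

A larger maximum distance, such as the 15 used in the sample, accepts much looser matches than the threshold of 2 used here.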

FSharp:

open PickAll

let context = new SearchContext(typeof<Google>,
                                typeof<DuckDuckGo>,
                                typeof<Yahoo>)
let results = context.SearchAsync("quantum physics")
              |> Async.AwaitTask
              |> Async.RunSynchronously

results |> Seq.iter (fun x -> printfn "%s %s" x.Url x.Description)
