
anaskhan96 / Soup

License: MIT
Web Scraper in Go, similar to BeautifulSoup

Programming Languages

go
31211 projects - #10 most used programming language

Projects that are alternatives of or similar to Soup

BookingScraper
🌎 🏨 Scrape Booking.com 🏨 🌎
Stars: ✭ 68 (-95.96%)
Mutual labels:  webscraper, beautifulsoup, webscraping
fBrowser
Helpful Selenium functions to make web-scraping easier and faster
Stars: ✭ 16 (-99.05%)
Mutual labels:  webscraper, webscraping
CoWin-Vaccine-Notifier
Automated Python Script to retrieve vaccine slots availability and get notified when a slot is available.
Stars: ✭ 102 (-93.95%)
Mutual labels:  webscraper, webscraping
anime-scraper
[partially working] Scrape and add anime episode stream URLs to uGet (Linux) or IDM (Windows) ~ Python3
Stars: ✭ 21 (-98.75%)
Mutual labels:  webscraper, webscraping
GoodReadsScraper
📚 A GoodReads.com scraper script to get book reviews, including text and rating.
Stars: ✭ 36 (-97.86%)
Mutual labels:  beautifulsoup, webscraping
super-anime-downloader
A program which takes an Anime name or URL and downloads the specified range of episodes.
Stars: ✭ 26 (-98.46%)
Mutual labels:  webscraper, webscraping
PacPaw
Pawn package manager for SA-MP
Stars: ✭ 14 (-99.17%)
Mutual labels:  beautifulsoup, webscraping
MediumScraper
Scraping articles from Medium and providing audio versions 📑 to 🔊 using Django
Stars: ✭ 12 (-99.29%)
Mutual labels:  web-scraper, beautifulsoup
OkanimeDownloader
Scrape your favorite Anime from Okanime.com without effort
Stars: ✭ 13 (-99.23%)
Mutual labels:  beautifulsoup, webscraping
TrackPurchase
Scrape payment history from a variety of shopping platforms with just a few lines of code!
Stars: ✭ 19 (-98.87%)
Mutual labels:  webscraper, webscraping
non-api-fb-scraper
Scrape public Facebook posts from any group or user into a .csv file without needing to register for any API access
Stars: ✭ 40 (-97.63%)
Mutual labels:  beautifulsoup, webscraping
Sig To Googlecalendar
A Python script to get class schedules on UFLA's SIG and convert them to a .CSV file for use in Google Calendar
Stars: ✭ 14 (-99.17%)
Mutual labels:  webscraping, beautifulsoup
VideoRecognition-realtime-autotrainer-alerts
State-of-the-art object detection in real time using the YOLOv3 algorithm. Augmented with a process that allows easy training of the classifier as a plug-and-play solution. Provides an alert if an item in an alert list is detected.
Stars: ✭ 36 (-97.86%)
Mutual labels:  web-scraper, webscraping
Scrapple
A framework for creating semi-automatic web content extractors
Stars: ✭ 464 (-72.46%)
Mutual labels:  web-scraper, beautifulsoup
Daftlistings
A library that enables programmatic interaction with daft.ie. Daft.ie has nationwide coverage and contains about 80% of the total available properties in Ireland.
Stars: ✭ 86 (-94.9%)
Mutual labels:  web-scraper, beautifulsoup
Clock
A web-based visual task scheduling system, condensed into a single binary (yes, just one binary solves all the problems!)
Stars: ✭ 86 (-94.9%)
Mutual labels:  webscraping
Hive
Lots of spiders (web crawlers)
Stars: ✭ 110 (-93.47%)
Mutual labels:  beautifulsoup
Covid 19 jhu data web scrap and cleaning
This repository contains data and code used to get and clean data from https://github.com/CSSEGISandData/COVID-19 and https://www.worldometers.info/coronavirus/
Stars: ✭ 80 (-95.25%)
Mutual labels:  webscraping
Detect Cms
PHP Library for detecting CMS
Stars: ✭ 78 (-95.37%)
Mutual labels:  web-scraper
Souqscraper
Simple scripts to level up your scraping skills, and source code for the Level UP playlist on YouTube
Stars: ✭ 118 (-93%)
Mutual labels:  beautifulsoup

soup


Web Scraper in Go, similar to BeautifulSoup

soup is a small web scraper package for Go, with its interface highly similar to that of BeautifulSoup.

Exported variables and functions implemented so far (a usage sketch follows the list):

var Headers map[string]string // Set headers as a map of key-value pairs, an alternative to calling Header() individually
var Cookies map[string]string // Set cookies as a map of key-value pairs, an alternative to calling Cookie() individually
func Get(string) (string, error) {} // Takes the url as an argument, returns HTML string
func GetWithClient(string, *http.Client) (string, error) {} // Takes the url and a custom HTTP client as arguments, returns HTML string
func Post(string, string, interface{}) (string, error) {} // Takes the url, bodyType, and payload as arguments, returns HTML string
func PostForm(string, url.Values) (string, error) {} // Takes the url and body; bodyType is set to "application/x-www-form-urlencoded"
func Header(string, string) {} // Takes a key, value pair to set as headers for the HTTP request made in Get()
func Cookie(string, string) {} // Takes a key, value pair to set as cookies to be sent with the HTTP request in Get()
func HTMLParse(string) Root {} // Takes the HTML string as an argument, returns a pointer to the DOM constructed
func Find([]string) Root {} // Element tag, (attribute key-value pair) as arguments, pointer to first occurrence returned
func FindAll([]string) []Root {} // Same as Find(), but pointers to all occurrences returned
func FindStrict([]string) Root {} // Element tag, (attribute key-value pair) as arguments, pointer to first occurrence with exactly matching values returned
func FindAllStrict([]string) []Root {} // Same as FindStrict(), but pointers to all occurrences returned
func FindNextSibling() Root {} // Pointer to the next sibling of the Element in the DOM returned
func FindNextElementSibling() Root {} // Pointer to the next element sibling of the Element in the DOM returned
func FindPrevSibling() Root {} // Pointer to the previous sibling of the Element in the DOM returned
func FindPrevElementSibling() Root {} // Pointer to the previous element sibling of the Element in the DOM returned
func Children() []Root {} // Find all direct children of this DOM element
func Attrs() map[string]string {} // Map returned with all the attributes of the Element as lookup to their respective values
func Text() string {} // Full text inside a non-nested tag returned, first half returned in a nested one
func FullText() string {} // Full text inside a nested/non-nested tag returned
func SetDebug(bool) {} // Sets the debug mode to true or false; false by default
func HTML() string {} // Returns the HTML code for the specific element
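
A minimal usage sketch based on the signatures listed above; the URL, header, and cookie values are placeholder choices for illustration, not defaults of the library.

package main

import (
	"fmt"

	"github.com/anaskhan96/soup"
)

func main() {
	// Either set several headers at once via the exported Headers map...
	soup.Headers = map[string]string{
		"User-Agent": "my-scraper/0.1", // placeholder user agent
	}
	// ...or set values one at a time; Cookie() works the same way for cookies.
	soup.Cookie("session", "example-session-id") // placeholder cookie

	// Get returns the raw HTML of the page as a string.
	html, err := soup.Get("https://example.com")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}

	// Parse the HTML and query the resulting DOM.
	doc := soup.HTMLParse(html)
	for _, p := range doc.FindAll("p") {
		fmt.Println(p.Text())
	}
}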

Root is a struct containing three fields:

  • Pointer containing the pointer to the current HTML node
  • NodeValue containing the current HTML node's value, i.e. the tag name for an ElementNode, or the text in the case of a TextNode
  • Error containing an error in a struct if one occurs, else nil. A detailed text explanation of the error can be accessed using the Error() function. A field Type in this struct, of type ErrorType, denotes the kind of error that took place, which will be one of the following (see the sketch after this list):
    • ErrUnableToParse
    • ErrElementNotFound
    • ErrNoNextSibling
    • ErrNoPreviousSibling
    • ErrNoNextElementSibling
    • ErrNoPreviousElementSibling
    • ErrCreatingGetRequest
    • ErrInGetRequest
    • ErrReadingResponse
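
A short error-handling sketch based on the description above; the HTML snippet and the attribute lookup are made up for illustration, and the lookup is deliberately one that fails so the Error field is populated.

package main

import (
	"fmt"

	"github.com/anaskhan96/soup"
)

func main() {
	doc := soup.HTMLParse("<html><body><p>hello</p></body></html>")

	// Look up an element that does not exist in this snippet.
	link := doc.Find("a", "id", "missing-link")
	if link.Error != nil {
		// The message comes from the error's Error() method; for a failed
		// lookup like this one, the underlying Type is ErrElementNotFound.
		fmt.Println("find failed:", link.Error)
		return
	}
	fmt.Println(link.Text())
}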

Installation

Install the package using the command

go get github.com/anaskhan96/soup

Example

Example code is given below to scrape the "Comics I Enjoy" section (text and its links) from xkcd.

More Examples

package main

import (
	"fmt"
	"github.com/anaskhan96/soup"
	"os"
)

func main() {
	resp, err := soup.Get("https://xkcd.com")
	if err != nil {
		os.Exit(1)
	}
	doc := soup.HTMLParse(resp)
	links := doc.Find("div", "id", "comicLinks").FindAll("a")
	for _, link := range links {
		fmt.Println(link.Text(), "| Link :", link.Attrs()["href"])
	}
}
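
The same example can be adapted to use a custom HTTP client through GetWithClient, as listed in the function table above; the 10-second timeout below is an arbitrary choice for illustration.

package main

import (
	"fmt"
	"net/http"
	"os"
	"time"

	"github.com/anaskhan96/soup"
)

func main() {
	// Any custom *http.Client (timeouts, proxies, transports) can be passed in.
	client := &http.Client{Timeout: 10 * time.Second}

	resp, err := soup.GetWithClient("https://xkcd.com", client)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	doc := soup.HTMLParse(resp)
	links := doc.Find("div", "id", "comicLinks").FindAll("a")
	for _, link := range links {
		fmt.Println(link.Text(), "| Link :", link.Attrs()["href"])
	}
}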

Contributions

This package was developed in my free time. However, contributions from everybody in the community are welcome to make it a better web scraper. If you think a particular feature or function should be included in the package, feel free to open up a new issue or pull request.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].