tidyverse / Rvest

Licence: other

Simple web scraping for R

Labels

html web-scraping

rvest

Overview

rvest helps you scrape (or harvest) data from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, and is inspired by libraries like Beautiful Soup and RoboBrowser.

If you’re scraping multiple pages, I highly recommend using rvest in concert with polite. The polite package ensures that you’re respecting the robots.txt and not hammering the site with too many requests.
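
As a rough illustration of how the two packages fit together, the sketch below wraps a polite session around the same kind of extraction shown under Usage. The helper name `scrape_titles()` and the `h2` selector are assumptions made for this example and are not part of either package; `bow()` and `scrape()` are polite functions, but check the polite documentation for the exact defaults.

```r
library(polite)
library(rvest)

# Hypothetical helper (not part of rvest or polite): fetch one page through
# a polite session and return the text of its <h2> headings.
scrape_titles <- function(url, delay = 5) {
  # bow() introduces the client to the host and reads its robots.txt;
  # `delay` is the minimum number of seconds between requests.
  session <- bow(url, delay = delay)

  # scrape() performs the request through the session created by bow().
  page <- scrape(session)

  # Guard against an empty result, e.g. when scraping is not permitted.
  if (is.null(page)) {
    return(character())
  }

  page %>%
    html_elements("h2") %>%
    html_text2()
}

# Example call (needs network access, so it is left commented out):
# scrape_titles("https://rvest.tidyverse.org/articles/starwars.html")
```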

Installation

# The easiest way to get rvest is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just rvest:
install.packages("rvest")

Usage

library(rvest)

# Start by reading an HTML page with read_html():
starwars <- read_html("https://rvest.tidyverse.org/articles/starwars.html")

# Then find elements that match a CSS selector or XPath expression
# using html_elements(). In this example, each <section> corresponds
# to a different film.
films <- starwars %>% html_elements("section")
films
#> {xml_nodeset (7)}
#> [1] <section><h2 data-id="1">\nThe Phantom Menace\n</h2>\n<p>\nReleased: 1999 ...
#> [2] <section><h2 data-id="2">\nAttack of the Clones\n</h2>\n<p>\nReleased: 20 ...
#> [3] <section><h2 data-id="3">\nRevenge of the Sith\n</h2>\n<p>\nReleased: 200 ...
#> [4] <section><h2 data-id="4">\nA New Hope\n</h2>\n<p>\nReleased: 1977-05-25\n ...
#> [5] <section><h2 data-id="5">\nThe Empire Strikes Back\n</h2>\n<p>\nReleased: ...
#> [6] <section><h2 data-id="6">\nReturn of the Jedi\n</h2>\n<p>\nReleased: 1983 ...
#> [7] <section><h2 data-id="7">\nThe Force Awakens\n</h2>\n<p>\nReleased: 2015- ...

# Then use html_element() to extract one element per film. Here
# the title is given by the text inside <h2>
title <- films %>% 
  html_element("h2") %>% 
  html_text2()
title
#> [1] "The Phantom Menace"      "Attack of the Clones"   
#> [3] "Revenge of the Sith"     "A New Hope"             
#> [5] "The Empire Strikes Back" "Return of the Jedi"     
#> [7] "The Force Awakens"

# Or use html_attr() to get data out of attributes. html_attr() always
# returns a string so we convert it to an integer using a readr function
episode <- films %>% 
  html_element("h2") %>% 
  html_attr("data-id") %>% 
  readr::parse_integer()
episode
#> [1] 1 2 3 4 5 6 7
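
The difference between html_elements() and html_element() is easiest to see on a small document where some items lack the element you are looking for. The sketch below builds such a document inline with minimal_html(), so it runs without network access; the list contents are invented for the example.

```r
library(rvest)

# A tiny inline document: the third <li> has no <i> child.
html <- minimal_html("
  <ul>
    <li><b>C-3PO</b> is a <i>droid</i></li>
    <li><b>R2-D2</b> is a <i>droid</i></li>
    <li><b>Yoda</b></li>
  </ul>
")

items <- html %>% html_elements("li")

# html_elements() collects every match, so the item without <i>
# contributes nothing and the result is shorter than `items`.
all_kinds <- items %>%
  html_elements("i") %>%
  html_text2()

# html_element() returns exactly one result per input, using a
# missing value where there is no match, so lengths stay aligned.
one_kind <- items %>%
  html_element("i") %>%
  html_text2()

length(items)
length(all_kinds)
length(one_kind)
```

Keeping one result per input is what makes it safe to combine several extracted vectors (such as `title` and `episode` above) into the columns of a single data frame.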

If the page contains tabular data you can convert it directly to a data frame with html_table():

html <- read_html("https://en.wikipedia.org/w/index.php?title=The_Lego_Movie&oldid=998422565")

html %>% 
  html_element(".tracklist") %>% 
  html_table()
#> # A tibble: 29 x 4
#>    No.   Title                    `Performer(s)`                          Length
#>    <chr> <chr>                    <chr>                                   <chr> 
#>  1 1.    "\"Everything Is Awesom… "Tegan and Sara featuring The Lonely I… 2:43  
#>  2 2.    "\"Prologue\""           ""                                      2:28  
#>  3 3.    "\"Emmett's Morning\""   ""                                      2:00  
#>  4 4.    "\"Emmett Falls in Love… ""                                      1:11  
#>  5 5.    "\"Escape\""             ""                                      3:26  
#>  6 6.    "\"Into the Old West\""  ""                                      1:00  
#>  7 7.    "\"Wyldstyle Explains\"" ""                                      1:21  
#>  8 8.    "\"Emmett's Mind\""      ""                                      2:17  
#>  9 9.    "\"The Transformation\"" ""                                      1:46  
#> 10 10.   "\"Saloons and Wagons\"" ""                                      3:38  
#> # … with 19 more rows
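
To experiment with html_table() without depending on a live page, the same call can be tried on a small inline table. This is only a sketch: the table below is made up for the example (loosely modelled on the track list above) and minimal_html() stands in for read_html().

```r
library(rvest)

# Hypothetical two-row table with a header row made of <th> cells.
page <- minimal_html("
  <table class='tracklist'>
    <tr><th>No.</th><th>Title</th><th>Length</th></tr>
    <tr><td>1</td><td>Prologue</td><td>2:28</td></tr>
    <tr><td>2</td><td>Escape</td><td>3:26</td></tr>
  </table>
")

# Select the table by class, then convert it to a data frame (tibble).
tracks <- page %>%
  html_element(".tracklist") %>%
  html_table()

names(tracks)
nrow(tracks)
```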

Code of Conduct

Please note that the rvest project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
