actor-scraper
House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases.
Stars: ✭ 83 (+97.62%)
trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Stars: ✭ 711 (+1592.86%)
rreddit
𝐫⟋ Get Reddit data
Stars: ✭ 49 (+16.67%)
Neural-Scam-Artist
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
Stars: ✭ 18 (-57.14%)
sp-subway-scraper
🚆 This web scraper builds a dataset for São Paulo subway operation status
Stars: ✭ 24 (-42.86%)
2018-2019
The GitHub repository containing all the material related to the Computational Thinking and Programming course of the Digital Humanities and Digital Knowledge degree at the University of Bologna (academic year 2018/2019).
Stars: ✭ 29 (-30.95%)
web-poet
Web scraping Page Objects core library
Stars: ✭ 67 (+59.52%)
heroshi
Heroshi – open source web crawler.
Stars: ✭ 51 (+21.43%)
wikirepo
Python-based Wikidata framework for easy dataframe extraction
Stars: ✭ 33 (-21.43%)
iww
AI-based web wrapper for web content extraction
Stars: ✭ 61 (+45.24%)
2017-summer-workshop
Exercises, data, and more for our 2017 summer workshop (funded by the Estes Fund and in partnership with Project Jupyter and Berkeley's D-Lab)
Stars: ✭ 33 (-21.43%)
papercut
Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-64.29%)
PythonScrapyBasicSetup
Basic setup with random user agents and IP addresses for Python Scrapy Framework.
Stars: ✭ 57 (+35.71%)
Linkedin-Client
Web scraper for grabbing data from LinkedIn profiles or company pages (personal project)
Stars: ✭ 42 (+0%)
Hi
A programming language for web scraping
Stars: ✭ 14 (-66.67%)
tableau-scraping
Tableau scraper Python library. R and Python scripts to scrape data from Tableau viz
Stars: ✭ 91 (+116.67%)
npo classifier
Automated coding using machine-learning and remapping the U.S. nonprofit sector: A guide and benchmark
Stars: ✭ 18 (-57.14%)
grailer
Web scraping tool for grailed.com
Stars: ✭ 30 (-28.57%)
UofT-Timetable-Generator
A web application that generates timetables for university students at the University of Toronto
Stars: ✭ 34 (-19.05%)
Intro-Cultural-Analytics
Introduction to Cultural Analytics & Python, course website and online textbook powered by Jupyter Book
Stars: ✭ 137 (+226.19%)
Scrape Linkedin Selenium
`scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles & company pages, turning the data into structured JSON.
Stars: ✭ 239 (+469.05%)
cl-torrents
Searching torrents on popular trackers - CLI, readline, GUI, web client. Tutorial and binaries (issue tracker on https://gitlab.com/vindarel/cl-torrents/)
Stars: ✭ 83 (+97.62%)
Docbao
Tool for crawling and analyzing keywords from Vietnamese online news sites
Stars: ✭ 230 (+447.62%)
cummings.ee
A collection of the work of Edward Estlin Cummings, as it enters the public domain.
Stars: ✭ 32 (-23.81%)
Selenium Python Helium
Selenium-python but lighter: Helium is the best Python library for web automation.
Stars: ✭ 2,732 (+6404.76%)
rymscraper
Python API to extract data from rateyourmusic.com.
Stars: ✭ 63 (+50%)
R Web Scraping Cheat Sheet
Guide, reference and cheatsheet on web scraping using rvest, httr and RSelenium.
Stars: ✭ 207 (+392.86%)
Bet On Sibyl
Machine Learning Model for Sport Predictions (Football, Basketball, Baseball, Hockey, Soccer & Tennis)
Stars: ✭ 190 (+352.38%)
selectorlib
A library to read a YML file with XPath or CSS selectors and extract data from HTML pages using them
Stars: ✭ 53 (+26.19%)
scraping-ebay
Scraping eBay's products using the Scrapy web crawling framework
Stars: ✭ 79 (+88.1%)
actor-content-checker
You can use this act to monitor any page's content and get a notification when content changes.
Stars: ✭ 16 (-61.9%)
Learnpythonforresearch
This repository provides everything you need to get started with Python for (social science) research.
Stars: ✭ 163 (+288.1%)
TraduXio
A participative platform for translators of cultural texts
Stars: ✭ 19 (-54.76%)
Netflix Clone
Netflix-like full-stack application with an SPA client and a backend implemented in a service-oriented architecture
Stars: ✭ 156 (+271.43%)
halfstaff
🇺🇸 Is the US flag at half-staff?
Stars: ✭ 22 (-47.62%)
Helena
A Chrome extension for writing custom web scraping programs and web automation programs. Just demonstrate how to collect the first row of data, then let the extension write the program for collecting all rows.
Stars: ✭ 151 (+259.52%)
TopicsExplorer
Explore your own text collection with a topic model – without prior knowledge.
Stars: ✭ 53 (+26.19%)
Phpscraper
PHP Scraper - a highly opinionated web interface for PHP
Stars: ✭ 148 (+252.38%)
Zillow
Zillow Scraper for Python using Selenium
Stars: ✭ 141 (+235.71%)
ham4corpus
Data from "Hamilton: An American Musical", formatted for reuse. See below for some interesting basic text-analysis findings! I am not throwing away my stopword?
Stars: ✭ 53 (+26.19%)
Actor Page Analyzer
Apify actor that opens a web page in headless Chrome and analyzes the HTML and JavaScript objects, looks for schema.org microdata and JSON-LD metadata, analyzes AJAX requests, etc.
Stars: ✭ 124 (+195.24%)
Ayakashi
⚡️ Ayakashi.io - The next-generation web scraping framework
Stars: ✭ 117 (+178.57%)
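TikTokDownloader PyWebIO
🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous Douyin/TikTok data scraping tool that supports API calls as well as online batch parsing and downloading.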
Stars: ✭ 919 (+2088.1%)
Save For Offline
Android app for saving webpages for offline reading.
Stars: ✭ 114 (+171.43%)
twic
Topic Words in Context (TWiC) is a highly interactive, browser-based visualization for MALLET topic models
Stars: ✭ 51 (+21.43%)
Rod
A DevTools driver for web automation and scraping
Stars: ✭ 1,392 (+3214.29%)
srqm
An introductory statistics course for social scientists, using Stata
Stars: ✭ 43 (+2.38%)
Sillynium
Automate the creation of Python Selenium scripts by drawing coloured boxes on webpage elements
Stars: ✭ 100 (+138.1%)
investigation-amazon-brands
Materials to reproduce the findings in our stories, "Amazon Puts Its Own 'Brands' First Above Better-Rated Products" and "When Amazon Takes the Buy Box, it Doesn’t Give it up"
Stars: ✭ 56 (+33.33%)
saveddit
Bulk Downloader for Reddit
Stars: ✭ 130 (+209.52%)
dvt
Distant Viewing Toolkit for the Analysis of Visual Culture
Stars: ✭ 57 (+35.71%)
linkextractor
A Docker tutorial using a link extraction application example
Stars: ✭ 41 (-2.38%)
GSoC-Data-Analyser
Simple search for organisations that participate or have participated in GSoC
Stars: ✭ 29 (-30.95%)
Node-js-functionalities
This repository contains useful RESTful APIs and functionalities in Node.js, along with tutorial code for mastering Node.js. All tutorials have been published on medium.com; the tutorial links are given below.
Stars: ✭ 69 (+64.29%)
bechdel-test
Does your favorite film pass the test?
Stars: ✭ 25 (-40.48%)