booknlp
BookNLP, a natural language processing pipeline for books
Stars: ✭ 636 (+1414.29%)
etymology-db
An open etymology dataset created using Wiktionary data. Contains 3.8M entries, 1.8M terms, 2900 languages, and 31 unique relationship types.
Stars: ✭ 20 (-52.38%)
R-data-wrangling
Materials for my R data workshop. https://cengel.github.io/R-data-wrangling/
Stars: ✭ 17 (-59.52%)
browser-pool
A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.
Stars: ✭ 71 (+69.05%)
top-github-scraper
Scrape top GitHub repositories and users based on keywords
Stars: ✭ 40 (-4.76%)
Data-Wrangling-with-Python
Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices
Stars: ✭ 90 (+114.29%)
mybook
My Lectures on Computational Communication
Stars: ✭ 33 (-21.43%)
IMDB-Scraper
Scrapy project for scraping data from IMDB, with a movie dataset covering 58,623 movies.
Stars: ✭ 37 (-11.9%)
textbox
Text collections made available by the CLiGS group.
Stars: ✭ 19 (-54.76%)
Python
Covers Python topics from basic to advanced, practice questions, logical problems in Python, and web development using HTML, CSS, Bootstrap, jQuery, the DOM, and Django 🚀🚀. 💥 🌈
Stars: ✭ 29 (-30.95%)
WaWebSessionHandler
(DISCONTINUED) Save WhatsApp Web sessions as files and open them everywhere!
Stars: ✭ 27 (-35.71%)
restaurant-finder-featureReviews
Build a Flask web application to help users retrieve key restaurant information and feature-based reviews (generated by applying market-basket analysis with the Apriori algorithm, plus NLP, to user reviews).
Stars: ✭ 21 (-50%)
extractnet
A Dragnet that also extracts author, headline, date, and keywords from context
Stars: ✭ 52 (+23.81%)
named-entity-recognition
Notebooks for teaching Named Entity Recognition at the Cultural Heritage Data School, run by Cambridge Digital Humanities
Stars: ✭ 18 (-57.14%)
htmlunit
🕸🧰☕️ Tools to scrape dynamic web content via the 'HtmlUnit' Java library
Stars: ✭ 39 (-7.14%)
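wiki
From DIY performance art to DIY Socratic dialogue, from a DIY ritual to DIY class-skipping: an encyclopedia of guides to all kinds of activities. diy💔 is a non-code open-source project incubated by 706.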
Stars: ✭ 49 (+16.67%)
PaperScraper
A web scraping tool to systematically extract the text of scientific papers and corresponding metadata from university-accessible journals.
Stars: ✭ 63 (+50%)
selectorlib
A library to read a YML file with XPath or CSS selectors and extract data from HTML pages using them
Stars: ✭ 53 (+26.19%)
Econ-Data-Science
Articles/journals and videos related to economics 📈 and data science 📊
Stars: ✭ 102 (+142.86%)
faexport
The API for Furaffinity you wish existed
Stars: ✭ 61 (+45.24%)
codechef-rank-comparator
Web application hosted on the Heroku cloud platform, based on web scraping in Python using the lxml library and XPath (XML Path Language).
Stars: ✭ 23 (-45.24%)
OLX Scraper
📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads posted for a requested product and dumps them into MongoDB, a NoSQL database.
Stars: ✭ 15 (-64.29%)
reapr
🕸→ℹ️ Reap information from websites
Stars: ✭ 14 (-66.67%)
ioweb
Web scraping framework
Stars: ✭ 31 (-26.19%)
leetcode-compensation
Compensation analysis on the posts scraped from leetcode.com/discuss/compensation. At present, the reports have been generated only for Indian cities.
Stars: ✭ 83 (+97.62%)
actor-scraper
House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases.
Stars: ✭ 83 (+97.62%)
rreddit
𝐫⟋ Get Reddit data
Stars: ✭ 49 (+16.67%)
sp-subway-scraper
🚆 This web scraper builds a dataset of São Paulo subway operation status
Stars: ✭ 24 (-42.86%)
heroshi
Heroshi: an open-source web crawler.
Stars: ✭ 51 (+21.43%)
iww
AI-based web wrapper for web content extraction
Stars: ✭ 61 (+45.24%)
papercut
Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-64.29%)
Linkedin-Client
Web scraper for grabbing data from LinkedIn profiles or company pages (personal project)
Stars: ✭ 42 (+0%)
tableau-scraping
Tableau scraper Python library. R and Python scripts to scrape data from Tableau viz
Stars: ✭ 91 (+116.67%)
grailer
Web scraping tool for grailed.com
Stars: ✭ 30 (-28.57%)
Intro-Cultural-Analytics
Introduction to Cultural Analytics & Python, course website and online textbook powered by Jupyter Book
Stars: ✭ 137 (+226.19%)
cl-torrents
Searching torrents on popular trackers - CLI, readline, GUI, web client. Tutorial and binaries (issue tracker on https://gitlab.com/vindarel/cl-torrents/)
Stars: ✭ 83 (+97.62%)
cummings.ee
A collection of the work of Edward Estlin Cummings, as it enters the public domain.
Stars: ✭ 32 (-23.81%)
rymscraper
Python API to extract data from rateyourmusic.com.
Stars: ✭ 63 (+50%)
scraping-ebay
Scraping eBay products using the Scrapy web crawling framework
Stars: ✭ 79 (+88.1%)
saveddit
Bulk Downloader for Reddit
Stars: ✭ 130 (+209.52%)
TraduXio
A participative platform for cultural texts translators
Stars: ✭ 19 (-54.76%)
halfstaff
🇺🇸 Is the US flag at half-staff?
Stars: ✭ 22 (-47.62%)
TopicsExplorer
Explore your own text collection with a topic model – without prior knowledge.
Stars: ✭ 53 (+26.19%)
ham4corpus
Data from "Hamilton: An American Musical", formatted for reuse. See below for some interesting basic text analysis findings! I am not throwing away my stopword?
Stars: ✭ 53 (+26.19%)
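TikTokDownloader PyWebIO
🚀 "Douyin_TikTok_Download_API" is a ready-to-use, high-performance asynchronous data scraping tool for Douyin | TikTok that supports API calls and online batch parsing and downloading.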
Stars: ✭ 919 (+2088.1%)
twic
Topic Words in Context (TWiC) is a highly interactive, browser-based visualization for MALLET topic models
Stars: ✭ 51 (+21.43%)
srqm
An introductory statistics course for social scientists, using Stata
Stars: ✭ 43 (+2.38%)
investigation-amazon-brands
Materials to reproduce our findings in our stories, "Amazon Puts Its Own 'Brands' First Above Better-Rated Products" and "When Amazon Takes the Buy Box, it Doesn’t Give it up"
Stars: ✭ 56 (+33.33%)
automation-scripts
Simple scripts that I'm using to automate the boring things.
Stars: ✭ 14 (-66.67%)
dvt
Distant Viewing Toolkit for the Analysis of Visual Culture
Stars: ✭ 57 (+35.71%)
linkextractor
A Docker tutorial using a link extraction application example
Stars: ✭ 41 (-2.38%)
GSoC-Data-Analyser
Simple search for organisations that are participating or have participated in GSoC
Stars: ✭ 29 (-30.95%)
Node-js-functionalities
This repository contains very useful RESTful APIs and functionalities in Node.js, including many important tutorial code samples for mastering Node.js. All tutorials have been published on medium.com; tutorial links are given below.
Stars: ✭ 69 (+64.29%)