Nlp Tutorial - Tutorial: Natural Language Processing in Python
Stars: ✭ 274 (-3.52%)
ferenda - Transform unstructured document collections to structured Linked Data
Stars: ✭ 22 (-92.25%)
jazz - The Scripting Engine that Combines Speed, Safety, and Simplicity
Stars: ✭ 132 (-53.52%)
internet-affordability - 🌍 Dataset that shows the Internet affordability by country (a shocking reality!)
Stars: ✭ 13 (-95.42%)
gunaydin - Your good mornings ☀️
Stars: ✭ 16 (-94.37%)
bots-zoo - No description or website provided.
Stars: ✭ 59 (-79.23%)
chesf - CHeSF is the Chrome Headless Scraping Framework, very early alpha code for scraping JavaScript-intensive web pages
Stars: ✭ 18 (-93.66%)
Olivia - 💁♀️ Your new best friend powered by an artificial neural network
Stars: ✭ 3,114 (+996.48%)
rubium - Rubium is a lightweight alternative to Selenium/Capybara/Watir if you need to perform some operations (like web scraping) using Headless Chromium and Ruby
Stars: ✭ 65 (-77.11%)
scraper - Node.js web scraper. Contains a command-line tool, Docker container, Terraform module and Ansible roles for distributed cloud scraping. Supported databases: SQLite, MySQL, PostgreSQL. Supported headless clients: Puppeteer, Playwright, Cheerio, JSdom.
Stars: ✭ 37 (-86.97%)
ogpParser - Open Graph Protocol Parser for Node.js
Stars: ✭ 43 (-84.86%)
memes-api - API for scraping common meme sites
Stars: ✭ 17 (-94.01%)
browser-pool - A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.
Stars: ✭ 71 (-75%)
Awesomefakenews - This repository contains recent research on fake news.
Stars: ✭ 270 (-4.93%)
Scrapping - Mastering the art of scraping 🎓
Stars: ✭ 24 (-91.55%)
webdext - Intelligent Web Data Extractor
Stars: ✭ 75 (-73.59%)
copycat - A PHP Scraping Class
Stars: ✭ 70 (-75.35%)
Gopa - [WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (-2.46%)
scrap - Scraping Facebook with JavaScript.
Stars: ✭ 25 (-91.2%)
PyLex - Perform lexical analysis on words, one word at a time.
Stars: ✭ 60 (-78.87%)
Instagram-to-discord - Monitor an Instagram user account and automatically post new images to a Discord channel via a webhook. Working 2022!
Stars: ✭ 113 (-60.21%)
Nlpython - This repository contains code for Natural Language Processing using the Python scripting language. All the code relates to my book titled "Python Natural Language Processing"
Stars: ✭ 265 (-6.69%)
ha-multiscrape - Home Assistant custom component for scraping (html, xml or json) multiple values (from a single HTTP request) with a separate sensor/attribute for each value. Support for (login) form-submit functionality.
Stars: ✭ 103 (-63.73%)
Zeiver - A Scraper, Downloader, & Recorder for static open directories.
Stars: ✭ 14 (-95.07%)
zcrawl - An open source web crawling platform
Stars: ✭ 21 (-92.61%)
Link Grammar - The CMU Link Grammar natural language parser
Stars: ✭ 286 (+0.7%)
scrapman - Retrieve the real HTML (with JavaScript executed) from a URL; ultra fast, with support for loading multiple pages in parallel
Stars: ✭ 21 (-92.61%)
papercut - Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-94.72%)
puppeteer-botcheck - 🕵♂ Bot detection tests for Puppeteer. Hide and seek!
Stars: ✭ 42 (-85.21%)
Apify Js - Apify SDK: The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Stars: ✭ 3,154 (+1010.56%)
humanparser - Parse a human name string into salutation, first name, middle name, last name, suffix.
Stars: ✭ 78 (-72.54%)
crawling-framework - Easily crawl news portals or blog sites using Storm Crawler.
Stars: ✭ 22 (-92.25%)
Pyswip - PySwip is a Python - SWI-Prolog bridge that lets you query SWI-Prolog from your Python programs. It features an (incomplete) SWI-Prolog foreign language interface, a utility class that makes querying with Prolog easy, and a Pythonic interface.
Stars: ✭ 276 (-2.82%)
dust - Archive web pages with all relevant assets or save them as a single HTML file
Stars: ✭ 19 (-93.31%)
covid19br-pub - Project for monitoring official publications related to COVID-19 in Brazil.
Stars: ✭ 12 (-95.77%)
Lda - LDA topic modeling for node.js
Stars: ✭ 262 (-7.75%)
oversmash - Overwatch API library for player details and career stats
Stars: ✭ 42 (-85.21%)
scrapy facebooker - Collection of scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.
Stars: ✭ 22 (-92.25%)
ioweb - Web Scraping Framework
Stars: ✭ 31 (-89.08%)
Lambdasoup - Functional HTML scraping and rewriting with CSS in OCaml
Stars: ✭ 280 (-1.41%)
scrapy-fieldstats - A Scrapy extension to log item coverage when the spider shuts down
Stars: ✭ 17 (-94.01%)
shup - A POSIX shell script to parse HTML
Stars: ✭ 28 (-90.14%)
RARBG-scraper - With Selenium headless browsing and CAPTCHA solving
Stars: ✭ 38 (-86.62%)
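Ai Job Notes - Job-hunting guide for AI algorithm roles (covers preparation strategies, practice-problem guides, referrals, a list of AI companies, and other materials)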
Stars: ✭ 3,191 (+1023.59%)
ScrapeBot - A Selenium-driven tool for automated website interaction and scraping.
Stars: ✭ 16 (-94.37%)
image-collector - Download images from Google Image Search
Stars: ✭ 38 (-86.62%)
Architeuthis - MITM HTTP(S) proxy with integrated load-balancing, rate-limiting and error handling. Built for automated web scraping.
Stars: ✭ 35 (-87.68%)
Autonlp - 🤗 AutoNLP: train state-of-the-art natural language processing models and deploy them in a scalable environment automatically
Stars: ✭ 263 (-7.39%)
kuwala - Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We have set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition we provide third-party data into data sc…
Stars: ✭ 474 (+66.9%)
naos - 📉 Uptime and error monitoring CLI
Stars: ✭ 30 (-89.44%)
Textract - extract text from any document. no muss. no fuss.
Stars: ✭ 3,165 (+1014.44%)
Oie Resources - A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Stars: ✭ 283 (-0.35%)
Swem - The Tensorflow code for this ACL 2018 paper: "Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms"
Stars: ✭ 279 (-1.76%)
Recurrent Entity Networks - TensorFlow implementation of "Tracking the World State with Recurrent Entity Networks".
Stars: ✭ 276 (-2.82%)