885 Open source projects that are alternatives of or similar to Clean Text

feedsearch-crawler
Crawl sites for RSS, Atom, and JSON feeds.
Stars: ✭ 23 (-91.9%)
Mutual labels:  scraping
Nlp Tutorial
Tutorial: Natural Language Processing in Python
Stars: ✭ 274 (-3.52%)
ferenda
Transform unstructured document collections to structured Linked Data
Stars: ✭ 22 (-92.25%)
Mutual labels:  scraping
jazz
The Scripting Engine that Combines Speed, Safety, and Simplicity
Stars: ✭ 132 (-53.52%)
Mutual labels:  scraping
internet-affordability
🌍 Dataset that shows the Internet affordability by country (a shocking reality!)
Stars: ✭ 13 (-95.42%)
Mutual labels:  scraping
Awesome Distributed Deep Learning
A curated list of awesome Distributed Deep Learning resources.
Stars: ✭ 277 (-2.46%)
gunaydin
Your good mornings ☀️
Stars: ✭ 16 (-94.37%)
Mutual labels:  scraping
bots-zoo
No description or website provided.
Stars: ✭ 59 (-79.23%)
Mutual labels:  scraping
chesf
CHeSF is the Chrome Headless Scraping Framework, very early alpha code for scraping JavaScript-intensive web pages
Stars: ✭ 18 (-93.66%)
Mutual labels:  scraping
Olivia
💁‍♀️Your new best friend powered by an artificial neural network
Stars: ✭ 3,114 (+996.48%)
rubium
Rubium is a lightweight alternative to Selenium/Capybara/Watir if you need to perform some operations (like web scraping) using Headless Chromium and Ruby
Stars: ✭ 65 (-77.11%)
Mutual labels:  scraping
scraper
Nodejs web scraper. Contains a command line, docker container, terraform module and ansible roles for distributed cloud scraping. Supported databases: SQLite, MySQL, PostgreSQL. Supported headless clients: Puppeteer, Playwright, Cheerio, JSdom.
Stars: ✭ 37 (-86.97%)
Mutual labels:  scraping
ogpParser
Open Graph Protocol Parser for Node.js
Stars: ✭ 43 (-84.86%)
Mutual labels:  scraping
Scrapy Crawlera
Crawlera middleware for Scrapy
Stars: ✭ 281 (-1.06%)
Mutual labels:  scraping
angel.co-companies-list-scraping
No description or website provided.
Stars: ✭ 54 (-80.99%)
Mutual labels:  scraping
memes-api
API for scraping common meme sites
Stars: ✭ 17 (-94.01%)
Mutual labels:  scraping
browser-pool
A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.
Stars: ✭ 71 (-75%)
Mutual labels:  scraping
Awesomefakenews
This repository contains recent research on fake news.
Stars: ✭ 270 (-4.93%)
Scrapping
Mastering the art of scraping 🎓
Stars: ✭ 24 (-91.55%)
Mutual labels:  scraping
webdext
Intelligent Web Data Extractor
Stars: ✭ 75 (-73.59%)
Mutual labels:  scraping
copycat
A PHP Scraping Class
Stars: ✭ 70 (-75.35%)
Mutual labels:  scraping
Gopa
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (-2.46%)
Mutual labels:  scraping
scrap
Scraping Facebook with JavaScript.
Stars: ✭ 25 (-91.2%)
Mutual labels:  scraping
PyLex
Perform lexical analysis on words, one word at a time.
Stars: ✭ 60 (-78.87%)
Mutual labels:  scraping
Instagram-to-discord
Monitor an Instagram user account and automatically post new images to a Discord channel via a webhook. Working as of 2022!
Stars: ✭ 113 (-60.21%)
Mutual labels:  scraping
Nlpython
This repository contains code for natural language processing in Python. All of the code accompanies my book "Python Natural Language Processing"
Stars: ✭ 265 (-6.69%)
ha-multiscrape
Home Assistant custom component for scraping multiple values (HTML, XML, or JSON) from a single HTTP request, with a separate sensor/attribute for each value. Supports (login) form submission.
Stars: ✭ 103 (-63.73%)
Mutual labels:  scraping
Zeiver
A Scraper, Downloader, & Recorder for static open directories.
Stars: ✭ 14 (-95.07%)
Mutual labels:  scraping
zcrawl
An open source web crawling platform
Stars: ✭ 21 (-92.61%)
Mutual labels:  scraping
Link Grammar
The CMU Link Grammar natural language parser
Stars: ✭ 286 (+0.7%)
scrapman
Retrieve the real HTML of a URL (with JavaScript executed); ultra fast, with support for loading multiple pages in parallel
Stars: ✭ 21 (-92.61%)
Mutual labels:  scraping
papercut
Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-94.72%)
Mutual labels:  scraping
puppeteer-botcheck
🕵‍♂ Bot detection tests for Puppeteer. Hide and seek!
Stars: ✭ 42 (-85.21%)
Mutual labels:  scraping
Apify Js
Apify SDK — The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Stars: ✭ 3,154 (+1010.56%)
Mutual labels:  scraping
LInkedIn-Reverese-Lookup
🔎 Search for a LinkedIn profile by email address 📧
Stars: ✭ 20 (-92.96%)
Mutual labels:  scraping
humanparser
Parse a human name string into salutation, first name, middle name, last name, suffix.
Stars: ✭ 78 (-72.54%)
Mutual labels:  scraping
crawling-framework
Easily crawl news portals or blog sites using Storm Crawler.
Stars: ✭ 22 (-92.25%)
Mutual labels:  scraping
Pyswip
PySwip is a Python/SWI-Prolog bridge that lets you query SWI-Prolog from your Python programs. It features an (incomplete) SWI-Prolog foreign language interface, a utility class that makes querying Prolog easy, and a Pythonic interface.
Stars: ✭ 276 (-2.82%)
PrawWallpaperDownloader
Download images from reddit
Stars: ✭ 18 (-93.66%)
Mutual labels:  scraping
dust
Archive web pages with all relevant assets, or save them as a single HTML file
Stars: ✭ 19 (-93.31%)
Mutual labels:  scraping
covid19br-pub
Project for monitoring official publications related to COVID-19 in Brazil.
Stars: ✭ 12 (-95.77%)
Mutual labels:  scraping
Lda
LDA topic modeling for node.js
Stars: ✭ 262 (-7.75%)
oversmash
Overwatch API library for player details and career stats
Stars: ✭ 42 (-85.21%)
Mutual labels:  scraping
scrapy facebooker
Collection of scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.
Stars: ✭ 22 (-92.25%)
Mutual labels:  scraping
ioweb
Web Scraping Framework
Stars: ✭ 31 (-89.08%)
Mutual labels:  scraping
Lambdasoup
Functional HTML scraping and rewriting with CSS in OCaml
Stars: ✭ 280 (-1.41%)
Mutual labels:  scraping
scrapy-fieldstats
A Scrapy extension to log item coverage when the spider shuts down
Stars: ✭ 17 (-94.01%)
Mutual labels:  scraping
shup
A POSIX shell script to parse HTML
Stars: ✭ 28 (-90.14%)
Mutual labels:  scraping
RARBG-scraper
With Selenium headless browsing and CAPTCHA solving
Stars: ✭ 38 (-86.62%)
Mutual labels:  scraping
Ai Job Notes
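A job-hunting guide for AI algorithm positions (covering preparation strategies, coding-practice guides, internal referrals, a list of AI companies, and more)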
Stars: ✭ 3,191 (+1023.59%)
ScrapeBot
A Selenium-driven tool for automated website interaction and scraping.
Stars: ✭ 16 (-94.37%)
Mutual labels:  scraping
image-collector
Download images from Google Image Search
Stars: ✭ 38 (-86.62%)
Mutual labels:  scraping
Architeuthis
MITM HTTP(S) proxy with integrated load-balancing, rate-limiting and error handling. Built for automated web scraping.
Stars: ✭ 35 (-87.68%)
Mutual labels:  scraping
Autonlp
🤗 AutoNLP: train state-of-the-art natural language processing models and deploy them in a scalable environment automatically
Stars: ✭ 263 (-7.39%)
kuwala
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data into data sc…
Stars: ✭ 474 (+66.9%)
Mutual labels:  scraping
naos
📉 Uptime and error monitoring CLI
Stars: ✭ 30 (-89.44%)
Mutual labels:  scraping
Textract
Extract text from any document. No muss. No fuss.
Stars: ✭ 3,165 (+1014.44%)
Oie Resources
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Stars: ✭ 283 (-0.35%)
Swem
The Tensorflow code for this ACL 2018 paper: "Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms"
Stars: ✭ 279 (-1.76%)
Recurrent Entity Networks
TensorFlow implementation of "Tracking the World State with Recurrent Entity Networks".
Stars: ✭ 276 (-2.82%)
61-120 of 885 similar projects