145 open-source projects that are alternatives to or similar to Corpuscrawler

Spidermon
Scrapy Extension for monitoring spiders execution.
Stars: ✭ 309 (+143.31%)
Mutual labels:  crawling
spanish-corpora
Unannotated Spanish 3 Billion Words Corpora
Stars: ✭ 61 (-51.97%)
Mutual labels:  linguistics
Pdf downloader
A Scrapy Spider for downloading PDF files from a webpage.
Stars: ✭ 18 (-85.83%)
Mutual labels:  crawling
Lexpredict Lexnlp
LexNLP by LexPredict
Stars: ✭ 439 (+245.67%)
Mutual labels:  linguistics
mystem
CGo bindings to Yandex.Mystem
Stars: ✭ 28 (-77.95%)
Mutual labels:  linguistics
Crawling Projects
Web scraping and automation using Python
Stars: ✭ 49 (-61.42%)
Mutual labels:  crawling
Apify Js
Apify SDK — The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Stars: ✭ 3,154 (+2383.46%)
Mutual labels:  crawling
Grawler
Grawler is a tool written in PHP with a web interface that automates the use of Google dorks, scrapes the results, and stores them in a file.
Stars: ✭ 98 (-22.83%)
Mutual labels:  crawling
TextGridTools
Read, write, and manipulate Praat TextGrid files with Python
Stars: ✭ 84 (-33.86%)
Mutual labels:  linguistics
Scrapyrt
HTTP API for Scrapy spiders
Stars: ✭ 637 (+401.57%)
Mutual labels:  crawling
Dataflowkit
Extract structured data from web sites. Web sites scraping.
Stars: ✭ 456 (+259.06%)
Mutual labels:  crawling
serverless-instagram-crawler
Serverless Instagram hashtag crawler with Lambda and DynamoDB
Stars: ✭ 33 (-74.02%)
Mutual labels:  crawling
Python Crawling Tutorial
Python crawling tutorial
Stars: ✭ 57 (-55.12%)
Mutual labels:  crawling
Isp Data Pollution
ISP Data Pollution to Protect Private Browsing History with Obfuscation
Stars: ✭ 425 (+234.65%)
Mutual labels:  crawling
Dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (-21.26%)
Mutual labels:  crawling
Stopstalk Deployment
Stop stalking and start StopStalking 😉
Stars: ✭ 276 (+117.32%)
Mutual labels:  crawling
Psychopy
For running psychology and neuroscience experiments
Stars: ✭ 1,020 (+703.15%)
Mutual labels:  linguistics
rsyntaxtree
Syntax tree generator made with Ruby and RMagick
Stars: ✭ 62 (-51.18%)
Mutual labels:  linguistics
Scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Stars: ✭ 42,343 (+33240.94%)
Mutual labels:  crawling
treebender
A HDPSG-inspired symbolic natural language parser written in Rust
Stars: ✭ 24 (-81.1%)
Mutual labels:  linguistics
Lulu
[Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+521.26%)
Mutual labels:  crawling
OpenGNT
Open Greek New Testament Project; NA28 / NA27 Equivalent Text & Resources
Stars: ✭ 55 (-56.69%)
Mutual labels:  linguistics
Dig Etl Engine
Download DIG to run on your laptop or server.
Stars: ✭ 81 (-36.22%)
Mutual labels:  crawling
popular restaurants from officials
A project to find genuinely good restaurants by analyzing the business expense records of Seoul city officials
Stars: ✭ 22 (-82.68%)
Mutual labels:  crawling
Headless Chrome Crawler
Distributed crawler powered by Headless Chrome
Stars: ✭ 5,129 (+3938.58%)
Mutual labels:  crawling
Weixin public corpus
WeChat public account corpus
Stars: ✭ 465 (+266.14%)
Mutual labels:  linguistics
talospider
talospider - A simple, lightweight scraping micro-framework
Stars: ✭ 57 (-55.12%)
Mutual labels:  crawling
Beta
An open source reimplementation of Benny Brodda's BETA in Python
Stars: ✭ 65 (-48.82%)
Mutual labels:  linguistics
Crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (+246.46%)
Mutual labels:  crawling
Elpis
🙊 WIP software for creating speech recognition models.
Stars: ✭ 101 (-20.47%)
Mutual labels:  linguistics
Pynlpl
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Stars: ✭ 426 (+235.43%)
Mutual labels:  linguistics
Yesterday I Learned
Brainfarts are caused by the rupturing of the cerebral sphincter.
Stars: ✭ 50 (-60.63%)
Mutual labels:  linguistics
Webster
a reliable high-level web crawling & scraping framework for Node.js.
Stars: ✭ 364 (+186.61%)
Mutual labels:  crawling
Colibri Core
Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
Stars: ✭ 112 (-11.81%)
Mutual labels:  linguistics
Sasila
A flexible, friendly crawler framework
Stars: ✭ 286 (+125.2%)
Mutual labels:  crawling
Python Datamuse
Python 3 wrapper for the Datamuse API
Stars: ✭ 47 (-62.99%)
Mutual labels:  linguistics
Gopa
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+118.11%)
Mutual labels:  crawling
Wikipron
Massively multilingual pronunciation mining
Stars: ✭ 99 (-22.05%)
Mutual labels:  linguistics
Spidy
The simple, easy to use command line web crawler.
Stars: ✭ 257 (+102.36%)
Mutual labels:  crawling
Phonemes
Jason Riggle's chart of phonological features in JSON format + extras
Stars: ✭ 33 (-74.02%)
Mutual labels:  linguistics
ARGUS
ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (-46.46%)
Mutual labels:  crawling
Awesome Puppeteer
A curated list of awesome puppeteer resources.
Stars: ✭ 1,728 (+1260.63%)
Mutual labels:  crawling
bots-zoo
No description or website provided.
Stars: ✭ 59 (-53.54%)
Mutual labels:  crawling
Awesome Sentiment Analysis
😀😄😂😭 A curated list of Sentiment Analysis methods, implementations and misc. 😥😟😱😤
Stars: ✭ 816 (+542.52%)
Mutual labels:  linguistics
concepticon-data
The curation repository for the data behind Concepticon.
Stars: ✭ 25 (-80.31%)
Mutual labels:  linguistics
Flat
FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich these documents with new annotations; a wide variety of linguistic annotation types is supported through the FoLiA paradigm.
Stars: ✭ 93 (-26.77%)
Mutual labels:  linguistics
wikipron
Massively multilingual pronunciation mining
Stars: ✭ 167 (+31.5%)
Mutual labels:  linguistics
Nltk data
NLTK Data
Stars: ✭ 675 (+431.5%)
Mutual labels:  linguistics
flink-crawler
Continuous scalable web crawler built on top of Flink and crawler-commons
Stars: ✭ 48 (-62.2%)
Mutual labels:  crawling
Skycaiji
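Skycaiji (蓝天采集器) is a free crawler for collecting and publishing data, developed with PHP + MySQL and deployable on cloud servers. It can collect almost any type of web page, integrates seamlessly with various CMS site builders, publishes data in real time without login, and runs fully automatically with no manual intervention. It is a fully cross-platform, cloud-based crawler system for large-scale web data collection.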
Stars: ✭ 1,514 (+1092.13%)
Mutual labels:  crawling
img-cli
An interactive command-line interface built in Node.js for downloading one or more images to disk from a URL
Stars: ✭ 15 (-88.19%)
Mutual labels:  crawling
Easy Scraping Tutorial
Simple but useful Python web scraping tutorial code.
Stars: ✭ 583 (+359.06%)
Mutual labels:  crawling
SlackWebhooksGithubCrawler
Search for Slack webhook tokens publicly exposed on GitHub
Stars: ✭ 21 (-83.46%)
Mutual labels:  crawling
Textannotationgraphs
A modular annotation system that supports complex, interactive annotation graphs embedded on top of sequences of text.
Stars: ✭ 73 (-42.52%)
Mutual labels:  linguistics
Scrapy Selenium
Scrapy middleware to handle javascript pages using selenium
Stars: ✭ 550 (+333.07%)
Mutual labels:  crawling
Squidwarc
Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
Stars: ✭ 125 (-1.57%)
Mutual labels:  crawling
Ichiran
Linguistic tools for texts in Japanese language
Stars: ✭ 120 (-5.51%)
Mutual labels:  linguistics
Pyconll
A minimal, pure Python library to interface with CoNLL-U format files.
Stars: ✭ 104 (-18.11%)
Mutual labels:  linguistics
Arachnid
Powerful web scraping framework for Crystal
Stars: ✭ 68 (-46.46%)
Mutual labels:  crawling
Ferret
Declarative web scraping
Stars: ✭ 4,837 (+3708.66%)
Mutual labels:  crawling