Spidermon: Scrapy Extension for monitoring spider execution.
Stars: ✭ 309 (+143.31%)
spanish-corpora: Unannotated Spanish 3 Billion Words Corpora
Stars: ✭ 61 (-51.97%)
Pdf downloader: A Scrapy Spider for downloading PDF files from a webpage.
Stars: ✭ 18 (-85.83%)
mystem: CGo bindings to Yandex.Mystem
Stars: ✭ 28 (-77.95%)
Apify Js: Apify SDK, the scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Stars: ✭ 3,154 (+2383.46%)
Grawler: Grawler is a tool written in PHP which comes with a web interface that automates the task of using Google dorks, scrapes the results, and stores them in a file.
Stars: ✭ 98 (-22.83%)
TextGridTools: Read, write, and manipulate Praat TextGrid files with Python
Stars: ✭ 84 (-33.86%)
Scrapyrt: HTTP API for Scrapy spiders
Stars: ✭ 637 (+401.57%)
Dataflowkit: Extract structured data from web sites. Web site scraping.
Stars: ✭ 456 (+259.06%)
Isp Data Pollution: ISP Data Pollution to Protect Private Browsing History with Obfuscation
Stars: ✭ 425 (+234.65%)
Dotnetcrawler: DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output based on dotnet core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (-21.26%)
Psychopy: For running psychology and neuroscience experiments
Stars: ✭ 1,020 (+703.15%)
rsyntaxtree: Syntax tree generator made with Ruby and RMagick
Stars: ✭ 62 (-51.18%)
Scrapy: Scrapy, a fast high-level web crawling & scraping framework for Python.
Stars: ✭ 42,343 (+33240.94%)
treebender: A HDPSG-inspired symbolic natural language parser written in Rust
Stars: ✭ 24 (-81.1%)
Lulu: [Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+521.26%)
OpenGNT: Open Greek New Testament Project; NA28 / NA27 Equivalent Text & Resources
Stars: ✭ 55 (-56.69%)
Dig Etl Engine: Download DIG to run on your laptop or server.
Stars: ✭ 81 (-36.22%)
talospider: talospider - A simple, lightweight scraping micro-framework
Stars: ✭ 57 (-55.12%)
Beta: An open source reimplementation of Benny Brodda's BETA in Python
Stars: ✭ 65 (-48.82%)
Crawly: Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (+246.46%)
Elpis: 🙊 WIP software for creating speech recognition models.
Stars: ✭ 101 (-20.47%)
Pynlpl: PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Stars: ✭ 426 (+235.43%)
Yesterday I Learned: Brainfarts are caused by the rupturing of the cerebral sphincter.
Stars: ✭ 50 (-60.63%)
Webster: A reliable high-level web crawling & scraping framework for Node.js.
Stars: ✭ 364 (+186.61%)
Colibri Core: Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
Stars: ✭ 112 (-11.81%)
Sasila: A flexible and friendly crawler framework
Stars: ✭ 286 (+125.2%)
Gopa: [WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+118.11%)
Wikipron: Massively multilingual pronunciation mining
Stars: ✭ 99 (-22.05%)
Spidy: The simple, easy to use command line web crawler.
Stars: ✭ 257 (+102.36%)
Phonemes: Jason Riggle's chart of phonological features in JSON format + extras
Stars: ✭ 33 (-74.02%)
ARGUS: ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (-46.46%)
Awesome Puppeteer: A curated list of awesome puppeteer resources.
Stars: ✭ 1,728 (+1260.63%)
bots-zoo: No description or website provided.
Stars: ✭ 59 (-53.54%)
Awesome Sentiment Analysis: 😀😄😂😭 A curated list of Sentiment Analysis methods, implementations and misc. 😥😟😱😤
Stars: ✭ 816 (+542.52%)
concepticon-data: The curation repository for the data behind Concepticon.
Stars: ✭ 25 (-80.31%)
Flat: FoLiA Linguistic Annotation Tool. Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich these documents with new annotations. A wide variety of linguistic annotation types is supported through the FoLiA paradigm.
Stars: ✭ 93 (-26.77%)
wikipron: Massively multilingual pronunciation mining
Stars: ✭ 167 (+31.5%)
flink-crawler: Continuous scalable web crawler built on top of Flink and crawler-commons
Stars: ✭ 48 (-62.2%)
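Skycaiji: Skycaiji (蓝天采集器) is free crawler software for collecting and publishing data. It is built with PHP and MySQL, can be deployed on cloud servers, can collect almost any type of web page, integrates seamlessly with all kinds of CMS site builders, publishes data in real time without login, and runs fully automatically with no manual intervention. It is a fully cross-platform, cloud-based crawler system among web big-data collection software.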
Stars: ✭ 1,514 (+1092.13%)
img-cli: An interactive command-line interface built in Node.js for downloading single or multiple images to disk from a URL
Stars: ✭ 15 (-88.19%)
Textannotationgraphs: A modular annotation system that supports complex, interactive annotation graphs embedded on top of sequences of text.
Stars: ✭ 73 (-42.52%)
Scrapy Selenium: Scrapy middleware to handle JavaScript pages using Selenium
Stars: ✭ 550 (+333.07%)
Squidwarc: Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
Stars: ✭ 125 (-1.57%)
Ichiran: Linguistic tools for texts in the Japanese language
Stars: ✭ 120 (-5.51%)
Pyconll: A minimal, pure Python library to interface with CoNLL-U format files.
Stars: ✭ 104 (-18.11%)
Arachnid: Powerful web scraping framework for Crystal
Stars: ✭ 68 (-46.46%)
Ferret: Declarative web scraping
Stars: ✭ 4,837 (+3708.66%)