Apify SDK

The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs with headless Chrome, Puppeteer, and other tools.

Stars: ✭ 3,154 (+2383.46%)

Mutual labels: crawling

Grawler

Grawler is a tool written in PHP, with a web interface, that automates the use of Google dorks, scrapes the results, and stores them in a file.

Stars: ✭ 98 (-22.83%)

Mutual labels: crawling

TextGridTools

Read, write, and manipulate Praat TextGrid files with Python

Stars: ✭ 84 (-33.86%)

Mutual labels: linguistics

Scrapyrt

HTTP API for Scrapy spiders

Stars: ✭ 637 (+401.57%)

Mutual labels: crawling

Dataflowkit

Extract structured data from web sites. Web sites scraping.

Stars: ✭ 456 (+259.06%)

Mutual labels: crawling

serverless-instagram-crawler

Serverless Instagram hashtag crawler built with AWS Lambda and DynamoDB

Stars: ✭ 33 (-74.02%)

Mutual labels: crawling

Python Crawling Tutorial

Python crawling tutorial

Stars: ✭ 57 (-55.12%)

Mutual labels: crawling

Isp Data Pollution

ISP Data Pollution to Protect Private Browsing History with Obfuscation

Stars: ✭ 425 (+234.65%)

Mutual labels: crawling

Dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library for .NET Core that writes its output through Entity Framework Core. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to fit your custom requirements. Medium article: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

Stars: ✭ 100 (-21.26%)

Mutual labels: crawling

Stopstalk Deployment

Stop stalking and start StopStalking 😉

Stars: ✭ 276 (+117.32%)

Mutual labels: crawling

Psychopy

For running psychology and neuroscience experiments

Stars: ✭ 1,020 (+703.15%)

Mutual labels: linguistics

rsyntaxtree

Syntax tree generator made with Ruby and RMagick

Stars: ✭ 62 (-51.18%)

Mutual labels: linguistics

Scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

Stars: ✭ 42,343 (+33240.94%)

Mutual labels: crawling

treebender

An HPSG-inspired symbolic natural language parser written in Rust

Stars: ✭ 24 (-81.1%)

Mutual labels: linguistics

Lulu

[Unmaintained] A simple and clean video/music/image downloader 👾

Stars: ✭ 789 (+521.26%)

Mutual labels: crawling

OpenGNT

Open Greek New Testament Project; NA28 / NA27 Equivalent Text & Resources

Stars: ✭ 55 (-56.69%)

Mutual labels: linguistics

Dig Etl Engine

Download DIG to run on your laptop or server.

Stars: ✭ 81 (-36.22%)

Mutual labels: crawling

popular restaurants from officials

A project that finds genuinely good restaurants by analyzing the business expense records of Seoul city officials

Stars: ✭ 22 (-82.68%)

Mutual labels: crawling

Headless Chrome Crawler

Distributed crawler powered by Headless Chrome

Stars: ✭ 5,129 (+3938.58%)

Mutual labels: crawling

Weixin public corpus

WeChat Official Accounts corpus

Stars: ✭ 465 (+266.14%)

Mutual labels: linguistics

talospider

talospider: a simple, lightweight scraping micro-framework

Stars: ✭ 57 (-55.12%)

Mutual labels: crawling

Beta

An open source reimplementation of Benny Brodda's BETA in Python

Stars: ✭ 65 (-48.82%)

Mutual labels: linguistics

Crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

Stars: ✭ 440 (+246.46%)

Mutual labels: crawling

Elpis

🙊 WIP software for creating speech recognition models.

Stars: ✭ 101 (-20.47%)

Mutual labels: linguistics

Pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as extracting n-grams and frequency lists, and building simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL), as well as clients to interface with various NLP-specific servers. Most notably, PyNLPl features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

Stars: ✭ 426 (+235.43%)

Mutual labels: linguistics

Yesterday I Learned

Brainfarts are caused by the rupturing of the cerebral sphincter.

Stars: ✭ 50 (-60.63%)

Mutual labels: linguistics

Webster

A reliable high-level web crawling & scraping framework for Node.js.

Stars: ✭ 364 (+186.61%)

Mutual labels: crawling

Colibri Core

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At its core is the tool colibri-patternmodeller, which allows you to build, view, manipulate, and query pattern models.

Stars: ✭ 112 (-11.81%)

Mutual labels: linguistics
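The Colibri Core entry above distinguishes n-grams from skipgrams (patterns with one or more gaps). A minimal sketch of that distinction in plain Python, assuming whitespace-tokenized input and fixed-size gaps only; it does not use Colibri Core's own API, and the function names and the GAP placeholder are hypothetical.

```python
from collections import Counter
from itertools import combinations

GAP = "{*}"  # hypothetical placeholder marking one skipped token


def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens, n):
    """Return n-token windows with one or more fixed-size gaps.

    Only inner positions (never the first or last token) are replaced
    by GAP; every non-empty subset of inner positions yields one skipgram.
    """
    result = []
    for window in ngrams(tokens, n):
        inner = range(1, n - 1)
        for k in range(1, n - 1):
            for gap_positions in combinations(inner, k):
                result.append(tuple(GAP if i in gap_positions else tok
                                    for i, tok in enumerate(window)))
    return result


if __name__ == "__main__":
    tokens = "to be or not to be".split()
    # Frequency lists: count how often each pattern occurs.
    print(Counter(ngrams(tokens, 2)).most_common(1))
    print(Counter(skipgrams(tokens, 3)))
```

Here ("to", "{*}", "or") matches any three-token span that starts with "to" and ends with "or", which is why skipgrams can capture recurring frames that contiguous n-grams miss. Dynamic-size gaps, which Colibri Core also supports, are not modelled in this sketch.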

Sasila

A flexible, user-friendly crawler framework

Stars: ✭ 286 (+125.2%)

Mutual labels: crawling

Python Datamuse

Python 3 wrapper for the Datamuse API

Stars: ✭ 47 (-62.99%)

Mutual labels: linguistics

Gopa

[WIP] GOPA, a spider written in Go for Elasticsearch. Demo: http://index.elasticsearch.cn

Stars: ✭ 277 (+118.11%)

Mutual labels: crawling

Wikipron

Massively multilingual pronunciation mining

Stars: ✭ 99 (-22.05%)

Mutual labels: linguistics

Spidy

The simple, easy to use command line web crawler.

Stars: ✭ 257 (+102.36%)

Mutual labels: crawling

Phonemes

Jason Riggle's chart of phonological features in JSON format + extras

Stars: ✭ 33 (-74.02%)

Mutual labels: linguistics

ARGUS

ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9

Stars: ✭ 68 (-46.46%)

Mutual labels: crawling

Awesome Puppeteer

A curated list of awesome puppeteer resources.

Stars: ✭ 1,728 (+1260.63%)

Mutual labels: crawling

bots-zoo

No description or website provided.

Stars: ✭ 59 (-53.54%)

Mutual labels: crawling

Awesome Sentiment Analysis

😀😄😂😭 A curated list of Sentiment Analysis methods, implementations and misc. 😥😟😱😤

Stars: ✭ 816 (+542.52%)

Mutual labels: linguistics

concepticon-data

The curation repository for the data behind Concepticon.

Stars: ✭ 25 (-80.31%)

Mutual labels: linguistics

Flat

FoLiA Linguistic Annotation Tool. Flat is a web-based linguistic annotation environment built around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich them with new annotations; a wide variety of linguistic annotation types is supported through the FoLiA paradigm.

Stars: ✭ 93 (-26.77%)

Mutual labels: linguistics

wikipron

Massively multilingual pronunciation mining

Stars: ✭ 167 (+31.5%)

Mutual labels: linguistics

Nltk data

NLTK Data

Stars: ✭ 675 (+431.5%)

Mutual labels: linguistics

flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons