dmotz / stream-snitch

Licence: MIT License
Event emitter for watching text streams with regex patterns

stream-snitch

Dan Motzenbecker, MIT License

@dcmotz

Intro

stream-snitch is a tiny Node module that allows you to match streaming data patterns with regular expressions. It's much like ... | grep, but for Node streams using native events and regular expression objects. It's also a good introduction to the benefits of streams if you're unconvinced or unintroduced.

Use Cases

The most obvious use case is scraping or crawling documents from an external source.

Typically you might buffer the incoming chunks from a response into a string buffer and then inspect the full response in the response's end callback.

For instance, if you had a function intended to download all image URLs embedded in a document:

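var http = require('http');

function scrape(url, fn, cb) {
  http.get(url, function(res) {
    var data = '';
    res.setEncoding('utf8'); // decode chunks as text before concatenating
    res.on('data', function(chunk) { data += chunk; });
    res.on('end', function() {
      var rx = /<img.+src=["'](.+)['"].?>/gi, match;
      // pass each captured src value (group 1) to fn
      while ((match = rx.exec(data)) !== null) fn(match[1]);
      cb();
    });
  });
}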

Of course, the response could be enormous and bloat your data buffer. Worse, the response chunks could arrive slowly while you want to run hundreds of these download tasks concurrently and finish as quickly as possible. Waiting for the entire response to finish negates part of the asynchronous benefit of Node's model and ignores the fact that the response is a stream object that delivers the data incrementally as it arrives.

Here's the same task with stream-snitch:

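var http         = require('http'),
    StreamSnitch = require('stream-snitch');

function scrape(url, fn, cb) {
  http.get(url, function(res) {
    var snitch = new StreamSnitch(/<img.+src=["'](.+)['"].?>/gi);
    // fires as soon as a match appears in the buffered chunks
    snitch.on('match', function(match) { fn(match[1]); });
    res.pipe(snitch);
    res.on('end', cb);
  });
}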

The image download tasks (represented by fn) can occur as sources are found without having to wait for a potentially huge or slow request to finish first. Since you specify native regular expressions, the objects sent to match listeners will contain capture group matches as the above demonstrates (match[1]).
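To make the capture group point concrete, here is a small sketch that uses only the built-in RegExp API, without stream-snitch. It assumes the match objects look like ordinary exec results, as the match[1] usage above suggests. The sample HTML string is made up for illustration, with one tag per line because the pattern's greedy .+ does not cross newlines.

```javascript
// Plain RegExp behaviour; the sample markup below is invented for this sketch.
var rx = /<img.+src=["'](.+)['"].?>/gi;
var html = '<img src="a.png">\n<img alt="x" src=\'b.jpg\'>';

var sources = [];
var match;
while ((match = rx.exec(html)) !== null) {
  // match[0] is the full matched tag, match[1] is the first capture group
  sources.push(match[1]);
}

console.log(sources); // [ 'a.png', 'b.jpg' ]
```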

For crawling, you could match href properties and recursively pipe their responses through more stream-snitch instances.

Here's another example (in CoffeeScript) from soundscrape that matches data from inline JSON:

scrape = (page, artist, title) ->
  http.get "#{ baseUrl }#{ artist }/#{ title or 'tracks?page=' + page }", (res) ->
    snitch = new StreamSnitch /bufferTracks\.push\((\{.+?\})\)/g
    snitch[if title then 'once' else 'on'] 'match', (match) ->
      download parse match[1]
      scrape ++page, artist, title unless ++trackCount % 10

    res.pipe snitch

Usage

$ npm install stream-snitch

Create a stream-snitch instance with a search pattern, set a match callback, and pipe some data in:

var fs           = require('fs'),
    StreamSnitch = require('stream-snitch'),
    albumList    = fs.createReadStream('./recently_played_(HUGE).txt'),
    cosmicSnitch = new StreamSnitch(/^cosmic\sslop$/mgi);

cosmicSnitch.on('match', console.log.bind(console));
albumList.pipe(cosmicSnitch);

For the lazy, you can even specify the match event callback in the instantiation:

var words = new StreamSnitch(/\s(\w+)\s/g, function(m) { /* ... */ });

Caveats

stream-snitch is simple internally and uses regular expressions for flexibility, rather than more efficient procedural parsing. The first consequence of this is that it only supports streams of text and will decode binary buffers automatically.

Because it supports arbitrary regular expressions, including capture groups and start / end anchors, chunks are buffered internally, examined as they arrive, and discarded only when matches are found. When given a regular expression in multiline mode (the m flag), the buffer is cleared at every newline.

stream-snitch will periodically clear its internal buffer if it grows too large, which could occur if no matches are found over a large amount of data or if you use an overly broad capture. There is a chance that legitimate match fragments could be discarded when that limit is reached unless you specify a large enough buffer size for your needs.

The default buffer size is one megabyte, but you can pass a custom size like this if you anticipate a very large capture size:

new StreamSnitch(/.../g, { bufferCap: 1024 * 1024 * 20 });
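The buffering behaviour described above can be pictured with a rough sketch. This is not stream-snitch's actual source: MiniSnitch and its internals are hypothetical names written for illustration, assuming a Writable stream that keeps unmatched text, drops everything up to the last match, and empties itself once the cap is exceeded. The multiline clearing rule is left out.

```javascript
var Writable = require('stream').Writable;
var util = require('util');

// Hypothetical stand-in for illustration only, NOT the library's implementation.
function MiniSnitch(regex, options) {
  Writable.call(this, { decodeStrings: false });
  this.regex = regex;
  this.bufferCap = (options && options.bufferCap) || 1024 * 1024;
  this.buffer = '';
}
util.inherits(MiniSnitch, Writable);

MiniSnitch.prototype._write = function (chunk, encoding, done) {
  // Drop accumulated text once it exceeds the cap (partial matches may be lost).
  if (Buffer.byteLength(this.buffer) > this.bufferCap) this.buffer = '';
  this.buffer += chunk.toString();

  var match, end = 0;
  this.regex.lastIndex = 0;
  while ((match = this.regex.exec(this.buffer)) !== null) {
    this.emit('match', match);
    end = match.index + match[0].length;
    if (!this.regex.global) break;               // non-global patterns never advance
    if (match[0] === '') this.regex.lastIndex++; // avoid looping on empty matches
  }
  // Keep only the text after the last match for the next chunk.
  if (end > 0) this.buffer = this.buffer.slice(end);
  done();
};

// Usage: the first match is split across two chunks.
var found = [];
var snitch = new MiniSnitch(/<b>(\w+)<\/b>/g, { bufferCap: 64 });
snitch.on('match', function (m) { found.push(m[1]); });
snitch.on('finish', function () { console.log(found); }); // [ 'one', 'two' ]
snitch.write('<b>on');
snitch.write('e</b> and <b>two</b>');
snitch.end();
```

With a cap smaller than the text between matches, the same sketch would silently lose a fragment, which is the trade-off the bufferCap option is there to tune.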

If you want to reuse a stream-snitch instance after one stream ends, you can manually call the clearBuffer() method.

It should be obvious, but remember to use the m (multiline) flag in your patterns if you're using the $ operator to match endings line by line. If you're legitimately looking for a pattern at the end of a document, stream-snitch offers only a modest advantage over buffering the entire response, in that it periodically discards chunks from memory.
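As a quick illustration of the m flag, using plain RegExp and no stream-snitch, the snippet below tests the pattern from the usage example against a made-up three-line string:

```javascript
// The sample text is invented for this sketch.
var text = 'maggot brain\ncosmic slop\nstanding on the verge';

// Without m, ^ and $ anchor only to the start and end of the whole string.
var wholeString = /^cosmic\sslop$/i.test(text);
// With m, they also anchor at each line break.
var perLine = /^cosmic\sslop$/mi.test(text);

console.log(wholeString); // false
console.log(perLine);     // true
```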
