Tatooine: A pluggable, simple and powerful web scraper.
Installation
```shell
$ npm install tatooine --save
```
Docs
```javascript
import Tatooine from "tatooine"

// schemas: Array<Schema> => A list of schemas.
// customEngines?: Array<CustomEngine> => An optional list of custom engines.
const promise = Tatooine(schemas, customEngines)
```
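To make the call above more concrete, here is a conceptual sketch of how a function with this signature could route each schema to the engine whose `engine` name matches, then apply the optional `fork` hook. It is not Tatooine's implementation. `runSchemas`, the `echo` engine, the assumption that an engine's `run` resolves to `{ sources, error }`, and the assumption that the promise resolves to one result per schema are all hypothetical.

```javascript
// Hypothetical model of a Tatooine-style call; NOT the library's source code.
// Assumes each engine's run(schema) resolves to { sources, error } and that
// the returned promise resolves to one result per schema.
function runSchemas(schemas, engines) {
  // Index engines by name so each schema can find its runner.
  const byName = new Map(engines.map((e) => [e.engine, e]));

  return Promise.all(
    schemas.map(async (schema) => {
      const engine = byName.get(schema.engine);
      if (!engine) {
        // No matching engine: report an error instead of throwing.
        return { sources: [], error: `No engine named "${schema.engine}"` };
      }
      const result = await engine.run(schema);
      // Apply the optional fork hook to the already processed result.
      return typeof schema.fork === "function" ? schema.fork(result) : result;
    })
  );
}

// Hypothetical engine that simply returns the schema's own `items`.
const echoEngine = {
  engine: "echo",
  run: async (schema) => ({ sources: schema.items, error: null }),
};

runSchemas([{ engine: "echo", items: [1, 2] }], [echoEngine]).then((results) => {
  console.log(results);
});
```

Matching by name is what lets standard and custom engines sit side by side: a schema only needs the right `engine` string.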
Standard Engines
For convenience, Tatooine provides three standard engines:
- Markup Engine docs (e.g. RSS, Source Code Scraping, etc.)
- JSON Engine docs (e.g. APIs, Web Services, etc.)
- SPA Engine docs (e.g. Single Page Applications, Async Content, etc.)
Extending Standard Engines
The `fork` property lets you extend an engine's capabilities to suit your needs when creating schemas for the standard engines `spa`, `json` and/or `markup`.
```javascript
// index.js
import Tatooine from "tatooine"

const schemas = [{
  engine: "json",
  options: { ... },
  selectors: { ... },
  fork({ sources, error }) {
    // Do anything you want with the processed data, then return it.
    return { sources, error };
  }
}]

const promise = Tatooine(schemas)
```
Note: The data passed to `fork` as a parameter has already been processed using the given schema's configuration.
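As an illustration, a `fork` hook might clean up the processed results before they are returned. This sketch assumes `sources` is an array of objects with a hypothetical `title` field; the real shape depends on your engine and selectors, so adapt the field names accordingly.

```javascript
// Sketch of a fork hook. ASSUMES `sources` is an array of objects with a
// hypothetical `title` field; adjust to what your selectors produce.
function fork({ sources, error }) {
  // Pass the result through untouched if the engine reported an error.
  if (error || !Array.isArray(sources)) {
    return { sources, error };
  }
  const cleaned = sources
    // Drop entries without a non-empty title.
    .filter((item) => typeof item.title === "string" && item.title.trim() !== "")
    // Trim surrounding whitespace from the remaining titles.
    .map((item) => ({ ...item, title: item.title.trim() }));
  return { sources: cleaned, error };
}

const result = fork({
  sources: [{ title: "  Hello " }, { title: "" }, { url: "/no-title" }],
  error: null,
});
console.log(result.sources); // [ { title: 'Hello' } ]
```

Inside a schema, the same function can be attached with the shorthand property `fork`, in place of the inline method shown in the example above.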
Custom Engines
Beyond the standard engines, you can also create custom engines with your own rules whenever needed. Follow the structure below to extend Tatooine's engine capabilities:
```javascript
// xyz-engine.js
function getSourcesFromSomewhere(schema) {
  // Your engine logic
}

export default {
  engine: "xyz",
  run: getSourcesFromSomewhere,
}
```

```javascript
// xyz-schema.js
export default {
  engine: "xyz",
  ...
};
```

```javascript
// index.js
import Tatooine from "tatooine"
import xyzEngine from "./xyz-engine.js"
import xyzSchema from "./xyz-schema.js"

const promise = Tatooine([xyzSchema], [xyzEngine])
```
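For a fuller picture, here is a minimal sketch of a hypothetical `csv` engine that turns CSV text into objects without any network access. It assumes `run` receives the schema and may resolve to `{ sources, error }` (the same shape `fork` receives); the exact contract Tatooine expects from `run` is not specified here, and `options.text` is an illustrative field, not part of Tatooine's API.

```javascript
// Hypothetical "csv" custom engine. ASSUMES run(schema) may resolve to
// { sources, error }; `options.text` is an illustrative schema field.
async function getSourcesFromCsv(schema) {
  try {
    const text = (schema.options && schema.options.text) || "";
    const [headerLine, ...rows] = text.trim().split("\n");
    const headers = headerLine.split(",").map((h) => h.trim());
    const sources = rows.map((row) => {
      const cells = row.split(",");
      // Pair each header with the cell in the same column.
      return Object.fromEntries(headers.map((h, i) => [h, (cells[i] || "").trim()]));
    });
    return { sources, error: null };
  } catch (err) {
    return { sources: [], error: err.message };
  }
}

const csvEngine = { engine: "csv", run: getSourcesFromCsv };

const csvSchema = {
  engine: "csv",
  options: { text: "name,url\nTatooine,https://example.com" },
};

// Call the engine directly, outside Tatooine, to inspect its output.
csvEngine.run(csvSchema).then(({ sources }) => console.log(sources));
```

With Tatooine, the pair would be registered the same way as `xyzSchema` and `xyzEngine`: `Tatooine([csvSchema], [csvEngine])`. The comma split is deliberately naive (no quoted fields); a real engine would likely rely on a proper CSV parser.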