All Projects → bpolaszek → Bentools Etl

bpolaszek / Bentools Etl

Licence: mit
PHP ETL (Extract / Transform / Load) library with SOLID principles + almost no dependency.

Projects that are alternatives of or similar to Bentools Etl

YaEtl
Yet Another ETL in PHP
Stars: ✭ 60 (+33.33%)
Mutual labels:  etl, loader, transformer
Aws Etl Orchestrator
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Stars: ✭ 245 (+444.44%)
Mutual labels:  etl, extract, transform
Etl.net
Mass processing data with a complete ETL for .net developers
Stars: ✭ 129 (+186.67%)
Mutual labels:  etl, extract, transform
proc-that
proc(ess)-that - easy extendable ETL tool for Node.js. Written in TypeScript.
Stars: ✭ 25 (-44.44%)
Mutual labels:  etl, loader, transformer
filmow to letterboxd
🎥 💚To import Filmow watched movies in Letterboxd. The script can be found on the letterboxd page about importing data under "Additional tools".
Stars: ✭ 13 (-71.11%)
Mutual labels:  export, import
dynamic-import-assets
Dynamic Imports for JavaScript and CSS.
Stars: ✭ 20 (-55.56%)
Mutual labels:  loader, import
Xnalaramesh
Blender addon Import/Export XPS Models, Poses
Stars: ✭ 262 (+482.22%)
Mutual labels:  export, import
Npoi.mapper
Use this tool to import or export data with Excel file. The tool is a convention based mapper between strong typed object and Excel data via NPOI.
Stars: ✭ 348 (+673.33%)
Mutual labels:  export, import
react-native-less-transformer
Use Less to style your React Native apps.
Stars: ✭ 26 (-42.22%)
Mutual labels:  loader, transform
Node Firestore Import Export
Firestore data import and export
Stars: ✭ 271 (+502.22%)
Mutual labels:  export, import
Simple Excel
Read and write simple Excel and CSV files
Stars: ✭ 502 (+1015.56%)
Mutual labels:  export, import
zabbix-review-export-import
Clone of zabbix-review-export with added import object(s) feature
Stars: ✭ 36 (-20%)
Mutual labels:  export, import
es2postgres
ElasticSearch to PostgreSQL loader
Stars: ✭ 18 (-60%)
Mutual labels:  etl, loader
Shrimpit
Shrimpit 🍤 is a small CLI analysis tool for checking unused JavaScript, JSX & Vue templates ES6 exports in your project.
Stars: ✭ 255 (+466.67%)
Mutual labels:  export, import
l2cu
L²CU: LDraw Linux Command line Utility
Stars: ✭ 14 (-68.89%)
Mutual labels:  export, import
Csvkeychain
Import/export between Apple Keychain.app and plain CSV file.
Stars: ✭ 281 (+524.44%)
Mutual labels:  export, import
Jailer
Database Subsetting and Relational Data Browsing Tool.
Stars: ✭ 576 (+1180%)
Mutual labels:  extract, export
React Native Svg Transformer
Import SVG files in your React Native project the same way that you would in a Web application.
Stars: ✭ 568 (+1162.22%)
Mutual labels:  transformer, loader
Ether sql
A python library to push ethereum blockchain data into an sql database.
Stars: ✭ 41 (-8.89%)
Mutual labels:  etl, export
Ethereum Etl
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Stars: ✭ 956 (+2024.44%)
Mutual labels:  etl, export

Latest Stable Version License Build Status Coverage Status Quality Score Total Downloads

Okay, so you heard about the Extract / Transform / Load pattern and you're looking for a PHP library to do the stuff.

Alright, let's go!

Installation

composer require bentools/etl:^[email protected]

Warning: version 3.0 is a complete rewrite and a involves important BC breaks. Don't upgrade from ^2.0 unless you know what you're doing!

Usage

To sum up, you will apply transformations onto an iterable of any things in order to load them in some place. Sometimes your iterable is ready to go, sometimes you just don't need to perform transformations, but anyway you need to load that data somewhere.

Let's start with a really simple example:

use BenTools\ETL\EtlBuilder;
use BenTools\ETL\Loader\JsonFileLoader;

$data = [
    'foo',
    'bar',
];

$etl = EtlBuilder::init()
    ->loadInto(JsonFileLoader::toFile(__DIR__.'/data.json'))
    ->createEtl();
$etl->process($data);

Basically you just loaded the string ["foo","bar"] into data.json. Yay!

Now let's apply a basic uppercase transformation:

use BenTools\ETL\EtlBuilder;
use BenTools\ETL\Loader\JsonFileLoader;

$data = [
    'foo',
    'bar',
];

$etl = EtlBuilder::init()
    ->transformWith(new CallableTransformer('strtoupper'))
    ->loadInto(JsonFileLoader::factory())
    ->createEtl();
$etl->process($data, __DIR__.'/data.json'); // You can also set the output file when processing :-)

Didn't you just write the string ["FOO","BAR"] into data.json ? Yes, you did!

Okay, but what if your source data is not an iterable (yet)? It can be a CSV file or a CSV string, for instance. Here's another example:

use BenTools\ETL\EtlBuilder;
use BenTools\ETL\Extractor\CsvExtractor;
use BenTools\ETL\Loader\JsonFileLoader;

$data = <<<CSV
country_code,country_name,president
US,USA,"Donald Trump"
RU,Russia,"Vladimir Putin"
CSV;

$etl = EtlBuilder::init()
    ->extractFrom(new CsvExtractor())
    ->loadInto(JsonFileLoader::factory(['json_options' => \JSON_PRETTY_PRINT]))
    ->createEtl();
$etl->process($data, __DIR__.'/data.json');

As you guessed, the following content was just written into presidents.json:

[
    {
        "country_code": "US",
        "country_name": "USA",
        "president": "Donald Trump"
    },
    {
        "country_code": "RU",
        "country_name": "Russia",
        "president": "Vladimir Putin"
    }
]

We provide helpful extractors and loaders to manipulate JSON, CSV, text, and you'll also find a DoctrineORMLoader for when your transformer yields Doctrine entities.

Because yes, a transformer must return a \Generator. Why? Because a single extracted item can lead to several output items. Let's take a more sophisticated example:

use BenTools\ETL\EtlBuilder;
use BenTools\ETL\Extractor\JsonExtractor;

$pdo = new \PDO('mysql:host=localhost;dbname=test');
$input = __DIR__.'/presidents.json';

$etl = EtlBuilder::init()
    ->extractFrom(new JsonExtractor())
    ->transformWith(
        function ($item) use ($pdo) {
            $stmt = $pdo->prepare('SELECT country_code FROM countries WHERE country_code = ?');
            $stmt->bindValue(1, $item['country_code'], \PDO::PARAM_STR);
            $stmt->execute();
            if (0 === $stmt->rowCount()) {
                yield ['INSERT INTO countries (country_code, country_name) VALUES (?, ?)', [$item['country_code'], $item['country_name']]];
            }

            yield ['REPLACE INTO presidents (country_code, president_name) VALUES (?, ?)', [$item['country_code'], $item['president']]];

        }
    )
    ->loadInto(
        $loader = function (\Generator $queries) use ($pdo) {
            foreach ($queries as $query) {
                list($sql, $params) = $query;
                $stmt = $pdo->prepare($sql);
                foreach ($params as $i => $value) {
                    $stmt->bindValue($i + 1, $value);
                }
                $stmt->execute();
            }
        }
    )
    ->createEtl();

$etl->process(__DIR__.'/presidents.json'); // The JsonExtractor will convert that file to a PHP array

As you can see, from a single item, we loaded up to 2 queries.

Your extractors, transformers and loaders can implement ExtractorInterface, TransformerInterface or LoaderInterface as well as being simple callables.

Skipping items

Each extractor / transformer / loader callback gets the current Etl object injected in their arguments.

This allows you to ask the ETL to skip an item, or even to stop the whole process:

use BenTools\ETL\Etl;
use BenTools\ETL\EtlBuilder;
use BenTools\ETL\Transformer\CallableTransformer;

$fruits = [
    'apple',
    'banana',
    'strawberry',
    'pineapple',
    'pear',
];


$storage = [];
$etl = EtlBuilder::init()
    ->transformWith(new CallableTransformer('strtoupper'))
    ->loadInto(
        function ($generated, $key, Etl $etl) use (&$storage) {
            foreach ($generated as $fruit) {
                if ('BANANA' === $fruit) {
                    $etl->skipCurrentItem();
                    break;
                }
                if ('PINEAPPLE' === $fruit) {
                    $etl->stopProcessing();
                    break;
                }
                $storage[] = $fruit;
            }
        })
    ->createEtl();

$etl->process($fruits);

var_dump($storage); // ['APPLE', 'STRAWBERRY']

Events

Now you're wondering how you can hook on the ETL lifecycle, to log things, handle exceptions, ... This library ships with a built-in Event Dispatcher that you can leverage when:

  • The ETL starts
  • An item has been extracted
  • The extraction failed
  • An item has been transformed
  • Transformation failed
  • Loader is initialized (1st item is about to be loaded)
  • An item has been loaded
  • Loading failed
  • An item has been skipped
  • The ETL was stopped
  • A flush operation was completed
  • A rollback operation was completed
  • The ETL completed the whole process.

The ItemEvents (on extract, transform, load) will allow you to mark the current item to be skipped, or even handle runtime exceptions. Let's take another example:

use BenTools\ETL\EtlBuilder;
use BenTools\ETL\EventDispatcher\Event\ItemExceptionEvent;

$fruits = [
    'apple',
    new \RuntimeException('Is tomato a fruit?'),
    'banana',
];


$storage = [];
$etl = EtlBuilder::init()
    ->transformWith(
        function ($item, $key) {
            if ($item instanceof \Exception) {
                throw $item;
            }

            yield $key => $item;
        })
    ->loadInto(
        function (iterable $transformed) use (&$storage) {
            foreach ($transformed as $fruit) {
                $storage[] = $fruit;
            }
        })
    ->onTransformException(
        function (ItemExceptionEvent $event) {
            echo $event->getException()->getMessage(); // Is tomato a fruit?
            $event->ignoreException();
        })
    ->createEtl();

$etl->process($fruits);

var_dump($storage); // ['apple', 'banana']

Here, we intentionnally threw an exception during the transform operation. But thanks to the event dispatcher, we could tell the ETL this exception can be safely ignored and it can pursue the rest of the process.

You can attach as many event listeners as you wish, and sort them by priority.

Recipes

A recipe is an ETL pattern that can be reused through different tasks. If you want to log everything that goes through an ETL for example, use our built-in Logger recipe:

use BenTools\ETL\EtlBuilder;
use BenTools\ETL\Recipe\LoggerRecipe;

$etl = EtlBuilder::init()
    ->useRecipe(new LoggerRecipe($logger))
    ->createEtl();

You can also create your own recipes:

use BenTools\ETL\EtlBuilder;
use BenTools\ETL\EventDispatcher\Event\ItemEvent;
use BenTools\ETL\Extractor\JsonExtractor;
use BenTools\ETL\Loader\CsvFileLoader;
use BenTools\ETL\Recipe\Recipe;


class JSONToCSVRecipe extends Recipe
{
    /**
     * @inheritDoc
     */
    public function updateBuilder(EtlBuilder $builder): EtlBuilder
    {
        return $builder
            ->extractFrom(new JsonExtractor())
            ->loadInto($loader = CsvFileLoader::factory(['delimiter' => ';']))
            ->onLoaderInit(
                function (ItemEvent $event) use ($loader) {
                    $loader::factory(['keys' => array_keys($event->getItem())], $loader);
                })
            ;
    }

}

$builder = EtlBuilder::init()->useRecipe(new JSONToCSVRecipe());
$etl = $builder->createEtl();
$etl->process(__DIR__.'/presidents.json', __DIR__.'/presidents.csv');

The above example will result in presidents.csv containing:

country_code;country_name;president
US;USA;"Donald Trump"
RU;Russia;"Vladimir Putin"

To sum up, a recipe is a kind of an ETLBuilder factory, but keep in mind that a recipe will only add event listeners to the existing builder but can also replace the builder's extractor, transformer and/or loader.

Tests

./vendor/bin/phpunit

License

MIT

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].