capitalone / Locopy

Licence: apache-2.0

locopy: Loading/Unloading to Redshift and Snowflake using Python.

.. image:: https://github.com/capitalone/locopy/workflows/Python%20package/badge.svg
    :target: https://github.com/capitalone/locopy/actions
.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
    :target: https://github.com/ambv/black

locopy: Data Load and Copy using Python
========================================

A Python library to assist with ETL processing for:

- Amazon Redshift (``COPY``, ``UNLOAD``)
- Snowflake (``COPY INTO <table>``, ``COPY INTO <location>``)

In addition:

- The library supports Python 3.6 to 3.8.
- It is DB driver (adapter) agnostic. Use your favourite driver that complies with
  `DB-API 2.0 <https://www.python.org/dev/peps/pep-0249/>`_.
- It provides functionality to download and upload data to S3 buckets and internal stages (Snowflake).

Quick Installation
------------------

.. code-block:: bash

    pip install locopy

or install from conda-forge

.. code-block:: bash

    conda config --add channels conda-forge
    conda install locopy

Installation instructions
-------------------------

A virtual or conda environment is highly recommended.

.. code-block:: bash

    $ virtualenv locopy
    $ source locopy/bin/activate
    $ pip install --upgrade setuptools pip
    $ pip install locopy

Python Database API Specification 2.0
-------------------------------------

Rather than using a specific Python DB driver / adapter for Postgres (which should support Amazon Redshift or Snowflake), locopy prefers to be agnostic. As an end user you can use any package that implements the Python Database API Specification 2.0.

The following packages have been tested:

- ``psycopg2``
- ``pg8000``
- ``snowflake-connector-python``

You can use whichever one you prefer by importing the package and passing it to the constructor as the ``dbapi`` argument.
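The idea of accepting the driver module as an argument can be sketched with the standard library's ``sqlite3``, which also follows DB-API 2.0. This is only an illustration of the pattern, not locopy's implementation: the ``AgnosticConnection`` class and its methods are hypothetical names made up for this example.

```python
import sqlite3


class AgnosticConnection:
    """Hypothetical wrapper that accepts any DB-API 2.0 module as `dbapi`."""

    def __init__(self, dbapi, **connect_kwargs):
        self.dbapi = dbapi
        self.connect_kwargs = connect_kwargs
        self.conn = None
        self.cursor = None

    def __enter__(self):
        # PEP 249 requires a module-level connect() that returns a connection.
        self.conn = self.dbapi.connect(**self.connect_kwargs)
        self.cursor = self.conn.cursor()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.cursor.close()
        self.conn.close()
        return False  # do not swallow exceptions

    def execute(self, sql, params=()):
        self.cursor.execute(sql, params)
        return self.cursor


# sqlite3 stands in for psycopg2 / pg8000 here; only the module changes.
with AgnosticConnection(dbapi=sqlite3, database=":memory:") as db:
    db.execute("CREATE TABLE t (variable VARCHAR(20))")
    # "?" is sqlite3's placeholder style; other drivers may use a different one.
    db.execute("INSERT INTO t VALUES (?)", ("hello",))
    rows = db.execute("SELECT variable FROM t").fetchall()

print(rows)
```

Because the wrapper only relies on ``connect()``, ``cursor()``, ``execute()`` and ``fetchall()``, swapping the driver should mostly mean passing a different module and different connection keyword arguments.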

Usage
-----

Store your connection parameters in a YAML file (or pass them in directly). The YAML file consists of the following items:

.. code-block:: yaml

    # required to connect to redshift
    host: my.redshift.cluster.com
    port: 5439
    database: db
    user: userid
    password: password
    ## optional extras for the dbapi connector
    sslmode: require
    another_option: 123

If you aren't loading data, you don't need to have AWS tokens set up. The Redshift connection (``Redshift``) can be used like this:

.. code-block:: python

    import pg8000
    import locopy

    with locopy.Redshift(dbapi=pg8000, config_yaml="config.yml") as redshift:
        redshift.execute("SELECT * FROM schema.table")
        df = redshift.to_dataframe()
    print(df)

If you want to load data to Redshift via S3, the Redshift class inherits from S3:

.. code-block:: python

    import pg8000
    import locopy

    with locopy.Redshift(dbapi=pg8000, config_yaml="config.yml") as redshift:
        redshift.execute("SET query_group TO quick")
        redshift.execute("CREATE TABLE schema.table (variable VARCHAR(20)) DISTKEY(variable)")
        redshift.load_and_copy(
            local_file="example/example_data.csv",
            s3_bucket="my_s3_bucket",
            table_name="schema.table",
            delim=",")
        redshift.execute("SELECT * FROM schema.table")
        res = redshift.cursor.fetchall()

    print(res)
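For orientation, loading via S3 ultimately relies on a Redshift ``COPY`` statement that points at the staged file. The sketch below shows roughly what such a statement looks like. It is not the SQL that locopy generates: ``build_copy_sql`` is a hypothetical helper, and the S3 key and IAM role ARN are placeholder values assumed for the example.

```python
def build_copy_sql(table_name, s3_bucket, s3_key, iam_role, delim=","):
    """Hypothetical helper: assemble a basic Redshift COPY statement.

    Values are interpolated directly, so this is only suitable for
    trusted inputs in an illustration like this one.
    """
    s3_path = f"s3://{s3_bucket}/{s3_key}"
    return (
        f"COPY {table_name} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        f"DELIMITER '{delim}'"
    )


copy_sql = build_copy_sql(
    table_name="schema.table",
    s3_bucket="my_s3_bucket",
    s3_key="example_data.csv",  # placeholder key
    iam_role="arn:aws:iam::123456789012:role/example-role",  # placeholder ARN
)
print(copy_sql)
```

A real load may authorize differently (for example with temporary tokens rather than an IAM role) and add format options, which is the kind of detail ``load_and_copy`` takes care of.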

If you want to download data from Redshift to a CSV file, or read it into Python:

.. code-block:: python

    import pg8000
    import locopy

    my_profile = "some_profile_with_valid_tokens"
    with locopy.Redshift(dbapi=pg8000, config_yaml="config.yml", profile=my_profile) as redshift:
        # Optionally provide export_path if you ALSO want the exported data copied to a flat file
        redshift.unload_and_copy(
            query="SELECT * FROM schema.table",
            s3_bucket="my_s3_bucket",
            export_path="my_output_destination.csv")

Note on tokens
^^^^^^^^^^^^^^

To load data to S3, you will need to be able to generate AWS tokens, or assume the IAM role on an EC2 instance. There are a few options for doing this, depending on where you're running your script and how you want to handle tokens. Once you have your tokens, they need to be accessible to the AWS command line interface. See http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html#config-settings-and-precedence for more information, but you can:

- Populate the environment variables ``AWS_ACCESS_KEY_ID``, ``AWS_SECRET_ACCESS_KEY``, etc.
- Leverage the AWS credentials file. If you have multiple profiles configured, you can either call ``locopy.Redshift(profile="my-profile")`` or set the environment variable ``AWS_DEFAULT_PROFILE``.
- If you are on an EC2 instance, you can assume the credentials associated with the attached IAM role.
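Before running a load, it can help to check which of these options appears to be configured. The following is a minimal sketch using only environment variables. ``describe_credential_source`` is a hypothetical helper, and the order it checks is a simplification: the AWS tooling applies its own precedence rules, described at the link above.

```python
import os


def describe_credential_source(env=None):
    """Hypothetical helper: report which credential option looks configured."""
    env = os.environ if env is None else env
    # Option 1: access keys exported as environment variables.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"
    # Option 2: a named profile from the AWS credentials file.
    if env.get("AWS_DEFAULT_PROFILE"):
        return "credentials file profile: " + env["AWS_DEFAULT_PROFILE"]
    # Option 3: nothing explicit; on EC2 an attached IAM role may still apply.
    return "no explicit settings found; an attached IAM role (EC2) may apply"


print(describe_credential_source({"AWS_DEFAULT_PROFILE": "my-profile"}))
```

Calling the function with no argument inspects the current process environment instead of the example dictionary.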

Advanced Usage
--------------

See the `docs <https://capitalone.github.io/locopy/>`_ for more detailed usage instructions and examples, including Snowflake.

Contributors
------------

We welcome and appreciate your contributions! Before we can accept any contributions, we ask that you please be sure to sign the `Contributor License Agreement (CLA) <https://cla-assistant.io/capitalone/locopy>`_.

This project adheres to the `Open Source Code of Conduct <https://developer.capitalone.com/resources/code-of-conduct/>`_. By participating, you are expected to honor this code.
