
shaypal5 / S3bp

License: MIT
Read and write Python objects to S3, caching them on your hard drive to avoid unnecessary IO.


Projects that are alternatives to or similar to S3bp

astro
Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Stars: ✭ 79 (+229.17%)
Mutual labels:  s3, pandas
Data Science Hacks
Data Science Hacks consists of tips and tricks to help you become a better data scientist. The hacks are for everyone - beginner to advanced - and cover Python, Jupyter Notebook, pandas and more.
Stars: ✭ 273 (+1037.5%)
Mutual labels:  pandas, pandas-dataframe
Algorithmic-Trading
I have been deeply interested in algorithmic trading and systematic trading algorithms. This repository contains the code of what I have learnt along the way. It starts from some basic statistics and leads up to complex machine learning algorithms.
Stars: ✭ 47 (+95.83%)
Mutual labels:  pandas-dataframe, pandas
Sidetable
sidetable builds simple but useful summary tables of your data
Stars: ✭ 217 (+804.17%)
Mutual labels:  pandas, pandas-dataframe
Pandera
A light-weight, flexible, and expressive pandas data validation library
Stars: ✭ 506 (+2008.33%)
Mutual labels:  pandas, pandas-dataframe
Locopy
locopy: Loading/Unloading to Redshift and Snowflake using Python.
Stars: ✭ 73 (+204.17%)
Mutual labels:  s3, pandas
data-analysis-using-python
Data Analysis Using Python: A Beginner’s Guide Featuring NYC Open Data
Stars: ✭ 81 (+237.5%)
Mutual labels:  pandas-dataframe, pandas
Rightmove webscraper.py
Python class to scrape data from rightmove.co.uk and return listings in a pandas DataFrame object
Stars: ✭ 125 (+420.83%)
Mutual labels:  pandas, pandas-dataframe
Dataframe Go
DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration
Stars: ✭ 487 (+1929.17%)
Mutual labels:  pandas, pandas-dataframe
Pytablewriter
pytablewriter is a Python library to write a table in various formats: CSV / Elasticsearch / HTML / JavaScript / JSON / LaTeX / LDJSON / LTSV / Markdown / MediaWiki / NumPy / Excel / Pandas / Python / reStructuredText / SQLite / TOML / TSV.
Stars: ✭ 422 (+1658.33%)
Mutual labels:  pandas, pandas-dataframe
Data Science Projects With Python
A Case Study Approach to Successful Data Science Projects Using Python, Pandas, and Scikit-Learn
Stars: ✭ 198 (+725%)
Mutual labels:  pandas, pandas-dataframe
Just Pandas Things
An ongoing list of pandas quirks
Stars: ✭ 660 (+2650%)
Mutual labels:  pandas, pandas-dataframe
Py
Repository to store sample python programs for python learning
Stars: ✭ 4,154 (+17208.33%)
Mutual labels:  pandas, pandas-dataframe
cracking-the-pandas-cheat-sheet
Inflearn - Cracking data analysis and visualization with just two documents
Stars: ✭ 62 (+158.33%)
Mutual labels:  pandas-dataframe, pandas
Swifter
A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
Stars: ✭ 1,844 (+7583.33%)
Mutual labels:  pandas, pandas-dataframe
skippa
SciKIt-learn Pipeline in PAndas
Stars: ✭ 33 (+37.5%)
Mutual labels:  pandas-dataframe, pandas
Df2gspread
Manage Google Spreadsheets in Pandas DataFrame with Python
Stars: ✭ 114 (+375%)
Mutual labels:  pandas, pandas-dataframe
Gspread Dataframe
Read/write Google spreadsheets using pandas DataFrames
Stars: ✭ 118 (+391.67%)
Mutual labels:  pandas, pandas-dataframe
Prettypandas
A Pandas Styler class for making beautiful tables
Stars: ✭ 376 (+1466.67%)
Mutual labels:  pandas, pandas-dataframe
Pdpipe
Easy pipelines for pandas DataFrames.
Stars: ✭ 590 (+2358.33%)
Mutual labels:  pandas, pandas-dataframe

s3bp = S3-backed Python (objects)
=================================

Read and write Python objects from/to S3, caching them on your hard drive to avoid unnecessary IO. Special care is given to pandas DataFrames.

.. code-block:: python

    import s3bp
    s3bp.save_object(name_to_id_dict, filepath, 'user-data-bucket')
    last_week_dataset = s3bp.load_object(second_filepath, 'my-dataset-s3-bucket')

Dependencies and Setup
----------------------

s3bp uses the following packages:

* boto3
* botocore (installed with boto3)
* dateutil (a.k.a. python-dateutil)
* pyyaml
* pandas
* feather-format

The boto3 package itself requires that you have an AWS config file at ``~/.aws/config`` with your AWS account credentials to successfully communicate with AWS. `Read here`_ on how you can configure it.
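
For reference, a minimal ``~/.aws/config`` file looks something like the following (the region and keys are placeholders; credentials may instead live in a companion ``~/.aws/credentials`` file):

.. code-block:: ini

    [default]
    region = us-east-1
    aws_access_key_id = YOUR_ACCESS_KEY_ID
    aws_secret_access_key = YOUR_SECRET_ACCESS_KEY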

You can install s3bp using:

.. code-block:: bash

    pip install s3bp

Use
---

Saving
~~~~~~

Save an object to your bucket with:

.. code-block:: python

    import s3bp
    name_to_id_dict = {'Dan': 8382, 'Alon': 2993}
    s3bp.save_object(name_to_id_dict, '~/Documents/data_files/name_to_id_map', 'user-data-bucket')

By default, file upload is done asynchronously in the background, with exceptions printed rather than raised. If you'd like to wait on your upload, and/or have a failed upload raise an exception rather than print one, set ``wait=True``:

.. code-block:: python

    s3bp.save_object(name_to_id_dict, '~/Documents/data_files/name_to_id_map', 'user-data-bucket', wait=True)
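
With ``wait=True``, a failed upload surfaces as a regular exception you can handle explicitly. A minimal sketch - the ``botocore`` exception shown is just one plausible failure mode:

.. code-block:: python

    import botocore.exceptions

    try:
        s3bp.save_object(name_to_id_dict, '~/Documents/data_files/name_to_id_map',
                         'user-data-bucket', wait=True)
    except botocore.exceptions.ClientError as err:
        # e.g. a missing bucket or insufficient permissions
        print('Upload failed: {}'.format(err))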

Loading
~~~~~~~

Load an object from your bucket with:

.. code-block:: python

    name_to_id_dict = s3bp.load_object('~/Documents/data_files/name_to_id_map', 'user-data-bucket')

Note that if the most up-to-date version is already on your hard drive, it will be loaded from disk. If, however, a newer version is found on S3 (determined by comparing modification times), or if the file is not present locally, it will be downloaded from S3. Furthermore, any missing directories on the path will be created.
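
Conceptually, the caching decision works something like the sketch below; this only illustrates the behaviour described above (the function name and structure are made up), and is not s3bp's actual implementation:

.. code-block:: python

    import os
    import boto3

    def cached_fetch(filepath, bucket_name, key):
        # Hypothetical helper: compare the local file's modification time
        # with the S3 object's, downloading only when the local copy is stale.
        obj = boto3.resource('s3').Object(bucket_name, key)
        if os.path.isfile(filepath):
            if os.path.getmtime(filepath) >= obj.last_modified.timestamp():
                return filepath  # local copy is fresh; no download needed
        dirpath = os.path.dirname(filepath)
        if dirpath:
            os.makedirs(dirpath, exist_ok=True)  # create missing directories
        obj.download_file(filepath)
        return filepath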

Serialization Format
~~~~~~~~~~~~~~~~~~~~

Objects are saved as Python pickle files by default. You can change the way objects are serialized by providing a different serializer when calling ``save_object``. A serializer is a callable that takes two positional arguments - a Python object and a path to a file - and dumps the object to the given file. It doesn't have to serialize all Python objects successfully.

For example:

.. code-block:: python

    import pandas as pd

    def pandas_df_csv_serializer(pyobject, filepath):
        # dump the DataFrame to the given path as CSV
        pyobject.to_csv(filepath)

    df1 = pd.DataFrame(data=[[1, 3], [6, 2]], columns=['A', 'B'], index=[1, 2])
    s3bp.save_object(df1, '~/Documents/data_files/my_frame.csv', 'user-data-bucket',
                     serializer=pandas_df_csv_serializer)

Notice that a corresponding deserializer will have to be provided when loading the object, by passing a deserializing callable to ``load_object`` through the ``deserializer`` keyword argument.
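
For example, a deserializer matching the CSV serializer above might look like this, assuming a deserializer is a callable that takes a file path and returns the loaded object:

.. code-block:: python

    import pandas as pd

    def pandas_df_csv_deserializer(filepath):
        # read the CSV back, restoring the index column written by to_csv()
        return pd.read_csv(filepath, index_col=0)

    df1 = s3bp.load_object('~/Documents/data_files/my_frame.csv', 'user-data-bucket',
                           deserializer=pandas_df_csv_deserializer)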

Default Bucket
~~~~~~~~~~~~~~
You can set a default bucket with:

.. code-block:: python

    s3bp.set_default_bucket('user-data-bucket')

You can now load and save objects without specifying a bucket, in which case the default bucket will be used:

.. code-block:: python

    profile_dict = s3bp.load_object('~/Documents/data_files/profile_map')

Once set, your configuration will persist across sessions. If you'd like to unset the default bucket - making operations with no bucket specification fail - use ``s3bp.unset_default_bucket()``.
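
Saving works the same way once a default bucket is set; for example, writing the object loaded above back without naming a bucket:

.. code-block:: python

    s3bp.save_object(profile_dict, '~/Documents/data_files/profile_map')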

Base Directories
~~~~~~~~~~~~~~~~
You can set a specific directory as a base directory, mapping it to a specific bucket, using:

.. code-block:: python

    s3bp.map_base_directory_to_bucket('~/Desktop/labels', 'my-labels-s3-bucket')

Now, saving or loading objects from files in that directory - including sub-directories - will automatically use the mapped bucket, unless a different bucket is given explicitly. Furthermore, files uploaded to the bucket will be keyed not by their file name alone, but by their sub-path rooted at the given base directory.

This effectively replicates the directory tree rooted at this directory on the bucket. For example, given the above mapping, saving an object to the path ``~/Desktop/labels/user_generated/skunks.csv`` will create a ``labels`` folder on ``my-labels-s3-bucket``, a ``user_generated`` folder inside it, and upload the file into ``labels/user_generated``.
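
In code, that example might look like the following; no bucket argument is given, since it is inferred from the base-directory mapping (the ``pandas_df_csv_serializer`` defined above is reused, and the DataFrame contents are made up for illustration):

.. code-block:: python

    import pandas as pd

    skunks_df = pd.DataFrame(data=[['stripey', 4], ['spotty', 3]],
                             columns=['name', 'rating'])
    # no bucket specified - '~/Desktop/labels' is mapped to 'my-labels-s3-bucket'
    s3bp.save_object(skunks_df, '~/Desktop/labels/user_generated/skunks.csv',
                     serializer=pandas_df_csv_serializer)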

**You can add as many base directories as you want**, and can map several to the same bucket, or each to a different one.

This can be used both to automatically back up entire folders (and their sub-folder structure) to S3, and to synchronize such folders across different machines that read and write DataFrames to them at different times.


Pandas love <3
--------------

Special care is given to pandas DataFrame objects, for which a couple of dedicated wrapper methods and several serializers are already defined. To save a dataframe, use:

.. code-block:: python

    import s3bp
    import pandas as pd
    df1 = pd.DataFrame(data=[[1, 3], [6, 2]], columns=['A', 'B'], index=[1, 2])
    s3bp.save_dataframe(df1, '~/Desktop/datasets/weasels.csv', 'my-datasets-s3-bucket')

This will use the default CSV serializer to save the dataframe to disk.
Similarly, you can load a dataframe from your bucket with:

.. code-block:: python

    df1 = s3bp.load_dataframe('~/Desktop/datasets/weasels.csv', 'my-datasets-s3-bucket')

To use another format, assign the corresponding string to the ``format`` keyword:

.. code-block:: python

    s3bp.save_dataframe(df1, '~/Desktop/datasets/weasels.csv', 'my-datasets-s3-bucket', format='feather')
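
Assuming ``load_dataframe`` accepts the same ``format`` keyword - an assumption made here by symmetry with ``save_dataframe``, so consult the package's own docs - loading the Feather-serialized file back would look like:

.. code-block:: python

    # format='feather' is assumed here to work symmetrically on load
    df1 = s3bp.load_dataframe('~/Desktop/datasets/weasels.csv', 'my-datasets-s3-bucket',
                              format='feather')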

Supported pandas DataFrame serialization formats:

* CSV
* Excel
* Feather (see `the feather package`_)

.. links:
.. _the feather package: https://github.com/wesm/feather
.. _Read here: http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html