All Projects → aiguofer → Gspread Pandas

aiguofer / Gspread Pandas

Licence: bsd-3-clause
A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Gspread Pandas

Datasheets
Read data from, write data to, and modify the formatting of Google Sheets
Stars: ✭ 593 (+162.39%)
Mutual labels:  google, data-science, pandas, data-analytics, data
Data Science Hacks
Data Science Hacks consists of tips, tricks to help you become a better data scientist. Data science hacks are for all - beginner to advanced. Data science hacks consist of python, jupyter notebook, pandas hacks and so on.
Stars: ✭ 273 (+20.8%)
Mutual labels:  data-science, pandas, data
Data Science Resources
👨🏽‍🏫You can learn about what data science is and why it's important in today's modern world. Are you interested in data science?🔋
Stars: ✭ 171 (-24.34%)
Mutual labels:  data-science, data-analytics, data
Pyjanitor
Clean APIs for data cleaning. Python implementation of R package Janitor
Stars: ✭ 647 (+186.28%)
Mutual labels:  pandas, data-engineering, data
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+2076.55%)
Mutual labels:  data-science, data, data-engineering
Just Dashboard
📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+568.58%)
Mutual labels:  data-science, data-engineering, data
Pdpipe
Easy pipelines for pandas DataFrames.
Stars: ✭ 590 (+161.06%)
Mutual labels:  data-science, pandas, data
Danfojs
danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.
Stars: ✭ 1,304 (+476.99%)
Mutual labels:  data-science, pandas, data-analytics
Superset
Apache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+18764.6%)
Mutual labels:  data-science, data-engineering, data-analytics
Awesome Bigdata
A curated list of awesome big data frameworks, ressources and other awesomeness.
Stars: ✭ 10,478 (+4536.28%)
Mutual labels:  data-science, data-analytics, data
Aws Data Wrangler
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+955.31%)
Mutual labels:  data-science, pandas, data-engineering
D6t Python
Accelerate data science
Stars: ✭ 118 (-47.79%)
Mutual labels:  data-science, pandas, data-engineering
Datacompy
Pandas and Spark DataFrame comparison for humans
Stars: ✭ 147 (-34.96%)
Mutual labels:  data-science, pandas, data
Auptimizer
An automatic ML model optimization tool.
Stars: ✭ 166 (-26.55%)
Mutual labels:  data-science, data-engineering
Exportsheetdata
Add-on for Google Sheets that allows sheets to be exported as JSON or XML.
Stars: ✭ 170 (-24.78%)
Mutual labels:  google-sheets, data
Pandas Datareader
Extract data from a wide range of Internet sources into a pandas DataFrame.
Stars: ✭ 2,183 (+865.93%)
Mutual labels:  pandas, data
Soda Sql
Metric collection, data testing and monitoring for SQL accessible data
Stars: ✭ 173 (-23.45%)
Mutual labels:  data-science, data-engineering
Andrew Ng Notes
This is Andrew NG Coursera Handwritten Notes.
Stars: ✭ 180 (-20.35%)
Mutual labels:  data-science, pandas
California Coronavirus Data
The Los Angeles Times' independent tally of coronavirus cases in California.
Stars: ✭ 188 (-16.81%)
Mutual labels:  pandas, data
Free Ai Resources
🚀 FREE AI Resources - 🎓 Courses, 👷 Jobs, 📝 Blogs, 🔬 AI Research, and many more - for everyone!
Stars: ✭ 192 (-15.04%)
Mutual labels:  data-science, data

=============== Getting Started

.. image:: https://img.shields.io/pypi/v/gspread-pandas.svg :target: https://pypi.python.org/pypi/gspread-pandas :alt: PyPI Version

.. image:: https://img.shields.io/travis/aiguofer/gspread-pandas.svg :target: https://travis-ci.org/aiguofer/gspread-pandas :alt: Travis-CI Build Status

.. image:: https://readthedocs.org/projects/gspread-pandas/badge/?version=latest :target: https://gspread-pandas.readthedocs.io/en/latest/?badge=latest :alt: Documentation Status

author: Diego Fernandez

Links:

  • Documentation <http://gspread-pandas.readthedocs.io/>_
  • Source code <https://github.com/aiguofer/gspread-pandas>_
  • Short video tutorial <https://youtu.be/2yIcNYzfzPw>_

.. attention:: Upgrading from < 2.0

If you are upgrading, the ``user`` is now an optional param that
uses ``default`` as the default. If you're a single user, you might
want to re-name your credentials to ``default`` so you can stop
specifying it:

.. code-block:: console

    mv ~/.config/gspread_pandas/creds{<old_name>,default}

Overview

A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames. It enables you to easily pull data from Google spreadsheets into DataFrames as well as push data into spreadsheets from DataFrames. It leverages gspread <https://github.com/burnash/gspread/>__ in the backend for most of the heavylifting, but it has a lot of added functionality to handle things specific to working with DataFrames as well as some extra nice to have features.

The target audience are Data Analysts and Data Scientists, but it can also be used by Data Engineers or anyone trying to automate workflows with Google Sheets and Pandas.

Some key goals/features:

  • Be easy to use interactively, with good docstrings and auto-completion
  • Nicely handle headers and indexes (including multi-level headers and merged cells)
  • Run on Jupyter, headless server, and/or scripts
  • Allow storing different user credentials or using Service Accounts
  • Automatically handle token refreshes
  • Enable handling of frozen rows and columns
  • Enable filling in all merged cells when pulling data
  • Nicely handle large data sets and auto-retries
  • Enable creation of filters
  • Handle retries when exceeding 100 second user quota
  • When pushing DataFrames with MultiIndex columns, allow merging or flattening headers
  • Ability to handle Spreadsheet permissions
  • Ability to specify ValueInputOption and ValueRenderOption for specific columns

Installation / Usage

To install use pip:

.. code-block:: console

$ pip install gspread-pandas

Or clone the repo:

.. code-block:: console

$ git clone https://github.com/aiguofer/gspread-pandas.git
$ python setup.py install

Before using, you will need to download Google client credentials for your app.

Client Credentials

To allow a script to use Google Drive API we need to authenticate our self towards Google. To do so, we need to create a project, describing the tool and generate credentials. Please use your web browser and go to Google console <https://console.developers.google.com/>__ and :

  • Choose Create Project in popup menu on the top.

  • A dialog box appears, so give your project a name and click on Create button.

  • On the left-side menu click on API Manager.

  • A table of available APIs is shown. Switch Drive API and click on Enable API button. Do the same for Sheets API. Other APIs might be switched off, for our purpose.

  • On the left-side menu click on Credentials.

  • In section OAuth consent screen select your email address and give your product a name. Then click on Save button.

  • In section Credentials click on Add credentials and switch OAuth client ID (if you want to use your own account or enable the use of multiple accounts) or Service account key (if you prefer to have a service account interacting with spreadsheets).

  • If you select OAuth client ID:

    • Select Application type item as Other and give it a name.
    • Click on Create button.
    • Click on Download JSON icon on the right side of created OAuth client IDs and store the downloaded file on your file system.
  • If you select Service account key

    • Click on Service account dropdown and select New service account
    • Give it a Service account name and ignore the Role dropdown (unless you know you need this for something else, it's not necessary for working with spreadsheets)
    • Note the Service account ID as you might need to give that user permission to interact with your spreadsheets
    • Leave Key type as JSON
    • Click Create and store the downloaded file on your file system.
  • Please be aware, the file contains your private credentials, so take care of the file in the same way you care of your private SSH key; Move the downloaded JSON to ~/.config/gspread_pandas/google_secret.json (or you can configure the directory and file name by directly calling gspread_pandas.conf.get_config

Thanks to similar project df2gspread <https://github.com/maybelinot/df2gspread>__ for this great description of how to get the client credentials.

You can read more about it in the configuration docs <https://gspread-pandas.readthedocs.io/en/latest/configuration.html>__ including how to change the default behavior.

Example

.. code:: python

from __future__ import print_function
import pandas as pd
from gspread_pandas import Spread, Client

file_name = "http://stats.idre.ucla.edu/stat/data/binary.csv"
df = pd.read_csv(file_name)

# 'Example Spreadsheet' needs to already exist and your user must have access to it
spread = Spread('Example Spreadsheet')
# This will ask to authenticate if you haven't done so before

# Display available worksheets
spread.sheets

# Save DataFrame to worksheet 'New Test Sheet', create it first if it doesn't exist
spread.df_to_sheet(df, index=False, sheet='New Test Sheet', start='A2', replace=True)
spread.update_cells('A1', 'A1', ['Created by:', spread.email])
print(spread)
# <gspread_pandas.client.Spread - User: '<example_user>@gmail.com', Spread: 'Example Spreadsheet', Sheet: 'New Test Sheet'>

# You can now first instanciate a Client separately and query folders and
# instanciate other Spread objects by passing in the Client
client = Client()
# Assumming you have a dir called 'example dir' with sheets in it
available_sheets = client.find_spreadsheet_files_in_folders('example dir')
spreads = []
for sheet in available_sheets.get('example dir', []):
    spreads.append(Spread(sheet['id'], client=client))

Troubleshooting

SSL Error

If you're getting an SSL related error or can't seem to be able to open existing spreadsheets that you have access to, you might be running into an issue caused by certifi. This has mainly been experienced on RHEL and CentOS running Python 2.7. You can read more about it in issue 223 <https://github.com/burnash/gspread/issues/223>_ and issue 354 <https://github.com/burnash/gspread/issues/354>_ but, in short, the solution is to either install a specific version of certifi that works for you, or remove it altogether.

.. code-block:: console

pip install certifi==2015.4.28

or

.. code-block:: console

pip uninstall certifi

EOFError in Rodeo

If you're trying to use gspread_pandas from within Rodeo <https://www.yhat.com/products/rodeo>_ you might get an EOFError: EOF when reading a line error when trying to pass in the verification code. The workaround for this is to first verify your account in a regular shell. Since you're just doing this to get your Oauth token, the spreadsheet doesn't need to be valid. Just run this in shell:

.. code:: python

python -c "from gspread_pandas import Spread; Spread('<user_key>','')"

Then follow the instructions to create and store the OAuth creds.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].