
ToucanToco / toucan-connectors

Licence: BSD-3-Clause
Connectors available to retrieve data in Toucan Toco small apps

Projects that are alternatives of or similar to toucan-connectors

covid-19
Data ETL & Analysis on the global and Mexican datasets of the COVID-19 pandemic.
Stars: ✭ 14 (+7.69%)
Mutual labels:  pandas
datascienv
datascienv is a package that sets up your data science environment in a single line of code, installing all dependencies; it also includes pyforest, which provides a single-line import of all the required ML libraries.
Stars: ✭ 53 (+307.69%)
Mutual labels:  pandas
pandas-workshop
An introductory workshop on pandas with notebooks and exercises for following along.
Stars: ✭ 161 (+1138.46%)
Mutual labels:  pandas
Engezny
Engezny is a Python package that quickly generates all possible charts from your dataframe and saves them for you; it currently supports only single-parameter visualizations using pie, bar and barh charts.
Stars: ✭ 25 (+92.31%)
Mutual labels:  pandas
Datscan
DatScan is an initiative to build an open-source CMS that will be able to solve any problem using data analysis, with the help of various modules and a vast standardized module library.
Stars: ✭ 13 (+0%)
Mutual labels:  pandas
Data-Science-101
Notes and tutorials on how to use python, pandas, seaborn, numpy, matplotlib, scipy for data science.
Stars: ✭ 19 (+46.15%)
Mutual labels:  pandas
whyqd
data wrangling simplicity, complete audit transparency, and at speed
Stars: ✭ 16 (+23.08%)
Mutual labels:  pandas
fal
do more with dbt. fal helps you run Python alongside dbt, so you can send Slack alerts, detect anomalies and build machine learning models.
Stars: ✭ 567 (+4261.54%)
Mutual labels:  pandas
Data-Science-Resources
A guide to getting started with Data Science and ML.
Stars: ✭ 17 (+30.77%)
Mutual labels:  pandas
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+4607.69%)
Mutual labels:  pandas
trackanimation
Track Animation is a Python 2 and 3 library that provides an easy and user-adjustable way of creating visualizations from GPS data.
Stars: ✭ 74 (+469.23%)
Mutual labels:  pandas
grailer
web scraping tool for grailed.com
Stars: ✭ 30 (+130.77%)
Mutual labels:  pandas
pandas twitter
Analyzing Trump's tweets using Python (Pandas + Twitter workshop)
Stars: ✭ 81 (+523.08%)
Mutual labels:  pandas
tempo
API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc), AS OF joins, downsampling, and interpolation
Stars: ✭ 212 (+1530.77%)
Mutual labels:  pandas
introduction to ml with python
도서 "[개정판] 파이썬 라이브러리를 활용한 머신 러닝"의 주피터 노트북과 코드입니다.
Stars: ✭ 211 (+1523.08%)
Mutual labels:  pandas
Data-Wrangling-with-Python
Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices
Stars: ✭ 90 (+592.31%)
Mutual labels:  pandas
pyjanitor
Clean APIs for data cleaning. Python implementation of the R package janitor.
Stars: ✭ 970 (+7361.54%)
Mutual labels:  pandas
cracking-the-pandas-cheat-sheet
Inflearn - cracking data analysis and visualization with just two documents.
Stars: ✭ 62 (+376.92%)
Mutual labels:  pandas
pybacen
This library was developed for economic analysis in the Brazilian context (investments, micro- and macroeconomic indicators).
Stars: ✭ 40 (+207.69%)
Mutual labels:  pandas
chatstats
💬📊 Fun data visualizations for Facebook Messenger chats
Stars: ✭ 18 (+38.46%)
Mutual labels:  pandas


Toucan Connectors

Toucan Toco data connectors are plugins to the Toucan Toco platform. Their role is to return Pandas DataFrames from many different sources.

Components Diagram

Each connector is dedicated to a single type of source (PostgreSQL, Mongo, Salesforce, etc.) and is made of two classes:

  • Connector, which contains all the information needed to use a data provider (e.g. hostname, authentication method and details).
  • DataSource, which contains all the information needed to get a dataframe (query, path, etc.) using the Connector class above.

The Toucan Toco platform instantiates these classes using values provided by Toucan admins and app designers. It then uses the following methods to get data and metadata (a minimal sketch follows the list):

  • Connector._retrieve_data, which returns a pandas.DataFrame; used to return data to a Toucan Toco end user.
  • Connector.get_slice, which returns a DataSlice; used to return data to a Toucan Toco application designer when building a query.
  • Connector.get_status, which returns a ConnectorStatus; used to inform an admin or a Toucan Toco application designer of the status of the connection to a third-party data service. Is it reachable from our servers? Are the authentication method and details working? etc.
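
For illustration, here is a minimal sketch of that flow using the MyTypeConnector defined in Step 2 below. The field names and values are illustrative, and in a real deployment the platform performs these calls, not your code:

import pandas as pd

# Hypothetical configuration values that an admin or application designer would
# normally provide through the Toucan Toco platform.
connector = MyTypeConnector(name='my-data', host='db.example.com', port=5432, database='reports')
data_source = MyTypeDataSource(domain='sales', name='my-data', query='SELECT * FROM sales')

df = connector._retrieve_data(data_source)  # pandas.DataFrame returned to end users
status = connector.get_status()             # ConnectorStatus shown to admins and designers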

Installing for development

We use poetry for packaging and development. Use the following command to install the project for development:

poetry install -E all

Dependencies

This project uses make and Python 3.8. Install the main dependencies:

pip install -e .

We are using the setuptools construct extras_require to define each connector's dependencies separately. For example, to install the MySQL connector dependencies:

pip install -e ".[mysql]"

There is a shortcut called all to install the dependencies for all the connectors. We do not recommend using it as a contributor to this package, but if you do, refer to the section below to install the necessary system packages.

pip install -e ".[all]"

You may face dependency issues when installing the repository locally. That's why a dev container is available for use with Visual Studio Code. Refer to this doc to use it.

System packages

Some connector dependencies require specific system packages. As each connector defines its dependencies separately, you do not need these packages unless you want to use the corresponding connectors.

ODBC

On Linux, you will need the unixodbc bindings to install pyodbc from the requirements. Install them with apt:

sudo apt-get update
sudo apt-get install unixodbc-dev

MSSQL

To test and use mssql (and azure_mssql), you need to install the Microsoft ODBC driver for SQL Server for Linux or macOS.

PostgreSQL

On macOS, to test the postgres connector, you need to install PostgreSQL, for instance by running brew install postgres. You can then install the library with:

env LDFLAGS='-L/usr/local/lib -L/usr/local/opt/openssl/lib -L/usr/local/opt/readline/lib' pip install psycopg2

Testing

We are using pytest and various packages of its ecosystem. To install the testing dependencies, run:

pip install -r requirements-testing.txt

As each connector is an independent plugin, its tests are written independently from the rest of the codebase. Run the tests for a specific connector (http_api in this example) like this:

pytest tests/http_api

Note: running the tests above implies that you have installed the specific dependencies of the http_api connector (using the pip install -e ".[http_api]" command).

Our CI does run all the tests for all the connectors, like this:

pip install -e ".[all]"
make test

Some connectors are tested using mocks (cf. trello), while others are tested by making calls to data providers (cf. elasticsearch) running on the system in Docker containers. The required images are listed in the tests/docker-compose.yml file; they need to be pulled (cf. pytest --pull) to run the relevant tests.

Contributing

This is an open source repository under the BSD 3-Clause Licence. The Toucan Toco tech team maintains this repository, and we welcome contributions.

At the moment, the main use of this code is its integration into Toucan Toco's commercially licenced software; as a result, our development and maintenance efforts here are mostly driven by Toucan Toco's internal priorities.

The starting point of a contribution should be an Issue, either one you create or an existing one. This allows us (maintainers) to discuss the contribution before it is produced and avoids back and forth in reviews or stalled pull requests.

Step 1: Generate base classes and tests files

To generate the connector and test modules from boilerplate, run:

make new_connector type=mytype

mytype should be the name of a system we would like to build a connector for, such as MySQL, Hive or Magento.

Open the new connector's folder in tests. You can start writing your tests before implementing the connector.

Some connectors are tested with calls to the actual data systems that they target, for example elasticsearch, mongo, mssql.

Others are tested with mocks of the classes or functions returning data that you are wrapping (see HttpAPI or microstrategy).
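
As an illustration only, a mocked test for the MyTypeConnector sketched in Step 2 could look roughly like this. The pymytype client library, its query method and the patched path are made up, and it assumes the pytest-mock plugin (providing the mocker fixture) is available in the test environment:

import pandas as pd

from toucan_connectors.mytype.mytype_connector import MyTypeConnector, MyTypeDataSource


def test_mytype_retrieve_data(mocker):
    # Replace the (hypothetical) client library used by the connector with a mock
    # returning canned rows, so the test never reaches a real MyType server.
    mocked_client = mocker.patch('toucan_connectors.mytype.mytype_connector.pymytype')
    mocked_client.connect.return_value.query.return_value = [{'a': 1}, {'a': 2}]

    connector = MyTypeConnector(name='test', host='localhost', port=1234, database='db')
    data_source = MyTypeDataSource(domain='test', name='test', query='SELECT a FROM t')

    df = connector._retrieve_data(data_source)
    assert isinstance(df, pd.DataFrame)
    assert len(df) == 2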

If you have a container for your target system, add its image as a service in the docker-compose.yml, then use the pytest fixture service_container to automatically start the container and shut it down for you when you run the tests (see the sketch below).

The fixture will not pull the image for you on each test run; you need to pull the image on your machine (at least once) using the pytest --pull option.
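
For reference, a fixture using service_container could be sketched as follows. The exact signature of service_container is defined in the test suite's conftest, so the arguments shown here are assumptions; check an existing connector's tests (e.g. elasticsearch) for the real ones:

import socket

import pytest


@pytest.fixture(scope='module')
def mytype_server(service_container):
    def check(host_port):
        # Hypothetical readiness probe: raises until the service accepts TCP connections.
        with socket.create_connection(('127.0.0.1', host_port), timeout=1):
            pass

    # 'mytype' is assumed to match the service name declared in tests/docker-compose.yml.
    return service_container('mytype', check)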

Step 2: New connector

Open the folder mytype in toucan_connectors for your new connector and create your classes.

import pandas as pd

# Careful: import ToucanConnector from its module (the deep path), not from the package __init__.
from toucan_connectors.toucan_connector import ToucanConnector, ToucanDataSource


class MyTypeDataSource(ToucanDataSource):
    """Model of my datasource"""
    query: str


class MyTypeConnector(ToucanConnector):
    """Model of my connector"""
    data_source_model: MyTypeDataSource

    host: str
    port: int
    database: str

    def _retrieve_data(self, data_source: MyTypeDataSource) -> pd.DataFrame:
        ...

    # Signature abridged here; DataSlice and ConnectorStatus come from the
    # toucan_connectors package (check the codebase for their exact import paths).
    def get_slice(self, data_source: MyTypeDataSource, **kwargs) -> DataSlice:
        ...

    def get_status(self) -> ConnectorStatus:
        ...

Step 3: Register your connector, add documentation

Add your connector in toucan_connectors/__init__.py. The key is what we call the type of the connector; it is an id used to retrieve the connector when it is used in the Toucan Toco platform.

CONNECTORS_CATALOGUE = {
  ...,
  'MyType': 'mytype.mytype_connector.MyTypeConnector',
  ...
}

Add your connector's requirements to setup.py in the extras_require dictionary:

extras_require = {
    ...
    'mytype': ['my_dependency_pkg1==x.x.x', 'my_dependency_pkg2>=x.x.x']
}

If you need to add testing dependencies, add them to the requirements-testing.txt file.

You can now generate and edit the documentation page for your connector:

# Example: PYTHONPATH=. python doc/generate.py github > doc/connectors/github.md
PYTHONPATH=. python doc/generate.py myconnectormodule > doc/connectors/mytypeconnector.md

Step 4: Create a pull request

Make sure your new code is properly formatted by running make lint. If it's not, please use make format. You can now create a pull request.

Publish

Install the wheel package:

pip install wheel

To publish the toucan-connectors package on pypi, use:

make build
make upload