datajoint / datajoint-python

License: LGPL-2.1
Relational data pipelines for the science lab


Welcome to DataJoint for Python!

DataJoint for Python is a framework for scientific workflow management based on relational principles. DataJoint is built on the foundation of the relational data model and prescribes a consistent method for organizing, populating, computing, and querying data.

DataJoint was initially developed in 2009 by Dimitri Yatsenko in Andreas Tolias' Lab at Baylor College of Medicine for the distributed processing and management of large volumes of data streaming from regular experiments. Starting in 2011, DataJoint has been available as an open-source project adopted by other labs and improved through contributions from several developers. Presently, the primary developer of DataJoint open-source software is the company DataJoint (https://datajoint.com). Related resources are listed at https://datajoint.org.

Installation

pip3 install datajoint

If you already have an older version of DataJoint installed using pip, upgrade with

pip3 install --upgrade datajoint

Documentation and Tutorials

Citation

  • If your work uses DataJoint for Python, please cite the following Research Resource Identifier (RRID) and manuscript.

  • DataJoint (RRID:SCR_014543) - DataJoint for Python (version <Enter version number>)

  • Yatsenko D, Reimer J, Ecker AS, Walker EY, Sinz F, Berens P, Hoenselaar A, Cotton RJ, Siapas AS, Tolias AS. DataJoint: managing big scientific data using MATLAB or Python. bioRxiv. 2015 Jan 1:031658. doi: https://doi.org/10.1101/031658

Python Native Blobs


DataJoint 0.12 adds full support for all native Python data types in blobs: tuples, lists, sets, dicts, strings, bytes, None, and all their recursive combinations. The new blobs are a superset of the old functionality and are fully backward compatible. In previous versions, only MATLAB-style numerical arrays were fully supported; some Python data types, such as dicts, were coerced into numpy recarrays and then fetched as such.

However, since some Python types were coerced into MATLAB types, old blobs and new blobs may now be fetched as different types of objects even if they were inserted the same way. For example, new dict objects will be returned as dict, while the same objects inserted with DataJoint 0.11 will be fetched as recarrays.
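A standalone sketch of this difference, using plain numpy rather than DataJoint itself (the record and field names are illustrative):

```python
import numpy as np

# Pre-0.12 behavior, sketched: a dict was coerced into a numpy
# structured array before storage, so it fetched back as an array.
record = {"subject_id": 1, "score": 0.95}
squashed = np.array(
    [(record["subject_id"], record["score"])],
    dtype=[("subject_id", "int64"), ("score", "float64")],
)
print(type(squashed))        # <class 'numpy.ndarray'>
print(squashed["score"][0])  # 0.95

# 0.12+ behavior with enable_python_native_blobs, sketched:
# the dict round-trips as a dict.
fetched = dict(record)
print(type(fetched))         # <class 'dict'>
```

Code written against the old behavior that indexes into a structured array will therefore break when it starts receiving plain dicts, which is why the flag is opt-in.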

Since this is a big change, we chose to temporarily disable this feature by default in DataJoint for Python 0.12.x, allowing users to adjust their code if necessary. From 0.13.x, the flag will default to True (on), and it will ultimately be removed once corresponding decode support for the new format is added to datajoint-matlab (see: datajoint-matlab #222, datajoint-python #765).

The flag is configured by setting the enable_python_native_blobs flag in dj.config.

import datajoint as dj
dj.config["enable_python_native_blobs"] = True

You can safely enable this setting if both of the following are true:

  • The only kinds of blobs your pipeline has previously inserted are numerical arrays.
  • You do not need to share blob data between Python and MATLAB.

Otherwise, read the following explanation.

DataJoint v0.12 expands DataJoint's blob serialization mechanism with improved support for complex native Python data types, such as dictionaries and lists of strings.

Prior to DataJoint v0.12, certain Python native data types, such as dictionaries, were 'squashed' into numpy structured arrays when saved into blob attributes. This facilitated easier data sharing between MATLAB and Python for certain record types. However, it created a discrepancy between insert and fetch data types, which could cause problems in other portions of users' pipelines.

DataJoint v0.12 removes the squashing behavior and instead encodes native Python data types in blobs directly. However, this change creates a compatibility problem for pipelines that previously relied on the squashing behavior: records saved via the old squashed format will continue to fetch as structured arrays, whereas new records inserted in DataJoint 0.12 with enable_python_native_blobs will be returned as the appropriate native Python type (dict, etc.).
Furthermore, DataJoint for MATLAB does not yet support unpacking native Python data types.

With dj.config["enable_python_native_blobs"] set to False, any attempt to insert a data type other than a numpy array will raise an exception. This is meant to prompt users to read this message and to allow proper testing and safe migration of pre-0.12 pipelines to 0.12.

The exact process to update a specific pipeline will vary depending on the situation, but generally the following strategies may apply:

  • Altering code to directly store numpy structured arrays or plain multidimensional arrays. This strategy is likely the best one for tables requiring compatibility with MATLAB.
  • Adjusting code to handle both structured-array and native fetched data for tables populated with dicts in blobs in pre-0.12 versions. In this case, insert logic is not adjusted, but downstream consumers are adjusted to handle records saved under both the old and new schemes.
  • Migrating data into a fresh schema: fetching the old data, converting blobs to a uniform data type, and re-inserting.
  • Dropping and recomputing imported/computed tables to ensure they are stored in the new format.
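The second strategy above, handling both formats downstream, can be sketched as a small normalization helper (the function and field names are illustrative, not part of the DataJoint API):

```python
import numpy as np

def as_dict(blob):
    """Normalize a fetched blob to a plain dict.

    Accepts both the pre-0.12 'squashed' numpy structured-array
    format and the 0.12+ native-dict format.
    """
    if isinstance(blob, dict):
        # New native format: already a dict.
        return blob
    if isinstance(blob, np.ndarray) and blob.dtype.names:
        # Old squashed format: a single record with named fields.
        return {name: blob[name][0] for name in blob.dtype.names}
    raise TypeError(f"unexpected blob type: {type(blob)}")

old_blob = np.array([(1, 0.95)],
                    dtype=[("subject_id", "i8"), ("score", "f8")])
new_blob = {"subject_id": 1, "score": 0.95}
assert as_dict(old_blob)["subject_id"] == 1
assert as_dict(new_blob) == new_blob
```

Routing every fetch through a helper like this lets old and new records coexist in the same table while the pipeline is migrated.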

As always, be sure that your data is safely backed up before modifying any important DataJoint schema or records.

API docs

The API documentation can be built using sphinx by running

pip install sphinx sphinx_rtd_theme 
(cd docs-api/sphinx && make html)

Generated docs are written to docs-api/docs/html/index.html. More details in docs-api/README.md.

Running Tests Locally

  • Create an .env with desired development environment values e.g.
PY_VER=3.7
ALPINE_VER=3.10
MYSQL_VER=5.7
MINIO_VER=RELEASE.2021-09-03T03-56-13Z
HOST_UID=1000
HOST_GID=1000
  • cp local-docker-compose.yml docker-compose.yml
  • docker-compose up -d (note the configured JUPYTER_PASSWORD)
  • Select a means of running Tests e.g. Docker Terminal, or Local Terminal (see bottom)
  • Add entry in /etc/hosts for 127.0.0.1 fakeservices.datajoint.io
  • Run desired tests. Some examples are as follows:
Use Case                     Shell Code
Run all tests                nosetests -vsw tests --with-coverage --cover-package=datajoint
Run one specific class test  nosetests -vs --tests=tests.test_fetch:TestFetch.test_getattribute_for_fetch1
Run one specific basic test  nosetests -vs --tests=tests.test_external_class:test_insert_and_fetch

Launch Docker Terminal

  • Shell into datajoint-python_app_1 i.e. docker exec -it datajoint-python_app_1 sh

Launch Local Terminal

  • See datajoint-python_app environment variables in local-docker-compose.yml
  • Launch local terminal
  • export environment variables in shell
  • Add entry in /etc/hosts for 127.0.0.1 fakeservices.datajoint.io

Launch Jupyter Notebook for Interactive Use

  • Navigate to localhost:8888
  • Input Jupyter password
  • Launch a notebook i.e. New > Python 3