
google / Tensorflow Recorder

Licence: apache-2.0
TFRecorder makes it easy to create TensorFlow records (TFRecords) from Pandas DataFrames and CSV files containing images or structured data.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Tensorflow Recorder

Serverless Image Processor
AWS Lambda image processor
Stars: ✭ 106 (-10.92%)
Mutual labels:  image-processing
Pyrate
A Python tool for estimating velocity and time-series from Interferometric Synthetic Aperture Radar (InSAR) data.
Stars: ✭ 110 (-7.56%)
Mutual labels:  image-processing
Autoannotationtool
A label tool aim to reduce semantic segmentation label time, rectangle and polygon annotation is supported
Stars: ✭ 113 (-5.04%)
Mutual labels:  image-processing
Sod
An Embedded Computer Vision & Machine Learning Library (CPU Optimized & IoT Capable)
Stars: ✭ 1,460 (+1126.89%)
Mutual labels:  image-processing
Deltacv
An open-source high performance library for image processing
Stars: ✭ 110 (-7.56%)
Mutual labels:  image-processing
Ios Rubik Solver
An iOS app that detects a 3x3 Rubik's cube, recognizes the color of all cubies, solves it and provides a 3D visualisation of the solving process.
Stars: ✭ 111 (-6.72%)
Mutual labels:  image-processing
Uav Mapper
UAV-Mapper is a lightweight UAV Image Processing System, Visual SFM reconstruction or Aerial Triangulation, Fast Ortho-Mosaic, Plannar Mosaic, Fast Digital Surface Map (DSM) and 3d reconstruction for UAVs.
Stars: ✭ 106 (-10.92%)
Mutual labels:  image-processing
Overmix
Automatic anime screenshot stitching in high quality
Stars: ✭ 114 (-4.2%)
Mutual labels:  image-processing
Intro To Cv Ud810
Problem Set solutions for the "Introduction to Computer Vision (ud810)" MOOC from Udacity
Stars: ✭ 110 (-7.56%)
Mutual labels:  image-processing
Mindboggle
Automated anatomical brain label/shape analysis software (+ website)
Stars: ✭ 112 (-5.88%)
Mutual labels:  image-processing
Neural Doodle
Turn your two-bit doodles into fine artworks with deep neural networks, generate seamless textures from photos, transfer style from one image to another, perform example-based upscaling, but wait... there's more! (An implementation of Semantic Style Transfer.)
Stars: ✭ 9,680 (+8034.45%)
Mutual labels:  image-processing
Nvidia Gpu Tensor Core Accelerator Pytorch Opencv
A complete machine vision container that includes Jupyter notebooks with built-in code hinting, Anaconda, CUDA-X, TensorRT inference accelerator for Tensor cores, CuPy (GPU drop in replacement for Numpy), PyTorch, TF2, Tensorboard, and OpenCV for accelerated workloads on NVIDIA Tensor cores and GPUs.
Stars: ✭ 110 (-7.56%)
Mutual labels:  image-processing
Eos
A lightweight 3D Morphable Face Model fitting library in modern C++14
Stars: ✭ 1,579 (+1226.89%)
Mutual labels:  image-processing
Gift
Go Image Filtering Toolkit
Stars: ✭ 1,473 (+1137.82%)
Mutual labels:  image-processing
Aesthetics
Image Aesthetics Toolkit - includes Fisher Vector implementation, AVA (Image Aesthetic Visual Analysis) dataset and fast multi-threaded downloader
Stars: ✭ 113 (-5.04%)
Mutual labels:  image-processing
Self Driving Car
A End to End CNN Model which predicts the steering wheel angle based on the video/image
Stars: ✭ 106 (-10.92%)
Mutual labels:  image-processing
L1stabilizer
🎥 Video stabilization using L1-norm optimal camera paths.
Stars: ✭ 111 (-6.72%)
Mutual labels:  image-processing
Reproducible Image Denoising State Of The Art
Collection of popular and reproducible image denoising works.
Stars: ✭ 1,776 (+1392.44%)
Mutual labels:  image-processing
Serverless Docker Image Resize
Simple serverless image resize on-the-fly - Deploy with one command - Built with AWS Lambda and S3
Stars: ✭ 114 (-4.2%)
Mutual labels:  image-processing
Swiftyimages
A set of efficient extensions and classes for manipulating images and colors.
Stars: ✭ 111 (-6.72%)
Mutual labels:  image-processing

TFRecorder

TFRecorder makes it easy to create TFRecords from Pandas DataFrames or CSV files. TFRecorder reads data, transforms it using TensorFlow Transform, and stores it in the TFRecord format using Apache Beam and, optionally, Google Cloud Dataflow. Most importantly, TFRecorder does this without requiring the user to write an Apache Beam pipeline or TensorFlow Transform code.

TFRecorder can convert any Pandas DataFrame or CSV file into TFRecords. If your data includes images, TFRecorder can also serialize those into TFRecords. By default, TFRecorder expects your DataFrame or CSV file to be in the same 'Image CSV' format that Google Cloud Platform's AutoML Vision product uses; however, you can also specify an input data schema using TFRecorder's flexible schema system.


Release Notes

Why TFRecorder?

Using the TFRecord storage format is important for optimal machine learning pipelines and getting the most from your hardware (in the cloud or on-prem). The TFRecorder project started inside Google Cloud AI Services when we realized we were writing TFRecord conversion code over and over again.

When to use TFRecords:

  • Your model is input bound (reading data is impacting training time).
  • Anytime you want to use tf.Dataset
  • When your dataset can't fit into memory

Installation

Install from Github

  1. Clone this repo.
git clone https://github.com/google/tensorflow-recorder.git

For "bleeding edge" changes, check out the dev branch.

  2. From the top directory of the repo, run the following command:
python setup.py install

Install from PyPi

pip install tfrecorder

Usage

Generating TFRecords

You can generate TFRecords from a Pandas DataFrame, CSV file or a directory containing images.

From Pandas DataFrame

TFRecorder has an accessor which enables creation of TFRecord files through the Pandas DataFrame object.

Make sure the DataFrame contains a header identifying each of the columns. In particular, the split column needs to be specified so that TFRecorder knows how to split the data into train, test, and validation sets.

Running on a local machine

import pandas as pd
import tfrecorder

csv_file = '/path/to/images.csv'
df = pd.read_csv(csv_file, names=['split', 'image_uri', 'label'])
df.tensorflow.to_tfr(output_dir='/my/output/path')

Running on Cloud Dataflow

Google Cloud Platform Dataflow workers need to be supplied with the tfrecorder package that you would like to run remotely. To do so, first download or build the package (a Python wheel file) and then specify the path to the file when tfrecorder is called.

Step 1: Download or create the wheel file.

To download the wheel from pip: pip download tfrecorder --no-deps

To build from source/git: python setup.py sdist

Step 2: Specify the project, region, and path to the tfrecorder wheel for remote execution.

Cloud Dataflow Requirements

  • The output_dir must be a Google Cloud Storage location.
  • The image files specified in an image_uri column must be located in Google Cloud Storage.
  • If running from your local machine, you must be authenticated to use Google Cloud.

import pandas as pd
import tfrecorder

df = pd.read_csv(...)
df.tensorflow.to_tfr(
    output_dir='gs://my/bucket',
    runner='DataflowRunner',
    project='my-project',
    region='us-central1',
    tfrecorder_wheel='/path/to/my/tfrecorder.whl')

From CSV

Using Python interpreter:

import tfrecorder

tfrecorder.convert(
    source='/path/to/data.csv',
    output_dir='gs://my/bucket')

Using the command line:

tfrecorder create-tfrecords \
    --input_data=/path/to/data.csv \
    --output_dir=gs://my/bucket

From an image directory

import tfrecorder

tfrecorder.convert(
    source='/path/to/image_dir',
    output_dir='gs://my/bucket')

The image directory should have the following general structure:

image_dir/
  <dataset split>/
    <label>/
      <image file>

Example:

images/
  TRAIN/
    cat/
      cat001.jpg
    dog/
      dog001.jpg
  VALIDATION/
    cat/
      cat002.jpg
    dog/
      dog002.jpg
  ...

Loading a TF Dataset from TFRecord files

You can load a TensorFlow dataset from TFRecord files generated by TFRecorder on your local machine.

import tfrecorder

dataset_dict = tfrecorder.load('/path/to/tfrecord_dir')
train = dataset_dict['TRAIN']
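
The returned dictionary maps split names to TensorFlow datasets, so standard tf.data transformations apply. A minimal sketch (the batch size and shuffle buffer below are arbitrary illustrative values):

import tfrecorder

dataset_dict = tfrecorder.load('/path/to/tfrecord_dir')

# Shuffle and batch the TRAIN split; the values are illustrative only.
train = dataset_dict['TRAIN'].shuffle(buffer_size=100).batch(32)

for batch in train.take(1):
    print(batch)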

Verifying data in TFRecords generated by TFRecorder

Using Python interpreter:

import tfrecorder

tfrecorder.inspect(
    tfrecord_dir='/path/to/tfrecords/',
    split='TRAIN',
    num_records=5,
    output_dir='/tmp/output')

This will generate a CSV file containing the structured data and image files for the images encoded into the TFRecords.

Using the command line:

tfrecorder inspect \
    --tfrecord-dir=/path/to/tfrecords/ \
    --split='TRAIN' \
    --num_records=5 \
    --output_dir=/tmp/output

Default Schema

If you don't specify an input schema, TFRecorder expects data to be in the same format as AutoML Vision input. This format looks like a Pandas DataFrame or CSV formatted as:

split    image_uri                    label
TRAIN    gs://my/bucket/image1.jpg    cat

where:

  • split can take on the values TRAIN, VALIDATION, and TEST
  • image_uri specifies a local or Google Cloud Storage location for the image file.
  • label can be either a text-based label that will be integerized or an integer
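
For illustration, a DataFrame in this default format could be constructed directly; the paths and labels below are placeholders:

import pandas as pd
import tfrecorder

# Placeholder rows in the default AutoML Vision 'Image CSV' format.
df = pd.DataFrame({
    'split': ['TRAIN', 'VALIDATION', 'TEST'],
    'image_uri': ['gs://my/bucket/image1.jpg',
                  'gs://my/bucket/image2.jpg',
                  'gs://my/bucket/image3.jpg'],
    'label': ['cat', 'dog', 'cat'],
})

df.tensorflow.to_tfr(output_dir='/my/output/path')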

Flexible Schema

TFRecorder's flexible schema system allows you to use any schema you want for your input data.

For example, the default image CSV schema input can be defined like this:

import pandas as pd
import tfrecorder
from tfrecorder import input_schema
from tfrecorder import types

image_csv_schema = input_schema.Schema({
    'split': types.SplitKey,
    'image_uri': types.ImageUri,
    'label': types.StringLabel
})

# You can then pass the schema to `tfrecorder.create_tfrecords`.

df = pd.read_csv(...)
df.tensorflow.to_tfr(
    output_dir='gs://my/bucket',
    schema_map=image_csv_schema,
    runner='DataflowRunner',
    project='my-project',
    region='us-central1')

Flexible Schema Example

Imagine that you have a dataset that you would like to convert to TFRecords that looks like this:

split    x       y     label
TRAIN    0.32    42    1

You can use TFRecorder as shown below:

import pandas as pd
import tfrecorder
from tfrecorder import input_schema
from tfrecorder import types

# First create a schema map
schema = input_schema.Schema({
    'split': types.SplitKey,
    'x': types.FloatInput,
    'y': types.IntegerInput,
    'label': types.IntegerLabel,
})

# Now call TFRecorder with the specified schema

df = pd.read_csv(...)
df.tensorflow.to_tfr(
    output_dir='gs://my/bucket',
    schema=schema,
    runner='DataflowRunner',
    project='my-project',
    region='us-central1')

After calling TFRecorder's to_tfr() function, TFRecorder will create an Apache Beam pipeline, either locally or, in this case, using Google Cloud's Dataflow runner. This Beam pipeline will use the schema to identify the types you've associated with each data column and will process your data using TensorFlow Transform and TFRecorder's image processing functions to convert the data into TFRecords.

Supported types

TFRecorder's schema system supports several types. You can use these types by referencing them in the schema map. Each type informs TFRecorder how to treat your DataFrame columns.

types.SplitKey

  • A split key is required by TFRecorder at this time.
  • Only one split key is allowed.
  • Specifies the key that TFRecorder will use to partition the input dataset.
  • Allowed values are 'TRAIN', 'VALIDATION', and 'TEST'.

Note: If you do not want your data to be partitioned, include a column with types.SplitKey and set all the elements to TRAIN.
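
For example, a sketch of adding such a constant split column to an existing DataFrame:

import pandas as pd

df = pd.read_csv('/path/to/data.csv')

# Assign every row to the TRAIN split so the data is not partitioned.
df['split'] = 'TRAIN'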

types.ImageUri

  • Specifies the path to an image. When specified, TFRecorder will load the specified image and store it as a base64-encoded tf.string in the key 'image', along with the height, width, and number of channels as integers using the keys 'image_height', 'image_width', and 'image_channels'.
  • A schema can contain only one ImageUri column.
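
As a rough sketch of reading those keys back, assuming a dataset loaded with tfrecorder.load yields feature dictionaries with the keys described above:

import tfrecorder

dataset_dict = tfrecorder.load('/path/to/tfrecord_dir')

for example in dataset_dict['TRAIN'].take(1):
    # Keys assumed from the description above; 'image' holds the
    # base64-encoded image bytes, the rest are integer metadata.
    print(example['image_height'],
          example['image_width'],
          example['image_channels'])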

types.IntegerInput

  • Specifies an integer input.
  • Will be scaled to mean 0, variance 1.

types.FloatInput

  • Specifies a float input.
  • Will be scaled to mean 0, variance 1.

types.CategoricalInput

  • Specifies a string input.
  • Vocabulary computed and output integerized.

types.IntegerLabel

  • Specifies an integer target.
  • Not transformed.

types.StringLabel

  • Specifies a string target.
  • Vocabulary computed and output integerized.
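
Putting these together, a schema for purely structured (non-image) data might look like the following sketch; the column names ('age', 'weight', 'color') are illustrative:

from tfrecorder import input_schema
from tfrecorder import types

# Hypothetical structured-data schema combining the types above.
schema = input_schema.Schema({
    'split': types.SplitKey,           # TRAIN / VALIDATION / TEST
    'age': types.IntegerInput,         # scaled to mean 0, variance 1
    'weight': types.FloatInput,        # scaled to mean 0, variance 1
    'color': types.CategoricalInput,   # vocabulary computed, integerized
    'label': types.StringLabel,        # vocabulary computed, integerized
})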

Contributing

Pull requests are welcome. Please see our code of conduct and contributing guide.

Need help with using AI in the cloud? Visit Google Cloud AI Services.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].