All Projects → chop-dbhi → Diff Table

chop-dbhi / Diff Table

Programming Languages

go
31211 projects - #10 most used programming language

Projects that are alternatives of or similar to Diff Table

Monitor Table Change With Sqltabledependency
Get SQL Server notification on record table change
Stars: ✭ 459 (+2085.71%)
Mutual labels:  sql, table
Daff
align and compare tables
Stars: ✭ 598 (+2747.62%)
Mutual labels:  csv, diff
Finviz
Unofficial API for finviz.com
Stars: ✭ 493 (+2247.62%)
Mutual labels:  csv, sql
Rainbow csv
🌈Rainbow CSV - Vim plugin: Highlight columns in CSV and TSV files and run queries in SQL-like language
Stars: ✭ 337 (+1504.76%)
Mutual labels:  csv, sql
Sheetjs
📗 SheetJS Community Edition -- Spreadsheet Data Toolkit
Stars: ✭ 28,479 (+135514.29%)
Mutual labels:  csv, table
Pytablewriter
pytablewriter is a Python library to write a table in various formats: CSV / Elasticsearch / HTML / JavaScript / JSON / LaTeX / LDJSON / LTSV / Markdown / MediaWiki / NumPy / Excel / Pandas / Python / reStructuredText / SQLite / TOML / TSV.
Stars: ✭ 422 (+1909.52%)
Mutual labels:  csv, table
Trdsql
CLI tool that can execute SQL queries on CSV, LTSV, JSON and TBLN. Can output to various formats.
Stars: ✭ 593 (+2723.81%)
Mutual labels:  csv, sql
fastapi-csv
🏗️ Create APIs from CSV files within seconds, using fastapi
Stars: ✭ 46 (+119.05%)
Mutual labels:  csv, table
Countries
Countries, Languages & Continents data (capital and currency, native name, calling codes).
Stars: ✭ 656 (+3023.81%)
Mutual labels:  csv, sql
Municipios Brasileiros
🏡 Código IBGE, nome do município, capital, código UF, UF, estado, latitude e longitude das cidades brasileiras
Stars: ✭ 638 (+2938.1%)
Mutual labels:  csv, sql
Sq
swiss-army knife for data
Stars: ✭ 275 (+1209.52%)
Mutual labels:  csv, sql
Rows
A common, beautiful interface to tabular data, no matter the format
Stars: ✭ 739 (+3419.05%)
Mutual labels:  csv, table
Sqawk
Like Awk but with SQL and table joins
Stars: ✭ 263 (+1152.38%)
Mutual labels:  csv, sql
Excelmerge
GUI Diff Tool for Excel
Stars: ✭ 425 (+1923.81%)
Mutual labels:  csv, diff
csv2latex
🔧 Simple script in python to convert CSV files to LaTeX table
Stars: ✭ 54 (+157.14%)
Mutual labels:  csv, table
Datasette
An open source multi-tool for exploring and publishing data
Stars: ✭ 5,640 (+26757.14%)
Mutual labels:  csv, sql
Eventflow
Async/await first CQRS+ES and DDD framework for .NET
Stars: ✭ 1,932 (+9100%)
Mutual labels:  events, eventsourcing
tardis
Event sourcing toolkit
Stars: ✭ 35 (+66.67%)
Mutual labels:  events, eventsourcing
World countries
Constantly updated lists of world countries and their associated alpha-2, alpha-3 and numeric country codes as defined by the ISO 3166 standard, available in CSV, JSON , PHP and SQL formats, in multiple languages and with national flags included
Stars: ✭ 598 (+2747.62%)
Mutual labels:  csv, sql
Nano Sql
Universal database layer for the client, server & mobile devices. It's like Lego for databases.
Stars: ✭ 717 (+3314.29%)
Mutual labels:  csv, sql

diff-table

Build Status Coverage Status

A tool to compare two tables of data. Currently the tool supported tables from Postgres and CSV files.

The primary use case is to compare the output of a query executed at different points in time. For example, in a batch ETL process that runs every night, you can compare the previous batch with the new batch.

Install

Download a release or install from source:

go get -u github.com/chop-dbhi/diff-table/cmd/diff-table

Usage

The minimum requirement is to specify two tables (CSV or relational) and specify the key columns. For example, using the provided example data in the repository, the command would look like this.

diff-table \
  -csv1 example/file1.csv \
  -csv2 example/file2.csv \
  -key id

The default output is a JSON encoded object with a summary of the differences, including column and row changes.

{
  "total_rows": 4,
  "columns_added": [
    "city"
  ],
  "columns_dropped": [],
  "type_changes": {},
  "rows_added": 1,
  "rows_deleted": 1,
  "rows_changed": 1
}

Adding the -diff option in the command will result in row-level changes in the output.

diff-table \
  -csv1 example/file1.csv \
  -csv2 example/file2.csv \
  -key id \
  -diff
{
  "total_rows": 4,
  "columns_added": [
    "city"
  ],
  "columns_dropped": [],
  "type_changes": {},
  "rows_added": 1,
  "rows_deleted": 1,
  "rows_changed": 2,
  "row_diffs": [
    {
      "key": {
        "id": "1"
      },
      "changes": {
        "city": {
          "old": null,
          "new": "Trenton"
        },
        "color": {
          "old": "Blue",
          "new": "Teal"
        }
      }
    },
    {
      "key": {
        "id": "3"
      },
      "changes": {
        "city": {
          "old": null,
          "new": "Philadelphia"
        }
      }
    }
  ],
  "new_rows": [
    {
      "city": "Allentown",
      "color": "Black",
      "gender": "Male",
      "id": "4",
      "name": "Neal"
    }
  ],
  "deleted_rows": [
    {
      "id": "2"
    }
  ]
}

Events

The above type of output is convenient for summary usage, however in some use cases a set of events may be more useful. Using the -events option will result in the changes being streamed as they are discovered rather than aggregating everything up into a single output object.

diff-table \
  -csv1 example/file1.csv \
  -csv2 example/file2.csv \
  -key id \
  -events

The output is a newline-delimited set of JSON-encoded events. Column-based changes will always come first. The type field denotes the type of event which can be switched on during consumption. The base event structured looks as follows:

{
  // Event name/type.
  "type": "row-added",

  // Unix epoch timestamp in seconds.
  "time": 1520114848,

  // Offset of the row in the batch, if applicable. This can also be used as an
  // ordering mechanism.
  "offset": 1,

  // Key of the row this event pertains to, if applicable.
  "key": {...},

  // Full snapshot of the row. This will be present in row-added events, but
  // can be optionally included for row-changed and row-removed events using
  // the -data option.
  "data": {...},

  // Set of changes found relative to the old row. Each key is the column
  // name and the value is an object with `old` and `new` keys with the
  // respective values.
  "changes": {...},

  // If a column -added, -changed, or -removed event, this is the name of the
  // column affected.
  "column": "city"

  // If a column-changed event, this is the old type.
  "old_type": "int32",

  // If a column-changed event, this is the new type.
  "new_type": "int64"
}
{
  "type": "column-added",
  "column": "city"
}
{
  "type": "row-changed",
  "offset": 1,
  "key": {
    "id": "1"
  },
  "changes": {
    "city": {
      "old": null,
      "new": "Trenton"
    },
    "color": {
      "old": "Blue",
      "new": "Teal"
    }
  }
}
{
  "type": "row-removed",
  "offset": 2,
  "key": {
    "id": "2"
  }
}
{
  "type": "row-changed",
  "offset": 3,
  "key": {
    "id": "3"
  },
  "changes": {
    "city": {
      "old": null,
      "new": "Philadelphia"
    }
  }
}
{
  "type": "row-added",
  "offset": 4,
  "key": {
    "id": "4"
  },
  "data": {
    "city": "Allentown",
    "color": "Black",
    "gender": "Male",
    "id": "4",
    "name": "Neal"
  }
}

Snapshots

In addition to change events, snapshots are supported which scans a table and emits a row-stored event including the current state of each row. These events are structurally equivalent to row-added events which include the full data.

The use case for them is to initialize a stream from a stateful starting point, but can be used over time to make new starting points for consumers. This prevents consumers from needing to know what the starting state was when the stream was initiated as well as not needing to read from the very beginning of the stream in order to build up the current state. With snapshots, the consumer can start from the last known snapshot and consume from there.

To create a snapshot, use the -snapshot option with the newest table or CSV file specified.

diff-table \
  -csv1 example/file1.csv \
  -key id \
  -snapshot

Examples

Below are examples of how tables can be specified including SQL-based tables and CSV files (with sorted or unsorted rows) and how columns can be renamed (not in the data source, just in the diff-table runtime) before the tables are compared.

Tables in the same database

diff-table \
  -db postgres://localhost:5432/postgres \
  -table1 data_v1 \
  -table2 data_v2 \
  -key id

Tables from different servers/databases

diff-table \
  -db postgres://localhost:5432/postgres \
  -db2 postgres://localhost:5435/other \
  -table1 data_v1 \
  -table2 data_v2 \
  -key id

CSV files

Note: this assumes the CSV files are pre-sorted by the specified key columns.

diff-table \
  -csv1 data_v1.csv \
  -csv2 data_v2.csv  \
  -key id

Unsorted CSV files

diff-table \
  -csv1 data_v1.csv \
  -csv1.sort \
  -csv2 data_v2.csv  \
  -csv2.sort \
  -key id

CSV file and database table (o.O)

Note: this assumes the CSV file is pre-sorted by the specified key columns.

diff-table \
  -csv1 data_v1.csv \
  -db2 postgres://localhost:5432/postgres \
  -table2 data_v2 \
  -key id

Tables with renamed columns

The data_v1.foo column will be renamed to bar (just in the diff-table runtime, not in the source) and compared to the data_v2.bar column. The same will happen for the baz:buz column rename.

diff-table \
  -db postgres://localhost:5432/postgres \
  -table1 data_v1 \
  -rename1 "foo:bar,baz:buz" \
  -table2 data_v2 \
  -key id

SQL statements

In addition to tables, arbitrary SQL statements are supported as well. The basic requirement is that the columns specified in -key must existing the statement and the data must be ordered by the key columns.

diff-table \
  -db postgres://localhost:5432/postgres \
  -kye id \
  -sql1 "select id, col1, col2 from table1 order by id" \
  -sql1 "select id, col1, col2 from table2 order by id"
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].