simonw / Csv Diff

Licence: apache-2.0
Python CLI tool and library for diffing CSV and JSON files

csv-diff

Tool for viewing the difference between two CSV, TSV or JSON files. See "Generating a commit log for San Francisco’s official list of trees" (and the sf-tree-history repo commit log) for background information on this project.

Installation

pip install csv-diff

Usage

Consider two CSV files:

one.csv

id,name,age
1,Cleo,4
2,Pancakes,2

two.csv

id,name,age
1,Cleo,5
3,Bailey,1

csv-diff can show a human-readable summary of differences between the files:

$ csv-diff one.csv two.csv --key=id
1 row changed, 1 row added, 1 row removed

1 row changed

  Row 1
    age: "4" => "5"

1 row added

  id: 3
  name: Bailey
  age: 1

1 row removed

  id: 2
  name: Pancakes
  age: 2

The --key=id option means that the id column should be treated as the unique key, to identify which records have changed.
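
To make the role of the key column concrete, here is a minimal standard-library sketch of key-based row comparison using the two example files. It is not csv-diff's own implementation: the helper names index_by_key and keyed_diff are hypothetical, and the result shape only loosely mirrors the tool's output.

```python
import csv
import io

ONE = "id,name,age\n1,Cleo,4\n2,Pancakes,2\n"
TWO = "id,name,age\n1,Cleo,5\n3,Bailey,1\n"


def index_by_key(text, key):
    """Parse CSV text and index each row by the value in its key column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def keyed_diff(old, new):
    """Compare two key-indexed row mappings (hypothetical helper)."""
    added = [new[k] for k in new if k not in old]
    removed = [old[k] for k in old if k not in new]
    changed = {}
    for k in old:
        if k in new and old[k] != new[k]:
            # Record (before, after) for every field whose value differs.
            changed[k] = {
                field: (old[k][field], new[k].get(field))
                for field in old[k]
                if old[k][field] != new[k].get(field)
            }
    return {"added": added, "removed": removed, "changed": changed}


result = keyed_diff(index_by_key(ONE, "id"), index_by_key(TWO, "id"))
print(result["changed"])  # {'1': {'age': ('4', '5')}}
```

Because rows are matched by key rather than by position, row 1 is reported as changed even though the rows around it were added and removed.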

The tool will automatically detect whether your files are comma- or tab-separated. You can override this automatic detection and force a specific format using --format=tsv or --format=csv.
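
This README does not describe how the detection works internally. As an illustration only, one way to distinguish comma- from tab-separated text in Python is the standard library's csv.Sniffer; the function name detect_delimiter below is hypothetical.

```python
import csv


def detect_delimiter(sample):
    """Guess whether sample text is comma- or tab-separated."""
    # Restrict the candidates so no other character can be chosen.
    dialect = csv.Sniffer().sniff(sample, delimiters=",\t")
    return dialect.delimiter


comma_sample = "id,name,age\n1,Cleo,4\n2,Pancakes,2\n"
tab_sample = comma_sample.replace(",", "\t")

print(repr(detect_delimiter(comma_sample)))
print(repr(detect_delimiter(tab_sample)))
```

Sniffing is a heuristic, which is why an explicit option such as --format is useful when a file is ambiguous.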

You can also feed it JSON files, provided they are a JSON array of objects where each object has the same keys. Use --format=json if your input files are JSON.
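
If you are unsure whether a JSON file has the required shape, a small pre-check along these lines may help. This is a sketch under the assumption that "the same keys" means identical key sets in every object; check_json_rows is a hypothetical helper, not part of csv-diff.

```python
import json


def check_json_rows(text):
    """Return the parsed rows, or raise ValueError if the shape is unsuitable."""
    rows = json.loads(text)
    if not isinstance(rows, list) or not all(isinstance(r, dict) for r in rows):
        raise ValueError("expected a JSON array of objects")
    if rows:
        expected = set(rows[0])
        for position, row in enumerate(rows):
            if set(row) != expected:
                raise ValueError(f"object {position} has different keys")
    return rows


good = '[{"id": 1, "name": "Cleo", "age": 4}, {"id": 2, "name": "Pancakes", "age": 2}]'
rows = check_json_rows(good)
print(len(rows))  # 2
```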

Use --show-unchanged to include full details of the unchanged values for rows with at least one change in the diff output:

$ csv-diff one.csv two.csv --key=id --show-unchanged
1 row changed

  id: 1
    age: "4" => "5"

    Unchanged:
      name: "Cleo"

You can use the --json option to get a machine-readable difference:

$ csv-diff one.csv two.csv --key=id --json
{
    "added": [
        {
            "id": "3",
            "name": "Bailey",
            "age": "1"
        }
    ],
    "removed": [
        {
            "id": "2",
            "name": "Pancakes",
            "age": "2"
        }
    ],
    "changed": [
        {
            "key": "1",
            "changes": {
                "age": [
                    "4",
                    "5"
                ]
            }
        }
    ],
    "columns_added": [],
    "columns_removed": []
}
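
As a sketch of how a script might consume this output, the following parses the JSON shown above (whitespace condensed) and prints one line per difference. The summarize function is hypothetical, and it assumes the key column is named id, as in the example.

```python
import json

# Output copied from the --json example above, with whitespace condensed.
raw = """
{
    "added": [{"id": "3", "name": "Bailey", "age": "1"}],
    "removed": [{"id": "2", "name": "Pancakes", "age": "2"}],
    "changed": [{"key": "1", "changes": {"age": ["4", "5"]}}],
    "columns_added": [],
    "columns_removed": []
}
"""


def summarize(diff):
    """Turn the diff structure into a flat list of one-line descriptions."""
    lines = []
    for entry in diff["changed"]:
        # Each change is a two-item list: [old value, new value].
        for column, (before, after) in entry["changes"].items():
            lines.append(f"row {entry['key']}: {column} {before} -> {after}")
    lines += [f"added id {row['id']}" for row in diff["added"]]
    lines += [f"removed id {row['id']}" for row in diff["removed"]]
    return lines


summary = summarize(json.loads(raw))
print("\n".join(summary))
```

Note that every value is a string, including ids and ages, so convert explicitly if you need numeric comparisons.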

As a Python library

You can also import the Python library into your own code like so:

from csv_diff import load_csv, compare
diff = compare(
    load_csv(open("one.csv"), key="id"),
    load_csv(open("two.csv"), key="id")
)

diff will now contain the same data structure as the output in the --json example above.

If the columns in the CSV have changed, those added or removed columns will be ignored when calculating changes made to specific rows.
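
One way to read that rule is that row-level changes are computed only over the columns both files share. The sketch below illustrates this interpretation with hypothetical helpers (split_columns, row_changes) and made-up data; it is not taken from the library's source.

```python
def split_columns(old_columns, new_columns):
    """Partition column names into shared, added and removed."""
    shared = [c for c in old_columns if c in new_columns]
    added = [c for c in new_columns if c not in old_columns]
    removed = [c for c in old_columns if c not in new_columns]
    return shared, added, removed


def row_changes(old_row, new_row, shared):
    """Compare two versions of a row, looking only at shared columns."""
    return {c: [old_row[c], new_row[c]] for c in shared if old_row[c] != new_row[c]}


# Hypothetical data: the "age" column was dropped and "weight" was added.
old_row = {"id": "1", "name": "Cleo", "age": "4"}
new_row = {"id": "1", "name": "Cleo", "weight": "12"}

shared, added, removed = split_columns(list(old_row), list(new_row))
print(added, removed)                         # ['weight'] ['age']
print(row_changes(old_row, new_row, shared))  # {}
```

Under this reading the row is not reported as changed, because its values in the shared columns (id and name) are identical.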

As a Docker container

Build the image

$ docker build -t csvdiff .

Run the container

$ docker run --rm -v $(pwd):/files csvdiff

Suppose the current directory contains two CSV files, one.csv and two.csv:

$ docker run --rm -v $(pwd):/files csvdiff one.csv two.csv