All Projects → csv2db → Csv2db

csv2db / Csv2db

Licence: apache-2.0
The CSV to database command line loader

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Csv2db

Jooq
jOOQ is the best way to write SQL in Java
Stars: ✭ 4,695 (+4502.94%)
Mutual labels:  oracle, database, mysql, postgresql, sqlserver, db2
Symmetric Ds
SymmetricDS is a database and file synchronization solution that is platform-independent, web-enabled, and database agnostic. SymmetricDS was built to make data replication across two to tens of thousands of databases and file systems fast, easy and resilient. We specialize in near real time, bi-directional data replication across large node networks over the WAN or LAN.
Stars: ✭ 450 (+341.18%)
Mutual labels:  oracle, database, mysql, postgresql, sqlserver, db2
Linq2db
Linq to database provider.
Stars: ✭ 2,211 (+2067.65%)
Mutual labels:  oracle, etl, database, mysql, postgresql, db2
Datax
DataX is an open source universal ETL tool that support Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto(Trino), PostgreSQL, SQL Server
Stars: ✭ 116 (+13.73%)
Mutual labels:  oracle, etl, database, mysql, sqlserver
Sqlcheck
Automatically identify anti-patterns in SQL queries
Stars: ✭ 2,062 (+1921.57%)
Mutual labels:  command-line, oracle, database, mysql, postgresql
Typeorm
ORM for TypeScript and JavaScript (ES7, ES6, ES5). Supports MySQL, PostgreSQL, MariaDB, SQLite, MS SQL Server, Oracle, SAP Hana, WebSQL databases. Works in NodeJS, Browser, Ionic, Cordova and Electron platforms.
Stars: ✭ 26,559 (+25938.24%)
Mutual labels:  oracle, database, mysql, postgresql, sqlserver
Dbshield
Database firewall written in Go
Stars: ✭ 620 (+507.84%)
Mutual labels:  oracle, database, mysql, postgresql, db2
Kangaroo
SQL client and admin tool for popular databases
Stars: ✭ 127 (+24.51%)
Mutual labels:  oracle, database, mysql, postgresql, sqlserver
Pyetl
python ETL framework
Stars: ✭ 33 (-67.65%)
Mutual labels:  oracle, csv, etl, mysql, sqlserver
Liquibase
Main Liquibase Source
Stars: ✭ 2,910 (+2752.94%)
Mutual labels:  oracle, database, mysql, sqlserver, db2
Dbeaver
Free universal database tool and SQL client
Stars: ✭ 23,752 (+23186.27%)
Mutual labels:  oracle, database, mysql, postgresql, db2
Evolve
Database migration tool for .NET and .NET Core projects. Inspired by Flyway.
Stars: ✭ 477 (+367.65%)
Mutual labels:  database, mysql, postgresql, sqlserver
Trdsql
CLI tool that can execute SQL queries on CSV, LTSV, JSON and TBLN. Can output to various formats.
Stars: ✭ 593 (+481.37%)
Mutual labels:  cli, csv, mysql, postgresql
Blog
Everything about database,business.(Most for PostgreSQL).
Stars: ✭ 6,330 (+6105.88%)
Mutual labels:  oracle, database, mysql, postgresql
Experdb Db2pg
eXperDB-DB2PG is a data migration solution that transfers data extracted from various DBMSs to eXperDB or PostgreSQL. Currently, Oracle and Oracle Spatial, MySQL, SQL Server(MS-SQL) and Sybase data can be transferred.
Stars: ✭ 24 (-76.47%)
Mutual labels:  oracle, mysql, postgresql, db2
Smartsql
SmartSql = MyBatis in C# + .NET Core+ Cache(Memory | Redis) + R/W Splitting + PropertyChangedTrack +Dynamic Repository + InvokeSync + Diagnostics
Stars: ✭ 775 (+659.8%)
Mutual labels:  oracle, mysql, postgresql, sqlserver
Antdata.orm
特色:vs插件或者t4一键生成entity 支持配置非物理外键。分离linq转sql引擎(原生linq非扩展)和执行dal功能,支持异步,支持netcore2.0
Stars: ✭ 428 (+319.61%)
Mutual labels:  oracle, mysql, postgresql, sqlserver
Zxw.framework.netcore
基于EF Core的Code First模式的DotNetCore快速开发框架,其中包括DBContext、IOC组件autofac和AspectCore.Injector、代码生成器(也支持DB First)、基于AspectCore的memcache和Redis缓存组件,以及基于ICanPay的支付库和一些日常用的方法和扩展,比如批量插入、更新、删除以及触发器支持,当然还有demo。欢迎提交各种建议、意见和pr~
Stars: ✭ 691 (+577.45%)
Mutual labels:  oracle, mysql, postgresql, sqlserver
Eosio sql plugin
EOSIO sql database plugin
Stars: ✭ 21 (-79.41%)
Mutual labels:  oracle, mysql, postgresql, db2
Q
q - Run SQL directly on CSV or TSV files
Stars: ✭ 8,809 (+8536.27%)
Mutual labels:  command-line-tool, cli, command-line, csv
                  ___       ____  
  ___________   _|__ \ ____/ / /_ 
 / ___/ ___/ | / /_/ // __  / __ \
/ /__(__  )| |/ / __// /_/ / /_/ /
\___/____/ |___/____/\____/_____/ 
                                  

The CSV to database command line loader.

csv2db reads CSV files and loads them into a database. Rather than having to go through the CSV data first and find out what columns and data types are present in the CSV files, csv2db will read the header in each CSV file and automatically load data into the columns of the same name into the target table. Spaces in the header column names are automatically replaced with _ characters, for example the column station id in the CSV file will be interpreted as station_id column in the table.

This approach allows you to get data into the database first and worry about the data cleansing part later, which is usually much easier once the data is in the database rather than in the CSV files.

csv2db is capable of scanning all CSV file headers at once and generate a CREATE TABLE statement with all the column names present. This is particularly useful if the format of the CSV files has changed over time or because you want to load different CSV file types into the same database table.

Usage

$ ./csv2db -h
usage: csv2db [-h] {generate,gen,load,lo} ...

The CSV to database command line loader.
Version: 1.5.1
(c) Gerald Venzl

positional arguments:
  {generate,gen,load,lo}
    generate (gen)      Prints a CREATE TABLE SQL statement to create the
                        table and columns based on the header row of the CSV
                        file(s).
    load (lo)           Loads the data from the CSV file(s) into the database.

optional arguments:
  -h, --help            show this help message and exit
$ ./csv2db generate -h
usage: csv2db generate [-h] [-f FILE] [-v] [--debug] [-t TABLE]
                       [-c COLUMN_TYPE] [-s SEPARATOR] [-q QUOTE]

optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  The file to load, by default all *.csv.zip files
  -v, --verbose         Verbose output.
  --debug               Debug output.
  -t TABLE, --table TABLE
                        The table name to use.
  -c COLUMN_TYPE, --column-type COLUMN_TYPE
                        The column type to use for the table generation.
  -s SEPARATOR, --separator SEPARATOR
                        The columns separator character(s).
  -q QUOTE, --quote QUOTE
                        The quote character on which a string won't be split.
$ ./csv2db load -h
usage: csv2db load [-h] [-f FILE] [-v] [--debug] -t TABLE
                   [-o {oracle,mysql,postgres,sqlserver,db2}] -u USER
                   [-p PASSWORD] [-m HOST] [-n PORT] [-d DBNAME] [-b BATCH]
                   [-s SEPARATOR] [-q QUOTE] [-a]

optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  The file to load, by default all *.csv.zip files
  -v, --verbose         Verbose output.
  --debug               Debug output.
  -t TABLE, --table TABLE
                        The table name to use.
  -o {oracle,mysql,postgres,sqlserver,db2}, --dbtype {oracle,mysql,postgres,sqlserver,db2}
                        The database type.
  -u USER, --user USER  The database user to load data into.
  -p PASSWORD, --password PASSWORD
                        The database schema password. csv2db will prompt for
                        the password if the parameter is missing which is a
                        more secure method of providing a password.
  -m HOST, --host HOST  The host name on which the database is running on.
  -n PORT, --port PORT  The port on which the database is listening. If not
                        passed on the default port will be used (Oracle: 1521,
                        MySQL: 3306, PostgreSQL: 5432, SQL Server: 1433, DB2:
                        50000).
  -d DBNAME, --dbname DBNAME
                        The name of the database.
  -b BATCH, --batch BATCH
                        How many rows should be loaded at once.
  -s SEPARATOR, --separator SEPARATOR
                        The columns separator character(s).
  -q QUOTE, --quote QUOTE
                        The quote character on which a string won't be split.
  -a, --directpath      Execute a direct path INSERT load operation (Oracle
                        only).

How to use csv2db

Loading CSV files into the database

csv2db can load plain text csv files as well as compressed csv files in .zip or .gz format without having to uncompress them first.

$ ./csv2db load -f test/resources/201811-citibike-tripdata.csv -t citibikes -u csv_data -p csv_data -d ORCLPDB1

Loading file test/resources/201811-citibike-tripdata.csv
File loaded.

$ ./csv2db load -f test/resources/201811-citibike-tripdata.csv.gz -t citibikes -u csv_data -p csv_data -d ORCLPDB1

Loading file test/resources/201811-citibike-tripdata.csv.gz
File loaded.

csv2db --verbose option will provide verbose output.

$ ./csv2db load -f test/resources/201811-citibike-tripdata.csv -t citibikes -u csv_data -p csv_data -d ORCLPDB1 --verbose
Finding file(s).
Found 1 file(s).
Establishing database connection.

Loading file test/resources/201811-citibike-tripdata.csv
16 rows loaded.
File loaded.

Closing database connection.

csv2db can load multiple files at once, using either wildcard characters (e.g. data*.csv.zip) or by passing on the folder containing CSV files.

Note: String including wildcard characters have to be enclosed in ""

$ ./csv2db load -f "test/resources/201811-citibike-tripdata.*" -t citibikes -u csv_data -p csv_data -d ORCLPDB1 --verbose
Finding file(s).
Found 3 file(s).
Establishing database connection.

Loading file test/resources/201811-citibike-tripdata.csv
16 rows loaded.
File loaded.


Loading file test/resources/201811-citibike-tripdata.csv.gz
10 rows loaded.
File loaded.


Loading file test/resources/201811-citibike-tripdata.csv.zip
10 rows loaded.
File loaded.

Closing database connection.
$ ./csv2db load -f test/resources -t citibikes -u csv_data -p csv_data -d ORCLPDB1 --verbose
Finding file(s).
Found 3 file(s).
Establishing database connection.

Loading file test/resources/201811-citibike-tripdata.csv
16 rows loaded.
File loaded.


Loading file test/resources/201811-citibike-tripdata.csv.gz
10 rows loaded.
File loaded.


Loading file test/resources/201811-citibike-tripdata.csv.zip
10 rows loaded.
File loaded.

Closing database connection.

csv2db will load all values as strings. You can either load all data into a staging table with all columns being strings as well, or rely on implicit data type conversion on the database side.

Create a staging table

csv2db can generate the SQL statement for a staging table for your data using the generate command:

$ ./csv2db generate -f test/resources/201811-citibike-tripdata.csv
CREATE TABLE <TABLE NAME>
(
  TRIPDURATION VARCHAR(1000),
  STARTTIME VARCHAR(1000),
  STOPTIME VARCHAR(1000),
  START_STATION_ID VARCHAR(1000),
  START_STATION_NAME VARCHAR(1000),
  START_STATION_LATITUDE VARCHAR(1000),
  START_STATION_LONGITUDE VARCHAR(1000),
  END_STATION_ID VARCHAR(1000),
  END_STATION_NAME VARCHAR(1000),
  END_STATION_LATITUDE VARCHAR(1000),
  END_STATION_LONGITUDE VARCHAR(1000),
  BIKEID VARCHAR(1000),
  USERTYPE VARCHAR(1000),
  BIRTH_YEAR VARCHAR(1000),
  GENDER VARCHAR(1000)
);

By default you will have to fill in the table name. You can also specify the table name via the -t option:

$ ./csv2db generate -f test/resources/201811-citibike-tripdata.csv -t STAGING
CREATE TABLE STAGING
(
  TRIPDURATION VARCHAR(1000),
  STARTTIME VARCHAR(1000),
  STOPTIME VARCHAR(1000),
  START_STATION_ID VARCHAR(1000),
  START_STATION_NAME VARCHAR(1000),
  START_STATION_LATITUDE VARCHAR(1000),
  START_STATION_LONGITUDE VARCHAR(1000),
  END_STATION_ID VARCHAR(1000),
  END_STATION_NAME VARCHAR(1000),
  END_STATION_LATITUDE VARCHAR(1000),
  END_STATION_LONGITUDE VARCHAR(1000),
  BIKEID VARCHAR(1000),
  USERTYPE VARCHAR(1000),
  BIRTH_YEAR VARCHAR(1000),
  GENDER VARCHAR(1000)
);

csv2db will use VARCHAR(1000) as default data type for all columns for the staging table. If you wish to use a different data type, you can specify it via the -c option:

$ ./csv2db generate -f test/resources/201811-citibike-tripdata.csv -t STAGING -c CLOB
CREATE TABLE STAGING
(
  TRIPDURATION CLOB,
  STARTTIME CLOB,
  STOPTIME CLOB,
  START_STATION_ID CLOB,
  START_STATION_NAME CLOB,
  START_STATION_LATITUDE CLOB,
  START_STATION_LONGITUDE CLOB,
  END_STATION_ID CLOB,
  END_STATION_NAME CLOB,
  END_STATION_LATITUDE CLOB,
  END_STATION_LONGITUDE CLOB,
  BIKEID CLOB,
  USERTYPE CLOB,
  BIRTH_YEAR CLOB,
  GENDER CLOB
);

The idea is to have a staging table that data can be loaded into first and then figure out the correct data types for each column.

Installation

You can install csv2db either by cloning this Git repository

$ git clone https://github.com/csv2db/csv2db

or by downloading one of the releases

$ LOCATION=$(curl -s https://api.github.com/repos/csv2db/csv2db/releases/latest | grep "tag_name" | awk '{print "https://github.com/csv2db/csv2db/archive/" substr($2, 2, length($2)-3) ".zip"}') ; curl -L -o csv2db.zip $LOCATION
$ unzip csv2db.zip
$ cd csv2db*
$ ./csv2db

In order for csv2db to work you will have to install the appropriate database driver(s). The following drivers are being used, all available on pypi.org:

You can install any of these drivers via pip:

$ python3 -m pip install cx-Oracle
$ python3 -m pip install mysql-connector-python
$ python3 -m pip install psycopg2-binary
$ python3 -m pip install pymssql
$ python3 -m pip install ibm-db

For more instruction on how to install the driver(s) on your environment, please see the documentation of the individual driver or refer to the csv2db Installation Guide.

NOTE: You only have to install the driver for the database(s) that you want to load data into.

Miscellaneous

Exit codes

csv2db returns following exit codes:

Exit code Value Meaning
SUCCESS 0 Successful execution of the program.
GENERIC_ERROR 1 A generic error occurred.
ARGUMENT_ERROR 2 An argument is either missing or incorrect.
DATABASE_ERROR 3 A database error occurred.
DATA_LOADING_ERROR 4 An error occurred during loading of data. csv2db will continue to process other files, if any.

$NO_COLOR support

csv2db is capable of color coded output and will do so by default (except on Windows).
Debug output is yellow.
Error output is red.
This can be deactivated by setting the $NO_COLOR environment variable. For more details on $NO_COLOR see https://no-color.org/

LICENSE

Copyright 2019 Gerald Venzl

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].