evdubs / spdr-etf-holdings

License: MPL-2.0
ETL for the SPDR ETF holdings XLS documents

Programming Languages: racket, shell

spdr-etf-holdings

These Racket programs will download the SPDR ETF holdings XLS documents and insert the holding data into a PostgreSQL database. The intended usage on Windows with Microsoft Excel is:

$ racket extract.rkt
$ racket transform-load-com.rkt

On other platforms, you will need to do something like the following, using separate software for the XLS->CSV conversion:

$ racket extract.rkt
$ for f in /var/tmp/spdr/etf-holdings/date/* ; do libreoffice --headless --convert-to csv --outdir /var/tmp/spdr/etf-holdings/date "$f" ; done
$ racket transform-load-csv.rkt
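
The manual conversion loop above can also be wrapped in a small helper. The following is only a sketch and is not part of this project: the function name convert_holdings and the DRY_RUN variable are hypothetical, and it assumes that every non-CSV file in the dated folder is a holdings document to convert.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the project): convert every
# non-CSV file in a dated holdings folder to CSV with libreoffice.
# Set DRY_RUN=1 to print the commands instead of running them.
convert_holdings() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue              # skip when the glob matched nothing
        case $f in *.csv) continue ;; esac   # skip files that are already CSV
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'libreoffice --headless --convert-to csv --outdir %s %s\n' "$dir" "$f"
        else
            libreoffice --headless --convert-to csv --outdir "$dir" "$f"
        fi
    done
}
```

After defining the function, a call such as convert_holdings /var/tmp/spdr/etf-holdings/date would replace the loop. Quoting "$f" and passing full paths means the helper does not depend on the current working directory.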

If you have libreoffice installed, you can skip the manual conversion step, because transform-load-csv.rkt can run the XLS->CSV conversion itself when given the -c flag:

$ racket extract.rkt
$ racket transform-load-csv.rkt -c

You will need to provide a database password for the transform-load-*.rkt programs. The available parameters are:

$ racket transform-load-csv.2019-11-02.rkt -h
racket transform-load-csv.2019-11-02.rkt [ <option> ... ]
 where <option> is one of
  -b <folder>, --base-folder <folder> : SPDR ETF Holdings base folder. Defaults to /var/tmp/spdr/etf-holdings
  -c, --convert-xls : Convert XLS documents to CSV for handling. This requires libreoffice to be installed
  -d <date>, --folder-date <date> : SPDR ETF Holdings folder date. Defaults to today
  -n <name>, --db-name <name> : Database name. Defaults to 'local'
  -p <password>, --db-pass <password> : Database password
  -u <user>, --db-user <user> : Database user name. Defaults to 'user'
  --help, -h : Show this help
  -- : Do not treat any remaining argument as a switch (at this level)
 Multiple single-letter switches can be combined after one `-`. For
  example: `-h-` is the same as `-h --`
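
As an illustration of how the options listed above fit together, here is a minimal wrapper sketch. It is not part of the project. The function name load_holdings and the environment variables SPDR_DB_PASS, SPDR_DB_USER, SPDR_DB_NAME and RACKET are hypothetical, and the YYYY-MM-DD format of the value passed to -d is an assumption, since the help text does not state the expected format.

```shell
#!/bin/sh
# Hypothetical wrapper around the documented options of transform-load-csv.rkt.
# The environment variable names below are assumptions made for this sketch.
load_holdings() {
    folder_date=$1    # passed to -d; the YYYY-MM-DD format is an assumption
    # Abort with a message if no password was supplied.
    : "${SPDR_DB_PASS:?set SPDR_DB_PASS to the database password}"
    # RACKET lets the caller point at a specific racket binary.
    ${RACKET:-racket} transform-load-csv.rkt \
        -c \
        -d "$folder_date" \
        -n "${SPDR_DB_NAME:-local}" \
        -u "${SPDR_DB_USER:-user}" \
        -p "$SPDR_DB_PASS"
}
```

With SPDR_DB_PASS exported, load_holdings 2019-11-02 would run the load for that folder date using the documented defaults for the database name and user. Reading the password from the environment keeps it out of the shell history, although it is still passed to the program with -p and is therefore visible in the process list while the program runs.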

The provided schema.sql file shows the expected schema within the target PostgreSQL instance. This process assumes you can write to a /var/tmp/spdr folder. This process also assumes you have loaded your database with NASDAQ symbol file information. This data is provided by the nasdaq-symbols project.
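
The first two assumptions above can be checked before a run. The following sketch is not part of the project: the function names check_work_folder and load_schema are hypothetical, and it assumes that the psql client is installed and that schema.sql is in the current directory. Loading the NASDAQ symbol data is not covered here.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; the function names are illustrative only.

# Succeeds if the working folder exists (creating it if needed) and is writable.
check_work_folder() {
    folder=${1:-/var/tmp/spdr}
    mkdir -p "$folder" 2>/dev/null && [ -w "$folder" ]
}

# Loads schema.sql into the target database with psql.
# The defaults mirror the documented -u and -n defaults of the
# transform-load programs ('user' and 'local').
load_schema() {
    psql -U "${1:-user}" -d "${2:-local}" -f schema.sql
}
```

For example, check_work_folder || echo "cannot write to /var/tmp/spdr" reports a permissions problem early, and load_schema user local applies the schema once before the first load.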

Dependencies

It is recommended that you start with the standard Racket distribution. With that, you will need to install the following packages:

$ raco pkg install --skip-installed gregor http-easy tasks threading

Format and URL updates

On 2020-01-01, the URL for SPDR ETF documents changed; extract.2020-01-01.rkt uses this new location.

On 2019-11-02, columns were added to the SPDR ETF documents; transform-load.2019-11-02.rkt can process these new columns.
