openva / crump

Licence: MIT license
A parser for the Virginia State Corporation Commission's business registration records.

Crump

A parser for the Virginia State Corporation Commission's business entity records, which are provided as a single, enormous fixed-width file. Named for Beverley T. Crump, the first member of the State Corporation Commission.
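As a rough illustration of what parsing a fixed-width file involves, the sketch below slices one line into named fields. The field names, offsets, and widths in `LAYOUT` are hypothetical and are not the SCC's actual record layout or Crump's own code; the sample line is invented.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (field name, start offset, width). The real SCC
# file has its own field map; these values are for illustration only.
LAYOUT: List[Tuple[str, int, int]] = [
    ("id", 0, 7),
    ("name", 7, 20),
    ("status", 27, 2),
    ("zip", 29, 9),
]


def parse_line(line: str, layout: List[Tuple[str, int, int]] = LAYOUT) -> Dict[str, str]:
    """Slice one fixed-width line into a dict of whitespace-stripped values."""
    return {name: line[start:start + width].strip() for name, start, width in layout}


# Invented sample record, padded to match the hypothetical layout above.
sample = "0123456" + "ACME WIDGETS LLC".ljust(20) + "00" + "232190000"
record = parse_line(sample)
```

Because every field sits at a fixed offset, a very large file can be processed one line at a time without loading it all into memory.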

Crump retrieves the current SCC records (updated weekly) and turns them into CSV and JSON. Optionally, it can also improve the quality of the data (formatting dates and ZIP codes, replacing internal status codes with human-readable translations, etc.), atomize the data into millions of individual JSON files, or create Elasticsearch-compatible bulk API data.
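The cleanup and bulk-export steps described above might look something like the following sketch. It is not Crump's implementation: the raw date format (YYYYMMDD), the status-code table, the record keys, and the index name `business` are all assumptions made for illustration. The two-line action/document shape follows the Elasticsearch bulk API's newline-delimited JSON format as used by recent versions.

```python
import json
from datetime import datetime

# Hypothetical status-code table; the real SCC codes are not given here.
STATUS_LABELS = {"00": "Active", "01": "Terminated"}


def clean_date(raw: str) -> str:
    """Turn an assumed YYYYMMDD string into ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(raw, "%Y%m%d").date().isoformat()


def clean_zip(raw: str) -> str:
    """Format nine digits as ZIP+4, dropping an all-zero +4 suffix."""
    digits = raw.strip()
    if len(digits) == 9 and digits[5:] != "0000":
        return f"{digits[:5]}-{digits[5:]}"
    return digits[:5]


def transform(record: dict) -> dict:
    """Return a copy of the record with date, ZIP, and status cleaned up."""
    out = dict(record)
    out["date"] = clean_date(record["date"])
    out["zip"] = clean_zip(record["zip"])
    # Unknown codes are passed through unchanged rather than guessed at.
    out["status"] = STATUS_LABELS.get(record["status"], record["status"])
    return out


def to_bulk_lines(record: dict, index: str = "business") -> str:
    """Build the action line and document line for one bulk API entry."""
    action = {"index": {"_index": index, "_id": record["id"]}}
    return json.dumps(action) + "\n" + json.dumps(record) + "\n"
```

Writing the output of `to_bulk_lines` for every record to one file yields a payload that can be posted to an Elasticsearch `_bulk` endpoint; writing each record to its own file instead corresponds to the "atomized" output.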

The most recent copy of the raw SCC data can be found at https://s3.amazonaws.com/virginia-business/current.zip.
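For anyone who wants to fetch the archive without Crump, a minimal standard-library sketch follows. The function names are hypothetical, and reading the whole archive into memory is a simplification that may not suit a file this large; streaming to disk would be a safer choice in practice.

```python
import io
import urllib.request
import zipfile

DATA_URL = "https://s3.amazonaws.com/virginia-business/current.zip"


def extract_archive(zip_bytes: bytes, dest: str = ".") -> list:
    """Unpack a ZIP archive held in memory and return its member names."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(dest)
        return archive.namelist()


def download_and_extract(url: str = DATA_URL, dest: str = ".") -> list:
    """Fetch the archive over HTTPS and unpack it into dest."""
    with urllib.request.urlopen(url) as response:
        return extract_archive(response.read(), dest)
```

Calling `download_and_extract()` with no arguments would place the extracted file or files in the current directory.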

Usage

usage: crump [-h] [-a] [-i file.txt] [-o output_dir] [-t] [-d] [-e] [-m]

optional arguments:
  -h, --help            show this help message and exit
  -a, --atomize         generate millions of per-record JSON files
  -i file.txt, --input file.txt
                        raw SCC data (default: cisbemon.txt)
  -o output_dir, --output output_dir
                        directory for JSON and CSV
  -t, --transform       format properly date, ZIP, etc. fields
  -d, --download        download the data file, if missing
  -e, --elasticsearch   create Elasticsearch bulk API data
  -m, --map             generate Elasticsearch index map

For most purposes, ./crump -td is probably the best way to invoke Crump. This downloads the current data file if it is missing and transforms the data so that it adheres to basic data quality norms.

License

Released under the MIT License.
