minus34 / census-loader

Licence: Apache-2.0
A quick way to get started with ABS Census 2016 data

Programming Languages

python, javascript, CSS, shell, PLpgSQL, HTML

Projects that are alternatives of or similar to census-loader

rafagas
Daily geospatial links curated by Raf Roset
Stars: ✭ 17 (-46.87%)
Mutual labels:  cartography, geospatial, geography
readabs
Download and tidy time series data from the Australian Bureau of Statistics in R
Stars: ✭ 73 (+128.13%)
Mutual labels:  australia, abs
python-censusbatchgeocoder
A simple Python wrapper for U.S. Census Geocoding Services API batch service
Stars: ✭ 40 (+25%)
Mutual labels:  geospatial, census
python-for-gis-progression-path
Progression path for a GIS analyst who wants to become proficient in using Python for GIS: from apprentice to guru
Stars: ✭ 98 (+206.25%)
Mutual labels:  geospatial, geography
turf-go
A Go language port of Turf.js
Stars: ✭ 41 (+28.13%)
Mutual labels:  geospatial, geography
Geoswift
The Swift Geometry Engine.
Stars: ✭ 1,267 (+3859.38%)
Mutual labels:  cartography, geospatial
census-100-people
Census 2016: This is Australia as 100 people
Stars: ✭ 13 (-59.37%)
Mutual labels:  australia, census
Osmnx
OSMnx: Python for street networks. Retrieve, model, analyze, and visualize street networks and other spatial data from OpenStreetMap.
Stars: ✭ 3,357 (+10390.63%)
Mutual labels:  geospatial, geography
Geography for hackers
Geography for Hackers - Teaching all how to hack geography, use GIS, and think spatially
Stars: ✭ 25 (-21.87%)
Mutual labels:  cartography, geography
Editor
An open source visual editor for the 'Mapbox Style Specification'
Stars: ✭ 1,167 (+3546.88%)
Mutual labels:  cartography, geospatial
Contextily
Context geo-tiles in Python
Stars: ✭ 254 (+693.75%)
Mutual labels:  cartography, geography
awesome-geospatial-data-download-sites
This is the repo for open source geospatial data download sites.
Stars: ✭ 19 (-40.62%)
Mutual labels:  geospatial
geos-cli
A native geometry command line library using libgeos.
Stars: ✭ 20 (-37.5%)
Mutual labels:  geospatial
census-map-downloader
Easily download U.S. census maps
Stars: ✭ 31 (-3.12%)
Mutual labels:  census
Land-Cover-Classification-using-Sentinel-2-Dataset
Application of deep learning to satellite imagery from the Sentinel-2 satellite, which has been orbiting the Earth since June 2015. Image patches can be trained and classified using transfer learning techniques.
Stars: ✭ 36 (+12.5%)
Mutual labels:  geospatial
pyGISS
📡 A lightweight GIS Software in less than 100 lines of code
Stars: ✭ 114 (+256.25%)
Mutual labels:  geospatial
deegree3
Official deegree repository providing geospatial core libraries, data access and advanced OGC web service implementations
Stars: ✭ 118 (+268.75%)
Mutual labels:  geospatial
NetCDF.jl
NetCDF support for the Julia programming language
Stars: ✭ 102 (+218.75%)
Mutual labels:  geospatial
SpatialDataScience
Introduction to Data Science with R
Stars: ✭ 29 (-9.37%)
Mutual labels:  geography
geovoronoi
a package to create and plot Voronoi regions within geographic boundaries
Stars: ✭ 106 (+231.25%)
Mutual labels:  geospatial

census-loader

A quick way to get started with Australian Bureau of Statistics (ABS) Census 2011 or 2016 data.

census-loader is 2 things:

  1. A quick way to load the entire census into Postgres
  2. A map server for quickly visualising census data and trends

DEMOS (CURRENTLY OFFLINE)

(Demo screenshot: melbourne_rent.png)

There are 3 options for loading the data:

  1. Run the load-census Python script and build the database schemas in a single step
  2. Build the database in a Docker environment
  3. Download the Postgres dump files and restore them in your database. Note: Census 2016 data and ASGS boundaries only

Option 1 - Run load-census.py

Running the Python script takes 15-30 minutes on a Postgres server configured for performance.

Benchmarks are:

  • 3-year-old, 32-core Windows server with SSDs = 15 mins
  • MacBook Pro = 25 mins

Performance

To get a good load time, you'll need to configure your Postgres server for performance. There's a good guide here; note it's a few years old and some of the memory parameters can be beefed up if you have the RAM.
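
As a rough sketch of the sort of settings such a guide covers, bulk-load friendly values can be applied with ALTER SYSTEM. The figures below are illustrative assumptions only and should be sized to your own RAM; shared_buffers needs a server restart, the others a reload:

psql -d your_database -c "ALTER SYSTEM SET shared_buffers = '2GB';"
psql -d your_database -c "ALTER SYSTEM SET maintenance_work_mem = '1GB';"
psql -d your_database -c "ALTER SYSTEM SET work_mem = '256MB';"
psql -d your_database -c "ALTER SYSTEM SET checkpoint_timeout = '30min';"
psql -d your_database -c "ALTER SYSTEM SET max_wal_size = '4GB';"
# restart Postgres after changing shared_buffers so it takes effect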

Pre-requisites

  • Postgres 9.6+ with PostGIS 2.3+ (tested on 9.6 on macOS Sierra and Windows 10)
  • Add the Postgres bin directory to your system PATH
  • Python 3.x with the psycopg2, xlrd and pandas packages installed (see the example below)
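
For example, on a machine that already has Postgres and Python 3 installed, the dependencies can be set up and checked along these lines (exact commands vary by platform; psycopg2-binary can be an easier install than psycopg2 on some systems):

pip3 install psycopg2 xlrd pandas
psql --version    # confirms the Postgres bin directory is on your PATH
python3 -c "import psycopg2, xlrd, pandas; print('dependencies OK')"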

Process

  1. Download ABS Census DataPacks
  2. Download ABS 2016 ASGS boundaries or ABS 2011 ASGS boundaries (requires a free login). IMPORTANT: download the ESRI Shapefile versions
  3. (optional) Download the 2016 Indigenous and Non-ABS boundaries as well
  4. Unzip the Census CSV files to a directory on your Postgres server
  5. Alter security on the directory to grant Postgres read access
  6. Unzip the ASGS boundaries to a local directory
  7. Create the target database, if required (see the example after this list)
  8. Check the optional and required arguments by running load-census.py with the -h argument (see command line examples below)
  9. Run the script, come back in 10-15 minutes and enjoy!
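
A minimal run-through of steps 7-9 might look like the following sketch; the database name is a placeholder, the paths match the example further below, and creating the PostGIS extension up front is an assumption rather than a documented requirement:

createdb census
psql -d census -c "CREATE EXTENSION IF NOT EXISTS postgis;"
python load-census.py -h
python load-census.py --pgdb=census --census-data-path="C:\temp\census_2016_data" --census-bdys-path="C:\temp\census_2016_boundaries"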

Command Line Options

The behaviour of census-loader can be controlled by specifying various command line options to the script. Supported arguments are:

Required Arguments

  • --census-data-path specifies the path to the extracted Census metadata and data tables (e.g. *.xlsx and *.csv files). This directory must be accessible by the Postgres server, and the corresponding local path for the server to this directory may need to be set via the --local-server-dir argument (as illustrated below)
  • --census-bdys-path specifies the path to the extracted ASGS boundary files. Unlike --census-data-path, this path does not have to be accessible to the remote Postgres server.
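
To illustrate the remote-server case: if the CSV directory is a share exported from the Postgres server but mounted at a different path on the machine running the script, both paths can be supplied. The --local-server-dir spelling is assumed from the description above, and all paths are placeholders:

python load-census.py --census-data-path="\\dbserver\share\census_2016_data" --local-server-dir="D:\share\census_2016_data" --census-bdys-path="C:\temp\census_2016_boundaries"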

Postgres Parameters

  • --pghost the host name for the Postgres server. This defaults to the PGHOST environment variable if set, otherwise defaults to localhost.
  • --pgport the port number for the Postgres server. This defaults to the PGPORT environment variable if set, otherwise 5432.
  • --pgdb the database name for the Postgres server. This defaults to the PGDATABASE environment variable if set, otherwise psma_201602.
  • --pguser the username for accessing the Postgres server. This defaults to the PGUSER environment variable if set, otherwise postgres.
  • --pgpassword the password for accessing the Postgres server. This defaults to the PGPASSWORD environment variable if set, otherwise password.
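
Because the script falls back to the standard Postgres environment variables, the connection can also be configured once in the shell instead of on every run; a sketch in bash syntax, with placeholder values:

export PGHOST=localhost
export PGPORT=5432
export PGDATABASE=census
export PGUSER=postgres
export PGPASSWORD=password
python load-census.py --census-data-path="/data/census_2016_data" --census-bdys-path="/data/census_2016_boundaries"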

Optional Arguments

  • --census-year the year of the ABS Census data to load. Valid values are 2011 and 2016. Defaults to 2016.
  • --data-schema schema name to store Census data tables in. Defaults to census_2016_data. You will need to change this argument if you set --census-year=2011
  • --boundary-schema schema name to store Census boundary tables in. Defaults to census_2016_bdys. You will need to change this argument if you set --census-year=2011
  • --web-schema schema name to store the display-optimised web mapping tables in. Defaults to census_2016_web. You will need to change this argument if you set --census-year=2011
  • --max-processes specifies the maximum number of parallel processes to use for the data load. Set this to the number of cores on the Postgres server minus 2, but limit to 12 if 16+ cores - there is minimal benefit beyond 12. Defaults to 3.

Example Command Line Arguments

python load-census.py --census-data-path="C:\temp\census_2016_data" --census-bdys-path="C:\temp\census_2016_boundaries"

Loads the 2016 Census data using a maximum of 3 parallel processes into the default schemas. Census data archives have been extracted to the folder C:\temp\census_2016_data, and ASGS boundaries have been extracted to the C:\temp\census_2016_boundaries folder.

python load-census.py --census-year=2011 --max-processes=6 --data-schema=census_2011_data --boundary-schema=census_2011_bdys --census-data-path="C:\temp\census_2011_data" --census-bdys-path="C:\temp\census_2011_boundaries"

Loads the 2011 Census data using a maximum of 6 parallel processes into renamed schemas. Census data archives have been extracted to the folder C:\temp\census_2011_data, and ASGS boundaries have been extracted to the C:\temp\census_2011_boundaries folder.
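
Once a load has finished, a quick sanity check is to confirm that the three schemas exist and contain tables. This query only assumes the default schema names and a placeholder database name:

psql -d census -c "SELECT table_schema, count(*) AS tables FROM information_schema.tables WHERE table_schema IN ('census_2016_data', 'census_2016_bdys', 'census_2016_web') GROUP BY table_schema;"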

Attribution

When using the resulting data from this process, you will need to adhere to the ABS data attribution requirements for the Census and ASGS data, as per the Creative Commons (Attribution) licence.

WARNING:

  • The scripts will DROP ALL TABLES and recreate them using CASCADE, meaning you'll LOSE YOUR VIEWS if you have created any! If you want to keep the existing data, you'll need to change the schema names in the script or use a different database

IMPORTANT:

  • Whilst you can choose which 3 schemas to load the data into, I haven't QA'd the permutations. Stick with the defaults if you have limited Postgres experience

Option 2 - Build the database in a docker environment

Create a Docker container with Census data and ASGS boundaries ready to go, so it can be deployed anywhere.

Process

  1. Download ABS Census DataPacks
  2. Download ABS 2016 ASGS boundaries or ABS 2011 ASGS boundaries (requires a free login). IMPORTANT: download the ESRI Shapefile versions
  3. (optional) Download the 2016 Indigenous and Non-ABS boundaries as well
  4. Unzip Census data and ASGS boundaries in the data/ directory of this repository
  5. Run docker-compose: docker-compose up. The database will be built.
  6. Use the constructed database as you wish.

If you want only the db running, run docker-compose up db. If you want to view the webapp, navigate to localhost, or to the docker machine IP (if you're doing Docker the old way!).
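
For example, to bring up just the database container and connect to it from the host; the published port and credentials depend on the docker-compose.yml in this repository, so treat these values as placeholders:

docker-compose up -d db
psql -h localhost -p 5432 -U postgres -d census    # adjust to match the compose file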

Option 3 - Load PG_DUMP Files

Download Postgres dump files and restore them in your database.

Should take 15-30 minutes.

Pre-requisites

  • As per Option 1: Postgres 9.6+ with PostGIS 2.3+, and the Postgres bin directory (for pg_restore) added to your system PATH

Process

  1. Download census_2016_data.dmp (~0.6 GB)
  2. Download census_2016_bdys.dmp (~1.1 GB)
  3. Download census_2016_web.dmp (~0.8 GB)
  4. Edit the restore-census-schemas.bat or .sh script in the supporting-files folder for your database parameters and for the location of pg_restore (a rough equivalent is sketched after this list)
  5. Run the script, come back in 15-30 minutes and enjoy!
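
The restore script is essentially a wrapper around pg_restore; a hand-rolled equivalent, assuming the default schema names, placeholder connection details and a database that already has PostGIS installed, would look roughly like this:

createdb census
psql -d census -c "CREATE EXTENSION IF NOT EXISTS postgis;"
pg_restore -h localhost -p 5432 -U postgres -d census census_2016_data.dmp
pg_restore -h localhost -p 5432 -U postgres -d census census_2016_bdys.dmp
pg_restore -h localhost -p 5432 -U postgres -d census census_2016_web.dmp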

Data License

Source: Australian Bureau of Statistics

DATA CUSTOMISATION

  • Display-optimised tables are created by this process. They allow for web mapping from the state level down to the SA1 level, and are created in the census web schema.