JustinGOSSES / MannvilleGroup_Strat_Hackathon

License: other
stratigraphic machine-learning - active work moved to Predictatops

Programming Languages

Jupyter Notebook, Lasso, HTML

Projects that are alternatives of or similar to MannvilleGroup Strat Hackathon

pybedforms
python port of the USGS bedforms software tool
Stars: ✭ 30 (+76.47%)
Mutual labels:  geology, stratigraphy
QGeoloGIS
Migrated to: https://gitlab.com/Oslandia/qgis/QGeoloGIS
Stars: ✭ 27 (+58.82%)
Mutual labels:  geology, well-logs
wellioviz
d3.js v5 visualization of well logs
Stars: ✭ 36 (+111.76%)
Mutual labels:  geology, well-logs
btc-bash-ng
math and bitcoin tools in gnu bc and bash
Stars: ✭ 25 (+47.06%)
Mutual labels:  curve
gis-for-geoscientists
Repository for the "GIS for Geoscientists" workshop series. This repo contains data, protocols, outputs, lectures, and resources used in the workshop. Course taught by Nicholas Barber. Available for future booking upon request! Contact me ([email protected]) for a quote.
Stars: ✭ 19 (+11.76%)
Mutual labels:  geology
libgoldilocks
An implementation of Mike Hamburg's Ed448 (Goldilocks) curve - derived from libdecaf. This is a mirror of https://bugs.otr.im/otrv4/libgoldilocks
Stars: ✭ 17 (+0%)
Mutual labels:  curve
dmatrix2np
Convert XGBoost's DMatrix format to np.array
Stars: ✭ 14 (-17.65%)
Mutual labels:  xgboost
moon geology atlas of space
Code, data, and instructions for mapping the geology of the moon
Stars: ✭ 76 (+347.06%)
Mutual labels:  geology
stackgbm
🌳 Stacked Gradient Boosting Machines
Stars: ✭ 24 (+41.18%)
Mutual labels:  xgboost
PRo3D
PRo3D, short for Planetary Robotics 3D Viewer, is an interactive 3D visualization tool allowing planetary scientists to work with high-resolution 3D reconstructions of the Martian surface.
Stars: ✭ 33 (+94.12%)
Mutual labels:  geology
BedMachine
Matlab tools for loading, interpolating, and displaying BedMachine ice sheet topography.
Stars: ✭ 18 (+5.88%)
Mutual labels:  geology
smooth-corners
CSS superellipse masks using the Houdini API
Stars: ✭ 133 (+682.35%)
Mutual labels:  curve
XGBoost-in-Insurance-2017
Data and Code to reproduce results for my talk at Paris: R in Insurance 2017 Conference
Stars: ✭ 16 (-5.88%)
Mutual labels:  xgboost
discrete frechet
Compute the Fréchet distance between two polygonal curves in Euclidean space.
Stars: ✭ 68 (+300%)
Mutual labels:  curve
aws-customer-churn-pipeline
An End to End Customer Churn Prediction solution using AWS services.
Stars: ✭ 30 (+76.47%)
Mutual labels:  xgboost
ml-simulations
Animated Visualizations of Popular Machine Learning Algorithms
Stars: ✭ 33 (+94.12%)
Mutual labels:  xgboost
Machine Learning Code
Explanations of the principles behind 《统计学习方法》 (Statistical Learning Methods) and common machine learning models (GBDT/XGBoost/lightGBM/FM/FFM), with Python and library implementations
Stars: ✭ 169 (+894.12%)
Mutual labels:  xgboost
frechet
Discrete Fréchet distance and the minimum path required for traversing it
Stars: ✭ 14 (-17.65%)
Mutual labels:  curve
sagemaker-xgboost-container
This is the Docker container based on the open source framework XGBoost (https://xgboost.readthedocs.io/en/latest/) that allows customers to use their own XGBoost scripts in SageMaker.
Stars: ✭ 93 (+447.06%)
Mutual labels:  xgboost
gis-snippets
Some code snippets for GIS tasks
Stars: ✭ 45 (+164.71%)
Mutual labels:  geology

Development has moved to the Predictatops repository.

MannvilleGroup_Strat_Hackathon

Project Statement

Predict stratigraphic surfaces by training on human-picked stratigraphic surfaces. We used 2000+ wells with picks from the Mannville Group, including the McMurray Formation, in Alberta, Canada.

Philosophy

Instead of assuming there is a mathematical or pattern basis for stratigraphic surfaces that can be teased out of well logs, focus on creating programmatic features and operations that mimic the comparison-based observations a geologist would have made.
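As one illustration of what such a comparison-based feature might look like, here is a minimal sketch (not code from this repo; the function name, window size, and similarity measure are all illustrative assumptions). It scores every depth in an unpicked well by how similar its surrounding curve window is to the window around a human pick in a nearby well, assuming both curves are resampled to a common depth step:

```python
import numpy as np
import pandas as pd

def neighbor_pick_similarity(curve, neighbor_curve, neighbor_pick_idx, half_window=10):
    """Score every depth in `curve` by how similar its surrounding window is
    to the window around the human-picked top in a neighbor well. Mimics a
    geologist sliding a correlated well alongside an unpicked one."""
    ref = neighbor_curve.to_numpy()[neighbor_pick_idx - half_window:
                                    neighbor_pick_idx + half_window]
    vals = curve.to_numpy()
    scores = np.full(len(vals), np.nan)
    for i in range(half_window, len(vals) - half_window):
        seg = vals[i - half_window:i + half_window]
        # Negative mean absolute difference: higher score = more similar.
        scores[i] = -np.mean(np.abs(seg - ref))
    return pd.Series(scores, index=curve.index)
```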

Project Summary

There have been studies attempting to do similar things for decades. Many of them assume a mathematical pattern to stratigraphic surfaces and either don't train specifically on human-picked tops or do so only lightly. We wanted to take as geologic an approach (as opposed to a mathematical or geophysical approach) as possible. What we managed to get done by the end of the hackathon is a small-scale first pass.

Eventually, we want to get to the point where we've identified a large number of feature types that both have predictive value and can be tied back to geologic insight. A lot of observations happen visually (and therefore not always consciously) when a geologist looks at a well log and correlates it. We want to focus on engineering features that mimic these observations and the multitude of scales at which they occur, as in the sketch below.
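One hypothetical way to capture the multiple scales of observation is rolling-window summaries of a curve at several window sizes. This sketch is illustrative only; the feature names and window sizes are made up, not taken from the repo:

```python
import pandas as pd

def multiscale_features(curve: pd.Series, windows=(5, 10, 25, 50)) -> pd.DataFrame:
    """Summarize a well-log curve at several vertical scales, mimicking the
    different window sizes a geologist scans while correlating."""
    feats = {}
    for w in windows:
        roll = curve.rolling(window=w, center=True, min_periods=1)
        feats[f"mean_{w}"] = roll.mean()
        feats[f"std_{w}"] = roll.std()
        # Contrast between the raw curve and its local trend at this scale.
        feats[f"residual_{w}"] = curve - roll.mean()
    return pd.DataFrame(feats)
```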

In addition to automating correlation of nearby wells based on picks that already exist, which has value in itself, we think this will help geologists have better, more quantitative discussions about the basis of their correlations and why correlations might differ between geologists. Imagine a regional area with two separate teams that have different approaches to picking a top. You could use this to programmatically pick tops in area B the way they are picked in area A, and also the inverse. The differences in pick style then become easier to analyze with less additional work.

Datasets for Hackathon project

Report for Athabasca Oil Sands Data McMurray/Wabiskaw Oil Sands Deposit http://ags.aer.ca/document/OFR/OFR_1994_14.PDF

Electronic data for Athabasca Oil Sands Data McMurray/Wabiskaw Oil Sands Deposit http://ags.aer.ca/publications/SPE_006.html Data is also in the repo folder: SPE_006_originalData

@dalide used the Alberta Geological Survey's UWI conversion tool to find lat/longs for each of the well UWIs. These were then used to find each well's nearest neighbors, as demonstrated in this notebook.
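A nearest-neighbor lookup of that kind can be done with a ball tree on the converted coordinates. The sketch below is an illustration rather than the notebook's exact code, and it assumes a DataFrame with UWI, lat, and lng columns:

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

def nearest_wells(wells: pd.DataFrame, k: int = 4) -> dict:
    """Map each well's UWI to its k nearest neighbors and distances (km),
    using great-circle distance on the lat/longs from the UWI conversion."""
    coords = np.radians(wells[["lat", "lng"]].to_numpy())
    tree = BallTree(coords, metric="haversine")
    # Query k+1 points because each well's closest match is itself.
    dist, idx = tree.query(coords, k=k + 1)
    earth_radius_km = 6371.0
    return {
        uwi: list(zip(wells["UWI"].iloc[idx[i, 1:]], dist[i, 1:] * earth_radius_km))
        for i, uwi in enumerate(wells["UWI"])
    }
```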

Folder Re-Organization

On February 11th, 2018, @JustinGosses reorganized the repository, moving many notebooks out of the top level and into sub-folders, as things were getting too crowded. This may have broken relative file paths in some notebooks; that will be the case for any notebook from the hackathon or from 2017. In most cases the fix is just adding a ../ or ../../ to the front of the path, as in the example below.
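For example (the file name and sub-path here are hypothetical, shown only to illustrate the fix):

```python
import pandas as pd

# Before the reorganization, a top-level notebook might have loaded:
# picks = pd.read_csv("SPE_006_originalData/OilSandsDB/PICKS.TXT", sep="\t")
# From a notebook moved one sub-folder down, prepend ../ :
picks = pd.read_csv("../SPE_006_originalData/OilSandsDB/PICKS.TXT", sep="\t")
```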

Key Jupyter Notebooks finished during Hackathon project

Final Data Prep & Machine Learning for the prediction finished by end of hackathon https://github.com/JustinGOSSES/MannvilleGroup_Strat_Hackathon/blob/master/data_prep_wells_xgb.ipynb

Version of the feature engineering work done during the hackathon (but not included in the hackathon result) https://github.com/JustinGOSSES/MannvilleGroup_Strat_Hackathon/blob/master/Feature_Brainstorm_Justin_vD-Copy1.ipynb

Key Jupyter Notebooks Post Hackathon

Code development has moved to the modular_redo sub-folder. Things were made more modular to better enable short bits of work when time is available. The notebooks are a bit messy but will be cleaned up in the near future. https://github.com/JustinGOSSES/MannvilleGroup_Strat_Hackathon/tree/master/notebooks_2018/modular_redo

Recent updates

The code runs faster, and mean absolute error is down from 90 to 15.03 and now to roughly 7. Key approaches were:

  1. Leverage knowledge from nearby wells.
  2. Instead of distinguishing between 2 classes, pick and not pick, distinguish between 3 classes: (a) pick, (b) not pick but within 3 meters, and (c) not pick and not within 3 meters of a pick.
  3. More features.
  4. A two-step approach to machine learning (sketched in code after this list):

  4-1. The first step is class-based prediction. Classes are groups based on distance from the actual pick: for example, depths at the pick, depths within 0.5 meters, depths within 5 meters above, etc.

  4-2. The second step picks between the depths predicted to be in the classes nearest the pick. We've explored both rule-based scoring and a regression machine-learning process for this.

  4-2-1. The rule-based approach applies simple additive scoring to the class predictions across different window sizes. When two depths share the highest predicted class, we decide between them by summing all the class scores in windows of several sizes above and below each depth. The depth with the highest aggregate score wins and is declared the "predicted depth". We take this route because we assume the true pick will have nearby depths that also look like the top and therefore receive high predicted classes, while a false positive is more likely to be surrounded by low predicted classes.

  4-2-2. We're also trying regression-based machine learning to predict, for each depth, the distance to the actual pick. The depth with the lowest predicted distance to the actual pick is chosen as the "predicted pick". So far this approach hasn't given better results than the simple rule-based aggregate scoring.
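A minimal, self-contained sketch of this two-step flow, using synthetic stand-in data and an illustrative three-class scheme (the real features and class definitions live in the repo's data-prep notebooks):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-depth feature rows and distance-from-pick
# classes (0 = far from pick, 1 = near the pick, 2 = at the pick).
X_train = rng.normal(size=(500, 8))
y_train = rng.integers(0, 3, size=500)

# Step 1: class-based prediction with XGBoost.
clf = xgb.XGBClassifier(n_estimators=50)
clf.fit(X_train, y_train)

def pick_depth(pred_class, depths, windows=(3, 7, 15)):
    """Step 2: rule-based aggregate scoring. Among depths that received the
    highest predicted class, sum the class predictions over several window
    sizes above and below each candidate; the highest aggregate score wins."""
    candidates = np.flatnonzero(pred_class == pred_class.max())
    def score(i):
        return sum(pred_class[max(0, i - w): i + w + 1].sum() for w in windows)
    return depths[max(candidates, key=score)]

depths = np.arange(200.0, 260.0, 0.5)        # one well's depth column
X_well = rng.normal(size=(len(depths), 8))   # that well's feature rows
pred_class = clf.predict(X_well)
print("predicted top at", pick_depth(pred_class, depths), "m")
```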

Distribution of Absolute Error in Test Portion of Dataset for Top McMurray Surface in Meters.

Y-axis is the number of picks in each bin, and X-axis is the distance the predicted pick is off from the human-generated pick. [Image: current_errors_TopMcMr_20181006]
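A plot like that one is a simple histogram of the test-set errors; the values below are synthetic stand-ins, not the repo's actual results:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
abs_error = np.abs(rng.normal(0.0, 7.0, size=300))  # stand-in for real errors (m)

plt.hist(abs_error, bins=40)
plt.xlabel("Distance of predicted pick from human pick (m)")
plt.ylabel("Number of picks")
plt.title("Absolute error, Top McMurray, test wells")
plt.show()
```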

The current algorithm used is XGBoost.

Future Work [also see issues]

  1. Visualize the probability of a pick along each well instead of just returning the max probability prediction in each well.
  2. Generate average aggregate wells in different local areas for wells at different prediction levels. See if there are trends, or if this helps to identify geologically meaningful features that correlate with many combined machine-learning model features.
  3. Explore methods to visualize the weightings of features on an individual-well basis, using techniques similar to those learned in image-based deep learning.
  4. Cluster wells using unsupervised learning and then see if clusters can be created that correlate with supervised prediction results. (Initial trials with UMAP give encouraging results; a minimal sketch follows this list.)
  5. Rework parts of this into a more object-oriented approach.
  6. Use H2O's AutoML library to try to improve on the standard XGBoost approach.
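For item 4, a minimal sketch of the UMAP-then-cluster idea, assuming one feature row per well (everything here, including the cluster count, is illustrative rather than the actual trial code):

```python
import numpy as np
import umap
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X_wells = rng.normal(size=(200, 12))  # stand-in: one feature row per well

# Embed wells into 2-D with UMAP, then cluster the embedding. The cluster
# labels can then be compared against per-well supervised prediction quality.
embedding = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=42).fit_transform(X_wells)
clusters = KMeans(n_clusters=4, n_init=10).fit_predict(embedding)
```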

Eventual Move of this Repository's Contents to a Different Repository

The plan is that once things are winnowed down to a final approach, the resulting code will be moved to the StratPickSupML repository, where it will be cleaned up into one or more modules and demo notebooks, with less clutter from failed but possibly still useful approaches.

Help Wanted

This repo isn't particularly organized, and there hasn't been a lot of time spent (actually, no time spent) making it easy to jump in and help out. That being said, there's no reason you couldn't just jump in and start improving things. The original group works on this at a low level when we have time. There are a few enhancement issues that would be a good place to start.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].