twhughes / Symbolic-Regression

License: other
Predicting equations from raw data with deep learning

Accelerating Symbolic Regression with Deep Learning

by Tyler Hughes, Siddharth Buddhiraju, and Rituraj

Introduction

The goal of symbolic regression is to generate a function that describes a given set of datapoints. This function can, in general, include undetermined constants that may be fit later. Typical approaches to symbolic regression involve global optimization techniques based on genetic algorithms [1,2,3], which search a subset of the space of possible equations for the best fit. This approach is computationally expensive and does not exploit the features or structure inherent in the original data.

In this work, we approach symbolic regression as a modified machine translation problem, where our goal is to translate the original dataset into an equation using a mixture of machine learning techniques. We first train a neural network to extract features from the dataset by performing a fit and stripping out the network's learned parameters. We then train a recurrent neural network (using LSTMs) to decode this feature vector into an equation. The resulting equation can be post-processed to fit constants and to trade off accuracy against simplicity.

This work presents a fresh approach to the problem of symbolic regression and may offer gains in predictive power, reliability, and speed over previous solutions.

Paper

This work was performed as the final project for Stanford CS 221, Artificial Intelligence: Principles and Techniques. The final writeup is linked here.

Code

Generating the Dataset

The data is generated by running the generate_examples.py script. One must specify the maximum allowable equation tree depth by setting the tree_depth variable. For each training example, the script generates a random equation up to this maximum depth, sets its constants randomly, and then generates a set of (x,y) pairs. A neural network in NN.py then fits this dataset and returns a feature vector for the training example. All of this data is stored in the ./data/ directory and is separated by maximum allowed tree depth. 1500 examples each of depths 1, 2, and 3 have been pre-generated and are in ./data/, so for purposes of training this section can be skipped.
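As a rough illustration of the per-example loop described above, the sketch below builds a random equation tree, assigns constants, and evaluates it on a grid of x values. The operator set, tree encoding, and helper names here are assumptions for illustration, not the actual generate_examples.py implementation.

```python
import random
import numpy as np

# A small operator set: name -> (arity, function). The real script may
# support a different set of primitives.
OPS = {
    "add": (2, lambda a, b: a + b),
    "mul": (2, lambda a, b: a * b),
    "sin": (1, np.sin),
}

def random_tree(depth):
    """Build a random expression tree up to the given depth."""
    if depth == 0 or random.random() < 0.3:
        # Leaf: either the variable x or a randomly set constant.
        return ("const", random.uniform(-2.0, 2.0)) if random.random() < 0.5 else ("x",)
    name = random.choice(sorted(OPS))
    arity, _ = OPS[name]
    return (name,) + tuple(random_tree(depth - 1) for _ in range(arity))

def evaluate(tree, x):
    """Evaluate an expression tree elementwise over an array of x values."""
    if tree[0] == "x":
        return x
    if tree[0] == "const":
        return np.full_like(x, tree[1])
    _, fn = OPS[tree[0]]
    return fn(*(evaluate(child, x) for child in tree[1:]))

tree = random_tree(3)                 # random equation up to depth 3
x = np.linspace(-1.0, 1.0, 50)
y = evaluate(tree, x)                 # the (x, y) pairs for one example
```

In the actual pipeline, these (x, y) pairs would then be passed to the network in NN.py to produce the feature vector for this example.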

Decoding and Equation Prediction

The files LSTM_basic.py, LSTM_sequence.py, and LSTM_tree.py are each separate implementations of the LSTM network used to decode the feature vector into equations.

LSTM_basic.py allows for user-defined equations and feature vectors, and was used for debugging and investigating simple cases.

LSTM_sequence.py trains a linear LSTM sequence model on the dataset and returns equations containing special tokens for parentheses. For example, sin(x) would be represented as [ sin , ( , x , ) ].
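One simple way to produce this parenthesis-aware token sequence from an equation string is a regular-expression tokenizer, sketched below. This is illustrative only and is not the actual tokenization used in LSTM_sequence.py.

```python
import re

def tokenize(expr):
    """Split an equation string into tokens, keeping '(' and ')' as
    standalone elements so the model can emit them explicitly."""
    return re.findall(r"[A-Za-z_]\w*|\d+\.?\d*|[()+\-*/^]", expr)

print(tokenize("sin(x)"))       # ['sin', '(', 'x', ')']
print(tokenize("x + 2*cos(x)")) # ['x', '+', '2', '*', 'cos', '(', 'x', ')']
```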

LSTM_tree.py trains a tree-based LSTM model that is intended to mirror the natural representation of equations as trees. This script trains a full binary tree of LSTMs on the original dataset. At prediction time, it returns a full binary equation tree containing special characters, which is then pruned to leave the predicted equation.
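The pruning step can be pictured as follows: the full binary tree fills unused slots with a special token, and pruning drops those subtrees to recover the equation tree. The node encoding and token name below are assumptions for illustration, not LSTM_tree.py's actual representation.

```python
NULL = "<null>"   # assumed special token marking an unused tree slot

def prune(node):
    """node is (token, left, right) or None; drop subtrees rooted at <null>."""
    if node is None or node[0] == NULL:
        return None
    token, left, right = node
    return (token, prune(left), prune(right))

# A predicted full binary tree for sin(x): the unary sin only uses its
# left child, so the right slot holds the <null> token.
full = ("sin", ("x", None, None), (NULL, None, None))
print(prune(full))   # ('sin', ('x', None, None), None)
```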

All three scripts report training and prediction statistics to the user.

Curve Fitting

After generating the equations, we provide fitter.py for fitting constants in the predicted equations to the original (x,y) data points. The script contains an example fit that runs when the script is executed. Its functions can also be imported for post-processing.
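For a concrete picture of this post-processing step, the sketch below fits the constants of a predicted equation skeleton to data with scipy.optimize.curve_fit. The skeleton c0*sin(x) + c1*x is an illustrative example, not a specific output of fitter.py.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, c0, c1):
    """A predicted equation skeleton with undetermined constants c0, c1."""
    return c0 * np.sin(x) + c1 * x

# Synthetic "original" data generated from known constants.
x = np.linspace(0.0, 2.0 * np.pi, 100)
y = 1.5 * np.sin(x) + 0.5 * x

(c0, c1), _ = curve_fit(model, x, y)
print(c0, c1)   # c0 ≈ 1.5, c1 ≈ 0.5
```

The fitted constants can then be plugged back into the equation, and the residual error used to weigh accuracy against simplicity.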

Dependencies

References

[1] Josh Bongard and Hod Lipson. Automated reverse engineering of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 104(24):9943–9948, 2007.

[2] John Duffy and Jim Engle-Warnick. Using symbolic regression to infer strategies from experimental data. In Evolutionary computation in Economics and Finance, pages 61–82. Springer, 2002.

[3] Wouter Minnebo, Sean Stijven, and Katya Vladislavleva. Empowering knowledge computing with variable selection, 2011.
