rk2900 / Drsa

Deep Recurrent Survival Analysis, an auto-regressive deep model for time-to-event data analysis with censorship handling. An implementation of our AAAI 2019 paper and a benchmark for several (Python) implemented survival analysis methods.

Deep Recurrent Survival Analysis (DRSA)

A TensorFlow implementation of the DRSA model. This is the experiment code for our AAAI 2019 paper "Deep Recurrent Survival Analysis".

If you have any problems, please feel free to contact the authors Kan Ren, Jiarui Qin and Lei Zheng.

Abstract

Survival analysis is a hotspot in statistical research for modeling time-to-event information with data censorship handling, which has been widely used in many applications such as clinical research, information system and other fields with survivorship bias. Many works have been proposed for survival analysis ranging from traditional statistic methods to machine learning models. However, the existing methodologies either utilize counting-based statistics on the segmented data, or have a pre-assumption on the event probability distribution w.r.t. time. Moreover, few works consider sequential patterns within the feature space. In this paper, we propose a Deep Recurrent Survival Analysis model which combines deep learning for conditional probability prediction at fine-grained level of the data, and survival analysis for tackling the censorship. By capturing the time dependency through modeling the conditional probability of the event for each sample, our method predicts the likelihood of the true event occurrence and estimates the survival rate over time, i.e., the probability of the non-occurrence of the event, for the censored data. Meanwhile, without assuming any specific form of the event probability distribution, our model shows great advantages over the previous works on fitting various sophisticated data distributions. In the experiments on the three real-world tasks from different fields, our model significantly outperforms the state-of-the-art solutions under various metrics.
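The relation between per-step conditional event probabilities and the survival rate described in the abstract can be illustrated with a small sketch. It assumes discrete time intervals and uses made-up hazard values; the function names are hypothetical and this is not the repository's TensorFlow code (it is written for Python 3, independently of the project).

```python
from typing import List, Sequence


def survival_curve(hazards: Sequence[float]) -> List[float]:
    """Survival rate after each interval: S_l = prod_{k<=l} (1 - h_k).

    hazards[k] is the conditional probability that the event occurs in
    interval k, given that it has not occurred before interval k.
    """
    curve, alive = [], 1.0
    for h in hazards:
        alive *= 1.0 - h  # probability of still not having seen the event
        curve.append(alive)
    return curve


def event_probabilities(hazards: Sequence[float]) -> List[float]:
    """Probability that the event falls exactly in interval l:
    p_l = h_l * prod_{k<l} (1 - h_k).
    """
    probs, alive = [], 1.0
    for h in hazards:
        probs.append(h * alive)
        alive *= 1.0 - h
    return probs


if __name__ == "__main__":
    example_hazards = [0.1, 0.2, 0.5]  # illustrative values only
    print(survival_curve(example_hazards))
    print(event_probabilities(example_hazards))
```

Under these assumptions, the event probabilities over all intervals plus the final survival rate sum to one, which is why no parametric form of the event time distribution has to be assumed.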

Model Description

Our model is the DRSA model. The baseline models are Kaplan-Meier, Lasso-Cox, Gamma, MTLSA, STM, DeepSurv, DeepHit and DRN. Among the baseline implementations, we forked the code of STM and MTLSA and made some minor modifications to the two projects to fit our experiments. The modified code is available at MTLSA @ ba353f8 and STM @ df57e70. Many thanks to the authors of STM and MTLSA. The other baselines' implementations are in the python directory.

Data Preparation

We have uploaded a tiny data sample for training and evaluation.

The full dataset for this project can be downloaded directly from this link: https://goo.gl/nUFND4. (The full dataset has also been uploaded to this repo with Git LFS as three split compressed ZIP files.) It contains three large-scale datasets from three real-world tasks and is the first dataset of this scale for experiment reproduction in survival analysis.

After downloading, please replace the sample data in the data/ folder with the full data files.

Dataset	MD5 Code	Size
drsa.zip	b63c53559f58e6afa62c121b0dd1997d	2.6 GB
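To check a download against the MD5 code in the table above, a minimal sketch using only the Python standard library could look like the following. The function names and the local path drsa.zip are assumptions for illustration; the sketch is written for Python 3 and is not part of the repository.

```python
import hashlib

# MD5 code copied from the table above.
EXPECTED_MD5 = "b63c53559f58e6afa62c121b0dd1997d"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks
    so that a multi-gigabyte archive does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare the file's digest with the expected MD5 code."""
    return md5_of_file(path) == expected.lower()
```

For example, is_intact("drsa.zip") should return True for an uncorrupted copy, assuming the archive was saved under that name in the current directory.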

Data specification

We have three datasets and each of them contains .yzbx.txt, featindex.txt and .log.txt. We created the first data file .log.txt from the raw data of the original data source (please refer to our paper). Then we performed feature engineering according to the created feature dictionary featindex.txt. The corresponding feature-engineered data are in .yzbx.txt.

If you need to reproduce the experiments, you may run over .yzbx.txt. If you want to dive deeper and explain the observations of the experiments, you will need to look into the other files such as .log.txt and featindex.txt.

In the .yzbx.txt file, each line is a sample containing the "yzbx" data, also written "yztx" (here we use t and b interchangeably), with the fields separated by spaces. Here z is the true event time, t is the observation time and x is the list of features (multi-hot encoded as feat_id:1). In the experiments, we only use the z, t and x data. Note that for the uncensored data z <= t, while for the censored data z > t.

We simulate observation experiments that span the whole timeline of each dataset. The end of each observation (in the right-censored situation) is then recorded as t in the final yztx data, along with the true event time z. The true event time z is originally logged in the raw data file. The raw data files (without any feature engineering) come from the related works described in the experiments section of our paper. The download links are listed below:
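The line format and censoring rule above can be illustrated with a small parsing sketch. It assumes that the fields appear in the order y, z, t (b), followed by the feat_id:1 tokens, and that z and t are numeric; the names Sample and parse_yzbx_line are hypothetical and the sample lines are made up. The sketch is written for Python 3 and is not the repository's loader.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    y: str               # first field, kept as raw text (unused in the experiments)
    z: float             # true event time
    t: float             # observation time (called b in the file name)
    features: List[int]  # ids of the active features
    censored: bool       # True when the event was not observed (z > t)


def parse_yzbx_line(line: str) -> Sample:
    """Parse one space-separated sample line of the assumed form
    'y z t feat_id:1 feat_id:1 ...'."""
    fields = line.split()
    y, z, t = fields[0], float(fields[1]), float(fields[2])
    # Each feature token looks like 'feat_id:1'; keep only the id.
    features = [int(token.split(":")[0]) for token in fields[3:]]
    return Sample(y, z, t, features, censored=z > t)


if __name__ == "__main__":
    print(parse_yzbx_line("0 5 8 12:1 40:1"))  # z <= t, so uncensored
    print(parse_yzbx_line("0 9 4 3:1"))        # z > t, so censored
```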

clinic: http://biostat.mc.vanderbilt.edu/wiki/Main/DataSets (this should be support2csv.zip, but the raw CLINIC dataset we used is somewhat different, so we have uploaded the raw dataset to this repository.)
music: https://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-1K.html http://ocelma.net/MusicRecommendationDataset/lastfm-1K.html
bidding: https://github.com/rk2900/make-ipinyou-data

Installation and Reproduction

TensorFlow (>=1.3) and the other dependent packages (e.g., numpy, sklearn and matplotlib) should be installed before running the code. The Python version we used is 2.7.6.

After package installation, you can simply run the code in the python directory with the tiny demo dataset (sampled from the BIDDING dataset). The outputs of the code are in the python/output directory.

The commands to run are listed below.

python km.py             # for Kaplan-Meier
python gamma_model.py    # for Gamma
python cox.py            # for Lasso-Cox and DeepSurv
python deephit.py        # for DeepHit
python DRSA.py 0.0001    # for DRSA

We have set default hyperparameters in the model implementation, so the parameter arguments are optional when running the code.

The results will be printed on the screen in the format: Subset, Train/Test, Step, Cross Entropy, AUC (C-index), ANLP, Total Loss, batch size, hidden state size, learning rate, ANLP learning rate, alpha, beta.

Citation

You are more than welcome to cite our paper:

@inproceedings{ren2019deep,
  title={Deep recurrent survival analysis},
  author={Ren, Kan and Qin, Jiarui and Zheng, Lei and Yang, Zhengyu and Zhang, Weinan and Qiu, Lin and Yu, Yong},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={33},
  number={01},
  pages={4798--4805},
  year={2019}
}
