stomioka / Sdtm_mapper

Licence: gpl-3.0
AI SDTM mapping (R for ML, Python, TensorFlow for DL)

sdtm-mapper

Sam Tomioka

Feb 2019

About

sdtm-mapper is a Python package that generates machine-readable CDISC SDTM mapping specifications with help from AI. It can be used for the following tasks.

  1. Generate an empty specification for training data from a user-provided SAS dataset. The empty specification contains the SAS dataset attributes, so you don't need to run Proc Contents in SAS. The SAS datasets may be in your AWS S3 bucket or in a local folder.
  2. Run models to generate mapping specifications.
  3. Train your own mapping algorithms on your data. The models can be trained to generate not only the target variables but also programming pseudocode.
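As a rough illustration of what the "empty specification" in task 1 could look like, here is a minimal sketch that turns dataset attributes (variable name, label, type) into specification rows with blank mapping columns. The column layout, the `empty_spec` and `to_csv` helpers, and the sample attributes are all hypothetical; this is not the format or API that sdtm-mapper itself uses.

```python
import csv
import io

# Hypothetical column layout for illustration only; sdtm-mapper's real
# specification format may differ.
SPEC_COLUMNS = ["dataset", "variable", "label", "type",
                "target_variable", "pseudocode"]


def empty_spec(dataset, attributes):
    """Build specification rows from (name, label, type) tuples.

    The mapping columns are left blank so they can be filled in by hand
    (to create training data) or by a model (to create a mapping).
    """
    return [
        {
            "dataset": dataset,
            "variable": name,
            "label": label,
            "type": var_type,
            "target_variable": "",
            "pseudocode": "",
        }
        for name, label, var_type in attributes
    ]


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=SPEC_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up attributes standing in for what would be read from a SAS dataset.
sample = [("AETERM", "Adverse Event Term", "char"),
          ("AESTDAT", "Start Date", "num")]
print(to_csv(empty_spec("AE", sample)))
```

In practice the attributes would come from the SAS file itself rather than from a hard-coded list.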

The first version comes with three pre-trained models (included in the package). They are feed-forward neural networks with a trainable ELMo embedding layer, trained to classify 34 classes using adverse event datasets from 18 clinical trials. Validation used 3 clinical trials until the models were optimized, and testing used 1 clinical trial. The data for all 22 clinical trials were extracted from Medidata Rave studies built by 3 different CROs and Sunovion Pharmaceuticals.
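The 18/3/1 split described above is made at the level of whole clinical trials rather than individual records. A minimal sketch of such a trial-level split follows, assuming the trials are identified by arbitrary IDs; the `split_trials` helper, the random assignment, and the placeholder IDs are illustrative assumptions, not the procedure the author actually used.

```python
import random


def split_trials(trial_ids, n_train=18, n_val=3, n_test=1, seed=0):
    """Assign whole trials to train/validation/test partitions.

    Splitting by trial keeps every record of a trial in one partition.
    """
    ids = sorted(trial_ids)
    if len(ids) != n_train + n_val + n_test:
        raise ValueError("number of trials does not match the split sizes")
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(ids)
    return {
        "train": ids[:n_train],
        "validation": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


# Placeholder trial IDs; the real study identifiers are not given in this README.
trials = [f"TRIAL{i:02d}" for i in range(1, 23)]
split = split_trials(trials)
print({name: len(ids) for name, ids in split.items()})
```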

| Model | Parameters | Training Acc | Validation Acc | Test Acc* |
|---|---|---|---|---|
| 1. Elmo+sfnn+ae+Model1.h5 | 271,142 | 0.9795 | 0.9800 | 0.9540 |
| 2. Elmo+fnn+ae+Model2.h5 | 664,870 | 0.9846 | 1.0000 | 0.9425 |
| 3. Elmo+fnn+ae+Model3.h5 | 594,854 | 0.9966 | 1.0000 | 0.9666 |

Table 1 - Performance of three models
* Macro accuracy accounts for system variables classified as 'drop'.

The high variance of some models may be due to the addition of CDASH metadata, and it is probably better to remove that metadata.
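One simple way to look at the variance mentioned above is the gap between training and test accuracy for each model. The sketch below computes that gap from the figures in Table 1; using this gap as a variance indicator is a common heuristic and an assumption here, not a measure the author reports.

```python
# (training, validation, test) accuracies copied from Table 1.
ACCURACY = {
    "Model1": (0.9795, 0.9800, 0.9540),
    "Model2": (0.9846, 1.0000, 0.9425),
    "Model3": (0.9966, 1.0000, 0.9666),
}


def generalization_gaps(accuracy):
    """Return training accuracy minus test accuracy for each model."""
    return {
        name: round(train - test, 4)
        for name, (train, _validation, test) in accuracy.items()
    }


gaps = generalization_gaps(ACCURACY)
for name, gap in gaps.items():
    print(f"{name}: {gap:.4f}")
```

By this measure Model 2 has the largest gap (0.0421) and Model 1 the smallest (0.0255).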

Improvements to the task-specific model are explored by Peters et al. [1]:

  1. Freeze the weights of the pre-trained biLM, concatenate the context-independent token representation $x_k$ with $ELMo^{task}_{k}$, and pass $[x_k; ELMo^{task}_{k}]$ into the task RNN.
  2. Replace $h_k$ with $[h_k; ELMo^{task}_{k}]$. Peters et al. [1] showed improved performance on some tasks, such as SNLI and SQuAD, by also including ELMo at the output of the task RNN.
  3. Add a moderate amount of dropout to ELMo.
  4. Regularize the ELMo weights by adding $\lambda \lVert w \rVert^2_2$ to the loss function.

These can be considered as future enhancements for other domains where the models may not perform well.
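The suggestions above can be sketched in plain Python for a single token. In Peters et al. [1], the task-specific ELMo vector is a softmax-weighted sum of the biLM layer outputs, scaled by a scalar. The function names, the toy vectors, and the use of Python lists instead of tensors are assumptions made for illustration; this is not the code used to train the packaged models.

```python
import math
import random


def softmax(weights):
    """Normalize raw layer weights so they sum to 1."""
    peak = max(weights)
    exps = [math.exp(w - peak) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def elmo_task_vector(layers, w, scale=1.0):
    """Softmax-weighted sum of the biLM layer vectors for one token."""
    s = softmax(w)
    dim = len(layers[0])
    return [scale * sum(s[j] * layers[j][d] for j in range(len(layers)))
            for d in range(dim)]


def concat(base, elmo):
    """Concatenate a representation (x_k or h_k) with the ELMo vector."""
    return list(base) + list(elmo)


def dropout(vec, rate, rng):
    """Inverted dropout: zero entries with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in vec]


def l2_penalty(w, strength):
    """Regularization term added to the loss: strength * ||w||_2^2."""
    return strength * sum(v * v for v in w)


# Toy example: three biLM layers, two dimensions each, equal raw weights.
layers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
w = [0.0, 0.0, 0.0]
elmo = dropout(elmo_task_vector(layers, w), rate=0.0, rng=random.Random(0))
print(concat([0.5, 0.5], elmo), l2_penalty([1.0, 2.0, 3.0], 0.1))
```

With equal raw weights the softmax gives each layer a weight of 1/3, so the ELMo vector is simply the average of the layers.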

Here is the architecture of ELMo.

Figure 1 - biLM architecture for ELMo

Installation

pip install sdtm-mapper

Tutorials on Google Colab

  1. How to prepare training data using sdtm-mapper from SAS7bdat files?
  2. Tutorial on how to use sdtm-mapper to generate mapping specifications
  3. Train your data using SDTMMapper on Model 1: Note that you need to supply your training data.

Notes

You need an environment in which tensorflow, tensorflow-hub, etc. can be used.
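A quick way to check the environment before running the package is to test whether the two modules named above can be imported. This is a minimal sketch using only the standard library; the `missing_modules` helper is hypothetical, and the list covers only the two dependencies this README names explicitly.

```python
import importlib.util

# Import names of the two dependencies named in this README.
REQUIRED_MODULES = ("tensorflow", "tensorflow_hub")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names of modules that cannot be found in this environment."""
    return [name for name in modules
            if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```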

If you want to contribute by adding more models for different SDTM domains, please join the PhUSE ML Project Community. Most of the work has been done on weekends or in the evenings. Your contributions are always welcome!

Notes about the trained models:

The models were built and trained on raw AE datasets from clinical trials conducted by Sunovion Pharmaceuticals. The EDC system we use is Medidata RaveX. The training data contains some e-source data. The performance may not be good on your data. You can also build your own models using the SDTMMapper tool and use your custom model for your datasets.

The old README file is found here

Issues

For any questions, comments, suggestions, or issues, please post them here

For personal communication related to SDTMMapper, please contact Sam Tomioka

Disclaimer

This is not an official Sunovion Pharmaceuticals product.

References

[1] Peters, M. et al. (2018). Deep contextualized word representations.
