I have been deeply interested in algorithmic trading and systematic trading algorithms. This repository contains the code for what I have learned along the way. It starts from basic statistics and leads up to complex machine learning algorithms.

Stars: ✭ 47 (+67.86%)

Mutual labels: statistics, pandas

Imodels

Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).

Stars: ✭ 194 (+592.86%)

Mutual labels: statistics, ml

Fecon235

Notebooks for financial economics. Keywords: Jupyter notebook pandas Federal Reserve FRED Ferbus GDP CPI PCE inflation unemployment wage income debt Case-Shiller housing asset portfolio equities SPX bonds TIPS rates currency FX euro EUR USD JPY yen XAU gold Brent WTI oil Holt-Winters time-series forecasting statistics econometrics

Stars: ✭ 708 (+2428.57%)

Mutual labels: statistics, pandas

Polyaxon

Machine Learning Platform for Kubernetes (MLOps tools for experimentation and automation)

Stars: ✭ 2,966 (+10492.86%)

Mutual labels: workflow, ml

Pingouin

Statistical package in Python based on Pandas

Stars: ✭ 651 (+2225%)

Mutual labels: statistics, pandas

Fecon236

Tools for financial economics. Curated wrapper over Python ecosystem. Source code for fecon235 Jupyter notebooks.

Stars: ✭ 72 (+157.14%)

Mutual labels: statistics, pandas

Dataframe Go

DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration

Stars: ✭ 487 (+1639.29%)

Mutual labels: statistics, pandas

Ml Dl Scripts

The repository provides useful Python scripts for ML and data analysis

Stars: ✭ 119 (+325%)

Mutual labels: statistics, ml

fairlens

Identify bias and measure fairness of your data

Stars: ✭ 51 (+82.14%)

Mutual labels: statistics, pandas

Csinva.github.io

Slides, paper notes, class notes, blog posts, and research on ML 📉, statistics 📊, and AI 🤖.

Stars: ✭ 342 (+1121.43%)

Mutual labels: statistics, ml

Choochoo

Training Diary

Stars: ✭ 186 (+564.29%)

Mutual labels: statistics, pandas

prosto

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

Stars: ✭ 54 (+92.86%)

Mutual labels: workflow, pandas


A library for making stability analysis simple. Easily evaluate the effect of judgement calls on your data-science pipeline (e.g. the choice of imputation strategy)!

Why use `vflow`?

Using vflow's simple wrappers enables many best practices for data science and makes writing pipelines easy (following the veridical data-science framework).

| Stability | Computation | Reproducibility |
| --- | --- | --- |
| Replace a single function (e.g. preprocessing) with a set of functions and easily assess the stability of downstream results | Automatic parallelization and caching throughout the pipeline | Automatic experiment tracking and saving |
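
The stability idea can be illustrated without vflow at all. The following is a minimal sketch in plain Python, assuming a toy column with missing values and three hypothetical imputation functions (`impute_mean`, `impute_median`, `impute_zero`); none of these names come from vflow. It replaces one preprocessing step with a set of alternatives and reports how far the downstream result moves.

```python
from statistics import mean, median

# Toy data with missing entries (None); purely illustrative.
column = [2.0, None, 4.0, 10.0, None, 4.0]

def _observed(values):
    """Return only the non-missing entries."""
    return [v for v in values if v is not None]

def impute_mean(values):
    fill = mean(_observed(values))
    return [fill if v is None else v for v in values]

def impute_median(values):
    fill = median(_observed(values))
    return [fill if v is None else v for v in values]

def impute_zero(values):
    return [0.0 if v is None else v for v in values]

# The "judgement call": one preprocessing step becomes a set of alternatives.
imputers = {"mean": impute_mean, "median": impute_median, "zero": impute_zero}

def downstream_statistic(values):
    """Stand-in for the rest of the pipeline: here, just the column mean."""
    return mean(values)

results = {name: downstream_statistic(f(column)) for name, f in imputers.items()}
spread = max(results.values()) - min(results.values())

for name, value in results.items():
    print(f"{name:>6}: {value:.3f}")
print(f"spread across imputers: {spread:.3f}")
```

A large spread relative to the quantity of interest suggests the conclusion depends on the imputation choice rather than on the data.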

Here we show a simple example of an entire data-science pipeline with several perturbations (e.g. different data subsamples, models, and metrics) written simply using vflow.

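# import each sklearn submodule explicitly; a bare `import sklearn` does not guarantee they are loaded
import sklearn.datasets
import sklearn.linear_model
import sklearn.model_selection
import sklearn.tree
import sklearn.utils
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from vflow import init_args, Vset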

# initialize data
X, y = sklearn.datasets.make_classification()
X_train, X_test, y_train, y_test = init_args(
    sklearn.model_selection.train_test_split(X, y),
    names=['X_train', 'X_test', 'y_train', 'y_test']  # optionally name the args
)

# subsample data
subsampling_funcs = [
    sklearn.utils.resample for _ in range(3)
]
subsampling_set = Vset(name='subsampling',
                       modules=subsampling_funcs,
                       output_matching=True)
X_trains, y_trains = subsampling_set(X_train, y_train)

# fit models
models = [
    sklearn.linear_model.LogisticRegression(),
    sklearn.tree.DecisionTreeClassifier()
]
modeling_set = Vset(name='modeling',
                    modules=models,
                    module_keys=["LR", "DT"])
modeling_set.fit(X_trains, y_trains)
preds_test = modeling_set.predict(X_test)

# get metrics
binary_metrics_set = Vset(name='binary_metrics',
                          modules=[accuracy_score, balanced_accuracy_score],
                          module_keys=["Acc", "Bal_Acc"])
binary_metrics = binary_metrics_set.evaluate(preds_test, y_test)

Once we've written this pipeline, we can easily measure the stability of metrics (e.g. "Accuracy") to our choice of subsampling or model.
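To make "stability" concrete, one common summary is the mean and standard deviation of a metric across perturbations. The sketch below assumes a hypothetical dictionary of accuracy values keyed by `(subsample, model)`; the keys and numbers are invented for illustration and are not vflow's actual output format.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical accuracies keyed by (subsample, model); values are invented.
accuracy = {
    ("subsampling_0", "LR"): 0.80,
    ("subsampling_1", "LR"): 0.84,
    ("subsampling_2", "LR"): 0.82,
    ("subsampling_0", "DT"): 0.70,
    ("subsampling_1", "DT"): 0.90,
    ("subsampling_2", "DT"): 0.80,
}

def stability_by_model(scores):
    """Group scores by model and summarize them across subsamples."""
    grouped = defaultdict(list)
    for (_subsample, model), value in scores.items():
        grouped[model].append(value)
    return {
        model: {"mean": mean(values), "std": stdev(values)}
        for model, values in grouped.items()
    }

summary = stability_by_model(accuracy)
for model, stats in summary.items():
    print(f"{model}: mean={stats['mean']:.3f}, std={stats['std']:.3f}")
```

A smaller standard deviation means the metric is less sensitive to the choice of subsample; in this made-up data, LR is more stable than DT.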

Documentation

See the docs for reference on the API

Notebook examples (some of these require more dependencies than vflow itself; to install them all, use the notebooks dependencies in the setup.py file):

Synthetic classification

Enhancer genomics

fMRI voxel prediction

Fashion MNIST classification

Feature importance stability

Clinical decision rule vetting

Installation

Install with `pip install vflow` (see here for help). For the dev version (unstable), clone the repo and run `python setup.py develop` from the repo directory.
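To confirm that the install succeeded, one option is to query the installed package metadata from Python. This is a minimal sketch using only the standard library (Python 3.8+); the helper name `installed_version` is hypothetical and not part of vflow.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

found = installed_version("vflow")
print(f"vflow version: {found}" if found else "vflow is not installed")
```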

References

interface: easily build on scikit-learn and dvc (data version control)
computation: integration with ray and caching with joblib
tracking: mlflow
pull requests very welcome! (see contributing.md)

@software{duncan2020vflow,
   author = {Duncan, James and Kapoor, Rush and Agarwal, Abhineet and Singh, Chandan and Yu, Bin},
   doi = {10.21105/joss.03895},
   month = {1},
   title = {{VeridicalFlow: a Python package for building trustworthy data science pipelines with PCS}},
   url = {https://doi.org/10.21105/joss.03895},
   year = {2022}
}


Yu-Group / veridical-flow
