
finos / datahub

License: Apache-2.0

DataHub - Synthetic data library


Labels

data library sklearn pandas synthetic

Projects that are alternatives of or similar to datahub

machine learning web app game where the user competes against the AI in picking stocks

Stars: ✭ 108 (+63.64%)

Mutual labels: sklearn, pandas

sklearn-predict

Machine learning on data: predict trends and plot charts

Stars: ✭ 16 (-75.76%)

Mutual labels: sklearn, pandas

Machine Learning Projects

This repository consists of all my Machine Learning Projects.

Stars: ✭ 135 (+104.55%)

Mutual labels: sklearn, pandas

Data-Analyst-Nanodegree

Kai Sheng Teh - Udacity Data Analyst Nanodegree

Stars: ✭ 42 (-36.36%)

Mutual labels: sklearn, pandas

Machine Learning

A machine learning journey starting from zero background

Stars: ✭ 209 (+216.67%)

Mutual labels: sklearn, pandas

Precompiled packages for AWS Lambda

Stars: ✭ 997 (+1410.61%)

Mutual labels: sklearn, pandas

Mainly a summary of web-scraping and data-analysis projects, plus modeling, machine learning, and model evaluation.

Stars: ✭ 142 (+115.15%)

Mutual labels: sklearn, pandas

Daily Stock Forecast

Daily Stock Forecasts using Machine Learning & Python

Stars: ✭ 341 (+416.67%)

Mutual labels: sklearn, pandas

SciKIt-learn Pipeline in PAndas

Stars: ✭ 33 (-50%)

Mutual labels: sklearn, pandas

Data Science Notebook

📖 Every great idea and action has a humble beginning

Stars: ✭ 196 (+196.97%)

Mutual labels: sklearn, pandas

A constantly updated python machine learning cheatsheet

Stars: ✭ 136 (+106.06%)

Mutual labels: sklearn, pandas

Universal 1d/2d data containers with Transformers functionality for data analysis.

Stars: ✭ 25 (-62.12%)

Mutual labels: sklearn, pandas

Tensorflow Ml Nlp

Natural language processing starting with TensorFlow and machine learning (from logistic regression to a Transformer chatbot)

Stars: ✭ 176 (+166.67%)

Mutual labels: sklearn, pandas

NOTE: skutil is now deprecated. See its sister project: https://github.com/tgsmith61591/skoot. Original description: A set of scikit-learn and h2o extension classes (as well as caret classes for python). See more here: https://tgsmith61591.github.io/skutil

Stars: ✭ 29 (-56.06%)

Mutual labels: sklearn, pandas

ml-workflow-automation

Python Machine Learning (ML) project that demonstrates the archetypal ML workflow within a Jupyter notebook, with automated model deployment as a RESTful service on Kubernetes.

Stars: ✭ 44 (-33.33%)

Mutual labels: sklearn, pandas

Breast-Cancer-Scikitlearn

simple tutorial on Machine Learning with Scikitlearn

Stars: ✭ 33 (-50%)

Mutual labels: sklearn

Arch-Data-Science

Archlinux PKGBUILDs for Data Science, Machine Learning, Deep Learning, NLP and Computer Vision

Stars: ✭ 92 (+39.39%)

Mutual labels: pandas

A Python project to download, process and visualize medium-to-massive amount of seismic waveforms and metadata

Stars: ✭ 18 (-72.73%)

Mutual labels: pandas

A dynamic microsimulation framework for python

Stars: ✭ 15 (-77.27%)

Mutual labels: pandas

Pandas type stubs. Helps you type-check your code.

Stars: ✭ 84 (+27.27%)

Mutual labels: pandas


DataHub

Synthetic data generation

DataHub is a set of Python libraries dedicated to the production of synthetic data for use in tests, machine learning training, statistical analysis, and other use cases (see the wiki). DataHub uses existing datasets to generate synthetic models. If no existing data is available, it uses user-provided scripts and data rules to generate synthetic data with out-of-the-box helper datasets.
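As a rough illustration of the rule-driven case, the sketch below builds records from a small set of data rules using only the Python standard library. The rule format, the `RULES` mapping, and the `generate_rows` helper are hypothetical names invented for this example; they are not DataHub's API.

```python
import random

# Hypothetical data rules: field name -> callable that takes an RNG and returns a value.
# This rule format is an illustration only, not DataHub's spec format.
RULES = {
    "customer_id": lambda rng: rng.randint(100000, 999999),
    "region": lambda rng: rng.choice(["NAM", "EMEA", "APAC"]),
    "balance": lambda rng: round(rng.uniform(0.0, 10000.0), 2),
}


def generate_rows(rules, n, seed=0):
    """Return n synthetic records, each built by applying every rule once."""
    rng = random.Random(seed)  # seeded so repeated runs give the same rows
    return [{name: rule(rng) for name, rule in rules.items()} for _ in range(n)]


rows = generate_rows(RULES, 5)
```

Seeding the generator keeps test fixtures reproducible, which tends to matter when synthetic data backs automated tests.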

Synthetic datasets are simply artificially manufactured sets, produced to a desired degree of accuracy. Real data does play a part in synthetic generation, depending on the realism you require. The product roadmap details the functionality planned in this respect.

DataHub's core is built predominantly around pandas data frames and object generation. A common question is: now that I have a data frame of synthetic data, what do I do with it? The pandas library offers an array of options here, so for the time being sinking to databases is out of scope for the core library. However, see the examples in the test folder for some common patterns.

Note: as we build out a config-based synthetic spec generator, we will bring this back into scope. Please see our roadmap and issue list and get involved in the discussion.
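Until then, one possible pattern for sinking a synthetic data frame is to use pandas directly. The sketch below writes a frame to an in-memory SQLite database with `DataFrame.to_sql` and reads it back. The table name and columns are made up for illustration, and the hand-built frame simply stands in for whatever a generator produces.

```python
import sqlite3

import pandas as pd

# Stand-in for a frame produced by a synthetic generator; the columns are made up.
df = pd.DataFrame(
    {
        "customer_id": [1001, 1002, 1003],
        "region": ["NAM", "EMEA", "APAC"],
        "balance": [250.0, 1320.5, 87.25],
    }
)

conn = sqlite3.connect(":memory:")
# Write the frame to a table, replacing the table if it already exists.
df.to_sql("customers", conn, if_exists="replace", index=False)
# Read it back to confirm the round trip.
loaded = pd.read_sql_query("SELECT * FROM customers ORDER BY customer_id", conn)
conn.close()
```

For flat files, `df.to_csv("customers.csv", index=False)` is a similar one-line option.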

Key documents

For information on how to get started with DataHub see our Getting Started Guide
For more technical information about DataHub and how to customize it, see the Developer Guide
For high-level project direction see Road Map, Requirements Gathering Approach and Delegated Action Groups.
For Feature Development, Good First Issues, Help Wanted and Bug Tracking see DataHub GitHub Issues.
This project uses Gravizo for all diagrams and charts as highlighted in DataHub Issue 41.

Overview of Synthetic data

Synthetic data is information that is artificially manufactured rather than generated by real-world events.
Synthetic data is created algorithmically and can be used as a stand-in for test datasets of production data.
Real data does play a part in synthetic data generation, depending on how realistic you want the output to be.
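To make the last point concrete, here is a minimal sketch of one way real data can shape synthetic output: summarize a real sample by its mean and standard deviation, then draw new values from a normal distribution with those parameters. The `fit_and_sample` helper and the sample values are assumptions made for this example. DataHub's own approach may differ, and a normal distribution is only one simple modeling choice.

```python
import random
import statistics


def fit_and_sample(real_values, n, seed=0):
    """Fit a normal distribution to real_values and draw n synthetic values."""
    mu = statistics.mean(real_values)
    sigma = statistics.stdev(real_values)  # sample standard deviation
    rng = random.Random(seed)  # seeded for reproducible output
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Made-up "real" observations, for example transaction amounts.
real = [102.5, 98.0, 110.2, 95.4, 101.3, 99.8, 104.1, 97.6]
synthetic = fit_and_sample(real, 1000)
```

The synthetic values share the rough center and spread of the real sample without copying any real record. More realism would call for a richer model of the source data.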

License

Copyright 2020 Citigroup

Distributed under the Apache License, Version 2.0.

SPDX-License-Identifier: Apache-2.0


Stars: ✭ 66
