loadwiki / Papers4DataAchitect

Licence: other

Collect papers for data engineering such as OLTP/OLAP/ETL/DistributedStorage.

Projects that are alternatives of or similar to Papers4DataAchitect

HTAPBench

Benchmark suite to evaluate HTAP database engines

Stars: ✭ 15 (-11.76%)

Mutual labels: olap, oltp

Radon

RadonDB is an open source, cloud-native MySQL database for building global, scalable cloud services

Stars: ✭ 1,584 (+9217.65%)

Mutual labels: olap, oltp

paper seacher

where where where paper

Stars: ✭ 45 (+164.71%)

Mutual labels: papers

Guided-I2I-Translation-Papers

Guided Image-to-Image Translation Papers

Stars: ✭ 117 (+588.24%)

Mutual labels: papers

dlink

Dinky is an out of the box one-stop real-time computing platform dedicated to the construction and practice of Unified Streaming & Batch and Unified Data Lake & Data Warehouse. Based on Apache Flink, Dinky provides the ability to connect many big data frameworks including OLAP and Data Lake.

Stars: ✭ 1,535 (+8929.41%)

Mutual labels: olap

Object Detection

Summary of object detection（modules&&improvements）

Stars: ✭ 50 (+194.12%)

Mutual labels: papers

cuteOS-references

Documentation, references, and collected academic research for the cuteOS Kernel.

Stars: ✭ 32 (+88.24%)

Mutual labels: papers

Awesome-Federated-Learning-on-Graph-and-GNN-papers

Federated learning on graph, especially on graph neural networks (GNNs), knowledge graph, and private GNN.

Stars: ✭ 206 (+1111.76%)

Mutual labels: papers

procedural-advml

Task-agnostic universal black-box attacks on computer vision neural network via procedural noise (CCS'19)

Stars: ✭ 47 (+176.47%)

Mutual labels: papers

awesome-visual-localization-papers

The relocalization task aims to estimate the 6-DoF pose of a novel (unseen) frame in the coordinate system given by the prior model of the world.

Stars: ✭ 60 (+252.94%)

Mutual labels: papers

awesome-secure-computation

Awesome list for cryptographic secure computation paper. This repo includes *Lattice*, *DifferentialPrivacy*, *MPC* and also a comprehensive summary for top conferences.

Stars: ✭ 125 (+635.29%)

Mutual labels: papers

PyPaperBot

PyPaperBot is a Python tool for downloading scientific papers using Google Scholar, Crossref, and SciHub.

Stars: ✭ 184 (+982.35%)

Mutual labels: papers

tools-generation-detection-synthetic-content

Compilation of the state of the art of tools, articles, forums and links of interest to generate and detect any type of synthetic content using deep learning.

Stars: ✭ 107 (+529.41%)

Mutual labels: papers

reading-group

Discussions on papers, frameworks, blogs and ideas every Saturday.

Stars: ✭ 57 (+235.29%)

Mutual labels: papers

metriql

The metrics layer for your data. Join us at https://metriql.com/slack

Stars: ✭ 227 (+1235.29%)

Mutual labels: olap

flock

Flock: A Low-Cost Streaming Query Engine on FaaS Platforms

Stars: ✭ 232 (+1264.71%)

Mutual labels: olap

List-of-Academic-Research-on-Usability-in-FOSS

No description or website provided.

Stars: ✭ 29 (+70.59%)

Mutual labels: papers

awesome-end2end-speech-recognition

💬 A list of End-to-End speech recognition, including papers, codes and other materials

Stars: ✭ 49 (+188.24%)

Mutual labels: papers

MachineLearning-Papers Survey

Repository for surveying machine learning papers

Stars: ✭ 104 (+511.76%)

Mutual labels: papers

Paper-Notes

Paper notes in deep learning/machine learning and computer vision

Stars: ✭ 37 (+117.65%)

Mutual labels: papers

Papers4DataAchitect

Background

In the data technology (DT) era, there are many kinds of distributed data stores, distributed compute systems, and distributed machine learning systems.

As an application engineer, you may use an RDBMS, NoSQL, or even NewSQL database to store and manage data.
As a data engineer, you may:
Collect data first
- extract data from application log files
- capture data changes in an RDBMS or NoSQL database
- crawl data from various websites
- pull data from third-party data vendors through web service APIs
- ingest massive time-series data from IoT front ends or sensors, such as connected cars or AI-enhanced surveillance cameras
Clean and transform data next
- use an ETL utility or a real-time stream processing system such as Flink or Kafka
Analyze data and train models last
- analyze data with Spark, SQL-on-Hadoop, or an OLAP data warehouse, then generate reports and visualize the results with tools such as Tableau
- train machine learning models in a distributed machine learning system such as Spark MLlib or Angel; these models then serve tasks such as advertising CTR prediction or user recommendation
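
The collect, clean/transform, and analyze stages above can be sketched in a few lines of plain Python. This is only an illustrative toy, not how any of the systems named above work: the log format, the `SAMPLE_LOG` data, and the function names `extract`, `transform`, and `analyze` are all assumptions made up for this example.

```python
import re
from collections import Counter

# Hypothetical application log, invented for this sketch.
# Assumed line format: "<timestamp> <user_id> <action>"
SAMPLE_LOG = """\
2024-01-01T10:00:00 u1 click
2024-01-01T10:00:05 u2 view
malformed line
2024-01-01T10:01:00 u1 view
2024-01-01T10:02:00 u3 click
"""

# Only lines with exactly three fields and a known action are accepted.
LINE_RE = re.compile(r"^(\S+) (\S+) (click|view)$")


def extract(raw_log):
    """Collect stage: yield raw lines from the log text."""
    for line in raw_log.splitlines():
        yield line


def transform(lines):
    """Clean/transform stage: drop malformed lines, emit structured events."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            timestamp, user, action = match.groups()
            yield {"ts": timestamp, "user": user, "action": action}


def analyze(events):
    """Analyze stage: count how many events of each action type occurred."""
    return dict(Counter(event["action"] for event in events))


if __name__ == "__main__":
    events = list(transform(extract(SAMPLE_LOG)))
    print(analyze(events))
```

In a real pipeline each stage would typically be a separate system (a log shipper, a stream processor, a warehouse query), but the shape of the data flow is the same: raw records in, validated structured records in the middle, aggregates out.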

Purpose

Working efficiently with the different tools in a big data/ML pipeline is a key ability. However, tools for big data and machine learning keep growing more complicated. They are neither as mature as a traditional RDBMS nor as simple as a local algorithm library such as scikit-learn. Digging deep into the implementation details of a distributed data store, processing system, or training system can be hard and is sometimes unnecessary. Nevertheless, understanding the common ideas behind the software stack is a great help.

Different systems do share common ideas and design patterns. It is a good idea to read the original paper, which describes the background, the key algorithms, and the authors' considerations. For a data architect or algorithm engineer, reading these papers can be a great help.

This repository will collect and classify these papers as a guide for data engineers, algorithm engineers, and architects.

How to read

A short comment is available as a reading guide for every paper. Each comment consists of a background description, a summary, and a comparison with similar systems.

Contents

Online Analytical Processing (OLAP)
NewSQL DB
Real-Time Stream Computing
Graph Compute
Distributed Machine Learning Paradigm
LinkedIn Big Data Stack
