loadwiki / Papers4DataAchitect

Licence: other
Collect papers for data engineering such as OLTP/OLAP/ETL/DistributedStorage.


Papers4DataAchitect

Background

There are many kinds of distributed data storage systems, distributed compute systems, and distributed machine learning systems in the DT (data technology) era.

  • As an application engineer, you may use an RDBMS, NoSQL, or even NewSQL to store and manage data.

  • As a data engineer, you may

    • collect data first

      • extract data from application log files
      • capture data changes in an RDBMS or NoSQL database
      • crawl data from various websites
      • pull data from third-party data vendors through web service APIs
      • ingest massive time-series data from IoT front ends or sensors, such as connected-car networks or AI-enhanced surveillance cameras
    • clean and transform data next

      • use an ETL utility or a real-time stream processing system such as Flink or Kafka
    • analyze data and train models last

      • analyze data with Spark, SQL on Hadoop, or an OLAP data warehouse, then generate reports and visualize the results with tools such as Tableau
      • train machine learning models in a distributed machine learning system such as Spark MLlib or Angel. These models then serve tasks such as advertising CTR inference or user recommendation.
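
The collect, clean, and analyze stages listed above can be illustrated with a toy, single-process sketch. It is only an assumption-laden illustration, not how any of the named systems work: the log format, the `RAW_LOG` sample, and the `extract`, `transform`, and `analyze` function names are all hypothetical, and the standard library stands in for tools such as Flink or Spark.

```python
import re
from collections import Counter

# Hypothetical log format, assumed for illustration: "<timestamp> <user_id> <action>"
LOG_LINE = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)$")

# Made-up sample input, including lines a real pipeline would have to reject.
RAW_LOG = [
    "2024-01-01T10:00:00 u1 view",
    "2024-01-01T10:00:05 u1 click",
    "malformed line with too many fields",
    "2024-01-01T10:01:00 u2 view",
    "",
    "2024-01-01T10:02:00 u2 VIEW",
]


def extract(lines):
    """Collect: parse raw log lines into records, skipping lines that do not match."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            timestamp, user, action = match.groups()
            yield {"ts": timestamp, "user": user, "action": action}


def transform(records):
    """Clean and transform: normalize the action name and keep only known actions."""
    for record in records:
        action = record["action"].lower()
        if action in {"view", "click"}:
            yield {**record, "action": action}


def analyze(records):
    """Analyze: count actions and derive a click-through rate (clicks / views)."""
    counts = Counter(record["action"] for record in records)
    views = counts["view"]
    ctr = counts["click"] / views if views else 0.0
    return counts, ctr


counts, ctr = analyze(transform(extract(RAW_LOG)))
print(dict(counts), round(ctr, 3))
```

Each stage is a generator, so records flow through one at a time, loosely mirroring how a stream processor chains operators. In a real pipeline each function would be replaced by a distributed component.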

Purpose

It is a key ability to work efficiently with the different utilities in a big data/ML pipeline. However, the tools for big data and machine learning are numerous and complex. These tools are neither as mature as a traditional RDBMS nor as simple as a local algorithm library such as scikit-learn. Sometimes digging deep into the implementation details of a distributed data storage, processing, or training system may be hard and unnecessary. Nevertheless, understanding the common ideas behind the software stack will be a great help.

Different systems do share common ideas and design patterns. It is a good idea to read the original paper, which describes the background, the key algorithms, and the authors' considerations. For a data architect or algorithm engineer, reading these papers may be a great help.

This repository will collect and classify these papers as a user guide for data and algorithm engineers and architects.

How to read

A short comment will be available as a user guide for every paper. Each comment consists of a background description, an abstract, and a comparison with similar systems.

Contents
