fluid-cloudnative / Fluid

Licence: apache-2.0

Fluid, elastic data abstraction and acceleration for BigData/AI applications in cloud

Labels

kubernetes big-data

Projects that are alternatives of or similar to Fluid

bigtable

TypeScript Bigtable Client with 🔋🔋 included.

Stars: ✭ 13 (-95.09%)

Mutual labels: big-data

Knowage Server

Knowage is the professional open source suite for modern business analytics over traditional sources and big data systems.

Stars: ✭ 276 (+4.15%)

Mutual labels: big-data

Baize

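Baize automated operations system: configuration management, network probing, asset management, business management, CMDB, CD, DevOps, job orchestration, task orchestration, and other features. Monitoring, alerting, log analysis, big data analysis, and more will be added in the future.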

Stars: ✭ 296 (+11.7%)

Mutual labels: big-data

bandar-log

Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.

Stars: ✭ 20 (-92.45%)

Mutual labels: big-data

Attic Predictionio Sdk Php

PredictionIO PHP SDK

Stars: ✭ 272 (+2.64%)

Mutual labels: big-data

Oie Resources

A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.

Stars: ✭ 283 (+6.79%)

Mutual labels: big-data

pipeline

OONI data processing pipeline

Stars: ✭ 36 (-86.42%)

Mutual labels: big-data

Morpheus

Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.

Stars: ✭ 303 (+14.34%)

Mutual labels: big-data

Parquet Dotnet

🏐 Apache Parquet for modern .NET

Stars: ✭ 276 (+4.15%)

Mutual labels: big-data

Smooks

An extensible Java framework for building XML and non-XML streaming applications

Stars: ✭ 293 (+10.57%)

Mutual labels: big-data

bigstatsr

R package for statistical tools with big matrices stored on disk.

Stars: ✭ 139 (-47.55%)

Mutual labels: big-data

Datahub

The Metadata Platform for the Modern Data Stack

Stars: ✭ 4,232 (+1496.98%)

Mutual labels: big-data

Flink

Apache Flink is an open source project of The Apache Software Foundation (ASF). The Apache Flink project originated from the Stratosphere research project.

Stars: ✭ 17,781 (+6609.81%)

Mutual labels: big-data

mmtf-workshop-2018

Structural Bioinformatics Training Workshop & Hackathon 2018

Stars: ✭ 50 (-81.13%)

Mutual labels: big-data

Couchdb Fauxton

Apache CouchDB

Stars: ✭ 295 (+11.32%)

Mutual labels: big-data

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-94.72%)

Mutual labels: big-data

Trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Stars: ✭ 4,581 (+1628.68%)

Mutual labels: big-data

Helix

Mirror of Apache Helix

Stars: ✭ 304 (+14.72%)

Mutual labels: big-data

Cloudbreak

A tool for provisioning and managing Apache Hadoop clusters in the cloud. Cloudbreak, as part of the Hortonworks Data Platform, makes it easy to provision, configure and elastically grow HDP clusters on cloud infrastructure. Cloudbreak can be used to provision Hadoop across cloud infrastructure providers including AWS, Azure, GCP and OpenStack.

Stars: ✭ 301 (+13.58%)

Mutual labels: big-data

Crate

CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of data in real-time.

Stars: ✭ 3,254 (+1127.92%)

Mutual labels: big-data

Fluid

What's New

Mar. 16th, 2021. Fluid v0.5.0 is released! It adds new features such as on-the-fly dataset scale-out/in, metadata backup, and support for FUSE global mode. See the CHANGELOG for details.
Nov. 6th, 2020. Fluid v0.4.0 is released! It adds features and bug fixes, such as automatically prefetching a dataset before it is used. See the CHANGELOG for details.
Oct. 1st, 2020. Fluid v0.3.0 is released! It adds features and bug fixes, such as data access acceleration for Persistent Volumes and hostPath mode in Kubernetes. See the CHANGELOG for details.

What is Fluid?

Fluid is an open source, Kubernetes-native distributed dataset orchestrator and accelerator for data-intensive applications, such as big data and AI applications.

Features

Native Support for Dataset Abstraction

Provides the capabilities that data-intensive applications need as natively supported functions, enabling efficient data access and reducing the cost of multidimensional management.

Cloud Data Warm-up and Access Acceleration

Fluid brings distributed cache capacity (Alluxio inside) to Kubernetes, with observability, portability, and horizontal scalability.

Co-Orchestration for Data and Application

During application scheduling and data placement in the cloud, Fluid takes both the application's characteristics and the data location into account to improve performance.

Support for Multiple Namespace Management

Users can create and manage datasets in multiple namespaces.

Support for Heterogeneous Data Source Management

Unifies data access for OSS, HDFS, Ceph, and other underlying storage systems.
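
As an illustration of the co-orchestration idea above, here is a minimal Go sketch of data-aware node selection. It is not Fluid's scheduler: the `Node` type, the `score` and `pickNode` functions, and the fixed locality bonus of 100 are all hypothetical names and values chosen for this example.

```go
package main

import "fmt"

// Node is a hypothetical, simplified view of a cluster node.
type Node struct {
	Name    string
	FreeCPU int             // free CPU cores
	Cached  map[string]bool // names of datasets cached on this node
}

// score rates a node for an app that needs cpu cores and reads dataset.
// It returns -1 when the node cannot fit the app. Otherwise the score is
// the leftover CPU plus an assumed bonus of 100 when the data is local.
func score(n Node, cpu int, dataset string) int {
	if n.FreeCPU < cpu {
		return -1
	}
	s := n.FreeCPU - cpu
	if n.Cached[dataset] {
		s += 100
	}
	return s
}

// pickNode returns the name of the best-scoring feasible node,
// or "" when no node can fit the app.
func pickNode(nodes []Node, cpu int, dataset string) string {
	best, bestScore := "", -1
	for _, n := range nodes {
		if s := score(n, cpu, dataset); s > bestScore {
			best, bestScore = n.Name, s
		}
	}
	return best
}

func main() {
	nodes := []Node{
		{Name: "a", FreeCPU: 8},
		{Name: "b", FreeCPU: 4, Cached: map[string]bool{"imagenet": true}},
	}
	// Node b has less free CPU but already caches the dataset.
	fmt.Println(pickNode(nodes, 2, "imagenet"))
}
```

The sketch only shows the trade-off: a node with the data already cached can win over a node with more spare compute, while nodes that cannot fit the application are never chosen.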

Key Concepts

Dataset: A set of logically related data that will be used by a computing engine, such as Spark for big data and TensorFlow for AI scenarios. Dataset management has multiple dimensions, such as security, version management, and data acceleration. We start with data acceleration and aim to extend support to the other aspects of dataset management.

Runtime: The execution engine that implements dataset capabilities such as security, version management, and data acceleration. It defines a series of lifecycle interfaces that you can implement.

AlluxioRuntime: A runtime backed by Alluxio, an engine that supports data management and caching of datasets. Fluid manages and schedules the Alluxio runtime to achieve dataset visibility, elastic scaling, and data migration.
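
To make the idea of a runtime as a set of lifecycle interfaces more concrete, here is a minimal Go sketch. The `RuntimeLifecycle` interface, its method names (`Setup`, `Scale`, `Shutdown`), and the `memoryRuntime` type are assumptions made for this example. They are not Fluid's actual interfaces, which are defined in the project source.

```go
package main

import (
	"errors"
	"fmt"
)

// RuntimeLifecycle is a hypothetical lifecycle interface for a
// dataset runtime. The method set is illustrative only.
type RuntimeLifecycle interface {
	Setup(dataset string) error // bind the runtime to a dataset
	Scale(replicas int) error   // change the number of cache workers
	Shutdown() error            // release all resources
}

// memoryRuntime is a toy implementation that only tracks state.
type memoryRuntime struct {
	dataset  string
	replicas int
	ready    bool
}

func (m *memoryRuntime) Setup(dataset string) error {
	if dataset == "" {
		return errors.New("dataset name is required")
	}
	m.dataset, m.replicas, m.ready = dataset, 1, true
	return nil
}

func (m *memoryRuntime) Scale(replicas int) error {
	if !m.ready {
		return errors.New("runtime is not set up")
	}
	if replicas < 0 {
		return errors.New("replicas must be non-negative")
	}
	m.replicas = replicas
	return nil
}

func (m *memoryRuntime) Shutdown() error {
	m.ready, m.replicas = false, 0
	return nil
}

func main() {
	var rt RuntimeLifecycle = &memoryRuntime{}
	if err := rt.Setup("demo"); err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	if err := rt.Scale(3); err != nil {
		fmt.Println("scale failed:", err)
		return
	}
	fmt.Println("runtime scaled; shutting down:", rt.Shutdown() == nil)
}
```

A controller written against such an interface can drive any engine that implements it, which is the role AlluxioRuntime plays for Alluxio in the description above.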

Prerequisites

Kubernetes version > 1.14, with CSI support
Golang 1.12+
Helm 3

Quick Start

You can follow our Get Started guide to quickly start a testing Kubernetes cluster.

Documentation

See our documentation at docs for more in-depth installation and production instructions.

Quick Demo

Demo 1: Accelerate Remote File Accessing with Fluid

Demo 2: Machine Learning with Fluid

Demo 3: Accelerate PVC with Fluid

Demo 4: Preload dataset with Fluid

Demo 5: On-the-fly dataset cache scaling

Community

Feel free to reach out if you have any questions. The maintainers of this project are reachable via:

DingTalk:

Contributing

Contributions are welcome and greatly appreciated. See CONTRIBUTING.md for details on submitting patches and the contribution workflow.

Open Source License

Fluid is under the Apache 2.0 license. See the LICENSE file for details.

Stars: ✭ 265