kamu-data / Kamu Cli

Licence: mpl-2.0
Next generation tool for decentralized exchange and transformation of semi-structured data

Programming Languages

rust
11053 projects

Projects that are alternatives of or similar to Kamu Cli

Hadoopcryptoledger
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (+82.61%)
Mutual labels:  blockchain, spark, flink
Quicksql
A Flexible, Fast, Federated(3F) SQL Analysis Middleware for Multiple Data Sources
Stars: ✭ 1,821 (+2539.13%)
Mutual labels:  sql, spark, flink
Parquet Generator
Parquet file generator
Stars: ✭ 16 (-76.81%)
Mutual labels:  sql, spark
Spark Scala Tutorial
A free tutorial for Apache Spark.
Stars: ✭ 907 (+1214.49%)
Mutual labels:  spark, jupyter
Sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+1282.61%)
Mutual labels:  spark, jupyter
Elasticsearch Spark Recommender
Use Jupyter Notebooks to demonstrate how to build a Recommender with Apache Spark & Elasticsearch
Stars: ✭ 707 (+924.64%)
Mutual labels:  spark, jupyter
Bigdataguide
Big data learning: learn big data from scratch, including study videos and interview materials for each stage of learning
Stars: ✭ 817 (+1084.06%)
Mutual labels:  spark, flink
Spark
Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+45723.19%)
Mutual labels:  sql, spark
Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+7889.86%)
Mutual labels:  spark, flink
Ether sql
A python library to push ethereum blockchain data into an sql database.
Stars: ✭ 41 (-40.58%)
Mutual labels:  blockchain, sql
Data Ingestion Platform
Stars: ✭ 39 (-43.48%)
Mutual labels:  spark, flink
Pulsar Spark
When Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-20.29%)
Mutual labels:  spark, flink
Scriptis
Scriptis is for interactive data analysis with script development(SQL, Pyspark, HiveQL), task submission(Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Stars: ✭ 696 (+908.7%)
Mutual labels:  sql, spark
Curriculum
👩‍🏫 👨‍🏫 The open-source curriculum of Enki!
Stars: ✭ 624 (+804.35%)
Mutual labels:  blockchain, sql
Szt Bigdata
Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟
Stars: ✭ 826 (+1097.1%)
Mutual labels:  spark, flink
Datafusion
DataFusion has now been donated to the Apache Arrow project
Stars: ✭ 611 (+785.51%)
Mutual labels:  sql, spark
Bigdata Interview
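🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered online, with the author's own summarized answers. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.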
Stars: ✭ 857 (+1142.03%)
Mutual labels:  spark, flink
Covenantsql
A decentralized, trusted, high performance, SQL database with blockchain features
Stars: ✭ 1,148 (+1563.77%)
Mutual labels:  blockchain, sql
Bdp Dataplatform
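Big data ecosystem solution data platform: a big data solution built on big data, data platform, microservices, machine learning, e-commerce, automated operations, DevOps, a container deployment platform, and data platform collection, storage, computation, development, and application building.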
Stars: ✭ 456 (+560.87%)
Mutual labels:  spark, flink
Justenoughscalaforspark
A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.
Stars: ✭ 538 (+679.71%)
Mutual labels:  spark, jupyter

Kamu

Welcome to kamu - a new-generation data management and transformation tool!

About

kamu is a reference implementation of Open Data Fabric - a Web 3.0 technology that powers a distributed structured data supply chain for providing timely, high-quality, and verifiable data for data science, smart contracts, web and applications.

Open Data Fabric

Using kamu you can become a member of the world's first peer-to-peer data pipeline that:

  • Connects publishers and consumers of data worldwide.
  • Enables effective collaboration of people around data transformation and cleaning.
  • Ensures data propagates with minimal latency.
  • Provides the most complete, secure, and fully accurate lineage and provenance information on where every piece of data came from and how it was produced.
  • Guarantees reproducibility of all data workflows.

Documentation

Learning Materials

Kamu 101 - First Steps

Features

  • For Data Publishers

    • Create and share your own dataset with the world
    • Ingest any existing data set from the web
    • Easily keep track of any updates to the data source in the future
    • Close the feedback loop and see who uses your data and how
  • For Data Professionals

    • Collaborate on cleaning and improving data of existing datasets
    • Create derivative datasets by transforming, enriching, and summarizing data others have published
    • Write a query once and run it forever with one of our state-of-the-art stream processing engines
    • Always stay up-to-date by pulling the latest updates from the data sources with just one command
    • Built-in support for GIS data
  • For Data Consumers

    • Download a dataset from a shared repository
    • Easily verify that all data comes from trusted sources
    • Audit the chain of transformations this data went through
    • Validate that downloaded data was in fact produced by the declared transformations
  • For Data Exploration

    • Explore data and run ad-hoc SQL queries (backed by the power of Apache Spark)
    • Launch a Jupyter notebook with one command
    • Join, filter, and shape your data using SQL
    • Visualize the result using your favorite library
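
The consumer-side check listed above ("validate that downloaded data was in fact produced by the declared transformations") can be illustrated with a small sketch. This is not kamu's actual mechanism or the Open Data Fabric format. It only assumes that a transformation is deterministic, so re-running it on the source data and comparing content hashes reveals tampering. The names transform, content_hash, and verify are hypothetical.

```python
import hashlib
import json


def transform(rows):
    """Hypothetical declared transformation: keep positive amounts, sorted by id."""
    kept = [r for r in rows if r["amount"] > 0]
    return sorted(kept, key=lambda r: r["id"])


def content_hash(rows):
    """Hash a canonical JSON encoding so equal data always yields an equal digest."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(source_rows, downloaded_rows, declared_transform):
    """Re-run the declared transformation and compare it with the downloaded data."""
    expected = declared_transform(source_rows)
    return content_hash(expected) == content_hash(downloaded_rows)


# Illustrative data only (assumption, not taken from any real dataset).
source = [{"id": 2, "amount": 5}, {"id": 1, "amount": -3}, {"id": 3, "amount": 7}]
honest = [{"id": 2, "amount": 5}, {"id": 3, "amount": 7}]
tampered = [{"id": 2, "amount": 5}, {"id": 3, "amount": 70}]

print(verify(source, honest, transform))    # True
print(verify(source, tampered, transform))  # False
```

The check only works if the transformation is reproducible, which is why the reproducibility guarantee mentioned earlier matters for verification.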

Project Status Disclaimer

kamu is alpha-quality software. Our main goal currently is to demonstrate the potential of the Open Data Fabric protocol and its transformative properties to the community and the industry, and to validate our ideas.

Naturally, we don't recommend using kamu for any critical tasks - it's definitely not prod-ready. We are, however, absolutely delighted to use kamu for our personal data analytics needs and small projects, and hope you will enjoy it too.

If you do - simply make sure to maintain your source data separately and don't rely on kamu for data storage. This way any time a new version comes out that breaks some compatibility you can simply delete your kamu workspace and re-create it from scratch in a matter of seconds.

Also, please be patient with current performance and resource usage. We fully realize that waiting 15s to process a few KiB of CSV isn't great. Stream processing is a relatively new area, and the data processing engines kamu uses (e.g. Apache Spark and Flink) are tailored to run in large clusters, not on a laptop. They take a lot of resources just to boot up, so the start-stop-continue nature of kamu's transformations is at odds with their design. We hope the industry will recognize our use case and expect to see better support for it in the future. We are committed to improving performance significantly in the near future.
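
To make the "start-stop-continue" pattern concrete, here is a minimal sketch, assuming a transformation that remembers how far it got and processes only new records on each run. It is purely illustrative and does not reflect how kamu or its engines store state. The function run_increment and the checkpoint variable are hypothetical.

```python
def run_increment(records, checkpoint, total):
    """Process only records past the checkpoint; return the new checkpoint and running total."""
    new = records[checkpoint:]
    return len(records), total + sum(new)


# First run: the engine starts, processes everything seen so far, then stops.
feed = [10, 20]
checkpoint, total = run_increment(feed, 0, 0)

# Later run: new data has arrived, so processing continues from the checkpoint.
feed += [30]
checkpoint, total = run_increment(feed, checkpoint, total)

print(checkpoint, total)  # 3 60
```

Each run in this pattern pays the engine's start-up cost again, which is why engines designed to stay running in a cluster feel slow when used this way.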
