beneath-hq / beneath

Licence: other
Beneath is a serverless real-time data platform ⚡️

Programming Languages

Go, TypeScript, Python, Java, Jupyter Notebook, JavaScript

Projects that are alternatives of or similar to beneath

versatile-data-kit
Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+121.54%)
Mutual labels:  etl, data-warehouse, data-engineering, dataops, data-pipelines
AirflowDataPipeline
Example of an ETL Pipeline using Airflow
Stars: ✭ 24 (-63.08%)
Mutual labels:  etl, data-engineering, data-pipelines
Dagster
An orchestration platform for the development, production, and observation of data assets.
Stars: ✭ 4,099 (+6206.15%)
Mutual labels:  etl, analytics, data-pipelines
Sayn
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (+21.54%)
Mutual labels:  etl, analytics, data-engineering
Dataform
Dataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift
Stars: ✭ 342 (+426.15%)
Mutual labels:  etl, analytics, data-engineering
Aws Serverless Data Lake Framework
Enterprise-grade, production-hardened, serverless data lake on AWS
Stars: ✭ 179 (+175.38%)
Mutual labels:  etl, analytics, data-engineering
rivery cli
Rivery CLI
Stars: ✭ 16 (-75.38%)
Mutual labels:  etl, dataops, data-pipelines
uptasticsearch
An Elasticsearch client tailored to data science workflows.
Stars: ✭ 47 (-27.69%)
Mutual labels:  etl, data-engineering
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+841.54%)
Mutual labels:  etl, data-engineering
arthur-redshift-etl
ELT Code for your Data Warehouse
Stars: ✭ 22 (-66.15%)
Mutual labels:  etl, data-engineering
gallia-core
A schema-aware Scala library for data transformation
Stars: ✭ 44 (-32.31%)
Mutual labels:  etl, data-engineering
datatile
A library for managing, validating, summarizing, and visualizing data.
Stars: ✭ 419 (+544.62%)
Mutual labels:  dataops, mlops
Data-Engineering-Projects
Personal Data Engineering Projects
Stars: ✭ 167 (+156.92%)
Mutual labels:  data-warehouse, data-engineering
blockchain-etl-streaming
Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes
Stars: ✭ 57 (-12.31%)
Mutual labels:  etl, data-engineering
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (-18.46%)
Mutual labels:  etl, data-engineering
neon-workshop
A Pachyderm deep learning tutorial for conference workshops
Stars: ✭ 19 (-70.77%)
Mutual labels:  data-engineering, data-pipelines
morph-kgc
Powerful RDF Knowledge Graph Generation with [R2]RML Mappings
Stars: ✭ 77 (+18.46%)
Mutual labels:  etl, data-engineering
cli
Polyaxon Core Client & CLI to streamline MLOps
Stars: ✭ 18 (-72.31%)
Mutual labels:  dataops, mlops
ml-in-production
The practical use-cases of how to make your Machine Learning Pipelines robust and reliable using Apache Airflow.
Stars: ✭ 29 (-55.38%)
Mutual labels:  data-engineering, data-pipelines
data-science-best-practices
The goal of this repository is to enable data scientists and ML engineers to develop data science use cases and make them ready for production use, focusing on the versioning, scalability, monitoring and engineering of the solution.
Stars: ✭ 53 (-18.46%)
Mutual labels:  analytics, mlops

Beneath

Beneath is a serverless real-time data platform. Our goal is to create one end-to-end platform for data workers that combines data storage, processing, and visualization with data quality management and governance.


Beneath is a work in progress and your input makes a big difference! If you like it, star the project to show your support or reach out and tell us what you think.

🧠 Philosophy

The holy grail of data work is putting data science into production. It's glorious to build live dashboards that aggregate multiple data sources, send real-time alerts based on a machine learning model, or offer customer-specific analytics in your frontend.

But building a modern data management stack is a full-time job, and a lot can go wrong. If you were starting a project from scratch today, you might set up Postgres, BigQuery, Kafka, Airflow, dbt and Metabase just to cover the basics. Later, you would need more tools to do data quality management, data cataloging, data versioning, data lineage, permissions management, change data capture, stream processing, and so on.

Beneath is a new way of building data apps. It takes an end-to-end approach that combines data storage, processing, and visualization with data quality management and governance in one serverless platform. The idea is to provide one opinionated layer of abstraction, i.e. one SDK and UI, which under the hood builds on modern data technologies.

Beneath is inspired by services like Netlify and Vercel that make it remarkably easy for developers to build and run web apps. In that same spirit, we want to give data scientists and engineers the fastest developer experience for building data products.

🚀 Status

We started with the data storage and governance layers. You can use the Beneath Beta today to store, explore, query, stream, monitor and share data. It offers several interfaces, including a Python client, a CLI, websockets, and a web UI. The beta is stable for non-critical use cases. If you try out the beta and have any feedback to share, we'd love to hear it!

Next up, we're tackling the data processing and data visualization layers, which will expand the opportunities for data governance and data quality management (see the roadmap at the end of this README for progress).

🎬 Tour

The snippet below presents a whirlwind tour of the Python API:

import beneath

# Assumes the SDK is installed and authenticated (see "Get started" below)
client = beneath.Client()

# Create a new table
table = await client.create_table("examples/demo/foo", schema="""
  type Foo @schema {
    foo: String! @key
    bar: Timestamp
  }
""")

# Write batch or real-time data
await table.write(data)

# Load into a dataframe
df = await beneath.load_full(table)

# Replay and subscribe to changes
await beneath.consume(table, callback, subscription_path="...")

# Analyze with SQL
data = await beneath.query_warehouse(f"SELECT count(*) FROM `{table}`")

# Lookup by key, range or prefix
data = await table.query_index(filter={"foo": {"_prefix": "bar"}})
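
The `data` passed to `table.write` in the tour is left abstract. Assuming records are plain Python dicts keyed by the schema's field names (the convention used by the Python client), records for the hypothetical `Foo` table above might be built like this:

```python
from datetime import datetime, timezone

# Hypothetical records for the Foo table defined in the tour:
# "foo" is the required String key, "bar" is a Timestamp.
data = [
    {"foo": "hello", "bar": datetime.now(timezone.utc)},
    {"foo": "world", "bar": datetime.now(timezone.utc)},
]

# Every record must supply the @key field declared in the schema
assert all("foo" in record for record in data)
```

These would then be written with `await table.write(data)` as in the snippet above.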

The image below shows a screenshot from the Beneath console. Check out the home page for a demo video.

[Screenshot: source code example in the Beneath console]

🐣 Get started

The best way to try Beneath is with a free beta account. Sign up here. When you have created an account, you can:

  1. Install and authenticate the Beneath SDK
  2. Browse public projects and integrate using Python, JavaScript, Websockets and more
  3. Create a private or public project and start writing data

We're working on bundling a self-hosted version that you can run locally. If you're interested in self-hosting, let us know!

👋 Community and Support

🎓 Documentation

📦 Features and roadmap

  • Data storage
    • Log streaming for replay and subscribe
    • Replication to key-value store for fast indexed lookups
    • Replication to data warehouse for OLAP queries (SQL)
    • Schema management and enforcement
    • Data versioning
    • Schema evolution and migrations
    • Secondary indexes
    • Strongly consistent operations for OLTP
    • Geo-replicated storage
  • Data processing
    • Scheduled/triggered SQL queries
    • Compute sandbox for batch and streaming pipelines
    • Git-integration for continuous deployments
    • DAG view of tables and pipelines for data lineage
    • Data app catalog (one-click parameterized deployments)
  • Data visualization and exploration
    • Vega-based charts
    • Dashboards composed from charts and tables
    • Alerting layer
    • Python notebooks (Jupyter)
  • Data governance
    • Web console and CLI for creating and browsing resources
    • Usage dashboards for tables, services, users and organizations
    • Usage quota management
    • Granular permissions management
    • Service accounts with custom permissions and quotas
    • API secrets (tokens) that can be issued/revoked
    • Data search and discovery
    • Audit logs as meta-tables
  • Data quality management
    • Field validation rules, checked on write
    • Alert triggers
    • Data distribution tests
    • Machine learning model re-training and monitoring
  • Integrations
    • gRPC, REST and websockets APIs
    • Command-line interface (CLI)
    • Python client
    • JS and React client
    • PostgreSQL wire-protocol compatibility
    • GraphQL API for data
    • Row restricted access tokens for identity-centered apps
    • Self-hosted Beneath on Kubernetes with federation

🍿 How it works

Check out the Concepts section of the docs for an overview of how Beneath works.

The contributing/ directory in this repository contains a deeper technical walkthrough of the software architecture.

🛒 License

This repository contains the full source code for Beneath. Beneath's core is source-available, licensed under the Business Source License, which converts to the Apache 2.0 license after four years. All the client libraries (in the clients/ directory) and examples (in the examples/ directory) are open-source, licensed under the MIT license.
