icicle-lang / icicle

Licence: BSD-3-Clause
Icicle Streaming Query Language

Programming Languages: Haskell, C

Icicle

The Icicle Streaming Query Language.

Purpose

Icicle is a language designed for easy collaboration in feature engineering and business intelligence without fear or difficulty.

Using static type checking with a novel, modal type system, we can be 10x faster than Spark, allow different users' queries to be efficiently fused, and guarantee that if a query type checks, it won't crash or interfere with any other.

Icicle is a simple language designed for data scientists, data engineers, and business intelligence professionals to achieve state-of-the-art performance without difficulty.

The key principles of Icicle are to:

  • Permit different users to build queries independently, but execute them together efficiently with sharing and non-interference;
  • Provide a static guarantee that every computation can be performed in a single pass;
  • Use a first-class notion of time: one should be able to query any entity's state and features at any time (this is important for preventing label leakage, for instance); and
  • Use query fusion and high level optimisations to achieve great performance.
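The first two principles can be illustrated outside Icicle. What follows is a minimal sketch in Python, not Icicle's implementation: it assumes each query can be written as a fold (an initial state, a step function, and a final extraction), and the names `Fold` and `run_fused` are hypothetical. Queries written independently are run in lockstep over a single traversal, each with its own private state.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable


@dataclass(frozen=True)
class Fold:
    """A single-pass query: an initial state, a step function, and a final extraction."""
    init: Any
    step: Callable[[Any, float], Any]
    extract: Callable[[Any], Any]


def run_fused(queries: Dict[str, Fold], values: Iterable[float]) -> Dict[str, Any]:
    """Run every query over `values` in one traversal, each with its own private state."""
    states = {name: q.init for name, q in queries.items()}
    for v in values:  # the data is visited exactly once
        for name, q in queries.items():
            states[name] = q.step(states[name], v)
    return {name: q.extract(states[name]) for name, q in queries.items()}


# Two queries written independently (say, by different users).
total = Fold(init=0.0, step=lambda s, v: s + v, extract=lambda s: s)
mean = Fold(
    init=(0.0, 0),
    step=lambda s, v: (s[0] + v, s[1] + 1),
    extract=lambda s: s[0] / s[1] if s[1] else None,
)

# A generator can only be consumed once, so a second pass would see no data.
results = run_fused({"total": total, "mean": mean}, (x for x in [1.0, 2.0, 3.0, 4.0]))
print(results)  # {'total': 10.0, 'mean': 2.5}
```

Because no query can read another query's state, adding a query changes neither the results of the others nor the number of passes over the data.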

Motivation

When performing data engineering and machine learning tasks, one has many options for creating features. Languages like R can provide expressivity, but they don't scale well to the gigabyte, terabyte, or petabyte level. SQL can be applied to machine learning features, but it is clunky to write and can fail at runtime, it is hard to protect against label leakage, and its runtime complexity is hard to quantify, especially at the terabyte and petabyte levels.

Icicle is a total programming language designed to provide O(n) runtime for all feature queries, while providing a pleasant environment for data scientists and engineers to write expressive features.

Examples

The simplest examples and counter-examples one may consider are mean and variance. First up, one could write mean as:

mean : Element Double -> Aggregate Double
mean v = sum v / count v

This is fine [1], and one can be sure that Icicle will fuse the sum and count queries such that the data will only be visited once. For calculating the variance and standard deviation, one might naïvely try this:

variance : Element Double -> Aggregate Double
variance v =
  let
    mean'  = mean v              -- Aggregate Double
    count' = count v             -- Aggregate Double
    sq2    = sum ((v - mean')^2) -- Illegal subtraction of Aggregate from Element
  in
    sq2 / count'

But clearly, this has a massive problem: the data must be traversed twice to calculate this query, first to calculate the mean and then to calculate the sum of squared differences. In Icicle, this version of variance is a type error, and we instead provide Welford's numerically stable streaming calculation for variances.
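For readers unfamiliar with Welford's method, here is a minimal sketch in Python. It is not the Icicle prelude's code, and the function name `welford` is chosen for illustration. It keeps a running count, a running mean, and a running sum of squared differences, so the data is visited only once. Like the naive version above, it divides by the count, giving the population variance.

```python
import math
from typing import Iterable, Tuple


def welford(values: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, population variance) after one pass over `values`."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared differences from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses both the old and the updated mean
    variance = m2 / count if count else math.nan
    return count, mean, variance


# An iterator can be consumed only once, which mirrors the single-pass constraint.
count, mean, variance = welford(iter([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
print(count, mean, variance)  # 8 values, mean close to 5, variance close to 4
```

The standard deviation follows as `math.sqrt(variance)`, with no further pass over the data.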

Context

Icicle is designed for, but not dependent on, the ivory data-store. While parts of this document use the terminology of ivory, the problems being addressed are not unique to ivory, and one can adapt these ideas to different contexts. For an idea of what ivory does, see

Facts & Values

Facts are (typed) values, keyed along three dimensions:

  • Entity, which would typically be thought of as the primary key of a row in a traditional database.

  • Attribute, which would typically be thought of as the name of a column in a traditional database.

  • Time, which represents when a fact is valid. Different types of facts may interpret this in different ways (for example, for a state-like value, it indicates that a fact is valid from time t until the next fact with the same entity / attribute and a more recent time). There is no analogue in traditional databases, but this is more common in immutable or append-only data stores.

Values themselves are structured, and may be primitives, structs, or lists of values.
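As a rough illustration of this data model, the sketch below uses Python. The `Fact` class and the `state_as_of` helper are hypothetical names, not part of Icicle or ivory. The helper shows the state-like interpretation of time described above: a fact is valid from its time until the next fact for the same entity / attribute.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, List, Optional


@dataclass(frozen=True)
class Fact:
    entity: str      # comparable to a row's primary key
    attribute: str   # comparable to a column name
    time: date       # when the fact becomes valid
    value: Any       # a primitive, a struct (here a dict), or a list of values


def state_as_of(facts: List[Fact], entity: str, attribute: str, when: date) -> Optional[Any]:
    """Latest value for (entity, attribute) whose time is not after `when`."""
    candidates = [
        f for f in facts
        if f.entity == entity and f.attribute == attribute and f.time <= when
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda f: f.time).value


facts = [
    Fact("customer-1", "plan", date(2020, 1, 1), "free"),
    Fact("customer-1", "plan", date(2020, 6, 1), "pro"),
    Fact("customer-1", "address", date(2020, 3, 1), {"city": "Sydney", "postcode": "2000"}),
]

print(state_as_of(facts, "customer-1", "plan", date(2020, 3, 15)))  # free
print(state_as_of(facts, "customer-1", "plan", date(2020, 7, 1)))   # pro
```

Restricting the lookup to facts at or before `when` is what keeps later information out of a feature, which is the label-leakage concern mentioned earlier.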

Data Processing

Data processing in Ivory (and similar data stores) is heavily parallelized. This places restrictions on how data is processed and how expressions can relate to each other; in most cases these restrictions simplify the design of Icicle.

The basic invariants are:

  • Data is processed in "batches", where a batch has a set of uniform properties:

      • All facts in a batch are for the same entity.

      • All facts in a batch are for the same attribute.

      • Facts in a batch are processed in chronological order.

      • A batch is guaranteed to have all facts for a given entity / attribute.
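The invariants above can be sketched in Python as follows. This is an illustration only, not how Ivory partitions data: the tuple layout and the `to_batches` name are assumptions. Facts are grouped by (entity, attribute), and each group is sorted by time, so every batch is uniform, complete, and chronological.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

# A fact is modelled here as a plain (entity, attribute, time, value) tuple.
FactTuple = Tuple[str, str, int, Any]


def to_batches(facts: Iterable[FactTuple]) -> Dict[Tuple[str, str], List[FactTuple]]:
    """Group facts by (entity, attribute) and order each group by time."""
    batches: Dict[Tuple[str, str], List[FactTuple]] = defaultdict(list)
    for fact in facts:
        entity, attribute, _time, _value = fact
        batches[(entity, attribute)].append(fact)
    for batch in batches.values():
        batch.sort(key=lambda f: f[2])  # chronological order within the batch
    return dict(batches)


facts = [
    ("alice", "purchase", 3, 20.0),
    ("bob", "purchase", 1, 5.0),
    ("alice", "purchase", 1, 12.5),
    ("alice", "login", 2, True),
]

for key, batch in to_batches(facts).items():
    print(key, [(t, v) for (_e, _a, t, v) in batch])
```

Since no batch shares an entity / attribute pair with another, batches can be handed to separate workers without coordination.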

Expressions

Icicle supports a wide variety of expressions, and any query that can be computed in an event-sourcing or streaming manner should be computable in Icicle.

The best place to get a feel for expressions is the ambling document, which gives a run through of some queries, and how Icicle is different to other query languages.

Optimisation

Icicle has a highly optimising backend, which compiles queries to C programs operating on flattened data structures. A great introduction to Icicle's optimisations is a talk by one of its authors, Jacob Stanley: Icicle: The Highs and Lows of Optimising DSLs.

[1] Actually, this isn't numerically stable; in the Icicle prelude, we use a more robust version.