joshday / OnlineStatsBase.jl

Licence: other
Base types for OnlineStats.

Programming Languages

julia

Projects that are alternatives of or similar to OnlineStatsBase.jl

Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (+850%)
Mutual labels:  big-data, streaming-data
Onlinestats.jl
Single-pass algorithms for statistics
Stars: ✭ 507 (+1850%)
Mutual labels:  big-data, streaming-data
nebula
A distributed block-based data storage and compute engine
Stars: ✭ 127 (+388.46%)
Mutual labels:  big-data
ByteSlice
"Byteslice: Pushing the envelop of main memory data processing with a new storage layout" (SIGMOD'15)
Stars: ✭ 24 (-7.69%)
Mutual labels:  big-data
xcast
A High-Performance Data Science Toolkit for the Earth Sciences
Stars: ✭ 28 (+7.69%)
Mutual labels:  big-data
cloudberry
Big Data Visualization
Stars: ✭ 89 (+242.31%)
Mutual labels:  big-data
awesome-bigdata
A curated list of awesome big data frameworks, resources and other awesomeness.
Stars: ✭ 11,093 (+42565.38%)
Mutual labels:  streaming-data
sparkucx
A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer
Stars: ✭ 32 (+23.08%)
Mutual labels:  big-data
godsend
A simple and eloquent workflow for streaming messages to micro-services.
Stars: ✭ 15 (-42.31%)
Mutual labels:  streaming-data
bigquery-kafka-connect
☁️ nodejs kafka connect connector for Google BigQuery
Stars: ✭ 17 (-34.62%)
Mutual labels:  big-data
Big-Data-Demo
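A data visualization demo project based on Vue, three.js, and echarts, including 3D model import and interaction, 3D model annotation, and other features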
Stars: ✭ 146 (+461.54%)
Mutual labels:  big-data
arrow-datafusion
Apache Arrow DataFusion SQL Query Engine
Stars: ✭ 2,360 (+8976.92%)
Mutual labels:  big-data
insightedge
InsightEdge Core
Stars: ✭ 22 (-15.38%)
Mutual labels:  big-data
talaria
TalariaDB is a distributed, highly available, and low latency time-series database for Presto
Stars: ✭ 148 (+469.23%)
Mutual labels:  big-data
incubator-liminal
Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.
Stars: ✭ 117 (+350%)
Mutual labels:  big-data
MLBD
Materials for "Machine Learning on Big Data" course
Stars: ✭ 20 (-23.08%)
Mutual labels:  big-data
beekeeper
Service for automatically managing and cleaning up unreferenced data
Stars: ✭ 43 (+65.38%)
Mutual labels:  big-data
LoL-Match-Prediction
Win probability predictions for League of Legends matches using neural networks
Stars: ✭ 34 (+30.77%)
Mutual labels:  big-data
meetups-archivos
Slides, code, and videos from the meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to disseminate and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops, and Semilleros …
Stars: ✭ 60 (+130.77%)
Mutual labels:  big-data
Twitter-Stream-API-Dataset
Twitter Dynamic Dataset Api. Create any dataset YOU want.
Stars: ✭ 20 (-23.08%)
Mutual labels:  streaming-data



OnlineStatsBase

This package defines the basic types and interface for OnlineStats.



Interface

Required

  • _fit!(stat, y): Update the "sufficient statistics" of the estimator from a single observation y.

Required (with Defaults)

  • value(stat, args...; kw...) = <first field of struct>: Calculate the value of the estimator from the "sufficient statistics".
  • nobs(stat) = stat.n: Return the number of observations.

Optional

  • _merge!(stat1, stat2): Merge stat2 into stat1 (an error by default in OnlineStatsBase versions >= 1.5).
  • Base.empty!(stat): Return the stat to its initial state (an error by default).
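
As a rough sketch of how the required and optional methods above fit together, the following defines a hypothetical MyMax stat (a running maximum; the type and its field layout are illustrative and not part of OnlineStatsBase). It relies on the defaults described above: value returns the first field and nobs returns the n field. It also assumes that merge! forwards to _merge!, as the interface description suggests.

```julia
using OnlineStatsBase

# Hypothetical example type (not part of OnlineStatsBase): a running maximum.
mutable struct MyMax <: OnlineStat{Number}
    value::Float64   # first field, so the default value(stat) returns it
    n::Int           # field n, so the default nobs(stat) returns it
    MyMax() = new(-Inf, 0)
end

# Required: update the state from a single observation y.
function OnlineStatsBase._fit!(o::MyMax, y)
    o.n += 1
    o.value = max(o.value, y)
end

# Optional: fold the state of b into a.
function OnlineStatsBase._merge!(a::MyMax, b::MyMax)
    a.n += b.n
    a.value = max(a.value, b.value)
end

# Optional: return the stat to its initial state.
function Base.empty!(o::MyMax)
    o.value = -Inf
    o.n = 0
    return o
end

a = fit!(MyMax(), [1, 5, 3])
b = fit!(MyMax(), [2, 9])
merge!(a, b)                 # a now summarizes all five observations

merged_value = value(a)      # 9.0
merged_n = nobs(a)           # 5

empty!(a)                    # back to the initial state: value -Inf, n = 0
```

Keeping value as the first field and the count in n means neither value nor nobs needs its own method here; a stat with a different layout would have to define them explicitly.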



Example

  • Make a subtype of OnlineStat and give it a _fit!(::OnlineStat{T}, y::T) method.
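  • T is the type of a single observation. Make sure it is wide enough to cover every observation you intend to pass (e.g. Number rather than Float64 if integers may arrive), since fit! only dispatches to _fit! for values of type T.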
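using OnlineStatsBase

# T = Number, so any numeric observation is accepted.
mutable struct MyMean <: OnlineStat{Number}
    value::Float64   # first field: returned by the default value(stat)
    n::Int           # observation count: returned by the default nobs(stat)
    MyMean() = new(0.0, 0)
end
# Running-mean update: move the current mean toward y by a step of 1/n.
function OnlineStatsBase._fit!(o::MyMean, y)
    o.n += 1
    o.value += (1 / o.n) * (y - o.value)
end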



That's all there is to it!

y = randn(1000)

o = fit!(MyMean(), y)
# MyMean: n=1_000 | value=0.0530535
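
Because the state is only the running mean and the count, the same object can keep absorbing data as it arrives. The sketch below repeats the MyMean definition so that it runs on its own, feeds the data in two chunks plus one extra observation, and compares the result with the batch mean from the Statistics standard library. The chunk boundaries and the extra value 2.5 are arbitrary choices for illustration.

```julia
using OnlineStatsBase
using Statistics: mean

# Same definition as in the example above, repeated so this snippet is self-contained.
mutable struct MyMean <: OnlineStat{Number}
    value::Float64
    n::Int
    MyMean() = new(0.0, 0)
end
function OnlineStatsBase._fit!(o::MyMean, y)
    o.n += 1
    o.value += (1 / o.n) * (y - o.value)
end

y = randn(1000)

o = MyMean()
fit!(o, y[1:500])       # first chunk
fit!(o, y[501:1000])    # a later chunk updates the same object
fit!(o, 2.5)            # a single observation is accepted as well

n_seen = nobs(o)        # 1001
batch = mean(vcat(y, 2.5))
matches = isapprox(value(o), batch; atol = 1e-10)   # agrees up to rounding error
```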