snowplow / Snowplow

Licence: apache-2.0
The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP

Programming Languages

Scala, PLpgSQL, Shell, Python, JavaScript, Thrift


Snowplow

Overview

Snowplow is an enterprise-strength marketing and product analytics platform. It does three things:

  1. Identifies your users, and tracks the way they engage with your website or application
  2. Stores your users' behavioral data in a scalable "event data warehouse" you control: Amazon Redshift, Google BigQuery, Snowflake or Elasticsearch
  3. Lets you analyze that behavioral data with a wide range of tools, including big data tools (e.g. Spark) via EMR and more traditional tools such as Looker, Mode, Superset and Re:dash

To find out more, please check out the Snowplow website and the docs website.

Version Compatibility Matrix

The version compatibility matrix, which can be found within our docs, lists our recommended stack. When setting up a Snowplow pipeline, we strongly recommend using the versions listed there.

Public Roadmap

This repository also contains the Snowplow Public Roadmap. The Public Roadmap lets you stay up to date and find out what's happening on the Snowplow Platform. Help us prioritize our cards: open the issue and leave a 👍 to vote for your favorites. Want us to build a feature or function? Tell us by heading to our Discourse forum 💬.

Try Snowplow

Setting up a full open-source Snowplow pipeline requires a non-trivial amount of engineering expertise and time investment. To find out what Snowplow can do first, you can set up Try Snowplow.

Open Source Quick Start

The Open Source Quick Start will help you get up and running with a Snowplow open source pipeline. Snowplow publishes a set of Terraform modules that automate the setup and deployment of the infrastructure and applications required for an operational Snowplow open source pipeline, with just a handful of input variables required on your side.

Join the Snowplow Research Panel and help shape the future of open source

As part of our ongoing efforts to improve the Snowplow Open Source experience, we're looking for users of our open-source software and members of our community to take part in research studies. Join here.

Our Commercial Offering

If you would like everything set up and managed for you, consider Snowplow BDP. You can also request a demo.

Snowplow technology 101

Snowplow architecture

The repository structure follows the conceptual architecture of Snowplow, which consists of six loosely-coupled sub-systems connected by five standardized data protocols/formats.

To briefly explain these six sub-systems:

  • Trackers fire Snowplow events. Currently we have 15 trackers, covering web, mobile, desktop, server and IoT
  • Collector receives Snowplow events from trackers. Currently we have one official collector implementation with different sinks: Amazon Kinesis, Google PubSub, Amazon SQS, Apache Kafka and NSQ
  • Enrich cleans up the raw Snowplow events, enriches them and puts them into storage. Currently we have several implementations, built for different environments (GCP, AWS, Apache Kafka) and one core library
  • Storage is where the Snowplow events live. Currently we store the Snowplow events in a flat file structure on S3, and in the Redshift, Postgres, Snowflake and BigQuery databases
  • Data modeling is where event-level data is joined with other data sets and aggregated into smaller data sets, and business logic is applied. This produces a clean set of tables which make it easier to perform analysis on the data. We officially support data models for Redshift, Snowflake and BigQuery.
  • Analytics are performed on the Snowplow events or on the aggregate tables.
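As a rough illustration of how these sub-systems hand data to one another, here is a minimal Python sketch. It is a toy model only and assumes nothing about the real components: the `Event` class, the function names and the "section" enrichment are all hypothetical and are not part of any Snowplow API. Storage is stood in for by an in-memory list.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    """Hypothetical, heavily simplified event; real Snowplow events carry many more fields."""
    user_id: str
    page: str
    enrichments: Dict[str, str] = field(default_factory=dict)


def track(user_id: str, page: str) -> Event:
    """Tracker: fires an event."""
    return Event(user_id=user_id, page=page)


def collect(events: List[Event]) -> List[Event]:
    """Collector: receives events from trackers and passes them downstream."""
    return list(events)


def enrich(raw: List[Event]) -> List[Event]:
    """Enrich: drops invalid events and attaches extra context to the rest."""
    good = [e for e in raw if e.user_id and e.page]
    for e in good:
        # Illustrative enrichment: derive a site section from the page path.
        e.enrichments["section"] = e.page.strip("/").split("/")[0] or "home"
    return good


def model(stored: List[Event]) -> Dict[str, int]:
    """Data modeling: aggregates event-level data into a smaller table (page views per user)."""
    views: Dict[str, int] = defaultdict(int)
    for e in stored:
        views[e.user_id] += 1
    return dict(views)


raw = collect([
    track("u1", "/blog/post-1"),
    track("u1", "/pricing"),
    track("", "/broken"),      # missing user_id, dropped during enrichment
    track("u2", "/"),
])
storage = enrich(raw)          # Storage: here just an in-memory list
page_views = model(storage)    # Analytics would query this aggregate table
print(page_views)
```

In the real platform each stage is a separate, loosely-coupled component that communicates through the standardized data protocols/formats mentioned above, rather than through in-process function calls.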

For more information on the current Snowplow architecture, please see the Technical architecture.

About this repository

This repository is an umbrella repository for all loosely-coupled Snowplow components and is updated on each component release.

Since June 2020, all components have been extracted into their dedicated repositories (more info here) and this repository serves as an entry point for Snowplow users, the home of our public roadmap and as a historical artifact.

Components that have been extracted to their own repository are still here as git submodules.

Trackers

Web

Mobile

Desktop & Server

Collector

Enrich

Loaders

Iglu

Data modeling

Web

Mobile

Testing

Parsing enriched event

Bad rows

Terraform Modules

Need help?

We want to make it super-easy for Snowplow users and contributors to talk to us and connect with each other, to share ideas, solve problems and help make Snowplow awesome. Here are the main channels we're running currently. We'd love to hear from you on one of them:

Discourse

This is for all Snowplow users: engineers setting up Snowplow, data modelers structuring the data and data consumers building insights. You can find guides, recipes, questions and answers from Snowplow users including the Snowplow team.

We welcome all questions and contributions!

Twitter

@SnowplowData for official news or @SnowplowLabs for engineering-heavy conversations and release updates.

GitHub

If you spot a bug, please raise an issue in the GitHub repository of the component in question. Likewise, if you have developed a cool new feature or an improvement, please open a pull request. We'll be glad to integrate it into the codebase!

If you want to brainstorm a potential new feature, then Discourse is the best place to start.

Email

[email protected]

If you want to talk directly to us (e.g. about a commercially sensitive issue), email is the easiest way.

Copyright and license

Snowplow is copyright 2012-2021 Snowplow Analytics Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
