All Projects → feast-dev → Feast

feast-dev / Feast

Licence: apache-2.0
Feature Store for Machine Learning

Programming Languages

python
139335 projects - #7 most used programming language
java
68154 projects - #9 most used programming language
shell
77523 projects
HCL
1544 projects
go
31211 projects - #10 most used programming language
Makefile
30231 projects

Projects that are alternatives of or similar to Feast

Transmogrifai
TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (-19.1%)
Mutual labels:  features, spark, ml, feature-engineering
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-96.93%)
Mutual labels:  spark, big-data, data-engineering
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-94.1%)
Mutual labels:  spark, big-data, data-engineering
Mmlspark
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+12.54%)
Mutual labels:  spark, ml, big-data
Rsparkling
RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-97.48%)
Mutual labels:  spark, big-data
Spark Doc Zh
Apache Spark 官方文档中文版
Stars: ✭ 1,126 (-56.29%)
Mutual labels:  spark, big-data
Labs
Research on distributed system
Stars: ✭ 73 (-97.17%)
Mutual labels:  spark, big-data
Cookbook
The Data Engineering Cookbook
Stars: ✭ 9,829 (+281.56%)
Mutual labels:  big-data, data-engineering
Spark Website
Apache Spark Website
Stars: ✭ 75 (-97.09%)
Mutual labels:  spark, big-data
Home
ApacheCN 开源组织:公告、介绍、成员、活动、交流方式
Stars: ✭ 1,199 (-53.45%)
Mutual labels:  spark, ml
Logisland
Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink being in the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready to use processors, data sources and sinks are available.
Stars: ✭ 97 (-96.23%)
Mutual labels:  spark, big-data
Waimak
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-97.67%)
Mutual labels:  spark, data-engineering
Docker Spark Cluster
A Spark cluster setup running on Docker containers
Stars: ✭ 57 (-97.79%)
Mutual labels:  spark, big-data
Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Stars: ✭ 71 (-97.24%)
Mutual labels:  spark, big-data
Spark Tda
SparkTDA is a package for Apache Spark providing Topological Data Analysis Functionalities.
Stars: ✭ 45 (-98.25%)
Mutual labels:  spark, ml
Spark
Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+1127.41%)
Mutual labels:  spark, big-data
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (-48.06%)
Mutual labels:  spark, big-data
Bigdata Notes
大数据入门指南 ⭐
Stars: ✭ 10,991 (+326.67%)
Mutual labels:  spark, big-data
Just Dashboard
📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (-41.34%)
Mutual labels:  big-data, data-engineering
Autodl
Automated Deep Learning without ANY human intervention. 1'st Solution for AutoDL [email protected]
Stars: ✭ 854 (-66.85%)
Mutual labels:  big-data, feature-engineering


unit-tests integration-tests-and-build java-integration-tests linter Docs Latest Python API License GitHub Release

Overview

Feast is an open source feature store for machine learning. Feast is the fastest path to productionizing analytic data for model training and online inference.

Please see our documentation for more information about the project.

📐 Architecture

The above architecture is the minimal Feast deployment. Want to run the full Feast on GCP/AWS? Click here.

🐣 Getting Started

1. Install Feast

pip install feast

2. Create a feature repository

feast init my_feature_repo
cd my_feature_repo

3. Register your feature definitions and set up your feature store

feast apply

4. Build a training dataset

from feast import FeatureStore
import pandas as pd
from datetime import datetime

entity_df = pd.DataFrame.from_dict({
    "driver_id": [1001, 1002, 1003, 1004],
    "event_timestamp": [
        datetime(2021, 4, 12, 10, 59, 42),
        datetime(2021, 4, 12, 8,  12, 10),
        datetime(2021, 4, 12, 16, 40, 26),
        datetime(2021, 4, 12, 15, 1 , 12)
    ]
})

store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    features = [
        'driver_hourly_stats:conv_rate',
        'driver_hourly_stats:acc_rate',
        'driver_hourly_stats:avg_daily_trips'
    ],
).to_df()

print(training_df.head())

# Train model
# model = ml.fit(training_df)
            event_timestamp  driver_id  conv_rate  acc_rate  avg_daily_trips
0 2021-04-12 08:12:10+00:00       1002   0.713465  0.597095              531
1 2021-04-12 10:59:42+00:00       1001   0.072752  0.044344               11
2 2021-04-12 15:01:12+00:00       1004   0.658182  0.079150              220
3 2021-04-12 16:40:26+00:00       1003   0.162092  0.309035              959

5. Load feature values into your online store

CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME
Materializing feature view driver_hourly_stats from 2021-04-14 to 2021-04-15 done!

6. Read online features at low latency

from pprint import pprint
from feast import FeatureStore

store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    features=[
        'driver_hourly_stats:conv_rate',
        'driver_hourly_stats:acc_rate',
        'driver_hourly_stats:avg_daily_trips'
    ],
    entity_rows=[{"driver_id": 1001}]
).to_dict()

pprint(feature_vector)

# Make prediction
# model.predict(feature_vector)
{
    "driver_id": [1001],
    "driver_hourly_stats__conv_rate": [0.49274],
    "driver_hourly_stats__acc_rate": [0.92743],
    "driver_hourly_stats__avg_daily_trips": [72]
}

📦 Functionality and Roadmap

The list below contains the functionality that contributors are planning to develop for Feast

🎓 Important Resources

Please refer to the official documentation at Documentation

👋 Contributing

Feast is a community project and is still under active development. Please have a look at our contributing and development guides if you want to contribute to the project:

Contributors

Thanks goes to these incredible people:

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].