ab180 / lrmr

Licence: other
Less-Resilient MapReduce framework for Go

lrmr

Online MapReduce framework for Go, capable of running jobs in under a second.

  • Sacrifices resilience for speed
  • Scales easily across distributed clusters
  • Embeds easily in existing applications
  • Uses etcd for cluster management / coordination

Example (Driver)

package main

import (
	"context"
	"fmt"

	"github.com/ab180/lrmr"
	. "github.com/ab180/lrmr/test"
)

func main() {
	cluster, err := lrmr.ConnectToCluster()
	if err != nil {
		panic(err)
	}
	defer cluster.Close()

	result, err := lrmr.FromLocalFile("./test/testdata/unpacked/").
		FlatMap(DecodeCSV()).
		GroupByKey().
		Reduce(Count()).
		RunAndCollect(context.Background(), cluster)

	if err != nil {
		panic(err)
	}
	fmt.Println("Outputs:", result.Outputs)
	fmt.Println("Metrics:", result.Metrics.String())
}
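
The driver above chains three stages: FlatMap turns each input into records, GroupByKey brings records with the same key together, and Reduce folds each group into one value. The following is a minimal single-process sketch of that data flow in plain Go. It does not use lrmr. It assumes each CSV row is keyed by its first column and that counting means tallying rows per key, neither of which this README confirms. The names `record`, `flatMapCSV`, and `countByKey` are hypothetical.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"sort"
	"strings"
)

// record is a hypothetical keyed row; lrmr's own row type is not shown in this README.
type record struct {
	Key   string
	Value []string
}

// flatMapCSV plays the role of the FlatMap stage: one input document
// yields zero or more records. Keying by the first column is an assumption.
func flatMapCSV(doc string) ([]record, error) {
	rows, err := csv.NewReader(strings.NewReader(doc)).ReadAll()
	if err != nil {
		return nil, err
	}
	out := make([]record, 0, len(rows))
	for _, row := range rows {
		if len(row) == 0 {
			continue
		}
		out = append(out, record{Key: row[0], Value: row[1:]})
	}
	return out, nil
}

// countByKey folds the GroupByKey and Reduce stages into one pass:
// the map groups records by key, and the increment is the count reduction.
func countByKey(records []record) map[string]int {
	counts := make(map[string]int)
	for _, r := range records {
		counts[r.Key]++
	}
	return counts
}

func main() {
	docs := []string{"a,1\nb,2\n", "a,3\n"}

	var all []record
	for _, d := range docs {
		recs, err := flatMapCSV(d)
		if err != nil {
			panic(err)
		}
		all = append(all, recs...)
	}

	counts := countByKey(all)

	// Sort keys so the printed order is deterministic.
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %d\n", k, counts[k])
	}
}
```

In a cluster the same three steps run across executors rather than in one loop, which is why the driver hands the pipeline to `RunAndCollect` along with the cluster connection.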

Example (Executor)

An executor is a worker in the distributed cluster that runs jobs submitted by the driver.

package main

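import (
	"log"

	"github.com/ab180/lrmr"
	// Blank import: no identifier from the test package is referenced here.
	// Presumably it is imported so the stages used by the driver example are available.
	_ "github.com/ab180/lrmr/test"
)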

func main() {
	c, err := lrmr.ConnectToCluster()
	if err != nil {
		log.Fatalf("failed to join cluster: %v", err)
	}
	// NOTE: opt (the executor options) is not defined in this excerpt; define it before building.
	exec, err := lrmr.NewExecutor(c, opt)
	if err != nil {
		log.Fatalf("failed to initiate executor: %v", err)
	}
	defer exec.Close()

	if err := exec.Start(); err != nil {
		log.Fatalf("failed to start executor: %v", err)
	}
}
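
For grouping by key to work across several executors, rows with the same key have to end up on the same worker. The sketch below imitates that in a single process: goroutines stand in for executors and a hash of the key picks the worker. It only illustrates the idea. How lrmr actually partitions and ships rows is not described in this README, and `partitionOf` and `runWorkers` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// partitionOf maps a key to one of n workers, so equal keys always
// go to the same worker. FNV-1a is an arbitrary choice for this sketch.
func partitionOf(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(n))
}

// runWorkers starts n goroutines (stand-ins for executors), routes each key
// to the worker that owns it, and merges the per-worker counts at the end.
// n must be greater than zero.
func runWorkers(keys []string, n int) map[string]int {
	inputs := make([]chan string, n)
	partials := make([]map[string]int, n)

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		inputs[i] = make(chan string)
		partials[i] = make(map[string]int)
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each worker only touches its own partial map.
			for k := range inputs[i] {
				partials[i][k]++
			}
		}(i)
	}

	// The "driver" side: send each key to its partition, then signal completion.
	for _, k := range keys {
		inputs[partitionOf(k, n)] <- k
	}
	for _, ch := range inputs {
		close(ch)
	}
	wg.Wait()

	// A key lives in exactly one partition, so merging is a plain union.
	merged := make(map[string]int)
	for _, p := range partials {
		for k, c := range p {
			merged[k] += c
		}
	}
	return merged
}

func main() {
	counts := runWorkers([]string{"a", "b", "a", "c", "a"}, 3)
	fmt.Println(counts["a"], counts["b"], counts["c"])
}
```

If a worker in this sketch failed midway, its partial counts would simply be lost. That is the kind of resilience the feature list says lrmr gives up in exchange for speed.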

Building and Developing lrmr

Requirements

  • Go 1.19 or above

LICENSE: MIT