
kevwan / mapreduce

Licence: MIT license
An in-process MapReduce library that helps you optimize service response time or process tasks concurrently.

Programming Languages

go
31211 projects - #10 most used programming language

Projects that are alternatives of or similar to mapreduce

Zio
ZIO — A type-safe, composable library for async and concurrent programming in Scala
Stars: ✭ 3,167 (+3305.38%)
Mutual labels:  concurrent-programming, concurrent
goroutines
provides utilities to perform common tasks on goroutines
Stars: ✭ 19 (-79.57%)
Mutual labels:  concurrent-programming, concurrent
practice
Java concurrent programming and high-concurrency solutions: http://coding.imooc.com/class/195.html Developing an enterprise-grade permission management system in Java: http://coding.imooc.com/class/149.html
Stars: ✭ 39 (-58.06%)
Mutual labels:  concurrent-programming, concurrent
myconcurrent
A systematic study of Java concurrency
Stars: ✭ 25 (-73.12%)
Mutual labels:  concurrent-programming, concurrent
Dashmap
Blazing fast concurrent HashMap for Rust.
Stars: ✭ 1,128 (+1112.9%)
Mutual labels:  concurrent-programming, concurrent
Tascalate Concurrent
Implementation of blocking (IO-Bound) cancellable java.util.concurrent.CompletionStage and related extensions to java.util.concurrent.ExecutorService-s
Stars: ✭ 144 (+54.84%)
Mutual labels:  concurrent-programming, concurrent
java-multithread
Code written for the Multithreading with Java course on the RinaldoDev YouTube channel.
Stars: ✭ 24 (-74.19%)
Mutual labels:  concurrent-programming, concurrent
scalable-concurrent-containers
High performance containers and utilities for concurrent and asynchronous programming
Stars: ✭ 101 (+8.6%)
Mutual labels:  concurrent-programming
jdk-source-code-reading
JDK source code reading
Stars: ✭ 19 (-79.57%)
Mutual labels:  concurrent
ooso
Java library for running Serverless MapReduce jobs
Stars: ✭ 25 (-73.12%)
Mutual labels:  mapreduce
talepy
📚Coordinate "transactions" across a number of services in python
Stars: ✭ 20 (-78.49%)
Mutual labels:  concurrent
durablefunctions-mapreduce-dotnet
An implementation of MapReduce on top of C# Durable Functions over the NYC 2017 Taxi dataset to compute average ride time per-day
Stars: ✭ 20 (-78.49%)
Mutual labels:  mapreduce
concurrent-data-structure
Concurrent Data Structure for Rust
Stars: ✭ 18 (-80.65%)
Mutual labels:  concurrent
TAOMP
Implementations of the example code from the book The Art of Multiprocessor Programming, with comments and unit tests
Stars: ✭ 39 (-58.06%)
Mutual labels:  concurrent-programming
asyncoro
Python framework for asynchronous, concurrent, distributed, network programming with coroutines
Stars: ✭ 50 (-46.24%)
Mutual labels:  concurrent-programming
LockFreeSkipList
A set implementation based on lockfree skiplist.
Stars: ✭ 14 (-84.95%)
Mutual labels:  concurrent-programming
treap
A thread-safe, persistent Treap (tree + heap) for ordered key-value mapping and priority sorting.
Stars: ✭ 23 (-75.27%)
Mutual labels:  concurrent
web-click-flow
Offline log analysis of website clickstreams
Stars: ✭ 14 (-84.95%)
Mutual labels:  mapreduce
blacklight
a stack-based concatenative virtual machine for implementing highly concurrent languages
Stars: ✭ 42 (-54.84%)
Mutual labels:  concurrent-programming
web-scraping-engine
A simple web scraping engine supporting concurrent and anonymous scraping
Stars: ✭ 27 (-70.97%)
Mutual labels:  concurrent

mapreduce


Why we have this repo

mapreduce is part of go-zero, but several people asked whether it can be used separately, hence this repo. I still recommend using go-zero itself, which offers many more features.

Why MapReduce is needed

In practical business scenarios, we often need to fetch properties from several different RPC services and assemble them into one complex object.

For example, querying product details may involve:

  1. product service - query product attributes
  2. inventory service - query inventory properties
  3. price service - query price attributes
  4. marketing service - query marketing properties

If the calls are made serially, the response time grows linearly with the number of RPC calls, so we generally switch from serial to parallel calls to optimize response time.

For simple scenarios, WaitGroup is enough. But what if we also need to validate the data returned by the RPC calls, process it, and aggregate it? The Go standard library has no such tool (Java provides CompletableFuture), so we implemented an in-process concurrent batch-processing tool based on the MapReduce model.
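To make the simple case concrete, here is a minimal sketch of the WaitGroup approach using only the standard library. The ProductDetail type, the simulateRPC helper, and all returned values are hypothetical stand-ins for real service calls; they are not part of this library.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// ProductDetail is a hypothetical aggregate assembled from four services.
type ProductDetail struct {
	Attributes string
	Inventory  int
	Price      float64
	Marketing  string
}

// simulateRPC stands in for a real RPC call; it only sleeps briefly.
func simulateRPC() {
	time.Sleep(10 * time.Millisecond)
}

// fetchProductDetail runs the four simulated calls in parallel
// and waits for all of them to finish.
func fetchProductDetail() ProductDetail {
	var (
		detail ProductDetail
		wg     sync.WaitGroup
	)
	// Each goroutine writes to a distinct field, so no mutex is needed.
	wg.Add(4)
	go func() { defer wg.Done(); simulateRPC(); detail.Attributes = "color=red" }()
	go func() { defer wg.Done(); simulateRPC(); detail.Inventory = 42 }()
	go func() { defer wg.Done(); simulateRPC(); detail.Price = 9.99 }()
	go func() { defer wg.Done(); simulateRPC(); detail.Marketing = "10% off" }()
	wg.Wait()
	return detail
}

func main() {
	start := time.Now()
	detail := fetchProductDetail()
	fmt.Printf("%+v (took %v)\n", detail, time.Since(start).Round(time.Millisecond))
}
```

The total time is roughly that of the slowest call rather than the sum of all four. Note what the sketch leaves out: error handling, early termination, and any processing or aggregation of the results, which is the gap described above.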

Design ideas

Let's sort out the possible business scenarios for the concurrency tool:

  1. Querying product details: call multiple services concurrently to assemble the product attributes, and end the whole operation immediately if any call fails.
  2. Automatically recommending user coupons on the product details page: verify the coupons concurrently, automatically drop those that fail verification, and return all the remaining ones.

Both scenarios process input data and finally output the cleaned data. There is a classic asynchronous pattern for this kind of data processing: the producer-consumer pattern. We can therefore abstract the life cycle of batch data processing into roughly three phases:

  1. data production (generate)
  2. data processing (mapper)
  3. data aggregation (reducer)

Data production is the only mandatory stage; data processing and data aggregation are optional. Production and processing support concurrent execution, while aggregation is essentially a pure in-memory operation, so a single goroutine is enough for it.

Since the different stages of data processing are performed by different goroutines, it is natural to use channels for communication between them.
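The three phases and their channel-based communication can be sketched with the standard library alone. This is an illustration of the idea under simple assumptions, not the library's actual implementation; the function name sumOfSquares and the workers parameter are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// sumOfSquares wires generate -> mapper -> reducer with plain channels.
// workers is a hypothetical parameter controlling mapper concurrency.
func sumOfSquares(n, workers int) int {
	source := make(chan int)
	mapped := make(chan int)

	// Generate: produce the input items, then close the channel.
	go func() {
		defer close(source)
		for i := 0; i < n; i++ {
			source <- i
		}
	}()

	// Mapper: several goroutines process items concurrently.
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range source {
				mapped <- i * i
			}
		}()
	}

	// Close the mapped channel once every mapper has finished.
	go func() {
		wg.Wait()
		close(mapped)
	}()

	// Reducer: a single goroutine (here, the caller) aggregates in memory.
	sum := 0
	for v := range mapped {
		sum += v
	}
	return sum
}

func main() {
	fmt.Println("result:", sumOfSquares(10, 4)) // prints "result: 285"
}
```

Closing each channel when its producers are done is what lets the next stage finish its range loop, so the stages shut down in order without extra signalling.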

How can I terminate the process at any time?

It's simple: in each goroutine, listen on a channel or on the given context for the termination signal.
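A minimal sketch of that idea, assuming a context is used as the termination signal: the producer selects on ctx.Done() alongside every send, so it stops as soon as the caller cancels. The names produce and firstN are hypothetical and only illustrate the pattern; the library's own internals may differ.

```go
package main

import (
	"context"
	"fmt"
)

// produce sends 0, 1, 2, ... until ctx is cancelled, then closes out.
func produce(ctx context.Context, out chan<- int) {
	defer close(out)
	for i := 0; ; i++ {
		select {
		case out <- i:
		case <-ctx.Done():
			// Termination requested: stop producing.
			return
		}
	}
}

// firstN collects the first n values and then cancels the producer.
func firstN(n int) []int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	out := make(chan int)
	go produce(ctx, out)

	var got []int
	for len(got) < n {
		got = append(got, <-out)
	}
	cancel() // tell the producer goroutine to stop
	return got
}

func main() {
	fmt.Println(firstN(3)) // prints "[0 1 2]"
}
```

Without the ctx.Done() case, the producer would block forever on its next send once the caller stops receiving, leaking the goroutine.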

A simple example

Calculate a sum of squares to simulate concurrent processing.

package main

import (
    "fmt"
    "log"

    "github.com/kevwan/mapreduce"
)

func main() {
    val, err := mapreduce.MapReduce(func(source chan<- int) {
        // generator
        for i := 0; i < 10; i++ {
            source <- i
        }
    }, func(i int, writer mapreduce.Writer[int], cancel func(error)) {
        // mapper
        writer.Write(i * i)
    }, func(pipe <-chan int, writer mapreduce.Writer[int], cancel func(error)) {
        // reducer
        var sum int
        for i := range pipe {
            sum += i
        }
        writer.Write(sum)
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("result:", val)
}

More examples: https://github.com/zeromicro/zero-examples/tree/main/mapreduce

References

go-zero: https://github.com/zeromicro/go-zero

Give a Star!

If you like or are using this project to learn or start your solution, please give it a star. Thanks!
