All Projects → haptork → Easylambda

haptork / Easylambda

Licence: other
distributed dataflows with functional list operations for data processing with C++14

Programming Languages

cpp14
131 projects

Projects that are alternatives of or similar to Easylambda

t8code
Parallel algorithms and data structures for tree-based AMR with arbitrary element shapes.
Stars: ✭ 37 (-92.21%)
Mutual labels:  hpc, parallel, mpi
Core
parallel finite element unstructured meshes
Stars: ✭ 124 (-73.89%)
Mutual labels:  parallel, hpc, mpi
Raftlib
The RaftLib C++ library, streaming/dataflow concurrency via C++ iostream-like operators
Stars: ✭ 717 (+50.95%)
Mutual labels:  parallel, dataflow-programming, hpc
Hpcinfo
Information about many aspects of high-performance computing. Wiki content moved to ~/docs.
Stars: ✭ 171 (-64%)
Mutual labels:  parallel, hpc, mpi
future.batchtools
🚀 R package future.batchtools: A Future API for Parallel and Distributed Processing using batchtools
Stars: ✭ 77 (-83.79%)
Mutual labels:  hpc, parallel, distributed-computing
hp2p
Heavy Peer To Peer: a MPI based benchmark for network diagnostic
Stars: ✭ 17 (-96.42%)
Mutual labels:  hpc, parallel, mpi
Future.apply
🚀 R package: future.apply - Apply Function to Elements in Parallel using Futures
Stars: ✭ 159 (-66.53%)
Mutual labels:  parallel, distributed-computing, hpc
ParMmg
Distributed parallelization of 3D volume mesh adaptation
Stars: ✭ 19 (-96%)
Mutual labels:  hpc, parallel, mpi
ParallelUtilities.jl
Fast and easy parallel mapreduce on HPC clusters
Stars: ✭ 28 (-94.11%)
Mutual labels:  hpc, parallel, distributed-computing
pystella
A code generator for grid-based PDE solving on CPUs and GPUs
Stars: ✭ 18 (-96.21%)
Mutual labels:  hpc, mpi
cruise
User space POSIX-like file system in main memory
Stars: ✭ 27 (-94.32%)
Mutual labels:  hpc, parallel
dbcsr
DBCSR: Distributed Block Compressed Sparse Row matrix library
Stars: ✭ 65 (-86.32%)
Mutual labels:  hpc, mpi
hyperqueue
Scheduler for sub-node tasks for HPC systems with batch scheduling
Stars: ✭ 48 (-89.89%)
Mutual labels:  hpc, distributed-computing
zmq
ZeroMQ based distributed patterns
Stars: ✭ 27 (-94.32%)
Mutual labels:  parallel, distributed-computing
yask
YASK--Yet Another Stencil Kit: a domain-specific language and framework to create high-performance stencil code for implementing finite-difference methods and similar applications.
Stars: ✭ 81 (-82.95%)
Mutual labels:  hpc, mpi
muster
Massively Scalable Clustering
Stars: ✭ 22 (-95.37%)
Mutual labels:  parallel, mpi
wxparaver
wxParaver is a trace-based visualization and analysis tool designed to study quantitative detailed metrics and obtain qualitative knowledge of the performance of applications, libraries, processors and whole architectures.
Stars: ✭ 23 (-95.16%)
Mutual labels:  hpc, mpi
tbslas
A parallel, fast solver for the scalar advection-diffusion and the incompressible Navier-Stokes equations based on semi-Lagrangian/Volume-Integral method.
Stars: ✭ 21 (-95.58%)
Mutual labels:  parallel, mpi
libquo
Dynamic execution environments for coupled, thread-heterogeneous MPI+X applications
Stars: ✭ 21 (-95.58%)
Mutual labels:  hpc, mpi
PartitionedArrays.jl
Vectors and sparse matrices partitioned into pieces for parallel distributed-memory computations.
Stars: ✭ 45 (-90.53%)
Mutual labels:  hpc, mpi

Build Status codecov

ezl: easyLambda

Parallel data processing made easy using functional and dataflow programming with modern C++

Welcome to easyLambda and thanks for your interest. The site aims to be a comprehensive guide for easyLambda.

What is easyLambda

EasyLambda is header only C++14 library for data processing in parallel with functional list operations (map, filter, reduce, scan, zip) that are tied together in type--safe dataflow.

EasyLambda is parallel, it scales from multiple cores to hundreds of distributed nodes without any need to deal with parallelism in user code.

EasyLambda is fast. It has minimal overhead in serial execution and builds upon high performance MPI parallelism that is known to be more efficient than any other comparable work [1].

EasyLambda is expressive and succinct, thanks to the column selection for composition of functions and many generic algorithms such as configurable parallel file reader, predicates, correlation, summary etc.

EasyLambda is intuitive and easy to understand with its uniform property based (or ExpressionBuilder) interface for everything from configuring parallelism to changing behavior of generic algorithms to routing dataflow.

EasyLambda is easily interoperable with other libraries like standard library or raw MPI code, since it uses standard data types and enforces no special structure, data-types or requirements on the user functions.

Why easyLambda

EasyLambda is a good fit for the following tasks:

  • table/list processing and analysis from CSV or flat text files.
  • post-processing of scientific simulation results.
  • running iterative machine learning algorithms.
  • parallel type-safe data reading.
  • to play with dataflow programming and functional list operations.

Since, it can smoothly interoperate with other libraries, it is possible to add distributed parallelism using easyLambda to the existing libraries or codebase when its programming abstraction fits well e.g. it can be used along with bare MPI code or with a machine learning library to add distributed training and testing.

EasyLambda will also interest you if you

  • are a modern C++ enthusiast
  • want to dabble with metaprogramming
  • like functional and dataflow programming
  • have cluster resources that you want to put to use in everyday tasks without much effort.
  • have always wanted a high-level MPI interface.

Benchmarks

EasyLambda combines the efficiency of MPI with a high level programming abstraction. With easyLambda you get easy to understand code with good run-time performance. Check out the benchmarks and comparisons for performance and ease of use.

benchmarks

Getting Started

Check out the Getting Started section of the library webpage to know how to install and begin with easyLambda. The library can also be used on aws elastic cloud or single instance.

Examples

A detailed walkthrough of the library is given here, The examples directory contains various examples and demonstrations with explanations of features and options.

Here we mention some examples in short.

Example wordcount

The following program calculates frequency of each word in the data files.

auto reader = fromFile<string>(argv[1]).rowSeparator('s').colSeparator("");
ezl::rise(reader)
  .reduce<1>(ezl::count(), 0).dump()
  .run();

The dataflow pipeline starts with rise and subsequent operations are added to it. In the above example, the pipeline begins by reading in data from the specified file(s). fromFile is a library function that takes column types and the specified file(s) glob pattern as input and reads the file(s) in parallel. It has a lot of properties for controlling data-format, parallelism, denormalization etc (shown in demoFromFile).

In reduce we pass the index of the key column to group by, the library function for counting and initial value of the result.

Example pi (Monte-Carlo)

Following is a dataflow for calculating pi using Monte-Carlo method.

ezl::rise(ezl::kick(10000)) // 10000 trials shared over all processes
  .map([] { 
    return pow(rnd(), 2) + pow(rnd(), 2);
  })
  .filter(ezl::lt(1.))
  .reduce(ezl::count(), 0)
  .map([](int inCircleCount) { 
    return (4.0 * inCircleCount / 10000); 
  }).dump()
  .run();

The dataflow starts with rise in which we pass a library function to call the next unit a number of times. The steps in the algorithm have been expressed with the composition of small operations, some are common library functions like count(), lt() (less-than) and some are user-defined functions specific to the problem.

Example CSV stats

Here is another example from cods2016. A stripped version of the input data-file is given with ezl here. The data contains student profiles with scores, gender, job-salary, city etc.

auto scores = ezl::fromFile<char, array<float, 3>>(fileName)
                .cols({"Gender", "English", "Logical", "Domain"})
                .colSeparator("\t");

ezl::rise(scores)
  .filter<2>(ezl::gtAr<3>(0.F))   // filter valid domain scores > 0
  .map<1>([] (char gender) {      // transforming with 0/1 for isMale
    return float(gender == 'm');
  }).colsTransform()
  .reduceAll(ezl::corr<1>())
    .dump("", "Corr. of gender with scores\n(gender|E|L|D)")
  .run();

The above example prints the correlation of English, logical and domain scores with respect to gender. We can find similarity of the above code with steps in a spreadsheet analysis or with SQL query. We select the columns to work with viz. gender and three scores. We filter the rows based on a column and predicate. Next, we transform a selected column in-place and then find an aggregate property (correlation) for all the rows.


Contributing

Suggestions and feedback are welcome. Feel free to contact via mail or issues for any query.

Some of the possible directions of improvement:

  • compile time optimization
  • use of specialized data structures in various units like reduce etc.
  • addition of more examples e.g. neural nets, simulations etc.
  • design simplifications
  • parallelism optimization
  • code reviews
  • documentation

Possible ideas for future extenstions:

  • fault tolerance
  • algorithms / functions to plot streaming and buffered data
  • domain specific algorithms
  • MPI single-sided communications
  • Experiments to extend current programming abstraction to cover more problems like domain-decomposition etc.

Check internals and blog for design and implementation details.

Acknowledgments

A big thanks to cppcon, meetingc++ and other conferences and all C++ expert speakers, committee members and compiler implementers for modernising C++ and teaching it with so much enthusiasm. I had fun implementing this, hoping you will have fun using it. Looking forward to learn more from the community.

I wish to thank eicossa and Nitesh for their (less online, more offline :P) contributions.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].