LinkedInAttic / Datafu

Hadoop library for large-scale data processing, now an Apache Incubator project

Programming language: Java

Apache DataFu

Apache DataFu is a collection of libraries for working with large-scale data in Hadoop. The project was inspired by the need for stable, well-tested libraries for data mining and statistics.

It consists of two libraries:

  • Apache DataFu Pig: a collection of user-defined functions for Apache Pig
  • Apache DataFu Hourglass: an incremental processing framework for MapReduce jobs on Apache Hadoop
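
To give a feel for what "incremental processing" means here, the following is a minimal sketch in plain Python, not Hourglass code and not its API. It assumes input that arrives in daily batches and a simple count aggregate; the names count_day and update_total are hypothetical. The point is that each new day is folded into the previously saved result rather than recomputing over all historical data.

```python
from collections import Counter

def count_day(events):
    """Aggregate a single day's events into per-key counts."""
    return Counter(events)

def update_total(previous_total, new_day_events):
    """Fold one new day into the saved running total.

    Only the new day's events are scanned; earlier days are
    represented by previous_total and are not re-read.
    """
    total = Counter(previous_total)  # copy so the saved state is untouched
    total.update(count_day(new_day_events))
    return total

# Hypothetical daily batches of page identifiers.
day1 = ["a", "b", "a"]
day2 = ["b", "c"]

total = update_total(Counter(), day1)
total = update_total(total, day2)
```

A real framework also has to decide where the saved state lives and how to handle sliding windows, which this sketch leaves out.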

DataFu is currently undergoing incubation with Apache. A mirror of the official git repository can be found on GitHub at https://github.com/apache/incubator-datafu.

For more information please visit the website:

If you'd like to jump in and get started, check out the corresponding guides for each library:

Blog Posts

Presentations

Videos

Other Resources

An interesting example of using Quantile from DataFu can be found in the Hadoop Real-World Solutions Cookbook.
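
For readers unfamiliar with quantiles, here is a minimal sketch in plain Python of computing them from sorted input. It is an illustration only and does not reproduce DataFu's Quantile implementation: it uses the simple nearest-rank definition, which is an assumption made for brevity, and the function name nearest_rank_quantiles is hypothetical.

```python
import math

def nearest_rank_quantiles(sorted_values, quantiles):
    """Return nearest-rank quantiles of an already-sorted list.

    Assumption: nearest-rank definition, chosen for simplicity;
    other estimators interpolate between neighbouring values.
    """
    if not sorted_values:
        raise ValueError("need at least one value")
    n = len(sorted_values)
    result = []
    for q in quantiles:
        if not 0.0 <= q <= 1.0:
            raise ValueError("quantile must be in [0, 1]")
        # Smallest 1-based rank whose cumulative share is >= q.
        rank = max(1, math.ceil(q * n))
        result.append(sorted_values[rank - 1])
    return result

# The input must be sorted before the quantiles are read off.
values = sorted([7, 1, 5, 3, 9])
summary = nearest_rank_quantiles(values, [0.0, 0.5, 1.0])
```

With these five values, summary holds the minimum, the median, and the maximum.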

From Around the Web

Papers

Getting Help

Please visit the website:
