sigmoidanalytics / spork

License: Apache-2.0
Pig on Apache Spark

Programming Languages

java, perl, GAP, PigLatin, shell, python

Apache Pig

Pig is a dataflow programming environment for processing very large files. Pig's language is called Pig Latin. A Pig Latin program consists of a directed acyclic graph in which each node represents an operation that transforms data. Operations come in two flavors: (1) relational-algebra-style operations such as join, filter, and project; (2) functional-programming-style operators such as map and reduce.
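The two flavors of operation can be illustrated with a small sketch in plain Python. This is not Pig Latin and does not use any Pig API; the `visits` records and all variable names are hypothetical and exist only to show how each step consumes the output of the previous one, like nodes in a dataflow graph.

```python
from functools import reduce

# Hypothetical input records: (user, url, seconds_spent).
visits = [
    ("alice", "/home", 12),
    ("bob", "/home", 3),
    ("alice", "/docs", 40),
    ("carol", "/docs", 7),
]

# Relational-style node: filter keeps rows matching a predicate.
long_visits = [row for row in visits if row[2] >= 5]

# Relational-style node: project keeps only selected fields.
projected = [(user, seconds) for user, _url, seconds in long_visits]

# Functional-style node: map transforms each row independently.
seconds_only = list(map(lambda row: row[1], projected))

# Functional-style node: reduce folds all rows into a single value.
total_seconds = reduce(lambda acc, s: acc + s, seconds_only, 0)

print(projected)
print(total_seconds)
```

Each intermediate name (`long_visits`, `projected`, `seconds_only`) plays the role of an edge between two nodes: it is produced by one operation and consumed by the next.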

Pig compiles these dataflow programs into (sequences of) map-reduce or Apache Tez jobs and executes them using Hadoop. It is also possible to execute Pig Latin programs in a "local" mode (without a Hadoop cluster), in which case all processing takes place in a single local JVM.
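To give a feel for what a map-reduce job does with one dataflow step, here is a minimal sketch in plain Python of a group-and-count expressed as map, shuffle, and reduce phases. It is an illustration only, not the output of Pig's compiler; the function names and sample records are hypothetical. Everything runs in one process, which is loosely analogous to local mode.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (key, 1) pair per record, keyed by URL."""
    for _user, url in records:
        yield url, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input records: (user, url).
records = [("alice", "/home"), ("bob", "/home"), ("alice", "/docs")]

counts = reduce_phase(shuffle(map_phase(records)))
print(counts)
```

On a cluster, the map and reduce phases would run in parallel across many machines and the shuffle would move data between them; here the three functions are simply called in sequence.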

General Info

For the latest information about Pig, please visit our website at:

http://pig.apache.org/

and our wiki, at:

http://wiki.apache.org/pig/

Getting Started

  1. To learn about Pig, try http://wiki.apache.org/pig/PigTutorial
  2. To build and run Pig, try http://wiki.apache.org/pig/BuildPig and http://wiki.apache.org/pig/RunPig
  3. To check out the function library, try http://wiki.apache.org/pig/PiggyBank

Contributing to the Project

We welcome all contributions. For details, please visit http://wiki.apache.org/pig/HowToContribute.