bixo / Bixo

Bixo is an open source web mining toolkit that runs as a series of Cascading pipes on top of Hadoop. By building a customized Cascading pipe assembly, you can quickly create specialized web mining applications.

=============================== Introduction

Bixo is an open source Java web mining toolkit that runs as a series of Cascading pipes. It is designed for building customized web mining applications. By building a customized Cascading pipe assembly, you can quickly create a workflow that fetches web content, then parses it, analyzes it, and publishes the results.
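
To illustrate the fetch, parse, analyze, publish workflow described above, here is a minimal conceptual sketch in plain Java. It does not use the Bixo or Cascading APIs: the class PipelineSketch and its methods fetch, parse, analyze, and run are hypothetical names invented for this example, the "web" is a canned in-memory map rather than real HTTP, and the analysis step is just a word count. It only shows the idea of chaining stages into one assembly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; not part of Bixo or Cascading.
public class PipelineSketch {

    // Stand-in fetcher: looks up canned content instead of making HTTP requests.
    static Function<String, String> fetch(Map<String, String> fakeWeb) {
        return url -> fakeWeb.getOrDefault(url, "");
    }

    // Stand-in parser: strips tags with a naive regex and collapses whitespace.
    static String parse(String html) {
        return html.replaceAll("<[^>]*>", " ").trim().replaceAll("\\s+", " ");
    }

    // Stand-in analyzer: counts the words in the parsed text.
    static int analyze(String text) {
        return text.isEmpty() ? 0 : text.split(" ").length;
    }

    // Chains the stages into one assembly and "publishes" one
    // tab-separated line (URL, word count) per input URL.
    static List<String> run(List<String> urls, Map<String, String> fakeWeb) {
        Function<String, Integer> assembly = fetch(fakeWeb)
                .andThen(PipelineSketch::parse)
                .andThen(PipelineSketch::analyze);
        List<String> published = new ArrayList<>();
        for (String url : urls) {
            published.add(url + "\t" + assembly.apply(url));
        }
        return published;
    }

    public static void main(String[] args) {
        Map<String, String> fakeWeb = Collections.singletonMap(
                "http://example.com/", "<p>Hello web mining</p>");
        List<String> urls = Arrays.asList(
                "http://example.com/", "http://example.com/missing");
        for (String line : run(urls, fakeWeb)) {
            System.out.println(line);
        }
    }
}
```

In a real Bixo workflow each stage would be a Cascading pipe operating on tuples, so the stages can be distributed across a cluster rather than run in a single loop as they are here.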

Bixo borrows heavily from the Apache Nutch project, as well as many other open source projects at Apache and elsewhere.

Bixo is released under the Apache License, Version 2.0.

=============================== Building

See http://openbixo.org/documentation/building-bixo/ for full details.

You need Apache Ant 1.7 or higher.

To get a list of valid targets:

% cd <bixo project directory>
% ant -p

To clean and build a jar (which also runs all tests):

% ant clean jar

Note that "ant clean test jar" currently fails because of a bug in the Maven Ant Tasks plugin used to manage dependencies.

To create Eclipse project files:

% ant eclipse

Then, from Eclipse, follow the standard procedure to import an existing Java project into your workspace.
