ContinuumIO / Nutchpy
Licence: apache-2.0
For interacting with nutch via Python
Nutchpy
Introduction
Nutchpy is a Python library for working with Apache Nutch. In particular, it provides readers for existing Nutch data structures, such as Sequence Files, LinkDb, and Nodes. A small examples directory shows how Nutchpy can be used to interact with some of these data structures.
Install
To build nutchpy from source, run the following commands in your terminal:
git clone https://github.com/ContinuumIO/nutchpy.git
conda install -c blaze apache-maven
cd nutchpy; python setup.py install;
Alternatively, you can download nutchpy from binstar with conda:
conda install -c blaze nutchpy
Running
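import nutchpy

# Replace <FULL-PATH> with the absolute path to your Nutch data
node_path = "<FULL-PATH>/data"
seq_reader = nutchpy.sequence_reader
# Print the first 10 records
print(seq_reader.head(10, node_path))
# Print the records in the range 10 to 20
print(seq_reader.slice(10, 20, node_path))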
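The slice call above can also be used to walk a file page by page. The following is only a minimal sketch, not part of nutchpy: iter_pages is a hypothetical helper, and it assumes that slice(start, stop, path) returns an iterable of records and that a short or empty result means the end of the data has been reached. Neither assumption is confirmed by this README.

```python
from typing import Any, Callable, Iterable, Iterator, List


def iter_pages(
    slice_fn: Callable[[int, int, str], Iterable[Any]],
    path: str,
    page_size: int = 10,
    max_records: int = 100,
) -> Iterator[List[Any]]:
    """Yield successive pages of records by calling slice_fn(start, stop, path).

    Hypothetical helper: slice_fn is assumed to behave like
    nutchpy.sequence_reader.slice and to return an iterable of records.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    start = 0
    while start < max_records:
        stop = min(start + page_size, max_records)
        page = list(slice_fn(start, stop, path))
        if not page:
            break  # assumption: an empty result means no more records
        yield page
        if len(page) < stop - start:
            break  # assumption: a short page is the last one
        start = stop


# Intended use (requires nutchpy and a real data path):
#   for page in iter_pages(nutchpy.sequence_reader.slice, node_path):
#       print(page)
```

Passing the slice function in as an argument keeps the helper independent of nutchpy, so it can be tried against any function with the same three-argument shape.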
Run Requirements
- JDK 1.6+
- python
- py4j
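Before importing nutchpy, it can help to confirm that the run requirements listed above are present. The sketch below uses only the Python standard library; check_run_requirements and missing_requirements are hypothetical names, not nutchpy functions. It only checks that a java executable is on the PATH and that py4j is importable. It does not verify that the JDK version is 1.6 or later.

```python
import importlib.util
import shutil
from typing import Dict, List


def check_run_requirements() -> Dict[str, bool]:
    """Report whether a `java` executable and the py4j package can be found."""
    return {
        # shutil.which returns None when the executable is not on the PATH
        "java": shutil.which("java") is not None,
        # find_spec returns None when the top-level package is not installed
        "py4j": importlib.util.find_spec("py4j") is not None,
    }


def missing_requirements(status: Dict[str, bool]) -> List[str]:
    """Return the names of the requirements that were not found, sorted."""
    return sorted(name for name, found in status.items() if not found)


# Example:
#   missing = missing_requirements(check_run_requirements())
#   if missing:
#       print("Missing run requirements:", ", ".join(missing))
```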
Build Requirements
- python
- apache-maven (conda install -c blaze apache-maven)