Easy NamuWiki Extractor

A simple Namuwiki extractor, built as an extension of Namu Wiki Extractor.

This module strips the namu mark (Namuwiki's markup syntax) from a Namuwiki document and extracts only its plain text.
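To illustrate what stripping namu mark can involve, here is a minimal sketch using regular expressions. It is not this repo's actual implementation: the function name `strip_namu_mark` is hypothetical, and it handles only a few common constructs (links, bold/italic, strikethrough, headings), so real documents will need more rules.

```python
import re


def strip_namu_mark(text: str) -> str:
    """Remove a small subset of namu mark (hypothetical helper, not the repo's code)."""
    # [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    # '''bold''' and ''italic'' -> inner text
    text = re.sub(r"'{2,3}(.*?)'{2,3}", r"\1", text)
    # ~~strike~~ and --strike-- -> inner text
    text = re.sub(r"(~~|--)(.*?)\1", r"\2", text)
    # == heading == -> heading (any heading depth)
    text = re.sub(r"^=+\s*(.*?)\s*=+\s*$", r"\1", text, flags=re.MULTILINE)
    return text


sample = "== 개요 ==\n'''나무위키'''는 [[위키]]이다."
plain = strip_namu_mark(sample)
print(plain)
```

Note that the `--strike--` rule would also match ordinary text containing paired double hyphens, so a production extractor would need stricter patterns.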

Environment

Usage

  • Clone this repo: git clone https://github.com/j-min/Easy-Namuwiki-Extractor

  • Download the Namuwiki JSON dump into the repo directory: wget http://file2.unofficialnis.ga/namuwiki_161031.json

  • You can find the latest dumps here

  • Run extractor: python Run_extractor.py -i input_json_file -o outputfile_name

  • Options:

--input (-i) : input filename
--output (-o) : output filename
--multiprocess (-m) : run the extraction with the multiprocessing module
--title (-t) : include document titles in the extracted output
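The options above could be declared with `argparse` roughly as follows. This is a sketch, not the contents of Run_extractor.py: treating `-m` and `-t` as boolean switches and marking `-i`/`-o` as required are assumptions, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(description="Extract plain text from a Namuwiki JSON dump")
    parser.add_argument("-i", "--input", required=True, help="input filename (JSON dump)")
    parser.add_argument("-o", "--output", required=True, help="output filename")
    # Assumed to be on/off switches; the README does not say whether they take values.
    parser.add_argument("-m", "--multiprocess", action="store_true", help="use multiprocessing")
    parser.add_argument("-t", "--title", action="store_true", help="include document titles")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["-i", "namuwiki_161031.json", "-o", "out.txt", "-t"])
print(args.input, args.output, args.multiprocess, args.title)
```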

What the Namuwiki JSON looks like

[Image: screenshot of the Namuwiki JSON dump]
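As a rough illustration of reading such a dump in Python, the sketch below assumes the dump is a JSON array of objects with `title` and `text` fields. That layout is an assumption and should be checked against the actual file. The helper `iter_documents` is hypothetical, and the inline sample stands in for the real dump.

```python
import json

# Tiny stand-in for the dump; the field names are assumed, not confirmed by this README.
sample_dump = json.loads("""
[
  {"namespace": "0", "title": "나무위키", "text": "'''나무위키'''는 위키이다."},
  {"namespace": "0", "title": "위키", "text": "[[위키]]는 웹사이트이다."}
]
""")


def iter_documents(dump, include_title=False):
    """Yield each document's raw text, optionally prefixed by its title on its own line."""
    for doc in dump:
        body = doc.get("text", "")
        if include_title:
            yield doc.get("title", "") + "\n" + body
        else:
            yield body


docs = list(iter_documents(sample_dump, include_title=True))
print(len(docs))
```

For the real file, the same loop would run over `json.load(f)` with the dump opened using `encoding="utf-8"`. The full dump is large, so loading it this way needs enough memory to hold it all at once.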

Sample Output

[Image: sample extractor output]
