1. Wiki2text: Extracts a plain-text corpus from MediaWiki XML dumps, such as Wikipedia's.
2. Text As Data: A PyData 2013 talk on straightforward, data-driven ways to handle natural-language text in Python.
3. wikiparsec: An LL parser for extracting information from wiki text, particularly Wiktionary.
4. python-ftfy: Fixes mojibake and other glitches in Unicode text after the fact.
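To make "mojibake" concrete, here is a minimal standard-library sketch of its most common form: UTF-8 bytes that were mistakenly decoded as Windows-1252. This is not ftfy's implementation or API. The helper names `make_mojibake` and `undo_mojibake` are hypothetical, and the sketch assumes the text was damaged by exactly this one UTF-8/cp1252 mix-up.

```python
def make_mojibake(text: str) -> str:
    """Simulate the glitch: encode as UTF-8, then wrongly decode as Windows-1252.

    Hypothetical helper for illustration; some characters (e.g. 'Á') produce
    bytes that cp1252 leaves undefined, and Python raises UnicodeDecodeError.
    """
    return text.encode("utf-8").decode("cp1252")


def undo_mojibake(text: str) -> str:
    """Reverse that single mix-up: re-encode as cp1252, decode as UTF-8.

    Hypothetical helper; returns the input unchanged when it does not
    round-trip cleanly (which includes most text that was never broken).
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


broken = make_mojibake("café")
print(broken)                 # cafÃ©
print(undo_mojibake(broken))  # café
```

This naive reversal would also "fix" legitimate text that happens to contain sequences like `Ã©`. ftfy covers many more encodings and decides heuristically whether a string actually needs fixing. In real use, prefer it over a hand-rolled round-trip like this.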