1. Rouge 2.0ROUGE automatic summarization evaluation toolkit. Support for ROUGE-[N, L, S, SU], stemming and stopwords in different languages, unicode text evaluation, CSV output.
2. Nlp In PracticeStarter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.
3. OpinRankOpinRank Dataset. Dataset containing user reviews for entities namely cars and hotels. Full reviews from Tripadvisor (~259,000 reviews) and Edmunds (~42,230 reviews)
4. phrase-at-scaleDetect common phrases in large amounts of text using a data-driven approach. Size of discovered phrases can be arbitrary. Can be used in languages other than English