1. Hadoop Pot: A scalable Apache Hadoop-based implementation of the Pooled Time Series video similarity algorithm, based on the CVPR 2015 paper by M. Ryoo et al.
3. Sparkler: Spark-Crawler, an Apache Nutch-like crawler that runs on Apache Spark.
5. AgePredictor: Age classification from text using PAN16, blogs, Fisher Callhome, and Cancer Forum.
6. autoextractor: A toolkit for clustering web pages based on various similarity measures.