5 open source projects by chrismattmann
1.
Tika Python
Tika-Python is a Python binding to the Apache Tika™ REST services that lets Tika be called natively from Python.
✭ 997
python
nlp
detection
parse
nlp-machine-learning
recognition
buffer
text-recognition
nlp-library
mime
extraction
text-extraction
2.
tika-similarity
Tika-Similarity uses the Tika-Python package (a Python binding to Apache Tika) to compute file similarity based on metadata features.
✭ 92
python
HTML
machine-learning
information-retrieval
clustering
tika
cosine-similarity
jaccard-similarity
cosine-distance
similarity-score
tika-similarity
metadata-features
tika-python
3.
lucene-geo-gazetteer
Uses Apache Lucene, OpenNLP, and GeoNames to extract locations from text and geocode them.
✭ 34
java
shell
nlp
geonames
apache
opennlp
lucene
nlp-machine-learning
gazetteer
irds
geoindex
allcountries
4.
solrcene
Spatial Branch of Apache Solr
✭ 13
java
ruby
javascript
perl
python
C++
5.
trec-dd-polar
A dataset downloaded from the deep and scientific web across three major polar data centers, for use in research.
✭ 13
shell
python