License: Apache-2.0

Programming languages: CSS, JavaScript, Java, HTML, FreeMarker, Shell

Sherlok

Distributed restful text mining.

Join the chat at https://gitter.im/sherlok/sherlok

Sherlok is a flexible and powerful open-source, distributed, real-time text-mining engine. It runs as a RESTful annotation server built on Apache UIMA. For example, Sherlok can:

  • highlight persons and locations in text (using DKPro OpenNLP),
  • identify proteins and brain regions in biomedical texts (using Bluima),
  • perform sentiment analysis with deep learning (using Stanford Sentiment),
  • analyze the syntax of tweets (using TweetNLP),
  • analyze clinical text and perform knowledge extraction (using Apache cTAKES).

Getting Started

  • Download and unzip the latest Sherlok release
  • Install a Java runtime
  • Run bin/sherlok (Unix), or bin/sherlok.bat (Windows)

Annotate neuron mentions from Python:

pip install --upgrade sherlok

>>> from sherlok import Sherlok
>>> print(list(Sherlok().annotate('neuroner', 'layer 4 neuron')))

[(0, 14, 'layer 4 neuron', u'Neuron', {}),
 (8, 14, 'neuron',  u'Neuron', {}),
 (8, 14, 'neuron',  u'NeuronTrigger', {}),
 (0, 7,  'layer 4', u'Layer', {u'ontologyId': u'HBP_LAYER:0000004'})]
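
The tuples printed above appear to follow the shape (begin, end, covered text, type, attributes). As a minimal sketch, assuming that shape holds, the annotations can be grouped by type and their offsets checked against the input text. The helper name `group_by_type` is hypothetical and not part of the sherlok client; the data is copied from the output above so the snippet runs without a Sherlok server.

```python
from collections import defaultdict

text = 'layer 4 neuron'

# Sample annotations copied from the output above:
# (begin, end, covered text, annotation type, attributes)
annotations = [
    (0, 14, 'layer 4 neuron', 'Neuron', {}),
    (8, 14, 'neuron', 'Neuron', {}),
    (8, 14, 'neuron', 'NeuronTrigger', {}),
    (0, 7, 'layer 4', 'Layer', {'ontologyId': 'HBP_LAYER:0000004'}),
]


def group_by_type(annotations):
    """Hypothetical helper: map each annotation type to its covered texts."""
    grouped = defaultdict(list)
    for _begin, _end, covered, ann_type, _attributes in annotations:
        grouped[ann_type].append(covered)
    return dict(grouped)


# Offsets are character positions into the input, with an exclusive end.
for begin, end, covered, _type, _attributes in annotations:
    assert text[begin:end] == covered

grouped = group_by_type(annotations)
print(grouped)
```

Note that annotations may overlap: here 'neuron' is covered by both a Neuron and a NeuronTrigger annotation, so grouping by type is often more convenient than treating the result as a flat list.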

Tag persons and locations with JavaScript:

var sherlok = require('sherlok');
var text = 'Jack Burton (born April 29, 1954 in El Paso), also known as Jake Burton, is an American snowboarder and founder of Burton Snowboards.';
// Log what the 'opennlp.ners.en' pipeline returns for the text.
sherlok.annotate('opennlp.ners.en', text, function(annotation){
      console.log(annotation);
});

{ begin=0, end=11,  value="person"}
{ begin=36, end=43, value="location"}
{ begin=60, end=71, value="person"}
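
Because each annotation carries character offsets, highlighting reduces to slicing the text. The sketch below is written in Python, matching the Python client example earlier, and assumes annotations with `begin`, `end`, and `value` fields as printed above. `highlight` is a hypothetical helper, not part of Sherlok, and the annotations are hard-coded so the snippet runs without a server.

```python
text = ('Jack Burton (born April 29, 1954 in El Paso), also known as '
        'Jake Burton, is an American snowboarder and founder of '
        'Burton Snowboards.')

# Annotations copied from the output above, rewritten as Python dicts.
annotations = [
    {'begin': 0, 'end': 11, 'value': 'person'},
    {'begin': 36, 'end': 43, 'value': 'location'},
    {'begin': 60, 'end': 71, 'value': 'person'},
]


def highlight(text, annotations):
    """Hypothetical helper: wrap each annotated span as [value: span]."""
    parts = []
    cursor = 0
    # Walk the spans left to right; this assumes they do not overlap.
    for ann in sorted(annotations, key=lambda a: a['begin']):
        parts.append(text[cursor:ann['begin']])
        span = text[ann['begin']:ann['end']]
        parts.append('[{}: {}]'.format(ann['value'], span))
        cursor = ann['end']
    parts.append(text[cursor:])
    return ''.join(parts)


highlighted = highlight(text, annotations)
print(highlighted)
```

The non-overlap assumption matters: pipelines that emit nested annotations would need a different strategy, such as rendering only the outermost span.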

More Built-in Text mining pipelines

Further Documentation
