dstl / Baleen

Licence: apache-2.0
Entity Extraction Text Processor


Baleen

Baleen 2 has now reached the end of its life, and we encourage all users to move to Baleen 3. There is no intention to provide further updates or any significant support for Baleen 2.

Baleen is an extensible text processing capability that allows entity-related information to be extracted from unstructured and semi-structured data sources. It takes things of interest that would otherwise be locked in formats such as text documents (for example references to people, organisations, unique identifiers and locations) and makes them available in a structured format.

Baleen is written in Java 8 and built with the software project management tool Maven 3. It draws heavily on the Apache Unstructured Information Management Architecture (UIMA), which provides a framework, components and infrastructure to handle unstructured information management.

Baleen was written by the Defence Science and Technology Laboratory (Dstl) in support of UK Defence users looking to extract entities and search unstructured text documents. License information can be found in the accompanying LICENSE.txt file in this repository, and the licenses of the libraries on which Baleen depends are listed in the file THIRD-PARTY.txt.

Baleen is still under active development, and is released here not as a final product but as a work in progress. As such, there may be bugs, issues, typos, mistakes in the documentation, and more. We hope that contributions from other users will improve Baleen and result in a better framework for others to use.

Upgrading to 2.4 and later

Baleen 2.4 (and later) contains a number of changes that may make it incompatible with older pipelines. For upgrade guidance, see the Upgrading Between Versions wiki page.

Getting Started

Baleen includes an in-built server, which hosts full documentation and guides on how to use Baleen. To get started, launch this server with the following command and read the documentation it serves:

java -jar baleen-2.8.0-SNAPSHOT.jar

Once running, the server can be accessed at http://localhost:6413.
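
To confirm that the server has started, a small Java program can request that address. The following is a minimal sketch and not part of Baleen: the class name BaleenServerCheck, its method names and the 2 second timeout are assumptions made for illustration. Only the default address http://localhost:6413 comes from the text above.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical helper, not part of Baleen: checks whether the in-built server answers HTTP requests. */
public class BaleenServerCheck {

    /** Builds the base URL of the server, for example http://localhost:6413. */
    public static String baseUrl(String host, int port) {
        return "http://" + host + ":" + port;
    }

    /** Returns true if any HTTP response is received from the URL within the timeout. */
    public static boolean isReachable(String url, int timeoutMillis) {
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) new URL(url).openConnection();
            connection.setConnectTimeout(timeoutMillis);
            connection.setReadTimeout(timeoutMillis);
            connection.setRequestMethod("GET");
            connection.getResponseCode(); // throws IOException if nothing is listening
            return true;
        } catch (IOException e) {
            // Covers malformed URLs, refused connections and timeouts.
            return false;
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        String url = baseUrl("localhost", 6413); // default address given above
        boolean up = isReachable(url, 2000);     // 2 second timeout is an assumption
        System.out.println(url + (up ? " is reachable" : " is not reachable"));
    }
}
```

The check treats any HTTP response as "reachable", so it only shows that something is listening on the port, not that the documentation pages loaded correctly.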

If you require the Javadoc to be available through the in-built server, then you should place the Baleen Javadoc JAR in the same directory as the Baleen JAR.

Prerequisites

Running

To run Baleen, you will need:

  • A sensible amount of RAM. Start with 4GB and alter according to the number of annotators being employed.
  • Java 8 or later
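
The two requirements above can be checked before launching Baleen. The following is a rough sketch under stated assumptions, not something shipped with Baleen: the class BaleenPreflight and its methods are hypothetical, and the 4GB figure is only the suggested starting point from the list above.

```java
/** Hypothetical pre-flight check, not part of Baleen: inspects the Java version and maximum heap size. */
public class BaleenPreflight {

    static final int MIN_JAVA_MAJOR = 8;                               // "Java 8 or later"
    static final long SUGGESTED_HEAP_BYTES = 4L * 1024 * 1024 * 1024;  // 4GB starting point

    /** Maps the specification version "1.8" to 8, and "9", "11", "17" and so on to themselves. */
    public static int majorVersion(String specVersion) {
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot < 0 ? v : v.substring(0, dot));
    }

    /** True if the given specification version is Java 8 or later. */
    public static boolean javaIsSupported(String specVersion) {
        return majorVersion(specVersion) >= MIN_JAVA_MAJOR;
    }

    /** True if the given maximum heap size meets the suggested 4GB starting point. */
    public static boolean heapIsSufficient(long maxHeapBytes) {
        return maxHeapBytes >= SUGGESTED_HEAP_BYTES;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Java " + spec + " supported: " + javaIsSupported(spec));
        System.out.println("Max heap " + (maxHeap / (1024 * 1024)) + " MB, at least 4GB: "
                + heapIsSufficient(maxHeap));
    }
}
```

The maximum heap can be raised with the standard JVM option -Xmx (for example -Xmx4g). Runtime.maxMemory() may report slightly less than the configured value on some JVMs, so treat the heap result as indicative rather than exact.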

Developing

To develop with Baleen, we suggest you use:

  • Oracle Java JDK 1.8
  • Eclipse Mars or greater (assumed to include Maven)
  • Maven

Baleen requires Java 8 or later.

Licence

Crown copyright 2017-2019

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Data

Baleen contains data derived from other data sources. For more information, please refer to the Baleen source code.

Code-Point Open

Licensed under the Open Government Licence (OGL) v3 - http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/

Contains OS data (c) Crown copyright and database right 2015

Contains Royal Mail data (c) Royal Mail copyright and database right 2015

Contains National Statistics data (c) Crown copyright and database right 2015

Countries JSON

Licensed under the ODC Open Database Licence (ODbL) 1.0 - http://opendatacommons.org/licenses/odbl/1.0/

Any rights in individual contents of the database are licensed under the Database Contents License - http://opendatacommons.org/licenses/dbcl/1.0/

Countries GeoJSON

Licensed under the ODC Public Domain Dedication and Licence (PDDL) 1.0 - http://opendatacommons.org/licenses/pddl/1.0/

OpenNLP Language Models

Licensed under the Apache Software License 2.0 - http://www.apache.org/licenses/LICENSE-2.0
