
dstl / baleen3

Licence: Apache-2.0 License
Baleen 3 is a data processing tool based on the Annot8 framework


Baleen

Baleen 3 is a tool for building and running data annotation pipelines. It is built on top of the Annot8 data processing framework and provides an easy way to use and interact with Annot8, without needing to do any development yourself.

Prerequisites

To use Baleen 3, you will require Java 11 or later. If you wish to build Baleen 3 yourself, then you will require Apache Maven to be installed.
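As a rough way to check these prerequisites before building, the following shell sketch reads the version reported by java -version and compares its major number against 11. The java_major helper is a hypothetical name introduced here for illustration, and the parsing assumes the usual version "x.y.z" output format, which may differ between JVM vendors.

```shell
#!/bin/sh
# Sketch only: java_major is a hypothetical helper, not part of Baleen 3.

# Print the major version for a Java version string.
# Handles both the legacy "1.8.0_292" scheme and the modern "11.0.2" scheme.
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 | sed 's/[^0-9].*//' ;;
    *)   echo "$1" | cut -d. -f1 | sed 's/[^0-9].*//' ;;
  esac
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr, so redirect it before parsing.
  version=$(java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1)
  major=$(java_major "$version")
  if [ "${major:-0}" -ge 11 ]; then
    echo "Java $version is new enough"
  else
    echo "Java 11 or later is required (found ${version:-unknown})"
  fi
else
  echo "java not found on PATH"
fi

# Maven is only needed when building Baleen 3 from source.
if command -v mvn >/dev/null 2>&1; then
  echo "Maven found"
else
  echo "mvn not found on PATH (only needed to build from source)"
fi
```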

Getting Started

For a quick guide to getting started with Baleen 3 and building your first simple pipeline, refer to the Getting Started documentation.

Building

From the root directory of the Baleen 3 project, run the following command:

mvn clean package

Alternatively, you can download a pre-built release from the Releases page.

Using

To run Baleen 3, follow these steps. If you have downloaded a pre-built release, extract the ZIP file and start from Step 4:

  1. Build Baleen 3, as described above
  2. Copy the resultant JAR file, target/baleen-3.*.jar, to the folder you want to run Baleen 3 from (referred to as $BALEEN_HOME)
  3. Create the following directories (any you don't create will be automatically created by Baleen 3):
    1. $BALEEN_HOME/components - this is where you will place all Annot8 components
    2. $BALEEN_HOME/pipelines - this is where you will place any pipeline configurations, and where pipelines created by the REST API will be persisted
    3. $BALEEN_HOME/templates - this is where you will place any pipeline templates to be available in the UI to simplify pipeline creation.
  4. Copy any Annot8 component JARs you wish to use into $BALEEN_HOME/components
  5. Run java -jar baleen-3.*.jar

Once running, Baleen 3 will be available at http://localhost:6413.
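Steps 2 to 5 above could be scripted roughly as follows. This is a minimal sketch, not an official setup script: the default location for $BALEEN_HOME (a baleen-home folder in the current directory) is an assumption made for illustration, and the final run command is left commented out so the script only prepares the folder.

```shell
#!/bin/sh
# Sketch: prepare a Baleen 3 home directory.
# The ./baleen-home default is an assumption; set BALEEN_HOME to override it.
set -eu

BALEEN_HOME="${BALEEN_HOME:-$PWD/baleen-home}"

# Step 3: create the directories Baleen 3 uses
# (Baleen 3 creates any that are missing, so this is optional).
for dir in components pipelines templates; do
  mkdir -p "$BALEEN_HOME/$dir"
done

# Step 2: copy the built JAR, if a build has been run from this directory.
for jar in target/baleen-3.*.jar; do
  if [ -f "$jar" ]; then
    cp "$jar" "$BALEEN_HOME/"
  fi
done

echo "Prepared $BALEEN_HOME"

# Step 4: copy any Annot8 component JARs into "$BALEEN_HOME/components".
# Step 5: start Baleen 3 from that folder, for example:
#   cd "$BALEEN_HOME" && java -jar baleen-3.*.jar
```

Running it from the project root after mvn clean package should leave the JAR and the three directories in place, ready for Step 4.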

Development

We use Maven to build the project. It builds both the server, written in Java, and the user interface, written in TypeScript. The server follows the standard Maven project layout, with an additional source directory, app, for the user interface application:

├── src
│  ├── main
│  │  ├── app
│  │  ├── java
│  │  └── resources
│  └── test
│     └── java

The ./baleen-dev.sh script can be used to run the JAR straight from the target folder. The user interface has the usual TypeScript development support scripts within the app folder. For more information on UI development, see the app README.md.

Importing

If you would like to extend the capabilities of Baleen you can add the dependency to your pom:

  <dependency>
    <groupId>uk.gov.dstl</groupId>
    <artifactId>baleen</artifactId>
    <version>${baleen.version}</version>
  </dependency>

and then add Baleen.class to your Spring Boot application, for example:

package org.example;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

import uk.gov.dstl.baleen.Baleen;

// Add other classes as required
@SpringBootApplication(scanBasePackageClasses = { Baleen.class })
public class BaleenExtended {

  public static void main(String[] args) {
    SpringApplication.run(BaleenExtended.class, args);
  }
}

This uses an importable JAR instead of the executable JAR that is built by default. The importable JAR is available on Maven Central. To build this dependency manually, run Maven with the importable profile:

mvn clean install -P importable

An importable JAR will then be installed in the local Maven repository and can be added to your pom dependencies as above.

Licence

Dstl (c) Crown Copyright 2020

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
