danieldk / Dictomaton

Licence: apache-2.0
Finite state dictionaries in Java

dictomaton

Introduction

This Java library implements dictionaries that are stored in finite state automata. Dictomaton has the following features:

  • Finite state dictionaries that implement the Java Set interface.
  • Perfect hash dictionaries, that provide a unique hash for each character sequence that is in the dictionary. Perfect hash dictionaries can be used in two directions: (1) obtaining the hash code for a character sequence and (2) obtaining the character sequence for a hash code.
  • Levenshtein automata, that allow you to efficiently find all the sequences in the dictionary that are within the given edit distance of a sequence.
  • String to primitive type mappings, where the keys are stored in a perfect hashing automaton and the values in an (unboxed) array.
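The two directions of perfect hashing, and the way a string-to-primitive mapping can sit on top of them, can be illustrated with a small stand-alone sketch. It is not Dictomaton's implementation or API: it uses a plain trie instead of a minimized automaton, and the class and method names (PerfectHashSketch, hash, sequence) are hypothetical. Each node stores how many words lie at or below it, so the hash of a word is its 1-based rank in sorted order.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: a plain trie, not Dictomaton's minimized automaton.
// All class and method names here are hypothetical.
class PerfectHashSketch {
    private static final class Node {
        final TreeMap<Character, Node> next = new TreeMap<>();
        boolean isFinal;
        int count; // number of words that end at or below this node
    }

    private final Node root = new Node();

    int size() { return root.count; }

    // Adds a word; does nothing if the word is already present.
    void add(CharSequence word) {
        if (hash(word) != -1) return;
        Node node = root;
        node.count++;
        for (int i = 0; i < word.length(); i++) {
            node = node.next.computeIfAbsent(word.charAt(i), c -> new Node());
            node.count++;
        }
        node.isFinal = true;
    }

    // Direction 1: word -> hash in 1..size(), or -1 if the word is absent.
    int hash(CharSequence word) {
        Node node = root;
        int hash = 0;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            Node target = node.next.get(c);
            if (target == null) return -1;
            if (node.isFinal) hash++; // a word ending here sorts before longer words
            for (Node sibling : node.next.headMap(c).values()) hash += sibling.count;
            node = target;
        }
        return node.isFinal ? hash + 1 : -1;
    }

    // Direction 2: hash -> word, or null if the hash is out of range.
    String sequence(int hash) {
        if (hash < 1 || hash > root.count) return null;
        StringBuilder sb = new StringBuilder();
        Node node = root;
        int remaining = hash;
        while (true) {
            if (node.isFinal) {
                if (remaining == 1) return sb.toString();
                remaining--;
            }
            for (Map.Entry<Character, Node> e : node.next.entrySet()) {
                Node child = e.getValue();
                if (remaining <= child.count) {
                    sb.append(e.getKey());
                    node = child;
                    break;
                }
                remaining -= child.count; // skip all words below this child
            }
        }
    }

    public static void main(String[] args) {
        PerfectHashSketch dict = new PerfectHashSketch();
        for (String w : new String[] {"foo", "bar", "foobar", "baz"}) dict.add(w);

        // String-to-int mapping: keys live in the trie, values in an unboxed array.
        int[] values = new int[dict.size()];
        values[dict.hash("foo") - 1] = 42;

        System.out.println(dict.hash("foo"));              // 3
        System.out.println(dict.sequence(2));              // baz
        System.out.println(values[dict.hash("foo") - 1]);  // 42
    }
}
```

Because hashes are dense (1 to the number of words), the values of a mapping fit in a primitive array with no per-entry objects. Hashes are only stable as long as no further words are added, which is why this kind of mapping suits immutable dictionaries.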

Using Dictomaton

Dictomaton is in the Maven Central Repository:

<dependency>
    <groupId>eu.danieldk.dictomaton</groupId>
    <artifactId>dictomaton</artifactId>
    <version>1.1.1</version>
</dependency>

SBT:

libraryDependencies += "eu.danieldk.dictomaton" % "dictomaton" % "1.1.1"

Grails:

compile 'eu.danieldk.dictomaton:dictomaton:1.1.1'

Comparisons

The following table compares the object-graph size of this library's Dictionary type with that of TreeSet and HashSet. The figures were obtained by storing all the words in the web2 and web2a dictionaries and measuring the result with memory-measurer.

Data type    Objects  References     char     int  boolean  float
TreeSet       936277     1872555  3193749  624184   312091      0
HashSet       936277     1772657  3193749  936277        1      1
Dictionary     41188       94546   424169  397033        1      1

Benchmarks

Benchmarks are in a separate test group from the normal unit tests. To run them with Maven, select the Benchmarks group:

mvn test -Djunit.groups=eu.danieldk.dictomaton.categories.Benchmarks

Changelog

1.2.0

  • State is now exposed through a StateInfo object, which allows users of PerfectHashDictionary to resume transitions. This makes it far more efficient to, for example, look up a string and its prefixes (contributed by René Kriegler).
  • DictionaryBuilder now accepts the more general CharSequence instead of String, and uses CharSequence internally (contributed by René Kriegler).
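
Why resuming transitions helps with prefix lookups can be shown with a small stand-alone sketch. It does not use Dictomaton's StateInfo API: a plain trie stands in for the automaton, and the State class and the resume and prefixesOf methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; names are hypothetical and not part of Dictomaton.
class ResumableLookupSketch {
    static final class State {
        private final Map<Character, State> next = new HashMap<>();
        private boolean isFinal;

        boolean isFinal() { return isFinal; }
    }

    private final State start = new State();

    State start() { return start; }

    void add(CharSequence word) {
        State s = start;
        for (int i = 0; i < word.length(); i++) {
            s = s.next.computeIfAbsent(word.charAt(i), c -> new State());
        }
        s.isFinal = true;
    }

    // Follows `suffix` from `from`; returns the state reached,
    // or null if a transition is missing.
    State resume(State from, CharSequence suffix) {
        State s = from;
        for (int i = 0; s != null && i < suffix.length(); i++) {
            s = s.next.get(suffix.charAt(i));
        }
        return s;
    }

    // Collects every non-empty prefix of `word` that is in the dictionary,
    // walking the word once instead of restarting for each prefix.
    List<String> prefixesOf(CharSequence word) {
        List<String> found = new ArrayList<>();
        State s = start;
        for (int i = 0; i < word.length(); i++) {
            s = resume(s, word.subSequence(i, i + 1));
            if (s == null) break;
            if (s.isFinal()) found.add(word.subSequence(0, i + 1).toString());
        }
        return found;
    }

    public static void main(String[] args) {
        ResumableLookupSketch dict = new ResumableLookupSketch();
        for (String w : new String[] {"foo", "foobar", "fool"}) dict.add(w);

        State afterFoo = dict.resume(dict.start(), "foo");
        // Continue from the saved state rather than re-reading "foo".
        State afterFoobar = dict.resume(afterFoo, "bar");
        System.out.println(afterFoo.isFinal() + " " + afterFoobar.isFinal());
        System.out.println(dict.prefixesOf("foobars"));
    }
}
```

Without a resumable state, checking all n prefixes of a string means n separate lookups from the start state; with it, a single left-to-right pass suffices.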

1.1.0

  • Added immutable mapping from String to a generic type.
  • Added a key-ordered builder for immutable mappings. This builder is more efficient since it constructs the key automaton on the fly.

1.0.0

  • Added Levenshtein automata for looking up sequences in a Dictionary that are within a certain edit distance of a sequence.
  • Provide a variant of perfect hash automata that puts right language cardinalities in transitions rather than states. This provides faster hashing and hashcode lookups at the cost of some memory.
  • Added String to String mapping (ImmutableStringStringMap).
  • Generic object values.
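
The kind of result an edit-distance lookup produces can be illustrated with a stand-alone sketch. This is a stand-in technique, not the library's Levenshtein automaton: it walks a plain trie while carrying one row of the classic edit-distance table per node and prunes branches that can no longer come within the bound. The class and method names (FuzzyTrieSketch, within) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; not Dictomaton's Levenshtein automaton or API.
class FuzzyTrieSketch {
    private static final class Node {
        final TreeMap<Character, Node> next = new TreeMap<>();
        boolean isFinal;
    }

    private final Node root = new Node();

    void add(CharSequence word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.next.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.isFinal = true;
    }

    // Returns all stored words within `maxDistance` edits of `query`, in sorted order.
    List<String> within(String query, int maxDistance) {
        List<String> result = new ArrayList<>();
        // row[i] = edit distance between the current trie path and query[0..i)
        int[] row = new int[query.length() + 1];
        for (int i = 0; i < row.length; i++) row[i] = i;
        search(root, new StringBuilder(), row, query, maxDistance, result);
        return result;
    }

    private void search(Node node, StringBuilder path, int[] row,
                        String query, int max, List<String> out) {
        if (node.isFinal && row[query.length()] <= max) out.add(path.toString());
        for (Map.Entry<Character, Node> e : node.next.entrySet()) {
            char c = e.getKey();
            int[] nextRow = new int[row.length];
            nextRow[0] = row[0] + 1;
            int min = nextRow[0];
            for (int i = 1; i < row.length; i++) {
                int substitution = row[i - 1] + (query.charAt(i - 1) == c ? 0 : 1);
                int viaPrevRow = row[i] + 1;       // extra path character c
                int viaThisRow = nextRow[i - 1] + 1; // extra query character
                nextRow[i] = Math.min(substitution, Math.min(viaPrevRow, viaThisRow));
                min = Math.min(min, nextRow[i]);
            }
            // Row minima never decrease, so a row above the bound cannot recover.
            if (min <= max) {
                path.append(c);
                search(e.getValue(), path, nextRow, query, max, out);
                path.setLength(path.length() - 1);
            }
        }
    }

    public static void main(String[] args) {
        FuzzyTrieSketch dict = new FuzzyTrieSketch();
        for (String w : new String[] {"foo", "food", "fool", "bar", "foobar"}) dict.add(w);
        System.out.println(dict.within("foo", 1)); // [foo, food, fool]
    }
}
```

A Levenshtein automaton reaches the same answer by intersecting two automata rather than recomputing table rows at each node, which is where the efficiency mentioned in the feature list comes from.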

0.0.3

  • Fix an off-by-one error in integer width of the state table.

0.0.2

  • Rename the project from fsadict-java to dictomaton.
  • Store the state and transition tables as packed int arrays, resulting in drastically smaller automata.

Release plan

Plans for 1.3.0: perhaps an explicit, fast, and compact data storage format as an alternative to Java serialization; a C or C++ version.

Contributors

  • Daniël de Kok (maintainer)
  • René Kriegler