knadh / Dictmaker

Licence: agpl-3.0
A stand-alone web server app for rapidly building and publishing full-fledged dictionary websites and APIs for any language.

Projects that are alternatives of or similar to Dictmaker

Bewgor
Bull's Eye Wordlist Generator - Does your password rely on predictable patterns of accessible info?
Stars: ✭ 333 (+243.3%)
Mutual labels:  dictionary, wordlist
UltimateCMSWordlists
📚 An ultimate collection of wordlists for the best-known CMS
Stars: ✭ 54 (-44.33%)
Mutual labels:  dictionary, wordlist
Joss
The Journal of Open Source Software
Stars: ✭ 779 (+703.09%)
Mutual labels:  publishing, academic
Parsifal
Parsifal is a tool to assist researchers to perform Systematic Literature Reviews
Stars: ✭ 254 (+161.86%)
Mutual labels:  publishing, academic
Duplicut
Remove duplicates from MASSIVE wordlist, without sorting it (for dictionary-based password cracking)
Stars: ✭ 352 (+262.89%)
Mutual labels:  dictionary, wordlist
Kaonashi
Wordlist, rules and masks from Kaonashi project (RootedCON 2019)
Stars: ✭ 353 (+263.92%)
Mutual labels:  dictionary, wordlist
Probable Wordlists
Version 2 is live! Wordlists sorted by probability originally created for password generation and testing - make sure your passwords aren't popular!
Stars: ✭ 7,312 (+7438.14%)
Mutual labels:  dictionary, wordlist
Dictionary
Dictionary of Ukrainian counterparts for technical terms
Stars: ✭ 79 (-18.56%)
Mutual labels:  dictionary
Fundamental Haskell
Fundamental Haskell book, to the point terse statements on Haskell, Category theory, and related fields. Encyclopedic pocketbook of meaning. Zen kōan-like meditations of understanding. For quick or memory curve spaced repetition learning.
Stars: ✭ 88 (-9.28%)
Mutual labels:  dictionary
Pyrustic
Lightweight framework and software suite to help develop, package, and publish Python desktop applications
Stars: ✭ 75 (-22.68%)
Mutual labels:  publishing
Wpub
W3C Web Publications
Stars: ✭ 73 (-24.74%)
Mutual labels:  publishing
Dyno
Package dyno is a utility to work with dynamic objects at ease.
Stars: ✭ 81 (-16.49%)
Mutual labels:  dictionary
Eazydict
A simple, easy-to-use command-line dictionary 📕 📙 📗 📘 📓
Stars: ✭ 92 (-5.15%)
Mutual labels:  dictionary
Dev Terms
A list of generic terminology used by developers
Stars: ✭ 76 (-21.65%)
Mutual labels:  dictionary
Yorubaname Website
Source code for YorubaName dictionary
Stars: ✭ 95 (-2.06%)
Mutual labels:  dictionary
Color Names
Large list of handpicked color names 🌈
Stars: ✭ 1,198 (+1135.05%)
Mutual labels:  dictionary
Django Publications
A Django app for managing scientific publications.
Stars: ✭ 95 (-2.06%)
Mutual labels:  publishing
Wordlist Dracos
Collection My Wordlist
Stars: ✭ 95 (-2.06%)
Mutual labels:  wordlist
Imtools
Fast and memory-efficient immutable collections and helper data structures
Stars: ✭ 85 (-12.37%)
Mutual labels:  dictionary
Pyglossary
A tool for converting dictionary files aka glossaries. The primary purpose is to be able to use our offline glossaries in any Open Source dictionary we like on any OS/device.
Stars: ✭ 1,257 (+1195.88%)
Mutual labels:  dictionary

dictmaker

dictmaker is a stand-alone, single-binary server application for building and publishing dictionary websites. Alar (Kannada-English dictionary) is an example in production.

  • Generic entry-relation (many)-entry structure for dictionary data in a Postgres database with just two tables (entries, relations)
  • Entirely built on top of Postgres full text search using tsvector tokens
  • Works with any language. Plug in external tokenizers or use a built-in tokenizer supported by Postgres for search
  • Possible to have entries and definitions in any number of languages in a single database
  • HTTP/JSON REST-like APIs
  • Themes and templates to publish dictionary websites
  • Paginated A-Z (or any alphabet for any language) glossary generation for dictionary words

How it works

dictmaker has no concept of language or semantics. To make a universal dictionary interface possible, it treats all data as Unicode strings that are searched with Postgres full-text search: search tokens are generated and stored alongside each dictionary entry. Postgres ships with several built-in full-text dictionaries and tokenizers, mostly for European languages (\dFd in psql lists the installed dictionaries). For languages without a full-text dictionary, search tokens can be generated manually with an external algorithm. For instance, to make a phonetic English dictionary, Metaphone tokens can be generated for each entry and search queries issued against them.
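As a rough illustration of the "generate tokens manually" path, the sketch below formats a list of externally produced tokens as a Postgres tsvector literal that could be stored in the tokens column. The function name toTSVector and the sample keys are assumptions for illustration only; the external algorithm (Metaphone or otherwise) is not implemented here, and this is not taken from dictmaker's own code.

```go
package main

import (
	"fmt"
	"strings"
)

// toTSVector formats tokens as a tsvector literal such as 'apl':1 'frt':2.
// Positions start at 1; empty tokens are skipped.
// Hypothetical helper, not part of dictmaker.
func toTSVector(tokens []string) string {
	parts := make([]string, 0, len(tokens))
	pos := 0
	for _, t := range tokens {
		if t == "" {
			continue
		}
		pos++
		// Inside a quoted lexeme, backslashes and single quotes are doubled.
		t = strings.ReplaceAll(t, `\`, `\\`)
		t = strings.ReplaceAll(t, `'`, `''`)
		parts = append(parts, fmt.Sprintf("'%s':%d", t, pos))
	}
	return strings.Join(parts, " ")
}

func main() {
	// Placeholder keys standing in for the output of an external algorithm.
	fmt.Println(toTSVector([]string{"APL", "FRT"}))
}
```

The resulting string could then be passed as a query parameter and cast on the Postgres side (for example with ::tsvector) when inserting an entry.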

Concepts

  • Any number of languages can be defined in the dictionary, e.g. 'english', 'malayalam', 'kannada'.
  • All content, both entry words and their definitions, is stored in the entries table
  • Entry-definition many-to-many relationships are stored in the relations table, represented by from_id (entry word) -> to_id (definition word), where both IDs refer to the entries table.

entries table schema

| Column | Type | Description |
| --- | --- | --- |
| id | SERIAL | |
| guid | TEXT | A custom, unique GUID for every entry, like a UUID or a hash |
| content | TEXT | Actual language content. Dictionary word or definition entries |
| initial | TEXT | The first "alphabet" of the content. For English, for the word Apple, the initial is A |
| weight | INT | An optional numeric value to order search results |
| tokens | TSVECTOR | Fulltext search tokens. For English, Postgres' built-in tokenizer gives to_tsvector('fully conditioned') = 'condit':2 'fulli':1 |
| lang | TEXT | String representing the language of content. Eg: en, english |
| types | TEXT[] | Strings describing the types of content. Eg {noun, propernoun} |
| tags | TEXT[] | Optional tags |
| phones | TEXT[] | Phonetic (pronunciation) descriptions of the content. Eg: {ap(ə)l, aapl} for Apple |
| notes | TEXT | Optional text notes |
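When preparing data for import, the initial column has to be derived from content. A minimal sketch, assuming the initial is simply the first Unicode code point, upper-cased where the script has case, might look like this. The helper name initialOf is hypothetical, and dictmaker itself may derive or expect initials differently.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// initialOf returns the first rune of content as an upper-cased string,
// or "" for empty or invalid input. Hypothetical helper for data import.
func initialOf(content string) string {
	r, size := utf8.DecodeRuneInString(content)
	if r == utf8.RuneError && size <= 1 {
		return ""
	}
	return string(unicode.ToUpper(r))
}

func main() {
	fmt.Println(initialOf("apple"))
}
```

For scripts where a single "alphabet" spans several code points (a base letter plus combining marks, for instance), taking the first rune is not enough and a script-aware rule would be needed.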

relations table schema

| Column | Type | Description |
| --- | --- | --- |
| from_id | INT | ID of the head word or the dictionary entry in the entries table |
| to_id | INT | ID of the definition content in the entries table |
| types | TEXT[] | Strings describing the types of this relation. Eg {noun, propernoun} |
| weight | INT | An optional numeric value to order definition results |
| tags | TEXT[] | Optional tags |
| notes | TEXT | Optional text notes |
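To make the entry-relation-entry model concrete, the following sketch mirrors a subset of the two tables as Go structs and resolves the definitions of a head word in memory. The struct names, the sample rows, and the choice of ascending weight order are assumptions for illustration; dictmaker itself does this with SQL against Postgres.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry mirrors a few columns of the entries table (illustrative subset).
type Entry struct {
	ID      int
	Content string
	Lang    string
}

// Relation mirrors a few columns of the relations table.
type Relation struct {
	FromID int
	ToID   int
	Weight int
}

// definitions returns the content of every entry that fromID points to,
// ordered by relation weight (ascending order is an assumption).
func definitions(entries map[int]Entry, rels []Relation, fromID int) []string {
	var matched []Relation
	for _, r := range rels {
		if r.FromID == fromID {
			matched = append(matched, r)
		}
	}
	sort.SliceStable(matched, func(i, j int) bool {
		return matched[i].Weight < matched[j].Weight
	})
	out := make([]string, 0, len(matched))
	for _, r := range matched {
		if e, ok := entries[r.ToID]; ok {
			out = append(out, e.Content)
		}
	}
	return out
}

func main() {
	// Made-up sample rows: one head word with two definitions.
	entries := map[int]Entry{
		1: {ID: 1, Content: "apple", Lang: "english"},
		2: {ID: 2, Content: "round fruit", Lang: "english"},
		3: {ID: 3, Content: "mela", Lang: "italian"},
	}
	rels := []Relation{{FromID: 1, ToID: 3, Weight: 1}, {FromID: 1, ToID: 2, Weight: 0}}
	fmt.Println(definitions(entries, rels, 1))
}
```

Because both sides of a relation live in the same entries table, a definition can itself be a head word with its own relations.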

Installation

  1. Download the latest release of dictmaker
  2. Run ./dictmaker --new to generate a sample config.toml and DB schema.sql
  3. Create a Postgres DB and execute schema.sql on it to create the tables
  4. Define your dictionary's languages and properties along with other configuration in config.toml
  5. Populate the entries and relations tables with your dictionary data. See the "Sample dictionary" section below
  6. Run the binary: ./dictmaker

Dictionary query API

# /api/dictionary/from_lang/to_lang/word
# Optional query params: ?type=noun&type=noun2&tag=a&tag=b ...
curl localhost:8080/api/dictionary/english/english/apple
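The same request can be assembled programmatically. The sketch below builds the query URL from its parts, including the repeatable type and tag parameters, using only the Go standard library. The function name queryURL and the base address are assumptions; the path layout follows the curl example above.

```go
package main

import (
	"fmt"
	"net/url"
)

// queryURL builds /api/dictionary/from_lang/to_lang/word with optional
// repeated type and tag parameters. Hypothetical client-side helper.
func queryURL(base, fromLang, toLang, word string, types, tags []string) string {
	u := base + "/api/dictionary/" +
		url.PathEscape(fromLang) + "/" +
		url.PathEscape(toLang) + "/" +
		url.PathEscape(word)

	q := url.Values{}
	for _, t := range types {
		q.Add("type", t)
	}
	for _, t := range tags {
		q.Add("tag", t)
	}
	// Encode sorts parameters by key, so tag comes before type.
	if enc := q.Encode(); enc != "" {
		u += "?" + enc
	}
	return u
}

func main() {
	fmt.Println(queryURL("http://localhost:8080", "english", "english", "apple",
		[]string{"noun"}, []string{"a", "b"}))
}
```

Escaping each path segment matters for head words that contain spaces or non-ASCII characters, which is the common case for many languages.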

Sample dictionary

The sample/sample.sql file shows how to set up an English-English and English-Italian dictionary. Keep the English-Italian config in the generated sample config file and execute sample/sample.sql on your database.

Then try:

curl localhost:8080/api/dictionary/english/english/apple
curl localhost:8080/api/dictionary/english/italian/apple

Themes

See the alar-dict/alar.ink repository that powers the Alar dictionary. A theme is a directory with a collection of Go HTML templates. Run a theme by passing ./dictmaker --site=theme_dir.
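For readers unfamiliar with Go HTML templates, here is a minimal, self-contained sketch of rendering one with the standard html/template package. The pageData type, its field names, and the inline template are hypothetical; the data that dictmaker actually passes to theme templates is not described here, so consult the Alar theme for the real variables.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// pageData is a made-up shape for illustration only.
type pageData struct {
	Word        string
	Definitions []string
}

// page is an inline stand-in for a template file inside a theme directory.
const page = `<h1>{{ .Word }}</h1><ul>{{ range .Definitions }}<li>{{ . }}</li>{{ end }}</ul>`

// render executes the template against d and returns the resulting HTML.
func render(d pageData) (string, error) {
	t, err := template.New("entry").Parse(page)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, d); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := render(pageData{Word: "apple", Definitions: []string{"a <round> fruit"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

html/template escapes values for the HTML context automatically, which is useful when dictionary content contains characters such as < or &.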

Tokenizer plugins

For languages that do not have Postgres fulltext dictionaries and tokenizers, dictmaker supports loading compiled Go tokenizer plugins that implement the search.Tokenizer interface. See tokenizers/kannada (and Makefile for compilation help).
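The general shape of a tokenizer can be sketched as follows. The Tokenizer interface below is a hypothetical, simplified stand-in and is not the actual search.Tokenizer definition, whose methods should be checked in the dictmaker source; the tokenizer simply lower-cases text and splits it on anything that is not a letter, digit, or combining mark.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Tokenizer is a hypothetical interface for illustration; the real
// search.Tokenizer in dictmaker may declare different methods.
type Tokenizer interface {
	ToTokens(s string) []string
}

// simpleTokenizer lower-cases input and splits on non-word runes.
type simpleTokenizer struct{}

func (simpleTokenizer) ToTokens(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		// Keep letters, numbers, and combining marks (needed for many scripts).
		return !unicode.IsLetter(r) && !unicode.IsNumber(r) && !unicode.IsMark(r)
	})
}

func main() {
	var t Tokenizer = simpleTokenizer{}
	fmt.Println(t.ToTokens("Fully conditioned!"))
}
```

A real plugin for a language such as Kannada would replace the splitting logic with language-specific rules, for example phonetic normalisation, and would be compiled as a Go plugin as shown in the repository's Makefile.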

Licensed under the AGPL v3 license.
