newca12 / Dictionary Builder

Licence: gpl-3.0
Real-world example demonstrating advanced techniques for unmarshalling very large XML documents with a very low memory footprint.

Programming Languages

rust

Dictionary builder

About

This project lets you build dictionaries from Wiktionary entries.

Dictionary builder started as a demonstration of advanced JAXB techniques for unmarshalling very large XML documents with a very low memory footprint.
The Java/JAXB implementation has been archived in the java-jaxb branch.

It was then rewritten in Scala with Akka Streams.
The Scala/akka-stream implementation has been archived in the scala-akka-streams branch.

It has now been rewritten in Rust.
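
The low-memory idea behind all three implementations can be illustrated with a minimal sketch: read the dump as a stream and hand over one page at a time, so that memory use depends on the largest page rather than on the size of the dump. The code below is not the project's actual implementation. It is a line-oriented scan that assumes the usual layout of MediaWiki dumps (`<title>` on a single line, `<text>` possibly spanning several lines), it is not a full XML parser, and it does not decode XML entities. The names `Page` and `for_each_page` are hypothetical.

```rust
use std::io::{self, BufRead};

/// One dump page: its title and raw wikitext (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub struct Page {
    pub title: String,
    pub text: String,
}

/// Streams pages out of a MediaWiki-style dump, calling `on_page` for each
/// `<page>` element. Only the page currently being read is held in memory.
/// Returns the number of pages seen.
pub fn for_each_page<R: BufRead, F: FnMut(Page)>(reader: R, mut on_page: F) -> io::Result<usize> {
    let mut count = 0;
    let mut title = String::new();
    let mut text = String::new();
    let mut in_text = false;

    for line in reader.lines() {
        let line = line?;
        let trimmed = line.trim();
        if in_text {
            // Inside a multi-line <text> element: accumulate until the closing tag.
            if let Some(end) = line.find("</text>") {
                text.push_str(&line[..end]);
                in_text = false;
            } else {
                text.push_str(&line);
                text.push('\n');
            }
        } else if let Some(rest) = trimmed.strip_prefix("<title>") {
            title = rest.strip_suffix("</title>").unwrap_or(rest).to_string();
        } else if trimmed.starts_with("<text") {
            // Skip the attributes of the opening tag.
            if let Some(open_end) = trimmed.find('>') {
                let body = &trimmed[open_end + 1..];
                text.clear();
                if let Some(end) = body.find("</text>") {
                    // Whole element on one line.
                    text.push_str(&body[..end]);
                } else if !trimmed[..=open_end].ends_with("/>") {
                    // Element continues on the following lines.
                    text.push_str(body);
                    text.push('\n');
                    in_text = true;
                }
                // A self-closing <text ... /> leaves the text empty.
            }
        } else if trimmed == "</page>" {
            // Hand the finished page over and reuse the buffers for the next one.
            on_page(Page {
                title: std::mem::take(&mut title),
                text: std::mem::take(&mut text),
            });
            count += 1;
        }
    }
    Ok(count)
}
```

To process a real dump, the reader would typically be a `std::io::BufReader` wrapped around the uncompressed file, and the callback would write each dictionary entry to disk instead of collecting pages in memory. A production implementation would more likely rely on a streaming (pull) XML parser to handle entities and unusual layouts correctly.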

The resulting dictionary is exactly the same with all three implementations. None of them was designed to be used as a benchmark, but the Rust results are nevertheless breathtaking. See the performance comparison below.

dictionary-builder is an EDLA project.

The purpose of edla.org is to promote the state of the art in various domains.

How to use it

  1. Rust needs to be installed to build the executable.

  2. Get a fresh Wiktionary backup.
    Choose your favorite language and download the dump containing the current versions of article content.
    Example for the English dump: http://dumps.wikimedia.org/enwiktionary/latest/enwiktionary-latest-pages-articles-multistream.xml.bz2

  3. Uncompress the freshly downloaded dump somewhere (make sure you have up to 6 GB of free disk space).

  4. Build the executable: cargo build --release

  5. Edit Settings.toml to indicate the language you chose, where the dump is located and, last but not least, where the dictionary should be generated.
    (Make sure you have enough free disk space to store your dictionary.)

  6. Launch the program: ./target/release/dictionary-builder

  7. Some results:
    From the English dump, 746879 entries are generated in less than 2 minutes, and the dictionary requires 3 GB of disk space.

That's it.
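
Step 5 asks for three pieces of information: the language, the dump location and the output location. The sketch below shows how such settings could be read and validated before a long run starts. The key names `language`, `dump_path` and `output_dir` are assumptions made for illustration; they are not taken from the project's settings file, so check the file shipped with the repository for the real keys. The parser only understands `key = "value"` lines and `#` comments, which is a tiny subset of TOML.

```rust
use std::collections::HashMap;

/// Hypothetical settings; the field names are assumptions, not the project's real keys.
#[derive(Debug, PartialEq)]
pub struct Settings {
    pub language: String,
    pub dump_path: String,
    pub output_dir: String,
}

/// Parses `key = "value"` lines, ignoring blank lines and `#` comments.
/// Returns an error message if a line is malformed or a required key is missing.
pub fn parse_settings(src: &str) -> Result<Settings, String> {
    let mut map: HashMap<String, String> = HashMap::new();
    for (i, raw) in src.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("line {}: expected key = \"value\"", i + 1))?;
        // Values must be double-quoted strings in this sketch.
        let value = value
            .trim()
            .strip_prefix('"')
            .and_then(|v| v.strip_suffix('"'))
            .ok_or_else(|| format!("line {}: value must be a quoted string", i + 1))?;
        map.insert(key.trim().to_string(), value.to_string());
    }
    // Fail early when a required key is absent.
    let mut take = |k: &str| map.remove(k).ok_or_else(|| format!("missing key: {}", k));
    Ok(Settings {
        language: take("language")?,
        dump_path: take("dump_path")?,
        output_dir: take("output_dir")?,
    })
}
```

Validating the settings up front means a typo in a path is reported immediately rather than after the dump has been partly processed. A real project would more likely use a dedicated TOML parsing crate than a hand-written parser like this one.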

Limitations

The Rust version was not tested on Windows systems.

Performance comparison

Tests were run on a modest i7-4600U CPU @ 2.10GHz with an SSD.
The results sound like a joke:

|                     | Rust     | Scala/akka streams | Java/JAXB |
|---------------------|----------|--------------------|-----------|
| without definitions | 37s      | 4min 47s           | 7min 36s  |
| with definitions    | 1min 53s | 5min 46s           | 9min 1s   |

The Rust implementation outperforms the other implementations by far, and the icing on the cake: Rust uses ten times less memory. 🚀
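
To put numbers on "by far", the timings in the table above can be converted to seconds and divided. The short sketch below does only that arithmetic. The timings are copied from the table, and the helper names are made up for this example.

```rust
/// Converts minutes and seconds to total seconds.
pub fn secs(min: u32, sec: u32) -> u32 {
    min * 60 + sec
}

/// How many times faster a run of `fast_secs` is than a run of `slow_secs`.
pub fn speedup(slow_secs: u32, fast_secs: u32) -> f64 {
    slow_secs as f64 / fast_secs as f64
}

/// Timings from the table, as (label, Rust, Scala/akka streams, Java/JAXB) in seconds.
pub fn timings() -> [(&'static str, u32, u32, u32); 2] {
    [
        ("without definitions", secs(0, 37), secs(4, 47), secs(7, 36)),
        ("with definitions", secs(1, 53), secs(5, 46), secs(9, 1)),
    ]
}

/// One summary line per table row, giving the Rust speedup over each other implementation.
pub fn report() -> Vec<String> {
    timings()
        .iter()
        .map(|&(label, rust, scala, java)| {
            format!(
                "{}: {:.1}x vs Scala/akka streams, {:.1}x vs Java/JAXB",
                label,
                speedup(scala, rust),
                speedup(java, rust)
            )
        })
        .collect()
}
```

On these figures, the Rust version is roughly 7.8 times faster than Scala/akka streams and 12.3 times faster than Java/JAXB without definitions, and roughly 3.1 and 4.8 times faster with definitions. As noted earlier, none of the implementations was designed as a benchmark, so these ratios are indicative only.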

License

© 2009-2020 Olivier ROLAND. Distributed under the GPLv3 License.
