MihaiValentin / Lunr Languages

Licence: other
A collection of language stemmers and stopwords for the Lunr JavaScript library

Lunr Languages

Lunr Languages is a Lunr addon that helps you search in documents written in the following languages:

  • German
  • French
  • Spanish
  • Italian
  • Japanese
  • Dutch
  • Danish
  • Portuguese
  • Finnish
  • Romanian
  • Hungarian
  • Russian
  • Norwegian
  • Thai
  • Vietnamese
  • Arabic
  • Contribute with a new language

Lunr Languages is compatible with Lunr versions 0.6, 0.7, 1.0 and 2.X.

How to use

Lunr Languages works well with script loaders (Webpack, RequireJS) and can be used in the browser and on the server.

In a web browser

The following example is for the German language (de).

Add the following JS files to the page:

<script src="lunr.js"></script> <!-- lunr.js library -->
<script src="lunr.stemmer.support.js"></script>
<script src="lunr.de.js"></script> <!-- or any other language you want -->

Then use the language when initializing lunr:

var idx = lunr(function () {
  // use the language (de)
  this.use(lunr.de);
  // then, the normal lunr index initialization
  this.field('title', { boost: 10 });
  this.field('body');
  // now you can call this.add(...) to add documents written in German
});

That's it. Just add the documents and you're done. When searching, lunr applies the same language stemmer and stop word list that you used for indexing.

In a web browser, with RequireJS

Add require.js to the page:

<script src="lib/require.js"></script>

Then use the language when initializing lunr:

require(['lib/lunr.js', '../lunr.stemmer.support.js', '../lunr.de.js'], function(lunr, stemmerSupport, de) {
  // since the stemmerSupport and de add keys on the lunr object, we'll pass it as reference to them
  // in the end, we will only need lunr.
  stemmerSupport(lunr); // adds lunr.stemmerSupport
  de(lunr); // adds lunr.de key

  // at this point, lunr can be used
  var idx = lunr(function () {
    // use the language (de)
    this.use(lunr.de);
    // then, the normal lunr index initialization
    this.field('title', { boost: 10 });
    this.field('body');
    // now you can call this.add(...) to add documents written in German
  });
});

With node.js

var lunr = require('./lib/lunr.js');
require('./lunr.stemmer.support.js')(lunr);
require('./lunr.de.js')(lunr); // or any other language you want

var idx = lunr(function () {
  // use the language (de)
  this.use(lunr.de);
  // then, the normal lunr index initialization
  this.field('title', { boost: 10 });
  this.field('body');
  // now you can call this.add(...) to add documents written in German
});

Indexing multi-language content

If your documents are written in more than one language, you can enable multi-language indexing. This ensures that every word is properly trimmed and stemmed, every stopword is removed, and no words are lost (indexing in just one language would remove the words written in every other language).

var lunr = require('./lib/lunr.js');
require('./lunr.stemmer.support.js')(lunr);
require('./lunr.ru.js')(lunr);
require('./lunr.multi.js')(lunr);

var idx = lunr(function () {
  // "en" is not loaded above because English support is built into lunr.js
  this.use(lunr.multiLanguage('en', 'ru'));
  // then, the normal lunr index initialization
  // ...
});

You can combine any number of supported languages this way. The corresponding lunr language scripts must be loaded (English is built in).
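
To see why indexing in just one language can lose words, here is a minimal sketch of a trimmer in plain JavaScript. It does not use Lunr or Lunr Languages; makeTrimmer, trimEnglish and trimMulti are hypothetical names invented for this illustration, and the character ranges are simplified assumptions rather than the ones the library ships with.

```javascript
// Hypothetical helper (not part of Lunr): builds a trimmer that strips
// every leading and trailing character NOT listed in wordChars.
function makeTrimmer(wordChars) {
  var leading = new RegExp('^[^' + wordChars + ']+');
  var trailing = new RegExp('[^' + wordChars + ']+$');
  return function (token) {
    return token.replace(leading, '').replace(trailing, '');
  };
}

// Simplified character ranges, assumed for this sketch only.
var latin = 'A-Za-z0-9_';
var cyrillic = '\u0400-\u04FF';

var trimEnglish = makeTrimmer(latin);           // knows Latin characters only
var trimMulti = makeTrimmer(latin + cyrillic);  // knows both alphabets

var tokens = ['Hello,', 'привет!'];

// Trim every token, then drop the ones that became empty strings.
var englishOnly = tokens.map(trimEnglish).filter(Boolean);
var multi = tokens.map(trimMulti).filter(Boolean);

console.log(englishOnly); // [ 'Hello' ]
console.log(multi);       // [ 'Hello', 'привет' ]
```

With the English-only trimmer, the Russian word consists entirely of characters the trimmer treats as noise, so it is trimmed down to nothing and never reaches the index. Combining the character sets keeps both words.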

If you serialize the index and load it in another script, you'll have to initialize the multi-language support in that script, too, like this:

lunr.multiLanguage('en', 'ru');
var idx = lunr.Index.load(serializedIndex);

How to add a new language

Check the Contributing section

How does Lunr Languages work?

Searching inside documents is not as straightforward as using indexOf(), since there are many things to consider in order to get quality search results:

  • Tokenization
    • Given a string like "Hope you like using Lunr Languages!", the tokenizer would split it into individual words, becoming an array like ['Hope', 'you', 'like', 'using', 'Lunr', 'Languages!']
    • Though it seems a trivial task for Latin characters (just splitting by the space), it gets more complicated for languages like Japanese. Lunr Languages includes a tokenizer for Japanese.
  • Trimming
    • After tokenization, trimming ensures that the words contain just what is needed in them. In our example above, the trimmer would convert Languages! into Languages
    • So, the trimmer basically removes special characters that do not add value for the search purpose.
  • Stemming
    • What happens if our text contains the word consignment but we want to search for consigned? It should find it, since its meaning is the same, only the form is different.
    • A stemmer extracts the root of words that can have many forms and stores it in the index. Then, any search is also stemmed and searched in the index.
    • Lunr Languages does stemming for all the included languages, so you can capture all the forms of words in your documents.
  • Stop words
    • There's no point in adding or searching for words like the, it, so, etc. These words are called stop words.
    • Stop words are removed so your index will only contain meaningful words.
    • Lunr Languages includes stop words for all the included languages.
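
The four steps above can be sketched as a toy pipeline in plain JavaScript. This is only an illustration under simplifying assumptions: the stop word list is tiny, the stemmer just strips a few English suffixes, and none of the function names (tokenize, trim, isStopWord, stem, analyze) come from Lunr or Lunr Languages, which use real stemming algorithms instead.

```javascript
// Tiny stop word list, assumed for this sketch only.
var STOP_WORDS = ['the', 'it', 'so', 'you'];

// Tokenization: split on whitespace and drop empty pieces.
function tokenize(text) {
  return text.split(/\s+/).filter(Boolean);
}

// Trimming: strip leading and trailing non-word characters.
function trim(token) {
  return token.replace(/^\W+/, '').replace(/\W+$/, '');
}

// Stop words: report whether a (lowercased) token is in the list.
function isStopWord(token) {
  return STOP_WORDS.indexOf(token) !== -1;
}

// Stemming: deliberately naive, strips one of a few English suffixes.
function stem(token) {
  return token.replace(/(ment|ed|ing|s)$/, '');
}

// Full pipeline: tokenize -> trim and lowercase -> remove stop words -> stem.
function analyze(text) {
  return tokenize(text)
    .map(function (t) { return trim(t).toLowerCase(); })
    .filter(function (t) { return t && !isStopWord(t); })
    .map(stem);
}

console.log(analyze('Hope you like using Lunr Languages!'));
// [ 'hope', 'like', 'us', 'lunr', 'language' ]

console.log(stem('consignment'), stem('consigned'));
// consign consign
```

Because both the indexed text and the search query go through the same steps, a search for consigned matches a document containing consignment. The toy stemmer also turns using into us, which shows why real stemmers need language-specific rules.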

Technical details & Credits

I've created this project by compiling and wrapping stemmers together with stop words from various sources so they can be directly used with all the current versions of Lunr.

I am providing code in the repository to you under an open source license. Because this is my personal repository, the license you receive to my code is from me and not my employer (Facebook).
