parpalak / Rose

Licence: MIT

Simple PHP search engine that supports Russian and English morphology

Labels

search snippets

Projects that are alternatives of or similar to Rose

Vscode Es7 Javascript React Snippets

Extension for Javascript/React snippets with search supporting ES7 and babel features

Stars: ✭ 435 (+291.89%)

Mutual labels: search, snippets

Lh Cpp

C&C++ ftplugins suite for Vim

Stars: ✭ 108 (-2.7%)

Mutual labels: snippets

Node Sonic Channel

🦉 Sonic Channel integration for Node. Used in pair with Sonic, the fast, lightweight and schema-less search backend.

Stars: ✭ 101 (-9.01%)

Mutual labels: search

Ds2i

A library of inverted index data structures

Stars: ✭ 104 (-6.31%)

Mutual labels: search

Redux Search

Redux bindings for client-side search

Stars: ✭ 1,377 (+1140.54%)

Mutual labels: search

Glsnip

copy and paste across machines

Stars: ✭ 107 (-3.6%)

Mutual labels: snippets

Simpleaudioindexer

Searching for the occurrence seconds of words/phrases or arbitrary regex patterns within audio files

Stars: ✭ 100 (-9.91%)

Mutual labels: search

Datasketch

MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble

Stars: ✭ 1,635 (+1372.97%)

Mutual labels: search

Regex Snippets

Organized list of useful RegEx snippets

Stars: ✭ 109 (-1.8%)

Mutual labels: snippets

Ghost Search

A simple but powerful search library for Ghost Blogging Platform.

Stars: ✭ 104 (-6.31%)

Mutual labels: search

Sublime Robot Framework Assistant

Robot Framework plugin for Sublime Text3

Stars: ✭ 103 (-7.21%)

Mutual labels: snippets

Cloudboost

Realtime JavaScript Backend.

Stars: ✭ 1,378 (+1141.44%)

Mutual labels: search

Java

All Algorithms implemented in Java

Stars: ✭ 42,893 (+38542.34%)

Mutual labels: search

PHP search-systems made possible

Stars: ✭ 101 (-9.01%)

Mutual labels: search

Unitylibrary

📚 Library of all kind of scripts, snippets & shaders for Unity

Stars: ✭ 1,968 (+1672.97%)

Mutual labels: snippets

Atom Turbo Javascript

Commands and snippets for faster Javascript and Typescript with the Atom Editor

Stars: ✭ 100 (-9.91%)

Mutual labels: snippets

Dotfiles

This is a mirror from https://gitlab.com/andreyorst/dotfiles

Stars: ✭ 103 (-7.21%)

Mutual labels: snippets

Sketchup Ruby Api Tutorials

SketchUp Ruby API Tutorials and Examples

Stars: ✭ 105 (-5.41%)

Mutual labels: snippets

Laravel 5 Snippets

Laravel 5 Snippets for Sublime Text

Stars: ✭ 110 (-0.9%)

Mutual labels: snippets

Go Sonic

Sonic driver written in Go.

Stars: ✭ 110 (-0.9%)

Mutual labels: search


Rose

This is a simple search engine for content sites with simplified English and Russian morphology support. It indexes your content and provides full-text search.

Requirements

PHP 5.6 or later.
A relational database (currently only MySQL is supported) if the amount of content is significant.

Installation

composer require s2/rose

If you do not use Composer, download the archive, unpack it somewhere, and make sure the PHP files from the src/ directory are loaded by a PSR-0/PSR-4 compatible autoloader. That said, you really should use Composer.

Usage

Preparing Storage

The index can be stored in a database or in a file. Storage is an abstraction layer that hides implementation details. In most cases you will need the database storage, PdoStorage.

Both indexing and searching require the storage.

use S2\Rose\Storage\Database\PdoStorage;

$pdo = new \PDO('mysql:host=127.0.0.1;dbname=s2_rose_test;charset=utf8', 'username', 'passwd');
$pdo->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);

$storage = new PdoStorage($pdo, 'table_prefix_');

To rebuild the index, call the PdoStorage::erase() method:

$storage->erase();

It drops the index tables (if they exist) and creates new ones from scratch. This is also all you need to do when upgrading to a new version of Rose that breaks backward compatibility of the index.
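The storage is described above only as an abstraction layer that hides implementation details. The following is a minimal sketch of that idea, assuming a hypothetical `IndexStorage` interface with an in-memory implementation (PHP 7.1+). The interface, class, and method names are illustrative only; this is not Rose's actual storage API.

```php
<?php

// Hypothetical storage contract: indexing and searching code talks only to this
// interface, so the backing store (memory, file, database) can be swapped.
interface IndexStorage
{
    public function addOccurrence(string $stem, string $externalId): void;

    /** @return array<string, int> externalId => occurrence count */
    public function getOccurrences(string $stem): array;

    // Drop everything and start from scratch, similar in spirit to PdoStorage::erase().
    public function erase(): void;
}

// Hypothetical in-memory implementation, handy for tests.
final class InMemoryIndexStorage implements IndexStorage
{
    /** @var array<string, array<string, int>> stem => [externalId => count] */
    private $postings = [];

    public function addOccurrence(string $stem, string $externalId): void
    {
        if (!isset($this->postings[$stem][$externalId])) {
            $this->postings[$stem][$externalId] = 0;
        }
        $this->postings[$stem][$externalId]++;
    }

    public function getOccurrences(string $stem): array
    {
        return $this->postings[$stem] ?? [];
    }

    public function erase(): void
    {
        $this->postings = [];
    }
}
```

Keeping the contract this small is what makes a swap possible: a database-backed class would implement the same three methods with SQL instead of arrays.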

Morphology

Rose uses stemmers for natural language processing. A stemmer cuts off the changing (inflected) part of a word, and Rose works with the resulting stems. Rose has no built-in dictionaries; instead, it contains heuristic stemmers developed by Porter. You can integrate any other algorithm by implementing StemmerInterface.

use S2\Rose\Stemmer\PorterStemmerEnglish;
use S2\Rose\Stemmer\PorterStemmerRussian;

// For performance, the primary language goes first (Russian in this case)
$stemmer = new PorterStemmerRussian(new PorterStemmerEnglish());
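The constructor call above chains two stemmers, with the primary language first. Below is a minimal sketch of that chaining idea, assuming a hypothetical `SimpleStemmer` interface and deliberately naive suffix lists. These are not Porter's rules and not Rose's classes. Each stemmer handles words in its own script and passes everything else to the next stemmer in the chain. It needs PHP 7.1+ with the mbstring extension.

```php
<?php

// Hypothetical interface, loosely modelled on the idea behind StemmerInterface.
interface SimpleStemmer
{
    public function stem(string $word): string;
}

// Strips the first matching suffix, or delegates the word to the next stemmer.
abstract class SuffixStemmer implements SimpleStemmer
{
    private $next;

    public function __construct(?SimpleStemmer $next = null)
    {
        $this->next = $next;
    }

    /** Does this stemmer handle the given (lowercased) word at all? */
    abstract protected function supports(string $word): bool;

    /** @return string[] suffixes to try, longest first */
    abstract protected function suffixes(): array;

    public function stem(string $word): string
    {
        $word = mb_strtolower($word, 'UTF-8');
        if (!$this->supports($word)) {
            // Not our language: hand the word over to the next stemmer, if any.
            return $this->next !== null ? $this->next->stem($word) : $word;
        }
        foreach ($this->suffixes() as $suffix) {
            $len = mb_strlen($suffix, 'UTF-8');
            // Keep at least 3 characters of the stem.
            if (mb_strlen($word, 'UTF-8') - $len >= 3
                && mb_substr($word, -$len, null, 'UTF-8') === $suffix) {
                return mb_substr($word, 0, -$len, 'UTF-8');
            }
        }
        return $word;
    }
}

final class NaiveRussianStemmer extends SuffixStemmer
{
    protected function supports(string $word): bool
    {
        return preg_match('/\p{Cyrillic}/u', $word) === 1;
    }

    protected function suffixes(): array
    {
        return ['ами', 'ях', 'ах', 'ов', 'ы', 'и', 'а'];
    }
}

final class NaiveEnglishStemmer extends SuffixStemmer
{
    protected function supports(string $word): bool
    {
        return preg_match('/^[a-z]+$/', $word) === 1;
    }

    protected function suffixes(): array
    {
        return ['ing', 'ed', 's'];
    }
}
```

Usage mirrors the README's order: `new NaiveRussianStemmer(new NaiveEnglishStemmer())`. A Cyrillic word stops at the first stemmer, while an English word costs one extra `supports()` check, which is presumably why the primary language should go first.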

Indexing

Indexer builds the search index. It depends on a stemmer and a storage.

use S2\Rose\Indexer;

$indexer = new Indexer($storage, $stemmer);
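To illustrate what "building the search index" can mean, here is a minimal sketch of an inverted index that maps each stem to the documents containing it and their occurrence counts. The function `buildInvertedIndex` is hypothetical and the stemmer is passed in as a plain callable. This is not Rose's internal index layout. It needs the mbstring extension.

```php
<?php

/**
 * Hypothetical helper: builds an inverted index "stem => [externalId => count]".
 *
 * @param array<string, string>    $documents externalId => text
 * @param callable(string): string $stem      word => stem
 * @return array<string, array<string, int>>
 */
function buildInvertedIndex(array $documents, callable $stem): array
{
    $index = [];
    foreach ($documents as $externalId => $text) {
        // Split on anything that is not a letter or a digit (Unicode-aware).
        $words = preg_split('/[^\p{L}\p{N}]+/u', mb_strtolower($text, 'UTF-8'), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            $key = $stem($word);
            // Count how many times each stem occurs in each document.
            $index[$key][$externalId] = ($index[$key][$externalId] ?? 0) + 1;
        }
    }
    return $index;
}
```

Because lookups go by stem, "page" and "pages" land in the same posting list as long as the stemmer maps them to the same stem.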

Indexer accepts your data in a special format: each item must be wrapped in an Indexable object:

use S2\Rose\Entity\Indexable;

// constructor params (the instance ID is optional)
$indexable = new Indexable(
	'id_1',            // External ID - an identifier in your system
	'Test page title', // Title
	'This is the first page to be indexed. I have to make up a content.', // Content
	1                  // Instance ID - an optional ID of your subsystem
);

// optional params
$indexable
	->setKeywords('singlekeyword, multiple keywords')       // The same as Meta Keywords
	->setDescription('Description can be used in snippets') // The same as Meta Description
	->setDate(new \DateTime('2016-08-24 00:00:00'))
	->setUrl('url1')
;

$indexer->index($indexable);

$indexable = new Indexable(
	'id_2',
	'Test page title 2',
	'This is the second page to be indexed. Let\'s compose something new.'
);
$indexable->setKeywords('content, page');

$indexer->index($indexable);

The constructor of Indexable takes 4 arguments (the last one is optional):

external ID - an arbitrary string ID that is sufficient for your code to identify the page;
page title;
page content;
instance ID - an optional int ID of the page source (e.g. for multi-site services).

You may also provide some optional parameters: keywords, description, date and URL. Keywords affect relevance. The description can be used for building a snippet (see below). It's a good idea to fill these from the "keywords" and "description" meta tags, if your pages have them. The URL can be an arbitrary string.

The Indexer::index() method is used both for adding content to the index and for updating it. If the content has not changed, the method skips the job; otherwise, the old index entry is removed and the content is indexed again.
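The "skip if unchanged, otherwise remove and re-index" behaviour can be sketched with a content hash stored per page. This sketch assumes a hypothetical `ChangeAwareIndexer` class that only records hashes and logs its actions. It does not reflect how Rose detects changes internally. The key combines the instance ID and the external ID, because the same external ID may appear in several instances.

```php
<?php

// Hypothetical sketch of "skip if unchanged, otherwise remove and re-index".
final class ChangeAwareIndexer
{
    /** @var array<string, string> composite key => content hash */
    private $hashes = [];

    /** @var string[] log of actions, for demonstration only */
    public $log = [];

    private static function key(string $externalId, ?int $instanceId): string
    {
        // The same external ID may exist in different instances (e.g. sites).
        return ($instanceId === null ? '' : (string)$instanceId) . ':' . $externalId;
    }

    public function index(string $externalId, string $title, string $content, ?int $instanceId = null): bool
    {
        $key  = self::key($externalId, $instanceId);
        $hash = sha1($title . "\0" . $content);

        if (($this->hashes[$key] ?? null) === $hash) {
            return false; // content unchanged: skip the job
        }
        if (isset($this->hashes[$key])) {
            $this->removeById($externalId, $instanceId); // drop the stale entry first
        }
        $this->hashes[$key] = $hash;
        $this->log[] = "indexed $key";
        return true;
    }

    public function removeById(string $externalId, ?int $instanceId = null): void
    {
        unset($this->hashes[self::key($externalId, $instanceId)]);
        $this->log[] = 'removed ' . self::key($externalId, $instanceId);
    }
}
```

Hashing title and content together means a title-only edit also triggers re-indexing. The `"\0"` separator keeps "ab" + "c" and "a" + "bc" from producing the same hash input.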

When you remove a page from the site, call:

$indexer->removeById($externalId, $instanceId);

Searching

Full-text search results can be obtained via the Finder class. $resultSet->getItems() returns all the information about the found content items and their relevance.

use S2\Rose\Finder;
use S2\Rose\Entity\Query;

$finder    = new Finder($storage, $stemmer);
$resultSet = $finder->find(new Query('content'));

foreach ($resultSet->getItems() as $item) {
	                         // first iteration:          second iteration:
	$item->getId();          // 'id_2'                    'id_1'
	$item->getInstanceId();  // null                      1
	$item->getTitle();       // 'Test page title 2'       'Test page title'
	$item->getUrl();         // ''                        'url1'
	$item->getDescription(); // ''                        'Description can be used in snippets'
	$item->getDate();        // null                      new \DateTime('2016-08-24 00:00:00')
	$item->getRelevance();   // 31.0                      1.0
	$item->getSnippet();     // ''                        'Description can be used in snippets'
}
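To show the general flow of a lookup, here is a sketch of ranking documents by summing occurrence counts of the query stems in an inverted index (`stem => [externalId => count]`). The function `rankByOccurrences` is hypothetical, and so is its scoring. Rose's real relevance formula is different (it produces values like 31.0 and 1.0 above), so treat this only as an illustration of "look up stems, accumulate scores, sort".

```php
<?php

/**
 * Hypothetical ranking: relevance = sum of occurrence counts of the query stems.
 * Rose computes relevance differently; this only shows the lookup flow.
 *
 * @param array<string, array<string, int>> $index stem => [externalId => count]
 * @param string[] $queryStems
 * @return array<string, float> externalId => relevance, best first
 */
function rankByOccurrences(array $index, array $queryStems): array
{
    $relevance = [];
    foreach ($queryStems as $stem) {
        // Stems that are not in the index simply contribute nothing.
        foreach ($index[$stem] ?? [] as $externalId => $count) {
            $relevance[$externalId] = ($relevance[$externalId] ?? 0.0) + $count;
        }
    }
    arsort($relevance); // highest relevance first, keys preserved
    return $relevance;
}
```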

To paginate, set a limit and an offset on the Query object:

$query = new Query('content');
$query
	->setLimit(10)  // 10 results per page
	->setOffset(20) // third page
;
$resultSet = $finder->find($query);
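The offset for a 1-based page number is `(page - 1) * limit`, which is why the third page with 10 results per page starts at offset 20. A small sketch follows. The helper `pageToLimitOffset` is hypothetical; its output could be fed into `setLimit()` and `setOffset()` as shown above.

```php
<?php

/**
 * Hypothetical helper: converts a 1-based page number into limit/offset values.
 *
 * @return array{limit: int, offset: int}
 */
function pageToLimitOffset(int $page, int $perPage): array
{
    if ($page < 1 || $perPage < 1) {
        throw new InvalidArgumentException('Page number and page size must be positive.');
    }
    // Page 1 starts at offset 0, page 2 at $perPage, and so on.
    return ['limit' => $perPage, 'offset' => ($page - 1) * $perPage];
}
```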

Adjust the relevance for favorite and popular pages:

use S2\Rose\Entity\ExternalId;

$resultSet = $finder->find(new Query('content'));
$externalId1 = $resultSet->getFoundExternalIds()->toArray()[0];
var_dump($externalId1->getId(), $externalId1->getInstanceId()); // id_1 1
$resultSet->setRelevanceRatio($externalId1, 3.14);
$resultSet->setRelevanceRatio(new ExternalId('id_2', null), 2);

foreach ($resultSet->getItems() as $item) {
	                         // first iteration:          second iteration:
	$item->getId();          // 'id_2'                    'id_1'
	$item->getRelevance();   // 62.0                      3.14
}
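The numbers in the example are consistent with the ratio multiplying the base relevance (31.0 × 2 = 62.0, 1.0 × 3.14 = 3.14). A minimal sketch of that behaviour follows, assuming plain multiplication followed by re-sorting. The function `applyRelevanceRatios` is hypothetical, and items without a ratio are assumed to keep their relevance (ratio 1.0).

```php
<?php

/**
 * Hypothetical sketch: multiplies base relevance by per-item ratios and re-sorts.
 *
 * @param array<string, float> $relevance externalId => base relevance
 * @param array<string, float> $ratios    externalId => ratio (missing means 1.0)
 * @return array<string, float> externalId => adjusted relevance, best first
 */
function applyRelevanceRatios(array $relevance, array $ratios): array
{
    foreach ($relevance as $externalId => $value) {
        $relevance[$externalId] = $value * ($ratios[$externalId] ?? 1.0);
    }
    arsort($relevance); // a large enough ratio can move an item up the list
    return $relevance;
}
```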

Provide instance id to limit the scope of the search with a subsystem:

$resultSet = $finder->find((new Query('content'))->setInstanceId(1));
$resultSet->setRelevanceRatio(new ExternalId('id_1', 1), 3.14);

foreach ($resultSet->getItems() as $item) {
	                         // first iteration:
	$item->getId();          // 'id_1'
	$item->getInstanceId();  // 1
}
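Conceptually, setting an instance ID narrows the result set to one subsystem. The sketch below shows that narrowing as a plain filter over result items. The function `filterByInstance` and the item shape are hypothetical. A real engine would more likely apply the restriction inside the storage query than filter afterwards.

```php
<?php

/**
 * Hypothetical sketch of scoping results to one instance (subsystem).
 *
 * @param array<int, array{id: string, instanceId: ?int}> $items
 * @return array<int, array{id: string, instanceId: ?int}>
 */
function filterByInstance(array $items, ?int $instanceId): array
{
    if ($instanceId === null) {
        return $items; // no scope requested: search everywhere
    }
    // array_values() re-numbers the kept items from 0.
    return array_values(array_filter($items, function (array $item) use ($instanceId): bool {
        return $item['instanceId'] === $instanceId;
    }));
}
```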

Highlighting and Snippets

It's a common practice to highlight the found words in the search results. You can obtain the highlighted title:

$resultSet = $finder->find(new Query('title'));
$resultSet->getItems()[0]->getHighlightedTitle($stemmer); // 'Test page <i>title</i>'

This method requires the stemmer since it takes into account the morphology and highlights all the word forms. By default, words are highlighted with italics. You can change the highlight template by calling $finder->setHighlightTemplate('<b>%s</b>').
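Here is a sketch of how stem-aware highlighting with a `sprintf`-style template like `'<b>%s</b>'` can work. The text is split into words, each word is stemmed, and the template is applied when the stem matches a query stem. The function `highlightWords` is hypothetical and the stemmer is a plain callable. The sketch also HTML-escapes the text, which is an assumption of this example and not a documented Rose behaviour. It needs the mbstring extension.

```php
<?php

/**
 * Hypothetical sketch: wraps each word whose stem matches a query stem in a
 * sprintf-style template such as '<i>%s</i>' or '<b>%s</b>'.
 *
 * @param string[]                 $queryStems
 * @param callable(string): string $stem word => stem
 */
function highlightWords(string $text, array $queryStems, callable $stem, string $template = '<i>%s</i>'): string
{
    $wanted = array_flip($queryStems);
    // With a capturing delimiter, odd indexes hold the words and even indexes
    // hold the text between them (spaces, punctuation).
    $parts = preg_split('/([\p{L}\p{N}]+)/u', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out   = '';
    foreach ($parts as $i => $part) {
        $escaped = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        if ($i % 2 === 1 && isset($wanted[$stem(mb_strtolower($part, 'UTF-8'))])) {
            $out .= sprintf($template, $escaped);
        } else {
            $out .= $escaped;
        }
    }
    return $out;
}
```

Because matching happens on stems, a query for "title" also highlights "Titles" when the stemmer maps both to the same stem, which matches the README's note that all word forms are highlighted.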

Snippets are short text fragments containing the found words, displayed in the search results. SnippetBuilder processes the source text and selects the best-matching sentences. Attach snippets before calling $resultSet->getItems():

use S2\Rose\Entity\ExternalContent;
use S2\Rose\SnippetBuilder;

$snippetBuilder = new SnippetBuilder($stemmer);
$snippetBuilder->setSnippetLineSeparator(' &middot; '); // Set snippet line separator. Default is '... '.
$snippetBuilder->attachSnippets($resultSet, static function (array $externalIds) {
	/** @var \S2\Rose\Entity\ExternalId[] $externalIds */

	$result = new ExternalContent();
	foreach ($externalIds as $externalId) {
		if ($externalId->getId() === 'id_1') {
			$result->attach($externalId, 'This page is to be indexed. I have to make up a content.');
		}
		else {
			$result->attach($externalId, 'This is the second page to be indexed. Let\'s compose something new.');
		}
	}
	return $result;
});

$resultSet->getItems()[0]->getSnippet(); // 'I have to make up a <i>content</i>.'

Words in snippets are highlighted the same way as in titles.
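The idea of "selecting the best-matching sentences" can be sketched as follows. Split the source into sentences, score each one by how many of its words match a query stem, keep the top few in their original reading order, and join them with a separator. The function `buildSnippet`, its parameters, and the scoring rule are hypothetical and not SnippetBuilder's real algorithm. The default separator mirrors the `'... '` default mentioned above. It needs the mbstring extension.

```php
<?php

/**
 * Hypothetical snippet sketch: keeps the sentences with the most query-stem
 * matches, in reading order, joined by a separator.
 *
 * @param string[]                 $queryStems
 * @param callable(string): string $stem word => stem
 */
function buildSnippet(string $text, array $queryStems, callable $stem, int $maxSentences = 2, string $separator = '... '): string
{
    $wanted    = array_flip($queryStems);
    $sentences = preg_split('/(?<=[.!?])\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    $scores = [];
    foreach ($sentences as $i => $sentence) {
        preg_match_all('/[\p{L}\p{N}]+/u', mb_strtolower($sentence, 'UTF-8'), $m);
        $score = 0;
        foreach ($m[0] as $word) {
            if (isset($wanted[$stem($word)])) {
                $score++;
            }
        }
        if ($score > 0) {
            $scores[$i] = $score; // only sentences with at least one match
        }
    }

    arsort($scores);                                       // best sentences first
    $best = array_slice($scores, 0, $maxSentences, true);  // keep sentence indexes

    // array_intersect_key() keeps the order of $sentences, i.e. reading order.
    return implode($separator, array_intersect_key($sentences, $best));
}
```

Scoring every sentence of every found document is where the cost comes from, which is consistent with the advice below to limit snippet building to one page of results.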

Building snippets is a fairly heavy operation, so use it together with pagination to reduce snippet generation time.


Stars: ✭ 111
