izenecloud / Sf1r Lite

Licence: apache-2.0

Search Formula-1: a distributed, high-performance massive data engine for enterprise/vertical search

SF1R-Lite (Search Formula-1 Lite Engine)

A distributed massive data engine for vertical search in C++.

Features

Flexible configuration. SF1R is highly configurable and can run as either a distributed or a non-distributed search engine. For Asian languages, different morphological analyzers or dedicated tokenizers can be applied to suit different situations. Each SF1R instance can be configured to support multiple collections, where a collection is comparable to a "Table" in an RDBMS. Collections can be managed fully dynamically without stopping the server instance.
Commercially proven. SF1R has been proven in commercial environments with both complicated requirements and very high concurrency. To satisfy different kinds of requirements, SF1R supports three kinds of indices: a Lucene-like file-based inverted index, a pure memory-based inverted index with very high decompression performance, and a succinct self index. A practical deployment is a search cloud with both distributed and non-distributed verticals, all behind a single nginx-based HTTP reverse proxy that provides a unified entry point.
Extendable mining components. In its early stages, SF1R had tens of mining components attached, such as duplicate detection, taxonomy generation, query recommendation, and collaborative filtering. To keep the repository as light as possible, we removed most of them. However, the architecture of SF1R still makes it possible to reintroduce any of them. In fact, one of the indices, the succinct self index, is encapsulated as a mining component for convenience.
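
To make the idea of an inverted index concrete, here is a minimal in-memory sketch in C++11. It is not SF1R's implementation: the class ToyInvertedIndex, the DocId alias, and the whitespace tokenizer are all hypothetical names invented for this illustration, and the sketch ignores compression, persistence, and morphological analysis entirely.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy types for illustration only; not SF1R's actual classes.
using DocId = std::uint32_t;

class ToyInvertedIndex {
public:
    // Split the text on whitespace and record the document under each term.
    // Assumption: documents are added in increasing DocId order, so every
    // postings list stays sorted without an explicit sort step.
    void addDocument(DocId id, const std::string& text) {
        std::istringstream in(text);
        std::string term;
        while (in >> term) {
            std::vector<DocId>& postings = postings_[term];
            // Skip repeats of the same term within one document.
            if (postings.empty() || postings.back() != id) {
                postings.push_back(id);
            }
        }
    }

    // Return the ids of documents that contain every query term (AND).
    // An empty query, or any term missing from the index, yields no hits.
    std::vector<DocId> search(const std::string& query) const {
        std::istringstream in(query);
        std::string term;
        std::vector<DocId> result;
        bool first = true;
        while (in >> term) {
            auto it = postings_.find(term);
            if (it == postings_.end()) {
                return std::vector<DocId>();
            }
            if (first) {
                result = it->second;
                first = false;
                continue;
            }
            // Intersect the running result with this term's sorted postings.
            std::vector<DocId> merged;
            std::set_intersection(result.begin(), result.end(),
                                  it->second.begin(), it->second.end(),
                                  std::back_inserter(merged));
            result.swap(merged);
        }
        return result;
    }

private:
    // term -> sorted list of ids of documents containing the term
    std::map<std::string, std::vector<DocId>> postings_;
};
```

Under the same assumptions, several collections could be modelled as a std::map<std::string, ToyInvertedIndex> keyed by collection name, which mirrors the "Table" analogy above: adding or dropping an entry in the map does not affect the other collections.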

Documents

The Chinese documentation can be accessed here, and an English technical report is also available.

Dependencies

SF1R recently switched to C++11, so GCC 4.8 is required to build it. We do not recommend building the project on Ubuntu because of the nested references among many of the libraries. CentOS, Red Hat, Gentoo, and CoreOS are the preferred platforms. You also need CMake and Boost 1.56 to build the repository. The dependent repositories are:

cmake: The cmake modules required to build all iZENECloud C++ projects.
izenelib: The general purpose C++ libraries.
icma: The Chinese morphological analyzer library.
ijma: The Japanese morphological analyzer library.
ilplib: The language processing libraries.
idmlib: The data mining libraries.

In addition, some third-party repositories are required:

Tokyocabinet: The Tokyo Cabinet key-value library is seldom used, but a unified access-method wrapper is provided for it.
Google Glog: The logging library provided by Google.
Thrift: Optional. Thrift is required only if you want SF1R to connect to Cassandra; a C++ Cassandra client is provided in izenelib.

Additionally, there are two extra projects:

nginx: The nginx-based reverse proxy for SF1R. It is the first nginx project able to connect to ZooKeeper to become aware of SF1R's node topology.
Ruby driver: The Ruby client for SF1R. It also contains a web API sender for testing purposes.

Usage

To use SF1R, you should have configuration files located in the config directory. After that:

$ cd bin
$ ./CobraProcess -F config

See the documentation for further usage.

License

The SF1R project is published under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0
