pegesund / clojureranker

Licence: EPL-2.0 license
Tune Solr-rankings with Clojure code.

clj.clojureranker

  • Rescore Solr results with Clojure functions
  • Connect to nREPL for a fast development cycle
  • Use the whole Clojure ecosystem while rescoring
  • Build Solr plugins without repacking jars and restarting Solr all the time

Usage

You can write Clojure functions to rescore the Solr results.

Only the top n results of the query are rescored; the top parameter defines how many.

Getting started should be pretty quick:

  • Check out this project and run: lein uberjar
  • Copy the uberjar from the target dir into the Solr classpath
    • If you put the jar in solr/lib, add startup="lazy" to your request handler
    • I normally put the jar in core-dir/lib (you will have to create this dir)
  • If you use the Solr config below, you will be up and running with the default rescorer, which assigns random scores
  • Create your own Leiningen project containing the new rescore function and add it to the Solr classpath, or just keep working in this Leiningen project
  • Update the require and function entries in the Solr config so that they point to the new rescore function

Even simpler distribution

If you think creating a Leiningen project is overkill, you can instead use the "load-file" parameter, which should point to an absolute file path.

The plugin will then call load-file on this file at startup.
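
As a rough sketch, a file loaded this way could look like the following. The path, the namespace my.rescorer and the "popularity" field are assumptions made up for illustration; only the input and output shapes follow the description in the Rescore function section. The function parameter in the Solr config would then presumably name my.rescorer/rescore.

```clojure
;; Hypothetical file, e.g. /opt/solr/rescorers/my_rescorer.clj
;; (path, namespace and field name are made up for this sketch).
(ns my.rescorer)

(defn rescore
  "Adds a small boost to documents that carry a numeric value in the
  (hypothetical) \"popularity\" field. Documents without it keep their score."
  [score-list]
  (mapv (fn [[old-score lucene-id solr-doc]]
          (let [popularity (or (.get solr-doc "popularity") 0)]
            ;; return [new-score lucene-id], as the plugin expects
            [(+ old-score (* 0.01 popularity)) lucene-id]))
        score-list))
```

Because the file is plain Clojure source, it can also be evaluated in any REPL to check it before pointing Solr at it.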

Solr configuration

  <searchComponent name="cselect" class="clojureranker.Rescorer">
     <lst name="defaults">
       <bool name="start-nrepl">true</bool>
       <str name="searchComponentName">cselect</str>     
       <str name="require">clojureranker.test</str>  
       <str name="function">clojureranker.test/rescore</str> 
       <int name="top">30</int>                      
     </lst>
  </searchComponent>

Then add these lines to your request handler to activate the component:

     <arr name="last-components">
       <str>cselect</str>
     </arr>

Note:

  • You need to repeat the search component name in the defaults config (as above)
  • Start the REPL with the start-nrepl parameter. Only one REPL will be started per Solr instance
  • You can have different search components if you need different rescore functions on different cores

Rescore function

An example of what a rescore function looks like:

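(defn rescore
  "this is only a test rescore function"
  [score_list]
  (map (fn [doc]
         (let [old-score (first doc)   ; original Solr score (unused here)
               lucene-id (second doc)  ; internal Lucene document id
               solr-doc  (nth doc 2)   ; the Solr document, fields via .get
               ;; pin one specific id to score 1, everything else is random
               new-score (if (= (.get solr-doc "id") "055357342X") 1 (rand))]
           [new-score lucene-id]))
       score_list))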

The input to the rescore function is a list of lists like this:

 [[score lucene-id solr-doc] [score lucene-id solr-doc] [score lucene-id solr-doc] ...]

The function must return a list of this form:

 [[new-score lucene-id] [new-score lucene-id] [new-score lucene-id] ...]

To note:

  • Sorting is handled by the framework; you just provide the new score
  • All Solr fields are available through the .get call shown above
  • In the example above, all hits get a random score, except the one with id 055357342X. That one gets the score 1, so it should always be on top.
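
To get a feel for these data shapes at a REPL without a running Solr, something like the following sketch can help. It uses java.util.HashMap as a stand-in for the Solr document (an assumption: the rescore function only needs .get to work), and preview-order is a hypothetical helper that imitates a highest-score-first sort. The real sorting is done by the plugin.

```clojure
(defn fake-doc
  "Builds a stand-in for a Solr document from a Clojure map."
  [m]
  (java.util.HashMap. ^java.util.Map m))

(defn pin-to-top
  "Same idea as the example above: one id gets score 1, all other
  documents get a random score in [0, 1)."
  [score-list]
  (mapv (fn [[_old-score lucene-id solr-doc]]
          [(if (= (.get solr-doc "id") "055357342X") 1 (rand)) lucene-id])
        score-list))

(defn preview-order
  "Hypothetical helper: lucene ids ordered by new score, highest first."
  [rescored]
  (mapv second (sort-by first > rescored)))

;; Input shape: [[score lucene-id solr-doc] ...]
(def sample
  [[3.2 10 (fake-doc {"id" "A"})]
   [2.9 11 (fake-doc {"id" "055357342X"})]
   [1.5 12 (fake-doc {"id" "B"})]])

;; Lucene id 11 comes first; the order of 10 and 12 varies per call.
(preview-order (pin-to-top sample))
```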

nRepl

The REPL is started on port 7888. Connect with your favorite editor, then recompile and test on the fly. There are no long restart and packaging cycles, but when you require new packages in the project file you will have to rebuild and restart Solr.

The REPL should of course only be run in debug environments, as it is a loaded gun :)

Speed

It is pretty fast, and I can hardly notice the difference between a normal Solr query and a rescored one.

But if you do heavy stuff, like fetching info through HTTP requests and/or heavy vector calculations, response time will probably rise.
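
One common way to keep heavier rescoring cheap is to cache expensive lookups between queries. The sketch below uses memoize from clojure.core; slow-lookup is a hypothetical placeholder for an HTTP call or a vector computation, and the "category" field is likewise made up. Note that memoize never evicts entries, so this only suits a bounded set of keys.

```clojure
;; Counter that exists only to show how often the slow path runs.
(def lookup-calls (atom 0))

(defn slow-lookup
  "Hypothetical stand-in for an expensive call (HTTP request, vector math).
  Returns a boost for the given category."
  [category]
  (swap! lookup-calls inc)
  (if (= category "books") 0.5 0.0))

;; Cached wrapper: each distinct category hits slow-lookup only once.
(def cached-lookup (memoize slow-lookup))

(defn rescore
  "Adds the cached per-category boost to the original score."
  [score-list]
  (mapv (fn [[old-score lucene-id solr-doc]]
          [(+ old-score (cached-lookup (.get solr-doc "category"))) lucene-id])
        score-list))
```

With three documents spread over two categories, slow-lookup runs twice rather than three times, and later queries with the same categories skip it entirely.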

Contributions and feedback

Contributions and feedback are of course welcome. Just create a pull request and drop me a note.

TellusR

My company, Sannsyn, is working on a plugin called TellusR to do stuff like this in Solr:

  • AB-testing directly in Solr
  • Boosting and tuning based on AI
  • Personalization based on semantic and/or click/purchase info
  • Statistics to see how the search is used:
    • Most used terms
    • Trending stuff
    • Which stuff converts best to clicks/buys
    • Find which articles are never shown in hit lists
    • Find articles which are shown, but do not convert
    • Number of zero-hit searches, how they trend and which terms cause them
    • Avg hits per day, distribution through time and so on
    • Response time
    • Request times
    • We use smart algorithms and anomaly detection to warn you about trouble
  • GUI for synonyms, elevation and advanced boosting rules
  • More features coming :)

We also adapt the plugin for larger customers if needed.

Parts of this will be open source. Stay tuned, or if you are interested, just drop us a line to get some early info.

Embedding and bootstrapping the Clojure interpreter

This line cost me my last non-grey hair, but it enabled me to embed and bootstrap the Clojure interpreter from Solr:

    Thread.currentThread().setContextClassLoader(this.getClass().getClassLoader());

I mention it here specifically, as it might save someone else some work.

Drop me a line if you have an alternative approach.

Solr versions

This plugin is compiled against solr-core 8.4.1. Chances are good that it will work out of the box with newer/older versions as well.

But if you would like to be certain, just check out the project, change the 8.4.1 in the project file to your Solr version, and then run:

lein uberjar

The new jar to add to Solr will be in the target dir.
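
The dependency entry in question probably looks something like the sketch below. The project version and the Clojure version are assumptions, not copied from the actual project file; org.apache.solr/solr-core is the Maven coordinate of Solr core.

```clojure
;; project.clj (sketch; everything except the solr-core coordinate is assumed)
(defproject clojureranker "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.10.1"]
                 ;; change "8.4.1" to match your Solr version
                 [org.apache.solr/solr-core "8.4.1"]])
```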

Credits

This plugin is loosely based on info in this article

Thanks for open sourcing!

License

Copyright © 2020 Petter Egesund and Sannsyn

This program and the accompanying materials are made available under the terms of the Eclipse Public License 2.0 which is available at http://www.eclipse.org/legal/epl-2.0.

This Source Code may also be made available under the following Secondary Licenses when the conditions for such availability set forth in the Eclipse Public License, v. 2.0 are satisfied: GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version, with the GNU Classpath Exception which is available at https://www.gnu.org/software/classpath/license.html.
