
ubleipzig / solrdump

License: GPL-3.0
Export SOLR documents efficiently with cursors.

README

Export documents from a SOLR index as JSON, fast and simply from the command line.

Requesting a large number of documents from SOLR can lead to deep paging problems:

When you wish to fetch a very large number of sorted results from Solr to feed into an external system, using very large values for the start or rows parameters can be very inefficient.
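To make the cost concrete, here is a rough back-of-the-envelope sketch in Go. It assumes, as the quote above suggests, that a request with `start=S` and `rows=R` makes Solr collect and sort roughly `S+R` matching documents, while a cursor request only needs about `R`. The function names and the example numbers are illustrative and not part of solrdump.

```go
package main

import "fmt"

// rankedWithStart estimates how many sorted entries Solr handles in total
// when paging through `total` documents with start/rows: the request at
// offset `start` has to collect about start+rows entries.
func rankedWithStart(total, rows int) int {
	sum := 0
	for start := 0; start < total; start += rows {
		sum += start + rows
	}
	return sum
}

// rankedWithCursor estimates the same for cursor paging, assuming each
// request only has to collect about `rows` entries past the cursor mark.
func rankedWithCursor(total, rows int) int {
	sum := 0
	for start := 0; start < total; start += rows {
		sum += rows
	}
	return sum
}

func main() {
	total, rows := 1000000, 1000
	fmt.Println("start/rows:", rankedWithStart(total, rows))
	fmt.Println("cursor:    ", rankedWithCursor(total, rows))
}
```

Under these assumptions, the start/rows estimate grows quadratically with the number of pages, while the cursor estimate grows linearly with the number of documents.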

See also: Fetching A Large Number of Sorted Results: Cursors

As an alternative to increasing the "start" parameter to request subsequent pages of sorted results, Solr supports using a "Cursor" to scan through results. Cursors in Solr are a logical concept that doesn't involve caching any state information on the server. Instead the sort values of the last document returned to the client are used to compute a "mark" representing a logical point in the ordered space of sort values.

Requirements

SOLR 4.7 or higher; the cursor mechanism was introduced in SOLR 4.7 (2014-02-25). See also: efficient deep paging with cursors.

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Go Report Card: https://goreportcard.com/report/github.com/ubleipzig/solrdump

This project has been developed for Project finc at Leipzig University Library.

Installation

Via Debian or RPM package.

Or via the go tool:

$ go get github.com/ubleipzig/solrdump/...

Usage

$ solrdump -h
Usage of solrdump:
  -fl string
        field or fields to export, separate multiple values by comma
  -q string
        SOLR query (default "*:*")
  -rows int
        number of rows returned per request (default 1000)
  -server string
        SOLR server, host, port and collection (default "http://localhost:8983/solr/example")
  -sort string
        sort order (only unique fields allowed) (default "id asc")
  -verbose
        show progress
  -version
        show version and exit
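
For readers who want to build a similar tool, the flag set above could be declared with Go's standard `flag` package roughly as follows. This is only a sketch that copies the names and defaults from the help output. The `options` struct and `parseArgs` function are hypothetical, and this is not the project's actual source.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options mirrors the flags listed in the help output (hypothetical type).
type options struct {
	fl, q, server, sort string
	rows                int
	verbose, version    bool
}

// parseArgs parses command line arguments into options, using the
// defaults shown in the help output.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("solrdump", flag.ContinueOnError)
	fs.StringVar(&o.fl, "fl", "", "field or fields to export, comma separated")
	fs.StringVar(&o.q, "q", "*:*", "SOLR query")
	fs.IntVar(&o.rows, "rows", 1000, "number of rows returned per request")
	fs.StringVar(&o.server, "server", "http://localhost:8983/solr/example", "SOLR server")
	fs.StringVar(&o.sort, "sort", "id asc", "sort order (only unique fields allowed)")
	fs.BoolVar(&o.verbose, "verbose", false, "show progress")
	fs.BoolVar(&o.version, "version", false, "show version and exit")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```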

Export id and title field for all documents:

$ solrdump -server https://localhost:8983/solr/biblio -q '*:*' -fl id,title
{"id":"0000001864","title":"Veröffentlichungen des Museums für Völkerkunde zu Leipzig"}
{"id":"0000002001","title":"Festschrift zur Feier des 500jährigen Bestehens der ... /"}
...

Export documents matching a query and postprocess with jq:

$ solrdump -server https://localhost:8983/solr/biblio -q 'title:"topic model"' -fl id,title | \
  jq -r .title | \
  head -10

A generic approach to topic models and its application to virtual communities /
Topic models for image retrieval on large scale databases
On the use of language models and topic models in the web new algorithms for filtering, ...
Integration von Topic Models und Netzwerkanalyse bei der Bestimmung des Kundenwertes
Time dynamic topic models /
...

Instant search as one-liner

Using solrdump + jq + fzf (or peco).

$ solrdump -server http://solr.io/solr/biblio -q 'title:"leipzig"' -fl 'id,source_id,title' | \
    jq -rc '[.source_id, .title[:80]] | @tsv' | fzf -e

...
