miku / Solrbulk

Licence: gpl-3.0

SOLR bulk indexing utility for the command line.

Labels: solr, indexing

solrbulk

Motivation:

"Sometimes you need to index a bunch of documents really, really fast. Even with Solr 4.0 and soft commits, if you send one document at a time you will be limited by the network. The solution is two-fold: batching and multi-threading." (Source: http://lucidworks.com/blog/high-throughput-indexing-in-solr/)

solrbulk expects as input a file with line-delimited JSON, where each line represents a single document. It takes care of reformatting the documents into the bulk JSON format that Solr understands.
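
To illustrate the reformatting step, here is a minimal sketch, not solrbulk's actual code. It assumes the bulk format is a plain JSON array of documents, which Solr's JSON update handler accepts. The function name toBulkJSON is hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// toBulkJSON (hypothetical helper) joins line-delimited JSON documents into
// one JSON array. Blank lines are skipped and invalid JSON is rejected.
func toBulkJSON(lines []string) (string, error) {
	var buf bytes.Buffer
	buf.WriteByte('[')
	n := 0 // number of documents written so far
	for _, raw := range lines {
		line := strings.TrimSpace(raw)
		if line == "" {
			continue
		}
		if !json.Valid([]byte(line)) {
			return "", fmt.Errorf("document %d is not valid JSON", n+1)
		}
		if n > 0 {
			buf.WriteByte(',')
		}
		buf.WriteString(line)
		n++
	}
	buf.WriteByte(']')
	return buf.String(), nil
}

func main() {
	lines := []string{
		`{"id": "1", "state": "Alaska"}`,
		`{"id": "2", "state": "California"}`,
	}
	out, err := toBulkJSON(lines)
	if err != nil {
		panic(err)
	}
	// Prints: [{"id": "1", "state": "Alaska"},{"id": "2", "state": "California"}]
	fmt.Println(out)
}
```

The documents are copied through unchanged rather than decoded and re-encoded, so field order and formatting are preserved.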

solrbulk will send documents in batches and in parallel. The number of documents per batch can be set via -size, the number of workers with -w.
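
The combination of batching and parallel workers can be sketched as follows. This is an illustration under assumptions, not the tool's implementation: makeBatches, runWorkers and the send callback are hypothetical names, and the callback stands in for the HTTP request to Solr.

```go
package main

import (
	"fmt"
	"sync"
)

// makeBatches splits docs into consecutive slices of at most size elements
// (the role played by -size).
func makeBatches(docs []string, size int) [][]string {
	if size < 1 {
		size = 1
	}
	var batches [][]string
	for start := 0; start < len(docs); start += size {
		end := start + size
		if end > len(docs) {
			end = len(docs)
		}
		batches = append(batches, docs[start:end])
	}
	return batches
}

// runWorkers starts `workers` goroutines (the role played by -w), feeds them
// batches over a channel and calls send for each batch. It returns the number
// of documents for which send succeeded.
func runWorkers(batches [][]string, workers int, send func([]string) error) int {
	if workers < 1 {
		workers = 1
	}
	queue := make(chan []string)
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		sent int
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for batch := range queue {
				if err := send(batch); err != nil {
					continue // a real tool would report or retry here
				}
				mu.Lock()
				sent += len(batch)
				mu.Unlock()
			}
		}()
	}
	for _, b := range batches {
		queue <- b
	}
	close(queue)
	wg.Wait()
	return sent
}

func main() {
	docs := []string{`{"id":"1"}`, `{"id":"2"}`, `{"id":"3"}`, `{"id":"4"}`, `{"id":"5"}`}
	batches := makeBatches(docs, 2)
	sent := runWorkers(batches, 4, func(batch []string) error {
		return nil // placeholder: a real sender would POST the batch to Solr
	})
	// Prints: 3 batches, 5 documents sent
	fmt.Printf("%d batches, %d documents sent\n", len(batches), sent)
}
```

Batching reduces the number of network round trips, while the worker pool keeps several requests in flight at once.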

This project has been developed for project finc at Leipzig University Library.

Installation

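Installation via Go tools. With Go 1.18 or later, go get no longer installs binaries, so use go install with a version suffix (for example, go install github.com/miku/solrbulk/cmd/...@latest) in place of the command below.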

$ go get github.com/miku/solrbulk/cmd/...

There are also DEB, RPM and arch packages available at https://github.com/miku/solrbulk/releases/.

Usage

Flags.

$ solrbulk
Usage of solrbulk:
  -commit int
        commit after this many docs (default 1000000)
  -cpuprofile string
        write cpu profile to file
  -memprofile string
        write heap profile to file
  -no-final-commit
        omit final commit (at end of file or stdin)
  -optimize
        optimize index
  -purge
        remove documents from index before indexing (use purge-query to selectively clean)
  -purge-pause duration
        insert a short pause after purge (default 2s)
  -purge-query string
        query to use, when purging (default "*:*")
  -server string
        url to SOLR server, including host, port and path to collection
  -size int
        bulk batch size (default 1000)
  -update-request-handler-name string
        where solr.UpdateRequestHandler is mounted on the server,
        https://is.gd/s0eirv (default "/update")
  -v    prints current program version
  -verbose
        output basic progress
  -w int
        number of workers to use (default 4)
  -z    unzip gz'd file on the fly

Example

Given a newline-delimited JSON file:

$ cat file.ldj
{"id": "1", "state": "Alaska"}
{"id": "2", "state": "California"}
{"id": "3", "state": "Oregon"}
...

$ solrbulk -verbose -server https://192.168.1.222:8085/collection1 file.ldj

The -server parameter contains the host, port and path up to, but excluding, the update route (/update by default; since 0.3.4, this can be changed with the -update-request-handler-name flag).

For example, if you usually update via https://192.168.1.222:8085/solr/biblio/update, the server parameter would be:

$ solrbulk -server https://192.168.1.222:8085/solr/biblio file.ldj
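
The relationship between the two flags can be sketched as a simple join of the -server value and the handler name. The helper updateURL is a hypothetical name used for illustration. How solrbulk builds the URL internally is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// updateURL (hypothetical helper) joins the -server value and the update
// handler name so that exactly one slash separates them.
func updateURL(server, handler string) string {
	return strings.TrimRight(server, "/") + "/" + strings.TrimLeft(handler, "/")
}

func main() {
	// Prints: https://192.168.1.222:8085/solr/biblio/update
	fmt.Println(updateURL("https://192.168.1.222:8085/solr/biblio", "/update"))
}
```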

Some performance observations

Having as many workers as cores is generally a good idea. However, the returns seem to diminish quickly as more cores are added.
Disable autoCommit, autoSoftCommit and the transaction log in solrconfig.xml.
Use a high number for -commit. solrbulk will issue a final commit request at the end of processing anyway.
For some use cases, the bulk indexing approach is about twice as fast as a standard request to /solr/update.
On machines with more cores, try increasing maxIndexingThreads.
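
To see why a high -commit value helps, the sketch below counts commit requests for a run. It assumes a simple policy: one commit every N documents, plus the final commit unless -no-final-commit is set. The function commitCount is hypothetical and the tool's exact behavior may differ.

```go
package main

import "fmt"

// commitCount (hypothetical) returns how many commit requests a run would
// issue under the assumed policy: one commit each time `every` documents have
// been indexed, plus one final commit unless noFinal is set.
func commitCount(total, every int, noFinal bool) int {
	commits := 0
	if every > 0 {
		commits = total / every
	}
	if !noFinal {
		commits++
	}
	return commits
}

func main() {
	const total = 5000000 // example corpus size, chosen for illustration
	for _, every := range []int{1000, 100000, 1000000} {
		fmt.Printf("-commit %d: %d commits\n", every, commitCount(total, every, false))
	}
}
```

Under these assumptions, indexing five million documents issues 5001 commits with -commit 1000 but only 6 with the default of 1000000.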

Elasticsearch?

Try esbulk.
