

License: Apache-2.0
Libris XL

Programming Languages

Groovy, Java, Python, Shell

Labels

code4lib
Projects that are alternatives of or similar to librisxl

solrdump
Export SOLR documents efficiently with cursors.
Stars: ✭ 33 (-34%)
Mutual labels:  code4lib
siskin
Tasks around metadata.
Stars: ✭ 20 (-60%)
Mutual labels:  code4lib
metadata-qa-marc
QA catalogue – a metadata quality assessment tool for library catalogue records (MARC, PICA)
Stars: ✭ 59 (+18%)
Mutual labels:  code4lib
annif
ANNotation Infrastructure using Finna: an automatic subject indexing tool using Finna as corpus
Stars: ✭ 14 (-72%)
Mutual labels:  code4lib
iromlab
Loader software for automated imaging of optical media with Nimbie disc robot
Stars: ✭ 26 (-48%)
Mutual labels:  code4lib
europeana-portal-collections
Europeana Collections portal as a Rails + Blacklight application.
Stars: ✭ 18 (-64%)
Mutual labels:  code4lib
roadoi
Use Unpaywall with R
Stars: ✭ 60 (+20%)
Mutual labels:  code4lib
videlibri
📚 Cross-platform library client to automate any OPAC and library catalog from your local device, e.g. for renewing of borrowed books or searching for books available in the library in automated scripts.
Stars: ✭ 18 (-64%)
Mutual labels:  code4lib
kitodo-production
Kitodo.Production
Stars: ✭ 52 (+4%)
Mutual labels:  code4lib
openrefine-docker
OpenRefine is a free, open source power tool for working with messy data and improving it. This repository contains Dockerbuild files for automated builds.
Stars: ✭ 19 (-62%)
Mutual labels:  code4lib
brunnhilde
Siegfried-based characterization tool for directories and disk images
Stars: ✭ 55 (+10%)
Mutual labels:  code4lib
openrefine-client
The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the command line interface (CLI) and is distributed as a convenient one-file-executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.
Stars: ✭ 67 (+34%)
Mutual labels:  code4lib
Library-Search-Plugin-Public
The Library Search Plugin plugin allows users (students, researchers, etc.) to search your library's catalogue, Google Scholar, WorldCat, or PubMed, without having to navigate to the respective websites first! It also comes with a neat context menu that allows users to select text, right-click, and search!
Stars: ✭ 17 (-66%)
Mutual labels:  code4lib
scholia
Wikidata-based scholarly profiles
Stars: ✭ 166 (+232%)
Mutual labels:  code4lib
kitodo-presentation
Kitodo.Presentation is a feature-rich framework for building a METS- or IIIF-based digital library. It is part of the Kitodo Digital Library Suite.
Stars: ✭ 33 (-34%)
Mutual labels:  code4lib
metis-framework
Metis, named after the Titaness of Wisdom, is our in-development data publication framework including both a client application and a number of data processing (micro)services
Stars: ✭ 15 (-70%)
Mutual labels:  code4lib
urnlib
Java library for representing, parsing and encoding URNs as in RFC2141 and RFC8141
Stars: ✭ 24 (-52%)
Mutual labels:  code4lib
CSharp MARC
C# class libraries and full featured editor for MARC Records
Stars: ✭ 51 (+2%)
Mutual labels:  code4lib
isolyzer
Verify size of ISO 9660 image against Volume Descriptor fields
Stars: ✭ 29 (-42%)
Mutual labels:  code4lib
openrefine-batch
Shell script to run OpenRefine in batch mode (import, transform, export). It orchestrates OpenRefine (server) and a python client that communicates with the OpenRefine API.
Stars: ✭ 76 (+52%)
Mutual labels:  code4lib

Libris XL



Parts

The project consists of:

  • Core

    • whelk-core/ The root component of XL. A shared library implementing a linked data store, including search and MARC conversion.
  • Applications

    • oaipmh/ A servlet web application. OAI-PMH service for Libris XL.
    • rest/ A servlet web application. Search, RESTful CRUD and other HTTP APIs.
    • marc_export/ A servlet (and CLI program) for exporting Libris data as MARC.
    • importers/ Java application to load or reindex data into the system.
    • apix_server/ A servlet web application. XL reimplementation of the Libris legacy APIX API.
  • Tools

    • whelktool/ CLI tool for running scripted mass updates of data.
    • librisxl-tools/ Configuration and scripts used for setup, maintenance and operations.

Related external repositories:

  • Core metadata to be loaded is managed in the definitions repository.

  • Also see LXLViewer, our application for viewing and editing the datasets through the REST API.

Dependencies

The instructions below assume an Ubuntu 20.04 system (Debian should be identical), but should work for e.g. Fedora/CentOS/RHEL with minor adjustments.

  1. Gradle

    No setup required. Just use the checked-in gradle wrapper to automatically get the specified version of Gradle and Groovy.

  2. Elasticsearch (version 7.x)

    Download Elasticsearch (for Ubuntu/Debian, select "Install with apt-get"). Before importing the Elasticsearch PGP key, you might have to run sudo apt install gnupg if you're running a minimal distribution.

    NOTE:

    • We use the elasticsearch-oss version.
    • The ICU Analysis plugin (analysis-icu) must be installed; see "Setting up Elasticsearch" below.
  3. PostgreSQL (version 14.2 or later)

    # Ubuntu/Debian
    sudo apt install postgresql postgresql-client
    # macOS
    brew install postgresql
    

    Windows: download and install PostgreSQL from https://www.postgresql.org/download/windows/

  4. Java (version 17)

    sudo apt install openjdk-17-jdk # or openjdk-17-headless
    
  5. Apache

    sudo apt install apache2
    

Setup

Cloning repositories

Make sure you check out this repository, and also definitions and devops:

git clone [email protected]:libris/librisxl.git
git clone [email protected]:libris/definitions.git
# devops repo is private; ask for access
git clone [email protected]:libris/devops.git

You should now have the following directory structure:

.
├── definitions
├── devops
├── librisxl

Setting up PostgreSQL

Ensure PostgreSQL is started. On Debian/Ubuntu, this happens automatically after apt install. Otherwise, try systemctl start postgresql on any modern Linux system.

Create a database and a database user, and set up permissions:

sudo -u postgres bash
createdb whelk_dev
psql -c "CREATE USER whelk PASSWORD 'whelk';"
# !! Replace yourusername with your actual username (i.e., the user you'll run whelk, fab, etc. as)
psql -c "CREATE USER yourusername;"
psql -c "GRANT ALL ON SCHEMA public TO whelk;" whelk_dev
psql -c "GRANT ALL ON ALL TABLES IN SCHEMA public TO whelk;" whelk_dev
# Now find out where the pg_hba.conf file is:
psql -t -P format=unaligned -c 'show hba_file;'
exit

Give all users access to your local database by editing pg_hba.conf. You got the path from the last psql command just above; it's probably something like /etc/postgresql/14/main/pg_hba.conf. Edit it and add the following above any uncommented line (PostgreSQL uses the first matching rule):

host    all             all        127.0.0.1/32            trust
host    all             all        ::1/128                 trust

Restart PostgreSQL for the changes to take effect:

sudo systemctl restart postgresql

Test connectivity (the version banner will reflect your installed version, which may differ from the example below):

psql -h localhost -U whelk whelk_dev
psql (12.5 (Ubuntu 12.5-0ubuntu0.20.04.1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

whelk_dev=> \q

Setting up Elasticsearch

Edit /etc/elasticsearch/elasticsearch.yml. Uncomment cluster.name and set it to something unique on the network. This name is later specified when you configure the XL system.

Next, install the ICU Analysis plugin:

sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install analysis-icu

Finally, (re)start Elasticsearch:

sudo systemctl restart elasticsearch

(To adjust the JVM heap size for Elasticsearch, edit /etc/elasticsearch/jvm.options and then restart Elasticsearch.)
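The relevant elasticsearch.yml change can be sketched like this (the cluster name is just an example; pick something unique on your network):

```yaml
# /etc/elasticsearch/elasticsearch.yml (excerpt)
# Example name -- use something unique on your network
cluster.name: xl-dev-yourname
```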

Configuring secrets

Use librisxl/secret.properties.in as a starting point:

cd librisxl
cp secret.properties.in secret.properties
# In secret.properties, set:
# - elasticCluster to whatever you set cluster.name to in the Elasticsearch configuration above.
vim secret.properties
# Make sure kblocalhost.kb.se points to 127.0.0.1
echo '127.0.0.1 kblocalhost.kb.se' | sudo tee -a /etc/hosts
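A minimal sketch of the change to make in secret.properties (elasticCluster is the setting named above; the value shown is an example and must match your cluster.name):

```properties
# secret.properties (excerpt)
# Must match cluster.name in /etc/elasticsearch/elasticsearch.yml
elasticCluster=xl-dev-yourname
```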

Importing test data

Run the Fabric tasks that set up a new Elasticsearch index and import example data:

cd ../devops
# Make sure you have Python 3 and curl
sudo apt install python3 python3-pip curl
# Create virtual Python 3 environment for fab
python3 -m venv venv
# Activate virtual environment
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Create Elasticsearch index
fab conf.xl_local app.whelk.create_es_index
# Import test data
fab conf.xl_local app.whelk.import_work_example_data

Running

To start the CRUD part of the whelk, run the following commands:

*NIX systems:

cd ../librisxl/rest
export JAVA_OPTS="-Dfile.encoding=utf-8"
../gradlew -Dxl.secret.properties=../secret.properties appRun

Windows:

cd $LIBRISXL/rest
setx JAVA_OPTS "-Dfile.encoding=utf-8"
../gradlew.bat -Dxl.secret.properties=../secret.properties appRun

The system is then available on http://localhost:8180. (The OAI-PMH service is started in a similar way: just cd into oaipmh instead of rest.)
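A quick way to check that the whelk answers is to query the /find search endpoint (the same endpoint the Apache proxy rules further down route to; the query itself is illustrative):

```shell
# Print only the HTTP status of a simple search request
curl -s -o /dev/null -w '%{http_code}\n' 'http://localhost:8180/find?q=*'
```

A 200 here means the REST app is up and talking to PostgreSQL and Elasticsearch.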

To run the frontend, first set up the Libris cataloging client and the id.kb.se web app (follow the README in each).

At this point, you should have the LXLViewer cataloging client running on port 8080 and the id.kb.se app running on port 3000, but they won't work yet. Next, edit /etc/apache2/sites-enabled/000-default.conf and add the following:

<VirtualHost *:5000>
    ServerName kblocalhost.kb.se
    ProxyRequests Off
    ProxyPreserveHost On

    RewriteEngine On

    <LocationMatch "^/([bcdfghjklmnpqrstvwxz0-9]{15,16})$">
        ProxyPreserveHost Off
        RewriteCond %{HTTP_ACCEPT} (text/html|application/xhtml|\*/\*|^$)
        RewriteCond %{REQUEST_METHOD} GET
        RewriteRule ([^/]+)$ http://id.kblocalhost.kb.se:5000/$1 [P]
    </LocationMatch>

    <Location /_nuxt>
        ProxyPreserveHost Off
        ProxyPass http://id.kblocalhost.kb.se:5000/_nuxt
    </Location>
    
    ProxyPass        /katalogisering     http://localhost:8080/katalogisering                   
    ProxyPassReverse /katalogisering     http://localhost:8080/katalogisering

    ProxyPassMatch ^/vocab/(data.*) http://localhost:8180/https://id.kb.se/vocab//$1
    ProxyPass /vocab http://localhost:8180/https://id.kb.se/vocab
    ProxyPass /context.jsonld http://localhost:8180/https://id.kb.se/vocab/context

    RewriteCond %{REQUEST_METHOD} ^(POST|PUT|DELETE|OPTIONS)$
    RewriteRule ^/data(.*)$ http://localhost:8180/$1 [P,L]

    ProxyPass / http://localhost:8180/

    AddOutputFilterByType DEFLATE text/css text/html text/plain text/xml
    AddOutputFilterByType DEFLATE application/x-javascript text/x-component application/javascript
    AddOutputFilterByType DEFLATE application/json application/ld+json
</VirtualHost>

<VirtualHost *:5000>
    ServerName id.kblocalhost.kb.se
    ProxyRequests Off
    ProxyPreserveHost On

    RewriteEngine On

    RewriteCond %{HTTP_ACCEPT} (text/html|application/xhtml|\*/\*) [OR]
    RewriteCond %{HTTP_ACCEPT} ^$
    RewriteCond %{HTTP_ACCEPT} !^(text/turtle|application/trig|application/ld\+json|application/rdf\+xml)($|.+/x?html;q=0.*)
    RewriteCond %{REQUEST_URI} !\.(json|jsonld)$
    RewriteCond %{REQUEST_URI} !data\..+$
    RewriteCond %{REQUEST_URI} !/maintenance.html
    RewriteCond %{REQUEST_URI} !/robots.txt
    RewriteRule ^/(.*)$ http://localhost:3000/$1 [P,L]
    
    ProxyPass /_nuxt http://localhost:3000/_nuxt
    ProxyPass /_loading http://localhost:3000/_loading
    ProxyPass /__webpack_hmr http://localhost:3000/__webpack_hmr

    # NOTE: The double slash is needed because of an "ambitious" sameAs on the vocab resource.
    ProxyPassMatch ^/vocab/(data.*) http://localhost:8180/https://id.kb.se/vocab//$1
    ProxyPass /vocab http://localhost:8180/https://id.kb.se/vocab
    ProxyPass /vocab/display/data.jsonld http://localhost:8180/https://id.kb.se/vocab/display
    ProxyPass /context.jsonld http://localhost:8180/https://id.kb.se/vocab/context/data.jsonld

    ProxyPassMatch ^/(data.*)$ http://localhost:8180/$1
    ProxyPassMatch ^/find(.*) http://localhost:8180/find$1

    ProxyPassMatch ^/(http.*)$ http://localhost:8180/$1 nocanon
    ProxyPassMatch ^/([bcdfghjklmnpqrstvwxz0-9]{15,16}) http://localhost:8180/$1
    ProxyPassMatch ^/library/(.*) http://localhost:8180/https://libris.kb.se/library/$1 nocanon
    ProxyPassMatch ^/(.*) http://localhost:8180/https://id.kb.se/$1 nocanon

    AddOutputFilterByType DEFLATE text/css text/html text/plain text/xml
    AddOutputFilterByType DEFLATE application/x-javascript text/x-component application/javascript
    AddOutputFilterByType DEFLATE application/json application/ld+json
</VirtualHost>
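The `[bcdfghjklmnpqrstvwxz0-9]{15,16}` pattern in both virtual hosts matches XL record identifiers: 15-16 characters drawn from digits and lowercase consonants. A quick sanity check of the pattern, with a made-up identifier:

```shell
# XL record IDs: 15-16 chars, lowercase consonants and digits only
pattern='^[bcdfghjklmnpqrstvwxz0-9]{15,16}$'
printf '%s\n' 'wf7mw1h74fkt8rl' | grep -Eq "$pattern" && echo 'matches'
printf '%s\n' 'not-a-record-id' | grep -Eq "$pattern" || echo 'does not match'
```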

Edit /etc/apache2/ports.conf and add the following line:

Listen 5000

Add these lines to /etc/hosts:

127.0.0.1 kblocalhost.kb.se
127.0.0.1 id.kblocalhost.kb.se

Make sure some necessary Apache modules are enabled:

sudo a2enmod rewrite proxy proxy_http

Now (re)start Apache:

sudo systemctl restart apache2

You should now be able to visit http://id.kblocalhost.kb.se:5000, and use the cataloging client on http://kblocalhost.kb.se:5000/katalogisering/. The XL API itself is available on http://kblocalhost.kb.se:5000 (proxied via Apache), or directly on http://localhost:8180.

Maintenance

Everything you would want to do should be covered by the devops repo. This section is mostly kept as a reminder of alternate (less preferred) ways.

Development Workflow

If you need to work locally (e.g. in this or the "definitions" repo) and perform specific tests, you can use this workflow:

  1. Create and push a branch for your work.
  2. Set the branch in the conf.xl_local config in the devops repo.
  3. Use the regular tasks to e.g. reload data.

New Elasticsearch config

If a new index is to be set up, and unless you are running locally in a pristine setup or using the recommended devops method for loading data, you need to PUT the config to the index, like so:

$ curl -XPUT http://localhost:9200/indexname_versionnumber \
    -H 'Content-Type: application/json' \
    -d @librisxl-tools/elasticsearch/libris_config.json

Create an alias for your index:

$ curl -XPOST http://localhost:9200/_aliases \
    -H 'Content-Type: application/json' \
    -d  '{"actions":[{"add":{"index":"indexname_versionnumber","alias":"indexname"}}]}'

(To replace an existing setup with an entirely new configuration, you need to delete the index (curl -XDELETE http://localhost:9200/<indexname>/) and load all data again, even locally.)

Format updates

If the MARC conversion process has been updated and needs to be run anew, the only option is to reload the data from production using the importers application.

Statistics

Produce a stats file (here for bib) by running:

$ cd importers && ../gradlew build
$ RECTYPE=bib
$ time java -Dxl.secret.properties=../secret.properties \
    -Dxl.mysql.properties=../mysql.properties \
    -jar build/libs/vcopyImporter.jar vcopyjsondump $RECTYPE \
  | grep '^{' \
  | pypy ../librisxl-tools/scripts/get_marc_usage_stats.py $RECTYPE /tmp/usage-stats-$RECTYPE.json