internetarchive / fatcat-scholar

Licence: other
search interface for scholarly works

Projects that are alternatives of or similar to fatcat-scholar

cds-videos
Access articles, reports and multimedia content in HEP
Stars: ✭ 15 (-76.19%)
Mutual labels:  digital-library
goobi-viewer-core
Goobi viewer - Presentation software for digital libraries, museums, archives and galleries. Open Source.
Stars: ✭ 18 (-71.43%)
Mutual labels:  digital-library
linkedresearch.org
🌐 linkedresearch.org
Stars: ✭ 32 (-49.21%)
Mutual labels:  scholarly-communication
SHARE
SHARE is building a free, open, data set about research and scholarly activities across their life cycle.
Stars: ✭ 93 (+47.62%)
Mutual labels:  scholarly-communication
kitodo-production
Kitodo.Production
Stars: ✭ 52 (-17.46%)
Mutual labels:  digital-library
goobi-workflow
Goobi workflow - Workflow management software for digitisation projects used in more than 70 cultural heritage institutions in at least 17 countries.
Stars: ✭ 43 (-31.75%)
Mutual labels:  digital-library
mycore
MyCoRe (acronym for My Content Repository) is an open source repository software framework for building disciplinary or institutional repositories, digital archives, digital libraries, and scientific journals.
Stars: ✭ 25 (-60.32%)
Mutual labels:  digital-library
kitodo-presentation
Kitodo.Presentation is a feature-rich framework for building a METS- or IIIF-based digital library. It is part of the Kitodo Digital Library Suite.
Stars: ✭ 33 (-47.62%)
Mutual labels:  digital-library

fatcat-scholar / Internet Archive Scholar

This is source code for scholar.archive.org, a full-text web search interface over the 25+ million open research papers in the Internet Archive.

All of the technical heavy lifting (harvesting, crawling, and metadata corrections) is handled by the fatcat service; this service is just a bare-bones, read-only search interface. Unlike the basic fatcat.wiki search, this index allows querying the full content of papers when available.

Overview

This repository is fairly small and contains:

  • fatcat_scholar/: Python code for web service and indexing pipeline
  • fatcat_scholar/templates/: HTML template for web interface
  • tests/: Python test files
  • proposals/: design documentation and change proposals
  • data/: empty directory for indexing pipeline

A data pipeline converts groups of one or more fatcat "release" entities (grouped under a single "work" entity) into a single search index document. Elasticsearch is used as the full-text search engine. A simple web interface parses search requests and formats Elasticsearch results with highlights and first-page thumbnails.
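The grouping step described above can be sketched in a few lines of Python. This is only an illustration, not the repository's actual pipeline code: the field names (release_id, work_id, title, fulltext) and both helper functions are assumptions chosen for the example, as is the merge rule of taking the first available full text.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_releases_by_work(
    releases: Iterable[Dict[str, Any]],
) -> Dict[str, List[Dict[str, Any]]]:
    """Collect release dicts under their shared work identifier."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for release in releases:
        groups[release["work_id"]].append(release)
    return dict(groups)


def build_search_document(work_id: str, releases: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge all releases of one work into a single index document.

    Hypothetical merge rule: the title comes from the first release, and
    the full text from the first release that has any.
    """
    fulltext = next((r["fulltext"] for r in releases if r.get("fulltext")), None)
    return {
        "work_id": work_id,
        "title": releases[0].get("title"),
        "release_ids": [r["release_id"] for r in releases],
        "fulltext": fulltext,
    }


# Made-up example data: two releases of one work, one release of another.
releases = [
    {"release_id": "r1", "work_id": "w1", "title": "A Paper (preprint)", "fulltext": None},
    {"release_id": "r2", "work_id": "w1", "title": "A Paper", "fulltext": "Body text ..."},
    {"release_id": "r3", "work_id": "w2", "title": "Another Paper", "fulltext": None},
]
documents = [
    build_search_document(work_id, group)
    for work_id, group in group_releases_by_work(releases).items()
]
```

Each element of documents corresponds to one work, so a paper with both a preprint and a published version appears once in search results rather than twice.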

The current Python web framework is FastAPI, though the number of routes is very small and it would be easy to switch to a more conventional framework like Flask.
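One reason a framework switch would be cheap is that the search logic does not need to depend on the web framework at all. The sketch below shows a framework-agnostic helper that turns a user query into an Elasticsearch request body with highlighting. The function name, the field names (title, fulltext), and the default page size are assumptions for illustration and are not taken from this repository.

```python
from typing import Any, Dict


def build_search_body(query: str, offset: int = 0, limit: int = 15) -> Dict[str, Any]:
    """Translate a user query into an Elasticsearch search request body."""
    # Treat an empty query as "match everything".
    query = query.strip() or "*"
    return {
        "query": {
            "query_string": {
                "query": query,
                "default_operator": "AND",
                # Hypothetical field names; the real index mapping may differ.
                "fields": ["title", "fulltext"],
            }
        },
        # Ask Elasticsearch to return highlighted snippets from the full text.
        "highlight": {"fields": {"fulltext": {}}},
        # Pagination: skip `offset` hits, return at most `limit`.
        "from": offset,
        "size": limit,
    }


body = build_search_body("  open access ", offset=30)
```

A route handler in FastAPI, Flask, or any other framework would only need to read the query parameters, call a helper like this, and pass the result to an Elasticsearch client.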

Getting Started for Developers

You need pipenv and Python 3.8 installed. Most tasks are run using a Makefile; make help will show all options.

Working on the indexing pipeline effectively requires internal access to the Internet Archive cluster and services, though some contributions and bugfixes are probably possible without staff access.

To install dependencies for the first time run:

make dep

then run the tests (to ensure everything is working):

make test

To start the web interface run:

make serve

While developing the web interface, you will almost certainly need an example database running locally. A docker-compose file in extra/docker/ can be used to run Elasticsearch 7.x locally. The make dev-index command will reset the local index with the correct schema mapping, and index any intermediate files in the ./data/ directory. We don't have an out-of-the-box solution for non-IA staff at this step (yet).
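For contributors without access to real intermediate files, a small hand-made data set can be loaded into a local Elasticsearch through its bulk API. The sketch below builds a bulk request body from JSON lines. It is not what make dev-index does internally: the one-document-per-line file format, the index name dev_scholar, and the key identifier field are all assumptions for illustration.

```python
import json
from typing import Iterable


def bulk_index_body(json_lines: Iterable[str], index: str, id_field: str = "key") -> str:
    """Build an Elasticsearch bulk request body from JSON-lines input.

    The bulk format alternates an action line and a document line, each
    terminated by a newline (including the last one).
    """
    out = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        # `id_field` is a hypothetical identifier field in each document.
        action = {"index": {"_index": index, "_id": doc[id_field]}}
        out.append(json.dumps(action))
        out.append(json.dumps(doc))
    return "".join(entry + "\n" for entry in out)


sample_lines = [
    '{"key": "work_w1", "title": "A Paper"}',
    "",
    '{"key": "work_w2", "title": "Another Paper"}',
]
bulk_body = bulk_index_body(sample_lines, index="dev_scholar")
```

The resulting string can be sent to the _bulk endpoint of the local Elasticsearch instance with the Content-Type header set to application/x-ndjson.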

After making changes to any user interface strings, the interface translation file (".pot") needs to be updated with make extract-i18n. When these changes are merged to master, the Weblate translation system will be updated automatically.

This repository uses black for code formatting; please run make fmt and make lint before submitting a pull request.

Contributing

Software, copy-editing, translation, and other contributions to this repository are welcome! For content and metadata corrections, or identifying new content to include, the best place to start is in the fatcat repository. Learn more in the fatcat guide. You can chat and ask questions on gitter.im/internetarchive/fatcat.

Contributors in this project are asked to abide by our Code of Conduct.

The web interface is translated using the Weblate platform, at internetarchive/fatcat-scholar.

The software license for this repository is Affero General Public License v3+ (AGPL 3+), as described in the LICENSE.md file. We ask that you acknowledge the license terms when making your first contribution.

For software developers, the "help wanted" tag in GitHub Issues is a way to discover bugs and tasks that external contributors could take on.
