PDFTron / pdftron-document-search

Licence: other
Build search across multiple documents client-side in your file storage

PDFTron Document Search

PDFTron Document Search demonstrates building an application where users can search across multiple documents, using WebViewer to view them, Algolia to search them, and Firebase to store them.

Watch a quick video that walks you through the app. I also put together a blog post to help you get started.

This repo is designed to help you get started creating your own document search workflow.

Install

npm install

Algolia Configuration

This application uses Algolia to search documents. Algolia is not the only third-party search provider, however; consider alternatives such as Elasticsearch.

To get started with this sample, please register a new app with Algolia.

Create a new index called document_search.

After you have configured your app, create a .env file in the root of the project and add the following:

REACT_APP_ALGOLIA_APP_ID=your_key_goes_here
REACT_APP_ALGOLIA_API_KEY=your_key_goes_here
REACT_APP_ALGOLIA_SEARCH_KEY=your_key_goes_here
REACT_APP_ALGOLIA_INDEX_NAME=document_search

The above information can be found under API Keys in your Algolia Dashboard.
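A missing or misnamed key tends to show up only later as a failed request, so it can help to check the variables once at startup. The sketch below is one way to do that, not code from this repo: `readAlgoliaConfig` and the field names it returns are hypothetical, and it assumes the build exposes the `REACT_APP_*` variables on `process.env`.

```javascript
// Hypothetical helper (not part of this repo): collects the Algolia settings
// from the environment and fails early if any of them is missing.
const ALGOLIA_VARS = {
  appId: "REACT_APP_ALGOLIA_APP_ID",
  apiKey: "REACT_APP_ALGOLIA_API_KEY",
  searchKey: "REACT_APP_ALGOLIA_SEARCH_KEY",
  indexName: "REACT_APP_ALGOLIA_INDEX_NAME",
};

function readAlgoliaConfig(env = process.env) {
  const config = {};
  const missing = [];
  for (const [field, variable] of Object.entries(ALGOLIA_VARS)) {
    // Treat undefined and whitespace-only values as missing.
    const value = (env[variable] || "").trim();
    if (value === "") {
      missing.push(variable);
    } else {
      config[field] = value;
    }
  }
  if (missing.length > 0) {
    throw new Error(`Missing Algolia settings in .env: ${missing.join(", ")}`);
  }
  return config;
}
```

The returned object can then be handed to whichever Algolia client the app uses.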

Firebase Configuration

This application uses Firebase to store documents. You can use any other backend of your choice. However, to get started with this sample, please register a new app with Firebase.

Make sure you create a storage bucket and enable email and Google authentication.

After you have registered an app, add the following to the .env file in the root of the project (create the file if it does not exist yet):

REACT_APP_API_KEY=your_key_goes_here
REACT_APP_MESSAGING_SENDER_ID=your_key_goes_here
REACT_APP_APP_ID=your_key_goes_here
REACT_APP_AUTH_DOMAIN=your_domain_goes_here
REACT_APP_DATABASE_URL=your_database_url_goes_here
REACT_APP_PROJECT_ID=your_project_id
REACT_APP_STORAGE_BUCKET=your_storage_bucket

The above information can be found under the settings of your Firebase app.
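To show how these variables line up with the fields of a Firebase web config object, here is a minimal sketch. The function names `buildFirebaseConfig` and `missingFirebaseFields` are hypothetical and not taken from this repo; the field names (`apiKey`, `authDomain`, and so on) are the ones the Firebase console shows in its config snippet.

```javascript
// Sketch: map the .env variables onto a Firebase web config object.
// buildFirebaseConfig is a hypothetical name, not this repo's code.
function buildFirebaseConfig(env = process.env) {
  return {
    apiKey: env.REACT_APP_API_KEY,
    authDomain: env.REACT_APP_AUTH_DOMAIN,
    databaseURL: env.REACT_APP_DATABASE_URL,
    projectId: env.REACT_APP_PROJECT_ID,
    storageBucket: env.REACT_APP_STORAGE_BUCKET,
    messagingSenderId: env.REACT_APP_MESSAGING_SENDER_ID,
    appId: env.REACT_APP_APP_ID,
  };
}

// Returns the names of config fields that are still unset or empty.
function missingFirebaseFields(config) {
  return Object.keys(config).filter((key) => !config[key]);
}
```

An object of this shape is what Firebase's `initializeApp` expects; checking `missingFirebaseFields` first gives a clearer message than a failed sign-in.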

Change Firestore Database rules to:

rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /{document=**} {
      allow read, write: if request.auth != null;
    }
  }
}

Change Storage rules to:

rules_version = '2';
service firebase.storage {
  match /b/{bucket}/o {
    match /{allPaths=**} {
      allow read, write: if request.auth != null;
    }
  }
}

Now you can run the application and start uploading your documents.

CORS

You will need to set up CORS on your Firebase Storage bucket so that WebViewer can access the files stored in it. I created a CORS file called cors.json:

[
  {
    "origin": ["*"],
    "method": ["GET"],
    "maxAgeSeconds": 3600
  }
]

I then applied it to the bucket with gsutil (for example, gsutil cors set cors.json gs://your_storage_bucket), as described at https://cloud.google.com/storage/docs/configuring-cors
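The wildcard origin lets any site read from the bucket. If you would rather limit reads to your own app, a small script can generate the file. This is a sketch under assumptions: `buildCorsConfig` is a hypothetical helper, and `http://localhost:3000` is only a guess at a local development origin, so substitute your own.

```javascript
// Hypothetical helper: builds a cors.json payload limited to known origins.
function buildCorsConfig(origins, maxAgeSeconds = 3600) {
  if (!Array.isArray(origins) || origins.length === 0) {
    throw new Error("Provide at least one origin, e.g. http://localhost:3000");
  }
  // Same shape as the cors.json in this README, with the origin list swapped in.
  return [{ origin: origins, method: ["GET"], maxAgeSeconds }];
}

// Print the JSON so it can be redirected into cors.json.
// The origin here is an assumed placeholder.
console.log(JSON.stringify(buildCorsConfig(["http://localhost:3000"]), null, 2));
```

Redirect the output into cors.json and apply it with gsutil as before.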

Run

npm start

Project structure

src/
  app/             - Redux Store Configuration
  components/      - React components
    Navigate/            - Component responsible for navigating between different screens
    PasswordReset/       - Reset password
    Profile/             - Profile information and a sign out button
    Search/              - Search previously uploaded documents
    SignIn/              - Sign in
    SignUp/              - Sign up
    Upload/              - Upload a document, which is indexed for searching and saved to file storage
    View/                - View document with the search result highlighted
  App              - Configuration for navigation, authentication
  index            - Entry point and configuration for React-Redux
  firebase/        - Firebase configuration for authentication, updating documents, storing PDFs
  tools/           - Helper function to copy over PDFTron dependencies into /public on post-install
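
The Upload and View components imply that each search hit has to lead back to a place in a document. One common way to get that is to index one record per page. The sketch below illustrates the idea only and is not how this repo necessarily does it: `buildPageRecords` and the `docId`, `title`, `page` and `text` fields are hypothetical, while `objectID` is the attribute Algolia uses to identify a record.

```javascript
// Hypothetical helper: turns extracted page text into one search record per page.
// `pages` is assumed to be an array of strings, one per page, in order.
function buildPageRecords(docId, title, pages) {
  return pages
    .map((text, i) => ({
      objectID: `${docId}-p${i + 1}`, // unique per document page
      docId,
      title,
      page: i + 1, // 1-based page number, kept even when blank pages are dropped
      text: text.trim(),
    }))
    .filter((record) => record.text !== ""); // skip pages with no text
}
```

Keeping the page number on each record lets a viewer jump to the matching page when a result is opened.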

API documentation

See API documentation.

Contributing

See contributing.

License

See license.