WeebSearch / worker

License: GPL-3.0
⚒ Web crawler that analyzes and dissects subtitles into database entries

Programming Languages

typescript
shell
javascript

Projects that are alternatives of or similar to worker

anitomy-js
Native Node.js wrapper for Anitomy
Stars: ✭ 21 (+31.25%)
Mutual labels:  anime
aniyomi
Unofficial fork of Tachiyomi for anime
Stars: ✭ 1,814 (+11237.5%)
Mutual labels:  anime
octopus
CD/DVD/HD/SD Indexer. Creates indexes of your floppies, CD/DVD disks, hard/external/network disks, pendrives and other removables media.
Stars: ✭ 19 (+18.75%)
Mutual labels:  indexing
NovelLibrary
One stop for reading all novels
Stars: ✭ 93 (+481.25%)
Mutual labels:  anime
Anime4K-rs
An attempt to write Anime4K in Rust
Stars: ✭ 104 (+550%)
Mutual labels:  anime
Shokofin
Repository for Shokofin, a plugin that brings Shoko to Jellyfin.
Stars: ✭ 44 (+175%)
Mutual labels:  anime
fcgi-function
A cross-platform module for writing C/C++ services for nginx.
Stars: ✭ 33 (+106.25%)
Mutual labels:  indexing
JikanKt
A Kotlin wrapper for Jikan REST API
Stars: ✭ 17 (+6.25%)
Mutual labels:  anime
animepahe-dl
⬇️ animepahe anime downloader
Stars: ✭ 66 (+312.5%)
Mutual labels:  anime
search it
Comprehensive full-text search for REDAXO 5 CMS. Searches articles, media, files, PDF contents, and database entries.
Stars: ✭ 60 (+275%)
Mutual labels:  indexing
jojo-cards
Card game based on Jojo's Bizarre Adventure (ジョジョの奇妙な冒険)
Stars: ✭ 112 (+600%)
Mutual labels:  anime
anime-cli
A CLI for streaming, downloading anime shows. The shows data is indexed through GogoAnime.
Stars: ✭ 31 (+93.75%)
Mutual labels:  anime
shallty
Let me suck your fucking trash fansub!
Stars: ✭ 30 (+87.5%)
Mutual labels:  anime
shortcut
Rust crate providing an indexed, queryable column-based storage system
Stars: ✭ 28 (+75%)
Mutual labels:  indexing
vueman.ga
Delightful reading and tracking of your mangas.
Stars: ✭ 32 (+100%)
Mutual labels:  anime
flitch
🍂 Android Anime Streaming App.
Stars: ✭ 80 (+400%)
Mutual labels:  anime
doki-theme-web
Cute anime character themes for your Chrome, Edge, & Brave browser.
Stars: ✭ 97 (+506.25%)
Mutual labels:  anime
filename-simplifier
☄ Simplify your library
Stars: ✭ 14 (-12.5%)
Mutual labels:  anime
Manime
🍱 An anime app, based on single activity and MVVM architecture.
Stars: ✭ 24 (+50%)
Mutual labels:  anime
sfsdb
Simple yet extensible database you already know how to use
Stars: ✭ 36 (+125%)
Mutual labels:  indexing

An open source database of anime episode and character transcripts.


Why?

Anime is great, and while there's a lot of information out there on anime metadata on great sites like Anilist, there's no way to know what your favorite characters have said without going through all the episodes yourself. What exactly did Aoba say in S1 E1 of New Game!? How often did Louise speak in the first season of Familiar of Zero compared to the last? ¯\_(ツ)_/¯

These are interesting things to be able to answer. Why do I want to answer them? Stop asking so many questions.

How does (will) it work?

  • Crawlers fetch subtitles from websites

  • Subs that don't match one of the handful of known and consistent formats are filtered out

  • Some subtitles include information on speakers; that is parsed as well

  • Anime, episode and character information is looked up on MAL and Anilist

  • Data is given structure and saved in Postgres

  • Solr is updated with new information as it gets added to Postgres

  • GraphQL is used as an API to interface with Solr

  • Each query is checked against a Redis cache, and new results are cached there
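
As a rough illustration of the parsing steps above, here is a minimal sketch that assumes one of the "known" formats is ASS/SSA, where each `Dialogue:` event carries a `Name` field that may hold the speaker. The `DialogueLine` shape and the function names are hypothetical and not taken from the project's code; lines that do not look like dialogue events are simply dropped.

```typescript
// Hypothetical shape for one parsed line; not the project's actual schema.
interface DialogueLine {
  readonly start: string;
  readonly end: string;
  readonly speaker: string | null;
  readonly text: string;
}

// An ASS "Dialogue:" event has ten comma-separated fields:
// Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text.
// Only the last field (Text) may itself contain commas.
const FIELD_COUNT = 10;
const PREFIX = "Dialogue:";

const parseDialogue = (raw: string): DialogueLine | null => {
  if (!raw.startsWith(PREFIX)) {
    return null;
  }
  const parts = raw.slice(PREFIX.length).trim().split(",");
  if (parts.length < FIELD_COUNT) {
    return null;
  }
  const speaker = (parts[4] ?? "").trim();
  const text = parts
    .slice(FIELD_COUNT - 1)
    .join(",") // re-join commas that belonged to the text
    .replace(/\{[^}]*\}/g, "") // drop override tags such as {\i1}
    .replace(/\\N/g, " ") // ASS hard line break
    .trim();
  return {
    start: (parts[1] ?? "").trim(),
    end: (parts[2] ?? "").trim(),
    speaker: speaker.length > 0 ? speaker : null,
    text,
  };
};

// Keep only the lines that parse as dialogue events.
const parseSubtitle = (content: string): ReadonlyArray<DialogueLine> =>
  content
    .split(/\r?\n/)
    .map(parseDialogue)
    .filter((line): line is DialogueLine => line !== null);
```

Events without a `Name` value come back with `speaker: null`, which is where a later lookup or manual correction would have to fill in the character.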

Todo and Planned Features

Workers (TypeScript)

  • Support multiple sub groups

  • Support multiple file types (rar, zip, 7z, tar.gz)

  • Support Japanese subtitles

  • Add more sub websites to crawl
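
For the planned archive support, one possible first step is deciding which kind of archive a downloaded file is. The sketch below only classifies by file name; `ArchiveKind` and `detectArchiveKind` are hypothetical names, and actual extraction would need a separate library that is not shown here.

```typescript
// The archive types named in the todo list above.
type ArchiveKind = "rar" | "zip" | "7z" | "tar.gz";

// Longest suffix first, so ".tar.gz" is tested before any shorter suffix.
const ARCHIVE_SUFFIXES: ReadonlyArray<readonly [string, ArchiveKind]> = [
  [".tar.gz", "tar.gz"],
  [".rar", "rar"],
  [".zip", "zip"],
  [".7z", "7z"],
];

// Returns the archive kind for a file name, or null for anything else
// (for example a bare .ass or .srt file that needs no unpacking).
const detectArchiveKind = (fileName: string): ArchiveKind | null => {
  const lower = fileName.toLowerCase();
  const match = ARCHIVE_SUFFIXES.find(([suffix]) => lower.endsWith(suffix));
  return match ? match[1] : null;
};
```

Relying on the extension alone is an assumption that the crawled sites name their files honestly; checking the file's leading bytes would be more robust.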

Backend (TypeScript)

  • Integrate Hifumi's API or start the API from scratch with Prisma

  • User authentication (JWT or sessions?)

  • Internal GraphQL to expose ORM features to the workers

  • Solr integration for indexing dialogues

  • Redis integration for caching user queries
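
The query caching item could follow a cache-aside pattern: look the query up in the cache, and only run the search on a miss. The sketch below is an assumption about how that might look, not the project's implementation. `QueryCache`, `createMemoryCache` and `cachedSearch` are hypothetical names, and an in-memory `Map` stands in for Redis so the example runs on its own; a real Redis client would sit behind the same two-method interface.

```typescript
// Minimal cache contract; a Redis-backed version would implement the same methods.
interface QueryCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

// In-memory stand-in so the sketch runs without a Redis server.
const createMemoryCache = (): QueryCache => {
  const store = new Map<string, string>();
  return {
    get: async key => store.get(key) ?? null,
    set: async (key, value) => {
      store.set(key, value);
    },
  };
};

// Cache-aside: return the cached result if present, otherwise run the
// search, store its JSON-serialized result, and return it.
const cachedSearch = async <T>(
  cache: QueryCache,
  query: string,
  search: (query: string) => Promise<T>,
): Promise<T> => {
  // Normalize so that " Aoba " and "aoba" share one cache entry.
  const key = `search:${query.trim().toLowerCase()}`;
  const hit = await cache.get(key);
  if (hit !== null) {
    return JSON.parse(hit) as T;
  }
  const result = await search(query);
  await cache.set(key, JSON.stringify(result));
  return result;
};
```

The sketch never expires entries; with new transcripts arriving continuously, a real cache would probably want a time-to-live on each key.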

Frontend (Angular) Frontend Repo

  • Start a website with Angular

  • Create a web-based transcript editor to fix parsing mistakes or add new information

    • Available to users designated as data mods

    • Supports:

      • Marking lines with the correct speakers [color coded]

      • Editing existing character information

      • Editing episode and character metadata

      • Deleting unnecessary dialogues and characters (which there are a lot of)

      • Merging animes, dialogues, characters and more

Getting Started

Manual
  1. Copy .env.example to .env
  2. Run npm install
  3. Install Postgres
  4. Install Redis
  5. Run prisma deploy
Docker
  1. Copy .env.example to .env
  2. Download Docker
  3. Run docker-compose up -d
  4. Run prisma deploy

Tools

  • npm run subs starts the sub crawler

  • npm start starts the API to serve data

  • npm run lint checks the code for tslint violations

  • npm test runs jest tests against the spec.ts files

    • Remember to include tests for new changes

Contributing

Yes, I know the TSLint rules are very restrictive if you're not used to functional style. But you can do it, I believe in you, you don't need to use silly for loops when you have map, reduce and recursion.

I do expect the linter to pass for commits to get merged so you might want to keep an eye out for that.
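
To give a feel for the loop-free style, here is a small sketch that answers a "how often did Louise speak" kind of question with reduce and recursion. The `SpokenLine` type and both function names are made up for illustration, and whether this exact code passes the repository's TSLint configuration is not something verified here.

```typescript
// Hypothetical minimal shape of a transcript line.
interface SpokenLine {
  readonly speaker: string;
  readonly text: string;
}

// reduce instead of a for loop: number of lines per speaker.
// Each step builds a new object rather than mutating the accumulator.
const countBySpeaker = (
  lines: ReadonlyArray<SpokenLine>,
): Readonly<Record<string, number>> =>
  lines.reduce<Record<string, number>>(
    (counts, { speaker }) => ({
      ...counts,
      [speaker]: (counts[speaker] ?? 0) + 1,
    }),
    {},
  );

const wordCount = (text: string): number =>
  text.split(/\s+/).filter(word => word.length > 0).length;

// Recursion instead of a loop: total words across all lines.
// An empty array destructures to an undefined head, which ends the recursion.
const countWords = ([head, ...tail]: ReadonlyArray<SpokenLine>): number =>
  head === undefined ? 0 : wordCount(head.text) + countWords(tail);
```

Spreading the accumulator on every step is quadratic in the number of speakers, and deep recursion can overflow the stack on very long transcripts; both are trade-offs of the style rather than recommendations.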


Note:

This service is still a work in progress, meaning any documentation or service component may change or get added literally overnight.