czcorpus / kontext

Licence: GPL-2.0
An advanced, extensible web front-end for the Manatee-open corpus search engine

Introduction

KonText is an advanced corpus query interface and corpus data integration platform built around the Manatee-open corpus search engine. It is written in Python 3 and TypeScript and runs on any major Linux distribution. It is developed and maintained by the Institute of the Czech National Corpus.

Features

  • fully editable query chain
    • any operation in a user-defined sequence (e.g. query -> filter -> sample -> sorting) can be changed, after which the whole sequence is re-executed
  • multiple search modes:
    • concordance,
    • paradigmatic query,
    • word list
  • simple and advanced query types
    • advanced CQL editor with syntax highlighting and attribute recognition
    • interactive PoS tag composing tool for positional and key-value tagsets
    • customizable query suggestions and simple type query refinement (e.g. for homonym disambiguation)
  • support for spoken corpora
    • defined text segments can be played back as audio
    • KWIC detail with easily distinguishable speeches
  • rich concordance view options and tools
    • any positional attribute can be set as primary
    • multiple ways to display other attributes
    • user-defined line groups - filtering, reviewing group ratios
    • tokens and KWICs can be connected to external data services (e.g. dictionaries, encyclopedias)
  • rich subcorpus-related functionality
    • a subcorpus can be either private or published
    • text types metadata can be gradually refined to a specific subcorpus ("which publishers are there in case only fiction is selected?")
    • a custom text types ratio can be defined ("give me 20% fiction and 80% journalism")
  • frequency distribution
    • univariate
      • positional attributes (including tuples of multiple attributes per token)
      • structural attributes
    • multivariate distribution (2 dimensions) for both positional and structural attributes
  • collocation analysis
  • persistent URLs - any result page can be easily shared even if the original query is megabytes long
  • access to previous queries, named queries
  • convenient corpus access
    • finding a corpus by keyword (tag), size, or description
    • adding a corpus to favorites (incl. subcorpora, aligned corpora)
  • saving results to Excel, CSV, XML, TXT
  • integrability with existing information systems
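
The "fully editable query chain" feature above can be pictured with a small Python sketch. It is only an illustration of the idea, not KonText's actual implementation: the names `Operation`, `run_chain` and `replace_step` are hypothetical, and concordance lines are simplified to plain strings.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Line = str  # a concordance line, simplified to plain text for illustration


@dataclass(frozen=True)
class Operation:
    """One step of a query chain (hypothetical structure, not KonText's own)."""
    name: str
    apply: Callable[[Sequence[Line]], List[Line]]


def run_chain(source: Sequence[Line], chain: Sequence[Operation]) -> List[Line]:
    """Execute every operation in order, feeding each result to the next step."""
    result = list(source)
    for op in chain:
        result = op.apply(result)
    return result


def replace_step(chain: Sequence[Operation], index: int, new_op: Operation) -> List[Operation]:
    """Return a new chain with one step swapped; the caller then re-runs it."""
    edited = list(chain)
    edited[index] = new_op
    return edited


corpus = ["the cat sat", "a dog ran", "the cat ran", "the bird sang"]

chain = [
    Operation("query", lambda lines: [l for l in lines if "the" in l.split()]),
    Operation("filter", lambda lines: [l for l in lines if "cat" in l]),
    Operation("sort", lambda lines: sorted(lines)),
]
first_result = run_chain(corpus, chain)

# Change only the middle step, then re-execute the whole sequence.
edited_chain = replace_step(
    chain, 1, Operation("filter", lambda lines: [l for l in lines if "ran" in l])
)
second_result = run_chain(corpus, edited_chain)
```

Keeping the chain as an immutable list of steps means an edit never modifies an earlier result in place; the whole sequence is simply run again from the source data.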

Internal features

  • modern client-side application (written in TypeScript, event stream architecture, React components, extensible)
  • server-side written as a WSGI application with fully decoupled background concordance/frequency/collocation calculation (using an integrated worker server)
  • modular code design with dynamically loadable plug-ins providing custom functionality implementation (e.g. custom database adapters, authentication method, corpus listing widgets, HTTP session management)
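
As a rough illustration of what "dynamically loadable plug-ins" can look like in Python, the sketch below resolves module paths from a configuration mapping at run time. It does not show KonText's real plug-in loader: the function names, the role names (`db`, `sessions`) and the configuration shape are assumptions made for this example, and standard-library modules stand in for actual plug-in modules.

```python
import importlib
from types import ModuleType
from typing import Dict, Iterable


def load_plugin(module_path: str, required: Iterable[str] = ()) -> ModuleType:
    """Import a plug-in module by its dotted path and check its interface.

    `required` lists attribute names the module must expose; this check is
    an illustrative convention, not something KonText is known to enforce.
    """
    module = importlib.import_module(module_path)
    missing = [name for name in required if not hasattr(module, name)]
    if missing:
        raise ImportError(f"plug-in {module_path!r} lacks: {', '.join(missing)}")
    return module


def load_plugins(config: Dict[str, str]) -> Dict[str, ModuleType]:
    """Map each configured role to its loaded module."""
    return {role: load_plugin(path) for role, path in config.items()}


# Standard-library modules stand in for real plug-in modules here.
plugins = load_plugins({"db": "sqlite3", "sessions": "json"})
```

Because the module path is plain configuration data, an alternative implementation of a role can be swapped in without touching the code that uses it.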

Installation

Docker

Running KonText as a set of Docker containers is the most convenient and flexible way to install it. To run an instance with a basic configuration (i.e. no MySQL/MariaDB server, no WebSocket server), use:

docker-compose up

To run a production grade instance:

docker-compose -f docker-compose.yml -f docker-compose.mysql.yml --env-file .env.mysql up

(the .env.mysql file allows configuring custom MySQL/MariaDB credentials and the KonText configuration file)

Manual installation

Key requirements

  • Python 3.6 (or newer)
  • Manatee corpus search engine - version 2.167.8 and onwards
  • a key-value storage
    • Redis (recommended), SQLite (supported), custom implementations possible
  • a task queue - Rq (recommended), Celery task queue (supported)
  • HTTP proxy server
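
Before a manual installation, a short script can report which of these requirements are already present. The sketch below is not part of KonText: the importable module names (`manatee`, `redis`, `rq`, `celery`) are assumptions about how each package is exposed to Python, and the script does not verify the Manatee version or the presence of an HTTP proxy server.

```python
import sys
from importlib.util import find_spec
from typing import Dict

MIN_PYTHON = (3, 6)

# Requirement label -> importable module name (names assumed for illustration).
MODULES = {
    "Manatee bindings": "manatee",
    "Redis client": "redis",
    "SQLite": "sqlite3",
    "Rq": "rq",
    "Celery": "celery",
}


def check_environment() -> Dict[str, bool]:
    """Report whether the interpreter and each module are available."""
    report = {"Python >= 3.6": sys.version_info >= MIN_PYTHON}
    for label, module_name in MODULES.items():
        # find_spec returns None when a top-level module cannot be located.
        report[label] = find_spec(module_name) is not None
    return report


report = check_environment()
for label, available in report.items():
    print(f"{label}: {'found' if available else 'missing'}")
```

Only one key-value storage and one task queue are needed, so a "missing" line for SQLite or Celery is not a problem when Redis and Rq are found.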

Ubuntu users are advised to use the install script, which should perform most of the actions necessary to install and run KonText. For other Linux distributions, we recommend running KonText within a container or a virtual machine. Please refer to the doc/INSTALL.md file for details.

Customization and contribution

Please refer to our Wiki.

How to cite KonText

Tomáš Machálek (2020) - KonText: Advanced and Flexible Corpus Query Interface

@inproceedings{machalek-2020-kontext,
    title = "{K}on{T}ext: Advanced and Flexible Corpus Query Interface",
    author = "Mach{\'a}lek, Tom{\'a}{\v{s}}",
    booktitle = "Proceedings of the 12th Language Resources and Evaluation Conference",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://www.aclweb.org/anthology/2020.lrec-1.865",
    pages = "7003--7008",
    language = "English",
    ISBN = "979-10-95546-34-4",
}