open-discourse / open-discourse

Licence: MIT License
Open Discourse is the first fully comprehensive corpus of the plenary proceedings of the federal German Parliament (Bundestag).

Open Discourse

Project Info

The platform is our contribution to democratizing access to political debates and issues.

Open Discourse is a non-profit project run by the employees of Limebit GmbH. The idea grew out of the employees' skills and motivations, conversations during breaks, and a shared commitment to democracy.

We hope that our preliminary work will benefit data-driven journalism, science and civil society, and that easier access to the data will encourage analysis of the political history of the Bundestag based on the language used by politicians.

Repository Structure

This repository is structured in four parts.

  • database:
    • Docker-Container for the Postgres Database
    • Contains Scripts that update the Database
  • frontend:
    • Frontend for the Full Text Search
  • proxy:
    • Docker-Container for the Proxy, which protects the database
  • python:
    • Includes every python script in different subsections, sorted by execution order

Docker Setup

For a quick setup using Docker, please read the DOCKER_SETUP document.

Local Setup

Required software: python3, yarn, docker-compose, node version 12 - ideally installed via node version manager (nvm)

  • run yarn in the following directories:
    • database
    • frontend
  • run sh setup.sh in the python directory
  • run docker-compose build in the root folder
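
The setup steps above can be written down as data and replayed by a small script. The following is only a sketch, not part of the repository: the names REQUIRED_TOOLS, SETUP_STEPS, missing_tools and run_setup are hypothetical, and the script assumes it is given the path to the repository root. It defaults to a dry run that only prints what it would execute, and it does not verify that the installed node is version 12.

```python
"""Dry-run sketch of the local setup steps listed in this README."""
import shutil
import subprocess
from pathlib import Path

# Executables named under "Required software".
# The node version (12) is NOT checked by this sketch.
REQUIRED_TOOLS = ["python3", "yarn", "docker-compose", "node"]

# (command, directory relative to the repository root), in README order.
SETUP_STEPS = [
    (["yarn"], "database"),
    (["yarn"], "frontend"),
    (["sh", "setup.sh"], "python"),
    (["docker-compose", "build"], "."),
]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def run_setup(repo_root, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    root = Path(repo_root)
    steps = []
    for command, directory in SETUP_STEPS:
        cwd = root / directory
        print(f"[{cwd}] {' '.join(command)}")
        if not dry_run:
            # check=True stops the run at the first failing step.
            subprocess.run(command, cwd=cwd, check=True)
        steps.append((command, cwd))
    return steps


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Missing required tools:", ", ".join(absent))
    run_setup(".", dry_run=True)
```

Calling run_setup(".", dry_run=False) from the repository root would execute the steps for real. Keeping the steps in a list makes the order explicit and easy to compare against this README.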

Start the Database

These steps will guide you through starting the Database

Database: Normal Start

You can easily start the Database via docker-compose.

# run from repository root
docker-compose up -d database

Database: Initial Start / Reset

For the initial start of the Database, you will also need to upload the schema.

# run from database folder
yarn run db:update:local

Generate Data

Generate the OpenDiscourse-Database from the ground up. The Database has to be started for this script to finish.

This script is just a pipeline that executes all scripts in src. You can also run each script separately. For documentation on this, please see the README in src.

# run from python folder
sh build.sh
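
Because the build script only finishes when the database is running, it can help to wait for the database port before starting it. The sketch below is an illustration under assumptions, not something the repository provides: the helper names are hypothetical, and the host and port (localhost, 5432, the Postgres default) are guesses that should be checked against the docker-compose configuration.

```python
"""Sketch: wait for the database port, then run the build pipeline."""
import socket
import subprocess
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host, port, attempts=30, delay=1.0):
    """Poll until the port accepts connections or the attempts run out."""
    for _ in range(attempts):
        if port_is_open(host, port):
            return True
        time.sleep(delay)
    return False


def build_when_ready(python_dir="python", host="localhost", port=5432):
    """Run build.sh from the python folder once the port is reachable.

    ASSUMPTION: host and port are placeholders (5432 is the Postgres
    default); the values used by this project may differ.
    """
    if not wait_for_port(host, port):
        raise RuntimeError(f"database not reachable on {host}:{port}")
    subprocess.run(["sh", "build.sh"], cwd=python_dir, check=True)
```

An open port only shows that something is listening. It does not confirm that Postgres is ready for queries or that the schema has been uploaded, so the initial start steps still apply.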

Start the Full Text Search

Note: All of the previous steps have to be completed at least once for the Full Text Search to work properly.

If you want to set up the Full Text Search, follow these steps:

  • run yarn in the following directories:
    • frontend
    • proxy

Choose one of the following ways to start the Frontend:

Run Frontend with Docker

  • run docker-compose up -d in the root folder

Run Frontend locally

  • run docker-compose up -d database proxy in the root folder
  • run yarn dev in the frontend folder

Further Documentation

Notes

  • We use Python 3.7.4 during development of the project
  • The GraphQL endpoint was deprecated and removed as of version 1.1.0