selimfirat / bilkent-turkish-writings-dataset

Licence: other
Turkish writings dataset that promotes creativity, content, composition, grammar, spelling and punctuation.

Bilkent Turkish Writings Dataset

This dataset contains the Turkish creative writings from the Turkish 101 and Turkish 102 courses, published between 2014 and 2018. It contains 4 publicly published writings of students, 2 for each course. The writings in this dataset promote creativity, content, composition, grammar, spelling and punctuation.

The original writings are also available as a collection of PDFs.

The dataset is continuously growing, since new texts are published publicly each semester.

Currently, there are 6,844 writings in this dataset, which amounts to 33.1 MB of data in a single CSV file.

Description of Turkish 101 & 102 at Bilkent University

Turkish 101 is the first of a sequence of two courses designed to develop students' creative writing skills through their own writings in Turkish. It is an active learning course. Students write their own blogs, and instructors regularly comment on and give feedback about the creativity, content, composition, grammar, spelling and punctuation of the writing.

Downloading the dataset

The data can be found in ./data/texts.csv.

git clone https://github.com/selimfirat/bilkent-turkish-writings-dataset.git
mv ./bilkent-turkish-writings-dataset/data/texts.csv <TARGET_PATH>
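
Once the file is in place, it can be inspected with the Python standard library alone. The following is a minimal sketch, not part of the original repository: the function name `load_writings` is hypothetical, and it assumes the CSV is UTF-8 encoded and begins with a header row. It does not assume any particular column names; it simply reports whatever the header contains.

```python
import csv
from pathlib import Path


def load_writings(csv_path):
    """Read every row of the CSV into a list of dicts keyed by the header row.

    Assumes (not confirmed by the README) UTF-8 encoding and a header row.
    """
    # Individual writings can be long, so raise the csv module's
    # per-field size limit above its default of 131072 characters.
    csv.field_size_limit(2**31 - 1)
    with Path(csv_path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    path = Path("data/texts.csv")
    if path.exists():
        rows = load_writings(path)
        print(len(rows), "rows")
        print("columns:", list(rows[0].keys()) if rows else [])
```

Reading the whole file into memory is reasonable at the size stated above (about 33 MB); for a much larger file, iterating over the `csv.DictReader` row by row would be the safer choice.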

How to scrape from scratch

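git clone https://github.com/selimfirat/bilkent-turkish-writings-dataset.git
# Enter the repository first so that requirements.txt can be found
cd bilkent-turkish-writings-dataset
pip install -r requirements.txt
# Download the PDFs with the Scrapy spider
cd scraper
scrapy crawl bilkent_turkish_writings
# Convert the downloaded PDFs to text
cd ../
python convert_to_text.py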

In the end, there will be ~2 GB of PDFs, which can be deleted once the conversion to text is done. Given that volume, continuous crawling and preprocessing is worthwhile. It is suggested to execute the last two steps (crawling and conversion) from the project's Jupyter notebook.
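
The cleanup step can be sketched in Python as follows. This is an illustrative helper, not code from the repository: the function name `remove_pdfs` and the directory `scraper/downloads` are hypothetical, since the README does not state where the crawler stores the PDFs. It defaults to a dry run so that nothing is deleted until the reported numbers look right.

```python
from pathlib import Path


def remove_pdfs(root, dry_run=True):
    """Find every *.pdf under root and return (file_count, total_bytes).

    Files are deleted only when dry_run is False.
    The pattern is case-sensitive on most Linux file systems.
    """
    count, total = 0, 0
    for pdf in Path(root).rglob("*.pdf"):
        if not pdf.is_file():
            continue
        total += pdf.stat().st_size
        count += 1
        if not dry_run:
            pdf.unlink()
    return count, total


if __name__ == "__main__":
    # Hypothetical location; replace with the directory the crawler writes to.
    pdf_dir = Path("scraper/downloads")
    if pdf_dir.is_dir():
        n, size = remove_pdfs(pdf_dir, dry_run=True)
        print(f"{n} PDFs, {size / 1e6:.1f} MB would be removed")
```

Calling `remove_pdfs(pdf_dir, dry_run=False)` after verifying that the text conversion succeeded performs the actual deletion.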
