nishkalavallabhi / OneStopEnglishCorpus

License: CC-BY-SA-4.0

This repository hosts the dataset described in the following paper:

OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification
Sowmya Vajjala and Ivana Lučić
2018
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 297–304. Association for Computational Linguistics.

Please cite the above paper if you use this corpus in your research.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Description of this repo:

  • Texts-SeparatedByReadingLevel/: The corpus itself, with three sub-folders, one per reading level. The three versions of a text share a base file name and end in -ele.txt, -int.txt, or -adv.txt, according to the sub-folder they are in.
  • Texts-Together-OneCSVperFile/: One CSV file per text, with three columns, one per reading level. Paragraph breaks are preserved.
  • Sentence-Aligned/: Three text files with pairwise sentence alignments (adv-int, int-ele, adv-ele). Sentences were aligned using cosine similarity.
  • Processed-AllLevels-AllFiles/: Sub-folders with output files from the Stanford Parser, Stanford CoreNLP, and UPenn's Discourse Connectives Tagger.
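
As an illustration of the layout described above, the following is a minimal Python sketch for working with a local copy of the corpus. It is not part of the repository. The function names `group_by_article` and `read_levels_csv` are hypothetical, and UTF-8 encoding is an assumption that may need adjusting. The sketch relies only on the -ele/-int/-adv file-name suffixes and the three-column CSV layout stated in the list; it makes no assumption about sub-folder names, column order, or whether the CSV files have a header row.

```python
import csv
from pathlib import Path

# File-name suffixes from the naming scheme described in the list above.
LEVEL_SUFFIXES = ("-ele", "-int", "-adv")


def group_by_article(corpus_root):
    """Map each base file name to {"ele" / "int" / "adv": path} for the versions found.

    Searches all sub-folders of corpus_root for .txt files, so the
    sub-folder names themselves do not matter.
    """
    articles = {}
    for path in sorted(Path(corpus_root).rglob("*.txt")):
        for suffix in LEVEL_SUFFIXES:
            if path.stem.endswith(suffix):
                base = path.stem[: -len(suffix)]
                articles.setdefault(base, {})[suffix.lstrip("-")] = path
                break
    return articles


def read_levels_csv(csv_path, encoding="utf-8"):
    """Return the rows of one per-text CSV file as lists of cells.

    newline="" lets the csv module handle line breaks inside quoted
    cells, which matters if paragraph breaks are stored that way.
    The encoding is an assumption, not something the README states.
    """
    with open(csv_path, newline="", encoding=encoding) as handle:
        return list(csv.reader(handle))
```

For example, `group_by_article("Texts-SeparatedByReadingLevel")` would return a dictionary keyed by base file name, from which texts available at all three levels can be selected by checking that an entry has three keys.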

For enquiries, contact: [email protected]
