elleros / Courseraforums

Anonymized versions of the discussion threads from the forums of 60 Coursera MOOCs

Projects that are alternatives of or similar to Courseraforums

Elastic data
Elasticsearch datasets ready for bulk loading
Stars: ✭ 30 (-40%)
Mutual labels:  dataset
Human3.6m downloader
Human3.6M downloader by Python
Stars: ✭ 37 (-26%)
Mutual labels:  dataset
Multidigitmnist
Combine multiple MNIST digits to create datasets with 100/1000 classes for few-shot learning/meta-learning
Stars: ✭ 48 (-4%)
Mutual labels:  dataset
Wikisql
A large annotated semantic parsing corpus for developing natural language interfaces.
Stars: ✭ 965 (+1830%)
Mutual labels:  dataset
Okutama Action
Okutama-Action: An Aerial View Video Dataset for Concurrent Human Action Detection
Stars: ✭ 36 (-28%)
Mutual labels:  dataset
Qri
you're invited to a data party!
Stars: ✭ 1,003 (+1906%)
Mutual labels:  dataset
Dns Lots Of Lookups
dnslol is a command line tool for performing lots of DNS lookups.
Stars: ✭ 30 (-40%)
Mutual labels:  dataset
Distil
💧 In memory dataset filtering, inspired by snikch/aggro
Stars: ✭ 49 (-2%)
Mutual labels:  dataset
Pts
Quantized Mesh Terrain Data Generator and Server for CesiumJS Library
Stars: ✭ 36 (-28%)
Mutual labels:  dataset
Watermarkreco
Pytorch implementation of the paper "Large-Scale Historical Watermark Recognition: dataset and a new consistency-based approach"
Stars: ✭ 45 (-10%)
Mutual labels:  dataset
Multi Plier
An unsupervised transfer learning approach for rare disease transcriptomics
Stars: ✭ 33 (-34%)
Mutual labels:  dataset
Dataconfs
A list of conferences connected with data worldwide.
Stars: ✭ 36 (-28%)
Mutual labels:  dataset
Covid Ctset
Large Covid-19 CT scans dataset from paper: https://doi.org/10.1101/2020.06.08.20121541
Stars: ✭ 40 (-20%)
Mutual labels:  dataset
Rstudioconf tweets
🖥 A repository for tracking tweets about rstudio::conf
Stars: ✭ 32 (-36%)
Mutual labels:  dataset
Mtnt
Code for the collection and analysis of the MTNT dataset
Stars: ✭ 48 (-4%)
Mutual labels:  dataset
Day night dataset list
Collecting a list of dataset with day and night annotations
Stars: ✭ 30 (-40%)
Mutual labels:  dataset
People Counting Dataset
the large-scale data set for people counting (LOI counting)
Stars: ✭ 37 (-26%)
Mutual labels:  dataset
Php Ml
PHP-ML - Machine Learning library for PHP
Stars: ✭ 7,900 (+15700%)
Mutual labels:  dataset
Chinesetrafficpolicepose
Detects Chinese traffic police commanding poses 检测中国交警指挥手势
Stars: ✭ 49 (-2%)
Mutual labels:  dataset
Letsgodataset
This repository makes the integral Let's Go dataset publicly available.
Stars: ✭ 41 (-18%)
Mutual labels:  dataset

Coursera Forums

This repository provides anonymized versions of the discussion threads from the forums of 60 Coursera Massive Open Online Courses (MOOCs), for a total of about 100,000 threads. This dataset is associated with the paper:

Language independent analysis and classification of discussion threads in coursera MOOC forums, by Lorenzo A. Rossi and Omprakash Gnawali, IEEE International Conference on Information Reuse and Integration (IRI), August 2014.

According to the 2019 Google Scholar Metrics, our paper is the 2nd most cited in Information Reuse and Integration for the July 2014 - July 2019 period.

If you use the dataset for your work, please cite the paper. BibTeX entry:

@inproceedings{coursera-iri2014,
   author = {Lorenzo A. Rossi and Omprakash Gnawali},
   title = {{Language Independent Analysis and Classification of Discussion Threads in Coursera MOOC Forums}},
   booktitle = {Proceedings of the IEEE International Conference on Information Reuse and Integration (IRI 2014)},
   month = aug,
   year = {2014}
}

If you have questions, email lorenzo [dot] rossi [at] gmail

The dataset has been anonymized as follows: the text and the names of the authors of all the posts and comments have been removed. The author identifiers have been hashed so that they do not match the Coursera user identifiers. Anonymous users have ID 0 in this repository as well as on Coursera.
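The README does not say which hash function was used. Purely as an illustration of the idea, the sketch below shows one way such a mapping could work: a salted SHA-256 digest turned into an integer, with ID 0 left unchanged for anonymous users. The function name anonymize_user_id, the salt, and the choice of SHA-256 are all assumptions, not the authors' actual procedure.

```python
import hashlib


def anonymize_user_id(user_id: int, salt: str) -> int:
    """Hypothetical helper: map a user ID to a stable pseudonymous integer.

    Anonymous users (ID 0) stay 0, mirroring the convention described above.
    The hashing scheme itself is an assumption made for this example.
    """
    if user_id == 0:
        return 0
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Take the first 8 bytes as an integer and shift into [1, 2**63 - 1],
    # so that no real user is ever mapped to the anonymous ID 0.
    return int.from_bytes(digest[:8], "big") % (2**63 - 1) + 1


hashed = anonymize_user_id(42, salt="example-salt")
```

The same input and salt always give the same output, so posts by one user can still be linked to each other without exposing the original identifier.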

Description of the CSV files

All the files are in the folder data.

course_information.csv

Basic information about the 60 courses used in this dataset.

  • name course name
  • course_id Coursera course identifier
  • weeks course duration (in weeks)
  • hours average number of hours of course work requested per week
  • start_date start date
  • end_date end date (often not entered in this file)
  • type type of course ('Q': quantitative, ...)
  • language language used to teach the course (also the main but not necessarily the only language of the forums). 'E': English, 'FR': French, 'SP': Spanish, 'CH': Chinese
  • num_threads number of threads in the forum
  • mandatory_posts number of posts required to receive credit for forum activity (some Coursera courses give credit for posting in the forums)
  • num_users number of unique users active in the forum. All anonymous users are counted as a single user, so the actual number of users may be larger.
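As a rough sketch of how these columns might be used, the example below sums num_threads per teaching language with Python's standard csv module. It assumes the file has a header row whose names match the column names listed above. The sample rows and the helper name threads_per_language are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Language codes as listed in the column description above.
LANGUAGES = {"E": "English", "FR": "French", "SP": "Spanish", "CH": "Chinese"}


def threads_per_language(csv_file):
    """Sum num_threads per teaching language from an open CSV file object."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        # Strip whitespace and stray quotes in case the codes are stored quoted.
        code = row["language"].strip().strip("'")
        totals[LANGUAGES.get(code, code)] += int(row["num_threads"])
    return dict(totals)


# Made-up rows for illustration only; the real file covers 60 courses.
sample = io.StringIO(
    "name,course_id,language,num_threads\n"
    "Hypothetical Course A,hypo-a,E,120\n"
    "Hypothetical Course B,hypo-b,FR,30\n"
    "Hypothetical Course C,hypo-c,E,50\n"
)
totals = threads_per_language(sample)
print(totals)  # {'English': 170, 'French': 30}
```

With the repository checked out, the same function could be called on the real file by opening data/course_information.csv with open(..., newline="") and passing the file object in.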

course_subforums.csv

Data about the threads and the subforums containing them for the discussion forums of the 60 courses.

  • thread_id thread identifier
  • course_id Coursera course identifier
  • og_forum name of the original (sub)forum
  • og_forum_id identifier (int) of the original subforum
  • parent_forum name of the parent (sub)forum
  • parent_forum_id identifier of the parent (sub)forum
  • forum_chain complete sequence of (sub)forum names from root to current subforum
  • depth number of (sub)forums in forum_chain
  • num_views number of views for the thread
  • num_tags number of tags associated with the subforum title
  • forum_id possibly re-mapped subforum identifier
    • 2: General (Miscellaneous) Discussion
    • 3: Assignments
    • 4: Study Groups / Meetups
    • 7: Course Feedback / Suggestions
    • 8: Lectures
    • 9: Platform Issues
    • 100: Signature Track
    • otherwise: not remapped
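The forum_id mapping above can be turned into a small lookup table. The sketch below counts threads per category. It assumes forum_id is stored as a plain integer and that the file has a header row matching the column names above. The sample rows and the helper names forum_label and threads_per_category are hypothetical.

```python
import csv
import io
from collections import Counter

# Re-mapped subforum identifiers, copied from the list above.
FORUM_LABELS = {
    2: "General (Miscellaneous) Discussion",
    3: "Assignments",
    4: "Study Groups / Meetups",
    7: "Course Feedback / Suggestions",
    8: "Lectures",
    9: "Platform Issues",
    100: "Signature Track",
}


def forum_label(forum_id):
    """Return the category name for a re-mapped forum_id, else 'not remapped'."""
    return FORUM_LABELS.get(int(forum_id), "not remapped")


def threads_per_category(csv_file):
    """Count rows (one per thread) by forum category."""
    return Counter(forum_label(row["forum_id"]) for row in csv.DictReader(csv_file))


# Made-up rows for illustration only.
sample = io.StringIO(
    "thread_id,course_id,forum_id\n"
    "1,hypo-a,3\n"
    "2,hypo-a,3\n"
    "3,hypo-a,8\n"
    "4,hypo-a,57\n"
)
counts = threads_per_category(sample)
```

Here counts["Assignments"] is 2, and the row with forum_id 57 is counted under "not remapped".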

course_posts.csv

Data about all the posts or comments (minus spam) made in the discussion forums of the 60 courses.

  • post_id identifier (integer) of the post (comment)
  • thread_id identifier of the thread
  • course_id Coursera course identifier
  • parent_id identifier of the parent post (0 for posts, nonzero for comments)
  • order order of the post in the thread (1: first post, 2: ..., 0: comment)
  • user_id user ID (hashed version of original Coursera user ID)
  • user_type 'Student', 'Anonymous', 'Staff', 'Instructor', 'Community TA', 'Coursera Staff' or 'Coursera Tech Support'
  • post_time time stamp
  • relative_t normalized posting time relative to the course's start and end time
  • votes sum of the votes received by the post (comment). Each user can add +/-1 to a post
  • num_words (new) number of words (NA for posts in Chinese language)
  • forum_id possibly re-mapped subforum identifier
    • 2: General (Miscellaneous) Discussion
    • 3: Assignments
    • 4: Study Groups / Meetups
    • 7: Course Feedback / Suggestions
    • 8: Lectures
    • 9: Platform Issues
    • 100: Signature Track
    • otherwise: not remapped
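The parent_id and order columns are enough to rebuild the shape of a thread: rows with parent_id 0 are top-level posts, ordered by order, and rows with a nonzero parent_id are comments attached to that post. The following is a minimal sketch of that grouping, assuming a header row matching the column names above. The sample rows and the helper name build_thread are made up, and comments are kept in file order because the timestamp format of post_time is not specified here.

```python
import csv
import io
from collections import defaultdict


def build_thread(rows, thread_id):
    """Return [(post_id, [comment_ids...]), ...] for one thread.

    rows: iterable of dicts as produced by csv.DictReader.
    thread_id: compared as a string, since csv yields string values.
    """
    posts = []
    comments = defaultdict(list)
    for row in rows:
        if row["thread_id"] != thread_id:
            continue
        parent = int(row["parent_id"])
        if parent == 0:
            posts.append(row)  # top-level post
        else:
            comments[parent].append(int(row["post_id"]))  # comment on a post
    posts.sort(key=lambda r: int(r["order"]))
    return [(int(p["post_id"]), comments[int(p["post_id"])]) for p in posts]


# Made-up rows for illustration only.
sample = io.StringIO(
    "post_id,thread_id,course_id,parent_id,order\n"
    "11,1,hypo-a,0,2\n"
    "10,1,hypo-a,0,1\n"
    "12,1,hypo-a,10,0\n"
    "13,1,hypo-a,10,0\n"
    "14,2,hypo-a,0,1\n"
)
thread = build_thread(csv.DictReader(sample), "1")
print(thread)  # [(10, [12, 13]), (11, [])]
```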