tswsxk / EduData

License: Apache-2.0
EduData: datasets in education and a convenient interface for downloading and preprocessing them

EduData

Convenient interface for downloading and preprocessing datasets in education. For more details, please refer to the full documentation.

The datasets include:

You can also visit our datashop, BaseData, to get most of the datasets mentioned above.

Besides the datasets mentioned above, we also provide benchmark datasets for specific tasks, listed as follows:

Installation

Clone the repository and install with pip:

pip install -e .

or install from PyPI:

pip install EduData

CLI

edudata $subcommand $parameters1 $parameters2

To see the help information:

edudata -- --help
edudata $subcommand --help

The CLI tool is built on fire. Refer to the documentation for detailed usage.

Download Dataset

Before downloading a dataset, first list the available datasets:

edudata ls

and get:

assistment-2009-2010-skill
assistment-2012-2013-non-skill
assistment-2015
junyi
...
ktbd
ktbd-a0910
ktbd-junyi
ktbd-synthetic
...

Download a dataset by specifying its name:

edudata download assistment-2009-2010-skill

To change the storage directory, use the following command:

edudata download assistment-2009-2010-skill $dir

For detailed information on each dataset, refer to the docs.

Task Specified Tools

Knowledge Tracing


Format converter

In the knowledge tracing task, a popular format, which we call the triple line (tl) format, is used to represent interaction sequence records:

5
419,419,419,665,665
1,1,1,0,0

This format appears in Deep Knowledge Tracing. Each interaction sequence occupies three lines: the first gives the length of the sequence, the second lists the exercise ids, and the third lists the responses, where each element is 1 for a correct answer or 0 for a wrong answer.
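
As an illustration of the layout described above, here is a minimal Python sketch that parses tl text into records and checks that the declared length matches both lists. The function name `parse_tl` is hypothetical and is not part of EduData; exercise ids are kept as strings as an assumption, so that non-numeric ids would also pass through.

```python
def parse_tl(text):
    """Parse triple line (tl) text into a list of (exercise_ids, answers) pairs.

    Hypothetical helper for illustration only; not EduData's implementation.
    """
    # Ignore blank lines so a trailing newline does not break the grouping.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("tl data must contain a multiple of three non-empty lines")

    records = []
    for i in range(0, len(lines), 3):
        length = int(lines[i])                       # line 1: sequence length
        exercises = lines[i + 1].split(",")          # line 2: exercise ids
        answers = [int(a) for a in lines[i + 2].split(",")]  # line 3: 1 or 0
        if not (len(exercises) == len(answers) == length):
            raise ValueError(f"record {i // 3} has inconsistent lengths")
        records.append((exercises, answers))
    return records


sample = "5\n419,419,419,665,665\n1,1,1,0,0\n"
print(parse_tl(sample))
```

For the sample above, the single parsed record holds five exercise ids and five answers, matching the declared length of 5.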

Because some special symbols are hard to store in the format above, we offer another format, named json sequence, to represent interaction sequence records:

[[419, 1], [419, 1], [419, 1], [665, 0], [665, 0]]

Each item in the sequence represents one interaction. The first element of the item is the exercise id (in some works, the exercise id is not mapped one-to-one to a knowledge unit (ku)/concept, but in junyi, one exercise contains one ku), and the second indicates whether the learner answered the exercise correctly: 0 for wrong and 1 for correct.
Each line holds one json record, which corresponds to one learner's interaction sequence.

We provide tools for converting between the two formats:

# convert tl sequence to json sequence, by default, the exercise tag and answer will be converted into int type
edudata tl2json $src $tar
# convert tl sequence to json sequence without converting the values to int type
edudata tl2json $src $tar False
# convert json sequence to tl sequence
edudata json2tl $src $tar
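
To make the relationship between the two formats concrete, the following Python sketch converts a single record in each direction. It is not the code behind `tl2json` or `json2tl`; the function names and the `to_int` parameter are hypothetical, with `to_int` only loosely mirroring the optional `False` argument shown above.

```python
import json


def tl_record_to_json_line(exercises, answers, to_int=True):
    """Turn one tl record into one json sequence line (illustrative only)."""
    if len(exercises) != len(answers):
        raise ValueError("exercises and answers must have the same length")
    if to_int:
        # Assumed behaviour: cast both fields to int, as the tl2json comment suggests.
        exercises = [int(e) for e in exercises]
        answers = [int(a) for a in answers]
    return json.dumps([[e, a] for e, a in zip(exercises, answers)])


def json_line_to_tl(line):
    """Turn one json sequence line back into three tl lines (illustrative only)."""
    sequence = json.loads(line)
    exercises = ",".join(str(e) for e, _ in sequence)
    answers = ",".join(str(a) for _, a in sequence)
    return f"{len(sequence)}\n{exercises}\n{answers}"


json_line = tl_record_to_json_line(["419", "419", "419", "665", "665"],
                                   ["1", "1", "1", "0", "0"])
print(json_line)
print(json_line_to_tl(json_line))
```

With `to_int=False` the values stay as strings in the json output, which is one way ids containing special symbols could be preserved.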

Dataset Preprocess

These CLI tools quickly convert the "raw" data of a dataset into "mature" data for the knowledge tracing task. The "mature" data is in json sequence format and can be modeled by XKT and TKT (TBA).

junyi

# download junyi dataset to junyi/
>>> edudata download junyi
# build knowledge graph
>>> edudata dataset junyi kt extract_relations junyi/ junyi/data/
# prepare dataset for knowledge tracing task, which is represented in json sequence
>>> edudata dataset junyi kt build_json_sequence junyi/ junyi/data/ junyi/data/graph_vertex.json 1000
# after preprocessing, a json sequence file, named student_log_kt_1000, can be found in junyi/data/
# further preprocessing like splitting dataset into train and test can be performed
>>> edudata train_valid_test junyi/data/student_log_kt_1000 -- --train_ratio 0.8 --valid_ratio 0.1 --test_ratio 0.1

Analysis Dataset

This tool only supports the json sequence format. To check the following statistics of a dataset:

  • the number of knowledge units
  • the number of correct records
  • the number of sequences

run:

edudata kt_stat $filename
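
The three statistics listed above can be computed from json sequence lines roughly as follows. This is a sketch under stated assumptions, not the implementation of `kt_stat`: the function name `kt_stat_sketch` is hypothetical, and the first element of each item is treated as the knowledge unit, which holds only when exercises map one-to-one to knowledge units.

```python
import json


def kt_stat_sketch(lines):
    """Count knowledge units, correct records, and sequences (illustrative only)."""
    knowledge_units = set()
    correct = 0
    records = 0
    sequences = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sequence = json.loads(line)
        sequences += 1
        for unit, answer in sequence:
            knowledge_units.add(unit)
            records += 1
            if int(answer) == 1:
                correct += 1
    return {
        "knowledge_units": len(knowledge_units),
        "correct_records": correct,
        "records": records,
        "sequences": sequences,
    }


sample_lines = [
    "[[419, 1], [419, 1], [419, 1], [665, 0], [665, 0]]",
    "[[665, 1], [70, 0]]",
]
stats = kt_stat_sketch(sample_lines)
print(stats)
```

To apply it to a file, the same function can be called on an open file handle, since iterating a text file yields its lines.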

Evaluation

To better verify the effectiveness of a model, the dataset is usually divided into train/valid/test sets or split with the k-fold method.

edudata train_valid_test $filename1 $filename2 --train_ratio 0.8 --valid_ratio 0.1 --test_ratio 0.1
edudata kfold $filename1 $filename2 --n_splits 5
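
For readers unfamiliar with the two splitting schemes, the sketch below shows both in plain Python. It does not reproduce the behaviour of the `train_valid_test` or `kfold` commands: the function names, the shuffling step, and the `seed` parameter are assumptions made for illustration.

```python
import random


def train_valid_test_split(items, train_ratio=0.8, valid_ratio=0.1,
                           test_ratio=0.1, seed=0):
    """Shuffle items and cut them into three parts by ratio (illustrative only)."""
    if abs(train_ratio + valid_ratio + test_ratio - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    items = list(items)
    random.Random(seed).shuffle(items)  # assumed: shuffle before splitting
    n_train = int(len(items) * train_ratio)
    n_valid = int(len(items) * valid_ratio)
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]  # the remainder goes to the test set
    return train, valid, test


def kfold_indices(n_items, n_splits=5):
    """Yield (train_indices, test_indices) for contiguous k-fold splits."""
    start = 0
    for fold in range(n_splits):
        # Spread any remainder over the first folds.
        size = n_items // n_splits + (1 if fold < n_items % n_splits else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_items))
        yield train_idx, test_idx
        start += size


sequences = list(range(10))  # stand-ins for ten json sequence records
train, valid, test = train_valid_test_split(sequences)
print(len(train), len(valid), len(test))

folds = list(kfold_indices(len(sequences), n_splits=5))
print(len(folds))
```

In the k-fold case every record serves as test data exactly once, which is why the scheme is often preferred when the dataset is small.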

Refer to longling for more tools and detailed information.

Citation

If this repository is helpful to you, please cite our work:

@misc{bigdata2021edudata,
  title={EduData},
  author={bigdata-ustc},
  publisher = {GitHub},
  journal = {GitHub repository},
  year = {2021},
  howpublished = {\url{https://github.com/bigdata-ustc/EduData}},
}

More works

Refer to our website and GitHub for our publications and more projects.
