
asreview / systematic-review-datasets

Licence: MIT License
A collection of fully labeled systematic review datasets (title-abstract screening)


Systematic Review Datasets

This repository provides an overview of labeled datasets used for Systematic Reviews. The datasets are available under an open licence and can be used for text mining and machine learning purposes. This repository contains scripts to collect, preprocess and clean the systematic review datasets.

Datasets

The datasets are alphabetically ordered. See index.csv for all available properties.

id | topic | n_papers | n_included | license
--- | --- | --- | --- | ---
Appenzeller-Herzog_2020 | Wilson disease | 3453 | 29 | CC-BY Attribution 4.0 International
Bannach-Brown_2019 | Animal Model of Depression | 1993 | 280 | CC-BY Attribution 4.0 International
Bos_2018 | Dementia | 5746 | 11 | CC-BY Attribution 4.0 International
Cohen_2006_ACEInhibitors | ACEInhibitors | 2544 | 41 | custom open license
Cohen_2006_ADHD | ADHD | 851 | 20 | custom open license
Cohen_2006_Antihistamines | Antihistamines | 310 | 16 | custom open license
Cohen_2006_AtypicalAntipsychotics | Atypical Antipsychotics | 1120 | 146 | custom open license
Cohen_2006_BetaBlockers | Beta Blockers | 2072 | 42 | custom open license
Cohen_2006_CalciumChannelBlockers | Calcium Channel Blockers | 1218 | 100 | custom open license
Cohen_2006_Estrogens | Estrogens | 368 | 80 | custom open license
Cohen_2006_NSAIDS | NSAIDS | 393 | 41 | custom open license
Cohen_2006_Opiods | Opiods | 1915 | 15 | custom open license
Cohen_2006_OralHypoglycemics | Oral Hypoglycemics | 503 | 136 | custom open license
Cohen_2006_ProtonPumpInhibitors | Proton Pump Inhibitors | 1333 | 51 | custom open license
Cohen_2006_SkeletalMuscleRelaxants | Skeletal Muscle Relaxants | 1643 | 9 | custom open license
Cohen_2006_Statins | Statins | 3465 | 85 | custom open license
Cohen_2006_Triptans | Triptans | 671 | 24 | custom open license
Cohen_2006_UrinaryIncontinence | Urinary Incontinence | 327 | 40 | custom open license
Hall_2012 | Software Fault Prediction | 8911 | 104 | CC-BY Attribution 4.0 International
Kitchenham_2010 | Software Engineering | 1704 | 45 | CC-BY Attribution 4.0 International
Kwok_2020 | Virus Metagenomics | 2481 | 120 | CC-BY Attribution 4.0 International
Nagtegaal_2019 | Nudging | 2019 | 101 | CC0
Radjenovic_2013 | Software Fault Prediction | 6000 | 48 | CC-BY Attribution 4.0 International
Wahono_2015 | Software Defect Detection | 7002 | 62 | CC-BY Attribution 4.0 International
Wolters_2018 | Dementia | 5019 | 19 | CC-BY Attribution 4.0 International
van_Dis_2020 | Anxiety-Related Disorders | 10953 | 73 | CC-BY Attribution 4.0 International
van_de_Schoot_2017 | PTSD Trajectories | 6189 | 43 | CC-BY Attribution 4.0 International
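
The n_papers and n_included columns are enough to work out how imbalanced each dataset is. The following is a minimal sketch, assuming index.csv is a comma-separated file with at least the columns id, n_papers and n_included; the actual file may be laid out differently. The function name inclusion_rates is illustrative and not part of the repository, and the sample rows are copied from the overview above.

```python
import csv
import io

# Three rows copied from the overview above. The layout of the real index.csv
# is an assumption; it may contain additional columns.
SAMPLE_INDEX = """id,topic,n_papers,n_included,license
Appenzeller-Herzog_2020,Wilson disease,3453,29,CC-BY Attribution 4.0 International
Bannach-Brown_2019,Animal Model of Depression,1993,280,CC-BY Attribution 4.0 International
Bos_2018,Dementia,5746,11,CC-BY Attribution 4.0 International
"""


def inclusion_rates(index_file):
    """Return {dataset id: n_included / n_papers} for every row of an index file."""
    rates = {}
    for row in csv.DictReader(index_file):
        rates[row["id"]] = int(row["n_included"]) / int(row["n_papers"])
    return rates


rates = inclusion_rates(io.StringIO(SAMPLE_INDEX))

# Print the datasets from the lowest to the highest inclusion rate.
for dataset_id, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{dataset_id}: {rate:.2%}")
```

To run this against a file on disk, pass an open file handle instead of the in-memory sample, for example open("index.csv", encoding="utf-8", newline="").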

Publishing your data

To publish your data, we recommend using the Open Science Framework (OSF). OSF is part of the Center for Open Science (COS), which aims to increase the openness, integrity, and reproducibility of research (OSF, 2020). For instructions, see "How to share your data using OSF: A step-by-step guide".

Zenodo is another platform for publishing your data open access. It encourages scientists to share all materials (including data) that are necessary to understand the scholarly process (Zenodo, 2020).

When uploading your dataset to OSF or Zenodo, provide all relevant information about the dataset by filling out all available fields. You can document the data as extensively as you like (flowcharts, explanations of certain decisions, etc.), including a link to the systematic review itself if it has been published elsewhere.

License

When sharing your dataset or a link to your already published systematic review, we recommend using a CC-BY or CC0 license for both Zenodo and OSF. A Creative Commons license gives everybody, from individual creators to large institutions, a standardized way to allow use of their creative work under copyright law (Creative Commons, 2020).

In short, the CC-BY license means that reusers are allowed to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. The CC0 license releases data into the public domain, allowing reuse in any form without any conditions. This can be appropriate when sharing (meta)data only. With both OSF (see step-by-step guide) and Zenodo, you can easily add the license after creating a project on either platform.

File format

The folder datasets/ has a subfolder for each systematic review dataset. In each subfolder, an .ipynb notebook retrieves the dataset from OSF or Zenodo and preprocesses it by adding customized labels and marking duplicates. The notebook also reports the inclusion rate, the patterns of missing values, and word clouds of the titles and abstracts. After preprocessing, an ASReview-compatible dataset in .csv format is written to the output/ folder.

Supported file extensions are .csv, .xlsx, and .xls. CSV files should be comma-separated and UTF-8 encoded. To indicate labeling decisions, name the label column "included" or "label_included". This column should contain only 0s and 1s, where 0 means the record is not included and 1 means it is included.
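
The label-column rules above can be checked before a file is used. The following is a minimal sketch using only the Python standard library, not the preprocessing code from the notebooks. The function name check_labels, the title and abstract column names, and the sample records are hypothetical and only serve the illustration.

```python
import csv
import io

# Label column names mentioned in the text above.
LABEL_COLUMNS = ("included", "label_included")


def check_labels(csv_file):
    """Return (label column, inclusion rate); raise ValueError on a malformed file."""
    reader = csv.DictReader(csv_file)
    fieldnames = reader.fieldnames or []
    label_col = next((c for c in LABEL_COLUMNS if c in fieldnames), None)
    if label_col is None:
        raise ValueError(f"no label column; expected one of {LABEL_COLUMNS}")

    labels = []
    for record_no, row in enumerate(reader, start=1):
        # A short row yields None for missing cells, so fall back to "".
        value = (row[label_col] or "").strip()
        if value not in ("0", "1"):
            raise ValueError(f"record {record_no}: label must be 0 or 1, got {value!r}")
        labels.append(int(value))

    if not labels:
        raise ValueError("file has no records")
    return label_col, sum(labels) / len(labels)


# Hypothetical records, for illustration only.
SAMPLE = """title,abstract,included
First record,Some abstract text,1
Second record,Another abstract,0
Third record,More text,0
Fourth record,Final abstract,0
"""

label_col, rate = check_labels(io.StringIO(SAMPLE))
print(f"label column: {label_col}, inclusion rate: {rate:.1%}")
```

For a file on disk, opening it with open(path, encoding="utf-8", newline="") matches the UTF-8 requirement, and the csv module's default delimiter is already the comma.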

License

The scripts in the current project are MIT licensed. The datasets (should) have a permissive license.

Contact

Contact details can be found at the ASReview project page.
