logpai / bugrepo

Licence: other
A collection of publicly available bug reports


BugRepo

BugRepo maintains a collection of bug reports that are publicly available for research purposes. Bug reports are a major data source for NLP-based research in software engineering. We categorize the datasets by the following research directions.

1. Duplicate bug identification

| Project | Timespan | #Components | #Issues | #Issues/day | #Duplicates | %Duplicates | Median resolving time |
|---|---|---|---|---|---|---|---|
| Mozilla Core | 1997/03/28 ~ 2013/12/31 | 130 | 205,069 | 33.5 | 44,691 | 21.8% | 102.1 days |
| Firefox | 1999/07/30 ~ 2013/12/31 | 52 | 115,814 | 22.0 | 35,814 | 30.9% | 76.4 days |
| Thunderbird | 2000/04/12 ~ 2013/12/31 | 23 | 32,551 | 6.5 | 12,501 | 38.4% | 83.7 days |
| Eclipse Platform | 2001/10/10 ~ 2013/12/30 | 21 | 85,156 | 19.1 | 14,404 | 16.9% | 29.8 days |
| JDT | 2001/10/10 ~ 2013/12/31 | 6 | 45,296 | 10.1 | 7,688 | 17.0% | 23.0 days |
| Spark | 2010/04/01 ~ 2018/01/10 | 29 | 22,639 | 8.0 | 3,077 | 13.6% | 7.1 days |
| Hadoop | 2005/07/24 ~ 2017/11/01 | 45 | 12,855 | 2.9 | 1,861 | 14.5% | 14.3 days |
| MapReduce | 2006/03/17 ~ 2018/01/15 | 63 | 7,019 | 1.6 | 977 | 13.9% | 28.2 days |
| HDFS | 2006/04/06 ~ 2018/01/12 | 71 | 12,779 | 3.0 | 1,659 | 13.0% | 9.7 days |
| HBase | 2007/02/27 ~ 2018/01/21 | 95 | 19,788 | 5.0 | 1,340 | 6.8% | 6.8 days |
| Cassandra | 2009/03/07 ~ 2018/01/21 | 24 | 14,071 | 4.3 | 2,083 | 14.8% | 8.6 days |
| Mesos | 2011/02/16 ~ 2018/01/26 | 40 | 8,454 | 3.3 | 800 | 9.5% | 23.5 days |
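
The per-day and percentage columns appear to be derived from the other columns. The sketch below assumes that the issues-per-day figure is the issue count divided by the number of calendar days in the timespan, and that the duplicate percentage is duplicates divided by issues. Both are assumptions that happen to agree with the Mozilla Core row; the function name `summarize` is hypothetical.

```python
from datetime import date


def summarize(first_day: date, last_day: date, issues: int, duplicates: int):
    """Return (issues per day, percent duplicates), each rounded to one decimal.

    Assumption: the rate is computed over the calendar days between the
    first and last day of the timespan.
    """
    days = (last_day - first_day).days
    per_day = round(issues / days, 1)
    pct_duplicates = round(100 * duplicates / issues, 1)
    return per_day, pct_duplicates


# Mozilla Core row: 205,069 issues and 44,691 duplicates, 1997/03/28 ~ 2013/12/31
per_day, pct_duplicates = summarize(
    date(1997, 3, 28), date(2013, 12, 31), 205_069, 44_691
)
print(per_day, pct_duplicates)  # 33.5 21.8
```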

Train/test data splitting: We split each dataset in chronological order, using the earliest 80% of the data for training and the remaining 20% for testing.

| Project | Total (+/-) | Train (+/-) | Test (+/-) |
|---|---|---|---|
| Mozilla Core | 205,069 (54,237/150,832) | 164,055 (50,122/113,933) | 41,014 (4,115/36,899) |
| Firefox | 115,814 (34,262/81,552) | 92,651 (30,026/62,625) | 23,163 (4,236/18,927) |
| Thunderbird | 32,551 (11,631/20,920) | 26,040 (10,046/15,994) | 6,511 (1,585/4,926) |
| Eclipse Platform | 85,156 (19,845/65,311) | 68,124 (17,518/50,606) | 17,032 (2,327/14,705) |
| JDT | 45,296 (10,127/35,169) | 36,236 (8,859/27,377) | 9,060 (1,268/7,792) |
| Spark | 19,766 (2,813/16,953) | 15,812 (2,425/13,387) | 3,954 (388/3,566) |
| Hadoop | 10,624 (827/9,797) | 8,499 (656/7,843) | 2,125 (171/1,954) |
| MapReduce | 5,608 (880/4,728) | 4,486 (779/3,707) | 1,122 (101/1,021) |
| HDFS | 10,676 (1,530/9,146) | 8,540 (1,398/7,142) | 2,136 (132/2,004) |
| HBase | 16,594 (455/16,139) | 13,275 (384/12,891) | 3,319 (71/3,248) |
| Cassandra | 11,950 (1,261/10,689) | 9,560 (962/8,598) | 2,390 (299/2,091) |
| Mesos | 6,564 (615/5,949) | 5,251 (535/4,716) | 1,313 (80/1,233) |
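
A chronological 80/20 split like the one described above can be sketched as follows. This is a minimal illustration, not the script used to produce the table: the record layout (a dict with hypothetical `id` and `created` fields) and the function name `chronological_split` are assumptions. Truncating the cut point to an integer is also an assumption, although it is consistent with the training-set sizes listed in the table.

```python
from datetime import date


def chronological_split(reports, train_fraction=0.8):
    """Sort reports by creation date and cut once.

    The earliest `train_fraction` of the reports becomes the training set,
    the rest becomes the test set, so no test report predates a training one.
    """
    ordered = sorted(reports, key=lambda r: r["created"])
    cut = int(len(ordered) * train_fraction)  # truncate toward zero
    return ordered[:cut], ordered[cut:]


# Ten toy reports, deliberately listed newest first (field names are hypothetical).
reports = [{"id": i, "created": date(2013, 1, 10 - i)} for i in range(10)]

train, test = chronological_split(reports)
print(len(train), len(test))  # 8 2
```

Splitting by time rather than at random keeps later reports out of the training set, which matches how a duplicate detector would be used in practice: it only ever sees reports filed before the one it is judging.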

Links to more duplicate bug report datasets

Publications

2. Bug localization

Bug localization is the process of mapping a bug report to the corresponding buggy source file. This dataset contains bug reports, commit history, and API descriptions for six open-source Java projects: Eclipse Platform UI, SWT, JDT, AspectJ, Birt, and Tomcat. The dataset is currently available here.

| Project | Timespan | #Bugs mapped |
|---|---|---|
| AspectJ | 2002-03-13 ~ 2014-01-10 | 593 |
| Birt | 2005-06-14 ~ 2013-12-19 | 4,178 |
| Eclipse | 2001-10-10 ~ 2014-01-17 | 6,495 |
| JDT | 2001-10-10 ~ 2014-01-14 | 6,274 |
| SWT | 2002-02-19 ~ 2014-01-17 | 4,151 |
| Tomcat | 2002-07-06 ~ 2014-01-18 | 1,056 |

Publications

3. Bug triaging

Given a software bug report, bug triaging is the process of identifying an appropriate developer who could fix the bug. Automatic bug triaging can be formulated as a classification problem that takes the bug title and description as input and maps them to one of the available developers (the class labels). The dataset is currently available here.

| Project | #Bugs | #Bugs for classifier |
|---|---|---|
| Chromium | 383,104 | 118,643 |
| Mozilla Core | 314,388 | 128,215 |
| Firefox | 162,307 | 24,214 |
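
To make the classification formulation concrete, here is a minimal sketch that treats developers as class labels and the concatenated title and description as a bag of words. It uses a small multinomial naive Bayes classifier written with the standard library only. The classifier choice, the class name `NaiveBayesTriager`, and the toy reports and developer names are all assumptions for illustration; this is not the method or the data format of the dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTriager:
    """Multinomial naive Bayes over bag-of-words; developers are the labels."""

    def fit(self, reports):
        # reports: iterable of (title, description, developer) tuples
        self.word_counts = defaultdict(Counter)  # developer -> word frequencies
        self.doc_counts = Counter()              # developer -> number of reports
        self.vocab = set()
        for title, description, developer in reports:
            words = tokenize(title + " " + description)
            self.word_counts[developer].update(words)
            self.doc_counts[developer] += 1
            self.vocab.update(words)
        return self

    def predict(self, title, description):
        words = tokenize(title + " " + description)
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for developer, n_docs in self.doc_counts.items():
            counts = self.word_counts[developer]
            # Laplace (add-one) smoothing so unseen words do not zero the score
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best, best_score = developer, score
        return best


# Toy history of already-fixed bugs (invented for this example).
history = [
    ("Crash when opening PDF", "viewer segfaults on large pdf files", "alice"),
    ("PDF fonts render blurry", "text in pdf viewer looks wrong", "alice"),
    ("Sync fails over proxy", "bookmark sync times out behind proxy", "bob"),
    ("Sync duplicates bookmarks", "bookmark sync creates duplicate entries", "bob"),
]
triager = NaiveBayesTriager().fit(history)
print(triager.predict("PDF viewer crash", "crash on open"))  # alice
```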

Publications

4. Bug-fixing time estimation

The bug report datasets hosted in this repository include detailed bug-fixing time information and can therefore also be used for research on bug-fixing time estimation.

Publications

5. Bug information mining

Lamkanfi et al. [MSR'13] contributed a dataset of over 200,000 reported bugs extracted from the Eclipse and Mozilla projects. In addition to a single snapshot of each bug report, the dataset includes all the incremental modifications made during the lifetime of the report. The dataset is currently available here.

| Project | #Components | #Bugs |
|---|---|---|
| Eclipse Platform | 22 | 24,775 |
| JDT | 6 | 10,814 |
| CDT | 20 | 5,640 |
| GEF | 5 | 5,655 |
| Mozilla Core | 137 | 74,292 |
| Firefox | 47 | 69,879 |
| Thunderbird | 23 | 19,237 |
| Bugzilla | 21 | 4,616 |

Publications

  • [MSR'13] Ahmed Lamkanfi, Javier Perez, and Serge Demeyer. The Eclipse and Mozilla Defect Tracking Dataset: A Genuine Dataset for Mining Bug Information. International Working Conference on Mining Software Repositories (MSR), 2013.

License

The datasets are freely available for research purposes.

LogPAI Team, 2018.
