ZhangShurong / rebucket

Licence: other
ReBucket – A Method for Clustering Duplicate Crash Reports based on Call Stack Similarity

Programming Languages

C++
Python
C

rebucket

An implementation of the ReBucket algorithm for research purposes.

How To Use?

Usage: python test.py

Dataset

https://github.com/logpai/bugrepo

Implementation

todo

  • implement the ReBucket algorithm in C++
  • data structure

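Detailed description (translated from Chinese)

ReBucket algorithm implementation

If you are a fellow student from SUSTech (Southern University of Science and Technology) who happened to find this project, take a look at the issues.

For the algorithm itself, see the ReBucket paper. This document only covers project-specific matters.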

Project structure

rebucket
|
|---- dataset, the processed dataset
|
|---- rebucket, C++ implementation of ReBucket
|
|---- generate_dataset.py, script that generates the dataset
|
|---- test.py, test script
|
|---- rebucket.py, algorithm script

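Dataset processing

Why the dataset needs processing: not every record in the original bugrepo dataset contains a stack trace, so the stack information has to be extracted to generate a usable dataset. The script that generates the dataset is generate_dataset.py, and its output is written to the dataset directory.
The stack extraction algorithm is described in:

http://groups.csail.mit.edu/pag/pubs/bettenburg-msr-2008.pdf

Dataset format
Because the data volume is small, and for compatibility with other projects, the dataset is stored as JSON strings. The format is: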

{
    "stack_id": "stack ID",
    "duplicated_stack": "ID of the duplicated stack",
    "stack_arr": [stack contents, as an array]
}
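
As a rough illustration of the record format above, the following sketch parses one record and checks that the three documented fields are present. The sample values (the IDs and the frame names) are made up for the example, and the helper parse_record is hypothetical, not part of the project; how records are laid out inside the actual dataset files is not specified here.

```python
import json

# Hypothetical record in the documented format; IDs and frame names are invented.
raw = """
{
    "stack_id": "1001",
    "duplicated_stack": "1000",
    "stack_arr": ["frame_a", "frame_b", "frame_c"]
}
"""


def parse_record(text):
    """Parse one JSON record and verify the three documented fields."""
    record = json.loads(text)
    for key in ("stack_id", "duplicated_stack", "stack_arr"):
        if key not in record:
            raise KeyError(f"missing field: {key}")
    if not isinstance(record["stack_arr"], list):
        raise TypeError("stack_arr must be a list of frames")
    return record


record = parse_record(raw)
print(record["stack_id"], len(record["stack_arr"]))
```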

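Evaluating the algorithm

The original paper already provides detailed metrics, so this document only briefly describes how the number of classification errors is counted.
Suppose the correct classification is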

{[1,2,3],[4,5,6],[7,8]}

but, for whatever reason, the classification goes wrong and produces the following result:

{[1,2],[3],[4,5,6,7,8]}

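The number of missed reports in this case is 1: stacks 7 and 8 were both placed in the bucket with 4, 5, 6, which means one class of error is not reflected in the result. Put another way, in a production environment one class of error would go unreported.
Counting missed reports is simple: compare the classification result with the ground truth and find which classes were not given their own bucket. The relevant code is in the wrong function in rebucket.py.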
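
One way to sketch this count in Python is shown below. It is not the project's wrong function, whose exact logic is not reproduced here; it assumes that whenever a predicted cluster mixes stacks from several true classes, all but one of those classes count as missed. The name count_missed is hypothetical.

```python
def count_missed(truth, predicted):
    """Count true classes hidden by being merged into another cluster.

    Assumption: a predicted cluster that contains stacks from k distinct
    true classes hides k - 1 of them.
    """
    # Map each stack ID to the index of its ground-truth class.
    true_class_of = {}
    for index, bucket in enumerate(truth):
        for stack_id in bucket:
            true_class_of[stack_id] = index

    missed = 0
    for cluster in predicted:
        # Distinct ground-truth classes represented in this predicted cluster.
        covered = {true_class_of[stack_id] for stack_id in cluster}
        missed += max(len(covered) - 1, 0)
    return missed


truth = [[1, 2, 3], [4, 5, 6], [7, 8]]
predicted = [[1, 2], [3], [4, 5, 6, 7, 8]]
print(count_missed(truth, predicted))  # 1
```

Under this definition, splitting a class (as with [1,2] and [3]) adds no missed reports; only merges do, which matches the example above.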

How to run the C++ code?

Enter the rebucket directory, then run:

mkdir build
cd build
cmake ..
make

At this point the build directory contains the shared library and test.py. Run:

python test.py -d ../../dataset/Firefox/df_mozilla_firefox.json