vz-risk / Vcdb

Licence: other
VERIS Community Database

The VERIS Community Database

Information sharing is a complex and challenging undertaking. If done correctly, everyone involved benefits from the collective intelligence. If done poorly, it may mislead participants or create a learning opportunity for our adversaries. The Verizon RISK Team supports and participates in a variety of information sharing initiatives and research efforts. We continue to drive the publication of the Verizon Data Breach Investigations Report (DBIR) annually, where we have an unprecedented number of new data-sharing partners, and we are committed to keeping the report publicly available and free to download. We regularly receive inquiries about our dataset, and our ability to share further, but we are limited in what data we can share in raw format due to agreements with our partners and customers.

The Problem

While there are a handful of efforts to capture security incidents that are publicly disclosed, there is no unrestricted, comprehensive raw dataset available for download on security incidents that is sufficiently rich to support both community research and corporate decision-making. There are organizations that collect and, in some form, disseminate aggregated collections, but either these are not in a format that lends itself to the data manipulation and transformation required for research, or the underlying data are not freely and publicly available for use. This gap has long hampered researchers who are studying the problems surrounding security incidents, as well as the risk managers who are starved for reliable data upon which to base their risk calculations.

Getting Involved

If you want to get involved in this project, we have directions in the wiki for this repo. If you are new to GitHub, it is the book icon at the top of this page section.

WARNING ON SAMPLING

Most VCDB issues are chosen randomly (with a preference for those in the last year); however, we specifically select healthcare issues and some priority incidents. Incidents not chosen randomly can be identified by the value of 'plus.sub_source': it will be 'phidbr' for healthcare issues and 'priority' for priority issues. For those wishing to normalize out non-random selection, here is the issue composition as of Jan 13, 2018 to normalize the actual incidents to:

{
'2013': {'all': 1199, 'phidbr': 0, 'priority': 11},
'2014': {'all': 3885, 'phidbr': 30, 'priority': 113},
'2015': {'all': 1844, 'phidbr': 197, 'priority': 47},
'2016': {'all': 1996, 'phidbr': 516, 'priority': 75},
'2017': {'all': 1826, 'phidbr': 455, 'priority': 75},
'2018': {'all': 28, 'phidbr': 8, 'priority': 1}
}
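
As a rough illustration of how the composition above might be used, the Python sketch below computes the share of randomly selected issues per year and flags individual incidents as random or not. It is a sketch under assumptions, not the project's own method: it assumes that 'all' is the yearly total and already includes the 'phidbr' and 'priority' counts, and that an incident is a nested dict in which 'plus.sub_source' is stored at incident["plus"]["sub_source"]. The function names are hypothetical.

```python
# Issue composition as of Jan 13, 2018, copied from the listing above.
COMPOSITION = {
    "2013": {"all": 1199, "phidbr": 0, "priority": 11},
    "2014": {"all": 3885, "phidbr": 30, "priority": 113},
    "2015": {"all": 1844, "phidbr": 197, "priority": 47},
    "2016": {"all": 1996, "phidbr": 516, "priority": 75},
    "2017": {"all": 1826, "phidbr": 455, "priority": 75},
    "2018": {"all": 28, "phidbr": 8, "priority": 1},
}

# Values of 'plus.sub_source' that mark a non-random selection.
NON_RANDOM_SOURCES = ("phidbr", "priority")


def random_share(composition):
    """Return, per year, the fraction of issues that were chosen randomly.

    Assumption: 'all' is the total and already includes the 'phidbr'
    and 'priority' counts.
    """
    shares = {}
    for year, counts in composition.items():
        non_random = sum(counts[source] for source in NON_RANDOM_SOURCES)
        shares[year] = (counts["all"] - non_random) / counts["all"]
    return shares


def is_randomly_selected(incident):
    """Return True when an incident has no non-random 'plus.sub_source'.

    Assumption: incidents are nested dicts, so the field lives at
    incident["plus"]["sub_source"]; a missing field counts as random.
    """
    sub_source = incident.get("plus", {}).get("sub_source")
    return sub_source not in NON_RANDOM_SOURCES
```

One simple option is to keep only the incidents for which is_randomly_selected returns True; another is to reweight the non-random incidents by year using the shares. Which is appropriate depends on the analysis.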

VCDB Statistics

As of Jan 13, 2018

library(magrittr)  # provides the %>% pipe used below

# Assumes `vcdb` is already loaded as a verisr data frame.
# Count records per incident year, split by confirmed data disclosure.
vcdb %>%
    dplyr::group_by(attribute.confidentiality.data_disclosure.Yes) %>%
    dplyr::count(timeline.incident.year) %>%
    dplyr::ungroup() %>%
    dplyr::rename(breach = attribute.confidentiality.data_disclosure.Yes) %>%
    dplyr::mutate(breach = ifelse(breach, "Breach", "Incident")) %>%
    ggplot2::ggplot() +
    # Stacked bars: incident year on the x axis, record count on the y axis.
    ggplot2::geom_bar(ggplot2::aes(x=timeline.incident.year, y=n, group=breach, fill=breach), stat="identity") +
    ggplot2::labs(title="VCDB Breaches and Incidents by Incident Year", x="Year", y="Count") +
    ggplot2::scale_x_continuous(expand=c(0,0), limits=c(2003, 2018)) +
    ggplot2::scale_y_continuous(expand=c(0,0)) +
    ggplot2::scale_fill_brewer() +
    ggplot2::theme_minimal() +
    ggplot2::theme(
        panel.grid.major.x = ggplot2::element_blank(),
        panel.grid.minor.x = ggplot2::element_blank(),
        panel.grid.minor.y = ggplot2::element_blank()
    )

[Plot: VCDB breaches and incidents by incident year (chunk yearly)]

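# Tabulate action enumerations, broken down by asset variety.
vcdb %>%
    verisr::getenumCI("action", by="asset.variety") %>%
    dplyr::filter(!is.na(n)) %>%
    # Keep the text from character 15 on, which drops the 14-character "asset.variety." prefix.
    dplyr::mutate(by = stringr::str_sub(by, 15)) %>%
    ggplot2::ggplot() +
    # One tile per action/asset pair, shaded and labelled with the count x.
    ggplot2::geom_tile(ggplot2::aes(x=enum, y=by, fill=x)) +
    ggplot2::geom_text(ggplot2::aes(x=enum, y=by, label=x)) +
    ggplot2::scale_fill_gradient2() +
    ggplot2::theme_void() +
    ggplot2::theme(
        axis.text = ggplot2::element_text(),
        axis.text.x = ggplot2::element_text(hjust=1, angle=90)
    )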

[Plot: grid of action by asset variety (chunk a2grid)]

Index

  • vcdb_diff.json - An update to the verisc.json schema file to produce the schema file used for the vcdb
  • vcdb_diff-labels.json - An update to the verisc-labels.json labels file to produce the vcdb labels file
  • vcdb.json - The vcdb schema file
  • vcdb-labels.json - The vcdb labels file
  • vcdb-merged.json - The full schema, combining the schema file and enumerations from the labels file.
  • vcdb-enum.json - A JSON file containing just the enumerations from the schema.
  • vcdb-keynames-real.txt - A text file containing the keys in the vcdb schema.