YangzlTHU / IStego100K

Licence: other
IStego100K: Large-scale Image Steganalysis Dataset

Programming Languages

Python, MATLAB, CMake, C++, C, Makefile

IStego100K

IStego100K: Large-scale Image Steganalysis Dataset, mixed with various steganographic algorithms, embedding rates, and quality factors.

To promote the development of image steganalysis, we construct and release IStego100K, a large-scale, multivariable image steganalysis dataset. It contains 208,104 images, all of size 1024×1024. Of these, 200,000 images (100,000 cover-stego pairs) form the training set and the remaining 8,104 form the test set. Because we hope IStego100K will help researchers explore universal image steganalysis algorithms, we place as few constraints on the images as possible. For each image, the quality factor is randomly set in the range 75-95, the steganographic algorithm is randomly selected from three well-known algorithms (J-UNIWARD, nsF5 and UERD), and the embedding rate is randomly set to a value in the range 0.1-0.4. In addition, to account for the mismatch between training and test samples that can arise in real-world settings, we add a test set (DS-Test) whose samples come from a different source than the training set. We hope this test set helps evaluate the robustness of steganalysis algorithms. We tested several recent steganalysis algorithms on IStego100K; the results and analysis are given in the experimental part. We hope that IStego100K will further promote the development of universal image steganalysis technology.
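The per-image randomisation described above can be sketched as follows. This is only an illustration, not the generation script used for the dataset: the function name `sample_embedding_parameters` is hypothetical, and the discrete grids for the quality factor (75, 80, ..., 95) and the embedding rate (0.1, ..., 0.4) are an assumption based on the values that appear in the result tables; the text itself only gives the ranges.

```python
import random

# Assumed discrete grids; the description only states the ranges 75-95 and 0.1-0.4.
QUALITY_FACTORS = [75, 80, 85, 90, 95]
EMBEDDING_RATES = [0.1, 0.2, 0.3, 0.4]
ALGORITHMS = ["j-uniward", "nsf5", "uerd"]


def sample_embedding_parameters(rng):
    """Draw one random (quality, rate, algorithm) setting for a stego image."""
    return {
        "quality": rng.choice(QUALITY_FACTORS),
        "rate": rng.choice(EMBEDDING_RATES),
        "steg_algorithm": rng.choice(ALGORITHMS),
    }


rng = random.Random(0)  # fixed seed so the sketch is reproducible
params = sample_embedding_parameters(rng)
print(params)
```

Drawing each of the three variables independently is what makes the dataset "multivariable": a detector cannot rely on a single fixed quality factor, payload, or embedding scheme.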

If you use this dataset in your work, please consider citing it in the following format:

@inproceedings{yangzl2019IStego100K,
  title         =   {IStego100K: Large-scale Image Steganalysis Dataset},
  author        =   {Yang, Zhongliang and Wang, Ke and Ma, Sai and Huang, Yongfeng and Kang, Xiangui and Zhao, Xianfeng},
  booktitle     =   {International Workshop on Digital Watermarking},
  year          =   {2019},
  organization  =   {Springer}
}

The full PDF can be downloaded from arXiv.

Important!!!

Because the dataset is very large (almost 50 GB) and therefore difficult to download, we are considering sharing it by express delivery. For delivery methods and delivery addresses, please contact me by email: [email protected]

We look forward to working with you to promote the development of image steganography and steganalysis!

Download

Train Set

100,000 pairs of cover and stego images (200K in total). The original images were downloaded from Unsplash.

Same-Source Test Set

Marked as SS-Test in the paper. 8,104 images with cover/stego labels (not paired). The original images were downloaded from Unsplash.

Different-Source Test Set

Marked as DS-Test in the paper. 10,000 images with cover/stego labels (not paired). The original images were shot on different mobile devices.

Note: The paper reports 11,809 images, but we removed some low-quality images before uploading.

Alternate links

For those who cannot access Google in Mainland China, try this Baidu Cloud Disk link:

Detailed Parameters

We also provide detailed parameters for each image here.

The parameter files are organized as follows:

parameters = {
    "000001.jpg": {  # parameters for a stego file
        "quality": 95,  # quality factor
        "rate": 0.4,  # embedding rate (payload)
        "steg_algorithm": "nsf5",  # steganographic algorithm
    },
    "000002.jpg": {  # parameters for a cover file
        "quality": 90,  # quality factor
    },
}

Note: In the training set, cover and stego files come in pairs with the same quality factor, so we omit the parameter file for the cover files.
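As a minimal sketch of how such a parameter dictionary might be consumed, the snippet below counts images per steganographic algorithm. It assumes the parameters are already available as a Python dictionary in the layout shown above (the on-disk file format is not specified here), and the helper names `is_stego` and `count_by_algorithm` are hypothetical.

```python
from collections import Counter

# Example dictionary in the layout shown above (values are illustrative).
parameters = {
    "000001.jpg": {"quality": 95, "rate": 0.4, "steg_algorithm": "nsf5"},
    "000002.jpg": {"quality": 90},
}


def is_stego(entry):
    """An entry describes a stego file if it names a steganographic algorithm."""
    return "steg_algorithm" in entry


def count_by_algorithm(params):
    """Count stego images per algorithm; cover images are counted under 'cover'."""
    return Counter(
        entry["steg_algorithm"] if is_stego(entry) else "cover"
        for entry in params.values()
    )


counts = count_by_algorithm(parameters)
print(counts)
```

Because cover entries carry only a quality factor, the presence of the `steg_algorithm` key is enough to tell the two kinds of entry apart.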

Steganographic Algorithms

We use the following steganographic algorithms for our dataset:

  • nsF5: J. Fridrich, T. Pevný, and J. Kodovský, Statistically undetectable JPEG steganography: Dead ends, challenges, and opportunities. In J. Dittmann and J. Fridrich, editors, Proceedings of the 9th ACM Multimedia & Security Workshop, pages 3–14, Dallas, TX, September 20–21, 2007. [pdf]
  • J-UNIWARD: V. Holub, J. Fridrich, T. Denemark, Universal Distortion Function for Steganography in an Arbitrary Domain. EURASIP Journal on Information Security, (Section: SI: Revised Selected Papers of ACM IH and MMS 2013), 2014(1). [pdf]
  • UERD: L. Guo, J. Ni, W. Su, C. Tang, and Y.Q. Shi. Using statistical image model for JPEG steganography: uniform embedding revisited. IEEE Transactions on Information Forensics & Security, 10(12), 2669-2680, 2015. [pdf]
  • HILL_GINA: Y. Wang, W. Zhang, W. Li, X. Yu and N. Yu, Non-Additive Cost Functions for Color Image Steganography Based on Inter-Channel Correlations and Differences. IEEE Transactions on Information Forensics & Security. PP. 1-1. 10.1109/TIFS.2019.2956590. [pdf]

For more details, including code and tutorials, please refer to our Steganography page.

Steganalysis Algorithms

We apply the following steganalysis algorithms for dataset evaluation:

  • DCTR: V. Holub and J. Fridrich, Low Complexity Features for JPEG Steganalysis Using Undecimated DCT, IEEE Transactions on Information Forensics and Security, to appear. [code] [pdf]
  • GFR: X. Song, F. Liu, C. Yang, X. Luo and Y. Zhang, Steganalysis of Adaptive JPEG Steganography Using 2D Gabor Filters, Proceedings of the 3rd ACM Workshop on Information Hiding and Multimedia Security. ACM, 2015. [code] [pdf]
  • SRNet: M. Boroumand, M. Chen, and J. Fridrich. Deep Residual Network for Steganalysis of Digital Images, IEEE Transactions on Information Forensics and Security. PP. 1-1. 10.1109/TIFS.2018.2871749, 2018. [code] [pdf]
  • XuNet: G. Xu. Deep convolutional neural network to detect J-UNIWARD, Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security. ACM, 2017. [code] [pdf]

For more details, including code and tutorials, please refer to our Steganalysis page.

Overall Results

Dataset   Methods   Acc(%)   P(%)    R(%)    F1(%)
SS-Test   DCTR      71.34    79.72   57.23   66.63
SS-Test   GFR       66.26    69.58   57.97   63.25
SS-Test   SRNet     -        -       -       -
SS-Test   XuNet     -        -       -       -
DS-Test   DCTR      56.95    55.50   70.11   61.95
DS-Test   GFR       59.12    61.61   48.42   54.22
DS-Test   SRNet     -        -       -       -
DS-Test   XuNet     -        -       -       -

Note: We trained SRNet and XuNet on a single GPU (GTX 1080Ti) and found that they hardly converged on IStego100K.
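For reference, the four reported columns are the usual binary classification metrics. The sketch below shows how they relate; it assumes that stego images are the positive class (this is not stated in this section), and the function names are hypothetical. The last line checks that the reported F1 for DCTR on SS-Test is consistent with its reported precision and recall.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1, all in percent.

    Assumes at least one predicted positive and one actual positive,
    so that none of the denominators is zero.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return tuple(100 * value for value in (accuracy, precision, recall, f1))


def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check against the DCTR row for SS-Test: P = 79.72, R = 57.23.
print(round(f1_from_precision_recall(79.72, 57.23), 2))  # 66.63
```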

Results for Different Steganography Algorithms

Test Set   Steganalysis   Steganography   Acc(%)   P(%)    R(%)    F1(%)
SS-Test    DCTR           UERD            71.77    79.75   58.36   67.40
SS-Test    DCTR           nsF5            84.44    85.10   83.51   84.30
SS-Test    DCTR           J-UNIWARD       57.73    67.58   29.71   41.27
SS-Test    GFR            UERD            68.47    71.34   61.75   66.20
SS-Test    GFR            nsF5            71.61    72.72   69.18   70.91
SS-Test    GFR            J-UNIWARD       58.81    62.91   42.92   51.02
DS-Test    DCTR           UERD            53.96    53.35   63.06   57.80
DS-Test    DCTR           nsF5            62.28    60.56   87.59   71.61
DS-Test    DCTR           J-UNIWARD       51.67    51.43   59.83   55.31
DS-Test    GFR            UERD            56.05    58.40   42.09   48.92
DS-Test    GFR            nsF5            67.24    68.21   64.58   66.35
DS-Test    GFR            J-UNIWARD       54.59    56.62   39.26   46.37

Results for Different Payloads

Test Set   Steganalysis   Payload   Acc(%)   P(%)    R(%)    F1(%)
SS-Test    DCTR           0.1       58.55    67.84   32.51   43.96
SS-Test    DCTR           0.2       71.43    80.19   56.90   66.57
SS-Test    DCTR           0.3       76.30    82.22   67.11   73.90
SS-Test    DCTR           0.4       79.55    83.74   73.35   78.20
SS-Test    GFR            0.1       55.87    59.40   37.10   45.67
SS-Test    GFR            0.2       63.51    67.98   51.08   58.33
SS-Test    GFR            0.3       70.83    72.04   67.89   69.95
SS-Test    GFR            0.4       75.71    74.89   76.75   72.05
DS-Test    DCTR           0.1       52.86    52.42   61.90   56.77
DS-Test    DCTR           0.2       56.21    54.99   68.40   60.97
DS-Test    DCTR           0.3       58.56    56.53   74.11   64.13
DS-Test    DCTR           0.4       60.17    57.72   76.05   65.63
DS-Test    GFR            0.1       52.29    53.42   35.79   42.86
DS-Test    GFR            0.2       56.66    58.87   44.19   50.49
DS-Test    GFR            0.3       62.15    64.65   53.65   58.63
DS-Test    GFR            0.4       65.40    67.18   60.22   63.51

Results for Different Quality Factors on SS-Test

Steganalysis   QF   Acc(%)   P(%)    R(%)    F1(%)
DCTR           75   75.23    85.63   60.64   71.00
DCTR           80   71.50    86.48   61.56   71.82
DCTR           85   74.09    84.34   59.18   69.55
DCTR           90   69.04    76.09   55.54   64.21
DCTR           95   62.12    66.41   49.05   56.43
GFR            75   70.08    75.06   60.15   66.78
GFR            80   69.91    74.98   59.75   66.50
GFR            85   68.42    71.54   61.17   65.95
GFR            90   64.67    67.02   57.75   62.04
GFR            95   58.30    59.76   50.82   64.93

More Details

For more details, such as pre-processing, data distribution, and steganalysis baselines, please see the arXiv paper.
