0xsha / Sweetie Data

Licence: other

This repo contains Logstash data from various honeypots

Labels

data-science dataset samples logstash threat-intelligence malware-research honeypot threatintel

Sweetie data

This repo contains data from various honeypots, mostly gathered with the awesome T-Pot. Want to know what malicious actors are up to? Do you believe data is the only source of truth? Here you have it. Put on your Sherlock hat and find the crime. The repo covers three months of data, from December 2019 to February 2020.

Who can use this data

Security researchers
Malware analysts
Threat intelligence companies
Universities
Data scientists
Anyone else interested

Motivation

This research was a side project, mainly motivated by a wish to understand the current state of attacks in the wild. As an individual, though, I have minimal resources and time and can't afford to scale and maintain the setup, so I decided to take the servers down and share the data with the community. ♥

How to use it

Folder structure

Here is the list of honeypots and analyzers used during this experiment.

adbhoney
cowrie
dionaea
elasticpot
heralding
medpot
p0f
suricata
tanner

Each honeypot has a log folder. Most of the logs are JSON or SQLite. Some honeypots contain other data, such as sample files.
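
Since most logs are JSON or SQLite, a first look at a log folder could be sketched as follows. This is only a sketch: it assumes the JSON logs are newline-delimited (one event per line), and the function names and the example key are hypothetical, not part of this repo.

```python
import json
import sqlite3
from collections import Counter


def count_json_events(path, key):
    """Count the values of `key` across a newline-delimited JSON log.

    Assumption: the log stores one JSON object per line.
    """
    counts = Counter()
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                # Skip truncated or malformed lines instead of aborting.
                continue
            if isinstance(event, dict) and key in event:
                counts[str(event[key])] += 1
    return counts


def list_sqlite_tables(path):
    """Return the names of the tables stored in a SQLite log file."""
    connection = sqlite3.connect(path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

For a Suricata log, for instance, count_json_events(path, "event_type") gives a quick breakdown of the event types before any deeper analysis; listing the tables is a reasonable first step before writing queries against a SQLite log.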

Payloads

As mentioned, some honeypots also collect files, for example, adbhoney and cowrie. You can find file archives in the root directory of each honeypot.

File samples, as identified by the file utility:

380c4553681d76dca812fd679068ff42645363cf3aef11afe036252051725c7a.raw: ELF 32-bit MSB executable, Motorola m68k, 68020, version 1 (SYSV), statically linked, stripped
3c0ac166b8511744430f4869b744beeef873c9a3c857e8d6607262a8d156f796.raw: ELF 64-bit MSB executable, MIPS, MIPS64 version 1 (SYSV), statically linked, stripped
590dbe0f8c6977d808cdc66d6e46cb6579c0d42d520a74c8a27210d3b97d9930.raw: ELF 32-bit MSB executable, SPARC, version 1 (SYSV), statically linked, stripped
608ee011537005f368c9731f4c4dee6a247b620cde52908ed0678df28c617971.raw: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, BuildID[sha1]=ba88e16fed564b3e4d7aba0787c6fbab52471e50, stripped
615b1640e5ce651bfab71ee6be1244183ae244576a9eca3073dfe444eba072ad.raw: ELF 32-bit LSB executable, ARM, version 1 (ARM), statically linked, stripped
63946c28efa919809c03be75a3937c4be80589a9df79cd1be72037d493b70857.raw: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, BuildID[sha1]=0c9b76185c23d668c7b4f1bdba94dfb94a9bed7a, stripped
755286a4739343aa7f64227bcad34384df8d1602ac175b94a44068d51f237eb7.raw: ELF 32-bit LSB executable, MIPS, MIPS-I version 1 (SYSV), statically linked, stripped
76ae6d577ba96b1c3a1de8b21c32a9faf6040f7e78d98269e0469d896c29dc64.raw: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, BuildID[sha1]=0af1f8be964f83d69ec4163415260349fa6cede8, stripped
7a48c93c5cb63a09505a009260d1cca8203285e0c1c6ff5b0df9cbb470820865.raw: Java archive data (JAR)
7a656791b445fff02ac6e9dd1081cc265db935476a9ee71139cb6aef52102e2b.raw: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, BuildID[sha1]=53abe9912786eea2bd09f4af4d634454777556e5, stripped
9d8bf69ebedb94061469734f1486c0da01c1e566bf7be83ce3779aa1a0b54371.raw: ELF 32-bit LS

You can use the VirusTotal API to scan the samples in bulk.

Visualization

As with T-Pot, you can use the Elastic Stack and a Kibana dashboard.
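
A bulk lookup could start from something like the sketch below. It assumes the samples are stored as <sha256>.raw files, as in the listing above, and that you query the VirusTotal API v3 file report endpoint with your own API key in the x-apikey header. The helper names are hypothetical, and the request is only built here, not sent.

```python
import hashlib
import re
import urllib.request
from pathlib import Path

# VirusTotal API v3 file report endpoint; the API key is your own.
VT_FILE_REPORT_URL = "https://www.virustotal.com/api/v3/files/{sha256}"
SHA256_HEX = re.compile(r"^[0-9a-f]{64}$")


def sha256_of(path):
    """Hash a file in chunks so large samples do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sample_hashes(folder):
    """Yield (path, sha256) for every *.raw sample in `folder`."""
    for path in sorted(Path(folder).glob("*.raw")):
        yield path, sha256_of(path)


def build_report_request(sha256, api_key):
    """Build (but do not send) a file report request for one hash."""
    if not SHA256_HEX.match(sha256):
        raise ValueError("expected a lowercase hex SHA-256 digest")
    return urllib.request.Request(
        VT_FILE_REPORT_URL.format(sha256=sha256),
        headers={"x-apikey": api_key},
    )
```

Each request can then be sent with urllib.request.urlopen, pausing between calls to respect whatever rate limit applies to your key. Comparing the computed hash with the file name is also a cheap integrity check on the archive.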

Kibana lets you visualize your Elasticsearch data and navigate the Elastic Stack so you can do anything from tracking query load to understanding the way requests flow through your apps.

It will do a fantastic job of making sense of this data, but the data is also very detailed, so for the best results you may have to roll your own analyzer.

Extra miles

For example, here is a script I wrote to extract possible web application exploits from the Suricata logs. It uses pandas to read the large JSON file, filters the data frame down to the entries that contain an HTTP record, and then keeps the records whose URL is anything other than the root path.

# (C) 2020 0xSha <[email protected]>
#
# $Id: suricata_http_path_filter.py Sun Feb 23 20:51:13 +07 2020 0xSha $
#

import json

import pandas as pd

# eve.json holds one JSON event per line, so read it as JSON Lines.
df = pd.read_json('/data/suricata/log/eve.json', lines=True)

# Keep only the events that carry an "http" record.
http_records = df['http'].dropna()

# Collect the HTTP records whose URL is anything other than the bare root.
results = [
	record for record in http_records
	if "url" in record and record["url"] != "/"
]

# Deduplicate by serializing each record with sorted keys.
unique_results = sorted({json.dumps(record, sort_keys=True) for record in results})

# Write one JSON object per line so the output can be parsed line by line.
with open("/suricata_http_paths.json", "w") as suricata_out_http:
	for line in unique_results:
		suricata_out_http.write(line + "\n")

The output is a cleaned JSON file. Here is an example of an interesting line.

{"hostname": "http_content_type": "text/html", "http_method": "GET", "http_port": 80, "http_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36", "length": 3348, "protocol": "HTTP/1.1", "status": 404, "url": "/index.php?s=/Index/\\think\\app/invokefunction&function=call_user_func_array&vars[0]=md5&vars[1][]=HelloThinkPHP"}

As we can see, we successfully extracted an exploit attempt against ThinkPHP.
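
To pick such lines out of the cleaned output automatically, a rough filter could look for the markers visible in the example URL. The two markers are taken from that URL; the function names and the choice to URL-decode first are assumptions, and this is a heuristic sketch rather than a real detection signature.

```python
from urllib.parse import unquote

# Substrings taken from the example URL above; extend as needed.
THINKPHP_MARKERS = ("invokefunction", "call_user_func_array")


def looks_like_thinkphp_probe(url):
    """Return True when a URL contains every marker, ignoring case."""
    decoded = unquote(url).lower()
    return all(marker in decoded for marker in THINKPHP_MARKERS)


def filter_probes(records):
    """Keep the HTTP records whose "url" field matches the heuristic."""
    return [
        record for record in records
        if looks_like_thinkphp_probe(str(record.get("url", "")))
    ]
```

Calling filter_probes on the parsed lines of the output file narrows it down to requests shaped like the one above; the same pattern works for other exploit families once you know their markers.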

Findings

There is too much data, and there are endless possibilities for extraction and analysis, but here are a few things that come to mind when I try to draw conclusions.

The number of malicious packets transferred per day is unbelievable.
Fortunately, a big chunk of malicious actors are script kiddies, but somehow they still score in 2020.
The very earliest kinds of computer attacks, such as brute forcing, are still a thing in 2020 when it comes to protocols like VNC and SQL Server.
Mixing security, machine learning, and data science can bring "real" next-generation defense results.

How to contribute

Open a pull request and share your Logstash data.
Share it with whoever you believe can use it.
Do the extra work and share your findings with the community ♥

Any ideas?

me [at] 0xsha.io
