0xsha / Sweetie Data

Licence: other
This repo contains a stash of logs from various honeypots

Sweetie data

This repo contains data from various honeypots, mostly gathered with the awesome T-Pot. Want to know what malicious actors are up to? Do you believe data is the only source of truth? Here you have it. Put on your Sherlock hat and find the crime. The repo covers three months of data, from December 2019 to February 2020.

Who can use this data

  • Security researchers
  • Malware analysts
  • Threat intelligence companies
  • Universities
  • Data scientists
  • Anyone else interested

Motivation

This research was a side project, mainly motivated by a wish to understand the current state of attacks in the wild. As an individual, however, I have minimal resources and time and can't afford to scale and maintain it, so I decided to take the servers down and share the data with the community. ♥

How to use it

Folder structure

Here is the list of honeypots and analyzers used during this experiment.

  • adbhoney
  • cowrie
  • dionaea
  • elasticpot
  • heralding
  • medpot
  • p0f
  • suricata
  • tanner

Each honeypot has a log folder. Most of the logs are JSON or SQLite. Some honeypots contain other data, such as sample files.
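Since the log folders mix JSON and SQLite files, a first pass can simply sort files by format. The sketch below is one way to do that, not part of the repo: the function name sniff_log_format is hypothetical, and it assumes the JSON logs are newline-delimited (one object per line). It relies on the fixed 16-byte header that every SQLite 3 database file starts with.

```python
import json

# Every SQLite 3 database file begins with this 16-byte header.
SQLITE_MAGIC = b"SQLite format 3\x00"


def sniff_log_format(path):
    """Return "sqlite", "json", or "other" for a single log file.

    Hypothetical helper: "json" means the first line parses as JSON,
    which assumes newline-delimited JSON logs.
    """
    with open(path, "rb") as handle:
        head = handle.read(len(SQLITE_MAGIC))
        if head == SQLITE_MAGIC:
            return "sqlite"
        handle.seek(0)
        first_line = handle.readline()
    try:
        json.loads(first_line.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return "other"
    return "json"
```

Files reported as "sqlite" can then be opened with the standard sqlite3 module, and "json" files can be read line by line.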

Payloads

As mentioned, some honeypots also collect files, for example, adbhoney and cowrie. You can find file archives in the root directory of each honeypot.

file samples:

380c4553681d76dca812fd679068ff42645363cf3aef11afe036252051725c7a.raw: ELF 32-bit MSB executable, Motorola m68k, 68020, version 1 (SYSV), statically linked, stripped
3c0ac166b8511744430f4869b744beeef873c9a3c857e8d6607262a8d156f796.raw: ELF 64-bit MSB executable, MIPS, MIPS64 version 1 (SYSV), statically linked, stripped
590dbe0f8c6977d808cdc66d6e46cb6579c0d42d520a74c8a27210d3b97d9930.raw: ELF 32-bit MSB executable, SPARC, version 1 (SYSV), statically linked, stripped
608ee011537005f368c9731f4c4dee6a247b620cde52908ed0678df28c617971.raw: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, BuildID[sha1]=ba88e16fed564b3e4d7aba0787c6fbab52471e50, stripped
615b1640e5ce651bfab71ee6be1244183ae244576a9eca3073dfe444eba072ad.raw: ELF 32-bit LSB executable, ARM, version 1 (ARM), statically linked, stripped
63946c28efa919809c03be75a3937c4be80589a9df79cd1be72037d493b70857.raw: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, BuildID[sha1]=0c9b76185c23d668c7b4f1bdba94dfb94a9bed7a, stripped
755286a4739343aa7f64227bcad34384df8d1602ac175b94a44068d51f237eb7.raw: ELF 32-bit LSB executable, MIPS, MIPS-I version 1 (SYSV), statically linked, stripped
76ae6d577ba96b1c3a1de8b21c32a9faf6040f7e78d98269e0469d896c29dc64.raw: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, BuildID[sha1]=0af1f8be964f83d69ec4163415260349fa6cede8, stripped
7a48c93c5cb63a09505a009260d1cca8203285e0c1c6ff5b0df9cbb470820865.raw: Java archive data (JAR)
7a656791b445fff02ac6e9dd1081cc265db935476a9ee71139cb6aef52102e2b.raw: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, BuildID[sha1]=53abe9912786eea2bd09f4af4d634454777556e5, stripped
9d8bf69ebedb94061469734f1486c0da01c1e566bf7be83ce3779aa1a0b54371.raw: ELF 32-bit LS

You can use the VirusTotal API to scan the samples in bulk.
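As a rough sketch of a bulk lookup, the code below hashes each sample and prepares one VirusTotal API v3 file-report request per hash. The helper names (sha256_of, build_vt_request, requests_for_folder) are hypothetical, you need your own API key, and nothing is sent over the network until you pass a request to urllib.request.urlopen. Rate limits depend on your VirusTotal plan, so throttle accordingly.

```python
import hashlib
import urllib.request
from pathlib import Path

# VirusTotal API v3 endpoint for a file report, looked up by hash.
VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{}"


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file without loading it whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_vt_request(sha256, api_key):
    """Prepare (but do not send) a file-report request for one hash."""
    return urllib.request.Request(
        VT_FILE_URL.format(sha256),
        headers={"x-apikey": api_key},
    )


def requests_for_folder(folder, api_key):
    """Yield one prepared request per regular file in the folder."""
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            yield build_vt_request(sha256_of(path), api_key)
```

Each prepared request can be sent with urllib.request.urlopen(request); the response body is JSON.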

Visualization

As with T-Pot, you can use the Elastic Stack and a Kibana dashboard.

Kibana lets you visualize your Elasticsearch data and navigate the Elastic Stack so you can do anything from tracking query load to understanding the way requests flow through your apps.

It does a fantastic job of making sense of this data. At the same time, the data is very detailed, so for the best results you may have to roll your own analyzer.
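A home-grown analyzer can start very small. The sketch below counts how often each value of a given field appears in a newline-delimited JSON log. The function name count_field is hypothetical, and the field you pass in is up to you; Suricata's EVE log, for instance, has an event_type field on each event.

```python
import json
from collections import Counter


def count_field(lines, field):
    """Count the values of `field` across newline-delimited JSON events."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if isinstance(event, dict) and field in event:
            counts[str(event[field])] += 1
    return counts
```

Because it accepts any iterable of lines, you can pass an open file handle directly, for example count_field(open(path), "event_type").most_common(10).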

Extra miles

For example, here is a script I wrote to extract possible web application exploits from Suricata logs. It uses pandas to read the large JSON file, keeps only the events that contain an http entry, and then keeps the entries whose url is something other than "/".

# (C) 2020 0xSha <[email protected]>
#
# $Id: suricata_http_path_filter.py Sun Feb 23 20:51:13 +07 2020 0xSha $
#

import json

import pandas as pd

# Suricata's EVE log is newline-delimited JSON, hence lines=True.
df = pd.read_json('/data/suricata/log/eve.json', lines=True)

# Keep only the events that carry an "http" object.
http_events = df['http'].dropna()

# Keep HTTP records whose URL is present and is not the bare root path.
results = [h for h in http_events if "url" in h and h["url"] != "/"]

# Deduplicate by serializing each record with sorted keys.
unique_results = sorted({json.dumps(h, sort_keys=True) for h in results})

# Write one JSON object per line.
with open("/suricata_http_paths.json", "w") as suricata_out_http:
    for line in unique_results:
        suricata_out_http.write(line + "\n")

The output is a cleaned JSON file. Here is an example of an interesting line.

{"hostname": "http_content_type": "text/html", "http_method": "GET", "http_port": 80, "http_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36", "length": 3348, "protocol": "HTTP/1.1", "status": 404, "url": "/index.php?s=/Index/\\think\\app/invokefunction&function=call_user_func_array&vars[0]=md5&vars[1][]=HelloThinkPHP"}

As we can see, we successfully extracted an exploit attempt against ThinkPHP.
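To find similar requests across the whole output, a simple substring check on the url field can go a long way. The sketch below is only a heuristic: the function name looks_like_thinkphp_probe is hypothetical, and the two markers are taken from the example line above, so other variants of the exploit may not match.

```python
from urllib.parse import unquote

# Markers taken from the example request above; an assumption, not a
# complete signature for every ThinkPHP exploit variant.
THINKPHP_MARKERS = ("invokefunction", "call_user_func_array")


def looks_like_thinkphp_probe(url):
    """Return True if the URL contains all of the markers (case-insensitive)."""
    decoded = unquote(url).lower()
    return all(marker in decoded for marker in THINKPHP_MARKERS)
```

Applied to each record's url, this separates ThinkPHP probes from the rest so they can be counted or grouped by source.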

Findings

There is too much data here, with endless possibilities for extraction and analysis, but here are a few things that come to mind when I try to draw conclusions.

  • The number of malicious packets transferred each day is unbelievable.
  • Fortunately, a big chunk of malicious actors are script kiddies, but somehow they still score in 2020.
  • Some of the oldest attack techniques, such as brute forcing, are still a thing in 2020 for protocols like VNC and SQL Server.
  • Mixing security, machine learning, and data science can bring "real" next-generation defense results.

How to contribute

  • Open a pull request and share your own log stash.
  • Share it with whomever you believe can use it.
  • Do the extra work and share your findings with the community ♥
  • ko-fi

Any ideas?

  • me [at] 0xsha.io