CERT-Polska / Hfinger

Licence: gpl-3.0
Hfinger - fingerprinting HTTP requests


Hfinger is a tool for fingerprinting HTTP requests of malware. It is based on Tshark and written in Python 3. It is currently at the working prototype stage :-)

Its main objective is to represent malware requests in a form that is shorter than the whole request but still human-interpretable. This representation should be unique between malware families, which means that any given fingerprint should be seen for only one family.

An academic paper accompanies the work on this tool and describes, among other things, the motivation behind the design choices. It will be published here after the peer-review process.

The idea

The basic assumption of this project is that HTTP requests of different malware families are more or less unique, so they can be fingerprinted to provide some sort of identification. Hfinger retains information about the structure and values of some headers to provide means for further analysis, for example grouping of similar requests. At this moment, this is still a work in progress.

After analysis of malware's HTTP requests and headers, some parts of requests were identified as being most distinctive. These include:

  • Request method
  • Protocol version
  • Header order
  • Popular headers' values
  • Payload length, entropy and presence of non-ASCII characters

Additionally, some standard features of the request URL were considered. All these parts were translated into a set of features, described in detail here.

The above features are translated into a variable-length representation, which is the actual fingerprint. Depending on the report mode, different features are used to fingerprint requests. More information on these modes is presented below. The feature selection process will be described in the upcoming academic paper.

Installation

At this moment hfinger is distributed only via this repository. Tshark must be installed before hfinger. Hfinger was tested on Xubuntu 20.04 LTS with the tshark package in version 3.2.3.

Please note that, as with any PoC, you should run it in a Python virtual environment. Its setup is not covered here, but you can try this tutorial.

Hfinger installation:

  1. Download repository.
  2. Unpack it to a chosen location.
  3. In a terminal, change directory to the main directory of the unpacked repo.
  4. Activate the virtual environment.
  5. Run python3 setup.py install
  6. Hfinger should be installed and ready to use.

Usage

Calling the tool from a command line:

usage: hfinger.py [-h] (-f FILE | -d DIR) [-o output_path] [-m {0,1,2}]

Hfinger - fingerprinting HTTP requests stored in pcap files

optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  Read single pcap file
  -d DIR, --directory DIR
                        Read pcap files from directory DIR
  -o output_path, --output-path output_path
                        Path to the output directory
  -m {0,1,2}, --mode {0,1,2}
                        Fingerprint report mode. 0 - optimal (default), 1 -
                        informative, 2 - all features

You must provide a path to a pcap file (-f) or to a directory with pcap files (-d). The output is in JSON format. It is printed to standard output or saved to the provided directory (-o) under the name of the source file. For example, the output of the command:

python3 hfinger.py -f example.pcap -o /tmp/pcap

will be saved to:

/tmp/pcap/example.pcap.json

Any issues encountered, for example an unknown header, are printed to standard error, so please monitor it.
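
The invocation and output naming described above can be sketched in Python. This is only an illustration built from the documented options (-f, -o, -m); the helper names build_hfinger_command and expected_report_path are hypothetical and are not part of hfinger.

```python
from pathlib import Path


def build_hfinger_command(pcap_path, output_dir=None, mode=0):
    """Build an argv list for hfinger.py from the documented options."""
    if mode not in (0, 1, 2):
        raise ValueError("mode must be 0 (optimal), 1 (informative) or 2 (all features)")
    command = ["python3", "hfinger.py", "-f", str(pcap_path), "-m", str(mode)]
    if output_dir is not None:
        command += ["-o", str(output_dir)]
    return command


def expected_report_path(pcap_path, output_dir):
    """Return where the JSON report should land: <output_dir>/<pcap file name>.json."""
    return Path(output_dir) / (Path(pcap_path).name + ".json")


command = build_hfinger_command("example.pcap", output_dir="/tmp/pcap")
report_path = expected_report_path("example.pcap", "/tmp/pcap")
print(" ".join(command))
print(report_path.as_posix())
```

Once hfinger and Tshark are installed, the list can be passed to subprocess.run(command, check=True), and standard error can be captured there to watch for reported issues.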

Fingerprint creation

An example of a POST request is presented below.

POST /dir1/dir2?var1=val1 HTTP/1.1
Host: 127.0.0.1:8000
Accept: */*
User-Agent: My UA
Content-Length: 9
Content-Type: application/x-www-form-urlencoded

misc=test

The fingerprint created by hfinger in the default report mode for this request is presented below. Particular features of the fingerprint are separated using |. They are described below in the order of appearance in the fingerprint.

[Image: example fingerprint]

Firstly URL features are extracted:

  • URL length represented as a logarithm base 10 of the length,
  • extension of the requested file, but only if it is on a list of known extensions in hfinger/configs/extensions.txt (in the example it is empty as the request does not contain it),
  • number of variables in the URL (in the example there is only one variable, var1).
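
A minimal sketch of these three URL features, applied to the example request, might look as follows. It is not hfinger's own code: the function name url_features and the sample extension set are hypothetical (the real list is stored in hfinger/configs/extensions.txt), and the logarithm is returned unrounded because the rounding used in the default mode is not stated here.

```python
import math
from urllib.parse import parse_qsl, urlsplit

# Hypothetical sample; the real list lives in hfinger/configs/extensions.txt.
KNOWN_EXTENSIONS = {"php", "html", "js"}


def url_features(url, known_extensions=KNOWN_EXTENSIONS):
    """Return (log10 of URL length, known extension or "", number of variables).

    Assumes a non-empty request URL (log10 of 0 is undefined).
    """
    parts = urlsplit(url)
    last_segment = parts.path.rsplit("/", 1)[-1]
    extension = last_segment.rsplit(".", 1)[-1].lower() if "." in last_segment else ""
    if extension not in known_extensions:
        extension = ""  # unknown or missing extensions leave the field empty
    variables = parse_qsl(parts.query, keep_blank_values=True)
    return math.log10(len(url)), extension, len(variables)


log_length, extension, variable_count = url_features("/dir1/dir2?var1=val1")
print(round(log_length, 3), repr(extension), variable_count)
```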

Secondly header structure features are analyzed:

  • request method encoded as first two letters of the method (PO),
  • protocol version encoded as an integer (1 for version 1.1, 0 for version 1.0, and 9 for version 0.9),
  • popular headers and their values.
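
The method and protocol version encodings can be illustrated with a short sketch. The function name encode_request_line is hypothetical, and treating a request line without an HTTP/x.y token as version 0.9 is an assumption made for this example, not something stated in this README.

```python
# Version codes as described above: 1 for 1.1, 0 for 1.0, 9 for 0.9.
PROTOCOL_VERSION_CODES = {"1.1": 1, "1.0": 0, "0.9": 9}


def encode_request_line(request_line):
    """Return (method code, protocol version code) for an HTTP request line."""
    parts = request_line.strip().split(" ")
    method = parts[0]
    if parts[-1].startswith("HTTP/"):
        version = parts[-1].split("/", 1)[1]  # "HTTP/1.1" -> "1.1"
    else:
        version = "0.9"  # assumption: no protocol token means an HTTP/0.9 request
    return method[:2], PROTOCOL_VERSION_CODES[version]


print(encode_request_line("POST /dir1/dir2?var1=val1 HTTP/1.1"))
```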

When analyzing popular headers, hfinger checks whether each of them appears in the request. These headers are:

  • Connection
  • Accept-Encoding
  • Content-Encoding
  • Cache-Control
  • TE
  • Accept-Charset
  • Content-Type
  • Accept
  • Accept-Language
  • User-Agent

When a header is found in the request, its value is checked against a table of typical values to create pairs of header_name_representation:value_representation. The name of the header is encoded according to hfinger/configs/headerslow.json and the value is encoded according to tables stored in the hfinger/configs directory. In the above example Accept is encoded as ac and its value */* as as-as (asterisk-asterisk), giving ac:as-as. The pairs are inserted into the fingerprint in order of appearance in the request and are delimited using /. If the header value cannot be found in the encoding table, it is hashed using the FNV1a hash.
If the value is composed of multiple values, they are tokenized to provide a list of values delimited with ,, for example Accept: */*, text/* would give ac:as-as,te-as. However, at this point of development, if the header value contains a "quality value" tag (q=), then the whole value is hashed with FNV. Finally, values of the User-Agent and Accept-Language headers are directly hashed using FNV.
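
The pair encoding can be sketched as below. This is a hedged illustration rather than hfinger's implementation: the tables are hypothetical excerpts (only ac, as-as and te-as come from the text above, us-ag is invented for the example), and the 32-bit FNV-1a variant, the hexadecimal hash format, the lower-case table lookup and the per-token hashing of unknown values are all assumptions.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193

# Hypothetical excerpts; hfinger keeps the real tables in hfinger/configs.
HEADER_NAME_CODES = {"accept": "ac", "user-agent": "us-ag"}  # "us-ag" is invented here
HEADER_VALUE_CODES = {"accept": {"*/*": "as-as", "text/*": "te-as"}}
DIRECTLY_HASHED = {"user-agent", "accept-language"}


def fnv1a_32(data):
    """32-bit FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    value = FNV32_OFFSET_BASIS
    for byte in data:
        value = ((value ^ byte) * FNV32_PRIME) & 0xFFFFFFFF
    return value


def hash_text(text):
    """Hex form of the hash (the output format is an assumption)."""
    return format(fnv1a_32(text.encode("utf-8")), "08x")


def encode_header_pair(name, value):
    """Encode one header as name_code:value_code(s)."""
    key = name.lower()
    name_code = HEADER_NAME_CODES[key]
    # Directly hashed headers and values with a quality tag are hashed whole.
    if key in DIRECTLY_HASHED or "q=" in value:
        return f"{name_code}:{hash_text(value)}"
    table = HEADER_VALUE_CODES.get(key, {})
    tokens = [token.strip() for token in value.split(",")]
    # Assumption: a token missing from the table is hashed on its own.
    encoded = [table.get(token, hash_text(token)) for token in tokens]
    return f"{name_code}:{','.join(encoded)}"


print(encode_header_pair("Accept", "*/*"))
print(encode_header_pair("Accept", "*/*, text/*"))
```

Joining the resulting pairs with / in request order would then give the header part of the fingerprint.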

Some of the hfinger report modes provide a list of headers in order of appearance in the request. The list is created using a method similar to the one described above. The header names are encoded using hfinger/configs/headerslow.json and separated with ,. If the header name (or any of its parts, for compound headers such as Accept-Encoding) does not start with an upper case letter, the encoded representation is prefixed with !. If the header name is not on the list of known headers, it is hashed using FNV.
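
A rough sketch of the header order encoding is shown below. The name table is a hypothetical excerpt (only ac comes from this README), the 32-bit FNV-1a variant and hex output are assumptions, and so is the choice not to add the ! prefix to hashed, unknown header names.

```python
# Hypothetical excerpt of hfinger/configs/headerslow.json; only "ac" comes from the README.
HEADER_NAME_CODES = {"host": "ho", "accept": "ac", "user-agent": "us-ag"}


def fnv1a_32(data):
    """32-bit FNV-1a hash (the bit width is an assumption)."""
    value = 0x811C9DC5
    for byte in data:
        value = ((value ^ byte) * 0x01000193) & 0xFFFFFFFF
    return value


def every_part_capitalised(name):
    """True when each hyphen-separated part starts with an upper case letter."""
    return all(part[:1].isupper() for part in name.split("-"))


def encode_header_order(names):
    """Encode header names in request order, separated with commas."""
    encoded = []
    for name in names:
        code = HEADER_NAME_CODES.get(name.lower())
        if code is None:
            # Unknown header: hash the name as it appeared in the request.
            code = format(fnv1a_32(name.encode("utf-8")), "08x")
        elif not every_part_capitalised(name):
            code = "!" + code
        encoded.append(code)
    return ",".join(encoded)


print(encode_header_order(["Host", "accept", "User-agent"]))
```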

Finally, in the payload features, length of the payload is represented as a base 10 logarithm of the actual payload length rounded to an integer.
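
The payload length feature, together with the entropy and non-ASCII features used by the other report modes, can be sketched as follows. The function names are hypothetical. Using Shannon entropy in bits per byte and returning None for an empty payload are assumptions, since this README does not specify either.

```python
import math
from collections import Counter


def payload_length_feature(payload):
    """Base 10 logarithm of the payload length, rounded to an integer."""
    if not payload:
        return None  # assumption: how an empty payload is reported is not specified
    return round(math.log10(len(payload)))


def payload_entropy(payload):
    """Shannon entropy in bits per byte (the entropy definition is an assumption)."""
    if not payload:
        return 0.0
    total = len(payload)
    counts = Counter(payload)
    return sum((count / total) * math.log2(total / count) for count in counts.values())


def non_ascii_flag(payload):
    """"N" when non-ASCII bytes are present, "A" when they are not."""
    return "N" if any(byte > 127 for byte in payload) else "A"


payload = b"misc=test"
print(payload_length_feature(payload), round(payload_entropy(payload)), non_ascii_flag(payload))
```

For the example payload misc=test the length is 9 bytes, so the length feature is log10(9) ≈ 0.95 rounded to 1.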

Please note that the above description of fingerprint creation covers the default feature set. Hfinger is equipped with other feature sets, which can be chosen depending on the required amount of information. They are available via the report mode switch (-m).

Report modes

Hfinger operates in three fingerprint report modes, which differ in information extracted from requests:

  • optimal,
  • informative,
  • all features.

The modes were chosen to balance hfinger's capability to uniquely identify malware families against its capability to generalize information about the requests. Description of features is provided here. The all features mode provides the most unique fingerprints, but it produces a bigger number of fingerprints than the other two modes. The optimal mode provides slightly less unique fingerprints, but also significantly reduces the number of fingerprints. The informative mode is similar to optimal regarding uniqueness, but it produces more fingerprints in exchange for giving more information about the URL, headers and payload.

The modes consist of the following feature sets:

  • optimal (the default - option 0):
    • URL length represented as a base 10 logarithm of the actual length,
    • extension of the requested file,
    • number of variables in the URL,
    • request method,
    • protocol version,
    • popular headers and their values,
    • payload length represented as a base 10 logarithm of the actual length rounded to integer,
  • informative (option 1):
    • URL length represented as a base 10 logarithm of the actual length,
    • number of directories in the URL,
    • extension of the requested file,
    • number of variables in the URL,
    • request method,
    • protocol version,
    • order of headers,
    • popular headers and their values,
    • payload length represented as a base 10 logarithm of the actual length rounded to integer,
    • and payload entropy represented as an integer,
  • all features (option 2):
    • URL length represented as a base 10 logarithm of the actual length,
    • number of directories in the URL,
    • average length of directory in the URL, represented as a base 10 logarithm of actual average length,
    • extension of the requested file,
    • length of the variable part of the URL, represented as a base 10 logarithm of the length and rounded to an integer,
    • number of variables in the URL,
    • average value length, represented as base 10 logarithm of the actual average value length rounded to an integer,
    • request method,
    • protocol version,
    • order of headers,
    • popular headers and their values,
    • presence of non-ASCII characters (with "N" when such characters are present and "A" when they are not),
    • payload length represented as a base 10 logarithm of the actual length,
    • payload entropy.
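
To compare the modes at a glance, the feature sets listed above can be written down as plain data. This is only an illustrative sketch: the short feature labels and the helper added_features are made up for this example and do not correspond to identifiers in hfinger, and the labels ignore the rounding differences between modes.

```python
# Report mode numbers as accepted by the -m switch.
REPORT_MODES = {0: "optimal", 1: "informative", 2: "all features"}

# Short labels invented for this sketch, in the order used in the lists above.
FEATURES = {
    "optimal": [
        "URL length", "extension", "variable count", "request method",
        "protocol version", "popular headers", "payload length",
    ],
    "informative": [
        "URL length", "directory count", "extension", "variable count",
        "request method", "protocol version", "header order",
        "popular headers", "payload length", "payload entropy",
    ],
    "all features": [
        "URL length", "directory count", "average directory length",
        "extension", "variable part length", "variable count",
        "average value length", "request method", "protocol version",
        "header order", "popular headers", "non-ASCII presence",
        "payload length", "payload entropy",
    ],
}


def added_features(base_mode, extended_mode):
    """Features present in extended_mode but missing from base_mode."""
    base = set(FEATURES[base_mode])
    return [name for name in FEATURES[extended_mode] if name not in base]


print(added_features("optimal", "informative"))
print(added_features("informative", "all features"))
```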

Co-financed by the Connecting Europe Facility of the European Union
