s0md3v / Entropy

Entropy is a (prototype) WAF driven by maths.

It's just a prototype of a WAF core which makes use of mathematical algorithms to determine if the input is malicious.
Play with it by entering payloads (preferably XSS for now) and let me know about your experience.

Detection Methods

  • Entropy
  • Shannon Entropy
  • Levenshtein Distance
  • Special Character Ratio
  • Some regex (I don't think it's necessary but still...)

How does it work?

Entropy gets its name from the scientific term "entropy".

Entropy is basically a measure of how random something is.

But how does that apply to detecting malicious payloads? Take a look at these two strings and their entropy:

String: black pens & red caps
Entropy: 0.000302964443769

String: <svg onload=alert()>
Entropy: 53.4044125463

Does it make sense now?
Let me introduce you to all the algorithms used.

Entropy

Here's how we calculate entropy:

(log(score)/log(2)) * len(payload)

Where score is the number of special characters in the string.
The higher the entropy, the higher the probability that the string is malicious.
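
The formula above can be sketched as a small function. This is only a sketch under assumptions: the helper name special_char_count, the choice of ASCII punctuation as the "special" characters, and returning 0.0 when there are no special characters are all mine, not taken from the project. Since the project's exact character set is not stated here, this sketch will not necessarily reproduce the values shown earlier.

```python
from math import log
import string


def special_char_count(payload):
    # Assumption: "special" means any ASCII punctuation character.
    return sum(1 for ch in payload if ch in string.punctuation)


def entropy(payload):
    score = special_char_count(payload)
    if score == 0:
        # log(0) is undefined; assumption: treat such input as zero entropy.
        return 0.0
    # log(score)/log(2) is the base-2 logarithm of the score.
    return (log(score) / log(2)) * len(payload)


print(entropy("black pens red caps"))
print(entropy("<svg onload=alert()>"))
```

Note that a single special character gives log(1) = 0, so under this formula a string needs at least two special characters to get a non-zero score.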

Shannon Entropy

from math import log

entropy = 0.0
for number in range(256):
    # Relative frequency of this character in the payload
    result = float(payload.count(chr(number)))/len(payload)
    if result != 0:
        entropy = entropy - result * log(result, 2)

For a better understanding, take a look at the source code.
Unlike the plain entropy above, Shannon entropy takes patterns into account: it depends on how often each character repeats.
Take a look at these three strings and their Shannon entropies:

String: s0md3v
Entropy: 2.58496250072

String: ../../../../
Entropy: 0.918295834054

String: //////////////
Entropy: 0.0

The first string has no repeated characters and hence has the highest Shannon entropy. The second string has a repeating pattern, which lowers its entropy to just under one. The last string consists of a single repeated character, has no randomness, and hence has a Shannon entropy of 0.
So again, the higher the Shannon entropy, the higher the probability that the string is malicious.
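
To check the three values above yourself, the loop can be wrapped in a function. This is a minimal sketch: the function name shannon_entropy and the use of collections.Counter (instead of looping over all 256 character codes) are my choices for illustration, not the project's code.

```python
from collections import Counter
from math import log


def shannon_entropy(payload):
    if not payload:
        return 0.0
    entropy = 0.0
    # Count each distinct character once instead of scanning 256 codes.
    for count in Counter(payload).values():
        frequency = count / len(payload)
        entropy -= frequency * log(frequency, 2)
    return entropy


for text in ("s0md3v", "../../../../", "//////////////"):
    print(text, shannon_entropy(text))
```

The printed values should agree with the ones listed above: about 2.585, about 0.918, and 0.0.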

Special Character Ratio

(len(payload) - score) <= len(payload)/2

Where score is again the number of special characters in the string.
We are just checking whether 50% or more of the string is made of special characters.
The higher the special character ratio, the higher the probability that the string is malicious.
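
As a rough sketch, the check can be written as a function. The name mostly_special, the use of ASCII punctuation as the set of special characters, and the handling of empty input are assumptions made for illustration only.

```python
import string


def mostly_special(payload):
    if not payload:
        # Assumption: empty input is not flagged.
        return False
    # Assumption: "special" means any ASCII punctuation character.
    score = sum(1 for ch in payload if ch in string.punctuation)
    # True when the non-special characters make up half the string or less.
    return (len(payload) - score) <= len(payload) / 2


print(mostly_special("<<>>ab"))
print(mostly_special("hello!"))
```

The first call has 4 special characters out of 6 and is flagged; the second has 1 out of 6 and is not.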

Levenshtein Distance

Most WAFs check whether the input matches a regex or a payload in their signature database. Instead of looking for identical payloads in a signature database, Entropy looks for similar payloads using the Levenshtein distance algorithm. Rather than reinventing the wheel and writing the algorithm myself, I used the FuzzyWuzzy module, but as the project develops further I may use my own code.
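
To illustrate the idea without any dependency, here is a plain-Python sketch of Levenshtein distance and a similarity lookup. It is not FuzzyWuzzy's scoring and not the project's code: the names levenshtein, similarity and closest_signature, the normalisation by the longer string's length, and the two entries in SIGNATURES are all hypothetical.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance, keeping one row at a time.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    # Assumption: normalise the distance by the longer string's length.
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical signature database, for illustration only.
SIGNATURES = ["<svg onload=alert()>", "<script>alert(1)</script>"]


def closest_signature(payload):
    # Return the signature most similar to the payload.
    return max(SIGNATURES, key=lambda s: similarity(payload, s))


print(closest_signature("<svg onload=alert(1)>"))
```

A payload that differs from a known signature by a character or two still scores close to 1.0, which is what an exact-match lookup would miss.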

That's all folks.

License & Other Stuff

This project has no license, which under international standards means you are not allowed to modify or redistribute it, but as it's hosted on GitHub, you are free to view and use the code ;)
Do you think this is a great idea? Do you know something which can make it better? Mail me at s0md3v(at)gmail(dot)com