adobe / Stringlifier

Licence: other

Stringlifier is an open-source ML library for detecting random strings in raw text. It can be used for sanitising logs, for detecting accidentally exposed credentials, and as a pre-processing step in unsupervised ML-based analysis of application text data.



stringlifier

String-classifier is a Python module for detecting random strings and hashes in text or code.

Typical usage scenarios include:

Sanitizing application or security logs
Detecting accidentally exposed credentials (complex passwords or API keys)

Interactive notebook

You can see Stringlifier in action by checking out this interactive notebook hosted on Colaboratory.

Quick start guide

You can install stringlifier with pip:

$ pip install stringlifier

If you are using the pip3 installer that comes with Python 3, use pip3 instead of pip:

$ pip3 install stringlifier

API example:

from stringlifier.api import Stringlifier

# Instantiate the classifier
stringlifier = Stringlifier()

s = stringlifier("com.docker.hyperkit -A -u -F vms/0/hyperkit.pid -c 8 -m 8192M -b 127.0.0.1 --pass=\"NlcXVpYWRvcg\" -s 0:0,hostbridge -s 31,lpc -s 1:0,virtio-vpnkit,path=vpnkit.eth.sock,uuid=45172425-08d1-41ec-9d13-437481803412 -U c6fb5010-a83e-4f74-9a5a-50d9086b9")

After this, s should be:

'com.docker.hyperkit -A -u -F vms/0/hyperkit.pid -c 8 -m 8192M -b <IP_ADDR> --pass="<RANDOM_STRING>" -s 0:0,hostbridge -s 31,lpc -s 1:0,virtio-vpnkit,path=vpnkit.eth.sock,uuid=<UUID> -U <UUID>'
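For the log-sanitizing scenario, one option is to pass each log line through the classifier. The sketch below is a minimal illustration and not part of stringlifier: `sanitize_lines` and `fake_mask` are hypothetical names, and the code assumes only that the masking object can be called with a single string and returns the masked string, as in the API example above. A stand-in function replaces the real model so the snippet runs without the package installed.

```python
from typing import Callable, Iterable, Iterator


def sanitize_lines(lines: Iterable[str], mask: Callable[[str], str]) -> Iterator[str]:
    """Yield each log line with its trailing newline removed and random strings masked."""
    for line in lines:
        text = line.rstrip("\n")
        # Skip the masking call for empty lines; there is nothing to mask.
        yield mask(text) if text else text


def fake_mask(text: str) -> str:
    """Hypothetical stand-in for a Stringlifier instance (assumption: str in, str out)."""
    return text.replace("NlcXVpYWRvcg", "<RANDOM_STRING>")


log = ['--pass="NlcXVpYWRvcg"\n', "\n", "service started\n"]
sanitized = list(sanitize_lines(log, fake_mask))
print(sanitized)
```

With the package installed, passing a `Stringlifier()` instance in place of `fake_mask` should work the same way, provided a single-string call returns a string as shown above.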

You can also choose to see the full tokenization and classification output:

s, tokens = stringlifier("com.docker.hyperkit -A -u -F vms/0/hyperkit.pid -c 8 -m 8192M -b 127.0.0.1 --pass=\"NlcXVpYWRvcg\" -s 0:0,hostbridge -s 31,lpc -s 1:0,virtio-vpnkit,path=vpnkit.eth.sock,uuid=45172425-08d1-41ec-9d13-437481803412 -U c6fb5010-a83e-4f74-9a5a-50d9086b9", return_tokens=True)

s will be the same as before and tokens will contain the following data:

[[('0', 33, 34, '<NUMERIC>'),
   ('8', 51, 52, '<NUMERIC>'),
   ('8192', 56, 60, '<NUMERIC>'),
   ('127.0.0.1', 65, 74, '<IP_ADDR>'),
   ('NlcXVpYWRvcg', 83, 95, '<RANDOM_STRING>'),
   ('0', 100, 101, '<NUMERIC>'),
   ('0', 102, 103, '<NUMERIC>'),
   ('31', 118, 120, '<NUMERIC>'),
   ('1', 128, 129, '<NUMERIC>'),
   ('0', 130, 131, '<NUMERIC>'),
   ('45172425-08d1-41ec-9d13-437481803412', 172, 208, '<UUID>'),
   ('c6fb5010-a83e-4f74-9a5a-50d9086b9', 212, 244, '<UUID>')]]
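Each token in the output above appears to be a tuple of (matched text, start offset, end offset, label), with offsets indexing into the original string as a half-open range. The sketch below shows how such tuples could be used to rebuild the masked string by hand. It is an illustration only: `apply_labels` is a hypothetical helper, not a stringlifier function. Keeping `<NUMERIC>` tokens unmasked is an assumption inferred from the sample output, where the numbers stay in place. The input is a shortened prefix of the example command with the same offsets.

```python
from typing import FrozenSet, Iterable, Tuple

# (matched text, start offset, end offset, label), as in the token output above
Token = Tuple[str, int, int, str]


def apply_labels(
    text: str,
    tokens: Iterable[Token],
    keep_labels: FrozenSet[str] = frozenset({"<NUMERIC>"}),
) -> str:
    """Replace each token span in `text` with its label, skipping `keep_labels`."""
    pieces = []
    cursor = 0
    for matched, start, end, label in sorted(tokens, key=lambda t: t[1]):
        if label in keep_labels:
            continue  # leave this span untouched
        if text[start:end] != matched:
            raise ValueError(f"offsets {start}:{end} do not match {matched!r}")
        pieces.append(text[cursor:start])  # unchanged text before the token
        pieces.append(label)               # the label replaces the token
        cursor = end
    pieces.append(text[cursor:])           # remainder after the last token
    return "".join(pieces)


text = 'com.docker.hyperkit -A -u -F vms/0/hyperkit.pid -c 8 -m 8192M -b 127.0.0.1 --pass="NlcXVpYWRvcg"'
tokens = [
    ("0", 33, 34, "<NUMERIC>"),
    ("8", 51, 52, "<NUMERIC>"),
    ("8192", 56, 60, "<NUMERIC>"),
    ("127.0.0.1", 65, 74, "<IP_ADDR>"),
    ("NlcXVpYWRvcg", 83, 95, "<RANDOM_STRING>"),
]
masked = apply_labels(text, tokens)
print(masked)
```

Since the token output is a list of lists, the helper expects one inner list, that is, the tokens belonging to a single input string.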

Building your own classifier

You can also train your own model if you want to detect different types of strings. To do so, use the command-line interface of the string classifier:

$ python3 stringlifier/modules/stringc.py --help

Usage: stringc.py [options]

Options:
  -h, --help            show this help message and exit
  --interactive
  --train
  --resume
  --train-file=TRAIN_FILE
  --dev-file=DEV_FILE
  --store=OUTPUT_BASE
  --patience=PATIENCE   (default=20)
  --batch-size=BATCH_SIZE
                        (default=32)
  --device=DEVICE

For instructions on how to generate your training data, use this link.
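As a rough sketch of how the options listed above might be combined for a training run, the snippet below assembles a command line in Python. It uses only flags from the help output, but the combination itself is an assumption, since the help text does not say which flags are required together. `build_train_command` and the file names are hypothetical placeholders.

```python
import sys
from typing import List, Optional


def build_train_command(
    train_file: str,
    dev_file: str,
    store: str,
    patience: int = 20,    # default shown in the help output
    batch_size: int = 32,  # default shown in the help output
    device: Optional[str] = None,
    resume: bool = False,
) -> List[str]:
    """Assemble an argument list for stringc.py from the documented options."""
    command = [
        sys.executable,
        "stringlifier/modules/stringc.py",
        "--train",
        f"--train-file={train_file}",
        f"--dev-file={dev_file}",
        f"--store={store}",
        f"--patience={patience}",
        f"--batch-size={batch_size}",
    ]
    if resume:
        command.append("--resume")
    if device is not None:
        # Accepted device values are not documented in the help output.
        command.append(f"--device={device}")
    return command


# Hypothetical file names, for illustration only.
command = build_train_command("train.txt", "dev.txt", "models/custom")
print(" ".join(command))
# To launch training from the repository root:
# import subprocess; subprocess.run(command, check=True)
```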

Important note: This model might not scale if detecting a type of string depends on the surrounding tokens. In that case, consider a more advanced sequence-processing tool such as NLP-Cube.
