chen0040 / keras-malicious-url-detector

Licence: MIT license

Malicious URL detector using keras recurrent networks and scikit-learn classifiers

Labels

recurrent-neural-networks lstm convolutional-neural-networks url-detector binary-classification malicious-url

keras-malicious-url-detector

Malicious URL detector using char-level recurrent neural networks with Keras

The purpose of this project is to study a malicious URL detector that does not rely on any prior knowledge about URLs.

The training data comes from an external source and can be found in "demo/data/URL.txt".
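To illustrate what "char-level" input without prior knowledge about URLs can look like, here is a minimal sketch that turns a URL into a fixed-length sequence of character indices. The helper names (build_char_vocab, encode_url) and the reserved padding and unknown-character indices are assumptions made for this example; they are not taken from the project's library.

```python
from typing import Dict, List

PAD_INDEX = 0  # assumption: index 0 is reserved for padding
UNK_INDEX = 1  # assumption: index 1 marks characters not seen in training


def build_char_vocab(urls: List[str]) -> Dict[str, int]:
    """Map every character seen in the training URLs to an integer index."""
    chars = sorted({ch for url in urls for ch in url})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode_url(url: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Turn a URL into a list of exactly max_len character indices."""
    # Truncate long URLs, map unseen characters to UNK_INDEX
    indices = [vocab.get(ch, UNK_INDEX) for ch in url[:max_len]]
    # Right-pad short URLs with PAD_INDEX
    return indices + [PAD_INDEX] * (max_len - len(indices))


# Hypothetical URLs, used only to demonstrate the encoding
urls = ["http://a.io", "http://b.io/x"]
vocab = build_char_vocab(urls)
encoded = encode_url("http://c.io", vocab, max_len=16)
print(encoded)
```

No hand-crafted features (domain age, blacklists, token counts) are involved: the model only ever sees the raw characters, which is the point of the char-level approach.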

Deep Learning Models

The following deep learning models have been implemented and studied:

LSTM: this approach uses LSTM recurrent networks as the classifier, trained with a categorical cross-entropy loss function
- training: demo/lstm_train.py (one-hot encoding)
- predictor: demo/lstm_predict.py (one-hot encoding)
- training: demo/lstm_embed_train.py (word embedding)
- predictor: demo/lstm_embed_predict.py (word embedding)
CNN + LSTM: this approach uses a CNN followed by LSTM recurrent networks as the classifier, trained with a categorical cross-entropy loss function
- training: demo/cnn_lstm_train.py
- predictor: demo/cnn_lstm_predict.py
Bidirectional LSTM: this approach uses bidirectional LSTM recurrent networks as the classifier, trained with a categorical cross-entropy loss function
- training: demo/bidirectional_lstm_train.py
- predictor: demo/bidirectional_lstm_predict.py
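The models above share two ingredients: an input encoding (one-hot or embedding) and a categorical cross-entropy loss. The sketch below shows both in plain Python, without Keras, so the arithmetic is visible. It is an illustration under assumptions (two output classes, hypothetical function names), not the project's implementation.

```python
import math
from typing import List


def one_hot(indices: List[int], vocab_size: int) -> List[List[float]]:
    """One-hot input: each character index becomes a vector of length vocab_size."""
    return [[1.0 if j == i else 0.0 for j in range(vocab_size)] for i in indices]


def softmax(logits: List[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(target: List[float], probs: List[float]) -> float:
    """Loss for one sample: -sum(target_k * log(prob_k)) over the classes."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


# A three-character sequence over a four-character vocabulary
rows = one_hot([2, 0, 1], vocab_size=4)

# Two classes with equal scores give probabilities [0.5, 0.5]
probs = softmax([0.0, 0.0])

# The true class is the second one, so the loss is -log(0.5) = log(2)
loss = categorical_cross_entropy([0.0, 1.0], probs)
print(rows[0], probs, loss)
```

With the embedding variants, the integer indices are passed to the network directly and an embedding layer learns a dense vector per index, instead of the fixed sparse vectors produced by one_hot.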

Usage

To train the bidirectional LSTM model:

cd demo
python bidirectional_lstm_train.py

Below is the code in bidirectional_lstm_train.py:

from keras_malicious_url_detector.library.bidirectional_lstm import BidirectionalLstmEmbedPredictor
from keras_malicious_url_detector.library.utility.url_data_loader import load_url_data
import numpy as np
from keras_malicious_url_detector.library.utility.text_model_extractor import extract_text_model
from keras_malicious_url_detector.library.utility.plot_utils import plot_and_save_history


def main():

    # Seed NumPy's random number generator
    random_state = 42
    np.random.seed(random_state)

    data_dir_path = './data'
    model_dir_path = './models'
    report_dir_path = './reports'

    # Load the labelled URLs and build the text model from the URL strings
    url_data = load_url_data(data_dir_path)

    text_model = extract_text_model(url_data['text'])

    batch_size = 64
    epochs = 30

    classifier = BidirectionalLstmEmbedPredictor()

    # Train the classifier; the trained model is written to model_dir_path
    history = classifier.fit(text_model=text_model,
                             model_dir_path=model_dir_path,
                             url_data=url_data, batch_size=batch_size, epochs=epochs)

    # Plot the training history and save it as a PNG in the report directory
    plot_and_save_history(history, BidirectionalLstmEmbedPredictor.model_name,
                          report_dir_path + '/' + BidirectionalLstmEmbedPredictor.model_name + '-history.png')


if __name__ == '__main__':
    main()

After the training, the trained models are saved in the demo/models folder.

To test the trained model, run:

cd demo
python bidirectional_lstm_predict.py

Below is the code in bidirectional_lstm_predict.py:

from keras_malicious_url_detector.library.bidirectional_lstm import BidirectionalLstmEmbedPredictor
from keras_malicious_url_detector.library.utility.url_data_loader import load_url_data


def main():

    data_dir_path = './data'
    model_dir_path = './models'

    # Restore the classifier saved by bidirectional_lstm_train.py
    predictor = BidirectionalLstmEmbedPredictor()
    predictor.load_model(model_dir_path)

    url_data = load_url_data(data_dir_path)

    # Print predicted and actual labels for the first 21 URLs, then stop
    count = 0
    for url, label in zip(url_data['text'], url_data['label']):
        predicted_label = predictor.predict(url)
        print('predicted: ' + str(predicted_label) + ' actual: ' + str(label))
        count += 1
        if count > 20:
            break


if __name__ == '__main__':
    main()

Performance

Currently, the bidirectional LSTM gives the best performance of the models above, reaching 75% to 80% accuracy after 30 to 40 epochs of training.
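An accuracy figure is easier to interpret next to a baseline and per-class metrics, especially when one class is more common than the other. The following sketch computes accuracy, precision, and recall for class 1, plus the accuracy of always predicting the majority class. The function name and the sample labels are made up for illustration, and treating label 1 as "malicious" is an assumption.

```python
from typing import Dict, List


def binary_metrics(actual: List[int], predicted: List[int]) -> Dict[str, float]:
    """Accuracy, precision and recall for class 1, plus the majority-class baseline.

    Assumes non-empty inputs of equal length with labels 0 and 1.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    positives = actual.count(1)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Accuracy obtained by always predicting the more frequent class
        "majority_baseline": max(positives, len(actual) - positives) / len(actual),
    }


# Hypothetical labels: six samples of class 0 and four of class 1
actual = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
predicted = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
metrics = binary_metrics(actual, predicted)
print(metrics)
```

In this made-up example the accuracy is 0.7 against a majority baseline of 0.6, while recall for class 1 is only 0.5, which shows how accuracy alone can hide missed detections.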

[Figure: training history (loss and accuracy) for the bidirectional LSTM]

Issues

The URL data set is currently small.
Class imbalance: URL.txt contains more samples labelled 0 than 1, so ideally the task should be framed as an outlier or anomaly detection problem. To handle the imbalance, a resampling method is currently used so that the classes contain a roughly equal number of samples.
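The text above does not say which resampling method is used. As one possible approach, the sketch below balances a binary data set by random oversampling, that is, by duplicating randomly chosen minority-class samples until both classes have the same size. The function name and sample URLs are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Sample = Tuple[str, int]  # (url, label), with labels assumed to be 0 or 1


def oversample_minority(samples: List[Sample], seed: int = 42) -> List[Sample]:
    """Duplicate random minority-class samples until both classes are equal in size."""
    rng = random.Random(seed)
    by_label: Dict[int, List[Sample]] = {0: [], 1: []}
    for url, label in samples:
        by_label[label].append((url, label))
    # The shorter list is the minority class
    minority, majority = sorted(by_label.values(), key=len)
    if not minority:
        return list(samples)  # nothing to duplicate
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced


# Hypothetical data: six samples of class 0 and two of class 1
data = [("http://ok%d.example" % i, 0) for i in range(6)]
data += [("http://bad%d.example" % i, 1) for i in range(2)]
balanced = oversample_minority(data)
counts = {label: sum(1 for _, l in balanced if l == label) for label in (0, 1)}
print(counts)
```

If oversampling is used, it should be applied to the training split only. Resampling before the train/test split can place copies of the same URL in both sets and inflate the measured accuracy.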
