surajr / Url Classification

Machine learning to classify malicious (spam) and benign URLs

Phishing URL Classification

Malicious Web sites are a cornerstone of Internet criminal activities. These Web sites host various unwanted content such as spam-advertised products, phishing pages, and dangerous "drive-by" exploits that infect a visitor's system with malware. The most influential approaches to the malicious URL problem are manually constructed blacklists, in which the URLs of all known malicious web pages are listed, and client-side systems that analyze the content or behavior of a Web site as it is visited.

The disadvantage of the blacklisting approach is that every check requires the tedious task of searching the list for the entry, and the list can be very large considering the number of web sites on the Internet. The list also cannot be kept up to date, because new web links appear every hour.

In this system we use machine-learning techniques to classify a URL as either safe or unsafe in real time, without even needing to download the web page.

The algorithms used in this system are:

The system currently works only on lexical features (simple text features of a URL), which include:

  • Length of URL
  • Domain Length
  • Presence of IP Address in Host Name
  • Presence of Security Sensitive Words in URL

and many more (around 22 in total). Host-based features, such as the country code in which the site is hosted, the creation date, and the update date, are yet to be added to the system. They should increase the accuracy of the classifier, but they will also increase the latency of classifying a URL, because WHOIS servers have to be queried to obtain them. The PyWhois module has been used for these queries.
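As an illustration of the four lexical features listed above, here is a minimal sketch that uses only the Python standard library. It is not the project's actual feature extractor: the function name `lexical_features`, the feature keys, and the `SENSITIVE_WORDS` list are assumptions made for this example, and the remaining features (around 22 in total) are not reproduced.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Hypothetical word list: the README does not say which words the project uses.
SENSITIVE_WORDS = ("login", "secure", "account", "confirm", "banking", "signin")

# Matches a leading scheme such as "http://" or "ftp://".
_SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def host_is_ip(host):
    """Return True if the host name is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def lexical_features(url):
    """Compute a few text-only features of a URL without fetching the page."""
    # urlparse only recognizes the host when a scheme is present,
    # so add a default one for bare inputs like "example.com/path".
    to_parse = url if _SCHEME_RE.match(url) else "http://" + url
    host = urlparse(to_parse).hostname or ""
    lowered = url.lower()
    return {
        "url_length": len(url),
        "domain_length": len(host),
        "has_ip_host": int(host_is_ip(host)),
        "sensitive_word_count": sum(word in lowered for word in SENSITIVE_WORDS),
    }


if __name__ == "__main__":
    print(lexical_features("http://192.168.0.1/login/secure"))
    print(lexical_features("https://www.example.com/"))
```

Each feature is computed from the URL string alone, which is why this kind of check adds no network latency, unlike the WHOIS-based host features.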

About Dataset

For this system we collect our data from two sources, namely:

Phishtank.com

For the phishing/malicious URLs we collect data from [Phishtank](https://www.phishtank.com/).
