
PENGZhaoqing / kdd99-scikit

Licence: other
Solutions to kdd99 dataset with Decision tree and Neural network by scikit-learn

Projects that are alternatives of or similar to kdd99-scikit

MachineLearning
Implementations of machine learning algorithm by Python 3
Stars: ✭ 16 (-68%)
Mutual labels:  scikit-learn, mlp
NIDS-Intrusion-Detection
Simple Implementation of Network Intrusion Detection System. KddCup'99 Data set is used for this project. kdd_cup_10_percent is used for training test. correct set is used for test. PCA is used for dimension reduction. SVM and KNN supervised algorithms are the classification algorithms of project. Accuracy : %83.5 For SVM , %80 For KNN
Stars: ✭ 45 (-10%)
Mutual labels:  intrusion-detection, kdd99
sklearn-oblique-tree
a python interface to OC1 and other oblique decision tree implementations
Stars: ✭ 33 (-34%)
Mutual labels:  scikit-learn, decision-tree
Algorithmic-Trading
Algorithmic trading using machine learning.
Stars: ✭ 102 (+104%)
Mutual labels:  scikit-learn, decision-tree
UNSW NB15
Feature coded UNSW_NB15 intrusion detection data.
Stars: ✭ 50 (+0%)
Mutual labels:  intrusion-detection, kdd99
Deep-Learning-Models
Deep Learning Models implemented in python.
Stars: ✭ 17 (-66%)
Mutual labels:  mlp
osprey
🦅Hyperparameter optimization for machine learning pipelines 🦅
Stars: ✭ 71 (+42%)
Mutual labels:  scikit-learn
five-minute-midas
Predicting Profitable Day Trading Positions using Decision Tree Classifiers. scikit-learn | Flask | SQLite3 | pandas | MLflow | Heroku | Streamlit
Stars: ✭ 41 (-18%)
Mutual labels:  scikit-learn
dbt-ml-preprocessing
A SQL port of python's scikit-learn preprocessing module, provided as cross-database dbt macros.
Stars: ✭ 128 (+156%)
Mutual labels:  scikit-learn
vector space modelling
NLP in python Vector Space Modelling and document classification NLP
Stars: ✭ 16 (-68%)
Mutual labels:  scikit-learn
books
A collection of online books for data science, computer science and coding!
Stars: ✭ 29 (-42%)
Mutual labels:  scikit-learn
playground
A Streamlit application to play with machine learning models directly from the browser
Stars: ✭ 48 (-4%)
Mutual labels:  scikit-learn
kaggle-titanic
Titanic assignment on Kaggle competition
Stars: ✭ 30 (-40%)
Mutual labels:  scikit-learn
A-Detector
⭐ An anomaly-based intrusion detection system.
Stars: ✭ 69 (+38%)
Mutual labels:  scikit-learn
machine-learning-capstone-project
This is the final project for the Udacity Machine Learning Nanodegree: Predicting article retweets and likes based on the title using Machine Learning
Stars: ✭ 28 (-44%)
Mutual labels:  scikit-learn
DecisionTrees
A python implementation of the CART algorithm for decision trees
Stars: ✭ 38 (-24%)
Mutual labels:  decision-tree
SentimentAnalysis
(BOW, TF-IDF, Word2Vec, BERT) Word Embeddings + (SVM, Naive Bayes, Decision Tree, Random Forest) Base Classifiers + Pre-trained BERT on Tensorflow Hub + 1-D CNN and Bi-Directional LSTM on IMDB Movie Reviews Dataset
Stars: ✭ 40 (-20%)
Mutual labels:  decision-tree
doubleml-for-py
DoubleML - Double Machine Learning in Python
Stars: ✭ 129 (+158%)
Mutual labels:  scikit-learn
abess
Fast Best-Subset Selection Library
Stars: ✭ 266 (+432%)
Mutual labels:  scikit-learn
Machine Learning From Scratch
Machine Learning models from scratch with a better visualisation
Stars: ✭ 15 (-70%)
Mutual labels:  decision-tree

kdd99-scikit

Solutions to the KDD99 dataset using a Decision Tree (CART) and a Multilayer Perceptron, built with scikit-learn

Intro to Kdd99 Dataset

The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between "bad" connections, called intrusions or attacks, and "good" normal connections. Note that the test data is not from the same probability distribution as the training data, and it includes specific attack types not in the training data.

Snapshot of the training data (raw/kddcup.data_10_percent.txt):

0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,0.00,0.00,0.00,0.00,0.00,normal.
0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00,normal.
0,tcp,http,SF,235,1337,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,29,29,1.00,0.00,0.03,0.00,0.00,0.00,0.00,0.00,normal.
0,tcp,http,SF,219,1337,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,6,6,0.00,0.00,0.00,0.00,1.00,0.00,0.00,39,39,1.00,0.00,0.03,0.00,0.00,0.00,0.00,0.00,normal.
0,tcp,http,SF,217,2032,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,6,6,0.00,0.00,0.00,0.00,1.00,0.00,0.00,49,49,1.00,0.00,0.02,0.00,0.00,0.00,0.00,0.00,normal.
0,tcp,http,SF,217,2032,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,6,6,0.00,0.00,0.00,0.00,1.00,0.00,0.00,59,59,1.00,0.00,0.02,0.00,0.00,0.00,0.00,0.00,normal.
0,tcp,http,SF,212,1940,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,2,0.00,0.00,0.00,0.00,1.00,0.00,1.00,1,69,1.00,0.00,1.00,0.04,0.00,0.00,0.00,0.00,normal.
0,tcp,http,SF,159,4087,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,5,5,0.00,0.00,0.00,0.00,1.00,0.00,0.00,11,79,1.00,0.00,0.09,0.04,0.00,0.00,0.00,0.00,normal.
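
Each row above holds 41 comma-separated features followed by a label that ends in a dot. As a rough illustration of that layout, the sketch below splits one row into features and label. The function name `parse_record` is hypothetical and not part of this repository, and the categorical column positions follow the standard KDD Cup 99 layout (protocol_type, service, flag).

```python
# Column positions follow the standard KDD Cup 99 layout: 41 features, then the label.
CATEGORICAL_COLUMNS = (1, 2, 3)  # protocol_type, service, flag


def parse_record(line):
    """Split one raw KDD99 line into (features, label). Hypothetical helper."""
    fields = line.strip().split(",")
    *raw_features, raw_label = fields
    features = []
    for index, value in enumerate(raw_features):
        if index in CATEGORICAL_COLUMNS:
            features.append(value)         # keep symbolic values as strings
        else:
            features.append(float(value))  # every other column is numeric
    label = raw_label.rstrip(".")          # raw labels carry a trailing dot
    return features, label


sample = ("0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,"
          "0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,0.00,"
          "0.00,0.00,0.00,0.00,normal.")
features, label = parse_record(sample)
print(len(features), features[1:4], label)
```

The three symbolic columns still need to be encoded numerically before scikit-learn estimators can use them. How this repository does that is not shown here.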

Prerequisite

Usage

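Details

Fork the repository first, then run Preprocessing.py, which does the following:

  1. Encodes the target classes of the training and test sets in the raw directory as numbers and writes the resulting files to the data directory.
  2. Imports the training and test sets from the data directory into MongoDB so that later steps can read them quickly.

git clone https://github.com/your-github-account/kdd99-scikit
cd kdd99-scikit
python Preprocessing.py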
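
The label-encoding step can be sketched roughly as follows. This is not the code in Preprocessing.py: the helper names and the integer codes in `CATEGORY_CODES` are assumptions for illustration, and the sketch assumes raw/training_attack_types.txt holds one `attack_name category` pair per line.

```python
# Hypothetical category-to-integer codes; the repository's real mapping may differ.
CATEGORY_CODES = {"normal": 0, "probe": 1, "dos": 2, "u2r": 3, "r2l": 4}


def load_attack_types(lines):
    """Build {attack_name: category} from 'name category' lines."""
    mapping = {"normal": "normal"}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            mapping[parts[0]] = parts[1]
    return mapping


def encode_record(line, attack_types):
    """Replace the trailing textual label of one record with its integer code."""
    fields = line.strip().split(",")
    attack = fields[-1].rstrip(".")  # raw labels end with a dot
    fields[-1] = str(CATEGORY_CODES[attack_types[attack]])
    return ",".join(fields)


attack_types = load_attack_types(["smurf dos", "ipsweep probe", "guess_passwd r2l"])
# Shortened row for readability; real rows carry 41 features before the label.
encoded = encode_record("0,icmp,ecr_i,SF,1032,0,smurf.", attack_types)
print(encoded)
```

Because the test set contains attack names that never occur in the training data, a lookup like this raises `KeyError` on them unless the mapping is extended to cover those names.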

For Decision Tree

.
├── CART_Predictor.py
├── CART_Runner.py
├── CART_Trainer.py
├── CART_all.py
├── __init__.py
└── output
    ├── CART.pkl
    └── tree-vis.pdf

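The decision tree code lives in the CART directory. The CART_Trainer class wraps the methods called when training the model, and the CART_Predictor class wraps the predict method used for testing and output. Both are invoked by CART_Runner. CART_all.py drops the classes and the calling structure and consolidates the code into a single script.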

cd CART
python CART_Runner.py
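
The trainer/predictor/runner split described above might look roughly like the sketch below. The class names `Trainer` and `Predictor`, their method signatures, and the toy data are all hypothetical. Only `DecisionTreeClassifier`, `confusion_matrix`, and `classification_report` are real scikit-learn APIs, and the actual classes in this repository may be organized differently.

```python
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.tree import DecisionTreeClassifier


class Trainer:
    """Hypothetical stand-in for CART_Trainer: owns model fitting."""

    def __init__(self, **params):
        self.model = DecisionTreeClassifier(random_state=0, **params)

    def train(self, X, y):
        self.model.fit(X, y)
        return self.model


class Predictor:
    """Hypothetical stand-in for CART_Predictor: owns prediction and reporting."""

    def __init__(self, model):
        self.model = model

    def predict(self, X):
        return self.model.predict(X)

    def report(self, X, y_true):
        y_pred = self.predict(X)
        print(confusion_matrix(y_true, y_pred))
        print(classification_report(y_true, y_pred))
        return y_pred


# Runner role: wire the two together on toy data (not KDD99).
X_train = [[0, 181], [0, 239], [5, 0], [7, 0]]
y_train = [0, 0, 2, 2]
model = Trainer(max_depth=3).train(X_train, y_train)
predictions = Predictor(model).report([[0, 200], [6, 0]], [0, 2])
```

Keeping training and prediction in separate classes means a persisted model can be handed to the predictor later without importing any training code.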

Output:

  1. Confusion matrix:

    [[ 6294    38    15    10    11]
     [    5   800     4     0     0]
     [  191    20 41508     1     0]
     [    0     0     0     3     0]
     [ 1076     5     0    16     3]]

  2. Performance report:

                 precision    recall  f1-score   support

              0       0.83      0.99      0.90      6368
              1       0.93      0.99      0.96       809
              2       1.00      0.99      1.00     41720
              3       0.10      1.00      0.18         3
              4       0.21      0.00      0.01      1100

    avg / total       0.96      0.97      0.96     50000

  • The trained decision tree is exported to CART/output/tree-vis.pdf for visualization.

  • The decision tree model is persisted to CART/output/CART.pkl for later offline prediction.
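
To see how the performance report follows from the confusion matrix, the sketch below recomputes per-class precision, recall, and F1 from the decision tree matrix above using plain Python. It relies on scikit-learn's convention that rows are true classes and columns are predicted classes, which matches the row sums equalling the support column. The function name `per_class_metrics` is illustrative only.

```python
# Confusion matrix copied from the decision tree output (rows: true, columns: predicted).
matrix = [
    [6294, 38, 15, 10, 11],
    [5, 800, 4, 0, 0],
    [191, 20, 41508, 1, 0],
    [0, 0, 0, 3, 0],
    [1076, 5, 0, 16, 3],
]


def per_class_metrics(cm):
    """Return one (precision, recall, f1, support) tuple per class."""
    results = []
    for k in range(len(cm)):
        true_positive = cm[k][k]
        predicted = sum(row[k] for row in cm)  # column sum: everything predicted as k
        support = sum(cm[k])                   # row sum: everything that truly is k
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1, support))
    return results


results = per_class_metrics(matrix)
for label, (p, r, f1, n) in enumerate(results):
    print(f"{label}  {p:.2f}  {r:.2f}  {f1:.2f}  {n}")

accuracy = sum(matrix[k][k] for k in range(len(matrix))) / sum(map(sum, matrix))
print(f"accuracy {accuracy:.4f}")
```

The recomputed values agree with the report. They also show where the model struggles: 1076 of the 1100 class 4 connections are predicted as class 0, so the high averages come mostly from the large class 2.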

For MLP

.
├── MLP_Predictor.py
├── MLP_Predictor.pyc
├── MLP_Runner.py
├── MLP_Trainer.py
├── MLP_Trainer.pyc
├── __init__.py
└── output
    ├── MLP.pkl
    └── decision-tree.pkl

cd MLP
python MLP_Runner.py

Output:

  1. Confusion matrix:

    [[ 6320    41     6     0     1]
     [    5   801     3     0     0]
     [  212     0 41508     0     0]
     [    2     1     0     0     0]
     [ 1095     1     2     0     2]]
    
  2. Performance report:

                 precision    recall  f1-score   support
    
              0       0.83      0.99      0.90      6368
              1       0.95      0.99      0.97       809
              2       1.00      0.99      1.00     41720
              3       0.00      0.00      0.00         3
              4       0.67      0.00      0.00      1100
    
    avg / total       0.97      0.97      0.96     50000
    
  • The MLP model is persisted to MLP/output/MLP.pkl for later offline prediction.
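
Persisting a fitted model and reloading it for offline prediction could look like the following minimal sketch. It assumes the standard `pickle` module is used, which the .pkl extension suggests but this README does not confirm. The toy data, the hyperparameters, and the temporary file path are illustrative and are not the repository's settings.

```python
import os
import pickle
import tempfile

from sklearn.neural_network import MLPClassifier

# Toy data standing in for preprocessed KDD99 features (not the real dataset).
X = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
y = [0, 0, 1, 1]

# Hyperparameters are illustrative, not the repository's settings.
model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500, random_state=0)
model.fit(X, y)

# Persist the fitted model; a temporary directory stands in for MLP/output/.
path = os.path.join(tempfile.mkdtemp(), "MLP.pkl")
with open(path, "wb") as handle:
    pickle.dump(model, handle)

# Later, possibly in another process: restore and predict without retraining.
with open(path, "rb") as handle:
    restored = pickle.load(handle)
print(restored.predict(X))
```

Pickle files should only be loaded from trusted sources, and a model is safest to reload with the same scikit-learn version that produced it.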

Structure

.
├── CART
│   ├── CART_Predictor.py
│   ├── CART_Predictor.pyc
│   ├── CART_Runner.py
│   ├── CART_Trainer.py
│   ├── CART_Trainer.pyc
│   ├── CART_all.py
│   ├── __init__.py
│   └── output
│       ├── CART.pkl
│       ├── trained_text.txt
│       └── tree-vis.pdf
├── MLP
│   ├── MLP_Predictor.py
│   ├── MLP_Predictor.pyc
│   ├── MLP_Runner.py
│   ├── MLP_Trainer.py
│   ├── MLP_Trainer.pyc
│   ├── __init__.py
│   └── output
│       ├── MLP.pkl
│       └── decision-tree.pkl
├── Mongo_Con.py
├── Mongo_Con.pyc
├── Preprocessing.py
├── Preprocessing.pyc
├── Preprocessing_all.py
├── README.md
├── Snip20161130_3.png
├── Variable.py
├── Variable.pyc
├── __init__.py
├── __init__.pyc
├── data
│   ├── corrected.txt
│   └── kddcup.data_10_percent.txt
└── raw
    ├── corrected.txt
    ├── kddcup.data_10_percent.txt
    ├── testdata_unlabeled_50000.txt
    └── training_attack_types.txt
