logpai / Loglizer

Licence: MIT
A log analysis toolkit for automated anomaly detection [ISSRE'16]

Programming Languages

python

loglizer

Loglizer is a machine learning-based log analysis toolkit for automated anomaly detection.

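Loglizer is an AI-based tool for analyzing large-scale log data that can be used for automated anomaly detection, intelligent fault diagnosis, and similar scenarios.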

Logs are essential to the development and maintenance of many software systems. They record detailed runtime information during system operation, which allows developers and support engineers to monitor their systems and track abnormal behaviors and errors. Loglizer provides a toolkit that implements a number of machine learning-based log analysis techniques for automated anomaly detection.

🔭 If you use loglizer in your research for publication, please cite the loglizer paper [ISSRE'16].

Framework

Framework of Anomaly Detection

The log analysis framework for anomaly detection usually comprises the following components:

  1. Log collection: Logs are generated at runtime and aggregated into a centralized place with a data streaming pipeline, such as Flume or Kafka.
  2. Log parsing: The goal of log parsing is to convert unstructured log messages into a map of structured events, to which sophisticated machine learning models can then be applied. The details of log parsing can be found in our logparser project.
  3. Feature extraction: Structured logs can be sliced into short log sequences using interval windows, sliding windows, or session windows. Feature extraction then vectorizes each log sequence, for example as an event count vector.
  4. Anomaly detection: Anomaly detection models are trained to check whether a given feature vector is an anomaly or not.
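
The feature extraction step can be illustrated with a minimal sketch in plain Python. It is not loglizer's own implementation: the function names, the `blk_*` session ids, and the `E1`..`E3` event ids are hypothetical, chosen only to show how session windows and sliding windows each yield event count vectors.

```python
from collections import Counter, defaultdict

def count_vector(events, vocabulary):
    """Return the event count vector of one log sequence over a fixed vocabulary."""
    counts = Counter(events)
    return [counts[e] for e in vocabulary]

def sliding_windows(events, size, step):
    """Slice an event sequence into windows of `size` events, advancing by `step`."""
    last_start = max(len(events) - size, 0)
    return [events[i:i + size] for i in range(0, last_start + 1, step)]

def session_windows(records):
    """Group (session_id, event_id) records by session, preserving log order."""
    sessions = defaultdict(list)
    for session_id, event_id in records:
        sessions[session_id].append(event_id)
    return dict(sessions)

# Hypothetical parsed log: (session id, event template id) pairs.
records = [("blk_1", "E1"), ("blk_2", "E1"), ("blk_1", "E2"),
           ("blk_1", "E2"), ("blk_2", "E3")]
vocabulary = sorted({event for _, event in records})  # ['E1', 'E2', 'E3']

# Session windows: one count vector per session id.
session_vectors = {sid: count_vector(events, vocabulary)
                   for sid, events in session_windows(records).items()}

# Sliding windows over the whole event stream: 3 events per window, step 2.
event_stream = [event for _, event in records]
window_vectors = [count_vector(window, vocabulary)
                  for window in sliding_windows(event_stream, size=3, step=2)]

print(session_vectors)  # {'blk_1': [1, 2, 0], 'blk_2': [1, 0, 1]}
print(window_vectors)   # [[2, 1, 0], [0, 2, 1]]
```

Each resulting vector is one row of the feature matrix that an anomaly detection model would be trained on.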

Models

Anomaly detection models currently available:

Model                 Paper reference

Supervised models
LR                    [EuroSys'10] Fingerprinting the Datacenter: Automated Classification of Performance Crises, by Peter Bodík, Moises Goldszmidt, Armando Fox, Hans Andersen. [Microsoft]
Decision Tree         [ICAC'04] Failure Diagnosis Using Decision Trees, by Mike Chen, Alice X. Zheng, Jim Lloyd, Michael I. Jordan, Eric Brewer. [eBay]
SVM                   [ICDM'07] Failure Prediction in IBM BlueGene/L Event Logs, by Yinglung Liang, Yanyong Zhang, Hui Xiong, Ramendra Sahoo. [IBM]

Unsupervised models
LOF                   [SIGMOD'00] LOF: Identifying Density-Based Local Outliers, by Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, Jörg Sander.
One-Class SVM         [Neural Computation'01] Estimating the Support of a High-Dimensional Distribution, by John Platt, Bernhard Schölkopf, John Shawe-Taylor, Alex J. Smola, Robert C. Williamson.
Isolation Forest      [ICDM'08] Isolation Forest, by Fei Tony Liu, Kai Ming Ting, Zhi-Hua Zhou.
PCA                   [SOSP'09] Large-Scale System Problems Detection by Mining Console Logs, by Wei Xu, Ling Huang, Armando Fox, David Patterson, Michael I. Jordan. [Intel]
Invariants Mining     [ATC'10] Mining Invariants from Console Logs for System Problem Detection, by Jian-Guang Lou, Qiang Fu, Shengqi Yang, Ye Xu, Jiang Li. [Microsoft]
Clustering            [ICSE'16] Log Clustering based Problem Identification for Online Service Systems, by Qingwei Lin, Hongyu Zhang, Jian-Guang Lou, Yu Zhang, Xuewei Chen. [Microsoft]
DeepLog (coming)      [CCS'17] DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning, by Min Du, Feifei Li, Guineng Zheng, Vivek Srikumar.
AutoEncoder (coming)  [Arxiv'18] Anomaly Detection using Autoencoders in High Performance Computing Systems, by Andrea Borghesi, Andrea Bartolini, Michele Lombardi, Michela Milano, Luca Benini.

Log data

We have collected a set of labeled log datasets in loghub for research purposes. If you are interested in the datasets, please follow the link to submit your access request.

Install

git clone https://github.com/logpai/loglizer.git
cd loglizer
pip install -r requirements.txt

API usage

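# Import the loglizer modules used below (run from the repository root, or add it to sys.path first).
from loglizer import dataloader, preprocessing
from loglizer.models import PCA

# Load HDFS dataset. If you would like to try your own log, you need to rewrite the load function.
(x_train, y_train), (x_test, y_test) = dataloader.load_HDFS(...)

# Feature extraction and transformation: fit on the training data and keep the transformed result
feature_extractor = preprocessing.FeatureExtractor()
x_train = feature_extractor.fit_transform(...)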

# Model training
model = PCA()
model.fit(...)

# Feature transform after fitting
x_test = feature_extractor.transform(...)
# Model evaluation with labeled data
model.evaluate(...)

# Anomaly prediction
x_test = feature_extractor.transform(...)
model.predict(...) # predict anomalies on given data

For more details, please follow the demo in the docs to get started. Please note that ML models are not magic: you need to tune their parameters to make them work on your own data.

Benchmarking results

If you would like to reproduce the following results, please run benchmarks/HDFS_bechmark.py on the full HDFS dataset (HDFS100k is for demo only).

HDFS

Model              Precision  Recall  F1
LR                 0.955      0.911   0.933
Decision Tree      0.998      0.998   0.998
SVM                0.959      0.970   0.965
LOF                0.967      0.561   0.710
One-Class SVM      0.995      0.222   0.363
Isolation Forest   0.830      0.776   0.802
PCA                0.975      0.635   0.769
Invariants Mining  0.888      0.945   0.915
Clustering         1.000      0.720   0.837
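
The README does not state how the F1 column is derived. Assuming it is the standard harmonic mean of precision and recall, the short sketch below recomputes it from the rounded HDFS figures listed here. The helper `f1_score` and the `hdfs_results` dictionary are illustrative names, not part of loglizer.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from the HDFS benchmark results.
hdfs_results = {
    "LR": (0.955, 0.911, 0.933),
    "Decision Tree": (0.998, 0.998, 0.998),
    "SVM": (0.959, 0.970, 0.965),
    "LOF": (0.967, 0.561, 0.710),
    "One-Class SVM": (0.995, 0.222, 0.363),
    "Isolation Forest": (0.830, 0.776, 0.802),
    "PCA": (0.975, 0.635, 0.769),
    "Invariants Mining": (0.888, 0.945, 0.915),
    "Clustering": (1.000, 0.720, 0.837),
}

# Gap between the recomputed and the reported F1 for each model.
gaps = {name: abs(f1_score(p, r) - reported)
        for name, (p, r, reported) in hdfs_results.items()}
max_gap = max(gaps.values())

for name, gap in gaps.items():
    print(f"{name:<18} gap = {gap:.4f}")
```

Under this assumption every recomputed value stays within 0.001 of the reported one, and the small gaps are what you would expect from the three-decimal rounding of precision and recall. The recomputation also makes the trade-off visible: One-Class SVM has near-perfect precision but low recall, so its F1 is the lowest in the list.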

Contributors

  • Shilin He, The Chinese University of Hong Kong
  • Jieming Zhu, The Chinese University of Hong Kong, currently at Huawei Noah's Ark Lab
  • Pinjia He, The Chinese University of Hong Kong, currently at ETH Zurich

Feedback

For any questions or feedback, please post to the issue page.

History

  • May 14, 2016: initial commit
  • Sep 21, 2017: update code and readme
  • Mar 21, 2018: rewrite most of the code and add detailed comments
  • Feb 18, 2019: restructure the repository with hands-on demo