BBVA / Tarkin

License: Apache-2.0
A tool for anomaly detection over streaming data based on sentiment analysis

Security Anomalies in Logs Data

Tarkin is a project that aims to detect anomalies in security log data.

Approach

Have you ever felt a shiver down your spine at the sight of a log line, even before reading it completely? That's because you spotted something unusual and probably one or two old keywords that, in your experience, are usually associated with issues.

Detecting anomalies, especially security-related ones, is a hard job that too often requires going through zillions of log lines, queue messages, database records, etc. To make things even more difficult, this usually happens under tight time pressure to identify the origin and causes of an incident.

There are tools out there that promise to reduce this load by classifying messages automatically, but they are barely more than specialized spam filters that pay little to no attention to the meaning of the message. They still require you to check each tagged result to help improve their accuracy, making us work for the system while offering no flexibility.

We believe it takes more than statistics to spot particular types of anomalies. We also believe simplicity is the key to powerful systems. This is why we decided to emulate the intuition of human analysts faced with this problem, modelling the "fear" they feel when reading the logs through the filters of their instinct and domain experience.

The project is named after Grand Moff Tarkin, a Star Wars character who lends his name to the Tarkin Doctrine, a policy based on fear that he proposed to allow the Empire to rule the galaxy without the burden of bureaucracy.

How it works (in a nutshell)

Tarkin implements a pipeline of models. The first step trains a character frequency model on a sample of messages and then applies it to the content of test or fresh incoming messages:

[Figure: Character Frequency Scoring]
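
The character frequency step can be illustrated with a minimal sketch. It is not Tarkin's actual code: the function names, the lowercasing, the floor value for unseen characters and the choice of an average negative log-probability as the score are all assumptions made for the example.

```python
import math
from collections import Counter


def train_char_model(messages):
    """Estimate relative character frequencies from a sample of messages."""
    counts = Counter()
    for message in messages:
        counts.update(message.lower())
    total = sum(counts.values())
    return {char: n / total for char, n in counts.items()}


def infrequency_score(message, model, floor=1e-6):
    """Average negative log-probability of the characters in a message.

    Characters never seen during training fall back to `floor` (an assumed
    value), so messages made of unusual characters score higher.
    """
    if not message:
        return 0.0
    total = sum(-math.log(model.get(c, floor)) for c in message.lower())
    return total / len(message)


# Hypothetical training sample and test messages.
sample = [
    "user alice logged in",
    "user bob logged out",
    "session opened for user alice",
]
model = train_char_model(sample)

usual = infrequency_score("user carol logged in", model)
unusual = infrequency_score("#!%$ /etc/passwd ;;--", model)
print(usual < unusual)
```

In this sketch a message built only from characters that are common in the training sample gets a low score, while one full of unseen symbols gets a high one.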

Then it adds sentiment analysis on top of that to show only messages with an overall negative meaning:

[Figure: Sentiment Analysis Scoring]
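
A lexicon-based sentiment filter could look roughly like the sketch below. The tiny lexicon, its polarity values and the mean-polarity scoring are hypothetical stand-ins chosen for illustration; Tarkin's real lexicon and scoring may differ.

```python
import re

# Hypothetical mini-lexicon: word -> polarity in [-1, 1].
LEXICON = {
    "failed": -0.8,
    "denied": -0.7,
    "error": -0.6,
    "attack": -0.9,
    "success": 0.7,
    "accepted": 0.5,
}


def sentiment_score(message, lexicon=LEXICON):
    """Mean polarity of the lexicon words found in a message (0.0 if none)."""
    words = re.findall(r"[a-z]+", message.lower())
    hits = [lexicon[w] for w in words if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def negative_only(messages, lexicon=LEXICON):
    """Keep only the messages whose overall sentiment is below zero."""
    return [m for m in messages if sentiment_score(m, lexicon) < 0]


logs = [
    "Accepted password for alice",
    "Failed password for root: access denied",
    "Connection closed",
]
print(negative_only(logs))
```

Messages with no lexicon words score 0.0 here and are therefore dropped together with the positive ones.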

The resulting output is an indicator of the "fear" perceived in each message by the system, which is used to filter out the ones below a threshold set by the model:

[Figure: System Output]
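
The final filtering step can be sketched as follows. How Tarkin actually combines the two scores is not described here, so the product used below, the `Scored` type and the fixed threshold are assumptions for illustration only.

```python
from typing import NamedTuple


class Scored(NamedTuple):
    message: str
    infrequency: float  # higher means more unusual characters
    sentiment: float    # below zero means negative meaning


def fear(item):
    """Hypothetical combination: unusualness weighted by negativity."""
    return item.infrequency * max(0.0, -item.sentiment)


def filter_by_fear(items, threshold):
    """Return (message, fear) pairs at or above the threshold, highest first."""
    kept = [(i.message, fear(i)) for i in items if fear(i) >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical, already-scored messages.
scored = [
    Scored("session opened for user alice", 2.5, 0.25),
    Scored("authentication failure for root", 3.0, -0.5),
    Scored("segfault in kernel module", 4.0, -0.75),
]
alerts = filter_by_fear(scored, threshold=1.0)
for message, value in alerts:
    print(f"{value:.2f}  {message}")
```

With this combination, a message with non-negative sentiment always gets a fear value of zero, whatever its infrequency.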

Requirements

You need Python 3.6.x or later to run Tarkin. You can have multiple Python versions (2.x and 3.x) installed on the same system without problems.

On Ubuntu, Mint and Debian you can install Python 3 like this:

$ sudo apt-get install python3 python3-pip

On OS X you can install Python using Homebrew like this:

$ brew install python3

For other Linux flavors and Windows, packages are available at

http://www.python.org/getit/

To run the project in your Python 3 environment, install the dependencies listed in the requirements.txt file. Creating a separate virtual environment is highly recommended (see "Working with virtualenv"). Run the following in a terminal window:

$ cd security-anomales-logs-data
$ pip install -r requirements.txt

Then download the spaCy English model with the following command:

$ python -m spacy download en

Working with virtualenv

If you are using virtualenv, make sure you are running a Python 3 environment. Installing via pip3 in a Python 2 environment will not configure the environment to run installed modules from the command line.

$ python3 -m pip install -U virtualenv
$ python3 -m virtualenv env
$ source ./env/bin/activate  # Enter into VirtualEnv
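
To confirm that the active interpreter is the one you expect, a quick check along these lines may help. The helper names are hypothetical; the sketch relies only on attributes of the standard `sys` module.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # venv and recent virtualenv releases make sys.prefix differ from
    # sys.base_prefix; older virtualenv releases set sys.real_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base


def python_ok(minimum=(3, 6)):
    """Return True when the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


print("virtual environment active:", in_virtualenv())
print("Python 3.6 or later:", python_ok())
```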

Quick start

There are several shell scripts available from the top level directory of the project:

  • build.sh: Initializes the environment creating the necessary folders and building the docker images.

The project can be run on your own machine with your own Python installation. You will first need to run the training script; then you can execute check.sh or check-demo.sh to analyze, respectively, files configured in the script itself or quoted sentences passed as command-line parameters.

  • train.sh: Starts the training of the letter frequency model, producing a letterspace.pkl binary file.
  • check.sh: Evaluates the infrequency and applies sentiment analysis to the logs of the file configured in the script.
  • check-demo.sh: Useful for demo purposes; evaluates the infrequency and applies sentiment analysis to a quoted sentence received as a script parameter. NOTICE: unlike check.sh, this script returns an evaluation result even if the sentiment score value is above 0.
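
As a rough illustration of the train-then-check workflow, the sketch below stores a trained model in a pickle file and loads it back. The internal layout of Tarkin's letterspace.pkl is not documented here, so the dictionary of character frequencies and the helper names are assumptions.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize the model to a binary pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Load a model back from a pickle file."""
    # Only unpickle files you created yourself: pickle can run arbitrary code.
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Hypothetical character frequencies standing in for a trained model.
model = {"a": 0.5, "b": 0.25, " ": 0.25}

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "letterspace.pkl")
    save_model(model, path)      # what a training step might do
    restored = load_model(path)  # what a checking step might do

print(restored == model)
```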

You can also run the dockerized version of the project, which is launched using the following equivalent shell scripts:

  • train-docker.sh
  • check-docker.sh
  • check-demo-docker.sh

Notebooks

The project includes a notebook to illustrate how the fear indicator is calculated. Before being able to run it, you'll need to execute the following commands from your virtual env:

$ python3 -m pip install jupyter seaborn matplotlib
$ jupyter notebook

Then, in your browser, navigate to Tarkin/notebooks from the Jupyter Home tree and open the file Log Mining.ipynb.

If you experience an error running the notebook cells, make sure you have executed the ./build.sh script, which sets up the project by building the docker images and downloading the default lexicon dictionary used by the notebook. If unsure, run it again.

Contributing

Feedback, ideas and contributions are welcome. For more details, please see the CONTRIBUTING.md file.

License

This project is distributed under the Apache License 2.0.
