
rapidsai / Clx

Licence: apache-2.0
A collection of RAPIDS examples for security analysts, data scientists, and engineers to quickly get started applying RAPIDS and GPU acceleration to real-world cybersecurity use cases.

Projects that are alternatives of or similar to Clx

Learning Vis Tools
Learning Vis Tools: Tutorial materials for Data Visualization course at HKUST
Stars: ✭ 108 (+0%)
Mutual labels:  jupyter-notebook
Openmorph
Curated list of open-access databases with human structural MRI data
Stars: ✭ 108 (+0%)
Mutual labels:  jupyter-notebook
Awesome Embedding Models
A curated list of awesome embedding models tutorials, projects and communities.
Stars: ✭ 1,486 (+1275.93%)
Mutual labels:  jupyter-notebook
Ml Demos
Python code examples for the feedly Machine Learning blog (https://blog.feedly.com/category/all/Machine-Learning/)
Stars: ✭ 108 (+0%)
Mutual labels:  jupyter-notebook
Isl Python
Porting the R code in ISL to python. Labs and exercises
Stars: ✭ 108 (+0%)
Mutual labels:  jupyter-notebook
Sas Viya Programming
Code samples and materials to help you learn to access SAS Viya services by writing programs in Python and other open-source languages
Stars: ✭ 107 (-0.93%)
Mutual labels:  jupyter-notebook
Python Machine Learning
Tous les codes utilisés dans la série YouTube Python Spécial Machine Learning !
Stars: ✭ 108 (+0%)
Mutual labels:  jupyter-notebook
Fin Ml
This github repository contains the code to the case studies in the O'Reilly book Machine Learning and Data Science Blueprints for Finance
Stars: ✭ 107 (-0.93%)
Mutual labels:  jupyter-notebook
Psgan
Periodic Spatial Generative Adversarial Networks
Stars: ✭ 108 (+0%)
Mutual labels:  jupyter-notebook
Dask Tutorial
Dask tutorial
Stars: ✭ 1,591 (+1373.15%)
Mutual labels:  jupyter-notebook
Py Wsi
Python package for dealing with whole slide images (.svs) for machine learning, particularly for fast prototyping. Includes patch sampling and storing using OpenSlide. Patches may be stored in LMDB, HDF5 files, or to disk. It is highly recommended to fork and download this repository so that personal customisations can be made for your work.
Stars: ✭ 107 (-0.93%)
Mutual labels:  jupyter-notebook
Lda2vec Pytorch
Topic modeling with word vectors
Stars: ✭ 108 (+0%)
Mutual labels:  jupyter-notebook
Xgboost
Tutorial how to use xgboost
Stars: ✭ 108 (+0%)
Mutual labels:  jupyter-notebook
Ultra96 Pynq
Board files to build Ultra 96 PYNQ image
Stars: ✭ 108 (+0%)
Mutual labels:  jupyter-notebook
Dtreeviz
A python library for decision tree visualization and model interpretation.
Stars: ✭ 1,857 (+1619.44%)
Mutual labels:  jupyter-notebook
Robustness applications
Notebooks for reproducing the paper "Computer Vision with a Single (Robust) Classifier"
Stars: ✭ 108 (+0%)
Mutual labels:  jupyter-notebook
Deep Ml Meetups
A central repository for all my projects
Stars: ✭ 108 (+0%)
Mutual labels:  jupyter-notebook
Shot Type Classifier
Detecting cinema shot types using a ResNet-50
Stars: ✭ 109 (+0.93%)
Mutual labels:  jupyter-notebook
Math For Programmers
Source code for the book, Math for Programmers
Stars: ✭ 107 (-0.93%)
Mutual labels:  jupyter-notebook
Pydataseattle
For the pandas tutorial at PyData Seattle: https://www.youtube.com/watch?v=otCriSKVV_8
Stars: ✭ 108 (+0%)
Mutual labels:  jupyter-notebook

  Cyber Log Accelerators (CLX)

NOTE: For the latest stable README.md, ensure you are on the main branch.

CLX ("clicks") provides a collection of RAPIDS examples for security analysts, data scientists, and engineers to quickly get started applying RAPIDS and GPU acceleration to real-world cybersecurity use cases.

The goals of CLX are to:

  1. Allow cyber data scientists and SecOps teams to generate workflows, using cyber-specific GPU-accelerated primitives and methods, that let them interact with code using security language,
  2. Make available pre-built use cases that demonstrate CLX and RAPIDS functionality that are ready to use in a Security Operations Center (SOC),
  3. Accelerate log parsing in a flexible, non-regex manner, and
  4. Provide SIEM integration with GPU compute environments via RAPIDS and effectively extend the SIEM environment.

Getting Started with Python and Notebooks

CLX is targeted towards cybersecurity data scientists, senior security analysts, threat hunters, and forensic investigators. Data scientists can use CLX in traditional Python files and Jupyter notebooks; the notebooks folder contains example use cases and workflow instantiations. It's also easy to get started with CLX and RAPIDS from Python. The code below reads cyber alerts, aggregates them by day, and calculates a rolling z-score across multiple days to look for outliers in alert volumes. Expanded code is available in the alert analysis notebook.

import cudf
import s3fs
from os import path

# download data
if not path.exists("./splunk_faker_raw4"):
    fs = s3fs.S3FileSystem(anon=True)
    fs.get("rapidsai-data/cyber/clx/splunk_faker_raw4", "./splunk_faker_raw4")

# read in alert data
gdf = cudf.read_csv('./splunk_faker_raw4')
gdf.columns = ['raw']

# parse the alert data using CLX built-in parsers
from clx.parsers.splunk_notable_parser import SplunkNotableParser

snp = SplunkNotableParser()
parsed_gdf = snp.parse(gdf, 'raw')

# define function to round time to the day
def round2day(epoch_time):
    return int(epoch_time/86400)*86400

# aggregate alerts by day
parsed_gdf['time'] = parsed_gdf['time'].astype(int)
parsed_gdf['day'] = parsed_gdf.time.applymap(round2day)
day_rule_gdf = parsed_gdf[['search_name','day','time']].groupby(['search_name', 'day']).count().reset_index()
day_rule_gdf.columns = ['rule', 'day', 'count']

# import the rolling z-score function from CLX statistics
from clx.analytics.stats import rzscore

# pivot the alert data so each rule is a column
def pivot_table(gdf, index_col, piv_col, v_col):
    index_list = gdf[index_col].unique()
    piv_gdf = cudf.DataFrame()
    piv_gdf[index_col] = index_list
    for group in gdf[piv_col].unique():
        temp_df = gdf[gdf[piv_col] == group]
        temp_df = temp_df[[index_col, v_col]]
        temp_df.columns = [index_col, group]
        piv_gdf = piv_gdf.merge(temp_df, on=[index_col], how='left')
    piv_gdf = piv_gdf.set_index(index_col)
    return piv_gdf.sort_index()

alerts_per_day_piv = pivot_table(day_rule_gdf, 'day', 'rule', 'count').fillna(0)

# create a new cuDF with the rolling z-score values calculated
r_zscores = cudf.DataFrame()
for rule in alerts_per_day_piv.columns:
    x = alerts_per_day_piv[rule]
    r_zscores[rule] = rzscore(x, 7)  # 7-day window
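
With the rolling z-scores in hand, unusually busy (or quiet) days can be surfaced by thresholding the absolute z-score. The snippet below is only an illustration of that idea; the threshold of 2 is an arbitrary example value, not a CLX default.

# illustration only: count days where each rule's alert volume deviates
# by more than 2 rolling standard deviations from its 7-day trend
threshold = 2.0
for rule in r_zscores.columns:
    n_outlier_days = int((r_zscores[rule].abs() > threshold).sum())
    print(rule, n_outlier_days, "outlier day(s)")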

Getting Started With Workflows

In addition to traditional Python files and Jupyter notebooks, CLX also includes structure in the form of a workflow. A workflow is a series of data transformations performed on a GPU dataframe that contains raw cyber data, with the goal of surfacing meaningful cyber analytical output. Multiple I/O methods are available, including Kafka and on-disk file stores.

Example workflow that reads from and writes to a file:

from clx.workflow import netflow_workflow

source = {
   "type": "fs",
   "input_format": "csv",
   "input_path": "/path/to/input",
   "schema": ["firstname","lastname","gender"],
   "delimiter": ",",
   "required_cols": ["firstname","lastname","gender"],
   "dtype": ["str","str","str"],
   "header": "0"
}
dest = {
   "type": "fs",
   "output_format": "csv",
   "output_path": "/path/to/output"
}
wf = netflow_workflow.NetflowWorkflow(source=source, destination=dest, name="my-netflow-workflow")
wf.run_workflow()
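
Beyond the pre-built workflows such as NetflowWorkflow, a custom transformation can be defined by subclassing the CLX Workflow base class and overriding its workflow method, which receives a batch as a GPU dataframe and returns the transformed frame. The sketch below is a minimal example under that assumption; the name-enrichment logic is purely illustrative and reuses the source and dest configurations above.

from clx.workflow.workflow import Workflow

class EnrichNameWorkflow(Workflow):
    # Assumed contract: the base class reads batches from the configured
    # source, passes each one to workflow(), and writes the returned
    # dataframe to the configured destination.
    def workflow(self, dataframe):
        # example enrichment only: combine first and last names
        dataframe["fullname"] = dataframe["firstname"].str.cat(
            dataframe["lastname"], sep=" "
        )
        return dataframe

wf = EnrichNameWorkflow(source=source, destination=dest, name="my-custom-workflow")
wf.run_workflow()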

For additional examples, browse our complete API documentation, or check out our more detailed notebooks.

Getting CLX

Intro

There are four ways to get CLX:

  1. Quick Start
  2. Build CLX Docker Image
  3. Conda Installation
  4. Build from Source

Quick Start

Please see the Demo Docker Repository, choosing a tag based on the NVIDIA CUDA version you’re running. This provides a ready-to-run Docker container with CLX and its dependencies already installed.

Pull image:

docker pull rapidsai/rapidsai-clx-nightly:0.18-cuda11.0-runtime-ubuntu18.04-py3.7

Start CLX container

Preferred - Docker CE v19+ and nvidia-container-toolkit
docker run -it --gpus '"device=0"' \
  --rm -d \
  -p 8888:8888 \
  -p 8787:8787 \
  -p 8686:8686 \
  rapidsai/rapidsai-clx-nightly:0.18-cuda11.0-runtime-ubuntu18.04-py3.7
Legacy - Docker CE v18 and nvidia-docker2
docker run -it --runtime=nvidia \
  --rm -d \
  -p 8888:8888 \
  -p 8787:8787 \
  -p 8686:8686 \
  rapidsai/rapidsai-clx-nightly:0.18-cuda11.0-runtime-ubuntu18.04-py3.7

Container Ports

The following ports are used by the runtime containers only (not base containers):

  • 8888 - exposes a JupyterLab notebook server
  • 8786 - exposes a Dask scheduler
  • 8787 - exposes a Dask diagnostic web server

Build CLX Docker Image

Prerequisites

  • NVIDIA Pascal™ GPU architecture or better
  • CUDA 10.1+ compatible NVIDIA driver
  • Ubuntu 16.04/18.04 or CentOS 7
  • Docker CE v18+
  • nvidia-docker v2+

Pull the RAPIDS image suitable for your environment and build the CLX image. Please see the rapidsai-dev-nightly Docker repository, choosing a tag based on the NVIDIA CUDA version you’re running. More information on getting started with RAPIDS can be found here.

docker pull rapidsai/rapidsai-dev-nightly:0.18-cuda10.1-devel-ubuntu18.04-py3.7
docker build -t clx:latest .

Docker Container without SIEM Integration

Start the container and the notebook server. There are multiple ways to do this, depending on what version of Docker you have.

Preferred - Docker CE v19+ and nvidia-container-toolkit
docker run -it --gpus '"device=0"' \
  --rm -d \
  -p 8888:8888 \
  -p 8787:8787 \
  -p 8686:8686 \
  clx:latest
Legacy - Docker CE v18 and nvidia-docker2
docker run -it --runtime=nvidia \
  --rm -d \
  -p 8888:8888 \
  -p 8787:8787 \
  -p 8686:8686 \
  clx:latest

The container includes convenience scripts to start and stop JupyterLab.

# Start JupyterLab
/rapids/utils/start_jupyter.sh

# Stop JupyterLab
/rapids/utils/stop_jupyter.sh

Docker Container with SIEM Integration

The following steps show how to use docker-compose to create a CLX environment ready for SIEM integration, starting multiple containers that run CLX, Kafka, and Zookeeper.

First, make sure docker-compose and the NVIDIA container runtime (nvidia-docker v2+, listed in the prerequisites above) are installed.

Add the following to /etc/docker/daemon.json if not already there:

runtimes": {
        "nvidia": {
                "path": "/usr/bin/nvidia-container-runtime",
                "runtimeArgs": []
        }
}

Run the following to start your containers. Modify port mappings in docker-compose.yml if there are port conflicts.

docker-compose up

By default, all GPUs in your system will be visible to your CLX container. To choose which GPUs you want visible, you can add the following to the clx section of your docker-compose.yml:

environment:
      - NVIDIA_VISIBLE_DEVICES=0,1

Conda Install

It is easy to install CLX using conda. You can get a minimal conda installation with Miniconda or get the full installation with Anaconda.

Install and update CLX using the conda command:

conda install -c rapidsai-nightly -c nvidia -c pytorch -c conda-forge -c defaults clx
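
After installation, a quick sanity check is to import the CLX modules used in the examples above from a Python session:

# verify that CLX and the modules used earlier import cleanly
import clx
from clx.parsers.splunk_notable_parser import SplunkNotableParser
from clx.analytics.stats import rzscore
print("CLX imported successfully")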

Building from Source and Contributing

For contributing guidelines, please reference our guide for contributing.

Documentation

Python API documentation can be found here or generated from the docs directory.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].