Ironholds / reconstructr

Licence: other
Tidy tools for session reconstruction and analysis

Programming languages: R, C++

Session reconstruction and analysis in R

Author: Os Keyes
License: MIT
Status: Stable

Description

A well-studied part of web analytics and human-computer interaction is the concept of a "session": a series of linked user actions. This is used for anything from evaluating the impact of design or engineering changes on users, to providing common, high-level metrics such as time-on-page or bounce rate.

reconstructr is a library designed to efficiently reconstruct sessions from a series of user events and then generate common metrics from that session-based data, including bounce rate, session length and time-on-page. It makes heavy internal use of C++ so that it stays fast on datasets containing millions or tens of millions of events, and each function offers a wide range of options, allowing you to customise both what data is produced and what data is evaluated. For more information, see the introductory vignette.

The package is under active development: if you find bugs or have suggestions for new features, please feel free to report them.

Usage

So you've got a session dataset, which we'll call, well, session_dataset. It looks like this:

library(reconstructr)
data("session_dataset")
str(session_dataset)

# 'data.frame':	63524 obs. of  3 variables:
#  $ uuid     : chr  "47dc43895814861e21a2edf93348c826" "a736822df1890011694e7049cb3abef3" "674d2d00e096a3319874a4347caa1f4a" "f62d315398e6d04a3f2fa02e8ae42d49" ...
#  $ timestamp: POSIXlt, format: "2014-01-07 00:00:15" "2014-01-07 00:01:11" "2014-01-07 00:01:54" ...
#  $ url      : chr  "https://www.nasa.gov/history/mercury/mercury.html" "https://www.nasa.gov/images/ksclogosmall.gif" "https://www.nasa.gov/elv/hot.gif" "https://www.nasa.gov/facts/faq04.html" ...

You have timestamps, you have UUIDs for each user, and you have the URL (or any other metadata you might need!). What you really want to do is divide the data into 'sessions': distinguishable blocks of browsing activity by a single user. For this we use sessionise, passing it the dataset, the column names for timestamps and user IDs, and optionally a threshold: the number of seconds of inactivity after which a user is considered to have started a new session. By default this is 3600 (1 hour):

sessionised_data <- sessionise(session_dataset, "timestamp", "uuid")

str(sessionised_data)
# 'data.frame':	63524 obs. of  5 variables:
#  $ uuid      : chr  "0005839b3e8483d50870f61f50307fa7" "000b047bad36484451f12c114ab5eb28" "000b047bad36484451f12c114ab5eb28" "000b047bad36484451f12c114ab5eb28" ...
#  $ timestamp : POSIXlt, format: "2014-01-14 12:47:59" "2014-01-07 14:25:11" "2014-01-09 12:47:17" ...
#  $ url       : chr  "https://www.nasa.gov/history/apollo/images/footprint-logo.gif" "https://www.nasa.gov/ksc.html" "https://www.nasa.gov/biomed/threat/gif/beachmousefinsmall.gif" "https://www.nasa.gov/shuttle/resources/orbiters/atlantis.html" ...
#  $ session_id: chr  "9c77ea18bbef377253be1b22957071c1" "eda2ec544d96f0f1e3271902cbb693b7" "ee6d08bdaf1fb3c28edd0ac3290b82f5" "ee6d08bdaf1fb3c28edd0ac3290b82f5" ...
#  $ time_delta: int  NA NA NA 45 4 75 274 47 NA 28 ...

This adds two new columns: a unique ID for each session and, for each event, the time elapsed since the previous event in the same session (NA for the first event of each session, as in the output above).
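
To make the threshold rule concrete, here is a minimal base-R sketch of gap-based session reconstruction. It is a hypothetical re-implementation for illustration only: `sessionise_sketch` is not part of reconstructr, it numbers sessions sequentially rather than producing hashed IDs like those shown above, and it assumes a POSIXct timestamp column and that a gap strictly greater than the threshold starts a new session (whether reconstructr uses `>` or `>=` is an assumption here).

```r
# Hypothetical sketch of threshold-based session reconstruction in base R.
# Not reconstructr's implementation: session IDs here are sequential integers.
# Assumes the timestamp column is POSIXct (convert POSIXlt with as.POSIXct()).
sessionise_sketch <- function(x, timestamp, user_id, threshold = 3600) {
  # Sort by user, then time, so each user's events are contiguous and ordered
  x <- x[order(x[[user_id]], x[[timestamp]]), , drop = FALSE]
  ts <- as.numeric(x[[timestamp]])  # seconds since the epoch
  users <- x[[user_id]]
  n <- nrow(x)

  # Gap (in seconds) to the previous row; NA where the previous row is another user
  gap <- c(NA, diff(ts))
  gap[c(TRUE, users[-1] != users[-n])] <- NA

  # A new session starts at a user's first event or after a gap above the threshold
  new_session <- is.na(gap) | gap > threshold
  x$session_id <- cumsum(new_session)
  # Time since the previous event in the same session; NA at each session start
  x$time_delta <- ifelse(new_session, NA, gap)
  x
}

# Toy example: user "a" is idle for about two hours between its 2nd and 3rd events
events <- data.frame(
  uuid = c("a", "a", "a", "b"),
  timestamp = as.POSIXct(c("2014-01-07 00:00:00", "2014-01-07 00:00:45",
                           "2014-01-07 02:00:00", "2014-01-07 00:10:00"),
                         tz = "UTC"),
  stringsAsFactors = FALSE
)
out <- sessionise_sketch(events, "timestamp", "uuid")
out$session_id  # 1 1 2 3
out$time_delta  # NA 45 NA NA
```

Raising the threshold merges sessions: with `threshold = 8000`, user "a"'s 7155-second gap no longer splits its activity, so its three events fall into a single session.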

From this we can calculate a lot of common session-related metrics:

# Number of sessions per user
sess_count <- session_count(sessionised_data, "uuid")
str(sess_count)

# 'data.frame':	10000 obs. of  2 variables:
#  $ user_id      : chr  "0005839b3e8483d50870f61f50307fa7" "000b047bad36484451f12c114ab5eb28" "000b2bc1a5438d8d54d4fbec139a2fd5" "001b6e80a14ba8d809c4ff18cdbade40" ...
#  $ session_count: int  1 2 1 1 1 6 1 1 1 1 ...

# Length of each session
sess_length <- session_length(sessionised_data)
str(sess_length)

# 'data.frame':	20820 obs. of  2 variables:
#  $ session_id    : chr  "0000664732878ba3409c138d4870a42d" "00029b1cd83040b8e14d7d65e057029e" "0002e5a2e75610bfb6c0598ea228a9d1" "00097364d131b6d6580d3c69a3e0a868" ...
#  $ session_length: int  0 62 101 0 83 7 3113 0 4071 0 ...

# The 'bounce rate' (overall or per user!)

sess_bounce <- bounce_rate(sessionised_data)
str(sess_bounce)
# num 18.9

sess_bounce <- bounce_rate(sessionised_data, "uuid")
str(sess_bounce)
# 'data.frame':	10000 obs. of  2 variables:
#  $ user_id    : chr  "0005839b3e8483d50870f61f50307fa7" "000b047bad36484451f12c114ab5eb28" "000b2bc1a5438d8d54d4fbec139a2fd5" "001b6e80a14ba8d809c4ff18cdbade40" ...
#  $ bounce_rate: num  100 14.3 0 100 100 ...

# And many others
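
The metrics above have simple definitions that can be sketched in base R. The definitions used here are assumptions for illustration: session length as the seconds between the first and last event of a session (consistent with the zero-length sessions above), sessions per user as the number of distinct session IDs, and bounce rate as the percentage of sessions containing exactly one event (consistent with the percentage-scale output above). reconstructr's own functions may differ in details, and the `_sketch` helpers are hypothetical, not part of the package.

```r
# Hypothetical base-R sketches of session metrics (not reconstructr's code).
# Input: a data.frame with user, session and POSIXct timestamp columns.

# Seconds between the first and last event of each session (0 for one-event sessions)
session_length_sketch <- function(x, session = "session_id", timestamp = "timestamp") {
  ts <- as.numeric(x[[timestamp]])
  len <- tapply(ts, x[[session]], function(t) max(t) - min(t))
  data.frame(session_id = names(len), session_length = as.vector(len),
             stringsAsFactors = FALSE)
}

# Number of distinct sessions per user
session_count_sketch <- function(x, user = "uuid", session = "session_id") {
  counts <- tapply(x[[session]], x[[user]], function(s) length(unique(s)))
  data.frame(user_id = names(counts), session_count = as.vector(counts),
             stringsAsFactors = FALSE)
}

# Bounce rate: percentage of sessions that contain exactly one event
bounce_rate_sketch <- function(x, session = "session_id") {
  events_per_session <- table(x[[session]])
  100 * mean(events_per_session == 1)
}

# Toy sessionised data with the columns used above (invented values)
toy <- data.frame(
  uuid = c("a", "a", "a", "b"),
  session_id = c("s1", "s1", "s2", "s3"),
  timestamp = as.POSIXct(c("2014-01-07 00:00:00", "2014-01-07 00:00:45",
                           "2014-01-07 02:00:00", "2014-01-07 00:10:00"),
                         tz = "UTC"),
  stringsAsFactors = FALSE
)
session_length_sketch(toy)  # s1: 45, s2: 0, s3: 0
session_count_sketch(toy)   # a: 2, b: 1
bounce_rate_sketch(toy)     # 66.66667 (2 of 3 sessions have a single event)
```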

Installation

For the current release version:

install.packages("reconstructr")

For the development version:

library(devtools)
install_github("ironholds/reconstructr")

Dependencies
