Licence: MIT license
Hayabusa

Hayabusa: A Simple and Fast Full-Text Search Engine for Massive System Log Data

Concept

  • Pure Python implementation
  • Parallel SQLite processing engine
  • SQLite3 FTS (Full-Text Search)
  • Core-scale architecture (search work is spread across CPU cores)

Architecture

  • Design of the directory structure

    • Logs are stored in one database file per minute, at a path of the form "directory path + yyyy + mm + dd + hh + min.db". The search program can therefore select a time range simply by choosing which paths to open.
    /targetdir/yyyy/mm/dd/hh/min.db
    
  • StoreEngine

    • sample code
    import os.path
    import sqlite3

    db_file = 'test.db'
    log_file = '1m.log'

    # Create the FTS3 virtual table the first time the database is used.
    if not os.path.exists(db_file):
        conn = sqlite3.connect(db_file)
        conn.execute("CREATE VIRTUAL TABLE SYSLOG USING FTS3(LOGS)")
        conn.close()

    conn = sqlite3.connect(db_file)

    # Insert each log line as one row and commit them in a single transaction.
    with open(log_file) as fh:
        lines = [[line] for line in fh]
        conn.executemany('INSERT INTO SYSLOG VALUES ( ? )', lines)
        conn.commit()

    conn.close()
    
  • SearchEngine

    • sample command
    $ python search_engine.py -h
    usage: search_engine.py [-h] [--time TIME] [--match MATCH] [-c] [-s] [-v]
    
    optional arguments:
      -h, --help     show this help message and exit
      --time TIME    time explain regexp(YYYY/MM/DD/HH/MIN). eg: 2017/04/27/10/*
      --match MATCH  matching keyword. eg: noc or 'noc Login'
      -e             exact match
      -c             count
      -s             sum
      -v             verbose
     
    $ python search_engine.py --time 2017/05/11/13/* --match 'keyword' -c
    
  • Architecture image (figure: Hayabusa Architecture)
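The layout, store, and search steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the helper names (`db_path`, `store`, `count_matches`, `search_count`) are hypothetical, zero-padded path components are assumed, and a Python thread pool stands in for GNU Parallel, which the project lists as a dependency.

```python
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime
from pathlib import Path


def db_path(target_dir, ts):
    """Map a timestamp to <target_dir>/yyyy/mm/dd/hh/min.db (zero-padding assumed)."""
    return Path(target_dir) / ts.strftime("%Y/%m/%d/%H/%M.db")


def store(target_dir, ts, lines):
    """Append log lines to the per-minute FTS3 database for `ts`."""
    path = db_path(target_dir, ts)
    path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(str(path))
    try:
        conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS SYSLOG USING FTS3(LOGS)")
        conn.executemany("INSERT INTO SYSLOG VALUES (?)", [[line] for line in lines])
        conn.commit()
    finally:
        conn.close()


def count_matches(db_file, expr):
    """Count rows in one database whose LOGS column matches the FTS expression."""
    conn = sqlite3.connect(str(db_file))
    try:
        (n,) = conn.execute(
            "SELECT count(*) FROM SYSLOG WHERE LOGS MATCH ?", (expr,)
        ).fetchone()
        return n
    finally:
        conn.close()


def search_count(target_dir, time_pattern, expr, workers=4):
    """Select databases by a path pattern such as '2017/05/11/13/*' and sum the counts."""
    db_files = sorted(Path(target_dir).glob(time_pattern + ".db"))
    # Each worker opens its own connection, so no connection is shared across threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(lambda f: count_matches(f, expr), db_files))
    return sum(counts)


with tempfile.TemporaryDirectory() as root:
    store(root, datetime(2017, 5, 11, 13, 0), ["noc Login ok", "disk full"])
    store(root, datetime(2017, 5, 11, 13, 1), ["noc Login failed", "noc logout"])
    store(root, datetime(2017, 5, 11, 14, 0), ["noc Login ok"])
    # Only the two databases under hour 13 are opened.
    total = search_count(root, "2017/05/11/13/*", "noc Login")
    print(total)  # 2
```

Because each minute lives in its own file, narrowing the time range reduces the number of files opened rather than the amount of data scanned inside one large database, and the per-file queries are independent of one another.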

Search condition

  • Case-insensitive

    • Uppercase and lowercase letters are not distinguished.
  • Exact match

    -e --match '192.168.0.1'
    
  • AND

    --match 'Hello World'
    
  • OR

    --match 'Hello OR World'
    
  • NOT

    --match 'Hello World -Wide'
    
  • PHRASE

    --match '"Hello World"'
    --match '\"192.168.0.1\"' <- IP address (same as the -e flag)
    --match '\"192.168.0.1\" src sent' <- PHRASE + AND search
    
  • Asterisk (*) prefix match

    --match 'H* World'
    
  • Hat (^)

    --match '^Hello World'
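
Several of the conditions above can be tried directly against an in-memory FTS3 table. The sketch below assumes the `--match` value is passed through to SQLite's `MATCH` operator largely unchanged, which this document does not confirm; the sample rows and the `search` helper are made up for illustration. The NOT form (`-Wide`) is left out because its behaviour depends on whether the SQLite build uses the standard or the enhanced FTS query syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE SYSLOG USING FTS3(LOGS)")
rows = [
    "Hello World",
    "hello wide world",
    "Hello there",
    "World news",
    "connection from 192.168.0.1 src sent",
]
conn.executemany("INSERT INTO SYSLOG VALUES (?)", [[r] for r in rows])


def search(expr):
    """Return the log lines matching an FTS query expression, in insertion order."""
    cur = conn.execute(
        "SELECT LOGS FROM SYSLOG WHERE LOGS MATCH ? ORDER BY rowid", (expr,)
    )
    return [r[0] for r in cur]


and_hits = search("Hello World")        # AND: both terms, in any position
or_hits = search("Hello OR World")      # OR: either term
phrase_hits = search('"Hello World"')   # PHRASE: adjacent terms in this order
prefix_hits = search("H* World")        # prefix: a term starting with "h", plus "world"
ip_hits = search('"192.168.0.1"')       # phrase over the tokens 192, 168, 0, 1

for name, hits in [("AND", and_hits), ("OR", or_hits), ("PHRASE", phrase_hits),
                   ("PREFIX", prefix_hits), ("IP", ip_hits)]:
    print(name, hits)
```

The default tokenizer folds ASCII letters to lowercase and splits on non-alphanumeric characters, which is why matching is case-insensitive and why an IP address is best searched as a phrase.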
    

Development environment

  • CentOS 7.3
  • Python 3.5.1 (from the Anaconda distribution)
  • SQLite3 (version 3.9.2)

Dependencies

  • Python 3
  • SQLite3
  • GNU Parallel

Benchmark

Comparison with Apache Spark

  • Hayabusa and Spark time comparison

  • Comparison of a distributed Spark environment and stand-alone Hayabusa
