Qihoo360 / Poseidon

License: BSD-3-Clause
A search engine that can hold 100 trillion lines of log data.

Programming Languages

  • Go
  • Java
  • Roff
  • Protocol Buffer
  • Shell
  • Makefile

Poseidon (波塞冬)

Poseidon is the god of the sea in Greek mythology; here the name stands for mastery over a sea of data.

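Poseidon is a log search platform that can quickly analyze and retrieve specific strings from hundreds of trillions of log lines totaling hundreds of petabytes. 360 is a security company, and tracking APT (advanced persistent threat) incidents often requires searching massive volumes of historical logs for specific information, such as the activity of a malicious sample over a given period. Before Poseidon, each such search was written as a Map/Reduce job and run on a Hadoop cluster. A single job took anywhere from several hours to several days, which severely limited how efficiently APT incidents could be tracked.

Poseidon was built to meet this need: it can find the required data in a dataset of hundreds of trillions of lines within a few seconds, greatly improving productivity. The data needs no additional storage and stays in the Hadoop cluster, which saves substantial storage and compute resources. The system is applicable to query and retrieval over any massive structured or unstructured dataset (from trillions to quadrillions of records).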

Quick Start

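Technologies Used

  • Inverted index: the core technology for building the log search engine
  • Hadoop: stores the raw data and the index data, and runs the Map/Reduce programs that build the index
  • Java: the index is built by Map/Reduce programs written in Java
  • Golang: the search programs are written in Golang
  • Redis/Memcached: store the Meta (metadata) information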
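
As a rough illustration of the inverted-index idea named in the list above, here is a minimal in-memory sketch in Go. It is not Poseidon's implementation: the names `InvertedIndex`, `BuildIndex`, and `Search`, the whitespace tokenizer, and the sample log lines are all assumptions made for this example. Per the list above, the real index is built by Map/Reduce jobs and stored in Hadoop.

```go
package main

import (
	"fmt"
	"strings"
)

// InvertedIndex maps each token to the ascending IDs of the log lines
// that contain it. (Hypothetical type, for illustration only.)
type InvertedIndex map[string][]int

// BuildIndex splits every line on whitespace and records, for each token,
// the line IDs in which it occurs. A line is listed at most once per token.
func BuildIndex(lines []string) InvertedIndex {
	idx := make(InvertedIndex)
	for id, line := range lines {
		seen := make(map[string]bool)
		for _, tok := range strings.Fields(strings.ToLower(line)) {
			if seen[tok] {
				continue
			}
			seen[tok] = true
			idx[tok] = append(idx[tok], id) // IDs arrive in ascending order
		}
	}
	return idx
}

// Search returns the IDs of the lines that contain every given token,
// computed by intersecting the sorted posting lists.
func (idx InvertedIndex) Search(tokens ...string) []int {
	if len(tokens) == 0 {
		return nil
	}
	result := idx[strings.ToLower(tokens[0])]
	for _, tok := range tokens[1:] {
		result = intersect(result, idx[strings.ToLower(tok)])
	}
	return result
}

// intersect merges two ascending ID lists and keeps the common IDs.
func intersect(a, b []int) []int {
	var out []int
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]:
			out = append(out, a[i])
			i++
			j++
		case a[i] < b[j]:
			i++
		default:
			j++
		}
	}
	return out
}

func main() {
	// Made-up sample log lines.
	logs := []string{
		"2016-01-01 host1 sample.exe started",
		"2016-01-01 host2 other.exe started",
		"2016-01-02 host1 sample.exe stopped",
	}
	idx := BuildIndex(logs)
	for _, id := range idx.Search("host1", "sample.exe") {
		fmt.Println(id, logs[id])
	}
}
```

A lookup only touches the posting lists of the queried tokens instead of scanning every log line, which is the property that makes an index-based search so much faster than a full Map/Reduce scan.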

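Directory Structure

builder

This directory holds the data generation tools.

  • doc: converts raw logs into Poseidon-format data.
  • docmeta: a tool that writes Doc-related metadata into the NoSQL store.
  • index: generates inverted index data from raw logs; it is a Hadoop Map/Reduce job.
  • indexmeta: a tool that writes the inverted index metadata into the NoSQL store.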

common

Currently holds only the protobuf definitions used by this project.

docs

Holds the related technical documentation.

service

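This directory holds the programs for the individual HTTP microservices.

  • hdfsreader: reads a segment of data from a given file path in HDFS.
    • /service/hdfsreader
  • idgenerator: the global ID generation center
    • /service/idgenerator
  • meta: provides a unified HTTP interface to the NoSQL store that holds the Meta information
    • /service/meta/business/doc/get : query interface for DocGzMeta information
    • /service/meta/business/doc/set : update interface for DocGzMeta information
    • /service/meta/business/index/get : query interface for InvertedIndexGzMeta information
    • /service/meta/business/index/set : update interface for InvertedIndexGzMeta information
  • searcher: the core search service of the Poseidon search engine
  • proxy: a proxy for searcher that also supports queries spanning multiple time ranges
  • allinone: to simplify deployment, bundles the four microservices idgenerator/meta/searcher/proxy into a single process that exposes a unified service interface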

Other

  • QQ discussion group: 21557451