codeplea / Ahocorasickphp

Licence: zlib
Aho-Corasick multi-keyword string searching library in PHP.

Aho Corasick in PHP

This is a small library which implements the Aho-Corasick string search algorithm.

It's coded in pure PHP and self-contained in a single file, ahocorasick.php.

It's useful when you want to search for many keywords all at once. It's faster than simply calling strpos many times, and it's much faster than calling preg_match_all with something like /keyword1|keyword2|...|keywordn/.
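
For comparison, here is a rough sketch of what "calling strpos many times" could look like. The `naive_search()` function is a hypothetical helper written for this illustration and is not part of the library; it returns matches in the same `[keyword, position]` shape that the library's `search()` uses.

```php
<?php
// Hypothetical baseline, not part of ahocorasick.php:
// scan the whole text once per keyword with strpos().
function naive_search(array $needles, string $haystack): array
{
    $found = [];
    foreach ($needles as $needle) {
        if ($needle === '') {
            continue; // an empty keyword would match everywhere
        }
        $offset = 0;
        // Restart strpos() just past the previous hit to catch every occurrence.
        while (($pos = strpos($haystack, $needle, $offset)) !== false) {
            $found[] = [$needle, $pos];
            $offset = $pos + 1;
        }
    }
    return $found;
}

$found = naive_search(['art', 'cart', 'ted'], 'a carted mart lot one blue ted');
// $found is [['art', 3], ['art', 10], ['cart', 2], ['ted', 5], ['ted', 27]]
```

Every keyword triggers its own pass over the text, so the work grows with the number of keywords. The results also come back grouped by keyword rather than ordered by position.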

I originally wrote this to use with F5Bot, since it's searching for the same set of a few thousand keywords over and over again.

Usage

It's designed to be really easy to use. You create the ahocorasick object, add your keywords, call finalize() to finish setup, and then search your text. The search returns an array of the keywords found and their positions in the search text.

Create, add keywords, and finalize():

require('ahocorasick.php');

$ac = new ahocorasick();

// Register each keyword to search for.
$ac->add_needle('art');
$ac->add_needle('cart');
$ac->add_needle('ted');

// Finish setup once all keywords have been added.
$ac->finalize();

Call search() to perform the actual search. It'll return an array of matches.

$found = $ac->search('a carted mart lot one blue ted');
print_r($found);

$found will be an array with these elements, each pairing a matched keyword with its zero-based offset in the search text:

[0] => Array
    (
        [0] => cart
        [1] => 2
    )
[1] => Array
    (
        [0] => art
        [1] => 3
    )
[2] => Array
    (
        [0] => ted
        [1] => 5
    )
[3] => Array
    (
        [0] => art
        [1] => 10
    )
[4] => Array
    (
        [0] => ted
        [1] => 27
    )

See example.php for a complete example.
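
As a small illustration of consuming that result, the sketch below groups the offsets by keyword. The `$found` array is copied by hand from the output shown above so the snippet runs on its own; the `$positions` variable is just a name chosen for this example.

```php
<?php
// Result array copied from the output shown above.
$found = [
    ['cart', 2],
    ['art', 3],
    ['ted', 5],
    ['art', 10],
    ['ted', 27],
];

// Collect every offset under the keyword that matched there.
$positions = [];
foreach ($found as [$keyword, $offset]) {
    $positions[$keyword][] = $offset;
}

foreach ($positions as $keyword => $offsets) {
    echo $keyword, ': ', implode(', ', $offsets), PHP_EOL;
}
// cart: 2
// art: 3, 10
// ted: 5, 27
```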

Speed

A simple benchmarking program is included which compares various alternatives.

$ php benchmark.php
Loaded 3000 keywords to search on a text of 19377 characters.

Searching with strpos...
time: 0.38440799713135

Searching with preg_match...
time: 5.6817619800568

Searching with preg_match_all...
time: 5.0735609531403

Searching with aho corasick...
time: 0.054709911346436

Note: the regex solutions are actually slightly broken. They won't work if you have a keyword that is a prefix or suffix of another. But hey, who really uses regex when it's not slightly broken?
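
One way this can show up, using the keywords from the usage example: a regex match consumes the text it covers, so a keyword that sits inside or overlaps an earlier match is never reported. This is a standalone sketch and not the code in benchmark.php.

```php
<?php
$text = 'a carted mart lot one blue ted';

// Alternation of the three keywords from the usage example.
preg_match_all('/art|cart|ted/', $text, $m, PREG_OFFSET_CAPTURE);

// Flatten to the same [keyword, offset] shape that search() returns.
$matches = [];
foreach ($m[0] as [$keyword, $offset]) {
    $matches[] = [$keyword, $offset];
}
// $matches is [['cart', 2], ['art', 10], ['ted', 27]]
```

The regex reports three matches where the Aho-Corasick search above reports five: "art" at 3 is swallowed by "cart", and "ted" at 5 overlaps the end of "cart".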

Also keep in mind that building the search tree (the add_needle() and finalize() calls) takes time. So you'll get the best speed-up if you're reusing the same keywords and calling search() many times.
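
To make the build-once, search-many pattern concrete, here is a minimal textbook-style Aho-Corasick sketch. `TinyAhoCorasick` and its method names are hypothetical and written only for this illustration; it is not the code in ahocorasick.php and makes no attempt to match its performance.

```php
<?php
// Illustrative sketch only; not the implementation in ahocorasick.php.
final class TinyAhoCorasick
{
    private $next = [[]]; // $next[state][char] => child state (state 0 is the root)
    private $fail = [0];  // failure link for each state
    private $out  = [[]]; // keywords that end at each state

    public function addNeedle(string $needle): void
    {
        $state = 0;
        for ($i = 0, $n = strlen($needle); $i < $n; $i++) {
            $ch = $needle[$i];
            if (!isset($this->next[$state][$ch])) {
                $this->next[] = [];
                $this->fail[] = 0;
                $this->out[]  = [];
                $this->next[$state][$ch] = count($this->next) - 1;
            }
            $state = $this->next[$state][$ch];
        }
        if ($n > 0) {
            $this->out[$state][] = $needle;
        }
    }

    // Call once, after the last addNeedle().
    public function finalize(): void
    {
        // Breadth-first walk; children of the root keep failure link 0.
        $queue = array_values($this->next[0]);
        while ($queue) {
            $state = array_shift($queue);
            foreach ($this->next[$state] as $ch => $child) {
                // Follow failure links until some state has an edge on $ch.
                $f = $this->fail[$state];
                while ($f !== 0 && !isset($this->next[$f][$ch])) {
                    $f = $this->fail[$f];
                }
                $this->fail[$child] = $this->next[$f][$ch] ?? 0;
                // Inherit keywords that end at the failure target.
                $this->out[$child] = array_merge(
                    $this->out[$child],
                    $this->out[$this->fail[$child]]
                );
                $queue[] = $child;
            }
        }
    }

    public function search(string $text): array
    {
        $found = [];
        $state = 0;
        for ($i = 0, $n = strlen($text); $i < $n; $i++) {
            $ch = $text[$i];
            while ($state !== 0 && !isset($this->next[$state][$ch])) {
                $state = $this->fail[$state];
            }
            $state = $this->next[$state][$ch] ?? 0;
            foreach ($this->out[$state] as $needle) {
                $found[] = [$needle, $i - strlen($needle) + 1];
            }
        }
        return $found;
    }
}

$ac = new TinyAhoCorasick();
foreach (['art', 'cart', 'ted'] as $needle) {
    $ac->addNeedle($needle);
}
$ac->finalize(); // pay the build cost once

// Reuse the same automaton for every text.
$results = [];
foreach (['a carted mart lot one blue ted', 'smart tedium'] as $text) {
    $results[] = $ac->search($text);
}
// $results[1] is [['art', 2], ['ted', 6]]
```

Each `search()` call makes a single pass over its text no matter how many keywords were added, which is why the setup cost pays off when the keyword set is reused.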
