All Projects → imaben → Php Akm

imaben / Php Akm

Ahocorasick keyword match.

Programming Languages

c
50402 projects - #5 most used programming language

Projects that are alternatives of or similar to Php Akm

Libsodium Php
The PHP extension for libsodium.
Stars: ✭ 507 (+337.07%)
Mutual labels:  php-extension
Php Spx
A simple & straight-to-the-point PHP profiling extension with its built-in web UI
Stars: ✭ 972 (+737.93%)
Mutual labels:  php-extension
Ant php extension
PHP 扩展, 用于 PHP-FPM、FastCGI、LD_PRELOAD等模式下突破 disabled_functions
Stars: ✭ 85 (-26.72%)
Mutual labels:  php-extension
Wasmer Php
🐘🕸️ WebAssembly runtime for PHP
Stars: ✭ 796 (+586.21%)
Mutual labels:  php-extension
Graphql Parser Php
A PHP extension wrapping the libgraphqlparser library for parsing GraphQL.
Stars: ✭ 29 (-75%)
Mutual labels:  php-extension
Seaslog
An effective,fast,stable log extension for PHP.http://pecl.php.net/package/SeasLog http://php.net/SeasLog
Stars: ✭ 1,136 (+879.31%)
Mutual labels:  php-extension
Php7 Extension Explore
全网唯一PHP7扩展开发教程
Stars: ✭ 402 (+246.55%)
Mutual labels:  php-extension
Ext Collections
Array manipulation extension for PHP.
Stars: ✭ 94 (-18.97%)
Mutual labels:  php-extension
Php Extensions
PHP C扩展开发
Stars: ✭ 31 (-73.28%)
Mutual labels:  php-extension
Php Rs
A library to build PHP extensions in Rust.
Stars: ✭ 81 (-30.17%)
Mutual labels:  php-extension
Pinyin Php
`pinyin-php` is a php extension which could translate Chinese character into Chinese PinYin.
Stars: ✭ 26 (-77.59%)
Mutual labels:  php-extension
Xhprof Apm
Xhprof APM
Stars: ✭ 29 (-75%)
Mutual labels:  php-extension
Php Ion
Asynchronous PHP
Stars: ✭ 65 (-43.97%)
Mutual labels:  php-extension
Php Opencv
PHP extensions for OpenCV
Stars: ✭ 524 (+351.72%)
Mutual labels:  php-extension
Php Ext Collection
PHP collection extensions - PHP Version 7.x
Stars: ✭ 89 (-23.28%)
Mutual labels:  php-extension
Zan
高效稳定、安全易用、线上实时验证的全异步高性能网络库,通过PHP扩展方式使用。
Stars: ✭ 453 (+290.52%)
Mutual labels:  php-extension
Phptdlib
PHP Extension for tdlib/td written with PHP-CPP
Stars: ✭ 59 (-49.14%)
Mutual labels:  php-extension
Msphpsql
Microsoft Drivers for PHP for SQL Server
Stars: ✭ 1,570 (+1253.45%)
Mutual labels:  php-extension
Ip2region
Ip2region is a offline IP location library with accuracy rate of 99.9% and 0.0x millseconds searching performance. DB file is ONLY a few megabytes with all IP address stored. binding for Java,PHP,C,Python,Nodejs,Golang,C#,lua. Binary,B-tree,Memory searching algorithm
Stars: ✭ 9,836 (+8379.31%)
Mutual labels:  php-extension
Gutenberg Parser Rs
An experimental Rust parser for WordPress Gutenberg post format
Stars: ✭ 76 (-34.48%)
Mutual labels:  php-extension

Ahocorasick keyword match

    ____  __  ______  ___    __ __ __  ___
   / __ \/ / / / __ \/   |  / //_//  |/  /
  / /_/ / /_/ / /_/ / /| | / ,<  / /|_/ /
 / ____/ __  / ____/ ___ |/ /| |/ /  / /
/_/   /_/ /_/_/   /_/  |_/_/ |_/_/  /_/

关键字快速查找匹配

编译安装

$ git clone https://github.com/imaben/php-akm.git
$ cd php-akm
$ phpize
$ ./configure
$ make 
$ sudo make install

php.ini配置

[akm]
extension=akm.so
akm.enable=On|Off
akm.dict_dir=/home/dict

说明:

  • akm.enable表示扩展启用或关闭
  • akm.dict_dir用来指定关键词词典所在的文件夹

函数说明

akm_match

关键词匹配

array akm_match(string $dict_name, string $text)

参数说明

  • dict_name:字典名称,即akm.dict_dir配置所在文件夹下的字典库名称(文件名)
  • text:待匹配的文本

返回值

返回匹配含有keywordoffsetextension字段数组列表的二维数组,如:

[
	{
		"keyword" : "敏感词",
		"offset": 123,
		"extension": "扩展文本"
	},
	{
		"keyword" : "敏感词2",
		"offset": 1231,
		"extension": "扩展文本"
	}
]

说明:

  • keyword:敏感词
  • offset:敏感词所在文本中的位置
  • extension:扩展文本

akm_replace

关键词替换

int akm_replace(string $dict_name, string &$text, callable $callback)

参数说明

  • dict_name: 字典名称,即akm.dict_dir配置所在文件夹下的字典库名称(文件名)

  • text:待替换文本

  • callback:处理匹配字符串的回调,接受三个参数

    • string keyword:匹配出的关键词
    • int index:关键词在文本中的位置
    • string extension:扩展文本

    如回调中返回一个字符串,则把匹配到的关键词替换成返回值。如无返回值,则不做任何处理

返回值

返回成功匹配的关键词个数

akm_get_dict_list

获取词典列表

array akm_get_dict_list()

返回值

返回已索引的词典名称列表

字典数据结构

关键词|扩展文本
keyword1|extension_text1
keyword2|extension_text2
keyword3|extension_text3

说明:

  • “|”为关键词和扩展文本之间的分割符
  • “|”只对首行第一个有效,例“发票|政治|敏感”,则认定发票为关键词,政治|敏感为扩展文本
  • 如无“|”符,则整行被认为一个关键词,返回时无扩展文本
  • 每行定义一个关键词,空行自动跳过

性能测试

PC配置:

CPU:Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
内存:4GB*3 1600MHz
硬盘:东芝Q300
操作系统:Linux 4.6.4-1-ARCH

测试代码:

<?php

$dict_name = 'dict';
$text = file_get_contents("text.txt");

$start = microtime(true);;
$result = akm_match($dict_name, $text);
$end = microtime(true);;
echo '耗时' . ($end - $start) . "毫秒\n";
echo '内存占用:' . memory_get_usage() / 1024 / 1024 . "MB\n";

关键词数量:45423

测试文本大小:8KB

测试结果:

第一次:

耗时0.488毫秒
内存占用:0.365MB

第二次:

耗时0.491毫秒
内存占用:0.365MB

第三次:

耗时0.462毫秒
内存占用:0.365MB
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].