SPuerBRead / Htmlsimilarity

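Web page similarity detection: determines how similar two pages are based on their HTML page structure. It can be used for similarity computation, unauthorized-access (privilege escalation) detection, and similar tasks.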

Programming Languages

python

Projects that are alternatives of or similar to Htmlsimilarity

Composer Lock Diff
See what has changed after a composer update
Stars: ✭ 154 (-18.52%)
Mutual labels:  diff
Nbdime
Tools for diffing and merging of Jupyter notebooks.
Stars: ✭ 2,135 (+1029.63%)
Mutual labels:  diff
Delta
Delta is a command-line diff tool implemented in Go.
Stars: ✭ 178 (-5.82%)
Mutual labels:  diff
Coveragechecker
Allows old code to use new standards
Stars: ✭ 159 (-15.87%)
Mutual labels:  diff
Textdistance
Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.
Stars: ✭ 2,575 (+1262.43%)
Mutual labels:  diff
Nytdiff
Code for the twitter bot nyt_diff
Stars: ✭ 166 (-12.17%)
Mutual labels:  diff
Php Htmldiff
A library for comparing two HTML files/snippets and highlighting the differences using simple HTML. Includes support for comparing complex lists and tables
Stars: ✭ 145 (-23.28%)
Mutual labels:  diff
Diffobj
Compare R Objects with a Diff
Stars: ✭ 188 (-0.53%)
Mutual labels:  diff
Waldo
Find differences between R objects
Stars: ✭ 165 (-12.7%)
Mutual labels:  diff
Docker Osm
A docker compose project to setup an OSM PostGIS database with automatic updates from OSM periodically
Stars: ✭ 172 (-8.99%)
Mutual labels:  diff
Graphtage
A semantic diff utility and library for tree-like files such as JSON, JSON5, XML, HTML, YAML, and CSV.
Stars: ✭ 2,062 (+991.01%)
Mutual labels:  diff
Api Diff
A command line tool for diffing json rest APIs
Stars: ✭ 164 (-13.23%)
Mutual labels:  diff
Expdevbadchars
Bad Characters highlighter for exploit development purposes supporting multiple input formats while comparing.
Stars: ✭ 167 (-11.64%)
Mutual labels:  diff
Deepdiff
🦀Amazingly incredible extraordinary lightning fast diffing in Swift
Stars: ✭ 1,995 (+955.56%)
Mutual labels:  diff
Vim Mergetool
🍰 Efficient way of using Vim as a Git mergetool
Stars: ✭ 179 (-5.29%)
Mutual labels:  diff
Dwifft
Swift Diff
Stars: ✭ 1,822 (+864.02%)
Mutual labels:  diff
Emacs Vdiff
Like vimdiff for Emacs
Stars: ✭ 165 (-12.7%)
Mutual labels:  diff
Split Diff
Side-by-side file compare for the Atom text editor.
Stars: ✭ 188 (-0.53%)
Mutual labels:  diff
Dtl
Diff template library written in C++
Stars: ✭ 180 (-4.76%)
Mutual labels:  diff
Mirrordiffkit
Graduation from messy XCTAssertEqual messages.
Stars: ✭ 168 (-11.11%)
Mutual labels:  diff

HTMLSimilarity

Determine page similarity based on HTML page structure.

Usage

from htmlsimilarity import get_html_similarity

is_similarity, value = get_html_similarity(html_doc1, html_doc2)

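Description

Input parameters:
  • HTML document 1
  • HTML document 2
  • Number of dimensions after dimensionality reduction (default: 5000)
Return values:
  • Whether the two documents are similar
  • Similarity value (the documents are considered similar when value < 0.2 and not similar when value > 0.2)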

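Method

A template feature vector is derived from each page's DOM tree, and the structural similarity of the pages is computed from these template feature vectors.

For details, see: Li Jingyang (李景阳), Zhang Bo (张波). 网页结构相似性确定方法及装置 (Method and apparatus for determining web page structural similarity):.

The approach follows the patent cited above; this project is a simple implementation of its similarity-determination part.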
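
The idea of comparing pages by DOM structure rather than by text can be illustrated with a minimal sketch. It is neither the patented method nor this project's code. It assumes a simpler scheme: every element is described by its root-to-node tag path, the paths are hashed into a fixed number of buckets (the "reduced" dimension), and two pages are compared by the normalized difference of their bucket counts. The names `TagPathCollector`, `feature_vector`, and `structure_distance` are hypothetical, and the resulting distance is not calibrated to the 0.2 threshold described above.

```python
import zlib
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Records the root-to-node tag path of every element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.paths = []   # one "html/body/div"-style path per element

    def handle_starttag(self, tag, attrs):
        self.paths.append("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def feature_vector(html, dim=5000):
    """Hash each tag path into one of `dim` buckets and count occurrences."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    vec = [0] * dim
    for path in collector.paths:
        vec[zlib.crc32(path.encode("utf-8")) % dim] += 1
    return vec


def structure_distance(html_a, html_b, dim=5000):
    """Return 0.0 for identical structure, values closer to 1.0 for unrelated pages."""
    a = feature_vector(html_a, dim)
    b = feature_vector(html_b, dim)
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Same template with different text, and a page with a different layout.
page_a = "<html><body><div><p>Alice</p><p>Order 1</p></div></body></html>"
page_b = "<html><body><div><p>Bob</p><p>Order 2</p></div></body></html>"
page_c = "<html><body><form><input name='u'><input name='p'></form></body></html>"

print(structure_distance(page_a, page_b))
print(structure_distance(page_a, page_c))
```

In this sketch the text inside the elements never enters the feature vector, so two pages rendered from the same template compare as identical regardless of the data they display. Hashing into a fixed number of buckets keeps the vector size constant at the cost of occasional collisions between unrelated paths.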

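Use case

Testing for unauthorized access (privilege escalation) requires comparing responses. When the backend returns rendered HTML, the response alone does not directly show whether unauthorized access occurred. Conventional text similarity comparison, such as difflib or methods based on tokenization or longest common substring, is not well suited to this decision. This project therefore compares pages by their structure to determine whether unauthorized access has occurred.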
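
The contrast between text comparison and structure comparison can be seen in a small sketch. It does not use this project's API. It assumes three made-up responses (the owner's profile page, another user's profile page, and an error page) and a hypothetical helper `tag_sequence` that reduces a page to its ordered list of opening tags.

```python
import difflib
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Collects opening tags in document order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    return parser.tags


def text_ratio(a, b):
    # Character-level similarity of the raw responses, as difflib computes it.
    return difflib.SequenceMatcher(None, a, b).ratio()


# Made-up responses for illustration only.
own_profile = ("<html><body><h1>Profile</h1><ul>"
               "<li>alice@example.com</li><li>Berlin</li></ul></body></html>")
other_profile = ("<html><body><h1>Profile</h1><ul>"
                 "<li>bob.the.builder@example.org</li><li>San Francisco</li>"
                 "</ul></body></html>")
denied = "<html><body><h1>Error</h1><p>Access denied</p></body></html>"

# The text score depends on the data shown, not only on the template.
print(text_ratio(own_profile, other_profile))
print(text_ratio(own_profile, denied))

# The structure is identical for both profiles and differs for the error page.
print(tag_sequence(own_profile) == tag_sequence(other_profile))
print(tag_sequence(own_profile) == tag_sequence(denied))
```

If a request for another user's resource returns a page whose structure matches the profile template rather than the error page, that is a signal worth investigating as possible unauthorized access, whatever the raw text score happens to be.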
