
ying32 / htmlparser

Licence: MIT license
Delphi HTML parser (code adapted from wr960204's original HtmlParser)



htmlparser

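Delphi HTML parser

The code is adapted from wr960204's original HtmlParser. I needed to modify HTML, but the original only supports reading, so I extended it and named the resulting unit HtmlParserEx.pas to distinguish it from the original.

Usage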

// Example: load HTML from a file
procedure Test;
var
  LHtml, LEl: IHtmlElement;
  LList: IHtmlElementList;
  LStrStream: TStringStream;
begin
  LStrStream := TStringStream.Create('', TEncoding.UTF8);
  try
    LStrStream.LoadFromFile('view-source_https___github.com_ying32_htmlparser.html');
    LHtml := ParserHTML(LStrStream.DataString);
    if LHtml <> nil then
    begin
      // Select every <a> element and print its href attribute
      LList := LHtml.SimpleCSSSelector('a');
      // Use a separate loop variable so the root element in LHtml is not overwritten
      for LEl in LList do
        Writeln('url:', LEl.Attributes['href']);
    end;
  finally
    LStrStream.Free;
  end;
end;

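Changelog

Changes by ying32:
Email:[email protected]

2017-05-04

1. Removed the reference to the RegularExpressions unit. TRegEx is no longer used; TPerlRegEx from the RegularExpressionsCore unit is used instead.

2017-04-19

1. Added the compiler directive "UseXPath" to enable the XPath feature. XPath is disabled by default; personally I don't find it very useful.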
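
As a rough illustration of how a conditional symbol like this is typically used, the sketch below guards an XPath call with `{$IFDEF UseXPath}`. It assumes that `FindX` is only compiled into HtmlParserEx.pas when the symbol is defined; the README does not state this explicitly. The program name, `PrintTitle`, and the sample HTML are hypothetical.

```pascal
program UseXPathDemo;

{$APPTYPE CONSOLE}

// Assumption: UseXPath must be visible when HtmlParserEx.pas itself is compiled,
// for example via the project's "Conditional defines" option or `dcc32 -DUseXPath`.
// A {$DEFINE UseXPath} in this file alone would not reach HtmlParserEx.pas,
// because $DEFINE only applies to the file it appears in.

uses
  HtmlParserEx;

// Hypothetical helper: prints the document title with whichever API is available.
procedure PrintTitle(const ARoot: IHtmlElement);
begin
{$IFDEF UseXPath}
  // XPath variant, compiled only when UseXPath is defined
  Writeln(ARoot.FindX('/html/head/title').Text);
{$ELSE}
  // CSS selector variant
  Writeln(ARoot.Find('title').Text);
{$ENDIF}
end;

var
  LRoot: IHtmlElement;
begin
  LRoot := ParserHTML('<html><head><title>Demo</title></head><body></body></html>');
  if LRoot <> nil then
    PrintTitle(LRoot);
end.
```

Defining the symbol at project level keeps the library unit and the calling code in agreement about whether XPath support is present.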

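2016-11-23

1. Basic XPath support. It is deliberately simple: the XPath expression is converted to a CSS selector.
The XPath conversion code is adapted from a Python implementation.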

IHtmlElement

  LHtml.FindX('/html/head/title').Each(
    procedure(AIndex: Integer; AEl: IHtmlElement) 
    begin
      Writeln('xpath index=', AIndex, ',  a=', AEl.Text);  
    end
  );
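
To make the idea of converting XPath to a CSS selector concrete, here is a minimal, self-contained sketch. It is not the library's conversion code. `SimpleXPathToCss` is a hypothetical helper that handles only child steps, descendant steps, attribute tests, and positional predicates.

```pascal
program XPathToCssSketch;

{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical helper (not part of HtmlParserEx). Supported subset:
//   /a/b        -> a > b              (child step)
//   //a         -> a                  (descendant step)
//   a[@k='v']   -> a[k='v']           (attribute test)
//   a[2]        -> a:nth-of-type(2)   (positional predicate)
function SimpleXPathToCss(const AXPath: string): string;
var
  I, J: Integer;
  LPred: string;
begin
  Result := '';
  I := 1;
  while I <= Length(AXPath) do
  begin
    case AXPath[I] of
      '/':
        if (I < Length(AXPath)) and (AXPath[I + 1] = '/') then
        begin
          // '//' is a descendant step: CSS uses a plain space
          if Result <> '' then
            Result := Result + ' ';
          Inc(I, 2);
        end
        else
        begin
          // '/' is a child step: CSS uses '>'; a leading '/' emits nothing
          if Result <> '' then
            Result := Result + ' > ';
          Inc(I);
        end;
      '[':
        begin
          // Locate the closing bracket and extract the predicate text
          J := I + 1;
          while (J <= Length(AXPath)) and (AXPath[J] <> ']') do
            Inc(J);
          LPred := Copy(AXPath, I + 1, J - I - 1);
          if (LPred <> '') and (LPred[1] = '@') then
            // Attribute test: drop the '@' and keep the rest
            Result := Result + '[' + Copy(LPred, 2, MaxInt) + ']'
          else
            // Anything else is treated as a 1-based position
            Result := Result + ':nth-of-type(' + LPred + ')';
          I := J + 1;
        end;
    else
      // Tag names and other characters are copied through unchanged
      Result := Result + AXPath[I];
      Inc(I);
    end;
  end;
end;

begin
  Writeln(SimpleXPathToCss('/html/head/title'));          // html > head > title
  Writeln(SimpleXPathToCss('//div[@class=''nav'']/a[2]')); // div[class='nav'] > a:nth-of-type(2)
end.
```

Mapping onto CSS selectors lets an XPath query reuse the existing selector engine, at the cost of supporting only expressions that have a CSS equivalent.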

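2016-11-15

Changes to IHtmlElement and THtmlElement:
1. Added a setter to the Attributes property
2. Added a setter to the TagName property
3. Added the Parent property
4. Added the RemoveAttr method
5. Added the Remove method
6. Added the RemoveChild method
7. Added the Find method, an alias for SimpleCSSSelector
8. _GetHtml no longer appends the FOrignal value directly; it now uses GetSelfHtml to regenerate the HTML of the modified element and updates FOrignal
9. Added the Text property
10. Made the InnerText and Text properties writable
11. Added the AppendChild method

Changes to IHtmlElementList and THtmlElementList:
1. Added the RemoveAll method
2. Added the Remove method
3. Added the Each method
4. Added the Text property

Usage examples for the new features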

IHtmlElement

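     // Modify an attribute
     EL.Attributes['class'] := 'xxxx';
     // Change the tag name
     EL.TagName := 'a';
     // Remove the element itself
     EL.Remove; 
     // Remove a child node
     EL.RemoveChild(El2);
     // Find by CSS selector (shorthand for SimpleCSSSelector)
     El.Find('a');
     // Append a new element
     el2 := El.AppendChild('a');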
     
     

IHtmlElementList

  // Remove the selected elements
  LHtml.Find('a').RemoveAll;

  // Find and iterate
  LHtml.Find('a').Each(
    procedure(AIndex: Integer; AEl: IHtmlElement)
    begin
      Writeln('Index=', AIndex, ',  href=', AEl.Attributes['href']);
    end);

  // Print directly; only the first selected element is used
  Writeln(LHtml.Find('title').Text);
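
The snippets above can be combined into a small read-modify pass. The sketch below uses only calls shown in this README (`ParserHTML`, `Find`, `Each`, and the `Attributes` getter and setter). The input HTML, the base URL, and the program name are made up for illustration, and it assumes `Attributes` returns an empty string for a missing attribute.

```pascal
program RewriteLinks;

{$APPTYPE CONSOLE}

uses
  HtmlParserEx; // the unit described in this README

const
  // Hypothetical input and base URL, used only for this illustration
  CHtml = '<html><body><a href="/a">A</a><a href="https://example.com/b">B</a></body></html>';
  CBase = 'https://example.com';

var
  LRoot: IHtmlElement;
begin
  LRoot := ParserHTML(CHtml);
  if LRoot = nil then
    Exit;

  // Turn site-relative links into absolute ones through the Attributes setter
  LRoot.Find('a').Each(
    procedure(AIndex: Integer; AEl: IHtmlElement)
    var
      LHref: string;
    begin
      LHref := AEl.Attributes['href'];
      if Copy(LHref, 1, 1) = '/' then
        AEl.Attributes['href'] := CBase + LHref;
    end);

  // Query again to list the updated links
  LRoot.Find('a').Each(
    procedure(AIndex: Integer; AEl: IHtmlElement)
    begin
      Writeln('Index=', AIndex, ',  href=', AEl.Attributes['href']);
    end);
end.
```

Doing the rewrite and the listing in two passes keeps each anonymous procedure focused on a single job.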