kannan-ar / Marigold.openxhtml

License: MIT

MariGold.OpenXHTML is a wrapper library for Open XML SDK to convert HTML documents into Open XML word documents.

Labels

docx html-parser

MariGold.OpenXHTML

OpenXHTML is a wrapper library for the Open XML SDK that converts HTML documents into Open XML Word documents. It encapsulates the complexity of Open XML while still exposing the underlying Open XML objects for direct manipulation.

Installing via NuGet

In Package Manager Console, enter the following command:

Install-Package MariGold.OpenXHTML

Usage

To create an empty Open XML Word document using OpenXHTML, use the following code.

using MariGold.OpenXHTML;

WordDocument doc = new WordDocument("sample.docx");
doc.Save();

To create an Open XML document from an HTML document, use the following code.

using MariGold.OpenXHTML;

WordDocument doc = new WordDocument("sample.docx");
doc.Process(new HtmlParser("<div>sample text</div>"));
doc.Save();

Once the HTML is processed, you can access the Open XML document using the following properties in WordDocument.

public WordprocessingDocument WordprocessingDocument { get; }
public MainDocumentPart MainDocumentPart { get; }
public Document Document { get; }

Any modifications to the Open XML document must be made before calling the Save method, because Save writes all changes and unloads the document from memory. Modifying the document afterwards may result in an exception. For example, to append a paragraph to the document body, use the following code.

using MariGold.OpenXHTML;
using DocumentFormat.OpenXml.Wordprocessing;

WordDocument doc = new WordDocument("sample.docx");
doc.Process(new HtmlParser("<div>sample text</div>"));
doc.Document.Body.AppendChild<Paragraph>(new Paragraph(new Run(new Text("added text"))));
doc.Save();

You can also create an Open XML document in memory. The following example shows how to save the document to a MemoryStream.

using System.IO;
using MariGold.OpenXHTML;

using (MemoryStream mem = new MemoryStream())
{
	WordDocument doc = new WordDocument(mem);
	doc.Save();
}
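Building on the in-memory example, the sketch below wraps the same calls in a helper that returns the finished document as a byte array, for instance to send it as a download. It uses only the WordDocument, HtmlParser, Process and Save members shown in this README; the class name InMemoryConversion and the method name HtmlToDocx are hypothetical and are not part of the library.

```csharp
using System.IO;
using MariGold.OpenXHTML;

public static class InMemoryConversion
{
    // Hypothetical helper (not part of OpenXHTML): converts an HTML
    // string into the bytes of a .docx file without touching the disk.
    public static byte[] HtmlToDocx(string html)
    {
        using (MemoryStream mem = new MemoryStream())
        {
            WordDocument doc = new WordDocument(mem);
            doc.Process(new HtmlParser(html));

            // Save writes the document into the stream and unloads it.
            doc.Save();

            // ToArray copies the whole buffer, regardless of the
            // current stream position.
            return mem.ToArray();
        }
    }
}
```

The returned array can be written with File.WriteAllBytes or streamed to a client. The document has to be saved before the bytes are read, for the reason given earlier about Save.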

Relative Images

OpenXHTML cannot process images with relative URLs on its own. To solve this, set the ImagePath property to the base address for all relative image paths. The image path can be either a URL or a physical folder address.

using MariGold.OpenXHTML;

WordDocument doc = new WordDocument("sample.docx");
doc.ImagePath = "http://abc.com";
doc.Process(new HtmlParser("<img src=\"sample.png\" />"));
doc.Save();

You can also assign a file URI to ImagePath.

doc.ImagePath = @"file:///C:/Img";

Base URL

Like images, links in an HTML document may use relative paths. These can be resolved using the BaseURL property.

using MariGold.OpenXHTML;

WordDocument doc = new WordDocument("sample.docx");
doc.BaseURL = "http://abc.com";
doc.Process(new HtmlParser("<a href=\"index.htm\">sample</a>"));
doc.Save();

If the HTML document contains relative images and ImagePath is not assigned, OpenXHTML will attempt to use BaseURL to resolve the relative image paths. BaseURL alone can therefore resolve both relative image paths and links. ImagePath exists as a separate property because images are sometimes hosted at a different location from the base URL.
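The fallback order described above can be pictured with plain System.Uri calls. This is a sketch of the general idea only, not OpenXHTML's internal code; the class RelativeUrlSketch, the method ResolveImage and its parameter names are hypothetical.

```csharp
using System;

public static class RelativeUrlSketch
{
    // Hypothetical helper: prefers imagePath and falls back to baseUrl,
    // mirroring the order described in the text.
    public static string ResolveImage(string src, string imagePath, string baseUrl)
    {
        string root = !string.IsNullOrEmpty(imagePath) ? imagePath : baseUrl;

        if (string.IsNullOrEmpty(root))
        {
            return src; // nothing to resolve against
        }

        // A trailing slash makes Uri treat the last segment as a folder
        // instead of replacing it with the relative path.
        if (!root.EndsWith("/", StringComparison.Ordinal))
        {
            root += "/";
        }

        // Standard relative-reference resolution from System.Uri.
        return new Uri(new Uri(root), src).AbsoluteUri;
    }
}
```

With these assumptions, ResolveImage("sample.png", null, "http://abc.com") returns "http://abc.com/sample.png", while passing "http://cdn.abc.com/img" as imagePath returns "http://cdn.abc.com/img/sample.png".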

Uri Schema

Protocol-relative URLs (URLs that start with // and omit the scheme) can be resolved using the UriSchema property.

doc.UriSchema = Uri.UriSchemeHttp;
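To make the effect concrete, here is a small sketch of what resolving a protocol-relative URL amounts to. It illustrates the concept and is not taken from the library's source; the class UriSchemaSketch and the method ApplySchema are hypothetical names.

```csharp
using System;

public static class UriSchemaSketch
{
    // Hypothetical helper: prefixes a protocol-relative URL ("//host/path")
    // with the given scheme and leaves every other URL untouched.
    public static string ApplySchema(string url, string uriSchema)
    {
        if (url != null && url.StartsWith("//", StringComparison.Ordinal))
        {
            return uriSchema + ":" + url;
        }

        return url;
    }
}
```

For example, ApplySchema("//abc.com/sample.png", Uri.UriSchemeHttp) yields "http://abc.com/sample.png", and a URL that already carries a scheme is returned unchanged.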

HTML Parsing

OpenXHTML has a built-in HTML and CSS parser (MariGold.HtmlParser), which can be completely replaced with any external HTML and CSS parser. The Process method of the WordDocument class expects an implementation of the IParser interface to process the HTML and CSS. You can create your own implementation of IParser to plug in a different parser.

public void Process(IParser parser);

interface IParser
{
	string BaseURL { get; set; }
	string UriSchema { get; set; }

	decimal CalculateRelativeChildFontSize(string parentFontSize, string childFontSize);
	IHtmlNode FindBodyOrFirstElement();
}

The interface above shows the structure of IParser. BaseURL and UriSchema are simple properties that store the base URL and URI schema used when processing HTML images and links. Both are also used to resolve protocol-relative and relative URLs of external style sheets. The CalculateRelativeChildFontSize method calculates a child element's font size relative to its parent. For example, in the HTML below, the font size of the h1 tag is 20 pixels.

<div style="font-size:16px"><h1>sample</h1></div>

If you don't want to re-implement this functionality, you can simply use the CSSUtility class in your implementation.

using MariGold.HtmlParser;

return CSSUtility.CalculateRelativeChildFontSize(parentFontSize, childFontSize);
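For readers who do want to write the calculation themselves, the following is a rough sketch of the arithmetic involved. It is not the CSSUtility implementation and makes several assumptions: the parent size is given in pixels, the child size is in em, % or px, and the result is returned in pixels. The names FontSizeSketch, Resolve and ParseNumber are hypothetical.

```csharp
using System;
using System.Globalization;

public static class FontSizeSketch
{
    // Hypothetical helper, not the library's implementation.
    // Handles a "px" parent with an "em", "%" or "px" child only;
    // other units such as "rem" or "pt" are out of scope here.
    public static decimal Resolve(string parentFontSize, string childFontSize)
    {
        decimal parentPx = ParseNumber(parentFontSize, "px");
        string child = childFontSize.Trim().ToLowerInvariant();

        if (child.EndsWith("em", StringComparison.Ordinal))
        {
            // em is a multiple of the parent's font size.
            return parentPx * ParseNumber(child, "em");
        }

        if (child.EndsWith("%", StringComparison.Ordinal))
        {
            // A percentage is taken relative to the parent's font size.
            return parentPx * ParseNumber(child, "%") / 100m;
        }

        // An absolute pixel size ignores the parent.
        return ParseNumber(child, "px");
    }

    // Strips the unit suffix, if present, and parses the remaining number.
    private static decimal ParseNumber(string value, string unit)
    {
        string text = value.Trim().ToLowerInvariant();

        if (text.EndsWith(unit, StringComparison.Ordinal))
        {
            text = text.Substring(0, text.Length - unit.Length);
        }

        return decimal.Parse(text.Trim(), CultureInfo.InvariantCulture);
    }
}
```

Under these assumptions, a 16px parent with a child size of 1.25em or 125% resolves to 20 pixels, and a child size of 20px is returned as is.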

The FindBodyOrFirstElement method is expected to return an IHtmlNode representing the HTML body tag together with the hierarchy of its child elements. If the document does not have a body element, the method is expected to return the first root element. All CSS styles and HTML attributes of each IHtmlNode must be resolved and filled into the respective properties.

References

Convert HTML to Word Document using CKEditor and MariGold.OpenXHTML

Windows Forms Application - Convert HTML Files To DOCX Files With MariGold.OpenXHTML

Implement Custom HTML Parser using AngleSharp
