autolink-java

Java library to extract links such as URLs and email addresses from plain text. It's smart about where a link ends, such as with trailing punctuation.

Introduction

You might think: "Do I need a library for this? I can just write a regex!" Let's look at a few cases:

In text like https://example.com/. the link should not include the trailing dot
https://example.com/, should not include the trailing comma
(https://example.com/) should not include the parens

Seems simple enough. But then we also have these cases:

https://en.wikipedia.org/wiki/Link_(The_Legend_of_Zelda) should include the trailing paren
https://üñîçøðé.com/ä should also work for Unicode (including Emoji and Punycode)
<https://example.com/> should not include angle brackets

This library behaves as you'd expect in the above cases and many more. It parses the input text in one pass with limited backtracking.
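To make the first three cases concrete, here is a minimal sketch of the naive regex approach, using only java.util.regex. The class name NaiveRegexLinks and the pattern itself are illustrative assumptions and not part of this library; the sketch only shows that a simple "scheme followed by non-whitespace" pattern keeps the trailing delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of autolink-java.
class NaiveRegexLinks {
    // Deliberately naive: a scheme followed by any run of non-whitespace characters.
    private static final Pattern NAIVE_URL = Pattern.compile("https?://\\S+");

    // Returns every substring of the text that the naive pattern matches.
    static List<String> findLinks(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = NAIVE_URL.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findLinks("See https://example.com/. for details")); // [https://example.com/.]
        System.out.println(findLinks("(https://example.com/)"));                // [https://example.com/)]
        System.out.println(findLinks("<https://example.com/>"));                // [https://example.com/>]
    }
}
```

In each case the match swallows the trailing dot, paren or angle bracket. Excluding those characters from the pattern would in turn break the Wikipedia URL that legitimately ends in a paren, which is why end-of-link detection needs more context than a single character class.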

Thanks to Rinku for the inspiration.

Usage

This library requires at least Java 7 (tested up to Java 11). It works on Android (minimum API level 15). It has no external dependencies.

Maven coordinates (see here for other build systems):

<dependency>
    <groupId>org.nibor.autolink</groupId>
    <artifactId>autolink</artifactId>
    <version>0.10.0</version>
</dependency>

Extracting links:

import java.util.EnumSet;

import org.nibor.autolink.*;

String input = "wow, so example: http://test.com";
LinkExtractor linkExtractor = LinkExtractor.builder()
        .linkTypes(EnumSet.of(LinkType.URL, LinkType.WWW, LinkType.EMAIL))
        .build();
Iterable<LinkSpan> links = linkExtractor.extractLinks(input);
LinkSpan link = links.iterator().next();
link.getType();        // LinkType.URL
link.getBeginIndex();  // 17
link.getEndIndex();    // 32
input.substring(link.getBeginIndex(), link.getEndIndex());  // "http://test.com"

Note that by default all supported types of links are extracted. If you're only interested in specific types, narrow it down using the linkTypes method.

The above returns all the links. Sometimes what you want to do is go over some input, process the links and keep the surrounding text. For that case, there's an extractSpans method.

Here's an example of using that to transform the text to HTML and wrap URLs in an <a> tag (escaping is done using owasp-java-encoder):

import java.util.EnumSet;

import org.nibor.autolink.*;
import org.owasp.encoder.Encode;

String input = "wow http://test.com such linked";
LinkExtractor linkExtractor = LinkExtractor.builder()
        .linkTypes(EnumSet.of(LinkType.URL)) // limit to URLs
        .build();
Iterable<Span> spans = linkExtractor.extractSpans(input);

StringBuilder sb = new StringBuilder();
for (Span span : spans) {
    String text = input.substring(span.getBeginIndex(), span.getEndIndex());
    if (span instanceof LinkSpan) {
        // span is a URL
        sb.append("<a href=\"");
        sb.append(Encode.forHtmlAttribute(text));
        sb.append("\">");
        sb.append(Encode.forHtml(text));
        sb.append("</a>");
    } else {
        // span is plain text before/after link
        sb.append(Encode.forHtml(text));
    }
}

sb.toString();  // "wow <a href=\"http://test.com\">http://test.com</a> such linked"

Note that this assumes that the input is plain text, not HTML. Also see the "What this is not" section below.

Features

URL extraction

Extracts URLs of the form scheme://example with any potentially valid scheme. URIs such as example:test are not matched (may be added as an option in the future). If only certain schemes should be allowed, the result can be filtered. (Note that schemes can contain dots, so foo.http://example is recognized as a single link.)
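Filtering by scheme can be done after extraction. The following is a minimal sketch that assumes the extracted links are already available as strings (for example via input.substring(...) as in the Usage section). The class SchemeFilter, its methods and the allow list are hypothetical names chosen for illustration; they are not part of this library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical post-processing helper; not part of autolink-java.
class SchemeFilter {
    // Assumed allow list for this example.
    private static final Set<String> ALLOWED = new HashSet<>(Arrays.asList("http", "https"));

    // Returns the lower-cased text before "://", or "" if there is no "://".
    static String schemeOf(String url) {
        int idx = url.indexOf("://");
        return idx < 0 ? "" : url.substring(0, idx).toLowerCase(Locale.ROOT);
    }

    // Keeps only the URLs whose scheme is in the allow list, preserving order.
    static List<String> keepAllowed(List<String> urls) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            if (ALLOWED.contains(schemeOf(url))) {
                kept.add(url);
            }
        }
        return kept;
    }
}
```

Because the whole prefix before :// is compared, a link such as foo.http://example has the scheme foo.http and is dropped by this allow list rather than being treated as http.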

Includes heuristics for excluding trailing delimiters such as punctuation and unbalanced parentheses; see the examples below.

Supports internationalized domain names (IDN). Note that they are not validated and as a result, invalid URLs may be matched.

Example input and extracted link:

http://example.com. → http://example.com
http://example.com, → http://example.com
(http://example.com) → http://example.com
(... (see http://example.com)) → http://example.com
https://en.wikipedia.org/wiki/Link_(The_Legend_of_Zelda) → https://en.wikipedia.org/wiki/Link_(The_Legend_of_Zelda)
http://üñîçøðé.com/ → http://üñîçøðé.com/

Use LinkType.URL for this, and see test cases here.

WWW link extraction

Extracts links like www.example.com. They need to start with www. but don't need a scheme://. For detecting the end of the link, the same heuristics apply as for URLs.

Examples (input → extracted link):

www.example.com. → www.example.com
(www.example.com) → www.example.com
[..] link:www.example.com [..] → www.example.com

Not supported:

Uppercase www's, e.g. WWW.example.com and wWw.example.com
Too many or too few w's, e.g. wwww.example.com

The domain must have at least 3 parts, so www.com is not valid, but www.something.co.uk is.
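The rules above can be restated as a small check on a candidate string. This is only a sketch of the documented rules (lowercase www. prefix, at least three non-empty dot-separated parts), not the library's actual implementation; the class WwwRules and the method looksLikeWwwLink are hypothetical names, and end-of-link detection is deliberately left out.

```java
// Hypothetical restatement of the documented WWW rules; not part of autolink-java.
class WwwRules {
    // True if the candidate starts with a lowercase "www." and has at least
    // three non-empty dot-separated parts. The candidate is assumed to be the
    // bare link text, without surrounding punctuation.
    static boolean looksLikeWwwLink(String candidate) {
        if (!candidate.startsWith("www.")) {
            return false; // rejects WWW., wWw. and wwww.
        }
        String[] parts = candidate.split("\\.", -1); // -1 keeps trailing empty parts
        if (parts.length < 3) {
            return false; // e.g. www.com has only two parts
        }
        for (String part : parts) {
            if (part.isEmpty()) {
                return false; // e.g. "www..com" or a trailing dot
            }
        }
        return true;
    }
}
```

With this check, www.something.co.uk and www.example.com pass, while www.com, WWW.example.com and wwww.example.com do not.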

Use LinkType.WWW for this, and see test cases here.

Email address extraction

Extracts emails such as foo@example.com. Matches international email addresses, but doesn't verify the domain name (may match too much).

Examples (input → extracted address):

foo@example.com → foo@example.com
foo@example.com. → foo@example.com
foo@example.com, → foo@example.com
üñîçøðé@üñîçøðé.com → üñîçøðé@üñîçøðé.com

Not supported:

Quoted local parts, e.g. "this is sparta"@example.com
Address literals, e.g. foo@[127.0.0.1]

Note that the domain must have at least one dot (e.g. foo@com isn't matched), unless the emailDomainMustHaveDot option is disabled.

Use LinkType.EMAIL for this, and see test cases here.

What this is not

This library is intentionally not aware of HTML. If it were, it would need to depend on an HTML parser and renderer. Consider this input:

HTML that contains <a href="https://one.example">links</a> but also plain URLs like https://two.example.

If you want to turn the plain links into <a> elements but leave the already linked ones intact, I recommend:

Parse the HTML using an HTML parser library
Walk through the resulting DOM and use autolink-java to find links within text nodes only
Turn those into <a> elements
Render the DOM back to HTML
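The first two steps can be sketched with the JDK's built-in XML DOM parser. This rests on a strong assumption: the input must be well-formed XHTML with a single root element, which real-world HTML often is not, so a proper HTML parser library is the better choice in practice. The class TextNodeWalker and the method linkableText are hypothetical names; the sketch only collects the text nodes that lie outside existing <a> elements, which are the strings you would then hand to the link extractor.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper; assumes well-formed XHTML input with a single root element.
class TextNodeWalker {
    // Returns the contents of all text nodes that are not inside an <a> element.
    static List<String> linkableText(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            List<String> texts = new ArrayList<>();
            collect(doc.getDocumentElement(), texts);
            return texts;
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }

    private static void collect(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE && "a".equalsIgnoreCase(node.getNodeName())) {
            return; // already linked: skip this element and everything inside it
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.add(node.getNodeValue()); // candidate text to pass to the link extractor
            return;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, out);
        }
    }
}
```

For the example input wrapped in a <p> element, this yields the text before and after the existing link, so https://two.example is seen by the extractor while https://one.example, which only appears in an attribute, is not.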

Contributing

See CONTRIBUTING.md file.

License

MIT licensed, see LICENSE file.
