gagolews / Stringi

Licence: other
THE String Processing Package for R (with ICU)

Projects that are alternatives to, or similar to, Stringi

stringx
Drop-in replacements for base R string functions powered by stringi
Stars: ✭ 14 (-93.14%)
Mutual labels:  unicode, text, icu, regex, regexp, string-manipulation, text-processing
Regex Automata
A low level regular expression library that uses deterministic finite automata.
Stars: ✭ 203 (-0.49%)
Mutual labels:  regex, text-processing, regexp
Regexpu
A source code transpiler that enables the use of ES2015 Unicode regular expressions in ES5.
Stars: ✭ 201 (-1.47%)
Mutual labels:  regex, unicode, regexp
r4strings
Handling Strings in R
Stars: ✭ 39 (-80.88%)
Mutual labels:  regex, string-manipulation, text-processing
Xioc
Extract indicators of compromise from text, including "escaped" ones.
Stars: ✭ 148 (-27.45%)
Mutual labels:  regex, text-processing, regexp
regXwild
⏱ Superfast ^Advanced wildcards++? | Unique algorithms that was implemented on native unmanaged C++ but easily accessible in .NET via Conari (with caching of 0x29 opcodes +optimizations) etc.
Stars: ✭ 20 (-90.2%)
Mutual labels:  text, regex, regexp
Text
An efficient packed, immutable Unicode text type for Haskell, with a powerful loop fusion optimization framework.
Stars: ✭ 248 (+21.57%)
Mutual labels:  text, unicode, string-manipulation
Proposal Regexp Unicode Property Escapes
Proposal to add Unicode property escapes `\p{…}` and `\P{…}` to regular expressions in ECMAScript.
Stars: ✭ 112 (-45.1%)
Mutual labels:  regex, unicode, regexp
subst
Search and des... argh... replace in many files at once. Use regexp and power of Python to replace what you want.
Stars: ✭ 20 (-90.2%)
Mutual labels:  text, regex, regexp
Chr
🔤 Lightweight R package for manipulating [string] characters
Stars: ✭ 18 (-91.18%)
Mutual labels:  regex, text-processing, string-manipulation
Emoji Regex
A regular expression to match all Emoji-only symbols as per the Unicode Standard.
Stars: ✭ 1,134 (+455.88%)
Mutual labels:  regex, unicode, regexp
Tokenizer
Fast and customizable text tokenization library with BPE and SentencePiece support
Stars: ✭ 132 (-35.29%)
Mutual labels:  natural-language-processing, unicode, icu
Regex Dos
👮 👊 RegEx Denial of Service (ReDos) Scanner
Stars: ✭ 143 (-29.9%)
Mutual labels:  regex, regexp
Stanza Old
Stanford NLP group's shared Python tools.
Stars: ✭ 142 (-30.39%)
Mutual labels:  natural-language-processing, text-processing
Regex
An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.
Stars: ✭ 2,125 (+941.67%)
Mutual labels:  regex, regexp
Nlpre
Python library for Natural Language Preprocessing (NLPre)
Stars: ✭ 158 (-22.55%)
Mutual labels:  natural-language-processing, text-processing
Grex
A command-line tool and library for generating regular expressions from user-provided test cases
Stars: ✭ 4,847 (+2275.98%)
Mutual labels:  regex, regexp
Voca rs
Voca_rs is the ultimate Rust string library inspired by Voca.js, string.py and Inflector, implemented as independent functions and on Foreign Types (String and str).
Stars: ✭ 167 (-18.14%)
Mutual labels:  unicode, string-manipulation
Guide To Swift Strings Sample Code
Xcode Playground Sample Code for the Flight School Guide to Swift Strings
Stars: ✭ 136 (-33.33%)
Mutual labels:  regex, unicode
Textwrap
An efficient and powerful Rust library for word wrapping text.
Stars: ✭ 164 (-19.61%)
Mutual labels:  text, unicode

stringi

THE String Processing Package for R

The online reference manual is available at https://stringi.gagolewski.com/.

A paper on stringi (draft version; comments are welcome) is available at https://stringi.gagolewski.com/_static/vignette/stringi.pdf.

stringi (pronounced “stringy”, IPA [strinɡi]) is THE R package for string/text/natural language processing. It is very fast, consistent, convenient, and, thanks to the ICU (International Components for Unicode) library, portable across all locales and platforms.

Available features include:

  • string concatenation, padding, wrapping,
  • substring extraction,
  • pattern searching (e.g., with Java-like regular expressions),
  • collation and sorting,
  • random string generation,
  • case mapping,
  • string transliteration,
  • Unicode normalisation,
  • date-time formatting and parsing,

and many more.
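
A few of the features listed above can be tried out with the short sketch below. It assumes that stringi is already installed; the input strings and variable names are made up for illustration, and only a handful of functions are shown.

```r
library(stringi)

# String concatenation and padding
joined <- stri_c("string", "i")                     # "stringi"
padded <- stri_pad_left("7", width = 3, pad = "0")  # "007"

# Substring extraction (positions are 1-based and inclusive)
prefix <- stri_sub("stringi", from = 1, to = 6)     # "string"

# Pattern searching with ICU (Java-like) regular expressions
has_a  <- stri_detect_regex(c("apple", "kiwi"), "^a")         # TRUE FALSE
digits <- stri_extract_all_regex("a1b22c333", "[0-9]+")[[1]]  # "1" "22" "333"

# Locale-aware sorting (collation)
sorted <- stri_sort(c("banana", "apple", "cherry"), locale = "en_US")

# Random string generation: one string of 8 characters
rnd <- stri_rand_strings(1, 8)

# Case mapping
upper <- stri_trans_toupper("stringi")              # "STRINGI"

# Unicode normalisation: "a" + combining acute accent composes
# to a single code point under NFC
nfc_len <- stri_length(stri_trans_nfc("a\u0301"))   # 1
```

Most stringi functions are vectorised over their arguments, so the same calls work unchanged on character vectors of any length.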

Package Maintainer: Marek Gagolewski

Authors and Contributors: Marek Gagolewski, with contributions from Bartłomiej Tartanus and many others.

The package's API was inspired by Hadley Wickham's stringr package (since 2015, stringr has itself been powered by stringi).

Homepage: https://stringi.gagolewski.com/

CRAN Entry: https://cran.r-project.org/web/packages/stringi/

How to access the stringi C++ API from within an Rcpp-based R package

System Requirements: R >= 2.14, ICU4C >= 55 (refer to the INSTALL file for more details)
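
As a rough way to compare an existing installation against these requirements, the sketch below reads the R version and the ICU version reported by stringi. The variable names are illustrative, and the check only makes sense once the package has been installed.

```r
library(stringi)

# stri_info() returns a list describing the ICU build in use
info <- stri_info()

# Compare against the minimum versions stated above
r_ok   <- getRversion() >= "2.14"
icu_ok <- numeric_version(info$ICU.version) >= "55"

cat("R version:  ", as.character(getRversion()), "\n")
cat("ICU version:", info$ICU.version, "\n")
```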

License: stringi's source code is licensed under the open source BSD-3-Clause license; for more details, see the LICENSE file.

This git repository also contains a custom subset of the ICU4C 55.1 and ICU4C 61.1 source code, which is copyrighted by Unicode and others. A binary version of the Unicode Character Database is included. For more details on copyright holders, see the LICENSE file. The ICU project is covered by the ICU license, a simple, permissive, non-copyleft free software license that is compatible with the GNU GPL. The ICU license is intended to allow ICU to be included both in free software projects and in proprietary or commercial products.

Changes: see the NEWS file.