felipensp / liblex

License: MIT
C library for Lexical Analysis

Programming Languages

c
50402 projects - #5 most used programming language
Makefile
30231 projects

Projects that are alternatives to or similar to liblex

epub-parser
A powerful yet easy-to-use epub parser
Stars: ✭ 103 (+312%)
Mutual labels:  lib
farasapy
A Python implementation of Farasa toolkit
Stars: ✭ 69 (+176%)
Mutual labels:  tokenizer
berserker
Berserker - BERt chineSE woRd toKenizER
Stars: ✭ 17 (-32%)
Mutual labels:  tokenizer
lex
Lex is an implementation of the lex tool in Ruby.
Stars: ✭ 49 (+96%)
Mutual labels:  tokenizer
cheatengine-threadstack-finder
Lists all threads' stack base addresses for a given process id
Stars: ✭ 39 (+56%)
Mutual labels:  lib
metalsmith-paths
Metalsmith plugin that adds file path values to metadata
Stars: ✭ 19 (-24%)
Mutual labels:  lib
tokenizer
A simple tokenizer in Ruby for NLP tasks.
Stars: ✭ 44 (+76%)
Mutual labels:  tokenizer
teams-api
Unofficial Microsoft Teams Library
Stars: ✭ 92 (+268%)
Mutual labels:  lib
cortexm-AES
High-performance AES implementations optimized for Cortex-M microcontrollers
Stars: ✭ 18 (-28%)
Mutual labels:  lib
babelfish
🐡 Straightforward library for translations and dictionaries
Stars: ✭ 47 (+88%)
Mutual labels:  lib
psr2r-sniffer
A PSR-2-R code sniffer and code-style auto-correction tool, including many useful additions
Stars: ✭ 32 (+28%)
Mutual labels:  tokenizer
rustfst
Rust re-implementation of OpenFST - library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.
Stars: ✭ 104 (+316%)
Mutual labels:  tokenizer
elasticsearch-plugins
Some native scoring script plugins for elasticsearch
Stars: ✭ 30 (+20%)
Mutual labels:  tokenizer
ssh
Golang SSH lib for remote command execution and file upload/download, modeled after rsync and cp
Stars: ✭ 29 (+16%)
Mutual labels:  lib
fs-pochta-api
Library for working with the Russian Post API
Stars: ✭ 15 (-40%)
Mutual labels:  lib
hunspell
High-Performance Stemmer, Tokenizer, and Spell Checker for R
Stars: ✭ 101 (+304%)
Mutual labels:  tokenizer
neural tokenizer
Tokenize English sentences using neural networks.
Stars: ✭ 64 (+156%)
Mutual labels:  tokenizer
jigjs
🧩 A front-end framework
Stars: ✭ 22 (-12%)
Mutual labels:  lib
wink-tokenizer
Multilingual tokenizer that automatically tags each token with its type
Stars: ✭ 51 (+104%)
Mutual labels:  tokenizer
jargon
Tokenizers and lemmatizers for Go
Stars: ✭ 98 (+292%)
Mutual labels:  tokenizer

liblex

C library for Lexical Analysis

Build Status | Join the chat at https://gitter.im/felipensp/liblex

Usage:

/*
 * Lexer example
 * 
 * Author: Felipe Pena <felipensp at gmail.com>
 */
 
#include <stdio.h>
#include <liblex.h>

enum { /* Tokens */
	MYLEXER_PLUS = 1,
	MYLEXER_MINUS,
	MYLEXER_DIV,
	MYLEXER_MULT,
	MYLEXER_MOD,
	MYLEXER_START_COMMENT,
	MYLEXER_END_COMMENT,
	MYLEXER_WHITESPACE,
	MYLEXER_NUMBER,
	MYLEXER_IGNORE
};

enum { /* States */
	INITIAL = 0,
	COMMENT
};

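/*
 * Match callbacks: invoked when their pattern matches. In this example they
 * change the active lexer state via llex_set_state() and return the token id
 * to report for the matched text.
 */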
int start_comment_callback(llex *lex, char *str, size_t len) 
{
	llex_set_state(lex, COMMENT);
	return MYLEXER_START_COMMENT;
}

int end_comment_callback(llex *lex, char *str, size_t len) 
{
	llex_set_state(lex, INITIAL);
	return MYLEXER_END_COMMENT;
}

int number_callback(llex *lex, char *str, size_t len) 
{
	return MYLEXER_NUMBER;
}

int main(int argc, char **argv)
{
	llex lex;
	llex_token_id token_id;
		
	llex_init(&lex);
	llex_set_buffer(&lex, "1 - 2 + 3 / 4 \n"
						  "/* ignored str */");
	
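	/*
	 * Token registrations are scoped to the currently selected state:
	 * patterns added while a state is active only match in that state.
	 */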
	llex_set_state(&lex, INITIAL);
	llex_add_token_callback(&lex, "/*", start_comment_callback);
	
	llex_set_state(&lex, COMMENT);
	llex_add_token_callback(&lex, "*/", end_comment_callback);
	llex_add_token_regex(&lex, "(?:(?!\\*/).)+", MYLEXER_IGNORE);
	
	llex_set_state(&lex, INITIAL);
	llex_add_token(&lex, "+", MYLEXER_PLUS);
	llex_add_token(&lex, "-", MYLEXER_MINUS);
	llex_add_token(&lex, "/", MYLEXER_DIV);
	llex_add_token(&lex, "*", MYLEXER_MULT);
	llex_add_token_regex(&lex, "\\s+", MYLEXER_WHITESPACE);
	llex_add_token_regex_callback(&lex, "\\d+", number_callback);
	
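	/*
	 * llex_tokenizer() returns the next matched token id (> 0); the loop
	 * stops when no positive id is returned (end of input, or -1 for
	 * unrecognized input, which is handled below).
	 */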
	while ((token_id = llex_tokenizer(&lex)) > 0) {
		printf("Token id: %d - State: %d - '%.*s' - Start: %d:%d / End: %d:%d\n",
			token_id, 
			lex.current_state,
			lex.current_len,
			lex.current_token,
			lex.buffer_col_start,
			lex.buffer_line_start,
			lex.buffer_col_end,			
			lex.buffer_line_end);
	}
	if (token_id == -1) {
		printf("Unknown string `%s'\n", lex.current_token);
	}
	
	llex_cleanup(&lex);
	
	return 0;
}

Outputs:

Token id: 9 - State: 0 - '1' - Start: 0:1 / End: 1:1
Token id: 8 - State: 0 - ' ' - Start: 1:1 / End: 2:1
Token id: 2 - State: 0 - '-' - Start: 2:1 / End: 3:1
Token id: 8 - State: 0 - ' ' - Start: 3:1 / End: 4:1
Token id: 9 - State: 0 - '2' - Start: 4:1 / End: 5:1
Token id: 8 - State: 0 - ' ' - Start: 5:1 / End: 6:1
Token id: 1 - State: 0 - '+' - Start: 6:1 / End: 7:1
Token id: 8 - State: 0 - ' ' - Start: 7:1 / End: 8:1
Token id: 9 - State: 0 - '3' - Start: 8:1 / End: 9:1
Token id: 8 - State: 0 - ' ' - Start: 9:1 / End: 10:1
Token id: 3 - State: 0 - '/' - Start: 10:1 / End: 11:1
Token id: 8 - State: 0 - ' ' - Start: 11:1 / End: 12:1
Token id: 9 - State: 0 - '4' - Start: 12:1 / End: 13:1
Token id: 8 - State: 0 - ' 
' - Start: 13:1 / End: 1:2
Token id: 6 - State: 1 - '/*' - Start: 1:2 / End: 3:2
Token id: 10 - State: 1 - ' ignored str ' - Start: 3:2 / End: 16:2
Token id: 7 - State: 0 - '*/' - Start: 16:2 / End: 18:2
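
In the output above, the token ids correspond to the enum values declared in the example (1 = MYLEXER_PLUS, 2 = MYLEXER_MINUS, 3 = MYLEXER_DIV, 6/7 = comment delimiters, 8 = MYLEXER_WHITESPACE, 9 = MYLEXER_NUMBER, 10 = MYLEXER_IGNORE), the state column is 0 for INITIAL and 1 for COMMENT, and positions are printed as column:line. Note that MYLEXER_MOD is declared but never registered; hooking it up would follow the same pattern as the other operators (a sketch using the same call shown above):

	llex_add_token(&lex, "%", MYLEXER_MOD);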