
datamade / Parserator

License: MIT
A toolkit for making domain-specific probabilistic parsers

Programming Languages

python

Labels

crf

Projects that are alternatives to or similar to Parserator

grobid-quantities
GROBID extension for identifying and normalizing physical quantities.
Stars: ✭ 53 (-92.5%)
Mutual labels:  crf
Rnnsharp
RNNSharp is a toolkit of deep recurrent neural networks widely used for many different kinds of tasks, such as sequence labeling and sequence-to-sequence modeling. It is written in C# and based on .NET Framework 4.6 or above. RNNSharp supports many different types of networks, such as forward and bidirectional networks and sequence-to-sequence networks, and different types of layers, such as LSTM, Softmax, sampled Softmax, and others.
Stars: ✭ 277 (-60.82%)
Mutual labels:  crf
Sltk
A sequence labeling tool implementing the BLSTM-CNN-CRF model in PyTorch; achieves an F1 score of 91.10% on the CoNLL 2003 English NER test set (word and char features).
Stars: ✭ 338 (-52.19%)
Mutual labels:  crf
entity recognition
Entity recognition codes for "2019 Datagrand Cup: Text Information Extraction Challenge"
Stars: ✭ 26 (-96.32%)
Mutual labels:  crf
A Pytorch Tutorial To Sequence Labeling
Empower Sequence Labeling with Task-Aware Neural Language Model | a PyTorch Tutorial to Sequence Labeling
Stars: ✭ 257 (-63.65%)
Mutual labels:  crf
Slot filling and intent detection of slu
slot filling, intent detection, joint training, ATIS & SNIPS datasets, Facebook's multilingual dataset, MIT corpus, E-commerce Shopping Assistant (ECSA) dataset, CoNLL2003 NER, ELMo, BERT, XLNet
Stars: ✭ 298 (-57.85%)
Mutual labels:  crf
giantgo-render
A rapid form generator based on Vue 3 and Element Plus.
Stars: ✭ 28 (-96.04%)
Mutual labels:  crf
Bert Ner Pytorch
Chinese NER (Named Entity Recognition) using BERT (Softmax, CRF, Span)
Stars: ✭ 654 (-7.5%)
Mutual labels:  crf
Ner Pytorch
LSTM+CRF NER
Stars: ✭ 260 (-63.22%)
Mutual labels:  crf
Ner Lstm Crf
An easy-to-use named entity recognition (NER) toolkit that implements the Bi-LSTM+CRF model in TensorFlow.
Stars: ✭ 337 (-52.33%)
Mutual labels:  crf
pytorch-partial-crf
CRF, Partial CRF and Marginal CRF in PyTorch
Stars: ✭ 23 (-96.75%)
Mutual labels:  crf
keras-bert-ner
Keras solution of Chinese NER task using BiLSTM-CRF/BiGRU-CRF/IDCNN-CRF model with Pretrained Language Model: supporting BERT/RoBERTa/ALBERT
Stars: ✭ 7 (-99.01%)
Mutual labels:  crf
Bert seq2seq
A PyTorch implementation of BERT for seq2seq tasks using the UniLM scheme; it also supports automatic summarization, text classification, sentiment analysis, NER, part-of-speech tagging, and other tasks, and supports GPT-2 for continuing articles.
Stars: ✭ 298 (-57.85%)
Mutual labels:  crf
grobid-ner
A Named-Entity Recogniser based on Grobid.
Stars: ✭ 38 (-94.63%)
Mutual labels:  crf
Bert Bilstm Crf Ner
TensorFlow solution for the NER task using a BiLSTM-CRF model with Google BERT fine-tuning and private server services.
Stars: ✭ 3,838 (+442.86%)
Mutual labels:  crf
lstm-crf-tagging
No description or website provided.
Stars: ✭ 13 (-98.16%)
Mutual labels:  crf
Hscrf Pytorch
ACL 2018: Hybrid semi-Markov CRF for Neural Sequence Labeling (http://aclweb.org/anthology/P18-2038)
Stars: ✭ 284 (-59.83%)
Mutual labels:  crf
Python Crfsuite
A python binding for crfsuite
Stars: ✭ 678 (-4.1%)
Mutual labels:  crf
Lstm Crf Pytorch
LSTM-CRF in PyTorch
Stars: ✭ 364 (-48.51%)
Mutual labels:  crf
Macropodus
Macropodus, a natural language processing toolkit built on an Albert+BiLSTM+CRF deep learning architecture: Chinese word segmentation (CWS), part-of-speech tagging (POS), named entity recognition (NER), new word discovery, keyword extraction, text summarization, text similarity, a scientific calculator, conversion between Chinese and Arabic (or Roman) numerals, traditional/simplified Chinese conversion, and pinyin conversion.
Stars: ✭ 309 (-56.29%)
Mutual labels:  crf

parserator

A toolkit for making domain-specific probabilistic parsers

Do you have domain-specific text data that would be much more useful if you could derive structure from the strings? This toolkit will help you create a custom NLP model that learns from patterns in real data and then uses that knowledge to process new strings automatically. All you need is some training data to teach your parser about its domain.

What does a probabilistic parser do?

Given a string, a probabilistic parser will break it out into labeled components. The parser uses conditional random fields to label components based on (1) features of the component string and (2) the order of labels.

When is a probabilistic parser useful?

A probabilistic parser is particularly useful for sets of strings that may have common structure/patterns, but which deviate from those patterns in ways that are difficult to anticipate with hard-coded rules.

For example, in most cases, addresses in the United States start with a street number. But there are exceptions: sometimes valid U.S. addresses deviate from this pattern (e.g., addresses starting with a building name or a P.O. box). Furthermore, addresses in real data sets often include typos and other errors. Because there are infinitely many patterns and possible typos to account for, a probabilistic parser is well-suited to parse U.S. addresses.
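
For instance, usaddress, an address parser built with parserator, breaks an address string into labeled components (the output shown here is illustrative):

>>> import usaddress
>>> usaddress.parse('123 Main St. Suite 100 Chicago, IL')
[('123', 'AddressNumber'), ('Main', 'StreetName'), ('St.', 'StreetNamePostType'), ('Suite', 'OccupancyType'), ('100', 'OccupancyIdentifier'), ('Chicago,', 'PlaceName'), ('IL', 'StateName')]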

With a probabilistic (as opposed to a rule-based) approach, the parser can continually learn from new training data and thus continually improve its performance!

Some other examples of domains where a probabilistic parser can be useful:

  • addresses in other countries with unfamiliar conventions
  • product names/descriptions (e.g., parsing phrases like "Twizzlers Twists, Strawberry, 16-Ounce Bags (Pack of 6)" into brand, item, flavor, weight, etc.)
  • citations in academic writing

Examples of parserator

Try out parsers built with parserator, such as usaddress and probablepeople, on our web interface!

How to make a parser - quick overview

For more details on each step, see the parserator documentation.

  1. Initialize a new parser

    pip install parserator
    parserator init [YOUR PARSER NAME]
    python setup.py develop
    
  2. Configure the parser to your domain

    • configure labels (i.e., the set of possible tags for the tokens)
    • configure the tokenizer (i.e., how a raw string will be split into a sequence of tokens to be tagged)
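
    For example, in the module that parserator init generates, the labels and the tokenizer live in your module's __init__.py. A minimal sketch (the exact template may vary by version; the label names here are placeholders borrowed from the product example above):

    import re

    LABELS = ['Brand', 'Item', 'Flavor', 'Weight']

    def tokenize(raw_string):
        # Split a raw string into the sequence of tokens that will be tagged.
        # This simple version splits on whitespace.
        return re.split(r'\s+', raw_string.strip())
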
  3. Define features relevant to your domain

    • define token-level features (e.g., length, casing)
    • define sequence-level features (e.g., whether a token is the first token in the sequence)
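
    A sketch of the two feature functions in the generated module (the function names follow parserator's template, as in parsers like usaddress; the specific features shown are only examples):

    def tokens2features(tokens):
        # Sequence-level features: start from each token's own features,
        # then add positional context.
        feature_sequence = [tokenFeatures(token) for token in tokens]
        feature_sequence[0]['token.is_first'] = True
        feature_sequence[-1]['token.is_last'] = True
        return feature_sequence

    def tokenFeatures(token):
        # Token-level features: simple surface properties of the string.
        return {
            'token.lower': token.lower(),
            'token.length': len(token),
            'token.is_upper': token.isupper(),
            'token.has_digits': any(char.isdigit() for char in token),
        }
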
  4. Prepare training data

    • Parserator reads training data in XML format
    • To create XML training data from unlabeled strings in a CSV file, use parserator's command line interface to manually label tokens. It reads the values in the first column and ignores all other columns. To start labeling, run parserator label [infile] [outfile] [modulename]
    • For example, parserator label unlabeled/rawstrings.csv labeled_xml/labeled.xml usaddress
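
    Each training string is wrapped in tags named by the GROUP_LABEL and PARENT_LABEL constants in your module, with every token wrapped in its label. A sketch assuming the default tag names, with the hypothetical product-parser labels from above:

    <Collection>
        <TokenSequence><Brand>Twizzlers</Brand> <Item>Twists,</Item> <Flavor>Strawberry,</Flavor> <Weight>16-Ounce</Weight></TokenSequence>
    </Collection>
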
  5. Train your parser

    • To train your parser on your labeled training data, run parserator train [traindata] [modulename]
    • For example, parserator train labeled_xml/labeled.xml usaddress or parserator train "labeled_xml/*.xml" usaddress
    • After training, your parser will have an updated model, in the form of a .crfsuite settings file
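
    Internally, the generated module loads these settings with python-crfsuite, roughly like this (a sketch, assuming the default model file name):

    import pycrfsuite

    TAGGER = pycrfsuite.Tagger()
    # learned_settings.crfsuite is the file written by parserator train
    TAGGER.open('learned_settings.crfsuite')
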
  6. Repeat steps 3-5 as needed!

How to use your new parser

Once you are able to create a model from training data, install your custom parser by running python setup.py develop.

Then, in a Python shell, you can import your parser and use the parse and tag methods to process new strings. For example, to use the probablepeople module:

>>> import probablepeople
>>> probablepeople.parse('Mr George "Gob" Bluth II')
[('Mr', 'PrefixMarital'), ('George', 'GivenName'), ('"Gob"', 'Nickname'), ('Bluth', 'Surname'), ('II', 'SuffixGenerational')]
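
The tag method returns the same information as an ordered dictionary mapping labels to strings, along with an overall type for the string (the output shown here is illustrative):

>>> probablepeople.tag('Mr George "Gob" Bluth II')
(OrderedDict([('PrefixMarital', 'Mr'), ('GivenName', 'George'), ('Nickname', '"Gob"'), ('Surname', 'Bluth'), ('SuffixGenerational', 'II')]), 'Person')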

Errors and Bugs

If something is not behaving intuitively, it is a bug and should be reported. Report an issue.

Patches and Pull Requests

We welcome your ideas! You can make suggestions in the form of GitHub issues (bug reports, feature requests, general questions), or you can submit a code contribution via a pull request.

How to contribute code:

  • Fork the project.
  • Make your feature addition or bug fix.
  • Send us a pull request with a description of your work! Don't worry if it isn't perfect: think of a PR as a start of a conversation rather than a finished product.

Copyright and Attribution

Copyright (c) 2016 DataMade. Released under the MIT License.
