FudanNLP / Nlpcc Wordseg Weibo

NLPCC 2016 Weibo Word Segmentation Shared Task

Programming Languages

python

Projects that are alternatives to or similar to Nlpcc Wordseg Weibo

Jcseg
Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keyword extraction, key sentence extraction, and summary extraction based on the TEXTRANK algorithm. Jcseg has a built-in HTTP server and search modules for the latest Lucene, Solr, and Elasticsearch.
Stars: ✭ 754 (+528.33%)
Mutual labels:  natural-language-processing, chinese-word-segmentation
Pyhanlp
Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional conversion, natural language processing
Stars: ✭ 2,564 (+2036.67%)
Mutual labels:  natural-language-processing, chinese-word-segmentation
Deeplearning nlp
A natural language processing library based on deep learning
Stars: ✭ 154 (+28.33%)
Mutual labels:  natural-language-processing, chinese-word-segmentation
Deepnlp
A natural language processing library based on deep learning
Stars: ✭ 34 (-71.67%)
Mutual labels:  natural-language-processing, chinese-word-segmentation
Dat8
General Assembly's 2015 Data Science course in Washington, DC
Stars: ✭ 1,516 (+1163.33%)
Mutual labels:  natural-language-processing
Commonsense Rc
Code for Yuanfudao at SemEval-2018 Task 11: Three-way Attention and Relational Knowledge for Commonsense Machine Comprehension
Stars: ✭ 112 (-6.67%)
Mutual labels:  natural-language-processing
Nlp Papers
Papers and Book to look at when starting NLP 📚
Stars: ✭ 111 (-7.5%)
Mutual labels:  natural-language-processing
Awesome Emotion Recognition In Conversations
A comprehensive reading list for Emotion Recognition in Conversations
Stars: ✭ 111 (-7.5%)
Mutual labels:  natural-language-processing
Discobert
Code for paper "Discourse-Aware Neural Extractive Text Summarization" (ACL20)
Stars: ✭ 120 (+0%)
Mutual labels:  natural-language-processing
Nonautoreggenprogress
Tracking the progress in non-autoregressive generation (translation, transcription, etc.)
Stars: ✭ 118 (-1.67%)
Mutual labels:  natural-language-processing
Unified Summarization
Official codes for the paper: A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss.
Stars: ✭ 114 (-5%)
Mutual labels:  natural-language-processing
Deep Nlp Seminars
Materials for deep NLP course
Stars: ✭ 113 (-5.83%)
Mutual labels:  natural-language-processing
Stanford Tensorflow Tutorials
This repository contains code examples for Stanford's course: TensorFlow for Deep Learning Research.
Stars: ✭ 10,098 (+8315%)
Mutual labels:  natural-language-processing
Opus Mt
Open neural machine translation models and web services
Stars: ✭ 111 (-7.5%)
Mutual labels:  natural-language-processing
Pytextrank
Python implementation of TextRank for phrase extraction and summarization of text documents
Stars: ✭ 1,675 (+1295.83%)
Mutual labels:  natural-language-processing
Danlp
DaNLP is a repository for Natural Language Processing resources for the Danish Language.
Stars: ✭ 111 (-7.5%)
Mutual labels:  natural-language-processing
Rbert
Implementation of BERT in R
Stars: ✭ 114 (-5%)
Mutual labels:  natural-language-processing
Dynamic Coattention Network Plus
Dynamic Coattention Network Plus (DCN+) TensorFlow implementation. Question answering using Deep NLP.
Stars: ✭ 117 (-2.5%)
Mutual labels:  natural-language-processing
Tensorflow Nlp
NLP and Text Generation Experiments in TensorFlow 2.x / 1.x
Stars: ✭ 1,487 (+1139.17%)
Mutual labels:  natural-language-processing
Declutr
The corresponding code from our paper "DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations". Do not hesitate to open an issue if you run into any trouble!
Stars: ✭ 111 (-7.5%)
Mutual labels:  natural-language-processing

NLPCC2016-WordSeg-Weibo

NLPCC 2016 Weibo Word Segmentation Shared Task

## Description of the Task

The word is the fundamental unit in natural language understanding. However, Chinese sentences consist of continuous Chinese characters without natural delimiters. Chinese word segmentation, which identifies the sequence of words in a sentence and marks the boundaries between words, is therefore the first step in Chinese natural language processing.
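
As a concrete illustration, segmented text is often represented either as space-separated words or as per-character boundary tags. The sketch below converts a segmented sentence into the common BMES (begin/middle/end/single) tags; both the space-separated format and the tag scheme are widely used conventions assumed here, not formats specified by this task description.

```python
# A minimal sketch of segmentation as boundary marking, assuming the common
# space-separated gold format and the BMES tag scheme (neither is specified
# by this README).

def words_to_bmes(words):
    """Map every character of every word to a B/M/E/S boundary tag."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append((word, "S"))                   # single-character word
        else:
            tags.append((word[0], "B"))                # word-initial character
            tags.extend((c, "M") for c in word[1:-1])  # word-internal characters
            tags.append((word[-1], "E"))               # word-final character
    return tags

if __name__ == "__main__":
    gold = "我 爱 自然 语言 处理"  # hypothetical segmented micro-blog text
    print(words_to_bmes(gold.split()))
```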

Unlike the commonly used news datasets, we use more informal texts from Sina Weibo. The training and test data consist of micro-blogs on various topics, such as finance, sports, and entertainment.

Each participant will be allowed to submit three runs: a closed track run, a semi-open track run, and an open track run.

  1. In the closed track, participants may use only the information found in the provided training data. External resources such as externally obtained word counts, part-of-speech information, or name lists are excluded.
  2. In the semi-open track, participants may additionally use information extracted from the provided background data. External resources such as externally obtained word counts, part-of-speech information, or name lists are still excluded.
  3. In the open track, participants may use any information that is public and easily obtained, but results may not be produced through manual labeling or crowdsourcing.

## Data

The data are collected from Sina Weibo. Both the training and test files are UTF-8 encoded. Besides the training data, we also provide background data, from which the training and test data are drawn. The background data is provided so that more sophisticated features can be derived in an unsupervised way.
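
For example, one simple unsupervised statistic that could be drawn from the unlabeled background data is the pointwise mutual information (PMI) between adjacent characters, where low PMI is a weak hint of a word boundary. The sketch below only illustrates this idea; the particular feature and the file name are assumptions, not part of the task definition.

```python
# Illustrative only: character-bigram PMI estimated from unlabeled text.
# Low PMI between adjacent characters is a weak hint of a word boundary.
import math
from collections import Counter

def char_pmi(lines):
    """Return PMI for every adjacent character pair seen in `lines`."""
    uni, bi = Counter(), Counter()
    for line in lines:
        chars = [c for c in line.strip() if not c.isspace()]
        uni.update(chars)
        bi.update(zip(chars, chars[1:]))
    n_uni, n_bi = sum(uni.values()), sum(bi.values())
    return {
        (a, b): math.log((cnt / n_bi) / ((uni[a] / n_uni) * (uni[b] / n_uni)))
        for (a, b), cnt in bi.items()
    }

# Usage with a hypothetical background file:
# with open("background.txt", encoding="utf-8") as f:
#     pmi = char_pmi(f)
```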

## Download

The dataset provides a standard training/dev/test split. Researchers interested in the dataset should download and fill out this Agreement Form and send the scanned version back to Xipeng Qiu ([email protected]; email subject: Fudan Micro-blog Dataset data request).

This dataset provides a standard training/dev/test split. If you use this dataset in a paper, please send us a signed usage agreement: sign the form, scan it, and send the scanned copy to us (email: [email protected]; subject: Fudan Micro-blog Dataset request).

## Evaluation Metric

Unlike the standard precision, recall, and F1-score, we will provide a new evaluation metric this year. Detailed information can be found at http://aclweb.org/anthology/P/P16/P16-1206.pdf.
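
For reference, the standard metrics mentioned above are computed over word spans, as sketched below. This is only the conventional baseline; the new psychometric-inspired metric itself is defined in the linked paper and is not reproduced here.

```python
# Standard segmentation precision/recall/F1 over (start, end) word spans.
# This is only the conventional baseline; the new metric is defined in the
# ACL 2016 paper linked above.

def spans(words):
    """Turn a word list into a set of (start, end) character offsets."""
    out, pos = set(), 0
    for w in words:
        out.add((pos, pos + len(w)))
        pos += len(w)
    return out

def prf(gold_words, pred_words):
    gold, pred = spans(gold_words), spans(pred_words)
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Example: one over-merged word lowers both precision and recall.
print(prf("我 爱 自然 语言 处理".split(), "我 爱 自然语言 处理".split()))
```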

## Papers

  1. Peng Qian, Xipeng Qiu, Xuanjing Huang. A New Psychometric-inspired Evaluation Metric for Chinese Word Segmentation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2016. [PDF]
  2. Xipeng Qiu, Peng Qian, Zhan Shi. Overview of the NLPCC-ICCPOL 2016 Shared Task: Chinese Word Segmentation for Micro-blog Texts. In Proceedings of the Fifth Conference on Natural Language Processing and Chinese Computing & the Twenty-Fourth International Conference on Computer Processing of Oriental Languages, 2016.

## Citation

If you use this dataset in a paper, please cite the following reference.

```bibtex
@InProceedings{qiu2016overview,
  Title     = {Overview of the {NLPCC-ICCPOL} 2016 Shared Task: Chinese Word Segmentation for Micro-blog Texts},
  Author    = {Xipeng Qiu and Peng Qian and Zhan Shi},
  Booktitle = {Proceedings of The Fifth Conference on Natural Language Processing and Chinese Computing \& The Twenty Fourth International Conference on Computer Processing of Oriental Languages},
  Year      = {2016}
}
```

## Contact Information

For any questions about this shared task, please contact: Xipeng Qiu, Group of NLP & DL, School of Computer Science, Fudan University. Email: [email protected]
