Coleridge-Initiative / rclc

License: CC0-1.0
Rich Context leaderboard competition, including the corpus and current SOTA for required tasks.

Programming Languages

python
shell

Projects that are alternatives of or similar to rclc

Cluedatasetsearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included
Stars: ✭ 2,112 (+10460%)
Mutual labels:  corpus, knowledge-graph
Opensource-Contribution-Leaderboard
Open Source project contributors tracking leaderboard built with ❤️ in NodeJS 😉
Stars: ✭ 30 (+50%)
Mutual labels:  leaderboard
Kaggle dstl submission
Code for a winning model (3 out of 419) in a Dstl Satellite Imagery Feature Detection challenge
Stars: ✭ 159 (+695%)
Mutual labels:  competition
megs
A merged version of multiple open-source German speech datasets.
Stars: ✭ 21 (+5%)
Mutual labels:  corpus
Xf event extraction2020top1
First-place solution to the iFLYTEK 2020 Event Extraction Challenge, plus a complete event extraction system
Stars: ✭ 186 (+830%)
Mutual labels:  competition
Market-Trend-Prediction
A project for a knowledge graph construction course. It leverages historical stock prices and integrates social media listening from customers to predict the market trend of the Dow Jones Industrial Average (DJIA).
Stars: ✭ 57 (+185%)
Mutual labels:  knowledge-graph
Sv Benchmarks
Collection of Verification Tasks
Stars: ✭ 158 (+690%)
Mutual labels:  competition
awesome-knowledge-graphs
Graph databases, Knowledge Graphs, SPARQ
Stars: ✭ 56 (+180%)
Mutual labels:  knowledge-graph
knowledge-graph-change-language
Tools for working with KGCL
Stars: ✭ 14 (-30%)
Mutual labels:  knowledge-graph
Dialogue-Corpus
No description or website provided.
Stars: ✭ 27 (+35%)
Mutual labels:  corpus
Cool-NLPCV
Some Cool NLP and CV Repositories and Solutions (a collection of open-source solutions, datasets, tools, and learning materials for common NLP tasks)
Stars: ✭ 143 (+615%)
Mutual labels:  knowledge-graph
Halite Ii
Season 2 of @twosigma's artificial intelligence programming challenge
Stars: ✭ 201 (+905%)
Mutual labels:  competition
Data-Science-Hackathon-And-Competition
Grandmaster in MachineHack (3rd Rank Best) | Top 70 in AnalyticsVidya & Zindi | Expert at Kaggle | Hack AI
Stars: ✭ 165 (+725%)
Mutual labels:  competition
Cail2019
Second-place solution for the similar case matching task at CAIL 2019 (Challenge of AI in Law), with dataset and documentation
Stars: ✭ 176 (+780%)
Mutual labels:  competition
OLGA
an Ontology SDK
Stars: ✭ 36 (+80%)
Mutual labels:  knowledge-graph
Halite Iii
Season 3 of @twosigma's artificial intelligence programming challenge
Stars: ✭ 159 (+695%)
Mutual labels:  competition
atari-leaderboard
A leaderboard of human and machine performance on the Arcade Learning Environment (ALE).
Stars: ✭ 22 (+10%)
Mutual labels:  leaderboard
ZS-F-VQA
Code and Data for paper: Zero-shot Visual Question Answering using Knowledge Graph [ ISWC 2021 ]
Stars: ✭ 51 (+155%)
Mutual labels:  knowledge-graph
german-nouns
A list of ~100,000 German nouns and their grammatical properties compiled from WiktionaryDE as CSV file. Plus a module to look up the data and parse compound words.
Stars: ✭ 101 (+405%)
Mutual labels:  corpus
KCL
Code and Data for the paper: Molecular Contrastive Learning with Chemical Element Knowledge Graph [AAAI 2022]
Stars: ✭ 61 (+205%)
Mutual labels:  knowledge-graph

Tracking Progress in Rich Context

The Coleridge Initiative at NYU has been researching Rich Context to enhance search and discovery of datasets used in scientific research – see the Background Info section for more details. Partnering with experts throughout academia and industry, NYU-CI has worked to leverage closely adjacent fields: NLP/NLU, knowledge graphs, recommender systems, scholarly infrastructure, data mining from scientific literature, dataset discovery, linked data, open vocabularies, metadata management, and data governance. Leaderboards are published here on GitHub to track state-of-the-art (SOTA) progress among the top results.


Leaderboard 1

Entity Linking for Datasets in Publications

The first challenge is to identify the datasets used in research publications, initially focused on the problem of entity linking. Research papers generally mention the datasets they've used, although there are few formal means to describe that metadata in a machine-readable way. The goal here is to predict a set of dataset IDs for each publication. The dataset IDs within the corpus make up the complete set of datasets that can appear.

Identifying dataset mentions typically requires:

  • extracting text from an open access PDF
  • some NLP parsing of the text
  • feature engineering (e.g., attention to where text is located in a paper)
  • modeling to identify up to 5 datasets per publication
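As a rough illustration of the last two steps, the sketch below ranks candidate dataset IDs by counting exact title matches in already-extracted text and keeps at most five. It is a naive baseline for illustration only, not the method used by any leaderboard entry. The function names, the `dataset_titles` mapping, and the sample IDs are all hypothetical, and PDF text extraction is assumed to have happened upstream.

```python
import re
from typing import Dict, List

MAX_PREDICTIONS = 5  # the task asks for up to 5 datasets per publication


def normalize(text: str) -> str:
    """Lowercase and collapse runs of non-alphanumerics into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def link_datasets(pub_text: str, dataset_titles: Dict[str, str]) -> List[str]:
    """Rank dataset IDs by how often their title occurs in the text.

    `dataset_titles` maps a (hypothetical) dataset ID to its title.
    Returns at most MAX_PREDICTIONS IDs, most frequently mentioned first.
    """
    haystack = normalize(pub_text)
    counts: Dict[str, int] = {}
    for dataset_id, title in dataset_titles.items():
        needle = normalize(title)
        if not needle:
            continue  # skip titles with no usable characters
        # Whole-phrase match on the normalized text.
        pattern = rf"\b{re.escape(needle)}\b"
        n = len(re.findall(pattern, haystack))
        if n > 0:
            counts[dataset_id] = n
    # Sort by descending count, then by ID for a deterministic order.
    ranked = sorted(counts, key=lambda d: (-counts[d], d))
    return ranked[:MAX_PREDICTIONS]


# Hypothetical IDs and text, for illustration only.
datasets = {
    "dataset-001": "National Health and Nutrition Examination Survey",
    "dataset-002": "American Community Survey",
    "dataset-003": "Current Population Survey",
}
text = (
    "We use the American Community Survey (ACS) and the National Health "
    "and Nutrition Examination Survey. The American Community Survey "
    "sample covers 2010-2015."
)
predictions = link_datasets(text, datasets)
print(predictions)
```

Exact title matching misses abbreviations and paraphrased mentions, which is where the NLP parsing and feature engineering steps would come in.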

See Evaluating Models for Entity Linking with Datasets for details about how the Top5uptoD leaderboard metric is calculated.
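The linked evaluation document is the authoritative definition of the metric. The sketch below encodes only one plausible reading of the name, stated here as an assumption: for a publication with D known datasets, score the top k = min(5, D) ranked predictions by the fraction that are correct, then average over publications. The function names and sample IDs are hypothetical.

```python
from typing import Dict, List, Set


def top5_up_to_d(ranked: List[str], truth: Set[str], k_max: int = 5) -> float:
    """Fraction of the top-k predictions that are correct, k = min(k_max, D).

    Assumed reading of the metric, not taken from the official definition.
    """
    d = len(truth)
    if d == 0:
        return 0.0  # assumption: publications with no known datasets score zero
    k = min(k_max, d)
    # Using a set means duplicate predictions are not counted twice.
    hits = len(set(ranked[:k]) & truth)
    return hits / k


def mean_score(preds: Dict[str, List[str]], gold: Dict[str, Set[str]]) -> float:
    """Average the per-publication score over every publication in `gold`."""
    scores = [
        top5_up_to_d(preds.get(pub_id, []), truth)
        for pub_id, truth in gold.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical publication and dataset IDs.
gold = {"pub-1": {"a", "b"}, "pub-2": {"c"}}
preds = {"pub-1": ["a", "x", "b"], "pub-2": ["c"]}
score = mean_score(preds, gold)
print(score)
```

Under this reading, `pub-1` has D = 2, so only its first two predictions count: one of the two is correct, and the correct third prediction is ignored.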

Instructions

Use of open source and open standards is especially important to further the cause of effective, reproducible research. We're hosting this competition to focus on the research challenges of specific machine learning use cases encountered within Rich Context – see the Workflow Stages section.

If you have any questions about the Rich Context leaderboard competition – and especially if you identify any problems in the corpus (e.g., data quality, incorrect metadata, broken links, etc.) – please use the GitHub issues for this repo and pull requests to report, discuss, and resolve them.
