JonathanRaiman / Wikipedia_ner

📖 Labeled examples from wiki dumps in Python

Wikipedia NER

Tool to train and obtain named entity recognition labeled examples from Wikipedia dumps.
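Wikipedia dumps are bz2-compressed MediaWiki XML. Article links of the form `[[Target|anchor text]]` pair a surface string with the page it refers to, which makes them a natural source of labeled examples. The sketch below streams pages from such a dump and extracts those pairs. It is a minimal illustration under those assumptions, not wikipedia_ner's actual implementation. The helpers `iter_pages` and `labeled_examples` and the skipped-prefix list are hypothetical.

```python
import bz2
import re
import xml.etree.ElementTree as ET
from typing import Iterator, List, Tuple

# Matches [[Target]] or [[Target|anchor text]]; nested brackets are not handled.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")

# Hypothetical heuristic: link prefixes that do not point at ordinary articles.
NON_ARTICLE = ("Category:", "File:", "Image:")


def _local(tag: str) -> str:
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(path: str, max_articles: int) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) for at most max_articles pages of a .bz2 XML dump."""
    if max_articles <= 0:
        return
    count = 0
    with bz2.open(path, "rb") as fh:
        # Stream the XML so the whole dump never has to fit in memory.
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if _local(elem.tag) != "page":
                continue
            title, text = "", ""
            for child in elem.iter():
                name = _local(child.tag)
                if name == "title":
                    title = child.text or ""
                elif name == "text":
                    text = child.text or ""
            yield title, text
            elem.clear()  # release the finished page's subtree
            count += 1
            if count >= max_articles:
                return


def labeled_examples(wikitext: str) -> List[Tuple[str, str]]:
    """Return (surface text, link target) pairs from a page's wikilinks."""
    pairs = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        if target.startswith(NON_ARTICLE):
            continue  # skip category and media links
        anchor = (match.group(2) or target).strip()
        pairs.append((anchor, target))
    return pairs
```

Mapping link targets to entity types, which a full NER training set would also need, is not shown here.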

Usage is demonstrated in an IPython notebook (nbviewer link).

Usage

Here is an example usage with the first 200 articles from the English Wikipedia dump (dated late 2013):

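```python
import wikipedia_ner

# Parse only the first 200 articles of the dump.
parseresult = wikipedia_ner.parse_dump("enwiki.bz2", max_articles=200)

# Most frequent category according to ParsedPage.categories_counter.
most_common_category = wikipedia_ner.ParsedPage.categories_counter.most_common(1)[0][0]

# Translate the category's child indices back into article titles.
most_common_category_children = [
    parseresult.index2target[child]
    for child in wikipedia_ner.ParsedPage.categories[most_common_category].children
]

# In IPython, the value of this expression is shown as the cell output.
"In '%s' the children are %r" % (
    most_common_category,
    ", ".join(most_common_category_children),
)

#=> "In 'Category : Member states of the United Nations' the children are 'Afghanistan, Algeria, Andorra, Antigua and Barbuda, Azerbaijan, Angola, Albania'"
```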
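The usage example depends on three pieces of bookkeeping: a counter of categories, a per-category set of child indices, and an `index2target` list that maps indices back to titles. The following standalone sketch reproduces that bookkeeping on a few toy pages. It assumes categories come from `[[Category:...]]` tags in each page's wikitext. The function `index_categories` and the sample pages are hypothetical, and this is not how `wikipedia_ner.ParsedPage` is actually implemented.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Set

# Matches [[Category:Name]] or [[Category:Name|sort key]].
CATEGORY = re.compile(r"\[\[\s*Category\s*:\s*([^\]|]+)(?:\|[^\]]*)?\]\]")


def index_categories(pages: Dict[str, str]):
    """Return (index2target, children, counter) for a {title: wikitext} mapping."""
    index2target: List[str] = []                         # page index -> title
    children: Dict[str, Set[int]] = defaultdict(set)     # category -> page indices
    counter: Counter = Counter()                         # category -> member count
    for title, wikitext in pages.items():
        idx = len(index2target)
        index2target.append(title)
        # Deduplicate so a page listing a category twice is counted once.
        for name in {n.strip() for n in CATEGORY.findall(wikitext)}:
            children[name].add(idx)
            counter[name] += 1
    return index2target, children, counter


if __name__ == "__main__":
    pages = {
        "France": "A country. [[Category:Member states of the United Nations]] "
                  "[[Category:Countries in Europe]]",
        "Japan": "An island country. [[Category:Member states of the United Nations]]",
        "Paris": "Capital of [[France]]. [[Category:Capitals in Europe]]",
    }
    index2target, children, counter = index_categories(pages)
    top = counter.most_common(1)[0][0]
    names = [index2target[i] for i in sorted(children[top])]
    print("In '%s' the children are %r" % (top, ", ".join(names)))
    # prints: In 'Member states of the United Nations' the children are 'France, Japan'
```

Sorting the child indices makes the output order deterministic. Iterating over a set directly, as the original example does, gives no ordering guarantee.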