alexandrainst / Danlp

Licence: bsd-3-clause
DaNLP is a repository for Natural Language Processing resources for the Danish Language.

DaNLP is a repository for Natural Language Processing resources for the Danish Language. It is a collection of available datasets and models for a variety of NLP tasks. The aim is to make it easier for practitioners in industry to apply Danish NLP, which is why the project is licensed to allow commercial use. The project features code examples showing how to use the datasets and models in popular NLP frameworks such as spaCy, Transformers and Flair, as well as deep learning frameworks such as PyTorch. See our [documentation pages](https://danlp-alexandra.readthedocs.io/en/latest/index.html) for more details about our models and datasets, and for definitions of the modules provided through the DaNLP package.

If you are new to NLP or want to know more about the project in a broader perspective, you can start on our microsite.


Help us improve DaNLP

  • 🙋 Have you tried the DaNLP package? Then we would love to chat with you about your experiences from a company perspective. It will take approximately 20-30 minutes and requires no preparation. The call can be in English or Danish, as you prefer. Please leave your details here and we will reach out to arrange a call.

News

  • 🎉 Version 0.0.11 has been released with new features that use a pre-trained BERT model by BotXo for masked-word prediction, next-sentence prediction and embeddings. The BERT NER model also comes with an updated feature for predicting the tags combined. Two new datasets have been added: one for coreference resolution and the wordnet DanNet, which can be loaded to find e.g. synonyms.

  • 📘 A Jupyter notebook tutorial on doing data augmentation on texts

Next up

  • 🔗 Models for coreference resolution with benchmarks and documentation

Installation

To get started using DaNLP in your Python project, simply install the pip package. Note, however, that installing the pip package does not install all NLP libraries, because we want you to be free to limit the dependencies to what you actually use.

Install with pip

To get started using DaNLP simply install the project with pip:

pip install danlp 

Note that installing DaNLP does not install other NLP libraries such as Gensim, spaCy, flair or Transformers. This keeps the installation as minimal as possible and lets the user choose to, e.g., load word embeddings with either spaCy, flair or Gensim. Therefore, depending on the functions you need, you should install one or several of the following: pip install flair, pip install spacy and/or pip install gensim. You can check the requirements.txt file to see which versions the packages have been tested with.
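Because the optional libraries are not pulled in automatically, it can help to check which of them are present before calling a DaNLP function that needs one. The following is a minimal sketch using only the Python standard library. The helper name `find_missing` and the tuple of import names are illustrative assumptions and not part of the DaNLP API.

```python
import importlib.util

# Import names of the optional libraries mentioned above (assumed to match
# the names used in `import` statements).
OPTIONAL_LIBRARIES = ("gensim", "spacy", "flair", "transformers")


def find_missing(names=OPTIONAL_LIBRARIES):
    """Return the import names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Not installed:", ", ".join(missing))
        print("Install only what you need, e.g.: pip install " + missing[0])
    else:
        print("All optional NLP libraries are available.")
```

The check only looks the modules up and does not import them, so it stays fast even when heavy frameworks are installed.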

Install from source

If you want to be able to use the latest developments before they are released in a new pip package, or you want to modify the code yourself, then clone this repo and install from source.

git clone https://github.com/alexandrainst/danlp.git
cd danlp
pip install . 

To install the dependencies used in the package with the tested versions:

pip install -r requirements.txt
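To see whether an existing environment matches the tested versions, the pins in requirements.txt can be compared with what is installed. This is a rough sketch under the assumption that the file uses simple `name==version` lines; any other requirement syntax is skipped. The helper names `parse_pins` and `compare_with_installed` are hypothetical.

```python
from importlib import metadata
from pathlib import Path


def parse_pins(text):
    """Extract {package: version} from lines of the form `name==version`."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blank lines and unpinned or ranged requirements
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def compare_with_installed(pins):
    """Return {package: (tested, installed)} for every mismatch.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, tested in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != tested:
            mismatches[name] = (tested, installed)
    return mismatches


if __name__ == "__main__":
    # Assumes the script is run from the root of the cloned repository.
    pins = parse_pins(Path("requirements.txt").read_text())
    for name, (tested, installed) in compare_with_installed(pins).items():
        print(f"{name}: tested with {tested}, installed: {installed}")
```

A mismatch does not necessarily mean something is broken; it only indicates that the combination differs from the one the package was tested with.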

Install from GitHub

Alternatively, you can install the latest version from GitHub using:

pip install git+https://github.com/alexandrainst/danlp.git

Install with Docker

To quickly get started with DaNLP and try out the models, you can use our Docker image. To start an IPython session, simply run:

docker run -it --rm alexandrainst/danlp ipython

If you want to run a <script.py> in your current working directory you can run:

docker run -it --rm -v "$PWD":/usr/src/app -w /usr/src/app alexandrainst/danlp python <script.py>
                  

Quick Start

Read more in our documentation pages.

NLP Models

Natural Language Processing is an active area of research and it consists of many different tasks. The DaNLP repository provides an overview of Danish models for some of the most common NLP tasks.

The repository is under development and this is the list of NLP tasks we have covered and plan to cover in the repository.

If you are interested in Danish support for any specific NLP task you are welcome to get in contact with us.

We also recommend checking out this awesome list of Danish NLP resources by Finn Årup Nielsen.

Datasets

The number of datasets in the Danish language is limited. The DaNLP repository provides an overview of the available Danish datasets that can be used for commercial purposes.

The DaNLP package allows you to download and preprocess datasets. You can read about the datasets here.

Examples

You will find examples and tutorials here that show how to use NLP in Danish. This project also keeps a blog on Medium, written in Danish, where we write about Danish NLP, and in time we will also provide some real cases of how NLP is applied in Danish companies.

Structure of the repo

To help you navigate, we provide an overview of the structure of the GitHub repository:

.
├── danlp                   # Source files
│   ├── datasets            # Code to load datasets with different frameworks
│   └── models              # Code to load models with different frameworks
├── docker                  # Docker image
├── docs                    # Documentation and files for setting up Read The Docs
│   ├── docs                # Documentation for tasks, datasets and frameworks
│   │   ├── tasks           # Documentation for nlp tasks with benchmark results
│   │   ├── frameworks      # Overview over different frameworks used
│   │   ├── gettingstarted  # Guides for installation and getting started
│   │   └── imgs            # Images used in documentation
│   └── library             # Files used for Read the Docs
├── examples                # Examples, tutorials and benchmark scripts
│   ├── benchmarks          # Scripts for reproducing benchmarks results
│   └── tutorials           # Jupyter notebook tutorials
└── tests                   # Tests for continuous integration with travis

How do I contribute?

If you want to contribute to the DaNLP repository and make it better, your help is very welcome. You can contribute to the project in many ways:

  • Help us write good tutorials on Danish NLP use-cases
  • Contribute with your own pretrained NLP models or datasets in Danish (see our contributing guidelines for more details on how to contribute to this repository)
  • Create GitHub issues with questions and bug reports
  • Notify us of other Danish NLP resources or tell us about any good ideas that you have for improving the project through the Discussions section of this repository.

Who is behind?

The DaNLP repository is maintained by the Alexandra Institute, a Danish non-profit company with a mission to create value, growth and welfare in society. The Alexandra Institute is a member of GTS, a network of independent Danish research and technology organisations.

The work on this repository is part of the Dansk For Alle performance contract allocated to the Alexandra Institute by the Danish Ministry of Higher Education and Science. The project runs for two years, 2019 and 2020, and an overview of the project can be found on our microsite.
