ksopyla / Awesome Nlp Polish

Licence: MIT
A curated list of resources dedicated to Natural Language Processing (NLP) in Polish: models, tools, and datasets.

Projects that are alternatives to, or similar to, Awesome Nlp Polish

Wongnai Corpus
Collection of Wongnai's datasets
Stars: ✭ 57 (-62.75%)
Mutual labels:  datasets, nlp-machine-learning
ake-datasets
Large, curated set of benchmark datasets for evaluating automatic keyphrase extraction algorithms.
Stars: ✭ 125 (-18.3%)
Mutual labels:  datasets, nlp-machine-learning
Machine Learning Resources
A curated list of awesome machine learning frameworks, libraries, courses, books and many more.
Stars: ✭ 226 (+47.71%)
Mutual labels:  datasets, nlp-machine-learning
Codesearchnet
Datasets, tools, and benchmarks for representation learning of code.
Stars: ✭ 1,378 (+800.65%)
Mutual labels:  datasets, nlp-machine-learning
Onnxt5
Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.
Stars: ✭ 143 (-6.54%)
Mutual labels:  nlp-machine-learning
Pipedream
Connect APIs, remarkably fast. Free for developers.
Stars: ✭ 2,068 (+1251.63%)
Mutual labels:  datasets
Dl Text
Text pre-processing library for deep learning (Keras, TensorFlow).
Stars: ✭ 119 (-22.22%)
Mutual labels:  nlp-machine-learning
Bird Recognition Review
A list of useful resources for bird sound (song and call) recognition, such as datasets, papers, links to open source projects, and competitions.
Stars: ✭ 116 (-24.18%)
Mutual labels:  datasets
Idenprof
The IdenProf dataset is a collection of images of identifiable professionals. It was collected to enable the development of AI systems that can identify people and the nature of their job simply by looking at an image, as humans can.
Stars: ✭ 149 (-2.61%)
Mutual labels:  datasets
Zzz Retired openstt
RETIRED - OpenSTT is now retired. If you would like more information on Mycroft AI's open source STT projects, please visit:
Stars: ✭ 146 (-4.58%)
Mutual labels:  nlp-machine-learning
Remo Python
🐰 Python lib for remo - the app for annotations and images management in Computer Vision
Stars: ✭ 138 (-9.8%)
Mutual labels:  datasets
Nlp Pretrained Model
A collection of Natural language processing pre-trained models.
Stars: ✭ 122 (-20.26%)
Mutual labels:  nlp-machine-learning
Pix2code
pix2code: Generating Code from a Graphical User Interface Screenshot
Stars: ✭ 11,349 (+7317.65%)
Mutual labels:  datasets
Multi object datasets
Multi-object image datasets with ground-truth segmentation masks and generative factors.
Stars: ✭ 121 (-20.92%)
Mutual labels:  datasets
Pins
Pin, Discover and Share Resources
Stars: ✭ 149 (-2.61%)
Mutual labels:  datasets
G Reader
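Model for the 2018 Machine Reading Comprehension Technology Competition. Among more than 1,000 teams from China and abroad, it ranked 6th on BLEU-4 and 14th on ROUGE-L (no ensembling, no pre-trained word embeddings, no dropout).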
Stars: ✭ 117 (-23.53%)
Mutual labels:  nlp-machine-learning
Complete Life Cycle Of A Data Science Project
Complete-Life-Cycle-of-a-Data-Science-Project
Stars: ✭ 140 (-8.5%)
Mutual labels:  datasets
Hands On Natural Language Processing With Python
This repository is for my students of Udemy. You can find all lecture codes along with mentioned files for reading in here. So, feel free to clone it and if you have any problem just raise a question.
Stars: ✭ 146 (-4.58%)
Mutual labels:  nlp-machine-learning
Seq2seq tutorial
Code For Medium Article "How To Create Data Products That Are Magical Using Sequence-to-Sequence Models"
Stars: ✭ 132 (-13.73%)
Mutual labels:  nlp-machine-learning
Cluedatasetsearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included.
Stars: ✭ 2,112 (+1280.39%)
Mutual labels:  datasets

awesome-nlp-polish

Table of Contents:

Polish text datasets

Task-oriented datasets

Raw texts

Models and Embeddings

Polish Transformer models

  • Polish RoBERTa Model - trained on a corpus consisting of a Polish Wikipedia dump, Polish books and articles, and the Polish Parliamentary Corpus.
  • PoLitBert - Polish RoBERTa model trained on Polish Wikipedia, Polish literature, and OSCAR. The main assumption is that high-quality text yields a good model.
  • PolBert - Polish BERT model, trained with the code provided in Google's BERT GitHub repository. Merged with huggingface/Transformers.
  • Allegro HerBERT - Polish BERT model trained on Polish corpora using only the MLM objective with dynamic masking of whole words.
  • SlavicBert - multilingual BERT model (BERT, Slavic Cased): 4 languages (Bulgarian, Czech, Polish, Russian), 12 layers, 768 hidden units, 12 heads, 110M parameters, 600 MB. There is also another SlavicBert model at http://docs.deeppavlov.ai/en/master/features/models/bert.html, but I had problems converting it to PyTorch.
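
The HerBERT entry above mentions an MLM objective with dynamic masking of whole words. As a rough illustration of that idea (not HerBERT's actual training code), the sketch below masks every sub-token of a chosen word together and draws a fresh mask on each call. It assumes WordPiece-style `##` continuation markers purely for readability. The helper names `group_whole_words` and `mask_whole_words`, the `<mask>` string, and the 15% default rate are hypothetical choices, and the usual 80/10/10 replacement scheme of BERT-style MLM is left out.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol; real tokenizers define their own


def group_whole_words(tokens):
    """Group token indices into words.

    Assumption: a token starting with '##' continues the previous word.
    """
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])
    return words


def mask_whole_words(tokens, mask_prob=0.15, rng=None):
    """Return a copy of `tokens` with whole words replaced by MASK.

    Calling this again for every epoch gives "dynamic" masking:
    the same sentence receives a different mask pattern each time.
    """
    if rng is None:
        rng = random.Random()
    words = group_whole_words(tokens)
    if not words:
        return list(tokens)
    # Mask at least one word, and never more words than exist.
    n_to_mask = min(len(words), max(1, round(len(words) * mask_prob)))
    masked = list(tokens)
    for word in rng.sample(words, n_to_mask):
        for i in word:  # every sub-token of the chosen word is masked together
            masked[i] = MASK
    return masked


if __name__ == "__main__":
    tokens = ["war", "##sza", "##wa", "jest", "stolicą", "pol", "##ski"]
    for epoch in range(3):
        print(epoch, mask_whole_words(tokens, mask_prob=0.25))
```

Passing a seeded `random.Random` as `rng` makes a run reproducible, while leaving it unset gives a new mask pattern on every call.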

Other models

Language processing tools and libraries

Papers, articles, blog posts

Contribution

If you have or know of valuable materials (datasets, models, posts, articles) that are missing here, please feel free to edit the list and submit a pull request. You can also send me a note on LinkedIn or via email: [email protected].