
WorksApplications / Sudachidict

Licence: apache-2.0
A lexicon for Sudachi

Programming Languages

java

Projects that are alternatives of or similar to Sudachidict

Kagome
Self-contained Japanese Morphological Analyzer written in pure Go
Stars: ✭ 554 (+336.22%)
Mutual labels:  morphological-analysis, segmentation, pos-tagging
Sudachi
A Japanese Tokenizer for Business
Stars: ✭ 496 (+290.55%)
Mutual labels:  morphological-analysis, segmentation, pos-tagging
Sudachipy
Python version of Sudachi, a Japanese tokenizer.
Stars: ✭ 207 (+62.99%)
Mutual labels:  morphological-analysis, segmentation, pos-tagging
udar
UDAR Does Accented Russian: A finite-state morphological analyzer of Russian that handles stressed wordforms.
Stars: ✭ 15 (-88.19%)
Mutual labels:  pos-tagging, morphological-analysis
retinal-exudates-detection
exudates detection using hybrid approach (Image Morphology & Machine Learning)
Stars: ✭ 53 (-58.27%)
Mutual labels:  segmentation, morphological-analysis
Jumanpp
Juman++ (a Morphological Analyzer Toolkit)
Stars: ✭ 254 (+100%)
Mutual labels:  morphological-analysis, pos-tagging
Qutuf
Qutuf (قُطُوْف): An Arabic Morphological analyzer and Part-Of-Speech tagger as an Expert System.
Stars: ✭ 84 (-33.86%)
Mutual labels:  morphological-analysis, pos-tagging
Retina Features
Project for segmentation of blood vessels, microaneurysm and hardexudates in fundus images.
Stars: ✭ 95 (-25.2%)
Mutual labels:  morphological-analysis, segmentation
Nlp Models Tensorflow
Gathers machine learning and Tensorflow deep learning models for NLP problems, 1.13 < Tensorflow < 2.0
Stars: ✭ 1,603 (+1162.2%)
Mutual labels:  pos-tagging
Multi object datasets
Multi-object image datasets with ground-truth segmentation masks and generative factors.
Stars: ✭ 121 (-4.72%)
Mutual labels:  segmentation
Rnnmorph
Morphological analyzer for Russian and English languages based on neural networks and dictionary-lookup systems.
Stars: ✭ 111 (-12.6%)
Mutual labels:  morphological-analysis
Sejong Corpus
Korean sejong corpus download and simple analysis
Stars: ✭ 116 (-8.66%)
Mutual labels:  morphological-analysis
Deeplab V3 Plus Cityscapes
mIOU=80.02 on cityscapes. My implementation of deeplabv3+ (also known as 'Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation'), based on the cityscapes dataset.
Stars: ✭ 121 (-4.72%)
Mutual labels:  segmentation
Nnunet
No description or website provided.
Stars: ✭ 2,231 (+1656.69%)
Mutual labels:  segmentation
Camel tools
A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.
Stars: ✭ 124 (-2.36%)
Mutual labels:  morphological-analysis
Masktrack
Implementation of MaskTrack method which is the baseline of several state-of-the-art video object segmentation methods in Pytorch
Stars: ✭ 110 (-13.39%)
Mutual labels:  segmentation
Deep Learning Based Ecg Annotator
Annotation of ECG signals using deep learning, tensorflow’ Keras
Stars: ✭ 110 (-13.39%)
Mutual labels:  segmentation
Rdrpostagger
A fast and accurate POS and morphological tagging toolkit (EACL 2014)
Stars: ✭ 126 (-0.79%)
Mutual labels:  pos-tagging
Hyperdensenet
This repository contains the code of HyperDenseNet, a hyper-densely connected CNN to segment medical images in multi-modal image scenarios.
Stars: ✭ 124 (-2.36%)
Mutual labels:  segmentation
Nucleisegmentation
cGAN-based Multi Organ Nuclei Segmentation
Stars: ✭ 120 (-5.51%)
Mutual labels:  segmentation

SudachiDict

A lexicon for the Japanese tokenizer Sudachi.

Download

Pre-built dictionaries are available for download.

Python packages

You can install the dictionaries for WorksApplications/SudachiPy, the Python version of Sudachi, as Python packages.

$ pip install sudachidict_core
$ pip install sudachidict_small
$ sudachipy link -t small
$ pip install sudachidict_full
$ sudachipy link -t full
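
After running the commands above, it can be handy to confirm which dictionary packages are actually importable in the current environment. The following is a minimal sketch using only the Python standard library. The package names come from the pip commands above; the helper name installed_dictionaries is hypothetical and not part of SudachiPy or SudachiDict.

```python
from importlib.util import find_spec

# Package names taken from the pip commands above.
DICT_PACKAGES = {
    "small": "sudachidict_small",
    "core": "sudachidict_core",
    "full": "sudachidict_full",
}


def installed_dictionaries():
    """Return the dictionary types whose packages can be found on the import path.

    Hypothetical helper for illustration; it only checks that the package
    exists and does not verify that `sudachipy link` has been run.
    """
    return [name for name, pkg in DICT_PACKAGES.items() if find_spec(pkg) is not None]


if __name__ == "__main__":
    print(installed_dictionaries())
```

An empty list means none of the three packages is installed in the interpreter that ran the script.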

Dictionary types

Sudachi has three types of dictionaries.

  • Small: includes only the vocabulary of UniDic
  • Core: includes basic vocabulary (default)
  • Full: includes miscellaneous proper nouns
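
As a rough illustration of how the three types relate to the commands in the Python packages section, the sketch below maps a dictionary type to the commands listed there. It assumes, based only on those commands, that Core needs no link step while Small and Full do. The function name install_commands is hypothetical.

```python
DICT_TYPES = ("small", "core", "full")


def install_commands(dict_type):
    """Return the shell commands from this README for the given dictionary type.

    Assumption: Core is the default, so it needs no `sudachipy link` step,
    while Small and Full are linked explicitly after installation.
    """
    if dict_type not in DICT_TYPES:
        raise ValueError(f"unknown dictionary type: {dict_type!r}")
    commands = [f"pip install sudachidict_{dict_type}"]
    if dict_type != "core":
        commands.append(f"sudachipy link -t {dict_type}")
    return commands


if __name__ == "__main__":
    for dict_type in DICT_TYPES:
        print(dict_type, "->", install_commands(dict_type))
```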

Build from sources

SudachiDict needs Git LFS to download the sources of the system dictionaries. If you fail to build the dictionaries, install Git LFS and run git lfs pull.

Building the dictionaries fails if the locale's encoding is not UTF-8. In that case, add -Dfile.encoding=UTF-8 to MAVEN_OPTS.
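
If the build is driven from a script, the MAVEN_OPTS setting can be prepared programmatically. The sketch below is one possible way to do that, not part of the project: it returns a copy of an environment mapping with the flag appended when it is missing. The helper name with_utf8_maven_opts is hypothetical, and the actual Maven invocation is left to the caller.

```python
import os

# Flag named in the build notes above.
ENCODING_FLAG = "-Dfile.encoding=UTF-8"


def with_utf8_maven_opts(env):
    """Return a copy of env whose MAVEN_OPTS includes the UTF-8 encoding flag.

    Hypothetical helper: existing options are kept, and the flag is appended
    only when it is not already present.
    """
    new_env = dict(env)
    current = new_env.get("MAVEN_OPTS", "")
    if ENCODING_FLAG not in current:
        current = f"{current} {ENCODING_FLAG}".strip()
    new_env["MAVEN_OPTS"] = current
    return new_env


if __name__ == "__main__":
    print(with_utf8_maven_opts(os.environ)["MAVEN_OPTS"])
```

The returned mapping can be passed as the env argument of subprocess.run when launching the build, so the caller's own environment is left unchanged.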

Licenses

SudachiDict by Works Applications Co., Ltd. is licensed under the Apache License, Version 2.0.

Copyright (c) 2017 Works Applications Co., Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

This project includes UniDic and a part of NEologd.
