anoopkunchukuttan / Indic_nlp_library

License: MIT
Resources and tools for Indian language Natural Language Processing

Programming Languages

python

Projects that are alternatives of or similar to Indic nlp library

Zhihu
This repo contains the source code in my personal column (https://zhuanlan.zhihu.com/zhaoyeyu), implemented using Python 3.6. Including Natural Language Processing and Computer Vision projects, such as text generation, machine translation, deep convolution GAN and other actual combat code.
Stars: ✭ 3,307 (+850.29%)
Mutual labels:  natural-language-processing
Ai Deadlines
⏰ AI conference deadline countdowns
Stars: ✭ 3,852 (+1006.9%)
Mutual labels:  natural-language-processing
Matchzoo
Facilitating the design, comparison and sharing of deep text matching models.
Stars: ✭ 3,568 (+925.29%)
Mutual labels:  natural-language-processing
Trankit
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Stars: ✭ 311 (-10.63%)
Mutual labels:  natural-language-processing
Displacy
💥 displaCy.js: An open-source NLP visualiser for the modern web
Stars: ✭ 311 (-10.63%)
Mutual labels:  natural-language-processing
Chakin
Simple downloader for pre-trained word vectors
Stars: ✭ 323 (-7.18%)
Mutual labels:  natural-language-processing
Nlprule
A fast, low-resource Natural Language Processing and Text Correction library written in Rust.
Stars: ✭ 309 (-11.21%)
Mutual labels:  natural-language-processing
Nlp Papers With Arxiv
Statistics and accepted paper list of NLP conferences with arXiv link
Stars: ✭ 345 (-0.86%)
Mutual labels:  natural-language-processing
Bytenet Tensorflow
ByteNet for character-level language modelling
Stars: ✭ 319 (-8.33%)
Mutual labels:  natural-language-processing
Dynamic Memory Networks In Theano
Implementation of Dynamic memory networks by Kumar et al. http://arxiv.org/abs/1506.07285
Stars: ✭ 334 (-4.02%)
Mutual labels:  natural-language-processing
Biosentvec
BioWordVec & BioSentVec: pre-trained embeddings for biomedical words and sentences
Stars: ✭ 308 (-11.49%)
Mutual labels:  natural-language-processing
Gcn Over Pruned Trees
Graph Convolution over Pruned Dependency Trees Improves Relation Extraction (authors' PyTorch implementation)
Stars: ✭ 312 (-10.34%)
Mutual labels:  natural-language-processing
Adam qas
ADAM - A Question Answering System. Inspired from IBM Watson
Stars: ✭ 330 (-5.17%)
Mutual labels:  natural-language-processing
Ltp
Language Technology Platform
Stars: ✭ 3,648 (+948.28%)
Mutual labels:  natural-language-processing
Adapter Transformers
Huggingface Transformers + Adapters = ❤️
Stars: ✭ 338 (-2.87%)
Mutual labels:  natural-language-processing
Awesome Arabic
A curated list of awesome projects and dev/design resources for supporting Arabic computational needs.
Stars: ✭ 309 (-11.21%)
Mutual labels:  natural-language-processing
Clause
🏇 聊天机器人,自然语言理解,语义理解
Stars: ✭ 323 (-7.18%)
Mutual labels:  natural-language-processing
Arxivtimes
repository to research & share the machine learning articles
Stars: ✭ 3,651 (+949.14%)
Mutual labels:  natural-language-processing
Lingua
👄 The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike
Stars: ✭ 341 (-2.01%)
Mutual labels:  natural-language-processing
Nndial
NNDial is an open source toolkit for building end-to-end trainable task-oriented dialogue models. It is released by Tsung-Hsien (Shawn) Wen from Cambridge Dialogue Systems Group under Apache License 2.0.
Stars: ✭ 332 (-4.6%)
Mutual labels:  natural-language-processing

Indic NLP Library

The goal of the Indic NLP Library is to build Python-based libraries for common text processing and Natural Language Processing tasks in Indian languages. Indian languages share many similarities in script, phonology, syntax, etc., and this library attempts to provide a general solution for the toolsets most commonly required for Indian language text.

The library provides the following functionalities:

  • Text Normalization
  • Script Information
  • Word Tokenization and Detokenization
  • Sentence Splitting
  • Word Segmentation
  • Syllabification
  • Script Conversion
  • Romanization
  • Indicization
  • Transliteration
  • Translation
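The script similarity mentioned above can be illustrated with a small, self-contained sketch. The major Brahmi-derived scripts occupy parallel 128-codepoint Unicode blocks, so a character can often be mapped to another script by keeping its offset within the block. This is only an illustration under that assumption, not the library's own implementation; the names `SCRIPT_BASE` and `convert_script` are hypothetical, and some offsets have no assigned character in every script (Tamil, for example, lacks the aspirated consonants).

```python
# Start of the Unicode block for each script, keyed by an ISO 639-1 language code.
# This table belongs to the sketch; it is not taken from the library.
SCRIPT_BASE = {
    "hi": 0x0900,  # Devanagari
    "bn": 0x0980,  # Bengali
    "pa": 0x0A00,  # Gurmukhi
    "gu": 0x0A80,  # Gujarati
    "or": 0x0B00,  # Oriya
    "ta": 0x0B80,  # Tamil
    "te": 0x0C00,  # Telugu
    "kn": 0x0C80,  # Kannada
    "ml": 0x0D00,  # Malayalam
}
BLOCK_SIZE = 0x80

# The danda and double danda (U+0964, U+0965) are shared by all these scripts,
# so they are left unchanged rather than shifted.
SHARED_OFFSETS = {0x64, 0x65}


def convert_script(text, src_lang, tgt_lang):
    """Map each character to the same block offset in the target script."""
    src_base = SCRIPT_BASE[src_lang]
    tgt_base = SCRIPT_BASE[tgt_lang]
    converted = []
    for ch in text:
        offset = ord(ch) - src_base
        if 0 <= offset < BLOCK_SIZE and offset not in SHARED_OFFSETS:
            converted.append(chr(tgt_base + offset))
        else:
            # Characters outside the source block (spaces, Latin, digits) pass through.
            converted.append(ch)
    return "".join(converted)


if __name__ == "__main__":
    print(convert_script("कमल", "hi", "gu"))
```

A production converter also has to handle script-specific exceptions and unassigned codepoints, which this sketch deliberately ignores.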

The data resources required by the Indic NLP Library are hosted in a separate repository. Some modules require these resources. You can download them from the Indic NLP Resources project.

If you are interested in Indian language NLP resources, you should check the Indic NLP Catalog for pointers.

Pre-requisites

  • Python 3.x
    • (For the Python 2.x version, check the tag PYTHON_2.7_FINAL_JAN_2019. Python 2.x is no longer actively supported, but compatibility will be maintained as far as possible.)
  • Indic NLP Resources
  • Other dependencies are listed in setup.py

Configuration

  • Installation from pip:

    pip install indic-nlp-library

  • If you want to use the project from the GitHub repo, add the project to the Python path:

    • Clone this repository
    • Install dependencies: pip install -r requirements.txt
    • Run: export PYTHONPATH=$PYTHONPATH:<project base directory>
  • In either case, export the path to the Indic NLP Resources directory:

    Run: export INDIC_RESOURCES_PATH=<path to Indic NLP resources>
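Before running any module that needs the resources, it can help to confirm that the environment variable is visible to Python. The following is a minimal sketch; `resolve_resources_path` is a hypothetical helper written for this example and is not part of the library.

```python
import os


def resolve_resources_path(environ=os.environ):
    """Return the directory named by INDIC_RESOURCES_PATH, or raise if unusable."""
    path = environ.get("INDIC_RESOURCES_PATH")
    if not path:
        raise RuntimeError(
            "INDIC_RESOURCES_PATH is not set; export it as described above."
        )
    if not os.path.isdir(path):
        raise RuntimeError(
            "INDIC_RESOURCES_PATH does not point to a directory: " + path
        )
    return path


if __name__ == "__main__":
    try:
        print("Using Indic NLP resources at:", resolve_resources_path())
    except RuntimeError as err:
        print("Configuration problem:", err)
```

Passing the environment as a parameter keeps the check easy to exercise without modifying the real process environment.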

Usage

You can use the Python API to access all the features of the library. Many of the most common operations are also accessible via a unified command-line API.
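As a rough sketch of what Python API usage can look like, the example below normalizes and tokenizes a Hindi sentence. It assumes the pip package is installed and that `IndicNormalizerFactory` and `indic_tokenize.trivial_tokenize` are available under the module paths shown; check the documentation for the exact API of your installed version. The wrapper `normalize_and_tokenize` is a hypothetical name used only here, and it returns `None` when the library cannot be imported.

```python
def normalize_and_tokenize(text, lang="hi"):
    """Normalize text and split it into tokens, or return None without the library."""
    try:
        # Imported lazily so the sketch still runs when the library is absent.
        from indicnlp.normalize.indic_normalize import IndicNormalizerFactory
        from indicnlp.tokenize import indic_tokenize
    except ImportError:
        return None

    # Build a normalizer for the given language code and apply it.
    normalizer = IndicNormalizerFactory().get_normalizer(lang)
    normalized = normalizer.normalize(text)

    # Rule-based word tokenization of the normalized text.
    return indic_tokenize.trivial_tokenize(normalized, lang)


if __name__ == "__main__":
    tokens = normalize_and_tokenize("यह एक उदाहरण वाक्य है।", lang="hi")
    if tokens is None:
        print("indic-nlp-library is not installed")
    else:
        print(tokens)
```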

Getting Started

Check this IPython Notebook for examples of how to use the Python API.

  • You can find the Python 2.x Notebook here

Documentation

You can find detailed documentation HERE

It covers the Python API as well as the command-line reference.

Citing

If you use this library, please include the following citation:

@misc{kunchukuttan2020indicnlp,
author = "Anoop Kunchukuttan",
title = "{The IndicNLP Library}",
year = "2020",
howpublished={\url{https://github.com/anoopkunchukuttan/indic_nlp_library/blob/master/docs/indicnlp.pdf}}
}

You can find the document HERE

Website

http://anoopkunchukuttan.github.io/indic_nlp_library

Author

Anoop Kunchukuttan ([email protected])

Companies, Organizations, Projects using IndicNLP Library

Revision Log

0.71 : 03 Sep 2020

- Improved documentation
- Bug fixes

0.7 : 02 Apr 2020

- Unified command-line API
- Improved documentation
- Added setup.py

0.6 : 16 Dec 2019

- New romanizer and indicizer
- Script Unifiers
- Improved script normalizers
- Added contrib directory for sample uses
- Changed to MIT license

0.5 : 03 Jun 2019

- Improved word tokenizer to handle dates and numbers
- Added sentence splitter that handles common prefixes/honorifics and uses some heuristics
- Added detokenizer
- Added acronym transliterator that can convert English acronyms to Brahmi-derived scripts

0.4 : 28 Jan 2019: Ported to Python 3, with many feature additions since the last release, primarily around script information, script similarity and syllabification.

0.3 : 21 Oct 2014: Supports morph-analysis between Indian languages

0.2 : 13 Jun 2014: Supports transliteration between Indian languages and tokenization of Indian languages

0.1 : 12 Mar 2014: Initial version. Supports text normalization.

LICENSE

Indic NLP Library is released under the MIT license.
