
Licence: apache-2.0
Pujangga - Indonesian Natural Language Processing Tool with REST API, an Interface for InaNLP and Deeplearning4j's Word2Vec

Programming language: Scala

Projects that are alternatives of or similar to Pujangga

Awesome Embedding Models
A curated list of awesome embedding models tutorials, projects and communities.
Stars: ✭ 1,486 (+3061.7%)
Mutual labels:  natural-language-processing, word2vec
Deep Math Machine Learning.ai
A blog which talks about machine learning, deep learning algorithms and the Math. and Machine learning algorithms written from scratch.
Stars: ✭ 173 (+268.09%)
Mutual labels:  natural-language-processing, word2vec
Scattertext
Beautiful visualizations of how language differs among document types.
Stars: ✭ 1,722 (+3563.83%)
Mutual labels:  natural-language-processing, word2vec
Ja.text8
Japanese text8 corpus for word embedding.
Stars: ✭ 79 (+68.09%)
Mutual labels:  natural-language-processing, word2vec
Natural Language Processing
Programming Assignments and Lectures for Stanford's CS 224: Natural Language Processing with Deep Learning
Stars: ✭ 377 (+702.13%)
Mutual labels:  natural-language-processing, word2vec
Repo 2016
R, Python and Mathematica Codes in Machine Learning, Deep Learning, Artificial Intelligence, NLP and Geolocation
Stars: ✭ 103 (+119.15%)
Mutual labels:  natural-language-processing, word2vec
Gensim
Topic Modelling for Humans
Stars: ✭ 12,763 (+27055.32%)
Mutual labels:  natural-language-processing, word2vec
Scattertext Pydata
Notebooks for the Seattle PyData 2017 talk on Scattertext
Stars: ✭ 132 (+180.85%)
Mutual labels:  natural-language-processing, word2vec
Languagecrunch
LanguageCrunch NLP server docker image
Stars: ✭ 281 (+497.87%)
Mutual labels:  natural-language-processing, word2vec
Practical 1
Oxford Deep NLP 2017 course - Practical 1: word2vec
Stars: ✭ 220 (+368.09%)
Mutual labels:  natural-language-processing, word2vec
Sense2vec
🦆 Contextually-keyed word vectors
Stars: ✭ 1,184 (+2419.15%)
Mutual labels:  natural-language-processing, word2vec
Text2vec
Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
Stars: ✭ 715 (+1421.28%)
Mutual labels:  natural-language-processing, word2vec
Kor2vec
Library for Korean morpheme and word vector representation
Stars: ✭ 64 (+36.17%)
Mutual labels:  natural-language-processing, word2vec
Magnitude
A fast, efficient universal vector embedding utility package.
Stars: ✭ 1,394 (+2865.96%)
Mutual labels:  natural-language-processing, word2vec
Repo 2017
Python codes in Machine Learning, NLP, Deep Learning and Reinforcement Learning with Keras and Theano
Stars: ✭ 1,123 (+2289.36%)
Mutual labels:  natural-language-processing, word2vec
Germanwordembeddings
Toolkit to obtain and preprocess german corpora, train models using word2vec (gensim) and evaluate them with generated testsets
Stars: ✭ 189 (+302.13%)
Mutual labels:  natural-language-processing, word2vec
Cs224n
CS224n: Natural Language Processing with Deep Learning Assignments Winter, 2017
Stars: ✭ 656 (+1295.74%)
Mutual labels:  natural-language-processing, word2vec
Nlp In Practice
Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.
Stars: ✭ 790 (+1580.85%)
Mutual labels:  natural-language-processing, word2vec
Pytorch Skipgram
Implementing Skip-gram Negative Sampling with pytorch
Stars: ✭ 39 (-17.02%)
Mutual labels:  word2vec
Ludwig
Data-centric declarative deep learning framework
Stars: ✭ 8,018 (+16959.57%)
Mutual labels:  natural-language-processing

Pujangga

Indonesian Natural Language Processing REST API

An interface for InaNLP and Deeplearning4j's Word2Vec for Indonesian (Bahasa Indonesia), exposed as a REST API.

[Screenshot: a Pujangga request and response in the Paw REST client]

Credits:

Local Setup

  1. Install Scala 2.12.2 and Lightbend Activator.

  2. Clone the project:

$ git clone git@github.com:panggi/pujangga.git

  3. Download the dependencies:

$ cd pujangga
$ activator

  4. Download the pretrained word2vec model from https://drive.google.com/uc?id=0B5YTktu2dOKKNUY1OWJORlZTcUU&export=download

  5. Run the application:

$ export WORD2VEC_FILE=/path/to/word2vec_wiki_id
$ activator run

  6. Access the API at http://localhost:9000

API Endpoints

Stemmer

Request:

POST /stemmer

{
  "string": "Prof. Habibie akan melakukan kunjungan resmi ke PT. Pindad di Bandung"
}

Response:

{
  "status": "success",
  "data": "prof Habibie akan laku kunjung resmi ke pt Pindad di bandung"
}
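
Any HTTP client can issue these calls. The following is a minimal Python sketch using only the standard library. The helper names `build_request` and `call` are hypothetical, the base URL is the one from Local Setup, and the `Content-Type: application/json` header is an assumption that is not stated here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9000"  # address from the Local Setup steps


def build_request(path, payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for a Pujangga endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url + path,
        data=body,
        # Assumption: the server expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(path, payload, base_url=BASE_URL):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_request(path, payload, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `call("/stemmer", {"string": "..."})` should return a dictionary with the `status` and `data` fields shown above.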

Phrase Chunker

Request:

POST /phrasechunker

{
  "string": "Prof. Habibie akan melakukan kunjungan resmi ke PT. Pindad di Bandung"
}

Response:

{
  "status": "success",
  "data": {
    "map": {
      "Pindad ": "NP",
      "Prof. Habibie ": "NP",
      ".": ".",
      "di Bandung ": "PP",
      "akan melakukan kunjungan resmi ke PT ": "VP"
    },
    "list": [
      "NP",
      "VP",
      "NP",
      "PP"
    ]
  }
}
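
In the sample above, the keys of `map` carry trailing spaces, and `list` appears to give the chunk tags in sentence order. A small Python sketch for grouping the chunks by tag, assuming the response shape shown above (the function name `phrases_by_tag` is hypothetical):

```python
from collections import defaultdict

# "data" object copied from the sample response above.
data = {
    "map": {
        "Pindad ": "NP",
        "Prof. Habibie ": "NP",
        ".": ".",
        "di Bandung ": "PP",
        "akan melakukan kunjungan resmi ke PT ": "VP",
    },
    "list": ["NP", "VP", "NP", "PP"],
}


def phrases_by_tag(chunk_data):
    """Group chunk texts by tag, trimming the trailing spaces in the keys."""
    grouped = defaultdict(list)
    for phrase, tag in chunk_data["map"].items():
        grouped[tag].append(phrase.strip())
    # Sort each group so the result does not depend on JSON key order.
    return {tag: sorted(phrases) for tag, phrases in grouped.items()}


print(phrases_by_tag(data))
```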

Part-of-Speech Tagger

Request:

POST /postagger

{
  "string": "Prof. Habibie akan melakukan kunjungan resmi ke PT. Pindad di Bandung"
}

Response:

{
  "status": "success",
  "data": {
    "map": {
      "resmi": "JJ",
      ".": ".",
      "akan": "MD",
      "ke": "IN",
      "di": "IN",
      "Bandung": "NNP",
      "Pindad": "NNP",
      "PT": "NN",
      "Prof.": "NNP",
      "kunjungan": "NN",
      "Habibie": "NNP",
      "melakukan": "VBT"
    },
    "list": [
      "NNP",
      "NNP",
      "MD",
      "VBT",
      "NN",
      "JJ",
      "IN",
      "NN",
      "NNP",
      "IN",
      "NNP"
    ]
  }
}
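
Because `map` is keyed by word, a word that occurs twice in a sentence can only appear once there, so `list` is the safer field for counting. A short Python sketch that tallies the tags from the sample response above (the function name `tag_counts` is hypothetical):

```python
from collections import Counter

# "list" field copied from the sample response above.
pos_data = {
    "list": ["NNP", "NNP", "MD", "VBT", "NN", "JJ",
             "IN", "NN", "NNP", "IN", "NNP"],
}


def tag_counts(data):
    """Count how often each part-of-speech tag occurs in the tag sequence."""
    return Counter(data["list"])


counts = tag_counts(pos_data)
print(counts.most_common())
```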

Named-Entity Tagger

Request:

POST /netagger

{
  "string": "Prof. Habibie akan melakukan kunjungan resmi ke PT. Pindad di Bandung"
}

Response:

{
  "status": "success",
  "data": [
    "OTHER",
    "PERSON-B",
    "OTHER",
    "OTHER",
    "OTHER",
    "OTHER",
    "OTHER",
    "LOCATION-B",
    "OTHER",
    "PERSON-B",
    "OTHER",
    "LOCATION-B"
  ]
}
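
The response contains only labels, so the caller has to pair them with tokens. The Python sketch below assumes the twelve labels align one-to-one with a tokenization that splits "PT." into "PT" and "."; this alignment is an assumption inferred from the other sample responses, not something stated here. The function name `tagged_entities` is hypothetical.

```python
# Assumed tokenization of the request string (12 tokens, matching 12 labels).
tokens = ["Prof.", "Habibie", "akan", "melakukan", "kunjungan", "resmi",
          "ke", "PT", ".", "Pindad", "di", "Bandung"]

# "data" field copied from the sample response above.
labels = ["OTHER", "PERSON-B", "OTHER", "OTHER", "OTHER", "OTHER",
          "OTHER", "LOCATION-B", "OTHER", "PERSON-B", "OTHER", "LOCATION-B"]


def tagged_entities(tokens, labels, outside="OTHER"):
    """Pair tokens with labels and keep only those not labelled as outside."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    return [(tok, lab) for tok, lab in zip(tokens, labels) if lab != outside]


print(tagged_entities(tokens, labels))
```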

Formalizer

Request:

POST /formalizer

{
  "string": "Sis, lu bisa nggak pesenin gw sepatu newbalance tipe 960? gpl ya. hati2 sama penipuan anak 4l4y"
}

Response:

{
  "status": "success",
  "data": "Sis , kamu bisa tidak pesankan saya sepatu newbalance tipe 960 ? tidak pakai lama iya . hati-hati sama penipuan anak norak "
}

Stopwords Removal

Request:

POST /stopwords

{
  "string": "Prof. Habibie akan melakukan kunjungan resmi ke PT. Pindad di Bandung"
}

Response:

{
  "status": "success",
  "data": "Prof. Habibie kunjungan resmi PT . Pindad Bandung "
}
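
Every sample response wraps its result in `status` and `data`, and string results such as the one above can end with a trailing space. A minimal Python sketch for unwrapping a reply follows. The error format is not documented here, so treating any status other than "success" as a failure is an assumption, and the name `unwrap` is hypothetical.

```python
def unwrap(response):
    """Return the "data" field, or raise if "status" is not "success"."""
    # Assumption: anything other than "success" signals a failed call.
    if response.get("status") != "success":
        raise RuntimeError("Pujangga call failed: %r" % (response,))
    return response["data"]


# Sample response copied from above.
sample = {
    "status": "success",
    "data": "Prof. Habibie kunjungan resmi PT . Pindad Bandung ",
}

# Trim the trailing space before further processing.
text = unwrap(sample).strip()
print(text)
```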

Sentence Tokenizer

Request:

POST /sentence/tokenizer

{
  "string": "Saya pergi ke (bagian kanan) rumah sakit Prof. Dr. Soerojo."
}

Response:

{
  "status": "success",
  "data": [
    "Saya",
    "pergi",
    "ke",
    "(",
    "bagian",
    "kanan",
    ")",
    "rumah",
    "sakit",
    "Prof.",
    "Dr.",
    "Soerojo",
    "."
  ]
}

Sentence Tokenizer with Composite Words

Request:

POST /sentence/tokenizer/composite

{
  "string": "Saya pergi ke (bagian kanan) rumah sakit Prof. Dr. Soerojo."
}

Response:

{
  "status": "success",
  "data": [
    "Saya",
    "pergi",
    "ke",
    "(",
    "bagian kanan",
    ")",
    "rumah sakit",
    "Prof.",
    "Dr.",
    "Soerojo",
    "."
  ]
}
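
Comparing the two tokenizer samples, the composite variant merges "bagian kanan" and "rumah sakit" into single tokens. A small Python sketch that picks out such multi-word tokens from a response (the function name `composite_words` is hypothetical):

```python
# "data" fields copied from the two sample responses above.
plain = ["Saya", "pergi", "ke", "(", "bagian", "kanan", ")",
         "rumah", "sakit", "Prof.", "Dr.", "Soerojo", "."]
composite = ["Saya", "pergi", "ke", "(", "bagian kanan", ")",
             "rumah sakit", "Prof.", "Dr.", "Soerojo", "."]


def composite_words(tokens):
    """Return the tokens that contain more than one word."""
    return [tok for tok in tokens if " " in tok]


print(composite_words(composite))
```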

Sentence Splitter

Request:

POST /sentence/splitter

{
  "string": "Michael Jeffrey Jordan dilahirkan di Brooklyn, New York, Amerika Serikat, pada 17 Februari 1963 adalah pemain bola basket profesional asal Amerika. Michael Jordan merupakan pemain terkenal di dunia dalam cabang olahraga itu. Setidaknya ia enam kali merebut kejuaraan NBA bersama kelompok Chicago Bulls (1991-1993, 1996-1998). Ia memiliki tinggi badan 198 cm dan merebut gelar pemain terbaik."
}

Response:

{
  "status": "success",
  "data": [
    "Michael Jeffrey Jordan dilahirkan di Brooklyn, New York, Amerika Serikat, pada 17 Februari 1963 adalah pemain bola basket profesional asal Amerika .",
    "Michael Jordan merupakan pemain terkenal di dunia dalam cabang olahraga itu .",
    "Setidaknya ia enam kali merebut kejuaraan NBA bersama kelompok Chicago Bulls (1991-1993, 1996-1998) .",
    "Ia memiliki tinggi badan 198 cm dan merebut gelar pemain terbaik ."
  ]
}

Word2Vec Nearest Words

Request:

POST /word2vec/nearestwords

{
  "string": "mobil",
  "n": 10
}

Response:

{
  "status": "success",
  "data": [
    "motor",
    "dikendarai",
    "sepeda",
    "truk",
    "motornya",
    "mengemudikan",
    "mobil-mobil",
    "mobilnya",
    "mengendarai",
    "pengemudi"
  ]
}

Word2Vec Arithmetic

Request:

POST /word2vec/arithmetic

{
  "first_string": "serang",
  "second_string": "malang",
  "third_string": "surabaya",
  "n": 10
}

Response:

{
  "status": "success",
  "data": [
    "serang",
    "lebak",
    "puloampel",
    "keserangan",
    "bogor",
    "waringinkurung",
    "jawilan",
    "cianjur",
    "garut",
    "padarincang"
  ]
}
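
How the three strings are combined is not stated here. Word-vector arithmetic is commonly computed as first - second + third, followed by a nearest-neighbour search by cosine similarity. The Python sketch below illustrates that convention on invented two-dimensional toy vectors; the sign convention, the vectors and the function names are all assumptions for illustration, not Pujangga's implementation.

```python
import math

# Toy vectors invented for illustration; real ones come from the model file.
vectors = {
    "raja": [1.0, 1.0],     # king
    "pria": [0.0, 1.0],     # man
    "wanita": [0.0, -1.0],  # woman
    "ratu": [1.0, -1.0],    # queen
    "mobil": [-1.0, 0.2],   # car
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


def arithmetic(first, second, third, n):
    """Rank words by similarity to first - second + third (assumed convention)."""
    target = [f - s + t for f, s, t in
              zip(vectors[first], vectors[second], vectors[third])]
    ranked = sorted(vectors, key=lambda w: cosine(vectors[w], target),
                    reverse=True)
    return ranked[:n]


print(arithmetic("raja", "pria", "wanita", 2))
```

This sketch does not filter the input words out of the result.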

Word2Vec Similarity

Request:

POST /word2vec/similarity

{
  "first_string": "sore",
  "second_string": "petang"
}

Response:

{
  "status": "success",
  "data": 0.7748607993125916
}
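
The score is presumably the cosine similarity of the two word vectors, which is the usual measure for Word2Vec models; this is an assumption, not something stated here. A self-contained Python sketch of that measure, with the hypothetical name `cosine_similarity`:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Vectors pointing the same way score 1, perpendicular vectors score 0.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))
```

Under this measure, a value such as 0.77 for "sore" and "petang" would mean the two vectors point in broadly the same direction.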

License

All files in the libs and resource directories are the property of Dr. Eng. Ayu Purwarianti, ST., MT., et al. and are not covered by the license below (Apache License, Version 2.0).

All other custom code written by Panggi Libersa Jasri Akadol is licensed under the Apache License, Version 2.0 (the "License"); you may not use this project except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
