
Typing-Assistant

Typing Assistant provides the ability to autocomplete words and suggest predictions for the next word. This makes typing faster and more intelligent, and reduces effort.

Methodology

The implementation relies on a large corpus. The methods we used are as follows:

A. Counting words in Corpora:

Counting in NLP is based on a corpus, and NLTK (Natural Language Toolkit) provides a diverse set of corpora. For this project we use the Brown corpus, a one-million-word collection of samples from 500 written texts spanning different genres (newspaper, novels, non-fiction, etc.). For tasks such as spelling-error detection and word prediction, the location of punctuation is important, so our application counts punctuation marks as words.
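As a minimal sketch (assuming NLTK is installed and the Brown corpus has been downloaded), the counts can be obtained as follows; punctuation tokens such as "," and "." appear in brown.words() and are therefore counted like ordinary words:

    import nltk
    from nltk.corpus import brown

    nltk.download('brown')                      # fetch the corpus on first use

    words = brown.words()                       # tokens, punctuation included
    freq = nltk.FreqDist(w.lower() for w in words)

    print(len(words))                           # roughly 1.16 million tokens
    print(freq.most_common(5))                  # e.g. 'the', ',', '.', 'of', 'and'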

B. N-Gram Model:

Probabilistic models are used to compute the probability of an entire sentence or to give a probabilistic prediction of the next word in a sequence. An N-gram model does this by looking at the conditional probability of a word given the previous words.

If we consider each word occurring in its correct location as an independent event, we might represent this probability as follows:

P(w_1, w_2, ..., w_n)

We can use the chain rule of probability to decompose this probability:

P(w_1^n) = P(w_1) P(w_2 | w_1) P(w_3 | w_1^2) ... P(w_n | w_1^(n-1)) = Π_(k=1..n) P(w_k | w_1^(k-1))

C. Bigram Model:

In this model we approximate the probability of a word given all the previous words by the conditional probability given only the preceding word:

P(w_n | w_1^(n-1)) ≈ P(w_n | w_(n-1))

For a bigram grammar, then, we compute the probability of a complete string as:

P(w_1^n) ≈ Π_(k=1..n) P(w_k | w_(k-1))

To calculate this probability from the corpus, we take the count of a particular bigram and divide it by the sum of the counts of all bigrams that share the same first word, which is simply the count of that first word:

P(w_n | w_(n-1)) = C(w_(n-1) w_n) / C(w_(n-1))
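A minimal sketch of this estimate over the Brown corpus (the helper name p_bigram is ours, not from the project):

    from collections import Counter
    from nltk.corpus import brown

    words = [w.lower() for w in brown.words()]
    unigram_counts = Counter(words)
    bigram_counts = Counter(zip(words, words[1:]))

    def p_bigram(prev, word):
        """P(word | prev) = C(prev word) / C(prev)."""
        denom = unigram_counts[prev]
        return bigram_counts[(prev, word)] / denom if denom else 0.0

    print(p_bigram('tell', 'me'))               # relative frequency of "tell me"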

D. Trigram Model:

A trigram model looks just like a bigram model, except that we condition on the two previous words:

P(w_n | w_1^(n-1)) ≈ P(w_n | w_(n-2) w_(n-1))
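The corresponding trigram estimate can be sketched the same way (again with hypothetical helper names):

    from collections import Counter
    from nltk.corpus import brown
    from nltk.util import ngrams

    words = [w.lower() for w in brown.words()]
    bigram_counts = Counter(ngrams(words, 2))
    trigram_counts = Counter(ngrams(words, 3))

    def p_trigram(w1, w2, w3):
        """P(w3 | w1, w2) = C(w1 w2 w3) / C(w1 w2)."""
        denom = bigram_counts[(w1, w2)]
        return trigram_counts[(w1, w2, w3)] / denom if denom else 0.0

    print(p_trigram('tell', 'me', 'more'))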

E. Minimum Edit Distance:

The distance between two strings is a measure of how alike they are. The minimum edit distance between two strings is the minimum number of editing operations (insertion, deletion, substitution) needed to transform one string into the other.

Minimum edit distance is used in the correction of spelling mistakes or OCR errors, and approximate string matching, where the objective is to find matches for short strings in many longer texts.
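A straightforward dynamic-programming sketch with unit costs (the textbook algorithm, not necessarily the project's exact code):

    def min_edit_distance(source, target):
        """Minimum number of insertions, deletions, and substitutions
        needed to transform source into target (unit costs)."""
        m, n = len(source), len(target)
        # dp[i][j] = distance between source[:i] and target[:j]
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            dp[i][0] = i                            # delete all of source[:i]
        for j in range(n + 1):
            dp[0][j] = j                            # insert all of target[:j]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if source[i - 1] == target[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                               dp[i][j - 1] + 1,        # insertion
                               dp[i - 1][j - 1] + cost) # substitution
        return dp[m][n]

    print(min_edit_distance('intention', 'execution'))  # 5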

Implementation

Designing a keyboard interface:

The initial task was to design a keyboard interface as a web app. The keyboard layout consists of all the keys present on a physical keyboard. The interface shows the top three predictions for a given word sequence and suggests word completions.

The interface was built with HTML and CSS; its dynamic behavior was implemented with JavaScript and AJAX.

Using the bigram and trigram models to suggest predictions on the software keyboard:

To predict words for a given sequence, a bigram and a trigram module were implemented in Python. The bigram module calculates the probability of a word occurring after a given string. This is achieved by storing, for each word in the corpus, every word that can follow it, together with the count of that bigram, as key-value pairs in a hash map. The probability is then calculated by dividing this count by the number of times the given word occurs in the corpus.

Similarly, the trigram module is stored as a hash map of hash maps, holding the possible words that follow a two-word sequence along with their respective counts. Hash maps are used to achieve fast lookups.
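A sketch of this storage layout with hypothetical names (defaultdict stands in for the hash maps):

    from collections import defaultdict

    # bigram map: previous word -> {next word: count}
    bigram_map = defaultdict(lambda: defaultdict(int))
    # trigram map, a hash map of hash maps:
    # first word -> second word -> {next word: count}
    trigram_map = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

    def train(words):
        for a, b in zip(words, words[1:]):
            bigram_map[a][b] += 1
        for a, b, c in zip(words, words[1:], words[2:]):
            trigram_map[a][b][c] += 1

    def top_predictions(prev_words, k=3):
        """Top k next words for the typed sequence; falls back from the
        trigram map to the bigram map when no two-word context matches."""
        if len(prev_words) >= 2:
            candidates = trigram_map[prev_words[-2]][prev_words[-1]]
            if candidates:
                return sorted(candidates, key=candidates.get, reverse=True)[:k]
        candidates = bigram_map[prev_words[-1]]
        return sorted(candidates, key=candidates.get, reverse=True)[:k]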

Using the Minimum Edit Distance module for auto-completion:

In real-world typing, users often mistype words, and an intelligent typing assistant should be able to suggest corrections. This is implemented using the minimum edit distance between the predictions and what the user typed. A dynamic-programming routine finds the minimum number of insertions, deletions, or substitutions required to convert one word into another. However, running this module against the large number of words in the corpus proved not very time-efficient, so in the final implementation we use NLTK's edit_distance function to compute the Levenshtein distance between words. We also allow unit transposition cost, to cover cases where the user types "fisrt" instead of "first", as is common in fast typing. We take all the possible predictions after the last completed word and store them in an array, compute the distance between each prediction and what the user has typed, sort them in ascending order with a lambda key, and display the top results.
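A sketch of that ranking step; nltk's edit_distance with transpositions=True charges unit cost for adjacent swaps such as "fisrt" → "first":

    from nltk.metrics import edit_distance

    def rank_by_distance(typed, candidates, k=3):
        """Sort candidates by Levenshtein distance to the typed word."""
        return sorted(candidates,
                      key=lambda w: edit_distance(typed, w, transpositions=True))[:k]

    print(rank_by_distance('fisrt', ['first', 'fist', 'worst', 'fire']))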

Building a simple Python server in Flask:

The probabilities are calculated on the server, and the top predictions are sent to the client in response to its request. The server code is summarized below:

    from flask import Flask, render_template, jsonify, request

    app = Flask(__name__)

    @app.route('/output', methods=['GET'])
    def worker():
        # the bigram and trigram modules compute the top predictions
        # for the typed string here (implementation summarized)
        predictions = []  # placeholder: filled in by the n-gram modules
        return jsonify(predictions)

    if __name__ == "__main__":
        app.run()
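A hypothetical client call, assuming the typed text travels in a query parameter named data (the actual parameter name in the project may differ):

    import requests

    resp = requests.get('http://localhost:5000/output',
                        params={'data': 'tell me'})
    print(resp.json())                  # e.g. ["more", "me", "a"]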

Flow of the application:

The following steps are executed in the application –

  • The user types a string on the keyboard.
  • On every key press, an AJAX request is sent to the Python server, with the request parameter being the string typed so far.
  • If the space key was pressed, the server extracts the parameters from the query string, calculates the possible predictions for the string sequence using the bigram/trigram model, and returns them in the response; the client populates them in the keyboard interface.
  • If the space key was not pressed, the server returns the possible auto-completions/spelling corrections for the current word by checking which of the previously made n-gram predictions start with what the user is typing. If fewer than three such matches are found, the minimum edit distance module takes over and supplies the remaining predictions, so that three predictions are shown at all times (see the sketch after this list).
  • These predictions are converted to JSON and sent to the keyboard module.
  • The keyboard unwraps the JSON and displays the predictions above the keys.
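A hypothetical sketch of that selection step, combining prefix matching with the edit-distance fallback:

    from nltk.metrics import edit_distance

    def complete(predictions, partial, k=3):
        """Keep n-gram predictions that start with the partial word,
        then fill the remaining slots by minimum edit distance."""
        matches = [w for w in predictions if w.startswith(partial)]
        if len(matches) < k:
            rest = [w for w in predictions if not w.startswith(partial)]
            rest.sort(key=lambda w: edit_distance(partial, w, transpositions=True))
            matches += rest[:k - len(matches)]
        return matches[:k]

    print(complete(['me', 'my', 'more', 'myself'], 'm'))   # ['me', 'my', 'more']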

Results:

Auto-Complete Suggestions:

  • If we type "he", then the top three auto-complete predictions are "heard", "held" and "headed".
  • When "call m" is typed, the suggestions are populated with the most common words beginning with "m" that are "me", "my", and "myself".

Word Predictions:

  • The following screenshots show the word predictions for a word sequence starting with "tell".

The top three words that can follow "tell" are "you", "him" and "me".

Word predictions for the sequence "tell me" are "more", "me" and "a".

Top predictions for the word sequence "tell me more" are "about", ",", and "than"; punctuation marks are also predicted.

The top three words that can follow "tell me more about" are "the", "him" and "Gabriel's".

When "hi" is typed after "tell me more about", auto-complete suggestions for "hi" are populated. This shows that both next-word prediction and auto-completion are working together.

From these results we can conclude that our implementation of the N-gram model and minimum edit distance produces suggestions instantaneously, thereby increasing typing performance and reducing the user's effort.

Screenshot of the requests received by the server

Case Study:

Most modern applications that rely on n-gram models, such as machine translation systems, typically incorporate Bayesian inference. Modern statistical models usually consist of two parts: a prior distribution describing the inherent likelihood of a possible result, and a likelihood function used to assess the compatibility of a possible result with observed data. When a language model is used, it serves as part of the prior distribution: a candidate sentence s is scored against observed data d as P(s | d) ∝ P(d | s) · P(s), where the language model supplies the prior P(s).

N-grams find use in several areas of computer science, computational linguistics, and applied mathematics. They have been used for –

  • Keyboards like Gboard and SwiftKey, which adapt to the way the user types so that the user spends less time correcting typos.
  • Developing systems for orally handicapped people.