Alternatives and detailed information of synonym-extractor

vi3k6i5 / synonym-extractor

Licence: MIT license

Extract synonyms, keywords from sentences using modified implementation of Aho Corasick algorithm

Programming Languages

python


Projects that are alternatives of or similar to synonym-extractor

Javainterview

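The most complete collection of Java technical knowledge points, plus Java source code analysis. Doing my part to contribute to open source.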

Stars: ✭ 154 (+305.26%)

Mutual labels: datastructures

Algocasts Js

DSA in JavaScript ✅

Stars: ✭ 189 (+397.37%)

Mutual labels: datastructures

Schematics

Project documentation: https://schematics.readthedocs.io/en/latest/

Stars: ✭ 2,461 (+6376.32%)

Mutual labels: datastructures

Python data structures and algorithms

Python data structures and algorithms tutorial in Chinese

Stars: ✭ 2,194 (+5673.68%)

Mutual labels: datastructures

Cosmos

Hacktoberfest 2021 | World's largest Contributor driven code dataset | Algorithms that run our universe | Your personal library of every algorithm and data structure code that you will ever encounter |

Stars: ✭ 12,936 (+33942.11%)

Mutual labels: datastructures

C Macro Collections

Easy to use, header only, macro generated, generic and type-safe Data Structures in C

Stars: ✭ 192 (+405.26%)

Mutual labels: datastructures

Competitive Programming

VastoLorde95's solutions to 2000+ competitive programming problems from various online judges

Stars: ✭ 147 (+286.84%)

Mutual labels: datastructures

wordhoard

This Python module can be used to obtain antonyms, synonyms, hypernyms, hyponyms, homophones and definitions.

Stars: ✭ 78 (+105.26%)

Mutual labels: synonyms

Interview Questions

List of all the Interview questions practiced from online resources and books

Stars: ✭ 187 (+392.11%)

Mutual labels: datastructures

Hackerranksolutions

This is a repo for HackerRankSolutions with Swift

Stars: ✭ 213 (+460.53%)

Mutual labels: datastructures

Algorithm

A repository of algorithms implemented in Go

Stars: ✭ 163 (+328.95%)

Mutual labels: datastructures

Matlab Octave

This repository contains algorithms written in MATLAB/Octave. Developing algorithms in the MATLAB environment empowers you to explore and refine ideas, and enables you to test and verify your algorithm.

Stars: ✭ 180 (+373.68%)

Mutual labels: datastructures

Competitive Programming Resources

This repository consists of data helpful for ACM ICPC programming contest, in general competitive programming.

Stars: ✭ 199 (+423.68%)

Mutual labels: datastructures

Golang Set

A simple set type for the Go language. Trusted by Docker, 1Password, Ethereum and Hashicorp.

Stars: ✭ 2,168 (+5605.26%)

Mutual labels: datastructures

Staticvec

Implements a fixed-capacity stack-allocated Vec alternative backed by an array, using const generics.

Stars: ✭ 236 (+521.05%)

Mutual labels: datastructures

Umbrella

"A collection of functional programming libraries that can be composed together. Unlike a framework, thi.ng is a suite of instruments and you (the user) must be the composer of. Geared towards versatility, not any specific type of music." — @loganpowell via Twitter

Stars: ✭ 2,186 (+5652.63%)

Mutual labels: datastructures

Data Structures And Algorithms

Data Structures and Algorithms implementation in Go

Stars: ✭ 2,272 (+5878.95%)

Mutual labels: datastructures

cracking-interview

Cracking the coding interview

Stars: ✭ 19 (-50%)

Mutual labels: datastructures

Competitive Programming Library

Templates, algorithms and data structures implemented and collected for programming contests. Check README.md for an overview.

Stars: ✭ 236 (+521.05%)

Mutual labels: datastructures

Nearestneighbors.jl

High performance nearest neighbor data structures and algorithms for Julia.

Stars: ✭ 212 (+457.89%)

Mutual labels: datastructures


This project has moved to Flash Text.

synonym-extractor

Synonym Extractor is a Python library that is loosely based on the Aho-Corasick algorithm.

The idea is to extract words that we care about from a given sentence in one pass.

Say I have a vocabulary of 10K words and I want to find every word from that set that appears in a sentence. A simple regex match is slow because it has to loop over all 10K words.

Hence we use a simpler yet much faster algorithm to get the desired result.
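A minimal sketch of the one-pass idea, assuming a character trie with word-boundary checks. This is not the library's actual code: build_trie, extract, is_word_char and the END marker are names made up for this illustration. The point is that the work per sentence depends on the sentence length, not on the size of the vocabulary.

```python
END = "_end_"  # hypothetical marker key; cannot collide with single-character keys


def is_word_char(ch):
    """Treat letters, digits and underscore as part of a word."""
    return ch.isalnum() or ch == "_"


def build_trie(synonyms):
    """Build a character trie that maps each synonym to its clean name."""
    root = {}
    for synonym, clean_name in synonyms.items():
        node = root
        for ch in synonym:
            node = node.setdefault(ch, {})
        node[END] = clean_name
    return root


def extract(sentence, trie):
    """Scan left to right and collect the clean name of each longest match."""
    found = []
    i, n = 0, len(sentence)
    while i < n:
        # Only start matching at the beginning of a word.
        if i > 0 and is_word_char(sentence[i - 1]):
            i += 1
            continue
        node, j, best = trie, i, None
        while j < n and sentence[j] in node:
            node = node[sentence[j]]
            j += 1
            # Accept a term only if it ends at a word boundary.
            if END in node and (j == n or not is_word_char(sentence[j])):
                best = (j, node[END])
        if best:
            found.append(best[1])
            i = best[0]  # continue after the matched term
        else:
            i += 1
    return found


trie = build_trie({"NY": "new york", "new-york": "new york", "SF": "san francisco"})
print(extract("I love SF and NY. new-york is the best.", trie))
```

Looking up the next character is a dictionary access, so adding more synonyms grows the trie but does not add extra passes over the sentence.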

Installation

pip install synonym-extractor

Usage

# import module
from synonym.extractor import SynonymExtractor

# Create an object of SynonymExtractor
synonym_extractor = SynonymExtractor()

# add synonyms
synonym_names = ['NY', 'new-york', 'SF']
clean_names = ['new york', 'new york', 'san francisco']

for synonym_name, clean_name in zip(synonym_names, clean_names):
    synonym_extractor.add_to_synonym(synonym_name, clean_name)

synonyms_found = synonym_extractor.get_synonyms_from_sentence('I love SF and NY. new-york is the best.')

print(synonyms_found)
# ['san francisco', 'new york', 'new york']

Algorithm

synonym-extractor is loosely based on the Aho-Corasick algorithm.
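For reference, here is a compact sketch of the textbook Aho-Corasick algorithm: a trie of patterns plus failure links, so the text is scanned once no matter how many patterns there are. It shows the classic character-level version, not synonym-extractor's modified implementation, and the function names build_automaton and find_all are chosen for this example only.

```python
from collections import deque


def build_automaton(patterns):
    """Return (goto, fail, out) tables for the given patterns."""
    goto, fail, out = [{}], [0], [[]]  # state 0 is the root
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Breadth-first pass: compute failure links level by level.
    queue = deque(goto[0].values())  # depth-1 states fail back to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence, in scan order."""
    goto, fail, out = build_automaton(patterns)
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


print(find_all("ushers", ["he", "she", "his", "hers"]))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The classic algorithm reports overlapping substring matches such as "he" inside "she". A keyword extractor usually wants whole-word, longest matches instead, which is one reason a library might only loosely follow it.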

Documentation

Documentation can be found at Read the Docs.

Why

Say you have a corpus where similar words appear frequently.

e.g. "Last weekend I was in NY." and "I am traveling to new york next weekend."

If you train a word2vec model on this corpus, or do any other sort of NLP, NY and new york will be treated as two different words.

Instead, if you create a synonym dictionary like:

e.g. NY => new york, new york => new york

Then you can extract NY and new york as the same text.
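A small sketch of what that normalization could look like before training a model, using only the standard library. The normalize function, the max_words limit and the choice to join the clean name with an underscore (so it becomes a single token) are assumptions made for this example; they are not part of synonym-extractor.

```python
import re

# Synonym dictionary from the example above.
SYNONYMS = {"NY": "new york", "new york": "new york"}


def normalize(sentence, synonyms, max_words=3):
    """Tokenize the sentence and replace known phrases with their clean name."""
    tokens = re.findall(r"[\w-]+", sentence)
    result = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase first, then shorter ones.
        for size in range(min(max_words, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + size])
            if phrase in synonyms:
                # Assumption: emit the clean name as one token for downstream models.
                result.append(synonyms[phrase].replace(" ", "_"))
                i += size
                break
        else:
            result.append(tokens[i])
            i += 1
    return result


print(normalize("Last weekend I was in NY.", SYNONYMS))
# ['Last', 'weekend', 'I', 'was', 'in', 'new_york']
print(normalize("I am traveling to new york next weekend.", SYNONYMS))
# ['I', 'am', 'traveling', 'to', 'new_york', 'next', 'weekend']
```

Both sentences now contain the same token for the city, so a model trained on the output sees one word instead of two.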

Doing the same with regex takes far longer:

Docs count     # Synonyms    Regex       synonym-extractor
1.5 million    2K            16 hours    NA
2.5 million    10K           15 days     15 mins
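The benchmark code behind these numbers is not shown here, so the following is only one plausible naive regex baseline, written to show where the cost comes from. The function extract_with_regex and its scan counter are invented for this sketch. Each document is scanned once per synonym, so the work grows with documents times synonyms.

```python
import re


def extract_with_regex(documents, synonyms):
    """Naive baseline: run one compiled pattern per synonym over every document."""
    patterns = [
        (re.compile(r"(?<!\w)" + re.escape(synonym) + r"(?!\w)"), clean_name)
        for synonym, clean_name in synonyms.items()
    ]
    results = []
    scans = 0  # number of full regex scans performed
    for doc in documents:
        found = []
        for pattern, clean_name in patterns:
            scans += 1
            found.extend(clean_name for _ in pattern.finditer(doc))
        results.append(found)
    return results, scans


docs = ["I love SF and NY.", "new-york is the best."]
synonyms = {"NY": "new york", "new-york": "new york", "SF": "san francisco"}
results, scans = extract_with_regex(docs, synonyms)
print(results)  # matches are grouped by synonym, not by position in the text
print(scans)    # 2 documents x 3 synonyms = 6 scans
```

With 2.5 million documents and 10K synonyms the same loop would perform 25 billion scans, which is consistent with the regex column being so much slower than a single pass per document.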

The idea for this library came from the following StackOverflow question.

License

The project is licensed under the MIT license.
