agrawal-rohit / stackoverflow-semantic-search

Licence: other

A search engine for Stack Overflow questions based on Word2Vec encodings

Projects that are alternatives of or similar to stackoverflow-semantic-search

🔍 Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.

Stars: ✭ 3,409 (+14721.74%)

Mutual labels: search-engine, semantic-search

Vectorsinsearch

Dice.com repo to accompany the dice.com 'Vectors in Search' talk by Simon Hughes, from the Activate 2018 search conference, and the 'Searching with Vectors' talk from Haystack 2019 (US). Builds upon my conceptual search and semantic search work from 2015

Stars: ✭ 71 (+208.7%)

Mutual labels: search-engine, word2vec

revery

A personal semantic search engine capable of surfacing relevant bookmarks, journal entries, notes, blogs, contacts, and more, built on an efficient document embedding algorithm and Monocle's personal search index.

Stars: ✭ 200 (+769.57%)

Mutual labels: search-engine, word2vec

LegalQA

Korean LegalQA using SentenceKoBART

Stars: ✭ 77 (+234.78%)

Mutual labels: search-engine, semantic-search

solr

Apache Solr open-source search software

Stars: ✭ 651 (+2730.43%)

Mutual labels: search-engine

two-stream-cnn

A two-stream convolutional neural network for learning arbitrary similarity functions over two sets of training data

Stars: ✭ 24 (+4.35%)

Mutual labels: word2vec

SmartImage

Reverse image search tool (SauceNao, ImgOps, trace.moe, and more)

Stars: ✭ 346 (+1404.35%)

Mutual labels: search-engine

api.rss.ui

Simple search interface around FeediRSS API.

Stars: ✭ 52 (+126.09%)

Mutual labels: search-engine

milli

Search engine library for Meilisearch ⚡️

Stars: ✭ 433 (+1782.61%)

Mutual labels: search-engine

Word-Embeddings-and-Document-Vectors

An evaluation of word-embeddings for classification

Stars: ✭ 32 (+39.13%)

Mutual labels: word2vec

word2vec-movies

Bag of words meets bags of popcorn in Python 3 中文教程

Stars: ✭ 54 (+134.78%)

Mutual labels: word2vec

fastHistory

A python tool connected to your terminal to store important commands, search them in a fast way and automatically paste them into your terminal

Stars: ✭ 24 (+4.35%)

Mutual labels: search-engine

mudrod

Mining and Utilizing Dataset Relevancy from Oceanographic Datasets to Improve Data Discovery and Access, online demo: https://mudrod.jpl.nasa.gov/#/

Stars: ✭ 15 (-34.78%)

Mutual labels: search-engine

bulksearch

Lightweight and read-write optimized full text search library.

Stars: ✭ 108 (+369.57%)

Mutual labels: search-engine

flipper

Search/Recommendation engine and metainformation server for fanfiction net

Stars: ✭ 29 (+26.09%)

Mutual labels: search-engine

Recommendation-based-on-sequence-

Recommendation based on sequence

Stars: ✭ 23 (+0%)

Mutual labels: word2vec

gsc-logger

Google Search Console Logger for Google App Engine

Stars: ✭ 38 (+65.22%)

Mutual labels: search-engine

doc2vec-api

document embedding and machine learning script for beginners

Stars: ✭ 92 (+300%)

Mutual labels: word2vec

hyperstar

Hyperstar: Negative Sampling Improves Hypernymy Extraction Based on Projection Learning.

Stars: ✭ 24 (+4.35%)

Mutual labels: word2vec

GE-FSG

Graph Embedding via Frequent Subgraphs

Stars: ✭ 39 (+69.57%)

Mutual labels: word2vec

Semantic Search for Stackoverflow

Problem Statement

Stack Overflow is one of the largest learning resources for programmers. Users post questions, and their peers try to provide solutions in the most helpful manner possible. The better an answer, the more votes it gets, which also increases the answerer's reputation.

However, this huge amount of information makes it difficult to find the solution you are looking for. This is not much of an issue for domain experts and other experienced professionals, because they know the right keywords to get an appropriate answer. For a new programmer, however, it is a serious obstacle. For instance, someone who needs to learn how to make a server using Python is quite unlikely to type the terms Django or Flask in the search box. This can discourage such users from using the platform.

Proposed Solution

The Application Architecture

The Brain

What we want is for the platform to actually understand the semantics of what the user is searching for, and then return the most helpful results. Natural Language Processing (NLP) has come a long way since its inception in the 20th century, so we decided to use this subfield of Artificial Intelligence to solve our problem. NLP has worked very well in the past few years thanks to the development of fast processors, GPUs and sophisticated model architectures.
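As a rough illustration of what embedding-based search means, the sketch below averages word vectors into a single vector per text and ranks stored question titles by cosine similarity to the query. The tiny 3-dimensional vectors, the sample titles, and the helper names (`embed`, `cosine`, `search`) are all made up for this example. A real system would load vectors trained with Word2Vec, and this is not necessarily how the project's own model is built.

```python
import math

# Hypothetical toy word vectors; real ones would come from a trained Word2Vec model.
WORD_VECTORS = {
    "server": [0.9, 0.1, 0.0],
    "flask":  [0.8, 0.2, 0.1],
    "web":    [0.7, 0.3, 0.0],
    "list":   [0.0, 0.9, 0.2],
    "sort":   [0.1, 0.8, 0.3],
    "python": [0.4, 0.4, 0.4],
}

def embed(text):
    """Average the vectors of the known words in `text` (zero vector if none)."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query, titles):
    """Return `titles` ordered from most to least similar to `query`."""
    q = embed(query)
    return sorted(titles, key=lambda t: cosine(q, embed(t)), reverse=True)

# Made-up question titles standing in for the indexed data.
TITLES = ["flask web app tutorial", "sort a list in python"]
print(search("python server", TITLES)[0])  # prints: flask web app tutorial
```

The query never mentions Flask, yet the Flask title ranks first because "server" sits close to "flask" and "web" in the toy vector space. That is the behaviour a keyword match cannot provide.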

How to Install

Clone the repository using git clone https://github.com/agrawal-rohit/stackoverflow-semantic-search.git
To run the cells in the Jupyter notebooks, you need to have jupyter-notebook installed in your Python environment. This is optional, because the outputs have already been saved and included.
Enter the folder flask server using cd stacksearch webapp/flask server/ and run pip install -r requirements.txt from your Python environment to install the required libraries.
The server can now be started from inside the flask server folder by running python app.py. The server should be up and running on http://127.0.0.1:5000/
Since the web interface is written in ReactJS, you need to install npm, which ships with Node.js.
Enter the react frontend folder using cd stacksearch webapp/react frontend/
Install the required modules using npm install
Finally, start the web interface by running npm start. The web interface should be up and running on http://localhost:3000/
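Once both processes are started, a quick way to confirm they are reachable is a small script like the one below. It is only a convenience sketch: the two URLs are the ones given in the steps above, while the `SERVICES` mapping and the `is_up` helper are names invented here and are not part of the repository.

```python
import urllib.error
import urllib.request

# URLs taken from the installation steps; the labels are just for display.
SERVICES = {
    "flask server": "http://127.0.0.1:5000/",
    "react frontend": "http://localhost:3000/",
}

def is_up(url, timeout=2.0):
    """Return True if anything answers HTTP at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except OSError:
        # Connection refused, timeout, DNS failure and similar errors.
        return False

if __name__ == "__main__":
    for name, url in SERVICES.items():
        status = "up" if is_up(url) else "not reachable"
        print(f"{name}: {status} ({url})")
```

If the Flask server reports "not reachable", check that python app.py is still running in its own terminal before starting the frontend.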

Limitations and Future improvement

Given the vast amount of data available on Stack Overflow, I applied a few constraints for the proof of concept:

I have restricted the data to Python-related questions only
I have restricted the possible tags to 500
I have used a relatively small number of data points (~140,000) for faster processing
Since this project is mostly a proof of concept, the web interface makes consecutive API calls to the server. This is not optimal for a production environment, and has only been added for visual aesthetics.
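The first two constraints could be applied with a filter along the following lines. This is a sketch under assumptions: the record layout (`title`, `tags`), the function name `restrict_dataset`, and the choice to keep the most frequent tags are all hypothetical, since the text above only states that the data was limited to Python questions and 500 tags.

```python
from collections import Counter

def restrict_dataset(questions, required_tag="python", max_tags=500):
    """Keep questions carrying `required_tag`, then drop every tag that is
    not among the `max_tags` most common tags of that subset."""
    subset = [q for q in questions if required_tag in q["tags"]]
    counts = Counter(tag for q in subset for tag in q["tags"])
    allowed = {tag for tag, _ in counts.most_common(max_tags)}
    return [
        {"title": q["title"], "tags": [t for t in q["tags"] if t in allowed]}
        for q in subset
    ]

# Made-up sample records, only to show the shape of the input and output.
sample = [
    {"title": "Flask route returns 404", "tags": ["python", "flask"]},
    {"title": "Sort a dict by value", "tags": ["python", "dictionary"]},
    {"title": "Flask and SQLAlchemy setup", "tags": ["python", "flask", "sqlalchemy"]},
    {"title": "Center a div", "tags": ["css"]},
]
result = restrict_dataset(sample, max_tags=2)
for question in result:
    print(question["title"], question["tags"])
```

With `max_tags=2` the sample keeps only the `python` and `flask` tags and drops the CSS question entirely. The same call with the default of 500 mirrors the tag limit described above.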

Further improvements may include:

Experiment to solve the problem using Topic Modelling or other sophisticated NLP tasks
Consider a larger number of data points
Experiment with different architectures for the final classification network

Design Guide
