
agrawal-rohit / stackoverflow-semantic-search

Licence: other
Word2Vec encodings based search engine for Stackoverflow questions

Programming Languages

Jupyter Notebook
11667 projects

Projects that are alternatives of or similar to stackoverflow-semantic-search

Haystack
🔍 Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.
Stars: ✭ 3,409 (+14721.74%)
Mutual labels:  search-engine, semantic-search
Vectorsinsearch
Dice.com repo to accompany the dice.com 'Vectors in Search' talk by Simon Hughes, from the Activate 2018 search conference, and the 'Searching with Vectors' talk from Haystack 2019 (US). Builds upon my conceptual search and semantic search work from 2015
Stars: ✭ 71 (+208.7%)
Mutual labels:  search-engine, word2vec
revery
A personal semantic search engine capable of surfacing relevant bookmarks, journal entries, notes, blogs, contacts, and more, built on an efficient document embedding algorithm and Monocle's personal search index.
Stars: ✭ 200 (+769.57%)
Mutual labels:  search-engine, word2vec
LegalQA
Korean LegalQA using SentenceKoBART
Stars: ✭ 77 (+234.78%)
Mutual labels:  search-engine, semantic-search
solr
Apache Solr open-source search software
Stars: ✭ 651 (+2730.43%)
Mutual labels:  search-engine
two-stream-cnn
A two-stream convolutional neural network for learning arbitrary similarity functions over two sets of training data
Stars: ✭ 24 (+4.35%)
Mutual labels:  word2vec
SmartImage
Reverse image search tool (SauceNao, ImgOps, trace.moe, and more)
Stars: ✭ 346 (+1404.35%)
Mutual labels:  search-engine
api.rss.ui
Simple search interface around FeediRSS API.
Stars: ✭ 52 (+126.09%)
Mutual labels:  search-engine
milli
Search engine library for Meilisearch ⚡️
Stars: ✭ 433 (+1782.61%)
Mutual labels:  search-engine
Word-Embeddings-and-Document-Vectors
An evaluation of word-embeddings for classification
Stars: ✭ 32 (+39.13%)
Mutual labels:  word2vec
word2vec-movies
Bag of words meets bags of popcorn in Python 3 (Chinese tutorial)
Stars: ✭ 54 (+134.78%)
Mutual labels:  word2vec
fastHistory
A python tool connected to your terminal to store important commands, search them in a fast way and automatically paste them into your terminal
Stars: ✭ 24 (+4.35%)
Mutual labels:  search-engine
mudrod
Mining and Utilizing Dataset Relevancy from Oceanographic Datasets to Improve Data Discovery and Access, online demo: https://mudrod.jpl.nasa.gov/#/
Stars: ✭ 15 (-34.78%)
Mutual labels:  search-engine
bulksearch
Lightweight and read-write optimized full text search library.
Stars: ✭ 108 (+369.57%)
Mutual labels:  search-engine
flipper
Search/Recommendation engine and metainformation server for fanfiction net
Stars: ✭ 29 (+26.09%)
Mutual labels:  search-engine
Recommendation-based-on-sequence-
Recommendation based on sequence
Stars: ✭ 23 (+0%)
Mutual labels:  word2vec
gsc-logger
Google Search Console Logger for Google App Engine
Stars: ✭ 38 (+65.22%)
Mutual labels:  search-engine
doc2vec-api
document embedding and machine learning script for beginners
Stars: ✭ 92 (+300%)
Mutual labels:  word2vec
hyperstar
Hyperstar: Negative Sampling Improves Hypernymy Extraction Based on Projection Learning.
Stars: ✭ 24 (+4.35%)
Mutual labels:  word2vec
GE-FSG
Graph Embedding via Frequent Subgraphs
Stars: ✭ 39 (+69.57%)
Mutual labels:  word2vec

Semantic Search for Stackoverflow

Problem Statement

Stack Overflow is one of the largest learning resources for programmers. Users post questions, and their peers try to provide the most helpful solutions they can. The better an answer, the more votes it receives, which in turn raises the answerer's reputation.

However, the sheer volume of information makes it difficult to find the solution you are looking for. This is less of an issue for domain experts and experienced professionals, who know the right keywords to get an appropriate answer. For a new programmer, though, it is a real obstacle. For instance, someone who needs to learn how to build a server in Python is unlikely to type Django or Flask into the search box. This can discourage newcomers from using the platform.

Proposed Solution

The Application Architecture

App Flow

The Brain

What we want is for the platform to understand the semantics of what the user is searching for and then return the most helpful results. Natural Language Processing (NLP) has come a long way since its inception in the 20th century, so we decided to use this subfield of Artificial Intelligence to solve our problem. NLP has worked very well in the past few years thanks to fast processors, GPUs, and sophisticated model architectures.
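
To make the idea concrete, here is a minimal sketch of embedding-based retrieval: each question title is represented by the average of its word vectors, and titles are ranked by cosine similarity to the query. The hand-written `TOY_VECTORS` table, the sample titles, and the function names are illustrative assumptions, not the project's actual model or code. A real system would load vectors trained with Word2Vec instead.

```python
import math

# Hypothetical 3-dimensional "embeddings", hand-written for illustration only.
# A real system would load vectors learned by a Word2Vec model.
TOY_VECTORS = {
    "server": [0.9, 0.1, 0.0],
    "flask": [0.8, 0.2, 0.1],
    "django": [0.85, 0.15, 0.05],
    "web": [0.7, 0.3, 0.0],
    "plot": [0.0, 0.1, 0.9],
    "graph": [0.1, 0.0, 0.8],
    "matplotlib": [0.05, 0.1, 0.95],
}


def embed(text):
    """Average the vectors of all known words; return None if no word is known."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, titles, top_k=2):
    """Return up to top_k titles, most similar to the query first."""
    q = embed(query)
    if q is None:
        return []
    scored = []
    for title in titles:
        v = embed(title)
        if v is not None:
            scored.append((cosine(q, v), title))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:top_k]]


TITLES = [
    "flask web app",
    "django web tutorial",
    "matplotlib plot graph",
]

# The query never mentions Flask or Django, yet those titles rank highest
# because their vectors lie close to the vector for "server".
print(search("server", TITLES))
```

The point of the sketch is that the query and the matching titles share no keywords; the match comes entirely from proximity in the vector space.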

How to Install

  1. Clone the repository using git clone https://github.com/agrawal-rohit/stackoverflow-semantic-search.git
  2. To run the cells in the Jupyter notebooks, you need to have jupyter-notebook installed in your Python environment. This step is optional, because the outputs have already been saved and included.
  3. Enter the flask server folder using cd stacksearch webapp/flask server/ and run pip install -r requirements.txt from your Python environment to install the required libraries.
  4. From inside the flask server folder, start the server by running python app.py. The server should be up and running on http://127.0.0.1:5000/
  5. Since the web interface has been written in ReactJS, you need to install npm. You can do so from this link
  6. Enter the react frontend folder using cd stacksearch webapp/react frontend/
  7. Install the required modules using npm install
  8. Finally, you can start the web interface by running npm start. The web interface should be up and running on http://localhost:3000/
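
Once both processes are started, a quick way to confirm that they respond is a small standard-library check like the one below. This is only a sketch: the helper name `is_up` is hypothetical, the two URLs are the defaults quoted in the steps above, and any configured proxy is bypassed on the assumption that both services run locally.

```python
import urllib.error
import urllib.request

# Build an opener that ignores proxy settings, since both services are local.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_up(url, timeout=2.0):
    """Return True if an HTTP server answers at url, even with an error status."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied (for example with 404), so it is running.
        return True
    except OSError:
        # Connection refused, timeout, DNS failure, and similar.
        return False


if __name__ == "__main__":
    services = [
        ("Flask server", "http://127.0.0.1:5000/"),
        ("React frontend", "http://localhost:3000/"),
    ]
    for name, url in services:
        status = "up" if is_up(url) else "not reachable"
        print(f"{name}: {status} ({url})")
```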

Limitations and Future Improvements

Given the vast amount of data on Stack Overflow, I imposed a few constraints for the proof of concept:

  1. I have restricted the data to Python-related questions only
  2. I have restricted the number of possible tags to 500
  3. I have used a relatively small number of data points (~140,000) for faster processing
  4. Since this project is mostly a proof of concept, the web interface makes consecutive API calls to the server. This is not optimal for a production environment and has only been added for visual effect.
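
As an illustration of the tag restriction, the sketch below keeps only the most frequent tags and drops questions that are left with none. The list-of-dicts layout, the field names `title` and `tags`, and the helper `keep_top_tags` are assumptions made for this example and are not taken from the project's notebooks. The README's limit is 500 tags; the sample uses 2 so the effect is visible.

```python
from collections import Counter


def keep_top_tags(questions, max_tags):
    """Keep only the max_tags most frequent tags; drop questions left with none."""
    counts = Counter(tag for q in questions for tag in q["tags"])
    allowed = {tag for tag, _ in counts.most_common(max_tags)}
    filtered = []
    for q in questions:
        tags = [t for t in q["tags"] if t in allowed]
        if tags:
            filtered.append({"title": q["title"], "tags": tags})
    return filtered


# Hypothetical sample data, for illustration only.
SAMPLE = [
    {"title": "How to start a Flask server?", "tags": ["python", "flask"]},
    {"title": "Django model fields", "tags": ["python", "django"]},
    {"title": "Flask routing basics", "tags": ["python", "flask"]},
    {"title": "Obscure library question", "tags": ["rare-lib"]},
]

for question in keep_top_tags(SAMPLE, max_tags=2):
    print(question)
```

Capping the tag vocabulary this way keeps the label space small, at the cost of discarding questions that only carry rare tags.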

Further improvements may include:

  • Experiment with Topic Modelling or other sophisticated NLP techniques to solve the problem
  • Use a larger number of data points
  • Experiment with different architectures for the final classification network

Design Guide

