
Sarthakjain1206 / Intelligent_Document_Finder

Licence: MIT license
Document Search Engine Tool

Intelligent Document Finder 2.0



A tool that can find any of your documents using semantic search.

This is an improved version of Intelligent-Document-Finder.
List of new features:

  1. Added a document similarity script, which lets you see the documents most closely related to a given one.
  2. Revamped the website UI.
  3. Reduced the time complexity of the search functions.
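
The README does not describe how the document similarity script works internally. As a minimal sketch, assuming each document has already been turned into a fixed-length vector (for example by averaging word embeddings, which is an assumption and not confirmed by this project), related documents could be ranked by cosine similarity. The function names and the sample vectors below are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(target_id, doc_vectors, top_n=3):
    """Rank all other documents by cosine similarity to the target document."""
    target = doc_vectors[target_id]
    scored = [
        (doc_id, cosine_similarity(target, vec))
        for doc_id, vec in doc_vectors.items()
        if doc_id != target_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Hypothetical 3-dimensional document vectors, for illustration only.
doc_vectors = {
    "budget_2020": [0.9, 0.1, 0.0],
    "budget_2021": [0.8, 0.2, 0.1],
    "football_news": [0.0, 0.1, 0.9],
}
ranking = most_similar("budget_2020", doc_vectors, top_n=2)
```

Here `ranking` lists the other two documents with the second budget document first, since its vector points in nearly the same direction as the target.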

What is Intelligent Document Finder?

How easy is it to remember the exact location of a document you created last year? Not very easy, right? Large organizations and individuals deal with hundreds of documents daily and forget about most of them.
But what if you need an old document again for some work and cannot remember its name or exact content well enough to retrieve it from your computer's storage?
In such cases, an intelligent document finder can make a huge difference, because it searches for the document you need semantically, based on a query. This not only gives faster access to the document, but also helps in grouping similar documents together and analysing them.

Watch Project Demo:

Watch Demo

Note

Currently this repository uses a predefined database of news articles gathered by web scraping. Due to GitHub's restrictions on uploading large files, we cannot upload it here.

Soon we will add support for dynamic databases, so that you can use this tool on your own data to build a custom search engine.

Technologies Used

Python 3.6, JavaScript, jQuery, HTML & CSS

Database Used:

SQLite

For implementing searching:

Various NLP (Natural Language Processing) techniques are used.
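
One of the techniques credited in the References section is the BM25 ranking function. The sketch below follows the Okapi BM25 formula as given on Wikipedia. It is not the implementation this project uses (that comes from the repository by dorianbrown), and the function name, the default parameters `k1` and `b`, and the sample corpus are assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every document in a non-empty corpus against the query using Okapi BM25."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in corpus_tokens for term in set(doc))

    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue
            n_t = doc_freq[term]
            # IDF variant with "+ 1" inside the log, so it never goes negative.
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
            tf = term_freq[term]
            # Length normalisation: longer-than-average documents are penalised.
            denom = tf + k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1.0) / denom
        scores.append(score)
    return scores

# Hypothetical, pre-tokenised corpus for illustration only.
corpus = [
    "the budget report for new york".split(),
    "football match report".split(),
    "new york city council budget".split(),
]
scores = bm25_scores("york budget".split(), corpus)
```

The second document contains neither query term and scores zero. The first and third both contain each term once, so the shorter third document ranks higher because of length normalisation.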

For website:

  • Python-based Web framework : Flask
  • JavaScript
  • jQuery

Program Flow

[Image: program flow diagram]

Compatibility

  • The backend (AI part) is compatible with any machine that has Python and the required dependencies installed.
  • Recommended browsers: Mozilla Firefox and Google Chrome.

How to Install and Use?

> mkdir IntelligentDocumentFinder

> cd IntelligentDocumentFinder

> git clone https://github.com/Sarthakjain1206/Intelligent_Document_Finder_2.0.git

Install virtualenv if it is not already installed

  • On Linux/macOS: > python3 -m pip install --user virtualenv
  • On Windows: > py -m pip install --user virtualenv

Create Virtual Environment

  • On macOS and Linux: > python3 -m venv env
  • On Windows: > py -m venv env

Activate Environment:

  • On macOS and Linux: > source env/bin/activate
  • On Windows: > .\env\Scripts\activate

> pip install -r requirements.txt

Download the GloVe word embeddings from this link, decompress the archive, and copy the glove.6B.100d file into the DataBase folder.

Then run initial_file.py with this command: > python initial_file.py

Now you are good to go. Run this command every time you want to use the tool, then open the website in Chrome or Firefox:
> python src/app.py
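
The GloVe file copied during installation is plain text: each line holds a word followed by its space-separated vector components (100 of them for glove.6B.100d). As a rough sketch of how such a file can be read, the hypothetical `load_glove` helper below parses it into a dictionary. How initial_file.py actually loads the embeddings is not documented here, and the file path in the comment (including the `.txt` extension) is an assumption.

```python
import io

def load_glove(lines):
    """Parse GloVe text lines ('word v1 v2 ... vN') into a dict of word -> list of floats."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(value) for value in parts[1:]]
    return embeddings

# Tiny in-memory stand-in for the real file (which has 100 values per line).
sample = io.StringIO("the 0.1 -0.2 0.3\ncat 0.4 0.5 -0.6\n")
vectors = load_glove(sample)

# With the real file, assuming it sits at DataBase/glove.6B.100d.txt:
# with open("DataBase/glove.6B.100d.txt", encoding="utf-8") as f:
#     vectors = load_glove(f)
```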

Developers

You can get in touch with us through our LinkedIn profiles.


Sarthak Jain: Machine Learning, NLP, Web Crawling

You can also follow me on GitHub to stay updated about my latest projects.

Rishabh Mishra: Full Stack Web Developer

You can also follow me on GitHub to stay updated about my latest projects.

If you like this repository, please support it by giving it a star.

Contributions

If you find a bug or have suggestions for improving this project, feel free to open a pull request.

There are a lot of features that can be added to this tool.

  1. Query segmentation
  2. Query expansion (mainly the pseudo-relevance feedback technique)
  3. Improving the spell checker
  4. Collocations. For example, this project currently treats "New York" as ["New", "York"], i.e. two different words, but it should be treated as a single entity such as ["New_York"]. This can make a big difference in search results.
  5. Query logs (a game-changing technique for search engines)
  6. Search result segmentation (as in Lucene)
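
To make the collocation idea in item 4 concrete, here is a minimal sketch that joins known two-word collocations into a single token. It is not part of the project. The `merge_collocations` function and the hand-written collocation set are hypothetical, and a real implementation would more likely mine collocations from the corpus statistically.

```python
def merge_collocations(tokens, collocations):
    """Join adjacent token pairs listed in `collocations` into one underscore-joined token."""
    merged = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in collocations:
            merged.append("_".join(pair))
            i += 2  # skip both words of the collocation
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# Hypothetical, hand-written collocation set for illustration only.
collocations = {("New", "York"), ("United", "States")}
tokens = ["Flights", "from", "New", "York", "to", "Boston"]
result = merge_collocations(tokens, collocations)
# result == ["Flights", "from", "New_York", "to", "Boston"]
```

Applying the same merge to both documents and queries before indexing would let "New York" match as one term instead of two unrelated words.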

If you have experience implementing any of these features, please contribute.

References

  1. An excellent article on the BM25 ranking algorithm on Wikipedia: Okapi BM25

  2. Read this article on Topic Modeling

  3. Closely followed this article on SVO tagging to generate tags for this project.

  4. Used the BM25 ranking function implementation from this great repository on GitHub by dorianbrown.
