jiny2001 / Cvpr_paper_search_tool

Licence: bsd-3-clause

Automatic paper clustering and search tool built on fastText from Facebook Research

Programming Languages

python

Labels

nlp scikit-learn fasttext jinja2

Paper2Vec - Automatic document clustering and search tool

Automatic document clustering and search tools for ICCV / CVPR papers, using fastText from Facebook Research

ICCV2017 Paper Search Tool >>

CVPR2018 Paper Search Tool >>

CVPR2019 Paper Search Tool >>

Features

Build a word/phrase dataset from a text corpus
Train word embedding vectors with fastText
Build document vectors and reduce their dimensionality
Cluster and visualize the documents
Search documents by keywords
Find documents similar to a given document

Steps

0. Scraping

Scrape paper info (title, abstract and PDF) from the CVPR open access repository, then extract the words to build a corpus. Scraping HTML/PDF is a little off-topic for this sample, so I removed the scraping code from this repository.

After extracting the text from each PDF, we pre-process the data as follows.

Remove "-" followed by "\n" to rejoin words split across a line break.
Then replace each remaining "-" with " ".
Replace all other non-letter characters with " ".
Convert all uppercase letters to lowercase.
Remove one-character words, special words such as "http", "https" and "ftp", and URLs.
Remove the, an, in, on, and, of, to, is, for, we, with, as, that, are, by, our, this, from, be, ca, at, us, it, has, have, been, do, does, these, those, and "et al".
Replace common plural nouns with their singular form.
Remove people's names.
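
The cleaning rules above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function name `preprocess` is hypothetical, "non-letter" is assumed to mean anything outside ASCII letters and whitespace, and the plural-to-singular and name-removal rules are left out because they need word lists the README does not show.

```python
import re

# Stop words copied from the list above; "et al" is handled separately.
STOP_WORDS = {
    "the", "an", "in", "on", "and", "of", "to", "is", "for", "we", "with",
    "as", "that", "are", "by", "our", "this", "from", "be", "ca", "at",
    "us", "it", "has", "have", "been", "do", "does", "these", "those",
}


def preprocess(text):
    """Apply the cleaning rules in roughly the order listed above (hypothetical helper)."""
    text = text.replace("-\n", "")                      # rejoin words split across a line break
    text = re.sub(r"(?:https?|ftp)://\S+", " ", text)   # drop URLs before punctuation is stripped
    text = text.replace("-", " ")                       # remaining hyphens become spaces
    text = re.sub(r"[^A-Za-z\s]", " ", text)            # assumption: keep ASCII letters only
    text = text.lower()
    text = re.sub(r"\bet al\b", " ", text)              # drop "et al"
    words = [w for w in text.split() if len(w) > 1 and w not in STOP_WORDS]
    return " ".join(words)


sample = "Convolu-\ntional networks for Super-Resolution, see http://example.com (Smith et al. 2017)"
print(preprocess(sample))  # → convolutional networks super resolution see smith
```

URLs are removed before punctuation is stripped, since otherwise the "://" would already be gone and the URL fragments would survive as ordinary words.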

The resulting input corpus is placed under raw_data.

1. Count all word occurrences and merge multiple corpus files into one input corpus file.

Build a Paper2Vec instance and load multiple corpus files. Rare words are replaced with an UNK token to keep the dictionary at a suitable size.

p2v = Paper2Vec(data_dir=args.data_dir, word_dim=args.word_dim)
p2v.add_dictionary_from_file('CVPR2016/corpus.txt')
p2v.add_dictionary_from_file('CVPR2017/corpus.txt')
p2v.add_dictionary_from_file('CVPR2018/corpus.txt')
p2v.build_dictionary(args.max_dictionary_words)
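
To make the UNK replacement concrete, here is a small standalone sketch. It is not the Paper2Vec implementation: `build_dictionary` and `replace_rare_words` below are hypothetical free functions, and it assumes "rare" means "not among the `max_words` most frequent words".

```python
from collections import Counter

UNK = "UNK"  # placeholder token for rare words


def build_dictionary(token_lists, max_words):
    """Keep the max_words most frequent words across all corpora."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    return {word for word, _ in counts.most_common(max_words)}


def replace_rare_words(tokens, dictionary):
    """Map every word outside the dictionary to the UNK token."""
    return [tok if tok in dictionary else UNK for tok in tokens]


corpora = [
    "deep learning image deep network".split(),
    "deep image segmentation".split(),
]
vocab = build_dictionary(corpora, max_words=2)
print(replace_rare_words(corpora[1], vocab))  # → ['deep', 'image', 'UNK']
```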

2. Detect phrases by their frequency, then rebuild the corpus.

Count occurrences of word sequences and join frequent sequences with "_". For example, "deep learning" becomes the single word "deep_learning".

p2v.detect_phrases(args.phrase_threshold)

p2v.create_corpus_with_phrases('corpus.txt')
p2v.convert_text_with_phrases('CVPR2018/abstract.txt', 'abstract.txt')
# copyfile is the standard-library helper: from shutil import copyfile
copyfile(args.data_dir + '/CVPR2018/paper_info.txt', args.data_dir + '/paper_info.txt')
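
The idea behind phrase detection can be sketched for two-word phrases as below. This is an illustration under assumptions: the function names are hypothetical, only adjacent pairs are considered, and the threshold is treated as a raw occurrence count, which may not match what `phrase_threshold` means in the repository.

```python
from collections import Counter


def detect_phrases(tokens, threshold):
    """Return the set of adjacent word pairs seen at least `threshold` times."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    return {pair for pair, n in pair_counts.items() if n >= threshold}


def join_phrases(tokens, phrases):
    """Rewrite the token list, merging each detected pair into one token."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2  # skip the second word of the merged pair
        else:
            out.append(tokens[i])
            i += 1
    return out


tokens = "deep learning beats shallow models but deep learning needs data".split()
phrases = detect_phrases(tokens, threshold=2)
print(join_phrases(tokens, phrases))
# → ['deep_learning', 'beats', 'shallow', 'models', 'but', 'deep_learning', 'needs', 'data']
```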

3. Train word representations with fastText.

You can train with either "skipgram" or "cbow" in fastText. The default vector dimension is 75. You can also find similar words by calling get_most_similar_words().

p2v.train_words_model('corpus.txt', 'fasttext_model', model=args.train_model, min_count=args.min_count)

print('[deep_learning]', p2v.get_most_similar_words('deep_learning', 12))
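
A similar-word lookup of this kind is commonly done by ranking words by cosine similarity between their vectors. The sketch below shows that idea on hand-made toy vectors; it is an assumption about the general technique, not a description of how get_most_similar_words() or fastText computes its result, and `most_similar_words` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar_words(word, vectors, count):
    """Return up to `count` (word, similarity) pairs, most similar first."""
    target = vectors[word]
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:count]


# Toy 3-dimensional vectors, made up for illustration only.
toy_vectors = {
    "deep_learning": [0.9, 0.1, 0.0],
    "cnn": [0.8, 0.2, 0.1],
    "tracking": [0.0, 0.1, 0.9],
}
print(most_similar_words("deep_learning", toy_vectors, 2))
```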

4. Build paper representation vectors with fastText.

Calculate the mean of the word vectors in each paper's abstract and title, and use it as that paper's representation vector.

p2v.build_paper_vectors()
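
The averaging step can be sketched in a few lines. The helper `mean_vector` is hypothetical, and skipping words without a vector is an assumption made for this toy example (a real fastText model can also produce vectors for unseen words from subword information).

```python
def mean_vector(words, word_vectors):
    """Average the vectors of the words that have one; skip unknown words."""
    known = [word_vectors[w] for w in words if w in word_vectors]
    if not known:
        return None  # no word in this text has a vector
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Toy 2-dimensional vectors, made up for illustration only.
toy_vectors = {"image": [1.0, 0.0], "super_resolution": [0.0, 1.0]}
paper_words = "image super_resolution unseen_word".split()
print(mean_vector(paper_words, toy_vectors))  # → [0.5, 0.5]
```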

5. Reduce dimensions and then apply k-means clustering.

Reduce the 75-dimensional paper vectors to 2 dimensions with t-SNE.
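
For readers unfamiliar with k-means, the clustering half of this step can be sketched in plain Python as below (t-SNE is not reproduced here). This is a toy version of Lloyd's algorithm with a deliberately simple, deterministic initialisation; the labels listed for this project suggest scikit-learn is what the repository actually uses, so treat the function below as a hypothetical stand-in.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iterations=10):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centroids


# Four made-up 2-D points forming two obvious groups.
points = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9)]
labels, centroids = kmeans(points, k=2)
print(labels)  # → [0, 1, 0, 1]
```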

After clustering the papers, pick the words used most frequently in each cluster's titles and abstracts.

cluster[0] keywords: [question, over, representations, visual, vqa, scene, structure answering]

cluster[1] keywords: [feature, room, physics, optimization, semantic, transfer, layout, estimation]

cluster[2] keywords: [binary, local, lbc, convolutional_layer, cnn, linear, weights, savings, sparse, learnable]

cluster[3] keywords: [facial, wild, three_dimensional, texture, captured, morphable, new, datasets]

cluster[4] keywords: [quality, light, fields, light_field, dense, metrics, compression, reference, distorted]

cluster[5] keywords: [tracking, position, virtual, reality, commodity, vr, experience, infrastructure, infrared]

cluster[6] keywords: [tof, material, distortion, classification, depth, time, frequency, flight]
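
Keyword lists like the ones above can be produced by counting words per cluster. The sketch below assumes plain frequency counting; the repository may weight or filter words differently, and `cluster_keywords` is a hypothetical name.

```python
from collections import Counter


def cluster_keywords(docs, labels, top_n):
    """For each cluster id, return its top_n most frequent words."""
    counters = {}
    for words, label in zip(docs, labels):
        counters.setdefault(label, Counter()).update(words)
    return {label: [w for w, _ in counter.most_common(top_n)]
            for label, counter in counters.items()}


# Made-up tokenised titles with made-up cluster labels.
docs = [
    "visual question answering vqa".split(),
    "vqa scene structure".split(),
    "light field light compression".split(),
]
print(cluster_keywords(docs, labels=[0, 0, 1], top_n=1))  # → {0: ['vqa'], 1: ['light']}
```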

How to use

Find similar papers

python find_paper_by_paper.py "Hyperspectral Image Super-Resolution via Non-Local Sparse Tensor Factorization" --c 5

Output:
Loaded 783 papers info.

Target: [ Hyperspectral Image Super-Resolution via Non-Local Sparse Tensor Factorization ]

5 Papers found ---
Score:4.10, [ Hyper-Laplacian Regularized Unidirectional Low-Rank Tensor Recovery for Multispectral Image Denoising ]
Abstract URL:http://openaccess.thecvf.com/content_cvpr_2017/html/Chang_Hyper-Laplacian_Regularized_Unidirectional_CVPR_2017_paper.html
PDF URL:http://openaccess.thecvf.com/content_cvpr_2017/papers/Chang_Hyper-Laplacian_Regularized_Unidirectional_CVPR_2017_paper.pdf

Score:5.33, [ A Non-Local Low-Rank Framework for Ultrasound Speckle Reduction ]
Abstract URL:http://openaccess.thecvf.com/content_cvpr_2017/html/Zhu_A_Non-Local_Low-Rank_CVPR_2017_paper.html
PDF URL:http://openaccess.thecvf.com/content_cvpr_2017/papers/Zhu_A_Non-Local_Low-Rank_CVPR_2017_paper.pdf

Score:6.36, [ Nonnegative Matrix Underapproximation for Robust Multiple Model Fitting ]
Abstract URL:http://openaccess.thecvf.com/content_cvpr_2017/html/Tepper_Nonnegative_Matrix_Underapproximation_CVPR_2017_paper.html
PDF URL:http://openaccess.thecvf.com/content_cvpr_2017/papers/Tepper_Nonnegative_Matrix_Underapproximation_CVPR_2017_paper.pdf

Score:6.76, [ Fractal Dimension Invariant Filtering and Its CNN-Based Implementation ]
Abstract URL:http://openaccess.thecvf.com/content_cvpr_2017/html/Xu_Fractal_Dimension_Invariant_CVPR_2017_paper.html
PDF URL:http://openaccess.thecvf.com/content_cvpr_2017/papers/Xu_Fractal_Dimension_Invariant_CVPR_2017_paper.pdf

Score:7.64, [ On the Global Geometry of Sphere-Constrained Sparse Blind Deconvolution ]
Abstract URL:http://openaccess.thecvf.com/content_cvpr_2017/html/Zhang_On_the_Global_CVPR_2017_paper.html
PDF URL:http://openaccess.thecvf.com/content_cvpr_2017/papers/Zhang_On_the_Global_CVPR_2017_paper.pdf
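
The scores in this output grow as the match gets weaker, which is consistent with a distance between paper vectors. The README does not say which metric is used, so the sketch below assumes Euclidean distance and uses hypothetical names and made-up vectors.

```python
import math


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def find_similar_papers(title, paper_vectors, count):
    """Return the `count` papers closest to `title`, nearest first."""
    target = paper_vectors[title]
    scored = [(distance(target, vec), other)
              for other, vec in paper_vectors.items() if other != title]
    return sorted(scored)[:count]


# Toy 2-dimensional paper vectors, made up for illustration only.
toy_papers = {
    "Paper A": [0.0, 0.0],
    "Paper B": [3.0, 4.0],
    "Paper C": [1.0, 0.0],
}
for score, other in find_similar_papers("Paper A", toy_papers, count=2):
    print(f"Score:{score:.2f}, [ {other} ]")
```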

Find papers by keywords

python find_paper_by_words.py super resolution -c 5

Loaded 783 papers info.

Keyword(s): ['super', 'resolution']

5 Papers found ---
Score:14, [ Hyperspectral Image Super-Resolution via Non-Local Sparse Tensor Factorization ]
Abstract URL:http://openaccess.thecvf.com/content_cvpr_2017/html/Dian_Hyperspectral_Image_Super-Resolution_CVPR_2017_paper.html
PDF URL:http://openaccess.thecvf.com/content_cvpr_2017/papers/Dian_Hyperspectral_Image_Super-Resolution_CVPR_2017_paper.pdf

Score:12, [ Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network ]
Abstract URL:http://openaccess.thecvf.com/content_cvpr_2017/html/Ledig_Photo-Realistic_Single_Image_CVPR_2017_paper.html
PDF URL:http://openaccess.thecvf.com/content_cvpr_2017/papers/Ledig_Photo-Realistic_Single_Image_CVPR_2017_paper.pdf

Score:9, [ Real-Time Video Super-Resolution With Spatio-Temporal Networks and Motion Compensation ]
Abstract URL:http://openaccess.thecvf.com/content_cvpr_2017/html/Caballero_Real-Time_Video_Super-Resolution_CVPR_2017_paper.html
PDF URL:http://openaccess.thecvf.com/content_cvpr_2017/papers/Caballero_Real-Time_Video_Super-Resolution_CVPR_2017_paper.pdf

Score:8, [ Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution ]
Abstract URL:http://openaccess.thecvf.com/content_cvpr_2017/html/Lai_Deep_Laplacian_Pyramid_CVPR_2017_paper.html
PDF URL:http://openaccess.thecvf.com/content_cvpr_2017/papers/Lai_Deep_Laplacian_Pyramid_CVPR_2017_paper.pdf

Score:7, [ Simultaneous Super-Resolution and Cross-Modality Synthesis of 3D Medical Images Using Weakly-Supervised Joint Convolutional Sparse Coding ]
Abstract URL:http://openaccess.thecvf.com/content_cvpr_2017/html/Huang_Simultaneous_Super-Resolution_and_CVPR_2017_paper.html
PDF URL:http://openaccess.thecvf.com/content_cvpr_2017/papers/Huang_Simultaneous_Super-Resolution_and_CVPR_2017_paper.pdf
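
Here the scores are whole numbers that shrink down the list, which would fit a count of keyword occurrences in each paper's text. That reading is an assumption, as is everything in the sketch below (function names, toy data, and scoring by exact word match).

```python
def keyword_score(keywords, text):
    """Count how many times any of the keywords appears in the text."""
    words = text.lower().split()
    return sum(words.count(k.lower()) for k in keywords)


def find_papers_by_words(keywords, papers, count):
    """Return up to `count` (score, title) pairs with a non-zero score, best first."""
    scored = [(keyword_score(keywords, text), title)
              for title, text in papers.items()]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:count]


# Made-up titles and texts for illustration only.
toy_papers = {
    "Paper A": "super resolution of images with super pixels",
    "Paper B": "object tracking in video",
    "Paper C": "video super resolution",
}
print(find_papers_by_words(["super", "resolution"], toy_papers, count=5))
# → [(3, 'Paper A'), (2, 'Paper C')]
```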
