fastread / src

Licence: MIT license
tools for fast reading of docs

Programming Languages

python
javascript
HTML
CSS
Batchfile

Projects that are alternatives to or similar to src

EMNLP2020
This is official Pytorch code and datasets of the paper "Where Are the Facts? Searching for Fact-checked Information to Alleviate the Spread of Fake News", EMNLP 2020.
Stars: ✭ 55 (+37.5%)
Mutual labels:  information-retrieval, learning-to-rank
Ranking
Learning to Rank in TensorFlow
Stars: ✭ 2,362 (+5805%)
Mutual labels:  information-retrieval, learning-to-rank
aerial wildlife detection
Tools for detecting wildlife in aerial images using active learning
Stars: ✭ 177 (+342.5%)
Mutual labels:  active-learning
AILA-Artificial-Intelligence-for-Legal-Assistance
Python implementations of the various methods used in FIRE 2019 conference.
Stars: ✭ 39 (-2.5%)
Mutual labels:  information-retrieval
MixGCF
MixGCF: An Improved Training Method for Graph Neural Network-based Recommender Systems, KDD2021
Stars: ✭ 73 (+82.5%)
Mutual labels:  information-retrieval
molpal
active learning for accelerated high-throughput virtual screening
Stars: ✭ 110 (+175%)
Mutual labels:  active-learning
fastrank
My most frequently used learning-to-rank algorithms ported to rust for efficiency. Try it: "pip install fastrank".
Stars: ✭ 43 (+7.5%)
Mutual labels:  learning-to-rank
netizenship
a commandline #OSINT tool to find the online presence of a username in popular social media websites like Facebook, Instagram, Twitter, etc.
Stars: ✭ 33 (-17.5%)
Mutual labels:  information-retrieval
GNN-Recommender-Systems
An index of recommendation algorithms that are based on Graph Neural Networks.
Stars: ✭ 505 (+1162.5%)
Mutual labels:  information-retrieval
activelearning
Active Learning in R
Stars: ✭ 43 (+7.5%)
Mutual labels:  active-learning
BERT-QE
Code and resources for the paper "BERT-QE: Contextualized Query Expansion for Document Re-ranking".
Stars: ✭ 43 (+7.5%)
Mutual labels:  information-retrieval
ProQA
Progressively Pretrained Dense Corpus Index for Open-Domain QA and Information Retrieval
Stars: ✭ 44 (+10%)
Mutual labels:  information-retrieval
kex
Kex is a python library for unsupervised keyword extraction from a document, providing an easy interface and benchmarks on 15 public datasets.
Stars: ✭ 46 (+15%)
Mutual labels:  information-retrieval
rust-stemmers
A rust implementation of some popular snowball stemming algorithms
Stars: ✭ 85 (+112.5%)
Mutual labels:  information-retrieval
FastAP-metric-learning
Code for CVPR 2019 paper "Deep Metric Learning to Rank"
Stars: ✭ 93 (+132.5%)
Mutual labels:  learning-to-rank
COVID19-IRQA
No description or website provided.
Stars: ✭ 32 (-20%)
Mutual labels:  information-retrieval
naacl2018-fever
Fact Extraction and VERification baseline published in NAACL2018
Stars: ✭ 109 (+172.5%)
Mutual labels:  information-retrieval
3d model retriever
Experimenting with a newly published deep learning paper and how it can be used for content-based 3D model retrieval. (info retrieval for CAD)
Stars: ✭ 45 (+12.5%)
Mutual labels:  information-retrieval
ml4ir
Machine Learning for Information Retrieval
Stars: ✭ 75 (+87.5%)
Mutual labels:  information-retrieval
ml-nlp-services
Machine learning, deep learning, natural language processing
Stars: ✭ 23 (-42.5%)
Mutual labels:  information-retrieval

What is FASTREAD (or FAST2)?

FASTREAD (FAST2) is a tool to support primary study selection in systematic literature reviews.

Latest Versions:

Cite as:

@article{Yu2019,
title = "FAST2: An intelligent assistant for finding relevant papers",
journal = "Expert Systems with Applications",
volume = "120",
pages = "57--71",
year = "2019",
author = "Zhe Yu and Tim Menzies",
keywords = "Active learning, Literature reviews, Text mining, Semi-supervised learning, Relevance feedback, Selection process"
}


@Article{Yu2018,
author="Yu, Zhe
and Kraft, Nicholas A.
and Menzies, Tim",
title="Finding better active learners for faster literature reviews",
journal="Empirical Software Engineering",
year="2018",
month="Mar",
day="07",
issn="1573-7616",
doi="10.1007/s10664-017-9587-0",
url="https://doi.org/10.1007/s10664-017-9587-0"
}

Setting up FASTREAD

  1. Setting up Python:
  • We use Anaconda by continuum.io.
    • You won't need the entire distribution: download Python 3.7+ and install a minimal version of Anaconda.
  • Make sure you select Add to PATH during installation.
  • Next, run setup.bat. This installs all the dependencies needed to run the tool. Alternatively:
  • If that does not work, remember that you only need Python 3.7+ and the three packages listed in requirements.txt, so pip install -r requirements.txt will work.
  2. Running the script:
  • Navigate to src and run index.py.
  • If all is well, you'll be greeted by this:
  3. The Interface:

Use FASTREAD

  1. Get data ready:
  • Put your candidate list (a csv file) in workspace > data.
  • The candidate list can use the same format as the example file workspace > data > Hall.csv, or it can be a csv file exported from IEEE Xplore and saved in MS-DOS csv format.
  2. Load the data:
  • Click the Target: Choose File button to select your csv file in workspace > data. The first load may take a few seconds. Once the data is successfully loaded, you will see the following:
  3. Begin reviewing studies:
  • Check the Enable Estimation box.
  • A simple search with two or three keywords can help find Relevant studies quickly before any training starts.
  • Choose from Relevant, Irrelevant, or Undetermined for each study and hit Submit.
  • Hit Next when you want to review more.
  • Statistics are displayed as Documents Coded: x/y (z), where x is the number of relevant studies retrieved, y is the number of studies reviewed, and z is the total number of candidate studies.
  • Once x is at least 1, an SVM model is trained each time you hit Next. From then on, you can choose among different query strategies.
  • We suggest using Uncertain until the highest probability score under Certain exceeds 0.9, or until Uncertain no longer finds Relevant studies; at that point, switch to Certain.
  • Keep reviewing studies until you think most of the relevant ones have been retrieved. (If Estimation is enabled, stop when x is close to or above 95% (or 90%) of the estimated number of Relevant studies.)
  4. Plot the curve:
  • Clicking the Plot button plots a Relevant studies retrieved vs. Studies reviewed curve.
  • Check Auto Plot so that a curve is generated automatically every time you hit Next.
  • You can also find the figure in src > static > image.
  5. Export csv:
  • Clicking the Export button generates a csv file with your coding in workspace > coded.
  6. Restart:
  • Clicking the Restart button gives you a fresh start and discards all your previous work on the current data.
  7. Remember to click the Next button:
  • User data is saved only when you hit the Next button, so don't forget to hit it before you stop reviewing.
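The query strategies and stopping rule described under Begin reviewing studies can be sketched in a few lines of Python. This is a minimal illustration, not FASTREAD's own code: it assumes the model's probability of Relevant for each unreviewed study is already available as a dict, that Uncertain means picking the studies whose probability is closest to 0.5, and that Certain means picking the highest probabilities. All function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical helpers illustrating the review workflow; FASTREAD's real code may differ.


def uncertain(probs: Dict[str, float], unreviewed: List[str], n: int = 10) -> List[str]:
    """Assumed Uncertain strategy: studies whose P(Relevant) is closest to 0.5."""
    return sorted(unreviewed, key=lambda d: abs(probs[d] - 0.5))[:n]


def certain(probs: Dict[str, float], unreviewed: List[str], n: int = 10) -> List[str]:
    """Assumed Certain strategy: studies with the highest P(Relevant)."""
    return sorted(unreviewed, key=lambda d: probs[d], reverse=True)[:n]


def switch_to_certain(probs: Dict[str, float], unreviewed: List[str],
                      uncertain_found_relevant: bool) -> bool:
    """README heuristic: switch once the top Certain score exceeds 0.9,
    or once Uncertain stops finding Relevant studies."""
    top = max((probs[d] for d in unreviewed), default=0.0)
    return top > 0.9 or not uncertain_found_relevant


def should_stop(x: int, estimated_relevant: float, target: float = 0.95) -> bool:
    """Stop once x (relevant studies found) reaches `target` of the estimated total."""
    return x >= target * estimated_relevant


def coded_stats(x: int, y: int, z: int) -> str:
    """Format the statistics line: x relevant found, y reviewed, z candidates."""
    return f"Documents Coded: {x}/{y} ({z})"


if __name__ == "__main__":
    probs = {"a": 0.95, "b": 0.52, "c": 0.10, "d": 0.40, "e": 0.70}
    pool = list(probs)
    print(uncertain(probs, pool, n=2))  # ['b', 'd']
    print(certain(probs, pool, n=2))    # ['a', 'e']
    print(switch_to_certain(probs, pool, uncertain_found_relevant=True))  # True
```

The split reflects a common active-learning pattern: uncertainty sampling spends early effort on studies the model cannot yet classify, which tends to improve the model, while certainty sampling then collects the studies most likely to be Relevant.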

Double-checking previous labels:

Users can now recheck their previously labeled results and change their decisions, so human errors and concept drift can be handled. The model learned so far is also used to suggest which labels are most suspicious. Three options are available:

  • Labeled Pos: recheck the studies previously labeled as Relevant, sorted by the level of disagreement between the current model prediction and the human label.
  • Labeled Neg: recheck the studies previously labeled as Irrelevant, sorted by the level of disagreement between the current model prediction and the human label. We recommend rechecking the 10 most suspicious labels after every 50 studies reviewed.
  • Latest: change the most recently submitted labels.
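The disagreement ordering used by Labeled Pos and Labeled Neg can be sketched as follows. This is a hypothetical illustration that assumes disagreement is the distance between the model's P(Relevant) and the human label encoded as 1 (Relevant) or 0 (Irrelevant); FASTREAD's actual scoring may differ. The defaults n=10 and every=50 mirror the recommendation above.

```python
from typing import Dict, List


def disagreement(label: str, p_relevant: float) -> float:
    """Distance between the human label (Relevant -> 1, otherwise 0) and P(Relevant)."""
    target = 1.0 if label == "Relevant" else 0.0
    return abs(target - p_relevant)


def most_suspicious(labels: Dict[str, str], probs: Dict[str, float],
                    which: str, n: int = 10) -> List[str]:
    """Studies labeled `which` ("Relevant" or "Irrelevant"), largest disagreement first."""
    chosen = [d for d, lab in labels.items() if lab == which]
    return sorted(chosen, key=lambda d: disagreement(which, probs[d]), reverse=True)[:n]


def recheck_due(num_reviewed: int, every: int = 50) -> bool:
    """True after every `every` reviewed studies (hypothetical reminder cadence)."""
    return num_reviewed > 0 and num_reviewed % every == 0
```

For example, a study labeled Relevant that the model scores at 0.2 is listed before one scored at 0.9, because its disagreement (0.8) is larger.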

Run simulations of FASTREAD on labeled datasets:

simulate.py
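A simulation replays a review against known labels and records the same Relevant studies retrieved vs. Studies reviewed curve that the Plot button draws. The sketch below is a simplified, hypothetical stand-in, not a reproduction of simulate.py: it takes a fixed review order instead of retraining a model after each batch, so it only illustrates how such a curve and a recall target are measured. The function names are illustrative.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def retrieval_curve(order: Sequence[str], truth: Dict[str, bool]) -> List[Tuple[int, int]]:
    """(studies reviewed, relevant retrieved) after each study in `order`."""
    curve: List[Tuple[int, int]] = []
    found = 0
    for reviewed, doc in enumerate(order, start=1):
        if truth[doc]:
            found += 1
        curve.append((reviewed, found))
    return curve


def reviews_to_reach(curve: List[Tuple[int, int]], total_relevant: int,
                     target: float = 0.95) -> Optional[int]:
    """First number of studies reviewed at which recall >= target, else None."""
    for reviewed, found in curve:
        if found >= target * total_relevant:
            return reviewed
    return None


if __name__ == "__main__":
    truth = {"a": True, "b": False, "c": True, "d": False}
    curve = retrieval_curve(["c", "b", "a", "d"], truth)
    print(curve)  # [(1, 1), (2, 1), (3, 2), (4, 2)]
    print(reviews_to_reach(curve, total_relevant=sum(truth.values())))  # 3
```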

Version Logs

Dec 5, 2016. v1.0.0 The very first, basic version is released.

May 14, 2017. v1.1.0 Features of UPDATE/REUSE are edited to allow FASTREAD to import previously exported data and bootstrap a new review.

Jun 29, 2017. v1.2.0 Estimate the number of Relevant studies in the pool. Enabling Estimation slows down training but provides the following benefits:

  • the number of Relevant studies will be estimated, which helps decide when to stop;
  • probability scores will be more accurate.

Aug 01, 2017. v1.3.0 Core algorithm updated to utilize both Weighting and aggressive undersampling.

Aug 28, 2017. v1.4.0 Integrated as FAST2.

Nov 15, 2017. v1.5.0 Allow users to change their decisions on previous labels. Machine suggestions are used to efficiently handle human errors or concept drift.

Jan 28, 2020. v1.6.0 Added support for Python 3. Show the latest labeled items so that users can easily change their decisions.
