
flandolfi / IR-exercises

Licence: other
Solutions of the various test exams of the Information Retrieval course

Programming Languages

TeX

Projects that are alternatives of or similar to IR-exercises

Information-Retrieval
Information Retrieval algorithms developed in python. To follow the blog posts, click on the link:
Stars: ✭ 103 (+267.86%)
Mutual labels:  information-retrieval, exercises
dsge
Course on Dynamic Stochastic General Equilibrium (DSGE): Models, Solution, Estimation (graduate level)
Stars: ✭ 41 (+46.43%)
Mutual labels:  master, exercises
Gensim
Topic Modelling for Humans
Stars: ✭ 12,763 (+45482.14%)
Mutual labels:  information-retrieval
Catalyst
Accelerated deep learning R&D
Stars: ✭ 2,804 (+9914.29%)
Mutual labels:  information-retrieval
Openmatch
An Open-Source Package for Information Retrieval.
Stars: ✭ 186 (+564.29%)
Mutual labels:  information-retrieval
Bm25
A Python implementation of the BM25 ranking function.
Stars: ✭ 159 (+467.86%)
Mutual labels:  information-retrieval
Hdltex
HDLTex: Hierarchical Deep Learning for Text Classification
Stars: ✭ 191 (+582.14%)
Mutual labels:  information-retrieval
Pyserini
Python interface to the Anserini IR toolkit built on Lucene
Stars: ✭ 148 (+428.57%)
Mutual labels:  information-retrieval
ComposeAE
Official code for WACV 2021 paper - Compositional Learning of Image-Text Query for Image Retrieval
Stars: ✭ 49 (+75%)
Mutual labels:  information-retrieval
Vec4ir
Word Embeddings for Information Retrieval
Stars: ✭ 188 (+571.43%)
Mutual labels:  information-retrieval
Aquiladb
Drop in solution for Decentralized Neural Information Retrieval. Index latent vectors along with JSON metadata and do efficient k-NN search.
Stars: ✭ 222 (+692.86%)
Mutual labels:  information-retrieval
Neuralqa
NeuralQA: A Usable Library for Question Answering on Large Datasets with BERT
Stars: ✭ 185 (+560.71%)
Mutual labels:  information-retrieval
Books
Books worth spreading
Stars: ✭ 161 (+475%)
Mutual labels:  information-retrieval
Pwnback
Burp Extender plugin that generates a sitemap of a website using Wayback Machine
Stars: ✭ 203 (+625%)
Mutual labels:  information-retrieval
Sf1r Lite
Search Formula-1——A distributed high performance massive data engine for enterprise/vertical search
Stars: ✭ 158 (+464.29%)
Mutual labels:  information-retrieval
Trinity
Trinity IR Infrastructure
Stars: ✭ 227 (+710.71%)
Mutual labels:  information-retrieval
Terrier Core
Terrier IR Platform
Stars: ✭ 156 (+457.14%)
Mutual labels:  information-retrieval
K Nrm
K-NRM: End-to-End Neural Ad-hoc Ranking with Kernel Pooling
Stars: ✭ 183 (+553.57%)
Mutual labels:  information-retrieval
Rank bm25
A Collection of BM25 Algorithms in Python
Stars: ✭ 187 (+567.86%)
Mutual labels:  information-retrieval
sigir19-neural-ir
Source code for: On the Effect of Low-Frequency Terms on Neural-IR Models, SIGIR'19
Stars: ✭ 44 (+57.14%)
Mutual labels:  information-retrieval

Exercises of Information Retrieval

This repository contains the exercises (and some of their solutions) from various test exams of the Information Retrieval (IR) course taught by Prof. Paolo Ferragina.

Subjects of the course

Following the structure of the course, the solutions are divided into the following topics:

  1. Introduction: Boolean retrieval model. Document-term matrix. Inverted list: dictionary + postings. How to implement AND, OR and NOT queries, and their time complexities.
  2. Crawling: Mercator, Bloom Filters, Consistent Hashing, Web graph.
  3. Locality-Sensitive Hashing: K-means, Hamming distance, Locality-Sensitive Hashing (LSH).
  4. Index Construction: The issue of hierarchical memories: the I/O model. Index construction: multi-way mergesort, BSBI and SPIMI. A sketch of MapReduce. Distributed indexing: term-based vs document-based partitioning. Dynamic indexing: two indexes, a cascade of indexes.
  5. Documents Compression: Compressed storage of documents: LZ-based compression. Storage and transmission of single files or groups of files: delta compression (zdelta), file synchronization (rsync, zsync), and set reconciliation.
  6. Parsing and Text Laws: Parsing: tokenization, normalization, lemmatization, stemming, thesauri. Statistical properties of texts: Zipf's law (classical and generalized), Heaps' law, Luhn's considerations.
  7. Dictionary Search: Exact search: hashing with chaining, universal hashing, cuckoo hashing. Prefix search: compacted trie, front coding, 2-level indexing. Edit distance via a brute-force approach or dynamic programming (possibly weighted). Overlap measure with a k-gram index. Edit distance with a k-gram index. One-error match. Wild-card queries (permuterm, k-gram). Phonetic match. Context-sensitive match.
  8. Query Resolver: Query processing: skip pointers (with solution based on dynamic programming), caching, phrase queries. Zone index and tiered index. The auto-complete problem and its solutions for the top-1, top-2, ..., top-k strings.
  9. Postings Compression: Posting-list compression codes: gamma, delta, variable byte, PForDelta and Elias-Fano (a small worked example is sketched after this list). Rank and Select data structures, two approaches: the case of B untouched with extra o(B) bits, and the case of Elias-Fano's approach with B compressed. Succinct representation of binary trees and its navigational operations (heap-like notation), LOUDS.
  10. Text Ranking: Text-based ranking: Dice, Jaccard, tf-idf. Vector space model. Storage of tf-idf and its use for computing document-query similarity. Fast top-k retrieval: high idf, champion lists, many query terms, fancy hits, clustering. Exact top-k: WAND and blocked WAND. Relevance feedback, Rocchio, pseudo-relevance feedback, query expansion. Performance measures: precision, recall, F1 and user happiness.
  11. Web Ranking: Random walks. Link-based ranking: PageRank, topic-based PageRank, personalized PageRank, CoSimRank, HITS.
  12. Projections: Projections to smaller spaces: Latent Semantic Indexing (LSI). Random projections: the Johnson-Lindenstrauss lemma and its applications.
  13. Topic Annotator: Semantic-annotation tools: basics, Wikipedia structure, TAGME and other annotators. How to evaluate those systems.
  14. Lucene: Introduction to Lucene.
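
To give a flavour of these exercises, the sketch below works through a tiny instance of topic 9; the gap values (2, 4, 7) are purely illustrative and do not come from any specific exam.

% Elias gamma coding of the illustrative gap sequence 2, 4, 7.
% gamma(x) = unary code of floor(log2(x)), followed by the binary
% representation of x without its leading 1-bit.
\[
  \gamma(2) = 10\,0, \qquad
  \gamma(4) = 110\,00, \qquad
  \gamma(7) = 110\,11
\]
% for a total of 3 + 5 + 5 = 13 bits, instead of three 32-bit integers.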

Topics covered by the exams

Each test exam is denoted by the date on which it was taken, and the kinds of exercises it contains are classified by the topic numbers of the previous section. The exams currently covered are listed below.

WARNING: The classification is just a stub. Many exercises may be misclassified.
WARNING: Exams taken before 2016 may contain exercises from a previous programme.

Test dates: 29/10/18, 30/10/17, 05/09/17, 27/07/17, 29/06/17, 12/06/17, 01/02/17, 12/01/17, 10/01/17, 02/09/16, 27/06/16, 01/02/16, 11/01/16, 10/09/15, 20/07/15, 29/06/15, 05/06/15, 09/02/15, 16/01/15, 29/07/14, 30/06/14, 09/06/14, 29/01/14, 08/01/14, 16/07/13, 25/06/13, 12/02/13, 10/01/13, 03/09/12, 23/07/12, 08/06/12, 01/02/12, 11/01/12, 07/12/11, 01/09/11, 20/07/11, 24/06/11, 21/02/11, 01/02/11

Solutions file

For the latest PDF file, see the releases page.

Otherwise, you can build it yourself using the following commands:

git clone https://github.com/flandolfi/IR-exercises.git
cd IR-exercises/
make

How to contribute

Any kind of contribution is welcome! If you wish to add a missing solution, follow these instructions:

  1. Fork this repository;
  2. Create a .tex file (a minimal skeleton is sketched after these steps) containing:
    • The text of the problem, preceded by the LaTeX macro \exercise;
    • The solution of the problem, preceded by the LaTeX macro \solution;
  3. If you need a package, add it to the IR-exercise.tex file using \usepackage{<package>};
  4. Place the file in the specific folder for the subject of the exercise you have solved;
  5. Append your name to the \author{<name>} field in the IR-exercise.tex file, preceded by \and;
  6. Submit a pull request!
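
As a rough sketch, a contributed file might look like the snippet below. The folder, file name, and exercise content are made up for illustration; only the \exercise and \solution macros come from this repository.

% 10-text-ranking/jaccard.tex  (hypothetical folder and file name)
\exercise
Compute the Jaccard similarity between the term sets
$A = \{\mathrm{information}, \mathrm{retrieval}\}$ and
$B = \{\mathrm{information}, \mathrm{ranking}, \mathrm{retrieval}\}$.

\solution
$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{2}{3}$, since the two sets
share two terms out of three distinct terms overall.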

Thank you for your contribution! 😊
