
TheMTank / ai-distillery

Licence: MIT License
Automatically modelling and distilling knowledge within AI. In other words, summarising the AI research firehose.

Programming Languages

python
javascript

Projects that are alternatives of or similar to ai-distillery

Research Paper Notes
Notes and Summaries on ML-related Research Papers (with optional implementations)
Stars: ✭ 218 (+990%)
Mutual labels:  research, arxiv
Catalyst
Accelerated deep learning R&D
Stars: ✭ 2,804 (+13920%)
Mutual labels:  information-retrieval, research
Asreview
Active learning for systematic reviews
Stars: ✭ 233 (+1065%)
Mutual labels:  research, arxiv
Pyndri
pyndri is a Python interface to the Indri search engine.
Stars: ✭ 85 (+325%)
Mutual labels:  information-retrieval, research
football-graphs
Graphs and passing networks in football.
Stars: ✭ 81 (+305%)
Mutual labels:  graphs, network-science
Ingraph
Incremental view maintenance for openCypher graph queries.
Stars: ✭ 40 (+100%)
Mutual labels:  research, graphs
Tutorial Utilizing Kg
Resources for Tutorial on "Utilizing Knowledge Graphs in Text-centric Information Retrieval"
Stars: ✭ 148 (+640%)
Mutual labels:  information-retrieval, knowledge-base
Arxiv Equations
🚀 Provides equations in latex format from arxiv paper.
Stars: ✭ 23 (+15%)
Mutual labels:  research, arxiv
ntds 2019
Material for the EPFL master course "A Network Tour of Data Science", edition 2019.
Stars: ✭ 62 (+210%)
Mutual labels:  graphs, network-science
patzilla
PatZilla is a modular patent information research platform and data integration toolkit with a modern user interface and access to multiple data sources.
Stars: ✭ 71 (+255%)
Mutual labels:  information-retrieval, research
ultimate-defi-research-base
Here we collect and discuss the best DeFI & Blockchain researches and tools. Feel free to DM me on Twitter or open pool request.
Stars: ✭ 1,074 (+5270%)
Mutual labels:  research, knowledge-base
SeaPearl.jl
Julia hybrid constraint programming solver enhanced by a reinforcement learning driven search.
Stars: ✭ 119 (+495%)
Mutual labels:  research, graphs
EstimNetDirected
Equilibrium Expectation for ERGM parameter estimation for large directed networks
Stars: ✭ 18 (-10%)
Mutual labels:  research, network-science
ntds 2018
Material for the EPFL master course "A Network Tour of Data Science", edition 2018.
Stars: ✭ 59 (+195%)
Mutual labels:  graphs, network-science
VIATRA-Generator
An efficient graph solver for generating well-formed models
Stars: ✭ 21 (+5%)
Mutual labels:  graphs
disparity filter
Implements a disparity filter in Python, based on graphs in NetworkX, to extract the multiscale backbone of a complex weighted network (Serrano, et al., 2009)
Stars: ✭ 17 (-15%)
Mutual labels:  graphs
GeeseDB
Graph Engine for Exploration and Search
Stars: ✭ 14 (-30%)
Mutual labels:  information-retrieval
CodeAndQuestsEveryDay
Regular research on the Quest for developers.
Stars: ✭ 27 (+35%)
Mutual labels:  research
typedb-loader
TypeDB Loader - Data Migration Tool for TypeDB
Stars: ✭ 43 (+115%)
Mutual labels:  knowledge-base
abcvoting
Python implementations of approval-based committee (multi-winner) voting rules
Stars: ✭ 17 (-15%)
Mutual labels:  research

ai-distillery


Automatically modelling and distilling knowledge within AI. In other words, summarise the arxiv firehose. Map, categorise, quantify, qualify, filter, search, browse, reduce, digest, compress, summarise and model all knowledge within ML/DL/RL/AI/DS/CS/Stats. And, always for the community.

We are showing some of our results on ai-distillery.io.

AI Distillery visualization section. The server's GitHub repo is ai-distillery-app.

Number of arXiv papers released over time, January 2014 to November 2018


Installation

Please consider using a virtual environment as shown below. This way, the scripts won't pollute your global $PATH.

git clone https://github.com/TheMTank/ai-distillery
cd ai-distillery
virtualenv venv && source venv/bin/activate # STRONGLY RECOMMENDED
pip install -e .

The package installs a single executable, distill. It can be invoked to apply latent semantic analysis, word2vec, and doc2vec, and to extract named entities. Consult distill -h for more information on the available subcommands.

Fetching data

We maintain a fork of Karpathy's Arxiv Sanity Preserver to harvest structured metadata as well as full-text data from arXiv.

In the following, we assume that data/db.p holds the database of structured metadata and that the directory data/txt contains the raw <arxiv_Id>.pdf.txt full-text files.
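As a rough illustration of this layout, the sketch below pairs each full-text file with its metadata entry. It assumes that data/db.p is a pickled dict keyed by the same identifier that appears in the file names, which the text above does not state explicitly. The function names load_metadata, iter_full_texts and join_corpus are hypothetical and not part of the distill package.

```python
import pickle
from pathlib import Path

SUFFIX = ".pdf.txt"


def load_metadata(db_path):
    """Load the pickled metadata database (assumed here to be a dict)."""
    # Only unpickle files you trust: pickle can execute arbitrary code.
    with open(db_path, "rb") as fh:
        return pickle.load(fh)


def iter_full_texts(txt_dir):
    """Yield (identifier, text) for every *.pdf.txt file in txt_dir."""
    for path in sorted(Path(txt_dir).glob("*" + SUFFIX)):
        identifier = path.name[: -len(SUFFIX)]
        yield identifier, path.read_text(encoding="utf-8", errors="replace")


def join_corpus(db_path, txt_dir):
    """Map identifier -> (metadata or None, full text).

    Assumption: metadata keys match the file-name identifiers exactly.
    """
    metadata = load_metadata(db_path)
    return {
        identifier: (metadata.get(identifier), text)
        for identifier, text in iter_full_texts(txt_dir)
    }
```

A call such as join_corpus("data/db.p", "data/txt") would then return one entry per full-text file. Files without a matching metadata key get None as their metadata, which makes mismatched identifiers easy to spot.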

For convenience we have registered our fork of arxiv-sanity-preserver as a submodule. To clone the submodule, issue the following command.

git submodule update --init

Then follow the guide by Karpathy to run the code.

Executing scripts

Please consult distill -h for more information on how to run each of the available subcommands.

An example call to run LSA

An example call to compute 2-dimensional LSA (latent semantic analysis) vectors for the documents:

distill lsa data/txt/ -n 2 --annotate data/full_paper_id_to_title_dict.pkl -o data/embeddings/lsa-2.pkl

This call assumes that data/txt/ contains *.pdf.txt files. The -n argument determines the number of components at which the singular value decomposition in LSA is truncated, and therefore the embedding dimension. The optional --annotate argument supplies a path to a pickled dict that maps identifiers (filenames without .pdf.txt) to titles for visualization. The output is stored in Ben format: a pickled dict of type {'labels': labels:list(str), 'embeddings': embeddings:numpy.ndarray } such that labels[i] corresponds to embeddings[i].
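To make the role of the truncation and the output layout concrete, here is a minimal sketch using NumPy. It is not the code behind distill lsa. It assumes whitespace tokenisation and raw term counts (the real command may weight or preprocess terms differently), and the names lsa_embed, save_embeddings and load_embeddings are hypothetical.

```python
import pickle

import numpy as np


def lsa_embed(docs, n_components):
    """Embed documents via a truncated SVD of their term-count matrix.

    docs maps identifier -> text. n_components plays the role of -n and
    must not exceed min(number of documents, vocabulary size).
    """
    labels = sorted(docs)
    vocab = sorted({tok for text in docs.values() for tok in text.lower().split()})
    column = {tok: j for j, tok in enumerate(vocab)}

    counts = np.zeros((len(labels), len(vocab)))
    for i, label in enumerate(labels):
        for tok in docs[label].lower().split():
            counts[i, column[tok]] += 1

    # Compute the SVD and keep only the first n_components directions;
    # this truncation is what fixes the embedding dimension.
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    embeddings = u[:, :n_components] * s[:n_components]
    return {"labels": labels, "embeddings": embeddings}


def save_embeddings(result, path):
    """Write the {'labels', 'embeddings'} dict as a pickle."""
    with open(path, "wb") as fh:
        pickle.dump(result, fh)


def load_embeddings(path):
    """Read the dict back and check that labels and rows line up."""
    # Only unpickle files you trust: pickle can execute arbitrary code.
    with open(path, "rb") as fh:
        result = pickle.load(fh)
    if len(result["labels"]) != len(result["embeddings"]):
        raise ValueError("labels and embeddings differ in length")
    return result
```

With n_components=2, every document becomes one row of two numbers, and result["labels"][i] names the document behind result["embeddings"][i]. A file such as data/embeddings/lsa-2.pkl could presumably be inspected with load_embeddings in the same way, provided it follows the layout described above.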

Contributing

Make sure to install the aidistillery package with pip install -e . or python3 setup.py develop. This way, any changes you make take effect without reinstalling. We look forward to receiving your pull requests.
