Simple Question Answering — EMNLP 2018

This is the code for the EMNLP 2018 paper "SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach".

On the SimpleQuestions dataset task, one of the most commonly used benchmarks for studying single-relation factoid questions, we:

- Show that ambiguity in the data bounds performance on this benchmark at 83.4%; there are often multiple answers that cannot be disambiguated from the question alone.
- Introduce a baseline that sets a new state-of-the-art performance level at 78.1% accuracy, using only standard methods.
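
To make the upper-bound idea concrete, here is a minimal toy sketch. It is not the procedure used in the paper: it only assumes that when the same question text appears with several different gold answers, a model that sees the question alone can be right for at most the most frequent one. The function name `accuracy_upper_bound` and the example questions and labels are hypothetical.

```python
from collections import Counter, defaultdict


def accuracy_upper_bound(examples):
    """Best accuracy reachable by a model that sees only the question text.

    `examples` is a list of (question, gold_answer) pairs. This is a toy
    illustration, not the computation from the paper.
    """
    answers_by_question = defaultdict(Counter)
    for question, answer in examples:
        answers_by_question[question][answer] += 1
    # For each distinct question, the best fixed prediction is its most
    # frequent gold answer; all other occurrences are necessarily wrong.
    best_correct = sum(
        counts.most_common(1)[0][1] for counts in answers_by_question.values()
    )
    return best_correct / len(examples)


# Hypothetical data: the first question is ambiguous between two answers.
toy_examples = [
    ("who wrote gulliver's travels", "book_1/author"),
    ("who wrote gulliver's travels", "tv_miniseries_1/writer"),
    ("where was barack obama born", "person_1/place_of_birth"),
    ("where was barack obama born", "person_1/place_of_birth"),
]
print(accuracy_upper_bound(toy_examples))  # 0.75
```

In this toy set, one of the four examples can never be answered correctly from the question alone, so the bound is 75%.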

Structure

.
├── /notebooks/
│   ├── /Simple QA End-To-End/           # Experiments on components of the end-to-end QA pipeline
│   ├── /Simple QA Models/               # Experiments on various neural models
│   ├── /Simple QA KG to PostgreSQL DB/  # Scripts to populate PostgreSQL
│   └── /Simple QA Numbers/              # Scripts for computing and verifying various numbers
├── /pretrained_models/
├── /lib/                                # Various utility functionality
├── /tests/
├── .flake8
└── requirements.txt                     # Required python packages

Prerequisites

This repository requires Python 3.5 or greater and PostgreSQL.

Installation

Clone the repository and cd into it

git clone https://github.com/PetrochukM/Simple-QA-EMNLP-2018.git
cd Simple-QA-EMNLP-2018

Install the required packages

python -m pip install -r requirements.txt

Create a PostgreSQL table named fb_two_subject_name and populate it from notebooks/Simple QA KG to PostgreSQL DB/fb_two_subject_name.csv.gz
Create a .pass file using the template below:
```
DB_NAME=
DB_PORT=
DB_USER=
DB_HOST=
DB_PASS=
```
where:
- DB_NAME: the database name
- DB_USER: user name used to authenticate
- DB_PASS: password used to authenticate
- DB_HOST: database host address
- DB_PORT: connection port number (typically 5432)
Download the SimpleQuestions v2 dataset from Facebook Research. Use the notebook at Simple-QA-EMNLP-2018/notebooks/Simple QA KG to PostgreSQL DB/FB5M & FB2M KG to DB.ipynb to create and populate a PostgreSQL table.
You're done! Feel free to run Simple-QA-EMNLP-2018/notebooks/Simple QA End-To-End.
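
As a quick sanity check of the .pass file before running the notebooks, the sketch below parses the `KEY=value` template shown above using only the standard library. How the repository's own code reads this file is not stated here, so the helper names `parse_pass_text` and `load_pass_file`, the skipping of `#` comment lines, and the sample values are all assumptions made for illustration.

```python
from pathlib import Path

# Keys taken from the .pass template above.
REQUIRED_KEYS = ("DB_NAME", "DB_PORT", "DB_USER", "DB_HOST", "DB_PASS")


def parse_pass_text(text):
    """Parse KEY=value lines into a dict (hypothetical helper, not repo code)."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError("Expected KEY=value, got: " + repr(raw_line))
        settings[key.strip()] = value.strip()
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise ValueError("Missing keys: " + ", ".join(missing))
    # The port is numeric (typically 5432); int() raises ValueError if blank.
    settings["DB_PORT"] = int(settings["DB_PORT"])
    return settings


def load_pass_file(path=".pass"):
    """Read and parse a .pass file from disk."""
    return parse_pass_text(Path(path).read_text())


# Sample values are placeholders, not real credentials.
sample = "DB_NAME=simple_qa\nDB_PORT=5432\nDB_USER=qa_user\nDB_HOST=localhost\nDB_PASS=secret\n"
settings = parse_pass_text(sample)
print(settings["DB_HOST"], settings["DB_PORT"])  # localhost 5432
```

The resulting dictionary could then be handed to whichever PostgreSQL driver you use; the driver choice is left open here because the text above does not name one.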

Slides

The slides used for our EMNLP talk.

Citation

@article{Petrochuk2018SimpleQuestionsNS,
  title={SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach},
  author={Michael Petrochuk and Luke S. Zettlemoyer},
  journal={CoRR},
  year={2018},
  volume={abs/1804.08798}
}

Important Notes

The FB2M and FB5M subsets of Freebase KG can complete 7,188,636 and 7,688,234 graph queries respectively; therefore, the FB5M subset is 6.9% larger than the FB2M subset. Also, the FB5M dataset only contains 3.98M entities. This contradicts the statement that "FB5M, is much larger with about 5M entities" (Bordes et al., 2015).
FB5M and FB2M contain 4,322,266 and 3,654,470 duplicate grouped facts respectively.
FB2M is not a subset of FB5M: one atomic fact appears in FB2M but not in FB5M, namely (01g4wmh, music/album/acquire_webpage, 02q5zps).
FB5M and FB2M do not contain the answer for 24 and 36 examples in the SimpleQuestions dataset respectively; therefore, those examples are unanswerable.
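
The arithmetic behind the 6.9% figure, and the kind of set difference that exposes a fact missing from a supposed superset, can be checked with a short sketch. The query counts are the ones quoted above; the miniature fact sets and the helper name `facts_missing_from` are hypothetical stand-ins, since loading the real FB2M and FB5M files is outside the scope of this example.

```python
# Query counts quoted in the notes above.
fb2m_queries = 7188636
fb5m_queries = 7688234

relative_increase = (fb5m_queries - fb2m_queries) / fb2m_queries
percent_larger = "{:.1%}".format(relative_increase)
print(percent_larger)  # 6.9%


def facts_missing_from(candidate_subset, candidate_superset):
    """Return facts present in the first collection but absent from the second."""
    return set(candidate_subset) - set(candidate_superset)


# Toy stand-ins for the two knowledge graphs (subject, relation, object).
# Only the first triple comes from the notes above; the second is made up.
toy_fb2m = {
    ("01g4wmh", "music/album/acquire_webpage", "02q5zps"),
    ("toy_subject", "toy/relation", "toy_object"),
}
toy_fb5m = {
    ("toy_subject", "toy/relation", "toy_object"),
}
missing = facts_missing_from(toy_fb2m, toy_fb5m)
print(missing)
```

A non-empty result means the first set is not a subset of the second, which is the situation described for FB2M and FB5M.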
