amueller / Scipy 2018 Sklearn

Licence: cc0-1.0
Scipy 2018 scikit-learn tutorial by Guillaume Lemaitre and Andreas Mueller

SciPy 2018 Scikit-learn Tutorial

Instructors

  • Guillaume Lemaitre
  • Andreas Mueller

This repository will contain the teaching material and other info associated with our scikit-learn tutorial at SciPy 2018 held July 9-15 in Austin, Texas.

Parts 1 to 12 make up the morning session, while parts 13 to 23 will be presented in the afternoon (the split is approximate).

Schedule:

The 2-part tutorial will be held on Tuesday, July 10, 2018.

Obtaining the Tutorial Material

If you have a GitHub account, it is probably most convenient if you clone or fork the GitHub repository. You can clone the repository by running:

git clone https://github.com/amueller/scipy-2018-sklearn.git

If you are not familiar with git or don’t have a GitHub account, you can download the repository as a .zip file by heading over to the GitHub repository (https://github.com/amueller/scipy-2018-sklearn) in your browser and clicking the green “Download” button in the upper right.

Please note that we may add to and improve the material until shortly before the tutorial session, so we recommend that you update your copy of the materials one day before the tutorials. If you have a GitHub account and cloned the repository via GitHub, you can sync your existing local repository with:

git pull origin master

If you don’t have a GitHub account, you may have to re-download the .zip archive from GitHub.

Installation Notes

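This tutorial will require recent installations of the scientific Python packages used throughout the material (NumPy, SciPy, matplotlib, and scikit-learn, among others) and of Jupyter Notebook.

Jupyter Notebook is especially important, and you should be able to type: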

jupyter notebook

in your terminal window and see the notebook panel load in your web browser. Try opening and running a notebook from the material to check that it works. Alternatively, you can use JupyterLab.

For users who do not yet have the required packages installed, a relatively painless way to install all the requirements is to use a Python distribution such as Anaconda, which includes the most relevant Python packages for science, math, engineering, and data analysis. Anaconda can be downloaded and installed for free, including for commercial use and redistribution. The code examples in this tutorial should be compatible with Python 2.7 and Python 3.4-3.6.

After obtaining the material, we strongly recommend that you open and execute the Jupyter Notebook check_env.ipynb, which is located at the top level of this repository. You can open the notebook by executing

jupyter notebook check_env.ipynb

inside this repository. Inside the Notebook, you can run the code cell by clicking on the "Run Cells" button as illustrated in the figure below:

Finally, if your environment satisfies the requirements for the tutorials, the executed code cell will produce an output message as shown below:

Although not required, we also recommend that you update scikit-learn to the latest release to ensure the best compatibility with the teaching material. Depending on how you installed scikit-learn, please upgrade already installed packages by executing

  • pip install --no-deps --upgrade [package-name]
  • or conda update [package-name]
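
Before upgrading, it can help to see which versions are currently installed. The following is a minimal sketch of such a check, not the contents of check_env.ipynb. The helper name `installed_versions` is hypothetical, and the package list is an assumption based on the libraries named in the outline.

```python
import importlib


def installed_versions(package_names):
    """Map each import name to its version string.

    The value is None if the package cannot be imported and "unknown"
    if it imports but does not expose a __version__ attribute.
    """
    versions = {}
    for name in package_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    # Assumed package list; note that scikit-learn is imported as "sklearn".
    for name, version in installed_versions(
        ["numpy", "scipy", "matplotlib", "sklearn"]
    ).items():
        status = version if version is not None else "NOT INSTALLED"
        print("{:<12} {}".format(name, status))
```

Any package reported as not installed, or with an old version, can then be installed or upgraded with the pip or conda command above.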

Data Downloads

The data for this tutorial is not included in the repository. We will be using several data sets during the tutorial: most are built into scikit-learn, which includes code that automatically downloads and caches these data.

Because the wireless network at conferences can often be spotty, it would be a good idea to download these data sets before arriving at the conference. Please run

python fetch_data.py

to download all necessary data beforehand.

The download size of the data files is approx. 280 MB, and after fetch_data.py has extracted the data to your disk, the ./notebook/dataset folder will take up 480 MB of your local hard drive.
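
As an illustration of the built-in data sets mentioned above, the sketch below loads one that ships with scikit-learn and prints the directory where scikit-learn caches data sets it downloads itself. It is independent of fetch_data.py, and the choice of the iris data set is an assumption made only for this example.

```python
from sklearn.datasets import get_data_home, load_iris

# load_iris is bundled with scikit-learn, so no network access is needed.
iris = load_iris()
print(iris.data.shape)    # (150, 4): 150 samples, 4 features
print(iris.target.shape)  # (150,): one class label per sample

# Data sets fetched by scikit-learn's own fetch_* functions are cached in
# this directory, so they only need to be downloaded once.
print(get_data_home())
```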

Outline

Morning Session

  • 01 Introduction to machine learning with sample applications, Supervised and Unsupervised learning
  • 02 Scientific Computing Tools for Python: NumPy, SciPy, and matplotlib
  • 03 Data formats, preparation, and representation
  • 04 Supervised learning: Training and test data
  • 05 Supervised learning: Estimators for classification
  • 06 Supervised learning: Estimators for regression analysis
  • 07 Unsupervised learning: Unsupervised Transformers
  • 08 Unsupervised learning: Clustering
  • 09 The scikit-learn estimator interface
  • 10 Preparing a real-world dataset (titanic)
  • 11 Working with text data via the bag-of-words model
  • 12 Application: IMDb Movie Review Sentiment Analysis
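
To give a flavour of the morning topics, here is a minimal sketch of the train/test split and the estimator interface (fit, then score). It is not taken from the tutorial notebooks; the data set, the classifier, and the parameter values are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples for testing; stratify keeps the class
# proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Estimators share the same interface: construct, fit, then predict or score.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print("Test accuracy: {:.2f}".format(accuracy))
```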

Afternoon Session

  • 13 Cross-Validation
  • 14 Model complexity and grid search for adjusting hyperparameters
  • 15 Scikit-learn Pipelines
  • 16 Supervised learning: Performance metrics for classification
  • 17 Supervised learning: Linear Models
  • 18 Supervised learning: Decision trees and random forests, and ensemble methods
  • 19 Supervised learning: feature selection
  • 20 Unsupervised learning: Hierarchical and density-based clustering algorithms
  • 21 Unsupervised learning: Non-linear dimensionality reduction
  • 22 Unsupervised learning: Anomaly Detection
  • 23 Supervised learning: Out-of-core learning
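
As a taste of the afternoon topics, the sketch below combines a pipeline, cross-validation, and a grid search over one hyperparameter. It is an illustrative example rather than material from the notebooks; the data set, the model, and the candidate values for C are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# A pipeline chains preprocessing and a model, so scaling is fitted only on
# the training part of each cross-validation split.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation returns one score per fold.
scores = cross_val_score(pipe, X, y, cv=5)
print("Mean CV accuracy: {:.2f}".format(scores.mean()))

# make_pipeline names each step after its lowercased class name, so the
# regularization strength is addressed as "logisticregression__C".
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```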