soumendra / Data Science Toolbox Bootcamp

A 5-week program to get started with Data Science. Useful for beginners who want to get started by themselves.

This is a work in progress and is still at the pre-alpha stage.

Datascience Toolbox Bootcamp

A pragmatic 5-week program that 80% of companies can use to get their interns and new employees started with Data Science inside their organization. It is also useful for beginners who want to get started by themselves.

  • Learn to set up a useful data science ecosystem.
  • Pick up the SQL and NoSQL solutions that you prefer or that your company uses.
  • Learn the common (and powerful) Machine Learning ecosystems in vogue today, and learn how to pick up a new statistical programming language quickly.
  • Gain experience with hands-on projects that reinforce the technologies and techniques learnt.
  • Beginner-friendly and adaptable to different levels of beginner experience.

However, this is not a detailed instruction manual. There are just enough hints (sometimes more) to make the activities interesting for the audience and help them master a series of increasingly difficult challenges. Most of the material used here is curated from the web and freely available.

The activities in each week are centered around a topic, theme, or technology stack. Only the first week is needed to get started. The remaining weeks can be picked up or left out based on requirements.

Who uses this material?

In the past, I used this material in workshops I conducted. Currently, I use it in a Big Data and Analytics course that I teach, in the Data Science team I lead, and in the Data Science team I mentor.

The curriculum

Overview

  • Statistical Programming Language: We start with R and use it to learn the design patterns of statistical programming languages. This teaches us both how to learn a new language for statistical programming and how to apply these patterns effectively. Then we quickly move on to Python. While the student may not need both languages in the long run, learning both at the beginning is useful.
  • SQL and NoSQL: Setting up multiple SQL solutions (MySQL, PostgreSQL) is covered, but only one of them needs to be picked up; the rest of the curriculum is agnostic of the choice. NoSQL, however, refers to a diverse collection of solutions with no standardization comparable to SQL, so the curriculum is not agnostic of the choice of NoSQL database. We learn CRUD in MongoDB and DynamoDB and then use them in various later activities.
  • Cloud: AWS (EC2, S3, DynamoDB, Redshift)
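
To make the CRUD idea from the SQL and NoSQL item concrete, here is a minimal sketch of the four operations in SQL, driven from Python. It uses the standard-library `sqlite3` module with an in-memory database purely as a stand-in for MySQL or PostgreSQL (an assumption made so the example runs without a server); the `passengers` table and its rows are hypothetical. The SQL statements carry over largely unchanged, although MySQL and PostgreSQL drivers typically use `%s` placeholders instead of `?`.

```python
import sqlite3


def crud_demo():
    """Run Create, Read, Update, Delete against a throwaway SQLite database."""
    # In-memory SQLite: an assumed stand-in for a MySQL or PostgreSQL server.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical table used only for illustration.
    cur.execute(
        "CREATE TABLE passengers ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)"
    )

    # Create: insert two rows using parameterized queries.
    cur.executemany(
        "INSERT INTO passengers (name, age) VALUES (?, ?)",
        [("Ada", 36), ("Grace", 45)],
    )

    # Read: fetch everything that was just inserted.
    cur.execute("SELECT name, age FROM passengers ORDER BY id")
    after_insert = cur.fetchall()

    # Update: change one row.
    cur.execute("UPDATE passengers SET age = ? WHERE name = ?", (37, "Ada"))

    # Delete: remove the other row.
    cur.execute("DELETE FROM passengers WHERE name = ?", ("Grace",))
    conn.commit()

    # Read again to see the effect of the update and the delete.
    cur.execute("SELECT name, age FROM passengers ORDER BY id")
    final = cur.fetchall()

    conn.close()
    return after_insert, final


if __name__ == "__main__":
    before, after = crud_demo()
    print("after insert:", before)
    print("after update and delete:", after)
```

Parameterized queries (the `?` placeholders) are used instead of string formatting so that values are escaped by the driver.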

Details

  • Week 1: Basics

    • Set up the system
    • Learn R, Python
    • Learn a SQL database system (MySQL or PostgreSQL)
    • Learn a NoSQL database system (MongoDB)
    • Small Projects
  • Week 2: Supervised Machine Learning

    • Linear and Logistic Regression, Penalized Regression
    • Decision Trees, Random Forest, GBM and eXtreme Gradient Boosting
    • The Kaggle Titanic Competition
    • Twitter Sentiment Analysis
    • Ensembling in Practice
  • Week 3: Into the Cloud

    • EC2 and S3
    • Going CLI
    • DynamoDB
    • Redshift
    • Mock ETL with Airflow
  • Week 4: Projects

    • Document Classification
    • Build a Recommender System
    • Analyzing the Analyzers (Analyzing Data Science job postings)
    • Set up an ML pipeline (caret vs mlr, scikit-learn)
    • Beat a Kaggle Champion!
  • Week 5: Data Science in the Wild!

    • Visualization with Grammar of Graphics (ggplot2, bokeh)
    • Reports, Tables and Charts: Sharing results
    • Create an ML API (Flask)
    • Build a Dashboard (Flask)
    • Reasoning about metrics
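
As a small taste of the Week 2 topic "Ensembling in Practice", the sketch below combines the predictions of several classifiers by majority vote. This is one common ensembling technique, not necessarily the one the bootcamp activities use; the function name `majority_vote` and the three prediction lists are hypothetical, standing in for the outputs of already-trained models.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model predictions into one prediction per sample.

    `predictions` is a list with one inner list per model; every inner
    list holds that model's predicted label for each sample, in order.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    # zip(*predictions) groups the votes of all models for each sample.
    for votes in zip(*predictions):
        # most_common(1) returns the label with the highest vote count.
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


if __name__ == "__main__":
    # Hypothetical outputs of three binary classifiers on four samples.
    model_a = [1, 0, 1, 1]
    model_b = [0, 0, 1, 0]
    model_c = [1, 1, 1, 0]
    print(majority_vote([model_a, model_b, model_c]))  # [1, 0, 1, 0]
```

Using an odd number of binary classifiers avoids ties; with an even number, or with more than two classes, a tie-breaking rule would need to be chosen explicitly.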