tyleransom / Dscourses19

Licence: mit
ECON 5253: Data Science for Economists, University of Oklahoma (Spring 2019)

ECON 5253: Data Science for Economists (Spring 2019)

Join the chat at https://gitter.im/DScourseS19/community

Tyler Ransom
Email [email protected]
Office 322 CCD1
Office Hours M 9:30-10:30am, Th 12-1pm
GitHub tyleransom
  • Meeting day/time: T,Th 1:30-2:45pm, CCD1, Room 174
  • Office hours also available by appointment
  • This course takes inspiration and extensively borrows materials from similar courses taught by Jason DeBacker (U of South Carolina) and Rick Evans (U of Chicago). Thanks to them for providing a framework for using GitHub as a class collaboration tool and for insights into teaching programming skills.

Course description

Data science is a rapidly developing field that combines the recent Big Data revolution with ever-developing statistical algorithms to inform business and policy decisions. Nearly every company you've heard of uses data science to optimize its services: Netflix uses it to recommend new programs to its viewers, and Amazon uses it to determine how much to charge for its Prime services. This class gives students an overview of the data science workflow, from collecting raw data to drawing a set of insights that a decision maker can act on. Along the way, we will broadly cover advances in data collection, data storage, visualization, machine learning, and econometrics, and we will teach and reinforce good programming practices. The primary goal of this course is to provide you, the student, with a set of skills that will allow you to compete for a data science job.

Course Objectives and Learning Outcomes

By the end of the course, students should be able to do the following:

  1. Explain the data science workflow from start to finish
  2. Collect data from online sources via APIs or scraping
  3. Describe similarities and differences between econometrics and machine learning
  4. Explain what data science is, and how Big Data differs from other types of data
  5. Demonstrate good programming practices by writing code that can allow for easy collaboration with others
  6. Understand the differences between prediction and causality, and the cases in which each is useful

In this course students, through lecture and application, will learn about:

  • Good programming practices, including how to write code collaboratively with others
  • Software to increase research productivity including:
    • LaTeX/Markdown
    • git
  • Software to collect & clean data, and estimate statistical models:
    • R
    • Julia
    • Python
  • Software to manage big data sets:
    • SQL
    • RDDs (Resilient Distributed Datasets): Spark, Hadoop
  • How to access and utilize cluster computing resources
    • SSH (Secure Shell)
    • SFTP (Secure File Transfer Protocol)
    • SLURM (Simple Linux Utility for Resource Management)
  • Methods to gather and handle data including:
    • Costs and benefits of different data structures
    • Using APIs
    • Web scraping
  • Best practices for cleaning and visualizing data
  • Computational methods to:
    • Optimize and find roots of functions
    • Perform Monte Carlo simulations
    • Run computations in parallel using multiple processors (time permitting)
  • Basics for modeling different types of data
  • Machine learning basics:
    • Supervised vs. unsupervised learning
    • The five "tribes" of machine learning: how they are interconnected, and how they differ
    • Machine learning vs. econometrics: prediction vs. causality
    • Evaluating model performance
  • Using economic models to inform policy decisions
    • Computing structural models
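
As a rough illustration of the "Computational methods" items in the list above, the sketch below shows one simple way to find a root of a function (bisection) and one simple Monte Carlo simulation (estimating pi from random draws). It is not taken from the course materials: the function names `bisect_root` and `monte_carlo_pi`, the tolerance, and the seed are all assumptions chosen for the example, and only the Python standard library is used.

```python
import random


def bisect_root(f, lo, hi, tol=1e-10):
    """Find a root of f on [lo, hi] by bisection.

    Assumes f is continuous and f(lo), f(hi) have opposite signs.
    """
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            # The sign change (or an exact zero) lies in the left half.
            hi = mid
        else:
            # The sign change lies in the right half.
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2


def monte_carlo_pi(n_draws, seed=0):
    """Estimate pi as 4 times the share of uniform points in the unit
    square that land inside the quarter circle of radius 1."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    inside = sum(
        1 for _ in range(n_draws)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4 * inside / n_draws


if __name__ == "__main__":
    print(bisect_root(lambda x: x * x - 2, 0, 2))  # close to the square root of 2
    print(monte_carlo_pi(100_000))                 # close to pi, with sampling error
```

Bisection is slow but needs nothing beyond a sign change, which makes it a convenient first example; the Monte Carlo estimate improves only with the square root of the number of draws, which is one reason parallel computation becomes attractive for larger simulations.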

Grades

Grades will be based on the categories listed below with the corresponding weights.

| Component | Percent |
|---|---|
| Class Participation | 10% |
| Problem Sets | 35% |
| Exam & Quizzes | 20% |
| Final project | 35% |
| Total points | 100% |

Final grades will be assigned according to the standard cutoffs (90%+ for an A, 80%-89.99% for a B, etc.).
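
To make the weighting concrete, here is a minimal sketch of how a final percentage could be computed from the component weights above. The component scores are made up for illustration, the names `WEIGHTS`, `final_percentage`, and `letter_grade` are hypothetical, and the C and D cutoffs are an assumption based on the "etc." in the sentence above; this is not the instructor's actual gradebook.

```python
# Component weights from the Grades section (they sum to 1.0).
WEIGHTS = {
    "participation": 0.10,
    "problem_sets": 0.35,
    "exam_quizzes": 0.20,
    "final_project": 0.35,
}


def final_percentage(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(pct):
    """Map a percentage to a letter grade.

    The A and B cutoffs come from the syllabus; the C and D cutoffs
    are assumed to continue the same 10-point pattern.
    """
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if pct >= cutoff:
            return letter
    return "F"


# Hypothetical component scores for one student.
example_scores = {
    "participation": 100,
    "problem_sets": 85,
    "exam_quizzes": 78,
    "final_project": 92,
}

if __name__ == "__main__":
    pct = final_percentage(example_scores)
    print(round(pct, 2), letter_grade(pct))
```

With these made-up scores the weighted total is 10 + 29.75 + 15.6 + 32.2 = 87.55, which falls in the B range.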

  • Participation:

    • An important part of learning is face-to-face interaction. Thus, some of your grade will depend on attendance and active participation in class meetings.
  • Problem sets: will be assigned approximately weekly throughout the semester.

    • You must write and submit your own computer code, although I encourage you to collaborate with your fellow students. I DO NOT want to see a bunch of copies of identical code. I DO want to see each of you learning how to code these problems so that you could do it on your own.
    • Problem set solutions, both written and code portions, will be turned in via a pull request from your private GitHub.com repository, which is a fork of the class master repository on my account. (You will need to set up a GitHub account if you do not already have one.)
    • Written solutions must be submitted as PDF documents or Jupyter Notebooks.
    • Problem sets will be due on the day listed in the Daily Course Schedule section of this syllabus (see below) unless otherwise specified. Late problem sets will not receive any credit. Partially completed problem sets will receive partial credit.
  • Exam & Quizzes:

    • We may periodically have in-class quizzes as low-stakes ways to get feedback
    • There will be a written final exam, but no midterm
  • Final Project:

    • Collect data on and analyze a research question of your choosing, using methods taught in this course
    • Write up a ~10 page (12pt font, double spaced, excluding References, Figures, and Tables) summary of your findings, including discussion about what prior studies of the same topic have found, as well as citations to prior studies
    • Turn in the written summary report and a GitHub repository containing all materials required to reproduce the results
    • Summary report should be written in LaTeX or RMarkdown and turned in as a PDF (source code for the summary report should also be included in your GitHub repository)
    • An example of what the final product should look like is here, with LaTeX source code here and BibTeX source code here.
    • A detailed rubric for the final project is here

Communication

  • I will always be available via email, and in person during office hours.
  • Additionally, I have set up a Gitter community (see the badge at the top of this document) where I am hoping you can chat with each other about programming or other questions you have regarding the course. I will also be a participant in that community.

Daily Course Schedule

(Will be continuously updated throughout the semester)

| Date | Day | Topic | Due |
|---|---|---|---|
| Jan 15 | T | What is data science / big data / why is it important? (Slides) | |
| Jan 17 | Th | Git, GitHub, computing environment, and coding best practices (Notes) | Read Gentzkow & Shapiro's handbook; Ch. 1 of The Master Algorithm; register for GitHub account |
| Jan 22 | T | Linux command line, SSH, accessing OSCER (Notes) | PS 1 |
| Jan 24 | Th | Overview of Data Scientists' tools (Notes) | |
| Jan 29 | T | Using data: data types, storage (Notes) | PS 2 |
| Jan 31 | Th | Big Data: SQL (Notes) & RDDs (link); running jobs on the OSCER cluster | |
| Feb 5 | T | Sampling & storing Big Data (Notes) | PS 3 |
| Feb 7 | Th | Web scraping/APIs to gather data (Notes) | |
| Feb 12 | T | Web scraping/APIs to gather data (Notes) | PS 4 |
| Feb 14 | Th | (Maybe) No class: career fair; otherwise Intro to Julia (Julia notes) | |
| Feb 19 | T | "Snow" day (class canceled) | |
| Feb 21 | Th | Getting to know your data: descriptive statistics, cleaning, tips, tricks, transformations, visualization (Notes; HTML slides) | PS 5 |
| Feb 26 | T | Modeling continuous and discrete variables (Notes; HTML slides; Simple R script) | |
| Feb 28 | Th | Linear Algebra Introduction / Review (Handout) | |
| Mar 5 | T | Introduction to optimization (Notes) | PS 6 |
| Mar 7 | Th | Writing and optimizing functions in R, Python, and Julia (Notes) | |
| Mar 12 | T | Writing and optimizing functions in R, Python, and Julia (Notes) | PS 7 |
| Mar 14 | Th | Debugging strategies and simulations (Notes) | |
| Mar 19 | T | No class (Spring break) | |
| Mar 21 | Th | No class (Spring break) | |
| Mar 26 | T | Intro to Machine Learning (Notes) | PS 8 |
| Mar 28 | Th | Supervised ML: Regularization, measuring model fit, tuning with cross-validation, the elastic net model (Notes) | |
| Apr 2 | T | Supervised ML: The 5 Tribes of Machine Learning (Notes) | PS 9 |
| Apr 4 | Th | Unsupervised ML: Clustering (Notes) | |
| Apr 9 | T | Unsupervised ML: Dimensionality reduction and reinforcement learning (Notes) | PS 10 |
| Apr 11 | Th | Machine learning vs. econometrics (Notes) | |
| Apr 16 | T | Structural modeling: static discrete choice (Slides) | PS 11 |
| Apr 18 | Th | Structural modeling: dynamic discrete choice (Slides) | |
| Apr 25 | Th | Final Project presentations (Rubric) | |
| Apr 30 | T | Final Project presentations (Rubric) | |
| May 2 | Th | Final Project presentations (Rubric) | PS 12 (optional) |
| May 9 | Th | Final Exam (in class, 1:30-3:30pm) | Final project due (Scoresheet) |

Helpful Links

Books

University Policies

Religious Observance

It is the policy of the University to excuse the absences of students that result from religious observances and to reschedule examinations and additional required classwork that may fall on religious holidays, without penalty.

Reasonable Accommodation Policy

If a student requires an accommodation based on disability, the student should meet with me in my office during the first week of the semester. Student responsibility primarily rests with informing faculty at the beginning of the semester and in providing authorized documentation through designated administrative channels. The Disability Resource Center is located in the University Community Center at 730 College Avenue (405-325-3852).

Academic Integrity:

I do not tolerate academic misconduct, and neither does the University of Oklahoma. I will not hesitate to fail students who do not fully comply with the University's academic misconduct policy. If you find yourself contemplating cheating, plagiarism, or other forms of academic misconduct, please come see me first. Help is available if you are struggling. I want everyone in the class to try their best and to do their own work. Please be advised that I reserve the right to utilize anti-plagiarism resources such as TurnItIn when grading assignments.

Title IX Resources and Reporting Requirement

For any concerns regarding gender-based discrimination, sexual harassment, sexual assault, dating/domestic violence, or stalking, the University offers a variety of resources. To learn more or to report an incident, please contact the Sexual Misconduct Office at (405) 325-2215 (8 to 5, M-F) or [email protected]. Incidents can also be reported confidentially to OU Advocates at (405) 615-0013 (phones are answered 24 hours a day, 7 days a week). Also, please be advised that a professor/GA/TA is required to report instances of sexual harassment, sexual assault, or discrimination to the Sexual Misconduct Office. Inquiries regarding non-discrimination policies may be directed to: Bobby J. Mason, University Equal Opportunity Officer and Title IX Coordinator at (405) 325-3546 or [email protected]. For more information, visit http://www.ou.edu/eoo.html.

Adjustments for Pregnancy/Childbirth Related Issues

Should you need modifications or adjustments to your course requirements because of documented pregnancy-related or childbirth-related issues, please contact your professor or the Disability Resource Center at (405) 325-3852 as soon as possible. Also, see http://www.ou.edu/eoo/faqs/pregnancy-faqs.html for answers to commonly asked questions.

Reasonable Accommodations for Students with Disabilities

If a student requires an accommodation based on disability, the student should meet with me in my office during the first week of the semester. Student responsibility primarily rests with informing faculty at the beginning of the semester and in providing authorized documentation through designated administrative channels. The Disability Resource Center is located in Goddard Hall (405-325-3852).
