akiratwang / Unimelb Data Science

Licence: gpl-3.0
All my Lecture Notes, Assignments and Past Exam material.


Welcome!

Hi, this repository was created to give a taste of what the UniMelb undergraduate Data Science major consists of.
The handbook for the Data Science major can be viewed here.


Data Science = big brain excel?


Akira's First Law of Code

Let E = errors, M = more, C = code. Then the equation E = MC^2 holds true for all E > 0 as E must be positive definite in any code written.

For subject content, some of the following are provided:

  • Lecture Notes
  • Past Exams / MSTs (Crowd-Sourced cohort solutions are also provided for those pesky subjects without exam solutions)
  • Textbooks (for a couple subjects)
  • LaTeX notes for third year subjects
  • My old tutoring material

The content here is meant to be a reference for what you can expect from the subject. If you decide to "borrow" ideas from here (and all other public sources), you will be picked up for plagiarism. Aside from being against the University's policies on academic honesty, it's also outright copyright infringement since the GPL mandates that a full copy of the license and copyright information be included in copies of the work.

If you're from COMP20003 (or any subject that uses the C language), I have written this guide for setting up your Windows 10 device for both SSH-ing and WSL (Windows Subsystem for Linux). As for those interested in installing an Anaconda distribution for Jupyter Notebooks, refer to this old guide from 2017.

For a (very very) neat timetable planner for university, visit my mate Rohyl's lookahead, which has the cleanest interface ever and has saved me a lot of time sorting out my timetable.

About Me

  • Data Engineer, Consultant at DXC Technology
  • Graduated 2019 with a major in Data Science (UniMelb)
  • Academic Teaching Staff for Foundations of Computing (COMP10001) and Algorithms & Data Structures (COMP20003)
  • Experienced as a Cloud Architect (AWS, Azure and GCP), Data Engineer and as a Data Analyst.
  • I also love developing assorted Python scripts to automate the boring stuff

Lastly, if you liked this guide / repository, please leave a star so I know it's worth keeping updated and online! This is the kind of material I wish I had to explore other subjects and get a better understanding of the content.

The "Data Science" Major Subjects

Capstone Project (akin to IT project for CompSci students):

  • Applied Data Science (MAST30034)

Core Computing & Information Systems (CIS) Subjects:

  • Foundations of Computing (COMP10001)
  • Foundations of Algorithms (COMP10002)
  • Database Systems (INFO20003)
  • Elements of Data Processing (COMP20008)
  • Machine Learning (COMP30027)

CIS Electives:

  • Algorithms & Data Structures (COMP20003)
  • Artificial Intelligence (COMP30024)
  • Information Security and Privacy (INFO30006)

Core Math Subjects:

There's a lot more maths than you think (the major shares the same core maths as Actuarial Science up to second year), and so I highly recommend you take either AM1 / AM2 or Real Analysis, else Probability will be a hard leap.

  • Calculus 2 (MAST10006)
  • Linear Algebra (MAST10007)
  • Probability (MAST20004)
  • Statistics (MAST20005)
  • Linear Statistical Models (MAST30025)
  • Modern Applied Statistics (MAST30027)

Science Electives:

  • Physics 1 (PHYC10003)
  • Fundamentals of Chemistry (CHEM10007)
  • Engineering Systems Design 2 (ENGR10003)
  • Science and Internship Program (SCIE30002)

Breadths:

  • Japanese 3 (JAPN10007)
  • Music in the Culture of the Renaissance (MUSI30011)
  • High Baroque Music of the German World (MUSI30014)
  • Music & Health (MUSI20150)
  • Positive Leadership and Careers (EDUC30072)

Subject Reviews (Pooled from a variety of people and my opinion)

Foundations of Computing:

  • Subject is run well both semesters and is a great introduction to Python and Computer Science.
  • Not crammable. If you want to do well, you have to consistently grind and practice it (like maths).
  • Consider doing it if you're interested in programming (and it's a better alternative to ENG COMP).

Foundations of Algorithms:

  • Introduction to basic sorting algorithms and the C programming language.
  • Makes you appreciate memory management in Python because of bloody malloc (though calloc is better imo).
  • Tutors are amazing and can actually teach. (Shout out to my tutor Alex Zable if he's still teaching).
  • Learn to use valgrind if you don't want Segmentation Faults, and gdb to avoid debugging nightmares.

Elements of Data Processing:

  • It was really poorly run when we took the subject, but apparently it's better now.
  • Teaches concepts of ETL, data processing and data cleaning via Python (Jupyter Notebooks)
  • Good entry-level material is taught, and you'll find that a lot of the fun stuff (such as ML and research) needs a good ETL pipeline set up in order to be efficient and working
  • I personally think this subject is worth taking, but real-world data is much worse and there's no one telling you what to do. If you want to get better at this kind of stuff, suss out Kaggle datasets and perform your own data cleaning and analysis
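
To give a feel for the kind of cleaning this involves, here is a minimal sketch using only Python's standard library. The data, the column names and the `clean_rows` helper are all made up for illustration; they are not taken from the subject's material.

```python
import csv
import io

# Hypothetical raw export: inconsistent casing, stray whitespace,
# a missing value and a duplicate row.
RAW_CSV = """name,suburb,age
 alice ,Carlton,23
BOB,parkville,
 alice ,Carlton,23
Carol, Brunswick ,31
"""


def clean_rows(raw_text):
    """Extract rows from CSV text, normalise them, and drop duplicates."""
    reader = csv.DictReader(io.StringIO(raw_text))
    seen = set()
    cleaned = []
    for row in reader:
        name = row["name"].strip().title()
        suburb = row["suburb"].strip().title()
        age_text = (row["age"] or "").strip()
        age = int(age_text) if age_text else None  # keep missing ages as None
        record = (name, suburb, age)
        if record in seen:  # skip exact duplicates after normalisation
            continue
        seen.add(record)
        cleaned.append({"name": name, "suburb": suburb, "age": age})
    return cleaned


if __name__ == "__main__":
    for record in clean_rows(RAW_CSV):
        print(record)
```

Real datasets will need far more than this (type checks, outliers, encoding issues), but the extract, normalise, de-duplicate shape stays roughly the same.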

Database Systems:

  • I just want to say that this is one of the most useful subjects I have taken in undergrad. I'm using a decent chunk of SQL (Microsoft SQL Server or the ipython-sql library) for my work, and the material I learned from this subject has come in handy
  • Reneta / David are the lecturers, and both are very clear and passionate about teaching.
  • Reneta is actually good once you get used to her accent trust me.
  • The theory content is very useful, and the concepts taught are very applicable in real life jobs.
  • 1st assignment is a bit iffy since it's a conceptual diagram of an ER-Model
  • The intuition will help you for ML, AI, and any Data Science or Data Analytics position.
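
As a rough idea of the SQL you end up writing, the sketch below builds two small normalised tables and joins them. It uses Python's built-in `sqlite3` rather than Microsoft SQL Server purely so it runs anywhere; the table layout, student IDs and marks are invented for the example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two small, normalised tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE subject (
            code TEXT PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE enrolment (
            student TEXT NOT NULL,
            code    TEXT NOT NULL REFERENCES subject(code),
            mark    INTEGER NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO subject VALUES (?, ?)",
        [("INFO20003", "Database Systems"), ("COMP30027", "Machine Learning")],
    )
    # Hypothetical students and marks, for illustration only.
    conn.executemany(
        "INSERT INTO enrolment VALUES (?, ?, ?)",
        [
            ("s1", "INFO20003", 80),
            ("s2", "INFO20003", 70),
            ("s1", "COMP30027", 65),
        ],
    )
    return conn


def average_marks(conn):
    """Join the two tables and return (subject name, average mark) rows."""
    query = """
        SELECT s.name, AVG(e.mark)
        FROM enrolment AS e
        JOIN subject AS s ON s.code = e.code
        GROUP BY s.name
        ORDER BY s.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(average_marks(build_demo_db()))
```

Keeping subject names in their own table, instead of repeating them on every enrolment row, is the sort of normalisation the subject drills into you.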

Algorithms & Data Structures:

  • By far the best 2nd year CIS subject ever (better alternative to Design of Algorithms)
  • Goes through all the great algorithms, including path-finding algorithms (unlike DoA which covers hashing and compression instead)
    • For example, the second assignment is usually on path finding and very basic artificial intelligence implementations to solve a 15 puzzle or to even play pacman!
  • Assignments are great fun, and after FoA you should (hopefully) be experienced enough in C to appreciate it.
  • If you're rusty on C don't worry, the first few lectures are revision (we re-cover malloc as well for the Eng Comp kids)
  • The 2018 Exam question about electrical outages landed me a Graduate offer at Essential Energy (ayyyy)
  • I highly recommend this subject over Design of Algorithms if you prefer applications of algorithms over the theory!
  • Students who never had to experience dimefox / nutmeg gonna hate and not appreciate. JupyterHub is so good compared to dimefox and nutmeg servers.
  • I want to add on by saying you guys are super lucky, JupyterHub has only recently become a more commercially used way of showing visualizations and running code on the cloud - and you people have first hand experience of it!
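
For a taste of the path-finding style of assignment mentioned above, here is a minimal sketch of A* search on a sliding-tile puzzle with a Manhattan-distance heuristic. The subject itself is taught in C and the actual assignment spec may differ; the function names and the Python version here are my own illustration.

```python
import heapq


def manhattan(state, width):
    """Sum of tile distances from their goal cells (the blank, 0, is ignored)."""
    total = 0
    for index, tile in enumerate(state):
        if tile == 0:
            continue
        goal = tile - 1
        total += abs(index // width - goal // width) + abs(index % width - goal % width)
    return total


def neighbours(state, width):
    """Yield every state reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, width)
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        new_row, new_col = row + d_row, col + d_col
        if 0 <= new_row < width and 0 <= new_col < width:
            swap = new_row * width + new_col
            tiles = list(state)
            tiles[blank], tiles[swap] = tiles[swap], tiles[blank]
            yield tuple(tiles)


def solve(start, width):
    """Return the minimum number of moves to the goal, or None if unreachable.

    Exhausting the search for an unsolvable board is only practical on
    small boards such as 3x3.
    """
    goal = tuple(list(range(1, width * width)) + [0])
    frontier = [(manhattan(start, width), 0, start)]
    best_cost = {start: 0}
    while frontier:
        _, cost, state = heapq.heappop(frontier)
        if state == goal:
            return cost
        if cost > best_cost[state]:
            continue  # stale queue entry, a cheaper route was already found
        for nxt in neighbours(state, width):
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(frontier, (new_cost + manhattan(nxt, width), new_cost, nxt))
    return None
```

For example, `solve((1, 2, 3, 4, 5, 6, 0, 7, 8), 3)` returns `2`, since sliding the 7 and then the 8 to the left finishes the board. The same code handles a 15 puzzle by passing `width=4`, though hard instances want a stronger heuristic or an iterative-deepening search.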

Artificial Intelligence:

  • First third of the lectures are a review of basic search algorithms (ADS students will find it a breeze).
  • Assignments ARE AMAZINGLY FUN.
  • Tutors and lecturers are ACTUALLY GOOD.
  • Tutorial questions are hard and conceptual (and there are no full solutions), but they are quite useful for expanding your problem solving.
  • Notation for Probability (YES PROB IS IN THE SUBJECT) uses logical AND/OR/NOT, so you have been warned.
  • If you loved ADS or DoA, you'll love this even more (and it's beneficial to both ML and Applied Data Science)
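
If the logical notation for probability is new to you, the standard identities look like this when written with AND/OR/NOT. This is a sketch of the usual conventions; the exact symbols used in the subject's slides are an assumption on my part.

```latex
\begin{align*}
P(\lnot A) &= 1 - P(A) \\
P(A \lor B) &= P(A) + P(B) - P(A \land B) \\
P(A \land B) &= P(A \mid B)\, P(B)
\end{align*}
```

Here $A \land B$ plays the role of the intersection $A \cap B$ from Probability, $A \lor B$ the union $A \cup B$, and $\lnot A$ the complement of $A$.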

Machine Learning:

  • Subject has no maths pre-req, but they did attempt to cover "Linear Statistical Models" in 1 lecture.
  • The usual lecturer is great, given that the maths is not a pre-req (at least tries to make it interesting). However, he was sick for a vast majority of the time we took the subject so my experience may be a bit more biased to the worse side.
  • They attempted to cover a lot of content, which felt a bit too ambitious given that the maths was not a pre-req.
  • First assignment is a joke if you have done maths, but the second is a lot more interesting.
  • Quoting my tutor: "this subject is a money grabber because no one would take this subject if Probability was an actual pre-req".
  • Tim Baldwin got us good for Exam Section D
  • UPDATE: The lecturer is Kris Ehinger, the same lecturer as Algorithms & Data Structures! A few students and myself have gone around to get feedback and have discussed some possible improvements for the subject, so rest assured that the subject is better coordinated :)

Applied Data Science (Capstone Project):

  • A very informative subject that makes for a great overall learning experience
  • Project is very fun and enjoyable if you love applying different techniques you've learnt over the previous subjects
    • Specifically, we had to create a game playing agent to act as a Taxi driver using real world data.
  • Group project is very dependent on how well your team works together, so find peers that have a similar work ethic to yours!

Maths Subjects:

  • They're all well run
  • Probability is a big jump (if you didn't do Real Analysis or AM1/AM2) so prepare to grind
  • Have yet to find anything bad about the lecturers / lectures and content
  • Yao-Ban is the best lecturer (you'll have him for LSM)

Final Tips

  • I highly recommend you go learn LaTeX, which can be easily done through www.overleaf.com. Not only does it make your assignments look amazing, it's a great way of making great notes and reports (which employers love)
  • DO try your best, but WAM isn't everything (I have a mediocre WAM in the 75s) and anything above 65 will land you a job provided you network and apply yourself well.
  • Try to do projects outside of university. A lot of job offers were as a result of several projects I completed outside of University.
  • Be active in your tutorial and labs! It helps the tutor a lot and will help you learn better
  • My plea: Honestly, people who think sharepoints with excel spreadsheets are "databases" should just end themselves. Become a good Database admin - document and normalize your work, make it easier for everyone else please.

So why Data Science?

It's actually just statistics rebranded with computer science. You're essentially combining the brains of a statistician with the brawn of computing power. If you love working with data (ethical or non-ethical) or are keen to analyze things, Data Science is definitely a great field to go into.

Work and Job Pathway as a Graduate

This is purely taken from my own personal experience.
You can be accepted for a wide variety of roles including (but not limited to):

  • Junior Data Scientist
  • Consultant
  • Technical Consultant
  • Data Analyst
  • Business Analyst
  • Data Engineer
  • Cloud Architect
  • Machine Learning Engineer/Expert
  • Data Developer

For companies, the places that I applied to (in the first half of the year) and received offers from include:

  • DXC ANZ
  • Essential Energy
  • Big 4 (KPMG, Deloitte)
  • Department of Defence
  • Department of Industry, Science, Energy & Resources
  • Accenture ANZ
  • Coles
  • ATO

There are plenty of jobs around and networking opportunities to be made - I highly suggest you join the Melbourne Data Science MeetUp group to meet like-minded people: https://duckduckgo.com/?q=meetup+data+science+melbourne&t=braveed&ia=web

License

All of the source code in this project is licensed under the GNU General Public License v3 (or later).

Copyright (C) 2019 Akira Wang

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.