akiratwang / Unimelb Data Science
Welcome!
Hi, this repository was created to give a taste of what the UniMelb undergraduate Data Science major consists of.
The handbook for the Data Science major can be viewed here.
Data Science = big brain excel?
Akira's First Law of Code
Let E = errors, M = more, C = code. Then the equation E = MC^2 holds true for all E > 0, as E must be positive definite in any code written.
For subject content, some of the following are provided:
- Lecture Notes
- Past Exams / MSTs (Crowd-Sourced cohort solutions are also provided for those pesky subjects without exam solutions)
- Textbooks (for a couple subjects)
- LaTeX notes for third year subjects
- My old tutoring material
The content here is meant to be a reference for what you can expect from the subject. If you decide to "borrow" ideas from here (and all other public sources), you will be picked up for plagiarism. Aside from being against the University's policies on academic honesty, it's also outright copyright infringement since the GPL mandates that a full copy of the license and copyright information be included in copies of the work.
If you're from COMP20003 (or any subject that uses the C language), I have written this guide for setting up your Windows 10 device for both SSH-ing and WSL (Windows Subsystem for Linux). As for those interested in installing an Anaconda distribution for Jupyter Notebooks, refer to this old guide from 2017.
For a (very very) neat timetable planner for university, visit my mate Rohyl's lookahead, which has the cleanest interface ever and has saved me a lot of time sorting out my timetable.
About Me
- Data Engineer, Consultant at DXC Technology
- Graduated 2019 with a major in Data Science (UniMelb)
- Academic Teaching Staff for Foundations of Computing (COMP10001) and Algorithms & Data Structures (COMP20003)
- Experienced as a Cloud Architect (AWS, Azure and GCP), Data Engineer and as a Data Analyst.
- I also love developing assorted Python scripts to automate the boring stuff
Lastly, if you liked this guide / repository, please leave a star so I know it's worth keeping updated and online! This is the kind of material I wish I had to explore other subjects and get a better understanding of the content.
The "Data Science" Major Subjects
Capstone Project (akin to IT project for CompSci students):
- Applied Data Science (MAST30034)
Core Computing & Information Systems (CIS) Subjects:
- Foundations of Computing (COMP10001)
- Foundations of Algorithms (COMP10002)
- Database Systems (INFO20003)
- Elements of Data Processing (COMP20008)
- Machine Learning (COMP30027)
CIS Electives:
- Algorithms & Data Structures (COMP20003)
- Artificial Intelligence (COMP30024)
- Information Security and Privacy (INFO30006)
Core Math Subjects:
There's a lot more maths than you think (the major shares the same core maths as Actuarial Science up to second year), and so I highly recommend you take either AM1 / AM2 or Real Analysis, else Probability will be a hard leap.
- Calculus 2 (MAST10006)
- Linear Algebra (MAST10007)
- Probability (MAST20004)
- Statistics (MAST20005)
- Linear Statistical Models (MAST30025)
- Modern Applied Statistics (MAST30027)
Science Electives:
- Physics 1 (PHYC10003)
- Fundamentals of Chemistry (CHEM10007)
- Engineering Systems Design 2 (ENGR10003)
- Science and Internship Program (SCIE30002)
Breadths:
- Japanese 3 (JAPN10007)
- Music in the Culture of the Renaissance (MUSI30011)
- High Baroque Music of the German World (MUSI30014)
- Music and Health (MUSI20150)
- Positive Leadership and Careers (EDUC30072)
Subject Reviews (Pooled from a variety of people and my opinion)
Foundations of Computing:
- Subject is run well both semesters and is a great introduction to Python and Computer Science.
- Not crammable. If you want to do well, you have to consistently grind and practice it (like maths).
- Consider doing it if you're interested in programming (and it's a better alternative to ENG COMP).
Foundations of Algorithms:
- Introduction to basic sorting algorithms and the C programming language.
- Makes you appreciate memory management in Python because of bloody `malloc` (though `calloc` is better imo).
- Tutors are amazing and can actually teach. (Shout out to my tutor Alex Zable if he's still teaching).
- Learn to use `valgrind` if you don't want Segmentation Faults, and `gdb` to avoid debugging nightmares.
Elements of Data Processing:
- It was really poorly run when we took the subject, but apparently it's better now.
- Teaches concepts of ETL, data processing and data cleaning via Python (Jupyter Notebooks)
- Good entry-level material, and you'll find that a lot of the fun stuff (such as ML and research) needs a good ETL pipeline set up before it can work efficiently.
- I personally think this subject is worth taking, but real-world data is much worse and there's no one telling you what to do. If you want to get better at this kind of stuff, suss out Kaggle datasets and perform your own data cleaning and analysis.
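As a rough illustration of the kind of data cleaning mentioned above, here is a minimal sketch using only the Python standard library. The `RAW` data, the column names and the `clean` helper are all made up for this example; they are not taken from the subject's material.

```python
import csv
import io

# Hypothetical messy export: stray whitespace, inconsistent case,
# a missing score, and a duplicated row.
RAW = """name,suburb,score
 Alice ,carlton,81
Bob,PARKVILLE,
alice,Carlton,81
Chen,Brunswick,67
"""


def clean(raw_text):
    """Return de-duplicated rows with normalised text and integer scores."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        name = row["name"].strip().title()
        suburb = row["suburb"].strip().title()
        score_text = row["score"].strip()
        if not score_text:
            continue  # drop rows with a missing score
        key = (name, suburb)
        if key in seen:
            continue  # drop rows that repeat an earlier (name, suburb) pair
        seen.add(key)
        cleaned.append({"name": name, "suburb": suburb, "score": int(score_text)})
    return cleaned


rows = clean(RAW)
for row in rows:
    print(row)
```

Dropping incomplete rows is only one possible policy; depending on the analysis, imputing or flagging missing values may be the better choice.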
Database Systems:
- I just want to say that this is one of the most useful subjects I have taken in undergrad. I'm using a decent chunk of SQL (Microsoft SQL Server or IPython-sql library) for my work and the material I learned from this subject has come in handy
- Reneta / David are the lecturers, and both are very clear and passionate about teaching.
- Reneta is actually good once you get used to her accent trust me.
- The theory content is very useful, and the concepts taught are very applicable in real life jobs.
- 1st assignment is a bit iffy since it's a conceptual diagram of an ER-Model
- The intuition will help you for ML, AI, and any Data Science or Data Analytics position.
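To give a feel for the SQL side of this, here is a small sketch using Python's built-in `sqlite3` module. The two-table schema, the student IDs and the marks are invented for illustration and are not the subject's actual assignment schema.

```python
import sqlite3

# In-memory database so the example leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (
    code TEXT PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE enrolment (
    student TEXT NOT NULL,
    code    TEXT NOT NULL REFERENCES subject(code),
    mark    INTEGER,
    PRIMARY KEY (student, code)
);
INSERT INTO subject VALUES
    ('INFO20003', 'Database Systems'),
    ('COMP20008', 'Elements of Data Processing');
-- Hypothetical students and marks.
INSERT INTO enrolment VALUES
    ('s1', 'INFO20003', 80),
    ('s2', 'INFO20003', 70),
    ('s1', 'COMP20008', 75);
""")

# Join the two tables and aggregate: average mark per subject.
averages = conn.execute("""
    SELECT s.name, AVG(e.mark) AS avg_mark
    FROM enrolment AS e
    JOIN subject AS s ON s.code = e.code
    GROUP BY s.code, s.name
    ORDER BY s.code
""").fetchall()
conn.close()

for name, avg_mark in averages:
    print(name, avg_mark)
```

Keeping subject names in their own table, rather than repeating them on every enrolment row, is the sort of normalisation the theory content covers.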
Algorithms & Data Structures:
- By far the best 2nd year CIS subject ever (better alternative to Design of Algorithms)
- Goes through all the great algorithms, including path-finding algorithms (unlike DoA which covers hashing and compression instead)
- For example, the second assignment is usually on path finding and very basic artificial intelligence implementations to solve a `15 puzzle` or to even play `pacman`!
- Assignments are great fun, and after FoA you should (hopefully) be experienced enough in C to appreciate it.
- If you're rusty on C don't worry, the first few lectures are revision (we cover `malloc` again as well for eng comp kids 8)
- The 2018 Exam question about electrical outages landed me a Graduate offer at Essential Energy (ayyyy)
- I highly recommend this subject over Design of Algorithms if you prefer applications of algorithms over the theory!
- Students who never had to experience dimefox / nutmeg gonna hate and not appreciate. JupyterHub is so good compared to dimefox and nutmeg servers.
- I want to add on by saying you guys are super lucky, JupyterHub has only recently become a more commercially used way of showing visualizations and running code on the cloud - and you people have first hand experience of it!
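For a taste of the path-finding style of problem mentioned above, here is a minimal breadth-first search sketch. The subject itself is taught in C and the assignment uses a 15 puzzle; this sketch uses Python and the smaller 3x3 version (the 8 puzzle) as a stand-in, and the function names are my own.

```python
from collections import deque

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 marks the blank tile
SIZE = 3


def neighbours(state):
    """Yield every state reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, SIZE)
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        new_row, new_col = row + d_row, col + d_col
        if 0 <= new_row < SIZE and 0 <= new_col < SIZE:
            target = new_row * SIZE + new_col
            tiles = list(state)
            tiles[blank], tiles[target] = tiles[target], tiles[blank]
            yield tuple(tiles)


def shortest_solution_length(start, goal=GOAL):
    """Return the fewest moves from start to goal, or None if unreachable."""
    frontier = deque([(start, 0)])
    visited = {start}
    while frontier:
        state, depth = frontier.popleft()
        if state == goal:
            return depth
        for nxt in neighbours(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, depth + 1))
    return None


print(shortest_solution_length((1, 2, 3, 4, 5, 6, 0, 7, 8)))
```

Plain BFS is fine for the 3x3 board, but the state space of a 15 puzzle is far too large for it, which is where smarter searches with heuristics come in.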
Artificial Intelligence:
- First third of the lectures are review of basic search algorithms (ADS students will find it a breeze).
- Assignments ARE AMAZINGLY FUN.
- Tutors and lectures are ACTUALLY GOOD.
- Tutorial questions are hard and conceptual (and there are no full solutions), but they are quite useful for expanding your problem solving.
- Notation for Probability (YES PROB IS IN THE SUBJECT) uses logical AND/OR/NOT, so you have been warned.
- If you loved ADS or DoA, you'll love this even more (and it's beneficial to both ML and Applied Data Science)
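If the logical notation for probability is unfamiliar, the sketch below may help: it treats events as true/false predicates and combines them with `and`, `or` and `not`. The two-dice sample space and the events A and B are my own toy example, not material from the subject.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair six-sided dice, every outcome equally likely.
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event), where event is a predicate over a single outcome."""
    favourable = sum(1 for outcome in OUTCOMES if event(outcome))
    return Fraction(favourable, len(OUTCOMES))


def a(outcome):
    """A: the first die shows a six."""
    return outcome[0] == 6


def b(outcome):
    """B: the two dice sum to at least ten."""
    return sum(outcome) >= 10


p_a_and_b = prob(lambda o: a(o) and b(o))  # P(A AND B)
p_a_or_b = prob(lambda o: a(o) or b(o))    # P(A OR B)
p_not_a = prob(lambda o: not a(o))         # P(NOT A)

# Inclusion-exclusion and the complement rule, checked by enumeration.
assert p_a_or_b == prob(a) + prob(b) - p_a_and_b
assert p_not_a == 1 - prob(a)
print(p_a_and_b, p_a_or_b, p_not_a)
```

Using `Fraction` keeps the arithmetic exact, so the identities can be checked with `==` rather than a floating-point tolerance.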
Machine Learning:
- Subject has no maths pre-req, but they did attempt to cover "Linear Statistical Models" in 1 lecture.
- The usual lecturer is great, given that the maths is not a pre-req (at least tries to make it interesting). However, he was sick for a vast majority of the time we took the subject so my experience may be a bit more biased to the worse side.
- The subject attempted to cover a lot of content, which felt a bit too ambitious given that the maths was not a pre-req.
- First assignment is a joke if you have done maths, but the second is a lot more interesting.
- Quoting my tutor: "this subject is a money grabber because no one would take this subject if Probability was an actual pre-req".
- Tim Baldwin got us good for Exam Section D
- UPDATE: The lecturer is Kris Ehinger, the same lecturer as Algorithms & Data Structures! A few students and myself have gone around to get feedback and have discussed some possible improvements for the subject, so rest assured that the subject is better coordinated :)
Applied Data Science (Capstone Project):
- A very informative subject to make an overall great learning experience
- Project is very fun and enjoyable if you love applying different techniques you've learnt over the previous subjects
- Specifically, we had to create a game playing agent to act as a Taxi driver using real world data.
- Group project is very dependent on how well your team works together, so find peers that have a similar work ethic to you!
Maths Subjects:
- They're all well run
- Probability is a big jump (if you didn't do Real Analysis or AM1/AM2) so prepare to grind
- Have yet to find anything bad about the lecturers / lectures and content
- Yao-Ban is the best lecturer (you'll have him for LSM)
Final Tips
- I highly recommend you go learn LaTeX, which can be easily done through www.overleaf.com. Not only does it make your assignments look amazing, it's a great way of making great notes and reports (which employers love)
- DO try your best, but WAM isn't everything (I have a mediocre WAM in the 75s) and anything above 65 will land you a job provided you network and apply yourself well.
- Try to do projects outside of university. A lot of my job offers came from projects I completed outside of University.
- Be active in your tutorial and labs! It helps the tutor a lot and will help you learn better
- My plea: Honestly, SharePoint sites full of Excel spreadsheets are not "databases". Become a good Database admin - document and normalize your work, make it easier for everyone else please.
So why Data Science?
It's actually just statistics rebranded with computer science. You're essentially combining the brains of a statistician with the brawn of computing power. If you love working with data (ethical or non-ethical) or are keen to analyze things, Data Science is definitely a great field to go into.
Work and Job Pathway as a Graduate
This is purely taken from my own personal experience
You can be accepted for a wide variety of roles including (but not limited to):
- Junior Data Scientist
- Consultant
- Technical Consultant
- Data Analyst
- Business Analyst
- Data Engineer
- Cloud Architect
- Machine Learning Engineer/Expert
- Data Developer
For companies, the places that I applied (early half of the year) and was offered include:
- DXC ANZ
- Essential Energy
- Big 4 (KPMG, Deloitte)
- Department of Defence
- Department of Industry, Science, Energy & Resources
- Accenture ANZ
- Coles
- ATO
There are plenty of jobs around and networking opportunities to be made - I highly suggest you join the Melbourne Data Science MeetUp group to meet like-minded people: https://duckduckgo.com/?q=meetup+data+science+melbourne&t=braveed&ia=web
License
All of the source code in this project is licensed under the GNU General Public License v3 (or later).
Copyright (C) 2019 Akira Wang
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.