CSI-SFIT / Data-Science-Resources

Licence: MIT license
A guide to getting started with Data Science and ML.

Data Science Resources

* A guide to getting started with Data Science and ML *
(Deep Learning not included)


MATH


For data analysis, a knowledge of statistics is enough, but for building ML models, calculus, linear algebra and probability also play a huge role.

  1. Math for Data Science
  2. Statistics Revision
  3. Khan Academy Calculus
  4. Gilbert Strang's linear algebra
  5. Blog post for all Math resources required for ML

Reading theoretical books may be more involved than you need if your goal is just to build ML models for your own applications. But for people who'd like to understand deep learning algorithms and the math behind them, this is a short list of resources.

  1. How do I learn mathematics for machine learning?
    This Quora answer gives a detailed 5-month roadmap (which can and should be extended according to your comfort) for learning the math behind machine learning and the math that every engineer should know in general.
  2. Maths for Machine Learning
    This book brings the mathematical foundations of basic machine learning concepts to the fore and collects the information in a single place. This book is intended to be a guidebook to the vast mathematical literature that forms the foundations of modern machine learning.

Data Analysis


Data analysis is a process of inspecting, cleansing, transforming and modeling data with the goal of discovering useful information, informing conclusions and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.

Numpy
A very useful library for math and Scientific Computing

  1. Numpy tutorial
  2. Numpy Videos
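
As a rough illustration of what NumPy offers for math and scientific computing, here is a minimal sketch covering vectorized arithmetic, basic statistics and a least-squares fit. The height and weight values are made up for this example and do not come from any of the linked tutorials.

```python
import numpy as np

# Hypothetical data: heights (cm) and weights (kg) for five people.
heights_cm = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
weights_kg = np.array([50.0, 58.0, 66.0, 74.0, 82.0])

# Vectorized arithmetic: whole arrays are processed without a Python loop.
heights_m = heights_cm / 100.0
bmi = weights_kg / heights_m**2

# Basic descriptive statistics.
mean_height = heights_cm.mean()
std_height = heights_cm.std()  # population standard deviation (ddof=0)

# Linear algebra: least-squares fit of weight = slope * height + intercept.
design = np.column_stack([heights_cm, np.ones_like(heights_cm)])
(slope, intercept), *_ = np.linalg.lstsq(design, weights_kg, rcond=None)

print(f"mean height: {mean_height:.1f} cm, slope: {slope:.2f} kg/cm")
```

The same pattern (build arrays, operate on them as a whole, hand them to a linear algebra routine) carries over to most of the math listed in the MATH section.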

Pandas
Most used Python library for Data Analysis

  1. Pandas Documentation
  2. YouTube Playlist
  3. DataCamp Tutorial
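
To make the inspecting, cleansing and transforming steps described under Data Analysis more concrete, here is a minimal pandas sketch. The table, its column names and its values are hypothetical and exist only for this illustration; a real project would usually load data with something like `pd.read_csv`.

```python
import pandas as pd

# Hypothetical sales records, including a missing value and a duplicate row.
raw = pd.DataFrame(
    {
        "city": ["Mumbai", "Pune", "Mumbai", "Pune", "Pune"],
        "units": [10, 5, None, 5, 8],
        "price": [2.0, 3.0, 2.0, 3.0, 3.0],
    }
)

# Inspect: count missing values per column.
missing_per_column = raw.isna().sum()

# Cleanse: drop exact duplicate rows, then rows with missing units.
clean = raw.drop_duplicates().dropna(subset=["units"])

# Transform: derive a revenue column from units and price.
clean = clean.assign(revenue=clean["units"] * clean["price"])

# Summarize: total revenue per city.
revenue_by_city = clean.groupby("city")["revenue"].sum()

print(missing_per_column)
print(revenue_by_city)
```

Whether dropping rows is the right way to handle missing values depends on the data set; filling them in (for example with `fillna`) is a common alternative.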

Data Visualization

  1. Matplotlib Playlist
  2. Matplotlib Tutorials
  3. Seaborn Tutorials

SQL

  1. MySQL Tutorial
  2. MySQL TutorialsPoint
  3. MySQL YouTube Videos
  4. PostgreSQL
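
To try basic SQL without installing a database server, one option is Python's built-in `sqlite3` module. Note that this sketch uses SQLite rather than the MySQL or PostgreSQL covered by the links above, so dialect details may differ; the `students` table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, course TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [
        ("Asha", "ML", 82),
        ("Ravi", "ML", 74),
        ("Meera", "SQL", 91),
        ("John", "SQL", 69),
    ],
)

# Aggregate query: average score per course, highest average first.
rows = conn.execute(
    "SELECT course, AVG(score) AS avg_score "
    "FROM students GROUP BY course ORDER BY avg_score DESC"
).fetchall()
conn.close()

for course, avg_score in rows:
    print(course, avg_score)
```

The `SELECT ... GROUP BY ... ORDER BY` statement itself is standard SQL and should behave similarly on the other databases.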

Big Data Analytics


Big Data refers to data sets so large that they cannot be stored, processed, or analyzed using traditional tools. Big Data analytics is the process of extracting meaningful insights from such data, such as hidden patterns, unknown correlations, market trends, and customer preferences. It offers several advantages: it can be used for better decision making and for preventing fraudulent activities, among other things.

Tools Used in Big Data Analytics

Here are some popular tools used in Big Data analytics:

  1. Hadoop - a framework for distributed storage and batch processing of large data sets
  2. Spark - used for real-time processing and analysis of large amounts of data
  3. Kafka - a distributed event-streaming platform that also provides fault-tolerant storage of data streams
  4. Cassandra - a distributed NoSQL database used to handle large volumes of data across many nodes

Big Data Courses

  1. Big Data Coursera
  2. Big Data Essentials: HDFS, MapReduce and Spark RDD

ML Courses

Practical (more programming-oriented)

  1. Intro to Machine-Learning Udacity
  2. Kaggle Mini-Courses
  3. Machine Learning A-Z: Hands-On Python & R In Data Science Udemy

Theoretical (more in-depth math concepts)

  1. Machine Learning Andrew Ng (MATLAB)
  2. Stanford CS229: Machine Learning (Autumn 2018)
  3. Machine Learning Crash Course by Google

Books

For absolute beginners

  1. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
  2. Intro to ML with Python
  3. Hands-On ML with Scikit-Learn and TensorFlow

For intermediates

  1. Approaching almost any ML problem (Abhishek Thakur)

Websites

  1. Made with ML by Goku Mohandas
  2. End-to-End ML by Brandon Rohrer
  3. A.I. by Google Researchers
  4. Towards Data Science by Medium

Notes

  1. Data Science Notes by Chris Albon
  2. Andrew Ng's ML Notes
  3. CS229 Stanford Notes

YouTube Channels

  1. Pydata
  2. Siraj Raval
  3. Sentdex
  4. Krish Naik
  5. Corey Schafer

Best Websites to get free datasets

  1. Kaggle
  2. UCI Machine Learning Repository
  3. Stanford Data
  4. Google public datasets
  5. FiveThirtyEight

How to Contribute

  1. Clone the repo and create a new branch: $ git clone https://github.com/CSI-SFIT/Data-Science-Resources followed by $ git checkout -b name_for_new_branch (run inside the cloned directory).
  2. Make changes and test.
  3. Submit a Pull Request with a comprehensive description of the changes.

Acknowledgements

CSI SFIT Tech Team 2020 - 2021 :
