iewaij / Introdatascience

Licence: cc-by-sa-4.0
Notes on Data Science. Study notes on mathematical statistics, machine learning, and data programming.

Projects that are alternatives to or similar to Introdatascience

Stockperformanceclassification
Keras 1D CNN on Azure ML Workbench to classify 4 week stock performance based on text in public earnings statements
Stars: ✭ 126 (-0.79%)
Mutual labels:  jupyter-notebook
Finlib
A streamlined library for getting historical financial price data, fundamental data, and financial ratios.
Stars: ✭ 127 (+0%)
Mutual labels:  jupyter-notebook
Mastering Python For Finance Second Edition
Source code for: Mastering Python for Finance, Second Edition
Stars: ✭ 127 (+0%)
Mutual labels:  jupyter-notebook
Ajax Movie Recommendation System With Sentiment Analysis
Content-Based Recommender System recommends movies similar to the movie user likes and analyses the sentiments on the reviews given by the user for that movie.
Stars: ✭ 127 (+0%)
Mutual labels:  jupyter-notebook
Azure Sentinel Notebooks
Interactive Azure Sentinel Notebooks provides security insights and actions to investigate anomalies and hunt for malicious behaviors.
Stars: ✭ 126 (-0.79%)
Mutual labels:  jupyter-notebook
Udacity Ml Capstone
Udacity 2018 Machine Learning Nanodegree Capstone project
Stars: ✭ 127 (+0%)
Mutual labels:  jupyter-notebook
Deepkoopman
neural networks to learn Koopman eigenfunctions
Stars: ✭ 126 (-0.79%)
Mutual labels:  jupyter-notebook
Pynq Computervision
Computer Vision Overlays on Pynq
Stars: ✭ 127 (+0%)
Mutual labels:  jupyter-notebook
Earthengine Community
Tutorials and content created by Earth Engine users, for Earth Engine users
Stars: ✭ 127 (+0%)
Mutual labels:  jupyter-notebook
Trustscore
To Trust Or Not To Trust A Classifier. A measure of uncertainty for any trained (possibly black-box) classifier which is more effective than the classifier's own implied confidence (e.g. softmax probability for a neural network).
Stars: ✭ 127 (+0%)
Mutual labels:  jupyter-notebook
2019 Scalingattack
Image-Scaling Attacks and Defenses
Stars: ✭ 127 (+0%)
Mutual labels:  jupyter-notebook
Bnlp
BNLP is a natural language processing toolkit for Bengali Language.
Stars: ✭ 127 (+0%)
Mutual labels:  jupyter-notebook
Data Science For Marketing Analytics
Achieve your marketing goals with the data analytics power of Python
Stars: ✭ 127 (+0%)
Mutual labels:  jupyter-notebook
Lisa
Linux Integrated System Analysis
Stars: ✭ 126 (-0.79%)
Mutual labels:  jupyter-notebook
Python notes
Stars: ✭ 127 (+0%)
Mutual labels:  jupyter-notebook
Reptilesomething
Scraping something just for fun~
Stars: ✭ 126 (-0.79%)
Mutual labels:  jupyter-notebook
Jupyter Datatables
Jupyter Notebook extension leveraging pandas DataFrames by integrating DataTables and ChartJS.
Stars: ✭ 127 (+0%)
Mutual labels:  jupyter-notebook
Algorithms Illuminated
My notes for Tim Roughgarden's awesome course on Algorithms and his 4 part books
Stars: ✭ 127 (+0%)
Mutual labels:  jupyter-notebook
Visual Attribution
Pytorch Implementation of recent visual attribution methods for model interpretability
Stars: ✭ 127 (+0%)
Mutual labels:  jupyter-notebook
Aihub
I use this repository for my Youtube channel where I share videos about Artificial Intelligence. The repository includes Machine Learning, Deep Learning, and Reinforcement learning's code.
Stars: ✭ 127 (+0%)
Mutual labels:  jupyter-notebook

Introduction to Data Science

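Last summer I started writing some introductory notes on data science. Looking back at them now, they are painful to read. They are not bad enough to mislead anyone (so I am shamelessly leaving them up), but they fall short of what I now expect, so I have decided to rewrite them.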

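I take a plain, kitchen-style view of human thought: producing knowledge is like cooking. First you need cookware, then ingredients, and with some skill and ingenuity you can turn out a good dish. Data science is no different. When all is said and done, it has three core tools: mathematical statistics, machine learning, and data programming. Bring the tools, the data, and your brain, and you can call yourself a data scientist and go solve problems. Mathematical statistics analyzes data with mathematical logic; machine learning distills key information from data and uses it for prediction; data programming handles the upkeep of data: how to store it, how to read it, how to visualize it, and how to squeeze the most performance out of the computer.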

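Here is how I understand mathematical statistics and machine learning. Mathematical statistics insists on rigorous proof and derivation; it raises a tower from bare ground. Machine learning generalizes methods for finding patterns in data (what SICP calls "abstraction") and then applies those methods over and over to make predictions. Neither is superior. Mathematical statistics is deductive: it explains its conclusions and supports causal inference more convincingly, but its predictions are not very good. Machine learning is inductive: it predicts accurately but cannot clearly say where its predictions come from. It is worth understanding both disciplines, and many researchers are working to combine them.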
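As a rough illustration of this contrast, the sketch below fits one straight line by least squares and then reads it in two ways: the statistics reading asks how uncertain the slope estimate is, while the machine learning reading asks how well the line predicts data it has not seen. The synthetic data, the function name `fit_line`, and the normal-approximation interval are all assumptions made for this example, not something taken from the notes themselves.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns the two estimates plus the standard error of the slope.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    slope_se = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, slope_se


# Synthetic data (an assumption for illustration): y = 2 + 3x + Gaussian noise.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 3.0 * x + random.gauss(0, 1.0) for x in xs]
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

intercept, slope, slope_se = fit_line(train_x, train_y)

# Statistics view: an approximate 95% interval for the slope (normal approximation).
ci_low, ci_high = slope - 1.96 * slope_se, slope + 1.96 * slope_se

# Machine learning view: mean squared error on data the fit never saw.
test_mse = sum((y - (intercept + slope * x)) ** 2
               for x, y in zip(test_x, test_y)) / len(test_x)

print(f"slope = {slope:.3f}, 95% CI ~ ({ci_low:.3f}, {ci_high:.3f})")
print(f"held-out MSE = {test_mse:.3f}")
```

The same fitted line serves both purposes here only because the model is this simple; for more flexible models the two readings tend to pull apart, which is the tension the paragraph above describes.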

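This column is a set of study notes covering mathematical statistics, machine learning, and data programming, with the main focus on the first two. Why notes? Notes are usually more concise than textbooks and easier to read than videos and slides, which makes them handy for looking things up, reviewing, and summarizing. Many people ask how anyone could finish learning everything data science involves. My view is that nobody can master it all, which is exactly why a map is needed: when you run into a problem, you know where it went wrong and where to look for the answer. Will I ever finish writing? Of course not... Making promises that cannot be kept is what procrastinators do best, so submissions and criticism are welcome.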

Because GitHub does not render math formulas, I have put a rough outline of my thinking on mathematical statistics and machine learning in preface.ipynb.

I dislike book lists, which mostly breed anxiety, but given the scope of these notes it is necessary to list the reference books and courses:

Practical Data Science
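This is CMU's introductory data science course for undergraduate and master's students. This year is the first time the lecture videos have been made public, along with detailed notes and assignments.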

Statistics 110: Probability
This is Harvard's mathematical statistics course for undergraduates.

Learning From Data
This is Caltech's online open machine learning course for undergraduates. Compared with Ng's course on Coursera, it is more theoretical.

Statistical Learning
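This is Stanford's online open course on statistical learning for students without a statistics background. Statistical learning is very similar to machine learning but has a stronger flavor of mathematical statistics.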

Rice, J. A. (2007) Mathematical statistics and data analysis. 3rd ed.
Wackerly, D. D., Mendenhall, W. and Scheaffer, R. L. (2008) Mathematical statistics with applications. 7th ed.
Wasserman, L. (2013) All of statistics: a concise course in statistical inference.
All three can serve as reference books for studying mathematical statistics.

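The notes are updated on no fixed schedule and will not be written in order. I will write whatever I feel like, in bits and pieces, whenever I have time, and some data analyses will appear too. The best ways to follow along are RSS, the Zhihu column, and my GitHub repository.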

By the way, the best data science IDE I have found so far is JupyterLab; it works well for both writing prose and writing code.

Table of Contents

  • [ ] Preface

  • [ ] Mathematical statistics notes

  • [ ] Machine learning notes

  • [ ] Data programming notes

  • [ ] Data analysis

Changelog

2018-02-28 Gauss-Markov theorem
2018-02-21 Added CC BY-NC-SA 4.0 terms
2018-02-21 Least-squares linear regression
2018-02-01 Preface

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.