martinapugliese / tales-science-data

Licence: other
Companion repo to the GitBook, notes on Data Science topics

Tales of Science and Data

A project by Martina Pugliese.

This book is a collection of notes on Data Science, from Statistics to Machine Learning, passing through all sorts of related areas.

I decided to give form to a rather disorderly collection of notes I had about data science and all sorts of related areas, which is how this project came about. You can read more about the hows and whys of this in the Meta page.

Contents

Meta and resources

This section explains how and why this whole thing started, what it is and how it's done, plus some awesome resources found on the web.

Probability, statistics & data analysis

A collection of notes on topics regarding Probability and Statistics and the way to use them to analyse data and draw conclusions.

Machine learning: concepts and procedures

How do we do Machine Learning? This chapter offers a high-level overview of the techniques and methodologies.

Machine learning: fundamental algorithms

This chapter has pretty much one page for each algorithm in "shallow learning", that is, everything that is not "deep". Neural networks, even when shallow, are not presented here, as they have a dedicated chapter, the same one that dives into deep learning. The division here follows the main learning paradigms.

Machine learning: model assessment

This part deals with how to assess the quality of a model and diagnose problems.

Artificial neural networks

Digging into the world of Artificial Neural Networks, a fascinating area of Machine Learning particularly on the rise these days. This deserved its own chapter.

Natural language processing

Natural Language Processing (NLP) is the field (a part of Machine Learning) which deals with text, an unstructured data source. NLP tries to put text into numerical representations and to extract information from it.
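
To give a flavour of what "putting text into numerical representations" can mean, here is a minimal bag-of-words sketch in plain Python. It is only an illustration and is not taken from the book's notebooks: the function name `bag_of_words`, the whitespace tokenisation and the toy sentences are all assumptions made for this example.

```python
from collections import Counter


def bag_of_words(docs):
    """Turn a list of strings into count vectors over a shared vocabulary.

    Hypothetical helper for illustration: tokenisation is a naive
    lowercase + whitespace split, which real NLP pipelines would refine.
    """
    tokenised = [doc.lower().split() for doc in docs]
    # The vocabulary is the sorted set of all tokens seen in any document.
    vocabulary = sorted({token for doc in tokenised for token in doc})
    vectors = []
    for doc in tokenised:
        counts = Counter(doc)
        # One entry per vocabulary word: how often it occurs in this document.
        vectors.append([counts[token] for token in vocabulary])
    return vocabulary, vectors


# Toy documents, made up for this example.
docs = ["the cat sat", "the cat saw the dog"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)
print(vectors)
```

Each document becomes a list of word counts aligned with the vocabulary, which is about the simplest numerical representation of text one can feed to a model.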

Computer vision

Images, seen by the machine. This section deals with using computers to extract and use information from visual data. We will illustrate a whole set of methods, which may or may not encompass the use of Neural Networks.

The Computer science appendix

Some (non-comprehensive) notes on Computer Science fundamentals.

The mathematics appendix

Some (non-comprehensive) notes on mathematics, used everywhere in data work. Useful little bits.

Toolbox

(Some) software tools used in Data Science, high-level overviews.

About the code parts

Several pages contain snippets of code. I've been using Python (3), and each of those pages links to the related Jupyter notebook in the GitHub repo accompanying this book, in case you want to play around. The whole repo is reachable on GitHub, and you can also view the notebooks prettified via the Jupyter Notebooks viewer.

The libraries used in the notebooks are usually (unless specified) those of the Python data stack (NumPy, SciPy, scikit-learn, pandas, ...). The plots presented here have been customised; the repo contains all the styling files.

Notify me of mistakes

Mistakes happen, and so do inaccuracies and oversights, from the content point of view to the rendering/graphics one (e.g., a TeX formula that doesn't render). You are more than welcome, encouraged in fact, to submit issues to the repo for these things.

License

(C) 2017-2021 Martina Pugliese

This book is released under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence (CC BY-NC-ND 4.0).
