JasonMDev / Learning Python Predictive Analytics

Tracking, notes and programming snippets while learning predictive analytics

Predictive Analytics with Python

These are my notes from working through the book Learning Predictive Analytics with Python by Ashish Kumar, published in February 2016.

General

###Chapter 1: Getting Started with Predictive Modelling

  • [x] Installed Anaconda Package.
  • [x] Python 3.5 has been installed.
  • [x] The book follows Python 2, so some of the code is modified along the way for Python 3.

###Chapter 2: Data Cleaning

  • [x] Reading the data: variations and examples
  • [x] Data frames and delimiters.

####Case 1: Reading a dataset using the read_csv method

  • [x] File: titanicReadCSV.py
  • [x] File: titanicReadCSV1.py
  • [x] File: readCustomerChurn.py
  • [x] File: readCustomerChurn2.py
  • [x] File: changeDelimiter.py

####Case 2: Reading a dataset using the open method of Python

  • [x] File: readDatasetByOpenMethod.py
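As a rough illustration of Case 2, the sketch below reads a delimited text file with Python's built-in `open` and splits each line by hand. The function name `read_delimited` and the sample data are hypothetical and are not taken from readDatasetByOpenMethod.py.

```python
import os
import tempfile


def read_delimited(path, delimiter=","):
    """Read a delimited text file into a dict of column name -> list of strings."""
    with open(path, "r") as handle:
        # The first line holds the column names.
        header = handle.readline().strip().split(delimiter)
        columns = {name: [] for name in header}
        for line in handle:
            values = line.strip().split(delimiter)
            if len(values) != len(header):
                continue  # skip blank or malformed lines
            for name, value in zip(header, values):
                columns[name].append(value)
    return columns


# Hypothetical sample data written to a temporary file for the demo.
sample = "name,age\nAnn,31\nBob,45\n"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(sample)

data = read_delimited(tmp.name)
os.remove(tmp.name)
print(data)
```

All values come back as strings, so numeric columns still need an explicit conversion before analysis.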

####Case 3: Reading data from a URL

  • [x] Modified the code so that it works and prints the dataset line by line as dictionaries.
  • [x] File: readURLLib2Iris.py
  • [x] File: readURLMedals.py

####Case 4: Miscellaneous cases

  • [x] File: readXLS.py
  • [x] Created the file above to read from both .xls and .xlsx

####Basics: Summary, dimensions, and structure

  • [x] File: basicDataCheck.py
  • [x] Created the file above to read from both .xls and .xlsx

####Handling missing values

  • [x] File: basicDataCheck.py
  • [x] RE: Treating missing data like NaN or None
  • [x] Deletion or imputation
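The two options noted above, deletion and imputation, can be sketched in plain Python. This is a minimal illustration that assumes missing values are stored as `None` and uses mean imputation. The function names and the sample column are hypothetical and do not come from basicDataCheck.py.

```python
def drop_missing(values):
    """Deletion: keep only the observed (non-None) values."""
    return [v for v in values if v is not None]


def impute_mean(values):
    """Imputation: replace each None with the mean of the observed values."""
    observed = drop_missing(values)
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


ages = [22.0, None, 26.0, 35.0, None]  # hypothetical column with gaps
print(drop_missing(ages))
print(impute_mean(ages))
```

Deletion shrinks the dataset, while imputation keeps every row at the cost of inventing plausible values.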

####Creating dummy variables

  • [x] File: basicDataCheck.py
  • [x] Split into new variables 'sex_female' and 'sex_male'
  • [x] Remove column 'sex'
  • [x] Add both dummy columns created above.
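The three steps above (split, remove, add) can be sketched without any libraries. The helper `add_dummies` and the two sample rows are hypothetical. The notes themselves work on a data frame in basicDataCheck.py, so this only mirrors the idea.

```python
def add_dummies(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    result = []
    for row in rows:
        # Copy every field except the categorical one being encoded.
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        result.append(new_row)
    return result


passengers = [
    {"name": "A", "sex": "male"},
    {"name": "B", "sex": "female"},
]
encoded = add_dummies(passengers, "sex")
print(encoded)
```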

####Visualizing a dataset by basic plotting

  • [x] File: plotData.py
  • [x] Figure file: ScatterPlots.jpeg
  • [x] Plot Types: Scatterplot, Histograms and boxplots

###Chapter 3: Data Wrangling

####Subsetting a dataset

  • [x] Selecting Columns
  • [x] File: subsetDataset.py
  • [x] Selecting Rows
  • [x] File: subsetDatasetRows.py
  • [x] Selecting a combination of rows and columns
  • [x] File: subsetColRows.py
  • [x] Creating new columns
  • [x] File: subsetNewCol.py

####Generating random numbers and their usage

  • [x] Various methods for generating random numbers
  • [x] File: generateRandomNumbers.py
  • [x] Seeding a random number
  • [x] File: generateRandomNumbers.py
  • [x] Generating random numbers following probability distributions
  • [x] File: generateRandomProbDistr.py
  • [x] Probability density function: PDF(x) describes how likely values near x are (for a discrete variable, Prob(X=x))
  • [x] Cumulative distribution function: CDF(x) = Prob(X<=x)
  • [x] Uniform distribution: every value in the range occurs with the same (uniform) probability
  • [x] Normal distribution: the bell curve; the most ubiquitous and versatile probability distribution
  • [x] Using the Monte-Carlo simulation to find the value of pi
  • [x] File: calcPi.py
  • [x] Geometry and mathematics behind the calculation of pi
  • [x] Generating a dummy data frame
  • [x] File: generateDummyDataFrame.py
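The Monte Carlo estimate of pi listed above can be sketched with the standard library alone. Points are drawn uniformly in the unit square, and the fraction that lands inside the quarter circle of radius 1 approximates pi/4. The function name `estimate_pi` and the seed value are illustrative choices, not taken from calcPi.py.

```python
import random


def estimate_pi(n_points, seed=None):
    """Estimate pi from the share of random points inside a quarter circle."""
    rng = random.Random(seed)  # seeding makes the run reproducible
    inside = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Area of quarter circle / area of unit square = pi / 4.
    return 4.0 * inside / n_points


print(estimate_pi(100_000, seed=42))
```

The estimate tightens slowly: the error shrinks roughly with the square root of the number of points.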

####Grouping the data – aggregation, filtering, and transformation

  • [x] File: groupData.py
  • [x] Grouping
  • [x] Aggregation
  • [x] Filtering
  • [x] Transformation
  • [x] Miscellaneous operations

####Random sampling – splitting a dataset in training and testing datasets

  • [ ] File: splitDataTrainTest.py
  • [x] Method 1: using the Customer Churn Model
  • [x] Method 2: using sklearn
  • [ ] Method 3: using the shuffle function

####Concatenating and appending data

  • [x] File: concatenateAndAppend.py
  • [x] File: appendManyFiles.py

####Merging/joining datasets

  • [x] File: mergeJoin.py
  • [x] Inner Join
  • [x] Left Join
  • [x] Right Join
  • [x] An example of the Inner Join
  • [x] An example of the Left Join
  • [x] An example of the Right Join
  • [x] Summary of Joins in terms of their length
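To make the length comparison concrete, here is a small pure-Python sketch of inner, left, and right joins on lists of dictionaries. The `join` helper and both sample tables are hypothetical. The notes themselves use data frames in mergeJoin.py.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`; `how` is 'inner', 'left' or 'right'."""
    if how == "right":
        # A right join is a left join with the tables swapped.
        return join(right, left, key, how="left")
    joined = []
    for l_row in left:
        matches = [r for r in right if r[key] == l_row[key]]
        for r_row in matches:
            joined.append({**l_row, **r_row})
        if not matches and how == "left":
            joined.append(dict(l_row))  # unmatched row kept, other columns absent
    return joined


customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
orders = [{"id": 2, "amount": 10}, {"id": 3, "amount": 5}, {"id": 3, "amount": 7},
          {"id": 4, "amount": 1}, {"id": 5, "amount": 9}]

for how in ("inner", "left", "right"):
    print(how, len(join(customers, orders, "id", how)))
```

The inner join can never be longer than the left or right join, because those keep every matched row plus the unmatched rows from one side.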

###Chapter 4: Statistical Concepts for Predictive Modelling

####Random sampling and central limit theorem

####Hypothesis testing

  • [x] Null versus alternate hypothesis
  • [x] Z-statistic and t-statistic
  • [x] Confidence intervals, significance levels, and p-values
  • [x] Different kinds of hypothesis test
  • [x] A step-by-step guide to do a hypothesis test
  • [x] An example of a hypothesis test
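The steps of a hypothesis test can be sketched as a two-sided, one-sample Z-test. The numbers below (a sample mean of 52 against a hypothesised mean of 50) are made up for illustration, and the function name is hypothetical. `statistics.NormalDist` requires Python 3.8 or later.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Two-sided Z-test of H0: the population mean equals pop_mean."""
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    # p-value: probability of a statistic at least this extreme under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


z, p, reject = one_sample_z_test(sample_mean=52.0, pop_mean=50.0, pop_sd=10.0, n=100)
print(z, p, reject)
```

The null hypothesis is rejected when the p-value falls below the chosen significance level `alpha`.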

####Chi-square testing

####Correlation

  • [x] File: linearRegression.py
  • [x] File: linearRegressionFunction.py
  • [x] Picture: TVSalesCorrelationPlot.png
  • [x] Picture: RadioSalesCorrelationPlot.png
  • [x] Picture: NewspaperSalesCorrelationPlot.png
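The correlation plots above summarise how strongly two variables move together. A minimal sketch of the Pearson correlation coefficient follows. The two short lists are invented stand-ins for an advertising channel and sales, and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


spend = [1, 2, 3, 4, 5]   # hypothetical advertising spend
sales = [2, 1, 4, 3, 5]   # hypothetical sales figures
print(pearson_r(spend, sales))
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little linear relationship.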

###Chapter 5: Linear Regression with Python

####Understanding the maths behind linear regression

  • [x] Linear regression using simulated data
  • [x] File: linearRegression.py
  • [x] Picture: CurrentVsPredicted1.png
  • [x] Picture: CurrentVsPredictedVsMean1.png
  • [x] Picture: CurrentVsPredictedVsModel1.png

####Making sense of result parameters

  • [x] File: linearRegression.py
  • [x] p-values
  • [x] F-statistics
  • [x] Residual Standard Error (RSE)
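Some of the result parameters above can be computed by hand for a one-predictor model. The sketch below fits y = intercept + slope * x by least squares and reports R-squared, the F-statistic, and the residual standard error with n - 2 degrees of freedom. The data and the function name are illustrative and do not come from linearRegression.py.

```python
from math import sqrt


def fit_simple_ols(x, y):
    """Least-squares fit of y on a single predictor x, with basic diagnostics."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * a for a in x]
    ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))
    ss_total = sum((b - mean_y) ** 2 for b in y)
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - ss_residual / ss_total,
        # One predictor, so the regression sum of squares has 1 degree of freedom.
        "f_statistic": (ss_total - ss_residual) / (ss_residual / (n - 2)),
        "rse": sqrt(ss_residual / (n - 2)),
    }


fit = fit_simple_ols([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(fit)
```

A small RSE relative to the mean of y and a large F-statistic both point to a model that explains most of the variation.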

####Implementing linear regression with Python

  • [x] File: linearRegressionSMF.py
  • [x] Linear regression using the statsmodels library
  • [x] Multiple linear regression
  • [x] Multi-collinearity: sub-optimal performance of the model
  • [x] Variance Inflation Factor
  • [x] It quantifies how much the variance of a coefficient estimate increases because that predictor is highly correlated with one or more of the other predictors.
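For a predictor j, the Variance Inflation Factor is VIF = 1 / (1 - R²), where R² comes from regressing predictor j on the remaining predictors. With only two predictors that R² is the squared Pearson correlation between them, which keeps the sketch below short. The function names and the sample data are hypothetical.

```python
def r_squared_between(x, y):
    """R^2 of a one-predictor regression, i.e. the squared Pearson correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov * cov / (var_x * var_y)


def vif_two_predictors(x1, x2):
    """VIF of either predictor in a model with exactly two predictors."""
    return 1.0 / (1.0 - r_squared_between(x1, x2))


tv = [1, 2, 3, 4, 5]      # hypothetical predictor 1
radio = [2, 1, 4, 3, 5]   # hypothetical predictor 2
print(vif_two_predictors(tv, radio))
```

A VIF of 1 means no inflation. The cut-off for "too high" is a judgment call, and this sketch does not fix one.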

####Model validation

  • [x] Training and testing data split
  • [x] File: linearRegressionSMF.py
  • [x] Linear regression with scikit-learn
  • [x] File: linearRegressionSKL.py
  • [x] Feature selection with scikit-learn
  • [x] Recursive Feature Elimination (RFE)
  • [x] File: linearRegressionRFE.py

####Handling other issues in linear regression

  • [x] Handling categorical variables
  • [x] File: linearRegressionECom.py
  • [x] Transforming a variable to fit non-linear relations
  • [x] File: nonlinearRegression.py
  • [x] Picture: MPGVSHorsepower.png
  • [x] Picture: MPGVSHorsepowerVsLine.png
  • [x] Picture: MPGVSHorsepowerModels.png
  • [x] Handling outliers
  • [x] Other considerations and assumptions for linear regression

###Chapter 6: Logistic Regression with Python

####Linear regression versus logistic regression

####Understanding the math behind logistic regression

  • [x] File: logisticRegression.py
  • [x] Contingency tables
  • [x] Conditional probability
  • [x] Odds ratio
  • [x] Moving on to logistic regression from linear regression
  • [x] Estimation using the Maximum Likelihood Method
  • [x] Building the logistic regression model from scratch
  • [x] File: logisticRegressionScratch.py
  • [ ] Read above again.
  • [x] Making sense of logistic regression parameters
  • [x] Wald test
  • [x] Likelihood Ratio Test statistic
  • [x] Chi-square test
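As a rough companion to the "from scratch" item above, the sketch below fits a one-feature logistic regression by maximising the log-likelihood with plain gradient ascent. The learning rate, iteration count, sample data, and function names are assumptions. This is not the code in logisticRegressionScratch.py and may differ from the book's approach.

```python
from math import exp


def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, n_iter=5000):
    """Gradient ascent on the log-likelihood; returns (intercept, slope)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = y - sigmoid(b0 + b1 * x)  # observed minus predicted probability
            grad0 += error
            grad1 += error * x
        b0 += learning_rate * grad0 / len(xs)
        b1 += learning_rate * grad1 / len(xs)
    return b0, b1


xs = [1, 2, 3, 4, 5, 6]   # hypothetical predictor
ys = [0, 0, 1, 0, 1, 1]   # hypothetical binary outcome
b0, b1 = fit_logistic(xs, ys)
print(b0, b1, exp(b1))    # exp(b1) is the odds ratio per unit increase in x
```

A positive slope means the odds of the outcome rise as x increases. `exp(b1)` expresses that change as an odds ratio.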

####Implementing logistic regression with Python

  • [x] File: logisticRegressionImplementation.py
  • [x] Processing the data
  • [x] Data exploration
  • [x] Data visualization
  • [x] Creating dummy variables for categorical variables
  • [x] Feature selection
  • [x] Implementing the model

####Model validation and evaluation

  • [x] File: logisticRegressionImplementation.py
  • [x] Cross validation

####Model validation

  • [x] File: logisticRegressionImplementation.py
  • [x] The ROC curve {see terms}

###Chapter 7: Clustering with Python

####Introduction to clustering – what, why, and how?

  • [x] What is clustering?
  • [x] How is clustering used?
  • [x] Why do we do clustering?

####Mathematics behind clustering

  • [x] Distances between two observations
  • [x] Euclidean distance
  • [x] Manhattan distance
  • [x] Minkowski distance
  • [x] The distance matrix
  • [x] Normalizing the distances
  • [x] Linkage methods
  • [x] Single linkage
  • [x] Complete linkage
  • [x] Average linkage
  • [x] Centroid linkage
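  • [x] Ward's method: uses an ANOVA-style criterion (merges the pair of clusters that gives the smallest increase in within-cluster variance)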
  • [x] Hierarchical clustering
  • [x] K-means clustering
  • [x] File: kMeanClustering.py
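The three distance measures listed above are closely related: Minkowski distance with p = 1 is Manhattan distance, and with p = 2 it is Euclidean distance. A minimal sketch, with function names and sample points chosen for illustration:

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan(a, b):
    """Sum of absolute coordinate differences (Minkowski with p = 1)."""
    return minkowski(a, b, 1)


def euclidean(a, b):
    """Straight-line distance (Minkowski with p = 2)."""
    return minkowski(a, b, 2)


origin, point = (0, 0), (3, 4)
print(manhattan(origin, point), euclidean(origin, point))
```

Because these distances depend on the scale of each variable, values are normally normalized before they are compared.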

####Implementing clustering using Python

  • [x] File: clusterWine.py
  • [x] Importing and exploring the dataset
  • [x] Normalizing the values in the dataset
  • [x] Hierarchical clustering using scikit-learn
  • [x] K-Means clustering using scikit-learn
  • [x] Interpreting the cluster

####Fine-tuning the clustering

  • [x] The elbow method
  • [x] Silhouette Coefficient

###Chapter 8: Trees and Random Forests with Python

####Introducing decision trees

  • [x] A decision tree

####Understanding the mathematics behind decision trees

  • [x] Homogeneity
  • [x] Entropy
  • [x] Information gain
  • [x] ID3 algorithm to create a decision tree
  • [x] Gini index
  • [x] Reduction in Variance
  • [x] Pruning a tree
  • [x] Handling a continuous numerical variable
  • [x] Handling a missing value of an attribute
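Entropy, information gain, and the Gini index from the list above can each be written in a few lines. The sketch below uses made-up "yes"/"no" labels, and the function names are illustrative. It shows the split criteria only and is not a full ID3 implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index: chance that two random draws have different labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes"] * 5 + ["no"] * 5
mixed_split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
pure_split = [["yes"] * 5, ["no"] * 5]
print(entropy(parent), gini(parent))
print(information_gain(parent, mixed_split), information_gain(parent, pure_split))
```

A split that produces more homogeneous children has a higher information gain, which is why ID3 picks the attribute with the largest gain at each node.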

####Implementing a decision tree with scikit-learn

  • [x] File: decisionTreeIris.py
  • [x] Visualizing the tree
  • [x] Picture: dtree2.png
  • [x] File: dtree2.dot
  • [x] Cross-validating and pruning the decision tree

####Understanding and implementing regression trees

  • [x] File: regressionTree.py
  • [x] Regression tree algorithm
  • [x] Implementing a regression tree using Python

####Understanding and implementing random forests

  • [x] File: randomForest.py
  • [x] The random forest algorithm
  • [x] Implementing a random forest using Python
  • [x] Why do random forests work?
  • [x] Important parameters for random forests

###Chapter 9: Best Practices for Predictive Modelling

####Best practices for coding

  • [x] Commenting the code
  • [x] Defining functions for substantial individual tasks
  • [x] Example 1
  • [x] Example 2
  • [x] Example 3
  • [x] Avoid hard-coding of variables as much as possible
  • [x] Version control
  • [x] Using standard libraries, methods, and formulas

####Best practices for data handling

####Best practices for algorithms

####Best practices for statistics

####Best practices for business contexts
