Mooseburger1 / Springboard-Data-Science-Immersive

Licence: other
Programming Languages

Jupyter Notebook

Springboard-Data-Science-Immersive

This repository houses all code, data, and files related to my work in the Springboard Data Science Immersive program. The following serves as a table of contents for the whole repository, with links to the respective work.

Capstone 1


Facilitation of Cryptocurrency Price Prediction by Sentiment Analysis

Key Skills

  • Web Scraping
  • NLP - Natural Language Processing
  • Time Series Analysis
  • Deep Neural Networks

A custom sentiment analysis library created to facilitate overall sentiment analysis of cryptocurrency news articles scraped from the web. Combined with historical price data, the sentiment analysis feeds a deep neural network that predicts future pricing for a crypto coin of interest.

Capstone 2


Exploring Computational Efficiency in Object Detection with Convolutional Neural Networks

Key Skills

  • Image Processing
  • Video Processing
  • H5 Storage
  • Object Oriented Programming
  • Tensorflow
  • Tensorboard
  • Convolutional Neural Networks
  • Object Detection

Exploring different image preprocessing techniques and methods in order to speed up CNN training. As a positive side effect, transforming the original full-scale data results in a smaller memory footprint, both on disk and in RAM.

Clustering Methods


K-Means and PCA

Key Skills

  • K-Means
  • PCA - Principal Component Analysis
  • Elbow Sum of Squares Method

Mini project on customer segmentation: identifying different types of customers so that more customers like them can be found. The data comes from John Foreman's book Data Smart. The dataset contains both information on marketing newsletters/e-mail campaigns (e-mail offers sent) and transaction-level data from customers (which offers customers responded to and what they bought).
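The elbow method listed under Key Skills can be sketched in plain Python. This is a minimal illustration rather than the notebook's own code: the 2-D points are synthetic stand-ins for the customer data, and `kmeans_sse` is a hypothetical helper that runs a basic Lloyd's-algorithm K-Means and returns the within-cluster sum of squares.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_sse(points, k, iterations=20, seed=0):
    """Run a basic K-Means (hypothetical helper) and return the sum of squares."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


# Synthetic data (an assumption for illustration): three blobs of 30 points.
rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
points = [
    (cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
    for cx, cy in centers
    for _ in range(30)
]

# Sum of squares for k = 1..6; the "elbow" is where the drop levels off.
sse_by_k = {k: kmeans_sse(points, k) for k in range(1, 7)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 1))
```

Plotting `sse_by_k` against `k` and looking for the point where the curve flattens gives a candidate number of customer segments.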

Exploratory Data Analysis (EDA)


Hospital Readmittance Data

Human Temperature Data

Racial Discrimination Data

Key Skills

  • Central Limit Theorem
  • Statistical Analysis
  • Data Visualization
  • z-test
  • t-test
  • Margin of Error (MOE)
  • Chi-Squared Test
  • Bootstrap Statistics

Several EDAs performed on varying data categories. Hospital Readmittance applies statistical analysis to a previously completed analysis in order to critique its validity. Human Temperature uses bootstrap statistics to estimate the true average temperature of the human body in both males and females. Racial Discrimination tests statistically whether race has a meaningful impact on the callback rate of candidates who have submitted resumes to jobs of interest.
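The bootstrap approach mentioned for the Human Temperature EDA can be sketched as follows. This is a minimal percentile-bootstrap example under stated assumptions: the temperature sample is synthetic (not the dataset used in the notebook), and `bootstrap_mean_ci` is a hypothetical helper name.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=5000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper


# Synthetic body temperatures in degrees Fahrenheit (made up for illustration).
data_rng = random.Random(1)
temperatures = [data_rng.gauss(98.2, 0.7) for _ in range(130)]

sample_mean = statistics.fmean(temperatures)
ci_low, ci_high = bootstrap_mean_ci(temperatures)
print(f"mean={sample_mean:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

The same helper could be applied separately to male and female subsamples to compare their intervals.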

Machine Learning Algorithms

Linear Regression

Logistic Regression

Naive Bayes

Key Skills

  • Logistic Regression
  • Linear Regression
  • Naive Bayes

Applying several machine learning algorithms in mini projects: labeling an observation as either male or female based on height and weight data (Logistic Regression), estimating prices on the Boston Housing data (Linear Regression), and classifying movie reviews (Naive Bayes).

PySpark

MapReduce with Pyspark

Performing several exercises utilizing MapReduce with PySpark (RDDs), with a touch of MLlib

Key Skills

  • Pyspark
  • RDD
  • Spark Dataframes
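The MapReduce pattern behind these exercises can be illustrated without a Spark installation. The sketch below is plain Python, not PySpark, and the input lines are made up; in PySpark the corresponding RDD steps would typically be `flatMap`, `map`, and `reduceByKey`.

```python
from collections import defaultdict
from functools import reduce

# Made-up input standing in for lines of a text file.
lines = ["spark makes big data simple", "big data needs big tools"]

# Map phase: emit a (word, 1) pair for every word in every line.
pairs = [(word, 1) for line in lines for word in line.split()]

# Shuffle phase: group the emitted values by key.
grouped = defaultdict(list)
for word, count in pairs:
    grouped[word].append(count)

# Reduce phase: combine the values for each key with an associative function.
word_counts = {
    word: reduce(lambda a, b: a + b, counts) for word, counts in grouped.items()
}

print(word_counts)
```

Because the reduce function is associative, a cluster can combine partial counts per partition before merging them, which is what makes the pattern scale.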

SQL

Yammer SQL Case Study

Key Skills

  • SQL
  • Time Series Analysis
  • Applied Plotting and Charting

This is a SQL case study proposed by Mode Analytics at https://modeanalytics.com/. The Jupyter notebook in this repository is a cleaned-up version of the original case study, which contains all original SQL queries and can be found here: https://modeanalytics.com/mooseburger/reports/14cbbb5670b8

JSON

Data Wrangling with JSON

Key Skills

  • JSON Manipulation and Extraction
  • Applied Plotting and Charting

An exercise in data extraction and exploration using a JSON data source

Take Home Data Challenges

Relax Challenge

Ultimate Challenge Parts 1 & 2

Ultimate Challenge Part 3

Key Skills

  • Full Stack Data Scientist

Relax Challenge - Defining an "adopted user" as a user who has logged into a product on three separate days in at least one seven-day period, identify which factors predict future user adoption. You are given two datasets:

  1. A user table ("takehome_users") with data on 12,000 users who signed up for the product in the last two years
  2. A usage summary table ("takehome_user_engagement") that has a row for each day that a user logged into the product.
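The "adopted user" definition above can be sketched as a sliding window over each user's distinct login days. This is a minimal illustration, not the notebook's code: the login records are made up, `adopted_users` is a hypothetical helper, and a "seven-day period" is assumed to mean that the first and third login days are at most six days apart.

```python
from collections import defaultdict
from datetime import date, timedelta


def adopted_users(logins, window_days=7, min_days=3):
    """Return ids of users with min_days distinct login days inside one window.

    logins is an iterable of (user_id, login_date) pairs (assumed shape).
    """
    days_by_user = defaultdict(set)
    for user_id, day in logins:
        days_by_user[user_id].add(day)  # repeated logins on one day count once

    adopted = set()
    for user_id, days in days_by_user.items():
        ordered = sorted(days)
        # Compare each login day with the one min_days - 1 positions later.
        for i in range(len(ordered) - min_days + 1):
            span = ordered[i + min_days - 1] - ordered[i]
            if span < timedelta(days=window_days):
                adopted.add(user_id)
                break
    return adopted


# Made-up login records for three hypothetical users.
logins = [
    (1, date(2014, 1, 1)), (1, date(2014, 1, 3)), (1, date(2014, 1, 7)),
    (2, date(2014, 1, 1)), (2, date(2014, 1, 5)), (2, date(2014, 1, 8)),
    (3, date(2014, 1, 1)), (3, date(2014, 1, 1)), (3, date(2014, 1, 2)),
]

result = adopted_users(logins)
print(result)
```

User 1 qualifies because three distinct days fall within January 1 to January 7; user 2 spans eight calendar days, and user 3 has only two distinct days. The resulting set can then serve as the target label when modelling which factors predict adoption.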

Ultimate Challenge

  • Part 1 - Exploratory data analysis
  • Part 2 - Experiment and metrics design
  • Part 3 - Predictive modelling