All Projects → ehsanmok → Sparkling Titanic

ehsanmok / Sparkling Titanic

Training models with Apache Spark, PySpark for Titanic Kaggle competition

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Sparkling Titanic

ODSC India 2018
My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (+116.67%)
Mutual labels:  spark, pyspark
Spark Tdd Example
A simple Spark TDD example
Stars: ✭ 23 (+91.67%)
Mutual labels:  spark, pyspark
data processing course
Some class materials for a data processing course using PySpark
Stars: ✭ 50 (+316.67%)
Mutual labels:  spark, pyspark
Spark Practice
Apache Spark (PySpark) Practice on Real Data
Stars: ✭ 200 (+1566.67%)
Mutual labels:  spark, pyspark
basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (+108.33%)
Mutual labels:  spark, pyspark
Mmlspark
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+24058.33%)
Mutual labels:  spark, pyspark
incubator-linkis
Linkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+20391.67%)
Mutual labels:  spark, pyspark
Linkis
Linkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+19258.33%)
Mutual labels:  spark, pyspark
data-algorithms-with-spark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Stars: ✭ 34 (+183.33%)
Mutual labels:  spark, pyspark
spark-extension
A library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (+108.33%)
Mutual labels:  spark, pyspark
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+20883.33%)
Mutual labels:  spark, pyspark
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+5175%)
Mutual labels:  spark, pyspark
Spark Iforest
Isolation Forest on Spark
Stars: ✭ 166 (+1283.33%)
Mutual labels:  spark, pyspark
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+1700%)
Mutual labels:  spark, pyspark
Azure Cosmosdb Spark
Apache Spark Connector for Azure Cosmos DB
Stars: ✭ 165 (+1275%)
Mutual labels:  spark, pyspark
kafka-compose
🎼 Docker compose files for various kafka stacks
Stars: ✭ 32 (+166.67%)
Mutual labels:  spark, pyspark
Learningapachespark
LearningApacheSpark
Stars: ✭ 155 (+1191.67%)
Mutual labels:  spark, pyspark
Handyspark
HandySpark - bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (+1216.67%)
Mutual labels:  spark, pyspark
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+825%)
Mutual labels:  spark, pyspark
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+3283.33%)
Mutual labels:  spark, pyspark

Sparkling Titanic

Introduction

titanic_logReg.py trains a Logistic Regression and makes prediction for Titanic dataset as part of Kaggle competition using Apache-Spark spark-1.3.1-bin-hadoop2.4 with its Python API on a local machine. I used pyspark_csv.py to load data as Spark DataFrame, for more instructions see this.

The following will be added later

  • Imputing NAs in train and test sets
  • Cross-validation
  • Using more features and feature engineering
  • RandomForest classifier, SVM, etc.

Running PySpark Script in Shell

Use $SPARK_HOME/bin/spark-submit scriptDirectoryPath/titanic_logReg.py. For multithreading, you can add the option --master local[N] where N is the number of threads.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].