MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services, with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization.

Stars: ✭ 253 (+336.21%)

Mutual labels: pyspark

spark3D

Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …

Stars: ✭ 23 (-60.34%)

Mutual labels: pyspark

Relation extraction

Relation extraction using deep learning (CNN)

Stars: ✭ 96 (+65.52%)

Mutual labels: pyspark

Gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Stars: ✭ 216 (+272.41%)

Mutual labels: pyspark

Pyspark Tutorial

PySpark Code for Hands-on Learners

Stars: ✭ 91 (+56.9%)

Mutual labels: pyspark

Spark python ml examples

Spark 2.0 Python Machine Learning examples

Stars: ✭ 87 (+50%)

Mutual labels: pyspark

Spark Practice

Apache Spark (PySpark) Practice on Real Data

Stars: ✭ 200 (+244.83%)

Mutual labels: pyspark

Pysparkgeoanalysis

🌐 Interactive Workshop on GeoAnalysis using PySpark

Stars: ✭ 63 (+8.62%)

Mutual labels: pyspark

Awesome Spark

A curated list of awesome Apache Spark packages and resources.

Stars: ✭ 1,061 (+1729.31%)

Mutual labels: pyspark

NBi

NBi is a testing framework (add-on to NUnit) for Business Intelligence and Data Access. The main goal of this framework is to let users create tests with a declarative approach based on an XML syntax. With NBi, you don't need to develop C# or Java code to specify your tests. Nor do you need Visual Studio or Eclipse to compile y…

Stars: ✭ 102 (+75.86%)

Mutual labels: data-quality

optimus

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Stars: ✭ 1,351 (+2229.31%)

Mutual labels: pyspark

Spark Iforest

Isolation Forest on Spark

Stars: ✭ 166 (+186.21%)

Mutual labels: pyspark

Sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

Stars: ✭ 954 (+1544.83%)

Mutual labels: pyspark

Sparkling Titanic

Training models with Apache Spark, PySpark for Titanic Kaggle competition

Stars: ✭ 12 (-79.31%)

Mutual labels: pyspark

Linkis

Linkis makes it easy to connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,323 (+3905.17%)

Mutual labels: pyspark

Spark Tdd Example

A simple Spark TDD example

Stars: ✭ 23 (-60.34%)

Mutual labels: pyspark

airflow-dbt-python

A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.

Stars: ✭ 111 (+91.38%)

Mutual labels: data-engineering

Pyspark Cheatsheet

🐍 Quick reference guide to common patterns & functions in PySpark.

Stars: ✭ 108 (+86.21%)

Mutual labels: pyspark

pyspark-cassandra

pyspark-cassandra is a Python port of the awesome @datastax Spark Cassandra connector. Compatible with Spark 2.0, 2.1, 2.2, 2.3 and 2.4.

Stars: ✭ 70 (+20.69%)

Mutual labels: pyspark

Pyspark Stubs

Apache (Py)Spark type annotations (stub files).

Stars: ✭ 98 (+68.97%)

Mutual labels: pyspark

Quinn

pyspark methods to enhance developer productivity 📣 👯 🎉

Stars: ✭ 217 (+274.14%)

Mutual labels: pyspark

Spark Py Notebooks

Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+2206.9%)

Mutual labels: pyspark

jgit-spark-connector

jgit-spark-connector is a library for running scalable data retrieval pipelines that process any number of Git repositories for source code analysis.

Stars: ✭ 71 (+22.41%)

Mutual labels: pyspark

Bitcoin Value Predictor

[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin

Stars: ✭ 91 (+56.9%)

Mutual labels: pyspark

Mmlspark

Simple and Distributed Machine Learning

Stars: ✭ 2,899 (+4898.28%)

Mutual labels: pyspark

W2v

Word2Vec models with Twitter data using Spark. Blog:

Stars: ✭ 64 (+10.34%)

Mutual labels: pyspark

ohsome-quality-analyst

Data quality estimations for OpenStreetMap

Stars: ✭ 28 (-51.72%)

Mutual labels: data-quality

Petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

Stars: ✭ 1,108 (+1810.34%)

Mutual labels: pyspark

Spark Nlp

State of the Art Natural Language Processing

Stars: ✭ 2,518 (+4241.38%)

Mutual labels: pyspark

Optimus

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Stars: ✭ 986 (+1600%)

Mutual labels: pyspark

qsv

CSVs sliced, diced & analyzed.

Stars: ✭ 438 (+655.17%)

Mutual labels: data-engineering

Live log analyzer spark

Spark application for analyzing Apache access logs and detecting anomalies. Accompanied by a Medium article.

Stars: ✭ 14 (-75.86%)

Mutual labels: pyspark

Azure Cosmosdb Spark

Apache Spark Connector for Azure Cosmos DB

Stars: ✭ 165 (+184.48%)

Mutual labels: pyspark

Pyspark Setup Demo

Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks

Stars: ✭ 24 (-58.62%)

Mutual labels: pyspark

hive compared bq

hive_compared_bq compares/validates two (SQL-like) tables, and graphically shows the rows/columns that are different.

Stars: ✭ 27 (-53.45%)

Mutual labels: data-quality

Cluster Pack

A library on top of either pex or conda-pack to make your Python code easily available on a cluster

Stars: ✭ 23 (-60.34%)

Mutual labels: pyspark

Handyspark

HandySpark - bringing pandas-like capabilities to Spark dataframes

Stars: ✭ 158 (+172.41%)

Mutual labels: pyspark

Scriptis

Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.

Stars: ✭ 696 (+1100%)

Mutual labels: pyspark

AirflowETL

Blog post on ETL pipelines with Airflow

Stars: ✭ 20 (-65.52%)

Mutual labels: data-engineering

workshop-spark

Code for Spark workshops with a Docker-based development environment

Stars: ✭ 27 (-53.45%)

Mutual labels: pyspark

Learningapachespark

LearningApacheSpark

Stars: ✭ 155 (+167.24%)

Mutual labels: pyspark

Spark Syntax

This is a repo documenting the best practices in PySpark.

Stars: ✭ 412 (+610.34%)

Mutual labels: pyspark

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (+600%)

Mutual labels: pyspark

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (+158.62%)

Mutual labels: pyspark

Pyspark Boilerplate

A boilerplate for writing PySpark Jobs

Stars: ✭ 318 (+448.28%)

Mutual labels: pyspark

Spark Gotchas

Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks