Spark With Python: Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+194.12%)
olliePy: OlliePy is a Python package that helps data scientists explore their data and evaluate and analyse their machine learning experiments by utilising the power and structure of modern web applications. The data scientist only needs to provide the data and any required information, and OlliePy will generate the rest.
Stars: ✭ 46 (-9.8%)
datalake-etl-pipeline: Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-23.53%)
Scattertext: Beautiful visualizations of how language differs among document types.
Stars: ✭ 1,722 (+3276.47%)
typed-prelude: Reliable, standards-oriented software for browsers & Node.
Stars: ✭ 48 (-5.88%)
Dataprep: DataPrep, the easiest way to prepare data in Python
Stars: ✭ 639 (+1152.94%)
aut: The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+117.65%)
Awesome Spark: A curated list of awesome Apache Spark packages and resources.
Stars: ✭ 1,061 (+1980.39%)
Mmlspark: Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+5584.31%)
mmtf-workshop-2018: Structural Bioinformatics Training Workshop & Hackathon 2018
Stars: ✭ 50 (-1.96%)
Pyspark Stubs: Apache (Py)Spark type annotations (stub files).
Stars: ✭ 98 (+92.16%)
Superset: Apache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+83496.08%)
Data Describe: data⎰describe, a Pythonic EDA Accelerator for Data Science
Stars: ✭ 269 (+427.45%)
Sweetviz: Visualize and compare datasets, target values and associations, with one line of code.
Stars: ✭ 1,851 (+3529.41%)
SynapseML: Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+6478.43%)
Spark Gotchas: A subjective compilation of Apache Spark tips and tricks
Stars: ✭ 308 (+503.92%)
Quinn: PySpark methods to enhance developer productivity 📣 👯 🎉
Stars: ✭ 217 (+325.49%)
isarn-sketches-spark: Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (-45.1%)
100 Days Of Ml Code: A day-to-day plan for this challenge. Covers both theoretical and practical aspects
Stars: ✭ 172 (+237.25%)
pyspark-cheatsheet: PySpark Cheat Sheet, with example code to help you learn PySpark and develop apps faster
Stars: ✭ 115 (+125.49%)
Live log analyzer spark: Spark application for analyzing Apache access logs and detecting anomalies, with an accompanying Medium article.
Stars: ✭ 14 (-72.55%)
Inspectdf: 🛠️ 📊 Tools for Exploring and Comparing Data Frames
Stars: ✭ 195 (+282.35%)
spark3D: Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
Stars: ✭ 23 (-54.9%)
Azure Event Hubs Spark: Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (+174.51%)
jupyterlab-sparkmonitor: JupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook
Stars: ✭ 78 (+52.94%)
streamsx.kafka: Repository for integration with Apache Kafka
Stars: ✭ 13 (-74.51%)
Autoeda Resources: A list of software and papers related to automatic and fast Exploratory Data Analysis
Stars: ✭ 268 (+425.49%)
skimpy: skimpy is a lightweight tool that provides summary statistics about variables in data frames within the console.
Stars: ✭ 236 (+362.75%)
Hn so analysis: Is there a relationship between the popularity of a given technology on Stack Overflow (SO) and on Hacker News (HN)? Plus a few words about causality
Stars: ✭ 94 (+84.31%)
Pandas Profiling: Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+16231.37%)
leila: Library for evaluating data quality and interacting with the datos.gov.co data portal
Stars: ✭ 56 (+9.8%)
Handyspark: HandySpark, bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (+209.8%)
Spark States: Custom state store providers for Apache Spark
Stars: ✭ 83 (+62.75%)
dqlab-career-track: A collection of scripts written to complete DQLab Data Analyst Career Track 📊
Stars: ✭ 53 (+3.92%)
datart: Datart is a next-generation Data Visualization Open Platform
Stars: ✭ 1,042 (+1943.14%)
alc-site: The web site of the ALC Beijing (Apache Local Community Beijing)
Stars: ✭ 75 (+47.06%)
OSCI: Open Source Contributor Index
Stars: ✭ 107 (+109.8%)
oshinko-s2i: A place to put s2i images and utilities for Spark application builders for OpenShift
Stars: ✭ 16 (-68.63%)
Easy-HotSpot: Easy HotSpot is a super easy WiFi hotspot user management utility for Mikrotik RouterOS based router devices. Voucher printing is available in 6 ready-made templates. Can be installed on any PHP/MySQL-enabled server, locally or on Internet web servers. Uses the PHP PEAR2 API Client by boenrobot.
Stars: ✭ 45 (-11.76%)
hack-cs-tools: Client-side (C-S) penetration toolkit
Stars: ✭ 111 (+117.65%)
Data-Analyst-Nanodegree: This repo consists of the projects that I completed as part of Udacity's Data Analyst Nanodegree curriculum.
Stars: ✭ 13 (-74.51%)
spydrnet: A flexible framework for analyzing and transforming FPGA netlists. Official repository.
Stars: ✭ 49 (-3.92%)
Effortless-SPIFFS: A class designed to make reading and storing data on the ESP8266 and ESP32 effortless
Stars: ✭ 27 (-47.06%)
UnityCommon: A collection of common frameworks and tools for Unity-based projects
Stars: ✭ 61 (+19.61%)
eyy-indexer: An image- and video-friendly directory indexer for web directories.
Stars: ✭ 53 (+3.92%)
greycat: GreyCat, for Data Analytics, Temporal data, What-if, Live machine learning
Stars: ✭ 104 (+103.92%)