datalake-etl-pipeline: Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Stars: ✭ 39 (+69.57%)
mmtf-workshop-2018: Structural Bioinformatics Training Workshop & Hackathon 2018
Stars: ✭ 50 (+117.39%)
aut: The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+382.61%)
Sparkora: Powerful, rapid automatic EDA and feature engineering library with a very easy-to-use API 🌟
Stars: ✭ 51 (+121.74%)
Awesome Spark: A curated list of awesome Apache Spark packages and resources.
Stars: ✭ 1,061 (+4513.04%)
Quinn: PySpark methods to enhance developer productivity 📣 👯 🎉
Stars: ✭ 217 (+843.48%)
SynapseML: Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+14486.96%)
mmtf-spark: Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Stars: ✭ 20 (-13.04%)
Mmlspark: Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+12504.35%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), with code examples
Stars: ✭ 150 (+552.17%)
isarn-sketches-spark: Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (+21.74%)
Pyspark Stubs: Apache (Py)Spark type annotations (stub files).
Stars: ✭ 98 (+326.09%)
pyspark-cheatsheet: PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Stars: ✭ 115 (+400%)
jupyterlab-sparkmonitor: JupyterLab extension that enables monitoring of Apache Spark jobs launched from within a notebook
Stars: ✭ 78 (+239.13%)
Spark Gotchas: A subjective compilation of Apache Spark tips and tricks
Stars: ✭ 308 (+1239.13%)
Live log analyzer spark: Spark application for analyzing Apache access logs and detecting anomalies, along with a Medium article.
Stars: ✭ 14 (-39.13%)
Sparkrdma: RDMA-accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+834.78%)
Spark: .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+7382.61%)
Azure Event Hubs Spark: Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (+508.7%)
seamless: Seamless is a framework to set up reproducible computations (and visualizations) that respond to changes in cells. Cells contain the input data as well as the source code of the computations, and all cells can be edited interactively.
Stars: ✭ 19 (-17.39%)
Analytics Zoo: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
Stars: ✭ 2,448 (+10543.48%)
Splash: A flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Stars: ✭ 105 (+356.52%)
Griffon Vm: Griffon Data Science Virtual Machine
Stars: ✭ 128 (+456.52%)
Spark On K8s Operator: Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Stars: ✭ 1,780 (+7639.13%)
kdtree: A pure Nim k-d tree implementation for efficient spatial querying of point data
Stars: ✭ 40 (+73.91%)
Sparktorch: Train and run PyTorch models on Apache Spark.
Stars: ✭ 195 (+747.83%)
Docker Spark: Apache Spark Docker image
Stars: ✭ 1,396 (+5969.57%)
Cuesheet: A framework for writing Spark 2.x applications in a pretty way
Stars: ✭ 86 (+273.91%)
spinmob: Rapid and flexible acquisition, analysis, fitting, and plotting in Python. Designed for scientific laboratories.
Stars: ✭ 34 (+47.83%)
Bigdata Playground: A complete example of a big data application using Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (+669.57%)
Spark States: Custom state store providers for Apache Spark
Stars: ✭ 83 (+260.87%)
Mlflow: Open source platform for the machine learning lifecycle
Stars: ✭ 10,898 (+47282.61%)
Awesome Pulsar: A curated list of Pulsar tools, integrations and resources.
Stars: ✭ 57 (+147.83%)
Pulsar Spark: When Apache Pulsar meets Apache Spark
Stars: ✭ 55 (+139.13%)
spark-connector: A connector for Apache Spark to access Exasol
Stars: ✭ 13 (-43.48%)
workshop-spark: Code for Spark workshops with a Docker development environment
Stars: ✭ 27 (+17.39%)
Data Accelerator: Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (+973.91%)
Whylogs Java: Profile and monitor your ML data pipeline end-to-end
Stars: ✭ 164 (+613.04%)
Sparkit Learn: PySpark + Scikit-learn = Sparkit-learn
Stars: ✭ 1,073 (+4565.22%)
Spark Atlas Connector: A Spark Atlas connector to track data lineage in Apache Atlas
Stars: ✭ 160 (+595.65%)
Spark Nkp: Natural Korean Processor for Apache Spark
Stars: ✭ 50 (+117.39%)
Spark Sklearn: (Deprecated) Scikit-learn integration package for Apache Spark
Stars: ✭ 1,055 (+4486.96%)
Spark Tda: SparkTDA is a package for Apache Spark providing Topological Data Analysis functionality.
Stars: ✭ 45 (+95.65%)