Sparkling Titanic: Training models with Apache Spark and PySpark for the Titanic Kaggle competition
Stars: ✭ 12 (-87.76%)
HAL-9000: Automatically set up a productive development environment with Ansible on macOS
Stars: ✭ 72 (-26.53%)
Spark Nkp: Natural Korean Processor for Apache Spark
Stars: ✭ 50 (-48.98%)
Pyspark Setup Demo: Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Stars: ✭ 24 (-75.51%)
Mlflow: Open-source platform for the machine learning lifecycle
Stars: ✭ 10,898 (+11020.41%)
spark-utils: Basic framework utilities to quickly start writing production-ready Apache Spark applications
Stars: ✭ 25 (-74.49%)
Cluster Pack: A library on top of either pex or conda-pack to make your Python code easily available on a cluster
Stars: ✭ 23 (-76.53%)
incubator-linkis: Linkis helps you connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+2409.18%)
kafka-compose: 🎼 Docker Compose files for various Kafka stacks
Stars: ✭ 32 (-67.35%)
Sparklyr: R interface for Apache Spark
Stars: ✭ 775 (+690.82%)
Bitcoin Value Predictor: [NOT MAINTAINED] Predicting the Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (-7.14%)
Scriptis: Scriptis is a tool for interactive data analysis, offering script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, and resource management, and intelligent diagnosis.
Stars: ✭ 696 (+610.2%)
Wirbelsturm: Wirbelsturm is a Vagrant- and Puppet-based tool for 1-click local and remote deployments, with a focus on big data tech like Kafka.
Stars: ✭ 332 (+238.78%)
SparkTwitterAnalysis: An Apache Spark standalone application using the Spark API in Scala. The application uses Simple Build Tool (SBT) to build the project.
Stars: ✭ 29 (-70.41%)
Dist Keras: Distributed deep learning, with a focus on distributed training, using Keras and Apache Spark.
Stars: ✭ 613 (+525.51%)
dlsa: Distributed least squares approximation (dlsa) implemented with Apache Spark
Stars: ✭ 25 (-74.49%)
Pysparkgeoanalysis: 🌐 Interactive workshop on GeoAnalysis using PySpark
Stars: ✭ 63 (-35.71%)
lineage: Generate beautiful documentation for your data pipelines in Markdown format
Stars: ✭ 16 (-83.67%)
kuwala: Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data into data sc…
Stars: ✭ 474 (+383.67%)
spark-operator: Operator for managing Spark clusters on Kubernetes and OpenShift.
Stars: ✭ 129 (+31.63%)
Sparkle: Haskell on Apache Spark.
Stars: ✭ 419 (+327.55%)
sparklanes: A lightweight data processing framework for Apache Spark
Stars: ✭ 17 (-82.65%)
Spark Py Notebooks: Apache Spark & Python (PySpark) tutorials for big data analysis and machine learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+1265.31%)
Spark Syntax: This is a repo documenting best practices in PySpark.
Stars: ✭ 412 (+320.41%)
Optimus: 🚚 Agile data preparation workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+906.12%)
check-engine: Data validation library for PySpark 3.0.0
Stars: ✭ 29 (-70.41%)
Awesome Pulsar: A curated list of Pulsar tools, integrations and resources.
Stars: ✭ 57 (-41.84%)
Sparkmeasure: This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.
Stars: ✭ 368 (+275.51%)
hyperdrive: Extensible streaming ingestion pipeline on top of Apache Spark
Stars: ✭ 31 (-68.37%)
parquet-dotnet: 🐬 Apache Parquet for modern .NET
Stars: ✭ 199 (+103.06%)
DaFlow: Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-75.51%)
Pulsar Spark: When Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-43.88%)
Spark Flamegraph: Easy CPU profiling for Apache Spark applications
Stars: ✭ 30 (-69.39%)
Coolplayspark: "Cool Play" Spark (酷玩 Spark), covering Spark source code analysis, Spark libraries, and more
Stars: ✭ 3,318 (+3285.71%)
phrase-at-scale: Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary length, and the tool can be used with languages other than English.
Stars: ✭ 115 (+17.35%)
Cuesheet: A framework for writing Spark 2.x applications in a pretty way
Stars: ✭ 86 (-12.24%)
Learningsparkv2: This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Stars: ✭ 307 (+213.27%)
cloud-integration: Spark cloud integration (tests, cloud committers and more)
Stars: ✭ 20 (-79.59%)
sparkucx: A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing the UCX communication layer
Stars: ✭ 32 (-67.35%)
Datahacksummit 2017: Apache Zeppelin notebooks for recommendation engines using Keras and machine learning on Apache Spark
Stars: ✭ 30 (-69.39%)
Mist: Serverless proxy for Spark clusters
Stars: ✭ 309 (+215.31%)
python mozetl: ETL jobs for Firefox Telemetry
Stars: ✭ 25 (-74.49%)
spark-records: Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (-31.63%)
ceja: PySpark phonetic and string matching algorithms
Stars: ✭ 24 (-75.51%)
Sparkit Learn: PySpark + Scikit-learn = Sparkit-learn
Stars: ✭ 1,073 (+994.9%)
Sparkmagic: Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+873.47%)
Morpheus: Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Stars: ✭ 303 (+209.18%)